Overview

A programming language is a formal notation for writing computer programs. Programming languages are defined with a strict syntax and structure which allows for another computer program to break down and translate the language into simpler instructions for the computer to execute.

High and Low Level

A high-level programming language is one which abstracts away some or many of the intricacies of a computer's system. The abstracted form is meant to be more human-readable, sometimes even resembling natural human language, as SQL does.

Most modern programming languages are high-level; however, there is a spectrum along which languages sit at a 'higher' or 'lower' level. Higher-level languages provide more abstraction and consequently tend to require more processing power to translate and run.

For example, C would be a 'lower-end' language due to its low level of abstraction over machine code. JavaScript and Python, on the other hand, are higher-end due to their higher level of abstraction.

Low-level languages, on the other hand, are meant to be more machine-understandable. Assembly code is considered the lowest-level human-readable language. While understandable by humans, it is much more geared towards how machines operate.

Compiling

Compiled languages traditionally translate source files directly into machine code. Languages are usually compiled when efficiency is a concern when running a program. The tradeoff is that programs are somewhat more difficult to port across platforms, as they have to be recompiled for each target system.

More generally, compilers translate language structures into an executable form.

Note

The act of compiling is fairly broad, however, and compilers can also translate source files to an intermediate representation, typically called bytecode or p-code. This intermediate representation is then given to an interpreter.
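Python's reference implementation, CPython, is one example of this arrangement, so it can illustrate the two stages. The sketch below uses the built-in compile and exec functions and the standard dis module; the source string and the variable names are made up for illustration, and the exact opcode names differ between Python versions.

```python
import dis

source = "total = 2 + 3 * 4"

# Compile step: the source text becomes a code object holding bytecode.
code = compile(source, "<example>", "exec")

# Inspect the intermediate representation (opcode names vary by version).
opnames = [instruction.opname for instruction in dis.get_instructions(code)]

# Interpret step: the virtual machine executes the bytecode.
namespace = {}
exec(code, namespace)
total = namespace["total"]
```

Printing opnames shows the instruction sequence the interpreter actually runs, which is the intermediate form rather than native machine code.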

Compiling Steps

Compilers usually follow a sequence of steps to translate to machine code. They can include the following:

Preprocessing expands the code into a more standard representation. This usually involves expanding out certain patterns or special syntax into simpler code.

Compiling translates the preprocessed code into assembly code.

Assembling takes the assembly code and translates that further to object code. Object code is a binary file which contains all the instructions from the compiled file.

Linking links related files to each other. If a file calls a function or attribute defined in another file, the linker links the related object files. This final step outputs an executable file.
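The four stages above can be sketched as a toy pipeline. Everything here is hypothetical: the two-statement source language (def NAME VALUE and use NAME), the LABEL and LOAD "assembly", and the dictionary standing in for an object file are invented for illustration and do not mirror any real toolchain.

```python
def preprocess(source, macros):
    """Strip '#' comments and expand macro names into their replacement text."""
    lines = []
    for line in source.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        words = [macros.get(word, word) for word in line.split()]
        lines.append(" ".join(words))
    return lines

def compile_to_assembly(lines):
    """Translate 'def NAME VALUE' and 'use NAME' into toy assembly."""
    assembly = []
    for line in lines:
        op, *args = line.split()
        if op == "def":
            assembly.append(f"LABEL {args[0]} {args[1]}")
        elif op == "use":
            assembly.append(f"LOAD {args[0]}")
        else:
            raise ValueError(f"unknown statement: {line}")
    return assembly

def assemble(assembly):
    """Build a toy object file: defined symbols plus unresolved references."""
    obj = {"symbols": {}, "references": []}
    for instruction in assembly:
        op, *args = instruction.split()
        if op == "LABEL":
            obj["symbols"][args[0]] = int(args[1])
        else:  # LOAD
            obj["references"].append(args[0])
    return obj

def link(objects):
    """Merge symbol tables, then resolve every reference to its value."""
    symbols = {}
    for obj in objects:
        symbols.update(obj["symbols"])
    return [symbols[name] for obj in objects for name in obj["references"]]

def build(source, macros):
    return assemble(compile_to_assembly(preprocess(source, macros)))

main_obj = build("use answer  # defined in another file\nuse local\ndef local 7", {})
lib_obj = build("def answer ANSWER", {"ANSWER": "42"})
executable = link([main_obj, lib_obj])  # [42, 7]
```

The point of the linking step shows up in main_obj: it refers to answer without defining it, and the reference can only be resolved once lib_obj is supplied.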

Interpreting

Interpreted languages execute code immediately without completely translating the entire program [1]. Interpreted languages can be implemented in many ways, even incorporating bits of compilation.

In other words, they define a function that executes language syntax directly.

There are various types of interpreters, each of which has a different approach to representing and executing programs. Three of the more commonly known are bytecode interpreters, AST interpreters, and JIT-compiling interpreters.

Bytecode Interpreters

Bytecode interpreters first compile the written code to an intermediate representation called bytecode. Bytecode is a highly condensed and optimized set of instructions, similar in a sense to assembly, but machine-agnostic. It is so called because each instruction starts with a byte, although some forms of bytecode may have longer instructions.

The compiled bytecode is then fed to a bytecode interpreter which processes and executes the bytecode linearly.
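A minimal sketch of that linear processing, assuming a toy stack machine. The opcode values and the run function are invented for illustration and do not correspond to any real bytecode format.

```python
# Hypothetical one-byte opcodes for a toy stack machine.
PUSH, ADD, MUL, HALT = 0x01, 0x02, 0x03, 0xFF

def run(bytecode):
    """Execute the bytecode linearly and return the value on top of the stack."""
    stack = []
    pc = 0  # program counter: index of the next byte to read
    while True:
        opcode = bytecode[pc]
        pc += 1
        if opcode == PUSH:
            stack.append(bytecode[pc])  # the operand is the byte after the opcode
            pc += 1
        elif opcode == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opcode:#04x}")

# Bytecode for (2 + 3) * 4
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])
result = run(program)  # 20
```

Note that PUSH takes two bytes (opcode plus operand) while the other instructions take one, which is the kind of variable instruction length mentioned above.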

Abstract Syntax Tree Interpreters

An abstract syntax tree (AST) is a concept from formal language theory: a tree-like data structure which represents the structure of the source code.

After the code is parsed into an AST, it is fed to a tree-walk interpreter which executes code as it walks the tree.

This method is more of an interpreter in the strict sense and has its own benefits and drawbacks. The benefits include a better intermediate representation of code, while the drawbacks include more overhead for storing and walking the tree.
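A tree-walk interpreter can be sketched in a few lines by borrowing Python's standard ast module as the parser. The evaluate function and the OPERATORS table are illustrative names, and the sketch only handles numeric constants and the four basic arithmetic operators.

```python
import ast
import operator

# Map AST operator node types to the functions that implement them.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(node):
    """Walk the tree recursively, executing each node as it is visited."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        left = evaluate(node.left)
        right = evaluate(node.right)
        return OPERATORS[type(node.op)](left, right)
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")

tree = ast.parse("(2 + 3) * 4", mode="eval")
result = evaluate(tree)  # 20
```

Each call to evaluate re-walks the tree from the top, which is where the traversal overhead mentioned above comes from.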

Just-in-time Compiled Interpreters

This is where the line between interpreters and compilers can blur heavily. There are a lot of misconceptions around JIT-compiled interpreters, but put simply, a JIT compiler continuously looks for specific code that would benefit from being compiled to machine code.

The initial steps of JIT-compiled interpreters are essentially the same as most other interpreters, usually relying on an AST or bytecode as the intermediate representation. However, as the program runs, the interpreter looks for code, typically sections that execute frequently, where compiling to machine code would be preferable to continuing to interpret the intermediate representation.
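The idea can be sketched with a deliberately simplified model. It assumes one common heuristic, counting how often a piece of code runs, and it uses Python's built-in compile as a stand-in for emitting machine code (it actually produces Python bytecode). HotExpression, HOT_THRESHOLD, and interpret are hypothetical names.

```python
import ast
import operator

OPERATORS = {ast.Add: operator.add, ast.Mult: operator.mul}
HOT_THRESHOLD = 3  # hypothetical: runs before an expression counts as "hot"

def interpret(node, env):
    """Slow path: walk the tree on every call."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        left = interpret(node.left, env)
        right = interpret(node.right, env)
        return OPERATORS[type(node.op)](left, right)
    raise ValueError("unsupported syntax")

class HotExpression:
    def __init__(self, source):
        self.source = source
        self.tree = ast.parse(source, mode="eval").body
        self.calls = 0        # how many times the slow path has run
        self.compiled = None  # filled in once the expression is hot

    def run(self, env):
        if self.compiled is not None:
            return eval(self.compiled, {}, env)  # fast path
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            # Stand-in for native code generation; assumption, not real JIT output.
            self.compiled = compile(self.source, "<jit>", "eval")
        return interpret(self.tree, env)

expr = HotExpression("x * 2 + 1")
results = [expr.run({"x": i}) for i in range(5)]  # [1, 3, 5, 7, 9]
```

The first three calls walk the tree; from the fourth call onward the cached compiled form is used, and both paths produce the same values.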

Definitions

Domain Specific Language (DSL)

This is a language specific to a set of tasks. Some examples include Vimscript, SQL, and regular expressions (RegEx).
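DSLs are often embedded inside a general-purpose host language. The sketch below uses Python's standard re module to run a RegEx pattern; the log line and the pattern are made up for illustration.

```python
import re

log_line = "2024-01-15 ERROR disk usage at 93%"

# The pattern string is written in the regular expression DSL;
# everything around it is ordinary Python.
pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2}) (\w+)")

match = pattern.match(log_line)
year, month, day, level = match.groups()
```

The pattern can describe text shapes concisely, but it cannot open files or do arithmetic on what it matches; that work falls to the host language.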

General Purpose Language (GPL)

Not to be confused with the GNU General Public License (GPL), a general purpose language aims to solve any arbitrary computing problem with its defined syntax.

Reference

  1. https://en.wikipedia.org/wiki/Programming_language
  2. https://en.wikipedia.org/wiki/Interpreter_(computing)
  3. https://en.wikipedia.org/wiki/Compiler

  [1] This is not strictly true but is generally true for most modern interpreted languages. Many advancements in interpreters have blurred the distinction between them and compilers.