5 minute read

This semester I have been studying Computer Theory, and I had the idea of writing a compiler which generates Turing Machines. I started by designing the language. At first I explored compiling from an imperative language similar to Rust or C, but then I realized that doing it in a functional language similar to Nix would be much easier (and cooler).

Language Design

The language aims to be as simple as possible. The entire program consists of a single anonymous function.

Types

I decided that a strongly typed language with only 4 types would be a good starting point:

  • symbol: represents a possible symbol of a cell in the turing machine’s tape.
  • union: represents a union of symbols.
  • tape: represents the tape and head of the turing machine. Tapes are restricted to ownership rules.
  • function { arg, ret }: represents a function which takes the type arg and returns the type ret.

Expressions

There are seven possible types of expressions:

  • id: a variable, bound by a function argument or by a let expression.
  • 'A': a symbol.
  • exp1 | exp2: a union of symbols, which can be used in match patterns. Assumes that both exp1 and exp2 are symbols or unions.
  • any: a special union which matches any symbol.
  • match exp { pat > exp, ... }: matches a symbol against a list of patterns, and returns the first expression found.
  • let id1 = exp1, ..., id2 = exp2, in body: binds multiple expressions to identifiers which will then be replaced by the expressions in the main expression, which is then returned. Let expressions were only added to the language to make it more readable, and recursive definitions are not supported.
  • id: exp: a lambda expression function which returns exp, with the variable id bound to it.
  • f exp: applies the function f to the expression exp (left-associative).

The root expression of the program must evaluate to a tape -> tape function, which represents the operation the machine will perform on the tape.

Match Patterns

Match patterns are unions of symbols, and may optionally capture the variable matched by prepending the pattern with id @. For example, the following match expression would produce 'A':

match 'A' {
  id @ 'A' | 'B' > id,
  'C' > 'D',
}

Built-in Functions

I added five built-in functions which are essential to the language:

  • set s t: writes the symbol s to the cell at the head of the tape t, consuming the tape and returning a new one. (symbol -> tape -> tape)
  • get t: returns the symbol at the cell at the head of the tape t. (&tape -> symbol)
  • next t: moves the head of the tape t to the right, consuming the tape and returning a new one. (tape -> tape)
  • prev t: moves the head of the tape t to the left, consuming the tape and returning a new one. (tape -> tape)
  • Y f: applies the Y-combinator to the function f, with the signature (tape -> tape) -> tape -> tape, and returns a function with the the signature tape -> tape. This function is essential to implement recursion in this language.

Recursion

Since let expressions forbid recursive definitions, the built-in Y-combinator function Y is used to implement recursion. The following example should produce a turing machine which moves the head of the tape to the right forever.

Y f: t: f (next t)

If recursive let bindings were allowed, the resulting function would be equivalent to the following:

let loop = t: loop (next t), in loop

Tape Ownership

I had to add an ownership rule to the language to ensure that, for example the following code is disallowed:

t: match get (next (set 'A' t)) {
  'A' > set 'A' t,
  'B' > set 'B' t,
}

If there were no ownership rules, the above code would be legal. Generating a turing machine from this code would be very difficult, because the tape t used in the match arm expressions refers to the old tape, which was edited in the get expression argument. What this means is that the argument of a get application must not consume the tape. An alternative, correct version of the program above would be:

t: (t: match get t {
  'A' > set 'A' t,
  'B' > set 'B' t,
}) (next (set 'A' t))

The rule is that a tape must only be consumed once. Since the root function of any program receives a tape and must return a tape, this rule means that you can’t consume a tape and then forget the new tape (e.g.: t: get (next t), the tape is consumed by the next function).

Imports

A programming language without the ability to split code between files would be very cumbersome to work with, so I added the import keyword to the language. The functionality is very simple: it just replaces the import expression with the contents of the file specified by the import expression:

# defs.tmc (in case you're wondering, .tmc stands for turing machine compiler)
let
  zero = '0',
  one = '1',
in
# main.tmc
import 'defs.tmc'
let
  flip = s: match s {
    zero > one,
    one > zero,
  },
in
  t: set (flip (get t)) t,

Example Programs

# Prints the symbol 'A' at the head of the tape, and moves the head to the
# right.
t: next (set 'A' t)
# Until an empty cell is found, moves the head to the right.
Y f: t: match get t {
  '0' | '1' > f (next t),
  ' ' > t,
}
let
  # A function which moves the head right until it finds the symbol passed as
  # argument.
  find = s: Y f: t: match get t {
    s   > t,
    any > f (next t),
  },
in
  # Moves the head right until it finds the symbol 'A'.
  t: find 'A' t

What’s next?

After I had a more or less stable language, I started working on the compiler itself. I chose Rust to write it in, because I’ve been wanting to work on a Rust project for a while, and also because Rust enums and match expressions make it very pleasant to write compilers in.

In the next post I’ll talk the issues I faced and how I implemented the compiler.

Comments