Compiling a Functional Language to Turing Machines - Part 1
This semester I have been studying Computer Theory, and I had the idea of writing a compiler which generates Turing Machines. I started by designing the language. At first I explored compiling from an imperative language similar to Rust or C, but then I realized that doing it in a functional language similar to Nix would be much easier (and cooler).
Language Design
The language aims to be as simple as possible. The entire program consists of a single anonymous function.
Types
I decided that a strongly typed language with only 4 types would be a good starting point:
symbol
: represents a possible symbol of a cell in the turing machine’s tape.union
: represents a union of symbols.tape
: represents the tape and head of the turing machine. Tapes are restricted to ownership rules.function { arg, ret }
: represents a function which takes the typearg
and returns the typeret
.
Expressions
There are seven possible types of expressions:
id
: a variable, bound by a function argument or by a let expression.'A'
: a symbol.exp1 | exp2
: a union of symbols, which can be used in match patterns. Assumes that bothexp1
andexp2
are symbols or unions.any
: a special union which matches any symbol.match exp { pat > exp, ... }
: matches a symbol against a list of patterns, and returns the first expression found.let id1 = exp1, ..., id2 = exp2, in body
: binds multiple expressions to identifiers which will then be replaced by the expressions in the main expression, which is then returned. Let expressions were only added to the language to make it more readable, and recursive definitions are not supported.id: exp
: a lambda expression function which returnsexp
, with the variableid
bound to it.f exp
: applies the functionf
to the expressionexp
(left-associative).
The root expression of the program must evaluate to a tape -> tape
function,
which represents the operation the machine will perform on the tape.
Match Patterns
Match patterns are unions of symbols, and may optionally capture the variable
matched by prepending the pattern with id @
. For example, the following match
expression would produce 'A'
:
match 'A' {
id @ 'A' | 'B' > id,
'C' > 'D',
}
Built-in Functions
I added five built-in functions which are essential to the language:
set s t
: writes the symbols
to the cell at the head of the tapet
, consuming the tape and returning a new one. (symbol -> tape -> tape
)get t
: returns the symbol at the cell at the head of the tapet
. (&tape -> symbol
)next t
: moves the head of the tapet
to the right, consuming the tape and returning a new one. (tape -> tape
)prev t
: moves the head of the tapet
to the left, consuming the tape and returning a new one. (tape -> tape
)Y f
: applies the Y-combinator to the functionf
, with the signature(tape -> tape) -> tape -> tape
, and returns a function with the the signaturetape -> tape
. This function is essential to implement recursion in this language.
Recursion
Since let expressions forbid recursive definitions, the built-in Y-combinator
function Y
is used to implement recursion. The following example should
produce a turing machine which moves the head of the tape to the right forever.
Y f: t: f (next t)
If recursive let bindings were allowed, the resulting function would be equivalent to the following:
let loop = t: loop (next t), in loop
Tape Ownership
I had to add an ownership rule to the language to ensure that, for example the following code is disallowed:
t: match get (next (set 'A' t)) {
'A' > set 'A' t,
'B' > set 'B' t,
}
If there were no ownership rules, the above code would be legal. Generating a
turing machine from this code would be very difficult, because the tape t
used in the match
arm expressions refers to the old tape, which was edited
in the get
expression argument. What this means is that the argument of a
get
application must not consume the tape. An alternative, correct version of
the program above would be:
t: (t: match get t {
'A' > set 'A' t,
'B' > set 'B' t,
}) (next (set 'A' t))
The rule is that a tape must only be consumed once. Since the root function of
any program receives a tape and must return a tape, this rule means that you
can’t consume a tape and then forget the new tape (e.g.: t: get (next t)
, the
tape is consumed by the next
function).
Imports
A programming language without the ability to split code between files would be
very cumbersome to work with, so I added the import
keyword to the language.
The functionality is very simple: it just replaces the import expression with
the contents of the file specified by the import expression:
# defs.tmc (in case you're wondering, .tmc stands for turing machine compiler)
let
zero = '0',
one = '1',
in
# main.tmc
import 'defs.tmc'
let
flip = s: match s {
zero > one,
one > zero,
},
in
t: set (flip (get t)) t,
Example Programs
# Prints the symbol 'A' at the head of the tape, and moves the head to the
# right.
t: next (set 'A' t)
# Until an empty cell is found, moves the head to the right.
Y f: t: match get t {
'0' | '1' > f (next t),
' ' > t,
}
let
# A function which moves the head right until it finds the symbol passed as
# argument.
find = s: Y f: t: match get t {
s > t,
any > f (next t),
},
in
# Moves the head right until it finds the symbol 'A'.
t: find 'A' t
What’s next?
After I had a more or less stable language, I started working on the compiler itself. I chose Rust to write it in, because I’ve been wanting to work on a Rust project for a while, and also because Rust enums and match expressions make it very pleasant to write compilers in.
In the next post I’ll talk the issues I faced and how I implemented the compiler.
Comments