🐌
Yours truly is not an expert in compilers but does enjoy diving into this topic and learning something new from time to time.

Writing a toy compiler is lots of fun!

Probably that’s why there’re so many of toy compilers around. And that’s also why I developed one too. So, what are some features that might set this one apart?

The language is concise but retains a degree of expressivity

A lot of mini-compilers have a language consisting of functions, primitive values and pretty much nothing else. Whereas this project’s language Cool 2020 is a small Scala subset — it has classes, inheritance, virtual dispatch, and even a very simple form of pattern matching. Implementing all of these features yourself, making them function at the assembly level provides a lot of insights into how production-grade programming languages work.

A taste of the language

Code calculating a portion of the Fibonacci sequence looks like this:

class Fib() extends IO() {
  def fib(x: Int): Int =
    if (x == 0) 0
    else if (x == 1) 1
    else fib(x - 2) + fib(x - 1);

  {
    var i: Int = 0;
    while (i <= 10) {
      out_string("fib("); out_int(i); out_string(") = ");
      out_int(fib(i));
      out_nl();
      
      i = i + 1
    }
  };
}

class Main() {
  { new Fib() };
}

Or take a look at the sources of Conway’s Game of Life

It is garbage-collected

The language’s runtime library implements a simple generational garbage collector.

It compiles down to x86-64 assembly

Then GNU as and ld come into play to build a native executable from the assembly. Many educational compilers emit MIPS assembly, and although it’s possible to run it using an emulator, to me, running a native executable produced by your own compiler feels much more rewarding.

I also prefer emitting assembly instead of, for instance, C. It helps you understand the inner workings of various things that developers often take for granted:

  • Managing a function’s stack frame
  • Calling conventions
  • Finding the memory address of an identifier
  • Converting expressions into assembly
  • And so on, and so forth

High-level languages are a bit too… well, high-level for someone who wants to grasp how these things work under the hood. However, if you’re more interested in designing a programming language than the nitty-gritty of compilers, targeting a high-level language like C might actually be a great idea.

Human-readable assembly

The emitted assembly code contains a lot of hopefully useful comments. The idea here is to help understand the assembly and relate it back to the source code as much as possible.

    .text

    # ../CoolPrograms/Runtime/Fibonacci.cool(1,7): Fib
    .global Fib..ctor
Fib..ctor:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $64, %rsp
    # store actuals on the stack
    movq    %rdi, -8(%rbp)
    # store callee-saved regs on the stack
    movq    %rbx, -24(%rbp)
    movq    %r12, -32(%rbp)
    movq    %r13, -40(%rbp)
    movq    %r14, -48(%rbp)
    movq    %r15, -56(%rbp)
    # actual #0
    movq    -8(%rbp), %rbx
    subq    $8, %rsp
    movq    %rbx, 0(%rsp)                         # actual #0
    # load up to 6 first actuals into regs
    movq    0(%rsp), %rdi
    # remove the register-loaded actuals from stack
    addq    $8, %rsp
    call    IO..ctor                              # super..ctor
    movq    %rax, %rbx                            # returned value
    # ../CoolPrograms/Runtime/Fibonacci.cool(8,18): 0
    movq    $INT_0, %rbx
    # ../CoolPrograms/Runtime/Fibonacci.cool(8,5): var i: Int = 0
    movq    %rbx, -16(%rbp)                       # i
.L6_WHILE_COND:
    # ../CoolPrograms/Runtime/Fibonacci.cool(9,12): i
    movq    -16(%rbp), %r10                       # i
    # ../CoolPrograms/Runtime/Fibonacci.cool(9,17): 10
    movq    $INT_10, %r11

It is cross-platform

The two supported hosts are x86-64 Windows and Linux. And in general, the project is based on cross-platform technologies. The compiler itself runs on .NET. The GNU tools as and ld are available on Linux and Windows (and many, many other, of course). The language’s runtime is implemented in assembly. The code responsible for calling Windows API or Linux system functions reside in two separate assembly files. One of these gets linked with a compiler-produced executable depending on the host OS.

The test suite

Approximately 313 automated tests make up the test suite. One nice consequence of generating native executables is — an automated test can compile a program, run it, and compare the program’s output to the expected values. Out of the 313 tests, there are 52 that do exactly that.

Maybe the only hobby-compiler that has a demo video :)

Here’s how compiling a hello world looks like.

A compilation session demo