I’m building a small compiler for a toy language which emits .NET executables, implemented in F#. Demo compilers are a dime a dozen, but there are a few things which make this project distinct:
- No lexer or parser generators are used. The entire compiler is written from the ground up, and is intended to be fairly simple code.
- The source code is idiomatic F#, and is almost totally purely functional.
The project started as a fairly simple port of Jack Crenshaw’s code from his classic series, "Lets Build A Compiler," but deviated pretty quickly as I found I wanted to take the project in a slightly different direction.
- In the name of simplicity, Crenshaw avoids using distinct lexing, parsing, etc. phases in his code. I completely understand his reasoning, but found that this was confusing to me personally, because the code I was writing did not resemble the code I read in compiler textbooks. Indeed, many "educational" compiler implementations swing the other way, using a "nanopass" (many distinct phases) design.
- F# code directly ported from Pascal is ugly, and I wanted to use purely functional code whenever possible.
- I had been attempting to port one "chapter" of Crenshaw’s code at a time, but I fairly quickly discovered that this is exactly backwards; it makes far more sense to implement a complete compiler and then break it down into chapter-sized steps. It is easier to design a road when you know what the destination will be! I do recognize the value of Crenshaw’s "learn by doing / just write the code" approach. I want to go back and "chapterize" the compiler when I’m done.
- Crenshaw’s code emits 68000 ASM for SK*DOS as a string; mine emits a binary .NET assembly as a file.
As I develop the compiler, I have found that I use a back-and-forth cycle of adding features and beautifying code. New features tend to be pretty ugly until I understand how the code needs to work in the end, and then I go back and rewrite the uglier bits using the knowledge I have gained in the implementation.
For example, I have just added support for decoration and dereferencing of local variables, and, with that, multiple-line programs. This resulted in a considerable expansion of both the parser and the code generator. So I’m going to go back and simplify a number of things, especially error handling. I’m going to use a combination "railway oriented" design and error nodes in the parser.
The code does have moments of beauty, though. Here’s the top level of the compiler:
let compile = lex >> parse >> optimize >> codeGen >> methodBuilder
That’s not an ASCII art diagram; it’s actual, executable F# code.
Here is one of the unit tests:
[<TestMethod>] member x.``2+-3 should equal -1`` () = Compiler.compile("2+-3") |> shouldEqual <| -1
This test uses the compiler to emit a .NET assembly in memory, execute its main function, and compare the result with the value in the test.
Anyway, this is an ongoing project. You can see where I’m at by following the GitHub repository.