Saturday, 7 November 2015

Compiler 3 Part 3 - Intermediate Representation

After presenting Calc2 I came to discover that many parts of the compiler were quite fragile. The type checking was particularly bad, occasionally reporting type errors (false positives) when run over the exact same source code multiple times. 

That’s a serious problem.

I had assumed that using an intermediate representation was strictly for doing optimizations. While they certainly help with, and may be crucial for, optimization they can be useful for much, much more.

The IR used by Calc is nothing special. In fact, it looks very much like the AST and the objects it uses share many of the same names. So what gives?

If you look more closely, there are some crucial differences. For one, each object is represented a little differently. Second, there is much more in common between each object. In fact, there is a struct embedded into every one: object.

This struct gives me access to almost everything needed to perform various tasks, including: type checking, error reporting and code generation.

While the AST and IR may both be trees, and represent much of the same data, they represent two very distinct things. The AST, as the name implies, represents the actual syntax of the language. Using it, you can somewhat faithfully recreate the original source code. The IR, on the other hand, sits much closer to code generation and thereby represents the code we wish to output.

I look at the IR like this: it acts as a bridge between (intermediate) the syntax and code (representation) generation.

The first step is transforming the AST into IR. This is done with a simple tree walking algorithm, converting each node into the new form. Typically, the compiler starts with MakePackage. From there, you can follow the calls along to see how the tree is built.

There are a few things worth noting.

First, notice that the parameters of functions get separated into objects stored in a function's scope and only a string slice of parameter names is stored with the declaration. This will become important in the next few steps.

Second, we can also begin a simple optimization of converting literals, like number and boolean strings, into actual values called constants. This saves us the step of doing these conversions later on and provides an opportunity to do constant folding in a much more efficient manner.

Once we’re done converting the AST into the IR tree, the real work can begin. Stay tuned!