Monday, 22 December 2014

Compiler 2 Part 13: The Front End

This is it! You've made it!

The final post in this series merely covers the user-facing code and installation of the compiler and isn't really part of the compiling process.


I chose to use a Makefile for building and installing calcc and its runtime. The install target not only installs calcc but also ensures the runtime is built before doing so.

It’s a nice improvement.

I considered the GNU autotools but it felt like bringing a howitzer to a knife fight. Just a little bit of overkill for marginal gain and a much slower system overall.

There is a test target too for generating a simple test program for the runtime.

It was a pain making it portable to Windows.


Calcc has had a few changes too.

First, I added a function called findRuntime to attempt to locate the C runtime. I wanted to ensure that the runtime would not be installed on the developer's system so I needed a method of locating it. I had two reasons for this: a) this is an instructional compiler and once someone is finished with it I would like for them to be able to delete the source tree and the runtime with it; and, b) the runtime is statically linked into the final binary so there’s no point to having it hanging around in la-la land.

This function does not guarantee the runtime will still be there when it comes time to link it. It is only a sanity check that the runtime can be found before the link step is attempted.

I added a cheeky flag to output “assembly”. Really, it outputs pseudo-assembly for the instructions in the C runtime. It’s nice to have when you want to see the C output to verify correctness.

The balance of the changes relate to multiple files and qualifying the path. For one, we want the absolute path rather than the relative one. Also, we have to do some work when compiling a directory because the path is part of the binary name.

It also strips the file extension, if need be, so that the filename itself can be used for the intermediate and final files. A file called foo.calc will generate the files foo.c, foo.o and a binary named foo. On Windows, the binary will be called foo.exe.

Learning as We Go

In writing the compiler and these articles I came across a few minor issues with the Calc 2 spec. These were unexpected and only reared their heads when I was actually implementing the features.

Making the return type of a function mandatory was one example. Since nothing in the Calc 2 spec produces an observable side effect, it made no sense to be able to call a function in an imperative form. That is, a function which executes the expressions in its body but does not return any result. A call to such a function would be what’s called a no-op.

A no-op is an operation with no effect. Look at this example:

(decl incr (n int) (= n (+ n 1)))
(decl main int (
(call incr(2)))

What does this program do? The function ‘incr’ does not return a value. Essentially, it does nothing. Yes, the value of ‘n’ is incremented within ‘incr’ but nothing is actually produced as output. No value is returned.
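The same situation can be shown in a language with pass-by-value arguments. This is a rough Go analogue of the Calc example, not a translation the compiler would produce:

```go
package main

import "fmt"

// incr increments its own copy of n and returns nothing,
// so the caller can never observe the change.
func incr(n int) {
	n = n + 1
}

func main() {
	x := 2
	incr(x)
	fmt.Println(x) // prints 2: the call had no visible effect
}
```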

So why did I start with an optional return value in the first place? Well, because in the future, perhaps even Calc 3, I hope to include external libraries. I would like for there to be a print function so that an imperative function could produce output as a side effect, without a return value.

Still keeping with functions, I had to revert my original idea of allowing nested function declarations. I may enable this again in future versions but it was problematic. For one, you could assign a declaration to a variable but there are no pointers in the language. So, what would that mean? What would it do?

I could have allowed declarations just within other declarations but it likely would have meant a bunch more code to make work and I wasn't sure it wouldn't feel clunky. So, if I couldn't do it right, I better not do it at all.

Those are a couple examples of how I had to tweak the spec slightly as I wrote. Unintended consequences of my decisions.

Again, I was not completely re-writing the spec. I was clarifying and tweaking it slightly. Any larger adjustment might have meant massive changes to the code, which would be unacceptable. You need to stick to your plan!

The Future

So what do I have in store for Calc 3 and beyond?

I can’t say for certain just yet but I have a few ideas. Here’s my wishlist:

  • growable stack (high)
  • simple garbage collector (unknown; dependent on other features)
  • loops (moderate-high)
  • generated code optimizations (low-moderate)
  • library/packages (moderate)
  • assembly (moderate)
  • allocations and simple garbage collector (moderate-high)
  • objects/structures (high)

Some things, like packages, I’d very much like to do but they would require a lot of engineering to get right. I think if I added them, I probably couldn't do much else.

Optimizing seems unlikely. I want to teach how language features work rather than building an efficient binary.

Closing Statement

This series was far too big and were I to do it all over again I would change a great many things.

First off, I would implement fewer features in one go. When drafting the outline of the series I did not anticipate just how big it would get. Some of the new features were very large topics and I’m not sure I did them all justice. I think I could have easily split this series into two to make it more manageable.

Multiple source file support was a bad idea. Not that adding the feature was bad in its own right but that it makes no sense with the rest of the lessons and adds unnecessary bloat to the series.

Next, I would write the C Runtime a bit differently to make it easier for the C compiler to optimize the assembly instructions. While my intentions of using pseudo-assembly were noble I don’t think they were the correct answer to the problem.

I am really not happy with the type-checking implementation. It’s sloppy and buggy.

I don't know if I would create another LISP-like language. I would consider retaining prefix notation for basic arithmetic operations but I would implement the rest of the language in a more C-like way. It’s just what makes me feel comfortable.

I am not entirely sure whether I will continue with another series, or not. I would like to, I think, provided it was much, much smaller. I think I would do minor, incremental steps from now on. With my current workload and the effort required to put this together I am, quite frankly, exhausted!

Thanks for reading and I hope you enjoyed it!