Saturday, 3 May 2014

Compiler Part 1: Introduction to Writing a Compiler in Pure Go

Introduction


I've long been interested in learning how a compiler works. Cryptic compiler messages and odd behaviours have always baffled me. I’ve never quite understood how optimizations are done or how it is a compiler knows what I've done wrong.

When I first decided it was time to learn how to write a compiler I found out that there are a lot of terms and acronyms not often used outside of the discipline. What was an SLR or LALR parser? What in the heck is a lexeme or finite automata? What is recursive-descent parsing? What is an AST?

It can be pretty overwhelming at first.

The largest hurdle I found was that most online courses, tutorials and other learning materials on the Internet don’t teach you how to write a compiler or interpreter. Correction; they don’t teach you how to build one from scratch but provide you an overarching, top-down approach, relying on existing tools to help you do the job. This is because building a compiler is a pretty big job and it would be nigh impossible to cover everything in a single semester. I can’t fault anyone for that.

That said, I felt these tools skipped over what I think are valuable lessons and information. I wanted more.

So, I want to take you on a tour of my personal journey through learning how to write a compiler.

Bare in mind, I’m not a professional programmer nor am I finished figuring everything out. That said, I've discovered so many wonderful things that I can’t help but want to share them in the hopes they’ll help you discover your own path. After all, it’s why you’re here, isn't it?

Credit Where Credit is Due


Calc is an original work and while all the code for this project has been hand written there can be no denying parallels and similarities to Go and Go parser package in the standard library. In learning to write a compiler I have often referenced those works when I got stuck. Any and all credit must go to the Go developers for any code that resembles any of their collective works. You can find the source code for the Go compiler at golang.org.

Prerequisites


As much as I’d like it to be, this series is not really intended for novice programmers. It assumes you have some programming experience and are already familiar with the Go programming language. Therefore, it is targeted at more of an intermediate programmer.

While I think a more novice programmer might be able to follow the articles I will make no attempt to explain Go or any code not directly related to compiling. There are several utility functions which I will point out but since they aren't explicitly about compiling I won’t explain them.


Tools


Compiler design is a component, or perhaps companion, to language design. It’s a massive topic and each step of the compiler is a lesson in of itself.

There are a couple tools out there that people use to simplify or expedite the process of building a compiler. Each tool tends to focus on one particular step of the process. I am telling you about these tools so that you know that they exist but they are not relevant to this blog series. There are, however, many articles and classes that will teach you compiler design by using these tools.

Flex and Bison are probably the most common open source options available. Flex is based on a program called Lex, short for Lexical analysis, which is the first step of compiling. Bison is based on a program called YACC, an acronym for Yet Another Compiler Compiler, which does parsing and builds something called an abstract syntax tree.

I don't expect you to understand any of that, so don't worry. I will caution you that trying to use either of those tools without comprehending how a compiler works will be difficult in the extreme. You should prepare for some hurt.

I have actually found it more useful to skip these tools and learn the basics, first. I think building the entire compiler stack by hand provides a better solution in the long term anyway but that’s just my opinion. Take it for what you will.

The problem, for me, is that there ends up being a very serious disconnect and loss of control between each stage of compiling. It also begs the question of: “how does it actually work?”

The Solution


That’s why in this series I’m going to cover each individual component and we’re going to write code from scratch. It might take longer but I think you’ll have a stronger understanding in the end. If, after we’re done, you decide you want to use the aforementioned tools you’ll be able to do so much easier and make a more informed decision.

Stay tuned for Part 2 to start getting our feet wet!

[Edit: Updated the tools section to be more clear in its intent]