Saturday, 3 May 2014

Compiler Part 4: Language Design

Part 1: Introduction
Part 2: Compilers, Transpilers and Interpreters
Part 3: Overview of Compiling

In part one we had a quick introduction to what this series is going to be about. In parts two and three I gave a very brief, non-exhaustive overview of the steps involved in compiling a computer programming language.

In this post, I’m going to be more specific than in the broad overviews of the last few articles. I’m going to discuss the specification for our language.

Language Design Overview


Did I not just finish saying I was going to stop with the overly broad overviews? Turns out, I’m a dirty, rotten liar.

I’ll try to be as brief as possible so we can get to writing some code, but I want to be clear that this is a very incomplete view of language design. Computer language design is a topic you can devote your life to. I’m not going to give you any of the theory behind the design choices in Calc, and I won’t discuss much at all.

In fact, I’m basically going to just tell you what the language specification is.

I want to be completely frank: Calc 1 isn’t really even a programming language. Its sole purpose is to teach the basics of building a compiler and to act as a springboard for later specifications of Calc, to be covered in a future blog series.

Language design is hard. Very hard. It’s easy to say: “I want this feature, that feature and those features.” Why? For what purpose? How will it complicate the design? Could it confuse users? Is it necessary? Who is your target audience and would they want or need the feature?

Loops


These are not questions that are easy to answer, even with tremendous amounts of experience. Some features, like looping constructs, might appear to be no-brainers, but are they? Let’s take the two most common loop constructs: for and while.

Almost every computer language ever designed has some method of looping, whether it’s a jump like a goto statement or the very common for and while loops. Recursion is another form of looping. The most popular tend to be for and while.
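To make that concrete, here is a small Go sketch that repeats the same action, summing the numbers 1 to n, in three ways: with a goto jump, with a for loop and with recursion. The function names are my own, chosen only for illustration.

```go
package main

import "fmt"

// Hypothetical helpers: each sums 1..n using a different looping style.

// sumGoto repeats the action with a jump back to a label.
func sumGoto(n int) int {
	total, i := 0, 1
loop:
	if i <= n {
		total += i
		i++
		goto loop // jump back and test the condition again
	}
	return total
}

// sumFor uses a conventional loop construct.
func sumFor(n int) int {
	total := 0
	for i := 1; i <= n; i++ {
		total += i
	}
	return total
}

// sumRecursive expresses the repetition as a function calling itself.
func sumRecursive(n int) int {
	if n <= 0 {
		return 0 // base case ends the "loop"
	}
	return n + sumRecursive(n-1)
}

func main() {
	fmt.Println(sumGoto(10), sumFor(10), sumRecursive(10)) // 55 55 55
}
```

All three produce the same result; they differ only in how the repetition and its exit condition are written down.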

First question, first: Why?

Well, we want to repeat an action. OK, pretty easy.

Both for and while need an exit condition in order to finish. We usually express that condition as a boolean expression that evaluates to true or false. Their signatures differ slightly:

while(condition) do { action }

for (start; condition; increment) do { action }


Why? Originally, I suspect, it seemed logical to think of these loops as different things. The Go developers, however, saw the commonality between the two constructs and asked themselves this very question: “Why?”

Why have two keywords, two reserved names, to do essentially the same thing? It might be harder to parse, but wouldn’t it be nicer for the end user to have just one looping construct and a single reserved keyword instead of two?

The Go developers chose the shorter for keyword. I think it’s also the more intuitive word, which likely had a bearing on their decision. ‘for (condition) do { action }’ makes sense. Suddenly, our language is simpler and cleaner because we challenged a premise we’d long taken for granted. Why use two keywords when one will suffice?
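For comparison, here is a minimal sketch of how Go’s single for keyword covers the cases that other languages split between for and while. Real Go syntax drops the parentheses and the do from the generic signatures shown earlier. The helper names are hypothetical.

```go
package main

import "fmt"

// countUp uses the three-clause form: init; condition; post.
// This is the shape of C's for loop.
func countUp(n int) []int {
	var out []int
	for i := 0; i < n; i++ {
		out = append(out, i)
	}
	return out
}

// countDown uses the condition-only form, which plays the role of
// C's while loop. Go has no separate while keyword.
func countDown(n int) []int {
	var out []int
	for n > 0 {
		out = append(out, n)
		n--
	}
	return out
}

// firstPowerOfTwoAtLeast uses the clause-less form: an infinite loop
// that is exited with break.
func firstPowerOfTwoAtLeast(n int) int {
	p := 1
	for {
		if p >= n {
			break
		}
		p *= 2
	}
	return p
}

func main() {
	fmt.Println(countUp(3))                  // [0 1 2]
	fmt.Println(countDown(3))                // [3 2 1]
	fmt.Println(firstPowerOfTwoAtLeast(100)) // 128
}
```

One keyword, three shapes: the programmer only has to remember for.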

Trade-Offs


There’s always a trade-off. Complexity for simplicity. Speed for convenience.

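The cost of the for loop decision was some extra complexity in parsing: with one keyword covering several loop shapes, the parser can no longer tell which kind of loop it is looking at from the keyword alone. The developers made a conscious choice to make parsing the for statement harder in order to provide an improvement in productivity for the programmer.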
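To give a feel for where that parsing cost comes from, here is a deliberately simplified sketch of a hypothetical classifyFor helper. It assumes the loop header has already been split into tokens, and it is not how Go’s own parser is written: a real parser also has to handle range clauses and semicolons nested inside function literals.

```go
package main

import "fmt"

// ForKind names the loop shapes a single for keyword has to cover.
// (Hypothetical type for this sketch; range loops are omitted.)
type ForKind string

const (
	Infinite    ForKind = "infinite"     // for { ... }
	Condition   ForKind = "condition"    // for cond { ... }, i.e. a while loop
	ThreeClause ForKind = "three-clause" // for init; cond; post { ... }
)

// classifyFor decides which kind of loop a header describes, given the
// tokens between the for keyword and the opening brace. With separate
// while and for keywords the keyword alone would settle this; with one
// keyword the parser has to inspect what follows it.
func classifyFor(header []string) ForKind {
	if len(header) == 0 {
		return Infinite
	}
	for _, tok := range header {
		if tok == ";" {
			return ThreeClause
		}
	}
	return Condition
}

func main() {
	fmt.Println(classifyFor(nil))                     // infinite
	fmt.Println(classifyFor([]string{"n", ">", "0"})) // condition
	fmt.Println(classifyFor([]string{
		"i", ":=", "0", ";", "i", "<", "3", ";", "i", "++",
	})) // three-clause
}
```

The extra work is small, but it lives in the compiler rather than in the programmer’s head, which is exactly the trade being made.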

Generics is a hot topic. Go doesn’t provide any. Why?

Generics are all about trade-offs, and I can’t even begin to explain them to you. Your best bet is to search the golang mailing list and read the FAQ.

The end result is that you need to decide what you want. Do you want a lean and mean language or a complex one that does everything under the sun? Why did you make the choices you did?

Just because a feature is in every other language doesn’t mean it needs to be in yours. The while loop is a perfect example of that.

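A switch in Go is so much more than it is in languages like C. Cases don’t have to be constants, they don’t fall through by default, and a switch can even branch on the type of a value. It’s a simple concept made vastly more powerful without adding much cognitive complexity for the programmer.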
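A minimal Go sketch makes the difference concrete. The helper names are hypothetical; each function exercises one thing a C switch cannot do.

```go
package main

import "fmt"

// classify uses a switch with no tag: each case is an arbitrary boolean
// expression, so the switch replaces an if/else-if chain.
func classify(n int) string {
	switch {
	case n < 0:
		return "negative"
	case n == 0:
		return "zero"
	case n%2 == 0:
		return "even"
	default:
		return "odd"
	}
}

// isVowel lists several values in one case. Go cases do not fall
// through to the next case, so no break statements are needed.
func isVowel(r rune) bool {
	switch r {
	case 'a', 'e', 'i', 'o', 'u':
		return true
	}
	return false
}

// describe is a type switch: it branches on the dynamic type held in an
// interface value rather than on the value itself.
func describe(v interface{}) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("int %d", x)
	case string:
		return fmt.Sprintf("string %q", x)
	default:
		return "something else"
	}
}

func main() {
	fmt.Println(classify(-5), classify(0), classify(4), classify(7)) // negative zero even odd
	fmt.Println(isVowel('e'), isVowel('z'))                           // true false
	fmt.Println(describe(42), describe("hi"), describe(3.5))          // int 42 string "hi" something else
}
```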

Type System


Dynamic or static? Strong or weak? Explicit or implicit?

I, personally, prefer a strongly and statically typed language. I like to be sure that what I expect is what I get without having to do anything more.

Can, or should, a type be castable? Can a floating point number be an integer? Could a character be a float? What is a string?
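To make those questions concrete, here is how Go happens to answer a few of them, as a small sketch with hypothetical helper names: numeric types never convert implicitly, a float-to-int conversion truncates toward zero, a character is just an integer, and a string is a sequence of bytes rather than of characters.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// addMixed adds an int and a float64. Go performs no implicit numeric
// conversion, so writing i + f would not compile: one operand has to
// be converted explicitly.
func addMixed(i int, f float64) float64 {
	return float64(i) + f
}

// truncate converts a float64 to an int. The conversion must be written
// out, and it discards the fractional part, rounding toward zero.
func truncate(f float64) int {
	return int(f)
}

// lengths reports a string's size in bytes and in characters (runes).
// A Go string is an immutable sequence of bytes, conventionally UTF-8,
// so the two counts differ for non-ASCII text.
func lengths(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	fmt.Println(addMixed(3, 2.5))              // 5.5
	fmt.Println(truncate(2.5), truncate(-3.9)) // 2 -3
	// A character literal such as 'A' is a rune (an alias for int32),
	// so it converts to a float like any other integer.
	fmt.Println(float64('A'))     // 65
	fmt.Println(lengths("héllo")) // 6 5
}
```

Every one of these behaviours is a design decision; another language could reasonably choose differently on each.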

Bounds checking on arrays? Slices?
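Go answers yes to both: every index is checked at run time, and an out-of-range access panics instead of reading past the end of memory. The hypothetical safeIndex helper below recovers from that panic so the check can be observed. It also shows that a slice is checked against its own length, not against the size of the array behind it.

```go
package main

import "fmt"

// safeIndex reads s[i] and reports whether the access was in bounds.
// (Hypothetical helper for this sketch.) If the index is out of range,
// the runtime panics; the deferred recover turns that into ok == false.
func safeIndex(s []int, i int) (v int, ok bool) {
	defer func() {
		if r := recover(); r != nil {
			ok = false
		}
	}()
	return s[i], true
}

func main() {
	arr := [5]int{10, 20, 30, 40, 50}
	s := arr[1:3] // a slice viewing 20 and 30: length 2, capacity 4

	fmt.Println(safeIndex(s, 1)) // 30 true
	// arr[3] exists in the backing array, but index 2 is past the
	// slice's length, so the access is still rejected.
	fmt.Println(safeIndex(s, 2)) // 0 false
}
```

The cost is a comparison on every index operation; the benefit is that a whole class of memory bugs becomes a clean, reportable error.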

Pointers and addressable variables? Variables passed by reference or by value?
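Go’s own answer is that every argument is passed by value; a pointer gives reference-like behaviour because the value being copied is an address. A minimal sketch with a hypothetical point type:

```go
package main

import "fmt"

// point is a hypothetical type for this sketch.
type point struct{ X, Y int }

// moveByValue receives a copy of the caller's point, so the change is
// lost when the function returns.
func moveByValue(p point) {
	p.X++
}

// moveByPointer receives the address of the caller's point, so the
// change is visible to the caller.
func moveByPointer(p *point) {
	p.X++
}

func main() {
	p := point{1, 1}
	moveByValue(p)
	fmt.Println(p.X) // 1
	// &p is only allowed because p is addressable (it is a variable).
	moveByPointer(&p)
	fmt.Println(p.X) // 2
}
```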

Like I said, something seemingly simple becomes very, very complex quickly.

Summary


I have newfound respect for many languages. I’ll be honest: I don’t like Java. I recognise its strengths, but writing code in the language feels like programming with a donkey. That said, I have to respect the designers. I have to respect what they’ve done and achieved. I have to respect that what was perhaps the best, or only, decision back then can be a bad idea now. Java does a job and millions of programmers use it. You have to respect that.

You will make bad design choices, no matter how clever you think you are. I’ve followed the Go mailing lists since early 2011, when I first became aware of the language, and I’ve seen how the language evolved until it was finalized in its 1.0 release. There have been, and continue to be, many arguments over the design of the language. Language design is HARD and your decisions won’t please everyone.

As I said earlier, language design is a huge, huge topic. Many languages, I think, fall victim to wanting to please everyone and anyone. Their toolboxes are full to overflowing. Other languages draw a line and refuse to cross it, even in the face of overwhelming evidence that crossing it is necessary. Both types of governance piss people off.

You need to choose which camp you want to be in and why.

Next up: The Calc1 Specification