Sunday, 23 April 2017

Been a While

It's been a long while since my last post. That's not to say I have been idle. In fact, it's quite the opposite.

After working 10 years first as a sales representative and another 10 years as a credit professional I decided it was time to finally pursue my true passion: programming. I have been programming since I was in my early teens and it has always been a passion of mine.

At one point I thought maybe I'd like to write games but I soon became disillusioned by the video game industry. I became confused as what I wanted to do and one thing leading to another, my career path headed in another direction.

I was quickly approaching a time in my life where either I commit to my path or pursue my passions. It was time to make a choice. However, being married with kids, making a life change comes with a tremendous amount of pressure and fear. If I fail or make the wrong choice, it's not just me who suffers the consequences.

The opportunity that presented itself also required us moving roughly 300km away. We had been wanting to move back to the Okanagan Valley for quite some time and now we could do it. My children are still young enough that lasting friendships haven't formed yet but still it would be hard on them.

So, I accepted a position at Acro Media where we primarily build custom Drupal e-commerce websites. I, specifically, work on supporting existing sites and maintaining our in-house infrastructure. Luckily, this means I don't exclusively program in PHP, Javascript, HTML and CSS. I also get to work in C++ and Go.

I wasn't sure how I would feel about working in PHP for much of my day. I am not a fan of the language for many reasons but I've been pleasantly surprised to discover that they joy of programming has thus-far eclipsed working in a language I dislike. I think this is mainly due to the fact that my day is quite varied. My job is about solving problems. It's not so much what language I'm using.

So where does that leave the blog?

Well, I hope that it means that'll be writing more again. Compilers are still a passion and I'm working on another project. It's rather ambitious so we'll see what happens there.

On that note, I think I'll call it a day. Until next time!

Sunday, 21 February 2016

The Accumulator Register

At all curious about the accumulator register? I was. Here's a little summation of what I discovered.

Overview

In modern, general purpose CPUs, the accumulator register has lost some of its meaning and much of its use has been relegated to convention rather than necessity.

I got to wondering why the ax/eax/rax register was called the accumulator and why it was used so much since it seemed that any of the general purpose registers would do. Does it just come down to convention or is there more to the story?

All the MIPS documentation I’ve read to date indicates the only register considered the accumulator is a hidden, double-word register accessed via the MFHI and MFLO instructions. This speical register is used when doing multiplication and division.

In developing my own virtual machine, and delving into electrical engineering, I got to be more intimately familiar with this happy little register.

A Little History

If my sources are correct, the first commercially available CPU from Intel was the 4004. It had a single register that was implictly used in the vast majority of its instructions. You can see for yourself on the data sheet (PDF). Can you guess which one?

The 4004 was designed in such a way that the output of the ALU fed directly into this single register to accumulate data. Operations that expect two operands always implicitly use the accumulator as the first operand.

The ADD instruction, for example, takes a single register as an operand. It performs addition on the value stored in the accumulator plus the given register and then stores the result back into the accumulator.

To my knowledge, the 4004 did not pioneer the use of Accumulator but it serves as a nice example.

Example

The design seems simple but consider how calculators or adding machines work. How about an example?

Assume the calculator has just been turned on or a clear instruction has been set so that the accumulator has a value of zero.

The user types in a value then hits the addition key. The accumulator and the value on screen get added together. There’s no operands just a simple ADD instruction. ACC += DISPLAY.

Arithmetic instructions would not need explicit operands at this level of design. It makes the accumulator a very important facet in the design of these machines.

In modern implementations, even if a CPU can use any general purpose register to execute a particular instruction, it may be optimized to perform certain operations in fewer clock cycles when using the accumulator. You’ll need to read documentation to find out.

Legacy

The result is that the importance of this register is historic and thereby remains important to standard convention. We probably no longer need to do it this way but it does have a logical, and historic, significance.

Happy programming!

Saturday, 14 November 2015

Compiler 3 Part 8 - Finale

This will end up being the longest, and final, post in the series. So let's tuck in!

Testing and Everything Else

The scanner and most of the infrastructure stayed exactly as it was.

The parser and AST did see some changes. The parser, in particular, saw several bug fixes and simplification. Some of the error checking that had been done during parsing has been passed on to the semantic analyzer. Similarly, anything related to type checking could be removed.

Also, the AST-related scoping code could be simplified and confined to the parser since it is discarded during semantic analysis.

Detecting “true” and “false” was added for boolean the type.

With those changes out of the way, I can say that if there is one area that Calc has truly lacked, it was in testing.

A compiler project should, in my opinion at least, be concerned with correctness. Correctness and consistency. The IR was one step in shoring up Calc in this area and testing is the other. Calc has always incorporated at least a little testing but it has never been as comprehensive as it should be.

Go provides great testing facilities right out of the box. The cover tool has also proven helpful in sussing out corner cases that need testing.

I have, I think, made large strides have been made in making Calc a much more stable compiler.

That isn’t to say that it’s perfect. I’m sure within 5 minutes of playing with it someone will discover some bugs. Hopefully, those than find them will take the time to submit an issue so I can create tests and fix them.

This is the value of having an open source project.

The Future...

Calc has continued to evolve along with my own skills and I hope that I’ve helped someone, in some way, to further their own growth. Whether it be teaching someone how to, or how NOT to code, the end result remains the same. Someone, somewhere, has benefitted from my efforts.

Now that I’ve had a chance to regroup from the herculean effort of writing and implementing the Calc2 spec and teaching series, I feel like I can continue to produce more content.

I do hope to continue talking about compiler construction. In the future, I hope that by using a smaller format, like this one, that I can produce more posts more quickly. I can certainly say that this third series has been much more enjoyable to write!

I have also been working on another side project to create a virtual machine with a complete software stack. It has an assembler, linker and virtual machine running it’s own bytecode. If there is interest, I’ll write some blog posts about it. You can find the project here.

I have concluded both previous series talking about future plans and I shall do no less today. In series two I had a list of things I wanted to implement and change. Let’s go through the list:

  • growable stack - well, this is null and void. Calc no longer manages it’s own stack.
  • simple garbage collector - I no longer plan on implementing a garbage collector myself. See below.
  • loops - not implemented
  • generated code optimizations - done! 
  • library/packages - not implemented
  • assembly - null and void. I don’t plan on generating assembly.
  • objects/structures - not implemented
Only one of the seven features I wanted to implement was actually completed. Of the remaining six features, three of them no longer make sense. The groundwork for other three has been laid down. In particular, loops and structures only await language design since implementation should be almost trivial.

Incorporating the remaining three features, I now present you with an updated list:

  • #line directives for source-debug mapping - I think this is a very reasonable and easily obtainable goal. It will make debugging Calc with the GNU debugger easier.
  • logical && and || operators - are easily added with minimal work
  • importing libraries - thanks to the x/debug/elf package, importing libraries into code may not be unreasonable to achieve in the near future. The scope of adding imports is likely a series in of itself.
  • garbage collection - Calc 2.x does not need garbage collection. It does not have pointers and it does no memory allocation. However, when it does, I will probably use the Beohm garbage collector rather than attempting to spin my own. Additional info can be found on Wikipedia. This feature will be far in the future so don't expect it any time soon.
  • structs, loops and arrays - with the new code generator, implementing structs and arrays on the back end ought to be trivial. Work on this has actually already begun and you can view some of the changes here. Sorry, the specification is not published publicly right now.
  • stack tracing - there isn’t much nothing holding me back from implementing stack traces on crashes now that the new code generator is in place. Time and effort are all that remains.
And that, as they say, is that! Thank you for taking the time to read through this series! At some point I’d like to put everything into a PDF but that would be a large task since large parts of both series would need to be re-written entirely. In fact, I might even want to start again from scratch.


Until next time!

Friday, 13 November 2015

Compiler 3 Part 7 - Code Generation

Here we are at the final step.

As mentioned in the introduction, I cut out a massive amount of code by offloading much of the work on the IR.

So here is what has changed:

I introduced a small function to map Calc types to C types. This could probably exist in ir.Types, too, but I chose to keep it coupled with the code generator since that’s the only code that uses it.

Any object with an ID has its original named stripped and is replaced by a generic variable name. These names start with an underscore and the lower-case letter “v” followed by the ID of the object.

Each binary operation is assigned to a new variable (a reason for why C99 is required). Don’t worry about this being wasteful. Even if you chose to output a chain of infix arithmetic (1 + 2 + … + N) the underlying assembly instructions usually take no more than two operands anyway. This is why using the above method works so well since it more closely matches the machine code.

Note: I've spent a lot of time comparing the assembly generated from C to see how things work. This is a fun (am I sick?) and interesting exercise to try yourself.
Even if you didn't follow along with the last series, I encourage you to view the previous binary code generation function. 58 lines of confusing mess now pared down to a simple 5 line function. Perhaps more importantly, the generated code is easier to read and follow even though it’s not really intended for visual parsing.

Another important change is using C-style function calling and function declarations. This was made possible by the new IR and type system. With every object being assigned a valid type, we can easily create proper C function prototypes and definitions.

By utilizing the SSA-like code generation and the new IR it also becomes trivial to use C-style calling convention. Types have already been checked, the number of arguments verified, and sub-expressions have been assigned ID’s. Therefore, only raw values and object IDs are passed into the function.

All in all, fairly simple.

Tuesday, 10 November 2015

Compiler 3 Part 6 - Tagging

This step is crucial to Calc’s code generator but may not exist at all in other compilers. Regardless, it makes code generation for Calc dead simple. Before I can get into the process, you do need a little background information.

In the introduction I mentioned something about 3AC and SSA. Make sure you check out the articles but the cut and dry is thus:

In SSA, each calculation is given a unique ID. These ID’s replace variable names and other objects.

So, what does this mean to us?

Consider the following two example infix arithmetic operations:


  1. a + b + c + d 
  2. a * (b + c) / d 

These two operations could be translated into something like the following:

Example 1: a + b + c + d
r1 := a + b
r2: := r1 + c
r3 := r2 + d


Example 1: a * (b + c) / d
r1 := b + c
r2 := a * r1
r3 := r2 / d


As you can see, the result of each binary operation is assigned to a new, unique variable. While verbose, it is much more akin to assembly and easy for C compilers to optimize.

It also ensures that calculations are done in the correct order and removes the necessity of pushing and popping operands to the stack.

Armed with knowledge, we can now get on with what I call tagging.

Tagging is the process by which we attach these unique identifiers to each operation. Variables, parameters, binary operations and unary operations all need to be tagged.

Oddly, perhaps, even if statements need to get tagged and you may wonder why that is. Well, if statements in Calc are not statements. They’re expressions. Like if expressions in functional languages, if expressions in Calc return a value.

As part of escape analysis and type checking, the value of any branch in an if expression must be checked and tagged since it’s result may be used elsewhere in the code.

As you can see, Folding constants can potentially save us time by reducing the number of operations needing an ID.

We can traverse the tree in any manner we chose provided that every ID is unique.

One more stop left to go. Code generation!

Monday, 9 November 2015

Compiler 3 Part 5 - Constant Folding

The only optimization included in series 2 was constant folding. Unlike in the previous series, we’re now much better equipped to handle this optimization.

I feel that I should point out that most C compilers do this step, too. I was somewhat reluctant to keep it in at first. However, my reasoning for keeping it is two-fold: one, I think learning a bit about optimizations on an IR is a worthy lesson; and two, it can help make the next step a little quicker.

Once again, the IR comes to the rescue! In the first step of transforming the AST into the IR we created constants. These were objects representing actual values. This crucial step makes it much easier on us now.

Unary and Binary operations are ideal candidates for folding. We have already done the work of verifying types and correctness so we can ignore all that now. When we check if a value is a certain type we can be confident that information is correct.

Looking at the code you can see that it’s pretty simple. Traverse the tree depth first.  If both operands of a binary object, or the single operand of a unary object, are values of the correct type we can fold them together. We then return the result as a new constant value and replace the previous object with it.

Moving back up the tree we repeat the process until we exhaust all the foldable values.

You can see the value in converting basic literals into constants when building the initial IR tree. It makes folding constants together much, much simpler.

Onward and upward!

Sunday, 8 November 2015

Compiler 3 Part 4 - Type Checking

Whether you chose to produce assembly or an intermediate language, the job of the IR is to take us ever closer to generating actual output. We need to do the crucial step of type checking. Since Calc’s type system is strong and static it is necessary to ensure type safety before moving on.

Each struct in the IR tree has an underlying object. This object provides a Pos() function that maps the remaining constructs back into the original source code. This will be useful to us when we are doing type checking so we can provide well formed error messages.

We must traverse the tree and verify that expected types match. Not all objects currently have a type, like variable declarations using type inference. This means that we don’t actually know the type of the variable when parsing the source language since the type isn't explicitly declared but inferred from the object being assigned to the variable. 

Similarly, binary expressions do not have a defined type when parsing. Your language may allow adding (concatenating) strings, in which case an add binary operation on strings would have the type string. In the case of Calc, logical comparison operations make a binary expression type bool. Don't forget that most languages have different integer types so your binary expressions will need to take on and verify these types.

This requires we traverse the tree depth first to identify the types of certain structures prior to verifying their correctness.

In addition to verifying type correctness, the type-checker also does some additional tests…
  • checks that function parameters match the number of arguments in function calls;
  • verifies that an identifier matches the right kind (variable vs. function names);
  • checks for undeclared variable and function names.

While not true of the current version of Calc, many languages have varying bit sizes of integer types. Somewhere during this process you would will need to verify if a particular value fits within the bounds of a particular type. If the source has a value greater than 255 being stored into an unsigned 8 bit type, this is a problem. When and how you chose to handle this error is up to you.

Type checking may also involve a level of escape analysis. You need to verify that any value being returned anywhere in the tree conforms to the parent type. In imperative languages, this means looking for statements like “return”. For functional languages, you need to search for terminal points like the "then" and "else" clauses in an "if" expression.

Once we are sure that everything is type safe and correct, we can move on.