Sunday, 19 October 2014

Compiler 2 Part 5: Tokens

Finally! We get to look at some code!

In the token package, I had to add a plethora of new operators, some keywords and a few other tokens.


New Tokens


The IDENT token will be used for any type of identifier, be it a variable name or a type name.


The COMMA token is a special case: in Calc 2 it is only used to separate function parameters. When multi-value assignment and multi-variable declarations come into play, it will see more use.


The DECL, IF and VAR tokens are representative of the three keywords by the same names.


The remaining new tokens are all operators: ASSIGN for variable assignment, AND and OR for the corresponding logical operators, and six more for comparisons.


The IsKeyword function was added to complement the other boolean testing functions.
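Here is a minimal sketch of how the new tokens and IsKeyword might be laid out in Go. It assumes a go/token-style design where unexported range markers (opBeg, opEnd, keyBeg, keyEnd, all hypothetical) bracket the operators and keywords, which turns each boolean test into a simple range check. The comparison token names (EQL, NEQ, LST, GTT, LTE, GTE), the placeholder tokens such as ADD and LPAREN, and the method form of IsKeyword are also assumptions, not necessarily what Calc 2 uses.

```go
package main

import "fmt"

// Token identifies a lexical token. IDENT, COMMA, ASSIGN, AND, OR, DECL, IF
// and VAR are the tokens described above; the comparison names, the other
// placeholder tokens and the range markers are hypothetical.
type Token int

const (
	ILLEGAL Token = iota
	EOF
	COMMENT

	IDENT   // variable or type name
	INTEGER // integer literal

	opBeg // hypothetical marker: start of operators
	ADD
	SUB
	MUL
	QUO
	REM
	AND    // &&
	OR     // ||
	ASSIGN // =
	EQL    // ==
	NEQ    // !=
	LST    // <
	GTT    // >
	LTE    // <=
	GTE    // >=
	opEnd  // hypothetical marker: end of operators

	LPAREN
	RPAREN
	COMMA // separates function parameters

	keyBeg // hypothetical marker: start of keywords
	DECL
	IF
	VAR
	keyEnd // hypothetical marker: end of keywords
)

// IsOperator reports whether tok lies strictly between the operator markers.
func (tok Token) IsOperator() bool { return opBeg < tok && tok < opEnd }

// IsKeyword reports whether tok lies strictly between the keyword markers.
func (tok Token) IsKeyword() bool { return keyBeg < tok && tok < keyEnd }

func main() {
	fmt.Println(IF.IsKeyword(), ASSIGN.IsKeyword(), GTE.IsOperator()) // true false true
}
```

With this layout, adding a keyword only means inserting a constant between keyBeg and keyEnd; IsKeyword picks it up automatically.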


Last, and perhaps most importantly, Lookup had a seemingly minor change with far-reaching consequences. It now returns an IDENT token rather than ILLEGAL when the string it is given does not match a known token. This is useful in the scanner when deciding whether a word it has read is really an identifier or is actually a keyword.
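The effect on the scanner can be sketched as follows. This assumes a keyword table keyed by the lowercase spellings "decl", "if" and "var"; the spellings and the map itself are assumptions. Anything not found in the table falls through to IDENT, so the scanner can read a whole word and let Lookup decide what it is.

```go
package main

import "fmt"

type Token int

const (
	ILLEGAL Token = iota
	IDENT
	DECL
	IF
	VAR
)

// keywords maps keyword spellings to their tokens (hypothetical table).
var keywords = map[string]Token{
	"decl": DECL,
	"if":   IF,
	"var":  VAR,
}

// Lookup returns the keyword token for lit, or IDENT when lit is not a
// keyword. Returning IDENT instead of ILLEGAL is the change described above.
func Lookup(lit string) Token {
	if tok, ok := keywords[lit]; ok {
		return tok
	}
	return IDENT
}

func main() {
	fmt.Println(Lookup("if") == IF, Lookup("total") == IDENT) // true true
}
```

Because the fallback is IDENT, the scanner needs no separate check before classifying a word: one call to Lookup gives either a keyword token or IDENT.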


Errors


Errors were moved from the scanner to the token package.


It felt like a better fit here, since tokens are used throughout the entire system, whereas the scanner is only used by the parser. The move also eliminates an additional, essentially unnecessary, import.


The Add function was changed to make it easier to write error messages.
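The post doesn't show the new signature for Add, so the following is only a guess at the idea: a variadic Add that formats its arguments with fmt.Sprint, which lets callers pass values straight through instead of assembling a message first. The Position, Error and ErrorList types here are simplified, hypothetical stand-ins.

```go
package main

import "fmt"

// Position is a simplified, hypothetical source location.
type Position struct {
	Filename     string
	Line, Column int
}

func (p Position) String() string {
	return fmt.Sprintf("%s:%d:%d", p.Filename, p.Line, p.Column)
}

// Error pairs a message with the place it occurred.
type Error struct {
	Pos Position
	Msg string
}

func (e Error) Error() string { return e.Pos.String() + ": " + e.Msg }

// ErrorList collects errors as the scanner and parser find them.
type ErrorList []*Error

// Add builds the message with fmt.Sprint, so callers can pass the pieces of
// a message (strings, tokens, numbers) directly. This variadic signature is
// an assumption, not Calc 2's confirmed API.
func (el *ErrorList) Add(pos Position, args ...interface{}) {
	*el = append(*el, &Error{Pos: pos, Msg: fmt.Sprint(args...)})
}

func main() {
	var errs ErrorList
	errs.Add(Position{Filename: "test.calc", Line: 3, Column: 7}, "undeclared identifier: ", "x")
	fmt.Println(errs[0]) // test.calc:3:7: undeclared identifier: x
}
```

Note that fmt.Sprint only inserts a space between two operands when neither is a string, so the message pieces above are joined exactly as written.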

There are various other refinements to the error handling that I encourage you to look at, but I won't go into them here.


Files


The complete Calc source code is no longer stored in the File structure. It doesn't need to be there: the structure's main purpose is to help with error reporting throughout the package, all the data it needs for that is already present, and the source was never used for anything except its length.


The length of the source code is now tracked by the size field.


The base field is now supplied as a parameter in NewFile. This is because the base value of the file may no longer be one when multiple files are involved.


FileSets


A file set holds the collective file information for a package. A new file set has a base of one for the same reason a file had a base of one in the previous series.


When the Add method is called, a new file is added to the set and the base is increased by the size of the source code added.


The Position of a token is now determined by cycling through the files in the file set. If the position (of type Pos) falls between a file's base and its base + size, then we know the position is in that file. We then call that file's Position method to get the Position information used in error reporting.


NoPos


Every variable has a type. When a variable's type is inferred from an assignment, there is no type name in the source to point to. So, a new ast.Ident is created with the type’s name and given a position of NoPos. The name NoPos describes its purpose better than illegalPos, which was used in Calc 1, so the variable was renamed.
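A sketch of how NoPos might be used when a type is inferred. It assumes NoPos is the zero Pos (safe because file sets start at base one), and it uses a hypothetical Ident type and inferredType helper in place of the real ast.Ident. The IsValid method is likewise an illustrative assumption.

```go
package main

import "fmt"

// Pos is an offset into a file set's combined source.
type Pos int

// NoPos is the zero Pos. File sets start at base 1, so no real token has it.
const NoPos Pos = 0

// IsValid reports whether p refers to an actual source location.
func (p Pos) IsValid() bool { return p != NoPos }

// Ident is a hypothetical stand-in for ast.Ident.
type Ident struct {
	NamePos Pos
	Name    string
}

// inferredType builds the type identifier for a variable whose type comes
// from its initial value rather than a type name written in the source.
func inferredType(name string) *Ident {
	return &Ident{NamePos: NoPos, Name: name}
}

func main() {
	t := inferredType("int")
	fmt.Println(t.Name, t.NamePos.IsValid()) // int false
}
```

Error reporting can then check IsValid before looking up a Position, rather than reporting a location that never existed in the source.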


Summary


That concludes the changes to the token package. Not much work needed to be done here, and I think the changes are pretty straightforward.


Adding new tokens or keywords is incredibly simple. Some of the infrastructure to support things like multiple source files is a bit more difficult but not too bad.

Let's move on to the scanner.