macOS Support

Per the release of Mojave (10.14) I’ve upgraded my development hardware and am focused on cross-platform SDL support for macOS. There are a few known bugs with the current release of SDL2 (2.0.8) that are being addressed. Once those fixes make it into a stable release will push out v4.1.1. This release will also include some minor bug fixes to the Objeck compiler for example allowing “break” statements in nested loops.

Native Support for 64-bit Windows

Over the past couple of weeks I’ve added native support for 64-bit Windows. To do this, the code was overhauled to be more portable and maintainable on all platforms. Code generated by all JIT compilers is now more efficient and there’s better hand-off between interpreted code and machine code.

In addition, the memory managers have been tuned for 32-bit and 64-bit environments respectively. This refactoring will allow Objeck to be ported to other platforms (i.e. Raspberry Pi) more easily.

BrainF**k

Wrote a simple BrainF**k interpreter for a Rosetta Code task and found out that it’s a great way to teach kids about computers and introduce programming concepts. Kids can quickly pick up the language’s 8 commands and teach the machine how to do basic math using counting. Used the language to teach my daughter about multiplication and ASCII encoding.

Implementation Notes

Added implementation notes to GitHub for the general design and implementation of Objeck. Over the years, I’ve learned a lot about how compilers and runtime systems work as such I felt obligated to better document what I’ve learned. One of the most fascinating aspects of Objeck was implementing the debugger which I still need to add design notes for.

Design notes:
General Design
Compiler
Virtual Machine

Tiny Scanner

In this post I’ll outline the process I went through to write the scanner for the “Tiny” language using Objeck. With minor adjustment the algorithm below can be used to build scanners for more complex languages. The code is general enough to be easily rewritten in Python, Java or C++.

Here’s the full code as well as a more complex scanner written for Objeck in C++.

Step 1 – Read the file

reader := FileReader->New(@file);
leaving {
  reader->Close();
};
 
line_num := 0;
while(reader->IsEOF()  true) {
  line := reader->ReadString();
  line_num += 1;
  # do stuff
}

There are two approaches to reading a file. The first is to read the file a line at a time and the second is read the entire file into memory. For this scanner we read the file a line at a time. This approach allows us to remove newlines during the process using Objeck’s “FileReader->ReadString()” method.

Step 2 – Ignore Whitespace

# skip whitespace
while((line->Get(i) = ' ' | line->Get(i) = '\t') & i->Size()) {
  i += 1;
};

One of the main major functions of a scanner is to remove whitespace. Most languages encourage programs to indent their programs for readability. Indention characters should be ignored such the same tokens are parsed regardless of whitespace.

Step 3 – Identify what needs to be parsed
At this point we have a good idea what type of token we’ll be dealing with. If it starts with a character it’s an identifier or keyword. If it starts with a numeric it’s a number, if it’s an open quote a string. Otherwise, it’s a symbolic token such as equals or less than.

if(line->Get(i)->IsChar() | line->Get(i) = '_') {
  # do stuff
} else if(line->Get(i)->IsDigit()) {
  # do stuff
} else if(line->Get(i) = '""') {
  # do stuff
} other {
  # symbol based token
}

Step 4 – Parse the token
When parsing words we need to determine if they are keywords or identifiers. The easiest way to do this is to load all of your identifiers into a dictionary (i.e. a map or a hash table) and have the values associated with token types. When a word is found look it up and if it’s in the dictionary create a new token of the identified type.

token := reserved->Find(string)->As(IntHolder);
if(token  Nil) {
  type := token->Get()->As(Token->Type);
  @tokens->AddBack(Token->New(line_num, type, string));
}
else {
  @tokens->AddBack(Token->New(line_num, Token->Type->VAR, string));
};

Simple character matching can be used for symbolic tokens. Using a “select” statement to match the first character can be a quick way to parse these types of tokens. You should also correctly handle cases where the identified symbol is present in more than one token type for example “<" and "<=".

label ';': {
  @tokens->AddBack(Token->New(line_num, Token->Type->SEMI, "semi-colon"));
}
 
label '<': {
  if(i + 1 Size() & line->Get(i + 1) = '=') {
    @tokens->AddBack(Token->New(line_num, Token->Type->LESS_EQL, "equal"));
    i += 1;
  }
  else {
    @tokens->AddBack(Token->New(line_num, Token->Type->LESS, "less"));
  };
}

Step 5 – Storing the tokens
A scanner normally works hand-in-hand with a parser to build an AST (abstract syntax tree). In doing so, one may first scan all of the tokens in the source code and then proceed to building the parse tree or scan for tokens incrementally as while the parse tree is being built. For a small language like “Tiny” I choose to parse all of the tokens ahead of time before building the tree. For larger languages it’s more efficient to scan for tokens as you build your AST.

Tiny

Was in the Bahamas on vacation I spent part of my time working on the “Tiny” language Rosetta Code challenge and making improves (invaliding my submission). While drinking Sands and Guinness Export Stout I wrote the following code.  In doing so, I plan to post a few articles on writing a compiler/virtual machine. Once over the learning curve anyone can do it and it provide a great outlet for creativity.