As some of you might have noticed, I have become somewhat of a Smalltalk weenie recently. This was a long time coming, but basically a reaction to many years of tedious hacking in C. Now don’t get me wrong, C is a great language, I still love it, but I have been using it in entirely inappropriate places.
After reading up on Smalltalk a few months ago, I was entranced, and quickly decided that I just had to build my own Smalltalk system. Since I didn’t want it to be tied to any sort of platform (i.e. Java), I chose to write it in C (C99 to be specific).
I started out with the compiler. In order to learn more about compiler theory, I wrote a Smalltalk source-to-bytecode compiler from the ground up. The compiler translates source code into a compact bytecode representation consisting of about 30 different bytecodes. Currently, all Smalltalk-80 syntax is supported (except for message cascades). Of course, the compiler needs to manipulate on Smalltalk objects, so I also designed an object memory model (basically how objects are stored in physical memory). All objects have the same header, consisting of book-keeping, hash, and class fields. I intend to reduce the number of header fields to two though, since the hash field is often unnecessary. For example, instances of String will have a custom hash function, and will never need to access the hash field.
As with most Smalltalk VMs (and possibly Python/Ruby), integers are represented using tagged pointers. This kind of pointer has a 1-bit tag in the low-order position. If the bit is 1, then the rest of the pointer contains a 31-bit integer, otherwise, the pointer simply points to a heap-allocated object. This arrangement increases performance greatly, as having to heap-allocate integer objects would absolutely kill performance. We can get away with this, since on most architectures, pointers are aligned on 4 byte boundaries, thus leaving the low-order 2 bits of each pointer unused.
Just this evening, the VM passed a critical milestone, that of being able to evaluate the expression “3 + 4″. This sounds trivial, but a whole of lot nuts-and-bolts have to be in place to make this work. A few hours after the “3+4″ test, the VM was capable of executing unary methods, as well as handling local variables and activation records.
Below is some sample Smalltalk code which shows what the VM can do currently, basically some arithmetic, local variable manipulation, and a method call to “increment”. Note that the method doIt belongs to the UndefinedObject class, and increment to SmallInteger.