I am having great fun working on my Smalltalk VM. I spent the last few weeks mainly on implementing a compacting garbage collector. There have been some nice additions to the class library, including the Set, Dictionary, and System classes.What’s been particularly pleasing to me personally is that certain benchmarks have come out very fast, faster than Python or Ruby.
fibonacci self <= 2 ifTrue: [^ 1] ifFalse: [^ (self - 2) fibonacci + (self - 1) fibonacci].
The naive recursive fibonacci function isn’t a very exemplary benchmark. Nevertheless, I am advertising the results here because I think its a good start. In my opinion, the fibonacci function does at least give a good benchmark for method invocation performance (i.e changing the current context or activation record of an interpreter). The benchmarks below are all based on calculating the 32nd fibonacci number. I assume that the Python and Ruby binaries are compiled using the optimal CFLAGS. If that’s not the case, then I guess Ubuntu has a bit of a problem.
Squeak Smalltalk: 0.427s
The winner, Squeak. I can’t reason about why its faster than Panda. I suspect squeak’s context management scheme is slightly more optimized at the moment.
Panda Smalltalk: 0.67s
Yes, that’s my VM! coming a respectable second. As I stated above, there’s still work to be done on context management. The executable was compiled using “-g -O3 -fomit-frame-pointer”.
Python has a bytecode interpreter and uses the C stack to store activation records. So its context management is probably an order of magnitude faster than Panda. I suspect the python is slower as it allocates an actual object for each integer. In Panda, integers are just stuffed into pointers, which are then tagged in the two low-order bits so that the interpreter can differentiate between normal objects and integers.
No excuse. This gives dynamic languages a bad name. I suspect the main performance killer here is that Ruby’s execution model is based upon an abstract syntax tree rather than bytecodes. The problem with AST’s is that the nodes are spread throughout the heap, reducing locality. A lot of indirect memory references are required during execution, which slows things down. Hopefully Rubinius or JRuby will improve matters since they include bytecode interpreters.