I've been meaning to write about Ruby performance for a while, and a recent blog post by an old friend and colleague, got me off my proverbial.
The old friend is John Duimovich, who wrote about the relative performance of C++ and Smalltalk and what that could mean for ruby.
John's message is important for those who bemoan the performance of Ruby, and I plan to expand on that message in this and later posts to this blog, but first a few words about Mr. Duimovich.
Consider the source
In his day job, paraphrasing his self description John "works for IBM on Java virtual machines and is the lead on the Eclipse tools project management commitee."
But some of my readers might be interested in John's background. John was for a very long time, the lead of the Smalltalk and Java virtual machine team at Object Technology International (OTI) dating from before the time it was acquired by IBM. Among other things John was responsible for the development of embedded Smalltalk virtual machines from OTI, which spawned the VM used in Smalltalk/V Mac, IBM Smalltalk (used in IBM/VisualAge), the 'Universal' Virtual machine which implemented Java on an extended Smalltalk VM, and which was used for the early releases of IBM/VisualAge for Java, and the J9 Java VM. A good deal of what I know about implementing VMs comes from working, lunching, and bar-hopping with John.
John had become OTI's Chief Technology Officer before OTI got assimilated into the IBMborg.
John is a brilliant guy, with a great sense of humor. Two characteristics which seem to have been requirements for a job at OTI. I'm still not sure how I ended up spending several years there.
Dynamically Typed Doesn't Need to Mean Slow
I encourage you to read John's blog post yourself, but to summarize; John ran across another blog item which gave a benchmark written in C++, Ruby and Python. The C++ version runs in under 1/10 of the time needed for either the Ruby or Python versions.
John duplicated the results on his machine, then decided to port the Ruby version of the benchmark to Smalltalk. He then ran it using VisualAge Smalltalk.
And the Smalltalk version runs in the same time as the optimized C++ version!
How can this be?
The Value of Pole Vaulting
Languages like Smalltalk and Self started from the position that a clean object-oriented language was more important than one which makes compromises to make efficient implementation obvious.
Early implementations of Smalltalk used obvious implementations of some features, which were 'fast enough' in many cases, but by no means fast. Two areas which cried for improvement were method dispatch and garbage colection. The obvious techniques were walking up the class hierarchy each time a method was needed, and relatively easy to implement GC techniques like reference counting, and mark-and-sweep. Reference counting has a fairly high cost for each change of an object reference, and also has the drawback of leaking memory because cyclical references lead to garbage which is uncollectable. Mark and sweep delays the overhead until storage is exhausted, but leads to more perceptible pauses when the application gets paused so that the housemaid cleans the room.
Encountering (or having set) this high bar, various implementors of these languages found very clever techniques for both problems. Dave Ungar made measurements of the lifespans of Smalltalk objects and observed that most objects died very shortly after being instantiated, with few living a long life. This led to the invention of generational GC techniques, which quickly dispatched young dead objects, which are the vast majority.
Method dispatching techniques of efficiently implemented dynamically typed languages tend to use clever caching algorithms which can get to what is probably the right method quickly, with a quick test to make sure that the right method was found.
These dispatching techniques turn out to be faster than the virtual function pointer dispatching made possible by strongly-typed languages like C++. In fact, I've heard that more modern implementations of these languages have actually used a more dynamic method dispatch mechanism internally in order to increase performance.
Anoher implementation choice is how to represent executable code. Most efficient implementations use a combination of byte-code representation, and some form of just-in-time translation of byte-codes to machine code. Just how to divide execution between byte-code and machine code is a complicated decision. Back when DIgitalk first produced a version of Smalltalk/V for OS/2, they decided to eschew byte-codes entirely and generate 80286 machine code. The reason was that they were tired of hearing complaints about Smalltalk being an 'interpreted' language.
The surprising result of this experiment was that the resulting implementation was slower. Machine code was bigger, so it took longer to load, and caused more swapping. These costs were paid whether the code in question was executed once or a million times.
Again caching was the basis for getting the best of both worlds. Peter Deutsch of Xerox, later ParcPlace, had introduced the notion of translating byte-codes to machine code into a cache during execution, David Ungar's implementation of Self introduced the notion of using light-weight profiling techniques to avoid the overhhead of translating byte-coded methods which were infrequently executed.
Another area which posed difficulties in implementation was control flow. Smalltalk-80 defines all control flow as methods. Even primitive control flow constructs such as if (ifTrue: in Smalltalk) were implemented as methods on Boolean classes. This is one area where Smalltalk implementations cheated compiling such methods in to testing and branching byte-codes, and requiring the receivers to be boolean instances.
Self eschewed this early optimization. Ungar's team instead relied on run-time type inference in order to dynamicaly generate code which achieved the same or better performance when such a message was sent to a boolean without restricting other cases.
The Current State of Ruby Implementation
Ruby performance today is surprisingly acceptable for a wide range of uses.
This is despite the fact that the implementation is relatively straightforward, almost to the point of being naive. In the current standard implementation of Ruby:
- Method dispatch is done by walking up the 'class' hierarchy looking for methods in a hash table in each class/module.
- Garbage Collection is done by a simple mark and sweep algorithm.
- Executable code is represented by a parse tree which is executed by traversal.
This is not meant to understate the achievements of Matz and the ruby developers. Ruby as it is definitely usable for many production uses.
The point is how much better Ruby performance can get as the implementation matures. A virtual machine, with byte-codes, and better GC is on the roadmap. Ruby virtual machines such as YARV, and JRuby are showing glimmers of the value of implementing Ruby as a virtual machine. If Ruby continues to grow in acceptance, I've no doubt that other clever implementers with experience in efficently implementing dynamically typed languages will provide more implementations.
My prediction is that the future will be so bright that we're going to have to wear (ruby colored) shades!
Trackbacks
Use the following link to trackback from your own site:
http://talklikeaduck.denhaven2.com/trackbacks?article_id=39





To Bob Bane,
No Smalltalk was for a time being used for lots of enterprise level programming. What took the wind out of its sails was the huge groundswell of Java popularity, not a lack of the ability to call C, which is something which both languages have.
To Alan Friendly,
Yes Alan and Peter were both early contributors to Smalltalk implementation. And the implementation techniques were improved over time by many folks including (off the top of my head) Dave Ungar at UC Berkeley, George Bosworth and his team at Digitalk, and John Duimovich and the OTI VM team.
This is now a very old post, but I want to respond to wog regarding C++ bigotry.
I think most folks who know me would agree that I by no means a C++ bigot. It is possible to avoid virtual function dispatch in C++ and at times get increased performance. But that comes at a cost.
C++ give you a lot of choice, but that isn’t always what developers want. I think languages like python are easy to develop in, but currently at a cost of performance.