My former OTI colleague Andrew “Roo” Low just wrote an interesting article about the trade-offs involved in designing a VM (and software in general) to exploit 64-bit machine architectures. Although Roo discusses JVM implementation, he also grounds it in his experience developing the IBM/OTI Smalltalk VM, and the lessons would be equally applicable to, say, a Ruby implementation.

Charlie Savage just posted a long article about how he reduced the rendering time of his Rails app by an order of magnitude.
The article is full of advice on what to do and not to do when coding Rails apps. The important message, though, is the approach: performance tuning by profiling.
Recently on gluttonous, Kevin Clark announced that Powerset is going to launch their front-end on Ruby. It seems that they were already predisposed to a major Ruby commitment, having built a sizable Ruby talent pool for their internal applications.
Prior to making the final decision to go all out with Ruby for their front-end launch, they did some due diligence, which included investigating the facts behind the recent furore caused by an interview with one of Twitter’s developers. So they went to Twitter’s lead developer, Blaine Cook, to get the straight dope.
Quoting Kevin:
The simple fact is that Ruby wasn’t the source of Twitter’s woes. As it often happens with rapidly growing sites, they ran into architectural problems. Some design decisions don’t hurt until they reach a massive scale and at that point you have to rethink your approach. In an email he writes:
For us, it’s really about scaling horizontally – to that end, Rails and Ruby haven’t been stumbling blocks, compared to any other language or framework. The performance boosts associated with a “faster” language would give us a 10-20% improvement, but thanks to architectural changes that Ruby and Rails happily accommodated, Twitter is 10000% faster than it was in January
That last sounds quite true and corresponds to my past experience with Smalltalk. I used to tell IBM customers that almost all performance comes not from the language but from application design, and that a dynamic language, by letting the application be developed in what’s now called an agile process, allows major performance gains through refining, refactoring, and retuning the system as both business and performance requirements are discovered.
That’s at least as true of Ruby as it was of Smalltalk 15-20 years ago.
Hmmmmm
I’ve been thinking a bit about the ramifications of the C++ vs. Python vs. Ruby benchmarking FUD I wrote about in an earlier post.
I’ve been looking at this from the point of view of how Ruby implements Arrays, digging deeply into array.c, but I just stepped back and noticed something which might or might not be important.
Besides using an Array rather than a linked list, the Baus Ruby benchmark doesn’t even do the same thing as the C++ one. In C++ the list is being used as a stack: items are added to the head of the list and removed from the head.
In the case of the Ruby benchmark, elements are inserted at the front of the Array using insert(0, obj), but removed using pop, which actually removes the last element. So instead of a stack we have a queue.
The Python code has a variation on the same problem: it uses append, which adds to the end of the list, but del(0), which removes the first element.
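To make the mismatch concrete, here is a minimal sketch of the two access patterns in Ruby. The helper names are made up for illustration and are not taken from the benchmark itself.

```ruby
# What the Ruby benchmark does: insert at the front, remove from the back.
def drain_as_benchmark(items)
  list = []
  items.each { |x| list.insert(0, x) }
  out = []
  out << list.pop until list.empty?
  out
end

# What the C++ benchmark does: add and remove at the same end.
def drain_as_stack(items)
  list = []
  items.each { |x| list.unshift(x) }
  out = []
  out << list.shift until list.empty?
  out
end

p drain_as_benchmark([1, 2, 3])  # oldest element comes out first (a queue)
p drain_as_stack([1, 2, 3])      # newest element comes out first (a stack)
```

The first helper hands elements back in arrival order, the second in reverse arrival order, so the two benchmarks are not exercising the same data structure.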
Now I know that in benchmarking only performance matters, so I guess the old adage Make it work right before you make it work fast doesn’t necessarily apply.
Having said all that, another aspect of this is that the benchmarks take a rather naive approach in the use of the classes. For example, using Array#insert in Ruby is overkill for adding to the beginning or end of an array. More targeted methods such as push and unshift exist, which are at least marginally faster because they have fewer cases to consider.
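A rough way to check this on your own machine is sketched below. The helper names and SAMPLE_SIZE are arbitrary choices of mine, and the relative timings will depend on the Ruby version and implementation, so no particular result is claimed here.

```ruby
require 'benchmark'

SAMPLE_SIZE = 20_000  # arbitrary, only large enough to make timings visible

def front_with_insert(n)
  list = []
  n.times { |i| list.insert(0, i) }   # general-purpose: the index is an argument
  list
end

def front_with_unshift(n)
  list = []
  n.times { |i| list.unshift(i) }     # dedicated: always the front
  list
end

def back_with_insert(n)
  list = []
  n.times { |i| list.insert(-1, i) }  # -1 means "after the last element"
  list
end

def back_with_push(n)
  list = []
  n.times { |i| list.push(i) }        # dedicated: always the end
  list
end

Benchmark.bm(14) do |bm|
  bm.report('insert(0, x)')  { front_with_insert(SAMPLE_SIZE) }
  bm.report('unshift(x)')    { front_with_unshift(SAMPLE_SIZE) }
  bm.report('insert(-1, x)') { back_with_insert(SAMPLE_SIZE) }
  bm.report('push(x)')       { back_with_push(SAMPLE_SIZE) }
end
```

Each pair builds the same array, so any difference in the report comes from the method used and not from the work done.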
There’s no doubt that Ruby implementation technology needs to mature. Ruby is one of the most dynamic languages I’ve played with. Instance variable access is one challenging area. Unlike in Smalltalk, where instance variables are declared and instance variable references can be bound to fixed offsets in compiled methods, Ruby instance variables are acquired dynamically and need more sophisticated run-time access. This is similar to the challenges which the designers of languages like Self posed for themselves and addressed, leading to impressive run-time performance achievements.
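A small sketch of what "acquired dynamically" means in practice. The Point class and build_points helper are hypothetical, invented only for this illustration.

```ruby
class Point
  def initialize(x)
    @x = x
  end

  # This instance variable exists only on objects that have called label!
  def label!(text)
    @label = text
  end
end

def build_points
  a = Point.new(1)
  b = Point.new(2)
  b.label!("origin")
  b.instance_variable_set(:@color, :red)  # added from outside the class entirely
  [a, b]
end

a, b = build_points
p a.instance_variables  # a has only @x
p b.instance_variables  # b has picked up two more along the way
```

Two instances of the same class end up with different sets of instance variables, so there is no single per-class layout that a compiler could bind offsets against ahead of time.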
Subjectively, Ruby performance is remarkably good, but it can get better, and I’m sure that it will.
The topic of defining a new Ruby class whose instances would, like false and nil, be treated as untrue in boolean tests just came up on ruby-talk.
This has come up before, and it turns out that in Ruby being untrue is reserved to these two specific instances.
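A quick sketch of what that looks like from the Ruby side. The Undecided class and the truthy? helper are hypothetical names used only here.

```ruby
# A class that tries its best to look like nil.
class Undecided
  def nil?
    true
  end
end

# Reports which branch a conditional would take for obj.
def truthy?(obj)
  obj ? true : false
end

p truthy?(nil)            # nil is untrue
p truthy?(false)          # false is untrue
p truthy?(Undecided.new)  # still takes the true branch despite nil? returning true
p truthy?(0)              # unlike C, zero is true
p truthy?("")             # so is the empty string
```

Overriding nil? changes what the object answers when asked, but the conditional never asks.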
The boolean test is pretty deeply engrained into the implementation of ruby. The actual test seems to be defined in the RTEST macro in ruby.h:
    #define Qnil ((VALUE)4)
    #define RTEST(v) (((VALUE)(v) & ~Qnil) != 0)
Which means that any object whose reference VALUE has any bit set other than the 3rd LSB is true.
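The bit arithmetic can be replayed in Ruby itself. This is only a simulation of the macro on plain integers. Q_FALSE = 0 and Q_TRUE = 2 are the values I recall from the same header and are not shown in the excerpt above, and encode_fixnum assumes the usual "shift left and set the low bit" tagging of small integers.

```ruby
Q_FALSE = 0  # assumed, not in the excerpt
Q_TRUE  = 2  # assumed, not in the excerpt
Q_NIL   = 4  # from the #define above

# Mirrors RTEST: clear the Qnil bit, then see whether anything is left.
def rtest(value_bits)
  (value_bits & ~Q_NIL) != 0
end

# Assumed small-integer tagging: n is stored as (n << 1) | 1.
def encode_fixnum(n)
  (n << 1) | 1
end

p rtest(Q_FALSE)           # no bits set at all
p rtest(Q_NIL)             # only the Qnil bit set
p rtest(Q_TRUE)            # some other bit set
p rtest(encode_fixnum(0))  # the integer 0 still has its tag bit set
```

Only the two bit patterns 0 and 4 survive the mask as zero, which is why no other object can be made untrue.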
In ruby, the control flow statements like ‘if’ aren’t messages, but are ‘compiled’ into a direct test and conditional branch.
Even Smalltalk, which defined even if/then/else as a message, e.g.
    booleanValue ifTrue: [Transcript show: 'true'] ifFalse: [Transcript show: 'false']
tended to cheat in the implementation…
In most Smalltalk implementations #ifTrue:ifFalse: and its ilk are never sent, but are, like in ruby, compiled into test and branch code. Some implementations might have had a fallback if booleanValue wasn’t actually a boolean, but IIRC most would trigger a MustBeBoolean exception.
By the way, those […] are the Smalltalk analog to ruby’s blocks. In Smalltalk, blocks can be used as the value of any argument to a method, and a method could take more than one block argument.
But blocks are also another area where Smalltalk implementations tend to cheat a bit.
Smalltalk maintains the fiction that ifTrue:ifFalse: is really a message, and the methods in True and False are there to see:
in the True class
    ifTrue: trueBlock ifFalse: falseBlock
        ^trueBlock value
and in the False class
    ifTrue: trueBlock ifFalse: falseBlock
        ^falseBlock value
Sending the message value to a Smalltalk block is analogous to sending call to a Proc in ruby, although the actual message varies with the arity of the block. In the case of ifTrue:ifFalse:, the block arguments don’t really become block objects; they get compiled as in-line code.
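For readers who don't speak Smalltalk, the same pair of methods can be sketched in Ruby. The method name if_true_if_false and the describe helper are my own inventions, and reopening TrueClass and FalseClass is done here purely for illustration.

```ruby
class TrueClass
  # The true object always evaluates the first block.
  def if_true_if_false(true_block, false_block)
    true_block.call
  end
end

class FalseClass
  # The false object always evaluates the second block.
  def if_true_if_false(true_block, false_block)
    false_block.call
  end
end

def describe(flag)
  flag.if_true_if_false(-> { "true" }, -> { "false" })
end

puts describe(1 < 2)
puts describe(1 > 2)
```

There is no if anywhere in this code. The choice of branch falls out of ordinary method dispatch on the receiver, which is the model Smalltalk presents to the programmer.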
A lot of Smalltalkers ran across a head-scratcher when they got to the point of looking at the implementation of the value method in block, which looks something like this:
    value
        "return the result of evaluating the receiver"
        ^self value
This certainly looks like it should be an infinite loop, but it isn’t. The trick is that in almost all circumstances, the Smalltalk compiler compiles sending value to a block into special bytecodes. The only reason that the value method in block needs to be there is to handle cases like:
    aBlock perform: #value
where perform: is the analog to ruby’s send.
Smalltalk VM implementors like to say that it’s okay to cheat as long as they don’t get caught. MustBeBoolean is one area where they do get caught.
Pushing the Envelope
It’s quite a daunting task to fully implement a computation model where everything is a message and get reasonable performance. Smalltalk pushed this model quite far, but bowed to practicality in a few cases.
Ruby, although it’s more dynamic than Smalltalk in a lot of ways, makes more concessions in its current implementation.
Dave Ungar’s Self language, in the spirit of Nigel Tufnel, turned the knob up to eleven, and eschewed the pre-optimization of even if tests. Instead, the Self implementation developed and used sophisticated run-time type inferencing and analysis to generate optimized code at runtime by detecting the common case of a boolean receiver and converting the message send to tests and branches, while still supporting the less common cases.
In common with Ruby, Self had a more dynamic object model than Smalltalk, based on delegation rather than inheritance, something which engendered roaring debates in the early OOPSLA community over how delegation and inheritance are related. The status of that discussion ca. 1989 is captured in “The Treaty of Orlando,” which documented the observation that the differences were really a matter of perception and viewpoint.
The real contribution of Self was setting the bar high in terms of the difficulty of implementing a simple and dynamic specification, and then jumping over it. The Self team later applied the lessons they had learned to the implementation of Strongtalk, a Smalltalk VM which applied the techniques of the Self implementation to make Smalltalk both more dynamic and better performing.
Many of those lessons should be applicable to implementations of Ruby. I hope to see that unfold over time.
I've been meaning to write about Ruby performance for a while, and a recent blog post by an old friend and colleague got me off my proverbial.
The old friend is John Duimovich, who wrote about the relative performance of C++ and Smalltalk and what that could mean for ruby.
John's message is important for those who bemoan the performance of Ruby, and I plan to expand on that message in this and later posts to this blog, but first a few words about Mr. Duimovich.
Consider the source
In his day job, paraphrasing his self-description, John "works for IBM on Java virtual machines and is the lead on the Eclipse tools project management committee."
But some of my readers might be interested in John's background. John was, for a very long time, the lead of the Smalltalk and Java virtual machine team at Object Technology International (OTI), dating from before the time it was acquired by IBM. Among other things, John was responsible for the development of embedded Smalltalk virtual machines from OTI, which spawned the VM used in Smalltalk/V Mac, IBM Smalltalk (used in IBM/VisualAge), the 'Universal' Virtual machine which implemented Java on an extended Smalltalk VM, and which was used for the early releases of IBM/VisualAge for Java, and the J9 Java VM. A good deal of what I know about implementing VMs comes from working, lunching, and bar-hopping with John.
John had become OTI's Chief Technology Officer before OTI got assimilated into the IBMborg.
John is a brilliant guy, with a great sense of humor. Two characteristics which seem to have been requirements for a job at OTI. I'm still not sure how I ended up spending several years there.
Dynamically Typed Doesn't Need to Mean Slow
I encourage you to read John's blog post yourself, but to summarize: John ran across another blog item which gave a benchmark written in C++, Ruby and Python. The C++ version runs in under 1/10 of the time needed for either the Ruby or Python versions.
John duplicated the results on his machine, then decided to port the Ruby version of the benchmark to Smalltalk. He then ran it using VisualAge Smalltalk.
And the Smalltalk version runs in the same time as the optimized C++ version!
How can this be?
The Value of Pole Vaulting
Languages like Smalltalk and Self started from the position that a clean object-oriented language was more important than one which makes compromises to make efficient implementation obvious.
Early implementations of Smalltalk used obvious implementations of some features, which were 'fast enough' in many cases, but by no means fast. Two areas which cried out for improvement were method dispatch and garbage collection. The obvious techniques were walking up the class hierarchy each time a method was needed, and relatively easy-to-implement GC techniques like reference counting and mark-and-sweep. Reference counting has a fairly high cost for each change of an object reference, and also has the drawback of leaking memory, because cyclical references lead to garbage which is uncollectable. Mark-and-sweep delays the overhead until storage is exhausted, but leads to more perceptible pauses while the application is stopped so that the housemaid can clean the room.
Encountering (or having set) this high bar, various implementors of these languages found very clever techniques for both problems. Dave Ungar made measurements of the lifespans of Smalltalk objects and observed that most objects died very shortly after being instantiated, with few living a long life. This led to the invention of generational GC techniques, which quickly dispatched young dead objects, which are the vast majority.
Method dispatching techniques of efficiently implemented dynamically typed languages tend to use clever caching algorithms which can get to what is probably the right method quickly, with a quick test to make sure that the right method was found.
These dispatching techniques turn out to be faster than the virtual function pointer dispatching made possible by statically typed languages like C++. In fact, I've heard that more modern implementations of these languages have actually used a more dynamic method dispatch mechanism internally in order to increase performance.
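The caching idea can be sketched as a toy in Ruby itself. Everything here (full_lookup, CallSite, demo_call_site) is hypothetical code written for this post and is not how any particular VM does it. It ignores singleton methods and never invalidates the cache when a method is redefined, both of which a real implementation has to handle.

```ruby
# Slow path: walk the ancestor chain, checking each class or module in turn.
def full_lookup(klass, selector)
  owner = klass.ancestors.find { |mod| mod.instance_methods(false).include?(selector) }
  owner && owner.instance_method(selector)
end

# One cache entry per call site: remember the last receiver class and its method.
class CallSite
  attr_reader :hits, :misses

  def initialize(selector)
    @selector = selector
    @cached_class = nil
    @cached_method = nil
    @hits = 0
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    if klass.equal?(@cached_class)   # the quick test that the guess is right
      @hits += 1
    else                             # wrong guess: do the full walk and re-cache
      @misses += 1
      @cached_class = klass
      @cached_method = full_lookup(klass, @selector)
    end
    raise NoMethodError, "no method #{@selector} for #{klass}" unless @cached_method
    @cached_method.bind(receiver).call(*args)
  end
end

def demo_call_site
  site = CallSite.new(:to_s)
  results = [1, 2, "x"].map { |obj| site.call(obj) }
  [results, site.hits, site.misses]
end

p demo_call_site
```

The second integer reuses the cached method after a single identity comparison, while the string forces a fresh walk. Real call sites tend to see the same receiver class over and over, which is what makes the guess pay off.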
Another implementation choice is how to represent executable code. Most efficient implementations use a combination of byte-code representation and some form of just-in-time translation of byte-codes to machine code. Just how to divide execution between byte-code and machine code is a complicated decision. Back when Digitalk first produced a version of Smalltalk/V for OS/2, they decided to eschew byte-codes entirely and generate 80286 machine code. The reason was that they were tired of hearing complaints about Smalltalk being an 'interpreted' language.
The surprising result of this experiment was that the resulting implementation was slower. Machine code was bigger, so it took longer to load, and caused more swapping. These costs were paid whether the code in question was executed once or a million times.
Again, caching was the basis for getting the best of both worlds. Peter Deutsch of Xerox, later ParcPlace, had introduced the notion of translating byte-codes to machine code into a cache during execution. David Ungar's implementation of Self introduced the notion of using light-weight profiling techniques to avoid the overhead of translating byte-coded methods which were infrequently executed.
Another area which posed difficulties in implementation was control flow. Smalltalk-80 defines all control flow as methods. Even primitive control flow constructs such as if (ifTrue: in Smalltalk) were implemented as methods on Boolean classes. This is one area where Smalltalk implementations cheated, compiling such methods into testing and branching byte-codes, and requiring the receivers to be boolean instances.
Self eschewed this early optimization. Ungar's team instead relied on run-time type inference in order to dynamically generate code which achieved the same or better performance when such a message was sent to a boolean without restricting other cases.
The Current State of Ruby Implementation
Ruby performance today is surprisingly acceptable for a wide range of uses.
This is despite the fact that the implementation is relatively straightforward, almost to the point of being naive. In the current standard implementation of Ruby:
- Method dispatch is done by walking up the 'class' hierarchy looking for methods in a hash table in each class/module.
- Garbage Collection is done by a simple mark and sweep algorithm.
- Executable code is represented by a parse tree which is executed by traversal.
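To give a feel for the last item, here is a minimal sketch of "execution by traversal". The node format (nested arrays tagged with a symbol) and the evaluate function are invented for this example and bear no relation to the node structures the real interpreter uses.

```ruby
# Evaluates a tiny expression tree by recursively visiting its nodes.
def evaluate(node, env = {})
  type, *args = node
  case type
  when :lit then args[0]
  when :var then env.fetch(args[0])
  when :add then evaluate(args[0], env) + evaluate(args[1], env)
  when :mul then evaluate(args[0], env) * evaluate(args[1], env)
  when :if
    evaluate(args[0], env) ? evaluate(args[1], env) : evaluate(args[2], env)
  else
    raise ArgumentError, "unknown node type: #{type.inspect}"
  end
end

# Roughly: if flag then 1 + 2 * 3 else 0 end
TREE = [:if, [:var, :flag],
        [:add, [:lit, 1], [:mul, [:lit, 2], [:lit, 3]]],
        [:lit, 0]]

p evaluate(TREE, { flag: true })
p evaluate(TREE, { flag: false })
```

Every evaluation re-walks the tree and re-dispatches on the node type, which is simple to write and debug but leaves a lot of room for a byte-code compiler to do better.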
This is not meant to understate the achievements of Matz and the ruby developers. Ruby, as it is, is definitely usable for many production uses.
The point is how much better Ruby performance can get as the implementation matures. A virtual machine, with byte-codes and better GC, is on the roadmap. Ruby virtual machines such as YARV and JRuby are showing glimmers of the value of implementing Ruby as a virtual machine. If Ruby continues to grow in acceptance, I've no doubt that other clever implementers with experience in efficiently implementing dynamically typed languages will provide more implementations.
My prediction is that the future will be so bright that we're going to have to wear (ruby colored) shades!




