require 'rubygems'
require 'sinatra'

def be
  "do, be, do, be, do"
end

get '/strangers' do
  be do
    be do
    end
  end
end
I just ran across a reference to this article by Alex Sandler, on how C++ implements "object-oriented" concepts.
It's a more detailed, and probably more recently researched, treatment of a topic I briefly covered in my RubyConf 2008 talk. If you understand this stuff, you will appreciate why, in a language like C++ that is statically typed at compile time but weakly typed at run time, it is crucial to avoid tricking the compiler into thinking that an object is of the wrong type, far more so than in a run-time typed language like Ruby or Smalltalk.
I’ll be giving a talk on “The Fall and Rise of Dynamic Languages”
tomorrow at 7:00 p.m., at Red Hat HQ to the Raleigh Ruby Brigade.
Originally this was going to be a slightly revamped version of a talk I gave some months ago to the local Agile group, with a slight change of emphasis from the history of agile methods to Ruby and other dynamic languages. It’s morphed into a completely different talk.
I plan to take a journey from the 1970s to today, and compare and contrast static and dynamic languages, and examine the recent resurgence in interest in dynamic languages and virtual machines. Along the way, I’ll have a few things to say about whether or not the recent news from RailsConf about MagLev is hype or reality.
If you’re in the area, please come by. Luckily the salmonella scare will probably keep the supply of (rotten) tomatoes to a minimum, so I should be fairly safe.
InfoQ recently published a video interview with Dave Thomas (of OTI fame).
In his inimitable style, Dave covers lots of interesting topics in software development, both today and with a historical perspective.
I agree with almost everything he says, and find the rest food for thought.
His comments about Java as a platform are quite germane to the article I published yesterday. If you’ve been exposed to Big Dave before, you’ll enjoy this, and if you haven’t, it’s a good introduction.

Besides being a master werewolf, Marcel Molina Jr. gives great presentations!
In his keynote presentation on the second day of the Ruby Hoedown, Marcel talked about
“What Makes Code Beautiful”.
Click on the link for the Confreaks video of this session.
The talk started with an exploration of the classical Philosophy of Beauty, from Plato to Descartes.
Marcel summarized this by proposing that beauty lies in the balance between three aspects which,
at times, either strengthen or oppose each other:
- Proportion: The property that components have the appropriate (relative) size/weight.
- Integrity: I would summarize this as “fitness of purpose.” Marcel’s anti-example was a hammer made out of glass. Although it might be beautifully constructed, and a joy to the eye, it would be unlikely to serve its intended purpose, and thus would fall short on integrity.
- Clarity: The property of being easily grasped as to meaning and function.
Beauty and Ruby
About 16 minutes into the talk, Marcel started talking about this view of beauty in the context of Ruby code. He gave an example of some really “clever” code to convert strings to an appropriate
instance of a Ruby class. For example, “true” would be converted to true, “false” to false, and
strings representing integer or time values to Integers or Times, respectively.
The code in question implemented a kind of functional-language pattern match against the string.
Marcel suggested that he might have been into studying Haskell at the time he wrote this code.
He used a generator to produce an enumerable collection of patterns to try, and did some “nice”
tricks to allow the result of a pattern match to sometimes be solely the value he wanted, and
sometimes to be an array with the value as the second element, to handle the special case where the
desired value was the literal false. If it sounds complicated, it is; I’ve placed the code at
the end of this article.
Some of us in the audience “smelled” this code right away.
He then critiqued this solution. Although he had originally considered it “beautiful,” since it was “elegant” and “sophisticated,” he came to smell it too.
A Fresh Design
Here’s how the code ultimately was written:
require 'time' # needed for Time.parse

class CoercibleString < String
  # datetime_format is assumed to be defined elsewhere; it is not shown here.
  def coerce
    case self
    when 'true' then true
    when 'false' then false
    when /^\d+$/ then Integer(self)
    when datetime_format then Time.parse(self)
    else
      self
    end
  end
end

Once this much simpler design is unveiled, the original “sophisticated” and “elegant” design
looks anything but.
Measuring against Proportion, Integrity, and Clarity
- Proportion: The original is a total failure; it’s much too long compared to the final code.
- Integrity: Again the original loses on this aspect. The use of the generator, particularly the early continuation-based implementation, causes very slow performance and leaks memory. Marcel stated that the simpler version is an order of magnitude faster.
- Clarity: Do I really have to explore this?
Marcel had pointed out when describing the original design that one of its “cool features” was
extensibility. Adding a new coercion just required adding another call to try in the Generator.new
block.
In contrast, adding a new coercion to the better design just requires adding a when leg to the
case statement.
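To make the comparison concrete, here is a minimal sketch of extending the simpler design with one more when leg. The floating-point coercion and its pattern are assumptions added purely for illustration and are not from Marcel's talk; the date/time leg is left out so that the example is self-contained.

```ruby
# Sketch only: the Float leg is a hypothetical extension, not Marcel's code.
class CoercibleString < String
  def coerce
    case self
    when 'true' then true
    when 'false' then false
    when /\A\d+\z/ then Integer(self, 10)  # base 10, so a leading zero is not read as octal
    when /\A\d+\.\d+\z/ then Float(self)   # the newly added coercion: one extra line
    else
      self
    end
  end
end

CoercibleString.new('3.14').coerce   # coerced to a Float
CoercibleString.new('42').coerce     # coerced to an Integer
CoercibleString.new('hello').coerce  # left as the string itself
```

The extension costs one line in either design, so extensibility does not really favor the generator version.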
The Questionable Beauty of Making Subclasses of Core Classes
While I loved the talk and agree with 99 and 44/100%, I’m just a bit troubled by the introduction
of the CoercibleString class. I think that it falls down on proportion at least.
It seems to me that there’s some missing code here. How do you actually coerce a string?
This seems to strongly imply a usage like this:
class PayloadProcessor
  def process
    # code which extracts a string to be coerced
    # coerce the string referenced by the variable value_str
    value = CoercibleString.new(value_str).coerce
    # further processing
  end
end

An alternative, and it seems to me to be a better one, although I’m convinceable otherwise, would
be to just make that method part of the class requiring the conversion, either directly, or through a
module:
class PayloadProcessor
  def process
    # code which extracts a string to be coerced
    # coerce the string referenced by the variable value_str
    value = coerce(value_str)
    # further processing
  end

  # datetime_format is assumed to be defined elsewhere; it is not shown here.
  def coerce(str)
    case str
    when 'true' then true
    when 'false' then false
    when /^\d+$/ then Integer(str)
    when datetime_format then Time.parse(str)
    else
      str
    end
  end
end

Now some might argue that the ‘functional’ looking coerce method, which takes the string
as an argument rather than as the receiver, seems somehow less ‘object oriented’, but I find this
unconvincing.
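For the "through a module" variant, a minimal sketch might look like the following. The module name StringCoercion, the DATETIME_FORMAT pattern, and the choice to pass the string into process are all assumptions made for illustration; the talk's actual datetime_format is never shown.

```ruby
require 'time' # for Time.parse

# Sketch only: StringCoercion and DATETIME_FORMAT are hypothetical names.
module StringCoercion
  # Hypothetical stand-in for datetime_format; matches e.g. "2008-08-09 10:30:00".
  DATETIME_FORMAT = /\A\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}\z/

  def coerce(str)
    case str
    when 'true' then true
    when 'false' then false
    when /\A\d+\z/ then Integer(str, 10)
    when DATETIME_FORMAT then Time.parse(str)
    else
      str
    end
  end
end

class PayloadProcessor
  include StringCoercion

  # Hypothetical: the extracted string is passed in to keep the example short.
  def process(value_str)
    coerce(value_str)
  end
end
```

Any other class that needs the conversion can include the same module, without a new String subclass or a wrapper object per value.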
If CoercibleString is a class we need code to create it from a string, something like:
class CoercibleString < String
  # Create a new coercible string.
  # Note that since the actual value of a
  # Ruby string is not held by an instance variable,
  # we need to alter the internal representation.
  def initialize(source_str)
    self << source_str
  end
end

I had a brief conversation with Marcel about whether or not subclassing String really seemed
appropriate, but it lasted all of about a minute. There’s a bit of supposition here on my
part, so apologies to Marcel if I misunderstood the exchange. He indicated that he would probably
advocate defining a method called CoercibleString, in parallel with Kernel#Integer and its ilk.
module Kernel
  def CoercibleString(str)
    CoercibleString.new(str)
  end
end

But this syntactic sugar just seems to be tilting the balance towards a less proportional design.
Conclusion
Building new classes is often a good idea, but not always. I’m not totally convinced that
coerce(str) is more beautiful than CoercibleString.new(str).coerce, or CoercibleString(str).coerce,
but my sense of esthetics tilts me that way.
Comments?
A “Smelly” Way to Coerce Strings
Here’s Marcel’s original code:
class CoercibleString < String
  attr_accessor :generator

  def coerce
    attempt = nil
    break unless (attempt = coercions.next).nil? while coercions.next?
    attempt.nil? ? self : attempt
  end

  private

  def coercions
    Generator.new do |self.generator|
      try { self == 'true' }
      try { [self == 'false', false] }
      try { Integer(self) }
      try { Date.parse(self) }
    end
  end

  def try
    attempt, desired = yield
    generator.yield(desired.nil? ? attempt : desired) if attempt
  rescue ArgumentError
    generator.yield nil
  end
end

I've been meaning to write about Ruby performance for a while, and a recent blog post by an old friend and colleague got me off my proverbial.
The old friend is John Duimovich, who wrote about the relative performance of C++ and Smalltalk and what that could mean for Ruby.
John's message is important for those who bemoan the performance of Ruby, and I plan to expand on that message in this and later posts to this blog, but first a few words about Mr. Duimovich.
Consider the source
In his day job, paraphrasing his self-description, John "works for IBM on Java virtual machines and is the lead on the Eclipse tools project management committee."
But some of my readers might be interested in John's background. John was, for a very long time, the lead of the Smalltalk and Java virtual machine team at Object Technology International (OTI), dating from before the time it was acquired by IBM. Among other things, John was responsible for the development of embedded Smalltalk virtual machines from OTI, which spawned the VM used in Smalltalk/V Mac, IBM Smalltalk (used in IBM/VisualAge), the 'Universal' Virtual Machine which implemented Java on an extended Smalltalk VM and which was used for the early releases of IBM/VisualAge for Java, and the J9 Java VM. A good deal of what I know about implementing VMs comes from working, lunching, and bar-hopping with John.
John had become OTI's Chief Technology Officer before OTI got assimilated into the IBMborg.
John is a brilliant guy with a great sense of humor, two characteristics which seem to have been requirements for a job at OTI. I'm still not sure how I ended up spending several years there.
Dynamically Typed Doesn't Need to Mean Slow
I encourage you to read John's blog post yourself, but to summarize: John ran across another blog item which gave a benchmark written in C++, Ruby and Python. The C++ version runs in under 1/10 of the time needed for either the Ruby or Python versions.
John duplicated the results on his machine, then decided to port the Ruby version of the benchmark to Smalltalk. He then ran it using VisualAge Smalltalk.
And the Smalltalk version runs in the same time as the optimized C++ version!
How can this be?
The Value of Pole Vaulting
Languages like Smalltalk and Self started from the position that a clean object-oriented language was more important than one which makes compromises to make efficient implementation obvious.
Early implementations of Smalltalk used obvious implementations of some features, which were 'fast enough' in many cases, but by no means fast. Two areas which cried out for improvement were method dispatch and garbage collection. The obvious techniques were walking up the class hierarchy each time a method was needed, and relatively easy-to-implement GC techniques like reference counting and mark-and-sweep. Reference counting has a fairly high cost for each change of an object reference, and also has the drawback of leaking memory, because cyclical references lead to garbage which is uncollectable. Mark-and-sweep delays the overhead until storage is exhausted, but leads to more perceptible pauses when the application is suspended so that the housemaid can clean the room.
Encountering (or having set) this high bar, various implementors of these languages found very clever techniques for both problems. David Ungar made measurements of the lifespans of Smalltalk objects and observed that most objects died very shortly after being instantiated, with few living a long life. This led to the invention of generational GC techniques, which quickly dispatch the young dead objects that make up the vast majority.
Method dispatching techniques of efficiently implemented dynamically typed languages tend to use clever caching algorithms which can get to what is probably the right method quickly, with a quick test to make sure that the right method was found.
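As a rough illustration of the idea, here is a toy sketch in Ruby of a naive lookup that walks the ancestor chain, plus a per-call-site cache guarded by a quick class check. The names slow_lookup and CallSiteCache, and the Animal classes, are hypothetical; real virtual machines do this inside the interpreter or in generated code, and this sketch ignores singleton classes and cache invalidation when methods are redefined.

```ruby
# Naive dispatch: walk up the ancestor chain until a class or module
# that defines the method itself is found.
def slow_lookup(klass, name)
  klass.ancestors.each do |mod|
    return mod.instance_method(name) if mod.instance_methods(false).include?(name)
  end
  nil
end

# Toy cache for one call site: remember the last receiver class and the
# method found for it, and reuse it when the quick class test passes.
class CallSiteCache
  attr_reader :hits, :misses

  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
    @hits = 0
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    if klass.equal?(@cached_class) # the quick test that the cached method is right
      @hits += 1
    else
      @misses += 1
      @cached_class = klass
      @cached_method = slow_lookup(klass, @name)
    end
    @cached_method.bind(receiver).call(*args)
  end
end

class Animal
  def speak
    "#{self.class.name} makes a sound"
  end
end

class Dog < Animal; end
class Cat < Animal; end

site = CallSiteCache.new(:speak)
site.call(Dog.new) # miss: walks Dog, then Animal
site.call(Dog.new) # hit: same receiver class as last time
site.call(Cat.new) # miss: receiver class changed
```

Most call sites see the same receiver class over and over, which is why such a simple cache pays off.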
These dispatching techniques turn out to be faster than the virtual function pointer dispatching made possible by strongly-typed languages like C++. In fact, I've heard that more modern implementations of these languages have actually used a more dynamic method dispatch mechanism internally in order to increase performance.
Another implementation choice is how to represent executable code. Most efficient implementations use a combination of byte-code representation and some form of just-in-time translation of byte-codes to machine code. Just how to divide execution between byte-code and machine code is a complicated decision. Back when Digitalk first produced a version of Smalltalk/V for OS/2, they decided to eschew byte-codes entirely and generate 80286 machine code. The reason was that they were tired of hearing complaints about Smalltalk being an 'interpreted' language.
The surprising result of this experiment was that the resulting implementation was slower. Machine code was bigger, so it took longer to load, and caused more swapping. These costs were paid whether the code in question was executed once or a million times.
Again, caching was the basis for getting the best of both worlds. Peter Deutsch of Xerox, later ParcPlace, had introduced the notion of translating byte-codes to machine code into a cache during execution. David Ungar's implementation of Self introduced the notion of using light-weight profiling techniques to avoid the overhead of translating byte-coded methods which were infrequently executed.
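A toy sketch of the profiling idea, under loudly stated assumptions: the "byte-codes" are a made-up four-instruction stack machine, the "machine code" is just a Ruby lambda built with eval, and the ToyMethod class and its HOT_THRESHOLD are hypothetical. It only shows the shape of the technique: interpret until a method proves to be hot, then translate it once and reuse the translation.

```ruby
# Sketch only: a made-up stack machine, with a lambda standing in for machine code.
class ToyMethod
  HOT_THRESHOLD = 3 # hypothetical: translate after this many interpreted calls

  attr_reader :interpreted_calls

  def initialize(code)
    @code = code
    @interpreted_calls = 0
    @compiled = nil
  end

  def compiled?
    !@compiled.nil?
  end

  def call(arg)
    return @compiled.call(arg) if @compiled

    # Light-weight profiling: count interpreted calls, translate only when hot.
    @interpreted_calls += 1
    @compiled = translate if @interpreted_calls >= HOT_THRESHOLD
    interpret(arg)
  end

  private

  # Execute the byte-codes one at a time on an operand stack.
  def interpret(arg)
    stack = []
    @code.each do |op, operand|
      case op
      when :push then stack.push(operand)
      when :arg then stack.push(arg)
      when :add, :mul
        right = stack.pop
        left = stack.pop
        stack.push(op == :add ? left + right : left * right)
      end
    end
    stack.pop
  end

  # "Translate" the byte-codes into Ruby source and evaluate it into a lambda.
  def translate
    stack = []
    @code.each do |op, operand|
      case op
      when :push then stack.push(operand.to_s)
      when :arg then stack.push('arg')
      when :add, :mul
        right = stack.pop
        left = stack.pop
        stack.push("(#{left} #{op == :add ? '+' : '*'} #{right})")
      end
    end
    eval("lambda { |arg| #{stack.pop} }")
  end
end

# Computes arg * 2 + 1.
double_plus_one = ToyMethod.new([[:arg], [:push, 2], [:mul], [:push, 1], [:add]])
results = 5.times.map { double_plus_one.call(10) }
```

A method called only once or twice never pays the translation cost, which is the point of the profiling step.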
Another area which posed difficulties in implementation was control flow. Smalltalk-80 defines all control flow as methods. Even primitive control flow constructs such as if (ifTrue: in Smalltalk) were implemented as methods on Boolean classes. This is one area where Smalltalk implementations cheated, compiling such methods into testing and branching byte-codes, and requiring the receivers to be Boolean instances.
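For readers who haven't seen control flow expressed as methods, here is a small Ruby sketch of the idea. ToyTrue, ToyFalse, if_true_if_false and toy_less_than are hypothetical names invented for this example; they mimic the shape of Smalltalk's ifTrue:ifFalse: and are not how Ruby itself works.

```ruby
# Sketch only: conditionals as ordinary methods on two boolean classes.
class ToyTrue
  # The true object runs the first block and ignores the second.
  def if_true_if_false(true_block, _false_block)
    true_block.call
  end
end

class ToyFalse
  # The false object runs the second block and ignores the first.
  def if_true_if_false(_true_block, false_block)
    false_block.call
  end
end

# Bridge from Ruby's own comparison to the toy booleans.
def toy_less_than(a, b)
  a < b ? ToyTrue.new : ToyFalse.new
end

smaller = toy_less_than(3, 5).if_true_if_false(-> { 'first' }, -> { 'second' })
larger  = toy_less_than(9, 5).if_true_if_false(-> { 'first' }, -> { 'second' })
```

The choice between branches is made purely by method dispatch on the receiver, which is exactly the send that Smalltalk compilers replaced with test-and-branch byte-codes.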
Self eschewed this early optimization. Ungar's team instead relied on run-time type inference in order to dynamically generate code which achieved the same or better performance when such a message was sent to a boolean, without restricting other cases.
The Current State of Ruby Implementation
Ruby performance today is surprisingly acceptable for a wide range of uses.
This is despite the fact that the implementation is relatively straightforward, almost to the point of being naive. In the current standard implementation of Ruby:
- Method dispatch is done by walking up the 'class' hierarchy looking for methods in a hash table in each class/module.
- Garbage Collection is done by a simple mark and sweep algorithm.
- Executable code is represented by a parse tree which is executed by traversal.
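To give a feel for the last point, here is a minimal sketch of executing a parse tree by traversal. The nested-array node format and the evaluate function are assumptions made for illustration; they bear no relation to the actual node structures of the Ruby interpreter.

```ruby
# Sketch only: nodes are nested arrays such as [:lit, 2] or [:add, left, right].
def evaluate(node)
  type, left, right = node
  case type
  when :lit then left # a literal node carries its value in the second slot
  when :add then evaluate(left) + evaluate(right)
  when :mul then evaluate(left) * evaluate(right)
  else raise ArgumentError, "unknown node type: #{type.inspect}"
  end
end

# The tree for 1 + (2 * 3)
tree = [:add, [:lit, 1], [:mul, [:lit, 2], [:lit, 3]]]
value = evaluate(tree)
```

Every evaluation re-walks the tree and re-dispatches on each node type, which is the overhead a byte-code representation is meant to reduce.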
This is not meant to understate the achievements of Matz and the Ruby developers. Ruby, as it stands, is definitely usable for many production uses.
The point is how much better Ruby performance can get as the implementation matures. A virtual machine, with byte-codes, and better GC is on the roadmap. Ruby virtual machines such as YARV, and JRuby are showing glimmers of the value of implementing Ruby as a virtual machine. If Ruby continues to grow in acceptance, I've no doubt that other clever implementers with experience in efficiently implementing dynamically typed languages will provide more implementations.
My prediction is that the future will be so bright that we're going to have to wear (ruby colored) shades!
One theory I’ve seen defines a “duck type” as a set of messages which an object bound to a parameter or value needs to understand. This leads some, who want to make type-checking happen a bit earlier, to propose testing the values of such variables with one or more respond_to? tests before using the object “in anger.”
But ducks can be subtle…
Sometimes it’s not enough just to have a certain set of methods in a duck’s repertoire.
Let’s look at an, admittedly cooked-up, example:
def reverse_lookup(value, dictionary)
  (dictionary.collect { |kv| kv[1] == value ? kv[0] : nil }).compact
end

The method is intended to return an array containing
keys in a “Dictionary” which map to the given value:
irb(main):001:0> load 'reverse_lookup.rb'
=> true
irb(main):002:0> reverse_lookup(1, { 'a' => 1, 'b' => 2, 'c' => 1})
=> ["a", "c"]
Now, I said that this was a “cooked-up” example to illustrate a point about “ducks.” It would probably be more proper to implement a method like keys_for_value in Hash or a module which could be used to extend Hash or other “dictionary” classes.
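A minimal sketch of that more proper alternative, assuming a hypothetical module name ReverseLookup: the method relies on each_pair, so it can be mixed into Hash or used to extend any dictionary-like object that provides each_pair.

```ruby
# Sketch only: ReverseLookup is a hypothetical module name.
module ReverseLookup
  # Returns the keys whose associated value equals the given value.
  def keys_for_value(value)
    result = []
    each_pair { |key, val| result << key if val == value }
    result
  end
end

dictionary = { 'a' => 1, 'b' => 2, 'c' => 1 }
dictionary.extend(ReverseLookup) # extend just this one hash
keys = dictionary.keys_for_value(1)
```

Because it asks for key/value pairs explicitly, an array of integers cannot slip through by accident the way it can with a bare collect.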
The point here is that for this implementation of reverse_lookup, the only message sent to dictionary is collect, so according to the “respond_to?” theory any object which has “collect” in its repertoire is the right species of duck for the dictionary parameter.
So let’s try another duck:
irb(main):003:0> reverse_lookup(1, [1, 2, 3])
=> [0, 1]

Well, it works in the sense that nothing blows up, but the results look strange. Where did that 0 come from?
While you weren’t looking, I snuck a debug parameter into my reverse_lookup method to give an “x-ray” view into what’s happening.
irb(main):004:0> reverse_lookup(1, [1, 2, 3], true)
kv = 1, kv[0]=1, kv[1]=0
kv = 2, kv[0]=0, kv[1]=1
kv = 3, kv[0]=1, kv[1]=1
=> [0, 1]

The reason that it works at all with an array of integers is a bit of an anomaly, due to the fact that Ruby integers define the [] method as a bit reference. When kv is an integer, kv[0] and kv[1] are respectively the least significant and second least significant bits in the binary representation of kv.
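The debugging version of the method isn't shown above, so the following is only a guess at what it might look like, with the debug parameter's form inferred from the output. It also makes the bit-reference behavior easy to try out.

```ruby
# Sketch only: a guess at the debug-enabled method, inferred from its output.
def reverse_lookup(value, dictionary, debug = false)
  dictionary.collect do |kv|
    puts "kv = #{kv}, kv[0]=#{kv[0]}, kv[1]=#{kv[1]}" if debug
    kv[1] == value ? kv[0] : nil
  end.compact
end

# Integer#[] returns a single bit: index 0 is the least significant bit.
low_bit_of_three  = 3[0] # 3 is binary 11
high_bit_of_two   = 2[1] # 2 is binary 10
high_bit_of_one   = 1[1] # 1 is binary 01

from_array = reverse_lookup(1, [1, 2, 3], true)
```

Only 2 and 3 have their second bit set, so they "match" the value 1, and their lowest bits, 0 and 1, are what come back as "keys".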
Now let’s try another array:
irb(main):005:0> reverse_lookup(1, [Array, NilClass])
NoMethodError: undefined method `[]' for NilClass:Class
from ./reverse_lookup.rb:2:in `reverse_lookup'
from ./reverse_lookup.rb:2:in `reverse_lookup'
from (irb):4
from :0

So this doesn’t work at all.
To sum this up, “duck types” aren’t just collections of messages. Whether or not an object will work in any given role also depends on subtleties of the semantics of the implementation of the corresponding methods, and can depend on the state of the object as well.
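A short sketch of why an up-front respond_to? test doesn't settle the question. The guarded_reverse_lookup wrapper is a hypothetical name; it applies exactly the check the “respond_to?” theory suggests, and both a Hash and an Array of integers pass it.

```ruby
def reverse_lookup(value, dictionary)
  dictionary.collect { |kv| kv[1] == value ? kv[0] : nil }.compact
end

# Sketch only: a hypothetical wrapper that checks the "duck type" up front.
def guarded_reverse_lookup(value, dictionary)
  unless dictionary.respond_to?(:collect)
    raise TypeError, "#{dictionary.class} does not respond to collect"
  end
  reverse_lookup(value, dictionary)
end

from_hash  = guarded_reverse_lookup(1, { 'a' => 1, 'b' => 2, 'c' => 1 })
from_array = guarded_reverse_lookup(1, [1, 2, 3]) # passes the check, answer is meaningless
```

The guard only rejects objects with no collect at all; it says nothing about whether the elements handed to the block behave like key/value pairs.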
Now this shouldn’t be considered a weakness of “duck typing” compared to “strong typing,” or something in-between the two. Strong-typing systems have similar “flaws.” For example, strongly typing a variable to an integer won’t protect against division by zero when the variable is used as a divisor, nor will it alone protect against out-of-bounds errors when it’s used as an array index. The set of errors which aren’t detected by strong typing is much larger than the set of those which are. In languages which provide limited run-time checking of code which has passed the strong-typing “hurdle,” the effects can be disastrous. In my humble opinion, the benefits of dynamic typing far outweigh the cost of “unlearning” what has been taught by other languages such as C++ and Java.
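As a small illustration of run-time checking catching what a type declaration would not, here is a sketch in Ruby; the helper names divide and element_at are hypothetical. Both arguments are perfectly good integers, yet both calls fail, and they fail as exceptions that can be rescued rather than as undefined behavior.

```ruby
# Sketch only: hypothetical helpers showing errors that a type check alone misses.
def divide(numerator, divisor)
  numerator / divisor
rescue ZeroDivisionError => e
  "caught: #{e.class}"
end

def element_at(array, index)
  array.fetch(index) # fetch raises for an out-of-range index; [] would return nil
rescue IndexError => e
  "caught: #{e.class}"
end

division_result = divide(10, 0)
lookup_result   = element_at([1, 2, 3], 10)
```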
Bjarne Stroustrup used to quip, in panel discussions over strong vs. dynamic typing (i.e., C++ vs. Smalltalk), that he’d hate to be a passenger on a plane which flashed a light labeled “message not understood” in the cockpit when the pilot tried to lower the landing gear. I’d hate even more to be a passenger on a plane whose control software seg-faulted under this condition instead, because putting the landing gear down caused a buffer overflow.
The real answer to uncovering and rooting out errors is testing, testing, and more testing.




