Algorithm Choice Trumps Programming Language Choice For Performance

Posted by Rick DeNatale Sun, 13 Mar 2011 14:46:00 GMT

My friend Brian Adkins just published an article on his blog comparing the performance of his Haskell program to find the longest palindrome in a string to a similar program in Ruby.

His Haskell program runs 25 times faster than his Ruby program. He reports that the Haskell program takes 7 times as long to process an input twice as long.

Brian's approach is to generate all of the substrings in the input string, filter that list down to those which are palindromes, and then output the longest one which passes through that filter. A cursory analysis indicates that this is O(n^3), since there are O(n^2) substrings and checking each one takes O(n) time. That is in line with his data: doubling the input should then take about 8 times as long.
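
To make the comparison concrete, here is a rough Ruby sketch of that brute-force approach as I've described it. It is not Brian's code, and the method names are made up for illustration. There are roughly n*n/2 substrings, and each reverse-and-compare costs time proportional to the substring's length.

```ruby
# Brute-force sketch (illustrative names, not Brian's actual program):
# enumerate every substring, keep the palindromes, return the longest.
def all_substrings(str)
  (0...str.size).flat_map do |start|
    (start...str.size).map { |finish| str[start..finish] }
  end
end

def brute_force_longest_palindrome(str)
  # max_by returns nil for an empty list, hence the fallback to "".
  all_substrings(str).select { |s| s == s.reverse }.max_by(&:size) || ""
end

puts brute_force_longest_palindrome("xyzamanaplanacanalpanamaxyz")
# prints amanaplanacanalpanama
```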

I couldn't resist, so I sat down with my MacBook and my Sunday morning coffee and wrote my version of the program, in about 30-45 minutes.

My first thought was that we want to cut down on the search space, by only examining substrings that could be palindromes. By definition a palindrome starts and ends with the same character, so we only need consider such substrings. It took a few minutes to find a reasonable approach.

My basic idea was to start by looking for the initial substrings of the string ending with the first character, then do the same for subsequent characters. Then it occurred to me that I should find the LONGEST initial substring, since if it were a palindrome there would be no need to look at the shorter ones.

In pondering the best way to do this in Ruby, and thinking about using a regular expression, I realized that rather than starting with the beginning of the string, it would be better to start at the end. I could then use Ruby 1.9's String.match(regexp, pos), to find the longest substring at the end of the string starting and ending with the same character, using the pos parameter to search for a shorter string when the last match is not a palindrome.

So in the end my algorithm examines each initial substring of the input string starting with the longest. For each of those it examines the final substrings which begin and end with the same character, again starting with the longest, and returns the longest of those which is a palindrome. The result of the overall program is the longest palindrome from any initial substring.

Here's the code:

  TEXT = <<END
  I'll just type in some example text here and embed a little
  palindrome - A man, a plan, a canal, Panama! - I expect that will be
  the longest palindrome found in this text.
  Lorem ipsum dolor sit amet, consectetur adipiscing elit.
  Integer volutpat lorem imperdiet ante bibendum ullamcorper. Mauris
  tempor hendrerit justo at elementum. Vivamus elit magna, accumsan id
  condimentum a, luctus a ipsum. Donec fermentum, lectus at posuere
  ullamcorper, mauris lectus tincidunt nulla, ut placerat justo odio sed
   odio. Nulla blandit lorem sit amet odio varius nec vestibulum ante
  ornare. Aliquam feugiat, velit a rhoncus rutrum, turpis metus pretium
  dolor, et mattis leo turpis non est. Sed aliquet, sapien quis
  consequat condimentum, sem magna ornare ligula, id blandit odio nisl
  vitae erat. Nam vulputate tincidunt quam, non lacinia risus tincidunt
  lacinia. Aenean fermentum tristique porttitor. Nam id dolor a eros
  accumsan imperdiet. Aliquam quis nibh et dui ultricies cursus. Nunc
  et ante non sapien vehicula rutrum. Duis posuere dictum blandit. Nunc
  vitae tempus purus.
  END

  def clean(str)
    str.gsub(/[^A-Za-z]/,'').downcase
  end

  class String
    def palindrome?
      self == self.reverse
    end

    def longest_palindrome_at_end
      first_possible_start = 0
      end_match_regexp = /#{self[-1]}/
      palindrome = nil
      while (palindrome.nil? && (candidate_start = self.match(end_match_regexp, first_possible_start)))
        candidate_index = candidate_start.begin(0)
        candidate = self[candidate_index..-1]
        if candidate.palindrome?
          palindrome = candidate
        else
          first_possible_start = candidate_index + 1
        end
      end
      palindrome || ""
    end

    def longest_palindrome
      longest = ""
      self.size.downto(1) do |last|
        break if last < longest.size
        candidate = self[0,last].longest_palindrome_at_end
        longest = candidate if candidate.size > longest.size
      end
      longest
    end
  end

  require 'benchmark'

  double = TEXT + TEXT

  puts Benchmark.measure {puts clean(TEXT).longest_palindrome}
  puts Benchmark.measure {puts clean(double).longest_palindrome}

And here's the output:

  amanaplanacanalpanama
    0.110000   0.000000   0.110000 (  0.110347)
  amanaplanacanalpanama
    0.470000   0.000000   0.470000 (  0.463541)

Now my MacBook Pro is a little newer than Brian's, it's a late 2009 model, with a 2.66 GHz Core 2 Duo. But my version is 36x faster (.11 vs. 4 seconds) than Brian's Haskell program, and 891x faster than his Ruby program. Hence the title of this article!


How Arlo got injected into Ruby

Posted by Rick DeNatale Fri, 26 Nov 2010 16:39:00 GMT

Yesterday, I happened to catch a bit of the annual Macy*s Thanksgiving Day Parade in New York.

In particular I saw Arlo Guthrie and his daughter perform Arlo's father, Woody's, song "This Land is Your Land" while standing in front of a giant Turkey on the Ocean Spray cranberry float.

I was struck by the combination of Arlo's still youthful face, framed by long white hair. Arlo is getting on in years. Thankfully he has outlived his father, who died of Huntington's disease at age 55, by some 8 years.

As an aside, although I grew up in the Connecticut suburbs of New York City, I've only gone to the Macy*s parade once, in 1997, after I'd moved to North Carolina. That day was very windy and the crews handling those big parade balloons had their hands more than full. The Cat in the Hat balloon knocked over a lamp post, and a parade-goer was struck in the head by falling debris and suffered a month-long coma. My wife and I witnessed the Barney the Dinosaur balloon being stabbed and stomped on by police to avert another accident. "Oh, the humanity!"

And I was reminded of Arlo again today when Andy Ihnatko picked the song Alice's Restaurant Massacree for his Amazon advent calendar.

For those readers of tender years who might not be familiar with Arlo and Alice's Restaurant, the song tells the tale of a communal Thanksgiving feast in a deconsecrated church in Great Barrington, Massachusetts, the home of his friends Alice and Ray Brock. After the dinner Arlo and a friend volunteered to get rid of the post-feast trash, and loaded it into a VW Microbus, bound for the dump, only to find that the dump was closed for the holiday. So they dumped the trash over a fifteen foot cliff by a country road on top of a pile of trash that was already there, figuring "one big pile is better than two little piles." Two days later they were arrested for littering, when the local constable Officer Obie (no, his last name wasn't Fernandez) discovered an envelope with Guthrie's name on it under the pile of garbage, and Arlo, when asked about this, said "Yes, sir, Officer Obie, I cannot tell a lie, I put that envelope under that garbage."

This led to a trial, at which he was fined $50 and ordered to pick up the garbage.

The story continues when Guthrie was later ordered to report for induction into the Army. Again, for the benefit of my younger audience, I probably need to explain that, during the Vietnam War, the US had a military draft, unlike today's "volunteer" military which attracts recruits out of a mixture of patriotism and economic necessity.

The upshot of the story was that after going through a series of medical examinations at the induction center, he was nearing being sworn into the Army, except that "the last man" asked a crucial question, "Kid, we only got one question. Have you ever been arrested?"

In reply Arlo told the tale of his conviction for littering, and was sent to the "Group W" bench.

Here's how Arlo describes Group W:

" Group W's where they put you if you may not be moral enough to join the army after committing your special crime, and there was all kinds of mean nasty ugly looking people on the bench there. Mother rapers. Father stabbers. Father rapers! Father rapers sitting right there on the bench next to me! And they was mean and nasty and ugly and horrible crime-type guys sitting on the bench next to me. And the meanest, ugliest, nastiest one, the meanest father raper of them all, was coming over to me and he was mean 'n' ugly 'n' nasty 'n' horrible and all kind of things and he sat down next to me and said, "Kid, whad'ya get?" I said, "I didn't get nothing, I had to pay $50 and pick up the garbage." He said, "What were you arrested for, kid?" And I said, "Littering." And they all moved away from me on the bench there, and the hairy eyeball and all kinds of mean nasty things, till I said, "And creating a nuisance." And they all came back, shook my hand, and we had a great time on the bench, talkin about crime, mother stabbing, father raping, all kinds of groovy things that we was talking about on the bench.

After that a sergeant came over and gave him a long form to fill out with the details of his crime, ending with the question "Have you rehabilitated yourself?" Arlo took the form to the sergeant and said:

"Sergeant, you got a lot a damn gall to ask me if I've rehabilitated myself, I mean, I mean, I mean that just, I'm sittin' here on the bench, I mean I'm sittin here on the Group W bench 'cause you want to know if I'm moral enough join the army, burn women, kids, houses and villages after bein' a litterbug." He looked at me and said, "Kid, we don't like your kind, and we're gonna send your fingerprints off to Washington."

So What Does This Have to Do with Ruby?

I'm glad you asked that question.

The answer is that the main enumeration method names of Ruby, collect, select, reject, inject... came to Ruby from the Alice's Restaurant Massacree, indirectly via Smalltalk.

Some Ruby programmers seem to have an aversion to the inject method. It seems particularly irksome to programmers who have come from languages which use map and reduce for collect and inject. Later Ruby versions have aliased map to collect, and reduce to inject. But this leaves reduce as taking an optional argument, which if passed is used as an artificial first element, effectively prepended to the enumeration being reduced. That optional argument and the name inject come from Smalltalk's inject:into: method, which makes sense to a Smalltalk programmer since you are injecting that initial value into a reduction of the collection using the block given by the second argument. Smalltalk programmers get used to passing the 'right' value for that argument: for example, if inject is used to sum a collection the initial argument is normally the identity value for addition, i.e. 0, or might be something else if we want to add the sum to another value.
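
A small sketch of what that initial argument does in Ruby. The variable names are just for illustration, and the Smalltalk line in the comment is the rough equivalent of the first call.

```ruby
numbers = [1, 2, 3, 4]

# Smalltalk: #(1 2 3 4) inject: 0 into: [:acc :n | acc + n]
# The 0 is "injected" ahead of the first element.
sum = numbers.inject(0) { |acc, n| acc + n }

# A different seed adds the collection's sum to another value.
sum_plus_100 = numbers.inject(100) { |acc, n| acc + n }

# With no argument, the first element seeds the accumulator instead.
sum_no_seed = numbers.inject { |acc, n| acc + n }

# The seed matters most for an empty collection.
empty_with_seed = [].inject(0) { |acc, n| acc + n }
empty_no_seed = [].inject { |acc, n| acc + n }

p sum              # 10
p sum_plus_100     # 110
p sum_no_seed      # 10
p empty_with_seed  # 0
p empty_no_seed    # nil

# reduce is simply another name for the same method.
p numbers.reduce(0) { |acc, n| acc + n }  # 10
```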

A couple of months ago, I was listening to a podcast interview with Dan Ingalls, who was the first implementor of Smalltalk, and probably the key contributor as Smalltalk evolved from Smalltalk-71, through Smalltalk-72 and Smalltalk-76, up through Smalltalk-80, which is what most people think of as Smalltalk today. During the interview he was asked about the origin of those enumeration methods of the Smalltalk collection classes. Alan Kay had told the interviewer that they had come from a song. At first Dan didn't remember this, but then remembered that there was a song which had a string of words like inject, select, detect etc. As far as I recall, though, he didn't name the song.

But I recognized it right away, here's how "Alice's Restaurant Massacree" transitions from the littering trial to the draft:

Came to talk about the draft. They got a building down New York City, it's called Whitehall Street, where you walk in, you get injected, inspected, detected, infected, neglected and selected.

So Dan picked the collection enumeration method selectors in Smalltalk from "Alice's Restaurant", no doubt. I suspect that that initial argument of inject:into: came about because he wanted to use that pattern and map and reduce didn't fit. Actually I'm not sure that map and reduce were commonly used terms at that time.

So if you don't like inject in Ruby, don't blame Matz, blame Dan and Arlo!


Making RSpec, Rake, and Bundler play well together

Posted by Rick DeNatale Fri, 25 Jun 2010 11:45:00 GMT

I'm working on a project which uses RSpec, Bundler, and Rails 2.3.4.

The team has been struggling with the changes in gem management with Bundler. The "Old School" Rails way to configure gems for different Rails environments is to conditionally execute config.gem based on the Rails environment, then use the rake gems:install task to install the gems, once for each environment.

Bundler does things a bit differently. It uses groups to segregate gems by environment, so you name the gems needed for the test environment in a :test group. When you run bundle install, it installs all of the gems for all of the groups (unless you opt out of some groups). It uses the groups at run time, to selectively expose only the gems applicable to the current environment.

This causes problems for things like the spec rake task, if you run

rake spec

without explicitly setting the Rails environment.

The way the spec rake task works is that it depends on a Rails-provided rake task called :environment, which 'boots' the Rails environment. Then the spec_helper file, which each spec should require, sets ENV['RAILS_ENV'] to 'test' if it's not already set, and requires needed gem code, like 'spec/rails'.

In the old school approach this works, since those gems are installed and visible. With bundler, it fails since they won't be exposed, since the bundler environment got set up using the development environment, so gems in the :test group aren't available.

In an attempt to fix this, other members on the team were mixing things up, doing things like putting conditional tests of the rails environment in the Gemfile, and then running the bundle command with different overrides of the RAILS_ENV environment variable.

But that way lies madness.

This morning I worked out a solution which involves deferring setting up the bundle environment.

  1. I needed to keep the spec task from running the environment task. The project was already using a variation of override_rake_task, so I added an override in a task within the lib directory:
          override_task :spec do
            Rake::Task["spec:original"].execute
          end
        
  2. The next step was to bootstrap the rails environment in spec_helper. To do this I started it with:
          ENV["RAILS_ENV"] ||= 'test'
    
          RAILS_ROOT = "#{File.dirname(__FILE__)}/.." unless defined?(RAILS_ROOT)
          environment_path = File.expand_path(File.join(RAILS_ROOT, 'config', 'environment'))
          require(environment_path)
          require 'spec/rails'
          #... any other needed requires
        

    It's crucial to expand the path which ensures that the same string is used whenever the environment file is required. Before I did that I was having problems with the environment being required a second time, presumably for the second spec. If you have any code which patches anything using alias_method_chain, or a similar technique, loading that code a second time can cause infinite loops which can be mystifying.
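
    A small illustration of why the expansion helps, using a made-up /app directory (the paths here are assumptions, not the project's real layout). Two spellings of the same file are different strings until they are expanded, and as far as I can tell Ruby 1.8's require goes by the string it was given:

```ruby
# Two spellings of the same file (hypothetical /app layout).
from_spec_helper = "/app/spec/../config/environment"
from_elsewhere   = "/app/config/environment"

# As raw strings they differ, so each would look like a new file to require.
puts from_spec_helper == from_elsewhere
# prints false

# File.expand_path normalizes both to the same absolute path string.
# It only manipulates the string; the file does not need to exist.
puts File.expand_path(from_spec_helper) == File.expand_path(from_elsewhere)
# prints true
```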

So this seems to be working. I'll try to update if I find anything else, but in the meantime I hope some folks find it useful.


Of Procs, Blocks, and Dancing Angels

Posted by Rick DeNatale Fri, 05 Mar 2010 02:08:00 GMT

Recently, Yehuda Katz wrote an article in reaction to a Pythonista's criticism of Ruby and in particular to the complaint that you can't call a proc with parentheses:

  my_method = Proc.new {1 + 2}
  my_method()
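
For comparison, here is a short sketch of the ways a Proc can actually be invoked. It isn't taken from either article; it's just plain Ruby behavior.

```ruby
my_method = Proc.new { 1 + 2 }

p my_method.call   # 3
p my_method[]      # 3
p my_method.()     # 3 (this syntax arrived in Ruby 1.9)

# With parentheses and no receiver, Ruby looks for a METHOD named
# my_method, ignoring the local variable, and doesn't find one.
begin
  my_method()
rescue NoMethodError => e
  puts "my_method() raised #{e.class}"
end
```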

Yehuda did a great job of defending why this is consistent with the rest of Ruby, and talked about how blocks in Ruby are really used. As part of the article, he conflated blocks and Procs. The difference is rather subtle, and in most cases can be ignored. But Giles pointed out that the difference is there, and is documented both in the Pickaxe and in David Flanagan and Matz's "The Ruby Programming Language", and I posted a comment in support of Giles after Yehuda pushed back, maintaining his view that there was no difference.

Yehuda followed up with another article, which prompted this article.

Let me start by saying that I'm not interested in starting a war here. First, I've nothing but respect for Yehuda, and I'm very impressed by his body of work, particularly the refactoring of Rails from Rails 2 to Rails 3. Second, I agree that most of this stuff really doesn't matter all that much, and only when discussing things at a rather deep level. It can be like a debate about how many angels can dance on the head of a pin! But such discussions can be fun and in rare instances even enlightening.

Proc Identity

Here's Yehuda's example from the second article:

    def foo
      yield
    end

    def bar(&block)
      puts block.object_id
      baz(&block)
    end

    def baz(&block)
      puts block.object_id
      yield
    end

    foo { puts "HELLO" } #=> "HELLO"
    bar { puts "HELLO" } #=> "2148083200\n2148083200\nHELLO"
  

He points out that the identity of the proc object bound to the two arguments named block in the bar and baz methods doesn't change. He then gives a slightly different example:

  def foo(&block)
    puts block.object_id
    yield
  end

  b = Proc.new { puts "OMG" }
  puts b.object_id
  foo(&b) #=> 2148084040\n2148084040\nOMG

and then proposes two "mental models" of what's going on:

  1. The &b unwraps the Proc object, and the &block recasts it into a Proc. However, it somehow also wraps it back into the same wrapper that it came from in the first place. or...
  2. The &b puts the b Proc into the block slot in foo’s argument list, and the &block gives the implicit Proc a name. There is no need to explain why the Proc has the same object_id; it is the same Object!

He then says that the first actually represents the MRI implementation. Based both on a reading of Flanagan and Matz and on the MRI code, I respectfully disagree; the truth lies in the middle.

The first thing to realize is that the use of & before the last argument of a method definition is not the same thing as the use of & before the last parameter value in a method invocation.

An analogy with Splat

Ruby has a similar prefix *, a.k.a. splat, or what David Black likes to call "the unary unarray" operator. Actually, depending on where it occurs, * is either an "unarray" or a "make array" operator.

Consider this code:

  array = [1, 2, 3]

  def three_args(a, b, c)
    "a is #{a}, b is #{b}, c is #{c}"
  end

  a, b, c = *array

  a # => 1
  b # => 2
  c # => 3

  three_args(*array) # => "a is 1, b is 2, c is 3"

  def glob_args(*args)
    args
  end

  x, *array2 = 4, 5, 6

  x      # => 4
  array2 # => [5, 6]

  glob_args(:now, :is, "the", Time) # => [:now, :is, "the", Time]

As I said above, * is an unarray operator when it appears either in the right hand side of an assignment, or in the argument list of a method invocation. It acts as an operator which coalesces multiple values into an array if it appears on the left hand side of an assignment or in front of a formal parameter in a method definition.

One important consideration in the use of both splat and proc arguments, is that the method definition doesn't "know" whether the method will be invoked like this:

  foo  # with no block
  
  # or this:
  
  foo { 1 }
  
  # or this:

  foo(&b)
  
  # or even this:
  
  proc = Proc.new {1}
  foo(proc)

All of these are valid, and need to work.

Method definition with a & Formal Parameter

In Yehuda's last quoted example this is the def foo case.

Method Invocation with a & arg

Two things happen when a method is called with a & argument. The cases of calling the method with either an implicit block, an explicit value for the argument, or no argument for the block at all, all need to be handled. And the cases of explicitly calling the argument, e.g. via block.call, and yielding to the block both have to work.

So the &block argument in the definition of foo means that the method prelude ensures that the block argument will either be nil or will refer to an object which is a Proc, or at least appears to be a proc. It does this at the entry to the invocation of the method.

It also must ensure that yield semantics will work whether the method was called with an implicit block, or an explicit value for the parameter. Note that in MRI, block yield works by reference to a field called iter in the current stack frame, which in Ruby 1.8 is used to find the node in the abstract syntax tree of the method which defined the block which corresponds to that block. And yield is implemented by evaluating the subtree of the AST rooted at that node. Of course the representation in YARV in Ruby 1.9 represents executable code differently but the effect is the same. The yield keyword deals with the internal "VM" representation of the executable ruby code directly without surfacing it as a Ruby object. This is the real difference between a block and an instance of Proc.

The draft Ruby standard abstracts this iter field a bit, using the notation [block] to refer to a logical stack of blocks in the execution context. The draft distinguishes between blocks and procs. Since I started writing this article, Avdi Grimm wrote another reaction to Yehuda's second article, which looks at the same issues I'm talking about here from the perspective of the draft standard.

With all that said, here's how that formal block argument is handled when the foo method is invoked:

  1. If block is nil, then it sees if a block was given in the method call, and if so, it creates a Proc object which will cause the block to be executed when the proc is called.
  2. If block is NOT nil
    1. If the value of block is not already an instance of Proc, send :to_proc to the value (with a guard to see if it responds, but that's a minor implementation decision to avoid having the overhead of catching a NoMethodError exception). This is why defining Symbol#to_proc allows writing things like (1..10).map(&:succ).
    2. Set up the VM so that yield will work should that happen. In Ruby 1.8 this involves correctly pushing an iterator onto the stack frame, I haven't read through the YARV implementation but I'm sure that it has the same effect.
  3. If the object referenced by block isn't a proc or convertible using #to_proc, to a 'proc-like' duck which can quack to the tune of #call, then a TypeError is raised.

Note that this isn't wrapping the argument with a proc, it's ensuring that we have an object which acts as a proc, if it needs to be 'cast' into a proc it will be, but if it's already a real Proc, or it responds to to_proc by returning self, then it will be the same object.
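
The visible effects of those steps can be checked from plain Ruby. This is only a sketch of the observable behavior, not of the MRI internals, and takes_block, yields_and_calls and Doubler are names I made up for it.

```ruby
# Returns whatever ended up bound to the &block parameter.
def takes_block(&block)
  block
end

# Uses both ways of running the block: yield and an explicit call.
def yields_and_calls(&block)
  [yield(1), block.call(2)]
end

# A hypothetical proc-like duck: not a Proc, but it responds to to_proc.
class Doubler
  def to_proc
    Proc.new { |x| x * 2 }
  end
end

existing = Proc.new { |x| x + 1 }

p takes_block.nil?                         # true: no block, so block is nil
p takes_block { :hi }.class                # Proc: the implicit block surfaced as a Proc
p takes_block(&existing).equal?(existing)  # true: a real Proc passes through unchanged
p takes_block(&Doubler.new).call(21)       # 42: to_proc was sent to the Doubler
p yields_and_calls(&existing)              # [2, 3]: yield and call both work
p (1..3).map(&:succ)                       # [2, 3, 4]: Symbol#to_proc at work

# Something which is neither a Proc nor convertible raises TypeError.
begin
  takes_block(&Object.new)
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```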

Method invocation with a & prefix on the last argument value

This part is a bit simpler. The argument itself is just passed through. The trick is that in the process of invoking the method, the sending code must do the same thing as step 2.2 above in case the method does a yield.

Does it Matter?

Barring any embarrassing mistakes on my part, this is what happens in MRI Ruby, as described in section 6.4.5 Block Arguments of Flanagan and Matsumoto, as well as my reading of the MRI code.

Yehuda makes the point that this is all pretty invisible to the Ruby programmer, and he's right. It would seem that a Ruby implementation could ALWAYS turn blocks into procs and not have a separate hidden iter structure in the VM. One reason for not doing so is performance. Since Procs are closures and capture the bindings of any variables in their scope, there is some overhead to their creation and destruction. If a block is only accessed via yield, then it's guaranteed not to have a lifetime past the return of the called method. So this is an optimization. And such optimizations are known in other dynamic language implementations. Smalltalk gives the illusion of uniformly using closures to represent blocks, but most implementations cheat and recognize cases where the overhead of creating a closure can be avoided. In some cases this is invisible to the Smalltalk programmer, but not always.

So although we might not know exactly how many angels are dancing on the head of the pin, or what steps they are doing, the Ruby language books and the draft standard let it slip that they are there.


Holy CoW!

Posted by Rick DeNatale Wed, 16 Dec 2009 14:38:00 GMT

Last week the Phusion guys gave a talk at Google about the implementation of Ruby Enterprise Edition, and it's now available on YouTube.

It's good stuff if you are a VM geek, like me. They cover two major topics:

  1. How they made the Ruby 1.8 garbage collector Copy on Write (CoW) friendly, which allows multiple Ruby processes to share memory for unchanged objects. This greatly reduces the footprint of a typical Rails deployment, for example. They talk about a series of attempts to change the Ruby GC to move the mark bits from the objects themselves to a separate memory structure. The result is significant memory savings, a slightly slower GC, but overall faster Ruby performance because they replaced malloc with a faster allocator (from Google apparently), which more than offsets the GC slowdown.
  2. How a contribution from the Event Machine developers improves thread context switching for Ruby 1.8's user space threads. Ruby 1.8 handles thread context switches by copying the execution stack to and from the heap, which can result in significant time being taken up by memcpy calls. Ruby Enterprise Edition now has an optional feature which instead switches the base stack pointer on a thread switch. This is processor specific, and is only available right now for Intel 32 and 64 bit processors.

So if this kind of stuff interests you, I'd recommend spending the half hour or so that it takes to watch the video.


It's the Little Things

Posted by Rick DeNatale Tue, 27 Oct 2009 14:26:00 GMT

$ rvm use 1.8.7
$ ruby -e'puts [1, "a"].to_s'
1a
$ rvm use 1.9
$ ruby -e'puts [1, "a"].to_s'
[1, "a"]

I've been working on converting my client's Rails App to Ruby 1.9.

It's been fairly painless, but there have been a few stumbling blocks. One has been the use of arrays within string interpolations, and the difference in the result.

Prior to Ruby 1.9, Array#to_s was a synonym for Array#join, which resulted in the concatenation of the results of sending to_s to each element. In Ruby 1.9, Array#to_s is the same as Array#inspect.

This has had a tendency to produce subtle problems which are at times hard to track down. Once they are found, the solution is to do something like changing:

"Whatever #{some_array}"

to:

"Whatever #{some_array.join}"
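
Here is the difference in a runnable form, as a quick sketch with a made-up some_array. The comments show what Ruby 1.9 and later print; the explicit join gives the same result on 1.8 as well.

```ruby
some_array = [1, "a"]

# Interpolation calls to_s, which on Ruby 1.9 and later is inspect.
puts "Whatever #{some_array}"
# prints Whatever [1, "a"]

# An explicit join gives the old concatenated form on every version.
puts "Whatever #{some_array.join}"
# prints Whatever 1a

puts some_array.to_s == some_array.inspect
# prints true
```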

Ruby Version Management: Multiruby and RVM

Posted by Rick DeNatale Wed, 02 Sep 2009 21:11:00 GMT

Many Rubyists find the need from time to time to run multiple versions of Ruby. If you are developing open-source code, it's a good idea to try to maintain compatibility with all three of the main versions of Ruby currently in use: 1.8.6, 1.8.7 and 1.9.

There have been some tools for this for a while now. A lot of you probably already know about multiruby, and many may be aware of a new gem called rvm for Ruby Version Manager.

These two are useful for different purposes. Multiruby excels at testing Ruby code against different versions, while rvm is great for quickly switching rubies so that you can play with one or the other. I think of RVM as a set of hand tools, and multiruby as a power tool workshop.

They complement each other, but I've had a few bumps getting them to work together, hence this article.

Multiruby

One tool which helps do this is the multiruby suite of tools which are part of Ryan Davis' ZenTest gem. There are three tools in this suite:

  • multiruby_setup, which allows you to install and maintain a collection of Ruby versions. The various versions are installed in a subdirectory of .multiruby in your home directory.
  • multiruby, which runs each of the installed ruby commands with the same arguments.
  • multigem, which uses multiruby to run the gem command in order to install gems in the right place for each of the ruby versions under that .multiruby directory.

For my RiCal gem, I have some rake tasks which run the rspec suite for the gem using multiruby, so I can be sure it still works with "the big three" before I publish a new version. To enable this I submitted a patch to RSpec to let you tell an instance of Spec::Rake::SpecTask where to find the 'ruby' command. David incorporated this several releases of RSpec ago. This lets me have this in a rake file:

  multiruby_path = `which multiruby`.chomp
  # method_defined? works whether instance_methods returns strings (1.8) or symbols (1.9)
  if multiruby_path.length > 0 && Spec::Rake::SpecTask.method_defined?(:ruby_cmd)
    namespace :multi do
      desc "Run all specs with multiruby and ActiveSupport"
      Spec::Rake::SpecTask.new(:with_active_support) do |t|
        t.spec_opts = ['--options', "spec/spec.opts"]
        t.spec_files = FileList['spec/**/*_spec.rb']
        t.ruby_cmd = "#{multiruby_path}"
        t.verbose = true
        t.ruby_opts << "-r #{File.join(File.dirname(__FILE__), *%w[gem_loader load_active_support])}"
      end

      desc "Run all specs with multiruby and the tzinfo gem"
      Spec::Rake::SpecTask.new(:with_tzinfo_gem) do |t|
        t.spec_opts = ['--options', "spec/spec.opts"]
        t.spec_files = FileList['spec/**/*_spec.rb']
        t.ruby_cmd = "#{multiruby_path}"
        t.verbose = true
        t.ruby_opts << "-r #{File.join(File.dirname(__FILE__), *%w[gem_loader load_tzinfo_gem])}"
      end
    end

    desc "run all specs under multiruby with ActiveSupport and also with the tzinfo gem"
    task :multi => [:"spec:multi:with_active_support", :"spec:multi:with_tzinfo_gem"]
  end  

I've got three tasks here because RiCal works with either the tzinfo gem OR activesupport from Rails, and I want to test each combination of gems and ruby versions.

RVM

Like multiruby, rvm lets you set up and use multiple versions of ruby. As I said above the difference here is that while multiruby runs them all together, rvm is for when you want to pick one to use for a while.

The rvm command is used to:

  • install a ruby implementation, specifying one of ruby for MRI ruby, ree for Ruby Enterprise Edition (a version of MRI patched for use with Passenger, a/k/a mod_rails), or jruby (surprisingly enough) for JRuby, and optionally the specific version and even patch level.
  • pick which ruby to use by using "rvm use which", where which is one of the above or default for the standard ruby installation for your system.

as well as other management functions.

The rvm gem is actually a thin ruby wrapper around some bash scripts. The way rvm works is to set up shell environment variables when you use "rvm use" so you get the right ruby executables and environment, and there lies the rub.

Who's got the Gem

Now did I tell you that I decided to add rvm to my arsenal right after I upgraded my MacBook to run Snow Leopard?

Because of this I had to rebuild a lot of my ruby development tool chain. I decided just to 'fault in' things that I found to be missing when I found that they were missing. A lot of those things were gems. So I'd run my various ruby projects, and when I found a missing gem, I'd install it.

So I ran my normal spec tasks against RiCal, and installed the missing gems. When I got those working, I ran the multiruby tasks, and found that the tzinfo gem was missing. This wasn't a surprise since multiruby (like rvm) maintains a separate set of gems for each implementation. It was just a matter of "multigem install tzinfo" and move on to the next step. Wrong!

Multiruby reported that it had installed the tzinfo gem for each of the installed multiruby implementations, but when I ran the rake task again, no joy, same thing. Running "multigem list" revealed that there were no gems for any of the multiruby installs!

After a bit of head-scratching, I realized that rvm was setting GEM_HOME so that the gem command would know where to look for and install gems, and this was confusing multigem. Multigem simply runs the gem command with each ruby, which normally installs gems relative to that implementation's installation directory. But GEM_HOME overrides this, so multigem was just reinstalling the gem three times in whatever directory rvm wanted them.
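One way to see this interaction is to compare the GEM_HOME override with the directory RubyGems would use on its own. This is only a minimal sketch, not something rvm or multiruby provides: gem_location_report is a hypothetical helper name, and it assumes Gem.default_dir reports the per-installation default.

```ruby
require 'rubygems'

# Hypothetical helper (not part of rvm, multiruby, or RubyGems).
# env defaults to the real environment; pass a Hash to simulate one.
def gem_location_report(env = ENV)
  override = env["GEM_HOME"]
  {
    :gem_home   => override,          # nil when nothing overrides the default
    :default    => Gem.default_dir,   # derived from the running ruby's own installation
    :overridden => !override.nil?
  }
end

report = gem_location_report
puts "GEM_HOME:    #{report[:gem_home].inspect}"
puts "default dir: #{report[:default]}"
puts "overridden:  #{report[:overridden]}"
```

Run under each ruby after "rvm use", a non-nil GEM_HOME is the sign that the gem command will ignore the default directory.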

The Workaround

What's working for me is to use the bash command "unset GEM_HOME" before running multigem. This removes the variable entirely, and multigem goes back to working "normally." It's not ideal but it works.
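If you'd rather not touch the shell session, the same idea can be sketched in Ruby, for example from a Rakefile. The helper name without_gem_home is my own invention for illustration, and the multigem call in the comment is an assumed usage, not something I've wired into RiCal.

```ruby
# Hypothetical helper: run a block with GEM_HOME removed, then restore it.
def without_gem_home
  saved = ENV.delete("GEM_HOME")      # returns the old value, or nil if unset
  yield
ensure
  ENV["GEM_HOME"] = saved if saved    # put rvm's setting back afterwards
end

# Assumed usage: child processes started inside the block inherit the
# environment without GEM_HOME.
# without_gem_home { system("multigem", "install", "tzinfo") }
```

The ensure clause restores the variable even if the block raises, so the rest of the session still sees rvm's setting.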


RSpec Meet MultiRuby

Posted by Rick DeNatale Wed, 24 Jun 2009 23:41:00 GMT

Yesterday, David released RSpec 1.2.7, which includes a patch I provided to allow the specification of where to find the 'ruby' program when creating a SpecTask, rather than relying on Rake's RUBY variable.

Why did I submit this patch you ask, assuming you didn't read the title of this post?

So you can do this in a Rakefile:

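require 'spec/rake/spectask'

# Locate multiruby (part of the ZenTest gem); empty string if it isn't installed
multiruby_path = `which multiruby`.chomp
# method_defined? takes a symbol and behaves the same on Ruby 1.8 and 1.9
if multiruby_path.length > 0 && Spec::Rake::SpecTask.method_defined?(:ruby_cmd)
  namespace :spec do
    desc "Run all specs with multiruby and ActiveSupport"
    Spec::Rake::SpecTask.new(:multi) do |t|
      t.spec_opts = ['--options', "spec/spec.opts"]
      t.spec_files = FileList['spec/**/*_spec.rb']
      t.ruby_cmd = multiruby_path   # run the specs under multiruby instead of ruby
    end
  end
end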

This is derived from something I just added to RiCal but haven't yet released.

What it does is check that you have multiruby, which is part of the zentest gem, installed, and that your version of RSpec supports the new ruby_cmd option. If both conditions are met it makes a spec task which runs the specs using multiruby instead of ruby.

Now it's easy to run specs with the various ruby versions you want to support.


Silly Sinatra Application

Posted by Rick DeNatale Mon, 15 Jun 2009 03:46:00 GMT

Strangers in the Night

require 'rubygems'
require 'sinatra' 

def be
  "do, be, do, be, do"
end 

get '/strangers' do
   be do 
       be do
       end
   end
end

The thought occurred to me during Glenn Vanderburg's presentation on Sinatra at RubyRX a few months back.

Singleton Methods in Smalltalk and Ruby

Posted by Rick DeNatale Sat, 30 May 2009 21:29:00 GMT

Yesterday, Travis Griggs posted an interesting article on his blog about a couple of tricks he used to write a test which needed to ensure that a race condition actually happened during the test.

And Randal Schwartz just discovered it too. These two posts point out some interesting similarities and differences between Ruby and Smalltalk.

Singleton methods

A lot of folks point to Smalltalk as a source for the kind of metaprogramming techniques that we Ruby programmers take for granted. Smalltalk does allow a lot of metaprogramming, but things like instance-specific behavior aren't part of the standard repertoire of most Smalltalkers. That's why Randal, who is no slouch at Smalltalk, expressed a certain amount of amazement at how Travis did this. Here's his (Travis') example Smalltalk code:

p := 4 @ 3.
p changeClassTo: (p class copy superclass: p class).
p class methodDictionary at: #negated put: (p class methodDictionary at: #transpose).
p negated

For those unfamiliar with Smalltalk, the expression 4 @ 3 creates an instance of Point, which is a 2-d point object with x=4 and y=3.

Let's write a Point class in Ruby which works like a subset of Smalltalk's Point:

class Point
  attr_accessor :x, :y
  def initialize(x, y)
    @x, @y = x, y
  end
  
  def negated
    self.class.new(-x, -y)
  end
  
  def transpose
    self.class.new(y, x)
  end
  
  def inspect
    "#{@x} @ #{@y}"
  end
end

p = Point.new(4,3) # => 4 @ 3

p.negated          # => -4 @ -3
p.transpose        # => 3 @ 4

Now, in Ruby, Travis' Smalltalk code might look something like this:

p = Point.new(4,3)
def p.negated
  transpose
end

p.negated # => 3 @ 4

Most Rubyists of any experience will recognize that what I've done is define a singleton method for that sole instance of Point, which overrides the negated method by calling transpose instead.
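To make the "sole instance" part concrete, here is a small self-contained sketch. It repeats a trimmed-down Point so it runs on its own; the variable names special and ordinary are just for illustration.

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  def negated
    self.class.new(-x, -y)
  end

  def transpose
    self.class.new(y, x)
  end

  def inspect
    "#{x} @ #{y}"
  end
end

special  = Point.new(4, 3)
ordinary = Point.new(4, 3)

# The singleton method is attached to this one object only.
def special.negated
  transpose
end

puts special.negated.inspect            # 3 @ 4
puts ordinary.negated.inspect           # -4 @ -3
puts special.singleton_methods.inspect  # [:negated]
```

Every other Point, including ones created later, keeps the normal negated.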

Another way to do this which might be a little bit more like what Travis shows in Smalltalk might be:

p = Point.new(4,3)
class <<p
  alias_method :negated, :transpose
end

Let's look at Travis' example and how he creates a 'singleton' class in Smalltalk. Here are the relevant lines:

p changeClassTo: (p class copy superclass: p class).
p class methodDictionary at: #negated put: (p class methodDictionary at: #transpose).

He gets p's class (which is Point up to now), copies it, and sets its superclass to p's class, effectively interposing the copied class object. Next he gets the transpose instance method from the copied class's method dictionary and uses it to replace the negated method. This is effectively what Ruby's alias_method does.
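Ruby's singleton class plays roughly the role of Travis' copied class: it sits between the object and Point. The following is a minimal sketch of that relationship, assuming Ruby 1.9.2 or later for Object#singleton_class, and using a stripped-down Point so that it is self-contained.

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  def negated
    self.class.new(-x, -y)
  end

  def transpose
    self.class.new(y, x)
  end
end

pt = Point.new(4, 3)
class << pt
  alias_method :negated, :transpose
end

interposed = pt.singleton_class
# The singleton class sits directly in front of Point
puts interposed.superclass == Point
# The alias is stored in the singleton class, not in Point itself
puts interposed.instance_methods(false).include?(:negated)
# Object#class skips the singleton class and still answers Point
puts pt.class == Point
```

Unlike the Smalltalk version, Ruby hides the interposed class from Object#class, so code that checks pt.class sees no change.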

Now, I should point out here that Travis's example is probably specific to the Cincom VisualWorks Smalltalk dialect. The changeClassTo: method doesn't seem to be available in Squeak, although there might be a similar method, and I don't recall a similar method in the Smalltalks I've used in the past. Also, changeClassTo: isn't entirely guaranteed to work. Ruby has its own cases where it can't create a singleton class (for example immediate objects like Fixnums), but beyond those, changeClassTo: requires that the 'shape' of the instance conform to the instance variable template defined in the new class. Simplifying things a bit, that means the instance needs to have either exactly the number of instance variables expected by the class, or at least the number of fixed instance variables expected by the class. Smalltalk classes can have a variable set of indexed instance variables which are placed after all the fixed (named) instance variables.
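The Ruby half of that caveat is easy to check for yourself. This is a minimal sketch, assuming Ruby 1.9.2 or later for Object#singleton_class; singleton_allowed? is a hypothetical helper name.

```ruby
# Hypothetical helper: true if Ruby can give obj a singleton class.
def singleton_allowed?(obj)
  obj.singleton_class   # raises TypeError for immediates such as small integers
  true
rescue TypeError
  false
end

puts singleton_allowed?(Object.new)  # true
puts singleton_allowed?(42)          # false
puts singleton_allowed?(:name)       # false
```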

So let's wrap up this comparison of instance-specific method creation in Ruby and Smalltalk before I move on to another interesting difference which Travis' article exposes:

  1. Smalltalk provides dialect-specific mechanisms for monkeying with the class of an object and manipulating methods.
  2. Ruby has some nice syntactic sugar for doing this in a way which is part of the language definition.

As a result, these techniques are far more commonly used in Ruby than in Smalltalk.

Now for the second difference, one where Smalltalk leaves Ruby a bit behind.

Turtles All the Way Down

One of the really interesting things about Smalltalk is just how much of the runtime system is exposed as 'normal' Smalltalk objects. In Ruby certain things are hidden away (sometimes not so securely) from the Ruby programmer. There's a "Wizard of Oz" behind the curtain which separates the Ruby program from the VM (or interpreter if you prefer).

Almost everything the VM deals with in Smalltalk pokes up as one or more Smalltalk objects. In Smalltalk everything really is an object. It's turtles all the way down, or at least it looks that way.

Travis' motivating problem was to force a race condition to occur on demand. Anyone who has done concurrent programming has learned that race conditions have probabilistic minds of their own.

The lever that Travis pulled was the fact that Smalltalk threads (which Smalltalk calls Processes for historic reasons) are implemented as Smalltalk objects which can be manipulated directly from Smalltalk. The Smalltalk IDE makes use of this. For example, the Smalltalk debugger is really just a specialized inspector which inspects a Process object, showing its state (the stack frames) and letting you manipulate it (breaking, stepping, and responding when class definitions change).

So Travis could grab the instance of a single process and change its instance-specific behavior to modify the way it handled termination in order to simulate the condition he needed.
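For comparison, Ruby does expose threads as objects, though with far less of the machinery behind them than a Smalltalk Process. Here is a minimal sketch of what is reachable from plain Ruby, assuming Ruby 1.9 or later for Thread#backtrace. The helper thread_snapshot is hypothetical, and nothing here reproduces Travis' technique.

```ruby
# Hypothetical helper: collect what a Thread object will tell us about itself.
def thread_snapshot(thread)
  frames = thread.backtrace || []   # backtrace is nil once the thread is dead
  { :status => thread.status, :alive => thread.alive?, :frames => frames.length }
end

worker = Thread.new { sleep }              # parks itself until killed
sleep 0.01 until worker.status == "sleep"  # wait for it to get there
p thread_snapshot(worker)

# A singleton method can be added to one thread like any other object.
# This adds a method; it does not hook into how the VM schedules or
# terminates the thread.
def worker.describe
  "thread is #{status}"
end
puts worker.describe

worker.kill
worker.join
p thread_snapshot(worker)
```

You get the status and a backtrace as strings, but not live stack frame objects you can step or rewrite, which is the part the Smalltalk debugger builds on.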

This aspect of Smalltalk is something which many, maybe too many, Smalltalk programmers are familiar with. The second part of that old article link I just snuck in talks about the distributed version of Smalltalk I did at IBM many years ago. It added a new kind of Process proxy which tied together execution threads that crossed machine boundaries, allowing distributed debugging, exception handling, etc.

Can Ruby get along without this? Sure. But it is one thing I miss from Smalltalk.