I'm working on a project which uses RSpec, Bundler, and Rails 2.3.4.
The team has been struggling with the changes in gem management with Bundler. The "Old School" Rails way to configure gems for different Rails environments is to conditionally call config.gem based on the Rails environment, then use the rake gems:install task to install the gems, once for each environment.
Bundler does things a bit differently. It uses groups to segregate gems by environment, so you name the gems needed for the test environment in a :test group. When you run bundle install, it installs all of the gems for all of the groups (unless you opt out of some groups). It uses the groups at run time, to selectively expose only gems applicable to the current environment.
This causes problems for things like the spec rake task if you run it without explicitly setting the Rails environment.
The way the spec rake task works is that it depends on a Rails-provided rake task called :environment, which 'boots' the Rails environment. Then the spec_helper file, which each spec should require, sets ENV['RAILS_ENV'] to 'test' if it's not already set, and requires needed gem code, like 'spec/rails'.
In the old school approach this works, since those gems are installed and visible. With Bundler it fails: the Bundler environment was set up using the development environment, so gems in the :test group aren't exposed.
In an attempt to fix this, other members on the team were mixing things up, doing things like putting conditional tests of the rails environment in the Gemfile, and then running the bundle command with different overrides of the RAILS_ENV environment variable.
But that way lies madness.
This morning I worked out a solution which involves deferring setting up the bundle environment.
- I needed to keep the spec task from running the environment task. The project was already using a variation of override_rake_task, so I added an override in a task within the lib directory:
override_task :spec do
  Rake::Task["spec:original"].execute
end
- The next step was to bootstrap the rails environment in spec_helper. To do this I started it with:
ENV["RAILS_ENV"] ||= 'test'
RAILS_ROOT = "#{File.dirname(__FILE__)}/.." unless defined?(RAILS_ROOT)
environment_path = File.expand_path(File.join(RAILS_ROOT, 'config', 'environment'))
require(environment_path)
require 'spec/rails'
#... any other needed requires
It's crucial to expand the path, which ensures that the same string is used whenever the environment file is required. Before I did that I was having problems with the environment being required a second time, presumably for the second spec. If you have any code which patches anything using alias_method_chain, or a similar technique, loading that code a second time can cause infinite loops which can be mystifying.
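To make the double-load hazard concrete, here is a minimal sketch in plain Ruby. It is not the project's code: the "/app" root, the Greeter class and the patch lambda are all made up for illustration, and plain alias_method stands in for ActiveSupport's alias_method_chain. The first part shows File.expand_path turning two spellings of one path into the same string; the second shows what happens when the same alias-based patch is applied twice.

```ruby
# Part 1: two spellings of one file collapse to a single string.
# "/app" is a hypothetical project root.
from_spec_helper = File.join("/app/spec/..", "config", "environment")
from_elsewhere   = File.join("/app", "config", "environment")

same_before = (from_spec_helper == from_elsewhere)
same_after  = (File.expand_path(from_spec_helper) == File.expand_path(from_elsewhere))
puts "same string before expand_path? #{same_before}"
puts "same string after expand_path?  #{same_after}"

# Part 2: applying an alias-chain style patch twice.
class Greeter
  def greet
    "hello"
  end
end

# Stand-in for a file of patches that gets loaded more than once.
patch = lambda do
  Greeter.class_eval do
    def greet_with_shout
      greet_without_shout.upcase
    end
    alias_method :greet_without_shout, :greet
    alias_method :greet, :greet_with_shout
  end
end

patch.call
first_result = Greeter.new.greet   # patched once: works as intended

patch.call                         # "loaded" a second time
second_result = begin
  Greeter.new.greet                # greet_without_shout now points at the patched method
rescue SystemStackError => e
  e.class
end

puts "after one load:  #{first_result.inspect}"
puts "after two loads: #{second_result.inspect}"
```

After the second application, greet_without_shout is an alias of the already-patched greet, so the method ends up calling itself until the stack overflows.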
So this seems to be working. I'll try to update if I find anything else, but in the meantime I hope some folks find it useful.
Recently, Yehuda Katz wrote an article in reaction to a Pythonista's criticism of Ruby and in particular to the complaint that you can't call a proc with parentheses:
my_method = Proc.new {1 + 2}
my_method()
Yehuda did a great job of defending why this is consistent with the rest of Ruby, and talked about how blocks in Ruby are really used. As part of the article, he conflated blocks and Procs. The difference is rather subtle, and in most cases can be ignored. But Giles pointed out that the difference is there, and is documented both in the Pickaxe and in David Flanagan and Matz's "The Ruby Programming Language", and I posted a comment in support of Giles after Yehuda pushed back, maintaining his view that there was no difference.
Yehuda followed up with another article, which prompted this article.
Let me start by saying that I'm not interested in starting a war here. First, I've nothing but respect for Yehuda, and I'm very impressed by his body of work, particularly the refactoring of Rails from Rails 2 to Rails 3. Second, I agree that most of this stuff really doesn't matter all that much, and only when discussing things at a rather deep level. It can be like a debate about how many angels can dance on the head of a pin! But such discussions can be fun and in rare instances even enlightening.
Proc Identity
Here's Yehuda's example from the second article:
def foo
  yield
end

def bar(&block)
  puts block.object_id
  baz(&block)
end

def baz(&block)
  puts block.object_id
  yield
end

foo { puts "HELLO" } #=> "HELLO"
bar { puts "HELLO" } #=> "2148083200\n2148083200\nHELLO"
He points out that the identity of the proc object bound to the two arguments named block in the bar and baz methods doesn't change. He then gives a slightly different example:
def foo(&block)
  puts block.object_id
  yield
end

b = Proc.new { puts "OMG" }
puts b.object_id
foo(&b) #=> 2148084040\n2148084040\nOMG
and then proposes two "mental models" of what's going on:
- The &b unwraps the Proc object, and the &block recasts it into a Proc. However, it somehow also wraps it back into the same wrapper that it came from into the first place. or...
- The &b puts the b Proc into the block slot in foo’s argument list, and the &block gives the implicit Proc a name. There is no need to explain why the Proc has the same object_id; it is the same Object!
He then says that the first actually represents the MRI implementation. Based both on a reading of Flanagan and Matz and the MRI code, I respectfully disagree; the truth lies in the middle.
The first thing to realize is that the use of & before the last argument of a method definition is not the same thing as the use of & before the last parameter value in a method invocation.
An analogy with Splat
Ruby has a similar prefix *, a.k.a splat, or what David Black likes to call "the unary unarray" operator. Actually depending on where it occurs * is either an "unarray" or a "make array" operator.
Consider this code:
array = [1, 2, 3]

def three_args(a, b, c)
  "a is #{a}, b is #{b}, c is #{c}"
end

a, b, c = *array
a # => 1
b # => 2
c # => 3
three_args(*array) # => "a is 1, b is 2, c is 3"

def glob_args(*args)
  args
end

x, *array2 = 4, 5, 6
x # => 4
array2 # => [5, 6]
glob_args(:now, :is, "the", Time) # => [:now, :is, "the", Time]
As I said above, * is an unarray operator when it appears either in the right hand side of an assignment, or in the argument list of a method invocation. It acts as an operator which coalesces multiple values into an array if it appears on the left hand side of an assignment or in front of a formal parameter in a method definition.
One important consideration in the use of both splat and proc arguments, is that the method definition doesn't "know" whether the method will be invoked like this:
foo # with no block
# or this:
foo { 1 }
# or this:
foo(&b)
# or even this:
proc = Proc.new {1}
foo(proc)
All of these are valid, and need to work.
Method definition with a & Formal Parameter
In Yehuda's last quoted example this is the def foo case.
Method Invocation with a & arg
Two things happen when a method is called with a & argument. The cases of calling the method with either an implicit block, an explicit value for the argument, or no argument for the block at all, all need to be handled. And the cases of explicitly calling the argument, e.g. via block.call, and yielding to the block both have to work.
So the &block argument in the definition of foo means that the method prelude ensures that the block argument will either be nil or will refer to an object which is a Proc, or at least appears to be a proc. It does this at the entry to the invocation of the method.
It also must ensure that yield semantics will work whether the method was called with an implicit block, or an explicit value for the parameter. Note that in MRI, block yield works by reference to a field called iter in the current stack frame, which in Ruby 1.8 is used to find the node in the abstract syntax tree of the method which defined the block which corresponds to that block. And yield is implemented by evaluating the subtree of the AST rooted at that node. Of course the representation in YARV in Ruby 1.9 represents executable code differently but the effect is the same. The yield keyword deals with the internal "VM" representation of the executable ruby code directly without surfacing it as a Ruby object. This is the real difference between a block and an instance of Proc.
The draft Ruby standard abstracts this iter field a bit using the notation [block] to refer to a logical stack of blocks in the execution context. The draft distinguishes between blocks and procs. Since I started writing this article, Avdi Grimm wrote another reaction to Yehuda's second article, which looks at the same issues I'm talking about here from the perspective of the draft standard.
With all that said, here's how that formal block argument is handled when the foo method is invoked:
- If block is nil, then it sees if a block was given in the method call, and if so, it creates a Proc object which will cause the block to be executed when the proc is called.
- If block is NOT nil
  - If the value of block is not already an instance of Proc, send :to_proc to the value (with a guard to see if it responds, but that's a minor implementation decision to avoid the overhead of catching a NoMethodError). This is why defining Symbol#to_proc allows writing things like (1..10).map(&:succ).
  - Set up the VM so that yield will work should that happen. In Ruby 1.8 this involves correctly pushing an iterator onto the stack frame. I haven't read through the YARV implementation but I'm sure that it has the same effect.
- If the object referenced by block isn't a proc or convertible using #to_proc, to a 'proc-like' duck which can quack to the tune of #call, then a TypeError is raised.
Note that this isn't wrapping the argument with a proc; it's ensuring that we have an object which acts as a proc. If it needs to be 'cast' into a proc it will be, but if it's already a real Proc, or it responds to to_proc by returning self, then it will be the same object.
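The cases listed above can be observed from plain Ruby. This is a minimal sketch of the visible behaviour only, not of MRI's internals; capture and yielder are hypothetical helper names introduced for the illustration.

```ruby
# Returns whatever ended up bound to the &block parameter.
def capture(&block)
  block
end

# Uses yield only, never names the block.
def yielder
  yield
end

existing = Proc.new { :from_proc }

same_object  = capture(&existing).equal?(existing) # a real Proc is passed through untouched
from_literal = capture { :from_block }             # a literal block is turned into a Proc
no_block     = capture                             # nil when no block is given
from_symbol  = capture(&:succ)                     # Symbol#to_proc performs the conversion
yielded      = yielder(&existing)                  # yield works with an explicit & argument

type_error = begin
  capture(&Object.new)                             # no to_proc, so the call is rejected
rescue TypeError => e
  e.class
end

puts "same object passed through: #{same_object}"
puts "literal block became a:     #{from_literal.class}"
puts "no block gives:             #{no_block.inspect}"
puts "symbol proc on 1 gives:     #{from_symbol.call(1)}"
puts "yield through &proc gives:  #{yielded.inspect}"
puts "non-proc argument raises:   #{type_error}"
```

The first line is the object_id observation from Yehuda's example: when the value after & is already a Proc, nothing is wrapped or copied.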
Method invocation with a & prefix on the last argument value
This part is a bit simpler. The argument itself is just passed through. The trick is that in the process of invoking the method, the sending code must do the same thing as step 2.2 above in case the method does a yield.
Does it Matter?
Barring any embarrassing mistakes on my part, this is what happens in MRI ruby, as described in section 6.4.5 Block Arguments of Flanagan and Matsumoto, as well as my reading of the MRI code.
Yehuda makes the point that this is all pretty invisible to the Ruby programmer, and he's right. It would seem that a Ruby implementation could ALWAYS turn blocks into procs and not have a separate hidden iter structure in the VM. One reason for not doing so is performance. Since Procs are closures and capture the bindings of any variables in their scope, there is some overhead to their creation and destruction. If a block is only accessed via yield, then it's guaranteed not to have a lifetime past the return of the called method. So this is an optimization. And such optimizations are known in other dynamic language implementations. Smalltalk gives the illusion of uniformly using closures to represent blocks, but most implementations cheat and recognize cases where the overhead of creating a closure can be avoided. In some cases this is invisible to the Smalltalk programmer, but not always.
So although we might not know exactly how many angels are dancing on the head of the pin, or what steps they are doing, the Ruby language books and the draft standard let it slip that they are there.
Last week the Phusion guys gave a talk at Google about the implementation of Ruby Enterprise Edition, and it's now available on YouTube.
It's good stuff if you are a VM geek, like me. They cover two major topics:
- How they made the Ruby 1.8 garbage collector Copy on Write (CoW) friendly, which allows multiple Ruby processes to share memory for unchanged objects. This greatly reduces the footprint of a typical Rails deployment, for example. They talk about a series of attempts to change the Ruby GC to move the mark bits from the objects themselves to a separate memory structure. The result is significant memory savings, a slightly slower GC, but overall faster Ruby performance because they replaced malloc with a faster allocator (from Google apparently) which more than offsets the GC performance.
- How a contribution from the EventMachine developers improves thread context switching for Ruby 1.8's user space threads. Ruby 1.8 handles thread context switches by copying the execution stack to and from the heap, which can result in significant time being taken up by memcpy calls. Ruby Enterprise Edition now has an optional feature which instead switches the base stack pointer on a thread switch. This is processor specific, and is only available right now for Intel 32 and 64 bit processors.
So if this kind of stuff interests you, I'd recommend spending the half hour or so that it takes to watch the video.
$ rvm use 1.8.7
$ ruby -e'puts [1, "a"].to_s'
1a
$ rvm use 1.9
$ ruby -e'puts [1, "a"].to_s'
[1, "a"]
I've been working on converting my client's Rails App to Ruby 1.9.
It's been fairly painless, but there have been a few stumbling blocks. One has been the use of arrays within string interpolations, and the difference in the result.
Prior to Ruby 1.9 Array#to_s was a synonym for Array#join, which resulted in the concatenation of the results of sending to_s to each element. In Ruby 1.9 Array#to_s is the same as Array#inspect.
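A short sketch of the Ruby 1.9 and later behaviour, with interpolation going through to_s; the parts variable is just an illustration:

```ruby
parts = [1, "a"]

as_to_s         = parts.to_s            # same text as parts.inspect on 1.9 and later
as_join         = parts.join            # what to_s produced before 1.9
interpolated    = "Whatever #{parts}"   # interpolation calls to_s
interpolated_ok = "Whatever #{parts.join}"

puts as_to_s
puts as_join
puts interpolated
puts interpolated_ok
```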
This has had a tendency to produce subtle problems which are at times hard to track down. Once they are found, the solution is to do something like changing:
"Whatever #{some_array}"
to:
"Whatever #{some_array.join}"
Many Rubyists find the need from time to time to run multiple versions of Ruby. If you are developing open-source code, it's a good idea to try to maintain compatibility with all three of the main versions of Ruby currently in use: 1.8.6, 1.8.7 and 1.9.
There have been some tools for this for a while now. A lot of you probably already know about multiruby, and many may be aware of a new gem called rvm for Ruby Version Manager.
These two are useful for different purposes. Multiruby excels at testing ruby code against different versions, while rvm is great for quickly switching rubies so that you can play with one or the other. I think of RVM as a set of hand tools, and multiruby as a power tool workshop.
They complement each other, but I've had a few bumps getting them to work together, hence this article.
Multiruby
One tool which helps do this is the multiruby suite of tools which are part of Ryan Davis' ZenTest gem. There are three tools in this suite:
- multiruby_setup, which allows you to install and maintain a collection of Ruby versions. The various versions are installed in a subdirectory of .multiruby in your home directory.
- multiruby, which runs each of the installed ruby commands with the same arguments.
- multigem, which uses multiruby to run the gem command in order to install gems in the right place for each of the ruby versions under that .multiruby directory.
For my
multiruby_path = `which multiruby`.chomp
if multiruby_path.length > 0 && Spec::Rake::SpecTask.instance_methods.include?("ruby_cmd")
  namespace :multi do
    desc "Run all specs with multiruby and ActiveSupport"
    Spec::Rake::SpecTask.new(:with_active_support) do |t|
      t.spec_opts = ['--options', "spec/spec.opts"]
      t.spec_files = FileList['spec/**/*_spec.rb']
      t.ruby_cmd = "#{multiruby_path}"
      t.verbose = true
      t.ruby_opts << "-r #{File.join(File.dirname(__FILE__), *%w[gem_loader load_active_support])}"
    end

    desc "Run all specs multiruby and the tzinfo gem"
    Spec::Rake::SpecTask.new(:with_tzinfo_gem) do |t|
      t.spec_opts = ['--options', "spec/spec.opts"]
      t.spec_files = FileList['spec/**/*_spec.rb']
      t.ruby_cmd = "#{multiruby_path}"
      t.verbose = true
      t.ruby_opts << "-r #{File.join(File.dirname(__FILE__), *%w[gem_loader load_tzinfo_gem])}"
    end
  end

  desc "run all specs under multiruby with ActiveSupport and also with the tzinfo gem"
  task :multi => [:"spec:multi:with_active_support", :"spec:multi:with_tzinfo_gem"]
end
I've got three tasks here because RiCal works with either the tzinfo gem OR activesupport from Rails, and I want to test each combination of gems and ruby versions.
RVM
Like multiruby, rvm lets you set up and use multiple versions of ruby. As I said above the difference here is that while multiruby runs them all together, rvm is for when you want to pick one to use for a while.
The rvm command is used to:
- install a ruby implementation, specifying one of: ruby for MRI ruby; ree for Ruby Enterprise Edition, a version of MRI patched for use with passenger (a/k/a mod-rails); or jruby, surprisingly enough, for JRuby; and optionally the specific version and even patch level.
- pick which ruby to use by using "rvm use which", where which is one of the above or default for the standard ruby installation for your system.
as well as other management functions.
The rvm gem is actually a thin ruby wrapper around some bash scripts. The way rvm works is to set up shell environment variables when you use "rvm use" so you get the right ruby executables and environment, and there lies the rub.
Who's got the Gem
Now did I tell you that I decided to add rvm to my arsenal right after I upgraded my MacBook to run Snow Leopard?
Because of this I had to rebuild a lot of my ruby development tool chain. I decided just to 'fault in' things that I found to be missing when I found that they were missing. A lot of those things were gems. So I'd run my various ruby projects, and when I found a missing gem, I'd install it.
So I ran my normal spec tasks against RiCal, and installed the missing gems. When I got those working, I ran the multiruby tasks, and found that the tzinfo gem was missing. This wasn't a surprise since multiruby (like rvm) maintains a separate set of gems for each implementation. It was just a matter of "multigem install tzinfo" and move on to the next step. Wrong!
Multiruby reported that it had installed the tzinfo gem for each of the installed multiruby implementations, but when I ran the rake task again, no joy, same thing. Running "multigem list" revealed that there were no gems for any of the multiruby installs!
After a bit of head-scratching, I realized that rvm was setting GEM_HOME so that the gem command would know where to look for and install gems. This was confusing multigem, which simply runs the gem command with each ruby, which normally ends up installing gems relative to that implementation's installation directory. But GEM_HOME overrides this, so multigem was just reinstalling the gem three times in whatever directory rvm wanted them.
The Workaround
What's working for me is to use the bash command "unset GEM_HOME" before running multigem. This removes the variable entirely, and multigem goes back to working "normally." It's not ideal but it works.
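If you would rather not disturb the shell that rvm has set up, the same idea can be scoped to a single command. The following is a hedged sketch rather than anything rvm or ZenTest provides: without_gem_home is a hypothetical helper that removes GEM_HOME from the environment only while its block runs, so a child process started inside the block does not see it.

```ruby
# Hypothetical helper: temporarily remove GEM_HOME, then put it back.
def without_gem_home
  saved = ENV.delete("GEM_HOME")   # returns the old value, or nil if it was unset
  yield
ensure
  ENV["GEM_HOME"] = saved if saved
end

# Intended use, with the multigem command from above (left commented out
# here because it needs ZenTest installed):
#   without_gem_home { system("multigem", "install", "tzinfo") }

# Demonstration with a made-up value:
ENV["GEM_HOME"] = "/tmp/example_gem_home"
seen_inside = without_gem_home { ENV["GEM_HOME"] }
puts "inside the block: #{seen_inside.inspect}"
puts "afterwards:       #{ENV["GEM_HOME"].inspect}"
```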
Yesterday, David released RSpec 1.2.7, which includes a patch I provided to allow the specification of where to find the 'ruby' program when creating a SpecTask, rather than relying on Rake's RUBY variable.
Why did I submit this patch you ask, assuming you didn't read the title of this post?
So you can do this in a Rakefile:
multiruby_path = `which multiruby`.chomp
if multiruby_path.length > 0 && Spec::Rake::SpecTask.instance_methods.include?("ruby_cmd")
  namespace :spec do
    desc "Run all specs with multiruby and ActiveSupport"
    Spec::Rake::SpecTask.new(:multi) do |t|
      t.spec_opts = ['--options', "spec/spec.opts"]
      t.spec_files = FileList['spec/**/*_spec.rb']
      t.ruby_cmd = multiruby_path
    end
  end
end
This is derived from something I just added to RiCal but haven't yet released.
What it does is check that you have multiruby, which is part of the zentest gem, installed, and that your version of RSpec supports the new ruby_cmd option. If both conditions are met it makes a spec task which runs the specs using multiruby instead of ruby.
Now it's easy to run specs with the various ruby versions you want to support.
require 'rubygems'
require 'sinatra'

def be
  "do, be, do, be, do"
end

get '/strangers' do
  be do
    be do
    end
  end
end
Yesterday, Travis Griggs posted an interesting article on his blog about a couple of tricks he used to write a test which needed to ensure that a race condition actually happened during the test.
And Randal Schwartz just discovered it too. These two posts point out some interesting similarities and differences between Ruby and Smalltalk.
Singleton methods
A lot of folks point to Smalltalk as a source for the kind of metaprogramming techniques that we Ruby programmers take for granted. Smalltalk does allow a lot of metaprogramming, but things like instance-specific behavior aren't part of the standard repertoire of most Smalltalkers. That's why Randal, who is no slouch at Smalltalk, expressed a certain amount of amazement at how Travis did this. Here's his (Travis') example Smalltalk code:
p := 4 @ 3.
p changeClassTo: (p class copy superclass: p class).
p class methodDictionary at: #negated put: (p class methodDictionary at: #transpose).
p negated
For those unfamiliar with Smalltalk, the expression 4 @ 3 creates an instance of Point, which is a 2-d point object with x=4 and y=3.
Let's write a Point class in Ruby which works like a subset of Smalltalk's Point
class Point
  attr_accessor :x, :y
  def initialize(x, y)
    @x, @y = x, y
  end
  def negated
    self.class.new(-x, -y)
  end
  def transpose
    self.class.new(y, x)
  end
  def inspect
    "#{@x} @ #{@y}"
  end
end
p = Point.new(4,3) # => 4 @ 3
p.negated # => -4 @ -3
p.transpose # => 3 @ 4
Now, using that class, Travis' Smalltalk code above might look something like this in Ruby:
p = Point.new(4,3)
def p.negated
  transpose
end
p.negated # => 3 @ 4
Most Rubyists of any experience will recognize that what I've done is define a singleton method for that sole instance of Point which overrides the negated method by calling transpose instead.
Another way to do this which might be a little bit more like what Travis shows in Smalltalk might be:
p = Point.new(4,3)
class << p
  alias_method :negated, :transpose
end
Let's look at Travis' example and how he creates a 'singleton' class in Smalltalk. Here are the relevant lines:
p changeClassTo: (p class copy superclass: p class).
p class methodDictionary at: #negated put: (p class methodDictionary at: #transpose).
He gets p's class (which is Point up to now), copies it, and sets its superclass to p's class, effectively interposing the copied class object. Next he gets the transpose instance method from the copied class's method dictionary and uses it to replace the negated method, which is effectively what Ruby's alias_method does.
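Ruby's counterpart to that interposed class copy is the singleton class, and it can be inspected directly. Here is a minimal sketch assuming Ruby 1.9.2 or later (for Object#singleton_class); the Point class is repeated so the example stands alone, and the variable names patched and plain are mine.

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  def negated
    self.class.new(-x, -y)
  end

  def transpose
    self.class.new(y, x)
  end

  def inspect
    "#{x} @ #{y}"
  end
end

patched = Point.new(4, 3)
plain   = Point.new(4, 3)

# Only this one instance gets the aliased method.
class << patched
  alias_method :negated, :transpose
end

patched_result = patched.negated.inspect
plain_result   = plain.negated.inspect

# The singleton class sits between the object and Point,
# much like the copied class in the Smalltalk example.
interposed   = (patched.singleton_class.superclass == Point)
still_point  = (patched.class == Point)
defined_here = patched.singleton_class.instance_methods(false).include?(:negated)

puts "patched.negated => #{patched_result}"
puts "plain.negated   => #{plain_result}"
puts "singleton class inherits from Point: #{interposed}"
puts "patched.class is still Point:        #{still_point}"
puts "negated lives in the singleton:      #{defined_here}"
```

One visible difference from the Smalltalk version is that patched.class still answers Point; Ruby hides the interposed class unless you ask for it.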
Now, I should point out here that Travis's example is probably specific to the Cincom VisualWorks Smalltalk dialect. The changeClassTo: method doesn't seem to be available in Squeak, although there might be a similar method, and I don't recall a similar method in the Smalltalks I've used in the past. Also changeClassTo: isn't entirely guaranteed to work. Besides cases like those where Ruby can't create a singleton class (for example immediate objects like Fixnums), changeClassTo: requires that the 'shape' of the instance conform to the instance variable template defined in the new class. Simplifying things a bit, that means that it needs to have either exactly the number of instance variables expected by the class, or at least the number of fixed instance variables expected by the class. Smalltalk classes can have a variable set of indexed instance variables which are placed after all the fixed (named) instance variables.
So let's end this exploration of comparative instance specific method creation between Ruby and Smalltalk before I move on to another interesting difference which Travis' article exposes:
- Smalltalk provides dialect-specific mechanisms for monkeying with the class of an object and manipulating methods.
- Ruby has some nice syntactic sugar for doing this in a way which is part of the language definition.
As a result, these techniques are far more commonly used in Ruby than in Smalltalk.
Now for the second difference, one where Smalltalk leaves Ruby a bit behind.
Turtles All the Way Down
One of the really interesting things about Smalltalk is just how much of the runtime system is exposed as 'normal' Smalltalk objects. In Ruby certain things are hidden away (sometimes not so securely) from the Ruby programmer; there's a "Wizard of Oz" behind the curtain which separates the Ruby program from the VM (or interpreter if you prefer).
Almost everything the VM deals with in Smalltalk pokes up as one or more Smalltalk objects. In Smalltalk everything really is an object, it's turtles all the way down, at least it looks that way.
Travis' motivating problem was to force a race condition to occur on demand. Anyone who has done concurrent programming has learned that race conditions have probabilistic minds of their own.
The lever that Travis pulled was the fact that Smalltalk threads (which Smalltalk calls Processes for historic reasons) are implemented as Smalltalk objects which can be manipulated directly from Smalltalk. The Smalltalk IDE makes use of this. For example the Smalltalk debugger is really just a specialized inspector which inspects a Process object, showing its state (the stack frames) and letting you manipulate it (breaking, stepping, and responding when class definitions change).
So Travis could grab the instance of a single process and change its instance-specific behavior to modify the way it handled termination in order to simulate the condition he needed.
This aspect of Smalltalk is something which many, maybe too many, Smalltalk programmers are familiar with. The second part of that old article link I just snuck in talks about the distributed version of Smalltalk I did at IBM many years ago, which added a new kind of Process proxy which tied together execution threads which crossed machine boundaries, allowing distributed debugging, exception handling etc.
Can Ruby get along without this? Sure. But it is one thing I miss from Smalltalk.
The other day, someone brought up a UTF-8 related issue with RiCal.
RFC 2445 specifies that each line of an iCalendar data stream must be no more than 75 bytes, and longer lines need to be folded by breaking them into sections, with the second and following sections put into lines with an initial space character to mark them as continuation lines. As was pointed out to me, simply breaking a UTF-8 string in Ruby runs the risk of splitting up a multi-byte character.
Here's a spec to show what I needed:
describe "String#utf8_safe_split" do
  context "For an all-ascii string" do
    before(:each) do
      @it = "abcdef"
    end
    it "should properly split an ascii string when n leaves 1 character" do
      @it.utf8_safe_split(5).should == ["abcde", "f"]
    end
    it "should return a nil remainder if the string has less than n characters" do
      @it.utf8_safe_split(7).should == ["abcdef", nil]
    end
    it "should return a nil remainder if the string has exactly n characters" do
      @it.utf8_safe_split(6).should == ["abcdef", nil]
    end
  end
  context "For a string containing a 2-byte UTF-8 character" do
    before(:each) do
      @it = "Café"
    end
    it "should split properly just before the 2-byte character" do
      @it.utf8_safe_split(3).should == ["Caf", "é"]
    end
    it "should split before when n is at the start of the 2-byte character" do
      @it.utf8_safe_split(4).should == ["Caf", "é"]
    end
    it "should split after when n is at the second byte of a 2-byte character" do
      @it.utf8_safe_split(5).should == ["Café", nil]
    end
  end
  context "For a string containing a 3-byte UTF-8 character" do
    before(:each) do
      @it = "Prix €200"
    end
    it "should split properly just before the 3-byte character" do
      @it.utf8_safe_split(5).should == ["Prix ", "€200"]
    end
    it "should split before when n is at the start of the 3-byte character" do
      @it.utf8_safe_split(6).should == ["Prix ", "€200"]
    end
    it "should split before when n is at the second byte of a 3-byte character" do
      @it.utf8_safe_split(7).should == ["Prix ", "€200"]
    end
    it "should split after when n is at the third byte of a 3-byte character" do
      @it.utf8_safe_split(8).should == ["Prix €", "200"]
    end
  end
end
So to fix this I came up with a pretty simple idea: split the string and check to see if the second part is valid UTF-8:
class String
  def valid_utf8?
    unpack("U") rescue nil
  end

  def utf8_safe_split(n)
    if length <= n
      [self, nil]
    else
      before = self[0, n]
      after = self[n..-1]
      until after.valid_utf8?
        n = n - 1
        before = self[0, n]
        after = self[n..-1]
      end
      [before, after.empty? ? nil : after]
    end
  end
end
In RiCal, I actually implemented this using functional methods in another object, since I didn't want to 'pollute' String's instance methods, but the code here illustrates the basic idea.
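The code above relies on Ruby 1.8 strings being indexed by byte. For comparison, here is a hedged sketch of the same idea for Ruby 1.9.3 or later, where strings are indexed by character, using String#byteslice and String#valid_encoding?. It is not RiCal's code: utf8_safe_byte_split and fold_line are hypothetical names, both assume valid UTF-8 input, and the folding step reflects my reading that the leading space of a continuation line counts toward the 75-byte limit.

```ruby
# Split so the first part is at most n bytes and no multi-byte
# character is cut in two. Returns [head, tail_or_nil].
def utf8_safe_byte_split(string, n)
  raise ArgumentError, "input must be valid UTF-8" unless string.valid_encoding?
  total = string.bytesize
  return [string, nil] if total <= n
  # Back up until the tail begins on a character boundary.
  n -= 1 until string.byteslice(n, total - n).valid_encoding?
  [string.byteslice(0, n), string.byteslice(n, total - n)]
end

# Fold one logical line into physical lines of at most limit bytes,
# marking each continuation line with a leading space.
def fold_line(line, limit = 75)
  # A continuation needs room for the space plus one character of up to 4 bytes.
  raise ArgumentError, "limit must be at least 5" if limit < 5
  first, rest = utf8_safe_byte_split(line, limit)
  folded = [first]
  while rest
    chunk, rest = utf8_safe_byte_split(rest, limit - 1)
    folded << " " + chunk
  end
  folded.join("\r\n")
end

sample = "Prix \u20ac200"   # the euro sign takes 3 bytes in UTF-8
p utf8_safe_byte_split(sample, 7)
p utf8_safe_byte_split(sample, 8)
p fold_line(sample, 7)
```

Unfolding is the reverse: removing each line break followed by a single space gives back the original string.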
After a few weeks of maturation on github, setting up a bug tracker, and a google group for project discussions, the bug reports died down to the point where I felt comfortable putting a more "official" release out on rubyforge.
Thanks to my most active 'beta-testers': Adam Williams, who really drove the calendar generation DSL, and Paul Scott-Murphy and Bruno Duyé, who gave a much needed workout to occurrence enumeration.
With folks from Australia and France providing input, it felt a bit like the old OTI days.




