Recently, Yehuda Katz wrote an article in reaction to a Pythonista's criticism of Ruby and in particular to the complaint that you can't call a proc with parentheses:
```ruby
my_method = Proc.new { 1 + 2 }
my_method() # NoMethodError: the parentheses make Ruby look for a method named my_method
```
Yehuda did a great job of defending why this is consistent with the rest of Ruby, and talked about how blocks are really used in Ruby. In the process, though, he conflated blocks and Procs. The difference is rather subtle, and in most cases can be ignored. But Giles pointed out that the difference is there, and that it is documented both in the Pickaxe and in David Flanagan and Matz's "The Ruby Programming Language". After Yehuda pushed back, maintaining that there was no difference, I posted a comment in support of Giles.
Yehuda followed up with another article, which prompted this one.
Let me start by saying that I'm not interested in starting a war here. First, I've nothing but respect for Yehuda, and I'm very impressed by his body of work, particularly the refactoring of Rails from Rails 2 to Rails 3. Second, I agree that most of this stuff really doesn't matter all that much, and only comes into play when discussing things at a rather deep level. It can be like a debate about how many angels can dance on the head of a pin! But such discussions can be fun and in rare instances even enlightening.
Proc Identity
Here's Yehuda's example from the second article:
```ruby
def foo
  yield
end

def bar(&block)
  puts block.object_id
  baz(&block)
end

def baz(&block)
  puts block.object_id
  yield
end

foo { puts "HELLO" } #=> "HELLO"
bar { puts "HELLO" } #=> "2148083200\n2148083200\nHELLO"
```
He points out that the identity of the proc object bound to the two arguments named block in the bar and baz methods doesn't change. He then gives a slightly different example:
```ruby
def foo(&block)
  puts block.object_id
  yield
end

b = Proc.new { puts "OMG" }
puts b.object_id
foo(&b) #=> 2148084040\n2148084040\nOMG
```
and then proposes two "mental models" of what's going on:
- The &b unwraps the Proc object, and the &block recasts it into a Proc. However, it somehow also wraps it back into the same wrapper that it came from in the first place. Or...
- The &b puts the b Proc into the block slot in foo’s argument list, and the &block gives the implicit Proc a name. There is no need to explain why the Proc has the same object_id; it is the same Object!
He then says that the first actually represents the MRI implementation. Based both on a reading of Flanagan and Matz and on the MRI code, I respectfully disagree; the truth lies somewhere in the middle.
The first thing to realize is that the use of & before the last parameter in a method definition is not the same thing as the use of & before the last argument in a method invocation.
An analogy with Splat
Ruby has a similar prefix, *, a.k.a. splat, or what David Black likes to call "the unary unarray" operator. Actually, depending on where it occurs, * is either an "unarray" or a "make array" operator.
Consider this code:
```ruby
array = [1, 2, 3]

def three_args(a, b, c)
  "a is #{a}, b is #{b}, c is #{c}"
end

a, b, c = *array
a # => 1
b # => 2
c # => 3
three_args(*array) # => "a is 1, b is 2, c is 3"

def glob_args(*args)
  args
end

x, *array2 = 4, 5, 6
x      # => 4
array2 # => [5, 6]
glob_args(:now, :is, "the", Time) # => [:now, :is, "the", Time]
```
As I said above, * is an unarray operator when it appears either on the right-hand side of an assignment or in the argument list of a method invocation. It acts as an operator which coalesces multiple values into an array when it appears on the left-hand side of an assignment or in front of a formal parameter in a method definition.
One important consideration in the use of both splat and proc arguments is that the method definition doesn't "know" whether the method will be invoked like this:
```ruby
foo # with no block
# or this:
foo { 1 }
# or this:
foo(&b)
# or even this:
proc = Proc.new { 1 }
foo(&proc)
```
All of these are valid, and need to work.
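As a quick check of the paragraph above, here is a minimal sketch of a method with a & formal parameter being invoked in each of those ways, plus a Symbol, which goes through #to_proc. The method names describe_block and identity and the proc pr are hypothetical, chosen only for illustration. The last line also shows the point behind Yehuda's object_id example: passing an existing Proc with & binds the very same object, not a copy.

```ruby
# A method with a & formal parameter; block is nil when no block is given.
def describe_block(&block)
  return :no_block if block.nil?
  [block.class, block.call(3)]
end

pr = Proc.new { |x| x * 2 }

describe_block               # => :no_block
describe_block { |x| x + 1 } # => [Proc, 4]
describe_block(&pr)          # => [Proc, 6]
describe_block(&:succ)       # => [Proc, 4] (via Symbol#to_proc)

# Passing an existing Proc with & binds the same object, not a copy.
def identity(&block)
  block
end

identity(&pr).equal?(pr)     # => true
```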
Method definition with a & Formal Parameter
In Yehuda's last quoted example this is the def foo case.
Method Invocation with a & arg
Two things have to be handled when a method is called with a & argument. First, the method may be called with an implicit block, with an explicit value for the argument, or with no block at all, and all of these cases need to be handled. Second, both explicitly calling the argument, e.g. via block.call, and yielding to the block have to work.
So the &block parameter in the definition of foo means that the method prelude ensures that block will either be nil or will refer to an object which is a Proc, or at least appears to be one. It does this on entry to the method.
It must also ensure that yield semantics will work whether the method was called with an implicit block or with an explicit value for the parameter. Note that in MRI, yield works by reference to a field called iter in the current stack frame. In Ruby 1.8 this is used to find the node, in the abstract syntax tree of the method in which the block was written, that corresponds to the block, and yield is implemented by evaluating the subtree of the AST rooted at that node. Of course, YARV in Ruby 1.9 represents executable code differently, but the effect is the same. The yield keyword deals with the internal "VM" representation of the executable Ruby code directly, without surfacing it as a Ruby object. This is the real difference between a block and an instance of Proc.
The draft Ruby standard abstracts this iter field a bit, using the notation [block] to refer to a logical stack of blocks in the execution context. The draft distinguishes between blocks and procs. Since I started writing this article, Avdi Grimm has written another reaction to Yehuda's second article, which looks at the same issues I'm discussing here from the perspective of the draft standard.
With all that said, here's how that formal block parameter is handled when the foo method is invoked:
1. If block is nil, it checks whether a block was given in the method call, and if so, creates a Proc object which will cause the block to be executed when the proc is called.
2. If block is NOT nil:
    1. If the value of block is not already an instance of Proc, send :to_proc to the value (with a guard to check that it responds, but that's a minor implementation decision to avoid the overhead of catching a NoMethodError). This is why defining Symbol#to_proc allows writing things like (1..10).map(&:succ).
    2. Set up the VM so that yield will work should that happen. In Ruby 1.8 this involves correctly pushing an iterator onto the stack frame; I haven't read through the YARV implementation, but I'm sure it has the same effect.
    3. If the object referenced by block isn't a proc and can't be converted by #to_proc into a 'proc-like' duck which can quack to the tune of #call, a TypeError is raised.
Note that this isn't wrapping the argument in a proc; it's ensuring that we have an object which acts as a proc. If it needs to be 'cast' into a proc it will be, but if it's already a real Proc, or it responds to to_proc by returning self, then it will be the same object.
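The following is a rough plain-Ruby model of the branch where block is not nil: reuse a Proc as is, otherwise try #to_proc, otherwise raise TypeError. The helper coerce_block_arg is hypothetical and is not how MRI is written; the implicit-block case and the VM bookkeeping for yield can't be expressed in plain Ruby at all. One detail it follows from MRI's observable behavior: the result of #to_proc must itself be a real Proc, or a TypeError is raised. The last few lines exercise the real thing for comparison.

```ruby
# Hypothetical helper modelling, in plain Ruby, how the value bound to a
# & parameter is normalized. This is NOT MRI's actual implementation.
def coerce_block_arg(value)
  return nil if value.nil?          # no block was passed at all
  return value if value.is_a?(Proc) # already a Proc: reuse the very same object

  unless value.respond_to?(:to_proc)
    raise TypeError, "wrong argument type #{value.class} (expected Proc)"
  end

  converted = value.to_proc
  # MRI also insists that to_proc hand back a real Proc.
  unless converted.is_a?(Proc)
    raise TypeError, "#{value.class}#to_proc did not return a Proc"
  end
  converted
end

pr = proc { :hi }
coerce_block_arg(pr).equal?(pr)     # => true
coerce_block_arg(:upcase).call("a") # => "A"

# The real thing: a value that can't become a Proc raises TypeError.
def take_block(&block)
  block
end

begin
  take_block(&Object.new)
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```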
Method invocation with a & prefix on the last argument value
This part is a bit simpler. The argument itself is just passed through. The trick is that, in the process of invoking the method, the sending code must do the same thing as step 2.2 above (setting up the VM so that yield will work), in case the method does a yield.
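To illustrate, here is a small sketch of a method with no & parameter at all, which can therefore reach its block only through yield and block_given?. Passing a Proc with & still works, which is why the calling side has to set things up for yield. The names call_twice and doubler are hypothetical.

```ruby
# A method with no & parameter: it can only reach the block via yield.
def call_twice
  return :no_block unless block_given?
  [yield(1), yield(2)]
end

doubler = Proc.new { |n| n * 2 }

call_twice            # => :no_block
call_twice(&doubler)  # => [2, 4]  (a Proc passed with & can still be yielded to)
call_twice { |n| -n } # => [-1, -2]
```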
Does it Matter?
Barring any embarrassing mistakes on my part, this is what happens in MRI Ruby, as described in section 6.4.5, "Block Arguments", of Flanagan and Matsumoto, and as I read the MRI code.
Yehuda makes the point that this is all pretty invisible to the Ruby programmer, and he's right. It would seem that a Ruby implementation could ALWAYS turn blocks into procs and not have a separate hidden iter structure in the VM. One reason for not doing so is performance. Since Procs are closures and capture the bindings of any variables in their scope, there is some overhead to their creation and destruction. If a block is only accessed via yield, then it's guaranteed not to outlive the return of the called method, so it never needs to become a Proc. So this is an optimization, and such optimizations are known in other dynamic language implementations. Smalltalk gives the illusion of uniformly using closures to represent blocks, but most implementations cheat and recognize cases where the overhead of creating a closure can be avoided. In some cases this is invisible to the Smalltalk programmer, but not always.
So although we might not know exactly which angels are dancing on the head of the pin, or what steps they are doing, the Ruby language books and the draft standard let slip that they are there.
Yesterday I found myself trying to figure out which version of Ruby was the latest on a particular date. I had trouble finding a resource with this information using Google, so I posted a query to the ruby-talk forum, and Urabe Shyouhei posted a list derived from the FTP site.
In the interest of posterity, here's the list as of now:
| Ruby Version | Release Date (MM/DD/YYYY, Japan time) |
|---|---|
| Ruby birthday | 02/24/1993 |
| 0.95 | 12/21/1995 |
| 1.0-961225 | 12/25/1996 |
| 1.0-971225 | 12/25/1997 |
| 1.1c0 | 07/17/1998 |
| 1.1c1 | 07/24/1998 |
| 1.1c2 | 08/11/1998 |
| 1.1c3 | 08/27/1998 |
| 1.1c4 | 09/03/1998 |
| 1.1c5 | 09/08/1998 |
| 1.1c6 | 10/05/1998 |
| 1.1c7 | 11/09/1998 |
| 1.1c8 | 11/19/1998 |
| 1.1c9 | 11/26/1998 |
| 1.2 | 12/25/1998 |
| 1.2.1 | 01/11/1999 |
| 1.2.1 | 01/12/1999 |
| 1.2.2 | 01/21/1999 |
| 1.2.3 | 02/16/1999 |
| 1.2.4 | 04/09/1999 |
| 1.2.5 | 04/13/1999 |
| 1.2.6 | 06/21/1999 |
| 1.4.0 | 08/13/1999 |
| 1.4.1 | 09/16/1999 |
| 1.4.2 | 09/17/1999 |
| 1.4.3 | 12/07/1999 |
| 1.4.4 | 04/14/2000 |
| 1.4.5 | 06/23/2000 |
| 1.4.6 | 08/16/2000 |
| 1.6.0 | 09/19/2000 |
| 1.6.1 | 09/27/2000 |
| 1.6.2 | 12/25/2000 |
| 1.6.3 | 03/20/2001 |
| 1.6.4 | 06/04/2001 |
| 1.6.5 | 09/19/2001 |
| 1.6.6 | 12/26/2001 |
| 1.6.7 | 03/01/2002 |
| 1.6.8 | 12/24/2002 |
| 1.8.0 | 08/04/2003 |
| 1.8.1 | 12/25/2003 |
| 1.8.2 | 12/25/2004 |
| 1.8.3 | 09/21/2005 |
| 1.8.4 | 12/24/2005 |
| 1.8.5 | 08/25/2006 |
| 1.8.5-p2 | 12/04/2006 |
| 1.8.5-p12 | 12/25/2006 |
| 1.8.5-p35 | 03/13/2007 |
| 1.8.6 | 03/13/2007 |
| 1.8.5-p52 | 06/09/2007 |
| 1.8.6-p36 | 06/09/2007 |
| 1.8.5-p113 | 09/23/2007 |
| 1.8.6-p110 | 09/23/2007 |
| 1.8.5-p114 | 10/04/2007 |
| 1.8.6-p111 | 10/04/2007 |
| 1.8.5-p115 | 03/03/2008 |
| 1.8.6-p114 | 03/03/2008 |
| 1.8.7 | 06/01/2008 |
| 1.8.7-p17 | 06/09/2008 |
| 1.8.5-p231 | 06/20/2008 |
| 1.8.6-p230 | 06/20/2008 |
| 1.8.7-p22 | 06/20/2008 |
| 1.8.6-p286 | 08/08/2008 |
| 1.8.7-p71 | 08/08/2008 |
| 1.8.6-p287 | 08/11/2008 |
| 1.8.7-p72 | 08/11/2008 |
| 1.9.1-p0 | 01/30/2009 |
| 1.8.6-p368 | 03/31/2009 |
| 1.8.7-p160 | 04/09/2009 |
| 1.9.1-p129 | 05/12/2009 |
| 1.8.6-p369 | 06/09/2009 |
| 1.8.7-p173 | 06/09/2009 |
| 1.8.7-p174 | 06/15/2009 |
| 1.9.1-p243 | 07/18/2009 |
| 1.8.6-p383 | 08/03/2009 |
| 1.8.7-p248 | 12/24/2009 |
| 1.9.1-p376 | 12/07/2009 |
| 1.8.6-p388 | 01/10/2010 |
| 1.8.7-p249 | 01/10/2010 |
| 1.9.1-p378 | 01/10/2010 |
| 1.8.6-p398 | 02/03/2010 |
| 1.8.6-p399 | 02/04/2010 |
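Since the original question was which version was the latest on a given date, here is a small sketch of how a few rows from the table could be searched. The helper latest_releases and the choice of rows are illustrative assumptions; the dates are copied from the table and parsed as MM/DD/YYYY. Because several branches were often released on the same day, it returns every version from the most recent release date on or before the date asked about.

```ruby
require 'date'

# A few rows copied from the table above (MM/DD/YYYY, Japan time).
RELEASES = [
  ["1.8.6",      "03/13/2007"],
  ["1.8.7",      "06/01/2008"],
  ["1.8.6-p230", "06/20/2008"],
  ["1.8.7-p22",  "06/20/2008"],
  ["1.9.1-p0",   "01/30/2009"],
  ["1.8.7-p160", "04/09/2009"],
].map { |version, date| [version, Date.strptime(date, "%m/%d/%Y")] }

# Hypothetical helper: the release(s) with the most recent date on or
# before +as_of+. Branches often shipped the same day, so this can
# return more than one version.
def latest_releases(releases, as_of)
  earlier = releases.select { |_, date| date <= as_of }
  return [] if earlier.empty?
  newest = earlier.map { |_, date| date }.max
  earlier.select { |_, date| date == newest }.map { |version, _| version }
end

latest_releases(RELEASES, Date.new(2009, 3, 1)) # => ["1.9.1-p0"]
latest_releases(RELEASES, Date.new(2008, 7, 1)) # => ["1.8.6-p230", "1.8.7-p22"]
```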
I had an interesting debugging session the other day.
The Rails app I'm working on for a client uses oauth to get user info from one of the popular professional social networking sites.
In order to keep from hitting that site with oauth requests too often, yesterday I worked on a story to add throttling. The client wanted the data refreshed no more often than once per refresh period, with the period configurable by the application.
Easy Peasy
Easy enough! I already had a model responsible for keeping the oauth token and returning the relevant data. I added a text field to the model where I could cache the data as a serialized object. Then I added code to the method that returned the data, which checked whether the refresh period had elapsed since the record was last updated and, if so, made the oauth request to refresh the data and saved the updated record.
Hang on!
Then I started trying it out, and I got a surprise. The first time I asked for the data, it got fetched via oauth, and the model got saved. I asked again, expecting that the cached data would be returned, but once again the oauth request went out.
After playing with it some more in script/console and adding some debugging printouts, I finally realized that the problem was that the updated_at field in the record wasn't changing. I saw that the SQL query generated by the save was setting the data value, but not a new value for updated_at. Strange!
Alternatives
I'd been using the update_attribute method to change the data value. Just for giggles, I decided to see if setting the value and then explicitly saving the record would fix it. Nope! Exactly the same SQL was generated, with updated_at left unchanged.
Digging deeper
This is the point where I resort to reading code. This project has Rails vendored, so I had no qualms about inserting some debugging alterations to the code. Actually I'm not immune to temporarily monkeying with non-vendored gems to debug arcane stuff like this either.
So the first thing was to find the code in ActiveRecord which writes the updated_at attribute. I found it in activerecord/lib/active_record/timestamp.rb:
```ruby
def update_with_timestamps(*args) #:nodoc:
  if record_timestamps && (!partial_updates? || changed?)
    current_time = current_time_from_proper_timezone

    write_attribute('updated_at', current_time) if respond_to?(:updated_at)
    write_attribute('updated_on', current_time) if respond_to?(:updated_on)
  end

  update_without_timestamps(*args)
end
```
Those familiar with Rails conventions will recognize this as an override of the ActiveRecord::Base#update method using alias_method_chain.
So, in order for the timestamps to be written, record_timestamps has to be truthy, and either partial_updates? has to be falsy or changed? has to be truthy.
Now record_timestamps is a class-inheritable attribute on ActiveRecord::Base, which allows timestamp recording to be turned on or off for all or particular ActiveRecord classes, and it is true by default. The method partial_updates? determines whether the partial update feature has been turned on or off for the model; again, this is true by default.
This leaves changed?, a method added by dirty tracking, which indicates whether anything in the record has changed and therefore needs to be written back to the database as part of the update.
So, facing this code, I inserted a puts to show me what those three values were.
And the culprit turned out to be the result of the changed? method, which was returning false!
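The guard in update_with_timestamps can be checked in isolation. The stand-alone method below is a hypothetical restatement of that condition, not ActiveRecord code; with the defaults described above (record_timestamps and partial_updates? both true), the whole thing hinges on changed?.

```ruby
# Hypothetical stand-alone version of the guard in update_with_timestamps.
def timestamps_written?(record_timestamps, partial_updates, changed)
  record_timestamps && (!partial_updates || changed)
end

# Rails defaults: timestamps on, partial updates on.
timestamps_written?(true, true, true)   # => true
timestamps_written?(true, true, false)  # => false (the case hit here)
timestamps_written?(true, false, false) # => true  (partial updates turned off)
```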
Active Record Dirty Attribute Tracking
The partial update feature, along with dirty attribute tracking, was introduced in Rails 2.1. Dirty attribute tracking keeps track of the values of attributes when a model is fetched, and partial update only writes changed attributes, to minimize the amount of data sent to the SQL adapter. They use an instance variable, @changed_attributes, maintained by the write_attribute method, which is a hash mapping the name of each attribute whose value has been changed to its original value. The changed? method indicates whether any of the attributes have changed. There are also attribute_will_change! methods, such as title_will_change!, which alert ActiveRecord that an attribute might be changed in place, say with something like book.title.capitalize!
So in this case, my model fetched the data via an oauth request, and since the value hadn't changed, dirty tracking determined that the model object hadn't changed, so no updated_at timestamp was written.
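To make the mechanism concrete, here is a deliberately tiny, hypothetical model of dirty tracking in plain Ruby. TinyDirtyRecord is not ActiveRecord's implementation; it only mirrors the idea described above, that write_attribute remembers an attribute's original value only when the new value actually differs. Writing back an equal value, as happened with the re-fetched oauth data, leaves changed? false.

```ruby
# Hypothetical, much simplified model of dirty tracking (not ActiveRecord code).
class TinyDirtyRecord
  def initialize(attributes)
    @attributes = attributes.dup
    @changed_attributes = {} # attribute name => original value
  end

  def write_attribute(name, value)
    old = @attributes[name]
    # Remember the original value only the first time it really changes.
    if old != value && !@changed_attributes.key?(name)
      @changed_attributes[name] = old
    end
    @attributes[name] = value
  end

  def changed
    @changed_attributes.keys
  end

  def changed?
    !@changed_attributes.empty?
  end
end

record = TinyDirtyRecord.new({ "data" => { "title" => "Ruby" } })
record.write_attribute("data", { "title" => "Ruby" })  # re-fetched, same value
record.changed?  # => false, so no updated_at would be written
record.write_attribute("data", { "title" => "Rails" }) # a real change
record.changed   # => ["data"]
```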
I'd run afoul of dirty tracking/partial update before. Back in 2008, I got hired by a startup with a sizable Rails app they'd been working on, which used lots of ActiveRecord callbacks and observers. One of my assignments was to upgrade it from Rails 2.0 to 2.1, but the introduction of partial update made some subtle changes to things like the ordering of those callbacks and observed events which badly broke the application. The CTO quickly came to the conclusion that the upgrade was going to cost more than he deemed it to be worth. So, in our case, the jump from Rails 2.0 to 2.1 was a much bigger one than from Rails 1.x to 2.0.
The Fix
In this case the fix was considerably simpler: use update_attributes(:data => the_date, :updated_at => Time.now). This worked, and ensured that the updated_at attribute gets updated whenever the data is fetched.
One More Little Mystery
After fixing this, I moved on, but one little thing nagged at me. Since the data attribute hadn't actually changed its value, why hadn't the partial update code kept it from being written in the first place?
So I went back and looked a little harder at the active record code and found this:
```ruby
def update_with_dirty
  if partial_updates?
    # Serialized attributes should always be written in case they've been
    # changed in place.
    update_without_dirty(changed | (attributes.keys & self.class.serialized_attributes.keys))
  else
    update_without_dirty
  end
end
```
Since the data attribute was serialized, it always gets written!
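The column list passed to update_without_dirty is plain Array arithmetic, so it's easy to see what happened. The attribute names below are hypothetical stand-ins for my model's columns, and serialized_attributes (a hash in ActiveRecord) is represented here just by its keys. Even with nothing in changed, the union with the serialized columns still selects data.

```ruby
# Hypothetical stand-ins for the model's state at save time.
changed                    = []                          # dirty tracking saw no change
attribute_names            = %w[id token data updated_at]
serialized_attribute_names = %w[data]                    # keys of serialized_attributes

# The same expression update_with_dirty passes to update_without_dirty.
columns_to_write = changed | (attribute_names & serialized_attribute_names)
columns_to_write # => ["data"]
```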
So Is This a Bug in ActiveRecord?
Actually I could make the case that there's one of two bugs here:
- Serialized attributes shouldn't be written under partial update if they haven't actually changed, or
- The updated_at attribute should be changed if there are any serialized attributes even if the values of those serialized attributes haven't changed.
To be honest, I'm not sure which, if either, of these is a bug. In my use case, I'd prefer that AR used option 2. On the other hand, I can see cases where it would be preferable for updated_at to reflect the last time the record actually changed.
Moreover, I'm not sure that the current code updates updated_at if the only change to a record is in serialized attributes, at least one of whose values has changed.
But, I've got other things to worry about right now, so I guess I'll follow Scarlett O'Hara's advice and "think about that tomorrow!"
Last week the Phusion guys gave a talk at Google about the implementation of Ruby Enterprise Edition, and it's now available on YouTube.
It's good stuff if you are a VM geek like me. They cover two major topics:
- How they made the Ruby 1.8 garbage collector copy-on-write (CoW) friendly, which allows multiple Ruby processes to share memory for unchanged objects. This greatly reduces the footprint of a typical Rails deployment, for example. They talk about a series of attempts to change the Ruby GC to move the mark bits from the objects themselves to a separate memory structure. The result is significant memory savings and a slightly slower GC, but overall faster Ruby performance, because they replaced malloc with a faster allocator (from Google, apparently) which more than offsets the GC slowdown.
- How a contribution from the EventMachine developers improves thread context switching for Ruby 1.8's user-space threads. Ruby 1.8 handles thread context switches by copying the execution stack to and from the heap, which can result in significant time being spent in memcpy calls. Ruby Enterprise Edition now has an optional feature which instead switches the base stack pointer on a thread switch. This is processor-specific, and is only available right now for Intel 32- and 64-bit processors.
So if this kind of stuff interests you, I'd recommend spending the half hour or so that it takes to watch the video.
Today was the day to upgrade the server running this blog to Ubuntu 9.10 "Karmic Koala".
Late in the upgrade, while installing and configuring the Debian packages, the upgrade hung. It had just tried to stop bind9 and had given a series of error messages indicating that rndc was unable to communicate with bind9.
After waiting a while, I hit Control-C and was warned that this might leave the system in an inconsistent state, but I figured I had little to lose, so I clicked OK. The upgrade continued, but didn't reboot, saying that there had been a problem.
I tried to start bind9 manually, both with rndc start and with /etc/init.d/bind9 start, but only got bad news.
So I then tried to start it while tailing the syslog, and noticed that I was getting an error about a failure to open /etc/ssl/openssl.cnf for reading. This was strange, since the permissions on that file allowed reading by anyone.
A little googling eventually revealed that I was running afoul of AppArmor, something I hadn't encountered before. It turns out that Ubuntu doesn't automatically include an AppArmor profile for bind9; I'm not sure why.
The solution seems to have been to manually install the apparmor-profile package with apt-get.
After verifying that I could manually start bind9 and talk to it with rndc, I rebooted, and as they say "so far so good."
```shell
$ rvm use 1.8.7
$ ruby -e'puts [1, "a"].to_s'
1a
$ rvm use 1.9
$ ruby -e'puts [1, "a"].to_s'
[1, "a"]
```
I've been working on converting my client's Rails App to Ruby 1.9.
It's been fairly painless, but there have been a few stumbling blocks. One has been the use of arrays within string interpolation, and the difference in the result.
Prior to Ruby 1.9, Array#to_s was a synonym for Array#join, which concatenates the results of sending to_s to each element. In Ruby 1.9, Array#to_s is the same as Array#inspect.
This has tended to produce subtle problems which are at times hard to track down. Once they are found, the solution is to do something like changing:
```ruby
"Whatever #{some_array}"
```
to:
```ruby
"Whatever #{some_array.join}"
```
I'm speaking about the "RiCal experience" tomorrow evening, 20 October 2009, at the October meeting of the Raleigh Ruby Brigade. As usual, the meeting is at Red Hat HQ in Raleigh and starts at 7:00 pm.
I plan to talk about calendars, time, time zones, camels, icalendar, Ruby implementation of complex "stuff", among other things.
I hope to see you there!
There's a bit of a buzz on the interwebs, that Microsoft's project to compete with the iPhone is having some problems.
What I find ironic is that they decided to call this project "Pink". Apple once had a project with the same name, which was to be the 'new' operating system for the Macintosh, replacing System 7. Pink was to be an operating system built around an application framework which would provide the API. This was a popular 'strategic' idea in those days; Apple, IBM, and yes, even Microsoft were pursuing it.
Apple's Pink attempted to take the ideas from MacApp, which had just made the transition from Object Pascal to C++, and grow them into an operating system. Larry Tesler had advocated this, and John Sculley, then Apple's president during the "interregnum" period, had sunk lots of money into the project, which was headquartered in the original building where the Macintosh had been developed, back when it was the lair of "Dread Pirate" Jobs's merry band. The project had gotten bogged down, and Sculley was on the verge of killing it unless other 'investors' could be found. Since IBM was enamored with such things back then, a task force was dispatched to Cupertino to look at Pink. I was tapped for the task force because of my expertise in object-oriented programming and my visibility due to my connections with VisualAge and Smalltalk. Despite whatever technical misgivings resulted (in other words, they didn't listen to me), IBM bought in.
The result was an alliance which formed two companies jointly owned by IBM and Apple: Kaleida, which focused on multimedia, and Taligent, led by IBM executive Joe Guglielmi, where Pink died a slow death. The longest-term impact of the alliance was that it got Apple to agree to switch from the Motorola 68K family of microprocessors, which had been used in Macs until then, to the IBM PowerPC. But, since Motorola was a large IBM customer, IBM licensed the PowerPC to them, so most Macs used IBM-architected chips actually produced by Motorola until the "Dread Pirate" came back and eventually switched to using Intel chips.
Of course, Microsoft's "Pink" has no relation to Apple's other than the name, but I can't help finding it all ironic.




