Dirty Rotten Scoundrels

Posted by Rick DeNatale Wed, 03 Feb 2010 17:45:32 GMT

I had an interesting debugging session the other day.

The Rails app I'm working on for a client uses oauth to get user info from one of the popular professional social networking sites.

In order to keep from hitting that site with an oauth request too often, yesterday I worked on a story to add throttling. The client wanted to refresh the data no more often than an application configurable refresh period.

Easy Peasy

Easy enough! I already had a model responsible for keeping the oauth token and returning the relevant data. I added a text field to the model where I could cache the data as a serialized object. Then I added code to the method which returned the data which checked to see if the refresh period had elapsed since the record was updated, and if so, make the oauth request to refresh the data, and save the updated record.

Hang on!

Then I started trying it out, and I got a surprise. The first time I asked for the data, it got fetched via oauth, and the model got saved. I asked again, expecting that the cached data would be returned, but once again the oauth request went out.

After playing with it some more in script/console, and adding some debugging printouts, I finally realized that the problem was that the updated_at field in the record wasn't changing. I saw that the sql query generated by the save was inserting the data value, but not a new value for updated_at, strange!

Alternatives

I'd been using the update_attribute method to change the data value. Just for giggles, I decided to see if setting the value and then explicitly saving the record would fix it. Nope! Exactly the same SQL was generated, with updated_at left unchanged.

Digging deeper

This is the point where I resort to reading code. This project has Rails vendored, so I had no qualms about inserting some debugging alterations to the code. Actually I'm not immune to temporarily monkeying with non-vendored gems to debug arcane stuff like this either.

So the first thing was to find the code in ActiveRecord which writes the updated_at attribute. I found it in activerecord/lib/activerecord/timestamp.rb

  def update_with_timestamps(*args) #:nodoc:
    if record_timestamps && (!partial_updates? || changed?)
      current_time = current_time_from_proper_timezone

      write_attribute('updated_at', current_time) if respond_to?(:updated_at)
      write_attribute('updated_on', current_time) if respond_to?(:updated_on)
    end

    update_without_timestamps(*args)
  end

Those familiar with Rails conventions will realize this as an override to the ActiveRecord::Base#update method using alias_method_chain.

So, in order for the timestamps to be written, record_timestamps has to be truthy, partial_updates? has to be falsy or changed? has to be truthy.

Now record_timestamps is a class_inheritable attribute on ActiveRecord::Base, which allows timestamp recording to be turned on or off for all or particular ActiveRecord classes, and that was true by default. The method partial_updates? determines whether or not the partial update feature has been turned on or off for the model, again this is true by default

This leaves changed? a method added by dirty tracking which indicates whether anything in the record has changed and therefore needs to be written back to the database as part of the update.

So facing this code, I inserted a puts to show me what those three values were.

And the culprit turned out to be the result of the changed? method, which was returning false!

Active Record Dirty Attribute Tracking

The partial update feature, along with. dirty attribute tracking, was introduced in Rails 2.1. Dirty attribute tracking keeps track of the value of attributes when a model is fetched, and partial update only writes changed attributes to minimize the amount of data send to the SQL adapter. They use an instance variable @changed_attributes, a hash mapping the name of any attributes whose value has been changed maintained by the write_attribute method. The changed? method indicates whether any of the attributes have changed. There's also a will_change method which alerts ActiveRecord that a an attribute might change directly, say with something like Book.title.capitalize!

So in this case my model fetched the data via an oauth request, and since the value hadn't changed, dirty tracking determined that the model object hadn't changed, so now updated_at timestamp was written.

I'd run afoul of dirty tracking/partial update before. Back in 2008, I got hired by a startup with a sizable Rails app they'd been working on, which used lots of ActiveRecord callbacks and observers. One of my assignments was to upgrade it from Rails 2.0 to 2.1, but the introduction of partial update made some subtle changes to things like the ordering of those callbacks and observed events which badly broke the application. The CTO quickly came to the conclusion that the upgrade was going to cost more than he deemed it to be worth. So, in our case, the jump from Rails 2.0 to 2.1 was a much bigger one than from Rails 1.x to 2.0.

The Fix

In this case, the fix was considerably simpler, use update_attributes(:data => the_date, :updated_at => Time.now). This worked and ensured that the updated_at attribute gets updated whenever the data is fetched.

One More Little Mystery

After fixing this, I moved on, but one little thing nagged at me. Since the data attribute hadn't actually changed its value, why hadn't the partial update code kept it from being written in the first place?

So I went back and looked a little harder at the active record code and found this:

  def update_with_dirty
    if partial_updates?
      # Serialized attributes should always be written in case they've been
      # changed in place.
      update_without_dirty(changed | (attributes.keys & self.class.serialized_attributes.keys))
    else
      update_without_dirty
    end
  end

Since the data attribute was serialized, it always gets written!

So Is This a Bug in ActiveRecord?

Actually I could make the case that there's one of two bugs here:

  1. Serialized attributes shouldn't be written under partial update if they haven't actually changed, or
  2. The updated_at attribute should be changed if there are any serialized attributes even if the values of those serialized attributes haven't changed.

To be honest, I'm not sure which if either of these is a bug. In my use case, I'd prefer that AR used option 2. On the other hand I can see cases where it would be preferable if updated_at reflected the last time the record actually changed.

On the other hand, I'm not sure that the code right now updates updated_at if the only change to a record is in serialized attributes at least one of whose values has changed.

But, I've got other things to worry about right now, so I guess I'll follow Scarlett O'Hara's advice and "think about that tomorrow!"


Trackbacks

Use the following link to trackback from your own site:
http://talklikeaduck.denhaven2.com/trackbacks?article_id=587

Comments

  1. Derek Neighbors about 16 hours later:

    Nice to see someone write up a debugging session in hopes of saving someone else the pain later.

  2. sinclair bain 21 days later:

    Rick, Thanks for this write-up. Hitting an error in our app where because the #partial_updates? returns true (the default as you state) then the curious line in #update_with_dirty

    update_without_dirty(changed | (attributes.keys & self.class.serialized_attributes.keys))

    is executed which results in the boolean true being passed to to #update_without_dirty … This boolean to the best of my debugging is propagated through subsequent calls till it ends up in #update where it is passed to #attributes_with_quotes as the final argument. Unfortunately the assumption in #attributes_with_quotes is that this will be an Enumerable (or nil) !

    The ‘changed | (@attributes.keys & self.class.serializable_attributes.keys)’ always resolves to ‘true’ in our app. Thus in my mind this should always cause the NoMethodError to be raised.

    Curious code indeed. Still have not figured this out. The ‘fix’ is to turn-off #partial_updates for this class, however this not ‘ideal’ .

    Cheers!