On Variables, Values, and Objects
Posted by Rick DeNatale Wed, 13 Sep 2006 19:47:00 GMT
I’ve recently observed some posts to ruby-talk which evidence some confusion on the part of the posters about the relationship between variables and objects in Ruby. One currently active thread concerns several participants who are upset that instances of the Matrix class in the Ruby standard library can’t be changed once created.
In Ruby, as in other uniformly object-oriented languages, the relationship between variables and their values is subtly different from that in other languages. This is a crucial paradigm shift, and one which must be made in order to understand Ruby.
In many languages a variable names an area in memory which holds a “bag-of-bits” representing the value of the variable. The size of that area depends on the type of the variable. A variable holding an integer might be 4 bytes long, while one holding a particular structure might be 325 (or whatever) bytes.
In a uniformly object-oriented language, all variables reference objects, and any variable can reference any object, or different objects over time. My good friend at IBM, the late David N. Smith, used to say that in such a language, “all variables are the same size,” when he wrote about or taught Smalltalk.
This distinction can trip up the unwary. Let’s try to clear some of the stumbling blocks out of the way.
Ruby Variables Don’t Have (Permanent) Classes
In one recent thread, someone suggested adding a method to the “class of a variable.” The problem with this idea is that a variable doesn’t really have a class. For example, consider this code:
require 'matrix'
a = 5
a = a * 1.0
a = a * Matrix.identity(3)
Here the variable a refers to a sequence of objects, each with a different class: first a Fixnum, then a Float, then a Matrix. While Fixnum and Float inherit from Numeric, Matrix doesn’t, but it can form a duck-type with Numeric in many uses.
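As a quick check of this point, the sketch below asks each successive object for its class. It is an illustration only: the name classes_seen is made up here, and a String stands in for the Matrix so that the snippet needs no library. On Ruby 2.4 and later the first class reports as Integer, because Fixnum was folded into Integer there.

```ruby
# Record the class of whatever object the variable refers to at each step.
classes_seen = []

a = 5
classes_seen << a.class   # an integer object

a = a * 1.0
classes_seen << a.class   # now a Float

a = "five"
classes_seen << a.class   # now a String (standing in for the Matrix)

# The variable never had a class of its own; only the objects did.
p classes_seen
```

The variable is the same throughout; what changes is the object on the other end of the reference.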
This also illustrates that “all variables are the same size.” In a language like C, a variable holding an integer will probably have a different size from one holding a double (which is what Ruby uses to represent a Float), and one which holds a 3×3 matrix of doubles will certainly be bigger than one holding a scalar double.
Ruby Variables Hold Object References, not Object State
The magic which allows this is the fact that in Ruby, as in Smalltalk, a variable doesn’t hold the “bag-of-bits” which represents a particular data structure. Instead it contains a reference to an object which holds the “bag-of-bits.” The actual structure of that “bag-of-bits” is of no concern to, and is hidden from, the variable and its users. This encapsulation barrier is key to what makes a language truly “object-oriented,” at least the way I use the term.
The “bag-of-bits” which represents a particular object is visible to the methods of that object’s class, at least in an abstract way. Some details are hidden even from those methods at the Ruby source code level, and are tied up in the language implementation; hand-written extensions to Ruby usually do need to deal with those details.
In Smalltalk, the pseudo-variable self is actually strongly typed in the C sense. Instances of Smalltalk classes have a fixed structure, self in a method is guaranteed to refer to an instance of the class owning the method or one of its subclasses, and instance variables are found in known slots within the object.
In Ruby it’s a little different because Ruby instance variables are created dynamically and accessed via a hash table within the object, at least in the 1.8.x implementation.
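One way to watch instance variables come into existence dynamically is to list them before and after a method first assigns one. This is only a sketch: the Point class and its methods are hypothetical, and on 1.8.x instance_variables returns strings where later versions return symbols.

```ruby
# Hypothetical class used only for illustration.
class Point
  def initialize(x)
    @x = x            # @x exists from construction onward
  end

  def add_y(y)
    @y = y            # @y is created the first time this runs
  end
end

pt = Point.new(1)
before = pt.instance_variables   # only @x so far
pt.add_y(2)
after = pt.instance_variables    # now @x and @y

p before
p after
```

Nothing in the class declares @y ahead of time; the object simply acquires it when add_y runs.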
As a first approximation, we can think of variables as holding pointers to objects, but I’ll expose this for the pedagogical lie that it is a little later in this article.
Mutability, and Aliasing
Here’s one of those stumbling blocks for those who expect variables in a uniformly object-oriented language to work like they do in a language like C or Fortran:
1: a = [1, 2, 3]
2: b = [1, 2, 3]
3: c = a
4: a[1] = 0
5: p a #=> [1, 0, 3]
6: p b #=> [1, 2, 3]
7: p c #=> [1, 0, 3]
Everything looks pretty normal up to line 7. We never did anything with c after we assigned it a value, but it doesn’t seem to have held its value.
Or has it?
Well, it actually has, since its value is a reference to an object, and that object is the same one referenced by a.
Line 4 might look like an assignment to the variable a, but it’s really a method call to the array which a happens to be referencing at the time. And that method (called []=) changes, or mutates, that array. That change will be visible through the variables a, c and any others that reference that particular array. Multiple references to the same object are called aliases to that object. They might be named variables, or references which are held inside another object:
d = [[1, 2, 3], [4, 5, 6]]
e = d
d[1][1] = "Fred"
p e #=> [[1, 2, 3], [4, "Fred", 6]]
While such mutating methods are often very useful, this effect of aliasing is something which the Ruby programmer who uses them needs to keep in mind.
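When an alias is not what you want, one common option is to copy the object before mutating it. The sketch below uses Object#equal? to test identity and Array#dup to make the copy. Note that dup is shallow, so nested arrays inside the copy would still be shared. The variable names are illustrative.

```ruby
a = [1, 2, 3]
c = a        # alias: c references the same array as a
d = a.dup    # copy: d references a new array with the same elements

a[1] = 0     # mutates the one array shared by a and c

p a.equal?(c)   #=> true
p a.equal?(d)   #=> false
p c             #=> [1, 0, 3]
p d             #=> [1, 2, 3]
```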
Many built-in Ruby classes have both mutating and non-mutating versions of methods. For example, Array#compact returns a new array which is a copy of the receiver with nil elements removed, leaving the original array intact, while Array#compact! mutates the receiver and removes the nils in place.
As with compact and compact!, it’s standard practice to give such mutating methods a name with a trailing ‘!’ which serves as a warning that they mutate the receiver. It’s a warning to the wise.
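The difference between the two versions can be seen side by side in a short sketch. The variable names are made up for illustration.

```ruby
original = [1, nil, 2, nil, 3]

copy = original.compact          # non-mutating: returns a new array
p original                       #=> [1, nil, 2, nil, 3]
p copy                           #=> [1, 2, 3]

result = original.compact!       # mutating: changes the receiver in place
p original                       #=> [1, 2, 3]
p result.equal?(original)        #=> true

# compact! returns nil when there was nothing to remove.
nothing_removed = [1, 2].compact!
p nothing_removed                #=> nil
```

That nil return is worth remembering: chaining another call onto compact! can fail on an array that had no nils to begin with.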
That Little White/Pedagogical Lie, or Variables and Object Identity
I said above that thinking that variables hold pointers to objects was a first approximation to reality. The fact is they really don’t:
a1 = [1, 2, 3]
a2 = [1, 2, 3]
p a1.object_id #=> -605450882
p a2.object_id #=> -605225658
The object_id method returns a number which is unique in the sense that no two active objects will ever have the same object_id, and a given object will always have the same object_id. In the Ruby 1.8.x implementation, object_id actually just gives the bits of the value used to reference the object in the guise of a Fixnum.
So we see here that a1 and a2 are different objects. Let’s try a few more objects:
i1 = 1
i2 = 1
p i1.object_id #=> 3
p i2.object_id #=> 3
sym1 = :sym
sym2 = :sym
p sym1.object_id #=> 4090126
p sym2.object_id #=> 4090126
Here’s another surprise. Before, when we created two different arrays, we ended up with two different objects. But even though we never assigned i2 to i1, or sym2 to sym1, the ‘two’ 1s have the same object_id, and the ‘two’ symbols named :sym share an object_id. What’s that about?
Certain classes ensure that if ‘two’ of their instances are equal, they are the same object. We can see that Fixnum and Symbol do this; others are NilClass, TrueClass, and FalseClass. For symbols, this conflation of value and identity is part of their definition. For the others, it’s probably not strictly necessary, but it makes some common operations, such as testing for equality and, in the case of Fixnums, arithmetic, much faster.
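The same distinction can be checked without looking at raw object_id values, using Object#equal? for identity and == for value. This is a sketch with illustrative names, not a statement about how any particular implementation encodes these objects.

```ruby
# Small integers, symbols, and nil: equal values are the very same object.
same_int = 1.equal?(1)
same_sym = :sym.equal?(:sym)
same_nil = nil.equal?(nil)
p [same_int, same_sym, same_nil]   #=> [true, true, true]

# Two array literals are equal in value but are distinct objects.
x = [1, 2, 3]
y = [1, 2, 3]
p x == y          #=> true
p x.equal?(y)     #=> false
```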
What’s really held in variables is a value which can either be used to recognize that the object is one of these special objects, or that can be mapped efficiently to the real address of the object. The details are implementation-specific, and not really germane to the current topic, so I won’t go into them in this article.
Immutable Objects
It’s often advisable to design classes without mutating methods. For example, Fixnum has a method [] which returns the nth bit of the representation of the number, where bit 0 is the least significant bit. Suppose that it also had a method []=, which set the value of the corresponding bit:
# Hypothetical code - with a tribute to Jimi Hendrix
a = 6
b = 6
a[0] = 1; a[1] = 0; a[2] = 0; a[3] = 1
p a #=> 9
p b #=> 9
Since all 6’s are the same object, if Fixnum had mutating methods, we could make all 6’s turn out to be 9. While Jimi might not mind, my program might not like it.
And it would cause head-scratching debugging sessions. Thirty-two years ago, I found myself in just such a situation. Fortran II had a bug in the language specification which allowed the value of integer literals to be changed when you passed a literal as an argument to a subroutine which assigned to that argument. I spent over a day trying to figure out why my program didn’t work.
For this reason, to quote Martha Stewart, it’s a good thing that the Ruby numeric classes don’t have mutating methods. Even though Bignums and Floats don’t conflate value with identity, it would probably be even more mystifying to find that some instance of 1.5 changed to 1.45 when others didn’t. To me it seems far better to head off that particular ‘feature’ at the pass.
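What the numeric classes offer instead can be sketched as follows: the [] method reads a bit, and “changing” bits means computing a new integer and rebinding the variable to it. The bit masks are just one way to get from 6 to 9, chosen here for illustration.

```ruby
a = 6
b = 6

# Integer#[] reads one bit; bit 0 is the least significant.
bits = (0..3).map { |i| a[i] }
p bits                        #=> [0, 1, 1, 0]

# There is no []= method to write a bit back.
p a.respond_to?(:[]=)         #=> false

# Compute a new integer and rebind a; the object 6 itself is untouched.
a = (a | 0b1001) & ~0b0110    # set bits 0 and 3, clear bits 1 and 2
p a                           #=> 9
p b                           #=> 6
```

Because only the variable a was rebound, b still refers to 6, and so does every other 6 in the program.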
Another reason for this choice of immutability is taken from functional programming. Although Ruby is not really an FP language, it takes many concepts from that paradigm. In FP, functions with side-effects are disallowed, and mutation is certainly a side-effect. Taking FP to the limit means that things which require side-effects, like I/O, take special tricks. Ruby allows side-effects, and doesn’t require an FP approach, but, as with other features such as regular expressions, an understanding of functional approaches is a useful tool in the Ruby programmer’s bag.
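A small sketch of that functional style in everyday Ruby: build new values with non-mutating methods such as map and select rather than changing a collection in place. The data and names are made up for illustration.

```ruby
prices = [10, 20, 30]

# map and select return new arrays and leave the receiver alone.
doubled = prices.map { |price| price * 2 }
large   = doubled.select { |price| price > 30 }

p prices    #=> [10, 20, 30]
p doubled   #=> [20, 40, 60]
p large     #=> [40, 60]
```

Any other variable that aliases prices is unaffected, which is the property the side-effect-free style buys you.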
Immutability at the Instance Level
Ruby objects have a freeze method which makes a particular instance immutable:
a1 = [1, 2, 3]
a2 = a1
a1.freeze
a1[1] = "Fred" #=> TypeError: can't modify frozen array
a2[1] = "Fred" #=> TypeError: can't modify frozen array
Freezing is shallow, though. It protects only the frozen object itself, not the objects it references:
a3 = [[1, 2, 3], [4, 5, 6]]
a3.freeze
a3[0][1] = "Fred"
p a3 #=> [[1, "Fred", 3], [4, 5, 6]]
Matrix and Mutability
Following the pattern established by Numerics, the designer of the Matrix class, who, according to the comments in the source code, ported it from Smalltalk, made Matrix instances immutable. This upsets some who prefer to think of a Matrix as a 2-dimensional array rather than a mathematical construct which acts much like a numeric in linear algebra.
If you want to view matrices that way you’ve got a few choices:
- Swim against the current: Monkey-patch Matrix to add mutating methods.
- Swim with the current: Use Matrix#to_a and Matrix[] to get a mutable array and then create a new Matrix from it:
  require 'matrix'
  m1 = Matrix.identity(3)
  p m1 #=> Matrix[[1, 0, 0], [0, 1, 0], [0, 0, 1]]
  a1 = m1.to_a
  a1[1][1] = 3
  m1 = Matrix[*a1] # splat the rows; Matrix[a1] would build a one-row matrix whose elements are arrays
  p m1 #=> Matrix[[1, 0, 0], [0, 3, 0], [0, 0, 1]]
- Find a new stream: look at other sources of a mutable 2-dimensional array, for example









