Refactoring With Continuous Testing - a Guided Tour
Posted by Rick DeNatale Thu, 11 Oct 2007 10:47:00 GMT
Today on Ruby Fleebie, Frank poses some ruby code to be "rubyized." I took this as an opportunity to do a little exposition of re-factoring under the watchful eye of autotest.So I've taken Frank's code and run it through the re-factoring machine several times. A word to the 'squeamish' because of the use case, there are a few words in the code which some folks, and spam filters might find mildly offensive, but we're all adults here right?
Now I've got to admit that I really came up with the final solution pretty quickly after seeing the blog post, but then I went back and eased up to it for pedagogical purposes. So if you're in the mood, sit down and lets re-factor some ruby!
The Original Code
Here's Frank's original code:
def remove_insults(input)
ctr = 0
input.each do |word|
word = word.downcase
if word == "stupid" || word == "moron" ||
word == "dumbass" || word == "retard"
i=0
word.length.times do
input[ctr][i,1] = "*"
i+=1
end
end
ctr += 1
end
puts input.join(" ").to_s
end
remove_insults "you truly are a moron sir!".split(" ") Getting Ready for Autotest
My first step was to setup a local subversion repository, so that I could go back over my work and tell you about it. Then I set up a project directory structure and imported it.
Autotest want's a particular directory structure, so I set up a project directory with a lib subdirectory to hold the code, and a test directory to hold the testcase. The code also needs to be structured as a class, so I changed the code a bit and put it in lib/civilizer.rbclass Civilizer
def self.remove_insults(input)
puts civilize(input)
end
def self.civilize(input)
ctr = 0
input.each do |word|
word = word.downcase
if word == "stupid" || word == "moron" || word == "dumbass" || word == "retard"
i=0
word.length.times do
input[ctr][i,1] = "*"
i+=1
end
end
ctr += 1
end
input.join(" ").to_s
end
end
Civilizer.remove_insults "you truly are a moron sir!".split(" ") Note that I also set this class up so that it has two methods, remove_insults prints the bowdlerized input, while civilize returns it as a string. This change makes the code testable
Now we need a test case, that goes in test/test_civilizer.rb:
require "test/unit"
require "civilizer"
class TestCivilizer < Test::Unit::TestCase
def test_case_name
assert false
end
end More autotest conventions here, we're testing the class Civilizer which lives in lib/civilizer.rb so that autotest can find it, the test case class needs to be TestCivilizer (i.e. the class under test suffixed by Test).
This initial test case is designed to fail, at this stage I'm just verifying that autotest is happy with my setup, and runs the test when either the code under test or the test case changes.
A Real Test Case
The next step is to test the actual code. So I changed the test_case_name method (that clever name was generated by the textmate tc snippet by the way, and I didn't change it) to:
def test_basic_operation
assert_equal "you truly are a ***** sir!" , Civilizer.civilize("you truly are a moron sir!".split(" "))
end When I save this, autotest runs the test and it passes. Okay, now that we've got a baseline to refactor against, lets start changing code.
Our First Change
Let's make use of Ruby's Array#include? method and get rid of that if statement with the long chain of ||s:
class Civilizer
def self.remove_insults(input)
puts civilize(input)
end
def self.civilize(input)
ctr = 0
input.each do |word|
word = word.downcase
if %w[stupid moron dumbass retard].include?(word)
i=0
word.length.times do
input[ctr][i,1] = "*"
i+=1
end
end
ctr += 1
end
input.join(" ").to_s
end
end
Civilizer.remove_insults "you truly are a moron sir!".split(" ") Once again, I save this, and autotest automatically confirms that my change didn't break anything.
Let's Throw out a Loop or Two
That loop to generate the replacement string, seems unwieldy and error prone, let's get rid of it. Here's the new code. I'm focusing on just the method here, the rest of the file hasn't changed:
def self.civilize(input)
ctr = 0
input.each do |word|
word = word.downcase
if %w[stupid moron dumbass retard].include?(word)
input[ctr] = "*" * word.length
end
ctr += 1
end
input.join(" ").to_s
end Once again we've successfully cleaned up the code a bit, and the test success shows we haven't broken anything.
While we're looking at loops, how about the one which steps through the words. We're initializing a counter before the loop and incrementing it at the bottom of the loop. Smells too much like C if you ask me!
def self.civilize(input)
input.each_with_index do |word, ctr|
word = word.downcase
if %w[stupid moron dumbass retard].include?(word)
input[ctr] = "*" * word.length
end
end
input.join(" ").to_s
end Now this looks more rubyish, but looking at it in retrospect, a better move here would have been to use input.map here with a block which returned either the original nice, or blotted out naughty word, but I didn't think of that at the time. Of course with my tests I could go back and do that, but I'll leave that as an exercise for the interested reader.
Anyway, that if statement in the middle is taking up more than it's share of vertical space, let's fix that by using a statement modifier:
def self.civilize(input)
input.each_with_index do |word, ctr|
word = word.downcase
input[ctr] = "*" * word.length if %w[stupid moron dumbass retard].include?(word)
end
input.join(" ").to_s
end This code is getting shorter and tidier as we go, although not as short and tidy as that idea of using map, oh well, but keep with me, it's going to end up real short.
def self.civilize(input)
input.each_with_index do |word, ctr|
input[ctr] = "*" * word.length if %w[stupid moron dumbass retard].include?(word.downcase)
end
input.join(" ").to_s
end I don't see any need for a separate assignment statement to downcase the input, so I moved that to the argument of include?, still shorter, but hang on.
Time For Some More Tests, and a Shift in Strategy
We've gotten to this point without changing any testcase code. I've been using the testcase to ensure that I haven't changed the code in a way that would make it fail to meet the original 'specification.
Let's revisit the testcase now. It's a pretty simple test case, how about some variations. Here's a new test method I added to test/test_civilizer.rb:
def test_should_find_multiple_words
assert_equal "With all due respect, you must be a ******, you *****!",
Civilizer.civilize(
"With all due respect, you must be a retard, you moron!".split(" "))
end I've reformatted this a bit to try to keep it narrower for the blog presentation.
Finally, when I save this, autotest informs me that the test has failed, a little debugging reveals that the problem is those pesky punctuation characters. That String#split method is limiting us to splitting words on spaces.
Now one way to fix that would be to use a regexp instead of a string as the argument to String#split, but why not just use a regexp to implement the Civilizer, here's the new sleek version of the civilize method:
def self.civilize(input)
input.join(" ").gsub(/\b(stupid|moron|dumbass|retard)\b/) {|match| '*' * match.length}
end We've gotten the body of the method down to a one liner, and we now properly handle punctuation!
The Value of [TB]DD
Let's add another test:
def test_should_ignore_case
assert_equal "*******!", Civilizer.civilize("Dumbass!")
end Oops! this test fails. Our original tests didn't exercise the use of lowercase in the original code.
This just shows that even test infected developers are human. Actually if we had developed the original code in a test driven style, we wouldn't have invoked lowercase until we'd written a test case to motivate it.
Another useful tool to talk about putting in the arsenal is a coverage tool like rcov which profiles your running tests to show how much of the code under test is being exercised, although in this case, it wouldn't have shown the problem
Anyway, let's fix the problem, we just make the regexp case insensitive:
input.join(" ").gsub(/\b(stupid|moron|dumbass|retard)\b/i) {|match| '*' * match.length} All it took was one little 'i' to fix this bug. Now autotest is once again showing us the green light.
Is the Specification Right?
I've been ignoring it all along, but having the caller split the input before giving it to the Civilizer has been bothering me even way before I changed the implementation to using gsub with a regexp, and since then, that code to 'unsplit' it has been sticking out like a sore thumb. So, after switching from my consultant hat to my client hat and having a quick conference, "we've" agreed to change the interface so that the parameter is a simple string.
Having agreed to this specification change, let's change the specification. What do we change? The testcase code of course, this is the point of test-driven design (or behavior driven design for you Sapir–Whorf adherents), whether you call them tests or something else, they are really specifications.
Without further ado, here's the new spec:
require "test/unit"
require "civilizer"
class TestCivilizer < Test::Unit::TestCase
def test_should_find_one_word
assert_equal "you truly are a ***** sir!" ,
Civilizer.civilize("you truly are a moron sir!")
end
def test_should_find_multiple_words
assert_equal "With all due respect, you must be a ******, you *****!",
Civilizer.civilize("With all due respect, you must be a retard, you moron!")
end
def test_should_ignore_case
assert_equal "*******!",
Civilizer.civilize("Dumbass!")
end
end I've used slightly unusual formatting here too, which makes it easy to visually compare the expectation with the input.
In good TDD style, we save this test before we change the code to meet the new specification. As we expected, autotest shows us a red light. This is an indication that our changed tests are good. Now let's change the code:
class Civilizer
def self.remove_insults(input)
puts civilize(input)
end
def self.civilize(input)
input.gsub(/\b(stupid|moron|dumbass|retard)\b/i) {|match| '*' * match.length}
end
end
Civilizer.remove_insults "you truly are a moron sir!" Meeting the new spec just required getting rid of the join in civilize, and the split in the final statement which gets executed when we run this code for 'real' as opposed to testing it. Once again, the feedback from autotest goes green.
Test Code Deserves to Be Beautiful Too
At this point I stepped back and tidied up the test code. I wanted to make it DRYer, so I introduced a private method to move the repeated code out of each test case:
require "test/unit"
require "civilizer"
class TestCivilizer < Test::Unit::TestCase
def test_should_find_one_word
assert_civilizes "you truly are a moron sir!",
"you truly are a ***** sir!"
end
def test_should_find_multiple_words
assert_civilizes "With all due respect, you must be a retard, you moron!",
"With all due respect, you must be a ******, you *****!"
end
def test_should_ignore_case
assert_civilizes "Dumbass!",
"*******!"
end
private
def assert_civilizes(input, expected)
assert_equal expected, Civilizer.civilize(input)
end
end I don't know about you, but I like the way this code looks. I made a conscious decision on the order of the arguments. Normally assert* methods first argument is what's expected, but here it's the second. The decision on the argument ordering was tied up in what to name the method. I read it as assert_civilizes this [to] that. I probably spent a full minute or two pondering this, probably the most 'agonizing' decision in this exercise. If I kept on with this particular 'project' that might change, no code is sacred!
Another Test, Another Spec Change?
The last test I wrote was this:
def test_should_treat_parts_of_words
assert_civilizes "You, sir are a retarded moron.",
"You, sir are a ******ed *****."
end I've left this as a proposed spec change. It fails because one of the bad words is not bracketed by word boundaries. This would be easy to make work, just get rid of the two '\b' anchors in the regexp, but I'm not sure that it's a good idea, and neither is the 'client'. Should 'stupidity' be considered a bad word? or oxymoron?
I've raised it with the 'client' and I'm still waiting to hear back.
What Have We Learned?
The whole exercise, involving some 20 subversion commits, probably took me 30-45 minutes, far less time than writing this article.
I hope that those of you who have stuck this out to the end found it interesting and perhaps useful. I think it illustrates some key points about test-driven design, rapid iterative development, and dynamic languages:
- Having and maintaining testcode as you go provides value by helping ensure that you aren't wandering away from the goal.
- Good test code, particularly when run automatically, serves as a more powerful substitute for the type checking done by compilers of statically typed languages, it causes both interface and logic errors to cause fast failures which can usually be quickly corrected.
- Just because code exists doesn't mean it's sacred. I'm just as happy, probably happier when I throw old code away and replace it with better code, as long as I'm making progress. Once again testing is a big boon to this goal.











A few times in the above I talked about reformatting the code slightly (additional line breaks) to make it narrower and minimize the horizontal scroll bars.
Well it seems that markup reflows it anyway, so …
Thank you very much – I really enjoyed that!
Great stuff Rick. Thanks for the teaching, as always.
Two notes:
- I’m an anal liner-upper too, and would have definitely lined up the civil/uncivil strings like you did.
- Having to do SVN commits seems like overkill. No way to run in a ‘lighter’ mode where it just polls for file changes? Certainly this sort of behaviour isn’t going to work for people who like commits to be more coarsely grained. But not as much of a problem if you were using a distributed, change set based SCM.
Wow… great article Rick.
I really liked the step-by-step approach that you took.
And thanks for using my article as the starting point!
Frank
Pat,
Actually I think I prefer having control over commits.
I suspect that you’re comparing SVN to Envy, but think that the overall environmental differences, e.g. file vs method granularity, that I think that manually committing makes sense.
And maybe that’s not the real reason. It’s really a matter of committing transactions which in general contain multiple changes. Sort of like an Envy configuration, but not quite.
Rick
Excellent article—much appreciated.
-John
I really enjoyed following this article, thank you for the contribution :)
Thanks to this article, I discovered autotest, which lead to driving over the tracks to UnitRecord and Mocha. Is there no end in sight?