Ruby Extensions vs. Smalltalk Primitives

Posted by Rick DeNatale Mon, 04 Jun 2007 14:16:00 GMT

One of the ways in which Ruby differs from Smalltalk is in how much of the implementation is buried in C, which forms a barrier to deep understanding.

For the purposes of this article, I’m going to use the term extension a little loosely to refer to both core library and extension code written in C.

For example, in Ruby, much of the code behind core classes like Array is written in C. This is good for performance, but not so good for those with a need or desire to grok the code. In contrast, Smalltalk has a large brigade of Collection classes which are written in Smalltalk.

This is not to say that Smalltalk doesn’t use the equivalent of extensions. Smalltalk calls them primitives.

There are some differences here which might be interesting for the Ruby Core team to ponder, if they haven’t already.

Smalltalk primitives

In Smalltalk-80 and its descendants, primitive methods are implemented in a low-level language like C or assembler. They are attached to normal Smalltalk methods by a special syntax. For example, here’s a snippet from the Smalltalk-80 Blue Book. It is the method definition for the + method in SmallInteger (Smalltalk’s analogue to Fixnum in Ruby):

1: + addend
2:   <primitive: 1>
3:   ^ super + addend

The Smalltalk compiler turns method code into CompiledMethod objects which contain the bytecodes implementing the Smalltalk code along with other information such as where to find temporary variables. In the case of a method such as this one, the bytecodes would be something like:

   push self
   push addend
   send_super #+, 1  // invoke the inherited + method with 1 argument 
   return_top_of_stack

These are the bytecodes to implement line 3.
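
To make the stack behaviour concrete, here is a toy Ruby interpreter for those four bytecodes. It is only a sketch: the ToyInteger and ToySmallInteger classes, the symbol opcodes, and the interpret method are all hypothetical, and the super send is simplified to start its lookup in the superclass of the receiver's class rather than of the class that owns the method.

```ruby
# Toy classes standing in for SmallInteger and its superclass (hypothetical).
class ToyInteger
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # The inherited, general-purpose + that the super send ends up calling.
  def +(addend)
    ToyInteger.new(value + addend.value)
  end
end

class ToySmallInteger < ToyInteger; end

# Bytecodes for line 3, written as [opcode, *operands].
# "push addend" is modelled as pushing argument number 0.
PLUS_BYTECODES = [
  [:push_self],
  [:push_arg, 0],
  [:send_super, :+, 1],
  [:return_top_of_stack]
].freeze

def interpret(bytecodes, receiver, args)
  stack = []
  bytecodes.each do |opcode, *operands|
    case opcode
    when :push_self
      stack.push(receiver)
    when :push_arg
      stack.push(args.fetch(operands.first))
    when :send_super
      selector, argc = operands
      call_args = stack.pop(argc)   # arguments come off the stack first
      target = stack.pop            # then the receiver
      # Simplification: look the method up starting at the superclass
      # of the receiver's class.
      inherited = target.class.superclass.instance_method(selector)
      stack.push(inherited.bind(target).call(*call_args))
    when :return_top_of_stack
      return stack.pop
    else
      raise ArgumentError, "unknown bytecode: #{opcode}"
    end
  end
  nil
end

result = interpret(PLUS_BYTECODES, ToySmallInteger.new(3), [ToySmallInteger.new(4)])
puts result.value
```
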

The effect of line 2 is to mark the compiled method as being associated with a numbered primitive. A Ruby implementation of a Smalltalk VM might have code like this.

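module VM

  class CompiledMethod

    # has_prim?, exec_prim, byte_codes and VM.interpret are assumed
    # to be defined elsewhere in the VM.
    def execute
      begin
        return exec_prim if has_prim?
      rescue PrimitiveFailed
        # The primitive failed, so fall through to the bytecodes.
      end
      VM.interpret(byte_codes)
    end

  end

end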

So the idea is that, if a compiled method has an associated primitive method, the VM executes it. If it succeeds, then that’s the end of this method invocation. If it fails, then execution falls through to interpret the byte codes.

In the case of the method we are considering, the primitive succeeds if the argument is also a SmallInteger and the result doesn’t overflow; otherwise it fails. This allows primitives to quickly handle the 80% cases, and fall back on high-level Smalltalk code for the more complicated cases.
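
The success and failure conditions for the + primitive can be sketched in Ruby roughly as follows. This is an illustration rather than how any real VM does it: the SMALL_MIN/SMALL_MAX range is an assumed tagged-integer range chosen for the example, and prim_add, slow_add and add are hypothetical names.

```ruby
class PrimitiveFailed < StandardError; end

# Assumed SmallInteger range for this example; a real VM's range
# depends on its word size and tagging scheme.
SMALL_MIN = -(2**30)
SMALL_MAX = 2**30 - 1

def small_integer?(obj)
  obj.is_a?(Integer) && obj.between?(SMALL_MIN, SMALL_MAX)
end

# The "primitive": handles only SmallInteger + SmallInteger without overflow.
def prim_add(receiver, addend)
  raise PrimitiveFailed unless small_integer?(receiver) && small_integer?(addend)
  sum = receiver + addend
  raise PrimitiveFailed unless small_integer?(sum)
  sum
end

# Stands in for the high-level fallback, "^ super + addend".
def slow_add(receiver, addend)
  receiver + addend
end

# Try the primitive first and fall back to the general code on failure.
def add(receiver, addend)
  prim_add(receiver, addend)
rescue PrimitiveFailed
  slow_add(receiver, addend)
end

puts add(1, 2)          # handled by the primitive
puts add(SMALL_MAX, 1)  # overflows the assumed range, so the fallback runs
puts add(1, 2.5)        # argument is not a SmallInteger, so the fallback runs
```
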

User Primitives

Later Smalltalk implementations such as Smalltalk/V and IBM Smalltalk allowed user-written primitives. These were usually referenced by name rather than by number. The same flow allowed fallback to Smalltalk code on failure of a user primitive, either for error recovery, or for balancing implementation complexity against performance.

Ruby Extensions

In contrast, Ruby methods are either wholly normal Ruby code, or wholly extension methods written as C functions and registered using Ruby API calls like rb_define_method. These methods are somewhat invisible to the Ruby programmer. This can be confusing, particularly when you are trying to read code which is partially implemented in C and partially in Ruby. Just this morning I was trying to debug somebody else’s code which was using the eventmachine gem. Looking at the source for eventmachine showed methods being invoked with no definition visible in the Ruby source. These methods actually live in a C extension to the class.
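
One way to see this invisibility from inside Ruby is to ask a method where it was defined. The sketch below uses source_location, which was added to Ruby after this post was written; the Greeter class is a hypothetical example. On the C implementation of Ruby, a method registered from C such as Array#push should report nil, while a method written in Ruby reports a file and line.

```ruby
# A method written in ordinary Ruby (hypothetical example class).
class Greeter
  def hello
    "hello"
  end
end

ruby_loc = Greeter.instance_method(:hello).source_location
c_loc    = Array.instance_method(:push).source_location

puts "Greeter#hello: #{ruby_loc.inspect}"  # a [file, line] pair
puts "Array#push:    #{c_loc.inspect}"     # nil when the method is implemented in C
```

Other Ruby implementations may answer differently, since they need not implement Array in C.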

Could Ruby Use Something like Smalltalk’s Primitive Methods?

It might be nice if this could be made visible using a mechanism like Smalltalk’s primitives, not only to document such cases, but as a way to do the kind of complexity/performance balancing I’ve described.

It might be interesting to think about combining this with the ideas behind ruby-inline, and to allow the primitives to be written in-line as well. The difference I would see would be the introduction of the idea of failure/recovery and better support for usage of the ruby.h API inside the inlined primitives.
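
As a very rough sketch of what such a declaration might look like, here is a plain-Ruby mock-up in which an ordinary block stands in for the inlined C primitive. Nothing like this exists in Ruby or ruby-inline; the Primitives module, its register and primitive methods, PrimitiveFailed, and the Vector class are all hypothetical names invented for the illustration.

```ruby
class PrimitiveFailed < StandardError; end

module Primitives
  # Hypothetical registry mapping primitive names to fast-path callables.
  REGISTRY = {}

  def self.register(name, &body)
    REGISTRY[name] = body
  end

  # Class-level declaration: try the named primitive first, and fall back
  # to the existing Ruby method body if the primitive fails or is missing.
  def primitive(name, method_name)
    fallback = instance_method(method_name)
    define_method(method_name) do |*args|
      begin
        prim = Primitives::REGISTRY.fetch(name) { raise PrimitiveFailed }
        prim.call(self, *args)
      rescue PrimitiveFailed
        fallback.bind(self).call(*args)
      end
    end
  end
end

# Fast path: only handles the common case of all-Integer items.
Primitives.register(:vector_sum) do |vec|
  raise PrimitiveFailed unless vec.items.all?(Integer)
  vec.items.sum
end

class Vector
  extend Primitives
  attr_reader :items

  def initialize(items)
    @items = items
  end

  # General Ruby code: also copes with numeric strings and floats.
  def total
    items.reduce(0.0) { |acc, item| acc + Float(item) }
  end
  primitive :vector_sum, :total
end

puts Vector.new([1, 2, 3]).total    # primitive succeeds
puts Vector.new([1, "2.5"]).total   # primitive fails, Ruby body runs
```

The declaration sits next to the Ruby method it accelerates, so a reader of the Ruby source can see both that a primitive exists and what it is supposed to compute.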

Just a thought.

Postscript – Related Technology

Although I consider the Smalltalk primitive idea as something which might be added to Ruby, Smalltalk and related languages have added more modern techniques for moving the implementation of the core up into the higher-level language.

Squeak Smalltalk uses a language called Slang (not to be confused with S-Lang) as the source language for the Squeak VM. Slang has a Smalltalk-like syntax, but it is easily translated to C. Squeak user primitives are written in Slang as plug-ins.

Evan Phoenix’s Rubinius started out as a Ruby VM being written in Ruby or a Slang-like language with a Ruby-like syntax, but it seems to have changed tack to use a hand-coded port of Evan’s earlier Ruby code to C. This ported VM is called Shotgun. Shotgun seems to be taking an approach quite similar to the original Smalltalk-80 numbered primitives to implement the functions of the Ruby standard library. I don’t believe that they are using the primitive failure with fallback idea, though.

Java has the Java Native Interface (or JNI), which is similar to Ruby’s extensions in that it provides a C API to the JVM. It differs from Ruby in that Java classes must have declarations of any native functions they provide and are responsible for loading the library containing the binary.

I must confess that it’s been a few years since I stopped keeping up with Java evolution, so there may be new things in Java in this area.

An interesting historical sidenote is that the original VisualAge for Java, which was written before the advent of JNI, used what was called the “UVM” or Universal Virtual Machine. This was a version of the IBM Smalltalk VM extended to support both Smalltalk and Java bytecodes. The UVM used Smalltalk as the language in which Java native methods were written. When Sun came out with the JNI, with requirements for C language primitives, VA/Java moved from a Smalltalk VM to either the Sun or IBM JVMs.

After Dave Ungar cut his teeth (and earned his PhD) working on the UCB implementation of Smalltalk-80, he took on the challenge of implementing a dynamic language completely in itself in the aptly named Self language. The philosophy behind Self abhorred statically optimized techniques, including primitives written in a low-level language, in favor of dynamically optimizing code at run-time. This was the genesis of the JIT and HotSpot approaches commonly used in Java. It was also the basis of the Strongtalk project, which saw members of Dave’s Self team taking some of the ideas from Self back to a Smalltalk implementation.


Comments

  1. Patrick Mueller about 4 hours later:

    “there may be new things in Java in this area”

    Nope. Nothing’s changed.

    One thing I’d like to see available in all languages is what we had in Smalltalk called PlatformFunctions and OSObjects. Basically, a generic way to interface with platform-specific shared libraries (.so’s, .dll’s). Not sure whether Java will ever get over the whole “pointers are evil” mantra. But there’s no reason Ruby can’t. From memory, in all the ‘native’ interactions I ever did in Smalltalk, I only ever wrote one true native, everything else was PlatformFunctions.

  2. Francis Bogsanyi about 5 hours later:

    Self wasn’t implemented completely in itself. The JIT, GC and runtime were implemented in C++, and C/C++ natives were identified in Self source code with an underscore prefix.

    There was a recent attempt to implement Self in Self, called Klein, but that appears to have been abandoned (the source code is available).

  3. Rick DeNatale about 21 hours later:

    Pat,

    Good to hear from you.

    Yes, the PlatformFunctions and OSObjects in IBM Smalltalk form an example of what’s generally called a Foreign Function Interface.

    You’re stirring up old memories. Didn’t we have a special primitive syntax in IBM Smalltalk that PlatformFunctions used in which you described the type interface to a C function something like:

    Am I inventing this? Or maybe I saw it in another Smalltalk back in the X3J20 days.

    Rubinius appears to have an analogous way to dynamically generate stubs to call external code. They’re using GNU Lightning which is something I hadn’t heard of before. I don’t see why this idea couldn’t be implemented in the context of standard Ruby.

    Francis,

    Yes, you are right, the point is that Self went quite far in keeping the stuff below the VM interface as small as possible and doing as much of the ‘class’ library in the high-level language. Smalltalk is in the middle between Ruby and Self here in that it had a collection of primitives to provide things like indexed and byte indexed objects and then built the collections and other classes on top of these, mostly in Smalltalk but with help from these primitives as needed. Ruby on the other hand buries the implementation of Array beneath the language/implementation line.

    So perhaps eschewed might be a better word than my abhorred; it seems to me that Dave had a philosophy of keeping things as high-level as possible, only bowing to pragmatics as a last resort. Since I notice I’ve got a typo anyway, I’ll change that sentence.