Hamstar Transforms Immutable Ruby Collections Better

When working with Amazon’s AWS SDK for Ruby there are two flavors of API. The most mature parts of AWS have what’s called a resource interface. These are fine-grained object-oriented interfaces to AWS objects. Want to delete a bunch of objects having the /tmp-files/ path in an S3 bucket? Easy:

bucket.objects(prefix: '/tmp-files/').delete

That’s a convenient way to deal with S3 buckets. Alas vast swaths of the AWS service-space have no such convenient interface. Those services offer only a thin veneer over the underlying REST interface via a client object. A prime example of a service offering this more primitive interface is the EC2 Container Service or ECS.

If you want to, say, change the version of a Docker image referenced within an ECS task definition, the process is:

1. call describe_task_definition to retrieve the structure
2. pluck the container_definitions array from the structure and modify the image value for the container of interest
3. call register_task_definition, passing the doctored container array

Ruby is pretty good at this sort of thing. At step 2, you just mutate the structure retrieved in step 1. At step 3 you submit part of the mutated structure back to AWS.
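As a rough sketch, the three steps might look like the following. FakeEcsClient is a hypothetical stand-in so the example runs without AWS credentials; only the two method names come from the text above, while the keyword arguments and the 'web' family name are assumptions made for illustration.

```ruby
# Hypothetical stand-in for the real ECS client, for illustration only.
class FakeEcsClient
  attr_reader :registered

  # Returns a trimmed-down structure shaped like the real response.
  def describe_task_definition(task_definition:)
    { task_definition: {
      family: task_definition,
      container_definitions: [
        { name: 'front', image: 'nginx:1.7' },
        { name: 'my-python-web-app', image: 'my-python-web-app:latest' }
      ]
    } }
  end

  # Records what was submitted so we can inspect it afterwards.
  def register_task_definition(family:, container_definitions:)
    @registered = { family: family, container_definitions: container_definitions }
  end
end

client = FakeEcsClient.new

# Step 1: retrieve the structure.
response = client.describe_task_definition(task_definition: 'web')

# Step 2: pluck the container array and mutate the entry of interest.
containers = response[:task_definition][:container_definitions]
containers.each { |c| c[:image] = 'nginx:1.9' if c[:name] == 'front' }

# Step 3: submit the doctored array back.
client.register_task_definition(family: 'web', container_definitions: containers)
```

Note that step 2 changes the retrieved response in place, which is exactly the behavior the rest of this post tries to avoid.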

If you’ve spent any time programming in Haskell or Clojure though, you may be hankering for a more “functional” solution. Do you really have to mutate that big complex structure? What if later you found that you needed two different versions of the structure, with two different mutations? Would you retrieve the structure from AWS twice? Or would you think about “deep cloning” the structure each time? That way lies madness.

First, we’ll take a look at how we would usually approach this in Ruby, then we’ll look at how a Clojurist might think about the problem, and finally we’ll apply a more functional approach to our Ruby solution. In the process we may even do Clojure one better!

The Problem

The AWS ECS client returns a structure something like this from describe_task_definition:

response = {
  task_definition: {
    task_definition_arn: 'some-arn-string',
    container_definitions: [
      {
        name: 'front',
        image: 'nginx:1.7'
      },
      {
        name: 'my-python-web-app',
        image: 'my-python-web-app:latest'
      }
    ]
  }
}

Now there is a lot more structure in the actual response. I’m just showing the parts that are essential for our discussion.

What I’d like to do is upgrade the NGINX Docker image used by the container named “front”, to version 1.9. Note that in my upgrade script, I don’t know what version of nginx is currently in place, nor do I want to upgrade every reference to the NGINX image. I only want to upgrade ‘front’ to NGINX version 1.9.

Here’s some typical Ruby to do the job:

response[:task_definition][:container_definitions].each do |c|
  c[:image] = 'nginx:1.9' if c[:name] == 'front'
end

Now our response object can be submitted back to the server. But what if we didn’t want to mutate the original (deep) structure? Let’s just do a plain old clone of that object and see what we get:

> r2 = response.clone
> response.__id__
 => 2155921900
> r2.__id__
 => 2161504860

So far, so good: we’ve made a copy of the response object! But wait:

> response[:task_definition].__id__
 => 2155921940
> r2[:task_definition].__id__
 => 2155921940

While the top-level Hash has been cloned, the internal values have not. This was not a deep clone.

You may want to take a moment now to follow the rabbit down the hole of deep cloning Ruby objects. Or you may want to go shave your yak. I’ve been down that hole and I’ve spent time with the yak and I believe there is a better way.
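For the curious, here is a minimal sketch of why the shallow clone bites, along with one common entry point to the deep-cloning rabbit hole: a Marshal round trip. It assumes the structure contains only plain data (hashes, arrays, strings, symbols); Marshal cannot dump things like Procs or IO objects.

```ruby
response = {
  task_definition: {
    container_definitions: [{ name: 'front', image: 'nginx:1.7' }]
  }
}

# A shallow clone shares the nested hashes and arrays with the original.
shallow = response.clone
shallow[:task_definition][:container_definitions][0][:image] = 'nginx:1.9'
leaked = response[:task_definition][:container_definitions][0][:image]

# A Marshal round trip serializes and rebuilds the whole structure,
# so nothing is shared with the original.
deep = Marshal.load(Marshal.dump(response))
deep[:task_definition][:container_definitions][0][:image] = 'nginx:1.10'
untouched = response[:task_definition][:container_definitions][0][:image]

puts leaked     # the change made through the shallow clone shows up here
puts untouched  # the change made through the deep copy does not
```

The round trip copies everything on every call, whether or not you end up changing it.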

Forget Deep Cloning: Use Hamster’s Immutable Collections Instead

When we program in Clojure we never clone anything. Why is that? Well, it’s because the day-to-day operations we perform on collections always automatically create what appear to be brand new, modified copies. Part of the magic of Clojure is that this illusion is efficiently maintained. Hint: your structures aren’t really deep cloned every time you would otherwise mutate them. The Clojure Data Structures page explains it well.

Can we use the same idea to completely side-step this problem in Ruby? Well yes, we can! Some smart people already created a Clojure-like immutable collections library for Ruby. The Hamster Gem gives you a set of collection classes that mirror the familiar Ruby ones like Hash and Array. But the Hamster classes provide immutable operations (see the Hamster API documentation).

Here’s the Hamster code:

require 'hamster'
ri = Hamster.from(response) # convert to immutable containers
containers = ri[:task_definition][:container_definitions].\
 map{|c| c[:name] == 'front' ? c.put( :image, 'nginx:1.9') : c}
pp Hamster.to_ruby(containers)
[{:image=>"nginx:1.9", :name=>"front"},
 {:image=>"my-python-web-app:latest", :name=>"my-python-web-app"}]

We convert the response to immutable collections and then we use map to create a brand new container_definitions array (Vector). We examine each Hash in turn. If a hash has :name 'front' we apparently create a brand new hash. Hamster magic takes care of making that apparent duplication efficient. The new structure will share the unmodified parts of the original. Only the modified parts will require extra storage.
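The sharing idea can be illustrated without Hamster at all. The sketch below uses frozen plain-Ruby hashes and non-destructive methods (map and merge); it is only an analogy for what Hamster does, not a description of its internals.

```ruby
containers = [
  { name: 'front', image: 'nginx:1.7' }.freeze,
  { name: 'my-python-web-app', image: 'my-python-web-app:latest' }.freeze
].freeze

# map builds a new outer array; merge builds a new hash only for the match.
updated = containers.map do |c|
  c[:name] == 'front' ? c.merge(image: 'nginx:1.9') : c
end

changed_is_new   = !updated[0].equal?(containers[0]) # modified element is a fresh Hash
unchanged_shared = updated[1].equal?(containers[1])  # untouched element is the very same object

puts changed_is_new
puts unchanged_shared
```

Plain Ruby still copies the whole outer array here, so the analogy only goes so far; the point is that the untouched element is reused rather than duplicated.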

This is fine if we only need to modify one collection. But what if we wish to retain the deep structure? The Hamster example just given isn’t actually doing everything our original Ruby example did. This first Hamster example is only creating a new containers vector. What about the enclosing task_definition hash?

It is not necessarily obvious that immutable versions of Hash and Array will be sufficient for this task. The response structure is a hash containing a hash containing an array of hashes. Will Hamster provide a way for us to modify these deep structures in nested containers and create new nested containers (i.e. create the illusion of creating new nested containers)?

The answer is “yes!” Hamster provides an update_in() method that works just like Clojure’s update-in function. It’s accessed as a method on Hamster’s Hash and Vector classes and it is generic enough to transform deep structures consisting of Hashes and Vectors.

Hamster’s update_in() takes a path specification of “keys” (hash keys and array indices) into your deep (hash and array) structure and applies a block to the target value, deep within the structure. Applying that block modifies the target value, resulting (apparently) in a new collection object. As it unwinds the stack, update_in() modifies the enclosing collections, (apparently) creating new ones until it reaches the top-most collection. Because it’s all built on Hamster’s efficient collection primitives, the whole thing is efficient.
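To make the recursion concrete, here is a minimal sketch of the same idea over plain Ruby Hash and Array objects. The name update_in_sketch is hypothetical and this is not Hamster’s implementation: it makes a shallow dup at each level along the path, where Hamster relies on its own persistent collections.

```ruby
# Walk down the path, apply the block at the bottom, and rebuild each
# enclosing collection on the way back up. Collections off the path are reused.
def update_in_sketch(coll, *path, &block)
  key, *rest = path
  current = coll.fetch(key, nil) # Hash#fetch and Array#fetch both take a default
  new_value = rest.empty? ? block.call(current) : update_in_sketch(current, *rest, &block)
  copy = coll.dup                # shallow copy of this level only
  copy[key] = new_value
  copy
end

response = {
  task_definition: {
    task_definition_arn: 'some-arn-string',
    container_definitions: [
      { name: 'front', image: 'nginx:1.7' },
      { name: 'my-python-web-app', image: 'my-python-web-app:latest' }
    ]
  }
}

updated = update_in_sketch(response, :task_definition, :container_definitions, 0, :image) do |_old|
  'nginx:1.9'
end

puts updated[:task_definition][:container_definitions][0][:image]
puts response[:task_definition][:container_definitions][0][:image]
```

The second container hash is not on the path, so the original and the updated structure both point at the same object.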

With update_in you could, for instance, update the :image on container 0 to nginx:1.9 like this:

rv2 = ri.update_in( :task_definition, :container_definitions, \
 0, :image){|i| 'nginx:1.9'}
pp Hamster.to_ruby(rv2)
{:task_definition=>
  {:task_definition_arn=>"some-arn-string",
   :container_definitions=>
    [{:name=>"front", :image=>"nginx:1.9"},
     {:name=>"my-python-web-app", :image=>"my-python-web-app:latest"}]}}

You can see the Clojure philosophy at work here. An array or vector is very much like a hash in that an array maps keys to values. In the case of an array or vector the keys all happen to be natural numbers. Hamster has stacked the deck by providing a core set of methods on both Hamster::Vector and Hamster::Hash, that update_in() can rely on, specifically fetch(key,default) and put(key,val). We Rubyists call this duck typing. Hamster’s update_in is described in the Transformations section of the Hamster API doc. Hamster defines a mix-in module called Associable that implements update_in() for classes that meet the criteria.

Not Quite There—We Need Something More

This would be great if we always knew the index of the Vector element that we wanted to update. But in general we do not. What we really want to do is update whatever Vector element is a Hash with :name equal to 'front'. Instead of specifying Vector offset ‘0’ we’d like to specify the element whose value meets our criterion.

What if we had a function called update_having() that could take a different kind of path specification? For example:

update_in():     :task_definition, :container_definitions, 0,               :image
update_having(): :task_definition, :container_definitions, [:name,'front'], :image

I’ve created a little gem called Hamstar that does just that. Dig:

require 'hamstar'
rv3 = Hamstar.update_having( ri, :task_definition, :container_definitions, \
   [:name,'front'], :image){|i| 'nginx:1.9'}
pp Hamster.to_ruby(rv3)
=> (exact same result as before with update_in)

Oh and Hamstar.update_having() uses the case comparison operator === on your match value so you can use regexps and ranges and other interesting objects besides strings there. So if you wanted a case-insensitive match on the name you could:

rv4 = Hamstar.update_having( ri, :task_definition, :container_definitions, \
   [:name,/FRONT/i], :image){|i| 'nginx:1.9'}
pp Hamster.to_ruby(rv4)
=> (exact same result as before)
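If the case comparison operator is unfamiliar, this plain-Ruby sketch shows how === behaves for a few kinds of matcher. The helper name case_match? is hypothetical and only wraps the operator.

```ruby
# === is the operator Ruby's case/when uses; each class defines its own meaning.
def case_match?(matcher, value)
  matcher === value
end

checks = {
  string: case_match?('front', 'front'),                         # String: equality
  regexp: case_match?(/FRONT/i, 'front'),                        # Regexp: pattern match
  range:  case_match?(1..3, 2),                                  # Range: membership
  klass:  case_match?(String, 'front'),                          # Class: instance check
  proc:   case_match?(->(s) { s.start_with?('fr') }, 'front'),   # Proc: calls the proc
  miss:   case_match?(/back/, 'front')                           # no match
}

checks.each { |kind, result| puts "#{kind}: #{result}" }
```

Note that the matcher goes on the left-hand side: /FRONT/i === 'front' is true, but 'front' === /FRONT/i is not.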

Rather than injecting update_having() into Hamster classes as a method, I opted to implement it as a module_function on the Hamstar module. This is less intrusive and might make it a little more apparent that update_having() operates on compound structures of hashes and vectors.

The name “Hamstar” is a portmanteau of “Hamster” and “star”. I chose the name because of the other feature Hamstar adds, and that is the Kleene star. You can use the Kleene star to match all container elements.

Say you wanted to capitalize the :name value on every element of the :container_definitions array. Doing this with update_in() would require one call for each element of the array. With Hamstar you can do it in a single call using the Kleene star '*'. Here we use a '*' at the top level to select every association in the top-level hash, and another '*' to select every element of the :container_definitions value:

rv5 = Hamstar.update_having( ri, '*', :container_definitions, '*', :name){|n| \
  n.capitalize}
pp Hamster.to_ruby(rv5)
{:task_definition=>
  {:task_definition_arn=>"some-arn-string",
   :container_definitions=>
    [{:name=>"Front", :image=>"nginx:1.7"},
     {:name=>"My-python-web-app", :image=>"my-python-web-app:latest"}]}}

The addition of the Kleene star was inspired by the Instar Clojure library. That’s an interesting and powerful library. It doesn’t, however, provide the associative selection that Hamstar offers.

And finally, if none of those matchers work for you, you can specify your own Proc as a matcher. Here we capitalize all names containing an ‘f’:

rv6 = Hamstar.update_having( ri, '*', :container_definitions, '*', ->(k,v){ \
 k==:name&&v=~/f/}){|n| n.capitalize}
pp Hamster.to_ruby(rv6)
{:task_definition=>
  {:task_definition_arn=>"some-arn-string",
   :container_definitions=>
    [{:name=>"Front", :image=>"nginx:1.7"},
     {:name=>"my-python-web-app", :image=>"my-python-web-app:latest"}]}}
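To show how the four kinds of path element could fit together, here is a rough sketch over plain Ruby Hash and Array objects. The names update_having_sketch and selects? are hypothetical, and this is not Hamstar’s actual implementation; it only mimics the behavior described above.

```ruby
# Does one path element select this key/value pair (index/element for arrays)?
def selects?(matcher, key, value)
  case matcher
  when '*'  then true                          # Kleene star: everything
  when Proc then matcher.call(key, value)      # custom matcher
  when Array                                   # [field, expected]: associative selection
    field, expected = matcher
    value.is_a?(Hash) && expected === value[field]
  else
    matcher == key                             # plain key or index
  end
end

# Rebuild only the selected branches; unselected values are reused as-is.
def update_having_sketch(coll, *path, &block)
  return block.call(coll) if path.empty?

  matcher, *rest = path
  case coll
  when Hash
    coll.each_with_object({}) do |(k, v), out|
      out[k] = selects?(matcher, k, v) ? update_having_sketch(v, *rest, &block) : v
    end
  when Array
    coll.each_with_index.map do |v, i|
      selects?(matcher, i, v) ? update_having_sketch(v, *rest, &block) : v
    end
  else
    coll # the path runs past a leaf value: leave it alone
  end
end

response = {
  task_definition: {
    task_definition_arn: 'some-arn-string',
    container_definitions: [
      { name: 'front', image: 'nginx:1.7' },
      { name: 'my-python-web-app', image: 'my-python-web-app:latest' }
    ]
  }
}

by_name = update_having_sketch(response, :task_definition, :container_definitions,
                               [:name, 'front'], :image) { |_i| 'nginx:1.9' }
starred = update_having_sketch(response, '*', :container_definitions, '*', :name, &:capitalize)
by_proc = update_having_sketch(response, '*', :container_definitions, '*',
                               ->(k, v) { k == :name && v.match?(/f/) }, &:capitalize)
```

Every call returns a new structure and leaves response as it was.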

Conclusion

Thinking like a functional programmer is new territory for me. There is a menagerie of data structures, algorithms, and techniques to master. The rewards are considerable.

It’s fun and even profitable to take some of these ideas back to our Ruby cave for experimentation. You’ve seen how the Hamster Gem’s immutable collections can be used to do an everyday job in a different way. My Hamstar Gem’s update_having() extended the functionality of update_in(), with associative selection criteria and a Kleene star.

You may look at the original (idiomatic) Ruby and conclude that it is more readable than the more functional alternative. You may also encounter surprising results from your new “functional” code. For instance:

x = Hamster.from({name:'Pat'})
 => Hamster::Hash[:name => "Pat"]
x.update_in(:name){|name| name << 'sy'}
 => Hamster::Hash[:name => "Patsy"]
x.update_in(:name){|name| name << 'sy'}
 => Hamster::Hash[:name => "Patsysy"]

“The collection x is immutable—but it appears that x was modified by the first update_in()”, you say. What happened here is we used a mutating operation << on a String stored in an immutable collection. The Hamster main page says:

While Hamster collections are immutable, you can still mutate objects stored in them. We recommend that you don’t do this, unless you are sure you know what you are doing.

If we had written it this way, we would have gotten the expected results:

x = Hamster.from({name:'Pat'})
 => Hamster::Hash[:name => "Pat"]
x.update_in(:name){|name| name + 'sy'}
 => Hamster::Hash[:name => "Patsy"]
x.update_in(:name){|name| name + 'sy'}
 => Hamster::Hash[:name => "Patsy"]
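The same trap exists in plain Ruby, which may make it easier to see. In this sketch a frozen Hash plays the role of the immutable collection; freezing the container does nothing for the String inside it unless that String is frozen too.

```ruby
# A frozen container holding a mutable String: << still changes the String.
record = { name: String.new('Pat') }.freeze
record[:name] << 'sy'
after_shovel = record[:name]

# A frozen container holding a frozen String: + builds a new String,
# while << raises instead of silently mutating.
safe = { name: 'Pat'.freeze }.freeze
bumped = safe[:name] + 'sy'
begin
  safe[:name] << 'sy'
  raised = false
rescue StandardError => e # FrozenError on recent Rubies
  raised = true
  puts "#{e.class}: #{e.message}"
end

puts after_shovel
puts bumped
puts safe[:name]
```

Freezing the values you store is one way to turn a silent surprise into a loud one.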

The whole Ruby language, environment, and ecosystem formed around mutability, so while functional programming in Ruby is possible, there are many pitfalls. There is a heavy intellectual burden, at least initially, when you attempt to go functional in Ruby.

My experience experimenting with functional programming in Ruby mirrors what Mister Miyagi said about karate:

Miyagi: get squish just like grape. Here, karate, same thing. Either you karate do “yes” or karate do “no.” You karate do “guess so,”

from IMDB Karate Kid quotes

Clojure, on the other hand, started from a more functional place. Immutability is the default there. While you can port much of Clojure to Ruby, you may find after you’re done, that you’ve created a somewhat confusing and hostile environment.