Please Stop Abusing Ruby Memoization
March 18, 2021
I've seen liberties taken with Ruby's memoization many times (I've even done it myself), so it's good to stay grounded in the right abstractions and recognize when another tool is a better fit. For the uninitiated, memoization refers to storing a result so that it can be accessed later without recalculation. In other words, memoization is a way to ensure a thing is calculated only once, and in Ruby it usually takes the following shape:
def expensive_calculation
  @expensive_calculation ||= big_number * big_number * big_number * big_number * big_number
end

def big_number
  puts "10"
  10
end
var ||= val translates to var = var || val, which ensures that if var is nil (or false), it is assigned val; once var holds a truthy value, it is never reassigned on later evaluations of the same expression.
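One subtlety worth knowing: ||= short-circuits only on truthy values, so memoizing a result that happens to be false (or nil) silently defeats the cache. A minimal sketch, with a hypothetical lookups counter added purely to make the re-runs visible:

```ruby
class FeatureCheck
  attr_reader :lookups

  def initialize
    @lookups = 0
  end

  # Pitfall: ||= only short-circuits on truthy values, so memoizing a
  # result that is false (or nil) re-runs the right-hand side every call.
  def enabled?
    @enabled ||= begin
      @lookups += 1  # counts how many times the "expensive" work runs
      false          # pretend an expensive lookup returned false
    end
  end
end

check = FeatureCheck.new
3.times { check.enabled? }
check.lookups # => 3, not 1: the "memoized" block ran every time
```

The usual fix for falsey values is a defined? guard, e.g. return @enabled if defined?(@enabled), which short-circuits on whether the variable exists rather than on its truthiness.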
Further to the point, after we call this method once, no further calculations will be made. As an example:
puts "Please don't recalculate this value!: #{expensive_calculation}"
# => 10
# => 10
# => 10
# => 10
# => 10
# => Please don't recalculate this value!: 100000
puts "Please don't recalculate this value!: #{expensive_calculation}"
# => Please don't recalculate this value!: 100000
Voila! The power of memoization!
The Confusion Sets In
Let's look at a formal definition of memoization borrowed from a popular wiki(pedia):
Memoization is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.
This sounds strangely similar to key:value stores like Redis, which we use for caching. In fact, I've even seen people conflate "temporary" writes into persistent data stores (like Postgres) with memoization; confusing! The misunderstanding is worsened by how easily the language lets you seemingly replace an external cache:
class TempCache
  def costly_value1
    @costly_value1 ||= 11111111 * 11111111
  end

  def costly_value2
    @costly_value2 ||= 22222222 * 22222222
  end

  def costly_value3
    @costly_value3 ||= 33333333 * 33333333
  end
end
With TempCache, I will never have to recalculate any of the costly_values; I have essentially "cached" them. Why add the overhead of Redis when Ruby answers all my hopes and dreams?
Everything is fine, right? Well, for now it is, since every calculation is simple and made on my local machine or server.
The Wrong Abstraction
Instead, imagine that we need to make several external requests:
class Pricing
  def amazon_pricing
    @amazon_pricing ||= HTTPRequest.new(...)
  end

  def walmart_pricing
    @walmart_pricing ||= HTTPRequest.new(...)
  end

  def tesco_pricing
    @tesco_pricing ||= HTTPRequest.new(...)
  end
end
What has changed? Every memoized value is now the result of an external HTTP request. Imagine one of these Pricing objects is instantiated to satisfy the request of a web server; you'd better be patient, as these requests (if they are called) would be made on every single request-response cycle.
You might think this example is a little silly. After all, who would build a class containing only these requests? It looks less awkward once your classes age and mature and another person comes into the class looking for a pattern to follow. At that point you've entered the death spiral of bad patterns: people copying them.
Let's look at another example which I've seen in production:
class Pricing
  class << self
    def amazon_pricing
      @amazon_pricing ||= redis.get('amazon_pricing')
    end

    def walmart_pricing
      @walmart_pricing ||= redis.get('walmart_pricing')
    end

    def tesco_pricing
      @tesco_pricing ||= redis.get('tesco_pricing')
    end
  end
end
The variables used here are actually class instance variables and won't be gobbled up by the garbage collector the way an instance's variables would, so if classes are cached on the server then there won't be a performance cost aside from the first time the "cache" is hit.
This is an improvement on the previous example, but also a degradation in some respects. Testing singletons can be tricky since you'll have to manually reset class instance variables to ensure tests start off from a blank slate.
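To illustrate the testing pain, here is a sketch of the reset boilerplate such tests end up needing. The Redis call is stubbed out as a plain value for illustration, and the reset helper is a hypothetical name, not a real API:

```ruby
class Pricing
  class << self
    def amazon_pricing
      @amazon_pricing ||= fetch_amazon_pricing
    end

    def fetch_amazon_pricing
      "9.99" # stand-in for redis.get('amazon_pricing')
    end
  end
end

# Because self is Pricing when the singleton methods run, the memo lives
# in a class instance variable on Pricing itself. Every test suite using
# this class needs a teardown hook that manually clears it, or the first
# test to touch the class leaks its value into all the others:
def reset_pricing_cache!
  Pricing.instance_variable_set(:@amazon_pricing, nil)
end

Pricing.amazon_pricing # a test populates the "cache"
reset_pricing_cache!   # without this, later tests see stale state
```

That reset call has to track every memoized variable on every singleton, which is exactly the kind of hidden coupling tests are supposed to surface.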
There is also the danger of multiple application servers spinning up simultaneously, causing them all to re-instantiate their classes and fetch the relevant information at once. This is also known as a "thundering herd" and can easily lead to rate-limiting and self-inflicted denial-of-service "attacks".
Hijacking The Language
So far, we've seen two options which handle two use cases:
- Class instances: Storing expensive calculations for the length of a request-response cycle
- Class singletons: Storing expensive calculations until the application server is recycled
By moving memoized calls into individual method calls, we are abandoning the language constructs that Ruby provides, namely constructors. Instead of memoizing a method result, use what Ruby has blessed us with:
class Pricing
  attr_reader :amazon_pricing, :walmart_pricing, :tesco_pricing

  def initialize
    @amazon_pricing = HTTPRequest.new(...)
    @walmart_pricing = HTTPRequest.new(...)
    @tesco_pricing = HTTPRequest.new(...)
  end
end
This is much simpler and easier to read! We now also have a guarantee that every method call will take the same length of time. Previously, the first call to each method would take longer than any that followed. Now that the requests are grouped together, we can even execute these IO-bound requests concurrently. Wins all around!
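Grouping the requests in the constructor is what makes concurrency easy: they can simply be fanned out with plain threads. A sketch, in which the fetch method is a stand-in for HTTPRequest.new(...) that sleeps to mimic network latency:

```ruby
class Pricing
  attr_reader :amazon_pricing, :walmart_pricing, :tesco_pricing

  def initialize
    # Start all three IO-bound requests at once...
    amazon  = Thread.new { fetch("amazon") }
    walmart = Thread.new { fetch("walmart") }
    tesco   = Thread.new { fetch("tesco") }

    # ...then Thread#value joins each thread and returns its result.
    @amazon_pricing  = amazon.value
    @walmart_pricing = walmart.value
    @tesco_pricing   = tesco.value
  end

  private

  # Stand-in for HTTPRequest.new(...): sleeps to mimic network latency.
  def fetch(retailer)
    sleep 0.05
    "#{retailer}: 9.99"
  end
end

pricing = Pricing.new # total wall time ~0.05s, not ~0.15s
```

Because the work is IO-bound, the three sleeps (like three real HTTP requests) overlap, so total latency is roughly that of the slowest request rather than the sum of all three.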
The skeptic might ask, "What if I don't want to execute every request every time I instantiate an object?" If that is the case, and not every request made in the constructor is used throughout the class, a red flag should immediately be raised. If one method is called only in very special circumstances, we should consider the cost of classes with such low cohesion that they require "smart clients". In such cases, functionality should be extracted and multiple objects composed out of the larger one with so many unrelated parts.
Moving on to the singleton usage, we can finally address the real problem: we are abusing Ruby by treating its memoization as a sort of cache for external requests. The easiest solution is to replace the memoized calls on the singleton with external cache fetches on plain ol' Ruby objects:
class Pricing
  attr_reader :amazon_pricing, :walmart_pricing, :tesco_pricing

  def initialize
    @amazon_pricing = cache.fetch { HTTPRequest.new(...) }
    @walmart_pricing = cache.fetch { HTTPRequest.new(...) }
    @tesco_pricing = cache.fetch { HTTPRequest.new(...) }
  end
end
"But what about the cost of fetching each value each time we instantiate the Pricing class?"
A round-trip to a key:value store like Redis within your VPC is extremely fast and very unlikely to cause a significant impact on your response times. Moving the calls to an actual cache removes the thundering herd on server recycling and also makes testing so much easier since we are no longer preserving state on the class' singleton between tests.
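For reference, cache.fetch here stands for the usual read-through pattern that Redis-backed caches (e.g. Rails.cache.fetch) provide. A minimal in-process sketch of that pattern, with illustrative names rather than a real API:

```ruby
# Minimal read-through cache: fetch returns the stored value for a key,
# or runs the block, stores its result, and returns it. Real deployments
# would back the store with Redis instead of a Hash.
class SimpleCache
  def initialize
    @store = {}
  end

  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

cache = SimpleCache.new
calls = 0
cache.fetch(:amazon_pricing) { calls += 1; "9.99" } # miss: runs the block
cache.fetch(:amazon_pricing) { calls += 1; "9.99" } # hit: block skipped
calls # => 1
```

Note that fetch takes an explicit key, so the cached state lives in a store you can inspect, expire, and share between processes, unlike state hidden in a class's singleton.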
In Conclusion
There are times when disregarding these rules is useful, but I would avoid doing so unless it is absolutely necessary.
I typically treat memoization that carries calculations across scopes (e.g. using a result memoized in one method from outside that method) as a code smell. Using memoization this way usually means I need to reach for something else (a real cache) or reorganize my code to use what Matz gave us for better code organization and reuse.