Please Stop Abusing Ruby Memoization
March 18, 2021
I've seen liberties taken with Ruby's memoization many times (I've even done it myself), so it's good to stay grounded in the right abstractions and recognize when another tool is a better fit. For the uninitiated, memoization refers to storing a result so that it can be accessed later without recalculation. In other words, memoization is a way to ensure a thing is calculated only once, and in Ruby it usually takes the following shape:
def expensive_calculation
  @expensive_calculation ||= big_number * big_number * big_number * big_number * big_number
end

def big_number
  puts "10"
  10
end
var ||= val translates to var = var || val, which ensures that if var is nil (or false), it is assigned val; once var holds a truthy value, it is never reassigned on later evaluations of the same expression.
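One subtlety worth knowing: ||= short-circuits only on truthy values, so memoizing a result that happens to be false (or nil) silently defeats the cache. A minimal sketch, with a hypothetical lookups counter added purely to make the re-runs visible:

```ruby
class FeatureCheck
  attr_reader :lookups

  def initialize
    @lookups = 0
  end

  # Pitfall: ||= only short-circuits on truthy values, so memoizing a
  # result that is false (or nil) re-runs the right-hand side every call.
  def enabled?
    @enabled ||= begin
      @lookups += 1  # counts how many times the "expensive" work runs
      false          # pretend an expensive lookup returned false
    end
  end
end

check = FeatureCheck.new
3.times { check.enabled? }
check.lookups # => 3, not 1: the "memoized" block ran every time
```

The usual fix for falsey values is a defined? guard, e.g. return @enabled if defined?(@enabled), which short-circuits on whether the variable exists rather than on its truthiness.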
Further to the point, after we call this method once, no further calculations will be made. As an example:
puts "Please don't recalculate this value!: #{expensive_calculation}"
# => 10
# => 10
# => 10
# => 10
# => 10
# => Please don't recalculate this value!: 100000
puts "Please don't recalculate this value!: #{expensive_calculation}"
# => Please don't recalculate this value!: 100000
Voila! The power of memoization!
The Confusion Sets In
Let's look at a formal definition of memoization borrowed from a popular wiki(pedia):
Memoization is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.
This sounds strangely similar to key:value stores like Redis, which we use for caching. In fact, I've even seen people conflate "temporary" writes into persistent data stores (like Postgres) with memoization; confusing! The misunderstanding is worsened by how easily the language lets you seemingly replace an external cache:
class TempCache
  def costly_value1
    @costly_value1 ||= 11111111 * 11111111
  end

  def costly_value2
    @costly_value2 ||= 22222222 * 22222222
  end

  def costly_value3
    @costly_value3 ||= 33333333 * 33333333
  end
end
With TempCache, I will never have to recalculate any of the costly_values; I have essentially "cached" them. Why add the overhead of Redis when Ruby answers all my hopes and dreams?
Everything is fine, right? Well, for now it is, since every calculation is simple and made on my local machine or server.
The Wrong Abstraction
Instead, imagine that we need to make several external requests:
class Pricing
  def amazon_pricing
    @amazon_pricing ||= HTTPRequest.new(...)
  end

  def walmart_pricing
    @walmart_pricing ||= HTTPRequest.new(...)
  end

  def tesco_pricing
    @tesco_pricing ||= HTTPRequest.new(...)
  end
end
What has changed? Every memoized value is now the result of an external HTTP request. Imagine one of these Pricing objects is instantiated to satisfy the request of a web server; you'd better be patient, as these requests (if they are called) would be made on every single request-response cycle.
You might think this example is a little silly. After all, who would build a class containing only these requests? It looks less awkward once your classes age and mature and another person comes into the class looking for a pattern to follow. At that point you've entered the death spiral of bad patterns: people copying them.
Let's look at another example which I've seen in production:
class Pricing
  class << self
    def amazon_pricing
      @amazon_pricing ||= redis.get('amazon_pricing')
    end

    def walmart_pricing
      @walmart_pricing ||= redis.get('walmart_pricing')
    end

    def tesco_pricing
      @tesco_pricing ||= redis.get('tesco_pricing')
    end
  end
end
The variables used here are actually class instance variables and won't be gobbled up by the garbage collector the way an instance's variables would, so if classes are cached on the server then there won't be a performance cost aside from the first time the "cache" is hit.
This is an improvement on the previous example, but also a degradation in some respects. Testing singletons can be tricky since you'll have to manually reset class instance variables to ensure tests start off from a blank slate.
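To illustrate the testing pain, here is a sketch of the reset boilerplate such tests end up needing. The Redis call is stubbed out as a plain value for illustration, and the reset helper is a hypothetical name, not a real API:

```ruby
class Pricing
  class << self
    def amazon_pricing
      @amazon_pricing ||= fetch_amazon_pricing
    end

    def fetch_amazon_pricing
      "9.99" # stand-in for redis.get('amazon_pricing')
    end
  end
end

# Because self is Pricing when the singleton methods run, the memo lives
# in a class instance variable on Pricing itself. Every test suite using
# this class needs a teardown hook that manually clears it, or the first
# test to touch the class leaks its value into all the others:
def reset_pricing_cache!
  Pricing.instance_variable_set(:@amazon_pricing, nil)
end

Pricing.amazon_pricing # a test populates the "cache"
reset_pricing_cache!   # without this, later tests see stale state
```

That reset call has to track every memoized variable on every singleton, which is exactly the kind of hidden coupling tests are supposed to surface.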
There is also the danger of multiple application servers spinning up simultaneously, causing them all to re-instantiate their classes and fetch the relevant information at once. This is also known as a "thundering herd" and can easily lead to rate-limiting and self-inflicted denial-of-service "attacks".
Hijacking The Language
So far, we've seen two options which handle two use cases:
- Class instances: Storing expensive calculations for the length of a request-response cycle
- Class singletons: Storing expensive calculations until the application server is recycled
By moving memoized calls into individual method calls, we are abandoning the language constructs that Ruby provides, namely constructors. Instead of memoizing a method result, use what Ruby has blessed us with:
class Pricing
  attr_reader :amazon_pricing, :walmart_pricing, :tesco_pricing

  def initialize
    @amazon_pricing = HTTPRequest.new(...)
    @walmart_pricing = HTTPRequest.new(...)
    @tesco_pricing = HTTPRequest.new(...)
  end
end
This is much simpler and easier to read! We now also have a guarantee that every method call will take the same length of time. Previously, the first call to each method would take longer than any that followed. Now that the requests are grouped together, we can even execute these IO-bound requests concurrently. Wins all around!
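Grouping the requests in the constructor is what makes concurrency easy: they can simply be fanned out with plain threads. A sketch, in which the fetch method is a stand-in for HTTPRequest.new(...) that sleeps to mimic network latency:

```ruby
class Pricing
  attr_reader :amazon_pricing, :walmart_pricing, :tesco_pricing

  def initialize
    # Start all three IO-bound requests at once...
    amazon  = Thread.new { fetch("amazon") }
    walmart = Thread.new { fetch("walmart") }
    tesco   = Thread.new { fetch("tesco") }

    # ...then Thread#value joins each thread and returns its result.
    @amazon_pricing  = amazon.value
    @walmart_pricing = walmart.value
    @tesco_pricing   = tesco.value
  end

  private

  # Stand-in for HTTPRequest.new(...): sleeps to mimic network latency.
  def fetch(retailer)
    sleep 0.05
    "#{retailer}: 9.99"
  end
end

pricing = Pricing.new # total wall time ~0.05s, not ~0.15s
```

Because the work is IO-bound, the three sleeps (like three real HTTP requests) overlap, so total latency is roughly that of the slowest request rather than the sum of all three.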
The skeptic might ask, "What if I don't want to execute every request every time I instantiate an object?" If that is the case, and not every request made in the constructor is used throughout the class, a red flag should immediately be raised. If one method is called only in very special circumstances, we should consider the cost of classes with such low cohesion that they require "smart clients". In such cases, functionality should be extracted and multiple objects composed out of the larger one with so many unrelated parts.
Moving on to the singleton usage, we can finally address the real problem: we are abusing Ruby by treating its memoization as a sort of cache for external requests. The easiest solution is to replace the memoized calls on the singleton with external cache fetches on plain ol' Ruby objects:
class Pricing
  attr_reader :amazon_pricing, :walmart_pricing, :tesco_pricing

  def initialize
    @amazon_pricing = cache.fetch { HTTPRequest.new(...) }
    @walmart_pricing = cache.fetch { HTTPRequest.new(...) }
    @tesco_pricing = cache.fetch { HTTPRequest.new(...) }
  end
end
"But what about the cost of fetching each value each time we instantiate the Pricing class?"
A round-trip to a key:value store like Redis within your VPC is extremely fast and very unlikely to cause a significant impact on your response times. Moving the calls to an actual cache removes the thundering herd on server recycling and also makes testing so much easier since we are no longer preserving state on the class' singleton between tests.
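For reference, cache.fetch here stands for the usual read-through pattern that Redis-backed caches (e.g. Rails.cache.fetch) provide. A minimal in-process sketch of that pattern, with illustrative names rather than a real API:

```ruby
# Minimal read-through cache: fetch returns the stored value for a key,
# or runs the block, stores its result, and returns it. Real deployments
# would back the store with Redis instead of a Hash.
class SimpleCache
  def initialize
    @store = {}
  end

  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

cache = SimpleCache.new
calls = 0
cache.fetch(:amazon_pricing) { calls += 1; "9.99" } # miss: runs the block
cache.fetch(:amazon_pricing) { calls += 1; "9.99" } # hit: block skipped
calls # => 1
```

Note that fetch takes an explicit key, so the cached state lives in a store you can inspect, expire, and share between processes, unlike state hidden in a class's singleton.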
In Conclusion
There are times when disregarding these rules is useful, but I would avoid doing so unless it is absolutely necessary.
I typically treat memoization that carries calculations across scopes (e.g. using a result memoized in one method from outside that method) as a code smell. Using memoization this way usually means I need to reach for something else (a real cache) or reorganize my code to use what Matz gave us for better code organization and reuse.