Django patterns: memoizing

One of the things I wanted to do with this blog was to cover some of the design patterns I've discovered/come across/stolen over the years I've been working with Django. So this is the first in what I hope will be a long-running series on Django patterns.

Memoizing is the process by which a complicated or expensive function is replaced by a simpler one that returns the previously calculated value. This is a very useful thing to do in a complicated model, especially in cases where methods like get_absolute_url are calculated via a series of lookups on related models. Frequently I've found myself calling one of these methods on the same object several times within a view or template, leading to a huge amount of unnecessary database calls.

It's very easy to do this manually - the method simply needs to check whether the cached value already exists, if not calculate it and store it somewhere, then return the cached value:

def get_expensive_calculation(self):
    if not hasattr(self, '_expensive_calculation'):
        self._expensive_calculation = do_expensive_calculation()
    return self._expensive_calculation

Here the cache lives within the instance itself. For the way I use it, this is useful: instances are created and destroyed within a single request/response cycle, so the cache dies with the object at the end of that process, and I don't need to worry about invalidating the cache if the value subsequently changes. Naturally, you could use Django's cache framework here - you'd need to create a unique key somehow, perhaps using the model name and pk as a prefix - but otherwise it would work pretty much the same way.

However, it's a bit of a pain having to write this same boilerplate each time you want to memoize something, so I wanted to write a decorator that would do it, which I could simply apply to a model method to get it to automatically cache the result. There are various memoizing decorators out there, but they mostly suffer from two problems: either they only work on plain functions, rather than methods, or they create a global cache, which would lead to a memory leak as the value would be kept even though the instance had gone out of scope.

So here's my version:

def memoize_method(func):
    key = "__%s" % func.__name__
    def inner(self, *args, **kwargs):
        if not hasattr(self, key):
            setattr(self, key, func(self, *args, **kwargs))
        return getattr(self, key)
    return inner

This is pretty simple in the end. The decorator uses the name of the function it's decorating to create a key, and when it's called it is passed 'self', so it checks if that key exists on that object and either creates or returns it.

One potential problem with this is that it doesn't take any account of the method's arguments: after the first call, it will always return the same value even if called again with completely different arguments. Most of the time, this won't be a problem: since the cache only persists for a single request, you're most likely to be calling it with the same arguments each time. But it's fairly simple to extend the caching mechanism to use parameters within the key:

def memoize_method_with_params(params):
    def wrap(func):
        key = "__%s__%s" % (func.__name__, '__'.join(['%s:%%(%s)s' % (a, a) for
                                                      a in params]))
        def inner(self, *args, **kwargs):
            actual_key = key % kwargs
            if not hasattr(self, actual_key):
                setattr(self, actual_key, func(self, *args, **kwargs))
            return getattr(self, actual_key)
        return inner
    return wrap

This time, since the decorator itself takes arguments, you need to use the double-wrap method: the outer function is called on definition, and it returns the decorator function, which itself contains the inner wrapped function. The algorithm to calculate the key looks complex, but is actually just creating a string in the form __funcname__key1:%(key1)s__key2:%(key2)s, which will use the dictionary string interpolation method to include the actual values when the function is called. (One issue, left for the reader to correct: params must be a list or tuple, if passed a string it will fail.)

Although this is pretty nice, I can't help feeling that I should be using descriptors to do this. Inspired by a posting by Marty Alchin and one by Ian Bicking, I attempted to make this work, but I unfortunately drew a blank - the problem is that only the __get__ method has access to the instance, where the cache needs to be stored, but that needs to be available in __call__ somehow. One possible solution would be to have __get__ return another descriptor itself, but that seems like overkill for this.

Comments