Class-based views and thread safety

I previously wrote about a thread-safety bug I encountered in writing a middleware object. I recently found a very similar situation in a class-based view I was modifying, and thought it was worth writing up what the problem was and how I solved it. Interestingly, there's a discussion going on in Django-developers at the moment on class-based views which touches on many of the same issues.

By 'class-based view', I mean that rather than just being a function which is called from urls.py when the URL matches its pattern, this was a class with a __call__ method. In theory, that should work in exactly the same way as a function view - to Python, functions and callable objects are pretty much the same thing.

One of the reasons that the controller was written as a class was that it had multiple utility methods that were called during the rendering process. To make it easier to pass data between the methods, various things were set as instance variables - self.item_id, self.key, and so on.

However, the urlpattern for this view was defined like this:

(r'^(?P<key>)/((?P<page>[^/]+)/)?', FormController()),

which meant that it was instantiated just once per process: when the urls are first imported. Unfortunately, when running under a production server like Apache/mod_wsgi, a process does not equal a request. Processes are persistent - managed as part of a pool by the server - and a process can serve many requests before eventually being killed and restarted.

This means that the FormController object declared above will probably be responsible for serving multiple requests - and any instance attributes, eg self.item_id, will be visible to all of them. Clearly, this has the potential to be a massive bug, if information starts leaking between requests - one user might start seeing information meant for someone else, or a form could validate itself based on information from a completely different user.

There are various potential ways to solve this. The main idea is to instantiate the object once per request, rather than once per process - this ensures that any instance data is specific to that request. The way I've done it for now is to redefine the url like this:

(r'^(?P<key>)/((?P<page>[^/]+)/)?', FormController),

ie make the handler a class, rather than an instance. That means that Django will instantiate a FormController object when the url matches, so the code that was previously in__call__ goes in __init__ instead. Unfortunately, in Python you can't return anything from an __init__ method - the returned value from an instantiation is automatically the object itself. So what I've done (based on an idea from Mike Malone) is to make the view class a subclass of HttpResponse - then, rather than returning render_to_response, you can just render the template to a string and pass this to the super class's __init__ method:

class FormController(HttpResponse):
  def __init__(self, request, key, page=None):
      ... lots of code ...
      content = loader.render_to_string(
                                  template, 
                                  form_context,
                                  context_instance=RequestContext(request)
                )
      super(FormController, self).__init__(content)

The value returned from the instantiation is therefore an HttpResponse object containing the rendered template, as Django expects.

One small side-effect of doing this is that you can't return anything else - for example an HttpResponseRedirect (or an HttpResponseNotFound). You need to do the redirect manually:

      if self.redirect_to_next_page:
          self.status_code = 302
          super(FormController, self).__init__()
          self['Location'] = next_page
          return

Comments