Feb 22

Django patterns, part 4: forwards generic relations

Simulating select_related() on a GenericForeignKey

My last post talked about how to follow reverse generic relations efficiently. However, there's a further potential inefficiency in using generic relations, and that's the forward relationship.

If once again we take the example of an Asset model with a GenericForeignKey used to point at Articles and Galleries, we can get from each individual Asset to its related item by doing asset.content_object. However, if we have a whole queryset of Assets, doing this:

{% for asset in assets %}
   {{ asset.content_object }}
{% endfor %}

will result in as many queries as there are assets - in fact it's n+m, where n is the number of assets and m is the number of different content types, as you'll have one extra query per type to get the ContentType object. (Although it might be slightly less than that if you've used ContentTypes elsewhere, as the model manager caches lookups on the assumption that they never change once they've been set.)

However, luckily we can make this much more efficient as well, again using a variation of the dictionary technique.

generics = {}
for item in queryset:
    generics.setdefault(item.content_type_id, set()).add(item.object_id)

content_types = ContentType.objects.in_bulk(generics.keys())

relations = {}
for ct, fk_list in generics.items():
    ct_model = content_types[ct].model_class()
    relations[ct] = ct_model.objects.in_bulk(list(fk_list))

for item in queryset:
    setattr(item, '_content_object_cache',
            relations[item.content_type_id][item.object_id])

Here we collect all the different content types used by the relationships in the queryset, and the set of distinct object IDs for each one, then use the built-in in_bulk manager method to get all the content types at once in a nice ready-to-use dictionary keyed by ID. Then, we do one query per content type, again using in_bulk, to get all the actual objects.

Finally, we simply set the relevant object to the _content_object_cache field of the source item. The reason we do this is that this is the attribute that Django would check, and populate if necessary, if you called x.content_object directly. By pre-populating it, we're ensuring that Django will never need to call the individual lookup - in effect what we're doing is implementing a kind of select_related() for generic relations.
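The grouping logic can be tried out without a database. What follows is only a minimal sketch: FakeAsset, fake_in_bulk and the TABLES dictionary are hypothetical stand-ins (fake_in_bulk plays the role of ct_model.objects.in_bulk), and the content type IDs are made up. It shows that the lookup runs once per content type rather than once per item.

```python
from collections import defaultdict


class FakeAsset(object):
    """Hypothetical stand-in for an Asset row: just the two generic-FK columns."""
    def __init__(self, content_type_id, object_id):
        self.content_type_id = content_type_id
        self.object_id = object_id


def attach_content_objects(items, fetch_in_bulk):
    """Populate _content_object_cache on every item.

    fetch_in_bulk(content_type_id, ids) is an assumed callable standing in
    for ct_model.objects.in_bulk(ids); it must return a dict of {id: object}.
    """
    ids_by_type = defaultdict(set)
    for item in items:
        ids_by_type[item.content_type_id].add(item.object_id)

    # One bulk lookup per content type
    relations = {}
    for ct_id, ids in ids_by_type.items():
        relations[ct_id] = fetch_in_bulk(ct_id, sorted(ids))

    for item in items:
        # A missing target is stored as None rather than raising KeyError
        item._content_object_cache = relations[item.content_type_id].get(item.object_id)
    return items


# In-memory "tables", keyed by a made-up content type ID
TABLES = {
    1: {10: "Article 10", 11: "Article 11"},
    2: {10: "Gallery 10"},
}
calls = []


def fake_in_bulk(ct_id, ids):
    calls.append(ct_id)  # record each simulated query
    return dict((pk, TABLES[ct_id][pk]) for pk in ids if pk in TABLES[ct_id])


assets = [FakeAsset(1, 10), FakeAsset(2, 10), FakeAsset(1, 11), FakeAsset(1, 10)]
attach_content_objects(assets, fake_in_bulk)
```

Four assets across two content types cost two simulated lookups here, however many assets share each type.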

Feb 15

Django patterns part 3: efficient generic relations

Extending the dictionary technique to cover generic lookups

I've previously talked about how to make reverse lookups more efficient using a simple dictionary trick. Today I want to write about how this can be extended to generic relations.

At its heart, a generic relationship is defined by two elements: a foreign key to the ContentType table, to determine the type of the related object, and an ID field, to identify the specific object to link to. Django uses these two elements to provide a content_object pseudo-field which, to the user, works similarly to a real ForeignKey field. And, again just like a ForeignKey, Django can helpfully provide a reverse relationship from the linked model back to the generic one, although you do need to explicitly define this using generic.GenericRelation to make Django aware of it.

As usual, though, the real inefficiency arises when you are accessing reverse relationships for a whole lot of items - say, each item in a QuerySet. As with reverse foreign keys, Django will attempt to resolve this relationship individually for each item, resulting in a whole lot of queries. The solution is a little different, though, to take into account the added complexity of generic relations.

Assuming the list of items is all of one type, the first step is to get the content type ID for this model. From that, we can get the object IDs, and then do the query in one go. From there, we can use the dictionary trick described last time to associate each item with its particular related items. In this example, we have an Asset model that is the generic model, holding assets for other models such as Article and Gallery.

articles = Article.objects.all()
article_dict = dict([(article.id, article) for article in articles])

article_ct = ContentType.objects.get_for_model(Article)
assets = Asset.objects.filter(
                content_type=article_ct,
                object_id__in=[a.id for a in articles]
              )
asset_dict = {}
for asset in assets:
    asset_dict.setdefault(asset.object_id, []).append(asset)
for id, related_items in asset_dict.items():
    article_dict[id]._assets = related_items

This is good as far as it goes, but what about when we have a heterogeneous list of items? That, after all, is the point of generic relations. So what if our starting point is a collection of both Galleries and Articles, and we still want to get all the related Assets in one go? As it turns out, the solution is not massively different: we just need to change the way we key the items in the intermediate dictionary, to record the content type as well as the object ID.

article_ct = ContentType.objects.get_for_model(Article)
gallery_ct = ContentType.objects.get_for_model(Gallery)
assets = Asset.objects.filter(
                Q(content_type=article_ct,
                    object_id__in=[a.id for a in articles]) |
                Q(content_type=gallery_ct, object_id__in=[g.id for g in galleries])
             )

asset_dict = {}
for asset in assets:
    asset_dict.setdefault("%s_%s" % (asset.content_type_id, asset.object_id),
                          []).append(asset)

for article in articles:
    article._assets = asset_dict.get("%s_%s" % (article_ct.id, article.id), None)

for gallery in galleries:
    gallery._assets = asset_dict.get("%s_%s" % (gallery_ct.id, gallery.id), None)

Here we first of all use Q objects to get all the assets of type Article with IDs in the list of articles, plus all those of type Gallery with IDs in the list of galleries. Then we use the fact that each asset knows its own content type ID to create the dictionary keys in the form <content_type_id>_<object_id>. Finally, we loop through the articles and the galleries separately to get the relevant assets for each item.
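As a small variation, a tuple of (content_type_id, object_id) works as a dictionary key just as well as the formatted string. The sketch below uses only the standard library; the Asset namedtuple and the two content type IDs are hypothetical stand-ins for the real model and ContentType rows. Unlike the snippet above, it returns an empty list rather than None when an item has no assets, which is an assumption about what is more convenient in a template.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-ins for model instances; the IDs are made up
Asset = namedtuple("Asset", "id content_type_id object_id")
ARTICLE_CT, GALLERY_CT = 7, 8

assets = [
    Asset(1, ARTICLE_CT, 100),
    Asset(2, GALLERY_CT, 100),  # same object ID, different content type
    Asset(3, ARTICLE_CT, 100),
    Asset(4, ARTICLE_CT, 101),
]

# Key on the (content type, object ID) pair instead of a formatted string
asset_dict = defaultdict(list)
for asset in assets:
    asset_dict[(asset.content_type_id, asset.object_id)].append(asset)


def assets_for(ct_id, object_id):
    # dict.get on a defaultdict does not create an entry for a missing key
    return asset_dict.get((ct_id, object_id), [])
```

Article 100 and Gallery 100 share an object ID but end up under different keys, which is the whole point of including the content type.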

Feb 01

Middleware post-processing in Django: a gotcha

Just because it's a class, it doesn't mean you should store state in it

One of the requirements for the new Heart website we've just launched was to allow users to personalise their location to one of 33 radio stations across the country. For various reasons, this meant rewriting all the links on the page, dynamically, depending on the user's location setting.

The easiest place to do this sort of post-processing in Django is in response middleware. So I wrote a quick class that used regexes to grab all the href and action attributes (for a and form elements respectively - images didn't need localising) and add the relevant locations. Because it was dynamic, I used the ability of re.sub to call a function to determine the replacement value; and to save on multiple database queries, I saved various things in the instance. So it looked a bit like this:

href = re.compile(r'(href|action)=["\'](.+?)["\']')

class LocalisationMiddleware(object):
    def process_response(self, request, response):
        self.current_station = get_station(request)
        self.stations = Station.objects.values_list('slug', flat=True)

        content = href.sub(self.re_replace, response.content.decode('utf8'))
        response.content = unicode(content)
        return response

    def re_replace(self, matchobj):
        current_station = self.current_station
        url = "/%s%s" % (current_station.slug, matchobj.group(2))
        return '%s="%s"' % (matchobj.group(1), url)

But then, during testing, we started getting some rather odd bug reports. Someone would be happily browsing the London pages, and would suddenly get a link pointing at Essex - which is supposed to be impossible.

We eventually realised what the problem was. Django middleware is instantiated once per process: so several requests were being serviced by the same instance, and the values of the local instance attributes - in particular self.current_station - were being leaked across requests.

The solution is to use a separate object to contain the current station and the re_replace method, and instantiate it explicitly in process_response:

class LocalisationMiddleware(object):

    def process_response(self, request, response):
        url_replacement = UrlReplacement(request)
        content = href.sub(url_replacement,
                           response.content.decode('utf8'))
        # etc

class UrlReplacement(object):
    def __init__(self, request):
        self.current_station = get_station(request)
        self.stations = Station.objects.values_list('slug', flat=True)

    def __call__(self, matchobj):
        # do replacements
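
To see the callable-object approach working outside Django, here is a minimal standalone sketch. It assumes the station is passed in as a plain slug string rather than looked up from the request, and the rule of leaving non-root-relative links untouched is my own assumption, not something from the original middleware.

```python
import re

href = re.compile(r'(href|action)=["\'](.+?)["\']')


class UrlReplacement(object):
    """Callable handed to re.sub; all state lives on this per-request object."""

    def __init__(self, station_slug):
        # In the middleware this would come from get_station(request)
        self.station_slug = station_slug

    def __call__(self, matchobj):
        attr, url = matchobj.group(1), matchobj.group(2)
        if not url.startswith("/"):
            # Assumption: external and relative links are left alone
            return matchobj.group(0)
        return '%s="/%s%s"' % (attr, self.station_slug, url)


html = '<a href="/news/">News</a> <a href="http://example.com/">Out</a>'

# Two "requests" each get their own replacement object, so nothing leaks
london = href.sub(UrlReplacement("london"), html)
essex = href.sub(UrlReplacement("essex"), html)
```

Because each call to href.sub gets a fresh UrlReplacement, the station used for one response can never bleed into another.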

Jan 11

Django patterns, part 2: efficient reverse lookups

Avoiding extra database calls on backwards ForeignKey queries

One of the main sources of unnecessary database queries in Django applications is reverse relations.

By default, Django doesn't do anything to follow relations across models. This means that unless you're careful, any relationship can lead to extra hits on the database. For instance, assuming MyModel has a ForeignKey to MyRelatedModel, this:

myobj = MyModel.objects.get(pk=1)
print myobj.myrelatedmodel.name

hits the database two separate times - once to get the MyModel object, and once to get the related MyRelatedModel object. Luckily, it's easy to get Django to optimise this into a single call:

myobj = MyModel.objects.select_related().get(pk=1)

This way Django does a JOIN in the database call, and caches the related object in a hidden attribute of myobj. Printing myobj.__dict__ will show this:

{'_myrelatedmodel_cache': <MyRelatedModel: obj>,
 'name': 'My name'}

Now, whenever you call myobj.myrelatedmodel, Django automatically uses the version in _myrelatedmodel_cache rather than going back to the database to get it. Note that this is exactly the same as what happens once the related object has been accessed in the first snippet above - Django caches it in the same way for future use. All select_related() does is pre-cache it before the first access.

None of this is new - it's quite well explained in the Django documentation. However, what's not obvious is how to do the same for reverse relationships. In other words, this:

myrelatedobj = MyRelatedObject.objects.get(pk=1)
print myrelatedobj.mymodel_set.all()

Here you'll always get two separate db calls, and adding select_related() anywhere won't help at all. Now one extra db call isn't that significant, but consider this in a template:

<ul>
{% for obj in myobjects %}
    <li>{{ obj.name }}</li>
    <ul>
         {% for relobj in obj.backwardsrelationship_set.all %}
         <li>{{ relobj.name }}</li>
         {% endfor %}
    </ul>
{% endfor %}
</ul>

Not an unreasonable thing to want to do - iterate through a bunch of objects, then for each one display all the objects in its backwards relationship. However, this will always cost n+1 queries, where n is the number of objects in the myobjects queryset. And what's worse, Django will go back and get the items from the database each time they're accessed, even if we've already got them for the same object in the same view or template. The queries quickly mount up. So how can we optimise this?

The answer is to get all the related objects at once, for the entire queryset, then cache each object's related objects in a hidden attribute. We can do this by sorting the objects once we've got them into a dict, keyed by the id of their parent object:

qs = MyRelatedObject.objects.all()
obj_dict = dict([(obj.id, obj) for obj in qs])
objects = MyObject.objects.filter(myrelatedobj__in=qs)
relation_dict = {}
for obj in objects:
    relation_dict.setdefault(obj.myrelatedobj_id, []).append(obj)
for id, related_items in relation_dict.items():
    obj_dict[id]._related_items = related_items

Now each MyRelatedObject instance in qs has a _related_items attribute, containing all the MyObject items in its reverse relationship. Obviously, since Django doesn't know about this, the only way to get the items is to explicitly iterate through _related_items rather than myobject_set.all in the template. And if you need extra filtering, you need to do it in the view where you first get the objects, since the resulting attribute isn't a queryset and can't be filtered.

There's quite a bit of looping etc in this snippet, so you should probably profile carefully to ensure this isn't actually more expensive than just going back to the database. But I've found that this is fairly efficient, and saves a lot of database access.
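The grouping step is easy to pull out into a reusable helper. This is a rough sketch using plain Python objects: Row is a hypothetical stand-in for a model instance, and attach_related and its parameter names are my own invention rather than anything Django provides. One deliberate difference from the snippet above is that parents with no children get an empty list, so the attribute always exists.

```python
from collections import defaultdict


def attach_related(parents, children, fk_attr, cache_attr="_related_items"):
    """Group children by their foreign-key value and hang each group on its parent."""
    grouped = defaultdict(list)
    for child in children:
        grouped[getattr(child, fk_attr)].append(child)
    for parent in parents:
        # Parents without children get an empty list rather than no attribute
        setattr(parent, cache_attr, grouped.get(parent.id, []))
    return parents


class Row(object):
    """Hypothetical stand-in for a model instance."""
    def __init__(self, **fields):
        self.__dict__.update(fields)


parents = [Row(id=1), Row(id=2), Row(id=3)]
children = [Row(id=10, parent_id=1), Row(id=11, parent_id=3), Row(id=12, parent_id=1)]
attach_related(parents, children, "parent_id")
```

In a real view, parents and children would be the two querysets, each evaluated once, and fk_attr would be the name of the raw foreign-key column on the child model.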

Jan 07

SSH and Mac OSX Terminal

Resetting terminal tab titles after SSH has messed with them.

I like the Mac as a development environment most of the time, but occasionally some things annoy me.

One of these niggles is the way that the tab title in Terminal changes when you SSH to an external server, but doesn't change back when you close the connection. So you end up with tabs that claim to be connected to a server, but aren't.

The culprit seems to be SSH itself. Here's my solution: a shell script that runs SSH and then sets the tab title back to the default "Terminal".

ssh "$@"
printf '\033]0;Terminal\007'

I've saved this to ~/bin/sshp, and made it executable, so now I just type sshp myserver instead of ssh. A further step would be to alias it back to ssh in .bash_profile with alias ssh=sshp

Dec 26

Vim taglist and Django

Inspired by the graphical cheat sheet here, I've recently moved over to Vim as my main development environment.

After installing a whole range of plugins, I found that one of them, taglist, no longer worked with my Django code. The reason was that something was changing the filetype of Django modules to 'python.django', and taglist - unlike most other plugins - was trying to match against the whole filetype, rather than just a part of it.

My solution is to hack taglist so that it does a partial match on the filetype. In the Tlist_Get_Buffer_Filetype function (line 984), change

let buf_ft = getbufvar(a:bnum, '&filetype')

to

let buf_ft = split(getbufvar(a:bnum, '&filetype'), '\.')[0]

Dec 26

Showing queries in Haystack

A Django debug toolbar panel for Haystack

At work we've been using Haystack to manage our site search, with a Solr backend.

As usual, we're customising things quite a lot - using faceted queries and weighted indexes, and bypassing the built-in search forms - so I wanted to be sure, in line with my general obsession with query efficiency, that we weren't generating multiple Solr queries for every search.

Haystack does log queries for every request internally, but as far as I can tell there's no way of getting to that information without writing some custom code to import and expose the relevant variable. So I've written a (very basic) panel for the Django debug toolbar which does just that.

Just put this somewhere on your pythonpath or in your project, and add it to the DEBUG_TOOLBAR_PANELS list in settings.py.

Dec 20

Django patterns: memoizing

How to cache expensive operations to prevent repeated database calls

One of the things I wanted to do with this blog was to cover some of the design patterns I've discovered/come across/stolen over the years I've been working with Django. So this is the first in what I hope will be a long-running series on Django patterns.

Memoizing is the process by which a complicated or expensive function is replaced by a simpler one that returns the previously calculated value. This is a very useful thing to do in a complicated model, especially in cases where methods like get_absolute_url are calculated via a series of lookups on related models. Frequently I've found myself calling one of these methods on the same object several times within a view or template, leading to a huge number of unnecessary database calls.

It's very easy to do this manually - the method simply needs to check whether the cached value already exists, if not calculate it and store it somewhere, then return the cached value:

def get_expensive_calculation(self):
    if not hasattr(self, '_expensive_calculation'):
        self._expensive_calculation = do_expensive_calculation()
    return self._expensive_calculation

Here the cache lives within the instance itself. For the way I use it, this is useful: instances are created and destroyed within a single request/response cycle, so the cache dies with the object at the end of that process, and I don't need to worry about invalidating the cache if the value subsequently changes. Naturally, you could use Django's cache framework here - you'd need to create a unique key somehow, perhaps using the model name and pk as a prefix - but otherwise it would work pretty much the same way.

However, it's a bit of a pain having to write this same boilerplate each time you want to memoize something, so I wanted to write a decorator that would do it, which I could simply apply to a model method to get it to automatically cache the result. There are various memoizing decorators out there, but they mostly suffer from two problems: either they only work on plain functions, rather than methods, or they create a global cache, which would lead to a memory leak as the value would be kept even though the instance had gone out of scope.

So here's my version:

def memoize_method(func):
    key = "__%s" % func.__name__
    def inner(self, *args, **kwargs):
        if not hasattr(self, key):
            setattr(self, key, func(self, *args, **kwargs))
        return getattr(self, key)
    return inner

This is pretty simple in the end. The decorator uses the name of the function it's decorating to create a key, and when it's called it is passed 'self', so it checks if that key exists on that object and either creates or returns it.
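Here is a quick way to check that the decorator behaves as described. The decorator is repeated so the snippet runs on its own; Page is a hypothetical class standing in for a model, with a counter to show how often the real method body runs.

```python
def memoize_method(func):
    key = "__%s" % func.__name__
    def inner(self, *args, **kwargs):
        if not hasattr(self, key):
            setattr(self, key, func(self, *args, **kwargs))
        return getattr(self, key)
    return inner


class Page(object):
    """Hypothetical class standing in for a model."""
    def __init__(self, slug):
        self.slug = slug
        self.calls = 0

    @memoize_method
    def get_absolute_url(self):
        self.calls += 1  # counts how often the real work is done
        return "/pages/%s/" % self.slug


page = Page("about")
urls = [page.get_absolute_url() for _ in range(3)]
other = Page("contact")  # a second instance has its own, separate cache
```

The method body runs once per instance, and because the cached value lives on the instance, the second Page is unaffected by the first.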

One potential problem with this is that it doesn't take any account of the method's arguments: after the first call, it will always return the same value even if called again with completely different arguments. Most of the time, this won't be a problem: since the cache only persists for a single request, you're most likely to be calling it with the same arguments each time. But it's fairly simple to extend the caching mechanism to use parameters within the key:

def memoize_method_with_params(params):
    def wrap(func):
        key = "__%s__%s" % (func.__name__, '__'.join(['%s:%%(%s)s' % (a, a) for
                                                      a in params]))
        def inner(self, *args, **kwargs):
            actual_key = key % kwargs
            if not hasattr(self, actual_key):
                setattr(self, actual_key, func(self, *args, **kwargs))
            return getattr(self, actual_key)
        return inner
    return wrap

This time, since the decorator itself takes arguments, you need to use the double-wrap method: the outer function is called on definition, and it returns the decorator function, which itself contains the inner wrapped function. The algorithm to calculate the key looks complex, but is actually just creating a string in the form __funcname__key1:%(key1)s__key2:%(key2)s, which will use the dictionary string interpolation method to include the actual values when the function is called. (One issue, left for the reader to correct: params must be a list or tuple; if passed a string it will fail.)
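One possible way to handle the string case is to wrap a bare string in a list before building the key. This is only a sketch, assuming Python 3 (where a plain str check is enough); Catalogue and its price method are hypothetical. Note that the memoized arguments still have to be passed as keyword arguments, since the key is built from kwargs.

```python
def memoize_method_with_params(params):
    if isinstance(params, str):
        params = [params]  # accept a single name as well as a list or tuple
    def wrap(func):
        key = "__%s__%s" % (func.__name__,
                            "__".join("%s:%%(%s)s" % (a, a) for a in params))
        def inner(self, *args, **kwargs):
            actual_key = key % kwargs  # raises KeyError if a named param is missing
            if not hasattr(self, actual_key):
                setattr(self, actual_key, func(self, *args, **kwargs))
            return getattr(self, actual_key)
        return inner
    return wrap


class Catalogue(object):
    """Hypothetical class used only to exercise the decorator."""
    def __init__(self):
        self.calls = []

    @memoize_method_with_params("currency")
    def price(self, currency="GBP"):
        self.calls.append(currency)  # records each real computation
        return "10 %s" % currency


cat = Catalogue()
results = [cat.price(currency="GBP"),
           cat.price(currency="USD"),
           cat.price(currency="GBP")]
```

The third call reuses the value cached for GBP, so the method body runs only twice.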

Although this is pretty nice, I can't help feeling that I should be using descriptors to do this. Inspired by a posting by Marty Alchin and one by Ian Bicking, I attempted to make this work, but I unfortunately drew a blank - the problem is that only the __get__ method has access to the instance, where the cache needs to be stored, but that needs to be available in __call__ somehow. One possible solution would be to have __get__ return another descriptor itself, but that seems like overkill for this.

Dec 08

South migrations with MPTT

We've been using django-MPTT at work for quite a while. It's a great way to manage hierarchical data in a read-efficient way, and we use it heavily in our CMS application. I'll definitely be talking about it further in future posts.

Recently we moved our database migrations from our defunct dmigrations project to Andrew Godwin's wonderful South application. One of South's best features is the ability to 'freeze' the ORM within each migration, so that you can manipulate the db via the familiar Django syntax rather than having to deal with raw SQL.

However, we ran into a problem when trying to use this to add new instances to a model that uses MPTT. We're actually using Ben Frishman's fork of django-mptt, which he wrote while he was working for us this summer. This has a base model class that defines all the MPTT fields and methods, rather than monkey-patching them in as the original version does.

The issue was that the frozen ORM only includes the basic fields that are defined on the actual model. This led to trouble when inserting a new object, especially when it's in the middle of an existing tree. MPTT includes values which identify an item's place in its tree, and when a new object is inserted most of the elements in the tree have to be updated to reflect the new positioning. django-mptt normally deals with all the SQL changes necessary, but this wasn't happening within a migration, because the dynamically-created model wasn't inheriting the correct models and fields.

The answer turned out to be simple, although it is undocumented. The frozen ORM definitions are stored in each migration as a nested dictionary. Each model is a key in the top level dictionary, whose value is a dictionary containing the field name/definitions as keys/values. However, in the sub-dictionaries, along with the field definitions, you can also store Meta definitions, including a South-specific extension: _bases, which defines the model base to inherit from. For example:

{
    'categories.category': {
        'Meta': {'unique_together': "(['slug', 'parent'],)", '_bases': ('mptt.models.Model',)},
        'id': ('django.db.models.fields.AutoField', [], {'primary_key': 'True'}),
        'name': ('django.db.models.fields.CharField', [], {'max_length': '50'}),
        'parent': ('django.db.models.fields.related.ForeignKey', [], {'blank': 'True', 'related_name': "'children'", 'null': 'True', 'to': "orm['categories.Category']"}),
        'slug': ('django.db.models.fields.CharField', [], {'max_length': '50'}),
    }
}

This ensures that the frozen category model inherits from mptt.models.Model, and gains all the special MPTT magic.

Dec 05

Customising Mingus, part 2

This is intended to be primarily a technical blog, so I was keen to get the presentation of code snippets correct. I'm a - shall we say - fairly frequent answerer on StackOverflow, and I've got used to their Markdown-enabled edit box. Luckily, the Mingus basic-blog application allows a choice of markup for body text, and even defaults to Markdown. But as always there were quite a few things to improve.

Firstly, I do like StackOverflow's dynamic WYSIWYG preview of the marked-up copy. Although Markdown syntax is quite simple, it's easy to get it wrong - using a three-space indent rather than four for code, for example. An instant preview just underneath the text entry field in the admin form is very useful. SO does it using the showdown.js library, which is part of their port of the 'what you see is what you mean' markdown editor, WMD.

It was as easy to integrate the whole of WMD as just the preview, by adding a mingus\admin.py like this:

from django import forms
from django.conf import settings
from django.contrib import admin
from django.utils.safestring import mark_safe
from basic.blog.models import Post
from basic.blog.admin import PostAdmin

class WMDEditor(forms.Textarea):

    def __init__(self, *args, **kwargs):
        attrs = kwargs.setdefault('attrs', {'class':'vLargeTextField'})
        super(WMDEditor, self).__init__(*args, **kwargs)

    def render(self, name, value, attrs=None):
        rendered = super(WMDEditor, self).render(name, value, attrs)
        return rendered + mark_safe(u'''
            <div id='wmd-container'>
            <div id='wmd-button-bar'></div>
            <div id='wmd-preview'></div>
            <script type="text/javascript">
            wmd_options = {
                output: "Markdown",
                buttons: "bold italic | link blockquote code image | ol ul"
            };
            </script>
            <script type="text/javascript" src="%sstatic/js/wmd.js"></script>
            </div>''' % settings.MEDIA_URL)

class PostForm(forms.ModelForm):
    body = forms.CharField(widget=WMDEditor)
    class Meta:
        model = Post

class WMDPostAdmin(PostAdmin):
    form = PostForm

    class Media:
        css = {
            "all": ("static/css/wmd.css",)
        }
        js = ("static/js/showdown.js",)

admin.site.unregister(Post)
admin.site.register(Post, WMDPostAdmin)

Because Mingus already does some Javascript on the Post admin to add the 'body inlines' section under the main textbox, I've made the WMD button bar appear underneath that, on top of the preview, instead of on top of the actual textarea. A bit weird, but it does work - it's not as if I use it all the time, anyway. This no doubt breaks if you use another markup language, but I always use Markdown, so no problem there.

So, from markup to syntax highlighting. Mingus is, unfortunately, a bit confusing here. Partly this is a result of Kevin's desire to integrate as many standalone applications as possible, and only write the minimum of glue code. However, this means that there are several applications that potentially supply markup functionality, and it confused me for quite a while. These include the django-extensions app, which includes the syntax_color templatetag; and django-sugar, which includes the pygment_tags library.

However, the basic django-blog app actually deals with markup and highlighting itself already. On saving a post, the markup is translated into HTML and saved in a body_markup field, thanks to the django-markup app. What I didn't realise is that django-markup already runs the formatted text through pygments to add the highlighting. The reason I didn't realise this is that pygments turns out not to be very clever in guessing the code language. If you don't tell it explicitly, it doesn't do anything. In the absence of a hard-coded hint, its attempt to guess the language is limited to looking at the first line of the code, where it hopes to see a pseudo-shebang line:

...
#! python

Once I started doing that, highlighting worked as expected (although there were some minor CSS issues - on some browsers the font used for pre was far too big). This also meant I could remove the call to the django-sugar pygmentize filter that mingus has for some reason added to all the blog templates.

I can't help feeling the proliferation of markup/highlighting code within mingus is a bit silly. I only realised in writing this that there is actually yet another place where highlighting could take place, as the Markdown library itself has an extension to call pygments (although presumably django-markup prefers to do this explicitly because other markup libraries don't have this extension).

There's one issue that remains unresolved. As well as the now-removed pygmentize filter, mingus also runs blog content through render_inlines, which allows insertion of arbitrary Django model content within a blog post. However, for some reason this removes all the indentation from code blocks - obviously not very useful when posting Python. I'm not using the inlines at the moment anyway, so I've removed them from the template until I can work out what's going on.

Other than that, everything works and the blog is now ready to use.

Oct 31

Cambridge Stack Overflow dev day

I don't go to a lot of tech conferences - family life tends to make getting away for any length of time fairly difficult. So originally I ignored the banners advertising the Stack Overflow DevDays, thinking I wouldn't be able to make it anyway. But when my employer arbitrarily changed the rules over how much holiday I'm allowed to carry forward into next year, I ended up with a couple of days in hand - and a conversation with a co-worker convinced me to go at the last minute. After a comedy of errors regarding the last available ticket for the London event, I finally managed to snap up a ticket for the Cambridge day.

Since this was a Stack Overflow conference, it wasn't surprising that the keynote was by Joel Spolsky. It was preceded by a mildly amusing short film where he satirised his 'treat developers right' reputation by pretending to be a cross between an autocratic boss and a sadistic PE teacher, which was funny enough but slightly pointless. The talk itself was good: it was about the tension between the 'simplicity is everything' attitude of firms like 37 Signals, versus the undeniable fact that people want features, as evidenced by the way FogBugz' sales went up every time they added more features.

Spolsky is an entertaining speaker and I enjoyed the talk, even if there wasn't a particularly coherent take-home message: he was trying to say that you should only give people options for things that are actually important, but the whole point is that what's not important to one user is vital for another, which is why software like Microsoft Word ends up with so many hundreds of options.

Next up was Christian Heilmann talking about Yahoo! Developer Tools. Now this was really interesting - something I haven't had a chance to play with at all, but definitely will in the future. Yahoo has put together a very nice way of querying any of their APIs via REST with a simple SQL-like language, YQL. What's more, it's possible to submit your own data sources which can be linked up via an XML translation table and made available for everyone to query via YQL. Carrying that forward, you can write mini-applications in Javascript that use any of these APIs and soon you'll be able to offer these to be installed on users' Yahoo home pages in much the same way as Facebook apps. I must admit my heart did sink a bit when Christian mentioned the customised markup language, after too much time wrestling with FBML, but it's an exciting possibility.

After a short break, next up was Cambridge University's Frank Stajano. This talk was ostensibly about computer security, and specifically what we can learn from fraudsters to make our systems more secure. But he's a fan of the BBC3 programme The Real Hustle, a hidden-camera show where members of the public are conned in various ways, and he's done various bits of research analysing the cons from the programme and relating them to systems security. So the format of the lecture was to show us various clips from the show, then a couple of slides which were supposed to tell us how this type of con was used in computer terms and how we could avoid it. However, it didn't really achieve that - the links to computer security were not well explained, and although the talk was quite fun I didn't feel I learned much.

Next was Joel again, talking about FogBugz. Now I know you have to expect this sort of thing at conferences (especially at Carsonified ones, or so I'm told), but I actually object to paying to sit through an hour of sales pitch, however entertainingly delivered it is. FogBugz looks like a perfectly competent product, but I didn't see anything that made it shine over a product like Jira, or even particularly over the open-source Redmine that we use these days at work. Plus the demo included a couple of screens that clearly violated the principle Joel had pushed earlier of only giving options where they made a difference.

Lunch, followed by Steven Sanderson on ASP.NET-MVC. I actually found this fairly good - despite my complete lack of interest in any Microsoft technology, I'm not actually hostile, so I paid enough attention to find out what they were doing in this area. As the speaker freely admitted, .NET MVC is quite obviously ripped off from Ruby on Rails. It does offer some nice ways of doing things, but is missing a lot of the things that Django and Rails do - no ORM, for example, because it relies on LINQ; and no real templating system, because you just use standard ASP files. So nothing amazingly revolutionary, except if you're a Microsoft fanboy who's totally unaware of what the wider world is doing, but still good to see that Microsoft is learning things and giving its developers some alternatives. Best part: it's "open source", which in Microsoft language means "we're not going to accept your patches or anything, but you're free to fork it if you want". Great.

Next: Remy Sharp on jQuery. A deeply disappointing talk. Ryan Carson introduced it by asking how many of the audience had used jQuery (about half) and how many considered themselves experts (a handful), telling the latter that they may as well get a cup of coffee. In fact, that whole half of the audience should have done so: this was a very basic introduction, covering only the fundamentals. Remy is not a particularly fluent talker and this was not very well presented.

After another break, we had Michael Foord on Python. This was another fairly basic introduction - I had suspected I wasn't going to learn anything, but got my hopes up when Michael started off by talking about IronPython (he's the co-author of IronPython In Action). Unfortunately this was only a short digression, although it did look very cool (instantiating a Windows dialog from the IronPython console...) and the rest of the talk was a run-through of a clever little spellchecker in 40-odd lines of Python. This was all well and good, but the code wasn't anything particularly special to Python - you could have done it in any of a dozen other languages in about the same number of lines - and it didn't cover any of Python's cooler features. If I'd never dabbled in Python, I don't think this would have been enough to whet my appetite.

Finally, Jeff Atwood talking about Stack Overflow. This was only a short talk, in which Jeff spoke about why he and Joel had set up the site, what he hoped and still hopes to achieve, and the satisfaction he gets from it.

So, that was it for the talks. Free beer was offered in a bar in town, but unfortunately those family obligations raised their heads again and I had to drive home.

Overall, a good day. I had about a 50% hit rate on interesting talks, which I suppose is fairly good going, and I did get a chance to meet some new people. It was a shame that most of the talks slightly overran, leaving almost no time for questions.

One surprising thing was that the day wasn't very well integrated with Stack Overflow. I had at least expected us to get preprinted badges showing our SO username and reputation scores, but no such luck. And when Carson asked the audience at one point who thought they had the highest rep, I didn't put my hand up, assuming my 9,000 points would be average in this crowd. But when he tried to work it out, starting by asking who had 1,000 points, who had 1,500, etc, I soon found I did indeed have by far the highest rep - the next highest put his hand down at about 2,500. Made me feel slightly sad (which I am, of course). A shame that I missed the chance to parlay my brief moment of fame into something more long-lasting by skipping the drinks.

On the whole, I'm glad I went, and if nothing else it's convinced me I need to try to go to more of this sort of thing.

Oct 04

Customising Django-Mingus

This blog is built using Kevin Fricovsky's excellent django-mingus project, which is mainly a set of standard pre-existing reusable apps with some templates and a bit of glue to hold it together.

Although it's quite usable out of the box, I found - inveterate hacker that I am - that there were several things that I didn't quite like in the project as it was. So I changed them (isn't open source great, laydees-n-genelmen). At some point I'll fork the project on github and upload the changes, but for now here's what I've done.

Firstly, mingus forsakes Django's built-in comments framework for the external Disqus project. I didn't really fancy signing up for another service - especially as I'm not expecting vast numbers of comments on this blog. It's quite a simple matter to reinstate the comments - the relevant template code is in the post_detail.html template that ships with the basic-blog app which mingus extends, so I just needed to copy and paste it into the mingus version. Then add (r'^comments/', include('django.contrib.comments.urls')), to urls.py and django.contrib.comments to INSTALLED_APPS in settings.py, run a syncdb and it's all done.

There are, however, a couple of missing pieces here. basic-blog doesn't include templates for the comment preview and post confirmation, so you just get an unstyled white page. Simple to fix: add a comments directory to your templates directory, with a base.html template as follows:

{% extends "base.html" %}
{% block content %}{% endblock %}

By default the post-confirmation page doesn't include a link back to the original object, leaving the user nowhere to go. So an overridden posted.html in the same directory fixes that:

{% extends "comments/base.html" %}
{% load i18n %}    
{% block title %}{% trans "Thanks for commenting" %}.{% endblock %}    
{% block content %}
  <h2>{% trans "Thank you for your comment" %}.</h2>    
  <p><a href="{{ comment.get_content_object_url }}">Return to blog</a></p>
{% endblock %}

The last issue with comments was that there was no indication on the index page of how many comments each post had. This is a standard feature of blogs, and a bit surprising it wasn't there - perhaps it's a consequence of using Disqus. Anyway, the solution was to add the following to templates/proxy/includes/post_item.html:

{% load comments %}
{% if object.content_object.allow_comments %}
{% get_comment_count for object.content_object as comment_count %}
<div class="comment_count"><a href="{{ object.content_object.get_absolute_url }}#comments">{{ comment_count }} comment{{ comment_count|pluralize }}</a></div>
{% endif %}

I also added a style rule for the .comment_count class in base.css.

So much for comments. Now, layout. I couldn't help thinking that the default layout had the main area too narrow and the right-hand column too wide. Luckily the templates are based on the 960 Grid System css, so it was easy to change the central column to use the grid_11 suffix_1 classes, for a width of 11/16 and a gutter of 1/16, and the right-hand column to use grid_4.
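
As a sanity check on those numbers, here's a rough sketch of the arithmetic in Python. It assumes the stock 16-column 960 Grid System values (60px per column unit, with a 10px margin either side of each box); the function names are made up for illustration and aren't part of mingus or 960.gs.

```python
# Assumed 960.gs defaults for the 16-column layout.
COLUMN_UNIT = 60   # px taken up by one column, margins included
SIDE_MARGIN = 10   # px of margin on each side of a grid_N box


def grid_width(columns):
    """Content width of a grid_N box, excluding its margins."""
    return columns * COLUMN_UNIT - 2 * SIDE_MARGIN


def suffix_padding(columns):
    """Extra right-hand padding added by a suffix_N class."""
    return columns * COLUMN_UNIT


def outer_width(columns, suffix=0):
    """Total horizontal space a box occupies: content + margins + suffix."""
    return grid_width(columns) + 2 * SIDE_MARGIN + suffix_padding(suffix)


main = outer_width(11, suffix=1)   # 640 + 20 + 60 = 720
sidebar = outer_width(4)           # 220 + 20 = 240
total = main + sidebar             # 960, so the two columns fill the row
```

The point is simply that 11 + 1 + 4 adds up to the full 16 columns, so nothing wraps onto a new row.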

The final issue was to do with markup - that was a bit more complicated, so I'll leave it to part 2.

Oct 03

The one where my friend the sysadmin kills me

Using git hooks to automatically deploy changes to the server

Warning: this entry is very much a matter of 'This isn't the right way to do it, but it works for me'.

For small projects that are in active development, I frequently have to deploy code changes to the live server. To make this as simple as possible for me, so I can concentrate on the coding, I tend to like running on a live checkout of the code directly from the repo.

I never really got this automated properly with svn, although no doubt it's a simple matter of setting up the right post-commit hooks. However, now I'm working mainly in git, and I thought it would be good if I could push straight from my local repo to the remote one, and automatically see the production code update.

It's fairly easy to set up a remote repository to push to - I followed the instructions here, which worked a treat. However, this wasn't helping with getting this code to auto-checkout and deploy itself. So I began experimenting, and what I came up with was this.

Firstly, instead of setting up a bare repo as recommended in those instructions, use a standard git init for your remote. If you now try and push to this, git will complain with a long message explaining that "Updating the currently checked out branch may cause confusion". It gives some tips about how to turn off that message, but we can avoid it altogether by using branches.

On the server, simply create and check out a live branch:

   git branch live
   git checkout live

Now, we just need a hook that pulls from master to live every time a push to master arrives. The hook we need is called post-receive, and like all hooks it lives in .git/hooks. Here's mine:

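#!/bin/sh
# git passes one "<old-sha> <new-sha> <ref>" line per pushed ref on stdin
read params
# hooks run inside .git, so step up into the working tree
cd ..
# store the new commit hash as the cache-busting asset version
echo "ASSET_VERSION = '$(echo "$params" | cut -d " " -f2)'" > local_settings.py
# env -i clears GIT_DIR and friends, so git acts on this checkout
env -i ~/bin/git reset --hard
env -i ~/bin/git pull
exec ~/webapps/mysite/apache2/bin/restart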

The two git commands simply ensure that the live branch has no local changes, then pull in all changes from master - which in turn of course has been updated directly from my development machine.

The rest is me trying to be even cleverer. I wanted an automatic cache-busting mechanism to stop my javascript being cached while in development. So I have a simple local_settings.py file which defines a value which is appended to the querystring of all my asset urls. The hook updates this automatically - it is passed the hash of the current commit, so it reads the parameters (which is far more difficult in bash than it needs to be, by the way), extracts the hash, and writes it to local_settings.py.
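
For illustration, here is roughly the same logic in Python rather than shell. It's only a sketch: the function names and the v querystring parameter are made up for the example, and it assumes git feeds the hook one "old-sha new-sha ref-name" line per pushed ref on standard input.

```python
def parse_post_receive_line(line):
    """Split one line of post-receive input into (old_rev, new_rev, ref_name)."""
    old_rev, new_rev, ref_name = line.split()
    return old_rev, new_rev, ref_name


def settings_line(new_rev):
    """Build the single line that the hook writes to local_settings.py."""
    return "ASSET_VERSION = '%s'\n" % new_rev


def versioned_url(url, version):
    """Append the asset version to a URL so a new commit busts the cache."""
    separator = '&' if '?' in url else '?'
    return '%s%sv=%s' % (url, separator, version)


# Hypothetical input, shaped like what git sends to the hook.
sample = "1111111 2222222 refs/heads/master\n"
_, new_rev, _ = parse_post_receive_line(sample)
print(settings_line(new_rev), end='')
print(versioned_url('/static/js/site.js', new_rev))
```

Because the version changes with every push, the browser sees a new URL for each asset and fetches it afresh.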

The final step is to restart Apache, and we're laughing.

Now, no doubt there are much better ways of doing this. But like I say, it works for me.