My last post talked about how to follow reverse generic relations efficiently. However, there's a further potential inefficiency in using generic relations, and that's the forward relationship.

If once again we take the example of an Asset model with a GenericForeignKey used to point at Articles and Galleries, we can get from each individual Asset to its related item by doing asset.content_object. However, if we have a whole queryset of Assets, doing this:

{% for asset in assets %}
   {{ asset.content_object }}
{% endfor %}

will result in as many queries as there are assets - in fact it's n+m, where n is the number of assets and m is the number of different content types, as you'll have one extra query per type to get the ContentType object. (Although it might be slightly less than that if you've used ContentTypes elsewhere, as the model manager caches lookups on the assumption that they never change once they've been set.)

However, luckily we can make this much more efficient as well, again using a variation of the dictionary technique.

generics = {}
for item in queryset:
    generics.setdefault(item.content_type_id, set()).add(item.object_id)

content_types = ContentType.objects.in_bulk(generics.keys())

relations = {}
for ct, fk_list in generics.items():
    ct_model = content_types[ct].model_class()
    relations[ct] = ct_model.objects.in_bulk(list(fk_list))

for item in queryset:
    setattr(item, '_content_object_cache', 

Here we get all the different content types used by the relationships in the queryset, and the set of distinct object IDs for each one, then use the built-in in_bulk manager method to get all the content types at once in a nice ready-to-use dictionary keyed by ID. Then, we do one query per content type, again using in_bulk, to get all the actual object.

Finally, we simply set the relevant object to the _content_object_cache field of the source item. The reason we do this is that this is the attribute that Django would check, and populate if necessary, if you called x.content_object directly. By pre-populating it, we're ensuring that Django will never need to call the individual lookup - in effect what we're doing is implementing a kind of select_related() for generic relations.


comments powered by Disqus