Django patterns part 3: efficient generic relations

I've previously talked about how to make reverse lookups more efficient using a simple dictionary trick. Today I want to write about how this can be extended to generic relations.

At its heart, a generic relationship is defined by two elements: a foreign key to the ContentType table, to determine the type of the related object, and an ID field, to identify the specific object to link to. Django uses these two elements to provide a content_object pseudo-field which, to the user, works similarly to a real ForeignKey field. And, again just like a ForeignKey, Django can helpfully provide a reverse relationship from the linked model back to the generic one, although you do need to explicitly define this using generic.GenericRelation to make Django aware of it.
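
To make the two-part pointer concrete, here is a toy sketch in plain Python rather than Django code. The `CONTENT_TYPES` and `TABLES` dictionaries and the `ToyAsset` class are hypothetical stand-ins for the ContentType table, the model tables and a generic model. The point is only that neither the type nor the ID is enough on its own.

```python
# Toy stand-ins (assumptions for illustration, not Django APIs).
CONTENT_TYPES = {1: "article", 2: "gallery"}  # content type ID -> model name
TABLES = {
    "article": {1: "Article: Hello world"},
    "gallery": {1: "Gallery: Holiday photos"},
}


class ToyAsset:
    """Mimics a generic model: stores a content type ID plus an object ID."""

    def __init__(self, content_type_id, object_id):
        self.content_type_id = content_type_id
        self.object_id = object_id

    @property
    def content_object(self):
        # First pick the table from the content type, then the row by ID.
        model_name = CONTENT_TYPES[self.content_type_id]
        return TABLES[model_name][self.object_id]


# Same object ID, different content types, so different targets.
article_asset = ToyAsset(content_type_id=1, object_id=1)
gallery_asset = ToyAsset(content_type_id=2, object_id=1)
```

Both toy assets carry the object ID 1, yet they resolve to different objects because their content types differ.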

As usual, though, the real inefficiency arises when you access reverse relationships for many items at once, say each item in a QuerySet. As with reverse foreign keys, Django resolves the relationship separately for each item, which means one extra query per item. The solution is a little different, though, to take into account the added complexity of generic relations.

Assuming the list of items is all of one type, the first step is to get the content type for that model. We then collect the IDs of the items and fetch all the related objects in a single query, filtering on both the content type and the list of IDs. From there, we can use the dictionary trick described last time to associate each item with its particular related items. In this example, Asset is the generic model, holding assets for other models such as Article and Gallery.

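from django.contrib.contenttypes.models import ContentType

articles = Article.objects.all()
# Map each article's ID to the article itself for quick lookup later.
article_dict = dict((article.id, article) for article in articles)

article_ct = ContentType.objects.get_for_model(Article)
# One query for every asset attached to any of these articles.
assets = Asset.objects.filter(
    content_type=article_ct,
    object_id__in=[a.id for a in articles]
)
# Group the assets by the ID of the article they point to.
asset_dict = {}
for asset in assets:
    asset_dict.setdefault(asset.object_id, []).append(asset)
# Attach each group to its article.
for article_id, related_items in asset_dict.items():
    article_dict[article_id]._assets = related_items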

This is good as far as it goes, but what about when we have a heterogeneous list of items? That, after all, is the point of generic relations. So what if our starting point is a collection of both Galleries and Articles, and we still want to get all the related Assets in one go? As it turns out, the solution is not massively different: we just need to change the way we key the items in the intermediate dictionary, to record the content type as well as the object ID.

from django.db.models import Q

# Assumes articles is defined as before; galleries is built the same way.
galleries = Gallery.objects.all()

article_ct = ContentType.objects.get_for_model(Article)
gallery_ct = ContentType.objects.get_for_model(Gallery)
# One query for the assets of both models.
assets = Asset.objects.filter(
    Q(content_type=article_ct,
      object_id__in=[a.id for a in articles]) |
    Q(content_type=gallery_ct,
      object_id__in=[g.id for g in galleries])
)

# Key each group of assets by "<content_type_id>_<object_id>".
asset_dict = {}
for asset in assets:
    asset_dict.setdefault("%s_%s" % (asset.content_type_id, asset.object_id),
                          []).append(asset)

for article in articles:
    article._assets = asset_dict.get("%s_%s" % (article_ct.id, article.id), None)

for gallery in galleries:
    gallery._assets = asset_dict.get("%s_%s" % (gallery_ct.id, gallery.id), None)

Here we first of all use Q objects to get all the assets of type Article with IDs in the list of articles, plus all those of type Gallery with IDs in the list of galleries. Then we use the fact that each asset knows its own content type ID to create the dictionary keys in the form <content_type_id>_<object_id>. Finally, we loop through the articles and the galleries separately to get the relevant assets for each item.
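
The keying step can also be tried without a database. The following is a minimal sketch in plain Python, where `FakeAsset`, `FakeItem` and `attach_assets` are hypothetical names invented for this illustration, and the content type IDs 10 and 11 are made up. It uses a `(content_type_id, object_id)` tuple as the key instead of a formatted string, which is a small variation on the code above rather than a different technique.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for an Asset row; a real Asset would be a Django model.
FakeAsset = namedtuple("FakeAsset", ["content_type_id", "object_id", "name"])


class FakeItem:
    """Hypothetical stand-in for an Article or Gallery instance."""

    def __init__(self, id):
        self.id = id
        self._assets = None


def attach_assets(items_by_ct, assets):
    """Attach each group of assets to the item that owns it.

    items_by_ct maps a content type ID to the list of items of that type.
    """
    grouped = defaultdict(list)
    for asset in assets:
        grouped[(asset.content_type_id, asset.object_id)].append(asset)

    for ct_id, items in items_by_ct.items():
        for item in items:
            # Items with no assets get None, matching the code above.
            item._assets = grouped.get((ct_id, item.id))


# Made-up content type IDs: 10 for articles, 11 for galleries.
articles = [FakeItem(1), FakeItem(2)]
galleries = [FakeItem(1)]
assets = [
    FakeAsset(10, 1, "a.jpg"),
    FakeAsset(11, 1, "c.jpg"),
    FakeAsset(10, 1, "b.jpg"),
]
attach_assets({10: articles, 11: galleries}, assets)
```

Because the content type is part of the key, the article and the gallery that share ID 1 each end up with only their own assets.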
