One common database operation that isn't supported out of the box by Django's ORM is create_or_update - in other words, given a set of parameters, either update an existing object or create a new one if there isn't one already.

The naive implementation is to do a get() on the model, catching the DoesNotExist exception if there's no match and instantiating a new object, then updating the attributes and saving. (You wouldn't want to use get_or_create here, as that doesn't allow you to update the instance if it already exists, so you'd have some duplication of code and db queries).

try:
    obj = MyModel.objects.get(field1=value1)
except MyModel.DoesNotExist:
    obj = MyModel()
    obj.field1 = field1
obj.field2 = value2
obj.save()

The only problem with this is that it creates multiple queries: one to get the existing row, and then two to save it - Django checks to see if it should do an insert or an update when you save, which costs another query. Most of the time, this doesn't massively matter: creating and updating is usually done outside of the standard page rendering flow, so it's not a huge problem if it's a tiny bit slower.

But there are times when you do want to optimise this. One, which we recently ran into at work, is when you want to log items to the database in the course of normal page rendering. We do this to let users of our CMS know when they've put items on a page that aren't rendering how they should be, usually because they don't have the right selection of image assets. (There are good operational reasons as to why we can't stop them from entering them in the first place: I won't go into that here.) A further wrinkle for us is that we want to ensure each error only gets one entry in the log table, but should always record the most recent time that particular error scenario was encountered. So, an ideal case for create_or_update, if only it existed.

Of course I can't stand to see unnecessary db queries, so here's an implementation that uses QuerySet.update to do the initial getting and updating if a match exists. The trick is to realise that update returns the number of rows affected by the query - which has been true more or less ever since queryset-refactor landed nearly two years ago, but which was wrongly and explicitly denied in the documentation until recently (and still is denied in the 1.1 docs, even though it's true). We can use this number to tell if a matching row existed - and if it doesn't, we can then simply call create with the same arguments. Simple.

attrs = {'field1': 'value1', 'field2': 'value2'}
filter_attrs = {'filter_field': 'filtervalue'}
rows = MyModel.objects.filter(**filter_attrs).update(**attrs)
if not rows:
    attrs.update(filter_attrs)
    obj = MyModel.objects.create(**attrs)

The attrs dictionary contains the field names/values to use to update the object, and filter_attrs is the filter names/values to find the object to update. If we're creating a new object, it will of course need to set both the attrs values and the filter_attrs, so we update one dictionary from the other.

Now, note that this will always call a db UPDATE, and if no match exists, it will additionally call an INSERT. Compare this with the original version, which always calls a SELECT, plus another SELECT and an UPDATE if the match exists, but just an INSERT if there's no match. So whether this is more efficient will depend on the use case - if you expect more updates than create, this version should be better (a single UPDATE versus SELECT+UPDATE), but if the reverse is true the original implementation will probably be better.


Comments

comments powered by Disqus