I love Django's aggregation framework. It very successfully abstracts the common aggregration tasks into a Pythonic syntax that sits extremely well with the rest of the ORM, and the documentation explains it all without a single reference to SQL.

But sometimes that very abstraction gets in the way of working out what you want to do. One example of this happened to me today when I needed to do a sum of values grouped by a single value on a model - in SQL terms, a simple GROUP BY query.

The documentation is very clear about how to do aggregations across an entire QuerySet, and annotations across a relationship. So you can, for example, easily do a sum of all the 'value' fields in a model, or a sum of all the 'value' fields on a related model for each instance of the parent model. But the implication is that these are the only things you can do. So I was left wondering if I had to create a dummy related model just to contain the unique values of the field I wanted to group on.

In fact, what I wanted to do was clearly documented, but because of the (probably correct) desire not to express things in terms of SQL, it's not that easy to find. So here's how to do it: you just need to use values.

For instance, my model is a set of transactions for a financial accounting system. Each transaction is associated with an order, which is just an integer ID referring to records in a completely different system. I wanted to get the total of transactions for each unique order ID. It's as simple as this:

Transaction.objects.values('order_id').annotate(total=Sum('value'))

Which gives you a ValuesQuerySet along the lines of:

[{'order_id': 12345L, 'total': Decimal('1.23')}, 
 {'order_id': 54321L, 'total': Decimal('2.34')}, 
 {'order_id': 56789L, 'total': Decimal('3.45')}]

One thing to be aware of: as the docs do note, the order of values and annotate is significant here. This way round, it groups by the fields listed in values and then annotates. But if you put annotate first, it does the calculation for each individual record, without grouping, then uses values to simply restrict the fields it outputs.


Comments

comments powered by Disqus