CNK's blog

Forcing Django to Make a Subquery

Django has extensive documentation for it’s ORM but somehow I still end up surprised by some of the queries it builds. The default logging configuration doesn’t log queries in the way Rails does (in its development environment) so when a query appears to give the correct results, I don’t usually check the SQL. But I recently had a page start to fail; I finally tracked it down to a specific query but couldn’t immediately see why I was not getting the row I expected so I printed the query and Django was not building the query I thought it had been. The python is:

Material.objects.filter(goal_id=goal_key,
                        material_state='deployed',
                        category_id=category) \
                .exclude(individualizations__user__id=user_id,
                         individualizations__material_state__in=('done',)) \
                .select_related('category') \
                .order_by('category__position', 'created_at')

This produces the following SQL (edited slightly to make it easier to read):

SELECT "appname_materials".*, "appname_categories"."category_key", "appname_categories"."position"
FROM "appname_materials" INNER JOIN "appname_categories"
  ON ( "appname_materials"."category_key" = "appname_categories"."category_key" )
WHERE ("appname_materials"."material_state" = 'deployed'
       AND "appname_materials"."goal_key" = 'goal1'
       AND "appname_materials"."category_key" = 'transition'
       AND NOT ("appname_materials"."material_key" IN (SELECT U1."material_key" AS Col1
                                                       FROM "appname_material_individualizations" U1
                                                       WHERE U1."material_state" IN ('done'))
                AND "appname_materials"."material_key" IN (SELECT U1."material_key" AS Col1
                                                           FROM "appname_material_individualizations" U1
                                                           WHERE U1."user_id" = 1))
      )
ORDER BY "appname_categories"."position" ASC, "appname_materials"."created_at" ASC

Hmmm that’s not what I want. I don’t want 2 subqueries, one for each condition. I want one subquery, with two two conditions. If I had wanted 2 subqueries, I would have written 2 excludes, like this:

Material.objects.filter(goal_id=goal_key,
                        material_state='deployed',
                        category_id=category) \
                .exclude(individualizations__user__id=user_id) \
                .exclude(individualizations__material_state__in=('done',)) \
                .select_related('category') \
                .order_by('category__position', 'created_at')

But both of those QuerySet definitions produce the same SQL. So how can I produce the following SQL using the Django ORM:

SELECT "appname_materials".*, "appname_categories"."category_key", "appname_categories"."position"
FROM "appname_materials" INNER JOIN "appname_categories"
  ON ( "appname_materials"."category_key" = "appname_categories"."category_key" )
WHERE ("appname_materials"."material_state" = 'deployed'
       AND "appname_materials"."goal_key" = 'goal1'
       AND "appname_materials"."category_key" = 'transition'
       AND NOT "appname_materials"."material_key" IN (SELECT U1."material_key" AS Col1
                                                      FROM "appname_material_individualizations" U1
                                                      WHERE U1."material_state" IN ('done')
                                                      AND U1."user_id" = 1)
       )
ORDER BY "appname_categories"."position" ASC, "appname_materials"."created_at" ASC

I tried a couple of things using Q but mostly ended up with syntax errors. Fortunately I finally found this Stack Overflow thread with references the bug report for this problem AND the solution. You can force Django to build the desired subquery by writing the subquery explicitly:

Material.objects.filter(goal_id=goal_key,
                        material_state='deployed',
                        category_id=category) \
                .exclude(
                     material_key__in=(
                         MaterialIndividualization.objects.values_list('material_id', flat=True)
                        .filter(user__id=user_id, material_state__in=('done',))
                     )
                 ) \
                .select_related('category') \
                .order_by('category__position', 'created_at')

It’s a little verbose, but it is actually a little clearer in some respects - it is more like a direct python translation of the desired SQL.

Comments