CNK's Blog

New Django Query Tricks

Union queries

Union queries are surprisingly easy to create. I need a list of ids and content type ids from a bunch of different models. I was very surpised at how straightforward it is in Django 3.2 to create the UNION query I want.

  union_query = None
  for content_type in <queryset of content types>:
      model = apps.get_model(content_type.app_label, content_type.model)
      query = model.objects.filter(<criteria>).values('pk', <content_type.id>)
      if union_query is None:
         union_query = query
      else:
          union_query = union_query.union(query, all=True)

Note: I used all=True because I will never have duplicates in my (id, content_type_id) tuples and UNION ALL is faster than UNION in this case because we can skip the DISTINCT operation on the final result.

The observant among you will have noticed a bit of pseudocode in the example above. I want to insert the content_type_id from python into my query. In SQL this would be something like:

  SELECT id, 99 FROM myapp_model;

In the Django ORM, that turns out to be something I didn’t know how to do. I can’t leave it as a bare name and I can’t quote it or the ORM tries to turn it into a column name or relation that could be turned into a column name. Turns out I need to use Value:

  query = model.objects \
               .filter(<criteria>) \
               .values('pk', Value(content_type.id, output_field=IntegerField()))

OK so that now will give me a queryset that produces a list of dicts like: [{pk: 3, content_type_id: 44}, {pk: 3, content_type_id: 48}] But when I tried to use those results in the filter section of another query… I had my next problem.

Querying by value - without Foreign Key relationships

So now I need to use those ids and content_type_ids to filter another model that has rows with content_type_id and object_id columns. I want all the lines in the table for the ModelLogEntry model where the (object_id, content_type_id) tuple is in the list of (pk, content_type_id) tuples created by our UNION query above.

If I only needed to match on a single value, I would probably evaluate the UNION query, and then do something like .filter(pk__in=<list of pks>) - as I did to get the list of content types I need. But I need to match the id and content_type_id fields. In SQL, I would do:

  SELECT wagtailcore_modellogentry.*
    FROM wagtailcore_modellogentry
    INNER JOIN (
    (
    ((SELECT `link_farms_audience`.`id`, 104 AS `content_type_id`  FROM `link_farms_audience` WHERE `link_farms_audience`.`site_id` =  12)
     UNION
     (SELECT `link_farms_collection`.`id`, 105 AS `content_type_id` FROM `link_farms_collection` WHERE `link_farms_collection`.`site_id` = 12))
     UNION
     (SELECT `link_farms_link`.`id`, 106 AS `content_type_id` FROM `link_farms_link` WHERE `link_farms_link`.`site_id` = 12))
     UNION
     (SELECT `core_didyouknowitem`.`id`, 110 AS `content_type_id` FROM `core_didyouknowitem` WHERE `core_didyouknowitem`.`site_id` = 12 ORDER BY `core_didyouknowitem`.`text` ASC)
    ) AS models
    ON models.id = wagtailcore_modellogentry.object_id
    AND models.content_type_id = wagtailcore_modellogentry.content_type_id

This was relatively straightforward to write in SQL, so I tried using raw SQL, e.g. ModelLogQuery.objets.raw('<query here>'). That definitely gave me the data I was looking for when I ran it in shell_plus. But when I tried to use it in my monkey patch, the calling function wanted to use values(), which is a function only defined on real ORM QuerySets - and not available when using raw.

At this point I suspect I won’t want to use this in production. Goodness only knows how big the union query is likely to get. But it is bothering me that I don’t know how to get Django to let me do a relatively straightforward join without having defined a ForeignQuery relationship in my Python model code.

I still don’t know how to tell Django “do this join damn it!”, but after some reading and thinking of alternate ways to write the SQL, I think I have found a way to write this in the ORM using Exists to create a correlated subquery.

    from django.apps import apps
    from django.db.models import Exists, IntegerField, OuterRef, Value
    from django.contrib.contenttypes.models import ContentType

    request = get_current_request()
    site = Site.find_for_request(request)
    union_query = None
    content_types = (
        ContentType.objects
                   .filter(id__in=ModelLogEntry.objects.values_list('content_type_id', flat=True).distinct())
    )
    for content_type in content_types:
        model = apps.get_model(content_type.app_label, content_type.model)
        query = (model.objects.filter(site_id=site.id)
                 .values('pk', content_type_id=Value(content_type.id, output_field=IntegerField()))
                 )
        if union_query is None:
            union_query = query
        else:
            union_query = union_query.union(query, all=True)

    return ModelLogEntry.objects.filter(Exists(
        union_query.filter(pk=OuterRef('object_id'), content_type_id=OuterRef('content_type_id'))
    ))

Sigh. One can’t combine .filter with a union query.

    NotSupportedError at /admin/reports/site-history/
    Calling QuerySet.filter() after union() is not supported.

I tested the Exists query by setting the union_query to be just one type and it works fine. So I learned something useful about the Django ORM - even if I can’t apply that knowledge in the context in which I wanted to to use it this time.