CNK's Blog

Tuning Django REST Framework Serializers

One problem that often comes up when you are using an object-relational mapper is called the N+1 query problem - inadvertently doing a query and then doing a separate query for the related objects for each row. When building sites using Ruby on Rails, the framework logs all SQL queries (while you are in development mode). So one tends to fix these inefficient queries as you are developing - if nothing else, in self-defense so you can actually see the things you care about in your logs.

Django, on the other hand, does not log anything except the timestamp, request, response_code, and response size. Its default logging configuration doesn’t log any request parameters or database queries. So it’s easy to overlook inefficient queries. So when we finally put a reasonable amount of test data into our staging server, we found that several of our API endpoints were agonizingly slow. So, time for some tuning!

Setup

Lots of people use the django debug toolbar but I really prefer log files. So I installed and configured Django Query Inspector. That was helpful for identifying some of the worst offenders but for the real tuning, I needed this stanza to log all database queries:

    LOGGING = {
        'version': 1,
        'disable_existing_loggers': False,
        'handlers': {
            'console': {
                'level': 'DEBUG',
                'class': 'logging.StreamHandler',
            }
        },
        'loggers': {
            'django.db.backends': {
                'handlers': ['console'],
                'level': 'DEBUG',
            },
        }
    }

Once I had that going, I started looking at some of my nested serializers. With a couple of well placed “select_related”s on the queries in my views, I was able to get rid of most of the excess queries but I was consistently seeing an extra query that I couldn’t figure out - until I started to write up an issue to post on IRC.

The extra query was coming in because I was using DRF’s browsable API to do my query tuning. The browsable API includes a web form for experimenting with the create and update actions in a ModelViewSet and that form has a select menu for each foreign key relationship that needs to be created. So when I made a request in the browser, I saw:

    (0.000) QUERY = '
    SELECT "project_goal"."id", "project_goal"."name",
           "project_goal"."metagoal_id", "project_metagoal"."id",
           "project_metagoal"."name", "project_metagoal"."project_id"
    FROM "project_goal" INNER JOIN "project_metagoal"
      ON ("project_goal"."metagoal_id" = "project_metagoal"."id" )
    WHERE "project_goal"."id" = %s' - PARAMS = (3,); args=(3,)

    (0.000) QUERY = '
    SELECT "project_metagoal"."id",
           "project_metagoal"."name", "project_metagoal"."project_id"
    FROM "project_metagoal"' - PARAMS = (); args=()

    [SQL] 2 queries (0 duplicates), 0 ms SQL time, 101 ms total request time
    [15/Jul/2016 01:40:53] "GET /api/goals/3/ HTTP/1.1" 200 10565

But when I made the same request using curl, I only see the one join query that I was expecting:

    $ curl http://127.0.0.1:8000/api/goals/3/ | jq .
    {"id": 3,
     "url": "http://127.0.0.1:8000/api/goals/3/",
     "name": "Subgoal 3",
     "metagoal": "http://127.0.0.1:8000/api/metagoals/1/"
    }

    (0.000) QUERY = '
    SELECT "project_goal"."id", "project_goal"."name",
           "project_goal"."metagoal_id", "project_metagoal"."id",
           "project_metagoal"."name", "project_metagoal"."project_id"
    FROM "project_goal" INNER JOIN "project_metagoal"
      ON ("project_goal"."metagoal_id" = "project_metagoal"."id" )
    WHERE "project_goal"."id" = %s' - PARAMS = (3,); args=(3,)

    [SQL] 1 queries (0 duplicates), 0 ms SQL time, 12 ms total request time
    [15/Jul/2016 01:40:47] "GET /api/goals/3/ HTTP/1.1" 200 5398