CNK's blog

Django’s GenericForeignKeys and GenericRelations

I am working on a project that has two separate but interrelated Django web sites (projects in Django's parlance). In an earlier blog post, I described setting up the second project (mk_ai) to have read-only access to the first project's database (mk_web_core) in dev but then getting around those access restrictions for testing. The main thing I need for testing is a big set of hierarchical data loaded into the first project's test database. I can use the manage.py commands dumpdata and loaddata to preserve data in my development environment, but when I tried to load that same data into the test database, I ran into problems.

We are using GenericForeignKeys and GenericRelations. Django implements GenericForeignKey with a database foreign key into the django_content_type table. In our mixed-database setup, mk_ai has its own django_content_type table. So even if I set up my database router to allow_relation across databases AND the Postgres database adapter were willing to attempt that join, the content types referenced by rows in mk_web_core would not be in mk_ai's django_content_type table. So we can't use Django's GenericForeignKeys. What shall we do instead?
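To see why this breaks, it helps to sketch what resolving a GenericForeignKey involves. This is a simplification using fake stand-in classes, NOT Django's actual code:

```python
# Simplified sketch of GenericForeignKey resolution (not Django's real code).
# Resolving page_block.material means: follow the FK into the *local*
# django_content_type table, map that row to a model class, then fetch by id.
# If the content type row only exists in the other project's database,
# the very first step fails.

class FakeManager:
    def __init__(self, rows):
        self.rows = rows

    def get(self, pk):
        return self.rows[pk]

class VideoBlock:  # stand-in model
    objects = FakeManager({1: 'video #1'})

# Stand-in for the local django_content_type table.
model_registry = {('materials', 'video'): VideoBlock}

def resolve_generic_fk(content_type_row, object_id):
    key = (content_type_row['app_label'], content_type_row['model'])
    model = model_registry[key]          # fails if the row isn't local
    return model.objects.get(pk=object_id)
```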

Rails implements a similar type of relationship with a feature it calls Polymorphic Associations. Django stores the object's id plus a foreign key to the row in the django_content_type table representing the object's model. Rails stores the object's id plus the object's class name in a field with a _type suffix. I decided to use the Rails method for my database representation. That replaces the GenericForeignKey aspect. To replace the GenericRelation part, I created a property with a case statement that dispatches queries to the appropriate related model based on the stored type name. Perhaps showing an example will make this clearer.

The original way, using Django’s GenericForeignKey:

from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models

class PageBlock(models.Model):
    page = models.ForeignKey('Page')
    position = models.PositiveSmallIntegerField()
    allowed_block_types = models.Q(app_label='materials', model='text') | \
            models.Q(app_label='materials', model='video') | \
            models.Q(app_label='course_materials', model='image')
    block_type = models.ForeignKey(ContentType, limit_choices_to=allowed_block_types)
    object_id = models.PositiveSmallIntegerField()
    material = GenericForeignKey('block_type', 'object_id')

The ‘rails’ way, using a block_type name field that can be read directly in the mk_ai schema.

class PageBlock(models.Model):
    """
    This is a mapping table to allow us to access collections of
    blocks regardless of their actual type.

    TODO:
    Figure out how to make the object_id options fill a select
    list once the user chooses a block_type in the form on the
    admin interface.
    """
    BLOCK_TYPE_NAMES = [('text', 'TextBlock'),
                        ('video', 'VideoBlock'),
                        ('image', 'ImageBlock'),
                       ]
    page = models.ForeignKey('Page')
    position = models.PositiveSmallIntegerField()
    block_type_name = models.CharField(max_length=100, choices=BLOCK_TYPE_NAMES)
    # The block_id would be a ForeignKey field into a Video, Image... if we were mapping to just one model
    block_id = models.PositiveSmallIntegerField()

    @property
    def block(self):
        # Compare against the stored choice values ('text', 'video',
        # 'image'), not the display names.
        if self.block_type_name == 'text':
            return TextBlock.objects.get(pk=self.block_id)
        elif self.block_type_name == 'video':
            return VideoBlock.objects.get(pk=self.block_id)
        elif self.block_type_name == 'image':
            return ImageBlock.objects.get(pk=self.block_id)
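A lookup table is a tidy alternative to the if/elif chain in the block property. Here is the same dispatch in plain Python; the classes are illustrative stand-ins, not the real models:

```python
# Map stored block_type_name values to model classes - the same mapping
# as the if/elif chain in PageBlock.block, written as a lookup table.
class TextBlock: pass
class VideoBlock: pass
class ImageBlock: pass

BLOCK_MODELS = {
    'text': TextBlock,
    'video': VideoBlock,
    'image': ImageBlock,
}

def model_for(block_type_name):
    return BLOCK_MODELS[block_type_name]
```

This also makes adding a new block type a one-line change instead of another branch.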

GenericForeignKey and GenericRelation are two sides of the same coin - they make it easy to query in both directions. In our domain, I don't really have much occasion to go from Block to Page, so I don't really need the GenericRelation. However, if you need to replace it, you can create a method to do the appropriate query.

# ORIGINALLY
class VideoBlock(models.Model):
    title = models.CharField(max_length=256)
    content = models.FileField(upload_to='videos/')
    page_block = GenericRelation(PageBlock,
                                 object_id_field='object_id',
                                 content_type_field='block_type')

    @property
    def model_name(self):
        return "VideoBlock"

# AFTER REMOVING THE GenericForeignKey
class VideoBlock(models.Model):
    title = models.CharField(max_length=256)
    content = models.FileField(upload_to='videos/')

    @property
    def model_name(self):
        return "VideoBlock"

    @property
    def page_block(self):
        # Query the mapping table directly; the stored choice value for
        # this model is 'video' and the id column is block_id.
        return PageBlock.objects.filter(block_type_name='video',
                                        block_id=self.id)

Review of ‘React Under the Hood’

We are using React at work. The official documentation is really good - especially the Thinking in React section. But I could still use some additional examples, ideas, etc. so when I saw an offer for several books on React, including one from Arkency that I had been considering buying for a while, I broke down and bought the bundle. The first one I read is “React Under the Hood” by Freddy Rangel.

Overall it is a really good book. The author points out that some of the ideas in React come from the game development world. To emphasize that, the example code for the book is a Star Trek 'game'. The author provides a git repository you can clone to get started. The project is set up to use Babel, webpack, and a Node dev server - all of which just work out of the box. I need to dig into one of the other books from the Indie Bundle, SurviveJS, to learn more about how to set these up. You build almost all of the code yourself - except for the rather hairy navigation and animation parts, which come with the starter clone.

The example stresses good engineering practices - especially having one or two smart components that control all state mutation and lots of well-separated dumb components that just render the appropriate info for a given state. I really liked the EditableElement component and will probably steal it for a play project I want to do after completing this book.

The author did not use ES6 syntax because it might be unfamiliar to some readers. I actually find the new syntax easier, so I translated most things to use 'let' instead of var and everything seemed to go just fine. The other change I made throughout was to the module.exports for each .jsx file. The book suggests starting each class like this:

module.exports = React.createClass({
  render: function() {
    //whatever
  },
});

If you do this, the React plugin for the Chrome Developer Tools doesn't show a meaningful name for the component, which means you have to dig around for the section of rendered code you want to inspect. The project I am on at work uses a slightly different syntax - one that is a lot easier to read and understand:

let Something = React.createClass({
  render: function() {
    //whatever
  },
});

module.exports = Something;

If you do this, then the React debugging tab shows this component as <Something>, which makes it a LOT easier to find the code you want to inspect.

The example was good but the best part was the great material in the final chapter. It discusses

  1. PropTypes (which I had heard of but forgotten).

  2. getInitialState and getDefaultProps (I haven't used them, but they might come in handy).

  3. How to profile your code with Perf - and then some suggestions about what to do with what you find. There is good information about how to improve the performance of components that are basically render-only (per the design espoused in the rest of the book) using a React add-on called PureRenderMixin. I am going to have to look into mixins.

Using Multiple Databases in Django

I am currently working on a project that has a main public web site (mk_web_core) and then a separate AI (mk_ai) application that needs access to a large percentage of the information in the public site’s database. Since the AI only makes sense in the context of the public web site, one option might be to make them a single application. However, we are planning to experiment with different versions of the AI, so it seems sensible to separate them and develop an API contract between the two halves.

My first thought was to completely separate the two - separate code bases and separate databases. However, the main application has a deeply nested hierarchical schema and the AI needs to know all the gory details of that structure. So if we completely separate the two apps, we need to build something to keep the two views of that hierarchy in sync. We will eventually need to do that - and then build an extract, transform, and load (ETL) process for keeping the AI in sync with the main site. But for now, we are going to put that off and instead allow the AI read-only access to the information it needs from the main site.

Django has built-in support for multiple database connections, so getting things set up so my AI site could read from the mk_web_core database was pretty straightforward. The documentation on multiple databases indicates that one should create a database router for each database and then, in the settings.py file, give DATABASE_ROUTERS a list containing the two routers. After setting up the database configuration, I copied the model files from the mk_web_core project into corresponding app locations in the mk_ai project. I did not want the mk_ai project to make any changes to the mk_web_core schema, so I added managed = False to the Meta class for each model class.
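The wiring in settings.py ends up looking roughly like this; the connection details and the router's module path are placeholders, not taken from the original project:

```python
# settings.py (sketch) - one connection per schema plus the router list.
DATABASES = {
    'default': {  # the mk_ai schema this project owns
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mk_ai',
    },
    'mk_web_core': {  # the main site's schema, read-only in dev
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mk_web_core',
    },
}

DATABASE_ROUTERS = ['mk_ai.routers.DefaultDatabaseRouter']
```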

Tests That Depend On The “Read-Only” Database

The original two-database router configuration seemed to work, but then I decided I really had to write some unit tests for the mk_ai application. The mk_web_core application already has unit tests. And since it is fairly independent - it only interacts with the AI system through a single "next recommendation" API call - it is easy to mock out the way it depends on mk_ai without compromising my confidence in the tests. However, the behavior of the AI application depends strongly on the data from the mk_web_core application. So to create any meaningful tests, we really need to be able to create specific data in a test version of the mk_web_core database. So all of the configuration I did to prevent the AI application from writing to the mk_web_core schema made it impossible to set up my test database. Hmmm.

So I removed the managed = False from each model's Meta class and tried to figure out how to set up my routers so that I can write to the mk_web_core test database, but not the mk_web_core production database. I struggled for a while and then I found this blog post from NewCircle. After some trial and error, this router appears to do what I need:

from django.conf import settings

class DefaultDatabaseRouter(object):
    def db_for_read(self, model, **hints):
        """
        Read models for the mk_web_core apps from that database; everything else comes from default.
        """
        if model._meta.app_label in ['accounts', 'materials']:
            return 'mk_web_core'
        else:
            return 'default'

    def db_for_write(self, model, **hints):
        """
        Only allow writes to mk_web_core while testing; all other writes go to default.
        """
        if model._meta.app_label in ['accounts', 'materials']:
            if settings.TESTING:
                return 'mk_web_core'
            else:
                raise Exception('Attempt to write to mk_web_core from mk_ai when settings.TESTING not true!')
        else:
            return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        """
        Relations between objects are allowed if both objects are in the same pool.
        """
        return obj1._state.db == obj2._state.db

    def allow_migrate(self, db, app_label, model=None, **hints):
        """
        Write to test_mk_web_core when we are running unit tests.

        The check for model is because the contenttypes.0002_remove_content_type_name migration fails
        with message: AttributeError: 'NoneType' object has no attribute '_meta'
        """
        if app_label in ['accounts', 'materials']:
            if db == 'mk_web_core' and settings.TESTING:
                return True
            else:
                return False
        else:
            # Shortcut, we do import into default (mk_ai) but not into mk_web_core
            return db == 'default'
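The router relies on a settings.TESTING flag, which is not something Django provides. One common way to set it (my assumption, not shown in the original post) is to sniff the command line in settings.py:

```python
def is_testing(argv):
    # True when the process was started as `manage.py test ...`,
    # so the router can permit writes to the otherwise
    # read-only mk_web_core connection.
    return len(argv) > 1 and argv[1] == 'test'

# In settings.py: TESTING = is_testing(sys.argv)
```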

It is somewhat confusing that even though the tables for the materials app are never created in the default database, its migrations are still listed when you run python manage.py showmigrations and are recorded in the django_migrations table.

Django has support for test fixtures in its TestCase. But those fixtures are loaded and removed for each and every test. That is excessive for my needs and would make our tests very slow. I finally figured out how to load data into mk_web_core once at the beginning of our test run - using migrations:

from django.db import migrations
from django.core import management

def load_test_data(apps, schema_editor):
    management.call_command('loaddata', 'materials/test_materials', verbosity=2, database='mk_web_core')

class Migration(migrations.Migration):
    dependencies = [('accounts', '0001_initial'),
                    ('materials', '0001_initial'),
                   ]
    operations = [
        migrations.RunPython(load_test_data),
    ]

Image Upload With Thumbnailing and S3 Storage

Django has great API documentation - as do most of the libraries and apps in the ecosystem. But I have been having a hard time finding examples that put all the pieces together. So as an aid to myself - and anyone else who is having trouble stringing image upload, thumbnail creation, and S3 storage together - I put together a minimal project that supports uploading a user avatar in a Django 1.8 project. (Sorry, there are no unit tests, but the tests in the django-cleanup repository might be useful examples.)

The example is here: https://github.com/cnk/easy_thumbnails_example

ERB in IRB

Someday I may expand this to a longer post on Ruby debugging, but until then I am writing it down so I don’t have to search for it on Stack Overflow again.

If you need to debug an erb snippet in irb, this function gives you a handy shortcut for combining the template and arbitrary instance variables. Copy it into your irb session:

require 'erb'
require 'ostruct'

def erb(template, vars)
  ERB.new(template).result(OpenStruct.new(vars).instance_eval { binding })
end

And then you can use it as follows:

erb("Hey, <%= first_name %> <%= last_name %>", :first_name => "James", :last_name => "Moriarty")
 => "Hey, James Moriarty"

Kind of handy, especially if you need to test out some Ruby inside the <%= %> tag.