What really annoys me about Django migrations

Automated database migrations have long been a convenient way of dealing with schema changes in Django. Migrations were incorporated into Django itself only 3 years ago, but South had been the de facto solution since 2008.

The same way an ORM allows us to forget about SQL when querying the database, migrations make sure we don’t write a single ‘ALTER TABLE’ for our schema changes. Some may argue that’s bad: we “lose control” over a critical part of our infrastructure, we forget how to write SQL when we need it, we’re not sure how an operation really translates into SQL, and so on. OK, these points are actually valid. However, Django’s migrations module is more than just a way of automatically generating and applying SQL statements; it’s also a transparent API for writing your own database changes in Python. It comes with wheels for those who need them (or trust them enough) and tools for those who like to get their hands dirty.

At this point, you must have noticed which team I play for (the pro-automated-migration one, in case it wasn’t clear). Still, I keep a critical point of view on the subject. It’s no secret that working with migrations on medium-to-large teams can be quite annoying, sometimes having to resolve migration conflicts on every deploy. And this is old news: I remember even the South library had a dedicated section, “Part 5: Teams and Workflow”, explaining best practices to avoid and resolve conflicts.

In these almost 10 years of Django migrations, plenty has been written on the Internet about solutions to this problem, and it boils down to: avoid working on the same app, merge the conflicting migrations with makemigrations --merge, manually change the migration dependencies, or roll back and re-apply migrations. There are also data migrations, database locks and downtime, but I won’t go over them, because they are not what annoys me the most.

The problem

This is what annoys me the most about Django migrations:

[Image: ProgrammingError Migration]

Let me try to put it in words:

  1. I was on branch bernardo/remove-favorite-color.
  2. Removed the field “favorite_color” from my Profile model in the core app.
  3. Generated the migration:
    python manage.py makemigrations core
  4. Applied it:
    python manage.py migrate core
  5. Committed:
    git add . && git commit -m "Remove favorite color from profile"
  6. And switched to branch bernardo/some-other-feature.

Then I started making some code changes in another app, nothing related to the Profile model, spent a couple of minutes on it, opened the Django admin to test it by hand and remembered I had to create another profile. Nothing wrong with that, right? Nope, I got an error: the database no longer has the favorite_color column, but the Profile model on this branch still expects it.
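To make the failure concrete, here is a minimal sketch of the same mismatch using only Python's built-in sqlite3 module. The table name core_profile and the name column are assumptions for illustration. SQLite reports the problem as an OperationalError; with PostgreSQL behind Django, the same situation shows up as a ProgrammingError.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema as the other branch's migration left it: favorite_color is gone.
# (Table and column names here are illustrative assumptions.)
conn.execute("CREATE TABLE core_profile (id INTEGER PRIMARY KEY, name TEXT)")


def create_profile(connection):
    """Insert a row the way the current branch's model still expects to."""
    try:
        connection.execute(
            "INSERT INTO core_profile (name, favorite_color) VALUES (?, ?)",
            ("Bernardo", "blue"),
        )
    except sqlite3.OperationalError as exc:
        # The code and the schema disagree about the column.
        return str(exc)
    return "ok"


error = create_profile(conn)
print(error)
```

The code on the current branch is fine and so is the database; they just describe two different points in the project's history.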

Some may say I’m overreacting, because the fix is simple:

  1. Stash your current changes:
    git add . && git stash
  2. Go back to bernardo/remove-favorite-color
  3. Get the name of the migration just before the one you want to undo:
    python manage.py showmigrations core
  4. Roll back by migrating to it:
    python manage.py migrate core 0002_auto_20170618_1549
  5. Go back to bernardo/some-other-feature
  6. Unstash changes:
    git stash pop && git reset
  7. Profit

Yet the key point is having to interrupt your whole train of thought, roll back the migrations and then get back to work. That’s not only annoying, it’s counter-productive. What I wanted is something as simple as:

$ ~ git checkout bernardo/some-other-feature
$ These migrations are applied to the database and are not on your code:
$ - core/migrations/0002_auto_20170618_1549.py
$    - RemoveField(model_name='profile', name='favorite_color')
$ Do you want me to resolve this issue? [y/n]

One solution

One day I stumbled upon this one: django migrations – workflow with multiple dev branches. It’s a StackOverflow question asking whether there’s a git hook to deal with this sort of thing. I did find some solutions, but not quite what I was looking for (mainly this gist and this lib). Talking to other devs here at Cheesecake Labs, we thought this deserved some attention. My first try was to do something similar to what the question’s accepted answer suggests:

1. Just before switching [branch], you dump the list of currently applied migrations into a temporary file mybranch_database_state.txt
2. Then, you apply myfeature branch migrations, if any
3. Then, when checking back mybranch, you reapply your previous database state by looking to the dump file.

Git has only a post-checkout hook, which is no good for the first item on the list: once the checkout has happened, the previous branch’s migration files are no longer in the working tree, so the hook can’t read which operations the applied migrations performed. Django stores applied migrations in a database table called django_migrations, but all we have there is the migration name and app, not the operations it applied. We also wanted the hook to be a bit smarter: some migration operations don’t raise ProgrammingErrors, and we wanted to give this information to the developer and avoid unnecessary rollbacks.

Our solution

To get the desired output mentioned above, we started off with this set of short-term goals:

  1. Find out whether there are migrations applied on the previous branch that are not present on the target branch.
  2. Read the contents of these migrations and find all their operations.
  3. Warn the user, when the checkout is done, that a set of migrations is applied but not tracked.

After git checkout finishes, it calls the post-checkout hook, passing as arguments the previous and target references and a flag telling whether it was a branch checkout:

$ post-checkout <previous-ref> <target-ref> <is-branch-checkout>
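As a rough sketch of the hook's entry point (not the actual django-nomad code), the three arguments can be parsed like this. Git passes the flag as "1" for a branch checkout and "0" for a file checkout; the CheckoutEvent type and parse_hook_args function are names made up for this example.

```python
import sys
from typing import NamedTuple


class CheckoutEvent(NamedTuple):
    """Hypothetical container for the post-checkout arguments."""
    previous_ref: str
    target_ref: str
    is_branch_checkout: bool


def parse_hook_args(argv):
    """Parse the three positional arguments Git passes to post-checkout."""
    if len(argv) != 3:
        raise ValueError("expected <previous-ref> <target-ref> <flag>")
    previous_ref, target_ref, flag = argv
    # Git sends "1" for a branch checkout and "0" for a file checkout.
    return CheckoutEvent(previous_ref, target_ref, flag == "1")


if __name__ == "__main__" and len(sys.argv) == 4:
    event = parse_hook_args(sys.argv[1:])
    # Only branch switches that actually moved HEAD are interesting here.
    if event.is_branch_checkout and event.previous_ref != event.target_ref:
        print("Branch changed: %s -> %s" % (event.previous_ref, event.target_ref))
```

Saved as an executable .git/hooks/post-checkout file with a Python shebang line, a script like this would run after every checkout.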

And then our hook has to follow these steps:

  1. Find the closest common ancestor of the two references passed to the post-checkout hook. This basically means: find where in the history the two branches diverged. We use the git-merge-base command for that.
  2. With the git-diff command, list all files that changed between the ancestor commit and the previous branch.
  3. From this list, keep only the files that are migration modules. By default, Django creates a migrations module in each app, but its location can be changed with the MIGRATION_MODULES setting, so we take this into consideration.
  4. With all migrations in hand, check which of them are applied to the database. We do that by querying the database with Django’s MigrationRecorder class, which already provides an applied_migrations helper method.
  5. Now, with the final list of migration files, we can execute their code and discover the operations of each migration.
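Steps 1 to 4 could be sketched roughly as below. This is an illustration under simplifying assumptions, not the django-nomad implementation: it assumes the default migrations package name, assumes each app's directory name is its app label, and ignores MIGRATION_MODULES. The function names are made up for the example.

```python
import subprocess
from pathlib import PurePosixPath


def git(*args):
    """Run a git command and return its stripped stdout."""
    return subprocess.check_output(("git",) + args, text=True).strip()


def changed_files(previous_ref, target_ref):
    """Steps 1-2: files changed on the previous branch since it diverged."""
    ancestor = git("merge-base", previous_ref, target_ref)
    return git("diff", "--name-only", ancestor, previous_ref).splitlines()


def migration_keys(paths, package="migrations"):
    """Step 3: map migration file paths to (app_label, migration_name)."""
    keys = []
    for raw in paths:
        path = PurePosixPath(raw)
        is_migration = (
            path.suffix == ".py"
            and path.parent.name == package
            and path.stem != "__init__"
            and len(path.parts) >= 3  # <app>/<package>/<file>.py at least
        )
        if is_migration:
            # Assumption: the app's directory name is its app label.
            keys.append((path.parent.parent.name, path.stem))
    return keys


def applied_only(keys, applied):
    """Step 4: keep only the migrations the database reports as applied."""
    return [key for key in keys if key in applied]
```

In a real hook, the applied argument would come from MigrationRecorder(connection).applied_migrations(), which is keyed by (app, name) pairs, so the membership test above works on its result directly.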

We’ve created a library called django-nomad. So far, we have implemented steps 1 to 5 and a git hook installer. The code is still in its early stages, but it’s open on GitHub and accepting suggestions and pull requests.

What’s next?

In the long term, we would like to resolve these issues automatically. But this raises a couple of concerns: simply rolling back the previous migrations may cause data loss, and even though it’s only a development environment, that can be undesirable.

When there’s a problem, there are also possibilities (quoting my friend Jonatas on his post about a Serverless architecture):

  1. We could create a separate schema and override the database connector to switch between them based on the branch, whenever a clash happens.
  2. Instead of dumping the whole schema, we could maintain only a separate table with different migrations applied, and update the db_table attribute on each Model.

These are only ideas so far; we have not yet examined them thoroughly for pros, cons or blockers. Still, they are a good start.
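For the first idea, one small piece would be deriving a schema name from the current branch. The sketch below is purely hypothetical: the schema_for_branch function, the dev_ prefix and the naming scheme are made up for illustration, and the 63-character cap mirrors PostgreSQL's identifier length limit.

```python
import re


def schema_for_branch(branch, prefix="dev_", max_length=63):
    """Derive a database-safe schema name from a Git branch name."""
    # Collapse every run of characters other than a-z and 0-9 into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", branch.lower()).strip("_")
    # Truncate to the identifier limit (63 characters in PostgreSQL).
    return (prefix + slug)[:max_length]


print(schema_for_branch("bernardo/remove-favorite-color"))
# prints: dev_bernardo_remove_favorite_color
```

Two long branch names could still collide after truncation, which is one of the blockers a real implementation would need to consider.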

Thoughts? Suggestions? Are we doing something wrong? Help is always welcome. Thanks!

About the author

Bernardo Smaniotto

Bernardo started working as a backend engineer at Cheesecake Labs and later joined the frontend team, working on ReactJS and AngularJS web apps.
