What really annoys me about Django migrations

Automated database migrations have long been a convenient way of dealing with schema changes in Django. Migrations were incorporated into Django itself only 3 years ago, but South had been the de facto solution since 2008.

The same way an ORM lets us forget about SQL when querying the database, migrations make sure we don’t write a single ‘ALTER TABLE’ for our schema changes. Some may argue that’s bad: we “lose control” over a critical part of our infrastructure, we forget how to write SQL when we need it, we’re not sure how an operation is really translated into SQL, and so on. OK, these points are actually valid. However, Django’s migrations module is more than just a way of automatically generating and applying SQL statements; it’s also a transparent API for writing your own database changes in Python. It comes with wheels for those who need them (or trust them enough) and tools for those who like to get their hands dirty.

At this point, you must have noticed which team I play for (the pro-automated-migration one, in case it wasn’t clear). Still, I’m critical of some aspects of it. It’s no secret that working with migrations on medium-to-large teams can be quite annoying, sometimes having to resolve migration conflicts on every deploy. And this is old news: I remember that even the South documentation had a dedicated section, “Part 5: Teams and Workflow”, explaining best practices to avoid and resolve conflicts.

In these almost 10 years of Django migrations, plenty has been written on the Internet about solutions to this problem, which boil down to: avoid working on the same app, merge the conflicting migrations with makemigrations --merge, manually change the migration dependencies, or roll back and re-apply migrations. There are also data migrations, database locks and downtime, but I won’t go over them, because they are not what annoys me the most.

The problem

This is what annoys me the most about Django migrations:

[Image: ProgrammingError Migration]

Let me try to put it in words:

  1. I was on branch bernardo/remove-favorite-color.
  2. Removed the field “favorite_color” from my Profile model in the core app.
  3. Generated the migration:
    python manage.py makemigrations core
  4. Applied it:
    python manage.py migrate core
  5. Committed:
    git add . && git commit -m "Remove favorite color from profile"
  6. And switched to branch bernardo/some-other-feature.

Then you start making some code changes on another app, nothing related to your Profile model, spend a couple of minutes on it, open the Django admin to test it by hand and remember you have to create another profile. Nothing wrong with that, right? Nope, you get an error: the code on this branch still has favorite_color on the Profile model, but the column is already gone from the database.

Some may say I’m overreacting, because the fix is simple:

  1. Stash your current changes:
    git add . && git stash
  2. Go back to bernardo/remove-favorite-color
  3. Get the name of the migration to roll back to (the one right before the migration you applied):
    python manage.py showmigrations core
  4. Roll back to it:
    python manage.py migrate core 0002_auto_20170618_1549
  5. Go back to bernardo/some-other-feature
  6. Unstash your changes:
    git stash pop && git reset
  7. Profit

Yet the key point is that you have to interrupt your train of thought, roll back the migrations and then pick up where you left off. That’s not only annoying, it’s counter-productive. What I wanted was something as simple as:

$ git checkout bernardo/some-other-feature
These migrations are applied to the database but are not in your code:
- core/migrations/0002_auto_20170618_1549.py
   - RemoveField(model_name='profile', name='favorite_color')
Do you want me to resolve this issue? [y/n]

One solution

One day I stumbled upon this one: “django migrations – workflow with multiple dev branches”. It’s a Stack Overflow question asking whether there’s a git hook to deal with this sort of thing. I did find some solutions, but not quite what I was looking for (mainly this gist and this lib). Talking to other devs here at Cheesecake Labs, we thought this deserved some attention. My first try was to do something similar to what the question’s accepted answer suggests:

1. Just before switching [branch], you dump the list of currently applied migrations into a temporary file mybranch_database_state.txt
2. Then, you apply myfeature branch migrations, if any
3. Then, when checking back mybranch, you reapply your previous database state by looking to the dump file.

Git only has a post-checkout hook, with no pre-checkout counterpart, so the first item on the list is a no-go: once the checkout has happened, there is no direct way to know the operations of the migrations that were applied. Django stores applied migrations in a database table called django_migrations, but all we have there is the app and the migration name, not the operations the migration applied. We also wanted the hook to be a bit smarter: some migration operations don’t raise ProgrammingErrors, and we wanted to give this information to the developer and avoid unnecessary rollbacks.

Our solution

To get the desired output mentioned above, we started off with this set of short-term goals:

  1. Find out whether there are migrations applied on the previous branch that are not present on the target branch.
  2. Read the contents of these migrations and find all their operations.
  3. Warn the user, when the checkout is done, that a set of migrations is applied but not tracked.

After git checkout finishes, it calls the post-checkout hook, passing as arguments the two references and a flag telling whether it was a branch checkout:

$ post-checkout <previous-ref> <target-ref> <is-branch-checkout>
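
As a minimal sketch of how a hook script could read these three arguments (the names CheckoutEvent, parse_hook_args and main are hypothetical, not taken from django-nomad):

```python
import sys
from typing import NamedTuple, Sequence


class CheckoutEvent(NamedTuple):
    previous_ref: str
    target_ref: str
    is_branch_checkout: bool


def parse_hook_args(argv: Sequence[str]) -> CheckoutEvent:
    # Git passes the previous HEAD, the new HEAD and a flag:
    # "1" for a branch checkout, "0" for a file checkout.
    if len(argv) != 3:
        raise ValueError("expected <previous-ref> <target-ref> <flag>")
    previous_ref, target_ref, flag = argv
    return CheckoutEvent(previous_ref, target_ref, flag == "1")


def main(argv: Sequence[str]) -> int:
    event = parse_hook_args(argv)
    if not event.is_branch_checkout or event.previous_ref == event.target_ref:
        return 0  # file checkout or same commit: nothing to check
    print(f"Checking migrations between {event.previous_ref} and {event.target_ref}")
    return 0
```

An executable file at .git/hooks/post-checkout would end with sys.exit(main(sys.argv[1:])). Returning early on file checkouts keeps the hook out of the way when you only restore a single file.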

And then our hook has to follow these steps:

  1. Find the closest common ancestor commit of the two references passed to the post-checkout hook. Which basically translates to: find where in the history the two branches have diverged. We use the git-merge-base command for that.
  2. With a git-diff command, list all files that have changed between the previous branch and the ancestor commit.
  3. From this list, keep only the files that are migration modules. By default, Django creates a migrations module in each app, but its name can be changed using the MIGRATION_MODULES setting, so we take this into consideration.
  4. With all migrations in hand, check which of them are applied to the database. We do that by querying the database, using Django’s MigrationRecorder class, which already provides an applied_migrations helper method.
  5. Now, with the final list of migration files, we can execute their code and discover the operations of each migration.
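
Steps 1 to 4 above could be sketched roughly like this. It is not django-nomad’s actual code: the helper names are hypothetical, and it assumes each app lives in a directory named after its app label with the default “migrations” package, so a custom MIGRATION_MODULES setting is not handled.

```python
import subprocess
from typing import Iterable, List, Tuple


def git(*args: str) -> str:
    """Run a git command and return its stripped stdout."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def changed_files(previous_ref: str, target_ref: str) -> List[str]:
    # Steps 1 and 2: find the common ancestor, then list the files that
    # differ between it and the previous branch.
    ancestor = git("merge-base", previous_ref, target_ref)
    return git("diff", "--name-only", ancestor, previous_ref).splitlines()


def migration_keys(
    paths: Iterable[str], package: str = "migrations"
) -> List[Tuple[str, str]]:
    # Step 3: keep only "<app>/<package>/<name>.py" files and turn them
    # into (app_label, migration_name) pairs.
    keys = []
    for path in paths:
        parts = path.split("/")
        if len(parts) < 3 or parts[-2] != package:
            continue
        filename = parts[-1]
        if not filename.endswith(".py") or filename == "__init__.py":
            continue
        keys.append((parts[-3], filename[:-3]))
    return keys


def applied_only(keys, applied):
    # Step 4: keep the migrations the database reports as applied.
    return [key for key in keys if key in applied]


def applied_in_database():
    # Assumes Django is installed and its settings are already configured.
    from django.db import connection
    from django.db.migrations.recorder import MigrationRecorder

    return MigrationRecorder(connection).applied_migrations()
```

Inside the hook, applied_only(migration_keys(changed_files(previous, target)), applied_in_database()) would give the migrations that are applied to the database but missing from the target branch. The Django import is kept inside a function so the git helpers can be used and tested without a configured project.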

We’ve created a library called django-nomad. So far, we have implemented steps 1 to 5 and a git-hook installer. The code is still in its early stages, but it’s open on GitHub and accepting suggestions and pull requests.

What’s next?

In the long term, we would like to resolve these issues automatically. But this raises a couple of concerns: simply rolling back the previous migrations may cause data loss, and even though it’s only a development environment, that can be undesirable.

When there’s a problem, there are also possibilities (quoting my friend Jonatas in his post about a Serverless architecture):

  1. We could create a separate schema and override the database connector to switch between them based on the branch, whenever a clash happens.
  2. Instead of dumping the whole schema, we could maintain only a separate table with different migrations applied, and update the db_table attribute on each Model.

These are only ideas for now, and we haven’t yet looked at them thoroughly enough to find their pros, cons or blockers. Still, they are a good start.
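
To make the first idea a bit more concrete, here is a minimal sketch of deriving a per-branch name. It uses a separate database name rather than a schema, which is a simplification of the idea above, and database_name_for_branch is a hypothetical helper, not part of django-nomad.

```python
import re


def database_name_for_branch(base_name: str, branch: str) -> str:
    # Replace every run of characters that is not a letter, digit or
    # underscore, so the branch name is safe to use in an identifier.
    suffix = re.sub(r"[^0-9a-zA-Z_]+", "_", branch).strip("_").lower()
    if not suffix:
        return base_name
    return f"{base_name}_{suffix}"
```

In settings.py, the result could be assigned to DATABASES["default"]["NAME"], with the branch name read from git rev-parse --abbrev-ref HEAD. Each branch would then need its own database created and migrated, which is the cost of never sharing a schema between branches.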

Thoughts? Suggestions? Are we doing something wrong? Help is always welcome. Thanks!

About the author

Bernardo Smaniotto

Bernardo Smaniotto is an engineer by heart, crossing borders within product development disciplines. As COO at Cheesecake he’s always making sure that we have efficient, collaborative processes that deliver on the quality we’re bound to.

4 thoughts on “What really annoys me about Django migrations”

  1. Adam Hopkins said:

    Nice work, and I like the idea and direction you are heading. The problem seems to me a little more systemic, and the need to fix it seems a bit deeper than what you are trying to achieve. What git has given us is a way to easily

    1. Bernardo Smaniotto said:

      Hey Adam, thanks! Interesting thoughts. Since it’s a problem only in dev mode, I don’t think it would justify revamping the current SQL operations. Migrations are a kind of versioning of the schema; the issue is keeping it in sync with the data and with the schema representation in your application. Or maybe I’m not seeing that far yet haha

  2. Brice said:

    For my projects, I manually create various versions of my database file (I use SQLite in development). Whenever I switch from one branch to the other, I rename my sqlite files. I’ve been doing this for years, but not that often (I usually only work on a single branch at a time).
    I’ve often thought of automating it with hooks and config: every time I create a new branch, a copy of the database is created using the name of the branch. Every time a branch is merged into another one, we replace the old db with the one from the new branch if there is no conflict. As I work with Git-Flow, I always have my feature branch up-to-date with dev before a merge. It should be studied for when it’s not the case… Maybe a prompt to know which db to keep? When a branch is deleted, the file is too. And finally, create a sample db configuration (in settings.py) which points to the db file with the same name (or hash) as the current branch.
    … but I haven’t had the time to do it yet!
    Also, that would only work:
    – for those using SQLite
    – for those with little data in the database (or the storage needed would grow even more because of the duplicated db files).
    – and there would probably be many edge cases as well.
    A good thing about that is that it could allow some other behaviours like when you fetch a branch from a coworker, you could have some kind of repository where the script can go and fetch the database with the data to use.

    Anyway, it confirms there’s something missing here (or was missing, I haven’t yet had the time to look at your solution).
    Thanks for having taken the time to do it, which I haven’t, and for having made it open.

  3. Marcin Nowak disse:

    There is no need to fight with builtin migrations, conflicts, deps, python deps, squashing hell and the occasional creation of migration files in installed eggs. Just do not use it. Django’s migration system is not well architected; it should be avoided and will require rewriting in the future.

    I’m using Liquibase with the Liquimigrate wrapper for Django. Currently we’re working on syncing between builtin migrations and Liquibase changesets to avoid manual (re)writing. Liquibase is a database-centric, project-wide db schema and db data migration system. There are no deps, no filename conflicts (just pure diffs), no requirement for squashing, no python deps nor code. Preparing changesets is not as fast as `makemigrations`, but the migration system is very stable.
