Infrastructure as Code Best Practices with Terraform for DevOps
João Victor Alhadas | Dec 17, 2024
Automated database migrations have long been a convenient way of dealing with schema changes in Django. Migrations were only incorporated into Django itself 3 years ago, but South had been the de facto solution since 2008.
The same way an ORM allows us to forget about SQL when writing queries to the database, migrations make sure we don’t write a single ‘ALTER TABLE’ in our schema changes. Some may argue that’s bad: we “lose control” over a critical part of our infrastructure, we no longer know how to write SQL when needed, we’re not sure how an operation is really translated into SQL, and so on. OK, these points are actually valid. However, Django’s migrations module is more than just a way of automatically generating and applying SQL statements. It’s also a transparent API for writing your own database changes in Python. It comes with wheels for those who need them (or trust them enough) and tools for those who like to get their hands dirty.
At this point, you must have noticed which team I play for (it’s the pro-automated-migration one, in case that wasn’t clear). Still, I keep a critical eye on it. It’s no secret that working with migrations on medium-to-large teams can be quite annoying, sometimes having to resolve migration conflicts on every deploy. And this is old news: I remember that even the South documentation had a dedicated section, “Part 5: Teams and Workflow”, explaining best practices to avoid and resolve conflicts.
In almost 10 years of Django migrations, plenty of literature has appeared on the Internet with solutions to this problem. It boils down to: avoid working on the same app, --merge them, manually change the migration dependencies, or roll back and re-apply migrations. There are also data migrations, database locks and downtime, but I won’t go over them, because they are not what annoys me the most.
This is what annoys me the most about Django migrations:
Let me try to put it in words:
python manage.py makemigrations core
python manage.py migrate core
git add . && git commit -m "Remove favorite color from profile"
Then you start making some code changes on another app, nothing related to your Profile model, spend a couple of minutes doing it, open Django admin to human-test it and remember you have to create another profile. Nothing wrong with that, right? Nope, you get an error.
Some may say I’m overreacting, because it’s a simple change to fix it:
git add . && git stash
python manage.py showmigrations core
python manage.py migrate core 0002_auto_20170618_1549
git stash pop && git reset
Yet the key point is having to interrupt your train of thought, roll back the migrations and then get back to what you were doing. That’s not only annoying, it’s counter-productive. What I wanted is something as simple as:
$ ~ git checkout bernardo/some-other-branch
$ These migrations are applied to the database and are not on your code:
$ - core/migrations/0002_auto_20170618_1549.py
$ - RemoveField(model_name='profile', name='favorite_color')
$ Do you want me to resolve this issue? [y/n]
One day I stumbled upon this question: django migrations – workflow with multiple dev branches. It’s a Stack Overflow question asking whether there’s a git hook to deal with this sort of thing. I did find some solutions, but not quite what I was looking for (mainly this gist and this lib). Talking to other devs here at Cheesecake Labs, we thought this deserved some attention. My first try was actually to do something similar to what the question’s accepted answer suggests:
1. Just before switching [branch], you dump the list of currently applied migrations into a temporary file mybranch_database_state.txt
2. Then, you apply myfeature branch migrations, if any
3. Then, when checking back mybranch, you reapply your previous database state by looking to the dump file.
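A minimal sketch of what that dump-and-restore idea could look like, using only the standard library. The helper names (save_state, load_state, rollback_targets) and the JSON layout of the dump file are assumptions made for illustration, and the sketch assumes each app has a linear migration history listed in the order it was applied. The target it computes is what you would pass to python manage.py migrate <app> <target>, where zero unapplies every migration of the app.

```python
import json
import tempfile
from pathlib import Path


def save_state(path, applied):
    """Write {app: [migration names in applied order]} to a dump file."""
    Path(path).write_text(json.dumps(applied, indent=2))


def load_state(path):
    """Read back the mapping written by save_state."""
    return json.loads(Path(path).read_text())


def rollback_targets(saved, current):
    """Return {app: target} for apps that moved past their saved state.

    Assumes a linear history per app (hypothetical simplification);
    "zero" means every migration of that app has to be unapplied.
    """
    targets = {}
    for app, names in current.items():
        saved_names = saved.get(app, [])
        moved_ahead = len(names) > len(saved_names)
        same_prefix = names[:len(saved_names)] == saved_names
        if moved_ahead and same_prefix:
            targets[app] = saved_names[-1] if saved_names else "zero"
    return targets


def demo():
    # State dumped before leaving mybranch (made-up example data).
    saved = {"core": ["0001_initial"]}
    # State after applying the other branch's migrations.
    current = {
        "core": ["0001_initial", "0002_auto_20170618_1549"],
        "blog": ["0001_initial"],
    }
    with tempfile.TemporaryDirectory() as tmp:
        dump = Path(tmp) / "mybranch_database_state.txt"
        save_state(dump, saved)
        return rollback_targets(load_state(dump), current)


if __name__ == "__main__":
    for app, target in demo().items():
        print(f"python manage.py migrate {app} {target}")
```

Real projects can have branching migration graphs, so a production version would need to follow the dependency graph instead of comparing list prefixes.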
Git only has a post-checkout hook, so the first item on the list is out: nothing runs before the switch, and once the checkout has happened it’s no longer possible to know which operations the applied migrations performed. Django stores applied migrations in a database table called django_migrations, but all we have there is the migration name and app, without the operations it applied. We also wanted the hook to be a bit smarter: some migration operations don’t raise a ProgrammingError, and we wanted to pass this information on to the developer and avoid unnecessary rollbacks.
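To make the comparison concrete, here is a rough sketch that reads the app and name columns from a django_migrations table and lists the rows that have no matching file on disk. The in-memory SQLite database and the temporary directory are stand-ins for a real project, and both the function name find_missing_migrations and the flat <app>/migrations/<name>.py layout are assumptions for illustration, not how django-nomad is implemented.

```python
import sqlite3
import tempfile
from pathlib import Path


def find_missing_migrations(conn, project_root):
    """Return (app, name) pairs recorded as applied but absent from disk.

    Assumes each app keeps its migrations in <project_root>/<app>/migrations/.
    """
    rows = conn.execute(
        "SELECT app, name FROM django_migrations ORDER BY id"
    ).fetchall()
    missing = []
    for app, name in rows:
        path = Path(project_root) / app / "migrations" / f"{name}.py"
        if not path.exists():
            missing.append((app, name))
    return missing


def demo():
    # Stand-in for the real database, with the column names Django uses.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE django_migrations ("
        "id INTEGER PRIMARY KEY, app TEXT, name TEXT, applied TEXT)"
    )
    conn.executemany(
        "INSERT INTO django_migrations (app, name, applied) VALUES (?, ?, ?)",
        [
            ("core", "0001_initial", "2017-06-18"),
            ("core", "0002_auto_20170618_1549", "2017-06-18"),
        ],
    )
    with tempfile.TemporaryDirectory() as root:
        # Stand-in for the branch just checked out: only 0001 exists.
        migrations_dir = Path(root) / "core" / "migrations"
        migrations_dir.mkdir(parents=True)
        (migrations_dir / "0001_initial.py").write_text("")
        return find_missing_migrations(conn, root)


if __name__ == "__main__":
    for app, name in demo():
        print(f"- {app}/migrations/{name}.py")
```

Note that this only tells us which migrations are missing, not what they did. Migrations that ship inside installed packages (such as Django’s own auth or admin apps) live outside the project tree, so a real check would have to skip or locate them differently.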
To get the desired output mentioned above, we started off with this set of short-term goals:
After git checkout finishes, it calls the post-checkout hook, passing as arguments the two references (the previous HEAD and the new HEAD) and a flag indicating whether it was a branch checkout:
$ post-checkout <previous-ref> <target-ref> <is-branch-checkout>
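As a small sketch of how a hook written in Python could read those three arguments: Git passes the flag as "1" for a branch checkout and "0" for a file checkout. The function names below are hypothetical, and the decision to skip the check when both references are equal is an assumption, not something the hook is required to do.

```python
import sys


def parse_post_checkout_args(argv):
    """Split the three post-checkout arguments into named values."""
    if len(argv) != 3:
        raise ValueError(
            "expected <previous-ref> <target-ref> <is-branch-checkout>"
        )
    previous_ref, target_ref, flag = argv
    # Git passes "1" for a branch checkout and "0" for a file checkout.
    return previous_ref, target_ref, flag == "1"


def should_check_migrations(previous_ref, target_ref, is_branch_checkout):
    """Only bother when a branch checkout actually moved HEAD."""
    return is_branch_checkout and previous_ref != target_ref


def main(argv):
    try:
        previous_ref, target_ref, is_branch = parse_post_checkout_args(argv)
    except ValueError as error:
        # A hook should not blow up the checkout over bad arguments.
        print(f"post-checkout: {error}", file=sys.stderr)
        return
    if should_check_migrations(previous_ref, target_ref, is_branch):
        print(
            f"Checking migrations between {previous_ref[:7]} "
            f"and {target_ref[:7]}"
        )


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved as an executable .git/hooks/post-checkout file with a Python shebang line, a script like this would run after every checkout.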
And then our hook has to follow these steps:
We’ve created a library called django-nomad. So far, we have implemented steps 1 to 5 and a git-hook installer. The code is still in its early stages, but it’s open on GitHub and accepting suggestions and Pull Requests.
In the long term, we would like to resolve the issues mentioned above automatically. But this raises a couple of concerns: simply rolling back the previous migrations may cause data loss, and even though it’s only a development environment, that can be undesirable.
When there’s a problem, there are also possibilities (quoting my friend Jonatas on his post about a Serverless architecture):
These are still just ideas and have not yet been thoroughly evaluated for pros, cons or blockers. Anyway, they are a good start.
Thoughts? Suggestions? Are we doing something wrong? Help is always welcome. Thanks!
Settled down from travelling to build some good applications. Feels comfortable developing on both ends, though lately tends to the front-side of the moon.