When (if) to consolidate ActiveRecord migrations? - ruby-on-rails

As I move through the iterations on my application*(s) I accumulate migrations. As of just now there are 48 such files, spanning about 24 months' activity.
I'm considering taking my current schema.rb and making that the baseline.
I'm also considering deleting (subject to source control, of course) the existing migrations and creating a nice shiny new single migration from my current schema. Migrations tend to like symbols, but rake db:schema:dump uses strings: should I care?
Does that seem sensible?
If so, at what sort of interval would such an exercise make sense?
If not, why not?
And am I missing some (rake?) task that would do this for me?
* In my case, all apps are Rails-based, but anything that uses ActiveRecord migrations would seem to fit the question.

Yes, this makes sense. There is a practice of consolidating migrations. To do this, simply copy the current schema into a migration, and delete all the earlier migrations. Then you have fewer files to manage, and the tests can run faster. You need to be careful doing this, especially if you have migrations running automatically on production. I generally replace a migration that I know everyone has run with the new schema one.
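As a rough illustration of why a consolidated migration can stand in for the old ones on a fresh database, here is a toy model in plain Ruby. It is not ActiveRecord code: HISTORY, replay and consolidate are hypothetical names invented for this sketch. It replays a list of add/remove column steps, derives a shorter baseline from the final state, and checks that both routes end at the same schema.

```ruby
# Toy model only; these names are assumptions, not part of Rails.
# Each step is [operation, table, column].
HISTORY = [
  [:add, "posts", "title"],
  [:add, "posts", "body"],
  [:add, "posts", "summary"],
  [:remove, "posts", "summary"], # back-and-forth change made during development
  [:add, "users", "email"]
].freeze

# Apply every step in order and return a hash of table name => column names.
def replay(steps, schema = {})
  steps.each_with_object(schema) do |(op, table, column), result|
    columns = (result[table] ||= [])
    if op == :add
      columns << column
    else
      columns.delete(column)
    end
  end
end

# Build a baseline containing only the steps needed to reach the final state.
def consolidate(steps)
  replay(steps).flat_map do |table, columns|
    columns.map { |column| [:add, table, column] }
  end
end

puts consolidate(HISTORY).length                       # prints 3
puts replay(consolidate(HISTORY)) == replay(HISTORY)   # prints true
```

The toy only covers a database built from scratch; a database that has already run some of the old steps is the case that needs the care described here.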
Other people have slightly different ways to do this.
I generally haven't done this until we had over 100 migrations, but we can hit this after a few months of development. As the project matures, though, migrations come less and less often, so you may not have to do it again.
This does go against a best practice: Once you check in a migration to source control, don't alter it. I make a rare exception if there is a bug in one, but this is quite rare (1 in 100 maybe). The reason is that once they are out in the wild, some people may have run them. They are recorded as being completed in the db. If you change them and check in a new version, other people will not get the benefit of the change. You can ask people to roll back certain changes, and re-run them, but that defeats the purpose of the automation. Done often, it becomes a mess. It's better left alone.
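The "recorded as being completed" point can be sketched with a toy runner in plain Ruby. This is a simplified model under my own assumptions, not ActiveRecord's implementation; TrackedMigration, TrackedDatabase and run_pending are hypothetical names. The runner skips any version it has already recorded, so an in-place fix never reaches a database that ran the buggy version.

```ruby
# Toy model of migration bookkeeping; names are assumptions for illustration.
TrackedMigration = Struct.new(:version, :action)

class TrackedDatabase
  attr_reader :columns, :applied_versions

  def initialize
    @columns = []
    @applied_versions = []
  end
end

# Run every migration whose version has not been recorded yet, in version order.
def run_pending(db, migrations)
  migrations.sort_by(&:version).each do |migration|
    next if db.applied_versions.include?(migration.version)

    migration.action.call(db)
    db.applied_versions << migration.version
  end
  db
end

buggy = [TrackedMigration.new(1, ->(db) { db.columns << "nmae" })] # typo checked in
fixed = [TrackedMigration.new(1, ->(db) { db.columns << "name" })] # same version, edited in place

alice = run_pending(TrackedDatabase.new, buggy) # ran the buggy version earlier
run_pending(alice, fixed)                       # version 1 is recorded, so the fix is skipped
bob = run_pending(TrackedDatabase.new, fixed)   # a fresh database gets the fix

puts alice.columns.inspect # prints ["nmae"]
puts bob.columns.inspect   # prints ["name"]
```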

I think that there are two kinds of migrations:
those you made during design/development, because you changed your mind about what your db should look like;
those you made between releases, reflecting some behaviour changes.
I get rid of the first kind of migrations as soon as I can, as they do not really represent working releases, and keep the second kind, so that it is possible, in theory, to update the app.
About symbols vs strings: many argue that only strings should be used in migrations: symbols are meant to be "handles" to objects, and should not be used to represent names (column and table names, in this case). This is a purely stylistic consideration, but it convinced me, and I no longer use symbols in migrations.
I've read of another argument for using strings: "ruby symbols are memory leaks", meaning that once you create a symbol, it is never disposed of for the lifetime of the application. This seems quite pointless to me, as all your db columns will be used as symbols in a Rails (and ActiveRecord) app anyway; the migration task will not run forever either, so I don't think that this point actually makes sense.

The top of schema.rb declares:
# This file is auto-generated from the current state of the database. Instead of editing this file,
# please use the migrations feature of Active Record to incrementally modify your database, and
# then regenerate this schema definition.
#
# Note that this schema.rb definition is the authoritative source for your database schema. If you need
# to create the application database on another system, you should be using db:schema:load, not running
# all the migrations from scratch. The latter is a flawed and unsustainable approach (the more migrations
# you'll amass, the slower it'll run and the greater likelihood for issues).
#
# It's strongly recommended to check this file into your version control system.
I must endorse what [giorgian] said above about different migrations for different purposes. I recommend cleaning up development-oriented migrations along with the other tasks you do when you branch for a release. That works well for me, both on my own and in small teams. Of course, my main app sits atop and between two other databases with their own schemas, which I have to be careful of, so we use migrations (rather than a schema restore) for a new install, and those need to survive release engineering.

Having lots of migrations is a good thing. Combined with your version control system, they allow you to see which developer made a change to the database and why. This helps with accountability. Removing them just makes this a big hassle.
If you really want to get a new database up and running quickly, you can just load the schema with rake db:schema:load RAILS_ENV=your_environment, and if you want to get your test database set up quickly, you can just use rake db:test:prepare
That being said, if you really want to consolidate your migrations then I'd create a new migration that checks to see if the very last migration in your set has been performed (ex: does the column you added exist?) and if not, then it will fire. Otherwise the migration will just add itself to the schema table as completed so it doesn't attempt to fire again.
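A minimal sketch of that guard, written as a toy model in plain Ruby rather than as a real migration. ToySchema and apply_baseline are hypothetical names, and the tables and columns are made up; in an actual migration the equivalent check would presumably be ActiveRecord's column_exists? helper. The baseline builds the schema only when the marker column from the last old migration is absent.

```ruby
require "set"

# Toy schema: table name => set of column names. Not ActiveRecord.
class ToySchema
  def initialize(tables = {})
    @tables = tables.transform_values { |columns| Set.new(columns) }
  end

  def column_exists?(table, column)
    @tables.key?(table) && @tables[table].include?(column)
  end

  def create_table(table, columns)
    @tables[table] = Set.new(columns)
  end

  def tables
    @tables.keys.sort
  end
end

# Consolidated baseline: "published_at" stands in for the column added by the
# very last of the old migrations (an assumption made for this example).
def apply_baseline(schema)
  return :skipped if schema.column_exists?("posts", "published_at")

  schema.create_table("users", %w[id email])
  schema.create_table("posts", %w[id user_id title published_at])
  :created
end

fresh = ToySchema.new
existing = ToySchema.new("posts" => %w[id title published_at])

puts apply_baseline(fresh)    # prints created
puts apply_baseline(existing) # prints skipped
```

Either way the step finishes without error, so a runner would record it as completed and not try it again.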
Just communicate what you're doing to the rest of your team so that they understand what is going on lest they blindly fire off a rake db:migrate and screw up something they already had.

Although I'm sure everyone has their own practices, there are a few rules implied by the way the migration system works:
Never commit changes to migrations that may have been used by other developers or previous deployments. Instead, make an additional migration to adjust things as required.
Never put model-level dependencies in a migration. The model may be renamed or deleted at some point in the future and this would prevent the migration. Keep the migration as self-contained as possible, even if that means it's quite simplistic and low-level.
Of course there are exceptions. For example, if a migration doesn't work, for whatever reason, a patch may be required to bring it up to date. Even then, though, the nature of the changes effected by the migration shouldn't change, though the implementation of them may.
Any mature Rails project will likely have around 200 to 1000 migrations. In my experience it is unusual to see a project with fewer than 30 except in the planning stages. Each model, after all, typically needs its own migration file.
Collapsing multiple migrations into a single one is a bad habit to get into when working on an evolving piece of software. You probably don't collapse your source control history, so why worry about database schema history?
The only occasion I can see it as being reasonably practical is if you're forking an old project to create a new version or spin-off and don't want to have to carry forward with an extraordinary number of migrations.

You shouldn't be deleting migrations. Why create the extra work?
Migrations essentially are a set of instructions that define how to build the database to support your application. As you build your application the migrations record the iterative changes you make to the database.
IMHO by resetting the baseline periodically you are making changes that have the potential to introduce bugs/issues with your application, creating extra work.
In the case where a column is mistakenly added and then needs to be removed sometime later, just create a new migration to remove the extra column. My main reason for this is that when working in a team you don't want your colleagues to have to keep rebuilding their databases from scratch. With this simple approach you (and they) can carry on working in an iterative manner.
As an aside: when building a new database from scratch (without any data), migrations tend to run very quickly. A project I am currently working on has 177 migrations, and this causes no problems when building a new database.

Related

When is a "down" migration useful?

Short version:
What's up with down migrations? In what scenario(s) would you need to use one?
Long version:
Many frameworks that support database schema versions (including, not limited to, Rails' "migrations") allow the developer to specify how data upgrades (an up operation) can be reversed (aka down), or even automatically generate a downgrade operation by analysing the code (as in Rails' change method).
Indeed, it seems extremely common to do so in all the Rails migration code I've come across, which makes me wonder if it's considered a best practice.
Personally, I've never needed to downgrade a database schema, and I can't imagine a reasonable scenario where I would want to, either in dev or production. My experience seems at odds with the prevalence of down migrations, so I'm guessing I'm missing something...
What are the most common scenarios where a down is useful?
Say you've pushed a new version to production and run the migrations, and some time later you find a bug that can't be solved immediately. Since you need to keep the production server running, you revert to a previous commit. However, this wouldn't revert the changes made to the database, which would result in errors. So you'd need a way to roll back the changes made to the database, and then revert to the older version. This situation may not arise too frequently, but it's important to have the mechanism for when it does.
In development, you might run some migrations and later decide that you want to change, add, or remove something. Having a way to undo them means you don't need to create a new migration for every small change.
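Both scenarios come down to the same mechanism, which can be sketched as a toy model in plain Ruby. This is not how Rails implements it; UpDownStep and RollbackDb are hypothetical names. Each step carries an up and a down action, and rolling back runs the down action of the most recently applied version and forgets that version.

```ruby
# Toy model of up/down migrations; names are assumptions for illustration.
UpDownStep = Struct.new(:version, :up, :down)

class RollbackDb
  attr_reader :columns, :versions

  def initialize
    @columns = []
    @versions = []
  end

  # Apply every step that has not been recorded yet, in version order.
  def migrate(steps)
    steps.sort_by(&:version).each do |step|
      next if versions.include?(step.version)

      step.up.call(columns)
      versions << step.version
    end
    self
  end

  # Undo the most recently applied step, if there is one.
  def rollback(steps)
    last = steps.find { |step| step.version == versions.max }
    return self unless last

    last.down.call(columns)
    versions.delete(last.version)
    self
  end
end

steps = [
  UpDownStep.new(1, ->(cols) { cols << "title" },  ->(cols) { cols.delete("title") }),
  UpDownStep.new(2, ->(cols) { cols << "rating" }, ->(cols) { cols.delete("rating") })
]

db = RollbackDb.new.migrate(steps)
puts db.columns.inspect # prints ["title", "rating"]
db.rollback(steps)
puts db.columns.inspect # prints ["title"]
```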
Two reasons from practice:
code refactoring in the development environment: we can revert the database schema, refactor the model and migration, and then migrate up again;
rolling back to the previous application version after shipping a "lemon" to production.
Missing "down" migrations means that we don't really have migrations at all, and we are back in the early '90s, restoring the database from a backup or painfully modifying the table(s) by hand and hoping that everything goes well.
Basically, Rails can detect the inverse operation when you create migrations like
add_column => remove_column
create_table => drop_table
But for some kinds of migrations you can't really use change to downgrade the db schema. For example, if you want to change the precision of a decimal column, it is not possible to guess the original precision on rollback. So you need to define the down method in this case.
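The difference can be sketched with a toy inverse table in plain Ruby. This only illustrates the idea and is not how Rails implements change; INVERSES, invert and IrreversibleStep are hypothetical names. Operations whose inverse follows from their own arguments get a rule, and anything else raises, which is the point where a hand-written down would be needed.

```ruby
# Toy model of inverse inference; names are assumptions for illustration.
class IrreversibleStep < StandardError; end

# Only operations whose inverse can be derived from their arguments are listed.
INVERSES = {
  add_column: ->(table, column, *_rest) { [:remove_column, table, column] },
  create_table: ->(table, *_rest) { [:drop_table, table] }
}.freeze

# Return the step that undoes the given step, or raise if it cannot be inferred.
def invert(step)
  name, *args = step
  rule = INVERSES.fetch(name) do
    raise IrreversibleStep, "cannot infer the inverse of #{name}"
  end
  rule.call(*args)
end

puts invert([:add_column, "products", "price", :decimal]).inspect
puts invert([:create_table, "products"]).inspect

begin
  # The new precision is in the step, but the old one is not recorded anywhere.
  invert([:change_column, "products", "price", :decimal, { precision: 10 }])
rescue IrreversibleStep => e
  puts e.message
end
```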

Changing schema incrementally with "monolithic" migrations

I'm in a development environment, and though I understand migrations, I'm beginning to want (just because it seems more fun) to define my changes in a monolithic schema file that shows what I want the whole schema to be (not schema.rb, but maybe a migration file) instead of creating small, incremental changes through migrations. The only approach I can think of would be to write migrations as normal, but have every migration drop all tables and re-create them the way I want them. Is this madness?
I've had a look at the rails guide Active Record Migrations and I've done some searching around. It seems like I'm going against the grain and should just define incremental migrations. I should not be messing with schema.rb and then using rake db:schema:load because that's intended for deployment only, correct?
This is madness. Don't go against the grain >:D
Seriously though, why would you want that when you could use git to see a snapshot of your schema.rb at any given time?
As an aside, what we typically do (we as in every company I've worked for so far) is delete all of our old migrations after a certain point. For example, in March of this year, I wiped out all migrations older than January 2015.

Rails: Why shouldn't I make changes directly in schema.rb rather than use migrations?

I am using ruby on rails for an application. I am developing it on my local server as of now.
Every time I need to make a change to the database, I need to create a migration. Why can't I directly make changes in schema.rb itself?
I am allowed to reset the database and reset all values in the tables. I came across a problem where I needed to change date format from "dateTime" to "timestamp". Now there are just too many fields to change. Why can't I just change them in schema.rb?
schema.rb is an automatically generated file that is dumped from the current state of the database after you run a migration. Although I strongly discourage it, it is in fact possible to change it manually, then run rake db:schema:load to apply it to the database. However, you will lose all the benefits you get from migrations, and you'd be ignoring the convention.
So, what are the benefits, you ask? Just to name a few:
They can be rolled back when you made a mistake
They make it easier to handle multiple developers on a single project
They provide a place to clean up and move around data before/after applying the change
They give you a history of changes to the database schema
They reduce some of the boilerplate by mapping rails concepts such as polymorphic relationships to simple DSL commands so you don't need to think as much how the columns should be named and typed
First, migrations are easily reversible. Second, with migrations you have a history and, more importantly, an order in which things changed. Third, you can add additional code to migrations (for example, to calculate some value for a newly added column).
And I'm sure there are more benefits.
Migrations are key to ensuring that your dev, test and prod environments are all identical. If you start mucking around in the database manually dev will very quickly look nothing like prod... In fact, it becomes extremely likely that you will begin doing your development in prod which is a very bad idea!

is it acceptable to modify migrations

I've joined a Rails project which has been going for the past two months. I saw that the developers are modifying existing migrations to change column types/names. When I run the migrations nothing happens, and I get random errors like "method not found". When I debug and check the database, I find that the field names are different, and that's the reason for the errors.
As per my understanding for each modification of database, we need to create new migration.
Is this behavior of modifying existing migrations acceptable?
Generally, once you check in a migration it should not be modified. For one project I was a part of, we reset the database on a regular basis and so people would modify migrations from time to time rather than create new ones, but it's not a practice I'm fond of.
So, if you want the default Rails migration behavior to work, no, don't change a migration that someone may have already used. However, if you are okay with working around the default behavior, it doesn't matter. Note: if you're running migrations on production to keep the database up to date, it's important that you never change a migration that has already been run on production!
It's all about good style. Of course you can change migrations, but this is bad style. According to the guidelines, you should make a new migration for each change.
In my opinion this is not acceptable. Here are some reasons when and why it could be acceptable or not:
I'm developing 2 rails applications for myself only, and I am the only developer. When I have made an error in the migration, I sometimes just fix it, rollback the migration and redo it after fixing the error. This only works because
I know that I am the only one.
I do it right after the erroneous change
I know how and when to first rollback and then redo it.
In all other cases, it is not acceptable:
More than one developer.
Integrated by version control system like Subversion, Git, ...
Independent development done in different rooms or even locations.
So I think you have the hard job to change the behavior of others. Good luck with that!

collapse previous rails migrations

Is it possible to create a single migration from all the previous ones, so that it would have the effect of rake db:schema:load? I have many migrations that are useless (going back and forth between models).
You could take the code from db/schema.rb that gets generated, and make a migration out of it - deleting the old migrations.
However, I recommend that you not do that.
You should keep your original "messy" migrations: they represent versions of your db schema in sync with your source code versioning. There is no value in consolidating other than the perceived cleanliness of the code. More than that, it's actually a loss of value, since you lose some of your code history, and history is meaningful when someone analyses the code, for example while debugging.
It depends on your project. If you're the only person on it, and you know you can just drop some migrations, then it's probably ok to do it.
But that's only the case when you have something like create_posts and then remove_posts a little while later.
Anyway, I'd advise against this, since migrations are kind of version control management for the database, especially if it's a multiple person project. It's kind of like trying to merge old commits for the sake of having your git log cleaner. In some cases it might be ok, but it can cause a lot more trouble than it's worth.
