Best Practice for DB Migrations that Affect Many Tables

Say you have a database of 50 tables. You are changing the wording of one of the columns, and through relationships that change affects 20 of those tables. How would you set up this migration? I see at least three possibilities:
1) a separate migration for the change on every table
2) a single migration for all of them
3) changing the initial declarations that create the tables
I'm quite confident 3 is the worst approach, because then nobody can simply migrate up; everyone would have to rebuild the entire schema. But I'm stuck between 1 and 2. 2 is probably the best approach, because you are creating one migration for one change that just so happens to affect a lot of tables. This is what I'm leaning towards. On the other hand, it feels very messy.
Any resources on this would be appreciated as well. Thanks

It makes more sense to go for option 2.
Say you take option 1 and write x separate migrations. You'll end up with x migrations that, taken individually, break your database's integrity, so you'll have to run them all together (or roll them all back together if you want to undo the changes). If all your changes need to be made together, it makes sense to put them in the same file.
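
A minimal sketch of option 2, assuming the change is a column rename. The table and column names are hypothetical. The helper accepts any object that responds to rename_column (the Active Record migration method), so inside a real migration you would pass self from the change method.

```ruby
# Hypothetical tables that all carry the column being renamed.
AFFECTED_TABLES = %i[orders invoices shipments].freeze

# Apply one logical change (a column rename) to every affected table.
# `migration` is any object responding to rename_column(table, old, new),
# such as an ActiveRecord::Migration instance.
def rename_column_everywhere(migration, tables, old_name, new_name)
  tables.each { |table| migration.rename_column(table, old_name, new_name) }
end

# Inside a migration file this could be called as:
#   def change
#     rename_column_everywhere(self, AFFECTED_TABLES, :client_id, :customer_id)
#   end
```

Because rename_column is one of the operations Active Record can reverse automatically inside change, a migration written this way should roll back all the renames together as well.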

Related

Rails: Why shouldn't I make changes directly in schema.rb instead of migrating?

I am using ruby on rails for an application. I am developing it on my local server as of now.
Every time I need to make a change to the database, I need to create a migration. Why can't I make the changes directly in schema.rb itself?
I am allowed to reset the database and reset all values in the tables. I came across a problem where I needed to change date format from "dateTime" to "timestamp". Now there are just too many fields to change. Why can't I just change them in schema.rb?

schema.rb is an automatically generated file; it is dumped from the current state of the database after you run a migration. Although I strongly discourage it, it is in fact possible to change it manually and then run rake db:schema:load to apply it to the database. However, you will lose all the benefits you get from migrations, and you'd be ignoring the convention.
So, what are the benefits, you ask? Just to name a few:
They can be rolled back when you make a mistake
They make it easier to handle multiple developers on a single project
They provide a place to clean up and move around data before/after applying the change
They give you a history of changes to the database schema
They reduce some of the boilerplate by mapping rails concepts such as polymorphic relationships to simple DSL commands so you don't need to think as much how the columns should be named and typed

First, migrations are easily reversible. Second, with migrations you have a history and, more importantly, an order in which things changed. Third, you can add additional code to migrations (for example, to calculate a value for a newly added column).
And I'm sure there are more benefits.

Migrations are key to ensuring that your dev, test and prod environments are all identical. If you start mucking around in the database manually, dev will very quickly look nothing like prod. In fact, it becomes extremely likely that you will begin doing your development in prod, which is a very bad idea!

Rails: How much work should a single migration do?

I have to add foreign keys to several different tables in our Rails app. Is it better for me to add all the keys in one migration or to make several single-purpose migrations, one for each table being altered?

Generally speaking, "several single-purpose migrations" is better. Make sure your migrations run both up AND down. On a side note, your migration filename should be descriptive enough for a third party to understand what the migration does.

When I do migrations I group changes based on common purpose. For instance if there are 3 changes that are for one feature of my app and 2 for another, I'll migrate those 3 together and then those 2 as another migration. That way in the future if I need to do a rollback, I'm only rolling back the changes that pertain to the area I'm working on.

I think whether to split the migrations is a question of what makes your code clearer. If adding multiple foreign keys is intended to be one task, do it in one migration. If your foreign keys come along with significant changes to the related models/controllers/views, it might be clearer to split them, so you can keep track of which migration belongs to which changes in the rest of your application.
When you are using git (or something similar), it might be helpful to bind corresponding changes in one commit.
From a performance point of view, it doesn't make a difference.
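
As a rough sketch of "one task, one migration" that still runs both up and down, the helpers below add a group of foreign keys and remove them again in reverse order. The table pairs are hypothetical; add_foreign_key and remove_foreign_key are the Active Record migration methods, and the migration argument is any object that responds to them (self inside a real migration).

```ruby
# Hypothetical foreign keys added for one task, as [from_table, to_table].
FOREIGN_KEYS = [
  %i[line_items orders],
  %i[payments orders]
].freeze

# Forward direction: add every key in the group.
def up_foreign_keys(migration, pairs = FOREIGN_KEYS)
  pairs.each { |from, to| migration.add_foreign_key(from, to) }
end

# Reverse direction: remove the same keys in the opposite order.
def down_foreign_keys(migration, pairs = FOREIGN_KEYS)
  pairs.reverse_each { |from, to| migration.remove_foreign_key(from, to) }
end

# Inside a migration file:
#   def up;   up_foreign_keys(self);   end
#   def down; down_foreign_keys(self); end
```

If a second, unrelated feature needs its own keys, a separate list and a separate migration keeps a later rollback limited to that feature.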

Active record migrations and refactoring

I'm in the midst of a fairly steep bit of refactoring on my current project. Before reaching this crossroads I had two models that I came to realize are really the same model in different states, and I want to represent the system that way. As a result I have to take all the objects of the soon-to-be-defunct model, move them into the other model, and set the new status column correctly. The problem is simple code-wise, especially since the models are so similar as is.
The pain point for me is that I have to make these changes at some midpoint in my migration in both directions. The path from here to there will be sort of like:
add_column :model_ones, :status, :string
make_all_model_two_records_into_model_one_records()
drop_table :model_twos
Clearly the other direction is easy to define as well
create_table :model_twos do |t|
...
end
move_model_ones_with_status_x_into_model_twos_table
remove_column :model_ones, :status
This is all good, but when I get to that magical moment when I remove ModelTwo.rb from my repo, the whole thing goes to pot. At that point I can never migrate from the ground up without that source. My reaction to that is either to write straight SQL to move the data back and forth or to take that data conversion out of the migration. If I take it out, where the heck does it go? How do I ensure it happens at the right time when migrating?
And let's say I surmount that aspect of the problem and I can now migrate happily from zero to present. I can NEVER migrate down, right? Does this represent some moment in time where the concept of staged migrations is simply dead to me?
I suppose I could go back and massage the earlier migrations to convince the world that ModelTwo never existed at all, but the thought of violating the sanctity of existing migrations makes my skin crawl.
People have to be doing this sort of refactor with Rails already somewhere. It has to be feasible, right? I can't figure out how to do it.
Thanks in advance,
jd

I would:
Create a migration that adds the status column
Run a rake task to move your data across
Test all the data moved correctly
Run another migration for removing the old table that isn't needed.
Sometimes you are going to need to alter old migrations to ensure you can build development environments easily. I am not sure why you think it's such a problem. The migrations are there to help you, not some magic rule you should feel obliged to abide by.
Sometimes you can get too hung up on best practice and forget that it's very hard to have "best practice rules" that apply to every situation. They are good as a guide but ultimately best practice is doing what is best for your project.
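
If the data move does stay inside a migration, the "straight SQL" route from the question avoids any dependency on ModelTwo.rb. A minimal sketch, assuming hypothetical column names and a hypothetical status value: the helper only builds the SQL string, which a migration could then pass to execute.

```ruby
# Build the SQL that copies every model_twos row into model_ones and tags
# it with a status. The column list and status value are hypothetical and
# are developer-supplied constants, not user input.
def copy_model_twos_sql(columns, status)
  cols = columns.join(", ")
  "INSERT INTO model_ones (#{cols}, status) " \
    "SELECT #{cols}, '#{status}' FROM model_twos"
end

# Inside the migration, between add_column and drop_table:
#   execute copy_model_twos_sql(%w[name created_at], "archived")
```

Since no model class is loaded, the migration keeps working after the ModelTwo source file is deleted.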

Best practice for maintaining migration files

Currently, for each table in my database, I add columns in several steps (i.e. I add columns by migrating new files on multiple occasions). This results in a large number of migration files (~50 or so?). This seems very un-DRY.
I end up with large "add_details_to" files mixed with single-entry "add_(column_name)_to" files, making it difficult to tell which file was used to migrate which column.
Is there a way to DRY up the migration files so that I have a single migration file for each table?
For example, if I add multiple columns in a single migration, then decide I want to remove one of those columns, what is the best practice?
1) create a down migration for the one column I want to remove
2) rollback the entire multiple-column migration, then create a new up migration with only the columns I want.
I currently follow 1, but it seems to me that 2 would allow me to get rid of my initial mistake migration files, thereby avoiding the lots-of-migration-files-for-each-table problem.
Any thoughts would be appreciated!

I think in general it's a good option to just let your migration files grow and just manage the growing requirements through tests. shoulda-matchers is a great tool for this.
I definitely do not like the idea of down migrations, especially after the up has been run on the server (with a few exceptions if the down is against the immediately preceding migration). I would rather create another migration to do what would have been done in the down. Though, I will admit there are times when down is the way to go.
But in the end this all depends on where you are in your app. If you are working on a feature locally and want to consolidate, I could see you doing that, running db:migrate:redo until you get what you need from your current migration. However, once you push something up (especially to production), I'd add another migration.

When (if) to consolidate ActiveRecord migrations?

As I move through the iterations on my application*(s) I accumulate migrations. As of just now there are 48 such files, spanning about 24 months' activity.
I'm considering taking my current schema.rb and making that the baseline.
I'm also considering deleting (subject to source control, of course) the existing migrations and creating a nice shiny new single migration from my current schema. Migrations tend to like symbols, but rake db:schema:dump uses strings: should I care?
Does that seem sensible?
If so, at what sort of interval would such an exercise make sense?
If not, why not?
And am I missing some (rake?) task that would do this for me?
* In my case, all apps are Rails-based, but anything that uses ActiveRecord migrations would seem to fit the question.

Yes, this makes sense. There is a practice of consolidating migrations. To do this, simply copy the current schema into a migration, and delete all the earlier migrations. Then you have fewer files to manage, and the tests can run faster. You need to be careful doing this, especially if you have migrations running automatically on production. I generally replace a migration that I know everyone has run with the new schema one.
Other people have slightly different ways to do this.
I generally haven't done this until we had over 100 migrations, but we can hit this after a few months of development. As the project matures, though, migrations come less and less often, so you may not have to do it again.
This does go against a best practice: Once you check in a migration to source control, don't alter it. I make a rare exception if there is a bug in one, but this is quite rare (1 in 100 maybe). The reason is that once they are out in the wild, some people may have run them. They are recorded as being completed in the db. If you change them and check in a new version, other people will not get the benefit of the change. You can ask people to roll back certain changes, and re-run them, but that defeats the purpose of the automation. Done often, it becomes a mess. It's better left alone.
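
The reason an edited migration does not re-run can be sketched with the bookkeeping itself: Rails records the version (the timestamp prefix of the filename) of each completed migration in the schema_migrations table and only runs versions that are not yet recorded. The function below is a simplified model of that comparison, not Rails' actual implementation, and the version strings are made up.

```ruby
# Return the versions that still need to run: those present as files
# but not yet recorded as completed in the database.
def pending_versions(file_versions, recorded_versions)
  (file_versions - recorded_versions).sort
end

files    = %w[20240101000000 20240201000000 20240301000000]
recorded = %w[20240101000000 20240201000000]

# Editing the body of 20240201000000 changes nothing for this database:
# its version is already recorded, so only 20240301000000 is pending.
pending = pending_versions(files, recorded)
```

This is also why replacing old migrations with a consolidated one is safest when everyone has already run the migration being replaced.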

I think that there are two kinds of migrations:
those you made during design/development, because you changed your mind about what your db should look like;
those you made between releases, reflecting some behaviour changes.
I get rid of the first kind of migrations as soon as I can, as they do not really represent working releases, and keep the second kind, so that it is possible, in theory, to update the app.
About symbols vs strings: many argue that only strings should be used in migrations: symbols are meant to be "handles" to objects, and should not be used to represent names (column and table names, in this case). This is a mere stylistic consideration, but it convinced me, and I no longer use symbols in migrations.
I've read of another argument for using strings: "Ruby symbols are memory leaks", meaning that, when you create a symbol, it is never disposed of for the whole lifetime of the application. This seems quite pointless to me, as all your db columns will be used as symbols in a Rails (and ActiveRecord) app anyway; the migration task will not run forever either, so I don't think this point actually makes sense.

The top of schema.rb declares:
# This file is auto-generated from the current state of the database. Instead of editing this file,
# please use the migrations feature of Active Record to incrementally modify your database, and
# then regenerate this schema definition.
#
# Note that this schema.rb definition is the authoritative source for your database schema. If you need
# to create the application database on another system, you should be using db:schema:load, not running
# all the migrations from scratch. The latter is a flawed and unsustainable approach (the more migrations
# you'll amass, the slower it'll run and the greater likelihood for issues).
#
# It's strongly recommended to check this file into your version control system.
I must endorse what [giorgian] said above about different migrations for different purposes. I recommend cleaning up development-oriented migrations along with the other tasks you do when you branch for a release. That works well for me, both on my own and in small teams. Of course, my main app sits atop and between two other databases with their own schemas, which I have to be careful of, so we use migrations (rather than a schema restore) for a new install, and those need to survive release engineering.

Having lots of migrations is a good thing. Combined with your version control system, they allow you to see which developer made a change to the database and why. This helps with accountability. Removing them just makes this a big hassle.
If you really want to get a new database up and running quickly, you can just load the schema with rake db:schema:load RAILS_ENV=your_environment, and if you want to get your test database set up quickly, you can just use rake db:test:prepare.
That being said, if you really want to consolidate your migrations then I'd create a new migration that checks to see if the very last migration in your set has been performed (ex: does the column you added exist?) and if not, then it will fire. Otherwise the migration will just add itself to the schema table as completed so it doesn't attempt to fire again.
Just communicate what you're doing to the rest of your team so that they understand what is going on lest they blindly fire off a rake db:migrate and screw up something they already had.
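
The guarded consolidation described above could look roughly like this. It is a sketch under assumptions: the marker table and column stand for whatever the very last old migration added, and the migration argument is any object that responds to table_exists? and column_exists? (both are available inside Active Record migrations).

```ruby
# Run the consolidated schema only on databases that never ran the old
# migration set. The marker is the column added by the last old migration.
def apply_baseline(migration, marker_table, marker_column)
  return :skipped if migration.table_exists?(marker_table) &&
                     migration.column_exists?(marker_table, marker_column)

  yield migration # create the full schema here
  :applied
end

# Inside the consolidated migration (names are hypothetical):
#   def up
#     apply_baseline(self, :users, :nickname) do |m|
#       m.create_table(:users) { |t| t.string :nickname }
#     end
#   end
```

Either way the migration finishes normally, so its version gets recorded as completed and it does not attempt to fire again.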

Although I'm sure everyone has their own practices, there are a few rules implied by the way the migration system works:
Never commit changes to migrations that may have been used by other developers or previous deployments. Instead, make an additional migration to adjust things as required.
Never put model-level dependencies in a migration. The model may be renamed or deleted at some point in the future and this would prevent the migration. Keep the migration as self-contained as possible, even if that means it's quite simplistic and low-level.
Of course there are exceptions. For example, if a migration doesn't work, for whatever reason, a patch may be required to bring it up to date. Even then, though, the nature of the changes effected by the migration shouldn't change, though the implementation of them may.
Any mature Rails project will likely have around 200 to 1000 migrations. In my experience it is unusual to see a project with fewer than 30 except in the planning stages. Each model, after all, typically needs its own migration file.
Collapsing multiple migrations into a single one is a bad habit to get into when working on an evolving piece of software. You probably don't collapse your source control history, so why worry about database schema history?
The only occasion I can see it as being reasonably practical is if you're forking an old project to create a new version or spin-off and don't want to have to carry forward with an extraordinary number of migrations.

You shouldn't be deleting migrations. Why create the extra work?
Migrations essentially are a set of instructions that define how to build the database to support your application. As you build your application the migrations record the iterative changes you make to the database.
IMHO by resetting the baseline periodically you are making changes that have the potential to introduce bugs/issues with your application, creating extra work.
In the case where a column is mistakenly added and then needs to be removed sometime later, just create a new migration to remove the extra column. My main reason for this is that when working in a team you don't want your colleagues to have to keep rebuilding their databases from scratch. With this simple approach you (and they) can carry on working in an iterative manner.
As an aside - when building a new database from scratch (without any data), migrations tend to run very quickly. A project I am currently working on has 177 migrations, and this causes no problems when building a new database.
