Rails - authoritative source for your database schema? - ruby-on-rails

I have a Rails app, and every once in a while, when I bring a new developer on board, they exclaim that they should be able to produce the current DB schema in their dev environment by running the whole history of migrations. I personally don't think that migrations are the authoritative source for your schema. Right now what we do is load a production copy of the DB, with the current schema, onto the dev machine. From there, the schema can be maintained via incremental migrations.
So my questions are:
What is the authoritative source of your schema on a Rails project?
What is now considered the best-practice way to maintain your DB schema?

I do not consider migrations to be the authoritative source for your schema. Migrations are extremely powerful but optional. Some developers use alternative workflows, especially in environments where DBAs insist on strong referential integrity and DBMS-enforced constraints. I suggest looking at the official RoR Guide on Migrations for more information. The db/schema.rb (or db/{env}_structure.sql) file is the authoritative source for your schema. Many developers purge old migrations as projects get older, so running each migration will not necessarily produce a working database. It also takes a long time to run through hundreds of migrations. Rails uses schema.rb (or the SQL dump file) to build the test database and, of course, when running rake db:setup, which is the recommended way of creating a new database for your application.
The bottom line is that rake db:setup should always produce a working database regardless of migrations. Developers can use this to create new environments, and Rails uses this to run your tests.
http://guides.rubyonrails.org/migrations.html#schema-dumps-and-source-control

Normally, running the succession of all migrations should produce your actual DB schema (if it's not the case, then you didn't use your migrations correctly*).
Another way to do it is to copy over the schema.rb (created/updated when you migrate), which is used by rake db:setup and should produce an exact copy of the schema you have in production (unless, again, you didn't use migrations correctly*).
Then, if you need "sample data", you can insert it using the db/seeds.rb file, which contains ruby code that can access your models, and thus create and persist new entities & so on...
*: There are cases where you can't put all your database changes in migrations in a "usual" way (it is uncommon and should be avoided if possible)... These should be included in migrations however (in plain SQL execution statements), or the changes would need to be made manually on the dev DB as well... And then using a snapshot of prod. is sometimes more convenient. But again, I would discourage doing so.

Related

How can I convert all migration files into a single file in Rails?

I have been developing a project in Ruby on Rails. During the development I have generated tons of migration files for my project. Sometimes I have added and deleted columns from different tables.
Is there a way that I could consolidate all the migrations from multiple files into a single file?
TL;DR
What you need isn't a consolidated set of migrations; it's a single schema file and an optional seeds.rb file. Rails generally maintains the schema automagically when you run migrations, so you already have most of what you should need with the possible exception of seed data as described below.
Use the Schema, Not Migrations
In general, you shouldn't be maintaining a large pool of migrations. Instead, you should periodically clear out your migrations, and use schema.rb or structure.sql to (re)create a database. The Rails guides specifically state:
There is no need (and it is error prone) to deploy a new instance of an app by replaying the entire migration history. It is much simpler and faster to just load into the database a description of the current schema...Because schema dumps are the authoritative source for your database schema, it is strongly recommended that you check them into source control.
You should therefore be using bin/rails db:schema:load rather than running migrations, or run the associated Rake task on older versions of Rails.
Data Migrations
While you can use migrations to fix or munge data related to a recent schema change, data migrations (if used at all) should be temporary artifacts. Data migrations are almost never idempotent, so you shouldn't be maintaining data migrations long-term. The guide says:
Some people use migrations to add data to the database...However, Rails has a 'seeds' feature that should be used for seeding a database with initial data. It's a really simple feature: just fill up db/seeds.rb with some Ruby code, and run rake db:seed...This is generally a much cleaner way to set up the database of a blank application.
Database seed data should be loaded with bin/rails db:seed (or the associated Rake task) rather than maintaining the data in migrations.
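To make the idempotency point concrete, here is a minimal plain-Ruby sketch. It is not Rails code: the array stands in for a database table, and insert_role and find_or_create_role are hypothetical helper names invented for this illustration. In a real db/seeds.rb the same idea is usually expressed with Active Record's find_or_create_by.

```ruby
# Blind insert: adds a row every time it is called, so running it
# twice leaves duplicate data behind (not idempotent).
def insert_role(table, name)
  row = { name: name }
  table << row
  row
end

# Find-or-create: returns the existing row if there is one and only
# inserts when it is missing, so repeated runs change nothing.
def find_or_create_role(table, name)
  table.find { |r| r[:name] == name } || insert_role(table, name)
end

blind = []
2.times { insert_role(blind, "admin") }

careful = []
2.times { find_or_create_role(careful, "admin") }

puts "blind insert run twice:   #{blind.size} rows"
puts "find-or-create run twice: #{careful.size} rows"
```

Seed code written in the second style can be re-run safely on a database that already contains the data.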
There is a gem that purports to do exactly what you describe in the question - check out Squasher.
From the README:
"Squasher compresses old migrations in a Rails application. ... Squasher removes all the migrations and creates a single migration with the final database state of the specified date (the new migration will look like a schema)."
You'll have to do the merge manually.
But if you want only a single file, there is db/schema.rb. It contains a snapshot of current database schema. You can load it directly in database if you want.

Why is db:reset different from running all migrations?

In section Rails Database Migrations of Ruby on Rails Guides, there is one line saying that
The db:reset task will drop the database, recreate it and load the current
schema into it. This is not the same as running all the migrations.
Can anyone tell me where exactly they are different and why it is more error prone to replay the migration history?
I'm fairly new to Ruby on Rails. Thanks in advance.
The schema file contains the current structure of your database. When you load it, you are guaranteed to have the exact schema in your db that is in the file. Migrations were designed to make incremental changes in the database. You may add a table, then some columns, and then remove the table in three separate migrations. There's no need to go through all this when the schema already knows that the table no longer exists.
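As a rough illustration of that difference, the following plain-Ruby sketch models a schema as a Hash and each migration as a lambda. None of this is Rails API; MIGRATIONS, SNAPSHOT, replay and load_snapshot are names made up for the example, with SNAPSHOT playing the role of the schema file.

```ruby
# Toy model: a schema is a Hash of table name => array of column names.
# Each "migration" is a lambda that mutates the schema in place.
MIGRATIONS = [
  ->(s) { s["widgets"] = ["id"] },             # 1. add a table
  ->(s) { s["widgets"] << "name" << "price" }, # 2. add some columns
  ->(s) { s.delete("widgets") },               # 3. remove the table again
  ->(s) { s["users"] = ["id", "email"] }       # 4. add a table that survives
].freeze

# Replaying history: every step runs, including the ones that cancel out.
def replay(migrations)
  schema = {}
  steps = 0
  migrations.each do |m|
    m.call(schema)
    steps += 1
  end
  [schema, steps]
end

# Snapshot of the final state, standing in for the schema file.
SNAPSHOT = { "users" => ["id", "email"] }.freeze

# Loading the snapshot: one step, no intermediate states.
def load_snapshot(snapshot)
  snapshot.transform_values(&:dup)
end

replayed, steps = replay(MIGRATIONS)
loaded = load_snapshot(SNAPSHOT)

puts "replayed in #{steps} steps: #{replayed.inspect}"
puts "loaded in 1 step: #{loaded.inspect}"
puts "same result? #{replayed == loaded}"
```

Both paths end in the same structure here, but the replay had to create and drop a table that the snapshot never mentions.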
On why they are error prone, I'm not totally sure. The one thing I can think of is that migrations can be used to make changes to data and not just the structure.
Running rake db:reset will rebuild the structure of your database from schema.rb, which essentially works as a cached version of your migrated database structure. Running all your migrations, on the other hand, applies the migrations one by one, which may include arbitrary code to accommodate changes to the database (e.g. prepopulating an added counter cache column).
It can be more error prone to replay the migration history, since it is the product of changes to both the structure and data of the database. If the developers haven't been careful, it might not apply cleanly to a fresh environment (e.g. the migration assumes an old version of a model). On the other hand, schema.rb can get out of sync if you edit a migration once you've migrated (a useful trick to avoid migration explosion during development). In that case, you need to run rake db:migrate:reset.

Aggregate migrations in Rails

I have several dozen Rails DB migrations which were written over a year. Is there a way to aggregate them into one migration so that I will just see a full DDL statement for the database as it exists now? I just need the current snapshot without all the history of how we got to it.
It is possible, but probably not a good idea to aggregate the migrations!
Maybe ask:
Why do you want to do this?
How often do you really need to migrate all the way to VERSION=0 and then back up again?
Is something really broken? (if not, then don't fix it)
I've had the same problem once. I ended up just re-ordering my migrations, because changes in the schema caused it to no longer migrate up/down correctly. I would be hesitant to do that again.
If you have migrations which just add fields or indexes, then maybe you can combine them with the main migration for the model -- but beware that you can't reproduce old situations anymore, e.g. older DB-dumps may not be compatible with what migration number they should be compatible with -- that is probably the biggest argument against aggregating...
Technically, you can dump the schema and then load it directly - that is one way:
rake db:schema:dump
then create a single new migration with the contents of the schema dump file db/schema.rb
Here are some similar questions:
Rebase Rails migrations in a long running project
Deleting/"Rebasing" rails migrations
Way to "flatten" Rails migrations?
Should I flatten Rails migrations?
P.S.: I found it useful to stick with the old migration numbering scheme, where the migrations do not use timestamps - for me this works better (is easier to see in which order they are).
e.g. in your config/application.rb file:
config.active_record.timestamped_migrations = false
You should never be using all the migrations to get a database up and running. The current schema.rb is always what the DB looks like 'presently'.
It's good practice to periodically just truncate your migrations if you have a ton of them in there. We finally did that with one of our larger applications, removing a good 50 migrations from the folder because the only thing that matters is schema.rb. Migrations are just that, a way to migrate and make changes to an existing state of the database. They should only ever have to be run once.
You can simply load the current schema into the DB.
rake db:schema:load RAILS_ENV=[production, test, etc.]
This will take the schema.rb file's version of the schema, and load it into the DB without running individual migrations.
NOTE: if you have migrations that put data into the DB (e.g. default values, for example), that data will not be added to the DB.
If you need to load default values into your DB, that might be better done via a custom rake task, independent of migrations.

rake db:schema:load vs. migrations

Very simple question here - if migrations can get slow and cumbersome as an app gets more complex and if we have the much cleaner rake db:schema:load to call instead, why do migrations exist at all?
If the answer to the above is that migrations are used for version control (a stepwise record of changes to the database), then as an app gets more complex and rake db:schema:load is used more instead, do they continue to maintain their primary function?
Caution:
From the answers to this question: rake db:schema:load will delete data on a production server so be careful when using it.
Migrations provide forward and backward step changes to the database. In a production environment, incremental changes must be made to the database during deploys: migrations provide this functionality with a rollback failsafe. If you run rake db:schema:load on a production server, you'll end up deleting all your production data. This is a dangerous habit to get into.
That being said, I believe it is a decent practice to occasionally "collapse" migrations. This entails deleting old migrations, replacing them with a single migration (very similar to your schema.rb file) and updating the schema_migrations table to reflect this change. Be very careful when doing this! You can easily delete your production data if you aren't careful.
As a side note, I strongly believe that you should never put data creation in the migration files. The db/seeds.rb file can be used for this, or custom rake or deploy tasks. Putting this into migration files mixes your database schema specification with your data specification and can lead to conflicts when running migration files.
Just stumbled across this post, that was long ago and didn't see the answer I was expecting.
rake db:schema:load is great for the first time you put a system in production. After that you should run migrations normally.
This also helps you cleaning your migrations whenever you like, since the schema has all the information to put other machines in production even when you cleaned up your migrations.
Migrations let you add data to the DB too, but db:schema:load only loads the schema.
Because migrations can be rolled back, and provide additional functionality. For example, if you need to modify some data as part of a schema change then you'll need to do that as a migration.
As a user of other ORM's, it always seemed strange to me that Rails didn't have a 'sync and update' feature. ie, by using the schema file (which represents the entire, up-to-date schema), go through the existing DB structure and add/remove tables, columns, indexes as required.
To me this would be a lot more robust, even if possibly a little slower.
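For what such a "sync and update" step might involve, here is a minimal plain-Ruby sketch of a schema diff. This is not a Rails feature; schema_diff is a hypothetical helper, and both schemas are simplified to Hashes of table name => column names with no types, indexes or constraints.

```ruby
# Returns the list of changes needed to turn `current` into `desired`.
# Both arguments are Hashes of table name => array of column names.
def schema_diff(current, desired)
  changes = []
  # Tables present only in the desired schema must be created.
  (desired.keys - current.keys).each { |t| changes << [:create_table, t, desired[t]] }
  # Tables present only in the current database must be dropped.
  (current.keys - desired.keys).each { |t| changes << [:drop_table, t] }
  # For tables on both sides, compare the column lists.
  (current.keys & desired.keys).each do |t|
    (desired[t] - current[t]).each { |c| changes << [:add_column, t, c] }
    (current[t] - desired[t]).each { |c| changes << [:remove_column, t, c] }
  end
  changes
end

current = { "users" => ["id", "name"], "legacy_logs" => ["id"] }
desired = { "users" => ["id", "email"], "orders" => ["id", "total"] }

schema_diff(current, desired).each { |change| p change }
```

Note that a diff like this cannot tell a renamed column from a removed column plus an added one, so the data in users.name would simply be dropped. Explicit migrations can express that kind of intent.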
I have already posted this as a comment, but I feel it is better to put the comments from the db/schema.rb file here:
# Note that this schema.rb definition is the authoritative source for your
# database schema. If you need to create the application database on another
# system, you should be using db:schema:load, not running all the migrations
# from scratch. The latter is a flawed and unsustainable approach (the more migrations
# you'll amass, the slower it'll run and the greater likelihood for issues).
#
# It's strongly recommended that you check this file into your version control system.
Actually, my experience is that it is better to put the migration files in git and not the schema.rb file...
rake db:migrate sets up the tables in the database. When you run the migration command, it looks in db/migrate/ for any Ruby files and executes them starting with the oldest. There is a timestamp at the beginning of each migration filename.
Unlike rake db:migrate, which runs migrations that have not run yet, rake db:schema:load loads the schema that is already generated in db/schema.rb into the database.
You can find out more about rake database commands here.
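The "oldest first, only those not yet run" selection can be sketched in plain Ruby as follows. This is a simplified illustration, not Rails' actual implementation: the filenames are hypothetical, and the APPLIED array stands in for the versions Rails records in its schema_migrations table.

```ruby
# Hypothetical migration filenames, as they might appear in db/migrate/.
FILES = [
  "20240301120000_add_price_to_widgets.rb",
  "20230115093000_create_widgets.rb",
  "20240610081500_create_orders.rb"
].freeze

# Versions already recorded as applied (a stand-in for schema_migrations).
APPLIED = ["20230115093000"].freeze

# Extract the leading timestamp from a filename.
def version_of(filename)
  filename[/\A\d+/]
end

# Migrations that have not run yet, oldest first.
def pending(files, applied)
  files
    .reject { |f| applied.include?(version_of(f)) }
    .sort_by { |f| version_of(f).to_i }
end

pending(FILES, APPLIED).each { |f| puts "would run: #{f}" }
```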
So schema:load takes the currently configured schema, derives the associated queries to match, and runs them all in one go. It's kind of a one-and-done situation. As you've seen, migrations make changes step-by-step. Loading the schema might make sense when working on a project locally, especially early in the lifetime of a project. But if we were to drop and recreate the production DB each time we do a deployment, we would lose production data each time. That's a no-go. So that's why we use migrations to make the required changes to the existing DB.
So, the deeper into a project you get, the more migrations you'll get stacked up as you make more changes to the DB. And with each migration, those migrations become more and more the source of truth of what's on production: what matters isn't what's in the schema, but what migrations have been run in production. The difference is effectively moot if we have both in sync. But as soon as one goes out of date from the other, you start to have discrepancies. Ideally this would not happen, but we live in the real world, and stuff happens. And if you're using schema:load to set up your DB locally, you might not be getting the actual state of the DB, as it is reflected via the migration history on production.

Rebase Rails migrations in a long running project

By "rebasing" I mean the dictionary definition, rather than the git one...
I have a large, long running Rails project that has about 250 migrations, it's getting a touch unwieldy to manage all of these.
That said, I do need a base from which to purge and rebuild my database when running tests. So the data contained in these is important.
Does anyone have any strategies for, say, dumping the schema at a set point, archiving off all the old migrations, and starting afresh with new migrations?
Obviously I can use rake db:schema:dump, but really I need a way for db:migrate to load the schema first and then start running the rest of the migrations.
I would like to keep using migrations as they're very useful in development, however, there's no way I'm going back and editing a migration from 2007 so it seems silly to keep it.
In general, you don't need to clean up old migrations. If you're setting up from scratch (no existing db), rake db:setup or rake db:schema:load uses db/schema.rb to create the tables instead of running every migration. On an existing database, db:migrate only runs the migrations required to upgrade from the current schema to the latest.
If you still want to combine migrations up to a given point into a single one, you could try to:
migrate from scratch up to the targeted schema using rake db:migrate VERSION=xxx
dump the schema using rake db:schema:dump
remove the migrations from the beginning up to version xxx and create a single new migration using the contents of db/schema.rb (put create_table and add_index statements into the self.up method of the new migration).
Make sure to choose one of the old migration version numbers for your aggregated new migration; otherwise, Rails would try to apply that migration on your production server (which would wipe your existing data, since the create_table statements use :force => true).
Anyway, I wouldn't recommend doing this, since Rails usually handles migrations well itself. But if you still want to, make sure to double-check everything and try it locally first before you risk data loss on your production server.
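The warning about version numbers can be illustrated with a small plain-Ruby model. ToyDb and its methods are hypothetical names for this sketch, not Rails classes: the applied list mimics the record of versions that have already run, and create_table_force mimics a create_table that replaces an existing table.

```ruby
# Toy database: `tables` maps table name => array of rows, and
# `applied` lists the migration versions that have already run.
class ToyDb
  attr_reader :tables, :applied

  def initialize(tables:, applied:)
    @tables = tables
    @applied = applied
  end

  # Like a forced create_table, this replaces any existing table.
  def create_table_force(name)
    @tables[name] = []
  end

  # Run a migration only if its version is not recorded yet.
  def migrate(version)
    return :skipped if @applied.include?(version)

    yield self
    @applied << version
    :ran
  end
end

# A fresh copy of a pretend production database with two user rows.
def production
  ToyDb.new(tables: { "users" => [{ id: 1 }, { id: 2 }] },
            applied: ["20070101000000", "20080101000000"])
end

# Squashed migration reusing an old version: production skips it.
safe = production
safe_result = safe.migrate("20080101000000") { |db| db.create_table_force("users") }
puts "#{safe_result}, users rows: #{safe.tables["users"].size}"

# Squashed migration with a brand-new version: production runs it,
# and the forced create_table empties the table.
unsafe = production
unsafe_result = unsafe.migrate("20240101000000") { |db| db.create_table_force("users") }
puts "#{unsafe_result}, users rows: #{unsafe.tables["users"].size}"
```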
To automate the merging (or squashing) of migrations, you could use the Squasher gem
Simply install
gem install squasher
And run with a date, and migrations before that date will be merged:
squasher 2016 # => Will merge all migrations created before 2016
More details in the README
In addition to the answer provided (which clearly indicates how to consolidate your volume of migrations), you indicate a concern to purge data (which I assume is manually added after fixtures populate your tables), which implies you're depending on refreshing an initial data state. Some projects indeed require intensive refinement of base data, reconstruction, and re-population of tables. Ours heavily depends on repetitive execution of these operations, and I've found that if you can reduce your schema entirely to SQL execute statements, your tables will rebuild far faster than they will from Ruby syntax.
A trivial further help in rebuilding your tables is to dedicate a separate terminal window to a single combined command statement:
rake db:drop db:create db:schema:load db:fixtures:load
Each time you need to rebuild and re-populate your tables, an up-arrow and return keypress will get that routine job done. If there's no conflict in SQL execute statements, and if you don't have further migrations to run while your project is in development state, the SQL statements will execute perhaps better than twice as fast as the Ruby syntax. Our tables rebuild and re-populate in 20 seconds this way, for example, whereas the Ruby syntax increases the process to well over 50 seconds. If you're waiting on that data to refresh to perform further work (especially many times), this makes a huge difference in workflow.

Resources