Using Rails and Git, if I run a migration in one branch, the table is now in the database. Now if I check out a branch in which I don't want that migration, the table is still committed to the schema.rb, because it gets dumped from the database.
Is there a way to prevent the schema from getting dumped from the database? For example is it possible to maybe have the schema be generated by running the migrations instead of by dumping the tables from the DB? It's really annoying to me, and to the other people in my company.
When you switch branches, either create a brand new DB and load an empty schema, or rollback migrations to the common anscestor, then migrate up again once you're in the new branch.
There are some tools and scripts that can help with this, but I haven't used any of them personally so I can't vouch for them. Here's a couple that I found after googling:
https://github.com/fhwang/migreazy
http://www.mobalean.com/blog/2013/03/03/keep-database-in-sync-with-current-branch
The schema file changing is only the symptom of the true problem: rails is dynamically building the properties of your models from the schema of the database. Sweeping the schema.rb file under the rug will be insufficient because rails will still be making assumptions about your models based on a database that has the incorrect schema for the branch you're currently working in.
Related
I have been developing a project in Ruby on Rails. During the development I have generated tons of migration files for my project. Sometimes I have added and deleted columns from different tables.
Is there a way that I could consolidate all the migrations from multiple files into a single file?
TL;DR
What you need isn't a consolidated set of migrations; it's a single schema file and an optional seeds.rb file. Rails generally maintains the schema automagically when you run migrations, so you already have most of what you should need with the possible exception of seed data as described below.
Use the Schema, Not Migrations
In general, you shouldn't be maintaining a large pool of migrations. Instead, you should periodically clear out your migrations, and use schema.rb or schema.sql to (re)create a database. The Rails guides specifically state:
There is no need (and it is error prone) to deploy a new instance of an app by replaying the entire migration history. It is much simpler and faster to just load into the database a description of the current schema...Because schema dumps are the authoritative source for your database schema, it is strongly recommended that you check them into source control.
You should therefore be using bin/rails db:schema:load rather than running migrations, or run the associated Rake task on older versions of Rails.
Data Migrations
While you can use migrations to fix or munge data related to a recent schema change, data migrations (if used at all) should be temporary artifacts. Data migrations are almost never idempotent, so you shouldn't be maintaining data migrations long-term. The guide says:
Some people use migrations to add data to the database...However, Rails has a 'seeds' feature that should be used for seeding a database with initial data. It's a really simple feature: just fill up db/seeds.rb with some Ruby code, and run rake db:seed...This is generally a much cleaner way to set up the database of a blank application.
Database seed data should be loaded with bin/rails db:seed (or the associated Rake task) rather than maintaining the data in migrations.
There is a gem that purports to do exactly what you describe in the question - check out Squasher.
From the README:
"Squasher compresses old migrations in a Rails application. ... Squasher removes all the migrations and creates a single migration with the final database state of the specified date (the new migration will look like a schema)."
You'll have to do the merge manually.
But if you want only a single file, there is db/schema.rb. It contains a snapshot of current database schema. You can load it directly in database if you want.
Since doing migrations with rails + git is type of a pain, a new thorn has sprung..
Before I am doing any harm to my prod DB, would the following situation cause havoc? If so, how would I handle it?
I am working a long-term feature in a separate branch (feature/long-term). This feature is an overhaul of a lot of components and it will take awhile to complete. This feature has new migrations, which were migrated to the localhost DB.
meanwhile, I need to fix/add a migration to the prod system via another branch (feature/quick-fix). This has a migration file with date later than the feature/long-term migration.
The migrations of the quick-fix and the long-term have nothing to do with each other, they do not collide and work on separate tables. It doesn't matter what order they are run.
If I merge feature/quick-fix to master and db:migrate and in a few days/weeks merge feature/long-term the migration files order would be the long-term first.
Would this affect the DB in some way? (the prod DB is important, so I don't want to reset)
What you described is a very common development workflow (especially so in teams with more members) and it's perfectly safe for your production DB.
Rails, as of version 2.1, is smart enough to keep a list of all migrations ever run, instead of just the latest migration version run. This information is stored on a separate table aptly named schema_migrations.
So, if you push a new migration today, say 20140527_quick_fix.rb, and a month after that you push a new (but with an older timestamp) one 20140101_long_term_feature.rb, Rails will still know that the latter was never run in your production environment so during rake db:migrate it will process it, as you would expect. The newest won't be run again of course as the Rails would know that it has already been processed.
From the official documentation:
Rails versions 2.0 and prior used to create a table called schema_info when using migrations. This table contained the version of the schema as of the last applied migration.
Starting with Rails 2.1, the schema_info table is (automatically) replaced by the schema_migrations table, which contains the version numbers of all the migrations applied.
As a result, it is now possible to add migration files that are numbered lower than the current schema version: when migrating up, those never-applied “interleaved” migrations will be automatically applied, and when migrating down, never-applied “interleaved” migrations will be skipped.
I came across a problem where I was working on two branches on a rails project and each project has a migration to add a column. At the time, rake db:migrate:reset cause a problem and I solely relied on my schema.rb to correctly represent the state of my database. At one point, I came across a problem where a column added by branch A got into the schema of branch B. Since migrate:reset was not an option, I resorted to manually editing the schema file to. I committed this change that basically deleted the column from branch A that I did not need in branch B's schema.rb.
Problem came after I have merged branch A into master. When I tried to rebase branch B to master, I still had that commit in B to delete the column (which now has become relevant because it is in master) in the schema file. Git did not see a conflict for this and auto-merged it. At the end of my rebase, I found that my schema is inconsistent with what I have in master.
My fix is to edit the schema file again and manually add the previously deleted column back to the schema file. My question is: Is this considered unconventional? dangerous? hacky?
Right now it involves one column but if this involved multiple column deletions/additions the (dangerous?) solution could lead to more problems and db/schema.rb inconsistency.
It is generally considered a bad practice to edit your schema.rb file.
According to the Rails Guide on Migrations:
Migrations, mighty as they may be, are not the authoritative source for your database schema. That role falls to either db/schema.rb or an SQL file which Active Record generates by examining the database. They are not designed to be edited, they just represent the current state of the database.
schema.rb gets updated every time you run a new migration:
Note that running the db:migrate also invokes the db:schema:dump task, which will update your db/schema.rb file to match the structure of your database.
I'd recommend just spending some time to sort things out and get the schema.rb file back on track and correct up to the latest set of migrations.
In section Rails Database Migrations of Ruby on Rails Guides, there is one line saying that
The db:reset task will drop the database, recreate it and load the current
schema into it. This is not the same as running all the migrations.
Can anyone tell me where exactly they are different and why it is more error prone to replay the migration history?
I'm fairly new to Ruby on Rails. Thanks in advance.
The schema file contains the current structure of your database. When you load it, you are guaranteed to have the exact schema in your db that is in the file. Migrations were designed to make incremental changes in the database. You may add a table, then some columns, and then remove the table in three separate migrations. There's no need to go through all this when the schema already knows that the table no longer exists.
On why they are error prone, I'm not totally sure. The one thing I can think of is that migrations can be used to make changes to data and not just the structure.
Running rake db:reset will rebuild the structure of your database from schema.db, which essentially works as a cached version of your migrated database structure. Running all your migrations, on the other hand, applies the migrations one by one, which may include arbitrary code to accomodate for changes to the database (e.g. prepopulate an added counter cache column).
It can be more error prone to replay the migration history, since it is the product of changes to both the structure and data of the database. If the developers haven't been careful, it might not apply cleanly to a fresh environment (e.g. the migration assumes an old version of a model). On the other hand, schema.db can get out of sync if you edit a migration once you've migrated (a useful trick to avoid migration explosion during development). In that case, you need to run rake db:migrate:reset.
I have Rails app, and every once in a while, when I bring new developer onboard they exclaim that they should be able to produce the current DB schema in their dev environment by running the whole history of the migrations. I personally don't think that migrations is the authoritative source for your schema. Right now what we do is load a production copy of the DB, with the current schema, onto the dev machine. And, from there, the schema can be maintained via incremental migrations.
So my question are:
What is the authoritative source of your schema on a Rails project?
What is now considered the best-practice way to maintain your DB schema?
I do not consider migrations to be the authoritative source for your schema. Migrations are extremely powerful but optional. Some developers use alternative workflows especially in environments where DBA's insist on strong referential integrity and DBMS-enforced constraints. I suggest looking at the official RoR Guide on Migrations for more information. The db/schema.db (or db/{env}_structure.sql) file is the authoritative source for your schema. Many developers will purge old migrations as projects get older so running each migration will not necessarily produce a working database. It also takes a long time to run through a hundreds of migrations. Rails uses schema.db (or the sql dump file) to build the test database and of course when running rake db:setup which is the recommended way of creating a new database for your application.
Bottom like is that rake db:setup should always produce a working database regardless of migrations. Developers can use this to create new environments and Rails uses this to run your tests.
http://guides.rubyonrails.org/migrations.html#schema-dumps-and-source-control
Normally, running the succession of all migrations should produce your actual DB schema (if it's not the case, then you didn't use your migrations correctly*).
Another way of doing is to copy over the schema.rb (created/updated when you migrate), which is used by rake db:setup and should produce an exact copy of the schema you have in production (unless, again, you didn't use migrations correctly*).
Then, if you need "sample data", you can insert it using the db/seeds.rb file, which contains ruby code that can access your models, and thus create and persist new entities & so on...
*: There are cases where you can't put all your database changes in migrations in a "usual" way (it is uncommon and should be avoided if possible)... These should be included in migrations however (in plain SQL execution statements), or the changes would need to be made manually on the dev DB as well... And then using a snapshot of prod. is sometimes more convenient. But again, I would discourage doing so.