Rails Migration Order and Git - ruby-on-rails

Since doing migrations with rails + git is type of a pain, a new thorn has sprung..
Before I am doing any harm to my prod DB, would the following situation cause havoc? If so, how would I handle it?
I am working a long-term feature in a separate branch (feature/long-term). This feature is an overhaul of a lot of components and it will take awhile to complete. This feature has new migrations, which were migrated to the localhost DB.
meanwhile, I need to fix/add a migration to the prod system via another branch (feature/quick-fix). This has a migration file with date later than the feature/long-term migration.
The migrations of the quick-fix and the long-term have nothing to do with each other, they do not collide and work on separate tables. It doesn't matter what order they are run.
If I merge feature/quick-fix to master and db:migrate and in a few days/weeks merge feature/long-term the migration files order would be the long-term first.
Would this affect the DB in some way? (the prod DB is important, so I don't want to reset)

What you described is a very common development workflow (especially so in teams with more members) and it's perfectly safe for your production DB.
Rails, as of version 2.1, is smart enough to keep a list of all migrations ever run, instead of just the latest migration version run. This information is stored on a separate table aptly named schema_migrations.
So, if you push a new migration today, say 20140527_quick_fix.rb, and a month after that you push a new (but with an older timestamp) one 20140101_long_term_feature.rb, Rails will still know that the latter was never run in your production environment so during rake db:migrate it will process it, as you would expect. The newest won't be run again of course as the Rails would know that it has already been processed.
From the official documentation:
Rails versions 2.0 and prior used to create a table called schema_info when using migrations. This table contained the version of the schema as of the last applied migration.
Starting with Rails 2.1, the schema_info table is (automatically) replaced by the schema_migrations table, which contains the version numbers of all the migrations applied.
As a result, it is now possible to add migration files that are numbered lower than the current schema version: when migrating up, those never-applied “interleaved” migrations will be automatically applied, and when migrating down, never-applied “interleaved” migrations will be skipped.

Related

Can I manually create migrations for production in Heroku, Ruby on Rails?

I have created an application with Ruby and Rails. The thing is that when I was develpoing it, I had some problems with the migrations, because I created them but with a wrong syntax. What happened is that I deleted some of the files because sold migrations that didn´t work had the same name than the new ones, but in the middle of that I accidentally deleted some of the migrations (obviously after running rails db:migrate) that the project uses actually. So for instance, i have the Service table, which is related to the Reservation table because Service has reservation_id, but i don´t have the migration file that says AddReservationIdToService.
So now I want to use Heroku for production. the thing is that O have to change to postgresql because Heroku doesn't support sqlite. So i have to run the de:migrate again to create the tables and relationships in the new DB, but I need the files that I explained that I deleted. The question is:
Can I create the migrations manually, so when i run db:migrate for postgres the full structure of the database is created without lacking relations?
You don't really need the migrations to recreate the existing DB -- in fact it's not a good idea to try for a couple of reasons (including the missing migration file problem you encountered). You can simply run:
bin/rails db:schema:load
to populate a new database from the existing schema. If for some reason you haven't got a db/schema.rb checked under version control you can run:
bin/rails db:schema:dump
against the sqlite version to re-create a fresh schema file from the database.
You can also keep your migrations list tidy by occasionally zapping really old migrations, since all the cumulative changes are captured in the schema file.
Yes, you might create another couple of migration files.
Certify you have now the tables you wish locally with your sqlite. Draw these table in a piece of paper (or where it be the best fr you), then check this official API documentation of Rails.
Delete all migrations made before and create another according to the tables you drew.
The workflow is gonna be like:
1) "I need to create a table called Reservation, where is it shown on the documentation?"
2) "I need a table called Service, where is it shown on the documentation?
3) "I need to add a column with a foreign key to service named reservaton_id, how does this documentation says it?
For all this steps above, create the correspondent migration file as you normally have done.
The main difference here is not to run the migration locally. Instead, push your new version app to your heroku remote branch and there you run the migration, like:
heroku run rails db:migrate
Remember to not run this same migration locally because you already have these tables locally.
The last two advise is:
1) If your migration doesn't go as you expect, don't delete the migration file. Instead, run rails db:rollback and try again.
2) Keep tracking your migration files on the same branch of your version control.

schema.rb messed up due to migrations in other branches

Currently I'm working with a huge rails application and multiple branches with each a new feature for this application.
It happens a lot a feature will require migrations, which shouldn't be a problem until you merge it with master: schema.rb got updated with the information of your dev database!
To clarify:
1. Branch A has migration create_table_x
2. Branch B has migration create_table_y
3. Branch A adds another create_table_z and runs db:migrate
4. You want to merge Branch A with Master and you see table_x, table_y and table_z in the schema.rb of Branch A.
It's not an option to reset+seed the database before every migration in a branch or create a database per branch. Due to the huge size of 2 GB SQL data, it would not be workable.
My question:
Is it really required to keep schema.rb in the repository since it gets rebuilt every migration?
If so, is it possible to build schema off the migrations instead of database dump?
you should keep the schema within your repo. running the migration will fix merge conflicts within your schema.rb file for you.
my simple take on your questions
Is it Required? not really, but good practice.
"It's strongly recommended to check this file into your version
control system." -schema.rb
It is possible? yes but you may want to ask yourself if you really save any time by doing so, either manually or building the schema off your migrations elsewhere and pushing it in.
you ge tthe added benefit of using
rake db:schema:load
and
rake db:schema:dump
http://tbaggery.com/2010/10/24/reduce-your-rails-schema-conflicts.html
Keeping schema.rb in the repo is important, because you want a new developer (or your test environment) to be able to generate a new database just from schema.rb. When you have a lot of migrations, they may not all continue to run, especially if they rely on model classes that don't exist, or have changed, or have different validations in effect than when the migration was first run. When you run your tests, you want to dump and recreate your database from schema.rb, not by re-running all the migrations every time you run the full test suite.
If you're working on two different feature branches simultaneously, and changing the database structure in both, I think schema.rb is the least of your problems. If you rename or remove a column in branch B, branch A is going to break whenever it references the old column. If all they're doing is creating new tables, it's not as bad, but then you want schema.rb to be updated when you merge from master into A, so that A doesn't try to run the migration a second time and fail because the new table already exists.
If that doesn't answer your question, maybe you could explain your workflow a little more?
Fresh Temporary DB as a quick workaround
For example, do the following whenever you need pretty db/schema.rb on particular branch:
git checkout -- db/schema.rb
switch to the different development database, i.e. update config/database.yml
rake db:drop && rake db:setup
rake db:migrate
commit your pretty db/schema.rb
revert changes in config/database.yaml

How does Rails keep track of which migrations have run for a database?

According to Rails doc: http://guides.rubyonrails.org/migrations.html
"Active Record tracks which migrations have already been run so all you have to do is update your source and run rake db:migrate."
How does ActiveRecord actually do this? Where does Active Record store the data?
I suspect this might be stored in the database itself? In a table somewhere.
On my development machine, I ran all the migrations. Then I copied the production database over using mysqldump. Then I ran "rake db:migrate:status", it shows correctly the migrations that need to run on the production database.
I used to think that ActiveRecord keeps track of the last migration run using the timestamp. But I think this is not true because ActiveRecord correctly runs the "older" migrations merged in from another code branch.
Could someone with inside knowledge of this elaborate?
Thanks
Rails creates a table in your database called schema_migrations to keep track of which migrations have run.
The table contains a single column, version. When Rails runs a migration, it takes the leading digits in the migration's file name and inserts a row for that "version", indicating it has been run. If you roll back that migration, Rails will delete the corresponding row from schema_migrations.
For example, running a migration file named 20120620193144_create_users.rb will insert a new row with a version of 20120620193144 into the schema_migrations table.
You are free at any point to introduce migrations with earlier versions. Rails will always run any new migrations for which there is not a corresponding row in schema_migrations. The leading digits don't have to be a timestamp, you could call your migration 001_blah.rb. Earlier versions of Rails used this format, and used sequential numbering for newly generated migrations. Later versions have switched to timestamps to help prevent multiple developers from independently generating migrations with the same number.

Why is db:reset different from running all migrations?

In section Rails Database Migrations of Ruby on Rails Guides, there is one line saying that
The db:reset task will drop the database, recreate it and load the current
schema into it. This is not the same as running all the migrations.
Can anyone tell me where exactly they are different and why it is more error prone to replay the migration history?
I'm fairly new to Ruby on Rails. Thanks in advance.
The schema file contains the current structure of your database. When you load it, you are guaranteed to have the exact schema in your db that is in the file. Migrations were designed to make incremental changes in the database. You may add a table, then some columns, and then remove the table in three separate migrations. There's no need to go through all this when the schema already knows that the table no longer exists.
On why they are error prone, I'm not totally sure. The one thing I can think of is that migrations can be used to make changes to data and not just the structure.
Running rake db:reset will rebuild the structure of your database from schema.db, which essentially works as a cached version of your migrated database structure. Running all your migrations, on the other hand, applies the migrations one by one, which may include arbitrary code to accomodate for changes to the database (e.g. prepopulate an added counter cache column).
It can be more error prone to replay the migration history, since it is the product of changes to both the structure and data of the database. If the developers haven't been careful, it might not apply cleanly to a fresh environment (e.g. the migration assumes an old version of a model). On the other hand, schema.db can get out of sync if you edit a migration once you've migrated (a useful trick to avoid migration explosion during development). In that case, you need to run rake db:migrate:reset.

Rebase Rails migrations in a long running project

In which I mean "rebasing" in the dictionary, rather than git definition...
I have a large, long running Rails project that has about 250 migrations, it's getting a touch unwieldy to manage all of these.
That said, I do need a base from which to purge and rebuild my database when running tests. So the data contained in these is important.
Does any one have any strategies for say, dumping the schema at a set point - archiving off all the old migrations and starting afresh with new migrations.
Obviously I can use rake schema:dump - but really I need a way that db:migrate will load the schema first and then start running the rest of the migrations.
I would like to keep using migrations as they're very useful in development, however, there's no way I'm going back and editing a migration from 2007 so it seems silly to keep it.
In general, you don't need to clean up old migrations. If you're running db:migrate from scratch (no existing db), Rails uses db/schema.rb to create the tables instead of running every migration. Otherwise, it only runs the migrations required to upgrade from the current schema to the latest.
If you still want to combine migrations up to a given point into a single one, you could try to:
migrate from scratch up to the targeted schema using rake db:migrate VERSION=xxx
dump the schema using rake db:schema:dump
remove the migrations from the beginning up to version xxx and create a single new migration using the contents of db/schema.rb (put create_table and add_index statements into the self.up method of the new migration).
Make sure to choose one of the old migration version numbers for your aggregated new migration; otherwise, Rails would try to apply that migration on your production server (which would wipe your existing data, since the create_table statements use :force⇒true).
Anyway, I wouldn't recommend to do this since Rails usually handles migrations well itself. But if you still want to, make sure to double check everything and try locally first before you risk data loss on your production server.
To automate the merging (or squashing) of migrations, you could use the Squasher gem
Simply install
gem install squasher
And run with a date, and migrations before that date will be merged:
squasher 2016 # => Will merge all migration created before 2016
More details in the README
In addition to the answer provided (which well indicates how to consolidate your volume of migrations), you indicate a concern to purge data (which I assume is manually added after fixtures populate your tables); which infers you're depending on refreshing an initial data state. Some projects indeed require intensive refinement of base data, reconstruction, and re-population of tables. Ours heavily depends on repetitive execution of these operations, and I've found that if you can reduce your schema entirely to SQL execute statements, your tables will rebuild far faster than they will from Ruby syntax.
A trivial further help in rebuilding your tables is to dedicate a separate terminal window to a single combined command statement:
rake db:drop db:create db:schema:load db:fixtures:load
Each time you need to rebuild and re-populate your tables, an up-arrow and return keypress will get that routine job done. If there's no conflict in SQL execute statements, and if you don't have further migrations to run while you're project is in development state, the SQL statements will execute perhaps better than twice as fast as the Ruby syntax. Our tables rebuild and re-populate in 20 seconds this way for example, whereas the Ruby syntax increases the process to well over 50 seconds. If you're waiting on that data to refresh to perform further work (especially many times), this makes a huge difference in workflow.

Resources