schema.rb messed up due to migrations in other branches - ruby-on-rails

Currently I'm working with a huge rails application and multiple branches with each a new feature for this application.
It happens a lot a feature will require migrations, which shouldn't be a problem until you merge it with master: schema.rb got updated with the information of your dev database!
To clarify:
1. Branch A has migration create_table_x
2. Branch B has migration create_table_y
3. Branch A adds another create_table_z and runs db:migrate
4. You want to merge Branch A with Master and you see table_x, table_y and table_z in the schema.rb of Branch A.
It's not an option to reset+seed the database before every migration in a branch or create a database per branch. Due to the huge size of 2 GB SQL data, it would not be workable.
My question:
Is it really required to keep schema.rb in the repository since it gets rebuilt every migration?
If so, is it possible to build schema off the migrations instead of database dump?

you should keep the schema within your repo. running the migration will fix merge conflicts within your schema.rb file for you.
my simple take on your questions
Is it Required? not really, but good practice.
"It's strongly recommended to check this file into your version
control system." -schema.rb
It is possible? yes but you may want to ask yourself if you really save any time by doing so, either manually or building the schema off your migrations elsewhere and pushing it in.
you ge tthe added benefit of using
rake db:schema:load
and
rake db:schema:dump
http://tbaggery.com/2010/10/24/reduce-your-rails-schema-conflicts.html

Keeping schema.rb in the repo is important, because you want a new developer (or your test environment) to be able to generate a new database just from schema.rb. When you have a lot of migrations, they may not all continue to run, especially if they rely on model classes that don't exist, or have changed, or have different validations in effect than when the migration was first run. When you run your tests, you want to dump and recreate your database from schema.rb, not by re-running all the migrations every time you run the full test suite.
If you're working on two different feature branches simultaneously, and changing the database structure in both, I think schema.rb is the least of your problems. If you rename or remove a column in branch B, branch A is going to break whenever it references the old column. If all they're doing is creating new tables, it's not as bad, but then you want schema.rb to be updated when you merge from master into A, so that A doesn't try to run the migration a second time and fail because the new table already exists.
If that doesn't answer your question, maybe you could explain your workflow a little more?

Fresh Temporary DB as a quick workaround
For example, do the following whenever you need pretty db/schema.rb on particular branch:
git checkout -- db/schema.rb
switch to the different development database, i.e. update config/database.yml
rake db:drop && rake db:setup
rake db:migrate
commit your pretty db/schema.rb
revert changes in config/database.yaml

Related

After rails migration, resulting schema does not match migrations. Lingering database state?

Rails 6.1.4
Ruby 2.7
Postgresql 14
A dozen or so migrations, one schema.rb file.
I edited a migration, but did not change the migration id. The result is super weird behavior and I wanted to get input on the best approach.
After I incorrectly edited my migrations, I commited and pushed my feature. A team member pulled the feature and ran the migration on their machine. After they did, no matter the branch, the schema would include the changes I added when I originally modified it. But if they were on a different branch than mine, the actual migration files did not have those changes!!
I tried reverting my commit history to pre-migration editing with no luck. This is how I know it's a db issue, albeit caused from git.
So basically, after every migration, a specific model in the schema gets 4 added columns. No matter what, and it's not in a migration file on rails.
And thats the issue.
My question:
How would you go about solving this without resetting the db?
My current approach/best guess:
Lingering state in the db gets generated in the schema.rb file.
If its not in a migration, the only place schema.rb can get the info is from db.
How do I reset the state on stuff in general?
Either rebuild from scratch, or 'install a copy'. From scratch is not an option :)
If I wanted to install a copy, would it be a wise path to:
Revert changes from any migrations after pulling, delete branch.
Pull down fresh copy of branch
DO NOT MIGRATE - Instead, rails db:schema:load
This should copy over the db structure and effectively overwrite any lingering ghost state.
Rails db:migrate -> this will update migrations,
if you did everything right only the schema version number should change
Now things are synced, continue to db:migrate as normal moving forward.
I did this on my local machine and was successful, but I am curious..
Am I understanding this process correctly? Is there an easier way?

Edit commited migration in rails

In my project I have migrations like:
1. 20141122184434_add_column_a_to_b
2. 20141208235304_delete_data_from_b
3. 20141217011359_add_table
4. 20141218183503_remove_column_b_from_c
And they are already commited for my develop branch (but not on master and production). After few weeks (and adding more migration) it was found that migration B contains error, and it deletes important data, so we can't merge it with master.
Is there any clean way to edit migration 20141208235304_delete_data_from_b? I know I can just reroll, edit, and migrate again, but how will it work for other developers after I commit my changes to develop branch?
If the migration is recoverable (which means it will not destruct important data), then you can simply add a new migration to fix the issue.
Otherwise, if you discovered that one of the migrations is causing any issue, therefore the only possible solution is to fix the migration before it is ever applied in production. In that case, fix the migration, but then whoever applied such migration locally must:
reload the schema completely (rake db:schema:load) or
revert that migration (rake db:migrate:redo VERSION=20141208235304)
One more advice. A migration is designed to change the database schema, NOT to delete/insert or manipulate data. For such, you should use a rake task.

Is it considered safe to manually edit schema.rb in rails

I came across a problem where I was working on two branches on a rails project and each project has a migration to add a column. At the time, rake db:migrate:reset cause a problem and I solely relied on my schema.rb to correctly represent the state of my database. At one point, I came across a problem where a column added by branch A got into the schema of branch B. Since migrate:reset was not an option, I resorted to manually editing the schema file to. I committed this change that basically deleted the column from branch A that I did not need in branch B's schema.rb.
Problem came after I have merged branch A into master. When I tried to rebase branch B to master, I still had that commit in B to delete the column (which now has become relevant because it is in master) in the schema file. Git did not see a conflict for this and auto-merged it. At the end of my rebase, I found that my schema is inconsistent with what I have in master.
My fix is to edit the schema file again and manually add the previously deleted column back to the schema file. My question is: Is this considered unconventional? dangerous? hacky?
Right now it involves one column but if this involved multiple column deletions/additions the (dangerous?) solution could lead to more problems and db/schema.rb inconsistency.
It is generally considered a bad practice to edit your schema.rb file.
According to the Rails Guide on Migrations:
Migrations, mighty as they may be, are not the authoritative source for your database schema. That role falls to either db/schema.rb or an SQL file which Active Record generates by examining the database. They are not designed to be edited, they just represent the current state of the database.
schema.rb gets updated every time you run a new migration:
Note that running the db:migrate also invokes the db:schema:dump task, which will update your db/schema.rb file to match the structure of your database.
I'd recommend just spending some time to sort things out and get the schema.rb file back on track and correct up to the latest set of migrations.

Is it a good idea to purge old Rails migration files?

I have been running a big Rails application for over 2 years and, day by day, my ActiveRecord migration folder has been growing up to over 150 files.
There are very old models, no longer available in the application, still referenced in the migrations. I was thinking to remove them.
What do you think? Do you usually purge old migrations from your codebase?
The Rails 4 Way page 177:
Sebastian says...
A little-known fact is that you can remove old migration files (while
still keeping newer ones) to keep the db/migrate folder to a
manageable size. You can move the older migrations to a
db/archived_migrations folder or something like that. Once you do trim
the size of your migrations folder, use the rake db:reset task to
(re-)create your database from db/schema.rb and load the seeds into
your current environment.
Once I hit a major site release, I'll roll the migrations into one and start fresh. I feel dirty once the migration version numbers get up around 75.
I occasionally purge all migrations, which have already been applied in production and I see at least 2 reasons for this:
More manageable folder: it is easier to spot a new migration.
Cleaner text search results: global text search within a project does not lead to tons of useless matches because of some 3-year-old migration when someone added or removed some column which anyway does not exist anymore.
They are relatively small, so I would choose to keep them, just for the record.
You should write your migrations without referencing models, or other parts of application, because they'll come back to you haunting ;)
Check out these guidelines:
http://guides.rubyonrails.org/migrations.html#using-models-in-your-migrations
Personally I like to keep things tidy in the migrations files. I think once you have pushed all your changes into prod you should really look at archiving the migrations. The only difficulty I have faced with this is that when Travis runs it runs a db:migrate, so these are the steps I have used:
Move historic migrations from /db/migrate/ to /db/archive/release-x.y/
Create a new migration file manually using the version number from the last run migration in the /db/archive/release-x.y directory and change the description to something like from_previous_version. Using the old version number means that it won't run on your prod machine and mess up.
Copy the schema.rb contents from inside the ActiveRecord::Schema.define(version: 20141010044951) do section and paste into the change method of your from_previous_version changelog
Check all that in and Robert should be your parent's brother.
The only other consideration would be if your migrations create any data (my test scenarios contain all their own data so I don't have this issue)
Why? Unless there is some kind of problem with disk space, I don't see a good reason for deleting them. I guess if you are absolutely certain that you are never going to roll back anything ever again, than you can. However, it seems like saving a few KB of disk space to do this wouldn't be worth it. Also, if you just want to delete the migrations that refer to old models, you have to look through them all by hand to make sure you don't delete anything that is still used in your app. Lots of effort for little gain, to me.
See http://edgeguides.rubyonrails.org/active_record_migrations.html#schema-dumping-and-you
Migrations are not a representation of the database: either structure.sql or schema.rb is. Migrations are also not a good place for setting/initializing data. db/seeds or a rake task are better for that kind of task.
So what are migrations? In my opinion they are instructions for how to change the database schema - either forwards or backwards (via a rollback). Unless there is a problem, they should be run only in the following cases:
On my local development machine as a way to test the migration itself and write the schema/structure file.
On colleague developer machines as a way to change the schema without dropping the database.
On production machines as a way to change the schema without dropping the database.
Once run they should be irrelevant. Of course mistakes happen, so you definitely want to keep migrations around for a few months in case you need to rollback.
CI environments do not ever need to run migrations. It slows down your CI environment and is error prone (just like the Rails guide says). Since your test environments only have ephemeral data, you should instead be using rake db:setup, which will load from the schema.rb/structure.sql and completely ignore your migration files.
If you're using source control, there is no benefit in keeping old migrations around; they are part of the source history. It might make sense to put them in an archive folder if that's your cup of coffee.
With that all being said, I strongly think it makes sense to purge old migrations, for the following reasons:
They could contain code that is so old it will no longer run (like if you removed a model). This creates a trap for other developers who want to run rake db:migrate.
They will slow down grep-like tasks and are irrelevant past a certain age.
Why are they irrelevant? Once more for two reasons: the history is stored in your source control and the actual database structure is stored in structure.sql/schema.rb. My rule of thumb is that migrations older than about 12 months are completely irrelevant. I delete them. If there were some reason why I wanted to rollback a migration older than that I'm confident that the database has changed enough in that time to warrant writing a new migration to perform that task.
So how do you get rid of the migrations? These are the steps I follow:
Delete the migration files
Write a rake task to delete their corresponding rows in the schema_migrations table of your database.
Run rake db:migrate to regenerate structure.sql/schema.rb.
Validate that the only thing changed in structure.sql/schema.rb is removed lines corresponding to each of the migrations you deleted.
Deploy, then run the rake task from step 2 on production.
Make sure other developers run the rake task from step 2 on their machines.
The second item is necessary to keep schema/structure accurate, which, again, is the only thing that actually matters here.
It's fine to remove old migrations once you're comfortable they won't be needed. The purpose of migrations is to have a tool for making and rolling back database changes. Once the changes have been made and in production for a couple of months, odds are you're unlikely to need them again. I find that after a while they're just cruft that clutters up your repo, searches, and file navigation.
Some people will run the migrations from scratch to reload their dev database, but that's not really what they're intended for. You can use rake db:schema:load to load the latest schema, and rake db:seed to populate it with seed data. rake db:reset does both for you. If you've got database extensions that can't be dumped to schema.rb then you can use the sql schema format for ActiveRecord and run rake db:structure:load instead.
Yes. I guess if you have completely removed any model and related table also from database, then it is worth to put it in migration. If model reference in migration does not depend on any other thing, then you can delete it. Although that migration is never going to run again as it has already run and even if you don't delete it from existing migration, then whenever you will migrate database fresh, it cause a problem.
So better it to remove that reference from migration. And refactore/minimize migrations to one or two file before big release to live database.
I agree, no value in 100+ migrations, the history is a mess, there is no easy way of tracking history on a single table and it adds clutter to your file finding. Simply Muda IMO :)
Here's a 3-step guide to squash all migrations into identical schema as production:
Step1: schema from production
# launch rails console in production
stream = StringIO.new
ActiveRecord::SchemaDumper.dump(ActiveRecord::Base.connection, stream); nil
stream.rewind
puts stream.read
This is copy-pasteable to migrations, minus the obvious header
Step 2: making the migrations without it being run in production
This is important. Use the last migration and change it's name and content. ActiveRecord stors the datetime number in it's schema_migrations table so it knows what it has run and not. Reuse the last and it'll think it has already run.
Example: rename 20161202212203_this_is_the_last_migration -> 20161202212203_schema_of_20161203.rb
And put the schema there.
Step 3: verify and troubleshoot
Locally, rake db:drop, rake db:create, rake db:migrate
Verify that schema is identical. One issue we encountered was datetime "now()" in schema, here's the best solution I could find for that: https://stackoverflow.com/a/40840867/252799

Why does schema.rb change (in the eyes of Git) when just running rake db:migrate?

This is a little general I know, but it's been bugging the hell out of me. I've been working on lots of rails projects remotely with Git and every time I do a git pull and see that there is some sort of data change (migration, or schema.rb change) I do a rake db:migrate.
These generally run fine and I can continue working. But if you do a git pull and then git status, your working directory is clean (obviously) then do a rake db:migrate (obviously when there are changes) and another git status and all the sudden your db/schema.rb has changed. I have been just doing a git checkout immediately to reset back to the latest committed version of the schema.rb file, but why should this be necessary?! What is rails doing? Updating a timestamp? I can't seem to figure out what the diff is but maybe I'm just missing something?
The schema enables machines to run rake db:schema:load when being setup for the first time instead of having to run the migrations, which can go out of date if models are renamed or removed, etc. It's supposed to update after a migration, and you always want to latest version checked into source control.
The order of attributes in the dump reflects the order of the attributes in the database, and that can get out of sync if one person has been playing around locally with migrations, running them forwards and backwards manually, and editing things to get them just so. It's possible to create a state where the attribute order in the pusher's schema.rb is different from what everyone else will see when they run the migrations.
If it's easy to recreate your development data, just rebuild the database from schema.rb - then everyone is back in sync (but remember you can't reload the data from a SQL dump that also creates the table - that will recreate the problem. it has to be a data-only dump/load). In the worst case, you can create a migration to delete the column and another to re-add it.
schema.rb reflects your database schema so when you migrate(with changes) it follow that your schema also changes to reflect your db change. I usually add schema.rb to our gitignore, along with database.yml (probably because instead of using schema:load like the one below, i usually do a sql dump when cloning an existing app - but that's just me)

Resources