Writing database migration to reverse complex migration - ruby-on-rails

I have a pretty old migration on a legacy app by a friend that contains this snippet:
class MakeChangesToProductionDbToMatchWhatItShouldBe < ActiveRecord::Migration
  def self.up
    # For some reason ActiveRecord::Base.connection.tables.sort().each blows up
    ['adjustments',
     'accounts',
     ...## more rows of classes here ###...
     'product_types'].each do |table|
      t = table.singularize.camelize.constantize.new
      if t.attributes.include?('created_at')
        change_column( table.to_sym, :created_at, :datetime, :null => false ) rescue puts "#{table} doesnt have created_at"
      end
      if t.attributes.include?('updated_at')
        change_column( table.to_sym, :updated_at, :datetime, :null => false ) rescue puts "#{table} doesnt have updated_at"
      end
    end
This old migration is now conflicting with a new migration I wrote to remove two of the tables mentioned in this long list, which is now causing any deployment to error upon rake db:migrate.
What's the correct kind of migration or down action to write to address this migration and get db:migrate working again?

A few best practices can help, but at the end of the day there's no good way to always upgrade a database from an arbitrary point without stepping through the codebase alongside the migrations as you run them (speaking of which, why is there not already a rake task to do this?).
Always include a migration-namespaced copy of the models you're working on, so the migration does not depend on application classes that may later change or be removed.
When building a database from scratch, do not run migrations; use db:schema:load, which re-creates the last snapshot of the database.
Don't give your migrations ridiculous and aggression-fueled titles like MakeChangesToProductionDbToMatchWhatItShouldBe.
Avoid making assumptions, when writing migrations, about the environment they will be run in. This includes specifying table names, database drivers, environment variables, etc.
Write the down action at the same time as the up action, whenever a down action is feasible. It's usually much easier (especially on esoteric or complex migrations) while the series of transformations is fresh in your head.
For this specific case, there's an argument to be made for declaring "Migration Bankruptcy": clearing out some or all existing migrations (or refactoring and coalescing them into a new one) to achieve the desired database state. Once you do this you are no longer backwards compatible, so it is not to be taken lightly, but there are times when it is the appropriate move.

Related

Best practice to populate some data in an existing database in Rails

Problem statement:
Let's say I created a new column column_1 in a table table_1 through a Rails migration, and now I want to populate column_1 by doing some computation, but it's a one-time task.
Approaches:
Is it preferred to do it through a migration, or should I add it to the seeds, populate it, and then remove it again?
Or is there any other best practice?
Generally, we use Rails Migrations to migrate our application’s schema, and Rake tasks to migrate our production data.
There have only been a few cases where we have used Rails Migrations to ensure that a data migration took place as a part of the deployment.
In all other cases, using a Rake task provides us with more flexibility, and less maintenance.
Even though there are different approaches, it is better to do this in a Rake task or a migration, but not in seeds.
Rake tasks:
Rake tasks are generally preferred to do maintenance or data migration jobs over a collection of data.
Example of rake:
lib/tasks/addData.rake
desc "TODO"
task :my_task1 => :environment do
  User.update_all(status: 'active')
end
Example of doing it in migration:
If you add status field to user:
class AddStatusToUser < ActiveRecord::Migration
  def up
    add_column :users, :status, :string
    # Reload the cached column list so the model sees the new column
    User.reset_column_information
    User.update_all(status: 'active')
  end
  def down
    remove_column :users, :status
  end
end
Why Not seeds:
Seeds are more like a database dump: seed files are used to fill the database with data the first time the application is started, so that the application is kickstarted with its initial data.
Examples are Application settings, Application Configurations, Admin users etc.
This is actually an opinion-based question.
Is it preferred to do it through a migration, or should I add it to the seeds, populate it, and then remove it again?
It depends on how long it's gonna take.
1. Migration:
If it's a one-time task, go with a migration, but only if the task is going to run for a few minutes.
2. Rake Task:
If the task is one-time but it might take a few hours it should be a rake task, not a migration.
A one-time task? I would definitely go with a migration, since it is executed only when the migration runs and never again afterwards.
A Rake task is overkill when you need to run it only once. The task script stays in the codebase until you decide to remove it, and it makes little sense to build something that will be removed in the near future (except for testing purposes in some special cases).
If you're asking about best practices, people tend to have different approaches, and the right one depends on the case you are trying to solve. Still, some cases are common enough that the same approach can be shared.

Running rails migration moves indexes around in schema

The app in question was originally created as a Rails 4 app, and later upgraded to Rails 5.
I will create a rails migration that might look like this:
class AddPubliclyVisibleToGcodeMacros < ActiveRecord::Migration[5.0]
  def change
    add_column :gcode_macros, :publicly_visible, :boolean, default: false
  end
end
And when I run it, I expect the schema to have a few lines updated, specifically adding t.boolean "publicly_visible", default: false to the gcode_macros table.
However, running the migration creates a LOT of changes to my schema, mostly just moving indexes from outside a create_table block, into it.
I'm quite confused over what's going on here. This isn't something that happened all of a sudden; I've just been working around it for a while now.
Any help would be greatly appreciated!
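The answer is that this is just how the schema dumper works in Rails. It reads the schema from the database itself, regardless of how you created the structure in the first place, whether with migrations or direct SQL statements. The dump format also changed in Rails 5, which writes indexes as t.index lines inside the create_table block instead of add_index calls after it, so the first dump after the upgrade moves every index.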
So when you create a new migration or change anything in the db a new schema is dumped based on the database.
Edit
I should add that schema.rb is not updated if the db is changed directly with sql statements, that is not through a migration. Only when either
rake db:migrate
or ...
rake db:schema:dump
are run is the schema.rb file updated.

How to know that a rake task has been run in rails?

I've a migration in my Rails app, that I'd like to run only if a particular rake task has been run, otherwise I'll lose a bunch of data. Following is something that I'd like to do:
if has_rake_task_been_run?
  remove_column :transactions, :paid_by
end
So far I couldn't find any way to do this other than checking manually. Is there any workaround?
Using a rake task for data migration is an extremely risky idea. A couple of reasons not to do this:
Even if you manage to find out whether your rake task has finished, your migration will still be marked as completed and you won't be able to replay it. The only way around this is raising an exception in your migration.
You won't be able to roll back that migration either. If the rake task finishes after the migration has run, the rollback will try to add a column that already exists.
Setting up the database from scratch will become painful as hell for new devs, as they will need to know which rake tasks are to be run when. Not to mention that rake db:migrate executes all migrations.
You're polluting your rake task list with non-reusable tasks.
It seems that what you're doing is just a regular data migration, so everything done by your rake task should in fact be part of your migration. That will even allow you to make the data migration reversible (in the majority of cases).
Note, however, that data migrations are not as simple as regular schema-only migrations. Your migration should be completely independent of your application code (it has to keep working in the future, even after the migrated model has been removed from your codebase), so it is common practice to redefine the models you are about to use inside the migration (only the bits required for the migration). This is not as simple as it sounds, unfortunately, and honestly I am still looking for a perfect solution. The best I've seen so far is simple (I'm assuming that paid_by used to be a string and you changed it to paid_by_id, which references the user):
class YOURMIGRATIONNAME < ActiveRecord::Migration
  # Migration-local models, so the migration does not depend on app/models
  class Transaction < ActiveRecord::Base
    belongs_to :paid_by, class_name: "User"
  end
  class User < ActiveRecord::Base
  end
  def up
    add_column :transactions, :paid_by_id, :integer
    Transaction.reset_column_information
    Transaction.transaction do # for speed
      Transaction.find_each do |t|
        # Look the user up by the old string column and store its id
        user = User.find_by(username: t[:paid_by])
        t.paid_by_id = user && user.id
        t.save! # Always banged save in migration!
      end
    end
    remove_column :transactions, :paid_by
  end
  def down
    add_column :transactions, :paid_by, :string
    Transaction.reset_column_information
    Transaction.transaction do
      Transaction.find_each do |t|
        t[:paid_by] = t.paid_by && t.paid_by.username
        t.save!
      end
    end
    remove_column :transactions, :paid_by_id
  end
end
The only downside of the code above is that it won't work well if any of those models uses STI (I made that mistake once, and it took a while to find out what was wrong). The workaround is to define the models outside of the migration class, but then those classes are available across all migrations and can be affected by your actual model code (especially in production, where all the models are preloaded). In short, data migration with STI is something I am still looking into at the moment.
In case somebody comes here: we successfully used the after_party Rails library. With a simple mechanism, the library keeps track of the rake tasks that have been executed, which makes it easy to perform migration tasks.
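To make the "keep track of executed tasks" idea concrete, here is a minimal sketch in plain Ruby. It is not after_party's API: TaskLedger, run_once and the text file it writes to are hypothetical names invented for illustration, and in a real Rails app the record would more plausibly live in a database table, much like schema_migrations.

```ruby
# Hypothetical ledger that remembers which one-off tasks have completed.
# Task names are stored one per line in a plain text file.
class TaskLedger
  def initialize(path)
    @path = path
  end

  # True if the named task was recorded as completed earlier.
  def completed?(task_name)
    return false unless File.exist?(@path)
    File.readlines(@path, chomp: true).include?(task_name)
  end

  # Runs the block only if the task has not completed before. The task is
  # recorded only after the block returns without raising, so a failed run
  # can be retried. Returns true if the block ran, false if it was skipped.
  def run_once(task_name)
    return false if completed?(task_name)
    yield
    File.open(@path, "a") { |file| file.puts(task_name) }
    true
  end
end
```

With something along these lines, the data task would be wrapped in run_once, and the destructive migration could raise unless completed? returns true for that task name, which addresses the has_rake_task_been_run? check from the question.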

How does the migration work in Rails?

This is a newbie question. Is the migration (rake db:migrate) feature something that one would use during development, or is it strictly a production tool for bringing the database up to date?
Ideally you want to use migrations only during development, and then load schema and seed the database in production. In reality, they'll allow you to make some changes and then deploy to production without any harm done.
Migrations allow you to work in iterations even on your database. You don't have to worry about forgetting to add something. When you start, just create the table as you think it's right, and you can fix it with another migration later. That's basically the idea. It takes away the one db script rules them all kind of thing.
A little example, if you have a User model with username and password and you need to add an email field, simply do this
rails generate migration AddEmailToUser # this is a convention, but you can name it however you want
class AddEmailToUser < ActiveRecord::Migration
  def change
    add_column :users, :email, :string
  end
end
The change method works both ways: when you apply the migration, and also when you need to revert it. It's a neat bit of Rails 3.1 magic.
The old version of migrations would look like this
class AddEmailToUser < ActiveRecord::Migration
  def up
    add_column :users, :email, :string
  end
  def down
    remove_column :users, :email
  end
end
Once you've added the migration, just run rake db:migrate and everything should work just fine. A big advantage of migrations is that if you manually mess up your database, you can easily just do
rake db:drop
rake db:create
rake db:migrate
or
rake db:migrate:reset # this might not work if you messed up your migrations
And you have the correct version of the database created.
Migrations keep track of changes in your database schema. All changes to it (renaming columns, changing tables, adding indexes, etc.) should be done via migrations. Thanks to that, it is really easy to deploy changes across multiple production servers.

Use db:migrate:redo while properly handling exceptions from schema changes

I have a migration that's breaking in the middle of a couple of schema changes. When it breaks an exception is thrown and rake db:migrate exits, leaving my database in a half-migrated state.
How can I set it up so that the migration automatically reverts just the changes that have run so far? I'd like to do so globally when in development mode. Surely someone out there has a better way than embedding each successive AR::Migration::ClassMethod in a begin; rescue =>e opposite_action; end block.
Perhaps a common example is in order:
#2010010100000000_made_a_typo.rb
class MadeATypo < ActiveRecord::Migration
  def self.up
    rename_column :birds, :url, :photo_file_name
    rename_column :birds, :genius, :species #typo on :genius => :genus
  end
  def self.down
    rename_column :birds, :photo_file_name, :url
    rename_column :birds, :species, :genius
  end
end
This migration will fail on the second line with "column genius not found", but not record the migration number in the schema_migrations table. I'd like it if it called
rename_column :birds, :photo_file_name, :url #this is a revert of the first line
before the exception was passed out of MadeATypo.up.
Responses to comments:
I understand that mysql might not have support for DDL transactions, I'm looking for a more application-level solution which (probably) uses AR::Migration itself. Surely someone has created a plugin which captures method calls to the main AR:M:ClassMethods and can rewind them in most cases if an exception occurs during a migration.
I don't have a solution, but the main problem is that DDL statements can't be wrapped in a transaction (at least in MySQL; I don't know if that's a general thing).
So, because migrations are treated as atomic up/down actions, there's no easy way to undo "half" a migration: it's not easy to work out which parts of the down correspond to which parts of the up.
I don't know of any way to do this currently, but the code for reversible migrations coming to Rails 3.1 looks like a good base to build this feature on. See these links:
http://edgerails.info/articles/what-s-new-in-edge-rails/2011/05/06/reversible-migrations/index.html
https://github.com/rails/rails/commit/47017bd1697d6b4d6780356a403f91536eacd689
https://github.com/rails/rails/blob/master/activerecord/lib/active_record/migration/command_recorder.rb
The way I'm thinking you could implement this is to run the migration against the recorder (as in https://github.com/rails/rails/commit/47017bd1697d6b4d6780356a403f91536eacd689#L0R337), then switch back to the live connection and run the recorder forward one command at a time, tracking how many have completed. All you need, then, is to wrap that forward running loop in a begin-rescue-end that executes as many inverse commands as you successfully executed forward commands, if there's an exception.
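The record-then-rewind idea can be sketched in plain Ruby, without Rails. Everything here is a stand-in invented for illustration: FakeSchema is a toy in-memory "database", and INVERSES and run_with_rewind are hypothetical names, not part of ActiveRecord or its CommandRecorder. The sketch only shows the control flow: run commands one at a time, remember which ones completed, and on an exception apply their inverses in reverse order before re-raising.

```ruby
# Toy in-memory "database": table name => array of column names.
# A stand-in for the live connection; not a Rails class.
class FakeSchema
  attr_reader :tables

  def initialize(tables)
    @tables = tables
  end

  def rename_column(table, from, to)
    columns = @tables.fetch(table)
    index = columns.index(from)
    raise ArgumentError, "column #{from} not found" if index.nil?
    columns[index] = to
  end
end

# Maps a recorded command to the command that undoes it.
INVERSES = {
  rename_column: ->(table, from, to) { [:rename_column, table, to, from] }
}.freeze

# Runs the commands forward one at a time. If one fails, undoes the commands
# that already succeeded (most recent first) and re-raises the original error.
def run_with_rewind(schema, commands)
  completed = []
  commands.each do |name, *args|
    schema.public_send(name, *args)
    completed << [name, *args]
  end
rescue StandardError
  completed.reverse_each do |name, *args|
    inverse_name, *inverse_args = INVERSES.fetch(name).call(*args)
    schema.public_send(inverse_name, *inverse_args)
  end
  raise
end

# Usage mirroring the MadeATypo migration: the second rename fails,
# so the first rename is undone before the error propagates.
schema = FakeSchema.new({ birds: [:url, :genus] })
begin
  run_with_rewind(schema, [
    [:rename_column, :birds, :url, :photo_file_name],
    [:rename_column, :birds, :genius, :species]
  ])
rescue ArgumentError => e
  puts "migration failed: #{e.message}"
end
puts schema.tables[:birds].inspect
```

A real implementation would still have to deal with commands that have no inverse (dropping a column loses its data), which is the same limitation reversible migrations have.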
