Migration critique - ruby-on-rails

Working on a 3.0 rails project for a client and I have a sensitive migration that I need to run off a live production server. Its essentially suppose to down case all the State abbreviations in the DB, FL -> fl, PA -> pa etc...
I can't test locally due to restrictions:
does calling the wording of the migration effect anything? I know it does with add and create etc but not sure when updating info like this.
rails g migration UpdateStateAbbreviation
def self.up
say_with_time "Updating states abbreviation..." do
State.find(:all).each do |s|
tmp = s.abbreviation.downcase
s.update_attribute :abbreviation, tmp
end
end end
Rake db:migrate

One very important rule with migrations is to never reference models in your migrations. This might seem like an academic concern, but at some point in the future you may not have a State model at all, and when you remove app/models/state.rb then this migration will not work.
A properly constructed migration will execute properly regardless of changes in the future. Whatever it does may be later un-done, there's nothing wrong with that, but setting it up for failure is never a good idea.
You can do this downcasing operation in your database using a string function and something like:
execute "UPDATE states SET abbreviation=LOWER(abbreviation)"
Using models in the migration causes all sorts of problems. This goes for using your model to pre-populate certain key records as well. Use seeds.rb if you must, or even better, a rake task to do it for you.
As a note, if you can't test locally you have a very flawed development process. You should always run and test your migrations, both up and down where applicable, to ensure that they work correctly. Where you can't get actual production data for reasons of security or privacy, work with your DBA to get a scrubbed, non-sensitive version for testing purposes. State names should not be confidential, for instance.

Related

Best practise to populate somedata in a existing database in rails

Problem statement:
Let's say I created a new column column_1 in a table table_1 through rails migration. And now I want to populate the data into column_1 doing some computation but it's a one time task.
Approaches:
Is it preferred to do it through migration. or should I add it in seeds and then populate it and remove it back again
or is there any other best practise.
Generally, we use Rails Migrations to migrate our application’s schema, and Rake tasks to migrate our production data.
There have only been a few cases where we have used Rails Migrations to ensure that a data migration took place as a part of the deployment.
In all other cases, using a Rake task provides us with more flexibility, and less maintenance.
See detailed explanation here
Even though there are different approaches, it is better to do that in in Rake or Migrations but not seed.
Rake tasks:
Rake tasks are generally preferred to do maintenance or data migration jobs over a collection of data.
Example of rake:
lib/tasks/addData.rake
desc "TODO"
task :my_task1 => :environment do
User.update_all(status: 'active')
end
Example of doing it in migration:
If you add status field to user:
class AddStatusToUser < ActiveRecord::Migration
def up
add_column :users, :status, :string
User.update_all(status: 'active')
end
def down
remove_column :users, :status
end
end
Why Not seeds:
Seeds are generally like database Dump, seed files are used to fill the data into Database for the first time when the application is started. So that the application is kickstarted with different data.
Examples are Application settings, Application Configurations, Admin users etc.
This is actually an opinion based question
Is it preferred to do it through migration. or should I add it in seeds and then populate it and remove it back again
It depends on how long it's gonna take.
1. Migration:
If it's one-time task go with migration but only if the task is going to run for few minutes.
2. Rake Task:
If the task is one-time but it might take a few hours it should be a rake task, not a migration.
One time task ? Definitely will go with migration as it will only executed when the migration take place. And won't be executed afterwards
Rake task is considered as too much, since you will need it to run only once. The task script remain there until you decide to remove it. But, it is totally doesn't make sense ( to build something which will be removed in the near future, unless, for testing purposes in some special cases )
If you're asking about best practices, people will tend to have different approach. It will depend on each case that we are trying to solve. But, in common, there are some cases which are shareable.

How to keep overview of the migrations?

I have a question regarding my migrations in rails.
Normally when i want to add a colum to a model for i dont make extra migrations but instead i perform this steps:
rake db:rollback
next i change the migration file in db/migrations and rerune:
rake db:migrate
The biggest problem is that when i do this i loose my data.
Previous i wrote migrations from the command line with for example
rake g migration Add_Column_House_to_Users house:string
The problem with this approach is that my db/migrations folder afterwards get very large and not very clear! I mean at the end i dont know wich variables the object has! Im not an expert in rails and would like to ask you how to keep the overview over the migrations!Thanks
Just a minor thought - I just use the file db/migrate/schema.rb to determine whats in the database as opposed to tracking through the migrations
You definitely shouldn't use db:rollback with a table with existing data.
I have a few production RonR apps with a ton of data and there are 100+ entries in the migrations table and adding new migrations to tweak tables is the rails way to do things. Not sure what you mean by lucid, but your schema and data model are going to change over time and that is ok and expected.
One tip. The migrations are great, but they are just the beginning, you can include complex logic as needed to fix your existing data (like so)
Changing data in existing table:
def up
add_column :rsvps, :column_name_id, :integer
update_data
end
def update_data
rsvps = Rsvp.where("other_column is not null")
rsvps.each do |rsvp|
invite = Blah.find(rsvp.example_id)
...
rsvp.save
end
end
Another tip: backup your production database often (should do this anyway), but use it to test all of your migrations before deploying. I run scripts like this all the time for local testing:
mysql -u root -ppassword
drop database mydatabase_dev;
create database mydatabase_dev;
use mydatabase_dev;
source /var/www/bak/mydatabase_backup_2013-10-04-16.28.06.sql
exit
rake db:migrate

Factory Girl missing methods with schema fiddling?

I made a standard Active::Record class with a user_id field on it, and sure enough, the requirements changed. Turns out I needed to nix the 'user_id' column, and put on an 'email' column instead. I figured I would just manually replace the field in the migration file and then do a quick 'rake db:migrate down/up' on that version to make the update. I even ended up updating db/schema.rb as well.
No Dice:
Failure/Error: ss = Factory(:model)
undefined method `email=' for #<Model:0xb2cc288>
I can use email getter/setter methods in the console, or in the service fine -- but factory girl doesn't seem to get the hint. Is it caching a method list somewhere? (Hint: mysql table looks good too)
Firstly, in the future, try really hard not to change existing migration files. You will mess up more than just FactoryGirl, namely you will mess up your teammates' setups. Requirements change and that's OK, you just make a new migration that drops the user_id column and adds an e-mail column. This way, the evolution of the data model will be recorded in the migrations.
For your specific problem, though, you probably want to run rake db:test:prepare or, if that doesn't work, drop your test database completely and rebuild it.

Rails: Is the version number in 'schema.rb' used for anything?

Now that Rails has timestamped migrations, the single version number at the top of /db/schema.rb seems pointless. Sometimes the version number ends up incorrect when dealing with multiple developers or multiple branches.
Does Rails even utilize that :version parameter anymore?
And is there any harm in it being incorrect (as in: it doesn't reflect the timestamp of most recently applied commit)?
Example:
ActiveRecord::Schema.define(:version => 20100417022947) do
# schema definition ...
end
Actually, the version is much more important than this. The code you've cited is actually only a small part of what assume_migrated_upto_version does. The real effect of the migration version is that all prior migrations (as found in the db/migrate directory) are assumed to have been run. (So yes, it does what the function name suggests.)
This has some interesting implications, particularly in the case where multiple people commit new migrations at the same time.
If you version your schema.rb, which is what the Rails team recommends, you're okay. You're 100% guaranteed to have a conflict (the schema version), and the committing/merging user has to resolve it, by merging their changes and setting the :version to the highest of the two. Hopefully they do this merge correctly.
Some projects choose to avoid this continual conflict issue by keeping the schema.rb out of version control. They might rely solely on migrations, or keep a separate version-controlled copy of the schema that they occasionally update.
The problem occurs if someone creates a migration with a timestamp prior to your schema.rb's :version. If you db:migrate, you'll apply their migration, your schema.rb will be updated (but retain the same, higher :version), and everything is fine. But if you should happen to db:schema:load (or db:reset) instead, you'll not only be missing their migration, but assume_migrated_upto_version will mark their migration as having been applied.
The best solution at this point is probably to require that users re-timestamp their migrations to the time of their merge.
Ideally, I would prefer if schema.rb actually contained a list of applied migration numbers rather than an assume-up-to-here :version. But I doubt this will happen -- the Rails team seems to believe the problem is adequately solved by checking in the schema.rb file.
I decided to investigate myself. It turns out that because of the timestamped migrations, the only thing Rails does with that number is assume that the migration with that particular timestamp has already been applied and thus create the appropriate entry in the schema_migration table if it doesn't exist.
from: /lib/active_record/connection_adapters/abstract/schema_statements.rb
def assume_migrated_upto_version(version, migrations_path = ActiveRecord::Migrator.migrations_path)
# other code ...
unless migrated.include?(version)
execute "INSERT INTO #{sm_table} (version) VALUES ('#{version}')"
end
# ...

Is seeding data with fixtures dangerous in Ruby on Rails

I have fixtures with initial data that needs to reside in my database (countries, regions, carriers, etc.). I have a task rake db:seed that will seed a database.
namespace :db do
desc "Load seed fixtures (from db/fixtures) into the current environment's database."
task :seed => :environment do
require 'active_record/fixtures'
Dir.glob(RAILS_ROOT + '/db/fixtures/yamls/*.yml').each do |file|
Fixtures.create_fixtures('db/fixtures/yamls', File.basename(file, '.*'))
end
end
end
I am a bit worried because this task wipes my database clean and loads the initial data. The fact that this is even possible to do more than once on production scares the crap out of me. Is this normal and do I just have to be cautious? Or do people usually protect a task like this in some way?
Seeding data with fixtures is an extremely bad idea.
Fixtures are not validated and since most Rails developers don't use database constraints this means you can easily get invalid or incomplete data inserted into your production database.
Fixtures also set strange primary key ids by default, which is not necessarily a problem but is annoying to work with.
There are a lot of solutions for this. My personal favorite is a rake task that runs a Ruby script that simply uses ActiveRecord to insert records. This is what Rails 3 will do with db:seed, but you can easily write this yourself.
I complement this with a method I add to ActiveRecord::Base called create_or_update. Using this I can run the seed script multiple times, updating old records instead of throwing an exception.
I wrote an article about these techniques a while back called Loading seed data.
For the first part of your question, yes I'd just put some precaution for running a task like this in production. I put a protection like this in my bootstrapping/seeding task:
task :exit_or_continue_in_production? do
if Rails.env.production?
puts "!!!WARNING!!! This task will DESTROY " +
"your production database and RESET all " +
"application settings"
puts "Continue? y/n"
continue = STDIN.gets.chomp
unless continue == 'y'
puts "Exiting..."
exit!
end
end
end
I have created this gist for some context.
For the second part of the question -- usually you really want two things: a) very easily seeding the database and setting up the application for development, and b) bootstrapping the application on production server (like: inserting admin user, creating folders application depends on, etc).
I'd use fixtures for seeding in development -- everyone from the team then sees the same data in the app and what's in app is consistent with what's in tests. (Usually I wrap rake app:bootstrap, rake app:seed rake gems:install, etc into rake app:install so everyone can work on the app by just cloning the repo and running this one task.)
I'd however never use fixtures for seeding/bootstrapping on production server. Rails' db/seed.rb is really fine for this task, but you can of course put the same logic in your own rake app:seed task, like others pointed out.
Rails 3 will solve this for you using a seed.rb file.
http://github.com/brynary/rails/commit/4932f7b38f72104819022abca0c952ba6f9888cb
We've built up a bunch of best practices that we use for seeding data. We rely heavily on seeding, and we have some unique requirements since we need to seed multi-tenant systems. Here's some best practices we've used:
Fixtures aren't the best solution, but you still should store your seed data in something other than Ruby. Ruby code for storing seed data tends to get repetitive, and storing data in a parseable file means you can write generic code to handle your seeds in a consistent fashion.
If you're going to potentially update seeds, use a marker column named something like code to match your seeds file to your actual data. Never rely on ids being consistent between environments.
Think about how you want to handle updating existing seed data. Is there any potential that users have modified this data? If so, should you maintain the user's information rather than overriding it with seed data?
If you're interested in some of the ways we do seeding, we've packaged them into a gem called SeedOMatic.
How about just deleting the task off your production server once you have seeded the database?
I just had an interesting idea...
what if you created \db\seeds\ and added migration-style files:
file: 200907301234_add_us_states.rb
class AddUsStates < ActiveRecord::Seeds
def up
add_to(:states, [
{:name => 'Wisconsin', :abbreviation => 'WI', :flower => 'someflower'},
{:name => 'Louisiana', :abbreviation => 'LA', :flower => 'cypress tree'}
]
end
end
def down
remove_from(:states).based_on(:name).with_values('Wisconsin', 'Louisiana', ...)
end
end
alternately:
def up
State.create!( :name => ... )
end
This would allow you to run migrations and seeds in an order that would allow them to coexist more peaceably.
thoughts?

Resources