Best practise to populate somedata in a existing database in rails - ruby-on-rails

Problem statement:
Let's say I created a new column column_1 in a table table_1 through rails migration. And now I want to populate the data into column_1 doing some computation but it's a one time task.
Approaches:
Is it preferred to do it through migration. or should I add it in seeds and then populate it and remove it back again
or is there any other best practise.

Generally, we use Rails Migrations to migrate our application’s schema, and Rake tasks to migrate our production data.
There have only been a few cases where we have used Rails Migrations to ensure that a data migration took place as a part of the deployment.
In all other cases, using a Rake task provides us with more flexibility, and less maintenance.
See detailed explanation here

Even though there are different approaches, it is better to do that in in Rake or Migrations but not seed.
Rake tasks:
Rake tasks are generally preferred to do maintenance or data migration jobs over a collection of data.
Example of rake:
lib/tasks/addData.rake
desc "TODO"
task :my_task1 => :environment do
User.update_all(status: 'active')
end
Example of doing it in migration:
If you add status field to user:
class AddStatusToUser < ActiveRecord::Migration
def up
add_column :users, :status, :string
User.update_all(status: 'active')
end
def down
remove_column :users, :status
end
end
Why Not seeds:
Seeds are generally like database Dump, seed files are used to fill the data into Database for the first time when the application is started. So that the application is kickstarted with different data.
Examples are Application settings, Application Configurations, Admin users etc.

This is actually an opinion based question
Is it preferred to do it through migration. or should I add it in seeds and then populate it and remove it back again
It depends on how long it's gonna take.
1. Migration:
If it's one-time task go with migration but only if the task is going to run for few minutes.
2. Rake Task:
If the task is one-time but it might take a few hours it should be a rake task, not a migration.

One time task ? Definitely will go with migration as it will only executed when the migration take place. And won't be executed afterwards
Rake task is considered as too much, since you will need it to run only once. The task script remain there until you decide to remove it. But, it is totally doesn't make sense ( to build something which will be removed in the near future, unless, for testing purposes in some special cases )
If you're asking about best practices, people will tend to have different approach. It will depend on each case that we are trying to solve. But, in common, there are some cases which are shareable.

Related

How to know that a rake task has been run in rails?

I've a migration in my Rails app, that I'd like to run only if a particular rake task has been run, otherwise I'll lose a bunch of data. Following is something that I'd like to do:
if has_rake_task_been_run?
remove_column :transactions, :paid_by
end
Currently, I couldn't find anyway, instead of assuring this thing manually. Is there any work around for it?
Using rake task for data migration is an extremely risky idea. Couple of reasons not to do this:
Even if you manage to find out whether your rake task has finished or not, your migration will still be marked as completed and you won't be able to replay it. Only way around is raising an exception in your migration.
No, you won't be able to rollback that migration neither. If rake task finishes after the migration has run, rollback will try to add already existing column.
Setting up your database from scratch by new devs will become painful as hell, as they will need to know which rake tasks are to be run when. Not to mentioned that rake db:migrate executes all migrations.
You're polluting your rake task list with non-reusable tasks
It seems that what you're doing is just a regular data migration, so all the stuff done by your rake task should be in fact part of your migration. That will even allow you to make a reversible data migration (in majority of cases).
Note however that data migrations are not that simple as regular scheme-only migrations. Because your migration should be completely independent on your code (as they are to work in the future, even when migrated model is completely removed from your codebase), so it is a common practice to redefine the models you are about to use in your migrations (only the bits required fro the migration). This is not that simple as it sounds, unfortunately, and honestly I am still looking for a perfect solution to that. The best I've seen so far is simple (I'm assuming that paid_by used to be string and you changed it paid_by_id, which references the user):
class YOURMIGRATIONNAME < ActiveRecord::Migration
class Transaction < ActiveRecord::Base
belongs_to :paid_by, class_name: "User"
end
class User < ActiveRecord::Base
end
def up
add_column :transaction, :paid_by_id, :integer
Transaction.transaction do # for speed
Transaction.find_each do |t|
t.paid_by_id = User.find_by(username: t[:paid_by])
t.save! # Always banged save in migration!
end
end
remove_column :paid_by
end
def down
add_column :transaction, :paid_by, :string
Transactions.transaction do
Transaction.find_each do |t|
t[:paid_by] = t.paid_by && t.paid_by.username
t.save!
end
end
remove_column :transactions, :paid_by_id
end
The only downfall of using the code above is that it won't work well if any of those models is using STI (I've made that mistake once, took a while to find out what's wrong). The work around is to define it outside of the migration class, but then those classes are available across all migrations and can be affected with your actual model code (especially in production when all the models are preloaded). In short, data migration with STI is something I am still looking into at the moment.
In case somebody would come here, We succcessfully used after_party rails library. With a simple mechanism, the library maintains rake tasks that have been executed and it becomes easy to perform migration tasks.

How to keep overview of the migrations?

I have a question regarding my migrations in rails.
Normally when i want to add a colum to a model for i dont make extra migrations but instead i perform this steps:
rake db:rollback
next i change the migration file in db/migrations and rerune:
rake db:migrate
The biggest problem is that when i do this i loose my data.
Previous i wrote migrations from the command line with for example
rake g migration Add_Column_House_to_Users house:string
The problem with this approach is that my db/migrations folder afterwards get very large and not very clear! I mean at the end i dont know wich variables the object has! Im not an expert in rails and would like to ask you how to keep the overview over the migrations!Thanks
Just a minor thought - I just use the file db/migrate/schema.rb to determine whats in the database as opposed to tracking through the migrations
You definitely shouldn't use db:rollback with a table with existing data.
I have a few production RonR apps with a ton of data and there are 100+ entries in the migrations table and adding new migrations to tweak tables is the rails way to do things. Not sure what you mean by lucid, but your schema and data model are going to change over time and that is ok and expected.
One tip. The migrations are great, but they are just the beginning, you can include complex logic as needed to fix your existing data (like so)
Changing data in existing table:
def up
add_column :rsvps, :column_name_id, :integer
update_data
end
def update_data
rsvps = Rsvp.where("other_column is not null")
rsvps.each do |rsvp|
invite = Blah.find(rsvp.example_id)
...
rsvp.save
end
end
Another tip: backup your production database often (should do this anyway), but use it to test all of your migrations before deploying. I run scripts like this all the time for local testing:
mysql -u root -ppassword
drop database mydatabase_dev;
create database mydatabase_dev;
use mydatabase_dev;
source /var/www/bak/mydatabase_backup_2013-10-04-16.28.06.sql
exit
rake db:migrate

Migration critique

Working on a 3.0 rails project for a client and I have a sensitive migration that I need to run off a live production server. Its essentially suppose to down case all the State abbreviations in the DB, FL -> fl, PA -> pa etc...
I can't test locally due to restrictions:
does calling the wording of the migration effect anything? I know it does with add and create etc but not sure when updating info like this.
rails g migration UpdateStateAbbreviation
def self.up
say_with_time "Updating states abbreviation..." do
State.find(:all).each do |s|
tmp = s.abbreviation.downcase
s.update_attribute :abbreviation, tmp
end
end end
Rake db:migrate
One very important rule with migrations is to never reference models in your migrations. This might seem like an academic concern, but at some point in the future you may not have a State model at all, and when you remove app/models/state.rb then this migration will not work.
A properly constructed migration will execute properly regardless of changes in the future. Whatever it does may be later un-done, there's nothing wrong with that, but setting it up for failure is never a good idea.
You can do this downcasing operation in your database using a string function and something like:
execute "UPDATE states SET abbreviation=LOWER(abbreviation)"
Using models in the migration causes all sorts of problems. This goes for using your model to pre-populate certain key records as well. Use seeds.rb if you must, or even better, a rake task to do it for you.
As a note, if you can't test locally you have a very flawed development process. You should always run and test your migrations, both up and down where applicable, to ensure that they work correctly. Where you can't get actual production data for reasons of security or privacy, work with your DBA to get a scrubbed, non-sensitive version for testing purposes. State names should not be confidential, for instance.

Create Sequence In Migration Not Reflected In Schema

I have an application that requires a sequence to be present in the database. I have a migration that does the following:
class CreateSequence < ActiveRecord::Migration
def self.up
execute "CREATE SEQUENCE sequence"
end
def self.down
execute "DROP SEQUENCE sequence"
end
end
This does not modify the schema.rb and thus breaks rake db:setup. How can I force the schema to include the sequence?
Note: The sequence exists after running rake db:migrate.
Rails Migrations because they aim toward a schema of tables and fields, instead of a complete database representation including stored procedures, functions, seed data.
When you run rake db:setup, this will create the db, load the schema and then load the seed data.
A few solutions for you to consider:
Choice 1: create your own rake task that does these migrations independent of the Rails Migration up/down. Rails Migrations are just normal classes, and you can make use of them however you like. For example:
rake db:create_sequence
Choice 2: run your specific migration after you load the schema like this:
rake db:setup
rake db:migrate:up VERSION=20080906120000
Choice 3: create your sequence as seed data, because it's essentially providing data (rather than altering the schema).
db/seeds.rb
Choice 4 and my personal preference: run the migrations up to a known good point, including your sequence, and save that blank database. Change rake db:setup to clone that blank database. This is a bit trickier and it sacrifices some capabilities - having all migrations be reversible, having migrations work on top of multiple database vendors, etc. In my experience these are fine tradeoffs. For example:
rake db:fresh #=> clones the blank database, which you store in version control
All the above suggestions are good. however, I think I found a better solution. basically in your development.rb put
config.active_record.schema_format = :sql
For more info see my answer to this issue -
rake test not copying development postgres db with sequences
Check out the pg_sequencer gem. It manages Pg sequences for you as you wish. The one flaw that I can see right now is that it doesn't play nicely with db/schema.rb -- Rails will generate a CREATE SEQUENCE for your tables with a serial field, and pg_sequencer will also generate a sequence itself. (Working to fix that.)

Is seeding data with fixtures dangerous in Ruby on Rails

I have fixtures with initial data that needs to reside in my database (countries, regions, carriers, etc.). I have a task rake db:seed that will seed a database.
namespace :db do
desc "Load seed fixtures (from db/fixtures) into the current environment's database."
task :seed => :environment do
require 'active_record/fixtures'
Dir.glob(RAILS_ROOT + '/db/fixtures/yamls/*.yml').each do |file|
Fixtures.create_fixtures('db/fixtures/yamls', File.basename(file, '.*'))
end
end
end
I am a bit worried because this task wipes my database clean and loads the initial data. The fact that this is even possible to do more than once on production scares the crap out of me. Is this normal and do I just have to be cautious? Or do people usually protect a task like this in some way?
Seeding data with fixtures is an extremely bad idea.
Fixtures are not validated and since most Rails developers don't use database constraints this means you can easily get invalid or incomplete data inserted into your production database.
Fixtures also set strange primary key ids by default, which is not necessarily a problem but is annoying to work with.
There are a lot of solutions for this. My personal favorite is a rake task that runs a Ruby script that simply uses ActiveRecord to insert records. This is what Rails 3 will do with db:seed, but you can easily write this yourself.
I complement this with a method I add to ActiveRecord::Base called create_or_update. Using this I can run the seed script multiple times, updating old records instead of throwing an exception.
I wrote an article about these techniques a while back called Loading seed data.
For the first part of your question, yes I'd just put some precaution for running a task like this in production. I put a protection like this in my bootstrapping/seeding task:
task :exit_or_continue_in_production? do
if Rails.env.production?
puts "!!!WARNING!!! This task will DESTROY " +
"your production database and RESET all " +
"application settings"
puts "Continue? y/n"
continue = STDIN.gets.chomp
unless continue == 'y'
puts "Exiting..."
exit!
end
end
end
I have created this gist for some context.
For the second part of the question -- usually you really want two things: a) very easily seeding the database and setting up the application for development, and b) bootstrapping the application on production server (like: inserting admin user, creating folders application depends on, etc).
I'd use fixtures for seeding in development -- everyone from the team then sees the same data in the app and what's in app is consistent with what's in tests. (Usually I wrap rake app:bootstrap, rake app:seed rake gems:install, etc into rake app:install so everyone can work on the app by just cloning the repo and running this one task.)
I'd however never use fixtures for seeding/bootstrapping on production server. Rails' db/seed.rb is really fine for this task, but you can of course put the same logic in your own rake app:seed task, like others pointed out.
Rails 3 will solve this for you using a seed.rb file.
http://github.com/brynary/rails/commit/4932f7b38f72104819022abca0c952ba6f9888cb
We've built up a bunch of best practices that we use for seeding data. We rely heavily on seeding, and we have some unique requirements since we need to seed multi-tenant systems. Here's some best practices we've used:
Fixtures aren't the best solution, but you still should store your seed data in something other than Ruby. Ruby code for storing seed data tends to get repetitive, and storing data in a parseable file means you can write generic code to handle your seeds in a consistent fashion.
If you're going to potentially update seeds, use a marker column named something like code to match your seeds file to your actual data. Never rely on ids being consistent between environments.
Think about how you want to handle updating existing seed data. Is there any potential that users have modified this data? If so, should you maintain the user's information rather than overriding it with seed data?
If you're interested in some of the ways we do seeding, we've packaged them into a gem called SeedOMatic.
How about just deleting the task off your production server once you have seeded the database?
I just had an interesting idea...
what if you created \db\seeds\ and added migration-style files:
file: 200907301234_add_us_states.rb
class AddUsStates < ActiveRecord::Seeds
def up
add_to(:states, [
{:name => 'Wisconsin', :abbreviation => 'WI', :flower => 'someflower'},
{:name => 'Louisiana', :abbreviation => 'LA', :flower => 'cypress tree'}
]
end
end
def down
remove_from(:states).based_on(:name).with_values('Wisconsin', 'Louisiana', ...)
end
end
alternately:
def up
State.create!( :name => ... )
end
This would allow you to run migrations and seeds in an order that would allow them to coexist more peaceably.
thoughts?

Resources