I have fixtures with initial data that needs to reside in my database (countries, regions, carriers, etc.). I have a task rake db:seed that will seed a database.
namespace :db do
desc "Load seed fixtures (from db/fixtures) into the current environment's database."
task :seed => :environment do
require 'active_record/fixtures'
Dir.glob(RAILS_ROOT + '/db/fixtures/yamls/*.yml').each do |file|
Fixtures.create_fixtures('db/fixtures/yamls', File.basename(file, '.*'))
end
end
end
I am a bit worried because this task wipes my database clean and loads the initial data. The fact that this is even possible to do more than once on production scares the crap out of me. Is this normal and do I just have to be cautious? Or do people usually protect a task like this in some way?
Seeding data with fixtures is an extremely bad idea.
Fixtures are not validated and since most Rails developers don't use database constraints this means you can easily get invalid or incomplete data inserted into your production database.
Fixtures also set strange primary key ids by default, which is not necessarily a problem but is annoying to work with.
There are a lot of solutions for this. My personal favorite is a rake task that runs a Ruby script that simply uses ActiveRecord to insert records. This is what Rails 3 will do with db:seed, but you can easily write this yourself.
I complement this with a method I add to ActiveRecord::Base called create_or_update. Using this I can run the seed script multiple times, updating old records instead of throwing an exception.
I wrote an article about these techniques a while back called Loading seed data.
For the first part of your question, yes I'd just put some precaution for running a task like this in production. I put a protection like this in my bootstrapping/seeding task:
task :exit_or_continue_in_production? do
if Rails.env.production?
puts "!!!WARNING!!! This task will DESTROY " +
"your production database and RESET all " +
"application settings"
puts "Continue? y/n"
continue = STDIN.gets.chomp
unless continue == 'y'
puts "Exiting..."
exit!
end
end
end
I have created this gist for some context.
For the second part of the question -- usually you really want two things: a) very easily seeding the database and setting up the application for development, and b) bootstrapping the application on production server (like: inserting admin user, creating folders application depends on, etc).
I'd use fixtures for seeding in development -- everyone from the team then sees the same data in the app and what's in app is consistent with what's in tests. (Usually I wrap rake app:bootstrap, rake app:seed rake gems:install, etc into rake app:install so everyone can work on the app by just cloning the repo and running this one task.)
I'd however never use fixtures for seeding/bootstrapping on production server. Rails' db/seed.rb is really fine for this task, but you can of course put the same logic in your own rake app:seed task, like others pointed out.
Rails 3 will solve this for you using a seed.rb file.
http://github.com/brynary/rails/commit/4932f7b38f72104819022abca0c952ba6f9888cb
We've built up a bunch of best practices that we use for seeding data. We rely heavily on seeding, and we have some unique requirements since we need to seed multi-tenant systems. Here's some best practices we've used:
Fixtures aren't the best solution, but you still should store your seed data in something other than Ruby. Ruby code for storing seed data tends to get repetitive, and storing data in a parseable file means you can write generic code to handle your seeds in a consistent fashion.
If you're going to potentially update seeds, use a marker column named something like code to match your seeds file to your actual data. Never rely on ids being consistent between environments.
Think about how you want to handle updating existing seed data. Is there any potential that users have modified this data? If so, should you maintain the user's information rather than overriding it with seed data?
If you're interested in some of the ways we do seeding, we've packaged them into a gem called SeedOMatic.
How about just deleting the task off your production server once you have seeded the database?
I just had an interesting idea...
what if you created \db\seeds\ and added migration-style files:
file: 200907301234_add_us_states.rb
class AddUsStates < ActiveRecord::Seeds
def up
add_to(:states, [
{:name => 'Wisconsin', :abbreviation => 'WI', :flower => 'someflower'},
{:name => 'Louisiana', :abbreviation => 'LA', :flower => 'cypress tree'}
]
end
end
def down
remove_from(:states).based_on(:name).with_values('Wisconsin', 'Louisiana', ...)
end
end
alternately:
def up
State.create!( :name => ... )
end
This would allow you to run migrations and seeds in an order that would allow them to coexist more peaceably.
thoughts?
Related
Problem statement:
Let's say I created a new column column_1 in a table table_1 through rails migration. And now I want to populate the data into column_1 doing some computation but it's a one time task.
Approaches:
Is it preferred to do it through migration. or should I add it in seeds and then populate it and remove it back again
or is there any other best practise.
Generally, we use Rails Migrations to migrate our application’s schema, and Rake tasks to migrate our production data.
There have only been a few cases where we have used Rails Migrations to ensure that a data migration took place as a part of the deployment.
In all other cases, using a Rake task provides us with more flexibility, and less maintenance.
See detailed explanation here
Even though there are different approaches, it is better to do that in in Rake or Migrations but not seed.
Rake tasks:
Rake tasks are generally preferred to do maintenance or data migration jobs over a collection of data.
Example of rake:
lib/tasks/addData.rake
desc "TODO"
task :my_task1 => :environment do
User.update_all(status: 'active')
end
Example of doing it in migration:
If you add status field to user:
class AddStatusToUser < ActiveRecord::Migration
def up
add_column :users, :status, :string
User.update_all(status: 'active')
end
def down
remove_column :users, :status
end
end
Why Not seeds:
Seeds are generally like database Dump, seed files are used to fill the data into Database for the first time when the application is started. So that the application is kickstarted with different data.
Examples are Application settings, Application Configurations, Admin users etc.
This is actually an opinion based question
Is it preferred to do it through migration. or should I add it in seeds and then populate it and remove it back again
It depends on how long it's gonna take.
1. Migration:
If it's one-time task go with migration but only if the task is going to run for few minutes.
2. Rake Task:
If the task is one-time but it might take a few hours it should be a rake task, not a migration.
One time task ? Definitely will go with migration as it will only executed when the migration take place. And won't be executed afterwards
Rake task is considered as too much, since you will need it to run only once. The task script remain there until you decide to remove it. But, it is totally doesn't make sense ( to build something which will be removed in the near future, unless, for testing purposes in some special cases )
If you're asking about best practices, people will tend to have different approach. It will depend on each case that we are trying to solve. But, in common, there are some cases which are shareable.
I need to populate my production database app with data in particular tables. This is before anyone ever even touches the application. This data would also be required in development mode as it's required for testing against. Fixtures are normally the way to go for testing data, but what's the "best practice" for Ruby on Rails to ship this data to the live database also upon db creation?
ultimately this is a two part question I suppose.
1) What's the best way to load test data into my database for development, this will be roughly 1,000 items. Is it through a migration or through fixtures? The reason this is a different answer than the question below is that in development, there's certain fields in the tables that I'd like to make random. In production, these fields would all start with the same value of 0.
2) What's the best way to bootstrap a production db with live data I need in it, is this also through a migration or fixture?
I think the answer is to seed as described here: http://lptf.blogspot.com/2009/09/seed-data-in-rails-234.html but I need a way to seed for development and seed for production. Also, why bother using Fixtures if seeding is available? When does one seed and when does one use fixtures?
Usually fixtures are used to provide your tests with data, not to populate data into your database. You can - and some people have, like the links you point to - use fixtures for this purpose.
Fixtures are OK, but using Ruby gives us some advantages: for example, being able to read from a CSV file and populate records based on that data set. Or reading from a YAML fixture file if you really want to: since your starting with a programming language your options are wide open from there.
My current team tried to use db/seed.rb, and checking RAILS_ENV to load only certain data in certain places.
The annoying thing about db:seed is that it's meant to be a one shot thing: so if you have additional items to add in the middle of development - or when your app has hit production - ... well, you need to take that into consideration (ActiveRecord's find_or_create_by...() method might be your friend here).
We tried the Bootstrapper plugin, which puts a nice DSL over the RAILS_ENV checking, and lets your run only the environment you want. It's pretty nice.
Our needs actually went beyond that - we found we needed database style migrations for our seed data. Right now we are putting normal Ruby scripts into a folder (db/bootstrapdata/) and running these scripts with Arild Shirazi's required gem to load (and thus run) the scripts in this directory.
Now this only gives you part of the database style migrations. It's not hard to go from this to creating something where these data migrations can only be run once (like database migrations).
Your needs might stop at bootstrapper: we have pretty unique needs (developing the system when we only know half the spec, larg-ish Rails team, big data migration from the previous generation of software. Your needs might be simpler).
If you did want to use fixtures the advantage over seed is that you can easily export also.
A quick guess at how the rake task may looks is as follows
desc 'Export the data objects to Fixtures from data in an existing
database. Defaults to development database. Set RAILS_ENV to override.'
task :export => :environment do
sql = "SELECT * FROM %s"
skip_tables = ["schema_info"]
export_tables = [
"roles",
"roles_users",
"roles_utilities",
"user_filters",
"users",
"utilities"
]
time_now = Time.now.strftime("%Y_%h_%d_%H%M")
folder = "#{RAILS_ROOT}/db/fixtures/#{time_now}/"
FileUtils.mkdir_p folder
puts "Exporting data to #{folder}"
ActiveRecord::Base.establish_connection(:development)
export_tables.each do |table_name|
i = "000"
File.open("#{folder}/#{table_name}.yml", 'w') do |file|
data = ActiveRecord::Base.connection.select_all(sql % table_name)
file.write data.inject({}) { |hash, record|
hash["#{table_name}_#{i.succ!}"] = record
hash }.to_yaml
end
end
end
desc "Import the models that have YAML files in
db/fixture/defaults or from a specified path."
task :import do
location = 'db/fixtures/default'
puts ""
puts "enter import path [#{location}]"
location_in = STDIN.gets.chomp
location = location_in unless location_in.blank?
ENV['FIXTURES_PATH'] = location
puts "Importing data from #{ENV['FIXTURES_PATH']}"
Rake::Task["db:fixtures:load"].invoke
end
I have a rake task that populates some initial data in my rails app. For example, countries, states, mobile carriers, etc.
The way I have it set up now, is I have a bunch of create statements in files in /db/fixtures and a rake task that processes them. For example, one model I have is themes. I have a theme.rb file in /db/fixtures that looks like this:
Theme.delete_all
Theme.create(:id => 1, :name=>'Lite', :background_color=>'0xC7FFD5', :title_text_color=>'0x222222',
:component_theme_color=>'0x001277', :carrier_select_color=>'0x7683FF', :label_text_color=>'0x000000',
:join_upper_gradient=>'0x6FAEFF', :join_lower_gradient=>'0x000000', :join_text_color=>'0xFFFFFF',
:cancel_link_color=>'0x001277', :border_color=>'0x888888', :carrier_text_color=>'0x000000', :public => true)
Theme.create(:id => 2, :name=>'Metallic', :background_color=>'0x000000', :title_text_color=>'0x7299FF',
:component_theme_color=>'0xDBF2FF', :carrier_select_color=>'0x000000', :label_text_color=>'0xDBF2FF',
:join_upper_gradient=>'0x2B25FF', :join_lower_gradient=>'0xBEFFAC', :join_text_color=>'0x000000',
:cancel_link_color=>'0xFF7C12', :border_color=>'0x000000', :carrier_text_color=>'0x000000', :public => true)
Theme.create(:id => 3, :name=>'Blues', :background_color=>'0x0060EC', :title_text_color=>'0x000374',
:component_theme_color=>'0x000374', :carrier_select_color=>'0x4357FF', :label_text_color=>'0x000000',
:join_upper_gradient=>'0x4357FF', :join_lower_gradient=>'0xffffff', :join_text_color=>'0x000000',
:cancel_link_color=>'0xffffff', :border_color=>'0x666666', :carrier_text_color=>'0x000000', :public => true)
puts "Success: Theme data loaded"
The idea here is that I want to install some stock themes for users to start with. I have a problem with this method.
Setting the ID does not work. This means that if I decide to add a theme, let's call it 'Red', then I would simply like to add the theme statement to this fixture file and call the rake task to reseed the database. If I do that, because themes belong to other objects and their id's change upon this re-initialization, all links are broken.
My question is first of all, is this a good way to handle seeding a database? In a previous post, this was recommended to me.
If so, how can I hard code the IDs, and are there any downsides to that?
If not, what is the best way to seed the database?
I will truly appreciate long and thought out answers that incorporate best practices.
Updating since these answers are slightly outdated (although some still apply).
Simple feature added in rails 2.3.4, db/seeds.rb
Provides a new rake task
rake db:seed
Good for populating common static records like states, countries, etc...
http://railscasts.com/episodes/179-seed-data
*Note that you can use fixtures if you had already created them to also populate with the db:seed task by putting the following in your seeds.rb file (from the railscast episode):
require 'active_record/fixtures'
Fixtures.create_fixtures("#{Rails.root}/test/fixtures", "operating_systems")
For Rails 3.x use 'ActiveRecord::Fixtures' instead of 'Fixtures' constant
require 'active_record/fixtures'
ActiveRecord::Fixtures.create_fixtures("#{Rails.root}/test/fixtures", "fixtures_file_name")
Usually there are 2 types of seed data required.
Basic data upon which the core of your application may rely. I call this the common seeds.
Environmental data, for example to develop the app it is useful to have a bunch of data in a known state that us can use for working on the app locally (the Factory Girl answer above covers this kind of data).
In my experience I was always coming across the need for these two types of data. So I put together a small gem that extends Rails' seeds and lets you add multiple common seed files under db/seeds/ and any environmental seed data under db/seeds/ENV for example db/seeds/development.
I have found this approach is enough to give my seed data some structure and gives me the power to setup my development or staging environment in a known state just by running:
rake db:setup
Fixtures are fragile and flakey to maintain, as are regular sql dumps.
Using seeds.rb file or FactoryBot is great, but these are respectively great for fixed data structures and testing.
The seedbank gem might give you more control and modularity to your seeds. It inserts rake tasks and you can also define dependencies between your seeds. Your rake task list will have these additions (e.g.):
rake db:seed # Load the seed data from db/seeds.rb, db/seeds/*.seeds.rb and db/seeds/ENVIRONMENT/*.seeds.rb. ENVIRONMENT is the current environment in Rails.env.
rake db:seed:bar # Load the seed data from db/seeds/bar.seeds.rb
rake db:seed:common # Load the seed data from db/seeds.rb and db/seeds/*.seeds.rb.
rake db:seed:development # Load the seed data from db/seeds.rb, db/seeds/*.seeds.rb and db/seeds/development/*.seeds.rb.
rake db:seed:development:users # Load the seed data from db/seeds/development/users.seeds.rb
rake db:seed:foo # Load the seed data from db/seeds/foo.seeds.rb
rake db:seed:original # Load the seed data from db/seeds.rb
factory_bot sounds like it will do what you are trying to achieve. You can define all the common attributes in the default definition and then override them at creation time. You can also pass an id to the factory:
Factory.define :theme do |t|
t.background_color '0x000000'
t.title_text_color '0x000000',
t.component_theme_color '0x000000'
t.carrier_select_color '0x000000'
t.label_text_color '0x000000',
t.join_upper_gradient '0x000000'
t.join_lower_gradient '0x000000'
t.join_text_color '0x000000',
t.cancel_link_color '0x000000'
t.border_color '0x000000'
t.carrier_text_color '0x000000'
t.public true
end
Factory(:theme, :id => 1, :name => "Lite", :background_color => '0xC7FFD5')
Factory(:theme, :id => 2, :name => "Metallic", :background_color => '0xC7FFD5')
Factory(:theme, :id => 3, :name => "Blues", :background_color => '0x0060EC')
When used with faker it can populate a database really quickly with associations without having to mess about with Fixtures (yuck).
I have code like this in a rake task.
100.times do
Factory(:company, :address => Factory(:address), :employees => [Factory(:employee)])
end
Rails has a built in way to seed data as explained here.
Another way would be to use a gem for more advanced or easy seeding such as: seedbank.
The main advantage of this gem and the reason I use it is that it has advanced capabilities such as data loading dependencies and per environment seed data.
Adding an up to date answer as this answer was first on google.
The best way is to use fixtures.
Note: Keep in mind that fixtures do direct inserts and don't use your model so if you have callbacks that populate data you will need to find a workaround.
Add it in database migrations, that way everyone gets it as they update. Handle all of your logic in the ruby/rails code, so you never have to mess with explicit ID settings.
In a Rails application, I need a table in my database to contain constant data.
This table content is not intended to change for the moment but I do not want to put the content in the code, to be able to change it whenever needed.
I tried filling this table in the migration that created it, but this does not seem to work with the test environment and breaks my unit tests. In test environment, my model is never able to return any value while it is ok in my development environment.
Is there a way to fill that database correctly even in test environment ? Is there another way of handling these kind of data that should not be in code ?
edit
Thanks all for your answers and especially Vlad R for explaining the problem.
I now understand why my data are not loaded in test. This is because the test environment uses the db:load rake command which directly loads the schema instead of running the migrations. Having put my values in the migration only and not in the schema, these values are not loaded for test.
What you are probably observing is that the test framework is not running the migrations (db:migrate), but loading db/schema.rb directly (db:load) instead.
You have two options:
continue to use the migration for production and development; for the test environment, add your constant data to the corresponding yml files in db/fixtures
leave the existing db/fixtures files untouched, and create another set of yml files (containing the constant data) in the same vein as db/fixtures, but usable by both test and production/development environments when doing a rake db:load schema initialization
To cover those scenarios that use db:load (instead of db:migrate - e.g. test, bringing up a new database on a new development machine using the faster db:load instead of db:migrate, etc.) is create a drop-in rakefile in RAILS_APP/lib/tasks to augment the db:load task by loading your constant intialization data from "seed" yml files (one for each model) into the database.
Use the db:seed rake task as an example. Put your seed data in db/seeds/.yml
#the command is: rake:db:load
namespace :db do
desc 'Initialize data from YAML.'
task :load => :environment do
require 'active_record/fixtures'
Dir.glob(RAILS_ROOT + '/db/seeds/*.yml').each do |file|
Fixtures.create_fixtures('db/seeds', File.basename(file, '.*'))
end
end
end
To cover the incremental scenarios (db:migrate), define one migration that does the same thing as the task defined above.
If your seed data ever changes, you will need to add another migration to remove the old seed data and load the new one instead, which may be non-trivial in case of foreign-key dependencies etc.
Take a look at my article on loading seed data.
There's a number of ways to do this. I like a rake task called db:populate which lets you specify your fixed data in normal ActiveRecord create statements. For getting the data into tests, I've just be loading this populate file in my test_helper. However, I think I am going to switch to a test database that already has the seed data populated.
There's also plugin called SeedFu that helps with this problem.
Whatever you do, I recommend against using fixtures for this, because they don't validate your data, so it's very easy to create invalid records.
I've got a rails application where users have to log in. Therefore in order for the application to be usable, there must be one initial user in the system for the first person to log in with (they can then create subsequent users). Up to now I've used a migration to add a special user to the database.
After asking this question, it seems that I should be using db:schema:load, rather than running the migrations, to set up fresh databases on new development machines. Unfortunately, this doesn't seem to include the migrations which insert data, only those which set up tables, keys etc.
My question is, what's the best way to handle this situation:
Is there a way to get d:s:l to include data-insertion migrations?
Should I not be using migrations at all to insert data this way?
Should I not be pre-populating the database with data at all? Should I update the application code so that it handles the case where there are no users gracefully, and lets an initial user account be created live from within the application?
Any other options? :)
Try a rake task. For example:
Create the file /lib/tasks/bootstrap.rake
In the file, add a task to create your default user:
namespace :bootstrap do
desc "Add the default user"
task :default_user => :environment do
User.create( :name => 'default', :password => 'password' )
end
desc "Create the default comment"
task :default_comment => :environment do
Comment.create( :title => 'Title', :body => 'First post!' )
end
desc "Run all bootstrapping tasks"
task :all => [:default_user, :default_comment]
end
Then, when you're setting up your app for the first time, you can do rake db:migrate OR rake db:schema:load, and then do rake bootstrap:all.
Use db/seed.rb found in every Rails application.
While some answers given above from 2008 can work well, they are pretty outdated and they are not really Rails convention anymore.
Populating initial data into database should be done with db/seed.rb file.
It's just works like a Ruby file.
In order to create and save an object, you can do something like :
User.create(:username => "moot", :description => "king of /b/")
Once you have this file ready, you can do following
rake db:migrate
rake db:seed
Or in one step
rake db:setup
Your database should be populated with whichever objects you wanted to create in seed.rb
I recommend that you don't insert any new data in migrations. Instead, only modify existing data in migrations.
For inserting initial data, I recommend you use YML. In every Rails project I setup, I create a fixtures directory under the DB directory. Then I create YML files for the initial data just like YML files are used for the test data. Then I add a new task to load the data from the YML files.
lib/tasks/db.rake:
namespace :db do
desc "This loads the development data."
task :seed => :environment do
require 'active_record/fixtures'
Dir.glob(RAILS_ROOT + '/db/fixtures/*.yml').each do |file|
base_name = File.basename(file, '.*')
say "Loading #{base_name}..."
Fixtures.create_fixtures('db/fixtures', base_name)
end
end
desc "This drops the db, builds the db, and seeds the data."
task :reseed => [:environment, 'db:reset', 'db:seed']
end
db/fixtures/users.yml:
test:
customer_id: 1
name: "Test Guy"
email: "test#example.com"
hashed_password: "656fc0b1c1d1681840816c68e1640f640c6ded12"
salt: "188227600.754087929365988"
This is my new favorite solution, using the populator and faker gems:
http://railscasts.com/episodes/126-populating-a-database
Try the seed-fu plugin, which is quite a simple plugin that allows you to seed data (and change that seed data in the future), will also let you seed environment specific data and data for all environments.
I guess the best option is number 3, mainly because that way there will be no default user which is a great way to render otherwise good security useless.
I thought I'd summarise some of the great answers I've had to this question, together with my own thoughts now I've read them all :)
There are two distinct issues here:
Should I pre-populate the database with my special 'admin' user? Or should the application provide a way to set up when it's first used?
How does one pre-populate the database with data? Note that this is a valid question regardless of the answer to part 1: there are other usage scenarios for pre-population than an admin user.
For (1), it seems that setting up the first user from within the application itself is quite a bit of extra work, for functionality which is, by definition, hardly ever used. It may be slightly more secure, however, as it forces the user to set a password of their choice. The best solution is in between these two extremes: have a script (or rake task, or whatever) to set up the initial user. The script can then be set up to auto-populate with a default password during development, and to require a password to be entered during production installation/deployment (if you want to discourage a default password for the administrator).
For (2), it appears that there are a number of good, valid solutions. A rake task seems a good way, and there are some plugins to make this even easier. Just look through some of the other answers to see the details of those :)
Consider using the rails console. Good for one-off admin tasks where it's not worth the effort to set up a script or migration.
On your production machine:
script/console production
... then ...
User.create(:name => "Whoever", :password => "whichever")
If you're generating this initial user more than once, then you could also add a script in RAILS_ROOT/script/, and run it from the command line on your production machine, or via a capistrano task.
That Rake task can be provided by the db-populate plugin:
http://github.com/joshknowles/db-populate/tree/master
Great blog post on this:
http://railspikes.com/2008/2/1/loading-seed-data
I was using Jay's suggestions of a special set of fixtures, but quickly found myself creating data that wouldn't be possible using the models directly (unversioned entries when I was using acts_as_versioned)
I'd keep it in a migration. While it's recommended to use the schema for initial setups, the reason for that is that it's faster, thus avoiding problems. A single extra migration for the data should be fine.
You could also add the data into the schema file, as it's the same format as migrations. You'd just lose the auto-generation feature.
For users and groups, the question of pre-existing users should be defined with respect to the needs of the application rather than the contingencies of programming. Perhaps your app requires an administrator; then prepopulate. Or perhaps not - then add code to gracefully ask for a user setup at application launch time.
On the more general question, it is clear that many Rails Apps can benefit from pre-populated date. For example, a US address holding application may as well contain all the States and their abbreviations. For these cases, migrations are your friend, I believe.
Some of the answers are outdated. Since Rails 2.3.4, there is a simple feature called Seed available in db/seed.rb :
#db/seed.rb
User.create( :name => 'default', :password => 'password' )
Comment.create( :title => 'Title', :body => 'First post!' )
It provides a new rake task that you can use after your migrations to load data :
rake db:seed
Seed.rb is a classic Ruby file, feel free to use any classic datastructure (array, hashes, etc.) and iterators to add your data :
["bryan", "bill", "tom"].each do |name|
User.create(:name => name, :password => "password")
end
If you want to add data with UTF-8 characters (very common in French, Spanish, German, etc.), don't forget to add at the beginning of the file :
# ruby encoding: utf-8
This Railscast is a good introduction : http://railscasts.com/episodes/179-seed-data