fetch csv feed and place into database in rails

fetch csv feed and place into database in rails - ruby-on-rails

So from scratch, I have a CSV feed. Its currently 2596 lines long (Yay)
This feed gets updated frequently, I'm wanting to have this csv feed, (Baring in mind when i click the link it instantly downloads as a csv file.) to populate my database daily at a random time (e.g. 5am in the morning) every morning the database table would wipe and repopulate via the csv. (the way i access the csv is via url)
How would i go about this using rails? I'm unaware if there is any gems or anything i could use for this.
Sam

You wouldn't do it via rails - rails is a web framework, this is more a background task. If the population of the database needs to know your application structure, I'd set this up as a rake task in lib/tasks/populate.rake
Your question is much to broad to answer fully without more details, but generally something like the below should work.
Edit: delete users and recreate from an assumed structure
require 'open-uri'
namespace :populate do
desc 'wipes the database and recreates from CSV'
task reload: :environment do
# Remove all users
User.delete_all
CSV.new(open(YOUR_CSV_URL)).each do |row|
# do something with the row
User.create(name: row[0], address: row[1])
end
end
end
You could then use Cron or equivalent to call this at 5am
cd /path/to/your/web/app && RAILS_ENV=production bundle exec rake populate:reload

Related

Ruby on Rails - Automatically insert data into DB without seeds.rb

I've some .json files with data that are automatically updated from time to time. I also have a Ruby on Rails app where I want to insert the information that's in those files.
By now, I'm parsing the JSON and inserting the data in the database in the seeds.rb file, but I'll want to add more data without having to restart the app, I mean, on the go.
From time to time, I'll check those files and if they have modifications, I want to insert those new items into my database.
What's the best way to do it?

Looks like a job for a cron.
Create the code you need in a rake task (in /lib/tasks):
task :import, => :environment do
Importer.read_json_if_modified # your importer class
end
Then run this with the period you want using your system's cron.

How to handle periodically changing database data in your Rails app?

EDIT: I have totally rewritten this question for clarity. I got no comments and no answers earlier.
I am maintaining a 2.x Rails app with plenty of statistical data. Some data is real and some is estimated for the future years. Every year I need to update estimated data with real data and calculate new estimates.
I have been using BIG yml-files and migrations for loading the data into the app every year. My migrations are full of estimation calculations and data corrections.
Problem
My migrations are full of none-schema related material and I can't even dream of doing db:migrate:reset without waiting few hours (if it even works). I'd love to see my migrations nice and clean - only with schema related modifications. But how I am suppose to update the data every year if not using migrations?
Help needed
I'd like to hear your comments and answers. I'm not looking for a silver bullet - more like best practises and ideas how people are handling similar situation.

It sounds like you have a large operation (data load using yml files) once a year but smaller operations once a month.
From my experience with statistical data you will probably end up doing more and more of these operations to clean and add more data.
I would use a job processing framework like resque and resque scheduler.
You can schedule the jobs to run once a month, year, day or constantly running. A job is something like loading yml files (or sets of yml files) or cleaning up data. You can control parameters to send to your job so you can use one class but alternate how it updates or cleans your data based on the way you enqueue or schedule the job.

First of all, I have to say that this is a very interesting question. As far as i know, it isn't a good idea loading data from migrations. Generally speaking you should use db/seeds.rb for data loading in your db and I think it could be a good idea to write a little class helper to put in your lib dir and then call it from db/seeds.rb. I image you could organize you files in the following way:
lib/data_loader.rb
lib/years/2009.rb
lib/years/2010.rb
Obviously, you should clear your migrations and write code for lib/data_loader.rb in the way you should prefer but I was only trying to offer a general idea of how I'd organize my code if I have to face a problem like that.
I'm not sure I've replied to your question in a way that helps but I hope it does.

If I were you I would go with creating custom rake task. You will have access to all you models and activerecord connections and once a year you will end up doing:
rake calculate

I have a situation where I need to load data from CSV files that change infrequently and update data from the Internet daily. I'll include a somewhat complete example on how to do the former.
First I have a rake file in lib/tasks/update.rake:
require 'update/from_csv_files.rb'
namespace :update do
task :csvfiles => :environment do
Dir.glob('db/static_data/*.csv') do |file|
Update::FromCsvFiles.load(file)
end
end
end
The => :environment means we will have access to the database via the usual models.
Then I have code in the lib/update/from_csv_files.rb file to do the actual work:
require 'csv'
module Update
module FromCsvFiles
def FromCsvFiles.load(file)
csv = CSV.open(file, 'r')
csv.each do |row|
id = row[0]
s = Statistic.find_by_id(id)
if (s.nil?)
s = Statistic.new
s.id= id
end
s.survey_area = row[1]
s.nr_of_space_men = row[2]
s.save
end
end
end
end
Then I can just run rake update:csvfiles whenever my CSV files changes to load the new data. I also have another task that is set up in a similar way to update my daily data.
In your case you should be able to write some code to load your YML files or make your calculations directly. To handle your smaller corrections you could make a generic method for loading YML files and call it with specific files from the rake task. That way you only need to include the YML file and update the rake file with a new task. To handle execution order you can make a rake task that calls the other rake tasks in the appropriate order. I'm just throwing around some ideas now, you know better than me.

Ruby on Rails 2.3.5: Populating my prod and devel database with data (migration or fixture?)

I need to populate my production database app with data in particular tables. This is before anyone ever even touches the application. This data would also be required in development mode as it's required for testing against. Fixtures are normally the way to go for testing data, but what's the "best practice" for Ruby on Rails to ship this data to the live database also upon db creation?
ultimately this is a two part question I suppose.
1) What's the best way to load test data into my database for development, this will be roughly 1,000 items. Is it through a migration or through fixtures? The reason this is a different answer than the question below is that in development, there's certain fields in the tables that I'd like to make random. In production, these fields would all start with the same value of 0.
2) What's the best way to bootstrap a production db with live data I need in it, is this also through a migration or fixture?
I think the answer is to seed as described here: http://lptf.blogspot.com/2009/09/seed-data-in-rails-234.html but I need a way to seed for development and seed for production. Also, why bother using Fixtures if seeding is available? When does one seed and when does one use fixtures?

Usually fixtures are used to provide your tests with data, not to populate data into your database. You can - and some people have, like the links you point to - use fixtures for this purpose.
Fixtures are OK, but using Ruby gives us some advantages: for example, being able to read from a CSV file and populate records based on that data set. Or reading from a YAML fixture file if you really want to: since your starting with a programming language your options are wide open from there.
My current team tried to use db/seed.rb, and checking RAILS_ENV to load only certain data in certain places.
The annoying thing about db:seed is that it's meant to be a one shot thing: so if you have additional items to add in the middle of development - or when your app has hit production - ... well, you need to take that into consideration (ActiveRecord's find_or_create_by...() method might be your friend here).
We tried the Bootstrapper plugin, which puts a nice DSL over the RAILS_ENV checking, and lets your run only the environment you want. It's pretty nice.
Our needs actually went beyond that - we found we needed database style migrations for our seed data. Right now we are putting normal Ruby scripts into a folder (db/bootstrapdata/) and running these scripts with Arild Shirazi's required gem to load (and thus run) the scripts in this directory.
Now this only gives you part of the database style migrations. It's not hard to go from this to creating something where these data migrations can only be run once (like database migrations).
Your needs might stop at bootstrapper: we have pretty unique needs (developing the system when we only know half the spec, larg-ish Rails team, big data migration from the previous generation of software. Your needs might be simpler).

If you did want to use fixtures the advantage over seed is that you can easily export also.
A quick guess at how the rake task may looks is as follows
desc 'Export the data objects to Fixtures from data in an existing
database. Defaults to development database. Set RAILS_ENV to override.'
task :export => :environment do
sql = "SELECT * FROM %s"
skip_tables = ["schema_info"]
export_tables = [
"roles",
"roles_users",
"roles_utilities",
"user_filters",
"users",
"utilities"
]
time_now = Time.now.strftime("%Y_%h_%d_%H%M")
folder = "#{RAILS_ROOT}/db/fixtures/#{time_now}/"
FileUtils.mkdir_p folder
puts "Exporting data to #{folder}"
ActiveRecord::Base.establish_connection(:development)
export_tables.each do |table_name|
i = "000"
File.open("#{folder}/#{table_name}.yml", 'w') do |file|
data = ActiveRecord::Base.connection.select_all(sql % table_name)
file.write data.inject({}) { |hash, record|
hash["#{table_name}_#{i.succ!}"] = record
hash }.to_yaml
end
end
end
desc "Import the models that have YAML files in
db/fixture/defaults or from a specified path."
task :import do
location = 'db/fixtures/default'
puts ""
puts "enter import path [#{location}]"
location_in = STDIN.gets.chomp
location = location_in unless location_in.blank?
ENV['FIXTURES_PATH'] = location
puts "Importing data from #{ENV['FIXTURES_PATH']}"
Rake::Task["db:fixtures:load"].invoke
end

Is seeding data with fixtures dangerous in Ruby on Rails

I have fixtures with initial data that needs to reside in my database (countries, regions, carriers, etc.). I have a task rake db:seed that will seed a database.
namespace :db do
desc "Load seed fixtures (from db/fixtures) into the current environment's database."
task :seed => :environment do
require 'active_record/fixtures'
Dir.glob(RAILS_ROOT + '/db/fixtures/yamls/*.yml').each do |file|
Fixtures.create_fixtures('db/fixtures/yamls', File.basename(file, '.*'))
end
end
end
I am a bit worried because this task wipes my database clean and loads the initial data. The fact that this is even possible to do more than once on production scares the crap out of me. Is this normal and do I just have to be cautious? Or do people usually protect a task like this in some way?

Seeding data with fixtures is an extremely bad idea.
Fixtures are not validated and since most Rails developers don't use database constraints this means you can easily get invalid or incomplete data inserted into your production database.
Fixtures also set strange primary key ids by default, which is not necessarily a problem but is annoying to work with.
There are a lot of solutions for this. My personal favorite is a rake task that runs a Ruby script that simply uses ActiveRecord to insert records. This is what Rails 3 will do with db:seed, but you can easily write this yourself.
I complement this with a method I add to ActiveRecord::Base called create_or_update. Using this I can run the seed script multiple times, updating old records instead of throwing an exception.
I wrote an article about these techniques a while back called Loading seed data.

For the first part of your question, yes I'd just put some precaution for running a task like this in production. I put a protection like this in my bootstrapping/seeding task:
task :exit_or_continue_in_production? do
if Rails.env.production?
puts "!!!WARNING!!! This task will DESTROY " +
"your production database and RESET all " +
"application settings"
puts "Continue? y/n"
continue = STDIN.gets.chomp
unless continue == 'y'
puts "Exiting..."
exit!
end
end
end
I have created this gist for some context.
For the second part of the question -- usually you really want two things: a) very easily seeding the database and setting up the application for development, and b) bootstrapping the application on production server (like: inserting admin user, creating folders application depends on, etc).
I'd use fixtures for seeding in development -- everyone from the team then sees the same data in the app and what's in app is consistent with what's in tests. (Usually I wrap rake app:bootstrap, rake app:seed rake gems:install, etc into rake app:install so everyone can work on the app by just cloning the repo and running this one task.)
I'd however never use fixtures for seeding/bootstrapping on production server. Rails' db/seed.rb is really fine for this task, but you can of course put the same logic in your own rake app:seed task, like others pointed out.

Rails 3 will solve this for you using a seed.rb file.
http://github.com/brynary/rails/commit/4932f7b38f72104819022abca0c952ba6f9888cb

We've built up a bunch of best practices that we use for seeding data. We rely heavily on seeding, and we have some unique requirements since we need to seed multi-tenant systems. Here's some best practices we've used:
Fixtures aren't the best solution, but you still should store your seed data in something other than Ruby. Ruby code for storing seed data tends to get repetitive, and storing data in a parseable file means you can write generic code to handle your seeds in a consistent fashion.
If you're going to potentially update seeds, use a marker column named something like code to match your seeds file to your actual data. Never rely on ids being consistent between environments.
Think about how you want to handle updating existing seed data. Is there any potential that users have modified this data? If so, should you maintain the user's information rather than overriding it with seed data?
If you're interested in some of the ways we do seeding, we've packaged them into a gem called SeedOMatic.

How about just deleting the task off your production server once you have seeded the database?

I just had an interesting idea...
what if you created \db\seeds\ and added migration-style files:
file: 200907301234_add_us_states.rb
class AddUsStates < ActiveRecord::Seeds
def up
add_to(:states, [
{:name => 'Wisconsin', :abbreviation => 'WI', :flower => 'someflower'},
{:name => 'Louisiana', :abbreviation => 'LA', :flower => 'cypress tree'}
]
end
end
def down
remove_from(:states).based_on(:name).with_values('Wisconsin', 'Louisiana', ...)
end
end
alternately:
def up
State.create!( :name => ... )
end
This would allow you to run migrations and seeds in an order that would allow them to coexist more peaceably.
thoughts?

How (and whether) to populate rails application with initial data

I've got a rails application where users have to log in. Therefore in order for the application to be usable, there must be one initial user in the system for the first person to log in with (they can then create subsequent users). Up to now I've used a migration to add a special user to the database.
After asking this question, it seems that I should be using db:schema:load, rather than running the migrations, to set up fresh databases on new development machines. Unfortunately, this doesn't seem to include the migrations which insert data, only those which set up tables, keys etc.
My question is, what's the best way to handle this situation:
Is there a way to get d:s:l to include data-insertion migrations?
Should I not be using migrations at all to insert data this way?
Should I not be pre-populating the database with data at all? Should I update the application code so that it handles the case where there are no users gracefully, and lets an initial user account be created live from within the application?
Any other options? :)

Try a rake task. For example:
Create the file /lib/tasks/bootstrap.rake
In the file, add a task to create your default user:
namespace :bootstrap do
desc "Add the default user"
task :default_user => :environment do
User.create( :name => 'default', :password => 'password' )
end
desc "Create the default comment"
task :default_comment => :environment do
Comment.create( :title => 'Title', :body => 'First post!' )
end
desc "Run all bootstrapping tasks"
task :all => [:default_user, :default_comment]
end
Then, when you're setting up your app for the first time, you can do rake db:migrate OR rake db:schema:load, and then do rake bootstrap:all.

Use db/seed.rb found in every Rails application.
While some answers given above from 2008 can work well, they are pretty outdated and they are not really Rails convention anymore.
Populating initial data into database should be done with db/seed.rb file.
It's just works like a Ruby file.
In order to create and save an object, you can do something like :
User.create(:username => "moot", :description => "king of /b/")
Once you have this file ready, you can do following
rake db:migrate
rake db:seed
Or in one step
rake db:setup
Your database should be populated with whichever objects you wanted to create in seed.rb

I recommend that you don't insert any new data in migrations. Instead, only modify existing data in migrations.
For inserting initial data, I recommend you use YML. In every Rails project I setup, I create a fixtures directory under the DB directory. Then I create YML files for the initial data just like YML files are used for the test data. Then I add a new task to load the data from the YML files.
lib/tasks/db.rake:
namespace :db do
desc "This loads the development data."
task :seed => :environment do
require 'active_record/fixtures'
Dir.glob(RAILS_ROOT + '/db/fixtures/*.yml').each do |file|
base_name = File.basename(file, '.*')
say "Loading #{base_name}..."
Fixtures.create_fixtures('db/fixtures', base_name)
end
end
desc "This drops the db, builds the db, and seeds the data."
task :reseed => [:environment, 'db:reset', 'db:seed']
end
db/fixtures/users.yml:
test:
customer_id: 1
name: "Test Guy"
email: "test#example.com"
hashed_password: "656fc0b1c1d1681840816c68e1640f640c6ded12"
salt: "188227600.754087929365988"

This is my new favorite solution, using the populator and faker gems:
http://railscasts.com/episodes/126-populating-a-database

Try the seed-fu plugin, which is quite a simple plugin that allows you to seed data (and change that seed data in the future), will also let you seed environment specific data and data for all environments.

I guess the best option is number 3, mainly because that way there will be no default user which is a great way to render otherwise good security useless.

I thought I'd summarise some of the great answers I've had to this question, together with my own thoughts now I've read them all :)
There are two distinct issues here:
Should I pre-populate the database with my special 'admin' user? Or should the application provide a way to set up when it's first used?
How does one pre-populate the database with data? Note that this is a valid question regardless of the answer to part 1: there are other usage scenarios for pre-population than an admin user.
For (1), it seems that setting up the first user from within the application itself is quite a bit of extra work, for functionality which is, by definition, hardly ever used. It may be slightly more secure, however, as it forces the user to set a password of their choice. The best solution is in between these two extremes: have a script (or rake task, or whatever) to set up the initial user. The script can then be set up to auto-populate with a default password during development, and to require a password to be entered during production installation/deployment (if you want to discourage a default password for the administrator).
For (2), it appears that there are a number of good, valid solutions. A rake task seems a good way, and there are some plugins to make this even easier. Just look through some of the other answers to see the details of those :)

Consider using the rails console. Good for one-off admin tasks where it's not worth the effort to set up a script or migration.
On your production machine:
script/console production
... then ...
User.create(:name => "Whoever", :password => "whichever")
If you're generating this initial user more than once, then you could also add a script in RAILS_ROOT/script/, and run it from the command line on your production machine, or via a capistrano task.

That Rake task can be provided by the db-populate plugin:
http://github.com/joshknowles/db-populate/tree/master

Great blog post on this:
http://railspikes.com/2008/2/1/loading-seed-data
I was using Jay's suggestions of a special set of fixtures, but quickly found myself creating data that wouldn't be possible using the models directly (unversioned entries when I was using acts_as_versioned)

I'd keep it in a migration. While it's recommended to use the schema for initial setups, the reason for that is that it's faster, thus avoiding problems. A single extra migration for the data should be fine.
You could also add the data into the schema file, as it's the same format as migrations. You'd just lose the auto-generation feature.

For users and groups, the question of pre-existing users should be defined with respect to the needs of the application rather than the contingencies of programming. Perhaps your app requires an administrator; then prepopulate. Or perhaps not - then add code to gracefully ask for a user setup at application launch time.
On the more general question, it is clear that many Rails Apps can benefit from pre-populated date. For example, a US address holding application may as well contain all the States and their abbreviations. For these cases, migrations are your friend, I believe.

Some of the answers are outdated. Since Rails 2.3.4, there is a simple feature called Seed available in db/seed.rb :
#db/seed.rb
User.create( :name => 'default', :password => 'password' )
Comment.create( :title => 'Title', :body => 'First post!' )
It provides a new rake task that you can use after your migrations to load data :
rake db:seed
Seed.rb is a classic Ruby file, feel free to use any classic datastructure (array, hashes, etc.) and iterators to add your data :
["bryan", "bill", "tom"].each do |name|
User.create(:name => name, :password => "password")
end
If you want to add data with UTF-8 characters (very common in French, Spanish, German, etc.), don't forget to add at the beginning of the file :
# ruby encoding: utf-8
This Railscast is a good introduction : http://railscasts.com/episodes/179-seed-data

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart