Ruby on Rails - Automatically insert data into DB without seeds.rb - ruby-on-rails

I've some .json files with data that are automatically updated from time to time. I also have a Ruby on Rails app where I want to insert the information that's in those files.
By now, I'm parsing the JSON and inserting the data in the database in the seeds.rb file, but I'll want to add more data without having to restart the app, I mean, on the go.
From time to time, I'll check those files and if they have modifications, I want to insert those new items into my database.
What's the best way to do it?

Looks like a job for a cron.
Create the code you need in a rake task (in /lib/tasks):
task :import, => :environment do
Importer.read_json_if_modified # your importer class
end
Then run this with the period you want using your system's cron.

Related

How to fill model(database) in ruby on rails with json file?

I have a json file with a lot of movies in it. I want to create a model 'Movie' and fill it with all movies from that json file. How do i do it? I know that I can parse json file into a hash but that's not the thing I am looking for.
The correct term you're looking for is "seeding"!
You're going to need a database however, and a migration to create the database along with the associated movies table. (There are plenty of guides on how to do this, along with the official documentation).
After that's done, you'll need to "seed" your database with the data in your json file.
In the seeds.rb file, assuming that the JSON file is an array of Movies in JSON form, you should be able to loop over every Movie JSON object and insert it into your database.
To add to docaholic's helpful response, here's some steps/pseudo-code that may help.
Assuming you're using a SQL database and need to create a model:
# creates a migration file.
rails generate migration create_movies title:string #duration_in_minutes:integer or whatever fields you have
# edit the file to add other fields/ensure it has what you want.
rake db:migrate
Write a script to populate your database. There are many patterns for this (rake task, test fixtures, etc) and which one you'd want to use would depend on what you need (whether it's for testing, for production environment, as seed data for new environments, etc).
But generally what the code would look like is:
text_from_file = File.read(file_path)
JSON.parse(text_from_file).each do |json_movie_object|
Movie.create!(title: json_movie_object[:title], other_attribute: json_movie_object[:other_attribute])
# if the json attributes exactly match the column names, you can do
# Movie.create!(json_movie_object)
end
This is not the most performant option for large amounts of data. For large files you can use insert_all for much greater efficiency, but this bypasses activerecord validations and callbacks so you'd want to understand what that means.
For my case, I need to seed around 200 hundred datas for production from a JSON file, so I tried to insert data from the json files in a database.
SO at first I created a database in rails project
rails generate migration create_movies title:string duration_in_minutes:integer
# or whatever fields you have
# edit the file to add other fields/ensure it has what you want.
rake db:migrate
Now its time to seed datas!
Suppose your movies.json file has:
[
{"id":"1", "name":"Titanic", "time":"120"},
{"id":"2", "name":"Ruby tutorials", "time":"120"},,
{"id":"3", "name":"Something spetial", "time":"500"}
{"id":"4", "name":"2HAAS", "time":"320"}
]
NOW, like this 4 datas, think your JSON file has 400+ datas to input which is a nightmare to write in seed files manually.
You need to add JSON gem to work. Run
bundle add json
to work with JSON file.
In db/seed.rb file, add these lines to add those 400+ infos in your DATABASE:
infos_from_json = File.read(your/JSON/file/path/movies.json)
JSON.parse(infos_from_json).each do |t|
Movie.create!(title: t['name'], duration_in_minutes:
t['time'])
end
Here:
infos_from_json variable fixing the JSON file from file directory
JSON.parse is calling the JSON datas then the variable is declared to indicate where the file is located and each do loop is dividing each data and |t| will be used to add part JSON.parse(infos_from_json). in every call.
Movie.create!(title: t['title'], duration_in_minutes: t['time']) is adding the data in database
This is how we can add datas in our database from a JSON file easily.
For more info about seeding data in database checkout the documentation
THANKS
Edit: This operation requires JSON gem just type bundle add json

fetch csv feed and place into database in rails

So from scratch, I have a CSV feed. Its currently 2596 lines long (Yay)
This feed gets updated frequently, I'm wanting to have this csv feed, (Baring in mind when i click the link it instantly downloads as a csv file.) to populate my database daily at a random time (e.g. 5am in the morning) every morning the database table would wipe and repopulate via the csv. (the way i access the csv is via url)
How would i go about this using rails? I'm unaware if there is any gems or anything i could use for this.
Sam
You wouldn't do it via rails - rails is a web framework, this is more a background task. If the population of the database needs to know your application structure, I'd set this up as a rake task in lib/tasks/populate.rake
Your question is much to broad to answer fully without more details, but generally something like the below should work.
Edit: delete users and recreate from an assumed structure
require 'open-uri'
namespace :populate do
desc 'wipes the database and recreates from CSV'
task reload: :environment do
# Remove all users
User.delete_all
CSV.new(open(YOUR_CSV_URL)).each do |row|
# do something with the row
User.create(name: row[0], address: row[1])
end
end
end
You could then use Cron or equivalent to call this at 5am
cd /path/to/your/web/app && RAILS_ENV=production bundle exec rake populate:reload

Rails separated seeds

Hi i probably have a simple question.
I have this database with moodboard templates.
I want to transport it to the server with capistrano but in my seeds.rb file there is only all the seeds and if i run them again a lot of data gets inserted twice.
I normally run:
rake db:seed
But i would like to see another command
How can i make a separate seed file to execute on its own.
You can either delete your seeds before insert in seeds.rb or check the count of the model rows or check for the id if the item you're looking to insert before you insert it in your seeds.rb. Basically, I think you just need to add some checks before you insert additional data. I could provide a more specific answer if you posted your seeds.rb.

How to handle periodically changing database data in your Rails app?

EDIT: I have totally rewritten this question for clarity. I got no comments and no answers earlier.
I am maintaining a 2.x Rails app with plenty of statistical data. Some data is real and some is estimated for the future years. Every year I need to update estimated data with real data and calculate new estimates.
I have been using BIG yml-files and migrations for loading the data into the app every year. My migrations are full of estimation calculations and data corrections.
Problem
My migrations are full of none-schema related material and I can't even dream of doing db:migrate:reset without waiting few hours (if it even works). I'd love to see my migrations nice and clean - only with schema related modifications. But how I am suppose to update the data every year if not using migrations?
Help needed
I'd like to hear your comments and answers. I'm not looking for a silver bullet - more like best practises and ideas how people are handling similar situation.
It sounds like you have a large operation (data load using yml files) once a year but smaller operations once a month.
From my experience with statistical data you will probably end up doing more and more of these operations to clean and add more data.
I would use a job processing framework like resque and resque scheduler.
You can schedule the jobs to run once a month, year, day or constantly running. A job is something like loading yml files (or sets of yml files) or cleaning up data. You can control parameters to send to your job so you can use one class but alternate how it updates or cleans your data based on the way you enqueue or schedule the job.
First of all, I have to say that this is a very interesting question. As far as i know, it isn't a good idea loading data from migrations. Generally speaking you should use db/seeds.rb for data loading in your db and I think it could be a good idea to write a little class helper to put in your lib dir and then call it from db/seeds.rb. I image you could organize you files in the following way:
lib/data_loader.rb
lib/years/2009.rb
lib/years/2010.rb
Obviously, you should clear your migrations and write code for lib/data_loader.rb in the way you should prefer but I was only trying to offer a general idea of how I'd organize my code if I have to face a problem like that.
I'm not sure I've replied to your question in a way that helps but I hope it does.
If I were you I would go with creating custom rake task. You will have access to all you models and activerecord connections and once a year you will end up doing:
rake calculate
I have a situation where I need to load data from CSV files that change infrequently and update data from the Internet daily. I'll include a somewhat complete example on how to do the former.
First I have a rake file in lib/tasks/update.rake:
require 'update/from_csv_files.rb'
namespace :update do
task :csvfiles => :environment do
Dir.glob('db/static_data/*.csv') do |file|
Update::FromCsvFiles.load(file)
end
end
end
The => :environment means we will have access to the database via the usual models.
Then I have code in the lib/update/from_csv_files.rb file to do the actual work:
require 'csv'
module Update
module FromCsvFiles
def FromCsvFiles.load(file)
csv = CSV.open(file, 'r')
csv.each do |row|
id = row[0]
s = Statistic.find_by_id(id)
if (s.nil?)
s = Statistic.new
s.id= id
end
s.survey_area = row[1]
s.nr_of_space_men = row[2]
s.save
end
end
end
end
Then I can just run rake update:csvfiles whenever my CSV files changes to load the new data. I also have another task that is set up in a similar way to update my daily data.
In your case you should be able to write some code to load your YML files or make your calculations directly. To handle your smaller corrections you could make a generic method for loading YML files and call it with specific files from the rake task. That way you only need to include the YML file and update the rake file with a new task. To handle execution order you can make a rake task that calls the other rake tasks in the appropriate order. I'm just throwing around some ideas now, you know better than me.

Ruby on Rails 2.3.5: Populating my prod and devel database with data (migration or fixture?)

I need to populate my production database app with data in particular tables. This is before anyone ever even touches the application. This data would also be required in development mode as it's required for testing against. Fixtures are normally the way to go for testing data, but what's the "best practice" for Ruby on Rails to ship this data to the live database also upon db creation?
ultimately this is a two part question I suppose.
1) What's the best way to load test data into my database for development, this will be roughly 1,000 items. Is it through a migration or through fixtures? The reason this is a different answer than the question below is that in development, there's certain fields in the tables that I'd like to make random. In production, these fields would all start with the same value of 0.
2) What's the best way to bootstrap a production db with live data I need in it, is this also through a migration or fixture?
I think the answer is to seed as described here: http://lptf.blogspot.com/2009/09/seed-data-in-rails-234.html but I need a way to seed for development and seed for production. Also, why bother using Fixtures if seeding is available? When does one seed and when does one use fixtures?
Usually fixtures are used to provide your tests with data, not to populate data into your database. You can - and some people have, like the links you point to - use fixtures for this purpose.
Fixtures are OK, but using Ruby gives us some advantages: for example, being able to read from a CSV file and populate records based on that data set. Or reading from a YAML fixture file if you really want to: since your starting with a programming language your options are wide open from there.
My current team tried to use db/seed.rb, and checking RAILS_ENV to load only certain data in certain places.
The annoying thing about db:seed is that it's meant to be a one shot thing: so if you have additional items to add in the middle of development - or when your app has hit production - ... well, you need to take that into consideration (ActiveRecord's find_or_create_by...() method might be your friend here).
We tried the Bootstrapper plugin, which puts a nice DSL over the RAILS_ENV checking, and lets your run only the environment you want. It's pretty nice.
Our needs actually went beyond that - we found we needed database style migrations for our seed data. Right now we are putting normal Ruby scripts into a folder (db/bootstrapdata/) and running these scripts with Arild Shirazi's required gem to load (and thus run) the scripts in this directory.
Now this only gives you part of the database style migrations. It's not hard to go from this to creating something where these data migrations can only be run once (like database migrations).
Your needs might stop at bootstrapper: we have pretty unique needs (developing the system when we only know half the spec, larg-ish Rails team, big data migration from the previous generation of software. Your needs might be simpler).
If you did want to use fixtures the advantage over seed is that you can easily export also.
A quick guess at how the rake task may looks is as follows
desc 'Export the data objects to Fixtures from data in an existing
database. Defaults to development database. Set RAILS_ENV to override.'
task :export => :environment do
sql = "SELECT * FROM %s"
skip_tables = ["schema_info"]
export_tables = [
"roles",
"roles_users",
"roles_utilities",
"user_filters",
"users",
"utilities"
]
time_now = Time.now.strftime("%Y_%h_%d_%H%M")
folder = "#{RAILS_ROOT}/db/fixtures/#{time_now}/"
FileUtils.mkdir_p folder
puts "Exporting data to #{folder}"
ActiveRecord::Base.establish_connection(:development)
export_tables.each do |table_name|
i = "000"
File.open("#{folder}/#{table_name}.yml", 'w') do |file|
data = ActiveRecord::Base.connection.select_all(sql % table_name)
file.write data.inject({}) { |hash, record|
hash["#{table_name}_#{i.succ!}"] = record
hash }.to_yaml
end
end
end
desc "Import the models that have YAML files in
db/fixture/defaults or from a specified path."
task :import do
location = 'db/fixtures/default'
puts ""
puts "enter import path [#{location}]"
location_in = STDIN.gets.chomp
location = location_in unless location_in.blank?
ENV['FIXTURES_PATH'] = location
puts "Importing data from #{ENV['FIXTURES_PATH']}"
Rake::Task["db:fixtures:load"].invoke
end

Resources