I have a Ruby script which takes a CSV file and converts the input into a hash:
Culper = File.open('.\CulperCSV.csv')
culper_hash = {}
# set up culper code hash from provided CSV
CSV.foreach(Culper) do |row|
  number, word = row
  culper_hash[word] = number
end
and I am trying to make a Rails app using the script.
My question: How do I store the Hash persistently (or the CSV data so I can build the hash) so that I can minimize load times?
My thoughts:
1) Load the CSV data into a database (seed it) and, each time I get a visitor on my site, build the hash as above but from the db. (I'm not sure how to do this, but I can research it.)
or
2) Load the complete hash into the database (I think I would have to serialize it?) so that I can do just one fetch from the db and have the hash ready to go.
I am very new to building apps, especially in Rails so please ask questions if what I am trying to do doesn't make sense.
I suggest you go with the 2nd approach. Here are the steps:
Setup new app:
rails new app_name
bundle install
rake db:create
Create Model:
rails g model model_name column_name:text
rake db:migrate
Open the model_name.rb file and add the following line:
serialize :column_name
Now everything is set. Just run your script to parse the .csv file and store the hash in the db. Your column can now store the hash.
Culper = File.open('./CulperCSV.csv')
# get the object from the database (create one if none exists yet)
obj = ModelName.first || ModelName.new
obj.column_name ||= {}
# set up culper code hash from provided CSV
CSV.foreach(Culper) do |row|
  number, word = row
  obj.column_name[word] = number
end
obj.save
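For intuition about what ends up in that column: Rails' `serialize` stores the value as YAML text by default. A minimal stdlib sketch of the round-trip (the sample values here are made up, not from the real CSV):

```ruby
require 'yaml'

# What `serialize :column_name` does under the hood by default:
# dump the hash to YAML on save, load it back on read.
culper_hash = { "agent" => "711", "york" => "727" }  # made-up sample values

stored_text = YAML.dump(culper_hash)  # the string written to the text column
restored    = YAML.load(stored_text)  # what obj.column_name gives you back
```

So a single text column is enough; the dictionary needs no extra table structure.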
Your .csv file seems to be already in your Rails app directory, so load times shouldn't be bad (unless it's really big). However, if that file isn't going to change and you need only a small piece of it at a time, then I would store that in your database.
Create a model/migration that corresponds to the data you have in the .csv file and then (after running the migration) run a script to parse the data from your .csv file into your database.
I managed to solve my problem following the advice of Kh Ammad: setting up a new app, creating a model for it, and marking my column serializable.
However, I had some problems with running the script to populate the model with the hash so instead, after some research, I created the rake task below:
# lib/tasks/import.rake
require 'csv'

task :import, [:filename] => :environment do |_task, args|
  filename = args[:filename] || './CulperCSV.csv'
  culper_hash = {}
  CSV.foreach(filename) do |row|
    number, word = row
    culper_hash[word] = number
  end
  # culper_hash == column_name
  CulperDict.create!(culper_hash: culper_hash)
end
and ran it with:
$ bundle exec rake import
and my model contained the entire hash table in one entry!
I used this article to figure out how to run a rake task:
http://erikonrails.snowedin.net/?p=212
Specifically, the last comment on the page by Lauralee (posted on December 20th, 2012 at 8:47 am), who ran into a similar problem.
Related
I have a JSON file with a lot of movies in it. I want to create a model 'Movie' and fill it with all the movies from that JSON file. How do I do it? I know that I can parse the JSON file into a hash, but that's not what I am looking for.
The correct term you're looking for is "seeding"!
You're going to need a database however, and a migration to create the database along with the associated movies table. (There are plenty of guides on how to do this, along with the official documentation).
After that's done, you'll need to "seed" your database with the data in your json file.
In the seeds.rb file, assuming that the JSON file is an array of Movies in JSON form, you should be able to loop over every Movie JSON object and insert it into your database.
To add to docaholic's helpful response, here's some steps/pseudo-code that may help.
Assuming you're using a SQL database and need to create a model:
# creates a migration file
rails generate migration create_movies title:string duration_in_minutes:integer
# (or whatever fields you have; edit the file to ensure it has what you want)
rake db:migrate
Write a script to populate your database. There are many patterns for this (rake task, test fixtures, etc) and which one you'd want to use would depend on what you need (whether it's for testing, for production environment, as seed data for new environments, etc).
But generally what the code would look like is:
text_from_file = File.read(file_path)
JSON.parse(text_from_file).each do |json_movie_object|
  # note: JSON.parse gives string keys by default
  Movie.create!(title: json_movie_object["title"], other_attribute: json_movie_object["other_attribute"])
  # if the json attributes exactly match the column names, you can do
  # Movie.create!(json_movie_object)
end
This is not the most performant option for large amounts of data. For large files you can use insert_all for much greater efficiency, but it bypasses ActiveRecord validations and callbacks, so you'd want to understand what that means.
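A hedged sketch of that batched path, reusing the `Movie` model and column names from the snippet above; the `insert_all` call itself is left commented because it needs a Rails app and database around it:

```ruby
require 'json'

# Inline sample standing in for File.read(file_path).
text_from_file = '[{"title":"Titanic","duration":120},{"title":"Up","duration":96}]'

# Build plain attribute hashes first...
rows = JSON.parse(text_from_file).map do |m|
  { title: m["title"], duration_in_minutes: m["duration"] }
end

# ...then insert in bounded batches: one multi-row INSERT per slice.
rows.each_slice(500) do |batch|
  # Movie.insert_all(batch)  # skips validations and callbacks
end
```

The slice size of 500 is an arbitrary bound to keep each SQL statement a reasonable size; tune it for your data.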
For my case, I needed to seed a few hundred records for production from a JSON file, so I tried to insert the data from the JSON file into the database.
So at first I created the table in my Rails project:
rails generate migration create_movies title:string duration_in_minutes:integer
# or whatever fields you have
# edit the file to add other fields/ensure it has what you want.
rake db:migrate
Now it's time to seed the data!
Suppose your movies.json file has:
[
  {"id":"1", "name":"Titanic", "time":"120"},
  {"id":"2", "name":"Ruby tutorials", "time":"120"},
  {"id":"3", "name":"Something special", "time":"500"},
  {"id":"4", "name":"2HAAS", "time":"320"}
]
Now imagine that, instead of these 4 entries, your JSON file has 400+ entries to input, which would be a nightmare to write into a seed file manually.
You need the json library to work with the JSON file. It ships with Ruby's standard library; if you want it in your Gemfile explicitly, run bundle add json.
In the db/seeds.rb file, add these lines to load those 400+ entries into your database:
infos_from_json = File.read("your/JSON/file/path/movies.json")
JSON.parse(infos_from_json).each do |t|
  Movie.create!(title: t['name'], duration_in_minutes: t['time'])
end
Here:
infos_from_json reads the JSON file from the given path into a string.
JSON.parse turns that string into an array of hashes; each do |t| then iterates over that array, with t holding one movie hash per pass.
Movie.create!(title: t['name'], duration_in_minutes: t['time']) adds each record to the database.
This is how we can easily add data to our database from a JSON file.
For more info about seeding data into the database, check out the documentation.
I am on Rails 5.1 with PostgreSQL 9.6.9 on the Heroku free tier.
I recently used a rake task with CSV files to create a couple of batches of records. I then wanted to test the front end and verify my forms were working like they do in development.
However, when I went to add a new record with the form, I kept getting a 500 error. When I checked my Heroku logs, I got this error:
ActiveRecord::RecordNotUnique (PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "games_pkey"
It would list the id it was trying to use, which was a lower id already used when I imported records via the rake task and CSV file.
I confirmed this truly was the issue by repeatedly attempting to create the record with the form; it finally went through after about 20 attempts, once it reached the correct next id.
I am not sure if this is an error caused by Postgres or by Rails, but it clearly started when I began using the rake task and CSV file. The CSV file does have an attribute for id, which was being used as an easy check to either update or create a record, so perhaps this was the cause.
Any advice on how to prevent this error would be great, as I have also uploaded other CSV files for other models in even bigger batches. I would like to continue to do so, but only if I can avoid this error, so that I can also add one or two records via my simple form when possible.
Thank you for your help.
Here is the rake task code that may be causing the issue:
CSV.foreach(filename, col_sep: ";", headers: true) do |row|
  game_hash = row.to_h
  game = Game.find_by(id: game_hash["id"])
  if !game
    game = Game.create(game_hash)
  else
    game.update(game_hash)
  end
end
If the issue is caused by the id in game_hash, then I would remove it, like so:
CSV.foreach(filename, col_sep: ";", headers: true) do |row|
  game_hash = row.to_h
  game = Game.find_by(id: game_hash["id"])
  # remove the id from the hash (row.to_h gives string keys, so delete "id", not :id)
  game_hash.delete "id"
  if !game
    game = Game.create(game_hash)
  else
    game.update(game_hash)
  end
end
This should remove the id before the new Game is created.
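Another angle on the root cause, stated as an assumption: because the rake task kept the ids from the CSV, Postgres's primary-key sequence can be left behind the table's max id, so the form keeps generating already-used ids until the sequence catches up. On the PostgreSQL adapter you can resync it once (the table name games is assumed here) from rails console or a one-off task:

```ruby
# Resync the primary-key sequence with the highest existing id,
# so new form-created records start above the imported ids.
ActiveRecord::Base.connection.reset_pk_sequence!('games')
```

After that, either dropping the id from created records or resetting the sequence after each import should keep the UniqueViolation away.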
So, from scratch: I have a CSV feed. It's currently 2,596 lines long (yay).
This feed gets updated frequently. Bearing in mind that when I click the link it instantly downloads as a CSV file (I access the CSV via a URL), I want it to repopulate my database daily at a set time (e.g. 5am): every morning the database table would be wiped and repopulated from the CSV.
How would I go about this using Rails? I'm unaware if there are any gems or anything else I could use for this.
Sam
You wouldn't do this via Rails itself; Rails is a web framework, and this is more of a background task. If populating the database needs to know your application structure, I'd set this up as a rake task in lib/tasks/populate.rake.
Your question is much too broad to answer fully without more details, but generally something like the below should work.
Edit: delete users and recreate from an assumed structure
require 'open-uri'
require 'csv'

namespace :populate do
  desc 'wipes the database and recreates from CSV'
  task reload: :environment do
    # Remove all users
    User.delete_all
    # URI.open is the modern open-uri entry point (plain open on older Rubies)
    CSV.new(URI.open(YOUR_CSV_URL)).each do |row|
      # do something with the row
      User.create(name: row[0], address: row[1])
    end
  end
end
You could then use cron or an equivalent to call this at 5am:
cd /path/to/your/web/app && RAILS_ENV=production bundle exec rake populate:reload
I have some .json files with data that is automatically updated from time to time. I also have a Ruby on Rails app where I want to insert the information that's in those files.
Right now, I'm parsing the JSON and inserting the data into the database in the seeds.rb file, but I'll want to add more data without having to restart the app, that is, on the go.
From time to time, I'll check those files and if they have modifications, I want to insert those new items into my database.
What's the best way to do it?
Looks like a job for cron.
Create the code you need in a rake task (in /lib/tasks):
task import: :environment do
  Importer.read_json_if_modified # your importer class
end
Then run this with the period you want using your system's cron.
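The `Importer` class is only named in the answer above; here is one hedged way to flesh it out (everything inside it is an assumption), skipping the file when its mtime hasn't changed since the last run. The Rails-side insert is left commented, and the rake task would call `Importer.new(path).read_json_if_modified`:

```ruby
require 'json'

# Hypothetical importer: re-reads a JSON file only when it has changed.
class Importer
  def initialize(path, state_file: "#{path}.imported_at")
    @path = path
    @state_file = state_file  # marker file whose mtime records the last import
  end

  def read_json_if_modified
    last_run = File.exist?(@state_file) ? File.mtime(@state_file) : Time.at(0)
    return [] if File.mtime(@path) <= last_run

    records = JSON.parse(File.read(@path))
    # records.each { |r| Movie.find_or_create_by!(title: r["title"]) }
    File.write(@state_file, Time.now.to_s)  # touch the marker
    records
  end
end
```

Using a marker file keeps the task idempotent: cron can run it as often as you like and unchanged files are skipped.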
I need to populate my production database with data in particular tables, before anyone ever even touches the application. This data is also required in development mode, as it's needed for testing against. Fixtures are normally the way to go for test data, but what's the "best practice" in Ruby on Rails for shipping this data to the live database upon db creation as well?
Ultimately this is a two-part question, I suppose.
1) What's the best way to load test data into my database for development? This will be roughly 1,000 items. Is it through a migration or through fixtures? The reason this differs from the question below is that in development there are certain fields in the tables that I'd like to make random; in production, these fields would all start with the same value of 0.
2) What's the best way to bootstrap a production db with live data I need in it, is this also through a migration or fixture?
I think the answer is to seed as described here: http://lptf.blogspot.com/2009/09/seed-data-in-rails-234.html, but I need a way to seed for development and to seed for production. Also, why bother using fixtures if seeding is available? When does one seed and when does one use fixtures?
Usually fixtures are used to provide your tests with data, not to populate data into your database. You can - and some people have, like the links you point to - use fixtures for this purpose.
Fixtures are OK, but using Ruby gives us some advantages: for example, being able to read from a CSV file and populate records based on that data set. Or reading from a YAML fixture file if you really want to: since you're starting with a programming language, your options are wide open from there.
My current team tried to use db/seeds.rb, checking RAILS_ENV to load only certain data in certain places.
The annoying thing about db:seed is that it's meant to be a one-shot thing: so if you have additional items to add in the middle of development, or once your app has hit production... well, you need to take that into consideration (ActiveRecord's find_or_create_by...() method might be your friend here).
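As a sketch of that find_or_create_by idea in a seeds file (the model and attribute names here are made up, not from the question), each run only creates what is missing, so re-running seeds mid-development or after the app hits production is safe:

```ruby
# db/seeds.rb (sketch; Role and User are placeholder models)
%w[admin editor viewer].each do |name|
  Role.find_or_create_by!(name: name)
end

if Rails.env.development?
  # development-only test data, kept out of production
  20.times { |i| User.find_or_create_by!(login: "dev_user_#{i}") }
end
```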
We tried the Bootstrapper plugin, which puts a nice DSL over the RAILS_ENV checking, and lets your run only the environment you want. It's pretty nice.
Our needs actually went beyond that - we found we needed database style migrations for our seed data. Right now we are putting normal Ruby scripts into a folder (db/bootstrapdata/) and running these scripts with Arild Shirazi's required gem to load (and thus run) the scripts in this directory.
Now this only gives you part of the database style migrations. It's not hard to go from this to creating something where these data migrations can only be run once (like database migrations).
Your needs might stop at Bootstrapper; we have pretty unique needs (developing the system when we only know half the spec, a large-ish Rails team, a big data migration from the previous generation of software). Your needs might be simpler.
If you did want to use fixtures, the advantage over seeding is that you can easily export as well.
A quick guess at how the rake tasks might look is as follows:
desc 'Export the data objects to Fixtures from data in an existing database. Defaults to development database. Set RAILS_ENV to override.'
task :export => :environment do
  sql = "SELECT * FROM %s"
  skip_tables = ["schema_info"]
  export_tables = [
    "roles",
    "roles_users",
    "roles_utilities",
    "user_filters",
    "users",
    "utilities"
  ]
  time_now = Time.now.strftime("%Y_%h_%d_%H%M")
  folder = "#{RAILS_ROOT}/db/fixtures/#{time_now}/"
  FileUtils.mkdir_p folder
  puts "Exporting data to #{folder}"
  ActiveRecord::Base.establish_connection(:development)
  export_tables.each do |table_name|
    i = "000"
    File.open("#{folder}/#{table_name}.yml", 'w') do |file|
      data = ActiveRecord::Base.connection.select_all(sql % table_name)
      file.write data.inject({}) { |hash, record|
        hash["#{table_name}_#{i.succ!}"] = record
        hash
      }.to_yaml
    end
  end
end
desc "Import the models that have YAML files in db/fixtures/default or from a specified path."
task :import do
  location = 'db/fixtures/default'
  puts ""
  puts "enter import path [#{location}]"
  location_in = STDIN.gets.chomp
  location = location_in unless location_in.blank?
  ENV['FIXTURES_PATH'] = location
  puts "Importing data from #{ENV['FIXTURES_PATH']}"
  Rake::Task["db:fixtures:load"].invoke
end