Trouble Importing Data into Rails, Strange Looping - ruby-on-rails

I am trying to import data into Rails (3.1), and I have created this rake task to parse a CSV file (generated by Excel on Mac):
desc "Import users."
task :import_users => :environment do
  File.open("users.csv", "r").each do |line|
    id, name, age, email = line.strip.split(',')
    u = User.new(:id => id, :name => name, :age => age, :email => email)
    u.save
  end
end
However, when I run the rake task, only the first line of the CSV file gets imported; the loop never gets past it. Can anyone tell me why?

Not sure, but I think what is happening here is that each is iterating over the file as a whole rather than over each line, so the loop body only runs once. I'd try a CSV parser instead:
require 'csv'

CSV.foreach("users.csv") do |line|
  id, name, age, email = line
  u = User.new(:id => id, :name => name, :age => age, :email => email)
  u.save
end

When parsing any kind of text file with Ruby, be sure to check the encoding and line endings to make sure the file is in a format Ruby expects.
In this case, Ruby did not split on the Mac line endings (\r) in the exported file, but it handled Unix line endings (\n) fine.
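A minimal sketch of one way to handle this, assuming the rows end in \r, \r\n or \n: normalize every row ending to \n before parsing. The parse_rows helper and the sample data are hypothetical. (Ruby's CSV also tries to detect the row separator by default; normalizing first just makes the behavior explicit.)

```ruby
require "csv"

# Hypothetical helper: accept CSV text whose rows end in "\r\n" (Windows),
# "\r" (classic Mac) or "\n" (Unix), convert every row ending to "\n",
# then parse the text into arrays of fields.
def parse_rows(text)
  CSV.parse(text.gsub(/\r\n?/, "\n"))
end

rows = parse_rows("1,Ann,30,ann@example.com\r2,Bob,41,bob@example.com\r")
rows.each do |id, name, age, email|
  puts "#{id}: #{name} (#{age}) <#{email}>"
end
```

In the rake task, the text would come from File.read("users.csv"), and each row would be passed to User.new as before.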

Related

Rails reuse object to save memory in bulk import

I'm currently using SmarterCSV to do bulk CSV import via MongoDB's upsert commands. I have the following code excerpt:
SmarterCSV.process(csv, csv_options) do |chunk|
  chunk.each do |row|
    # creates a temporary user to store the object
    user = User.new
    # converts row info to populate user object
    # creates an array of commands that can be executed by MongoDB via user.as_document
    updates << {:q => {:email => user.email},
                :u => {:$set => user.as_document},
                :multi => false,
                :upsert => true}
    user = nil
  end
end
However, I'm noticing that memory usage keeps growing, as the garbage collector (Rails 3.2.14 and Ruby 2.0.0p353) doesn't seem to free the temporary user objects fast enough.
So I tried creating user = User.new outside the SmarterCSV.process block (see below) and reusing that user object inside it. This saves memory. However, user.as_document overwrites the previous elements in the updates array on each iteration. I was able to work around this with user.as_document.to_json, but then none of User's relationships are set correctly. For example, instead of saving a BSON reference for a relation's id, it only saves the id in string format.
Any ideas? Is there a way that I can optimize the bulk import process?
user = User.new
SmarterCSV.process(csv, csv_options) do |chunk|
  chunk.each do |row|
    # creates a temporary user to store the object
    # converts row info to populate & reuse user object
    # creates an array of commands that can be executed by MongoDB via user.as_document.to_json
    updates << {:q => {:email => user.email},
                :u => {:$set => user.as_document.to_json},
                :multi => false,
                :upsert => true}
  end
end
I ended up fixing this by using user.as_document.deep_dup.
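Why a deep copy helps can be sketched in plain Ruby. FakeUser below is hypothetical, and it assumes as_document returns the same internal Hash on every call, which matches the overwriting described above: every command that holds that Hash changes when the reused object is repopulated, while a deep copy gives each command its own data. Marshal.load(Marshal.dump(...)) is used here only as a dependency-free stand-in for ActiveSupport's deep_dup; unlike to_json, it keeps the original value types instead of turning them into a JSON string.

```ruby
# Hypothetical stand-in for a reused document object: as_document hands
# back the same internal Hash every time (assumed, matching the question).
class FakeUser
  def initialize
    @attrs = {}
  end

  # Repopulate the reused object in place, keeping the same Hash object.
  def assign(attrs)
    @attrs.replace(attrs)
    self
  end

  def as_document
    @attrs
  end
end

rows = [{ "email" => "a@example.com" }, { "email" => "b@example.com" }]
user = FakeUser.new
shared = [] # commands that hold the shared Hash
copied = [] # commands that hold an independent deep copy

rows.each do |row|
  user.assign(row)
  shared << { q: { email: row["email"] }, u: { "$set" => user.as_document } }
  # Marshal round-trip = deep copy (in Rails: user.as_document.deep_dup)
  copied << { q: { email: row["email"] }, u: { "$set" => Marshal.load(Marshal.dump(user.as_document)) } }
end

p shared.map { |c| c[:u]["$set"]["email"] } # => ["b@example.com", "b@example.com"]
p copied.map { |c| c[:u]["$set"]["email"] } # => ["a@example.com", "b@example.com"]
```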

What is the Ruby on Rails way to import a structured comma-delimited file and then create ActiveRecord records?

I have a structured comma-delimited file that has two record types. The record types are distinguished by the first entry on each line: H or P. The file format is as follows:
"H","USA","MD","20904"
"P","1","A","Female","W"
"P","2","A","Male","H"
I'd like to import the file and then create ActiveRecord models from the imported data. My approach is to create a field map that holds the number of fields, the object name and the columns for each record type.
The field map looks like this:
$field_map = {
  'H' => {
    :count => 4,
    :object => :Header,
    :cols => [:record_type, :country_id, :state, :zip]
  },
  'P' => {
    :count => 4,
    :object => :RaceData,
    :cols => [:record_type, :household_size, :gender, :race]
  }
}
I then use FasterCSV to read the file and a case statement to decide how each record is transformed and then used in ActiveRecord create statements:
FasterCSV.foreach(filename) do |row|
  tbl_type = row[0]
  tbl_info = $field_map[tbl_type]
  unless tbl_info.nil?
    field_no = tbl_info[:count]
    object = tbl_info[:object]
    columns = tbl_info[:cols]
    # assumed (elided in the summary): map column names to the row's values
    new_record = Hash[columns.zip(row)]
    record_type = new_record[:record_type]
    case record_type
    when "H"
      factory_build_h_record(new_record)
    when "P"
      factory_build_p_record(new_record)
    end
  end
end
The code above is summarized due to space constraints. My approach works just fine, but I'm new to Ruby and always interested in best practices and the "true" Ruby way of doing things. I'd be interested in hearing how more experienced programmers would tackle this problem. Thanks for your input.
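One common Ruby idiom is to let the field map carry the target class itself, zip the column names with each row to build an attributes hash, and drop the case statement. A minimal sketch, assuming plain Structs as stand-ins for the Header and RaceData models (with ActiveRecord you would call create! on the class instead of new). :field_3 is a hypothetical placeholder, since the question doesn't say what the third P field ("A") holds. filter_map needs Ruby 2.7 or later.

```ruby
require "csv"

# Stand-ins for the ActiveRecord models (assumption); with Rails you would
# call spec[:model].create!(attrs) instead of spec[:model].new(**attrs).
Header   = Struct.new(:record_type, :country_id, :state, :zip, keyword_init: true)
RaceData = Struct.new(:record_type, :household_size, :field_3, :gender, :race, keyword_init: true)

# Record type => target model and column names, in file order.
# :field_3 is a hypothetical name for the unexplained third "P" field.
FIELD_MAP = {
  "H" => { model: Header,   cols: %i[record_type country_id state zip] },
  "P" => { model: RaceData, cols: %i[record_type household_size field_3 gender race] }
}.freeze

def build_records(csv_text)
  CSV.parse(csv_text).filter_map do |row|
    spec = FIELD_MAP[row.first]
    next if spec.nil? # unknown record type: skip the row

    # Plays the role of :count in the question's field map.
    unless row.size == spec[:cols].size
      raise ArgumentError, "expected #{spec[:cols].size} fields, got #{row.inspect}"
    end

    attrs = spec[:cols].zip(row).to_h
    spec[:model].new(**attrs)
  end
end

sample = <<~ROWS
  "H","USA","MD","20904"
  "P","1","A","Female","W"
  "P","2","A","Male","H"
ROWS

build_records(sample).each { |record| p record }
```

Adding a new record type then means adding one entry to FIELD_MAP rather than another when branch.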
I suggest the roo gem.
There is example source code for it, but I'd rather watch the 10-minute video.

Data casting error when reading from a file in Ruby

I am trying to import data into Rails (3.1), and I have created this rake task to parse a tab-delimited text file (generated by Excel on Mac); the file has standard Mac OS X line endings.
desc "Import users."
task :import_users => :environment do
  File.open("users.txt", "r", '\r').each do |line|
    id, name, age, email = line.strip.split('/t')
    u = User.new(:id => id, :name => name, :age => age, :email => email)
    u.save
  end
end
However, when I try to run this rake task I get the following error:
rake aborted!
can't convert String into Integer
My guess is that Ruby doesn't like converting the Age heading into a numerical age variable in my User class. Is there a way I can either
(a) skip the header line in the file OR
(b) do this cast on the fly in Ruby?
Note: This is one of many attempts to read data into Ruby. In earlier attempts I never got this error; the string value always got cast to 0.0.
The simplest solution that comes to mind:
u = User.new(:id => id, :name => name, :age => Integer(age), :email => email)
Of course, you will still get an error on the first line of the file if it contains the headers.
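A few details in the question's code are also worth checking. File.open's third positional argument is the permission bits (an Integer), not a line separator, so passing '\r' there is one likely source of "can't convert String into Integer". Also, single-quoted '\r' and '/t' are literal characters, not a carriage return and a tab; those escapes need double quotes ("\r", "\t"). A minimal sketch covering both (a) and (b), assuming the header is the first line and id and age are whole numbers; parse_users and the sample data are hypothetical:

```ruby
# Hypothetical helper: parse tab-delimited text with carriage-return ("\r")
# line endings, skip the header line and cast numeric columns on the fly.
def parse_users(text)
  lines = text.split("\r")          # double quotes: "\r" is a carriage return
  lines.drop(1).map do |line|       # (a) skip the header line
    id, name, age, email = line.split("\t") # "\t" is a tab; '/t' is not
    # (b) Integer() raises ArgumentError on non-numeric text instead of
    # silently returning 0 the way "abc".to_i would
    { id: Integer(id), name: name, age: Integer(age), email: email }
  end
end

text = "id\tname\tage\temail\r1\tAnn\t30\tann@example.com\r2\tBob\t41\tbob@example.com\r"
parse_users(text).each { |attrs| p attrs }
```

String#split drops trailing empty fields, so the final \r does not produce an empty record. In the rake task, the text could come from File.read("users.txt"), and each hash could be passed to User.new.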

appending to rake db:seed in rails and running it without duplicating data

rake db:seed populates your database with default values for an app, right? So what if you already have a seed and need to add to it (for example, you add a new feature that requires seed data)? In my experience, running rake db:seed again inserted the existing content a second time, so it was duplicated.
What I need is to add some seeds so that, when run, only the newest ones are added and the existing ones are ignored. How do I go about this? (The dirty, noob way I usually do it is to truncate my whole database and run the seeds again, but that's not very smart in production, right?)
A cleaner way to do this is by using find_or_create_by, as follows:
User.find_or_create_by_username_and_role(
  :username => "admin",
  :role => "admin",
  :email => "me#gmail.com")
Here are the possible outcomes:
A record exists with username "admin" and role "admin". This record will NOT be updated with the new e-mail if it already exists, but it will also NOT be doubled.
A record does not exist with username "admin" and role "admin". The above record will be created.
Note that if only one of the username/role criteria is satisfied, the above record will be created. Use the right criteria to ensure you aren't duplicating something you want to remain unique.
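On newer Rails versions, where the find_or_create_by_* dynamic finders were deprecated, the hash form User.find_or_create_by(username: "admin", role: "admin") { |u| u.email = "..." } does the same job; the block only runs when a new record is created. Why re-running such a seed is safe can be sketched without a database. FakeTable is a hypothetical in-memory stand-in, not ActiveRecord:

```ruby
# Hypothetical in-memory stand-in for a table, just to illustrate why
# find-or-create seeding can be re-run without duplicating rows.
class FakeTable
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Look up by the identifying attributes; create (adding the extra
  # attributes) only when no match exists. Existing rows are left untouched.
  def find_or_create_by(criteria, extra = {})
    existing = @rows.find { |row| criteria.all? { |k, v| row[k] == v } }
    return existing if existing

    row = criteria.merge(extra)
    @rows << row
    row
  end
end

users = FakeTable.new
2.times do # simulate running the seed file twice
  users.find_or_create_by({ username: "admin", role: "admin" }, { email: "admin@example.com" })
end
# Only one criterion matches here, so a second row is created.
users.find_or_create_by({ username: "admin", role: "editor" }, { email: "admin@example.com" })

puts users.rows.size # => 2
```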
I do something like this when I need to add a user, in seeds.rb:
if User.count == 0
  puts "Creating admin user"
  User.create(:role => :admin, :username => 'blagh', :etc => :etc)
end
You can get more interesting than that, but in this case, you could run it over again as needed.
Another option that might have a slight performance benefit:
# This example assumes that a role consists of just an id and a title.
roles = ['Admin', 'User', 'Other']
existing_roles = Role.all.map { |r| r.title }
roles.each do |role|
  unless existing_roles.include?(role)
    Role.create!(title: role)
  end
end
I think that this way, you only make one database call to get an array of what already exists, and then only call the database again when something is missing and needs to be created.
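The same idea can be expressed with array difference, so only the missing titles are iterated at all. A plain-Ruby sketch; missing_titles is a hypothetical helper, and in Rails the existing titles might come from a single query such as Role.pluck(:title) (Rails 3.2+), with each missing title then passed to Role.create!:

```ruby
# Hypothetical helper: return the titles the seed wants that are not yet
# stored. Array#- keeps the order of the desired list.
def missing_titles(desired, existing)
  desired - existing
end

desired  = ["Admin", "User", "Other"]
existing = ["Admin"] # pretend this came from the database in one query
p missing_titles(desired, existing) # => ["User", "Other"]
```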
Adding
from:
departments = ["this", "that"]
departments.each { |d| Department.where(:name => d).first_or_create }
to:
departments = ["this", "that", "there", "then"]
departments.each { |d| Department.where(:name => d).first_or_create }
This is a simple example.
Updating/renaming
from:
departments = ["this", "that", "there", "then"]
departments.each { |d| Department.where(:name => d).first_or_create }
to:
departments = ["these", "those", "there", "then"]
new_names = [['these', 'this'], ['those', 'that']]
new_names.each do |new|
  Department.where(:name => new).group_by(&:name).each do |name, depts|
    depts.first.update_column :name, new[0] if new[1] == name # skips validation
    # depts[1..-1].each(&:destroy) if depts.size > 1 # paranoid mode
  end
end
departments.each { |d| Department.where(:name => d).first_or_create }
IMPORTANT: you need to update the elements of the departments array, or duplicates will be created.
Workaround: add a validates_uniqueness_of validation (or a uniqueness validation covering all the necessary attributes), but then don't use methods that skip validations, such as update_column.
My preference for this sort of thing is to create a custom rake task rather than use the seeds.rb file.
If you're trying to bulk-create users, I'd create a .csv file with the data, then create a rake task called import_users and pass it the filename. Then loop through the file to create the user records.
In lib/tasks/import_users.rake:
namespace :my_app do
  desc "Import Users from a .csv"
  task :import_users, [:path] => :environment do |t, args|
    # loop through the records in args[:path] and create users
  end
end
Then run it like so: bundle exec rake "my_app:import_users[path/to/.csv]"
If you need to run it in production: RAILS_ENV=production bundle exec rake "my_app:import_users[/path/to/.csv]"
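The body of the task (the "loop through records and create users" part) might look like the sketch below. import_users, the name/email header columns and the create callable are assumptions for illustration; taking an IO and a callable keeps the loop testable without Rails.

```ruby
require "csv"
require "stringio"

# Hypothetical task body: read CSV rows with a header line (assumed columns
# "name" and "email") from any IO and hand each row's attributes to `create`.
# Returns the number of rows processed.
def import_users(io, create:)
  count = 0
  CSV.new(io, headers: true).each do |row|
    create.call({ name: row["name"], email: row["email"] })
    count += 1
  end
  count
end

created = []
n = import_users(StringIO.new("name,email\nAnn,ann@example.com\nBob,bob@example.com\n"),
                 create: ->(attrs) { created << attrs })
puts "imported #{n} users"
```

Inside the rake task, the call would be something like File.open(path) { |f| import_users(f, create: ->(attrs) { User.create!(attrs) }) }, where path is the filename passed to the task.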
Another alternative is to use #first_or_create:
categories = [
  [ "Category 1", "#e51c23" ],
  [ "Category 2", "#673ab7" ]
]
categories.each do |name, color|
  Category.where(name: name, color: color).first_or_create
end
A quick-and-dirty way is to comment out the data that was already seeded. That's how I did it, and it worked fine for me:
=begin
# Commented out these lines since they were already seeded
PayType.create!(:name => "Net Banking")
PayType.create!(:name => "Coupouns Pay")
=end
#New data to be used by seeds
PayType.create!(:name => "Check")
PayType.create!(:name => "Credit card")
PayType.create!(:name => "Purchase order")
PayType.create!(:name => "Cash on delivery")
Once done, just remove the comments.
Another trivial alternative:
# categories => name, color
categories = [
  [ "Category 1", "#e51c23" ],
  [ "Category 2", "#673ab7" ]
]
categories.each do |name, color|
  unless Category.where(:name => name).exists?
    Category.create(name: name, color: color)
  end
end
Just add User.delete_all (and the same for every other model in your application) at the beginning of your seeds.rb file. There will not be any duplicate values for sure.

Trouble importing csv file with ruby CSV Module

I'm trying to use Ruby's csv module to import the records contained in a csv file to my local table in a Ruby on Rails 3 application.
The table was created when I generated the Movie model.
Here is what I've been executing in console:
require 'csv'
CSV.foreach('public/uploads/VideoTitles2.csv') do |row|
  record = Movie.new(
    :media_format => row[0],
    :title => row[1],
    :copies_at_home => row[2],
    :order => row[3]
  )
  record.save
end
The rows of the csv file match (in data type) the columns they're being passed into. Here is a shortened version of the csv file (VideoTitles2.csv) I'm attempting to import:
"DVD","LEAP OF FAITH",1,1
"DVD","COCOON",1,2
"DVD","TITANIC",1,3
where each record is separated by \n I believe. This csv file was exported from Access and its original file extension was .txt. I've manually changed it to .csv for sake of the import.
The problem is that, after executing the above lines in rails console, I get the following output:
=> nil
The import doesn't seem to happen. If anyone has an idea as to how I could remedy this I'd really appreciate it.
I don't see the problem. This snippet returns nil because CSV.foreach returns nil, but that says nothing about whether the loop ran. Did you check whether any Movie was created? Did you add any debug output to follow the process?
You may want to check the return value of record.save (or call record.save! instead); validation errors may be preventing the records from being created. Also, if you want the loop to return the created records, you can write this (Ruby >= 1.9 for the key: value hash syntax):
require 'csv'
records = CSV.read('public/uploads/VideoTitles2.csv').map do |media_format, title, copies_at_home, order|
  Movie.create!(
    media_format: media_format,
    title: title,
    copies_at_home: copies_at_home,
    order: order
  )
end
Okay there were two things I had wrong:
The exported csv file should not have quotations around the strings - I just removed them.
Thanks to tokland: switching from record.save to record.save! surfaced the validation errors that were preventing the records from being created.
So, to conclude, one could simply add the following class method after creating the Movie model/table:
require 'csv'

class Movie < ActiveRecord::Base
  attr_accessible :media_format, :title, :copies_at_home, :order

  def self.import_movies
    CSV.foreach('public/uploads/movies.csv') do |row|
      record = Movie.new(
        :media_format => row[0],
        :title => row[1],
        :copies_at_home => row[2],
        :order => row[3]
      )
      record.save!
    end
  end
end
Where movies.csv looks like the following:
Blu-ray, Movie 1, 1, 1
DVD, Movie 2, 1, 2
Blu-ray, Movie 3, 1, 3
then call this function in console as such:
Movie.import_movies()
and, as expected, all that would be returned in the console would be:
=> nil
Check your index view (if you've created one) and you should see that the records were successfully imported into the movies table.
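Quoted fields are standard CSV, and Ruby's CSV parser removes the quotes itself, so quoting by itself should not have prevented the import; the switch to save! is what exposed the validation errors. One thing that does matter in the movies.csv layout above is the space after each comma: unquoted fields keep surrounding whitespace, so the title would be stored as " Movie 1". A small check:

```ruby
require "csv"

quoted   = CSV.parse_line('"DVD","LEAP OF FAITH",1,1')
unquoted = CSV.parse_line("DVD,LEAP OF FAITH,1,1")
spaced   = CSV.parse_line("Blu-ray, Movie 1, 1, 1")

p quoted == unquoted  # => true: standard quotes are removed by the parser
p spaced              # => ["Blu-ray", " Movie 1", " 1", " 1"]
p spaced.map(&:strip) # => ["Blu-ray", "Movie 1", "1", "1"]
```

Stripping each field (or writing the file without spaces after the commas) avoids the leading spaces.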

Resources