Rake data import task isn't updating correctly


I have a .txt file full of attributes like so:
"12345", "1", "Kent"
"67890", "1", "New Castle"
I need this to update my County model, and so I have this rake task:
namespace :data do
  desc "import data from files to database"
  task :import => :environment do
    file = File.open(File.join(Rails.root, "lib", "tasks", "counties.txt"), "r")
    file.each do |line|
      attrs = line.split(", ")
      c = County.find_or_initialize_by_number(attrs[0])
      c.state_id = attrs[1]
      c.name = attrs[2]
      c.save!
    end
  end
end
All seems to be well, but when I check in the console to make sure it was imported correctly, I get this:
#<County id: 2, name: nil, number: 0, state_id: 0, created_at: "2013-08-04 17:44:11", updated_at: "2013-08-04 17:44:11">
I know that it's actually importing something, because it has created exactly the right number of County records, but it's not actually updating the attributes correctly. I'm sure I'm missing something very obvious but I can't find it!

Assumption: County.number is defined as an integer field.
After your line.split, attrs[0] is the string "\"12345\"", quotation marks included. Because number is an integer column, ActiveRecord casts that string to an integer for the lookup, and a string that starts with a quotation mark casts to 0. The same happens to state_id ("\"1\"" becomes 0), which matches the record you see in the console. This also explains why the import works when you manually strip the quotation marks from the first column of the text file.
There are several ways to fix this. Here's a quick (if ugly) one:
c = County.find_or_initialize_by_number(Integer(attrs[0].gsub(/\"/, '')))
Ideally, though, you would strip the quotes (with gsub or delete) from every field as part of splitting the line, and strip the trailing newline from the last field as well.
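Here is a minimal plain-Ruby sketch of that idea. It assumes every line of counties.txt looks like the two sample lines: three quoted fields separated by ", ". The helper name parse_county_line is hypothetical. It drops the trailing newline, strips the quotes from every field, and converts the numeric fields explicitly, so nothing is left for ActiveRecord to cast to 0.

```ruby
# Hypothetical helper: parse one line of counties.txt such as
#   "12345", "1", "Kent"
# into clean values.
def parse_county_line(line)
  # chomp removes the trailing newline; delete('"') strips the literal quotes
  number, state_id, name = line.chomp.split(", ").map { |field| field.delete('"') }
  # Integer(..., 10) raises on malformed input instead of silently giving 0
  { number: Integer(number, 10), state_id: Integer(state_id, 10), name: name }
end

p parse_county_line(%Q{"12345", "1", "Kent"\n})
p parse_county_line(%Q{"67890", "1", "New Castle"\n})
```

Inside the task, the loop body would then call County.find_or_initialize_by_number(attrs[:number]) and assign attrs[:state_id] and attrs[:name] as before. Blank lines in the file would need to be skipped first (for example with next if line.strip.empty?), because Integer raises on them.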

Related

Exclude headers when importing Google Spreadsheet content with Roo

I created a rake task to import users from a Google Sheet, using the 'roo' gem. Everything works so far, but I can't get it to skip the first row (the headers) during the import.
This is my code:
require 'roo'
namespace :import do
  desc "Import users from Google Sheet"
  task users: :environment do
    @counter = 0
    url = 'https://docs.google.com/spreadsheets/d/{mycode}/export?format=xlsx'
    xlsx = Roo::Spreadsheet.open(url, extension: :xlsx, headers: true)
    xlsx.each do |row|
      n = User.where(name:row[0]).first
      user = User.find_or_create_by(id: n)
      user.update(
        name:row[0],
        country_id:row[6]
      )
      user.save!
      puts user.name
      @counter += 1
    end
    puts "Imported #{@counter} lines."
  end
end

Your code says headers: true when you are opening the sheet. Have you tried turning it to false? Or are you saying it does not work when it's set to false?
Also, you are using .each rather differently than the example in the documentation. The doc shows a hash with keys derived from the headers. You are using [n] array notation. Does that work?
EDIT:
Try using .each in a way that's more similar to what the documentation says:
xlsx.each(name: 'Name', country_id: 'Country ID') do |row|
  n = User.where(name: row[:name]).first
  ...
end
The strings 'Name' and 'Country ID' are just examples; they should be the text of whatever column headers have the name and country_id information.
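As the answer describes, passing name: 'Name' asks Roo to look the column up by its header text rather than by a fixed position. The plain-Ruby sketch below illustrates that idea on a sheet represented as an array of rows, with the first row holding the headers. It is not Roo's implementation, and rows_by_header is a hypothetical name. In this sketch the header row is used only for the lookup, so it never turns into a user record.

```ruby
# Plain-Ruby illustration (not Roo's code): rows is an Array of Arrays whose
# first element holds the header text; mapping is e.g.
#   { name: "Name", country_id: "Country ID" }
def rows_by_header(rows, mapping)
  header = rows.first
  # Find each requested column's index by its header text.
  indexes = mapping.transform_values do |text|
    header.index(text) || raise(KeyError, "no column with header #{text.inspect}")
  end
  # Skip the header row and build one Hash per data row.
  rows.drop(1).map do |row|
    indexes.transform_values { |i| row[i] }
  end
end

sheet = [
  ["Name", "Email", "Country ID"],
  ["Ann", "ann@example.com", "3"],
  ["Bob", "bob@example.com", "7"]
]
rows_by_header(sheet, { name: "Name", country_id: "Country ID" }).each do |attrs|
  p attrs # each Hash would feed the User lookup/update in the task
end
```

Looking columns up by header text also means that reordering the sheet's columns does not silently shift values such as country_id, which can happen with row[6].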

There is a way to skip the headers: the each_row_streaming(offset: 1) method.
It yields each row after the header as an array of cell objects, so you have to read each cell's contents with its .value method. The documentation describes this for Excelx::Cell objects, but it works for Roo::Spreadsheet objects too.
The documentation example:
xlsx.each_row_streaming(offset: 1) do |row| # Will exclude first (inevitably header) row
  puts row.inspect # Array of Excelx::Cell objects
end

Ruby on Rails CSV Import rake task Fails Silently and does not import data

I have tried to import a CSV file into Ruby on Rails using many scripts and methods, and nothing seems to work. I was hoping that the code from the Erik on Rails blog would get the job done.
I put this script into lib/tasks/import.rake:
desc "Imports a CSV file into an ActiveRecord table"
task :csv_model_import, [:filename, :model, :needs] => [:environment] do |task,args|
  lines = File.new(args[:filename]).readlines
  header = lines.shift.strip
  keys = header.split(',')
  lines.each do |line|
    values = line.strip.split(',')
    attributes = Hash[keys.zip values]
    Module.const_get(args[:model]).create(attributes)
  end
end
I created a model in the rails console
rails generate model SomeModel
then ran this in the rails console
rake csv_model_import[somefile.csv,SomeModel]
After running this, the cursor just returns in the console: it fails silently. When I look at the program's database file after the import, the table is empty. It has failed to import the data.
I tried something else as well. I tried first creating a model with the fields and types defined before running the rake import command. This also failed in the same way.
I am very new to Ruby on Rails, and I have spent 2 days trying to get a CSV file into it. I would greatly appreciate some help. Please let me know how to proceed. Thanks so much, guys.

Following your approach, here is why you may not be getting anything.
When I tried to replicate the error, this is what I found:
the line lines = File.new(args[:filename]).readlines read the whole file in as a single element, with commas (,) separating cells and \r separating rows, e.g.: ["name,age,food\rtabitha,2,carrots\relijah,1,lettuce\rbeatrice,3,apples"]. readlines splits on \n by default, so a file whose rows end in a bare \r comes back as one line.
This is the main problem: the code from the blog post presumably assumed a file with \n line endings, and the rest of it depends on that.
With this input, lines.shift takes the entire file as the header, so lines is empty afterwards and the loop never creates a record. That is why nothing is imported and no error is raised.
So I worked from that result with the following steps:
1) Get the content:
file = File.new(args[:filename]).readlines => ["name,age,food\rtabitha,2,carrots\relijah,1,lettuce\rbeatrice,3,apples"]
2) Parse into a desired format by splitting on the new-lines (\r in my case)
lines = file.shift.strip.gsub(/\r/,"\\").split(/\\/) => ["name,age,food", "tabitha,2,carrots", "elijah,1,lettuce", "beatrice,3,apples"]
3) Get the headers:
header = lines.first => "name,age,food"
4) Get the body:
body = lines[1..-1] => ["tabitha,2,carrots", "elijah,1,lettuce", "beatrice,3,apples"]
5) Get keys from header:
keys = header.split(',') => ["name", "age", "food"]
6) Loop through the body and create objects in the database:
body.each do |line|
  values = line.strip.split(',')
  attributes = Hash[keys.zip values]
  Module.const_get(args[:model]).create(attributes)
end
The full task is as follows:
desc "Imports a CSV file into an ActiveRecord table"
task :csv_model_import, [:filename, :model, :needs] => [:environment] do |task,args|
  file = File.new(args[:filename]).readlines
  lines = file.shift.strip.gsub(/\r/,"\\").split(/\\/)
  header = lines.first
  body = lines[1..-1]
  keys = header.split(',')
  body.each do |line|
    values = line.strip.split(',')
    attributes = Hash[keys.zip values]
    Module.const_get(args[:model]).create(attributes)
  end
end
PS: Regarding the comment on ProgNoob's answer: you have to create the model first, with all the needed attributes, and migrate the database. Then you can pass the model name and the CSV file name to the rake task.
In my case, I generated a model Somemodel as follows:
rails g model somemodel name:string age:string food:string
Notice that I added all the desired attributes.
And then migrated my database as:
$ rake db:migrate
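Step 2 turns each \r into a backslash and then splits on the backslash, which would break on data that contains a backslash. A possible alternative is to split on any line-ending style directly with a regular expression. The sketch below keeps the same assumptions as the task: comma-separated values with no quoted commas. The name rows_to_attributes is a hypothetical helper.

```ruby
# Hypothetical helper: split raw file contents on \r\n, \r or \n, then pair
# each body row's values with the header keys (naive comma split, so quoted
# fields containing commas are not supported).
def rows_to_attributes(text)
  lines = text.split(/\r\n|\r|\n/).reject { |line| line.strip.empty? }
  keys = lines.first.split(",")
  lines.drop(1).map { |line| keys.zip(line.strip.split(",")).to_h }
end

# In the task, text would come from File.read(args[:filename]) and each Hash
# would be passed to Module.const_get(args[:model]).create(...).
text = "name,age,food\rtabitha,2,carrots\relijah,1,lettuce\rbeatrice,3,apples"
rows_to_attributes(text).each { |attrs| p attrs }
```

Listing \r\n first in the alternation makes a Windows-style line ending count as one separator rather than two.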

Split one data row into multiple based on # of instances in one of the cells

Right now, when a user creates a Request object, the output looks like this:
<Request id: 1, email: "abc@yahoo.com", items: ["one item", "two item"], created_at: "2014-04-24 05:14:24", edit_id: "gwe3EX4q2EUVk7FQCRUJug">
I am trying to change this so that, if there is more than one item, the output is split into separate Request instances like so:
<Request id: 1, email: "abc@yahoo.com", items: "one item", created_at: "2014-04-24 05:14:24", edit_id: "gwe3EX4q2EUVk7FQCRUJug">
<Request id: 2, email: "abc@yahoo.com", items: "two item", created_at: "2014-04-24 05:14:24", edit_id: "gwe3EX4q2EUVk7FQCRUJug">
What complicates this further is that I want Request.id to increment as usual, but NOT created_at and edit_id, both of which are generated automatically when Request.create is called.
How can I do this? The code snippet of the method I've worked on so far (certainly not working, I'm stumped)...
self.items.each_with_index do |item, index|
  index = Request.save(:email => self.email, :items => item, :created_at => self.created_at, :edit_id => self.edit_id)
end
Thanks!
FYI the code for the edit_id method:
def Request.new_edit_id
  SecureRandom.urlsafe_base64
end
def create_edit_id
  self.edit_id = Request.new_edit_id
end
UPDATE:
Also, clone and dup, which create shallow copies, don't work in this case. The first answer to the question "How do I copy a hash in Ruby?" works for a simple array, but for a nested hash like mine, changing the copy actually changes the original as well.
I'm going to explore a deep copy and see if I can get that to work!

In the process of figuring this out, I learned that it's bad practice to store array data like this in an object. I should just be making another data table and relating the two.
That said, if any bit of this code is helpful, here's what did it:
@requestrecord.items = ["water filter", "tent"]
num_of_requests = @requestrecord.items.count
i = 0
cloned_request = Hash.new
while i < num_of_requests do
  cloned_request[i] = Marshal.load(Marshal.dump(@requestrecord)) # creates cloned_request[0] and cloned_request[1] as deep copies of @requestrecord
  cloned_request[i].items = @requestrecord.items.slice(i) # disaggregates the items: cloned_request[0].items = "water filter", cloned_request[1].items = "tent"
  i += 1
end
Note that Marshal.load(Marshal.dump(@requestrecord)) creates a deep copy and therefore works where clone and dup do not. clone and dup make shallow copies: the new hash is a separate object, but the arrays and other objects nested inside it are shared with the original. So in a nested hash like the one I was building, mutating a nested value through the copy also changes the original, and vice versa.
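Here is a small plain-Ruby demonstration of that difference, using a Hash shaped like the request record (the values are illustrative). dup copies only the outer Hash, so an in-place change to the nested items Array shows up in the original. The Marshal round trip copies the nested Array too.

```ruby
original = { email: "abc@yahoo.com", items: ["water filter", "tent"] }

shallow = original.dup                       # new outer Hash, same items Array
deep    = Marshal.load(Marshal.dump(original)) # copies the nested Array as well

shallow[:items] << "stove"   # in-place mutation of the Array shared with original
deep[:items]    << "lantern" # mutates only the deep copy's own Array

p original[:items] # prints: ["water filter", "tent", "stove"]
p deep[:items]     # prints: ["water filter", "tent", "lantern"]
```

Reassigning a key on the copy (shallow[:items] = "tent") would not touch the original. Only in-place changes to shared nested objects leak through. Marshal also cannot dump every object (procs and objects with singleton methods, for example), so the round trip only works for marshalable data.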

Rails - Create a helper method that prints a specific attribute of all objects (renders an array instead)

Sorry for this simple question but I am stuck here.
I am trying to create a helper method that simply prints the "name" attribute of each object in my "Term" table.
I tried this:
def display_name(terms)
  terms.each do |term|
    p term.name
  end
end
But instead of printing each object's name, it renders an array of the objects with all their attributes.
As an example:
[#<Term id: 1, name: "test", definition: "first definition", created_at: "2011-07-21 14:52:12", updated_at: "2011-07-21 14:52:12">,
#<Term id: 2, name: "second test", definition: "blabla", created_at: "2011-07-20 18:00:42", updated_at: "2011-07-20 18:04:15">]
I have looked through the documentation (content_tag, concat, collect), but none of them seems to give the result I want.
Thanks for your explanation

The reason is that the method does not output the names to the page. p writes to standard output (the server log). The helper's return value is the value of its last expression, terms.each, which returns the terms collection itself, and that collection is what the view renders.
I would use the map method to collect all the names into an array first. If you want a String instead of an Array, join them with whatever separator you prefer, like this:
def display_name(terms)
  terms.map(&:name).join ", "
end
You could also add a parameter to choose the separator, like this:
def display_name(terms, sep = ", ")
  terms.map(&:name).join sep
end
# in view
display_name(collection, "<br/>")
By default it separates the names with a comma, but you can pass something else. Note that the joined String is not marked HTML-safe, so with Rails' default output escaping, a "<br/>" separator shows up as literal text. Rails' safe_join(terms.map(&:name), tag(:br)) joins with an HTML-safe separator while still escaping the names.
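Here is a self-contained sketch of the map/join helper. A Struct stands in for the Term model, an assumption made only so the code runs without Rails. It also shows that each returns the receiver itself, which is why the original helper handed the whole collection back to the view.

```ruby
# Stand-in for the ActiveRecord Term model (assumption for this sketch).
Term = Struct.new(:name, :definition)

# Collect every name, then join them into a single String.
def display_name(terms, sep = ", ")
  terms.map(&:name).join(sep)
end

terms = [Term.new("test", "first definition"), Term.new("second test", "blabla")]

puts display_name(terms)        # prints: test, second test
puts display_name(terms, " | ") # prints: test | second test

# Array#each returns the receiver, not the block's results:
p terms.each { |term| term.name }.equal?(terms) # prints: true
```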

Ruby on Rails - Import Data from a CSV file

I would like to import data from a CSV file into an existing database table. I do not want to save the CSV file, just take the data from it and put it into the existing table. I am using Ruby 1.9.2 and Rails 3.
This is my table:
create_table "mouldings", :force => true do |t|
  t.string "suppliers_code"
  t.datetime "created_at"
  t.datetime "updated_at"
  t.string "name"
  t.integer "supplier_id"
  t.decimal "length", :precision => 3, :scale => 2
  t.decimal "cost", :precision => 4, :scale => 2
  t.integer "width"
  t.integer "depth"
end
Could you give me some code showing the best way to do this? Thanks.

require 'csv'
csv_text = File.read('...')
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
  Moulding.create!(row.to_hash)
end

A simpler version of yfeldblum's answer, which also works well with large files:
require 'csv'
CSV.foreach(filename, headers: true) do |row|
  Moulding.create!(row.to_hash)
end
There's no need for with_indifferent_access or symbolize_keys, and no need to read the file into a string first.
It doesn't keep the whole file in memory at once; it reads the file line by line and creates one Moulding per line.
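Here is a runnable sketch of the CSV.foreach approach that needs no database. It writes a small sample file to a Tempfile, with sample columns assumed from the mouldings table, and collects each row's Hash instead of calling Moulding.create!.

```ruby
require "csv"
require "tempfile"

# Write a small sample file (hypothetical data, columns from the mouldings table).
file = Tempfile.new(["mouldings", ".csv"])
file.write("suppliers_code,name,cost\ns1,supplier1,1.11\ns2,supplier2,2.22\n")
file.flush

# CSV.foreach reads one row at a time; headers: true yields CSV::Row objects
# whose to_hash maps header names to that row's values.
rows = []
CSV.foreach(file.path, headers: true) do |row|
  rows << row.to_hash # in the app: Moulding.create!(row.to_hash)
end
file.close! # closes and deletes the temp file

p rows
```

All values come back as Strings; ActiveRecord casts them to the column types (decimal, integer) when the record is assigned.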

The smarter_csv gem was specifically created for this use-case: to read data from CSV file and quickly create database entries.
require 'smarter_csv'
options = {}
SmarterCSV.process('input_file.csv', options) do |chunk|
  chunk.each do |data_hash|
    Moulding.create!( data_hash )
  end
end
You can use the chunk_size option to read N CSV rows at a time, and then use Resque in the inner loop to enqueue jobs that create the new records instead of creating them right away. This way you can spread the load of generating entries across multiple workers.
See also:
https://github.com/tilo/smarter_csv

You might try Upsert:
require 'upsert' # add this to your Gemfile
require 'csv'
u = Upsert.new Moulding.connection, Moulding.table_name
CSV.foreach(file, headers: true) do |row|
  selector = { name: row['name'] } # this treats "name" as the primary key and prevents the creation of duplicates by name
  setter = row.to_hash
  u.row selector, setter
end
If this is what you want, you might also consider getting rid of the auto-increment primary key from the table and setting the primary key to name. Alternatively, if some combination of attributes forms a primary key, use that as the selector. An index on the selector columns is not required, but it will make the upserts faster.

This can help. It has code examples too:
http://csv-mapper.rubyforge.org/
Or for a rake task for doing the same:
http://erikonrails.snowedin.net/?p=212

It is better to wrap the database work in a transaction block, so that a failure partway through rolls back all the inserts. The snippet below is a complete rake task that seeds a set of languages into a Language model:
require 'csv'
namespace :lan do
  desc 'Seed initial languages data with language & code'
  task init_data: :environment do
    puts '>>> Initializing Languages Data Table'
    ActiveRecord::Base.transaction do
      csv_path = File.expand_path('languages.csv', File.dirname(__FILE__))
      csv_str = File.read(csv_path)
      csv = CSV.new(csv_str).to_a
      csv.each do |lan_set|
        lan_code = lan_set[0]
        lan_str = lan_set[1]
        Language.create!(language: lan_str, code: lan_code)
        print '.'
      end
    end
    puts ''
    puts '>>> Languages Database Table Initialization Completed'
  end
end
Below is an excerpt of the languages.csv file:
aa,Afar
ab,Abkhazian
af,Afrikaans
ak,Akan
am,Amharic
ar,Arabic
as,Assamese
ay,Aymara
az,Azerbaijani
ba,Bashkir
...

A better way is to put it in a rake task. Create an import.rake file in lib/tasks/ and put this code in it:
desc "Imports a CSV file into an ActiveRecord table"
task :csv_model_import, [:filename, :model] => [:environment] do |task,args|
  lines = File.new(args[:filename], "r:ISO-8859-1").readlines
  header = lines.shift.strip
  keys = header.split(',')
  lines.each do |line|
    values = line.strip.split(',')
    attributes = Hash[keys.zip values]
    Module.const_get(args[:model]).create(attributes)
  end
end
After that, run this command in your terminal: rake csv_model_import[file.csv,Name_of_the_Model]

I know it's an old question, but it still shows up in the first 10 results on Google.
Saving rows one by one is not very efficient, because it makes a database call on every iteration of the loop. You should avoid that, especially when you need to insert large amounts of data.
It's better (and significantly faster) to use a batch insert.
INSERT INTO `mouldings` (suppliers_code, name, cost)
VALUES
  ('s1', 'supplier1', 1.111),
  ('s2', 'supplier2', 2.222)
You can build such a query manually and then run Model.connection.execute(RAW SQL STRING) (not recommended),
or use the activerecord-import gem (first released on 11 Aug 2010): just put the data in an array rows and call Model.import rows.
Refer to the gem's docs for details.
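Here is a minimal sketch of preparing the data for a batch insert, assuming a CSV with a header row. The header row supplies the column list, each data row becomes an array of values, and each_slice groups the rows into batches. BATCH_SIZE is a hypothetical value, kept small so the output is easy to follow. The Moulding.import(columns, batch) call (activerecord-import's columns-plus-values form) is left in a comment so the sketch runs without the gem or a database.

```ruby
require "csv"

BATCH_SIZE = 2 # hypothetical; a real import would use a much larger batch

csv_text = "suppliers_code,name,cost\ns1,supplier1,1.11\ns2,supplier2,2.22\ns3,supplier3,3.33\n"
table   = CSV.parse(csv_text, headers: true)
columns = table.headers       # column names from the header row
values  = table.map(&:fields) # one Array of values per data row

batches = values.each_slice(BATCH_SIZE).to_a
batches.each do |batch|
  # With activerecord-import installed, each batch would be one multi-row INSERT:
  # Moulding.import(columns, batch)
  p batch
end
```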

Use this gem:
https://rubygems.org/gems/active_record_importer
class Moulding < ActiveRecord::Base
  acts_as_importable
end
Then you can use:
Moulding.import!(file: File.open(PATH_TO_FILE))
Just be sure that your headers match the column names of your table.

The following module can be extended into any model, and it will import the data according to the column headers defined in the CSV.
Notes:
This is a great internal tool; for customer-facing use I would recommend adding safeguards and sanitization.
The column names in the CSV must match the DB schema exactly, or it won't work.
The import class method it calls comes from the activerecord-import gem, so that gem must be installed.
It could be improved further by taking the headers from the table's columns instead of defining them in the file.
Create a file named "csv_importer.rb" in your models/concerns folder:
module CsvImporter
  extend ActiveSupport::Concern
  require 'csv'
  def convert_csv_to_book_attributes(csv_path)
    # CSV.read returns every row as an Array and closes the file afterwards
    csv_rows = CSV.read(csv_path).compact
    columns = csv_rows[0].map(&:strip).map(&:to_sym)
    csv_rows.shift
    return columns, csv_rows
  end
  def import_by_csv(csv_path)
    columns, attributes_array = convert_csv_to_book_attributes(csv_path)
    message = ""
    begin
      self.import columns, attributes_array, validate: false
      message = "Import Successful."
    rescue => e
      message = e.message
    end
    return message
  end
end
Add extend CsvImporter to whichever model you would like to extend this functionality to.
In your controller you can have an action like the following to utilize this functionality:
def import_file
  model_name = params[:table_name].singularize.camelize.constantize
  csv = params[:file].path
  @message = model_name.import_by_csv(csv)
end

It's better to use a CSV::Table (for example, the result of CSV.parse(text, headers: true)) and to normalize line endings first with String#encode(universal_newline: true), which converts CRLF and CR to LF.
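Here is a minimal sketch of that suggestion, using the \r-separated sample from the rake task question above. encode(universal_newline: true) rewrites CRLF and lone CR as LF, and CSV.parse with headers: true returns a CSV::Table whose rows convert to attribute hashes. The helper name parse_csv_text is hypothetical.

```ruby
require "csv"

# Hypothetical helper: normalize line endings, then parse with a header row.
def parse_csv_text(text)
  # universal_newline: true replaces "\r\n" and lone "\r" with "\n"
  normalized = text.encode(universal_newline: true)
  CSV.parse(normalized, headers: true) # returns a CSV::Table
end

table = parse_csv_text("name,age,food\rtabitha,2,carrots\relijah,1,lettuce\rbeatrice,3,apples")
p table.headers
table.each { |row| p row.to_h } # each Hash could be passed to Model.create(...)
```

Ruby's CSV also tries to auto-detect the row separator by default. Normalizing up front still helps when the same text passes through other code, such as readlines, that expects \n.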

If you want to use SmarterCSV:
all_data = SmarterCSV.process(
  params[:file].tempfile,
  {
    :col_sep => "\t",
    :row_sep => "\n"
  }
)
This reads tab-delimited data ("\t" between the fields of a row) with rows separated by newlines ("\n").
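For comparison, this sketch swaps in a different library than the answer uses: Ruby's bundled CSV parser, which accepts col_sep and row_sep options with the same meaning. It parses tab-delimited, newline-separated text with a header row into hashes with String keys. The sample data is made up.

```ruby
require "csv"

# Tab between fields, newline between rows, first row is the header.
tsv  = "name\tcost\nsupplier1\t1.11\nsupplier2\t2.22\n"
rows = CSV.parse(tsv, col_sep: "\t", row_sep: "\n", headers: true).map(&:to_h)

p rows
```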
