I would like to import data from a CSV file into an existing database table. I do not want to save the CSV file, just take the data from it and put it into the existing table. I am using Ruby 1.9.2 and Rails 3.
This is my table:
create_table "mouldings", :force => true do |t|
  t.string "suppliers_code"
  t.datetime "created_at"
  t.datetime "updated_at"
  t.string "name"
  t.integer "supplier_id"
  t.decimal "length", :precision => 3, :scale => 2
  t.decimal "cost", :precision => 4, :scale => 2
  t.integer "width"
  t.integer "depth"
end
Can you give me some code showing the best way to do this? Thanks.
require 'csv'
csv_text = File.read('...')
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
  Moulding.create!(row.to_hash)
end
A simpler version of yfeldblum's answer that also works well with large files:
require 'csv'
CSV.foreach(filename, headers: true) do |row|
  Moulding.create!(row.to_hash)
end
No need for with_indifferent_access or symbolize_keys, and no need to read in the file to a string first.
It doesn't keep the whole file in memory at once; it reads the file line by line and creates one Moulding per line.
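To see what each row looks like before it reaches the model, here is a small stdlib-only sketch. The temporary file, its contents, and the `created` array (a stand-in for Moulding.create!) are made up for illustration.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical stand-in for Moulding.create!; it only records the attributes.
created = []

Tempfile.create(['mouldings', '.csv']) do |file|
  file.write("suppliers_code,name,cost\ns1,Oak,1.11\ns2,Pine,2.22\n")
  file.flush

  # CSV.foreach yields one CSV::Row at a time instead of loading the whole file.
  CSV.foreach(file.path, headers: true) do |row|
    created << row.to_hash
  end
end

p created.first
```

Note that the hash keys and values are all strings at this point; ActiveRecord casts them to the column types when the record is created.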
The smarter_csv gem was created specifically for this use case: reading data from a CSV file and quickly creating database entries.
require 'smarter_csv'
options = {}
SmarterCSV.process('input_file.csv', options) do |chunk|
  chunk.each do |data_hash|
    Moulding.create!( data_hash )
  end
end
You can use the chunk_size option to read N CSV rows at a time, and then use Resque in the inner loop to enqueue jobs that create the new records rather than creating them right away. This way you can spread the load of creating entries across multiple workers.
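If you just want to see the chunking idea without the gem, here is a rough sketch using only the standard library (each_slice on a CSV reader). This is not SmarterCSV's API; the sample data and the chunk size of 2 are assumptions.

```ruby
require 'csv'

csv_text = "name,width\nA,1\nB,2\nC,3\nD,4\nE,5\n"
chunk_size = 2 # hypothetical value; tune it to your workload

chunks = []
CSV.new(csv_text, headers: true).each_slice(chunk_size) do |rows|
  # In a real app, this is where one background job per chunk would be enqueued.
  chunks << rows.map(&:to_hash)
end

puts chunks.size
```

Each element of `chunks` is an array of at most `chunk_size` attribute hashes, which is the unit of work you would hand to a worker.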
See also:
https://github.com/tilo/smarter_csv
You might try Upsert:
require 'upsert' # add this to your Gemfile
require 'csv'
u = Upsert.new Moulding.connection, Moulding.table_name
CSV.foreach(file, headers: true) do |row|
  selector = { name: row['name'] } # this treats "name" as the primary key and prevents the creation of duplicates by name
  setter = row.to_hash
  u.row selector, setter
end
If this is what you want, you might also consider dropping the auto-increment primary key from the table and making name the primary key. Alternatively, if some combination of attributes forms a primary key, use that as the selector. An index is not required; it just makes the upsert faster.
This can help. It has code examples too:
http://csv-mapper.rubyforge.org/
Or for a rake task for doing the same:
http://erikonrails.snowedin.net/?p=212
It is better to wrap the database-related work inside a transaction block. The code snippet below shows the full process of seeding a set of languages into the Language model:
require 'csv'
namespace :lan do
  desc 'Seed initial languages data with language & code'
  task init_data: :environment do
    puts '>>> Initializing Languages Data Table'
    ActiveRecord::Base.transaction do
      csv_path = File.expand_path('languages.csv', File.dirname(__FILE__))
      csv_str = File.read(csv_path)
      csv = CSV.new(csv_str).to_a
      csv.each do |lan_set|
        lan_code = lan_set[0]
        lan_str = lan_set[1]
        Language.create!(language: lan_str, code: lan_code)
        print '.'
      end
    end
    puts ''
    puts '>>> Languages Database Table Initialization Completed'
  end
end
The snippet below is part of the languages.csv file:
aa,Afar
ab,Abkhazian
af,Afrikaans
ak,Akan
am,Amharic
ar,Arabic
as,Assamese
ay,Aymara
az,Azerbaijani
ba,Bashkir
...
The better way is to include it in a rake task. Create an import.rake file inside /lib/tasks/ and put this code in it.
desc "Imports a CSV file into an ActiveRecord table"
task :csv_model_import, [:filename, :model] => [:environment] do |task,args|
  lines = File.new(args[:filename], "r:ISO-8859-1").readlines
  header = lines.shift.strip
  keys = header.split(',')
  lines.each do |line|
    values = line.strip.split(',')
    attributes = Hash[keys.zip values]
    Module.const_get(args[:model]).create(attributes)
  end
end
After that, run this command in your terminal: rake csv_model_import[file.csv,Name_of_the_Model]
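One caveat with splitting on commas by hand: it breaks as soon as a field is quoted and contains a comma. A small sketch of the difference, using the stdlib CSV.parse_line on a made-up line:

```ruby
require 'csv'

keys = %w[suppliers_code name cost]
line = %q{s1,"Oak, stained",1.11} # hypothetical row with a quoted comma

naive  = line.strip.split(',')  # splits inside the quoted field
parsed = CSV.parse_line(line)   # respects CSV quoting rules

attributes = Hash[keys.zip(parsed)]
p naive.size, parsed.size, attributes
```

If your files can contain quoted fields, swapping split(',') for CSV.parse_line in the task is probably the smallest change that fixes it.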
I know it's an old question, but it is still in the first 10 links on Google.
Saving rows one by one is not very efficient because it issues a database call on every loop iteration. You should avoid that, especially when you need to insert large amounts of data.
It's better (and significantly faster) to use a batch insert.
INSERT INTO `mouldings` (suppliers_code, name, cost)
VALUES
('s1', 'supplier1', 1.111),
('s2', 'supplier2', 2.222)
You can build such a query manually and then run Model.connection.execute(RAW SQL STRING) (not recommended),
or use the activerecord-import gem (first released on 11 Aug 2010). In that case, just put the data in an array rows and call Model.import rows.
Refer to the gem docs for details.
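As a rough sketch of preparing the data for a bulk insert, the snippet below turns CSV text into a columns array and a rows array using only the standard library. The sample data is made up, and the final import call is left as a comment because it needs the activerecord-import gem and a database.

```ruby
require 'csv'

csv_text = "suppliers_code,name,cost\ns1,supplier1,1.11\ns2,supplier2,2.22\n"
table = CSV.parse(csv_text, headers: true)

columns = table.headers.map(&:to_sym) # column names taken from the header row
rows    = table.map(&:fields)         # one array of values per CSV row

# With activerecord-import installed, the call would then be (not run here):
#   Moulding.import columns, rows
p columns, rows
```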
Use this gem:
https://rubygems.org/gems/active_record_importer
class Moulding < ActiveRecord::Base
  acts_as_importable
end
You can then use:
Moulding.import!(file: File.open(PATH_TO_FILE))
Just be sure that your headers match the column names of your table.
The following module can be extended on any model and it will import the data according to the column headers defined in the CSV.
Note:
This is a great internal tool; for customer-facing use I would recommend adding safeguards and sanitization.
The column names in the CSV must exactly match the DB schema or it won't work.
It can be further improved by using the table name to get the headers instead of defining them in the file.
Create a file named "csv_importer.rb" in your models/concerns folder
module CsvImporter
  extend ActiveSupport::Concern
  require 'csv'
  def convert_csv_to_book_attributes(csv_path)
    csv_rows = CSV.open(csv_path).each.to_a.compact
    columns = csv_rows[0].map(&:strip).map(&:to_sym)
    csv_rows.shift
    return columns, csv_rows
  end
  def import_by_csv(csv_path)
    columns, attributes_array = convert_csv_to_book_attributes(csv_path)
    message = ""
    begin
      self.import columns, attributes_array, validate: false
      message = "Import Successful."
    rescue => e
      message = e.message
    end
    return message
  end
end
Add extend CsvImporter to whichever model you would like to extend this functionality to.
In your controller you can have an action like the following to utilize this functionality:
def import_file
  model_name = params[:table_name].singularize.camelize.constantize
  csv = params[:file].path
  @message = model_name.import_by_csv(csv)
end
It's better to use CSV::Table and String#encode(universal_newline: true), which converts CRLF and CR to LF.
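A minimal sketch of that normalization on a made-up string with mixed line endings:

```ruby
require 'csv'

raw = "name,width\r\nA,1\rB,2\n" # mixed CRLF, CR and LF endings
normalized = raw.encode(universal_newline: true)

table = CSV.parse(normalized, headers: true) # returns a CSV::Table
rows = table.map(&:to_hash)
p normalized, rows
```

After the encode call every line ends in a plain LF, so the parser sees consistent row separators.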
If you want to use SmarterCSV:
all_data = SmarterCSV.process(
  params[:file].tempfile,
  {
    :col_sep => "\t",
    :row_sep => "\n"
  }
)
This reads tab-delimited data: the fields in each row are separated by "\t" and the rows are separated by newlines "\n".
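For comparison, the same separators can be passed to the standard library parser. This sketch uses stdlib CSV rather than SmarterCSV, and the tab-separated sample text is made up:

```ruby
require 'csv'

tsv = "name\twidth\nA\t1\nB\t2\n" # hypothetical tab-delimited input

rows = CSV.parse(tsv, headers: true, col_sep: "\t", row_sep: "\n").map(&:to_hash)
p rows
```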
I am trying to parse a CSV file from a URL, store it in my database, and then render an HTML table in Rails that lists the data from the CSV file.
The problem is that every value comes back fine except one float column, which is blank in the table.
This is my migration
class CreateHmrcRates < ActiveRecord::Migration[6.0]
  def change
    create_table :hmrc_rates do |t|
      t.string :country
      t.string :currency
      t.string :curr_code
      t.float :units_per_pound
      t.datetime :start_date
      t.datetime :end_date
      t.timestamps
    end
  end
end
This is my Controller
require 'open-uri'
require 'csv'
class HmrcRatesController < ApplicationController
  def index
    @hmrc_rates = HmrcRate.all
  end
  def new
    csv_text = URI.open('https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/988629/exrates-monthly-0621.csv'){ |io| io.read.encode("UTF-8", invalid: :replace) }
    csv = CSV.parse(csv_text, :headers=>true)
    csv.each_with_index do |row, i|
      HmrcRate.create(country: row["Country/Territories"], currency: row["Currency"], curr_code: row["Currency Code"], units_per_pound: row["Currency units per £1"].to_f, start_date: Date.parse(row["Start Date"]), end_date: Date.parse(row["End Date"]))
      puts "#{i}. #{row}"
      puts "***********************"
    end
    redirect_to hmrc_rates_path, notice: "HMRC rates created"
  end
end
At first it was like this:
units_per_pound: row["Currency units per £1"]
This couldn't pass validation:
units_per_pound can not be blank
So I tried:
units_per_pound: row["Currency units per £1"].to_f
This didn't trigger the validation, but when I checked in the command prompt, all float values were recorded as 0.0.
My rails page looks like this
Can you help me get my precious float values? Thank you for your time and effort.
The CSV file you are downloading is NOT UTF-8 encoded. Instead, it appears to be encoded in Windows-1252 (although you might want to check the documentation for the API if available). Because of that, the name of the header is not correct if you interpret the incoming data as UTF-8. As you have specified that invalid characters should be replaced with a replacement character on reading, the £ character is read incorrectly by your code.
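To illustrate why the values ended up as 0.0, here is a small sketch with made-up bytes standing in for the download. It assumes the file is Windows-1252, where "£" is the single byte 0xA3, and uses String#scrub to mimic replacing invalid characters.

```ruby
require 'csv'

# Hypothetical two-line file as raw bytes; 0xA3 is the pound sign in Windows-1252.
raw = "Country,Currency units per \xA31\nAbu Dhabi,5.1966\n".b

# Treating the bytes as UTF-8 and replacing the invalid byte mangles the header.
mangled = raw.dup.force_encoding('UTF-8').scrub
row = CSV.parse(mangled, headers: true).first

value = row['Currency units per £1'] # the header no longer matches, so this is nil
p value, value.to_f
```

A failed header lookup returns nil, which explains the validation error, and nil.to_f is 0.0, which explains the zeros.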
You can fix this by specifying the correct encoding when reading the file into a string and then re-encoding the string into your desired target encoding:
csv_uri = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/988629/exrates-monthly-0621.csv'
csv_text = URI.open(csv_uri, encoding: "Windows-1252").read.encode('UTF-8')
Here, we specify that the data as read by URI.open is supposed to be Windows-1252 encoded. In the second step, we re-encode this data to UTF-8.
Now, you can use the rest of your code to parse the csv_text. As we are now using the correct encoding, we can use the correct header names:
csv = CSV.parse(csv_text, headers: true)
csv.each_with_index do |row, i|
  puts "#{row['Country/Territories']} - #{row['Currency units per £1 ']}"
end
Note that the specified header name for the currency units has a trailing space in the file. As such, you have to also specify this trailing space when accessing the value in the CSV::Row object.
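If you would rather not carry the trailing space around in your code, one option is to strip the header names while parsing. This is only a sketch: the bytes below are a made-up stand-in for the downloaded file, and the strip_header lambda is a name I chose for illustration.

```ruby
require 'csv'

# Hypothetical Windows-1252 bytes; note the trailing space after "£1" in the header.
raw = "Country/Territories,Currency units per \xA31 \nAbu Dhabi,5.1966\n".b
csv_text = raw.force_encoding('Windows-1252').encode('UTF-8')

strip_header = ->(header) { header.strip } # removes surrounding whitespace from each header
table = CSV.parse(csv_text, headers: true, header_converters: [strip_header])

rate = table.first['Currency units per £1'].to_f
p table.headers, rate
```

With the converter in place, the lookup works with the clean header name.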
If you do not want to depend on header names but can be sure that the order of columns doesn't change, you can also use numeric indexes rather than names to access the cells in a row:
csv = CSV.parse(csv_text, headers: true)
csv.each_with_index do |row, i|
  puts "#{row[0]} - #{row[3]}"
end
I created a rake task to import users from a Google Sheet, using the 'roo' gem. Everything works so far, but I can't get it to skip the first row (headers) during the import.
This is my code:
require 'roo'
namespace :import do
  desc "Import users from Google Sheet"
  task users: :environment do
    @counter = 0
    url = 'https://docs.google.com/spreadsheets/d/{mycode}/export?format=xlsx'
    xlsx = Roo::Spreadsheet.open(url, extension: :xlsx, headers: true)
    xlsx.each do |row|
      n = User.where(name:row[0]).first
      user = User.find_or_create_by(id: n)
      user.update(
        name:row[0],
        country_id:row[6]
      )
      user.save!
      puts user.name
      @counter += 1
    end
    puts "Imported #{@counter} lines."
  end
end
Your code says headers: true when you are opening the sheet. Have you tried turning it to false? Or are you saying it does not work when it's set to false?
Also, you are using .each rather differently than the example in the documentation. The doc shows a hash with keys derived from the headers. You are using [n] array notation. Does that work?
EDIT:
Try using .each in a way that's more similar to what the documentation says:
xlsx.each(name: 'Name', country_id: 'Country ID') do |row|
  n = User.where(name: row[:name]).first
  ...
end
The strings 'Name' and 'Country ID' are just examples; they should be the text of whatever column headers have the name and country_id information.
There is a way to skip the headers: use the method each_row_streaming(offset: 1).
It yields each row as an array, skipping the header, so you have to read each value with the .value method. The documentation describes it for Excelx::Cell objects, but it works for Roo::Spreadsheet objects too.
The documentation example:
xlsx.each_row_streaming(offset: 1) do |row| # Will exclude first (inevitably header) row
  puts row.inspect # Array of Excelx::Cell objects
end
I have a file b.xls from Excel that I need to import into my Rails app.
I tried to open it:
file = File.read(Rails.root.to_s+'/b.xls')
and got this:
file.encoding => #<Encoding:UTF-8>
I have a few questions:
How do I open it without these symbols (as readable text)?
How do I convert this file to a hash?
The file is pretty large, about 5k lines.
You need an array of all rows; then you can convert it to a hash if you like.
I would recommend using the batch_factory gem.
The gem is very simple and relies on the roo gem under the hood.
Here is a code example:
require 'batch_factory'
factory = BatchFactory.from_file(
  Rails.root.join('b.xlsx'),
  keys: [:column1, :column2, ..., :what_ever_column_name]
)
Then you can do:
factory.each do |row|
  puts row[:column1]
end
You can also omit the keys. batch_factory will then fetch the headers from the first row automatically, but your keys will be in Russian. For example:
factory.each do |row|
  puts row['Товар']
end
If you want a hash with the product name as the key, you can do:
factory.inject({}) do |hash, row|
  hash.merge(row['Товар'] => row)
end
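The same indexing idea on plain Ruby data, as a sketch: each_with_object fills one hash in place instead of building a new hash on every merge. The rows below are made up, and a plain array of hashes stands in for the factory.

```ruby
# Hypothetical rows, shaped like what the factory would yield per spreadsheet row.
rows = [
  { 'Товар' => 'A', 'Цена' => 100 },
  { 'Товар' => 'B', 'Цена' => 40 }
]

# Index the rows by product name; later duplicates overwrite earlier ones.
by_product = rows.each_with_object({}) do |row, hash|
  hash[row['Товар']] = row
end

p by_product.keys
```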
I have a CSV file containing a dump of a table, and I would like to import it directly into my database using Rails.
I currently have this code:
csv_text = File.read("public/csv_fetch/#{model.table_name}.csv")
ActiveRecord::Base.connection.execute("TRUNCATE TABLE #{model.table_name}")
puts "\nUpdating table #{model.table_name}"
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
  row = row.to_hash.with_indifferent_access
  ActiveRecord::Base.record_timestamps = false
  model.create!(row.to_hash.symbolize_keys)
end
with help from here..
Consider my Sample csv:
id,code,created_at,updated_at,hashcode
10,00001,2012-04-12 06:07:26,2012-04-12 06:07:26,
2,00002,0000-00-00 00:00:00,0000-00-00 00:00:00,temphashcode
13,00007,0000-00-00 00:00:00,0000-00-00 00:00:00,temphashcode
43,00011,0000-00-00 00:00:00,0000-00-00 00:00:00,temphashcode
5,00012,0000-00-00 00:00:00,0000-00-00 00:00:00,temphashcode
But the problems with this code are:
It generates id as an autoincrement 1, 2, 3, ... instead of using what is in the CSV file.
The timestamps for records with 0000-00-00 00:00:00 default to null automatically, which throws an error because the created_at column cannot be null.
Is there any way I can import from CSV to models in a generic way?
Or would I have to write custom code for each model to manipulate the attributes in each row manually?
For question 1, I suggest you log row.to_hash.symbolize_keys, e.g.
# ...
csv.each do |row|
  #...
  hash = row.to_hash.symbolize_keys
  Rails.logger.info "hash: #{hash.inspect}"
  model.create!(hash)
end
to see if the "id" is assigned.
For question 2, I don't think it's a good idea to store "0000-00-00" instead of nil for the date.
Manually setting fields like id, and the timestamp fields too, solved it:
model.id = row[:id]
and similarly for created_at and updated_at if these exist in the model.
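For the zero timestamps, one possible approach is to rewrite them before the attributes reach the model. The sketch below is only an assumption about how you might do it: normalize_timestamps is a hypothetical helper, and the fallback time is a value you would have to choose yourself.

```ruby
ZERO_TIMESTAMP = '0000-00-00 00:00:00'
TIMESTAMP_COLUMNS = %w[created_at updated_at].freeze

# Hypothetical helper: replaces zero timestamps with the given fallback value
# and leaves every other attribute untouched.
def normalize_timestamps(attrs, fallback)
  attrs.map do |key, value|
    zero = TIMESTAMP_COLUMNS.include?(key) && value == ZERO_TIMESTAMP
    [key, zero ? fallback : value]
  end.to_h
end

row = { 'id' => '2', 'code' => '00002', 'created_at' => ZERO_TIMESTAMP,
        'updated_at' => ZERO_TIMESTAMP, 'hashcode' => 'temphashcode' }
fallback = Time.utc(2012, 4, 12) # assumed placeholder, not taken from the data

cleaned = normalize_timestamps(row, fallback)
p cleaned
```

The id still has to be assigned explicitly as shown above; this only deals with the NOT NULL timestamp columns.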
I had to migrate a MySQL-based Ruby on Rails app to PostgreSQL. No problems but one so far, and I don't know how to solve it.
The data migration brought ids along with it, and PostgreSQL is now having problems with existing ids. It's not clear to me where it gets the value it uses as the base for nextval; it certainly isn't the highest value in the column, although you might think that would be a good idea. In any case, it's now colliding with existing id values. The id column, created from a standard RoR migration, is defined as
not null default nextval('geopoints_id_seq'::regclass)
Is there some place that the value it uses as a base can be hacked? This problem could now arise in any of 20 or so tables: I could use
'select max(id) from <table_name>'
but that seems to make the idea of an autoincrement column pointless.
How is this best handled?
There is a reset_pk_sequence! method on the Postgres adapter. You can call it and it will set the sequence to max(id) + 1, which is probably what you want.
In some projects I get data ETL'ed in often enough to warrant a rake task that does this for all models, or for a specified model. Here's the task; include it in some Rakefile or in its own file under lib/tasks:
desc "Reset all sequences. Run after data imports"
task :reset_sequences, :model_class, :needs => :environment do |t, args|
  if args[:model_class]
    classes = Array(eval args[:model_class])
  else
    puts "using all defined active_record models"
    classes = []
    Dir.glob(RAILS_ROOT + '/app/models/**/*.rb').each { |file| require file }
    Object.subclasses_of(ActiveRecord::Base).select { |c|
      c.base_class == c}.sort_by(&:name).each do |klass|
      classes << klass
    end
  end
  classes.each do |klass|
    next if klass == CGI::Session::ActiveRecordStore::Session && ActionController::Base.session_store.to_s !~ /ActiveRecordStore/
    puts "resetting sequence on #{klass.table_name}"
    ActiveRecord::Base.connection.reset_pk_sequence!(klass.table_name)
  end
end
Now you can run this either for all models (defined under RAILS_ROOT/app/models) using rake reset_sequences, or for a specific model by passing in a class name.
The Rails 3 version looks like this:
namespace :db do
  desc "Reset all sequences. Run after data imports"
  task :reset_sequences, :model_class, :needs => :environment do |t, args|
    if args[:model_class]
      classes = Array(eval args[:model_class])
    else
      puts "using all defined active_record models"
      classes = []
      Dir.glob(RAILS_ROOT + '/app/models/**/*.rb').each { |file| require file }
      ActiveRecord::Base.subclasses.select { |c|c.base_class == c}.sort_by(&:name).each do |klass|
        classes << klass
      end
    end
    classes.each do |klass|
      puts "resetting sequence on #{klass.table_name}"
      ActiveRecord::Base.connection.reset_pk_sequence!(klass.table_name)
    end
  end
end
https://gist.github.com/909032
With that definition, the column will get the next value from the geopoints_id_seq sequence.
That sequence is not directly attached to the table. If you're migrating data, you have to create or update that sequence so its starting point is larger than the current max id in your table.
You should be able to set its new value with e.g.
ALTER SEQUENCE geopoints_id_seq RESTART with 1692;
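Use a value one greater than whatever select max(id) from table_name; yields, because RESTART WITH sets the value that the next nextval call will return.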
PG uses sequences:
Set the sequence's current value to the highest id in your table like this. With the third argument true, the next nextval call returns one more than that value.
SELECT setval('geopoints_id_seq', 999999999, true);
Also see these
http://www.postgresql.org/docs/8.4/interactive/datatype-numeric.html#DATATYPE-SERIAL
http://www.postgresql.org/docs/8.4/interactive/functions-sequence.html
Use setval() to set the starting value for the sequence.
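If you prefer to generate the statement per table, here is a hedged sketch of a small Ruby helper that builds the SQL. reset_sequence_sql is a hypothetical name, and it assumes trusted table and column names (they are interpolated directly) and an id column backed by a serial sequence, so that pg_get_serial_sequence can find it.

```ruby
# Hypothetical helper: builds a statement that points the sequence at max(id) + 1.
# With the third setval argument false, the next nextval returns exactly that value.
def reset_sequence_sql(table, column = 'id')
  <<~SQL
    SELECT setval(
      pg_get_serial_sequence('#{table}', '#{column}'),
      COALESCE(MAX(#{column}), 0) + 1,
      false
    ) FROM #{table};
  SQL
end

sql = reset_sequence_sql('geopoints')
puts sql
```

You could run the resulting string once per imported table, for example through ActiveRecord::Base.connection.execute.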