Importing a CSV to Rails database - ruby-on-rails

I asked this question earlier this week, and it worked fine. I just tried it with a slightly bigger spreadsheet and it doesn't seem to work for some reason.
My code is as follows:
require 'roo'
xlsx = Roo::Spreadsheet.open(File.expand_path('../Downloads/unistats/LOCATION.csv'))
xlsx.each_row_streaming(offset: 1) do |row|
Location.find_or_create_by(ukprn: row[0].value, accomurl: row[1].value, instbeds: row[3].value, instlower: row[4].value, instupper: row[5].value, locid: row[6].value, locname: row[7].value, lat: row[9].value, long: row[10].value, locukprn: row[11].value, loccountry: row[12].value, privatelower: row[13].value, privateupper: row[14].value, suurl: row[15].value)
end
But unlike last time, this is coming up with this error:
NoMethodError: undefined method `each_row_streaming' for #<Roo::CSV:0xb9e0b78>
Did you mean? each_row_using_tempdir
This file is a CSV rather than .xlsx but that shouldn't make a difference.
Any ideas what I'm doing wrong?

It does actually make a difference that you're trying to read a CSV file using the Excel methods: as the error shows, a Roo::CSV object has no each_row_streaming method.
Excerpts from the Roo documentation.
# Load a CSV file
s = Roo::CSV.new("mycsv.csv")
# Load a tab-delimited csv
s = Roo::CSV.new("mytsv.tsv", csv_options: {col_sep: "\t"})
# Load a csv with an explicit encoding
s = Roo::CSV.new("mycsv.csv", csv_options: {encoding: Encoding::ISO_8859_1})
A neat way to read both Excel and CSV files is to do something like:
# File.extname returns the extension with its leading dot, e.g. ".xls"
if File.extname(filename).start_with?('.xls')
  workbook = Roo::Excel.new(filename)
else
  workbook = Roo::CSV.new(filename)
end
workbook.default_sheet = workbook.sheets[0]
(workbook.first_row..workbook.last_row).each do |line|
  ...
end
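For a plain CSV file, Ruby's standard CSV library can also do the import without Roo. The following is a minimal sketch of that swapped-in approach, not the Roo API. COLUMN_MAP and location_rows are hypothetical names, the column positions are copied from the find_or_create_by call in the question, and the sample data is made up.

```ruby
require 'csv'

# Column index => attribute name, copied from the question's find_or_create_by call.
COLUMN_MAP = {
  0 => :ukprn, 1 => :accomurl, 3 => :instbeds, 4 => :instlower,
  5 => :instupper, 6 => :locid, 7 => :locname, 9 => :lat, 10 => :long,
  11 => :locukprn, 12 => :loccountry, 13 => :privatelower,
  14 => :privateupper, 15 => :suurl
}.freeze

# Hypothetical helper: parse CSV text, skip the header row, and build one
# attribute hash per data row.
def location_rows(csv_text)
  CSV.parse(csv_text).drop(1).map do |row|
    COLUMN_MAP.to_h { |index, attribute| [attribute, row[index]] }
  end
end

# Made-up sample: one header row and one data row with 16 columns each.
header = (0..15).map { |i| "col#{i}" }.join(',')
data   = (0..15).map { |i| "v#{i}" }.join(',')
rows   = location_rows("#{header}\n#{data}\n")
```

In the Rails app, each hash could then be passed along, for example rows.each { |attrs| Location.find_or_create_by(attrs) }.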

Related

Rails Import/Parse from CSV UTF-8 Missing Column

So I'm working on allowing users to import data from a CSV file. Right now all the fields will import correctly, except whatever is the first field.
What I've discovered is the file type is affecting the import.
My code looks like:
class Import < Operation
  require 'csv'
  def call(file, training_event_id)
    csv_data = CSV.parse(file.read, headers: true)
    list_occo = []
    csv_data.each do |row|
      occupant = Occupant.new
      occupant.account_number = row['Account Number']
      occupant.check_in = row['Check In']
      binding.pry
      occupant.training_event_id = training_event_id
      list_occo << occupant
    end
    binding.pry
    occo_errors = check_file(list_occo)
    list_occo.each(&:save) if occo_errors.empty?
    return occo_errors
  end
end
When I stop at the binding.pry and inspect occupant, the Account Number is nil if the file was saved as CSV UTF-8. If I save it as plain CSV, there is no issue. I tried passing an encoding to the parse, such as encoding: 'iso-8859-1', but that didn't work.
Is there a way to convert a CSV UTF-8 file to plain CSV, or to check the file format up front to ensure it's CSV and not CSV UTF-8?
Just in case someone comes across this issue in the future: I looked at the file in the Rails console using CSV.read(file.path) and noticed U+FEFF (a byte order mark, or BOM) preceding the first column header. There's a rabbit hole of information about BOM and UTF-8 issues. Not wanting to do a CSV/File.open, I attempted things like a split, gsub, file checks on UTF-8, etc. Then I simply changed the csv_data line to be:
csv_data = CSV.parse(File.read(file, encoding: 'bom|utf-8'), headers: true)
Then in my controller I changed (params[:file]) to (params[:file].path), as I was getting this error:
no implicit conversion of ActionDispatch::Http::UploadedFile into String
Hopefully this helps someone else.
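The effect of the 'bom|utf-8' encoding can be reproduced outside Rails. This is a small sketch with a made-up two-column file written to a temporary directory. The path and sample values are assumptions for illustration only.

```ruby
require 'csv'
require 'tmpdir'

# Made-up sample: a UTF-8 CSV that starts with a byte order mark (U+FEFF).
path = File.join(Dir.mktmpdir, 'occupants.csv')
File.binwrite(path, "\uFEFFAccount Number,Check In\n1001,2020-01-01\n")

# Plain UTF-8 read: the BOM stays glued to the first header name,
# so looking up 'Account Number' finds nothing.
naive = CSV.parse(File.read(path, encoding: 'utf-8'), headers: true)
naive_account = naive.first['Account Number']

# 'bom|utf-8' tells Ruby to strip a leading BOM while reading.
clean = CSV.parse(File.read(path, encoding: 'bom|utf-8'), headers: true)
clean_account = clean.first['Account Number']

puts naive_account.inspect
puts clean_account.inspect
```

Only the first column is affected because the BOM sits at the very start of the file, which matches the symptom described in the question.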

How do I parse an Excel file that will give me data exactly as it appears visually?

I'm on Rails 5 (Ruby 2.4). I want to read an .xls doc and I would like to get the data into CSV format, just as it appears in the Excel file. Someone recommended I use Roo, and so I have
book = Roo::Spreadsheet.open(file_location)
sheet = book.sheet(0)
text = sheet.to_csv
arr_of_arrs = CSV.parse(text)
However, what is returned is not the same as what I see in the spreadsheet. For instance, a cell in the spreadsheet has
16:45.81
and when I get the CSV data from above, what is returned is
"0.011641319444444444"
How do I parse the Excel doc and get exactly what I see? I don't care if I use Roo to parse or not, just as long as I can get CSV data that is a representation of what I see rather than some weird internal representation. For reference, the file type I was parsing gives this when I run "file name_of_file.xls" ...
Composite Document File V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1252, Author: Dwight Schroot, Last Saved By: Dwight Schroot, Name of Creating Application: Microsoft Excel, Create Time/Date: Tue Sep 21 17:05:21 2010, Last Saved Time/Date: Wed Oct 13 16:52:14 2010, Security: 0
You need to save the custom formula in a text format on the .xls side. If you're opening the .xls file from the internet this won't work, but it will fix your problem if you can manipulate the file. You can do this using the function =TEXT(A2, "mm:ss.0"), where A2 is just the cell I'm using as an example.
book = ::Roo::Spreadsheet.open(file_location)
puts book.cell('B', 2)
=> '16.45.8'
If manipulating the file is not an option you could just pass a custom converter to CSV.new() and convert the decimal time back to the correct format you need.
require 'roo-xls'
require 'csv'
CSV::Converters[:time_parser] = lambda do |field, info|
  case info[:header].strip
  when "time" then begin
    # 0.011641319444444444 * 24 hours * 3600 seconds = 1005.81
    parse_time = field.to_f * 24 * 3600
    # 1005.81.divmod(60) = [16, 45.809999999999999945]
    mm, ss = parse_time.divmod(60)
    # returns "16:45.81"
    time = "#{mm}:#{ss.round(2)}"
    time
  rescue
    field
  end
  else
    field
  end
end
book = ::Roo::Spreadsheet.open(file_location)
sheet = book.sheet(0)
csv = CSV.new(sheet.to_csv, headers: true, converters: [:time_parser]).map {|row| row.to_hash}
puts csv
=> {"time "=>"16:45.81"}
{"time "=>"12:46.0"}
Under the hood roo-xls gem uses the spreadsheet gem to parse the xls file. There was a similar issue to yours logged here, but it doesn't appear that there was any real resolution. Internally xls stores 16:45.81 as a Number and associates some formatting with it. I believe the issue has something to do with the spreadsheet gem not correctly handling the cell format.
I did try messing around with adding a format mm:ss.0 by following this guide but I couldn't get it to work, maybe you'll have more luck.
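The arithmetic inside the converter above can be checked on its own, without Roo or a spreadsheet. This is a minimal sketch. day_fraction_to_mmss is a hypothetical helper name, and it assumes the cell holds a fraction of a 24-hour day, as the value in the question does.

```ruby
# Convert an Excel day fraction (String or Float) to a "m:ss.ss" string.
def day_fraction_to_mmss(fraction)
  total_seconds = fraction.to_f * 24 * 3600      # fraction of a day -> seconds
  minutes, seconds = total_seconds.divmod(60)    # whole minutes, leftover seconds
  format('%d:%05.2f', minutes, seconds)          # zero-pad seconds to two decimals
end

formatted = day_fraction_to_mmss('0.011641319444444444')
puts formatted
```

Unlike ss.round(2), the format call always prints two decimals, so a value the converter would show as 12:46.0 comes out as 12:46.00 here. Which of the two matches the spreadsheet depends on the cell's display format.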
You can use the converters option. It looks something like this:
arr_of_arrs = CSV.parse(text, converters: :date_time)
http://ruby-doc.org/stdlib-2.0.0/libdoc/csv/rdoc/CSV.html
Your problem seems to be with the way you're parsing (reading) the input file.
roo parses only Excel 2007-2013 (.xlsx) files. From your question, you want to parse .xls, which is a different format.
Like the documentation says, use the roo-xls gem instead.

Ruby on Rails - CSV file not storing to file path

I have a file path, but I don't know how to store the CSV file at that path. I tried the code below, but no file appeared at that path. I want the uploaded CSV file to be stored at a specific location.
File.join(file-path, filename)
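If you want to open the CSV there are a couple of options, but the code above has a syntax problem: file-path is parsed as file minus path.
You can try this:
require 'csv'
# Read the file; opening it with 'wb' would truncate it and allow writing only
parsed_file = CSV.parse(File.read('/Users/yourname/Desktop/' + 'file.csv'))
parsed_file.each do |row|
  puts row[0] # will print first column of each row
end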
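Since the question is about storing an uploaded file rather than reading one, here is a minimal sketch of the writing side. store_upload is a hypothetical helper, the StringIO stands in for the uploaded file, and the target directory is a temporary one chosen for illustration.

```ruby
require 'fileutils'
require 'stringio'
require 'tmpdir'

# Hypothetical helper: copy an IO-like upload into directory/filename
# and return the path of the stored file.
def store_upload(io, directory, filename)
  FileUtils.mkdir_p(directory)
  # File.basename drops any directory part a client may have put in the name.
  target = File.join(directory, File.basename(filename))
  File.open(target, 'wb') { |out| IO.copy_stream(io, out) }
  target
end

# A StringIO stands in for the uploaded file in this sketch.
upload = StringIO.new("name,qty\nwidget,3\n")
stored_path = store_upload(upload, Dir.mktmpdir, 'orders.csv')
stored_content = File.read(stored_path)
```

In a Rails controller the io argument would presumably be the uploaded file object (params[:file]), which exposes read and original_filename.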

Spreadsheet - encoding problem with reading cyrillic characters

I'm working on a rails app for a small shop. It needs to load an .xls file, parse it and maybe load to the database.
I use Spreadsheet gem to work with the file.
The problem is that the file contains russian characters which are displayed as "└ÛÛ.ExT H-1727F (ÓÝÓÙ¯Ò GP T304)"
The reference says, I need to specify the encoding, but I don't know which one is used in this file. I tried "win-1251" but it gave me an error about being unable to find a "utf-8 to win-1251 converter"
I tried setting the encoding to "WINDOWS-1251", but it gave me this error:
U+00BE to WINDOWS-1251 in conversion from CP850 to UTF-8 to WINDOWS-1251
So then I've tried CP850, which didn't throw an error, but the characters were still not readable.
There's not much code really.
# -*- encoding : utf-8 -*-
...
def show
  require 'spreadsheet'
  Spreadsheet.client_encoding = 'UTF-8'
  book = Spreadsheet.open 'c:\rails\renergy23\public\price-16-04-11.xls'
  @sheet = book.worksheet 0
end
For simplicity I don't load it to the database right now. Instead I output it in my view:
- 30.times do |i|
  = @sheet.row i+10
  %br
http://dl.dropbox.com/u/4976861/price-16-04-11.xls
I kinda solved this after 1.5 months by first saving the document in .xlsx and then saving it in .xls (97-2003). I couldn't use the .xlsx because of some weird OLE signature incorrect error.
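As a diagnostic aside, the garbled sample from the question looks like Windows-1251 bytes that were decoded as CP850, which is consistent with the CP850 mentioned in the error message. The sketch below only reinterprets that one string to test the hypothesis. It is an assumption about what went wrong, not a fix for the Spreadsheet gem.

```ruby
# The garbled text exactly as quoted in the question.
garbled = "└ÛÛ.ExT H-1727F (ÓÝÓÙ¯Ò GP T304)"

# Step 1: encode back to CP850 to recover the original raw bytes.
# Step 2: relabel those bytes as Windows-1251 (no bytes change).
# Step 3: transcode to UTF-8 for display.
repaired = garbled.encode('CP850')
                  .force_encoding('Windows-1251')
                  .encode('UTF-8')

puts repaired
```

If the output is readable Russian, the file's text is most likely stored as Windows-1251 while being read as CP850.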

Headers on the second row in FasterCSV?

G'day guys, I'm currently using fasterCSV to parse a CSV file in ruby, and wondering how to get rid of the initial row of data on a CSV (The initial row contains the time/date information generated by another software package)
I tried using fasterCSV.table and then deleting row(0) then converting it to a CSV document then parsing it
but the row was still present in the document.
Any other ideas?
fTable = FasterCSV.table("sto.csv", :headers => true)
fTable.delete(0)
Three suggestions
Can you get FasterCSV to ignore the line?
You could use the :return_headers => true option to skip over the bad line. That'll work great if the second line isn't the real header. See here for more
:return_headers: When false, header rows are silently swallowed. If set to true, header rows are returned in a FasterCSV::Row object with identical headers and fields (save that the fields do not go through the converters).
Chop the line off with another tool
You don't need to use Ruby for this. How about chopping the file using one of the solutions suggested here? You can call the one-liners from Ruby using the system method.
Max Flexibility - parse the file line by line with FasterCSV
Have you considered reading the file directly, skipping the first line and then accepting or rejecting lines? Deep in the heart of my code is this parse method which treats the file as a series of lines, accepting or rejecting each. You could do something similar but skip over the first row.
The neat thing is that you get to determine which rows are acceptable by defining your own acceptable? method. Only valid CSV data is passed to acceptable?; the rest is thrown away in response to the exception.
def parse(file)
  #
  # Parse data
  #
  row = []
  file.each_line do |line|
    the_line = line.chomp
    begin
      row = FasterCSV.parse_line(the_line)
      ok, message = acceptable?(row)
      if not ok
        reject(file.lineno, the_line, message)
      else
        accept(row, the_line)
      end
    rescue FasterCSV::MalformedCSVError => e
      reject(file.lineno, the_line, e.to_s)
    end
  end
end
Hi, I'm doing just that with some data from the Australian Electoral Commission. The file in question has a date string on the first line and headers on the second.
require 'csv'
require 'open-uri'
filename = "http://results.aec.gov.au/15508/Website/Downloads/SenateGroupVotingTicketsDownload-15508.csv"
# URI.open (Ruby 2.5+) replaces the bare open(url) call, which Ruby 3 no longer supports
file = URI.open(filename)
first_line = file.readline
CSV.parse(file.read, headers: true).each do |row|
  puts row["State"]
end
I presume the file I quote still exists, but it can be replaced by the file in question. If you need to skip more rows, call file.readline that many times.
According to the docs, fTable = FasterCSV.table("sto.csv", :return_headers => false) should do what you want; .table implies :headers => true. The docs have this info.
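The skip-then-parse idea can also be tried without a network connection. This is a small sketch using an in-memory StringIO and made-up data. The preamble text and the Votes column are assumptions for illustration.

```ruby
require 'csv'
require 'stringio'

# Made-up sample: a timestamp line, then the real header row, then data.
raw = <<~DATA
  Generated 2010-10-13 16:52:14
  State,Votes
  NSW,120
  VIC,95
DATA

io = StringIO.new(raw)
io.readline                                   # discard the preamble line
table = CSV.parse(io.read, headers: true)     # parsing starts at the header row
states = table.map { |row| row['State'] }

puts states.inspect
```

Calling io.readline once per unwanted line extends this to files with a longer preamble.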
