Rails Import/Parse from CSV UTF-8 Missing Column - ruby-on-rails

So I'm working on allowing users to import data from a CSV file. Right now every field imports correctly except whichever field comes first.
What I've discovered is that the file type is affecting the import.
My code looks like:
class Import < Operation
  require 'csv'

  def call(file, training_event_id)
    csv_data = CSV.parse(file.read, headers: true)
    list_occo = []
    csv_data.each do |row|
      occupant = Occupant.new
      occupant.account_number = row['Account Number']
      occupant.check_in = row['Check In']
      binding.pry
      occupant.training_event_id = training_event_id
      list_occo << occupant
    end
    binding.pry
    occo_errors = check_file(list_occo)
    list_occo.each(&:save) if occo_errors.empty?
    return occo_errors
  end
end
When I hit the binding.pry and check occupant, I'm getting nil for Account Number when the file is saved as CSV UTF-8. If I switch to a straight-up CSV there's no issue. Is there a way to convert/switch a CSV UTF-8 file to plain CSV? I thought of/tried using some sort of encoding on the parse, like encoding: 'iso-8859-1', but that didn't work.
Is there a way to convert the CSV UTF-8 file, or is there a way to do a straight-up file format check to ensure it's CSV and not CSV UTF-8?

Just in case someone comes across this issue in the future: I looked at the file in the Rails console using CSV.read(file.path) and noticed U+FEFF preceding the first column header. There's a rabbit hole of information out there about BOM and UTF-8 issues. Not wanting to do a CSV/File.open dance, I attempted things like a split, a gsub, file checks on UTF-8, etc. Then I simply changed the csv_data line to:
csv_data = CSV.parse(File.read(file, encoding: 'bom|utf-8'), headers: true)
Then in my controller I updated it from (params[:file]) to (params[:file].path), as I was getting an error of:
no implicit conversion of ActionDispatch::Http::UploadedFile into String
Hopefully this helps someone else.
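Putting the whole fix together, here's a minimal sketch of the operation with the parameter renamed to file_path for clarity. It assumes Occupant is an ActiveRecord model that accepts mass assignment, and the commented-out controller call is only an illustration of passing the path, not the exact invocation used in the app:

class Import < Operation
  require 'csv'

  def call(file_path, training_event_id)
    # 'bom|utf-8' strips a leading U+FEFF byte order mark if one is present,
    # so the first header parses as 'Account Number' instead of "\uFEFFAccount Number".
    csv_data = CSV.parse(File.read(file_path, encoding: 'bom|utf-8'), headers: true)

    list_occo = csv_data.map do |row|
      Occupant.new(
        account_number:    row['Account Number'],
        check_in:          row['Check In'],
        training_event_id: training_event_id
      )
    end

    occo_errors = check_file(list_occo)
    list_occo.each(&:save) if occo_errors.empty?
    occo_errors
  end
end

# In the controller, pass the uploaded file's path rather than the upload object, e.g.:
# Import.new.call(params[:file].path, training_event.id)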

Related

Importing a CSV to Rails database

I asked this question earlier this week, and it worked fine. I just tried it with a slightly bigger spreadsheet and it doesn't seem to work for some reason.
My code is as follows:
require 'roo'
xlsx = Roo::Spreadsheet.open(File.expand_path('../Downloads/unistats/LOCATION.csv'))
xlsx.each_row_streaming(offset: 1) do |row|
  Location.find_or_create_by(ukprn: row[0].value, accomurl: row[1].value, instbeds: row[3].value, instlower: row[4].value, instupper: row[5].value, locid: row[6].value, locname: row[7].value, lat: row[9].value, long: row[10].value, locukprn: row[11].value, loccountry: row[12].value, privatelower: row[13].value, privateupper: row[14].value, suurl: row[15].value)
end
But unlike last time, this is coming up with this error:
NoMethodError: undefined method `each_row_streaming' for #<Roo::CSV:0xb9e0b78>
Did you mean? each_row_using_tempdir
This file is a CSV rather than .xlsx but that shouldn't make a difference.
Any ideas what I'm doing wrong?
It actually does make a difference that you're trying to read a CSV file using the Excel methods.
Excerpts from the Roo documentation:
# Load a CSV file
s = Roo::CSV.new("mycsv.csv")
# Load a tab-delimited csv
s = Roo::CSV.new("mytsv.tsv", csv_options: {col_sep: "\t"})
# Load a csv with an explicit encoding
s = Roo::CSV.new("mycsv.csv", csv_options: {encoding: Encoding::ISO_8859_1})
A neat way to read both Excel and CSV files is to do something like:
if File.extname(filename).start_with?('.xls')  # File.extname includes the leading dot
  workbook = Roo::Excel.new(filename)
else
  workbook = Roo::CSV.new(filename)
end
workbook.default_sheet = workbook.sheets[0]
(workbook.first_row..workbook.last_row).each do |line|
  ...
end
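If you want to keep the original find_or_create_by loop, here's a minimal sketch against the plain Roo::CSV row API. Roo::CSV has no each_row_streaming, and its cells come back as plain values rather than objects with a .value method; the column indexes are copied from the question and assumed to still match the file layout:

require 'roo'

csv = Roo::CSV.new(File.expand_path('../Downloads/unistats/LOCATION.csv'))

# Rows are 1-indexed; row 1 holds the headers, so start from row 2.
(2..csv.last_row).each do |i|
  row = csv.row(i)  # an Array of cell values
  Location.find_or_create_by(
    ukprn: row[0], accomurl: row[1], instbeds: row[3], instlower: row[4],
    instupper: row[5], locid: row[6], locname: row[7], lat: row[9],
    long: row[10], locukprn: row[11], loccountry: row[12],
    privatelower: row[13], privateupper: row[14], suurl: row[15]
  )
end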

CSV file encoding in Rails with S3 and Heroku

My rails app uploads CSV files to S3, then subsequently pulls them down into a tempfile to send each row's data to a Sidekiq worker. I'm using Carrierwave and fog to handle the uploading.
This all worked beautifully until recently switching to Heroku, and now, when trying to create my tempfile I get the following error:
Error type Encoding::UndefinedConversionError
Error message "\xA2" from ASCII-8BIT to UTF-8
I've tried setting the encoding when creating the tempfile as well as working with the CSV file and continue to get the same error. I cannot reproduce this error on my local machine, which has made this entire process that much more fun :)
Currently, my Sidekiq worker calls the following method:
def upload_csv(filename, file_path)
  file = Tempfile.new(filename, Rails.root.join('tmp'), encoding: "ISO8859-1:utf-8").tap do |f|
    open(file_path).rewind
    f.write(open(file_path).read)
    f.close
  end
  CSV.foreach(file, headers: true, encoding: "ISO8859-1:utf-8") do |row|
    # do stuff to rows
  end
end
I understand the very basics of encoding, but I'm super stuck on this. Any insight would be appreciated.
Thanks!
Not sure if this will help anyone else, but I found a solution that works for me:
def upload_csv(filename, file_path)
  file = Tempfile.new(filename, Rails.root.join('tmp')).tap do |f|
    open(file_path).rewind
    f.write(open(file_path).read.force_encoding('utf-8'))
    f.close
  end
  CSV.foreach(file, headers: true) do |row|
    # do stuff to rows
  end
end
Even though I could confirm that the file was UTF-8 encoded before it was uploaded, open(file_path).read.encoding was returning ASCII-8BIT. Ruby was getting confused about how to write the file and convert it from ASCII-8BIT to UTF-8.
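One caveat worth adding to that fix: force_encoding only relabels the bytes, it doesn't transcode them. A small defensive sketch (the ISO-8859-1 fallback is an assumption based on the "\xA2" byte in the original error, not something from the answer above):

raw  = open(file_path).read             # comes back tagged as ASCII-8BIT (binary)
utf8 = raw.dup.force_encoding('utf-8')  # relabel the bytes as UTF-8 without converting them

unless utf8.valid_encoding?
  # If the bytes genuinely aren't UTF-8 (e.g. "\xA2" from a Latin-1 export),
  # transcode them instead of just relabelling.
  utf8 = raw.force_encoding('ISO-8859-1').encode('UTF-8')
end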

MalformedCSVError with rails CSV (FasterCSV)

I'm having serious issues trying to parse some CSV in rails right now.
Basically my app gets a user to upload a CSV file. The app then converts the file to ensure it is in UTF-8 format, then attempts to parse it and process it. Whenever the app attempts to parse it however, I get the MalformedCSVError stating "Illegal quoting on line 1"
Now what I don't get, is if I copy the original file into a new document and save it, then I can parse it on a rails console without a problem.
If I attempt to parse the original file, it complains about an invalid character for UTF-8 encoding (the file isn't in UTF-8 hence the app converts it)
If I attempt to parse the file which the app has converted to UTF-8 and changed the line endings to LF, it fails to parse.
If I do a file diff between the version the app has produced, and the copy/paste version that I have made (which works) there are 0 differences so I really can't figure out why one is parsable, and one is not.
Any suggestions? My app is processing the file as follows:
def create
  @survey = Survey.new(params[:survey])
  # Now we need to try and convert this to UTF-8 if it isn't already
  encoded = File.read(@survey.survey_data.current_path)
  encoding = CharlockHolmes::EncodingDetector.detect(encoded)
  # We've got a guess at the encoding, so we can try and convert it,
  # but it may still fail so we need to handle that
  begin
    re_encoded = CharlockHolmes::Converter.convert(encoded, encoding[:encoding], 'UTF-8')
    re_encoded = re_encoded.gsub(/\r\n?/, "\n")
    # Now replace the uploaded file
    File.open(@survey.survey_data.current_path, 'w') { |f|
      f.write(re_encoded)
    }
  rescue ArgumentError
    puts "UH OH!!!!!"
  end
  puts "#{@survey.survey_data.current_path}"
  @parsed = CSV.read(@survey.survey_data.current_path)
end
The file uploading gem is CarrierWave if that makes any difference.
Please can someone help me as this is driving me insane!
Edit
The error says it's on line 1. Line 1 (assuming it doesn't index from 0) is
"Survey","RD","GarrysMDs","NigelsMDs","PaulsMDs","StephensMDs","BrinleyJ","CarolineP","DaveL","GrantR","GregS","Kent","NeilC","NicolaP","AndyC","DarrenS","DeanB","KarenF","PaulR","RichardF","SteveG","BrianG","GordonA","NickD","NickR","NickT","RayL","SimonH","EdmondH","JasonF","MikeS","SamanthaN","TimB","TravisF","AlanS","Q1","Q2","Q3","Q4","Q5","Q6","Q7","Q8PM","Q8N","Q9","Q10","Q11","Q12","Q13","Q14","Q15","Q16PM","Q16N","Q17PM","Q17N","Q18PM","Q18N","Q19","Q20","Q21","Q22","comment","Q23.1","Q23.2","Q23.3","TQ23.1","TQ23.2","VPM","VN","VQ1","VQ2","VQ3","VQ4","VQ5","VQ6","VQ7","VQ8N","VQ8PM","VQ9","VQ10","VQ11","VQ12","VQ13","VQ14","VQ15","VQ16","VQ16N","VQ16PM","VQ17","VQ17N","VQ17PM","VQ18","VQ18N","VQ18PM","VQ19","VQ20","VQ21","VQ22","VQ23.1","VQ23.2","VQ23.3","VRD","XQ16","XQ17","XQ18"
Well that was irritating!
Turns out the file had a BOM which was causing the CSV parser to break. Loading the file with
CSV.open("path/to/file.csv", "rb:bom|encoding")
allowed it to parse perfectly! So annoying how long it took to track down, but it's now working, and with no need to convert to UTF-8 either!
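For a concrete version of that call ("encoding" in the mode string is a placeholder; the utf-8 below is an assumption about what the file actually contains), something like this parses the headers cleanly whether or not a BOM is present:

require 'csv'

# "rb:bom|utf-8" tells Ruby to silently drop a leading byte order mark,
# so the first header comes back as "Survey" rather than "\uFEFFSurvey".
CSV.open("path/to/file.csv", "rb:bom|utf-8", headers: true) do |csv|
  csv.each do |row|
    puts row["Survey"]
  end
end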

Spreadsheet - encoding problem with reading cyrillic characters

I'm working on a rails app for a small shop. It needs to load an .xls file, parse it and maybe load to the database.
I use Spreadsheet gem to work with the file.
The problem is that the file contains russian characters which are displayed as "└ÛÛ.ExT H-1727F (ÓÝÓÙ¯Ò GP T304)"
The reference says I need to specify the encoding, but I don't know which one is used in this file. I tried "win-1251" but it gave me an error about being unable to find a "utf-8 to win-1251 converter".
I then tried setting the encoding to "WINDOWS-1251", but it gave me this error:
U+00BE to WINDOWS-1251 in conversion from CP850 to UTF-8 to WINDOWS-1251
So then I tried CP850, which didn't throw an error, but the characters were still not readable.
There's not much code really.
# -*- encoding : utf-8 -*-
...
def show
  require 'spreadsheet'
  Spreadsheet.client_encoding = 'UTF-8'
  book = Spreadsheet.open 'c:\rails\renergy23\public\price-16-04-11.xls'
  @sheet = book.worksheet 0
end
For simplicity I don't load it into the database right now. Instead I output it in my view:
- 30.times do |i|
  = @sheet.row i+10
  %br
http://dl.dropbox.com/u/4976861/price-16-04-11.xls
I kinda solved this after 1.5 months by first saving the document in .xlsx and then saving it in .xls (97-2003). I couldn't use the .xlsx because of some weird OLE signature incorrect error.
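If re-saving the file by hand isn't an option, one heavily hedged guess, based on the error message mentioning CP850 and the garbled output looking like Windows-1251 Cyrillic decoded as CP850, is to repair the mojibake per cell after reading. fix_cyrillic is a hypothetical helper, and the whole sketch only works if that diagnosis of the encodings is correct:

require 'spreadsheet'

Spreadsheet.client_encoding = 'UTF-8'
book  = Spreadsheet.open 'c:\rails\renergy23\public\price-16-04-11.xls'
sheet = book.worksheet 0

def fix_cyrillic(value)
  return value unless value.is_a?(String)
  # Undo the CP850 -> UTF-8 decoding the gem performed, then reinterpret
  # the original bytes as Windows-1251 Cyrillic.
  value.encode('CP850').force_encoding('Windows-1251').encode('UTF-8')
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
  value  # leave cells we can't round-trip untouched
end

sheet.each do |row|
  puts row.map { |cell| fix_cyrillic(cell) }.inspect
end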

Headers on the second row in FasterCSV?

G'day guys, I'm currently using FasterCSV to parse a CSV file in Ruby, and wondering how to get rid of the initial row of data in a CSV (the initial row contains time/date information generated by another software package).
I tried using FasterCSV.table, deleting row 0, then converting it back to a CSV document and parsing it, but the row was still present in the document.
Any other ideas?
fTable = FasterCSV.table("sto.csv", :headers => true)
fTable.delete(0)
Three suggestions
Can you get FasterCSV to ignore the line?
You could use the :return_headers => true option to skip over the bad line. That'll work great if the second line isn't the real header. See here for more:
:return_headers: When false, header rows are silently swallowed. If set to true, header rows are returned in a FasterCSV::Row object with identical headers and fields (save that the fields do not go through the converters).
Chop the line off with another tool
You don't need to use Ruby for this. How about chopping the file using one of the solutions suggested here? You can call the one-liners from Ruby using the system method.
Max Flexibility - parse the file line by line with FasterCSV
Have you considered reading the file directly, skipping the first line and then accepting or rejecting lines? Deep in the heart of my code is this parse method which treats the file as a series of lines, accepting or rejecting each. You could do something similar but skip over the first row.
The neat thing is that you get to determine which rows are acceptable by defining your own acceptable? method. Only valid CSV data is passed to acceptable?; the rest is thrown away in response to the exception.
def parse(file)
  #
  # Parse data
  #
  row = []
  file.each_line do |line|
    the_line = line.chomp
    begin
      row = FasterCSV.parse_line(the_line)
      ok, message = acceptable?(row)
      if not ok
        reject(file.lineno, the_line, message)
      else
        accept(row, the_line)
      end
    rescue FasterCSV::MalformedCSVError => e
      reject(file.lineno, the_line, e.to_s)
    end
  end
end
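The acceptable?, accept, and reject methods are yours to define. Here's a hypothetical sketch (the non-empty-first-column rule is just an example, not part of the original answer):

# Returns [ok, message]; here we simply insist the row has a non-empty first field.
def acceptable?(row)
  return [false, 'blank row'] if row.nil? || row.compact.empty?
  return [false, 'missing first column'] if row[0].to_s.strip.empty?
  [true, nil]
end

def accept(row, the_line)
  # e.g. hand the parsed row on to your model / import logic
  puts "OK:   #{row.inspect}"
end

def reject(lineno, the_line, message)
  puts "SKIP line #{lineno}: #{message} (#{the_line})"
end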
Hi, I'm doing just that with some data from the Australian Electoral Commission. The file in question has a date string on the first line and the headers on the second:
require 'csv'
require 'open-uri'

filename = "http://results.aec.gov.au/15508/Website/Downloads/SenateGroupVotingTicketsDownload-15508.csv"
file = open(filename)       # open-uri returns an IO-like object we can read directly
first_line = file.readline  # discard the date line so CSV sees the real headers first
CSV.parse(file, headers: true).each do |row|
  puts row["State"]
end
I presume the file I quote still exists, but it can be replaced by the file in question. If you need to skip more rows, call file.readline that number of times.
According to the docs, fTable = FasterCSV.table("sto.csv", :return_headers => false) should do what you want; .table implies :headers => true. The docs have this info.
