Headers on the second row in FasterCSV? - ruby-on-rails

G'day guys, I'm currently using fasterCSV to parse a CSV file in ruby, and wondering how to get rid of the initial row of data on a CSV (The initial row contains the time/date information generated by another software package)
I tried using fasterCSV.table and then deleting row(0) then converting it to a CSV document then parsing it
but the row was still present in the document.
Any other ideas?
fTable = FasterCSV.table("sto.csv", :headers => true)
fTable.delete(0)

Three suggestions
Can you get FasterCSV to ignore the line?
You could use the :return_headers => true option to skip over the bad line. That'll work great if the second line isn't the real header. See here for more
:return_headers:
When false, header rows are silently
swallowed. If set to true, header rows
are returned in a FasterCSV::Row
object with identical headers and
fields (save that the fields do not go
through the converters).
Chop the line off with another tool
You don't need to use Ruby for this - how about chopping the file using one of the solutions suggested here you can call the one-liners from Ruby using the system method.
Max Flexibility - parse the file line by line with FasterCSV
Have you considered reading the file directly, skipping the first line and then accepting or rejecting lines? Deep in the heart of my code is this parse method which treats the file as a series of lines, accepting or rejecting each. You could do something similar but skip over the first row.
The neat thing is that you get to determine which rows are acceptable by defining your own acceptable? method - only valid CSV data is passed to acceptable? the rest are thrown away in response to the exception.
def parse(file)
#
# Parse data
#
row = []
file.each_line do |line|
the_line = line.chomp
begin
row = FasterCSV.parse_line(the_line)
ok, message = acceptable?(row)
if not ok
reject(file.lineno, the_line, message)
else
accept(row, the_line)
end
rescue FasterCSV::MalformedCSVError => e
reject(file.lineno, the_line, e.to_s)
end
end

hi doing just that with some data for Australian Electoral Commission. The file in question has a date string on the first line and headers on the second
require 'csv'
require 'open-uri'
filename = "http://results.aec.gov.au/15508/Website/Downloads/SenateGroupVotingTicketsDownload-15508.csv"
file = File.open(open(filename))
first_line = file.readline
CSV.parse(file, headers: true).each do |row|
puts row["State"]
end
I presume the file I quote still exists but that can be replaced by the file in question. if you need to skip more rows you have to call file.readline that number of times.

According to the docs, fTable = FasterCSV.table("sto.csv", :return_headers => false) should do what you want. .table implies :headers => true The docs have this info.

Related

Rails Import/Parse from CSV UTF-8 Missing Column

So I'm working on allowing users to import data from a CSV file. Right now all the fields will import correctly, except whatever is the first field.
What I've discovered is the file type is affecting the import.
My code looks like:
class Import < Operation
require 'csv'
def call(file, training_event_id)
csv_data = CSV.parse(file.read, headers: true)
list_occo = []
csv_data.each do |row|
occupant = Occupant.new
occupant.account_number = row['Account Number']
occupant.check_in = row['Check In']
binding.pry
occupant.training_event_id = training_event_id
list_occo << occupant
end
binding.pry
occo_errors = check_file(list_occo)
list_occo.each(&:save) if occo_errors.empty?
return occo_errors
end
When I do the binding.pry and check on occupant I'm getting nil on the Account Number when doing CSV UTF-8. If I switch to straight up CSV not an issue. Is there a way to convert/switch a CSV UTF-8 to CSV? I thought/tried using some sort of encoding on the parse like: encoding: 'iso-8859-1' but that didn't work.
Is there a way to convert the CSV UTF-8 or is there a way to do a straight up file format check to ensure it's CSV and not CSV UTF-8?
Just in case someone comes across this issue in the future. I looked at the file in the rails console using CSV.read(file.path) and noticed U+FEFF preceding the first column header. There's a rabbit hole of information about BOM and UTF-8 issues. Without wanting to do a CSV/File.open I attempted things like doing a split, gsub, file checks on utf-8, etc. Then I simply changed the csv_data line to be:
csv_data = CSV.parse(File.read(file, encoding: 'bom|utf-8'), headers: true)
Then in my controller I updated it from (params[:file]) to (params[:file].path) as I was getting an error of
no implicit conversion of ActionDispatch::Http::UploadedFile into
String
Hopefully this helps someone else.

Problem with attachments' character encoding using gmail gem in ruby/rails

What I am doing:
I am using the gmail gem in a Rails 4 app to get email attachments from a specific account at regular intervals. Here is an extract from the core part (here for simplicity only considering the first email and its first attachment):
require 'gmail'
Gmail.connect(#user_email,#user_password) do |gmail|
if gmail.logged_in?
emails = gmail.inbox.emails(:from => #sender_email)
email = emails[0]
attachment = email.message.attachments[0]
File.open("~/temp.csv", 'w') do |file|
file.write(
StringIO.new(attachment.decoded.to_s[2..-2].force_encoding("ISO-8859-15").encode!('UTF-8')).read
)
end
end
end
The encoding of the attached file can vary. The particular one that I am currently having issues with is in Finnish. It contains Finnish characters and a superscripted 3 character.
This is what I expect to get when I run the above code. (This is what I get when I download the attachment manually through gmail user interface):
What the problem is:
However, I am getting the following odd results.
From cat temp.csv (Looks good to me):
With nano temp.csv (Here I have no idea what I am looking at):
This is what temp.csv looks like opened in Sublime Text (directly via winscp). First line and small parts look ok but then Chinese/Japanese characters:
This is what temp.csv looks like in Notepad (after download via winscp). Looks ok except a blank space has been inserted between each character and the new lines seems to be missing:
What I have tried:
I have without success tried:
.force_encoding(...) with all the different "ISO-8859-x" character sets
putting the force_encoding("ISO-8859-15").encode!('UTF-8') outside the .read (works but doesn't solve the problem)
encode to UTF-8 without first forcing another encoding but this leads to Encoding::UndefinedConversionError: "\xC4" from ASCII-8BIT to UTF-8
writing as binary with 'wb' and 'w+b' in the File.open() (which oddly doesn't seem to make a difference to the outcome).
searching stackoverflow and the web for other ideas.
Any ideas would be much appreciated!
Not beautiful, but it will work for me now.
After re-encoding, I convert the string to a char array, then remove the chars I do not want and then join the remaining array elements to form a string.
decoded_att = attachment.decoded
data = decoded_att.encode("UTF-8", "ISO-8859-1", invalid: :replace, undef: :replace).gsub("\r\n", "\n")
data_as_array = data.chars
data_as_array = data_as_array.delete_if {|i| i == "\u0000" || i == "ÿ" || i == "þ"}
data = data_as_array.join('').to_s
File.write("~/temp.csv", data.to_s)
This will work for me now. However, I have no idea how these characters have ended up in the attachment ("ÿ" and "þ" in the start of the document and "\u0000" between all remaining characters).
It seems like you need to do attachment.body.decoded instead of attachment.decoded

Importing a CSV to Rails database

I asked this question earlier this week, and it worked fine. I just tried it with a slightly bigger spreadsheet and it doesn't seem to work for some reason.
My code is as follows:
require 'roo'
xlsx = Roo::Spreadsheet.open(File.expand_path('../Downloads/unistats/LOCATION.csv'))
xlsx.each_row_streaming(offset: 1) do |row|
Location.find_or_create_by(ukprn: row[0].value, accomurl: row[1].value, instbeds: row[3].value, instlower: row[4].value, instupper: row[5].value, locid: row[6].value, locname: row[7].value, lat: row[9].value, long: row[10].value, locukprn: row[11].value, loccountry: row[12].value, privatelower: row[13].value, privateupper: row[14].value, suurl: row[15].value)
end
But unlike last time, this is coming up with this error:
NoMethodError: undefined method `each_row_streaming' for #<Roo::CSV:0xb9e0b78>
Did you mean? each_row_using_tempdir
This file is a CSV rather than .xlsx but that shouldn't make a difference.
Any ideas what I'm doing wrong?
It does actually makes a difference that you're trying to read a CSV file using the Excel methods.
Excerpts from the Roo documentation.
# Load a CSV file
s = Roo::CSV.new("mycsv.csv")
# Load a tab-delimited csv
s = Roo::CSV.new("mytsv.tsv", csv_options: {col_sep: "\t"})
# Load a csv with an explicit encoding
s = Roo::CSV.new("mycsv.csv", csv_options: {encoding: Encoding::ISO_8859_1})
A neat way to read both Excel and CSV files is to do something like
if File.extname(filename).start_with?('xls')
workbook = Roo::Excel.new(filename)
else
workbook = Roo::CSV.new(filename)
end
workbook.default_sheet = workbook.sheets[0]
(workbook.first_row..workbook.last_row).each do |line|
...
end

MalformedCSVError with rails CSV (FasterCSV)

I'm having serious issues trying to parse some CSV in rails right now.
Basically my app gets a user to upload a CSV file. The app then converts the file to ensure it is in UTF-8 format, then attempts to parse it and process it. Whenever the app attempts to parse it however, I get the MalformedCSVError stating "Illegal quoting on line 1"
Now what I don't get, is if I copy the original file into a new document and save it, then I can parse it on a rails console without a problem.
If I attempt to parse the original file, it complains about an invalid character for UTF-8 encoding (the file isn't in UTF-8 hence the app converts it)
If I attempt to parse the file which the app has converted to UTF-8 and changed the line endings to LF, it fails to parse.
If I do a file diff between the version the app has produced, and the copy/paste version that I have made (which works) there are 0 differences so I really can't figure out why one is parsable, and one is not.
Any suggestions? My app is processing the file as follows :
def create
#survey = Survey.new(params[:survey])
# Now we need to try and convert this to UTF-8 if it isn't already
encoded = File.read(#survey.survey_data.current_path)
encoding = CharlockHolmes::EncodingDetector.detect(encoded)
# We've got a guess at the encoding,
# so we can try and convert it but it
# may still fail so we need to handle
# that
begin
re_encoded = CharlockHolmes::Converter.convert(encoded, encoding[:encoding], 'UTF-8')
re_encoded = re_encoded.gsub(/\r\n?/, "\n")
# Now replace the uploaded file
File.open(#survey.survey_data.current_path, 'w') { |f|
f.write(re_encoded)
}
rescue ArgumentError
puts "UH OH!!!!!"
end
puts "#{#survey.survey_data.current_path}"
#parsed = CSV.read(#survey.survey_data.current_path)
end
The file uploading gem is CarrierWave if that makes any difference.
Please can someone help me as this is driving me insane!
Edit
The error says it's on line 1. Line 1 (assuming it doesn't index from 0) is
"Survey","RD","GarrysMDs","NigelsMDs","PaulsMDs","StephensMDs","BrinleyJ","CarolineP","DaveL","GrantR","GregS","Kent","NeilC","NicolaP","AndyC","DarrenS","DeanB","KarenF","PaulR","RichardF","SteveG","BrianG","GordonA","NickD","NickR","NickT","RayL","SimonH","EdmondH","JasonF","MikeS","SamanthaN","TimB","TravisF","AlanS","Q1","Q2","Q3","Q4","Q5","Q6","Q7","Q8PM","Q8N","Q9","Q10","Q11","Q12","Q13","Q14","Q15","Q16PM","Q16N","Q17PM","Q17N","Q18PM","Q18N","Q19","Q20","Q21","Q22","comment","Q23.1","Q23.2","Q23.3","TQ23.1","TQ23.2","VPM","VN","VQ1","VQ2","VQ3","VQ4","VQ5","VQ6","VQ7","VQ8N","VQ8PM","VQ9","VQ10","VQ11","VQ12","VQ13","VQ14","VQ15","VQ16","VQ16N","VQ16PM","VQ17","VQ17N","VQ17PM","VQ18","VQ18N","VQ18PM","VQ19","VQ20","VQ21","VQ22","VQ23.1","VQ23.2","VQ23.3","VRD","XQ16","XQ17","XQ18"
Well that was irritating!
Turns out the file had a BOM which was causing the CSV parser to break. Loading the file with
CSV.open("path/to/file.csv", "rb:bom|encoding")
allowed it to parse it perfectly! So annoyed how long it took to track down but it's now working and with no need to convert to UTF-8 now either!

CSV - Unquoted fields do not allow \r or \n (line 2)

Trying to parse a CSV file, but still getting the error message Unquoted fields do not allow \r or \n (line 2)..
I found here at SO similar topic, where was a hint to do following:
CSV.open('file.csv', :row_sep => "\r\n") do |csv|
but his unfortunately doesn't works me... I can't change the CSV file, so I would need to fix it in the code.
EDIT sample of CSV file:
A;B;C
1234;...
Is there any way to do it?
Many thanks!
First of all, you should set you column delimiters to ';', since that is not the normal way CSV files are parsed. This worked for me:
CSV.open('file.csv', :row_sep => :auto, :col_sep => ";") do |csv|
csv.each { |a,b,c| puts "#{a},#{b},#{c}" }
end
From the 1.9.2 CSV documentation:
Auto-discovery reads ahead in the data looking for the next \r\n,
\n, or \r sequence. A sequence will be selected even if it occurs
in a quoted field, assuming that you would have the same line endings
there.
Simpler solution if the CSV was touched or saved by any program that may have used weird formatting (such as Excel or Spreadsheet):
Open the file with any plain text editor (I used Sublime Text 3)
Press the enter key to add a new line anywhere
Save the file
Remove the line you just added
Save the file again
Try the import again, error should be gone
For me I was importing LinkedIn CSV and got the error.
I removed the blank lines like this:
def import
csv_text = File.read('filepath', :encoding => 'ISO-8859-1')
#remove blank lines from LinkedIn
csv_text = csv_text.gsub /^$\n/, ''
#csv = CSV.parse(csv_text, :headers => true, skip_blanks: true)
end
In my case I had to provide encoding, and a quote char that was guaranteed to not occur in data
CSV.read("file.txt", 'rb:bom|UTF-16LE', {:row_sep => "\r\n", :col_sep => "\t", :quote_char => "\x00"})
I realize this is an old post but I recently ran into a similar issue with a badly formatted CSV file that failed to parse with the standard Ruby CSV library.
I tried the SmarterCSV gem which parsed the file in no time. It's an external library so it might not be the best solution for everyone but it beats parsing the file myself.
opts = { col_sep: ';', file_encoding: 'iso-8859-1', skip_lines: 5 }
SmarterCSV.process(file, opts).each do |row|
p row[:someheader]
end
Please see this thread Unquoted fields do not allow \r or \n
Solution:
file = open(file.csv).read.gsub!("\r", '')
CSV.open(file, :row_sep => "\r\n") do |csv|
In my case, the first row of the spreadsheet/CSV was a double-quoted bit of introduction text. The error I got was:
/Users/.../.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/csv.rb:1880:in `block (2 levels) in shift': Unquoted fields do not allow \r or \n (line 1). (CSV::MalformedCSVError)
I deleted the comment with " characters so the .csv ONLY had the .csv data, saved it, and my program worked with no errors.
If you have to deal with files coming from Excel with newlines in cells there is also a solution.
The big disadvantage of this way is, that no semicolons or no double quotes in strings are allowed.
I choose to go with no semicolons
if file.respond_to?(:read)
csv_contents = file.read
elsif file_data.respond_to?(:path)
csv_contents = File.read(file.path)
else
logger.error "Bad file_data: #{file_data.class.name}: #{file_data.inspect}"
return false
end
result = "string"
csv_contents = csv_contents.force_encoding("iso-8859-1").encode('utf-8') # In my case the files are latin 1...
# Here is the important part (Remove all newlines between quotes):
while !result.nil?
result = csv_contents.sub!(/(\"[^\;]*)[\n\r]([^\;]*\")/){$1 + ", " + $2}
end
CSV.parse(csv_contents, headers: false, :row_sep => :auto, col_sep: ";") do |row|
# do whatever
end
For me the solution works fine, if you deal with large files you could run into problems with it.
If you want to go with no quotes just replace the semicolons in the regex with quotes.
Another simple solution to fix the weird formatting caused by Excel is to copy and paste the data into Google spreadsheet and then download it as a CSV.

Resources