Ruby/Rails, Parsing CSV with line break between quotes

Is there any way to tell the CSV object that a line break between quotes is not a row delimiter?
My CSV file is:
"a","b","c"
1,"some
text with line break",21
2,"blah",4
My code is:
CSV.foreach(file_path, headers: true) do |row|
  puts row
end
I want it to return only two rows, but it returns three.

You're judging the number of rows by the number of printed lines, which is misleading. It returns two. See for yourself:
[4] pry(main)> CSV.foreach('example.csv', headers: true).to_a
=> [
#<CSV::Row "a":"1" "b":"some\ntext with line break" "c":"21">,
#<CSV::Row "a":"2" "b":"blah" "c":"4">
]
Your code outputs three lines because you're printing the rows, and the line break inside the quoted field is printed as-is. That makes it look as if one row became two. By the same logic, I'd say that your source CSV contains 4 (four!) rows, and that isn't true either.
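To make the difference between rows and printed lines concrete, here is a minimal sketch. It assumes the same content as the example file, held in a string instead of a file; the variable names are only illustrative.

```ruby
require 'csv'

# Same content as the example file, kept in a string for convenience.
data = <<~CSV_TEXT
  "a","b","c"
  1,"some
  text with line break",21
  2,"blah",4
CSV_TEXT

table = CSV.parse(data, headers: true)

# Number of parsed data rows (the header line is not counted).
row_count = table.size

# Number of lines you would see when printing every row, as `puts row` does.
printed_line_count = table.map { |row| row.to_s }.join.lines.size

puts "rows: #{row_count}, printed lines: #{printed_line_count}"
```

The parser sees two rows; the third printed line comes only from the line break stored inside the "b" field of the first row.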

You can set headers to true and then display each row's data with row.to_hash. Example:
CSV.foreach("/home/akbar/text.csv", headers: true) do |row|
  puts row.to_hash
end
The result is:
1.9.3p194 :034 > CSV.foreach("/home/akbar/text.csv", headers: true) do |x|
1.9.3p194 :035 > puts x.to_hash
1.9.3p194 :036?> end
{"a"=>"1", "b"=>"some\ntext with line break", "c"=>"21"}
{"a"=>"2", "b"=>"blah", "c"=>"4"}
For more information see "ruby-on-rails-import-data-from-a-csv-file".

If you have trouble reading a CSV file whose fields contain line breaks, and its rows are terminated by \r\n, try setting the row separator explicitly with row_sep: "\r\n" (double-quoted, so that the escapes are interpreted):
data = CSV.read('your_file.csv', row_sep: "\r\n")
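A small sketch of what the explicit row separator does, assuming input whose rows end in "\r\n" while the quoted field contains a bare "\n". The sample string is made up for illustration and is parsed from memory rather than from a file.

```ruby
require 'csv'

# Rows end with "\r\n"; the quoted field contains a bare "\n".
data = "\"a\",\"b\",\"c\"\r\n" \
       "1,\"some\ntext with line break\",21\r\n" \
       "2,\"blah\",4\r\n"

# Without headers: true, the header line is returned as an ordinary row.
rows = CSV.parse(data, row_sep: "\r\n")

rows.each { |row| p row }
```

Here the header line plus two data rows come back, and the line break stays inside the second field of the first data row.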

Related

CSV won't import by key in hash (Rails)

I'm having problems importing this CSV:
municipality,province,province abbrev,country,region
Vancouver,British Columbia,BC,Canada,Metro Vancouver - North
Specifically, Vancouver is not being returned when I look for its value by its key:
municipality_name = row["municipality"]
Here's the code:
def self.import_csv(file)
  CSV.foreach(file, headers: true,
                    skip_blanks: true,
                    skip_lines: /^(?:,\s*)+$/,
                    col_sep: ",") do |row|
    municipality_name = row["municipality"]
    puts row.to_h
    puts "municipality_name: #{municipality_name}"
    puts "row[0]: #{row[0]}"
  end
end
Here's the output:
irb(main):052:0> Importers::Municipalities.import_csv('tmp/municipalities.csv')
{"municipality"=>"Vancouver", "province"=>"British Columbia", "province abbrev"=>"BC", "country"=>"Canada", "region"=>"Metro Vancouver - North"}
municipality_name:
row['municipality']:
row[0]: Vancouver
Seems like I'm missing something obvious. I thought maybe there was a hidden character in the CSV but turned on hidden characters in Sublime and no dice.
Thanks in advance.

You need to call to_h on the row if you want to access it by its keys. Otherwise, it is an array-like object, accessible by indices.
def self.import_csv(file)
  CSV.foreach(file, headers: true,
                    skip_blanks: true,
                    skip_lines: /^(?:,\s*)+$/,
                    col_sep: ",") do |row|
    row = row.to_h
    municipality_name = row["municipality"]
    puts "municipality_name: #{municipality_name}"
  end
end

Seems like it was a problem with the CSV and the code works fine. Created a new CSV, typed in the same content, and it worked. Maybe an invisible character that Sublime wasn't showing? Can't verify as I wiped the original CSV that was causing issues.
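The original file is gone, so the cause was never confirmed. One common invisible character that produces exactly this symptom is a UTF-8 byte order mark (U+FEFF) at the start of the file. The sketch below assumes that scenario, with made-up sample data, and shows how to check for it.

```ruby
require 'csv'

# Hypothetical file content that starts with a UTF-8 byte order mark (U+FEFF).
raw = "\uFEFFmunicipality,province\nVancouver,British Columbia\n"

row = CSV.parse(raw, headers: true).first

# The first header is really "\uFEFFmunicipality", so this lookup finds nothing.
lookup_with_bom = row['municipality']

# Inspecting the first codepoint of the first header exposes the hidden mark.
first_header_codepoint = row.headers.first.codepoints.first

# Dropping the mark before parsing restores lookup by header name.
cleaned = raw.sub(/\A\uFEFF/, '')
lookup_without_bom = CSV.parse(cleaned, headers: true).first['municipality']

puts format('first codepoint: U+%04X', first_header_codepoint)
puts "with BOM: #{lookup_with_bom.inspect}, without BOM: #{lookup_without_bom.inspect}"
```

When reading from disk, opening the file with the 'r:bom|utf-8' mode (for example via File.open) is the usual way to have Ruby strip the mark for you.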

Header not in Row 1 CSV Import Ruby Rails

I'm trying to import CSV files, and the header row keeps moving around: sometimes it's in row 20, sometimes in row 25, and so on. The field 'SPIRIT Barcode' is always in the header, though, and it's the only thing I'm interested in at this point. I'm saving it in "barcode".
How do I manipulate this to find the row 'SPIRIT Barcode' is in and use that as the header? (everything above the header can be ignored)
def self.import(file)
  #b = []
  CSV.foreach(file.path, headers: true) do |row|
    entry = ZygReport.find_by(barcode: row['SPIRIT Barcode']) || new
    entry.update({
      :barcode => row['SPIRIT Barcode'],
    })
    entry.save!
    #b << [entry.barcode]
  end
end
Ignore the #b, that's for another function.

This code parses the file only once, reading it line by line and looking for SPIRIT Barcode.
Once the line is found, it removes the newline, and splits it to get an array of header names.
CSV.parse can be called on the file directly. Since it has already been read until the header, it will start at the correct line:
require 'csv'
sep = ';'
File.open('test.csv') do |file|
  header = file.find { |line| line.include?('SPIRIT Barcode') }.chomp.split(sep)
  CSV.parse(file, headers: header, col_sep: sep).each do |row|
    p row
  end
end
With test.csv as:
Ignore me
Ignore me
SPIRIT Barcode; B; C
1; 2; 3
4; 5; 6
It outputs:
#<CSV::Row "SPIRIT Barcode":"1" " B":" 2" " C":" 3">
#<CSV::Row "SPIRIT Barcode":"4" " B":" 5" " C":" 6">
It shouldn't be hard to adapt it to your data.

This is untested, but it's the general idea for what you're trying to do:
require 'csv'
header_found = false
File.foreach('path/to/file.csv') do |li|
  next unless header_found || li['SPIRIT Barcode']
  header_found = true
  # start processing CSV...
  CSV.parse(li) do |row|
    # use row here...
  end
end
You'll have to figure out what to do when the header is found if you want to keep track of the values.
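One way to keep track of the values once the header is found is sketched below. It assumes comma-separated data and reads from a StringIO so it can run without a file; barcodes_after_header and its marker parameter are hypothetical names, not part of the CSV library.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: skips everything up to the line containing `marker`,
# treats that line as the header, and returns the values of that column.
def barcodes_after_header(io, marker: 'SPIRIT Barcode')
  header_line = io.find { |line| line.include?(marker) }
  return [] unless header_line

  headers = CSV.parse_line(header_line)

  # The IO is now positioned just after the header line, so CSV only sees data rows.
  CSV.new(io, headers: headers).map { |row| row[marker] }
end

sample = StringIO.new(<<~TEXT)
  Ignore me
  Ignore me too
  SPIRIT Barcode,B,C
  1,2,3
  4,5,6
TEXT

barcodes = barcodes_after_header(sample)
p barcodes
```

In the import method from the question, the returned values (or the rows themselves) could then be fed to the find_by/update logic; an opened File can be passed in place of the StringIO.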

How to find a specific row in csv

I'm using Ruby 1.9.2. My CSV file is as follows:
NAME, Id, No, Dept
Tom, 1, 12, CS
Hendry, 2, 35, EC
Bahamas, 3, 21, IT
Frank, 4, 61, EE
I want to print a specific row, say the one for 'Tom'. I tried many approaches but didn't get the exact result. The most recommended option is "Fastercsv". But it is applicable for my version. Also, I noticed that csv prints the fields column-wise. How can I print an entire row using csv in Rails? My Ruby code is as follows:
require 'csv'
csv_text = File.read('sampler.csv')
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
  puts "#{row[:NAME]},#{row[:Id]},#{row[:No]},#{row[:Dept]}"
end

Use .find
csv = CSV.read('sampler.csv', headers: true)
puts csv.find {|row| row['NAME'] == 'Tom'} #=> returns first `row` that satisfies the block.

Here's another approach that keeps the code within the CSV API.
csv_table is a CSV::Table
row is a CSV::Row
row_with_specified_name is a CSV::Row.
csv_table = CSV.table("./tables/example.csv", converters: :all)
row_with_specified_name = csv_table.find do |row|
  row.field(:name) == 'Bahamas'
end
p row_with_specified_name.to_csv.chomp #=> "Bahamas,3,21,IT"
FYI, CSV.table is just a shortcut for:
CSV.read( path, { headers: true,
converters: :numeric,
header_converters: :symbol }.merge(options) )
As per the docs.

If you have a large CSV file and want to find one exact row, it will be much faster and far less memory-intensive to read one line at a time.
require 'csv'
csv = CSV.open('sampler.csv', 'r', headers: true)
# Header names are case-sensitive, so match the file's 'NAME' column.
while row = csv.shift
  break if row['NAME'] == 'Bahamas'
end
csv.close
pp row
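The same row-at-a-time search can also be written with the enumerator that CSV.foreach returns when called without a block. This is a sketch under the assumption that the headers contain no stray spaces; it writes a temporary copy of the sample data so it can run on its own.

```ruby
require 'csv'
require 'tempfile'

# Write a small sample file (same columns as sampler.csv, without the spaces).
file = Tempfile.new(['sampler', '.csv'])
file.write("NAME,Id,No,Dept\nTom,1,12,CS\nHendry,2,35,EC\nBahamas,3,21,IT\nFrank,4,61,EE\n")
file.close

# Without a block, CSV.foreach returns an enumerator; find stops reading
# as soon as the block returns true, so later rows are never parsed.
match = CSV.foreach(file.path, headers: true).find { |row| row['NAME'] == 'Bahamas' }

p match.to_h
file.unlink
```

If no row matches, find returns nil, so check the result before calling methods on it.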

How do I skip over the first three rows instead of only the first in FasterCSV

I am using FasterCSV and I am looping with a foreach like this:
FasterCSV.foreach("#{Rails.public_path}/uploads/transfer.csv", :encoding => 'u', :headers => :first_row) do |row|
The problem is that my CSV has its first 3 lines as headers. Is there any way to make FasterCSV skip the first three rows rather than only the first?

Not sure about FasterCSV, but with the Ruby 1.9 standard CSV library (which is based on FasterCSV), I can do something like:
c = CSV.open '/path/to/my.csv'
c.drop(3).each do |row|
  # do whatever with row
end

I'm not a user of FasterCSV, but why not do the control yourself:
additional_rows_to_skip = 2
FasterCSV.foreach("...", :encoding => 'u', :headers => :first_row) do |row|
  if additional_rows_to_skip > 0
    additional_rows_to_skip -= 1
  else
    # do stuff...
  end
end

Thanks to Mladen Jablanovic, I got my clue. But I realized something interesting:
in 1.9, reading seems to proceed from the current position (POS) in the file.
By this I mean that if you do
c = CSV.open iFileName
logger.debug c.first
logger.debug c.first
logger.debug c.first
You'll get three different results in your log. One for each of the three header rows.
c.each do |row| #now seems to start on the 4th row.
It makes perfect sense that it would read the file this way. Then it would only have to have the current row in memory.
I still like Mladen Jablanović's answer, but this is an interesting bit of logic too.
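The position-based reading described above can be made explicit with CSV#shift, which returns the next row and advances the reader. This is only a sketch with made-up sample data, read from a StringIO rather than a real file.

```ruby
require 'csv'
require 'stringio'

# Three header lines followed by two data rows (made-up sample data).
io = StringIO.new("title line\nsecond header\nthird header\n1,a\n2,b\n")
csv = CSV.new(io)

# Each shift consumes one row, so three calls discard the three header rows.
skipped = Array.new(3) { csv.shift }

# Whatever is read afterwards starts at the fourth line.
remaining = csv.read

p skipped
p remaining
```

Only the current row has to be held in memory while skipping, which matches the observation that the reader simply continues from where it left off.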

How to parse the data from a TXT file with a tab separator?

I am using Ruby 1.8.7 and Rails 2.3.8. I want to parse the data from a tab-separated TXT dump file.
This TXT dump contains some CSS properties, which appear to include some invalid data.
When I run my code using the FasterCSV gem
FasterCSV.foreach(txt_file, :quote_char => '"',:col_sep =>'\t', :row_sep =>:auto, :headers => :first_row) do |row|
  col= row.to_s.split(/\t/)
  puts col[15]
end
the error written to the console is "Illegal quoting on line 38." Can anyone suggest how to skip the rows that have invalid data and continue loading the remaining rows?

Here's one way to do it. We go to a lower level, using shift to parse each row, then silence the MalformedCSVError exception and continue with the next iteration. The problem with this is that the loop doesn't look so nice. If anyone can improve this, you're welcome to edit the code.
FasterCSV.open(filename, :quote_char => '"', :col_sep => "\t", :headers => true) do |csv|
  row = true
  while row
    begin
      row = csv.shift
      break unless row
      # Do things with the row here...
    rescue FasterCSV::MalformedCSVError
      next
    end
  end
end
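A variation on the same idea, written for the standard CSV library that replaced FasterCSV: parse each physical line on its own with CSV.parse_line and rescue the error per line. This is a sketch with made-up sample lines, and it assumes no field legitimately spans several lines, since a quoted line break would be split apart by this approach.

```ruby
require 'csv'

# Made-up tab-separated lines; the third one has a stray quote character.
lines = [
  "id\tname",
  "1\tok",
  "2\tbad \"quote",
  "3\tfine"
]

good_rows = []
skipped_line_numbers = []

lines.each_with_index do |line, index|
  begin
    good_rows << CSV.parse_line(line, col_sep: "\t")
  rescue CSV::MalformedCSVError
    # Remember the 1-based line number and carry on with the next line.
    skipped_line_numbers << index + 1
  end
end

p good_rows
p skipped_line_numbers
```

With a real file, the same loop body can be driven by File.foreach(path).with_index instead of an in-memory array.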

Just read the file as a regular one (not with FasterCSV) and split each line on \t, as you do now. It should work.
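A minimal sketch of that plain-split approach, using a made-up line. The -1 limit passed to split is there so that empty trailing fields are kept rather than dropped.

```ruby
# A made-up tab-separated line with a quote character and empty fields.
line = "1\tsome \"odd\" css\t\tlast\t\n"

# chomp removes the line ending; the -1 limit keeps empty trailing fields.
fields = line.chomp.split("\t", -1)

p fields
```

For a whole file, the same two calls can be applied to each line yielded by File.foreach. Quote characters are treated as ordinary data, so no "Illegal quoting" error can occur.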

So the problem is that TSV files don't have a quote character. The specification simply specifies that you aren't allowed to have tabs in the data.
The CSV library doesn't really support this use case. I've worked around it by specifying a quote character that I know won't appear in my data. For example
# txt_file is a file path, so read it with CSV.foreach rather than CSV.parse
CSV.foreach(txt_file, :quote_char => '☎', :col_sep => "\t") do |row|
  puts row[15]
end
