I'm trying to import CSV files, but the header row keeps moving around: sometimes it's in row 20, sometimes in row 25, and so on. The field 'SPIRIT Barcode' is always in the header, and it's the only field I'm interested in at this point. I'm saving it in "barcode".
How do I find the row that contains 'SPIRIT Barcode' and use it as the header? (Everything above the header can be ignored.)
def self.import(file)
  #b = []
  CSV.foreach(file.path, headers: true) do |row|
    entry = ZygReport.find_by(barcode: row['SPIRIT Barcode']) || new
    entry.update({
      :barcode => row['SPIRIT Barcode'],
    })
    entry.save!
    #b << [entry.barcode]
  end
end
Ignore the #b, that's for another function.
This code reads the file only once, line by line, looking for SPIRIT Barcode.
Once that line is found, it strips the trailing newline and splits the line to get an array of header names.
CSV.parse can then be called on the file object directly. Since the file has already been read up to and including the header, parsing starts at the first data row:
require 'csv'
sep = ';'
File.open('test.csv'){|file|
  header = file.find{|line| line.include?('SPIRIT Barcode')}.chomp.split(sep)
  CSV.parse(file, headers: header, col_sep: sep).each do |row|
    p row
  end
}
With test.csv as :
Ignore me
Ignore me
SPIRIT Barcode; B; C
1; 2; 3
4; 5; 6
It outputs :
#<CSV::Row "SPIRIT Barcode":"1" " B":" 2" " C":" 3">
#<CSV::Row "SPIRIT Barcode":"4" " B":" 5" " C":" 6">
It shouldn't be hard to adapt it to your data.
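To adapt this to the import method in the question, something along these lines might work. It is only a sketch: column_after_header is a made-up helper name, the sample data is invented, and a comma separator is assumed. It takes any IO, so a StringIO stands in for the uploaded file here.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper (not from the original answer): returns the values of
# `column` from a CSV stream whose header row may be preceded by junk lines.
def column_after_header(io, column: 'SPIRIT Barcode', col_sep: ',')
  # Advance the stream to the first line that mentions the column name.
  header_line = io.find { |line| line.include?(column) }
  return [] if header_line.nil?

  headers = CSV.parse_line(header_line, col_sep: col_sep).map { |h| h.to_s.strip }
  # The stream now sits just after the header, so CSV only sees data rows.
  CSV.new(io, headers: headers, col_sep: col_sep).map { |row| row[column].to_s.strip }
end

# Invented sample data standing in for the uploaded file.
sample = StringIO.new(<<~DATA)
  Report generated by lab system
  Ignore me
  SPIRIT Barcode,Well,Result
  ABC123,A1,pos
  DEF456,A2,neg
DATA

column_after_header(sample).each { |barcode| puts barcode }
```

In the Rails method, the same helper could be fed with File.open(file.path) and each returned barcode passed to the find_by / update logic from the question.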
This is untested, but it's the general idea for what you're trying to do:
require 'csv'
header_found = false
File.foreach('path/to/file.csv') do |li|
  next unless header_found || li['SPIRIT Barcode']
  header_found = true
  # start processing CSV...
  CSV.parse(li) do |row|
    # use row here...
  end
end
You'll have to figure out what to do when the header is found if you want to keep track of the values.
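One possible way to keep track of the values is to remember the header fields once they are seen and zip every later line against them. This is a sketch only: rows_after_header is a hypothetical name, the sample text is invented, and because it parses one physical line at a time it assumes no field contains a quoted line break.

```ruby
require 'csv'

# Hypothetical helper: turns every line after the header into a
# { header => value } hash. `lines` can be any enumerable of strings,
# e.g. File.foreach(path) or some_string.each_line.
def rows_after_header(lines, marker: 'SPIRIT Barcode')
  headers = nil
  rows = []
  lines.each do |line|
    fields = CSV.parse_line(line)
    next if fields.nil? || fields.empty? # skip blank lines

    if headers.nil?
      # Still looking for the header; ignore everything above it.
      headers = fields if fields.include?(marker)
      next
    end
    rows << headers.zip(fields).to_h
  end
  rows
end

# Invented sample data.
text = <<~DATA
  Ignore me
  SPIRIT Barcode,B,C
  1,2,3
  4,5,6
DATA

rows_after_header(text.each_line).each { |row| puts row['SPIRIT Barcode'] }
```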
I created a rake task to import users from a Google Sheet, using the gem 'Roo'. Everything works so far, but I can't seem to get it working without importing the first row (the headers).
This is my code:
require 'roo'
namespace :import do
  desc "Import users from Google Sheet"
  task users: :environment do
    @counter = 0
    url = 'https://docs.google.com/spreadsheets/d/{mycode}/export?format=xlsx'
    xlsx = Roo::Spreadsheet.open(url, extension: :xlsx, headers: true)
    xlsx.each do |row|
      n = User.where(name:row[0]).first
      user = User.find_or_create_by(id: n)
      user.update(
        name:row[0],
        country_id:row[6]
      )
      user.save!
      puts user.name
      @counter += 1
    end
    puts "Imported #{@counter} lines."
  end
end
Your code says headers: true when you are opening the sheet. Have you tried turning it to false? Or are you saying it does not work when it's set to false?
Also, you are using .each rather differently than the example in the documentation. The doc shows a hash with keys derived from the headers. You are using [n] array notation. Does that work?
EDIT:
Try using .each in a way that's more similar to what the documentation says:
xlsx.each(name: 'Name', country_id: 'Country ID') do |row|
n = User.where(name: row[:name]).first
...
end
The strings 'Name' and 'Country ID' are just examples; they should be the text of whatever column headers have the name and country_id information.
There is a way to skip the header row: use the each_row_streaming(offset: 1) method.
It yields each row as an array of cells, skipping the header, so you have to read each cell's content with the .value method. The documentation describes this for Excelx::Cell objects, but it works for Roo::Spreadsheet objects too.
The documentation example:
xlsx.each_row_streaming(offset: 1) do |row| # Will exclude first (inevitably header) row
puts row.inspect # Array of Excelx::Cell objects
end
I have a CSV document with one column and 1000 rows. Each row has a string of data which is separated by "|".
For example
BOB|MARLEY|306336|Friday| 9:00AM|02 DIS 2|HELE TP 1|PARRA|JULIA|20 Jul 2018|TOMPSON|TORI|21332|NA|AUS|4214|||0400 000 000|zzz11@bigpond.com|.0000|NULL|NULL|0|QLD|F|2016-06-22 00:00:00.000|
I need to loop through each row then split the string into another array. I then need to loop through each of those arrays.
Currently I have
csv_text = open('https://res.cloudinary.com/thypowerhouse/raw/upload/v1534642033/rackleyswimming/HVL_SCHOOL.csv')
csv = CSV.parse(csv_text, :headers=>true)
csv.each do |row|
  new_row = row.map(&:inspect).join
  new_row = new_row.delete! '[]'
  new_row = new_row.gsub('|', '", "')
  new_row = new_row.split(',')
  puts new_row
end
Don't know if I'm heading in the right direction?
You can use col_sep to separate the data of each row:
require "csv"
CSV.foreach("HVL_SCHOOL.csv", headers: true, col_sep: "|") do |row|
  # Your code here: process the fields of each row
end
Every row yielded by CSV.foreach (as in the previous example) is a CSV::Row, which can be treated much like an array because it includes the Enumerable module.
I think with this you can do what you want with the data.
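As a rough illustration of what col_sep gives you, here is a small sketch on inline data. The header names (FIRST, LAST, ID, DAY) and the helper name pipe_rows are invented for the example, since the real file's header row isn't shown in the question.

```ruby
require 'csv'

# Hypothetical helper: parses pipe-separated text and returns one hash per row.
def pipe_rows(text)
  CSV.parse(text, headers: true, col_sep: '|').map(&:to_h)
end

# Invented sample data; the real header names are unknown.
data = <<~DATA
  FIRST|LAST|ID|DAY
  BOB|MARLEY|306336|Friday
  JULIA|PARRA|21332|Monday
DATA

pipe_rows(data).each do |row|
  # Each row is already split into fields, so no manual gsub/split is needed.
  row.each { |header, value| puts "#{header}: #{value}" }
end
```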
Is there any way to tell the CSV object that a line break between quotes is not a row delimiter?
My CSV file is:
"a","b","c"
1,"some
text with line break",21
2,"blah",4
My code is:
CSV.foreach(file_path, headers: true) do |row|
puts row
end
I want it to return only two rows, but it returns three.
You're (wrongly) judging the number of rows by the number of printed lines. It returns two. Go figure:
[4] pry(main)> CSV.foreach('example.csv', headers: true).to_a
=> [
#<CSV::Row "a":"1" "b":"some\ntext with line break" "c":"21">,
#<CSV::Row "a":"2" "b":"blah" "c":"4">
]
Your code outputs three lines because you're printing the rows out and line break is printed as-is. That makes it look as if one row became two. Thinking the same way, I'd say that your source CSV contains 4 (four!) rows. And that isn't really true.
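A quick way to convince yourself is to compare the number of physical lines with the number of parsed rows. A minimal sketch using the CSV from the question as an inline string (parsed_rows is just an illustrative helper name):

```ruby
require 'csv'

# Illustrative helper: parse CSV text with a header row into an array of hashes.
def parsed_rows(text)
  CSV.parse(text, headers: true).map(&:to_h)
end

text = <<~DATA
  "a","b","c"
  1,"some
  text with line break",21
  2,"blah",4
DATA

rows = parsed_rows(text)
puts "#{text.lines.size} physical lines, #{rows.size} CSV rows"
p rows.first['b'] # the embedded newline is part of the field value
```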
Alternatively, you can set headers to true and display each row with row.to_hash. Example:
CSV.foreach("/home/akbar/text.csv", headers: true) do |row|
  puts row.to_hash
end
The result is:
1.9.3p194 :034 > CSV.foreach("/home/akbar/text.csv", headers: true) do |x|
1.9.3p194 :035 > puts x.to_hash
1.9.3p194 :036?> end
{"a"=>"1", "b"=>"some\ntext with line break", "c"=>"21"}
{"a"=>"2", "b"=>"blah", "c"=>"4"}
For more information see "ruby-on-rails-import-data-from-a-csv-file".
For those having trouble reading a CSV file that contains a line break inside a row: assuming the rows themselves end with CRLF, read the file with row_sep: "\r\n" (double-quoted, so that the escapes are interpreted):
data = CSV.read('your_file.csv', row_sep: "\r\n")
I'm using Ruby 1.9.2. My CSV file is as follows:
NAME, Id, No, Dept
Tom, 1, 12, CS
Hendry, 2, 35, EC
Bahamas, 3, 21, IT
Frank, 4, 61, EE
I want to print a specific row, say the one for 'Tom'. I tried many ways but didn't get the right result. The most recommended option is "FasterCSV", but it is not applicable to my version. Also, I noticed that CSV prints the fields column-wise. How do I print an entire row using CSV in Rails? My Ruby code is as follows:
require 'csv'
csv_text = File.read('sampler.csv')
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
  puts "#{row[:NAME]},#{row[:Id]},#{row[:No]},#{row[:Dept]}"
end
Use .find
csv = CSV.read('sampler.csv', headers: true)
puts csv.find {|row| row['NAME'] == 'Tom'} #=> returns first `row` that satisfies the block.
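One thing to watch with the sample file from the question: there is a space after each comma, so the parsed headers are " Id", " No" and " Dept" rather than "Id", "No" and "Dept". A sketch of one way around that, stripping whitespace with converters (find_row is a made-up helper name, and the data is inlined so the example runs on its own):

```ruby
require 'csv'

# Hypothetical helper: finds the first row whose NAME column equals `name`,
# stripping surrounding whitespace from headers and fields along the way.
def find_row(text, name)
  strip = ->(value) { value&.strip }
  table = CSV.parse(text, headers: true, converters: strip, header_converters: strip)
  table.find { |row| row['NAME'] == name }
end

sample = <<~DATA
  NAME, Id, No, Dept
  Tom, 1, 12, CS
  Hendry, 2, 35, EC
  Bahamas, 3, 21, IT
DATA

row = find_row(sample, 'Tom')
puts row.to_csv # prints the whole row as one CSV line
```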
Here's another approach that keeps the code within the CSV API.
csv_table is a CSV::Table
row is a CSV::Row
row_with_specified_name is a CSV::Row.
csv_table = CSV.table("./tables/example.csv", converters: :all)
row_with_specified_name = csv_table.find do |row|
  row.field(:name) == 'Bahamas'
end
p row_with_specified_name.to_csv.chomp #=> "Bahamas,3,21,IT"
FYI, CSV.table is just a shortcut for:
CSV.read( path, { headers: true,
converters: :numeric,
header_converters: :symbol }.merge(options) )
As per the docs.
If you have a large CSV file and want to find one specific row, it will be much faster and far less memory-intensive to read one row at a time.
require 'csv'
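csv = CSV.open('sampler.csv', 'r', headers: true)
# Read one row at a time and stop at the first match.
# The header in sampler.csv is 'NAME', and the lookup is case-sensitive.
while row = csv.shift
  if row['NAME'] == 'Bahamas'
    break
  end
end
csv.close
pp row # nil if no row matched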
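The same early-exit behaviour can probably be had a little more compactly by calling CSV.foreach without a block, which returns an enumerator, and using find on it. A sketch under that assumption; first_matching_row is an invented name, and a temporary file stands in for sampler.csv (written without the spaces after the commas):

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: streams the file row by row and returns the first row
# whose `column` equals `value` (or nil). Reading stops at the first match.
def first_matching_row(path, column, value)
  CSV.foreach(path, headers: true).find { |row| row[column] == value }
end

Tempfile.create('sampler') do |file|
  file.write("NAME,Id,No,Dept\nTom,1,12,CS\nBahamas,3,21,IT\nFrank,4,61,EE\n")
  file.flush
  row = first_matching_row(file.path, 'NAME', 'Bahamas')
  puts row['Dept']
end
```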
I am using FasterCSV and I am looping with a foreach like this:
FasterCSV.foreach("#{Rails.public_path}/uploads/transfer.csv", :encoding => 'u', :headers => :first_row) do |row|
but the problem is that my CSV has headers in the first 3 lines. Is there any way to make FasterCSV skip the first three rows rather than only the first?
Not sure about FasterCSV, but in the Ruby 1.9 standard CSV library (which is based on FasterCSV), I can do something like:
c = CSV.open '/path/to/my.csv'
c.drop(3).each do |row|
  # do whatever with row
end
I'm not a user of FasterCSV, but why not do the control yourself:
additional_rows_to_skip = 2
FasterCSV.foreach("...", :encoding => 'u', :headers => :first_row) do |row|
  if additional_rows_to_skip > 0
    additional_rows_to_skip -= 1
  else
    # do stuff...
  end
end
Thanks to Mladen Jablanovic, I got my clue. But I realized something interesting:
In 1.9, reading seems to continue from the current position (POS).
By this I mean that if you do
c = CSV.open iFileName
logger.debug c.first
logger.debug c.first
logger.debug c.first
You'll get three different results in your log. One for each of the three header rows.
c.each do |row| #now seems to start on the 4th row.
It makes perfect sense that it would read the file this way. Then it would only have to have the current row in memory.
I still like Mladen Jablanović's answer, but this is an interesting bit of logic too.
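Building on that observation, one could also consume the header lines from the underlying IO first and only then hand it to CSV. This is just a sketch: rows_after is a made-up name, a StringIO stands in for the real file, and it assumes none of the skipped lines contains a quoted line break.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: discards the first `lines_to_skip` physical lines of
# the stream, then parses whatever is left as CSV (rows come back as arrays).
def rows_after(io, lines_to_skip)
  lines_to_skip.times { io.gets } # advance past the header lines
  CSV.new(io).read
end

# Invented sample data with three header lines.
io = StringIO.new(<<~DATA)
  header line 1
  header line 2
  header line 3
  1,2,3
  4,5,6
DATA

rows_after(io, 3).each { |row| puts row.join(' | ') }
```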