I have a rake file that pulls in data from an external CSV file and enumerates through it with:
CSV.foreach(file, :headers => true) do |row|
What is the most effective way (in ruby) to specify the starting point within the spreadsheet?
:headers => true allows me to start importing from the second line, but what if I want to start at line 20?
Ruby enumerators include a drop method that will skip over the first n items.
When not passed a block, CSV.foreach returns an enumerator.
You can use
CSV.foreach(file, :headers => true).drop(20).each do |row|
This will skip the first 20 data rows (the header row does NOT count as one of those twenty).
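A minimal sketch of the drop approach, written against a temporary file so it can be run as-is. The helper names rows_after_skipping and demo_skip, and the sample columns id and name, are made up for illustration and do not come from the question.

```ruby
require 'csv'
require 'tempfile'

# Returns the data rows left after skipping the first `count` data rows.
# Without a block, CSV.foreach returns an Enumerator, so drop is available.
def rows_after_skipping(path, count)
  CSV.foreach(path, headers: true).drop(count).map(&:to_hash)
end

# Writes a hypothetical five-row CSV to a temp file and skips three rows.
def demo_skip
  Tempfile.create(['sample', '.csv']) do |f|
    f.write("id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n")
    f.flush
    rows_after_skipping(f.path, 3)
  end
end

p demo_skip
# prints [{"id"=>"4", "name"=>"d"}, {"id"=>"5", "name"=>"e"}]
```

Note that drop builds an array of the remaining rows, so for very large files it holds everything after the skipped rows in memory.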
Use .drop(#rows to ignore):
CSV.open(file, :headers => true).drop(20).each do |row|
Related
I've a rake task where I import CSV data into a database via Rails.
I want a specific column (specifically, row[6] below) to be rendered as an integer. However, everything I try returns that value as a string.
Below is the rake task:
require 'csv'
namespace :import_site_csv do
task :create_sites => :environment do
CSV.foreach('../sites.csv', :headers => true) do |row|
row[6] = row[6].to_i
Site.create!(row.to_hash)
end
end
end
Does anyone have an idea how I might do this? Thanks!
You are making one small (but important) mistake here.
When you call CSV.foreach('../sites.csv') each of the rows will be an array of the values in that particular row. That would allow you to access the data you need, in the way you do it now - row[6].
But, when you add the :headers => true option to CSV.foreach, you will not get an array of values (row will not be an array). Instead, it will be a CSV::Row object (docs). As you can read in the documentation:
A CSV::Row is part Array and part Hash. It retains an order for the fields and allows duplicates just as an Array would, but also allows you to access fields by name just as you could if they were in a Hash.
For example, if you have a column with the name Title in the CSV, to get the title in each of the rows, you need to do something like:
CSV.foreach('file.csv', :headers => true) do |row|
puts row['Title']
end
Since I do not know the structure of your CSV, I cannot tell you which key you should use to get the data and convert it to an Integer, but I think that this should give you a good idea of how to proceed.
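As a rough sketch of converting a field by header name before building the attributes hash: the column name visits below is purely hypothetical (it stands in for whatever the seventh column is called), and site_attributes is an illustrative helper, not part of Rails or CSV.

```ruby
require 'csv'

# Hypothetical CSV text; "visits" is an assumed column name.
SITES_CSV = "name,visits\nExample,42\n"

# Returns one attributes hash per data row, with "visits" as an Integer.
def site_attributes(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    row['visits'] = row['visits'].to_i # replace the String with an Integer
    row.to_hash                        # the kind of hash passed to Site.create!
  end
end

p site_attributes(SITES_CSV)
# prints [{"name"=>"Example", "visits"=>42}]
```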
I am going to keep this question simple.
I am trying to import a CSV file into my application.
The file has long numbers in it, such as "9405510200830754182150",
but when the file is imported the data looks like this: "9.40551e+21".
Does anyone know how to get around this?
Here is the code I am using
CSV.foreach(file.path, headers: true) do |row|
puts "row: #{row.inspect}"
end
UPDATE
Thank you for the comments. I am not sure why CSV is converting that number into a float; I need to keep it as a string.
I should clarify that I am using Rails 3.2.18 for this project
If you want to reproduce my code:
1. Create a CSV with 9405510200830754182150 in it.
2. Run this code in a terminal:
file = File.join(Rails.root, 'tracking.csv')
CSV.foreach(file, headers: true) do |row|
puts "row: #{row.inspect}"
end
I need to be able to keep "9405510200830754182150" as a string, since this is the tracking number of an order and needs to be stored in the database.
Are you sure that "9.40551e+21" is not a visual approximation? Try this:
CSV.foreach(file.path, headers: true) do |row|
puts row['my_numeric_header']
end
It's supposed to treat everything in a CSV file as a string by default. You could try the converters: :numeric option:
CSV.foreach(file.path, headers: true, converters: :numeric) do |row|
puts "row: #{row.inspect}"
end
Which will interpret numbers into the appropriate types. Otherwise you might have to debug the CSV module code to figure out what's going on.
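To check what the parser itself returns, here is a small sketch that parses the tracking number with no converters, with :numeric, and with :float. The header name tracking_number and the helper tracking_values are assumptions for illustration only.

```ruby
require 'csv'

# Hypothetical one-column CSV holding the number from the question.
TRACKING_CSV = "tracking_number\n9405510200830754182150\n"

# Parses the sample with the given CSV options and returns the column values.
def tracking_values(**options)
  CSV.parse(TRACKING_CSV, headers: true, **options).map { |row| row['tracking_number'] }
end

as_string  = tracking_values.first                        # no converters: stays a String
as_integer = tracking_values(converters: :numeric).first  # fits an Integer exactly
as_float   = tracking_values(converters: :float).first    # Float: precision is lost

p as_string.class, as_integer.class, as_float.class
p as_float.to_i == as_integer # prints false
```

If the value must stay a string, the simplest option under these assumptions is to pass no converters at all.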
The :float converter converted your number to a Float for you. Unfortunately, a Float cannot represent such a large number exactly, see comments...
[14] pry(main)> val = CSV::Converters[:float].('9405510200830754182150')
=> 9.405510200830755e+21
[15] pry(main)> val.class
=> Float
[16] pry(main)> "%d" % val
=> "9405510200830754553856"
[17] pry(main)> "%f" % val
=> "9405510200830754553856.000000"
I'm using Ruby 1.9.2. My CSV file is as follows:
NAME, Id, No, Dept
Tom, 1, 12, CS
Hendry, 2, 35, EC
Bahamas, 3, 21, IT
Frank, 4, 61, EE
I want to print a specific row, say the one for 'Tom'. I tried many approaches, but I did not get the result I wanted. The most recommended option is "Fastercsv", but it is not applicable to my version. I also noticed that csv prints the fields column-wise. How can I print an entire row using csv in Rails? My Ruby code is as follows:
require 'csv'
csv_text = File.read('sampler.csv')
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
puts "#{row[:NAME]},#{row[:Id]},#{row[:No]},#{row[:Dept]}"
end
Use .find
csv = CSV.read('sampler.csv', headers: true)
puts csv.find {|row| row['NAME'] == 'Tom'} #=> returns first `row` that satisfies the block.
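One detail worth illustrating: with headers: true and no header converter, the header keys are Strings, so a Symbol key such as :NAME finds nothing. The sketch below uses an in-memory copy of the sample data without the spaces after the commas; first_row is a made-up helper.

```ruby
require 'csv'

# Sample data modelled on the question, without spaces after the commas.
SAMPLE = "NAME,Id,No,Dept\nTom,1,12,CS\nHendry,2,35,EC\n"

# Parses SAMPLE with headers and returns the first data row.
def first_row(**options)
  CSV.parse(SAMPLE, headers: true, **options).first
end

plain = first_row
p plain['NAME'] # prints "Tom"
p plain[:NAME]  # prints nil, because the header is the String "NAME"

# With header_converters: :symbol the headers become downcased Symbols.
symbolized = first_row(header_converters: :symbol)
p symbolized[:name] # prints "Tom"
```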
Here's another approach that keeps the code within the CSV API.
csv_table is a CSV::Table
row is a CSV::Row
row_with_specified_name is a CSV::Row.
csv_table = CSV.table("./tables/example.csv", converters: :all)
row_with_specified_name = csv_table.find do |row|
row.field(:name) == 'Bahamas'
end
p row_with_specified_name.to_csv.chomp #=> "Bahamas,3,21,IT"
FYI, CSV.table is just a shortcut for:
CSV.read( path, { headers: true,
converters: :numeric,
header_converters: :symbol }.merge(options) )
As per the docs.
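The same three options can be passed to CSV.parse to try this out on a string instead of a file. This is only a sketch: parse_like_table is an illustrative name, and the data is the sample from the question without spaces after the commas.

```ruby
require 'csv'

DEPARTMENTS = "NAME,Id,No,Dept\nTom,1,12,CS\nBahamas,3,21,IT\n"

# Parses text with the same options that CSV.table applies to a file.
def parse_like_table(csv_text)
  CSV.parse(csv_text, headers: true,
                      converters: :numeric,
                      header_converters: :symbol)
end

table = parse_like_table(DEPARTMENTS)
row = table.find { |r| r[:name] == 'Bahamas' }

p table.headers    # prints [:name, :id, :no, :dept]
p row[:id]         # prints 3 (an Integer, thanks to converters: :numeric)
p row.to_csv.chomp # prints "Bahamas,3,21,IT"
```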
If you have a large CSV file and want to find one specific row, reading one line at a time will be much faster and use much less memory.
require 'csv'
csv = CSV.open('sampler.csv', 'r', headers: true)
while row = csv.shift
if row['NAME'] == 'Bahamas' # header keys are case-sensitive Strings here
break
end
end
pp row
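A variation on the same idea, sketched with the enumerator returned by CSV.foreach: find stops reading at the first match, and the file is closed when iteration ends. The helper names and the temporary file are assumptions made so the example runs on its own.

```ruby
require 'csv'
require 'tempfile'

# Reads rows one at a time and stops at the first row whose `header`
# field equals `value`. Returns a CSV::Row, or nil when nothing matches.
def find_row_streaming(path, header, value)
  CSV.foreach(path, headers: true).find { |row| row[header] == value }
end

# Writes a copy of the sample data to a temp file and searches it.
def demo_streaming_find
  Tempfile.create(['sampler', '.csv']) do |f|
    f.write("NAME,Id,No,Dept\nTom,1,12,CS\nBahamas,3,21,IT\nFrank,4,61,EE\n")
    f.flush
    row = find_row_streaming(f.path, 'NAME', 'Bahamas')
    row && row.to_hash
  end
end

p demo_streaming_find
# prints {"NAME"=>"Bahamas", "Id"=>"3", "No"=>"21", "Dept"=>"IT"}
```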
I am using FasterCSV and I am looping with a foreach like this:
FasterCSV.foreach("#{Rails.public_path}/uploads/transfer.csv", :encoding => 'u', :headers => :first_row) do |row|
The problem is that my CSV has headers in its first 3 lines. Is there any way to make FasterCSV skip the first three rows rather than only the first?
Not sure about FasterCSV, but in the Ruby 1.9 standard CSV library (which is based on FasterCSV), I can do something like:
c = CSV.open '/path/to/my.csv'
c.drop(3).each do |row|
# do whatever with row
end
I'm not a user of FasterCSV, but why not handle the skipping yourself:
additional_rows_to_skip = 2
FasterCSV.foreach("...", :encoding => 'u', :headers => :first_row) do |row|
if additional_rows_to_skip > 0
additional_rows_to_skip -= 1
else
# do stuff...
end
end
Thanks to Mladen Jablanovic, I got my clue. But I noticed something interesting.
In 1.9, reading seems to continue from the current position (POS) in the file.
By this I mean that if you do
c = CSV.open iFileName
logger.debug c.first
logger.debug c.first
logger.debug c.first
You'll get three different results in your log. One for each of the three header rows.
c.each do |row| #now seems to start on the 4th row.
It makes perfect sense that it would read the file this way, since it then only has to hold the current row in memory.
I still like Mladen Jablanović's answer, but this is an interesting bit of logic too.
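The position-based reading described here can be sketched with shift, which consumes one row per call. The sample text and the helper rows_after_headers are hypothetical, and the example uses the standard CSV class rather than FasterCSV.

```ruby
require 'csv'

# Hypothetical file contents: three header lines, then two data rows.
THREE_HEADER_CSV = "h1\nh2\nh3\n1,a\n2,b\n"

# Consumes `header_lines` rows, then returns whatever rows remain.
def rows_after_headers(csv_text, header_lines)
  csv = CSV.new(csv_text)
  header_lines.times { csv.shift } # each shift advances past one row
  csv.read                         # reads from the current position onward
end

p rows_after_headers(THREE_HEADER_CSV, 3)
# prints [["1", "a"], ["2", "b"]]
```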
I am using Ruby 1.8.7 and Rails 2.3.8. I want to parse data from a tab-separated TXT dump file.
This TXT dump contains some CSS properties that appear to include invalid data.
When I run my code using the FasterCSV gem:
FasterCSV.foreach(txt_file, :quote_char => '"',:col_sep =>'\t', :row_sep =>:auto, :headers => :first_row) do |row|
col= row.to_s.split(/\t/)
puts col[15]
end
the following error is written to the console: "Illegal quoting on line 38." Can anyone suggest how to skip the rows that have invalid data and continue loading the remaining rows?
Here's one way to do it. We go to a lower level, using shift to parse each row, and then silence the MalformedCSVError exception, continuing with the next iteration. The problem with this is that the loop doesn't look so nice. If anyone can improve this, you're welcome to edit the code.
FasterCSV.open(filename, :quote_char => '"', :col_sep => "\t", :headers => true) do |csv|
row = true
while row
begin
row = csv.shift
break unless row
# Do things with the row here...
rescue FasterCSV::MalformedCSVError
next
end
end
end
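A different technique from the shift loop above, sketched with the standard CSV class rather than FasterCSV: parse each physical line on its own with CSV.parse_line and rescue per line. It assumes no field contains an embedded newline; the helper name and the sample dump are made up for illustration.

```ruby
require 'csv'

# Hypothetical dump: the second line has a stray quote inside a field.
DUMP = "a\tb\tc\nx\tba\"d\ty\n1\t2\t3\n"

# Parses tab-separated text line by line.
# Returns [parsed_rows, line_numbers_that_were_skipped].
def parse_tsv_skipping_bad_lines(text)
  rows = []
  skipped = []
  text.each_line.with_index(1) do |line, number|
    begin
      rows << CSV.parse_line(line.chomp, col_sep: "\t")
    rescue CSV::MalformedCSVError
      skipped << number # remember the bad line and carry on
    end
  end
  [rows, skipped]
end

rows, skipped = parse_tsv_skipping_bad_lines(DUMP)
p rows    # prints [["a", "b", "c"], ["1", "2", "3"]]
p skipped # prints [2]
```

Recording the skipped line numbers makes it possible to review the bad rows afterwards instead of silently losing them.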
Just read the file as a regular one (not with FasterCSV), split it by \t like you do now, and it should work.
So the problem is that TSV files don't have a quote character. The specification simply states that you aren't allowed to have tabs in the data.
The CSV library doesn't really support this use case. I've worked around it by specifying a quote character that I know won't appear in my data. For example
CSV.parse(txt_file, :quote_char => '☎', :col_sep => "\t") do |row|
puts row[15]
end
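A small runnable sketch of this workaround, assuming the data never contains the chosen quote character. The sample text, the constant names, and the helper parse_tsv are hypothetical.

```ruby
require 'csv'

UNUSED_QUOTE = "\u260E" # the telephone sign, assumed absent from the data
TSV_TEXT = "id\tnote\n1\tsay \"hi\"\n"

# Parses tab-separated text, treating double quotes as ordinary characters.
def parse_tsv(text)
  CSV.parse(text, col_sep: "\t", quote_char: UNUSED_QUOTE)
end

p parse_tsv(TSV_TEXT)
# prints [["id", "note"], ["1", "say \"hi\""]]

# With the default quote character the same text is rejected.
begin
  CSV.parse(TSV_TEXT, col_sep: "\t")
rescue CSV::MalformedCSVError => e
  puts "default quote_char failed: #{e.class}"
end
```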