Ignore first line on csv parse Rails - ruby-on-rails

I am using the code from this tutorial to parse a CSV file and add the contents to a database table. How would I ignore the first line of the CSV file? The controller code is below:
def csv_import
#parsed_file=CSV::Reader.parse(params[:dump][:file])
n = 0
#parsed_file.each do |row|
s = Student.new
s.name = row[0]
s.cid = row[1]
s.year_id = find_year_id_from_year_title(row[2])
if s.save
n = n+1
GC.start if n%50==0
end
flash.now[:message] = "CSV Import Successful, #{n} new students added to the database."
end
redirect_to(students_url)
end

This question kept popping up when i was searching for how to skip the first line with the CSV / FasterCSV libraries, so here's the solution that if you end up here.
the solution is...
CSV.foreach("path/to/file.csv",{:headers=>:first_row}) do |row|
HTH.

#parsed_file.each_with_index do |row, i|
next if i == 0
....

If you identify your first line as headers then you get back a Row object instead of a simple Array.
When you grab cell values, it seems like you need to use .fetch("Row Title") on the Row object.
This is what I came up with. I'm skipping nil with my if conditional.
CSV.foreach("GitHubUsersToAdd.csv",{:headers=>:first_row}) do |row|
username = row.fetch("GitHub Username")
if username
puts username.inspect
end
end

Using this simple code, you can read a CSV file and ignore the first line which is the header or field names:
CSV.foreach(File.join(File.dirname(__FILE__), filepath), headers: true) do |row|
puts row.inspect
end
You can do what ever you want with row. Don't forget headers: true

require 'csv'
csv_content =<<EOF
lesson_id,user_id
5,3
69,95
EOF
parse_1 = CSV.parse csv_content
parse_1.size # => 3 # it treats all lines as equal data
parse_2 = CSV.parse csv_content, headers:true
parse_2.size # => 2 # it ignores the first line as it's header
parse_1
# => [["lesson_id", "user_id"], ["5", "3"], ["69", "95"]]
parse_2
# => #<CSV::Table mode:col_or_row row_count:3>
here where it's the fun part
parse_1.each do |line|
puts line.inspect # the object is array
end
# ["lesson_id", "user_id"]
# ["5", " 3"]
# ["69", " 95"]
parse_2.each do |line|
puts line.inspect # the object is `CSV::Row` objects
end
# #<CSV::Row "lesson_id":"5" "user_id":" 3">
# #<CSV::Row "lesson_id":"69" "user_id":" 95">
So therefore I can do
parse_2.each do |line|
puts "I'm processing Lesson #{line['lesson_id']} the User #{line['user_id']}"
end
# I'm processing Lesson 5 the User 3
# I'm processing Lesson 69 the User 95

data_rows_only = csv.drop(1)
will do it
csv.drop(1).each do |row|
# ...
end
will loop it

Related

How to check header exist before import data in Ruby CSV?

I want to write header only 1 time in first row when import data to csv in ruby, but the header is written many time on output file.
job_datas.each do |job_data|
#company_job = job data coverted etc....
save_job_to_csv(#company_job)
end
def save_job_to_csv(job_data)
filepath = "tmp/jobs/jobs.csv"
CSV.open(filepath, "a", :headers => true) do |csv|
if csv.blank?
csv << CompanyJob.attribute_names
end
csv << job_data.attributes.values
end
end
Any one can give me solution? Thank you so much!
You are calling save_job_to_csv the method for each job_data and pushing header every time csv << CompanyJob.attribute_names
filepath = "tmp/jobs/jobs.csv"
CSV.open(filepath, "a", :headers => true) do |csv|
# push header once
csv << CompanyJob.attribute_names
# push every job record
job_datas.each do |job_data|
#company_job = job data coverted etc....
csv << #company_job.attributes.values
end
end
The above script can be created wrapped a method but if you like to write a separate method that just saves the CSV, then you need to refactor the script when you first prepare an array of values holding header and pass it to a method that just saves to CSV.
You could do something similar to this:
def save_job_to_csv(job_data)
filepath = "tmp/jobs/jobs.csv"
unless File.file?(filepath)
File.open(filepath, 'w') do |file|
file.puts(job_data.attribute_names.join(','))
end
end
CSV.open(filepath, "a", :headers => true) do |csv|
csv << job_data.attributes.values
end
end
It just checks beforehand if the file exists and if not it adds the header. If you want tabs as column separators, you just have to change the value for the join function and add the col_sep parameter to CSV.open():
file.puts(job_data.attribute_names.join("\t"))
CSV.open(filepath, "a", :headers => true, col_sep: "\t") do |csv|

Ruby Rails Remove Comments from CSV

I'm new to rails and am trying to process a CSV file, some files will have comments at the start of the CSV file, comments are marked with #. If there a way I can delete these rows? I don't have to just ignore them as I want to save the file without comments.
sample file:
#-----------------------
# report --------------
#-----------------------
Date, transctions
20100923, 34
20200110, 56
Thanks.
The CSV library has a skip_lines options:
When setting an object responding to match, every line matching it is considered a comment and ignored during parsing. When set to a String, it is first converted to a Regexp. When set to nil no line is considered a comment. If the passed object does not respond to match, ArgumentError is thrown.
This should work for you:
CSV.foreach(file, skip_lines: /^#/, headers: true) do |row|
# ...
end
/^#/ matches lines starting with #.
Adding something to #Stefan answer (all credit goes to him for the skip_lines tip), assuming your csv file is input.csv :
require "csv"
CSV.open("output.csv", "wb") do |output_csv|
CSV.foreach("input.csv", skip_lines: /^#/, headers: true) do |row|
# ...
output_csv << row
end
end
This way you will end with a file output.csv without those comments.
EDIT:
If you want also the header, you can do:
CSV.open("output.csv", "wb") do |output_csv|
CSV.foreach("input.csv", skip_lines: /^#/, headers: true).with_index(0) do |row, i|
output_csv << row.headers if i == 0
puts row
output_csv << row
end
end
...It's not as clean as I want but fits your needs ;)

Do a diff between csv column and ActiveRecord object

I have a simple csv (a list of emails) that I want to upload to my rails backend API which looks like this:
abd#gmail.com,cool#hotmail.com
What I want is to upload that file, check in the user table if there are matching rows (in terms of the email address) and then return a newly downloadable csv with 2 columns: the email and whether or not the email was matched to an existing user(boolean true/false).
I'd like to stream the output since the file can be very large. This is what I have so far:
controller
def import_csv
send_data FileIngestion.process_csv(
params[:file]
), filename: 'processed_emails.csv', type: 'text/csv'
end
file_ingestion.rb
require 'csv'
class FileIngestion
def self.process_csv(file)
emails = []
CSV.foreach(file.path, headers: true) do |row|
emails << row[0]
end
users = User.where("email IN (?)", emails)
end
end
Thanks!
Why not just pluck all the emails from the Users and do something like this. This example keeps it simple but you get the idea. If we can assume your input file is just a string of emails with comma separated values then this should work:
emails = File.read('emails.csv').split(',')
def process_csv(emails)
user_emails = User.where.not(email: [nil, '']).pluck(:email)
CSV.open('emails_processed.csv', 'w') do |row|
row << ['email', 'present']
emails.each do |email|
row << [email, user_emails.include?(email) ? 'true' : 'false']
end
end
end
process_csv(emails)
UPDATED to match your code design:
def import_csv
send_data FileIngestion.process_csv(params[:file]),
filename: 'processed_emails.csv', type: 'text/csv'
end
require 'csv'
class FileIngestion
def self.process_csv(file)
emails = File.read('emails.csv').split(',')
CSV.open('emails_processed.csv', 'w') do |row|
emails.each do |email|
row << [email, user_emails.include?(email) ? 'true' : 'false']
end
end
File.read('emails_processed.csv')
end
end
Basically what you want to do is collect the incoming CSV data into batches - use each batch to query the database and write a diff to a tempfile.
You would then stream the tempfile to the client.
require 'csv'
require 'tempfile'
class FileIngestion
BATCH_SIZE = 1000
def self.process_csv(file)
csv_tempfile = CSV.new(Tempfile.new('foo'))
CSV.read(file, headers: false).lazy.drop(1).each_slice(BATCH_SIZE) do |batch|
emails = batch.flatten
users = User.where(email: emails).pluck(:email)
emails.each do |e|
csv_tempfile << [e, users.include?(e)]
end
end
csv_tempfile
end
end
CSV.read(file, headers: false).lazy.drop(1).each_slice(BATCH_SIZE) uses a lazy enumerator to access the CSV file in batches. .drop(1) gets rid of the header row.
Ok so this is what I came up with. A solution that basically prevents users from uploading a file that has more than 10,000 data points. Might not be the best solution (I prefer #Max's one) but in any case wanted to share what I did:
def emails_exist
raise 'Missing file parameter' if !params[:file]
csv_path = params[:file].tempfile.path
send_data csv_of_emails_matching_users(csv_path), filename: 'emails.csv', type: 'text/csv'
end
private
def csv_of_emails_matching_users(input_csv_path)
total = 0
CSV.generate(headers: true) do |result|
result << %w{email exists}
emails = []
CSV.foreach(input_csv_path) do |row|
total += 1
if total > 10001
raise 'User Validation limited to 10000 emails'
end
emails.push(row[0])
if emails.count > 99
append_to_csv_info_for_emails(result, emails)
end
end
if emails.count > 0
append_to_csv_info_for_emails(result, emails)
end
end
end
def append_to_csv_info_for_emails(csv, emails)
user_emails = User.where(email: emails).pluck(:email).to_set
emails.each do |email|
csv << [email, user_emails.include?(email)]
end
emails.clear
end

Ruby csv - delete row if column is empty

Trying to delete rows from the csv file here with Ruby without success.
How can I tell that all rows, where column "newprice" is empty, should be deleted?
require 'csv'
guests = CSV.table('new.csv', headers:true)
guests.each do |guest_row|
p guests.to_s
end
price = CSV.foreach('new.csv', headers:true) do |row|
puts row['newprice']
end
guests.delete_if('newprice' = '')
File.open('new_output.csv', 'w') do |f|
f.write(guests.to_csv)
end
Thanks!
Almost there. The table method changes the headers to symbols, and delete_if takes a block, the same way as each and open.
require 'csv'
guests = CSV.table('test.csv', headers:true)
guests.each do |guest_row|
p guest_row.to_s
end
guests.delete_if do |row|
row[:newprice].nil?
end
File.open('test1.csv', 'w') do |f|
f.write(guests.to_csv)
end

How to import a large size (5.5Gb) CSV file to Postgresql using ruby on rails?

I have huge CSV file of 5.5 GB size, it has more than 100 columns in it. I want to import only specific columns from the CSV file. What are the possible ways to do this?
I want to import it to two different tables. Only one field to one table and rest of the fields into another table.
Should i use COPY command in Postgresql or CSV class or SmartCSV kind of gems for this purpose?
Regards,
Suresh.
If I had 5Gb of CSV, I'd better import it without Rails! But, you may have a use case that needs Rails...
Since you've said RAILS, I suppose you are talking about a web request and ActiveRecord...
If you don't care about waiting (and hanging one instance of your server process) you can do this:
Before, notice 2 things: 1) use of temp table, in case of errors you don't mess with your dest table - this is optional, of course. 2) use o option to truncate dest table first
CONTROLLER ACTION:
def updateDB
remote_file = params[:remote_file] ##<ActionDispatch::Http::UploadedFile>
truncate = (params[:truncate]=='true') ? true : false
if remote_file
result = Model.csv2tempTable(remote_file.original_filename, remote_file.tempfile) if remote_file
if result[:result]
Model.updateFromTempTable(truncate)
flash[:notice] = 'sucess.'
else
flash[:error] = 'Errors: ' + result[:errors].join(" ==>")
end
else
flash[:error] = 'Error: no file given.'
end
redirect_to somewhere_else_path
end
MODEL METHODS:
# References:
# http://www.kadrmasconcepts.com/blog/2013/12/15/copy-millions-of-rows-to-postgresql-with-rails/
# http://stackoverflow.com/questions/14526489/using-copy-from-in-a-rails-app-on-heroku-with-the-postgresql-backend
# http://www.postgresql.org/docs/9.1/static/sql-copy.html
#
def self.csv2tempTable(uploaded_name, uploaded_file)
erros = []
begin
#read csv file
file = uploaded_file
Rails.logger.info "Creating temp table...\n From: #{uploaded_name}\n "
#init connection
conn = ActiveRecord::Base.connection
rc = conn.raw_connection
# remove columns created_at/updated_at
rc.exec "drop table IF EXISTS #{TEMP_TABLE}; "
rc.exec "create table #{TEMP_TABLE} (like #{self.table_name}); "
rc.exec "alter table #{TEMP_TABLE} drop column created_at, drop column updated_at;"
#copy it!
rc.exec("COPY #{TEMP_TABLE} FROM STDIN WITH CSV HEADER")
while !file.eof?
# Add row to copy data
l = file.readline
if l.encoding.name != 'UTF-8'
Rails.logger.info "line encoding is #{l.encoding.name}..."
# ENCODING:
# If the source string is already encoded in UTF-8, then just calling .encode('UTF-8') is a no-op,
# and no checks are run. However, converting it to UTF-16 first forces all the checks for invalid byte
# sequences to be run, and replacements are done as needed.
# Reference: http://stackoverflow.com/questions/2982677/ruby-1-9-invalid-byte-sequence-in-utf-8?rq=1
l = l.encode('UTF-16', 'UTF-8').encode('UTF-8', 'UTF-16')
end
Rails.logger.info "writing line with encoding #{l.encoding.name} => #{l[0..80]}"
rc.put_copy_data( l )
end
# We are done adding copy data
rc.put_copy_end
# Display any error messages
while res = rc.get_result
e_message = res.error_message
if e_message.present?
erros << "Erro executando SQL: \n" + e_message
end
end
rescue StandardError => e
erros << "Error in csv2tempTable: \n #{e} => #{e.to_yaml}"
end
if erros.present?
Rails.logger.error erros.join("*******************************\n")
{ result: false, erros: erros }
else
{ result: true, erros: [] }
end
end
# copy from TEMP_TABLE into self.table_name
# If <truncate> = true, truncates self.table_name first
# If <truncate> = false, update lines from TEMP_TABLE into self.table_name
#
def self.updateFromTempTable(truncate)
erros = []
begin
Rails.logger.info "Refreshing table #{self.table_name}...\n Truncate: #{truncate}\n "
#init connection
conn = ActiveRecord::Base.connection
rc = conn.raw_connection
#
if truncate
rc.exec "TRUNCATE TABLE #{self.table_name}"
return false unless check_exec(rc)
rc.exec "INSERT INTO #{self.table_name} SELECT *, '#{DateTime.now}' as created_at, '#{DateTime.now}' as updated_at FROM #{TEMP_TABLE}"
return false unless check_exec(rc)
else
#remove lines from self.table_name that are present in temp
rc.exec "DELETE FROM #{self.table_name} WHERE id IN ( SELECT id FROM #{FARMACIAS_TEMP_TABLE} )"
return false unless check_exec(rc)
#copy lines from temp into self + includes timestamps
rc.exec "INSERT INTO #{self.table_name} SELECT *, '#{DateTime.now}' as created_at, '#{DateTime.now}' as updated_at FROM #{FARMACIAS_TEMP_TABLE};"
return false unless check_exec(rc)
end
rescue StandardError => e
Rails.logger.error "Error in updateFromTempTable: \n #{e} => #{e.to_yaml}"
return false
end
true
end

Resources