Ruby csv - delete row if column is empty - ruby-on-rails

I'm trying to delete rows from a CSV file with Ruby, without success. How can I specify that all rows where the "newprice" column is empty should be deleted?
require 'csv'
guests = CSV.table('new.csv', headers:true)
guests.each do |guest_row|
p guests.to_s
end
price = CSV.foreach('new.csv', headers:true) do |row|
puts row['newprice']
end
guests.delete_if('newprice' = '')
File.open('new_output.csv', 'w') do |f|
f.write(guests.to_csv)
end
Thanks!

Almost there. The table method changes the headers to symbols, and delete_if takes a block, the same way as each and open.
require 'csv'
guests = CSV.table('test.csv', headers:true)
guests.each do |guest_row|
p guest_row.to_s
end
guests.delete_if do |row|
row[:newprice].nil?
end
File.open('test1.csv', 'w') do |f|
f.write(guests.to_csv)
end
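A minimal, self-contained sketch of the accepted answer (the file name and sample rows are made up): CSV.table symbolizes the headers and parses empty cells as nil, so row[:newprice].nil? matches the blank column.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data to show the fix end to end.
input = Tempfile.new(['new', '.csv'])
input.write("name,newprice\nwidget,9.99\ngadget,\ndoohickey,4.50\n")
input.close

# CSV.table symbolizes the headers and parses empty cells as nil.
guests = CSV.table(input.path)
guests.delete_if { |row| row[:newprice].nil? }

result = guests.to_csv
puts result
```

Note that CSV.table also applies numeric converters, so "4.50" comes back out as 4.5.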

Related

How to check header exist before import data in Ruby CSV?

I want to write the header only once, in the first row, when importing data to CSV in Ruby, but the header is written many times in the output file.
job_datas.each do |job_data|
@company_job = job_data # converted etc....
save_job_to_csv(@company_job)
end
def save_job_to_csv(job_data)
filepath = "tmp/jobs/jobs.csv"
CSV.open(filepath, "a", :headers => true) do |csv|
if csv.blank?
csv << CompanyJob.attribute_names
end
csv << job_data.attributes.values
end
end
Can anyone give me a solution? Thank you so much!
You are calling the save_job_to_csv method for each job_data, pushing the header every time with csv << CompanyJob.attribute_names.
filepath = "tmp/jobs/jobs.csv"
CSV.open(filepath, "a", :headers => true) do |csv|
# push header once
csv << CompanyJob.attribute_names
# push every job record
job_datas.each do |job_data|
@company_job = job_data # converted etc....
csv << @company_job.attributes.values
end
end
The above script can be wrapped in a method, but if you would like a separate method that only saves the CSV, you need to refactor: first prepare an array of values holding the header and all the rows, then pass it to a method that just writes it to CSV.
You could do something similar to this:
def save_job_to_csv(job_data)
filepath = "tmp/jobs/jobs.csv"
unless File.file?(filepath)
File.open(filepath, 'w') do |file|
file.puts(job_data.attribute_names.join(','))
end
end
CSV.open(filepath, "a", :headers => true) do |csv|
csv << job_data.attributes.values
end
end
It just checks beforehand whether the file exists and, if not, writes the header first. If you want tabs as column separators, you only have to change the argument to join and add the col_sep parameter to CSV.open():
file.puts(job_data.attribute_names.join("\t"))
CSV.open(filepath, "a", :headers => true, col_sep: "\t") do |csv|
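A self-contained sketch combining both changes (the JobData struct stands in for the ActiveRecord model, which is an assumption; the real code would use the model's attribute_names and attributes):

```ruby
require 'csv'

# JobData is a hypothetical stand-in for the ActiveRecord model.
JobData = Struct.new(:title, :salary) do
  def attribute_names
    %w[title salary]
  end

  def attributes
    { 'title' => title, 'salary' => salary }
  end
end

def save_job_to_csv(job_data, filepath)
  # Write the header line only when the file does not exist yet.
  unless File.file?(filepath)
    File.open(filepath, 'w') { |f| f.puts(job_data.attribute_names.join("\t")) }
  end
  # Append mode, tab-separated columns.
  CSV.open(filepath, 'a', col_sep: "\t") do |csv|
    csv << job_data.attributes.values
  end
end

path = 'jobs.tsv'
File.delete(path) if File.file?(path)
save_job_to_csv(JobData.new('Developer', 100), path)
save_job_to_csv(JobData.new('Designer', 90), path)
puts File.read(path)
```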

Do a diff between csv column and ActiveRecord object

I have a simple csv (a list of emails) that I want to upload to my rails backend API which looks like this:
abd@gmail.com,cool@hotmail.com
What I want is to upload that file, check the user table for matching rows (by email address), and then return a downloadable CSV with two columns: the email, and whether or not it matched an existing user (boolean true/false).
I'd like to stream the output since the file can be very large. This is what I have so far:
controller
def import_csv
send_data FileIngestion.process_csv(
params[:file]
), filename: 'processed_emails.csv', type: 'text/csv'
end
file_ingestion.rb
require 'csv'
class FileIngestion
def self.process_csv(file)
emails = []
CSV.foreach(file.path, headers: true) do |row|
emails << row[0]
end
users = User.where("email IN (?)", emails)
end
end
Thanks!
Why not just pluck all the emails from Users and do something like this? This example keeps it simple, but you get the idea. Assuming your input file is just a string of comma-separated emails, this should work:
emails = File.read('emails.csv').split(',')
def process_csv(emails)
user_emails = User.where.not(email: [nil, '']).pluck(:email)
CSV.open('emails_processed.csv', 'w') do |row|
row << ['email', 'present']
emails.each do |email|
row << [email, user_emails.include?(email) ? 'true' : 'false']
end
end
end
process_csv(emails)
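As a runnable sketch of the same idea without a database, the Set below stands in for the result of User.where.not(email: [nil, '']).pluck(:email) (an assumption, since the real code queries ActiveRecord):

```ruby
require 'csv'
require 'set'

# user_emails stands in for the plucked ActiveRecord emails.
user_emails = Set['cool@hotmail.com']
emails = ['abd@gmail.com', 'cool@hotmail.com']

output = CSV.generate do |csv|
  csv << ['email', 'present']
  emails.each do |email|
    csv << [email, user_emails.include?(email) ? 'true' : 'false']
  end
end
puts output
```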
UPDATED to match your code design:
def import_csv
send_data FileIngestion.process_csv(params[:file]),
filename: 'processed_emails.csv', type: 'text/csv'
end
require 'csv'
class FileIngestion
def self.process_csv(file)
emails = File.read(file.path).split(',')
user_emails = User.where.not(email: [nil, '']).pluck(:email)
CSV.open('emails_processed.csv', 'w') do |row|
row << ['email', 'present']
emails.each do |email|
row << [email, user_emails.include?(email) ? 'true' : 'false']
end
end
File.read('emails_processed.csv')
end
end
Basically what you want to do is collect the incoming CSV data into batches - use each batch to query the database and write a diff to a tempfile.
You would then stream the tempfile to the client.
require 'csv'
require 'tempfile'
class FileIngestion
BATCH_SIZE = 1000
def self.process_csv(file)
csv_tempfile = CSV.new(Tempfile.new('foo'))
CSV.read(file, headers: false).lazy.drop(1).each_slice(BATCH_SIZE) do |batch|
emails = batch.flatten
users = User.where(email: emails).pluck(:email)
emails.each do |e|
csv_tempfile << [e, users.include?(e)]
end
end
csv_tempfile
end
end
CSV.read(file, headers: false).lazy.drop(1).each_slice(BATCH_SIZE) uses a lazy enumerator to access the CSV file in batches. .drop(1) gets rid of the header row.
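The lazy batching can be seen in isolation with in-memory data (a toy illustration with a batch size of 2 instead of 1000):

```ruby
require 'csv'

# Three data rows after the header, sliced into batches of 2.
data = "email\na@x.com\nb@x.com\nc@x.com\n"
batches = []
CSV.parse(data, headers: false).lazy.drop(1).each_slice(2) do |batch|
  batches << batch.flatten
end
p batches
```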
OK, so this is what I came up with: a solution that simply prevents users from uploading a file with more than 10,000 data points. It might not be the best solution (I prefer @Max's), but in any case I wanted to share what I did:
def emails_exist
raise 'Missing file parameter' if !params[:file]
csv_path = params[:file].tempfile.path
send_data csv_of_emails_matching_users(csv_path), filename: 'emails.csv', type: 'text/csv'
end
private
def csv_of_emails_matching_users(input_csv_path)
total = 0
CSV.generate(headers: true) do |result|
result << %w{email exists}
emails = []
CSV.foreach(input_csv_path) do |row|
total += 1
if total > 10001
raise 'User Validation limited to 10000 emails'
end
emails.push(row[0])
if emails.count > 99
append_to_csv_info_for_emails(result, emails)
end
end
if emails.count > 0
append_to_csv_info_for_emails(result, emails)
end
end
end
def append_to_csv_info_for_emails(csv, emails)
user_emails = User.where(email: emails).pluck(:email).to_set
emails.each do |email|
csv << [email, user_emails.include?(email)]
end
emails.clear
end

Rails 4 - Import CSV is not working

In Rails 4.2.4, I am trying to extract data from a .csv file and save it to the database. But right now the extracted row is in the wrong format, so the values are not getting saved.
require 'csv'
filename = "#{asset.document.path}"
if File.exist?(filename)
file = File.open(filename)
if file
CSV::parse(file)[1..-1].each do |row|
User.create_data(row, admin)
end
end
end
def create_data(row, admin)
usr = User.new
usr.name = row[0] if row[0]
usr.email = row[1] if row[1]
usr.password = row[2] if row[2]
usr.save
end
The generated row's data looks like ["Sannidhi\tsannidhi@gmail.com\tsannidhi123#\t"]. From this row I cannot get each value separately (e.g. row[0], row[1] and row[2]) to assign to the related database fields.
How can I solve this CSV import issue? Please help me.
Try this:
CSV::parse(file)[1..-1].each do |row|
row = row.first.split("\t") unless row.blank?
User.create_data(row, admin)
end
After this, you should be able to access:
row[0] #=> "Sannidhi"
row[1]
row[2]
Your CSV file uses tabs as column separators. You can pass your own column separator to CSV via the col_sep option. Even though the other two answers will do the job, let CSV do its job on its own:
CSV::parse(file, col_sep: "\t")[1..-1].each do |row|
User.create_data(row, admin)
end
Also, consider using headers option to use the first line of the file as a header instead of [1..-1]:
CSV::parse(file, col_sep: "\t", headers: true).each do |row|
User.create_data(row, admin)
end
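A quick check of the col_sep fix on the exact line from the question, using CSV.parse_line:

```ruby
require 'csv'

# The tab-separated line from the question, parsed with col_sep.
line = "Sannidhi\tsannidhi@gmail.com\tsannidhi123#\t"
row = CSV.parse_line(line, col_sep: "\t")
p row[0] # "Sannidhi"
p row[1] # "sannidhi@gmail.com"
p row[2] # "sannidhi123#"
```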
CSV stands for Comma-Separated Values. It seems that your file is separated by tabs instead. Replace the tabs in your file with commas.

web scraping/export to CSV with Ruby

ruby n00b here in hope of some guidance. I am looking to scrape a website (600-odd names and links on one page) and output to CSV. The scraping itself works fine (the output correctly fills the terminal as the script runs), but I can't get the CSV to populate. The code:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'csv'
url = "http://www.example.com/page/"
page = Nokogiri::HTML(open(url))
page.css('.item').each do |item|
name = item.at_css('a').text
link = item.at_css('a')[:href]
foo = puts "#{name}"
bar = "#{link}"
CSV.open("file.csv", "wb") do |csv|
csv << [foo, bar]
end
end
puts "upload complete!"
...replacing the csv << [foo, bar] with csv << [name, link] just puts the final iteration into the CSV. I feel there's something basic I am missing here. Thanks for reading.
The problem is that you're calling CSV.open for every single item, so the file is overwritten each time, and at the end you're left with only the last item in the CSV file.
Move the CSV.open call before page.css('.item').each and it should work.
CSV.open("file.csv", "wb") do |csv|
page.css('.item').each do |item|
name = item.at_css('a').text
link = item.at_css('a')[:href]
csv << [name, link]
end
end
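The overwrite behaviour described above can be seen in isolation: mode "wb" truncates the file on every open, so only the last row survives.

```ruby
require 'csv'

# Opening in "wb" mode inside the loop truncates the file each time.
3.times do |i|
  CSV.open('demo.csv', 'wb') { |csv| csv << [i] }
end
result = File.read('demo.csv')
puts result # only "2\n" remains
```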

Ignore first line on csv parse Rails

I am using the code from this tutorial to parse a CSV file and add the contents to a database table. How would I ignore the first line of the CSV file? The controller code is below:
def csv_import
@parsed_file = CSV::Reader.parse(params[:dump][:file])
n = 0
@parsed_file.each do |row|
s = Student.new
s.name = row[0]
s.cid = row[1]
s.year_id = find_year_id_from_year_title(row[2])
if s.save
n = n+1
GC.start if n%50==0
end
flash.now[:message] = "CSV Import Successful, #{n} new students added to the database."
end
redirect_to(students_url)
end
This question kept popping up when I was searching for how to skip the first line with the CSV / FasterCSV libraries, so here's the solution in case you end up here.
the solution is...
CSV.foreach("path/to/file.csv",{:headers=>:first_row}) do |row|
HTH.
@parsed_file.each_with_index do |row, i|
next if i == 0
....
If you identify your first line as headers then you get back a Row object instead of a simple Array.
When you grab cell values, it seems like you need to use .fetch("Row Title") on the Row object.
This is what I came up with. I'm skipping nil with my if conditional.
CSV.foreach("GitHubUsersToAdd.csv",{:headers=>:first_row}) do |row|
username = row.fetch("GitHub Username")
if username
puts username.inspect
end
end
Using this simple code, you can read a CSV file and ignore the first line which is the header or field names:
CSV.foreach(File.join(File.dirname(__FILE__), filepath), headers: true) do |row|
puts row.inspect
end
You can do whatever you want with row. Don't forget headers: true.
require 'csv'
csv_content =<<EOF
lesson_id,user_id
5,3
69,95
EOF
parse_1 = CSV.parse csv_content
parse_1.size # => 3 # it treats all lines as equal data
parse_2 = CSV.parse csv_content, headers:true
parse_2.size # => 2 # it ignores the first line as it's header
parse_1
# => [["lesson_id", "user_id"], ["5", "3"], ["69", "95"]]
parse_2
# => #<CSV::Table mode:col_or_row row_count:3>
Here's the fun part:
parse_1.each do |line|
puts line.inspect # the object is array
end
# ["lesson_id", "user_id"]
# ["5", "3"]
# ["69", "95"]
parse_2.each do |line|
puts line.inspect # each object is a CSV::Row
end
# #<CSV::Row "lesson_id":"5" "user_id":"3">
# #<CSV::Row "lesson_id":"69" "user_id":"95">
So I can do:
parse_2.each do |line|
puts "I'm processing Lesson #{line['lesson_id']} the User #{line['user_id']}"
end
# I'm processing Lesson 5 the User 3
# I'm processing Lesson 69 the User 95
data_rows_only = csv.drop(1)
will do it
csv.drop(1).each do |row|
# ...
end
will loop it
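A quick check of the drop(1) approach on an in-memory CSV:

```ruby
require 'csv'

# drop(1) discards the header row of the parsed array-of-arrays.
csv = CSV.parse("name,cid\nAda,1\nBob,2\n")
data_rows_only = csv.drop(1)
p data_rows_only
```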
