I have a simple CSV (a list of emails) that I want to upload to my Rails backend API. It looks like this:
abd@gmail.com,cool@hotmail.com
What I want is to upload that file, check the users table for matching rows (by email address), and then return a new downloadable CSV with two columns: the email and whether or not it matched an existing user (boolean true/false).
I'd like to stream the output since the file can be very large. This is what I have so far:
controller
def import_csv
send_data FileIngestion.process_csv(
params[:file]
), filename: 'processed_emails.csv', type: 'text/csv'
end
file_ingestion.rb
require 'csv'
class FileIngestion
def self.process_csv(file)
emails = []
CSV.foreach(file.path, headers: true) do |row|
emails << row[0]
end
users = User.where("email IN (?)", emails)
end
end
Thanks!
Why not just pluck all the emails from the users table and do something like this? This example keeps things simple, but you get the idea. Assuming your input file is just a single line of comma-separated emails, this should work:
emails = File.read('emails.csv').split(',')
def process_csv(emails)
user_emails = User.where.not(email: [nil, '']).pluck(:email)
CSV.open('emails_processed.csv', 'w') do |row|
row << ['email', 'present']
emails.each do |email|
row << [email, user_emails.include?(email) ? 'true' : 'false']
end
end
end
process_csv(emails)
UPDATED to match your code design:
def import_csv
send_data FileIngestion.process_csv(params[:file]),
filename: 'processed_emails.csv', type: 'text/csv'
end
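require 'csv'
class FileIngestion
  def self.process_csv(file)
    # Read the uploaded file instead of a hard-coded path
    emails = File.read(file.path).strip.split(',')
    # Emails of all existing users (this was undefined in the first version)
    user_emails = User.where.not(email: [nil, '']).pluck(:email)
    CSV.open('emails_processed.csv', 'w') do |row|
      row << ['email', 'present']
      emails.each do |email|
        row << [email, user_emails.include?(email) ? 'true' : 'false']
      end
    end
    # Return the file contents as a string for send_data
    File.read('emails_processed.csv')
  end
end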
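Stripped of the Rails parts, the idea in this answer is to turn one comma-separated line into a two-column CSV. Here is a minimal sketch of that step on its own. `known_emails` is a hypothetical stand-in for the `pluck(:email)` result, and `match_report` is a name made up for illustration; a Set is used so each lookup does not scan the whole list.

```ruby
require 'csv'
require 'set'

# Hypothetical stand-in for User.pluck(:email).
known_emails = Set['abd@gmail.com', 'someone@example.com']

# Builds the two-column report as a CSV string.
def match_report(raw_line, known_emails)
  emails = raw_line.strip.split(',').map(&:strip)
  CSV.generate do |csv|
    csv << %w[email present]
    emails.each { |email| csv << [email, known_emails.include?(email)] }
  end
end

report = match_report("abd@gmail.com,cool@hotmail.com\n", known_emails)
puts report
```

The returned string could be passed to send_data as in the controller above, though it is still built fully in memory.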
Basically what you want to do is collect the incoming CSV data into batches - use each batch to query the database and write a diff to a tempfile.
You would then stream the tempfile to the client.
require 'csv'
require 'tempfile'
class FileIngestion
  BATCH_SIZE = 1000
  def self.process_csv(file)
    csv_tempfile = CSV.new(Tempfile.new('foo'))
    # foreach without a block returns an enumerator, so rows are read on demand
    CSV.foreach(file.path, headers: false).lazy.drop(1).each_slice(BATCH_SIZE) do |batch|
      emails = batch.flatten
      # One query per batch instead of one per email
      users = User.where(email: emails).pluck(:email)
      emails.each do |e|
        csv_tempfile << [e, users.include?(e)]
      end
    end
    csv_tempfile
  end
end
CSV.foreach(file.path, headers: false) called without a block returns an enumerator, so .lazy.drop(1).each_slice(BATCH_SIZE) reads the CSV file in batches instead of loading it all at once (which CSV.read would do). .drop(1) gets rid of the header row.
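To see the batching in isolation, here is a small sketch that runs without a database. `registered` and `existing_emails` are hypothetical stand-ins for the `User.where(email: ...).pluck(:email)` lookup, and the batch size is tiny on purpose so that several batches occur.

```ruby
require 'csv'
require 'set'
require 'stringio'

batch_size = 2

# Hypothetical lookup standing in for User.where(email: batch).pluck(:email).
registered = Set['a@example.com', 'c@example.com']
existing_emails = ->(batch) { batch.select { |email| registered.include?(email) }.to_set }

input = StringIO.new("email\na@example.com\nb@example.com\nc@example.com\n")
queries = 0

output = CSV.generate do |out|
  out << %w[email exists]
  # CSV#each without a block returns an enumerator, so rows are pulled
  # one at a time; drop(1) skips the header row.
  CSV.new(input).each.lazy.drop(1).map(&:first).each_slice(batch_size) do |batch|
    queries += 1 # one simulated query per batch
    found = existing_emails.call(batch)
    batch.each { |email| out << [email, found.include?(email)] }
  end
end

puts output
```

With three emails and a batch size of two, the lookup runs twice rather than three times; the same ratio is what saves queries at a batch size of 1000.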
OK, so this is what I came up with: a solution that simply prevents users from uploading a file with more than 10,000 data points. It might not be the best solution (I prefer @Max's), but I wanted to share what I did anyway:
def emails_exist
raise 'Missing file parameter' if !params[:file]
csv_path = params[:file].tempfile.path
send_data csv_of_emails_matching_users(csv_path), filename: 'emails.csv', type: 'text/csv'
end
private
def csv_of_emails_matching_users(input_csv_path)
total = 0
CSV.generate(headers: true) do |result|
result << %w{email exists}
emails = []
CSV.foreach(input_csv_path) do |row|
total += 1
if total > 10001
raise 'User Validation limited to 10000 emails'
end
emails.push(row[0])
if emails.count > 99
append_to_csv_info_for_emails(result, emails)
end
end
if emails.count > 0
append_to_csv_info_for_emails(result, emails)
end
end
end
def append_to_csv_info_for_emails(csv, emails)
user_emails = User.where(email: emails).pluck(:email).to_set
emails.each do |email|
csv << [email, user_emails.include?(email)]
end
emails.clear
end
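Since the original question asked about streaming, one alternative to capping the upload size is to produce the CSV line by line. The sketch below only shows the line producer; `rows` is hypothetical sample data. A Rack response body only needs to respond to each, so an enumerator like this could in principle serve as one, but wiring it into the controller is an assumption and not shown here.

```ruby
require 'csv'

# Hypothetical row source; in the app these would come from the batched lookups.
rows = [['a@example.com', true], ['b@example.com', false]]

# Each step yields one finished CSV line, so a consumer can send lines
# as they are produced instead of waiting for one big string.
csv_lines = Enumerator.new do |yielder|
  yielder << CSV.generate_line(%w[email exists])
  rows.each { |row| yielder << CSV.generate_line(row) }
end

first_line = csv_lines.next # pulls only the header line
all_lines = csv_lines.to_a  # iterates the whole sequence from the start
```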
I have a question about serving a generated CSV file to the user (with a large amount of data). Here is the task I have to do.
App: I have a film that has many characters.
Task:
allow users to upload characters via CSV (ok, done)
if there are errors, show them for each row (ok, done)
in the results page, also show a link to a new CSV file only with the remaining characters - the ones that couldn’t be created (I’m stuck here)
Here is part of my code (upload method):
def upload
saved_characters = []
characters_with_errors = []
errors = {}
begin
CSV.parse(params[:csv].read, **csv_options) do |row|
row_hash = clear_input(row.to_h)
new_character = Character.new(row_hash)
if new_character.save
add_images_to(new_character, row)
saved_characters << new_character
else
characters_with_errors << new_character
errors[new_character.name] = new_character.errors.full_messages.join(', ')
end
end
rescue CSV::MalformedCSVError => e
errors = { 'General error': e.message }.merge(errors)
end
@upload = {
errors: errors,
characters: saved_characters,
characters_with_errors: characters_with_errors
}
end
The issue: large amount of data
In upload.html.erb almost everything works fine: it shows the results and the errors per column. BUT I'm not sure how to create a link on this page that sends the user to the new CSV file (containing only the characters with errors). If the link points to another method / GET endpoint (for the view in CSV format), how can I pass such a large amount of data (params won't work because they would get too long)? What would be the best practice here?
You can use a session variable to store the data, and then redirect to a new action to download the file. In the new action, you can get the data from the session variable, and then generate the CSV file.
For example, in the upload action, you can do something like this:
session[:characters_with_errors] = characters_with_errors
redirect_to download_csv_path
In the download_csv action, you can do something like this:
characters_with_errors = session[:characters_with_errors]
session[:characters_with_errors] = nil
respond_to do |format|
format.csv { send_data generate_csv(characters_with_errors) }
end
In the generate_csv method, you can do something like this:
def generate_csv(characters_with_errors)
CSV.generate do |csv|
csv << ['name', 'age' ]
characters_with_errors.each do |character|
csv << [character.name, character.age]
end
end
end
Another option, you can use a temporary file to store the data and then send the user to the new CSV file. Here is an example:
def upload
saved_characters = []
characters_with_errors = []
errors = {}
begin
CSV.parse(params[:csv].read, **csv_options) do |row|
row_hash = clear_input(row.to_h)
new_character = Character.new(row_hash)
if new_character.save
add_images_to(new_character, row)
saved_characters << new_character
else
characters_with_errors << new_character
errors[new_character.name] = new_character.errors.full_messages.join(', ')
end
end
rescue CSV::MalformedCSVError => e
errors = { 'General error': e.message }.merge(errors)
end
@upload = {
errors: errors,
characters: saved_characters,
characters_with_errors: characters_with_errors
}
respond_to do |format|
format.html
format.csv do
# Create a temporary file
tmp = Tempfile.new('characters_with_errors')
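# Write the CSV data to the temporary file, using the generate_csv helper
# from the first option (Array#to_csv would not serialize the records' fields)
tmp.write(generate_csv(characters_with_errors))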
# Send the user to the new CSV file
send_file tmp.path, filename: 'characters_with_errors.csv'
# Close the temporary file
tmp.close
end
end
end
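The tempfile step can be tried outside Rails. This is a minimal sketch, assuming a hypothetical `failed_character` struct in place of the Character model and made-up sample rows; the file is read back here only to show what would be sent.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical stand-in for Character records that failed validation.
failed_character = Struct.new(:name, :age)
failed = [failed_character.new('Rick', nil), failed_character.new('Morty', 140)]

tmp = Tempfile.new(['characters_with_errors', '.csv'])
begin
  # CSV.open writes and closes the file, so the data is on disk afterwards.
  CSV.open(tmp.path, 'w') do |csv|
    csv << %w[name age]
    failed.each { |character| csv << [character.name, character.age] }
  end
  contents = File.read(tmp.path)
ensure
  # Remove the temporary file once it is no longer needed.
  tmp.close
  tmp.unlink
end

puts contents
```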
Is there a way to export all the Active Record data in a Rails application to CSV, with each relation as its own CSV sheet, or is there another way to export the full DB data? Any suggestions?
Try something like this to get the data from your tables. Afterwards you can split it by sheet:
# List every table, skipping Rails' internal bookkeeping tables
tables = ActiveRecord::Base.connection.tables - %w[schema_migrations ar_internal_metadata]
tables.each do |table_name|
  # "locations" -> Location; assumes each table has a conventionally named model
  model = table_name.classify.constantize
  model.all.each do |record|
    puts record.inspect
  end
end
def backup
models = ActiveRecord::Base.connection.tables
all_data = Hash.new
models.map do |model_name|
table_data = []
model_name = model_name.split("")
model_name.pop
model_name = model_name.join("")
model_name.camelize.constantize.all.map do |data|
table_data.push(data)
end
all_data[model_name.camelize] = table_data
end
send_data export_csv(all_data), filename: "Backup - #{Date.today}.csv" and return
end
def export_csv(data)
csvfile = CSV.generate(headers: true) do |csv|
data.each do |key, value|
csv << [key]
attributes = key.camelize.constantize.column_names
csv << attributes
value.each do |val|
csv << val.attributes.values_at(*attributes)
end
csv << ['eot']
end
end
return csvfile
end
I found the solution and a way to export all tables inside a single csv file.
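The layout that export_csv produces is a title row, a header row, the data rows, and an 'eot' marker per table. Here is a sketch of that layout and of reading it back, using hypothetical in-memory data in place of the models. One caveat that follows from the format: a data row consisting only of the text eot would be mistaken for a marker.

```ruby
require 'csv'

# Hypothetical data standing in for what the models would return.
tables = {
  'Location' => { columns: %w[id city], rows: [[1, 'Oslo'], [2, 'Lima']] },
  'Film' => { columns: %w[id title], rows: [[1, 'Alien']] }
}

combined = CSV.generate do |csv|
  tables.each do |name, table|
    csv << [name]           # section title
    csv << table[:columns]  # header row for this section
    table[:rows].each { |row| csv << row }
    csv << ['eot']          # end-of-table marker
  end
end

# Reading it back: cut the flat row list into sections after each marker.
sections = {}
CSV.parse(combined).slice_after { |row| row == ['eot'] }.each do |title, header, *body|
  # body still ends with the marker row, so drop it before pairing up values
  sections[title.first] = body[0...-1].map { |row| header.zip(row).to_h }
end
```

Note that every value comes back as a string, since CSV carries no type information.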
I have 200k locations in my database. So I want to export all the locations into CSV format. While doing this it is taking too much time to download. What is the best way to optimize code in rails?
In controller:
def index
all_locations = Location.all
respond_to do |format|
format.csv { send_data all_locations.to_csv, filename: "locations-#{Date.today}.csv" }
end
end
In model
def self.to_csv
attributes = %w{id city address}
CSV.generate(headers: true) do |csv|
csv << ['Id', 'City', 'Address']
all.each do |location|
csv << attributes.map{ |attr| location.send(attr) }
end
end
end
I ran your code against my own data with a few adjustments. With the following changes, benchmarking showed roughly a 7x speedup.
Your model:
def self.to_csv
attributes = %w{id city address}
CSV.generate(headers: true) do |csv|
csv << ['Id', 'City', 'Address']
all.pluck(*attributes).each { |data| csv << data }
end
end
With pluck you fetch only the columns you need as plain arrays, without instantiating a model object per row, and each array goes straight into the CSV.
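The difference between the two row-building styles can be shown without a database. In this sketch, `location` is a hypothetical struct standing in for the model, and `records.map(&:to_a)` only simulates the arrays that pluck would return. It demonstrates that both styles give the same CSV; it does not reproduce the speedup, which comes from skipping model instantiation.

```ruby
require 'csv'

# Hypothetical stand-in for Location records.
location = Struct.new(:id, :city, :address)
records = [location.new(1, 'Oslo', 'Main St 1'), location.new(2, 'Lima', 'Av. Sol 9')]
attributes = %w[id city address]

# Original style: one method call per attribute per object.
via_send = CSV.generate do |csv|
  csv << %w[Id City Address]
  records.each { |record| csv << attributes.map { |attr| record.public_send(attr) } }
end

# pluck style: rows already arrive as plain arrays (simulated here).
plucked = records.map(&:to_a)
via_rows = CSV.generate do |csv|
  csv << %w[Id City Address]
  plucked.each { |row| csv << row }
end
```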
If you are using PostgreSQL, you can use this in application_record.rb:
def self.to_csv_copy(attrs="*", header=[])
rc = connection.raw_connection
rv = header.empty? ? [] : ["#{header.join(',')}\n"]
sql = self.all.select(attrs).to_sql
rc.copy_data("copy (#{sql}) to stdout with csv") do
# rubocop:disable AssignmentInCondition
while line = rc.get_copy_data
rv << line
end
end
rv.join
end
and then do
Location.to_csv_copy(%w{id city address}, ['Id', 'City', 'Address'])
It is even faster than the above solution.
Trying to delete rows from the csv file here with Ruby without success.
How can I tell that all rows, where column "newprice" is empty, should be deleted?
require 'csv'
guests = CSV.table('new.csv', headers:true)
guests.each do |guest_row|
p guests.to_s
end
price = CSV.foreach('new.csv', headers:true) do |row|
puts row['newprice']
end
guests.delete_if('newprice' = '')
File.open('new_output.csv', 'w') do |f|
f.write(guests.to_csv)
end
Thanks!
Almost there. The table method changes the headers to symbols, and delete_if takes a block, the same way as each and open.
require 'csv'
guests = CSV.table('test.csv', headers:true)
guests.each do |guest_row|
p guest_row.to_s
end
guests.delete_if do |row|
row[:newprice].nil?
end
File.open('test1.csv', 'w') do |f|
f.write(guests.to_csv)
end
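Both points can be checked on an inline string instead of a file. As a sketch, CSV.parse is called here with the symbol header converter and numeric converter, which is an approximation of what CSV.table applies; the sample data is made up.

```ruby
require 'csv'

data = <<~CSV
  name,newprice
  apple,1.5
  pear,
  plum,2
CSV

# Headers become symbols; an empty unquoted field is parsed as nil.
table = CSV.parse(data, headers: true, header_converters: :symbol, converters: :numeric)

# delete_if takes a block and removes every row for which it returns true.
table.delete_if { |row| row[:newprice].nil? }

output = table.to_csv
puts output
```

The pear row is dropped because its newprice field is empty.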
I am using the Active Admin Export CSV option. It returns all the values for the given table.
I want the report for a particular month only.
Can anyone help?
You can write your own CSV exporter:
collection_action :download_report, :method => :get do
users = User.where('created_at >= ?', Date.today - 1.month)
csv = CSV.generate( encoding: 'Windows-1251' ) do |csv|
# add headers
csv << ['Some header']
# add data
users.each do |user|
csv << [ user.created_at ]
end
end
# send file to user
send_data csv.encode('Windows-1251'), type: 'text/csv; charset=windows-1251; header=present', disposition: "attachment; filename=report.csv"
end
action_item only: :index do
link_to('csv report', params.merge(:action => :download_report))
end
index :download_links => false do
# off standard download link
end
This is just an example; your code may differ.
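Note that Date.today - 1.month selects roughly the last 30 days, not a calendar month. If the report should cover one specific month, a half-open date range is one way to express it. The sketch below uses a hypothetical `user` struct and made-up sample dates in place of the User model; in Active Record, passing a range to where(created_at: ...) should express the same filter, but that part is an assumption and not shown.

```ruby
require 'csv'
require 'date'

# Hypothetical stand-in for User records.
user = Struct.new(:email, :created_at)
users = [
  user.new('a@example.com', Date.new(2024, 2, 28)),
  user.new('b@example.com', Date.new(2024, 3, 1)),
  user.new('c@example.com', Date.new(2024, 3, 31)),
  user.new('d@example.com', Date.new(2024, 4, 1))
]

# Half-open range: includes March 1st, excludes April 1st.
month_start = Date.new(2024, 3, 1)
month = month_start...month_start.next_month

report = CSV.generate do |csv|
  csv << %w[email created_at]
  users.select { |u| month.cover?(u.created_at) }
       .each { |u| csv << [u.email, u.created_at] }
end

puts report
```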
To generate a CSV file, use this code wherever you need it:
# generate csv file of photo
def self.generate_csv
header = []
csv_fname = "#{CSV_FILE_PATH}/images.csv"
options = {headers: :first_row}
photo_columns = column_names - ["id", "updated_at"]
photo_columns.map{|col| col == "created_at" ? header << "ScrapeDate" : header << col.classify}
CSV.open(csv_fname, "w", options ) do |csv|
csv << header if File.exist?(csv_fname) && File.size(csv_fname) == 0
find_each(batch_size: 5000) do |photo|
csv << photo.attributes.values_at(*photo_columns)
end
end
end
In the code above, subtract whichever columns you don't want from the full column list. For example, in column_names - ["id", "updated_at"], column_names returns the array of all columns, and the columns we don't need are subtracted from it.
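The column subtraction and header mapping can be tried in plain Ruby. In this sketch `column_names` and `attributes` are hypothetical sample values, and String#capitalize is used in place of ActiveSupport's classify so that it runs without Rails.

```ruby
require 'csv'

# Hypothetical column list and record attributes.
column_names = %w[id url created_at updated_at]
attributes = {
  'id' => 7,
  'url' => 'http://example.com/a.jpg',
  'created_at' => '2024-03-01',
  'updated_at' => '2024-03-02'
}

# Array subtraction removes the unwanted columns and keeps the order.
photo_columns = column_names - %w[id updated_at]

# Rename created_at in the header, capitalize the rest.
header = photo_columns.map { |col| col == 'created_at' ? 'ScrapeDate' : col.capitalize }

# values_at picks the values in the same order as the remaining columns.
line = CSV.generate_line(attributes.values_at(*photo_columns))
```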