Is there any optimized way to export CSV of more than 100K record using Rails? - ruby-on-rails

I have 200k locations in my database and want to export all of them to CSV. The download currently takes far too long. What is the best way to optimize this code in Rails?
In the controller:
def index
  all_locations = Location.all
  respond_to do |format|
    format.csv { send_data all_locations.to_csv, filename: "locations-#{Date.today}.csv" }
  end
end
In the model:
def self.to_csv
  attributes = %w{id city address}
  CSV.generate(headers: true) do |csv|
    csv << ['Id', 'City', 'Address']
    all.each do |location|
      csv << attributes.map{ |attr| location.send(attr) }
    end
  end
end

I ran your code against my own data with some adjustments. With the following change, benchmarking showed roughly a 7x speedup.
Your model:
def self.to_csv
  attributes = %w{id city address}
  CSV.generate(headers: true) do |csv|
    csv << ['Id', 'City', 'Address']
    # pluck returns plain arrays of column values, one array per row
    all.pluck(*attributes).each { |data| csv << data }
  end
end
By using pluck you fetch only the columns you need as plain arrays, instead of instantiating a full ActiveRecord object for every row, and each array is then appended to the CSV as one row.
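If memory use or time to first byte is the concern rather than query time, the rows can also be turned into CSV lines one at a time instead of building one large string. A minimal sketch in plain Ruby, assuming `rows` stands in for the result of something like `Location.pluck(:id, :city, :address)`; the helper name `csv_lines` is hypothetical and is not part of Rails or the csv library:

```ruby
require 'csv'

# Hypothetical helper: yields one CSV-formatted line per row, lazily.
# `rows` can be any enumerable of arrays (assumed here to come from pluck).
def csv_lines(header, rows)
  Enumerator.new do |out|
    out << CSV.generate_line(header)
    rows.each { |row| out << CSV.generate_line(row) }
  end
end

rows = [[1, 'Paris', '1 Rue de Rivoli'], [2, 'Oslo', 'Karl Johans gate 5, 2nd floor']]
csv_lines(%w[Id City Address], rows).each { |line| print line }
```

Because each line is produced on demand, an enumerator like this could in principle be handed to the response as its body to stream the download, though whether that actually streams depends on the server and middleware in use.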

If you are using PostgreSQL, you can add this to application_record.rb:
def self.to_csv_copy(attrs="*", header=[])
  rc = connection.raw_connection
  rv = header.empty? ? [] : ["#{header.join(',')}\n"]
  sql = self.all.select(attrs).to_sql
  rc.copy_data("copy (#{sql}) to stdout with csv") do
    # rubocop:disable AssignmentInCondition
    while line = rc.get_copy_data
      rv << line
    end
  end
  rv.join
end
and then do
Location.to_csv_copy(%w{id city address}, ['Id', 'City', 'Address'])
It is even faster than the above solution.

Related

How can I dry up code to export different CSVs from my action controller and application record in rails 5?

I have an app whose sole purpose is to seed data files and add the data to different CSVs which are zipped and exported by the user. My application controller is filled with lines that all look like this:
def export_tips
  @appointments = Appointment.order('service_id')
  send_data @appointments.to_csv_tips, filename: 'tips.csv'
end
def export_ticketpayments
  @appointments = Appointment.order('service_id')
  send_data @appointments.to_csv_ticketpayments, filename: 'ticketspaymentitems.csv'
end
def export_batchmanifest
  @batchmanifests = Batchmanifest.all
  send_data @batchmanifests.to_csv_batchmanifest, filename: "batch_manifest-#{Date.today}.csv"
end
def export_pets
  @clients = Client.all
  send_data @clients.to_csv_pets, filename: 'pets.csv'
end
def export_clients
  @clients = Client.all
  send_data @clients.to_csv_clients, filename: 'clients.csv'
end
I have it in the application controller because I used it in multiple different areas including creating single CSV exports and creating complex zip files with multiple zips and CSVs inside.
Some things that I have tried to clean up the code include:
Different variations of this:
def csv_export (model, filename)
  @model.pluralize = (model.titleize).all
  send_data @model.pluralize.filename, filename: filename
end
Having each one in its own controller (could not access them from different views and other controllers easily)
I also tried to figure out how to create my own module, but was unable to do so.
My application record is just as bad with repeated lines simply meant to export the CSVs:
def self.to_csv_appointments
  attributes = %w[appointment_id location_id employee_id client_id child_id notes
                  has_specific_employee start_time end_time]
  CSV.generate(headers: true) do |csv|
    csv << attributes
    all.each do |appointment|
      csv << attributes.map { |attr| appointment.send(attr) }
    end
  end
end
def self.to_csv_appointmentservices
  attributes = %w[appointment_id service_id price duration]
  CSV.generate(headers: true) do |csv|
    csv << attributes
    all.each do |appointment|
      csv << attributes.map { |attr| appointment.send(attr) }
    end
  end
end
def self.to_csv_tickets
  attributes = %w[ticket_id location_id client_id ticket_status employee_id
                  employee_id start_time]
  headers = %w[ticket_id location_id client_id status employee_id
               closed_by_employee_id closed_at]
  CSV.generate(headers: true) do |csv|
    csv << headers
    all.each do |appointment|
      csv << attributes.map { |attr| appointment.send(attr) }
    end
  end
end
For the application record, I have tried similar methods as those listed for the application controller, but to no avail. Again, I use the code in application record instead of in the individual model files because I need to access these in multiple parts of the site.
The code from the application controller is used mostly in the static controller and buttons on the view files. I need the ability to create the file sets, as listed below, but also allow the user to export just one CSV.
Examples from the static controller that build the zip files:
def create_appointments_zip
  file_stream = Zip::OutputStream.write_buffer do |zip|
    @appointments = Appointment.order('service_id')
    zip.put_next_entry "appointment_manifest.csv"; zip << File.binread("#{Rails.root}/app/assets/csvs/appointment_manifest.csv")
    zip.put_next_entry "appointments.csv"; zip << @appointments.to_csv_appointments
    zip.put_next_entry "appointment_services.csv"; zip << @appointments.to_csv_appointmentservices
    zip.put_next_entry "appointment_statuses.csv"; zip << @appointments.to_csv_appointmentstatuses
  end
  file_stream.rewind
  File.open("#{Rails.root}/app/assets/csvs/appointments.zip", 'wb') do |file|
    file.write(file_stream.read)
  end
end
def export_salonset
  create_appointments_zip
  create_tickets_zip
  create_inventory_zip
  create_memberships_zip
  file_stream = Zip::OutputStream.write_buffer do |zip|
    @saloncategories = Saloncategory.all
    @salonservices = Salonservice.all
    @clients = Client.all
    @locations = Location.all
    @salonpricings = Salonpricing.all
    @staffs = Staff.order("location_id")
    zip.put_next_entry "batch_manifest.csv"; zip << File.binread("#{Rails.root}/app/assets/csvs/batch_manifest_simple_salon.csv")
    zip.put_next_entry "categories.csv"; zip << @saloncategories.to_csv_saloncategories
    zip.put_next_entry "clients.csv"; zip << @clients.to_csv_clients
    zip.put_next_entry "employees.csv"; zip << @staffs.to_csv_staff
    zip.put_next_entry "locations.csv"; zip << @locations.to_csv_locations
    zip.put_next_entry "pricings.csv"; zip << @salonpricings.to_csv_pricings
    zip.put_next_entry "services.csv"; zip << @salonservices.to_csv_salonservices
    zip.put_next_entry "appointments.zip"; zip << File.binread("#{Rails.root}/app/assets/csvs/appointments.zip")
    zip.put_next_entry "inventories.zip"; zip << File.binread("#{Rails.root}/app/assets/csvs/inventories.zip")
    zip.put_next_entry "tickets.zip"; zip << File.binread("#{Rails.root}/app/assets/csvs/tickets.zip")
    zip.put_next_entry "addonmappings.csv"; zip << File.binread("#{Rails.root}/app/assets/csvs/addonmappings.csv")
  end
  file_stream.rewind
  respond_to do |format|
    format.zip do
      send_data file_stream.read, filename: "salon_set.zip"
    end
  end
  file_stream.rewind
  File.open("#{Rails.root}/app/assets/csvs/salon_set.zip", 'wb') do |file|
    file.write(file_stream.read)
  end
end
Link to my repository, if that is helpful
https://github.com/atayl16/data-wizard/blob/master/app/controllers/application_controller.rb
https://github.com/atayl16/data-wizard/blob/master/app/models/application_record.rb
I know there must be a better way than writing these same lines over and over. The code works, my site works (amazingly), but I would be embarrassed for any seasoned developer to see the repository without laughing. Any help is appreciated!
In the end, I used metaprogramming to clean this up. Here is an example in which I excluded some items from the array for brevity:
["bundle", "attendee", "location", "membership", "client", "staff"].each do |new_method|
  define_method("#{new_method.pluralize}") do
    instance_variable_set("@#{new_method.pluralize}", new_method.camelcase.constantize.all)
    instance_var = instance_variable_get("@#{new_method.pluralize}")
    send_data instance_var.public_send("to_csv_#{new_method.pluralize}"), filename: "#{new_method.pluralize}.csv"
  end
end
I was able to remove 30 methods from my newly created export controller. Here is the code after pushing up the changes https://github.com/atayl16/data-wizard/blob/0011b6cf8c1fe967d73a569fa573cedc52cb8c72/app/controllers/export_controller.rb
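The repeated to_csv_* methods in the application record follow one pattern (a list of attributes, an optional list of differing headers, one row per record), so the same clean-up could plausibly be applied on the model side with a single parameterized method. A minimal sketch in plain Ruby, assuming records respond to the attribute names; `rows_to_csv` and the `Ticket` struct are hypothetical names used only for illustration and do not come from the repository:

```ruby
require 'csv'

# Hypothetical generic replacement for the to_csv_* family.
# `records` is any enumerable of objects responding to each attribute name;
# `headers` defaults to the attribute names when they do not differ.
def rows_to_csv(records, attributes, headers = attributes)
  CSV.generate do |csv|
    csv << headers
    records.each do |record|
      csv << attributes.map { |attr| record.public_send(attr) }
    end
  end
end

# Stand-in for an ActiveRecord model, for demonstration only.
Ticket = Struct.new(:ticket_id, :ticket_status)
tickets = [Ticket.new(1, 'open'), Ticket.new(2, 'closed')]

puts rows_to_csv(tickets, %w[ticket_id ticket_status], %w[ticket_id status])
```

Each export then reduces to a pair of arrays (attributes and headers), which could live in a constant per model rather than in a separate method.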

How to export the entire active records in a schema to a csv file

Is there a way to export all the ActiveRecord data in a Rails application to CSV, with each table on its own sheet, or is there another way to export the full database? Any suggestions?
Try something like this to read the data from each table; afterwards you can split it into one sheet per table.
models = ActiveRecord::Base.connection.tables
models.shift
models.shift
models.map do |model_name|
  model_name = model_name.split("")
  model_name.pop
  model_name = model_name.join("")
  model_name.camelize.constantize.all.map do |data|
    puts data
  end
end
def backup
  models = ActiveRecord::Base.connection.tables
  all_data = Hash.new
  models.map do |model_name|
    table_data = []
    model_name = model_name.split("")
    model_name.pop
    model_name = model_name.join("")
    model_name.camelize.constantize.all.map do |data|
      table_data.push(data)
    end
    all_data[model_name.camelize] = table_data
  end
  send_data export_csv(all_data), filename: "Backup - #{Date.today}.csv" and return
end
def export_csv(data)
  csvfile = CSV.generate(headers: true) do |csv|
    data.each do |key, value|
      csv << [key]
      attributes = key.camelize.constantize.column_names
      csv << attributes
      value.each do |val|
        csv << val.attributes.values_at(*attributes)
      end
      csv << ['eot']
    end
  end
  return csvfile
end
I found the solution and a way to export all tables inside a single csv file.
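The layout this produces is one section per table: a line with the table name, a line with the column names, the data rows, and an 'eot' marker line. The same layout can be tried without a database. A minimal sketch in plain Ruby, assuming each table is represented as an array of hashes; `export_sections` is a hypothetical name, not something from the code above:

```ruby
require 'csv'

# Hypothetical helper mirroring the section layout of export_csv:
# table name, column names, data rows, then an 'eot' marker.
# `tables` maps a table name to an array of row hashes.
def export_sections(tables)
  CSV.generate do |csv|
    tables.each do |name, rows|
      columns = rows.empty? ? [] : rows.first.keys
      csv << [name]
      csv << columns
      rows.each { |row| csv << row.values_at(*columns) }
      csv << ['eot']
    end
  end
end

tables = {
  'User' => [{ 'id' => 1, 'email' => 'a@example.com' }],
  'Post' => [{ 'id' => 7, 'title' => 'Hello, world' }]
}
puts export_sections(tables)
```

A reader of the file can then split it back into tables by scanning for the 'eot' lines, provided no data row consists of that single value.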

Do a diff between csv column and ActiveRecord object

I have a simple csv (a list of emails) that I want to upload to my rails backend API which looks like this:
abd@gmail.com,cool@hotmail.com
What I want is to upload that file, check in the user table if there are matching rows (in terms of the email address) and then return a newly downloadable csv with 2 columns: the email and whether or not the email was matched to an existing user(boolean true/false).
I'd like to stream the output since the file can be very large. This is what I have so far:
controller
def import_csv
  send_data FileIngestion.process_csv(
    params[:file]
  ), filename: 'processed_emails.csv', type: 'text/csv'
end
file_ingestion.rb
require 'csv'
class FileIngestion
  def self.process_csv(file)
    emails = []
    CSV.foreach(file.path, headers: true) do |row|
      emails << row[0]
    end
    users = User.where("email IN (?)", emails)
  end
end
Thanks!
Why not just pluck all the emails from the Users and do something like this. This example keeps it simple but you get the idea. If we can assume your input file is just a string of emails with comma separated values then this should work:
emails = File.read('emails.csv').split(',')
def process_csv(emails)
  user_emails = User.where.not(email: [nil, '']).pluck(:email)
  CSV.open('emails_processed.csv', 'w') do |row|
    row << ['email', 'present']
    emails.each do |email|
      row << [email, user_emails.include?(email) ? 'true' : 'false']
    end
  end
end
process_csv(emails)
UPDATED to match your code design:
def import_csv
  send_data FileIngestion.process_csv(params[:file]),
            filename: 'processed_emails.csv', type: 'text/csv'
end
require 'csv'
class FileIngestion
  def self.process_csv(file)
    # Read the uploaded file rather than a hard-coded path
    emails = File.read(file.path).split(',').map(&:strip)
    # Load the known emails once; without this line user_emails is undefined
    user_emails = User.where.not(email: [nil, '']).pluck(:email)
    CSV.open('emails_processed.csv', 'w') do |row|
      emails.each do |email|
        row << [email, user_emails.include?(email) ? 'true' : 'false']
      end
    end
    File.read('emails_processed.csv')
  end
end
Basically what you want to do is collect the incoming CSV data into batches - use each batch to query the database and write a diff to a tempfile.
You would then stream the tempfile to the client.
require 'csv'
require 'tempfile'
class FileIngestion
  BATCH_SIZE = 1000
  def self.process_csv(file)
    csv_tempfile = CSV.new(Tempfile.new('foo'))
    CSV.read(file, headers: false).lazy.drop(1).each_slice(BATCH_SIZE) do |batch|
      emails = batch.flatten
      users = User.where(email: emails).pluck(:email)
      emails.each do |e|
        csv_tempfile << [e, users.include?(e)]
      end
    end
    csv_tempfile
  end
end
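CSV.read(file, headers: false).lazy.drop(1).each_slice(BATCH_SIZE) walks the parsed rows in batches through a lazy enumerator, and .drop(1) skips the header row. Note that CSV.read still parses the whole file into memory before the batching starts; calling CSV.foreach without a block returns an enumerator that reads the file row by row, which suits very large uploads better.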
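The batching idea can be tried without a database. A minimal sketch in plain Ruby, assuming `lookup` is a callable that stands in for a query such as `User.where(email: batch).pluck(:email)`; the names `diff_emails` and `lookup` are hypothetical, and the batch size is kept tiny only so the demo spans more than one batch:

```ruby
require 'csv'
require 'set'

BATCH_SIZE = 2 # deliberately small for the demo; the answer above uses 1000

# Hypothetical helper: checks emails against a lookup, one batch at a time.
# `lookup` receives an array of emails and returns those that exist.
def diff_emails(emails, lookup, batch_size: BATCH_SIZE)
  CSV.generate do |csv|
    csv << %w[email exists]
    emails.each_slice(batch_size) do |batch|
      known = lookup.call(batch).to_set # Set gives constant-time include?
      batch.each { |email| csv << [email, known.include?(email)] }
    end
  end
end

registered = %w[abd@gmail.com zed@example.com]
lookup = ->(batch) { registered & batch } # stand-in for the database query

puts diff_emails(%w[abd@gmail.com cool@hotmail.com zed@example.com], lookup)
```

Each batch costs one lookup, so the number of queries grows with the input size divided by the batch size rather than with the number of emails.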
OK, so this is what I came up with: a solution that simply prevents users from uploading a file with more than 10,000 data points. It might not be the best solution (I prefer @Max's), but in any case I wanted to share what I did:
def emails_exist
  raise 'Missing file parameter' if !params[:file]
  csv_path = params[:file].tempfile.path
  send_data csv_of_emails_matching_users(csv_path), filename: 'emails.csv', type: 'text/csv'
end
private
def csv_of_emails_matching_users(input_csv_path)
  total = 0
  CSV.generate(headers: true) do |result|
    result << %w{email exists}
    emails = []
    CSV.foreach(input_csv_path) do |row|
      total += 1
      if total > 10001
        raise 'User Validation limited to 10000 emails'
      end
      emails.push(row[0])
      if emails.count > 99
        append_to_csv_info_for_emails(result, emails)
      end
    end
    if emails.count > 0
      append_to_csv_info_for_emails(result, emails)
    end
  end
end
def append_to_csv_info_for_emails(csv, emails)
  user_emails = User.where(email: emails).pluck(:email).to_set
  emails.each do |email|
    csv << [email, user_emails.include?(email)]
  end
  emails.clear
end

Exporting data to CSV from multiple models

I am able to export all fields of a model to a CSV file, but now I need to add some attributes from another model which has a has_many relationship with the original.
my controller file looks like
respond_to do |format|
  format.html
  format.csv { send_data @students.as_csv, filename: "students-#{Date.today}.csv" }
end
student.rb
def self.as_csv
  attributes = %w{surname given_name admission_year admission_no hobbies }
  CSV.generate do |csv|
    csv << attributes
    all.each do |item|
      csv << item.attributes.values_at(*attributes)
    end
  end
end
It works fine but because hobby is another table having a has_many relation with student as a student has many hobbies, I want to show hobbies for each student as a comma separated list in the csv. I am stuck as to how to achieve this.
Any help will be appreciated.
I would just do something like this:
CSV_HEADER = %w[surname given_name admission_year admission_no hobbies]
def self.as_csv
  CSV.generate do |csv|
    csv << CSV_HEADER
    all.each do |student|
      csv << [
        student.surname,
        student.given_name,
        student.admission_year,
        student.admission_no,
        student.hobbies.pluck(:title).join(', ')
      ]
    end
  end
end
You may need to replace title with the name of whichever attribute returns the hobby as a string.
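Joining the hobbies with ', ' puts commas inside a single cell, which is safe because the CSV library quotes any field that contains the separator. A minimal sketch in plain Ruby that shows the resulting row, assuming a `Student` struct as a stand-in for the model (the struct and the `students_csv` helper are hypothetical):

```ruby
require 'csv'

# Stand-in for the Student model; `hobbies` is already a list of titles here.
Student = Struct.new(:surname, :given_name, :hobbies)

# Hypothetical helper: one row per student, hobbies collapsed into one cell.
def students_csv(students)
  CSV.generate do |csv|
    csv << %w[surname given_name hobbies]
    students.each do |s|
      csv << [s.surname, s.given_name, s.hobbies.join(', ')]
    end
  end
end

puts students_csv([Student.new('Doe', 'Jane', %w[chess rowing])])
```

In the Rails version, calling pluck on the association inside the loop issues one query per student. Loading the students with includes(:hobbies) and using student.hobbies.map(&:title) instead is a common way to avoid that, at the cost of instantiating the hobby records.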

export hash array to csv file

My objective is to transform a hash array to a csv file.
This is my controller:
respond_to do |format|
  format.html
  format.csv { send_data @comsumptions.to_csv }
end
@comsumptions is a hash array:
[{"x"=>76,
"y"=>"example",
"z"=>2015,
"consumption"=>#<BigDecimal:7fea4a1cadb8,'0.5382857142E4',18(27)>},
{"x"=>76,
"y"=>"example2",
"z"=>2015,
"consumption"=>#<BigDecimal:7fea4a1ca7c8,'0.5437E4',9(27)>},(..)
I want to create a CSV file with 2 specific columns, "consumption" and "z".
When I ran this with these 3 lines commented out, the output was a file containing all of @comsumptions. How can I select just these 2 columns and write them to a CSV file?
def self.to_csv
CSV.generate(headers: true ) do |csv|
#csv << column_names
#all.each do |product|
# csv << product.attributes.values_at(*column_names)
end
end
end
From your feedback, I think the best way here is creating a csv view file in your views. For example, if your html file is comsumptions.html.erb, then your csv view file should be comsumptions.csv.ruby
# comsumptions.csv.ruby
require 'csv'
CSV.generate do |csv|
  csv << ['consumption', 'z']
  @comsumptions.each do |c|
    csv << [ c['consumption'].to_s, c['z'] ]
  end
end
And we need to change the controller too. Remove the respond_to part or modify it as follows:
respond_to do |format|
  format.html
  format.csv
end
I already tested on my localhost, and this should work!
require 'csv'
@comsumptions =
  [{"x"=>76,
    "y"=>"example",
    "z"=>2015},
   {"x"=>76,
    "y"=>"example2",
    "z"=>2015}]
class << @comsumptions
  def to_csv (*keys)
    keys = first.keys if keys.empty?
    CSV.generate(headers: keys, write_headers: true) do |csv|
      each do |e|
        csv << e.values_at(*keys)
      end
    end
  end
end
p @comsumptions.to_csv("x","y")
p @comsumptions.to_csv()
This solution, heavily inspired by Van Huy's, works for any array of hashes as long as all the hashes have the same keys; expect undefined behaviour otherwise.
What's not clear to me is where you put the def self.to_csv method. Inside the controller? That is wrong. What you want to do is either augment the @comsumptions object, augment the Array class, or define a method in the controller. My example augments the @comsumptions object; you can put all of that inside the controller method and it should work.
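The third option mentioned, a plain method that takes the array, avoids modifying any object or class. A minimal sketch in plain Ruby, assuming the rows are hashes with string keys as in the question; `hashes_to_csv` is a hypothetical name, and the sample uses Float values in place of the BigDecimal values from the question:

```ruby
require 'csv'

# Hypothetical helper: writes only the requested keys, in the given order.
# Falls back to the keys of the first row when none are given.
def hashes_to_csv(rows, *keys)
  keys = rows.first.keys if keys.empty? && !rows.empty?
  CSV.generate do |csv|
    csv << keys
    rows.each { |row| csv << row.values_at(*keys) }
  end
end

comsumptions = [
  { 'x' => 76, 'y' => 'example',  'z' => 2015, 'consumption' => 5382.5 },
  { 'x' => 76, 'y' => 'example2', 'z' => 2015, 'consumption' => 5437.0 }
]

puts hashes_to_csv(comsumptions, 'consumption', 'z')
```

In a controller this could be called as send_data hashes_to_csv(@comsumptions, 'consumption', 'z'), with the method defined as a private helper.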
