Extremely slow CSV exporting - ruby-on-rails

We are trying to generate a CSV file for users to download. However, it's extremely slow: about 5 minutes for 10k lines of CSV.
Any good ideas for improving it? Code below:
def download_data
  start_date, end_date = get_start_end_date
  report_lines = @report.report_lines.where("report_time between (?) and (?)", start_date, end_date)
  csv_string = CSV.generate do |csv|
    report_lines.each do |report_data|
      csv << [report_data.time, report_data.name, report_data.value]
    end
  end
  respond_to do |format|
    format.csv { send_data(csv_string, :filename => "#{Time.now}.csv", :type => "text/csv") }
  end
end

I would start by checking whether report_time is indexed; an unindexed report_time column will certainly contribute to the slowness. Refer to the Active Record Migrations guide for details on adding an index.
Second, you could trim the result down to only what you need, i.e. instead of selecting all columns, select only time, name, and value:
report_lines = @report.report_lines
                      .where("report_time between (?) and (?)", start_date, end_date)
                      .select('time, name, value')
Try with:
def download_data
  start_date, end_date = get_start_end_date
  report_lines = @report.report_lines
                        .where("report_time between (?) and (?)", start_date, end_date)
                        .select('time, name, value')
  csv_string = CSV.generate do |csv|
    # csv << expects an array, not a model object, so build one per row;
    # each (not map) because only the side effect of appending is needed
    report_lines.each { |row| csv << [row.time, row.name, row.value] }
  end
  respond_to do |format|
    format.csv { send_data(csv_string, :filename => "#{Time.now}.csv", :type => "text/csv") }
  end
end
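Selecting fewer columns still builds one ActiveRecord object per row. On Rails 4 and later, pluck(:time, :name, :value) returns plain arrays instead, which skips that step. The sketch below shows only the CSV side once rows are plain arrays; build_report_csv and the sample rows are hypothetical stand-ins, not code from the question.

```ruby
require 'csv'

# Hypothetical helper: turns rows that are already plain arrays
# (e.g. what pluck(:time, :name, :value) returns) into a CSV string.
def build_report_csv(rows, headers = %w[time name value])
  CSV.generate do |csv|
    csv << headers
    # each, not map: we only want the side effect of appending rows
    rows.each { |row| csv << row }
  end
end

rows = [
  ["2015-01-01 10:00", "cpu", 42],
  ["2015-01-01 10:05", "memory", 17]
]
puts build_report_csv(rows)
```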

Check which part takes so much time.
If it's the SQL query, try to optimize it: check whether you have the right indexes; maybe you need to force a different index.
If the query is fast but Ruby is slow, you can try two things:
1. Use a faster CSV library (like fastest-csv).
2. Generate as much of the CSV as possible in SQL. What I mean is: instead of creating ActiveRecord objects, fetch an already-concatenated string for each row. If that's better but still too slow, you can end up generating one big string in the database and just rendering it. (This solution might not seem good, and it looks quite ugly, but in one project I had to use it because it was the fastest way.)
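To see which part dominates, each step can be timed separately with Ruby's Benchmark module. A minimal sketch, where fetch_rows is a hypothetical stand-in for the query. With a real ActiveRecord relation, call to_a inside the first block: relations are lazy, so otherwise the query would run inside the CSV block and be counted there.

```ruby
require 'benchmark'
require 'csv'

# Hypothetical stand-in for the query; with ActiveRecord this would be
# something like report_lines.to_a (to_a forces the lazy relation to load).
fetch_rows = -> { Array.new(10_000) { |i| ["2015-01-01", "name#{i}", i] } }

rows = nil
query_time = Benchmark.realtime { rows = fetch_rows.call }

csv_string = nil
csv_time = Benchmark.realtime do
  csv_string = CSV.generate { |csv| rows.each { |row| csv << row } }
end

puts format("query: %.3fs, csv: %.3fs", query_time, csv_time)
```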

Related

When CSV.generate, generate empty field without ""

Ruby 2.2, Ruby on Rails 4.2
I'm generating some CSV data in Ruby on Rails, and I want empty fields to be empty, like ,, rather than ,"",.
I wrote code like this:
somethings_controller.rb
def get_data
  respond_to do |format|
    format.html
    format.csv do
      @data = SheetRepository.accounts_data
      send_data render_to_string, type: :csv
    end
  end
end
somethings/get_data.csv.ruby
require 'csv'
csv_str = CSV.generate do |csv|
  csv << [1,260,37335,'','','','','','']
  ...
end
And this generates CSV file like this.
get_data.csv
1,260,37335,"","","","","",""
I want CSV data like below.
1,260,37335,,,,,,
It seems like Ruby adds "" automatically.
How can I do this?
In order to get CSV to output an empty column, you need to tell it that nothing is in the column. An empty string is still something in Ruby, so you'll need to replace those empty strings with nil to get the output you want:
csv_str = CSV.generate do |csv|
  csv << [1,260,37335,'','','','','',''].map do |col|
    col.respond_to?(:empty?) && col.empty? ? nil : col
  end
end
# => 1,260,37335,,,,,,
In Rails you can clean that up with presence, though this blanks out false as well:
csv_str = CSV.generate do |csv|
  csv << [1,260,37335,'',false, nil,'','',''].map(&:presence)
end
# => 1,260,37335,,,,,,
The CSV documentation lists a write_empty_value option for this case. It gives no example, but the name suggests what it does. Note that the option only exists in newer versions of the csv library, not in the one bundled with Ruby 2.2.
The only caveat is that, at least in some versions, you need to pass an array of Strings; otherwise you will get a NoMethodError:
csv_str = CSV.generate(write_empty_value: nil) do |csv|
  csv << [1,260,37335,'','','','','','', false, ' ', nil].map(&:to_s)
end
# => "1,260,37335,,,,,,,false, ,\n"
The benefit of this solution is that false is preserved.
I resolved it myself!
In somethings_controller.rb:
send_data render_to_string.gsub("\"\"",""), type: :csv
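A caveat worth checking before relying on the gsub approach: CSV escapes a double quote inside a field by doubling it, so removing every "" pair also removes those escaped quotes and silently changes the data. The sketch below demonstrates this with a made-up row. It also shows quote_empty: false, an option in newer versions of the csv library (not the one bundled with Ruby 2.2), which leaves empty strings unquoted without touching other fields.

```ruby
require 'csv'

row = ['He said "hi"', '', nil]

# Default output: the empty string is written as "", nil as nothing, and the
# quotes inside the first field are escaped by doubling them.
default_line = CSV.generate_line(row)
# => "\"He said \"\"hi\"\"\",\"\",\n"

# Removing every "" pair also removes the escaped quotes in the first field.
stripped = default_line.gsub('""', '')
# => "\"He said hi\",,\n"

# quote_empty: false (newer csv versions) leaves only empty strings unquoted.
clean_line = CSV.generate_line(row, quote_empty: false)
# => "\"He said \"\"hi\"\"\",,\n"

p CSV.parse_line(stripped)   # => ["He said hi", nil, nil]
p CSV.parse_line(clean_line) # => ["He said \"hi\"", nil, nil]
```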

Rails Multiple CSV Export Buttons in Single View (i.e. Two or More)

I've followed Railscasts and the similar GoRails videos, and searched SO for [rails] [csv] export, to no avail. There's a similar but hard-to-follow question/answer called Two Export Buttons to CSV in Rails. I am able to add a single button that exports my table records to CSV (I have chosen all records for this button).
I want to add a second (or nth) button to my view page to export a subset of table records, e.g. Event.where(country: some_country), in addition to the current (working) button that exports all records of the model. I thought this would be a common use of a CSV export.
Here's what I have, working backwards:
Events Index view: I have a download button to export to CSV (added helper code for the glyph image)
<%= link_to glyph(:save), "events.csv", class: "btn btn-success" %>
If I want a second action, I need a different route than events.csv, right? So I tried a new button, changing "events.csv" to @events_us.csv.
Events controller: I have defined the 'what' and respond with all records (using the ransack and will_paginate gems too):
def index
  @search = Event.ransack(params[:q])
  @events = @search.result.paginate(:page => params[:page], :per_page => 50)
  @events_all = Event.all
  respond_to do |format|
    format.html # need to have html
    format.csv { send_data @events_all.to_csv, filename: "Events-#{Date.today}.csv" }
  end
end
Should I add a second format.csv in the respond_to block? I tried that (i.e. I defined a different instance variable like @events_us = Event.where(country: "US")), to no avail. It seems weird to have two format.csv blocks, though. And I get No route matches [GET] "/@events_us.csv".
I probably don't need to say much more, as I'm all kinds of lost on this one.
Event model: uses the all scope, the CSV library, etc. to organize the CSV:
def self.to_csv
  column_names = Event.column_names.map(&:to_s) - %w[id created_at updated_at]
  CSV.generate(headers: true) do |csv|
    csv << column_names
    all.each do |record|
      csv << record.attributes.values_at(*column_names)
    end
  end
end
If I have a second respond_to action in the controller, could I have a second method like self.to_csv_us?
def self.to_csv_us
  column_names = Event.column_names.map(&:to_s) - %w[id created_at updated_at]
  CSV.generate(headers: true) do |csv|
    csv << column_names
    @events_us.each do |record|
      csv << record.attributes.values_at(*column_names)
    end
  end
end
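One way to avoid a second to_csv_us method (a sketch, not an answer from the original thread): @events_us inside self.to_csv_us is an instance variable of the Event class itself, not the controller's, so it would be nil there. In Rails, a class method called on a relation runs within that relation's scope, so all inside self.to_csv returns only the filtered records, and Event.where(country: "US").to_csv should already export the subset. A second button could then link to something like events_path(format: :csv, country: "US"), with the single format.csv block picking the relation from a hypothetical params[:country]. The plain-Ruby sketch below models only the shared serializer; Event here is a Struct stand-in and events_to_csv is a hypothetical helper.

```ruby
require 'csv'

# Hypothetical stand-in for the Event model.
Event = Struct.new(:id, :name, :country)

# Hypothetical helper: one serializer for any collection of events, so the
# "all events" and "US events" buttons can share it.
def events_to_csv(events, column_names = %w[name country])
  CSV.generate do |csv|
    csv << column_names
    events.each do |event|
      # Struct#[] accepts member names as strings, much like attributes[...]
      csv << column_names.map { |col| event[col] }
    end
  end
end

events = [Event.new(1, "Expo", "US"), Event.new(2, "Fair", "FR")]

all_csv = events_to_csv(events)
us_csv = events_to_csv(events.select { |event| event.country == "US" })
puts us_csv
```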

Memory issue with huge CSV Export in Rails

I'm trying to export a large amount of data from a database to a CSV file, but it takes a very long time and I fear I'll run into major memory issues.
Does anyone know a better way to export a CSV without the memory build-up? If so, can you show me how? Thanks.
Here's my controller:
def users_export
  File.new("users_export.csv", "w") # creates new file to write to
  @todays_date = Time.now.strftime("%m-%d-%Y")
  @outfile = @todays_date + ".csv"
  @users = User.select('id, login, email, last_login, created_at, updated_at')
  FasterCSV.open("users_export.csv", "w+") do |csv|
    csv << [ @todays_date ]
    csv << [ "id","login","email","last_login", "created_at", "updated_at" ]
    @users.find_each do |u|
      csv << [ u.id, u.login, u.email, u.last_login, u.created_at, u.updated_at ]
    end
  end
  send_file "users_export.csv",
    :type => 'text/csv; charset=iso-8859-1; header=present',
    :disposition => "attachment; filename=#{@outfile}"
end
You're building up one giant string, so you have to keep the entire CSV file in memory. You're also loading all of your users, which will also sit on a lot of memory. It won't make any difference if you only have a few hundred or a few thousand users, but at some point you will probably need to do two things.
Use
User.find_each do |user|
  csv << [...]
end
This loads users in batches (1000 by default) rather than all at once.
You should also look at writing the CSV to a file rather than buffering the entire thing in memory. Assuming you have created a temporary file,
FasterCSV.open('/path/to/file','w') do |csv|
  ...
end
Will write your csv to a file. You can then use send_file to send it. If you already have a file open, FasterCSV.new(io) should work too.
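A minimal plain-Ruby sketch of the write-to-a-file approach, using the standard library's CSV (FasterCSV became Ruby's built-in CSV from 1.9 on) and Tempfile. each_user_row is a hypothetical stand-in for User.find_each that yields rows one at a time.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical stand-in for User.find_each: yields one row at a time
# instead of returning every user at once.
def each_user_row
  (1..5_000).each { |id| yield [id, "user#{id}", "user#{id}@example.com"] }
end

tempfile = Tempfile.new(["users_export", ".csv"])
begin
  # Rows go to the file as they are produced; the full CSV never has to
  # exist as one string in memory.
  CSV.open(tempfile.path, "w") do |csv|
    csv << %w[id login email]
    each_user_row { |row| csv << row }
  end
  line_count = File.foreach(tempfile.path).count
  puts "wrote #{line_count} lines to #{tempfile.path}"
  # In a controller, this path could then be handed to send_file.
ensure
  tempfile.close
end
```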
Lastly, on Rails 3.1 and higher you might be able to stream the CSV file as you create it, but that isn't something I've tried before.
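For the streaming route, a common pattern in Rails controllers is to assign an object that responds to each, such as an Enumerator yielding one CSV line at a time, to response_body, together with the CSV Content-Type and Content-Disposition headers. Whether the output actually reaches the client incrementally depends on the server and middleware, so treat this as a sketch. The plain-Ruby part below only builds the enumerator; csv_lines and the sample rows are hypothetical.

```ruby
require 'csv'

# Hypothetical helper: an Enumerator that yields the header line and then
# one CSV line per row, instead of building one big string.
def csv_lines(headers, rows)
  Enumerator.new do |yielder|
    yielder << CSV.generate_line(headers)
    rows.each { |row| yielder << CSV.generate_line(row) }
  end
end

rows = (1..3).map { |id| [id, "user#{id}"] }
lines = csv_lines(%w[id login], rows)
lines.each { |line| print line }
```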
In addition to the tips on CSV generation, be sure to optimize the database call too.
Select only the columns you need.
@users = User.select('id, login, email, last_login, created_at, updated_at').order('login')
@users.find_each do |user| # note: find_each ignores .order('login') and batches by primary key
  ...
end
If you have, for example, 1000 users, and each has password, password_salt, city, country, ..., then several thousand fewer values are transferred from the database, created as Ruby objects, and finally garbage collected.

Rails 3.1 active record query to an array of arrays for CSV export via FastCSV

I'm attempting to DRY up a method I've been using for a few months:
def export(imagery_requests)
  csv_string = FasterCSV.generate do |csv|
    imagery_requests.each do |ir|
      csv << [ir.id, ir.service_name, ir.description, ir.first_name, ir.last_name, ir.email,
              ir.phone_contact, ir.region, ir.imagery_type, ir.file_type, ir.pixel_type,
              ir.total_images, ir.tile_size, ir.progress, ir.expected_date, ir.high_priority,
              ir.priority_justification, ir.raw_data_location, ir.service_overviews,
              ir.is_def, ir.isc_def, ir.special_instructions, ir.navigational_path,
              ir.fyqueue, ir.created_at, ir.updated_at]
    end
  end
  # send it to the browser with proper headers
  send_data csv_string,
    :type => 'text/csv; charset=iso-8859-1; header=present',
    :disposition => "attachment; filename=requests_as_of-#{Time.now.strftime("%Y%m%d")}.csv"
end
I figured it would be a LOT better if instead of specifying EVERY column manually, I did something like this:
def export(imagery_requests)
  csv_string = FasterCSV.generate do |csv|
    line = []
    imagery_requests.each do |ir|
      csv << ir.attributes.values.each do |i|
        line << i
      end
    end
  end
  # send it to the browser with proper headers
  send_data csv_string,
    :type => 'text/csv; charset=iso-8859-1; header=present',
    :disposition => "attachment; filename=requests_as_of-#{Time.now.strftime("%Y%m%d")}.csv"
end
That should be creating an array of arrays. It works just fine in the Rails console. But in the production environment, it just produces garbage output. I'd much rather make this method extensible so I can add more fields to the ImageryRequest model at a later time. Am I going about this all wrong?
I'm guessing that it probably works in the console when you do it for just one imagery_request, yes?
But when you do multiple, it fails?
Again, I'm guessing that's because you never reset line to an empty array, so you're continually filling a single array.
Try the simple way first, to check that it works, and only then start going all << on it:
csv_string = FasterCSV.generate do |csv|
  imagery_requests.each do |ir|
    csv << ir.attributes.values.clone
  end
end
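A sketch of the attribute-based version that also fixes the column order explicitly with values_at (as the Event to_csv method above does), so the output does not depend on the key order of the attributes hash, which is not guaranteed on Ruby 1.8. It uses the standard library's CSV, which is FasterCSV under a new name from Ruby 1.9 on. The hashes stand in for ir.attributes; COLUMNS and export_rows are hypothetical names.

```ruby
require 'csv'

# Hypothetical column list; in Rails it could be ImageryRequest.column_names,
# so columns added to the model later are exported automatically.
COLUMNS = %w[id service_name email created_at].freeze

def export_rows(records)
  CSV.generate do |csv|
    csv << COLUMNS
    records.each do |attributes|
      # values_at returns a new array per record, in COLUMNS order,
      # regardless of the order of keys in the hash
      csv << attributes.values_at(*COLUMNS)
    end
  end
end

# Hashes standing in for ir.attributes
records = [
  { "id" => 1, "service_name" => "wms", "email" => "a@example.com", "created_at" => "2011-10-01" },
  { "email" => "b@example.com", "id" => 2, "created_at" => "2011-10-02", "service_name" => "wfs" }
]
puts export_rows(records)
```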
PS - in the past I've even used clone on my line-by-line array, just to be sure I wasn't doing anything untoward with persisted stuff...

Changing Output for FasterCSV

I currently have a controller that will handle a call to export a table into a CSV file using the FasterCSV gem. The problem is the information stored in the database isn't clear sometimes and so I want to change the output for a particular column.
My project.status column, for instance, holds numbers instead of statuses, i.e. 1 in the database corresponds to Active, 2 to Inactive, and 0 to Not Yet Decided. When I export the table it shows 0, 1, 2 instead of Active, Inactive, or Not Yet Decided. Any idea how to implement this?
I tried a simple loop that would check the final generated CSV file and change each 0,1,2 to its corresponding output, but the problem is every other column that had a 0,1,2 would change as well. I'm not sure how to isolate the column.
Thanks in advance
def csv
  qt = params[:selection]
  @lists = Project.find(:all, :order=> (params[:sort] + ' ' + params[:direction]), :conditions => ["name LIKE ? OR description LIKE ?", "%#{qt}%", "%#{qt}%"])
  csv_string = FasterCSV.generate(:encoding => 'u') do |csv|
    csv << ["Status","Name","Summary","Description","Creator","Comment","Contact Information","Created Date","Updated Date"]
    @lists.each do |project|
      csv << [project.status, project.name, project.summary, project.description, project.creator, project.statusreason, project.contactinfo, project.created_at, project.updated_at]
    end
  end
  filename = Time.now.strftime("%Y%m%d") + ".csv"
  send_data(csv_string,
    :type => 'text/csv; charset=UTF-8; header=present',
    :filename => filename)
end
This is actually fairly easy. In your controller code:
# app/controllers/projects_controller.rb#csv
@lists.each do |project|
  csv << [project.descriptive_status, project.name, project.summary, project.description, project.creator, project.statusreason, project.contactinfo, project.created_at, project.updated_at]
end
Then, in your model code (you probably already have a method that decodes the DB status into a more descriptive one):
# app/models/project.rb
# codes as described in the question: 1 = Active, 2 = Inactive, 0 = Not Yet Decided
ACTIVE_STATUS = 1
INACTIVE_STATUS = 2
NOT_YET_DECIDED_STATUS = 0
def descriptive_status
  case status
  when ACTIVE_STATUS
    "Active"
  when INACTIVE_STATUS
    "Inactive"
  when NOT_YET_DECIDED_STATUS
    "Not Yet Decided"
  end
end
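An alternative sketch keeps the mapping in a single hash, using the codes from the question. Only the status column is translated, which avoids the problem of replacing 0, 1, 2 across every column of the finished file. STATUS_LABELS and ProjectRow are hypothetical names; ProjectRow is a Struct standing in for the Project model.

```ruby
require 'csv'

# Codes as described in the question.
STATUS_LABELS = {
  1 => "Active",
  2 => "Inactive",
  0 => "Not Yet Decided"
}.freeze

ProjectRow = Struct.new(:status, :name) do
  def descriptive_status
    # Fall back to the raw code if an unexpected value shows up.
    STATUS_LABELS.fetch(status, status.to_s)
  end
end

projects = [ProjectRow.new(1, "Alpha"), ProjectRow.new(0, "Beta"), ProjectRow.new(2, "Gamma")]

csv_string = CSV.generate do |csv|
  csv << %w[Status Name]
  # Only the status column is translated; other columns are untouched.
  projects.each { |project| csv << [project.descriptive_status, project.name] }
end
puts csv_string
```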
There are probably a number of ways you can then refactor this. In the controller at least, it would probably be best to make that finder a more descriptive named scope. The constants in the model could be brought into SettingsLogic configuration or another similar gem.
