Export large data (millions of rows) into CSV in Rails

I have a million records and I want to export that data to CSV. I used the find_each method to fetch the records, but it is still taking too much time to fetch the data and download the CSV. I can't do anything else in the application because the export consumes so much memory; the browser just sits there loading the page.
I have written the following code in the controller:
def export_csv
  require 'csv'
  lines = []
  User.where(status: ACTIVE).order('created_at desc').find_each(batch_size: 10000) do |user|
    csv_vals = []
    csv_vals << user.email if user.email.present?
    csv_vals << user.name if user.name.present?
    .......
    ........
    .......etc
    lines << CSV.generate_line(csv_vals)
  end
  send_data(lines.join, type: 'text/csv; charset=iso-8859-1; header=present',
            disposition: "attachment; filename=file123.csv")
end
Is there another way to load millions of records and download the CSV quickly?

This may help:
Generating and streaming potentially large CSV files using Ruby on Rails
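The gist of that approach is to stream the file instead of building it all up in memory first. A minimal sketch of what that could look like for the code in the question, assuming Rails 4+ with ActionController::Live; ACTIVE and the column list are placeholders carried over from the question, not verified:
require 'csv'

class UsersController < ApplicationController
  include ActionController::Live

  def export_csv
    response.headers['Content-Type'] = 'text/csv'
    response.headers['Content-Disposition'] = 'attachment; filename="file123.csv"'

    # Header row first, then one CSV line per record.
    response.stream.write CSV.generate_line(%w[email name])

    # find_each walks the table by primary key, 10,000 rows per query,
    # so only one batch of users is held in memory at a time.
    User.where(status: ACTIVE).find_each(batch_size: 10_000) do |user|
      response.stream.write CSV.generate_line([user.email, user.name])
    end
  ensure
    response.stream.close
  end
end
Note that the order clause is dropped because find_each batches by primary key and ignores custom ordering, and the web server in front of Rails has to allow streaming responses for this to help.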

Related

How to download CSV data using ActionController::Live from MongoDB?

I have created a CSV downloader in a controller like this:
format.csv do
  @records = Model.all
  headers['Content-Disposition'] = "attachment; filename=\"products.csv\""
  headers['Content-Type'] ||= 'text/csv'
end
Now I want to use server-sent events to stream this CSV download, for optimisation purposes. I know I can do this in Rails using ActionController::Live, but I have no experience with it.
Can someone explain how I can:
Query records in batches
Add records to the stream
Handle SSE on the browser side
Write records to CSV files
Correct me if any of my assumptions are wrong, and help me do this in a better way. Thanks.
Mongoid automatically queries your records in batches (more info over here).
To add your records to a CSV file, you should do something like:
records = MyModel.all
# By default batch_size is 100, but you can modify it using .batch_size(x)
result = CSV.generate do |csv|
  csv << ["attribute1", "attribute2", ...]
  records.each do |r|
    csv << [r.attribute1, r.attribute2, ...]
  end
end
send_data result, filename: 'MyCsv.csv'
Remember that send_data is an ActionController method!
I think you don't need SSE for generating a CSV. Just include ActionController::Live in the controller and use response.stream.write while iterating over your collection:
include ActionController::Live
...
def some_action
  format.csv do
    # Needed for streaming, to work around a Rack 2.2 bug
    response.headers['Last-Modified'] = Time.now.httpdate
    headers['Content-Disposition'] = "attachment; filename=\"products.csv\""
    headers['Content-Type'] ||= 'text/csv'
    [1, 2, 3, 4].each do |i| # --> change it to iterate over your DB records
      response.stream.write ['SOME', 'thing', "interesting #{i}", "#{Time.zone.now}"].to_csv
      sleep 1 # some fake delay to see chunking
    end
  ensure
    response.stream.close
  end
end
Try it with curl or similar to see the output line by line:
$ curl -i http://localhost:3000/test.csv

extremely slow csv exporting

We are trying to generate a CSV file for users to download, but it's extremely slow: about 5 minutes for 10k lines of CSV.
Any good ideas for improving it? Code below:
def download_data
  start_date, end_date = get_start_end_date
  report_lines = @report.report_lines.where("report_time between (?) and (?)", start_date, end_date)
  csv_string = CSV.generate do |csv|
    report_lines.each do |report_data|
      csv << [report_data.time, report_data.name, report_data.value]
    end
  end
  respond_to do |format|
    format.csv { send_data(csv_string, :filename => "#{Time.now}.csv", :type => "text/csv") }
  end
end
I would start by checking whether report_time is indexed; an unindexed report_time is certainly going to contribute to the slowness. Refer to Active Record Migrations for details on adding an index.
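For example, a migration along these lines would add it (the report_lines table and report_time column names are taken from the code above; adjust to your schema):
class AddIndexToReportLinesReportTime < ActiveRecord::Migration
  def change
    # Lets the BETWEEN filter on report_time use an index instead of a full table scan.
    add_index :report_lines, :report_time
  end
end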
Second, you could trim the result down to only what you need, i.e. instead of selecting all columns, select only time, name, and value:
report_lines = @report.report_lines
               .where("report_time between (?) and (?)", start_date, end_date)
               .select('time, name, value')
Try with:
def download_data
  start_date, end_date = get_start_end_date
  report_lines = @report.report_lines.where("report_time between (?) and (?)", start_date, end_date).select('time, name, value')
  csv_string = CSV.generate do |csv|
    report_lines.each { |row| csv << [row.time, row.name, row.value] }
  end
  respond_to do |format|
    format.csv { send_data(csv_string, :filename => "#{Time.now}.csv", :type => "text/csv") }
  end
end
Check which part takes so much time.
If it's the SQL query, then try to optimize it: check that you have indexes, and maybe you need to force a different index.
If the query is fast but Ruby is slow, then you can try two things:
either use a faster CSV library (like fastest-csv),
or generate as much of the CSV as possible in SQL. What I mean is: instead of creating ActiveRecord objects, fetch just a concatenated string for each row (see the sketch below). If that's better but still too slow, you can end up generating one big string in the database and just rendering it for the user (this solution might not seem good, and it looks quite ugly, but in one project I had to use it because it was the fastest way).
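A rough sketch of that idea, assuming a ReportLine model over a report_lines table and a database that supports concat_ws (PostgreSQL or MySQL); it skips proper CSV quoting, so it only works if the values cannot contain commas or quotes:
# Build each CSV row inside the database so Ruby never instantiates
# ActiveRecord objects for the report lines.
scope = ReportLine
        .where("report_time BETWEEN ? AND ?", start_date, end_date)
        .select("concat_ws(',', time, name, value) AS csv_row")

rows = ActiveRecord::Base.connection.select_values(scope.to_sql)
csv_string = "time,name,value\n" + rows.join("\n") + "\n"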

optimizing reading database and writing to csv file

I'm trying to read a large number of cells from the database (over 100,000) and write them to a CSV file on a VPS Ubuntu server, but the server doesn't have enough memory.
I was thinking about reading 5000 rows at a time and writing them to the file, then reading another 5000, and so on.
How should I restructure my current code so that memory isn't fully consumed?
Here's my code:
def write_rows(emails)
  File.open(file_path, "w+") do |f|
    f << "email,name,ip,created\n"
    emails.each do |l|
      f << [l.email, l.name, l.ip, l.created_at].join(",") + "\n"
    end
  end
end
The function is called from a Sidekiq worker with:
write_rows(user.emails)
Thanks for help!
The problem here is that when you call emails.each, ActiveRecord loads all the records from the database and keeps them in memory. To avoid this, you can use the method find_each:
require 'csv'

BATCH_SIZE = 5000

def write_rows(emails)
  CSV.open(file_path, 'w') do |csv|
    csv << %w{email name ip created}
    emails.find_each do |email|
      csv << [email.email, email.name, email.ip, email.created_at]
    end
  end
end
By default find_each loads records in batches of 1000 at a time; if you want to load batches of 5000 records, you have to pass the option :batch_size to find_each:
emails.find_each(:batch_size => 5000) do |email|
  ...
end
More information about the find_each method (and the related find_in_batches) can be found in the Ruby on Rails Guides.
I've used the CSV class to write the file instead of joining fields and lines by hand. This is not intended to be a performance optimization, since writing to the file shouldn't be the bottleneck here.
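If you specifically want to work in chunks of 5000, the related find_in_batches yields whole batches instead of single records; here is a sketch under the same assumptions as the code above:
CSV.open(file_path, 'w') do |csv|
  csv << %w{email name ip created}
  # Each iteration receives an array of up to 5000 email records.
  emails.find_in_batches(:batch_size => 5000) do |batch|
    batch.each do |email|
      csv << [email.email, email.name, email.ip, email.created_at]
    end
  end
end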

Memory issue with huge CSV Export in Rails

I'm trying to export a large amount of data from a database to a CSV file, but it is taking a very long time and I fear I'll hit major memory issues.
Does anyone know of a better way to export a CSV without the memory build-up? If so, can you show me how? Thanks.
Here's my controller:
def users_export
  File.new("users_export.csv", "w") # creates new file to write to
  @todays_date = Time.now.strftime("%m-%d-%Y")
  @outfile = @todays_date + ".csv"
  @users = User.select('id, login, email, last_login, created_at, updated_at')
  FasterCSV.open("users_export.csv", "w+") do |csv|
    csv << [ @todays_date ]
    csv << [ "id", "login", "email", "last_login", "created_at", "updated_at" ]
    @users.find_each do |u|
      csv << [ u.id, u.login, u.email, u.last_login, u.created_at, u.updated_at ]
    end
  end
  send_file "users_export.csv",
            :type => 'text/csv; charset=iso-8859-1; header=present',
            :disposition => "attachment; filename=#{@outfile}"
end
You're building up one giant string, so you have to keep the entire CSV file in memory. You're also loading all of your users, which will also sit in a bunch of memory. It won't make any difference if you only have a few hundred or a few thousand users, but at some point you will probably need to do two things.
Use
User.find_each do |user|
  csv << [...]
end
This loads users in batches (1000 by default) rather than all of them.
You should also look at writing the csv to a file rather than buffering the entire thing in memory. Assuming you have created a temporary file,
FasterCSV.open('/path/to/file','w') do |csv|
...
end
Will write your csv to a file. You can then use send_file to send it. If you already have a file open, FasterCSV.new(io) should work too.
Lastly, on Rails 3.1 and higher you might be able to stream the CSV file as you create it, but that isn't something I've tried before.
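For reference, here is one way that streaming is commonly sketched out (untested here, and whether it actually streams depends on the web server): Rails treats any object that responds to each as a response body, so an Enumerator can emit the CSV one line at a time while find_each pulls users in batches.
def users_export
  headers['Content-Type'] = 'text/csv; charset=iso-8859-1; header=present'
  headers['Content-Disposition'] = 'attachment; filename="users_export.csv"'
  # The Enumerator is consumed lazily as the response is written out,
  # so the full CSV never sits in memory.
  self.response_body = Enumerator.new do |lines|
    lines << "id,login,email,last_login,created_at,updated_at\n"
    User.select('id, login, email, last_login, created_at, updated_at').find_each do |u|
      lines << FasterCSV.generate_line([u.id, u.login, u.email, u.last_login, u.created_at, u.updated_at])
    end
  end
end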
In addition to the tips on CSV generation, be sure to optimize the call to the database as well.
Select only the columns you need.
@users = User.select('id, login, email, last_login, created_at, updated_at').order('login')
@users.find_each do |user|
  ...
end
If you have, for example, 1000 users and each one also has password, password_salt, city, country, and so on,
then several thousand fewer values are transferred from the database, instantiated as Ruby objects, and finally garbage collected.

Rails 3.1 active record query to an array of arrays for CSV export via FastCSV

I'm attempting to DRY up a method I've been using for a few months:
def export(imagery_requests)
  csv_string = FasterCSV.generate do |csv|
    imagery_requests.each do |ir|
      csv << [ir.id, ir.service_name, ir.description, ir.first_name, ir.last_name, ir.email,
              ir.phone_contact, ir.region, ir.imagery_type, ir.file_type, ir.pixel_type,
              ir.total_images, ir.tile_size, ir.progress, ir.expected_date, ir.high_priority,
              ir.priority_justification, ir.raw_data_location, ir.service_overviews,
              ir.is_def, ir.isc_def, ir.special_instructions, ir.navigational_path,
              ir.fyqueue, ir.created_at, ir.updated_at]
    end
  end
  # send it to the browser with proper headers
  send_data csv_string,
            :type => 'text/csv; charset=iso-8859-1; header=present',
            :disposition => "attachment; filename=requests_as_of-#{Time.now.strftime("%Y%m%d")}.csv"
end
I figured it would be a LOT better if instead of specifying EVERY column manually, I did something like this:
def export(imagery_requests)
  csv_string = FasterCSV.generate do |csv|
    line = []
    imagery_requests.each do |ir|
      csv << ir.attributes.values.each do |i|
        line << i
      end
    end
  end
  # send it to the browser with proper headers
  send_data csv_string,
            :type => 'text/csv; charset=iso-8859-1; header=present',
            :disposition => "attachment; filename=requests_as_of-#{Time.now.strftime("%Y%m%d")}.csv"
end
That should be creating an array of arrays. It works just fine in the Rails console. But in the production environment, it just produces garbage output. I'd much rather make this method extensible so I can add more fields to the ImageryRequest model at a later time. Am I going about this all wrong?
I'm guessing that it probably works in the console when you do it for just one imagery_request, yes?
But when you do multiple, it fails?
Again, I'm guessing that's because you never reset line to an empty array, so you're continually filling a single array.
Try the simple way first, to check that it works, then start going all << on it:
csv_string = FasterCSV.generate do |csv|
  imagery_requests.each do |ir|
    csv << ir.attributes.values.clone
  end
end
PS - in the past I've even used clone on my line-by-line array, just to be sure I wasn't doing anything untoward with persisted stuff...
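On the extensibility point, one possible refinement (not from the original answer) is to drive both the header row and the values from the model's column names, so new fields on ImageryRequest show up in the export automatically:
csv_string = FasterCSV.generate do |csv|
  # Header row straight from the table's columns.
  csv << ImageryRequest.column_names
  imagery_requests.each do |ir|
    # values_at keeps the values in the same order as the header.
    csv << ir.attributes.values_at(*ImageryRequest.column_names)
  end
end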
