I'm trying to export a large amount of data from a database to a csv file, but it is taking a very long time and I fear I'll have major memory issues.
Does anyone know of a better way to export a CSV without the memory build-up? If so, can you show me how? Thanks.
Here's my controller:
def users_export
  File.new("users_export.csv", "w") # creates new file to write to
  @todays_date = Time.now.strftime("%m-%d-%Y")
  @outfile = @todays_date + ".csv"
  @users = User.select('id, login, email, last_login, created_at, updated_at')
  FasterCSV.open("users_export.csv", "w+") do |csv|
    csv << [ @todays_date ]
    csv << [ "id", "login", "email", "last_login", "created_at", "updated_at" ]
    @users.find_each do |u|
      csv << [ u.id, u.login, u.email, u.last_login, u.created_at, u.updated_at ]
    end
  end
  send_file "users_export.csv",
            :type => 'text/csv; charset=iso-8859-1; header=present',
            :disposition => "attachment; filename=#{@outfile}"
end
You're building up one giant string, so you have to keep the entire csv file in memory. You're also loading all of your users, which will also take up a bunch of memory. It won't make any difference if you only have a few hundred or a few thousand users, but at some point you will probably need to do two things.
Use
User.find_each do |user|
  csv << [...]
end
This loads users in batches (1000 by default) rather than all of them.
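If the default batch size doesn't suit you, it can be overridden; a minimal illustration (the column list here is just an example, not from the question):
User.find_each(:batch_size => 5000) do |user|
  csv << [user.id, user.login, user.email]
end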
You should also look at writing the csv to a file rather than buffering the entire thing in memory. Assuming you have created a temporary file,
FasterCSV.open('/path/to/file','w') do |csv|
...
end
This will write your csv to a file. You can then use send_file to send it. If you already have a file open, FasterCSV.new(io) should work too.
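Putting those pieces together, a rough sketch of the temp-file approach might look like this (the Tempfile usage and the column list are my assumptions, not code from the question):
require 'tempfile'

def users_export
  file = Tempfile.new(['users_export', '.csv']) # temporary file to hold the csv
  FasterCSV.open(file.path, 'w') do |csv|
    csv << %w[id login email last_login created_at updated_at]
    # find_each keeps only one batch of users in memory at a time
    User.select('id, login, email, last_login, created_at, updated_at').find_each do |u|
      csv << [u.id, u.login, u.email, u.last_login, u.created_at, u.updated_at]
    end
  end
  send_file file.path, :type => 'text/csv', :disposition => 'attachment',
                       :filename => 'users_export.csv'
end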
Lastly, on rails 3.1 and higher you might be able to stream the csv file as you create it, but that isn't something I've tried before.
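As a very rough, untested sketch of that streaming idea: on newer Rails you can assign an Enumerator to response_body so rows are written out as they are generated (the column list is illustrative):
require 'csv' # for Array#to_csv

def users_export
  headers['Content-Type'] = 'text/csv'
  headers['Content-Disposition'] = 'attachment; filename="users_export.csv"'
  # response_body accepts anything that responds to #each, so rows are sent
  # as the enumerator yields them instead of building one big string first
  self.response_body = Enumerator.new do |lines|
    lines << %w[id login email].to_csv
    User.find_each do |u|
      lines << [u.id, u.login, u.email].to_csv
    end
  end
end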
In addition to the tips on csv generation, be sure to optimize the database call as well.
Select only the columns you need.
@users = User.select('id, login, email, last_login, created_at, updated_at').order('login')
@users.find_each do |user|
  ...
end
If you have, for example, 1000 users, and each has password, password_salt, city, country, and so on,
then several thousand fewer objects are transferred from the database, created as Ruby objects, and finally garbage collected.
Related
I have created a CSV downloader in a controller like this
format.csv do
  @records = Model.all
  headers['Content-Disposition'] = "attachment; filename=\"products.csv\""
  headers['Content-Type'] ||= 'text/csv'
end
Now I want to use server-sent events to download the CSV from this, for optimisation purposes. I know I can do this in Rails using ActionController::Live, but I have no experience with it.
Can someone explain to me how I can:
Query records in batches
Add records to the stream
Handle SSE from the browser side
Write records to CSV files
Correct me if any of my assumptions are wrong. Help me do this in a better way. Thanks.
Mongoid automatically queries your records in batches (more info over here).
To add your records to a CSV file, you should do something like:
records = MyModel.all
# By default batch_size is 100, but you can modify it using .batch_size(x)
result = CSV.generate do |csv|
  csv << ["attribute1", "attribute2", ...]
  records.each do |r|
    csv << [r.attribute1, r.attribute2, ...]
  end
end
send_data result, filename: 'MyCsv.csv'
Remember that send_data is an ActionController method!
I think you don't need SSE for generating a CSV. Just include ActionController::Live in the controller and use response.stream.write while iterating over your collection:
include ActionController::Live
...
def some_action
  format.csv do
    # Needed for streaming to workaround Rack 2.2 bug
    response.headers['Last-Modified'] = Time.now.httpdate
    headers['Content-Disposition'] = "attachment; filename=\"products.csv\""
    headers['Content-Type'] ||= 'text/csv'
    [1, 2, 3, 4].each do |i| # --> change it to iterate your DB records
      response.stream.write ['SOME', 'thing', "interesting #{i}", "#{Time.zone.now}"].to_csv
      sleep 1 # some fake delay to see chunking
    end
  ensure
    response.stream.close
  end
end
Try it with curl or similar to see the output line by line:
$ curl -i http://localhost:3000/test.csv
I have a million records and I want to export that data to CSV. I used the find_each method to fetch the records, but it is still taking too much time to fetch the data and download the CSV. I am not able to do any other activity in the application because it's taking up too much memory; the browser just shows the page loading.
I have written the following code in the controller
def export_csv
  require 'csv'
  lines = []
  User.where(status: ACTIVE).order('created_at desc').find_each(batch_size: 10000) do |user|
    csv_vals = []
    csv_vals << user.email if user.email.present?
    csv_vals << user.name if user.name.present?
    # ....... etc.
    lines << CSV.generate_line(csv_vals)
  end
  send_data(lines.join, type: 'text/csv; charset=iso-8859-1; header=present',
            disposition: "attachment; filename=file123.csv")
end
Is there another way to load millions of records and download them quickly?
This may help:
Generating and streaming potentially large CSV files using Ruby on Rails
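Along those lines, a very rough sketch of streaming the rows instead of accumulating them in a lines array. This assumes ActionController::Live (Rails 4+) and reuses the names from the question; it is not code taken from the linked post:
class UsersController < ApplicationController
  include ActionController::Live

  def export_csv
    require 'csv'
    response.headers['Content-Type'] = 'text/csv'
    response.headers['Content-Disposition'] = 'attachment; filename="file123.csv"'
    # Each batch is fetched, turned into CSV lines, and written to the client,
    # so only one batch of records is held in memory at a time.
    User.where(status: ACTIVE).order('created_at desc').find_each(batch_size: 10000) do |user|
      response.stream.write CSV.generate_line([user.email, user.name])
    end
  ensure
    response.stream.close
  end
end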
We are trying to generate a CSV file for users to download. However, it's extremely slow, about 5 minutes for 10k lines of CSV.
Any good idea on improving? Code below:
def download_data
  start_date, end_date = get_start_end_date
  report_lines = @report.report_lines.where("report_time between (?) and (?)", start_date, end_date)
  csv_string = CSV.generate do |csv|
    report_lines.each do |report_data|
      csv << [report_data.time, report_data.name, report_data.value]
    end
  end
  respond_to do |format|
    format.csv { send_data(csv_string, :filename => "#{Time.now}.csv", :type => "text/csv") }
  end
end
I would start by checking whether report_time is indexed; an unindexed report_time is certainly going to contribute to the slowness. Refer to the Active Record Migration guide for details on adding an index.
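A minimal sketch of such a migration, assuming the table is called report_lines (the migration class name is made up for illustration):
class AddIndexToReportLinesReportTime < ActiveRecord::Migration
  def change
    # index the column used in the BETWEEN condition
    add_index :report_lines, :report_time
  end
end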
Second, you could trim down the result to only what you need, i.e. instead of selecting all columns, select only time, name, and value:
report_lines = @report.report_lines
  .where("report_time between (?) and (?)", start_date, end_date)
  .select('time, name, value')
Try with:
def download_data
  start_date, end_date = get_start_end_date
  report_lines = @report.report_lines.where("report_time between (?) and (?)", start_date, end_date).select('time, name, value')
  csv_string = CSV.generate do |csv|
    report_lines.each { |row| csv << [row.time, row.name, row.value] }
  end
  respond_to do |format|
    format.csv { send_data(csv_string, :filename => "#{Time.now}.csv", :type => "text/csv") }
  end
end
Check which part takes so much time.
If it's the SQL query, then try to optimize it: check that you have indexes; maybe you need to force another index.
If the query is fast but Ruby is slow, then you can try two things:
either try some faster CSV lib (like fastest-csv)
or generate as much of the CSV as possible in SQL. What I mean is: instead of creating ActiveRecord objects, fetch just a concatenated string for each row (see the sketch below). If that's better but still too slow, you can end up generating just one big string in the database and simply rendering it for the user (this solution might not seem good, and it looks quite ugly, but in one project I had to use it because it was the fastest way)
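As a rough sketch of that second idea, assuming a ReportLine model over the report_lines table from the question and PostgreSQL/SQLite-style || string concatenation (CSV quoting of values is deliberately ignored here):
sql = ReportLine
  .where("report_time between (?) and (?)", start_date, end_date)
  .select("time || ',' || name || ',' || value")
  .to_sql
# select_values returns plain strings, so no ActiveRecord objects are built
rows = ActiveRecord::Base.connection.select_values(sql)
csv_string = rows.join("\n")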
I'm trying to read a large number of cells from the database (over 100,000) and write them to a csv file on a VPS Ubuntu server. It turns out the server doesn't have enough memory.
I was thinking about reading 5000 rows at once and writing them to the file, then reading another 5000, and so on.
How should I restructure my current code so that memory won't be consumed fully?
Here's my code:
def write_rows(emails)
  File.open(file_path, "w+") do |f|
    f << "email,name,ip,created\n"
    emails.each do |l|
      f << [l.email, l.name, l.ip, l.created_at].join(",") + "\n"
    end
  end
end
The function is called from a Sidekiq worker by:
write_rows(user.emails)
Thanks for help!
The problem here is that when you call emails.each, ActiveRecord loads all the records from the database and keeps them in memory. To avoid this you can use the method find_each:
require 'csv'
BATCH_SIZE = 5000
def write_rows(emails)
  CSV.open(file_path, 'w') do |csv|
    csv << %w{email name ip created}
    emails.find_each do |email|
      csv << [email.email, email.name, email.ip, email.created_at]
    end
  end
end
By default, find_each loads records in batches of 1000 at a time. If you want to load batches of 5000 records, you have to pass the option :batch_size to find_each:
emails.find_each(:batch_size => 5000) do |email|
  ...
end
More information about the find_each method (and the related find_in_batches) can be found on the Ruby on Rails Guides.
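For comparison, a minimal sketch of the related find_in_batches, which yields whole batches (arrays of records) instead of single records (column names follow the example above):
emails.find_in_batches(:batch_size => 5000) do |batch|
  batch.each do |email|
    csv << [email.email, email.name, email.ip, email.created_at]
  end
end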
I've used the CSV class to write the file instead of joining fields and lines by hand. This is not intended as a performance optimization, since writing to the file shouldn't be the bottleneck here.
I currently have a controller that handles a call to export a table into a CSV file using the FasterCSV gem. The problem is that the information stored in the database isn't always clear, so I want to change the output for a particular column.
My project.status column, for instance, has numbers instead of statuses, i.e. 1 in the database corresponds to Active, 2 to Inactive, and 0 to Not Yet Decided. When I export the table it shows 0, 1, 2 instead of Active, Inactive, or Not Yet Decided. Any idea how to implement this?
I tried a simple loop that would check the final generated CSV file and change each 0, 1, 2 to its corresponding output, but the problem is that every other column that had a 0, 1, or 2 would change as well. I'm not sure how to isolate the column.
Thanks in advance
def csv
  qt = params[:selection]
  @lists = Project.find(:all, :order => (params[:sort] + ' ' + params[:direction]), :conditions => ["name LIKE ? OR description LIKE ?", "%#{qt}%", "%#{qt}%"])
  csv_string = FasterCSV.generate(:encoding => 'u') do |csv|
    csv << ["Status", "Name", "Summary", "Description", "Creator", "Comment", "Contact Information", "Created Date", "Updated Date"]
    @lists.each do |project|
      csv << [project.status, project.name, project.summary, project.description, project.creator, project.statusreason, project.contactinfo, project.created_at, project.updated_at]
    end
  end
  filename = Time.now.strftime("%Y%m%d") + ".csv"
  send_data(csv_string,
            :type => 'text/csv; charset=UTF-8; header=present',
            :filename => filename)
end
This is actually fairly easy. In your controller code:
# app/controllers/projects_controller.rb#csv
@lists.each do |project|
  csv << [project.descriptive_status, project.name, project.summary, project.description, project.creator, project.statusreason, project.contactinfo, project.created_at, project.updated_at]
end
Then in your model code (you probably already have a method that decodes the DB status to a more descriptive one, though):
# app/models/project.rb
ACTIVE_STATUS = 1
INACTIVE_STATUS = 2
NOT_YET_DECIDED_STATUS = 0

def descriptive_status
  case status
  when ACTIVE_STATUS
    "Active"
  when INACTIVE_STATUS
    "Inactive"
  when NOT_YET_DECIDED_STATUS
    "Not Yet Decided"
  end
end
There are probably a number of ways you can then refactor this. In the controller at least, it would probably be best to make that finder a more descriptive named scope. The constants in the model could be brought into SettingsLogic configuration or another similar gem.
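For example, a hedged sketch of the named-scope refactor (Rails 3 syntax; the scope name is made up for illustration):
# app/models/project.rb
scope :matching, lambda { |q|
  where("name LIKE ? OR description LIKE ?", "%#{q}%", "%#{q}%")
}

# app/controllers/projects_controller.rb#csv
@lists = Project.matching(params[:selection]).order("#{params[:sort]} #{params[:direction]}")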