failed to allocate memory while downloading large files in rails - ruby-on-rails

Hi all,
I am trying to download a large file in Rails using the send_data function, but I am
getting the error "failed to allocate memory". When I try to download in chunks, I only get a file of the chunk size. Below is my code:
File.open(@containerformat.location, "rb") { |f| @data = f.read(8888) }
ext = File.extname(@containerformat.streamName)
if ext == ''
  extension = File.extname(@containerformat.location)
  send_data(@data, :filename => @containerformat.name + extension,
                   :disposition => 'attachment')
else
  send_data(@data, :filename => @containerformat.streamName,
                   :disposition => 'attachment')
end
I think I am not able to make the loop work.

You are reading the whole file into memory!
Use send_file, which uses a memory-friendly buffered stream.
I would also suggest using :x_sendfile here; then the file may be served directly by the front server (Apache, nginx, lighttpd) if the proper module is available and configured. This gives very efficient downloads and prevents slow clients from blocking the Rails instance.
Read about the "X-Sendfile" header. http://tn123.ath.cx/mod_xsendfile/

Related

How to read a pdf response from Rails app and save to file with parallel tests?

Ok - I have the following in my test/test_helper.rb:
def read_pdf_from_response(response)
  file = Tempfile.new
  file.write response.body.force_encoding('UTF-8')
  begin
    reader = PDF::Reader.new(file)
    reader.pages.map(&:text).join.squeeze("\n")
  ensure
    file.close
    file.unlink
  end
end
I use it like this in an integration test:
get project_path(project, format: 'pdf')
read_pdf_from_response(@response).tap do |pdf|
  assert_match(/whatever/, pdf)
end
This works fine as long as I run a test singly or when running all tests with only one worker, e.g. PARALLEL_WORKERS=1. But tests that use this method will fail intermittently when I run my suite with more than 1 parallel worker. My laptop has 8 cores, so that's normally what it's running with.
Here's the error:
PDF::Reader::MalformedPDFError: PDF malformed, expected 5 but found 96 instead
or sometimes: PDF::Reader::MalformedPDFError: PDF file is empty
The PDF reader is https://github.com/yob/pdf-reader which hasn't given any problems.
The controller that sends the PDF returns like so:
send_file out_file,
          filename: "#{@project.name}.pdf",
          type: 'application/pdf',
          disposition: (params[:download] ? 'attachment' : 'inline')
I can't see why this isn't working. No files should ever have the same name at the same time, since I'm using Tempfile, right? How can I make all this run with parallel tests?
While I cannot confirm why this is happening, the issue may be that:
You are forcing the encoding to "UTF-8", but PDF documents are binary files, so this conversion could be damaging the PDF.
Some of the responses you are receiving are truly empty or malformed.
Maybe try this instead:
def read_pdf_from_response(response)
  doc = StringIO.new(response.body.to_s)
  begin
    PDF::Reader.new(doc)
               .pages
               .map(&:text)
               .join
               .squeeze("\n")
  rescue PDF::Reader::MalformedPDFError => e
    # handle issues with the PDF itself
  end
end
This will avoid the file system altogether while still using a compatible IO object and will make sure that the response is read as binary to avoid any conversion conflicts.

rails - Exporting a huge CSV file consumes all RAM in production

So my app exports an 11.5 MB CSV file and uses basically all of the RAM, which never gets freed.
The data for the CSV is taken from the DB, and in the case mentioned above the whole thing is being exported.
I am using the Ruby 2.4.1 standard CSV library in the following fashion:
export_helper.rb:
CSV.open('full_report.csv', 'wb', encoding: 'UTF-8') do |file|
  data = Model.scope1(param).scope2(param).includes(:model1, :model2)
  data.each do |item|
    file << [
      item.method1,
      item.method2,
      item.method3
    ]
  end
  # repeat for other models - approx. 5 other similar loops
end
and then in the controller:
generator = ExportHelper::ReportGenerator.new
generator.full_report
respond_to do |format|
  format.csv do
    send_file(
      "#{Rails.root}/full_report.csv",
      filename: 'full_report.csv',
      type: :csv,
      disposition: :attachment
    )
  end
end
After a single request the Puma processes consume 55% of the whole server's RAM and stay like that until they eventually run out of memory completely.
For instance, in this article generating a million-line, 75 MB CSV file required only 1 MB of RAM. But there is no DB querying involved.
The server has 1015 MB RAM + 400 MB of swap memory.
So my questions are:
What exactly consumes so much memory? Is it the CSV generation or the communication with the DB?
Am I doing something wrong and missing a memory leak? Or is it just how the library works?
Is there a way to free up the memory without restarting the Puma workers?
Thanks in advance!
Instead of each you should be using find_each, which is specifically for cases like this, because it will instantiate the Models in batches and release them afterwards, whereas each will instantiate all of them at once.
CSV.open('full_report.csv', 'wb', encoding: 'UTF-8') do |file|
  Model.scope1(param).find_each do |item|
    file << [
      item.method1
    ]
  end
end
Furthermore you should stream the CSV instead of writing it to memory or disk before sending it to the browser:
format.csv do
  headers["Content-Type"] = "text/csv"
  headers["Content-Disposition"] = "attachment; filename=\"full_report.csv\""
  # streaming headers
  # nginx doc: setting X-Accel-Buffering to "no" allows unbuffered responses suitable for Comet and HTTP streaming applications
  headers['X-Accel-Buffering'] = 'no'
  headers["Cache-Control"] ||= "no-cache"
  # Rack::ETag 2.2.x no longer respects 'Cache-Control'
  # https://github.com/rack/rack/commit/0371c69a0850e1b21448df96698e2926359f17fe#diff-1bc61e69628f29acd74010b83f44d041
  headers["Last-Modified"] = Time.current.httpdate
  headers.delete("Content-Length")
  response.status = 200

  header = ['Method 1', 'Method 2']
  csv_options = { col_sep: ";" }

  csv_enumerator = Enumerator.new do |y|
    y << CSV::Row.new(header, header).to_s(csv_options)
    Model.scope1(param).find_each do |item|
      y << CSV::Row.new(header, [item.method1, item.method2]).to_s(csv_options)
    end
  end

  # setting the body to an enumerator; Rails will iterate this enumerator
  self.response_body = csv_enumerator
end
Apart from using find_each, you should try running the ReportGenerator code in a background job with ActiveJob. As background jobs run in separate processes, the memory is released back to the OS when they are killed.
So you could try something like this (a rough job sketch follows the list):
A user requests some report (CSV, PDF, Excel).
Some controller enqueues a ReportGeneratorJob, and a confirmation is displayed to the user.
The job is performed and an email is sent with the download link/file.
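A rough sketch of such a job, assuming a Rails 5 style ApplicationJob base class; ReportMailer is a hypothetical mailer, and the CSV path mirrors the one used in the question:
class ReportGeneratorJob < ApplicationJob
  queue_as :default

  def perform(user_id)
    # Generate the CSV in a separate process so memory is returned to the OS when the job finishes
    ExportHelper::ReportGenerator.new.full_report
    # Notify the user; ReportMailer and its report_ready method are assumptions
    ReportMailer.report_ready(user_id, "#{Rails.root}/full_report.csv").deliver_later
  end
end

# Enqueued from the controller:
ReportGeneratorJob.perform_later(current_user.id)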
Beware though: you can easily improve the ActiveRecord side, but when sending the response through Rails it will all end up in a memory buffer in the Response object: https://github.com/rails/rails/blob/master/actionpack/lib/action_dispatch/http/response.rb#L110
You also need to make use of the live streaming feature to pass the data to the client directly without buffering: https://guides.rubyonrails.org/action_controller_overview.html#live-streaming-of-arbitrary-data
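A minimal sketch of that approach with ActionController::Live, reusing the Model scopes from the question; params[:param] and the column separator are assumptions:
require 'csv'

class ReportsController < ApplicationController
  include ActionController::Live

  def full_report
    response.headers['Content-Type'] = 'text/csv'
    response.headers['Content-Disposition'] = 'attachment; filename="full_report.csv"'
    # Each write is sent to the client immediately instead of being buffered
    response.stream.write CSV.generate_line(['Method 1', 'Method 2'], col_sep: ';')
    Model.scope1(params[:param]).find_each do |item|
      response.stream.write CSV.generate_line([item.method1, item.method2], col_sep: ';')
    end
  ensure
    # The stream must always be closed, otherwise the connection stays open
    response.stream.close
  end
end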

Sending zipped files as attachments

I have a Rails application that creates a couple of CSV files, zips them up, and sends them to the client as an attachment (for download) using this line:
send_file t.path, :x_sendfile => true, :type => 'application/zip', :filename => "invited_friends_stats.zip"
When I view the zipped file created on the server I'm able to use it; however, when I download the file through the application, it uncompresses into a .zip.cpgz file, which in turn compresses into a zip file, which uncompresses into a .zip.cpgz file, and so on.
I then downloaded "The Unarchiver" app (on Mac OS X), and when I try to open the .zip file I get an error: "the contents cannot be extracted with this program".
Does anyone have any idea why this is happening? Encoding error, etc? Is there something I'm missing from the line above, or in my configuration that would fix this?
You are trying to stream the ZIP file. Try adding :disposition => 'attachment' to force the browser to download the complete file.
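For example, the original send_file call with the disposition added (a sketch; the other arguments are unchanged from the question):
send_file t.path, :x_sendfile => true, :type => 'application/zip',
          :disposition => 'attachment', :filename => "invited_friends_stats.zip"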
Try setting the Content-Disposition response header with something like this
response.headers['Content-Disposition'] = "attachment; filename=\"#{@filename}\""

Rails send_file alternative with Rackspace Cloudfiles

I am trying to set up a download link for a file management system built on Rails 3 using the paperclip-cloudfiles gem. The send_file method works great when hosting files locally, but I need to use the Rackspace Cloud Files system. I've tried setting the response headers and it seems to initialize the download, but the file is empty when finished.
Here is my download function:
@file = UserFile.find(params[:id])
response.headers['Content-Type'] = "#{@file.attachment_content_type}"
response.headers['Content-Disposition'] = "attachment;filename=\"#{@file.attachment_file_name}\""
response.headers['Content-Length'] = "#{@file.attachment_file_size}"
response.headers['Content-Description'] = 'File Transfer'
response.headers['Location'] = "#{@file.attachment.url(:original, false)}"
render :nothing => true
Am I doing this right?
I've also tried using just the ruby-cloudfiles library from Rackspace to download the object, but no luck there either.
Use "send_data" method.
It works for me.
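A rough sketch of what that could look like here, assuming the Cloud Files attachment URL is readable by the app; note this loads the whole object into memory, so it is only reasonable for small files:
require 'open-uri'

@file = UserFile.find(params[:id])
# Fetch the object body from its URL (assumption: the URL is accessible to the app)
data = open(@file.attachment.url(:original, false)).read
send_data data,
          :filename    => @file.attachment_file_name,
          :type        => @file.attachment_content_type,
          :disposition => 'attachment'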

ruby reading files from S3 with open-URI

I'm having some problems reading a file from S3. I want to be able to load the ID3 tags remotely, but using open-URI doesn't work, it gives me the following error:
ruby-1.8.7-p302 > c=TagLib2::File.new(open(URI.parse("http://recordtemple.com.s3.amazonaws.com/music/745/original/The%20Stranger.mp3?1292096514")))
TypeError: can't convert Tempfile into String
from (irb):8:in `initialize'
from (irb):8:in `new'
from (irb):8
However, if I download the same file and put it on my desktop (i.e. no need for open-URI), it works just fine.
c=TagLib2::File.new("/Users/momofwombie/Desktop/blah.mp3")
Is there something else I should be doing to read a remote file?
UPDATE: I just found this link, which may explain a little bit, but surely there must be some way to do this...
Read header data from files on remote server
You might want to check out AWS::S3, a Ruby library for Amazon's Simple Storage Service.
Do an AWS::S3::S3Object.find for the file and then use its about method to retrieve the metadata.
This solution assumes you have the AWS credentials and permission to access the S3 bucket that contains the files in question.
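A rough sketch with the classic aws-s3 gem, assuming credentials are configured; the bucket and key below are read off the URL in the question and the placeholder keys are yours to fill in:
require 'aws/s3'

AWS::S3::Base.establish_connection!(
  :access_key_id     => 'YOUR_KEY',
  :secret_access_key => 'YOUR_SECRET'
)
object = AWS::S3::S3Object.find('music/745/original/The Stranger.mp3', 'recordtemple.com')
# about returns the object's metadata headers, e.g. size and content type
puts object.about['content-length']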
TagLib2::File.new doesn't take a file handle, which is what you are passing to it when you use open without a read.
Add read and you'll get the contents of the URL, but TagLib2::File doesn't know what to do with that either, so you are forced to read the contents of the URL and save them to a file.
I also noticed you are unnecessarily complicating your use of OpenURI. You don't have to parse the URL using URI before passing it to open. Just pass the URL string.
require 'open-uri'

fname = File.basename($0) << '.' << $$.to_s
File.open(fname, 'wb') do |fo|
  fo.print open("http://recordtemple.com.s3.amazonaws.com/music/745/original/The%20Stranger.mp3?1292096514").read
end
c = TagLib2::File.new(fname)
# do more processing...
File.delete(fname)
I don't have TagLib2 installed but I ran the rest of the code and the mp3 file downloaded to my disk and is playable. The File.delete would clean up afterwards, which should put you in the state you want to be in.
This solution isn't going to work much longer. Paperclip > 3.0.0 has removed to_file. I'm using S3 & Heroku. What I ended up doing was copying the file to a temporary location and parsing it from there. Here is my code:
dest = Tempfile.new(upload.spreadsheet_file_name)
dest.binmode
upload.spreadsheet.copy_to_local_file(:default_style, dest.path)
file_loc = dest.path
...
CSV.foreach(file_loc, :headers => true, :skip_blanks => true) do |row|
  # process each row
end
This seems to work instead of open-URI:
Mp3Info.open(mp3.to_file.path) do |mp3info|
  puts mp3info.tag.artist
end
Paperclip has a to_file method that downloads the file from S3.
