Rails 5 - stream file in smaller chunks? - ruby-on-rails

I am sending a file from my Rails server to a microcontroller. The microcontroller is running out of memory because (we believe) the file is being sent in chunks that are too large - up to 16 KB at a time.
How can I take the StringIO object I have from S3 and send it to the requestor in 4 KB chunks?
My current implementation:
file_name = "#{version}.zip"
firmware_file = s3(file_name).get()
response.headers['Content-Length'] = firmware_file.body.size.to_s
send_data firmware_file.body.read, filename: file_name

Rails has the ActionController::Live module, which lets you stream the response in real time. In your case, since you want to send smaller chunks to the client (the microcontroller), this feature might be useful.
The "File System Monitoring" section of the article Is It Live? by Aaron Patterson explains how changes in the file system can be pushed to the client in real time with ActionController::Live.
Hope this helps!

In short, you need to use ActionController::Live in order to stream response data to your client(s).
Since you are transferring zip files, you can use the elegantly simple zipline gem. What I particularly like about this gem is that it supports a large number of streamable object types, so just about anything you throw at it, it will figure out how to stream without much effort on your part.
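From memory of zipline's README (the exact API may differ between versions), wiring it up looks roughly like this; firmware_versions and s3_object_for are placeholders for however you enumerate and fetch the streamable S3 objects:
class DownloadsController < ApplicationController
  include Zipline

  def index
    # Each entry is a [streamable, name-inside-the-archive] pair; zipline
    # works out how to stream each one (S3 objects, files, IOs, ...).
    files = firmware_versions.map { |v| [s3_object_for("#{v}.zip"), "#{v}.zip"] }
    zipline(files, 'firmware.zip')
  end
end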

Related

Need a use case example for stream responses in ChicagoBoss

The ChicagoBoss controller API has this:
{stream, Generator::function(), Acc0}
Stream a response to the client using HTTP chunked encoding. For each
chunk, the Generator function is passed an accumulator (initially Acc0)
and should return either {output, Data, Acc1} or done.
I am wondering what the use case for this is. There are other return types, like json and output. When would stream be useful?
Can someone present a real-world use case?
Serving large files for download might be the most straightforward use case.
You could argue that there are also other ways to serve files so that users can download them, but these might have other disadvantages:
By streaming the file, you don't have to read the entire file into memory before starting to send the response to the client. For small files, you could just read the content of the file, and return it as {output, BinaryContent, CustomHeader}. But that might become tricky if you want to serve large files like disk images.
People often suggest serving downloadable files as static files (e.g. here). However, these downloads bypass all controllers, which might be an issue if you want things like download counters or access restrictions. Caching might be an issue, too.

What's the best way to copy files from Amazon S3 to another FTP?

Every night we generate reports as CSV files from our system and back these files up to Amazon S3.
We then need to copy these files (usually 1-5 files, none larger than 5 MB) from Amazon S3 to another FTP server.
What's the best way to do it? The system is written in Ruby on Rails, and the CSV generation runs every night via cron.
I can upload a single file from my laptop like this:
def upload_to_ftp
  Net::SFTP.start('FTP_IP', 'username', :password => 'password') do |sftp|
    sftp.upload!("/Users/my_name/Downloads/report.csv", "/folder_on_the_ftp/report.csv")
  end
  render :nothing => true
end
But how to upload a few files not from a local hard drive, but from Amazon S3?
Thank you
Perhaps I'm not imaginative enough, but I think you'll need to download it to your server and then upload it to the FTP.
The only piece you're missing is reading from S3; with the ruby aws-sdk it's simple, see: http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/S3Object.html
But if the files grow larger than 5 MB, you can use IO streams.
As far as I know, Net::SFTP#upload! accepts an IO stream as input. This is one side of the equation.
Then use the ruby aws-sdk to download the CSVs using streaming reads (again, see: http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/S3Object.html). So, in one thread, write to 'buffer' (an instance of a class deriving from IO):
s3 = AWS::S3.new
obj = s3.buckets['my-bucket'].objects['key']
obj.read do |chunk|
  buffer.write(chunk)
end
In another thread run the upload using the 'buffer' object as the source.
Note that I haven't used this solution myself but this should get you started.
Also note that you'll be buffering incoming data: unless you write to a temporary file (and have sufficient disk space on the server), you need to cap how much data you keep in the 'buffer' at any one time (i.e. only call #write while you're below that cap).
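A rough, untested sketch of that two-sided setup, using IO.pipe as the 'buffer' and the newer aws-sdk-s3 client rather than the v1 API above; the bucket, key, and connection details are placeholders:
require 'aws-sdk-s3'
require 'net/sftp'

reader, writer = IO.pipe

s3_thread = Thread.new do
  s3 = Aws::S3::Client.new                 # region/credentials from your AWS config
  # Streaming read: the block receives the object one chunk at a time.
  s3.get_object(bucket: 'my-bucket', key: 'reports/report.csv') do |chunk|
    writer.write(chunk)
  end
  writer.close                             # signals end-of-file to the SFTP side
end

Net::SFTP.start('FTP_IP', 'username', password: 'password') do |sftp|
  # Net::SFTP's upload! accepts an IO as the source when the remote name is given.
  sftp.upload!(reader, '/folder_on_the_ftp/report.csv')
end

s3_thread.join
A nice side effect is that IO.pipe applies back-pressure on its own: once its internal buffer fills, writer.write blocks until the SFTP side has drained it, which caps how much of the object sits in memory at once.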
This is Ruby; it's not as if it has first-class support for concurrency.
I'd personally either upload to S3 and SFTP from the same code or, if that is impossible, download the entire CSV file and then upload it to the SFTP server. I'd switch to streams only if this is necessary as an optimization. (Just my $.0002.)
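And a sketch of that simpler download-then-upload route for the nightly job (bucket name, prefix, and FTP details are placeholders):
require 'aws-sdk-s3'
require 'net/sftp'
require 'tempfile'

def copy_reports_to_ftp
  s3 = Aws::S3::Client.new                 # region/credentials from your AWS config

  Net::SFTP.start('FTP_IP', 'username', password: 'password') do |sftp|
    s3.list_objects_v2(bucket: 'my-bucket', prefix: 'reports/').contents.each do |summary|
      Tempfile.create(File.basename(summary.key)) do |tmp|
        # Write the whole object to a temp file on disk (the reports are only a few MB)...
        s3.get_object(bucket: 'my-bucket', key: summary.key, response_target: tmp.path)
        # ...then push that file to the FTP server.
        sftp.upload!(tmp.path, "/folder_on_the_ftp/#{File.basename(summary.key)}")
      end
    end
  end
end
Since the nightly files stay under 5 MB, this temp-file detour is usually the simplest thing that works.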

Rails: How to stream data to and from a binary column (blob)

I have a question about how to efficiently store and retrieve large amounts of data to and from a blob column (data_type :binary). Most examples and code out there show simple assignments, but that cannot be efficient for large amounts of data. For instance, storing data from a file might look something like this:
# assume a model MyFileStore has a column blob_content :binary
my_db_rec = MyFileStore.new
File.open("#{Rails.root}/test/fixtures/alargefile.txt", "rb") do |f|
my_db_rec.blob_content = f.read
end
my_db_rec.save
Clearly this reads the entire file content into memory before saving it to the database. This cannot be the only way to save blobs. For instance, in Java and in .NET there are ways to stream to and from a blob column so you are not pulling everything into memory (see Similar Questions to the right). Is there something similar in Rails? Or are we limited to storing only small chunks of data in blobs when it comes to Rails applications?
If this is Rails 4 you can use render stream: true. Here's an example: Rails 4 Streaming.
I would ask, though, what database you're using, and whether it might be better to store the files in a file store (Amazon S3, Google Cloud Storage, etc.), as this can greatly affect your ability to manage blobs. Microsoft, for example, has this recommendation: To Blob or Not to Blob.
Uploading is generally done through forms, all at once or multi-part. Multi-part chunks the data so you can upload larger files with more confidence. The chunks are reassembled and stored in whatever database field (and type) you have defined in your model.
Downloads can be streamed. There is a strong tendency to hand off uploads and streaming to third-party cloud storage systems like Amazon S3, which drastically reduces the burden on Rails. You can also hand off upload duties to your web server; all modern web servers have a way to stream file uploads from a user. Doing this avoids memory issues, as only the currently uploading chunk is in memory at any given time. The web server should also be able to notify your app once the upload is completed.
For general streaming of output:
To add streaming to a template you need to pass the :stream option from within your controller, like this: render stream: true. You also need to explicitly close the stream with response.stream.close. Since the way templates and layouts are rendered changes with streaming, it is important to pay attention to how attributes like the title are loaded; this needs to be done with content_for, not yield. You can explicitly open and close streams using the Live API; for this you need the puma gem. Also be aware that you need a web server that supports streaming. You can configure Unicorn to support streaming.
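A small sketch of both approaches described above, with made-up controller and model names:
# Template streaming: the layout and template are sent as they render.
class PostsController < ApplicationController
  def index
    @posts = Post.all
    render stream: true
  end
end

# Explicit streaming with the Live API (needs a threaded server such as puma):
class EventsController < ApplicationController
  include ActionController::Live

  def progress
    response.headers['Content-Type'] = 'text/event-stream'
    5.times do |i|
      response.stream.write("step #{i} done\n")
    end
  ensure
    response.stream.close                  # Live streams must be closed explicitly
  end
end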

How to solve the memory-leak with send_file (or send_data) on Heroku?

I have a Rails 3 app that needs to generate an image and send the data to the browser.
The app must be deployed on Heroku.
However, Heroku only supports streaming through Mongrel, which holds on to the memory. This causes the dyno to slow down, and Heroku then kills the thread after a dozen or so requests.
https://devcenter.heroku.com/articles/error-codes#r14-memory-quota-exceeded
I am currently using send_data or send_file from ActionController::DataStreaming
http://api.rubyonrails.org/classes/ActionController/DataStreaming.html#method-i-send_data
Heroku does not support Rack::Sendfile or x-sendfile.
https://devcenter.heroku.com/articles/rack-sendfile
The project "ruby-mongrel-x-sendfile" says: "Streaming very much data through mongrel is a bad thing; springs stringy memory leaks" and provides an "in-mongrel solution". But it doesn't look like a good solution.
http://code.google.com/p/ruby-mongrel-x-sendfile/
A slow solution to this is to upload every file to Amazon S3 first.
Does anyone have any ideas please?
The answer is to start garbage collection with:
GC.start
I placed that line at the bottom of the Rails controller action after send_data.
http://www.ruby-doc.org/core-1.9.3/GC.html
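For concreteness, the workaround sits at the end of the action, roughly like this (generate_image_data is a stand-in for whatever builds your image):
def image
  send_data generate_image_data, type: 'image/png', disposition: 'inline'
  GC.start   # force a collection once the big string has been handed to send_data
end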
The answer is absolutely not to start garbage collection. That masks the poor implementation; your Ruby process will still consume more memory than is, strictly speaking, necessary.
The answer is to stream the response data, i.e. read the data chunk by chunk into memory and flush it through the response body. This way, the maximum memory required to serve the file/data you are sending is limited to the size of the "page" being streamed.
Check out ActionController::Live and read the binary data out in chunks to the client requesting these images.
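A rough sketch of that approach, assuming the generated image ends up on disk first (the path and the chunk size are placeholders):
class ImagesController < ApplicationController
  include ActionController::Live

  def show
    path = Rails.root.join('tmp', 'generated', "#{params[:id]}.png")
    response.headers['Content-Type'] = 'image/png'
    File.open(path, 'rb') do |file|
      # Only one 16 KB chunk is held in memory at any time.
      while (chunk = file.read(16 * 1024))
        response.stream.write(chunk)
      end
    end
  ensure
    response.stream.close
  end
end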

Why is Performing Multi-part Uploads to S3 on iOS not supported?

Problem statement:
I want to upload a large binary (such as an audio clip) from an iOS app to S3, and I'd like to make the app's handling of disconnects (or low connectivity) as robust as possible, preferably by uploading the binary as a series of chunks.
Unfortunately, neither the AWS iOS SDK nor ASI's S3 framework seems to support multi-part uploads, or to indicate that support is planned. I realize that I can initiate a 'longish' upload using beginBackgroundTaskWithExpirationHandler: and that'll give me a window of time to complete the upload (currently 600 seconds, I believe), but what's to be done if I'm not in a situation to complete said upload within that timeframe?
Aside from worrying about completing tasks within that time frame, is there a 'best practice' for how an app should resume uploads, or even just break a larger upload into smaller chunks?
I've thought about writing a library to talk to S3's REST API specifically for multi-part uploads, but this seems like a problem others have either solved already, or realized needn't be solved (perhaps because it's completely inappropriate for the platform).
Another (overly complicated) solution would be chunking the file on the device, uploading those to S3 (or elsewhere) and have them re-assembled on S3 via a server process. This seems even more unpalatable than rolling my own library for multi-part upload.
How are others handling this problem?
Apparently I was looking at some badly out of date documentation.
In AmazonS3Client, see:
- (S3MultipartUpload *)initiateMultipartUploadWithKey:(NSString *)theKey withBucket:(NSString *)theBucket
This gives you an S3MultipartUpload, which contains an uploadId.
You can then put together an S3UploadPartRequest using initWithMultipartUpload:(S3MultipartUpload *)multipartUpload and send it as you usually would.
S3UploadPartRequest has an int property, partNumber, where you specify which part you're uploading.
You can write some code to do this yourself; refer to the code at http://dextercoder.blogspot.in/2012/02/multipart-upload-to-amazon-s3-in-three.html. It's core Java code, but the steps can be applied on iOS.
