Streaming Download while File is Created - ruby-on-rails

I was wondering if anyone knows how to stream a file download while it's being created at the same time.
I'm generating a huge CSV export, and right now it takes a couple of minutes for the file to be created. Only once it's created does the browser start downloading the file.
I want to change this so that the browser starts downloading the file while it's being created. Watching the progress bar, users will be more willing to wait. Even though it would show "Unknown time remaining", I'm less likely to get impatient since I know data is being steadily downloaded.
NOTE: I'm using Rails version 3.0.9
Here is my code:
def users_export
  File.new("users_export.csv", "w") # creates a new file to write to
  @todays_date = Time.now.strftime("%m-%d-%Y")
  @outfile = @todays_date + ".csv"
  @users = User.select('id, login, email, last_login, created_at, updated_at')
  FasterCSV.open("users_export.csv", "w+") do |csv|
    csv << [ @todays_date ]
    csv << [ "id", "login", "email", "last_login", "created_at", "updated_at" ]
    @users.find_each(:batch_size => 100) do |u|
      csv << [ u.id, u.login, u.email, u.last_login, u.created_at, u.updated_at ]
    end
  end
  send_file "users_export.csv",
            :type => 'text/csv; charset=iso-8859-1; header=present',
            :disposition => "attachment; filename=#{@outfile}",
            :stream => true
end

I sought an answer to this question several weeks ago. I thought that if data was being streamed back to the client then maybe Heroku wouldn't time out one of my long-running API calls after 30 seconds. I even found an answer that looked promising:
format.xml do
  self.response_body = lambda { |response, output|
    output.write("<?xml version='1.0' encoding='UTF-8' ?>")
    output.write("<results type='array' count='#{@report.count}'>")
    @report.each do |result|
      output.write(<<-XML)
        <result>
          <element-1>Data-1</element-1>
          <element-2>Data-2</element-2>
          <element-n>Data-N</element-n>
        </result>
      XML
    end
    output.write("</results>")
  }
end
The idea being that the response_body lambda has direct access to the output buffer going back to the client. In practice, however, Rack has its own ideas about what data should be sent back and when. Furthermore, this response-body-as-lambda pattern is deprecated in newer versions of Rails, and I think support is dropped outright in 3.2. You could get your hands dirty in the middleware stack and write this output as a Rails Metal, but...
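For what it's worth, the supported replacement in Rails 3.1+ is to assign response_body any object that responds to each, such as an Enumerator. Below is a minimal sketch of that pattern applied to the question's CSV export (the column list is trimmed for brevity, and FasterCSV is plain CSV on Ruby 1.9); note that whether bytes actually reach the client incrementally still depends on your server and any buffering middleware:
def users_export
  headers["Content-Type"] = "text/csv"
  headers["Content-Disposition"] = 'attachment; filename="users_export.csv"'
  # In Rails 3.1+, any object responding to #each can serve as the body.
  self.response_body = Enumerator.new do |lines|
    lines << FasterCSV.generate_line(%w[id login email])
    User.find_each(:batch_size => 100) do |u|
      lines << FasterCSV.generate_line([u.id, u.login, u.email])
    end
  end
end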
If I may be so bold, I strongly suggest refactoring this work into a background job (a sketch follows the list below). The benefits are many:
Your users will not have to just sit and wait for the download. They can request a file and then browse away to other more exciting portions of your site.
The file generation and download will be more robust. For example, if a user loses internet connectivity, even briefly, on minute three of a download under the current setup, they lose all that time and have to start over. If the file is generated in the background on your site, they only need connectivity for as long as it takes to start the job.
It will decrease the load on your front-end processes and may decrease the load on your site in total if the background job generates the files and you provide links to the generated files on a page within your app. Chances are one file generation could serve several downloads.
Since practically all Rails web servers are single-threaded and synchronous out of the box, an entire app server process will be tied up by this one file download every time a user requests it. This makes it easy for users to accidentally carry out a DoS attack on your site.
You can ship the background generated file to a CDN such as S3 and perhaps gain a performance boost on the download speed your users see.
When the background process is done you can notify the user via email so they don't even have to be at the computer where they initiated the file generation in order to know it's done.
Once you have a background job system in your application, you will find many more uses for it, such as sending email or updating search indexes.
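To make that concrete, here is a hedged sketch using delayed_job; UsersExportJob, ExportMailer, and the tmp path are all hypothetical names, and the column list is trimmed:
# Hypothetical delayed_job worker: builds the CSV outside the request
# cycle, then notifies the user. Ship the file to S3 where indicated.
class UsersExportJob < Struct.new(:user_id)
  def perform
    path = Rails.root.join("tmp", "users_export_#{user_id}.csv").to_s
    FasterCSV.open(path, "w") do |csv|
      csv << %w[id login email]
      User.find_each(:batch_size => 100) do |u|
        csv << [u.id, u.login, u.email]
      end
    end
    # e.g. upload the file to S3 here, then:
    ExportMailer.export_ready(user_id, path).deliver
  end
end
# Enqueued from the controller:
# Delayed::Job.enqueue(UsersExportJob.new(current_user.id))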
Sorry that this doesn't really answer your original question. But I strongly believe this is a better overall solution.

Related

Can I use http streaming with axlsx_rails to avoid timeout issue with large/time intensive query?

I'm using the axlsx_rails Ruby gem in Rails 4.2.5 to generate an Excel file to let users download their data.
I have this in my index.xlsx.axlsx template:
wb = xlsx_package.workbook
wb.add_worksheet(name: 'Transactions') do |sheet|
  sheet.add_row ["Date", "Vendor Name", "Account",
                 "Transaction Category",
                 "Amount Spent", "Description"]
  @transactions.find_each(batch_size: 100) do |transaction|
    sheet.add_row [transaction.transaction_date,
                   transaction.vendor_name,
                   transaction.account.account_name,
                   transaction.transaction_category.name,
                   transaction.amount,
                   transaction.description]
  end
end
The page times out before returning an Excel file if there's enough data. Is there a way to use HTTP streaming to send results back as they are generated, rather than waiting until the entire @transactions.find_each loop has completed?
I saw code here using response.stream.write:
# Note: response.stream requires ActionController::Live (Rails 4+).
response.headers['Content-Type'] = 'text/event-stream'
10.times {
  response.stream.write "This is a test message"
  sleep 1
}
response.stream.close
That approach looks promising, but I couldn't figure out how to integrate response.stream.write into an axlsx_rails template. Is there a way?
This is my first Stack Overflow question; apologies for any faux pas, and thank you for any ideas you can offer.
Welcome to SO, Joe.
I asked in a comment, but perhaps it's better to answer and explain.
The short answer is yes: you can always stream if you can render (though sometimes with mixed performance results).
It does not, however, work if you're referencing a file directly, e.g. http://someurl.com/reports/mycustomreport.xlsx.
Streaming in Rails just isn't built that way by default. Not to worry, though: you "should" still be able to tackle your issue, provided the time you wish to save is spent in rendering only.
In your controller (note for the future: when you're asking about rendering actions, it helps to provide your controller action code), you should be able to do something similar to:
def report
  @transactions = current_user.transactions.all
  respond_to do |format|
    format.html { render xlsx: 'report', stream: true }
  end
end
It might help to do a sanity check on where the time goes. In your log, as part of the 200 response, you should see something like:
Completed 200 OK in 506ms (Views: 494.6ms | ActiveRecord: 2.8ms)
If the ActiveRecord number is too high, or higher than the view number, this solution might not work for your query, and as suggested, the work might need to be threaded or sent to a job.
Even if you can stream, I don't think it will be any faster. The problem is that Axlsx will not generate your spreadsheet until you are done building it, and axlsx_rails just wraps that process, so it won't help either. There will be no partial spreadsheet to serve in bits, and the delay will be just as long.
You should bite the bullet and try Sidekiq (which is very fast) or some other job scheduler. Then you can return the request immediately and generate the spreadsheet in the background. You will have to add some kind of monitoring or notification to deliver the generated report, or a ping back to another URL using JavaScript that forwards to a new page when a flag is set on render complete. Your call there.
Having a job scheduler is also very convenient when you need to fire off an email in response to a request; the response can return immediately and not wait for the email to complete. Once you have a scheduler you will find more uses for it.
If you choose a job scheduler, axlsx_rails will let you use your template to generate the attachment, or you can create your own view context to generate the file. Or for a really bare-bones way of rendering the template, see this test.
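For illustration, here is a minimal sketch of that background route which builds the workbook with the axlsx gem directly instead of going through the Rails template; ReportJob, the trimmed column set, and the tmp path are all hypothetical:
# Hypothetical Sidekiq job: builds the same worksheet with the axlsx gem
# and leaves the file on disk for a later download link or email.
class ReportJob
  include Sidekiq::Worker

  def perform(user_id)
    transactions = User.find(user_id).transactions
    package = Axlsx::Package.new
    package.workbook.add_worksheet(name: 'Transactions') do |sheet|
      sheet.add_row ["Date", "Vendor Name", "Amount Spent"]
      transactions.find_each(batch_size: 100) do |t|
        sheet.add_row [t.transaction_date, t.vendor_name, t.amount]
      end
    end
    package.serialize(Rails.root.join("tmp", "report_#{user_id}.xlsx").to_s)
  end
end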

Lengthy operation in Ruby-on-Rails

I am faced with the following situation.
A user clicks a link in order to generate a text or XML file. I have a method generateXMLfile in my controller which reads data from a DB table and builds a hash or array. After reading and building are finished, I send the data using the send_data method.
The file generation process may take between 5 and 25 seconds (huge data), so what I want to do is display a "Please wait" message with a waiting GIF animation while the request is being processed, and display a success message upon successful completion.
I know how to implement similar operations, such as a file upload, using pure AJAX, but I don't know how to do it in Rails.
Has anyone dealt with a similar problem? What is the best practice or the Rails way to perform this operation? Any suggestions or recommendations?
UPDATE:
def generateXMLfile
  # lengthy operation
  (1..100000000).each do
  end

  sample_string = "This is a sample string\ngenerated by generateXML method.\n\n Bye!"
  send_data sample_string,
            :type => 'charset=utf-8; header=present',
            :disposition => "attachment; filename=sample.txt"
end
You can bind the call like this using UJS:
<%= link_to "send file",generateXMLfile_path, :remote => true, :id => "send_file" %>
$('#send_file').bind('ajax:beforeSend', function() {
$('#please wait').show();
});
$('#send_file').bind('ajax:complete', function() {
$('#please_wait').hide();
$('flash').show();
});
You can also use a generateXMLfile.js.erb template for the complete action.
It's not a good idea to have requests that take 5-25 seconds. They can render your application unresponsive when multiple users start generating files simultaneously, or you can hit a timeout limit on your web server.
You should use some background processing tool, here are some options:
delayed job (https://github.com/collectiveidea/delayed_job)
sidekiq (http://sidekiq.org/)
resque (http://resquework.org/)
Delayed job is the simplest one; Sidekiq and Resque are a little more complex, and they require you to install Redis.
When background processing finishes, you can use some WebSocket-based tool to send a message to your frontend. Pusher (http://pusher.com/) is one such tool.
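As a rough sketch of that hand-off with delayed_job and the pusher gem (GenerateXmlJob, the channel name, the event name, and the download URL are all made up):
# Hypothetical job: generates the file, then pushes a "done" event that
# the browser is subscribed to via the Pusher JavaScript client.
class GenerateXmlJob < Struct.new(:user_id)
  def perform
    path = Rails.root.join("tmp", "export_#{user_id}.xml").to_s
    File.write(path, "<data/>") # stand-in for the real XML generation
    Pusher["user-#{user_id}"].trigger('export_ready', :url => "/downloads/#{user_id}")
  end
end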

how to consume twitter/datasift stream with rails on heroku

How would one consume a streaming API (like the Twitter streaming API) with Rails on Heroku? Would it involve keeping a script running on a worker that consumes the stream? If there are any existing resources that document this, please link them; I have not been able to find much so far.
Your two options are to use a worker dyno to run a script that consumes the stream and writes it to a data store (your database, etc.), or to fetch parts of the stream on the fly in your Rails application as part of your response to HTTP requests.
Which one of those makes sense for you depends on what you are trying to do with the data and how much of the stream you need.
Apologies for the soft answer; none of this code or these ideas are my own.
The easiest way to consume a streaming API without using a background worker on Heroku is to use EventMachine.
In a model, you'd do something like this:
require 'eventmachine'
require 'em-http-request'
require 'json'

EM.schedule do
  http = EM::HttpRequest.new(STREAMING_URL).get :head => { 'Authorization' => [ 'USERNAME', 'PASSWORD' ] }
  buffer = ""
  http.stream do |chunk|
    buffer += chunk
    # The stream is newline-delimited JSON; parse one complete line at a time.
    while line = buffer.slice!(/.+\r?\n/)
      handle_tweet JSON.parse(line) # handle_tweet is your own method
    end
  end
end
For more details, have a look at Adam Wiggins, Joslyn Esser, and Kenne Jima.

Batch downloads in Ruby on Rails

I've got a Ruby on Rails app and I was wondering what the best way to do batch downloads would be. At any given time I've got a set of URLs that point to files that I want my users to be able to download, but based on a search of those files done by my users, I want them to be able to download a subset of those files, say the search result, in one process instead of downloading them individually. This set of files may potentially number in the thousands. My real question is: based on an array of URLs, how do I enable my app to download that entire set at once? I did some Googling, of course, and came up with the solution below. It doesn't seem to work for me, but it did seem to work for those who posted it as a solution to a similar problem. Any and all input would be appreciated.
# controller code
def download
  for n in 0..(@urls.length - 1)
    send_file(@urls[n], :type => "video/quicktime",
              :filename => @urls[n].basename,
              :disposition => "attachment")
  end
end

# view code
<%= link_to 'Download them all', :controller => 'my_controller',
            :action => 'download' %>
This approach looks like it will use a huge amount of memory, especially with thousands of files downloaded per user at a time. (It also can't work as written: a Rails action produces a single response, so calling send_file in a loop won't deliver multiple files.) Perhaps instead you should ZIP the files in the background after the user clicks the link and then send the archive to them, or email them the location of the ZIP. It'll still use a lot of memory to ZIP that many files, so perhaps offloading that task to another server would be good.
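A minimal sketch of the ZIP step with the rubyzip gem (1.x API); the helper name and paths are hypothetical, and in a real app this would run inside a background job as suggested above:
require 'zip'

# Hypothetical helper: packs a set of local file paths into one archive.
# With thousands of files, run this in a background job, not in a request.
def build_archive(file_paths, archive_path)
  Zip::File.open(archive_path, Zip::File::CREATE) do |zipfile|
    file_paths.each do |path|
      zipfile.add(File.basename(path), path)
    end
  end
end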

My web site needs to read a slow web site; how can I improve performance?

I'm writing a web site with Rails that lets visitors input some domains and check whether they have been registered.
When the user clicks the "Submit" button, my web site posts some data to another web site and reads the result back. But that web site is slow for me; each request takes 2 or 3 seconds. So I'm worried about performance.
For example, if my web server allows at most 100 processes, then only 30 or 40 users can visit my web site at the same time. This is not acceptable. Is there any way to improve performance?
PS: At first I wanted to read that web site with AJAX, but because of the "cross-domain" problem it doesn't work, so I have to use this "AJAX proxy" solution.
It's a bit more work, but you can use something like DelayedJob to process the requests to the other site in the background.
DelayedJob creates separate worker processes that look at a jobs table for stuff to do. When the user clicks submit, such a job is created, and starts running in one of those workers. This off-loads your Rails workers, and keeps your website snappy.
However, you will have to create some sort of polling mechanism in the browser while the job is running, perhaps using a refresh or some simple AJAX. That way, the visitor could see a message such as "One moment, please...", and after a while, the actual results.
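For illustration, a minimal sketch of such a polling endpoint, assuming a hypothetical DomainCheck record that the background worker fills in when it finishes:
# Hypothetical status action, polled by the browser every few seconds.
# The background job writes its result into the DomainCheck record.
def status
  check = DomainCheck.find(params[:id])
  if check.completed?
    render :json => { :done => true, :result => check.result }
  else
    render :json => { :done => false }
  end
end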
Rather than posting some data to the websites, you could use an HTTP HEAD request, which (I believe) should return only the header information for that URL.
I found this code by googling around a bit:
require "net/http"
req = Net::HTTP.new('google.com', 80)
p req.request_head('/')
This will probably be faster than a POST request, and you won't have to wait to receive the entire contents of that resource. You should be able to determine whether the site is in use based on the response code.
Try using Typhoeus rather than AJAX to get the body. You can POST the domain names to that site using Typhoeus and parse the response it fetches. It's extremely fast compared to other solutions. A snippet I ripped from the wiki page of the GitHub repo http://github.com/pauldix/typhoeus shows that you can run requests in parallel (which is probably what you want, considering that an AJAX request takes 1 to 2 seconds!):
hydra = Typhoeus::Hydra.new

first_request = Typhoeus::Request.new("http://localhost:3000/posts/1.json")
first_request.on_complete do |response|
  post = JSON.parse(response.body)
  third_request = Typhoeus::Request.new(post.links.first) # get the first url in the post
  third_request.on_complete do |response|
    # do something with that
  end
  hydra.queue third_request
  return post
end

second_request = Typhoeus::Request.new("http://localhost:3000/users/1.json")
second_request.on_complete do |response|
  JSON.parse(response.body)
end

hydra.queue first_request
hydra.queue second_request

hydra.run # this is a blocking call that returns once all requests are complete
first_request.handled_response # the value returned from the on_complete block
second_request.handled_response # the value returned from the on_complete block (parsed JSON)
Also Typhoeus + delayed_job = AWESOME!
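In that spirit, a hedged sketch of the combination for the domain-check use case; DomainCheckJob, the registrar URL, and the DomainCheck model are all hypothetical:
# Hypothetical delayed_job task: checks a batch of domains in parallel
# with Typhoeus and stores each response for the frontend to poll.
class DomainCheckJob < Struct.new(:domains)
  def perform
    hydra = Typhoeus::Hydra.new
    domains.each do |domain|
      request = Typhoeus::Request.new("http://example-registrar.test/check?domain=#{domain}")
      request.on_complete do |response|
        DomainCheck.create!(:domain => domain, :result => response.body)
      end
      hydra.queue request
    end
    hydra.run
  end
end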
