Lengthy operation in Ruby-on-Rails

I am facing the following situation.
A user clicks on a link in order to generate a text or XML file. I have a method generateXMLfile in my controller which reads data from a db table and builds a hash or array. After reading and building are finished I send the data using the send_file method.
The file generation may take between 5 and 25 seconds (huge data), so what I want to do is display a "Please wait" message with a waiting gif animation while the request is being processed, and display a success message upon successful completion.
I know how to implement similar operations, such as a file upload, using pure AJAX, but I don't know how to do it in Rails.
Has anyone dealt with a similar problem? What is the best practice or Rails way to perform this operation? Any suggestions or recommendations?
UPDATE:
def generateXMLfile
  # lengthy operation (simulated busy loop)
  (1..100000000).each do
  end

  sample_string = "This is a sample string\ngenerated by generateXML method.\n\n Bye!"
  send_data sample_string,
            :type => 'text/plain; charset=utf-8',
            :disposition => 'attachment',
            :filename => 'sample.txt'
end

You can bind callbacks like this using UJS:
<%= link_to "send file", generateXMLfile_path, :remote => true, :id => "send_file" %>
$('#send_file').bind('ajax:beforeSend', function() {
  $('#please_wait').show();
});
$('#send_file').bind('ajax:complete', function() {
  $('#please_wait').hide();
  $('#flash').show();
});
You can also use a generateXMLfile.js.erb template for the complete action.

It's not a good idea to have requests that take 5-25 seconds.
They can render your application unresponsive when multiple users start generating files simultaneously, or you can hit a timeout limit on your web server.
You should use some background processing tool, here are some options:
delayed job (https://github.com/collectiveidea/delayed_job)
sidekiq (http://sidekiq.org/)
resque (http://resquework.org/)
Delayed job is the simplest one; Sidekiq and Resque are a little more complex, and they require you to install Redis.
When background processing finishes you can use a Websocket-based tool to send a message to your frontend. Pusher (http://pusher.com/) is one such tool.
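For illustration, here is a minimal sketch of that combination using Sidekiq plus the Pusher gem; the worker name, channel and event names, and file path are all hypothetical:
# app/workers/xml_export_worker.rb -- hypothetical worker
class XmlExportWorker
  include Sidekiq::Worker

  def perform(user_id)
    # Do the slow generation outside the request/response cycle
    path = Rails.root.join('tmp', "export_#{user_id}.xml")
    File.write(path, build_xml_for(user_id))

    # Tell the browser the file is ready; channel and event names are made up
    Pusher.trigger("user_#{user_id}", 'export_ready', :path => path.to_s)
  end

  private

  # Placeholder for the real db-reading / XML-building logic
  def build_xml_for(user_id)
    "<export user='#{user_id}'/>"
  end
end
The controller then just enqueues the job with XmlExportWorker.perform_async(current_user.id) and returns immediately, while the frontend shows the "Please wait" message until the Pusher event arrives.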

Related

Can I use http streaming with axlsx_rails to avoid timeout issue with large/time intensive query?

I'm using the axlsx_rails Ruby gem in Rails 4.2.5 to generate an Excel file to let users download their data.
I have this in my index.xlsx.axlsx template:
wb = xlsx_package.workbook
wb.add_worksheet(name: 'Transactions') do |sheet|
  sheet.add_row ["Date", "Vendor Name", "Account",
                 "Transaction Category",
                 "Amount Spent", "Description"]
  @transactions.find_each(batch_size: 100) do |transaction|
    sheet.add_row [transaction.transaction_date,
                   transaction.vendor_name,
                   transaction.account.account_name,
                   transaction.transaction_category.name,
                   transaction.amount,
                   transaction.description]
  end
end
The page times out before returning an Excel file if there's enough data. Is there a way to use HTTP streaming to send results back as they're generated, rather than waiting until the entire @transactions.find_each loop has completed?
I saw code here using response.stream.write:
response.headers['Content-Type'] = 'text/event-stream'
10.times {
  response.stream.write "This is a test message"
  sleep 1
}
response.stream.close
That approach looks promising, but I couldn't figure out how to integrate response.stream.write into an axlsx_rails template. Is there a way?
This is my first Stack Overflow question- apologies for any faux pas and thank you for any ideas you can offer.
Welcome to SO, Joe.
I asked in a comment, but perhaps it's better to answer and explain.
The short answer is yes, you can always stream if you can render (though sometimes with mixed performance results).
It does not, however, work if you're referencing a file directly, e.g. http://someurl.com/reports/mycustomreport.xlsx.
Streaming in Rails just isn't built that way by default. But not to worry, you "should" still be able to tackle your issue, provided the time you wish to save is rendering time only.
In your controller (a note for the future: when you're asking about rendering actions, it helps to provide your controller action code) you should be able to do something similar to:
def report
  @transactions = current_user.transactions.all
  respond_to do |format|
    format.xlsx { render xlsx: 'report', stream: true }
  end
end
It might help to do a sanity check on your load times. In your log, as part of the 200 response, you should see something like:
Completed 200 OK in 506ms (Views: 494.6ms | ActiveRecord: 2.8ms)
If the ActiveRecord number is too high, or higher than the view number, this solution might not work for your query, and as suggested, the work might need to be threaded or sent to a job.
Even if you can stream, I don't think it will be any faster. The problem is that Axlsx does not generate your spreadsheet until you are done building it, and axlsx_rails just wraps that process, so it won't help either. There will be no partial spreadsheet to serve in bits, and the delay will be just as long.
You should bite the bullet and try Sidekiq (which is very fast) or some other job scheduler. Then you can return the request immediately and generate the spreadsheet in the background. You will have to do some kind of monitoring or notification to fetch the generated report, or a ping back to another URL using javascript that forwards to a new page when a flag is set on render completion. Your call there.
Having a job scheduler is also very convenient when you need to fire off an email in response to a request; the response can return immediately and not wait for the email to complete. Once you have a scheduler you will find more uses for it.
If you choose a job scheduler, axlsx_rails will let you use your template to generate the attachment, or you can create your own view context to generate the file. Or for a really bare bones way of rendering the template, see this test.
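If rendering in the request is off the table, here is a rough sketch of the background approach described above, building the same spreadsheet with the axlsx gem directly inside a Sidekiq job (skipping the template); the job name, file path, and hand-off step are assumptions:
require 'axlsx'

class TransactionsExportJob
  include Sidekiq::Worker

  def perform(user_id)
    package = Axlsx::Package.new
    package.workbook.add_worksheet(name: 'Transactions') do |sheet|
      sheet.add_row ["Date", "Vendor Name", "Account", "Transaction Category",
                     "Amount Spent", "Description"]
      User.find(user_id).transactions.find_each(batch_size: 100) do |t|
        sheet.add_row [t.transaction_date, t.vendor_name, t.account.account_name,
                       t.transaction_category.name, t.amount, t.description]
      end
    end
    # Write the finished file, then flip a flag or notify the user (app-specific)
    package.serialize(Rails.root.join('tmp', "transactions_#{user_id}.xlsx").to_s)
  end
end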

Webhook firing multiple times, causing heavy API calls

My app has some heavy callback validations when I create a new customer. Basically I check multiple APIs to see if there's a match before creating a new customer record. I don't want this to happen after create, because I'd rather not save the record in the first place if there aren't any matches.
I have a webhook setup that creates a new customer. The problem is that, because my customer validations take so long, the webhook keeps firing because it doesn't get an immediate response.
Here's my Customer model:
validates :shopify_id, uniqueness: true, if: 'shopify_id.present?'
before_validation :get_external_data, :on => :create

def get_external_data
  # heavy API calls that I don't want to perform multiple times
end
My hook:
customer = shop.customers.new(
  :first_name => first_name, :last_name => last_name, :email => email,
  :shopify_url => shopify_url, :shopify_id => id
)
customer.save
head :ok
customer.save is taking about 20 seconds.
To clarify, here's the issue:
1. Webhook is fired
2. Heavy API calls are made
3. Second webhook is fired (API calls from the first webhook are still running) and runs the heavy API calls again
4. Third webhook is fired
5. This continues until the first record is finally saved, at which point the uniqueness check on shopify_id can catch duplicates
Is there a way around this? How can I defensively program to make sure no duplicate records start to get processed?
What an interesting question, thank you.
Asynchronicity
The main issue here is the webhook's dependency on external API calls.
The latency required to run these will not only impact your save times, but can also prevent your server from handling other requests (unless you're using some sort of multi-processing).
It's generally not a good idea to have your flow depend on more than one external resource; in this case, though, it's legitimate.
The only real suggestion I have is to make it an asynchronous flow...
--
Asynchronous vs synchronous execution, what does it really mean?
When you execute something synchronously, you wait for it to finish
before moving on to another task. When you execute something
asynchronously, you can move on to another task before it finishes.
In JS, the most famous example of making something asynchronous is an Ajax callback: sending a request through Ajax, using some sort of "waiting" process to keep the user updated, then returning the response.
I would propose implementing this for the front-end. The back-end would have to ensure the server's hands are not tied whilst processing the external API calls. This would either have to be done using some other part of the system (not requiring the web server process), or by separating the functionality into some other format.
Ajax
I would most definitely use Ajax on the front-end, or another asynchronous technology (web sockets?).
Either way, when a user creates an account, I would show a "pending" screen. Using Ajax is the simplest example of this; however, it is massively limited in scope (i.e. if the user refreshes the page, the connection is lost).
Maybe someone could suggest a way to regain state in an asynchronous system?
You could handle it with Ajax callbacks:
#app/views/users/new.html.erb
<%= form_for @user, remote: true do |f| %>
  <%= f.text_field ... %>
  <%= f.submit %>
<% end %>

#app/assets/javascripts/application.js
$(document).on("ajax:beforeSend", "#new_user", function(xhr, settings){
  // start "pending" screen
}).on("ajax:send", "#new_user", function(xhr){
  // keep user updated somehow
}).on("ajax:success", "#new_user", function(event, data, status, xhr){
  // remove "pending" screen, show response
});
This will give you a front-end flow which does not jam up the server, i.e. you can still do "stuff" on the page whilst the request is processing.
--
Queueing
The second part of this will be to do with how your server processes the request.
Specifically, how it deals with the API requests, as they are what will be causing the delay.
The only way I can think of at present is to queue up the requests and have a separate process go through them. The main benefit is that this makes your Rails app's requests asynchronous, instead of having to wait around for the responses.
You could use a gem such as Resque to queue the requests (it uses Redis), allowing you to send the request to the Resque queue and capture its response. This response will then form the response to your Ajax request.
You'd probably have to set up a temporary user before doing this:
#app/models/user.rb
class User < ActiveRecord::Base
  after_create :check_shopify_id

  private

  def check_shopify_id
    # send to resque/redis
  end
end
Of course, this is a very high-level suggestion. Hopefully it gives you some better perspective.
This is a tricky issue, since your customer creation depends on an expensive validation. I see a few ways to mitigate this, but it will be a "lesser of evils" type decision:
Can you pre-call/pre-load the customer list? If so, you can cache the list of customers and validate against that instead of querying on each create. This would require a cron job to keep the cached list updated.
Create the customer first, then perform the customer check as a "validation" step. That is, set a validated flag on the customer and run the check once in a background task. If the customer already exists, merge with the existing customer; if not, mark the customer as valid (see the sketch below).
Either choice will require workarounds to avoid the expensive calls.
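For illustration, here is a minimal sketch of the second option, assuming the expensive matching is moved out of the model callbacks and into a Sidekiq-style worker; the controller performs only a cheap existence check before enqueueing, so repeat webhooks return immediately. All names are illustrative:
# Webhook controller action -- hypothetical names throughout
def create
  # Cheap guard: repeat webhooks for the same customer return at once
  return head :ok if shop.customers.exists?(:shopify_id => params[:id])

  customer = shop.customers.create!(
    :first_name => params[:first_name], :last_name => params[:last_name],
    :email => params[:email], :shopify_id => params[:id], :validated => false
  )
  CustomerCheckWorker.perform_async(customer.id) # heavy API calls run here
  head :ok
end

# app/workers/customer_check_worker.rb
class CustomerCheckWorker
  include Sidekiq::Worker

  def perform(customer_id)
    customer = Customer.find(customer_id)
    # The multi-API matching from get_external_data happens here, off the
    # request cycle; merge or mark valid based on the result
    customer.update_attribute(:validated, true)
  end
end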

Pull/push status in rails 3

I have a long-running task in the background. How exactly would I pull status from my background task, or would it be better to somehow push the task's completion to my front end?
Background:
Basically my app uses a third-party service for processing data. I don't want this external web service workload to block incoming requests to my website, so I put the call inside a background job (I use sidekiq). When the task is done, I was thinking of sending a webhook to a certain controller which will notify the front end that the task is complete.
How can I do this? Is there a better solution for this?
Update:
My app is hosted on heroku
Update II:
I've done some research on the topic and found that I can create a separate app on heroku to handle this. I found this example:
https://github.com/heroku-examples/ruby-websockets-chat-demo
This long-running task will be run per user, on a website with a lot of traffic. Is this a good idea?
I would implement this using a pub/sub system such as Faye or Pusher. The idea behind this is that you would publish the status of your long running job to a channel, which would then cause all subscribers of that channel to be notified of the status change.
For example, within your job runner you could notify Faye of a status change with something like:
client = Faye::Client.new('http://localhost:9292/')
client.publish('/jobstatus', {id: jobid, status: 'in_progress'})
And then in your front end you can subscribe to that channel using javascript:
var client = new Faye.Client('http://localhost:9292/');
client.subscribe('/jobstatus', function(message) {
  alert('the status of job #' + message.jobid + ' changed to ' + message.status);
});
Using a pub/sub system in this way allows you to scale your realtime page events separately from your main app - you could run Faye on another server. You could also go for a hosted (and paid) solution like Pusher, and let them take care of scaling your infrastructure.
It's also worth mentioning that Faye uses the Bayeux protocol, which means it will use websockets where available, and long-polling where not.
We have this pattern and use two different approaches. In both cases background jobs are run with Resque, but you could likely do something similar with DelayedJob or Sidekiq.
Polling
In the polling approach, we have a javascript object on the page that sets a timeout for polling, with a URL passed to it from the Rails HTML view.
This causes an Ajax ("script") call to the provided URL, which means Rails looks for a JS template. We use that template to respond with state and fire an event for the object to respond to, whether the result is available or not.
This is somewhat complicated and I wouldn't recommend it at this point.
Sockets
The better solution we found was to use WebSockets (with shims). In our case we use PubNub, but there are numerous services to handle this. That keeps the polling/open connections off your web server and is much more cost-effective than running the servers needed to handle those connections.
You've stated you are looking for front-end solutions and you can handle all the front-end with PubNub's client JavaScript library.
Here's a rough idea of how we notify PubNub from the backend.
class BackgroundJob
  @queue = :some_queue

  def perform
    # Do some action
  end

  def after_perform
    publish(some_state, client_channel)
  end

  private

  def publish(some_state, client_channel)
    Pubnub.new(
      publish_key: Settings.pubnub.publish_key,
      subscribe_key: Settings.pubnub.subscribe_key,
      secret_key: Settings.pubnub.secret_key
    ).publish(
      channel: client_channel,
      message: some_state.to_json,
      http_sync: true
    )
  end
end
The simplest approach I can think of is to set a flag in your DB when the task is complete, and have your front-end (view) periodically send an ajax request to check the flag's state in the db. If the flag is set, you take the appropriate action in the view. Below are code samples.
Since you said this long-running task needs to run per user, let's add a boolean to the users table: task_complete. When you add the job to sidekiq, you can unset the flag:
# Sidekiq worker: app/workers/task.rb
class Task
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)
    # Long running task code here, which executes per user
    user.task_complete = true
    user.save!
  end
end
# When adding the task to the sidekiq queue
user = User.find(params[:id])
# The flag will have been set to true by a previous execution.
# If it is false, sidekiq already has a job entry, so we don't add another.
if user.task_complete?
  Task.perform_async(user.id)
  user.task_complete = false
  user.save!
end
In the view you can periodically check whether the flag was set using ajax requests:
<script type="text/javascript">
  var complete = false;
  (function worker() {
    $.ajax({
      url: 'task/status/<%= @user.id %>',
      success: function(data) {
        // update the view based on the ajax response if you need to
      },
      complete: function() {
        // Schedule the next request when the current one completes. Once the
        // global 'complete' flag is true, the task is done and we stop polling.
        if (!complete) {
          setTimeout(worker, 5000); // in milliseconds
        }
      }
    });
  })();
</script>
# status action which returns the status of the task
# GET /task/status/:id
def status
  @user = User.find(params[:id])
end
# status.js.erb - add view logic based on what you want to achieve, given whether the task is complete or not
<% if @user.task_complete? %>
  $('#success').show();
  complete = true;
<% else %>
  $('#processing').show();
<% end %>
You can set the timeout based on the average execution time of your task. Say your task takes 10 minutes on average; then there's no point in checking it at a 5-second frequency.
Also, if your task's execution schedule is something complex (and not once per day), you may want to add a timestamp task_completed_at and base your logic on a combination of the flag and the timestamp, as sketched below.
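For illustration, a minimal sketch of that flag-plus-timestamp combination, reusing the column names from the examples above; the 24-hour freshness window is an arbitrary assumption:
# Migration adding the timestamp next to the boolean flag
class AddTaskCompletedAtToUsers < ActiveRecord::Migration
  def change
    add_column :users, :task_completed_at, :datetime
  end
end

# In the worker, record when the task finished
user.task_complete = true
user.task_completed_at = Time.now
user.save!

# app/models/user.rb -- treat the task as done only if it completed recently
def task_recently_complete?
  task_complete? && task_completed_at.present? && task_completed_at > 24.hours.ago
end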
As for this part:
"This long running task will be run per user, on a website with a lot of traffic, is this a good idea?"
I don't see a problem with this approach, though architectural changes like executing jobs (sidekiq workers) on separate hardware will help. These are lightweight ajax calls, and some intelligence built into your javascript (like the global complete flag) will avoid unnecessary requests. If you have huge traffic and DB reads/writes are a concern, you may want to store the flag directly in redis instead (you already have redis for sidekiq); I believe that will resolve your read/write concerns. This is the simplest and cleanest approach I can think of. You could achieve the same via websockets, which are supported by most modern browsers (though they can cause problems in older versions).

Asynchronous GET request in Rails

I'm working on a Ruby on Rails app that relies on making some simple URL calls for user metrics. For part of the tracking I need to make a server-side call prior to rendering my index page, which is achieved by calling a specially formatted URL. Currently I'm doing this the following way:
url = URI.parse('https://example.tracking.url')
result = Net::HTTP.start(url.host, use_ssl: true, verify_mode: OpenSSL::SSL::VERIFY_NONE) do |http|
  http.get url.request_uri, 'User-Agent' => 'MyLib v1.2'
end
The loading of my page seems, at times, somewhat delayed. Short of it being a database latency issue, I assume the URL sometimes takes extra time to respond and that this is a synchronous request. What is the best way to make asynchronous requests in Rails? Threads, maybe? Thanks.
Have you looked into using a delayed job or Thread.new?
I would move it to a helper method and then call that helper inside Thread.new. Personally, I like using delayed_job for handling things that may delay the user interface.
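As a rough sketch of the Thread.new route, the question's tracking call could be wrapped in a helper like this; the method name is made up, and errors are only logged so a failed tracking call never breaks the page render:
require 'net/http'

def track_async(url_string)
  Thread.new do
    begin
      url = URI.parse(url_string)
      Net::HTTP.start(url.host, url.port, :use_ssl => url.scheme == 'https') do |http|
        http.get url.request_uri, 'User-Agent' => 'MyLib v1.2'
      end
    rescue StandardError => e
      Rails.logger.warn("Tracking call failed: #{e.message}")
    end
  end
end

# Fire and forget before rendering the index page
track_async('https://example.tracking.url')
Note that a bare Thread.new dies with the server process; if the tracking call must not be lost across restarts, a queue like delayed_job is the safer choice.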

Streaming Download while File is Created

I was wondering if anyone knows how to stream a file download while it's being created at the same time.
I'm generating a huge CSV export, and as of right now it takes a couple of minutes for the file to be created. Once it's created, the browser then downloads the file.
I want to change this so that the browser starts downloading the file while it's being created. Watching the progress bar, users will be more willing to wait; even though it would say "Unknown time remaining", I'm less likely to get impatient since I know data is being steadily downloaded.
NOTE: I'm using Rails version 3.0.9
Here is my code:
def users_export
  File.new("users_export.csv", "w") # creates new file to write to
  @todays_date = Time.now.strftime("%m-%d-%Y")
  @outfile = @todays_date + ".csv"
  @users = User.select('id, login, email, last_login, created_at, updated_at')
  FasterCSV.open("users_export.csv", "w+") do |csv|
    csv << [ @todays_date ]
    csv << [ "id", "login", "email", "last_login", "created_at", "updated_at" ]
    @users.find_each(:batch_size => 100) do |u|
      csv << [ u.id, u.login, u.email, u.last_login, u.created_at, u.updated_at ]
    end
  end
  send_file "users_export.csv",
            :type => 'text/csv; charset=iso-8859-1; header=present',
            :disposition => "attachment; filename=#{@outfile}",
            :stream => true
end
I sought an answer to this question several weeks ago. I thought that if data was being streamed back to the client then maybe Heroku wouldn't time out one of my long-running API calls after 30 seconds. I even found an answer that looked promising:
format.xml do
  self.response_body = lambda { |response, output|
    output.write("<?xml version='1.0' encoding='UTF-8' ?>")
    output.write("<results type='array' count='#{@report.count}'>")
    @report.each do |result|
      output.write("""
        <result>
          <element-1>Data-1</element-1>
          <element-2>Data-2</element-2>
          <element-n>Data-N</element-n>
        </result>
      """)
    end
    output.write("</results>")
  }
end
The idea is that the response_body lambda has direct access to the output buffer going back to the client. However, in practice Rack has its own ideas about what data should be sent back and when. Furthermore, this response_body-as-lambda pattern is deprecated in newer versions of Rails, and I think support is dropped outright in 3.2. You could get your hands dirty in the middleware stack and write this output as a Rails Metal, but...
If I may be so bold, I strongly suggest refactoring this work to a background job. The benefits are many:
Your users will not have to just sit and wait for the download. They can request a file and then browse away to other more exciting portions of your site.
The file generation and download will be more robust. For example, if a user loses internet connectivity, even briefly, at minute three of a download under the current setup, they lose all that time and have to start over. If the file is generated in the background on your site, they only need internet for as long as it takes to start the job.
It will decrease the load on your front-end processes and may decrease the load on your site in total if the background job generates the files and you provide links to the generated files on a page within your app. Chances are one file generation could serve several downloads.
Since practically all Rails web servers are single threaded and synchronous out of the box, you will have an entire app server process tied up on this one file download for each time a user requests it. This makes it easy for users to accidentally carry out a DoS attack on your site.
You can ship the background generated file to a CDN such as S3 and perhaps gain a performance boost on the download speed your users see.
When the background process is done you can notify the user via email so they don't even have to be at the computer where they initiated the file generation in order to know it's done.
Once you have a background job system in your application you will find many more uses for it, such as sending email or updating search indexes.
Sorry that this doesn't really answer your original question. But I strongly believe this is a better overall solution.
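For concreteness, here is a minimal sketch of that refactor as a Resque job, matching the Rails 3 / FasterCSV code above; the queue name, file path, and hand-off step are assumptions:
class UsersExportJob
  @queue = :exports

  def self.perform
    todays_date = Time.now.strftime("%m-%d-%Y")
    path = Rails.root.join("tmp", "#{todays_date}.csv").to_s
    users = User.select('id, login, email, last_login, created_at, updated_at')
    FasterCSV.open(path, "w") do |csv|
      csv << [ todays_date ]
      csv << [ "id", "login", "email", "last_login", "created_at", "updated_at" ]
      users.find_each(:batch_size => 100) do |u|
        csv << [ u.id, u.login, u.email, u.last_login, u.created_at, u.updated_at ]
      end
    end
    # Upload the file to S3 and/or email the user a download link here
  end
end

# In the controller: enqueue and return immediately
Resque.enqueue(UsersExportJob)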
