Foreground or background image manipulations in Rails (JRuby, TorqueBox)

I upload photos with AJAX, and the image manipulations plus the upload to S3 take a lot of time. I've heard it's better to run such tasks in the background, but my app needs to wait until the photos are uploaded. If I choose the background route, I will need to work with WebSockets or repeat AJAX requests to check for the result (links to S3), and I'm not happy about that.
Why is it so bad to do heavy work right in the controller (foreground)?
I currently use TorqueBox (JRuby), and as I understand it, it has excellent concurrency. Does that mean waiting on the S3 upload will not tie up resources, and everything will work fine?
Please write about the pros and cons of background vs. foreground in my situation. Thank you!

It is generally considered bad practice to block a web request handler on a network request to a third-party service. If that service should become slow or unavailable, this can clog up all your web processes, regardless of which Ruby you are using. This is what you are referring to as 'foreground.'
Essentially, this is the flow of your current setup (in the foreground):
1. A user uploads an image on your site, and your controller receives the request.
2. Your controller makes a synchronous request to S3. This is a blocking request.
3. Your controller waits.
4. Your controller waits.
5. Your controller (continues) to wait.
6. Finally (and this is not guaranteed), you receive a response from S3, and your code continues and renders your given view/JSON/text/etc.
Clearly steps 3-5 are very bad news for your server; as I stated earlier, this worker/thread/process (depending on your Ruby/Rails server framework) will be held up until the response from S3 is received, which potentially could never happen.
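For illustration, a minimal sketch of that blocking version using the AWS SDK for Ruby (aws-sdk-s3); the bucket name and the :photo param are hypothetical:

```ruby
class PhotosController < ApplicationController
  def create
    upload = params[:photo]  # an ActionDispatch::Http::UploadedFile

    # Steps 2-5: this call blocks the web worker until S3 responds
    # (or the request times out).
    obj = Aws::S3::Resource.new
            .bucket("my-bucket")
            .object("photos/#{upload.original_filename}")
    obj.upload_file(upload.tempfile.path)

    # Step 6: only now does the response render.
    render json: { url: obj.public_url }
  end
end
```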
Here is the same flow with a background job, plus some JavaScript help on the front end for notification:
1. A user uploads an image on your site, and your controller receives the request.
2. Your controller creates a new thread/process to make the request to S3. This is a non-blocking approach. You set a flag on a record that references your S3 image src, for example completed: false, and your code continues nicely to step 3. Your new thread/process is now the one waiting for a response from S3, and it will set the completed flag to true when S3 responds.
3. You render your view/JSON/text/etc. and inherently release your worker/thread/process for this request... good news!
Now for the fun front-end stuff:
4. Your client receives your response, triggering your front-end JavaScript to start a setInterval-like repetitive function that 'pings' your server every 3-ish seconds, where your back-end controller checks whether the completed flag you set earlier is true, and if so responds/renders true.
5. Your client-side JavaScript receives that response and either continues to ping (until you decide it should give up) or stops pinging because your app responded true.
I hope this sets you on the right path. I left implementation code out of the discussion above, since it seemed like you were looking for pros and cons. For actual implementation ideas, I would look into the following:
Sidekiq is excellent for solving the background-job issues described here. It will handle creating the new process in which you can make the request to S3.
There is also an excellent RailsCast that will help you get a better understanding of the code.
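For concreteness, a minimal sketch of the background flow with Sidekiq; the Photo model (with completed and s3_url columns), the bucket name, and the stash_upload helper (which would copy the tempfile somewhere the job can read it) are all hypothetical:

```ruby
class S3UploadJob
  include Sidekiq::Worker

  def perform(photo_id, path)
    photo = Photo.find(photo_id)
    obj = Aws::S3::Resource.new.bucket("my-bucket").object("photos/#{photo_id}")
    obj.upload_file(path)  # the job waits on S3; the web worker is long gone
    photo.update!(s3_url: obj.public_url, completed: true)
  end
end

class PhotosController < ApplicationController
  def create
    photo = Photo.create!(completed: false)
    S3UploadJob.perform_async(photo.id, stash_upload(params[:photo]))
    render json: { id: photo.id }  # responds immediately (step 3 above)
  end

  # GET /photos/:id/status -- polled by the front-end setInterval function
  def status
    render json: { completed: Photo.find(params[:id]).completed? }
  end
end
```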

Related

In Rails 3, how do I call some code via a controller but completely after the Request/Response cycle is done?

I have a very weird situation: I have a system where a client app (Client) makes an HTTP GET call to my Rails server, and that controller does some handling and then needs to make a separate call to the Client via a different pathway (i.e. it actually goes via Rabbit to a proxy and the proxy calls the Client). I can't change the pathway for that different call and I can't change the Client at all (it's a 3rd party system).
However: the issue is: the call via the different pathway fails UNLESS the HTTP GET from the client is completed.
So I'm trying to figure out: is there a way to have Rails finish the HTTP GET response and then make this additional call?
I've tried:
1) after_filter: this doesn't work because the after filter is apparently still within the request/response cycle, so the TCP/HTTP response back to the Client hasn't completed.
2) enqueuing a worker: this works, but it is not ideal, because if the workers are backed up this call back to the Client may not happen right away, and it really does need to happen right after the Client calls the Rails app.
3) starting a separate thread: this may work, but it makes me nervous: adding threading explicitly in Rails could be fraught with peril.
I welcome any ideas/suggestions.
Again, in short, the goal is: process the HTTP GET call to the Rails app and return a 200 OK back to the Client, completely finishing the HTTP request/response cycle, and then call some extra code.
I can provide any further details if that would help. I've found both #1 and #2 as recommended options but neither of them are quite what I need.
Ideally, there would be some "after_response" callback in Rails that allows some code to run but after the full request/response cycle is done.
Possibly use an around filter? Around filters let us define methods that wrap around every action Rails calls. With an around filter on the above controller, I could control the execution of every action: run code before calling the action and after calling it, and even skip calling the action entirely under certain circumstances if I wanted to.
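For illustration, a minimal sketch of that idea (Rails 3 naming; the controller and the notification call are hypothetical):

```ruby
class HandoffsController < ApplicationController
  around_filter :call_client_afterwards, only: :show

  def show
    render json: { status: "ok" }
  end

  private

  def call_client_afterwards
    yield  # run the action (and render)
    # Code here runs after the action, but note: it is still *inside* the
    # request/response cycle -- the response has not been sent yet.
    notify_client_via_proxy(params[:client_id])
  end
end
```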
So what I ended up doing was using a gem that I had long ago helped with: Spawnling
It turns out that this works well, although it required a tweak to get it working with Rails 3.2. It allows me to spawn a thread to do the extra, out-of-band callback to the Client, while letting the normal controller process complete. And I don't have to worry about thread management or AR connection management; Spawnling handles that.
It's still not ideal, but pretty close. And it's slightly better than enqueuing a Resque/Sidekiq worker as there's no risk of worker backlog causing an unexpected delay.
I still wish there was an "after_response_sent" callback or something, but I guess this is too unusual a request.
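For reference, a minimal sketch of the Spawnling usage described above (gem 'spawnling'; the controller and notify_client_via_proxy are hypothetical):

```ruby
class HandoffsController < ApplicationController
  def show
    # ... normal request handling ...

    # Hand the out-of-band callback to a new thread; Spawnling takes care
    # of the thread and its ActiveRecord connection.
    Spawnling.new(method: :thread) do
      notify_client_via_proxy(params[:client_id])
    end

    head :ok  # the controller returns and the 200 goes back to the Client
  end
end
```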

Long processing; way to periodically send a 102 Processing response?

I have a Rails app that can take a long time to prepare its response to some queries. (Mostly the delay is rendering the dataset into JSON or YAML.) The app sits behind a proxy whose configuration I cannot alter, with the result that these long-running queries tend to get terminated by the proxy as timeouts. Chunking doesn't help because there's nothing to chunk until the render is fully complete.
Is there any supported or already existing way in Rails to set up an asynchronous repeating task that could send back 102 Processing responses to keep the proxy happy until the complete response is ready?
I would really prefer not to have to implement pagination semantics.
I have control over the app and the client; both bits are my code. I don't have control over the proxy, nor the app's server.
Any suggestions are really welcome!
I would likely solve the problem by POSTing the initial request and having the Rails app return the appropriate HTTP status code. Then I'd have JavaScript on the client side poll the server at reasonable intervals for the status of the render. The status action could return a 102 response until the processing is complete. Then you could use the JavaScript to insert a link into the page that the user can click to download the finished file.
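A minimal sketch of that approach; the Report model, the background enqueue call, and the routes are hypothetical:

```ruby
class ReportsController < ApplicationController
  # POST /reports -- kick off the slow render in the background
  def create
    report = Report.create!(completed: false)
    RenderReportJob.perform_later(report.id)  # any background-job library works here
    head :accepted, location: status_report_path(report)
  end

  # GET /reports/:id/status -- the client polls this until the work is done
  def status
    report = Report.find(params[:id])
    if report.completed?
      render json: { url: report.result_url }
    else
      head 102  # 102 Processing, until the render finishes
    end
  end
end
```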

Triggering a SWF Workflow based on SQS messages

Preamble: I'm trying to put together a proposal for what I assume to be a very common use-case, and I'd like to use Amazon's SWF and SQS to accomplish my goals. There may be other services that will better match what I'm trying to do, so if you have suggestions please feel free to throw them out.
Problem: The need at its most basic is for a client (mobile device, web server, etc.) to post a message that will be processed asynchronously without a response to the client - very basic.
The intended implementation is for the client to post a message to a pre-determined SQS queue. At that point, the client is done. We would also have a defined SWF workflow responsible for picking the message up off the queue and (after some manipulation) placing it in DynamoDB; again, all fairly straightforward.
What I can't seem to figure out though, is how to trigger the workflow to start. From what I've been reading a workflow isn't meant to be an indefinite process. It has a start, a middle, and an end. According to the SWF documentation, a workflow can run for no longer than a year (Setting Timeout Values in SWF).
So, my question is: If I assume that a workflow represents one message-processing flow, how can I start the workflow whenever a message is posted to the SQS?
Caveat: I've looked into using SNS instead of SQS as well. This would allow me to run a server that could subscribe to SNS, and then start the workflow whenever a notification is posted. That is certainly one solution, but I'd like to avoid setting up a server for a single web service which I would then have to manage / scale according to the number of messages being processed. The reason I'm looking into using SQS/SWF in the first place is to have an auto-scaling system that I don't have to worry about.
Thank you in advance.
I would create a worker process that listens to the SQS queue. Upon receiving a message, it calls the SWF API to start a workflow execution. The workflow execution ID should be generated from the message content, to ensure that duplicated messages do not result in duplicated workflows.
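A minimal sketch of that poller using the AWS SDK for Ruby; the queue URL, domain, and workflow names are hypothetical:

```ruby
require "aws-sdk-sqs"
require "aws-sdk-simpleworkflow"
require "digest"

sqs = Aws::SQS::Client.new
swf = Aws::SWF::Client.new
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"

loop do
  resp = sqs.receive_message(queue_url: queue_url, wait_time_seconds: 20)
  resp.messages.each do |msg|
    begin
      swf.start_workflow_execution(
        domain: "my-domain",
        # Derive the workflow id from the message body so a duplicate
        # delivery maps to the same execution id and is rejected by SWF.
        workflow_id: Digest::SHA256.hexdigest(msg.body),
        workflow_type: { name: "ProcessMessage", version: "1.0" },
        task_list: { name: "main" },
        input: msg.body
      )
    rescue Aws::SWF::Errors::WorkflowExecutionAlreadyStartedFault
      # Duplicate message; the workflow already exists.
    end
    sqs.delete_message(queue_url: queue_url, receipt_handle: msg.receipt_handle)
  end
end
```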
You can use AWS Lambda for this purpose. A Lambda function can be invoked by an SQS event, so you don't have to write a queue poller explicitly. The Lambda function can then make a request to SWF to initiate the workflow.
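A minimal sketch of such a handler in Ruby, under the same hypothetical domain and workflow names as above; SQS invokes the handler with a batch of records:

```ruby
require "aws-sdk-simpleworkflow"
require "digest"

SWF = Aws::SWF::Client.new

def handler(event:, context:)
  event["Records"].each do |record|
    SWF.start_workflow_execution(
      domain: "my-domain",
      workflow_id: Digest::SHA256.hexdigest(record["body"]),  # dedupes retries
      workflow_type: { name: "ProcessMessage", version: "1.0" },
      task_list: { name: "main" },
      input: record["body"]
    )
  rescue Aws::SWF::Errors::WorkflowExecutionAlreadyStartedFault
    next  # duplicate delivery; the workflow already exists
  end
end
```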

Canceling a request when connection to client is lost

I noticed that in a standard Grails environment, a request is always executed to the end, even when the client connection is lost and the result can't be delivered anymore.
Is there a way to configure the environment in such a way that execution of a request is canceled as soon as the client connection is lost?
Update: Thanks for the answers. Yes, most of the problems I am trying to avoid can be avoided by better coding:
caching can make nearly every page fast
a token can help to avoid submitting something twice
but there are some requests which could still take some time. Let's take a map service as an example: calculating a route will take a while. One way to avoid resubmitting the request could be a "calculationInProgress" flag together with a message to the user. But then it is still possible to create a lot of sessions, and thus a lot of requests, in order to mount a DoS attack...
I am still curious: is there no way to configure the server to cancel the request? I used to develop on a system where the server behaved this way and it was great :-)
Probably there is no such way. And I'm sure Grails (and your web container) is designed to:
accept the incoming request
process it on the server side
send the response
If something happens during phase 2, you'll only find out about it at the send-response phase. You can write data to the HttpServletResponse yourself, handle IOException, etc., but that would be working at too low a level, I think. And it will not help you cancel your DB operations while you're preparing the data to send.
Btw, it's a common pattern to put a web frontend like nginx in front, which accepts the incoming request and handles all these problems with cancelled requests, slow requests (I guess that's the real problem?), etc.
According to your comment it is reload and multiple clicks that you are trying to avoid. The proper technique should be to use Grails support for handling multiple form submissions:
http://grails.org/doc/2.0.x/guide/theWebLayer.html#formtokens

How to do an ajax callback after a Delayed Job has finished in Ruby on Rails?

I allow users on my site to rotate their photos. I accomplish this by an ajax call to a Delayed_Job process (via Heroku) that rotates the photo. After they press "rotate photo", I show a loading spinner. But my question is this: what is the best way for my page to know when the Delayed_Job is complete, so I can load the new photo?
Do I need to have a continuous ajax polling of my server to determine if the Delayed Job is complete? Or is there any way I can implement an ajax callback to my page that will notify my page when the Delayed Job has finished?
Thanks in advance.
There's a bunch of ways to deal with this kind of thing. You could do ajax polling as you've mentioned, you could use the comet approach where you essentially leave a connection open until whatever it is on the server has completed, or you could even go all out and use web sockets (probably a bit overkill for this task though).
Without sockets, there's currently no way to have your server send a message to the client, without the client requesting it.
In any case, you should decide whether the need (or want) to background the task warrants all the extra work of dealing with polling/comet/sockets. Rotating an image shouldn't take long at all. Depending on whether you can afford to tie up a server process, it'd be a lot simpler to just do the image manipulation in the foreground (not delayed_job). Then, when the ajax request to that action has completed, you know the task is completed.
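If polling is the chosen route, a minimal sketch (the Photo model, its rotating flag, and the routes are hypothetical):

```ruby
class PhotosController < ApplicationController
  # POST /photos/:id/rotate -- called by the ajax request
  def rotate
    photo = Photo.find(params[:id])
    photo.update_attribute(:rotating, true)
    photo.delay.rotate!  # rotate! clears :rotating when it finishes
    head :accepted
  end

  # GET /photos/:id/status -- polled by the page showing the spinner
  def status
    photo = Photo.find(params[:id])
    render json: { done: !photo.rotating?, url: photo.image_url }
  end
end
```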
