Long processing; way to periodically send a 102 Processing response? - ruby-on-rails

I have a Rails app that can take a long time to prepare its response to some queries. (Mostly the delay is rendering the dataset into JSON or YAML.) The app sits behind a proxy whose configuration I cannot alter, with the result that these long-running queries tend to get terminated by the proxy as timeouts. Chunking doesn't help because there's nothing to chunk until the render is fully complete.
Is there any supported or already existing way in Rails to set up an asynchronous repeating task that could send back 102 Processing responses to keep the proxy happy until the complete response is ready?
I would really prefer not to have to implement pagination semantics.
I have control over the app and the client; both bits are my code. I don't have control over the proxy, nor the app's server.
Any suggestions are really welcome!

I would likely solve the problem by POSTing the initial request and having the Rails app return an appropriate HTTP status code. Then I'd have JavaScript on the client side poll the server at reasonable intervals for the status of the render. The status action could return a 102 response until the processing is complete. Then you could use the JavaScript to insert a link into the page that the user could click to download the finished file.
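Concretely, the exchange might look like the trace below. The paths and job id are made up for illustration, and note that some proxies and clients handle a standalone 1xx status awkwardly, in which case 202 Accepted works just as well as the in-progress reply:

```
POST /reports            →  202 Accepted    { "job": "42", "status": "/reports/42/status" }
GET  /reports/42/status  →  102 Processing  (render still running)
GET  /reports/42/status  →  200 OK          { "done": true, "url": "/reports/42" }
GET  /reports/42         →  200 OK          (the finished JSON/YAML)
```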

Related

Rails API, microservices, async/deferred responses

I have a Rails API which handles requests from clients. Clients use that API to perform analysis of their data. A client POSTs the data to the API, and the API checks whether that data has been analyzed before. If so, the API just responds with the analysis result. If the data hasn't been analyzed before, the API:
Tells the client that analysis has started.
Establishes a connection with the analyzing microservice.
Performs an asynchronous (or deferred) request to the analyzing microservice and waits for the response. The analysis takes a lot of time, so neither the API nor the microservice should be blocked while doing it.
When the response from the analyzing microservice is returned, the API hands it to the client.
The main issue for me is to set things up in such a way that the client receives the message "Your data has been sent for analysis" right after making the request, and then receives the result when the analysis is done.
The question is: what approach should I use in this case? Async responses, deferred responses, something else? And what known solutions could help me with that? Any gems?
I'm new to that stuff so I'm really sorry if I ask dumb questions.
If you are using HTTP, you can only have one response to each request. To send multiple responses, i.e. "work in progress" followed later by the "results", you would need a different protocol, e.g. WebSockets.
Since HTTP is so very common I'd stick with that in combination with background jobs. There are a couple of options which spring to mind.
Polling: The API kicks off a background job (to call the microservice) and responds to the client with a URL which the client can poll periodically for the result. The URL would respond with some kind of "work in progress" status until the result is actually ready. The URL would need to include some kind of id so the API can look up the background job.
The API would potentially have two URLs: /api/jobs/new and /api/jobs/<ID>. In Rails, they would map to a controller's new and show actions.
Webhooks: Have the client include a URL of its own in the request. Once the result is available have the background job hit the given URL with the result.
Either way, if using HTTP, you will not be able to handle the whole thing within a single request/response; you will have to use some kind of background processing (so the request to the microservice happens in a different process). You could look at Sidekiq, for example.
Here is an example for polling:
URL: example.com/api/jobs/new
the web app receives the client request
generates a unique id for the request, e.g. SecureRandom.uuid
starts a background job (Sidekiq), passing in the uuid and any other parameters needed
responds with a URL such as example.com/api/jobs/<uuid>
--
background job
sends a request to the microservice API and waits for the response
saves the result to the database under the uuid
--
URL: example.com/api/jobs/UUID
look in the database for the UUID; if not found, respond that the job is "in progress". If found, return the result from the database.
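The steps above can be sketched in plain Ruby. Here JOBS stands in for the database table, the Thread plays the part of the Sidekiq worker, and slow_microservice_call is a made-up stand-in for the real analysis call:

```ruby
require "securerandom"

JOBS    = {}   # stand-in for the database table holding results
THREADS = {}   # kept only so the sketch can wait on the "worker"
MUTEX   = Mutex.new

def slow_microservice_call(params)
  sleep 0.05                         # pretend this takes a long time
  "analysis of #{params}"
end

# POST /api/jobs/new: kick off the job, hand back the polling URL.
def create_job(params)
  uuid = SecureRandom.uuid
  THREADS[uuid] = Thread.new do
    result = slow_microservice_call(params)
    MUTEX.synchronize { JOBS[uuid] = result }
  end
  { uuid: uuid, poll_url: "/api/jobs/#{uuid}" }
end

# GET /api/jobs/<uuid>: "in progress" until the result is in the database.
def job_status(uuid)
  MUTEX.synchronize do
    JOBS.key?(uuid) ? { state: "done", result: JOBS[uuid] } : { state: "in progress" }
  end
end

job = create_job("some data")
THREADS[job[:uuid]].join             # a real client would poll instead of joining
job_status(job[:uuid])               # => { state: "done", result: "analysis of some data" }
```

In the real app the show action does the job_status lookup, and the result row carries the uuid as a unique key.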
This depends on what kind of API you use; I assume your clients interact via HTTP.
If you want to build an asynchronous API over HTTP, the first thing you should do is accept the request, create a job, handle it in the background, and return immediately.
For the client to get the response you have two options:
Implement a status endpoint that clients can poll periodically for the status of the job.
Implement a callback via webhooks: the client provides a URL which you then call once you're done.
A good starting point for background processing is the sidekiq gem, or more generally ActiveJob, which ships with Rails.
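For the webhook option, the job is essentially a class with a perform method. A minimal sketch, where the notifier lambda stands in for the HTTP POST to the client's callback URL so the example runs without a network (with Sidekiq you would include Sidekiq::Worker and do the POST with Net::HTTP; all names here are made up):

```ruby
# Webhook sketch: run the slow analysis, then notify the client's callback URL.
class AnalysisJob
  # `notifier` is called with (url, result) once the work is done; it
  # stands in for the real HTTP POST to the client-provided URL.
  def initialize(notifier)
    @notifier = notifier
  end

  def perform(data, callback_url)
    result = { analyzed: data.upcase }   # stand-in for the slow analysis
    @notifier.call(callback_url, result)
    result
  end
end

received = nil
job = AnalysisJob.new(->(url, result) { received = [url, result] })
job.perform("abc", "https://client.example/hook")
received  # => ["https://client.example/hook", { analyzed: "ABC" }]
```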

Foreground or background image manipulations in Rails (Jruby, Torquebox)

I upload photos with Ajax; the manipulations and the upload to S3 take a lot of time. I've heard it's better to run such tasks in the background. My app needs to wait until the photos are uploaded, but if I choose the background approach I will need to work with WebSockets or repeat the Ajax request to check the result (links to S3), and I'm not happy about that.
Why is it so bad to do heavy computations right in the controller (foreground)?
I currently use TorqueBox (JRuby), and as I understand it has excellent concurrency. Does that mean waiting on the S3 upload will not consume resources and everything will work fine?
Please write about the pros and cons of background vs. foreground in my situation. Thank you!
It is generally considered bad practice to block a web request handler on a network request to a third-party service. If that service becomes slow or unavailable, this can clog up all your web processes, regardless of which Ruby you are using. This is what you are referring to as 'foreground'.
Essentially this is the flow of your current setup (in the foreground):
A user uploads an image on your site and your desired controller receives the request.
Your controller makes a synchronous request to S3. This is a blocking request.
Your controller waits.
Your controller waits.
Your controller continues to wait.
Finally (and this is not guaranteed), you receive a response from S3, and your code continues on to render your given view/JSON/text/etc.
Clearly steps 3-5 are very bad news for your server. As I stated earlier, this worker/thread/process (depending on your Ruby/Rails server framework) will be held up until the response from S3 is received, which potentially could never happen.
Here is the same flow with a background job, plus some JavaScript help on the front end for notification:
A user uploads an image on your site and your desired controller receives the request.
Your controller creates a new thread/process to make the request to S3. This is a non-blocking approach. You set a flag on a record that references your S3 image src, for example completed: false, and your code continues nicely to step 3. Your new thread/process will be the one waiting for the response from S3, and it will set the completed flag to true when S3 responds.
You render your view/JSON/text/etc. and thereby release your worker/thread/process for this request. Good news!
Now for the fun front-end stuff:
Your client receives your response, triggering front-end JavaScript to start a setInterval-like repeating function that pings your server every 3 seconds or so; your back-end controller checks whether the completed flag you set earlier is true, and if so responds with true.
Your client-side JavaScript receives the response and either continues to ping (until you designate that it should give up) or stops pinging because your app responded with true.
I hope this sets you on the right path. I decided against writing code for this answer because it seemed like you were looking for pros and cons. For actual implementation ideas, I would look into the following:
Sidekiq is excellent for solving the background-job issues described here. It will handle creating the new process in which you can make the request to S3.
Here is an excellent RailsCast that will help you get a better understanding of the code.

HTTP disconnect/timeout between request and response handling

Assume the following scenario:
The client sends an HTTP POST to the server.
The request is valid and has been processed by the server; data has been inserted into the database.
The web application responds to the client.
The client hits a timeout and never sees the HTTP response.
In this case we have a situation where:
- the client does not know whether its data was valid and was inserted properly
- the web server (a Rails 3.2 application) does not raise any exception, whether or not it is behind an Apache proxy
I can't find how to handle such a scenario in the HTTP documentation. My questions are:
a) Should the client expect that its data MAY already have been processed (and then try, for example, a GET request to check whether the data has been submitted)?
b) If not (a), should the server detect it? Is there a way to do this in Rails? In that case the changes could be reversed. I would expect some kind of exception from the Rails application, but there is none...
HTTP is a stateless protocol, which means that by definition you cannot know on the client side whether the HTTP POST succeeded or not.
There are some techniques that web applications use to overcome this HTTP 'feature'. They include:
server side sessions
cookies
hidden variables within the form
However, none of these are really going to help with your issue. When I have run into this type of issue in the past, it has almost always been the result of the server taking too long to process the web request.
There is a really great quote that I whisper to myself on sleepless nights:
“The web request is a scary place, you want to get in and out as quick
as you can” - Rick Branson
You want to get into and out of your web request in 100-500 ms. Meet those numbers and you will have a web application that behaves well and plays well with web servers.
To that end, I would suggest that you investigate how long your POSTs are taking and figure out how to shorten those requests. If you are doing some serious processing on the server side before your DBMS inserts, you should consider handing it off to some sort of tasking/queuing system.
An example of 'serious processing' could be some sort of image upload, possibly with some image processing after the upload.
An example of a tasking and queuing solution would be RabbitMQ with Celery.
An example solution to your problem could be:
insert a portion of your data into the DBMS (or, even faster, some NoSQL solution)
hand off the expensive processing to a background task
return to the user/web client (even though the task is still running in the background)
listen for the final response via polling, streaming, or WebSockets. This step is not a trivial undertaking, but the end result is well worth the effort.
Tighten up those web requests and it will be a rare day that your client does not receive a response.
On that rare day when the client does not receive the data: how do you prevent multiple POSTs? I don't know anything about your data, but there are some schema-related things you can do to uniquely identify a POST, i.e. figure out on the server side whether the data is an update or a create.
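One way to make a resend safe is a client-generated request id: the client creates it once and reuses it on retry, and the server treats a repeated id as "already processed" instead of inserting twice. A plain-Ruby sketch (in Rails this would be a unique index plus find_or_create_by; STORE and the method names are made up for the example):

```ruby
require "securerandom"

STORE = {}   # stand-in for the table, keyed by the client's request id

def handle_post(request_id, data)
  return [:duplicate, STORE[request_id]] if STORE.key?(request_id)
  STORE[request_id] = data             # the actual insert
  [:created, data]
end

rid = SecureRandom.uuid                # generated once, on the client
handle_post(rid, "payload")            # => [:created, "payload"]
handle_post(rid, "payload")            # resend after a timeout => [:duplicate, "payload"]
```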
This answer covers some of the polling / streaming / websockets techniques you can use.
You can handle this with Ajax and jQuery, as the documentation for the complete callback explains below:
complete
Type: Function( jqXHR jqXHR, String textStatus )
A function to be called when the request finishes (after success and error callbacks are executed). The function gets passed two arguments: The jqXHR (in jQuery 1.4.x, XMLHTTPRequest) object and a string categorizing the status of the request ("success", "notmodified", "error", "timeout", "abort", or "parsererror").
jQuery Ajax API
As for your second question, whether there is a way to handle this through Rails: the answer is no, since the timeout happens on the client side, not the server side. However, to revert the changes, I suggest using one of the following to detect whether the user is still online:
http://socket.io/
websocket-rails

Correct response to POST request for long running process

I am trying to code an API with a long-running process to which an end user may make a POST request:
POST /things { "some":"json" }
The actual creation process can take some time and will often be queued; it might take many minutes. As a result, I am not sure what I should return, or when. Is it the usual 201 plus the object, returned after however long it takes my API to create the object? Isn't that going to cause problems at the client end? Is there some other standard way to do this, such as an intermediate step?
I'm using Rails & Grape for my API if that helps.
Consider whether the Post/Redirect/Get pattern suits your needs. For example, you can return a 303 redirect to some sort of status page where the client can check the progress of the request. In general, 201 plus the object is a poor choice if the client has to wait for any appreciable period, because too many things can go wrong (what if, out of annoyance or impatience, the user kills the browser window, or refreshes, or resubmits?).
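As raw HTTP, that flow might look like the sketch below (the status path and id are invented for the example). The client then GETs the status URL, which reports progress until the object exists and can finally point at the created resource:

```
POST /things HTTP/1.1
Content-Type: application/json

{ "some": "json" }

HTTP/1.1 303 See Other
Location: /things/status/42
```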

Canceling a request when connection to client is lost

I noticed that in a standard Grails environment a request is always executed to the end, even when the client connection is lost and the result can no longer be delivered.
Is there a way to configure the environment so that execution of a request is canceled as soon as the client connection is lost?
Update: Thanks for the answers. Yes, most of the problems I am trying to avoid can be avoided by better coding:
caching can make nearly every page fast
a token can help avoid submitting something twice
But there are some requests which could still take some time. Take a map service as an example: calculating a route takes a while. One solution to avoid resubmitting the request could be a "calculationInProgress" flag together with a message to the user. But then it is still possible to create a lot of sessions, and thus a lot of requests, in order to mount a DoS attack...
I am still curious: is there really no way to configure the server to cancel the request? I used to develop on a system where the server behaved this way, and it was great :-)
Probably there is no such way, and I'm sure Grails (and your web container) is designed to:
accept the incoming request
process it on the server side
send the response
If something happens during phase 2, you'll only find out about it in the send-response phase. You can actually write data to the HttpServletResponse yourself, handle IOException, and so on, but that would be far too low-level, I think. And it will not help you cancel your DB operations while you're preparing the data to send.
By the way, it's a common pattern to put a web frontend like nginx in front, which accepts incoming requests and handles all these problems with cancelled requests, slow requests (I guess that's the real problem?), etc.
According to your comment, it is reloads and multiple clicks that you are trying to avoid. The proper technique is to use Grails' support for handling multiple form submissions:
http://grails.org/doc/2.0.x/guide/theWebLayer.html#formtokens