How can I retrieve data onscreen from an external API without tying up a request? - ruby-on-rails

I have a Rails 3 application that lets a user perform a search against a 3rd party database via an API. It could potentially bring back quite a bit of XML data. Also, the API could be busy serving requests for other users and have a nontrivial delay in its response time.
I only run 2 webservers so I can't afford to have a delayed request obviously. I use Sidekiq to process long-running jobs, but in all the cases I've needed that I haven't had to return a value to the screen.
I also use Pusher to communicate back to the user when a background job is finished. I am checking it out, but I don't know if it can be used for the kind of data I want to push to the screen. Right now it just pops up dialog boxes with messages that I send it.
I have thought of some pretty kooky stuff, like running the request via Sidekiq, sending the results to a session object or file, then using Pusher to kick off some kind of event to grab the data and populate the screen with it. Seems kind of Rube Goldberg-ish.
I appreciate any help or insight anyone can offer into the problem!

I had a similar situation not long ago, and I solved it with memcache and threads.
I also thought about using Sidekiq, but Sidekiq is ideal when you don't expect to use the data right away. Memcache and threads worked pretty well and gave us a good amount of control.
Instead of calling the API directly, I would hand the API request to a thread, and once the thread was done it would write to memcache. In my case this can happen incrementally: the same API endpoint can keep returning more data until the result is complete.
From the UI I would have a basic Ajax polling mechanism that hits a controller, which checks memcache for data and for a status showing whether the request is complete. That status tells the UI whether it needs to keep polling for more data.
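The flow described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `ResultStore` is a hypothetical in-process stand-in for memcache, `fetch_pages` is a made-up API client that returns results in batches, and the loop at the end plays the role of the browser's Ajax polling.

```ruby
# Stand-in for memcache (assumption): a thread-safe in-process store. A real
# app would use a memcache client so every web process sees the same data.
class ResultStore
  def initialize
    @lock = Mutex.new
    @data = Hash.new { |hash, key| hash[key] = { status: "pending", rows: [] } }
  end

  # Append one batch of results for a search.
  def append(key, rows)
    @lock.synchronize { @data[key][:rows].concat(rows) }
  end

  # Mark the search as finished so the UI can stop polling.
  def complete(key)
    @lock.synchronize { @data[key][:status] = "complete" }
  end

  # What the polling controller action would render, e.g. as JSON.
  def snapshot(key)
    @lock.synchronize do
      entry = @data[key]
      { status: entry[:status], rows: entry[:rows].dup }
    end
  end
end

# Hypothetical API client: yields batches of results as they arrive.
def fetch_pages
  [["a", "b"], ["c"], ["d", "e"]].each do |page|
    sleep 0.01 # simulate a slow third-party API
    yield page
  end
end

STORE = ResultStore.new

# Called from the action that starts the search; returns immediately.
def start_search(key)
  Thread.new do
    fetch_pages { |page| STORE.append(key, page) }
    STORE.complete(key)
  end
end

worker = start_search("search-1")

# The browser would poll over Ajax; here a loop plays that role.
snapshot = STORE.snapshot("search-1")
until snapshot[:status] == "complete"
  sleep 0.01
  snapshot = STORE.snapshot("search-1")
end
worker.join
puts snapshot[:rows].inspect
```

The sketch does not handle a failing API call; a real version would probably also store a "failed" status so the UI does not poll forever.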

Related

Long-call asynchronous data delivery for Rails app?

We have a rails app that does some user-driven/filtered data representation over a large dataset. So we're calculating things on the fly and it takes longer than the 15s Unicorn gives us!
What's the best option here? I was thinking of using a pub/sub model (like a Node/Faye setup) to allow the rails app to send data that the browser could then render.
I guess another option is to try to pre-generate the data, but as we have a lot of clients and very few would be looking at the data it seems like we'd be wasting a lot of time on preparing data that would never be used.
You're on the right track with pre-generating the data.
If you're concerned about needless number crunching and want to do it on-demand, you could kick off a background job to process the data, and poll periodically to see if the background generation is done or not.
If you're looking for a library to do this for you:
If you're using ActionCable already, get_schwifty was built for this very purpose (shameless plug, I'm the author).
render_async is another option if you're not using ActionCable; however, I believe it still does the processing in a Unicorn process instead of a background job.

C# 5 .NET MVC long async task, progress report and cancel globally

I use ASP.NET MVC 5 and I have a long-running action which has to poll web services, process data and store the results in a database.
For that I want to use the TPL to start the task asynchronously.
But I wonder how to do 3 things:
I want to report the progress of this task. For this I am thinking about SignalR.
I want to be able to leave the page where I started this task and still report its progress across the website (from a panel on the left but this is ok).
And I want to be able to cancel this task globally (from my panel on the left).
I know a fair amount about all of the technologies involved, but I'm not sure about the best way to achieve this.
Can someone help me find the best solution?
The fact that you want to run long running work while the user can navigate away from the page that initiates the work means that you need to run this work "in the background". It cannot be performed as part of a regular HTTP request because the user might cancel his request at any time by navigating away or closing the browser. In fact this seems to be a key scenario for you.
Background work in ASP.NET is dangerous. You can certainly pull it off but it is not easy to get right. Also, worker processes can exit for many reasons (app pool recycle, deployment, machine reboot, machine failure, stack overflow or OOM exception on an unrelated thread). So make sure your long-running work tolerates being aborted mid-way. You can reduce the likelihood that this happens but never exclude the possibility.
You can make your code safe in the face of arbitrary termination by wrapping all work in a transaction. This of course only works if you don't cause non-transacted side-effects like web-service calls that change state. It is not possible to give a general answer here because achieving safety in the presence of arbitrary termination depends highly on the concrete work to be done.
Here's a possible architecture that I have used in the past:
When a job comes in you write all necessary input data to a database table and report success to the client.
You need a way to start a worker to work on that job. You could start a task immediately for that. You also need a periodic check that looks for unstarted work in case the app exits after having added the work item but before starting a task for it. Have the Windows task scheduler call a secret URL in your app once per minute that does this.
When you start working on a job you mark that job as running so that it is not accidentally picked up a second time. Work on that job, write the results and mark it as done. All in a single transaction. When your process happens to exit mid-way the database will reset all data involved.
Write job progress to a separate table row on a separate connection and separate transaction. The browser can poll the server for progress information. You could also use SignalR but I don't have experience with that and I expect it would be hard to get it to resume progress reporting in the presence of arbitrary termination.
Cancellation would be done by setting a cancel flag in the progress information row. The app needs to poll that flag.
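The steps above can be sketched as follows. This is a rough illustration in Ruby, not the answerer's code: `JobTable` is a hypothetical in-memory stand-in for the job and progress tables, a mutex takes the place of database transactions, and doubling each item is a placeholder for the real work. Recovery after a process exit, which the answer gets from the database transaction, is not modeled here.

```ruby
# In-memory stand-in for the jobs and progress tables (assumption: a real
# implementation would use database rows and transactions instead).
class JobTable
  Job = Struct.new(:id, :items, :state, :result, :done_count, :cancel_requested)

  def initialize
    @lock = Mutex.new
    @jobs = {}
  end

  # Record the input data and return at once.
  def enqueue(items)
    @lock.synchronize do
      id = @jobs.size + 1
      @jobs[id] = Job.new(id, items, :pending, nil, 0, false)
      id
    end
  end

  # Pick one pending job and mark it running in a single step, so it is
  # not accidentally picked up a second time.
  def claim_next
    @lock.synchronize do
      job = @jobs.values.find { |j| j.state == :pending }
      next nil unless job
      job.state = :running
      job.id
    end
  end

  def items(id)
    @lock.synchronize { @jobs.fetch(id).items.dup }
  end

  def report_progress(id, done_count)
    @lock.synchronize { @jobs.fetch(id).done_count = done_count }
  end

  def request_cancel(id)
    @lock.synchronize { @jobs.fetch(id).cancel_requested = true }
  end

  def cancel_requested?(id)
    @lock.synchronize { @jobs.fetch(id).cancel_requested }
  end

  def finish(id, state, result)
    @lock.synchronize do
      job = @jobs.fetch(id)
      job.state = state
      job.result = result
    end
  end

  # What a progress endpoint polled by the browser could return.
  def status(id)
    @lock.synchronize do
      job = @jobs.fetch(id)
      { state: job.state, done: job.done_count, total: job.items.size, result: job.result }
    end
  end
end

# Worker loop body: checks the cancel flag before each unit of work.
def run_job(table, id)
  results = []
  table.items(id).each_with_index do |item, index|
    if table.cancel_requested?(id)
      table.finish(id, :cancelled, results)
      return
    end
    results << item * 2 # placeholder for the real work
    table.report_progress(id, index + 1)
  end
  table.finish(id, :done, results)
end

table = JobTable.new
first = table.enqueue([1, 2, 3])
second = table.enqueue([4, 5, 6])
table.request_cancel(second) # cancelled before a worker reaches it

# The periodic check for unstarted work would run this loop.
while (id = table.claim_next)
  run_job(table, id)
end

puts table.status(first).inspect
puts table.status(second).inspect
```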
Maybe you can make use of message queueing for job processing, but I'm always wary of using it. To process a message in a transacted way you need MSDTC, which is unsupported with many high-availability solutions for SQL Server.
You might think that this architecture is not very sophisticated. It makes use of polling for lots of things. Polling is a primitive technique but it works quite well. It is reliable and well-understood. It has a simple concurrency model.
If you can assume that your application never exits at inopportune times the architecture would be much simpler. But this cannot be assumed. You cannot assume that there will be no deployments during work hours and that there will be no bugs leading to crashes.
Even though running a long task inside an HTTP worker is a bad idea, I have made a small example of how to manage it with SignalR:
Inside this example you can :
Start a task
See task progression
Cancel task
It's based on :
twitter bootstrap
knockoutjs
signalR
C# 5.0 async/await with CancellationToken and IProgress<T>
You can find the source of this example here :
https://github.com/dragouf/SignalR.Progress

Using AFNetworking to process multiple JSON responses for a single request

I'm trying to find a way to open up a connection to a web service and have that service send down JSON objects on an as-needed basis.
Say I request 20 profiles from a service. Instead of waiting for the service to build all 20, the service would build the first profile and throw it back down to the client until all 20 are created.
I've been using AFNetworking and would like to continue using it. Eventually I'd like to contribute this component back to the community if it requires an addition.
Anyone have any ideas on tackling something like this? Right now I have a service pushing JSON every few seconds to test with.
A couple of thoughts:
If you want to open a connection and respond to transmissions from the server, a socket-based model seems to make sense. See Ray Wenderlich's How To Create A Socket Based iPhone App and Server for an example (the server-side stuff is likely to change based upon your server architecture, but it gives you an example). But AFNetworking is built on the NSURLConnection framework, not a socket framework, so if you wanted to integrate your socket classes into that framework, a considerable amount of work would be involved.
Another, iOS-specific model is to use Apple's push notification service (see the push-related sections of the Local and Push Notification Programming Guide).
A third approach would be to stay with a pull mechanism. If you're looking for a way to consume multiple feeds in a non-serial fashion, you could create multiple AFURLConnectionOperation operations (or the appropriate subclass) and submit them concurrently (you may want to constrain maxConcurrentOperationCount on the queue to 4 or 5, as iOS can only have so many concurrent network operations). By issuing these concurrently, you mitigate many of the delays that result from network latencies. If you pursue this approach, some care might have to be taken for thread safety, but it's probably easier than the above two techniques.
This sounds like a job for a socket (or a web socket, whatever is easier).
I don't believe there is support for this in AFNetworking. This could be implemented in NSURLConnection's connection:didReceiveData: delegate method. This is triggered every time a piece of data is received, so you can do your parsing and messaging from that point. Unfortunately, I can't think of a very clean way to implement this.
Perhaps a better approach is to handle the repeated requests via a pagination-style technique. You would request page 1 of profiles with 1 profile per page, then request page 2, etc. You could then control the flow, i.e. whether you want to request all pages in parallel or request one and then the next sequentially. This would be less work to implement, and would (in my opinion) be cleaner and easier to maintain.
AFNetworking supports batching of requests with AFHTTPClient -enqueueBatchOfHTTPRequestOperations:progressBlock:completionBlock:.
You can use this method to return on each individual operation, as well as when all of the operations in the batch have finished.

Prevent request timeout with long requests

I have a Rails Controller on Heroku where I send emails in a loop, and respond to the user with some information on which email address the emails were sent to.
While this works when only a few (~40) emails need to be sent out, the request times out when there are more than that (e.g. > 40).
Heroku states in their guides that requests must respond with at least one byte within 30 seconds: https://devcenter.heroku.com/articles/request-timeout
While I know this is not the best way to achieve this, I'm currently trying to figure out how to do this in Ruby.
If this were a PHP app, I could do an echo before entering the loop, and then keep echoing something in every iteration. How do I achieve something similar in rails?
Your best bet is to not do the mailing before sending the response back. You will have better luck first adding the job to one of Heroku's many available worker queues, then redirecting to a monitoring page that displays the job progress and updates itself periodically. If you are trying to avoid using one of those queue services for budget reasons, you may be able to accomplish the same thing using a new thread instead of a queue. Either way, this technique will scale better and recover from failure more easily.
As you appear to already know that your proposed solution is not the ideal solution, I will also try to answer your exact question. You may be able to make HTTP streaming work for this. I would recommend checking out http://railscasts.com/episodes/266-http-streaming.
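To give a rough idea of what the streaming approach looks like, here is a minimal sketch of a Rack-style response body that emits a first line before the loop starts and one line per email afterwards. `MailingBody`, the `deliver` lambda and the example addresses are hypothetical, and the sketch only shows the shape of the body. Whether each chunk actually reaches the browser right away depends on the web server and on any buffering middleware, so this is not guaranteed to beat the 30-second limit on its own.

```ruby
# A Rack-style response body: the server calls #each and can write every
# yielded string to the client as soon as it is produced.
class MailingBody
  def initialize(addresses, deliver)
    @addresses = addresses
    @deliver = deliver
  end

  def each
    # First bytes of the response, produced before any mail is sent.
    yield "Sending #{@addresses.size} emails\n"
    @addresses.each do |address|
      @deliver.call(address)
      yield "sent to #{address}\n"
    end
  end
end

# Hypothetical delivery step; a real app would call its mailer here.
delivered = []
deliver = ->(address) { delivered << address }

# A minimal Rack-compatible app returning [status, headers, body].
app = lambda do |_env|
  body = MailingBody.new(%w[a@example.com b@example.com], deliver)
  [200, { "content-type" => "text/plain" }, body]
end

# Stand-in for the web server consuming the response.
status, _headers, body = app.call({})
chunks = []
body.each { |chunk| chunks << chunk }
puts chunks.join
```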

Ruby/Rails synchronous job manager

hi
i'm going to set up a rails-website where, after some initial user input, some heavy calculations are done (via c-extension to ruby, will use multithreading). as these calculations are going to consume almost all cpu-time (memory too), there should never be more than one calculation running at a time. also i can't use (asynchronous) background jobs (like with delayed job) as rails has to show the results of that calculation and the site should work without javascript.
so i suppose i need a separate process where all rails instances have to queue their calculation requests and wait for the answer (maybe an error message if the queue is full), kind of a synchronous job manager.
does anyone know if there is a gem/plugin with such functionality?
(nanite seemed pretty cool to me, but seems to be only asynchronous, so the rails instances would not know when the calculation is finished. is that correct?)
another idea is to write my own using distributed ruby (drb), but why invent the wheel again if it already exists?
any help would be appreciated!
EDIT:
because of the tips of zaius i think i will be able to do this asynchronously, so i'm going to try resque.
Ruby has mutexes / semaphores.
http://www.ruby-doc.org/core/classes/Mutex.html
You can use a semaphore to make sure only one resource intensive process is happening at the same time.
http://en.wikipedia.org/wiki/Mutex
http://en.wikipedia.org/wiki/Semaphore_(programming)
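A minimal sketch of that idea, with a made-up `heavy_calculation` standing in for the C extension call. Note that a Ruby `Mutex` only coordinates threads inside one process; if several Rails processes are running, each would have its own mutex, so some cross-process lock would be needed instead.

```ruby
# One lock shared by every thread in this process.
CALCULATION_LOCK = Mutex.new

# Bookkeeping used only to observe how many calculations overlap.
running = 0
max_running = 0
counter_lock = Mutex.new

# Hypothetical heavy calculation; squaring stands in for the real work.
heavy_calculation = lambda do |n|
  CALCULATION_LOCK.synchronize do
    counter_lock.synchronize do
      running += 1
      max_running = [max_running, running].max
    end
    sleep 0.01 # pretend this is CPU-heavy
    result = n * n
    counter_lock.synchronize { running -= 1 }
    result
  end
end

# Five requests arrive at once; the lock lets them through one at a time.
threads = (1..5).map { |n| Thread.new { heavy_calculation.call(n) } }
results = threads.map(&:value)

puts results.inspect
puts "most calculations running at once: #{max_running}"
```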
However, the idea of blocking a front end process while other tasks finish doesn't seem right to me. If I was doing this, I would use a background worker, and then use a page (or an iframe) with the refresh meta tag to continuously check on the progress.
http://en.wikipedia.org/wiki/Meta_refresh
That way, you can use the same code for both javascript enabled and disabled clients. And your web app threads aren't blocking.
If you have a separate process, then you have a background job... so either you can have it or you can't...
What I have done is have the website write the request params to a database. Then a separate process looks for pending requests in the database - using the daemons gem. It does the work and writes the results back to the database.
The website then polls the database until the results are ready and then displays them.
Although I use javascript to make it do the polling.
If you really can't use javascript, then it seems you need to either do the work in the web request thread or make that thread wait for the background thread to finish.
To make the web request thread wait, just do a loop in it, checking the database until the reply is saved back into it. Once it's there, you can then complete the request.
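A rough sketch of such a wait loop, under some assumptions: the `RESULTS` hash is a stand-in for the database table, the thread simulates the separate daemon process, and `wait_for_result` is a hypothetical helper. The timeout is an addition that the description above does not mention; without one, a request would hang forever if the background process died.

```ruby
# Stand-in for the results table (assumption): request id => result.
RESULTS = {}
RESULTS_LOCK = Mutex.new

# Block until a result for request_id appears, or give up after timeout
# seconds and return nil.
def wait_for_result(request_id, timeout: 5.0, interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = RESULTS_LOCK.synchronize { RESULTS[request_id] }
    return result unless result.nil?
    return nil if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Simulated background process writing its result a little later.
worker = Thread.new do
  sleep 0.05
  RESULTS_LOCK.synchronize { RESULTS[42] = "calculation finished" }
end

found = wait_for_result(42)
missing = wait_for_result(43, timeout: 0.05) # no one ever writes id 43
worker.join

puts found.inspect
puts missing.inspect
```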
HTH, chris

Resources