Rails 4: Live or Faye or polling - Reload page when background job is finished - ruby-on-rails

I have a background job (Sidekiq) that generates a PDF and saves it to S3 with Paperclip.
During the process, I present a "waiting page" to the user.
I'm wondering how to auto-reload the waiting page with the PDF once the background job is completed.
I know how to present the PDF inline when it exists (when the background job is finished).
I know how to check whether it exists on page load, but I need to check for its presence automatically until it succeeds.
I did my homework and checked the possible solutions:
Polling - seems to be an outdated solution, and not the most efficient one
ActionController::Live::SSE - seems interesting but complex given my limited knowledge of live streaming. It is also not compatible with IE (even if that's not a blocker, it is still a con)
Faye and WebSockets - still, I have difficulty understanding how to implement them for my need
Could you please help me decide on the best technique, and the proper way to implement it? (Mostly the timeout part for polling, and the reload part for Live::SSE and Faye.) Most examples show chat apps, which makes me wonder if I'm on the right track.
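For reference, a minimal sketch of the polling option mentioned above, assuming a Document model with a Paperclip attachment named pdf (the controller, route, and model names here are hypothetical):

    # app/controllers/documents_controller.rb
    class DocumentsController < ApplicationController
      # GET /documents/:id/status -- polled by the waiting page
      def status
        document = Document.find(params[:id])
        # Paperclip populates the pdf_file_name column once the
        # attachment has been saved, so its presence means "ready"
        render json: { ready: document.pdf_file_name.present? }
      end
    end

The waiting page would hit this endpoint every few seconds from JavaScript, keep a counter so it can give up after a timeout, and call window.location.reload() once ready is true.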

Related

Long-call asynchronous data delivery for Rails app?

We have a Rails app that does some user-driven/filtered data representation over a large dataset. So we're calculating things on the fly, and it takes longer than the 15s Unicorn gives us!
What's the best option here? I was thinking of using a pub/sub model (like a Node/Faye setup) to allow the Rails app to send data that the browser could then render.
I guess another option is to try to pre-generate the data, but as we have a lot of clients and very few would be looking at the data, it seems like we'd be wasting a lot of time preparing data that would never be used.
You're on the right track with pre-generating the data.
If you're concerned about needless number crunching and want to do it on-demand, you could kick off a background job to process the data, and poll periodically to see if the background generation is done or not.
If you're looking for a library to do this for you:
If you're already using ActionCable, get_schwifty was built for this very purpose (shameless plug, I'm the author).
render_async is another option if you're not using ActionCable; however, I believe it still does the processing in a Unicorn process instead of a background job.
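A minimal sketch of the hand-rolled kick-off-and-poll approach described above, using a Sidekiq worker (the Report model and its columns are hypothetical):

    # app/workers/report_worker.rb
    class ReportWorker
      include Sidekiq::Worker

      def perform(report_id)
        report = Report.find(report_id)
        data = report.crunch_numbers # placeholder for the real calculation
        report.update!(result: data, status: "done")
      end
    end

The browser then polls a lightweight controller action that returns report.status, and renders the result once it flips to "done".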

How can I retrieve data onscreen from an external API without tying up a request?

I have a Rails 3 application that lets a user perform a search against a 3rd party database via an API. It could potentially bring back quite a bit of XML data. Also, the API could be busy serving requests for other users and have a nontrivial delay in its response time.
I only run two web servers, so obviously I can't afford to have a delayed request tying one up. I use Sidekiq to process long-running jobs, but in all the cases where I've needed that, I haven't had to return a value to the screen.
I also use Pusher to communicate back to the user when a background job is finished. I am checking it out, but I don't know if it can be used for the kind of data I want to push to the screen. Right now it just pops up dialog boxes with messages that I send it.
I have thought of some pretty kooky stuff, like running the request via Sidekiq, sending the results to a session object or file, then using Pusher to kick off some kind of event to grab the data and populate the screen with it. Seems kind of Rube Goldberg-ish.
I appreciate any help or insight anyone can offer into the problem!
I had a similar situation not long ago, and the way I fixed it was by using memcached and threads.
I also thought about using Sidekiq, but Sidekiq is ideal when you don't expect to use the data right away, so memcached and threads worked pretty well and gave us a good amount of control.
Instead of calling the API directly, I would assign the API request to a thread, and this thread, once done, would write to memcached. In my case this could happen incrementally, with the same API able to return more data from the same endpoint until it was complete.
From the UI I would have a basic AJAX polling mechanism that would hit a controller and check memcached for the data and for the status, to see whether it was complete or not; this would signal the UI that it needed to keep polling for more data.
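A minimal sketch of that thread-plus-cache pattern, using Rails.cache as the memcached interface (the controller and ThirdPartyApi names are hypothetical):

    class SearchesController < ApplicationController
      # POST /searches -- kicks off the slow API call in a thread
      def create
        key = "search:#{SecureRandom.hex(8)}"
        Rails.cache.write(key, { status: "pending", results: [] })
        Thread.new do
          results = ThirdPartyApi.fetch(params[:q]) # the slow external call
          Rails.cache.write(key, { status: "complete", results: results })
        end
        render json: { key: key }
      end

      # GET /searches/:id -- polled by the UI until status is "complete"
      def show
        render json: Rails.cache.read(params[:id])
      end
    end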

C# 5 .NET MVC long async task, progress report and cancel globally

I use ASP.NET MVC 5 and I have a long-running action which has to poll web services, process data, and store it in a database.
For that I want to use the TPL library to start the task asynchronously.
But I wonder how to do three things:
I want to report the progress of this task. For this I am thinking about SignalR.
I want to be able to leave the page where I started this task and still be able to see its progress across the website (from a panel on the left, but that part is OK).
And I want to be able to cancel this task globally (from my panel on the left).
I know a fair amount about all of the technologies involved, but I'm not sure about the best way to achieve this.
Can someone help me find the best solution?
The fact that you want to run long running work while the user can navigate away from the page that initiates the work means that you need to run this work "in the background". It cannot be performed as part of a regular HTTP request because the user might cancel his request at any time by navigating away or closing the browser. In fact this seems to be a key scenario for you.
Background work in ASP.NET is dangerous. You can certainly pull it off, but it is not easy to get right. Also, worker processes can exit for many reasons (app pool recycle, deployment, machine reboot, machine failure, a stack overflow or OOM exception on an unrelated thread). So make sure your long-running work tolerates being aborted mid-way. You can reduce the likelihood that this happens, but never exclude the possibility.
You can make your code safe in the face of arbitrary termination by wrapping all work in a transaction. This of course only works if you don't cause non-transacted side-effects like web-service calls that change state. It is not possible to give a general answer here because achieving safety in the presence of arbitrary termination depends highly on the concrete work to be done.
Here's a possible architecture that I have used in the past:
When a job comes in you write all necessary input data to a database table and report success to the client.
You need a way to start a worker to work on that job. You could start a task immediately for that. You also need a periodic check that looks for unstarted work in case the app exits after having added the work item but before starting a task for it. Have the Windows task scheduler call a secret URL in your app once per minute that does this.
When you start working on a job you mark that job as running so that it is not accidentally picked up a second time. Work on that job, write the results and mark it as done. All in a single transaction. When your process happens to exit mid-way the database will reset all data involved.
Write job progress to a separate table row on a separate connection and separate transaction. The browser can poll the server for progress information. You could also use SignalR but I don't have experience with that and I expect it would be hard to get it to resume progress reporting in the presence of arbitrary termination.
Cancellation would be done by setting a cancel flag in the progress information row. The app needs to poll that flag.
Maybe you can make use of message queueing for job processing but I'm always wary to use it. To process a message in a transacted way you need MSDTC which is unsupported with many high-availability solutions for SQL Server.
You might think that this architecture is not very sophisticated. It makes use of polling for lots of things. Polling is a primitive technique but it works quite well. It is reliable and well-understood. It has a simple concurrency model.
If you can assume that your application never exits at inopportune times the architecture would be much simpler. But this cannot be assumed. You cannot assume that there will be no deployments during work hours and that there will be no bugs leading to crashes.
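The architecture is language-agnostic; for consistency with the rest of this page, here is a minimal sketch of the claim-and-work step in Ruby/ActiveRecord (the jobs table and its columns are hypothetical):

    class Job < ActiveRecord::Base
      # status: "pending" -> "running" -> "done"
      def self.work_one
        transaction do
          # The row lock (SELECT ... FOR UPDATE) keeps a second worker
          # from claiming the same job
          job = where(status: "pending").lock(true).first
          next unless job # nothing pending
          job.update!(status: "running")
          result = job.perform_expensive_work # must avoid non-transacted side effects
          job.update!(status: "done", result: result)
        end
        # If the process dies mid-way the transaction rolls back, the job
        # reverts to "pending", and the periodic sweep will find it again.
      end
    end

Progress rows and the cancel flag would be written on a separate connection in a separate transaction, exactly as described above, so they remain visible while the job transaction is still open.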
Even though using an HTTP worker to run a long task is a bad idea, I have made a small example of how to manage it with SignalR:
Inside this example you can:
Start a task
See task progression
Cancel task
It's based on:
Twitter Bootstrap
Knockout.js
SignalR
C# 5.0 async/await with CancellationToken and IProgress
You can find the source of this example here:
https://github.com/dragouf/SignalR.Progress

RnR: Long running process

I have a part of my application that creates an export file. The export process is fairly quick for the vast majority of users; however, there are users that generate 10,000 or more records. This complicates things. First, the tool that imports the files blows up on files larger than about 4,000 records. Secondly, the process for 10,000 records takes about 20 minutes. The users have a tendency to start doing other things, and then, for whatever reason, the process seems to time out and they never get their file. However, if you click the process button and just leave your machine alone, 20 minutes later you will get the file.
I need to make this more user-friendly and robust. Here are my ideas:
1) automatically create separate files of 4,000 records a pop
2) provide a status bar for the file generation
3) background the process so a user can click the button, come back say an hour later, and download their files
So I have been doing research on the background plugins and gems. Most seem to be fairly out of date, which makes me nervous, and some seem like major overkill for what I need. Spawn seemed simple and straightforward, but I'm unclear on how to build a status bar with that type of product.
Then we have something like delayed_job. This seems like it would work, but it also seems a little heavy; it does provide the hooks to generate some kind of status update, though. Does anyone have an example of this? The README is a little light.
Another issue is the file generation: how do I get these multiple files to download? Is there any way I can store the generated files for the life of the user session?
Finally, most of the solutions are looking like a major change. This issue is painful but technically works, and the time I am being allotted to solve it is minimal, so I am trying to KISS. Thanks for any help and/or direction you can provide.
If you're looking for background processing, I suggest you look at Resque. It's super easy and runs on Redis, as opposed to delayed_job, which polls your database for changes.
As for gathering progress info, I guess there are a bunch of Resque plugins, one of which can help you in this quest.
Lastly:
"Another issue is the file generation: how do I get these multiple files to download? Is there any way I can store the generated files for the life of the user session?"
Not sure what you actually meant, but if you want multiple files to download, zipping them into one can help.
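A minimal sketch of that zip suggestion, using the rubyzip gem (the file paths are hypothetical):

    require "zip"

    # Bundle several generated export files into a single download
    def bundle_exports(paths, zip_path)
      Zip::File.open(zip_path, Zip::File::CREATE) do |zipfile|
        paths.each { |path| zipfile.add(File.basename(path), path) }
      end
    end

    bundle_exports(Dir.glob("tmp/exports/*.csv"), "tmp/exports/all.zip")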

Ruby/Rails synchronous job manager

Hi,
I'm going to set up a Rails website where, after some initial user input, some heavy calculations are done (via a C extension to Ruby, which will use multithreading). As these calculations are going to consume almost all CPU time (and memory too), there should never be more than one calculation running at a time. Also, I can't use (asynchronous) background jobs (like with delayed_job), as Rails has to show the results of the calculation and the site should work without JavaScript.
So I suppose I need a separate process where all Rails instances have to queue their calculation requests and wait for the answer (maybe with an error message if the queue is full) - a kind of synchronous job manager.
Does anyone know if there is a gem/plugin with such functionality?
(Nanite seemed pretty cool to me, but it seems to be asynchronous only, so the Rails instances would not know when the calculation is finished. Is that correct?)
Another idea is to write my own using distributed Ruby (DRb), but why reinvent the wheel if it already exists?
Any help would be appreciated!
EDIT:
Because of zaius's tips, I think I will be able to do this asynchronously, so I'm going to try Resque.
Ruby has mutexes / semaphores.
http://www.ruby-doc.org/core/classes/Mutex.html
You can use a semaphore to make sure only one resource-intensive process is happening at a time.
http://en.wikipedia.org/wiki/Mutex
http://en.wikipedia.org/wiki/Semaphore_(programming)
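For example, a minimal sketch of the mutex approach (heavy_calculation is a hypothetical stand-in for the real work):

    require "thread" # Mutex is core in modern Rubies; the require is harmless

    # One process-wide lock shared by all request threads
    CALCULATION_LOCK = Mutex.new

    def run_exclusive_calculation
      CALCULATION_LOCK.synchronize do
        heavy_calculation # only one thread can be in here at a time
      end
    end

Note that a Ruby Mutex only coordinates threads within a single process; coordinating several Rails instances would need an external lock, for example a row lock in the database.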
However, the idea of blocking a front-end process while other tasks finish doesn't seem right to me. If I were doing this, I would use a background worker and then use a page (or an iframe) with the refresh meta tag to continuously check on the progress.
http://en.wikipedia.org/wiki/Meta_refresh
That way, you can use the same code for both JavaScript-enabled and JavaScript-disabled clients. And your web app threads aren't blocked.
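A minimal sketch of that waiting page as a controller action, using the Refresh HTTP header (which behaves like the meta tag, so it works without JavaScript); the Job model and its done? method are hypothetical:

    def show
      @job = Job.find(params[:id])
      # Ask the browser to reload this page every 5 seconds until done
      response.headers["Refresh"] = "5" unless @job.done?
    end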
If you have a separate process, then you have a background job... so either you can have it or you can't...
What I have done is have the website write the request params to a database. Then a separate process looks for pending requests in the database - using the daemons gem. It does the work and writes the results back to the database.
The website then polls the database until the results are ready and then displays them.
Although I use JavaScript to make it do the polling.
If you really can't use JavaScript, then it seems you need to either do the work in the web request thread or make that thread wait for the background thread to finish.
To make the web request thread wait, just run a loop in it, checking the database until the reply is saved back into it. Once it's there, you can then complete the request.
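A minimal sketch of that waiting loop (the CalculationRequest model is hypothetical; the timeout guards against waiting forever):

    require "timeout"

    def show
      calc = CalculationRequest.find(params[:id])
      # Block this web thread until the background daemon writes the result;
      # raises Timeout::Error if it takes longer than 60 seconds
      Timeout.timeout(60) do
        sleep 1 until calc.reload.result.present?
      end
      @result = calc.result
    end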
HTH, chris
