How to implement a multithreaded infinite-task with subthreads in Rails

Inside my Rails 4 application I need to make API calls to a webservice where I can ask for an instance of stock quotes. I can ask for 1000 stocks in one request and four requests at a time which is the throttle limit.
The workflow goes like this:
A user makes a request to my app and wants to get quotes for 12000 stocks.
So I chunk it into twelve requests and put them in a Queue.
At application start I start a thread in a loop which is supposed to look at the queue and since I am allowed to make concurrent requests I'd like to make 4 requests in parallel.
I get stuck in several ways. First of all I need to take into consideration that I can get multiple requests of 12000 stocks at a time since different users can trigger the same request.
Second, I'll use the Thin web server, which is multithreaded, so I guess I have to use a Mutex.
How can this be achieved?

Queues are already threadsafe data structures, so you don't need a mutex to work with them.
You'd just start 4 threads at the start of your app, which each poll the queue for work, do some work, and then do something (which is up to you) to notify the DB and/or user that the work is complete. The actual workers will be something like:
work_queue = Queue.new
4.times do
  Thread.new do
    loop do
      job = work_queue.pop
      # Do some work with the job
    end
  end
end
Queue#pop blocks until data is available, so you can just fire off those threads: the first thread waiting for data will get the first job when it's pushed in, the next thread will get the next job, and so on. While all worker threads are busy, jobs will accumulate in the queue.
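To make the flow concrete, here is a minimal sketch that combines the chunking step from the question with the four workers. It rests on assumptions: `fetch_quotes` is a hypothetical stand-in for the real web-service call, the `:stop` sentinel exists only so the demo can exit, and the `results` queue is just one possible way to hand data back.

```ruby
require "thread" # harmless on modern Rubies, where Queue is built in

WORKER_COUNT = 4     # the throttle limit from the question
CHUNK_SIZE   = 1000  # maximum symbols per request

# Hypothetical stand-in for the real API call; returns a fake quote per symbol.
def fetch_quotes(symbols)
  symbols.map { |s| [s, 1.0] }.to_h
end

# Starts the worker pool; each worker blocks on work_queue.pop until a job arrives.
def start_workers(work_queue, results)
  Array.new(WORKER_COUNT) do
    Thread.new do
      while (job = work_queue.pop) != :stop
        results << fetch_quotes(job)
      end
    end
  end
end

# Splits one user's symbol list into API-sized chunks and enqueues them.
def enqueue_request(work_queue, symbols)
  symbols.each_slice(CHUNK_SIZE) { |chunk| work_queue << chunk }
end

work_queue = Queue.new
results    = Queue.new
workers    = start_workers(work_queue, results)

symbols = (1..12_000).map { |i| "SYM#{i}" }
enqueue_request(work_queue, symbols) # twelve jobs of 1000 symbols each

# Shut the pool down for this demo; a real app would keep the workers running.
WORKER_COUNT.times { work_queue << :stop }
workers.each(&:join)
puts results.size # number of completed chunks
```

Because every user's request goes through the same queue, several simultaneous 12000-stock requests simply produce more jobs; the four workers still cap the number of concurrent API calls.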
What you actually do with the output of the job is probably the more interesting question here, but you haven't defined what should happen when the quotes are retrieved.
You might also look at Sidekiq.

Related

Rails Controller - Return and then perform complex action (without worker)

I want to create an API endpoint in my app that takes data and performs a complex, time-consuming operation with it, but only after returning that the data has been received.
As in, I'd love to be able to do something like this:
def action
  render json: { data_received: params[:whatever].present? }
  perform_calculations( params[:whatever] )
end
Unfortunately, as I understand it, Ruby/Rails is synchronous, and requires that a controller action end in a render/redirect/head statement of some sort.
Ordinarily, I'd think of accomplishing this with a background worker, like so:
def action
  DelayedJobActionPerformer.perform_later(params[:whatever])
  render json: { data_received: params[:whatever].present? }
end
But a worker costs (on Heroku) a fair amount of monthly money for a beginning app like this, and I'm looking for alternatives. Is there any alternative to background workers you can think of to return from the action and then perform the behavior?
I'm thinking of maybe creating a separate Node app or something that can start an action and then respond, but that's feeling ridiculous. I guess the architecture in my mind would involve a main Rails app which performs most of the behavior, and a lightweight Node app that acts as the API endpoint, which can receive a request, respond that it's been received, and then send on the data to be performed by that first Rails app, or another. But it feels excessive, and also like just kicking the problem down the road.
At any rate, whether or not I end up having to buy a worker or few, I'd love to know if this sort of thing is feasible, and whether using an external API as a quasi-worker makes sense (particularly given the general movement towards breaking up application concerns).
Not really...
Well you can spawn a new thread:
thread = Thread.new { perform_calculations( params[:whatever] ) }
And not call thread.join, but that is highly unreliable, because that thread will be killed if the main thread terminates.
I don't know how things with cron jobs are in Heroku, but another option is to have a table with pending jobs where you save params[:whatever] and have a rake task that is triggered with cron periodically to check and perform any pending tasks. This solution is a (really) basic worker implementation.
I've heard about sucker_punch; you can give it a try. It runs jobs inside the web process itself, but the downside is that if the web process is restarted while there are jobs that haven't been processed yet, those jobs are lost. So it's not recommended for critical background tasks.

Use Sidekiq to keep a cache of results full

Say I have a particularly expensive calculation to perform during a specific user request. The plus side is that this calculation can be performed ahead of time, and pushed in a general queue for people to pull from.
Is there a way to use Sidekiq in a Ruby/Rails backend to keep this cache of results full to a certain level? Where would I store the results of this calculation?
e.g.
On server load, calculate 20 sets of results, and cache somewhere.
On user request, pop off a result to allow for immediate server response.
Regenerate one set of results in the background to fill back up to 20 in the queue.
Obviously may need to use a different number than 20 depending on how long the computation takes, and rate of user requests, but I think you get the idea.
I'm curious to know what kind of calculation actually fits this profile but that's not really important.
Since you are using Sidekiq (or would like to use Sidekiq) it means you have a Redis database. A Redis database is a great place to put this kind of info.
So you can just create a LIST in Redis of your results. During application startup, fire off 20 Sidekiq jobs to create your calculations. The worker doing the calculation can push the result onto the list in Redis.
As you handle requests, just pop a result off the list and queue another sidekiq job to make yourself a new calculation.
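The refill pattern described above can be sketched in plain Ruby. This is not Sidekiq or Redis code: as a loudly labeled assumption, a `Queue` stands in for the Redis LIST and a `Thread` stands in for a Sidekiq job, and `ResultPool` is a hypothetical name. The shape of the logic (warm up, pop, schedule a replacement) is what carries over.

```ruby
# In-memory sketch of the "keep N results ready" pattern.
# Assumption: Queue stands in for the Redis LIST, Thread for a Sidekiq job.
class ResultPool
  def initialize(target_size, &calculation)
    @calculation = calculation
    @results = Queue.new
    @jobs = []
    target_size.times { refill } # warm the pool at startup
  end

  # Takes one precomputed result and schedules a replacement.
  def take
    result = @results.pop # blocks only if the pool has run dry
    refill
    result
  end

  def size
    @results.size
  end

  # Waits for outstanding background calculations (handy in a demo or test).
  def wait
    @jobs.each(&:join)
    @jobs.clear
  end

  private

  def refill
    @jobs << Thread.new { @results << @calculation.call }
  end
end

# Hypothetical "expensive" calculation, kept trivial for the demo.
pool = ResultPool.new(20) { (1..1_000).reduce(:+) }
pool.wait
value = pool.take # immediate, because results were computed ahead of time
pool.wait
puts pool.size    # back at the target level after the refill finishes
```

With real Sidekiq and Redis, `take` would pop from the list and enqueue a job, and the job would push its result onto the list.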

Is it possible to string / queue Ruby actions?

I've written a number of actions in a RoR app that perform different steps within a process.
E.g.
- One action communicates with a third party service using their API and collects data.
- Another processes this data and places it into a relevant database.
- Another takes this new data and formats it in a specific way.
etc..
I would like to fire off the process at timed intervals, e.g. each hour. But I don't want to do the whole thing each time.
Sometimes I may just want to do the first two actions. At other times, I might want to do each part of the process.
So I'd have one action run, and then when it's finished, call another action, etc.
The actions could take up to an hour to complete, if not longer, so I need a solution that won't timeout.
What would be the best way to achieve this?
You have quite a few options for processing jobs in the background:
Sidekiq: http://mperham.github.io/sidekiq/
Queue Classic: https://github.com/ryandotsmith/queue_classic
Delayed Job: https://github.com/collectiveidea/delayed_job
Resque: https://github.com/resque/resque
Just read through and pick the one that seems to fit your criteria the best.
EDIT
As you clarified, you want regularly scheduled tasks. Clockwork is a great gem for that (and generally a better option than cron):
https://github.com/tomykaira/clockwork
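Whichever job library you pick, the "run only some of the steps, in order" part can be plain Ruby inside the job. Here is a minimal sketch; the step names and bodies are hypothetical placeholders, not anything from the question's app.

```ruby
# Each step is a callable that receives the previous step's output.
# The names and bodies here are hypothetical placeholders.
STEPS = {
  collect: ->(_input) { [3, 1, 2] },                 # e.g. call the third-party API
  store:   ->(data)   { data.sort },                 # e.g. write rows to the database
  format:  ->(rows)   { rows.map { |r| "row #{r}" } } # e.g. build the final output
}.freeze

# Runs the named steps in order, feeding each result into the next step.
def run_pipeline(*names)
  names.reduce(nil) { |input, name| STEPS.fetch(name).call(input) }
end

p run_pipeline(:collect, :store)          # only the first two steps
p run_pipeline(:collect, :store, :format) # the whole process
```

A scheduled job could then call `run_pipeline` with a different list of steps depending on the time of day. Since the steps run one after another inside a background job, no web request is held open and request timeouts don't apply.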

Accessing thread from other rails controller action

I'm working on an application, but at the moment I'm stuck on multithreading with rails.
I have the following situation: when some action occurs (it could be after a user clicks a button or when a scheduled task fires off), I'm starting a separate thread which parses some websites until the moment when I have to receive the SMS-code to continue parsing. At this moment I make Thread.stop.
The SMS-code comes as a POST request to some of my controllers. So I want to pass it to my stopped thread and continue its job.
But how can I access that thread?
Where is the best place to keep a link to that thread?
So how can I handle multithreading? There may be a situation when there'll be a lot of threads and a lot of SMS requests, and I need to somehow correlate them.
For all practical purposes you can't, but you can have that other thread 'report' its status.
You can use redis-objects to create a lock object that uses Redis as its flag, some type of counter, or just a true/false value store. You can then query Redis to see the corresponding state of the other thread, and exit if needed.
https://github.com/nateware/redis-objects
The cool part about this is it not only works between threads, but between applications.
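For comparison, here is a sketch of a different, purely in-process technique: instead of calling Thread.stop, the parser thread blocks on a per-session `Queue`, and the controller pushes the SMS code into it. `CodeRegistry` and the session id are hypothetical names. The big assumption is that the controller and the parser thread live in the same process; with several server processes this does not work and you need a shared store such as Redis, as described above.

```ruby
# Registry mapping a session id to the Queue its parser thread waits on.
class CodeRegistry
  def initialize
    @lock = Mutex.new
    @waiting = {}
  end

  # Called by the parser thread: blocks until a code for this id arrives.
  def wait_for(id)
    code = queue_for(id).pop
    @lock.synchronize { @waiting.delete(id) }
    code
  end

  # Called by the controller action that receives the SMS POST.
  def deliver(id, code)
    queue_for(id) << code
  end

  private

  # Creates the queue on first use, whichever side gets there first.
  def queue_for(id)
    @lock.synchronize { @waiting[id] ||= Queue.new }
  end
end

registry = CodeRegistry.new
parser = Thread.new do
  # ... parse until the site asks for the SMS code ...
  code = registry.wait_for("session-1")
  "continued with #{code}"
end
registry.deliver("session-1", "123456")
puts parser.value
```

Keying the registry by an id that both the parser and the incoming POST know is what lets many threads and many SMS requests be matched up.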

Ruby/Rails synchronous job manager

hi
i'm going to set up a rails-website where, after some initial user input, some heavy calculations are done (via c-extension to ruby, will use multithreading). as these calculations are going to consume almost all cpu-time (memory too), there should never be more than one calculation running at a time. also i can't use (asynchronous) background jobs (like with delayed job) as rails has to show the results of that calculation and the site should work without javascript.
so i suppose i need a separate process where all rails instances have to queue their calculation requests and wait for the answer (maybe an error message if the queue is full), kind of a synchronous job manager.
does anyone know if there is a gem/plugin with such functionality?
(nanite seemed pretty cool to me, but seems to be only asynchronous, so the rails instances would not know when the calculation is finished. is that correct?)
another idea is to write my own using distributed ruby (drb), but why invent the wheel again if it already exists?
any help would be appreciated!
EDIT:
because of the tips of zaius i think i will be able to do this asynchronously, so i'm going to try resque.
Ruby has mutexes / semaphores.
http://www.ruby-doc.org/core/classes/Mutex.html
You can use a semaphore to make sure only one resource intensive process is happening at the same time.
http://en.wikipedia.org/wiki/Mutex
http://en.wikipedia.org/wiki/Semaphore_(programming)
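A minimal sketch of the mutex approach, assuming `heavy_calculation` as a hypothetical stand-in for the C extension call. Note that a Ruby Mutex only coordinates threads inside one process; if several Rails processes are running, each has its own lock.

```ruby
CALC_LOCK = Mutex.new

# Hypothetical stand-in for the CPU-heavy C extension call.
def heavy_calculation(n)
  (1..n).reduce(:*)
end

# Only one thread at a time can be inside the synchronize block;
# the others wait their turn.
def run_exclusively(n)
  CALC_LOCK.synchronize { heavy_calculation(n) }
end

threads = [5, 6, 7].map { |n| Thread.new { run_exclusively(n) } }
p threads.map(&:value)
```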
However, the idea of blocking a front end process while other tasks finish doesn't seem right to me. If I was doing this, I would use a background worker, and then use a page (or an iframe) with the refresh meta tag to continuously check on the progress.
http://en.wikipedia.org/wiki/Meta_refresh
That way, you can use the same code for both javascript enabled and disabled clients. And your web app threads aren't blocking.
If you have a separate process, then you have a background job... so either you can have it or you can't...
What I have done is have the website write the request params to a database. Then a separate process looks for pending requests in the database - using the daemons gem. It does the work and writes the results back to the database.
The website then polls the database until the results are ready and then displays them.
Although I use javascript to make it do the polling.
If you really can't use javascript, then it seems you need to either do the work in the web request thread or make that thread wait for the background thread to finish.
To make the web request thread wait, just do a loop in it, checking the database until the reply is saved back into it. Once it's there, you can then complete the request.
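A rough sketch of such a wait loop, with a timeout so a request cannot hang forever. The `timeout` and `interval` values are assumptions to tune, and in the demo a Hash stands in for the database table and a Thread for the separate worker process.

```ruby
# Polls until the block returns a non-nil value or the timeout expires.
# Returns the value, or nil if nothing showed up in time.
def wait_for_result(timeout: 30, interval: 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result unless result.nil?
    return nil if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval # don't hammer the database between checks
  end
end

# Demo: a Hash stands in for the database, a Thread for the worker process.
store = {}
worker = Thread.new do
  sleep 0.1
  store[:answer] = 42
end
p wait_for_result(timeout: 2, interval: 0.05) { store[:answer] }
worker.join
```

In the real app the block would query the results row for the current request, and a `nil` return would be the cue to render an error or "try again" page.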
HTH, chris

Resources