I am working on a multi-user tree editing app. It uses the resque gem for background processing. To avoid runtime multi-user conflicts I want to use the command pattern and store user actions in a Resque queue, so that if one user is deleting a branch, other users cannot edit children of that branch.
It works, but picking up a job from the queue for the first time is quite slow, because the Resque worker checks for jobs at a 5-second interval. That slows down the editing interface significantly. It is possible to do something like this:
cmd = MyCommand.create!(:attr1 => 'foo', :attr2 => 'bar')
Resque.enqueue(MyCommand, cmd.id)
workers = Resque.workers.select { |w| w.queues.include?('my_queue') }
raise "Should be only one worker for the command queue!" if workers.size != 1

not_done = true
while not_done
  not_done = workers[0].process
end
It does what I need, but I wonder if there is a more elegant way of doing this. Also, :process is a deprecated method for Worker instances.
I think your design approach is sound, but Redis/Resque may not be appropriate. What you want is a very fast in-memory queue that's similar to Resque, but that does not come with a polling delay.
I am pretty sure you could use memcached for this, but there may be other options. Any solution where your queued commands have to be pulled at a certain interval will probably not provide acceptable performance for collaborative editing, unless it's OK to poll every 100ms or even more often.
Finally, if you are placing every action on a single queue that can only process commands serially (one at a time), you will inevitably end up in situations where the queue backs up because commands are coming in faster than they can be processed. This is why a more scalable solution may be versioning: each element of the tree is versioned, and when it is updated or changed, all of its child elements get a new version too. That way, an edit against an older version number is rejected.
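That versioning idea can be sketched in plain Ruby. This is just an illustration of the scheme, not code from any gem; the `Node` class and its method names are made up:

```ruby
# Optimistic concurrency for tree edits: every node carries a version,
# and a structural change bumps the version of the whole subtree.
# An edit made against a stale version is rejected.
class Node
  attr_reader :version, :children

  def initialize
    @version  = 1
    @children = []
  end

  def add_child(child)
    @children << child
    child
  end

  # Bump this node and every descendant, e.g. after a delete or
  # move somewhere above them in the tree.
  def touch_subtree
    @version += 1
    @children.each(&:touch_subtree)
  end

  # Apply an edit only if the client saw the current version.
  def edit(seen_version)
    return false if seen_version != @version
    @version += 1
    true
  end
end
```

So when one user deletes a branch, `touch_subtree` invalidates every pending edit underneath it, and those edits simply fail on submission instead of conflicting.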
Anyway.. good luck, sounds like a non-trivial problem to solve.
I want to create an API endpoint in my app that takes data and performs a complex, time-consuming operation with it, but only after returning that the data has been received.
As in, I'd love to be able to do something like this:
def action
  render json: { data_received: params[:whatever].present? }
  perform_calculations( params[:whatever] )
end
Unfortunately, as I understand it, Ruby/Rails is synchronous, and requires that a controller action end in a render/redirect/head statement of some sort.
Ordinarily, I'd think of accomplishing this with a background worker, like so:
def action
  DelayedJobActionPerformer.perform_later(params[:whatever])
  render json: { data_received: params[:whatever].present? }
end
But a worker costs (on Heroku) a fair amount of monthly money for a beginning app like this, and I'm looking for alternatives. Is there any alternative to background workers you can think of to return from the action and then perform the behavior?
I'm thinking of maybe creating a separate Node app or something that can start an action and then respond, but that's feeling ridiculous. I guess the architecture in my mind would involve a main Rails app which performs most of the behavior, and a lightweight Node app that acts as the API endpoint, which can receive a request, respond that it's been received, and then send the data on to be processed by that first Rails app, or another. But it feels excessive, and also like just kicking the problem down the road.
At any rate, whether or not I end up having to buy a worker or few, I'd love to know if this sort of thing is feasible, and whether using an external API as a quasi-worker makes sense (particularly given the general movement towards breaking up application concerns).
Not really...
Well you can spawn a new thread:
thread = Thread.new { perform_calculations( params[:whatever] ) }
And not call thread.join, but that is highly unreliable, because that thread will be killed if the main thread terminates.
I don't know how cron jobs work on Heroku, but another option is to have a table of pending jobs where you save params[:whatever], and a rake task, triggered periodically by cron, that checks for and performs any pending tasks. This solution is a (really) basic worker implementation.
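A bare-bones version of that pending-jobs idea, in plain Ruby. `PendingJob` and its in-memory `STORE` are stand-ins I made up for what would be an ActiveRecord model backed by a real table, and `process_pending` is what the cron-fired rake task would call:

```ruby
# Poor man's worker: the web request only records what should happen,
# and a cron-triggered task later drains anything still pending.
class PendingJob
  STORE = []  # stand-in for a database table

  attr_reader :params, :status

  def initialize(params)
    @params = params
    @status = :pending
    STORE << self
  end

  def done!
    @status = :done
  end
end

# What the cron-run rake task would do each time it fires.
def process_pending
  PendingJob::STORE.select { |j| j.status == :pending }.each do |job|
    # ... the real work, e.g. perform_calculations(job.params), goes here ...
    job.done!
  end
end
```

The controller action just does `PendingJob.new(params[:whatever])` and renders immediately; the slow work happens whenever cron next fires.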
Have a look at sucker_punch; you can give it a try. It runs inside the single web process, but the downside is that if the web process is restarted and there are jobs that haven't yet been processed, they will be lost. So it is not recommended for critical background tasks.
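The trade-off sucker_punch makes can be sketched with nothing but the standard library: jobs sit on an in-memory queue inside the web process, so they start almost instantly, but they die with the process. This `InProcessWorker` is my own illustrative stand-in, not sucker_punch's actual API:

```ruby
# In-process job runner: a single worker thread drains a Queue.
# Fast to start jobs, but anything still queued vanishes if the
# process restarts -- exactly the sucker_punch caveat above.
class InProcessWorker
  def initialize
    @queue  = Queue.new
    @thread = Thread.new do
      while (job = @queue.pop)   # pop blocks until a job arrives
        job.call
      end
    end
  end

  def enqueue(&block)
    @queue << block
  end

  # Finish everything queued so far, then stop the worker thread.
  def drain
    @queue << nil   # sentinel: tells the worker loop to exit
    @thread.join
  end
end
```

Usage: `worker.enqueue { perform_calculations(whatever) }` from the controller, and the request returns without waiting.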
I've written a number of actions in a RoR app that perform different steps of a process.
E.g.
- One action communicates with a third party service using their API and collects data.
- Another processes this data and places it into a relevant database.
- Another takes this new data and formats it in a specific way.
etc..
I would like to fire off the process at timed intervals, e.g. each hour. But I don't want to do the whole thing each time.
Sometimes I may just want to do the first two actions; at other times I might want to do every part of the process.
So one action would run, and when it's finished, it would call the next one, and so on.
The actions could take up to an hour to complete, if not longer, so I need a solution that won't timeout.
What would be the best way to achieve this?
You have quite a few options for processing jobs in the background:
Sidekiq: http://mperham.github.io/sidekiq/
Queue Classic: https://github.com/ryandotsmith/queue_classic
Delayed Job: https://github.com/collectiveidea/delayed_job
Resque: https://github.com/resque/resque
Just read through and pick the one that seems to fit your criteria the best.
EDIT
As you clarified, you want regularly scheduled tasks. Clockwork is a great gem for that (and generally a better option than cron):
https://github.com/tomykaira/clockwork
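A minimal Clockwork config might look like this. The job class names here are placeholders; each handler just enqueues into whichever background library you picked from the list above (the `perform_async` call shown is Sidekiq-style, as an example):

```ruby
# clock.rb -- run with `clockwork clock.rb`
require "clockwork"

module Clockwork
  # Fire the first stage every hour; since the stages can take up to an
  # hour each, they run as background jobs that enqueue the next stage
  # themselves when they finish, rather than inside this process.
  every(1.hour, "collect.data") do
    CollectDataJob.perform_async
  end

  # Run the full pipeline once a day at midnight.
  every(1.day, "full.pipeline", at: "00:00") do
    FullPipelineJob.perform_async
  end
end
```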
Each Thing inserted into the database has an archive attribute. When set to 0, nothing will happen. However, if it is not, I want it to be added to a queue to be processed.
Archiving a Thing can take anywhere from 3 to 30 seconds, because a lot of requests are sent and handled. So my question is do I make it so:
- When a Thing has archive set to 1, have it put in a queue to be processed by a Rake task every so often (every 15 minutes or so), and then set the archive attribute to 2 to stop it from being processed again
- Make a do_archive method on Thing, and when saving, call self.delay.do_archive and let delayed_job handle all of that for me
Some Things do not need to be processed, and the archiving isn't a time-essential thing. My gut is that delayed_job is probably a better idea, as it's not time-specific and just goes through a queue as opposed to running a script every day at midnight.
I think you pretty much nailed it in your last paragraph. If it's something that is time dependent and not event driven, then cron makes more sense. But if there is an event that occurs and can queue it up, and it is not time dependent, then use a background job.
One thing you might want to consider is whether an actual messaging system makes more sense. While something like RabbitMQ might be overkill for where you are today, there are other simpler options. Sidekiq or Resque are two popular options that give you quite a bit more control over the background jobs and offer the simplicity of delayed_job and the robustness of a messaging system.
Hi,
I'm going to set up a Rails website where, after some initial user input, some heavy calculations are done (via a C extension to Ruby, using multithreading). As these calculations are going to consume almost all CPU time (and memory too), there should never be more than one calculation running at a time. Also, I can't use (asynchronous) background jobs (as with Delayed Job), because Rails has to show the results of that calculation and the site should work without JavaScript.
So I suppose I need a separate process where all Rails instances have to queue their calculation requests and wait for the answer (maybe with an error message if the queue is full): a kind of synchronous job manager.
Does anyone know if there is a gem/plugin with such functionality?
(nanite seemed pretty cool to me, but it seems to be asynchronous only, so the Rails instances would not know when the calculation is finished. Is that correct?)
Another idea is to write my own using Distributed Ruby (DRb), but why reinvent the wheel if it already exists?
Any help would be appreciated!
EDIT:
Thanks to zaius's tips I think I will be able to do this asynchronously, so I'm going to try resque.
Ruby has mutexes / semaphores.
http://www.ruby-doc.org/core/classes/Mutex.html
You can use a semaphore to make sure only one resource intensive process is happening at the same time.
http://en.wikipedia.org/wiki/Mutex
http://en.wikipedia.org/wiki/Semaphore_(programming)
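In plain Ruby that looks like the sketch below. One caveat worth stating up front: a `Mutex` only coordinates threads within a single process, so in a typical multi-process Rails deployment you would need a cross-process lock instead (a database row lock, a lock file, etc.). The method name here is illustrative:

```ruby
# One process-wide lock around the expensive calculation.
CALC_LOCK = Mutex.new

# Only one thread at a time gets to run the calculation; every other
# caller blocks here until the lock is free.
def run_exclusive_calculation
  CALC_LOCK.synchronize do
    # ... the CPU-heavy C-extension call would go here ...
    yield
  end
end
```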
However, the idea of blocking a front end process while other tasks finish doesn't seem right to me. If I was doing this, I would use a background worker, and then use a page (or an iframe) with the refresh meta tag to continuously check on the progress.
http://en.wikipedia.org/wiki/Meta_refresh
That way, you can use the same code for both javascript enabled and disabled clients. And your web app threads aren't blocking.
If you have a separate process, then you have a background job... so either you can have it or you can't...
What I have done is have the website write the request params to a database. Then a separate process looks for pending requests in the database - using the daemons gem. It does the work and writes the results back to the database.
The website then polls the database until the results are ready and then displays them.
Although I use javascript to make it do the polling.
If you really can't use JavaScript, then it seems you need to either do the work in the web request thread or make that thread wait for the background thread to finish.
To make the web request thread wait, just loop in it, checking the database until the reply has been saved back into it. Once it's there, you can complete the request.
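That wait loop can be sketched like this. `RESULTS` is a stand-in I'm using for the results table the background daemon writes to, and the names are illustrative; the timeout guards against the request hanging forever if the daemon dies:

```ruby
RESULTS = {}              # stand-in for the results table in the database
RESULTS_LOCK = Mutex.new  # guards the stand-in hash across threads

# Block the web request thread until the background process has written
# a result for this request id, or give up after `timeout` seconds.
def wait_for_result(request_id, timeout: 10, poll_every: 0.1)
  deadline = Time.now + timeout
  loop do
    result = RESULTS_LOCK.synchronize { RESULTS[request_id] }
    return result if result
    raise "timed out waiting for background job" if Time.now > deadline
    sleep poll_every   # don't hammer the database
  end
end
```

With a real database the `RESULTS_LOCK.synchronize { ... }` line becomes a SELECT by request id.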
HTH, chris
I'm building something akin to Google Analytics and currently I'm doing real time database updates. Here's the workflow for my app:
1. User makes a RESTful API request
2. I find a record in a database and return it as JSON
3. I record the request counter for the user in the database (i.e. if a user makes 2 API calls, I increment the request counter for the user by 2)
1 and 2 are really fast in SQL - they are SELECTs. #3 is really slow, because it's an UPDATE. In the real world, my database (MySQL) is NOT scaling. According to New Relic, #3 is taking most of the time - up to 70%!
My thinking is that I need to stop doing synchronous DB operations. In the short term, I'm trying to reduce DB writes, so I'm thinking about a global hash (say declared in environment.rb) that is accessible from my controllers and models that I can write to in lieu of writing to the DB. Every so often I can have a task write the updates that need to be written to the DB.
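A sketch of that global-hash idea, with the two biggest gotchas already handled: multi-threaded servers need the hash guarded by a mutex, and the flush must swap the buffer out atomically so counts recorded mid-flush are not lost. The module and method names are mine, not from any gem:

```ruby
# In-memory request counters, flushed to the database in one batch
# instead of issuing one UPDATE per request.
module RequestCounters
  LOCK = Mutex.new
  @counters = Hash.new(0)

  # Called from the controller in place of the slow UPDATE.
  def self.count(user_id, n = 1)
    LOCK.synchronize { @counters[user_id] += n }
  end

  # Called every so often (cron, a timer thread, etc.).
  # Swap the buffer atomically so nothing recorded mid-flush is dropped.
  def self.flush
    batch = LOCK.synchronize do
      current = @counters
      @counters = Hash.new(0)
      current
    end
    batch.each do |user_id, n|
      # The single batched write per user would go here, e.g.
      # UPDATE statistics_api SET count_request = count_request + n WHERE id = user_id
    end
    batch
  end
end
```

Two remaining gotchas: counts buffered in memory are lost if the process crashes before a flush, and in a multi-process deployment each process keeps its own buffer (which is fine, since the increments still sum correctly across flushes).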
Questions:
Does this sound reasonable? Any gotchas?
Will I run into any concurrency problems?
How does this compare with writing logs to the file system and importing later?
Should I be using some message queuing system instead, like Starling? Any recommendations?
PS: Here's the offending query -- all columns of interest are indexed:
UPDATE statistics_api SET count_request = COALESCE(count_request, ?) + ? WHERE (id = ?)
Your hash solution sounds like it's a bit too complex. This set of slides is an insightful and up-to-date resource that addresses your issue head on:
http://www.slideshare.net/mattmatt/the-current-state-of-asynchronous-processing-with-ruby
They say the simplest thing would be:
Thread.new do
  MyModel.do_long_thing
end
But the Ruby mysql driver is blocking, so a mysql request in that thread could still block your request. You could use mysqlplus as a driver and get non-blocking requests, but now we're getting a pretty complex and specialized solution.
If you really just want this out of your request cycle, but can spare locking the server for it, you can do something like:
class MyController < ApplicationController
  after_filter :do_jobs

  def index
    @job = Proc.new { MyModel.do_long_thing }
  end

  private

  def do_jobs
    return unless @job
    @job.call
  end
end
I'd abstract it into ApplicationController more, but you get the idea. The proc defers updates until after the request.
If you are serious about asynchronous and background processing, you'll need to look at the various options out there and decide what fits your needs. Matt Grande recommended DelayedJob; that's a very popular pick right now, but if your entire server is bogged down with database writes, I would not suggest it. If this is just one particularly slow update and your server is not overloaded, then maybe it's a good solution.
I currently use Workling with Starling in my most complex project. Workling has been pretty extensible, but Starling has been a little less than ideal. One of Workling's advantages is the ability to swap backends, so we can move off Starling if it becomes a large problem.
If your server is bogged with writes, you will need to look at scaling it up regardless of your asynchronous task approach.
Good luck! It sounds like your app is growing at an exciting pace :-)
I just asked a similar question over on the EventMachine mailing list, and I was suggested that I try phat (http://www.mikeperham.com/2010/04/03/introducing-phat-an-asynchronous-rails-app/) to get asynchronous database access.
Maybe you should try it out.
Do it later with DelayedJob.
Edit: If your DB is being hit so much that one UPDATE is noticeably slowing down your requests, maybe you should consider setting up a master-slave database architecture.