I'm using version 1.25.2 of Resque in a Rails app.
I tried to invoke the instance methods pause_processing and its counterpart unpause_processing of the Resque::Worker class on all the workers I fetched through Resque.workers. However, the workers still continued to process new jobs added dynamically to any queue, even though every worker returned true when I checked its state through instance.paused?.
Not sure if I can really control the workers running in background.
As far as I can comprehend, pause_processing, unpause_processing and shutdown do the same thing as sending the USR2, CONT and KILL signals to Resque workers.
Am I missing something trivial, or is there another way to manage the workers?
This is because you're calling the method and modifying state on an entirely different instance of the worker. As you can see, all pause_processing does is set the value of an instance variable. When you call Resque.workers, you get instances representing all the workers, but not the actual instances running inside the worker processes. Resque doesn't allow you to modify the state of a running worker remotely; you must interact with it via signals if you want to change the state of a Resque worker from the outside. If you're just calling the methods, you're not sending a signal.
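For example, here is a minimal sketch of pausing workers via signals instead; it assumes the worker id format hostname:pid:queues and that the workers run on the same host as the code sending the signals:

require 'resque'
require 'socket'

Resque.workers.each do |worker|
  # A worker's id has the form "hostname:pid:queue1,queue2".
  host, pid, _queues = worker.id.split(':')
  next unless host == Socket.gethostname # only signal workers on this machine

  Process.kill('USR2', pid.to_i)         # USR2 pauses; CONT resumes
end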
I can't just shut down the entire cluster like in this answer because there might be other jobs running. I run one cluster in order to avoid having to use Kubernetes. Jobs get submitted to this cluster, but they call into C libraries that leak memory.
The workers run one thread per process, so it would be acceptable to terminate the entire worker process and have it be restarted.
I can't just use os.kill from the task itself because the task's return value has to be propagated back through Dask. I have to get Dask to terminate the process for me at the right time.
Is there any way to do this?
I am currently developing a Rails application which takes a long list of links as input, scrapes them using a background worker (Resque), then serves the results to the user. However, in some cases there are numerous URLs, and I would like to be able to make multiple requests in parallel / concurrently so that the whole job takes much less time, rather than waiting for one request to complete, scraping the page, and moving on to the next one.
Is there a way to do this in Heroku/Rails? Where might I find more information?
I've come across resque-pool, but I'm not sure whether it would solve this issue and/or how to implement it. I've also read about using different types of servers to run Rails in order to make concurrency possible, but don't know how to modify my current situation to take advantage of this.
Any help would be greatly appreciated.
Don't use Resque. Use Sidekiq instead.
Resque runs in a single-threaded process, meaning the workers run synchronously, while Sidekiq runs in a multithreaded process, meaning the workers run asynchronously/simultaneously in different threads.
Make sure you assign one URL to scrape per worker. It's no use if one worker scrapes multiple URLs.
With Sidekiq, you can pass the link to a worker, e.g.
LINKS = [...]

LINKS.each do |link|
  ScrapeWorker.perform_async(link)
end
perform_async doesn't actually execute the job right away. Instead, the link is just put in a queue in Redis along with the worker class, and so on, and later (could be milliseconds later) workers are assigned to execute each job in the queue in its own thread by running the perform instance method in ScrapeWorker. Sidekiq will make sure to retry if an exception occurs during the execution of a worker.
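For reference, a minimal sketch of what the ScrapeWorker class could look like; the Nokogiri/open-uri scraping inside perform is a placeholder, not part of the original answer:

require 'sidekiq'
require 'open-uri'
require 'nokogiri'

class ScrapeWorker
  include Sidekiq::Worker

  def perform(link)
    # Runs later in a worker thread; Sidekiq retries on uncaught exceptions.
    page = Nokogiri::HTML(URI.open(link))
    # ... extract what you need from the page and persist it ...
  end
end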
PS: You don't have to pass a link to the worker. You can store the links in a table and then pass the ids of the records to the workers.
More info about sidekiq
Adding these two lines to your code will also let you wait until the last job is complete before proceeding:
this line ensures that your program waits until at least one job has been enqueued before checking that all jobs are completed, so as to avoid misinterpreting an unfilled queue as the completion of all jobs
sleep(0.2) until Sidekiq::Queue.new.size > 0 || Sidekiq::Workers.new.size > 0
this line ensures your program waits until all jobs are done
sleep(0.5) until Sidekiq::Workers.new.size == 0 && Sidekiq::Queue.new.size == 0
scheduler.every '1m', :overlap => false, :mutex => "my_lock" do
  something...
end
will this scheduler's job wait for the previous run to finish, or skip if it finds that the previous run is still running?
There are two cases to consider: the single-process case and the multi-process case.
The multi-process case happens when your Rails app uses a server made/tuned to use multiple Ruby processes.
By server, I mean Webrick, Unicorn, Passenger, Puma, etc...
In both cases, the overlap is considered before the mutex.
In the single-process case, overlap => false kicks in first and the scheduler skips the incoming, overlapping job. The mutex is then considered (if the job isn't overlapping) and might make the job wait until the mutex is freed by another instance of the same job, or by an instance of a job pointing to that same mutex.
In the multi-process case, you might end up with a scheduler in each of your Ruby processes, and overlap and mutex might appear not to be respected: within each local Ruby process they are, but since you have more than one process, each with its own scheduler...
If you have a server giving you multiple Ruby processes and you want to have a single rufus-scheduler working on its jobs, you have to read https://github.com/jmettraux/rufus-scheduler#lockfile--mylockfiletxt and https://github.com/jmettraux/rufus-scheduler#scheduler_lock. There are also multiple stackoverflow questions dedicated to this subject.
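For completeness, a rough sketch of the lockfile approach from the first README link; the lockfile path is an example:

require 'rufus-scheduler'

# Only the process that acquires the lockfile gets an active scheduler;
# the schedulers in the other Ruby processes stay inactive.
scheduler = Rufus::Scheduler.new(:lockfile => '.rufus-scheduler.lock')

scheduler.every '1m', :overlap => false, :mutex => "my_lock" do
  # the job body goes here
end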
Is there a way to permanently remove jobs from a resque queue? The following commands remove the jobs, but when I restart the workers and the resque server, the jobs load back up.
Resque::Job.destroy("name_queue", Class)
OR
Resque.remove_queue("name_queue")
The problem is that you're not removing the specific instances of the jobs that you added to your Redis server through Resque. So when you remove the queue and it gets recreated after you restart the server, data from that queue can still be sitting in your Redis server. You can work around this in your job's perform, depending on your implementation. For instance, if you manipulate a model through Resque, you can check whether that model has been destroyed before manipulating it.
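For example, a minimal sketch of that guard; Article and ArticleScraper are hypothetical names:

class ArticleScraper
  @queue = :scraping

  def self.perform(article_id)
    article = Article.find_by(id: article_id) # nil if the record was destroyed
    return if article.nil?                    # skip stale jobs for deleted records

    article.scrape!                           # hypothetical model method
  end
end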
Here's my simple ideal case scenario for when I'd like delayed job to run:
When the first application server (whether through mongrel or passenger) starts, it'll start my delayed job workers.
When the last running application server terminates, it'll kill all the delayed job workers.
The first part (starting) is doable, although I'm not sure what the "right" or "best" way to do it is. Just make a conditional (on process not already running) system call to delayed_job start?
The second part (terminating) -- well, I'm not sure if it is doable or not. Definitely have no idea how this effect could be accomplished.
Any thoughts or ideas?
Is there another way that you start/end delayed job workers that you think is best?
Side question:
The main questions above are for the production environment -- a more difficult case because there are multiple app servers running at the same time. Could the same thing be easily done in the development environment (where there's guaranteed to be only one application server, not a cluster of them) by forking a child process to run the delayed job workers that would always terminate when the parent terminates? How would I go about doing this?
You could definitely pull the terminating off with god.
Simply watch the app processes and god will fire a callback when they're all stopped.
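For example, a rough sketch of a god config along those lines; it assumes a god version with stop_if and the :lambda condition, and the pid-file paths are placeholders for your deployment:

God.watch do |w|
  w.name     = 'delayed_job'
  w.start    = 'RAILS_ENV=production script/delayed_job start'
  w.stop     = 'RAILS_ENV=production script/delayed_job stop'
  w.pid_file = 'tmp/pids/delayed_job.pid'

  # Stop the workers once no app server pid files remain.
  w.stop_if do |stop|
    stop.condition(:lambda) do |c|
      c.interval = 30
      c.lambda   = lambda { Dir.glob('tmp/pids/mongrel.*.pid').empty? } # assumed pid location
    end
  end
end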