I am using Resque to enqueue jobs.
I start a worker and the jobs are processed.
My jobs extend a gem that implements job hooks like before_enqueue, after_enqueue, before_perform, after_perform and sends stuff to statsd. Those work. However, before_dequeue and after_dequeue do not seem to be called. Is there a reason why?
Also, my understanding of Resque isn't all quite there. I would call Resque.enqueue to queue up a job class, and then if I start a Resque worker, it will automatically pop a task from the queue and then perform the task. Where does dequeue come into play? I notice that dequeue destroys the task, so when does the dequeue step happen in the Resque worker workflow?
I want to hook into after_dequeue because I want to log the time that a task stays in the queue, so I need to hook into before_enqueue and after_dequeue.
So dequeue (and its before_dequeue/after_dequeue hooks) is only triggered when a client calls Resque.dequeue to manually remove jobs from Redis; it is not part of the worker's pop-and-perform loop, which is why those hooks never fire for you. To calculate the time a job spends in the queue, you will have to capture the time in after_enqueue and again in before_perform. When the worker pops a job off the queue, there is no hook you can attach to.
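A minimal sketch of that approach (the job class, the Redis key scheme, and the Statsd client call are assumptions on my part, not part of the gem you mentioned): store the enqueue time in Redis from after_enqueue, then emit the delta from before_perform.

require 'digest'
require 'json'

class MyJob
  @queue = :default

  def self.queue_time_key(args)
    "queue_time:#{name}:#{Digest::SHA1.hexdigest(args.to_json)}"
  end

  def self.after_enqueue_record_time(*args)
    Resque.redis.set(queue_time_key(args), Time.now.to_f)
  end

  def self.before_perform_record_time(*args)
    if (enqueued_at = Resque.redis.get(queue_time_key(args)))
      # milliseconds spent waiting in the queue
      Statsd.new.timing('jobs.queue_time', ((Time.now.to_f - enqueued_at.to_f) * 1000).round)
      Resque.redis.del(queue_time_key(args))
    end
  end

  def self.perform(*args)
    # the actual work
  end
end

Note that two pending jobs with identical arguments would share the same key, so this only measures the most recent enqueue; that is usually good enough for rough statsd timings.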
After running SomeJob.perform_later I can see that job was enqueued in ActiveJob::Base.queue_adapter.enqueued_jobs.
How can I remove the job from the queue if I already have saved job_id?
Basically I want to remove job from the queue.
Calling perform_later will enqueue the job into whichever backend you use. The ActiveJob interface doesn't provide a way to remove jobs. If you are using Sidekiq as your backend this documentation explains how to remove a job from a queue.
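For the Sidekiq case, a rough sketch using Sidekiq's queue API (not ActiveJob's); it assumes the job is still waiting in the "default" queue and was enqueued through the Sidekiq adapter, so the ActiveJob id lives inside the wrapped payload:

require 'sidekiq/api'

def delete_enqueued_job(job_id, queue_name = 'default')
  Sidekiq::Queue.new(queue_name).each do |job|
    # ActiveJob wraps its payload; args[0]['job_id'] holds the ActiveJob id
    job.delete if job.args.dig(0, 'job_id') == job_id
  end
end

job = SomeJob.perform_later
delete_enqueued_job(job.job_id)

Jobs that are scheduled for later or awaiting retry live in Sidekiq::ScheduledSet and Sidekiq::RetrySet instead and would need the same kind of scan.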
How can I make a job execute only after the previous job in the same Sidekiq worker has finished? For example:
I triggered the first job this morning:
GoodWorker.perform_async(params) #=> JID-eetc
While it is still in progress, I dynamically enqueued another job on the same worker:
GoodWorker.perform_async(params) #=> JID-eetc2
and so on.
What's going on now is that Sidekiq processes the jobs as soon as they come in.
Is there a way to perform the jobs one at a time?
Short answer: no.
Long answer: You can use a mutex to guarantee that only one instance of a worker is executing at a time. If you're running on a cluster, you'll need to use Redis or some other medium to maintain the mutex. Otherwise, you might try putting these jobs in their own queue, and firing up a separate instance of Sidekiq that only monitors that queue, with a concurrency of one.
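As an illustration of the second option (the queue name is made up, and the worker reuses the name from your question), pin the worker to its own queue and run a dedicated Sidekiq process with a single thread:

class GoodWorker
  include Sidekiq::Worker
  sidekiq_options queue: :serial

  def perform(params)
    # long-running work here
  end
end

# start a process that watches only that queue, one job at a time:
#   bundle exec sidekiq -q serial -c 1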
Can you not set up Sidekiq to only have one thread? Then only one job will be executed at a time.
I have a rake task which is going to call 4 more rake tasks, in order:
rake:one
rake:two
rake:three
rake:four
Rake tasks one, two, and three are getting data and adding it to my database. Then rake:four is going to do something with that data. But I need to make sure that one, two, and three are complete first. Each rake task is actually spinning up Sidekiq workers to run in the background. In this scenario, would all of the workers created by rake:one finish first, then rake:two, etc?
If not, how can I ensure that the workers are executed in order?
Sidekiq processes jobs in the order in which they are created, but by default it processes multiple jobs simultaneously, and there is no guarantee that a given job will finish before another job is started.
Quoting from https://github.com/mperham/sidekiq/wiki/FAQ:
How can I process a certain queue in serial?
You can't, by design. Sidekiq is designed for asynchronous processing
of jobs that can be completed in isolation and independent of each
other. Jobs will be popped off of Redis in the order in which they
were pushed but there's no guarantee that Job #1 will execute fully
before Job #2 is started.
If you need serial execution, you should look into other systems which
give those types of guarantees.
Note you can create a Sidekiq process dedicated to processing a queue
with a single worker. This will give you serial execution but it's a
hack.
Also note you can use third-party extensions for sidekiq to achieve
that goal.
You can simply create one meta rake task which includes all those tasks in the right order.
Or, as a less hacky solution, reduce the number of workers per queue to 1 with sidekiq-limit_fetch:
https://github.com/brainopia/sidekiq-limit_fetch#limits
and add all your jobs to that queue.
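Per the linked README, the limit can be set either in the YAML config or at runtime (the queue name here is illustrative):

# config/sidekiq.yml
:limits:
  serial_queue: 1

# or at runtime
Sidekiq::Queue['serial_queue'].limit = 1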
Is it possible that 1 job is being processed twice by 2 different sidekiq threads? I am using sidekiq to insert some analytics events into a mongodb collection, asynchronously. I see around 15 duplicates in that collection. My guess is that 2 worker threads picked the same job, at the same time, and added to the collection.
Does Sidekiq ensure that a job is picked up by only one thread? We can ignore the restart case, as the jobs are small and will complete in less than 8s.
Is firing analytics events asynchronously using sidekiq not a good practice? What are my options? I could add a unique key to the event and check it before insert to avoid insertion of duplicates, but that's adding data (+ an overhead/query) that I am never going to use (and it adds up for millions of events). Can I somehow ensure that a job is processed only once by sidekiq?
Thanks for your help.
No. Sidekiq uses Redis as a work queue for background processing. Redis provides atomic operations for adding jobs to the queue and popping jobs off of the queue (specifically the redis BRPOP command). Each Sidekiq worker tries to fetch a job from the queue with a timeout via BRPOP and any given job popped from the queue will only be returned to one of the workers pulling work from the queue.
What is more likely is that you are enqueuing multiple jobs.
Another possibility is that your job is throwing an error, causing it to partially execute and then be retried multiple times. By default Sidekiq will retry failed jobs, but it doesn't have any built-in mechanism for transactions/atomicity of work. I.e., if your Sidekiq job does A, B, and C and doing B raises an exception, causing the job to fail, it will be retried, causing A to be run again each time the job is retried.
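If duplicates must be impossible even under retries, one hedged option is to make the insert itself idempotent: derive a deterministic _id from the event and upsert it. The worker below is only a sketch; the event fields, database name, and collection are assumptions, not your actual schema, and a real app would reuse a shared Mongo client.

require 'digest'
require 'mongo'
require 'sidekiq'

class AnalyticsEventWorker
  include Sidekiq::Worker

  def perform(event)
    event_id = Digest::SHA1.hexdigest(
      [event['user_id'], event['name'], event['occurred_at']].join(':')
    )
    # $setOnInsert only writes the document when the _id isn't already there,
    # so a retry or an accidental double-enqueue becomes a no-op
    collection.update_one({ _id: event_id }, { '$setOnInsert' => event }, upsert: true)
  end

  private

  def collection
    @collection ||= Mongo::Client.new('mongodb://localhost:27017/analytics')[:events]
  end
end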
I have resque and resque-scheduler workers with 2 different queues;
they do the same task: fetch links for a certain website and save those links.
What will happen if resque-scheduler and resque workers work in parallel and do the same task (fetching links for the same website)? How can I handle such situations?
Either you have not clarified your setup or there are some big issues there. Resque and resque-scheduler are meant to be run together. Resque-scheduler is only supposed to schedule tasks in the future; such tasks are still executed by Resque workers. Please read this section on their homepage on GitHub: https://github.com/resque/resque-scheduler#delayed-jobs. To quote them,
This will store the job ... in the resque delayed queue at
which time the scheduler process will pull it from the delayed queue
and put it in the appropriate work queue for the given job and it will
be processed as soon as a worker is available (just like any other
resque job).
So, there you go. Keep running your resque workers and schedulers together forever. To answer the other part of your question: if you schedule some task through the scheduler and the same task is also queued for Resque to pick up directly, the net outcome depends on the task's execution logic. "Fetching something from a website" sounds like a harmless thing to do twice. But if you update some transaction table to make payments to your vendors based on the result of the fetch, you are in deep trouble.
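For the non-idempotent case, one rough guard (my own suggestion, not something Resque provides) is a short-lived Redis lock around the critical section, so a second run becomes a no-op while the first is still working. The job class and the fetch_and_save_links helper are hypothetical:

class FetchLinksJob
  @queue = :links

  def self.perform(website_id)
    lock_key = "lock:fetch_links:#{website_id}"
    # nx: only set if the key is absent; ex: auto-expire so a crashed worker can't wedge the lock
    return unless Resque.redis.set(lock_key, '1', nx: true, ex: 600)

    begin
      fetch_and_save_links(website_id) # hypothetical method doing the real work
    ensure
      Resque.redis.del(lock_key)
    end
  end
end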