Sidekiq execute job after the other job is done - ruby-on-rails

How can I manage to execute job after the first job that has executed is done in sidekiq. For example:
I triggered the first job for this morning
GoodWorker.perform_async(params) #=> JID-eetc
while it is still in progress I've executed again a job in the same worker dynamically
GoodWorker.perform_ascyn(params) #=> JID-eetc2
and etc.
What's going on now is Sidekiq processing the jobs all of the time,
is there a way performing the job one at a time?

Short answer: no.
Long answer: You can use a mutex to guarantee that only one instance of a worker is executing at a time. If you're running on a cluster, you'll need to use Redis or some other medium to maintain the mutex. Otherwise, you might try putting these jobs in their own queue, and firing up a separate instance of Sidekiq that only monitors that queue, with a concurrency of one.

Can you not setup Sidekiq to only have one thread? Then only one job will be executed at a time.

Related

How to Chain Rails ActiveJobs

I am looking for a way to be able run Active Job serially. Ideally, a long running Job 1 is scheduled to run at a certain time. A similarly running Job 2 is slated to run only after Job 1 completes. Job 3 then waits for Job 2 to run to completion before it starts and so on.
I have to admit that I am rather new to background jobs in Rails but I am already using Active Job with Sidekiq as the job runner for simple fire-and-forget tasks.
I like Active Job because it provides a simple enough interface to dive almost immediately into background jobs processing. I can use Sidekiq without having to define workers, for example.
For reference, I have achieved something similar but it was on .NET using the excellent Hangfire library which has continuations where you pass the ID of a parent job ensuring that the job will run only after the parent job has successfully completed.
It would be nice to have something as clean and simple as that using Sidekiq and Active Job but really any alternative ways to achieve the same thing are welcome. It doesn't have to be Sidekiq and Active Job.
The most straightforward way to to this is to call a third job from within a second job, and the second one from within a first job

How can I configure Delayed jobs to not wait for a task before starting the others?

I am using Delayed jobs for my Ruby app hosted in Heroku to perform a very long task that can take up to 5 minutes.
I've noticed that, in development mode at least, when this task is running the ones that come afterwards are not started until that one finishes. I would like other tasks to be able to start running without having to wait for the other to finish (to have at least 3 concurrent tasks, for example).
I don't wish to increase the number of workers in Heroku ($$$).
I noticed the 'pool' param in delayed jobs but I don't fully understand if this is what I need or how to use it.
https://github.com/collectiveidea/delayed_job/blob/master/README.md
I achieved it using threads in the task code, but maybe this is not the best way to do it.
If you could tell me exactly how I could achieve concurrency in delayed jobs I would really appreciate it.
A DJ worker only runs a single job at a time. If you want concurrent processing of your background jobs, you'll need multiple background workers.
You are way better off implementing sidekiq.

How to correctly use Resque workers?

I have the following tasks to do in a rails application:
Download a video
Trim the video with FFMPEG between a given duration (Eg.: 00:02 - 00:09)
Convert the video to a given format
Move the converted video to a folder
Since I wanted to make this happen in background jobs, I used 1 resque worker that processes a queue.
For the first job, I have created a queue like this
#queue = :download_video that does it's task, and at the end of the task I am going forward to the next task by calling Resque.enqueue(ConvertVideo, name, itemId). In this way, I have created a chain of queues that are enqueued when one task is finished.
This is very wrong, since if the first job starts to enqueue the other jobs (one from another), then everything get's blocked with 1 worker until the first list of queued jobs is finished.
How should this be optimised? I tried adding more workers to this way of enqueueing jobs, but the results are wrong and unpredictable.
Another aspect is that each job is saving a status in the database and I need the jobs to be processed in the right order.
Should each worker do a single job from above and have at least 4 workers? If I double the amount to 8 workers, would it be an improvement?
Have you considered using sidekiq ?
As said in Sidekiq documentation :
resque uses redis for storage and processes messages in a single-threaded process. The redis requirement makes it a little more difficult to set up, compared to delayed_job, but redis is far better as a queue than a SQL database. Being single-threaded means that processing 20 jobs in parallel requires 20 processes, which can take a lot of memory.
sidekiq uses redis for storage and processes jobs in a multi-threaded process. It's just as easy to set up as resque but more efficient in terms of raw processing speed. Your worker code does need to be thread-safe.
So you should have two kind of jobs : download videos and convert videos and any download video job should be done in parallel (you can limit that if you want) and then each stored in one queue (the "in-between queue") before being converted by multiple convert jobs in parallel.
I hope that helps, this link explains quite well the best practices in Sidekiq : https://github.com/mperham/sidekiq/wiki/Best-Practices
As #Ghislaindj noted Sidekiq might be an alternative - largely because it offers plugins that control execution ordering.
See this list:
https://github.com/mperham/sidekiq/wiki/Related-Projects#execution-ordering
Nonetheless, yes, you should be using different queues and more workers which are specific to the queue. So you have a set of workers all working on the :download_video queue and then you other workers attached to the :convert_video queue, etc.
If you want to continue using Resque another approach would be to use delayed execution, so when you enqueue your subsequent jobs you specify a delay parameter.
Resque.enqueue_in(10.seconds, ConvertVideo, name, itemId)
The down-side to using delayed execution in Resque is that it requires the resque-scheduler package, so you're introducing a new dependency:
https://github.com/resque/resque-scheduler
For comparison Sidekiq has delayed execution natively available.
Have you considered merging all four tasks into just one? In this case you can have any number of workers, one will do the job. It will work very predictable, you can even know how much time will take to finish the task. You also don't have problems when one of the subtasks takes longer than all others and it piles up in the queue.

Does Sidekiq execute jobs in the order they are sent to a worker?

I have a rake task which is going to call 4 more rake tasks, in order:
rake:one
rake:two
rake:three
rake:four
Rake tasks one, two, and three are getting data and adding it to my database. Then rake:four is going to do something with that data. But I need to make sure that one, two, and three are complete first. Each rake task is actually spinning up Sidekiq workers to run in the background. In this scenario, would all of the workers created by rake:one finish first, then rake:two, etc?
If not, how can I ensure that the workers are executed in order?
Sidekiq processes jobs in the order which they are created, but by default it processes multiple jobs simultaneously, and there is no guarantee that a given job will finish before another job is started.
Quoting from https://github.com/mperham/sidekiq/wiki/FAQ:
How can I process a certain queue in serial?
You can't, by design. Sidekiq is designed for asynchronous processing
of jobs that can be completed in isolation and independent of each
other. Jobs will be popped off of Redis in the order in which they
were pushed but there's no guarantee that Job #1 will execute fully
before Job #2 is started.
If you need serial execution, you should look into other systems which
give those types of guarantees.
Note you can create a Sidekiq process dedicated to processing a queue
with a single worker. This will give you serial execution but it's a
hack.
Also note you can use third-party extensions for sidekiq to achieve
that goal.
You can simply create one meta rake task, which will include all those tasks in right order.
Or as a less hacky solution: reduce number of workers per queue to 1:
https://github.com/brainopia/sidekiq-limit_fetch#limits
And add all your jobs to this queue

Can 2 sidekiq worker threads process the same job?

Is it possible that 1 job is being processed twice by 2 different sidekiq threads? I am using sidekiq to insert some analytics events into a mongodb collection, asynchronously. I see around 15 duplicates in that collection. My guess is that 2 worker threads picked the same job, at the same time, and added to the collection.
Does sidekiq ensure that the job is picked only by 1 thread. We can ignore the restart case, as the jobs are small and will complete in less than 8s.
Is firing analytics events asynchronously using sidekiq not a good practice? What are my options? I could add a unique key to the event and check it before insert to avoid insertion of duplicates, but that's adding data (+ an overhead/query) that I am never going to use (and it adds up for millions of events). Can I somehow ensure that a job is processed only once by sidekiq?
Thanks for your help.
No. Sidekiq uses Redis as a work queue for background processing. Redis provides atomic operations for adding jobs to the queue and popping jobs off of the queue (specifically the redis BRPOP command). Each Sidekiq worker tries to fetch a job from the queue with a timeout via BRPOP and any given job popped from the queue will only be returned to one of the workers pulling work from the queue.
What is more likely is that you are enqueuing multiple jobs.
Another possibility is that your job is throwing an error, causing it to partially execute, and then be re-tried multiple times. By default Sidekiq will retry failed jobs, but doesn't have any built in mechanism for transactions/atomicity of work. ie: If your sidekiq job does A, B, and C and doing B raises an exception, causing the job to fail - it will be retried, causing A to be run again each time the job is retried.

Resources