Sidekiq processes queues only AFTER others are done? - ruby-on-rails

Is it possible to process jobs from a Sidekiq queue ONLY if all other queues are empty?
For example, say we have a photos queue and a updates queue. I only want to process photos if updates is free of pending jobs.
Is that possible?

Well, all your queues execute in parallel, so I don't see the point of executing them sequentially.
But you have several options to play with:
you can run more concurrent workers
you can give the updates queue a higher weight, so that workers check it for jobs more often than the photos queue.
Take a look at these options in the docs
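As far as I know, Sidekiq's advanced options documentation also describes a strict ordering mode: queues listed without weights (for example sidekiq -q updates -q photos) are checked in the order given, so photos is only fetched from when updates is empty. The sketch below is a plain-Ruby model of that fetch rule, not Sidekiq's own code; next_job and the in-memory queues are hypothetical names used only for illustration.

```ruby
# Plain-Ruby model of strict queue ordering (an illustration, not Sidekiq code).
# Returns [queue_name, job] from the first non-empty queue in `order`, or nil.
def next_job(queues, order)
  order.each do |name|
    job = queues[name].shift
    return [name, job] if job
  end
  nil
end

queues = { "updates" => %w[u1 u2], "photos" => %w[p1] }
order = %w[updates photos] # updates is always checked first

processed = []
while (pair = next_job(queues, order))
  processed << pair
end
```

In this model every updates job is taken before the first photos job, which is the behaviour the question asks for.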

Related

In iOS, if all tasks can only be executed asynchronously on a serial queue, doesn't it essentially become a concurrent queue?

According to Apple's docs, tasks are still started in the order in which they were added to a concurrent queue, so it seems to me that there is no difference between a concurrent queue and a serial queue where all its tasks are run with async.
Correct me if I'm missing something.
I have read a bunch of documents and did not find the answer.
The difference is how many tasks may run at the same time:
A serial queue only processes (runs) one task at a time, one after another. A concurrent queue may process multiple tasks (on multiple threads) at the same time.
You typically use a serial queue to ensure only one task is accessing a resource at the same time. These are scenarios where you would traditionally use mutexes.
If you have tasks that would benefit from (and are able to) running concurrently at the same time, or tasks that are completely independent and thus don't care, you usually use a concurrent queue.

How to set the concurrency for each queue separately in sidekiq?

Say for example, I have two queues and I want each queue to process only one job simultaneously.
Right now I have one queue, and I start it with something like
bundle exec sidekiq -c 1 -q queue_name
I want two queues to process the job simultaneously and each queue should have concurrency 1. So, is that possible? If yes, how can I do that?
Queues are just places from which Sidekiq takes jobs to perform. Concurrency lets Sidekiq perform the jobs it has taken simultaneously. In other words, queues don't process jobs; they just collect them and determine their priority. Therefore the phrase "set the concurrency for each queue" doesn't make sense.
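One way to get the effect the question describes is to give each queue its own worker, which with Sidekiq would presumably mean starting one process per queue, each with -c 1 -q and that queue's name. The sketch below models that with plain Ruby threads rather than Sidekiq itself; the queue names and job labels are made up for illustration.

```ruby
# Plain-Ruby model: one dedicated worker thread per queue, so each queue
# runs one job at a time while the two queues progress side by side.
queues = { "queue_a" => Queue.new, "queue_b" => Queue.new }
3.times { |i| queues.each { |name, q| q << "#{name}-job#{i}" } }
queues.each_value(&:close) # a closed, empty queue makes pop return nil

log = Queue.new # thread-safe record of [queue_name, job] in finish order
workers = queues.map do |name, q|
  Thread.new do
    while (job = q.pop)
      log << [name, job]
    end
  end
end
workers.each(&:join)

finished = Array.new(log.size) { log.pop }
per_queue = finished.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }
```

Jobs from the two queues may interleave in finished, but within each queue they complete in the order they were added, because only one worker ever touches that queue.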

How to correctly use Resque workers?

I have the following tasks to do in a rails application:
Download a video
Trim the video with FFMPEG between a given duration (Eg.: 00:02 - 00:09)
Convert the video to a given format
Move the converted video to a folder
Since I wanted to make this happen in background jobs, I used 1 resque worker that processes a queue.
For the first job, I have created a queue like this
@queue = :download_video, which does its task, and at the end of the task I move on to the next one by calling Resque.enqueue(ConvertVideo, name, itemId). In this way, I have created a chain of queues, each enqueued when the previous task finishes.
This is very wrong, since if the first job starts to enqueue the other jobs (one after another), then everything gets blocked with 1 worker until the first list of queued jobs is finished.
How should this be optimised? I tried adding more workers to this way of enqueueing jobs, but the results are wrong and unpredictable.
Another aspect is that each job is saving a status in the database and I need the jobs to be processed in the right order.
Should each worker do a single job from above and have at least 4 workers? If I double the amount to 8 workers, would it be an improvement?
Have you considered using Sidekiq?
As the Sidekiq documentation says:
resque uses redis for storage and processes messages in a single-threaded process. The redis requirement makes it a little more difficult to set up, compared to delayed_job, but redis is far better as a queue than a SQL database. Being single-threaded means that processing 20 jobs in parallel requires 20 processes, which can take a lot of memory.
sidekiq uses redis for storage and processes jobs in a multi-threaded process. It's just as easy to set up as resque but more efficient in terms of raw processing speed. Your worker code does need to be thread-safe.
So you should have two kinds of jobs: download jobs and convert jobs. Download jobs can run in parallel (you can limit that if you want), and each downloaded video is stored in one queue (the "in-between queue") before being converted by multiple convert jobs running in parallel.
I hope that helps. This link explains the best practices in Sidekiq quite well: https://github.com/mperham/sidekiq/wiki/Best-Practices
As @Ghislaindj noted, Sidekiq might be an alternative, largely because it offers plugins that control execution ordering.
See this list:
https://github.com/mperham/sidekiq/wiki/Related-Projects#execution-ordering
Nonetheless, yes, you should be using different queues and more workers, each dedicated to one queue. So you have a set of workers all working on the :download_video queue, other workers attached to the :convert_video queue, etc.
If you want to continue using Resque another approach would be to use delayed execution, so when you enqueue your subsequent jobs you specify a delay parameter.
Resque.enqueue_in(10.seconds, ConvertVideo, name, itemId)
The down-side to using delayed execution in Resque is that it requires the resque-scheduler package, so you're introducing a new dependency:
https://github.com/resque/resque-scheduler
For comparison Sidekiq has delayed execution natively available.
Have you considered merging all four tasks into just one? In this case you can have any number of workers, and a single worker will do the whole job for a given video. It will work very predictably, and you can even estimate how long the task will take to finish. You also avoid the problem of one subtask taking longer than all the others and piling up in its queue.
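A minimal sketch of such a merged job might look like the following. It assumes the Resque convention of a class with @queue and a self.perform method; the ProcessVideo name, the :process_video queue, and the empty step methods are hypothetical placeholders rather than anything from the original application.

```ruby
# Sketch of one job that runs all four steps in order for a single video.
# The step methods are empty placeholders; real ones would call the
# downloader, FFmpeg, and the filesystem.
class ProcessVideo
  @queue = :process_video # Resque-style queue declaration (assumed name)

  STEPS = %i[download trim convert move].freeze

  # Runs every step in sequence and returns the status recorded after each.
  def self.perform(name, item_id)
    STEPS.map do |step|
      send(step, name, item_id)
      "#{item_id}:#{step}" # stand-in for saving the status to the database
    end
  end

  def self.download(name, item_id); end
  def self.trim(name, item_id); end
  def self.convert(name, item_id); end
  def self.move(name, item_id); end
end

statuses = ProcessVideo.perform("clip.mp4", 42)
```

Because the steps for one video run inside a single job, their statuses are always written in the right order regardless of how many workers are running.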

Does Sidekiq execute jobs in the order they are sent to a worker?

I have a rake task which is going to call 4 more rake tasks, in order:
rake:one
rake:two
rake:three
rake:four
Rake tasks one, two, and three are getting data and adding it to my database. Then rake:four is going to do something with that data. But I need to make sure that one, two, and three are complete first. Each rake task is actually spinning up Sidekiq workers to run in the background. In this scenario, would all of the workers created by rake:one finish first, then rake:two, etc?
If not, how can I ensure that the workers are executed in order?
Sidekiq processes jobs in the order in which they are created, but by default it processes multiple jobs simultaneously, and there is no guarantee that a given job will finish before another job is started.
Quoting from https://github.com/mperham/sidekiq/wiki/FAQ:
How can I process a certain queue in serial?
You can't, by design. Sidekiq is designed for asynchronous processing
of jobs that can be completed in isolation and independent of each
other. Jobs will be popped off of Redis in the order in which they
were pushed but there's no guarantee that Job #1 will execute fully
before Job #2 is started.
If you need serial execution, you should look into other systems which
give those types of guarantees.
Note you can create a Sidekiq process dedicated to processing a queue
with a single worker. This will give you serial execution but it's a
hack.
Also note you can use third-party extensions for sidekiq to achieve
that goal.
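The difference between "popped in order" and "finished in order" can be modelled with plain Ruby threads. This is only an illustration under made-up job durations, not Sidekiq code; finish_order is a hypothetical helper.

```ruby
# Two jobs are queued in order; job1 is slow and job2 is fast.
# Returns the job names in the order they finished.
def finish_order(worker_count)
  jobs = Queue.new
  [["job1", 0.3], ["job2", 0.0]].each { |job| jobs << job }
  jobs.close # a closed, empty queue makes pop return nil

  finished = Queue.new
  workers = Array.new(worker_count) do
    Thread.new do
      while (job = jobs.pop)
        name, duration = job
        sleep duration # simulate work
        finished << name
      end
    end
  end
  workers.each(&:join)
  Array.new(finished.size) { finished.pop }
end

serial = finish_order(1)   # a single worker finishes job1 before starting job2
parallel = finish_order(2) # with two workers, job2 will typically finish first
```

With a single worker the jobs always finish in the order they were queued, which is the "hack" the FAQ mentions; with two workers the fast job usually overtakes the slow one.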
You can simply create one meta rake task that includes all those tasks in the right order.
Or, as a less hacky solution, reduce the number of workers per queue to 1:
https://github.com/brainopia/sidekiq-limit_fetch#limits
and add all your jobs to this queue.

Resque and resque-scheduler works in parallel

I have resque and resque-scheduler workers with 2 different queues.
They do the same task: fetch links for a certain website and save those links.
What will happen if resque-scheduler and resque workers work in parallel and do the same task (fetching links for the same website)? How can I handle such situations?
Either you have not clarified your setup or there are some big issues there. Resque and Resque-scheduler were meant to be run together. Resque-scheduler is only supposed to schedule tasks in the future. Such tasks are still executed by Resque workers. Please read this section on their GitHub homepage: https://github.com/resque/resque-scheduler#delayed-jobs. To quote them,
This will store the job ... in the resque delayed queue at
which time the scheduler process will pull it from the delayed queue
and put it in the appropriate work queue for the given job and it will
be processed as soon as a worker is available (just like any other
resque job).
So, there you go. Keep running your resque workers and schedulers together forever. To answer the other part of your question, if you schedule some task through scheduler and the same task is also queued for resque to pick up directly, the net outcome depends on the task execution logic. "Fetching something from a website" sounds a harmless thing to do twice. But if you update some transaction table to make payments to your vendors based on the result of the fetch, you are in deep trouble.
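One way to make the "fetch and save links" task safe to run twice is to make the save step idempotent. The sketch below assumes the links can be de-duplicated by URL; LinkStore is a hypothetical in-memory stand-in for the real database table.

```ruby
require "set"

# In-memory stand-in for the table of saved links (hypothetical).
# Saving the same link twice leaves the store unchanged.
class LinkStore
  def initialize
    @links = Set.new
  end

  def save(links)
    @links.merge(links)
  end

  def count
    @links.size
  end
end

store = LinkStore.new
fetched = ["http://example.com/a", "http://example.com/b"]

store.save(fetched) # job run by a regular worker
store.save(fetched) # same job run again via the scheduler
```

After both runs the store still holds two links, so the duplicate execution is harmless. A task with side effects such as payments would need a stronger guard than this.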
