Resque and resque-scheduler working in parallel

I have resque and resque-scheduler workers with 2 different queues.
They do the same task: fetch links for a certain website and save those links.
What will happen if resque-scheduler and resque workers run in parallel and do the same task (fetching links for the same website)? How can I handle such situations?

Either you have not clarified your setup or there are some big issues with it. Resque and resque-scheduler are meant to be run together. Resque-scheduler only schedules tasks for the future; such tasks are still executed by Resque workers. Please read the section on delayed jobs in the README on GitHub: https://github.com/resque/resque-scheduler#delayed-jobs. To quote it,
This will store the job ... in the resque delayed queue at which time the scheduler process will pull it from the delayed queue and put it in the appropriate work queue for the given job and it will be processed as soon as a worker is available (just like any other resque job).
So, there you go. Keep running your Resque workers and the scheduler together. To answer the other part of your question: if you schedule a task through the scheduler and the same task is also queued for Resque to pick up directly, the net outcome depends on the task's execution logic. Fetching something from a website sounds like a harmless thing to do twice. But if you update a transactions table to make payments to your vendors based on the result of the fetch, you are in deep trouble.
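A common safeguard here, not from the answer above but a standard pattern: make the job idempotent, or guard it with a lock keyed on the work item so that a duplicate run becomes a no-op. A minimal sketch, with illustrative class, queue, and key names:

class FetchLinks
  @queue = :fetch_links

  def self.perform(site_url)
    # SET with nx: true succeeds only if the key does not exist yet, so
    # only the first of two concurrent runs acquires the lock; ex: 600
    # expires it after 10 minutes in case a worker dies mid-job.
    acquired = Resque.redis.set("lock:fetch_links:#{site_url}", 1, nx: true, ex: 600)
    return unless acquired

    # ... fetch and save the links; use upserts so any run that does
    # slip through stays harmless ...
  end
end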

Related

How can I configure Delayed jobs to not wait for a task before starting the others?

I am using Delayed Job in my Ruby app hosted on Heroku to perform a very long task that can take up to 5 minutes.
I've noticed that, in development mode at least, while this task is running the ones that come after it are not started until it finishes. I would like other tasks to be able to start running without waiting for it to finish (to have at least 3 concurrent tasks, for example).
I don't wish to increase the number of workers in Heroku ($$$).
I noticed the 'pool' param in Delayed Job but I don't fully understand whether this is what I need or how to use it.
https://github.com/collectiveidea/delayed_job/blob/master/README.md
I achieved it using threads in the task code, but maybe this is not the best way to do it.
If you could tell me exactly how I could achieve concurrency in delayed jobs I would really appreciate it.
A DJ worker only runs a single job at a time. If you want concurrent processing of your background jobs, you'll need multiple background workers.
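For what it's worth, the delayed_job daemon's README does document a --pool option for starting several worker processes (optionally pinned to queues) from one command; a sketch adapted from the README's example, with illustrative queue names:

# Starts 1 worker for the tracking queue, 2 workers for the mailers and
# tasks queues, and 2 workers for any jobs: 5 processes in total.
RAILS_ENV=production bin/delayed_job --pool=tracking --pool=mailers,tasks:2 --pool=*:2 start

Each pooled worker is still a separate process, not a thread, and the daemonizing start command may need adapting to Heroku's process model.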
You are way better off implementing sidekiq.

If I use Heroku scheduler, do I still need delayed job?

I'm a little confused about this. I have a couple of tasks that I would like to run asynchronously, for example my inventory sync integration. For this I have implemented Delayed Job, but I realize that I need to run rake jobs:work on Heroku for it. I can use the Heroku Scheduler to run this rake task every 10 minutes. My question is: if I create rake tasks to run, e.g., my inventory sync method, do I still need Delayed Job? My understanding is that Heroku Scheduler kicks off one-off dynos.
Instead of using Delayed Job, could I not just kick off the sync method directly, since a separate dyno is used anyway? What is the added value of Delayed Job here?
Heroku's Scheduler replaces what cron would handle on a typical server. Delayed Job and Sidekiq are for processing jobs asynchronously from your app, not on a timed schedule.
The reason you use a worker and run these jobs in the background is so that your server can return a response as soon as possible, rather than making the user wait for some potentially long-running process to finish (lots of queries, outbound e-mail, external API requests, etc.).
For example, Scheduler can run analytics or updates from a script every hour or day, which Delayed Job cannot do on its own.
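To make the division of labor concrete, a sketch (InventorySync is a hypothetical class, and the task name is illustrative): Heroku Scheduler invokes a rake task on a one-off dyno, where running the work inline is fine, while a web request should hand the same work to a Delayed Job worker so the response can return immediately.

# lib/tasks/sync.rake -- a minimal sketch.
namespace :sync do
  desc "Run the inventory sync (invoked every 10 minutes by Heroku Scheduler)"
  task inventory: :environment do
    InventorySync.new.run  # no web request is waiting, so run it inline
  end
end

# In a controller action, by contrast, you would enqueue it instead:
#   InventorySync.new.delay.run  # delayed_job's .delay proxy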

How to correctly use Resque workers?

I have the following tasks to do in a rails application:
Download a video
Trim the video with FFMPEG between given timestamps (e.g. 00:02 - 00:09)
Convert the video to a given format
Move the converted video to a folder
Since I wanted to make this happen in background jobs, I used 1 resque worker that processes a queue.
For the first job, I have created a queue like this:
@queue = :download_video, which does its task and, at the end of it, moves forward to the next task by calling Resque.enqueue(ConvertVideo, name, itemId). In this way, I have created a chain of queues that are enqueued as each task finishes.
This is very wrong: if the first job starts to enqueue the other jobs (one from another), then everything gets blocked with 1 worker until the first list of queued jobs is finished.
How should this be optimised? I tried adding more workers to this way of enqueueing jobs, but the results are wrong and unpredictable.
Another aspect is that each job is saving a status in the database and I need the jobs to be processed in the right order.
Should each worker do a single job from above and have at least 4 workers? If I double the amount to 8 workers, would it be an improvement?
Have you considered using Sidekiq?
As the Sidekiq documentation says:
resque uses redis for storage and processes messages in a single-threaded process. The redis requirement makes it a little more difficult to set up, compared to delayed_job, but redis is far better as a queue than a SQL database. Being single-threaded means that processing 20 jobs in parallel requires 20 processes, which can take a lot of memory.
sidekiq uses redis for storage and processes jobs in a multi-threaded process. It's just as easy to set up as resque but more efficient in terms of raw processing speed. Your worker code does need to be thread-safe.
So you should have two kinds of jobs, download video and convert video: the download jobs can run in parallel (you can limit that if you want), each storing its result in one queue (the "in-between" queue) before being picked up by multiple convert jobs, also in parallel.
I hope that helps. This link explains the best practices in Sidekiq quite well: https://github.com/mperham/sidekiq/wiki/Best-Practices
As @Ghislaindj noted, Sidekiq might be an alternative, largely because it offers plugins that control execution ordering.
See this list:
https://github.com/mperham/sidekiq/wiki/Related-Projects#execution-ordering
Nonetheless, yes, you should be using different queues and more workers, each specific to a queue. So you have a set of workers all working on the :download_video queue and then other workers attached to the :convert_video queue, etc.
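Concretely, that split might look like the following sketch (the class names mirror the question; the download and FFMPEG details are elided):

class DownloadVideo
  @queue = :download_video

  def self.perform(name, item_id)
    # ... download the source file ...
    Resque.enqueue(ConvertVideo, name, item_id)  # hand off to the next stage
  end
end

class ConvertVideo
  @queue = :convert_video

  def self.perform(name, item_id)
    # ... trim and convert with FFMPEG, then move the result into place ...
  end
end

You would then dedicate workers to each queue, e.g. QUEUE=download_video rake resque:work in one process and QUEUE=convert_video rake resque:work in another, and scale each pool independently.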
If you want to continue using Resque another approach would be to use delayed execution, so when you enqueue your subsequent jobs you specify a delay parameter.
Resque.enqueue_in(10.seconds, ConvertVideo, name, itemId)
The downside to using delayed execution in Resque is that it requires the resque-scheduler package, so you're introducing a new dependency:
https://github.com/resque/resque-scheduler
For comparison, Sidekiq has delayed execution natively available.
Have you considered merging all four tasks into just one? In that case you can have any number of workers and any one of them will do the whole job. It will behave very predictably; you can even estimate how long the task will take to finish. You also avoid the problem of one subtask taking longer than all the others and piling up in the queue.

Does Sidekiq execute jobs in the order they are sent to a worker?

I have a rake task which is going to call 4 more rake tasks, in order:
rake:one
rake:two
rake:three
rake:four
Rake tasks one, two, and three are getting data and adding it to my database. Then rake:four is going to do something with that data. But I need to make sure that one, two, and three are complete first. Each rake task actually spins up Sidekiq workers to run in the background. In this scenario, would all of the workers created by rake:one finish first, then rake:two's, etc.?
If not, how can I ensure that the workers are executed in order?
Sidekiq processes jobs in the order in which they are created, but by default it processes multiple jobs simultaneously, and there is no guarantee that a given job will finish before another job is started.
Quoting from https://github.com/mperham/sidekiq/wiki/FAQ:
How can I process a certain queue in serial?
You can't, by design. Sidekiq is designed for asynchronous processing of jobs that can be completed in isolation and independent of each other. Jobs will be popped off of Redis in the order in which they were pushed but there's no guarantee that Job #1 will execute fully before Job #2 is started.
If you need serial execution, you should look into other systems which give those types of guarantees.
Note you can create a Sidekiq process dedicated to processing a queue with a single worker. This will give you serial execution but it's a hack.
Also note you can use third-party extensions for sidekiq to achieve that goal.
You can simply create one meta rake task which invokes all those tasks in the right order.
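A sketch of such a meta task (the subtask names are taken from the question; note it only invokes the tasks in order, so if each subtask merely enqueues Sidekiq jobs it will not wait for those jobs to finish):

# Rakefile -- a minimal sketch.
task :all_in_order do
  %w[rake:one rake:two rake:three rake:four].each do |name|
    Rake::Task[name].invoke
  end
end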
Or, as a less hacky solution, reduce the number of workers for the queue to 1:
https://github.com/brainopia/sidekiq-limit_fetch#limits
and add all your jobs to that queue.
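With sidekiq-limit_fetch installed, capping a queue at a single concurrent job looks roughly like this (the queue name is illustrative; see the README linked above):

# config/initializers/sidekiq_limit_fetch.rb -- a minimal sketch.
# With a limit of 1, at most one job from this queue runs at a time
# across all Sidekiq processes, giving you serial execution.
Sidekiq::Queue['ordered_tasks'].limit = 1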

Should the resque-scheduler queue be expected to handle /lots/ of delayed jobs?

I am currently using resque and resque-scheduler in an application that will have to handle a lot of recurring jobs: "do this every hour", "do this every day", etc. At the moment, I simply queue up the next run of the job from within the job itself; the HourlyJob calls .enqueue_at(1.hour.from_now, HourlyJob), and so on.
Should I be doing this? It "feels" like I should have a static recurring job using resque-scheduler's cron-type functionality that then schedules, say, the next 5 minutes' worth of delayed jobs... but all I would really be doing is moving the work from the (probably fast, Redis-based) resque-scheduler to my (probably less well implemented, MySQL-based) code, surely?
Is there anything wrong with how I'm doing it now?
I'd personally use the cron style provided by resque-scheduler; your use case is exactly what it was built for:
You more directly indicate that these are recurring jobs.
Everything is located in the same YAML file rather than spread across multiple job classes/modules.
By queuing the next run of the job inside the actual job:
You run the risk of the next run going missing when your worker/job/server fails.
You're needlessly using more memory in Redis: the scheduler process will not add jobs to Redis until they're ready to be run.
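For reference, a static schedule is just a small hash (or the equivalent YAML); a sketch with illustrative job names:

# config/initializers/resque_scheduler.rb -- a minimal sketch.
require 'resque-scheduler'

Resque.schedule = {
  'HourlyJob' => { 'every' => '1h',
                   'description' => 'Recurring hourly fetch' },
  'DailyJob'  => { 'cron'  => '0 3 * * *',
                   'description' => 'Runs at 3am every day' }
}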
Hope this helps.
