Can 2 sidekiq worker threads process the same job? - ruby-on-rails

Is it possible that 1 job is being processed twice by 2 different sidekiq threads? I am using Sidekiq to insert some analytics events into a MongoDB collection, asynchronously. I see around 15 duplicates in that collection. My guess is that 2 worker threads picked up the same job at the same time and each added it to the collection.
Does Sidekiq ensure that a job is picked up by only 1 thread? We can ignore the restart case, as the jobs are small and complete in less than 8s.
Is firing analytics events asynchronously using Sidekiq not a good practice? What are my options? I could add a unique key to the event and check it before inserting to avoid duplicates, but that's adding data (plus an overhead/query) that I am never going to use (and it adds up for millions of events). Can I somehow ensure that a job is processed only once by Sidekiq?
Thanks for your help.

No. Sidekiq uses Redis as a work queue for background processing. Redis provides atomic operations for adding jobs to the queue and popping jobs off the queue (specifically the Redis BRPOP command). Each Sidekiq worker tries to fetch a job from the queue with a timeout via BRPOP, and any given job popped from the queue is returned to only one of the workers pulling work from that queue.
What is more likely is that you are enqueuing multiple jobs.
Another possibility is that your job is throwing an error, causing it to partially execute and then be retried multiple times. By default Sidekiq retries failed jobs, but it has no built-in mechanism for transactions/atomicity of work. I.e. if your Sidekiq job does A, B, and C, and doing B raises an exception, the job fails and is retried, causing A to be run again on each retry.
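To make that retry behaviour concrete, here is a minimal sketch (the worker name and the three helper methods are hypothetical) showing why a failure partway through re-runs the earlier steps:

class AnalyticsWorker
  include Sidekiq::Worker

  def perform(event)
    # hypothetical helper methods, one per step
    insert_event(event)      # step A: runs again on every retry, producing duplicates
    update_counters(event)   # step B: if this raises, the whole job fails and is retried
    notify_dashboard(event)  # step C
  end
end

Making step A idempotent (for example an upsert keyed on some event id) is the usual way to make retries safe.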

Related

Sidekiq Jobs maximum run time

I have a job which takes more than 1 hour to execute. Because of that, the remaining jobs stay enqueued and cannot start. So I have decided to set a maximum run time for background jobs. Is there any way to set a timeout for jobs in Sidekiq?
You cannot time out or stop jobs in Sidekiq. Doing so is dangerous and can corrupt your application data.
It sounds like you only have one Sidekiq process with a concurrency of 1. You can start multiple Sidekiq processes and they will work on different jobs, and you can increase concurrency to achieve the same effect.
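For example (a sketch, assuming an otherwise default setup), you can raise the thread count per process in config/sidekiq.yml or with the -c flag, and start additional processes:

# config/sidekiq.yml - number of worker threads per process
:concurrency: 10

# or start extra Sidekiq processes, each with its own thread pool
bundle exec sidekiq -c 10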

How to make multiple parallel concurrent requests with Rails and Heroku

I am currently developing a Rails application which takes a long list of links as input, scrapes them using a background worker (Resque), then serves the results to the user. However, in some cases there are numerous URLs, and I would like to make multiple requests in parallel/concurrently so it takes much less time, rather than waiting for one request to a page to complete, scraping it, and moving on to the next one.
Is there a way to do this in heroku/rails? Where might I find more information?
I've come across resque-pool but I'm not sure whether it would solve this issue and/or how to implement. I've also read about using different types of servers to run rails in order to make concurrency possible, but don't know how to modify my current situation to take advantage of this.
Any help would be greatly appreciated.
Don't use Resque. Use Sidekiq instead.
Resque runs in a single-threaded process, meaning the workers run synchronously, while Sidekiq runs in a multithreaded process, meaning the workers run asynchronously/simultaneously in different threads.
Make sure you assign one URL to scrape per worker. It's no use if one worker scrapes multiple URLs.
With Sidekiq, you can pass the link to a worker, e.g.
LINKS = [...]
LINKS.each do |link|
  ScrapeWorker.perform_async(link)
end
The perform_async call doesn't actually execute the job right away. Instead, the link is put in a queue in Redis along with the worker class name and so on, and later (possibly milliseconds later) a worker thread is assigned to execute each queued job by running the perform instance method of ScrapeWorker. Sidekiq will retry the job if an exception occurs during execution.
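The worker itself would look roughly like this (a sketch; the scraping helper is a hypothetical placeholder):

class ScrapeWorker
  include Sidekiq::Worker

  def perform(link)
    # runs in one of Sidekiq's worker threads, so it must be thread-safe
    results = Scraper.fetch(link)  # hypothetical scraping helper
    results.save
  end
end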
PS: You don't have to pass the link itself to the worker. You can store the links in a table and then pass the ids of the records to the workers.
More info about sidekiq
Adding these two lines to your code will also let you wait until the last job is complete before proceeding:
This line ensures that your program waits until at least one job is enqueued before checking that all jobs are completed, so as to avoid misinterpreting an empty queue as the completion of all jobs:
sleep(0.2) until Sidekiq::Queue.new.size > 0 || Sidekiq::Workers.new.size > 0
This line ensures your program waits until all jobs are done:
sleep(0.5) until Sidekiq::Workers.new.size == 0 && Sidekiq::Queue.new.size == 0

Sidekiq execute job after the other job is done

How can I make Sidekiq execute a job only after the previously triggered job is done? For example:
I triggered the first job for this morning
GoodWorker.perform_async(params) #=> JID-eetc
while it is still in progress, I dynamically trigger another job on the same worker
GoodWorker.perform_async(params) #=> JID-eetc2
and etc.
What's going on now is that Sidekiq processes the jobs at the same time;
is there a way to perform the jobs one at a time?
Short answer: no.
Long answer: You can use a mutex to guarantee that only one instance of a worker is executing at a time. If you're running on a cluster, you'll need to use Redis or some other medium to maintain the mutex. Otherwise, you might try putting these jobs in their own queue, and firing up a separate instance of Sidekiq that only monitors that queue, with a concurrency of one.
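A sketch of the dedicated-queue option, assuming a queue name of your choosing (here 'serial'): route the worker to its own queue and run a separate Sidekiq process that only consumes that queue with a concurrency of one:

class GoodWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'serial'  # send these jobs to their own queue

  def perform(params)
    # ...
  end
end

# a dedicated process that only works the 'serial' queue, one thread
bundle exec sidekiq -q serial -c 1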
Can you not set up Sidekiq to have only one thread? Then only one job will be executed at a time.

How to correctly use Resque workers?

I have the following tasks to do in a rails application:
Download a video
Trim the video with FFMPEG between a given duration (Eg.: 00:02 - 00:09)
Convert the video to a given format
Move the converted video to a folder
Since I wanted to make this happen in background jobs, I used 1 resque worker that processes a queue.
For the first job, I have created a queue like this
#queue = :download_video that does its task, and at the end of the task I move on to the next one by calling Resque.enqueue(ConvertVideo, name, itemId). In this way, I have created a chain of queues that are enqueued when one task is finished.
This is very wrong, since if the first job starts to enqueue the other jobs (one from another), then everything gets blocked with 1 worker until the first list of queued jobs is finished.
How should this be optimised? I tried adding more workers to this way of enqueueing jobs, but the results are wrong and unpredictable.
Another aspect is that each job is saving a status in the database and I need the jobs to be processed in the right order.
Should each worker do a single job from above and have at least 4 workers? If I double the amount to 8 workers, would it be an improvement?
Have you considered using Sidekiq?
As said in Sidekiq documentation :
resque uses redis for storage and processes messages in a single-threaded process. The redis requirement makes it a little more difficult to set up, compared to delayed_job, but redis is far better as a queue than a SQL database. Being single-threaded means that processing 20 jobs in parallel requires 20 processes, which can take a lot of memory.
sidekiq uses redis for storage and processes jobs in a multi-threaded process. It's just as easy to set up as resque but more efficient in terms of raw processing speed. Your worker code does need to be thread-safe.
So you should have two kinds of jobs: download video and convert video. The download jobs can run in parallel (you can limit that if you want), each result being stored in one queue (the "in-between" queue) before being converted by multiple convert jobs running in parallel.
I hope that helps; this link explains the Sidekiq best practices quite well: https://github.com/mperham/sidekiq/wiki/Best-Practices
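A minimal sketch of that two-stage pipeline in Sidekiq (the worker names and the download/convert helpers are hypothetical): the download worker enqueues a convert job once its file is ready, so each stage can be scaled independently:

class DownloadVideoWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'download_video'

  def perform(item_id)
    path = VideoDownloader.fetch(item_id)            # hypothetical helper
    ConvertVideoWorker.perform_async(item_id, path)  # hand off to the next stage
  end
end

class ConvertVideoWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'convert_video'

  def perform(item_id, path)
    VideoConverter.convert(path)                     # hypothetical helper
  end
end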
As @Ghislaindj noted, Sidekiq might be an alternative - largely because it offers plugins that control execution ordering.
See this list:
https://github.com/mperham/sidekiq/wiki/Related-Projects#execution-ordering
Nonetheless, yes, you should be using different queues and more workers which are specific to each queue. So you have a set of workers all working on the :download_video queue, and then other workers attached to the :convert_video queue, etc.
If you want to continue using Resque another approach would be to use delayed execution, so when you enqueue your subsequent jobs you specify a delay parameter.
Resque.enqueue_in(10.seconds, ConvertVideo, name, itemId)
The down-side to using delayed execution in Resque is that it requires the resque-scheduler package, so you're introducing a new dependency:
https://github.com/resque/resque-scheduler
For comparison, Sidekiq has delayed execution available natively.
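With Sidekiq, the equivalent of the Resque call above needs no extra gem; perform_in is built in (the worker name mirrors the hypothetical sketch earlier):

ConvertVideoWorker.perform_in(10.seconds, name, item_id)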
Have you considered merging all four tasks into just one? In that case you can have any number of workers, and one of them will do the whole job. It will work very predictably; you can even know how long the task will take to finish. You also avoid the problem of one subtask taking longer than all the others and piling up in the queue.

Does Sidekiq execute jobs in the order they are sent to a worker?

I have a rake task which is going to call 4 more rake tasks, in order:
rake:one
rake:two
rake:three
rake:four
Rake tasks one, two, and three are getting data and adding it to my database. Then rake:four is going to do something with that data. But I need to make sure that one, two, and three are complete first. Each rake task is actually spinning up Sidekiq workers to run in the background. In this scenario, would all of the workers created by rake:one finish first, then rake:two, etc?
If not, how can I ensure that the workers are executed in order?
Sidekiq processes jobs in the order which they are created, but by default it processes multiple jobs simultaneously, and there is no guarantee that a given job will finish before another job is started.
Quoting from https://github.com/mperham/sidekiq/wiki/FAQ:
How can I process a certain queue in serial?
You can't, by design. Sidekiq is designed for asynchronous processing of jobs that can be completed in isolation and independent of each other. Jobs will be popped off of Redis in the order in which they were pushed but there's no guarantee that Job #1 will execute fully before Job #2 is started.
If you need serial execution, you should look into other systems which give those types of guarantees.
Note you can create a Sidekiq process dedicated to processing a queue with a single worker. This will give you serial execution but it's a hack.
Also note you can use third-party extensions for sidekiq to achieve that goal.
You can simply create one meta rake task which includes all those tasks in the right order.
Or, as a less hacky solution, reduce the number of workers per queue to 1:
https://github.com/brainopia/sidekiq-limit_fetch#limits
And add all your jobs to this queue.
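With sidekiq-limit_fetch, that limit is declared in sidekiq.yml (a sketch following the plugin's README, with a hypothetical queue name):

# config/sidekiq.yml
:limits:
  ordered_jobs: 1   # at most one job from this queue runs at a time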
