I have a queue that happens to contain jobs enqueued via perform_async with the wrong arguments. I don't want to lose the jobs, but rather edit the arguments so they will succeed the next time they run or on a forced retry.
Is this possible?
Sidekiq stores its jobs in Redis, so you could try a Redis GUI (like http://redisdesktop.com/): find the job you need to update, edit it, and save it. This can also be done in a loop to update multiple jobs at once.
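If you would rather stay in Ruby than use a GUI, here is a minimal sketch using Sidekiq's own API, assuming the broken jobs can be identified by their worker class and that fix_args is a hypothetical helper of yours that returns the corrected arguments:

require 'sidekiq/api'

queue = Sidekiq::Queue.new('default') # assumption: the jobs live in the default queue

queue.each do |job|
  next unless job.klass == 'MyWorker'                     # assumption: the affected worker class

  fixed_args = fix_args(job.args)                         # hypothetical helper returning corrected args
  job.delete                                              # drop the broken job from the queue
  Object.const_get(job.klass).perform_async(*fixed_args)  # re-enqueue it with the corrected arguments
end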
The problem is that the Sidekiq worker that processes the object runs before the object exists in the database. The job is sent to the queue in an after_commit callback in the object's model. This is possible because I have two replicated databases, one for reads and one for writes, and the time from enqueue to failure is shorter than the time it takes for the data to replicate to the read database.
What is the best approach here? I was thinking of adding some wait time between enqueueing and processing to ensure that the data is in the replica database. Is this possible via Sidekiq configuration or something like that?
You could do a few things:
Implement a check in the worker to make sure the object exists; otherwise, re-enqueue the job (see the sketch after this answer). You'll probably want to cap the retries so you don't accidentally re-enqueue bad jobs forever, but this seems like a good sanity check for you.
Introduce a delay. In particular, sidekiq can wait to pull jobs from the queue until a specified time.
"Sidekiq allows you to schedule the time when a job will be executed. You use perform_in(interval, *args) or perform_at(timestamp, *args) rather than the standard perform_async(*args):
MyWorker.perform_in(3.hours, 'mike', 1)
MyWorker.perform_at(3.hours.from_now, 'mike', 1)"
See https://github.com/mperham/sidekiq/wiki/Scheduled-Jobs for more details on this option.
Personally I would go for #1 but #2 might be a quicker fix if you're desperate.
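A minimal sketch of option #1, assuming a hypothetical MyWorker/MyModel pair, with a short re-enqueue delay and a retry cap so a genuinely missing record doesn't loop forever:

class MyWorker
  include Sidekiq::Worker

  MAX_ATTEMPTS = 5 # assumption: give replication a few chances before giving up

  def perform(record_id, attempt = 0)
    record = MyModel.find_by(id: record_id) # hypothetical model

    if record.nil?
      # Record hasn't replicated to the read database yet; retry shortly, but not forever.
      self.class.perform_in(10.seconds, record_id, attempt + 1) if attempt < MAX_ATTEMPTS
      return
    end

    # ... do the real work with record ...
  end
end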
I am currently developing a Rails application which takes a long list of links as input, scrapes them using a background worker (Resque), then serves the results to the user. However, in some cases there are numerous URLs, and I would like to be able to make multiple requests in parallel so the whole batch takes much less time, rather than waiting for one request to complete, scraping the page, and moving on to the next one.
Is there a way to do this in heroku/rails? Where might I find more information?
I've come across resque-pool but I'm not sure whether it would solve this issue and/or how to implement it. I've also read about using different types of servers to run Rails in order to make concurrency possible, but don't know how to modify my current setup to take advantage of this.
Any help would be greatly appreciated.
Don't use Resque. Use Sidekiq instead.
Resque runs each worker as a single-threaded process, meaning jobs are processed one at a time per worker, while Sidekiq runs a multithreaded process, meaning jobs run concurrently/simultaneously in different threads.
Make sure you enqueue one URL to scrape per job; there is no concurrency gain if a single job scrapes multiple URLs sequentially.
With Sidekiq, you can pass the link to a worker, e.g.
LINKS = [...]

LINKS.each do |link|
  ScrapeWorker.perform_async(link)
end
perform_async doesn't actually execute the job right away. Instead, the link is put into a queue in Redis along with the worker class name, and later (possibly milliseconds later) a worker thread picks up each job from the queue and runs the perform instance method of ScrapeWorker. Sidekiq will also retry the job if an exception occurs during execution.
PS: You don't have to pass the link itself to the worker. You can store the links in a table and then pass the ids of the records to the workers.
More info about sidekiq
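For completeness, a minimal sketch of what such a worker might look like; ScrapeWorker's body, the use of open-uri, and Nokogiri are all assumptions:

require 'sidekiq'
require 'open-uri'
require 'nokogiri'

class ScrapeWorker
  include Sidekiq::Worker

  def perform(link)
    html  = URI.open(link).read    # fetch the page (assumption: open-uri is acceptable for this)
    doc   = Nokogiri::HTML(html)   # parse it (assumption: nokogiri is available)
    title = doc.at('title')&.text  # example extraction: the page title

    logger.info("Scraped #{link}: #{title}")
    # ... persist or process the scraped result here ...
  end
end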
Adding these two lines to your code will also let you wait until the last job is complete before proceeding:
This line ensures that your program waits until at least one job has been enqueued before checking that all jobs are complete, so a not-yet-filled queue isn't misinterpreted as all jobs being done:
sleep(0.2) until Sidekiq::Queue.new.size > 0 || Sidekiq::Workers.new.size > 0
This line ensures your program waits until all jobs are done:
sleep(0.5) until Sidekiq::Workers.new.size == 0 && Sidekiq::Queue.new.size == 0
Is it possible that one job is being processed twice by two different Sidekiq threads? I am using Sidekiq to insert some analytics events into a MongoDB collection, asynchronously. I see around 15 duplicates in that collection. My guess is that two worker threads picked up the same job at the same time and each added it to the collection.
Does Sidekiq ensure that a job is picked up by only one thread? We can ignore the restart case, as the jobs are small and will complete in less than 8 seconds.
Is firing analytics events asynchronously using sidekiq not a good practice? What are my options? I could add a unique key to the event and check it before insert to avoid insertion of duplicates, but that's adding data (+ an overhead/query) that I am never going to use (and it adds up for millions of events). Can I somehow ensure that a job is processed only once by sidekiq?
Thanks for your help.
No. Sidekiq uses Redis as a work queue for background processing. Redis provides atomic operations for adding jobs to the queue and popping jobs off of the queue (specifically the Redis BRPOP command). Each Sidekiq worker tries to fetch a job from the queue with a timeout via BRPOP, and any given job popped from the queue will only be returned to one of the workers pulling work from the queue.
What is more likely is that you are enqueuing multiple jobs.
Another possibility is that your job is throwing an error, causing it to partially execute and then be retried multiple times. By default Sidekiq will retry failed jobs, but it doesn't have any built-in mechanism for transactions/atomicity of work. That is, if your Sidekiq job does A, B, and C, and doing B raises an exception that causes the job to fail, the job will be retried, causing A to be run again on each retry.
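Given those retries, the usual fix is to make the job idempotent so running it twice has the same effect as running it once. Here is a minimal sketch using the mongo Ruby driver, where the collection name, connection URI, and event_key field are assumptions; the insert becomes an upsert keyed on event_key, so a retried job cannot create a duplicate document:

require 'mongo'

class TrackEventWorker
  include Sidekiq::Worker

  def perform(event_key, payload)
    client = Mongo::Client.new('mongodb://localhost:27017/analytics') # assumption: local MongoDB

    # Only insert the event if a document with this event_key doesn't exist yet,
    # so executing the same job twice still leaves a single document.
    client[:events].update_one(
      { event_key: event_key },
      { '$setOnInsert' => payload.merge('event_key' => event_key) },
      upsert: true
    )
  end
end

To make this fully race-safe you would also want a unique index on event_key.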
Is there a way to permanently remove jobs from a resque queue? The following commands remove the jobs, but when I restart the workers and the resque server, the jobs load back up.
Resque::Job.destroy("name_queue", Class)
OR
Resque.remove_queue("name_queue")
The problem is that you're not removing the specific instance of the job that Resque added to your Redis server. When you remove the queue and it gets recreated on restart, the data for that queue may still be sitting in Redis. Depending on your implementation, you can work around this inside the job's perform method. For instance, if the job manipulates a model, you could check whether that model has been destroyed before manipulating it.
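A minimal sketch of that guard inside a Resque job; the job class and the Article model are assumptions:

class UpdateArticleJob
  @queue = :name_queue

  def self.perform(article_id)
    article = Article.find_by_id(article_id) # returns nil instead of raising if the record is gone
    return if article.nil?                   # record was destroyed, so skip the stale job

    # ... manipulate the article here ...
  end
end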
I am using collectiveidea's delayed_job with Rails 2.3.8. I am creating an array of delayed jobs to perform some tasks, and after some time I want to destroy all the delayed jobs that are running.
If anyone knows a way to do this, please help me.
You can invoke rake jobs:clear to delete all jobs in the queue.
In addition to the rake task, DelayedJob jobs are just a normal ActiveRecord model, so if you're in Ruby code you can do what you like with them:
Delayed::Job.destroy_all
Delayed::Job.delete_all
Delayed::Job.find(4).destroy
# etc.
Sounds like you've got a parent process that wants to time out if its set of jobs doesn't complete within a certain time. Instead of hanging on to references to the jobs themselves, set a flag on a model that indicates that the process has given up. Jobs can check for that flag and short-circuit if they're no longer needed. (Your job class should also wrap the contents of its #perform method in a timeout.)
It's almost always a bad idea to try to hang on to references to DJ objects as you seem to be suggesting.
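A minimal sketch of that pattern, assuming a hypothetical BatchRun model that carries a cancelled flag and a custom delayed_job job class:

require 'timeout'

class ProcessItemJob < Struct.new(:batch_run_id, :item_id)
  JOB_TIMEOUT = 30 # seconds; assumption for how long a single job may run

  def perform
    batch = BatchRun.find_by_id(batch_run_id)
    return if batch.nil? || batch.cancelled? # parent gave up, so short-circuit

    Timeout.timeout(JOB_TIMEOUT) do
      # ... do the actual work for item_id here ...
    end
  end
end

# Enqueue with: Delayed::Job.enqueue(ProcessItemJob.new(batch_run.id, item.id))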