I'm playing with a Rails 4.2 app that uses ActiveJob backed by resque/sidekiq for email scheduling. When a user creates a newsletter campaign, a new job is created and scheduled for a certain date. That's all great, but what happens when the user changes the delivery date?
In this case every job could check if it should be delivered or not thus invalid jobs would be ignored and only the last one would be executed. This could work, but if a user made 1k edits, that would push 1k-1 invalid jobs into the queue - not good. I believe that the existing job should be updated or replaced with a new one. As far as I know, searching through the Redis queue for the job_id is slow.
What would be the proper way for rescheduling ActiveJobs in Rails (with resque/sidekiq)?
There is none; jobs are not meant to be rescheduled. You have answered your own question:
In this case every job could check if it should be delivered or not thus invalid jobs would be ignored and only the last one would be executed.
The alternative is to re-architect how you send campaigns: store the delivery date in the database and have cron check every minute for campaigns which need delivery now and create the Sidekiq job right then.
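A minimal sketch of that polling approach, in plain Ruby so it runs on its own. The Campaign struct, its deliver_at and delivered_at fields, and the tick method are assumptions made for illustration; in a real app they would be a database table and a cron-triggered task, and the queue array stands in for creating the Sidekiq job.

```ruby
# Hypothetical stand-in for a campaigns table row; in a Rails app this would
# be a model with deliver_at and delivered_at columns (assumed names).
Campaign = Struct.new(:id, :deliver_at, :delivered_at)

# Campaigns whose delivery time has passed and that have not been sent yet.
def due_campaigns(campaigns, now)
  campaigns.select { |c| c.delivered_at.nil? && c.deliver_at <= now }
end

# One scheduler tick (e.g. run by cron every minute): enqueue each due
# campaign and mark it so the next tick does not pick it up again.
def tick(campaigns, now, queue)
  due_campaigns(campaigns, now).each do |c|
    queue << c.id        # stands in for creating the real delivery job
    c.delivered_at = now
  end
end

now = Time.utc(2015, 1, 1, 12, 0, 0)
campaigns = [
  Campaign.new(1, now - 60, nil),   # already due
  Campaign.new(2, now + 3600, nil)  # due in an hour
]
queue = []
tick(campaigns, now, queue)

# Rescheduling is only an update to deliver_at; no queued job has to change.
campaigns[1].deliver_at = now + 30
tick(campaigns, now + 60, queue)
```

With this shape, a user making many edits only rewrites one column, and nothing is put on the queue until the campaign is actually due.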
Related
I am using Sidekiq to schedule some tasks based on a schedule that the user provides. However, if the user changes the schedule, I want to be able to simply update the old schedule with the new one.
Suggestion one
I saw a suggestion to just find the old job with Sidekiq::ScheduledSet.new.find_job(job_id), but I am trying to avoid having to create a new model just to simply store the job ID and the task.
Suggestion two
Another suggestion I saw was to just have the worker check if the time of the task matches the current time, but that won't work because if the server is offline, it won't process jobs when it returns back online because the time of those delayed jobs won't match the current time.
If I could assign my own job ID, like a hex version of the job name or a padded version of the task ID, then I could easily avoid having to create a new model to store the job IDs. So when the user reschedules a task, then it would be a lot easier.
Other thoughts
Maybe if I could check the job's at attribute and match that with the task, that may work, but I'm not sure how to access that attribute from within the worker without knowing the job ID.
Edit
I just tried to pull the current job's at attribute, but it looks like once the job kicks off, it doesn't exist anymore in Sidekiq::ScheduledSet, so there's no matching this job's time with Task's time from what it seems like.
I am using Sidekiq to schedule some tasks based on a schedule that the user provides...
There's an extension for that. Sidekiq-Scheduler gives you a cron-like schedule configuration file. Then you can alter the schedule as you see fit. This seems like the best option as it avoids having to write your own scheduler interface.
Can I assign my own randomized Job ID to Sidekiq?
Yes, though it's undocumented. You can give Sidekiq::Client.push a jid attribute.
Sidekiq::Client.push('class' => MyWorker, 'args' => [1, 2, 3], 'jid' => ... )
This is not a good way to solve your problem. It's relying on an undocumented feature. And it invites collisions with normal Sidekiq IDs.
Maybe if I could check the job's at attribute and match that with the task, that may work, but I'm not sure how to access that attribute from within the worker without knowing the job ID.
This sounds very error prone. You'd have to store the timestamp in a model anyway. Better to store the job ID in the first place.
I am trying to avoid having to create a new model just to simply store the job ID and the task.
Storing things in models is what Rails does really well. This would seem to be the way to go. It will take a trivial amount of coding, database storage, and processing. You should have a model, view, and controller for your scheduled jobs anyway; otherwise, how will you create scheduled jobs and view your schedule?
However, the Sidekiq docs note that find_job is "a slow, inefficient operation. Do not use under normal conditions. Sidekiq Pro contains a faster version." This is because it has to iterate through all jobs.
I had a case where I had to reschedule jobs based on updates from the User. It is actually pretty slow and complicated.
It's simpler to not reschedule, but instead make the old queued tasks no-ops (no operations) and then queue up the new tasks.
This is basically defined by the logic within the task. You'd have to know that the user updated their schedule somehow and check that within the old jobs and based on some if-check, not go through with the job.
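One way to write that if-check, sketched in plain Ruby. The Task struct and its schedule_version counter are assumptions for illustration, not part of Sidekiq: each schedule change bumps the counter, the job payload carries the value it was enqueued with, and a job whose value no longer matches does nothing.

```ruby
# Hypothetical stand-in for a task record; schedule_version is an assumed
# integer column that is incremented on every schedule change.
Task = Struct.new(:id, :run_at, :schedule_version)

# Change the schedule and return the payload to enqueue for the new time.
def reschedule(task, new_time)
  task.run_at = new_time
  task.schedule_version += 1
  { task_id: task.id, version: task.schedule_version }
end

# What the worker would do when a job fires: become a no-op if the task has
# been rescheduled since this job was enqueued.
def perform(task, payload)
  return :skipped if payload[:version] != task.schedule_version
  :delivered
end

task = Task.new(1, Time.utc(2015, 1, 1, 9, 0, 0), 0)
old_job = reschedule(task, Time.utc(2015, 1, 2, 9, 0, 0))
new_job = reschedule(task, Time.utc(2015, 1, 3, 9, 0, 0))

perform(task, old_job) # the superseded job does nothing
perform(task, new_job) # the latest job goes through
```

Because the check compares a version rather than the clock, a job that runs late after downtime is still delivered as long as it is the most recent one.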
I have a PendingEmail table which I push many records to for emails I want to send.
I then have multiple Que workers which process my app's jobs. One of said jobs is my SendEmailJob.
The purpose of this job is to check PendingEmail, pull the latest 500 ordered by priority, make a batch request to my 3rd party email provider, wait for array response of all 500 responses, then delete the successful items and mark the failed records' error column. The single job will continue in this fashion until the records returned from the DB are 0, and the job will exit/destroy.
The issues are:
It's critical only one SendEmailJob processes email at one time.
I need to check the database every second if a current SendEmailJob isn't running. If it is running, then there's no issue as that job will get to it in ~3 seconds.
If a table is locked (however that may be), my app/other workers MUST still be able to INSERT, as other parts of my app need to add emails to the table. I mainly just need to restrict SELECT I think.
All this needs to be FAST. Part of the reason I did it this way is for performance as I'm sending millions of emails in a short timespan.
Currently my jobs are initiated with a clock process (Clockwork), so it would add this job every 1 second.
What I'm thinking...
Que already uses advisory locks and other PG mechanisms. I'd rather not attempt to mess with that table trying to prevent adding more than one job in the first place. Instead, I think it's ok that potentially many SendEmailJob could be running at once, as long as they abort early if there is a lock in place.
Apparently there are some Rails ways to do this, but I assume I will need to execute code directly against PG to initiate some sort of lock in each job. Before doing that, the job would check whether a lock is already held, and abort if so.
I just don't know which type of lock to choose, whether to do it in Rails or in the database directly. There are so many of them with such subtle differences (I'm using PG). Any insight would be greatly appreciated!
Answer: I needed an advisory lock.
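PostgreSQL's pg_try_advisory_lock returns immediately with true or false instead of waiting, and advisory locks are application-defined, so they do not block INSERTs into the table. The control flow of "take the lock or abort early" can be sketched in plain Ruby as below. Note the assumption: an in-process Mutex stands in for the database lock so the example runs without a database, and the with_send_email_lock helper is a hypothetical name. In the real job, the try and unlock steps would be pg_try_advisory_lock and pg_advisory_unlock issued on the job's database connection.

```ruby
# In-process stand-in for a PostgreSQL advisory lock (assumption: a real job
# would call pg_try_advisory_lock / pg_advisory_unlock with a fixed key).
SEND_EMAIL_LOCK = Mutex.new

# Run the block only if nobody else holds the lock; otherwise abort early
# without waiting.
def with_send_email_lock
  return :skipped unless SEND_EMAIL_LOCK.try_lock
  begin
    yield
    :ran
  ensure
    SEND_EMAIL_LOCK.unlock
  end
end

# A second attempt made while the first is still working is skipped.
inner_result = nil
outer_result = with_send_email_lock do
  inner_result = with_send_email_lock { raise "must not run concurrently" }
end
```

This matches the idea of letting the clock process enqueue a job every second: extra jobs start, fail to get the lock, and exit straight away.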
I am working on implementing a requirement for a web app:
Users should be able to schedule a job in the future
Users should be able to view a list of scheduled jobs
Users should be able to edit the time of scheduled jobs
All of the above can easily be done with Sidekiq using something like:
ss = Sidekiq::ScheduledSet.new
job = ss.find_job(jid)
job.reschedule(at)
However, I would have to store in the db at least the jid and the scheduled time so that the user can determine if he / she wants to reschedule it.
My concern is that the above is not efficient. First, I would have to duplicate storage in redis and the db, additionally, as Mike Perham put it:
You can do this but it won't be efficient. It's a linear scan for find a scheduled job by JID.
See https://stackoverflow.com/a/24189112/1755697
If I need to store details in the db and in redis (sidekiq storage), I am wondering if it may be simpler to go with Delayed Job, which stores jobs in the db, thus not duplicating storage.
The main concern about Delayed Job is that it is not yet Rails 5.0 ready and it has not been updated for a while.
I would appreciate any suggestion on which system to choose or how to use Sidekiq in a more efficient way to meet the above mentioned requirements.
I'm creating a flight booking website in Rails. Booking information is stored in the database in the following table:
USERNAME | FLIGHT FROM | FLIGHT TO | DATE OF FLIGHT | TIME OF FLIGHT | some additional information not relevant to this task ... |
I'm looking to send an email an hour (or some specific time) before the TIME OF FLIGHT on the DATE OF FLIGHT. What is the best approach to do it? I was looking into Cron and delayed_job; however, both seem to be based more on intervals rather than executing a job at a specific date and time.
Please help.
Thank you
The simplest approach is just to have a cron job set to run every 10 minutes and determine via a database query which flights now require a reminder e-mail. You can have an additional field in the database such as "REMINDER_SENT" so that you only send an e-mail once.
If you are already using delayed job then the cron job should just call a ruby script which adds a SendReminders job on to the queue. You can then manage all of the db querying, e-mail sending and db updating from a normal delayed job.
This approach saves you having to queue up a large number of future dated events and you don't need to worry about flight times changing or events getting lost. If you miss one event then the next run in 10 minutes will pick up all the flights anyway.
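A minimal sketch of the query-and-flag logic such a periodic run could use, in plain Ruby. The Booking struct, the departs_at field (date and time of flight combined), and the method names are assumptions for illustration; reminder_sent plays the role of the REMINDER_SENT field, and returning the username stands in for sending the e-mail.

```ruby
# Hypothetical stand-in for a row of the bookings table.
Booking = Struct.new(:username, :departs_at, :reminder_sent)

LEAD_TIME = 60 * 60 # remind one hour before departure, in seconds

# Bookings that depart within the next hour and have not been reminded yet.
def bookings_needing_reminder(bookings, now)
  bookings.select do |b|
    !b.reminder_sent && b.departs_at > now && b.departs_at - LEAD_TIME <= now
  end
end

# One periodic run: "send" each reminder and set the flag so that later
# runs skip the booking.
def send_reminders(bookings, now)
  bookings_needing_reminder(bookings, now).map do |b|
    b.reminder_sent = true
    b.username # stands in for actually sending the e-mail
  end
end

now = Time.utc(2015, 6, 1, 10, 0, 0)
bookings = [
  Booking.new("alice", now + 30 * 60, false), # departs in 30 minutes
  Booking.new("bob", now + 2 * 3600, false)   # departs in 2 hours
]
send_reminders(bookings, now)      # picks up alice only
send_reminders(bookings, now + 60) # nothing new to send
```

Because each run looks at the current flight time rather than a time captured when the booking was made, a changed flight time is handled without touching any queue.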
Are you required to send those notifications exactly one hour (or another time) in advance?
If not, I would create a cron job that calls a rake task, say every 10 minutes. This task checks if there are notifications due and sends them. If you expect them to arrive 60 minutes before, with these settings you have a delivery timeframe between 60 and 70 minutes in advance. Given the delays imposed by spam filters etc., I think this is reasonable.
If you call the script more often (every minute), the precision is higher, but you might have trouble with concurrently running tasks.
I'd like to make an email notification if SomeModel has not been updated for 2 hours.
What is the best way to implement it?
After a model has been saved, queue up a background job to run 2 hours from that time to send the email. When a new job is enqueued, remove any still-unrun jobs that are still on the queue.
resque-scheduler provides a pretty simple way of doing this, assuming you have redis up and running.
Personally I find the solution that #x1a4 proposes to be somewhat overkill. Given the relatively large window of 2 hours, I would just run a job periodically (say, once every 10-15 minutes), then search all Models for updated_at <= 2.hours.ago and send out the emails.
As for scheduling that job to run every 15 minutes, there are several options. You may use resque-scheduler, if you are using Resque. You may also use the standard system cron, but will incur some fairly substantial overhead starting Rails each time the job runs. I also have written a distributed scheduler gem (i.e. cron that can run on multiple machines, but act like it's only running on one), which uses Redis under the hood.