How to run n jobs every x seconds with the delayed_job gem?

My question is: how can I run x jobs every n seconds with the delayed_job gem?
Example
I want to send a mass newsletter. However, around the 50th email, Gmail stops delivering the messages (flags them as spam), even though each one's content is slightly different.
So if I send the newsletter in groups of twenty users every 3 minutes, perhaps Gmail will deliver the emails correctly.
Thanks in advance

delayed_job has a :run_at option (https://github.com/collectiveidea/delayed_job) that you can use to set the time at which the job should run. It doesn't guarantee the job will run at exactly that time, but it certainly won't run before it.
So, 20 mails/3 minutes = 1 email/9 seconds.
You can do something like this:
now = Time.now
# schedule one newsletter every 9 seconds
users.each_with_index do |u, count|
  WhateverMailer.delay(:run_at => now + count * 9).news_letter(u)
end
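If you want to match the question's batching exactly (groups of twenty, 3 minutes apart), a variant of the same idea would be:

now = Time.now
# twenty users per batch, batches 180 seconds (3 minutes) apart
users.each_slice(20).with_index do |group, batch|
  group.each do |u|
    WhateverMailer.delay(:run_at => now + batch * 180).news_letter(u)
  end
end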

You can do this type of thing with SimpleWorker, a cloud-based scheduling and background processing service. It offers DelayedJob-like capabilities but without having to manage servers and queues. (I work at Iron.io, makers of SimpleWorker).
You can schedule a job to run every X seconds or schedule multiple jobs to come onto queue at specific times (different priorities have different target run windows). Rather than pre-schedule a lot of jobs though, you'll probably want to have one or more master jobs that run on regular schedules and then queue up one or more slave jobs to send the actual emails (each checking the database to pick up the next set to send).
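SimpleWorker's own API aside, the master/slave split might be sketched with delayed_job-style calls roughly like this (NewsletterRecipient, the sent flag, and the job names are all hypothetical):

# The master runs on a regular schedule and hands work to a slave job
class MasterMailerJob
  BATCH_SIZE = 20 # illustrative

  def self.perform
    pending = NewsletterRecipient.where(sent: false).limit(BATCH_SIZE)
    SlaveMailerJob.delay.perform(pending.map(&:id)) unless pending.empty?
  end
end

class SlaveMailerJob
  def self.perform(recipient_ids)
    NewsletterRecipient.where(id: recipient_ids).each do |r|
      WhateverMailer.news_letter(r.user).deliver
      r.update_attribute(:sent, true) # don't pick this recipient up again
    end
  end
end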
You can use the same approach when facing thresholds with fetching data via APIs. Happy to discuss further if you'd like.
Ken

Related

enqueuing jobs using sucker punch

I have a question about enqueuing jobs using sucker punch.
I have 2000+ search keywords in my database, and I want to know the Google and Bing ranking for each of them. For this I'm using the Authority Labs API, but AuthorityLabs will only process 1000 POST requests per hour. I'm sending each request to AuthorityLabs as a background job using sucker punch. How can I limit it so that only 1000 jobs run in an hour, with the remaining jobs starting only after that hour is up? I also want to run these jobs daily to analyse rank changes.
Rate limiting is not a concern of your queue system, much less of SuckerPunch, which is not designed to handle advanced delaying/queuing: it just hands asynchronous jobs to a thread from a thread pool.
If you really want to have rate limiting, use a real queue system like Sidekiq, and put some actual code to work.
Sidekiq Enterprise supports it natively: https://github.com/mperham/sidekiq/wiki/Ent-Rate-Limiting
Sidekiq-throttler seems to provide the same functionality: https://github.com/gevans/sidekiq-throttler
But you can also just delay execution (pre-emptively limiting the rate), either by enqueuing jobs at specific times in the future (each executing 4 minutes after the previous one) or by enqueuing a single job that executes itself (handling the next outstanding request) and then re-enqueues itself with a 4-minute delay.
As always with open source, check the code and decide by yourself.
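A minimal sketch of that self-re-enqueuing pattern with Sidekiq (RankCheckJob, Keyword, the checked flag, and fetch_ranking are all hypothetical names):

class RankCheckJob
  include Sidekiq::Worker

  def perform(keyword_id)
    keyword = Keyword.find(keyword_id)
    fetch_ranking(keyword) # your AuthorityLabs POST goes here
    keyword.update_attribute(:checked, true)

    # hand off to the next outstanding keyword after 4 minutes,
    # staying well under the 1000-requests-per-hour limit
    if (next_kw = Keyword.where(checked: false).first)
      RankCheckJob.perform_in(4.minutes, next_kw.id)
    end
  end
end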
Could you do something like this?
YourProcessingJob.set(wait: 1.hours).perform_later
Possibly in a custom rake task...

What's the best way to manage Resque jobs on a per-user basis?

I'm migrating from delayed_job to Resque and I'm having difficulty finding the best way to handle these cases:
A user can NOT add the same command twice to the list of jobs (e.g. "export all my data"); only one export command at a time. For other commands it's fine to have many (e.g. sending emails).
Some jobs should not run for more than 5 minutes, while others are allowed to run for 30 minutes. In both cases, I'd like a time-out in case the process is blocked or does not complete on time.
Can add jobs to start in a few days
Inform the user on all their current & future jobs.
Can cancel some jobs (current and future) for the user
Keep ability to have different lists (mostly for priorities / slow and fast tasks)
I looked at resque-status and it seems to provide the low-level querying, but I would still need to build my per-user job management on top of it.
Suggestions on best way to handle this?

Ruby on Rails - Automated Email Sending - Flight Reminder

I'm creating a flight booking website in Rails. Booking information is stored in the database in the following table:
USERNAME | FLIGHT FROM | FLIGHT TO | DATE OF FLIGHT | TIME OF FLIGHT | some additional information not relevant to this task ... |
I'm looking to send an email an hour (or some other specific time) before the TIME OF FLIGHT on the DATE OF FLIGHT. What is the best approach to do this? I was looking into cron and delayed_job; however, both seem to be based more on intervals than on executing a job at a specific date and time.
Please help.
Thank you
The simplest approach is just to have a cron job set to run every 10 minutes and determine via a database query which flights now require a reminder e-mail. You can have an additional field in the database such as "REMINDER_SENT" so that you only send an e-mail once.
If you are already using delayed_job then the cron job should just call a Ruby script which adds a SendReminders job onto the queue. You can then manage all of the db querying, e-mail sending and db updating from a normal delayed job.
This approach saves you from having to queue up a large number of future-dated events, and you don't need to worry about flight times changing or events getting lost. If you miss one event then the next run in 10 minutes will pick up all the flights anyway.
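A sketch of what that SendReminders job might look like (the Booking model, flight_time column, and reminder_sent flag are assumed names):

class SendReminders
  def self.perform
    due = Booking.where(reminder_sent: false)
                 .where("flight_time <= ?", 1.hour.from_now)
    due.each do |booking|
      ReminderMailer.flight_reminder(booking).deliver
      # flag the booking so the next run doesn't email it again
      booking.update_attribute(:reminder_sent, true)
    end
  end
end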
Are you required to send those notifications exactly one hour (or another time) in advance?
If not, I would create a cron job that calls a rake task, say every 10 minutes. This task checks whether there are notifications due and sends them. If you expect them to arrive 60 minutes before the flight, these settings give you a delivery timeframe of between 60 and 70 minutes in advance; given the delays imposed by spam filters etc., I think this is reasonable.
If you call the script more often (every minute), the precision is higher, but you might have trouble with concurrently running tasks.
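For the cron-to-rake glue, something along these lines would do (task name and schedule are illustrative):

# lib/tasks/reminders.rake
namespace :reminders do
  desc "Send any flight reminders that are currently due"
  task :deliver => :environment do
    # cron entry, e.g.: */10 * * * * cd /app && bundle exec rake reminders:deliver
    SendReminders.perform # the job sketched in the previous answer
  end
end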

How to lock Resque jobs to one server

I have a "cluster" of Resque servers in my infrastructure. They all have the same exact job priorities etc. I automagically scale the number of Resque servers up and down based on how many pending jobs there are and available resources on the servers to handle said jobs. I always have a minimum of two Resque servers up.
My issue is that when I do a quick, one off job, sometimes both the servers process that job. This is bad.
I've tried adding a lock to my job with something like the following:
require 'resque-lock-timeout'

class ExampleJob
  extend Resque::Plugins::LockTimeout
  @queue = :example # queue declaration Resque needs to enqueue the job

  def self.perform
    # some code
  end
end
This plugin works for longer-running jobs. For these super-tiny one-off jobs, however, processing happens right away. Neither Resque server sees the lock set by its sister server; both set a lock, process the job, unlock, and are done.
I'm not entirely sure what to do at this point or what solutions there are except for having one dedicated server handle this type of job. That would be a serious pain to configure and scale. I really want both the servers to be able to handle it, but once one of them grabs it from the queue, ensure the other does not run it.
Can anyone suggest some viable solution(s)?
Write your lock interpreter to wait T milliseconds before it looks for a lock with a unique_id less than the value of the lock it made.
This will determine who won the race, and the loser will self-terminate.
T is the parallelism latency between all N servers in the pool of a given queue. You can determine this heuristically by scaling back from 1000 milliseconds until you again find the job happening in-duplicate. Give padding for latency variation.
This is called the busy-wait solution to mutual exclusion. It is considered an acceptable trade-off in many of the scenarios where a mutex (e.g. locking) must be implemented.
I'll post some links when I'm off mobile; the Wikipedia entry on mutual exclusion should explain all of this.
If this won't work for you, then:
1. Use a scheduler to control duplication.
2. Classify short-running jobs to a queue designed to run them in serial.
TL;DR: there is no perfect solution, only good trade-offs for your conditions.
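For illustration, here is one way the busy-wait tiebreak could be sketched with raw Redis (the key names, T, and run_exclusively are all hypothetical, not part of any Resque plugin):

require 'redis'

T = 0.2 # seconds; tune to the observed latency between your servers

def run_exclusively(redis, job_key)
  my_id = redis.incr("#{job_key}:counter")          # unique, increasing id
  redis.zadd("#{job_key}:contenders", my_id, my_id) # register as a contender
  sleep T                                           # wait out the race window
  winner = redis.zrange("#{job_key}:contenders", 0, 0).first.to_i
  yield if winner == my_id # lowest id wins; losers self-terminate
ensure
  redis.zrem("#{job_key}:contenders", my_id)
end

# usage: run_exclusively(Redis.new, "example_job") { do_the_one_off_work }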
It should not be possible for two workers to get the same 'payload', because items are dequeued using BLPOP: Redis will only send the queued item to the first client that calls BLPOP. It sounds like you are enqueuing the job more than once, so two workers are able to acquire different payloads with the same arguments. The purpose of 'resque-lock-timeout' is to ensure that payloads with the same method and arguments do not run concurrently; it does not, however, stop the second payload from being worked if the first job releases the lock before the second tries to acquire it.
It would make sense that this only happens to short running jobs. Here is what might be happening:
payload 1 is enqueued
payload 2 is enqueued
payload 1 is locked
payload 1 is worked
payload 1 is unlocked
payload 2 is locked
payload 2 is worked
payload 2 is unlocked
Whereas with long-running jobs the following scenario might happen:
payload 1 is enqueued
payload 2 is enqueued
payload 1 is locked
payload 1 is worked
payload 2 fails to get lock
payload 1 is unlocked
Try turning off Resque and enqueuing your job. Take a look in Redis at the list for your Resque queue (or watch it with redis-cli monitor) and see whether Resque has queued more than one payload. If you still only see one payload, then monitor the list to see if another one of your Resque workers is calling recreate on failed jobs.
If you want to have 'resque-lock-timeout' hold the lock for longer than the duration it takes to process the job you can override the release_lock! method to set an expiry on the lock instead of just deleting it.
module Resque
  module Plugins
    module LockTimeout
      def release_lock!(*args)
        lock_redis.expire(redis_lock_key(*args), 60) # expire lock after 60 seconds
      end
    end
  end
end
https://github.com/lantins/resque-lock-timeout/blob/master/lib/resque/plugins/lock_timeout.rb#L153-155

Email notification when 'updated_at' is more than 2 hours before the current time

I'd like to send an email notification if SomeModel has not been updated for 2 hours.
What is the best way to implement it?
After a model has been saved, queue up a background job to run 2 hours from that time to send the email. When a new job is enqueued for that record, remove any still-unrun jobs for it from the queue.
resque-scheduler provides a pretty simple way of doing this, assuming you have Redis up and running.
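A minimal sketch of that enqueue-and-replace approach with resque-scheduler (NotifyStale and the callback name are hypothetical):

class SomeModel < ActiveRecord::Base
  after_save :reschedule_stale_notification

  def reschedule_stale_notification
    # drop any pending notification for this record, then schedule a
    # fresh one for 2 hours out; NotifyStale is a hypothetical Resque job
    Resque.remove_delayed(NotifyStale, id)
    Resque.enqueue_in(2.hours, NotifyStale, id)
  end
end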
Personally I find the solution that #x1a4 proposes somewhat overkill. Given the relatively large window of 2 hours, I would just run a job periodically (say, once every 10-15 minutes) that searches all models for updated_at <= 2.hours.ago and sends out the emails.
As for scheduling that job to run every 15 minutes, there are several options. You may use resque-scheduler, if you are using Resque. You may also use the standard system cron, but you will incur some fairly substantial overhead starting Rails each time the job runs. I have also written a distributed scheduler gem (i.e. cron that can run on multiple machines but act as if it's running on only one), which uses Redis under the hood.
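Under the periodic approach, the job body is just a query and a loop; a sketch (StaleMailer and the queue name are hypothetical, and in practice you'd also flag each record so it isn't notified twice):

class StaleRecordNotifier
  @queue = :notifications # illustrative Resque queue name

  def self.perform
    SomeModel.where("updated_at <= ?", 2.hours.ago).find_each do |record|
      # also mark the record as notified here to avoid duplicate emails
      StaleMailer.stale_alert(record).deliver
    end
  end
end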
