What's the best way to manage Resque jobs on per user basis? - ruby-on-rails

I'm migrating from delayed_job to Resque and am having difficulty finding the best way to handle these cases:
A user cannot add the same command to the list of jobs twice (e.g. "export all my data"). Only one export command at a time. For others it's fine to have many (e.g. send emails).
Some jobs should not run for more than 5 minutes, while others are allowed to run for 30 minutes. In both cases, I'd like to have a time-out in case the process is blocked or does not complete on time.
Can add jobs to start in a few days
Inform the user on all their current & future jobs.
Can cancel some jobs (current and future) for the user
Keep ability to have different lists (mostly for priorities / slow and fast tasks)
I looked at resque-status and it seems like it provides the low level query, but I would still need to do my per user job management.
Suggestions on best way to handle this?

Related

Increase the recurring job polling interval for hangfire and enabling/disabling the recurring job process

I am trying to create a background processor windows service using hangfire.
I would like to increase the recurring job polling interval to more than 1 minute (the hard-coded default), because frequent recurring polling can affect the performance of the database.
Is it possible to enable or disable the Hangfire recurring job feature? This is required when multiple instances of the service are installed.
When you create a recurring job in Hangfire, even if you have multiple Hangfire servers, the job will not be run on two servers at the same time.
You can use Cron expression to define the frequency at which to run your job, as described in Hangfire docs:
RecurringJob.AddOrUpdate(() => YourJob(), "0 12 * */2 *");
However, your need may be to avoid triggering a job when the previous instance is still running. For this situation, I would recommend setting a flag (in the DB for example) when your job starts and removing it when it ends. Then check if the flag is present before actually starting your process.
Update
As you stated you want to prevent the RecurringJobScheduler from running on some servers, I have looked into the code and it seems there is no option to do this.
You can check the file BackgroundJobServer.cs where the scheduler is added to the process list and the RecurringJobScheduler.cs where the DB is queried. The value of 1 minute is hardcoded, as specified in the comments.
I think your only option is the pull request you have already made :(

enqueuing jobs using sucker punch

I have one doubt with enqueuing the job using sucker punch.
I have 2000+ search keywords in my database, and I want to know the Google and Bing ranking for each of them. For this I'm using the Authority Labs API, but AuthorityLabs will only process 1000 POST requests per hour. I'm sending each request to AuthorityLabs as a background job using sucker punch. How can I make sure that only 1000 jobs run in any one hour, with the remaining jobs starting only after that hour has passed? I also want to run these jobs daily to analyse rank changes.
Rate limiting is not a concern of your queue system, much less of SuckerPunch that is not designed to handle advanced delaying/queuing stuff, it just moves asynchronous jobs to a thread from a thread pool.
If you really want to have rate limiting, use a real queue system like Sidekiq, and put some actual code to work.
Sidekiq Enterprise supports it natively: https://github.com/mperham/sidekiq/wiki/Ent-Rate-Limiting
Sidekiq-throttler seems to provide the same functionality: https://github.com/gevans/sidekiq-throttler
But you can also just delay execution (pre-emptively limiting the rate) by enqueuing jobs at specific times in the future, each executing about 4 seconds after the previous one (1000 requests per hour is one every 3.6 seconds), or by enqueuing just one job that handles the next outstanding request and then enqueues itself again with the same delay.
As always with open source, check the code and decide by yourself.
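As a rough illustration of the delayed-enqueue idea in this answer, the sketch below only computes the run times in plain Ruby. spaced_run_times is a hypothetical helper, not part of SuckerPunch or Sidekiq, and it assumes the 1000-requests-per-hour limit from the question, which works out to one request every 3.6 seconds. Each time could then be handed to whatever delayed-enqueue call your queue library offers.

```ruby
# Hypothetical helper: returns one run time per job, each spaced
# period / limit seconds after the previous one.
def spaced_run_times(count, start:, limit: 1000, period: 3600)
  interval = period.to_f / limit # 3.6 seconds for 1000 per hour
  Array.new(count) { |i| start + i * interval }
end

# 2000 keywords starting now: the last job lands just under two hours out.
times = spaced_run_times(2000, start: Time.now)
puts "first: #{times.first}, last: #{times.last}"
```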
Could you do something like this?
YourProcessingJob.set(wait: 1.hours).perform_later
Possibly in a custom rake task...
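One way to read that suggestion is to slice the keywords into batches of at most 1000 and delay each batch by one more hour. The sketch below only builds the plan in plain Ruby; batch_plan and RankCheckJob are hypothetical names, and the commented line assumes an ActiveJob class on a backend that supports delayed jobs.

```ruby
# Hypothetical helper: pairs each batch of keywords with a delay in hours.
# Batch 0 runs immediately, batch 1 after one hour, and so on.
def batch_plan(keywords, per_hour: 1000)
  keywords.each_slice(per_hour).each_with_index.map do |batch, index|
    { wait_hours: index, keywords: batch }
  end
end

plan = batch_plan((1..2500).map { |n| "keyword-#{n}" })
plan.each do |entry|
  # In a Rails app this might become something like:
  #   RankCheckJob.set(wait: entry[:wait_hours].hours).perform_later(entry[:keywords])
  puts "#{entry[:keywords].size} keywords after #{entry[:wait_hours]} hour(s)"
end
```

Note that each batch still fires all of its requests at the start of its hour, so this only helps if the API counts requests per clock hour or tolerates bursts.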

Email notification when 'updated_at' become 2 hours before current time

I'd like to make an email notification if SomeModel has not been updated for 2 hours.
What is the best way to implement it?
After a model has been saved, queue up a background job to run 2 hours from that time to send the email. When a new job is enqueued, remove any still-unrun jobs that are still on the queue.
resque-scheduler provides a pretty simple way of doing this, assuming you have Redis up and running.
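To make the "replace the pending job on every save" pattern concrete, here is a minimal in-memory sketch. PendingNotifications is a hypothetical stand-in for the delayed queue, not resque-scheduler itself; with resque-scheduler the two steps would roughly correspond to Resque.remove_delayed followed by Resque.enqueue_in.

```ruby
# In-memory stand-in for a delayed queue, used only to illustrate the pattern.
class PendingNotifications
  DELAY = 2 * 60 * 60 # two hours, in seconds

  def initialize
    @run_at = {} # record id => time the notification is due
  end

  # Call after each save: drops any earlier pending job for the record
  # and schedules a fresh one two hours out.
  def record_saved(id, now: Time.now)
    @run_at.delete(id)
    @run_at[id] = now + DELAY
  end

  # Returns the ids whose notification is due and removes them from the queue.
  def due(now: Time.now)
    ready = @run_at.select { |_id, time| time <= now }.keys
    ready.each { |id| @run_at.delete(id) }
    ready
  end
end

queue = PendingNotifications.new
t0 = Time.at(0)
queue.record_saved(42, now: t0)
queue.record_saved(42, now: t0 + 3600) # an update after one hour resets the timer
queue.due(now: t0 + 2 * 3600)          # nothing is due yet
queue.due(now: t0 + 3 * 3600)          # record 42 is now due
```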
Personally I find the solution that @x1a4 proposes to be somewhat overkill. Given the relatively large window of 2 hours, I would just run a job periodically (say, once every 10-15 minutes), then search all models for updated_at <= 2.hours.ago and send out the emails.
As for scheduling that job to run every 15 minutes, there are several options. You may use resque-scheduler, if you are using Resque. You may also use the standard system cron, but will incur some fairly substantial overhead starting Rails each time the job runs. I also have written a distributed scheduler gem (i.e. cron that can run on multiple machines, but act like it's only running on one), which uses Redis under the hood.
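The periodic sweep described in this answer could look roughly like the sketch below. Record is a plain Struct standing in for the model, and the notified flag is an assumption added here so that a record is not mailed again on every run; the original answer does not say how repeats are avoided.

```ruby
# Stand-in for an ActiveRecord model; only the fields the sweep needs.
Record = Struct.new(:id, :updated_at, :notified)

TWO_HOURS = 2 * 60 * 60

# Returns records not updated for two hours that have not been notified yet.
# In Rails the same filter would usually be a database query, for example
# SomeModel.where("updated_at <= ?", 2.hours.ago).
def stale_records(records, now: Time.now)
  cutoff = now - TWO_HOURS
  records.select { |r| r.updated_at <= cutoff && !r.notified }
end

# One run of the periodic job: notify each stale record once.
def sweep(records, now: Time.now)
  stale_records(records, now: now).each do |record|
    yield record           # e.g. send the email here
    record.notified = true # assumption: remember that we already mailed
  end
end

records = [Record.new(1, Time.now - 3 * 3600, false), Record.new(2, Time.now, false)]
sweep(records) { |record| puts "would email about record #{record.id}" }
```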

How should I schedule many Google Search scrapes over the course of a day?

Currently, my Nokogiri script iterates through Google's SERPs until it finds the position of the target website. It does this for each keyword for each website that each user specifies (users are capped on amount of websites & keywords they can track).
Right now, it's run in a rake that's hard-scheduled every day and batches all scrapes at once by looping through all the websites in the database. But I'm concerned about scalability and swarming Google with a batch of requests.
I'd like a solution that scales and can run these scrapes over the course of the day. I'm not sure what kind of solution is available or what I'm really looking for.
Note: The number of websites/keywords changes from day to day as users add and delete their websites and keywords. I don't mean to make this question too superfluous, but is this the kind of thing Beanstalkd/Stalker (job queuing) can be used for?
You will have to balance two issues: scalability for lots of users versus Google shutting you down for scraping in violation of their terms of use.
So your system will need to be able to distribute tasks to several different IPs to conceal your bulk scraping, which suggests at least two levels of queuing: one queue to manage all the jobs and send them to each separate IP for searching and collecting results, and a queue on each separate machine to hold the requested searches until they are executed and the results returned.
I have no idea what Google's thresholds are (I am sure they don't advertise them), but exceeding them and getting cut off would obviously be devastating for what you are trying to do. Your simple looping rake task is therefore exactly what you shouldn't do beyond a certain number of users.
So yes, use a queue of some sort, but realize that your goal probably differs from the typical goal of a queue: you want to deliberately delay jobs rather than offload work to avoid UI delays. So you will be seeking ways to slow down the queue rather than have it execute job after job as they arrive.
So based on a cursory inspection of DelayedJob and BackgroundJobs it looks like DelayedJob has what you would need with the run_at attribute. But I am only speculating here and I am sure an expert would have more to say.
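As a sketch of how run times might be chosen for such a delayed queue, the helper below spreads the day's scrapes evenly across a window. spread_over_window is a hypothetical name and the keyword list is made up; the step is recomputed from the current item count, which matters here because the number of websites and keywords changes from day to day.

```ruby
# Hypothetical helper: assigns each scrape a run time so the whole list is
# spread evenly across `window` seconds (one day by default).
def spread_over_window(items, start:, window: 24 * 60 * 60)
  return [] if items.empty?

  step = window.to_f / items.size
  items.each_with_index.map { |item, index| [item, start + index * step] }
end

schedule = spread_over_window(%w[alpha beta gamma delta], start: Time.now)
schedule.each do |keyword, run_at|
  # With delayed_job, this time would be passed as the job's run_at value.
  puts "#{keyword} at #{run_at}"
end
```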
If I'm understanding correctly, it sounds like one of these tools might fit the bill:
Delayed_job: https://github.com/tobi/delayed_job
or
BackgroundJobs: http://codeforpeople.rubyforge.org/svn/bj/trunk/README
I've used both of them, and found them easy to work with.
There are definitely some background job libraries that might work.
delayed_job: https://github.com/collectiveidea/delayed_job (beware of the unmaintained branch from tobi!)
resque: https://github.com/defunkt/resque
However, you might think about just scheduling a Cron job that runs more times during the day, and processes less items per run.
SaaS solution: http://momentapp.com/ "Launch delayed jobs with scheduled http requests" - disclaimer a) in beta b) I am not affiliated with this service

Running Jobs when DB is free on Ruby on Rails Heroku

I have a ruby on rails app that uses Heroku. I have the need to run things like import/export tasks on our db that lock up the whole system since they are so heavy on the DB. Is there a way to tell the system to only run these tasks when the database is not being used at that second?
There is no built-in way to schedule a job like this. There are a few things you can do, though.
Schedule the jobs to run during the least busy hours of the day. That will depend on your business, customer base and so on, but hopefully there is a window that is more suitable than others.
You could write your batch job to run for a longer time, doing small units of work. Between each unit of work, sleep for a few seconds, or take a look at the current load average and decide what to do based on that. This should lower the impact of the batch jobs.
Have the website update a "lock" somewhere, either in the database or in a memcached or something. If your normal website usage updates the database, you could look at the existing updated_at. Then only do batch work when there hasn't been any activity for a while. This doesn't guarantee that a new user won't pop in at the same time your batch job runs, of course, but could be a way to find a window where the site is less used.
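The second suggestion above (small units of work with pauses in between) can be sketched as follows. process_in_chunks is a hypothetical helper, and busy is an assumed callable that you would implement yourself, for instance by checking the load average or a recent updated_at; neither is a Heroku or Rails API.

```ruby
# Processes items in small chunks, pausing between chunks so that other
# database traffic gets a chance to run. `busy` is a hypothetical callable
# that reports whether the system is currently under load; `sleeper` is
# injectable so the pause can be replaced in tests.
def process_in_chunks(items, chunk_size: 100, pause: 2,
                      busy: -> { false }, sleeper: ->(s) { sleep(s) })
  items.each_slice(chunk_size) do |chunk|
    sleeper.call(pause) while busy.call # back off until the load drops
    yield chunk
    sleeper.call(pause)                 # breathing room between chunks
  end
end

process_in_chunks((1..250).to_a, chunk_size: 100, pause: 0) do |chunk|
  puts "imported #{chunk.size} rows"
end
```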
Have you looked into using Background Jobs / Workers on Heroku? It's also worth reading about Heroku's Delayed Job queuing system.
