Running Jobs when DB is free on Ruby on Rails Heroku - ruby-on-rails

I have a ruby on rails app that uses Heroku. I have the need to run things like import/export tasks on our db that lock up the whole system since they are so heavy on the DB. Is there a way to tell the system to only run these tasks when the database is not being used at that second?

There is no built-in way to schedule a job like this. There are a few things you can do, though.
Schedule the jobs to run during the least busy hours of the day. That will depend on your business, customer base and so on, but hopefully there is a window that is more suitable than others.
You could write your batch job to run for a longer time, doing small units of work. Between each unit of work, sleep for a few seconds, or take a look at the current load average and decide what to do based on that. This should lower the impact of the batch jobs.
Have the website update a "lock" somewhere, either in the database or in a memcached or something. If your normal website usage updates the database, you could look at the existing updated_at. Then only do batch work when there hasn't been any activity for a while. This doesn't guarantee that a new user won't pop in at the same time your batch job runs, of course, but could be a way to find a window where the site is less used.

Have you looked into using Background Jobs / Workers on Heroku? It's also worth reading about Heroku's Delayed Job queuing system

Related

Running large amount of long running background jobs in Rails

We're building a web-app where users will be uploading potentially large files that will need to be processed in the background. The task involves calling 3rd-party APIs so each job can take several hours to complete. We're using DelayedJob to run the background jobs. With every user kicking off a background job, each of which will take a few hours to finish, that will add up to a lot of background jobs every quickly. I am wondering what would be the best way to setup the deployment for this? We're currently hosted on DigitalOcean. I've kicked off 10 DelayedJob workers. Each one (when ideal) takes up 157MB. When actively running it utilizes around 900 MB. Our user-base right now is pretty small so it's not an issue but will be one soon. So on a 4GB droplet, I can probably run like 2 or 3 workers at a time. How should we approach this issue? Should we be looking at using DigitalOcean's API to auto-spin cheap droplets on demand? Should we subscribe to high-memory droplets on a monthly basis instead? If we go with auto-spinning droplets, should we stick with DigitalOcean or would Heroku make more sense? Or is the entire approach wrong and should we be approaching it from an entire different direction? Any help/advice would be very much appreciated.
Thanks!
It sounds like you are limited by memory on the number of workers that you can run on your DigitalOcean host.
If you are worried about scaling, I would focus on making the workers as efficient as possible. Have you done any benchmarking to understanding where the 900MB of memory is being allocated? I'm not sure what the nature of these jobs are, but you mentioned large files. Are you reading the contents of these files into memory, or are you streaming them? Are you using a database with SQL you can tune? Are you making many small API calls when you could be using a batch endpoint? Are you assigning intermediary variables that must then be garbage collected? Can you compress the files before you send them?
Look at the job structure itself. I've found that background jobs work best with many smaller jobs rather than one larger job. This allows execution to happen in parallel, and be more load balanced across all workers. You could even have a job that generates other jobs. If you need a job to orchestrate callbacks when a group of jobs finishes there is a DelayedJobGroup plugin at https://github.com/salsify/delayed_job_groups_plugin that allows you to invoke a final job only after the sibling jobs complete. I would aim for an execution time of a single job to be under 30 seconds. This is arbitrary but it illustrates what I mean by smaller jobs.
Some hosting providers like Amazon provide spot instances where you can pay a lower price on servers that do not have guaranteed availability. These pair well with the many fewer jobs approach I mentioned earlier.
Finally, Ruby might not be the right tool for the job. There are faster languages, and if you are limited by memory, or CPU, you might consider writing these jobs and their workers in another language like Javascript, Go or Rust. These can pair well with a Ruby stack, but offload computationally expensive subroutines to faster languages.
Finally, like many scaling issues, if you have more money than time, you can always throw more hardware at it. At least for a while.
I thing memory and time is more problem for you. you have to use sidekiq gem for this process because it will consume less time and memory consumption for doing the same job,because it uses redis as database which is key value pair db.if the problem continues go with java script.

Keep delayed job running on Heroku

I'm connecting to Twitter's streaming API to get a stream of updates to my Rails app, adding them to the db, etc, etc.
What's the best way to do this on Heroku? Right now, I'm using the delayed_job gem - problem is that the job (connecting to the Twitter Streaming API) expires after hours.
Is there a way to make the job run forever, or a better way to do this?
Thanks
I wouldn't make a job "run forever" as that would mean loading the CPU forever too.
The way this is usually handled is by using a cron job which starts the specific script at specific intervals (every minute, every hour, every few days, etc.).
Almost every webhost provides an easy interface to setup such cron jobs via their backend (eg: CPanel).
In case you're running your own server, you probably already know how to configure such jobs. If you don't, you'll have to lookup the individual setup guide which fits the operating system you're running on your server… there's always a way to run "jobs" at specific intervals (even on MS Windows servers — via scheduling).
And for a more detailed description and better insight into what "cron" is, you might want to check the "cron" article at Wikipedia , which also provides some pretty good examples.

Rails: Delayed_job for queuing but running jobs through cron

Ok, so this is probably evil, however.. here's the question! I want to run a pretty lightweight app on a shared environment (site5). Ideally I would like to use delayed_job for the ease of queueing the mails (~200+ every so often). However, being a shared environment they don't want background processes running all the time (fair enough).
So, my plan, such as it is, is to queue the mails using delayed job, and then every hour or something, spin up a cron job, send a few emails (10 or something small) and then kill the process. And repeat.
Q) Is there a rake jobs:works:1 equivalent task it'd be easy to setup? - pointer would be handy.
I'm quite open to "this is a terrible idea, don't even go there" being the answer.. in which case I might look at another queuing strategy... (or heroku hire-fire perhaps..)
You can get delayed job to process only a certain number of jobs by doing:
Delayed::Worker.new.work_off(10)
You could fire a script to do that from cron or use "rails runner":
rails runner -e production 'Delayed::Worker.new.work_off(10)'
I guess the main issue on whether it is a good idea or not is working out what small value is actually high enough to make sure you process all your jobs in a reasonable time-frame. Also, you've got the overhead of firing up the rails environment every time you want to process, or even check whether you should process, any jobs. That might cause problems in a shared environment if they are particularly strict on spikes of memory or CPU usage.
Why not skip the 'workers' (which are just daemons which look for work else sleep) and have your cron fire a custom rake task of 10.times { MailerJob.first.perform }
You'd just need to require you're app in the line before that so its loaded ofc.

Best current rails background task method?

I am trying to find out the best way to run scripts in the background. I have been looking around and found plenty of options, but many/most seem to have become inactive in the past few years. Let me describe my needs.
The rails app is basically a front-end to configure when and how these scripts will be run. The scripts run and generate reports and send email alerts. So the user must be able to configure the start times and how often these scripts will run dynamically. The scripts themselves should have access to the rails environment in order to save the resulting reports in the DB.
Just trying to figure out the best method from the myriad of options.
I think you're looking for a background job queuing system.
For that, you're either looking for resque or delayed_job. Both support scheduling tasks at some point in the future -- delayed_job does this natively, whereas resque has a plugin for it called resque_scheduler.
You would enqueue jobs in the background with parameters that you specify, and then at the time you selected they'll be executed. You can set jobs to recur indefinitely or a fixed number of times (at least with resque-scheduler, not sure about delayed_job).
delayed_job is easier to set up since it saves everything in the database. resque is more robust but requires you to have redis in your stack -- but if you do already it's pretty much the ideal solution for your problem.
I recently learned about Sidekiq, and I think it is really great.
There's also a RailsCast about it - Sidekiq.
Take a look at the gem whenever at https://github.com/javan/whenever.
It allows you to schedule tasks like cron jobs.
Works very well under linux, and the last commit was 14 days ago. A friend of mine used it in a project and was pretty satisfied with it.
edit: take a look at the gem delayed_job as well, it is good for executing long tasks in the background. Useful when creating a cron job only to start other tasks.

How can I monitor recurrent rake tasks run by heroku scheduler?

I just got the last month heroku bill, and the scheduled rake tasks were a relatively heavy burden. We are pretty early in our development process, so we just developed some rake tasks to get the job done recently, and didn't had much concern in theirs optimization.
Now we want to improve theirs performance and theirs heroku processing hours usage. We use New Relic to monitor the webapp performance, but apparently this type of rake tasks are ignored by default, and it's unclear how to override that.
Anyone had a similiar problem? How can I track the scheduled tasks in close to real time to monitor performance, optimize, and don't get suprise bills?
Whilst you can't really monitor rake tasks that well, there are a few little things you can do. One is the use of logging. Output start and end times of tasks to logs, and you can then see what's been happening duration wise. If you couple this with something like the Papertrail add-on then you can do additional interrogation later on.
As for running the jobs themselves, there's a couple of ways that you can run background processes which are dependant on how they need to run:
If you're needing to run jobs on a schedule, there's a few options available. Firstly there's the Heroku scheduler, which is pretty good, but doesn't guarantee executions will happen. Normally you would use this to kick off a rake task which will bring up a one-off dyno for the duration of the task - therefore you need to ensure in development that these tasks are as efficient as possible.
Alternatively, if you're looking at jobs that need a little more control or using a clock process. Essentially this is a dyno running 24/7 that does nothing but kick off other jobs at preset intervals and times. This would normally be done using the clockwork gem. The downside of this approach is that you need to pay for a clock process all the time.
A third approach, and one that might work is delayed job, with it's runat option, allowing you to queue a job to be run in the future (and jobs can re-queue themselves). There are a few issues with this in that a failure can kill the whole chain, and you need a full time worker running to process them all.
Therefore, in order to minimize your bills, ensure that your rake tasks are as performant and reliable, and then choose the scheduling option that suits you. If you're looking at schedules plus user created events, delayed_job might be the best option. If you're looking at a few tasks running periodically, then go scheduler. If you're looking at running lots of time critical jobs on a regular basis, go with clockwork.
Either way, you should be able to constrain a fair amount of processing into just one or two processes depending on your approach.
I know this question is almost 10 years old, but there is a new way!
You can now monitor your Heroku Scheduler jobs using One-off Dyno Metrics. This Heroku add-on gathers metrics for all detached one-off dynos running in your Heroku app. It was created to be an extension of Heroku's Application Metrics and works out of the box.
when you are running on heroku cedar there is a way to get a free setup for your workers. this is no answer to your monitoring question, but it might be interesting anyways: http://blog.nofail.de/2011/07/heroku-cedar-background-jobs-for-free/
You can force the New Relic agent to start in your rake tasks and report their performance data.
Not the answer to the specific question,but...
One method of reducing overhead is using Unicorn server to get multiple workers working on one dyno. It depends on your set up, but most people who've taken the time to test it can comfortably get 3 - 4 worker processes running concurrently. It's a huge boost in clearing cues or tasks. Just be careful not to max out the allocated memory for the dyno.

Resources