How can I monitor recurrent rake tasks run by heroku scheduler? - ruby-on-rails

I just got last month's Heroku bill, and the scheduled rake tasks were a relatively heavy burden. We are pretty early in our development process, so we recently wrote some rake tasks just to get the job done, without much concern for their optimization.
Now we want to improve their performance and their Heroku processing-hours usage. We use New Relic to monitor the webapp's performance, but apparently this type of rake task is ignored by default, and it's unclear how to override that.
Has anyone had a similar problem? How can I track the scheduled tasks in close to real time to monitor performance, optimize, and avoid surprise bills?

Whilst you can't really monitor rake tasks that well, there are a few little things you can do. One is the use of logging. Output start and end times of tasks to logs, and you can then see what's been happening duration wise. If you couple this with something like the Papertrail add-on then you can do additional interrogation later on.
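As a rough illustration of the logging idea above, here is a minimal sketch of a timing wrapper. The helper name with_timing and the log format are assumptions made for this example, not part of rake or Heroku.

```ruby
require "logger"

# Hypothetical helper: logs when a task starts and finishes, plus its duration.
# Returns whatever the block returns.
def with_timing(task_name, logger: Logger.new($stdout))
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  logger.info("task=#{task_name} event=start")
  yield
ensure
  # Runs even if the block raised, so failed runs are timed as well.
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  logger.info(format("task=%s event=finish duration=%.2fs", task_name, elapsed))
end

# Possible use inside a rake task (task and method names are illustrative):
#   task send_reminders: :environment do
#     with_timing("send_reminders") { Reminder.deliver_due }
#   end
```

Because the lines use a key=value shape, a log add-on such as Papertrail could later be searched for event=finish to compare durations across runs.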
As for running the jobs themselves, there are a couple of ways to run background processes, depending on how they need to run:
If you're needing to run jobs on a schedule, there's a few options available. Firstly there's the Heroku scheduler, which is pretty good, but doesn't guarantee executions will happen. Normally you would use this to kick off a rake task which will bring up a one-off dyno for the duration of the task - therefore you need to ensure in development that these tasks are as efficient as possible.
Alternatively, if you're looking at jobs that need a little more control, consider using a clock process. Essentially this is a dyno running 24/7 that does nothing but kick off other jobs at preset intervals and times. This would normally be done using the clockwork gem. The downside of this approach is that you need to pay for a clock process all the time.
A third approach, and one that might work, is delayed_job with its run_at option, which lets you queue a job to be run in the future (and jobs can re-queue themselves). There are a few issues with this: a failure can kill the whole chain, and you need a full-time worker running to process them all.
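To make the self-re-queuing idea more concrete, here is a minimal sketch of a job that schedules its own next run. The class name RecurringJob, the INTERVAL value, and the work and queue methods are assumptions for illustration; the only delayed_job call relied on is Delayed::Job.enqueue with a run_at option. The error is rescued and logged so that a single failure does not end the chain, although a failure of the enqueue call itself still would.

```ruby
# Sketch of a self-rescheduling job in the style of a delayed_job custom job
# (an object that responds to #perform).
class RecurringJob
  INTERVAL = 3600 # seconds between runs; an assumed value

  def perform
    work
  rescue StandardError => e
    # Swallow the error so the queue does not also retry this run.
    warn "RecurringJob failed: #{e.class}: #{e.message}"
  ensure
    # Queue the next run whether or not this one succeeded.
    queue.enqueue(self.class.new, run_at: Time.now + INTERVAL)
  end

  # Placeholder for the real task body; override in a subclass.
  def work; end

  # Resolved only at run time, when delayed_job is loaded.
  def queue
    Delayed::Job
  end
end
```

Keeping the queue behind a method makes it possible to exercise the re-queue logic without a database by overriding it in a subclass.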
Therefore, in order to minimize your bills, ensure that your rake tasks are as performant and reliable as possible, and then choose the scheduling option that suits you. If you're looking at schedules plus user-created events, delayed_job might be the best option. If you're looking at a few tasks running periodically, then go with Scheduler. If you're looking at running lots of time-critical jobs on a regular basis, go with clockwork.
Either way, you should be able to constrain a fair amount of processing into just one or two processes depending on your approach.

I know this question is almost 10 years old, but there is a new way!
You can now monitor your Heroku Scheduler jobs using One-off Dyno Metrics. This Heroku add-on gathers metrics for all detached one-off dynos running in your Heroku app. It was created to be an extension of Heroku's Application Metrics and works out of the box.

When you are running on Heroku Cedar there is a way to get a free setup for your workers. This is no answer to your monitoring question, but it might be interesting anyway: http://blog.nofail.de/2011/07/heroku-cedar-background-jobs-for-free/

You can force the New Relic agent to start in your rake tasks and report their performance data.
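A minimal sketch of what that might look like, assuming the newrelic_rpm gem is in the bundle. The wrapper name with_new_relic is made up for this example; NewRelic::Agent.manual_start and NewRelic::Agent.shutdown are the agent calls it relies on. What actually gets reported still depends on the agent's configuration and instrumentation.

```ruby
# Load the agent if it is available; without it the task still runs, unmonitored.
begin
  require "newrelic_rpm"
rescue LoadError
  # Agent gem not installed (for example in development).
end

# Hypothetical wrapper: starts the agent for the duration of the block.
def with_new_relic
  return yield unless defined?(NewRelic::Agent)

  NewRelic::Agent.manual_start
  begin
    yield
  ensure
    # Flush collected data before the short-lived one-off dyno exits.
    NewRelic::Agent.shutdown
  end
end

# Possible use inside a rake task (names are illustrative):
#   task nightly_cleanup: :environment do
#     with_new_relic { Cleanup.run }
#   end
```

The explicit shutdown matters for scheduler jobs in particular, since the process may exit before the agent's regular reporting interval comes around.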

Not the answer to the specific question, but...
One method of reducing overhead is using the Unicorn server to get multiple workers running on one dyno. It depends on your setup, but most people who've taken the time to test it can comfortably get 3 to 4 worker processes running concurrently. It's a huge boost in clearing queues or tasks. Just be careful not to max out the allocated memory for the dyno.

Related

Is it possible to redeploy a Heroku app without restarting some process types

I'm running a Rails app on Heroku, and I have defined a custom process type to perform some long-running jobs, really long-running, a job can easily take something about an hour or more. I know it's better to split it into some small chunks, but that's quite problematic for that task.
And the issue is that when I push a new version — Heroku restarts all the dynos (web, workers, long workers — everything). I wonder is it possible to restart only some process types, e.g. only the web dynos?
No, that isn't possible. The easiest and most scalable way around this would be to split your long-running jobs into smaller chunks.
That way, you would have a lot of very small jobs being processed very quickly. When your app is restarted, you would be able to restart your process, as it wouldn't stop a long-running job.
Alternatively, one-off dynos won't be restarted when your app is deployed.
Using the heroku api, you can programmatically boot one-off dynos. Using that, you could start a one-off dyno for each long-running job you need to process.
That job would be processed (for up to 24 hours, where it would be cycled), and you would be able to deploy your app without restarting it.
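For illustration, a rough sketch of building the API request that starts a one-off dyno, assuming the Heroku Platform API (version 3) dyno-create endpoint. The function name and its arguments are hypothetical, and the request is only constructed here, not sent.

```ruby
require "net/http"
require "json"
require "uri"

# Builds (but does not send) a Platform API request that starts a one-off dyno
# running `command` in the app `app_name`.
def one_off_dyno_request(app_name, command, api_token)
  uri = URI("https://api.heroku.com/apps/#{app_name}/dynos")
  request = Net::HTTP::Post.new(uri)
  request["Accept"] = "application/vnd.heroku+json; version=3"
  request["Authorization"] = "Bearer #{api_token}"
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(command: command)
  request
end

# Sending is left to the caller, for example:
#   req = one_off_dyno_request("my-app", "rake jobs:long_import", ENV["HEROKU_API_TOKEN"])
#   Net::HTTP.start(req.uri.hostname, req.uri.port, use_ssl: true) { |http| http.request(req) }
```

The app name, rake task, and environment variable in the usage comment are placeholders; a job queue callback could call something like this once per long-running job.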

custom clock process on heroku

I've just inherited a Rails project that previously ran on a typical 'nix server. The decision was made to move it to Heroku for the client, and it's up to me to get the background processing working.
Currently it uses Whenever to schedule daily events (email etc.) and fire up the delayed job queue on boot.
Heroku provides example documentation for a custom clock process using clockwork. Going by this example, can I use it with Whenever? Any pitfalls I might come across? Will I need to create a separate worker dyno?
Scheduled Jobs and Custom Clock Processes in Ruby with Clockwork
Yes -- Heroku's Cedar stack lets you run whatever you want.
The basic building block of the Cedar stack is the dyno. Each dyno gets an ephemeral copy of your application, 512 MB of RAM, and a bunch of shared CPU time. Web dynos are expected to bind an HTTP server to the port specified in the $PORT environment variable, since that's where Heroku will send HTTP requests, but other than that, web dynos are identical to other types of dynos.
Your application tells Heroku how to run its various components by defining them in the Procfile. (See Declaring and Scaling Process Types with Procfile.) The Clock Processes article demonstrates a pattern where you use a worker (i.e. non-web) dyno to enqueue work based on arbitrary criteria. Again, you can do whatever you want here -- just define it in a Procfile and Heroku will happily run it. If you go with a clock process (e.g. a 24x7 whenever), you'll be using a whole dyno ($0.05/hour) to do nothing but schedule work.
In your case, I'd consider switching from Whenever to Heroku Scheduler. Scheduler is basically a Heroku-run cron, where the crontab entries are "spin up a dyno and run this command". You'll still pay $0.05/hour for the extra dynos, but unlike the clock + worker setup, you'll only pay for the time they actually spend running. It cleanly separates periodic tasks from the steady-state web + worker traffic, and it's usually significantly cheaper too.
The only other word of warning is that running periodic tasks in distributed systems is complex and has complex failure modes. Some of the platform incidents (corresponding with the big EC2 outages) have resulted in things like 2 simultaneous clock processes and duplicate scheduler runs. If you're doing something that needs to run serially (like emailing people once a day), consider guarding it with RDBMS locking, and double-checking that it's actually been ~23 hours since your daily job.
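The "has it really been about 23 hours" check can be sketched as a small pure function. The names below are assumptions for illustration; in a Rails app the check and the update of the stored timestamp would need to happen while holding a database lock (for example inside ActiveRecord's with_lock on a row that records the last run), so that two simultaneous runs cannot both pass.

```ruby
# Minimum gap between daily runs, in seconds (~23 hours, per the advice above).
MIN_GAP_SECONDS = 23 * 60 * 60

# Returns true if the daily job has never run or last ran long enough ago.
def daily_job_due?(last_run_at, now = Time.now)
  last_run_at.nil? || (now - last_run_at) >= MIN_GAP_SECONDS
end

# Illustrative use with a hypothetical JobRun record:
#   run = JobRun.find_by!(name: "daily_email")
#   run.with_lock do
#     if daily_job_due?(run.last_run_at)
#       send_daily_emails
#       run.update!(last_run_at: Time.now)
#     end
#   end
```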
Heroku Scheduler is often a bad option for production use because it's unreliable and will skip running its tasks sometimes.
The good news is that if you run a jobs queue dyno with Sidekiq, there are scheduling plugins for it, e.g. sidekiq-cron. With that you can use the same dyno for scheduling. If you don't have a jobs worker yet, you will need to set one up just for scheduling if you need tasks to run reliably.
P.S. If you happen to run Delayed::Job for job queuing, there are scheduling plugins for it, too, e.g. this one.

Rails: Delayed_job for queuing but running jobs through cron

Ok, so this is probably evil, however.. here's the question! I want to run a pretty lightweight app on a shared environment (site5). Ideally I would like to use delayed_job for the ease of queueing the mails (~200+ every so often). However, being a shared environment they don't want background processes running all the time (fair enough).
So, my plan, such as it is, is to queue the mails using delayed job, and then every hour or something, spin up a cron job, send a few emails (10 or something small) and then kill the process. And repeat.
Q) Is there a rake jobs:works:1 equivalent task it'd be easy to setup? - pointer would be handy.
I'm quite open to "this is a terrible idea, don't even go there" being the answer.. in which case I might look at another queuing strategy... (or heroku hire-fire perhaps..)
You can get delayed job to process only a certain number of jobs by doing:
Delayed::Worker.new.work_off(10)
You could fire a script to do that from cron or use "rails runner":
rails runner -e production 'Delayed::Worker.new.work_off(10)'
I guess the main issue on whether it is a good idea or not is working out what small value is actually high enough to make sure you process all your jobs in a reasonable time-frame. Also, you've got the overhead of firing up the rails environment every time you want to process, or even check whether you should process, any jobs. That might cause problems in a shared environment if they are particularly strict on spikes of memory or CPU usage.
Why not skip the 'workers' (which are just daemons which look for work else sleep) and have your cron fire a custom rake task of 10.times { MailerJob.first.perform }
You'd just need to require your app in the line before that so it's loaded, of course.

Using "rails runner" for cron jobs is very CPU intensive - alternatives?

I'm currently using cron and "rails runner" to execute background jobs. For the most part these jobs are simple polls "Find the records that are due to receive a reminder email. Send that email."
I've been watching my Amazon EC2 Small instance, and noticed that each time one of these cron job kicks in, the CPU spikes to ~99%. The teeny tiny little query inside my current job is definitely not responsible. I'm presuming that the spike is simply due to the effort of loading the full rails environment via "rails runner".
Is there a more CPU efficient way to handle regularly scheduled batch jobs?
P.S. I know that in the particular example of sending a reminder email at time X in the future, I could use delayed_jobs, and simply schedule the job in the future. Not every possible task fits into the delayed_jobs framework very well though, so I'm looking for a more traditional "cron job" type solution. Like "rails runner", but without the crazy CPU consequences.
You can use workers which don't load the Rails env, or which load it only once (like Resque).
I don't think there is a solution for this, since you do need to load a Rails environment to handle whatever that is you are handling. So when on the "cron" model you will be starting up a handler which in turn will create some load on your instance. I don't know how cloud services lend themselves to this, but I think the optimal model in your case would be to have a running daemon for job handling and forking coupled with REE for the job execution (that helps prevent memory leaks by letting as much as possible happen in the child process that will die at the end of the execution loop).
The daemon could be configured to accept signals (also via a job queue) that would spin off jobs doing specific things.

Running Jobs when DB is free on Ruby on Rails Heroku

I have a ruby on rails app that uses Heroku. I have the need to run things like import/export tasks on our db that lock up the whole system since they are so heavy on the DB. Is there a way to tell the system to only run these tasks when the database is not being used at that second?
There is no built-in way to schedule a job like this. There are a few things you can do, though.
Schedule the jobs to run during the least busy hours of the day. That will depend on your business, customer base and so on, but hopefully there is a window that is more suitable than others.
You could write your batch job to run for a longer time, doing small units of work. Between each unit of work, sleep for a few seconds, or take a look at the current load average and decide what to do based on that. This should lower the impact of the batch jobs.
Have the website update a "lock" somewhere, either in the database or in a memcached or something. If your normal website usage updates the database, you could look at the existing updated_at. Then only do batch work when there hasn't been any activity for a while. This doesn't guarantee that a new user won't pop in at the same time your batch job runs, of course, but could be a way to find a window where the site is less used.
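The "small units of work with pauses" suggestion could look roughly like the sketch below. The function name, the default batch size and pause, and the busy check are all assumptions; busy stands in for whatever signal you choose (load average, recent updated_at activity, a lock in memcached).

```ruby
# Processes items in small batches, pausing between batches and backing off
# while the caller-supplied `busy` check reports that the site is in use.
# Returns the number of items processed.
def run_in_batches(items, batch_size: 100, pause: 2,
                   busy: -> { false }, sleeper: ->(seconds) { sleep(seconds) })
  processed = 0
  items.each_slice(batch_size) do |batch|
    # Wait for a quiet moment before touching the database again.
    sleeper.call(pause) while busy.call
    batch.each { |item| yield item }
    processed += batch.size
    # Give normal traffic some room between units of work.
    sleeper.call(pause)
  end
  processed
end

# Illustrative use (the record ids and import method are hypothetical):
#   run_in_batches(record_ids, batch_size: 50) { |id| import_record(id) }
```

This trades total run time for lower peak load, so it fits best with jobs that do not have to finish quickly.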
Have you looked into using Background Jobs / Workers on Heroku? It's also worth reading about Heroku's Delayed Job queuing system
