Is it possible to redeploy a Heroku app without restarting some process types - ruby-on-rails

I'm running a Rails app on Heroku, and I have defined a custom process type to perform some really long-running jobs: a single job can easily take an hour or more. I know it would be better to split the work into smaller chunks, but that's quite problematic for this task.
The issue is that when I push a new version, Heroku restarts all the dynos (web, workers, long workers, everything). Is it possible to restart only some process types, e.g. only the web dynos?

No, that isn't possible. The easiest and most scalable way around this would be to split your long-running jobs into smaller chunks.
That way, you would have a lot of very small jobs being processed very quickly. When your app is restarted, the workers could be restarted safely, because no long-running job would be interrupted.
Alternatively, one-off dynos won't be restarted when your app is deployed.
Using the Heroku API, you can boot one-off dynos programmatically, so you could start a one-off dyno for each long-running job you need to process.
That job would be processed (for up to 24 hours, after which the dyno would be cycled), and you would be able to deploy your app without restarting it.
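As a rough sketch of the one-off dyno idea, the request to the Platform API's dyno-create endpoint could be built like this. The app name, command, and token are placeholders, and the helper names are made up for illustration; only the standard library is used, so treat it as a starting point rather than the official client.

```ruby
require "net/http"
require "json"
require "uri"

# Builds (but does not send) a Platform API request that starts a one-off dyno.
# app_name, command and token are placeholders supplied by the caller.
def build_one_off_dyno_request(app_name, command, token)
  uri = URI("https://api.heroku.com/apps/#{app_name}/dynos")
  request = Net::HTTP::Post.new(uri)
  request["Accept"] = "application/vnd.heroku+json; version=3"
  request["Authorization"] = "Bearer #{token}"
  request["Content-Type"] = "application/json"
  # attach: false asks for a detached dyno that keeps running on its own.
  request.body = JSON.generate(command: command, attach: false)
  request
end

# Sends the request and returns the parsed JSON description of the new dyno.
def start_one_off_dyno(app_name, command, token)
  request = build_one_off_dyno_request(app_name, command, token)
  uri = request.uri
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  raise "Heroku API error: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  JSON.parse(response.body)
end
```

You would call something like start_one_off_dyno("my-app", "rake jobs:long_running", ENV["HEROKU_API_TOKEN"]) from wherever the job is requested; the rake task name and the environment variable are assumptions, not something your app already has.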

Related

how to continuously deploy with long running jobs

We currently use delayed_job and rails to manage some long running jobs in our system. Some of these jobs take potentially hours to run, but we also like to deploy rather frequently, often many times a day. The problem with this setup is that we have to restart delayed_job during deployment to pick up code changes, so that any new jobs are processed with the latest code.
The solution we've arrived at is that for any job that needs to run for more than some small amount of time, we fork the delayed job so that it returns immediately, and the forked process handles the work. This way a deploy can restart all the delayed job processes, while the long-running 'job' keeps going until it's finished as an orphaned process.
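For illustration, the fork-and-return pattern described above might look roughly like the following. The run_detached helper is a hypothetical name, and the sketch leaves out things a Rails app would need in the child, such as re-establishing database connections. It also only helps where the child can outlive the worker, which is presumably not the case inside a Heroku dyno that is being shut down.

```ruby
# Forks a child process that runs the given block, and returns the child's
# pid immediately so the calling job can finish right away.
def run_detached
  pid = fork do
    # Start a new session so signals sent to the parent's process group
    # (for example during a worker restart) do not reach the child.
    Process.setsid
    status = 0
    begin
      yield
    rescue StandardError => e
      warn "detached job failed: #{e.message}"
      status = 1
    end
    # exit! skips at_exit handlers inherited from the parent process.
    exit!(status)
  end
  # Reap the child in a background thread so it never becomes a zombie.
  Process.detach(pid)
  pid
end
```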
We've looked at sidekiq, but it looks like we'd have the same issue there when trying to deploy new code.
Has anyone developed a solution they would recommend for dealing with long-running background processes that span multiple deployments?

Scaling Dyno worker size dynamically on Heroku Rails application

I am working on a project that launches a very resource-intensive process via a Rails worker, and it can only be handled properly by a Performance worker on Heroku. 1X workers are killed because they use too much RAM, and 2X workers can barely handle the load, exceeding their RAM limits by up to 160%. A Performance worker does the job fine with no issues.
My question is, is there a way to dynamically switch the Dyno size to Performance before a job initiates and then scale it back down once the job is finished or a queue is empty?
I know HireFire exists, but to my knowledge this service only increases the number of workers based on queue length etc. Another possible solution I thought about was using the Heroku API, which has a Dyno endpoint, to resize the worker dyno before the job starts and then resize it back down when the job ends.
Does anyone else have other recommendations, ideas or strategies for this issue?
Thanks!
The best way is the one you mentioned: use the Heroku Platform API to scale your Dyno size up before starting the job, and then down again afterwards.
This is because tools like HireFire only work by inspecting metrics such as application response time, router queue, etc., so there's no way for them to know you're about to run some job and then scale up just for that.
Depending on the specifics of the usage, you may be able to just create a distinct process type in your Procfile that only runs this particular worker and is always scaled to Performance, but isn't always running. You could even run this with one-off runs instead of scaling it (this can also be done via the API, roughly equivalent to heroku run ...). That said, @rdegges's answer should certainly work.
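A minimal sketch of the resize call, assuming the Platform API's formation-update endpoint and using only the standard library. The helper names, the app name, and the size strings are placeholders; check the exact size names for your account. As far as I know, resizing a process type restarts its dynos, so the call probably belongs in the code that enqueues the job rather than in the job itself.

```ruby
require "net/http"
require "json"
require "uri"

# Builds (but does not send) a request that changes the dyno size and
# quantity of one process type, e.g. "worker".
def build_resize_request(app_name, process_type, size, quantity, token)
  uri = URI("https://api.heroku.com/apps/#{app_name}/formation/#{process_type}")
  request = Net::HTTP::Patch.new(uri)
  request["Accept"] = "application/vnd.heroku+json; version=3"
  request["Authorization"] = "Bearer #{token}"
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(size: size, quantity: quantity)
  request
end

# Sends the request and returns the parsed JSON formation description.
def resize_dynos(app_name, process_type, size, quantity, token)
  request = build_resize_request(app_name, process_type, size, quantity, token)
  uri = request.uri
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  raise "Heroku API error: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  JSON.parse(response.body)
end
```

Scaling up before enqueueing would then be something like resize_dynos("my-app", "worker", "performance-m", 1, token), with a matching call to scale back down once the queue is empty.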

How long can a sucker_punch worker run for on heroku?

I have a sucker_punch worker that processes a CSV file. I initially had a problem with the CSV file disappearing when the dyno powered down; to fix that, I'm going to set up S3 for file storage.
But my current concern is whether a dyno powering down will stop my worker in its tracks.
How can I prevent that?
Since sucker_punch uses a separate thread on the same dyno and does not use an external queue or persistence (the way delayed_job, sidekiq, and resque do) you will be subject to losing the job when your dyno gets rebooted or stopped and you'll have no way to restart the job. On Heroku, dynos are rebooted at least once a day. If you need persistence and the ability to retry a job in the event a dyno goes down, I'd say switch to one of the other job libraries:
https://github.com/collectiveidea/delayed_job
https://github.com/mperham/sidekiq
https://github.com/resque/resque
However, these require using a Heroku add-on. You can get away with the free version, but you will still have to pay for the extra worker process. Other than that, you'd have to implement your own persistence and retrying by wrapping sucker_punch. Here's a discussion on adding those features to sucker_punch: https://github.com/brandonhilkert/sucker_punch/issues/21 They basically say to use Sidekiq instead.

custom clock process on heroku

I've just inherited a Rails project that previously ran on a typical *nix server. The decision was made to move it to Heroku for the client, and it's up to me to get the background processing working.
Currently it uses Whenever to schedule daily events (email etc.) and fire up the delayed job queue on boot.
Heroku provides example documentation for a custom clock process using Clockwork. Going by this example, can I use it with Whenever? Are there any pitfalls I might come across? Will I need to create a separate worker dyno?
Scheduled Jobs and Custom Clock Processes in Ruby with Clockwork
Yes -- Heroku's Cedar stack lets you run whatever you want.
The basic building block of the Cedar stack is the dyno. Each dyno gets an ephemeral copy of your application, 512 MB of RAM, and a bunch of shared CPU time. Web dynos are expected to bind an HTTP server to the port specified in the $PORT environment variable, since that's where Heroku will send HTTP requests, but other than that, web dynos are identical to other types of dynos.
Your application tells Heroku how to run its various components by defining them in the Procfile. (See Declaring and Scaling Process Types with Procfile.) The Clock Processes article demonstrates a pattern where you use a worker (i.e. non-web) dyno to enqueue work based on arbitrary criteria. Again, you can do whatever you want here -- just define it in a Procfile and Heroku will happily run it. If you go with a clock process (e.g. a 24x7 whenever), you'll be using a whole dyno ($0.05/hour) to do nothing but schedule work.
In your case, I'd consider switching from Whenever to Heroku Scheduler. Scheduler is basically a Heroku-run cron, where the crontab entries are "spin up a dyno and run this command". You'll still pay $0.05/hour for the extra dynos, but unlike the clock + worker setup, you'll only pay for the time they actually spend running. It cleanly separates periodic tasks from the steady-state web + worker traffic, and it's usually significantly cheaper too.
The only other word of warning is that running periodic tasks in distributed systems is complex and has complex failure modes. Some of the platform incidents (corresponding with the big EC2 outages) have resulted in things like 2 simultaneous clock processes and duplicate scheduler runs. If you're doing something that needs to run serially (like emailing people once a day), consider guarding it with RDBMS locking, and double-checking that it's actually been ~23 hours since your daily job.
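To make the guard idea concrete, here is a small sketch. It is plain Ruby so it can run anywhere: the in-memory Mutex and Hash are stand-ins for a database row lock and a "last run" column, and the DailyJobGuard class is a hypothetical name, not something from Heroku or Rails.

```ruby
# Runs a block at most once per ~23 hours for a given job name.
# `store` stands in for a database table of last-run timestamps, and
# `lock` stands in for an RDBMS lock taken around the check-and-update.
class DailyJobGuard
  MIN_INTERVAL = 23 * 60 * 60 # seconds

  def initialize(store:, lock: Mutex.new, clock: -> { Time.now })
    @store = store
    @lock = lock
    @clock = clock
  end

  # Returns true if the block ran, false if the job ran too recently.
  def run(job_name)
    @lock.synchronize do
      last_run = @store[job_name]
      now = @clock.call
      return false if last_run && (now - last_run) < MIN_INTERVAL
      # Record the run before doing the work, so a duplicate process that
      # arrives a moment later sees the fresh timestamp and skips.
      @store[job_name] = now
    end
    yield
    true
  end
end
```

Recording the timestamp before the work starts favours "never send twice" over "always send once"; if a failed run must be retried, the timestamp would need to be rolled back on error.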
Heroku Scheduler is often a bad option for production use because it's unreliable and will skip running its tasks sometimes.
The good news is that if you run a job queue dyno with Sidekiq, there are scheduling plugins for it, e.g. sidekiq-cron. With that you can use the same dyno for scheduling. If you don't have a job worker yet, you will need to set one up just for scheduling if the schedule has to run reliably.
P.S. If you happen to run Delayed::Job for job queuing, there are scheduling plugins for it too, e.g. this one.

How can I monitor recurrent rake tasks run by heroku scheduler?

I just got last month's Heroku bill, and the scheduled rake tasks were a relatively heavy burden. We are pretty early in our development process, so we developed some rake tasks quickly to get the job done and didn't pay much attention to optimizing them.
Now we want to improve their performance and their Heroku processing-hours usage. We use New Relic to monitor the web app's performance, but apparently this type of rake task is ignored by default, and it's unclear how to override that.
Has anyone had a similar problem? How can I track the scheduled tasks in close to real time to monitor performance, optimize, and avoid surprise bills?
Whilst you can't really monitor rake tasks that well, there are a few little things you can do. One is the use of logging. Output start and end times of tasks to logs, and you can then see what's been happening duration wise. If you couple this with something like the Papertrail add-on then you can do additional interrogation later on.
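The logging suggestion can be as simple as a wrapper around each task body. A minimal sketch, where log_duration and the key=value log format are my own choices rather than anything Heroku or Papertrail requires:

```ruby
require "logger"

# Logs a start line and a finish line (with elapsed seconds) around a block,
# and returns whatever the block returns.
def log_duration(task_name, logger)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  logger.info("task=#{task_name} event=start")
  yield
ensure
  # Runs even if the block raises, so failed runs are timed as well.
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  logger.info("task=#{task_name} event=finish duration_s=#{elapsed.round(3)}")
end
```

Inside a rake task you would wrap the existing body, e.g. log_duration("nightly_sync", Logger.new($stdout)) { ... }, and then search the log add-on for event=finish lines to compare durations over time. The task name here is only an example.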
As for running the jobs themselves, there are a couple of ways to run background processes, depending on how they need to run:
If you need to run jobs on a schedule, there are a few options available. First, there's Heroku Scheduler, which is pretty good but doesn't guarantee that executions will happen. Normally you would use this to kick off a rake task, which brings up a one-off dyno for the duration of the task, so you need to ensure in development that these tasks are as efficient as possible.
Alternatively, if your jobs need a little more control, you can use a clock process. Essentially this is a dyno running 24/7 that does nothing but kick off other jobs at preset intervals and times. This would normally be done using the clockwork gem. The downside of this approach is that you pay for the clock process all the time.
A third approach, and one that might work, is delayed_job with its run_at option, which lets you queue a job to be run in the future (and jobs can re-queue themselves). There are a few issues with this: a failure can kill the whole chain, and you need a full-time worker running to process them all.
Therefore, in order to minimize your bills, ensure that your rake tasks are as performant and reliable as possible, and then choose the scheduling option that suits you. If you're looking at schedules plus user-created events, delayed_job might be the best option. If you're looking at a few tasks running periodically, then go with Scheduler. If you're looking at running lots of time-critical jobs on a regular basis, go with clockwork.
Either way, you should be able to constrain a fair amount of processing into just one or two processes depending on your approach.
I know this question is almost 10 years old, but there is a new way!
You can now monitor your Heroku Scheduler jobs using One-off Dyno Metrics. This Heroku add-on gathers metrics for all detached one-off dynos running in your Heroku app. It was created to be an extension of Heroku's Application Metrics and works out of the box.
When you are running on Heroku Cedar, there is a way to get a free setup for your workers. This is no answer to your monitoring question, but it might be interesting anyway: http://blog.nofail.de/2011/07/heroku-cedar-background-jobs-for-free/
You can force the New Relic agent to start in your rake tasks and report their performance data.
Not the answer to the specific question, but...
One method of reducing overhead is using the Unicorn server to get multiple workers working on one dyno. It depends on your setup, but most people who've taken the time to test it can comfortably get 3-4 worker processes running concurrently. It's a huge boost in clearing queues or tasks. Just be careful not to max out the allocated memory for the dyno.
