It seems the task is simple and straightforward: I need to limit the number of jobs that can run at the same time, so my server won't blow up. But Google is silent; perhaps I'm doing something wrong? Enlighten me, please.
I use the standard async adapter.
It's not recommended to use the default Rails async adapter in production, especially on Heroku, where dynos restart at least once per day.
For enqueuing and executing jobs in production you need to set up a queuing backend, that is to say you need to decide on a 3rd-party queuing library that Rails should use. Rails itself only provides an in-process queuing system, which only keeps the jobs in RAM. If the process crashes or the machine is reset, then all outstanding jobs are lost with the default async backend. This may be fine for smaller apps or non-critical jobs, but most production apps will need to pick a persistent backend.
There are plenty of supported adapters to choose from, such as:
Sidekiq
Resque
Delayed Job
They're easy to get started with; each provides clear instructions and examples.
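For example, once you've picked one and added its gem, pointing Active Job at it is a one-line change (shown here for Sidekiq; use the symbol for whichever adapter you choose):

    # config/application.rb
    # Route Active Job through Sidekiq instead of the in-process async adapter.
    config.active_job.queue_adapter = :sidekiq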
If you would like to use the default async adapter in development and limit the maximum number of jobs executed in parallel, you can add the following line to your config/application.rb:
    config.active_job.queue_adapter = ActiveJob::QueueAdapters::AsyncAdapter.new min_threads: 1, max_threads: 2, idletime: 600.seconds
So in this case at most two jobs will run at the same time. The default maximum is twice the number of processors.
I have a case where I need to limit it to one and that works just fine (Rails 7).
Source: https://api.rubyonrails.org/classes/ActiveJob/QueueAdapters/AsyncAdapter.html
Related
I am looking for a good background job processor with the following abilities:
Works well with MySQL
Can have priorities
Can easily schedule anything in the background (not just emails)
Can re-run a job after completion (a callback would be good; I have a few tasks/jobs that need to keep running every minute), or even a repetitive scheduler would work
Should not eat up a lot of memory (I've had this experience with DJ)
A few options I am looking into: Resque, DJ, Beanstalkd (haven't explored them completely).
My production environment is on Amazon EC2 (if this helps pick a better solution).
Please suggest a good option. Is there something else apart from these that people use nowadays?
I'd heartily recommend Sidekiq - it's extremely flexible and it uses far fewer resources than Resque or DelayedJob.
It does require Redis (like Resque), but Redis is a valuable addition to any Rails project since it can be reused as a session store and cache. Our primary db is MySQL and we deploy to EC2 :-) We've used Delayed Job and Resque in the past, but found them problematic and heavy on resources. Sidekiq uses threads, and a single Sidekiq worker is as efficient as several DJ/Resque workers. Here's an interesting part of the project's README that I can corroborate:
You'll find that you might need 50 200MB resque processes to peg your CPU whereas one 300MB Sidekiq process will peg the same CPU and perform the same amount of work. Please see my blog post on Resque's memory efficiency and how I was able to shrink a Carbon Five client's resque processing farm from 9 machines to 1 machine.
To sum it all up:
It works fine with MySQL - or rather, Sidekiq doesn't touch MySQL at all (its own data lives in Redis), so there's nothing for it to conflict with
You can have priorities by setting up different processing queues (see the sketch after this list)
You can easily schedule anything (and there is special DJ-style support for e-mails in particular)
Not quite sure about that one; we use whenever + cron for repetitive jobs
You're gonna love Sidekiq's small memory footprint
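To illustrate the queue-based priorities mentioned above, a rough sketch (the worker class and queue names are made up):

    # app/workers/report_worker.rb
    class ReportWorker
      include Sidekiq::Worker
      # Send this worker's jobs to a dedicated low-priority queue.
      sidekiq_options queue: :low

      def perform(report_id)
        # ... generate the report ...
      end
    end

Then start Sidekiq with weighted queues, so 'critical' is checked more often than 'low':

    bundle exec sidekiq -q critical,3 -q default,2 -q low,1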
I've just inherited a Rails project that was previously on a typical 'nix server. The decision was made to move it to Heroku for the client, and it's up to me to get the background processes working.
Currently it uses Whenever to schedule daily events (email etc.) and to fire up the delayed job queue on boot.
Heroku's documentation provides an example of a custom clock process using Clockwork. Going by this example, can I use it with Whenever? Any pitfalls I might come across? Will I need to create a separate worker dyno?
Scheduled Jobs and Custom Clock Processes in Ruby with Clockwork
Yes -- Heroku's Cedar stack lets you run whatever you want.
The basic building block of the Cedar stack is the dyno. Each dyno gets an ephemeral copy of your application, 512 MB of RAM, and a bunch of shared CPU time. Web dynos are expected to bind an HTTP server to the port specified in the $PORT environment variable, since that's where Heroku will send HTTP requests, but other than that, web dynos are identical to other types of dynos.
Your application tells Heroku how to run its various components by defining them in the Procfile. (See Declaring and Scaling Process Types with Procfile.) The Clock Processes article demonstrates a pattern where you use a worker (i.e. non-web) dyno to enqueue work based on arbitrary criteria. Again, you can do whatever you want here -- just define it in a Procfile and Heroku will happily run it. If you go with a clock process (e.g. a 24x7 whenever), you'll be using a whole dyno ($0.05/hour) to do nothing but schedule work.
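For reference, the clock-plus-worker pattern from that article boils down to something like this (the file layout and job names are illustrative):

    # Procfile
    web: bundle exec rails server -p $PORT
    worker: bundle exec rake jobs:work
    clock: bundle exec clockwork clock.rb

    # clock.rb
    require './config/boot'
    require './config/environment'  # load the Rails app so jobs can be enqueued
    require 'clockwork'

    module Clockwork
      # The clock dyno only enqueues; the worker dyno does the heavy lifting.
      every(1.day, 'daily_digest', at: '04:00') do
        Delayed::Job.enqueue DailyDigestJob.new
      end
    end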
In your case, I'd consider switching from Whenever to Heroku Scheduler. Scheduler is basically a Heroku-run cron, where the crontab entries are "spin up a dyno and run this command". You'll still pay $0.05/hour for the extra dynos, but unlike the clock + worker setup, you'll only pay for the time they actually spend running. It cleanly separates periodic tasks from the steady-state web + worker traffic, and it's usually significantly cheaper too.
The only other word of warning is that running periodic tasks in distributed systems is complex and has complex failure modes. Some of the platform incidents (coinciding with the big EC2 outages) have resulted in things like 2 simultaneous clock processes and duplicate scheduler runs. If you're doing something that needs to run serially (like emailing people once a day), consider guarding it with RDBMS locking, and double-checking that it's actually been ~23 hours since your daily job last ran.
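A rough sketch of such a guard, assuming a small bookkeeping table you'd add yourself (JobRun here is a hypothetical model with name and last_run_at columns):

    JobRun.transaction do
      # Row-level lock so two clock/scheduler processes can't both pass the check.
      run = JobRun.lock.find_by!(name: 'daily_email')
      if run.last_run_at.nil? || run.last_run_at < 23.hours.ago
        run.update!(last_run_at: Time.current)
        DailyMailer.send_all  # hypothetical daily task
      end
    end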
Heroku Scheduler is often a bad option for production use because it's unreliable and will skip running its tasks sometimes.
The good news is that if you run a job-queue dyno with Sidekiq, there are scheduling plugins for it, e.g. sidekiq-cron. With that you can use the same dyno for scheduling. And if you don't have a job worker yet, you'll need to set one up just for scheduling if you need it to run reliably.
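With sidekiq-cron, a schedule entry looks roughly like this (the worker class name is illustrative):

    # config/initializers/sidekiq_cron.rb
    # Enqueue NightlyReportWorker every day at 02:00 on the same
    # dyno that already processes the Sidekiq queues.
    Sidekiq::Cron::Job.create(
      name:  'nightly_report',
      cron:  '0 2 * * *',
      class: 'NightlyReportWorker'
    )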
P.S. If you happen to run Delayed::Job for job queuing, there are scheduling plugins for it too, e.g. this one.
I just got last month's Heroku bill, and the scheduled rake tasks were a relatively heavy burden. We are pretty early in our development process, so we recently wrote some rake tasks just to get the job done, without much concern for their optimization.
Now we want to improve their performance and their Heroku processing-hours usage. We use New Relic to monitor the web app's performance, but apparently this type of rake task is ignored by default, and it's unclear how to override that.
Has anyone had a similar problem? How can I track the scheduled tasks in close to real time to monitor performance, optimize, and avoid surprise bills?
Whilst you can't really monitor rake tasks that well, there are a few little things you can do. One is logging: output the start and end times of tasks to the logs, and you can then see what's been happening duration-wise. If you couple this with something like the Papertrail add-on, you can do additional interrogation later on.
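A minimal sketch of that logging approach:

    # lib/tasks/reports.rake
    task generate_reports: :environment do
      started = Time.current
      Rails.logger.info "generate_reports: started at #{started}"
      # ... the actual work ...
      Rails.logger.info "generate_reports: finished in #{(Time.current - started).round(1)}s"
    end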
As for running the jobs themselves, there are a couple of ways you can run background processes, depending on how they need to run:
If you need to run jobs on a schedule, there are a few options available. Firstly there's the Heroku Scheduler, which is pretty good but doesn't guarantee executions will happen. Normally you would use this to kick off a rake task that brings up a one-off dyno for the duration of the task - therefore you need to ensure in development that these tasks are as efficient as possible.
Alternatively, if your jobs need a little more control, you can use a clock process. Essentially this is a dyno running 24/7 that does nothing but kick off other jobs at preset intervals and times. This would normally be done using the clockwork gem. The downside of this approach is that you pay for the clock process all the time.
A third approach, and one that might work, is delayed job with its run_at option, allowing you to queue a job to be run in the future (and jobs can re-queue themselves). There are a few issues with this, in that a failure can kill the whole chain, and you need a full-time worker running to process them all.
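Roughly, that self-re-queuing pattern looks like this (the job class is illustrative):

    # A Delayed Job payload that re-queues itself after each successful run.
    class HourlySyncJob
      def perform
        # ... do the work ...
      end

      # Delayed Job invokes this hook after a successful run.
      def success(job)
        Delayed::Job.enqueue HourlySyncJob.new, run_at: 1.hour.from_now
      end
    end

    # Kick off the chain:
    Delayed::Job.enqueue HourlySyncJob.new, run_at: 10.minutes.from_now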
Therefore, in order to minimize your bills, ensure that your rake tasks are as performant and reliable as possible, and then choose the scheduling option that suits you. If you're looking at schedules plus user-created events, delayed_job might be the best option. If you're looking at a few tasks running periodically, go with Scheduler. If you're looking at running lots of time-critical jobs on a regular basis, go with clockwork.
Either way, you should be able to constrain a fair amount of processing into just one or two processes depending on your approach.
I know this question is almost 10 years old, but there is a new way!
You can now monitor your Heroku Scheduler jobs using One-off Dyno Metrics. This Heroku add-on gathers metrics for all detached one-off dynos running in your Heroku app. It was created to be an extension of Heroku's Application Metrics and works out of the box.
When you are running on Heroku Cedar, there is a way to get a free setup for your workers. This is not an answer to your monitoring question, but it might be interesting anyway: http://blog.nofail.de/2011/07/heroku-cedar-background-jobs-for-free/
You can force the New Relic agent to start in your rake tasks and report their performance data.
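For example (a sketch; the exact calls vary by agent version, so check your agent's docs):

    # lib/tasks/import.rake
    task nightly_import: :environment do
      NewRelic::Agent.manual_start  # rake tasks don't start the agent by default
      begin
        # ... the task's work ...
      ensure
        NewRelic::Agent.shutdown    # flush the collected data before the one-off dyno exits
      end
    end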
Not the answer to the specific question, but...
One method of reducing overhead is using the Unicorn server to get multiple workers running on one dyno. It depends on your setup, but most people who've taken the time to test it can comfortably get 3-4 worker processes running concurrently. It's a huge boost in clearing queues or tasks. Just be careful not to max out the dyno's allocated memory.
I'm having problems understanding this article: http://blog.darkhax.com/2010/07/30/auto-scale-your-resque-workers-on-heroku
I don't quite get why I need Redis + Resque when I have delayed jobs provided by Heroku.
From my understanding, I still have to pay for the workers, correct? What's the main advantage of using that solution?
Regards.
If you don't know why you need Resque, then you don't need it ;)
Resque is for high scalability. delayed_job is fine for smaller-scale stuff, but once you get to the size of, say, GitHub, you will need something like Resque. If delayed_job works for you, then stay with it. You don't need to worry about replacing it until your background job queue gets to around 30,000 jobs or so.
To autoscale Heroku workers using delayed job, you can hook into the enqueue and after hooks and use the Heroku API to query/update the number of workers.
For the most basic implementation: on enqueue, check whether any workers are running and, if not, add one. In after, check whether any other delayed jobs remain and, if not, scale the workers down to 0.
You can obviously make this more sophisticated in the way that you scale.
Here is a basic implementation: https://github.com/phaza/Heroku-Delayed-Job-Autoscale
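In outline, the hooks look something like this (a sketch using the platform-api gem; real code would need to handle races between concurrent jobs):

    require 'platform-api'

    # A Delayed Job payload that scales the Heroku worker formation
    # up when work arrives and back down when the queue drains.
    class AutoscaledJob
      HEROKU = PlatformAPI.connect_oauth(ENV['HEROKU_OAUTH_TOKEN'])
      APP    = ENV['HEROKU_APP_NAME']

      def enqueue(job)
        # Make sure at least one worker dyno is running.
        info = HEROKU.formation.info(APP, 'worker')
        HEROKU.formation.update(APP, 'worker', quantity: 1) if info['quantity'].zero?
      end

      def after(job)
        # This job is still counted, hence <= 1.
        HEROKU.formation.update(APP, 'worker', quantity: 0) if Delayed::Job.count <= 1
      end

      def perform
        # ... the actual work ...
      end
    end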
hirefireapp is a new-ish, simple drop-in solution for auto-scaling workers.
It spawns workers for you based on queue size (configurable) and then "fires" them when they are no longer necessary. You pay for the dyno time (to the nearest second) and for the hirefireapp service. In theory you could roll your own using the open source hirefire gem too.
It also handles scaling the web side if you choose, so you can spawn more web dynos based on current latency.
You can also use Hirefireapp.com to monitor and scale your apps
I have a Ruby on Rails app that uses Heroku. I need to run things like import/export tasks on our DB that lock up the whole system, since they are so heavy on the DB. Is there a way to tell the system to only run these tasks when the database is not being used at that second?
There is no built-in way to schedule a job like this. There are a few things you can do, though.
Schedule the jobs to run during the least busy hours of the day. That will depend on your business, customer base and so on, but hopefully there is a window that is more suitable than others.
You could write your batch job to run for a longer time, doing small units of work. Between each unit of work, sleep for a few seconds, or take a look at the current load average and decide what to do based on that. This should lower the impact of the batch jobs.
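For example, with Active Record's batch API plus a pause (the model and helper names are placeholders):

    # Export in small chunks, pausing between batches so the
    # web app's queries aren't starved.
    Record.find_in_batches(batch_size: 500) do |batch|
      export_rows(batch)  # hypothetical unit of work
      sleep 2
    end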
Have the website update a "lock" somewhere, either in the database or in memcached. If your normal website usage updates the database, you could look at the existing updated_at timestamps. Then only do batch work when there hasn't been any activity for a while. This doesn't guarantee that a new user won't pop in at the same time your batch job runs, of course, but it's a way to find a window when the site is less used.
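A rough sketch of that check, assuming normal site activity touches updated_at on one of your busy models (Order is a stand-in, and run_heavy_export is a hypothetical batch entry point):

    last_activity = Order.maximum(:updated_at)
    if last_activity.nil? || last_activity < 10.minutes.ago
      run_heavy_export
    else
      # Site looks busy; re-enqueue the job and try again later.
    end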
Have you looked into using Background Jobs / Workers on Heroku? It's also worth reading about Heroku's Delayed Job queuing system.