Do I need a worker dyno on Heroku?

I have not specifically added any background tasks to my Rails app. The app is going to be sending emails and also resizing images, but I haven't included any background-processing libraries like delayed_job or Resque, and for the time being I am not going to add any. So do I need a worker dyno?
If I add a worker dyno, would these tasks be automatically handled by the worker dyno?

Nothing happens by magic! Unless you add delayed_job or another gem for handling background tasks, and explicitly write the code that should be performed in the background, nothing will run on a worker dyno.
When it comes to sending email and resizing images, it's encouraged to use background workers for those tasks, but it's not a must. As long as you don't see request timeouts, you should be OK.
If you decide on delayed_job in the future, take a look at the workless gem - it autoscales the worker dyno when it's needed and then scales it back to zero. I use it in my projects and it saves me quite some money.
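To make "explicitly write code" concrete, here is a minimal sketch using delayed_job's delay method; the mailer and its argument are hypothetical:

# Without delayed_job, this runs inline during the request:
UserMailer.welcome_email(user).deliver

# With the delayed_job gem installed, prefixing .delay enqueues the call
# instead (delayed_job handles the deliver itself), and it only ever runs
# if a worker process picks it up:
UserMailer.delay.welcome_email(user)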

If you are not running background processes you should be fine without a worker dyno.
But it also depends on your normal request handling. If some requests take too long (as might be the case with your image resizing), it can be useful to move that work into a background task.
A pretty good explanation of this can be found on the Heroku website:
https://devcenter.heroku.com/articles/background-jobs-queueing
Of course the beauty of Heroku is its scalability... so if you see that your tasks have response times that are too high, you can always add a worker dyno later and turn the task into a background task.

Related

Scaling Dyno worker size dynamically on Heroku Rails application

I am working on a project that launches a process via a Rails worker that is very resource-intensive and can only be handled properly by a Performance worker on Heroku: 1X workers are killed because they use too much RAM, and 2X workers can barely handle the load, exceeding their RAM limit by up to 160%. A Performance worker does the job fine with no issues.
My question is, is there a way to dynamically switch the Dyno size to Performance before a job initiates and then scale it back down once the job is finished or a queue is empty?
I know HireFire exists, but to my knowledge that service only increases the number of workers based on queue length, etc. Another possible solution I thought about was using the Heroku API, which has a Dyno endpoint, to resize the worker dyno before the job starts and then resize it back down when the job ends.
Does anyone else have other recommendations, ideas or strategies for this issue?
Thanks!
The best way is the one you mentioned: use the Heroku Platform API to scale your Dyno size up before starting the job, and then down again afterwards.
This is because tools like HireFire only work by inspecting stuff like application response time, router queue, etc. -- so there's no way for them to know you're about to run some job and then scale up just for that.
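As a rough illustration, here is a sketch using the platform-api gem; the app name, dyno size names (current Heroku values), enqueue call, and token env var are all assumptions:

require 'platform-api'

# Connect with an OAuth token (e.g. from `heroku authorizations:create`)
heroku = PlatformAPI.connect_oauth(ENV['HEROKU_OAUTH_TOKEN'])

# Before enqueueing: resize the worker formation to a Performance dyno.
# Note that resizing restarts the formation's dynos.
heroku.formation.update('my-app', 'worker', 'size' => 'performance-l')
HeavyWorker.perform_async(args)  # hypothetical enqueue call

# Later, as the job's last step or once the queue is empty, scale back:
heroku.formation.update('my-app', 'worker', 'size' => 'standard-1x')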
Depending on the specifics of the usage, you may be able to create a distinct dyno type in your Procfile that only runs this particular worker and is always sized to Performance, but isn't always running. You could even run this as one-off dynos instead of scaling a formation (this can also be done via the API, roughly equivalent to heroku run ...). That said, @rdegges' answer should certainly work.
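For the one-off variant, a hedged sketch with the same platform-api gem; the app name, command, and size are assumptions:

require 'platform-api'

heroku = PlatformAPI.connect_oauth(ENV['HEROKU_OAUTH_TOKEN'])

# Launch a one-off Performance dyno that exits when the command finishes,
# so you only pay for the seconds it actually runs
heroku.dyno.create('my-app',
  'command' => 'bundle exec rake heavy_job',  # hypothetical task name
  'size'    => 'performance-l')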

How long can a sucker_punch worker run for on Heroku?

I have a sucker_punch worker which is processing a CSV file. I initially had a problem with the CSV file disappearing when the dyno powered down; to fix that, I'm going to set up S3 for file storage.
But my current concern is whether a dyno powering down will stop my worker in its tracks.
How can I prevent that?
Since sucker_punch uses a separate thread on the same dyno and does not use an external queue or persistence (the way delayed_job, sidekiq, and resque do), you will lose the job when your dyno gets rebooted or stopped, and you'll have no way to restart it. On Heroku, dynos are rebooted at least once a day. If you need persistence and the ability to retry a job when a dyno goes down, I'd say switch to one of the other job libraries:
https://github.com/collectiveidea/delayed_job
https://github.com/mperham/sidekiq
https://github.com/resque/resque
However, these require using a Heroku add-on. You can get away with the free version, but you will still have to pay for the extra worker process. Other than that, you'd have to implement your own persistence and retrying by wrapping sucker_punch. Here's a discussion on adding those features to sucker_punch: https://github.com/brandonhilkert/sucker_punch/issues/21 They basically say to use Sidekiq instead.
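Since that thread's conclusion is to use Sidekiq, here is a minimal sketch of the CSV job as a Sidekiq worker; the class names and S3 key argument are hypothetical:

require 'sidekiq'
require 'csv'

class CsvImportWorker
  include Sidekiq::Worker
  # On Heroku's SIGTERM at dyno restart, Sidekiq pushes unfinished jobs
  # back onto the queue; retry covers failures inside the job itself.
  sidekiq_options retry: 5

  def perform(s3_key)
    # Job arguments live in Redis, so a dyno reboot delays the work
    # instead of losing it. Read the file from S3, not the dyno's
    # ephemeral filesystem.
    csv = S3_BUCKET.objects[s3_key].read  # hypothetical aws-sdk v1 bucket
    CSV.parse(csv, headers: true) do |row|
      # ... process each row ...
    end
  end
end

# Enqueue from a controller: CsvImportWorker.perform_async('uploads/data.csv')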

Background tasks executing immediately and in parallel in Rails

Our Rails web app has to download and unpack archives of HTML pages from FTP, on request, for users to view in the browser.
The archives can be quite big, so the user has to wait while one downloads and unpacks on the server.
I implemented a progress bar by calling fork/Process.detach in the user's request, so the request finishes but the download/unpack process keeps running in the background; JavaScript rendered in the browser pings our server for status until everything is ready, then redirects to the unpacked HTML pages.
As long as the user requests one archive, everything goes smoothly, but if he runs 2 or more requests at the same time (so more forks are started), it seems only one of them completes and the rest expire, time out, or get killed by Passenger(?). I suppose it's an issue with Passenger and forking.
I'm not sure it's possible to fix, so I guess I need to switch to another solution. The solution needs to permit immediate and parallel processing of downloads, so that if the user requests multiple archives, he sees download/decompression progress in all of them at the same time.
I was thinking about kicking off a background rake job immediately, but that seems very slow to start up (and there are already a lot of cron rake tasks running every minute on our server). The reason I liked fork was that it was very fast to start. I know there is delayed_job - we also use it heavily for other tasks - but can it start multiple processes at the same time, immediately, without queueing?
Solved by keeping the fork and using a single DJ worker. This way I can have as many processes starting at the same time as needed, without trouble with Passenger and without modifying our product's gemset (which we are trying to avoid since that has resulted in bugs in the past).
I'm not sure whether forking inside a DJ worker can cause any trouble, so I asked about it separately:
running fork in delayed job
If I were free to modify the gemset, I'd probably use Resque as wrdevos suggested, or Sidekiq, or girl_friday (but that's less probable because it depends on the server process staying up).
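A hedged sketch of that arrangement - a delayed_job Struct job whose perform forks a detached child per archive, so the single worker is immediately free for the next one; ArchiveUnpacker is a hypothetical stand-in for the download/unpack logic:

class UnpackArchiveJob < Struct.new(:archive_id)
  def perform
    pid = fork do
      # Child process: reconnect ActiveRecord, since a forked child must
      # not share the parent's DB connection.
      ActiveRecord::Base.establish_connection
      ArchiveUnpacker.new(archive_id).run  # hypothetical; updates a progress column
    end
    # Detach so the worker neither blocks nor leaves a zombie; the job
    # finishes immediately and the worker can fork the next archive.
    Process.detach(pid)
  end
end

# Delayed::Job.enqueue UnpackArchiveJob.new(archive.id)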
Use Resque: https://github.com/defunkt/resque
More on bg jobs and Resque here.
https://github.com/blog/542-introducing-resque

Should I have a Heroku worker dyno to poll an AWS SQS queue?

I'm confused about where I should put a script that polls an AWS SQS queue inside a Rails application.
If I use a thread inside the web app, it will presumably burn CPU cycles listening to this queue forever, hurting performance.
And if I reserve a single Heroku worker dyno, it costs $34.50 per month. Does it make sense to pay that price for a single queue poll, or is this not a case for a worker dyno?
The script code:
What it does: listens for converted PDFs, gets the response, and creates the object in a Postgres database.
queue = AWS::SQS::Queue.new(SQSADDR['my_queue'])
queue.poll do |msg|
  ...
  id = received_message['document_id']
  @document = Document.find(id)
  @document.converted_at = Time.now
  ...
end
I need help!! Thanks
You have three basic options:
Do background work as part of a worker dyno. This is the easiest, most straightforward option because it's the thing that's most appropriate. Your web processes handle incoming HTTP requests, and your worker process handles the SQS messages. Done.
Do background work as part of your web dyno. This might mean spinning up another thread (and dealing with the issues that can cause in Rails), or forking a subprocess to do background processing. Either way, bear in mind a dyno's 512 MB RAM limit, and, since I'm assuming you have only one web dyno, be aware that dyno idling means your app likely isn't running 24x7. This option also smells bad because it's generally against the spirit of the twelve-factor app.
Do background work as one-off processes. Make, e.g., a rake handle_sqs task that processes the queue and exits once it's empty. Heroku Scheduler is ideal: have it run every 20 minutes or so. You'll pay for the one-off dyno for as long as it runs, but since that's only a few seconds when the queue is empty, it costs less than an always-on worker. Alternatively, your web app could use the Heroku API to launch a one-off process, programmatically running the equivalent of heroku run rake handle_sqs.
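A hedged sketch of that rake task, reusing the queue setup from the question; the idle_timeout option is from the aws-sdk v1 poll API, and the JSON body shape and namespace are assumptions:

# lib/tasks/sqs.rake
namespace :sqs do
  desc 'Drain the converted-PDF queue, then exit'
  task handle_sqs: :environment do
    queue = AWS::SQS::Queue.new(SQSADDR['my_queue'])

    # poll normally blocks forever; idle_timeout makes it return once the
    # queue has been empty for 10 seconds, so the one-off dyno shuts down
    queue.poll(idle_timeout: 10) do |msg|
      received_message = JSON.parse(msg.body)  # assumed message format
      document = Document.find(received_message['document_id'])
      document.update_attribute(:converted_at, Time.now)
    end
  end
end

Heroku Scheduler (or heroku run rake sqs:handle_sqs) would then invoke this every 20 minutes.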

Rails 3 + Heroku + Delayed jobs - Help me understand!

I'm having problems understanding this article: http://blog.darkhax.com/2010/07/30/auto-scale-your-resque-workers-on-heroku .
I don't quite get why I need Redis + Resque when I have delayed_job on Heroku.
From my understanding, I still have to pay for the workers, correct? What's my main advantage of using that solution?
Regards.
If you don't know why you need Resque, then you don't need it ;)
Resque is for high scalability. delayed_job is fine for smaller-scale stuff, but once you get to the size of, say, GitHub, you will need something like Resque. If delayed_job works for you, then stay with it. You don't need to worry about replacing it until your background job queue gets to around 30,000 or so.
To autoscale Heroku workers using delayed_job, you can hook into its enqueue and after lifecycle hooks and use the Heroku API to query and update the number of workers.
For the most basic implementation: on enqueue, check whether there are any workers and, if not, add one. On after, check whether there are other delayed jobs and, if not, reduce the workers to 0.
You can obviously make this more sophisticated in the way that you scale.
Here is a basic implementation: https://github.com/phaza/Heroku-Delayed-Job-Autoscale
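A minimal sketch of that idea using the current platform-api gem (the linked project predates it); the job class and env vars for the app name and token are hypothetical:

require 'platform-api'

class SendNewsletterJob < Struct.new(:newsletter_id)  # hypothetical job
  def perform
    Newsletter.find(newsletter_id).deliver_all
  end

  # delayed_job hook: fires when the job is pushed onto the queue
  def enqueue(job)
    scale_workers(1) if worker_quantity.zero?
  end

  # Fires after the job runs; shut the worker down if nothing else is queued
  def after(job)
    scale_workers(0) if Delayed::Job.where('id <> ?', job.id).count.zero?
  end

  private

  def heroku
    @heroku ||= PlatformAPI.connect_oauth(ENV['HEROKU_OAUTH_TOKEN'])
  end

  def worker_quantity
    heroku.formation.info(ENV['HEROKU_APP_NAME'], 'worker')['quantity']
  end

  def scale_workers(quantity)
    heroku.formation.update(ENV['HEROKU_APP_NAME'], 'worker', 'quantity' => quantity)
  end
end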
hirefireapp is a newish, simple drop-in solution for auto-scaling workers.
It spawns workers for you based on queue size (configurable) and then "fires" them when they are no longer necessary. You pay for the dyno time (to the nearest second) and for the hirefireapp service. In theory you could roll your own using the open source hirefire gem too.
It also handles scaling the web side if you choose, so you can spawn more web dynos based on current latency.
You can also use Hirefireapp.com to monitor and scale your apps.
