How long can a sucker_punch worker run on Heroku? - ruby-on-rails

I have a sucker_punch worker that processes a CSV file. I initially had a problem with the CSV file disappearing when the dyno powered down; to fix that, I'm going to set up S3 for file storage.
But my current concern is whether a dyno powering down will stop my worker in its tracks.
How can I prevent that?
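For reference, the S3 fix mentioned above could be as simple as the following sketch, using the v1-era aws-sdk gem (bucket and key names are placeholders):

require 'aws-sdk'

# Upload the CSV to S3 so it survives dyno restarts.
s3  = AWS::S3.new
obj = s3.buckets['my-csv-uploads'].objects['imports/users.csv']
obj.write(file: 'tmp/users.csv')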

Since sucker_punch runs jobs on a separate thread in the same dyno and does not use an external queue or persistence (the way delayed_job, sidekiq, and resque do), you will lose any in-flight job when your dyno gets rebooted or stopped, and you'll have no way to restart it. On Heroku, dynos are restarted at least once a day. If you need persistence and the ability to retry a job in the event a dyno goes down, I'd say switch to one of the other job libraries:
https://github.com/collectiveidea/delayed_job
https://github.com/mperham/sidekiq
https://github.com/resque/resque
However, these require using a Heroku add-on. You can get away with the free version, but you will still have to pay for the extra worker process. Other than that, you'd have to implement your own persistence and retrying by wrapping sucker_punch. Here's a discussion on adding those features to sucker_punch: https://github.com/brandonhilkert/sucker_punch/issues/21 The maintainers basically say to use Sidekiq instead.
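For illustration, here is a minimal sketch of the CSV job rewritten as a Sidekiq worker; the class, model, and S3 key are hypothetical:

require 'csv'

class CsvImportWorker
  include Sidekiq::Worker
  sidekiq_options retry: 5   # failed jobs are retried; on shutdown Sidekiq pushes unfinished jobs back to Redis

  def perform(s3_key)
    # Assumes the S3 setup mentioned in the question (v1 aws-sdk).
    csv = AWS::S3.new.buckets['my-csv-uploads'].objects[s3_key].read
    CSV.parse(csv, headers: true) do |row|
      ImportedRow.create!(row.to_hash)   # ImportedRow is a hypothetical model
    end
  end
end

# Enqueued from a controller or model:
CsvImportWorker.perform_async('imports/users.csv')

Because the job lives in Redis rather than in an in-process thread, it survives a dyno restart and gets picked up again by a worker dyno.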

Related

Is it possible to redeploy a Heroku app without restarting some process types

I'm running a Rails app on Heroku, and I have defined a custom process type to perform some long-running jobs. They are really long-running: a job can easily take an hour or more. I know it would be better to split them into small chunks, but that's quite problematic for this task.
The issue is that when I push a new version, Heroku restarts all the dynos (web, workers, long workers, everything). Is it possible to restart only some process types, e.g. only the web dynos?
No, that isn't possible. The easiest and most scalable way around this would be to split your long-running jobs into smaller chunks.
That way, you would have a lot of very small jobs, each processed very quickly. When your app is restarted, the restart would interrupt at most one small chunk instead of killing a long-running job.
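A rough sketch of that chunking idea, assuming delayed_job (the model, scope, and job are hypothetical):

# Each batch becomes its own short job, so a dyno restart
# interrupts at most one small batch.
ProcessBatchJob = Struct.new(:ids) do
  def perform
    Document.where(id: ids).find_each(&:process!)   # process! is hypothetical
  end
end

Document.pending.find_in_batches(batch_size: 100) do |batch|
  Delayed::Job.enqueue ProcessBatchJob.new(batch.map(&:id))
end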
Alternatively, one-off dynos won't be restarted when your app is deployed.
Using the Heroku API, you can programmatically boot one-off dynos. That way, you could start a one-off dyno for each long-running job you need to process.
That job would run to completion (for up to 24 hours, at which point the dyno is cycled), and you would be able to deploy your app without restarting it.
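A sketch of the one-off approach using Heroku's platform-api gem (the app name, token variable, and rake task are placeholders):

require 'platform-api'

heroku = PlatformAPI.connect_oauth(ENV['HEROKU_OAUTH_TOKEN'])
# Boot a one-off dyno that runs a single long job and then exits.
# One-off dynos are not restarted on deploy.
heroku.dyno.create('my-app', command: 'rake jobs:process_one[42]')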

Do I need a worker dyno on Heroku?

I have not specifically added any background tasks to my Rails app. The app is going to send emails and also resize images, but I haven't included any background-processing library like delayed_job or Resque, and for the time being I am not going to. So do I need a worker dyno?
If I add a worker dyno, would these tasks be automatically handled by the worker dyno?
Nothing happens by magic! Unless you add delayed_job or another gem for handling tasks in the background, and explicitly write code to be performed in the background, nothing will run there.
When it comes to sending email and resizing images, using background workers is encouraged, but it's not a must. As long as you don't see request timeouts, you should be all right.
If you decide on delayed_job in the future, take a look at the workless gem: it autoscales the worker dyno when it's needed and then scales it back to zero. I use it in my projects and it saves me quite a bit of money.
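As a quick illustration of what "explicitly write code" means here, with delayed_job installed you would move the email out of the request cycle like this (a sketch; the mailer and method are hypothetical):

# Synchronous: runs inside the web request and can push it toward a timeout.
UserMailer.welcome_email(user).deliver

# Asynchronous: delayed_job's .delay proxy enqueues the call
# for a worker process to perform later.
UserMailer.delay.welcome_email(user)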
If you are not running background processes you should be fine without a worker dyno.
But it also depends on your normal request cycle. If some requests take too long (as might be the case with your image resizing), it can be useful to move that work into a background task.
Pretty good explanation of this can be found on the Heroku website:
https://devcenter.heroku.com/articles/background-jobs-queueing
Of course, the beauty of Heroku is its scalability: if you see that your tasks are causing high response times, you can always add a worker dyno later and turn the task into a background task.

Background tasks executing immediately and in parallel in Rails

Our Rails web app has to download and unpack archives of HTML pages from FTP, on request, for the user to view in the browser.
The archives can be quite big, so the user has to wait while one downloads and unpacks on the server.
I implemented a progress bar by calling fork/Process.detach in the user's request, so the request completes while the download/unpack process keeps running in the background. JavaScript rendered in the browser pings our server for status until everything is ready and then redirects to the unpacked HTML pages.
As long as the user requests one archive, everything goes smoothly, but if he runs two or more requests at the same time (so more forks are started), it seems only one of them completes and the rest expire/time out/get killed by Passenger(?). I suppose it's an issue with Passenger and forking.
I'm not sure this can be fixed, so I guess I need to switch to another solution. The solution needs to permit immediate and parallel processing of downloads, so that if the user requests multiple archives he sees download/decompression progress for all of them at the same time.
I was thinking about kicking off a background rake job immediately, but rake seems very slow to start up (and there are already a lot of cron rake tasks running every minute on our server). The reason I liked fork was that it starts very fast. I know there is delayed_job, and we use it heavily for other tasks, but can it start multiple processes at the same time, immediately, without queueing?
Solved by keeping the fork and using a single DJ worker. This way I can have as many processes starting at the same time as needed, without trouble from Passenger and without modifying our product's gemset (which we are trying to avoid, since it resulted in bugs in the past).
I'm not sure whether forking inside a DJ worker can cause any trouble, so I asked about it here:
running fork in delayed job
If I were free to modify the gemset, I'd probably use Resque as wrdevos suggested, or Sidekiq, or girl_friday (but that's less likely, because it depends on the server process staying up).
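A rough sketch of that solution, forking inside a delayed_job custom job so the single DJ worker is free to start the next download immediately (names are hypothetical):

UnpackArchiveJob = Struct.new(:archive_url) do
  def perform
    pid = fork do
      download_and_unpack(archive_url)   # hypothetical helper
    end
    Process.detach(pid)                  # avoid leaving a zombie process
  end
end

Delayed::Job.enqueue UnpackArchiveJob.new('ftp://example.com/pages.zip')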
Use Resque: https://github.com/defunkt/resque
More on background jobs and Resque here:
https://github.com/blog/542-introducing-resque
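A minimal Resque job for this use case might look like the following sketch (the class and helper are hypothetical):

class DownloadArchiveJob
  @queue = :downloads

  def self.perform(archive_url)
    download_and_unpack(archive_url)   # hypothetical helper
  end
end

# Enqueued from the controller:
Resque.enqueue(DownloadArchiveJob, 'ftp://example.com/pages.zip')

Running several workers (rake resque:work QUEUE=downloads) gives you parallel downloads, one job per worker process.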

Should I have a Heroku worker dyno to poll an AWS SQS queue?

I'm confused about where I should run a script that polls an AWS SQS queue inside a Rails application.
If I use a thread inside the web app, it will presumably burn CPU cycles listening to the queue forever, affecting performance.
And if I reserve a single Heroku worker dyno, it costs $34.50 per month. Does it make sense to pay that price just to poll a single queue? Or is this not a case for a worker?
The script code:
What it does: listens for converted PDFs, gets the response, and creates the object in a Postgres database.
queue = AWS::SQS::Queue.new(SQSADDR['my_queue'])
queue.poll do |msg|
  data = JSON.parse(msg.body)   # assumes the converter sends a JSON body
  id = data['document_id']
  document = Document.find(id)
  document.converted_at = Time.now
  document.save!
end
I need help!! Thanks
You have three basic options:
Do background work as part of a worker dyno. This is the easiest, most straightforward option because it's the thing that's most appropriate. Your web processes handle incoming HTTP requests, and your worker process handles the SQS messages. Done.
Do background work as part of your web dyno. This might mean spinning up another thread (and dealing with the issues that can cause in Rails), or it might mean forking a subprocess to do background processing. Whatever happens, bear in mind the 512 MB limit of RAM consumed by a dyno, and since I'm assuming you have only one web dyno, be aware that dyno idling means your app likely isn't running 24x7. Also, this option smells bad because it's generally against the spirit of the 12-factor app.
Do background work as one-off processes. Make e.g. a rake handle_sqs task that processes the queue and exits once it's empty. Heroku Scheduler is ideal: have it run once every 20 minutes or something. You'll pay for the one-off dyno for as long as it runs, but since that's only a few seconds if the queue is empty, it costs less than an always-on worker. Alternately, your web app could use the Heroku API to launch a one-off process, programmatically running the equivalent heroku run rake handle_sqs.
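A sketch of such a rake task, reusing the v1 aws-sdk code from the question (poll's :idle_timeout option makes it exit once the queue has been empty for a few seconds, so the one-off dyno stops billing):

# lib/tasks/sqs.rake
task handle_sqs: :environment do
  queue = AWS::SQS::Queue.new(SQSADDR['my_queue'])
  queue.poll(idle_timeout: 10) do |msg|
    data = JSON.parse(msg.body)          # assumes a JSON message body
    document = Document.find(data['document_id'])
    document.update_attributes!(converted_at: Time.now)
  end
end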

Custom clock process on Heroku

I've just inherited a Rails project that previously ran on a typical *nix server. The decision was made to move it to Heroku for the client, and it's up to me to get the background processes working.
Currently it uses Whenever to schedule daily events (email etc.) and to fire up the delayed_job queue on boot.
Heroku's documentation provides an example of a custom clock process using clockwork. Going by this example, can I use it with whenever? Any pitfalls I might come across? Will I need to create a separate worker dyno?
Scheduled Jobs and Custom Clock Processes in Ruby with Clockwork
Yes -- Heroku's Cedar stack lets you run whatever you want.
The basic building block of the Cedar stack is the dyno. Each dyno gets an ephemeral copy of your application, 512 MB of RAM, and a bunch of shared CPU time. Web dynos are expected to bind an HTTP server to the port specified in the $PORT environment variable, since that's where Heroku will send HTTP requests, but other than that, web dynos are identical to other types of dynos.
Your application tells Heroku how to run its various components by defining them in the Procfile. (See Declaring and Scaling Process Types with Procfile.) The Clock Processes article demonstrates a pattern where you use a worker (i.e. non-web) dyno to enqueue work based on arbitrary criteria. Again, you can do whatever you want here -- just define it in a Procfile and Heroku will happily run it. If you go with a clock process (e.g. a 24x7 whenever), you'll be using a whole dyno ($0.05/hour) to do nothing but schedule work.
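For example, a clock process is declared like any other process type (a sketch; clock.rb assumes the clockwork gem and a hypothetical job class):

# Procfile
web: bundle exec rails server -p $PORT
worker: bundle exec rake jobs:work
clock: bundle exec clockwork clock.rb

# clock.rb
require 'clockwork'
require_relative 'config/environment'   # load Rails so models are available

module Clockwork
  every(1.day, 'daily.emails', at: '07:00') do
    Delayed::Job.enqueue DailyEmailJob.new   # DailyEmailJob is hypothetical
  end
end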
In your case, I'd consider switching from Whenever to Heroku Scheduler. Scheduler is basically a Heroku-run cron, where the crontab entries are "spin up a dyno and run this command". You'll still pay $0.05/hour for the extra dynos, but unlike the clock + worker setup, you'll only pay for the time they actually spend running. It cleanly separates periodic tasks from the steady-state web + worker traffic, and it's usually significantly cheaper too.
The only other word of warning is that running periodic tasks in distributed systems is complex and has complex failure modes. Some of the platform incidents (corresponding with the big EC2 outages) have resulted in things like 2 simultaneous clock processes and duplicate scheduler runs. If you're doing something that needs to run serially (like emailing people once a day), consider guarding it with RDBMS locking, and double-checking that it's actually been ~23 hours since your daily job.
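A minimal sketch of that kind of guard, using a hypothetical JobRun model (a unique index or row lock would make it safe against two schedulers firing at once):

# Run the daily mailing only if it hasn't run in the last ~23 hours.
last_run = JobRun.where(name: 'daily_email').order(:ran_at).last
if last_run.nil? || last_run.ran_at < 23.hours.ago
  JobRun.create!(name: 'daily_email', ran_at: Time.now)
  send_daily_emails   # hypothetical
end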
Heroku Scheduler is often a bad option for production use because it's unreliable and will sometimes skip running its tasks.
The good news is that if you run a job-queue dyno with Sidekiq, there are scheduling plugins for it, e.g. sidekiq-cron. With one of those you can use the same dyno for scheduling. If you don't have a job worker yet, you'd need to set one up just for scheduling in order to run tasks reliably.
P.S. If you happen to run Delayed::Job for job queueing, there are scheduling plugins for it too, e.g. this one.
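Going back to the sidekiq-cron suggestion, scheduling a daily job from an initializer might look like this sketch (the worker class is hypothetical):

# config/initializers/sidekiq_cron.rb
Sidekiq::Cron::Job.create(
  name:  'Daily email - 7am',
  cron:  '0 7 * * *',
  class: 'DailyEmailWorker'   # an ordinary Sidekiq worker
)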
