Does using heroku restart result in data loss? Is the last DB backup used during a restart or is the DB unaffected?
A restart does not affect the database. So, generally speaking, data loss will not occur on restart.
However, it will also restart your workers, which may interrupt any jobs currently being processed. This can result in a partially finished job, which may have an undesired effect, depending on the job. You should design any background jobs so they can be restarted from scratch if necessary (for instance, do any database interaction in a transaction).
A similar effect is also possible for your dynos - in this case, instead of a partially completed job, it would be a partially completed web request. This would very rarely cause a problem, though.
A deploy will also cause any files in your temporary directories (tmp/ and log/) to be deleted. On current Heroku stacks the dyno filesystem is ephemeral, so a plain restart discards those files as well.
To prevent both of these, use maintenance mode (heroku maintenance:on) and make sure all your workers and web requests are done working before you deploy or restart.
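As a rough illustration of the "restartable from scratch" advice above, here is a minimal Ruby sketch. FakeStore and import_job are hypothetical names, and the in-memory store only stands in for a real database transaction (a real app would use its ORM's transaction support). The point is that an interrupted run leaves nothing behind, so running the job again is safe.

```ruby
# Hypothetical in-memory stand-in for a database table.
class FakeStore
  attr_reader :rows

  def initialize
    @rows = {}
  end

  # All-or-nothing writes: work on a copy and swap it in only if the
  # block finishes without raising.
  def transaction
    staged = @rows.dup
    yield staged
    @rows = staged
  end
end

# A job that can be interrupted at any point and simply re-run from scratch.
# fail_after is only here to simulate an interruption part-way through.
def import_job(store, records, fail_after: nil)
  store.transaction do |rows|
    records.each_with_index do |(id, value), index|
      raise "interrupted" if fail_after && index >= fail_after
      rows[id] = value
    end
  end
end
```

If the first attempt is interrupted, store.rows is still empty, and a second call to import_job writes every record.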
I have an application that handles some data in memory.
I'd like to close the operations and persist the data into DB so that a reboot wouldn't destroy it.
My app opens some resources with various third parties, and I'd like to close them. After that the app can happily go down and wait until it reboots.
What I found is that Heroku has various webhooks for application deployment state changes and so on. But I couldn't find a way to trigger a webhook before the DB becomes read only.
I would like to have a webhook that tells me that "in 5 minutes PostgreSQL will become read only". And then later the app can reboot and for now it doesn't matter.
I also couldn't find any info on whether this is even possible, and I couldn't find a support email either.
Is there a way to do it? Is it even possible?
(I have an Event-Sourced app that saves event data into DB but persists the data in-memory as it runs. So I don't want to continuously bash all of my state into the DB).
It sounds like there is some confusion about how dyno uptime and database uptime work on Heroku.
Firstly, a database going into read-only mode is a very rare event, usually associated with a critical failure. Based on the behavior you're seeking and some of your comments, it seems like you may be confusing database state changes with dyno state changes. Dynos (the servers for your application runtime) are restarted roughly once every 24 hours, and these servers are ephemeral, so anything held in memory is blown away. The 'roughly' part accounts for fuzzing, so that all of your dynos aren't restarting at the same time, which would cause availability issues.
I don't think you actually need a webhook here. Conveniently, shortly before a dyno is due to be cycled (and blow away your memory) it will receive a SIGTERM and be given 30 seconds to clean up after itself. That SIGTERM can be trapped and you can then save your data to the database.
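A minimal sketch of trapping SIGTERM in Ruby, assuming a single main loop. EventBuffer and its methods are hypothetical names, and flush only moves events into an array where a real app would write to the database. The trap handler just sets a flag, and the main loop does the actual saving, since very little is safe to do inside a signal handler.

```ruby
# Hypothetical holder for in-memory state that must survive a dyno restart.
class EventBuffer
  attr_reader :pending, :saved

  def initialize
    @pending = []
    @saved = []
    @shutting_down = false
  end

  def shutting_down?
    @shutting_down
  end

  # Keep the handler tiny: only record that shutdown was requested.
  def install_sigterm_handler
    Signal.trap("TERM") { @shutting_down = true }
  end

  def record(event)
    @pending << event
  end

  # Stand-in for persisting to the database.
  def flush
    @saved.concat(@pending)
    @pending.clear
  end

  # Run the caller's work until SIGTERM arrives, then persist what is left.
  def run
    until shutting_down?
      yield self
      sleep 0.01
    end
    flush
  end
end
```

You would call install_sigterm_handler once at boot and then run with the app's normal work inside the block. The flush has to fit inside whatever grace period the platform allows.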
I have a Sidekiq job that runs for a while, and when I deploy to Heroku while the job is running, it can't finish within the few seconds it is given.
That is fine, as the job is designed to be able to be re-run if needed.
The problem is that the job gets lost (instead of being put back into Redis and run again after the deploy).
I found that it is advised to set :timeout: 8 on Heroku and I tried it, but it had no effect (I also tried setting it to 5).
Normally when there is an exception I get errors reported, but I don't see any here, so I'm not sure what could be wrong.
Any tips on how to debug this?
The free version of Sidekiq will push unfinished jobs back to Redis after the timeout has passed, default of 8 seconds. Heroku gives a process 10 seconds to shut down. That means we have 2 seconds to get those jobs back to Redis or they will be lost. If your network is slow, if the Redis server is swapping, etc, that 2 sec deadline might not be met and the jobs lost.
You were on the right track: one answer is to lower the timeout so you have a better chance of meeting that deadline. But network or swapping delay can't be predicted: even 5 seconds might not be enough time.
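The arithmetic behind that window, as a tiny sketch. requeue_margin is a made-up helper name; the 10-second and 8-second figures are the ones quoted above, and 5 is the lowered timeout from the question.

```ruby
# Seconds left to push unfinished jobs back to Redis after Sidekiq's own
# timeout fires and before the platform kills the process.
def requeue_margin(platform_grace_seconds, sidekiq_timeout_seconds)
  platform_grace_seconds - sidekiq_timeout_seconds
end

puts requeue_margin(10, 8) # prints 2
puts requeue_margin(10, 5) # prints 5
```

A larger margin helps, but it is still only a budget; it does not guarantee the push to Redis completes in time.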
Under normal healthy conditions, things should work as designed. Keep your machines healthy (uncongested network, plenty of RAM) and the basic fetch should work well. Sidekiq Pro's reliable fetch feature is a fundamental redesign of how Sidekiq fetches jobs and works around all of these issues by keeping jobs in Redis all the time so they can't be lost. But it comes with serious trade-offs too: it's more complicated, slower and more Redis-intensive than "basic" fetch.
In short, I don't know why you are losing jobs but make sure your instances and Redis server are healthy and the latency is low.
https://github.com/mperham/sidekiq/wiki/Using-Redis#life-in-the-cloud
This is actually a feature of Sidekiq, designed to steer you toward the paid Pro version:
http://sidekiq.org/products/pro
RELIABILITY
More reliable message processing.
Cloud environments are noisy and unreliable. Seeing timeouts? Wild swings in latency or performance? Ruby VM crashes or processes disappearing?
If a Sidekiq process crashes while processing a job, that job is lost.
If the Sidekiq client gets a networking error while pushing a job to Redis, an exception is raised and the job is not delivered.
Sidekiq Pro uses Redis's RPOPLPUSH command to ensure that jobs will not be lost if the process crashes or gets a KILL signal.
The Sidekiq Pro client can withstand transient Redis outages or timeouts. It will enqueue jobs locally upon error and attempt to deliver those jobs once connectivity is restored.
A deploy terminates all processes that belong to the user, so the job is lost. There is actually not much you can do about that.
As @mike-perham and @esse noted, Sidekiq is designed in a way that lets it lose jobs because of its fetching mechanism. Your options to get around this are:
Buy Sidekiq Pro (although it was reported to cause the same issue).
Write your own fetcher (but that would mean you cannot use most third-party libraries, as they will not work with your custom fetcher).
Mimic Sidekiq Pro's reliable fetch by backing up your job data. If you want to go this way, check out the attentive_sidekiq gem, which does exactly that.
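To make the reliable-fetch idea concrete, here is a sketch of the pattern behind RPOPLPUSH. It uses plain Ruby arrays as a stand-in for the two Redis lists, so it is neither durable nor thread-safe, and ReliableQueue is a hypothetical class, not Sidekiq's or attentive_sidekiq's actual code. It only shows the data flow: a job is never held solely in the worker's memory.

```ruby
# In-memory stand-in for a Redis work queue plus an "in progress" list.
class ReliableQueue
  attr_reader :queue, :in_progress

  def initialize
    @queue = []        # waiting jobs; new jobs enter on the left
    @in_progress = []  # jobs taken by a worker but not yet acknowledged
  end

  def push(job)
    @queue.unshift(job)
  end

  # Same shape as RPOPLPUSH queue in_progress: take from the right end of
  # one list and put on the left end of the other.
  def fetch
    job = @queue.pop
    @in_progress.unshift(job) if job
    job
  end

  # Called only after the job has finished successfully.
  def ack(job)
    index = @in_progress.index(job)
    @in_progress.delete_at(index) if index
  end

  # Called on boot: anything still in progress belonged to a worker that
  # died, so put it back where it will be fetched again, oldest first.
  def recover
    while (job = @in_progress.shift)
      @queue.push(job)
    end
  end
end
```

If a worker is killed between fetch and ack, the job is still sitting in in_progress, and recover puts it back on the queue after the restart. This only works if the job itself is safe to run twice.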
I have N dynos for a Rails application, and I'd like to run a command on all of them. Is there a way to do it? Would running rails r "SomeRubyCode" be executed on all dynos?
I'm using a plugin which syncs with a 3rd party every M minutes. The problem is, sometimes the 3rd party service times out, and I'd like to run it again without having to wait for another M minutes to pass.
No. One-off commands (such as heroku run bash) are run on a separate, one-off dyno. You would need to set up some kind of pubsub/message queue that all dynos listen to in order to accomplish this. https://devcenter.heroku.com/articles/one-off-dynos
(Asked to turn my comment into an answer... will take this opportunity to expound.)
I don't know about the details of what your plugin needs to do to 'sync' to a 3rd-party service, but I'm going to proceed with the assumption that the plugin basically fetches some transient data which your Web application then uses somehow.
Because the syncing or fetching process occasionally fails, and your Web application relies on up-to-date data, you want the option of running the 'sync' process manually. Currently, the only way to do this is from the plugin itself, which means you would need to run some code on all dynos, and that, as others have pointed out, isn't currently possible.
What I've done in a previous, similar scenario (fetching analytics from an external service) is simple:
Provision and configure your Heroku app with Redis
Write a rake task that simply executes the code (that would otherwise be run by the plugin) to fetch the data, then write that data into cache
Where you would normally fetch the data in the app, first try to fetch from cache (and on a cache miss, just run the same code again—just means that the data expired from cache before it was refreshed)
I then went further and used Heroku simple scheduler to execute the said rake task every n minutes to attempt to keep the data freshly updated and always in cache (cache expiry was set to a little less than n minutes) and reduce instances of perceivable lag as the data fetch occurs. I could've set cache expiry to never or greater than n but this wasn't mission-critical.
This way, if I did want to ensure that the latest analytics were displayed, all I had to do was either a) connect to Redis and remove the item from cache, or (easier), b) just heroku run rake task.
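Roughly what that looks like in code, as a sketch rather than what I actually ran. ExpiringCache is a hypothetical in-memory stand-in for Redis, and refresh / fetch_with_fallback are made-up names for the rake task body and the in-app lookup.

```ruby
# Hypothetical in-memory cache with expiry, standing in for Redis.
class ExpiringCache
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @entries = {}
  end

  def write(key, value, ttl:)
    @entries[key] = [value, @clock.call + ttl]
  end

  # Returns nil for a missing or expired key.
  def read(key)
    value, expires_at = @entries[key]
    return nil if expires_at.nil? || @clock.call >= expires_at
    value
  end

  def delete(key)
    @entries.delete(key)
  end
end

# What the rake task would do: fetch from the third party and refresh the cache.
def refresh(cache, key, ttl:)
  value = yield
  cache.write(key, value, ttl: ttl)
  value
end

# What the app would do: use the cache, and on a miss run the same fetch code.
def fetch_with_fallback(cache, key, ttl:, &fetcher)
  cached = cache.read(key)
  return cached unless cached.nil?
  refresh(cache, key, ttl: ttl, &fetcher)
end
```

Deleting the key (option a) or calling refresh directly (option b) both force the next read to see fresh data.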
Again—this mainly works if you're just pulling data that needs to be shared among all dynos.
This obviously doesn't work the other way around, for instance if you had a centralized service that you wanted to periodically send metrics to (say, time spent per request) on a per-dyno basis. I can't think of an easy, elegant way to do that using Heroku (other than in real time, with all the overhead that entails).
our rails web app has to download/unpack archives with html pages from ftp on request for user's viewing through the browser.
the archive can be quite big, so user has to wait until it downloads/unpacks on the server.
i implemented progress bar the way that i call fork/Process.detach in user's request, so that his request is done but downloading/unpacking process continues running in the background. and javascript rendered in his browser pings our server for status until all is ready and then it redirects him to unpacked html pages.
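for reference, a stripped-down sketch of the fork/Process.detach mechanism described above. the helper names and the status file are assumptions for illustration (not the actual app code), and this does not address the passenger problem described next.

```ruby
require "tmpdir"

# Hypothetical location for a job's status file.
def status_path_for(job_id)
  File.join(Dir.tmpdir, "job-#{job_id}.status")
end

# Run the given block in a forked child and return immediately.
# The block receives a callable it can use to report progress.
def start_in_background(job_id)
  path = status_path_for(job_id)
  File.write(path, "started")
  pid = fork do
    begin
      yield ->(progress) { File.write(path, progress.to_s) }
      File.write(path, "done")
    rescue StandardError => e
      File.write(path, "failed: #{e.message}")
    ensure
      exit!(0) # leave the child without running the parent's at_exit hooks
    end
  end
  Process.detach(pid) # reap the child in the background, no zombie left
  pid
end

# What the status-polling request would read.
def job_status(job_id)
  path = status_path_for(job_id)
  File.exist?(path) ? File.read(path) : "unknown"
end
```

the request handler calls start_in_background and returns right away; the javascript poller hits an action that returns job_status until it reads "done".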
as long as the user requests one archive, everything goes smoothly, but if he tries to run 2 or more requests at the same time (so that more forks are started), it seems that only one of them completes, and the rest expire/time out/get killed by passenger(?). i suppose it's an issue with Passenger/forking.
i am not sure if it's possible to fix it somehow, so i guess i need to switch to another solution. the solution needs to permit immediate and parallel processing of downloads, so that if the user requests multiple archives, he sees download/decompression progress in all of them at the same time.
i was thinking about running background rake job immediately but it seems very slow to startup(also there's a lot of cron rake tasks happening every minute on our server). reason i liked fork was that it was very fast to start. i know there is delayed job, we also use it heavily for other tasks. but can it start multiple processes at the same time immediately without queues?
solved by keeping the fork and using a single dj worker. this way i can have as many processes starting at the same time as needed without trouble with passenger or modifying our product's gemset (which we are trying to avoid since it resulted in bugs in the past)
not sure if forking inside a dj worker can cause any trouble, so i asked about it in "running fork in delayed job"
if i were free to modify the gemset, i'd probably use resque as wrdevos suggested, or sidekiq, or girl_friday (but that's less probable because it depends on the server running).
Use Resque: https://github.com/defunkt/resque
More on bg jobs and Resque here.
https://github.com/blog/542-introducing-resque
Is it possible to do something like the Github zero downtime deploy on Heroku using Unicorn on the Cedar stack?
I'm not entirely sure how the restart works on Heroku and what control we have over restarting processes, but I like the possibility of zero downtime deploys, and from what I've read so far, it's not possible.
There are a few things that would be required for this to work.
First off, we'd need backwards compatible migrations. I leave that up to our team to figure out.
Secondly, we'd want to migrate the db right after a push, but before the restart (assuming our migrations are fully backwards compatible, this should not affect anything).
Thirdly, we'd want to instruct Unicorn to launch a new master process and fork some workers, then swap the PIDs and gracefully shut down the old process/workers.
I've scoured the docs but I can't find anything that would indicate this is possible on Heroku. Any thoughts?
I can't address migrations, but the part about restarting processes and avoiding wait time:
There is a beta feature for Heroku called preboot. After a deploy, it boots your new dynos first and waits a while before switching traffic and killing the old ones:
https://devcenter.heroku.com/articles/labs-preboot/
I also wrote a blog post that has some measurements on my app's performance improvements using this feature:
http://ylan.segal-family.com/blog/2012/08/27/deploy-to-heroku-with-near-zero-downtime/
You might be interested in their feature called preboot.
Taken from their documentation:
This feature provides seamless deploys by booting web dynos with new code before killing existing web dynos.
Some apps take a long time to boot up, and this can cause unacceptable delays in serving HTTP requests during deployment.
There are a few caveats:
You must have at least two web dynos to use this feature. If you have your web process type scaled to 1 or 0, preboot will be disabled.
Whoever is doing the deployment will have to wait a few minutes before the new code starts serving user requests; this happens later than it would without preboot (but in the meanwhile, user requests are still served promptly by old dynos).
There will be a short period (a minute or two) where heroku ps shows the status of the new code, but user requests are still being served by old code.
There is much more information about it, so refer to their documentation.
It is possible, but requires a fair amount of forward planning. As of Rails 3.1 there are three tasks that need carrying out:
Upload the new code
Run any database migrations
Sync the assets
Uploading code and restarting is fairly straightforward. The main problem lies with the other two, but the way round them is pretty much the same.
Essentially you need to:
Make the code compatible with the migration you need to run
Run the migration, and remove any code written specifically for it
For instance, if you want to remove a column, you'll need to deploy a patch telling ActiveRecord to ignore it first. Only then can you deploy the migration and clean up that patch.
In short, you need to consider database and code compatibility and work around it so that the two can overlap in terms of versioning.
An alternative to this method might be to have two versions of the application running on Heroku at the same time. When you deploy, switch the domain to the other version, do the deploy, and switch it back again. This will help in most instances, but again, database compat is an issue.
Personally, I would say that if your deployments are significant enough to require this sort of consideration, taking parts of the application offline is probably the safest answer. Breaking up an application into several smaller applications can help mitigate this, and is a mechanism that I use regularly.
No - this is currently not possible using Unicorn on Heroku Cedar. I've been bugging Heroku about this for weeks.
Here was Heroku Support's reply to my email on March 8, 2012:
Hi, you could enable maintenance mode when doing a deploy, at least your users would see a maintenance page instead of an error, and also request queue wouldn't build up.
We're definitely aware this is a pain and we're working to offer rolling / zero-downtime deploys in the future. We have no ETA to announce, though.