Heroku Error H13 - ruby-on-rails

I've been getting this error on and off for the past couple of days, ever since I deployed my application to Heroku. It happened both before and after I switched to Unicorn as the server. I can sometimes get the app running again with heroku run rake db:migrate followed by heroku restart, but that only fixes it for a couple of hours before it breaks again. The webpage just says "Application error". The logs aren't very helpful, but here's what they say each time the error happens:
[2014-10-27T21:13:31.675956 #2] ERROR -- : worker=1 PID:8 timeout (16s > 15s), killing
[2014-10-27T21:13:31.731646 #14] INFO -- : worker=1 ready
[2014-10-27T21:13:31.694690 #2] ERROR -- : reaped #<Process::Status: pid 8 SIGKILL (signal 9)> worker=1
at=error code=H13 desc="Connection closed without response" method=GET
I'm just using the free version of Heroku. I want to make sure the app works before upgrading, but is upgrading my only option at this point?
Also, I am able to run this locally without problems using either rails server or foreman start.

Heroku docs say this about H13:
H13 - Connection closed without response
This error is thrown when a process in your web dyno accepts a connection, but then closes the socket without writing anything to it.
One example where this might happen is when a Unicorn web server is configured with a timeout shorter than 30s and a request has not been processed by a worker before the timeout happens. In this case, Unicorn closes the connection before any data is written, resulting in an H13.
A couple of lines above the H13 entry in your log, there is an error about a worker process timing out after 15s:
ERROR -- : worker=1 PID:8 timeout (16s > 15s), killing
Heroku help has a section on timeout settings:
Depending on your language you may be able to set a timeout on the app server level. One example is Ruby’s Unicorn. In Unicorn you can set a timeout in config/unicorn.rb like this:
timeout 15
The timer will begin once Unicorn starts processing the request, if 15 seconds pass, then the master process will send a SIGKILL to the worker but no exception will be raised.
That matches the error messages in your log. I'd look into it.
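To check how often the two events line up, the log can be scanned for Unicorn timeout kills and H13 router errors. The following is a minimal sketch, not a Heroku or Unicorn tool: the method name timeout_kills and the constant names are hypothetical, and the sample lines are copied from the question's log.

```ruby
# Matches the Unicorn master's "timeout (16s > 15s), killing" log line.
KILL_PATTERN = /worker=(?<worker>\d+) PID:(?<pid>\d+) timeout \((?<elapsed>\d+)s > (?<limit>\d+)s\), killing/

# Returns one Hash per worker that the Unicorn master killed for exceeding its timeout.
def timeout_kills(lines)
  lines.map do |line|
    m = KILL_PATTERN.match(line)
    next unless m
    { worker: m[:worker].to_i, pid: m[:pid].to_i,
      elapsed: m[:elapsed].to_i, limit: m[:limit].to_i }
  end.compact
end

# Sample lines taken from the question.
SAMPLE_LOG = <<~LOG_TEXT
  [2014-10-27T21:13:31.675956 #2] ERROR -- : worker=1 PID:8 timeout (16s > 15s), killing
  [2014-10-27T21:13:31.731646 #14] INFO -- : worker=1 ready
  at=error code=H13 desc="Connection closed without response" method=GET
LOG_TEXT

lines = SAMPLE_LOG.lines
kills = timeout_kills(lines)
h13_count = lines.count { |line| line.include?("code=H13") }

kills.each do |kill|
  puts "worker #{kill[:worker]} (pid #{kill[:pid]}) killed at #{kill[:elapsed]}s, limit #{kill[:limit]}s"
end
puts "H13 errors: #{h13_count}"
```

If every H13 has a timeout kill next to it, slow requests are the likely cause, and the next step is finding out which requests take longer than the configured limit.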

Related

Unicorn workers dying for no reason

All the unicorn workers are dying silently, no indication as to why, and I can't find any evidence of an external process killing them. I'm new to diagnosing this kind of stuff, and after several hours of research, experimenting, and trying to figure this out, I'm at a dead end.
Background info: it's a Rails 4.1 app on Ruby 2.0, running nginx and Unicorn on an Ubuntu 14.04 server.
unicorn.rb
working_directory "/home/deployer/apps/ourapp/current"
pid "/home/deployer/apps/ourapp/current/tmp/pids/unicorn.pid"
stderr_path "/home/deployer/apps/ourapp/current/log/unicorn.log"
stdout_path "/home/deployer/apps/ourapp/current/log/unicorn.log"
listen "/tmp/unicorn.ourapp.sock"
worker_processes 2
timeout 30
excerpt from unicorn.log (last lines before it dies and after restart)
I, [2016-08-28T19:54:01.685757 #19559] INFO -- : worker=1 ready
I, [2016-08-28T19:54:01.817464 #19556] INFO -- : worker=0 ready
I, [2016-08-29T09:19:14.818267 #30343] INFO -- : unlinking existing socket=/tmp/unicorn.ourapp.sock
I, [2016-08-29T09:19:14.818639 #30343] INFO -- : listening on addr=/tmp/unicorn.ourapp.sock fd=10
I, [2016-08-29T09:19:14.818807 #30343] INFO -- : worker=0 spawning...
I, [2016-08-29T09:19:14.824358 #30343] INFO -- : worker=1 spawning...
Some pertinent info:
After a period of time ranging from about 8 - 20 hours, unicorn dies.
There's no error recorded in the unicorn log.
I searched all of /var/log for evidence of processes that were killed, and can only find one unrelated process that was killed a few days ago.
New Relic shows flat memory usage before the last random shutdown, with ruby using around 400mb. It's currently at 480mb with no problems, so I don't think it's hitting memory constraints.
Same with CPU usage...ruby was hovering around 0.1% before it died.
The last couple of times it died were in the middle of the night. The only requests coming in were from New Relic and Linode Longview monitoring.
Our production.log shows a last request before dying as a ping from New Relic. It Completed 200 OK in 264ms so it doesn't seem to be a request timing out.
It's happening in staging as well, and the log level is set to debug, and there are no additional clues in the staging logs.
Questions:
What could be killing the Unicorn workers that's not the out-of-memory manager or a shut down signal?
Could it be the OOM or a shut down signal, and it's being recorded in some place that I'm not looking, or just not being recorded for some reason?
Is there a way to capture what's happening to Unicorn in more detail?
I have no idea where to go from here, so any suggestions would be much appreciated.
UPDATE
As suggested, I used strace to find out that unicorn was being killed by an old crontab (I know I should have checked there earlier) added by the previous developers that was intended to restart the server every night. The stop command worked, but the start command was failing.
I still don't know why I wasn't able to find anything in my log searches, but after attaching strace to the main unicorn process (using something like strace -o /tmp/strace.out -s 2000 -fp <unicorn_process_id>), the strace log ended with a clear +++ killed by SIGKILL +++. I searched the logs again, and that led me to the crontab.
The underlying cause is probably pretty specific to my situation, but I'm really glad I know about strace now.
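One likely reason nothing showed up in unicorn.log: a process cannot trap or log its own SIGKILL, so only its parent (or a tracer such as strace) sees how it ended. The sketch below illustrates that with a forked child on a Unix-like system. It is not Unicorn's code, and the describe_exit helper is a hypothetical name.

```ruby
# Turns a Process::Status into a short description of how the process ended.
def describe_exit(status)
  if status.signaled?
    "killed by signal #{status.termsig} (#{Signal.signame(status.termsig)})"
  else
    "exited with status #{status.exitstatus}"
  end
end

# Fork a child that would otherwise sleep for a minute, then kill it with
# SIGKILL, the signal that strace reported in the update above.
child_pid = fork { sleep 60 }
Process.kill("KILL", child_pid)

# The child gets no chance to run any cleanup or logging code.
# Only the parent learns what happened, through the wait status.
_pid, status = Process.wait2(child_pid)
puts describe_exit(status)
```

This is the same mechanism behind Unicorn's "reaped ... SIGKILL" lines for workers: the master reaps the worker and logs the status. When the master itself is killed, no Unicorn process is left to write that line.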

Puma "Terminating timed out worker" after rendering HTML

I am new to the AWS Elastic Beanstalk, Rails, Puma, and Nginx stack.
After deploying my Rails app to Beanstalk, all my API calls work fine, but HTML pages cause an error.
When I open an HTML page,
Nginx returns a 502 Bad Gateway error.
Puma log :
Started GET "/admin" for 182.70.76.160 at 2016-04-22 05:13:19 +0000
Processing by Devise::SessionsController#new as HTML
Rendered devise/sessions/new.html.erb within layouts/application (6.1ms)
[18858] ! Terminating timed out worker: 22913
var/app/current/production.log is empty.
I read somewhere that adding SSL could solve this. Is adding SSL required?
Please help, I am stuck!
STATUS :
My assets were huge, which is why the worker was being killed. I was using a theme, so I removed all the unnecessary JS, CSS, and images.
Now Puma doesn't terminate, but it does not compile assets. I had selected Ruby as the application type, so it should do that for me, correct?
Try setting the worker timeout to a higher value in the Puma config. The default value is 60 seconds:
worker_timeout 100
It is possible that you are creating more workers than the server could handle. Try decreasing the worker count or increasing the server capacity.
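Both suggestions can be combined in the Puma config file. The fragment below is a sketch that uses Puma's workers, threads, and worker_timeout settings. The specific numbers are assumptions for illustration and would need tuning for the actual instance size.

```ruby
# config/puma.rb (sketch; the numbers are assumptions, not recommendations)
workers 2            # fewer worker processes for a small instance
threads 1, 5         # minimum and maximum threads per worker
worker_timeout 100   # seconds before the master terminates a stuck worker (default 60)
```

Raising the timeout only hides a slow request. If a page regularly needs more than 60 seconds to render, the cause of the slowness is still worth finding.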
For now I have moved to plain EC2, as the Elastic Beanstalk issues weren't getting solved.
I had the same issue on EC2, but there I could fix it because I have access to the machine.
The Puma workers were timing out because my assets weren't precompiled.
Every time I deploy a new build to the server, I have to run the following:
RAILS_ENV=production rake assets:precompile

Heroku add-ons 'Logentries' & 'FlyData' query

I have a Ruby on Rails web app hosted on Heroku, and I've set up the Logentries add-on, which configures alarms for 'High Response Time'.
Lately, I have started getting emails for 'ALERT High Response Time', which mention that the high response time was triggered for
heroku router - - at=info method=GET path="/robots.txt"
Now, I know that Search Engines like Google, Microsoft use the robots.txt to ignore the pages that should not be indexed. Is there any other reason, why this file would be accessed?
Please correct me if I am missing something here.
Oh, and I am using the free version of Heroku i.e. 1 worker for website-content and I have 1 worker which runs periodic jobs using the Scheduler.
Query #2-
What's wrong with my application, when I get the following email from Logentries, with subject - 'ALERT Exit Timeout'
Exit timeout: Heroku/my-app
2014-10-13 18:53:56.351
188 <45>1 2014-10-13T18:53:56.053533+00:00 heroku web.1 - - Error R12 (Exit timeout) -> At least one process failed to exit within 10 seconds of SIGTERM
Query #3-
I also installed the FlyData add-on trial to see how it works. I get emails with the subject - '[FlyData-Alert] (myapp) Application Error notification'.
The email says-
We noticed the following error logs on your application (myapp) :
2014-10-08T23:59:53.042662+00:00 app[scheduler.3266]: ** [NewRelic][10/08/14 23:59:53 +0000 21fd815f-5e08-42ab-80d8-4771ea1593c7 (2)] INFO : Installing Rails3 Error instrumentation
I think this email is triggered because of the INFO message from New Relic, which says - Installing Rails3 Error instrumentation. The FlyData add-on probably looks at the keyword 'Error' and triggers the email alert.
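The difference between matching on a keyword and matching on the log level can be illustrated with a short sketch. How FlyData actually selects lines is not known here, so both method names are hypothetical and the code only demonstrates the hypothesis above.

```ruby
# Naive check: flags any line that contains the word "error", in any case.
def keyword_match?(line)
  line.match?(/error/i)
end

# Stricter check: flags a line only when its severity field is ERROR or FATAL.
# Assumes the "] LEVEL : message" layout seen in the New Relic line above.
def severity_match?(line)
  line.match?(/\]\s+(ERROR|FATAL)\s+:/)
end

# The log line quoted in the alert email.
line = "2014-10-08T23:59:53.042662+00:00 app[scheduler.3266]: ** [NewRelic]" \
       "[10/08/14 23:59:53 +0000 21fd815f-5e08-42ab-80d8-4771ea1593c7 (2)] " \
       "INFO : Installing Rails3 Error instrumentation"

puts "keyword match:  #{keyword_match?(line)}"
puts "severity match: #{severity_match?(line)}"
```

The keyword check fires on this INFO line while the severity check does not, which is consistent with the alert being a false positive.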
For Query #2: Heroku - Exit timeout: Heroku/my-app
According to Heroku's documentation,
"A process failed to exit within 10 seconds of being sent a SIGTERM indicating that it should stop. The process is sent SIGKILL to force an exit."
There is a complete list of Heroku Errors codes, including this one, that can be found here: https://devcenter.heroku.com/articles/error-codes#r12-exit-timeout
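To avoid R12, a process has to notice SIGTERM and finish within the 10-second window. The sketch below shows one common pattern: the signal handler only sets a flag, and the work loop checks it between units of work. The GracefulWorker class and its methods are hypothetical names, and the sleep stands in for real work.

```ruby
# Hypothetical worker that stops cleanly when it receives SIGTERM.
class GracefulWorker
  attr_reader :processed

  def initialize
    @stopping = false
    @processed = 0
  end

  # Keep the handler tiny: it only records that a stop was requested.
  def install_signal_handler
    Signal.trap("TERM") { @stopping = true }
  end

  # Runs short units of work until a stop has been requested.
  def run
    perform_one_unit until @stopping
    :stopped_cleanly
  end

  private

  # Stand-in for one short unit of real work.
  def perform_one_unit
    sleep 0.05
    @processed += 1
  end
end

worker = GracefulWorker.new
worker.install_signal_handler

# Simulate the platform sending SIGTERM shortly after start.
Thread.new do
  sleep 0.2
  Process.kill("TERM", Process.pid)
end

result = worker.run
puts "#{result} after #{worker.processed} units of work"
```

The pattern only helps if each unit of work is short. A single unit that runs longer than 10 seconds would still end in SIGKILL.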
If you're using WEBrick to run your application on Heroku, try switching to 'thin' to see if that helps. See https://devcenter.heroku.com/articles/rails3#webserver,
or see this previous answer on Stack Overflow:
Rails app hosted on heroku: Error R12 (Exit timeout)
Hope this helps.
Michael

How to find out what causes Unicorn worker timeouts

People keep complaining that my website hangs on some pages. I checked the Unicorn stderr log and found many timeout errors like:
E, [2013-08-14T09:27:32.236478 #30027] ERROR -- : worker=5 PID:11619 timeout (601s > 600s), killing
E, [2013-08-14T09:27:32.252252 #30027] ERROR -- : reaped #<Process::Status: pid=11619,signaled(SIGKILL=9)> worker=5
I, [2013-08-14T09:27:32.266141 #4720] INFO -- : worker=5 ready
There are many error messages like that.
Then I went to the Rails production log and found the matching requests by searching for the time of each Unicorn error minus 601s. These timed-out requests all stalled in the page rendering phase. Their SQL queries had already finished, but the request never completed:
Processing by XXXController#index as HTML
Rendered xxx/index.html.erb within layouts/application (41.4ms)
Rendered shared/_sidebar.html.erb (200.9ms)
There is no "Completed" line. Most requests to these pages are served successfully, and I don't know why one of them hangs at random times.
I have no idea what may be causing this. Can anybody give me a clue about how to find the real reason the Unicorn workers time out?
Update:
We used NSC to pass requests and responses to Unicorn. To try to improve the timeout issue, we added nginx between NSC and Unicorn. The Unicorn worker timeouts still happen, and each one matches an nginx upstream timeout in the nginx error log.
Does anyone know whether there is some kind of bottleneck in Unicorn's TCP connection handling?
I use Rack::Timeout to time requests out before Unicorn does. Unicorn's timeout uses kill -9 (SIGKILL), which cannot be caught, so the worker gets no chance to report what it was doing.
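The idea of timing out inside the worker, before the server's hard kill, can be sketched with Ruby's standard Timeout module. This is a stand-in for illustration and not the rack-timeout gem's implementation. The SoftTimeout class, its seconds and logger parameters, and the demo apps are all hypothetical.

```ruby
require "timeout"
require "stringio"

# Rack-style middleware that raises inside the worker when a request runs too
# long, so a backtrace can be logged before the server's SIGKILL would hit.
class SoftTimeout
  def initialize(app, seconds:, logger: $stderr)
    @app = app
    @seconds = seconds
    @logger = logger
  end

  def call(env)
    Timeout.timeout(@seconds) { @app.call(env) }
  rescue Timeout::Error => e
    @logger.puts "request to #{env['PATH_INFO']} exceeded #{@seconds}s"
    @logger.puts e.backtrace.first(5).join("\n")
    [503, { "content-type" => "text/plain" }, ["Request timed out\n"]]
  end
end

# Demo apps: one that responds at once, one that is slower than the limit.
fast_app = ->(_env) { [200, { "content-type" => "text/plain" }, ["ok\n"]] }
slow_app = ->(_env) { sleep 1; [200, { "content-type" => "text/plain" }, ["ok\n"]] }

log = StringIO.new
fast_status, = SoftTimeout.new(fast_app, seconds: 0.2, logger: log).call("PATH_INFO" => "/fast")
slow_status, = SoftTimeout.new(slow_app, seconds: 0.2, logger: log).call("PATH_INFO" => "/slow")

puts "fast: #{fast_status}, slow: #{slow_status}"
puts log.string
```

The soft limit has to be shorter than the Unicorn timeout, otherwise the master kills the worker first. Interrupting a thread with Timeout can leave resources in an odd state, so the logged backtrace is best treated as a diagnostic aid for finding the stuck code.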

Interpreting heroku logs, is worker killed prematurely?

I'm trying to debug an issue with workers and I saw this message in my log file:
2013-07-14T21:59:07.024756+00:00 app[web.1]: E, [2013-07-14T14:59:07.024559 #2] ERROR -- : worker=1 PID:261 timeout (30s > 29s), killing
2013-07-14T21:59:07.067325+00:00 app[web.1]: E, [2013-07-14T14:59:07.066999 #2] ERROR -- : reaped #<Process::Status: pid 261 SIGKILL (signal 9)> worker=1
2013-07-14T21:59:07.070701+00:00 heroku[router]: at=error code=H13 desc="Connection closed without response" method=POST path=/photos/687 host=dev.tacktile.org fwd="199.83.223.92" dyno=web.1 connect=8ms service=29345ms status=503 bytes=0
2013-07-14T21:59:07.898048+00:00 app[web.1]: I, [2013-07-14T14:59:07.897739 #269] INFO -- : worker=1 ready
If I'm reading this correctly, my worker was killed because it took longer than 30 seconds. I thought only web responses got killed if longer than 30 seconds. I'm putting this task into a delayed job and processing it with a worker because I know it's slow.
I hope I'm misunderstanding something.
Your log indicates dyno=web.1, so it looks like the web dyno connection was terminated after 30 seconds, not a worker dyno as you suggest. Have you read the note attached to the definition of the H13 error? It says:
One example where this might happen is when a Unicorn web server is
configured with a timeout shorter than 30s and a request has not been
processed by a worker before the timeout happens. In this case,
Unicorn closes the connection before any data is written, resulting in
an H13.
Perhaps that's related?
PS. Editing my answer I see by "worker" you mean "Unicorn worker" I guess? Looks like your unicorn worker died for some reason (which is perhaps why you got the H13). Heroku won't explicitly kill a sub-process like that AFAIK.
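The router line in the question carries the fields that matter here: the dyno that handled the request and how long the request had been in service when the connection closed. A minimal sketch for pulling them out follows. The parse_router_line helper is a hypothetical name, and the sample line is a shortened copy of the one in the question.

```ruby
# Parses the key=value pairs of a Heroku router log line into a Hash.
# Quoted values such as desc="..." are kept whole, with the quotes removed.
def parse_router_line(line)
  line.scan(/(\w+)=("[^"]*"|\S+)/).to_h { |key, value| [key, value.delete('"')] }
end

# Shortened copy of the router line from the question.
line = 'heroku[router]: at=error code=H13 desc="Connection closed without response" ' \
       'method=POST path=/photos/687 dyno=web.1 connect=8ms service=29345ms status=503 bytes=0'

fields = parse_router_line(line)
service_seconds = fields["service"].to_i / 1000.0

puts "#{fields['code']} on #{fields['dyno']} for #{fields['path']} after #{service_seconds}s"
```

A service time of roughly 29.3 seconds on web.1 lines up with the 29s Unicorn timeout in the preceding log line, which supports reading this as a Unicorn worker inside the web dyno being killed.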
I'm not a Ruby on Rails expert, but it seems like what you call "worker" is actually a web process (as evident by the dyno name, web.1). I am guessing you use Unicorn, which spawns several processes, each dealing with a single web request at a time. Each such process is termed a "worker", I guess, so it's really a matter of terminology.
As to why it happens: could it be that your web path actually waits for your real worker to complete the request, and thus it too is taking >30sec?
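If the web request does wait for the slow task, the usual fix is to enqueue the job and respond immediately. The sketch below uses an in-process Queue and Thread as a stand-in for a real job backend such as Delayed Job. The update_photo and process_photo names are hypothetical, and the sleep stands in for the slow processing.

```ruby
require "securerandom"

# In-process stand-in for a job queue and its result store (assumption:
# a real app would use Delayed Job or a similar backend instead).
JOBS = Queue.new
RESULTS = {}

# Hypothetical slow task; the sleep stands in for the slow processing.
def process_photo(photo_id)
  sleep 0.3
  "processed photo #{photo_id}"
end

# Web handler: enqueue the work and answer immediately with 202 Accepted,
# instead of blocking the web process until the job is done.
def update_photo(photo_id)
  job_id = SecureRandom.hex(4)
  JOBS << [job_id, photo_id]
  [202, { "content-type" => "text/plain" }, ["queued as #{job_id}\n"]]
end

# Background worker: pops jobs until it receives the nil stop marker.
worker = Thread.new do
  while (job = JOBS.pop)
    job_id, photo_id = job
    RESULTS[job_id] = process_photo(photo_id)
  end
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
status, _headers, _body = update_photo(687)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

JOBS << nil # stop marker for the worker
worker.join

puts "responded with #{status} in #{(elapsed * 1000).round}ms"
puts RESULTS.values.first
```

The handler returns well before the job finishes, so the request stays far below both the Unicorn timeout and the router's 30-second limit. The client then needs some other way to learn the result, for example by polling.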
