403 errors from load balancer while new instances are booting - ruby-on-rails

I have a RoR app running on elastic beanstalk. I have occasionally seen 403 errors from Passenger for a while. Most of the time 1 server is running but this gets increased to 3 or 4 instances in busy periods during the day.
Session stickeyness is not turned on
I have noticed that when a new server is started the ELB is sending requests to it before bundle install has finished.
If I ssh to the newly started server I can see in /var/app/current/ that the app has not yet been installed and if I run top it looks like bundler is running and compiling things with cc1, etc.
/var/app/support/log/passenger.log shows that requests to valid urls within my rails app are being received and responded to with 404. Hardly surprising because the app isn't there yet
After 5-10 minutes all of the compiling is complete and the app files appear in /var/app/current and all is well.
This doesn't seem quite right to me. How do I set up the ELB / my rails app so that the ELB can tell when it is ready to receive requests?

I found the answer to this. There was no application health check url set. In this case the ELB pings the instance to see if it's healthy, i.e. it checks that it is booted rather than if rails is up and running. Setting the health check url to '/login/' fixed it for me because this gives a 404 until rails in running and a 200 afterwards.
Elastic beanstalk demands 2 correct responses before it deems an instance to be healthy. It checks the instance every 5 minutes. This means that an instance can take a while to start serving requests. i.e. it takes boot time + waiting for next poll from elb + 5 minutes before it sees any real traffic

Related

Terminating timed out worker of puma in ebs on rails app

Running rails app in elastic-beanstalk for quite long time so far I didn't face this issue. My server keeps on restarting with logs as
[4412] ! Terminating timed out worker: 8646
[4412] ! Out-of-sync worker list, no 8646 worker
[4412] ! Out-of-sync worker list, no 8646 worker
In ebs log, I am getting
Environment health has transitioned from Ok to Warning. 50.0 % of the requests to the ELB are failing with HTTP 5xx. Insufficient request rate (3.0 requests/min) to determine application health (5 minutes ago). 1 out of 1 instances are impacted. See instance health for details.
My instance ran in t2.medium(CPU core 2), is this because of heavy load on the server? Do I need to upgrade my instance? I am totally stuck.
FYI: I included puma only by having in gemfile, as rails 4.+ can handle.
Suggest me your opinion on this. Thanks.

Why my Amz EC2 server downs everyday for few minutes? Errors 503 and 502. It's a Rails app

I don't know which error causes the problem. When I see, the server is down with the error 503. In Google Chrome log, I have the following error:
503 Service Unavailable: Back-end server is at capacity
While the server is down, I can't get to connect via SSH to see the error log. After few minutes the server works and I am go to the nginx error log.
In the log, I have common errors, like:
ActiveRecord::RecordNotFound (Couldn't find Attachment with 'id'=4240)
I know how to solve and I think that this errors is not the problem.
But I have this error too:
Sending 502 response: application did not send a complete response
Process (pid=31880, group=/home/ubuntu/........./current/public) no longer exists! Detaching it from the pool.
I think that it is the problem, but I looked in the internet and the causes and solutions do not appear to solve the problem.
This problem happens after I created a Load Balancer and use HTTPS.
Before, this problem never happens.
About my server and app:
Amazon Ec2 instance;
Using Classic Load Balancer (with Amazon Certificate Manager in https port);
Using Route 53;
Don't using Elastic IP;
OS: Ubuntu 14.04.2 LTS
ruby -v: 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux]
rails -v: Rails 4.2.3
nginx -v: nginx/1.8.0
passenger -v: Phusion Passenger version 5.0.10
Load Balancer Health Check is set up like this:
Ping Target
HTTP:80/index.html
Timeout 5 seconds
Interval 30 seconds
Unhealthy threshold 5
Healthy threshold 5
Health Check Information:
I get this print in the Load Balancer MONITORING tab. Is the Unhealthy Hosts (Count). Why my host was unhealthy?
SOLUTION
In my case, the problem was in the assets precompile task.
I have a lot of assets in my app and when I did the deploy with capistrano, it exhausts the server.
In other side, sometimes, the assets was precompiled after the deploy, during the page load. But this task is very slowly, and returns the errors 502, 503 and 504.
It causes the servers down to, because the CPU utilization goes to 100%, the average latency is going higher too.
To solve, I removed the assets precompile task from Capistrano. I precompile the assets in my locally PC and send all of them to GIT branch MASTER. When I run cap production deploy, the precompile taks will not run. More details in this post.
I did some changes in my Load Balancer Health Check settings:
Ping Target HTTP:80/elb/index.html (I created in pubic folder this folder and file)
Timeout 5 seconds
Interval 30 seconds
Unhealthy threshold 2
Healthy threshold 10
Idle timeout: 65 seconds (equal my nginx timeout)
With this I hope the task assets precompile never more runs on the server.

How can I automatically restart my Heroku app when there's a server error?

I have a Rails 4.2 app running on Heroku. Occasionally there is an issue that causes most incoming requests to get a server error. For example, there could be a memory leak or a max database connection issue. How can I setup a script or service to automatically restart the server when it detects errors?
I think this service could ping the app every few minutes and if it detects an error, it should confirm there's really a problem and then run heroku restart. How could this be set up?
After Googling this topic, I came across Neptune.io, which seems to provide a useful service for this task.

Request to external service times out on Heroku web process but works in console process

I have a Rails 4 application running on Heroku. For one type of request I make a HTTP call to an external service and then return the response to the client.
As I see from the logs, the request to the external service is taking too long and resulting in the Heroku's H12 error where it sends a 503 after 30 seconds. The HTTP request that I am making to the external service eventually comes back with a Net::ReadTimeout after some more time (60 seconds).
However if I run heroku run console and make the same HTTP call (through the same Ruby code), it works just fine. The request completes in about a second or two at the max.
I am unable to understand why this request is timing out when run from the web process while it works seamlessly in the heroku run console.
I am running Puma as my webserver. I followed guidelines given here : https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server
I also tried the basic WEBrick server to see if that helps. But no avail.
Has anyone faced this issue? Any hints on how to debug this?

AWS Deployment with Rails - Inaccessible

Could you tell me what happens with AWS Server now? From 3 weeks ago, util now, whenever I deploy my RoR app into AWS Server (using ElasticBeantalk tool), I meet a strange issue
Deployment time is quite good (just about 10-15 minutes), and the healthy of server is still green. But after that, the server is inaccessible. This status last about 3 - 4 hours !!! Then, everything is OK, server run fast and smoothly. I totally don't understand server healthy still un-change although this error happen. Everything I can do is "refresh browser periodically until it run"
I don't think my application is bigger enough with total deployment time like that. It just takes me about 20 minutes on local (production mode)
Here're some error I found out when server is hang:
"An error occured while starting up the preloader."
"Gateway timeout" when loading application.js (using chrome debug)
"Bad gateway" when loading application.js (using chrome debug)
Please give me some advise to solve that. I have been stucked on this issue for a long time
Thanks

Resources