EC2 instance is showing unhealthy after reboot - amazon-elb

I have setup one Application load balancer crossed zoned. Initially all instances are in healthy status but after reboot a instance that particular instance shows unhealthy in the target group.
Protocol HTTP
Path /
Port traffic port
Healthy threshold 5
Unhealthy threshold 5
Timeout 5
Interval 30
Success codes 200

Are you sure your webservice has successfully started back up after you rebooted it?
You'll likely need to provide much more information (such as OS + webservice) if you are to get any proper answer, but as a general suggestion you your remote into your EC2 instance and figure out if the service (IIS, Apache, Nginx or otherwise) is actually running.

Related

nginx with high traffic socket.io running on docker

So I am building a web application for university which has a very high tick rate (clients recieving data from node server above 30 times per second via socketio). This works well in docker. Now I installed nginx, configured it and everything works well (no exposed ports, socket still running etc.) but now nginx logs in the docker terminal every single socket connection from every single client (at two clients well above 60 logs per second) and I think this also leads to performance issues and causes small lag to the clients. I did not find any solutions in their docs.

Docker Swarm load balance testing using Chrome

I've tried doing simple single node swarm just like in Docker tutorial part 3 and I've found out that if I use curl then I'm jumping between two replicas, but if I use Chrome then once I open the page then any following requests will be handled by the same replica. I'm sure I'm actually hitting it only once, because counter increases only by 1.
What is happening? Is it some kind of feature in Docker Swarm load balancing? If so, how would it work? No specific request headers are send to the server, so how would the load balancer recognize me? It can't be IP, because if I use incognito mode I'll be handled by different replica and I'll be stick to it as long as I'm in incognito.
It's not a Swarm thing, it's a chrome thing. Curl acts like you'd expect, each command is a new TCP request that shows as a new connection going through the Swarm VIP load balancer.
Chrome (and other browsers) have lots of methods to keep TCP connections open for future requests (HTTP keep-alives, etc). This is why it will stay connected to the same container because the connection is persistent through the LB to the replica. The LB will only shift to the "next in the round-robin pool" for a new connection.

AWS Load Balancer EC2 health check request timed out failure

I'm trying to get down and dirty with DevOps and I'm running into a health check request timed out failure. The problem is my Elastic Load Balancer sends a health check to my EC2 instance and gets a network timeout. I'm not sure what I did wrong. I am following this tutorial and I have completed all the steps up to and including "Using a Elastic Load Balancer". My EC2 instance seems to be working fine and I am able to successfully curl localhost on port 9292 from within the EC2 instance.
EC2 instance security group setup:
Elastic Load Balancer setup:
My target group for the ELB routing has port 9292 open via HTTP and here's a screenshot of the target in my target group that is unhealthy.
Health check config:
I have a VPC that my EC2 instance is a part of and my ELB is connected to the same VPC. I do not have Apache installed and I do not have nginx installed. To my understanding, I do not need these. I have a Rails Puma server running and I can send successful curl requests to the server.
My hunch is that my ELB is not allowed to reach my EC2 instance, resulting in a network timeout and a failed health check. I'm unable to find the cause for this. Any ideas? This SO post didn't help much. Are my security groups misconfigured? What else could potentially block a routing request from ELB to my EC2 instance?
Also, is there a way to view network requests / logs for my EC2 instance? I keep seeing VPC flow logging but I feel like there are simpler alternatives.
Here's something I posted in the AWS forums but to no avail.
UPDATE: I can curl the private IP of target just fine from within an EC2 instance. I don't think it's the target instance, I think it's something to do with the security group setup. I am unable to identify why though because I have basically allowed all traffic from the Load Balancer to the EC2 instance.
I made my mistake during the "Setup your VPC" step. I finished creating a subnet for an RDS instance. I proceeded to start an instance and the default subnet that AWS chose when I switched to my VPC was the subnet I made for my RDS, which was NOT a public subnet. Therefore, any attempts, from any EC2 instance or my load balancer, would not be able to reach it because I had only set up my public subnet to take requests.
The solution was to create a new instance and this time, pick the correct public subnet. My original EC2 instance was associated with a private subnet while the load balancer was pointing to the public subnet.
Here's a link to a hand drawn image that helped me pin point my problem, hopefully can help anyone else who's having trouble setting up. I didn't put image here directly because it's bigger than 2MB.
Glad to answer any further questions too!

Docker Swarm Late Server Startup

I've been using docker swarm for a while and I'm really pleased with how simple it is to set up a swarm cluster and to run replicated services. However I've faced a problem that seems like a blocker in my use case.
I'm using docker 1.12 and swarm mode.
My problem is that the internal IPVS load balancer sends request to tasks that have "status health: starting" and whereas my application is not properly started.
My application takes some time to start but docker swarm load balancer starts sending requests as soon as the container is in "state running".
After running some tests I realized that If I scale up one instance, the instance is available to the load balancer immediately and the client may get a connection refused response if the load balancer sends the request to the starting server.
I've implemented the health check and I was expecting a particular instance to only become available to the load balancer after the first successful health check.
Is there any way to configure the load balancer or the scheduler to only send request to instance that are properly started?
Best Regards,
Bruno Vale

403 errors from load balancer while new instances are booting

I have a RoR app running on elastic beanstalk. I have occasionally seen 403 errors from Passenger for a while. Most of the time 1 server is running but this gets increased to 3 or 4 instances in busy periods during the day.
Session stickeyness is not turned on
I have noticed that when a new server is started the ELB is sending requests to it before bundle install has finished.
If I ssh to the newly started server I can see in /var/app/current/ that the app has not yet been installed and if I run top it looks like bundler is running and compiling things with cc1, etc.
/var/app/support/log/passenger.log shows that requests to valid urls within my rails app are being received and responded to with 404. Hardly surprising because the app isn't there yet
After 5-10 minutes all of the compiling is complete and the app files appear in /var/app/current and all is well.
This doesn't seem quite right to me. How do I set up the ELB / my rails app so that the ELB can tell when it is ready to receive requests?
I found the answer to this. There was no application health check url set. In this case the ELB pings the instance to see if it's healthy, i.e. it checks that it is booted rather than if rails is up and running. Setting the health check url to '/login/' fixed it for me because this gives a 404 until rails in running and a 200 afterwards.
Elastic beanstalk demands 2 correct responses before it deems an instance to be healthy. It checks the instance every 5 minutes. This means that an instance can take a while to start serving requests. i.e. it takes boot time + waiting for next poll from elb + 5 minutes before it sees any real traffic

Resources