Docker Swarm Late Server Startup - docker-swarm

I've been using docker swarm for a while and I'm really pleased with how simple it is to set up a swarm cluster and to run replicated services. However I've faced a problem that seems like a blocker in my use case.
I'm using docker 1.12 and swarm mode.
My problem is that the internal IPVS load balancer sends requests to tasks whose health status is "starting", while my application has not yet properly started.
My application takes some time to start, but the docker swarm load balancer starts sending requests as soon as the container is in state "running".
After running some tests I realized that if I scale up by one instance, the new instance is available to the load balancer immediately, and the client may get a connection refused response if the load balancer sends the request to the server that is still starting.
I've implemented the health check and I was expecting a particular instance to only become available to the load balancer after the first successful health check.
Is there any way to configure the load balancer or the scheduler to only send requests to instances that are properly started?
Best Regards,
Bruno Vale
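To make the expected behaviour concrete, here is a minimal Python sketch of a round-robin picker that only considers tasks that are both running and healthy. It is an illustrative model, not Docker's implementation: the `Task` class, the `eligible` and `round_robin` functions, and the task names are all hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterator, List


@dataclass
class Task:
    name: str
    state: str   # container state, e.g. "running"
    health: str  # health check status: "starting", "healthy" or "unhealthy"


def eligible(tasks: List[Task]) -> List[Task]:
    """Return only the tasks that are running AND have passed a health check."""
    return [t for t in tasks if t.state == "running" and t.health == "healthy"]


def round_robin(tasks: List[Task]) -> Iterator[Task]:
    """Cycle over the eligible tasks only (hypothetical balancer behaviour)."""
    return cycle(eligible(tasks))


# web.2 has just been scaled up: the container runs, but the app is still starting.
tasks = [
    Task("web.1", "running", "healthy"),
    Task("web.2", "running", "starting"),
]

picker = round_robin(tasks)
chosen = [next(picker).name for _ in range(3)]
print(chosen)  # every request goes to web.1 until web.2 reports healthy
```

In this model the newly scaled instance receives no traffic until its health status changes from "starting" to "healthy", which is the behaviour the question asks for.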

Related

Why is my lb responding with bad gateway?

I have no web server running on my EC2 machine, but I still get 502 Bad Gateway from the load balancer in front of it.
Why do I get a bad gateway error from the load balancer, but only a timeout and no bad gateway error when there is no load balancer in front of the EC2 machine?
The load balancer regularly does health checks on its target machines, i.e. it sends an HTTP or TCP request (as you have configured it). This way it knows what machines in its target pool are healthy and can take requests and which can't. It's supposed to balance the load between multiple machines after all.
When your EC2 machine does not have a running web server, its health check fails and it's seen as unavailable by the load balancer. Since apparently there's no other healthy machine in the pool, the load balancer cannot forward any requests to anything, and thus answers with a 502 Bad Gateway status.
The difference to just timing out when you try to access your EC2 machine directly is that in the case of a load balancer, there's still something that can accept and handle HTTP requests and return appropriate HTTP error codes. When you simply have no web server whatsoever, the connection cannot be accepted by anything and thus can only time out.
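The reasoning in this answer can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how any real load balancer is implemented: `tcp_health_check` and `forward` are hypothetical helpers, the health check is a plain TCP connect, and a real balancer would proxy the request instead of returning a fixed 200.

```python
import socket
from typing import List, Tuple

Target = Tuple[str, int]  # (host, port)


def tcp_health_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """A target counts as healthy if something accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def forward(targets: List[Target], timeout: float = 1.0) -> int:
    """Return the HTTP status the (hypothetical) balancer would answer with."""
    healthy = [t for t in targets if tcp_health_check(*t, timeout=timeout)]
    if not healthy:
        # The balancer itself is still up, so it can answer with an error code.
        return 502
    return 200  # stand-in for "request was proxied to a healthy target"


# Demo on localhost: first something listens on the port, then nothing does.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen()
target = listener.getsockname()

status_with_server = forward([target])
listener.close()
status_without_server = forward([target])

print(status_with_server, status_without_server)
```

The point of the sketch is that the error status comes from the balancer, which is still reachable, and not from the machine behind it.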

google run - does each container instance get 443, if that is what I need

I am trying to understand Google Cloud Run to deploy Docker containers on demand. I may have a load balancer on 443 and all that, but assuming there is no load balancer, will I be able to get 443 for all of, say, 10s or 100s of instances? Thanks!
It's serverless! It's mysterious and powerful!! In fact, you only have to worry about your code (here, your container with Cloud Run). You have to host a web server that answers HTTP requests, over HTTP and not HTTPS (by default on port 8080, but you can change it). That's all!!
Then deploy it. The deployment creates a service and a revision. Each new deployment creates a new revision, a unique combination of container and parameters. This way, if the new container and/or the new parameters of the new revision break your service, you can easily roll back to a previous stable revision.
When you serve traffic, Cloud Run is behind GFE (Google Front End), a Google-wide proxy in charge of SSL management (that's why you don't have to worry about HTTPS in your container) and of routing the traffic to your Cloud Run revisions. Here, the Cloud Run engine is in charge of instance creation (because Cloud Run scales to 0) and of load balancing the traffic between all the created instances. You have nothing to do, it's native.
So, take it easy, that's the future for the developers!
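As a rough illustration of the "plain HTTP web server in a container" described above, here is a minimal Python sketch using only the standard library. It assumes the listening port is read from the `PORT` environment variable with 8080 as the fallback; the handler, the response text and the function names are hypothetical.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Plain HTTP only: TLS is terminated in front of the container.
        body = b"hello from the container\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def listen_port() -> int:
    """Port to listen on: the PORT environment variable, else 8080."""
    return int(os.environ.get("PORT", "8080"))


def make_server() -> HTTPServer:
    # Bind on all interfaces so the platform can reach the process.
    return HTTPServer(("0.0.0.0", listen_port()), Handler)


def main() -> None:
    # Intended as the container entrypoint; blocks and serves until stopped.
    make_server().serve_forever()
```

Calling `main()` as the container's entrypoint is enough for this sketch. Each instance listens on its own internal port, and clients only ever talk HTTPS to the front end.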

EC2 instance is showing unhealthy after reboot

I have set up one cross-zone Application Load Balancer. Initially all instances are in healthy status, but after I reboot an instance, that particular instance shows as unhealthy in the target group.
Protocol HTTP
Path /
Port traffic port
Healthy threshold 5
Unhealthy threshold 5
Timeout 5
Interval 30
Success codes 200
Are you sure your webservice has successfully started back up after you rebooted it?
You'll likely need to provide much more information (such as OS + webservice) if you are to get a proper answer, but as a general suggestion you should remote into your EC2 instance and figure out whether the service (IIS, Apache, Nginx or otherwise) is actually running.
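It may also help to estimate how long a rebooted target needs before it can be reported healthy again with the settings listed in the question. The Python sketch below is a rough estimate only. It assumes checks are evenly spaced, that every check succeeds once the service is back, and that the threshold means consecutive successes; `recovery_window` is a hypothetical helper.

```python
from typing import Tuple


def recovery_window(healthy_threshold: int, interval_s: int) -> Tuple[int, int]:
    """Estimate (min, max) seconds from 'service is back' to 'marked healthy'.

    The first successful check lands somewhere between 0 and one interval
    after the service comes back; the remaining successes follow one
    interval apart.
    """
    remaining = (healthy_threshold - 1) * interval_s
    return remaining, remaining + interval_s


# Values from the question: healthy threshold 5, interval 30 seconds.
low, high = recovery_window(5, 30)
print(f"expect roughly {low} to {high} seconds before the target is healthy")
```

Under these assumptions the target stays unhealthy for about two to two and a half minutes after the web service is reachable again. Unhealthy status that persists well beyond that points to the service not having started.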

Docker Swarm load balance testing using Chrome

I've tried setting up a simple single-node swarm just like in Docker tutorial part 3, and I've found that if I use curl I jump between the two replicas, but if I use Chrome, then once I open the page all following requests are handled by the same replica. I'm sure I'm actually hitting it only once per request, because the counter increases only by 1.
What is happening? Is it some kind of feature in Docker Swarm load balancing? If so, how would it work? No specific request headers are sent to the server, so how would the load balancer recognize me? It can't be the IP, because if I use incognito mode I'm handled by a different replica and I stay stuck to it for as long as I'm in incognito.
It's not a Swarm thing, it's a Chrome thing. Curl acts like you'd expect: each command opens a new TCP connection, which shows up as a new connection going through the Swarm VIP load balancer.
Chrome (and other browsers) have lots of methods to keep TCP connections open for future requests (HTTP keep-alives, etc). This is why it will stay connected to the same container because the connection is persistent through the LB to the replica. The LB will only shift to the "next in the round-robin pool" for a new connection.
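The difference between the two clients can be reproduced locally with the Python standard library. This is a minimal sketch, not Swarm itself: a local HTTP/1.1 server numbers each incoming TCP connection, and that number stands in for "the replica the load balancer would pick for this connection". The `ConnectionCounter` handler and the `fetch` helper are hypothetical names.

```python
import http.client
import itertools
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

connection_ids = itertools.count(1)


class ConnectionCounter(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allows keep-alive (persistent connections)

    def setup(self):
        # Called once per TCP connection, not once per request.
        super().setup()
        self.connection_id = next(connection_ids)

    def do_GET(self):
        body = str(self.connection_id).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), ConnectionCounter)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]


def fetch(conn: http.client.HTTPConnection) -> str:
    conn.request("GET", "/")
    return conn.getresponse().read().decode()


# curl-like: a fresh TCP connection for every request.
curl_like = []
for _ in range(3):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    curl_like.append(fetch(conn))
    conn.close()

# browser-like: one persistent connection reused for every request.
conn = http.client.HTTPConnection("127.0.0.1", port)
browser_like = [fetch(conn) for _ in range(3)]
conn.close()

server.shutdown()
server.server_close()
print("curl-like connections:   ", curl_like)
print("browser-like connections:", browser_like)
```

The curl-like client is seen as three separate connections, so a connection-level balancer gets three chances to pick a replica. The browser-like client is seen as one connection, so it stays on whatever was picked first.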

AWS Load Balancer EC2 health check request timed out failure

I'm trying to get down and dirty with DevOps and I'm running into a health check request timed out failure. The problem is my Elastic Load Balancer sends a health check to my EC2 instance and gets a network timeout. I'm not sure what I did wrong. I am following this tutorial and I have completed all the steps up to and including "Using a Elastic Load Balancer". My EC2 instance seems to be working fine and I am able to successfully curl localhost on port 9292 from within the EC2 instance.
EC2 instance security group setup:
Elastic Load Balancer setup:
My target group for the ELB routing has port 9292 open via HTTP and here's a screenshot of the target in my target group that is unhealthy.
Health check config:
I have a VPC that my EC2 instance is a part of and my ELB is connected to the same VPC. I do not have Apache installed and I do not have nginx installed. To my understanding, I do not need these. I have a Rails Puma server running and I can send successful curl requests to the server.
My hunch is that my ELB is not allowed to reach my EC2 instance, resulting in a network timeout and a failed health check. I'm unable to find the cause for this. Any ideas? This SO post didn't help much. Are my security groups misconfigured? What else could potentially block a routing request from ELB to my EC2 instance?
Also, is there a way to view network requests / logs for my EC2 instance? I keep seeing VPC flow logging but I feel like there are simpler alternatives.
Here's something I posted in the AWS forums but to no avail.
UPDATE: I can curl the private IP of target just fine from within an EC2 instance. I don't think it's the target instance, I think it's something to do with the security group setup. I am unable to identify why though because I have basically allowed all traffic from the Load Balancer to the EC2 instance.
I made my mistake during the "Setup your VPC" step. I finished creating a subnet for an RDS instance. I proceeded to start an instance and the default subnet that AWS chose when I switched to my VPC was the subnet I made for my RDS, which was NOT a public subnet. Therefore, any attempts, from any EC2 instance or my load balancer, would not be able to reach it because I had only set up my public subnet to take requests.
The solution was to create a new instance and this time, pick the correct public subnet. My original EC2 instance was associated with a private subnet while the load balancer was pointing to the public subnet.
Here's a link to a hand-drawn image that helped me pinpoint my problem; hopefully it can help anyone else who's having trouble setting this up. I didn't put the image here directly because it's bigger than 2MB.
Glad to answer any further questions too!

Resources