Docker swarm redirects requests to the "new" and "old" versions of the app simultaneously, during the service update

I'm using this example https://github.com/BretFisher/node-docker-good-defaults
I have a Node.js backend app with 1 replica of the service.
And the "deploy" section in the docker-stack.yml file looks like this:
deploy:
  update_config:
    order: start-first
When I build a new version of the app and then update the service, during the update both the working "old" and "new" versions of the app serve requests at the same time.
Steps to reproduce:
build a new version of the app
update the service
send requests to the backend app every 10 ms.
Result: at some point in time I see responses from both the old and the new version.
Is it expected behavior?

Yes, this is expected.
Load balancing between multiple replicas of a swarm mode service in Docker is round robin between the replicas. There is no priority between replicas, not even for containers running on the same node or for the newer version; it's just round robin.
This is implemented per connection, so if you want to keep talking to the same container, you can keep a persistent connection open (e.g. HTTP keep-alive), but then you'll want an automatic retry in the app if the connection fails, to handle the container being replaced.
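A quick way to see the overlap described above, as a sketch assuming a service named backend published on port 3000 and an image tag v2 (all hypothetical names for illustration):

# Trigger the rolling update (service and image names are placeholders)
docker service update --image myorg/backend:v2 backend

# Meanwhile, hammer the endpoint every 10 ms; each curl invocation opens a
# fresh connection, so responses alternate between old and new mid-update
while true; do curl -s http://localhost:3000/version; echo; sleep 0.01; done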

Related

How can I ensure traffic is being served only by my new Cloud Run service deployment?

I am seeing in my Stackdriver logs that an old revision of a Cloud Run service is still running, while the console shows only one newer revision as live.
In this case I have a Cloud Run container built in one GCP project but deployed in a second project, using the fully specified image name. (My attempts at Terraform were sidetracked into an auth catch-22, so until I have restored determination, I am manually deploying.) I don't know if this wrinkle of the two projects is relevant to my question. Otherwise all the auth works.
In brief, why might I be seeing old deployments receiving traffic minutes after the new deployment is made? Even more than 30 minutes later, traffic is still reaching the old deployment.
There are a few things to take into account here:
Try explicitly telling Cloud Run to migrate all traffic to the latest revision. You can do that by adding the following to your .yaml file:
metadata:
  name: SERVICE
spec:
  ...
  traffic:
  - latestRevision: true
    percent: 100
Try always adding the :latest tag when building a new image,
so instead of only having, say, gcr.io/project/newimage it would be gcr.io/project/newimage:latest. This way you ensure the latest image is used rather than a previously, automatically assigned tag.
If neither fixes your issue, then please provide the logs, as there might be something useful that indicates the root cause. (Also let us know if you are using any caching config.)
You can tell Cloud Run to route all traffic to the latest revision with gcloud run services update-traffic [SERVICE NAME] --to-latest. That will route all traffic to the latest deployment, and update the traffic allocation as you deploy new revisions.
You might not want to use this if you need to validate the service after deployment and before "opening the floodgates", or if you're doing canary deployments.
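If you do want to validate first, a minimal sketch of that flow (the service and image names here are placeholders, not from the question):

# Deploy a new revision without routing any traffic to it yet
gcloud run deploy myservice --image gcr.io/project/newimage:latest --no-traffic

# After validating the new revision, shift all traffic to it
gcloud run services update-traffic myservice --to-latest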

What steps does docker swarm take when doing a rolling update with start-first?

When docker swarm does a rolling update with stop-first on multiple running container instances, it takes, among others, the following steps in order for each container in turn:
Remove the container from its internal load balancer.
Send a SIGTERM signal to the container.
If the container has not exited within the stop-grace-period, send a SIGKILL signal.
Start a new container.
Add the new container to its internal load balancer.
But which order of steps is taken when I want to do a rolling update with start-first?
Will the old and new container be available through the load balancer at the same time (until the old one has stopped and been removed from the LB)?
Or will the new container be started first but not added to the load balancer until the old container is stopped and removed from it?
The latter would be necessary for processes that are bound to a specific instance of a service (container).
But which order of steps is taken when I want to do a rolling update with start-first?
It's basically the reverse: the new container starts and is added to the LB, then the old one is removed from the LB and sent the shutdown signal.
Will the old and new container be available through the load balancer at the same time (until the old one has stopped and been removed from the LB)?
Yes.
A reminder that most of this will not be seamless (or near zero downtime) unless you (at a minimum) have healthchecks enabled in the service. I talk about this a little in this YouTube video.
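For reference, a minimal stack-file sketch combining start-first with a healthcheck; the image name and probe endpoint are assumptions, not from the question:

version: "3.8"
services:
  web:
    image: myorg/myapp:latest   # hypothetical image
    healthcheck:
      # hypothetical health endpoint; swarm only swaps containers in the LB
      # once this check passes on the new container
      test: ["CMD", "curl", "-f", "http://localhost:8080/healthz"]
      interval: 5s
      timeout: 3s
      retries: 3
    stop_grace_period: 30s
    deploy:
      replicas: 1
      update_config:
        order: start-first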

How can I deploy to docker swarm using docker stack with services started in order

I have an issue with Docker Swarm.
I have tried to deploy my app with Docker Swarm mode.
But I cannot arrange for my services to start in order, although I used depends_on (it is not supported by docker stack deploy).
How can I deploy so that services start in order?
Ex:
Service 1 starts
Service 2 waits for Service 1
Please help.
This is not supported by Swarm.
Swarm is designed for high availability. When encountering problems (services or hosts fail), services will be restarted in the order they failed.
If you have hard dependencies between your services and they can't handle waiting for the other service to become available, or reconnecting, your system won't work.
Your services should be written in a way that they can handle any service being redeployed at any time.
No orchestration system recommends or supports this feature.
So forget about it, because this is a very bad idea.
Application infrastructure (here, the container) should not depend on the database's health; your application itself must handle the database's health.
You see the difference?
For instance, the application could display an error message like "Not ready yet" or "This feature is disabled because elasticsearch is down", etc.
So even if it is possible to implement this pattern (aka "wait-for"; with Kubernetes, you can use an initContainer to wait for another service to be up and ready), I strongly recommend moving this logic into your application.
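For illustration only, the "wait-for" pattern mentioned above is usually a shell entrypoint wrapper along these lines (the host and port are placeholders), though as argued here the retry really belongs inside the application:

#!/bin/sh
# Block until the dependency accepts TCP connections (db:5432 is a placeholder)
until nc -z db 5432; do
  echo "waiting for database..."
  sleep 1
done
exec "$@"   # then hand off to the real process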

Docker swarm load balancing - How to give common name to the service?

I read about the swarm routing mesh.
I create a simple service which uses tomcat server and listens at 8080.
docker swarm init — I created a manager at node 1.
docker swarm join --token … — I used the token provided by the manager at node 2 and node 3 to create workers.
docker service create image — I created the service.
docker service scale imageid=5 — I scaled it.
docker service ps shows 5 instances of my service: 3 running on node 1, 1 on node 2, and 1 on node 3.
My application uses an atomic counter which is maintained at the JVM level.
If I hit http://node1:8080/service 25 times, all requests go to node 1. How does it balance across nodes?
If I hit http://node2:8080/service, it goes to node 2.
Why is it not using round-robin?
Doubts:
Is anything wrong in the above steps?
Did I miss something?
I feel I am missing something, like a common service name such as http://domain:8080/service with which swarm would work in round-robin fashion.
I would like to understand only swarm mode. I am not interested in an external load balancer as of now.
How do I see swarm load balancing in action?
Docker does round robin load balancing per connection to the port. As long as a connection is up, it will continue to go to the same instance.
HTTP allows a connection to be kept alive and reused. Browsers take advantage of this behavior to speed up later requests by leaving connections open. To test the round robin load balancing, you'd need to either disable that keep-alive setting or switch to a command line tool like curl or wget.
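For example, assuming the response exposes something that identifies the serving container (such as the per-JVM counter in the question), a loop like this opens a fresh connection per request and should rotate across the replicas:

# Each curl invocation is a new connection, so requests round-robin
for i in $(seq 1 10); do
  curl -s http://node1:8080/service
  echo
done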

How to keep a certain number of Docker containers running the same application and add/remove them as needed?

I've been working with Docker containers. What I've done is launch 5 containers running the same application; I use HAProxy to redirect requests to them, added a volume to preserve data, and set the restart policy to Always.
It works (so far this is my load-balancing approach), but sometimes I need another container to join the pool as there might be more requests, or maybe at first I don't need all 5 containers.
This is provided by the Swarm Mode addition in Docker 1.12. It includes orchestration that lets you not only scale your service up or down, but recover from an outage by automatically rescheduling the jobs to run on other nodes.
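As a sketch, the commands look like this (service and image names are placeholders):

# Create a service with 5 replicas behind swarm's built-in load balancer
docker service create --name web --replicas 5 -p 80:80 myimage:latest

# Add or remove replicas as demand changes
docker service scale web=8
docker service scale web=3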
If you don't want to use Docker 1.12 (yet!), you can also use a service-discovery tool like Consul, register your containers in it, and use a tool like Consul Template to regenerate your load balancer configuration accordingly.
I gave a talk about this 6 months ago. You can find the code and the configuration I used during my demo here: https://github.com/bargenson/dockerdemo
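As a rough sketch of that approach, a Consul Template snippet for an HAProxy backend might look like this (the service name and file paths are assumptions):

# haproxy.ctmpl — regenerate the backend list from Consul's service catalog
backend app
  balance roundrobin{{ range service "myapp" }}
  server {{ .Node }} {{ .Address }}:{{ .Port }} check{{ end }}

# Rewrite the config and reload HAProxy whenever membership changes:
# consul-template -template "haproxy.ctmpl:/etc/haproxy/haproxy.cfg:service haproxy reload"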
