I have a Docker Swarm cluster formed by 6 ECS nodes (1 manager, 5 workers) running 34 services, with Portainer and Watchtower deployed globally.
I run Redis and MySQL (replicated mode, bound to a specifically labelled node) as Docker Swarm services, and the other services reach them via their service names. Normally this works as I expect.
However, if I update a service with the --force option (trying to make the change take effect immediately), some of the other services (not all of them, but quite a few) start receiving errors like 'redis: Name or service not known'. The thing is, I never updated the Redis service. If I enter one of those containers and ping the service name, it resolves to the correct IP.
The second weird phenomenon is that instances of the same service behave differently: some instances have the service-name resolution issue and some do not.
What is the cause of this? A DNS cache? How can I avoid it? What is the correct way to update a Docker Swarm service?
ID                            HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
pqcbjo1rgsta81nudzkkphbqt *   bc1        Ready    Active         Leader           20.10.7
u0za3ktnk0ih4zpmzvpij6h35     bc2        Ready    Active                          20.10.7
t1jp4z5kiyc4gvml56j5dpzj0     bc3        Ready    Active                          20.10.7
fa2umzxk5b6vun85vlbw9r7xu     bc4        Ready    Active                          20.10.17
cbamwvq2s5hmkvia035i9i2xw     bc5        Ready    Active                          20.10.17
9uj8g25oirfq8i3fgsvbup5ou     d2         Ready    Active                          20.10.7
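For reference, the forced update that triggers this is nothing special; it is essentially just the following (the service name here is a placeholder):

$ docker service update --force myapp_service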
Related
As we know, in Docker Swarm we can have more than one manager. Let's suppose that we have 2 nodes and 2 managers (so each node is both a manager and a worker).
Now, let a client (using the CLI tool) execute the following two separate scenarios:
1. docker service create some_service
2. docker service update --force some_service
where the client is launched on one of the swarm nodes.
Where will the above requests be sent? Only to the leader, or to each worker node? How does Docker deal with simultaneous requests?
I assume you're talking about the docker CLI talking to the manager API.
The docker cli on a node will default to connecting to localhost. Assuming you're on a manager, you can see which node your cli is talking to with docker node ls.
The * next to a node name indicates that's the one you're talking to.
From there, if that node isn't the Leader, it will relay the commands to the Leader node and wait for a response to return to your CLI. This all means:
Just ensure you're running the docker cli on a manager node or your cli is configured to talk to one.
It doesn't matter which manager, as they will all relay your command to the current Leader.
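If the CLI is running somewhere else, you can also point it at a manager's API endpoint explicitly; a minimal sketch (the manager1 hostname and plain-TCP port 2375 are assumptions about your setup, and the daemon has to be configured to listen there):

$ export DOCKER_HOST=tcp://manager1:2375
$ docker node ls

or equivalently pass the endpoint per command with docker -H tcp://manager1:2375 node ls.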
We are running Docker version 17.06.0-ce and I'm very new to Docker (at present learning it on the fly with little network/Linux knowledge/experience).
One of the environments we have runs a single manager and a single worker.
We are seeing the following two scenarios occur:
- Services are assigned a VIP that is already in use, and the service fails to start with an "Address already in use" error.
- A service starts and uses the same VIP as another service. This can be seen on the manager with "docker service inspect". This causes nginx to send requests to the wrong service (which can be seen in the logs).
Several questions:
1) Has anyone encountered this?
2) How does Docker Swarm decide which VIP to assign?
3) How does Docker know which VIP to use in a multi-worker environment? The reason I ask is that in a single manager/worker environment the VIP shown by "docker service inspect" on the manager is the same as the one shown by "docker network inspect ingress" on the worker, but in a multi manager/worker environment the VIPs are all different.
In my opinion you shouldn't address services by their VIP at all. Work only with the service names, and Docker Swarm will handle the load balancing for you.
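A minimal sketch of that approach (the network, service, and image names below are placeholders): attach the services to a shared overlay network so the backend can be reached by its service name, and have nginx proxy to http://api rather than to a hard-coded VIP.

$ docker network create -d overlay appnet
$ docker service create --name api --network appnet myapi:latest
$ docker service create --name nginx --network appnet -p 80:80 nginx:latest

Inside the nginx configuration you would then use the name, e.g. proxy_pass http://api;, and Swarm's internal DNS and VIP-based load balancing take care of the rest.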
I am having a problem figuring out the best way to add a new container to an existing cluster while all the containers run in Docker.
Assume I have a Docker swarm, and whenever a container stops/fails for some reason, the swarm brings up a new container and expects it to add itself to the cluster.
How can I make any container able to add itself to a cluster?
I mean, for example, if I want to create a RabbitMQ HA cluster, I need to create a master and then create slaves. Assuming every instance of RabbitMQ (master or slave) is a container, let's now assume that one of them fails. We have 2 options:
1) a slave container has failed.
2) the master container has failed.
Usually, a service that can run as a cluster also has the ability to elect a new leader to act as the master, so, assuming this scenario works seamlessly without any intervention, how would a new container added to the swarm (using Docker Swarm) be able to add itself to the cluster?
The problem here is that the new container is not created with new arguments each time; the container is always created as it was deployed the first time, which means I can't just change its command-line arguments, and this is a cloud, so I can't hard-code an IP to use.
Something here is missing.
Maybe declaring a "service" at the "Docker Swarm" level would actually give the new container the ability to add itself to the cluster without really knowing anything about the other machines in the cluster...
There are quite a few options for scaling out containers with Swarm. It can range from being as simple as passing in the information via a container environment variable to something as extensive as service discovery.
Here are a few options:
Pass in the IP as a container environment variable, e.g. docker run -td -e HOST_IP=$(ifconfig wlan0 | awk '/inet addr:/{gsub(/.*:/,"",$2); print $2}') somecontainer:latest
This would set the container environment variable HOST_IP to the IP of the machine it was started on.
Service Discovery. Querying a known point of entry to determine the information about any required services, such as IP, port, etc.
This is the most common type of scale-out option. You can read more about it in the official Docker docs. The high-level overview is that you set up a service like Consul on the masters, which your services query to find the information of other relevant services. Example: a web server requires a DB. The DB would register itself with Consul, and the web server would start up and query Consul for the database's IP and port (see the sketch after this list of options).
Network Overlay. Creating a network in swarm for your services to communicate with each other.
Example:
$ docker network create -d overlay mynet
$ docker service create --name frontend --replicas 5 -p 80:80/tcp --network mynet mywebapp
$ docker service create --name redis --network mynet redis:latest
This allows the web app to communicate with redis by placing them on the same network.
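Returning to the service-discovery option above, a rough sketch of what that lookup could look like (assuming a Consul agent reachable at consul:8500 and a service registered under the name db; both names are placeholders):

$ curl -s http://consul:8500/v1/catalog/service/db

This returns JSON entries whose fields (such as ServiceAddress and ServicePort) the web server can use to locate the database.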
Lastly, in your example above it would be best to deploy the master and the slaves as 2 separate services which you scale individually, e.g. deploy one MASTER service and one SLAVE service. You would then scale each depending on the number you need; e.g. to scale to 3 slaves you would run docker service scale <SERVICE-ID>=<NUMBER-OF-TASKS>, which would start the additional slaves. In this scenario, if one of the scaled slaves fails, swarm starts a new task to bring the number back to 3.
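A minimal sketch of that layout (the rabbit-master/rabbit-slave service names and plain rabbitmq image are placeholders, not a tested RabbitMQ HA setup):

$ docker network create -d overlay rabbitnet
$ docker service create --name rabbit-master --network rabbitnet rabbitmq:3
$ docker service create --name rabbit-slave --network rabbitnet --replicas 2 rabbitmq:3
$ docker service scale rabbit-slave=3

The last command grows the slave service; from then on Swarm keeps 3 slave tasks running, replacing any that fail.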
https://docs.docker.com/engine/reference/builder/#healthcheck
Docker images support a HEALTHCHECK instruction in the Dockerfile for exactly this.
Add a health check to your images, for example:
HEALTHCHECK CMD ./anyscript.sh || exit 1
(the command after CMD can be anything you want to run).
Docker checks the exit code of that command (0 = healthy, 1 = unhealthy) and reports the container status as:
1. healthy
2. unhealthy
3. starting (while the first checks run)
Docker Swarm automatically restarts unhealthy containers in the swarm cluster.
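If you can't modify the image, the same kind of check can also be attached when the service is created; a rough sketch using docker service create's health flags (the curl endpoint is just an assumed example):

$ docker service create --name web \
    --health-cmd "curl -f http://localhost/ || exit 1" \
    --health-interval 30s --health-retries 3 --health-timeout 5s \
    nginx:latest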
In Docker swarm mode I can run docker node ls to list swarm nodes, but it does not work on worker nodes. I need a similar function. I know worker nodes do not have a strongly consistent view of the cluster, but there should be a way to get the current leader, or at least a reachable manager.
So is there a way to get the current leader/manager from a worker node in Docker swarm mode 1.12.1?
You can get manager addresses by running docker info from a worker.
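Depending on your Docker version, you may be able to pull just those addresses out with a format string (the .Swarm.RemoteManagers field is an assumption about what your docker info exposes; check your release):

$ docker info --format '{{json .Swarm.RemoteManagers}}'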
The docs and the error message from a worker node mention that you have to be on a manager node to execute swarm commands or view cluster state:
Error message from a worker node: "This node is not a swarm manager. Worker nodes can't be used to view or modify cluster state. Please run this command on a manager node or promote the current node to a manager."
After further thought:
One way you could crack this nut is to use an external key/value store like etcd or any other key/value store that Swarm supports and store the elected node there so that it can be queried by all the nodes. You can see examples of that in the Shipyard Docker management / UI project: http://shipyard-project.com/
Another simple way would be to run a redis service on the cluster and another service to announce the elected leader. This announcement service would have a constraint to only run on the manager node(s): --constraint node.role == manager
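For instance, a sketch of pinning such an announcement service to the managers (the service name and image are placeholders):

$ docker service create --name leader-announcer \
    --constraint 'node.role == manager' \
    myorg/leader-announcer:latest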
In http://docs.docker.com/swarm/install-w-machine/
There are four machines:
Local, where swarm create will run
swarm-master
swarm-agent-00
swarm-agent-01
My understanding is that swarm-master will control the agents, but what is Local used for?
It is for generating the discovery token using the Docker Swarm image.
That token is used when creating the swarm master.
This discovery service associates a token with instances of the Docker Daemon running on each node. Other discovery service backends such as etcd, consul, and zookeeper are available.
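Concretely, in that (now legacy) tutorial the local machine runs roughly the following; the virtualbox driver and the <TOKEN> placeholder are assumptions carried over from that guide:

$ docker run --rm swarm create
$ docker-machine create -d virtualbox --swarm --swarm-master \
    --swarm-discovery token://<TOKEN> swarm-master

The first command prints the discovery token; the second creates the swarm master registered against that token.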
So the "local" machine is there to make sure the swarm manager discovers nodes. Its functions are:
register: register a new node
watch: callback method for the swarm manager
fetch: fetch the list of entries
See this introduction: