I have a Docker Swarm cluster set up on my preprod servers (3 manager nodes and 7 worker nodes); I would like to replicate the same setup on my production servers, but rather than running the commands manually I would prefer to use a script.
At present I am using "docker swarm init" to initialize the swarm and then adding the workers and managers with the generated token.
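For reference, this is roughly the manual flow I am running today (the manager IP and tokens are placeholders):

# On the first manager
docker swarm init --advertise-addr <MANAGER-IP>
# Print the join tokens for the other nodes
docker swarm join-token manager
docker swarm join-token worker
# On each remaining server, using the token printed above
docker swarm join --token <TOKEN> <MANAGER-IP>:2377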
I will have 30 servers and am planning for 7 manager and 23 worker nodes.
I have searched the net but could not find any script that can initialize the Docker swarm automatically on all the servers.
Any help would be really appreciated.
The way I approached this was to have a common build script for all nodes, and to use Consul for sharing the docker swarm manager token with the rest of the cluster.
The first node (at 10.0.0.51) calls docker swarm init and places the token in the key value store, and the remaining nodes (at 10.0.0.52 onwards) read the token back and use it to call docker swarm join.
The bash script looked something like this:
# Get the node id of this machine from the last digit of its private IP (10.0.0.51, 10.0.0.52, ...)
privateNetworkIP=$(hostname -I | grep -o '10\.0\.0\.5.')
nodeId=$(echo "$privateNetworkIP" | tail -c 2)

if [ "$nodeId" -eq 1 ]; then
    # First node: initialise the swarm and publish the manager join token to Consul
    sudo docker swarm init
    MANAGER_KEY_IN=$(sudo docker swarm join-token manager -q)
    curl --request PUT --data "$MANAGER_KEY_IN" http://10.0.0.51:8500/v1/kv/docker-manager-key
else
    # Remaining nodes: read the token back from Consul and join the swarm
    MANAGER_KEY_OUT=$(curl -s http://10.0.0.51:8500/v1/kv/docker-manager-key?raw)
    sudo docker swarm join --token "$MANAGER_KEY_OUT" 10.0.0.51:2377
fi
... and this works fine provided that node 1 is built first.
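If the build order can't be guaranteed, a possible refinement (just a sketch, reusing the same Consul endpoint) is to poll until the token has been published before joining:

# Poll Consul until the manager token appears, then join
MANAGER_KEY_OUT=""
until [ -n "$MANAGER_KEY_OUT" ]; do
    MANAGER_KEY_OUT=$(curl -s http://10.0.0.51:8500/v1/kv/docker-manager-key?raw)
    [ -n "$MANAGER_KEY_OUT" ] || sleep 5
done
sudo docker swarm join --token "$MANAGER_KEY_OUT" 10.0.0.51:2377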
There is no built-in utility for that, but you can create your own script like this:
for i in $(cat app_server.txt); do
    echo "$i"
    ssh -i /path/to/your_key.pem "$i" "sudo docker swarm join --token your-token-here ip-address-of-manager:port"
done
Here app_server.txt contains the IP addresses of the worker nodes that you want to add to your swarm.
--token: the token generated by the manager on docker swarm init
Hope this may help.
You can also use Ansible for the same, but that requires the Ansible Docker modules to be installed on all the worker nodes.
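For example (just a sketch, assuming an inventory group named workers and your real token and manager address filled in), an ad-hoc Ansible command can push the join to every worker at once:

# Run the join command on all hosts in the "workers" inventory group (-b to use sudo)
ansible workers -b -a "docker swarm join --token your-token-here ip-address-of-manager:2377"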
Thank you!
Related
We are running a Docker swarm and using Monit to see resource utilisation. The process memory for dockerd keeps on growing over time. This happens on all nodes that perform at least one Docker action, e.g. docker inspect or docker exec. I suspect it might be related to these actions, but I'm not sure how to replicate it. I have a script like
#!/bin/sh
set -eu

# Collect the names of the currently running containers (skip the header row)
containers=$(docker container ls | awk '{if(NR>1) print $NF}')

# Loop forever, running inspect against each container
while true; do
    for container in $containers; do
        echo "Running inspect on $container"
        CONTAINER_STATUS="$(docker inspect "$container" -f "{{.State}}")"
    done
done
but I'm open to other suggestions
Assuming you can use Ansible to run a command via SSH on all servers:
ansible swarm -a "docker stats --no-stream"
A more SRE-style solution is containerd + Prometheus + Alertmanager / Grafana to gather metrics from the swarm nodes and then implement alerting when container thresholds are exceeded.
Don't forget you can simply set resource constraints on Swarm services to limit the amount of memory and CPU that service tasks can consume before they are restarted. Then just look for services that keep getting OOM-killed.
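For example (a sketch; the service name, image and limit values are placeholders), the limits can be set when creating the service:

# Cap the memory and CPU each task of this service may use;
# a task that exceeds the memory limit is OOM-killed and rescheduled
docker service create --name my-app \
    --limit-memory 256M \
    --limit-cpu 0.5 \
    --reserve-memory 128M \
    nginx:alpine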
When I run docker start, it seems the container might not be fully started at the time the docker start command returns. Is it so?
Is there a way to wait for the container to be fully started before the command returns? Thanks.
A common technique to make sure a container is fully started (i.e. services running, ports open, etc.) is to wait until a specific string is logged. See this example, Waiting until Docker containers are initialized, which deals with PostgreSQL and Rails.
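A minimal sketch of that log-watching approach (the container name is a placeholder; the log line shown is the one PostgreSQL prints when it is ready):

docker start my-postgres
# Block until the "ready" line shows up in the container logs
until docker logs my-postgres 2>&1 | grep -q "database system is ready to accept connections"; do
    sleep 1
done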
Edited:
There could be another solution using Docker's HEALTHCHECK. The idea is to configure the container with a health check command that is used to determine whether or not the main service is fully started and running normally.
The specified command runs inside the container and sets the health status to starting, healthy or unhealthy depending on its exit code (0 - container healthy, 1 - container not healthy). The status of the container can then be retrieved on the host by inspecting the running instance (docker inspect).
Health check options can be configured in the Dockerfile or when the container is run. Here is a simple example for PostgreSQL:
docker run --name postgres --detach \
    --health-cmd='pg_isready -U postgres' \
    --health-interval='5s' \
    --health-timeout='5s' \
    --health-start-period='20s' \
    postgres:latest && \
until docker inspect --format "{{json .State.Health.Status}}" postgres | \
    grep -m 1 "healthy"; do sleep 1; done
In this case the health command is pg_isready. A web service will typically use curl; other containers have their own specific commands.
The Docker community provides this kind of configuration for several official images here.
Now, when we restart the container (docker start), it is already configured and we need only the second part:
docker start postgres && \
until docker inspect --format "{{json .State.Health.Status}}" postgres | \
    grep -m 1 "healthy"; do sleep 1; done
The command will return when the container is marked as healthy.
Hope that helps.
Disclaimer: I'm not a Docker expert, and I would be glad to learn whether a better solution exists.
The Docker system doesn't really know that the container "may not be fully started".
So, unfortunately, there is nothing you can do about this in Docker itself.
Usually, the commands used by the creator of the Docker image (in the Dockerfile) are supposed to be organized so that the container is usable once the docker start command returns, and that is the best way. However, it's not always the case.
Here is an example:
LocalStack, which is a set of services for local development with AWS, has a Docker image, but once it has started, the S3 port, for example, is not yet ready to accept connections.
From what I understand, a port that is exposed but not yet ready is the typical situation you are referring to.
So, from my experience, the application that talks to the dockerized process should wrap its connection attempts to the server port in retries until it becomes available.
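A minimal sketch of that retry idea (the host and port are placeholders; adjust them to whatever the container exposes):

HOST=localhost
PORT=4572
# Keep probing the TCP port until it accepts connections
until nc -z "$HOST" "$PORT"; do
    echo "Waiting for $HOST:$PORT ..."
    sleep 1
done
echo "$HOST:$PORT is accepting connections"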
I have three servers in a Docker Swarm cluster (1 master and 2 nodes).
I want to start containers that execute a command which has an incremental value.
Doing that from bash simply using Docker was very easy:
for i in $(seq 1 6); do
    sudo docker run --cpuset-cpus="$i" -d -v /tmp/out:/data_out -t -i image /binary $i
done
As you can see, I'm starting several containers, each one using a specific CPU id and passing the same id as an argument to a binary inside the container.
How can I do the same but using my Docker Swarm cluster?
Thank you
I'm trying to set up a framework, using docker swarm, where I can connect from an external system (via ssh or whatever) into a specific service's container. So, I'm able to do this using something like:
ssh -o ProxyCommand="ssh ubuntu@10.0.0.18 nc 172.18.0.4 22" -l root foo
Here 10.0.0.18 is one of the swarm nodes and I then connect to the gateway bridge address (172.18.0.4) for that specific container.
In order to provide some automation around this, I'd like to be able to inspect whatever Docker object lets me map a container's ID to its bridge IP. I'd like to create a mapping of something like:
{
    container_id: {
        swarm_node: <Swarm node IP>,
        bridge_ip: <Container's bridge IP>
    }
}
However, I cannot see any kind of struct which shows the bridge info for a specific container. I can always exec into a given container and run ifconfig but I was hoping to avoid that.
Any pointers appreciated!
Try starting with this:
docker service ls -q \
| xargs docker service ps -f desired-state=running -q \
| while read task_id; do
docker inspect -f '{{printf "%.12s" .Status.ContainerStatus.ContainerID }}:
{ swarm_node: {{.NodeID}},
bridge_ip: {{range .NetworksAttachments}}{{if ne "ingress" .Network.Spec.Name }}{{.Addresses}}{{end}}{{end}}
}' $task_id
done
You may need to clean up the container IP a bit, since it comes out as a list of IPs with a subnet bit length included. And the swarm node is actually the node ID, not the node IP. I also hardcoded the exclusion for "ingress"; not sure if there's a cleaner way.
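For instance (a rough sketch of that cleanup), once a single address has been extracted, the /NN suffix can be stripped like this:

# Example only: drop the subnet length from an address such as 10.0.1.5/24
echo "10.0.1.5/24" | cut -d/ -f1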
To map node IDs to IP addresses, here's another one to work with:
docker node ls -q \
| xargs docker node inspect -f '{ {{.ID}}: {{.Status.Addr}} }'
I have installed Shipyard following the automatic procedure on their website. This works and I can access the UI. It's available at 172.31.0.179:8080. From the UI, I see a container called 'shipyard-discovery' which is exposing 172.31.0.179:4001.
I'm now trying to add an additional node to Shipyard. For that I use Docker Machine to provision an additional host, and on that host I'm using the following command to add the node to Shipyard:
curl -sSL https://shipyard-project.com/deploy | ACTION=node DISCOVERY=etcd://173.31.0.179:4001 bash -s
This additional node is not added to the Swarm cluster and is not visible in the Shipyard UI. On that second host I get the following output
-> Starting Swarm Agent
Node added to Swarm: 172.31.2.237
This indicates that the node is indeed not added to the Swarm cluster, as I was expecting something like: Node added to Swarm: 172.31.0.179
Any idea on why the node is not added to the Swarm cluster?
Following the documentation for manual deployment, you can add a Swarm agent by specifying its host IP:
docker run \
    -ti \
    -d \
    --restart=always \
    --name shipyard-swarm-agent \
    swarm:latest \
    join --addr [NEW-NODE-HOST-IP]:2375 etcd://[IP-HOST-DISCOVERY]:4001
I've just managed to make Shipyard see the nodes in my cluster. You have to follow the instructions in Node Installation and create a bash file that does the deploy for you, with the discovery IP set up.
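A sketch of such a wrapper (assuming 172.31.0.179, the address the Shipyard UI reports for shipyard-discovery, is the discovery host; adjust it to your environment):

#!/bin/bash
# Deploy a Shipyard node that registers against the discovery host
DISCOVERY_IP=172.31.0.179
curl -sSL https://shipyard-project.com/deploy | ACTION=node DISCOVERY=etcd://${DISCOVERY_IP}:4001 bash -s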