Checking reason behind node failure - docker

I have a Docker Swarm setup with three nodes: node-1, node-2 and node-3. For some reason, one of my nodes fails every day; basically, its container exits. I ran docker logs <container id of swarm>, but the logs don't contain any information related to the node failure.
So, is there a log file where entries related to this failure can be seen? Or is this due to insufficient memory allocation?
Can anyone suggest how to dig into this problem and find a proper solution? As it stands, I have to restart my swarm nodes every day.

Like most containers, Swarm containers run and exit unless you use docker run with the -d option to "daemonize" them. For example:
$ docker run -d swarm join --advertise=172.30.0.69:2375 consul://172.30.0.161:8500
On the other hand, if you used Docker Machine to create the VMs, then also use Docker Machine to create the Swarm manager and nodes. By default, Docker Machine applies TLS authentication to the Docker Engine nodes. The easiest thing to do is to also create the Swarm manager and nodes at the same time as you create the Docker Engine nodes.
For more info, check out the brand new Swarm doc.
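To check the memory theory from the question, docker inspect reports an exit code and an OOMKilled flag for a stopped container. The following is a minimal sketch, not a definitive diagnosis tool; the helper names explain_exit and diagnose_container are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn an exit code and OOMKilled flag into a hint.
explain_exit() {
    code="$1"
    oom="$2"
    if [ "$oom" = "true" ]; then
        echo "killed by the kernel OOM killer (out of memory)"
    elif [ "$code" -eq 0 ]; then
        echo "exited normally"
    elif [ "$code" -gt 128 ]; then
        # By shell convention, 128 + N usually means "killed by signal N".
        echo "likely terminated by signal $((code - 128))"
    else
        echo "exited with error code $code"
    fi
}

# Hypothetical wrapper: read the recorded state of one container.
diagnose_container() {
    state=$(docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' "$1") || return 1
    # Word splitting is intentional: $state is "<code> <true|false>".
    # shellcheck disable=SC2086
    explain_exit $state
}
```

Find the stopped Swarm container with docker ps -a --filter status=exited, then run diagnose_container <container-id>.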

Related

Docker Swarm Mode - Show containers per node

I am using Docker version 17.12.1-ce.
I have set up a swarm with two nodes, and I have a stack running on the manager, while I am trying to instantiate new containers on the worker (not within a service, but as stand-alone containers).
So far I have been unable to find a way to instantiate containers on the worker specifically, and/or to verify that the new container actually got deployed on the worker.
I have read the answer to this question, which led me to run containers with the -e option specifying constraint:Role==worker, constraint:node==<nodeId> or constraint:<custom label>==<value>. I have also read this GitHub issue from 2016, which shows the docker info command outputting just the information I would need (i.e. how many containers are on each node at any given time). However, I am not sure whether this is a feature of the stand-alone swarm only, since docker info shows me only the number of nodes, with no detailed info for each node. I have also tried docker -D info.
Specifically, I need to:
Manually specify which node to deploy a stand-alone container to (i.e. not related to a service).
Check that a container is running on a specific swarm node, or check how many containers are running on a node.
Swarm commands only show service-related containers. If you create a container with docker run, then you'll need to use something like ssh node2 docker ps to see all containers on that node.
I recommend you do your best in a Swarm to have all containers as part of a service. If you need a container to run on nodeX, then you can create a service with a "node constraint" using labels and constraints. In this case you could restrict the single replica of that service to a node's hostname.
docker service create --constraint node.hostname==swarm2 nginx
To see all tasks on a node from any swarm manager:
docker node ps <nodename_or_id>
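To get the same view for every node at once, the command above can be wrapped in a loop. This is a minimal sketch that assumes it runs on a manager node; the function name list_tasks_per_node is hypothetical, and containers started with plain docker run will still not appear.

```shell
#!/bin/sh
# Hypothetical helper: print the service tasks scheduled on each swarm node.
# Only tasks that belong to services are listed.
list_tasks_per_node() {
    # `docker node ls -q` prints one node ID per line.
    for node in $(docker node ls -q); do
        echo "== $node =="
        docker node ps "$node"
    done
}
```

Calling list_tasks_per_node prints one section per node ID, each followed by the regular docker node ps table.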

docker-compose swarm without docker-machine

After looking through the official Docker swarm documentation, GitHub issues and Stack Overflow answers, I'm still at a loss as to why I am having this problem.
Issue at hand: docker-compose up starts services outside the swarm, even though the swarm is active and has 2 nodes.
I'm using Docker version 1.12.1.
Following the swarm tutorial, I was able to start and scale my swarm using docker service create without any issues.
Running docker-compose up with a version 2 docker-compose.yml results in services starting outside of the swarm; I can see them through docker ps but not docker service ls.
I can see docker-machine being presented as the tool that solves this problem, but then again it needs VirtualBox to be installed.
So my questions are:
Can I use docker-compose with docker-swarm (NOT docker-engine) without docker-machine and without the experimental build bundle functionality?
If docker service create can start a service on any node, is that an indication that the network configuration of the swarm is correct?
What are the advantages/disadvantages of docker-machine versus the experimental build functionality?
1) No. Docker Compose isn't integrated with the new Swarm Mode yet. Issue 3656 in GitHub is tracking that. If you start containers on a swarm with Docker Compose at the moment, it uses docker run to start containers, which is why you see them all on one node.
2) Yes. Actually you can use docker node ls on the manager to confirm all the nodes are up and active, and docker node inspect to check a particular node, you don't need to create a service to validate the swarm.
3) Docker Machine is also behind the 1.12 release, so if you start a swarm with Docker Machine it will be the 'old' type of swarm. The old Docker Swarm product needed a whole lot of extra setup for a key-value store, TLS etc. which Swarm Mode does for free.
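The docker node ls check from point 2 can be scripted. The sketch below assumes a Docker release recent enough to support --format on docker node ls (it may not be available on 1.12) and must run on a manager; the function name unready_nodes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the hostname of every node whose status
# is anything other than "Ready".
unready_nodes() {
    docker node ls --format '{{.Hostname}} {{.Status}}' |
        while read -r host status; do
            [ "$status" = "Ready" ] || echo "$host"
        done
}
```

An empty result means every node reports Ready; docker node inspect <node> gives the details for any hostname that is printed.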
1) You can't start services using docker-compose on the new Docker "Swarm Mode". There's a feature to convert a docker-compose file to the new dab format which is understood by the new swarm mode but that's incomplete and experimental at this point. You basically need to use bash scripts to start services at the moment.
2) The nodes in a swarm (swarm mode) interact using their own overlay network. It's the one named ingress when you do docker network ls. You need to set up your own overlay network to run services in, e.g.:
docker network create -d overlay mynet
docker service create --name serv1 --network mynet nginx
3) I'm not sure what feature you mean by "experimental build". docker-machine is just a way to create hosts (the nodes). It facilitates setting up the docker daemon and the certificates on each host, and allows some basic maintenance (renewing the certs, stopping/starting a host if you're the one who created it). It doesn't create or manage services, volumes or networks. That's the job of the docker api.

How to run same container on all Docker Swarm nodes

I'm just getting my feet wet with Docker Swarm because we're looking at ways to configure our compute cluster to make it more containerized.
Basically we have a small farm of 16 computers and I'd like to be able to have each node pull down the same image, run the same container, and accept jobs from an OpenMPI program running on a master node.
Nothing is really OpenMPI specific about this, just that the containers have to be able to open SSH ports and the master must be able to log into them. I've got this working with a single Docker container and it works.
Now I'm learning Docker Machine and Docker Swarm as a way to manage the 16 nodes. From what I can tell, once I set up a swarm, I can then set it as the DOCKER_HOST (or use -H) to send a "docker run", and the swarm manager will decide which node runs the requested container. I got this basically working using a simple node list instead of messing with discovery services, and so far so good.
But I actually want to run the same container on all nodes in one command. Is this possible?
Docker 1.12 introduced global services: passing --mode global to docker service create makes Docker schedule one task of the service on every node.
With standalone Docker Swarm, you can use labels and negative affinity filters to get the same result:
openmpi:
  environment:
    - "affinity:container!=*openmpi*"
  labels:
    - "com.myself.name=openmpi"
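For the Swarm Mode route (Docker 1.12 and later), the global service mentioned above could be created as in this minimal sketch. The function name run_everywhere and the image name my-openmpi-image are placeholders, not names from the question.

```shell
#!/bin/sh
# Hypothetical helper: run one task of an image on every node of a
# Swarm Mode cluster. Must be run against a manager node.
run_everywhere() {
    name="$1"
    image="$2"
    docker service create --mode global --name "$name" "$image"
}

# Example call (placeholder names):
#   run_everywhere openmpi my-openmpi-image
```

Nodes that join the swarm later also get a task automatically, which is the main difference from starting the container on each node by hand.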

Overlay network on Swarm Mode without Docker Machine

I currently have three hosts (docker1, docker2 and docker3) which I have not set up using Docker Machine, each one running the v1.12-rc4 Docker daemon.
I run docker swarm init on docker1, which in turn prints a docker swarm join command which I run on both docker2 and docker3. At that point, running docker info on each host contains the Swarm: active line.
It is at this point that the behavior seems to differ from what I used to get with the standalone Swarm container. Especially, running docker network ls will only show me the networks on the local host, and when trying to create an overlay network, it does not seem like worker nodes are aware of it (i.e. it does not show up on their docker network ls.)
I feel like I have missed out on some important information relating to the workings of the Swarm Mode as opposed to the Swarm container.
What is the correct way of setting up such a cluster without Docker Machine on Docker 1.12 while getting the overlay network feature?
I too thought this was an issue when I first started using it.
This works a little differently in 1.12rc4 - when you deploy a container to your swarm with that network attached to it, it should then create the network on the other nodes as well.
Hope this helps!
Issue
You are using the docker command (used to communicate with your localhost Docker daemon) and not the "swarm" command (used to communicate with the Swarm master).
Solution
It depends on the command you used to start Swarm.
A full step-by-step tutorial (including details on how to deploy an overlay network) is given in this answer. I'm sure that reading it will help you ;)
With a network scope of swarm, the network is only propagated to worker nodes on an as-needed basis. If you create a service using that network, and it gets scheduled on that worker node, the network will show up in the docker network ls.
With the now-upcoming 1.13 release, you can get a network that has similar behavior to the non-swarm networks by doing docker network create --attachable .... That network will be valid for both services and normal containers, and will be available to all members of the cluster. As of 1.13.0-rc2, those don't seem to show up in the output of docker network ls.
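As a minimal sketch of the attachable variant, assuming Docker 1.13 or later: the network is created on a manager, and a standalone container is then attached to it. The function name run_on_attachable_overlay is hypothetical; mynet and nginx are only example values.

```shell
#!/bin/sh
# Hypothetical helper: create an attachable overlay network, then start
# a standalone (non-service) container on it.
run_on_attachable_overlay() {
    net="$1"
    image="$2"
    # The network must be created on a manager node.
    docker network create --driver overlay --attachable "$net" &&
        docker run -d --network "$net" "$image"
}

# Example call (placeholder names):
#   run_on_attachable_overlay mynet nginx
```

In practice the docker run step can happen on any node in the swarm; both steps sit in one function here only to keep the example short.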

How to remove node from swarm?

I added three nodes to a swarm cluster using static file mode. I want to remove host1 from the cluster, but I can't find a docker swarm remove command:
Usage: swarm [OPTIONS] COMMAND [arg...]
Commands:
  create, c   Create a cluster
  list, l     List nodes in a cluster
  manage, m   Manage a docker cluster
  join, j     join a docker cluster
  help, h     Shows a list of commands or help for one command
How can I remove the node from the swarm?
Using Docker Version: 1.12.0, docker help offers:
➜ docker help swarm
Usage: docker swarm COMMAND
Manage Docker Swarm
Options:
      --help   Print usage
Commands:
  init         Initialize a swarm
  join         Join a swarm as a node and/or manager
  join-token   Manage join tokens
  update       Update the swarm
  leave        Leave a swarm
Run 'docker swarm COMMAND --help' for more information on a command.
So, next try:
➜ docker swarm leave --help
Usage: docker swarm leave [OPTIONS]
Leave a swarm
Options:
      --force   Force leave ignoring warnings.
      --help    Print usage
Using the swarm mode introduced in the docker engine version 1.12, you can directly do docker swarm leave.
The reference to "static file mode" implies the container based standalone swarm that predated the current Swarm Mode that most know as Swarm. These are two completely different "Swarm" products from Docker and are managed with completely different methods.
The other answers here focused on Swarm Mode. With Swarm Mode, running docker swarm leave on the target node causes that node to leave the swarm. Once the engine is no longer talking to the manager, running docker node rm for that node on an active manager cleans up any lingering references inside the cluster.
With the container based classic swarm, you would recreate the manager container with an updated static list. If you find yourself doing this a lot, the external DB for discovery would make more sense (e.g. consul, etcd, or zookeeper). Given the classic swarm is deprecated and no longer being maintained, I'd suggest using either Swarm Mode or Kubernetes for any new projects.
Try this:
docker node list # to get a list of nodes in the swarm
docker node rm <node-id>
Using the Docker CLI
I work with Docker Swarm clusters and to remove a node from the cluster there are two options.
It depends on where you want to run the command, within the node you want to remove or on a manager node other than the node to be removed.
The important thing is that the desired node must be drained before being removed to maintain cluster integrity.
First option:
I think the best approach is to follow the steps from the official documentation:
1) SSH into one of the manager nodes;
2) Optionally, list the nodes in your cluster;
3) Change the availability of the node you want to remove to drain;
4) Remove it.
# step 1
ssh user@node1cluster3
# step 2, list the nodes in your cluster
docker node ls
# step 3, drain one of them
docker node update --availability drain node4cluster3
# step 4, remove the drained node
docker node rm node4cluster3
Second option:
The second option needs two terminal logins, one on a manager node and one on the node you want to remove.
Perform the first three steps described in the first option to drain the desired node.
Afterwards, log in to the node you want to remove and run the docker swarm leave command.
# remove from swarm using leave
docker swarm leave
# OR, if the desired node is a manager, you can use force (be careful*)
docker swarm leave --force
*For information about maintaining a quorum and disaster recovery, refer to the Swarm administration guide.
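The manager-side steps above can be collected into one small function. This is a minimal sketch under a few assumptions: it runs on a manager node, the function name remove_node is hypothetical, and, as far as I know, docker node rm refuses a node that still reports Ready unless --force is given, so the target usually needs to have run docker swarm leave (or be down) first.

```shell
#!/bin/sh
# Hypothetical helper: drain a node and remove it from the swarm.
# Run on a manager node.
remove_node() {
    node="$1"
    # Stop scheduling new tasks on the node and move existing ones away.
    docker node update --availability drain "$node" || return 1
    # Show what is still assigned to the node, so the operator can confirm
    # that tasks have been rescheduled before the node is removed.
    docker node ps "$node"
    # Assumption: this fails if the node has not left the swarm yet.
    docker node rm "$node"
}

# Example call, using the node name from the steps above:
#   remove_node node4cluster3
```

Stopping at the first failed step keeps the function from removing a node that could not be drained.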
My environment information
I use Ubuntu 20.04 for nodes inside VMs;
With Docker version 20.10.9;
Swarm: active;
