Service randomly fails in Docker 17.06 with swarm mode

Command used to create the service:
docker service create -d \
-p 8080:8080 \
--mount type=bind,source=/etc/localtime,target=/etc/localtime \
--mount type=bind,source=/etc/timezone,target=/etc/timezone \
--mount type=bind,source=/home/test/docker/manager,target=/root \
--network test-network \
--workdir /root \
--name test-manager \
--replicas 2 \
--limit-cpu 2 \
--limit-memory 4G \
java:8 java -Dspring.profiles.active=$PROFILE -jar -Xms512m -Xmx4096m /root/target/test-manager.jar
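One caveat worth flagging (an assumption about intent): $PROFILE in the command above is expanded by the shell on the manager where docker service create runs, not inside the container. If it is meant to come from the task's own environment, pass it through with --env, e.g.:
docker service create -d \
--env PROFILE=production \
--mount type=bind,source=/home/test/docker/manager,target=/root \
--network test-network \
--name test-manager \
--replicas 2 \
java:8 sh -c 'java -Dspring.profiles.active=$PROFILE -Xms512m -Xmx4096m -jar /root/target/test-manager.jar'
The single quotes keep the local shell from expanding $PROFILE; production is a placeholder value.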
After the service started, I tested it with curl 192.168.2.48:8080/info; about 50% of the requests did not work. I entered the containers with docker exec -it xxx bash and ran curl 10.0.1.6:8080/info and curl 10.0.1.7:8080/info; all the results were OK.
But if I restart the service several times, sometimes all requests work.
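To make the failure rate visible, a quick loop like this helps (a sketch; adjust the count and timeout as needed, and note that curl prints 000 when the request times out):
for i in $(seq 1 20); do
  curl -s -o /dev/null -w '%{http_code}\n' --max-time 2 192.168.2.48:8080/info
done
With two replicas behind the ingress VIP, roughly half of the codes coming back as 000 would fit one backend never answering.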
Network check (these are the swarm ports: 2377/tcp for cluster management, 7946/tcp+udp for node gossip, 4789/udp for VXLAN overlay traffic):
nc -vuz 192.168.2.48 4789
nc -vz 192.168.2.48 2377
nc -vuz 192.168.2.48 7946
nc -vz 192.168.2.48 7946
All succeeded.
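One caveat: nc -vuz can report success for a UDP port even when packets are silently dropped deeper in the path, so a more conclusive (if hedged) check is to watch the VXLAN port on one node while reproducing the failing curl from the other:
tcpdump -ni any udp port 4789
If no VXLAN packets show up during the failed requests, the overlay data plane between the nodes is blocked or misrouted.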
Results of docker info:
Containers: 4
Running: 3
Paused: 0
Stopped: 1
Images: 25
Server Version: 17.06.0-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 102
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: tdc32kn0n6bwcz32ljvprcmq0
Is Manager: true
ClusterID: hdyakushxu1c6rsk2cml7b0l3
Managers: 2
Nodes: 2
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Root Rotation In Progress: false
Node Address: 192.168.2.47
Manager Addresses:
192.168.2.47:2377
192.168.2.48:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-92-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.78GiB
Name: ubuntu-qgsp01
ID: RP4U:E3PW:AU5R:BLD2:2QDL:DA25:GY2P:YV67:IR2F:GEBZ:XVX3:XC72
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Registry Mirrors:
https://20agqwyc.mirror.aliyuncs.com/
Live Restore Enabled: false
I suspect the problem is caused by the network load balancing or the VIP: one node cannot be reached, so the request hangs. But if I ping one container from another, the network works. I am puzzled.
This problem has been bothering me for a long time; I hope someone can help.
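A way to isolate which layer is at fault (a diagnostic sketch, not a fix): publish the port in host mode so the ingress routing mesh and the VIP are bypassed entirely. If requests then succeed against every node that runs a task, the problem is in the mesh/VIP layer rather than in the application:
docker service create -d \
--publish mode=host,target=8080,published=8080 \
--mount type=bind,source=/home/test/docker/manager,target=/root \
--network test-network \
--name test-manager-hostmode \
--replicas 2 \
java:8 java -jar /root/target/test-manager.jar
test-manager-hostmode is a throwaway name for the experiment; in host mode each request lands only on the task running on the node you curl.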

Related

Rootless DinD running in a Kubernetes slave, 'docker run' fails

Has anyone seen / resolved the issue below?
I have a Jenkins slave with rootless DinD configured. All docker commands work except docker run; details and error below:
Error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:319: getting the final child's pid from pipe caused \"EOF\"": unknown.
ERRO[0004] error waiting for container: context canceled
Config and versions:
uname -a
Linux jnlp-5n7x4 4.4.0-1092-aws #103-Ubuntu SMP Tue Aug 27 10:21:48 UTC 2019 x86_64 Linux
docker info:
Server:
Containers: 1
Running: 0
Paused: 0
Stopped: 1
Images: 1
Server Version: 19.03.8
Storage Driver: vfs
Logging Driver: json-file
Cgroup Driver: none
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
init version: fec3683
Security Options:
seccomp
Profile: default
rootless
Kernel Version: 4.4.0-1092-aws
Operating System: Alpine Linux v3.11 (containerized)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.812GiB
Name: jnlp-5n7x4
ID: X54B:QFRO:NKMQ:YJMW:NEVU:QU2A:VDHC:RJBI:M3YQ:KUU6:C4N7:IXNN
Docker Root Dir: /home/jenkins/.local/share/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
Thanks in advance
I was able to resolve the issue by using a different default runtime for Docker.
# update the default runtime: install a static crun binary
RUN wget -O crun https://github.com/containers/crun/releases/download/0.13/crun-0.13-static-x86_64 \
&& cp crun /usr/local/bin \
&& chmod a+x /usr/local/bin/crun \
&& chown -R rootless:rootless /usr/local/bin/crun
The service is then started with supervisor; the config file is shown below:
[program:docker]
command=/home/rootless/bin/dockerd-rootless.sh --experimental --default-runtime crun --add-runtime crun=/usr/local/bin/crun --storage-driver vfs
autorestart=true
user=rootless
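Once dockerd comes up under supervisor, a quick sanity check (a sketch using the standard docker info template) confirms the default runtime actually switched:
docker info --format '{{.DefaultRuntime}}'
# expected output: crun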
Detailed discussion here:
https://github.com/moby/moby/issues/40068

"cAdvisor" is not allowing other containers to be removed

I have installed cAdvisor to monitor my containers on the host. Now, whenever I try to stop and remove any other container, cAdvisor prevents it from being removed.
# docker ps -a | grep -i api
790ddf58f54a container/container-abc "/bin/sh -c 'sh -c..." 42 hours ago Dead
# docker rm 790ddf58f54a
Error response from daemon: Unable to remove filesystem for 790ddf58f54acf041b3e33bc040ea035d43be92315c7d970f411ad56a855e627: remove /var/lib/docker/containers/790ddf58f54acf041b3e33bc040ea035d43be92315c7d970f411ad56a855e627/shm: device or resource busy
When I stop cAdvisor, it allows me to remove the containers.
# docker ps | grep -i cadvisor
b54e4acb3f36 google/cadvisor "/usr/bin/cadvisor..." 21 hours ago Up 21 hours 0.0.0.0:9911->8080/tcp cadvisor
# docker stop b54e4acb3f36
b54e4acb3f36
# docker rm 790ddf58f54a
790ddf58f54a
I referred to this GitHub issue, but it was of no help: https://github.com/moby/moby/issues/34198. I have also searched further; is there a permanent fix for this issue?
1) container/container-abc is created using the command below:
# docker run -d --restart=on-failure:5 --name=container-abc -p 15200:15200 -p 15201:15201 container-abc-image
2) I am using docker-compose for cAdvisor; below is the compose file content:
services:
  cadvisor:
    image: google/cadvisor
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - 9911:8080
    privileged: true
    restart: always
3) Below is the output of docker info:
# docker info
Containers: 38
Running: 24
Paused: 0
Stopped: 14
Images: 310
Server Version: 1.13.1
Storage Driver: overlay
Backing Filesystem: extfs
Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: 949e6fa
Security Options:
apparmor
Kernel Version: 3.12.74-60.64.85-default
Operating System: SUSE Linux Enterprise Server 12 SP1
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 31.27 GiB
ID: BTZ2:KQZM:VGL5:DN7P:LKEB:JMDY:57N6:JUC2:LIBA:UZWA:EU3T:CHWP
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 171
Goroutines: 132
System Time: 2018-07-16T03:58:55.156080332-07:00
EventsListeners: 0
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
WARNING: No kernel memory limit support
Experimental: false
Insecure Registries:
172.24.227.60:8090
127.0.0.0/8
Live Restore Enabled: false
Thanks in advance.
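For anyone debugging the same symptom: "device or resource busy" on a container's shm mount usually means some other process's mount namespace still holds a reference to it, and cAdvisor's bind mount of /var/lib/docker is the plausible holder here. A hedged diagnostic (the ID is the full container ID from the error above):
grep -l 790ddf58f54acf041b3e33bc040ea035d43be92315c7d970f411ad56a855e627 /proc/*/mountinfo
Every file that matches identifies the PID of a process whose mount namespace still references the container's mounts.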

Docker worker node promote failing

Steps to reproduce the issue:
1) Create a 5-node Docker cluster on AWS instances
2) Terminate the leader (primary manager) from the AWS console
3) docker node demote (primary-node) from the terminal
4) docker node rm (primary-node)
5) docker node promote (worker-node) / docker node update (worker-node) --role manager
After step 5, the status of the worker node is Down while its availability is Active, and the status never leaves Down.
I tried
docker node update --availability active (worker-node)
but it doesn't help.
Output of docker version:
Docker version 18.03.1-ce, build 9ee9f40
Output of docker info:
Containers: 5
Running: 0
Paused: 0
Stopped: 5
Images: 18
Server Version: 18.03.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: pw2cgzi62tr1g5yn42gdue9sd
Is Manager: true
ClusterID: 4l9ngpfnlqov063np7efi9idw
Managers: 2
Nodes: 4
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 172.31.4.129
Manager Addresses:
172.31.4.129:2377
172.31.6.143:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-1060-aws
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 990.7MiB
Name: ip-172-31-4-129
ID: CBXR:P6YF:ICCQ:R7LT:XM4M:BBMD:N4FN:ZPRI:3VOC:FO54:Y7I6:6LHK
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
The fix is as follows:
1) When the primary leader leaves the cluster, the swarm will choose a new leader.
2) Generate the manager join token on the new primary leader.
3) SSH into the worker, run docker swarm leave, then docker swarm join with the manager token (see the sketch below).
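Note that docker info above reports Managers: 2; in a two-manager swarm, losing one manager breaks raft quorum, which is likely why the promotion never takes effect. A concrete sketch of steps 2 and 3 (NEW_LEADER_IP is a placeholder):
# on the new leader:
docker swarm join-token manager
# on the stuck worker (--force, because it still believes it belongs to the old swarm):
docker swarm leave --force
docker swarm join --token <manager-token> NEW_LEADER_IP:2377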

Docker overlay DNS can't resolve container name

I have two Docker hosts with identical Docker versions and configs.
Since yesterday I have had a problem with the overlay networks on one of the hosts.
What I did:
docker network create -d overlay --attachable test_network
docker run --rm -it --network="test_network" --name test.DNS.name bash
wget test.DNS.name
On one host the response is as expected:
bash-4.4# wget test.DNS.name
Connecting to test.DNS.name (10.0.1.6:80)
wget: can't connect to remote host (10.0.1.6): Connection refused
On the other, the response is:
bash-4.4# wget test.DNS.name
wget: bad address 'test.DNS.name'
I have no idea where this could come from.
Every idea is welcome.
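One check worth making (a hedged diagnostic): container-name resolution on an attachable overlay goes through Docker's embedded DNS server at 127.0.0.11 inside the container, so querying it directly shows whether the name is registered at all. Inside the failing container:
nslookup test.DNS.name 127.0.0.11
If the embedded server itself returns nothing, the failure is in the daemon's service discovery rather than in wget.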
Here is the output of docker info (identical on both machines):
Containers: 3
Running: 2
Paused: 0
Stopped: 1
Images: 156
Server Version: 18.03.0-ce
Storage Driver: devicemapper
Pool Name: vg01-docker--pool
Pool Blocksize: 524.3kB
Base Device Size: 10.74GB
Backing Filesystem: xfs
Udev Sync Supported: true
Data Space Used: 22.89GB
Data Space Total: 42.95GB
Data Space Available: 20.06GB
Metadata Space Used: 6.922MB
Metadata Space Total: 5.583GB
Metadata Space Available: 5.576GB
Thin Pool Minimum Free Space: 4.295GB
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.140-RHEL7 (2017-05-03)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: m99xj3rhtk8ymabv1k78v4pur
Is Manager: true
ClusterID: of2g9dy3xbwj2jye6potfq5dq
Managers: 1
Nodes: 1
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 10.16.21.145
Manager Addresses:
10.16.21.145:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfd04396dc68220d1cecbe686a6cc3aa5ce3667c
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-693.21.1.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.4 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 62.74GiB
Name: gtunxlvd04346
ID: QPYZ:ZYLJ:DNXO:VCRC:Y2CG:LOEN:WSKN:52X5:JPNX:FLSB:GPIR:FY3U
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Thanks, Constantin, for your suggestion. We restarted the host that had the problem, and now it is working as expected.
So this problem can be solved with a restart of the host.
It would still be interesting to find the cause, but I don't see a relation between researching this issue further and the chance of it happening again.

docker - Error response from daemon: rpc error: code = 2 desc = name conflicts with an existing object

While creating a docker service, I'm facing the following error: Error response from daemon: rpc error: code = 2 desc = name conflicts with an existing object
Steps
docker-machine create --driver virtualbox swarm-1
docker-machine create --driver virtualbox swarm-2
docker-machine create --driver virtualbox swarm-3
eval $(docker-machine env swarm-1)
docker swarm init --advertise-addr $(docker-machine ip swarm-1)
docker-machine ssh swarm-2
docker swarm join <token> and IP
docker-machine ssh swarm-3
docker swarm join <token> and IP
docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
hdip26vwi9xvr131u1rr7yeia swarm-3 Ready Active
v7e56wf0j7fhkarnqsp5c32qo swarm-2 Ready Active
yjv3r4r4ls4qx47jnm0yov06u * swarm-1 Ready Active Leader
docker network create --driver overlay webnet
docker service create --name redisdb --network webnet --replicas 1 redis
Error response from daemon: rpc error: code = 2 desc = name conflicts with an existing object
I tried
docker service create --name redisdb --network webnet --replicas 1 redis:alpine
docker service create --name redisdb --network webnet --replicas 1 rlesouef/alpine-redis
but neither worked.
Any suggestion?
Adding additional information:
docker info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 1.13.1
Storage Driver: aufs
Root Dir: /mnt/sda1/var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 0
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: active
NodeID: p5bao7gz89hghllnykw8phaek
Is Manager: true
ClusterID: rn5xgfioygwp1b91gfm5znd7v
Managers: 1
Nodes: 3
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Node Address: 192.168.99.100
Manager Addresses:
192.168.99.100:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.4.47-boot2docker
Operating System: Boot2Docker 1.13.1 (TCL 7.2); HEAD : b7f6033 - Wed Feb 8 20:31:48 UTC 2017
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 995.8 MiB
Name: swarm-1
ID: JGLZ:XY2M:TTZX:DIT7:QCMX:DCNO:6BR4:IJVM:HOQ7:N3Y6:YGNG:LBD4
Docker Root Dir: /mnt/sda1/var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 41
Goroutines: 191
System Time: 2017-02-13T18:28:57.184074564Z
EventsListeners: 0
Username: pranaysankpal
Registry: https://index.docker.io/v1/
Labels:
provider=virtualbox
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Please suggest.
I encountered the same issue.
I solved it by doing the following:
1) Fetch the list of services by running sudo docker service ls. You should see the service you're trying to create (redisdb).
2) Take the ID shown next to the redisdb service in the list.
3) Run sudo docker service rm ID.
4) Now try to run the create command once again.
Hope that helps.
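The same cleanup as a one-liner (a sketch; it assumes redisdb is the only service whose name matches the filter):
sudo docker service rm $(sudo docker service ls --filter name=redisdb -q)
docker service rm also accepts the name directly, so sudo docker service rm redisdb works as well.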
