Network issues with Docker containers having a specific IP

Problem
I have a problem with one IP address (172.17.0.11) in my Docker network. Whenever a container gets this IP, outbound connections from the container stop working. When I kill this container:
I can still ping this IP even though nothing is using it.
There are no rules in iptables associated with this IP.
I see a lot of established connections held by docker-proxy in netstat for this IP, but other IPs in that list with dangling connections don't have any issues.
It looks like an IP conflict to me: curl doesn't work, and wget and ping work very slowly, probably because they re-establish the connection every time. This is not a DNS issue, since curl by IP doesn't work either, and it makes no difference which Docker image is used.
Infrastructure
It's a single-server setup on Debian 8 (4.9 kernel) with Kubernetes 1.6.4 and docker-ce 17.06.1 (overlay2). This issue appeared after I upgraded Docker from 1.12.6 to 17.06.1.
Please help me debug this issue.
docker version:
Client:
Version: 17.06.1-ce
API version: 1.30
Go version: go1.8.3
Git commit: 874a737
Built: Thu Aug 17 22:53:31 2017
OS/Arch: linux/amd64
Server:
Version: 17.06.1-ce
API version: 1.30 (minimum version 1.12)
Go version: go1.8.3
Git commit: 874a737
Built: Thu Aug 17 22:51:25 2017
OS/Arch: linux/amd64
Experimental: false
docker info:
Containers: 336
Running: 336
Paused: 0
Stopped: 0
Images: 52
Server Version: 17.06.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Kernel Version: 4.9.0-0.bpo.3-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 28.76GiB
Name: host
ID: QY6I:JI2S:BOPG:FIQP:YEBB:3UYF:N3G2:COCQ:PX7Z:QRCV:GIEN:FGQC
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

Did you try rebooting the faulty node? It looks like some namespace/bridge configuration might have gotten stuck.

This issue was caused by a desync between the Docker network (bridge plugin) and the actual state of the network on the host machine. The IP was released from the Docker network, but the associated virtual interface and related TCP connections were left intact. So when this IP was attached to a new container, network anomalies started to happen.
Most likely this happened after random Docker daemon hangs (which occurred with the older 1.12 version).
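One way to look for this kind of desync is to compare the addresses Docker believes are allocated on the bridge network with the container-side addresses that still show up in the host's connection table. The following is a minimal sketch, assuming you have saved the JSON printed by `docker network inspect bridge` and collected the IPs seen in netstat yourself; `find_stale_ips` and the sample data are hypothetical, not part of Docker.

```python
import json


def find_stale_ips(inspect_json, ips_with_connections):
    """Return IPs that still have connections but are not allocated to any container."""
    networks = json.loads(inspect_json)
    allocated = set()
    for network in networks:
        # "Containers" maps container IDs to endpoint details; it may be missing or null.
        for endpoint in (network.get("Containers") or {}).values():
            # IPv4Address is reported in CIDR form, e.g. "172.17.0.2/16".
            address = endpoint.get("IPv4Address", "")
            if address:
                allocated.add(address.split("/")[0])
    return set(ips_with_connections) - allocated


# Hypothetical sample: one running container, two IPs seen in netstat.
sample_inspect = json.dumps([
    {
        "Name": "bridge",
        "Containers": {
            "abc123": {"Name": "web", "IPv4Address": "172.17.0.2/16"},
        },
    }
])
seen_in_netstat = {"172.17.0.2", "172.17.0.11"}

print(find_stale_ips(sample_inspect, seen_in_netstat))
```

An address reported here is only a candidate. As the question notes, other IPs with dangling connections worked fine, so each hit still needs to be checked against the host's interfaces.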

Related

Container abruptly killed with warning "cleaning up after killed shim"

We recently upgraded from Docker version 17.06.0-ce to 18.09.2 in our deployment environment.
Since then, containers have been killed suddenly after running for a few days, without much information in the Docker logs.
We monitored memory usage, and the affected containers were well below all limits (per container; the host also has enough free memory).
Setup observations during the issue:
Docker version 18.09.2 with around 30 running containers.
A container was killed after running for a few days.
Docker Logs observed during container crash
Nov 16 15:42:11 site1 containerd[1762]: time="2020-11-16T15:42:11.171040904Z" level=info msg="shim reaped" id=d39355d3061d461ad4a305c717b699bd332aae50d47c2bf2b547bef50f767c7d
Nov 16 15:42:11 site1 containerd[1762]: time="2020-11-16T15:42:11.171156262Z" level=warning msg="cleaning up after killed shim" id=d39355d3061d461ad4a305c717b699bd332aae50d47c2bf2b547bef50f767c7d namespace=moby
Nov 16 15:42:11 site1 dockerd[3022]: time="2020-11-16T15:42:11.171164295Z" level=warning msg="failed to delete process" container=d39355d3061d461ad4a305c717b699bd332aae50d47c2bf2b547bef50f767c7d error="ttrpc: client shutting down: ttrpc: closed: unknown" module=libcontainerd namespace=moby process=b0d77b1ebf2c82b09c152530a5e24491d76e216b852e385686c46128c94e7f5a
Nov 16 15:42:11 site1 c73920e3476c[3022]: INFO: 2020/11/16 15:42:11.396872 [nameserver a6:0c:6a:18:69:1f] container d39355d3061d461ad4a305c717b699bd332aae50d47c2bf2b547bef50f767c7d died; tombstoning entry test-endpoint-s104.weave.local. -> 10.44.0.14
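When many such events are spread across a long journal, it can help to pull the affected container IDs out of the log automatically. The sketch below is an assumption-laden example: it treats each line as `key=value` pairs in the style shown above, and the helper names `parse_fields` and `killed_shim_ids` are hypothetical.

```python
import re

# key=value pairs where the value is either a double-quoted string or a bare token
FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_fields(line):
    """Extract key=value pairs from one log line into a dict."""
    fields = {}
    for key, value in FIELD.findall(line):
        # Strip the surrounding quotes from quoted values.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def killed_shim_ids(lines):
    """Return container IDs from 'cleaning up after killed shim' warnings."""
    ids = []
    for line in lines:
        fields = parse_fields(line)
        if fields.get("msg") == "cleaning up after killed shim" and "id" in fields:
            ids.append(fields["id"])
    return ids


# Two of the log lines quoted above.
sample_lines = [
    'Nov 16 15:42:11 site1 containerd[1762]: time="2020-11-16T15:42:11.171040904Z" level=info msg="shim reaped" id=d39355d3061d461ad4a305c717b699bd332aae50d47c2bf2b547bef50f767c7d',
    'Nov 16 15:42:11 site1 containerd[1762]: time="2020-11-16T15:42:11.171156262Z" level=warning msg="cleaning up after killed shim" id=d39355d3061d461ad4a305c717b699bd332aae50d47c2bf2b547bef50f767c7d namespace=moby',
]

print(killed_shim_ids(sample_lines))
```

The resulting IDs can then be matched against container names to see whether the same services are affected each time.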
Output of Docker version
Client:
Version: 18.09.2
API version: 1.39
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 04:13:50 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.2
API version: 1.39 (minimum version 1.12)
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 03:42:13 2019
OS/Arch: linux/amd64
Experimental: false
Output of Docker Info:
Containers: 30
Running: 25
Paused: 0
Stopped: 5
Images: 236
Server Version: 18.09.2
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: journald
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 09c8266bf2fcf9519a651b04ae54c967b9ab86ec
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-171-generic
Operating System: Ubuntu 16.04.6 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 62.92GiB
Name: fpas-site1-dra-director-a
ID: KKSM:3YNF:LE7N:NVFE:Y5C4:C6CN:LAQT:QRRZ:VYQS:O4PP:VQKG:DXTK
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
com.broadhop.swarm.uuid=uuid4:d96aef99-b5fc-44e3-b7fa-65b08b7e30f3
com.broadhop.swarm.role=endpoint-role
com.broadhop.swarm.node=
com.broadhop.swarm.hostname=site1
com.broadhop.swarm.mode=
com.broadhop.network.interfaces=internal:172.26.50.13
Experimental: false
Insecure Registries:
registry:5000
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
WARNING: API is accessible on http://127.0.0.1:2375 without encryption.
Access to the remote API is equivalent to root access on the host. Refer
to the 'Docker daemon attack surface' section in the documentation for
more information: https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface
WARNING: No swap limit support
NOTE:
This deployment runs on critical infrastructure, so we want to understand why this happened and make sure it does not occur again. Has anyone faced the same kind of issue in any environment? Please let us know if there are known issues with the Docker versions being used.
Your Go version is quite old; you may want to try updating. I found this issue on GitHub:
https://github.com/moby/moby/issues/38742

Container Build error - failed to shutdown container - container encountered an error during Shutdown

I am trying to build a container image on Windows Server 2019 Standard edition. The server runs in a VMware environment. While running docker build with a Dockerfile, I received the following error:
returned a non-zero code: 4294967295: failed to shutdown container: container 3bdxxxxx encountered an error during Shutdown: failure in a Windows system call: The interface is unknown. (0x6b5)
Docker Info
Server Version: 19.03.5
Storage Driver: windowsfilter
Windows:
Logging Driver: json-file
Plugins:
Volume: local
Network: ics internal l2bridge l2tunnel nat null overlay private transparent
Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
Swarm: inactive
Default Isolation: process
Kernel Version: 10.0 17763 (17763.1.amd64fre.rs5_release.180914-1434)
Operating System: Windows Server 2019 Standard Version 1809 (OS Build 17763.1282)
OSType: windows
Architecture: x86_64
CPUs: 2
Total Memory: 6GiB
Docker Root Dir: C:\ProgramData\docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Docker Version
Client: Docker Engine - Enterprise
Version: 19.03.5
API version: 1.40
Built: 11/13/2019 08:00:16
OS/Arch: windows/amd64
Experimental: false
Server: Docker Engine - Enterprise
Engine:
Version: 19.03.5
API version: 1.40 (minimum version 1.24)
Built: 11/13/2019 07:58:51
OS/Arch: windows/amd64
Experimental: false
In the Dockerfile, I am using:
FROM mcr.microsoft.com/dotnet/framework/runtime:4.8-windowsservercore-ltsc2019
A PowerShell command to install a Windows feature
A start-process command to start the application EXE
I receive the above error while executing "start-process".
The issue had multiple causes. I made the following changes to fix it:
Increased the number of CPU cores (the CPU reached 100% during the docker build operation, which caused the container to exit partway through).
Used the "--memory=16g" parameter when running docker build. Refer to Runtime options with Memory, CPUs, and GPUs for more details.
The application EXE was expecting a reboot, so I set "/noreboot" in the configuration.

Docker pull: operation not permitted

I'm getting this error when pulling some docker images (but not all):
failed to register layer: Error processing tar file(exit status 1): operation not permitted
For example: docker pull nginx works, but not docker pull redis.
I get the same result whether I run the command as a user that is part of the docker group, using sudo, or as root.
If I run dockerd in debug mode, I see this in the logs:
DEBU[0025] Downloaded 5233d9aed181 to tempfile /var/lib/docker/tmp/GetImageBlob023191751
DEBU[0025] Applying tar in /var/lib/docker/overlay2/e5290b8c50d601918458c912d937a4f6d4801ecaa90afb3b729a5dc0fc405afc/diff
DEBU[0027] Applied tar sha256:16ada34affd41b053ca08a51a3ca92a1a63379c1b04e5bbe59ef27c9af98e5c6 to e5290b8c50d601918458c912d937a4f6d4801ecaa90afb3b729a5dc0fc405afc, size: 79185732
(...)
DEBU[0029] Applying tar in /var/lib/docker/overlay2/c5c0cfb9907a591dc57b1b7ba0e99ae48d0d7309d96d80861d499504af94b21d/diff
DEBU[0029] Cleaning up layer c5c0cfb9907a591dc57b1b7ba0e99ae48d0d7309d96d80861d499504af94b21d: Error processing tar file(exit status 1): operation not permitted
INFO[0029] Attempting next endpoint for pull after error: failed to register layer: Error processing tar file(exit status 1): operation not permitted
INFO[0029] Layer sha256:938f1cd4eae26ed4fc51c37fa2f7b358418b6bd59c906119e0816ff74a934052 cleaned up
(...)
If I run watch -n 0 "sudo ls -lt /var/lib/docker/overlay2/" while the image is pulling, I can see new folders appearing (and disappearing after it fails), and the permissions on /var/lib/docker/overlay2/ are root:root:700, so I don't think it's exactly a permission issue.
Here are some details about the environment:
I have a Proxmox host running the LXC container where I'm having the issue.
The container itself is running Debian 8.
And here are the various versions:
$> uname -a
Linux [redacted-hostname] 4.10.15-1-pve #1 SMP PVE 4.10.15-15 (Fri, 23 Jun 2017 08:57:55 +0200) x86_64 GNU/Linux
$> docker version
Client:
Version: 17.06.0-ce
API version: 1.30
Go version: go1.8.3
Git commit: 02c1d87
Built: Fri Jun 23 21:20:04 2017
OS/Arch: linux/amd64
Server:
Version: 17.06.0-ce
API version: 1.30 (minimum version 1.12)
Go version: go1.8.3
Git commit: 02c1d87
Built: Fri Jun 23 21:18:59 2017
OS/Arch: linux/amd64
Experimental: false
$>docker info
Containers: 20
Running: 0
Paused: 0
Stopped: 20
Images: 28
Server Version: 17.06.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Kernel Version: 4.10.15-1-pve
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.906GiB
Name: resumed-dev
ID: EBJ6:AFVS:L3RC:ZEE7:A6ZJ:WDQE:GTIZ:RXHA:P4AQ:QJD7:H6GG:YIQB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 16
Goroutines: 24
System Time: 2017-08-17T14:17:07.800849127+02:00
EventsListeners: 0
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
EDIT: This will be fixed by any release after December 18, 2017 of Moby via this merge. Will update again when fully incorporated into Docker.
If your container is unprivileged, this appears to be an issue with the overlay2 storage driver for Docker. This does not appear to be an issue with overlay (GitHub issue). So either utilize the overlay storage driver instead of overlay2, or make your container privileged.
I have almost the same environment as you, and ran into the same problem.
Some images work perfectly (alpine), while others fail at cleanup (ubuntu).
Running strace -f dockerd -D and then docker pull or docker load reveals the reason:
mknodat(AT_FDCWD, "/dev/agpgart", S_IFCHR|0660, makedev(10, 175)) = -1 EPERM (Operation not permitted)
Unprivileged containers prohibit mknod by design. If you insist on nesting Docker inside LXC, you will have to use a privileged container. (Note that an existing unprivileged container cannot be converted directly to a privileged one because of the uid/gid mapping.)
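To check whether a given environment allows the call that fails in the strace output, a small probe can attempt the same mknod and report the result. This is a minimal sketch, assuming Linux and Python 3; `probe_mknod` is a hypothetical helper name, and it probes a temporary directory rather than /var/lib/docker, so treat the result as an indication only. Run it as root inside the LXC container, since a non-root user gets EPERM regardless of the container type.

```python
import errno
import os
import stat
import tempfile


def probe_mknod(directory):
    """Try to create a character device node.

    Returns None on success, or the errno name (e.g. "EPERM") on failure.
    """
    path = os.path.join(directory, "mknod-probe")
    try:
        # Same device type and numbers as the failing call in the strace output.
        os.mknod(path, stat.S_IFCHR | 0o660, os.makedev(10, 175))
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    # Clean up the node if it was created.
    os.remove(path)
    return None


with tempfile.TemporaryDirectory() as tmp:
    result = probe_mknod(tmp)

if result is None:
    print("mknod of a character device succeeded")
else:
    print(f"mknod failed with {result}")
```

If the probe reports EPERM as root, image layers that contain device nodes are likely to fail to unpack in that container in the same way.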

Building docker for the ARM-64 architecture

I have been trying to compile Docker for the ARM-64 architecture. Docker doesn’t officially support 64-bit ARM (at least not through the package management tools), so I have to build it from source. Building the Docker binaries requires Docker itself as a dependency. I’ve already managed to compile both the Docker daemon and the client via the following (hack) command:
./hack/make.sh dynbinary
However, I haven’t managed to run it successfully. Both binaries compile and run, but when I try to start the daemon, it complains about other dependencies:
Failed to connect to containerd. Please make sure containerd is installed in your PATH or you have specified the correct address. Got error: exec: "docker-containerd": executable file not found in $PATH
As I mentioned earlier, I cannot build all the binaries as they need docker itself running.
Looking forward to your help.
Two weeks ago, I was able to install Docker on a Pine64 running Armbian (Debian-based). It was as easy as following the official documentation for armhf, with one exception: replace [arch=armhf] with [arch=arm64] when you add the new apt source.
After the install, you have a real arm64 Docker running:
root@pine64:~# docker system info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 60
Server Version: 17.12.0-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 28
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 3.10.107-pine64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: aarch64
CPUs: 4
Total Memory: 979.6MiB
Name: pine64
ID: xxx
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: xxx
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

Docker fails pulling repository with error: Too Many Requests (HAP429)

I was trying to install gitlab using docker containers and was able to bring up gitlab successfully using docker compose file from sameersbn.
However, after a few uninstalls (docker rm) and reinstalls (docker-compose up) as part of CI testing, I started getting this weird error when running docker-compose up or docker run:
[root@server.com ~]# docker run java
Unable to find image 'java:latest' locally
Pulling repository docker.io/library/java
docker: Error while pulling image: Get https://index.docker.io/v1/repositories/library/java/images: malformed MIME header line: Too Many Requests (HAP429)..
See 'docker run --help'.
I can't seem to pull any Docker images using docker run or docker-compose.
I couldn't find much help online regarding this issue.
According to the Docker Hub forum (https://forums.docker.com/t/429-too-many-requests-how-to-fix-this-isssue/3971/7), the issue should disappear after an hour, but I waited half a day without any luck!
Here are the details of my installation:
[root@server build]# docker version
Client:
Version: 1.12.1
API version: 1.24
Go version: go1.6.3
Git commit: 23cf638
Built:
OS/Arch: linux/amd64
Server:
Version: 1.12.1
API version: 1.24
Go version: go1.6.3
Git commit: 23cf638
Built:
OS/Arch: linux/amd64
[root@server build]# docker info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 15
Server Version: 1.12.1
Storage Driver: devicemapper
Pool Name: docker-thinpool
Pool Blocksize: 524.3 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file:
Metadata file:
Data Space Used: 3.077 GB
Data Space Total: 61.2 GB
Data Space Available: 58.12 GB
Metadata Space Used: 1.204 MB
Metadata Space Total: 641.7 MB
Metadata Space Available: 640.5 MB
Thin Pool Minimum Free Space: 6.119 GB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Library Version: 1.02.107-RHEL7 (2015-10-14)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge null host overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.64 GiB
Name: server.com
ID: SDFS:SDEF:GKY5:UKGK:QHWR:H4EC:wEFw:YVAS:JE2V:A5YB:FDSW
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 17
Goroutines: 23
System Time: 2016-10-09T18:34:43.969512367-05:00
EventsListeners: 0
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
127.0.0.0/8
Any help would be much appreciated. I'm stuck with this error and can't proceed any further with my gitlab.
Thanks.
This may or may not be relevant to your situation, but I can report that I had the same error (didn't go away within an hour) and it was related to the fact that I was on a VPN to my office. I don't know if the VPN was the issue, or the NAT of my workplace, but when I turned off the VPN, the issue went away.
Note, I was running Docker for Windows (W7), so my circumstances are quite different from yours. But perhaps this answer will be useful to you or to anyone else looking for an answer.
Bottom line: If you are using a VPN, switch it off and try again. If you are behind a corporate firewall, try from outside.

Resources