Intermittent connection failures between Docker containers - docker

Description
I am experiencing intermittent communication issues between containers on the same overlay network. I have been struggling to find a solution for weeks, but everything I find on Google about communication issues doesn't quite match what I am seeing, so I am hoping someone here can help me figure out what is going on.
We are using Docker 17.06
We are using standalone swarm with three masters and one node.
We have multiple overlay networks
Containers attached to each overlay network:
1 container running Apache Tomcat 8.5 and HAproxy 1.7 (called the controller)
1 container just running Apache Tomcat 8.5 (called the apps container)
3 containers running Postgresql 9.6
1 container running an FTP service
1 container running Logstash
Steps to reproduce the issue:
Create a new overlay network
Attach containers
Watch the logs; after a short while you see the errors
Describe the results you received:
The "controller" polls a servlet on the "apps" container every few seconds.
Every 15 minutes or so we see a "connect timed out" error in the controller's log files, and periodically we see a "connection attempt failed" error when the controller tries to reach its database in one of the PostgreSQL containers.
Error when polling apps container
org.apache.http.conn.ConnectTimeoutException: Connect to srvpln50-webapp_1.0-1:5050 [srvpln50-webapp_1.0-1/10.0.1.6] failed: connect timed out
Error when trying to connect to database
JavaException: com.ebasetech.xi.exceptions.FormRuntimeException: Error getting connection using Database Connection CONTROLLER, SQLEx
ception in StandardPoolDataSource:getConnection exception: java.sql.SQLException: SQLException in StandardPoolDataSource:getConnection no connection available java.sql.SQLException: Cannot
get connection for URL jdbc:postgresql://srvpln50-controller-db_latest:5432/ctrldata : The connection attempt failed.
I turned on debug mode on the Docker daemon on the node.
Every time these errors occur I see the following corresponding entries in the Docker logs:
Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.422797691Z" level=debug msg="Name To resolve: srvpln50-webapp_1.0-1."
Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.422905040Z" level=debug msg="Lookup for srvpln50-webapp_1.0-1.: IP [10.0.1.6]"
Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.648262289Z" level=debug msg="miss notification: dest IP 10.0.0.3, dest MAC 02:42:0a:00:00:03"
Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.716329366Z" level=debug msg="miss notification: dest IP 10.0.0.6, dest MAC 02:42:0a:00:00:06"
Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.716952000Z" level=debug msg="miss notification: dest IP 10.0.0.6, dest MAC 02:42:0a:00:00:06"
Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.802320875Z" level=debug msg="miss notification: dest IP 10.0.0.3, dest MAC 02:42:0a:00:00:03"
Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.944189349Z" level=debug msg="miss notification: dest IP 10.0.0.9, dest MAC 02:42:0a:00:00:09"
Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.944770233Z" level=debug msg="miss notification: dest IP 10.0.0.9, dest MAC 02:42:0a:00:00:09"
IP 10.0.0.3 is the "controller" container
IP 10.0.0.6 is the "apps" container
IP 10.0.0.9 is the "postgresql" container that the "controller" is trying to connect to.
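Each error burst lines up with "miss notification" entries, so a quick way to see which endpoints are affected is to count destination IPs in the daemon debug log. A minimal sketch, assuming the relevant journal lines have been saved to a file named daemon.log (a hypothetical path):

```shell
# On the host, the input could be collected with something like:
#   journalctl -u docker | grep 'miss notification' > daemon.log
# Count how often each destination IP appears in a miss notification:
grep -o 'dest IP [0-9.]*' daemon.log | sort | uniq -c | sort -rn
```

For the excerpt above this ranks 10.0.0.3, 10.0.0.6 and 10.0.0.9, i.e. exactly the controller, apps and postgresql containers.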
Describe the results you expected:
No connection errors between the containers.
Additional information you deem important (e.g. issue happens only occasionally):
Output of docker version:
Client:
Version: 17.06.1-ce
API version: 1.30
Go version: go1.8.3
Git commit: 874a737
Built: Thu Aug 17 22:51:12 2017
OS/Arch: linux/amd64
Server:
Version: 17.06.1-ce
API version: 1.30 (minimum version 1.12)
Go version: go1.8.3
Git commit: 874a737
Built: Thu Aug 17 22:50:04 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 19
Running: 19
Paused: 0
Stopped: 0
Images: 18
Server Version: 17.06.1-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 385
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-108-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.784GiB
Name: swarm-node-1
ID: O5ON:VQE7:IRV6:WCB7:RQO4:RIZ4:XFHE:AUCX:ZLM2:GPZL:DXQO:BCIX
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 217
Goroutines: 371
System Time: 2018-02-09T15:50:01.902816981Z
EventsListeners: 2
Registry: https://index.docker.io/v1/
Labels:
name=swarm-node-1
Experimental: false
Cluster Store: etcd://localhost:2379/store
Cluster Advertise: 10.80.120.13:2376
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Additional environment details (AWS, VirtualBox, physical, etc.):
Swarm masters, node and containers are running Ubuntu 16.04 on bare metal servers
If there is anything I have missed that would aid diagnosis, please let me know.

Having read many comments from the Docker folks about communication issues being fixed in recent releases, we upgraded to 17.12 CE and all the issues we were experiencing went away.
Would love to know what the issue was, but am more than happy to see it gone.

Related

Container abruptly killed with warning "cleaning up after killed shim"

We recently upgraded Docker from 17.06.0-ce to 18.09.2 on our deployment environment.
A container was killed abruptly after running for a few days, with little information in the Docker logs.
We monitored memory usage, and the affected containers were well below all limits (per container, and the host also has enough free memory).
Setup observations during the issue:
Docker version 18.09.2 with around 30 running containers.
The affected container was killed after running for a few days.
Docker logs observed during the container crash:
Nov 16 15:42:11 site1 containerd[1762]: time="2020-11-16T15:42:11.171040904Z" level=info msg="shim reaped" id=d39355d3061d461ad4a305c717b699bd332aae50d47c2bf2b547bef50f767c7d
Nov 16 15:42:11 site1 containerd[1762]: time="2020-11-16T15:42:11.171156262Z" level=warning msg="cleaning up after killed shim" id=d39355d3061d461ad4a305c717b699bd332aae50d47c2bf2b547bef50f767c7d namespace=moby
Nov 16 15:42:11 site1 dockerd[3022]: time="2020-11-16T15:42:11.171164295Z" level=warning msg="failed to delete process" container=d39355d3061d461ad4a305c717b699bd332aae50d47c2bf2b547bef50f767c7d error="ttrpc: client shutting down: ttrpc: closed: unknown" module=libcontainerd namespace=moby process=b0d77b1ebf2c82b09c152530a5e24491d76e216b852e385686c46128c94e7f5a
Nov 16 15:42:11 site1 c73920e3476c[3022]: INFO: 2020/11/16 15:42:11.396872 [nameserver a6:0c:6a:18:69:1f] container d39355d3061d461ad4a305c717b699bd332aae50d47c2bf2b547bef50f767c7d died; tombstoning entry test-endpoint-s104.weave.local. -> 10.44.0.14
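The long hex string in the "cleaning up after killed shim" line is the container ID, so the victim can be identified mechanically. A sketch, assuming the journal lines were saved to daemon.log (hypothetical path):

```shell
# Extract the 64-character container ID from the "killed shim" warning:
id=$(grep 'cleaning up after killed shim' daemon.log | grep -o '[0-9a-f]\{64\}' | head -n1)
echo "$id"
# On the affected host, map the ID back to a container name:
#   docker inspect --format '{{.Name}}' "$id"
```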
Output of Docker version
Client:
Version: 18.09.2
API version: 1.39
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 04:13:50 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.2
API version: 1.39 (minimum version 1.12)
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 03:42:13 2019
OS/Arch: linux/amd64
Experimental: false
Output of Docker Info:
Containers: 30
Running: 25
Paused: 0
Stopped: 5
Images: 236
Server Version: 18.09.2
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: journald
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 09c8266bf2fcf9519a651b04ae54c967b9ab86ec
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-171-generic
Operating System: Ubuntu 16.04.6 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 62.92GiB
Name: fpas-site1-dra-director-a
ID: KKSM:3YNF:LE7N:NVFE:Y5C4:C6CN:LAQT:QRRZ:VYQS:O4PP:VQKG:DXTK
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
com.broadhop.swarm.uuid=uuid4:d96aef99-b5fc-44e3-b7fa-65b08b7e30f3
com.broadhop.swarm.role=endpoint-role
com.broadhop.swarm.node=
com.broadhop.swarm.hostname=site1
com.broadhop.swarm.mode=
com.broadhop.network.interfaces=internal:172.26.50.13
Experimental: false
Insecure Registries:
registry:5000
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
WARNING: API is accessible on http://127.0.0.1:2375 without encryption.
Access to the remote API is equivalent to root access on the host. Refer
to the 'Docker daemon attack surface' section in the documentation for
more information: https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface
WARNING: No swap limit support
NOTE:
This deployment is on critical infrastructure, so we want to understand why this happened and make sure it does not occur again. Has anyone faced the same kind of issue in any environment? Please let us know if there are known issues with the Docker versions being used.
Your Go version is quite old; you may want to update. I found this issue on GitHub:
https://github.com/moby/moby/issues/38742

Network issues with docker containers having specific IP

Problem
I have a problem with one IP address (172.17.0.11) in my Docker network. Whenever a container gets this IP, outbound connections from the container stop working. When I kill this container:
I can still ping this IP even though no container is using it
There are no rules in iptables associated with this IP
I see a lot of established connections by docker-proxy in netstat for this IP, but at the same time other IPs from this list with dangling connections don't have any issues
It looks like an IP conflict to me: curl doesn't work, and wget and ping work very slowly, probably because they re-establish the connection every time. It is not a DNS issue, since curl by IP doesn't work either, and the Docker image used makes no difference.
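For a suspected conflict on a single address, a few host-side checks can narrow things down. A sketch, with the problem address 172.17.0.11 from above substituted in (these commands need to run on the affected host, mostly as root):

```shell
IP=172.17.0.11
ip neigh show | grep "$IP"          # stale ARP/neighbour entries for the address
sudo ss -tnp | grep "$IP"           # lingering TCP connections (e.g. docker-proxy)
sudo iptables-save | grep "$IP"     # any NAT/filter rules still referencing it
ip -o addr show | grep "$IP"        # interfaces that still hold the address
```

If the last command matches on an interface no container owns, that points to the leaked-interface scenario described in the accepted answer below.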
Infrastructure
It's a single-server setup on Debian 8 (4.9 kernel) with Kubernetes 1.6.4 and docker-ce 17.06.1 (overlay2). The issue appeared after I upgraded from 1.12.6 to 17.06.1.
Please help me debug this issue.
docker version:
Client:
Version: 17.06.1-ce
API version: 1.30
Go version: go1.8.3
Git commit: 874a737
Built: Thu Aug 17 22:53:31 2017
OS/Arch: linux/amd64
Server:
Version: 17.06.1-ce
API version: 1.30 (minimum version 1.12)
Go version: go1.8.3
Git commit: 874a737
Built: Thu Aug 17 22:51:25 2017
OS/Arch: linux/amd64
Experimental: false
docker info:
Containers: 336
Running: 336
Paused: 0
Stopped: 0
Images: 52
Server Version: 17.06.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Kernel Version: 4.9.0-0.bpo.3-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 28.76GiB
Name: host
ID: QY6I:JI2S:BOPG:FIQP:YEBB:3UYF:N3G2:COCQ:PX7Z:QRCV:GIEN:FGQC
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Did you try rebooting the faulty node? It looks like some namespace/bridge configuration might have gotten stuck.
This issue was caused by a desync between the Docker network (bridge plugin) and the actual state of the network on the host machine. The IP from the Docker network was released, but the associated virtual interface and related TCP connections were left intact. So when this IP was attached to a new container, network anomalies started to happen.
Most likely this happened after random Docker daemon hangs (which occurred with the older 1.12 version).
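A desync like this can be cross-checked by comparing the interfaces the kernel still has against the endpoints Docker believes exist. A rough sketch (interface and network names are host-specific):

```shell
# Virtual ethernet interfaces the kernel still knows about:
ip -o link show type veth
# Endpoints Docker believes are attached to the default bridge network:
docker network inspect bridge \
  --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{"\n"}}{{end}}'
# A veth with no matching container endpoint is a candidate for the
# leaked interface described above.
```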

Docker pull: operation not permitted

I'm getting this error when pulling some docker images (but not all):
failed to register layer: Error processing tar file(exit status 1): operation not permitted
For example, docker pull nginx works, but docker pull redis does not.
I get the same result whether I run the command as a user in the docker group, with sudo, or as root.
If I run dockerd in debug mode I see this in the logs:
DEBU[0025] Downloaded 5233d9aed181 to tempfile /var/lib/docker/tmp/GetImageBlob023191751
DEBU[0025] Applying tar in /var/lib/docker/overlay2/e5290b8c50d601918458c912d937a4f6d4801ecaa90afb3b729a5dc0fc405afc/diff
DEBU[0027] Applied tar sha256:16ada34affd41b053ca08a51a3ca92a1a63379c1b04e5bbe59ef27c9af98e5c6 to e5290b8c50d601918458c912d937a4f6d4801ecaa90afb3b729a5dc0fc405afc, size: 79185732
(...)
DEBU[0029] Applying tar in /var/lib/docker/overlay2/c5c0cfb9907a591dc57b1b7ba0e99ae48d0d7309d96d80861d499504af94b21d/diff
DEBU[0029] Cleaning up layer c5c0cfb9907a591dc57b1b7ba0e99ae48d0d7309d96d80861d499504af94b21d: Error processing tar file(exit status 1): operation not permitted
INFO[0029] Attempting next endpoint for pull after error: failed to register layer: Error processing tar file(exit status 1): operation not permitted
INFO[0029] Layer sha256:938f1cd4eae26ed4fc51c37fa2f7b358418b6bd59c906119e0816ff74a934052 cleaned up
(...)
If I run watch -n 0 "sudo ls -lt /var/lib/docker/overlay2/" while the image is pulling, I can see new folders appearing (and disappearing after the pull fails), and the permissions on /var/lib/docker/overlay2/ are root:root:700, so I don't think it's strictly a permission issue.
Here are some detail about the environment:
I have a Proxmox host running the LXC container where I'm having the issue.
The container itself is running Debian 8.
And here are the various versions:
$> uname -a
Linux [redacted-hostname] 4.10.15-1-pve #1 SMP PVE 4.10.15-15 (Fri, 23 Jun 2017 08:57:55 +0200) x86_64 GNU/Linux
$> docker version
Client:
Version: 17.06.0-ce
API version: 1.30
Go version: go1.8.3
Git commit: 02c1d87
Built: Fri Jun 23 21:20:04 2017
OS/Arch: linux/amd64
Server:
Version: 17.06.0-ce
API version: 1.30 (minimum version 1.12)
Go version: go1.8.3
Git commit: 02c1d87
Built: Fri Jun 23 21:18:59 2017
OS/Arch: linux/amd64
Experimental: false
$>docker info
Containers: 20
Running: 0
Paused: 0
Stopped: 20
Images: 28
Server Version: 17.06.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Kernel Version: 4.10.15-1-pve
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.906GiB
Name: resumed-dev
ID: EBJ6:AFVS:L3RC:ZEE7:A6ZJ:WDQE:GTIZ:RXHA:P4AQ:QJD7:H6GG:YIQB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 16
Goroutines: 24
System Time: 2017-08-17T14:17:07.800849127+02:00
EventsListeners: 0
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
EDIT: This will be fixed in any Moby release after December 18, 2017, via this merge. Will update again when it is fully incorporated into Docker.
If your container is unprivileged, this appears to be an issue with the overlay2 storage driver for Docker. This does not appear to be an issue with overlay (GitHub issue). So either utilize the overlay storage driver instead of overlay2, or make your container privileged.
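If switching storage drivers is the chosen workaround, it is set in the daemon configuration. A sketch of the change (note that image layers are not shared between drivers, so images must be re-pulled afterwards):

```shell
# /etc/docker/daemon.json (create it if absent):
#   { "storage-driver": "overlay" }
# Then restart the daemon:
sudo systemctl restart docker
# And confirm the active driver:
docker info --format '{{.Driver}}'
```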
I have almost the same environment as you and hit the same problem.
Some images work perfectly (alpine), while others fail during cleanup (ubuntu).
Running strace -f dockerd -D and then docker pull or docker load reveals the reason:
mknodat(AT_FDCWD, "/dev/agpgart", S_IFCHR|0660, makedev(10, 175)) = -1 EPERM (Operation not permitted)
Unprivileged containers prohibit mknod by design. If you insist on nesting Docker inside LXC, you will have to use a privileged container. (Note that an existing unprivileged container cannot be converted directly to a privileged one, due to uid/gid mapping.)
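Whether the guest allows mknod can be probed directly, without tracing the daemon. A small sketch (the device numbers 10,175 match the /dev/agpgart node from the strace output above):

```shell
# Try to create a character device node; unprivileged LXC guests deny this.
if mknod /tmp/testnode c 10 175 2>/dev/null; then
    echo "mknod allowed"
    rm -f /tmp/testnode
else
    echo "mknod denied"
fi
```

If this prints "mknod denied", docker pull of images that ship device nodes will keep failing in that guest.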

Can't stop Docker containers - Error during connect - Docker Toolbox

I'm trying to integrate VSTS with Docker to run automated tests, and I want to dockerize my databases to keep a consistent database state between tests without a cleanup step, by running a new container with no modifications. But I am receiving a lot of errors from Docker when containerizing a SQL Server database. My containers frequently hang, and I can't stop or remove them without rebooting the Boot2Docker VM. Sometimes, after that error, I receive an error message for any Docker command: even a simple docker ps or docker version doesn't work after a container hangs (sometimes these commands work, but stop or remove don't), giving me the following error message:
error during connect: Post https://192.168.99.100:2376/v1.26/containers/container-name/stop: dial tcp 192.168.99.100:2376: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
If I reopen the Docker QuickStart Terminal after this issue, I receive the following error description:
Error getting IP address: ssh command error: command : ip addr show
err : exit status 255
Because of that, I have to manually stop the default docker machine and reopen Docker Quickstart.
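When the Boot2Docker VM gets into this state, recycling the machine and its client configuration sometimes helps. A sketch using the standard docker-machine commands (machine name "default" assumed):

```shell
docker-machine stop default
docker-machine start default
docker-machine regenerate-certs --force default   # refresh TLS certs if connect errors persist
eval "$(docker-machine env default)"              # repoint the client at the VM
```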
I am using a container released by Microsoft for SQL Server to evaluate my solution, so I believe this container should work properly. To test my environment, I created a simple app with a netcat server, just to send several connections and evaluate whether my operating system, Docker Toolbox, or anything related was causing these problems; but when testing this simple Docker server with several threads sending many messages over several iterations, everything worked properly with no errors.
Dockerfile:
FROM centos:latest
RUN yum install nc -y
EXPOSE 1433
CMD nc -l -k 1433 > /out.netcat
Is there any released solution for these problems, or a way to avoid them? Searching Google, I couldn't find a solution; I only found others hitting the same errors. Can Docker be used to containerize databases?
Environment:
$ docker version
time="2017-03-31T10:23:50-03:00" level=info msg="Unable to use system certificate pool: crypto/x509: system root pool is not available on Windows"
Client:
Version: 1.13.1
API version: 1.26
Go version: go1.7.5
Git commit: 092cba3
Built: Wed Feb 8 08:47:51 2017
OS/Arch: windows/amd64
Server:
Version: 17.03.0-ce
API version: 1.26 (minimum version 1.12)
Go version: go1.7.5
Git commit: 3a232c8
Built: Tue Feb 28 07:52:04 2017
OS/Arch: linux/amd64
Experimental: false
$ docker info
time="2017-03-31T10:27:10-03:00" level=info msg="Unable to use system certificate pool: crypto/x509: system root pool is not available on Windows"
Containers: 2
Running: 0
Paused: 0
Stopped: 2
Images: 25
Server Version: 17.03.0-ce
Storage Driver: aufs
Root Dir: /mnt/sda1/var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 25
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 977c511eda0925a723debdc94d09459af49d082a
runc version: a01dafd48bc1c7cc12bdb01206f9fea7dd6feb70
init version: 949e6fa
Security Options:
seccomp
Profile: default
Operating System: Boot2Docker 17.03.0-ce (TCL 7.2); HEAD : f11a204 - Thu Mar 2 00:14:47 UTC 2017
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.491 GiB
Name: default
ID: 56MH:QSVM:SCCQ:DKVC:HBNI:AYJK:UCQN:2UJZ:A4NV:KOZQ:XC67:EEPY
Docker Root Dir: /mnt/sda1/var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 14
Goroutines: 22
System Time: 2017-03-31T13:27:08.809341202Z
EventsListeners: 0
Username: pablogoulart
Registry: https://index.docker.io/v1/
Labels:
provider=virtualbox
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
docker logs container-name:
This is an evaluation version. There are [164] days left in the evaluation period.
RegQueryValueEx HADR for key "Software\Microsoft\Microsoft SQL Server\MSSQL\MSSQLServer\HADR" failed.
2017-03-31 12:30:40.77 Server Microsoft SQL Server vNext (CTP1.4) - 14.0.405.198 (X64)
Mar 11 2017 01:54:12
Copyright (C) 2016 Microsoft Corporation. All rights reserved.
on Linux (CentOS Linux 7 (Core))
2017-03-31 12:30:40.78 Server UTC adjustment: 0:00
2017-03-31 12:30:40.78 Server (c) Microsoft Corporation.
2017-03-31 12:30:40.78 Server All rights reserved.
2017-03-31 12:30:40.78 Server Server process ID is 4116.
2017-03-31 12:30:40.78 Server Logging SQL Server messages in file 'C:\var\opt\mssql\log\errorlog'.
2017-03-31 12:30:40.79 Server Registry startup parameters:
-d C:\var\opt\mssql\data\master.mdf
-l C:\var\opt\mssql\data\mastlog.ldf
-e C:\var\opt\mssql\log\errorlog
2017-03-31 12:30:41.29 Server SQL Server detected 1 sockets with 1 cores per socket and 1 logical processors per socket, 1 total logical processors; using 1 logical processors based on SQL Server licensing. This is an informational message; no user action is required.
2017-03-31 12:30:41.34 Server SQL Server is starting at normal priority base (=7). This is an informational message only. No user action is required.
2017-03-31 12:30:41.37 Server Detected 2860 MB of RAM. This is an informational message; no user action is required.
2017-03-31 12:30:41.37 Server Using conventional memory in the memory manager.
2017-03-31 12:30:41.51 Server Default collation: SQL_Latin1_General_CP1_CI_AS (us_english 1033)
2017-03-31 12:30:41.76 Server Buffer pool extension is already disabled. No action is necessary.
2017-03-31 12:30:41.97 Server InitializeExternalUserGroupSid failed. Implied authentication will be disabled.
2017-03-31 12:30:41.98 Server Implied authentication manager initialization failed. Implied authentication will be disabled.

docker login <dtr-server> gives error 404 not found

When I try to log in to my private Docker registry, it gives the following error:
$docker login https://dtr-ip:443
Error response from daemon: Login: <html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.8.0</center>
</body>
</html>
(Code: 404; Headers: map[Date:[Wed, 22 Jun 2016 13:51:33 GMT] Content-Type:[text/html] Content-Length:[168] X-Replica-Id:[fa6e7b73277d] Server:[nginx/1.8.0]])
My Docker Trusted Registry and UCP are on the same node.
Docker logs on the client side:
time="2016-06-22T19:25:08.338336106+05:30" level=info msg="Error logging in to v2 endpoint, trying next endpoint: login attempt to https://54.179.144.153:443/v2/ failed with status: 404 Not Found"
time="2016-06-22T19:25:08.621784740+05:30" level=error msg="Handler for POST /v1.23/auth returned error: Login: <html>\r\n<head><title>404 Not Found</title></head>\r\n<body bgcolor=\"white\">\r\n<center><h1>404 Not Found</h1></center>\r\n<hr><center>nginx/1.8.0</center>\r\n</body>\r\n</html>\r\n (Code: 404; Headers: map[Content-Type:[text/html] Content-Length:[168] X-Replica-Id:[fa6e7b73277d] Server:[nginx/1.8.0] Date:[Wed, 22 Jun 2016 13:55:08 GMT]])"
$docker info
Containers: 29
Running: 16
Paused: 0
Stopped: 13
Images: 19
Server Version: 1.11.2-cs3
Storage Driver: devicemapper
Pool Name: docker-202:1-201339217-pool
Pool Blocksize: 65.54 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 1.725 GB
Data Space Total: 107.4 GB
Data Space Available: 49.69 GB
Metadata Space Used: 3.461 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.144 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.107-RHEL7 (2015-12-01)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge null host overlay
Kernel Version: 3.10.0-229.14.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.26 GiB
Name: automation
ID: Z4XA:KGME:WMYE:RSP4:ILH7:CPFC:PTIN:QUJT:66UT:PC7R:H65R:BIDX
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): true
File Descriptors: 82
Goroutines: 159
System Time: 2016-06-22T13:59:28.058948802Z
EventsListeners: 1
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-ip6tables is disabled
Cluster store: etcd://<server-ip>:2050
Cluster advertise: <server-ip>:12376
And the Docker versions are:
$docker version
Client:
Version: 1.11.2-cs3
API version: 1.23
Go version: go1.5.4
Git commit: c81a77d
Built: Wed Jun 8 01:23:22 2016
OS/Arch: linux/amd64
Server:
Version: 1.11.2-cs3
API version: 1.23
Go version: go1.5.4
Git commit: c81a77d
Built: Wed Jun 8 01:23:22 2016
OS/Arch: linux/amd64
I think that when I log in to https://dtr-ip:443, the client queries https://dtr-ip:443/v2/, and this URL does not return any data.
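That guess can be verified from the client host: a healthy registry answers /v2/ with HTTP 200 (or 401 plus a WWW-Authenticate header), not 404. A sketch, with dtr-ip as a placeholder for the real address:

```shell
# -k skips TLS verification, which is fine for a quick check
# against a DTR with a self-signed certificate:
curl -k -i https://dtr-ip:443/v2/
```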
I had the same generic error; my infrastructure had been working fine for about 30 days, but then I started receiving the error below:
Error response from daemon: Login: <html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.8.0</center>
</body>
</html>
(Code: 404; Headers: map[Date:[Wed, 22 Jun 2016 13:51:33 GMT] Content-Type:[text/html] Content-Length:[168] X-Replica-Id:[fa6e7b73277d] Server:[nginx/1.8.0]])
I saw from the events in the DTR web console that the license had expired; after installing a new license I haven't seen the error message again.
