Docker stopped all of a sudden on CentOS 7

I was running Docker on my CentOS 7 machine.
Today I was trying to upgrade a container, so I stopped the container and tried to pull a new image.
I got the following error:
Error getting v2 registry: Get https://registry-1.docker.io/v2/: proxyconnect tcp: dial tcp: lookup https_proxy=http: no such host
I checked the proxy settings for the machine in /etc/environment and for Docker in /etc/systemd/system/docker.service.d/http-proxy.conf.
They are set correctly.
I enabled daemon logs for Docker, and the logs say:
Sep 14 10:43:18 myCentOsServer kernel: [4913751.074277] docker0: port 1(veth1e3300a) entered disabled state
Sep 14 10:43:18 myCentOsServer kernel: [4913751.084599] docker0: port 1(veth1e3300a) entered disabled state
Sep 14 10:43:18 myCentOsServer kernel: [4913751.084888] docker0: port 1(veth1e3300a) entered disabled state
Sep 14 10:43:18 myCentOsServer NetworkManager[794]: <info> [1505349798.0267] device (veth1e3300a): released from master device docker0
Sep 14 10:44:48 myCentOsServer dockerd[29136]: time="2017-09-14T10:44:48.802236300+10:00" level=warning msg="Error getting v2 registry: Get https://registry-1.docker.io/v2/: proxyconnect tcp: dial tcp: lookup https_proxy=http: no such host"
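One possible reading of this message: the text after "lookup" is the host name the daemon tried to resolve for the proxy, so "https_proxy=http" suggests that the proxy value itself starts with "https_proxy=", as in a doubled assignment. That is an inference from the log line, not something the post confirms. A minimal sketch of a check for that pattern follows; check_proxy_values is a hypothetical helper, and it assumes a grep that supports -o (GNU and BSD grep both do).

```shell
# Hypothetical helper: report proxy assignments in a file whose value is not
# a plain http:// or https:// URL. Returns 1 if any suspicious value is found.
check_proxy_values() {
    status=0
    # grep -o prints each leftmost match, so a doubled assignment such as
    # HTTPS_PROXY=https_proxy=http://... is captured as one string.
    for match in $(grep -oiE 'https?_proxy=[^"[:space:]]*' "$1"); do
        value=${match#*=}    # drop the variable name and the first "="
        case $value in
            *=*)
                echo "suspicious proxy value: $value"; status=1 ;;
            http://?*|https://?*)
                ;;           # looks like a URL
            *)
                echo "suspicious proxy value: $value"; status=1 ;;
        esac
    done
    return "$status"
}
```

It could be run against both files mentioned above, for example check_proxy_values /etc/systemd/system/docker.service.d/http-proxy.conf. After correcting a drop-in, systemctl daemon-reload is needed before restarting Docker.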
I tried the commands below, but they hang:
systemctl daemon-reload
systemctl restart docker
Any idea what might be the issue?
Thanks in advance.

I was finally able to solve this issue.
The issue was with my Docker mount point. Mine was set to /var/lib/docker, and I suspect it got corrupted when I did a data volume export.
Steps I followed:
1) Navigated to /var/lib/docker, took a backup of the images, containers and volumes folders, and deleted them.
2) Reloaded the daemon.
3) Restarted Docker.
Now it is working fine.
However, the bad news is that I lost the data dump I had taken from one of the containers (using --volumes-from).
But it was a dev version of the software, so I reinstalled it and redid the setup.
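The backup step above can be sketched as a small function that moves the named folders aside instead of deleting them outright. backup_docker_dirs is a hypothetical helper, the backup location is an assumption, and the folder names are passed in because the contents of the data directory differ between Docker versions. It should only be run while the Docker daemon is stopped.

```shell
# Hypothetical helper: move the named sub-directories of a Docker data
# directory into a timestamped backup directory and print the backup path.
# Usage: backup_docker_dirs DATA_ROOT BACKUP_PARENT NAME...
backup_docker_dirs() {
    data_root=$1
    backup_dir=$2/docker-backup-$(date +%Y%m%d-%H%M%S)
    shift 2
    mkdir -p "$backup_dir" || return 1
    for name in "$@"; do
        # Skip names that do not exist in this data directory.
        if [ -d "$data_root/$name" ]; then
            mv "$data_root/$name" "$backup_dir/" || return 1
        fi
    done
    echo "$backup_dir"
}
```

For the steps in this answer the call would look like backup_docker_dirs /var/lib/docker /root images containers volumes, followed by systemctl daemon-reload and systemctl restart docker. As the answer notes, anything stored only in those folders is no longer available to Docker afterwards.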

This sometimes happens on CentOS. You can simply restart the Docker service with:
systemctl restart docker.service

Related

Docker Build Process Stuck

My OS: Ubuntu 18.04 LTS
My Docker version:
# docker --version
Docker version 19.03.6, build 369ce74a3c
I'm trying to build a Docker image:
docker build -t image:tag .
Sending build context to Docker daemon 187.9kB
Step 1/8 : FROM node:8.16.2-alpine3.9
---> 9c0651c52baf
Step 2/8 : RUN mkdir -p /app
---> Running in 85ecdcc9218c
It gets stuck on step 2 with no activity. Here's the log from syslog:
dockerd[4988]: time="2020-02-20x08:28:27.xxxxxxxxxx" level=info msg="API listen on /var/run/docker.sock"
systemd[1]: Reloading.
systemd-udevd[5315]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
systemd-udevd[5315]: Could not generate persistent MAC address for vethaxxxxxx: No such file or directory
systemd-networkd[4063]: vethexxxxxx: Link UP
kernel: [ 2304.024934] docker0: port 1(vethexxxxxx) entered blocking state
kernel: [ 2304.024936] docker0: port 1(vethexxxxxx) entered disabled state
kernel: [ 2304.025182] device vethexxxxxx entered promiscuous mode
systemd-timesyncd[4039]: Network configuration changed, trying to establish connection.
systemd-udevd[5317]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
systemd-udevd[5317]: Could not generate persistent MAC address for vethexxxxxx: No such file or directory
kernel: [ 2304.029095] IPv6: ADDRCONF(NETDEV_UP): vethexxxxxx: link is not ready
systemd-timesyncd[4039]: Synchronized to time server 91.189.89.199:123 (ntp.ubuntu.com).
systemd-timesyncd[4039]: Network configuration changed, trying to establish connection.
systemd-timesyncd[4039]: Synchronized to time server 91.189.89.199:123 (ntp.ubuntu.com).
containerd[4987]: time="2020-02-20x08:31:18.xxxxxxxxxx" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/85ecdcc9218c280e97de4bfd38b0d70d83bb601e58a61a2c58fff52db2c90042/shim.sock" debug=false pid=5326
systemd-timesyncd[4039]: Network configuration changed, trying to establish connection.
systemd-timesyncd[4039]: Synchronized to time server 91.189.89.199:123 (ntp.ubuntu.com).
systemd-timesyncd[4039]: Network configuration changed, trying to establish connection.
systemd-networkd[4063]: vethexxxxxx: Gained carrier
systemd-networkd[4063]: docker0: Gained carrier
kernel: [ 2304.285614] eth0: renamed from vetha3b6298
kernel: [ 2304.285866] IPv6: ADDRCONF(NETDEV_CHANGE): vethexxxxxx: link becomes ready
kernel: [ 2304.285900] docker0: port 1(vethe0b5233) entered blocking state
kernel: [ 2304.285901] docker0: port 1(vethe0b5233) entered forwarding state
systemd-timesyncd[4039]: Synchronized to time server 91.189.89.199:123 (ntp.ubuntu.com).
systemd-networkd[4063]: vethe0b5233: Gained IPv6LL
systemd-timesyncd[4039]: Network configuration changed, trying to establish connection.
systemd-timesyncd[4039]: Synchronized to time server 91.189.89.199:123 (ntp.ubuntu.com).
Furthermore, if I press ^C to quit the build process, it breaks my SSH connection too.

Attempt to change Docker data-root fails - why?

I am trying to set my Docker storage directory to something other than the default, which I've done on other machines:
/etc/docker/daemon.json:
{
"data-root": "/mnt/x/y/docker_data"
}
where the storage dir looks like
jeremyr@snorble:~$ ls -ltr /mnt/x/y
total 4
drwxrwxrwx 11 jeremyr 5001 122 Mar 19 08:14 docker_data
With the daemon.json file in place, sudo systemctl restart docker fails with "Job for docker.service failed". (Without that daemon.json, Docker restarts fine and docker run hello-world runs fine.) With daemon.json in place, journalctl -xn shows:
Mar 25 14:20:33 bolt88 systemd[1]: docker.service start request repeated too quickly, refusing to start.
Mar 25 14:20:33 bolt88 systemd[1]: Failed to start Docker Application Container Engine.
-- Subject: Unit docker.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit docker.service has failed.
--
-- The result is failed.
Mar 25 14:20:33 bolt88 systemd[1]: Unit docker.service entered failed state.
Mar 25 14:20:34 bolt88 sudo[23961]: jeremyr : TTY=pts/18 ; PWD=/home/jeremyr ; USER=root ; COMMAND=/bin/journalctl -xn
Mar 25 14:20:34 bolt88 sudo[23961]: pam_unix(sudo:session): session opened for user root by jeremyr(uid=0)
while systemctl status docker.service just shows code=exited, status=1/FAILURE
and in dmesg I see this:
[Mon Mar 25 14:21:41 2019] aufs au_opts_verify:1570:dockerd[20714]: dirperm1 breaks the protection by the permission bits on the lower branch
[Mon Mar 25 14:21:41 2019] device veth34d1dfd entered promiscuous mode
[Mon Mar 25 14:21:41 2019] IPv6: ADDRCONF(NETDEV_UP): veth34d1dfd: link is not ready
[Mon Mar 25 14:21:41 2019] IPv6: ADDRCONF(NETDEV_CHANGE): veth34d1dfd: link becomes ready
[Mon Mar 25 14:21:41 2019] docker0: port 1(veth34d1dfd) entered forwarding state
[Mon Mar 25 14:21:41 2019] docker0: port 1(veth34d1dfd) entered forwarding state
[Mon Mar 25 14:21:41 2019] docker0: port 1(veth34d1dfd) entered disabled state
[Mon Mar 25 14:21:41 2019] device veth34d1dfd left promiscuous mode
[Mon Mar 25 14:21:41 2019] docker0: port 1(veth34d1dfd) entered disabled state
[Mon Mar 25 14:21:59 2019] systemd-sysv-generator[20958]: Ignoring creation of an alias umountiscsi.service for itself
This is Docker version 17.05.0-ce, build 89658be, on a Debian 8.8 setup.
Does anyone know why docker isn't allowing use of that dir as data-root?
TL;DR: this worked on Ubuntu 18.04 just before posting.
Follow these instructions:
sudo systemctl stop docker
sudo rsync -axPS /var/lib/docker/ /mnt/x/y/docker_data #copy all existing data to new location
sudo vi /lib/systemd/system/docker.service # or your favorite text editor
In docker.service, find a line like this:
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Add --data-root /mnt/x/y/docker_data to it (all on one line):
ExecStart=/usr/bin/dockerd --data-root /mnt/x/y/docker_data -H fd:// --containerd=/run/containerd/containerd.sock
Save and quit, then run:
sudo systemctl daemon-reload
sudo systemctl start docker
docker info | grep "Root Dir"
The last command should output: Docker Root Dir: /mnt/x/y/docker_data
That's it; you should be done here.
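A variation on the edit above, offered as a sketch rather than as part of the original answer: the same ExecStart line can live in a systemd drop-in under /etc/systemd/system/docker.service.d instead of in the packaged unit file, so a package upgrade that replaces /lib/systemd/system/docker.service does not undo it. write_data_root_dropin and the file name data-root.conf are hypothetical; the dockerd flags are copied from the answer and may differ on other installations.

```shell
# Hypothetical helper: write a drop-in that overrides ExecStart for
# docker.service with a --data-root option.
# Usage: write_data_root_dropin DROPIN_DIR DATA_ROOT
write_data_root_dropin() {
    dropin_dir=$1    # normally /etc/systemd/system/docker.service.d
    data_root=$2
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/data-root.conf" <<EOF
[Service]
# The empty ExecStart= clears the command inherited from the packaged unit.
ExecStart=
ExecStart=/usr/bin/dockerd --data-root $data_root -H fd:// --containerd=/run/containerd/containerd.sock
EOF
}
```

Run as root, for example write_data_root_dropin /etc/systemd/system/docker.service.d /mnt/x/y/docker_data, and then continue with the same systemctl daemon-reload and systemctl start docker steps.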
The too-long version, if you do want to read it:
After some investigating, I found several outdated articles, including this one, that confidently proposed solutions. These are the typical ones:
Add the -g option in docker.service:
not working, because -g and --graph were deprecated in release v17.05.0.
Add data-root in /etc/docker/daemon.json (the method tried by the question author):
not working, for some unknown reason.
After reading those solutions on about a dozen web pages, I got my inspiration from:
How To Change Docker Data Folder Configuration
It is not a very good solution and not a popular one, but the interesting part is under its "Update:" note:
graph has been deprecated in v17.05.0. You can use data-root instead.
Yes, graph became data-root, and --graph is just the long form of -g, so I made this substitution in the "add the -g option in docker.service" solution, and ta-da, it worked.
Something is off with the docker_data directory.
Solution:
1) Remove the /etc/docker/daemon.json file.
2) Start Docker.
3) Copy the contents of /var/lib/docker to the path you've put in /etc/docker/daemon.json.
4) Put the /etc/docker/daemon.json file back and restart Docker.
Well, I'm not a Docker expert, but I see "dirperm1 breaks the protection by the permission bits on the lower branch" in your log. I also see this:
"drwxrwxrwx 11 jeremyr 5001 122 Mar 19 08:14 docker_data"
As I understand it, the Docker daemon requires access permission to the directory. Does 5001 refer to the "docker" group?
However, if you ran the daemon with root permissions, this shouldn't happen.
Check the Docker version on your machine with:
docker --version
I was facing the same issue, and it was solved by upgrading Docker to the latest available version.
Even the documentation on Docker's official website does not mention anything like that.
Once you have upgraded Docker, restart it with:
systemctl restart docker
The error will be gone, and the new changes will take effect.

Failed to start Docker Application Container Engine

I am new to Docker, so I don't know much about it.
I tried restarting the Docker service using the command
service docker restart
As the command was taking too long, I pressed Ctrl+C.
Now I am not able to start the Docker daemon.
Any docker command gives the following output:
Cannot connect to the Docker daemon. Is the docker daemon running on this host?
I tried starting the Docker daemon using
systemctl start docker
But it outputs:
Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
Output of command
systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/docker.service.d
└─docker.conf, http-proxy.conf, https-proxy.conf
Active: failed (Result: exit-code) since Mon 2018-03-05 17:17:54 IST; 2min 23s ago
Docs: https://docs.docker.com
Process: 11331 ExecStart=/usr/bin/dockerd --graph=/app/dockerRT (code=exited, status=1/FAILURE)
Main PID: 11331 (code=exited, status=1/FAILURE)
Memory: 76.9M
CGroup: /system.slice/docker.service
└─4593 docker-containerd-shim 3bda33eac892d14adda9f3b1fc8dc52173e26ce60ca949075227d903399c7517 /var/run/docker/libcontainerd/3bda33eac892d14adda9f3b1fc8dc52173e26c...
Mar 05 17:17:05 hj-fsbfsd9761.persistent.co.in systemd[1]: Starting Docker Application Container Engine...
Mar 05 17:17:05 hj-fsbfsd9761.persistent.co.in dockerd[11331]: time="2018-03-05T17:17:05.126009059+05:30" level=info msg="libcontainerd: new containerd process, pid: 11337"
Mar 05 17:17:06 hj-fsbfsd9761.persistent.co.in dockerd[11331]: time="2018-03-05T17:17:06.346599571+05:30" level=warning msg="devmapper: Usage of loopback devices is ...section."
Mar 05 17:17:10 hj-fsbfsd9761.persistent.co.in dockerd[11331]: time="2018-03-05T17:17:10.889378989+05:30" level=warning msg="devmapper: Base device already exists an...ignored."
Mar 05 17:17:10 hj-fsbfsd9761.persistent.co.in dockerd[11331]: time="2018-03-05T17:17:10.976695025+05:30" level=info msg="[graphdriver] using prior storage driver \"...mapper\""
Mar 05 17:17:54 hj-fsbfsd9761.persistent.co.in dockerd[11331]: time="2018-03-05T17:17:54.312812069+05:30" level=fatal msg="Error starting daemon: timeout"
Mar 05 17:17:54 hj-fsbfsd9761.persistent.co.in systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Mar 05 17:17:54 hj-fsbfsd9761.persistent.co.in systemd[1]: Failed to start Docker Application Container Engine.
Mar 05 17:17:54 hj-fsbfsd9761.persistent.co.in systemd[1]: Unit docker.service entered failed state.
Mar 05 17:17:54 hj-fsbfsd9761.persistent.co.in systemd[1]: docker.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
journalctl -xe
loop: Write error at byte offset 63585648640, length 4096.
How would I be able to start Docker without losing any containers and using previous configurations?
I had the same issue (Fedora 30 x86_64, kernel 5.2.9) and it turned out that being connected to a VPN was the problem. Apparently having a changed gateway address causes an "Error initializing network controller" error which I was able to see when I tried starting docker via sudo dockerd instead of sudo systemctl start docker.
I found the note here about the VPN being a possible problem; disconnecting immediately allowed me to start Docker with systemctl start docker.
"Failed to start Docker Application Container Engine" is a general error message.
You should inspect the journal for more details:
journalctl -eu docker
In my case it was: "error initializing graphdriver: /var/lib/docker contains several valid graphdrivers: devicemapper, overlay2"
Changing the graphdriver to overlay2 fixed it:
$ sudo systemctl stop docker
$ vi /etc/docker/daemon.json # Create the file if it does not exist, and add:
{
"storage-driver": "overlay2"
}
$ sudo systemctl start docker
$ systemctl status docker.service # Hopefully it's running now
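Before pinning a driver, it can help to see which driver directories are actually present, since the error is about more than one being found. The sketch below only lists them. list_graphdriver_dirs is a hypothetical helper, and the candidate names are an assumption based on common storage-driver directory names, not an exhaustive list.

```shell
# Hypothetical helper: print the storage-driver directories found directly
# under a Docker data directory, one per line.
list_graphdriver_dirs() {
    for name in aufs devicemapper overlay overlay2; do
        if [ -d "$1/$name" ]; then
            echo "$name"
        fi
    done
    return 0
}
```

For example, sudo sh -c '. ./helpers.sh; list_graphdriver_dirs /var/lib/docker' (helpers.sh being wherever the function was saved) would print devicemapper and overlay2 on a machine in the state described above.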
I removed the file /etc/docker/daemon.json, started Docker with sudo systemctl start docker, and it worked!
For the googlers:
What worked for me was running killall dockerd and then sudo rm /var/run/docker.pid.
Running dockerd directly produced the error message that recommended this.
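Before removing the PID file, one can check whether it really is stale, meaning it names a process that no longer exists. A minimal sketch, assuming it is run as root so that a live dockerd owned by root can be probed; pid_file_is_stale is a hypothetical helper, and a PID that has been reused by an unrelated process would be reported as alive.

```shell
# Hypothetical helper: succeed (exit 0) when the PID file exists but does not
# name a live process; fail when the file is missing or the process is alive.
pid_file_is_stale() {
    [ -f "$1" ] || return 1
    pid=$(cat "$1")
    case $pid in
        ''|*[!0-9]*) return 0 ;;    # empty or non-numeric content: stale
    esac
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}
```

Usage could look like: pid_file_is_stale /var/run/docker.pid && sudo rm /var/run/docker.pid.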
I ran into this too while on a VPN. All I did after receiving the error was run sudo systemctl start docker and then sudo systemctl status docker again, and that took care of it.
You may have an issue with your current Docker installation.
If you don't have much set up already, you may want to reinstall using the convenience script provided by Docker: https://docs.docker.com/install/linux/docker-ce/ubuntu/#install-using-the-convenience-script. This will help you investigate the errors.
I faced the same problem; in my case the disk was full. I removed Docker volumes from /var/lib/docker/volumes/ and started Docker with sudo systemctl start docker.
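A quick way to rule a full disk in or out before deleting anything is to look at the filesystem that holds the Docker data directory. The sketch below parses POSIX df -P output; disk_use_percent is a hypothetical helper and the 95 percent threshold in the usage note is an arbitrary choice.

```shell
# Hypothetical helper: print the used-space percentage (digits only) of the
# filesystem that holds the given path.
disk_use_percent() {
    # With -P each filesystem is reported on one line; field 5 is "Capacity".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

For example: [ "$(disk_use_percent /var/lib/docker)" -ge 95 ] && echo "disk is nearly full".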

Error creating default "bridge" network: cannot create network (docker0): conflicts with network (docker0): networks have same bridge name

After stopping Docker, it refused to start again. It complained that another bridge called docker0 already exists:
level=warning msg="devmapper: Base device already exists and has filesystem xfs on it. User specified filesystem will be ignored."
level=info msg="[graphdriver] using prior storage driver \"devicemapper\""
level=info msg="Graph migration to content-addressability took 0.00 seconds"
level=info msg="Firewalld running: false"
level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
level=fatal msg="Error starting daemon: Error initializing network controller: Error creating default \"bridge\" network: cannot create network fa74b0de61a17ffe68b9a8f7c1cd698692fb56f6151a7898d66a30350ca0085f (docker0): conflicts with network bb9e0aab24dd1f4e61f8e7a46d4801875ade36af79d7d868c9a6ddf55070d4d7 (docker0): networks have same bridge name"
docker.service: Main process exited, code=exited, status=1/FAILURE
Failed to start Docker Application Container Engine.
docker.service: Unit entered failed state.
docker.service: Failed with result 'exit-code'.
Deleting the bridge with ip link del docker0 and then starting docker leads to the same result with another id.
In my case, I had downgraded my OS (CentOS Atomic Host) and came across this error message. The Docker version on the older CentOS Atomic was 1.9.1. I did not have any running containers or pulled images before the downgrade.
I simply ran the commands below and Docker was happy again:
sudo rm -rf /var/lib/docker/network
sudo systemctl start docker
More info.
The problem seems to be in /var/docker/network/. A lot of sockets are stored there that reference the bridge by its old ID. To solve the problem, you can delete all the sockets, delete the interface and then start Docker, but all your containers will refuse to work since their sockets are gone. In my case I did not care about my stateless containers anyway, so this fixed the problem:
ip link del docker0
rm -rf /var/docker/network/*
mkdir /var/docker/network/files
systemctl start docker
# delete all containers (this only prints the commands; remove "echo" to run them)
docker ps -aq | xargs -n 1 echo docker rm -f
# recreate all containers
It may sound obvious, but you may want to consider rebooting, especially if there was some major system update recently.
Worked for me, since I didn't reboot my VM after installing some kernel updates, which probably led to many network modules being left in an undefined state.

Docker daemon does not start

I'm trying to start the Docker daemon:
sudo systemctl start docker
But nothing happens, the cursor just blinks and the process never ends.
Yesterday it was working properly :(
sudo journalctl -fu docker
ago 18 16:05:24 host docker[1602]: time="2016-08-18T16:05:24.467635627-05:00" level=info msg="New containerd process, pid: 1609\n"
ago 18 16:05:24 host docker[1602]: time="2016-08-18T16:05:24.482107319-05:00" level=fatal msg="bad listen address format /var/run/docker/libcontainerd/docker-containerd.sock, expected proto://address"
ago 18 16:05:30 host docker[1602]: time="2016-08-18T16:05:30.470570243-05:00" level=info msg="New containerd process, pid: 1620\n"
ago 18 16:05:30 host docker[1602]: time="2016-08-18T16:05:30.491495106-05:00" level=fatal msg="bad listen address format /var/run/docker/libcontainerd/docker-containerd.sock, expected proto://address"
ago 18 16:08:06 host systemd[1]: Stopped Docker Application Container Engine.
-- Reboot --
ago 18 16:16:52 host systemd[1]: Starting Docker Application Container Engine...
ago 18 16:16:54 host docker[2294]: time="2016-08-18T16:16:54.360878396-05:00" level=info msg="New containerd process, pid: 2327\n"
ago 18 16:16:54 host docker[2294]: time="2016-08-18T16:16:54.686503187-05:00" level=fatal msg="bad listen address format /var/run/docker/libcontainerd/docker-containerd.sock, expected proto://address"
ago 18 16:17:00 host docker[2294]: time="2016-08-18T16:17:00.664023288-05:00" level=info msg="New containerd process, pid: 2368\n"
ago 18 16:17:00 host docker[2294]: time="2016-08-18T16:17:00.67708602-05:00" level=fatal msg="bad listen address format /var/run/docker/libcontainerd/docker-containerd.sock, expected proto://address"
One interesting thing with systemd is that if it thinks that a daemon is running, then the start command does nothing.
I have had to do the following to make sure I cleanly restart certain daemons:
sudo systemctl stop service-name
# wait a little if the service is slow to stop like the Cassandra database
sudo systemctl start service-name
That has worked for me with various services.
One way to know whether the service is considered running is to check the status like so:
systemctl status service-name
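The "wait a little" step can be made explicit by polling until the service is no longer reported as active. A minimal sketch: wait_until_stopped is a hypothetical helper that takes an attempt limit and a probe command, and the 30-attempt limit in the usage note is an arbitrary choice.

```shell
# Hypothetical helper: run a probe command once per second until it fails or
# the attempt limit is reached. Returns 0 as soon as the probe fails (the
# service has stopped), 1 if it is still succeeding after all attempts.
# Usage: wait_until_stopped ATTEMPTS PROBE [ARGS...]
wait_until_stopped() {
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        "$@" >/dev/null 2>&1 || return 0
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}
```

With systemd the probe can be systemctl is-active --quiet, which exits 0 while the unit is active: sudo systemctl stop service-name; wait_until_stopped 30 systemctl is-active --quiet service-name && sudo systemctl start service-name.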
