Docker Build Process Stuck

My OS:
Ubuntu 18.04 LTS
My Docker version:
# docker --version
Docker version 19.03.6, build 369ce74a3c
I'm trying to build a Docker image:
docker build -t image:tag .
Sending build context to Docker daemon 187.9kB
Step 1/8 : FROM node:8.16.2-alpine3.9
---> 9c0651c52baf
Step 2/8 : RUN mkdir -p /app
---> Running in 85ecdcc9218c
It gets stuck on step 2 with no activity. Here's the log from syslog:
dockerd[4988]: time="2020-02-20T08:28:27.xxxxxxxxxx" level=info msg="API listen on /var/run/docker.sock"
systemd[1]: Reloading.
systemd-udevd[5315]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
systemd-udevd[5315]: Could not generate persistent MAC address for vethaxxxxxx: No such file or directory
systemd-networkd[4063]: vethexxxxxx: Link UP
kernel: [ 2304.024934] docker0: port 1(vethexxxxxx) entered blocking state
kernel: [ 2304.024936] docker0: port 1(vethexxxxxx) entered disabled state
kernel: [ 2304.025182] device vethexxxxxx entered promiscuous mode
systemd-timesyncd[4039]: Network configuration changed, trying to establish connection.
systemd-udevd[5317]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
systemd-udevd[5317]: Could not generate persistent MAC address for vethexxxxxx: No such file or directory
kernel: [ 2304.029095] IPv6: ADDRCONF(NETDEV_UP): vethexxxxxx: link is not ready
systemd-timesyncd[4039]: Synchronized to time server 91.189.89.199:123 (ntp.ubuntu.com).
systemd-timesyncd[4039]: Network configuration changed, trying to establish connection.
systemd-timesyncd[4039]: Synchronized to time server 91.189.89.199:123 (ntp.ubuntu.com).
containerd[4987]: time="2020-02-20T08:31:18.xxxxxxxxxx" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/85ecdcc9218c280e97de4bfd38b0d70d83bb601e58a61a2c58fff52db2c90042/shim.sock" debug=false pid=5326
systemd-timesyncd[4039]: Network configuration changed, trying to establish connection.
systemd-timesyncd[4039]: Synchronized to time server 91.189.89.199:123 (ntp.ubuntu.com).
systemd-timesyncd[4039]: Network configuration changed, trying to establish connection.
systemd-networkd[4063]: vethexxxxxx: Gained carrier
systemd-networkd[4063]: docker0: Gained carrier
kernel: [ 2304.285614] eth0: renamed from vetha3b6298
kernel: [ 2304.285866] IPv6: ADDRCONF(NETDEV_CHANGE): vethexxxxxx: link becomes ready
kernel: [ 2304.285900] docker0: port 1(vethe0b5233) entered blocking state
kernel: [ 2304.285901] docker0: port 1(vethe0b5233) entered forwarding state
systemd-timesyncd[4039]: Synchronized to time server 91.189.89.199:123 (ntp.ubuntu.com).
systemd-networkd[4063]: vethe0b5233: Gained IPv6LL
systemd-timesyncd[4039]: Network configuration changed, trying to establish connection.
systemd-timesyncd[4039]: Synchronized to time server 91.189.89.199:123 (ntp.ubuntu.com).
Furthermore, if I press ^C to kill the build process, it breaks my SSH connection too.
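Before killing it, a few read-only checks from a second shell can help narrow down where the hang sits (a sketch, not a confirmed fix; the container ID comes from the build output above):
# is the RUN step's container actually running?
docker ps --filter id=85ecdcc9218c
# watch the daemon's own logs while the build hangs
sudo journalctl -u docker.service -f
# is dockerd or containerd blocked on I/O (D state)?
ps -eo pid,stat,wchan,cmd | grep -E 'dockerd|containerd'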

Related

attempt to change docker data-root fails - why

I am trying to set my Docker storage dir to something other than the default, something I've done on other machines:
/etc/docker/daemon.json:
{
"data-root": "/mnt/x/y/docker_data"
}
where the storage dir looks like
jeremyr@snorble:~$ ls -ltr /mnt/x/y
total 4
drwxrwxrwx 11 jeremyr 5001 122 Mar 19 08:14 docker_data
With the daemon.json file in place, sudo systemctl restart docker hits "Job for docker.service failed" (without that daemon.json, docker restarts fine and docker run hello-world runs fine). With the daemon.json in place, journalctl -xn shows:
Mar 25 14:20:33 bolt88 systemd[1]: docker.service start request repeated too quickly, refusing to start.
Mar 25 14:20:33 bolt88 systemd[1]: Failed to start Docker Application Container Engine.
-- Subject: Unit docker.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit docker.service has failed.
--
-- The result is failed.
Mar 25 14:20:33 bolt88 systemd[1]: Unit docker.service entered failed state.
Mar 25 14:20:34 bolt88 sudo[23961]: jeremyr : TTY=pts/18 ; PWD=/home/jeremyr ; USER=root ; COMMAND=/bin/journalctl -xn
Mar 25 14:20:34 bolt88 sudo[23961]: pam_unix(sudo:session): session opened for user root by jeremyr(uid=0)
while systemctl status docker.service just shows code=exited, status=1/FAILURE
and in dmesg I see this:
1547:[Mon Mar 25 14:21:41 2019] aufs au_opts_verify:1570:dockerd[20714]: dirperm1 breaks the protection by the permission bits on the lower branch
1548-[Mon Mar 25 14:21:41 2019] device veth34d1dfd entered promiscuous mode
1549-[Mon Mar 25 14:21:41 2019] IPv6: ADDRCONF(NETDEV_UP): veth34d1dfd: link is not ready
1550-[Mon Mar 25 14:21:41 2019] IPv6: ADDRCONF(NETDEV_CHANGE): veth34d1dfd: link becomes ready
1551:[Mon Mar 25 14:21:41 2019] docker0: port 1(veth34d1dfd) entered forwarding state
1552:[Mon Mar 25 14:21:41 2019] docker0: port 1(veth34d1dfd) entered forwarding state
1553:[Mon Mar 25 14:21:41 2019] docker0: port 1(veth34d1dfd) entered disabled state
1554-[Mon Mar 25 14:21:41 2019] device veth34d1dfd left promiscuous mode
1555:[Mon Mar 25 14:21:41 2019] docker0: port 1(veth34d1dfd) entered disabled state
1556-[Mon Mar 25 14:21:59 2019] systemd-sysv-generator[20958]: Ignoring creation of an alias umountiscsi.service for itself
This is Docker version 17.05.0-ce, build 89658be, on a Debian 8.8 setup.
Does anyone know why docker isn't allowing use of that dir as data-root?
TL;DR: this worked on Ubuntu 18.04 just before posting.
Follow these instructions:
sudo systemctl stop docker
sudo rsync -axPS /var/lib/docker/ /mnt/x/y/docker_data # copy all existing data to the new location
sudo vi /lib/systemd/system/docker.service # or your favorite text editor
In docker.service, find the line that looks like this:
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
and add --data-root /mnt/x/y/docker_data to it (all on one line):
ExecStart=/usr/bin/dockerd --data-root /mnt/x/y/docker_data -H fd:// --containerd=/run/containerd/containerd.sock
Save and quit, then run:
sudo systemctl daemon-reload
sudo systemctl start docker
docker info | grep "Root Dir"
The last command should output: Docker Root Dir: /mnt/x/y/docker_data
That's it; you should be done here.
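As a side note, instead of editing /lib/systemd/system/docker.service in place (a package upgrade can overwrite it), the same change can go into a systemd drop-in. A minimal sketch, assuming the same paths as above:
sudo systemctl edit docker
# in the editor that opens, enter:
#   [Service]
#   ExecStart=
#   ExecStart=/usr/bin/dockerd --data-root /mnt/x/y/docker_data -H fd:// --containerd=/run/containerd/containerd.sock
sudo systemctl daemon-reload
sudo systemctl restart docker
The empty ExecStart= line is required to clear the packaged command before replacing it.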
The too-long version, if you do want to read:
After some investigating, I found several outdated articles, including this one, that offered confident solutions. These are the typical suggestions:
add the -g option in docker.service
Not working, because -g and --graph were deprecated in release v17.05.0.
add data-root in /etc/docker/daemon.json (the method the question author tried)
Not working, for some unknown reason.
After reading those solutions on about a dozen web pages, the inspiration came from:
How To Change Docker Data Folder Configuration
Not a very good solution by itself (it isn't popular), but the interesting part is below its "Update":
graph has been deprecated in v17.05.0. You can use data-root instead.
Yeah: graph => data-root, and --graph is just the long form of -g. So I tried that substitution in the "add the -g option in docker.service" solution, and ta-da!
Something is off with the docker_data directory.
Solution:
1) Remove the /etc/docker/daemon.json file.
2) Start docker.
3) Copy the /var/lib/docker contents to the path you've put in /etc/docker/daemon.json.
4) Put back the /etc/docker/daemon.json file and restart docker.
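A minimal shell sketch of those steps, assuming the paths from the question (a stop is added before the copy so files aren't changing mid-rsync):
sudo mv /etc/docker/daemon.json /etc/docker/daemon.json.bak   # 1) remove the file
sudo systemctl start docker                                   # 2) start docker
sudo systemctl stop docker
sudo rsync -aP /var/lib/docker/ /mnt/x/y/docker_data          # 3) copy the contents
sudo mv /etc/docker/daemon.json.bak /etc/docker/daemon.json   # 4) put the file back
sudo systemctl restart docker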
Well, I'm not an expert on Docker, but I see "dirperm1 breaks the protection by the permission bits on the lower branch" in your log. And I also see this:
"drwxrwxrwx 11 jeremyr 5001 122 Mar 19 08:14 docker_data"
As I understand it, the Docker daemon requires access permission to that directory. Does GID 5001 map to the "docker" group?
However, if you run the daemon as root, this shouldn't happen.
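A quick way to check that guess (the GID and path come from the question; the root ownership shown is only what a root-run daemon typically expects):
getent group 5001                          # does GID 5001 resolve to the docker group?
stat -c '%U:%G %a' /mnt/x/y/docker_data    # current owner, group, and mode
sudo chown root:root /mnt/x/y/docker_data
sudo chmod 711 /mnt/x/y/docker_data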
Check the Docker version on your machine with:
docker --version
I was facing the same issue, and it was solved after upgrading Docker to the latest available version. Even the documentation on Docker's official website doesn't mention anything like this.
Once you have upgraded Docker, restart it with:
systemctl restart docker
The error will be gone, and the new changes will take effect.
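For reference, on an apt-based system with Docker CE installed from Docker's repository, the upgrade might look like this (the docker-ce package name is an assumption about how Docker was installed):
sudo apt-get update
sudo apt-get install --only-upgrade docker-ce
docker --version
sudo systemctl restart docker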

Docker stopped all of a sudden in CentOS 7

I was running Docker on my CentOS 7 machine. Today I was trying to upgrade a container, so I stopped the container and tried to pull a new image.
I got the error below:
Error getting v2 registry: Get https://registry-1.docker.io/v2/: proxyconnect tcp: dial tcp: lookup https_proxy=http: no such host"
I checked the proxy setting for the machine (cat /etc/environment) and for Docker (cat /etc/systemd/system/docker.service.d/http-proxy.conf).
Both are set correctly.
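(The "lookup https_proxy=http" part of the error suggests the daemon parsed a whole "https_proxy=http://..." assignment as a hostname, so the drop-in is worth re-checking. For reference, the documented shape of that file is roughly the following; the proxy host is a placeholder:
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128"
Environment="HTTPS_PROXY=http://proxy.example.com:3128"
Environment="NO_PROXY=localhost,127.0.0.1"
)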
I enabled daemon logs for Docker, and the logs say:
Sep 14 10:43:18 myCentOsServer kernel: [4913751.074277] docker0: port 1(veth1e3300a) entered disabled state
Sep 14 10:43:18 myCentOsServer kernel: [4913751.084599] docker0: port 1(veth1e3300a) entered disabled state
Sep 14 10:43:18 myCentOsServer kernel: [4913751.084888] docker0: port 1(veth1e3300a) entered disabled state
Sep 14 10:43:18 myCentOsServer NetworkManager[794]: <info> [1505349798.0267] device (veth1e3300a): released from master device docker0
Sep 14 10:44:48 myCentOsServer dockerd[29136]: time="2017-09-14T10:44:48.802236300+10:00" level=warning msg="Error getting v2 registry: Get https://registry-1.docker.io/v2/: proxyconnect tcp: dial tcp: lookup https_proxy=http: no such host"
I tried the commands below, but they hang:
systemctl daemon-reload
systemctl restart docker
Any idea what might be the issue?
Thanks in advance.
I was finally able to solve this issue.
The issue was with my Docker mount points. Mine was set to /var/lib/docker, and I suspect it got corrupted when I did a data volume export.
Steps I followed (sketched as commands below):
1) Navigated to /var/lib/docker, took a backup of the images, containers, and volumes folders, and deleted them.
2) Reloaded the daemon.
3) Restarted Docker.
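A sketch of those steps as shell commands (folder names as given above; the backup destination under /root is illustrative):
sudo systemctl stop docker
sudo tar czf /root/docker-backup.tar.gz -C /var/lib/docker containers images volumes
sudo rm -rf /var/lib/docker/containers /var/lib/docker/images /var/lib/docker/volumes
sudo systemctl daemon-reload
sudo systemctl restart docker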
Now it is working fine.
However, the bad news is that I lost the data dump I had taken from one of the containers (using volumes-from).
But it was a dev version of the software, so I reinstalled and redid the setup.
This occurs sometimes in CentOS. You can simply restart the Docker service with:
systemctl restart docker.service

When using Mesos, Marathon, and ZooKeeper, my mesos-slave doesn't start when I specify the "containerizers" file with "docker,mesos"?

I have 3 CentOS VMs. I have installed ZooKeeper, Marathon, and Mesos on the master node, while only putting Mesos on the other 2 VMs; the master node has no mesos-slave running on it. I am trying to run Docker containers, so I specified "docker,mesos" in the containerizers file. One of the mesos-agents starts fine with this configuration, and I have been able to deploy a container to that slave. However, the second mesos-agent simply fails when I have this configuration (it works if I take out that containerizers file, but then it doesn't run containers). Here are some of the logs and information that have come up:
Here are some "messages" in the log directory:
Apr 26 16:09:12 centos-minion-3 systemd: Started Mesos Slave.
Apr 26 16:09:12 centos-minion-3 systemd: Starting Mesos Slave...
WARNING: Logging before InitGoogleLogging() is written to STDERR
[main.cpp:243] Build: 2017-04-12 16:39:09 by centos
[main.cpp:244] Version: 1.2.0
[main.cpp:247] Git tag: 1.2.0
[main.cpp:251] Git SHA: de306b5786de3c221bae1457c6f2ccaeb38eef9f
[logging.cpp:194] INFO level logging started!
[systemd.cpp:238] systemd version `219` detected
[main.cpp:342] Inializing systemd state
[systemd.cpp:326] Started systemd slice `mesos_executors.slice`
[containerizer.cpp:220] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
[linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
[provisioner.cpp:249] Using default backend 'copy'
[slave.cpp:211] Mesos agent started on (1)#172.22.150.87:5051
[slave.cpp:212] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="docker,mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="linux" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/var/run/mesos" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos"
[slave.cpp:541] Agent resources: cpus(*):1; mem(*):919; disk(*):2043; ports(*):[31000-32000]
[slave.cpp:549] Agent attributes: [ ]
[slave.cpp:554] Agent hostname: node3
[status_update_manager.cpp:177] Pausing sending status updates
[state.cpp:62] Recovering state from '/var/lib/mesos/meta'
[state.cpp:706] No committed checkpointed resources found at '/var/lib/mesos/meta/resources/resources.info'
[status_update_manager.cpp:203] Recovering status update manager
[docker.cpp:868] Recovering Docker containers
[containerizer.cpp:599] Recovering containerizer
[provisioner.cpp:410] Provisioner recovery complete
[group.cpp:340] Group process (zookeeper-group(1)#172.22.150.87:5051) connected to ZooKeeper
[group.cpp:830] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
[group.cpp:418] Trying to create path '/mesos' in ZooKeeper
[detector.cpp:152] Detected a new leader: (id='15')
[group.cpp:699] Trying to get '/mesos/json.info_0000000015' in ZooKeeper
[zookeeper.cpp:259] A new leading master (UPID=master#172.22.150.88:5050) is detected
Failed to perform recovery: Collect failed: Failed to run 'docker -H unix:///var/run/docker.sock ps -a': exited with status 1; stderr='Cannot connect to the Docker daemon. Is the docker daemon running on this host?'
To remedy this do as follows:
Step 1: rm -f /var/lib/mesos/meta/slaves/latest
This ensures agent doesn't recover old live executors.
Step 2: Restart the agent.
Apr 26 16:09:13 centos-minion-3 systemd: mesos-slave.service: main process exited, code=exited, status=1/FAILURE
Apr 26 16:09:13 centos-minion-3 systemd: Unit mesos-slave.service entered failed state.
Apr 26 16:09:13 centos-minion-3 systemd: mesos-slave.service failed.
Logs from docker:
$ sudo systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/docker.service.d
           └─flannel.conf
   Active: inactive (dead) since Tue 2017-04-25 18:00:03 CDT; 24h ago
     Docs: docs.docker.com
 Main PID: 872 (code=exited, status=0/SUCCESS)
Apr 26 18:25:25 centos-minion-3 systemd[1]: Dependency failed for Docker Application Container Engine.
Apr 26 18:25:25 centos-minion-3 systemd[1]: Job docker.service/start failed with result 'dependency'
Logs from flannel:
[flanneld-start: network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
You have the answer in your logs:
Failed to perform recovery: Collect failed:
Failed to run 'docker -H unix:///var/run/docker.sock ps -a': exited with status 1;
stderr='Cannot connect to the Docker daemon. Is the docker daemon running on this host?'
To remedy this do as follows:
Step 1: rm -f /var/lib/mesos/meta/slaves/latest
This ensures agent doesn't recover old live executors.
Step 2: Restart the agent.
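Sketched as commands (the mesos-slave unit name and the /var/lib/mesos work_dir come from the logs above):
sudo systemctl stop mesos-slave
sudo rm -f /var/lib/mesos/meta/slaves/latest
sudo systemctl start mesos-slave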
Mesos keeps its state/metadata on local disk. When it's restarted, it tries to load this state. If the configuration changed and is not compatible with the previous state, it won't start.
Just bring Docker back to life by fixing the problems with flannel and etcd, and everything will be fine.
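A sketch of that order of operations (the etcd and flanneld unit names are assumptions; adjust them to your setup):
systemctl status etcd flanneld docker
sudo systemctl start etcd
sudo systemctl start flanneld
sudo systemctl start docker
docker -H unix:///var/run/docker.sock ps -a   # the exact command Mesos runs; it should now succeed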
Add the following flag when starting the agent:
--reconfiguration_policy=additive
More details here: http://mesos.apache.org/documentation/latest/agent-recovery/
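If the agent is installed from the stock Mesosphere packages, one common way to set such a flag is a file named after it under /etc/mesos-slave (this packaging convention is an assumption about your setup, and --reconfiguration_policy only exists in Mesos releases newer than the 1.2.0 shown above, so check the linked page against your agent version):
echo additive | sudo tee /etc/mesos-slave/reconfiguration_policy
sudo systemctl restart mesos-slave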

Error creating default "bridge" network: cannot create network (docker0): conflicts with network (docker0): networks have same bridge name

After stopping Docker, it refused to start again. It complained that another bridge called docker0 already exists:
level=warning msg="devmapper: Base device already exists and has filesystem xfs on it. User specified filesystem will be ignored."
level=info msg="[graphdriver] using prior storage driver \"devicemapper\""
level=info msg="Graph migration to content-addressability took 0.00 seconds"
level=info msg="Firewalld running: false"
level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
level=fatal msg="Error starting daemon: Error initializing network controller: Error creating default \"bridge\" network: cannot create network fa74b0de61a17ffe68b9a8f7c1cd698692fb56f6151a7898d66a30350ca0085f (docker0): conflicts with network bb9e0aab24dd1f4e61f8e7a46d4801875ade36af79d7d868c9a6ddf55070d4d7 (docker0): networks have same bridge name"
docker.service: Main process exited, code=exited, status=1/FAILURE
Failed to start Docker Application Container Engine.
docker.service: Unit entered failed state.
docker.service: Failed with result 'exit-code'.
Deleting the bridge with ip link del docker0 and then starting Docker leads to the same result with another ID.
In my case, I downgraded my OS (CentOS Atomic Host) and came across this error message. The Docker version on the older CentOS Atomic was 1.9.1. I did not have any running Docker containers or images pulled before running the downgrade.
I simply ran the commands below and Docker was happy again:
sudo rm -rf /var/lib/docker/network
sudo systemctl start docker
The problem seems to be in /var/lib/docker/network/. A lot of sockets are stored there that reference the bridge by its old ID. To solve the problem you can delete all the sockets, delete the interface, and then start Docker, but all your containers will refuse to work since their sockets are gone. In my case I did not care about my stateless containers anyway, so this fixed the problem:
ip link del docker0
rm -rf /var/lib/docker/network/*
mkdir /var/lib/docker/network/files
systemctl start docker
# delete all containers
docker ps -aq | xargs -r docker rm -f
# recreate all containers
It may sound obvious, but you may want to consider rebooting, especially if there was a major system update recently.
That worked for me: I hadn't rebooted my VM after installing some kernel updates, which probably left many network modules in an undefined state.

Setting multiple DOCKER_OPTS arguments

If you want to pass an option to the Docker Engine at startup on Ubuntu, you can edit the /etc/default/docker file.
Here I'm setting the storage driver to AUFS:
DOCKER_OPTS="--storage-driver=aufs"
However, if I pass more than one argument, Docker doesn't start. For example:
DOCKER_OPTS="--insecure-registry=0.0.0.0:5000 --storage-driver=aufs"
Now Docker fails to start:
# service docker stop && service docker start
docker start/running, process 31569
# service docker status
docker stop/waiting
From /var/log/syslog:
Mar 11 14:55:30 myhost kernel: [ 2788.030270] init: docker main process (31253) terminated with status 1
Mar 11 14:55:30 myhost kernel: [ 2788.030279] init: docker main process ended, respawning
Mar 11 14:55:30 myhost kernel: [ 2788.085931] init: docker main process (31287) terminated with status 1
Mar 11 14:55:30 myhost kernel: [ 2788.085940] init: docker respawning too fast, stopped
Each argument works on its own, but if they are passed together the Docker service refuses to start. I am using Docker version 1.10.3, build 20f81dd, on Ubuntu 14.04 (kernel 3.13.0-74-generic).
How can I pass more than one argument in DOCKER_OPTS?
The arguments must be separated by commas. This format works:
DOCKER_OPTS="--insecure-registry=0.0.0.0:5000,--storage-driver=aufs"
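To confirm that both options took effect after a restart, a quick check might be (a sketch; the grep string matches docker info's usual output):
sudo service docker restart
sudo service docker status
docker info | grep 'Storage Driver'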
