Error starting docker service: "A dependency job for docker.service failed. See 'journalctl -xe' for details."

So, after some minor changes to the docker configuration, I tried to restart docker and it failed with the error message below:
A dependency job for docker.service failed. See 'journalctl -xe' for details.
Kubernetes is also running on the same machine where this docker daemon was running.
Below are the logs of the docker service (output of journalctl -u docker.service).
May 15 08:56:06 ilcepoc500 systemd[1]: Stopping Docker Application Container Engine...
May 15 08:56:07 ilcepoc500 oci-umount[42741]: umounthook <debug>: 5148572ffa9c: only runs in prestart stage, ignoring
May 15 08:56:07 ilcepoc500 oci-systemd-hook[42824]: systemdhook <debug>: 4676114a4bcd: Skipping as container command is /fission-bundle, not init or systemd
May 15 08:56:07 ilcepoc500 oci-systemd-hook[43025]: systemdhook <debug>: 92140d272e14: Skipping as container command is /go/bin/all-in-one-linux, not init or systemd
May 15 08:56:18 ilcepoc500 oci-umount[44315]: umounthook <debug>: prestart container_id:12d638f87c0d rootfs:/storage/docker/overlay2/ab7502908ea8a939e9ea7379f9715e40b563717404fc5c2ee923062e67520f15/merged
May 15 08:56:21 ilcepoc500 systemd[1]: Dependency failed for Docker Application Container Engine.
May 15 08:56:21 ilcepoc500 systemd[1]: Job docker.service/start failed with result 'dependency'.
I followed some links from GitHub and SO but have had no luck so far. Any hints are appreciated.
Below are the things that I have tried:
Deleted /var/log/docker, reloaded the docker daemon and tried restarting docker; it didn't work.
Created a file named override.conf inside the directory /etc/systemd/system/containerd.service.d and tried restarting the docker service; it didn't work either.

Related

Docker service doesn't auto start after moving the docker image data directory to external drive location

Following this page, I have moved the docker data directory and created a symbolic link to it. It works. But every time I reboot my computer, the Docker service no longer starts automatically. How can I solve this problem?
journalctl -u docker.service returns:
Jun 30 10:29:55 ubuntu systemd[1]: Starting Docker Application Container Engine...
Jun 30 10:29:55 ubuntu dockerd[2358]: time="2022-06-30T10:29:55.426467188+10:00" level=info msg="S>
Jun 30 10:29:55 ubuntu dockerd[2358]: mkdir /var/lib/docker: file exists
Jun 30 10:29:55 ubuntu systemd[1]: docker.service: Main process exited, code=exited, status=1/FAIL>
Jun 30 10:29:55 ubuntu systemd[1]: docker.service: Failed with result 'exit-code'.
Jun 30 10:29:55 ubuntu systemd[1]: Failed to start Docker Application Container Engine.
Jun 30 10:29:57 ubuntu systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Jun 30 10:29:57 ubuntu systemd[1]: Stopped Docker Application Container Engine.
Jun 30 10:29:57 ubuntu systemd[1]: docker.service: Start request repeated too quickly.
Jun 30 10:29:57 ubuntu systemd[1]: docker.service: Failed with result 'exit-code'.
Jun 30 10:29:57 ubuntu systemd[1]: Failed to start Docker Application Container Engine.
Before I moved the data directory, "/var/lib/docker" was a directory used by Docker; now it is a symbolic link that points to the external directory where the docker image data is stored. Why is there a mkdir command?
If I run dockerd, it returns:
INFO[2022-06-30T20:53:05.143671302+10:00] Starting up
dockerd needs to be started with root privileges. To run dockerd in rootless mode as an unprivileged user, see https://docs.docker.com/go/rootless/
If I run sudo service docker start, docker starts without error. But I don't want to run this every day. Docker used to start automatically. Any ideas?
I was able to reproduce the error message with the same configuration:
systemd[1]: Starting Docker Application Container Engine...
dockerd[47623]: time="2022-06-30T16:36:20.047741616Z" level=in..
dockerd[47623]: mkdir /data/docker: file exists
systemd[1]: docker.service: Main process exited, code=exited, ..
The reason was that my external drive wasn't mounted yet.
Adding systemd mount/automount units resolves the issue. Alternatively, you can add your external drive to /etc/fstab (add nofail to avoid the 90-second wait at boot when the drive is not attached).
Also, from the Docker docs:
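The "mkdir ... file exists" message is consistent with a symlink whose target is missing: the link itself occupies the path, so creating a directory there fails, yet nothing usable sits behind it. A minimal Python sketch of that behaviour, assuming a POSIX system; the temporary paths are invented for the demonstration and this is not the code dockerd actually runs:

```python
import os
import tempfile


def mkdir_on_path(path):
    """Try to create a directory at path and report what happened."""
    try:
        os.mkdir(path)
        return "created"
    except FileExistsError:
        # mkdir(2) fails with EEXIST when the path is already taken,
        # and a symlink counts even if its target is missing.
        return "file exists"


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-ins for the unmounted drive and the symlink.
        target = os.path.join(tmp, "external-drive", "docker")  # never created
        link = os.path.join(tmp, "var-lib-docker")
        os.symlink(target, link)
        return {
            "link_exists": os.path.islink(link),
            "target_exists": os.path.exists(link),  # follows the symlink
            "mkdir_result": mkdir_on_path(link),
        }


if __name__ == "__main__":
    print(demo())
```

Once the drive is mounted the target exists again, which would explain why starting the service by hand later succeeds.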
You can configure the Docker daemon to use a different directory, using the data-root configuration option.
So editing your /etc/docker/daemon.json with:
{
"data-root": "/data/docker"
}
is probably better than using symlinks.
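If daemon.json already holds other settings, the data-root key has to be merged in rather than the file being overwritten. A small Python sketch of that merge, operating on the file's text; the function name and the "log-driver" entry in the example are only illustrative:

```python
import json


def set_data_root(daemon_json_text, data_root):
    """Return daemon.json text with "data-root" set, keeping other keys."""
    # An empty or missing file is treated as an empty configuration.
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    config["data-root"] = data_root
    return json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    # Example input only; a real file may contain different keys.
    print(set_data_root('{"log-driver": "json-file"}', "/data/docker"))
```

Writing the result back to /etc/docker/daemon.json needs root, and the daemon has to be restarted before the new location is used.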

Docker daemon cannot be started for some (hidden) reason

I am trying to push a docker image and noticed that my docker daemon is probably not running.
If, for example, I run:
docker run hello-world
docker: Cannot connect to the Docker daemon at
unix:///var/run/docker.sock. Is the docker daemon running?.
If I try to restart the daemon using:
systemctl start docker
Job for docker.service failed because the control process exited with
error code. See "systemctl status docker.service" and "journalctl -xe"
for details.
Continuing by running:
systemctl status docker.service
docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: failed (Result: start-limit-hit) since Wed 2021-05-12 14:45:09 EEST; 43s ago
Docs: https://docs.docker.com
Process: 4810 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=1/FAILURE)
Main PID: 4810 (code=exited, status=1/FAILURE)
May 12 14:45:07 iti-554 systemd[1]: docker.service: Unit entered failed state.
May 12 14:45:07 iti-554 systemd[1]: docker.service: Failed with result 'exit-code'.
May 12 14:45:09 iti-554 systemd[1]: docker.service: Service hold-off time over, scheduling restart.
May 12 14:45:09 iti-554 systemd[1]: Stopped Docker Application Container Engine.
May 12 14:45:09 iti-554 systemd[1]: docker.service: Start request repeated too quickly.
May 12 14:45:09 iti-554 systemd[1]: Failed to start Docker Application Container Engine.
May 12 14:45:09 iti-554 systemd[1]: docker.service: Unit entered failed state.
May 12 14:45:09 iti-554 systemd[1]: docker.service: Failed with result 'start-limit-hit'.
As I understand it, this means the docker daemon is not running (it is in a failed state), and the most recent reason is that the start limit has been hit. That in turn probably means some other, underlying reason is making the daemon fail repeatedly.
So, how do I find out the actual reason my docker daemon refuses to start?
If I reset the failed-attempts counter with:
systemctl reset-failed docker.service
it returns without error, so I assume it succeeds. And indeed, when I check the status, it has become:
Active: inactive (dead) since Wed 2021-05-12 14:45:09 EEST; 14min ago
Of course, if I start the docker daemon again, it fails.
Can someone suggest a workaround for this issue? I even tried running the commands again after a reboot (it didn't help).
Edit
Well, in my case the problem was a rather stupid one. I had added a daemon.json file with minimal content in it. Just this:
cat /etc/docker/daemon.json
{
"insecure-registries": [
"docker-server.com:10022",
"docker-server.com:10023"
],
}
The problem was that the trailing comma before } made docker look for another key. The relevant message, shown by journalctl -u docker, was:
unable to configure the Docker daemon with file /etc/docker/daemon.json: invalid character '}' looking for beginning of object key string
This message is quite obvious, but the earlier ones did not help much.
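A syntax check of the file before restarting the daemon would catch this kind of mistake. The sketch below uses Python's strict JSON parser, which also rejects a trailing comma; the helper name is made up, and the exact wording of the reported error differs between Python versions:

```python
import json


def check_daemon_json(text):
    """Return None if text is valid JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# The file content from the question, trailing comma included.
BROKEN = """{
"insecure-registries": [
"docker-server.com:10022",
"docker-server.com:10023"
],
}"""

if __name__ == "__main__":
    print(check_daemon_json(BROKEN))
```

From a shell, python3 -m json.tool /etc/docker/daemon.json performs the same kind of check.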
journalctl -u docker gives you the docker daemon logs. Maybe you can find something there.
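In that output the lines written by systemd itself ("Start request repeated too quickly" and so on) only describe the restart loop; the actual cause is usually in the lines logged by dockerd. A rough Python sketch that separates the two, assuming the default journalctl line layout seen in these threads; the function name is hypothetical:

```python
import re

# Matches the "process[pid]: message" tail of a journal line,
# e.g. "dockerd[2358]: mkdir /var/lib/docker: file exists".
_SOURCE = re.compile(r"\b([\w.-]+)\[\d+\]:\s*(.*)$")


def daemon_messages(journal_text, process="dockerd"):
    """Return the message part of every line logged by the given process."""
    messages = []
    for line in journal_text.splitlines():
        match = _SOURCE.search(line)
        if match and match.group(1) == process:
            messages.append(match.group(2))
    return messages


if __name__ == "__main__":
    sample = (
        "Jun 30 10:29:55 ubuntu systemd[1]: Starting Docker Application Container Engine...\n"
        "Jun 30 10:29:55 ubuntu dockerd[2358]: mkdir /var/lib/docker: file exists\n"
        "Jun 30 10:29:57 ubuntu systemd[1]: docker.service: Start request repeated too quickly.\n"
    )
    print(daemon_messages(sample))
```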
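Access to unix:///var/run/docker.sock requires the correct permissions. This is a security feature of Docker.
Try sudo chmod 755 /var/run/docker.sock and re-run the Docker command.
Note that the permission mode given here may not be suitable for everyone, and that changing permissions only helps when the daemon is actually running; it does not fix a daemon that fails to start.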

docker.service failed. See 'journalctl -xe' for details

docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Thu 2018-05-17 15:47:26 CEST; 17h ago
Docs: https://docs.docker.com
Main PID: 11843 (code=exited, status=0/SUCCESS)
CGroup: /system.slice/docker.service
May 18 08:48:38 temp systemd[1]: docker.service: Job docker.service/start failed with result 'dependency'.
May 18 08:49:09 temp systemd[1]: Stopped Docker Application Container Engine.
May 18 08:49:09 temp systemd[1]: Dependency failed for Docker Application Container Engine.
May 18 08:49:09 temp systemd[1]: docker.service: Job docker.service/start failed with result 'dependency'.
May 18 08:49:15 temp systemd[1]: Dependency failed for Docker Application Container Engine.
May 18 08:49:15 temp systemd[1]: docker.service: Job docker.service/start failed with result 'dependency'.
May 18 09:00:03 temp systemd[1]: Dependency failed for Docker Application Container Engine.
May 18 09:00:03 temp systemd[1]: docker.service: Job docker.service/start failed with result 'dependency'.
May 18 09:03:51 temp systemd[1]: Dependency failed for Docker Application Container Engine.
May 18 09:03:51 temp systemd[1]: docker.service: Job docker.service/start failed with result 'dependency'.
I uninstalled docker and reinstalled it, but it raises the same error ("Is the docker daemon running?"). Can someone help me here?
A service that docker requires is not running, so systemd won't launch docker.
Try running journalctl -f (without -u) to see the logs of all units, then start docker and read the log carefully; you will probably see some other units trying to start and failing.
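Another way to narrow this down is to ask systemd which units docker.service hard-depends on and check each one's state. The Python sketch below wraps systemctl show and systemctl is-active; the helper names are invented, and the sample values in the usage note are made up rather than taken from the machine in the question:

```python
import subprocess


def parse_show_output(text):
    """Parse `systemctl show` KEY=VALUE lines into {key: [unit, ...]}."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            result[key] = value.split()
    return result


def hard_dependencies(unit):
    """Ask systemd which units the given unit requires or binds to."""
    out = subprocess.run(
        ["systemctl", "show", "--property=Requires,BindsTo", unit],
        capture_output=True, text=True, check=True,
    ).stdout
    parsed = parse_show_output(out)
    return sorted(set(parsed.get("Requires", []) + parsed.get("BindsTo", [])))


def unit_state(unit):
    """Return the `systemctl is-active` state, e.g. "active" or "failed"."""
    proc = subprocess.run(
        ["systemctl", "is-active", unit], capture_output=True, text=True
    )
    return proc.stdout.strip()
```

For example, parse_show_output("Requires=docker.socket sysinit.target") yields a "Requires" list with two units; calling unit_state on each entry returned by hard_dependencies("docker.service") should point at the dependency that is not active.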
You can find the reason why docker isn't starting by running
/usr/bin/dockerd -H unix://
In my case it was a fresh install of CentOS 7 with Docker 18.09:
ERRO[2018-11-14T22:14:55.441548150+02:00] 'overlay' not found as a supported filesystem on this host. Please ensure kernel is new enough and has overlay support loaded. storage-driver=overlay2
ERRO[2018-11-14T22:14:55.444930007+02:00] AUFS was not found in /proc/filesystems storage-driver=aufs
ERRO[2018-11-14T22:14:55.447984399+02:00] 'overlay' not found as a supported filesystem on this host. Please ensure kernel is new enough and has overlay support loaded. storage-driver=overlay
To fix that, I had to upgrade to a newer kernel and remove the current docker storage (note that this deletes all local images, containers and volumes):
rm -rf /var/lib/docker
Then docker started working.
I have this problem on my machine and have not managed to solve it.
But if you are in a hurry you can run
/usr/bin/dockerd -H unix:///var/run/docker.sock
All the usual commands will then work (docker system, docker, etc.).
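The errors above refer to /proc/filesystems, which lists the filesystems the running kernel currently knows about. A short Python sketch of the same lookup, assuming the usual two-column layout of that file; the function names are hypothetical:

```python
def supported_filesystems(proc_filesystems_text):
    """Return the set of filesystem names listed in /proc/filesystems text."""
    names = set()
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        if fields:
            # Lines look like "nodev<TAB>overlay" or "<TAB>ext4";
            # the filesystem name is always the last field.
            names.add(fields[-1])
    return names


def has_overlay(path="/proc/filesystems"):
    """Check the live kernel list for the overlay filesystem."""
    with open(path) as handle:
        return "overlay" in supported_filesystems(handle.read())
```

A filesystem built as a module only shows up after the module is loaded, so a negative result can also mean that modprobe overlay has not been run yet rather than that the kernel is too old.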

When using mesos, marathon, and zookeeper my mesos-slave doesn't start when I specify the "containerizers" file with "docker,mesos"?

I have 3 CentOS VMs. I have installed Zookeeper, Marathon, and Mesos on the master node, and only Mesos on the other 2 VMs. The master node has no mesos-slave running on it. I am trying to run Docker containers, so I specified "docker,mesos" in the containerizers file. One of the mesos-agents starts fine with this configuration and I have been able to deploy a container to that slave. However, the second mesos-agent simply fails when I have this configuration (it works if I take out the containerizers file, but then it doesn't run containers). Here are some of the logs and information that have come up:
Here are some "messages" in the log directory:
Apr 26 16:09:12 centos-minion-3 systemd: Started Mesos Slave.
Apr 26 16:09:12 centos-minion-3 systemd: Starting Mesos Slave...
WARNING: Logging before InitGoogleLogging() is written to STDERR
[main.cpp:243] Build: 2017-04-12 16:39:09 by centos
[main.cpp:244] Version: 1.2.0
[main.cpp:247] Git tag: 1.2.0
[main.cpp:251] Git SHA: de306b5786de3c221bae1457c6f2ccaeb38eef9f
[logging.cpp:194] INFO level logging started!
[systemd.cpp:238] systemd version `219` detected
[main.cpp:342] Inializing systemd state
[systemd.cpp:326] Started systemd slice `mesos_executors.slice`
[containerizer.cpp:220] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
[linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
[provisioner.cpp:249] Using default backend 'copy'
[slave.cpp:211] Mesos agent started on (1)@172.22.150.87:5051
[slave.cpp:212] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="docker,mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="linux" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/var/run/mesos" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" 
--version="false" --work_dir="/var/lib/mesos"
[slave.cpp:541] Agent resources: cpus(*):1; mem(*):919; disk(*):2043; ports(*):[31000-32000]
[slave.cpp:549] Agent attributes: [ ]
[slave.cpp:554] Agent hostname: node3
[status_update_manager.cpp:177] Pausing sending status updates
[state.cpp:62] Recovering state from '/var/lib/mesos/meta'
[state.cpp:706] No committed checkpointed resources found at '/var/lib/mesos/meta/resources/resources.info'
[status_update_manager.cpp:203] Recovering status update manager
[docker.cpp:868] Recovering Docker containers
[containerizer.cpp:599] Recovering containerizer
[provisioner.cpp:410] Provisioner recovery complete
[group.cpp:340] Group process (zookeeper-group(1)@172.22.150.87:5051) connected to ZooKeeper
[group.cpp:830] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
[group.cpp:418] Trying to create path '/mesos' in ZooKeeper
[detector.cpp:152] Detected a new leader: (id='15')
[group.cpp:699] Trying to get '/mesos/json.info_0000000015' in ZooKeeper
[zookeeper.cpp:259] A new leading master (UPID=master@172.22.150.88:5050) is detected
Failed to perform recovery: Collect failed: Failed to run 'docker -H unix:///var/run/docker.sock ps -a': exited with status 1; stderr='Cannot connect to the Docker daemon. Is the docker daemon running on this host?'
To remedy this do as follows:
Step 1: rm -f /var/lib/mesos/meta/slaves/latest
This ensures agent doesn't recover old live executors.
Step 2: Restart the agent.
Apr 26 16:09:13 centos-minion-3 systemd: mesos-slave.service: main process exited, code=exited, status=1/FAILURE
Apr 26 16:09:13 centos-minion-3 systemd: Unit mesos-slave.service entered failed state.
Apr 26 16:09:13 centos-minion-3 systemd: mesos-slave.service failed.
Logs from docker:
$ sudo systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/docker.service.d
└─flannel.conf
Active: inactive (dead) since Tue 2017-04-25 18:00:03 CDT; 24h ago
Docs: docs.docker.com
Main PID: 872 (code=exited, status=0/SUCCESS)
Apr 26 18:25:25 centos-minion-3 systemd[1]: Dependency failed for Docker Application Container Engine.
Apr 26 18:25:25 centos-minion-3 systemd[1]: Job docker.service/start failed with result 'dependency'
Logs from flannel:
[flanneld-start: network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
You have the answer in your logs:
Failed to perform recovery: Collect failed:
Failed to run 'docker -H unix:///var/run/docker.sock ps -a': exited with status 1;
stderr='Cannot connect to the Docker daemon. Is the docker daemon running on this host?'
To remedy this do as follows:
Step 1: rm -f /var/lib/mesos/meta/slaves/latest
This ensures agent doesn't recover old live executors.
Step 2: Restart the agent.
Mesos keeps its state/metadata on local disk. When it is restarted, it tries to load this state. If the configuration has changed and is not compatible with the previous state, it won't start.
Just bring docker back to life by fixing the problems with flannel and etcd, and everything will be fine.
Add the following flag when starting the agent:
--reconfiguration_policy=additive
More details here: http://mesos.apache.org/documentation/latest/agent-recovery/

Docker fails to start due to "volume store metadata database: timeout"

I followed the installation instructions for Docker CE on CentOS. Initially this worked. At some point the system was restarted, and now starting Docker fails. I would appreciate expert eyes on this matter.
systemctl start docker produces:
Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
systemctl status docker.service produces:
Apr 21 11:25:23 sec-services-build-1 systemd[1]: Starting Docker Application Container Engine...
Apr 21 11:25:23 sec-services-build-1 dockerd[9693]: time="2017-04-21T11:25:23.370390797+03:00" level=info msg="libcontainerd: previous instance of containerd still alive (8908)"
Apr 21 11:25:23 sec-services-build-1 dockerd[9693]: time="2017-04-21T11:25:23.382492171+03:00" level=warning msg="overlay: the backing xfs filesystem is formatted without d_type support, which leads to incorrect behavior. Reformat the filesystem with ftype=1 to enable d_type support. Running without d_type support will no longer be supported in Docker 17.12."
Apr 21 11:25:23 sec-services-build-1 dockerd[9693]: time="2017-04-21T11:25:23.382547668+03:00" level=info msg="[graphdriver] using prior storage driver: overlay"
Apr 21 11:25:24 sec-services-build-1 dockerd[9693]: Error starting daemon: error while opening volume store metadata database: timeout
Apr 21 11:25:24 sec-services-build-1 systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Apr 21 11:25:24 sec-services-build-1 systemd[1]: Failed to start Docker Application Container Engine.
Apr 21 11:25:24 sec-services-build-1 systemd[1]: Unit docker.service entered failed state.
Apr 21 11:25:24 sec-services-build-1 systemd[1]: docker.service failed.
From here: https://github.com/moby/moby/issues/22507
I ran:
ps axf | grep docker | grep -v grep | awk '{print "kill -9 " $1}' | sudo sh
I was then able to restart docker using:
sudo systemctl start docker
Step 1: Run systemctl status docker to check whether docker is running.
Step 2: If it is, stop it with systemctl stop docker.
Step 3: Run dockerd directly.
I got this message after copying volumes from a production machine, which ended up overwriting metadata.db inside /var/lib/docker/volumes; docker then crashed. The fix is simple (note that the prune step deletes unused volumes):
docker system prune --volumes -f && rm /var/lib/docker/volumes/metadata.db && docker-compose up -d
I encountered the same error.
❶ I tried
sudo kill -9 1452
multiple times, but it didn't work. There was still a dockerd process active:
1452 ? Zsl 127:42 [dockerd] <defunct>
❷ I tried what @Artur Mustafin suggested:
sudo mv /var/lib/docker/volumes/metadata.db /var/lib/docker/volumes/metadata.db.bk
It worked.
I tried all of these and nothing worked. What did work was removing all the containers from /var/lib/docker/containers. I then killed all docker processes (found with ps -ef | grep docker) and restarted docker and the docker socket. Once docker was active again, I added the containers back one at a time, and a single container turned out to be the cause of the issue.
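The Z at the start of the state column in the ps line above marks a zombie: the process has already exited and only its process-table entry remains, so kill -9 has nothing left to act on until the parent reaps it. A small Python sketch that pulls this out of a ps ax line, assuming the five-column layout shown above; the function name is made up:

```python
def parse_ps_line(line):
    """Split one `ps ax` line into pid, state flags and command."""
    pid, _tty, stat, _time, command = line.split(None, 4)
    return {
        "pid": int(pid),
        "stat": stat,
        "command": command,
        # A leading "Z" in STAT marks a zombie: the process has already
        # exited and only its process-table entry remains.
        "zombie": stat.startswith("Z"),
    }


if __name__ == "__main__":
    print(parse_ps_line("1452 ? Zsl 127:42 [dockerd] <defunct>"))
```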

Resources