Docker service does not start - docker

Docker is giving me a hard time at the moment. I followed these instructions to install Docker on my virtual server running Ubuntu 14.04, hosted by strato.de.
wget -qO- https://get.docker.com/ | sh
Running this line immediately produces the following error message:
modprobe: ERROR: ../libkmod/libkmod.c:507 kmod_lookup_alias_from_builtin_file() could not open builtin file '/lib/modules/3.13.0-042stab092.3/modules.builtin.bin'
modprobe: FATAL: Module aufs not found.
Warning: current kernel is not supported by the linux-image-extra-virtual
package. We have no AUFS support. Consider installing the packages linux-image-virtual kernel and linux-image-extra-virtual for AUFS support.
After the installation was done, I installed the two mentioned packages. Now my problem is that I can't get docker to run.
service docker start
results in:
start: Job failed to start
docker -d
results in
INFO[0000] +job serveapi(unix:///var/run/docker.sock)
INFO[0000] Listening for HTTP on unix (/var/run/docker.sock)
ERRO[0000] 'overlay' not found as a supported filesystem on this host. Please ensure kernel is new enough and has overlay support loaded.
INFO[0000] +job init_networkdriver()
WARN[0000] Running modprobe bridge nf_nat failed with message: , error: exit status 1
package not installed
INFO[0000] -job init_networkdriver() = ERR (1)
FATA[0000] Shutting down daemon due to errors: package not installed
and
docker run hello-world
results in
FATA[0000] Post http:///var/run/docker.sock/v1.18/containers/create: dial unix /var/run/docker.sock: no such file or directory. Are you trying to connect to a TLS-enabled daemon without TLS?
Does anybody have a clue about what dependencies could be missing? What else could have gone wrong? Are there any logs which docker provides?
I've been searching back and forth for a solution but couldn't find one.
Just to mention, this is a fresh Ubuntu 14.04 setup; I haven't installed any other services except for Java. The reason I need Docker is to run the sharelatex Docker image.
I'm thankful for any help!

Here's what I tried and found out, hoping it will save you some time or even help you solve it.
Docker's download script is trying to identify the kernel through uname -r to be able to install the right kernel extras for your host.
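For illustration, the relevant check is roughly of this shape; this is a simplified sketch of the behaviour described above, not the script's actual code, and the extras package name is an assumption:
kernel="$(uname -r)"
case "$kernel" in
  *-generic)
    # stock Ubuntu kernel: install the matching extras package for AUFS support
    sudo apt-get install -y "linux-image-extra-$kernel"
    ;;
  *)
    # custom hoster kernel: no -generic suffix, so the lookup fails and there is no AUFS
    echo "Warning: kernel $kernel is not supported, skipping AUFS extras"
    ;;
esac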
I suspect two problems:
My provider (united-hoster.de), and probably yours, uses customized kernel images (e.g. 3.13.0-042stab108.2) for virtual hosts. Since the script explicitly looks for -generic in the name, the lookup fails.
While the naming problem would be easy to fix, I wasn't able to install the generic kernel extras alongside my hoster's custom kernel. Upgrading the kernel does not work either, because the kernel is shared and an upgrade would affect all users/vHosts on the same physical machine (as stated in a support ticket).
To get around that:
I skipped it, hoping that Docker would work without AUFS support, but it didn't.
I tried to force Docker to use devicemapper instead (see the sketch at the end of this answer), but to no avail.
I see two options: get a dedicated host so you can mess with kernels and filesystems yourself (or at least let the Docker installer do it), or install the binaries manually.
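For completeness, this is roughly how I tried to force devicemapper; a sketch only, using the Ubuntu 14.04 /etc/default/docker path and the old daemon flag:
# /etc/default/docker
DOCKER_OPTS="-s devicemapper"
# then restart the service
sudo service docker restart
# or run the daemon in the foreground to see the errors directly:
sudo docker -d -s devicemapper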

You need to start Docker:
sudo start docker
and then
sudo docker run hello-world
I faced the same problem on Ubuntu 14.04, and this solved it.
Refer to Nino-K's comment: https://github.com/docker-library/hello-world/issues/3

Related

How do I grant the paketo builder access permissions to the docker socket when building from a docker image?

When using buildpacks to build my spring boot application on Fedora I get the following error during the execution of the spring-boot-plugin:build-image goal:
[INFO] [creator] ERROR: initializing analyzer: getting previous image: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": dial unix /var/run/docker.sock: connect: permission denied
After digging into the goal and buildpacks, I found the following command in the buildpack.io docs (by selecting "Linux" and "Container"):
docker run \
-v /var/run/docker.sock:/var/run/docker.sock \
-v $PWD:/workspace -w /workspace \
buildpacksio/pack build <my-image> --builder <builder-image>
AFAICT this command should be equivalent to what happens inside of Maven, and it exhibits the same error.
My previous assumption was that the user in the buildpacksio/pack image doesn't have access permissions for my Docker socket. (The socket had 660 permissions and root:docker ownership.)
UPDATE: Even after changing to 666 permissions, the issue persists.
I don't really want to allow everyone to interact with the Docker socket, so setting it to 666 seems unwise. Is this the only option, or can I also add the user in the container to the docker group?
The solution was that the Fedora docker package is no longer the up-to-date way to install Docker; see the official Docker documentation and install docker-ce instead (the install steps are sketched below).
They both provide the same version number, but their build hash is different.
While I could not fully diagnose the difference between the two, I can report that it works with docker-ce and doesn't with docker.
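For reference, installing docker-ce on Fedora roughly follows these steps from the official documentation (check the docs for the current repository URL):
sudo dnf -y install dnf-plugins-core
sudo dnf config-manager --add-repo https://download.docker.com/linux/fedora/docker-ce.repo
sudo dnf install docker-ce docker-ce-cli containerd.io
sudo systemctl enable --now docker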

nvidia-smi gives an error inside of a docker container

Sometimes I can't communicate with my Nvidia GPUs inside a Docker container when I come back to my workplace from home, even though the previously launched process that uses the GPUs is still running fine. The running process (training a neural network via PyTorch) is not affected by the disconnection, but I cannot launch a new process.
nvidia-smi gives Failed to initialize NVML: Unknown Error and torch.cuda.is_available() returns False likewise.
I met two different cases:
nvidia-smi works fine when run on the host machine. In this case, the situation can be resolved by restarting the Docker container via docker stop $MYCONTAINER followed by docker start $MYCONTAINER on the host machine.
Neither nvidia-smi nor nvcc --version works on the host machine, throwing the errors Failed to initialize NVML: Driver/library version mismatch and Command 'nvcc' not found, but can be installed with: sudo apt install nvidia-cuda-toolkit. The strange part is that the current process still runs well. In this case, installing the driver again or rebooting the machine solves the problem.
However, these solutions require stopping all current processes, which is not an option when the current process must keep running.
Does somebody have a suggestion for solving this situation?
Many thanks.
(software)
Docker version: 20.10.14, build a224086
OS: Ubuntu 22.04
Nvidia driver version: 510.73.05
CUDA version: 11.6
(hardware)
Supermicro server
Nvidia A5000 * 8
(pic1) nvidia-smi not working inside the Docker container, even though it works on the host machine.
(pic2) nvidia-smi working after restarting the Docker container, which is case 1 mentioned above.
For the problem of Failed to initialize NVML: Unknown Error and having to restart the container, please see this ticket and post your system/package information there as well:
https://github.com/NVIDIA/nvidia-docker/issues/1671
There's a workaround on the ticket, but it would be good to have others post their configuration to help fix the issue.
Downgrading containerd.io to 1.6.6 works as long as you specify no-cgroups = true in /etc/nvidia-container-runtime/config.toml (see the excerpt below) and pass the devices to docker run explicitly, like:
docker run --gpus all \
  --device /dev/nvidia0:/dev/nvidia0 \
  --device /dev/nvidia-modeset:/dev/nvidia-modeset \
  --device /dev/nvidia-uvm:/dev/nvidia-uvm \
  --device /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \
  --device /dev/nvidiactl:/dev/nvidiactl \
  --rm -it nvidia/cuda:11.4.2-base-ubuntu18.04 bash
So run sudo apt-get install -y --allow-downgrades containerd.io=1.6.6-1 and then sudo apt-mark hold containerd.io to prevent the package from being updated. In short: downgrade containerd.io, edit the config file, and pass all of the /dev/nvidia* devices to docker run.
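For clarity, the config change amounts to one line in /etc/nvidia-container-runtime/config.toml; the rest of the file stays as shipped:
# /etc/nvidia-container-runtime/config.toml (excerpt)
[nvidia-container-cli]
no-cgroups = true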
The Failed to initialize NVML: Driver/library version mismatch issue is caused by the drivers having been updated without a reboot. If this is a production machine, I would also hold the driver package to stop it from auto-updating. You should be able to figure out the package name with something like sudo dpkg --get-selections "*nvidia*".
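A sketch of holding the driver package; the exact package name depends on your install (nvidia-driver-510 here is an assumption for driver 510), so confirm it with the dpkg query first:
sudo dpkg --get-selections "*nvidia*"   # find the installed driver package name
sudo apt-mark hold nvidia-driver-510    # assumed package name for driver 510
sudo apt-mark showhold                  # verify the hold is in place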

NVIDIA Docker - initialization error: nvml error: driver not loaded

I'm a complete newcomer to Docker, so the following questions might be a bit naive, but I'm stuck and I need help.
I'm trying to reproduce some results in research. The authors just released code along with a specification of how to build a Docker image to reproduce their results. The relevant bit is copied below:
I believe I installed Docker correctly:
$ docker --version
Docker version 19.03.13, build 4484c46d9d
$ sudo docker run hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
However, when I try checking that my nvidia-docker installation was successful, I get the following error:
$ sudo docker run --gpus all --rm nvidia/cuda:10.1-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded\\\\n\\\"\"": unknown.
It looks like the key error is:
nvidia-container-cli: initialization error: nvml error: driver not loaded
I don't have a GPU locally and I'm finding conflicting information on whether CUDA needs to be installed before NVIDIA Docker. For instance, this NVIDIA moderator says "A proper nvidia docker plugin installation starts with a proper CUDA install on the base machine."
My questions are the following:
Can I install NVIDIA Docker without having CUDA installed?
If so, what is the source of this error and how do I fix it?
If not, how do I create this Docker image to reproduce the results?
Can I install NVIDIA Docker without having CUDA installed?
Yes, you can. The README states that nvidia-docker only requires the NVIDIA GPU driver and the Docker engine to be installed:
Note that you do not need to install the CUDA Toolkit on the host system, but the NVIDIA driver needs to be installed
If so, what is the source of this error and how do I fix it?
That's either because you don't have a GPU locally (or it's not an NVIDIA one), or because something went wrong when you installed the drivers. If you have a CUDA-capable GPU, I recommend using the NVIDIA guide to install the drivers. If you don't have a GPU locally, you can still build an image with CUDA and then move it somewhere where there is a GPU.
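For example, you could build the image on the GPU-less machine and ship it to a GPU host via a registry or a tarball; the image name here is a placeholder:
# on the machine without a GPU
docker build -t my-cuda-image .
docker save my-cuda-image | gzip > my-cuda-image.tar.gz
# on the GPU machine
docker load < my-cuda-image.tar.gz
docker run --gpus all --rm my-cuda-image nvidia-smi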
If not, how do I create this Docker image to reproduce the results?
The problem is that even if you manage to get rid of CUDA in the Docker image, there is software inside that requires it. In this case fixing the Dockerfile seems unnecessary to me: you can just skip Docker and start fixing the code to run on the CPU.
I think you need
ENV NVIDIA_VISIBLE_DEVICES=void
in your Dockerfile, then
RUN your work
and finally
ENV NVIDIA_VISIBLE_DEVICES=all
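Putting that together, a minimal Dockerfile sketch might look like this; the base image and the build step are placeholders, not taken from the question:
FROM nvidia/cuda:10.1-base
# hide GPUs while building so build steps don't try to touch the driver
ENV NVIDIA_VISIBLE_DEVICES=void
RUN ./build.sh
# expose GPUs again at runtime
ENV NVIDIA_VISIBLE_DEVICES=all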

I can't start my docker in Linux Mint

[myshell]# sudo dockerd
Error starting daemon: pid file found, ensure docker is not running or delete /var/run/docker.pid
I installed Docker on Linux, but the daemon is not running; the error above is always generated.
Check whether you already have Docker installed; maybe the installation was not done properly. Check this link.
If you run
cat /var/run/docker.pid
and it prints a PID, it is most likely you are already running it.
You can take a look at the official and proper way to launch Docker once it is installed:
https://docs.docker.com/config/daemon/systemd/
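If no daemon process is actually running, the usual fix is to remove the stale pid file and then start Docker through the init system instead of calling dockerd directly:
ps aux | grep [d]ockerd         # check whether a daemon is really running
sudo rm /var/run/docker.pid     # only if no dockerd process was found
sudo systemctl start docker     # or: sudo service docker start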

error starting docker daemon on ubuntu 14.04 (Devices cgroup isn't mounted)

I followed docker instructions to install and verify the docker installation (from http://docs.docker.com/linux/step_one/).
I tried this on two Ubuntu 14.04 machines, and on both I got the following error when starting the docker daemon:
$ sudo docker daemon
INFO[0000] Listening for HTTP on unix (/var/run/docker.sock)
INFO[0000] [graphdriver] using prior storage driver "aufs"
INFO[0000] Option DefaultDriver: bridge
INFO[0000] Option DefaultNetwork: bridge
WARN[0000] Running modprobe bridge nf_nat br_netfilter failed with message: modprobe: WARNING: Module br_netfilter not found. , error: exit status 1
INFO[0000] Firewalld running: false
WARN[0000] Your kernel does not support cgroup memory limit: mountpoint for memory not found
WARN[0000] mountpoint for cpu not found
FATA[0000] Error starting daemon: Devices cgroup isn't mounted
I appreciate any help to resolve this issue.
I resolved this issue by starting the docker daemon manually using:
sudo service docker start
Note: Looks like this issue was only present in Ubuntu 14.04 and earlier. The newer Ubuntu versions don't need this.
Try the following:
Log into Ubuntu as a user with sudo privileges.
Edit the /etc/default/grub file.
Set the GRUB_CMDLINE_LINUX value as follows:
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
Save and close the file.
Update GRUB.
$ sudo update-grub
Reboot your system.
Some folks have reported restarting the docker daemon works:
sudo systemctl restart docker
As noted above, the newer Docker documentation no longer mentions this for recent Docker versions.
Update
This works for some folks on Ubuntu 14.04 or earlier:
sudo apt-get install cgroup-lite
You may also need these packages:
apt-get install aufs-tools
apt-get install cgroup-lite
I've had this issue with Debian.
The package cgroupfs-mount solved it.
sudo aptitude install cgroupfs-mount
I just had this problem on Fedora 31. The solution as described here is to append systemd.unified_cgroup_hierarchy=0 to the GRUB_CMDLINE_LINUX var in /etc/sysconfig/grub.
(In my case, GRUB_CMDLINE_LINUX="resume=/dev/mapper/fedora-swap rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet systemd.unified_cgroup_hierarchy=0")
Then run grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg and restart.
I just had to remove any cgroup mounts from /etc/fstab (see the example entry below), and that solved the Devices cgroup isn't mounted problem. I think that Module br_netfilter not found is just a warning and does not prevent Docker from starting, but you can fix it by installing:
apt-get install linux-image-3.19.0-33-generic linux-image-extra-3.19.0-33-generic
After that you have to reboot.
The "extra" is needed because aufs is not anymore included with basic image in Ubuntu.
In my case, I didn't have to install or configure anything new; Docker had been running fine before this failure.
Try restarting docker (e.g. systemctl restart docker).
If it fails, shut down and cold-boot the machine. Ensure docker is running.
After 129 days of uptime, my docker just got into a weird, bad state.
