docker -d vs. service docker start

I've installed docker on a VirtualBox running Fedora 20. I've been having problems getting hello-world to work, and eventually discovered that I can successfully run docker run hello-world only if I start docker on the command line with /usr/bin/docker -d. If I start docker with service docker start, any docker run command I try just hangs.
Why does service docker start not start docker in daemon mode, and how do I set up the other_args in /etc/sysconfig/docker to get it to do so?

Easiest way is just to remove docker and reinstall:
dnf remove docker
dnf install docker

Finally got it working. I removed docker, installed the latest version of virtualbox and the guest additions, upgraded Fedora 20 to 22, recreated loop devices because they got lost in the upgrade, rebooted countless times, and now service docker start starts up a version that docker run hello-world can run against successfully. It only took 2 days :-(

Related

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]. AFTER installing nvidia-docker2

I followed the instructions to install the nvidia-docker2 from the official documentation https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
Whenever I run their test example:
sudo docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
I still get the error:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
I rebooted but still no effect.
I am on Ubuntu 22.04 with my nvidia drivers updated.
nvidia-smi works on the host, but not inside docker.
EDIT (SOLVED): I finally found out what was going on.
Right after reinstalling, it worked, but after a reboot it went back to the previous, non-working state.
This was caused by a second docker service that had been installed through snapd, so I had to remove it completely:
sudo snap remove docker
I was not able to fix the issue in place, so I purged all docker packages and all nvidia container packages and reinstalled everything. Now it works and stays stable, even after rebooting.
Good old methods work fine :)
You need to restart the docker daemon:
sudo systemctl restart docker
If the problem still occurs, install the nvidia-container-toolkit package and then restart the docker daemon again.
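Since the cause in this thread turned out to be a second docker installed through snap, it can help to check where the docker client on your PATH actually comes from. This is only a minimal sketch: the function name is_snap_path is made up for illustration, and it assumes snap-installed binaries are exposed under /snap/, which is the usual layout on Ubuntu.

```shell
# Hypothetical helper: succeed if the given path lives under /snap/.
is_snap_path() {
    case "$1" in
        /snap/*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Usage on a real host (not executed here):
#   if is_snap_path "$(command -v docker)"; then
#       echo "docker resolves to a snap install"
#   fi
#   snap list docker   # shows the snap package, if one is installed
```

If both a snap and an apt install are present, they run separate daemons with separate configuration, which would explain why the nvidia runtime set up for one of them is not visible to the other.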

What to do if the docker container hangs and does not respond to any command other than ctrl+c?

I have been running an nvidia docker image for 13 days, and it used to restart without any problems using the docker start -i <containerid> command. But today, while I was downloading pytorch inside the container, the download got stuck at 5% and gave no response for a while.
I couldn't exit the container with either ctrl+d or ctrl+c. So I closed the terminal and, in a new terminal, ran docker start -i <containerid> again. Ever since, this particular container has not responded to any command, be it start, restart, exec or commit. Any command with this container ID or name just hangs, and I can only get out of it with ctrl+c.
I cannot restart the docker service, since that would kill all running docker containers.
I cannot even stop the container using docker container stop <containerid>.
Please help.
You can make use of the docker restart policy:
docker update --restart=always <container>
but be mindful of caveats that depend on the docker version you are running.
Or explore the answer by Yale Huang to a similar question: How to add a restart policy to a container that was already created
I had to restart the docker process to revive my container. There was nothing else I could do to solve it. I used sudo service docker restart and then revived my container using docker run. I will try to build a Dockerfile out of it in order to avoid future mishaps.
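When docker commands against one container hang like this, wrapping them in a timeout at least lets you probe the daemon without losing the terminal. The following is a rough sketch, not a fix: the function name probe is hypothetical, and it assumes the timeout utility from GNU coreutils, which exits with status 124 when the time limit is hit.

```shell
# Hypothetical helper: run a command with a time limit and classify the result.
# $1 = time limit in seconds, remaining arguments = command to run.
probe() {
    secs=$1
    shift
    rc=0
    timeout "$secs" "$@" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "responsive"
    elif [ "$rc" -eq 124 ]; then
        echo "hung"      # GNU timeout reports 124 when the limit was reached
    else
        echo "error"     # command finished in time but failed
    fi
}

# Usage on a real host (not executed here):
#   probe 10 docker ps
#   probe 10 docker inspect <containerid>
```

If docker ps is responsive while commands for the one container are reported as hung, the problem is likely limited to that container rather than the whole daemon.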

Running docker ps returns an empty list although containers are running

I installed docker CE version on an ubuntu 18.04 server. Then, I installed a new jenkins container and everything worked well for two weeks.
After two weeks, for some reason, when I run docker ps I get an empty list, although the jenkins container is running and functioning (the command worked in the past). I also tried docker ps -a and docker images, and again everything is empty. I also tried restarting the server, and the list is still empty every time.
I then uninstalled and reinstalled docker, and right after the installation docker ps showed the containers. I thought the problem was fixed, but today it happened again and I still see an empty list when running docker ps. Any ideas? It would be much appreciated.
Stop the docker service:
sudo service docker stop
After that, find any dockerd process that is still running:
ps aux | grep "dockerd"
and kill it with
sudo kill -9 {paste_dockerd_pid_here}
Then start the docker service again:
sudo service docker start
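The ps aux | grep step above also matches the grep process itself, so it is easy to pick the wrong PID. A small sketch of a stricter filter follows; the function name dockerd_pids is made up for illustration, and it assumes a procps-style ps that supports -eo pid=,comm=.

```shell
# Hypothetical helper: print the PIDs of processes whose executable name is
# exactly "dockerd". Expects "PID COMMAND" lines on stdin.
dockerd_pids() {
    awk '$2 == "dockerd" { print $1 }'
}

# Usage on a real host (not executed here):
#   ps -eo pid=,comm= | dockerd_pids
```

If this prints more than one PID after the service has been stopped, there may be a second docker installation running its own daemon, which could also explain why docker ps shows an empty list.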

Failed to start docker on an Amazon linux machine

I am using an Amazon linux machine (p2).
I have installed this docker version:
Client:
Version: 17.03.2-ce
API version: 1.27
Go version: go1.7.5
Git commit: 7392c3b/17.03.2-ce
Built: Wed Aug 9 22:45:09 2017
OS/Arch: linux/amd64
I'm not sure, but I think the issue started after I killed a screen session that was running a docker container.
I'm experiencing this error:
sudo docker ps
Gives:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
And:
sudo service docker status
Gives:
docker dead but subsys locked
I have tried both:
sudo rm -rf /var/run/docker
sudo rm /var/run/docker.*
I also tried to restart and stop:
sudo service docker start/stop
I also rebooted the EC2 machine
Try updating device-mapper-libs and then restarting docker:
sudo yum update device-mapper-libs
sudo service docker restart
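Before restarting again, it may be worth checking what is actually left at the socket path named in the error message. This is a minimal sketch with a made-up function name, socket_status. The note about the lock file is an assumption based on how SysV init scripts usually behave: "dead but subsys locked" typically means a stale /var/lock/subsys/docker file exists while the daemon process is gone.

```shell
# Hypothetical helper: report what exists at a given socket path.
socket_status() {
    if [ -S "$1" ]; then
        echo "present"        # a Unix socket exists at this path
    elif [ -e "$1" ]; then
        echo "not-a-socket"   # something else is in the way
    else
        echo "missing"        # the daemon has not created its socket
    fi
}

# Usage on a real host (not executed here):
#   socket_status /var/run/docker.sock
#   ls -l /var/lock/subsys/docker   # assumed location of the stale lock file
```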
I am also facing the same issue, although I (sort of) fixed it by issuing sudo service docker stop and sudo service docker start before running anything in docker.
Details: I am using docker on a spot instance, so it is set up every time I need to perform some task. I create the container and upload my files without any problem. But when I issue the command to run an uploaded bash script in docker, I hit this issue of docker not running. So before running the script I just stop and start docker. Weirdly, simply doing sudo service docker start or even sudo service docker restart did not solve my problem. I had to issue the stop and start commands separately. I don't have enough data points yet, though: it has only been working for the last couple of days, and I am not in a hurry to test this hypothesis (of issuing both commands and not just one).
I had 10 docker containers running on an EC2 instance (t2.large). Each container ran its own service, and all the services ran in a cluster. I changed the timezone of the instance, which required a reboot, and after the reboot this problem surfaced. The first thing I noticed was that ssh into the machine was slower than before. Later I realized docker ps was throwing that error. I magically resolved that, only to find that some of the containers were running but not serving any pages. docker logs -f CONTAINER_ID showed that nginx had not started because of privilege issues: some files that were supposed to be created had not been created.
I later realized that my magical solution was not really a solution (most magical solutions are not). All 10 containers were trying to start at the same time, which required more memory than my instance could offer. I had to delete the services and recreate them one by one, allowing each container to start before creating the next one in the same cluster. That was when I had peace. I hope this helps somebody.

Getting docker daemon not running error

I am trying to get docker working on my system, but I am not able to.
Steps to reproduce the issue that I am facing:
Installed EPEL on RHEL 6.5
Installed docker-io
Able to run the "docker" command
When running the "docker run -i -t fedora /bin/bash" command without root, I get the error below:
FATA[0000] Post http:///var/run/docker.sock/v1.17/containers/create: dial unix /var/run/docker.sock: permission denied. Are you trying to connect to a TLS-enabled daemon without TLS?
However, as root I get a different error for the same command:
FATA[0000] Cannot connect to the Docker daemon. Is 'docker -d' running on this host?
RHEL 6.5 (also termed Update 5) was released on 21 November 2013 and comes with kernel 2.6.32-431.
That seems quite an old kernel for docker to be installed and run successfully. Docker would ideally need a 3.10+ kernel.
However, Adrian Mouat mentions in the comments that Red Hat Enterprise Linux 6.5 (64-bit) or later is supported:
You will need 64 bit RHEL 6.5 or later, with a RHEL 6 kernel version 2.6.32-431 or higher as this has specific kernel fixes to allow Docker to work.
So make sure the docker daemon is started:
sudo service docker start
Then try some sudo docker commands:
sudo docker run -i -t fedora /bin/bash
The doc mentions:
If you get a Cannot start container error mentioning SELinux or permission denied, you may need to update the SELinux policies.
This can be done using sudo yum upgrade selinux-policy and then rebooting.
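The kernel question raised in this answer can be checked with a short comparison. This is a sketch only: the function name kernel_at_least is hypothetical, it compares just the major and minor numbers, and it knows nothing about the RHEL 6 backports mentioned above, so a "too old" result on RHEL 6.5 does not by itself mean docker cannot run.

```shell
# Hypothetical helper: succeed if a kernel release string is at least MAJOR.MINOR.
# $1 = release string (e.g. output of uname -r), $2 = major, $3 = minor.
kernel_at_least() {
    ver=${1%%-*}        # drop the "-431..." style suffix
    major=${ver%%.*}    # text before the first dot
    rest=${ver#*.}      # text after the first dot
    minor=${rest%%.*}   # second numeric component
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Usage on a real host (not executed here):
#   if kernel_at_least "$(uname -r)" 3 10; then
#       echo "kernel is 3.10 or newer"
#   else
#       echo "kernel is older than 3.10"
#   fi
```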
