I have a problem with Docker that seems to happen when I change the machine type of a Google Compute Platform VM instance. Images that were fine fail to run, fail to delete, and fail to pull, all with various obscure messages about missing keys (this on Linux), duplicate or missing layers, and others I don't recall.
The errors don't always happen. One that occurred just now, with an image that ran a couple hundred times yesterday on the same setup, though before a restart, was:
$ docker run --rm -it mbloore/model:conda4.3.1-aq0.1.9
docker: Error response from daemon: layer does not exist.
$ docker pull mbloore/model:conda4.3.1-aq0.1.9
conda4.3.1-aq0.1.9: Pulling from mbloore/model
Digest: sha256:4d203b18fd57f9d867086cc0c97476750b42a86f32d8a9f55976afa59e699b28
Status: Image is up to date for mbloore/model:conda4.3.1-aq0.1.9
$ docker rmi mbloore/model:conda4.3.1-aq0.1.9
Error response from daemon: unrecognized image ID sha256:8315bb7add4fea22d760097bc377dbc6d9f5572bd71e98911e8080924724554e
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
$
So it thinks it has no images, but the Docker folders are full of files, and it does know some hashes. It looks like some index has been damaged.
I restarted that instance, and then Docker seemed to be normal again without any special action on my part.
The only workarounds I have found so far are to restart and hope, or to delete several large Docker directories, and recreate them empty. Then after a restart and pull and run works again. But I'm now not sure that it always will.
I am running with Docker version 17.05.0-ce on Debian 9. My images were built with Docker version 17.03.2-ce on Amazon Linux, and are based on the official Ubuntu image.
Has anyone had this kind of problem, or know a way to reset the state of Docker without deleting almost everything?
Two points:
1) It seems that changing the VM had nothing to do with it. On some boots Docker worked, on others not, with no change in configuration or contents.
2) At Google's suggestion I installed Stackdriver monitoring and logging agents, and I haven't had a problem through seven restarts so far.
My first guess is that there is a race condition on startup, and adding those agents altered it in my favour. Of course, I'd like to have a real fix, but for now I don't have the time to pursue the problem.
Related
Upon a fresh start of my ubuntu (on a virtualbox vm), I can build my images normally. Then very inconstantly, it can be the next time I try to build, or the 10th time, it will hang forever after running the command docker build .
Dockerfiles are in directories with 5~10 other files (which eliminates the issue with massive file amount slowing down docker while trying to locate the Dockerfile, as seen on other posts)
If I try to build for a new, very simple, Dockerfile (to eliminate any syntax error), it will also hang whenever it hangs with my project's Dockerfiles.
Beside, I am running minikube --driver=none and my images are used for deployments in kubernetes. (with none driver it's not required to run eval $(minikube docker-env) )
The only reliable fix is to stop the vm on which my ubuntu is running, start it again, and it will consistently allow me to build my images at least one time, then the issue comes back inconsistently.
This fix is quite inconvenient as I need to stop everything I am doing and it takes a bit of time.
I have tried to run docker system prune and to delete all the images already built.
What log could I check to find an issue going on when the build hangs ?
Any idea of the origin of the issue ?
Thanks a lot !
Ok this bug was viscous.
Sometimes when I need to check how my nginx server behave in one of the containers, I open the VM's graphical interface and pop firefox to have a look.
I only figured today that firefox prompt a pop-up after a while, asking for the admin password in order to access the keychain. And turns out docker do not build anything until this pop-up is open. Closing it or filling the password fixed my issue...
On another terminal window, please check the Docker Daemon Log using sudo journalctl -fu docker.service at the time of build command hanging. Also, you can check the list of running processes during the build execution.
Simple question: After using Docker for about a week, my docker build command gets bogged down and hangs (before anything executes) for about a minute. After staying in this hanging state, it will execute the docker build command with no issues at all and at at the expected speed.
Other Docker commands (like docker run) do not suffer from this "hanging" issue.
Docker Installation info:
Version 18.06.1-ce-win73
Channel: stable
Things I have tried:
docker system prune - This does clear up space, but doesn't speed up my docker build command
Reinstalling Docker on my machine - This does fix the issue, but it reappeared after about a week of using Docker again.
Does anyone else suffer from this issue?
I had the same issue. I solved it moving the Dockerfile to an empty folder, then I executed the docker build command and worked perfectly.
On some other forums people created a .dockerignore file including the any call to git and many other files, but that approach didn't work for me.
Here was the the issue:
The very first line of my Dockerfile (the FROM command) was failing. The "hanging" was caused by a timeout during the attempt to download the base image. I was attempting to download the base image from a location that I needed to set a proxy on my machine for.
So I was mistaken in my original post: The Docker build command wasn't running as expected. It was failing to download the base image due to a missing proxy setting.
2 reasons:
1.If you are building many dockers for hours ..please restart your router if possible as sometimes due to heavy data packets movement the router collapses.
2.Increase RAM ,CPU and Swap of docker engine and restart docker and try to build again.
I am trying to troubleshoot a Dockerfile I found on the web. As it is failing in a weird way, I am wondering whether failed docker builds or docker runs from various subsets of that file or other files that I have been experimenting with might corrupt some part of Docker's own state.
In other words, would it possibly help to restart Docker itself, Reboot the computer, or do some other Docker command, to eliminate that possibility?
Sometimes just rebooting things helps and it's not wrong to try restarting Docker for Mac or do a full reboot, but I can't think of a specific symptom it would fix and it's not something I need to do routinely.
I've only really run into two classes of problems that sound like what you're describing.
If you have a Dockerfile step that consistently succeeds, but produces inconsistent results:
RUN curl http://may.not.exist.example.com/ || true
You can wind up in a situation where the underlying command failed or produced the wrong output, but the RUN step as a whole succeeded. docker build --no-cache will re-run a build ignoring this, and an extremely aggressive docker rmi sequence (deleting every build, current and past, of the image in question) will clean it up too.
The other class of problem I've encountered involves some level of corruption in /var/lib/docker. This usually has very obvious symptoms generally involving "file not found" or "failed mounting directory" type errors on a setup that you otherwise know works. I've encountered it more on native Linux than Docker for Mac, probably because the DfM Linux installation is a little more controlled and optimized for Docker (it definitely isn't running a 3-year-old kernel with arbitrary vendor patches). On Linux you can work around this by stopping Docker, deleting everything in /var/lib/docker, and starting Docker again; in Docker for Mac, on the preferences window, there's a "Reset" page with various destructive cleanup options and "Reset to factory defaults" is closest to this.
I would first attempt using the Docker 'Diagnose and Feedback option. This generally runs tests on the health of Docker and the Docker engine.
Docker desktop also has options for various troubleshooting scenarios under 'Preferences' > 'Reset' (if you're using Docker Desktop) which have helped me in the past.
A brief look through the previous Docker Release notes.
It certainly looks like it has been possible in the past to corrupt the Docker Engine; there is evidence suggesting the engine has been iteratively fixed since.
I'm trying to follow the Docker Get Started guide. Currently I'm at part 4. Everything up until the point
docker stack deploy -c docker-compose.yml getstartedlab
worked well. However, after trying to deploy the services, when I run docker stack ps getstartedlab, I see that the swarm manager keeps trying to restart the containers, since every time they get the error "No such image: username/get-st…" and have their state as "Rejected 6 seconds ago" etc.
I tried to search for solutions a bit but surprisingly it seems that nobody encountered this error before whatsoever. The issue here and a similar section in the Get Started guide talks about situations where one wants to pull from a private registry. However, throughout the tutorial I've been working with the default public registry. All previous steps (e.g. launching the swarm locally, without using virtualbox) worked fine.
Versions:
Docker version 18.02.0-ce, build fc4de447b5
Virtualbox 5.2.8 r120774
System Kernel: 4.14.25-1-MANJARO
Any idea what might have been the problem?
Surprisingly passing in the flag --with-registry-auth worked even though my repo is apparently on Docker Hub. Not sure what the problem was but maybe the claim that one would only need this flag if they're using a private registry is a bit inaccurate then.
I have the latest Docker for Mac installed, and I'm running into a problem where it appears that docker-compose up is stuck in a Downloading state for one of the containers:
± |master ✗| → docker-compose up --build
Pulling container (repo.io/company/container:prod)...
prod: Pulling from company/container
somehash: Already exists
somehash: Already exists
somehash: Already exists
somehash: Already exists
somehash: Pulling fs layer
somehash: Already exists
somehash: Already exists
somehash: Downloading [=================================================> ] 234.6 MB/239.3 MB
somehash: Download complete
somehash: Download complete
^^ this is literally what it looks like on my command line. Stopping and starting hasn't helped, it immediately outputs this same output.
I've tried to rm the container but I guess it doesn't yet exist, it returns the output No stopped containers. --force-recreate also gets stuck in the same place. And perhaps I'm not googling for the right terminology but I haven't found anything useful to try - any pointers?
I just needed to restart Docker.
Linux users can use sudo service docker restart.
Docker for Mac has a handy button for this in the Docker widget in the macOS toolbar:
If you happen to be using Docker Toolkit try docker-machine restart.
I faced the same problem! Restarting the service didn't help, downloading again didn't help. It used to get stuck at random instances leaving me with no option but to kill the pull request.
One thing which worked for me was to download 1 file at a time. For Ubuntu users, you can use the following steps:
Stop the service:
sudo service docker stop
Start docker with max concurrent download set as 1:
sudo dockerd --max-concurrent-downloads 1
Download the required image:
sudo docker pull <image_name>
Download images, after that stop the terminal and start the daemon again as it was earlier.
sudo service docker start
I had the similar situation this morning where my network suddenly went down and I was forced to power cycle the modern, while docker-compose was still in the middle of downloading stuff from docker hub.
Yes, bouncing the docker daemon process seems to resolve this.
For Linux users - do sudo service docker restart to fix it.
Go to the Docker Preferences from its menu bar icon. Within there is a "bug" icon. Click on that and then "clean / Purge data"
I'm running OSX and restarting Docker for Mac didn't help. Neither did a full restart or upgrading VirtualBox. What did work was turning my wifi interface on and off every time it got stuck. I had to do this repeatedly, but it eventually downloaded the entire image.
Directly download the necessary images using docker, e.g.
docker pull company/container
and then run
docker-compose up
again. Worked for me on MacOS.
I found a possible workaround.
I have my docker engine installed in a Ubuntu 18.04 Snap Environment.
I discovered searching in some forums that users relate this behaviors to limitation in the download bandwith.
So in the picture below you are going to watch that the components was stucked
Part of the Downloads stucked and finally I cancelled the process CTRL + C
I added two parameters or flags in the configuration file that controls the docker daemon behavior: max-concurrent-downloads 1 and max-concurrent-uploads 1
In my case remember, i am working in a snap environment. This file is located in this directory: /var/lib/docker/current/config/daemon.json
REMEMBER TO STOP ALL DOCKER PROCESS BEFORE THE FILE MODIFICATION, AND CREATE BACKUP OF THE FILE
Add the two lines in the picture. This is going to help you to limit the downloads to only one by one
This is the process that helped me to resolve this problem.
Download Succesfull
I had this issue in my VirtualBox when doing a docker pull on the image but it got stuck at a specific position and never moved from there. So, the issue was due to the network adapters in my VM. I was using NAT by default. When I switched it to "Bridged adapter", the issue went away.
I had a similar problem on docker for windows for a couple of days and when I tried to connect to the virtual machine (via Hyper-V Manager) the downloads started speeding along. I have no idea why but it worked for me...
Completely remove docker
Install docker again
It should work now
I tried to restar docker, update docker, but didnt help