Docker service showing no such image when trying to upgrade service - docker

First of all, sorry if my English is bad.
We have a service that we had been upgrading without problems until 26 September 2022, via Portainer or via the terminal on Docker. The image is hosted on the GitLab registry.
We did not make any changes, but we are no longer able to upgrade it!
How can we debug why this message is appearing?
No such image: registry.gitlab.com/xxxx/xxx/api:1.1.18#sha256:xxxx
Some additional information:
- We run docker login before attempting the service update.
- We can do docker pull registry.gitlab.com/etc/etc (the exact version).
The problem only occurs when we try to upgrade it as a service.
Is there some kind of debug option for the service update that could provide additional information, e.g. whether a firewall is blocking the request?
docker service update nameofservice
nameofservice
overall progress: 0 out of 1 tasks
overall progress: 0 out of 1 tasks
overall progress: 0 out of 1 tasks
overall progress: 0 out of 1 tasks
overall progress: 0 out of 1 tasks
overall progress: 0 out of 1 tasks
1/1: preparing [=================================> ]
Until it returns the error 'no such image'!
I am pretty sure the image exists.

If you are experiencing the same problem, check whether you have more nodes, physical machines or VMs connected to your Docker node (docker node ls).
If that is your case, run docker pull gitlabaddressetcetc on the other nodes and check if everything is fine.
I found the message 'No space left on device', so I ran 'df -h', but plenty of space was available on the VM. Anyway, I decided to run 'docker system prune -f' to see what would happen:
Running 'docker system prune -f' seems to have solved my problem, and everything is fine now.
After that I just needed to change the version in Portainer to an invalid one before trying again.
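A rough sketch of the checks described above, to run on each node (the registry path is a placeholder):

# list all nodes attached to the swarm
docker node ls
# on every node, verify the image can actually be pulled (placeholder path)
docker pull registry.gitlab.com/<group>/<project>/api:1.1.18
# check free disk space on the node
df -h
# reclaim space from stopped containers, unused networks and dangling images
docker system prune -f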

Related

Docker connectivity issues (to Azure DevOps Services from self hosted Linux Docker agent)

I am looking for some advice on debugging some extremely painful Docker connectivity issues.
In particular, for an Azure DevOps Services Git repository, I am running a self-hosted (locally) dockerized Linux CI (setup according to https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/docker?view=azure-devops#linux), which has been working fine for a few months now.
All this runs on a company network, and since last week the network connection of my docker container became highly unstable:
Specifically, it intermittently loses its network connection, which is also visible in the logs of the Azure DevOps agent, which then keeps trying to reconnect.
This especially happens while downloading Git LFS objects. Enabling extra traces via GIT_TRACE=1 highlights a lot of connection failures and retries:
trace git-lfs: xfer: failed to resume download for "SHA" from byte N: expected status code 206, received 200. Re-downloading from start
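For reference, tracing like the above can be enabled with something along these lines (the exact git-lfs command is an assumption on my part):

# enable verbose git/git-lfs tracing for a single LFS fetch
GIT_TRACE=1 git lfs pull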
During such a LFS pull / fetch, sometimes the container even stops responding as a docker container list command only responds:
Error response from daemon: i/o timeout
As a result the daemon cannot recover on its own, and needs a manual restart (to get back up the CI).
Also I see remarkable differences in network performance:
Manually cloning the same Git repository (including LFS objects, all from scratch) in container instances (created from the same image) on different machines takes less than 2 minutes on my dev laptop (connected from home via VPN), while the same operation easily takes up to 20 minutes (!) in containers running on two different Win10 machines (company network, physically located in offices, hence no VPN).
Clearly this is not about the host network connection itself, since cloning on the same Win10 hosts (company network/offices) outside of the containers takes only 14 seconds!
Hence I suspect some network configuration issue (e.g. something with the Hyper-V vEthernet adapter? Firewall? Proxy? Or whichever other watchdog going astray?), but after three days of debugging I am not quite sure how to investigate this further, as I am running out of ideas and expertise. Any thoughts / advice / hints?
I should add that LFS configuration options (such as lfs.concurrenttransfers and lfs.basictransfersonly) did not really help, and neither did git config http.version (or just removing some larger files).
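Roughly, those configuration attempts looked like this (the specific values are illustrative):

# limit LFS to a single, basic transfer at a time and force HTTP/1.1
git config lfs.concurrenttransfers 1
git config lfs.basictransfersonly true
git config http.version HTTP/1.1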
UPDATE
it does not actually seem to be about the self-hosted agent but a more general docker network cfg issue within my corporate network.
Running the following works consistently fast on my VPN machine (running from home):
docker run -it ubuntu bash -c "apt-get update; apt-get install -y wget; start=$SECONDS; wget http://cdimage.ubuntu.com/lubuntu/releases/18.04/release/lubuntu-18.04-alternate-amd64.iso; echo Duration: $(( SECONDS - start )) seconds"
Comparison with a PowerShell download (on the host):
$start = Get-Date
$(New-Object net.webclient).Downloadfile("http://cdimage.ubuntu.com/lubuntu/releases/18.04/release/lubuntu-18.04-alternate-amd64.iso", "e:/temp/lubuntu-18.04-alternate-amd64.iso")
'Duration: {0:mm} min {0:ss} sec' -f ($(Get-Date) - $start)
Corporate network:
Docker: 1560 seconds (= 26 min!)
Windows host sys: Duration: 00 min 15 sec
Dev laptop (VPN, from home):
Docker: 144 seconds (= 2 min 24 sec)
Windows host sys: Duration: 02 min 16 sec
Looking at the issues discussed in https://github.com/docker/for-win/issues/698 (and proposed workaround which didn't work for me), it seems to be a non-trivial problem with Windows / hyper-v ..
The whole issue "solved itself" when my company decided to finally upgrade from Win10 1803 to 1909 (which comes with WSL, replacing Hyper-V) .. 😂
Now everything runs super smoothly (I ran these tests almost 20 times).

`docker build` command hangs for a very long time, other commands work fine

Simple question: After using Docker for about a week, my docker build command gets bogged down and hangs (before anything executes) for about a minute. After staying in this hanging state, it executes the docker build command with no issues at all and at the expected speed.
Other Docker commands (like docker run) do not suffer from this "hanging" issue.
Docker Installation info:
Version 18.06.1-ce-win73
Channel: stable
Things I have tried:
docker system prune - This does clear up space, but doesn't speed up my docker build command
Reinstalling Docker on my machine - This does fix the issue, but it reappeared after about a week of using Docker again.
Does anyone else suffer from this issue?
I had the same issue. I solved it by moving the Dockerfile to an empty folder, then I ran the docker build command and it worked perfectly.
On some other forums people created a .dockerignore file listing the .git directory and many other files, but that approach didn't work for me.
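A minimal sketch of that workaround (the folder and tag names are placeholders); it can help because docker build sends the entire folder contents (the build context) to the daemon before building:

# copy only the Dockerfile into a fresh, otherwise empty build context
mkdir ~/empty-build-context
cp Dockerfile ~/empty-build-context/
cd ~/empty-build-context
docker build -t myimage .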
Here was the issue:
The very first line of my Dockerfile (the FROM command) was failing. The "hanging" was caused by a timeout while attempting to download the base image. I was trying to download the base image from a location that required a proxy to be set on my machine.
So I was mistaken in my original post: the docker build command wasn't running as expected. It was failing to download the base image due to a missing proxy setting.
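A quick way to confirm this kind of failure is to pull the base image directly (the image name here is just a placeholder for whatever your FROM line references):

# if this times out, the build will hang on FROM as well
docker pull ubuntu:18.04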
Two possible reasons:
1. If you have been building many Docker images for hours, restart your router if possible; under heavy traffic the router can sometimes collapse.
2. Increase the RAM, CPU and swap allocated to the Docker engine, restart Docker, and try the build again.

Docker fails on changed GCP virtual machine?

I have a problem with Docker that seems to happen when I change the machine type of a Google Cloud Platform VM instance. Images that were fine fail to run, fail to delete, and fail to pull, all with various obscure messages about missing keys (this on Linux), duplicate or missing layers, and others I don't recall.
The errors don't always happen. One that occurred just now, with an image that ran a couple hundred times yesterday on the same setup, though before a restart, was:
$ docker run --rm -it mbloore/model:conda4.3.1-aq0.1.9
docker: Error response from daemon: layer does not exist.
$ docker pull mbloore/model:conda4.3.1-aq0.1.9
conda4.3.1-aq0.1.9: Pulling from mbloore/model
Digest: sha256:4d203b18fd57f9d867086cc0c97476750b42a86f32d8a9f55976afa59e699b28
Status: Image is up to date for mbloore/model:conda4.3.1-aq0.1.9
$ docker rmi mbloore/model:conda4.3.1-aq0.1.9
Error response from daemon: unrecognized image ID sha256:8315bb7add4fea22d760097bc377dbc6d9f5572bd71e98911e8080924724554e
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
$
So it thinks it has no images, but the Docker folders are full of files, and it does know some hashes. It looks like some index has been damaged.
I restarted that instance, and then Docker seemed to be normal again without any special action on my part.
The only workarounds I have found so far are to restart and hope, or to delete several large Docker directories and recreate them empty. Then, after a restart, pull and run work again. But I'm now not sure that they always will.
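For reference, the heavy-handed reset alluded to above usually amounts to something like this (it destroys all local images, containers and volumes, so treat it as a last resort):

# stop the daemon, wipe its data directory, start fresh
sudo systemctl stop docker
sudo rm -rf /var/lib/docker
sudo systemctl start docker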
I am running with Docker version 17.05.0-ce on Debian 9. My images were built with Docker version 17.03.2-ce on Amazon Linux, and are based on the official Ubuntu image.
Has anyone had this kind of problem, or know a way to reset the state of Docker without deleting almost everything?
Two points:
1) It seems that changing the VM had nothing to do with it. On some boots Docker worked, on others not, with no change in configuration or contents.
2) At Google's suggestion I installed Stackdriver monitoring and logging agents, and I haven't had a problem through seven restarts so far.
My first guess is that there is a race condition on startup, and adding those agents altered it in my favour. Of course, I'd like to have a real fix, but for now I don't have the time to pursue the problem.

Running `docker stack deploy` on a local VM results in "No such image" error even though the image is on the public registry

I'm trying to follow the Docker Get Started guide. Currently I'm at part 4. Everything up until the point
docker stack deploy -c docker-compose.yml getstartedlab
worked well. However, after trying to deploy the services, when I run docker stack ps getstartedlab, I see that the swarm manager keeps trying to restart the containers, since every time they get the error "No such image: username/get-st…" and have their state as "Rejected 6 seconds ago" etc.
I tried to search for solutions, but surprisingly it seems that nobody has encountered this error before. The issue here and a similar section in the Get Started guide talk about situations where one wants to pull from a private registry. However, throughout the tutorial I've been working with the default public registry. All previous steps (e.g. launching the swarm locally, without using VirtualBox) worked fine.
Versions:
Docker version 18.02.0-ce, build fc4de447b5
Virtualbox 5.2.8 r120774
System Kernel: 4.14.25-1-MANJARO
Any idea what might have been the problem?
Surprisingly, passing in the flag --with-registry-auth worked even though my repo is on Docker Hub. I'm not sure what the problem was, but maybe the claim that one only needs this flag when using a private registry is a bit inaccurate.
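In other words, the deploy command from the tutorial with the extra flag:

# forward the local registry credentials to the swarm agents
docker stack deploy --with-registry-auth -c docker-compose.yml getstartedlab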

Docker deployments fail on Marathon, work fine otherwise

I have been trying to deploy a docker container web based application on Mesos using Mesosphere Marathon.
I first tried deploying my Play Framework application, which works fine when I launch it as a Docker container. Then I also tried the example application mentioned on the Mesosphere website. Both fail inside Marathon, but work fine when run as standalone Docker containers.
The application shows up as "Waiting" or "Deploying" in the Marathon web UI, while on Mesos it fails. I have made sure that the Mesos slave is running fine.
I believe that because the application fails on Mesos, Marathon tries to restart it, which is why I get these status messages almost always.
I have previously tried deploying the same application (without wrapping it inside the docker container) on Marathon (same installation) and it has worked fine. However, we really want to use Docker for our applications.
I have gone through plenty of tutorials and everything seems to be following the "rules". I don't understand what could be wrong.
Edit:
E1104 19:29:01.291219 4242 slave.cpp:3342] Container '9dbebe8c-5506-4f70-b560-34be39ecdc96' for executor 'mediator.30dbd1ed-82fc-11e5-b1d4-56847afe9799' of framework '64d39023-aad3-4fdc-8565-6d8e3ec9cb77-0000' failed to start: Failed to 'docker -H unix:///var/run/docker.sock pull devrep/message-mediator:latest': exit status = exited with status 1 stderr = Error: image devrep/message-mediator:latest not found
W1104 19:29:01.293334 4244 docker.cpp:1002] Ignoring updating unknown container: 9dbebe8c-5506-4f70-b560-34be39ecdc96
E1104 19:29:06.711524 4241 slave.cpp:3342] Container 'b7f8004a-2759-41ec-8169-61d04a7c4c3d' for executor 'mediator.343b027e-82fc-11e5-b1d4-56847afe9799' of framework '64d39023-aad3-4fdc-8565-6d8e3ec9cb77-0000' failed to start: Failed to 'docker -H unix:///var/run/docker.sock pull devrep/message-mediator:latest': exit status = exited with status 1 stderr = Error: image devrep/message-mediator:latest not found
Without an actual error message or the logs, it's hard to guess what your problem could be.
My first thought is that you should check whether your Mesos Slaves are started with the --containerizers=docker,mesos flag at all. If not, it can't work at all.
Also, if you're using a private registry, either make sure that Docker on your Mesos Slaves is configured to use it, or follow the guidelines in the Marathon docs on how to use a private registry.
Can you do a docker pull devrep/message-mediator:latest on any Mesos Slave?
Also, see
https://github.com/mesosphere/marathon/issues/1781
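A rough sketch of the checks suggested above, run on a Mesos slave (the config file path assumes the standard Mesosphere packaging and may differ in your setup):

# the containerizers flag should include the docker containerizer
cat /etc/mesos-slave/containerizers    # expected: docker,mesos
# verify the slave can pull the image directly
docker pull devrep/message-mediator:latest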
I know it's very late to answer, but this might be helpful. Looking at your logs, I see
devrep/message-mediator:latest
Here, latest is the tag name of your image. If you don't provide a tag in the container's docker image field, or leave it blank like below,
"container": {
"type": "DOCKER",
"docker": {
"image": "devrep/message-mediator",
},
},
Marathon automatically tries to pull devrep/message-mediator:latest, which I highly doubt will be present, so always add a tag name, e.g. in my case it was v1:
devrep/message-mediator:v1
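A sketch of tagging and pushing the image with an explicit tag before deploying (the repository name is taken from the logs above, the v1 tag from this answer):

# tag the locally built image and push it so the slaves can pull it
docker tag devrep/message-mediator devrep/message-mediator:v1
docker push devrep/message-mediator:v1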
