I've got a public Docker container on Docker Hub, and I'm unsuccessfully trying to get Watchtower to automatically pull it when the image is updated. This worked many months ago but is not working now. I created the Watchtower container with docker-compose (previously I was using the default launch command with no token, and it was working).
One point that confuses me: since this is a public container, I wouldn't think I need any auth to pull it.
Here are the Watchtower docker logs showing the 401 auth error:
time="2021-11-29T15:26:06Z" level=debug msg="No new images found for /documents_watchtower_1"
time="2021-11-29T15:26:06Z" level=debug msg="Trying to load authentication credentials." container=/web image="pkellner/svccps1:3.0.63"
time="2021-11-29T15:26:06Z" level=debug msg="No credentials for pkellner found" config_file=/config.json
time="2021-11-29T15:26:06Z" level=debug msg="Got image name: pkellner/svccps1:3.0.63"
time="2021-11-29T15:26:06Z" level=debug msg="Checking if pull is needed" container=/web image="pkellner/svccps1:3.0.63"
time="2021-11-29T15:26:06Z" level=debug msg="Building challenge URL" URL="https://index.docker.io/v2/"
time="2021-11-29T15:26:06Z" level=debug msg="Got response to challenge request" header="Bearer realm=\"https://auth.docker.io/token\",service=\"registry.docker.io\"" status="401 Unauthorized"
time="2021-11-29T15:26:06Z" level=debug msg="Checking challenge header content" realm="https://auth.docker.io/token" service=registry.docker.io
time="2021-11-29T15:26:06Z" level=debug msg="Setting scope for auth token" image=pkellner/svccps1 scope="repository:pkellner/svccps1:pull"
time="2021-11-29T15:26:06Z" level=debug msg="No credentials found."
time="2021-11-29T15:26:07Z" level=debug msg="Parsing image ref" host=index.docker.io image=pkellner/svccps1 normalized="docker.io/pkellner/svccps1:3.0.63" tag=3.0.63
time="2021-11-29T15:26:07Z" level=debug msg="Doing a HEAD request to fetch a digest" url="https://index.docker.io/v2/pkellner/svccps1/manifests/3.0.63"
time="2021-11-29T15:26:07Z" level=debug msg="Found a remote digest to compare with" remote="sha256:xxx"
time="2021-11-29T15:26:07Z" level=debug msg=Comparing local="sha256:xxx" remote="sha256:xxx"
time="2021-11-29T15:26:07Z" level=debug msg="Found a match"
time="2021-11-29T15:26:07Z" level=debug msg="No pull needed. Skipping image."
time="2021-11-29T15:26:07Z" level=debug msg="No new images found for /web"
My docker-compose file:
services:
  watchtower:
    image: containrrr/watchtower
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /etc/timezone:/etc/timezone:ro
    environment:
      - WATCHTOWER_CLEANUP=true
      - WATCHTOWER_LABEL_ENABLE=true
      - WATCHTOWER_INCLUDE_RESTARTING=true
      - WATCHTOWER_HTTP_API_TOKEN=xxxxxxx{from docker.com token}
      - WATCHTOWER_DEBUG=true
      - WATCHTOWER_POLL_INTERVAL=60
    labels:
      - "com.centurylinklabs.watchtower.enable=true"
EDIT
NEVERMIND THIS QUESTION. I found that one of my services, which uses Docker.DotNet, was terminating the services marked as Shutdown. I've corrected the bug and regained my trust in Docker and Docker Swarm.
Thank you Carlos for your help. My bad, my fault. Sorry for that!
I have 13 services configured in a docker-compose file, running in Swarm mode with one manager and two worker nodes.
Then I make one of the worker nodes unavailable by draining it:
docker node update --availability drain ****-v3-6by7ddst
What I notice is that all the services that were running on the drained node are removed and not scheduled to the available node.
The available worker and manager nodes still have plenty of resources. The services are simply removed. I am now down to 9 services.
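A useful check at this point is the task history, which records why tasks were shut down. A sketch, with a hypothetical service name, assuming the service still exists:

docker service ls                              # confirm which services disappeared
docker service ps mystack_frontend --no-trunc  # per-task state, node, and error column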
Looking at the logs, I see entries like the ones below, repeated with different service ids:
level=warning msg="Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap."
level=error msg="Error getting service u68b1fofzb3nefpnasctpywav: service u68b1fofzb3nefpnasctpywav not found"
level=warning msg="rmServiceBinding 021eda460c5744fd4d499475e5aa0f1cfbe5df479e5b21389ed1b501a93b47e1 possible transient state ok:false entries:0 set:false "
Then, for debugging purposes, I set my node back to available:
docker node update --availability active ****-v3-6by7ddst
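As I understand it, Swarm does not move running tasks back to a reactivated node on its own; the usual way to nudge them is a forced update per service. A sketch, with a placeholder service name:

docker service update --force mystack_frontend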
Then I try to rebalance some of the services onto the newly available node, and I get the same errors in the logs:
level=error msg="Error getting service ****_frontend: service ****_frontend not found"
level=warning msg="rmServiceBinding 6bb220c0a95b30cdb3ff7b577c7e9dec7ad6383b34aff85e1685e94e7486e3ea possible transient state ok:false entries:0 set:false "
msg="Error getting service l29wlucttul75pzqo2sgr0u9e: service l29wlucttul75pzqo2sgr0u9e not found"
In my docker-compose file, I configure all my services like this; note the restart policy is any:
  frontend:
    image: {FRONTEND_IMAGE}
    deploy:
      labels:
        - "traefik.enable=true"
        - "traefik.docker.lbswarm=true"
        - "traefik.http.routers.frontend.rule=Host(`${FRONTEND_HOST}`)"
        - "traefik.http.routers.frontend.entrypoints=websecure"
        - "traefik.http.routers.frontend.tls.certresolver=myhttpchallenge"
        - "traefik.http.services.frontend.loadbalancer.server.port=80"
        - "traefik.docker.network=ingress"
      replicas: 1
      resources:
        limits:
          memory: ${FRONTEND_LIMITS_MEMORY}
          cpus: ${FRONTEND_LIMITS_CPUS}
        reservations:
          memory: ${FRONTEND_RESERVATION_MEMORY}
          cpus: ${FRONTEND_RESERVATION_CPUS}
      restart_policy:
        condition: any
    networks:
      - ingress
Something fails while recreating services on different nodes; even with only one manager/worker node I get the same result.
Everything else seems to work fine. For example, if I scale a service it works well.
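A scale command like the following (service name hypothetical) completes normally:

docker service scale mystack_frontend=4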
New Edit
Just did another test.
This time I only have two services, traefik and front-end.
One instance for traefik
4 instances for front-end
two nodes (one manager and one worker)
Drained the worker node; the front-end instances running on the drained node were moved to the manager node.
Activated the worker node again.
Did a docker service update cords_frontend --force, and two instances of front-end were killed on the manager node and started on the worker node.
So, in this test with only two services, everything works fine.
Is there any kind of limit to the number of services a stack should have?
Any clues why this is happening?
Thanks
Hugo
I believe you may be running into an issue with resource reservations. You mention that the available nodes have plenty of resources, but that is not how reservations work: a service will not be scheduled if it can't reserve the resources specified, and, importantly, this has nothing to do with how much of those resources the service is actually using. When you specify a reservation, you are saying that the service reserves that amount of resources, and those resources are not available for other services to use. So if all your services have similar reservations, you may be in a situation where, even though a node shows available resources, those resources are in fact reserved by the existing services. I would suggest you remove the reservations section and try again, to see if that is in fact what is happening.
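As a concrete sketch of that suggestion, based on the compose snippet in the question (the limit values are its existing placeholders), drop the reservations block and redeploy:

      resources:
        limits:
          memory: ${FRONTEND_LIMITS_MEMORY}
          cpus: ${FRONTEND_LIMITS_CPUS}
        # reservations removed while testing whether reserved-but-unused
        # resources are what is blocking the scheduler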
So I am still struggling with this.
As a recap, I have one docker stack with 13 services running in swarm mode on two nodes (manager + worker). Each node has 4 cores and 8GB of RAM (Ubuntu 18.04, docker 19.03.12).
If a node dies, or I drain a node, all the services running on that node die and are marked as Removed. If I simply run docker service update front_end --force, the service also dies and is marked as removed.
Another important detail: if I sum up all the reserved memory and cores across the 13 services, I end up with 1.9 cores and 4GB of RAM, way below each node's resources.
I don't see any out-of-memory errors in the container, service, or stack logs. Also, using htop I can see memory usage of 647MB/7.79GB on the manager node and 2GB/7.79GB on the worker node.
This is what I tried so far:
separated the 13 services into two different stacks. No luck.
removed all the reservations tags from the compose files. No luck.
tried running with 3 nodes. No luck.
saw the WARNING: No swap limit support message, so I followed the suggestions in the linked document on both nodes. No luck.
upped both nodes' resources to 8 cores and 16GB of RAM. No luck.
tried starting the services one at a time, and noticed it starts behaving badly with 10 or more services. That is, everything works fine with up to 9 services running; beyond that I see the behaviour described above.
Also, I enabled docker's debug mode to see what was happening; here are the outputs.
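For reference, debug mode was enabled roughly like this; a sketch assuming systemd and no other daemon.json settings:

# /etc/docker/daemon.json
{
  "debug": true
}
# then reload the daemon:
#   sudo systemctl restart docker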
If I run docker service update front_end --force and the front_end service dies, this is the output from docker events:
service update k6a7go4uhexb4b1u1fp98dtke (name=frontend)
service update k6a7go4uhexb4b1u1fp98dtke (name=frontend, updatestate.new=updating)
service remove k6a7go4uhexb4b1u1fp98dtke (name=frontend)
logs from journalctl -fu docker.service
level=debug msg="Calling GET /v1.40/services/frontend?insertDefaults=false"
level=debug msg="error handling rpc" error="rpc error: code = NotFound desc = service frontend not found" rpc=/docker.swarmkit.v1.Control/GetService
level=debug msg="form data: {\"EndpointSpec\":{\"Mode\":\"vip\"},\"Labels\":{\"com.docker.stack.image\":\"registry.gitlab.com/devteam/.frontend:1.0.5\",\"com.docker.stack.namespace\":\"\",\"traefik.docker.lbswarm\":\"true\",\"traefik.docker.network\":\"net\",\"traefik.enable\":\"true\",\"traefik.http.routers.frontend.entrypoints\":\"websecure\",\"traefik.http.routers.frontend.rule\":\"Host(`www.frontend.website`)\",\"traefik.http.routers.frontend.tls.certresolver\":\"myhttpchallenge\",\"traefik.http.services.frontend.loadbalancer.server.port\":\"80\"},\"Mode\":{\"Replicated\":{\"Replicas\":1}},\"Name\":\"frontend\",\"TaskTemplate\":{\"ContainerSpec\":{\"Image\":\"registry.gitlab.com/fdevteam/frontend:1.0.5#sha256:e9a0d88bc14848c3b40c3d2905842313bbc648c1bbf09305f8935f9eb23f289a\",\"Isolation\":\"default\",\"Labels\":{\"com.docker.stack.namespace\":\"f\"},\"Privileges\":{\"CredentialSpec\":null,\"SELinuxContext\":null}},\"ForceUpdate\":1,\"Networks\":[{\"Aliases\":[\"frontend\"],\"Target\":\"w7aqg3stebnmk5c5pbhgslh2d\"}],\"Placement\":{\"Platforms\":[{\"Architecture\":\"amd64\",\"OS\":\"linux\"}]},\"Resources\":{},\"RestartPolicy\":{\"Condition\":\"any\",\"MaxAttempts\":0},\"Runtime\":\"container\"}}"
level=debug msg="Calling GET /v1.40/services/frontend?insertDefaults=false"
level=debug msg="error handling rpc" error="rpc error: code = NotFound desc = service frontend not found" rpc=/docker.swarmkit.v1.Control/GetService
...
level=debug msg="Calling GET /v1.40/tasks?filters=%7B%22_up-to-date%22%3A%7B%22true%22%3Atrue%7D%2C%22service%22%3A%7B%22frontend%22%3Atrue%7D%7D"
level=debug msg="error handling rpc" error="rpc error: code = NotFound desc = service frontend not found" rpc=/docker.swarmkit.v1.Control/GetService
level=debug msg="handleEpTableEvent UPD 6b20c2924ec1eafa20c27d572019207551819b10a2c4f8d0574f2e142274c745 R:{frontend.1.lv7tjjaev45pvn0f7qtppb21r frontend nnlg81dsspnj6oxip4iqwwjc3 10.0.1.73 10.0.1.74 [] [frontend] [e661c9f39097] true}"
level=debug msg="rmServiceBinding from handleEpTableEvent START for frontend 6b20c2924ec1eafa20c27d572019207551819b10a2c4f8d0574f2e142274c745 p:0xc004a1f880 nid:w7aqg3stebnmk5c5pbhgslh2d sKey:{nnlg81dsspnj6oxip4iqwwjc3 } deleteSvc:true"
level=debug msg="deleteEndpointNameResolution 6b20c2924ec1eafa20c27d572019207551819b10a2c4f8d0574f2e142274c745 frontend rm_service:false suppress:false sAliases:[frontend] tAliases:[e661c9f39097]"
level=debug msg="delContainerNameResolution 6b20c2924ec1eafa20c27d572019207551819b10a2c4f8d0574f2e142274c745 frontend.1.lv7tjjaev45pvn0f7qtppb21r"
level=debug msg="6b20c2924ec1eafa20c27d572019207551819b10a2c4f8d0574f2e142274c745 (w7aqg3s).deleteSvcRecords(frontend.1.lv7tjjaev45pvn0f7qtppb21r, 10.0.1.74, <nil>, true) rmServiceBinding sid:6b20c2924ec1eafa20c27d572019207551819b10a2c4f8d0574f2e142274c745 "
level=debug msg="6b20c2924ec1eafa20c27d572019207551819b10a2c4f8d0574f2e142274c745 (w7aqg3s).deleteSvcRecords(tasks.frontend, 10.0.1.74, <nil>, false) rmServiceBinding sid:nnlg81dsspnj6oxip4iqwwjc3 "
level=debug msg="rmServiceBinding from handleEpTableEvent END for frontend 6b20c2924ec1eafa20c27d572019207551819b10a2c4f8d0574f2e142274c745"
level=debug msg="Calling GET /v1.40/services/frontend?insertDefaults=false"
level=debug msg="error handling rpc" error="rpc error: code = NotFound desc = service frontend not found" rpc=/docker.swarmkit.v1.Control/GetService
level=error msg="Error getting service frontend: service frontend not found"
level=debug msg="handleEpTableEvent DEL 6b20c2924ec1eafa20c27d572019207551819b10a2c4f8d0574f2e142274c745 R:{frontend.1.lv7tjjaev45pvn0f7qtppb21r frontend nnlg81dsspnj6oxip4iqwwjc3 10.0.1.73 10.0.1.74 [] [frontend] [e661c9f39097] true}"
level=debug msg="rmServiceBinding from handleEpTableEvent START for frontend 6b20c2924ec1eafa20c27d572019207551819b10a2c4f8d0574f2e142274c745 p:0xc004a1f880 nid:w7aqg3stebnmk5c5pbhgslh2d sKey:{nnlg81dsspnj6oxip4iqwwjc3 } deleteSvc:true"
level=debug msg="deleteEndpointNameResolution 6b20c2924ec1eafa20c27d572019207551819b10a2c4f8d0574f2e142274c745 frontend rm_service:true suppress:false sAliases:[frontend] tAliases:[e661c9f39097]"
level=debug msg="delContainerNameResolution 6b20c2924ec1eafa20c27d572019207551819b10a2c4f8d0574f2e142274c745 frontend.1.lv7tjjaev45pvn0f7qtppb21r"
level=debug msg="6b20c2924ec1eafa20c27d572019207551819b10a2c4f8d0574f2e142274c745 (w7aqg3s).deleteSvcRecords(frontend.1.lv7tjjaev45pvn0f7qtppb21r, 10.0.1.74, <nil>, true) rmServiceBinding sid:6b20c2924ec1eafa20c27d572019207551819b10a2c4f8d0574f2e142274c745 "
level=debug msg="6b20c2924ec1eafa20c27d572019207551819b10a2c4f8d0574f2e142274c745 (w7aqg3s).deleteSvcRecords(tasks.frontend, 10.0.1.74, <nil>, false) rmServiceBinding sid:nnlg81dsspnj6oxip4iqwwjc3 "
level=debug msg="6b20c2924ec1eafa20c27d572019207551819b10a2c4f8d0574f2e142274c745 (w7aqg3s).deleteSvcRecords(frontend, 10.0.1.73, <nil>, false) rmServiceBinding sid:nnlg81dsspnj6oxip4iqwwjc3 "
level=debug msg="rmServiceBinding from handleEpTableEvent END for frontend 6b20c2924ec1eafa20c27d572019207551819b10a2c4f8d0574f2e142274c745"
If the service does not die (which is the case with 9 or fewer services), this is the output:
service update n1wh16ru879699cpv3topcanc (name=frontend)
service update n1wh16ru879699cpv3topcanc (name=frontend, updatestate.new=updating)
service update n1wh16ru879699cpv3topcanc (name=frontend, updatestate.new=completed, updatestate.old=updating)
logs from journalctl -fu docker.service
level=debug msg="Calling GET /v1.40/services/frontend?insertDefaults=false"
level=debug msg="error handling rpc" error="rpc error: code = NotFound desc = service frontend not found" rpc=/docker.swarmkit.v1.Control/GetService
level=debug msg="form data: {\"EndpointSpec\":{\"Mode\":\"vip\"},\"Labels\":{\"com.docker.stack.image\":\"registry.gitlab.com/devteam/.frontend:1.0.5\",\"com.docker.stack.namespace\":\"\",\"traefik.docker.lbswarm\":\"true\",\"traefik.docker.network\":\"net\",\"traefik.enable\":\"true\",\"traefik.http.routers.frontend.entrypoints\":\"websecure\",\"traefik.http.routers.frontend.rule\":\"Host(`www.frontend.website`)\",\"traefik.http.routers.frontend.tls.certresolver\":\"myhttpchallenge\",\"traefik.http.services.frontend.loadbalancer.server.port\":\"80\"},\"Mode\":{\"Replicated\":{\"Replicas\":1}},\"Name\":\"frontend\",\"TaskTemplate\":{\"ContainerSpec\":{\"Image\":\"registry.gitlab.com/devteam/.frontend:1.0.5#sha256:e9a0d88bc14848c3b40c3d2905842313bbc648c1bbf09305f8935f9eb23f289a\",\"Isolation\":\"default\",\"Labels\":{\"com.docker.stack.namespace\":\"\"},\"Privileges\":{\"CredentialSpec\":null,\"SELinuxContext\":null}},\"ForceUpdate\":3,\"Networks\":[{\"Aliases\":[\"frontend\"],\"Target\":\"w7aqg3stebnmk5c5pbhgslh2d\"}],\"Placement\":{\"Platforms\":[{\"Architecture\":\"amd64\",\"OS\":\"linux\"}]},\"Resources\":{},\"RestartPolicy\":{\"Condition\":\"any\",\"MaxAttempts\":0},\"Runtime\":\"container\"}}"
level=debug msg="Calling GET /v1.40/services/frontend?insertDefaults=false"
level=debug msg="error handling rpc" error="rpc error: code = NotFound desc = service frontend not found" rpc=/docker.swarmkit.v1.Control/GetService
...
level=debug msg="Calling GET /v1.40/tasks?filters=%7B%22_up-to-date%22%3A%7B%22true%22%3Atrue%7D%2C%22service%22%3A%7B%22frontend%22%3Atrue%7D%7D"
level=debug msg="error handling rpc" error="rpc error: code = NotFound desc = service frontend not found" rpc=/docker.swarmkit.v1.Control/GetService
level=debug msg="handleEpTableEvent UPD e21b861c447ffd78bd2014744c13a146accd4600412c12b8cccfe3f3af4f0b98 R:{frontend.1.zeq4jz8kzle4c7vtzx5ofbrqo frontend n1wh16ru879699cpv3topcanc 10.0.1.32 10.0.1.46 [] [frontend] [f986fe859440] true}"
level=debug msg="rmServiceBinding from handleEpTableEvent START for frontend e21b861c447ffd78bd2014744c13a146accd4600412c12b8cccfe3f3af4f0b98 p:0xc005e1fa00 nid:w7aqg3stebnmk5c5pbhgslh2d sKey:{n1wh16ru879699cpv3topcanc } deleteSvc:true"
level=debug msg="deleteEndpointNameResolution e21b861c447ffd78bd2014744c13a146accd4600412c12b8cccfe3f3af4f0b98 frontend rm_service:false suppress:false sAliases:[frontend] tAliases:[f986fe859440]"
level=debug msg="delContainerNameResolution e21b861c447ffd78bd2014744c13a146accd4600412c12b8cccfe3f3af4f0b98 frontend.1.zeq4jz8kzle4c7vtzx5ofbrqo"
level=debug msg="e21b861c447ffd78bd2014744c13a146accd4600412c12b8cccfe3f3af4f0b98 (w7aqg3s).deleteSvcRecords(frontend.1.zeq4jz8kzle4c7vtzx5ofbrqo, 10.0.1.46, <nil>, true) rmServiceBinding sid:e21b861c447ffd78bd2014744c13a146accd4600412c12b8cccfe3f3af4f0b98 "
level=debug msg="e21b861c447ffd78bd2014744c13a146accd4600412c12b8cccfe3f3af4f0b98 (w7aqg3s).deleteSvcRecords(tasks.frontend, 10.0.1.46, <nil>, false) rmServiceBinding sid:n1wh16ru879699cpv3topcanc "
level=debug msg="rmServiceBinding from handleEpTableEvent END for frontend e21b861c447ffd78bd2014744c13a146accd4600412c12b8cccfe3f3af4f0b98"
level=debug msg="Calling GET /v1.40/services/frontend?insertDefaults=false"
level=debug msg="error handling rpc" error="rpc error: code = NotFound desc = service frontend not found" rpc=/docker.swarmkit.v1.Control/GetService
...
level=debug msg="Calling GET /v1.40/tasks?filters=%7B%22_up-to-date%22%3A%7B%22true%22%3Atrue%7D%2C%22service%22%3A%7B%22frontend%22%3Atrue%7D%7D"
level=debug msg="error handling rpc" error="rpc error: code = NotFound desc = service frontend not found" rpc=/docker.swarmkit.v1.Control/GetService
level=debug msg="handleEpTableEvent DEL e21b861c447ffd78bd2014744c13a146accd4600412c12b8cccfe3f3af4f0b98 R:{frontend.1.zeq4jz8kzle4c7vtzx5ofbrqo frontend n1wh16ru879699cpv3topcanc 10.0.1.32 10.0.1.46 [] [frontend] [f986fe859440] true}"
level=debug msg="rmServiceBinding from handleEpTableEvent START for frontend e21b861c447ffd78bd2014744c13a146accd4600412c12b8cccfe3f3af4f0b98 p:0xc005e1fa00 nid:w7aqg3stebnmk5c5pbhgslh2d sKey:{n1wh16ru879699cpv3topcanc } deleteSvc:true"
level=debug msg="deleteEndpointNameResolution e21b861c447ffd78bd2014744c13a146accd4600412c12b8cccfe3f3af4f0b98 frontend rm_service:true suppress:false sAliases:[frontend] tAliases:[f986fe859440]"
level=debug msg="delContainerNameResolution e21b861c447ffd78bd2014744c13a146accd4600412c12b8cccfe3f3af4f0b98 frontend.1.zeq4jz8kzle4c7vtzx5ofbrqo"
level=debug msg="e21b861c447ffd78bd2014744c13a146accd4600412c12b8cccfe3f3af4f0b98 (w7aqg3s).deleteSvcRecords(frontend.1.zeq4jz8kzle4c7vtzx5ofbrqo, 10.0.1.46, <nil>, true) rmServiceBinding sid:e21b861c447ffd78bd2014744c13a146accd4600412c12b8cccfe3f3af4f0b98 "
level=debug msg="e21b861c447ffd78bd2014744c13a146accd4600412c12b8cccfe3f3af4f0b98 (w7aqg3s).deleteSvcRecords(tasks.frontend, 10.0.1.46, <nil>, false) rmServiceBinding sid:n1wh16ru879699cpv3topcanc "
level=debug msg="e21b861c447ffd78bd2014744c13a146accd4600412c12b8cccfe3f3af4f0b98 (w7aqg3s).deleteSvcRecords(frontend, 10.0.1.32, <nil>, false) rmServiceBinding sid:n1wh16ru879699cpv3topcanc "
level=debug msg="rmServiceBinding from handleEpTableEvent END for frontend e21b861c447ffd78bd2014744c13a146accd4600412c12b8cccfe3f3af4f0b98"
level=debug msg="Calling GET /v1.40/services/frontend?insertDefaults=false"
level=debug msg="error handling rpc" error="rpc error: code = NotFound desc = service frontend not found" rpc=/docker.swarmkit.v1.Control/GetService
...
level=debug msg="Calling GET /v1.40/tasks?filters=%7B%22_up-to-date%22%3A%7B%22true%22%3Atrue%7D%2C%22service%22%3A%7B%22frontend%22%3Atrue%7D%7D"
level=debug msg="error handling rpc" error="rpc error: code = NotFound desc = service frontend not found" rpc=/docker.swarmkit.v1.Control/GetService
level=debug msg="handleEpTableEvent ADD 521ffeee31efe056900fb5a1fe73007c179594e964f703625cf3272eb14983c0 R:{frontend.1.1v9ggahd87x2ydlkna0qx7jmz frontend n1wh16ru879699cpv3topcanc 10.0.1.32 10.0.1.47 [] [frontend] [3671840709bb] false}"
level=debug msg="addServiceBinding from handleEpTableEvent START for frontend 521ffeee31efe056900fb5a1fe73007c179594e964f703625cf3272eb14983c0 p:0xc004a1ed80 nid:w7aqg3stebnmk5c5pbhgslh2d skey:{n1wh16ru879699cpv3topcanc }"
level=debug msg="addEndpointNameResolution 521ffeee31efe056900fb5a1fe73007c179594e964f703625cf3272eb14983c0 frontend add_service:true sAliases:[frontend] tAliases:[3671840709bb]"
level=debug msg="addContainerNameResolution 521ffeee31efe056900fb5a1fe73007c179594e964f703625cf3272eb14983c0 frontend.1.1v9ggahd87x2ydlkna0qx7jmz"
level=debug msg="521ffeee31efe056900fb5a1fe73007c179594e964f703625cf3272eb14983c0 (w7aqg3s).addSvcRecords(frontend.1.1v9ggahd87x2ydlkna0qx7jmz, 10.0.1.47, <nil>, true) addServiceBinding sid:521ffeee31efe056900fb5a1fe73007c179594e964f703625cf3272eb14983c0"
level=debug msg="521ffeee31efe056900fb5a1fe73007c179594e964f703625cf3272eb14983c0 (w7aqg3s).addSvcRecords(tasks.frontend, 10.0.1.47, <nil>, false) addServiceBinding sid:n1wh16ru879699cpv3topcanc"
level=debug msg="521ffeee31efe056900fb5a1fe73007c179594e964f703625cf3272eb14983c0 (w7aqg3s).addSvcRecords(frontend, 10.0.1.32, <nil>, false) addServiceBinding sid:n1wh16ru879699cpv3topcanc"
level=debug msg="addServiceBinding from handleEpTableEvent END for frontend 521ffeee31efe056900fb5a1fe73007c179594e964f703625cf3272eb14983c0"
level=debug msg="Calling GET /v1.40/services/frontend?insertDefaults=false"
level=debug msg="error handling rpc" error="rpc error: code = NotFound desc = service frontend not found" rpc=/docker.swarmkit.v1.Control/GetService
...
level=debug msg="Calling GET /v1.40/services/frontend?insertDefaults=false"
level=debug msg="error handling rpc" error="rpc error: code = NotFound desc = service frontend not found" rpc=/docker.swarmkit.v1.Control/GetService
$ docker pull python:3.6.4-stretch
3.6.4-stretch: Pulling from library/python
c73ab1c6897b: Pull complete
1ab373b3deae: Downloading [=============================================> ] 10.13MB/11.11MB
b542772b4177: Download complete
57c8de432dbe: Download complete
1ab373b3deae: Pull complete
b542772b4177: Pull complete
57c8de432dbe: Pull complete
1785850988c5: Pull complete
676ef2d8682b: Pull complete
56321bcc2d38: Pull complete
4788c366a216: Pull complete
0d970fbfeb26: Pull complete
Digest: sha256:db22cb78ba16cb6a0632eead1e48a239636a5a77c9f8cf343087acf309ad0248
Status: Downloaded newer image for python:3.6.4-stretch
Time: 0h:05m:33s
The download hangs with a probability of 80% or more, as in the output above. It stays in this state for about 5 minutes, and the pull then succeeds once the retry kicks in.
For more detail: this problem occurs on three Ubuntu PCs. Two run Ubuntu 16.04 and one runs 18.04. All machines are on the same office network.
At first I tried changing the docker and Ubuntu versions, without success. service docker restart was also useless. Then I noticed I had installed a new gigabit switch hub (https://iptime.com/iptime/?page_id=11&pf=12&page=2&pt=311&pd=1), and I began to suspect the hub. Pulls work well when a machine connects directly to the LAN without the switching hub, and they also worked well when I swapped in the old 100Mb/s switching hub.
So this can be judged a problem with the gigabit switching hub, but it is hard to pin down because all other internet use works fine through it. I wonder whether there is some other problem with docker pull, or another solution.
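One daemon-side knob that may be worth testing on a flaky link is the pull concurrency, using the standard max-concurrent-downloads daemon option (it defaults to 3); a sketch:

# /etc/docker/daemon.json
{
  "max-concurrent-downloads": 1
}
# then: sudo service docker restart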
$ uname -a
Linux my-ubuntu18.04 4.15.0-42-generic #45-Ubuntu SMP Thu Nov 15 19:32:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ docker version
Client:
 Version:           18.09.0
 API version:       1.39
 Go version:        go1.10.4
 Git commit:        4d60db4
 Built:             Wed Nov 7 00:48:57 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.0
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.4
  Git commit:       4d60db4
  Built:            Wed Nov 7 00:16:44 2018
  OS/Arch:          linux/amd64
  Experimental:     false
Please search for the "failed" keyword in my docker daemon log below.
12:13:32 level=debug msg="Calling GET /_ping"
12:13:32 level=debug msg="Calling GET /v1.39/info"
12:13:32 level=debug msg="Calling POST /v1.39/images/create?fromImage=python&tag=3.6.6-stretch"
12:13:32 level=debug msg="Trying to pull python from https://registry-1.docker.io v2"
12:13:35 level=debug msg="Pulling ref from V2 registry: python:3.6.6-stretch"
12:13:35 level=debug msg="docker.io/library/python:3.6.6-stretch resolved to a manifestList object with 7 entries; looking for a unknown/amd64 match"
12:13:35 level=debug msg="found match for linux/amd64 with media type application/vnd.docker.distribution.manifest.v2+json, digest sha256:c306863aa2e858ccf00958c625ca2ffdbf8845da76e266b0b0d9c4760170aff3"
12:13:36 level=debug msg="Layer already exists: bc9ab73e5b14"
12:13:36 level=debug msg="Layer already exists: 193a6306c92a"
12:13:36 level=debug msg="Layer already exists: e5c3f8c317dc"
12:13:36 level=debug msg="Layer already exists: a587a86c9dcb"
12:13:36 level=debug msg="pulling blob \"sha256:72744d0a318b0788001cc4f5f83c6847ba4b753307fadd046b508bbc41eb9e29\""
12:13:36 level=debug msg="pulling blob \"sha256:6598fc9d11d10365ac9281071a87930a2382ee31d026f1b6d432717b31db387c\""
12:13:37 level=debug msg="Downloaded 6598fc9d11d1 to tempfile /var/lib/docker/tmp/GetImageBlob402585872"
12:13:37 level=debug msg="pulling blob \"sha256:4b1d9004d467b4e710d770a881df027df7e5e7e4629f6e473760893ffc1a667f\""
12:13:40 level=debug msg="Downloaded 72744d0a318b to tempfile /var/lib/docker/tmp/GetImageBlob083083061"
12:13:40 level=debug msg="pulling blob \"sha256:93612f47cdc374d0b33057b9e71eac173ac469da3e1a631dc8a32ba6986a408a\""
12:13:40 level=debug msg="Applying tar in /var/lib/docker/overlay2/9eaab31d9a1f108ba8a5c712cf23f36a9097140d09c76e6b966667fba2cc014b/diff" storage-driver=overlay2
12:13:42 level=debug msg="Downloaded 93612f47cdc3 to tempfile /var/lib/docker/tmp/GetImageBlob281534658"
12:13:42 level=debug msg="pulling blob \"sha256:1bc4b4b508703799ef67a807dacce4736045e642e87bcd49871cd0f23e7f5b8b\""
12:13:43 level=debug msg="Downloaded 1bc4b4b50870 to tempfile /var/lib/docker/tmp/GetImageBlob872144708"
12:13:48 level=debug msg="Applied tar sha256:9978d084fd771e0b3d1acd7f3525d1b25288ababe9ad8ed259b36101e4e3addd to 9eaab31d9a1f108ba8a5c712cf23f36a9097140d09c76e6b966667fba2cc014b, size: 556457027"
12:13:48 level=debug msg="Applying tar in /var/lib/docker/overlay2/3f91f78b3bb3f2cb6096472759bb84ae2f30f0825a7f935cad3b420c5cd71bee/diff" storage-driver=overlay2
12:13:48 level=debug msg="Applied tar sha256:2f4f74d3821ecbdd60b5d932452ea9e30cecf902334165c4a19837f6ee636377 to 3f91f78b3bb3f2cb6096472759bb84ae2f30f0825a7f935cad3b420c5cd71bee, size: 16849952"
12:18:46 level=error msg="Download failed, retrying: read tcp 10.251.12.218:48728->104.18.121.25:443: read: connection timed out"
12:18:51 level=debug msg="pulling blob \"sha256:4b1d9004d467b4e710d770a881df027df7e5e7e4629f6e473760893ffc1a667f\""
12:18:51 level=debug msg="attempting to resume download of \"sha256:4b1d9004d467b4e710d770a881df027df7e5e7e4629f6e473760893ffc1a667f\" from 20499209 bytes"
12:18:53 level=debug msg="Downloaded 4b1d9004d467 to tempfile /var/lib/docker/tmp/GetImageBlob954105135"
12:18:53 level=debug msg="Applying tar in /var/lib/docker/overlay2/458b54b72a80967b2ba5dfca870ed5de222677fc98910538674fbf15ce958dda/diff" storage-driver=overlay2
12:18:54 level=debug msg="Applied tar sha256:003bb6178bc3218242d73e51d5e9ab2f991dc607780194719c6bd4c8c412fe8c to 458b54b72a80967b2ba5dfca870ed5de222677fc98910538674fbf15ce958dda, size: 65191894"
12:18:54 level=debug msg="Applying tar in /var/lib/docker/overlay2/0f4a3bdc5aa6c4428d3368143b0b26c92dd19e12c7c536d20a95a3fdc8a221d3/diff" storage-driver=overlay2
12:18:54 level=debug msg="Applied tar sha256:15b32d849da2239b1af583f9381c7a75d7aceba12f5ddfffa7a059116cf05ab9 to 0f4a3bdc5aa6c4428d3368143b0b26c92dd19e12c7c536d20a95a3fdc8a221d3, size: 32"
12:18:54 level=debug msg="Applying tar in /var/lib/docker/overlay2/c7905eabea23cd147b6772ce255d536b0cdcb759d4387b8282259b338d392c34/diff" storage-driver=overlay2
12:18:54 level=debug msg="Applied tar sha256:6e5c5f6bf043bc634378b1e4b61af09be74741f2ac80204d7a373713b1fd5a40 to c7905eabea23cd147b6772ce255d536b0cdcb759d4387b8282259b338d392c34, size: 5918893"
I have a docker image that I can run properly in my local VM. Everything runs fine.
I save the image and load it on the prod server.
I can see the image by using docker images.
Next I try to run it with docker run -p 9191:9191 myservice.
It hangs and eventually times out.
The log shows the following:
time="2018-08-15T16:14:35.058232400-07:00" level=debug msg="HCSShim::CreateContainer succeeded id=b928213e42b2103cd33b676ed08a15529be10fffcfd7e709af86df5db79e4152 handle=
39044064"
time="2018-08-15T16:14:35.058232400-07:00" level=debug msg="libcontainerd: Create() id=b928213e42b2103cd33b676ed08a15529be10fffcfd7e709af86df5db79e4152, Calling start()"
time="2018-08-15T16:14:35.058232400-07:00" level=debug msg="libcontainerd: starting container b928213e42b2103cd33b676ed08a15529be10fffcfd7e709af86df5db79e4152"
time="2018-08-15T16:14:35.058232400-07:00" level=debug msg="HCSShim::Container::Start id=b928213e42b2103cd33b676ed08a15529be10fffcfd7e709af86df5db79e4152"
time="2018-08-15T16:18:25.393050900-07:00" level=debug msg="Result: {\"Error\":-2147023436,\"ErrorMessage\":\"This operation returned because the timeout period expired.\
"}"
time="2018-08-15T16:18:25.394050800-07:00" level=error msg="libcontainerd: failed to start container: container b928213e42b2103cd33b676ed08a15529be10fffcfd7e709af86df5db7
9e4152 encountered an error during Start: failure in a Windows system call: This operation returned because the timeout period expired. (0x5b4)"
time="2018-08-15T16:18:25.394050800-07:00" level=debug msg="HCSShim::Container::Terminate id=b928213e42b2103cd33b676ed08a15529be10fffcfd7e709af86df5db79e4152"
time="2018-08-15T16:18:25.394050800-07:00" level=debug msg="libcontainerd: cleaned up after failed Start by calling Terminate"
time="2018-08-15T16:18:25.394050800-07:00" level=error msg="Create container failed with error: container b928213e42b2103cd33b676ed08a15529be10fffcfd7e709af86df5db79e4152
encountered an error during Start: failure in a Windows system call: This operation returned because the timeout period expired. (0x5b4)"
time="2018-08-15T16:18:25.424053800-07:00" level=debug msg="attach: stdout: end"
time="2018-08-15T16:18:25.425055000-07:00" level=debug msg="attach: stderr: end"
time="2018-08-15T16:18:25.427054100-07:00" level=debug msg="Revoking external connectivity on endpoint boring_babbage (b20f403df0ed25ede9152f77eb0f8e049677f1279b68862a25b
b9e2ab94babfb)"
time="2018-08-15T16:18:25.459087300-07:00" level=debug msg="[DELETE]=>[/endpoints/31e66619-5b57-47f2-9256-bbba54510e3b] Request : "
time="2018-08-15T16:18:25.548068700-07:00" level=debug msg="Releasing addresses for endpoint boring_babbage's interface on network nat"
time="2018-08-15T16:18:25.548068700-07:00" level=debug msg="ReleaseAddress(172.25.224.0/20, 172.25.229.142)"
time="2018-08-15T16:18:25.561064000-07:00" level=debug msg="WindowsGraphDriver Put() id b928213e42b2103cd33b676ed08a15529be10fffcfd7e709af86df5db79e4152"
time="2018-08-15T16:18:25.561064000-07:00" level=debug msg="hcsshim::UnprepareLayer flavour 1 layerId b928213e42b2103cd33b676ed08a15529be10fffcfd7e709af86df5db79e4152"
time="2018-08-15T16:18:25.566074800-07:00" level=debug msg="hcsshim::UnprepareLayer succeeded flavour 1 layerId=b928213e42b2103cd33b676ed08a15529be10fffcfd7e709af86df5db7
9e4152"
time="2018-08-15T16:18:25.566074800-07:00" level=debug msg="hcsshim::DeactivateLayer Flavour 1 ID b928213e42b2103cd33b676ed08a15529be10fffcfd7e709af86df5db79e4152"
time="2018-08-15T16:18:25.668075600-07:00" level=debug msg="hcsshim::DeactivateLayer succeeded flavour=1 id=b928213e42b2103cd33b676ed08a15529be10fffcfd7e709af86df5db79e41
52"
I can see when it is trying to create the container and then it fails.
But why?
Added more information
I finally found out how to check the server status for the running container, and I am getting this error message:
So does this mean the server doesn't have a network gateway?
How can I fix this problem?
Still looking.
More information
I deleted all the NAT networks and created a new one, so the online check passes now.
However, I still encounter other errors and can't run the image.
Something in the virtual network is wrong; I just can't find the right information to fix it... :(
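For reference, the "delete all NAT and create a new one" step described above is usually done roughly like this; a sketch from an elevated PowerShell prompt (cmdlet availability varies by Windows version, so treat this as an outline, not a recipe):

Stop-Service docker
Get-ContainerNetwork | Remove-ContainerNetwork   # from the HostNetworkingService module
Get-NetNat | Remove-NetNat
Start-Service docker                             # dockerd recreates the default 'nat' network on startup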
We are trying to set up a Docker repository in Nexus OSS (v3.3.2-02) in a Kubernetes cluster, and we are having issues logging in to it. We intend to have a proxy repo for DockerHub, a private (hosted) repo, and a group repo to tie the two together, using the configurations below:
Hosted
Proxy
Group
giving us the following list:
But when I try to log in to the repository, it appears to forward me to a /v2 endpoint, which throws a 404 error:
> docker login -u <user> -p <pass> https://repo.myhost.com:443
Error response from daemon: login attempt to https://repo.myhost.com:443/v2/ failed with status: 404 Not Found
I would like to add that we have Maven and NPM repositories set up in this same instance and they're working, so it appears Nexus itself is OK, but there's something wrong with the Docker configuration.
I don't know why the login request is being sent to the /v2 endpoint. What am I missing?
Docker requires a very specific URL layout and does not allow for any context URL, hence the need for Docker connectors to let the Docker client connect to NXRM.
Your screenshot shows you have configured the Docker connector for your Docker hosted repository on port 444, but your terminal capture shows you attempting to connect on port 443, which isn't your Docker connector port. The error message suggests your NXRM server does run on port 443, but because of how Docker works you need to access the repository on port 444. Please try:
docker login -u <user> -p <pass> https://repo.myhost.com:444
so that the client uses your Docker connector port. Also, it's always a good idea to run the latest version of Nexus.
In an experiment I just ran (docker-machine, virtualbox, macOS), when the server was 1.13.1 (as was the docker cli), it made a graceful degradation from /v2 down to /v1, like so:
level=debug msg="Calling GET /_ping"
level=debug msg="Calling GET /v1.26/version"
level=debug msg="Calling GET /_ping"
level=debug msg="Calling GET /v1.26/version"
level=debug msg="Calling GET /_ping"
level=debug msg="Calling GET /v1.26/info"
level=debug msg="Calling POST /v1.26/auth"
level=debug msg="attempting v2 login to registry endpoint https://192.168.2.103:9999/v2/"
level=info msg="Error logging in to v2 endpoint, trying next endpoint: Get https://192.168.2.103:9999/v2/: EOF"
level=debug msg="attempting v1 login to registry endpoint https://192.168.2.103:9999/v1/"
level=info msg="Error logging in to v1 endpoint, trying next endpoint: Get https://192.168.2.103:9999/v1/users/: dial tcp 192.168.2.103:9999: getsockopt: connection refused"
level=error msg="Handler for POST /v1.26/auth returned error: Get https://192.168.2.103:9999/v1/users/: dial tcp 192.168.2.103:9999: getsockopt: connection refused"
but after I upgraded the server to 17.06.0-ce (still with 1.13.1 cli), it only attempted /v2 and then quit:
level=debug msg="Calling GET /_ping"
level=debug msg="Calling GET /_ping"
level=debug msg="Calling GET /v1.30/version"
level=debug msg="Calling GET /_ping"
level=debug msg="Calling GET /v1.30/info"
level=debug msg="Calling POST /v1.30/auth"
level=debug msg="attempting v2 login to registry endpoint https://192.168.2.103:9999/v2/"
level=info msg="Error logging in to v2 endpoint, trying next endpoint: Get https://192.168.2.103:9999/v2/: tls: oversized record received with length 21584"
level=error msg="Handler for POST /v1.30/auth returned error: Get https://192.168.2.103:9999/v2/: tls: oversized record received with length 21584"
So the answer appears to be that you either need to teach Nexus to respond correctly on the /v2 endpoints (as it really should be doing already), or downgrade dockerd to a version that still falls back to the /v1 API, if that is the behavior you're after.
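The first option is commonly implemented with a reverse proxy in front of Nexus that routes the Docker API paths to the connector port; a rough nginx sketch, with a placeholder connector port:

location /v2/ {
    proxy_pass http://127.0.0.1:5000;           # Nexus Docker connector port (placeholder)
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-Proto $scheme;
}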
Not sure if this is going to help, but the browser-based URL does not have a port number in it, and I could log in with my credentials. Example browser-based URL below:
https://nexus.mysite.net/
However, when I keyed in the following:
docker login -u <user> -p <pass> https://nexus.mysite.net/
I was greeted with the following:
Error response from daemon: login attempt to https://nexus.mysite.net/v2/ failed with status: 404 Not Found
Using the right port number avoided the above error, and I could log in from the CLI as follows:
docker login -u the-user-name -p the-password https://nexus.mysite.net:7000
(In my case the correct port number was 7000.)
Hope this helps.