I'm running a Kubernetes cluster on Docker Desktop (Mac). It has a local Docker registry inside it.
I'm able to query the registry without any problem via the API (for example to get the list of tags).
I was able to push an image before, but it took multiple attempts. Now I can't push new changes at all: individual layers appear to upload successfully, but the push is never acknowledged and the client keeps retrying.
The registry is reachable at localhost:5000 and I am port forwarding as per the instructions at https://blog.hasura.io/sharing-a-local-registry-for-minikube-37c7240d0615/
I'm not using SSL certs, as this is for development on my local machine.
(The port forwarding is proven to work; otherwise the API calls would fail.)
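For reference, the port forward I keep running looks roughly like this (the namespace and pod-name pattern are from my setup and may differ):

POD=$(kubectl get pods -n kube-system | grep kube-registry | awk '{print $1}' | head -n 1)
kubectl port-forward -n kube-system "$POD" 5000:5000

The failing docker push output looks like this: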
e086a4af6e6b: Retrying in 1 second
35c20f26d188: Layer already exists
c3fe59dd9556: Pushing [========================> ] 169.3MB/351.5MB
6ed1a81ba5b6: Layer already exists
a3483ce177ce: Retrying in 16 seconds
ce6c8756685b: Layer already exists
30339f20ced0: Retrying in 1 second
0eb22bfb707d: Pushing [==================================================>] 45.18MB
a2ae92ffcd29: Waiting
received unexpected HTTP status: 502 Bad Gateway
Workaround (this will suffice, but it is not ideal, as I have to build each container image locally and reference it with imagePullPolicy: Never):
apiVersion: v1
kind: Pod
metadata:
  name: producer
  namespace: aetasa
spec:
  containers:
    - name: kafkaproducer
      image: localhost:5000/aetasa/cta-user-create-app
      imagePullPolicy: Never  # use the image already built in the local Docker daemon
      ports:
        - containerPort: 5005
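To use this, the image has to already exist in Docker Desktop's local daemon, so each change means rebuilding and re-applying by hand (the Dockerfile location and manifest filename here are just examples):

docker build -t localhost:5000/aetasa/cta-user-create-app .
kubectl apply -f producer-pod.yaml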
kubectl logs for the registry:
10.1.0.1 - - [20/Feb/2019:19:18:03 +0000] "POST /v2/aetasa/cta-user-create-app/blobs/uploads/ HTTP/1.1" 202 0 "-" "docker/18.09.2 go/go1.10.6 git-commit/6247962 kernel/4.9.125-linuxkit os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.2 \x5C(darwin\x5C))" "-"
2019/02/20 19:18:03 [warn] 12#12: *293 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000011, client: 10.1.0.1, server: localhost, request: "PATCH /v2/aetasa/cta-user-create-app/blobs/uploads/16ad0e41-9af3-48c8-bdbe-e19e2b478278?_state=qjngrtaLCTal-7-hLwL9mvkmhOTHu4xvOv12gxYfgPx7Ik5hbWUiOiJhZXRhc2EvY3RhLXVzZXItY3JlYXRlLWFwcCIsIlVVSUQiOiIxNmFkMGU0MS05YWYzLTQ4YzgtYmRiZS1lMTllMmI0NzgyNzgiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMTktMDItMjBUMTk6MTg6MDMuMTU2ODYxNloifQ%3D%3D HTTP/1.1", host: "localhost:5000"
2019/02/20 19:18:03 [error] 12#12: *293 connect() failed (111: Connection refused) while connecting to upstream, client: 10.1.0.1, server: localhost, request: "PATCH /v2/aetasa/cta-user-create-app/blobs/uploads/16ad0e41-9af3-48c8-bdbe-e19e2b478278?_state=qjngrtaLCTal-7-hLwL9mvkmhOTHu4xvOv12gxYfgPx7Ik5hbWUiOiJhZXRhc2EvY3RhLXVzZXItY3JlYXRlLWFwcCIsIlVVSUQiOiIxNmFkMGU0MS05YWYzLTQ4YzgtYmRiZS1lMTllMmI0NzgyNzgiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMTktMDItMjBUMTk6MTg6MDMuMTU2ODYxNloifQ%3D%3D HTTP/1.1", upstream: "http://10.104.68.90:5000/v2/aetasa/cta-user-create-app/blobs/uploads/16ad0e41-9af3-48c8-bdbe-e19e2b478278?_state=qjngrtaLCTal-7-hLwL9mvkmhOTHu4xvOv12gxYfgPx7Ik5hbWUiOiJhZXRhc2EvY3RhLXVzZXItY3JlYXRlLWFwcCIsIlVVSUQiOiIxNmFkMGU0MS05YWYzLTQ4YzgtYmRiZS1lMTllMmI0NzgyNzgiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMTktMDItMjBUMTk6MTg6MDMuMTU2ODYxNloifQ%3D%3D", host: "localhost:5000"
Try configuring --max-concurrent-uploads=1 for your Docker daemon. You are pushing quite large layers (~350 MB), so you are probably hitting some limit (request size, timeout) somewhere. A single concurrent upload may help, but it is only a workaround. The real solution is to tune the configuration (buffer sizes, timeouts, ...) of the registry and of the reverse proxy in front of it.
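For example, a minimal sketch of both sides; the daemon.json key is the standard one, while the nginx values are starting points to tune rather than known-good numbers.

/etc/docker/daemon.json on the machine doing the push (restart the Docker daemon afterwards):

{
  "max-concurrent-uploads": 1
}

In the nginx server block that proxies to the registry:

client_max_body_size    0;      # do not cap layer upload size
proxy_request_buffering off;    # stream PATCH bodies instead of spooling them to disk
proxy_read_timeout      900;
proxy_send_timeout      900;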
It may be a disk space issue. If you store Docker images inside the Docker VM, you can fill up its disk quite fast.
By default, the docker-desktop VM's disk is limited to 64 GB. You can increase it up to 112 GB on the "Disk" tab in Docker Preferences.
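To check how much space Docker is using and reclaim some of it (standard Docker commands):

docker system df        # show disk usage by images, containers and volumes
docker system prune     # remove stopped containers, dangling images and unused networks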
I have encountered this issue quite a few times and unfortunately couldn't find a permanent fix.
Most likely the image has become corrupted in the registry. As a workaround, I suggest deleting the image from the registry and doing a fresh push. That should work, and subsequent pushes should work too.
The issue is probably related to missing layers of the image: sometimes we delete an image with the --force option, and in that case some of the common layers may get deleted as well, which affects other images that share the deleted layers.
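A sketch of the delete-and-repush using the standard Registry v2 API; deletion must be enabled on the registry (REGISTRY_STORAGE_DELETE_ENABLED=true), and the repository/tag names are just the ones from the question:

# find the manifest digest for the tag
curl -sI -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  http://localhost:5000/v2/aetasa/cta-user-create-app/manifests/latest | grep -i docker-content-digest

# delete the manifest by digest, then garbage-collect inside the registry container
curl -X DELETE http://localhost:5000/v2/aetasa/cta-user-create-app/manifests/sha256:<digest>
registry garbage-collect /etc/docker/registry/config.yml

After that, a fresh docker push re-uploads the affected layers.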
Related
I have tried a lot, but I can't find a solution to this problem.
I am running a Sonatype Nexus (3.21.1-01) Docker image on a CentOS 7 server behind an A10 vThunder proxy.
docker login and docker pull work fine, but docker push fails with EOF after some retrying.
Here are the relevant routes:
docker image port 8081 > my.server:8081
docker image port 8443 > my.server:8443
proxy.domain.local:443 > my.server:8081
proxy.domain.local:8443 > my.server:8443
I have created a Docker repository in Nexus which has its HTTP connector exposed on 8443.
The proxy is exposed over SSL with a self-signed certificate.
The client's /etc/docker/daemon.json file contains the insecure-registries option:
"insecure-registries": ["proxy.domain.local:8443","proxy.domain.local"]
Here is the situation:
If I push from the client an image whose layers all already exist on the remote server (but are missing from the Nexus repository), it works.
If I try the same image with a small change (such as a new LABEL), it fails in this way:
9c27e219663c: Layer already exists
Patch https://proxy.domain.local:8443/v2/test4/blobs/uploads/6862fe60-d63b-4942-bbb6-f403307e677a: EOF
If I push directly from the my.server machine, pointing to localhost:8443, it works.
If I push from the client machine an image with new layers, it fails in this way after some retrying (the same behaviour occurs with smaller images):
docker push proxy.domain.local:8443/ara
The push refers to repository [proxy.domain.local:8443/ara]
edb7a4f74e22: Retrying in 8 seconds
de421654540d: Retrying in 8 seconds
-------------
The push refers to repository [proxy.domain.local:8443/ara]
edb7a4f74e22: Pushing [==================================================>] 172.6MB/172.6MB
de421654540d: Pushing [==================================================>] 200.8MB/200.8MB
EOF
This is a summary of what happens in Wireshark:
the.client my.server HTTP 316 GET /v2/ HTTP/1.1
...
my.server the.client HTTP 654 HTTP/1.1 401 Unauthorized (application/json)
...
the.client my.server HTTP 442 HEAD /v2/alpine-test/blobs/sha256:95f5ecd24e438e09033c8e69ec136079f8774ab8284f1431f5433a829054b5e7 HTTP/
(asking Nexus whether the layer is already uploaded)
my.server the.client HTTP 493 HTTP/1.1 404 Not Found
(it isn't)
the.client my.server HTTP 437 POST /v2/alpine-test/blobs/uploads/ HTTP/1.1
(so it starts to POST the layer)
my.server the.client HTTP 584 HTTP/1.1 202 Accepted
...
the.client my.server HTTP 437 POST /v2/alpine-test/blobs/uploads/ HTTP/1.1
...
my.server the.client HTTP 584 HTTP/1.1 202 Accepted
..
and so on, with some FIN/ACK in the middle, until the client stops sending...
** In the Nexus server log there is absolutely no trace of any of this. **
This is the Nexus docker-compose file:
services:
  nexus:
    build:
      context: .
      args:
        DOCKER_GID: ${DOCKER_GID}
        NEXUS_UID: ${NEXUS_UID}
        NEXUS_GID: ${NEXUS_GID}
    restart: always
    environment:
      - NEXUS_UID_GID=${NEXUS_UID_GID}
      - HOSTNAME_DOCKER_NEXUS=${HOSTNAME_DOCKER_NEXUS}
    ports:
      - "8081:8081"
      - "8443:8443"
    user: ${NEXUS_UID_GID}
    hostname: ${HOSTNAME_DOCKER_NEXUS}
    volumes:
      - /var/nexus-data:/nexus-data
      - /etc/hosts:/etc/hosts
      - /var/run/docker.sock:/var/run/docker.sock
Can you help me?
I was thinking about a possible Nexus/Docker user permission issue on the local machine or with the Docker binary permissions (if I push from localhost it works, yes, but there the image is already stored on the system, of course), but I think that is not very likely.
I was also thinking about a proxy configuration issue (more probable), but I don't know much about proxies.
[Workaround]
Because I could not figure out the problem, I ended up making the proxy transparent and configuring Nexus to serve HTTPS directly through its jetty.xml, jetty-https.xml and nexus.properties.
Serving HTTPS directly from Jetty, instead of letting the proxy upgrade the connection, solved the problem above.
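For reference, the gist of that change; the property names below follow the Sonatype documentation for Nexus 3, but the exact paths and keystore are from my setup, so treat this as a sketch:

# $NEXUS_DATA/etc/nexus.properties
application-port-ssl=8443
nexus-args=${jetty.etc}/jetty.xml,${jetty.etc}/jetty-http.xml,${jetty.etc}/jetty-https.xml,${jetty.etc}/jetty-requestlog.xml

# jetty-https.xml expects a keystore (by default under etc/ssl/keystore.jks)
# containing the certificate for proxy.domain.local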
I have encountered an issue with a self-hosted Docker Registry. When pulling a certain image, the pull fails for some of its layers. As soon as I start the docker pull command, some layers report:
9894d615bbeb: Retrying in 5 seconds
dfc282427f6f: Retrying in 5 seconds
8dbb865cf7b1: Retrying in 1 second
This happens immediately when the command starts, and the layers are not large.
The registry has been working flawlessly for about a year.
Recently we adopted a more Docker-centric CI/CD flow, and the need to continuously clean up the registry arose. For that reason we run a nightly cleanup job on the registry:
Delete all manifests on the filesystem (except for a few persistent images we want to keep)
Run the bin/registry garbage-collect command to delete all non-referenced blobs (a rough sketch of the whole job follows below)
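Roughly, the nightly job does the following (paths are for the standard registry:2 image; the keep-list handling is omitted and the repository name is just the one from the logs):

# inside the registry container / on its storage volume
REPOS=/var/lib/registry/docker/registry/v2/repositories
rm -rf "$REPOS/applications/foo/dist-staging/_manifests"   # drop manifest references

# then delete now-unreferenced blobs (run with --dry-run first to audit)
bin/registry garbage-collect /etc/docker/registry/config.yml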
I have verified that the blobs for these failing layers do exist in the registry; I can navigate to them on the filesystem.
The issue reproduces both on my local system and on a remote server.
The docker registry logs show that the HTTP request for the blob was successful:
time="2019-05-03T13:09:21.714123801Z" level=info msg="response completed" go.version=go1.7.6 http.request.host=registry.example.com http.request.id=e5c01bee-a755-48a5-b87a-716560dd0e25 http.request.method=GET http.request.remoteaddr=94.237.28.78 http.request.uri="/v2/applications/foo/dist-staging/blobs/sha256:9894d615bbebb5b235bb5a7aed17e9b2ba35c95c9fc8c0c763476c057536842f" http.request.useragent="docker/17.05.0-ce go/go1.7.5 git-commit/89658be kernel/4.4.0-145-generic os/linux arch/amd64 UpstreamClient(Swipely/Docker-API 1.33.6)" http.response.contenttype="application/octet-stream" http.response.duration=8.154072ms http.response.status=200 http.response.written=0 instance.id=92dfad5e-bf76-4db6-a8de-07901539d36e service=registry version=v2.6.2
172.17.0.1 - - [03/May/2019:13:09:21 +0000] "GET /v2/applications/foo/dist-staging/blobs/sha256:9894d615bbebb5b235bb5a7aed17e9b2ba35c95c9fc8c0c763476c057536842f HTTP/1.0" 200 0 "" "docker/17.05.0-ce go/go1.7.5 git-commit/89658be kernel/4.4.0-145-generic os/linux arch/amd64 UpstreamClient(Swipely/Docker-API 1.33.6)"
The docker registry is behind an nginx proxy, but its settings have not been changed recently. While debugging, I tried the following, with no luck:
proxy_buffering off;
proxy_max_temp_file_size 0;
Is there anything else I should check? Can this be caused by the registry cleanup somehow? Why? How?
Edit
It seems that something was stale in the registry's cache, because after a restart it started throwing "unknown blob" errors.
After rebuilding and re-pushing the image, the error went away and the registry was able to serve the image to clients again.
I think that means something got messed up during the registry cleanup, but why? garbage-collect should only remove non-referenced blobs and should be safe to use, if I understand it correctly?
This question took a turn, but I will leave it here in its entirety for others.
I'm using EKS (Kubernetes) on AWS and I have problems posting a payload of around 400 kilobytes to any web server that runs in a container in that cluster. I'm hitting some kind of limit, but it's not a hard size limit: at around 400 kilobytes the request often works, but sometimes it fails (testing with Python requests) with:
requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
I tested this with different containers (a Python web server on Alpine, a Tomcat server on CentOS, nginx, etc.).
The more I increase the size beyond 400 kilobytes, the more consistently I get: Connection reset by peer.
Any ideas?
Thanks for your answers and comments; they helped me get closer to the source of the problem. I upgraded the AWS cluster from 1.11 to 1.12, and that cleared the error for service-to-service traffic within Kubernetes. However, the error still persisted when accessing from outside the cluster via the public DNS name, and thus through the load balancer.
So after some more testing I found that the problem now lies in the ALB, or in the ALB ingress controller for Kubernetes: https://kubernetes-sigs.github.io/aws-alb-ingress-controller/
So I switched back to a Kubernetes Service that provisions an older-generation ELB, and the problem was fixed. The ELB is not ideal, but it's a good workaround for the moment, until the ALB controller gets fixed (or I find the right button to press to fix it).
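For completeness, the fallback is just a plain Service of type LoadBalancer, which on EKS provisions a classic ELB by default; the names and ports here are placeholders:

apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080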
As you mentioned in this answer, the issue might be caused by the ALB or the ALB ingress controller for Kubernetes: https://kubernetes-sigs.github.io/aws-alb-ingress-controller/.
Can you check whether the NGINX Ingress Controller can be used with the ALB?
NGINX has a default request body size limit of 1 MB. It can be changed using the nginx.ingress.kubernetes.io/proxy-body-size annotation.
Also, are you configuring keep-alive or connection timeouts anywhere?
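For example, a minimal Ingress for the NGINX controller with a raised body-size limit; the hostname, service name and the 8m value are placeholders:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-app
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/proxy-body-size: "8m"
spec:
  rules:
    - host: my-app.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: my-app
              servicePort: 80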
The connection reset by peer errors, even between services inside the cluster, sound like they may be the known issue with conntrack. The fix involves running the following:
echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal
You can automate this with the following DaemonSet:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: startup-script
  labels:
    app: startup-script
spec:
  template:
    metadata:
      labels:
        app: startup-script
    spec:
      hostPID: true
      containers:
        - name: startup-script
          image: gcr.io/google-containers/startup-script:v1
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
          env:
            - name: STARTUP_SCRIPT
              value: |
                #! /bin/bash
                echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal
                echo done
As this answer suggests, you may try changing your kube-proxy mode of operation. To edit your kube-proxy config:
kubectl -n kube-system edit configmap kube-proxy
Search for mode: "" and try "iptables", "userspace" or "ipvs". Each time you change the ConfigMap, delete your kube-proxy pod(s) to make sure they pick up the new configuration.
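For example, assuming the usual k8s-app=kube-proxy label (verify it on your cluster):

kubectl -n kube-system delete pod -l k8s-app=kube-proxy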
We had a similar issue with Azure, where its firewall prevents sending more than 128 KB in a PATCH request.
After researching and weighing the pros and cons of that approach within the team, our solution is a completely different one.
We put our "bigger" requests into blob storage, and then put a message onto a queue containing the name of the file just created. A queue consumer receives the message with the filename, reads the blob from storage, converts it into whatever object you need, and applies any business logic to that large object.
After the message is processed, the file is deleted.
The biggest advantage is that our API is not blocked by a big request and its long-running job.
Maybe this can be another way to solve your issue within the Kubernetes container.
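A rough sketch of that flow with the Azure CLI; the account, container, queue and file names are placeholders:

# 1. upload the large payload as a blob
az storage blob upload --account-name myaccount --container-name big-requests \
    --name request-1234.json --file ./payload.json

# 2. enqueue a small message that only carries the blob name
az storage message put --account-name myaccount --queue-name big-requests \
    --content "request-1234.json"

# 3. a worker dequeues the message, downloads the blob, applies the business logic,
#    and finally deletes both the message and the blob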
See ya, Leonhard
I am brand new to Docker and am trying to follow the Getting Started tutorial from Docker. I am using Docker 17.05-ce on Ubuntu 17.04. The problem appears to be network-related. When I try to push, I get the following results:
jonathan@poseidon:~/DockerTest$ sudo docker push jgossage/get-started:part1
The push refers to a repository [docker.io/jgossage/get-started]
1770f1c9a8cf: Pushed
61fd1d8cd138: Pushed
e0f735a5e86f: Layer already exists
1de570a07fb5: Pushed
b3640b6d4ac2: Layer already exists
08d4c9ccebfd: Pushed
007ab444b234: Retrying in 1 second
dial tcp: lookup registry-1.docker.io on 127.0.0.53:53: dial udp 127.0.0.53:53: i/o timeout
jonathan@poseidon:~/DockerTest$ sudo docker logs 58e8df0a7426
* Running on http://0.0.0.0:80/ (Press CTRL+C to quit)
172.17.0.1 - - [20/Jun/2017 15:12:24] "GET / HTTP/1.1" 200 -
172.17.0.1 - - [20/Jun/2017 15:13:17] "GET / HTTP/1.1" 200 -
The push runs for some time, with several retries, before timing out.
This is on a home network with one computer connected to the router via Wi-Fi, and then a normal connection to my ISP and the Internet. What steps can I take to make Docker push reliably?
It looks like a DNS issue, similar to this one: https://forums.docker.com/t/fata-0025-io-timeout-on-docker-image-push/1742/9
The suggestion is to replace your current DNS resolver (127.0.0.53) with the Google DNS (8.8.8.8).
I'm not sure if there is an open issue for this problem; I couldn't find one.
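A quick way to pin Docker itself to the Google resolvers, instead of changing the system-wide DNS, is the standard "dns" key in /etc/docker/daemon.json (restart the Docker daemon afterwards):

{
  "dns": ["8.8.8.8", "8.8.4.4"]
}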
I resolved this issue by replacing the standard DNS caching and resolving server with a third-party implementation, Unbound. The following web page contains complete instructions for doing this at the end of the document. As also suggested by others, it is a good idea to switch to the public Google DNS servers.
I am trying to create a pod in Kubernetes with the following simple command:
kubectl run example --image=nginx
It runs and assigns the pod to a minion correctly, but the status stays at ContainerCreating due to the following error. I have not set up GCR or GCloud on my machine, so I am not sure why it is pulling from there.
1h 29m 14s {kubelet centos-minion1} Warning FailedSync Error syncing pod, skipping:
failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed
for gcr.io/google_containers/pause:2.0, this may be because there are no
credentials on this request. details: (unable to ping registry endpoint
https://gcr.io/v0/\nv2 ping attempt failed with error: Get https://gcr.io/v2/:
http: error connecting to proxy http://87.254.212.120:8080: dial tcp
87.254.212.120:8080: i/o timeout\n v1 ping attempt failed with error:
Get https://gcr.io/v1/_ping: http: error connecting to proxy
http://87.254.212.120:8080: dial tcp 87.254.212.120:8080: i/o timeout)
Kubernetes is trying to create a pause container for your pod; this container is used to create the pod's network namespace. See this question and its answers for more general information on the pause container.
To your specific error: Kubernetes tries to pull the pause container's image (which would be gcr.io/google_containers/pause:2.0, according to your error message) from the Google Container Registry (gcr.io). Apparently, your Docker engine tries to connect to GCR through an HTTP proxy located at 87.254.212.120:8080, to which it cannot connect (i/o timeout).
To correct this error, either make sure that your HTTP proxy server is online and does not block HTTP requests to GCR, or (if you do have public Internet access) disable the proxy for your Docker engine. This is typically done using the http_proxy and https_proxy environment variables, which would have been set in /etc/sysconfig/docker or /etc/default/docker, depending on your Linux distribution.
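For example, on a systemd-based distribution the engine's proxy settings often live in a drop-in unit file like the one below (the proxy address is the one from your error message, the rest are placeholders); after editing, run systemctl daemon-reload and restart Docker:

# /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://87.254.212.120:8080"
Environment="HTTPS_PROXY=http://87.254.212.120:8080"
Environment="NO_PROXY=localhost,127.0.0.1"

Removing the two proxy lines (or adding gcr.io to NO_PROXY, if the registry is reachable directly) disables the proxy for those requests.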