Error syncing pod, failed for registry.access.redhat.com (Kubernetes) - docker

kubectl create -f web.yml
kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
httpd 0/1 ContainerCreating 0 1h kube-node2
[root@kube-master pods]# kubectl describe pods httpd
Name:           httpd
Namespace:      default
Node:           kube-node2/10.10.0.102
Start Time:     Mon, 30 Oct 2017 17:47:38 +0600
Labels:         app=webserver
Status:         Pending
IP:
Controllers:
Containers:
  httpd:
    Container ID:
    Image:          webserver
    Image ID:
    Port:           80/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Volume Mounts:
    Environment Variables:
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
No volumes.
QoS Class:      BestEffort
Tolerations:
Events:
  FirstSeen  LastSeen  Count  From                  SubObjectPath  Type     Reason      Message
  ---------  --------  -----  ----                  -------------  -------- ------      -------
  1h         5m        16     {kubelet kube-node2}                 Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request. details: (open /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt: no such file or directory)"
  1h         8s        271    {kubelet kube-node2}                 Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "POD" with ImagePullBackOff: "Back-off pulling image \"registry.access.redhat.com/rhel7/pod-infrastructure:latest\""
The registry should go to hub.docker.com by default, but here it says:
Error syncing pod, skipping: failed to "StartContainer" for "POD" with
ErrImagePull: "image pull failed for
registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may
be because there are no credentials on this request. details: (open
/etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt: no such
file or directory)"
Why?
Please give me a solution.
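To confirm that the failing pull is the pod-infrastructure (pause) image rather than the webserver image itself, a rough check (a sketch, run on kube-node2) is to try the same pull by hand and look at the cert directory named in the error message:
docker pull registry.access.redhat.com/rhel7/pod-infrastructure:latest
ls -l /etc/docker/certs.d/registry.access.redhat.com/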

I encountered the same problem, and I found that I had not installed the rhsm-related software on the machine. You can execute the command "yum install rhsm" to solve this problem.

For me the file /etc/rhsm/ca/redhat-uep.pem was missing.
I had to uninstall and reinstall docker/kubernetes on the minion to get the file back and it worked again. What a pain.
My environment is CentOS Linux release 7.4.1708, with these rpms:
kubernetes-master-1.5.2-0.7.git269f928.el7.x86_64
kubernetes-1.5.2-0.7.git269f928.el7.x86_64
kubernetes-node-1.5.2-0.7.git269f928.el7.x86_64
kubernetes-client-1.5.2-0.7.git269f928.el7.x86_64
docker-1.12.6-71.git3e8e77d.el7.centos.1.x86_64
docker-client-1.12.6-71.git3e8e77d.el7.centos.1.x86_64
docker-common-1.12.6-71.git3e8e77d.el7.centos.1.x86_64
There is no rhsm in CentOS.
This post has the alternative to rhsm for CentOS.

I met the same issue on CentOS 7.
yum install rhsm did not work for me, giving the following output:
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: centos.ustc.edu.cn
* extras: mirrors.zju.edu.cn
* updates: centos.ustc.edu.cn
No package rhsm available.
Error: Nothing to do
But yum install subscription-manager works well for me.
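If the error still points at the missing redhat-ca.crt after installing subscription-manager, a sketch of what has worked on similar CentOS 7 setups (assuming the paths from the error message; verify that /etc/rhsm/ca/redhat-uep.pem exists after the install) is to point the cert docker expects at the one subscription-manager ships:
yum install -y subscription-manager
mkdir -p /etc/docker/certs.d/registry.access.redhat.com
ln -s /etc/rhsm/ca/redhat-uep.pem /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt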

Related

Helm fetches a different "latest" image than Docker pull's latest image

A different image is fetched for the same image name when Helm does a chart install than when I do an ordinary docker pull. My values.yaml has:
image:
  repository: gcr.io/rsi-api-test/rsi-api
  tag: latest
  pullPolicy: IfNotPresent
The only image in the container registry is
gcr.io/rsi-api-test/rsi-api@sha256:473bd9e31df8fd5d5424e11a57cabfd308179c8507683d6b9219c804bb315116
But Helm somehow finds this image, with this digest:
gcr.io/rsi-api-test/rsi-api@sha256:c5cc78caa54ac4cf855c5fdb6a3448ff74ab641581fcda35d3e4e245c3154766
I believe it found some old version and for some reason refuses to get the latest version. Where is the "cached" or local collection of repositories used by Helm, and how can I force it to be cleared?
The registry is a GCP Artifact Registry
>helm version
Client: &version.Version{SemVer:"v2.17.0", GitCommit:"a690bad98af45b015bd3da1a41f6218b1a451dbe", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.17.0", GitCommit:"a690bad98af45b015bd3da1a41f6218b1a451dbe", GitTreeState:"clean"}
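One way to see which digest each tag actually points to on the registry side is a sketch like the following (assuming gcloud access to the rsi-api-test project; docker manifest inspect may require the experimental CLI on older Docker releases):
gcloud container images list-tags gcr.io/rsi-api-test/rsi-api
docker manifest inspect gcr.io/rsi-api-test/rsi-api:latest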
UPDATE, CHANGED TAGGING
I edited the tags on the image in GCP, removing both the "latest" and "v0.1.0" tags and adding the tag "v1.0.0". I changed the respective values in the values.yaml file (above) to:
image:
  repository: gcr.io/rsi-api-test/rsi-api
  tag: 1.0.0
  pullPolicy: IfNotPresent
Then, I did
docker image pull gcr.io/rsi-api-test/rsi-api:latest
and confirmed there was no more latest (I'm not sure where it's getting these images from).
then
docker run --detach gcr.io/rsi-api-test/rsi-api:1.0.0
Docker fetched the 1.0.0 version, and it ran as I expected/wanted.
As for helm, I re-ran helm install, and it didn't work due to an image error (kubectl logs shown below).
helm install rsapi
>kubectl logs pod/sullen-lemur-rsiapi-7b8d6d656c-7xmxh
Error from server (BadRequest): container "rsiapi" in pod "sullen-lemur-rsiapi-7b8d6d656c-7xmxh" is waiting to start: trying and failing to pull image
>kubectl describe pod/sullen-lemur-rsiapi-7b8d6d656c-7xmxh
Name: sullen-lemur-rsiapi-7b8d6d656c-7xmxh
Namespace: default
Priority: 0
Node: gke-helm-cluster-default-pool-7c542461-sdpv/10.128.0.6
Start Time: Thu, 26 May 2022 23:19:00 -0400
Labels: app.kubernetes.io/instance=sullen-lemur
app.kubernetes.io/name=rsiapi
pod-template-hash=7b8d6d656c
Annotations: <none>
Status: Pending
IP: 10.72.1.13
IPs:
IP: 10.72.1.13
Controlled By: ReplicaSet/sullen-lemur-rsiapi-7b8d6d656c
Containers:
rsiapi:
Container ID:
Image: gcr.io/rsi-api-test/rsi-api:1.0.0
Image ID:
Port: 8080/TCP
Host Port: 0/TCP
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Liveness: http-get http://:http/ delay=60s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http/ delay=20s timeout=1s period=10s #success=1 #failure=3
Environment:
SPRING_CLOUD_KUBERNETES_CONFIG_NAME: sullen-lemur-rsiapi
MANAGEMENT_ENDPOINT_RESTART_ENABLED: true
SPRING_CLOUD_KUBERNETES_RELOAD_ENABLED: true
SPRING_CLOUD_KUBERNETES_RELOAD_STRATEGY: refresh
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fvgg8 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-fvgg8:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 13m default-scheduler Successfully assigned default/sullen-lemur-rsiapi-7b8d6d656c-7xmxh to gke-helm-cluster-default-pool-7c542461-sdpv
Normal Pulling 12m (x4 over 13m) kubelet Pulling image "gcr.io/rsi-api-test/rsi-api:1.0.0"
Warning Failed 12m (x4 over 13m) kubelet Failed to pull image "gcr.io/rsi-api-test/rsi-api:1.0.0": rpc error: code = NotFound desc = failed to pull and unpack image "gcr.io/rsi-api-test/rsi-api:1.0.0": failed to resolve reference "gcr.io/rsi-api-test/rsi-api:1.0.0": gcr.io/rsi-api-test/rsi-api:1.0.0: not found
Warning Failed 12m (x4 over 13m) kubelet Error: ErrImagePull
Warning Failed 11m (x6 over 13m) kubelet Error: ImagePullBackOff
Normal BackOff 3m32s (x42 over 13m) kubelet Back-off pulling image "gcr.io/rsi-api-test/rsi-api:1.0.0"
>helm ls
NAME REVISION UPDATED STATUS CHART APP VERSION NAMESPACE
sullen-lemur 1 Thu May 26 23:19:00 2022 DEPLOYED rsiapi-0.1.0 1.0 default
The same image ID was referenced by two repository tags:
> docker image rm 8d0bfa85e8f9
Error response from daemon: conflict: unable to delete 8d0bfa85e8f9
(must be forced) - image is referenced in multiple repositories
>docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
gcr.io/rsi-api-test/rsi-api latest 8d0bfa85e8f9 10 hours ago 379MB
gcr.io/rsi-api-test/rsi-api v1.0.0 8d0bfa85e8f9 10 hours ago 379MB
>docker image rm -f 8d0bfa85e8f9
Untagged: gcr.io/rsi-api-test/rsi-api:latest
Untagged: gcr.io/rsi-api-test/rsi-api:v1.0.0
Untagged: gcr.io/rsi-api-test/rsi-api@sha256:473bd9e31df8fd5d5424e11a57cabfd308179c8507683d6b9219c804bb315116
Deleted: sha256:8d0bfa85e8f929221a7c6b66e5fd6008151e496407ed9e74072dd3e02314ad12
BONUS points for suggesting a way/policy to increment the version tag for Helm. This is a GitHub Workflow Maven build of a Spring Boot application.
Also, I'm running Helm on my personal Linux machine, but want it to target a GCP cluster. However, I also tried installing helm and using it on a Minikube installation. What do I need to do to make sure I fully switch?
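A minimal sketch of switching between the Minikube and GKE clusters and overriding the tag at install time (assuming the chart directory rsapi from the question and the Helm 2 client shown above):
kubectl config get-contexts                   # list the minikube and GKE contexts
kubectl config use-context <gke-context-name>
kubectl config current-context                # confirm which cluster kubectl/helm will talk to
helm install rsapi --set image.tag=<tag>      # override values.yaml's image.tag for this release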

Kubernetes GPU Pod error: validating toolkit installation: exec: "nvidia-smi": executable file not found in $PATH

When trying to create Pods that can use the GPU, I get the error: exec: "nvidia-smi": executable file not found in $PATH.
To explain the error from the beginning, my main goal was to create JupyterHub environments that can use the GPU. I installed Zero to JupyterHub for Kubernetes. I followed these steps to be able to use the GPU. When I check my nodes, the GPU seems schedulable by Kubernetes. So far everything seemed fine.
kubectl get nodes -o=custom-columns=NAME:.metadata.name,GPUs:.status.capacity.'nvidia\.com/gpu'
NAME GPUs
arge-server 1
However, when I logged in to JupyterHub and tried to open the profile using the GPU, I got an error: [Warning] 0/1 nodes are available: 1 Insufficient nvidia.com/gpu. So, I checked the Pods and found that they were all in the "Waiting: PodInitializing" state.
kubectl get pods -n gpu-operator-resources
NAME READY STATUS RESTARTS AGE
nvidia-dcgm-x5rqs 0/1 Init:0/1 2 6d20h
nvidia-device-plugin-daemonset-jhjhb 0/1 Init:0/1 0 6d20h
gpu-feature-discovery-pd4xv 0/1 Init:0/1 2 6d20h
nvidia-dcgm-exporter-7mjgt 0/1 Init:0/1 2 6d20h
nvidia-operator-validator-9xjmv 0/1 Init:Error 10 26m
After that, I took a closer look at the Pod nvidia-operator-validator-9xjmv, which was the beginning of the error, and I saw that the toolkit-validation container was throwing a CrashLoopBackOff error. Here is the relevant part of the log:
kubectl describe pod nvidia-operator-validator-9xjmv -n gpu-operator-resources
Name: nvidia-operator-validator-9xjmv
Namespace: gpu-operator-resources
.
.
.
Controlled By: DaemonSet/nvidia-operator-validator
Init Containers:
.
.
.
toolkit-validation:
Container ID: containerd://e7d004f0809cbefdae5407ea42eb659972ea7eefa5dd6e45e968cbf3ed22bf2e
Image: nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.8.2
Image ID: nvcr.io/nvidia/cloud-native/gpu-operator-validator@sha256:a07fd1c74e3e469ac316d17cf79635173764fdab3b681dbc282027a23dbbe227
Port: <none>
Host Port: <none>
Command:
sh
-c
Args:
nvidia-validator
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 18 Nov 2021 12:55:00 +0300
Finished: Thu, 18 Nov 2021 12:55:00 +0300
Ready: False
Restart Count: 16
Environment:
WITH_WAIT: false
COMPONENT: toolkit
Mounts:
/run/nvidia/validations from run-nvidia-validations (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hx7ls (ro)
.
.
.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 58m default-scheduler Successfully assigned gpu-operator-resources/nvidia-operator-validator-9xjmv to arge-server
Normal Pulled 58m kubelet Container image "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.8.2" already present on machine
Normal Created 58m kubelet Created container driver-validation
Normal Started 58m kubelet Started container driver-validation
Normal Pulled 56m (x5 over 58m) kubelet Container image "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.8.2" already present on machine
Normal Created 56m (x5 over 58m) kubelet Created container toolkit-validation
Normal Started 56m (x5 over 58m) kubelet Started container toolkit-validation
Warning BackOff 3m7s (x255 over 58m) kubelet Back-off restarting failed container
Then, I looked at the logs of the container and I got the following error.
kubectl logs -n gpu-operator-resources -f nvidia-operator-validator-9xjmv -c toolkit-validation
time="2021-11-18T09:29:24Z" level=info msg="Error: error validating toolkit installation: exec: \"nvidia-smi\": executable file not found in $PATH"
toolkit is not ready
For similar issues, it was suggested to delete the failed Pod and deployment. However, doing so did not fix my problem. Do you have any suggestions?
I have;
Ubuntu 20.04
Kubernetes v1.21.6
Docker 20.10.10
NVIDIA-SMI 470.82.01
CUDA 11.4
CPU: Intel Xeon E5-2683 v4 (32) @ 2.097GHz
GPU: NVIDIA GeForce RTX 2080 Ti
Memory: 13815MiB / 48280MiB
Thanks in advance.
In case you are still having the issue: we just had the same problem on our cluster, and the "dirty" fix is to do this:
rm /run/nvidia/driver
ln -s / /run/nvidia/driver
kubectl delete pod -n gpu-operator nvidia-operator-validator-xxxxx
The reason is that the init container of the nvidia-operator-validator tries to execute nvidia-smi within a chroot of /run/nvidia/driver, which is a tmpfs (so it doesn't persist across reboots) and is not populated when performing a manual install of the drivers.
I do hope for a better fix from Nvidia.
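A quick way to check that the workaround took effect is a sketch like the following (assuming the gpu-operator-resources namespace from the question; the validator effectively runs nvidia-smi inside a chroot of /run/nvidia/driver):
chroot /run/nvidia/driver nvidia-smi             # should now find the host's nvidia-smi, since the link points at /
kubectl get pods -n gpu-operator-resources -w    # watch the validator and device-plugin pods become Ready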

CrashLoopBackOff while deploying pod using image from private registry

I am trying to create a pod using my own docker image on localhost.
This is the Dockerfile used to create the image:
FROM centos:8
RUN yum install -y gdb
RUN yum group install -y "Development Tools"
CMD ["/usr/bin/bash"]
The yaml file used to create the pod is this:
---
apiVersion: v1
kind: Pod
metadata:
  name: server
  labels:
    app: server
spec:
  containers:
  - name: server
    imagePullPolicy: Never
    image: localhost:5000/server
    ports:
    - containerPort: 80
root@node1:~/test/server# docker images | grep server
server latest 82c5228a553d 3 hours ago 948MB
localhost.localdomain:5000/server latest 82c5228a553d 3 hours ago 948MB
localhost:5000/server latest 82c5228a553d 3 hours ago 948MB
The image has been pushed to localhost registry.
Following is the error I receive.
root@node1:~/test/server# kubectl get pods
NAME READY STATUS RESTARTS AGE
server 0/1 CrashLoopBackOff 5 5m18s
The output of describe pod :
root@node1:~/test/server# kubectl describe pod server
Name: server
Namespace: default
Priority: 0
Node: node1/10.0.2.15
Start Time: Mon, 07 Dec 2020 15:35:49 +0530
Labels: app=server
Annotations: cni.projectcalico.org/podIP: 10.233.90.192/32
cni.projectcalico.org/podIPs: 10.233.90.192/32
Status: Running
IP: 10.233.90.192
IPs:
IP: 10.233.90.192
Containers:
server:
Container ID: docker://c2982e677bf37ff11272f9ea3f68565e0120fb8ccfb1595393794746ee29b821
Image: localhost:5000/server
Image ID: docker-pullable://localhost.localdomain:5000/server@sha256:6bc8193296d46e1e6fa4cb849fa83cb49e5accc8b0c89a14d95928982ec9d8e9
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 07 Dec 2020 15:41:33 +0530
Finished: Mon, 07 Dec 2020 15:41:33 +0530
Ready: False
Restart Count: 6
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-tb7wb (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-tb7wb:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-tb7wb
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m default-scheduler Successfully assigned default/server to node1
Normal Pulled 4m34s (x5 over 5m59s) kubelet Container image "localhost:5000/server" already present on machine
Normal Created 4m34s (x5 over 5m59s) kubelet Created container server
Normal Started 4m34s (x5 over 5m59s) kubelet Started container server
Warning BackOff 56s (x25 over 5m58s) kubelet Back-off restarting failed container
I get no logs :
root@node1:~/test/server# kubectl logs -f server
root@node1:~/test/server#
I am unable to figure out whether the issue is with the container or with the yaml file for creating the pod. Any help would be appreciated.
Posting this as Community Wiki.
As pointed out by @David Maze in the comment section:
If docker run exits immediately, a Kubernetes Pod will always go into CrashLoopBackOff state. Your Dockerfile needs to COPY in or otherwise install an application and set its CMD to run it.
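For illustration only, a Dockerfile along the lines of that comment might look like this (app.py and its CMD are hypothetical, not from the question):
FROM centos:8
RUN yum install -y python3
COPY app.py /app/app.py
CMD ["python3", "/app/app.py"]
Because CMD now starts a long-running process, PID 1 keeps running and the pod stays in the Running state.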
The root cause can also be determined from the Exit Code. In the 3) Check the exit code article, you can find a few exit codes like 0, 1, 128, 137 with descriptions.
3.1) Exit Code 0
This exit code implies that the specified container command completed ‘successfully’, but too often for Kubernetes to accept as working.
In short, your container was created, all the actions mentioned were executed, and as there was nothing else to do, it exited with Exit Code 0.
A CrashLoopBackOff error occurs when a pod startup fails repeatedly in Kubernetes.
Your image, based on centos with a few additional installations, did not leave any process running in the background, so it was categorized as Completed. As this happened so fast, Kubernetes restarted it and it fell into a loop.
$ kubectl run centos --image=centos
$ kubectl get po -w
NAME READY STATUS RESTARTS AGE
centos 0/1 CrashLoopBackOff 1 5s
centos 0/1 Completed 2 17s
centos 0/1 CrashLoopBackOff 2 31s
centos 0/1 Completed 3 46s
centos 0/1 CrashLoopBackOff 3 58s
centos 1/1 Running 4 88s
centos 0/1 Completed 4 89s
centos 0/1 CrashLoopBackOff 4 102s
$ kubectl describe po centos | grep 'Exit Code'
Exit Code: 0
But when you use sleep 3600 in your container, the sleep command executes for an hour. After this time it would also exit with Exit Code 0.
Hope this clarifies it.
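As a debugging-only sketch of that sleep idea, the pod manifest from the question could keep the container alive by overriding the command (not a production fix):
apiVersion: v1
kind: Pod
metadata:
  name: server
spec:
  containers:
  - name: server
    image: localhost:5000/server
    imagePullPolicy: Never
    command: ["sleep", "3600"]
With this, the container only exits (with code 0) after an hour instead of immediately, which gives you time to kubectl exec into it.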

Kubernetes cannot pull from insecure registry and cannot run container from local image on offline cluster

I am working on an offline cluster (machines have no internet access), deploying docker images using ansible and docker-compose scripts.
My servers are Centos7.
I have set up an insecure docker registry on the machines. We are going to change environments, and I am installing kubernetes in order to manage pulling and running my containers.
I followed this guide to install kubernetes:
https://severalnines.com/blog/installing-kubernetes-cluster-minions-centos7-manage-pods-services
After the installation, I tried to launch a test pod, launching it with:
kubectl create -f nginx.yml
Here is the yml:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: [my_registry_addr]:[my_registry_port]/nginx:v1
    ports:
    - containerPort: 80
I used kubectl describe to get more information on what was wrong:
Name: nginx
Namespace: default
Node: [my node]
Start Time: Fri, 15 Sep 2017 11:29:05 +0200
Labels: <none>
Status: Pending
IP:
Controllers: <none>
Containers:
nginx:
Container ID:
Image: [my_registry_addr]:[my_registry_port]/nginx:v1
Image ID:
Port: 80/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Volume Mounts: <none>
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
No volumes.
QoS Class: BestEffort
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2m 2m 1 {default-scheduler } Normal Scheduled Successfully assigned nginx to [my kubernet node]
1m 1m 2 {kubelet [my kubernet node]} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "Error while pulling image: Get https://index.docker.io/v1/repositories/library/[my_registry_addr]/images: dial tcp: lookup index.docker.io on [kubernet_master_ip]:53: server misbehaving"
54s 54s 1 {kubelet [my kubernet node]} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ImagePullBackOff: "Back-off pulling image \"[my_registry_addr]:[my_registry_port]\""
8s 8s 1 {kubelet [my kubernet node]} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "Network timed out while trying to connect to https://index.docker.io/v1/repositories/library/[my_registry_addr]/images. You may want to check your internet connection or if you are behind a proxy."
Then, I go to my node and use journalctl -xe:
sept. 15 11:22:02 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:02.350930396+02:00" level=info msg="{Action=create, LoginUID=4294967295, PID=11555}"
sept. 15 11:22:17 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:17.351536727+02:00" level=warning msg="Error getting v2 registry: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
sept. 15 11:22:17 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:17.351606330+02:00" level=error msg="Attempting next endpoint for pull after error: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
sept. 15 11:22:32 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:32.353946452+02:00" level=error msg="Not continuing with pull after error: Error while pulling image: Get https://index.docker.io/v1/repositories/library/[my_registry_ip]/images: dial tcp: lookup index.docker.io on [kubernet_master_ip]:53: server misbehaving"
sept. 15 11:22:32 [my_node_ip] kubelet[11555]: E0915 11:22:32.354309 11555 docker_manager.go:2161] Failed to create pod infra container: ErrImagePull; Skipping pod "nginx_default(8b5c40e5-99f4-11e7-98db-f8bc12456ee4)": Error while pulling image: Get https://index.docker.io/v1/repositories/library/[my_registry_ip]/images: dial tcp: lookup index.docker.io on [kubernet_master_ip]:53: server misbehaving
sept. 15 11:22:32 [my_node_ip] kubelet[11555]: E0915 11:22:32.354390 11555 pod_workers.go:184] Error syncing pod 8b5c40e5-99f4-11e7-98db-f8bc12456ee4, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "Error while pulling image: Get https://index.docker.io/v1/repositories/library/[my_registry_ip]/images: dial tcp: lookup index.docker.io on [kubernet_master_ip]:53: server misbehaving"
sept. 15 11:22:44 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:44.350708175+02:00" level=error msg="Handler for GET /v1.24/images/[my_registry_ip]:[my_registry_port]/json returned error: No such image: [my_registry_ip]:[my_registry_port]"
I am sure that my docker configuration is good, because I am using it every day with ansible or mesos.
The docker version is 1.12.6, the kubernetes version is 1.5.2.
What can I do now? I didn't find any configuration key for this usage.
When I saw that pulling was failing, I manually pulled the image on all the nodes. I put a tag to ensure that kubernetes will not try to pull by default, and set "imagePullPolicy: IfNotPresent".
The syntax for specifying the docker image is:
[docker_registry]/[image_name]:[image_tag]
In your manifest file, you have used ":" to separate the docker repository host and the port the repository is listening on. The default port for a docker private registry is, I guess, 5000.
So change your image declaration from
Image: [my_registry_addr]:[my_registry_port]/nginx:v1
to
Image: [my_registry_addr]/nginx:v1
Also, check the network connectivity from the worker node to your docker registry by doing a ping.
ping [my_registry_addr]
If you still want to check whether port 443 is open on the registry, you can do a TCP check on that port on the host running the docker registry:
curl telnet://[my_registry_addr]:443
Hope that helps.
I finally found what the problem was.
To work, Kubernetes needs a pause container. Kubernetes was trying to find the pause container on the internet.
I deployed a custom pause container on my registry and pointed the kubernetes pause-container setting to this image.
After that, kubernetes is working like a charm.
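For reference, with the RPM-packaged kubelet of that era the pause image is usually set in the kubelet sysconfig file; a sketch of the change (the exact variable name may differ between package versions) is:
# /etc/kubernetes/kubelet on each node
KUBELET_POD_INFRA_CONTAINER="--pod-infra-container-image=[my_registry_addr]:[my_registry_port]/pause:latest"
systemctl restart kubelet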

Kubernetes does not pull docker image from private repository without https

I configured docker on the same host as my kubernetes-master for the private docker registry. Pushing to the private docker registry without https was successful. I can also pull the image just using docker.
When I run kubernetes for this image, I get the following log with 'kubectl describe pods':
kubectl describe pods
Name: fgpra-250514157-yh6vb
Namespace: default
Node: 5.179.232.64/5.179.232.64
Start Time: Tue, 11 Oct 2016 18:06:59 +0200
Labels: pod-template-hash=250514157,run=fgpra
Status: Pending
IP: <removed myself>
Controllers: ReplicaSet/fgpra-250514157
Containers:
fgpra:
Container ID:
Image: 5.179.232.65:5000/some_api_image
Image ID:
Port: 3000/TCP
QoS Tier:
cpu: BestEffort
memory: BestEffort
State: Waiting
Reason: ErrImagePull
Ready: False
Restart Count: 0
Environment Variables:
Conditions:
Type Status
Ready False
Volumes:
default-token-q7u3x:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-q7u3x
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
4s 4s 1 {default-scheduler } Normal Scheduled Successfully assigned fgpra-250514157-yh6vb to 5.179.232.64
4s 4s 1 {kubelet 5.179.232.64} Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to DNSDefault policy.
4s 4s 1 {kubelet 5.179.232.64} spec.containers{fgpra} Normal Pulling pulling image "5.179.232.65:5000/some_api_image"
4s 4s 1 {kubelet 5.179.232.64} spec.containers{fgpra} Warning Failed Failed to pull image "5.179.232.65:5000/some_api_image": unable to ping registry endpoint https://5.179.232.65:5000/v0/
v2 ping attempt failed with error: Get https://5.179.232.65:5000/v2/: http: server gave HTTP response to HTTPS client
v1 ping attempt failed with error: Get https://5.179.232.65:5000/v1/_ping: http: server gave HTTP response to HTTPS client
4s 4s 1 {kubelet 5.179.232.64} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "fgpra" with ErrImagePull: "unable to ping registry endpoint https://5.179.232.65:5000/v0/\nv2 ping attempt failed with error: Get https://5.179.232.65:5000/v2/: http: server gave HTTP response to HTTPS client\n v1 ping attempt failed with error: Get https://5.179.232.65:5000/v1/_ping: http: server gave HTTP response to HTTPS client"
3s 3s 1 {kubelet 5.179.232.64} spec.containers{fgpra} Normal BackOff Back-off pulling image "5.179.232.65:5000/some_api_image"
3s 3s 1 {kubelet 5.179.232.64} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "fgpra" with ImagePullBackOff: "Back-off pulling image \"5.179.232.65:5000/some_api_image\""
I already configured my /etc/init.d/sysconfig/docker to use my insecure private registry.
This is the command to start the kubernetes deployment:
kubectl run fgpra --image=5.179.232.65:5000/some_api_image --port=3000
How can I set kubernetes to pull from my private docker registry without using ssl?
This is rather a docker issue than a kubernetes one. You need to add your http registry as an insecure registry to the docker daemon on each kubernetes node.
docker daemon --insecure-registry=5.179.232.65:5000
In most environments there is a file like /etc/default/docker where you can add this parameter.
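On newer docker versions the persistent equivalent is the daemon configuration file; a sketch (apply on each kubernetes node and restart docker):
# /etc/docker/daemon.json
{
  "insecure-registries": ["5.179.232.65:5000"]
}
systemctl restart docker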
