I have installed Jenkins on a GKE cluster using the stable Helm chart.
I am able to access and log in to the UI.
However, when trying to run a simple job, the agent pod fails to be created.
The logs are not very informative on this:
jenkins-kos-58586644f9-vh278 jenkins 2020-01-28 18:30:46.523+0000 [id=184] WARNING o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: default-ld008, template=PodTemplate{inheritFrom='', name='default', slaveConnectTimeout=0, label='jenkins-kos-jenkins-slave ', serviceAccount='default', nodeSelector='', nodeUsageMode=NORMAL, workspaceVolume=EmptyDirWorkspaceVolume [memory=false], containers=[ContainerTemplate{name='jnlp', image='jenkins/jnlp-slave:3.27-1', workingDir='/home/jenkins/agent', command='', args='${computer.jnlpmac} ${computer.name}', resourceRequestCpu='500m', resourceRequestMemory='1Gi', resourceLimitCpu='4000m', resourceLimitMemory='8Gi', envVars=[ContainerEnvVar [getValue()=http://jenkins-kos.jenkins.svc.cluster.local:8080/jenkins, getKey()=JENKINS_URL]]}]}
jenkins-kos-58586644f9-vh278 jenkins java.lang.IllegalStateException: Pod has terminated containers: jenkins/default-ld008 (jnlp)
jenkins-kos-58586644f9-vh278 jenkins at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.periodicAwait(AllContainersRunningPodWatcher.java:166)
jenkins-kos-58586644f9-vh278 jenkins at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.periodicAwait(AllContainersRunningPodWatcher.java:187)
jenkins-kos-58586644f9-vh278 jenkins at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.await(AllContainersRunningPodWatcher.java:127)
jenkins-kos-58586644f9-vh278 jenkins at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:132)
jenkins-kos-58586644f9-vh278 jenkins at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:290)
jenkins-kos-58586644f9-vh278 jenkins at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
jenkins-kos-58586644f9-vh278 jenkins at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
jenkins-kos-58586644f9-vh278 jenkins at java.util.concurrent.FutureTask.run(FutureTask.java:266)
jenkins-kos-58586644f9-vh278 jenkins at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
jenkins-kos-58586644f9-vh278 jenkins at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
jenkins-kos-58586644f9-vh278 jenkins at java.lang.Thread.run(Thread.java:748)
jenkins-kos-58586644f9-vh278 jenkins 2020-01-28 18:30:46.524+0000 [id=184] INFO o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent default-ld008
jenkins-kos-58586644f9-vh278 jenkins Terminated Kubernetes instance for agent jenkins/default-ld008
jenkins-kos-58586644f9-vh278 jenkins Disconnected computer default-ld008
jenkins-kos-58586644f9-vh278 jenkins 2020-01-28 18:30:46.559+0000 [id=184] INFO o.c.j.p.k.KubernetesSlave#deleteSlavePod: Terminated Kubernetes instance for agent jenkins/default-ld008
jenkins-kos-58586644f9-vh278 jenkins 2020-01-28 18:30:46.560+0000 [id=184] INFO o.c.j.p.k.KubernetesSlave#_terminate: Disconnected computer default-ld008
jenkins-kos-58586644f9-vh278 jenkins 2020-01-28 18:30:56.009+0000 [id=53
Here are the Kubernetes events:
0s Normal Scheduled pod/default-zkwp4 Successfully assigned jenkins/default-zkwp4 to gke-kos-nodepool1-kq69
0s Normal Pulled pod/default-zkwp4 Container image "docker.io/istio/proxyv2:1.4.0" already present on machine
0s Normal Created pod/default-zkwp4 Created container
0s Normal Started pod/default-zkwp4 Started container
0s Normal Pulled pod/default-zkwp4 Container image "jenkins/jnlp-slave:3.27-1" already present on machine
0s Normal Created pod/default-zkwp4 Created container
0s Normal Started pod/default-zkwp4 Started container
0s Normal Pulled pod/default-zkwp4 Container image "docker.io/istio/proxyv2:1.4.0" already present on machine
1s Normal Created pod/default-zkwp4 Created container
0s Normal Started pod/default-zkwp4 Started container
0s Warning Unhealthy pod/default-zkwp4 Readiness probe failed: Get http://10.15.2.113:15020/healthz/ready: dial tcp 10.15.2.113:15020: connect: connection refused
0s Warning Unhealthy pod/default-zkwp4 Readiness probe failed: Get http://10.15.2.113:15020/healthz/ready: dial tcp 10.15.2.113:15020: connect: connection refused
0s Normal Killing pod/default-zkwp4 Killing container with id docker://istio-proxy:Need to kill Pod
The TCP port for agent communication is fixed to 50000, and the agent image is jenkins/jnlp-slave:3.27-1.
Any ideas what might be causing this?
UPDATE 1: Here is a gist with the description of the failed agent.
UPDATE 2: I managed to pinpoint the actual error in the jnlp logs using Stackdriver (although I am not aware of the root cause yet):
SEVERE: Failed to connect to http://jenkins-kos.jenkins.svc.cluster.local:8080/jenkins/tcpSlaveAgentListener/: Connection refused (Connection refused)
UPDATE 3: Here comes the weird(est) part: from a pod I spun up within the jenkins namespace:
/ # dig +short jenkins-kos.jenkins.svc.cluster.local
10.14.203.189
/ # nc -zv -w 3 jenkins-kos.jenkins.svc.cluster.local 8080
jenkins-kos.jenkins.svc.cluster.local (10.14.203.189:8080) open
/ # curl http://jenkins-kos.jenkins.svc.cluster.local:8080/jenkins/tcpSlaveAgentListener/
Jenkins
UPDATE 4: I can confirm that this occurs on a GKE cluster using Istio 1.4.0 but NOT on another one using 1.1.15.
You can disable the Istio sidecar proxy for the agents.
Go to Manage Jenkins -> Configuration -> Kubernetes Cloud.
Select the Annotations option and enter the annotation below:
sidecar.istio.io/inject: "false"
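If you manage the Jenkins controller with the Helm chart, the same annotation can usually be applied to agent pods through the chart values instead of the UI. A sketch, assuming the chart exposes an agent.annotations key (verify the exact key path against your chart version's values.yaml):

# values.yaml fragment (key names are an assumption; check your chart version)
agent:
  annotations:
    sidecar.istio.io/inject: "false"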
Related
I'm trying to customize the jenkins/jenkins:latest image in order to install Docker, so that I'm able to run docker within a Jenkins pipeline. But when I run the following code using the following files, the jenkins-jenkins pod terminates with "Error" without outputting any meaningful logs.
Dockerfile (custom_image:latest)
FROM jenkins/jenkins:latest
USER jenkins
(the same error occurs even though this Dockerfile does not install Docker)
values.yaml
jenkins:
  name:
  image: custom_image:latest
helm repo add jenkins https://raw.githubusercontent.com/jenkinsci/kubernetes-operator/master/chart
helm install jenkins jenkins/jenkins-operator -n jenkins -f values.yaml
Outputs...
kubectl describe pod/jenkins-jenkins
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12s default-scheduler Successfully assigned jenkins/jenkins-jenkins to minikube
Normal Pulled 11s kubelet Container image "docker-jenkins:latest" already present on machine
Normal Created 11s kubelet Created container jenkins-master
Normal Started 11s kubelet Started container jenkins-master
Normal Pulled 11s kubelet Container image "virtuslab/jenkins-operator-backup-pvc:v0.1.0" already present on machine
Normal Created 11s kubelet Created container backup
Normal Started 11s kubelet Started container backup
Normal Killing 8s kubelet Stopping container backup
kubectl logs pod/jenkins-jenkins
...
Defaulted container "jenkins-master" out of: jenkins-master, backup
+ '[' '' == true ']'
+ echo 'To print debug messages set environment variable '\''DEBUG_JENKINS_OPERATOR'\'' to '\''true'\'''
+ mkdir -p /var/lib/jenkins/init.groovy.d
To print debug messages set environment variable 'DEBUG_JENKINS_OPERATOR' to 'true'
+ cp -n /var/jenkins/init-configuration/createOperatorUser.groovy /var/lib/jenkins/init.groovy.d
+ mkdir -p /var/lib/jenkins/scripts
+ cp /var/jenkins/scripts/init.sh /var/jenkins/scripts/install-plugins.sh /var/lib/jenkins/scripts
+ chmod +x /var/lib/jenkins/scripts/init.sh /var/lib/jenkins/scripts/install-plugins.sh
Installing plugins required by Operator - begin
+ echo 'Installing plugins required by Operator - begin'
+ cat
+ [[ -z '' ]]
+ install-plugins.sh
WARN: install-plugins.sh has been removed, please switch to jenkins-plugin-cli
kubectl describe pod/jenkins-jenkins-operator-7c4cd6dc7b-g6m7z
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18h default-scheduler Successfully assigned jenkins/jenkins-jenkins-operator-7c4cd6dc7b-g6m7z to minikube
Normal Pulled 18h kubelet Container image "virtuslab/jenkins-operator:v0.7.1" already present on machine
Normal Created 18h kubelet Created container jenkins-operator
Normal Started 18h kubelet Started container jenkins-operator
Normal SandboxChanged 3m56s kubelet Pod sandbox changed, it will be killed and re-created.
Warning BackOff 3m23s kubelet Back-off restarting failed container
Normal Pulled 3m11s (x2 over 3m55s) kubelet Container image "virtuslab/jenkins-operator:v0.7.1" already present on machine
Normal Created 3m11s (x2 over 3m55s) kubelet Created container jenkins-operator
Normal Started 3m10s (x2 over 3m55s) kubelet Started container jenkins-operator
kubectl logs jenkins-jenkins-operator-7c4cd6dc7b-g6m7z
2022-11-22T20:00:50.544Z DEBUG controller-jenkins Jenkins HTTP Service is present {"cr": "jenkins"}
2022-11-22T20:00:50.545Z DEBUG controller-jenkins Jenkins slave Service is present {"cr": "jenkins"}
2022-11-22T20:00:50.545Z DEBUG controller-jenkins Kubernetes resources are present {"cr": "jenkins"}
2022-11-22T20:00:50.545Z DEBUG controller-jenkins Jenkins master pod is present {"cr": "jenkins"}
2022-11-22T20:00:50.545Z DEBUG controller-jenkins Jenkins master pod is terminating {"cr": "jenkins"}
2022-11-22T20:00:55.546Z DEBUG controller-jenkins Reconciling Jenkins {"cr": "jenkins"}
2022-11-22T20:00:55.546Z DEBUG controller-jenkins Operator credentials secret is present {"cr": "jenkins"}
2022-11-22T20:00:55.552Z DEBUG controller-jenkins Scripts config map is present {"cr": "jenkins"}
2022-11-22T20:00:55.555Z DEBUG controller-jenkins Init configuration config map is present {"cr": "jenkins"}
2022-11-22T20:00:55.562Z DEBUG controller-jenkins Base configuration config map is present {"cr": "jenkins"}
2022-11-22T20:00:55.562Z DEBUG controller-jenkins GroovyScripts Secret and ConfigMap added watched labels {"cr": "jenkins"}
2022-11-22T20:00:55.562Z DEBUG controller-jenkins ConfigurationAsCode Secret and ConfigMap added watched labels {"cr": "jenkins"}
2022-11-22T20:00:55.562Z DEBUG controller-jenkins createServiceAccount with annotations map[] {"cr": "jenkins"}
2022-11-22T20:00:55.582Z DEBUG controller-jenkins Service account, role and role binding are present {"cr": "jenkins"}
2022-11-22T20:00:55.582Z DEBUG controller-jenkins Extra role bindings are present {"cr": "jenkins"}
2022-11-22T20:00:55.583Z DEBUG controller-jenkins Jenkins HTTP Service is present {"cr": "jenkins"}
2022-11-22T20:00:55.584Z DEBUG controller-jenkins Jenkins slave Service is present {"cr": "jenkins"}
2022-11-22T20:00:55.585Z DEBUG controller-jenkins Kubernetes resources are present {"cr": "jenkins"}
2022-11-22T20:00:55.585Z DEBUG controller-jenkins Jenkins master pod is present {"cr": "jenkins"}
2022-11-22T20:00:55.585Z DEBUG controller-jenkins Jenkins master pod is terminating {"cr": "jenkins"}
I don't see any issue in the logs you shared. You may try installing Jenkins using the Helm chart instead of the operator.
I summarized how to do that in my Jenkins Docker in Docker Agent post; you can read about using Docker in Jenkins pipelines there as well.
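For reference, a minimal install with the community Helm chart might look like this (the release name, namespace, and values file are placeholders to adapt to your setup):

helm repo add jenkins https://charts.jenkins.io
helm repo update
helm install jenkins jenkins/jenkins -n jenkins --create-namespace -f values.yaml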
I have a node.js service that I want to build with Jenkins in Kubernetes, using a Jenkins agent pod specified by the node.js project. I am trying to eliminate manual interaction with the Jenkins UI. Everything is running in one Kubernetes cluster.
I am following this blog and adapting it slightly, but running into problems:
I get an error ‘Jenkins’ doesn’t have label ‘test-pod’
The job loops infinitely.
The build agent is successfully created in Kubernetes. The test-pod label is specified by the Jenkinsfile, so I don't know why I get this error. And why is it looping infinitely?
podTemplate(
    name: 'test-pod',
    label: 'test-pod',
    containers: [
        containerTemplate(name: 'node14', image: 'node:14-alpine'),
        containerTemplate(name: 'docker', image: 'trion/jenkins-docker-client'),
    ],
    {
        node('test-pod') {
            stage('Build') {
                container('node14') {
                    // do nothing just yet
                }
            }
        }
    }
)
Here is part of the Jenkins console output:
Started by user admin
Obtained Jenkinsfile from git ssh://git#kube-master.cluster.dev/git/hello.git
Running in Durability level: MAX_SURVIVABILITY
[Pipeline] Start of Pipeline
[Pipeline] podTemplate
[Pipeline] {
[Pipeline] node
Created Pod: kubernetes jenkins/test-pod-2hdfp-9kcjj
[Normal][jenkins/test-pod-2hdfp-9kcjj][Scheduled] Successfully assigned jenkins/test-pod-2hdfp-9kcjj to kube-worker2.cluster.dev
[Normal][jenkins/test-pod-2hdfp-9kcjj][Pulled] Container image "node:14-alpine" already present on machine
[Normal][jenkins/test-pod-2hdfp-9kcjj][Created] Created container node14
[Normal][jenkins/test-pod-2hdfp-9kcjj][Started] Started container node14
[Normal][jenkins/test-pod-2hdfp-9kcjj][Pulled] Container image "trion/jenkins-docker-client" already present on machine
[Normal][jenkins/test-pod-2hdfp-9kcjj][Created] Created container docker
[Normal][jenkins/test-pod-2hdfp-9kcjj][Started] Started container docker
[Normal][jenkins/test-pod-2hdfp-9kcjj][Pulled] Container image "jenkins/inbound-agent:4.3-4" already present on machine
[Normal][jenkins/test-pod-2hdfp-9kcjj][Created] Created container jnlp
[Normal][jenkins/test-pod-2hdfp-9kcjj][Started] Started container jnlp
jenkins/test-pod-2hdfp-9kcjj Container node14 was terminated (Exit Code: 0, Reason: Completed)
[Normal][jenkins/test-pod-2hdfp-9kcjj][Killing] Stopping container docker
Created Pod: kubernetes jenkins/test-pod-2hdfp-gc2qb
[Normal][jenkins/test-pod-2hdfp-gc2qb][Scheduled] Successfully assigned jenkins/test-pod-2hdfp-gc2qb to kube-worker2.cluster.dev
[Normal][jenkins/test-pod-2hdfp-gc2qb][Pulled] Container image "node:14-alpine" already present on machine
[Normal][jenkins/test-pod-2hdfp-gc2qb][Created] Created container node14
[Normal][jenkins/test-pod-2hdfp-gc2qb][Started] Started container node14
[Normal][jenkins/test-pod-2hdfp-gc2qb][Pulled] Container image "trion/jenkins-docker-client" already present on machine
[Normal][jenkins/test-pod-2hdfp-gc2qb][Created] Created container docker
[Normal][jenkins/test-pod-2hdfp-gc2qb][Started] Started container docker
[Normal][jenkins/test-pod-2hdfp-gc2qb][Pulled] Container image "jenkins/inbound-agent:4.3-4" already present on machine
[Normal][jenkins/test-pod-2hdfp-gc2qb][Created] Created container jnlp
[Normal][jenkins/test-pod-2hdfp-gc2qb][Started] Started container jnlp
jenkins/test-pod-2hdfp-gc2qb Container node14 was terminated (Exit Code: 0, Reason: Completed)
Still waiting to schedule task
‘Jenkins’ doesn’t have label test-pod’
[Normal][jenkins/test-pod-2hdfp-gc2qb][Killing] Stopping container docker
Created Pod: kubernetes jenkins/test-pod-2hdfp-xwkm2
[Normal][jenkins/test-pod-2hdfp-xwkm2][Scheduled] Successfully assigned jenkins/test-pod-2hdfp-xwkm2 to kube-worker2.cluster.dev
[Normal][jenkins/test-pod-2hdfp-xwkm2][Pulled] Container image "node:14-alpine" already present on machine
[Normal][jenkins/test-pod-2hdfp-xwkm2][Created] Created container node14
[Normal][jenkins/test-pod-2hdfp-xwkm2][Started] Started container node14
[Normal][jenkins/test-pod-2hdfp-xwkm2][Pulled] Container image "trion/jenkins-docker-client" already present on machine
[Normal][jenkins/test-pod-2hdfp-xwkm2][Created] Created container docker
[Normal][jenkins/test-pod-2hdfp-xwkm2][Started] Started container docker
[Normal][jenkins/test-pod-2hdfp-xwkm2][Pulled] Container image "jenkins/inbound-agent:4.3-4" already present on machine
[Normal][jenkins/test-pod-2hdfp-xwkm2][Created] Created container jnlp
[Normal][jenkins/test-pod-2hdfp-xwkm2][Started] Started container jnlp
jenkins/test-pod-2hdfp-xwkm2 Container node14 was terminated (Exit Code: 0, Reason: Completed)
[Normal][jenkins/test-pod-2hdfp-xwkm2][Killing] Stopping container docker
Created Pod: kubernetes jenkins/test-pod-2hdfp-4ltq3
[Normal][jenkins/test-pod-2hdfp-4ltq3][Scheduled] Successfully assigned jenkins/test-pod-2hdfp-4ltq3 to kube-worker2.cluster.dev
[Normal][jenkins/test-pod-2hdfp-4ltq3][Pulled] Container image "node:14-alpine" already present on machine
[Normal][jenkins/test-pod-2hdfp-4ltq3][Created] Created container node14
[Normal][jenkins/test-pod-2hdfp-4ltq3][Started] Started container node14
[Normal][jenkins/test-pod-2hdfp-4ltq3][Pulled] Container image "trion/jenkins-docker-client" already present on machine
[Normal][jenkins/test-pod-2hdfp-4ltq3][Created] Created container docker
[Normal][jenkins/test-pod-2hdfp-4ltq3][Started] Started container docker
[Normal][jenkins/test-pod-2hdfp-4ltq3][Pulled] Container image "jenkins/inbound-agent:4.3-4" already present on machine
[Normal][jenkins/test-pod-2hdfp-4ltq3][Created] Created container jnlp
[Normal][jenkins/test-pod-2hdfp-4ltq3][Started] Started container jnlp
jenkins/test-pod-2hdfp-4ltq3 Container node14 was terminated (Exit Code: 0, Reason: Completed)
[Normal][jenkins/test-pod-2hdfp-4ltq3][Killing] Stopping container docker
Created Pod: kubernetes jenkins/test-pod-2hdfp-0216w
...
Update with latest findings
Master log (see debugging) doesn't provide much else:
...
2021-04-30 11:52:42.715+0000 [id=4660] INFO hudson.slaves.NodeProvisioner#lambda$update$6: test-pod-gb4vq-hf3d4 provisioning successfully completed. We have now 2 computer(s)
2021-04-30 11:52:42.741+0000 [id=4659] INFO o.c.j.p.k.KubernetesLauncher#launch: Created Pod: kubernetes jenkins/test-pod-gb4vq-hf3d4
2021-04-30 11:52:42.847+0000 [id=4680] WARNING o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: test-pod-gb4vq-pdd69, template=PodTemplate{id='f29ecbdd-9c1d-468f-86ff-dd46ff40f306', name='test-pod-gb4vq', namespace='jenkins', label='test-pod', containers=[ContainerTemplate{name='node14', image='node:14-alpine'}, ContainerTemplate{name='docker', image='trion/jenkins-docker-client'}], annotations=[PodAnnotation{key='buildUrl', value='http://172.16.1.12/job/hello/14/'}, PodAnnotation{key='runUrl', value='job/hello/14/'}]}
java.lang.IllegalStateException: Pod is no longer available: jenkins/test-pod-gb4vq-pdd69
...
except that it suggests the container is starting up, then failing. It appears the loop is because the error handling in the Kubernetes plug-in isn't properly catching it and failing the job.
By watching for the build pod (using k9s), I am able to capture the pod's log, and the Unknown client name error also sounds like it is caused by fast container termination:
jnlp INFO: [JNLP4-connect connection to 172.16.1.12/172.16.1.12:50000] Local headers refused by remote: Unknown client name: test-pod-34sd7-5xhs2
jnlp Apr 29, 2021 10:42:15 PM hudson.remoting.jnlp.Main$CuiListener status
jnlp INFO: Protocol JNLP4-connect encountered an unexpected exception
jnlp java.util.concurrent.ExecutionException: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name: test-pod-34sd7-5xhs2
jnlp at org.jenkinsci.remoting.util.SettableFuture.get(SettableFuture.java:223)
jnlp at hudson.remoting.Engine.innerRun(Engine.java:743)
jnlp at hudson.remoting.Engine.run(Engine.java:518)
jnlp Caused by: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name: test-pod-34sd7-5xhs2
jnlp at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.newAbortCause(ConnectionHeadersFilterLayer.java:378)
Just found a similar issue
This is useful: I added podRetention: always() to podTemplate() after label so the agent pods are not deleted, and they now show Error.
Good finding
With the above retaining the pod on error, I can now find /var/log/containers/<failed pod>.log and it has led me to a root cause.
2021-04-30T08:59:36.047989534-04:00 stderr F java.net.UnknownHostException: updates.jenkins.io
This is because of a dnsPolicy that limits DNS to cluster-only lookups. The fix for this is to add hostNetwork: true to podTemplate(), next to label.
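A sketch of what that looks like in a minimal pipeline (hostNetwork is a podTemplate parameter of the Kubernetes plugin; verify your plugin version supports it, and the DNS check stage is only an illustration):

// Minimal sketch: a bare template with host networking enabled so the agent
// uses the node's DNS and can resolve names such as updates.jenkins.io
podTemplate(
    name: 'build-pod',
    namespace: 'jenkins',
    hostNetwork: true
) {
    node(POD_LABEL) {
        stage('Check DNS') {
            sh 'nslookup updates.jenkins.io'
        }
    }
}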
Next, the image trion/jenkins-docker-client as recommended by the blog is a client AND a server, so it is the wrong image.
Switching to jenkins/agent creates a new problem. The pod now goes up and down doing nothing, not even logging. I suspect this is a launch parameter issue.
Now it is clear I shouldn't even have a Jenkins container in the Jenkinsfile, because the Kubernetes plug-in will automatically start a JNLP container.
And that means the problem is, at last, the node14 container - which either is immediately erroring, or immediately finding nothing to do and terminating.
The error handling is difficult to understand and troubleshoot, and the blog is wrong.
Start with a bare minimum working agent Jenkinsfile:
podTemplate(
    name: 'build-pod',
    namespace: 'jenkins',
    podRetention: always(), // for debugging
    {
        node(POD_LABEL) {
            stage('Build') {
                sh "echo hello"
            }
        }
    }
)
From there, extend it with containers, volumes, container build sections, etc. one step at a time.
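One such step might look like this; a sketch only, where the ttyEnabled and command settings are my assumptions to keep the build container alive instead of letting it exit immediately:

// Sketch: add a single container and run the build stage inside it
podTemplate(
    name: 'build-pod',
    namespace: 'jenkins',
    containers: [
        containerTemplate(name: 'node14', image: 'node:14-alpine', ttyEnabled: true, command: 'cat')
    ]
) {
    node(POD_LABEL) {
        stage('Build') {
            container('node14') {
                sh 'node --version'
            }
        }
    }
}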
Troubleshoot using the logs:
kubectl get pods -n jenkins to list the pod name, and then
kubectl logs -f <jenkins-pod> -n jenkins
(assuming jenkins is your Kubernetes namespace)
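Since the agent pod runs several containers (node14, docker, jnlp), it can also help to target a specific container when reading logs. A sketch, using the namespace above and a placeholder pod name:

kubectl describe pod <agent-pod> -n jenkins    # per-container state and exit codes
kubectl logs <agent-pod> -c jnlp -n jenkins    # logs of the jnlp container only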
I have an on-premise Kubernetes (v1.15.3) cluster (1 master and 2 worker nodes). I wanted to run Jenkins agents in the cluster using the Kubernetes plugin, while keeping the Jenkins (version 2.176.2) master outside the cluster. Therefore, I created a new namespace (jenkins) and followed the configuration mentioned here. Then I filled in my Kubernetes credentials in the cloud field of the Jenkins master configuration. The connection was established successfully. Now, when I try to run a Jenkins job as a pod in Kubernetes, the pod does not come online. The logs from Kubernetes show:
Kubernetes-log
kubectl get sa
NAME SECRETS AGE
default 1 23h
jenkins 1 23h
kubectl get secrets
NAME TYPE DATA AGE
default-token-4gcr4 kubernetes.io/service-account-token 3 23h
jenkins-token-7nwbd kubernetes.io/service-account-token 3 23h
The console output from the Jenkins job shows:
Jenkins-log
Has anyone encountered a similar error before?
I am working on an offline cluster (the machines have no internet access), deploying Docker images using Ansible and docker-compose scripts.
My servers run CentOS 7.
I have set up an insecure Docker registry on the machines. We are going to change environments, and I am installing Kubernetes in order to manage my pool of containers.
I followed this guide to install Kubernetes:
https://severalnines.com/blog/installing-kubernetes-cluster-minions-centos7-manage-pods-services
After the installation, I tried to launch a test pod, using:
kubectl create -f nginx.yml
Here is the YAML for the pod:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: [my_registry_addr]:[my_registry_port]/nginx:v1
    ports:
    - containerPort: 80
I used kubectl describe to get more information on what was wrong:
Name: nginx
Namespace: default
Node: [my node]
Start Time: Fri, 15 Sep 2017 11:29:05 +0200
Labels: <none>
Status: Pending
IP:
Controllers: <none>
Containers:
nginx:
Container ID:
Image: [my_registry_addr]:[my_registry_port]/nginx:v1
Image ID:
Port: 80/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Volume Mounts: <none>
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
No volumes.
QoS Class: BestEffort
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2m 2m 1 {default-scheduler } Normal Scheduled Successfully assigned nginx to [my kubernet node]
1m 1m 2 {kubelet [my kubernet node]} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "Error while pulling image: Get https://index.docker.io/v1/repositories/library/[my_registry_addr]/images: dial tcp: lookup index.docker.io on [kubernet_master_ip]:53: server misbehaving"
54s 54s 1 {kubelet [my kubernet node]} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ImagePullBackOff: "Back-off pulling image \"[my_registry_addr]:[my_registry_port]\""
8s 8s 1 {kubelet [my kubernet node]} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "Network timed out while trying to connect to https://index.docker.io/v1/repositories/library/[my_registry_addr]/images. You may want to check your internet connection or if you are behind a proxy."
Then, I went to my node and used journalctl -xe:
sept. 15 11:22:02 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:02.350930396+02:00" level=info msg="{Action=create, LoginUID=4294967295, PID=11555}"
sept. 15 11:22:17 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:17.351536727+02:00" level=warning msg="Error getting v2 registry: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
sept. 15 11:22:17 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:17.351606330+02:00" level=error msg="Attempting next endpoint for pull after error: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
sept. 15 11:22:32 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:32.353946452+02:00" level=error msg="Not continuing with pull after error: Error while pulling image: Get https://index.docker.io/v1/repositories/library/[my_registry_ip]/images: dial tcp: lookup index.docker.io on [kubernet_master_ip]:53: server misbehaving"
sept. 15 11:22:32 [my_node_ip] kubelet[11555]: E0915 11:22:32.354309 11555 docker_manager.go:2161] Failed to create pod infra container: ErrImagePull; Skipping pod "nginx_default(8b5c40e5-99f4-11e7-98db-f8bc12456ee4)": Error while pulling image: Get https://index.docker.io/v1/repositories/library/[my_registry_ip]/images: dial tcp: lookup index.docker.io on [kubernet_master_ip]:53: server misbehaving
sept. 15 11:22:32 [my_node_ip] kubelet[11555]: E0915 11:22:32.354390 11555 pod_workers.go:184] Error syncing pod 8b5c40e5-99f4-11e7-98db-f8bc12456ee4, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "Error while pulling image: Get https://index.docker.io/v1/repositories/library/[my_registry_ip]/images: dial tcp: lookup index.docker.io on [kubernet_master_ip]:53: server misbehaving"
sept. 15 11:22:44 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:44.350708175+02:00" level=error msg="Handler for GET /v1.24/images/[my_registry_ip]:[my_registry_port]/json returned error: No such image: [my_registry_ip]:[my_registry_port]"
I am sure that my Docker configuration is good, because I use it every day with Ansible and Mesos.
The Docker version is 1.12.6 and the Kubernetes version is 1.5.2.
What can I do now? I didn't find any configuration key for this usage.
When I saw that pulling was failing, I manually pulled the image on all the nodes. I tagged the image so that Kubernetes would not try to pull it by default, and set imagePullPolicy: IfNotPresent.
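For reference, that field sits next to image in the container spec; a fragment of the manifest above with it added:

spec:
  containers:
  - name: nginx
    image: [my_registry_addr]:[my_registry_port]/nginx:v1
    imagePullPolicy: IfNotPresent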
The syntax for specifying the Docker image is:
[docker_registry]/[image_name]:[image_tag]
In your manifest file, you have used ":" to separate the Docker registry host and the port the registry is listening on. The default port for a private Docker registry is 5000, I believe.
So change your image declaration from
Image: [my_registry_addr]:[my_registry_port]/nginx:v1
to
Image: [my_registry_addr]/nginx:v1
Also, check the network connectivity from the worker node to your Docker registry with a ping:
ping [my_registry_addr]
If you still want to check whether port 443 is open on the registry, you can do a TCP check on that port against the host running the Docker registry:
curl telnet://[my_registry_addr]:443
Hope that helps.
I finally found what the problem was.
To work, Kubernetes needs a pause container, and it was trying to pull that pause container from the internet.
I pushed a pause container image to my registry and configured Kubernetes to use it as the pause image.
After that, Kubernetes works like a charm.
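For reference, the kubelet flag involved is --pod-infra-container-image. A sketch of pointing it at a mirrored pause image on a CentOS 7 node; the file path, variable name, and image tag are assumptions for this packaging, so adapt them to your install:

# /etc/kubernetes/kubelet on each node (path and variable name vary by packaging)
KUBELET_POD_INFRA_CONTAINER="--pod-infra-container-image=[my_registry_addr]:[my_registry_port]/pause:latest"
# then restart the kubelet
systemctl restart kubelet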
I'm trying to run the official SonarQube Docker container locally. I'm using the command provided here:
https://hub.docker.com/_/sonarqube/
It exits about one minute after it starts. The logs report an Elasticsearch connectivity issue:
2017.09.05 08:16:40 INFO web[][o.e.client.transport] [Edwin Jarvis] failed to connect to node [{#transport#-1}{127.0.0.1}{127.0.0.1:9001}], removed from nodes list
org.elasticsearch.transport.ConnectTransportException: [][127.0.0.1:9001] connect_timeout[30s]
.....
Caused by: java.net.ConnectException: Connection refused: /127.0.0.1:9001
.....
... 3 common frames omitted
2017.09.05 08:17:10 INFO app[][o.s.a.SchedulerImpl] Process [web] is stopped
2017.09.05 08:17:10 INFO app[][o.s.a.SchedulerImpl] SonarQube is stopped
It turns out the SonarQube container didn't have enough resources. I shut down other Docker containers and it works for me now.
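If you need to see which containers are eating the host's resources before shutting them down, something like this works; the final run command mirrors the one on the Docker Hub page:

docker stats --no-stream        # per-container CPU and memory usage
docker stop <container_name>    # stop the heavy ones
docker run -d --name sonarqube -p 9000:9000 sonarqube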