I have installed Jenkins on a GKE cluster using the stable Helm chart.
I am able to access and log in to the UI.
However, when trying to run a simple job, the agent pod fails to be created.
The logs are not very informative on this:
jenkins-kos-58586644f9-vh278 jenkins 2020-01-28 18:30:46.523+0000 [id=184] WARNING o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: default-ld008, template=PodTemplate{inheritFrom='', name='default', slaveConnectTimeout=0, label='jenkins-kos-jenkins-slave ', serviceAccount='default', nodeSelector='', nodeUsageMode=NORMAL, workspaceVolume=EmptyDirWorkspaceVolume [memory=false], containers=[ContainerTemplate{name='jnlp', image='jenkins/jnlp-slave:3.27-1', workingDir='/home/jenkins/agent', command='', args='${computer.jnlpmac} ${computer.name}', resourceRequestCpu='500m', resourceRequestMemory='1Gi', resourceLimitCpu='4000m', resourceLimitMemory='8Gi', envVars=[ContainerEnvVar [getValue()=http://jenkins-kos.jenkins.svc.cluster.local:8080/jenkins, getKey()=JENKINS_URL]]}]}
jenkins-kos-58586644f9-vh278 jenkins java.lang.IllegalStateException: Pod has terminated containers: jenkins/default-ld008 (jnlp)
jenkins-kos-58586644f9-vh278 jenkins at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.periodicAwait(AllContainersRunningPodWatcher.java:166)
jenkins-kos-58586644f9-vh278 jenkins at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.periodicAwait(AllContainersRunningPodWatcher.java:187)
jenkins-kos-58586644f9-vh278 jenkins at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.await(AllContainersRunningPodWatcher.java:127)
jenkins-kos-58586644f9-vh278 jenkins at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:132)
jenkins-kos-58586644f9-vh278 jenkins at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:290)
jenkins-kos-58586644f9-vh278 jenkins at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
jenkins-kos-58586644f9-vh278 jenkins at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
jenkins-kos-58586644f9-vh278 jenkins at java.util.concurrent.FutureTask.run(FutureTask.java:266)
jenkins-kos-58586644f9-vh278 jenkins at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
jenkins-kos-58586644f9-vh278 jenkins at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
jenkins-kos-58586644f9-vh278 jenkins at java.lang.Thread.run(Thread.java:748)
jenkins-kos-58586644f9-vh278 jenkins 2020-01-28 18:30:46.524+0000 [id=184] INFO o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent default-ld008
jenkins-kos-58586644f9-vh278 jenkins Terminated Kubernetes instance for agent jenkins/default-ld008
jenkins-kos-58586644f9-vh278 jenkins Disconnected computer default-ld008
jenkins-kos-58586644f9-vh278 jenkins 2020-01-28 18:30:46.559+0000 [id=184] INFO o.c.j.p.k.KubernetesSlave#deleteSlavePod: Terminated Kubernetes instance for agent jenkins/default-ld008
jenkins-kos-58586644f9-vh278 jenkins 2020-01-28 18:30:46.560+0000 [id=184] INFO o.c.j.p.k.KubernetesSlave#_terminate: Disconnected computer default-ld008
jenkins-kos-58586644f9-vh278 jenkins 2020-01-28 18:30:56.009+0000 [id=53
Here are the Kubernetes events:
0s Normal Scheduled pod/default-zkwp4 Successfully assigned jenkins/default-zkwp4 to gke-kos-nodepool1-kq69
0s Normal Pulled pod/default-zkwp4 Container image "docker.io/istio/proxyv2:1.4.0" already present on machine
0s Normal Created pod/default-zkwp4 Created container
0s Normal Started pod/default-zkwp4 Started container
0s Normal Pulled pod/default-zkwp4 Container image "jenkins/jnlp-slave:3.27-1" already present on machine
0s Normal Created pod/default-zkwp4 Created container
0s Normal Started pod/default-zkwp4 Started container
0s Normal Pulled pod/default-zkwp4 Container image "docker.io/istio/proxyv2:1.4.0" already present on machine
1s Normal Created pod/default-zkwp4 Created container
0s Normal Started pod/default-zkwp4 Started container
0s Warning Unhealthy pod/default-zkwp4 Readiness probe failed: Get http://10.15.2.113:15020/healthz/ready: dial tcp 10.15.2.113:15020: connect: connection refused
0s Warning Unhealthy pod/default-zkwp4 Readiness probe failed: Get http://10.15.2.113:15020/healthz/ready: dial tcp 10.15.2.113:15020: connect: connection refused
0s Normal Killing pod/default-zkwp4 Killing container with id docker://istio-proxy:Need to kill Pod
The TCP port for agent communication is fixed to 50000, and the agent image is jenkins/jnlp-slave:3.27-1.
Any ideas what might be causing this?
UPDATE 1: Here is a gist with the description of the failed agent.
UPDATE 2: I managed to pinpoint the actual error in the jnlp logs using Stackdriver (although I am not aware of the root cause yet):
SEVERE: Failed to connect to http://jenkins-kos.jenkins.svc.cluster.local:8080/jenkins/tcpSlaveAgentListener/: Connection refused (Connection refused)
UPDATE 3: Here comes the weird(est) part: from a pod I spun up within the jenkins namespace:
/ # dig +short jenkins-kos.jenkins.svc.cluster.local
10.14.203.189
/ # nc -zv -w 3 jenkins-kos.jenkins.svc.cluster.local 8080
jenkins-kos.jenkins.svc.cluster.local (10.14.203.189:8080) open
/ # curl http://jenkins-kos.jenkins.svc.cluster.local:8080/jenkins/tcpSlaveAgentListener/
Jenkins
UPDATE 4: I can confirm that this occurs on a GKE cluster using Istio 1.4.0 but NOT on another one using 1.1.15.
You can disable the Istio sidecar proxy for the agents.
Go to Manage Jenkins -> Configuration -> Kubernetes Cloud.
Select the Annotations option and enter the annotation below:
sidecar.istio.io/inject: "false"
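If you manage the Jenkins controller with the Helm chart, the same annotation can usually be applied to agent pods through the chart values instead of the UI. A sketch, assuming the chart exposes an agent.annotations key (verify the exact key path against your chart version's values.yaml):

# values.yaml fragment (key names are an assumption; check your chart version)
agent:
  annotations:
    sidecar.istio.io/inject: "false"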
Related
I'm trying to customize the jenkins/jenkins:latest image in order to install Docker, so that I'm able to run docker within a Jenkins pipeline. But when I run the following code using the following files, the jenkins-jenkins pod terminates with "Error" without outputting any meaningful logs.
Dockerfile (custom_image:latest)
FROM jenkins/jenkins:latest
USER jenkins
(the same error occurs even though this Dockerfile does not install Docker)
values.yaml
jenkins:
  name:
  image: custom_image:latest
helm repo add jenkins https://raw.githubusercontent.com/jenkinsci/kubernetes-operator/master/chart
helm install jenkins jenkins/jenkins-operator -n jenkins -f values.yaml
Outputs...
kubectl describe pod/jenkins-jenkins
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12s default-scheduler Successfully assigned jenkins/jenkins-jenkins to minikube
Normal Pulled 11s kubelet Container image "docker-jenkins:latest" already present on machine
Normal Created 11s kubelet Created container jenkins-master
Normal Started 11s kubelet Started container jenkins-master
Normal Pulled 11s kubelet Container image "virtuslab/jenkins-operator-backup-pvc:v0.1.0" already present on machine
Normal Created 11s kubelet Created container backup
Normal Started 11s kubelet Started container backup
Normal Killing 8s kubelet Stopping container backup
kubectl logs pod/jenkins-jenkins
...
Defaulted container "jenkins-master" out of: jenkins-master, backup
+ '[' '' == true ']'
+ echo 'To print debug messages set environment variable '\''DEBUG_JENKINS_OPERATOR'\'' to '\''true'\'''
+ mkdir -p /var/lib/jenkins/init.groovy.d
To print debug messages set environment variable 'DEBUG_JENKINS_OPERATOR' to 'true'
+ cp -n /var/jenkins/init-configuration/createOperatorUser.groovy /var/lib/jenkins/init.groovy.d
+ mkdir -p /var/lib/jenkins/scripts
+ cp /var/jenkins/scripts/init.sh /var/jenkins/scripts/install-plugins.sh /var/lib/jenkins/scripts
+ chmod +x /var/lib/jenkins/scripts/init.sh /var/lib/jenkins/scripts/install-plugins.sh
Installing plugins required by Operator - begin
+ echo 'Installing plugins required by Operator - begin'
+ cat
+ [[ -z '' ]]
+ install-plugins.sh
WARN: install-plugins.sh has been removed, please switch to jenkins-plugin-cli
kubectl describe pod/jenkins-jenkins-operator-7c4cd6dc7b-g6m7z
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18h default-scheduler Successfully assigned jenkins/jenkins-jenkins-operator-7c4cd6dc7b-g6m7z to minikube
Normal Pulled 18h kubelet Container image "virtuslab/jenkins-operator:v0.7.1" already present on machine
Normal Created 18h kubelet Created container jenkins-operator
Normal Started 18h kubelet Started container jenkins-operator
Normal SandboxChanged 3m56s kubelet Pod sandbox changed, it will be killed and re-created.
Warning BackOff 3m23s kubelet Back-off restarting failed container
Normal Pulled 3m11s (x2 over 3m55s) kubelet Container image "virtuslab/jenkins-operator:v0.7.1" already present on machine
Normal Created 3m11s (x2 over 3m55s) kubelet Created container jenkins-operator
Normal Started 3m10s (x2 over 3m55s) kubelet Started container jenkins-operator
kubectl logs jenkins-jenkins-operator-7c4cd6dc7b-g6m7z
2022-11-22T20:00:50.544Z DEBUG controller-jenkins Jenkins HTTP Service is present {"cr": "jenkins"}
2022-11-22T20:00:50.545Z DEBUG controller-jenkins Jenkins slave Service is present {"cr": "jenkins"}
2022-11-22T20:00:50.545Z DEBUG controller-jenkins Kubernetes resources are present {"cr": "jenkins"}
2022-11-22T20:00:50.545Z DEBUG controller-jenkins Jenkins master pod is present {"cr": "jenkins"}
2022-11-22T20:00:50.545Z DEBUG controller-jenkins Jenkins master pod is terminating {"cr": "jenkins"}
2022-11-22T20:00:55.546Z DEBUG controller-jenkins Reconciling Jenkins {"cr": "jenkins"}
2022-11-22T20:00:55.546Z DEBUG controller-jenkins Operator credentials secret is present {"cr": "jenkins"}
2022-11-22T20:00:55.552Z DEBUG controller-jenkins Scripts config map is present {"cr": "jenkins"}
2022-11-22T20:00:55.555Z DEBUG controller-jenkins Init configuration config map is present {"cr": "jenkins"}
2022-11-22T20:00:55.562Z DEBUG controller-jenkins Base configuration config map is present {"cr": "jenkins"}
2022-11-22T20:00:55.562Z DEBUG controller-jenkins GroovyScripts Secret and ConfigMap added watched labels {"cr": "jenkins"}
2022-11-22T20:00:55.562Z DEBUG controller-jenkins ConfigurationAsCode Secret and ConfigMap added watched labels {"cr": "jenkins"}
2022-11-22T20:00:55.562Z DEBUG controller-jenkins createServiceAccount with annotations map[] {"cr": "jenkins"}
2022-11-22T20:00:55.582Z DEBUG controller-jenkins Service account, role and role binding are present {"cr": "jenkins"}
2022-11-22T20:00:55.582Z DEBUG controller-jenkins Extra role bindings are present {"cr": "jenkins"}
2022-11-22T20:00:55.583Z DEBUG controller-jenkins Jenkins HTTP Service is present {"cr": "jenkins"}
2022-11-22T20:00:55.584Z DEBUG controller-jenkins Jenkins slave Service is present {"cr": "jenkins"}
2022-11-22T20:00:55.585Z DEBUG controller-jenkins Kubernetes resources are present {"cr": "jenkins"}
2022-11-22T20:00:55.585Z DEBUG controller-jenkins Jenkins master pod is present {"cr": "jenkins"}
2022-11-22T20:00:55.585Z DEBUG controller-jenkins Jenkins master pod is terminating {"cr": "jenkins"}
I don't see any issue in the logs you shared. You may try installing Jenkins using the Helm chart instead of the operator.
I summarized how to do that in my Jenkins Docker in Docker Agent post; you can read about using Docker in Jenkins pipelines there as well.
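For reference, a minimal install with the community Helm chart might look like this (the release name, namespace, and values file are placeholders to adapt to your setup):

helm repo add jenkins https://charts.jenkins.io
helm repo update
helm install jenkins jenkins/jenkins -n jenkins --create-namespace -f values.yaml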
I have a node.js service that I want to build with Jenkins in Kubernetes, using a Jenkins agent pod specified by the node.js project. I am trying to eliminate manual interaction with the Jenkins UI. Everything is running in one Kubernetes cluster.
I am following this blog and adapting it slightly, but running into problems:
I get an error ‘Jenkins’ doesn’t have label ‘test-pod’
The job loops infinitely.
The build agent is successfully created in Kubernetes. The test-pod label is specified by the Jenkinsfile, so I don't know why I get this error. And why is it looping infinitely?
podTemplate(
    name: 'test-pod',
    label: 'test-pod',
    containers: [
        containerTemplate(name: 'node14', image: 'node:14-alpine'),
        containerTemplate(name: 'docker', image: 'trion/jenkins-docker-client'),
    ],
    {
        node('test-pod') {
            stage('Build') {
                container('node14') {
                    // do nothing just yet
                }
            }
        }
    }
)
Here is part of the Jenkins console output:
Started by user admin
Obtained Jenkinsfile from git ssh://git#kube-master.cluster.dev/git/hello.git
Running in Durability level: MAX_SURVIVABILITY
[Pipeline] Start of Pipeline
[Pipeline] podTemplate
[Pipeline] {
[Pipeline] node
Created Pod: kubernetes jenkins/test-pod-2hdfp-9kcjj
[Normal][jenkins/test-pod-2hdfp-9kcjj][Scheduled] Successfully assigned jenkins/test-pod-2hdfp-9kcjj to kube-worker2.cluster.dev
[Normal][jenkins/test-pod-2hdfp-9kcjj][Pulled] Container image "node:14-alpine" already present on machine
[Normal][jenkins/test-pod-2hdfp-9kcjj][Created] Created container node14
[Normal][jenkins/test-pod-2hdfp-9kcjj][Started] Started container node14
[Normal][jenkins/test-pod-2hdfp-9kcjj][Pulled] Container image "trion/jenkins-docker-client" already present on machine
[Normal][jenkins/test-pod-2hdfp-9kcjj][Created] Created container docker
[Normal][jenkins/test-pod-2hdfp-9kcjj][Started] Started container docker
[Normal][jenkins/test-pod-2hdfp-9kcjj][Pulled] Container image "jenkins/inbound-agent:4.3-4" already present on machine
[Normal][jenkins/test-pod-2hdfp-9kcjj][Created] Created container jnlp
[Normal][jenkins/test-pod-2hdfp-9kcjj][Started] Started container jnlp
jenkins/test-pod-2hdfp-9kcjj Container node14 was terminated (Exit Code: 0, Reason: Completed)
[Normal][jenkins/test-pod-2hdfp-9kcjj][Killing] Stopping container docker
Created Pod: kubernetes jenkins/test-pod-2hdfp-gc2qb
[Normal][jenkins/test-pod-2hdfp-gc2qb][Scheduled] Successfully assigned jenkins/test-pod-2hdfp-gc2qb to kube-worker2.cluster.dev
[Normal][jenkins/test-pod-2hdfp-gc2qb][Pulled] Container image "node:14-alpine" already present on machine
[Normal][jenkins/test-pod-2hdfp-gc2qb][Created] Created container node14
[Normal][jenkins/test-pod-2hdfp-gc2qb][Started] Started container node14
[Normal][jenkins/test-pod-2hdfp-gc2qb][Pulled] Container image "trion/jenkins-docker-client" already present on machine
[Normal][jenkins/test-pod-2hdfp-gc2qb][Created] Created container docker
[Normal][jenkins/test-pod-2hdfp-gc2qb][Started] Started container docker
[Normal][jenkins/test-pod-2hdfp-gc2qb][Pulled] Container image "jenkins/inbound-agent:4.3-4" already present on machine
[Normal][jenkins/test-pod-2hdfp-gc2qb][Created] Created container jnlp
[Normal][jenkins/test-pod-2hdfp-gc2qb][Started] Started container jnlp
jenkins/test-pod-2hdfp-gc2qb Container node14 was terminated (Exit Code: 0, Reason: Completed)
Still waiting to schedule task
‘Jenkins’ doesn’t have label test-pod’
[Normal][jenkins/test-pod-2hdfp-gc2qb][Killing] Stopping container docker
Created Pod: kubernetes jenkins/test-pod-2hdfp-xwkm2
[Normal][jenkins/test-pod-2hdfp-xwkm2][Scheduled] Successfully assigned jenkins/test-pod-2hdfp-xwkm2 to kube-worker2.cluster.dev
[Normal][jenkins/test-pod-2hdfp-xwkm2][Pulled] Container image "node:14-alpine" already present on machine
[Normal][jenkins/test-pod-2hdfp-xwkm2][Created] Created container node14
[Normal][jenkins/test-pod-2hdfp-xwkm2][Started] Started container node14
[Normal][jenkins/test-pod-2hdfp-xwkm2][Pulled] Container image "trion/jenkins-docker-client" already present on machine
[Normal][jenkins/test-pod-2hdfp-xwkm2][Created] Created container docker
[Normal][jenkins/test-pod-2hdfp-xwkm2][Started] Started container docker
[Normal][jenkins/test-pod-2hdfp-xwkm2][Pulled] Container image "jenkins/inbound-agent:4.3-4" already present on machine
[Normal][jenkins/test-pod-2hdfp-xwkm2][Created] Created container jnlp
[Normal][jenkins/test-pod-2hdfp-xwkm2][Started] Started container jnlp
jenkins/test-pod-2hdfp-xwkm2 Container node14 was terminated (Exit Code: 0, Reason: Completed)
[Normal][jenkins/test-pod-2hdfp-xwkm2][Killing] Stopping container docker
Created Pod: kubernetes jenkins/test-pod-2hdfp-4ltq3
[Normal][jenkins/test-pod-2hdfp-4ltq3][Scheduled] Successfully assigned jenkins/test-pod-2hdfp-4ltq3 to kube-worker2.cluster.dev
[Normal][jenkins/test-pod-2hdfp-4ltq3][Pulled] Container image "node:14-alpine" already present on machine
[Normal][jenkins/test-pod-2hdfp-4ltq3][Created] Created container node14
[Normal][jenkins/test-pod-2hdfp-4ltq3][Started] Started container node14
[Normal][jenkins/test-pod-2hdfp-4ltq3][Pulled] Container image "trion/jenkins-docker-client" already present on machine
[Normal][jenkins/test-pod-2hdfp-4ltq3][Created] Created container docker
[Normal][jenkins/test-pod-2hdfp-4ltq3][Started] Started container docker
[Normal][jenkins/test-pod-2hdfp-4ltq3][Pulled] Container image "jenkins/inbound-agent:4.3-4" already present on machine
[Normal][jenkins/test-pod-2hdfp-4ltq3][Created] Created container jnlp
[Normal][jenkins/test-pod-2hdfp-4ltq3][Started] Started container jnlp
jenkins/test-pod-2hdfp-4ltq3 Container node14 was terminated (Exit Code: 0, Reason: Completed)
[Normal][jenkins/test-pod-2hdfp-4ltq3][Killing] Stopping container docker
Created Pod: kubernetes jenkins/test-pod-2hdfp-0216w
...
Update with latest findings
Master log (see debugging) doesn't provide much else:
...
2021-04-30 11:52:42.715+0000 [id=4660] INFO hudson.slaves.NodeProvisioner#lambda$update$6: test-pod-gb4vq-hf3d4 provisioning successfully completed. We have now 2 computer(s)
2021-04-30 11:52:42.741+0000 [id=4659] INFO o.c.j.p.k.KubernetesLauncher#launch: Created Pod: kubernetes jenkins/test-pod-gb4vq-hf3d4
2021-04-30 11:52:42.847+0000 [id=4680] WARNING o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: test-pod-gb4vq-pdd69, template=PodTemplate{id='f29ecbdd-9c1d-468f-86ff-dd46ff40f306', name='test-pod-gb4vq', namespace='jenkins', label='test-pod', containers=[ContainerTemplate{name='node14', image='node:14-alpine'}, ContainerTemplate{name='docker', image='trion/jenkins-docker-client'}], annotations=[PodAnnotation{key='buildUrl', value='http://172.16.1.12/job/hello/14/'}, PodAnnotation{key='runUrl', value='job/hello/14/'}]}
java.lang.IllegalStateException: Pod is no longer available: jenkins/test-pod-gb4vq-pdd69
...
except that it suggests the container is starting up, then failing. It appears the loop is because the error handling in the Kubernetes plug-in isn't properly catching it and failing the job.
By watching for the build pod (using k9s), I am able to capture the pod's log, and the Unknown client name error also sounds like it is caused by fast container termination:
jnlp INFO: [JNLP4-connect connection to 172.16.1.12/172.16.1.12:50000] Local headers refused by remote: Unknown client name: test-pod-34sd7-5xhs2
jnlp Apr 29, 2021 10:42:15 PM hudson.remoting.jnlp.Main$CuiListener status
jnlp INFO: Protocol JNLP4-connect encountered an unexpected exception
jnlp java.util.concurrent.ExecutionException: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name: test-pod-34sd7-5xhs2
jnlp at org.jenkinsci.remoting.util.SettableFuture.get(SettableFuture.java:223)
jnlp at hudson.remoting.Engine.innerRun(Engine.java:743)
jnlp at hudson.remoting.Engine.run(Engine.java:518)
jnlp Caused by: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name: test-pod-34sd7-5xhs2
jnlp at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.newAbortCause(ConnectionHeadersFilterLayer.java:378)
Just found a similar issue
This is useful: I added podRetention: always() to podTemplate() after label so the agent pods are not deleted, and they now show Error.
Good finding
With the above retaining the pod on error, I can now find /var/log/containers/<failed pod>.log and it has led me to a root cause.
2021-04-30T08:59:36.047989534-04:00 stderr F java.net.UnknownHostException: updates.jenkins.io
This is because of a dnsPolicy that limits DNS to cluster-only lookups. The fix for this is to add hostNetwork: true to podTemplate(), next to label.
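A sketch of what that looks like in a minimal pipeline (hostNetwork is a podTemplate parameter of the Kubernetes plugin; verify your plugin version supports it, and the DNS check stage is only an illustration):

// Minimal sketch: a bare template with host networking enabled so the agent
// uses the node's DNS and can resolve names such as updates.jenkins.io
podTemplate(
    name: 'build-pod',
    namespace: 'jenkins',
    hostNetwork: true
) {
    node(POD_LABEL) {
        stage('Check DNS') {
            sh 'nslookup updates.jenkins.io'
        }
    }
}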
Next, the image trion/jenkins-docker-client as recommended by the blog is a client AND a server, so it is the wrong image.
Switching to jenkins/agent creates a new problem. The pod now goes up and down doing nothing, not even logging. I suspect this is a launch parameter issue.
Now it is clear I shouldn't even have a Jenkins container in the Jenkinsfile, because the Kubernetes plug-in will automatically start a JNLP container.
And that means the problem is, at last, the node14 container - which either is immediately erroring, or immediately finding nothing to do and terminating.
The error handling is difficult to understand and troubleshoot, and the blog is wrong.
Start with a bare minimum working agent Jenkinsfile:
podTemplate(
    name: 'build-pod',
    namespace: 'jenkins',
    podRetention: always(), // for debugging
    {
        node(POD_LABEL) {
            stage('Build') {
                sh "echo hello"
            }
        }
    }
)
From there, extend it with containers, volumes, container build sections, etc. one step at a time.
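One such step might look like this; a sketch only, where the ttyEnabled and command settings are my assumptions to keep the build container alive instead of letting it exit immediately:

// Sketch: add a single container and run the build stage inside it
podTemplate(
    name: 'build-pod',
    namespace: 'jenkins',
    containers: [
        containerTemplate(name: 'node14', image: 'node:14-alpine', ttyEnabled: true, command: 'cat')
    ]
) {
    node(POD_LABEL) {
        stage('Build') {
            container('node14') {
                sh 'node --version'
            }
        }
    }
}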
Troubleshoot using the logs:
kubectl get pods -n jenkins to list the pod name, and then
kubectl logs -f <jenkins-pod> -n jenkins
(assuming jenkins is your Kubernetes namespace)
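Since the agent pod runs several containers (node14, docker, jnlp), it can also help to target a specific container when reading logs. A sketch, using the namespace above and a placeholder pod name:

kubectl describe pod <agent-pod> -n jenkins    # per-container state and exit codes
kubectl logs <agent-pod> -c jnlp -n jenkins    # logs of the jnlp container only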
I have an on-premise Kubernetes (v1.15.3) cluster (1 master and 2 worker nodes). I wanted to run Jenkins agents in the cluster using the Kubernetes plugin, while keeping the Jenkins (version 2.176.2) master outside the cluster. Therefore, I created a new namespace (jenkins) and followed the configuration mentioned here. Then I filled in my Kubernetes credentials in the cloud field of the Jenkins master configuration. The connection was established successfully. Now, when I try to run a Jenkins job as a pod in Kubernetes, the pod does not come online. The logs from Kubernetes show:
Kubernetes-log
kubectl get sa
NAME SECRETS AGE
default 1 23h
jenkins 1 23h
kubectl get secrets
NAME TYPE DATA AGE
default-token-4gcr4 kubernetes.io/service-account-token 3 23h
jenkins-token-7nwbd kubernetes.io/service-account-token 3 23h
The console output from the Jenkins job shows:
Jenkins-log
Has anyone encountered a similar error before?
I am working on an offline cluster (the machines have no internet access), deploying Docker images using Ansible and docker-compose scripts.
My servers run CentOS 7.
I have set up an insecure Docker registry on the machines. We are going to change environments, and I am installing Kubernetes in order to manage my pool of containers.
I followed this guide to install Kubernetes:
https://severalnines.com/blog/installing-kubernetes-cluster-minions-centos7-manage-pods-services
After the installation, I tried to launch a test pod, using:
kubectl create -f nginx.yml
Here is the YAML for the pod:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: [my_registry_addr]:[my_registry_port]/nginx:v1
    ports:
    - containerPort: 80
I used kubectl describe to get more information on what was wrong:
Name: nginx
Namespace: default
Node: [my node]
Start Time: Fri, 15 Sep 2017 11:29:05 +0200
Labels: <none>
Status: Pending
IP:
Controllers: <none>
Containers:
nginx:
Container ID:
Image: [my_registry_addr]:[my_registry_port]/nginx:v1
Image ID:
Port: 80/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Volume Mounts: <none>
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
No volumes.
QoS Class: BestEffort
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2m 2m 1 {default-scheduler } Normal Scheduled Successfully assigned nginx to [my kubernet node]
1m 1m 2 {kubelet [my kubernet node]} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "Error while pulling image: Get https://index.docker.io/v1/repositories/library/[my_registry_addr]/images: dial tcp: lookup index.docker.io on [kubernet_master_ip]:53: server misbehaving"
54s 54s 1 {kubelet [my kubernet node]} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ImagePullBackOff: "Back-off pulling image \"[my_registry_addr]:[my_registry_port]\""
8s 8s 1 {kubelet [my kubernet node]} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "Network timed out while trying to connect to https://index.docker.io/v1/repositories/library/[my_registry_addr]/images. You may want to check your internet connection or if you are behind a proxy."
Then, I went to my node and used journalctl -xe:
sept. 15 11:22:02 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:02.350930396+02:00" level=info msg="{Action=create, LoginUID=4294967295, PID=11555}"
sept. 15 11:22:17 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:17.351536727+02:00" level=warning msg="Error getting v2 registry: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
sept. 15 11:22:17 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:17.351606330+02:00" level=error msg="Attempting next endpoint for pull after error: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
sept. 15 11:22:32 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:32.353946452+02:00" level=error msg="Not continuing with pull after error: Error while pulling image: Get https://index.docker.io/v1/repositories/library/[my_registry_ip]/images: dial tcp: lookup index.docker.io on [kubernet_master_ip]:53: server misbehaving"
sept. 15 11:22:32 [my_node_ip] kubelet[11555]: E0915 11:22:32.354309 11555 docker_manager.go:2161] Failed to create pod infra container: ErrImagePull; Skipping pod "nginx_default(8b5c40e5-99f4-11e7-98db-f8bc12456ee4)": Error while pulling image: Get https://index.docker.io/v1/repositories/library/[my_registry_ip]/images: dial tcp: lookup index.docker.io on [kubernet_master_ip]:53: server misbehaving
sept. 15 11:22:32 [my_node_ip] kubelet[11555]: E0915 11:22:32.354390 11555 pod_workers.go:184] Error syncing pod 8b5c40e5-99f4-11e7-98db-f8bc12456ee4, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "Error while pulling image: Get https://index.docker.io/v1/repositories/library/[my_registry_ip]/images: dial tcp: lookup index.docker.io on [kubernet_master_ip]:53: server misbehaving"
sept. 15 11:22:44 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:44.350708175+02:00" level=error msg="Handler for GET /v1.24/images/[my_registry_ip]:[my_registry_port]/json returned error: No such image: [my_registry_ip]:[my_registry_port]"
I am sure that my Docker configuration is good, because I use it every day with Ansible and Mesos.
The Docker version is 1.12.6 and the Kubernetes version is 1.5.2.
What can I do now? I didn't find any configuration key for this usage.
When I saw that pulling was failing, I manually pulled the image on all the nodes. I tagged the image so that Kubernetes would not try to pull it by default, and set imagePullPolicy: IfNotPresent.
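For reference, that field sits next to image in the container spec; a fragment of the manifest above with it added:

spec:
  containers:
  - name: nginx
    image: [my_registry_addr]:[my_registry_port]/nginx:v1
    imagePullPolicy: IfNotPresent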
The syntax for specifying the Docker image is:
[docker_registry]/[image_name]:[image_tag]
In your manifest file, you have used ":" to separate the Docker registry host and the port the registry is listening on. The default port for a private Docker registry is 5000, I believe.
So change your image declaration from
Image: [my_registry_addr]:[my_registry_port]/nginx:v1
to
Image: [my_registry_addr]/nginx:v1
Also, check the network connectivity from the worker node to your Docker registry with a ping:
ping [my_registry_addr]
If you still want to check whether port 443 is open on the registry, you can do a TCP check on that port against the host running the Docker registry:
curl telnet://[my_registry_addr]:443
Hope that helps.
I finally found what the problem was.
To work, Kubernetes needs a pause container, and it was trying to pull that pause container from the internet.
I pushed a pause container image to my registry and configured Kubernetes to use it as the pause image.
After that, Kubernetes works like a charm.
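For reference, the kubelet flag involved is --pod-infra-container-image. A sketch of pointing it at a mirrored pause image on a CentOS 7 node; the file path, variable name, and image tag are assumptions for this packaging, so adapt them to your install:

# /etc/kubernetes/kubelet on each node (path and variable name vary by packaging)
KUBELET_POD_INFRA_CONTAINER="--pod-infra-container-image=[my_registry_addr]:[my_registry_port]/pause:latest"
# then restart the kubelet
systemctl restart kubelet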
I'm trying to run the official SonarQube Docker container locally. I'm using the command provided here:
https://hub.docker.com/_/sonarqube/
It exits about one minute after it starts. The logs report an Elasticsearch connectivity issue:
2017.09.05 08:16:40 INFO web[][o.e.client.transport] [Edwin Jarvis] failed to connect to node [{#transport#-1}{127.0.0.1}{127.0.0.1:9001}], removed from nodes list
org.elasticsearch.transport.ConnectTransportException: [][127.0.0.1:9001] connect_timeout[30s]
.....
Caused by: java.net.ConnectException: Connection refused: /127.0.0.1:9001
.....
... 3 common frames omitted
2017.09.05 08:17:10 INFO app[][o.s.a.SchedulerImpl] Process [web] is stopped
2017.09.05 08:17:10 INFO app[][o.s.a.SchedulerImpl] SonarQube is stopped
It turns out the SonarQube container didn't have enough resources. I shut down other Docker containers and it works for me now.
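If you need to see which containers are eating the host's resources before shutting them down, something like this works; the final run command mirrors the one on the Docker Hub page:

docker stats --no-stream        # per-container CPU and memory usage
docker stop <container_name>    # stop the heavy ones
docker run -d --name sonarqube -p 9000:9000 sonarqube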