Jenkins slave pod is offline - jenkins

I am having issue with slave pods not being able to connect to Jenkins master.
This is the Jenkins build output
[Pipeline] Start of Pipeline
[Pipeline] podTemplate
[Pipeline] {
[Pipeline] node
Still waiting to schedule task
‘ci-xprj2-2z8qp’ is offline
I can see this in the Jenkins pod log
2020-09-24 20:16:57.778+0000 [id=6228] INFO o.c.j.p.k.KubernetesLauncher#launch: Created Pod: infrastructure/ci-xprj2-2tqzn
2020-09-24 20:16:57.778+0000 [id=24] INFO hudson.slaves.NodeProvisioner#lambda$update$6: Kubernetes Pod Template provisioning successfully completed. We have now 2 computer(s)
2020-09-24 20:16:57.779+0000 [id=24] INFO o.c.j.p.k.KubernetesCloud#provision: Excess workload after pending Kubernetes agents: 0
2020-09-24 20:16:57.779+0000 [id=24] INFO o.c.j.p.k.KubernetesCloud#provision: Template for label ci: Kubernetes Pod Template
2020-09-24 20:16:57.839+0000 [id=5801] INFO o.internal.platform.Platform#log: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
2020-09-24 20:16:59.902+0000 [id=6228] INFO o.c.j.p.k.KubernetesLauncher#launch: Pod is running: infrastructure/ci-xprj2-2tqzn
2020-09-24 20:16:59.906+0000 [id=6228] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (0/100): ci-xprj2-2tqzn
2020-09-24 20:17:00.911+0000 [id=6228] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (1/100): ci-xprj2-2tqzn
2020-09-24 20:17:01.917+0000 [id=6228] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (2/100): ci-xprj2-2tqzn
The log from ci-xprj2-2tqzn shows this:
Sep 24, 2020 8:18:59 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: ci-xprj2-29g0p
Sep 24, 2020 8:18:59 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Sep 24, 2020 8:18:59 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 4.3
Sep 24, 2020 8:18:59 PM hudson.remoting.Engine startEngine
WARNING: No Working Directory. Using the legacy JAR Cache location: /home/jenkins/.jenkins/cache/jars
Sep 24, 2020 8:18:59 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins1:8080/]
Sep 24, 2020 8:19:19 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Failed to connect to http://jenkins1:8080/tcpSlaveAgentListener/: jenkins1
java.io.IOException: Failed to connect to http://jenkins1:8080/tcpSlaveAgentListener/: jenkins1
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:217)
at hudson.remoting.Engine.innerRun(Engine.java:693)
at hudson.remoting.Engine.run(Engine.java:518)
Caused by: java.net.UnknownHostException: jenkins1
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:214)
... 2 more
My Jenkins config looks like this:
Any help?

It looks like the error to focus on would be:
SEVERE: Failed to connect to http://jenkins1:8080/tcpSlaveAgentListener/: jenkins1
java.io.IOException: Failed to connect to http://jenkins1:8080/tcpSlaveAgentListener/: jenkins1
...
Caused by: java.net.UnknownHostException
which means jenkins1 can't be resolved.
If jenkins1 corresponds to a Kubernetes service name, I would double check its name and details and then spin up another pod in your namespace that sleeps for a while so that you can exec in and see if you can resolve jenkins1.
kubectl exec -it <sleep-test-pod-name> /bin/bash
ping jenkins1
nslookup jenkins1 #install nslookup if not already installed
If jenkins1 corresponds to one of those single word domains you sometimes see at corporations, then I would double check your search prefixes in /etc/resolv.conf in your pods:
cat /etc/resolv.conf

Related

Unable to add Jenkins Windows Slave (Master on linux)

I got following command on Linux Jenkins master after adding windows node.
Run from agent command line:
java -jar agent.jar -jnlpUrl http://192.168.235.140:8080/computer/Windows/jenkins-agent.jnlp -secret 32961e0f735ecce01a1cdfc1075b509c50f865266b67c48da61ed35f58802a08 -workDir "C:\jenkins"
I don't know what this IP is in above command - 192.168.235.140.
Here are my IPs.
Jenkins Master IP: http://192.168.235.120:8080/
My host machine (Windows slave): 192.168.235.121
As per instructions, I need to replace IP 192.168.235.140 with my Jenkins server IP: 192.168.235.120
But while running the new command it's still trying to connect with that unknown IP and giving connection time out error.
PS C:\jenkins> java -jar agent.jar -jnlpUrl http://192.168.235.120:8080/computer/Windows/jenkins-agent.jnlp -secret 32961e0f735ecce01a1cdfc13sgt509c50f865266b67c48da61ed35f58802a08 -workDir "C:\jenkins"
Oct 02, 2022 1:20:45 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
....
....
INFO: Using Remoting version: 4.13.2
Oct 02, 2022 1:20:46 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using C:\jenkins\remoting as a remoting work directory
Oct 02, 2022 1:20:46 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://192.168.235.140:8080/]
Oct 02, 2022 1:21:07 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Failed to connect to http://192.168.235.140:8080/tcpSlaveAgentListener/: Connection timed out: connect
java.io.IOException: Failed to connect to http://192.168.235.140:8080/tcpSlaveAgentListener/: Connection timed out: connect
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:214)
Question: Why it's not picking the correct Jenkins master IP from my input?
please someone suggest.

Kubernetes jenkins agent fails & shows - tcpSlaveAgentListener

I am runningJenkins Master & K8s-Master on same server. Jenkins running through tomcat Apache(not on K8s cluster). I have another server for K8s-Worker-Node, On both the server CentOS-8 OS installed. I have configured Jenkins Kubernetes Plugin version - 1.26.4 But while running pipeline job i always getting an error, Below is K8s cluster Jenkins agent pod log.
[root#K8s-Master /]# kubectl logs -f pipeline-test-33-sj6tl-r0clh-g559d -c jnlp
Aug 08, 2020 8:37:21 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: pipeline-test-33-sj6tl-r0clh-g559d
Aug 08, 2020 8:37:21 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Aug 08, 2020 8:37:21 AM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 4.3
Aug 08, 2020 8:37:21 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/agent/remoting as a remoting work directory
Aug 08, 2020 8:37:21 AM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
Aug 08, 2020 8:37:21 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins-serverjenkins/]
Aug 08, 2020 8:37:41 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Failed to connect to http://jenkins-server/jenkins/tcpSlaveAgentListener/: jenkins-server
java.io.IOException: Failed to connect to http://jenkins-serverjenkins/tcpSlaveAgentListener/: jenkins-server
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:217)
at hudson.remoting.Engine.innerRun(Engine.java:693)
at hudson.remoting.Engine.run(Engine.java:518)
Caused by: java.net.UnknownHostException: jenkins-server
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:214)
... 2 more
Below settings configuration already enabled.
Manage Jenkins --> Configure Global Security --> Agents Random [Enabled]
I am successfully able to communicate from my Jenkins to the K8s master cluster(Verified in Jenkins Cloud section).
Even in K8s master all the namespace pods are running. weave-net CNI installed, Don't know what is causing problem while agent provisioning through Jenkins.
My Jenkins/K8s master & K8s-Worker-Node /etc/hosts as follows.
# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
75.76.77.5 jenkins-server jenkins-server.company.domain.com
75.76.77.6 k8s-node-1 k8s-node-1.company.domain.com
Below output getting in K8s-Worker node. It looks there is no problem in connecting jenkins-master from K8s-worker node.
# curl -I http://jenkins-server/jenkins/tcpSlaveAgentListener/
HTTP/1.1 200
Server: nginx/1.14.1
Date: Fri, 28 Aug 2020 06:13:34 GMT
Content-Type: text/plain;charset=UTF-8
Connection: keep-alive
Cache-Control: private
Expires: Thu, 01 Jan 1970 00:00:00 GMT
X-Content-Type-Options: nosniff
X-Hudson-JNLP-Port: 40021
X-Jenkins-JNLP-Port: 40021
X-Instance-Identity: MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAnkgz8Av2x8R9R2KZDzWm1K11O01r7VDikW48rCNQlgw/pUeNSPJu9pv7kH884tOE65GkMepNdtJcOFQFtY1qZ0sr5y4GF5TOc7+U/TqfwULt60r7OQlKcrsQx/jJkF0xLjR+xaJ64WKnbsl0AiZhd8/ynk02UxFXKcgwkEP2PGpGyQ1ps5t/yj6ueFiPAHX2ssK8aI7ynVbf3YyVrtFOlqhnTy11mJFoLAZnpjYRCJsrX5z/xciVq5c2XmEikLzMpjFl0YBAsDo7JL4eBUwiBr64HPcSKrsBBB9oPE4oI6GkYUCAni8uOLfzoNr9B1eImaETYSdVPdSKW/ez/OeHjQIDAQAB
X-Jenkins-Agent-Protocols: JNLP4-connect, Ping
X-Remoting-Minimum-Version: 3.14
# curl http://jenkins-server:40021/
Jenkins-Agent-Protocols: JNLP4-connect, Ping
Jenkins-Version: 2.235.3
Jenkins-Session: 4455fd45
Client: 75.76.77.6
Server: 75.76.77.5
Remoting-Minimum-Version: 3.14
It looks Kubernetes DNS not resolving the name. So any pointers to resolve this problem will help. Thanks.
It was an Kubernetes DNS resolution issue. With the help of following link - https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution created dnsutils.yaml pod and found that my K8s cluster pods was returning following error "connection timed out; no servers could be reached" for below command.
kubectl exec -i -t dnsutils -- nslookup kubernetes.default
So i have uninstalled and re-installed Kubernetes version - v1.19.0. Now everything working fine. Thanks.!!!

Jenkins JLNP agent not responding

I am trying to use jenkins and kubernetesplugin. I deployed a kubernetes cluster in AWS using kubeadm and I installed jenkins on a EC2 VM.
I am trying to initiate the deployment of pods in the kubernetes clusters using pipelines in jenkins.
However I keep getting an agent time out:
Waiting for agent to connect (30/100): mypod-67j9m-nqz0g
Feb 05, 2020 6:30:39 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
Container is terminated mypod-67j9m-nqz0g [jnlp]: ContainerStateTerminated(containerID=docker://8021b8da7c087efd6c84085032c56b5523ca7492332e441fa302a561b93b9829, exitCode=255, finishedAt=2020-02-05T18:30:38Z, message=null, reason=Error, signal=null, startedAt=2020-02-05T18:30:07Z, additionalProperties={})
Feb 05, 2020 6:30:39 PM SEVERE org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher logLastLines
Error in provisioning; agent=KubernetesSlave name: mypod-67j9m-nqz0g, template=PodTemplate{, name='mypod-67j9m', label='mypod', nodeUsageMode=EXCLUSIVE, workspaceVolume=EmptyDirWorkspaceVolume [memory=false], containers=[ContainerTemplate{name='golang', image='golang:1.8.0', command='cat', ttyEnabled=true}], annotations=[org.csanchez.jenkins.plugins.kubernetes.PodAnnotation#aab9c821]}. Container jnlp exited with error 255. Logs: Feb 05, 2020 6:30:08 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
Feb 05, 2020 6:30:08 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://ec2-18-140-64-62.ap-southeast-1.compute.amazonaws.com:8080/]
Feb 05, 2020 6:30:38 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Failed to connect to http://xxxxx.ap-southeast-1.compute.amazonaws.com:8080/tcpSlaveAgentListener/: connect timed out
My pod is running but there is no response from the slave pod and the message states that
JNLP agent port is disabled and agents cannot connect this way
this is my current set up
podTemplate(cloud: 'kubernetes', label: 'mypod',containers: [
containerTemplate(name: 'golang', image: 'golang:1.8.0', ttyEnabled: true, command: 'cat')
]) {
node('mypod') {
stage('Get a Golang project') {
git url: 'https://github.com/hashicorp/terraform.git'
container('golang') {
stage('Build a Go project') {
sh 'echo hello world'
}
}
}
}
}
The slave connects back to Jenkins via JNLP on JNLP port (TCP port for incoming agents). If you will go to Manage > Configure Global Security; you would be able to see that; select Fixed port and specify a value say 50000.

Jenkins Kubernetes plugin failing to provision jnlp-slave pods

I have a Kubernetes 1.10.0, Docker 17.03.2-ce, and Jenkins 2.107.1 running on an Ubuntu 17.04 VM with Kubernetes Plugin 1.5 installed in Jenkins. I have 4 other Ubuntu VM(s) successfully set up as nodes in the cluster, including the untainted master. I can deploy nginx-based services directly and have unfettered access to the dashboard. So, Kubernetes itself seems happy enough.
Before you mention it, let me say that we don't have short term plans to run Jenkins master inside Kubernetes itself. So, I'd prefer to get this strategy working.
The plugin config for a Kubernetes Cloud is thus:
"Name": kubernetes
"Kubernetes URL": https://172.20.43.30:6443
from
# kubectl describe pods/kube-apiserver-jenkins-kube-master --namespace=kube-system | grep Liveness
Liveness: http-get https://172.20.43.30:6443/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
after accepting the insecure cert, a browser to https://172.20.43.30:6443/ will show
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
"reason": "Forbidden",
"details": {
},
"code": 403
}
"Kubernetes server certificate key" obtained from
# kubectl get pods/kube-apiserver-jenkins-kube-master -o yaml --namespace=kube-system | grep tls
- --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
- --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
# cat /etc/kubernetes/pki/apiserver.crt
-----BEGIN CERTIFICATE-----
MIIDZ******
*******************
****PP5wigl
-----END CERTIFICATE-----
"Kubernetes Namespace": jenkins-slaves
the jenkins-slaves namespace setup like this ...
create jenkins-namespace.yaml and add this:
apiVersion: v1
kind: Namespace
metadata:
name: jenkins-slaves
labels:
name: jenkins-slaves
spec:
finalizers:
- kubernetes
then
# kubectl create -f jenkins-namespace.yaml
namespace "jenkins-slaves" created
# kubectl -n jenkins-slaves create sa jenkins
serviceaccount "jenkins" created
# kubectl create role jenkins --verb=get,list,watch,create,patch,delete --resource=pods
role.rbac.authorization.k8s.io "jenkins" created
# kubectl create rolebinding jenkins --role=jenkins --serviceaccount=jenkins-slaves:jenkins
rolebinding.rbac.authorization.k8s.io "jenkins" created
# kubectl create clusterrolebinding jenkins --clusterrole cluster-admin --serviceaccount=jenkins-slaves:jenkins
clusterrolebinding.rbac.authorization.k8s.io "jenkins" created
added a Jenkins credential of "secret text" using the token spit out from
# kubectl get -n jenkins-slaves sa/jenkins --template='{{range .secrets}}{{ .name }} {{end}}' | xargs -n 1 kubectl -n jenkins-slaves get secret --template='{{ if .data.token }}{{ .data.token }}{{end}}' | head -n 1 | base64 -d -
a "Test Connection" shows "Connection test successful"
It should be noted that that same token can be used to login to the Kubernetes dashboard with full access rights.
"Jenkins URL": http://172.20.43.30:8080
"Kubernetes Pod Template:Name": jnlp slave
"Kubernetes Pod Template:Namespace": jenkins-slaves
"Kubernetes Pod Template:Labels": jenkins-slaves
"Kubernetes Pod Template:Usage": Only build jobs with label expressions matching this node
"Kubernetes Pod Template:Container Template:Name": jnlp-slave
"Kubernetes Pod Template:Container Template:Docker image": jenkins/jnlp-slave
"Kubernetes Pod Template:Container Template:Working directory": ./.jenkins-agent
At this point, if I create a job and "Restrict where this project can be run" to a "Label Expression" of "jenkins-slaves", I get:
Label jenkins-slaves is serviced by no nodes and 1 cloud. Permissions or other restrictions provided by plugins may prevent this job from running on those nodes.
If I try to build the job, it will sit in the build queue and the "Build Executor Status" will periodically say "jnlp-slave-##### (offline) (suspended)" and then disappear a couple seconds later.
The system log says:
Apr 03, 2018 12:16:21 PM SEVERE org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher logLastLines
Error in provisioning; agent=KubernetesSlave name: jnlp-slave-t8004, template=PodTemplate{inheritFrom='', name='jnlp slave', namespace='jenkins-slaves', label='jenkins-slaves', nodeSelector='', nodeUsageMode=EXCLUSIVE, workspaceVolume=org.csanchez.jenkins.plugins.kubernetes.volumes.workspace.EmptyDirWorkspaceVolume#44dcba2d, containers=[ContainerTemplate{name='jnlp-slave', image='jenkins/jnlp-slave', workingDir='./.jenkins-agent', command='/bin/sh -c', args='cat', ttyEnabled=true, resourceRequestCpu='', resourceRequestMemory='', resourceLimitCpu='', resourceLimitMemory='', livenessProbe=org.csanchez.jenkins.plugins.kubernetes.ContainerLivenessProbe#58f0ceec}]}. Container jnlp exited with error 255. Logs: Warning: JnlpProtocol3 is disabled by default, use JNLP_PROTOCOL_OPTS to alter the behavior
Warning: SECRET is defined twice in command-line arguments and the environment variable
Warning: AGENT_NAME is defined twice in command-line arguments and the environment variable
Apr 03, 2018 4:16:16 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: jnlp-slave-t8004
Apr 03, 2018 4:16:16 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Apr 03, 2018 4:16:16 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 3.19
Apr 03, 2018 4:16:16 PM hudson.remoting.Engine startEngine
WARNING: No Working Directory. Using the legacy JAR Cache location: /home/jenkins/.jenkins/cache/jars
Apr 03, 2018 4:16:17 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://172.20.43.30:8080/]
Apr 03, 2018 4:16:17 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: http://172.20.43.30:8080/tcpSlaveAgentListener/ is invalid: 404 Not Found
java.io.IOException: http://172.20.43.30:8080/tcpSlaveAgentListener/ is invalid: 404 Not Found
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:197)
at hudson.remoting.Engine.innerRun(Engine.java:518)
at hudson.remoting.Engine.run(Engine.java:469)
Apr 03, 2018 12:16:21 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
Terminating Kubernetes instance for agent jnlp-slave-t8004
Apr 03, 2018 12:16:21 PM WARNING io.fabric8.kubernetes.client.Config tryServiceAccount
Error reading service account token from: [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring.
Apr 03, 2018 12:16:21 PM INFO okhttp3.internal.platform.Platform log
ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Apr 03, 2018 12:16:21 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
Terminated Kubernetes instance for agent jenkins-slaves/jnlp-slave-t8004
Apr 03, 2018 12:16:21 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
Disconnected computer jnlp-slave-t8004
Apr 03, 2018 12:16:25 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
Excess workload after pending Kubernetes agents: 1
Apr 03, 2018 12:16:25 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
Template: Kubernetes Pod Template
Apr 03, 2018 12:16:25 PM WARNING io.fabric8.kubernetes.client.Config tryServiceAccount
Error reading service account token from: [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring.
Apr 03, 2018 12:16:25 PM INFO okhttp3.internal.platform.Platform log
ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Apr 03, 2018 12:16:25 PM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
Started provisioning Kubernetes Pod Template from kubernetes with 1 executors. Remaining excess workload: 0
Apr 03, 2018 12:16:35 PM WARNING io.fabric8.kubernetes.client.Config tryServiceAccount
Error reading service account token from: [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring.
Apr 03, 2018 12:16:35 PM INFO hudson.slaves.NodeProvisioner$2 run
Kubernetes Pod Template provisioning successfully completed. We have now 2 computer(s)
Apr 03, 2018 12:16:35 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
Excess workload after pending Kubernetes agents: 0
Apr 03, 2018 12:16:35 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
Template: Kubernetes Pod Template
Apr 03, 2018 12:16:35 PM INFO okhttp3.internal.platform.Platform log
ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Apr 03, 2018 12:16:35 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
Created Pod: jnlp-slave-bnz94 in namespace jenkins-slaves
Apr 03, 2018 12:16:35 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
-Steve Maring
Orlando, FL
I went to http://172.20.43.30:8080/configureSecurity/ and set "Agents:TCP port for JNLP agents" to "random"
I then got a "jnlp-slave-ttm5v (suspended)" that stays in the "Build Executor Status"
and the log said:
Container is waiting jnlp-slave-ttm5v [jnlp-slave]:
ContainerStateWaiting(message=Error response from daemon: the working directory './.jenkins-agent' is invalid, it needs to be an absolute path, reason=CreateContainerError, additionalProperties={})
After setting "Working directory" to "/home/jenkins" I saw a pod actually get created on k8s:
# kubectl get pods --namespace=jenkins-slaves
NAME READY STATUS RESTARTS AGE
jnlp-slave-1ds27 2/2 Running 0 42s
and my job ran successfully!
Started by user Buildguy
Agent jnlp-slave-1ds27 is provisioned from template Kubernetes Pod Template
Agent specification [Kubernetes Pod Template] (jenkins-slaves):
* [jnlp-slave] jenkins/jnlp-slave(resourceRequestCpu: , resourceRequestMemory: , resourceLimitCpu: , resourceLimitMemory: )
Building remotely on jnlp-slave-1ds27 (jenkins-slaves) in workspace
/home/jenkins/workspace/maven-parent-poms

jenkins swarm plugin: Unable to connect to master. Connection refused

I am using jenkins swarm plugin to connect a slave node to a master node. However, it is giving Connection Refused error.
I am using docker swarm to deploy this on a multi node cluster. Right now I am testing with a single node setup only and here is my compose file for the setup:
version: '3.1'
services:
viz:
image: manomarks/visualizer
volumes:
- "/var/run/docker.sock:/var/run/docker.sock"
ports:
- "8090:8080"
master:
image: myproject-jenkins-master
ports:
- "8080:8080"
- "50000:50000"
volumes:
- ./jenkins_home:/var/jenkins_home
nginx:
image: myproject-jenkins-nginx
ports:
- "80:80"
linuxagent:
image: myproject-jenkins-linuxagent
And its being deployed using the docker stack deploy command so each of the above service gets its own internal load balancer.
Jenkins is served on web via nginx. But I've also kept port 8080 exposed for debugging. If I open my browser and hit either of http://localhost or http://localhost:8080, I can access the jenkins web interface. On master, I have installed the jenkins swarm plugin. If I get inside the bash shell from my linuxagent container, which is intended to be a slave node, I can ping the other docker services but here is what happens when I try to run the swarm-client-3.3.jar file.
If I try to access jenkins, I can do so by accessing http://myproject_master:8080
root#d139902be5de:~# curl http://myproject_master:8080
<html><head><meta http-equiv='refresh' content='1;url=/login?from=%2F'/><script>window.location.replace('/login?from=%2F');</script></head><body style='background-color:white; color:white;'>
Authentication required
<!--
You are authenticated as: anonymous
Groups that you are in:
Permission you need to have (but didn't): hudson.model.Hudson.Read
... which is implied by: hudson.security.Permission.GenericRead
... which is implied by: hudson.model.Hudson.Administer
-->
</body></html> root#d139902be5de:~#
Also, I can access the same via http://myproject_nginx
root#d139902be5de:~# curl http://myproject_nginx
<html><head><meta http-equiv='refresh' content='1;url=/login?from=%2F'/><script>window.location.replace('/login?from=%2F');</script></head><body style='background-color:white; color:white;'>
Authentication required
<!--
You are authenticated as: anonymous
Groups that you are in:
Permission you need to have (but didn't): hudson.model.Hudson.Read
... which is implied by: hudson.security.Permission.GenericRead
... which is implied by: hudson.model.Hudson.Administer
-->
</body></html> root#d139902be5de:~#
If I try to launch the jar file for swarm, I get this error:
root#d139902be5de:~# java -jar swarm-client-3.3.jar -username mandeep -password 12213 -master http://myproject_nginx
Apr 09, 2017 5:52:44 PM hudson.plugins.swarm.Client main
INFO: Client.main invoked with: [-username mandeep -password 12213 -master http://myproject_nginx]
Apr 09, 2017 5:52:44 PM hudson.plugins.swarm.Client run
INFO: Discovering Jenkins master
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Apr 09, 2017 5:52:44 PM hudson.plugins.swarm.Client run
INFO: Attempting to connect to http://myproject_nginx/ bc1be8e7-eaf0-47ff-8aeb-36f75d6ba081 with ID c84ce43b
Apr 09, 2017 5:52:45 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: d139902be5de-c84ce43b
Apr 09, 2017 5:52:45 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Apr 09, 2017 5:52:45 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://myproject_nginx/]
Apr 09, 2017 5:52:45 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
Agent address: myproject_nginx
Agent port: 50000
Identity: 86:5b:f3:77:84:92:21:87:95:4c:4b:0e:f7:4e:e5:1d
Apr 09, 2017 5:52:45 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Apr 09, 2017 5:52:45 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to myproject_nginx:50000
Apr 09, 2017 5:52:55 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to myproject_nginx:50000 (retrying:2)
java.io.IOException: Failed to connect to myproject_nginx:50000
at org.jenkinsci.remoting.engine.JnlpAgentEndpoint.open(JnlpAgentEndpoint.java:243)
at hudson.remoting.Engine.connect(Engine.java:500)
at hudson.remoting.Engine.innerRun(Engine.java:364)
at hudson.remoting.Engine.run(Engine.java:287)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:454)
at sun.nio.ch.Net.connect(Net.java:446)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
at java.nio.channels.SocketChannel.open(SocketChannel.java:189)
at org.jenkinsci.remoting.engine.JnlpAgentEndpoint.open(JnlpAgentEndpoint.java:204)
... 3 more
Seems like the issue is that the slave expects master to be myproject_nginx and hence hits the port 50000 which fails because this service does not have port 50000 exposed. So for that purpose, I have exposed port 8080 and 50000 on the myproject_master service. But when I try that url, then I get a different error
root#d139902be5de:~# java -jar swarm-client-3.3.jar -username mandeep -password 12213 -master http://myproject_master:8080
Apr 09, 2017 5:57:01 PM hudson.plugins.swarm.Client main
INFO: Client.main invoked with: [-username mandeep -password 12213 -master http://myproject_master:8080]
Apr 09, 2017 5:57:01 PM hudson.plugins.swarm.Client run
INFO: Discovering Jenkins master
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Apr 09, 2017 5:57:01 PM hudson.plugins.swarm.Client run
SEVERE: IOexception occurred
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at java.net.Socket.connect(Socket.java:538)
at java.net.Socket.<init>(Socket.java:434)
at java.net.Socket.<init>(Socket.java:286)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at hudson.plugins.swarm.SwarmClient.discoverFromMasterUrl(SwarmClient.java:224)
at hudson.plugins.swarm.Client.run(Client.java:115)
at hudson.plugins.swarm.Client.main(Client.java:88)
This is where I am completely stuck and unable to understand how to fix the issue. Can there be a problem because the slave is running behind a load balancer ? Or is there anything fundamentally wrong in the architecture that I am trying to accomplish ? I want to be able to scale the linuxagent service dynamically so that each of the node behaves as a slave agent and just connects with the master node whenever it is launched. I read about the swarm plugin for jenkins and found that it can be used to achieve this kind of setup
for me changing the order of parameters fixed it.
Please move -master to beginning of swarm parameters as:
java -jar /usr/share/jenkins/swarm-client-3.4.jar -disableSslVerification -master
This should fix the following error which i was seeing:
Jul 25, 2017 6:26:23 PM hudson.plugins.swarm.Client run
INFO: Discovering Jenkins master
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for
further details.
Exception in thread "main" java.lang.IllegalArgumentException: Host
name may not be null
at org.apache.commons.httpclient.HttpHost.<init>(HttpHost.java:68)
at org.apache.commons.httpclient.HttpHost.<init>(HttpHost.java:107)
at org.apache.commons.httpclient.HttpMethodBase.setURI(HttpMethodBase.java:280)
at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:220)
at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)
at hudson.plugins.swarm.SwarmClient.discoverFromMasterUrl(SwarmClient.java:220)
at hudson.plugins.swarm.Client.run(Client.java:114)
at hudson.plugins.swarm.Client.main(Client.java:87)
Try opening up Inbound for TCP ports 443 and 50000 both.

Resources