Jenkins slave pod on kubernetes randomly failing - jenkins

I have set up a Jenkins master (on a VM) and it is provisioning JNLP slaves as Kubernetes pods.
On very rare occasions, the pipeline fails with this message:
java.io.IOException: Pipe closed
at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:260)
at java.io.PipedInputStream.receive(PipedInputStream.java:226)
at java.io.PipedOutputStream.write(PipedOutputStream.java:149)
at java.io.OutputStream.write(OutputStream.java:75)
at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.setupEnvironmentVariable(ContainerExecDecorator.java:510)
at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.doLaunch(ContainerExecDecorator.java:474)
at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.launch(ContainerExecDecorator.java:333)
at hudson.Launcher$ProcStarter.start(Launcher.java:455)
Viewing the Kubernetes logs in Stackdriver, one can see that the pod does manage to connect to the master, e.g.
Handshaking
Agent discovery successful
Trying protocol: JNLP4-Connect
Remote Identity confirmed: <some_hash_here>
Connecting to <jenkins-master-url>:49187
started container
loading plugin ...
but after a while it fails and here are the relevant logs:
org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave$SlaveDisconnector call
INFO: Disabled slave engine reconnects.
hudson.remoting.jnlp.Main$CuiListener status
Terminated
hudson.remoting.Request$2 run
Failed to send back a reply to the request hudson.remoting.Request$2#336ec321: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel#29d0e8b2:JNLP4-connect connection to <jenkins-master-url>/<public-ip-of-jenkins-master>:49187": channel is already closed
"Processing signal 'terminated'"
.
.
.
How can I further troubleshoot this random error?

Can you take a look at the Kubernetes pod events in Stackdriver? We had a similar behavior with a different CI system (GitLab CI). Our builds were also randomly failing. It turned out that the JVM inside the container exceeded its memory limit and was killed by Kubernetes (OOMKilled), and the CI system recognised this as a build error.
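For reference, one quick way to check whether the agent pod's JVM was OOM-killed is to look at the container's last termination state; this is only a sketch, and the pod name and namespace are placeholders:

# "OOMKilled" here means Kubernetes killed the container for exceeding its memory limit.
$ kubectl get pod <agent-pod-name> -n <namespace> \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
# The pod's recent events (OOM kills, evictions, restarts) also show up in:
$ kubectl describe pod <agent-pod-name> -n <namespace>

If that confirms an OOM kill, raising the container memory limit in the pod template (and/or capping the JVM heap) usually resolves it.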

Related

Jenkins slave configuration issue

I have been trying for hours to set up a master-slave configuration in Jenkins and am getting this error when triggering the JNLP file from the slave machine: "SEVERE: The server rejected the connection: None of the protocols were accepted
java.lang.Exception: The server rejected the connection: None of the protocols were accepted". Everything seems set correctly from my end: Java 1.8.181 on both master and slave machines, Jenkins 2.147 on both machines.
Here is the entire log: https://gist.github.com/anuraagkb/13f4f226a411fe02596af66be877257d
My bad.. for the Jenkins URL under Manage Jenkins -> Configure System I had entered localhost instead of the server IP, which caused the issue.
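(A quick sanity check for this kind of misconfiguration is to curl the agent endpoint from the slave machine using the exact URL configured in Manage Jenkins -> Configure System; the host and port below are placeholders:)

# Should return HTTP 200 when the inbound agent port is enabled and the configured URL is reachable from the slave.
$ curl -s -o /dev/null -w '%{http_code}\n' http://<jenkins-server-ip>:8080/tcpSlaveAgentListener/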

Jenkins build log shows aborted by user

A Jenkins job (in Network A) runs on a slave machine (say, server A in Network A). The job has instructions, as part of the build, to SSH to a server (say, server B in Network B) and execute further steps.
The job runs for about 2.5 hours. Very randomly, it fails with an error message stating
18:24:14 Aborted by <USERNAME>
18:24:14 Finished: ABORTED
On server B, where the build is executed, TCP keep-alive is set to yes and to send a probe every 80 seconds. At the kernel level, the TCP keep-alive parameter is set to 2.5 hours.
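(For reference, those keepalive settings can be double-checked on server B with something like the following; the file path and directive names assume a stock OpenSSH/Linux setup:)

# Kernel-level keepalive: seconds of idle time before the first probe is sent (7200 by default, 9000 = 2.5 hours).
$ sysctl net.ipv4.tcp_keepalive_time
# SSH-level keepalive directives on server B.
$ grep -iE 'TCPKeepAlive|ClientAliveInterval' /etc/ssh/sshd_config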
I'm sure that the problem is not a timeout from this machine, as I have seen a run with a duration of 157 minutes pass successfully.
The build log neither has any further lines nor is it descriptive.
How can I effectively debug this problem? We are unable to track anything from the network traffic, as there is only one session established when the slave connects over SSH.
In case this is due to an error within the build, how can I make Jenkins throw a descriptive message so that we can narrow down the root cause?
What specifically can be tracked in the network to check whether this is due to a network glitch?

tcpSlaveAgentListener not found on Jenkins server

I am trying to connect to a Jenkins master instance from a slave. From a connectivity standpoint, everything looks good. I am able to curl the selected "TCP port for JNLP agent" as set in "Configure Global Security" in Jenkins from where I am starting the slave node:
$ curl http://myjenkinsurl:7004/
Jenkins-Agent-Protocols: CLI-connect, CLI2-connect, JNLP-connect,
JNLP2-connect, JNLP4-connect, Ping
Jenkins-Version: 2.62
Jenkins-Session: 77c90621
Client: 10.0.0.2
Server: 172.0.0.2
However, when trying to start a slave node, I get this error reported on the slave node:
INFO: Locating server among [http://myjenkinsurl:7004]
May 25, 2017 12:22:12 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: http://myjenkinsurl:7004/tcpSlaveAgentListener/ is invalid: 404 Not Found
I also get the 404 Not Found error when curling http://myjenkinsurl:7004/tcpSlaveAgentListener/
This is what my relevant section of Configure Global Security looks like:
I am getting this error since upgrading to Jenkins 2.62 and it was previously working with a similar configuration on Jenkins 2.19.
Most of the similar questions I can find are cases where this additional port was not configured correctly, but the output I get from curling Jenkins on this port (7004 in my case) seems to indicate that this is not where the problem lies.
How do I get the tcpSlaveAgentListener URL to function?
In my case, I'm running both the master and the slaves on Kubernetes.
The challenge of getting this working behind an ingress aside, I was getting a similar error, and if I understood it right:
the /tcpSlaveAgentListener/ endpoint should exist under the main Jenkins port (usually 8080).
You configure this URL at Manage Jenkins -> Configuration -> Cloud / Kubernetes:
Jenkins URL: http://jenkins:8080
Then, under Manage Jenkins -> Global Security -> Agents:
TCP Port for inbound agents: 50000.
Here you can't use the same port as the main Jenkins service.
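A quick, hedged way to confirm that split (the service name and ports are taken from the example above):

# The HTTP handshake endpoint lives on the main Jenkins port:
$ curl -s -o /dev/null -w '%{http_code}\n' http://jenkins:8080/tcpSlaveAgentListener/
# The inbound agent port only speaks the remoting protocol, so a plain TCP check is enough there:
$ nc -vz jenkins 50000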
I'm also running the master and slaves on Kubernetes.
What I did to correct the error was to set up two different Jenkins URLs:
Ingress URL
Manage Jenkins > System configuration > Configure system > Jenkins location > Jenkins URL: https://jenkins.local/
Jenkins service URL
Configure clouds > Kubernetes > Jenkins URL: http://jenkins-service:8080
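(To double-check that agent pods can actually resolve and reach the in-cluster URL, a throwaway pod can curl it; the curl image below is just an example:)

# Run a temporary pod inside the cluster and hit the service URL the agents will use.
$ kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
    curl -s -o /dev/null -w '%{http_code}\n' http://jenkins-service:8080/tcpSlaveAgentListener/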

Unable to follow sandbox link in Apache Mesos

I have a Mesos cluster set up with a master and slave on separate hosts on GCE. In the Mesos console I can see the list of tasks, but the sandbox link isn't working.
I'm getting this error:
Failed to connect to slave '20150806-140802-3049385994-5050-1-S0' on '4c37a1dd950b:5051'.
Potential reasons:
The slave's hostname, '4c37a1dd950b', is not accessible from your network
The slave's port, '5051', is not accessible from your network
The slave timed out or went offline
4c37a1dd950b is on the same server as the master.
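(For reference, the first two reasons in that message can be checked from the machine running the browser, e.g.:)

# Does the slave's advertised hostname resolve from here?
$ getent hosts 4c37a1dd950b
# Is the slave's port reachable?
$ nc -vz 4c37a1dd950b 5051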
Any tips appreciated

Kubernetes minion not completely connecting

I have a dev Kubernetes cluster set up where I have a minion running kube-proxy and kubelet. Both only start if they can connect to the master's apiserver, which they can. However, I am getting
error updating node status, will retry: error getting node "10.211.55.126": minion "10.211.55.126" not found
Prior to that, I notice I repeatedly get this when I try running the minion's kubelet: Server rejected event '&api.Event, followed by a large JSON object with mostly empty string values.
I have the kubelet pointing to a private IP and it is reporting that it can't find the public IP. I imagine this is an etcd issue, but I'm not sure; could it also be flanneld?
Update 1
I managed to get past the initial error by registering the minion (node?) with the master. This allowed it to receive pods from the master and run the containers; however, the minion is still not fully connected, which results in the master continuously pushing more pods to the minion. The kubelet process is reporting: Cannot get host IP: Host IP unknown; known addresses: []. Is there a flag to run kubelet with to give it the host IP?
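(For what it's worth, kubelet does accept a --hostname-override flag that controls the identity it registers under; a minimal sketch with the address taken from the error above, and whether it also clears the host-IP warning may depend on the version:)

# Make kubelet identify itself by a routable address instead of the detected hostname;
# all other existing kubelet flags stay as they are.
$ kubelet --hostname-override=10.211.55.126 <existing flags...>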
Currently, I have to manually register the minion prior to spinning up the minion instance. This is because, as of right now, there is an open issue that prevents the minion from self-registering in certain cases.
UPDATE
Now I'm using kube-register to register each minion/node on start of the kubelet service.
