Jenkins remote executors are disconnecting frequently - jenkins

I am trying to setup 4 RHEL machines in jenkins as build executors. But the systems are getting disconnected so frequently. I have checked the following points already. Still no luck
Network problems between master and slave
Slave server availability
Slow response from slave (disabled Response time in Manage Jenkins > Manage Nodes > Configure > Response time)
I am getting the following error everytime
Feb 22, 2019 7:59:55 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
INFO: I/O error in channel channel
java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2353)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2822)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:140)
at hudson.remoting.Command.readFrom(Command.java:126)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)

Related

Disconnected Jenkins agents using websocket

I am trying to move my build agents to use the websocket protocol instead of JNLP. It seems very close but my agents are showing as disconnected.
If I look at the log for the agent it says:
Inbound agent connected from x.x.x.x < agent's IP
and in the agent's logs it says:
2020-05-13T16:36:33.132-04:00 INFO: http://jenkins.domain.internal/login is not ready: 503 < waiting for Jenkins to join the lb
2020-05-13T16:36:43.749-04:00 May 13, 2020 8:36:43 PM hudson.remoting.jnlp.Main$CuiListener status
2020-05-13T16:36:43.749-04:00 INFO: WebSocket connection open
2020-05-13T16:36:43.749-04:00 May 13, 2020 8:36:43 PM hudson.remoting.jnlp.Main$CuiListener status
2020-05-13T16:36:43.749-04:00 INFO: Connected
But the agent is showing as offline. I'm at a loss to troubleshoot this. If I delete the agent out of Jenkins then I see a line in the Jenkins log saying that the agent is not recognized.
It appears that with version inbound-jenkins-agent:4.0.1-1 of the JNLP inbound agent works fine. Other, later versions are having trouble. At least in our environment things started working after using this version.
See this issue: https://github.com/jenkinsci/docker-inbound-agent/issues/172

Jenkins slave pod on kubernetes randomly failing

I have set a Jenkins master (on a VM) and this is provisioning jnlp slaves as kubernetes pods.
In very rare occasions, the pipeline fails, with this message:
java.io.IOException: Pipe closed
at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:260)
at java.io.PipedInputStream.receive(PipedInputStream.java:226)
at java.io.PipedOutputStream.write(PipedOutputStream.java:149)
at java.io.OutputStream.write(OutputStream.java:75)
at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.setupEnvironmentVariable(ContainerExecDecorator.java:510)
at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.doLaunch(ContainerExecDecorator.java:474)
at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.launch(ContainerExecDecorator.java:333)
at hudson.Launcher$ProcStarter.start(Launcher.java:455)
Viewing kubernetes logs Stackdriver in Stackdriver, one can see that the pod does manage to connect to the master, e.g.
Handshaking
Agent discovery successful
Trying protocol: JNLP4-Connect
Remote Identity confirmed: <some_hash_here>
Connecting to <jenkins-master-url>:49187
started container
loading plugin ...
but after a while it fails and here are the relevant logs:
org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave$SlaveDisconnector call
INFO: Disabled slave engine reconnects.
hudson.remoting.jnlp.Main$CuiListener status
Terminated
hudson.remoting.Request$2 run
Failed to send back a reply to the request hudson.remoting.Request$2#336ec321: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel#29d0e8b2:JNLP4-connect connection to <jenkins-master-url>/<public-ip-of-jenkins-master>:49187": channel is already closed
"Processing signal 'terminated'"
.
.
.
How can I further troubleshoot this random error?
Can you take a look at the Kubernetes Pod-Events with Stackdriver? We had a similar behavior with a different CI-System (GitlabCI). Our builds where also randomly failing. It turned out that the JVM inside the Container exceeded its memory limitation and was killed by Kubernetes (OOMKilled) and the CI-System recognised this as a build error.

Jenkins slave configuration issue

Have been trying for hours for setting up master-slave configuration in jenkins and getting this error when triggering the jnlp file from slave machine: "SEVERE: The server rejected the connection: None of the protocols were accepted
java.lang.Exception: The server rejected the connection: None of the protocols were accepted".. Everything seems set correctly from my end - Java 1.8.181 in both master and slave machines, jenkins 2.147 in both machines.
Here is the entire log: https://gist.github.com/anuraagkb/13f4f226a411fe02596af66be877257d
My bad..for the Jenkins URL under Manage Jenkins-> Configure System I had entered localhost instead of the server IP which caused the issue.

Jenkins build log shows aborted by user

Jenkins job ( in Network A ) runs on a slave machine ( say, server A in Network A). Jenkins job has instructions as part of the build to SSH to a server ( say server B in Network B) and execute further steps.
The job runs for about 2.5 hours. Very randomly it fails with the error message stating
18:24:14 Aborted by <USERNAME>
18:24:14 Finished: ABORTED
On server B where the build is executed, TCP keep alive is set to yes and to probe a signal every 80secs. On the kernel level, the tcpkeepalive parameter is set to 2.5hours.
I'm sure that the problem is not the timeout from this machine as i have seen a run with a duration of 157 minutes pass successfully.
The build log does neither have any further lines nor it is descriptive.
How can i effectively debug this problem? We are unable to track anything from the network traffic as there is only one session when the slave is established with SSH.
If incase, this is due to any error within the build, how can i make Jenkins throw descriptive message so that we can narrow down to the root cause?
What specifically can be tracked in network to check if this is due to network glitch?

Jenkins agent keeps disconnecting / reconnecting repeatedly

I have a jenkins master server. I just created a new jenkins agent and launching it via Java Web Start in a ubuntu host. The agent connects successfully, but after some time it says "Terminated", then again after some time it says "Connected". And it keeps repeating like this throughout.
I am not even trying to run a build/job yet
Interestingly enough, this ubuntu agent and this jnlp and this java web start has been working fine for the last several weeks - even until a few hours ago. Now suddenly it's starting to disconnected and reconnect repeatedly like this.
JNLP agent connected from /116.68.205.58
<===[JENKINS REMOTING CAPACITY]===>Slave.jar version: 3.2
This is a Unix agent
ERROR: Connection terminated
java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:73)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2353)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2822)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
ERROR: Failed to install restarter
hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.Request.abort(Request.java:307)
at hudson.remoting.Channel.terminate(Channel.java:888)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:92)
at ......remote call to Channel to /116.68.205.58(Native Method)
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1537)
at hudson.remoting.Request.call(Request.java:172)
at hudson.remoting.Channel.call(Channel.java:821)
at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller.install(JnlpSlaveRestarterInstaller.java:52)
at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller.access$000(JnlpSlaveRestarterInstaller.java:33)
at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$1.call(JnlpSlaveRestarterInstaller.java:39)
at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$1.call(JnlpSlaveRestarterInstaller.java:36)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:73)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2353)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2822)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
JNLP agent connected from /116.68.205.58
<===[JENKINS REMOTING CAPACITY]===>Slave.jar version: 3.2
This is a Unix agent
Check the jenkins slave log for possible problems. Also, how is the Availability setting under the Jenkins node configuration page?
Jenkins >> Manage Jenkins >> Manage Nodes >> your node >> Configure
I recently had a Windows slave with the same symptom and change the Availability from
"Take this agent online when in demand, and offline when idle"
to
"Keep this agent online as much as possible"
and it solved my problem, but you might have a different problem from the one I had. So I suggest first viewing the slave logs. If you can, post the log snippet here for further analysis.

Resources