I have a set of Jenkins slaves that were connected via SSH. They have all been working fine for several months. This morning, I found that all slaves were disconnected, and all had the same error when I tried to launch the agent:
[01/05/18 16:27:13] [SSH] Starting slave process: cd "/home/ubuntu/jenkins_slave" && java -jar slave.jar
<===[JENKINS REMOTING CAPACITY]===>channel started
Slave JVM has not reported exit code. Is it still running?
[01/05/18 16:27:20] Launch failed - cleaning up connection
[01/05/18 16:27:20] [SSH] Connection closed.
ERROR: Connection terminated
java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3113)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
Caused: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:73)
One common issue I found was a mismatch between Jenkins and Java versions, but I believe mine are compatible (Jenkins server says 2.46.3 and the slaves all have Java 1.7).
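A minimal sketch for double-checking the Java version the slave actually exposes to a non-interactive SSH session, which can differ from what a login shell sees. The helper name `java_major_version` is hypothetical, and the user and host in the usage comment are placeholders, not values confirmed by the post.

```shell
#!/bin/sh
# java_major_version TEXT
# Print the major Java version found in `java -version` output passed as TEXT,
# e.g. 7 for "1.7.0_80" and 11 for "11.0.2". Returns 1 if no version is found.
java_major_version() {
    version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    if [ -z "$version" ]; then
        return 1
    fi
    case "$version" in
        1.*) printf '%s\n' "$version" | cut -d. -f2 ;;   # legacy scheme: 1.7 -> 7
        *)   printf '%s\n' "$version" | cut -d. -f1 ;;   # modern scheme: 11.0.2 -> 11
    esac
}

# Usage (placeholder user and host; `java -version` writes to stderr):
#   java_major_version "$(ssh ubuntu@<slave-host> 'java -version' 2>&1)"
```

If the function returns 1, the SSH session did not find a usable `java` at all, which would also explain a launch that dies immediately.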
Regarding "Is it still running?", I don't see a Jenkins slave process running:
ps aux | grep java
returns nothing.
I have been unable to locate any Jenkins logs on the slave side. All the logs I've found on the master side only reiterate the error pasted above.
I just upgraded Jenkins from 2.249.2 to 2.263.2. Now, when I run a build on a slave node, SSH Publishers cannot connect to the server over SSH.
Running as SYSTEM
[Office365connector] No webhooks to notify
[EnvInject] - Loading node environment variables.
Building remotely on native-slave in workspace /home/jenkins/jenkins_workspace/workspace/test1
SSH: Connecting from host [native-slave]
SSH: Connecting with configuration [dev] ...
SSH: Disconnecting configuration [dev] ...
ERROR: Exception when publishing, exception message [Exec timed out or was interrupted after 120,000 ms]
Another issue is that I cannot download artifacts from the Jenkins slave workspace; the Jenkins web page just keeps loading.
This only started happening after I upgraded to the new version.
Is something wrong, or what might be causing the issue?
I can successfully connect to the agent client server via SSH agent. The issue appears after deleting the agent and adding it again as a new agent.
What I have tried so far on the client: 1) restarted sshd, 2) removed the contents of the remote dir, 3) changed the remote dir.
I can see that it is always able to copy the remote.jar, but somehow the agent does not connect once the agent has been deleted for the first time.
INFO: Both error and output logs will be printed to /home/a214p/remoting
<===[JENKINS REMOTING CAPACITY]===>channel started
Remoting version: 4.3
This is a Unix agent
ERROR: null
java.util.concurrent.CancellationException
at java.util.concurrent.FutureTask.report(FutureTask.java:121)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:475)
at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:296)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[07/20/20 14:28:21] Launch failed - cleaning up connection
[07/20/20 14:28:21] [SSH] Connection closed.
ERROR: Connection terminated
Any help or suggestion?
Have you tried the method described here to add a Jenkins slave via SSH?
Additionally, try the following steps in order:
1) Remove the slave agent from the master.
2) Kill all Jenkins/java processes running on the slave server.
3) Gracefully restart the Jenkins master.
4) Re-add the slave agent in the master.
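The "kill all Jenkins/java processes" step could be sketched roughly as follows. This is only an illustration: the helper name `stop_pids` is hypothetical, and the `jenkins` user in the usage comment is an assumption about which account the agent runs under.

```shell
#!/bin/sh
# stop_pids TIMEOUT PID...
# Send SIGTERM to every PID, wait up to TIMEOUT seconds for them to exit,
# then send SIGKILL to whatever is still alive.
stop_pids() {
    timeout_secs="$1"
    shift
    if [ "$#" -eq 0 ]; then
        return 0                              # nothing to stop
    fi
    kill "$@" 2>/dev/null || true             # polite SIGTERM first
    waited=0
    while [ "$waited" -lt "$timeout_secs" ]; do
        alive=0
        for pid in "$@"; do
            if kill -0 "$pid" 2>/dev/null; then
                alive=1                       # at least one process remains
            fi
        done
        if [ "$alive" -eq 0 ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    kill -9 "$@" 2>/dev/null || true          # force-kill stragglers
    return 0
}

# Usage on the slave (assumes the agent runs as user "jenkins"):
#   stop_pids 10 $(pgrep -u jenkins java)
```

Passing explicit PIDs rather than a name pattern keeps the function from accidentally matching the shell that runs it.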
I was able to install and run jenkins on my linux subsystem in Windows 10.
It listens on 8082.
Unfortunately, for an unknown reason, it hangs indefinitely after a few minutes (or, to be precise, after I've made a change to a job config and executed a build).
Then, I checked in the terminal:
root@jup1t3r /h/navds# service jenkins status
Correct java version found
2 instances of jenkins are running at the moment
but the pidfile /var/run/jenkins/jenkins.pid is missing
root@jup1t3r /h/navds# service jenkins stop
Correct java version found
* Stopping Jenkins Automation Server jenkins
...done.
root@jup1t3r /h/navds# service jenkins status
Correct java version found
2 instances of jenkins are running at the moment
but the pidfile /var/run/jenkins/jenkins.pid is missing
So there is no way to stop Jenkins. How can I restart it?
I'm using CentOS 7 as my master (Jenkins version 2.46) and have many slaves with different operating systems connected to it. I've spotted strange behavior with my three Windows 7 slaves: they go offline every few hours and come back online after a few minutes, with this error:
Connection was broken: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:86)
at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
After a minute or so it comes back to normal
executing pre-launch scripts ...
[2017-02-14 17:00:57] [windows-slaves] Connecting to 10.10.126.130
Checking if Java exists
java -version returned 1.7.0.
[2017-02-14 17:01:00] [windows-slaves] Installing the Jenkins slave service
[2017-02-14 17:01:00] [windows-slaves] Copying jenkins-slave.exe
[2017-02-14 17:01:00] [windows-slaves] Copying slave.jar
[2017-02-14 17:01:00] [windows-slaves] Copying jenkins-slave.xml
[2017-02-14 17:01:00] [windows-slaves] Registering the service
[2017-02-14 17:01:00] [windows-slaves] Starting the service
[2017-02-14 17:01:00] [windows-slaves] Waiting for the service to become ready
[2017-02-14 17:01:05] [windows-slaves] Connecting to port 49,253
<===[JENKINS REMOTING CAPACITY]===>Slave.jar version: 3.4.1
This is a Windows agent
just before slave CCG-WIN7-VS2008 gets online ...
executing prepare script ...
setting up slave CCG-WIN7-VS2008 ...
slave setup done.
Scheduled overwrite of jenkins-slave.exe on the next service startup
Agent successfully connected and online
I started getting this problem after moving these slaves from DHCP to static IPs. Only the Windows slaves have this problem; slaves on all other operating systems are working as they should.
They still build projects, so there is no problem there, but the constant stream of disconnect and reconnect emails is a bit annoying.
The Jenkins master is running on an Amazon instance and the slave machine is set up on a dedicated Soyoustart machine. It worked fine until the slave setup had to be redone: I reinstalled the OS, installed Java, added the master's key to the slave's authorized_keys, and removed and re-added the slave in the master's known_hosts. I set up new credentials for the slave and configured the node in the Jenkins master, but it is unable to connect to the slave.
The setup is the same one that is and has been working with other slaves without hiccups. The only difference is that this time the new slave is the same machine, with the same IP, as the old one.
It is possible to ssh into the slave from the master on the CLI (filename and slave IP replaced with placeholders for this post):
$ ssh -i <key-file> jenkins@<slave-ip>
Credentials have been set up:
Node is configured:
Output when connecting to the slave:
[05/17/15 07:30:31] [SSH] Opening SSH connection to <slave-ip>.
Key exchange was not finished, connection is closed.
ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins.
java.lang.IllegalStateException: Connection is not established!
at com.trilead.ssh2.Connection.getRemainingAuthMethods(Connection.java:1030)
at com.cloudbees.jenkins.plugins.sshcredentials.impl.TrileadSSHPublicKeyAuthenticator.getRemainingAuthMethods(TrileadSSHPublicKeyAuthenticator.java:88)
at com.cloudbees.jenkins.plugins.sshcredentials.impl.TrileadSSHPublicKeyAuthenticator.canAuthenticate(TrileadSSHPublicKeyAuthenticator.java:80)
at com.cloudbees.jenkins.plugins.sshcredentials.SSHAuthenticator.newInstance(SSHAuthenticator.java:207)
at com.cloudbees.jenkins.plugins.sshcredentials.SSHAuthenticator.newInstance(SSHAuthenticator.java:169)
at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1173)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:701)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:696)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[05/17/15 07:30:31] Launch failed - cleaning up connection
[05/17/15 07:30:31] [SSH] Connection closed.
Version numbers:
Jenkins 1.613
SSH Credentials Plugin 1.11
SSH Slaves plugin 1.9
For those who prefer to dig into code:
SSH Credentials Plugin
SSH Slave Plugin
Trilead SSH
Am I missing something obvious here? What could be causing this? Any known workaround? Or does it look like a bug that needs to be reported?
Please let me know if more information is needed.
I'm running the Jenkins master from the official Docker image, which uses OpenJDK 8 and should not need JCE installed.
Apparently this is an unresolved issue in Jenkins/SSH security.
My current workaround is to comment out the MACs and KexAlgorithms lines in /etc/ssh/sshd_config on the Jenkins slave and restart sshd (service ssh restart on Ubuntu).
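A rough sketch of that workaround as a script, for anyone who prefers not to edit the file by hand. The function name `comment_out_ssh_algos` is hypothetical, and it assumes the directives are written with the usual capitalization (`MACs`, `KexAlgorithms`) at the start of a line.

```shell
#!/bin/sh
# comment_out_ssh_algos FILE
# Comment out active MACs / KexAlgorithms directives in an sshd_config file.
# A backup is written to FILE.bak; lines that are already commented stay unchanged.
comment_out_ssh_algos() {
    config_file="$1"
    cp "$config_file" "$config_file.bak" || return 1
    sed -E 's/^([[:space:]]*)(MACs|KexAlgorithms)([[:space:]])/\1#\2\3/' \
        "$config_file.bak" > "$config_file" || return 1
}

# Usage on the slave, as root:
#   comment_out_ssh_algos /etc/ssh/sshd_config
#   sshd -t && service ssh restart    # validate the config, then restart (Ubuntu)
```

With those lines commented out, sshd falls back to its built-in default algorithm lists, so it is worth checking that this is acceptable for the host's security policy before applying it.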
UPDATE: the issue has been resolved as of 2017-04-29
I suspect that you need to install the Java Cryptography Extension for your JVM.
Without it, the RSA key size is limited and authentication cannot be established.
See https://issues.jenkins-ci.org/browse/JENKINS-26495 for more details.