Jenkins Slave Service Not Running: How to Debug

I have a Jenkins setup where the master runs on OS X and a Windows slave runs in a VM on the same box.
On many occasions when the VM is restarted, the Jenkins service appears either not to start or possibly to encounter an error.
The service setup looks correct and the VM is configured to log in automatically as the Jenkins user. When the service is started manually everything seems to work fine, so I can only assume the problem occurs during start-up of the box.
I have two questions:
Are there any well-known gotchas that can cause this?
Does anyone have some good strategies for debugging this? I'm assuming the answer will be somewhere in the Windows logs, but finding it is proving difficult (since the box name and the user name both contain the word Jenkins, a simple find isn't helpful).
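One way around the "everything contains the word Jenkins" problem is to narrow the log by time and severity instead of by text. The sketch below is only an illustration under assumptions: it expects the events to have already been exported and parsed into dictionaries, and the keys "time", "level" and "message" are hypothetical names, not a real Windows export format.

```python
from datetime import datetime, timedelta


def events_after_boot(events, boot_time, window_minutes=10,
                      levels=("Error", "Warning")):
    """Return events logged within window_minutes after boot_time.

    Each event is a dict with the (assumed) keys "time" (a datetime)
    and "level" (a string such as "Error"). Events outside the window
    or at other levels are dropped.
    """
    end = boot_time + timedelta(minutes=window_minutes)
    return [
        e for e in events
        if boot_time <= e["time"] <= end and e["level"] in levels
    ]

# Example call (values are made up):
# events_after_boot(parsed_events, datetime(2024, 1, 1, 8, 0, 0))
```

Filtering on the boot window first keeps the result small enough to read by eye, which matters more here than an exact text match.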

Related

Jenkins Agents "Unable to create live FilePath" and marked offline

The Jenkins Controller reports: Unable to create live FilePath for i-xxxxxxxxxxxxx, and the Agent is marked Offline.
Googling this error suggests a problem with the communication path between Controller and Agent, but what exactly?
Background:
Jenkins Controller running v2.332.1, Java 11 64bit OS, inside a docker container
Jenkins Agents running Swarm-Client jar downloaded from the Controller on startup. Swarm Plugin Version 3.32 Java 11 and 64bit OS, inside a docker container
Agents and Controller are hosted on separate EC2 instances in AWS with Security Group permissions on the relevant ports.
The instance starts up, runs cloud-init, downloads the swarm-client.jar from the Jenkins Controller and then runs it with the parameters required to connect to the Controller. I mention this to avoid the "are you using the correct version" comments :-)
The Agent connects, comes fully online and gets busy servicing the pending job queue.
Then, after an indeterminate amount of time, the error appears. Some jobs run for more than 24 hours without failing; other jobs last only minutes and sometimes fail.
Things I have tried: (some)
The Swarm Client jar can use either WebSockets and connect to the FQDN of the Jenkins controller or use the JNLP protocol to connect to the IP and dedicated agent connection port (fixed value on the Controller).
Similar behavior is seen with either protocol.
Opening all the AWS Security Groups: in case there was another port, not mentioned, that needed to be open.
Bypass AWS Load balancer: Agent connects directly to Controller IP:PORT via JNLP
Matching Versions: Swarm Client downloaded from Controller
Updated Versions: Jenkins 2.319.3, 2.332.1
Normalized Java environments: Java 11 64bit OS
Enabled logging on the Agents: periodic communication happens and then stops after a while, for no obvious reason.
Increased Controller Instance size: m5.xlarge -> m5.2xlarge
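For the port and security-group items in the list above, a quick reachability check run from the agent host can rule out the basics. This is a minimal sketch, not a diagnosis of the disconnects: it only shows that a TCP connection can be opened at that moment, and says nothing about long-lived connections being dropped later. The host name and port in the usage comment are placeholders.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        # create_connection resolves the host and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False

# Example call (hypothetical host and port):
# can_connect("jenkins.example.internal", 50000)
```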
Bumping Jenkins up to a non-LTS version allowed the connections to become more stable.
Jenkins 2.341 and Swarm-Client version 3.32 both use Remoting version 4.13
Now, while I am not particularly happy about running a non-LTS version of Jenkins, I am pleased to have found a workaround.
I also struggled with this issue, so I am adding details here so that others don't have to.
Here is everything I tried:
Everything was working when we had JDK 8 on both master and slave.
We then added code to use JDK 11 on both, and I replaced the Jenkins EC2 instance with a new one with the help of the ASG.
The issue appeared, so we reverted, but the issue remained.
Jenkins was showing a warning recommending a move to JDK 11, as if something were deprecated, so I also tried the newer Jenkins version it mentioned. Going to Jenkins 2.344 with JDK 8 gave the same issue, trying other Jenkins versions didn't help either, and I lost hope.
I tried the biggest EC2 instance type for the slave; didn't help.
I checked htop on the slave; didn't help.
I tried restarting the Jenkins master; didn't help.
I tried changing the remote dir for the slave, as mentioned on Stack Overflow; didn't help.
I had a thought: since the Jenkins EC2 instance had been terminated and a new one had come up, things might have been updated in Jenkins as a result. The warning about moving to a new version of Jenkins and JDK 11 also looked like a possible lead.
I tried increasing the timeout to 20 minutes in the slave setup; didn't help.
I tried adding this command: sudo yum -y update --security to the init script of the node in the Jenkins EC2 plugin; didn't help.
We tried a JDK 11 image, a JDK 8 image and a newer JDK 8 Jenkins image; the issue was the same in all of them.
So, what finally solved the issue:
We moved to an older version of Jenkins:
https://hub.docker.com/layers/jenkins/jenkins/jenkins/2.330-jdk8/images/sha256-97fcb[…]17da34f0d07c021ab57083ee8c77dc4b21281d3498137?context=explore
Fixed by upgrading to Jenkins 2.344

Why is it unadvisable to run Jenkins on the same computer one develops on?

I have read four tutorials about getting started with Jenkins, and whilst they say it is possible to run Jenkins on the same computer one develops on, they all recommend installing it on a separate one, most commonly a Mac Mini. However, I only own a MacBook Pro, am short on cash, and am currently the only person contributing to my iOS projects (I want to learn Jenkins for future client work). So it would be better for me, for now, to use my MacBook for both purposes.
Whilst I appreciate this is a matter of opinion somewhat, I am wondering what the reason is for the recommendation of separation, and whether I might be able to run Jenkins on the MacBook for now?
Thank you for reading.
The advice to have a master server and a number of slave servers is only valid in a company (or big team) environment. Build jobs can be CPU and memory intensive, and often many developers start jobs on the server. In such cases one machine (acting as both master and slave at once) will be slow. Not only will the jobs take longer to finish, but even the web interface may become unresponsive.
For learning the basic configuration steps one machine is totally enough and you can even run your builds with your Jenkins instance.
I'm not entirely sure what the reason for that is in those tutorials. However, I can suggest an easy way to get started with Jenkins for free (that's how I usually run Jenkins for personal use). You can create a free account with one of the cloud providers, such as AWS, GCP or Azure, and have your Jenkins running there. For example, in AWS you can have a 1-year free trial account where you can spin up some free servers. There are many tutorials online that show, step by step, how to get started with Jenkins on AWS. Here are some high-level steps:
Create a free account in AWS (or any other cloud provider)
Spin up an EC2 instance - it can be any linux version or windows, whatever you are more comfortable with
SSH or RDP to the instance and install jenkins - there are exact installation steps for any flavor of your OS out there
Once the installation is complete, you will be able to access jenkins on your browser - in case of AWS, it would be the public ip of the server and default port 8080

Windows 10 Jenkins slave keeps rebooting and Jenkins won't start and Edge won't open

We have a Windows 10 VM acting as a Jenkins slave that keeps rebooting, although we have turned off Windows auto update. When it restarts, it tries to restart Jenkins, which throws this error:
> java.lang.Exception: The server rejected the connection: None of the protocols were accepted
> at hudson.remoting.Engine.onConnectionRejected(Engine.java:286)
> at hudson.remoting.Engine.run(Engine.java:262)
We then have to go in and manually start jenkins agent. This is not good for our automated tests that require this machine.
As an additional side note: Edge won't open. You can click on it all day and nothing appears to occur.
Any ideas on what is causing the windows reboot and how to stop it?
We wound up having our lab team create a new vm and deleting this one.
This sounds like a case of malware. Try downloading a free antivirus (get some help choosing one, as a lot of them contain malware themselves).
Best Regards
Shadow
Thanks!
Additional note: I've had this problem many times with Windows 8 in VirtualBox!

Jenkins Server Suddenly Fails. Cannot reach GUI

I set up a Jenkins server on a Red Hat Linux VM a while back to run our unit and integration tests. It has worked without much trouble for about two months, but now I suddenly can no longer browse to the GUI/HUB. I don't believe I have changed anything (I know everyone says that :) ); however, when I look at the logs I get the following errors:
WARNING: Untrapped servlet exception
winstone.ClientSocketException: Failed to write to client
at winstone.ClientOutputStream.write(ClientOutputStream.java:41)
The Jenkins service is running. I have restarted it and the VM, with no resolution to this issue. Even the Jenkins jobs that I have written are still running as far as I can tell, sending emails every now and again, but I cannot browse to the GUI. Has anyone run into something like this before? I've searched for this issue and some people suggest re-installing Jenkins, but I would rather not do that!
Alright, a long time later I finally figured it out. It turns out Winstone was not the issue; file permissions were to blame. Some of the files in my Jenkins folder, /var/lib/jenkins/, had root as their owner rather than jenkins. There were some in .m2, some in .grails, and others scattered all about; I'm not sure how this happened.
Anyway, I just ran the following command on the Jenkins home directory, /var/lib/jenkins:
chown -R jenkins:jenkins /var/lib/jenkins
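Before changing ownership recursively, it can be useful to see which files actually have the wrong owner. The following is a rough sketch of that check, assuming a Unix-like system; the function name is made up for illustration, and looking up the uid of the jenkins user is left to the caller.

```python
import os


def files_not_owned_by(root, uid):
    """Yield paths under root whose owner uid differs from uid.

    The root directory itself is not checked, only what is below it.
    lstat is used so that symlinks are reported by their own owner.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != uid:
                yield path

# Example call on the Jenkins server (Unix only):
# import pwd
# jenkins_uid = pwd.getpwnam("jenkins").pw_uid
# for path in files_not_owned_by("/var/lib/jenkins", jenkins_uid):
#     print(path)
```

Running this as root (or with sudo) avoids gaps caused by directories the current user cannot read.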

jenkins slave can't get started

I set up Jenkins on a Linux server and selected a Win7 PC as a slave. I chose "launch slave agents via java web start" in the slave configuration. When I use the following command to start the slave, the Jenkins slave agent window shows connected and then at once turns to "terminated". Can anybody help? Thanks a lot!
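rem Usage: <script> SLAVENAME [CYGPATH]
set SLAVENAME=%1
set CYGPATH=%2
rem Fall back to the default Cygwin location when no second argument is given.
if x%CYGPATH% == x set CYGPATH=C:\APPS\cygwin\bin
set PATH=%CYGPATH%;%PATH%
:RUN_SLAVE
echo %PATH%
rem MASTER is not set in this script, so it must already be defined in the environment.
javaws %MASTER%/computer/%SLAVENAME%/slave-agent.jnlp
exit 0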
My issue was that my drive was not set correctly. I had told it to use the D:\ drive because I had cloned the setup from another machine. However, that drive didn't exist (it couldn't access it to place the files).
There could be many things that are wrong, but because you report the connection is established and then terminated, I think you have an incorrect address for Jenkins itself in the Jenkins global configuration.
Jenkins does not use the "Jenkins URL" setting for a lot of things, but establishing connection with Java Web Start slaves is one of them, so please ensure "Jenkins URL" in the master configuration is set correctly.
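The script in the question builds the launch address as MASTER/computer/SLAVENAME/slave-agent.jnlp, so a wrong base URL produces a wrong launch address. As a small sketch of that same pattern (the function name and the example host are hypothetical, and this is not how Jenkins itself builds the address), the expected value can be computed and compared with what the agent is actually given:

```python
from urllib.parse import quote


def jnlp_url(jenkins_url, agent_name):
    """Build the launch address used in the question's script.

    Trailing slashes on the base URL are dropped, and the agent name
    is percent-encoded so that spaces do not break the address.
    """
    base = jenkins_url.rstrip("/")
    return f"{base}/computer/{quote(agent_name)}/slave-agent.jnlp"

# Example call (hypothetical host):
# jnlp_url("http://ci.example.com:8080/", "win7 box")
```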
Just putting it out there so that it might be useful for someone...
I also had the same issue, and on checking the master log (a good place to start if you have this issue), I found that the Remote FS Root was wrong. I was following the official tutorial, and even the tutorial suggests using "C:\Jenkins\" with the quotation marks, which is wrong!
Jenkins tries to copy some files to this path, but the quotation marks are not accepted as part of it. Setting it to C:\Jenkins\ (without the quotes) worked for me!
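Both Remote FS Root problems in this thread (stray quotation marks and a drive that does not exist) are easy to check by hand before saving the setting. Here is a minimal sketch of such a check; the helper names are made up for illustration and nothing here is part of Jenkins.

```python
import ntpath


def clean_remote_root(value):
    """Strip whitespace and one pair of surrounding double quotes."""
    value = value.strip()
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]
    return value


def drive_of(path):
    """Return the drive part of a Windows path, or '' if it has none.

    ntpath is used so the result is the same on any operating system.
    """
    return ntpath.splitdrive(path)[0]

# Example calls:
# clean_remote_root('"C:\\Jenkins\\"')
# drive_of("D:\\jenkins")
# On the agent machine, os.path.exists(drive_of(path) + "\\") then
# tells you whether that drive is really there.
```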

Resources