Jenkins Agents "Unable to create live FilePath" and marked offline - jenkins

Jenkins Controller reports "Unable to create live FilePath for i-xxxxxxxxxxxxx" and the Agent is marked offline.
Googling this error indicates that it is a problem with the communication paths between Controller and Agent, but what?
Background:
Jenkins Controller running v2.332.1, Java 11 64bit OS, inside a docker container
Jenkins Agents running Swarm-Client jar downloaded from the Controller on startup. Swarm Plugin Version 3.32 Java 11 and 64bit OS, inside a docker container
Agents and Controller are hosted on separate EC2 instances in AWS with Security Group permissions on the relevant ports.
The instance starts up, runs Cloud-Init, downloads the swarm-client.jar from the Jenkins Controller, and then runs it with the parameters required to connect to the Controller. I mention this to avoid the "are you using the correct version" comments :-)
The Agent connects and is all fully online and gets busy servicing the pending Job queue.
Then, after an indeterminate amount of time, the Agent drops offline: some jobs run for more than 24 hours without failing, while others run for only minutes and sometimes fail.
Things I have tried: (some)
The Swarm Client jar can use either WebSockets and connect to the FQDN of the Jenkins controller or use the JNLP protocol to connect to the IP and dedicated agent connection port (fixed value on the Controller).
Similar behavior is seen with either protocol.
Opening all the AWS Security Groups: in case there was another port, not mentioned, that needed to be open.
Bypass AWS Load balancer: Agent connects directly to Controller IP:PORT via JNLP
Matching Versions: Swarm Client downloaded from Controller
Updated Versions: Jenkins 2.319.3, 2.332.1
Normalized Java environments: Java 11 64bit OS
Enabled Logging on the Agents: periodic communication happens and then stops after a while, for no obvious reason.
Increased Controller Instance size: m5.xlarge -> m5.2xlarge
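When checking ports and Security Groups as in the list above, it may help to read the agent port that the Controller itself advertises. To my knowledge a Jenkins controller answers on /tcpSlaveAgentListener/ with an X-Jenkins-JNLP-Port header when the inbound TCP agent port is enabled; treat that header name as an assumption and verify it against your version. The function name and the example host below are hypothetical. A minimal sketch:

```shell
# Hypothetical helper: read HTTP response headers on stdin and print the
# value of the X-Jenkins-JNLP-Port header (assumed header name), if present.
agent_port_from_headers() {
    # Strip carriage returns, then match the header name case-insensitively.
    tr -d '\r' | awk -F': *' 'tolower($1) == "x-jenkins-jnlp-port" { print $2 }'
}

# Example (jenkins.example.com is a placeholder, not a real host):
#   curl -sI https://jenkins.example.com/tcpSlaveAgentListener/ | agent_port_from_headers
```

If nothing is printed, the header was not in the response, which would suggest the fixed agent port is not enabled or the request did not reach the Controller.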

Bumping Jenkins up to a non-LTS version allowed the connections to become more stable.
Jenkins 2.341 and Swarm-Client version 3.32 both use Remoting version 4.13
Now, while I am not particularly happy about running a non-LTS version of Jenkins, I am pleased to have found a workaround

I have also struggled with this issue, so I am adding details here so that others don't have to.
This is everything I tried:
Everything was working while we had JDK 8 on both master and slave.
We then added code to use JDK 11 on both, and I replaced the Jenkins EC2 instance with a new one with the help of an ASG.
The issue appeared, so we reverted, but the issue remained.
Jenkins showed a warning recommending a move to JDK 11, so I wondered whether something had been deprecated, and I also tried the newer Jenkins version it mentioned. Going to Jenkins 2.344 with JDK 8 gave the same issue, switching to other Jenkins versions didn't help either, and I lost hope.
I tried the biggest EC2 instance type for the slave; it didn't help.
I checked htop on the slave; it didn't help.
I tried restarting the Jenkins master; it didn't help.
I tried changing the remote dir for the slave, as mentioned on Stack Overflow; it didn't help.
My thinking was that, because the Jenkins EC2 instance had been terminated and a new one had come up, things in Jenkins might have been updated along the way. Together with the warning recommending a new version of Jenkins and JDK 11, that looked like some hope to me.
I tried increasing the timeout to 20 minutes in the slave setup; it didn't help.
I tried adding the command sudo yum -y update --security to the init script of the node in the Jenkins EC2 plugin; it didn't help.
We tried a JDK 11 image, a JDK 8 image, and a newer JDK 8 Jenkins image; the issue was the same in all of them.
What finally solved the issue:
We moved to an older version of Jenkins:
https://hub.docker.com/layers/jenkins/jenkins/jenkins/2.330-jdk8/images/sha256-97fcb[…]17da34f0d07c021ab57083ee8c77dc4b21281d3498137?context=explore

Fixed by upgrading to Jenkins 2.344

Related

Unable to download the Jenkins plugins running on Google Cloud Platform

I'm running Jenkins as a Docker container on a Virtual Machine on Google Cloud Platform. On the very first screen of setup, I can see that a lot of plugins did not install on my Jenkins server.
Please let me know how to resolve this issue. Is it due to default cloud security settings that restrict downloading of plugins?
See the following link for a screenshot:
https://storage.googleapis.com/mydockerissues/Jenkins%20Plugins%20Issue.PNG
Cheers
Something similar happened to me when running Jenkins on Docker on my local machine. To get everything to install I had to keep retrying. It took several retries but eventually I got everything installed.
I'm not sure why this is the case. Maybe it fails downloads whose dependencies aren't installed yet?

Jenkins Slave Service Not Running: How to Debug

I have a Jenkins set-up where there is a master running on OS X with a Windows slave running on the same box as VM.
On many occasions when the VM is restarted, the Jenkins service appears either not to start or possibly to encounter an error.
The set-up of the service looks correct and the VM is configured to automatically log in as the Jenkins user. When it's started manually everything seems to work fine, so I can only assume the problem is on start-up of the box.
I have two questions:
Are there any well-known gotchas that can cause this?
Does anyone have some good strategies for debugging this? I'm assuming the answer will be somewhere in the Windows Logs but finding it is proving difficult (since the box and the user both contain the word Jenkins a simple find isn't helpful).

Jenkins won't start after plugin installation *and does not log anything*

I installed Jenkins' Gradle plugin and used the automatic restart option via the Jenkins web interface. Jenkins seemed to hang on the "restarting..." page, so I finally tried to manually restart the Jenkins service on the server (64-bit Debian 7) using service jenkins restart.
Now, Jenkins is no longer running at all (verified with ps -ef | grep -i [J]enkins and service jenkins status), and when I try service jenkins [re]start, I see an [ ok ] message but nothing else seems to happen. I've deleted /var/log/jenkins/jenkins.log, and each time I try a service start (or restart), the log file reappears, but it's blank (ls -lA shows that the file was recently made, but cat produces no output). I also tried rebooting the server, with no effect. I finally deleted the Gradle folders under /var/lib/jenkins/plugins, which also did not appear to make a difference.
How do I even begin to approach this problem? Should I just re-install Jenkins?
EDIT: System info:
> uname -a
Linux AUC-Workstation1 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u1 x86_64 GNU/Linux
According to dpkg -l, I'm using Debian's jenkins package, version 1.617.
EDIT 2: I'm actually using the jenkins package provided directly by Jenkins, as per the instructions here.
I just had a problem where multiple Jenkins plugins were breaking Jenkins startup (after an upgrade) and here is the procedure I followed to resolve the issue, which might work for other plugin startup issues.
I'm working on an Ubuntu server, but I expect that this would work for Debian if it's going to work at all - I encourage others to adjust the procedure:
1. logged into the server and switched to the jenkins user (sudo su jenkins in my case)
2. went to the main jenkins directory
3. renamed plugins to plugins.problems_YYYYMMDD
   (previously, I attempted to disable the plugins, but this did not work for me: the system still would not start)
4. created an empty directory plugins
5. restarted jenkins (sudo service jenkins restart)
   (in my case, this started just fine)
6. iteratively followed this procedure to add plugins back in:
   a. copied 1 or more plugins from plugins.problems_YYYYMMDD/ to plugins/
   b. restarted jenkins
   c. went to the plugin center and installed updates as available
      (sometimes I needed to install updates in a particular order due to dependencies)
   d. evaluated results in 'Manage Old Data'
      (I think I'm facing some manual updates of the old data)
Note: if you know which plugins are likely the problem, then it is easier to just disable or temporarily (re)move them rather than (re)moving all of the plugins!
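The procedure above could be sketched as two small shell functions. This is only a sketch under assumptions: the function names are hypothetical, the Jenkins home directory is passed in as an argument (for example /var/lib/jenkins on a package install), and the functions must be run as a user that can write to that directory.

```shell
# Hypothetical helpers; the names are illustrative and not part of Jenkins.

# Move the whole plugins directory aside and create an empty one.
# Prints the path of the backup directory on success.
# Usage: quarantine_plugins /var/lib/jenkins
quarantine_plugins() {
    home="$1"
    backup="$home/plugins.problems_$(date +%Y%m%d)"
    if [ -e "$backup" ]; then
        echo "backup $backup already exists, refusing to overwrite" >&2
        return 1
    fi
    mv "$home/plugins" "$backup" && mkdir "$home/plugins" && echo "$backup"
}

# Copy one plugin (file or directory) back from the backup directory.
# Usage: restore_plugin /var/lib/jenkins/plugins.problems_YYYYMMDD /var/lib/jenkins git.jpi
restore_plugin() {
    backup="$1"
    home="$2"
    plugin="$3"
    cp -pR "$backup/$plugin" "$home/plugins/"
}
```

Jenkins still has to be restarted after each step (sudo service jenkins restart, as in the procedure), so that only the plugins copied back so far are loaded.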
I never did figure out the initial problem, but I did get Jenkins working again, sort of.
I uninstalled Jenkins (using apt-get purge) and then re-installed it. This time it failed to start because it needed Java 7, but I apparently only had Java 6 installed (this surprised me, because I thought I had previously configured Jenkins to use Java 7 on that machine). So I installed openjdk-7-jdk and openjdk-7-jre, set JAVA and JAVA_HOME appropriately in the Jenkins config file, and started the service again. This allowed Jenkins to start.

please wait while jenkins is restarting- waiting long

I updated some plugins and restarted the jenkins but now it says:
Please wait while Jenkins is restarting
Your browser will reload automatically when Jenkins is ready.
It is taking too long (I have been waiting for the last 40 minutes). I have only 1 project with around 20 builds. I have restarted Jenkins many times before and it worked fine, but now it is stuck.
Is there any way out to kill/suspend jenkins to avoid this wait?
I had a very similar issue when using Jenkins' built-in restart function. To fix it I killed the service (with crossed fingers), but somehow it kept serving the "Please wait" page. I guess it is served by a separate thread, but since I could not see any running java or jenkins processes, I restarted the server to stop it.
After the reboot Jenkins worked, but it was not updated. To make it work I ran the update again and restarted the jenkins service manually. It took less than a minute and worked just fine.
Jenkins seems to have a number of bugs related to restarting, and at least one unresolved: jenkins issue
Windows ONLY....
None of the solutions here worked for me, and restarting the server was not an option. If you are in the same situation, this is what I did:
I had to kill java.exe and restart the jenkins service. After I did this, Jenkins reloaded several times and then went back to normal.
I was stuck on the Jenkins restarting page for 10-ish minutes until I did this.
Hope this helps.
Running this in the command line helped me:
service jenkins restart
I had a similar issue when updating plugins from the plugin update page with the restart Jenkins option checked. Jenkins only showed the waiting message for a long time.
I solved the issue by restoring the .bak files to .jpi for the plugins that I had tried to update.
I did the following on my Jenkins server:
cd $JENKINS_HOME/plugins/
>sudo mv git.bak git.jpi
.
. (more plugins files)
.
>sudo mv ldap.bak ldap.jpi
>sudo /sbin/service jenkins restart
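The per-plugin mv commands above could be wrapped in a loop. This is a minimal sketch, assuming every .bak file in the plugins directory should replace its updated .jpi; the function name is hypothetical, and it needs the same privileges as the sudo mv commands above.

```shell
# Hypothetical helper: roll every plugin that has a .bak copy back to it.
# Usage: restore_bak_plugins "$JENKINS_HOME/plugins"
restore_bak_plugins() {
    dir="$1"
    for bak in "$dir"/*.bak; do
        # If there are no .bak files the glob stays literal; skip it.
        [ -e "$bak" ] || continue
        # git.bak becomes git.jpi, overwriting the updated plugin file.
        mv -f "$bak" "${bak%.bak}.jpi"
        echo "restored ${bak%.bak}.jpi"
    done
}
```

To roll back only some plugins, the individual mv commands shown above remain the safer choice. Restart the jenkins service afterwards, as in the last command above.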
Check Event Viewer.
I found that my Java died.
Faulting application java.exe, version 7.0.250.17, time stamp 0x51c4b3fd, faulting module ntdll.dll, version 6.0.6002.18541, time stamp 0x4ec3e39f, exception code 0xc0000374, fault offset 0x000abc4f, process id 0x1188, application start time 0x01cee4f42968bc81.
Finally I found that it's a problem in Jenkins 1.540. Don't use it.
https://issues.jenkins-ci.org/browse/JENKINS-20630
I faced the same issue after upgrading some plugins on Windows. Looking at jenkins.err.log, I found this error:
Exception in thread "main" java.io.IOException: Jenkins has failed to create a temporary file in C:\Users\builder\AppData\Local\Temp\
at Main.extractFromJar(Main.java:350)
at Main._main(Main.java:194)
at Main.main(Main.java:91)
Caused by: java.io.IOException: There is not enough space on the disk
at java.io.WinNTFileSystem.createFileExclusively(Native Method)
at java.io.File.createTempFile(Unknown Source)
at Main.extractFromJar(Main.java:347)
... 2 more
The problem was that the TEMP folder of the jenkins user had lots of temporary files. After cleaning that folder jenkins restarted correctly.
I just restarted the server. That fixed the issue!
In Command prompt execute this
C:\>service jenkins restart
Or
you can open the list of services currently running on your machine (Win + R), search for Jenkins, and click Restart.
For me, the cause seemed to be having lots of old job build logs hanging around. To clean them up, I ran:
cd $JENKINS_HOME/jobs
find . -type d -name builds -exec sh -c 'rm -rf "$1"/[1-9]*' _ {} \;
Then I stopped and started Jenkins again, and it came up within a minute.
Credit to: https://stackoverflow.com/a/39230597/2255242
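Because the cleanup command above deletes build records permanently, it may be worth listing what it would match first. This is a minimal sketch with a hypothetical function name; it uses the same [1-9]* pattern and only prints paths, deleting nothing.

```shell
# Hypothetical helper: print every numbered build directory under a jobs
# directory, i.e. the paths the rm -rf command above would remove.
# Usage: list_old_builds "$JENKINS_HOME/jobs"
list_old_builds() {
    find "$1" -type d -name builds -exec sh -c '
        for d in "$1"/[1-9]*; do
            # Skip the literal pattern when a builds directory has no match.
            if [ -e "$d" ]; then echo "$d"; fi
        done
    ' _ {} \;
}
```

Piping the output through wc -l gives a quick count of how many build records would be removed.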
This is an old thread.. but my personal recommendation is to WAIT before attempting to do anything (such as restarting service, etc).
I wasted hours once trying to fix something that turned out to be not an issue in the first place. In the end, I messed things up and wasted a lot of time.
Just because you see errors in the logs doesn't necessarily mean that you need to take action.
The upgrade took about 45 minutes in the end for me. All I did at one point was refresh my browser window. It can take a while.
Just my opinion
On Win 10: Stopping with the service command from the command line reported failure to stop the service, but I was able to stop it from services.msc (running as administrator). The updates were applied. Sorry, no definitive answer from me. YMMV.
I used TCPView and killed the processes that were using port 8080. Basically it was all java.exe processes from Jenkins. I killed all of them and restarted the Jenkins service.
Try restarting it from the Windows Services console; it will work.
I have observed the same issue after installing a plugin and opting to restart the jenkins when no jobs are running.
When I looked at the jenkins server process, it was running fine and no issues.
On restarting the jenkins service using the below command and reloading the browser, Jenkins was up.
sudo service jenkins restart
If Jenkins is taking an unusually long time to restart the best recourse is to check the generated logs to see what may be wrong. However, even that may be of little help because many plugins try to be "quiet" by default, even if they are furiously working to load content. So if all else fails, you may have to resort to manually disabling plugins.
However, here is a free tip: some plugins are known to be messy. For example, we observed the Job Config History plugin writing hundreds of thousands of records for both job configuration changes AND agent changes. Removing this plugin and deleting the configHistory folder fixed one problem where our startup literally took > 4 hours.
In our case, the problem was we were launching ephemeral agents (via docker and/or kubernetes). Each new "agent" was treated as a configuration change. With thousands of agents per day, it didn't take long to fill up a substantial part of the disk with history that never was effectively cleared.
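To see whether a history folder like this has grown out of hand, a quick file count and size check may be enough. This is a minimal sketch; the function name is hypothetical, and the directory to inspect is an argument because its exact name and location depend on the plugin's configuration.

```shell
# Hypothetical helper: print "<file count> files, <size> KiB" for a directory.
# Usage: dir_usage /path/to/history/folder
dir_usage() {
    files=$(find "$1" -type f | wc -l | tr -d ' ')
    # du -sk prints "<KiB><TAB><path>"; keep only the first field.
    kib=$(du -sk "$1" | cut -f1)
    echo "$files files, $kib KiB"
}
```

A file count in the hundreds of thousands, as described above, would be a strong hint that the folder is worth cleaning up.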
There are other plugins that leak data in this way. And you can also create self-inflicted wounds, e.g. by using a standalone process to remove "obsolete" files. An example where we were "bitten" is a process that tried to discard old build records, but did an incomplete job - and was "warring" with the running Jenkins process. Jenkins will try breaking its neck to load a build.xml record that is empty or incomplete.
Three more tips:
You can install the monitoring plugin. Often when the jenkins UI proper didn't start, we were able to see the /monitoring in action.
Likewise, /userContent can often be loaded even when the rest of the UI is not fully up.
Don't rule out bad actors. It just takes one aggressive script that tries, e.g. to load the entire build history and ship it back via a REST call to effectively deny service to all other UI users.
I edited the file named hudson.model.UpdateCenter.xml, located in /var/lib/jenkins.
I changed the URL to https://mirrors.tuna.tsinghua.edu.cn/jenkins/updates/update-center.json
Finally, I restarted Jenkins. That solved my problem.
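The edit to hudson.model.UpdateCenter.xml could also be scripted. This is a minimal sketch, assuming the file stores the update site in a single-line <url>...</url> element; the function name is hypothetical, and the URL must not contain | or & characters, since those are special in the sed expression used here.

```shell
# Hypothetical helper: replace the <url> value in an update center file,
# keeping the previous version next to it as <file>.orig.
# Usage: set_update_center_url /var/lib/jenkins/hudson.model.UpdateCenter.xml NEW_URL
set_update_center_url() {
    file="$1"
    url="$2"
    cp -p "$file" "$file.orig" &&
        sed "s|<url>.*</url>|<url>$url</url>|" "$file.orig" > "$file"
}
```

Jenkins still needs a restart afterwards, as described above, and the .orig copy can be moved back if the new URL does not work.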

jenkins slave can't get started

I set up Jenkins on a Linux server and selected a Win7 PC as a slave. I chose "launch slave agents via java web start" in the slave configuration. When I use the following command to start the slave, the Jenkins slave agent window shows "connected" and then immediately turns to "terminated". Can anybody help? Thanks a lot!
set SLAVENAME=%1
set CYGPATH=%2
if x%CYGPATH% == x set CYGPATH=C:\APPS\cygwin\bin
set PATH=%CYGPATH%;%PATH%
:RUN_SLAVE
echo %PATH%
javaws %MASTER%/computer/%SLAVENAME%/slave-agent.jnlp
exit 0
My issue was that my drive was not set correctly. I had told it to use the D:\ drive because I had cloned the setup from another machine. However, that drive didn't exist (so it couldn't access it to place the files).
There could be many things that are wrong, but because you report the connection is established and then terminated, I think you have an incorrect address for Jenkins itself in the Jenkins global configuration.
Jenkins does not use the "Jenkins URL" setting for a lot of things, but establishing connection with Java Web Start slaves is one of them, so please ensure "Jenkins URL" in the master configuration is set correctly.
Just putting it out there so that it might be useful for someone...
I also had the same issue, and on checking the Master log (a nice place to start if you have this issue), I found that the Remote FS Root was wrong. I was following the official tutorial, and even the tutorial suggests using "C:\Jenkins\", which is wrong!
Jenkins tries to copy some files to this path but in this case, inverted commas are not accepted. So setting it to C:\Jenkins\ worked out for me!

Resources