Jenkins Controller reports: Unable to create live FilePath for i-xxxxxxxxxxxxx and Agent is marked Offline
Googling this error indicates that it is a problem with the communication paths between Controller and Agent, but what?
Background:
Jenkins Controller running v2.332.1, Java 11 64bit OS, inside a docker container
Jenkins Agents running Swarm-Client jar downloaded from the Controller on startup. Swarm Plugin Version 3.32 Java 11 and 64bit OS, inside a docker container
Agents and Controller are hosted on separate EC2 instances in AWS with Security Group permissions on the relevant ports.
The instance starts up, runs Cloud-Init, downloads swarm-client.jar from the Jenkins Controller and then runs it with the parameters required to connect to the Controller. I mention this to avoid the "are you using the correct version" comments :-)
The Agent connects and is all fully online and gets busy servicing the pending Job queue.
Then, after an indeterminate time, the connection drops. Some jobs run for more than 24 hours without failing; other jobs last only minutes and sometimes fail.
Things I have tried: (some)
The Swarm Client jar can use either WebSockets and connect to the FQDN of the Jenkins controller or use the JNLP protocol to connect to the IP and dedicated agent connection port (fixed value on the Controller).
Similar behavior is seen with either protocol.
Opening all the AWS Security Groups: in case there was another port, not mentioned, that needed to be open.
Bypass AWS Load balancer: Agent connects directly to Controller IP:PORT via JNLP
Matching Versions: Swarm Client downloaded from Controller
Updated Versions: Jenkins 2.319.3, 2.332.1
Normalized Java environments: Java 11 64bit OS
Enabled Logging on the Agents: periodic communications happens and then stops after a while, without obvious reason.
Increased Controller Instance size: m5.xlarge -> m5.2xlarge
Bumping Jenkins up to a non-LTS version allowed the connections to become more stable.
Jenkins 2.341 and Swarm-Client version 3.32 both use Remoting version 4.13
Now, while I am not particularly happy about running a non-LTS version of Jenkins, I am pleased to have found a workaround
I have also struggled with this issue, so I am adding details here so that others don't have to.
Here is everything I tried:
Everything worked while both master and slave were on JDK 8.
Then we changed both to JDK 11, and I replaced the Jenkins EC2 instance with a new one via the ASG.
The issue appeared, so we reverted, but the issue remained.
Jenkins was showing a warning recommending a move to JDK 11, so I wondered whether something had been deprecated and also tried the newer Jenkins version the warning mentioned. Going to Jenkins 2.344 with JDK 8 gave the same issue, other Jenkins versions didn't help either, and I lost hope.
I tried the biggest EC2 instance type for the slave -- didn't help.
I checked htop on the slave -- didn't help.
I tried restarting the Jenkins master -- didn't help.
I tried changing the remote dir for the slave, as suggested on Stack Overflow -- didn't help.
Then I had a thought: since the old Jenkins EC2 instance was terminated and a new one came up, things may have been updated in Jenkins along the way. Jenkins was also showing a warning recommending a newer Jenkins version and JDK 11, so that looked like some hope to me.
I tried increasing the timeout to 20 min in the slave setup -- didn't help.
I tried adding the command sudo yum -y update --security to the init script of the node in the Jenkins EC2 plugin; that didn't help either.
We tried a JDK 11 image, a JDK 8 image and a newer JDK 8 Jenkins image; the issue was the same in all of them.
So, what finally solved the issue:
we moved to an older version of Jenkins:
https://hub.docker.com/layers/jenkins/jenkins/jenkins/2.330-jdk8/images/sha256-97fcb[…]17da34f0d07c021ab57083ee8c77dc4b21281d3498137?context=explore
Fixed by upgrading to Jenkins 2.344
I am getting the "Please wait while Jenkins is restarting" message. I tried to install a plugin yesterday and it has been showing that message ever since. I have restarted the Jenkins service, but it still isn't working.
Any help would be much appreciated.
The simplest thing which will work almost every time is:
Login to the (windows) server where Jenkins is hosted.
Open CMD Prompt as administrator.
Go to path where jenkins.exe file is placed.
Enter command: jenkins.exe stop and then jenkins.exe start
For Linux server, kill the process and restart again.
We have a Windows 10 VM acting as a Jenkins slave that keeps rebooting even though we have turned off Windows automatic updates. When it restarts, it tries to restart the Jenkins agent, which throws this error:
> java.lang.Exception: The server rejected the connection: None of the protocols were accepted
> at hudson.remoting.Engine.onConnectionRejected(Engine.java:286)
> at hudson.remoting.Engine.run(Engine.java:262)
We then have to go in and manually start the Jenkins agent. This is not good for our automated tests that require this machine.
As an additional side note: Edge won't open. You can click on it all day and nothing appears to occur.
Any ideas on what is causing the windows reboot and how to stop it?
We wound up having our lab team create a new vm and deleting this one.
This sounds like a case of malware. Try downloading a free antivirus (get some help choosing one, since a lot of them contain malware themselves).
Best Regards
Shadow
Thanks!
Additional note: I've had this problem many times with Windows 8 in VirtualBox!
I updated some plugins and restarted the jenkins but now it says:
Please wait while Jenkins is restarting
Your browser will reload automatically when Jenkins is ready.
It is taking too long (I have been waiting for the last 40 minutes). I have only 1 project with around 20 builds. I have restarted Jenkins many times before and it worked fine, but now it is stuck.
Is there any way out to kill/suspend jenkins to avoid this wait?
I had a very similar issue when using the Jenkins built-in restart function. To fix it I killed the service (with crossed fingers), but somehow it kept serving the "Please wait" page. I guess it is served by a separate thread, but since I could not see any running java or jenkins processes, I restarted the server to stop it.
After the reboot Jenkins worked, but it was not updated. To make it work I ran the update again and restarted the Jenkins service manually - it took less than a minute and worked just fine...
Jenkins seems to have a number of bugs related to restarting, and at least one unresolved: jenkins issue
Windows ONLY....
None of the solutions here worked for me, and restarting the server was not an option. If you are in the same situation, this may help:
I had to kill java.exe and restart the Jenkins service. After I did this, Jenkins reloaded several times and then went back to normal.
I was stuck on the Jenkins restarting page for 10-ish minutes until I did this.
Hope this helps.
Running this in the command line helped me:
service jenkins restart
I had a similar issue when updating plugins from the plugin update page with the "restart Jenkins" option checked. Jenkins only showed the waiting message for a long time.
I solved the issue by restoring the .bak files of the plugins that I had tried to update to .jpi files.
I did the following in my Jenkins:
cd $JENKINS_HOME/plugins/
>sudo mv git.bak git.jpi
.
. (more plugins files)
.
>sudo mv ldap.bak ldap.jpi
>sudo /sbin/service jenkins restart
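If there are many plugins to roll back, the renames above can be looped. This is only a sketch: restore_plugin_backups is a made-up helper name (not a Jenkins command), and it assumes every .bak file in the directory is a plugin backup that you really want to restore over the updated .jpi.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename every <name>.bak in a plugins directory
# to <name>.jpi, overwriting the updated .jpi if one exists.
restore_plugin_backups() {
  local plugins_dir="$1"
  local bak
  for bak in "$plugins_dir"/*.bak; do
    # If nothing matches, the glob stays literal, so skip missing paths.
    [ -e "$bak" ] || continue
    mv -f -- "$bak" "${bak%.bak}.jpi"
    echo "restored ${bak##*/}"
  done
}

# Example (run with enough privileges to write to the plugins directory):
#   restore_plugin_backups "$JENKINS_HOME/plugins"
#   sudo /sbin/service jenkins restart
```

Stop Jenkins or restart it right after the renames, as in the commands above, so the restored plugins are the ones that get loaded.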
Check Event Viewer.
I found that my Java died.
Faulting application java.exe, version 7.0.250.17, time stamp 0x51c4b3fd, faulting module ntdll.dll, version 6.0.6002.18541, time stamp 0x4ec3e39f, exception code 0xc0000374, fault offset 0x000abc4f, process id 0x1188, application start time 0x01cee4f42968bc81.
Finally I found that it's a Jenkins 1.540 problem. Don't use it.
https://issues.jenkins-ci.org/browse/JENKINS-20630
I faced the same issue after upgrading some plugins on Windows. Looking at jenkins.err.log, I found this error:
Exception in thread "main" java.io.IOException: Jenkins has failed to create a temporary file in C:\Users\builder\AppData\Local\Temp\
at Main.extractFromJar(Main.java:350)
at Main._main(Main.java:194)
at Main.main(Main.java:91)
Caused by: java.io.IOException: There is not enough space on the disk
at java.io.WinNTFileSystem.createFileExclusively(Native Method)
at java.io.File.createTempFile(Unknown Source)
at Main.extractFromJar(Main.java:347)
... 2 more
The problem was that the TEMP folder of the jenkins user had lots of temporary files. After cleaning that folder jenkins restarted correctly.
I just restarted the server. That fixed the issue!
In Command prompt execute this
C:\>service jenkins restart
Or
Open the list of services running on your machine (Win + R, then services.msc), search for Jenkins and click Restart.
For me, the cause seemed to be having lots of old job build logs hanging around. To clean them up, I ran:
cd $JENKINS_HOME/jobs
find -name 'builds' | xargs -n 1 bash -c 'rm -rf $0/[1-9]*'
Then I stopped and started Jenkins again, and it came up within a minute.
Credit to: https://stackoverflow.com/a/39230597/2255242
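Since that rm -rf is irreversible, it may be worth listing what it would touch first. A minimal dry-run sketch, assuming the usual layout where each job has a builds folder containing numbered build directories; list_build_dirs is a hypothetical helper name, and it only lists real directories (not symlinks or other files).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the numbered build directories under every
# "builds" folder below the given jobs directory, without deleting anything.
list_build_dirs() {
  local jobs_dir="$1"
  # -prune stops find from descending into a build directory once matched,
  # so only the top-level numbered directories are printed.
  find "$jobs_dir" -type d -path '*/builds/[1-9]*' -prune -print | sort
}

# Example:
#   list_build_dirs "$JENKINS_HOME/jobs" | wc -l    # how many would go
#   list_build_dirs "$JENKINS_HOME/jobs" | head     # a sample of them
```

If the list looks right, the cleanup command above can be run with more confidence.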
This is an old thread.. but my personal recommendation is to WAIT before attempting to do anything (such as restarting service, etc).
I wasted hours once trying to fix something that turned out to be not an issue in the first place. In the end, I messed things up and wasted a lot of time.
Just because you see errors in the logs doesn't necessarily mean that you need to take action.
The upgrade took about 45 minutes in the end for me. All I did at one point was refresh my browser window. It can take a while.
Just my opinion
On Win 10: Stopping with the service command from the command line reported failure to stop the service, but I was able to stop it from services.msc (running as administrator). The updates were applied. Sorry, no definitive answer from me. YMMV.
I used TCPView and killed the processes that were using port 8080. Basically they were all java.exe processes from Jenkins. I killed all of them and restarted the Jenkins service.
Try restarting it from the Windows Services console; that should work.
I have observed the same issue after installing a plugin and opting to restart the jenkins when no jobs are running.
When I looked at the jenkins server process, it was running fine and no issues.
On restarting the jenkins service using the below command and reloading the browser, Jenkins was up.
sudo service jenkins restart
If Jenkins is taking an unusually long time to restart the best recourse is to check the generated logs to see what may be wrong. However, even that may be of little help because many plugins try to be "quiet" by default, even if they are furiously working to load content. So if all else fails, you may have to resort to manually disabling plugins.
However here is a free tip: Some plugins are known to be messy. For example the Job Config History plugin we observed to write hundreds of thousands of records for both job configuration changes AND agent changes. Removing this plugin, and deleting the configHistory folder fixed one problem where our startup literally took > 4 hours.
In our case, the problem was we were launching ephemeral agents (via docker and/or kubernetes). Each new "agent" was treated as a configuration change. With thousands of agents per day, it didn't take long to fill up a substantial part of the disk with history that never was effectively cleared.
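To see whether a history folder like this has grown out of control on your own installation, a quick size check is usually enough. A rough sketch; report_dir_usage is a made-up helper name, and the folder to pass in is whatever directory your plugin writes its history to (the path in the example is an assumption, check your own JENKINS_HOME).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many files a directory holds and how
# much disk space it uses (in KB).
report_dir_usage() {
  local dir="$1"
  local files kb
  files=$(find "$dir" -type f | wc -l | tr -d ' ')
  kb=$(du -sk "$dir" | cut -f1)
  echo "$dir: $files files, $kb KB"
}

# Example (the folder name here is an assumption):
#   report_dir_usage "$JENKINS_HOME/configHistory"
```

Hundreds of thousands of files in one of these folders would be a strong hint that it is what slows the startup down.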
There are other plugins that leak data in this way. And you can also create self-inflicted wounds, e.g. by using a standalone process to remove "obsolete" files. An example where we were "bitten" is a process that tried to discard old build records, but did an incomplete job - and was "warring" with the running Jenkins process. Jenkins will try breaking its neck to load a build.xml record that is empty or incomplete.
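Empty build records of that kind are easy to look for. A small sketch, assuming build records are files named build.xml somewhere under the jobs directory; find_empty_build_records is a hypothetical name, and it only catches zero-byte files, not records that are truncated halfway.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every zero-byte build.xml below the given
# jobs directory. Nothing is modified or deleted.
find_empty_build_records() {
  local jobs_dir="$1"
  find "$jobs_dir" -type f -name build.xml -size 0 -print | sort
}

# Example:
#   find_empty_build_records "$JENKINS_HOME/jobs"
```

What to do with the hits (delete the whole build directory, restore from backup) is a separate decision; the sketch deliberately only reports.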
Three more tips:
You can install the monitoring plugin. Often when the jenkins UI proper didn't start, we were able to see the /monitoring in action.
Likewise, /userContent can often be loaded even when the rest of the UI is not fully up.
Don't rule out bad actors. It just takes one aggressive script that tries, e.g. to load the entire build history and ship it back via a REST call to effectively deny service to all other UI users.
I fixed it by editing a file named hudson.model.UpdateCenter.xml, located in /var/lib/jenkins.
I changed the URL to https://mirrors.tuna.tsinghua.edu.cn/jenkins/updates/update-center.json
Finally, I restarted Jenkins. That solved my problem.
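The edit can also be scripted. This is a sketch under assumptions: set_update_center_url is a made-up helper, and it assumes the file keeps the update site address in a single-line <url>...</url> element, so check your own file before relying on it. It keeps a copy of the original next to the file as .orig.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the contents of every <url>...</url>
# element in the given file, keeping a backup copy as <file>.orig.
# The new URL must not contain '|' or '&' (both are special to this sed call).
set_update_center_url() {
  local config_file="$1" new_url="$2"
  cp -p -- "$config_file" "$config_file.orig"
  sed "s|<url>[^<]*</url>|<url>${new_url}</url>|" "$config_file.orig" > "$config_file"
}

# Example (run as a user allowed to write to the Jenkins home directory):
#   set_update_center_url /var/lib/jenkins/hudson.model.UpdateCenter.xml \
#     https://mirrors.tuna.tsinghua.edu.cn/jenkins/updates/update-center.json
#   sudo service jenkins restart
```

If the restart doesn't help, copying the .orig file back restores the previous setting.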
I set up Jenkins on a Linux server and chose a Win7 PC as a slave. I chose "launch slave agents via java web start" in the slave configuration. When I use the following command to start the slave, the Jenkins slave agent window shows "connected" and then immediately turns to "terminated". Can anybody help? Thanks a lot!
set SLAVENAME=%1
set CYGPATH=%2
if x%CYGPATH% == x set CYGPATH=C:\APPS\cygwin\bin
set PATH=%CYGPATH%;%PATH%
:RUN_SLAVE
echo %PATH%
javaws %MASTER%/computer/%SLAVENAME%/slave-agent.jnlp
exit 0
My issue was that the drive was not set correctly. I had told it to use the D:\ drive because I had cloned the setup from another machine, but that drive didn't exist, so the agent couldn't access it to place its files.
There could be many things that are wrong, but because you report the connection is established and then terminated, I think you have an incorrect address for Jenkins itself in the Jenkins global configuration.
Jenkins does not use the "Jenkins URL" setting for a lot of things, but establishing connection with Java Web Start slaves is one of them, so please ensure "Jenkins URL" in the master configuration is set correctly.
Just putting it out there so that it might be useful for someone...
I also had the same issue, and on checking the master log (a good place to start if you have this issue), I found that the Remote FS Root was wrong. I was following the official tutorial, and even the tutorial suggests entering "C:\Jenkins\" with the quotation marks, which is wrong!
Jenkins tries to copy some files to this path, and the quotation marks are not accepted as part of it. Setting it to C:\Jenkins\ without the quotes worked for me!