My organization just updated our Jenkins instance, and we are now seeing problems with our pipeline jobs. All of our declarative pipeline jobs now stop at “[Pipeline] Start of Pipeline” and hang there indefinitely (see screenshot below).
When looking at what the job is doing in terms of executors, it seems to be sitting idly on the controller node and is not even in the build queue. We have 4 executors on the built-in node.
Background info: We are running our Jenkins instance on Windows Server 2012 on premises. Our recent update was to 2.362.2. I can’t remember exactly which version we updated from, but it was over 2 years old.
Does anyone have any troubleshooting steps we could try? We tried downgrading the relevant pipeline plugins, but that did not seem to help. We have also looked at the logs, but I am not sure whether I have found anything relevant. If anyone needs more context or info, I’d be happy to provide it; I just don’t know where to even begin troubleshooting this problem.
Related
Here is the thing: I have a program that sometimes gets stuck, and when that happens I need to reboot the machine.
So I want to reboot my Jenkins slave when the program gets stuck, then continue executing the rest of my job without marking the whole job as failed.
Can anyone tell me how to do that?
Actually I wanted to add this as a comment, but I don't have enough reputation.
You may want to use the Restart from Stage feature, as documented here.
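Restart from Stage works with Declarative Pipelines and restarts at top-level stage boundaries, so it helps to give the flaky program its own stage. A minimal sketch, where the agent label and run-program.bat script are hypothetical placeholders:

```groovy
pipeline {
    agent { label 'windows' }
    stages {
        stage('Prepare') {
            steps {
                checkout scm
            }
        }
        stage('Run program') {
            steps {
                // if this stage fails after rebooting the agent, you can restart the
                // build from this stage instead of rerunning everything from scratch
                bat 'run-program.bat'
            }
        }
        stage('Publish') {
            steps {
                archiveArtifacts artifacts: 'output/**', allowEmptyArchive: true
            }
        }
    }
}
```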
I am running a Jenkins multibranch job, and suddenly it will not let me save configuration changes; the page keeps loading without ever timing out.
Can someone please help me with this?
You could have a look at the Jenkins master machine's CPU and memory and see what is consuming them. I have seen this happen when the CPU is nearly at 100%. In that case, restarting the Jenkins process or the Jenkins master machine can help.
Try to remember, or ask colleagues, whether there have been any recent changes to the Jenkins master machine. We had similar issues after installing plugins.
Avoid executing jobs on the Jenkins master; use slave agents instead.
You may also need to clean up old builds if you are not doing this already.
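For pipeline jobs, one way to do that automatically is the buildDiscarder option; a minimal sketch, with example retention values you would tune to your own needs:

```groovy
pipeline {
    agent any
    options {
        // keep only the last 20 builds and keep artifacts for 14 days (example values)
        buildDiscarder(logRotator(numToKeepStr: '20', artifactDaysToKeepStr: '14'))
    }
    stages {
        stage('Build') {
            steps {
                echo 'Building...'
            }
        }
    }
}
```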
In my case, after disabling and re-enabling all plugins one by one, it turned out to be the "AWS SQS Build Trigger Plugin" that was causing the Save/Apply buttons to move and stop working.
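If you need to take stock of which plugins are installed and enabled while bisecting like that, a rough Script Console sketch (assuming admin access to Manage Jenkins > Script Console):

```groovy
// List installed plugins with their version and enabled state, sorted by name.
Jenkins.instance.pluginManager.plugins.sort { it.shortName }.each { plugin ->
    println "${plugin.shortName}:${plugin.version} enabled=${plugin.isEnabled()}"
}
```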
I have a job running on a Windows Server 2012 R2 agent. The job pauses between two plugins (BuildNameSetter v1.6.8 and DiscardOldBuilds v1.0.5), as you can see below:
13:05:25 Set build name.
13:05:25 New build name is '5.0.811.0'
13:20:21 Discard old builds...
I started noticing this strange behavior after upgrading the Jenkins master from 2.89 to 2.190.3.
It's frustrating to see your job taking a 15-minute nap!
Is this a server-side issue or an agent-side one?
Can someone give me some hints about how to tackle this problem?
Did you experience something similar?
You could have a look at the Jenkins central logs at /log/all to see if there is any Java stack trace in there.
Then you should try to isolate the issue: deactivate the build name setter step first, then disable the discard old builds step, then re-enable the build name setter while keeping discard old builds deactivated.
Once you know which plugin is causing the issue, try to downgrade or upgrade the plugin that makes your build hang.
If the issue comes from discard old builds, I would try cleaning the job's workspace and removing old builds manually (see the sketch after these steps).
Look for your issue in Jenkins's Jira system and upvote it; create a ticket if you cannot find another user experiencing the same issue.
Finally, you should be able to find workarounds for these plugins.
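For removing old builds manually, a rough Script Console sketch, assuming a job named 'my-job' (a hypothetical name) and that everything before build #100 should go:

```groovy
// Run in Manage Jenkins > Script Console; adjust the job name and the
// build-number cutoff before running, and test on a throwaway job first.
def job = Jenkins.instance.getItemByFullName('my-job')
job.builds.findAll { it.number < 100 }.each { it.delete() }
```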
Today I upgraded Jenkins to a newer version (2.263.2) on the Jenkins server and the 15-minute pause disappeared.
I build an OpenEmbedded image with a Jenkins pipeline. The pipeline ends successfully (according to the logs). The finished workspace is about 40 GB. The problem is that even though the pipeline finishes, the job freezes for at least several hours. I am not sure if it ever recovers, as I have always killed it (I cannot afford to have Jenkins blocked for several days).
I don't observe this when building something small (~1 GB). I also don't observe it when I wipe out the entire build as the last step. And I don't have any deployment explicitly set up.
What could be wrong?
Fixed. The problem was caused by the Cppcheck plugin, which was left activated for all jobs on Jenkins. After disabling the plugin, the big jobs end normally.
There was a question about how I figured this out. I realized that there was an active plugin performing post-processing on the build output, and that the build output was really, really big. So I tried disabling the plugin, and it helped.
Lesson learned: don't put any post-processing plugins under the "Overview and statistics of all builds" view, as they are activated for all builds, even the undesirable ones.
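One way to keep that kind of post-processing scoped to the jobs that actually need it is to configure it per pipeline instead of globally; a minimal sketch, assuming the warnings-ng plugin's recordIssues step as a modern replacement for the Cppcheck plugin:

```groovy
pipeline {
    agent any
    stages {
        stage('Analyse') {
            steps {
                // produce a Cppcheck XML report for this job only
                sh 'cppcheck --xml --xml-version=2 src/ 2> cppcheck.xml'
            }
        }
    }
    post {
        always {
            // parse the report in this pipeline only, not for every build on the instance
            recordIssues tool: cppCheck(pattern: 'cppcheck.xml')
        }
    }
}
```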
I have many long-running jobs that take almost a day to complete. Splitting them is not possible. If the network fails, all progress is lost.
How can a slave survive disconnections?
EDIT 1
I have around 300 slaves running on Windows, tied to a single Jenkins instance.
The slaves are connected using the manual method: java -jar slave.jar -jnlpUrl <serverUrl> <slaveName>. I cannot run them as a regular Windows service because some tests manipulate GUI elements and require a real interactive session; otherwise the tests fail.
EDIT 2
According to the Jenkins Cookbook, I should be using the Cygwin + OpenSSH approach instead of a custom script with the JNLP connector. Could this improve stability?
Jenkins was not originally designed for builds to survive server or slave restarts. There is a CloudBees Long-Running Build plugin that supports long-running builds, but unfortunately it is available only to enterprise users and is still in beta.
I didn't find any free alternative, so I would suggest trying to improve your network stability and splitting your long-running jobs. At the very least you can divide your tests into logical groups (test suites).
Jenkins now has a Workflow plugin. It claims to handle server restarts and loss of connectivity with slaves.
From the link:
A key feature of a workflow execution is that it's suspendable. That is, while the workflow is running your script, you can shut down Jenkins or lose connectivity to a slave. When it comes back, Jenkins will still remember what it was doing, and your workflow script resumes execution as if it was never interrupted. A technique known as "continuation-passing style" execution plays a key role in achieving this.
(not tested at all)
Edit: copied from @Jesse Glick's comments:
Workflow is open source and available to anyone running Jenkins 1.580.1 or later. CloudBees Jenkins Enterprise does include a checkpoint feature, but that is not necessary simply to have a build survive slave disconnections and Jenkins restarts: that is automatic.
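A minimal sketch of what that looks like with a (scripted) Pipeline job; the agent label and the run-long-tests.bat script are hypothetical placeholders:

```groovy
// Because Pipeline execution is suspendable, a durable step like bat/sh can
// carry on while the controller restarts or the agent connection drops,
// and the build picks up again once Jenkins and the agent are back.
node('windows') {
    stage('Checkout') {
        checkout scm   // assumes the Pipeline script itself comes from SCM
    }
    stage('Long tests') {
        bat 'run-long-tests.bat'   // the day-long test run
    }
}
```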