We're using Jenkins CI (version 1.454) as the execution engine for automated tests (triggered via Ant). One of my tests runs for at least 1-2 days. I couldn't find out exactly how long, since Jenkins always terminates the test before it finishes, last time after ~20.5 h.
We have the "Build timeout" plugin installed, but the test project in question doesn't have a timeout configured, so it shouldn't be interrupted. The tests are running on a slave node. The global Jenkins configuration does not contain any timeout settings.
I've seen two other people having a similar problem, but no answers so far.
Stacktrace:
INFO: Test-Linux-lts-mc #31 aborted
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at hudson.remoting.Request.call(Request.java:127)
at hudson.remoting.Channel.call(Channel.java:681)
at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
at $Proxy37.join(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:859)
at hudson.Launcher$ProcStarter.join(Launcher.java:345)
at hudson.tasks.Ant.perform(Ant.java:217)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:703)
at hudson.model.Build$RunnerImpl.build(Build.java:178)
at hudson.model.Build$RunnerImpl.doRun(Build.java:139)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:473)
at hudson.model.Run.run(Run.java:1408)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:238)
The console log doesn't show any indication of an error or timeout. It just says "Build aborted" and cleans up the node's workspace.
Jenkins itself does not have a timeout. I have 24 hour performance tests and Jenkins does not abort them.
The java.lang.InterruptedException makes me think there is a signal from your build process which escapes and hits Jenkins. Are you running the Ant target using the "Invoke Ant" build step? If you are, you could try an "Execute shell" build step instead and run ant yourtarget in the shell.
I have a Jenkins Job DSL job that worked well until about January (it is not used that often). Last week, the job failed with the error message ERROR: java.io.IOException: Failed to persist config.xml (no stack trace, just that message). There were no changes to the job since the last successful execution in January.
[...]
13:06:22 Processing provided DSL script
13:06:22 New run name is '#15 (Branch_B20_2_x)'
13:06:22 ERROR: java.io.IOException: Failed to persist config.xml
13:06:22 [WS-CLEANUP] Deleting project workspace...
13:06:22 [WS-CLEANUP] Deferred wipeout is used...
13:06:22 [WS-CLEANUP] done
13:06:22 Finished: FAILURE
I thought that some plugin might have been updated between January and now, making the DSL script invalid, so I changed my DSL script to the simplest one I could imagine (the example from the job-dsl plugin page):
job('example') {
steps {
shell('echo Hello World!')
}
}
But the job still fails with the exact same error.
I checked the Jenkins logs, but there is nothing relevant in them.
I am running Jenkins in a Docker swarm container, and each job is executed in its own build agent container using the docker-swarm-plugin (no changes to that either; it worked in January).
The Docker daemon logs also show no errors.
The filesystem for the Jenkins workspace is not full, and the user in the build agent container has write access to that filesystem.
It does not even work when I mount an empty tmpfs as the workspace.
Does anyone have an idea what goes wrong or at least a hint where to continue searching for that error?
Jenkins version: 2.281
job-dsl plugin version: 1.77
Docker version: 20.10.4
The problem was solved by updating Jenkins to 2.289.
It seems there was some problem with the combination of the earlier versions. I will keep you updated if any of the next updates changes anything.
Problem:
My job failed while deleting the workspace.
[14:12:28] [WS-CLEANUP] Deleting project workspace...
[14:12:37] ERROR: [WS-CLEANUP] Cannot delete workspace: remote file operation failed: c:\jenkins\workspace\v3000.0.0-CI-3-QA-LoadRunner_2012_18 at hudson.remoting.Channel@387682a8:JNLP4-connect connection from 192.168.11.149/192.168.11.149:34302: java.io.IOException: Unable to delete 'c:\jenkins\workspace\v3000.0.0-CI-3-QA-LoadRunner_2012_18.git\objects\pack\pack-edfa06b8e1b0e8e57d244e1d6085bd6fedeb8392.pack'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts.
[14:12:37] ERROR: Cannot delete workspace: remote file operation failed: c:\jenkins\workspace\v3000.0.0-CI-3-QA-LoadRunner_2012_18 at hudson.remoting.Channel@387682a8:JNLP4-connect connection from 192.168.11.149/192.168.11.149:34302: java.io.IOException: Unable to delete 'c:\jenkins\workspace\v3000.0.0-CI-3-QA-LoadRunner_2012_18.git\objects\pack\pack-edfa06b8e1b0e8e57d244e1d6085bd6fedeb8392.pack'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts.
[14:12:37] Recording test results
[14:12:37] ERROR: Step ‘Publish Micro Focus tests result’ failed: Test reports were found but none of them are new. Did leafNodes run?
Workaround:
Manually log in to the slave machine and restart the Jenkins slave agent, which apparently keeps a file locked there.
After that I can rebuild.
Desired solution:
I am thinking about automatically restarting the Jenkins slave agent before each run. Any ideas?
The problem was solved by upgrading to Micro Focus Application Automation Tools v5.9.
I have a few Jenkins build jobs for compiling a .NET application using npm, msbuild, and unit test case execution steps.
The build reports success, but the job then hangs after the exit 0 status. Because of that, downstream jobs are delayed and fail after a long wait.
Is some process blocking, or is it a plugin issue?
PS: I don't want to use the build-timeout plugin, as the build already shows success; it just takes too long to exit from the job.
I had the same issue and added "-DSoftKillWaitSeconds=0" in jenkins.xml before the -jar option. Now jobs execute normally for me.
I've configured a Jenkins matrix job that has a total of 4 axis points. For each axis there is a single build step (parameterized-trigger) triggering another project adding the current build parameters. All four jobs are triggering the same job with different parameters.
The problem I am seeing is that only one downstream job gets executed; I would expect 4.
This is NOT a post-build task, it is setup as a build step. This is the description in the Parameterized Trigger Plugin wiki page:
Build step
When using the trigger parameterized build as a buildstep
it will be called for every different configuration, so if triggering
another project with no parameters it will be done the same number of
times as you have configurations, possible causing the triggered job
to run more than once.
However this also allows you to trigger other jobs with parameters
relating to the current configuration, i.e. triggering a build on the
same node with the same JDK.
This is the log entry for the downstream job. It shows all four triggers, but they all seem to be launching the exact same job:
Started by upstream project "AndroidLibraries_Mx_Branch_5_1_x/ProductType=video,SecurityType=standard" build number 12
originally caused by:
Started by upstream project "AndroidLibraries_Mx_Branch_5_1_x" build number 12
originally caused by:
Started by user anonymous
Started by upstream project "AndroidLibraries_Mx_Branch_5_1_x/ProductType=video,SecurityType=secure" build number 12
originally caused by:
Started by upstream project "AndroidLibraries_Mx_Branch_5_1_x" build number 12
originally caused by:
Started by user anonymous
Started by upstream project "AndroidLibraries_Mx_Branch_5_1_x/ProductType=voice,SecurityType=standard" build number 12
originally caused by:
Started by upstream project "AndroidLibraries_Mx_Branch_5_1_x" build number 12
originally caused by:
Started by user anonymous
Started by upstream project "AndroidLibraries_Mx_Branch_5_1_x/ProductType=voice,SecurityType=secure" build number 12
originally caused by:
Started by upstream project "AndroidLibraries_Mx_Branch_5_1_x" build number 12
originally caused by:
Started by user anonymous
[Pipeline] node
Running on master in /var/lib/jenkins/workspace/AndroidLibrary_pipeline
<... job details ... >
If it makes any difference, the downstream job is a pipeline job as noted by the name.
Jenkins 2.19.2
Matrix Plugin 1.7.1
Parameterized Trigger Plugin 2.32
Pipeline 2.4
I've done a good bit of searching, and while I found a similar issue, the accepted answer does not help.
I figured it out. The parameters were not being passed correctly to the child job. I had the "Current build parameters" option set in the parameterized trigger build step, but I also had to add "Predefined parameters" containing the parameters that the child job was expecting:
SecurityType=${SecurityType}
ProductType=${ProductType}
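A likely explanation for the single downstream build, stated here as an assumption rather than something confirmed in this thread: Jenkins folds queued requests for the same job into one build when their parameters are identical, which would match the log above, where one build lists four "Started by upstream project" causes. The following is a toy Python model of that behaviour, not Jenkins code; `QueuedBuild` and `schedule` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class QueuedBuild:
    """One pending build of the downstream job (toy model)."""
    params: frozenset                      # parameter name/value pairs
    causes: list = field(default_factory=list)


def schedule(queue, params, cause):
    """Add a build request, folding it into a queued build with identical parameters."""
    key = frozenset(params.items())
    for item in queue:
        if item.params == key:
            item.causes.append(cause)      # duplicate request: only the cause is recorded
            return item
    item = QueuedBuild(key, [cause])
    queue.append(item)
    return item


AXES = [("video", "standard"), ("video", "secure"),
        ("voice", "standard"), ("voice", "secure")]

# Case 1: each configuration forwards only the parent's parameters, which are
# identical for all four configurations (modelled here as an empty set).
same = []
for product, security in AXES:
    schedule(same, {}, f"ProductType={product},SecurityType={security}")

# Case 2: each configuration also passes its own axis values as predefined parameters.
distinct = []
for product, security in AXES:
    schedule(distinct,
             {"ProductType": product, "SecurityType": security},
             f"ProductType={product},SecurityType={security}")

print(len(same), len(same[0].causes))  # 1 4
print(len(distinct))                   # 4
```

In the first case the four requests collapse into one build with four causes; in the second, the differing axis values keep them apart, which is consistent with the fix described above.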
I am trying to follow one of Kohsuke's approaches to organizing my builds by using job promotion on CloudBees DEV@cloud. In his presentation everything worked like a charm (although he ran his examples on his own Jenkins instance deployed on localhost, whereas I run my Jenkins jobs in DEV@cloud).
Basically, I have a couple of jobs. My main job, called package, in the folder joy defines a promotion process named "promotion-to-e2e-testing" (the criterion is the completion of one of the downstream jobs, and no extra action is defined for the promotion). On the Jenkins dashboard I can see that this promotion was successful. Nevertheless, a job called e2e-testing in the folder joy, configured with the trigger "Build when another project is promoted" (Job name: "joy/package", Promotion: "promotion-to-e2e-testing"), is not fired!
I have looked at the Jenkins system logs (via the CloudBees Manage Jenkins link) and I can see:
May 20, 2013 6:04:33 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel s-8770fc61
java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at hudson.remoting.Command.readFrom(Command.java:92)
at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
May 20, 2013 6:04:26 AM hudson.model.Run execute
INFO: joy » package » promotion » promotion-to-e2e-testing #5 main build action completed: SUCCESS
May 20, 2013 6:04:26 AM hudson.slaves.WorkspaceList log
FINE: Executor #0 for s-8770fc61 : executing joy » package » promotion » promotion-to-e2e-testing #5 acquired /scratch/jenkins/workspace/joy/package
May 20, 2013 6:04:25 AM hudson.slaves.ChannelPinger setUpPingForChannel
Reading from the bottom to the top, it seems that the I/O error in SynchronousCommandTransport is logged after the successful promotion. To make sure that the exception is not intermittent, I have run my jobs a couple of times, but I still see the same exception in the logs and my e2e-testing job is not fired.
Can anyone help me with that? Maybe triggering jobs by promotion is not available on CloudBees? Or maybe it is because my jobs are stored in a Jenkins folder (all jobs being in a single folder)?
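The ordering of those events can be double-checked by sorting the entry headers by timestamp. This is a minimal Python sketch, assuming the log is listed newest first and that the timestamps use the English month and AM/PM format shown in the log; `parse_header` is a hypothetical helper name.

```python
from datetime import datetime

# Entry headers copied from the system log above, in the order they appear there.
HEADERS = [
    "May 20, 2013 6:04:33 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run",
    "May 20, 2013 6:04:26 AM hudson.model.Run execute",
    "May 20, 2013 6:04:26 AM hudson.slaves.WorkspaceList log",
    "May 20, 2013 6:04:25 AM hudson.slaves.ChannelPinger setUpPingForChannel",
]


def parse_header(line):
    """Split a header line into (timestamp, source)."""
    parts = line.split(" ")
    stamp = " ".join(parts[:5])        # e.g. "May 20, 2013 6:04:33 AM"
    source = " ".join(parts[5:])       # logger class and method
    return datetime.strptime(stamp, "%b %d, %Y %I:%M:%S %p"), source


# Reverse first (the log is assumed newest first), then sort by timestamp.
# sorted() is stable, so entries within the same second keep their reversed order.
chronological = sorted((parse_header(h) for h in reversed(HEADERS)),
                       key=lambda entry: entry[0])

for stamp, source in chronological:
    print(stamp.time(), source)

# Seconds between the promotion's "Run execute" entry and the channel error.
promotion_time = next(t for t, s in chronological if s == "hudson.model.Run execute")
gap_seconds = (chronological[-1][0] - promotion_time).total_seconds()
print(gap_seconds)  # 7.0
```

The last entry in chronological order is the channel error, seven seconds after the promotion build completed.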
configured with trigger Build when another project is promoted: Job name: joy/package
I suspect you have hit one of the symptoms of JENKINS-17955. If my hypothesis is correct, joy » package will work as the upstream name even though joy/package is really what the plugin ought to be expecting. But I have not yet dug further and tried to reproduce and fix it.