How to reboot a Jenkins slave in a pipeline without failing the job

Here is the thing: I have a program that sometimes gets stuck, and when that happens I need to reboot my machine.
So I want to reboot my Jenkins slave when the program gets stuck, then continue executing the rest of my program without marking the whole job as failed.
Can anyone tell me how to do that?

Actually I wanted to add this as a comment but I don't have enough reputation.
You may want to use the Restart from Stage feature, as documented here.

Related

Jenkins build can't finish

For my build via a Groovy script (no Docker, Maven, or k8s; it just calls another sh file), when triggered by a timer, it sometimes cannot finish and fails with:
process apparently never started in jenkinsWs/myjob#tmp/durable-d6c09021
(running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
Cannot contact Slave29: java.io.FileNotFoundException: File 'jenkinsWs/myjob#tmp/durable-d6c09021/output.txt' does not exist
But it always works when I run it manually. I've used monitoring tools on my slave machine to track what happened, but saw nothing strange.
I've tried all the solutions from this, but no luck.
Thanks in advance for any idea.

How to auto-restart Jenkins when it's not responding

We are facing an issue with our Jenkins instance in the production environment: when multiple jobs are queued, and during long-running jobs like code scanning, the Jenkins instance hangs and stops responding.
Most of the time we need to restart Jenkins manually. So, as a workaround, we are looking for the following:
A way to automatically identify when Jenkins has stopped responding or is hung.
A way to automatically restart the Jenkins instance when it has not responded for more than 30 seconds.
The approach should not restart the Jenkins instance unless it is unresponsive as described and a restart is actually required.
Is there any way to make these steps available with our cloud service, Azure DevOps, so that the pipeline can be triggered in such scenarios?
We hope this is a known issue for most of the experts here, and we are looking for your guidance on how to get rid of such issues.
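The three requirements above (detect, restart, and restart only after a sustained outage) can be sketched as a small external watchdog. This is only a minimal sketch under assumptions, not a known Jenkins feature: the names `is_responding` and `watch`, the 5-second polling interval, the probe URL, and the restart command in the comments are all hypothetical and would need to match your setup.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_responding(url, timeout=5.0):
    """Return True if an HTTP GET to `url` gets any answer with a status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # A 4xx answer (e.g. 403 without credentials) still means the server is alive.
        return err.code < 500
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, reset, malformed response, etc.
        return False


def watch(probe, restart, grace_seconds=30.0, interval=5.0,
          clock=time.monotonic, sleep=time.sleep, max_checks=None):
    """Call `restart` only after `probe` has failed continuously for `grace_seconds`.

    Returns the number of restarts triggered. `max_checks=None` loops forever.
    """
    down_since = None  # time of the first failed probe in the current outage
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            down_since = None  # any successful probe resets the outage timer
        else:
            now = clock()
            if down_since is None:
                down_since = now
            elif now - down_since >= grace_seconds:
                restart()
                restarts += 1
                down_since = None  # start counting again after the restart
        sleep(interval)
    return restarts


# Example wiring (assumptions: Jenkins answers on http://localhost:8080/login
# and runs as a systemd service named "jenkins"):
#
#   import subprocess
#   watch(lambda: is_responding("http://localhost:8080/login"),
#         lambda: subprocess.run(["systemctl", "restart", "jenkins"], check=False))
```

Because a single successful probe resets the timer, a short hiccup during a long-running job does not trigger a restart; only 30 seconds of uninterrupted failures do. The watchdog has to run outside Jenkins itself (for example as a separate service), since a hung instance cannot run its own recovery job.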

The session for this agent already exists

I am using TFS to execute a nightly build that includes several steps that use the TFS Test Agent. I am running the latest version of TFS/Test Agent (2015 - Update 3), and no other builds are running at this time. Often (maybe half the time), when the nightly job runs, the step "Visual Studio Test Agent Deployment" fails with the following error:
The job has been abandoned because agent Agent-XXX did not renew the
lock. Ensure agent is running, not sleeping, and has not lost
communication with the service.
This is due to the error found in the Test Agent's log file (under _diag):
The session for this agent already exists. Sleeping for 30 seconds
before next retry.
Microsoft.TeamFoundation.DistributedTask.WebApi.TaskAgentSessionConflictException:
The task agent Agent-XXX already has an active session for owner XXX.
This issue is directly referenced here, and indirectly talked about here.
The solution I've found to this issue is to restart the server that the test agent is running on. This clears any dead sessions, and after the server starts back up, the tests run just fine. I think this is effectively what is being done in the previously mentioned post: the result of resetting the configs is that the service is restarted.
While this is presented as a solution in the linked article, it is only temporary. Even after the server has been restarted and the build runs successfully, the issue reappears the next day, necessitating manual intervention to get the build to run.
I could schedule a task to reset the service or even restart the server directly before the nightly build is run, but it strikes me as a bandage rather than a fix. Has anyone experienced this issue before, and if so is there any way to prevent it from occurring in the first place?
Update 1
I simply set up a build that runs 5 minutes before my main tests and executes a batch script to restart all the servers hosting my test agents. This is a workaround, but one that seems to resolve the issue. Hopefully someday someone can come up with a better solution than this, but for now, it's how I have to run automated testing in TFS.
Update 2
I have three servers now, and all three exhibit the same issue, though it is hard to pin down exactly when it occurs. Scaling up the workaround without creating downtime is proving to be quite challenging.
Update 3
A better day came: I upgraded TFS to 2018 and the build agent to the latest version, and this issue no longer occurs. I think it's a bug in the old build agent. I still don't have a solution for the original version of the build agent...
It sounds like an Agent.Listener.exe process was running somewhere on the machine, maybe as a service (not in a logged-in user session).
note, if an agent process is abruptly terminated while it has an active session, the session will eventually time out (after 5 minutes i think). and on startup, if an agent encounters a session conflict then it will retry for up to 5.5 minutes i think before giving up (enough time for an abruptly terminated session to expire).
i'm going to go ahead and close this and assume a process was running somewhere. we haven't had any issues in this area and haven't heard any other reports, so i don't think there is an issue here with the agent. if you find a repro, or it looks like i'm wrong then please reopen.
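To illustrate the timing described in that comment, here is a minimal sketch of a retry-until-deadline loop. It is not the agent's actual code: the function name `create_session_with_retry` is hypothetical, the 30-second retry interval comes from the log message quoted above, and the 5-minute session expiry and 5.5-minute give-up window are the commenter's own approximate figures.

```python
import time


def create_session_with_retry(try_create, retry_interval=30.0, give_up_after=330.0,
                              clock=time.monotonic, sleep=time.sleep):
    """Keep calling `try_create` until it succeeds or the retry window closes.

    `try_create` returns True when the session was created and False on a
    session conflict. Returns True on success, False after giving up.
    """
    deadline = clock() + give_up_after
    while True:
        if try_create():
            return True
        if clock() + retry_interval > deadline:
            # Another sleep would overshoot the window, so give up now.
            return False
        sleep(retry_interval)
```

With these numbers, a stale session that expires after 300 seconds is gone before the 330-second window closes, so a restarted agent should recover on its own. A conflict that outlasts the window suggests that another live process still holds the session, which is the explanation the comment settles on.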

Jenkins job is not stopping after completion of execution

I am running a Jenkins job using Maven, and once the job is completed it does not terminate until we terminate it manually. In the console output I can see the results, but it does not show build success and keeps showing the processing/loading symbol. Can anyone tell me how to stop the job after execution?
Do we need to terminate it manually, or
do we need to add anything in post-build to stop it after successful execution?
Do we have to set something in the configuration to terminate it automatically?
Can anyone please help me out?
A few things could be going on here:
Does the Maven build execute any other goals after running tests? If so, those could be hanging.
Does your build run on a slave? If so, Jenkins copies log files and other artifacts back to the master after completing the build steps but before marking the build as complete. You may have a network or I/O bottleneck here.
If you can't figure out the root cause and just want to have the build terminate without intervention, you can use the build timeout plugin.
If you have jobs running in parallel, then some plugins have to wait for the older jobs to finish before the current one can. Not sure if this is your situation.
After using webdriver.quit() in our Selenium project, it's working fine. The job completed and the reports were generated.

One execution per Windows VMware VM as Jenkins slaves?

I am trying to run some automated acceptance tests on a Windows VM but am running into some problems.
Here is what I want: a job which always runs on a freshly reverted VM. This job will get an MSI installer from an upstream job, install it, and then run some automated tests on it, in this case using Robot Framework (but that doesn't really matter here).
I have set up the slave in the vSphere plugin to have only one executor and to disconnect after one execution. On disconnect it shuts down and reverts. My hope was that this meant it would run one Jenkins job and then revert, the next job would get a fresh snapshot, and so would the next, and so on.
The problem is that if a job is in the queue waiting for the VM slave, the next job starts as soon as the first one finishes, before the VM has shut down and reverted. The signal to shut down and revert has already been sent, however, so the next job fails almost immediately as the VM shuts down.
Everything works fine as long as jobs needing the VM aren't queued while another is running, but if they are I run into this problem.
Can anyone suggest a way to fix this?
Am I better off using vSphere build steps rather than setting up a build slave in this fashion? If so, how exactly do I go about getting the same workflow to work using build steps and (I assume) pipelined builds?
Thanks
You can set a 'Quiet period'. It's in Advanced Project Options when you create a build. You should set it on the parent job; this is the time to wait before executing the dependent job.
If you increase the wait time, the server will go down before the second job starts...
It turns out the version of the vSphere plugin I was using was outdated; this bug is fixed in the newer version.
