How to auto-restart Jenkins when it is not responding - jenkins

We are facing an issue with our Jenkins instance in production: when multiple jobs are queued, and during long-running jobs such as code scanning, Jenkins hangs and stops responding.
Most of the time we need to restart Jenkins manually, so as a workaround we are looking for the following:
A way to automatically detect when Jenkins has hung or stopped responding.
A way to automatically restart the Jenkins instance when it has been unresponsive for more than 30 seconds.
The approach should not restart the Jenkins instance unless it is unresponsive as described and a restart is actually required.
Is there any way to implement these steps with our cloud service, Azure DevOps, so that a pipeline can be triggered in such scenarios?
Hopefully this is a known issue for the experts here; we would appreciate guidance on how to get rid of such issues.
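One common pattern (not specific to Azure DevOps) is an external watchdog that probes the Jenkins HTTP endpoint and restarts the service only after several consecutive failed probes. Below is a minimal sketch of such a watchdog, assuming Jenkins runs as a systemd service named jenkins and answers on http://localhost:8080; the URL, service name, and thresholds are assumptions to adjust for your environment.

```python
#!/usr/bin/env python3
"""Watchdog sketch: restart Jenkins only when it stops answering HTTP probes.

Assumptions: Jenkins listens on http://localhost:8080, runs as the systemd
service "jenkins", and a GET to /login normally answers within 30 seconds.
"""
import subprocess
import time
import urllib.request

JENKINS_URL = "http://localhost:8080/login"   # assumed probe endpoint
PROBE_TIMEOUT = 30           # seconds without a response before a probe fails
FAILURES_BEFORE_RESTART = 3  # require consecutive failures, not a single blip
PROBE_INTERVAL = 30          # seconds between probes


def jenkins_responds() -> bool:
    """Return True if Jenkins answers the probe URL within PROBE_TIMEOUT."""
    try:
        with urllib.request.urlopen(JENKINS_URL, timeout=PROBE_TIMEOUT) as resp:
            return resp.status < 500
    except OSError:
        # URLError, connection errors and timeouts are all OSError subclasses.
        return False


def restart_jenkins() -> None:
    """Restart the Jenkins service (assumes systemd and sudo rights)."""
    subprocess.run(["sudo", "systemctl", "restart", "jenkins"], check=False)


def main() -> None:
    consecutive_failures = 0
    while True:
        if jenkins_responds():
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= FAILURES_BEFORE_RESTART:
                restart_jenkins()
                consecutive_failures = 0
        time.sleep(PROBE_INTERVAL)


if __name__ == "__main__":
    main()
```

Because the restart only fires after several failed probes in a row, a healthy but busy instance is left alone. The same probe logic could run from a scheduled Azure DevOps pipeline against the public Jenkins URL, issuing the restart over SSH, but a watchdog on the Jenkins host itself avoids the extra network hop.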

Related

How to reboot a Jenkins slave in a pipeline without the job failing

Here is the thing: I have a program that may get stuck sometimes, and when it happens I need to reboot my machine.
So I want to reboot my Jenkins slave when the program gets stuck, and then continue executing the rest of my program without marking the whole job as failed.
Can anyone tell me how to do that?
Actually I wanted to add this as a comment but I don't have enough reputation.
You may want to use the Restart from Stage feature, as documented here.
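Restart from Stage covers resuming the pipeline afterwards, but something still has to notice the hang and trigger the reboot. Below is a minimal sketch of a watchdog wrapper a stage could call instead of invoking the program directly; the program name, timeout, and reboot command are assumptions.

```python
"""Hypothetical wrapper for a program that sometimes hangs: run it with a
hard timeout and request a reboot of this slave if it gets stuck.

The command, the 15-minute limit, and `sudo reboot` (Linux) are assumptions.
"""
import subprocess
import sys

STUCK_TIMEOUT = 15 * 60  # treat the program as stuck after 15 minutes


def run_with_watchdog(cmd):
    try:
        # Run the possibly-hanging program; subprocess kills it on timeout.
        return subprocess.run(cmd, timeout=STUCK_TIMEOUT).returncode
    except subprocess.TimeoutExpired:
        # Program is stuck: ask the OS to reboot this slave.
        subprocess.run(["sudo", "reboot"], check=False)
        return 1


if __name__ == "__main__":
    sys.exit(run_with_watchdog(sys.argv[1:] or ["./my_program"]))
```

Once the slave has rebooted and reconnected, the aborted run can be resumed with Restart from Stage rather than re-running the whole job.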

Call jobs on demand

We have a Docker container which is a CLI application: it runs, does its thing, and exits.
I have been assigned to put this into Kubernetes, but the container cannot be deployed as a regular Deployment because it exits and is then considered to be in a crash loop.
So the next question is whether it can be put in a Job. The Job would run and get restarted every time a request comes in over the proxy. Is that possible? Can a Job be restarted externally with different parameters in Kubernetes?
So the next question is whether it can be put in a Job.
If it is supposed to just run once, a Kubernetes Job is a good fit.
The Job would run and get restarted every time a request comes in over the proxy. Is that possible?
This cannot easily be done without external add-ons. Consider using Knative for this.
Can a Job be restarted externally with different parameters in Kubernetes?
Not easily. If I understand you correctly, you need to interact with the Kubernetes API to create a new Job. One way to do this is to have a Job with a kubectl image and RBAC permissions on its ServiceAccount that allow creating new Jobs - but this will involve some latency, since it is two Jobs.
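Instead of an in-cluster Job running a kubectl image, the same "create a new Job per request" step can be done from the proxy/handler process with the official Kubernetes Python client. A minimal sketch, in which the image name, namespace, arguments, and the RBAC setup granting create on Jobs are assumptions:

```python
"""Sketch: create a fresh Kubernetes Job with per-request parameters.

Assumes the `kubernetes` Python client is installed and that the caller's
ServiceAccount/kubeconfig is allowed to create Jobs in the target namespace.
Image name, namespace, and arguments are placeholders.
"""
import uuid

from kubernetes import client, config


def launch_cli_job(args, namespace="default"):
    # Load credentials: in-cluster when running inside a pod, else kubeconfig.
    try:
        config.load_incluster_config()
    except config.ConfigException:
        config.load_kube_config()

    job_name = f"cli-run-{uuid.uuid4().hex[:8]}"  # unique name per request
    container = client.V1Container(
        name="cli",
        image="registry.example.com/my-cli:latest",  # placeholder image
        args=list(args),                             # the per-request parameters
    )
    template = client.V1PodTemplateSpec(
        spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
    )
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=job_name),
        spec=client.V1JobSpec(template=template, backoff_limit=0),
    )
    client.BatchV1Api().create_namespaced_job(namespace=namespace, body=job)
    return job_name


if __name__ == "__main__":
    print(launch_cli_job(["--input", "example"]))
```

Each request gets a uniquely named Job, so runs do not collide, but something still has to clean up finished Jobs (for example ttl_seconds_after_finished on the Job spec). Knative removes most of this plumbing, which is why it was suggested above.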

How to restart interrupted Jenkins jobs after a server or node failure/restart?

I'm running a Jenkins server and some slaves on a Docker swarm that's hosted on preemptible Google Cloud instances (akin to AWS spot instances). I've got everything set up so that at any given moment there is a Jenkins master running on a single server and slaves running on every other server in the swarm. When one server gets terminated, another is spun up to replace it, so Jenkins eventually comes back up on another machine even if its server was stopped, and slaves get replaced as they die.
I'm facing two problems:
The first is that when the Jenkins master dies and comes back online, it tries to resume the jobs that were previously running, and they end up stuck trying to build. Is there any way to automatically have Jenkins restart interrupted jobs instead of trying to resume them?
The second is when a slave dies I'd like to automatically restart any jobs that were running on it elsewhere. Is there any way to do that?
Currently I'm dealing with both situations by having an external application retry the failed build jobs, but that's not really optimal.
Thanks!
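For the external-retry workaround mentioned above, the Jenkins JSON REST API is usually enough: check the result of each watched job's last build and queue a new build when it ended FAILURE or ABORTED. A minimal sketch, where the URL, credentials, and job names are placeholders:

```python
"""Sketch of an external retry helper: re-trigger Jenkins jobs whose last
build did not finish successfully (e.g. it was aborted when the master or a
slave died). URL, credentials, and job names are placeholders; parameterized
jobs would need /buildWithParameters instead of /build.
"""
import requests

JENKINS_URL = "https://jenkins.example.com"      # placeholder
AUTH = ("automation-user", "api-token")          # placeholder user + API token
JOBS_TO_WATCH = ["build-app", "code-scan"]       # placeholder job names


def last_build_result(job: str):
    """Return the last build's result ("SUCCESS", "FAILURE", "ABORTED", ...)
    or None while the build is still running."""
    r = requests.get(f"{JENKINS_URL}/job/{job}/lastBuild/api/json",
                     auth=AUTH, timeout=30)
    r.raise_for_status()
    return r.json().get("result")


def retrigger(job: str) -> None:
    """Queue a new build of the job. With API-token auth, recent Jenkins
    versions do not require a CSRF crumb for this POST."""
    r = requests.post(f"{JENKINS_URL}/job/{job}/build", auth=AUTH, timeout=30)
    r.raise_for_status()


if __name__ == "__main__":
    for job in JOBS_TO_WATCH:
        if last_build_result(job) in ("FAILURE", "ABORTED"):
            retrigger(job)
```

Run on a schedule (cron or a scheduled pipeline), this retries whole builds from scratch rather than resuming them, which sidesteps the stuck-resume behaviour described in the question.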

Jenkins docker-plugin - Job does not start (waiting for executor)

I'm trying (not hard enough, it seems) to get our Jenkins server to provision a Jenkins slave using Docker.
I have installed the Docker-plugin and configured it according to the description on the page. I have also tested the connectivity and at least this part works.
I have also configured 1 label in the plugin and in my job. I even get a nice page showing me the connected jobs for this slave.
When I then try to start a build nothing really happens. A build is scheduled, but never started - (pending—Waiting for next available executor).
From the message it would seem that Jenkins is not able to start the slave via Docker.
I'm using docker 1.6.2 and the plugin is 0.10.1.
Any clue to what is going on would be much appreciated!
It seems the problem was that I had specified the Docker version in the plugin config. That is apparently a no-go, according to this post.

One execution per Windows VMware VM as Jenkins slaves?

I am trying to run some automated acceptance tests on a Windows VM but am running into some problems.
Here is what I want: a job which always runs on a freshly reverted VM. This job will get an MSI installer from an upstream job, install it, and then run some automated tests on it, in this case using Robot Framework (but that doesn't really matter here).
I have set up the slave in the vSphere plugin to have only one executor and to disconnect after one execution. On disconnect it shuts down and reverts. My hope was that this meant it would run one Jenkins job and then revert, the next job would get a fresh snapshot, and so on.
The problem is that if a job is queued waiting for the VM slave, it starts as soon as the first job finishes, before the VM has shut down and reverted. The signal to shut down and revert has already been sent, however, so the next job fails almost immediately as the VM shuts down.
Everything works fine as long as jobs needing the VM aren't queued while another is running, but if they are I run into this problem.
Can anyone suggest a way to fix this?
Am I better off using vSphere build steps rather than setting up a build slave in this fashion? If so, how exactly do I go about getting the same workflow using build steps and (I assume) pipelined builds?
Thanks
You can set a 'Quiet period' - it's in Advanced Project Options when you create a build. You should set it on the parent job; it is the time to wait before executing the dependent job.
If you increase the wait time, the server will have gone down before the second job starts.
It turns out the version of the vSphere plugin I was using was outdated; this problem is fixed in the newer version.
