How to restart interrupted Jenkins jobs after a server or node failure/restart? - jenkins

I'm running a Jenkins server and some slaves on a Docker swarm that's hosted on preemptible Google Cloud instances (akin to AWS spot instances). I've got everything set up so that at any given moment there is a Jenkins master running on a single server and slaves running on every other server in the swarm. When one server gets terminated another is spun up to replace it, so Jenkins eventually comes back up on another machine even if its server was stopped, and slaves get replaced as they die.
I'm facing two problems:
My first problem is that when the Jenkins master dies and comes back online, it tries to resume the jobs that were previously running, and those builds end up stuck. Is there any way to have Jenkins automatically restart interrupted jobs from the beginning instead of trying to resume them?
The second is that when a slave dies, I'd like any jobs that were running on it to be automatically restarted elsewhere. Is there any way to do that?
Currently I'm dealing with both situations by having an external application retry the failed build jobs, but that's not really optimal.
Thanks!
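For reference, a minimal sketch of the external-retry approach mentioned above, assuming standard Jenkins REST endpoints; the URL, credentials and job names are placeholders, not anything from the original setup:

```python
# Sketch: re-trigger Jenkins jobs whose last build was interrupted.
# JENKINS_URL, the credentials and the job names below are placeholders.
import requests

JENKINS_URL = "http://jenkins.example.com"
AUTH = ("admin", "api-token")            # user name + API token
JOBS_TO_WATCH = ["integration-tests"]    # hypothetical job names

for job in JOBS_TO_WATCH:
    info = requests.get(
        f"{JENKINS_URL}/job/{job}/lastBuild/api/json", auth=AUTH
    ).json()
    # A build cut short by a master/slave failure typically ends up
    # ABORTED or FAILURE once Jenkins gives up resuming it.
    if not info.get("building") and info.get("result") in ("ABORTED", "FAILURE"):
        # Queue a fresh build instead of resuming the broken one.
        requests.post(f"{JENKINS_URL}/job/{job}/build", auth=AUTH)
```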

Related

How to auto-restart Jenkins when it's not responding

We are facing an issue with our Jenkins instance in the production environment: when multiple jobs are queued, and during long-running jobs like code scanning, the Jenkins instance hangs and stops responding.
Most of the time we need to restart Jenkins manually, so as a workaround we are looking for the scenario below.
Get a way to automatically identify when Jenkins has become unresponsive or hung.
Get a way to automatically restart the Jenkins instance when it has not been responding for more than 30 seconds.
The approach should not restart the Jenkins instance unless it is unresponsive as described and a restart is required.
Is there any way to have these steps available with our cloud service, Azure DevOps, so that a pipeline can be triggered in such scenarios?
Hope this is a known issue for most of the experts here, and I'm looking for your guidance on how we can get rid of such issues.
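For the detect-and-restart part, here is a rough watchdog sketch, assuming Jenkins runs as a systemd service on the same host; the URL, timeout and service name are placeholders, and it deliberately restarts only when the instance is actually unresponsive:

```python
# Sketch of a watchdog that restarts Jenkins only when it stops responding.
# Assumes Jenkins runs as the "jenkins" systemd service on this host.
import subprocess
import time

import requests

JENKINS_URL = "http://localhost:8080/login"
TIMEOUT_SECONDS = 30      # "not responding for more than 30 seconds"
CHECK_INTERVAL = 60

def jenkins_is_responding() -> bool:
    try:
        return requests.get(JENKINS_URL, timeout=TIMEOUT_SECONDS).ok
    except requests.RequestException:
        return False

while True:
    if not jenkins_is_responding():
        # Only restart when the instance is actually unresponsive.
        subprocess.run(["systemctl", "restart", "jenkins"], check=False)
    time.sleep(CHECK_INTERVAL)
```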

Call jobs on demand

We have a Docker container which is a CLI application: it runs, does its thing, and exits.
I got the assignment to put this into Kubernetes, but that container cannot be deployed as a regular workload because it exits and is then considered to be crash-looping.
So the next question is whether it can be put in a Job. The Job would run and get restarted every time a request comes in over the proxy. Is that possible? Can a Job be restarted externally with different parameters in Kubernetes?
So the next question is whether it can be put in a Job.
If it is supposed to just run once, a Kubernetes Job is a good fit.
The Job would run and get restarted every time a request comes in over the proxy. Is that possible?
This cannot easily be done without external add-ons. Consider using Knative for this.
Can a Job be restarted externally with different parameters in Kubernetes?
Not easily; you need to interact with the Kubernetes API and create a new Job for this, if I understand you correctly. One way to do it is to have a Job with a kubectl image and proper RBAC permissions on the ServiceAccount to create new Jobs - but this will involve some latency since it is two Jobs.
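To illustrate the "interact with the Kubernetes API" approach, here is a rough sketch using the official Kubernetes Python client to create one Job per request, passing the request-specific parameters as container args; the image, namespace and args are placeholders:

```python
# Sketch: create a new Kubernetes Job per incoming request, with the
# request parameters passed as container args. Image name, namespace
# and args are placeholders.
import uuid
from kubernetes import client, config

def run_cli_job(args):
    config.load_kube_config()          # or config.load_incluster_config()
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=f"cli-run-{uuid.uuid4().hex[:8]}"),
        spec=client.V1JobSpec(
            backoff_limit=0,
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",   # exiting is fine for a Job; no crash loop
                    containers=[client.V1Container(
                        name="cli",
                        image="registry.example.com/my-cli:latest",
                        args=args,
                    )],
                ),
            ),
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)

# e.g. called by the proxy/handler for each incoming request
run_cli_job(["--input", "request-123"])
```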

Update Jenkins during long-running jobs

I want to upgrade my Jenkins master without aborting long-running jobs on the slaves or waiting for them to finish. Is there a plugin available that provides this feature?
We have several build jobs running regression and integration tests which take hours to run. Often at least one of those jobs is running, making it hard to restart Jenkins after updates. I know that it is possible to block the queue. We tried this, but it hinders more than it helps.
What we are looking for is a plugin that runs jobs on slaves, caches the output as soon as the connection to the master is interrupted, and sends the remaining output to the master when the master is up again. Does anybody know a plugin providing this feature?

Can I use Jenkins Slaves for automated testing on different operating systems?

I am setting up a CI workflow using Jenkins. I have various code bases that I would like to be able to test on different operating systems, from Windows Server 2012 back through 2003, and also Red Hat, etc.
I'm wondering if using Jenkins slaves would be an effective solution for this.
Specific questions are things like:
If a master executes a project, where is the project defined vs where does the job execute?
If I want to execute a job that tests a language I don't want to support on the master's operating system (think Ruby on Windows), do I still need to make the master aware of that language in order to define the job, say by installing the relevant plugin?
If I define a slave that's running inside a VM and I stop the VM, then when the VM comes back up, am I going to have to run some sort of start-up task on the slave, or a pre-execute task on the master, to re-register the slave before I can start a project running on it?
When the slave task completes and the results are reported back, are those results stored on the master such that I can shut down the slave and still have access to previous test run results and trending information?
Thanks in advance for any advice.
If a master executes a project, where is the project defined vs where does the job execute?
The jobs are defined and stored on the Master; they are executed on the Slave machines. You can control which jobs get executed on which Slaves by using labels.
If I want to execute a job that tests a language I don't want to support on the master's operating system (think Ruby on Windows), do I still need to make the master aware of that language in order to define the job, say by installing the relevant plugin?
The Master doesn't need to know about the build environment. If you set up the Slave with the proper build environment, that should be fine. The master just delegates the jobs and such.
If I define a slave that's running inside a VM and I stop the VM, then when the VM comes back up, am I going to have to run some sort of start-up task on the slave, or a pre-execute task on the master, to re-register the slave before I can start a project running on it?
It depends on how you are connecting the Slave to the Master. For example, if you connect a Windows machine with the launch method "Let Jenkins control this Windows slave as a Windows service", it should reconnect automatically when the Slave is back online. There is some setup involved in getting this to work, however.
When the slave task completes and the results are reported back, are those results stored on the master such that I can shut down the slave and still have access to previous test run results and trending information?
Console logs are kept on the Master. That's probably what you want.
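For example (hypothetical job name, URL, and credentials), you can read a finished build's result and console log back from the Master over its REST API after the Slave is gone:

```python
# Sketch: read results of a finished build from the Master after the
# Slave that ran it has been shut down. URL, job name and credentials
# are placeholders.
import requests

JENKINS_URL = "http://jenkins.example.com"
AUTH = ("user", "api-token")
JOB = "windows-ruby-tests"   # hypothetical job name

build = requests.get(f"{JENKINS_URL}/job/{JOB}/lastBuild/api/json", auth=AUTH).json()
print(build["result"])       # e.g. SUCCESS or FAILURE

console = requests.get(f"{JENKINS_URL}/job/{JOB}/lastBuild/consoleText", auth=AUTH)
print(console.text[:500])    # console log stored on the Master
```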
Hope that helps :)

One execution per Windows VMware VM as Jenkins slaves?

I am trying to run some automated acceptance tests on a windows VM but am running into some problems.
Here is what I want: a job which always runs on a freshly reverted VM. This job will get an MSI installer from an upstream job, install it, and then run some automated tests on it, in this case using Robot Framework (but that doesn't really matter here).
I have set up the slave in the vSphere plugin to have only one executor and to disconnect after one execution. On disconnect it shuts down and reverts. My hope was that this meant it would run one Jenkins job and then revert, the next job would get a fresh snapshot, and so on.
The problem is that if a job is queued waiting for the VM slave, then as soon as the first job finishes the next one starts, before the VM has shut down and reverted. The signal to shut down and revert has already been sent, however, so the next job fails almost immediately as the VM shuts down.
Everything works fine as long as jobs needing the VM aren't queued while another is running, but if they are I run into this problem.
Can anyone suggest a way to fix this?
Am I better off using vSphere build steps rather than setting up a build slave in this fashion? If so, how exactly do I go about getting the same workflow using build steps and (I assume) pipelined builds?
Thanks
You can set a 'Quiet period' - it's in the Advanced Project Options when you configure a build. You should set it on the parent job; this is the time to wait before executing the dependent job.
If you increase the wait time, the server will go down before the second job starts.
It turns out the version of the vSphere plugin I was using was outdated; this bug is fixed in the newer version.
