Fault tolerant Jenkins on DCOS - jenkins

I am running a Jenkins server on DCOS as documented here: https://docs.mesosphere.com/1.7/usage/tutorials/jenkins/.
The Jenkins server is able to spawn new Mesos slaves when new jobs are scheduled and kill them when the jobs complete.
But if a cluster node crashes while a Jenkins job is running on it, the Jenkins server doesn't re-run the job on another available node.
Is the Jenkins service on DCOS fault tolerant?
Can we re-run, on some other available node, a job that failed because its cluster node crashed mid-execution?

Jenkins itself does not rerun jobs that disappear. This is not specific to DC/OS or Mesos; it's just the way Jenkins works.
DC/OS and Mesos will make sure that Jenkins stays running and available to send jobs to, and in that sense it is "fault tolerant". It is not fault tolerant in the sense you are asking about: a job that was lost along with its node is not restarted.
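For Pipeline jobs, a re-run can be requested explicitly in the job definition. The following is only a sketch, assuming a recent version of the Pipeline plugins in which the retry step accepts conditions; the 'mesos' label and the shell command are placeholders, not taken from the question. Freestyle jobs and older Pipeline versions will not do this on their own.

```groovy
// Scripted Pipeline sketch: run the node block again if its agent is lost.
// Assumes a Pipeline plugin version whose retry step supports 'conditions'.
retry(count: 2, conditions: [agent(), nonresumable()]) {
    node('mesos') {          // 'mesos' is a placeholder agent label
        sh 'make test'       // hypothetical build command
    }
}
```

With count: 2 the block is attempted at most twice, and only when the failure matches one of the listed conditions, so an ordinary test failure is not retried.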

Related

docker job on remote machine running but not for Jenkins

I am trying to configure a Docker Jenkins slave on a remote machine. I have spent hours trying to make it work, but still no success.
I have installed Jenkins locally and I can run any Docker job locally without problems. I just specify a label in a Jenkins pipeline like:
agent {
    label 'jenkins-slave'
}
The same does not work for a remote machine. Under Manage Jenkins > Manage Nodes and Clouds > Configure Clouds I set the proper Docker URI. It is enabled and exposed, and Test Connection succeeds.
I selected "attach docker container" and set the credentials in "registry credentials".
On the remote server I can see that the Docker container is (correctly) started. In fact, many containers have been started, probably because the job is trying to launch them again and again.
What I see instead in the Jenkins pipeline is this:
Running in Durability level: MAX_SURVIVABILITY
[Pipeline] Start of Pipeline
[Pipeline] node
Still waiting to schedule task
All nodes of label ‘jenkins-slave’ are offline
I have tried tens of combinations and different guides, but I still have no clue why it is not working.
PS: I have also disabled the firewall to see if that was the problem, but it is not.
Is there any way to get more verbose logs than just "All nodes of label XXX are offline"?
Thanks

How to scale Jenkins slaves according to Build Queue on Kubernetes Plugin

I'm running Jenkins pipelines and have set "Time in minutes to retain agent when idle" in the slave configuration, so the slave is kept after my jobs have run. But when another job needs to run and the slave agent is full, the job waits in the Build Queue.
How can I scale my slave count on Jenkins with the Kubernetes plugin?
You're looking for the Container Cap setting. If you're using the Jenkins Configuration as Code (JCasC) plugin, this is configured via the containerCapStr key.

Jenkins master and Slave installation on CI/CD pipeline

I am trying to implement a CI/CD pipeline using Kubernetes and Jenkins. I am planning to use a Kubernetes HA cluster with 3 master and 5 worker machines/nodes.
I am now exploring implementation tutorials for CI/CD pipelines and the use of Jenkins with a Kubernetes HA cluster. While reading, I ran into some confusion about Jenkins, which I describe here.
1. I have 8 VMs in total: 3 master and 5 worker machines/nodes (the Kubernetes cluster). If I install Jenkins on any one of the worker machines, will there be any problem integrating it with the CI/CD pipeline for deployment?
2. I previously read the following link to understand the implementation:
https://dzone.com/articles/easily-automate-your-cicd-pipeline-with-jenkins-he
Is it mandatory to use a Jenkins master and slave? The tutorial shows that if kubectl, helm and docker are installed, a Jenkins slave is not needed. What is the idea behind master and slave here?
3. If I install both the Jenkins master and slave on Kubernetes cluster worker machines/nodes, do I need to install the master and slave on separate VMs? I am still confused about where to install Jenkins.
I have just started with CI/CD pipelines, Kubernetes and Jenkins.
Jenkins has two parts. There's the master, which manages all the jobs, and the workers, which perform the jobs.
The Jenkins master supports many kinds of workers (slaves) via plugins - you can have stand alone nodes, Docker based slaves, Kubernetes scheduled Docker slaves, etc.
Where you run the Jenkins master doesn't really matter very much; what is important is how you configure it to run your jobs.
Since you are on Kubernetes, I would suggest checking out the Kubernetes plugin for Jenkins. When you configure the master to use this plugin, it will create a new Kubernetes pod for each job, and this pod will run the Docker based Jenkins slave image. The way this works is that the plugin watches for a job in the job queue, notices there isn't a slave to run it, starts the Jenkins slave docker image, which registers itself with the master, then it does the job, and gets deleted. So you do not need to directly create slave nodes in this setup.
When you are in a Kubernetes cluster with a container-based workflow, you don't need to worry about where to run the containers; let Kubernetes figure that out for you. Just use Helm to launch the Jenkins master, then connect to the Jenkins master and configure it to use Kubernetes slaves.
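To make the per-job pod idea concrete, here is a minimal declarative Pipeline sketch, assuming the Kubernetes plugin is installed and a Kubernetes cloud is already configured on the master. The container name 'build', the busybox image and the echo command are placeholders chosen for illustration only.

```groovy
// Declarative Pipeline sketch: each build gets its own throwaway pod.
// Assumes the Kubernetes plugin and a configured Kubernetes cloud.
pipeline {
    agent {
        kubernetes {
            // Pod spec for the agent; the plugin adds its own agent container.
            yaml '''
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: build
    image: busybox
    command: ["cat"]
    tty: true
'''
        }
    }
    stages {
        stage('Build') {
            steps {
                // Run the step inside the placeholder 'build' container.
                container('build') {
                    sh 'echo "running inside the pod"'
                }
            }
        }
    }
}
```

The pod is created when the build starts and deleted when it finishes, so no slave node has to be set up by hand, and Kubernetes decides which worker it lands on.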

How to kill the hung job that is running on jenkins slave

I need help killing a hung job that is running on a Jenkins slave. Because the job is hung, we are not able to stop it from the Jenkins GUI, and we cannot find a PID for it by logging in to the server, since Slave.jar itself is the process running on the slave server. Can anyone help me kill this particular hung job?
Note: the slave must not be restarted, as other jobs are running on the same slave.
Thanks in advance

Jenkins Node - Scripts are not stopped after performing stop operation in Jenkins

I configured a Jenkins slave and created a job to run on it.
After triggering the job, if I stop it in Jenkins, it shows that the job has been stopped.
But on the node the script is still running.
I noticed that too. See also JENKINS-6188.
