Jenkins on-demand build agents with SLURM

I have access to an HPC cluster that uses SLURM to schedule jobs. I would like to evaluate having Jenkins submit jobs to the cluster.
As a simple test, it works to add the HPC login node to Jenkins, and configure a Jenkins job that executes srun ... ./myprogram on the login node. The srun command tells SLURM to assign a compute node and execute ./myprogram on it.
This is not an ideal solution. Every Jenkins job has to be written to use srun, multi-command scripts are awkward to wrap, and since the compute node is not running a Jenkins agent, many agent features are unavailable.
Can I instead have Jenkins spawn new nodes on-demand whenever a job is in the queue? It could then execute srun ... jenkins-agent on the login node to spawn a Jenkins agent on some compute node, and the compute node would connect back to Jenkins and take the enqueued job. I would briefly see a node hpc-job-1643 appear, and then it would disappear after the job completes.

Yes, you can have Jenkins launch nodes on-demand through Slurm.
Create a new node. Set the remote root directory to something accessible from all nodes of your Slurm cluster.
Set a label that your jobs will use to bring up this node.
Set Usage to "Only build jobs with label expression matching this node".
Select the "Launch agents via SSH" launch method (you need a plugin for this, I think).
Set the host to the hostname of your Slurm control node.
Set up the credentials and host key verification strategy as appropriate.
Click the "Advanced ..." button for the SSH launch method.
Set the Prefix Start Agent Command to srun --pty [options for whatever resources and constraints you need] sh -c '. Note the single quote at the end.
Set the Suffix Start Agent Command to '. That's a single quote.
For Availability, select "Bring this agent online when in demand, and take offline when idle". You'll need to set an in-demand delay (I use 0) and an idle delay.
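To use the node from a job, here is a minimal sketch of a declarative pipeline, assuming the label you set above is 'hpc' (a hypothetical name) and that ./myprogram is present in the workspace. The steps run directly on whatever compute node Slurm assigns, with no srun inside the job:

// Minimal sketch, assuming the Slurm-backed node carries the hypothetical label 'hpc'.
pipeline {
    agent { label 'hpc' }          // waits for the on-demand Slurm agent to come up
    stages {
        stage('Build') {
            steps {
                sh 'hostname'      // prints the compute node Slurm assigned
                sh './myprogram'   // runs directly on the compute node, no srun needed
            }
        }
    }
}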

Related

Prevent jobs from running on jenkins slave if a job of slave's own pipeline is running on it

I have a master Jenkins and a slave Jenkins. I have set the slave Jenkins' number of build executors to 1. The slave Jenkins also has one pipeline (let's say pipeline A).
Suppose a job from the slave Jenkins' own pipeline is running right now (Job A), and I schedule a job from the master Jenkins to run on the slave (Job B).
I don't want Job B to run while Job A is running, as both jobs use shared resources.
Right now, Job B runs in parallel with Job A, which is causing Job A to fail.
How can I do that?
Thanks!
Your implementation is a bit tricky since you are talking about 2 separate machines with 2 separate Jenkins instances. One option is to get rid of the Jenkins instance in the slave machine and move the Jenkins job that runs on it to the master machine. Then, you can schedule the job to use the resources of the slave machine while being managed by the master machine. If you do that, no further configuration will be needed since you have set the number of executors to 1.
If that is not possible, the other option is to find a way for them to communicate with each other that a build is running. Consider the third point of this answer. You can have a variable in a database somewhere and when one job starts, it updates the variable. Before the second job starts, it has to poll the variable to see if there is a job already running. If yes, the build doesn't start, if no, build starts and updates the variable.
Another less elegant solution is to simply have a text file in a location accessible to both machines and write the variable data into that instead of a database.
One way to do this is by using the Lockable Resources Plugin.
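For the Lockable Resources approach, a minimal sketch under stated assumptions: a resource named 'shared-resource' has been defined under Manage Jenkins, and the label and shell command below are placeholders. Whichever job starts second waits until the first one releases the lock:

pipeline {
    agent { label 'slave-node' }          // hypothetical label for the shared machine
    stages {
        stage('Build') {
            steps {
                // both Job A and Job B wrap their work in the same lock,
                // so they never touch the shared resources at the same time
                lock(resource: 'shared-resource') {
                    sh './run-build.sh'   // placeholder for the real build steps
                }
            }
        }
    }
}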

How to run Jenkins builds alternately on agent nodes?

Let's say I have a job A and also an agent configured. I want to run build 1 of Job A on the master and build 2 of Job A on the agent node.
Is there an option to achieve that ?
OR
Is there a way for my job to look at the controller and, if it already finds a build running there, start the next build on the agent?
Are you intending to run in parallel or just alternate? (It is not a good idea to run jobs on the master; you could instead configure a node that runs on the same host as the "master".) It seems you want to run in parallel and have restricted yourself to one executor each on the master and the agent (you can have more, in which case any advice may be moot).
Nevertheless, Jenkins' allocation of queued jobs to executors is "sticky": it tries to run a job where it last ran, unless that node is unavailable. This can overload individual nodes, so the master, agent, master, agent pattern is unnatural for Jenkins.
There are plugins that might help, such as Least Load and Scoring Load Balancer, but they may not give you strict alternation.
Perhaps an approach would be to restrict your job to a label and add a post-build Groovy step that, on success, moves the label to the other node for the next run (or use two labels and have the job modify its own label restriction to match the other); see the sketch below.
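A rough sketch of that label-moving idea, for example as a Groovy Postbuild step. Everything here is an assumption for illustration: the agent names 'agent-a' and 'agent-b', the label 'next-build', and that both nodes are ordinary Slave nodes:

// Move the hypothetical 'next-build' label to whichever agent did NOT run this
// build, so a job restricted to 'next-build' alternates between the two agents.
import jenkins.model.Jenkins
import hudson.model.Slave

def ranOn = manager.build.getBuiltOnStr()                 // node that ran this build
def other = (ranOn == 'agent-a') ? 'agent-b' : 'agent-a'

['agent-a', 'agent-b'].each { name ->
    def node = Jenkins.instance.getNode(name)
    if (node instanceof Slave) {
        // only the node that should take the next run keeps the label
        node.setLabelString(name == other ? 'next-build' : '')
    }
}
Jenkins.instance.save()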

Can one configure Jenkins to fail a job if it takes too long to provision an agent?

I'm working with the Jenkins AWS EC2 plugin, which spawns EC2 nodes to execute Jenkins jobs. There are several cases where this plugin can hang indefinitely while waiting for a node to be provisioned. For example, if a project requires python but the EC2 image doesn't have python, Jenkins will spin up a node, fail to run the job, spin up another node, fail to run the job, spin up another node...
Meanwhile, the job hangs forever, Jenkins racks up an Amazon bill, and the console output looks like this:
[Pipeline] Start of Pipeline
[Pipeline] node
Still waiting to schedule task
‘Jenkins’ doesn’t have label ‘ec2worker’
Generally the solution is to just configure the EC2 cloud correctly in the first place, but that is easier said than done. It's easy to imagine, for instance, someone adding node.js as a project dependency without updating the EC2 image, and then Jenkins is off to the races, trying to set a new AWS billing high score...
Ideally I could configure the plugin to limit the number of provision attempts before quitting, but there isn't an option for this. There is an option to limit the total number of nodes provisioned, but since each node is terminated after it's deemed unsuitable, Jenkins only considers there to be one active node. I.e., the number of nodes oscillates between 0 and 1, as Jenkins creates a node, discards it, and then creates another.
So I'm looking for a workaround. Is there a way to configure Jenkins to fail a build in the provisioning step? Can I limit the time it takes to create a node without limiting the total time of the whole job?
Preferably this configuration would be system-wide. But if it has to get pushed to each project config file, I imagine it looking something like this:
pipeline {
    agent {
        timeout(5m) {
            label 'ec2worker'
        }
    }
}
Is there a Jenkins feature or plugin that does something like this?
In the end I couldn't find anything in Jenkins that did what I wanted, though the Jenkins EC2 Plugin does seem to have an open ticket for this missing feature.
I solved the problem in AWS with a lambda. The lambda is triggered whenever Jenkins destroys an instance, and from that event the lambda calculates how long the instance was alive. If it wasn't long enough (less than the idle period the node normally waits for), then Jenkins must be hanging and the lambda kills the job.
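For anyone who only needs the limit per job rather than system-wide, here is a hedged scripted-pipeline sketch of the idea from the question, with the timeout wrapped around the node allocation instead of inside the agent directive. Whether the timeout actually interrupts a build that is still waiting in the queue should be verified on your Jenkins version:

// Sketch: abort the build if no 'ec2worker' agent comes online within 5 minutes.
timeout(time: 5, unit: 'MINUTES') {
    node('ec2worker') {
        stage('Build') {
            sh 'make test'   // placeholder for the real build steps
        }
    }
}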

Jenkins - triggered builds on all Nodes

Currently, we have two machines. One has Jenkins installed and is hosted as master in Jenkins and another one is Slave. Number of executors for both Nodes are set to 1.
I am not exactly sure how Jenkins works behind the scenes, but currently when I trigger 2 build jobs simultaneously, they somehow run only on the slave node (the second build job is put in the queue); if I disconnect the slave and leave only the master, they run only on the master (again with the second build job queued).
How do I configure Jenkins so that it leverages all my available nodes (master and slave)? In other words, I would like all available nodes to consume the queue, not just one of them.
As I understand it, you need to enable the "Execute concurrent builds if necessary" option in your job configuration, and then you will be able to run your job simultaneously on all available nodes.
In addition to the above answer, we can also restrict the job to a particular node on which it should run.
For example, consider a setup of 3 servers (2 Linux and 1 Windows):
1 Linux server acts as master
1 Linux server acts as a node
1 Windows server acts as a node
If we have a job that needs to run on the Windows node, you can go to the job configuration and restrict the job to run on that node using its node name or a label (see the sketch below).
Additionally, the number of executors defines how many builds a slave or master node can execute in parallel across different jobs.
For running the same job concurrently, you need to check the "Execute concurrent builds if necessary" option and assign it a label that covers more than one node.
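As a sketch of the restriction idea for the Windows node above, assuming that node carries a hypothetical label 'windows':

// Minimal sketch: tie the job to the Windows node via its (hypothetical) label.
pipeline {
    agent { label 'windows' }
    stages {
        stage('Build') {
            steps {
                bat 'ver'   // a Windows-only step, confirming where the build runs
            }
        }
    }
}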

Use different slave for same job on different runs in Jenkins

How do we make Jenkins start the job on a different slave every time it is run? It usually picks whichever slave is free, but what if we wish to run the job in a different slave environment each run?
When you configure your job you should tick the box that says "Restrict where this project can be run" and add the name of the specific node you want your job to run on.
