Jenkins - How to handle concurrent jobs that use a limited resource pool? - jenkins

I'm trying to improve some of the testing procedures at work and, since I'm not an expert on Jenkins, I was hoping you could point me in the right direction.
Our current situation is like this. We have a huge suite of E2E tests that run regularly. These tests rely on a pool of limited resources (AWS VMs that are used to run each test). We have 2 test suites: a full-blown regression that consumes, at its peak, ~80% of those resources, and a much more lightweight smoke run that uses just 15% or so.
Right now I'm using the Lockable Resources plugin. When the Test Run step comes, it checks whether you are running a regression; if you are, it requests the single lock. If the lock is available, all good; if not, the job waits until it becomes available before continuing. This ensures that there is never more than 1 regression running at the same time, but it leaves a lot of gaps. For example, a regression could be running and several smoke runs might be triggered, which will exhaust the resource pool.
What I would like to accomplish, in a best-case scenario, is some sort of conditional rules that decide whether the test execution step can go forward, based on something like this:
Only 1 regression can be running at the same time.
If a regression is running allow only 1 smoke run to be run in parallel.
If no regression is running then allow up to 5 or 6 smoke tests.
If 2 or more smoke tests are running do not allow a regression to launch.
Would something like that be possible from a Jenkins pipeline? In this case I'm using the declarative pipeline with a bunch of helper Groovy code I've put together over time. My first idea is to see if there's a way to check whether a lockable resource is available (without actually requesting it yet) and then go through a bunch of if/then/else to set up the logic. But again, I'm not sure if there's a way to check a lockable resource's state or how many of a kind have already been requested.
Honestly, something this complex might be outside of what Jenkins is supposed to handle, but I'm not sure and figured asking here would be a good start.
Thanks!

Create a declarative pipeline with steps that build individual jobs. Don't allow people to run the jobs ad-hoc, or when changes are pushed to the repository, and force a build schedule.
How can this solve your issue:
Only 1 regression can be running at the same time.
Put all these jobs in sequence in a declarative pipeline.
If a regression is running allow only 1 smoke run to be run in parallel.
Put smoke tests that are related to the regression test in sequence, just after the regression build, but run the smoke tests in parallel, prior to the next regression build.
If no regression is running then allow up to 5 or 6 smoke tests.
See previous
If 2 or more smoke tests are running do not allow a regression to launch.
It will never happen if you run things in sequence.
Here is an ugly picture explaining what I am talking about.
You can manually create the pipeline, or use Blue Ocean, which gives you a graphical interface to put the steps in sequence or in parallel:
https://jenkins.io/doc/tutorials/create-a-pipeline-in-blue-ocean/
The downside is that if one of those jobs fails, it will stop the build, but that is not necessarily a bad thing if the jobs are highly correlated.
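As a rough illustration of the sequencing described in this answer, here is a minimal declarative Jenkinsfile sketch. The job names (`regression-suite`, `smoke-suite-a`, `smoke-suite-b`) and the cron schedule are assumptions made up for the example, not taken from the post. It relies on the `build` step waiting for the downstream job by default, so the smoke stage only starts once the regression has finished.

```groovy
// Sketch only: job names and the schedule are placeholders, not from the post.
pipeline {
    agent none
    triggers {
        // Forced build schedule: roughly once a night.
        cron('H 2 * * *')
    }
    stages {
        stage('Regression') {
            steps {
                // Blocks until the downstream job finishes, so only one
                // regression started by this pipeline is in flight at a time.
                build job: 'regression-suite'
            }
        }
        stage('Smoke tests') {
            // Smoke jobs run side by side, but only after the regression stage.
            parallel {
                stage('Smoke A') {
                    steps { build job: 'smoke-suite-a' }
                }
                stage('Smoke B') {
                    steps { build job: 'smoke-suite-b' }
                }
            }
        }
    }
}
```

This only holds if the individual jobs are not also triggered ad hoc, as the answer points out.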

Completely forgot to update this but after reading and experimenting a bit more with the lockable resources plugin I found out you could have several resources under the same label and request a set quantity whenever a specific job starts.
I defined 5 resources and set the Jenkinsfile to check whether you are running the test suite with the regression parameter or not. If you are running a full regression it will try to request 4 locks, while a smoke test will only try to request 1. This way, when there aren't enough locks available, the job waits until either enough locks become available or the timeout expires.
Here's a snippet from my Jenkinsfile:
stage('Test') {
    steps {
        lock(resource: null, label: 'server-farm-tokens', quantity: getQuantityBySuiteType()) {
            // whatever I do to run my tests here
        }
    }
}
The resource parameter has to be null due to a bug in Jenkins' declarative pipeline. If you're using the scripted one you can ignore that parameter.
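The snippet above calls `getQuantityBySuiteType()` without showing it. Below is one way such a helper might look, as a sketch under assumptions: the parameter name `SUITE_TYPE`, its values, and the timeout are invented for the example. With 5 tokens, 4 per regression and 1 per smoke run, the arithmetic covers the rules from the question: two regressions would need 8 tokens, so only one can run; a regression leaves exactly 1 token for a single smoke run; with no regression, up to 5 smoke runs fit; and once 2 smoke runs hold tokens, only 3 remain, so a regression has to wait.

```groovy
// Hypothetical helper: the original post does not show its body.
// Assumes a build parameter named SUITE_TYPE ('regression' or 'smoke').
def getQuantityBySuiteType() {
    // A regression takes 4 of the 5 tokens; anything else takes 1.
    return params.SUITE_TYPE == 'regression' ? 4 : 1
}

pipeline {
    agent any
    parameters {
        choice(name: 'SUITE_TYPE', choices: ['smoke', 'regression'], description: 'Which suite to run')
    }
    stages {
        stage('Test') {
            options {
                // Bounds the whole stage, including time spent waiting for
                // tokens. The value is arbitrary.
                timeout(time: 2, unit: 'HOURS')
            }
            steps {
                lock(resource: null, label: 'server-farm-tokens', quantity: getQuantityBySuiteType()) {
                    echo "Running ${params.SUITE_TYPE} tests"
                }
            }
        }
    }
}
```

The five resources themselves still have to be defined under the `server-farm-tokens` label in the Lockable Resources configuration.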

Related

Allow failure in Kubeflow Pipelines

Context
I have a Kubeflow pipeline running multiple stages with Python scripts. In one of the inner stages, I use kfp.dsl.ParallelFor to run 5-6 deep learning models, and in the next stage, I choose the best one with respect to a validation metric.
Problem
The issue is that if one of the models fails, the whole pipeline fails. It complains that the dependencies of the next stage are not satisfied. However, if model A fails while model B is still running, the pipeline state stays at running for as long as model B is running, and it only changes once all models in that stage have finished.
Question
How can I allow partial failures in a stage? As long as at least one of the models works, the next stage can proceed. How do I make this happen in Kubeflow? For example, I have set up CI in GitLab, which supports this.
If it is not possible to have this, I want the pipeline to fail immediately as soon as one model fails, and not wait for others only to fail later, which possibly can be way later based on configurations.
Obviously, one way to avoid failure would be to wrap the Python script in a top-level try-except so that it always returns exit code 0. However, that way there will be no visual indication that one (or more) models failed. The failure can be recovered from the logs, but those are rarely monitored in a scheduled pipeline when the entire run status is successful.

CI strategy for embedded development - on-hardware or manual-intervention-required tests

We have a bunch of tests and are implementing CI according to git flow, using Jenkins.
Some of these tests require hardware. However, some of those tests can take 4+ hours (or even 24+ hours) to run, and require hardware that we only have 1 or 2 copies of. Some also need to be run at night.
Furthermore, a minority of tests require some limited manual intervention every few hours to swap a chip out.
I know that a common strategy is to make a test slave for the hardware tests. However, if the job takes a day or more, every time something is pushed to a pull request, that will be prohibitively expensive.
Is there a common solution to this problem? Is GitHub Flow possible within these constraints, or am I going to require release branches, and the understanding that master is not guaranteed to be release-ready at any point since it won't have these tests run?
Is there some way to trigger a specific job through GitHub to launch these expensive jobs, so that they are run only if required?
Reviewing your branching strategy can be part of the solution.
I would also review my test strategy. For instance, I would execute a small set of quick, fully automated tests on master and the whole batch of tests on the release branch.

Massive-Distributed Parallel Execution of tasks

We are currently struggling with the following task. We need to run a Windows application (it only works as a single instance) 1000 times with different input parameters. One run of this application can take up to multiple hours. It feels like we have the same problem as any video rendering farm, where each frame of a video should be calculated independently and in parallel, except that our task is not rendering.
So far we have tried to execute it with Jenkins and Pipeline jobs. We used the parallel step in the pipeline and let Jenkins queue and execute the application. We use the Jenkins Label Expression to let Jenkins choose which job can run on which node.
The limitation in Jenkins is currently with massively parallel jobs (https://issues.jenkins-ci.org/browse/JENKINS-47724). When the queue contains several hundred jobs, adding new jobs takes much longer, and it gets even worse as the queue grows. The main problem is that Jenkins starts executing the parallel branches of the pipeline only after it has finished adding all of them to the queue.
We have already investigated some ideas for solving this problem:
1. Python Distributed: https://distributed.readthedocs.io/en/latest/
a. For single functions it looks great, but reproducing the complete run we have in Jenkins (deploy and collect results) looks complex.
b. Bidirectional client->server communication is needed, so there is no chance of getting it working through a NAT (VM server).
2. BOINC: https://boinc.berkeley.edu/
a. As far as we understand, we would have to extend the backend massively to get our jobs working: configuring the jobs in BOINC would mean writing a lot of new automation code.
b. Currently we need a predeployed application, which can differ between inputs, and there is no equivalent of the Jenkins Label Expression.
Any ideas how to solve it?
Thanks in advance

Jenkins Pipeline and huge amount of parallel steps

I have searched the whole internet for 2 weeks now, and asked on freenode IRC and on the Jenkins user group mailing list, but got no answer, so here I am. You are my last hope (no pressure).
I have a Jenkins scripted pipeline that generates hundreds of parallel branches that have to run simultaneously on hundreds of slave nodes. At the moment it looks like the Jenkins BlueOcean user interface is not suited for that. We reach a point where not all the steps can be displayed.
Some background to help you understand our need: we have a huge project in our company with thousands of Behat/Selenium tests, and these now take more than 30h to run sequentially. Some time ago we implemented a basic solution where we use a queuing system (RabbitMQ) to store all the tests, plus consumers that run the tests by downloading the source code from Jenkins and uploading artifacts back to Jenkins. But this is not as scalable as Jenkins native slaves and it is not maintainable enough (e.g. we don't benefit from real-time output logs and usage statistics).
I know there is an open issue that describes the problem here: https://issues.jenkins-ci.org/browse/JENKINS-41205 but, basically, I need a workaround that works by next week (our development team has been waiting for this new pipeline for a long time now).
Our pipeline looks like this at the moment:
Build --- Unit Tests --- Integration Tests --- Functional Tests ---
              |                  |                    |
            tool A            suite A         matrix-A-A-batch 0
            tool B            suite B         matrix-A-A-batch 1
            tool C                            matrix-A-A-batch 2
                                              matrix-A-A-batch 3
                                              ....
                                              "Unable to display more"
You can find a full version of our Jenkinsfile here: https://github.com/willy-ahva/pim-community-dev/blob/086e4ed48ef1a3d880ca16b6f5572f350d26eb03/Jenkinsfile (It may look complicated but, basically, the real problem is the "Functional Tests" stage.)
My questions are:
1. Am I using parallel the right way?
2. Is it only a Jenkins/BlueOcean issue, and should I contribute to the issue I linked? (If yes, how? I'm not a Java dev at all.)
3. Should I try to use MultiJob and parallelize jobs instead of steps?
4. Is there any other tool besides parallel that I can use (some kind of fork or whatever)?
Thanks a lot for your help. I love what Jenkins became with the Pipeline and BlueOcean UI and I really want to make it work in our team.
1. This is probably a poor way to do the parallel tasks. I would instead treat each parallel map entry as a worker, and put your tests into a queue / stack / data structure. Each worker thread could pop off the queue as required, and then you wouldn't sit there with a million tasks queued. You would have to be more careful with your logging so that it is apparent which test failed, but that shouldn't be too tough.
2. It's probably not something that's easy to fix, as it is as much a UI design issue as anything else. I would recommend that you give it a poke though! Who knows, maybe a solution will click for you?
3. Probably not. In my opinion that would make things muddier.
4. Parallel is your option for forking.
If you really want to keep doing this, but don't want the UI to be so weird, you can stop defining each test as a stage. It'll be less clear what failed when one fails, but the UI should be happier.
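The worker idea from this answer could look roughly like the scripted pipeline sketch below. The test list, the node label `docker`, the worker count and the `bin/behat` command are placeholders invented for the example, and constructing the queue may need script approval depending on your sandbox settings. Each parallel branch is a long-lived worker that keeps polling a shared queue, so Jenkins only has to schedule a handful of branches instead of one per test.

```groovy
// Sketch only: the test list, node label and runner command are placeholders.
import java.util.concurrent.ConcurrentLinkedQueue

def testFiles = ['features/a.feature', 'features/b.feature', 'features/c.feature']
def pending = new ConcurrentLinkedQueue(testFiles)
int workerCount = 2

def workers = [:]
for (int i = 0; i < workerCount; i++) {
    // Plain String key, so the map is safe to pass to parallel.
    String name = 'worker-' + i
    workers[name] = {
        node('docker') {
            checkout scm
            // Each worker keeps pulling tests until the shared queue is empty.
            def test = pending.poll()
            while (test != null) {
                // Log which worker ran which test, so failures can be traced.
                echo "${name} running ${test}"
                sh "bin/behat ${test}"
                test = pending.poll()
            }
        }
    }
}

stage('Functional Tests') {
    parallel workers
}
```

As written, a failing `sh` step ends that worker and leaves the remaining tests to the other workers; wrapping the call in `catchError` would be one way to keep a worker going after a failure.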

When do you have to make a new job in Jenkins

So I want to make a build-test-deploy environment in Jenkins:
I want to do:
- a build
- a karma test
- a protractor test
- a deploy
Now a very simple but important question: do I have to do everything in one job (which is possible), or do I have to split it up into several jobs (and cd into the build directory)? It isn't clear to me when I have to make a new job.
It is really a matter of taste and your exact needs.
If you do not plan running build steps individually time after time (that is, if you only care about the build as an atomic piece), or if your build flow is simple and linear, it would make more sense to stick to a single job - this way you will keep all the configuration in one place and have a good overview of build results.
If, however, there are different paths that the build process may take, or the steps themselves involve more complex logic, or, for instance, there is a need for collecting statistics on each of them, then it might be more beneficial to extract some of the steps to separate jobs and chain them together according to your rules. Jenkins is super-flexible and does not enforce any particular approach upon you.
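For the simple, linear case, a single job can still keep the four steps visibly separate. The sketch below assumes the Pipeline plugin and an npm-based project; the shell commands and the `deploy.sh` script are placeholders rather than anything from the question.

```groovy
// Sketch only: the shell commands are placeholders for your own tooling.
pipeline {
    agent any
    stages {
        stage('Build') {
            steps { sh 'npm ci && npm run build' }
        }
        stage('Karma tests') {
            steps { sh 'npx karma start --single-run' }
        }
        stage('Protractor tests') {
            steps { sh 'npx protractor protractor.conf.js' }
        }
        stage('Deploy') {
            // Only reached if every earlier stage succeeded.
            steps { sh './deploy.sh' }
        }
    }
}
```

All stages share one workspace here, which avoids having to locate the build directory from a second job.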
