retry jenkins kubernetes agent connection

I am using the Kubernetes plugin in Jenkins pipelines to create agents in Kubernetes. I am able to launch and connect to the agents and run builds on them. However, when there is not enough quota left for the agent pod, the agent bring-up fails immediately with a "forbidden: exceeded quota" error. My question is: is there a way to retry n times, with a sleep in between, to bring up the agent, since other builds running on Kubernetes can finish and free up resources in the meantime?

The Kubernetes plugin version I was using is 1.27.7, and apparently this is a known bug in that version (https://issues.jenkins.io/browse/JENKINS-63976). The bug seems to be fixed in Kubernetes plugin version 1.28.6.
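Until the plugin can be upgraded, a manual retry loop in a scripted pipeline is one possible workaround. Below is a minimal sketch, assuming a pod template labeled 'k8s-agent' is already configured; the label, attempt count, sleep interval, and build command are all illustrative. Note that as written it also retries genuine build failures, so you may want to inspect the caught error before sleeping.

    def attempts = 5
    def done = false
    for (int i = 0; i < attempts && !done; i++) {
        try {
            timeout(time: 10, unit: 'MINUTES') {      // abort the attempt if the pod never comes up
                node('k8s-agent') {                   // blocks until the agent pod connects
                    sh 'mvn clean install'            // actual build steps go here
                }
            }
            done = true
        } catch (err) {
            echo "Attempt ${i + 1}/${attempts} failed: ${err}"
            sleep(time: 60, unit: 'SECONDS')          // give other builds time to free up quota
        }
    }
    if (!done) {
        error("Could not run the build on a Kubernetes agent after ${attempts} attempts.")
    }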

Related

How to maintain connectivity between a VM and an agent and continue a long pipeline step

I run a Jenkins pipeline that has several steps (preparing VMs before the tests, checking connectivity between the agents and the VMs, and running the tests), where the last step takes approximately 10 hours and sometimes the connection between the agent and the VM goes down.
My question is whether there is any way to maintain connectivity between the VM and the agent and to recover from failures (reconnect the agent to the VM and continue the tests where they stopped).
More information:
I run a Java application with mvn clean install.
I am not aware of solutions for recovering a pipeline on slave disconnection.
However, regarding slave re-connection, I would use a simple cron job like the following:
https://github.com/fredericrous/JenkinsDevEnv/blob/master/jenkins-slave-init#L43

jenkins kubernetes-plugin slaveConnectTimeout not honoured

I am running Jenkins 2.103 in Docker and have connected it to a Kubernetes-on-ARM cluster.
I have been able to manually connect the JNLP (v3.16) slave to the master; however, it appears to take around 15 minutes for it to fully connect and report as online. Once online, I can run builds as expected.
The problem is that the 'slaveConnectTimeout' setting in the podTemplate is not honoured in the pipeline configuration, and neither is the default template setting of 'Timeout in seconds for Jenkins connection' in the Pod Template section of Global Settings.
Has anyone been able to make this setting work, and does anyone have any idea what could be causing the 15-minute delay in registration?
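For reference, this is roughly where the setting is declared in a scripted pipeline; the label, timeout value, and image below are illustrative, not my actual configuration:

    podTemplate(
        label: 'arm-agent',
        slaveConnectTimeout: 300,   // seconds to wait for the agent to come online
        containers: [
            containerTemplate(name: 'jnlp', image: 'jenkins/jnlp-slave')
        ]
    ) {
        node('arm-agent') {
            sh 'uname -m'
        }
    }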
This issue has now been raised as a bug as well: JENKINS-49281.
The issue ended up being OpenJDK and me not fully understanding what the Kubernetes timeout is all about.
The delay in agent registration is not just a Jenkins issue; I have seen the same behaviour in GoCD and other Java-based apps. It is a platform issue, not an app issue.

jenkins on-demand windows slaves

The on-demand slaves are being created successfully from Jenkins. The first build on the slave succeeds, but the subsequent builds fail. Restarting the slave or restarting the WinRM service allows the builds to proceed again.
The tcpdump shows no errors, and I can't figure out what the issue is. It looks like a problem with Jenkins communicating with the on-demand slaves over WinRM.
Has anybody faced a similar issue?
The on-demand slaves are Windows slaves.
The issue was with the "MaxMemoryPerShellMB" parameter of WinRM. This was set too low, so when npm or git was doing a checkout it ran out of memory in the WinRM shell.
I have increased it to 1 GB and it's working fine now.
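For reference, a minimal sketch of how the limit could be raised; the node label is hypothetical, and the same winrm command can also be run directly in an elevated shell on the slave:

    node('windows-ondemand') {
        // Raise the WinRM shell memory limit to 1 GB (the value is in MB).
        bat 'winrm set winrm/config/winrs @{MaxMemoryPerShellMB="1024"}'
    }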

Jenkins and gitlab sharing build slaves

Let's say you have a gitlab instance and it already uses Jenkins for all its CI builds via the gitlab Jenkins plugin, etc. The Jenkins setup has a modest collection of build slaves providing a variety of platforms, etc. and each slave is set up to run just one job at a time (i.e. a Jenkins job gets exclusive access to the build slave, which is important for reasons I won't go into here).
Now let's say you want to consider using gitlab's own native CI support, moving one or more projects over to gitlab instead of Jenkins. The gitlab CI would need to use the same set of build slaves, but it needs to play nice with Jenkins and the two need to cooperate so that if one runs a job on a particular slave, the other won't submit a job to that same slave until the first finishes. In effect, while Jenkins is running a job on a slave, gitlab should see that slave as unavailable and vice versa.
Does anyone have working methods for getting gitlab to tell Jenkins it is using a slave while it runs a CI job on it, and vice versa? (One possible direction is sketched after the additional info below.) The method doesn't have to be 100% bulletproof; it would potentially be okay if both gitlab and Jenkins run a job on the same slave at the same time, provided it is a rare event (i.e. race conditions could be tolerated if the frequency of occurrence is likely to be low).
Additional info:
Build slaves include Linux, Windows and Apple.
Docker is not used and would not be permitted at this time.
We have full admin access to everything, but changing code in gitlab or Jenkins themselves would be rejected. Adding scripts or plugins would be okay.
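One possible direction, sketched under assumptions: Jenkins exposes a /computer/<node>/toggleOffline endpoint that flips a node's temporarily-offline flag, and requests authenticated with an API token are exempt from the CSRF crumb. A gitlab CI job could run a small script like the one below before and after using a shared slave, so Jenkins stops scheduling onto it. This is only a sketch: a real implementation would have to check the node's current state instead of blindly toggling, wait for any running Jenkins build to drain, and solve the reverse direction (Jenkins telling gitlab) separately. The environment variables and the node-name argument are placeholders.

    // Hypothetical helper: flip a Jenkins node's temporarily-offline flag.
    // Run as: groovy toggle-node.groovy <node-name>
    def jenkins = System.getenv('JENKINS_URL')   // e.g. https://jenkins.example.com
    def auth    = System.getenv('JENKINS_USER') + ':' + System.getenv('JENKINS_TOKEN')
    def url  = new URL("${jenkins}/computer/${args[0]}/toggleOffline?offlineMessage=gitlab-ci")
    def conn = url.openConnection()
    conn.requestMethod = 'POST'
    conn.setRequestProperty('Authorization', 'Basic ' + auth.bytes.encodeBase64().toString())
    assert conn.responseCode in [200, 302] : "toggleOffline failed: HTTP ${conn.responseCode}"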

how to relaunch an application build after the jenkins slave agent was rebooted

We have a Jenkins project. Use case:
Jenkins triggers the build
the slave agent builds the application
the server with the slave agent reboots (for any reason: a problem with electricity, somebody rebooted it, resource shortage, and so on)
After that, Jenkins reports a failed build. How can we automatically relaunch the application build in Jenkins once the slave agent has recovered from the failure?
There are two aspects to this issue:
1. The Jenkins server needs to reschedule the build that failed when the slave machine crashed:
Install the Naginator plugin
Set it to rebuild whatever job you have set on the problematic slave
2. The Jenkins slave needs to restart automatically as soon as its host is up again:
On Windows, for example, you need to set it up as a service that starts automatically
Note that the Naginator plugin doesn't know what caused the build to fail, so it will try to rebuild any build that fails. To solve this, scan the log for an indication that the slave crashed and set a regular expression (in Naginator) to catch it.
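As a pipeline-based alternative to Naginator's log-scanning approach, recent Jenkins versions let the retry step rerun its body only when the failure was caused by the agent rather than by the build itself. A minimal sketch, with an illustrative label and retry count:

    retry(count: 3, conditions: [agent()]) {
        node('problematic-slave') {
            sh 'mvn clean install'
        }
    }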
