I'm using Jenkins in a master/slave setup with the Gerrit plugin. Multibranch scanning is taking very long (around 24 minutes) for a repo with around 80 remote branches and about 750 MB in size. The Jenkins master is running version 2.319.3 with 8 vCPUs and 32 GB of memory. However, resource usage stays very low even though the scan takes so long. I've also checked disk IOPS and the network, and both look fine. Manual execution of the git fetch command on the master takes only 45 seconds. What could be the reason for the long branch indexing?
Related
I'm running a single-node Jenkins instance on a project where I want to clean up closed PRs intermittently. The system can keep up just fine with the average PR submission rate; however, when we re-run "Scan Repository" against the project, the system runs into a lot of parallel build congestion.
When we run "Scan Repository", the multibranch pipeline fires off its separate stages, and those generally complete sequentially without issue and without hitting timeouts. However, the priority of declarative post actions seems lower than that of the other stages (no special plugins are installed that would cause that, AFAIK). The behavior exhibited is that parallel builds start running and drive up the total run time for any one branch or pull request, so that all of the builds might show, say, 60 or 90 minute build times instead of the usual 6-10 minutes, even when the only task remaining is filling out the Checkstyle reports or whatever minor notification tasks there are.
Is there a way to dedicate a single executor thread on the master node to declarative post actions, so that one branch or PR can be run end-to-end without getting stuck waiting for executors that have suddenly been picked up to start a different PR or branch and run its earlier (and computationally expensive) stages, such as linting code and running unit tests, and without ultimately hitting a timeout?
I am running a Jenkins multibranch job. Suddenly it does not allow me to make configuration changes; the configuration page just keeps loading and never times out or shows an error.
Can someone please help me with this?
You could have a look at the Jenkins master machine's CPU and memory usage and see what is consuming them. I have seen this happen when the CPU is close to 100%. In that case, restarting the Jenkins process or the Jenkins master machine can help.
Try to remember, or ask colleagues, whether there have been any recent changes to the Jenkins master machine. We had similar issues after installing plugins.
Avoid executing jobs on the Jenkins master; use slave agents instead.
You may also need to clean up old builds if you are not doing that already (see the sketch below, which covers the last two points).
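Both of the last two points can be expressed directly in a Jenkinsfile. This is only a minimal sketch under assumptions of my own: the 'linux' label, the build step and the retention numbers are placeholders for whatever your setup actually uses.

pipeline {
    agent { label 'linux' }  // run the build on a slave agent, never on the master
    options {
        // automatically discard old builds so they do not pile up on the master's disk
        buildDiscarder(logRotator(numToKeepStr: '20', artifactNumToKeepStr: '5'))
    }
    stages {
        stage('Build') {
            steps {
                sh 'make'  // placeholder build step
            }
        }
    }
}

The same retention can also be configured per job in the UI via the "Discard old builds" option.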
In my case, after disabling and re-enabling all plugins one by one, it turned out to be the "AWS SQS Build Trigger Plugin" that was causing the "Save" / "Apply" buttons to move around and not be functional.
I want to upgrade my Jenkins master without aborting long-running jobs on the slaves and without waiting for them to finish. Is there a plugin available that provides this feature?
We have several build jobs running regression and integration tests which take hours to complete. Often, at least one of those jobs is running, making it hard to restart Jenkins after updates. I know that it is possible to block the queue. We tried this, but it hinders more than it helps.
What we are looking for is a plugin that runs jobs on slaves, caches the output as soon as the connection to the master is interrupted, and sends the remaining output to the master when the master is up again. Does anybody know of a plugin providing this feature?
Background
Can anyone provide some insight into exactly what the responsibilities of a Jenkins master are with regard to capacity planning?
I currently have a single master and a single slave set up, where the master is a much less powerful EC2 t2.micro and the slave is a t2.medium.
Every now and then the master dies, and there are errors relating to out-of-memory conditions and being unable to allocate memory while checking out a project. Jenkins has been configured with -Xmx768m.
I have verified that the build is tied to the slave node and not the master. The master has 0 executors configured and the job shows as running on the slave (1 executor).
Example Error
This is one such example:
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from https://github.com/xyz.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:825)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1092)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1123)
at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:113)
at org.jenkinsci.plugins.workflow.cps.CpsScmFlowDefinition.create(CpsScmFlowDefinition.java:130)
at org.jenkinsci.plugins.workflow.multibranch.SCMBinder.create(SCMBinder.java:120)
at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:262)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:421)
Caused by: hudson.plugins.git.GitException: Command "git fetch --no-tags --progress https://github.com/xyz.git +refs/pull/16/head:refs/remotes/origin/PR-16 +refs/heads/develop:refs/remotes/origin/develop" returned status code 128:
stdout:
stderr: error: cannot fork() for fetch-pack: Cannot allocate memory
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1990)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1709)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$300(CliGitAPIImpl.java:72)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:400)
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:823)
... 8 more
My Question
What is the role of the master in this setup from a resource utilisation perspective? I'd have assumed it would be the slave that checks out the project and builds it (not the master), which is why I allocate much more resource to the slave in terms of memory, CPU, disk capacity and IOPS.
Aside from centralised plugin configuration, I didn't expect the master to play much of a role beyond starting an agent on the slaves via SSH (not something I expected to be intensive work from a memory/CPU point of view).
In the short term I plan to upgrade the master to a more powerful EC2 instance type; however, it'd be good to understand more about what the master really requires, to make sure I'm planning capacity properly and not needlessly over-provisioning.
If you're getting this error in the build log, and you've scripted your build to run on the slaves, either by requesting a node label or by reducing the number of executors on the master to 0, then the checkout happens on the build agent.
If so, then what's running out of memory is not your master, but your slave.
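As a concrete sketch of what requesting a node label looks like in a scripted pipeline (the 'linux-slave' label is just a placeholder for whatever label your agent actually has):

node('linux-slave') {
    // the checkout, and the git fetch it performs, run on this agent rather than on the master
    checkout scm
    // ... the rest of the build ...
}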
I would try increasing the heap size for slave.jar. How you do this depends on how you're firing up the slaves.
This bit in the Jenkins documentation reads:
The JVM launch parameters of these Windows services are controlled by an XML file, jenkins.xml and jenkins-slave.xml respectively. These files can be found in $JENKINS_HOME and in the slave root directory respectively, after you've installed them as Windows services. The file format should be self-explanatory. Tweak the arguments, for example to give the JVM a bigger memory.
But if you're firing them up from the command line using java -jar slave.jar, then this is where you should make use of the -Xmx parameter. Maybe the box where you're running the agent does have enough memory available, but the Jenkins agent is just not using it.
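For example, something along these lines (just a sketch: the heap size is arbitrary, and the connection arguments are whatever you already pass today):

java -Xmx1024m -jar slave.jar <your existing connection arguments>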
Also, here's a really interesting page in the Jenkins wiki covering how to size both masters and slaves.
I use the Jenkins Kubernetes plugin to build a job, but when I start a build I have to wait about 15 seconds before the slave is online. Why does this happen?
You need to be aware that when you use the Kubernetes plugin, your Jenkins slave is created on demand when you build a job. Supposing you are using jnlp-slave as your Jenkins slave image, 15 seconds is the time needed for Kubernetes to schedule the pod and to start up the JNLP slave jar.
What you can do to reduce this time is use the option Time in minutes to retain slave when idle in the Kubernetes plugin configuration, which keeps the pod running for a certain amount of time so that subsequent builds can reuse that slave.
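If you define the pod through the plugin's pipeline syntax rather than the global configuration, the same retention can be set with idleMinutes. A minimal sketch, where the label, the image and the 30-minute value are assumptions of mine:

podTemplate(label: 'k8s-agent', idleMinutes: 30, containers: [
    containerTemplate(name: 'jnlp', image: 'jenkins/jnlp-slave')
]) {
    node('k8s-agent') {
        // builds started within the idle window reuse this pod instead of waiting for a new one
        sh 'echo building on a retained slave pod'
    }
}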
I had the same problem. I fixed it to a large extent by making sure that my master and slave instances were in the same region, subnet and VPC. Moreover, I was fetching some config files from AWS S3, and I moved that bucket to the same region as well.
All of this combined should make everything much quicker.