Jenkins Master vs Slave Capacity Planning - jenkins

Background
Can anyone provide some insight into exactly what the responsibilities of a Jenkins master are with regard to capacity planning?
I currently have a single master and a single slave, where the master is a much less powerful EC2 t2.micro and the slave is a t2.medium.
Every now and then the master dies with out-of-memory errors, including being unable to allocate memory when checking out a project. Jenkins has been configured with -Xmx768m.
I have verified that the build is tied to the slave node and not the master. The master has 0 executors configured and the job shows as running on the slave (1 executor).
Example Error
This is one such example:
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from https://github.com/xyz.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:825)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1092)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1123)
at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:113)
at org.jenkinsci.plugins.workflow.cps.CpsScmFlowDefinition.create(CpsScmFlowDefinition.java:130)
at org.jenkinsci.plugins.workflow.multibranch.SCMBinder.create(SCMBinder.java:120)
at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:262)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:421)
Caused by: hudson.plugins.git.GitException: Command "git fetch --no-tags --progress https://github.com/xyz.git +refs/pull/16/head:refs/remotes/origin/PR-16 +refs/heads/develop:refs/remotes/origin/develop" returned status code 128:
stdout:
stderr: error: cannot fork() for fetch-pack: Cannot allocate memory
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1990)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1709)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$300(CliGitAPIImpl.java:72)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:400)
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:823)
... 8 more
My Question
What is the role of the master in this setup from a resource utilisation perspective? I'd have assumed it would be the slave that checks out the project and builds it (not the master), which is why I allocate much more resource to the slave in terms of memory, CPU, disk capacity and IOPS.
Aside from centralised plugin configuration, I didn't expect the master to play much of a role beyond starting an agent on the slaves via SSH (not something I expected to be intensive from a memory/CPU point of view).
In the short term I plan to upgrade the master to a more powerful EC2 instance type. However, it'd be good to understand more about what the master really requires, so that I plan capacity properly and don't needlessly provision far too much.

If you're getting this error in the build log, and you've scripted your build to run on the slaves, either by requesting a node tag or by reducing the number of executors on the master to 0, then the checkout will happen on the build agent.
If so, then what's running out of memory is not your master, but your slave.
I would try increasing the heap size for the slave.jar. How you do this will depend on how you're firing up the slaves.
The Jenkins documentation reads:
The JVM launch parameters of these Windows services are controlled by an XML file jenkins.xml and jenkins-slave.xml respectively. These files can be found in $JENKINS_HOME and in the slave root directory respectively, after you've install them as Windows services. The file format should be self-explanatory. Tweak the arguments for example to give JVM a bigger memory.
But if you're firing them up from the command line using java -jar slave.jar, then that is where you should make use of the -Xmx parameter. Maybe the box where you're running the agent does have enough memory available, but the Jenkins agent is just not using it.
Also, here's a really interesting page in the Jenkins wiki covering how to size both masters and slaves.
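As a rough sketch of the arithmetic behind picking an -Xmx value: the flag caps only the Java heap, and whatever the JVM does not claim is what remains for the operating system and for child processes such as git. The error in the question comes from fork(), so it may be worth checking that headroom before raising the heap. The function names, the 1024 MB total (a t2.micro has 1 GiB of RAM) and the 512 MB agent heap are illustrative assumptions; only the 768 MB heap and the java -jar slave.jar command come from the thread.

```python
def native_headroom_mb(total_ram_mb, xmx_mb, jvm_overhead_mb=0):
    """Memory left for the OS and forked processes (e.g. git) once the
    Java heap has grown to its -Xmx limit.

    jvm_overhead_mb is a hypothetical allowance for non-heap JVM memory
    (metaspace, thread stacks); it is 0 here because no figure is given.
    """
    return total_ram_mb - xmx_mb - jvm_overhead_mb


def agent_command(xmx_mb, jar="slave.jar"):
    """Build the agent launch command with an explicit heap limit.

    Any connection arguments your setup needs are omitted here.
    """
    return ["java", "-Xmx{}m".format(xmx_mb), "-jar", jar]


if __name__ == "__main__":
    # Assumed figures: 1024 MB instance, 768 MB heap as in the question.
    print(native_headroom_mb(1024, 768))
    # Assumed 512 MB heap for an agent started from the command line.
    print(" ".join(agent_command(512)))
```

With these assumed figures, a fully grown 768 MB heap on a 1 GiB machine leaves about 256 MB for everything else, so a bigger heap on the same box reduces what a forked git process can get.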

Related

Jenkins slow multibranch pipeline scan for Gerrit repo

I'm using a Jenkins master/slave setup with the Gerrit plugin. Multibranch scanning takes very long (around 24 minutes) for a repo with around 80 remote branches and 750 MB in size. The Jenkins master is running version 2.319.3 with 8 vCPUs and 32 GB of memory. However, resource usage is very low even though the scan takes so long. I've also checked the disk IOPS and the network, and both look good. Manually executing the git fetch command on the master takes only 45 seconds. Are there any reasons for the long branch indexing?

Jenkins not able to allow to save the configuration

I am running a Jenkins multibranch job. Suddenly it no longer allows me to save configuration changes; the page keeps loading without ever timing out.
Can someone please help me with this?
You could have a look at the CPU and memory of the Jenkins master machine and see what is consuming them. I have seen this happen when the CPU is at nearly 100%. In that case, restarting the Jenkins process or the Jenkins master machine could help.
Try to remember, or ask colleagues, whether there have been any recent changes to the Jenkins master machine. We had similar issues after installing plugins.
Avoid executing jobs on Jenkins master, use slave agents.
You may need to clean up old builds if you are not doing this already.
In my case, after disabling and re-enabling all plugins one by one, it turned out to be the "AWS SQS Build Trigger Plugin" that caused the "save / apply" buttons to move and stop working.

Jenkins-swarm slaves go down

We have a large number of Jenkins slaves set up via the jenkins-swarm plugin (which auto-discovers the master).
Recently, we started to see slaves going offline and existing jobs getting stuck. We fixed it by restarting the master. However, it has been happening too frequently. Everything else seems fine: there are no network issues and no GC issues.
Anyone?
On the Jenkins slaves, we see this repeatedly once the node becomes unavailable:
Retrying in 10 seconds
Attempting to connect to http://ci.foobar.com/jenkins/ 46185efa-0009-4281-89d2-c4018fa9240c with ID 5a0f1773
Could not obtain CSRF crumb. Response code: 404
Failed to create a slave on Jenkins CODE: 409
A slave called '10.9.201.89-5a0f1773' is already created and on-line

How many JVM invoked by Jenkins?

I use java -jar jenkins.war and java -jar slave.jar to run the Jenkins master and slave. I want to know how many JVMs are invoked by Jenkins and how I can configure their parameters.
In Master:
I think only one JVM (I don't run jobs on the master)
In Slave:
java -jar slave.jar => one JVM
every job has a new JVM; this JVM runs the pre-build step, gets the source code (SVN, Git, ...), and runs the post-build step
every Maven run will have its own JVM
every JUnit run will have its own JVM
Another question: I can set the JVM options for a slave in the node management advanced section, but what uses that configuration?
Every Maven build does not run in its own JVM. Java is multi-threaded. When you launch a slave you can configure the number of threads it can handle, same goes for the master.
Typically you run builds on slave nodes. On Unix systems these can be set up to be launched automatically from the master on remote nodes.
Manage Jenkins -> Manage Nodes -> New Node
Under Launch Advanced options you can specify the JVM parameters for the remote JVM running the Jenkins node software.
Each option has help, for example the "# executors" option:
This controls the number of concurrent builds that Jenkins can perform. So the value affects the overall system load Jenkins may incur. A good value to start with would be the number of processors on your system.
Increasing this value beyond that would cause each build to take longer, but it could increase the overall throughput, because it allows CPU to build one project while another build is waiting for I/O.
When using Jenkins in the master/slave mode, setting this value to 0 would prevent the master from doing any building on its own. Slaves may not have zero executors, but may be temporarily disabled using the button on the slave's status page.
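The help text above can be turned into a small starting-point calculation. This is only a sketch of that rule of thumb (one executor per processor on a build node, zero on a master that should not build). The function name and parameters are hypothetical and not part of Jenkins.

```python
import os


def starting_executors(runs_builds=True, cpu_count=None):
    """Suggest an initial "# executors" value for a node.

    runs_builds=False models a master that should do no building, which
    the help text says is achieved by setting the value to 0.
    cpu_count lets you pass the processor count of a remote node; when it
    is None, the local machine's count is used.
    """
    if not runs_builds:
        return 0
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    # A node that runs builds needs at least one executor.
    return max(1, cpu_count)


if __name__ == "__main__":
    print(starting_executors(runs_builds=False))            # master, no builds
    print(starting_executors(runs_builds=True, cpu_count=2))  # 2-CPU build node
```

The result is only a first guess. As the help text notes, going above the processor count can slow individual builds while still raising overall throughput when builds wait on I/O.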

Is this a supported Jenkins Master-slave configuration?

We have a master Jenkins running on a Linux system. The same master is attached as a node using "Launch slave via execution of a command on the master". It has the same FS root as the JENKINS_HOME. The command is ssh "machine_name" "shell_script"
The shell script gets the latest slave.jar and runs it.
The master has 0 executors. The node has been given 7. I'm seeing weird behavior in the builds, like workspaces being deleted once a day, etc. I'm not sure if this is related to the way the Jenkins Master-slave is configured.
Any ideas if this is a supported configuration?

Resources