How can I reduce Jenkins master/slave communication? - jenkins

I have set up a Jenkins master (Linux) and slave (Windows) on Amazon EC2. My build runs on the slave, and it is considerably slower than what I was seeing on a similar desktop machine (c4.large).
Looking at the load on the slave, neither the CPU cores nor the memory is fully used (CPU at 60%, 30% on each core; memory stable at about 2.5 GB).
However, I'm starting to think the bottleneck is network traffic. There seems to be a lot of traffic between master and slave: it averages about 500 Kbps outgoing and 250 Kbps incoming, with regular spikes of several Mbps.
I've searched around, but I can't really figure out what data Jenkins is sending.
The job does quite a bit of logging, but this is not the only source. The console log is about 15 MB for a 2-hour build, which translates to roughly (15*1024*8)/(2*60*60) ≈ 17 Kbps.
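Here is a quick sanity check of that arithmetic as a minimal Python sketch. It follows the formula above in assuming that 1 MB = 1024 KB and that "Kbs" means kilobits per second. The 500 Kbps figure is the average outbound rate quoted earlier.

```python
def avg_kbps(log_megabytes: float, duration_hours: float) -> float:
    """Average bandwidth (kilobits/second) needed to stream a log of this size over this duration."""
    kilobits = log_megabytes * 1024 * 8  # MB -> KB (x1024), bytes -> bits (x8)
    seconds = duration_hours * 60 * 60
    return kilobits / seconds


if __name__ == "__main__":
    expected = avg_kbps(15, 2)
    observed = 500  # average outbound rate reported above, in Kbps
    print(f"expected ~{expected:.1f} Kbps, observed ~{observed} Kbps "
          f"({observed / expected:.0f}x)")
    # prints: expected ~17.1 Kbps, observed ~500 Kbps (29x)
```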
So my questions:
What else is Jenkins communicating between master and slaves?
How can I reduce this?
Update: Just to be sure, I disabled all logging during the build, and it turns out that logging was the source of the traffic. Given my calculation above, I don't understand why there is so much traffic. Maybe some kind of polling or acknowledgement is causing a lot of overhead; I still don't know.
It's apparently not the main source of the slowdown either, but I would still like to find a way to reduce this network traffic, as it is going to affect the AWS bill.

Related

Can't generate more than ~8000 RPM from Locust

I'm using Locust to load test my web servers. I'm running Locust in distributed mode. The worker nodes are written in Java, and use the Locust/Java port using locust4j. The master node and the worker nodes are containerized, our orchestrator is Kubernetes. When I want to spin up more workers, I'm doing it from there.
The problem that I'm running into is that no matter how many users I add, or worker nodes I add, I can't seem to generate more than ~8000 RPM. This is confirmed by the Locust web frontend, as well as the metrics I'm collecting from my web server.
Does anyone have any ideas why this is happening?
I've attached an image of the timings I've collected. The snapshots are from running the load test for 60 seconds, timed with a stopwatch.
The usual culprit in these situations is that your servers can't handle more than that. In my experience, as the servers get overwhelmed you'll see a slow but steady increase in response times on the client side. This is one big reason why Locust includes those in the metrics it shows you.
Based on your screenshots, this is most likely the case for you. You have some very low minimum times, but your average, median, and 90th percentiles are a lot higher than your minimums, and your maximums are significantly higher still. Without seeing your charts I can't know for sure, but that's a big red flag.
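The following minimal Python sketch makes the "minimum vs. average/median/90th percentile vs. maximum" comparison concrete. It computes those columns from a list of response times using the standard library's statistics module. This is not Locust's own percentile calculation, so its numbers may differ slightly from the web UI. The sample times in the usage block are hypothetical.

```python
import statistics


def summarize(times_ms: list[float]) -> dict[str, float]:
    """Min/median/average/90th percentile/max of response times (needs at least 2 samples)."""
    deciles = statistics.quantiles(times_ms, n=10)  # 9 cut points; the last is the 90th percentile
    return {
        "min": min(times_ms),
        "median": statistics.median(times_ms),
        "avg": statistics.fmean(times_ms),
        "p90": deciles[-1],
        "max": max(times_ms),
    }


if __name__ == "__main__":
    # Hypothetical response times in ms, not the numbers from the screenshots.
    sample = [12, 15, 40, 85, 120, 180, 240, 300, 450, 900]
    for name, value in summarize(sample).items():
        print(f"{name:>6}: {value:.0f} ms")
```

A large gap between "min" and the other columns is the pattern described above. The sketch only reports the numbers and does not decide what counts as "too large".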
For more things to look out for, check out this question in the FAQ (especially see the list of server stats to investigate):
https://github.com/locustio/locust/wiki/FAQ#increase-my-request-raterps

Jenkins: jobs in queue are stuck and not triggered to be restarted

For a while now, our Jenkins has been experiencing critical problems: jobs hang, and our job scheduler does not trigger builds. After a Jenkins service restart everything is back to normal, but after some time all the problems return (this period can be a week, a day, or even less). Any idea where we can start looking? I'd appreciate any help on this issue.
Muatik made a good point in his comment: the recommended approach is to run jobs on agent (slave) nodes. If you already do that, you can look at:
Jenkins master machine CPU, RAM and hard disk usage. Access the machine and/or use a plugin such as JavaMelody. I have seen missing charts in build test results, and stuck builds, caused by running out of hard disk space. You could also have hit the RAM or CPU limit for the slaves/jobs you are executing; you may need more heap space.
Look at the Jenkins log files, starting with severe exceptions. If the files are too big or you see logrotate exceptions, you can change the logging levels so that fewer exceptions are logged. For more details, see my article on this topic. Try to fix the exceptions you see logged.
Go through recent changes that could cause this behavior, for example new plugins or changes to config files (jenkins.xml).
Look at TCP connections. Run netstat -a: are there suspicious connections (CLOSE_WAIT status)?
Delete old builds that you do not need.
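For the netstat check, here is a minimal sketch that assumes a Linux master. It reads /proc/net/tcp directly (the kernel table that netstat reports on) and counts CLOSE_WAIT sockets per local port. A pile-up on the Jenkins HTTP or agent port would suggest connections that the remote side has closed but Jenkins has not. IPv6 sockets are listed in /proc/net/tcp6, which this sketch ignores.

```python
from collections import Counter
from pathlib import Path

# Linux TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def close_wait_by_port(proc_net_tcp: str) -> Counter:
    """Count CLOSE_WAIT sockets per local port, given the text of /proc/net/tcp."""
    counts: Counter = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if TCP_STATES.get(state.upper()) == "CLOSE_WAIT":
            port = int(local_address.split(":")[1], 16)  # the port is hex-encoded
            counts[port] += 1
    return counts


if __name__ == "__main__":
    text = Path("/proc/net/tcp").read_text()
    for port, n in close_wait_by_port(text).most_common():
        print(f"local port {port}: {n} connection(s) in CLOSE_WAIT")
```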
We had been facing this issue for the last 4 months and tried everything: changing CPU and memory resources, and increasing the desired number of nodes in the ASG. Nothing seemed to work.
Solution:
1. Go to Manage Jenkins -> Configure System -> Maven Project Configuration.
2. In the "Usage" field, select "Only build jobs with label expressions matching this node".
Doing this resolved it, and Jenkins is running like a rocket now :)

Neo4j HA Servers keep failing

We have just put our system into production, and we have a lot of users on it. Our servers keep failing and we are not sure why. It seems to start with one server; then the cluster elects a new master, and a few minutes later all the servers in the cluster go down. I have it set up to send all the reads to the read databases and to leave the writes to the master. I have looked through the logs and cannot find a root cause. Let me know which logs I should upload and/or where I should look. Today alone we have had to restart the servers 4 times; that fixes it for a bit, but it's not a cure for the issue.
All database servers have 16 GB RAM, 8 CPUs and SSDs. I have them set up with the following settings in neo4j.properties:
neostore.nodestore.db.mapped_memory=1024M
neostore.relationshipstore.db.mapped_memory=2048M
neostore.propertystore.db.mapped_memory=6144M
neostore.propertystore.db.strings.mapped_memory=512M
neostore.propertystore.db.arrays.mapped_memory=512M
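For a quick tally, here is a minimal Python sketch that sums the mapped_memory settings above. It assumes every value ends in an M or G suffix. It only does the arithmetic and says nothing about whether the remaining RAM is enough for the JVM heap and the OS.

```python
# The settings from the question, verbatim.
CONFIG = """
neostore.nodestore.db.mapped_memory=1024M
neostore.relationshipstore.db.mapped_memory=2048M
neostore.propertystore.db.mapped_memory=6144M
neostore.propertystore.db.strings.mapped_memory=512M
neostore.propertystore.db.arrays.mapped_memory=512M
"""


def total_mapped_memory_mb(properties: str) -> int:
    """Sum every *mapped_memory setting in megabytes (assumes an M or G suffix)."""
    units = {"M": 1, "G": 1024}  # assumption: only M and G suffixes are used
    total = 0
    for raw in properties.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and anything that is not key=value
        key, value = (part.strip() for part in line.split("=", 1))
        if key.endswith("mapped_memory"):
            total += int(value[:-1]) * units[value[-1].upper()]
    return total


if __name__ == "__main__":
    mapped = total_mapped_memory_mb(CONFIG)
    ram_mb = 16 * 1024  # 16 GB per server, as stated above
    print(f"mapped: {mapped} MB of {ram_mb} MB RAM; {ram_mb - mapped} MB left for everything else")
    # prints: mapped: 10240 MB of 16384 MB RAM; 6144 MB left for everything else
```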
We are using New Relic to monitor the servers, and we do not see the hardware go above 50% CPU or 40% memory, so we are pretty sure that is not it.
Any help is appreciated :)

How many Remote Nodes can Jenkins manage

How many remote nodes can Jenkins manage? Are there any limitations or memory issues?
Which is more effective:
1) 100 nodes with 1 executor per node?
2) 5 nodes with 20 executors per node?
Tx.
As far as I know, there is no limit on the number of nodes you can have, although at some point your system might say enough is enough! You can run into issues such as the number of processes per user (we hit this recently, not with Jenkins but with another application: RAM and disk space were fine, but the system stopped responding and we started getting "cannot fork()" errors) or the total number of open files. Some of these limits are configurable, but raising them may not be allowed or feasible.
If resources (in your case, nodes) are not a constraint, which process wouldn't like to run wild? :) In practice, though, you generally won't have the flexibility to choose the first option. In the second case, with 5 nodes and 20 executors each, just make sure not to tie jobs to a particular node unless you have a compelling reason.
Some slaves are faster, while others are slower. Some slaves are closer (network-wise) to the master, while others are far away, so distributing builds well is a challenge. Currently, Jenkins employs the following strategy:
If a project is configured to stick to one computer, that's always honored.
Jenkins tries to build a project on the same computer that it was previously built.
Jenkins tries to move long builds to slaves, because the amount of network interaction between a master and a slave tends to grow logarithmically with the duration of a build (in other words, even if project A takes twice as long to build as project B, it won't require twice the network transfer). This strategy reduces the network overhead.
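The three rules above can be read as a preference order. The toy Python sketch below expresses that order. Node, Project, choose_node and the long_running flag are all hypothetical names used for illustration, and this is not how Jenkins' real load balancer is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    is_master: bool = False
    free_executors: int = 1


@dataclass
class Project:
    name: str
    pinned_node: Optional[str] = None    # "configured to stick to one computer"
    last_built_on: Optional[str] = None  # node that ran the previous build
    long_running: bool = False           # hypothetical flag standing in for "long build"


def choose_node(project: Project, nodes: list[Node]) -> Optional[Node]:
    """Toy version of the preference order described above (not Jenkins' real code)."""
    available = [n for n in nodes if n.free_executors > 0]
    by_name = {n.name: n for n in nodes}
    # 1. A pinned project is always built on its node; if that node is busy, it waits (None).
    if project.pinned_node is not None:
        node = by_name.get(project.pinned_node)
        return node if node is not None and node.free_executors > 0 else None
    # 2. Prefer the node that built the project last time.
    for n in available:
        if n.name == project.last_built_on:
            return n
    # 3. Send long builds to a slave rather than the master.
    if project.long_running:
        slaves = [n for n in available if not n.is_master]
        if slaves:
            return slaves[0]
    return available[0] if available else None
```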
You should also have a look at these links:
https://wiki.jenkins-ci.org/display/JENKINS/Least+Load+Plugin
https://wiki.jenkins-ci.org/display/JENKINS/Gearman+Plugin

building multiple jobs in jenkins performance

In Jenkins I have 100 java projects. Each has its own build file.
Every time, I want to clean the build output and compile all source files again.
Using the Bulk Builder plugin, I tried compiling all the jobs, with 100 jobs running in parallel.
But performance is very bad: a job that takes 1 minute on its own takes 20 minutes in the batch, and the bigger the batch, the longer it takes. I am running this on a powerful server, so memory and CPU are not a problem.
Please suggest how to overcome this. What configuration needs to be done in Jenkins?
I am launching Jenkins from the war file.
Thanks..
Even though you say you have enough memory and CPU resources, you seem to imply there is some kind of bottleneck when you increase the number of jobs running in parallel. I think this is understandable. I am not a Java developer, but I think most Java build tools can parallelize a build internally, i.e. building a single job may well consume more than one CPU core and quite a lot of memory.
Because of this, I suggest you monitor your build server and experiment with different batch sizes to find the optimal number. For example, run "vmstat 5" while builds are running and check whether you have idle CPU left. Also keep an eye on disk I/O: if you increase the batch size but disk I/O does not increase, you are already using all of the I/O capacity, and increasing the batch size further will probably not help much.
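Here is a minimal sketch of automating that check. It assumes a Linux build server with the procps vmstat, whose column names are r, b, ..., us, sy, id, wa, st. The script parses vmstat output and averages the idle (id) and I/O-wait (wa) columns. Running it at a few batch sizes and comparing the results is one way to carry out the experiment described above. The interval and sample count are arbitrary choices.

```python
import subprocess


def parse_vmstat(output: str) -> list[dict[str, int]]:
    """Turn vmstat output into one dict per sample, keyed by column name (us, sy, id, wa, ...)."""
    header: list[str] = []
    samples = []
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0] == "r":
            header = fields  # column-name row; vmstat may repeat it periodically
        elif header and fields and fields[0].isdigit():
            samples.append(dict(zip(header, map(int, fields))))
    return samples


def cpu_summary(samples: list[dict[str, int]], skip_first: bool = True) -> dict[str, float]:
    """Average idle CPU (id) and I/O wait (wa) percentages across samples."""
    if skip_first:
        samples = samples[1:]  # vmstat's first line is an average since boot
    if not samples:
        raise ValueError("need at least one interval sample")
    n = len(samples)
    return {
        "idle_pct": sum(s["id"] for s in samples) / n,
        "iowait_pct": sum(s["wa"] for s in samples) / n,
    }


if __name__ == "__main__":
    # 12 samples, 5 seconds apart; run this while a batch of builds is executing.
    out = subprocess.run(["vmstat", "5", "12"], capture_output=True, text=True, check=True).stdout
    print(cpu_summary(parse_vmstat(out)))
```

Little idle CPU points to a CPU limit. High I/O wait with flat bi/bo numbers points to the disk limit mentioned above.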
When you have found the optimal batch size (i.e. how many executors to configure for the build server), you can maybe tweak other things to make things faster:
Try to spend as little time as possible checking out code. Instead of deleting the workspace before the build starts, configure the SCM plugin to remove files that are not under version control. If you use git, you can use a local reference repo, do a shallow clone, or something similar.
You can also try to speed things up by using SSDs.
You can get more servers, run Jenkins slaves on them, and use the CPU and I/O capacity of multiple servers instead of only one.
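The checkout advice above can be sketched in Python as a function that returns git commands rather than running them. The function name, URL and paths are hypothetical. If a clone already exists, the function reuses it with git clean -fdx (removing untracked and ignored files) instead of wiping the workspace. Otherwise it clones with the optional --reference and --depth flags. In a real Jenkins setup you would normally turn on the equivalent options in the Git plugin rather than script them yourself.

```python
from pathlib import Path
from typing import Optional


def checkout_commands(url: str, workspace: Path, reference: Optional[Path] = None,
                      depth: Optional[int] = None) -> list[list[str]]:
    """Hypothetical helper: git commands for a fast checkout into `workspace`."""
    if (workspace / ".git").is_dir():
        # Reuse the existing clone: drop untracked and ignored files instead of deleting everything.
        return [["git", "-C", str(workspace), "clean", "-fdx"],
                ["git", "-C", str(workspace), "pull", "--ff-only"]]
    cmd = ["git", "clone"]
    if reference is not None:
        cmd += ["--reference", str(reference)]  # borrow objects from a local mirror
    if depth is not None:
        cmd += ["--depth", str(depth)]  # shallow clone: only the most recent commits
    return [cmd + [url, str(workspace)]]


if __name__ == "__main__":
    for cmd in checkout_commands("https://example.com/project.git", Path("workspace"),
                                 reference=Path("/var/cache/git/project.git"), depth=1):
        print(" ".join(cmd))  # swap print for subprocess.run(cmd, check=True) to execute
```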
