For a while, our Jenkins experiences critical problems. We have jobs hung, our job scheduler does not trigger the builds. After the Jenkins service restart, everything is back to normal, but after some period of time all problem are return. (this period can be week or day or ever less). Any idea where we can start looking? I'll appreciate any help on this issue
Muatik has made a good point in his comment, the recommended approach is to run jobs on agents (slave) nodes. If you already do it, you can look at:
Jenkins master machine CPU, RAM and hard disk usage. Access the machine and/or use plugin like Java Melody. I have seen missing graphics in the builds test results and stuck builds due to no hard disk space. You could also have hit the limit of RAM or CPU for the slaves/jobs you are executing. You may need more heap space.
Look at Jenkins log files, start with severe exceptions. If the files are too big or you see logrotate exceptions, you can change the logging levels, so that fewer exceptions are logged. For more details see my article on this topic. Try to fix exceptions that you see logged.
Go through recently made changes that can be the cause of such behavior, for example, new plugins, changes to config files (jenkins.xml)?
Look at TCP connections. Run netstat -a Are there suspicious connections (CLOSED_WAIT status)?
Delete old builds that you do not need.
We have been facing this issue from last 4 months, and tried everything, changing resources CPU & memory, increasing desired nodes in ASG. But nothing seems worked .
Solution: 1. Go to Manage Jnekins-> System Configurationd-> Maven project
configurations
2. In "usage" field, select "Only buid Jobs with label expressions matching this nodes"
Doing this resolved it and jenkins is working as a Rocket now :)
Related
I have a Bazel repo that builds some artifact. The problem is that it stops half way through hanging with this message:
[3 / 8] no action
What on the earth could be causing this condition? Is this normal?
(No, the problem is not easily reducible, there is a lot of custom code and if I could localize the issue, I'd not be writing this question. I'm interested in general answer, what in principle could cause this and is this normal.)
It's hard to answer your question without more information, but yes it is sometimes normal. Bazel does several things that are not actions.
One reason that I've seen this is if Bazel is computing digests of lots of large files. If you see getDigestInExclusiveMode in the stack trace of the Bazel server, it is likely due to this. If this is your problem, you can try out the --experimental_multi_threaded_digest flag.
Depending on the platform you are running Bazel on:
Windows: I've seen similar behavior, but I couldn't yet determine the reason. Every few runs, Bazel hangs at the startup for about half a minute.
If this is mid-build during execution phase (as it appears to be, given that Bazel is already printing action status messages), then one possible explanation is that your build contains many shell commands. I measured that on Linux (a VM with an HDD) each shell command takes at most 1-2ms, but on my Windows 10 machine (32G RAM, 3.5GHz CPU, HDD) they take 1-2 seconds, with 0.5% of the commands taking up to 10 seconds. That's 3-4 orders of magnitude slower if your actions are heavy on shell commands. There can be numerous explanations for this (antivirus, slow process creation, MSYS being slow), none of which Bazel has control over.
Linux/macOS: Run top and see if the stuck Bazel process is doing anything at all. Try hitting Ctrl+\, that'll print a JVM stacktrace which could help identifying the problem. Maybe the JVM is stuck waiting in a lock -- that would mean a bug or a bad rule implementation.
There are other possibilities too, maybe you have a build rule that hangs.
Does Bazel eventually continue, or is it stuck for more than a few minutes?
How many Remote Nodes can Jenkins manage ? Are there any limitations/memory issues?
What is more effective:
1) 100 Nodes 1 executor per node ?
2) 5 Nodes with 20 executors per node ?
Tx.
As far as i know, there is no limitation on # of nodes one can have although your system might feel like saying, enough is enough! Issues such as number of processes per user (we got this issue recently, not with Jenkins but some other application where RAM and disk space were fine but the system stopped responding. We started getting system cannot fork() error), total number of open files etc. Few such issues might still be configurable but may not be allowed/feasible.
If resource (in your case, nodes) is not a constraint, which process wouldn't like to run wild? :) In practical cases, generally you wouldn't have the flexibility to opt for first option. In second case where you have 5 nodes with 20 executors, all you have to make sure is not to tie up jobs to a particular node unless you have a compelling reason.
Some slaves are faster, while others are slow. Some slaves are closer (network wise) to a master, others are far away. So doing a good build distribution is a challenge. Currently, Jenkins employs the following strategy:
If a project is configured to stick to one computer, that's always honored.
Jenkins tries to build a project on the same computer that it was previously built.
Jenkins tries to move long builds to slaves, because the amount of network interaction between a master and a slave tends to be logarithmic to the duration of a build (IOW, even if project A takes twice as long to build as project B, it won't require double network transfer.) So this strategy reduces the network overhead.
You should also have a look at these links:
https://wiki.jenkins-ci.org/display/JENKINS/Least+Load+Plugin
https://wiki.jenkins-ci.org/display/JENKINS/Gearman+Plugin
In Jenkins I have 100 java projects. Each has its own build file.
Every time I want clear the build file and compile all source files again.
Using bulkbuilder plugin I tried compling all the jobs.. Having 100 jobs run parallel.
But performance is very bad. Individually if the job takes 1 min. in the batch it takes 20mins. More the batch size more the time it takes.. I am running this on powerful server so no problem of memory and CPU.
Please Suggest me how to over come this.. what configurations need to be done in jenkins.
I am launching jenkins using war file.
Thanks..
Even though you say you have enough memory and CPU resources, you seem to imply there is some kind of bottleneck when you increase the number of parallel running jobs. I think this is understandable. Even though I am not a java developer, I think most of the java build tools are able to parallelize build internally. I.e. building a single job may well consume more than one CPU core and quite a lot of memory.
Because of this I suggest you need to monitor your build server and experiment with different batch sizes to find an optimal number. You should execute e.g. "vmstat 5" while builds are running and see if you have idle cpu left. Also keep an eye on the disk I/O. If you increase the batch size but disk I/O does not increase, you are consuming all of the I/O capacity and it probably will not help much if you increase the batch size.
When you have found the optimal batch size (i.e. how many executors to configure for the build server), you can maybe tweak other things to make things faster:
Try to spend as little time checking out code as possible. Instead of deleting workspace before build starts, configure the SCM plugin to remove files that are not under version control. If you use git, you can use a local reference repo or do a shallow clone or something like that.
You can also try to speed things up by using SSD disks
You can get more servers, run Jenkins slaves on them and utilize the cpu and I/O capacity of multiple servers instead of only one.
We're running Jenkins server with few slaves that run the builds. Lately there are more and more builds that are running in the same time.
I see the java.exe process on the Jenkins server just increasing , and not decreasing even when the jobs were finnished.
Any idea dwhy oes it happen?
We're running Jenkins ver. 1.501.
Is there a way maybe to make the Jenkins server service ro wait until the last job is finnished, then restart automatically?
I can't seem to find a reference on this (still posting an answer because it's too long for comments ;-) ) but this is what I've observed using the Oracle JVM:
If more memory than currently reserved is needed, the JVM reserves more. So far so good. What it doesn't seem to do is release the memory once it's not needed anymore. You can watch this behaviour by turning on the heap size indicator in Eclipse.
I'd say the same does happen with Jenkins. A running Jenkins with only a few projects already can easily jump the 1 gig mark. If you have a lot of concurrent builds, Jenkins needs a lot of memory at some point. After the builds are done and the heapsize has decreased, the JVM keeps the memory reserved. It is practically "empty" but still claimed by the JVM so it's unavailable for other processes.
Again: It's just an observation. I'd be happy if someone with deeper insight on Java memory management would back this up (or disprove it)
As for a practical solution I'd say you gonna have to live with it to some point. Jenkins IS very hungry for memory. Restarting it solves the problem only temporary. At least it should stop claiming memory at some point because the "empty" reserved memory should be reused. If it's not this really sounds like a memory leak and would be a bug.
Jenkins' [excessive] use of memory without bounds seems to be a common observation. The Jenkins Wiki gives some suggestions for "I'm getting OutOfMemoryErrors".
We have also found that the Monitoring Plugin is useful for keeping an eye on the memory usage and helping us know if we might need to restart Jenkins soon.
Is there a way maybe to make the Jenkins server service ro wait until the last job is finnished, then restart automatically?
Check out the Restart Safely Plugin
I've installed a master jenkins instance and 2 slave nodes.
Both slaves are not synchronous with the master. Sometimes it shows that the slaves are 2 days or 1 hour in the future, sometimes it shows that time on slaves is behind the master - it seems to randomize.
Because of this some selenium tests or builds or other jobs doesn't work correctly anymore. The problem occurred suddenly and it doesn't matter which version of jenkins has been installed.
Has anyone an idea how to fix this problem?
Thank you very much.
Cheers
Christoph
It is hard to explain why the time difference would vary abruptly between the two machines. I assume you are referring to the information given by the http://jenkins.mydomain/computer/ url.
Normally you want to keep your machines in time sync, and enable NTP clients on each host, each pointing to the same set of NTP servers, either internal to your organization (if that is available), or the standard free NTP services available on the web.
Do you have this setup already and see abrupt time variations? If so, review your list of NTP services and make sure to use reliable ones and also the same list, it should help. Maybe narrow it down to just one service and then expand if need be.