Not sure how to resolve OutOfMemory issue on Jenkins server? - jenkins

My Jenkins server keeps crashing, so I generated a heap dump and ran it through VisualVM. It shows that most of the memory is being used up by instances of java.util.concurrent.ConcurrentHashMap$Node.
My understanding is that lots of objects are being referenced and cannot be GC'd, so most of the memory ends up being used by them. Any idea how to resolve this? I'm new to system admin stuff, so I'm not the most technically proficient, sorry.
TIA

I recently came across an OutOfMemoryError that crashed my Jenkins every 2 days. It was due to an LDAP bug in an old version of Java: LDAP error and Java fixed versions matrix
In my case updating java fixed the problem.
Anyway, to investigate the OutOfMemoryError I did the following:
restarted Jenkins after the crash,
took incremental thread dumps every half an hour (they can be taken from <jenkinsUrl>/threadDump),
compared the thread dumps, which pointed me to a memory leak on LDAP threads (a sketch for automating the dumps follows below).
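A minimal sketch of automating those half-hourly thread dumps; the URL, credentials, and output directory are placeholders for your own instance:

#!/bin/sh
# Capture a Jenkins thread dump every 30 minutes via <jenkinsUrl>/threadDump.
JENKINS_URL="https://jenkins.example.com"   # assumption: your Jenkins base URL
AUTH="admin:api-token"                      # assumption: a user/API token with the required permission
OUT_DIR="/var/tmp/jenkins-threaddumps"
mkdir -p "$OUT_DIR"
while true; do
  STAMP=$(date +%Y%m%d-%H%M%S)
  curl -s -u "$AUTH" "$JENKINS_URL/threadDump" > "$OUT_DIR/threaddump-$STAMP.txt"
  sleep 1800
done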
In general I'd also suggest that you:
update Java, Jenkins and its plugins, and any other problematic tools,
investigate the Jenkins logs and dumps, and profile the heap (which you already did); a heap dump sketch follows below.
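For the heap side, a hedged sketch: have the JVM write a dump automatically on the next OutOfMemoryError, or take one on demand with jmap, then open the .hprof in VisualVM (paths and the process lookup are placeholders):

# Add to the Jenkins JVM arguments (for example JAVA_OPTS in /etc/default/jenkins on Debian-style installs):
#   -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp/jenkins-heapdumps
# Or take a dump of the running process on demand:
jmap -dump:live,format=b,file=/var/tmp/jenkins.hprof $(pgrep -f jenkins.war)
# Quick look at the biggest classes without opening VisualVM:
jmap -histo:live $(pgrep -f jenkins.war) | head -n 30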

Related

EXC_BREAKPOINT<redacted> how to debug?

We've recently released our app to our userbase and we are seeing a bunch of redacted exceptions in Sentry that we can't debug in any logical way.
The only thing these exceptions seem to have in common is that they never happen when the application is active:
And the available memory seems to be very low on these devices:
One theory we have is that the OS decides to close any background applications due to low memory available.
But it's quite an assumption to make at this point, when I'm more inclined to believe we have made a mistake in our own code.
To my questions: how would we go about debugging these redacted exceptions? And are we right to believe that our app being closed when it's not active is no cause for concern?
The on-premise version of Sentry has several issues related to this specific problem. According to the Sentry team these will be fixed in an upcoming release of the on-premise version. But to summarize:
At first we had difficulties getting the upload scripts for the dSYMs to work. The Fastlane lane mentioned here did not work at all, and neither did the bash script that was suggested in the Sentry interface under debugging symbols.
What did work was using sentry-cli (latest version) and bumping up the accepted upload file size on the nginx server in front of our on-premise install. But after successfully getting our dSYM files to actually show up in Sentry, we ran into more problems.
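Roughly what worked for us, sketched with placeholder org/project/token values (check the flag names against your sentry-cli version):

# Upload the dSYMs to the on-premise instance with sentry-cli
export SENTRY_URL="https://sentry.example.com"   # assumption: your on-premise URL
export SENTRY_AUTH_TOKEN="<token>"
sentry-cli upload-dsym --org my-org --project my-ios-app path/to/MyApp.app.dSYM
# nginx in front of Sentry rejected the large upload until we raised the body size limit,
# e.g. client_max_body_size 100m; in the relevant server/location block, then reload nginx.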
The issues we've encountered are listed below:
A required debug symbol file was missing
#johan12345 Sorry for getting back to you so late. We've verified your debug symbols and can confirm they should process and symbolicate correctly. The issue you are referring to has been fixed a while back in both sentry-cli and sentry and will be available with the next release.
We have been preparing a major launch over the last couple of months which is why there have been no releases recently. However, since we've received a couple of requests regarding symbolication for on-premise customers, we will try to push a new release out soon. I cannot give you an exact timeline, though, so please stay tuned.
Again, I'm very sorry for the inconvenience this might have caused.
https://github.com/getsentry/sentry/issues/7595
Reprocessing 12 events …
Some users report sometimes getting stuck on reprocessing. This mostly happens with self-hosted installations, but we also had two support issues.
This seems to be triggered by internal server errors in the processing pipeline in bad places.
Related: https://forum.sentry.io/t/stuck-there-are-x-events-pending-reprocessing/1518/6
https://github.com/getsentry/sentry/issues/5862
We've added a new button called "Discard all" which can be found above your processing issues list.
This will discard all processing issues and the corresponding events.
We've also found an error in our processing pipeline we've yet to fix.
I will close this issue for now and link new issues regarding processing errors later.
So the only thing I can advise you right now is basically to deploy the master branch of Sentry, because our last release was in November and we have fixed a bunch of stuff since then.
I'm not sure if we will release a new version before Sentry 9 (which still needs some time).
https://forum.sentry.io/t/ios-exceptions-shows-up-as-redacted/3681
TLDR: We are switching to Crashlytics

Mule 3.8.1 CE - Memory leak issue

In the past few months our company's Mule server went down twice - it happened when there was a lot of traffic. To investigate, we did a load test to simulate a large number of users. I ran this test on my local machine with a 512m application memory size and used JMeter to send requests to Mule (number of threads: 1000, ramp-up period: 10 sec). I used VisualVM to analyze the results.
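For reference, the load test itself can be replayed from the command line roughly like this; the .jmx file name is a placeholder, and the thread count / ramp-up period live in the test plan:

# Non-GUI JMeter run against the local Mule endpoint, results written to a .jtl file
jmeter -n -t mule-load-test.jmx -l results.jtl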
Here are my observations:
I see "java.lang.OutOfMemoryError: GC overhead limit exceeded" in Anypoint Studio's console.
No error in the Mule log.
I did a heap dump - it showed that the combined size of char array objects is really big.
We see a very high number of AsyncLogger-related classes: we now suspect a memory leak when writing logs.
When I changed all log levels to ERROR in log4j2.xml, in order to remove all normal logging, garbage collection performed correctly.
Then I tried Mule Runtime 3.9.0 Community Edition and reverted the log level change; garbage collection kept working and the number of threads went down when I stopped sending requests to Mule.
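One quick, non-Mule-specific way to sanity-check the AsyncLogger suspicion is a live class histogram of the running JVM; the PID is a placeholder:

# Object counts in the running Mule JVM; char[] shows up as [C in the histogram
jmap -histo:live <mule-pid> | egrep '\[C|Logger' | head -n 20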
So why is there a memory leak problem in 3.8.1? I checked the 3.9.0 release notes - the Resolved Issues list - and I don't see anything related to this. What is the root cause of the problem?
Log4j was updated in Mule 3.8.5/3.9.0, which could be the reason. I suggest you take a look at all the release notes between 3.8.1 and 3.9.0, since the 3.9.0 release notes are based on the latest 3.8.x version at that point. In fact, you can find the update information in the 3.8.5 release notes. HTH

What is causing a repeated glibc error with plink / batch job software?

I am running plink software through a PBS batch job. This error occurs when I run the job:
*** glibc detected *** /software/plink: double free or corruption (out): 0x000000018dfafca0 ***
======= Backtrace: =========
[0x7d7691]
[0x7d8bea]
[0x45f5ed]
[0x47bb11]
[0x40669a]
[0x7bdb2c]
[0x400209]
However, it only occurs with one of my files (the files are between 30-60 GB), and each rerun shows the exact same backtrace. I tried running it outside the batch scheduler and received the same error again, with the same backtrace. I am just using the software (plink) and didn't write it, so most of the answers online, which are about writing and freeing memory in your own program, don't apply.
Any ideas on
what is causing this error, and
how I can fix it?
what is causing this error, and
A double free or heap corruption inside the plink binary itself.
how I can fix it?
You can't. You can do one of two things, depending on how much you know and understand.
First, build the newest version of plink from source, and see if the problem persists.
If it does not, you are done (or at least you might hope that someone else found and fixed this problem).
If it does, you'll have to debug the problem sufficiently for either you or the plink developers to fix it. Some tools that should help: Valgrind and AddressSanitizer (note: in addition to Clang, AddressSanitizer is also included in GCC 4.8 and later).
Once you have a good report (where the memory was allocated, and where it got corrupted), you should either fix it and submit your fix to plink developers, or give them a bug report with the allocation and corruption location and stack traces.
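A hedged sketch of both approaches; the plink arguments are placeholders for whatever your job actually runs, and the rebuild assumes the Makefile honours CXXFLAGS/LDFLAGS:

# 1) Run the existing binary under Valgrind to locate the bad free
#    (very slow on 30-60 GB inputs; a smaller input that still reproduces helps)
valgrind --tool=memcheck /software/plink --bfile my_dataset --make-bed --out check_run
# 2) Rebuild from source with AddressSanitizer (GCC 4.8+ or Clang) and rerun
CXXFLAGS="-g -O1 -fsanitize=address -fno-omit-frame-pointer" LDFLAGS="-fsanitize=address" make
./plink --bfile my_dataset --make-bed --out asan_run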

nunit large project integration tests, out of memory exception

I'm using NUnit to run integration tests (each test interacts with the database via NHibernate) for a large project with 600+ tests. The problem is that after 10-15 minutes NUnit throws an out of memory exception. I used Redgate ANTS Memory Profiler to see why NUnit is not releasing memory between tests. It seems it recreates the permission objects for every test, and memory keeps growing until the exception is eventually thrown.
I took snapshots on one integration test class and you can see that memory keeps growing after a while. I didn't find any call or setting for NUnit that forces memory release or solves this trust issue.
I really appreciate any help.
<NamedPermissionSets><PermissionSet class="System.Security.NamedPermissionSet" version="1" Unrestricted="true" Name="Full ...
ANTS Snapshots:
http://www.tinyuploads.com/images/IXwb8Q.jpg
http://www.tinyuploads.com/images/3R2VbB.png
I can't tell which version you're using, but your problem might have been fixed in NUnit 2.6.3. See here for the bug report.
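If you're on an older 2.6.x build, a quick way to test whether that fix helps is to pull 2.6.3 explicitly; this sketch assumes the NuGet packages are how you consume NUnit and its console runner:

# Fetch NUnit 2.6.3 and the matching console runners, then re-run the suite against them
nuget install NUnit -Version 2.6.3
nuget install NUnit.Runners -Version 2.6.3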

PermGen problems with Lift and Jetty

I'm developing on the standard Lift platform (maven and jetty). I'm repeatedly (once every couple of days) getting this:
Exception in thread "7048009#qtp-3179125-12" java.lang.OutOfMemoryError: PermGen space
2009-09-15 19:41:38.629::WARN: handle failed
java.lang.OutOfMemoryError: PermGen space
This is in my dev environment. It's not a big problem because I can keep restarting the server, and I'm not seeing it in deployment, so it's not a real issue. I'm just curious.
I don't know too much about the JVM. I think I'm correct in thinking that permanent generation memory is for things like classes and interned strings? What I remember is a bit mixed up with the .NET memory model...
Any reason why this is happening? Are the defaults just crazily low? Is it to do with all the auxiliary objects that Scala has to create for Function objects and similar FP things? Every time I restart Jetty with newly written code (every few minutes) I imagine it reloads classes etc. But even so, it can't be that many, can it? And shouldn't the JVM be able to deal with a large number of classes?
Cheers
Joe
From this post:
This exception occurred for one simple reason:
the permgen space is where class properties, such as methods, fields, annotations, and also static variables, etc. are stored in the Java VM, but this space has the particularity of not being cleaned by the garbage collector.
So if your webapp uses or creates a lot of classes (I'm thinking of dynamic generation of classes), chances are you will meet this problem.
Here are some solutions that helped me get rid of this exception :
-XX:+CMSClassUnloadingEnabled: this setting enables garbage collection in the permgen space
-XX:+CMSPermGenSweepingEnabled: allows the garbage collector to remove even classes from memory
-XX:PermSize=64M -XX:MaxPermSize=128M: raises the amount of memory allocated to the permgen space
Maybe this could help.
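To check whether the extra flags actually make a difference, you can watch the permanent generation while you redeploy; the PID is a placeholder, and the column names below apply to the Java 6/7 tools:

# PermGen capacity (PC) and utilization (PU), sampled every 5 seconds
jstat -gcold <jetty-pid> 5s
# One-off summary of all generations, including perm space
jmap -heap <jetty-pid>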
Edit July 2012 (almost 3 years later):
Ondra Žižka comments (and I have updated the answer above):
JVM 1.6.0_27 says: Please use:
CMSClassUnloadingEnabled (Whether class unloading enabled when using CMS GC)
in place of CMSPermGenSweepingEnabled in the future
See the full Hotspot JVM Options - The complete reference for more.
If you see this when running mvn jetty:run,
set the MAVEN_OPTS.
For Linux:
export MAVEN_OPTS="-XX:+CMSClassUnloadingEnabled -XX:PermSize=256M -XX:MaxPermSize=512M"
mvn jetty:run
For Windows:
set "MAVEN_OPTS=-XX:+CMSClassUnloadingEnabled -XX:PermSize=256M -XX:MaxPermSize=512M"
mvn jetty:run
Should be fine now. If not, increase -XX:MaxPermSize.
You can also set these permanently in your environment.
For Linux, append the export line to ~/.bashrc.
For Windows, press Win-key + Pause and go to Advanced > Environment Variables.
See also http://support.microsoft.com/kb/310519.
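On Linux that amounts to something like the following (same assumed sizes as above):

# Make the setting permanent for your shell, then reload it
echo 'export MAVEN_OPTS="-XX:+CMSClassUnloadingEnabled -XX:PermSize=256M -XX:MaxPermSize=512M"' >> ~/.bashrc
source ~/.bashrc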
This is because of the reloading of classes, as you suggested. If you are using lots of libraries etc., the number of classes will rapidly grow with each restart. Try monitoring your Jetty instance with VisualVM to get an overview of memory consumption when reloading.
The mailing list (http://groups.google.com/group/liftweb/) is the official support forum for Lift, and where you'll be able to get a better answer. I don't know the particulars of your dev setup (you don't go into much detail), but I assume you're reloading your war in Jetty without actually restarting it. Lift doesn't perform dynamic class generation (as suggested by VonC above), but Scala compiles each closure as a separate class. If you're adding and removing closures to your code over the course of several days, it's possible that too many classes are being loaded and never unloaded, taking up perm space. I'd suggest you enable the JVM options mentioned by VonC above and see if they help.
The permanent generation is where the JVM puts stuff that will probably not be (garbage) collected like custom classloaders.
Depending on what you are deploying, the perm gen setting can be low. Some application and/or container combinations do contain memory leaks, so when an app gets undeployed, things like class loaders are sometimes not collected, which fills up the perm space and produces the error you are seeing.
Unfortunately, currently the best option in this case is to max out the perm space with the following JVM flag (example for a 192m perm size):
-XX:MaxPermSize=192M (or 256M)
The other option is to make sure that neither the container nor the framework leaks memory.
