Mule 3.8.1 CE - Memory leak issue

In the past few months our company's Mule instance went down twice, both times under heavy traffic. To investigate, we ran a load test to simulate a large number of users. I ran this test on my local machine with a 512 MB application heap, using JMeter to send requests to Mule (number of threads: 1000, ramp-up period: 10 s), and used VisualVM to analyze.
Here are my observations:
I see "java.lang.OutOfMemoryError: GC overhead limit exceeded" in Anypoint Studio's console.
There is no error in the Mule log.
A heap dump showed that the combined size of char[] objects is very large.
We also see a very high number of AsyncLogger-related objects, so we now suspect a memory leak in log writing.
When I changed every log level in log4j2.xml to ERROR, so that normal logs are no longer written, garbage collection behaved correctly.
I then tried Mule Runtime 3.9.0 Community Edition and reverted the log-level change; garbage collection worked and the thread count dropped when I stopped sending requests to Mule.
So why does 3.8.1 have this memory leak? I checked the Resolved Issues list in the 3.9.0 release notes and don't see anything related. What is the root cause of the problem?
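For reference, the log-level experiment described above can also be reproduced programmatically instead of editing log4j2.xml. Here is a minimal sketch using the standard Log4j 2 API (it assumes the log4j-core jar bundled with Mule is on the classpath; the class and method names are plain Log4j 2, nothing Mule-specific):

import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.core.config.Configurator;

public class QuietLogging {
    public static void main(String[] args) {
        // Equivalent of setting every logger in log4j2.xml to ERROR:
        // useful to confirm that log volume is what drives the heap growth.
        Configurator.setRootLevel(Level.ERROR);
        Configurator.setAllLevels(LogManager.getRootLogger().getName(), Level.ERROR);
    }
}

If silencing the loggers this way makes GC behave again, that supports the suspicion that the AsyncLogger path is what retains the char[] data.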

Log4j was updated in Mule 3.8.5/3.9.0, which could be the reason. I suggest you take a look at all the release notes between 3.8.1 and 3.9.0, since the 3.9.0 release notes only cover changes relative to the latest 3.8.x version at that point. In fact, you can find the update mentioned in the 3.8.5 release notes. HTH

Related

Not sure how to resolve an OutOfMemory issue on a Jenkins server?

My Jenkins server keeps crashing, so I generated a heap dump and ran it through VisualVM. It shows that most of the memory is being used by instances of java.util.concurrent.ConcurrentHashMap$Node.
My understanding is that lots of objects are still being referenced and therefore cannot be GC'd, and as a result most of the memory is taken up by them. Any idea how to resolve this? I'm new to system admin work, so apologies if I'm not the most technically proficient.
TIA
I recently came across an OutOfMemoryError that crashed my Jenkins every two days. It was due to an LDAP bug in an old version of Java: LDAP error and fixed Java versions matrix
In my case, updating Java fixed the problem.
Anyway, to investigate the OutOfMemoryError I did the following:
restarted Jenkins after the crash,
took incremental thread dumps every half hour; they can be taken from <jenkinsUrl>/threadDump (see the sketch below),
comparing the thread dumps pointed me to a memory leak in the LDAP threads.
In general I'd also suggest:
updating Java, Jenkins and its plugins, and other problematic tools,
investigating the Jenkins logs and dumps, and profiling the heap (which you already did).
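If hitting <jenkinsUrl>/threadDump is inconvenient, the same snapshot can also be taken from inside the JVM (for example from a script run on the server). A minimal Java sketch of the standard ThreadMXBean API; this only illustrates the API, it is not a Jenkins-specific tool:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSnapshot {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // true, true: include locked monitors and synchronizers, like a full thread dump
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            System.out.print(info); // ThreadInfo.toString() prints a truncated stack trace
        }
        System.out.printf("Live threads: %d (peak: %d)%n",
                threads.getThreadCount(), threads.getPeakThreadCount());
    }
}

Comparing the thread counts and the recurring stack traces across several such snapshots is exactly the kind of diff that pointed to the LDAP threads here.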

JDK 11 (and Newer): DirectByteBuffers Hold Large Off-Heap Memory Even at Startup

Our application uses a lot of DirectByteBuffer objects via NIO's FileChannel.map() and ByteBuffer.allocateDirect() to load and process files (e.g. DICOM). The code is written for Java 8 but compiled and run on Java 11.0.3. We profiled the application using both JMC 7.x and JxRay (which reports DirectByteBuffer memory specifically). JxRay reported that the application uses a large amount of DirectByteBuffer (off-heap) memory, around 140 MB, even at startup, which is pretty unusual. Specifically, the JxRay report points to a jdk.internal.jimage.ImageReader$SharedImageReader object holding this memory. So I created a small hello-world program without any reference to direct-buffer classes or objects, and JxRay reported an almost identical result, which puzzles me. I contacted the JxRay team, and they told me the newer JDK 11 jdk.internal.jimage.ImageReader$SharedImageReader could have been initialized and allocated that large amount of memory. JxRay does not report this issue on JDK 1.8, and they also said the heap-dump format did not change between the JDK versions (8 vs. 11). I'm posting this question in case somebody has encountered this issue or has knowledge of it.
Thanks
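One way to cross-check what a profiler such as JxRay reports is to ask the JVM itself how much direct and mapped buffer memory it is tracking. A minimal sketch using the standard BufferPoolMXBean API (this only covers the pools the JDK accounts for: the "direct" pool corresponds to ByteBuffer.allocateDirect() and the "mapped" pool to FileChannel.map(); internal allocations may or may not show up here):

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class DirectMemoryProbe {
    public static void main(String[] args) {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            // Prints name ("direct" / "mapped"), buffer count, and bytes used/reserved
            System.out.printf("%-8s count=%d used=%d bytes capacity=%d bytes%n",
                    pool.getName(), pool.getCount(),
                    pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}

Running this in the hello-world case and in the real application should show whether the ~140 MB is visible to the JDK's own accounting or only to the heap-dump analysis.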

EXC_BREAKPOINT <redacted>: how to debug?

We've recently released our app to our userbase and we are seeing a bunch of redacted exceptions in Sentry that we can't debug in any logical way.
The only thing these exceptions seem to have in common is that they never happen while the application is active.
The available memory also seems to be very low on these devices.
One theory we have is that the OS decides to close background applications when available memory is low.
But that's quite an assumption to make at this point, and I'm more inclined to believe we have made a mistake in our own code.
So, to my questions: how would we go about debugging these redacted exceptions? And are we right to believe that our app being closed while it's not active is no cause for concern?
The on-premise version of Sentry has several issues related to this specific problem. According to the Sentry team these will be fixed in an upcoming release of the on-premise version. To summarize:
At first we had difficulties getting the upload scripts for the dSYMs to work. The Fastlane lane mentioned here did not work at all, and neither did the bash script suggested in the Sentry interface under debugging symbols.
What did work was using sentry-cli (latest version) and bumping up the accepted upload file size on the nginx server of our on-premise installation. But after successfully getting our dSYM files to actually show up in Sentry, we ran into more problems.
The issues we've encountered are listed below:
A required debug symbol file was missing
#johan12345 Sorry for getting back to you so late. We've verified your debug symbols and can confirm that they should process and symbolicate correctly. The issue you are referring to was fixed a while back in both sentry-cli and Sentry, and the fix will be available with the next release.
We have been preparing a major launch over the last couple of months which is why there have been no releases recently. However, since we've received a couple of requests regarding symbolication for on-premise customers, we will try to push a new release out soon. I cannot give you an exact timeline, though, so please stay tuned.
Again, I'm very sorry for the inconvenience this might have caused.
https://github.com/getsentry/sentry/issues/7595
Reprocessing 12 events …
Some users report sometimes getting stuck on reprocessing. This mostly happens with self-hosted installations, but we have also had two support issues.
This seems to be triggered by internal server errors in the processing pipeline in bad places.
Related: https://forum.sentry.io/t/stuck-there-are-x-events-pending-reprocessing/1518/6
https://github.com/getsentry/sentry/issues/5862
We've added a new button called "Discard all" which can be found above your processing issues list.
This will discard all processing issues and the corresponding events.
We've also found an error in our processing pipeline we've yet to fix.
I will close this issue for now and link new issues regarding processing errors later.
So the only thing I can advise right now is basically to deploy the master branch of Sentry, because our last release was in November and we have fixed a bunch of things since then.
I'm not sure whether we will release a new version before Sentry 9 (which still needs some time).
https://forum.sentry.io/t/ios-exceptions-shows-up-as-redacted/3681
TLDR: We are switching to Crashlytics

Does calling FastMM4 LogAllocatedBlocksToFile() periodically use up memory space?

I'm hunting an elusive memory problem in a Delphi 5 program, where memory gets randomly overwritten at the customer site. After trying a lot of things with no result so far, I now want to use the FastMM4 output from LogAllocatedBlocksToFile() to find out which objects are allocated immediately before the overwritten area. The program uses a timer to write allocated block information to a new file every 30 minutes. Unfortunately, my test run of the program (DEBUG build) crashed after about 23 hours with an EOutOfMemory exception, with 1.83 GB of allocated memory according to MadExcept.
From Sysinternals Process Explorer it does look like each call to LogAllocatedBlocksToFile() allocates but does not free memory:
The red spikes in the CPU Usage graph are the LogAllocatedBlocksToFile() calls. I have added calls to LogMemoryManagerStateToFile() immediately before and after, and the data for the last spike (an increase in private bytes from about 183 MB to about 218 MB) looks like this:
55054K Allocated
47911K Overhead
53% Efficiency
and this:
55055K Allocated
47910K Overhead
53% Efficiency
so FastMM4 seems not to be aware of the additional memory the program consumes according to Process Explorer.
I'm using version 4.991 of FastMM4, downloaded today from SourceForge. The test program runs in DEBUG mode, with the following defines set:
FullDebugMode
UseCustomFixedSizeMoveRoutines
UseCustomVariableSizeMoveRoutines
NoDebugInfo
ASMVersion
DetectMMOperationsAfterUninstall
RawStackTraces
LogErrorsToFile
LogMemoryLeakDetailToFile
AlwaysAllocateTopDown
SuppressFreeMemErrorsInsideException
EnableMemoryLeakReporting
HideExpectedLeaksRegisteredByPointer
RequireDebuggerPresenceForLeakReporting
EnableMMX
ForceMMX
EnableBackwardCompatibleMMSharing
UseOutputDebugString
Questions:
Is there any known problem with these functions? Am I not using them properly? Are they not intended to be called multiple times in one debugging session? Is there a way to get that memory released again?
Short version:
I have tracked this down to a version mismatch of the support library FastMM_FullDebugMode.dll.
An older version of the library will load alongside a newer FastMM4 version compiled into the executable - there seems to be no check that the versions match - but the two modules don't really work together at run time.
Long version:
The project originally uses the older version 4.97 of FastMM4, which I have checked in here together with the support library (file version 1.44.0.4, product version 1.42).
While trying to find the bug in the program I upgraded FastMM4 to version 4.991. I also remember copying the new support library (file version 1.61.0.6, product version 1.60) to the build directory. However, some time later I must have deleted it from the directory, or I copied it into the wrong directory to begin with, because two hours ago I checked the modules loaded by the application and found that the app had picked up the old version of the support library from another directory, since it was not in the build directory.
Since copying it there and restarting the app, the problem seems to be gone. Memory usage no longer increases when LogAllocatedBlocksToFile() is called.
Maybe this helps someone, so I'm posting this as an answer instead of deleting the question.
On with debugging...

PermGen problems with Lift and Jetty

I'm developing on the standard Lift platform (maven and jetty). I'm repeatedly (once every couple of days) getting this:
Exception in thread "7048009#qtp-3179125-12" java.lang.OutOfMemoryError: PermGen space
2009-09-15 19:41:38.629::WARN: handle failed
java.lang.OutOfMemoryError: PermGen space
This is in my dev environment. It's not a problem because I can keep restarting the server. In deployment I'm not having these problems so it's not a real issue. I'm just curious.
I don't know too much about the JVM. I think I'm correct in thinking that permanent generation memory is for things like classes and interned strings? What I remember is a bit mixed up with the .NET memory model...
Any reason why this is happening? Are the defaults just crazily low? Is it to do with all the auxiliary objects that Scala has to create for Function objects and similar FP constructs? Every time I restart Jetty with newly written code (every few minutes) I imagine it reloads classes, etc. But even so, it can't be that many, can it? And shouldn't the JVM be able to deal with a large number of classes?
Cheers
Joe
From this post:
This exception occurs for one simple reason:
the PermGen space is where class metadata, such as methods, fields, annotations, and also static variables, is stored in the Java VM, but this space has the particularity of not being cleaned by the garbage collector.
So if your webapp uses or creates a lot of classes (I'm thinking of dynamic class generation), chances are you will meet this problem.
Here are some solutions that helped me get rid of this exception:
-XX:+CMSClassUnloadingEnabled: this setting enables garbage collection in the PermGen space
-XX:+CMSPermGenSweepingEnabled: allows the garbage collector to remove even classes from memory
-XX:PermSize=64M -XX:MaxPermSize=128M: raises the amount of memory allocated to the PermGen space
Maybe this could help.
Edit July 2012 (almost 3 years later):
Ondra Žižka comments (and I have updated the answer above):
JVM 1.6.0_27 says: Please use:
CMSClassUnloadingEnabled (Whether class unloading enabled when using CMS GC)
in place of CMSPermGenSweepingEnabled in the future
See the full Hotspot JVM Options - The complete reference for more.
If you see this when running mvn jetty:run,
set the MAVEN_OPTS.
For Linux:
export MAVEN_OPTS="-XX:+CMSClassUnloadingEnabled -XX:PermSize=256M -XX:MaxPermSize=512M"
mvn jetty:run
For Windows:
set "MAVEN_OPTS=-XX:+CMSClassUnloadingEnabled -XX:PermSize=256M -XX:MaxPermSize=512M"
mvn jetty:run
Should be fine now. If not, increase -XX:MaxPermSize.
You can also put these permanently to your environment.
For Linux, append the export line to ~/.bashrc
For Windows, press Win + Pause, and go to Advanced > Environment Variables.
See also http://support.microsoft.com/kb/310519.
This is because of the reloading of classes, as you suggested. If you use lots of libraries etc., the number of loaded classes will grow rapidly with each restart. Try monitoring your Jetty instance with VisualVM to get an overview of memory consumption when reloading.
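If you want the same view from code rather than from VisualVM, the memory pool MXBeans expose the permanent generation directly. A minimal sketch (the pool names are an assumption; they differ between collectors and JVM versions, e.g. "PS Perm Gen", "CMS Perm Gen", or "Metaspace" on Java 8+):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class PermGenWatcher {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Match the perm-gen / metaspace pool regardless of collector
            if (pool.getName().contains("Perm") || pool.getName().contains("Metaspace")) {
                MemoryUsage usage = pool.getUsage();
                System.out.printf("%s: used=%d max=%d%n",
                        pool.getName(), usage.getUsed(), usage.getMax());
            }
        }
        // The loaded-class count climbs on every redeploy if old classes are never unloaded.
        System.out.println("Loaded classes: "
                + ManagementFactory.getClassLoadingMXBean().getLoadedClassCount());
    }
}

If the used figure climbs toward max after each reload and never drops, that confirms classes are being loaded but never unloaded.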
The mailing list (http://groups.google.com/group/liftweb/) is the official support forum for Lift, and where you'll be able to get a better answer. I don't know the particulars of your dev setup (you don't go into much detail), but I assume you're reloading your war in Jetty without actually restarting it. Lift doesn't perform dynamic class generation (as suggested by VonC above), but Scala compiles each closure as a separate class. If you've been adding and removing closures in your code over the course of several days, it's possible that too many classes are being loaded and never unloaded, taking up perm space. I'd suggest you enable the JVM options mentioned by VonC above and see if they help.
The permanent generation is where the JVM puts things that will probably not be (garbage) collected, like custom class loaders.
Depending on what you are deploying, the perm gen setting may be too low. Some application/container combinations contain memory leaks, so when an app is undeployed, things like class loaders are sometimes not collected, which fills the perm space and produces the error you are seeing.
Unfortunately, currently the best option in this case is to increase the perm space with the following JVM flag (example for a 192 MB perm size):
-XX:MaxPermSize=192M (or 256M)
The other option is to make sure that neither the container nor the framework leaks memory.
