Does calling FastMM4 LogAllocatedBlocksToFile() periodically use up memory space?

I'm hunting an elusive memory problem in a Delphi 5 program, where memory gets randomly overwritten at the customer site. After trying a lot of things with no result so far, I now want to use the output of FastMM4's LogAllocatedBlocksToFile() to find out which objects are allocated immediately before the overwritten area. The program uses a timer to write allocated-block information to a new file every 30 minutes. Unfortunately my test run of the program (DEBUG build) crashed after about 23 hours with an EOutOfMemory exception, with 1.83 GB of allocated memory according to MadExcept.
From SysInternals Process Explorer it does look like each call of LogAllocatedBlocksToFile() allocates but does not free memory:
The red spikes in the CPU Usage graph are the LogAllocatedBlocksToFile() calls. I have added calls to LogMemoryManagerStateToFile() immediately before and after, and the data for the last spike (an increase of the private bytes from about 183 MB to about 218 MB) looks like this before the call:
55054K Allocated
47911K Overhead
53% Efficiency
and like this after it:
55055K Allocated
47910K Overhead
53% Efficiency
so FastMM4 seems not to be aware of the additional memory the program consumes according to Process Explorer.
I'm using version 4.991 of FastMM4, downloaded today from SourceForge. The test program runs in DEBUG mode, with the following defines set:
FullDebugMode
UseCustomFixedSizeMoveRoutines
UseCustomVariableSizeMoveRoutines
NoDebugInfo
ASMVersion
DetectMMOperationsAfterUninstall
RawStackTraces
LogErrorsToFile
LogMemoryLeakDetailToFile
AlwaysAllocateTopDown
SuppressFreeMemErrorsInsideException
EnableMemoryLeakReporting
HideExpectedLeaksRegisteredByPointer
RequireDebuggerPresenceForLeakReporting
EnableMMX
ForceMMX
EnableBackwardCompatibleMMSharing
UseOutputDebugString
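For reference, these are switched on in FastMM4Options.inc roughly like this (an excerpt for illustration; the shipped file contains all options with extensive comments):

{Excerpt of FastMM4Options.inc with some of the options above enabled.
 FullDebugMode is the one that requires FastMM_FullDebugMode.dll.}
{$define FullDebugMode}
{$define RawStackTraces}
{$define LogErrorsToFile}
{$define LogMemoryLeakDetailToFile}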
Questions:
Is there any known problem with those functions? Am I using them improperly? Are they not intended to be called multiple times in one debugging session? Is there a way to get that memory released again?
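For reference, the periodic logging is done essentially like this (a minimal sketch of the timer handler; the form, timer, and file names are illustrative, and the allocation-group range assumes the FullDebugMode API of LogAllocatedBlocksToFile):

procedure TMainForm.LogTimerTimer(Sender: TObject); // TTimer with Interval = 30 * 60 * 1000
var
  Stamp: string;
begin
  Stamp := FormatDateTime('yyyymmdd_hhnnss', Now);
  // FastMM4's own view of the heap, before the block dump
  LogMemoryManagerStateToFile('MMState_' + Stamp + '_before.txt');
  // dump all allocated blocks; the full group range should cover everything
  LogAllocatedBlocksToFile(0, High(Cardinal));
  // ...and after, to see whether the dump itself consumed memory
  LogMemoryManagerStateToFile('MMState_' + Stamp + '_after.txt');
end;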

Short version:
I have tracked this down to a version mismatch of the support library FastMM_FullDebugMode.dll.
An older version of the library loads fine alongside a newer FastMM4 compiled into the executable; there seems to be no check that the versions match. At run time, however, the two modules don't really work together.
Long version:
The project originally uses the older version 4.97 of FastMM4, which I have checked in here together with the support library (file version 1.44.0.4, product version 1.42).
While trying to find the bug in the program I upgraded FastMM4 to version 4.991. I also remember copying the new support library (file version 1.61.0.6, product version 1.60) to the build directory. However, some time later I must have deleted it from the directory, or I copied it into the wrong directory to begin with, because two hours ago I checked the modules loaded by the application and found that the app had picked up the old version of the support library from another directory, as it was not in the build directory.
Since copying it there and restarting the app the problem seems to be gone. Memory usage doesn't increase when LogAllocatedBlocksToFile() is called.
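Since there appears to be no built-in version check, a defensive check at program startup might catch this kind of mismatch earlier. A sketch (requires the Windows unit; the expected version constants are assumptions, so substitute the numbers that ship with your FastMM4 release):

procedure CheckFullDebugModeDllVersion;
const
  // assumption: file version shipped with FastMM4 4.991 is 1.61.x.x
  ExpectedMajor = 1;
  ExpectedMinor = 61;
var
  H: HMODULE;
  FileName: array[0..MAX_PATH] of Char;
  InfoSize, Handle: DWORD;
  Len: UINT;
  Buf: Pointer;
  Fixed: PVSFixedFileInfo;
begin
  H := GetModuleHandle('FastMM_FullDebugMode.dll');
  if H = 0 then
    Exit; // DLL not loaded
  if GetModuleFileName(H, FileName, MAX_PATH) = 0 then
    Exit;
  InfoSize := GetFileVersionInfoSize(FileName, Handle);
  if InfoSize = 0 then
    Exit; // no version resource
  GetMem(Buf, InfoSize);
  try
    if GetFileVersionInfo(FileName, 0, InfoSize, Buf) and
       VerQueryValue(Buf, '\', Pointer(Fixed), Len) then
      if (HiWord(Fixed^.dwFileVersionMS) <> ExpectedMajor) or
         (LoWord(Fixed^.dwFileVersionMS) <> ExpectedMinor) then
        MessageBox(0, 'FastMM_FullDebugMode.dll version mismatch!',
          'Debug setup', MB_ICONWARNING);
  finally
    FreeMem(Buf);
  end;
end;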
Maybe this helps someone, so I'm answering this instead of deleting the question.
On with debugging...

Related

JDK 11 (and newer) DirectByteBuffer Hold Large Off-Heap Memory Even At Startup

Our application uses a lot of DirectByteBuffer objects via NIO's FileChannel.map() and ByteBuffer.allocateDirect() to load and process files (e.g. DICOM). The code is written in Java 8 but compiled with Java 11.0.3. We profiled our application using both JMC 7.x and JxRay (which shows DirectByteBuffer memory specifically). JxRay reported that our application used a large amount of DirectByteBuffer (off-heap) memory, around 140 MB, even at application startup, which is pretty unusual. Specifically, the JxRay report points to a jdk.internal.jimage.ImageReader$SharedImageReader object that is holding this large memory. So I created a small hello world without any reference to DirectByteBuffer classes/objects, and JxRay reported an almost identical result, which puzzles me. I contacted the JxRay team and they told me the newer JDK 11 jdk.internal.jimage.ImageReader$SharedImageReader could possibly have been initialized and allocated that large memory. JxRay does not report this issue on JDK 1.8, and they also said the format of the heap dump didn't change between JDK versions (8 vs 11). I'm posting this question in case somebody has encountered this issue or has knowledge thereof.
Thanks

Techniques and tools for debugging problems on remote machines?

Users have been reporting problems/crashes/bugs that I can't reproduce on my machine. I'm finding these problems difficult to fix.
I've started using EurekaLog (fantastic!) and SmartInspect. Both these tools have helped greatly but I'm still finding it difficult to catch some problems.
I've just purchased Debugging by David Agans (and waiting for it to arrive).
Are there any other tools or techniques specific to Delphi that will help with catching these hard-to-find remote problems? The kinds of problems I'm finding difficult to track down are those that don't raise exceptions or have a clear cause. EurekaLog catches exceptions and SmartInspect is pretty good once I have a theory to check. But in some cases it is a seemingly random crash and there are several thousand lines of code that may be at fault. How to narrow down to the root cause?
MadExcept is what I use, and it is fabulous. I have also used EurekaLog and find the functionality almost exactly identical, except that I have more experience and time using MadExcept. It's free for non-commercial use, and reasonably priced for commercial use.
Update: MadExcept 4 is now out and even supports 64 bit Delphi XE2 apps, and has memory-leak checking too.
When nothing blows up, I rely on heavy use of trace logging. I have a TraceMessage(integer, string) function which I call throughout all my apps, and when someone has problems I get them to click a menu item that turns up the debug trace level to the most verbose setting; it gives me a complete history of everything my application did, and this has helped me even more than madExcept to solve problems at customer sites. Customers get a crash, and the crash report sent by madExcept contains a log file (created by my app) that is attached automatically. I believe you can do this equally well with madExcept and EurekaLog. If you need a logging system you could download Log4D, or you could write your own; it's pretty simple.
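For illustration, such a TraceMessage routine can be as simple as the sketch below (GLogLevel, the log file location, and the lack of locking for multi-threaded use are all simplifications):

var
  GLogLevel: Integer = 1; // raised to a verbose level from the menu item

procedure TraceMessage(ALevel: Integer; const AMsg: string);
// requires SysUtils in the uses clause
var
  F: TextFile;
  LogName: string;
begin
  if ALevel > GLogLevel then
    Exit; // message is more detailed than the current trace level
  LogName := ChangeFileExt(ParamStr(0), '.log');
  AssignFile(F, LogName);
  if FileExists(LogName) then
    Append(F)
  else
    Rewrite(F);
  try
    WriteLn(F, FormatDateTime('yyyy-mm-dd hh:nn:ss.zzz', Now), ' [', ALevel, '] ', AMsg);
  finally
    CloseFile(F);
  end;
end;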
For always-free, try JclDebug, which requires more work to set up, but which has worked fabulously for me, also.
For help with heap problems, learn more about fastMM (full version) debug options.
And you shouldn't forget that Delphi itself supports remote debugging. If you can reproduce a crash on machines in your office that don't have Delphi installed, use remote debug across the office network instead of putting a complete RAD Studio installation on that other machine. You could also use remote debug to connect to a client PC across the internet, but I have not tried that yet, so I can't say whether it works well over the internet or not. I do know that since remote debug doesn't support automatic deployment of the EXE file you built (you have to do that part yourself), remote debug over the internet to a client PC is more work.
You might also find lots of your problems by fixing all your hints and warnings, and then going through with CodeHealer or Pascal Analyzer (PAL) from Peganza. These static analysis tools can help you find real code problems.
If performance and memory usage are your problems, get the full version of AQTime, and use it to profile and watch your system operate. It will help you fix your memory leaks, and understand your app's runtime behaviour and memory usage, not just leaks but bottlenecks for memory and CPU usage. Some of those bottlenecks can also be the source of some odd problems. I have even used AQTime to help me find deadlocks, since it can generate traces of execution, that can help me figure out what code is running, and locate deadlocks. Update: AQTime is not installable on machines other than your main dev machine, without violating the newly modified license terms for AQTime. These terms were never this restrictive in the good old days.
If you gave more exact idea of what your problems are, I'm sure other people could give you some more ideas that are specific, but all of the above are general techniques that have served me well.
One of the best ways is to use the Remote Debugger that comes with Delphi, so you can directly debug the application running on the remote machine. The remote debugger is somewhat buggy in some Delphi releases and requires following the instructions carefully to make it work, but when needed it's a tool to consider. Also check if there are updates available for your version; they could come in a separate installer for deployment on "remote" systems. Otherwise, first install the remote debugger, then check whether the installed files have newer versions in your local installation, and copy them to the remote machine.
CodeSite has helped me a lot in these situations. Since XE it is bundled with Delphi.
Logging is the key, in this matter.
Take a look at our TSynLog class available in our Open Source SynCommons library.
It has the JCL Debug / MadExcept features, plus some additional ones (like customer-side profiling and logging); a short usage sketch follows the list:
logging with a set of levels;
fast, low execution overhead;
can load .map file symbols to be used in logging;
compression of .map into binary .mab (900 KB -> 70 KB);
inclusion of the .map/.mab into the .exe;
reading of an external .map to add unit names and line numbers to a log file without .map available information at execution;
exception logging (Delphi or low-level exceptions) with unit names and line numbers;
optional stack trace with units and line numbers;
methods or procedure recursive tracing, with Enter and auto-Leave using interfaces;
high resolution time stamps, for customer-side profiling of the application execution;
set / enumerates / TList / TPersistent / TObjectList / TContainer / dynamic array JSON serialization;
per-thread or global logging;
multiple log files on the same process;
integrated log archival (in zip or any other format);
Open Source, works from Delphi 5 up to XE.
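A minimal usage sketch of the Enter/auto-Leave pattern listed above (exact unit names and signatures depend on the SynCommons version; TMyService and ProcessOrder are made-up names):

procedure TMyService.ProcessOrder;
var
  Log: ISynLog;
begin
  // Enter writes a '+' entry; when the interface goes out of scope a '-'
  // entry with the elapsed time is written, giving customer-side profiling
  Log := TSynLog.Enter(Self, 'ProcessOrder');
  // ... actual work ...
  Log.Log(sllInfo, 'order processed');
end;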

can I alloc() in the main thread and free() in another?

I have a program that runs fine on MacOS and Linux and cross-compiles to Windows with mingw. Recently I made the program multi-threaded.
The current design of the program has memory allocated in the main thread and freed in the slave "worker" threads. That's not a problem on MacOS and Linux because the malloc/free system is multi-threaded.
I'm concerned about the cross-compiling, however. The version of mingw that I'm using is built from MacPorts. It's a pretty ancient version of G++ (version 3.4.5) from 2004. I've been unsuccessful in my attempts to build a more recent version (I'd like to build a 64-bit version, but gave up). I'm getting pthreads from http://sourceware.org/pthreads-win32.
My concern is that the malloc & free system in 3.4.5 is not multi-threaded.
Questions:
Should I rewrite my program so that the blocks of memory to be freed are passed back to the main thread to be freed there?
Should I try to upgrade to a more recent mingw?
Is there any way to find these concurrency problems other than massive amounts of testing? That just doesn't feel good to me.
Thanks!
Why do you say malloc & free are not multithreaded?
mingw32 by default links with msvcrt.dll, which is a multithreaded DLL. See [1]. There was [2] a single-threaded library provided by Microsoft, but it was only available for static linking.
PS: You mention that you are cross-compiling, but you seem instead to be compiling the Windows program on Windows. In that case, why don't you download the binaries from www.mingw.org? (It's a pain to figure out which of their download files you need, though.)
1- http://msdn.microsoft.com/en-us/library/abx4dbyh%28v=VS.71%29.aspx
2- See [1]. Removed in Visual Studio 2005: http://msdn.microsoft.com/en-us/library/abx4dbyh%28v=VS.80%29.aspx
I would avoid this. It sounds like you're trying to dodge the main issue.
Yes, that would be a good idea in any case...
One way to detect concurrency problems related to memory allocation/deallocation is a memory leak detector. I'm not sure if valgrind works on cygwin.

PermGen problems with Lift and Jetty

I'm developing on the standard Lift platform (maven and jetty). I'm repeatedly (once every couple of days) getting this:
Exception in thread "7048009#qtp-3179125-12" java.lang.OutOfMemoryError: PermGen space
2009-09-15 19:41:38.629::WARN: handle failed
java.lang.OutOfMemoryError: PermGen space
This is in my dev environment. It's not a problem because I can keep restarting the server. In deployment I'm not having these problems so it's not a real issue. I'm just curious.
I don't know too much about the JVM. I think I'm correct in thinking that permanent generation memory is for things like classes and interned strings? What I remember is a bit mixed up with the .NET memory model...
Any reason why this is happening? Are the defaults just crazily low? Is it to do with all the auxiliary objects that Scala has to create for Function objects and similar FP things? Every time I restart Jetty with newly written code (every few minutes) I imagine it reloads classes etc. But even so, it can't be that many, can it? And shouldn't the JVM be able to deal with a large number of classes?
Cheers
Joe
From this post:
This exception occurs for one simple reason:
the PermGen space is where class properties, such as methods, fields, annotations, and also static variables, etc. are stored in the Java VM, but this space has the particularity of not being cleaned by the garbage collector.
So if your webapp uses or creates a lot of classes (I'm thinking of dynamic generation of classes), chances are you will meet this problem.
Here are some solutions that helped me get rid of this exception :
-XX:+CMSClassUnloadingEnabled : this setting enables garbage collection in the permgenspace
-XX:+CMSPermGenSweepingEnabled : allows the garbage collector to remove even classes from the memory
-XX:PermSize=64M -XX:MaxPermSize=128M : raises the amount of memory allocated to the permgenspace
Maybe this could help.
Edit July 2012 (almost 3 years later):
Ondra Žižka comments (and I have updated the answer above):
JVM 1.6.0_27 says: please use
CMSClassUnloadingEnabled (whether class unloading is enabled when using the CMS GC)
in place of CMSPermGenSweepingEnabled in the future.
See the full Hotspot JVM Options - The complete reference for more.
If you see this when running mvn jetty:run, set MAVEN_OPTS.
For Linux:
export MAVEN_OPTS="-XX:+CMSClassUnloadingEnabled -XX:PermSize=256M -XX:MaxPermSize=512M"
mvn jetty:run
For Windows:
set "MAVEN_OPTS=-XX:+CMSClassUnloadingEnabled -XX:PermSize=256M -XX:MaxPermSize=512M"
mvn jetty:run
Should be fine now. If not, increase -XX:MaxPermSize.
You can also set these permanently in your environment.
For Linux, append the export line to ~/.bashrc.
For Windows, press Win-key + Pause/Break, and go to Advanced > Environment Variables.
See also http://support.microsoft.com/kb/310519.
This is because of the reloading of classes, as you suggested. If you are using lots of libraries etc., the number of classes will rapidly grow with each restart. Try monitoring your Jetty instance with VisualVM to get an overview of memory consumption when reloading.
The mailing list (http://groups.google.com/group/liftweb/) is the official support forum for Lift, and where you'll be able to get a better answer. I don't know the particulars of your dev setup (you don't go into much detail), but I assume you're reloading your war in Jetty without actually restarting it. Lift doesn't perform dynamic class generation (as suggested by VonC above), but Scala compiles each closure as a separate class. If you're adding and removing closures to your code over the course of several days, it's possible that too many classes are being loaded and never unloaded and are taking up perm space. I'd suggest you enable the JVM options mentioned by VonC above and see if they help.
The permanent generation is where the JVM puts stuff that will probably not be (garbage) collected like custom classloaders.
Depending on what you are deploying, the perm gen setting can be low. Some application/container combinations do contain memory leaks, so when an app gets undeployed, sometimes some stuff like class loaders is not collected, which fills the perm space and generates the error you are seeing.
Unfortunately, currently the best option in this case is to max up the perm space with the following jvm flag (example for 192m perm size):
-XX:MaxPermSize=192M (or 256M)
The other option is to make sure that neither the container nor the framework leaks memory.

Delphi: How to organize source code to increase compiler performance?

I'm working on a large Delphi 6 project with quite a lot of dependencies. It takes several minutes to compile the whole project. The recompilation after a few changes is sometimes much longer, so that it is quicker to terminate Delphi, erase all DCU files, and recompile everything.
Does anyone know a way to identify what makes the compiler slower and slower? Any tips on how to organize the code to improve compiler performance?
I have already tried following things:
Explicitly include most of the units in the dpr instead of relying on the search path: It didn't improve anything.
Use the command line compiler dcc32: it isn't faster.
Try to see what the compiler does (using Process Explorer from SysInternals): apparently it spends most of its time in a function called 'KibitzGetOverloads'. But I can't do anything with this information...
EDIT, Summary of the answers until now:
The answer that worked best in my case:
The function "Clean unused units references" from cnpack. It almost automatically cleaned more than 1000 references, making a "cold" compilation about twice faster. ("cold" compilation = erase all dcu files before compiling). It gets the reference list from the compiler. So if you have some {$IFDEF } check that all your configurations still compile.
The next thing I would like to try:
Refactoring the unit references manually (possibly using an abstract class),
but it is much more work, since I first need to identify where the problems are. Some tools that might help:
GExperts adds a project dependencies browser to the Delphi IDE (but unfortunately it cannot show the size of each branch).
Delphi Unit Dependency Viewer V1.0 does about the same thing without Delphi. It can calculate some simple statistics (which unit is referenced the most, ...).
Icarus, which is referenced in one of the answers.
Things that didn't change anything in my case:
Putting every file from my program and all components in one folder without subfolders.
Defragmenting the disk (I tried with a ramdisk)
Using a ramdisk for the source code and output folders.
Turning off the live scanning antivirus
Listing all the units in the dpr file instead of relying on the search path.
Using the command line compiler dcc32 or ecc32.
Things that didn't apply to my case:
Avoiding having dependencies on network shares.
Using DelphiSpeedUp, because I already had it.
Using a single folder for all dcu (I always do it)
Things that I didn't try:
Upgrading to another Delphi version.
Using dcc32speed.exe
Using a solid-state drive (I didn't try it, but I did try a ramdisk holding all the source code; maybe I should have installed Delphi on the ramdisk too).
Some things that could slow down the compiler
Redundant units in your uses clause. See this question for a link to CnPack.
Not explicitly adding units to your project file. You've already seem to have covered that.
Changed compiler settings, most notably including TD32 debug info.
Try to get rid of unused units in your uses clause and see if it makes a difference.
Using Delphi 7 and 2009: last week I went from almost 2 minutes of compiling, plus another 45 seconds between hitting F9 and getting the main form of my app, down to 20 seconds for compiling and running. This had been driving me crazy for about 6 months and nothing I tried seemed to work. Using Filemon from SysInternals, I realized that every unit (mostly components) the compiler requires was searched for in every folder in the search path; yes, this produces a LOT of FileOpen, FileExists, FileNotFound, etc. What I did was put every DCU, DFM, RES, etc. from components into a single folder, and keep just that folder in the search path, plus the couple of other folders required by the project; the results were amazing. Another problem prior to the fix was debugging: it took almost 40 seconds on each F7 or F8 key press while debugging; this has been fixed too. Hope this info can help you. Greetings from Isla de Margarita, Venezuela. Excuse my English, if any errors ;)
Check whether there are any paths in the search path that aren't on your local machine.
I.e. don't link to binaries on network shares, and check that the search path isn't referencing any network shares.
I haven't seen the compiler get slower over time, but it's been a long time since we used Delphi 6.
It seems to be generally agreed upon in the Delphi community that, if you don't want to upgrade to the latest and greatest (Delphi 2007 or 2009), then Delphi 7 is the best/fastest/most stable. You might consider upgrading.
KibitzGetOverloads sounds like something from the kibitz compiler -- the "background" compiler that gives you code-completion, background error highlighting, code tooltips, etc. Sounds like you'd be better off checking the call stack of the command-line compiler, not the IDE; you'd get something more helpful.
I have never found compiles to be faster after deleting the DCUs. DCUs are there to make the build incremental, and therefore faster. If you're seeing faster compiles after deleting all DCUs, check your hardware. Have you defragmented your hard disk lately? How much free space do you have on the drive?
Have you set a single folder to receive the DCUs? If not, they will be scattered all over.
Put all the units and their implicitly called units (except installed components from the library path) in the dpr. To be sure you did not miss any, empty your search path; it should still compile.
After reducing the search path, you can try to reduce your library path by installing your components into fewer folders.
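For illustration, a .dpr with explicitly listed units looks like this (unit and path names are made up):

program MyProject;

uses
  Forms,
  MainForm in 'src\MainForm.pas' {FormMain},
  OrderLogic in 'src\OrderLogic.pas';

begin
  Application.Initialize;
  Application.CreateForm(TFormMain, FormMain);
  Application.Run;
end.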
Although only partly relevant to your exact question, I hear that using a solid-state drive vastly improves compile times with Delphi - Nick Hodges said this himself on the Delphi Podcast a couple of weeks ago.
Brian
You can automatically get rid of unnecessary unit references, which is a very efficient optimization for compiling speed.
In your situation, dividing your project into packages can improve compiling speed. This way, only the modified package(s) get regenerated on each recompilation, not a single massive binary. Working with packages can also make deployment of your project updates easier.
Turn off your live scanning antivirus
We had the same (or a similar) problem.
One of our packages had a compilation time of about 12 minutes.
After the changes, we are now down to 32 seconds.
After many tests we found that the "problematic situation" was the following:
In a single package:
Unit A uses a large number of units: U1, U2, U3, U4, ... U100 (in its interface uses clause), all in the same package. This is an important unit that centralizes all the initialization work.
All units of the package, U1, U2, U3, ..., U100, use unit A (in their implementation uses clause).
This "circular reference" does not give compilation errors because the uses clauses are in different sections, but it caused a very long compile time.
SOLUTION:
Eliminate the reference to unit A from each of the units U1, U2, U3, ..., U100.
Now unit A still uses the large number of units U1, U2, ..., U100, but the units U1, U2, ..., U100 no longer use unit A.
After this change the compile time dropped drastically.
If you have a similar situation, you can try this; a sketch of the before/after follows.
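{Sketch of the situation described above: two separate source files shown
 together. The compiler accepts this mutual reference because the uses
 clauses are in different sections, but it can make compilation very slow.}

unit UnitA; // the central initialization unit
interface
uses U1, U2 {, ..., U100}; // interface-level references
implementation
end.

unit U1;
interface
implementation
uses UnitA; // implementation-level back-reference: remove this to break the cycle
end.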
Excuse my bad English.
Greetings.
Neftalí -Germán Estévez-
I had the same problem and I can come up with two reasons it affected me.
Circular references. The gentleman who stated that one was correct. I had certain LARGE projects that would compile fast, and SMALL projects that compiled slowly. I could not figure it out until I restructured the code, and then I got the faster compile speeds. Lots of small units: it's easy to build monolithic units, but there are many penalties for it.
I've heard it a thousand times: develop on a slow machine like your users might be using. Hey, that's for the testing department. I can't waste time on compiling, Delphi load speeds, packages, etc. I went out and bought a "GAMERS" computer (WOW) with solid-state drives (as mentioned earlier), 12 GB RAM, an overclocked i7 Intel chip, and triple (linked) video cards, all on Vista64 (Vista is not bad once it is finally running with all installed parts). It was a real pain to get it all set up, but I am not waiting on my computer anymore. Pure compile speed, load speed, plus a fresh new machine without all of the crap that was installed on the last one over the last 2 years. I even uninstalled DelphiSpeedUp; I did not need it. And I don't need to turn off the antivirus, since I tried that as well and got penalized with the internet crap, so the antivirus stays on. Pure and simple: get a balls-out machine. Your time is worth more than what you will spend on a new computer.
Try to install a ram disk and set your dcu output path to point there. This more than halved my compilation time with Delphi 2007 on top of DelphiSpeedUp.
The compiler will only compile units that have changed. If you have changed code in the interface section, all units that depend on the changed unit are recompiled. If only code in the implementation section has changed, the compiler will recompile only that unit, but presumably link all the modules. This implies a good design of interfaces up front, but if you restructure the code to restrict changes to the implementation sections, compile times might be reduced. I have no idea by how much. This fact is mentioned in the Delphi help files under "Multiple and indirect unit references" in Delphi 7 "Using Delphi".
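A small illustration of that rule (unit and routine names are made up):

unit Worker;

interface

procedure DoWork; // changing anything here forces all units that use Worker to recompile

implementation

procedure DoWork;
begin
  // changing only this body recompiles Worker itself; dependent units are
  // not recompiled, merely relinked
end;

end.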
Do not compile on network drives. Seek time is dramatically worse.
Consider pointing your DCU ("unit output") directory to a ramdrive.
Limit the number of include/unit directories.
Try to avoid minor circular references that the compiler still accepts, especially for large units (e.g. generated ORM units for your OPF). They might cause large units to be compiled twice. (Does Delphi allow minor mutual circular references, or is that an FPC-only feature?)
I never tried it, but hardcoding all files with full or relative paths in the central .dpr might also help (with a script to regenerate/update it?). (You mention that above, but was it with the "unit xx in '\path\yyy'" notation?)
Other long shots:
Use Kylix (file/dir I/O under Linux is dramatically better in my experience, though that experience is from FPC). Maybe we need a reversed cross-Kylix :-)
Use a separate (Windows) build machine, and tweak NTFS via the registry to be less "safe" (which you don't care about, since everything is in a revision system to begin with). AFAIK these options can only be set globally for all filesystems, hence the separate system. Throw in a RAID array or a Raptor too.
Forget solid state. Nice buzz at the moment, but the high write ratio will kill it eventually (both lifetime and performance, once it gets fuller and can't allocate optimally anymore), and you need the expensive Intel ones to beat two $75 HDs in RAID.
P.s. Sorry for the FPC references. I do both, and I sometimes don't know anymore what belongs to what.
What I do is always make sure to have very few directories in the library path, containing most of the components and static code. I also make sure that NO source code is available in the library path, only .dcu/.res etc. Only the browse path has the source code, and special circumstances are handled through the search path for the project.
Just limit what you compile in any situation.
A few years later I am struggling again with increasing compile times. I am currently using Delphi XE4 and I am at a point where I absolutely need to refactor the unit references. I thought of a new way to identify where the problems are:
I'm using Process Monitor from Microsoft/SysInternals to monitor the compiler:
I start Process Monitor with a filter to show only dcc32.exe (or bds.exe when working from the IDE).
I build my project from the command line.
At the end I look at the CreateFile operations in the log of Process Monitor.
For each unit there will be an entry for the .PAS file (when the compiler starts working on this unit) and one for the .DCU file (when the compiler is completely done with this unit). By working on the log with a text editor and/or Excel I can extract this kind of information:
A kind of “tree”, where you recursively see in which order the units have been compiled.
For each unit the delay between “.PAS file opened“ and “.DCU file written”.
Then I try to interpret the results to find places where some refactoring would speed up the compile. It is not so easy, but I'm getting some encouraging results.
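For what it's worth, the log post-processing can be scripted too. A sketch (it assumes a CSV export named Logfile.CSV with the default column order "Time of Day", "Process Name", "PID", "Operation", "Path", ...; verify that layout against your own export):

program PmonCompileOrder;
{$APPTYPE CONSOLE}
{ Sketch: echoes the order in which the compiler opened .pas files and
  touched .dcu files, from a Process Monitor CSV export. This is naive:
  every CreateFile on a matching file is echoed, including probe opens,
  so some further filtering in Excel is still needed. }
uses
  SysUtils, Classes;

function Col(const Line: string; Index: Integer): string;
// returns the Index-th (1-based) CSV column; assumes no embedded quotes
var
  Items: TStringList;
begin
  Result := '';
  Items := TStringList.Create;
  try
    Items.StrictDelimiter := True;
    Items.QuoteChar := '"';
    Items.Delimiter := ',';
    Items.DelimitedText := Line;
    if Index <= Items.Count then
      Result := Items[Index - 1];
  finally
    Items.Free;
  end;
end;

var
  Lines: TStringList;
  i: Integer;
  Op, Path: string;
begin
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile('Logfile.CSV');
    for i := 1 to Lines.Count - 1 do // skip the header row
    begin
      Op := Col(Lines[i], 4);
      Path := Col(Lines[i], 5);
      if Op <> 'CreateFile' then
        Continue;
      if SameText(ExtractFileExt(Path), '.pas') then
        Writeln(Col(Lines[i], 1), '  OPEN  ', ExtractFileName(Path))
      else if SameText(ExtractFileExt(Path), '.dcu') then
        Writeln(Col(Lines[i], 1), '  DONE  ', ExtractFileName(Path));
    end;
  finally
    Lines.Free;
  end;
end.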
