What could explain the difference in memory usage reported by FastMM or GetProcessMemoryInfo? - delphi

My Delphi XE application is based on a single EXE using a local server DLL created by RemObjects, and it uses a lot of memory for a specific operation, until eventually it generates an exception saying there is not enough memory. I'm trying to understand why and where this is happening, so I placed various steps throughout my code where I report on memory usage. The problem is that I'm getting very different information depending on the method used to get the memory usage information:
If I use the method explained here, which asks FastMM directly, for both the client EXE and the server DLL, here is what I get:
STEP 1: [client] = 36664572 - [server] = 3274976
STEP 2: [client] = 62641230 - [server] = 44430224
STEP 3: [client] = 66665630 - [server] = 44430224
Now if I use the method explained here, which uses GetProcessMemoryInfo, I get far higher memory usage:
STEP 1: [process] = 133722112
STEP 2: [process] = 1072115712
STEP 3: [process] = 1075818496
It looks like the second method is the right one, judging by my memory problems, but how can the FastMM figures be so "low"? And what can explain the difference?

GetProcessMemoryInfo also reports memory that is not managed by FastMM, like memory that is allocated by the various non-Delphi DLLs you might call (like the WinAPI).
Also, FastMM can allocate more memory from Windows than your application actually uses, for internal structures, fragmentation and pooling.
And lastly, with GetProcessMemoryInfo you are measuring the working set size. That is the part of the application's memory that is currently in RAM instead of in the page file. It includes more than just data structures and is definitely not comparable to the total memory the application has allocated. PagefileUsage would be more comparable. The working set size is almost never what you are looking for. See here for a better explanation.
So they both give different results because they both measure different things.
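To see the distinction for yourself, here is a minimal C sketch (assuming you link against psapi.lib; the fields come from the documented PROCESS_MEMORY_COUNTERS structure) that prints both counters for the current process:

#include <windows.h>
#include <psapi.h>
#include <stdio.h>

int main(void)
{
    PROCESS_MEMORY_COUNTERS pmc;

    /* Ask Windows for the per-process memory counters. */
    if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc)))
    {
        /* Pages currently resident in RAM -- the working set. */
        printf("WorkingSetSize: %lu bytes\n", (unsigned long)pmc.WorkingSetSize);

        /* Committed memory backed by the page file -- closer to what
           the process has actually allocated. */
        printf("PagefileUsage:  %lu bytes\n", (unsigned long)pmc.PagefileUsage);
    }
    return 0;
}

Comparing PagefileUsage (rather than WorkingSetSize) against the FastMM totals should bring the two reports much closer; any remaining gap is FastMM's own overhead plus non-Delphi allocations.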

apache/lucenenet: Unable to limit memory usage - RAMBufferSizeMB, RAMPerThreadHardLimitMB, and MaxBufferedDocs in IndexWriterConfig have no effect

Note that I've also posted an issue on GitHub on the repo: https://github.com/apache/lucenenet/issues/784
I'm running the latest Lucene .NET versions:
Lucene.Net 4.8.0-beta00016
Lucene.Net.Analysis.Common 4.8.0-beta00016
I'm doing the following:
using var analyzer = new KeywordAnalyzer();
using var directory = FSDirectory.Open(IndexPath);
var config = new IndexWriterConfig(LuceneVersion, analyzer)
{
    OpenMode = OpenMode.CREATE, // Use OpenMode.CREATE to overwrite, or OpenMode.APPEND to just open
    RAMPerThreadHardLimitMB = 100,
    RAMBufferSizeMB = 100,
};
using var writer = new IndexWriter(directory, config);

// Write index to disk
writer.AddDocuments(productDocuments);
writer.AddDocuments(productCategoryDocuments);
writer.AddDocuments(productTypeDocuments);
writer.AddDocuments(productLineDocuments);
writer.Commit();
As soon as writer.AddDocuments() is called, the memory consumption grows a lot.
You can see the sudden increase in the Diagnostic Tools window in Visual Studio.
Running it multiple times keeps increasing the memory usage, up to 3 GB on my machine, at which point it doesn't grow any longer. Those 3 GB are never released again. The program doesn't crash; it simply stops acquiring memory.
I want to limit how much memory Lucene can use, but using RAMBufferSizeMB and RAMPerThreadHardLimitMB seems to have no effect at all.
I tried flushing as well, which had no effect, and I tried calling Dispose(), which also had no effect.
I've also tried setting MaxBufferedDocs to 1000; still no limit to RAM usage.
Have I missed something in the documentation? Is there a way to limit RAM usage of the Lucene .NET IndexWriter?
To future developers having memory consumption issues with Lucene.Net:
Try writing one document at a time with writer.AddDocument() rather than writer.AddDocuments().
writer.AddDocument() handles memory much better than writer.AddDocuments().
Also, calling writer.Commit() more often will flush memory more often, so the garbage collector can keep memory consumption lower.
My experiments with Lucene in a Console Application can be seen here: https://github.com/apache/lucenenet/issues/784

How do I debug a memory issue in Rust?

I hope this question isn't too open-ended. I ran into a memory issue with Rust, where I got an "out of memory" from calling next on an Iterator trait object. I'm unsure how to debug it. Prints have only brought me to the point where the failure occurs. I'm not very familiar with other tools such as ltrace, so although I could create a trace (231MiB, pff), I didn't really know what to do with it. Is a trace like that useful? Would I do better to grab gdb/lldb? Or Valgrind?
In general I would take the following approach:
Boilerplate reduction: Try to narrow down the OOM problem so that you don't have too much additional code around. In other words: the quicker your program crashes, the better. Sometimes it is also possible to rip out a specific piece of code and put it into an extra binary, just for the investigation.
Problem size reduction: Reduce the problem from an OOM to a simple "too much memory", so that you can actually tell that some part wastes memory even though it does not lead to an OOM. If it is too hard to tell whether you see the issue or not, you can lower the memory limit. On Linux, this can be done using ulimit:
ulimit -Sv 500000 # that's 500MB
./path/to/exe --foo
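For reference, the same soft limit can be set programmatically; here is a minimal C sketch using the POSIX setrlimit call (note that RLIMIT_AS is specified in bytes, while ulimit -Sv takes KiB):

#include <sys/resource.h>
#include <stdio.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_AS, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }

    /* Equivalent of `ulimit -Sv 500000`: cap the virtual address
       space at 500,000 KiB. Only the soft limit is changed. */
    rl.rlim_cur = 500000L * 1024L;
    if (setrlimit(RLIMIT_AS, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }

    /* Allocations beyond the cap now fail, so a program exec'd from
       here (or the rest of this process) hits OOM much sooner. */
    return 0;
}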
Information gathering: If your problem is small enough, you are ready to collect information that has a lower noise level. There are multiple ways you can try. Just remember to compile your program with debug symbols. It can also be an advantage to turn off optimizations, since these usually lead to information loss. Both can be achieved by NOT using the --release flag during compilation.
Heap profiling: One way is to use gperftools:
LD_PRELOAD="/usr/lib/libtcmalloc.so" HEAPPROFILE=/tmp/profile ./path/to/exe --foo
pprof --gv ./path/to/exe /tmp/profile/profile.0100.heap
This shows you a graph which visualizes which parts of your program consume how much memory. See the official docs for more details.
rr: Sometimes it's very hard to figure out what is actually happening, especially after you created a profile. Assuming you did a good job in step 2, you can use rr:
rr record ./path/to/exe --foo
rr replay
This will spawn a GDB with superpowers. The difference from a normal debug session is that you can not only continue but also reverse-continue. Basically, your program is executed from a recording in which you can jump back and forth as you like. This wiki page provides some additional examples. One thing to point out is that rr only seems to work with GDB.
Good old debugging: Sometimes you get traces and recordings that are still way too large. In that case you can (in combination with the ulimit trick) just use GDB and wait until the program crashes:
gdb --args ./path/to/exe --foo
You should now get a normal debugging session where you can examine the current state of the program. GDB can also be launched with core dumps. The general problem with this approach is that you cannot go back in time and you cannot continue execution. So you only see the current state, including all stack frames and variables. Here you could also use LLDB if you want.
(Potential) fix + repeat: Once you have a clue about what might be going wrong, you can try to change your code. Then try again. If it's still not working, go back to step 3 and try again.
Valgrind and other tools work fine, and should work out of the box as of Rust 1.32. Earlier versions of Rust require changing the global allocator from jemalloc to the system's allocator so that Valgrind and friends know how to monitor memory allocations.
In this answer, I use the macOS developer tool Instruments, as I'm on macOS, but Valgrind / Massif / Cachegrind work similarly.
Example: An infinite loop
Here's a program that "leaks" memory by pushing 1MiB Strings into a Vec and never freeing it:
use std::{thread, time::Duration};

fn main() {
    let mut held_forever = Vec::new();
    loop {
        held_forever.push("x".repeat(1024 * 1024));
        println!("Allocated another");
        thread::sleep(Duration::from_secs(3));
    }
}
You can see memory growth over time, as well as the exact stack trace that allocated the memory.
Example: Cycles in reference counts
Here's an example of leaking memory by creating an infinite reference cycle:
use std::{cell::RefCell, rc::Rc};

struct Leaked {
    data: String,
    me: RefCell<Option<Rc<Leaked>>>,
}

fn main() {
    let data = "x".repeat(5 * 1024 * 1024);
    let leaked = Rc::new(Leaked {
        data,
        me: RefCell::new(None),
    });

    let me = leaked.clone();
    *leaked.me.borrow_mut() = Some(me);
}
See also:
Why does Valgrind not detect a memory leak in a Rust program using nightly 1.29.0?
Handling memory leak in cyclic graphs using RefCell and Rc
Minimal `Rc` Dependency Cycle
In general, to debug, you can use either a log-based approach (either by inserting the logs yourself, or by having a tool such as ltrace, ptrace, ... generate the logs for you) or you can use a debugger.
Note that ltrace, ptrace or debugger-based approaches require that you be able to reproduce the problem; I tend to favor manual logs because I work in an industry where bug reports are generally too imprecise to allow immediate reproduction (and thus we use logs to create the reproducer scenario).
Rust supports both approaches, and the standard toolset that one uses for C or C++ programs works well for it.
My personal approach is to have some logging in place to quickly narrow down where the issue occurs, and if logging is insufficient, to fire up a debugger for a more fine-combed inspection. In this case I would recommend going straight for the debugger.
A panic is generated, which means that by breaking on the call to the panic hook, you get to see both the call stack and the memory state at the moment things go awry.
Launch your program with the debugger, set a breakpoint on the panic hook, run the program, profit.
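For example (a sketch; rust_panic is the unmangled symbol the Rust runtime exposes precisely so that debuggers can break on panics):

gdb --args ./path/to/exe --foo
(gdb) break rust_panic
(gdb) run

When the breakpoint hits, bt shows the full call stack leading to the out-of-memory panic.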

How do I fix this memory leak while using Chromium Embedded?

The GuiDemo code for Chromium Embedded (https://code.google.com/p/delphichromiumembedded/) is leaking a few bytes of memory. Not much, but it is VERY annoying to get that message from FastMM every time you stop the app. I guess the leak is in the Chromium interface.
The unit has an initialization section:
INITIALIZATION
  CefCache := 'cache';
  CefRegisterCustomSchemes := CefOnRegisterCustomSchemes;
  CefRegisterSchemeHandlerFactory('dcef', '', True, TFileScheme);
The log is this:
A memory block has been leaked. The size is: 20
This block was allocated by thread 0x1674, and the stack trace (return addresses) at the time was:
40455E
4050A7
409C1D
405622
4050DC
4F0D7A
406598
406604
40A6C3
4F0E28
764CEE1C [BaseThreadInitThunk]
The block is currently used for an object of class: main$174$ActRec
The allocation number is: 323
--------------------------------2014/10/5 17:11:33--------------------------------
This application has leaked memory. The small block leaks are (excluding expected leaks registered by pointer):
13 - 20 bytes: main$174$ActRec x 1
The thing is that I have no clue what main$174$ActRec is.
The unit that hosts the demo is indeed called 'main.pas', but there is no other variable called 'main'.
main$174$ActRec is associated with the interface used to support an anonymous method. So that should give you a clue as to how to look for the leak.
If you include an exception logging suite like madExcept, EurekaLog or the JCL, you'll get a meaningful stack trace from FastMM. That would also help you find where the leak originates.
Once you can find what has been leaked then it ought to be possible to find a way to register it as an expected leak. However, if you can identify what has been leaked then I'd suggest trying to find a way not to leak it.
I can't help you identify the leak further because you didn't give any more information. There are many demos for this project and I don't know which one you are running.
The error is telling you that the memory block holds an instance of a main$174$ActRec class, not that the memory was allocated by the main$174$ActRec class. Somewhere in your app, ActRec.Create() is being called, but ActRec.Destroy() is not. Since you do not know the exact memory address of the object being leaked, or at least the memory address of the variable that points at the object, you cannot register it by address. However, the full version of FastMM has an overloaded RegisterExpectedMemoryLeak() function that accepts a class type and count as input. That allows you to tell FastMM how many instances of the class type are allowed to be leaked before FastMM starts reporting them as leaks. Of course, that means you need access to the class type. If it is something internal to Chromium, you may be out of luck.

"EXC_BAD_ACCESS" vs "Segmentation fault". Are both same practically?

In my first few dummy apps (for practice while learning) I have come across a lot of EXC_BAD_ACCESS errors, which somehow taught me that a bad access is: you are touching/accessing an object that you shouldn't, because either it is not allocated yet, it has been deallocated, or you are simply not authorized to access it.
Look at this sample code that has bad-access issue because I am trying to modify a const :
- (void)myStartMethod {
    NSString *str = @"testing";
    const char *charStr = [str UTF8String];
    charStr[4] = '\0'; // bad access on this line.
    NSLog(@"%s", charStr);
}
A segmentation fault, meanwhile, is described as: a specific kind of error caused by accessing memory that "does not belong to you." It's a helper mechanism that keeps you from corrupting memory and introducing hard-to-debug memory bugs. Whenever you get a segfault, you know you are doing something wrong with memory (more description here).
I want to know two things.
First, am I right about Objective-C's EXC_BAD_ACCESS? Do I get it right?
Second, are EXC_BAD_ACCESS and segmentation fault the same thing, and has Apple just improvised the name?
No, EXC_BAD_ACCESS is not the same as SIGSEGV.
EXC_BAD_ACCESS is a Mach exception (A combination of Mach and xnu compose the Mac OS X kernel), while SIGSEGV is a POSIX signal. When crashes occur with cause given as EXC_BAD_ACCESS, often the signal is reported in parentheses immediately after: For instance, EXC_BAD_ACCESS(SIGSEGV). However, there is one other POSIX signal that can be seen in conjunction with EXC_BAD_ACCESS: It is SIGBUS, reported as EXC_BAD_ACCESS(SIGBUS).
SIGSEGV is most often seen when reading from/writing to an address that is not at all mapped in the memory map, like the NULL pointer, or when attempting to write to a read-only memory location (as in your example above). SIGBUS, on the other hand, can be seen even for addresses the process has legitimate access to. For instance, SIGBUS can smite a process that dares to load/store from/to an unaligned memory address with instructions that assume an aligned address, or a process that attempts to write to a page for which it does not have the required privilege level.
Thus EXC_BAD_ACCESS can best be understood as the set of both SIGSEGV and SIGBUS, and refers to all ways of incorrectly accessing memory (whether because said memory does not exist, or does exist but is misaligned, privileged or whatnot), hence its name: exception – bad access.
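To make the two triggers concrete, here is a small C sketch of the classic SIGSEGV paths (the SIGBUS cases are architecture-dependent and harder to reproduce portably, so they are omitted):

int main(void)
{
    char *ro = "string literal";  /* literals live in a read-only segment */
    ro[0] = 'X';                  /* write to a read-only page:
                                     EXC_BAD_ACCESS (SIGSEGV), just like
                                     the const-modification example above */

    int *unmapped = 0;            /* the NULL page is never mapped */
    *unmapped = 42;               /* access to an unmapped address:
                                     also EXC_BAD_ACCESS (SIGSEGV) */
    return 0;
}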
To feast your eyes, here is the code, within the xnu-1504.15.3 (Mac OS X 10.6.8 build 10K459) kernel source code, file bsd/uxkern/ux_exception.c beginning at line 429, that translates EXC_BAD_ACCESS to either SIGSEGV or SIGBUS.
/*
 * ux_exception translates a mach exception, code and subcode to
 * a signal and u.u_code. Calls machine_exception (machine dependent)
 * to attempt translation first.
 */
static
void ux_exception(
        int                       exception,
        mach_exception_code_t     code,
        mach_exception_subcode_t  subcode,
        int                       *ux_signal,
        mach_exception_code_t     *ux_code)
{
    /*
     * Try machine-dependent translation first.
     */
    if (machine_exception(exception, code, subcode, ux_signal, ux_code))
        return;

    switch(exception) {

    case EXC_BAD_ACCESS:
        if (code == KERN_INVALID_ADDRESS)
            *ux_signal = SIGSEGV;
        else
            *ux_signal = SIGBUS;
        break;

    case EXC_BAD_INSTRUCTION:
        *ux_signal = SIGILL;
        break;
    ...
Edit in relation to another of your questions
Please note that exception here does not refer to an exception at the level of the language, of the type one may catch with syntactic sugar like try{} catch{} blocks. Exception here refers to the actions of a CPU on encountering certain types of mistakes in your program (they may or may not be fatal), like a null-pointer dereference, that require outside intervention.
When this happens, the CPU is said to raise what is commonly called either an exception or an interrupt. This means that the CPU saves what it was doing (the context) and deals with the exceptional situation.
To deal with such an exceptional situation, the CPU does not start executing any "exception-handling" code (catch-blocks or suchlike) in your application. It first gives the OS control, by starting to execute a kernel-provided piece of code called an interrupt service routine. This is a piece of code that figures out what happened to which process, and what to do about it. The OS thus has an opportunity to judge the situation, and take the action it wants.
The action it does for an invalid memory access (such as a null pointer dereference) is to signal the guilty process with EXC_BAD_ACCESS(SIGSEGV). The action it does for a misaligned memory access is to signal the guilty process with EXC_BAD_ACCESS(SIGBUS). There are many other exceptional situations and corresponding actions, not all of which involve signals.
We're now back in the context of your program. If your program receives the SIGSEGV or SIGBUS signal, it will invoke the signal handler that was installed for that signal, or the default one if none was. It is rare for people to install custom handlers for SIGSEGV and SIGBUS, and the default handlers shut down your program, so what you usually get is your program being shut down.
This sort of exception is therefore completely unlike the sort one throws in try{} blocks and catch{}es. Those exceptions are handled purely within the application, without involving the OS at all. What happens there is that a throw statement is simply a glorified jump to the innermost catch block that handles that exception. As the exception bubbles up through the stack, it unwinds the stack behind it, running destructors and suchlike as needed.
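As an illustration of installing a custom handler instead of taking the default shutdown, here is a minimal C sketch (the handler name on_segv is made up for the example):

#include <signal.h>
#include <unistd.h>

/* Runs instead of the default "shut down" action when SIGSEGV arrives.
   Only async-signal-safe functions (like write) may be called here. */
static void on_segv(int sig)
{
    (void)sig;
    static const char msg[] = "caught SIGSEGV\n";
    write(STDERR_FILENO, msg, sizeof msg - 1);
    _exit(1);
}

int main(void)
{
    struct sigaction sa = { 0 };
    sa.sa_handler = on_segv;
    sigaction(SIGSEGV, &sa, NULL);

    int *p = NULL;
    *p = 1;   /* the kernel turns this into SIGSEGV; on_segv runs */
    return 0;
}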
Basically yes: an EXC_BAD_ACCESS is usually paired with a SIGSEGV, which is a signal that warns about a segmentation failure.
A segmentation failure is raised whenever you are working with a pointer that points to invalid data (maybe not belonging to the process, maybe read-only, maybe an invalid address in general).
Don't think about the segmentation fault in terms of "accessing an object"; you are accessing a memory location, so an address. That address must be considered coherent by the OS's memory protection system.
Not all errors related to accessing invalid data can be tracked by the memory manager. Think about a pointer to a stack-allocated variable, which is still considered valid although its content is no longer valid once the stack frame is restored.

Physical memory in Task Manager doesn't change when memory is allocated

My program may have a memory issue, so I am trying to find information about memory usage from various tools. To find the cause, I ran simple experiments as well.
In release mode, I added the following code:
pChar = new char[((1<<30)/2)];
for (int i = 0; i < ((1<<30)/2); i++)
{
    pChar[i] = i % 256;
}
When the code is executed, the available physical memory shown in the Windows Task Manager doesn't change. My guess is that the compiler removes the code to boost performance. I tried declaring the variable as a global variable, but that doesn't help either. In debug mode, however, the available physical memory in Task Manager changes as expected. I can't understand that.
I have another question: will the new operator allocate memory from virtual memory if physical memory runs out, or will an exception be thrown?
It's indeed quite possible that the compiler detects a "write-only" variable. Since it's non-volatile, the writes can be safely eliminated, and then there's no need for the OS to actually allocate RAM.
new just allocates address space on modern systems. Physical RAM is allocated only when the memory is actually needed. Typically this happens when the constructor runs, as it initializes the members; but with new char there is of course no constructor.
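For the allocation to show up in Task Manager, the writes must survive optimization and the pages must actually be touched. Here is a minimal C sketch of that idea (using volatile so the stores cannot be eliminated as dead code):

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    size_t n = (size_t)1 << 29;        /* 512 MiB */
    volatile char *p = malloc(n);      /* reserves address space only */
    if (p == NULL)
        return 1;

    /* Touch one byte per 4 KiB page: each volatile store forces the
       OS to commit the page, so physical memory use actually rises. */
    for (size_t i = 0; i < n; i += 4096)
        p[i] = (char)(i & 0xFF);

    puts("pages touched; check Task Manager now");
    getchar();                         /* keep the process alive */
    free((void *)p);
    return 0;
}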
