htop shows memory still used even though deallocate was called - memory

mac osx (catalina)
gfortran 9.3.0 from homebrew
htop 2.2.0 from homebrew
I have the following program in memtest.f90 which I compile with gfortran memtest.f90 -o test and then call with ./test
program main
implicit none
integer, parameter :: n=100000000
real, allocatable :: values(:)
print *, "no memory used yet, press enter"
read(*,*)
allocate(values(n))
values = 0.0
print *, "used a lot of memory, press enter"
read(*,*)
deallocate(values)
print *, "why is the memory still there in htop"
read(*,*)
end program main
I am expecting the memory used by the program to drop after calling the deallocate statement, however, as indicated by htop it continues to hover at about 382 MB (see image below)
is this a memory leak and if so how do I properly release the memory or am I just doing something wrong in looking at the memory consumed by the program?

The program will typically not return the memory to the operating system below some threshold. It may also take some time to be freed. This is not a Fortran issue, but rather a system issue.
I did not mark it as a duplicate of this Will malloc implementations return free-ed memory back to the system? because it is quite indirect, and deserves some commenting, but the issue is there. Fortran compilers typically call the malloc provided by the operating system's or accompanying C-compiler's C library.

Related

Can i give mapped memory to malloc?

Say I have a big block of mapped memory I finished using. It came from mmaping anonymous memory or using MAP_PRIVATE. I could munmap it, then have malloc mmap again the next time I make a big enough allocation.
Could I instead give the memory to malloc directly? Could I say "Hey malloc, here's an address range I mapped. Go use it for heap space. Feel free to mprotect, mremap, or even munmap it as you wish."?
I'm using glibc on linux.
glibc malloc calls __morecore (a function pointer) to obtain more memory. See <malloc.h>. However, this will not work in general because the implementation assumes that the function behaves like sbrk and returns memory from a single, larger memory region. In practice, with glibc malloc, the only realistic way to make memory available for reuse by malloc is calling munmap.
Other malloc implementations allow donating memory (in some cases as internal interfaces). For example, musl's malloc has a function called __malloc_donate which should do what you are asking for.

RAM Memory in TCL

I have a code that creates a global array and when I unset the array the memory is still busy.
I have tried in Windows with TCL 8.4 and 8.6
console show
puts "allocating memory..."
update
for {set i 0} {$i < 10000} {incr i} {
set a($i) $i
}
after 10000
puts "deallocating memory..."
update
foreach v [array names a] {
unset a($v)
}
after 10000
exit
In a lot of programs, both written in Tcl and in other languages, past memory usage is a pretty good indicator of future memory usage. Thus, as a general heuristic, Tcl's implementation does not try to return memory to the OS (it can always page it out if it wants; the OS is always in charge). Indeed, each thread actually has its own memory pool (allowing memory handling to be largely lock-free), but this doesn't make much difference here where there's only one main thread (and a few workers behind the scenes that you can normally ignore). Also, the memory pools will tend to overallocate because it is much faster to work that way.
Whatever you are measuring with, if it is with a tool external to Tcl at all, it will not provide particularly good real memory usage tracking because of the way the pooling works. Tcl's internal tools for this (the memory command) provide much more accurate information but aren't there by default: they're a compile-time option when building the Tcl library, and are usually switched off because they have a lot of overhead. Also, on Windows some of their features only work at all if you build a console application (a consequence of how they're implemented).

Is the stack only preserved above the stack pointer?

I sometimes see disassembled programs which have instructions like:
mov %eax, -4(%esp)
which stores eax to stack at esp-4, without changing esp.
I'd like to know whether in general, you could put data into the stack beyond the stack pointer, and have those data be preserved (not altered unless I do it specifically).
Also, does this depend on which OS I use?
It matters which OS you use, because different OSes have different ABIs. (See the x86 tag wiki if you don't know what that means).
There are two ways I can see that mov %eax, -4(%esp) could be sane:
In the Linux x32 ABI (long mode with 32bit pointers), where there's a 128B red zone like in the normal x86-64 ABI. Compilers frequently generate code using the address-size prefix when they can't prove that e.g. 4(%rdi) would be the same as 4(%edi) in every case (e.g. wraparound). Unfortunately gcc 5.3 still uses 32bit addressing for locals on the stack, which could only wrap if %rsp == 0 (since the ABI requires it to be 16B-aligned).
Anyway, void foo(void) { volatile int x = 10; } compiles to
movl $10, -4(%esp) / ret with gcc 5.3 -O3 -mx32 on the Godbolt Compiler Explorer.
In (kernel) code that runs with interrupts disabled. Since nothing asynchronous other than DMA can happen, nothing can clobber your stack memory. (Although x86 has NMIs: Non-maskable interrupts. Depending on the handler for NMIs, and whether they can be blocked at all, NMIs could clobber memory below the stack pointer, I think.)
In user-space, your signal handlers aren't the only thing that can asynchronously clobber memory below the stack pointer:
As Jester points out in comments on dwelch's answer, pages below the stack pointer can be discarded (asynchronously of course), so a process that temporarily uses a lot of stack isn't wasting all those pages forever. If %esp happens to be at a page boundary, -4(%esp) is in a different page. And instead of faulting in a newly-allocated page of stack memory, access to unmapped pages below the stack pointer turn into segfaults on Linux.
Unless you have a guarantee otherwise (e.g. the red zone), then you must assume that everything below %esp is scribbled over between every instruction. None of the standard 32bit ABIs have a red-zone, and the Windows 64bit ABI also lacks one. Asynchronous use of the stack (usually by signal handlers in Linux) is a whole-program thing, not something that the compiler could determine just from the current compilation unit (even in cases where the compiler could prove that -4(%esp) was in the same page as (%esp)).
Note that the Linux x32 ABI is a 64bit ABI for AMD64 aka x86-64, not i386 aka IA32 aka x86-32. It's much more like the usual AMD64 ABI, since it was designed after.
EDIT
not sure what you mean by above and below since some folks "see" addresses increasing up or increasing down.
But it doesnt matter. If the stack was initialized at address X and is currently at Y then the data between X and Y must be preserved (one end not inclusive). The memory on either side is fair game.
The compiler not the operating system makes this happen, it moves the stack pointer to cover whatever it needs for that function. And moves it back when done. Each nested function consuming more and more stack and each return giving a little back.

POSIX rlimit: What exactly can we assume about RLIMIT_DATA?

Prequisites
POSIX.1 2008 specifies the setrlimit() and getrlimit() functions. Various constants are provided for the resource argument, some of which are reproduced below for easier understaning of my question.
The following resources are defined:
(...)
RLIMIT_DATA
This is the maximum size of a data segment of the process, in bytes. If this limit is exceeded, the malloc() function shall fail with errno set to [ENOMEM].
(...)
RLIMIT_STACK
This is the maximum size of the initial thread's stack, in bytes. The implementation does not automatically grow the stack beyond this limit. If this limit is exceeded, SIGSEGV shall be generated for the thread. If the thread is blocking SIGSEGV, or the process is ignoring or catching SIGSEGV and has not made arrangements to use an alternate stack, the disposition of SIGSEGV shall be set to SIG_DFL before it is generated.
RLIMIT_AS
This is the maximum size of total available memory of the process, in bytes. If this limit is exceeded, the malloc() and mmap() functions shall fail with errno set to [ENOMEM]. In addition, the automatic stack growth fails with the effects outlined above.
Furthermore, POSIX.1 2008 defines data segment like this:
3.125 Data Segment
Memory associated with a process, that can contain dynamically allocated data.
I understand that the RLMIT_DATA resource was traditionally used to denote the maximum amount of memory that can be assigned to a process with the brk() function. Recent editions of POSIX.1 do no longer specify this function and many operating systems (e.g. Mac OS X) do not support this function as a system call. Instead it is emulated with a variant of mmap() which is not part of POSIX.1 2008.
Questions
I am a little bit confused about the semantic and use of the RLIMIT_DATA resource. Here are the concrete questions I have:
Can the stack be part of the data segment according to this specification?
The standard says about RLIMIT_DATA: “If this limit is exceeded, the malloc() function shall fail with errno set to [ENOMEM].” Does this mean that memory allocated with malloc() must be part of the data segment?
On Linux, memory allocated with mmap() does not count towards the data segment. Only memory allocated with brk() or sbrk() is part of the data segment. Recent versions of the glibc use a malloc() implementation that allocates all its memory with mmap(). The value of RLIMIT_DATA thus has no effect on the amount of memory you can allocate with this implementation of malloc().
Is this a violation of POSIX.1 2008?
Do other platforms exhibit similar behavior?
The standard says about RLIMIT_AS: "If this limit is exceeded, the malloc() and mmap() functions shall fail with errno set to [ENOMEM]." As the failure of mmap() is not specified for RLIMIT_DATA, I conclude that memory obtained from mmap() does not count towards the data segment.
Is this assumption true? Does this only apply to non-POSIX variants of mmap()?
FreeBSD also shares the problem of malloc(3) being implemented using mmap(2) in the default malloc implementation. I ran into this when porting a product from FreeBSD 6 to 7, where the switch happened. We switched the default limit for each process from RLIMIT_DATA=512M to RLIMIT_VMEM=512M, i.e. limit the virtual memory allocation to 512MB.
As for whether this violates POSIX, I don't know. My gut feeling is that lots of things violate POSIX and a 100% POSIX compliant system is as rare as a strictly-confirming C compiler.
EDIT: heh, and now I see that FreeBSD's name RLIMIT_VMEM is non-standard; they define RLIMIT_AS as RLIMIT_VMEM for POSIX compatibility.

SBCL used memory reports from top and (room) differs

I am running SBCL 1.0.51 on a Linux (Fedora 15) 32-bit system (kernel 3.6.5) with 1GB Ram and 256MB swap space.
I fire up sbcl --dynamic-space-size 125 and start calling a function that makes ~10000 http-requests (using drakma) to an http (couchDB) server and I just format to the standard-output the results of an operation on the returned data.
After each call I do a (sb-ext:gc :full t) and then (room). The results are not growing. No matter how many times I run the function, (room) reports the same used space (with some ups and downs, but around the same average which does not grow).
BUT: After every time I call the function, top reports that the VIRT and RES amount of the sbcl process keeps growing ,even beyond the 125MB space I told sbcl to ask for itself. So I have the following questions:
Why top -reported memory keeps growing, while (room) says it does not? The only thing I can think of is some leakage through ffi. I am not directly calling out with ffi but maybe some drakma dep does and forgets to free its C garbage. Anyway I dont know if this could even be an explanation. Could it be something else? Any insights?
Why isnt --dynamic-space-size honoured?

Resources