I have a quick question: are there any differences in memory structure between JVM 8 and JVM 11?
For example, in JVM 8 the permanent generation was replaced with Metaspace. I'm asking about changes like this. Unfortunately, I can't find any articles on the internet about it.
Going through the list of Java Enhancement Proposals (JEPs), the following ones seem to be relevant to memory structure in some way:
JEP 143: Improve Contended Locking
JEP 197: Segmented Code Cache
JEP 248: Make G1 the Default Garbage Collector
JEP 254: Compact Strings
JEP 270: Reserved Stack Areas for Critical Sections
JEP 310: Application Class-Data Sharing
JEP 316: Heap Allocation on Alternative Memory Devices
JEP 333: ZGC: A Scalable Low-Latency Garbage Collector (Experimental)
Making G1 the default collector (JEP 248) may require some adjustments to tuning and monitoring, similar to the ones the PermGen removal required. The same applies to the segmented code cache (JEP 197).
I am starting out with Elixir and observing some strange behavior when I connect to my remote production node using iex.
Observer reports that a total of 92 MB of memory is in use. However, when I sum up the memory consumption of processes, atoms, binaries, code and ETS, it only comes to ~69 MB:
Processes 19.00 MB
Atoms 0.97 MB (969 kB)
Binaries 13.00 MB
Code 28.00 MB
ETS 7.69 MB (7685 kB)
-------------------
Total 68.66 MB
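The arithmetic can be double-checked quickly. This is a small Python sketch that uses only the figures from the table above and the 92 MB total reported by Observer. It simply sums the categories and computes the gap.

```python
# Memory figures reported by Observer, in MB (taken from the table above).
categories = {
    "processes": 19.00,
    "atoms": 0.97,
    "binaries": 13.00,
    "code": 28.00,
    "ets": 7.69,
}
observer_total_mb = 92.0

# Sum the per-category figures and compare them with Observer's overall total.
accounted_mb = round(sum(categories.values()), 2)
unaccounted_mb = round(observer_total_mb - accounted_mb, 2)

print(f"accounted:   {accounted_mb:.2f} MB")    # accounted:   68.66 MB
print(f"unaccounted: {unaccounted_mb:.2f} MB")  # unaccounted: 23.34 MB
```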
So, my first question is: where is this extra 23 MB of memory coming from? I am fairly sure it's not just a reporting issue, because my Kubernetes pod's memory consumption is ~102 MB, which is in line with the numbers Observer is showing.
The only thing I can think of is that those 23 MB have not been garbage collected yet. Is my assumption valid? If so, it's been 6 hours since this container started, and I have been monitoring the memory consumption from the very beginning. Shouldn't this have been garbage collected by now?
And my second question: are there any Erlang VM / Elixir configuration tweaks I can make to reduce the memory footprint?
I have also been trying to solve memory management issues in OTP applications, and one tool that has been particularly useful for me is recon, a library written by Fred Hebert. Its recon_alloc module in particular provides very useful information on memory usage in the Erlang VM.
The missing megabytes
The following quote is taken directly from the documentation of the recon_alloc:memory() function and might give you some insight into what's going on:
The memory reported by `allocated' should roughly match what the OS
reports. If this amount is different by a large margin, it may be the
sign that someone is allocating memory in C directly, outside of
Erlang's own allocator -- a big warning sign. There are currently
three sources of memory allocation that are not counted towards this
value: The cached segments in the mseg allocator, any memory allocated
as a super carrier, and small pieces of memory allocated during
startup before the memory allocators are initialized. Also note that
low memory usages can be the sign of fragmentation in memory, in which
case exploring which specific allocator is at fault is recommended.
So I think the extra 23 MB of memory usage might be caused by some undesired allocations (for example, C code allocating memory outside Erlang's own allocators), or perhaps by fragmentation.
Tweaking (with great caution /!\)
As for your second question, the documentation for erts_alloc, the Erlang runtime's memory allocator, describes how to configure the memory allocators manually. You do this by passing command-line flags to the emulator, for example:
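To show why fragmentation can keep memory allocated, here is a toy model written in Python for brevity. It is purely illustrative. The carrier size and class name are hypothetical, and this is not how erts_alloc actually works. Memory is obtained from the OS in fixed-size carriers, and a carrier can only be handed back once every block in it is free. A few long-lived blocks scattered across many carriers therefore keep all of those carriers allocated.

```python
# Toy model of carrier-based allocation (hypothetical; NOT erts_alloc's real algorithm).
CARRIER_SIZE = 8  # blocks per carrier (hypothetical value)

class ToyAllocator:
    def __init__(self):
        self.carriers = []  # each carrier is a set of live block ids

    def alloc(self, block_id):
        for carrier in self.carriers:
            if len(carrier) < CARRIER_SIZE:
                carrier.add(block_id)
                return
        self.carriers.append({block_id})  # no room anywhere: get a new carrier from the OS

    def free(self, block_id):
        for carrier in self.carriers:
            if block_id in carrier:
                carrier.discard(block_id)
                break
        # Only completely empty carriers are returned to the OS.
        self.carriers = [c for c in self.carriers if c]

    def allocated_blocks(self):
        return len(self.carriers) * CARRIER_SIZE

    def used_blocks(self):
        return sum(len(c) for c in self.carriers)

toy = ToyAllocator()
for i in range(64):        # fills 8 carriers
    toy.alloc(i)
for i in range(64):        # free 7 of every 8 blocks, leaving one live block per carrier
    if i % 8 != 0:
        toy.free(i)
print(toy.used_blocks(), toy.allocated_blocks())  # 8 64
```

Only 8 of 64 blocks are in use, yet all 64 stay allocated. This is the "allocated vs. used" gap that recon_alloc helps you spot.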
erl +SOMEFLAG +SOMEOTHERFLAG
But there's a big red warning in the documentation stating that changing these flags can result in much worse behaviour than the default configuration.
So my advice would be to resort to these modifications only if they are really the only way to solve the problem. In that case, there is a book about the Erlang Runtime System that helped me understand some of these aspects, so I would also recommend reading it beforehand.
NOTE: This is a wild shot in the dark and doesn't answer your question directly, but it might be useful to double-check what is going on with your binaries, since Observer reports 13 MB of them. Depending on their size (up to 64 bytes, or larger), they are either stored in process heaps or stored outside them and accessed by reference. I have run into the first case, with lots of small binaries piling up and ultimately crashing my system.
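Here is a small Python model of the two binary kinds. It is illustrative only, and the function and class names are hypothetical. It assumes the 64-byte threshold from the Erlang efficiency guide: binaries up to 64 bytes are heap binaries, and larger ones are reference-counted (refc) binaries shared between processes. The key point is that a refc binary's memory can only be released once every process holding a reference has dropped it, typically when each process is garbage collected.

```python
# Toy model of Erlang's two binary kinds (illustrative only).
HEAP_BINARY_LIMIT = 64  # bytes; larger binaries are reference-counted and stored off-heap

def binary_kind(size_in_bytes):
    """Return where a binary of this size would live, per the 64-byte rule."""
    return "heap" if size_in_bytes <= HEAP_BINARY_LIMIT else "refc"

class RefcBinary:
    """A shared off-heap binary, released only when its last reference is dropped."""

    def __init__(self, size):
        self.size = size
        self.refs = 0

    def acquire(self):
        self.refs += 1

    def release(self):
        self.refs -= 1
        return self.refs == 0  # True only when the memory could actually be freed

big = RefcBinary(1_000_000)
for _ in range(3):  # three processes hold a reference to the same binary
    big.acquire()
freed = [big.release() for _ in range(3)]  # each process drops its reference in turn
print(binary_kind(64), binary_kind(65), freed)  # heap refc [False, False, True]
```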
There are a few other helpful resources I found while trying to fix those problems:
This specific snippet, also from a blog post by Fred Hebert:
[erlang:garbage_collect(Pid) || Pid <- processes()].
It triggers a GC on all running processes immediately. In my case it has done wonders. You can also pass an option to run it asynchronously, so you don't have to block until it's all done. Each result is then delivered to the caller as a {garbage_collect, RequestId, Result} message (here the Pid itself is used as the RequestId):
[erlang:garbage_collect(Pid, [{async, Pid}]) || Pid <- processes()].
This article about GC in Erlang
Efficiency guidelines in the Erlang docs for binaries, that provide useful implementation details.
Stuff Goes Bad: Erlang in Anger, another free ebook written by... yes, it's Fred Hebert.
Hope this helps :)
Alright, so I have a question regarding the memory segments of a JVM.
I know every JVM may implement this a little differently, yet it is an overall concept that should remain the same across all JVMs.
A standard C/C++ program that does not run on a virtual machine has four memory segments at runtime:
The Code / Stack / Heap / Data
All of these memory segments are allocated automatically by the operating system at runtime.
However, when a JVM executes a compiled Java program, it has five memory segments at runtime:
The Method area / Heap / Java Stacks / PC Registers / Native Stacks
My question is this: who allocates and manages those memory segments?
The operating system is NOT aware that a Java program is running; it only sees the JVM running as a regular program on the computer. JIT compilation and Java stack usage both require run-time memory allocation, and what I'm failing to understand is how a JVM divides its memory into those memory segments.
It is definitely not done by the operating system, and those memory segments (for example the Java stacks) must be contiguous in order to work. So if the JVM simply used something like malloc to obtain the maximum heap size and then divided that memory into segments, we would have no guarantee of contiguous memory. I would love it if someone could help me get this straight in my head; it's all mixed up...
When the JVM starts, it has hundreds if not thousands of memory regions. For example, there is a stack for every thread as well as a thread state region, and there is a memory mapping for every shared library and JAR. Note: 64-bit Java doesn't use segments the way a 16-bit application would.
who allocates and manages those memory segments?
All memory mappings/regions are allocated by the OS.
The operating system is NOT aware of a java program running and thinks it is a part of the JVM running as a regular program on the computer,
The JVM does run as a regular program, and it allocates memory through the same mechanisms a normal program would. The only difference is that Java object allocation is managed by the JVM itself, and the Java heap is the only region that works this way.
JIT compilation, Java stacks usage,
JIT compilation occurs in a normal OS thread and each Java stack is a normal thread stack.
these operations require run-time memory allocation,
It does, and for the most part it uses malloc/free and mmap/munmap.
And what I'm failing to understand Is how a JVM divides it's memory into those memory segments
It doesn't. The heap is for Java objects only. The maximum heap size, for example, is NOT the maximum memory usage; it is only the maximum amount of memory that objects can occupy at once.
It is definitely not done by the Operating System, and those memory segments (for example the java stacks) must be contiguous in order to work
You are right that they need to be contiguous in virtual memory, but the OS takes care of that. On Linux, at least, no segments are used, only one flat 32-bit or 64-bit address space.
so if the JVM program would simply use a command such as malloc in order to receive the maximum size of heap memory and divide that memory into segments,
The heap is divided either into generations or, in G1, into many equally sized regions, but this applies to objects only.
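A rough sketch of the idea, written in Python for brevity. The region names and sizes are hypothetical, and this is not how HotSpot is implemented. A program can get one contiguous range of virtual memory from a single anonymous mapping and then split that range into regions itself. Contiguity therefore comes from the one mapping, not from many separate malloc calls.

```python
import mmap

# Hypothetical region sizes, chosen only for illustration.
YOUNG_SIZE = 1 << 20  # 1 MiB "young generation"
OLD_SIZE = 3 << 20    # 3 MiB "old generation"

# A single anonymous mapping yields one contiguous range of virtual addresses.
heap = mmap.mmap(-1, YOUNG_SIZE + OLD_SIZE)

# The program itself decides how to carve that range into regions;
# the OS only ever sees the one mapping.
view = memoryview(heap)
young = view[:YOUNG_SIZE]
old = view[YOUNG_SIZE:]

young[0:5] = b"hello"  # a write through one region lands in the shared mapping
young_len, old_len = len(young), len(old)
first_bytes = heap[0:5]

# Release the views before unmapping, as mmap requires.
for v in (young, old, view):
    v.release()
heap.close()

print(young_len, old_len, first_bytes)  # 1048576 3145728 b'hello'
```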
we have no promise for contiguous memory
The garbage collectors either defragment memory by copying (compacting) live objects or take steps to reduce fragmentation, so that there is enough contiguous memory for any object you allocate.
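Here is a toy sliding compaction in Python. It is illustrative only and not the algorithm of any particular JVM collector. Live objects slide to the front of the heap, so all free space becomes a single contiguous run. The forwarding table stands in for what a real collector uses to update references to moved objects.

```python
# Toy sliding compaction over a heap of fixed-size slots (illustrative only).
# None marks a free slot; strings stand in for live objects.

def compact(heap):
    """Slide live objects to the front, preserving order; return (new_heap, forwarding)."""
    new_heap = [None] * len(heap)
    forwarding = {}  # old index -> new index, used to fix up references afterwards
    free = 0
    for old_index, obj in enumerate(heap):
        if obj is not None:
            new_heap[free] = obj
            forwarding[old_index] = free
            free += 1
    return new_heap, forwarding

def largest_free_run(heap):
    """Length of the longest run of contiguous free slots."""
    best = run = 0
    for slot in heap:
        run = run + 1 if slot is None else 0
        best = max(best, run)
    return best

fragmented = ["a", None, "b", None, None, "c", None, "d"]
compacted, fwd = compact(fragmented)
print(largest_free_run(fragmented), largest_free_run(compacted))  # 2 4
print(compacted)  # ['a', 'b', 'c', 'd', None, None, None, None]
```

Before compaction the heap has four free slots, but an object needing three contiguous slots would not fit. After compaction it would.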
would love it if someone could help me get this straight in my head, it's all mixed up...
In short, the JVM runs like any other program, except that when Java code runs, its objects are allocated in a managed region of memory. All other memory regions behave just as they would in a C program, because the JVM is a C/C++ program.
I have a few questions about the stack.
Is the stack in the CPU or in RAM?
Is the stack a place where opcodes are executed?
Is EIP in the CPU or in RAM?
The stack is always in RAM. A stack pointer, kept in a CPU register, points to the top of the stack, i.e., it holds the address of the location at the top of the stack.
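Here is a minimal Python simulation of that split (not real hardware). The stack contents live in "RAM", modeled as a bytearray, while the CPU only holds the stack pointer in a register. The downward growth mirrors x86. The 4-byte slot size and the 64-byte memory are assumptions made for the example.

```python
import struct

# Toy model: "RAM" is a bytearray; the "CPU" holds the stack pointer in a register.
# As on x86, the stack grows downward (toward lower addresses).
ram = bytearray(64)
registers = {"sp": len(ram)}  # empty stack: SP points just past the end of the stack area

def push(value):
    registers["sp"] -= 4
    struct.pack_into("<I", ram, registers["sp"], value)  # the data itself is written to RAM

def pop():
    (value,) = struct.unpack_from("<I", ram, registers["sp"])
    registers["sp"] += 4
    return value

push(0xDEADBEEF)
push(42)
print(registers["sp"], pop(), hex(pop()), registers["sp"])  # 56 42 0xdeadbeef 64
```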
The stack is located in RAM, not within the CPU. A dedicated region of the process's address space holds the stack.
From Wiki:
The stack area contains the program stack, a LIFO structure, typically
located in the higher parts of memory.
Which CPU are you talking about?
Some might contain memory that is used for callstacks, some contain memory that can be used for callstacks but require the OS to implement the callstack management code, and others contain no writable memory at all. For example, the x86 architecture tends to have one or more code caches and data caches built into the CPU.
Some CPUs or OSes implement operations that make specific areas of memory non-executable. To prevent stack-based buffer overflows, for example, many OSes use hardware and/or software-based data execution prevention, which can prevent stack memory from being executed as code. Some don't; it's entirely possible that an x86 CPU data cache line might be used to store both the call stack and code to be executed in faster memory.
EIP sounds like the instruction pointer register of the IA-32 CPU architecture. If you're referring to IA-32, then yes, it's a register inside the CPU, though OSes save it to RAM and restore it from RAM on every context switch to implement multitasking.
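Here is a toy Python model of that save/restore, purely illustrative. The task names and register values are hypothetical. The CPU has a single live set of registers, while each task's saved copy sits in an ordinary data structure in RAM. A context switch copies the running task's registers out to RAM and loads the next task's registers back in.

```python
# Toy context switch (illustrative only). "cpu" is the one live register set;
# saved_contexts is ordinary RAM holding each task's register snapshot.
cpu = {"eip": 0x1000, "esp": 0x8000}
saved_contexts = {
    "task_a": None,                          # currently running, nothing saved yet
    "task_b": {"eip": 0x2000, "esp": 0x9000},
}

def context_switch(current, nxt):
    saved_contexts[current] = dict(cpu)  # save the running task's registers to RAM
    cpu.update(saved_contexts[nxt])      # load the next task's registers into the CPU

context_switch("task_a", "task_b")
print(hex(cpu["eip"]), hex(saved_contexts["task_a"]["eip"]))  # 0x2000 0x1000
```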
In modern architectures, the stack is mapped into RAM.
Programming languages such as C, C++ and Pascal can allocate memory dynamically in RAM; this is called heap allocation. Variables that live within functions, by contrast, are stack-allocated.
This led processors and operating systems to place the stack within a RAM segment, and on processors with a memory management unit it can be anywhere in RAM. However, the Intel 8080 had a status bit indicating when it was reading from or writing to the stack, so the stack could have been implemented in memory physically isolated from RAM. I don't know whether such a machine was ever built, but consider what that would mean for C: which memory, heap or stack, would a given pointer point to?
If stack separation gained popularity, modern programming languages would need distinct stack pointers and heap pointers.
I am getting an OutOfMemoryError. How do I solve this?
Error: java.lang.OutOfMemoryError: Java heap space
You need to either increase the available heap space (with the Java -Xmx flag) or use less memory in your application.
I'd recommend that you try to use less memory. There are plenty of good profiling tools out there that you can use to discover where your code uses a lot of memory. It's also worth checking that you are not misusing any HashMaps or ObjectOutputStreams. Both classes are notorious for soaking up memory if not used properly: a HashMap used as a cache can grow without bound, and an ObjectOutputStream keeps a reference to every object written to it until reset() is called.
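The HashMap pitfall usually means a map used as a cache that only ever grows. A common fix is to cap the size and evict the least recently used entry. Below is a language-neutral sketch, written in Python for brevity; the class name and capacity are hypothetical. In Java, a LinkedHashMap in access order with an overridden removeEldestEntry is the usual equivalent.

```python
from collections import OrderedDict

class BoundedCache:
    """A size-capped LRU cache: unlike an ever-growing map, its memory use stays bounded."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry

    def __len__(self):
        return len(self._data)

cache = BoundedCache(max_entries=3)
for i in range(10):
    cache.put(i, str(i))
print(len(cache), cache.get(0), cache.get(9))  # 3 None 9
```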
In C/C++ I can allocate memory in one thread and delete it in another thread. Yet whenever one requests memory from the heap, the heap allocator needs to walk the heap to find a suitably sized free area. How can two threads access the same heap efficiently without corrupting the heap? (Is this done by locking the heap?)
In general, you do not need to worry about the thread-safety of your memory allocator. All standard memory allocators -- that is, those shipped with MacOS, Windows, Linux, etc. -- are thread-safe. Locks are a standard way of providing thread-safety, though it is possible to write a memory allocator that only uses atomic operations rather than locks.
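Here is a minimal sketch of the lock-based approach, written in Python for brevity. It is a toy fixed-block allocator with a hypothetical class name, not how any real malloc is implemented. A single lock guards the free list, so the "find a free block and take it" step is atomic and no two threads can receive the same block.

```python
import threading

class LockedFreeListAllocator:
    """Toy allocator over fixed-size blocks; one lock serializes every alloc and free."""

    def __init__(self, num_blocks):
        self._free = list(range(num_blocks))  # indices of free blocks
        self._lock = threading.Lock()

    def alloc(self):
        # The lock makes "check for a free block, then take it" one atomic step,
        # so two threads can never be handed the same block.
        with self._lock:
            return self._free.pop() if self._free else None

    def free(self, block):
        with self._lock:
            self._free.append(block)

allocator = LockedFreeListAllocator(num_blocks=1000)
results = [[] for _ in range(4)]

def worker(out):
    for _ in range(250):
        out.append(allocator.alloc())

threads = [threading.Thread(target=worker, args=(out,)) for out in results]
for t in threads:
    t.start()
for t in threads:
    t.join()

handed_out = [b for out in results for b in out]
print(len(handed_out), len(set(handed_out)))  # 1000 1000: no block handed out twice
```

Every allocation and free contends for the same lock, which is exactly the scalability limit discussed next.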
Now it is an entirely different question whether those memory allocators scale; that is, is their performance independent of the number of threads performing memory operations? In most cases, the answer is no; they either slow down or can consume a lot more memory. The first scalable allocator in both dimensions (speed and space) is Hoard (which I wrote); the Mac OS X allocator is inspired by it -- and cites it in the documentation -- but Hoard is faster. There are others, including Google's tcmalloc.
Yes, an "ordinary" heap implementation supporting multithreaded code will necessarily include some sort of locking to ensure correct operation. Under fairly extreme conditions (a lot of heap activity) this can become a bottleneck; more specialized heaps (generally providing some sort of thread-local heap) are available that can help in this situation. I've used Intel TBB's "scalable allocator" to good effect. tcmalloc and jemalloc are other examples of mallocs implemented with multithreaded scaling in mind.
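Here is a rough sketch of the thread-local idea, written in Python for brevity. It is loosely modeled on per-thread caches and is not the actual algorithm of TBB, tcmalloc or jemalloc. The batch size and class names are hypothetical. Each thread allocates from a private cache and only takes the shared lock when that cache is empty, grabbing a whole batch at a time.

```python
import threading

BATCH = 32  # hypothetical refill size

class GlobalPool:
    """Shared pool of free blocks; every access goes through one lock."""

    def __init__(self, num_blocks):
        self._free = list(range(num_blocks))
        self._lock = threading.Lock()
        self.lock_acquisitions = 0

    def take_batch(self, n):
        with self._lock:
            self.lock_acquisitions += 1
            batch, self._free = self._free[-n:], self._free[:-n]
            return batch

class ThreadCachedAllocator:
    """Each thread allocates from a private cache, refilled from the pool in batches."""

    def __init__(self, pool):
        self._pool = pool
        self._local = threading.local()

    def _cache(self):
        if not hasattr(self._local, "blocks"):
            self._local.blocks = []  # first use in this thread
        return self._local.blocks

    def alloc(self):
        cache = self._cache()
        if not cache:
            cache.extend(self._pool.take_batch(BATCH))  # the only locked step
        return cache.pop() if cache else None

    def free(self, block):
        # No locking: the block goes into the calling thread's own cache.
        # (Real allocators also return surplus blocks to the shared pool.)
        self._cache().append(block)

pool = GlobalPool(num_blocks=1024)
allocator = ThreadCachedAllocator(pool)
results = [[] for _ in range(4)]

def worker(out):
    for _ in range(256):
        out.append(allocator.alloc())

threads = [threading.Thread(target=worker, args=(out,)) for out in results]
for t in threads:
    t.start()
for t in threads:
    t.join()

handed_out = [b for out in results for b in out]
print(len(handed_out), pool.lock_acquisitions)  # 1024 32
```

With a single global lock, the same 1024 allocations would take the lock 1024 times instead of 32.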
Some timing comparisons between single-threaded and multithread-aware mallocs can be found here.
This is an Operating Systems question, so the answer is going to depend on the OS.
On Windows, each process gets its own default heap. That means multiple threads in the same process share a heap by default, so the OS has to synchronize its allocation and deallocation calls to prevent heap corruption. If you don't like the idea of the contention that may ensue, you can get around it by using the Heap* routines (HeapCreate, HeapAlloc, HeapFree) to create and use private heaps. You can even overload malloc (in C) and new (in C++) to call them.
I found this link.
Basically, the heap can be divided into arenas. When memory is requested, each arena is checked in turn to see whether it is locked. This means that different threads can safely access different parts of the heap at the same time. Frees are a bit more complicated, because each block must be returned to the arena it was allocated from. I imagine a good implementation will have different threads default to different arenas to minimize contention.
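Here is a toy version of that arena scheme, written in Python for brevity. It is illustrative only; real implementations such as glibc malloc are far more involved, and the class names are hypothetical. Allocation first tries each arena with a non-blocking lock attempt and skips any that are busy. Each block records its owning arena so that a free can return it there.

```python
import threading

class Arena:
    def __init__(self, arena_id, num_blocks):
        self.id = arena_id
        self.lock = threading.Lock()
        # A block is (owning arena id, index) so a free can find its arena again.
        self.free_blocks = [(arena_id, i) for i in range(num_blocks)]

class ArenaAllocator:
    """Toy multi-arena allocator (illustrative only)."""

    def __init__(self, num_arenas, blocks_per_arena):
        self.arenas = [Arena(i, blocks_per_arena) for i in range(num_arenas)]

    def alloc(self):
        # First pass: check each arena in turn and skip any that is currently locked.
        for arena in self.arenas:
            if arena.lock.acquire(blocking=False):
                try:
                    if arena.free_blocks:
                        return arena.free_blocks.pop()
                finally:
                    arena.lock.release()
        # Second pass: every arena was busy or empty, so wait for the locks.
        for arena in self.arenas:
            with arena.lock:
                if arena.free_blocks:
                    return arena.free_blocks.pop()
        return None  # all arenas exhausted

    def free(self, block):
        owner = self.arenas[block[0]]  # must go back to the arena it came from
        with owner.lock:
            owner.free_blocks.append(block)

allocator = ArenaAllocator(num_arenas=2, blocks_per_arena=100)
results = [[] for _ in range(4)]

def worker(out):
    for _ in range(50):
        out.append(allocator.alloc())

threads = [threading.Thread(target=worker, args=(out,)) for out in results]
for t in threads:
    t.start()
for t in threads:
    t.join()

handed_out = [b for out in results for b in out]
for block in handed_out:
    allocator.free(block)

print(len(set(handed_out)), [len(a.free_blocks) for a in allocator.arenas])  # 200 [100, 100]
```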
Yes, normally access to the heap has to be locked. Any time you have a shared resource, that resource needs to be protected; memory is a resource.
This will depend heavily on your platform/OS, but I believe this is generally OK on major systems. Before C11 and C++11, the C and C++ standards did not define threads at all, so by the standard alone the answer was "the heap is not protected", and you had to provide some sort of multithreaded protection for your heap access yourself.
However, at least with Linux and gcc, I believe that enabling -pthread will give you this protection automatically...
Additionally, here is another related question:
C++ new operator thread safety in linux and gcc 4