When I declare a memory section in a WebAssembly module, I have to set the initial size and can optionally set a maximum size.
Is there any advantage to setting the maximum to the same value as the initial size? What are the implications of this value for the WebAssembly runtime?
Background: I am writing a Java-to-WebAssembly compiler and want to use the upcoming GC feature for my data. I do not need to grow the memory; I would use it only for constant values.
Allocating a large memory (especially when it's gigabytes) may fail. Failing to allocate the initial memory is a fatal error, while failing to grow the memory later is not. So it is a good idea to start with a smaller, safe initial size.
The WebAssembly community already provides pretty good documentation:
Linear Memory Resizing in WebAssembly Design Rationale. (Highly recommended if you are writing a compiler.)
Resizing memory on the Modules page.
I will sum up the information on how WebAssembly memory works here.
Why we need the optional maximum size
The underlying storage of a WebAssembly Memory is a JS ArrayBuffer object. An ArrayBuffer is not a dynamic array, meaning it cannot be resized in place. However, a Wasm Memory is a special ArrayBuffer that can be resized via a Memory.grow() call, which corresponds to the grow_memory instruction in Wasm. Still, resizing the ArrayBuffer is costly - it works like realloc(), which allocates a new buffer of the new size and then deallocates the old buffer. You may avoid the overhead of reallocating the buffer by allocating a large initial memory, but that causes another problem: the allocation may fail, and failing to allocate the initial memory means the Wasm engine fails to load the Wasm binary at all.
The optional maximum size solves both problems. When the maximum size is defined, the Wasm Memory tries to pre-allocate a buffer of the maximum size. With the buffer pre-allocated, you can resize the memory later without an expensive realloc() operation. And it is okay even if the pre-allocation fails - the engine can try to reallocate later, when the memory actually needs to grow.
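To make this concrete, here is a minimal sketch using the JS API (written as TypeScript here); the page counts are arbitrary values chosen for illustration:

// A Wasm page is 64 KiB. Declaring a maximum lets the engine reserve
// space up front, so grow() can succeed cheaply.
const mem = new WebAssembly.Memory({ initial: 1, maximum: 16 });
console.log(mem.buffer.byteLength); // 65536 (1 page)

const previousPages = mem.grow(1);  // the JS counterpart of grow_memory
console.log(previousPages);         // 1 (the old size, in pages)
console.log(mem.buffer.byteLength); // 131072; mem.buffer now refers to a new ArrayBuffer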
WebAssembly Memory Resizing Scenarios
grow_memory without a maximum size set: The Wasm engine has to reallocate the whole buffer, which is very expensive and more likely to fail.
Allocating a big initial memory: The allocation may fail, and that failure is a fatal error.
grow_memory with a maximum size set: The engine can use the pre-allocated buffer immediately. Even if the grow fails, it is not a fatal error.
Setting initial size == maximum size: You won't benefit from either. The large initial allocation may fail, which is a fatal error, and you cannot resize the memory later either (see the sketch below).
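As a small sketch of that last scenario (again with arbitrary page counts): with initial == maximum, every grow() call is guaranteed to fail:

// initial == maximum: this memory can never be resized.
const fixed = new WebAssembly.Memory({ initial: 4, maximum: 4 });
try {
  fixed.grow(1); // would exceed the maximum
} catch (e) {
  console.log(e instanceof RangeError); // true: growing past the maximum throws
}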
Related
Please explain it nicely. Don't just write a definition. Also explain what it does and how it is different from segmentation.
Fragmentation needs to be considered with memory allocation techniques. Paging is basically not a memory allocation technique, but rather a means of providing virtual address spaces.
Considering the comparison with segmentation, what you're probably asking about is the difference between a memory allocation technique using fixed-size blocks (like the pages of paging, assuming a 4KB page size here) and a technique using variable-size blocks (like the segments used for segmentation).
Now, assume that you directly use the page allocation interface to implement memory management, that is you have two functions for dealing with memory:
alloc_page, which allocates a single page and returns a pointer to the beginning of the newly available address space, and
free_page, which frees a single, allocated page.
Now suppose all of your currently available virtual memory is used, but you need to store 1 additional byte. You call alloc_page and get a 4KB block of memory. You only use 1 byte of that huge block, but also the other 4095 bytes are, from the perspective of the allocator, used. If this happens multiple times eventually all pages will be allocated, so further calls to alloc_page will fail. Even if you just need another additional byte (which could be one of the 4095 that got wasted above) the allocator will tell you that you're out of memory. This is internal fragmentation.
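A toy simulation of this in TypeScript (the PageAllocator class and its names are invented for illustration) makes the waste visible:

const PAGE_SIZE = 4096;

// Toy page allocator: every request, however small, consumes a whole page.
class PageAllocator {
  private pagesLeft: number;
  allocatedBytes = 0; // bytes consumed in whole pages
  usedBytes = 0;      // bytes the caller actually asked for

  constructor(totalPages: number) {
    this.pagesLeft = totalPages;
  }

  allocPage(requestedBytes: number): boolean {
    if (this.pagesLeft === 0) return false; // "out of memory"
    this.pagesLeft--;
    this.allocatedBytes += PAGE_SIZE;
    this.usedBytes += requestedBytes;
    return true;
  }
}

const a = new PageAllocator(2);
a.allocPage(1); // 1 byte requested, 4096 bytes consumed
a.allocPage(1);
console.log(a.allocatedBytes - a.usedBytes); // 8190 bytes of internal fragmentation
console.log(a.allocPage(1));                 // false: out of pages despite ~8KB wasted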
If, on the other hand, you use variable-sized blocks (as in segmentation), then you're vulnerable to external fragmentation. Suppose you manage 6 bytes of memory (F means "free"):
FFFFFF
You first allocate 3 bytes for a, then 1 for b and finally 2 bytes for c:
aaabcc
Now you free both a and c, leaving only b allocated:
FFFbFF
You now have 5 bytes of unused memory, but if you try to allocate a block of 4 bytes (which is less than the available memory) the allocation will fail due to the unfavorable placement of the memory for b. This is external fragmentation.
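The same scenario as a tiny simulation in TypeScript (illustrative only; '.' plays the role of F above):

// Memory of 6 cells; '.' means free.
let memory = "aaabcc".split("");

// Free both a and c, leaving only b allocated.
memory = memory.map(cell => (cell === "a" || cell === "c") ? "." : cell);
console.log(memory.join("")); // "...b.."

// Try to place a 4-cell block: there is no run of 4 contiguous free cells.
console.log(memory.join("").includes("....")); // false: 5 cells are free, yet the allocation fails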
Now, if you extend your page allocator to be able to allocate multiple pages and add alloc_multiple_pages, you have to deal with both internal and external fragmentation.
There is no external fragmentation in paging but internal fragmentation exists.
First, we need to understand what external fragmentation is. External fragmentation occurs when there is enough total memory to accommodate a process, but that memory is not contiguous.
How does it not occur in paging?
Paging divides virtual memory (that is, each process's address space) into equal-sized pages, and physical memory into fixed-size frames. So you are fitting equal-sized blocks, called pages, into equal-sized slots, called frames. Visualize it and you can conclude that there can never be external fragmentation.
In the case of segmentation, we divide the virtual address space into blocks of different sizes, which is why blocks in main memory may have to be compacted to make space for a new process. I hope it helps!
When a process is divided into fixed-size pages, there is generally some leftover space in the last page (internal fragmentation). When there are many processes, the unused areas of their last pages could add up to one page or more. But even with a total free size of one page or more, you cannot load a new page into that space, because a page has to be contiguous. External fragmentation has happened. So I don't think external fragmentation is completely zero in paging.
EDIT: It all comes down to how external fragmentation is defined. The accumulation of internal fragmentation does not contribute to external fragmentation. External fragmentation comes from empty space that is EXTERNAL to a partition (or page). Suppose there are only two frames in main memory, each of size 16B and each occupied by only 1B of data. The internal fragmentation in each frame is 15B, and the total unused space is 30B. Now if you want to load one new page of some process, you will see that no frame is available. You are unable to load a new page even though you have 30B of unused space. Would you call this external fragmentation? No, because those 15B of unused space are INTERNAL to the pages. So in paging, internal fragmentation is possible, but not external fragmentation.
Paging allows a process to be allocated physical memory in a non-contiguous fashion. I will explain why external fragmentation can't occur in paging.
External fragmentation occurs when a process that was allocated contiguous memory is unloaded from physical memory, creating a hole (free space) in the memory.
Now if a new process arrives that requires more memory than this hole, we won't be able to allocate contiguous memory to that process due to the non-contiguous nature of the free memory. This is called external fragmentation.
The problem above originates from the constraint of allocating contiguous memory to a process. This is exactly what paging solves, by allowing a process to get non-contiguous physical memory.
In paging, the probability of external fragmentation is very low, although internal fragmentation may occur.
In the paging scheme, both main memory and virtual memory are divided into fixed-size slots, called pages (in virtual memory) and page frames (in main memory, i.e., RAM or physical memory). Whenever a process executes in main memory, it occupies the entire space of a page frame. Say main memory has 4096 page frames, each of size 4096 bytes, and there is a process P1 that requires 3000 bytes of space for its execution. To execute, P1 is brought from virtual memory into main memory and placed in a page frame (F1). But P1 requires only 3000 bytes, so 4096 - 3000 = 1096 bytes of space in frame F1 are wasted. This is internal fragmentation in page frame F1.
External fragmentation, on the other hand, may occur if some space in main memory cannot be included in any page frame. But this case is very rare, as the size of main memory, the size of a page frame, and the total number of page frames are usually all powers of 2.
As far as I've understood, I would answer your question like so:
Why is there internal fragmentation with paging?
Because a page has a fixed size, but processes may request more or less space. Say a page is 32 units, and a process requests 20 units. Then when a page is given to the requesting process, that page is no longer usable for anything else, despite having 12 units of free "internal" space.
Why is there no external fragmentation with paging?
Because in paging, a process is allowed to be allocated spaces that are non-contiguous in the physical memory. Meanwhile, the logical representation of those blocks will be contiguous in the virtual memory. This is what I mean:
A process requires 128 units of space; with 32-unit pages as in the previous example, that is 4 pages. Regardless of the actual frame numbers in physical memory, you give those pages the numbers 0, 1, 2, and 3. This virtual numbering is the defining characteristic of paging itself. The pages may sit in frames 21, 213, 23, and 234 of actual physical memory, but they can really be anywhere, contiguous or non-contiguous. Therefore, even if paging leaves small free spaces in between used spaces, those small free spaces can still be used together as if they were one contiguous block of space. That's why external fragmentation won't happen.
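A sketch of the idea in TypeScript (the page table contents are the frame numbers from the example above; the translate function is a strong simplification of what the MMU does, and all names are invented):

const PAGE_SIZE = 32; // page size in "units", as in the previous example

// Per-process page table: virtual page number -> physical frame number.
const pageTable = new Map<number, number>([
  [0, 21], [1, 213], [2, 23], [3, 234],
]);

// Translate a virtual address into a physical one.
function translate(virtualAddress: number): number {
  const vpn = Math.floor(virtualAddress / PAGE_SIZE);
  const offset = virtualAddress % PAGE_SIZE;
  const frame = pageTable.get(vpn);
  if (frame === undefined) throw new Error("page fault");
  return frame * PAGE_SIZE + offset;
}

console.log(translate(0));  // 672  = frame 21 * 32 + offset 0
console.log(translate(33)); // 6817 = frame 213 * 32 + offset 1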
Frames are allocated as units. If the memory requirements of a process do not happen to coincide with page boundaries, the last frame allocated may not be completely full.
For example, if the page size is 2,048 bytes, a process of 72,766 bytes will need 35 pages plus 1,086 bytes. It will be allocated 36 frames, resulting in internal fragmentation of 2,048 - 1,086 = 962 bytes. In the worst case, a process would need 11 pages plus 1 byte. It would be allocated 11 + 1 frames, resulting in internal fragmentation of almost an entire frame.
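A small TypeScript helper (invented for illustration) reproduces these numbers:

const PAGE = 2048;

// Pages needed, and bytes wasted in the last frame, for a given process size.
function internalFragmentation(processBytes: number) {
  const pages = Math.ceil(processBytes / PAGE);
  return { pages, wasted: pages * PAGE - processBytes };
}

console.log(internalFragmentation(72766));         // { pages: 36, wasted: 962 }
console.log(internalFragmentation(11 * PAGE + 1)); // { pages: 12, wasted: 2047 } -- almost a full frame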
I'm running Neo4J 2.2.1 with 150G heap space on a box with 240G. I set neo4j.neostore.nodestore.dbms.pagecache.memory to 60G (slightly less than 75% of the remaining system memory, as recommended). However, when I start up, I get an error that the system can't start because I'm trying to allocate an array whose size exceeds the maximum allowed size.
Further testing indicates that either node_cache_array_fraction or relationship_cache_array_fraction is causing the problem. Each is supposed to default to 1%; on a 150G heap that should be 1.5G. However, the array size being generated is too large.
Explicitly setting node_cache_size and relationship_cache_size seems to address this, although it is far from ideal.
How do I determine the lower bound for the JVM option Xmx, or otherwise economize on memory, without a trial-and-error process? I happen to set Xms and Xmx to the same amount, which I assume helps to economize on execution time. If I set Xmx to 7G, and likewise Xms, it will happily report that all of it is being used. I use the following query:
Runtime.getRuntime().totalMemory()
If I set it to less than that, say 5GB, likewise all of it will be reported as used. It is not until I provide very much less, say 1GB, that there is an out-of-memory exception. Since my execution times are typically 10 hours or more, I need to avoid trial-and-error processes.
I'd execute the program with plenty of heap while monitoring heap usage with JConsole. Take note of the highest memory use after a major garbage collection, and set the maximum heap size 50% to 100% higher than that amount to avoid frequent garbage collection.
As an aside, totalMemory reports the current size of the heap, not how much of it is presently used. If you set the minimum and maximum heap size to the same number, totalMemory will be the same irrespective of what your program does...
Using Xms256M and Xmx512M with a trivial program, freeMemory is 244M, totalMemory is 245M, and maxMemory is 455M. Using Xms512M and Xmx512M, the amounts are 488M, 490M, and 490M. This suggests that totalMemory can vary when Xms is less than Xmx, which in turn suggests that the answer to the question is to set Xms to a small amount and monitor the high-water mark of totalMemory. It also suggests that maxMemory is the ultimate heap size that cannot be exceeded by the total of current and future objects.
Once the high-water mark is known, set Xmx to be somewhat more than that to be prudent -- but not excessively more, because this is an economization effort -- and set Xms to the same amount to get the time efficiency that is evidently preferred.
I know I can reserve virtual memory using VirtualAlloc.
e.g. I can reserve 1GB of virtual memory and then commit the first MB of it to put my growing array into.
When the array grows beyond 1MB, I commit the second MB, and so on.
This way I don't need to move the array around in memory when it grows; it just stays in place, and the Intel/AMD virtual memory manager takes care of my problems.
However, does FastMM support this scheme, so that I don't have to do my own memory management?
Pseudo code:
type
  PBigArray = ^TBigArray;
  TBigArray = array[0..0] of SomeRecord;
var
  BigArray: PBigArray;
...
begin
  VirtualMem := FastMM.ReserveVirtualMemory(1GB);          // hypothetical API
  BigArray := FastMM.ClaimPhysicalMemory(VirtualMem, 1MB); // commit the first MB
...
procedure GrowBigArray;
begin
  FastMM.ClaimMorePhysicalMemory(BigArray, 1MB {extra});
  // will generate an OOM exception when the claim exceeds 1GB
Does FastMM support this?
No, FastMM4 (as of the latest version I looked at) does not explicitly support this. It's really not a functionality you would expect in a general purpose memory manager as it's trivially simple to do with VirtualAlloc calls.
NexusMM4 (which is part of NexusDB) does something that gives you a similar result, but without tying up all the address space before it is needed.
If you make a large initial allocation (directly via GetMem, or indirectly via a dynamic array or the like), the memory is allocated at just the size needed, via VirtualAlloc.
But if that allocation is then resized to a larger size, NexusMM uses a different way of allocating memory, which allows it to simply unmap the allocation from the address space and remap it again at a larger size when further reallocs take place.
This avoids the two major problems that most general-purpose memory managers have when reallocating:
during a normal realloc, the existing and the new allocation need to be present in the address space at the same time, temporarily doubling the address space and physical memory requirements
during a normal realloc, the whole contents of the existing allocation need to be copied
So with NexusMM you would get all the advantages of what you showed in your pseudo code (with the exceptions that the first realloc will involve a copy, and that growing your array might change its address) by simply using normal GetMem/ReallocMem/FreeMem calls.
GetMem allows you to allocate a buffer of arbitrary size. Somewhere, the size information is retained by the memory manager, because you don't need to tell it how big the buffer is when you pass the pointer to FreeMem.
Is that information for internal use only, or is there any way to retrieve the size of the buffer pointed to by a pointer?
It would seem that the size of a block referenced by a pointer returned by GetMem() must be available from somewhere, given that FreeMem() does not require that you identify the size of memory to be freed - the system must be able to determine that, so why not the application developer?
But, as others have said, the precise details of the memory management involved are NOT defined by the system per se. Delphi has always had a replaceable memory manager architecture, and the "interface" defined for compatible memory managers does not require that they provide this information for an arbitrary pointer.
The default memory manager will maintain the necessary information in whatever way suits it, but some other memory manager will almost certainly use an entirely different, if superficially similar, mechanism. So even if you hack a solution based on intimate knowledge of one memory manager, your solution will almost certainly break if the memory manager changes (or is changed for you, e.g. by a change in the system-defined memory manager you may be using by default, as occurred between Delphi 2005 and 2006, for example).
In general, it's not an unreasonable assumption on the part of the RTL/memory manager that the application should already know how big a piece of memory a GetMem() allocated pointer refers to, given that the application asked for it in the first place! :)
And if your application did NOT allocate the pointer then your application's memory manager has absolutely no way of knowing how big the block it references may be. It may be a pointer into the middle of some larger block, for example - only the source of the pointer can possibly know how it relates to the memory it references!
But if your application really does need to maintain such information about its own pointers, it could of course easily devise a means to achieve this: a simple singleton class or function library through which GetMem()/FreeMem() requests are routed, maintaining a record of the requested size for each currently allocated pointer. Such a mechanism could then easily expose this information as required, entirely reliably and independently of whatever memory manager is in use.
This may in fact be the only option if an "accurate" record is required, as a given memory manager implementation may allocate a larger block of memory for a given size of data than is actually requested. I do not know whether any memory manager actually does this, but it could do so in theory, for efficiency's sake.
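As an illustration of such a bookkeeping wrapper (sketched in TypeScript over an ArrayBuffer rather than in Delphi, with a toy bump allocator standing in for the real memory manager; all names are invented):

const HEAP = new ArrayBuffer(1 << 16);
let next = 0;
const sizes = new Map<number, number>(); // start offset -> requested size

function myGetMem(size: number): number {
  if (next + size > HEAP.byteLength) throw new Error("out of memory");
  const ptr = next;
  next += size;
  sizes.set(ptr, size); // record the requested size for this pointer
  return ptr;
}

function mySizeOf(ptr: number): number {
  const size = sizes.get(ptr);
  if (size === undefined) throw new Error("not an allocated block");
  return size;
}

function myFreeMem(ptr: number): void {
  if (!sizes.delete(ptr)) throw new Error("double free or bad pointer");
  // A real manager would also recycle the space; this toy one does not.
}

const p = myGetMem(100);
console.log(mySizeOf(p)); // 100
myFreeMem(p);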
It is for internal use as it depends on the MemoryManager used. BTW, that's why you need to use the pair GetMem/FreeMem from the same MemoryManager; there is no canonical way of knowing how the memory has been reserved.
In Delphi, if you look at FastMM4, you can see that the memory is allocated in small, medium or large blocks:
the small blocks are allocated in pools of fixed-size blocks (the block size is defined at the pool level, in the block type)
TSmallBlockType = packed record
  {True = Block type is locked}
  BlockTypeLocked: Boolean;
  {Bitmap indicating which of the first 8 medium block groups contain blocks
   of a suitable size for a block pool.}
  AllowedGroupsForBlockPoolBitmap: byte;
  {The block size for this block type}
  BlockSize: Word;
  {... remaining fields omitted ...}
end;
the medium blocks are also allocated in pools but have a variable size
{Medium block layout:
 Offset: -8 = Previous Block Size (only if the previous block is free)
 Offset: -4 = This block size and flags
 Offset: 0 = User data / Previous Free Block (if this block is free)
 Offset: 4 = Next Free Block (if this block is free)
 Offset: BlockSize - 8 = Size of this block (if this block is free)
 Offset: BlockSize - 4 = Size of the next block and flags}
{Get the block header}
LBlockHeader := PCardinal(Cardinal(APointer) - BlockHeaderSize)^;
{Get the medium block size}
LBlockSize := LBlockHeader and DropMediumAndLargeFlagsMask;
the large blocks are allocated individually with the required size
TLargeBlockHeader = packed record
  {Points to the previous and next large blocks. This circular linked
   list is used to track memory leaks on program shutdown.}
  PreviousLargeBlockHeader: PLargeBlockHeader;
  NextLargeBlockHeader: PLargeBlockHeader;
  {The user allocated size of the Large block}
  UserAllocatedSize: Cardinal;
  {The size of this block plus the flags}
  BlockSizeAndFlags: Cardinal;
end;
Is that information for internal use only, or is there any way to retrieve the size of the buffer pointed to by a pointer?
Do these two 'alternatives' contradict each other?
It's for internal use only.
The memory manager stores some meta information just before the allocated area. This means that each time you allocate a piece of memory, a slightly bigger piece is allocated and the first bytes are used for the meta information; the returned pointer points to the block following it.
The format may well change with another version of the memory manager, so don't count on it.
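For illustration, here is a toy version of that layout in TypeScript (all names invented): the requested size is written into a header just before the pointer the caller receives, which is how FreeMem-style code can recover it:

const heap = new DataView(new ArrayBuffer(1 << 16));
const HEADER_SIZE = 4; // bytes of meta information per block
let top = 0;

function headerGetMem(size: number): number {
  if (top + HEADER_SIZE + size > heap.byteLength) throw new Error("out of memory");
  const block = top;
  heap.setUint32(block, size, true); // write the meta information
  top += HEADER_SIZE + size;
  return block + HEADER_SIZE;        // the caller sees only the user data
}

// The size can be recovered by looking just before the returned pointer.
function headerSizeOf(ptr: number): number {
  return heap.getUint32(ptr - HEADER_SIZE, true);
}

const p = headerGetMem(100);
console.log(headerSizeOf(p)); // 100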
That information is for internal use only.
Note that a memory manager doesn't need to store the size as part of the memory returned; many memory managers store it in an internal table, using the start address of the chunk handed out as the lookup key into that table.