java.lang.OutOfMemoryError: Requested array size exceeds VM limit - neo4j

I’m running Neo4j 2.2.1 with a 150G heap on a box with 240G of RAM. I set neo4j.neostore.nodestore.dbms.pagecache.memory to 60G (slightly less than the recommended 75% of the remaining system memory). However, Neo4j fails on startup with an error saying that it is trying to allocate an array whose size exceeds the maximum allowed size.

Further testing indicates that either node_cache_array_fraction or relationship_cache_array_fraction is causing the problem. Each is supposed to default to 1%, which on a 150G heap should be 1.5G. However, the array being generated is too large.
Explicitly setting node_cache_size and relationship_cache_size seems to address this although it is far from ideal.
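Because a Java array is indexed by an int, the VM limit applies to the number of elements, not to bytes. When choosing explicit node_cache_size and relationship_cache_size values, one sanity check is to convert a byte budget into an element count. The sketch below does that; the 8-byte slot (an uncompressed object reference, since HotSpot disables compressed oops above roughly 32 GB of heap), the Integer.MAX_VALUE - 8 bound and the helper names are assumptions for illustration, not Neo4j's actual cache layout.

```java
class CacheArraySizing {
    // Assumed conservative bound: HotSpot usually refuses array lengths
    // within a few elements of Integer.MAX_VALUE.
    static final long SAFE_MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

    // Elements an array can hold within a byte budget when each element
    // costs bytesPerSlot bytes (8 for an uncompressed object reference).
    static long slotsFor(long budgetBytes, int bytesPerSlot) {
        return budgetBytes / bytesPerSlot;
    }

    // True if a single Java array of this length can be requested at all.
    static boolean fitsInJavaArray(long slots) {
        return slots >= 0 && slots <= SAFE_MAX_ARRAY_LENGTH;
    }

    public static void main(String[] args) {
        long heapBytes = 150L << 30;       // 150G heap from the question
        long budget = heapBytes / 100;     // the documented 1% default
        long slots = slotsFor(budget, 8);
        System.out.println(budget + " bytes -> " + slots
                + " slots, fits in one array: " + fitsInJavaArray(slots));
    }
}
```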

Related

Neo4j TransactionMemoryLimit

I am running Neo4j (v4.1.5) community edition on a server node with 64GB RAM.
I set the heap size configuration as follows:
dbms.memory.heap.initial_size=31G
dbms.memory.heap.max_size=31G
During ingestion via Bolt, I got the following error:
{code: Neo.TransientError.General.TransactionMemoryLimit} {message: Can't allocate extra 512 bytes due to exceeding memory limit; used=2147483648, max=2147483648}
What I don't understand is that the max in the error message shows 2 GB, while I've set both the initial and max heap size to 31 GB. Can someone help me understand how memory settings work in Neo4j?
It turned out that the default transaction memory allocation for this version was OFF_HEAP, meaning that transaction state was kept off-heap with a 2 GB maximum. Adding the following setting to the Neo4j configuration resolved the issue:
dbms.tx_state.memory_allocation=ON_HEAP
I'm not sure why OFF_HEAP is the default when the Neo4j manual recommends keeping everything on-heap:
When executing a transaction, Neo4j holds not yet committed data, the result, and intermediate states of the queries in memory. The size needed for this is very dependent on the nature of the usage of Neo4j. For example, long-running queries, or very complicated queries, are likely to require more memory. Some parts of the transactions can optionally be placed off-heap, but for the best performance, it is recommended to keep the default with everything on-heap.
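The numbers in the error message point at the cause: 2147483648 bytes is exactly 2^31, i.e. 2 GiB, which matches a 2 GB off-heap cap rather than the 31G heap. A small sketch that extracts and converts those values; the class and method names are hypothetical, and the regex assumes the message format quoted in the question.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TxMemoryLimit {
    // Matches the "used=<bytes>, max=<bytes>" part of the error message.
    private static final Pattern USED_MAX = Pattern.compile("used=(\\d+), max=(\\d+)");

    // Returns {used, max} in bytes, or null if the message has another format.
    static long[] parse(String message) {
        Matcher m = USED_MAX.matcher(message);
        if (!m.find()) {
            return null;
        }
        return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2))};
    }

    static double toGiB(long bytes) {
        return bytes / (double) (1L << 30);
    }

    public static void main(String[] args) {
        String msg = "Can't allocate extra 512 bytes due to exceeding memory limit; "
                + "used=2147483648, max=2147483648";
        long[] usedMax = parse(msg);
        // The cap is a transaction-state limit, unrelated to the 31G heap size.
        System.out.printf(Locale.ROOT, "used=%.1f GiB, max=%.1f GiB%n",
                toGiB(usedMax[0]), toGiB(usedMax[1]));
    }
}
```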

What are the implications of a maximum memory limits in WebAssembly?

When I declare a memory section in a WebAssembly module, I have to set the initial size and can optionally set a maximum size.
Is there any advantage to setting the maximum to the same value as the initial size? What are the implications of this value for the WebAssembly runtime?
Background: I am writing a Java-to-WebAssembly compiler and want to use the upcoming GC feature for my data. I do not need to grow the memory; I would use it only for constant values.
Allocating a large memory (especially gigabytes) may fail. Failing to allocate the initial memory is a fatal error, while failing to grow the memory later is not (the grow instruction just returns -1). So it is a good idea to start with a small, safe initial size.
The WebAssembly community already provides good documentation:
- Linear Memory Resizing in the WebAssembly Design Rationale (highly recommended if you are writing a compiler).
- Resizing memory on the Modules page.
Here is a summary of how WebAssembly memory works.
Why we need the optional maximum size
The underlying WebAssembly Memory is exposed to JavaScript as an ArrayBuffer object. An ArrayBuffer is not a dynamic array, meaning it cannot be resized. However, Wasm Memory is a special ArrayBuffer that can be resized with the Memory.grow() call, which corresponds to the memory.grow instruction in Wasm (grow_memory in early versions of the spec). Still, resizing an ArrayBuffer is expensive: it works like realloc(), which allocates a new buffer of the new size and then frees the old one. You can avoid the reallocation overhead by allocating a large initial memory, but that causes another problem: the allocation may fail, and if it does, the Wasm engine fails to load the Wasm binary.
The optional maximum size solves those problems. When a maximum size is declared, the engine can try to pre-allocate a buffer of the maximum size. With the buffer pre-allocated, the memory can later be resized without an expensive realloc() operation. It is fine even if the pre-allocation fails: the engine can still try to reallocate later, when the memory actually needs to grow.
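To make the realloc() cost concrete, here is a toy Java model of a growable backing buffer, not engine code: one variant reserves the declared maximum up front, the other allocates only what is needed and copies on every grow. The class name and the copy counter are illustrative assumptions.

```java
import java.util.Arrays;

class ReserveVsRealloc {
    private byte[] backing;
    private int length;
    private int copies;

    // If reserveMax is true, allocate the declared maximum up front;
    // otherwise allocate only the initial size.
    ReserveVsRealloc(int initialBytes, int maxBytes, boolean reserveMax) {
        backing = new byte[reserveMax ? maxBytes : initialBytes];
        length = initialBytes;
    }

    // Grows the logical size. When the backing array is too small, this does
    // what realloc() does: allocate a larger array and copy the old contents.
    void grow(int extraBytes) {
        int newLength = length + extraBytes;
        if (newLength > backing.length) {
            backing = Arrays.copyOf(backing, newLength);
            copies++;
        }
        length = newLength;
    }

    int length() { return length; }

    int copies() { return copies; }

    public static void main(String[] args) {
        int page = 65536;
        ReserveVsRealloc reserved = new ReserveVsRealloc(page, 4 * page, true);
        ReserveVsRealloc onDemand = new ReserveVsRealloc(page, 4 * page, false);
        for (int i = 0; i < 3; i++) {
            reserved.grow(page);
            onDemand.grow(page);
        }
        System.out.println("copies with reservation: " + reserved.copies()
                + ", without: " + onDemand.copies());
    }
}
```

The trade-off mirrors the text: the reserved variant pays for the maximum once at construction (which is the allocation that may fail), while the on-demand variant copies all existing contents on every grow.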
WebAssembly Memory Resizing Scenarios
- grow_memory without a maximum size: the Wasm engine has to reallocate the whole buffer, which is very expensive and more likely to fail.
- Allocating a big initial memory: the allocation may fail, and that failure is fatal.
- grow_memory with a maximum size: the engine can use the pre-allocated buffer immediately, and even if growing fails, it is not a fatal error.
- Initial size == maximum size: you get none of these benefits. The initial allocation may still fail fatally, and you cannot resize the memory later either.
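These scenarios can be modeled with the spec's own units: linear memory is sized in 64 KiB pages, and memory.grow returns the previous size in pages, or -1 on failure, instead of trapping. The sketch below is a hypothetical Java model of those rules (it ignores host allocation failures). It shows that a memory declared with initial == maximum can never grow.

```java
class WasmMemoryModel {
    static final int PAGE_SIZE = 65536;      // WebAssembly page: 64 KiB
    static final int MAX_PAGES_32 = 65536;   // 4 GiB limit of a 32-bit memory

    private final Integer maxPages;          // null: no maximum declared
    private int pages;

    WasmMemoryModel(int initialPages, Integer maxPages) {
        if (initialPages < 0 || initialPages > MAX_PAGES_32
                || (maxPages != null && initialPages > maxPages)) {
            throw new IllegalArgumentException("invalid memory limits");
        }
        this.pages = initialPages;
        this.maxPages = maxPages;
    }

    // Mirrors memory.grow: returns the old size in pages, or -1 if the new size
    // would exceed the limit. The operand is unsigned in Wasm, so a negative
    // Java int stands for a huge request and fails. Host allocation failures,
    // which also yield -1 in a real engine, are not modeled.
    int grow(int deltaPages) {
        int limit = (maxPages != null) ? maxPages : MAX_PAGES_32;
        if (deltaPages < 0 || (long) pages + deltaPages > limit) {
            return -1;
        }
        int old = pages;
        pages += deltaPages;
        return old;
    }

    long byteLength() {
        return (long) pages * PAGE_SIZE;
    }

    public static void main(String[] args) {
        WasmMemoryModel fixed = new WasmMemoryModel(1, 1);      // initial == maximum
        WasmMemoryModel bounded = new WasmMemoryModel(1, 4);
        System.out.println("fixed.grow(1) = " + fixed.grow(1));
        System.out.println("bounded.grow(2) = " + bounded.grow(2)
                + ", now " + bounded.byteLength() + " bytes");
    }
}
```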

How to economize on memory use using the Xmx JVM option

How do I determine the lower bound for the JVM option Xmx or otherwise economize on memory without a trial and error process? I happen to set Xms and Xmx to be the same amount, which I assume helps to economize on execution time. If I set Xmx to 7G, and likewise Xms, it will happily report that all of it is being used. I use the following query:
Runtime.getRuntime().totalMemory()
If I set it lower, say to 5 GB, all of it is likewise reported as used. Only when I provide much less, say 1 GB, do I get an out-of-memory error. Since my runs typically take 10 hours or more, I need to avoid trial and error.
I'd run the program with plenty of heap while monitoring heap usage with JConsole. Note the highest memory use after a major garbage collection, and set the maximum heap size about 50% to 100% higher than that to avoid frequent garbage collection.
As an aside, totalMemory reports the current size of the heap, not how much of it is in use (that is totalMemory() - freeMemory()). If you set the minimum and maximum heap size to the same number, totalMemory will be the same regardless of what your program does ...
Using -Xms256M and -Xmx512M with a trivial program, freeMemory is 244M, totalMemory is 245M, and maxMemory is 455M. Using -Xms512M and -Xmx512M, the amounts are 488M, 490M, and 490M. This suggests that totalMemory varies when Xms is less than Xmx, so one answer to the question is to set Xms to a small amount and monitor the high-water mark of totalMemory. It also suggests that maxMemory is the ultimate heap size, which the total of current and future objects cannot exceed.
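Instead of reading totalMemory() alone, the program can log what is actually in use. This sketch uses the standard Runtime and java.lang.management APIs: used heap is totalMemory() - freeMemory(), and MemoryPoolMXBean.getCollectionUsage() reports each heap pool's usage right after its most recent GC, which approximates the "after a major collection" figure from the first answer. Which pools report collection usage depends on the garbage collector, so treat the sum as an estimate.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

class HeapReport {
    // Bytes occupied by objects: totalMemory() is the heap committed so far,
    // freeMemory() is the unused part of it.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Sum of heap-pool usage measured right after each pool's most recent GC.
    // Pools without collection-usage support return null and are skipped.
    static long usedAfterLastGcBytes() {
        long sum = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc != null) {
                sum += afterGc.getUsed();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.printf("max=%dM total=%dM used=%dM usedAfterGc=%dM%n",
                rt.maxMemory() >> 20, rt.totalMemory() >> 20,
                usedHeapBytes() >> 20, usedAfterLastGcBytes() >> 20);
    }
}
```

Calling this periodically (for example from a scheduled thread) during a long run and keeping the maximum of usedAfterLastGcBytes() gives the high-water mark without attaching JConsole.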
Once the highwater mark is known, set Xmx to be somewhat more than that to be prudent -- but not excessively more because this is an economization effort -- and set Xms to be the same amount to get the time efficiency that is evidently preferred.
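The rule of thumb from the first answer (maximum heap 50% to 100% above the peak after a major GC) is simple arithmetic. A sketch with hypothetical helper names that turns a measured peak into a range and into matching -Xms/-Xmx flags:

```java
class XmxSuggestion {
    static final long MIB = 1L << 20;

    static long ceilToMiB(long bytes) {
        return (bytes + MIB - 1) / MIB;
    }

    // 50% to 100% headroom above the peak heap use seen after a major GC.
    static long[] suggestedRangeMiB(long peakAfterGcBytes) {
        long low = peakAfterGcBytes + peakAfterGcBytes / 2;
        long high = peakAfterGcBytes * 2;
        return new long[] {ceilToMiB(low), ceilToMiB(high)};
    }

    // Same value for -Xms and -Xmx, as preferred in the question.
    static String heapFlags(long heapMiB) {
        return "-Xms" + heapMiB + "m -Xmx" + heapMiB + "m";
    }

    public static void main(String[] args) {
        long peak = 3L << 30;  // hypothetical measured peak: 3 GiB after a full GC
        long[] range = suggestedRangeMiB(peak);
        System.out.println("suggested max heap: " + range[0] + "-" + range[1] + " MiB");
        System.out.println("e.g. " + heapFlags(range[0]));
    }
}
```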

Maximum memory allocation on openCL CPU

I have read that the maximum memory allocation is limited to around 60% of device memory, and that this can be changed via the GPU_MAX_HEAP_SIZE and GPU_MAX_ALLOC_SIZE environment variables for the GPU.
I am wondering whether the AMD SDK has something similar for the CPU, so that I can raise its memory allocation limit.
For my current configuration, it returns the following:
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 2973.37MB
CL_DEVICE_GLOBAL_MEM_SIZE = 11893.5MB
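Those two numbers are related: 2973.37 / 11893.5 is 0.25, and the OpenCL 1.x specification requires CL_DEVICE_MAX_MEM_ALLOC_SIZE to be at least max(CL_DEVICE_GLOBAL_MEM_SIZE / 4, 128 MiB), so this CPU device appears to report exactly the spec minimum. A quick Java check, treating the reported "MB" as MiB (an assumption):

```java
import java.util.Locale;

class ClAllocLimits {
    static final long MIB = 1L << 20;

    // OpenCL 1.x minimum for CL_DEVICE_MAX_MEM_ALLOC_SIZE on non-custom devices:
    // max(CL_DEVICE_GLOBAL_MEM_SIZE / 4, 128 MiB).
    static long specMinimumMaxAlloc(long globalMemBytes) {
        return Math.max(globalMemBytes / 4, 128 * MIB);
    }

    public static void main(String[] args) {
        double maxAllocMb = 2973.37;  // reported CL_DEVICE_MAX_MEM_ALLOC_SIZE
        double globalMb = 11893.5;    // reported CL_DEVICE_GLOBAL_MEM_SIZE
        long globalBytes = (long) (globalMb * MIB);  // treating "MB" as MiB
        System.out.printf(Locale.ROOT, "reported ratio = %.4f%n", maxAllocMb / globalMb);
        System.out.printf(Locale.ROOT, "spec minimum   = %.3f MB%n",
                specMinimumMaxAlloc(globalBytes) / (double) MIB);
    }
}
```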
I was able to change this on my system. I don't know if this method was possible when you originally asked the question.
Set the environment variable CPU_MAX_ALLOC_PERCENT to the percentage of total memory you want to be able to allocate for a single global buffer. I have 8 GB of system memory, and after setting CPU_MAX_ALLOC_PERCENT to 80, clinfo reports the following:
Max memory allocation: 6871207116
Success! 6.399GB
You can also use GPU_MAX_ALLOC_PERCENT in the same way for your GPU devices.
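Assuming the runtime simply caps a single buffer at CPU_MAX_ALLOC_PERCENT of total memory (an assumed model, not documented AMD behavior), the reported figure can be checked by hand: 80% of 8 GiB is about 6.4 GiB, and 6871207116 bytes is about 6.399 GiB, which would be consistent with the runtime seeing slightly less than a full 8 GiB.

```java
import java.util.Locale;

class AllocPercentCap {
    static final long GIB = 1L << 30;

    // Assumed model: largest single buffer = percent of total memory.
    static long capBytes(long totalBytes, int percent) {
        return totalBytes * percent / 100;
    }

    static double toGiB(long bytes) {
        return bytes / (double) GIB;
    }

    public static void main(String[] args) {
        long reported = 6871207116L;          // clinfo "Max memory allocation" above
        long modeled = capBytes(8 * GIB, 80); // CPU_MAX_ALLOC_PERCENT=80 on 8 GiB
        System.out.printf(Locale.ROOT, "reported %.3f GiB, modeled %.3f GiB%n",
                toGiB(reported), toGiB(modeled));
    }
}
```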

CL_OUT_OF_RESOURCES for 2 millions floats with 1GB VRAM?

It seems like 2 million floats should be no big deal: only 8 MB of 1 GB of GPU RAM. I am sometimes able to allocate that much, or more, with no trouble. I get CL_OUT_OF_RESOURCES when I call clEnqueueReadBuffer, which seems odd. Can I find out where the trouble really started? OpenCL shouldn't fail like this at clEnqueueReadBuffer, right? Shouldn't it fail when I allocate the data? Is there a way to get more details than just the error code? It would be useful to see how much VRAM was allocated when OpenCL reported CL_OUT_OF_RESOURCES.
I just had the same problem you had (took me a whole day to fix).
I'm sure people with the same problem will stumble upon this, which is why I'm answering this old question.
You probably didn't check the maximum work-group size of the kernel.
This is how you do it:
#include <CL/cl.h>  /* <OpenCL/opencl.h> on macOS */
size_t kernel_work_group_size;
cl_int err = clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
    sizeof(size_t), &kernel_work_group_size, NULL);  /* err should be CL_SUCCESS */
My devices (2x NVIDIA GTX 460 & Intel i7 CPU) support a maximum work group size of 1024, but the above code returns something around 500 when I pass my Path Tracing kernel.
When I used a workgroup size of 1024 it obviously failed and gave me the CL_OUT_OF_RESOURCES error.
The more complex your kernel becomes, the smaller the maximum workgroup size for it will become (or that's at least what I experienced).
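In practice this means choosing the local size from the per-kernel limit returned by CL_KERNEL_WORK_GROUP_SIZE rather than from the device maximum. Because OpenCL 1.x also requires the global size to be a multiple of the local size, the global size usually has to be padded. A host-side sketch of that arithmetic in Java; the power-of-two policy and the method names are assumptions, and in a real program the limits would come from clGetKernelWorkGroupInfo and clGetDeviceInfo.

```java
class WorkGroupSizing {
    // Largest power of two that respects both the device limit and the
    // per-kernel CL_KERNEL_WORK_GROUP_SIZE (hypothetical selection policy).
    static long chooseLocalSize(long deviceMax, long kernelMax) {
        return Long.highestOneBit(Math.min(deviceMax, kernelMax));
    }

    // OpenCL 1.x requires the global size to be a multiple of the local size,
    // so pad it up; the kernel must then skip work-items past problemSize.
    static long roundUpGlobal(long problemSize, long localSize) {
        return ((problemSize + localSize - 1) / localSize) * localSize;
    }

    public static void main(String[] args) {
        long local = chooseLocalSize(1024, 500);   // limits mentioned in the answer
        long global = roundUpGlobal(2_000_000, local);
        System.out.println("local=" + local + " global=" + global);
    }
}
```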
Edit:
I just realized you said "clEnqueueReadBuffer" instead of "clEnqueueNDRangeKernel"...
My answer was related to clEnqueueNDRangeKernel.
Sorry for the mistake.
I hope this is still useful to other people.
From another source:
- calling clFinish() gets you the error status for the calculation (rather than getting it when you try to read data).
- the "out of resources" error can also be caused by a 5s timeout if the (NVidia) card is also being used as a display
- it can also appear when you have pointer errors in your kernel.
A follow-up suggests running the kernel first on the CPU to ensure you're not making out-of-bounds memory accesses.
Not all available memory can necessarily be supplied to a single allocation request. Read up on heap fragmentation 1, 2, 3 to learn why the largest allocation that can succeed is limited by the largest contiguous block of free memory, and how blocks get divided into smaller pieces as memory is used.
It's not that the resource is exhausted; the allocator just can't find a single piece big enough to satisfy your request.
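A toy first-fit allocator illustrates the point: after freeing every other block, half of the arena is free, yet a request twice the size of one block still fails. This is a hypothetical model for illustration (it does not even coalesce neighbors), not how any OpenCL driver manages memory.

```java
import java.util.ArrayList;
import java.util.List;

class FragmentationDemo {
    // Free blocks as {offset, size} pairs.
    private final List<long[]> free = new ArrayList<>();

    FragmentationDemo(long arenaSize) {
        free.add(new long[] {0, arenaSize});
    }

    // First fit: returns the offset of the allocated block, or -1 if no single
    // free block is large enough, even when the total free space would be.
    long allocate(long size) {
        for (int i = 0; i < free.size(); i++) {
            long[] block = free.get(i);
            if (block[1] >= size) {
                long offset = block[0];
                block[0] += size;
                block[1] -= size;
                if (block[1] == 0) {
                    free.remove(i);
                }
                return offset;
            }
        }
        return -1;
    }

    // Returns a block to the free list (no coalescing, to keep the sketch short).
    void release(long offset, long size) {
        free.add(new long[] {offset, size});
    }

    long totalFree() {
        long total = 0;
        for (long[] block : free) total += block[1];
        return total;
    }

    long largestFree() {
        long max = 0;
        for (long[] block : free) max = Math.max(max, block[1]);
        return max;
    }

    public static void main(String[] args) {
        FragmentationDemo arena = new FragmentationDemo(100);
        for (int i = 0; i < 10; i++) arena.allocate(10);           // fill the arena
        for (int off = 0; off < 100; off += 20) arena.release(off, 10); // free every other block
        System.out.println("free=" + arena.totalFree() + " largest=" + arena.largestFree()
                + " allocate(20)=" + arena.allocate(20));
    }
}
```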
Out-of-bounds accesses in a kernel are typically silent, since there is still no error at the kernel enqueue call.
However, if you later try to read the kernel's result with clEnqueueReadBuffer(), this error will show up. It indicates that something went wrong during kernel execution.
Check your kernel code for out-of-bounds read/writes.
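A related, hedged alternative to running the kernel on an OpenCL CPU device is to port the kernel body to Java, where every array index is bounds-checked. The sketch below uses a hypothetical kernel (out[i] = in[i] + in[i + 1]). Without the gid guard, the last work-item reads past the end, and the CPU run throws ArrayIndexOutOfBoundsException instead of failing silently.

```java
import java.util.Arrays;
import java.util.function.IntConsumer;

class CpuKernelCheck {
    // Calls the kernel body once per global ID, sequentially, on the CPU.
    static void runOnCpu(int globalSize, IntConsumer kernelBody) {
        for (int gid = 0; gid < globalSize; gid++) {
            kernelBody.accept(gid);
        }
    }

    // Hypothetical kernel: out[i] = in[i] + in[i + 1]. The guard skips the
    // last element and any padded work-items; without it, in[gid + 1] goes
    // past the end of the array.
    static float[] pairSums(float[] in, int globalSize, boolean guarded) {
        float[] out = new float[in.length];
        runOnCpu(globalSize, gid -> {
            if (guarded && gid >= in.length - 1) {
                return;
            }
            out[gid] = in[gid] + in[gid + 1];
        });
        return out;
    }

    public static void main(String[] args) {
        float[] data = {1f, 2f, 3f, 4f};
        System.out.println(Arrays.toString(pairSums(data, 8, true)));
        try {
            pairSums(data, 4, false);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out-of-bounds access caught: " + e.getMessage());
        }
    }
}
```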
