I recently created a 19c Enterprise DB on a Windows server with 54GB of memory and allocated 48GB to the DB. Today I increased the physical memory to 116GB, and I would like to increase the memory utilization to 106GB.
When investigating the memory parameters I found SGA_LIMIT and MEMORY_MAX_TARGET set to 0; pga_aggregate_target was set to 12205M. Following guidance from the Oracle documentation, I made the following changes:
alter system set memory_max_target = 104G scope=spfile;
restart the DB
alter system set SGA_TARGET = 0 scope=both;
alter system set PGA_AGGREGATE_TARGET = 0 scope=both;
Task Manager showed the memory utilization at 41GB. So I ran a few queries with large sorts, like:
select * from table_with_over_30M_rows order by 1,2,3,4;
select * from table_with_over_30M_rows t1 inner join other_table_with_45M_rows t2 on t2.id=t1.id;
This caused the Task Manager memory utilization to increase to 48GB. I have allowed users back on, and memory utilization has not risen above 48GB.
I believe something is limiting the memory. I have checked all the parameters I can think of and nothing stands out. My questions:
What should I look for regarding parameters?
How can I try to force additional memory to be consumed?
What other troubleshooting steps can I take?
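For context, this is how I have been inspecting the memory settings so far, using the standard dynamic performance views (nothing here should be version-specific, as far as I know):

```sql
-- All memory-related initialization parameters
SELECT name, value, isdefault
FROM   v$parameter
WHERE  UPPER(name) LIKE '%MEMORY%'
   OR  UPPER(name) LIKE '%SGA%'
   OR  UPPER(name) LIKE '%PGA%'
ORDER  BY name;

-- How AMM has actually carved up memory_target right now
SELECT component,
       current_size/1024/1024 AS current_mb,
       max_size/1024/1024     AS max_mb
FROM   v$memory_dynamic_components
ORDER  BY current_size DESC;
```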
I am running Neo4j (v4.1.5) community edition on a server node with 64GB RAM.
I set the heap size configuration as follows:
dbms.memory.heap.initial_size=31G
dbms.memory.heap.max_size=31G
During ingestion via Bolt, I got the following error:
{code: Neo.TransientError.General.TransactionMemoryLimit} {message:
Can't allocate extra 512 bytes due to exceeding memory limit;
used=2147483648, max=2147483648}
What I don't understand is that the max in the error message shows 2GB, while I've set the initial and max heap size to 31GB. Can someone help me understand how memory setting works in Neo4j?
It turned out that the default transaction memory allocation for this version was OFF_HEAP, meaning all transactions were executed off-heap with a 2GB max. Adding the following setting to Neo4j resolved the issue:
dbms.tx_state.memory_allocation=ON_HEAP
I'm not sure why OFF_HEAP is the default setting when the Neo4j manual recommends ON_HEAP:
When executing a transaction, Neo4j holds not yet committed data, the result, and intermediate states of the queries in memory. The size needed for this is very dependent on the nature of the usage of Neo4j. For example, long-running queries, or very complicated queries, are likely to require more memory. Some parts of the transactions can optionally be placed off-heap, but for the best performance, it is recommended to keep the default with everything on-heap.
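Putting it together, the relevant block of neo4j.conf ended up looking like this (the heap values are the ones from the question; only the last line is the actual fix):

```
dbms.memory.heap.initial_size=31G
dbms.memory.heap.max_size=31G
# Keep transaction state on-heap, as the manual recommends
dbms.tx_state.memory_allocation=ON_HEAP
```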
We have an application that writes lots of data in a Xodus database. It actually writes so much data that the garbage collector cannot keep up with freeing old files.
My question therefore is: are there any recommended settings for "maximum GC", or is there a way to force Xodus to "stop" (disallow writes) and do a full garbage collection at some point during the night?
Edit (requested information)
Non-default settings:
GcFilesDeletionDelay = 0
GcMinUtilization = 75
GcRunEvery = 1 (for testing)
GcRunPeriod = 1 (for testing)
GcTransactionAcquireTimeout = 1000
GcTransactionTimeout up to 120000 (for testing)
Fiddling with these settings has not meaningfully increased GC throughput.
What we do:
We have a single import thread that writes exclusively to an environment store. Data is written continuously throughout the day.
There are many parallel threads that read the data using read only transactions.
The data basically is measurement data in the form (location, values...)
50% of values get updated every day
There are roughly 8 million records in the database, the database currently has 122 GB on disk with 75% free space (as printed by the GC)
The VM has 20 GB of RAM, the environment store may use up to 25%
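What I am considering as a workaround is scheduling a nightly maintenance window myself: pause the importer, force a GC cycle, then resume. A minimal sketch of the scheduling part using only the JDK; the commented lines assume Xodus's `Environment#gc()` method and our own queue-pausing code, and the runnable body is just a placeholder:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class NightlyGc {
    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch ran = new CountDownLatch(1);

        // In production: compute the delay until e.g. 03:00 and use
        // scheduler.scheduleAtFixedRate(task, initialDelay, 24, TimeUnit.HOURS).
        // Here the task fires after 10 ms so the sketch is runnable as-is.
        Runnable maintenance = () -> {
            // Placeholder for the real work, roughly:
            //   importQueue.pause();  // stop accepting writes (our own code)
            //   env.gc();             // assumes Xodus Environment#gc() forces a GC cycle
            //   importQueue.resume();
            System.out.println("maintenance window ran");
            ran.countDown();
        };
        scheduler.schedule(maintenance, 10, TimeUnit.MILLISECONDS);

        ran.await(5, TimeUnit.SECONDS);
        scheduler.shutdown();
    }
}
```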
I'm trying to query ~8 million nodes in a Neo4j database. I can easily run queries that hit the index for exact matches, but is there a performant way to do aggregations?
MATCH (r:Resident) RETURN r.forename, count(r.forename) ORDER BY count(r.forename)
This query just sits there until I eventually restart my server. I've read the performance guides and I'm watching vm_stat and it seems to be quickly running out of pages free. I've tried tuning the memory / JVM heap settings to various things, but I'm not sure I completely know what I'm doing ;) I've got an 8 GB MacBook Air with an SSD drive in case that's helpful for suggesting settings. Also, here's my stats on my DB from webadmin:
10,236,226 nodes
56,280,161 properties
10,190,430 relationships
2 relationship types
14,535 MB database disk usage
I inserted 8M nodes with just 1 prop and got this query down to ~20s without changing the default settings (after warming up the cache; the first run took 90s), which is comparable to other databases like Postgres (which I also tested).
Some things you could try:
raise the memory-mapped I/O (mapped_memory) settings for the appropriate store files (to match the file sizes in data/graph.db/) at the top of conf/neo4j.properties
increase the node cache size in neo4j.properties
increase the heap init/max in neo4j-wrapper.conf
make sure you have enough RAM left over
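For reference, the top of conf/neo4j.properties for a store around this size might look something like the following; the values are illustrative guesses and should be matched against the actual file sizes in data/graph.db/:

```
# Size each mapping roughly to the corresponding file in data/graph.db/
neostore.nodestore.db.mapped_memory=200M
neostore.relationshipstore.db.mapped_memory=2G
neostore.propertystore.db.mapped_memory=2G
neostore.propertystore.db.strings.mapped_memory=1G
neostore.propertystore.db.arrays.mapped_memory=100M
```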
How do I improve performance when writing to Neo4j? I currently have Neo4j set up on a server and am running it in embedded mode. Based on configurations I've found online, I believe my settings are storing all of the content of my graph database in memory:
neostore.nodestore.db.mapped_memory=0
neostore.relationship.db.mapped_memory=0
neostore.propertystore.db.mapped_memory=0
neostore.propertystore.db.strings.mapped_memory=0
neostore.propertystore.db.arrays.mapped_memory=0
neostore.propertystore.db.index.keys.mapped_memory=0
neostore.propertystore.db.index.mapped_memory=0
node_auto_indexing=true
node_keys_indexable=type,id
cache_type=strong
use_memory_mapped_buffers=false
node_cache_size=12G
relationship_cache_size=12G
node_cache_array_fraction=10
relationship_cache_array_fraction=10
Please let me know if this is incorrect. The problem I am encountering is that when I try to persist information to the graph database, those times are not very quick compared to our MySQL times for the same thing (e.g., adding 250 items takes about 3 sec, versus 1 sec in MySQL). I read online that having multiple indexes can slow down persisting data, so I am looking at that right now to see if it is the culprit. But I just wanted to make sure that my configuration is in line for running a graph database in memory.
Second question on this topic: if my configuration is good and my database is indeed in memory, is there a way to optimize persisting data, in case this isn't the silver bullet? If we run 10 threads against our test that executes this functionality, as opposed to one thread, the execution times stack up
(e.g., thread 1 finishes in 1s, thread 2 in 2s, thread 3 in 3s, etc.). Is there some special multithreaded configuration that I am missing to improve performance when multiple threads hit it at one time?
Neo4j version:
1.9.1-enterprise
My JVM configs are:
-Xms25G -Xmx25G -XX:+UseNUMA -XX:+UseSerialGC
My machine specs:
File system type ext3
Your cache arguments are invalid.
node_cache_size=12G
relationship_cache_size=12G
node_cache_array_fraction=10
relationship_cache_array_fraction=10
These can only be used with the GCR cache. Setting the cache isn't going to put everything in memory for you at startup; you will have to write code to do this yourself. Something like this:
GlobalGraphOperations ggo = GlobalGraphOperations.at(graphDb); // graphDb is your GraphDatabaseService
for (Node n : ggo.getAllNodes()) {
    // reading each property pulls it into the cache
    for (String propertyKey : n.getPropertyKeys()) {
        n.getProperty(propertyKey);
    }
    // iterating the relationships loads them as well
    for (Relationship relationship : n.getRelationships()) {
    }
}
Beware of the strong cache: if you have a lot of nodes/relationships, your cache will eventually become large, and performing GC against it will cause long pauses in your system.
My recommendation would be to use the memory-mapped files, as this is OS-handled and lives outside of heap space. It doesn't come close to the speed of caching, but it will provide a speed-up when you have to read from the Neo store.
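On the multithreading part of the question: concurrent writers in embedded Neo4j often end up contending on write locks, which matches the pattern of each extra thread adding roughly its own execution time. One thing worth trying is to funnel all writes through a single writer thread and batch them into larger transactions. A JDK-only sketch of that funneling (the commit is a placeholder print; in real code it would open one transaction, persist the batch, and commit):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class SingleWriter {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // Single consumer: drains the queue and commits items in batches,
        // so many producer threads never contend on the store's write lock.
        Thread writer = new Thread(() -> {
            List<String> batch = new ArrayList<>();
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    String first = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (first == null) continue;
                    batch.add(first);
                    queue.drainTo(batch, 249); // up to 250 items per transaction
                    // Placeholder: open one transaction, persist the batch, commit.
                    System.out.println("committed batch of " + batch.size());
                    batch.clear();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        // Producers just enqueue and return immediately.
        for (int i = 0; i < 500; i++) queue.put("item-" + i);

        Thread.sleep(500);   // let the writer drain the queue
        writer.interrupt();
        writer.join();
    }
}
```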
I am using R on some relatively big data and am hitting some memory issues. This is on Linux. I have significantly less data than the available memory on the system so it's an issue of managing transient allocation.
When I run gc(), I get the following listing
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 2147186 114.7 3215540 171.8 2945794 157.4
Vcells 251427223 1918.3 592488509 4520.4 592482377 4520.3
yet R appears to have 4GB allocated in resident memory and 2GB in swap. I'm assuming this is OS-allocated memory that R's memory management system will allocate and GC as needed. However, let's say I don't want R to OS-allocate more than 4GB, to prevent swap thrashing. I could always ulimit, but then it would just crash instead of working within the reduced space and GCing more often. Is there a way to specify an arbitrary maximum for the gc trigger and make sure that R never OS-allocates more? Or is there something else I could do to manage memory usage?
In short: no. I found that you simply cannot micromanage memory management and gc().
On the other hand, you could try to keep your data in memory but 'outside' of R. The bigmemory package makes that fairly easy. Of course, using a 64-bit version of R and ample RAM may make the problem go away too.