For demo purposes, I am running Neo4j in a low-memory environment: a laptop with 4GB of RAM, of which 1644MB is used for video memory, leaving only 2452MB available. The machine is also running SQL Server, our WCF services, and our clients, so there's little memory left for Neo4j.
I'm running LOAD CSV Cypher scripts via REST from a C# service. There are more than 20 scripts, and they work well in a server environment. I've written code to paginate so that they run in smaller batches. I've reduced the batch size very low (25 CSV rows), and a given script may do 300 batches, but I still get "Java heap space" errors at some point.
I've tried configuring Neo4j with a relatively large heap (640MB, which is about all the available RAM) and setting cache_type to none, and it gets much further before hitting the Java heap space error. What I don't understand is why, in that case, the heap grows that much. Also, until I restart the Neo4j service, I keep getting these Java heap space errors quickly. The batch size doesn't seem to affect how much memory is used appreciably.
However, when I run the application with these settings, query performance becomes very slow because of the disabled cache.
I am running this on a Windows 7 laptop with 4GB RAM, using Neo4j 2.2.1 Community Edition.
Thoughts?
Perhaps you can share your LOAD CSV statement and the other queries you run.
I think you just ran into this:
http://markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/
So PROFILE or EXPLAIN your queries and rework them so they don't build up that much intermediate state. We can help if you share your statements.
And you should use PERIODIC COMMIT 100.
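For example, here's a minimal sketch of what one of those statements could look like when sent as a single request to the transactional HTTP endpoint (shown in Python rather than your C# service just for brevity; the CSV path, labels, and credentials are placeholders, and PERIODIC COMMIT only works when the statement runs on its own, outside an open transaction):

    # Hedged sketch, not your actual script: let Neo4j commit every 100 rows itself
    # instead of hand-rolled pagination on the client side.
    import requests

    statement = """
    USING PERIODIC COMMIT 100
    LOAD CSV WITH HEADERS FROM 'file:///C:/data/people.csv' AS row
    MERGE (p:Person {id: row.Id})
    SET p.name = row.Name
    """

    resp = requests.post(
        "http://localhost:7474/db/data/transaction/commit",  # 2.x transactional endpoint
        json={"statements": [{"statement": statement}]},
        auth=("neo4j", "secret"),  # placeholder credentials
    )
    resp.raise_for_status()
    print(resp.json().get("errors"))

That way the server commits in small chunks on its own, and your client-side batching code isn't needed.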
For your Neo4j configuration, something like:
heap=512M
dbms.pagecache.memory=200M
keep_logical_logs=false
cache_type=none
http://console.neo4j.org runs Neo4j in memory, putting up to 50 instances into a single gigabyte of memory, so it should be doable.
Neo4j 3.5.12 Community Edition
Ubuntu Server 20.04.2
RAM: 32 GB
EC2 instance with 4 or 8 CPUs (I change it to accommodate for processing at the moment)
Database files: 6.5 GB
Python, WSGI, Flask
dbms.memory.heap.initial_size=17g
dbms.memory.heap.max_size=17g
dbms.memory.pagecache.size=11g
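A quick sanity check on that memory split (my own back-of-the-envelope arithmetic, not from any doc):

    # Rough memory budget for the 32 GB box with the settings above.
    total_ram_gb = 32
    heap_gb = 17          # dbms.memory.heap.max_size
    page_cache_gb = 11    # dbms.memory.pagecache.size
    store_size_gb = 6.5   # on-disk database files

    print(f"left for OS / native memory: {total_ram_gb - heap_gb - page_cache_gb} GB")  # 4 GB
    print(f"store fits in the page cache: {store_size_gb <= page_cache_gb}")            # True

So memory looks generously provisioned; the puzzle is the CPU.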
I'm seeing high CPU use on the server in what appears to be a random pattern. I've profiled all the queries for the pages that I know that people are visiting at those times and they are all optimised with executions under 50ms in all cases. The CPU use doesn't seem linked with user numbers which are very low at most times anyway (max 40 concurrent users). I've checked all queries in cron jobs too.
I significantly reduced the number of nodes in the database, and that made no difference to performance.
I warm the database by preloading all nodes into RAM with MATCH (n) OPTIONAL MATCH (n)-[r]->() RETURN count(n.prop) + count(r.prop);
The pattern is that there will be a few minutes of very low CPU use (as I would expect from this setup with these user numbers) and then processing on most CPU cores goes up to the high 90%s and the machine becomes unresponsive to new requests. Changing to an 8-CPU instance sorts it, but that shouldn't be needed for this level of traffic.
I would like to profile the queries with query logging, but the community edition doesn't support that.
Thanks.
Run a CPU profiler such as perf to record where CPU time is spent. You can then visualize it as a FlameGraph or, since your bursts only occur at random intervals, visualize it over time with Netflix's FlameScope.
Since Neo4j is a Java application, it might also be worthwhile to have a look at async-profiler, which is priceless when it comes to profiling Java applications (it generates similar FlameGraphs and can output log files compatible with FlameScope or JMC).
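For example, here's a rough sketch of capturing one of those bursts and turning it into a FlameGraph (Python driving perf via subprocess; the PID and the FlameGraph checkout path are placeholders, and plain perf will mostly show native/JVM frames -- use async-profiler or a perf map agent to get readable Java frames):

    # Hedged sketch: sample the Neo4j JVM for 60 seconds during a CPU burst,
    # then fold the stacks and render an interactive SVG flame graph.
    import subprocess

    NEO4J_PID = "12345"             # placeholder: pid of the neo4j java process
    FLAMEGRAPH = "/opt/FlameGraph"  # placeholder: clone of github.com/brendangregg/FlameGraph

    # Sample call stacks at 99 Hz for 60 seconds.
    subprocess.run(
        ["perf", "record", "-F", "99", "-g", "-p", NEO4J_PID, "--", "sleep", "60"],
        check=True,
    )

    # Collapse the stacks and write the flame graph.
    subprocess.run(
        f"perf script | {FLAMEGRAPH}/stackcollapse-perf.pl | {FLAMEGRAPH}/flamegraph.pl > burst.svg",
        shell=True,
        check=True,
    )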
Getting started with the latest DSE, trying to set up an initial DSE Solr cluster, and wanting to make sure basic capacity needs are met. Following the docs, I have done some initial capacity testing using the directions here:
http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/srch/srchCapazty.html
My test single-node setup is on AWS: m3.xl, 80GB RAID 0 across the two 40GB SSDs, latest DSE installed.
I have inserted a total of 6MM example records and run some Solr searches similar to what production would be running.
I have the following numbers for my 6MM records:
6MM Records
7.6GB disk (Cassandra + Solr)
2.56GB Solr index size
96.2MB Solr field cache (totalReadableMemSize)
25.57MB Solr heap
I am trying to plan out an initial starter cluster and would like to plan for around 250MM records stored and indexed to start. Read load will be pretty minimal in the early days, so I'm not too worried about read throughput at first.
Following the capacity planning doc page and scaling the 6MM numbers up to 250MM (see the sketch after the list), the base requirements for the dataset look like:
250MM Records
106GB Solr index size
317GB disk (Cassandra + Solr)
4GB Solr field cache (totalReadableMemSize)
1.1GB Solr heap
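The sketch of that scaling, since it's nothing more than linear extrapolation (my own arithmetic; it assumes everything grows linearly with record count):

    # Scale the measured 6MM footprint linearly up to 250MM records.
    scale = 250 / 6  # ~41.7x

    measured_6mm = {
        "disk (Cassandra + Solr), GB": 7.6,
        "Solr index, GB": 2.56,
        "Solr field cache, GB": 96.2 / 1024,
        "Solr heap, GB": 25.57 / 1024,
    }

    for name, value in measured_6mm.items():
        print(f"{name}: {value * scale:.1f}")
    # disk ~316.7, index ~106.7, field cache ~3.9, heap ~1.0 -- roughly the figures listed above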
So, some questions I'm looking for guidance on, and to check that I'm understanding the docs correctly:
Should I be targeting ~360GB+ storage to be safe and not exceed 80% disk capacity on average as data set grows?
Should I use nodes that can allocate 6GB for Solr + X GB for Cassandra? (i.e., if the entire Solr heap plus field cache for 250MM is around 6GB, and I partition across 3 nodes with replication)
With ~6GB for Solr, how much should I try to dedicate to Cassandra proper?
Anything else to consider with planning (will be running on AWS)?
UPDATED (11/6) - Notes/suggestions from phact
With Cassandra + Solr running together, I will target the prescribed 14GB per node for base operation, moving to 30GB-memory nodes on AWS, leaving 16GB for the OS, Solr index, and Solr field cache.
I added the Solr index size to the numbers above. If the suggested target is to keep most/all of the index in memory, it seems I might need to target AT LEAST 8 nodes to start, with 30GB of memory per node (rough arithmetic below).
That seems like a good amount of extra overhead on the Solr nodes just to keep the index in memory; I might have to reconsider the approach.
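Rough arithmetic behind that node count (my own back-of-the-envelope, using the extrapolated figures above):

    # How many nodes does it take to keep the Solr index in OS page cache?
    total_index_gb = 106        # extrapolated Solr index size for 250MM records
    node_ram_gb = 30            # target AWS node memory
    dse_heap_gb = 14            # prescribed heap (Cassandra + Solr share one JVM)
    page_cache_gb = node_ram_gb - dse_heap_gb  # ~16 GB per node left for page cache

    print(f"~{total_index_gb / page_cache_gb:.1f} nodes before replication")  # ~6.6
    # Replication (each node holds more than 1/Nth of the logical index) and headroom
    # for growth push this up, hence targeting at least 8 nodes.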
DSE heap on a Solr node
The recommended heap size for a DSE node running Solr is 14GB. This is because Solr and Cassandra actually run in the same JVM; you don't have to allocate memory for Solr separately.
AWS M3.xl
m3.xl's with 15GB of RAM will be a bit tight with a 14GB heap. However, if your workload is relatively light, you can probably get away with a 12GB heap on your Solr nodes.
OS page cache
You do want to make sure that you are able to at least fit your Solr indexes in OS page cache (memory left over after subtracting out your heap -- assuming this is a dedicated box). Ideally, you will also have room for Cassandra to store some of your frequently read rows in page cache.
A quick and dirty way of figuring out how big your indexes are is checking the size of your index directory on your file system. Make sure to forecast out/extrapolate if you're expecting your data to grow. You can also check the index size for each of your cores as follows:
http://localhost:8983/solr/admin/cores?action=STATUS&memory=true
Note: each node should hold its own index in memory, not the entire cluster's index.
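If you'd rather script that check, here's a small sketch that totals the reported sizes over HTTP (Python with requests; the host/port and the sizeInBytes field come from the standard core STATUS response and may differ in your setup):

    # Hedged sketch: sum the on-disk index size reported by the Solr core admin API.
    import requests

    resp = requests.get(
        "http://localhost:8983/solr/admin/cores",
        params={"action": "STATUS", "memory": "true", "wt": "json"},
    )
    resp.raise_for_status()

    total_bytes = 0
    for core, info in resp.json()["status"].items():
        size = info.get("index", {}).get("sizeInBytes", 0)
        total_bytes += size
        print(f"{core}: {size / 2**30:.2f} GB")

    print(f"total index size: {total_bytes / 2**30:.2f} GB")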
Storage
Yes, you do want to ensure your disks are not over-utilized, or you may face issues during compaction. In theory -- worst-case scenario -- tiered compaction could require up to 50% of your disk to be free. This is not common though; see more details here.
Whenever I try to run Cypher queries in the Neo4j browser 2.0 on large (anywhere from 3 to 10GB) batch-imported datasets, I receive an "Unknown Error." Then the Neo4j server stops responding, and I need to kill it using Task Manager. Before this happens, the server shuts down quickly and easily. I have no such issues with smaller batch-imported datasets.
I work on a Windows 7 64-bit computer, using the Neo4j browser. I have adjusted the .properties file to allow for much larger memory allocations. I have configured my JVM heap to 12g, which should be fine for a 64-bit JDK. I just recently doubled my RAM, which I thought would fix the issue.
My CPU usage is pegged. I have the logs enabled but I don't know where to find them.
I really like the visualization capabilities of the 2.0.4 browser; does anyone know what might be going wrong?
Your query is taking a long time, and the web browser interface reports "Unknown Error" after a certain timeout period. The query is still running, but you won't see the results in the browser. This drove me nuts too when it first happened to me. If you run the query in the neo4j shell you can verify whether or not this is the problem, because the shell won't time out.
Once this timeout occurs, you can find that the whole system becomes quite non-responsive, especially if you re-run the query, because now you have two extremely long queries running in parallel!
Depending on the type of query, you may be able to improve performance. Sometimes it's as simple as limiting the number of returned nodes (in cases where you only need to find one node or path).
Hope this helps.
Grace and peace,
Jim
I am running some very large databases (500 MB and 300 MB) in my application on several different machines.
From a hardware perspective, the machines have been identically configured.
I am using SQL Server CE 4.0 as my DBMS.
The performance critical query has been indexed to improve its performance.
The problem is that on [only] one of the machines, I am observing egregiously slow query performance. This usually happens after a long period of inactivity (from a query perspective). After I do several (about 7-8) queries, the slow performance disappears.
The weird thing is that this initial slow query performance does not happen on the other machine.
The only difference between the two machines is the data contained inside the databases.
I suspect that the distribution of data on the slow machine is somehow reducing the effectiveness of the indexing and that SQL Server CE has to rebalance the indexing in a much more significant way than on the other faster machine.
One thing I notice is that when the query is very slow, the disk activity increases significantly and the process corresponding to reading the database shows a spike in the read bytes.
This does not happen on the other machine.
Does anyone know how I might go about root causing this issue?
My code is written in C++ and uses the ATL/OLEDB API to manipulate the database.
UPDATE: My performance profiling indicates that it's not the query itself that is slow -- it is the processing of the returned rowset that takes a while. For each row returned, I query another database for related data. I understand that this is not the right way to do it, but the performance problem only happens on one machine. One thing I noticed is that when I have other unrelated queries running against the same database in other threads, the unrelated queries will stall the query that is exhibiting the performance problem.
I use Rails + Tire + Elasticsearch. Everything is mostly working very well, but from time to time my server starts to get very slow, so I have to restart the Elasticsearch service and then everything is fine again.
I have the impression that it happens after bulk inserts (around 6000 products). Could that be linked? The inserts take 2 minutes max, but the server still has problems afterwards.
EDIT:
It turns out it is not linked to bulk inserts after all.
I only have this line in the log:
[2013-06-29 01:15:32,767][WARN ][monitor.jvm ] [Jon Spectre] [gc][ParNew][26438][9941] duration [3.4s], collections [1]/[5.2s], total [3.4s]/[57.7s], memory [951.6mb]->[713.7mb]/[989.8mb], all_pools {[Code Cache] [10.6mb]->[10.6mb]/[48mb]}{[Par Eden Space] [241.1mb]->[31mb]/[273mb]}{[Par Survivor Space] [32.2mb]->[0b]/[34.1mb]}{[CMS Old Gen] [678.3mb]->[682.6mb]/[682.6mb]}{[CMS Perm Gen] [35mb]->[35mb]/[166mb]}
Does someone understand this?
This is just stabbing in the dark, but from what you report, there might be a bad memory setting for your Java virtual machine.
ElasticSearch is built with Java and so runs on a JVM. Each JVM process has a defined amount of memory allocated when you start it up. When available memory runs low, the JVM has to do garbage collection to free up space (and if it still can't allocate enough, it crashes). When you run a Java process at its memory limit, it is occupied with a lot of GC runs and gets very slow.
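For instance, reading the GC line you posted (a rough interpretation; the numbers are copied straight from the log):

    # Numbers taken from the [gc][ParNew] log line above.
    heap_before_mb, heap_after_mb, heap_max_mb = 951.6, 713.7, 989.8
    old_gen_used_mb, old_gen_max_mb = 682.6, 682.6

    print(f"heap before GC: {heap_before_mb / heap_max_mb:.0%} of the ~1 GB max")  # ~96%
    print(f"CMS Old Gen:    {old_gen_used_mb / old_gen_max_mb:.0%} full")          # 100%

A 3.4-second ParNew pause with the old generation completely full suggests the JVM is running right at its (roughly 1 GB) heap limit and spending a lot of time collecting, so a larger heap, or less data per node, would be the first thing I'd look at.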
You can have a look at the Java JMX management console to see what the process is doing and how much memory it has.