Neo4j node creation speed

I have a fresh neo4j setup on my laptop, and creating new nodes via the REST API seems to be quite slow (~30-40 ms average). I've Googled around a bit, but can't find any real benchmarks for how long it "should" take; there's this post, but that only lists relative performance, not absolute performance. Is neo4j inherently limited to only adding ~30 new nodes per second (outside of batch mode), or is there something wrong with my configuration?
Config details:
Neo4j version 2.2.5
The server is on my mid-range 2014 laptop, running Ubuntu 15.04
OpenJDK version 1.8
Calls to the server are also from my laptop (via localhost:7474), so there shouldn't be any network latency involved
I'm calling neo4j via Clojure/Neocons; the method used is "create" in the clojurewerkz.neocons.rest.nodes namespace
Using Cypher seems to be even slower; e.g. calling PROFILE CREATE (you:Person {name:'Jane Doe'}) RETURN you via the web interface returns "Cypher version: CYPHER 2.2, planner: RULE. 5 total db hits in 54 ms."

Neo4j performance characteristics are a tricky area.
Measuring performance
First of all: it all depends a lot on how the server is configured. Measuring anything on a laptop is the wrong way to do it.
Before measuring performance, you should check the following:
You have appropriate server hardware (requirements)
Client and server are on a local network.
Neo4j is properly configured (memory mapping, webserver thread pool, Java heap size, etc.)
The server is properly configured (Linux TCP stack, maximum open files, etc.)
The server is warmed up. Neo4j is written in Java, so you should do an appropriate warm-up before measuring numbers (i.e. apply some load for ~15 minutes).
And the last one: the Enterprise edition. Neo4j Enterprise has some advanced features that can improve performance a lot (e.g. the HPC cache).
Neo4j internally
Internally, Neo4j consists of:
Storage
Core API
Traversal API
Cypher API
Everything is performed without any additional network requests. The Neo4j server is built on top of this solid foundation.
So, when you make a request to the Neo4j server, you are measuring:
Latency between client and server
JSON serialization costs
Web server (Jetty)
Additional modules for managing locks, transactions, etc.
And Neo4j itself
So, the bottom line here: Neo4j is pretty fast by itself when used in embedded mode, but dealing with the Neo4j server involves additional costs.
Numbers
We ran internal Neo4j testing and measured several cases.
Create nodes
Here we are using the vanilla transactional Cypher REST API.
Threads: 2
Nodes per transaction: 1000
Execution time: 1635
Total nodes created: 7000000
Nodes per second: 7070

Threads: 5
Nodes per transaction: 750
Execution time: 852
Total nodes created: 7000000
Nodes per second: 8215
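For comparison, here is a minimal sketch of that pattern in Python. This is not the harness we used; the endpoint and {param} syntax are the documented Neo4j 2.x transactional API, while the label and property names are made up:

import time

import requests

# Neo4j 2.x transactional Cypher endpoint (commit per request).
URL = "http://localhost:7474/db/data/transaction/commit"
BATCH = 1000      # nodes per transaction, as in the runs above
TOTAL = 100000    # smaller than the 7M runs, enough to get a rate

start = time.time()
for _ in range(TOTAL // BATCH):
    payload = {"statements": [{
        # One parameterized statement creates BATCH nodes per round trip.
        "statement": "UNWIND {batch} AS props CREATE (n:Person) SET n = props",
        "parameters": {"batch": [{"name": "node-%d" % i} for i in range(BATCH)]},
    }]}
    requests.post(URL, json=payload).raise_for_status()

elapsed = time.time() - start
print("%d nodes in %.1fs -> %.0f nodes/sec" % (TOTAL, elapsed, TOTAL / elapsed))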
Huge database sync
This one uses a custom-developed unmanaged extension, with a binary protocol between server and client and some concurrency.
But it is still the Neo4j server (in fact, a Neo4j cluster).
Node count: 80.32M (80 320 000)
Relationship count: 80.30M (80 300 000)
Property count: 257.78M (257 780 000)
Consumed time: 2142 seconds
Per second:
Nodes - 37497
Relationships - 37488
Properties - 120345
These numbers show Neo4j's true power.
My numbers
I tried to measure performance just now.
Fresh, unconfigured database (2.2.5), Ubuntu 14.04 (VM).
Results:
$ ab -p post_loc.txt -T application/json -c 1 -n 10000 http://localhost:7474/db/data/node
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software: Jetty(9.2.4.v20141103)
Server Hostname: localhost
Server Port: 7474
Document Path: /db/data/node
Document Length: 1245 bytes
Concurrency Level: 1
Time taken for tests: 14.082 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 14910000 bytes
Total body sent: 1460000
HTML transferred: 12450000 bytes
Requests per second: 710.13 [#/sec] (mean)
Time per request: 1.408 [ms] (mean)
Time per request: 1.408 [ms] (mean, across all concurrent requests)
Transfer rate: 1033.99 [Kbytes/sec] received
101.25 kb/s sent
1135.24 kb/s total
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0      19
Processing:     1    1   1.3      1      53
Waiting:        0    1   1.2      1      53
Total:          1    1   1.3      1      54
Percentage of the requests served within a certain time (ms)
50% 1
66% 1
75% 1
80% 1
90% 2
95% 2
98% 3
99% 4
100% 54 (longest request)
This creates 10000 nodes via the REST API (no properties, one thread; -n is the request count, -c 1 means no concurrency).
As you can see, even on my laptop, in a Linux VM, with default settings, Neo4j is able to create nodes in 4 ms or less for 99% of requests.
Note: I warmed the database up beforehand (created and deleted 100K nodes).
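For reference, each ab request above amounts to something like the following (a hedged single-request sketch; the JSON payload is an assumption, since the actual post_loc.txt isn't shown):

import time

import requests

t0 = time.time()
# Legacy REST endpoint: POST a JSON map of properties to create one node.
r = requests.post("http://localhost:7474/db/data/node", json={"name": "Jane Doe"})
r.raise_for_status()
print("HTTP %d in %.1f ms" % (r.status_code, (time.time() - t0) * 1000))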
Bolt
If you are looking for the best Neo4j performance, you should follow Bolt development. This is a new binary protocol for the Neo4j server.
More info: here, here and here.

One other thing to try is running ./bin/neo4j-shell. Since there's no HTTP connection, it can help you understand how much time is Neo4j itself and how much is the HTTP interface.
When I do that on 2.2.2, my CREATEs are generally around 10 ms.
I'm not sure what the ideal is, or whether there is configuration that can improve the performance.

Related

What is the best way to performance test an SQS consumer to find the max TPS that one host can handle?

I have an SQS consumer running in EventConsumerService that needs to handle up to 3K TPS successfully, sometimes upwards of 20K TPS (or 1.2 million messages per minute). For each message processed, I make a REST call to DataService's TCP VIP. I'm trying to run a load test to find the max TPS that one host can handle in EventConsumerService without overstraining:
Request volume on dependencies, DynamoDB storage, etc
CPU utilization in both EventConsumerService and DataService
Network connections per host
IO stats due to overlogging
DLQ size must be minimal; currently I am seeing my DLQ grow to 500K messages due to 500 Service Unavailable exceptions thrown from DataService, so something must be wrong.
Approximate age of oldest message. I do not want a message sitting in the queue for over X minutes.
Fatals and latency of the REST call to DataService
Active threads
This is how I am performing the performance test:
I set up both my consumer and the other service on one host, the reason being that I want to understand the per-host load on both services.
I use a TPS generator to fill the SQS queue with a million messages
The EventConsumerService is already running in production. Once messages started filling the SQS queue, I could immediately see requests being sent to DataService.
Here are the parameters I am tuning to find messagesPolledPerSecond:
messagesPolledPerSecond = (numberOfHosts * numberOfPollers * messageFetchSize) * (1000/(sleepTimeBetweenPollsPerMs+receiveMessageTimePerMs))
messagesInSurge / messagesPolledPerSecond = ageOfOldestMessageSLA
ageOfOldestMessage + settingsUpdatedLatency < latencySLA
The variables for SqsConsumer which I kept constant are:
numberOfHosts = 1
receiveMessageTimePerMs = ~60 ms (it's out of my control)
Max thread pool size: 300
The other factors are all in play (see the worked example after this list):
Number of pollers (default 1), I set to 150
Sleep time between polls (default 100 ms), I set to 0 ms
Sleep time when no messages (default 1000 ms), ???
message fetch size (default 1), I set to 10
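Plugging those values into the formula above gives the polled rate (a worked check with the numbers as given, nothing new):

number_of_hosts = 1
number_of_pollers = 150
message_fetch_size = 10
sleep_time_between_polls_ms = 0
receive_message_time_ms = 60

messages_polled_per_second = (
    number_of_hosts * number_of_pollers * message_fetch_size
) * (1000 / (sleep_time_between_polls_ms + receive_message_time_ms))
print(messages_polled_per_second)  # 25000.0 -- roughly 8x the 3K TPS target

So with these settings the consumer is configured to pull around 25K messages per second, which would explain DataService being overwhelmed.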
However, with the above parameters I am seeing a high number of messages sent to the DLQ due to server errors, so clearly I have set the values too high. This testing methodology seems highly inefficient: I am unable to find the optimal TPS that avoids both the tremendous number of messages sent to the DLQ and the high approximate age of the oldest message.
Any guidance on how best to test is appreciated. It'd be very helpful if we could set up a time to chat; PM me directly.

Spark JobServer, memory settings for release

I've set up a spark-jobserver to enable complex queries on a reduced dataset.
The jobserver executes two operations:
Sync with the main remote database: it dumps some of the server's tables, reduces and aggregates the data, saves the result as a Parquet file, and caches it as a SQL table in memory. This operation will be done every day;
Queries: when the sync operation is finished, users can perform complex SQL queries on the aggregated dataset, (eventually) exporting the result as a CSV file. Each user can run only one query at a time and must wait for its completion.
The biggest table (before and after the reduction, which also includes some joins) has almost 30M rows, with at least 30 fields.
Currently I'm working on a dev machine with 32 GB of RAM dedicated to the job server, and everything runs smoothly. The problem is that on the production machine the same amount of RAM is shared with a PredictionIO server.
How should I determine the memory configuration to avoid memory leaks or crashes in Spark?
I'm new to this, so any reference or suggestion is welcome.
Thank you
Take an example: if you have a server with 32 GB of RAM, set the following parameter:
spark.executor.memory = 32g
But take note:
The likely first impulse would be to use --num-executors 6 --executor-cores 15 --executor-memory 63G. However, this is the wrong approach because:
63GB + the executor memory overhead won't fit within the 63GB capacity of the NodeManagers. The application master will take up a core on one of the nodes, meaning that there won't be room for a 15-core executor on that node. 15 cores per executor can lead to bad HDFS I/O throughput.
A better option would be to use --num-executors 17 --executor-cores 5 --executor-memory 19G. Why?
This config results in three executors on all nodes except for the one with the AM, which will have two executors. --executor-memory was derived as (63/3 executors per node) = 21. 21 * 0.07 = 1.47. 21 - 1.47 ~ 19.
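Spelled out, the sizing arithmetic from that quote (same numbers, just made explicit):

node_ram_gb = 63          # memory capacity of each NodeManager
executors_per_node = 3    # three executors per node, per the quote
per_executor_gb = node_ram_gb / executors_per_node   # 21.0
overhead_gb = per_executor_gb * 0.07                 # ~1.47 (~7% memory overhead)
print(int(per_executor_gb - overhead_gb))            # 19 -> --executor-memory 19G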
This is explained here if you want to know more :
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
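For what it's worth, a hedged sketch of setting a chosen value through PySpark's SparkConf; with spark-jobserver the same keys normally go in its .conf file, and the 24g here is a placeholder to leave headroom, not a recommendation:

from pyspark import SparkConf, SparkContext

conf = (
    SparkConf()
    .setAppName("jobserver-sizing-sketch")
    # Leave room for executor memory overhead and, on the production box,
    # for the co-hosted PredictionIO server.
    .set("spark.executor.memory", "24g")
)
sc = SparkContext(conf=conf)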

Disconnected from Neo4j. Please check if the cord is unplugged

I am running simple queries on neo4j 2.1.7
I am trying to execute this query:
MATCH (a:Caller)-[:MADE_CALL]-(c:Call)-[:RECEIVED_CALL]-(b:Receiver) CREATE(a)-[:CALLED]->(b) RETURN a,b
While the query is executing, I get the following error:
Disconnected from Neo4j. Please check if the cord is unplugged.
Then another error:
GC overhead limit exceeded
I'm working on Windows Server 2012 with 16 GB of RAM, and here is my neo4j.properties file:
neostore.nodestore.db.mapped_memory=1800M
neostore.relationshipstore.db.mapped_memory=1G
#neostore.relationshipgroupstore.db.mapped_memory=10M
neostore.propertystore.db.mapped_memory=500M
neostore.propertystore.db.strings.mapped_memory=250M
neostore.propertystore.db.arrays.mapped_memory=10M
cache_type=weak
keep_logical_logs=100M size
and my neo4j-community.vmoptions file:
-Xmx8192
-Xms4098
-Xmn1G
-include-options ${APPDATA}\Neo4j Community\neo4j-community.vmoptions
I have 6 128 644 Nodes, 6 506 355 Relationships and 10 488 435 properties
Any solution?
TL;DR: Neo4j disconnected because your query is too inefficient. The solution is to improve the query.
Your Neo4j instance appears to have timed out and undergone a GC dump due to the computational intensity of your query. When you initialize the Neo4j database using the bash shell, you have the option of configuring certain JVM variables, including the amount of memory and heap size available to Neo4j. Should a query exceed these computational limits, Neo4j automatically terminates the query, undergoes a GC dump, and disconnects.
Looking at the information you gave on the database, there are 6M nodes with 6.5M relationships. Considering that your query essentially looks for all pathways from Callers to Receivers across those 6M nodes, then tries to perform bulk write operations, it's not surprising that Neo4j crashes or disconnects. I would suggest finding a way to limit the query (even with a simple LIMIT keyword) and running multiple smaller queries to get the job done, as sketched below.
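A hedged sketch of that batching idea against the 2.x transactional Cypher endpoint (the WHERE NOT clause keeps each batch from re-matching pairs that are already linked; the batch size of 10000 is a guess to tune against your heap):

import requests

URL = "http://localhost:7474/db/data/transaction/commit"
CYPHER = (
    "MATCH (a:Caller)-[:MADE_CALL]-(c:Call)-[:RECEIVED_CALL]-(b:Receiver) "
    "WHERE NOT (a)-[:CALLED]->(b) "
    "WITH DISTINCT a, b LIMIT {batch} "
    "CREATE (a)-[:CALLED]->(b) "
    "RETURN count(*)"
)

while True:
    resp = requests.post(URL, json={"statements": [
        {"statement": CYPHER, "parameters": {"batch": 10000}}
    ]})
    resp.raise_for_status()
    created = resp.json()["results"][0]["data"][0]["row"][0]
    if created == 0:
        break  # nothing left to link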

First MongoDB query ultra slow on Linode

When I start my Rails application and open a page which needs to query my MongoDB database, then there is the following problem:
on my local machine it takes about 1600 ms to perform the queries and render everything
on my Linode it takes about 4 min to perform the first query and render everything
After that, everything is faster: caching, pages load instantly, etc.
But really, 4 minutes? Why is that? Is it MongoDB loading data from disk into memory? Why does it take so much longer than on my local machine?
Is this due to the hard drive being shared on Linode? I noticed a lot of activity when running iostat:
$ iostat -d 2
Linux 3.12.6-x86_64-linode36 (linode)  01/31/2014  _x86_64_  (8 CPU)

Device:            tps    kB_read/s    kB_wrtn/s     kB_read    kB_wrtn
xvda           1129.69     43026.47        17.62  1940251345     794504
xvdb            248.43      2572.50       698.08   116005452   31479356

Device:            tps    kB_read/s    kB_wrtn/s     kB_read    kB_wrtn
xvda           4491.50    179012.00         0.00      358024          0
xvdb              0.00         0.00         0.00           0          0
It's my understanding that Mongo loads data from disk into memory as it is queried, so it's likely that you're experiencing the slow performance during that first load. It may make sense to hit the db with several queries to warm it up before you enable your application.
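A hedged sketch of that warm-up idea with pymongo (the database name is a placeholder; a full find() scan forces each collection's documents through memory once):

from pymongo import MongoClient

db = MongoClient("localhost", 27017)["myapp_production"]  # hypothetical name

for name in db.list_collection_names():
    for _doc in db[name].find():  # touch every document once
        pass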

Is there a monitoring tool like xentop that will track historical data?

I'd like to view historical data for guest cpu/memory/IO usage, rather than just current usage.
There is a Perl program I have written that does this. See link text.
It also supports logging to a URL.
Features:
perl xenstat.pl -- generate cpu stats every 5 secs
perl xenstat.pl 10 -- generate cpu stats every 10 secs
perl xenstat.pl 5 2 -- generate cpu stats every 5 secs, 2 samples
perl xenstat.pl d 3 -- generate disk stats every 3 secs
perl xenstat.pl n 3 -- generate network stats every 3 secs
perl xenstat.pl a 5 -- generate cpu avail (e.g. cpu idle) stats every 5 secs
perl xenstat.pl 3 1 http://server/log.php -- gather 3 secs cpu stats and send to URL
perl xenstat.pl d 4 1 http://server/log.php -- gather 4 secs disk stats and send to URL
perl xenstat.pl n 5 1 http://server/log.php -- gather 5 secs network stats and send to URL
Sample output:
[server~]# xenstat 5
cpus=2
40_falcon 2.67% 2.51 cpu hrs in 1.96 days ( 2 vcpu, 2048 M)
52_python 0.24% 747.57 cpu secs in 1.79 days ( 2 vcpu, 1500 M)
54_garuda_0 0.44% 2252.32 cpu secs in 2.96 days ( 2 vcpu, 750 M)
Dom-0 2.24% 9.24 cpu hrs in 8.59 days ( 2 vcpu, 564 M)
40_falc 52_pyth 54_garu Dom-0 Idle
2009-10-02 19:31:20 0.1 0.1 82.5 17.3 0.0 *****
2009-10-02 19:31:25 0.1 0.1 64.0 9.3 26.5 ****
2009-10-02 19:31:30 0.1 0.0 50.0 49.9 0.0 *****
Try Nagios, or Munin.
Xentop is a tool to monitor the domains (VMs) running under Xen. VMware's ESX has a similar tool (I believe it's called esxtop).
The problem is that you'd like to see the historical CPU/Mem usage for domains on your Xen system, correct?
As with all virtualization layers, there are two views of this information relevant to admins: the burden imposed by the domain on the host, and what the domain thinks its own process load is. If the domain thinks it is running low on resources but the host is not, it is easy to allocate more resources to the domain from the host. If the host runs out of resources, you'll need to optimize or turn off some of the domains.
Unfortunately, I don't know of any free tools to do this. XenSource provides a rich XML-RPC API to control and monitor their systems. You could easily build something from that.
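If you only need rough history, one option is to skip the XML-RPC API and sample xentop itself. A hedged sketch that reads xentop's batch output periodically and appends rows to a CSV (the -b/-i flags are xentop's batch-mode options; the column layout varies by Xen version):

import csv
import subprocess
import time

LOG = "xen_history.csv"

while True:
    # One batch-mode iteration prints a table of per-domain stats.
    out = subprocess.run(["xentop", "-b", "-i", "1"],
                         capture_output=True, text=True).stdout
    ts = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(LOG, "a", newline="") as f:
        writer = csv.writer(f)
        for line in out.splitlines():
            if line.strip() and not line.lstrip().startswith("NAME"):
                writer.writerow([ts] + line.split())  # layout is version-dependent
    time.sleep(5)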
If you only care about the domain-view of its own resources, I'm sure there are plenty of monitoring tools already available that fit your need.
As a disclaimer, I should mention that the company I work for, Leostream, builds virtualization management software. Unfortunately, it does not really do utilization monitoring.
Hope this helps.
Both Nagios and Munin seem to have plugins/support for Xen data collection.
A Xen Virtual Machine Monitor Plugin for Nagios
munin plugins
