Monitoring Cypher query performance in Neo4j

I am using Neo4jClient to connect to my Neo4j database and execute Cypher queries. My goal is to check the performance of the queries I send to the database. The problem is that I have to measure it on the database side, so I can't use a Stopwatch in .NET, and the queries have to be executed through Neo4jClient. I don't need execution times for specific queries; an average over the last 1000 queries, for example, would be enough.
I can only use Neo4j Community Edition.
Thanks in advance!

Neo4j Enterprise Edition can log queries that take longer than a given threshold; see the config settings containing "querylog" at http://neo4j.com/docs/stable/configuration-settings.html.
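For reference, this is roughly what those settings look like in neo4j.properties on a 2.2+ Enterprise install; the threshold and file name below are illustrative values, not defaults:

    # log every query that runs longer than the threshold
    dbms.querylog.enabled=true
    dbms.querylog.threshold=500ms
    dbms.querylog.filename=data/log/queries.log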

Related

Parsing of CE SQL to DB SQL

I support FileNet applications and generally focus on performance improvement techniques. We often face issues related to query optimization. Generally we get the queries from the DBA, and these are DB SQL statements fired at the database level. From the application code, however, we pass CE SQL, not DB SQL. I am aware that the CE parses CE SQL into the underlying DB SQL. I am trying to figure out whether, given the DB SQL, I can get the corresponding CE SQL that was fired, for example a code or script into which I enter the CE SQL and the corresponding DB SQL gets generated. I would appreciate any pointers on this, as I am really stuck.
You need to enable trace logging for the DB subsystem. This is done through the Trace Control tab of the Domain configuration in ACCE. Then you will be able to see the database queries in p8_server_trace.log.
For convenience you might want to enable tracing for the SRCH subsystem as well; then the original and generated queries will appear side by side.
Detailed info on Trace Logging is available in the FileNet P8 documentation.
The way to capture CE SQL queries is to turn on auditing for the object class you are interested in and select Query Event as the event type. Every time a query is performed, an event object is created. This object has a property called QueryText, which contains the CE query that was performed. You could use the creation time or other information in the query to match it to your database query.
The query events can be inspected in ACCE or accessed programmatically via the API class com.filenet.api.events.QueryEvent, as sketched below.
Be aware that on a busy system a lot of query events can be generated!
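A minimal sketch of the programmatic route, assuming an already-connected ObjectStore; the page size, the CE SQL statement, and the get_QueryText()/get_DateCreated() accessors follow the usual FileNet getter convention but should be verified against your P8 version:

    import com.filenet.api.collection.IndependentObjectSet;
    import com.filenet.api.core.ObjectStore;
    import com.filenet.api.events.QueryEvent;
    import com.filenet.api.query.SearchSQL;
    import com.filenet.api.query.SearchScope;
    import java.util.Iterator;

    public class QueryEventDump {
        // Print the CE SQL captured by Query Event auditing, newest first.
        public static void printRecentQueries(ObjectStore os) {
            SearchSQL sql = new SearchSQL(
                "SELECT QueryText, DateCreated FROM QueryEvent ORDER BY DateCreated DESC");
            SearchScope scope = new SearchScope(os);
            // Page size 100 is arbitrary; a null filter fetches default properties.
            IndependentObjectSet events = scope.fetchObjects(sql, 100, null, Boolean.TRUE);
            for (Iterator<?> it = events.iterator(); it.hasNext(); ) {
                QueryEvent ev = (QueryEvent) it.next();
                System.out.println(ev.get_DateCreated() + " : " + ev.get_QueryText());
            }
        }
    }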

Bottleneck node taking a long time to return

We're on Neo4j 2.1.4, soon to upgrade to 2.2.1.
We've been experiencing slowdowns with certain Cypher queries, and I think they are mostly centered around two or three nodes out of the millions in the graph. These nodes were created with the intent of having some monitoring in place to check the availability of the graph. I've since found out that a few apps actually exercise these queries before performing their write operations on the graph. Then I found out that our load balancer was set up to run tests through multiple apps that end up querying the same nodes. So we have a large mix of applications that are all either reading or updating these same nodes, which has resulted in those two nodes taking anywhere from 8 to 40 seconds to be returned.
Is there any way to determine how many updates and how many queries are being issued against one node?
Since Neo4j 2.2 there's a config option to log queries taking longer than a given threshold, see the dbms.querylog.XXXX settings in http://neo4j.com/docs/stable/configuration-settings.html.
To get an update count for a given node you could set up a custom TransactionEventHandler that tracks write accesses to the nodes in question.
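A minimal sketch of such a handler against the embedded Neo4j 2.x API; the watched node id is a placeholder, and counting only assigned node properties is a simplification (you might also want createdNodes(), deletedNodes(), and relationship changes):

    import java.util.concurrent.atomic.AtomicLong;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.event.PropertyEntry;
    import org.neo4j.graphdb.event.TransactionData;
    import org.neo4j.graphdb.event.TransactionEventHandler;

    public class NodeWriteCounter extends TransactionEventHandler.Adapter<Void> {
        private static final long WATCHED_NODE_ID = 42L; // placeholder: your hot node's id
        private final AtomicLong writes = new AtomicLong();

        @Override
        public Void beforeCommit(TransactionData data) {
            // Count property writes that touch the watched node.
            for (PropertyEntry<Node> entry : data.assignedNodeProperties()) {
                if (entry.entity().getId() == WATCHED_NODE_ID) {
                    writes.incrementAndGet();
                }
            }
            return null;
        }

        public long writeCount() {
            return writes.get();
        }

        public static NodeWriteCounter register(GraphDatabaseService db) {
            NodeWriteCounter handler = new NodeWriteCounter();
            db.registerTransactionEventHandler(handler);
            return handler;
        }
    }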

Neo4j performance difference in using shell and API

I understand that Neo4j supports different options to run the Cypher queries. The web browser, neo4j shell and the REST API.
Is there a difference in performance when using the shell and the API?
I'm working on a dataset that has around 10 million objects (nodes + edges).
Thanks!
The web browser uses the REST API in the backend, while the shell connects directly to Neo4j.
So yes, you will see performance differences, and the shell will generally be faster. However, the shell can end up slower than calling the REST API from your application, because the shell does not let you pass parameters.
In your application, passing parameters allows the execution plan to be cached (after warm-up).
Also, if you have bad indexes and bad queries, running them against a 10-million-object dataset will perform poorly in the shell, in the browser, and in your application alike.
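The caching point is easiest to show with the embedded Java API, though the same principle applies over REST; this is a sketch for Neo4j 2.x, with the database path, label, and property names made up:

    import java.util.Collections;
    import java.util.Map;
    import org.neo4j.cypher.javacompat.ExecutionEngine;
    import org.neo4j.cypher.javacompat.ExecutionResult;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;

    public class ParameterizedQueryDemo {
        public static void main(String[] args) {
            GraphDatabaseService db =
                new GraphDatabaseFactory().newEmbeddedDatabase("data/graph.db");
            ExecutionEngine engine = new ExecutionEngine(db);
            try (Transaction tx = db.beginTx()) {
                // Parameterized: the plan for this query string is compiled once
                // and reused for every later value of {name}.
                Map<String, Object> params =
                    Collections.<String, Object>singletonMap("name", "Alice");
                ExecutionResult result =
                    engine.execute("MATCH (p:Person {name: {name}}) RETURN p", params);
                System.out.println(result.dumpToString());
                tx.success();
            }
            db.shutdown();
        }
    }

By contrast, inlining the literal ("... {name: 'Alice'} ...") makes every distinct value a new query string, so nothing is reused.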

neo4j REST API slow

I am using Neo4j 2.0.0M4 Community Edition with Node.js, using https://github.com/thingdom/node-neo4j to access the Neo4j server over the REST API by passing Cypher queries.
I have observed that the data returned by Neo4j, both from the webadmin and from the REST API, is pretty slow. For example:
a query returning 900 records takes 1.2 s, with subsequent runs taking around 200 ms;
similarly, if the number of records goes up to 27000, the query in the webadmin browser takes 21 s.
I am wondering what is causing the REST API to be so slow, and how to go about improving the performance. Is it:
a) Cypher itself or the JSON parsing, or
b) the HTTP overhead itself, given that a similar query returning 27000 records from MySQL takes 11 ms?
Any help is highly appreciated.
Neo4j 2.0 is currently a milestone build that is not yet performance optimized.
Consider enabling streaming and make sure you use parameterized Cypher.
For large result sets the browser spends a lot of time on rendering. You might try the same query using cURL to see the difference.
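For example, a cURL call against the legacy cypher endpoint of a 2.0-era server (host, query, and parameter values are illustrative); the X-Stream header turns on streaming as suggested above:

    curl -H "X-Stream: true" -H "Content-Type: application/json" \
         -d '{"query": "MATCH (n:Person) WHERE n.name = {name} RETURN n", "params": {"name": "Alice"}}' \
         http://localhost:7474/db/data/cypher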

Migrate Data from Neo4j to SQL

Hi, I am using Neo4j in my application, and my setup is as follows:
I am using the Embedded Graph API.
I have several databases that I point to using a pool that I maintain in my application, e.g. db1, db2, db3, ..., db100.
When I want to access a particular database, I point to it using new EmbeddedGraphDatabase("Path to db(n)").
The problem is that as the connection pool count increases, the RAM consumed by the application keeps growing, and past a certain limit it brings the application down.
So I am thinking of migrating from Neo4j to some other database.
Additionally only a small part of my database is utilizing the graph structure.
One way to migrate is to write a script for it. Is there any better option?
My other question is: which database would best preserve my structure?
Another option I am considering is to keep part of my data in Neo4j and move the other part to some other database.
If anything is unclear I can clarify.
Thanks in advance.
An EmbeddedGraphDatabase instance is not the equivalent of a "connection" in SQL. It's designed to run for a long time (days or months), so starting and stopping it is costly.
What is the use case for having hundreds of separate databases in the same JVM?
Your many small databases will also perform poorly, as the graph database is designed to hold the whole data model on a single host.
Do you run a single JVM per database?
You can control the amount of memory used by Neo4j by setting the appropriate memory-mapping properties, and by using the GCR cache from neo4j-enterprise and tuning its cache-size properties.
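As a sketch, the relevant neo4j.properties entries for that generation of Neo4j look like this; the sizes are illustrative and should be tuned per store file, and cache_type=gcr needs the Enterprise edition:

    # memory-mapped IO per store file
    neostore.nodestore.db.mapped_memory=100M
    neostore.relationshipstore.db.mapped_memory=500M
    neostore.propertystore.db.mapped_memory=200M
    # Enterprise only: GC-resistant object cache
    cache_type=gcr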
I think it still makes sense to keep the graph part in Neo4j and only move the non-graphy part.
