Neo4j use only one core in Cypher query running - neo4j

When i run a Cypher query in UI, only one core in server is going up and the query gets stuck or responds very slow.
I use Neo4j 3.0.7 Community.
Someone have idea what i can tune for using all cores?

A single Cypher query is limited to a single thread. See this tweet from late 2015 by Stefan Armbruster:
A cypher statement is (in most cases) one transaction and therefore only on one thread.
If your query is slow, you can use various tricks for optimizing it: this blog post is a good starting point.

Related

Neo4j performance in server mode

I am learning neo4j. I am accessing neo4j via REST api(s) supported by the server mode. CRUD operations are implemented using neo4jOperations. For experimentation , I have benchmarked its read operations but I have found that methods : 'query' and 'queryForObjects' are taking huge execution time, although I am querying via a field which is indexed. Traversals are not complex.
I have : around 500K+ nodes, 900K+ relationships.
neo4j version : 3.0.8.
Is there any solution to improve the performance of query on neo4j in server mode?
Without looking at your actual queries and model it is hard to say why the performance would not be up to your expectations. Try to run the queries through the Neo4j browser and either EXPLAIN or PROFILE them, that may give you a hint of where the issue is.
Having said that, you really should move to version 3.2.1 and access the server over the bolt:/ protocol. That by itself should already significantly improve things.
Regards,
Tom

setting cypher planner for a query

I'm trying to understand cypher planner, and I'm not sure of few things.
should I ever change it, or let the Cypher engine to control it?
what is the difference between the COST and the RULE planner?
Since every version of neo4j may tweak the planners, the only way to know for sure which planner works better for a specific query and a specific neo4j version would be to use PROFILE and performance testing. Also, since the plan generated by the COST planner depends on the actual characteristics of your data, you may also want to periodically test query performance with both planners even when you do not upgrade to a newer neo4j version.
This neo4j blog entry provides some details on the planners.

Modifying Cypher Query Engine

I would like to modify the way Cypher processes queries sent to it for pattern matching. I have read about Execution plans and how Cypher chooses the best plan with the least number of operations and all. This is pretty good. However I am looking into implementing a Similarity Search feature that allows you to specify a Query graph that would be matched if not exact, close (similar). I have seen a few examples of this in theory. I would like to implement something of this sort for Neo4j. Which I am guessing would require a change in how the Query Engine deals with queries sent to it. Or Worse :)
Here are some links that demonstrate the idea
http://www.cs.cmu.edu/~dchau/graphite/graphite.pdf
http://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper72.pdf
I am looking for ideas. Anything at all in relation to the topic would be helpful. Thanks in advance
(:I)<-[:NEEDING_HELP_FROM]-(:YOU)
From my point of view, better for you is to create Unmanaged Extensions.
Because you can create you own custom functionality into Neo4j server.
You are not able to extend Cypher Language without your own fork of source code.

General Cypher performance

After taking part in a very interesting tutorial with a focus on Cypher, I was pleasantly surprised by the declarativeness of the Cypher query language. It's a very natural way of retrieving data from Neo4J in my opinion.
Before that, I had only used the native API. And while that is less declarative, you sort of get used to it after a while. The complex constructions are all very similar and vary only in the details for my specific project.
Still, Cypher looked more natural to me and so I am contemplating on building the second version of my application with mainly Cypher queries to interact with my database. But I encountered an issue.
There are numerous ways to convert my application into Cypher and after having tried several possible queries, all with the desired result, it appears even the fastest query is still about 20 times slower than the native API.
Now, I don't mind giving up some performance for declarativeness, but times 20 is a little bit to much for me in an application that's already struggling with performance. Is there a workaround for this issue, or do I just have to stick with the native API?
Your conclusion sounds very familiar to me. I've also had performance issues when I used Neo4j and Spring Data Neo4j together. In the parts where performance really mattered, I switched to the core Traversal API which right now is significantly faster than an average Cypher query. This has a lot to do with the fact that there's no processing of a query and the fact that you control every aspect of the traversal. Cypher can only guess what the most optimal strategy is. I'm convinced that it will gain speed in the (near) future, but if performance really matters, I'd say stick with the core API.
Also, If you would be using java and spring data neo4j, consider using the advanced mapping mode (AspectJ) which is a lot faster than the simple mapping mode.

Neo4j - Cypher vs Gremlin query language

I'm starting to develop with Neo4j using the REST API.
I saw that there are two options for performing complex queries - Cypher (Neo4j's query language) and Gremlin (the general purpose graph query/traversal language).
Here's what I want to know - is there any query or operation that can be done by using Gremlin and can't be done with Cypher? or vice versa?
Cypher seems much more clear to me than Gremlin, and in general it seems that the guys in Neo4j are going with Cypher.
But - if Cypher is limited compared to Gremlin - I would really like to know that in advance.
For general querying, Cypher is enough and is probably faster. The advantage of Gremlin over Cypher is when you get into high level traversing. In Gremlin, you can better define the exact traversal pattern (or your own algorithms) whereas in Cypher the engine tries to find the best traversing solution itself.
I personally use Cypher because of its simplicity and, to date, I have not had any situations where I had to use Gremlin (except working with Gremlin graphML import/export functions). I expect, however, that even if i would need to use Gremlin, I would do so for a specific query I would find on the net and never come back to again.
You can always learn Cypher really fast (in days) and then continue with the (longer-run) general Gremlin.
We have to traverse thousands of nodes in our queries. Cypher was slow. Neo4j team told us that implementing our algorithm directly against the Java API would be 100-200 times faster. We did so and got easily factor 60 out of it. As of now we have no single Cypher query in our system due to lack of confidence. Easy Cypher queries are easy to write in Java, complex queries won't perform. The problem is when you have multiple conditions in your query there is no way in Cypher to tell in which order to perform the traversals. So your cypher query may go wild into the graph in a wrong direction first.
I have not done much with Gremlin, but I could imagine you get much more execution control with Gremlin.
The Neo4j team's efforts on Cypher have been really impressive, and it's come a long way. The Neo team typically pushes people toward it, and as Cypher matures, Gremlin will probably get less attention. Cypher is a good long-term choice.
That said- Gremlin is a Groovy DSL. Using it through its Neo4j REST endpoint allows full, unfettered access to the underlying Neo4j Java API. It (and other script plugins in the same category) cannot be matched in terms of low-level power. Plus, you can run Cypher from within the Gremlin plugin.
Either way, there's a sane upgrade path where you learn both. I'd go with the one that gets you up and running faster. In my projects, I typically use Gremlin and then call Cypher (from within Gremlin or not) when I need tabular results or expressive pattern matching- both are a pain in the Gremlin DSL.
I initially started using Gremlin. However, at the time, the REST interface was a little unstable, so I switched to Cypher. It has much better support for Neo4j. However, there are some types of queries that are simply not possible with Cypher, or where Cypher can't quite optimize the way you can with Gremlin.
Gremlin is built over Groovy, so you can actually use it as a generic way to get Neo4j to execute 'Java' code and perform various tasks from the server, without having to take the HTTP hit from the REST interface. Among others, Gremlin will let you modify data.
However, when all I want is to query data, I go with Cypher as it is more readable and easier to maintain. Gremlin is the fallback when a limitation is reached.
Gremlin queries can be generated programmatically.
(See http://docs.sqlalchemy.org/en/rel_0_7/core/tutorial.html#intro-to-generative-selects to know what I mean.)
This seems to be a bit more tricky with Cypher.
Cypher only works for simple queries. When you start incorporating complex business logic into your graph traversals it becomes prohibitively slow or stops working altogether.
Neo4J clearly knows that Cypher isn't cutting it, because they also provide the APOC procedures which include an alternate path expander (apoc.path.expand, apoc.path.subgraphAll, etc).
Gremlin is harder to learn but it's more powerful than Cypher and APOC. You can implement any logic you can think of in Gremlin.
I really wish Neo4J shipped with a toggleable Gremlin server (from reading around, this used to be the case). You can get Gremlin running against a live Neo4J instance, but it involves jumping through a lot of hoops. My hope is that since Neo4J's competitors are allowing Gremlin as an option, Neo4J will follow suit.
Cypher is a declarative query language for querying graph databases. The term declarative is important because is a different way of programming than programming paradigms like imperative.
In a declarative query language like Cypher and SQL we tell the underlying engine what data we want to fetch and we do not specify how we want the data to be fetched.
In Cypher a user defines a sub graph of interest in the MATCH clause. Then underlying engine runs a pattern matching algorithm to search for the similar occurrences of sub graph in the graph database.
Gremlin is both declarative and imperative features. It is a graph traversal language where a user has to give explicit instructions as to how the graph is to be navigated.
The difference between these languages in this case is that in Cypher we can use a Kleene star operator to find paths between any two given nodes in a graph database. In Gremlin however we will have to explicitly define all such paths. But we can use a repeat operator in Gremlin to find multiple occurrences of such explicit paths in a graph database. However, doing iterations over explicit structures in not possible in Cypher.
If you use gremlin, then it allow you to migrate the to different graph databases,
Since most of the graph databases supports the gremlin traversal, Its good idea to chose the gremlin.
Long answer short : Use cypher for query and gremlin for traversal. You will see the response timing yourself.

Resources