Has anyone done a performance evaluation of the Neo4j native Java APIs, the Traversal API, and Cypher?
Which of these three options will give me the best performance?
Also, for write operations, should I use the native Java APIs or Cypher? Is it possible to batch DB operations in the native APIs so that they hit the DB only once rather than once per node/relationship creation?
You'll be interested in this article. The main takeaway from their tests is:
The Core API is able to answer about 2000 friend of a friend queries
(I have to admit on a very sparse network).
The Traverser framework is
about 25% slower than the Core API
Worst is cypher which is slower at
least one order of magnitude only able to answer about 100 FOAF like
queries per second. I was shocked so I talked with Andres Taylor from
neo4j who is mainly working for cypher. He asked my which neo4j
version I used and I said it was 1.7. He told me I should check out
1.9. since Cypher has become more performant. So I run the benchmarks over neo4j 1.8 and neo4j 1.9 unfortunately Cypher became slower in
newer neo4j releases.
However, I would recommend using Cypher unless you are in a performance-critical situation. (Basically, the harder an API is to work with, the faster it can be. It is up to you to balance development effort against performance.) Also, this data is old, and each major update to Neo4j comes with new tricks the Cypher planner can use to query more efficiently. So Cypher performance will vary based on DB content and Neo4j version (for better or worse).
Also, the Traversal API is built on the Core API, and Cypher is built on the Traversal API, so anything you can do in Cypher can also be done with the other two.
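To make the benchmarked query concrete, here is a minimal sketch of what a friend-of-a-friend lookup computes. It runs over a plain in-memory adjacency map and does not use any Neo4j API; the class name `FoafSketch`, the method `friendsOfFriends`, and the sample data are all hypothetical. Excluding the person and their direct friends from the result is an assumption here, since the quoted article does not say how it defines the query.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is not Neo4j code.
class FoafSketch {

    // Returns everyone reachable in two hops from `person`,
    // excluding the person and their direct friends (an assumption).
    static Set<String> friendsOfFriends(Map<String, List<String>> friends, String person) {
        List<String> direct = friends.getOrDefault(person, List.of());
        Set<String> result = new HashSet<>();
        for (String friend : direct) {
            // Second hop: collect the friends of each direct friend.
            result.addAll(friends.getOrDefault(friend, List.of()));
        }
        result.remove(person);
        result.removeAll(direct);
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> friends = Map.of(
            "alice", List.of("bob", "carol"),
            "bob", List.of("alice", "dave"),
            "carol", List.of("alice", "bob", "erin"),
            "dave", List.of("bob"),
            "erin", List.of("carol"));
        System.out.println(friendsOfFriends(friends, "alice"));
    }
}
```

The Core API, the Traversal API, and Cypher all express this same two-hop expansion; they differ in how much of the loop you write yourself.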
I know that you can write extensions that you can call from Cypher, but I'd really like to avoid having to write Java. I'm thinking something similar to SQL Server stored procedures. Is this possible, or could I maybe write a Cypher query and wrap it in some minimal Java to make the current capabilities work?
Besides @InverseFalcon's answer, there are really no Transact-SQL or PL/SQL-like languages for graphs yet.
The closest language I am aware of is SAP's GraphScript:
GraphScript is a domain-specific, read-only graph query language tailored to serve advanced graph analysis tasks and to ease the specification of custom, complex graph algorithms.
Caveats: it is only available in the SAP HANA Graph product, and, as the quote says, it is read-only. For more details, see presentation slides and paper.
If you would like to avoid Java due to its verbosity but are fine with writing general-purpose code on the JVM, you might want to try the Kotlin language. However, using anything other than Java tends to introduce some integration issues (across all JVM-based applications, not just Neo4j in particular), so be prepared to tackle those. There is an example project on GitHub for Neo4j Kotlin procedures to get you started. Caveats: even though there is basic Kotlin support in the Eclipse IDE, it's not on par with the IntelliJ edition. So you will probably need an IntelliJ license.
If you have access to APOC Procedures, you can use apoc.cypher.run() (or apoc.cypher.doIt() for write queries) to execute a Cypher query passed in as a string.
You can always follow the tutorial for creating your own procedure and have it call the appropriate APOC Cypher run procedure with a hardcoded query.
I have read the Neo4j Java Developer Reference recently, but I didn't see any information about undo/redo/rollback, so I wonder: does Neo4j support these operations?
Neo4j does not support undo/redo, but it does support transactions (so rollbacks are supported).
See the specific documentation for Java, Cypher, the HTTP API, and Bolt.
I'm asking about the differences between the two editions of Neo4j, Community and Enterprise: does the Enterprise Edition have any feature that speeds up queries such as graph traversals?
I'm wondering because, when comparing query execution times between Neo4j Community and MySQL, MySQL gave better results. Link to the discussion:
Neo4j slower than MySQL in performing recursive query
Thanks in advance for any suggestions!
Neo4j Enterprise currently has a few features that make querying faster (e.g. a more scalable page cache and a better lock manager).
For Neo4j 3.2, a faster Cypher runtime will be available in Neo4j Enterprise.
I also put some more feedback and questions into the linked discussion.
As I was searching the web for a Gremlin implementation for Neo4j, I found these two possible solutions:
https://github.com/thinkaurelius/neo4j-gremlin-plugin
http://tinkerpop.incubator.apache.org/docs/3.0.2-incubating/#neo4j-gremlin
Does anybody know what the difference between the two is in practice?
I saw that the first is a Neo4j plugin, while it's not really clear to me what the second is, or whether it would lock the entire database and thus not allow other connections (I noticed that it requires the path to the data folder).
Which one is preferred in the Neo4j community?
Cheers,
Alberto
I'm not sure there's really a difference, as there isn't a direct comparison to be made. The second link is to the TinkerPop project, and specifically to the Neo4j implementation of the TinkerPop APIs. It runs in embedded mode and does not yet have support for HA (though we hope to have that soon). The Neo4j implementation can be run in Gremlin Server, which lets you send Gremlin to it via REST, websockets, and other endpoints.
The project in the first link you provided uses that implementation to allow you to send Gremlin to Neo4j Server - so the first project depends on the second.
Your rule of thumb should be activity in the source code.
neo4j-gremlin-plugin has 3 commits this year - https://github.com/thinkaurelius/neo4j-gremlin-plugin/commits/master
TinkerPop is much more active - https://github.com/apache/incubator-tinkerpop/commits/master/neo4j-gremlin/src/main/java/org/apache/tinkerpop/gremlin/neo4j
neo4j-gremlin-plugin
Extending existing Neo4j server with support for Gremlin Query Language.
TinkerPop Neo4j-Gremlin
Extending Gremlin console with support for Neo4j server.
I'm working on a project (a social network) which uses Neo4j (v1.9) as the underlying datastore, together with Spring Data Neo4j.
I'm trying to add a tag system to the project and I'm searching for ways to efficiently implement tag recommendation using collaborative filtering strategies.
After a lot of research, I've come up with these options:
Cypher. It is the embedded query language used by Neo4j. No other framework is needed, and the computation times may be better than with the other options. It may also be easy to implement the queries using Spring Data Neo4j.
Apache Mahout. It offers machine learning algorithms focused primarily on the areas of collaborative filtering, clustering and classification. However, it isn't designed for graph databases and could potentially be slow.
Apache Giraph. Open source counterpart of Google Pregel.
Apache Spark. It is a fast and general engine for large-scale data processing.
reco4j. It is the best-suited solution I have found so far, but the project seems dead.
Apache Spark GraphX + Mazerunner. Suggested in the answer by @johnymontana. I'm reading up on it. The main issue is that I don't know if it supports collaborative filtering.
Graphaware Reco. Suggested by @ChristopheWillemsen in a comment. From the official site:
is an extensible high-performance recommendation engine skeleton for
Neo4j, allowing for computing and serving real-time as well as
pre-computed recommendations.
However, I haven't yet figured out whether it works with old versions of Neo4j (I can't upgrade the Neo4j version at the moment).
So, what do you suggest and why? Feel free to suggest other interesting frameworks not listed above.
Cypher is very fast when it comes to local traversals, but is not optimized for global graph operations. If you want to do something like compute similarity metrics between all pairs of users then using a graph processing framework (like Apache Spark GraphX) would be better. There is a project called Mazerunner that connects Neo4j and Spark that you might want to take a look at.
For a pure Cypher approach, here and here are a couple of recent blog posts demonstrating Cypher queries for recommendations.
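As a rough illustration of the kind of local, neighbourhood-based collaborative filtering discussed above, here is a minimal sketch in plain Java. It does not use Neo4j, Spring Data Neo4j, or any of the listed frameworks; the class `TagRecommenderSketch`, the method `recommend`, the overlap-count scoring, and the sample data are all assumptions made for illustration. Each tag the user does not yet have is scored by how many tags the user shares with the other users who carry it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; names and scoring are assumptions.
class TagRecommenderSketch {

    // Ranks tags the user lacks by the summed tag overlap with
    // every other user who has that tag. Ties are broken alphabetically.
    static List<String> recommend(Map<String, Set<String>> tagsByUser, String user, int limit) {
        Set<String> own = tagsByUser.getOrDefault(user, Set.of());
        Map<String, Integer> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> other : tagsByUser.entrySet()) {
            if (other.getKey().equals(user)) {
                continue;
            }
            // Similarity to the other user: number of shared tags.
            int overlap = 0;
            for (String tag : other.getValue()) {
                if (own.contains(tag)) {
                    overlap++;
                }
            }
            if (overlap == 0) {
                continue; // No shared tags, so this user contributes nothing.
            }
            for (String tag : other.getValue()) {
                if (!own.contains(tag)) {
                    scores.merge(tag, overlap, Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> tagsByUser = Map.of(
            "u1", Set.of("java", "neo4j"),
            "u2", Set.of("java", "neo4j", "cypher"),
            "u3", Set.of("java", "spring"),
            "u4", Set.of("python"));
        System.out.println(recommend(tagsByUser, "u1", 5));
    }
}
```

Because the computation only touches users who share at least one tag with the target user, it corresponds to a local traversal (user, shared tags, other users, their other tags) rather than an all-pairs similarity job.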