Neo4j performance in server mode - neo4j

I am learning Neo4j and accessing it through the REST API exposed by server mode. CRUD operations are implemented using neo4jOperations. As an experiment, I benchmarked the read operations and found that the methods 'query' and 'queryForObjects' take a very long time to execute, even though I am querying on an indexed field. The traversals are not complex.
I have around 500K+ nodes and 900K+ relationships.
Neo4j version: 3.0.8.
Is there any way to improve query performance on Neo4j in server mode?

Without looking at your actual queries and model it is hard to say why the performance would not be up to your expectations. Try to run the queries through the Neo4j browser and either EXPLAIN or PROFILE them, that may give you a hint of where the issue is.
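As a rough illustration of the EXPLAIN/PROFILE suggestion, the sketch below only builds the query text you would paste into the Neo4j browser. The `Person` label and `email` property are made-up placeholders, not taken from the question. EXPLAIN shows the plan without running the query, while PROFILE runs it and reports rows and db hits per operator. If the plan shows a label scan (NodeByLabelScan) where you expected an index lookup (NodeIndexSeek), the index is probably not being used.

```java
/** Hypothetical helper: prefixes a Cypher query for plan inspection. */
class QueryDiagnostics {

    /** EXPLAIN returns the execution plan without executing the query. */
    static String explain(String cypher) {
        return "EXPLAIN " + cypher.trim();
    }

    /** PROFILE executes the query and annotates the plan with rows and db hits. */
    static String profile(String cypher) {
        return "PROFILE " + cypher.trim();
    }

    public static void main(String[] args) {
        // Placeholder query; substitute the label and property you actually index.
        String query = "MATCH (p:Person {email: 'alice@example.com'}) RETURN p";
        System.out.println(explain(query));
        System.out.println(profile(query));
    }
}
```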
Having said that, you really should move to version 3.2.1 and access the server over the bolt:// protocol. That by itself should already improve things significantly.
Regards,
Tom

Related

Neo4j Restful VS Neo4j JDBC

What are the comparative advantages of querying a Neo4j DB via the REST API, via JDBC, or as a Spring Data plugin?
Performance will be better within Java using JDBC as opposed to a REST API. Here's a good explanation of why:
When you add complexity the code will run slower. Introducing a REST
service if it's not required will slow the execution down as the
system is doing more.
Abstracting the database is good practice. If you're worried about
speed you could look into caching the data in memory so that the
database doesn't need to be touched to handle the request.
Before optimizing performance though I'd look into what problem you're
trying to solve and the architecture you're using, I'm struggling to
think of a situation where the database options would be direct access
vs REST.
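The in-memory caching idea from the quote above could look roughly like the following. This is a minimal sketch, not a recommendation of a particular cache: `ReadThroughCache` is a hypothetical name, and the `loader` function stands in for whatever database call (REST, JDBC, or otherwise) the application really makes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Small LRU read-through cache; the loader stands in for a real database call. */
class ReadThroughCache<K, V> {
    private final int capacity;
    private final Function<K, V> loader;
    private final LinkedHashMap<K, V> entries;
    private int loads = 0;

    ReadThroughCache(int capacity, Function<K, V> loader) {
        this.capacity = capacity;
        this.loader = loader;
        // accessOrder = true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > ReadThroughCache.this.capacity;
            }
        };
    }

    /** Returns the cached value, or loads it (one "database" call) on a miss. */
    synchronized V get(K key) {
        V cached = entries.get(key);
        if (cached != null) {
            return cached;
        }
        V loaded = loader.apply(key);
        loads++;
        if (loaded != null) {
            entries.put(key, loaded);
        }
        return loaded;
    }

    /** Number of times the loader was invoked, i.e. cache misses. */
    synchronized int loadCount() {
        return loads;
    }
}
```

With a capacity of 2, reading keys a, b, a, c triggers three loads and evicts b, since a was touched more recently.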
Regarding using neo4j as a plugin you can certainly do so, but I have to imagine the performance would not be as good as using JDBC.
From the book "Graph Databases" (Ian Robinson et al.):
Queries run fastest when the portions of the graph needed to satisfy
them reside in main memory (that is, in the filesystem cache and the
object cache). A single graph database instance today can hold many
billions of nodes, relationships, and properties, meaning that some
graphs will be just too big to fit into main memory.
Every layer you add to the app costs some performance, so the more directly you can consume your data, the better the performance. The trade-off is that the code becomes more complex and harder to understand.

graph database revision control

GitHub for Neo4J?
I'm evaluating graph databases as a possible solution for modeling a complex computer network. It occurs to me that something like a revision control system would be useful for planning and testing updates to the database. I had been assuming that we would instantiate a test network graph for such planning and then write a routine to sync the changes.
I see that this question has been asked and answered for relational databases (How do you maintain revision control of your database structure?). But I'm asking for graph databases, probably Neo4J.
In that relational thread someone pitches the Rails approach of making rollback a required element of database development. I like this idea too; I'm not sure how easy it is in graph databases.
How is this handled in the real world?
I found your question while also searching for an answer, so I don't have tested solutions to offer. But I can share that there's some discussion of this in the question "How do I implement revisions with neo4j?", including a specific case in "Neo4j / Strategy to keep history of node changes".
There's also a more detailed blog post at http://iansrobinson.com/2014/05/13/time-based-versioned-graphs/, which weighs the read-time / write-time / storage requirements of several alternatives. It also includes a number of diagrams and example queries that helped me wrap my head around what all this would look like.
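To make the interval idea a bit more concrete, here is a small in-memory sketch, loosely in the spirit of time-based versioning: each version of a value carries a validity interval, and the current version stays open until "end of time". Everything here is an assumption for illustration (the `VersionedValue` class, the `from`/`to` field names, the use of `Long.MAX_VALUE` as the open end); it is not the blog post's code and it does not touch Neo4j at all. In a graph, the interval would typically live on properties of nodes or relationships.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical interval-versioned value; values are assumed to be non-null. */
class VersionedValue<T> {
    static final long END_OF_TIME = Long.MAX_VALUE;

    private static final class Version<T> {
        final T value;
        final long from; // inclusive
        long to;         // exclusive; END_OF_TIME while this version is current

        Version(T value, long from, long to) {
            this.value = value;
            this.from = from;
            this.to = to;
        }
    }

    private final List<Version<T>> versions = new ArrayList<>();

    /** Closes the current version at the given timestamp and opens a new one. */
    void set(T value, long timestamp) {
        if (!versions.isEmpty()) {
            Version<T> current = versions.get(versions.size() - 1);
            if (timestamp <= current.from) {
                throw new IllegalArgumentException("timestamps must increase");
            }
            current.to = timestamp;
        }
        versions.add(new Version<>(value, timestamp, END_OF_TIME));
    }

    /** Returns the value that was valid at the given timestamp, if any. */
    Optional<T> at(long timestamp) {
        for (Version<T> v : versions) {
            if (v.from <= timestamp && timestamp < v.to) {
                return Optional.of(v.value);
            }
        }
        return Optional.empty();
    }

    /** Drops the newest version and reopens the previous one. */
    boolean rollback() {
        if (versions.isEmpty()) {
            return false;
        }
        versions.remove(versions.size() - 1);
        if (!versions.isEmpty()) {
            versions.get(versions.size() - 1).to = END_OF_TIME;
        }
        return true;
    }
}
```

The cost of this style is visible even in the toy version: every read has to filter by timestamp, which is the read-time versus write-time trade-off the post discusses.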
Hope that's still useful, lo these months later, and sorry I can't be of more help! If you've found something that works in the meantime, can you let us know?

Persisting data to neo4j stand alone server

I'm currently doing some R&D on moving some business functionality from an Oracle RDBMS to Neo4j to reduce join complexity in the application queries. Due to the maintenance and visibility requirements for the data, I believe the standalone server is the best option.
My thought is that within a java program I would pull the relevant data out of the Oracle tables, map it to a node object and persist it to neo4j (creating the appropriate relationships in the process).
I'm curious: with SDN over REST not being an optimal solution, what options are available for persistence? Are server plugins or unmanaged extensions the preferred method, or am I overcomplicating the issue, as tends to happen from time to time?
Thank you!
REST refers to a way to query the data over a network, not a way to store the data. Typically, you're going to store the data on some machine; you then have the option of either making it accessible via RESTful services with the Neo4j server, or just using Java applications to access the data.
I assume that by SDN you mean Spring Data Neo4j. Spring is a framework for Java applications, and SDN is a plugin, if you will, for Spring that allows Java programmers to store models in Neo4j. One could indeed use Spring Data Neo4j to read data in and then store it in Neo4j, but again this is a method of getting the data into Neo4j; it's not storage by itself.
The storage model is pretty much always the same. This link describes aspects of how storage actually happens.
Now -- to your larger business objective. In order to do this with neo4j, you're going to need to take a look at your oracle data and decide how it is best modeled as a graph. There's a big difference between an oracle RDBMS and Neo4J in terms of how the data is represented. Once you've settled on a graph design, you can then load your data into neo4j (many different options for doing that).
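One of the many loading options is to send rows in batches as a query parameter and let a single Cypher statement create nodes and relationships from them. The sketch below only prepares the statement text and the parameter maps; it does not talk to Oracle or Neo4j. All names are assumptions for illustration: the `EMP_ID`, `NAME` and `DEPT_ID` columns, the `Employee` and `Department` labels, and the `WORKS_IN` relationship. It also assumes a server version that accepts the `$rows` parameter syntax; older releases wrote parameters as `{rows}`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that turns relational rows into batched Cypher parameters. */
class GraphLoadBatcher {

    // One statement per batch: UNWIND expands the list, MERGE avoids duplicates.
    static final String LOAD_STATEMENT =
            "UNWIND $rows AS row\n"
            + "MERGE (e:Employee {id: row.id})\n"
            + "SET e.name = row.name\n"
            + "MERGE (d:Department {id: row.deptId})\n"
            + "MERGE (e)-[:WORKS_IN]->(d)";

    /** Maps one relational row (column name to value) to the shape the statement expects. */
    static Map<String, Object> toGraphRow(Map<String, Object> relationalRow) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", relationalRow.get("EMP_ID"));
        row.put("name", relationalRow.get("NAME"));
        row.put("deptId", relationalRow.get("DEPT_ID"));
        return row;
    }

    /** Splits the rows into parameter maps of at most batchSize rows each. */
    static List<Map<String, Object>> toBatches(List<Map<String, Object>> relationalRows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Map<String, Object>> batches = new ArrayList<>();
        List<Map<String, Object>> current = new ArrayList<>();
        for (Map<String, Object> relationalRow : relationalRows) {
            current.add(toGraphRow(relationalRow));
            if (current.size() == batchSize) {
                batches.add(asParameters(current));
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(asParameters(current));
        }
        return batches;
    }

    private static Map<String, Object> asParameters(List<Map<String, Object>> rows) {
        Map<String, Object> parameters = new HashMap<>();
        parameters.put("rows", rows);
        return parameters;
    }
}
```

Each returned map would be passed, together with `LOAD_STATEMENT`, to whichever client you end up using. Batching keeps the number of round trips and transactions small, which tends to matter more than the choice of client.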
Will all of this "reduce join complexity in the application queries"? Well, yes, in the sense that Neo4j doesn't do joins. Will it improve the speed/performance of your application? There's just no way to tell. The answer to that depends on what your app is, what the queries are, how you model the data as a graph, and how you express the resulting queries over that graph.

General Cypher performance

After taking part in a very interesting tutorial with a focus on Cypher, I was pleasantly surprised by the declarativeness of the Cypher query language. It's a very natural way of retrieving data from Neo4J in my opinion.
Before that, I had only used the native API. And while that is less declarative, you sort of get used to it after a while. The complex constructions are all very similar and vary only in the details for my specific project.
Still, Cypher looked more natural to me, so I am contemplating building the second version of my application mainly with Cypher queries to interact with my database. But I encountered an issue.
There are numerous ways to convert my application into Cypher and after having tried several possible queries, all with the desired result, it appears even the fastest query is still about 20 times slower than the native API.
Now, I don't mind giving up some performance for declarativeness, but a factor of 20 is a bit too much for me in an application that's already struggling with performance. Is there a workaround for this issue, or do I just have to stick with the native API?
Your conclusion sounds very familiar to me. I've also had performance issues when I used Neo4j and Spring Data Neo4j together. In the parts where performance really mattered, I switched to the core Traversal API which right now is significantly faster than an average Cypher query. This has a lot to do with the fact that there's no processing of a query and the fact that you control every aspect of the traversal. Cypher can only guess what the most optimal strategy is. I'm convinced that it will gain speed in the (near) future, but if performance really matters, I'd say stick with the core API.
Also, if you are using Java and Spring Data Neo4j, consider using the advanced mapping mode (AspectJ), which is a lot faster than the simple mapping mode.

What are the advantages of using Spring Data neo4j over just using neo4j directly?

I am brand new to NoSQL databases (or any kind of database) and I need to build a graph database in Java. I have never used SpringSource before either. Will using Spring Data Neo4j make the process of creating a graph database easier, or will it complicate things? Should I just try to work with Neo4j directly?
Thank you very much.
It depends on your use-case. SDN is a good fit when you are already working in a Spring Environment and have a rich domain model which you want to map in the graph.
SDN is a good fit in all cases where you mostly work with result sets of a few hundred or a few thousand POJOs that have to interact with existing libraries, UI layers, or other application parts that deal with POJOs.
If you're not working in a Spring environment, it is up to you; it adds some complexity in setup and dependencies. There are also other solutions, like jo4neo or Tinkerpop Frames, that work on top of Neo4j.
SDN is slower than the native Neo4j API because of the indirection it introduces.
For the highest performance, you can always fall back onto the Neo4j API.
In general the Core API is fastest; a good middle ground is the Cypher query language, which is very expressive.

Resources