I'm new to Neo4j and would like to know whether it's possible to directly link a node in one graph to one or more nodes in another graph.
I have one core graph with thousands of other graphs. Each core node may link to other graphs, and nodes on that graph may link to other graphs or nodes on other graphs, including nodes on the core graph.
I know I can put all the nodes into one graph, but I would prefer to do it as described above.
Thanks!
Rein
You have only one graph in a single neo4j instance. You can store your "core graph" and all other graphs as one large unconnected network.
I have a large dataset to run a specific Graph Data Science algorithm on.
The functional requirement is that the algorithm will be run often and that the dataset changes in real-time.
As I understand, in order to run an algorithm I have to project the persistent graph into memory first.
However, GDS only projects the dataset once, as a (filtered) snapshot. On each change to my dataset (e.g. a new relationship added between two nodes) I therefore have to rerun the projection, which seems quite inefficient.
Is there a generic way to circumvent this and keep the projection in sync with the persistent graph?
As per #tomaž-bratanič's comment, this isn't possible at the moment.
I'm currently working on a project where we want to store sets of graphs in a graph database. For this purpose I'm using Neo4j, and I can imagine two different solutions.
1. Store each graph's nodes as database nodes and its edges as relationships. Every node and every edge has a property graph_id, which indicates the graph it belongs to (trivial).
2. Create three node labels: Node, Edge and Graph. Every graph node is stored as a Node, every edge as an Edge node, and every node and edge belonging to a graph is connected to a Graph node. (The Neo4j documentation suggests creating labels for the different kinds of elements in the database.)
Which solution would you prefer?
I think you are misreading the docs. They mean that different kinds of nodes should have different labels and different kinds of edges should have different relationship types. If you turn everything, including edges, into nodes, then you will not be able to apply the library's implementations of graph theory algorithms.
Option 1 is the way to go.
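To illustrate option 1, here is a minimal sketch, not taken from the original answer. It uses plain Python lists as a hypothetical stand-in for the database, and the Cypher string is only an assumed way of writing the same selection with graph_id passed as a query parameter.

```python
# Hypothetical in-memory stand-in for the database: every node and every
# edge carries a graph_id property, as in option 1.
NODES = [
    {"id": 1, "graph_id": "g1"},
    {"id": 2, "graph_id": "g1"},
    {"id": 3, "graph_id": "g2"},
    {"id": 4, "graph_id": "g2"},
]
EDGES = [
    {"source": 1, "target": 2, "graph_id": "g1"},
    {"source": 3, "target": 4, "graph_id": "g2"},
]

# Assumed Cypher for a comparable selection. Note that it only returns
# nodes that take part in at least one relationship.
SELECT_GRAPH_QUERY = (
    "MATCH (a {graph_id: $graph_id})-[r {graph_id: $graph_id}]->"
    "(b {graph_id: $graph_id}) RETURN a, r, b"
)


def select_graph(graph_id, nodes=NODES, edges=EDGES):
    """Return the nodes and edges that belong to one stored graph."""
    graph_nodes = [n for n in nodes if n["graph_id"] == graph_id]
    graph_edges = [e for e in edges if e["graph_id"] == graph_id]
    return graph_nodes, graph_edges


g1_nodes, g1_edges = select_graph("g1")
print([n["id"] for n in g1_nodes])                      # [1, 2]
print([(e["source"], e["target"]) for e in g1_edges])   # [(1, 2)]
```

Because nodes stay nodes and edges stay relationships, the stored structure remains an ordinary graph and each stored graph is just a filter on graph_id.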
Initial situation
I have several independent and disconnected graphs, each of which has a hierarchy-like structure with a local root element. Each of these graphs consists of approximately 8 million nodes and 40 million relationships. I have successfully created a three-digit number of Cypher queries, which should now be applied to a single graph only and not to the entirety of all graphs. The graph the queries apply to is specified by its root node.
Challenge to be solved
How can I realize a kind of pseudo multi-client capability for a graph, if all graphs have to remain in a common Neo4j database for reasons of reporting and pattern matching?
Approach to the problem / preliminary result
Start every query with a shortest-path match to the given root element, to select the graph? Cons:
huge performance losses expected
high development costs
Expand each graph with a separate, additional label? Cons:
complex queries, high development effort
For these cases, the usual approach is to add a specific label per tenant/client to all nodes in the subgraph. Whenever you match the relevant nodes in a query, you then also have to check that the nodes you're working with carry the client's label.
As a note for the future, native multi-tenancy support is one of the key features we're working on for the next year.
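As a rough sketch of the label-per-tenant approach (the helper name, the Person label and the Tenant_42 label are hypothetical, not from the answer above): labels are usually written into the query text rather than passed as ordinary parameters, so this example validates them before building the MATCH clause.

```python
import re

# Conservative pattern for label names that are safe to splice into a query.
_LABEL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def match_for_tenant(node_label: str, tenant_label: str) -> str:
    """Build a MATCH clause that only matches nodes carrying the tenant's label.

    Hypothetical helper: both labels are validated because they are
    concatenated into the query text.
    """
    for label in (node_label, tenant_label):
        if not _LABEL_RE.fullmatch(label):
            raise ValueError(f"unsafe label: {label!r}")
    return f"MATCH (n:{node_label}:{tenant_label}) RETURN n"


print(match_for_tenant("Person", "Tenant_42"))
# MATCH (n:Person:Tenant_42) RETURN n
```

Every existing query would need the same extra label on each node pattern it matches, which is where the development effort mentioned in the question comes from.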
I have a directed graph in which some pairs of nodes are connected by multiple edges.
However, I would like each such pair to be visualised with a single edge that has a property specifying the number of edges between the two nodes, and possibly a relative edge thickness.
What query do I have to use to achieve this?
If you are referring to the Neo4j Browser web UI, there is no command to customize the visualization in that way.
In particular, the Cypher query language is only for performing DB operations, and does not have a way to directly affect visualizations.
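The counting itself can be done outside the Browser. Below is a minimal sketch, assuming the edges have been exported as (source, target) pairs. It does not change how the Neo4j Browser draws anything; it only produces one weighted edge per directed node pair that an external visualization tool could use for thickness.

```python
from collections import Counter


def collapse_parallel_edges(edges):
    """Return one (source, target, count) triple per directed node pair."""
    counts = Counter(edges)
    return [(s, t, n) for (s, t), n in sorted(counts.items())]


# Assumed Cypher that would yield comparable counts inside the database:
#   MATCH (a)-[r]->(b) RETURN a, b, count(r) AS edge_count
edges = [("a", "b"), ("a", "b"), ("b", "c"), ("a", "b"), ("b", "a")]
print(collapse_parallel_edges(edges))
# [('a', 'b', 3), ('b', 'a', 1), ('b', 'c', 1)]
```

Direction is preserved, so a -> b and b -> a are counted separately.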
I have created a basic implementation of a high-level client over Neo4j (https://github.com/impetus-opensource/Kundera/tree/trunk/kundera-neo4j) and want to compare its performance with the native Neo4j driver (and maybe Spring Data too). This way I would be able to determine the overhead my library adds on top of the native driver.
I plan to create an extension of YCSB for Neo4J.
My question is: what should be considered the basic unit of object to be written into Neo4j? Should it be a single node, or a couple of nodes joined by an edge?
What is current practice in the Neo4j world? How do people who benchmark Neo4j performance do it?
There's already been some work on benchmarking Neo4j with Gatling: http://maxdemarzi.com/2013/02/14/neo4j-and-gatling-sitting-in-a-tree-performance-t-e-s-t-ing/
You could maybe adapt it.
See graphdb-benchmarks
The project graphdb-benchmarks is a benchmark between popular graph databases. Currently the framework supports Titan, OrientDB, Neo4j and Sparksee. The purpose of this benchmark is to examine the performance of each graph database in terms of execution time. The benchmark is composed of four workloads, Clustering, Massive Insertion, Single Insertion and Query Workload. Every workload has been designed to simulate common operations in graph database systems.
Clustering Workload (CW): CW consists of a well-known community detection algorithm for modularity optimization, the Louvain Method. We adapt the algorithm on top of the benchmarked graph databases and employ cache techniques to take advantage of both graph database capabilities and in-memory execution speed. We measure the time the algorithm needs to converge.
Massive Insertion Workload (MIW): Create the graph database and configure it for massive loading, then we populate it with a particular dataset. We measure the time for the creation of the whole graph.
Single Insertion Workload (SIW): Create the graph database and load it with a particular dataset. Every object insertion (node or edge) is committed directly and the graph is constructed incrementally. We measure the insertion time per block, which consists of one thousand edges and the nodes that appear during the insertion of these edges.
Query Workload (QW): Execute three common queries:
FindNeighbours (FN): finds the neighbours of all nodes.
FindAdjacentNodes (FA): finds the adjacent nodes of all edges.
FindShortestPath (FS): finds the shortest path between the first node and 100 randomly picked nodes.
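To make the "insertion time per block" measurement of the Single Insertion Workload concrete, here is a rough sketch. It is not code from graphdb-benchmarks: the function name is hypothetical and a plain Python dict stands in for the database, so only the shape of the measurement carries over.

```python
import time


def timed_single_insertion(edges, block_size=1000):
    """Insert edges one by one and record the elapsed time per full block.

    A dict of adjacency sets is a hypothetical stand-in for the database.
    A trailing partial block (fewer than block_size edges) is not recorded.
    """
    adjacency = {}
    block_times = []
    block_start = time.perf_counter()
    for i, (source, target) in enumerate(edges, start=1):
        # Create both endpoint nodes on first sight, then add the edge.
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())
        if i % block_size == 0:
            now = time.perf_counter()
            block_times.append(now - block_start)
            block_start = now
    return adjacency, block_times


edges = [(i, i + 1) for i in range(2500)]
graph, times = timed_single_insertion(edges)
print(len(graph), len(times))  # 2501 2
```

With a real database, the body of the loop would be one committed write per edge, and that per-edge write is what a client-library comparison would be measuring.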
One way to performance-test is to use a tool such as http://gatling-tool.org/. There is work underway to create benchmark frameworks at http://ldbc.eu. Otherwise, benchmarking is highly dependent on your domain dataset and the queries you are trying to run. Maybe you could start at https://github.com/neo4j/performance-benchmark and improve on it?