In Neo4j, should all nodes connect to node 0 so that you can create a traversal that spans all objects? Is that a performance problem with large datasets? If so, how many nodes is too many? Is it OK not to connect nodes to node 0 if I don't see a use case for it now, assuming I use indexes to find specific nodes?
There is no requirement to connect everything to the root node. Indexes work well for finding the starting points of a traversal. If you have, say, fewer than 5,000 nodes connected to a starting node (like the root node), then a relationship scan is cheaper than an index lookup.
To judge what is better, you need to know a bit more about the domain.
In my case there is 1 Category node and 2 Template nodes in total. I put a * in [*] to support further scenarios later. But why are there so many db hits for this Cypher query with the current data?
It's probably the * in the relationship part of your query that's doing it.
While you've got only one Category node and two Template nodes, you've asked Neo4j to hop through any number of relationships to get from one to the other and not given it any help to narrow down the search besides specifying the starting node.
For example, if your Category was connected to 100,000 other nodes (of any label, not just Template) you've forced Neo4j to jump through every single one of them looking to see if there's a path to a Template node - and if those nodes have their own connections then they all need to be explored too, because the depth of the traversal isn't constrained.
If you know how Category and Template nodes can be connected in the ways you're interested in (for example, if there's only ever a specific set of relationship types you want to traverse), then specifying those in the pattern will radically improve the performance of the query. Equally, reducing the maximum length of the path will help.
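To make the cost difference concrete, here is a minimal pure-Python sketch (not Neo4j code) that models the graph as an adjacency list and counts how many relationships a breadth-first expansion has to follow. The function name `reachable` and the relationship types `HAS_TEMPLATE`, `TAGGED`, and `LINKS_TO` are hypothetical, chosen only for illustration. It is also a simplification: Cypher's `[*]` enumerates paths rather than just visiting each node once, so the real unbounded cost can be higher than this model suggests.

```python
from collections import deque


def reachable(graph, start, rel_types=None, max_depth=None):
    """Breadth-first expansion from `start`.

    graph: dict mapping node -> list of (rel_type, neighbour) pairs.
    rel_types: optional set of relationship types that may be followed.
    max_depth: optional maximum number of hops from `start`.
    Returns (set of visited nodes, number of relationships followed).
    """
    visited = {start}
    followed = 0
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the hop limit
        for rel_type, neighbour in graph.get(node, []):
            if rel_types is not None and rel_type not in rel_types:
                continue  # skip relationship types we did not ask for
            followed += 1
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, depth + 1))
    return visited, followed


# Toy data: one category, two templates, and 1000 unrelated neighbours
# that each lead one hop further.
graph = {"category": [("HAS_TEMPLATE", "t1"), ("HAS_TEMPLATE", "t2")]}
for i in range(1000):
    graph["category"].append(("TAGGED", f"item{i}"))
    graph[f"item{i}"] = [("LINKS_TO", f"page{i}")]

# Unconstrained expansion, loosely analogous to [*].
unbounded, cost_all = reachable(graph, "category")

# Typed, one-hop expansion, loosely analogous to [:HAS_TEMPLATE*..1].
bounded, cost_typed = reachable(graph, "category", {"HAS_TEMPLATE"}, 1)
```

In this toy graph the unconstrained expansion follows 2002 relationships, while the typed, depth-limited one follows 2 and still finds both templates.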
neo4j nodes and relationships
This is quite a tough job. I'm trying to find all nodes with two-way relationships starting from a specific node. Based on the image above, I would like to find all two-way relationships starting from node 1. Only nodes with two-way relationships match. For example, nodes 1, 3, 4 match and nodes 1, 2, 3 match as two separate groups. However, if nodes 2 and 4 have a two-way relationship, then nodes 1, 2, 3, 4 match as one group. The main idea is that all nodes in such a group are linked both ways. My idea is to find all nodes with two-way relationships starting from 1 and continue processing from there, but I'm not able to continue. Can anyone help me with this problem? Thanks a lot. By the way, only the largest 'two-way circle' is needed.
Your problem looks a lot like finding strongly connected components in the graph. As defined in the docs:
A directed graph is strongly connected if there is a path between all
pairs of vertices (nodes). This algorithm treats the graph as directed, so
the direction of the relationship is important and a strongly connected
component exists only if there are relationships between nodes in both
directions.
Check out more in the documentation. You will need neo4j-graph-algorithms.
Example query that writes each node's component back to the node as a property:
CALL algo.scc('Label','C', {write:true,partitionProperty:'partition'})
YIELD loadMillis, computeMillis, writeMillis, setCount, maxSetSize, minSetSize
And then you can find your biggest component with the following query.
MATCH (u:Label)
RETURN u.partition AS partition, count(*) AS size_of_partition
ORDER BY size_of_partition DESC LIMIT 1
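For intuition about what a strongly connected component is, here is a minimal pure-Python sketch (not the neo4j-graph-algorithms implementation). It relies on the fact that the component containing a given node is the set of nodes reachable from it intersected with the set of nodes that can reach it. The function names and the edge list are hypothetical. Note that this groups nodes by mutual reachability along paths, which may be looser than requiring a direct two-way relationship between every pair, so it is worth checking the result against your data.

```python
def reach(adjacency, start):
    """Return every node reachable from `start` by following `adjacency`."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def component_of(edges, start):
    """Strongly connected component containing `start`.

    edges: iterable of directed (source, target) pairs.
    """
    forward, backward = {}, {}
    for source, target in edges:
        forward.setdefault(source, set()).add(target)
        backward.setdefault(target, set()).add(source)
    # Nodes that `start` can reach AND that can reach `start`.
    return reach(forward, start) & reach(backward, start)


# Hypothetical data: nodes 1, 2, 3 are linked both ways;
# 3 -> 4 -> 5 is one-way only.
edges = [(1, 2), (2, 1), (1, 3), (3, 1), (2, 3), (3, 2), (3, 4), (4, 5)]
group = component_of(edges, 1)  # {1, 2, 3}
```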
I have a science graph in Neo4j which has the names of some scientists as nodes, connected to nodes holding laws by the relationship has_discovered. The laws are in turn related to their applications by the relationship has_application. I am new to Cypher. I want to know what Cypher query will give me the level 1 and level 2 nodes of the scientist nodes. Here level 1 is the nodes holding laws and level 2 is the nodes holding their applications.
This query should probably take care of it, assuming your labels are :Scientist, :Law, and :Application.
MATCH (sci:Scientist)-[:has_discovered]->(law:Law)-[:has_application]->(app:Application)
RETURN sci, law, app
As long as your :has_discovered and :has_application relationships only connect those types of nodes, you can leave off the :Law and :Application labels (but you'll want to keep the :Scientist label so you begin your pattern match only at :Scientist nodes).
You can use COLLECT() as necessary to group results if you want.
So, I've created a Neo4j graph database out of a relational database. The graph database has about 7 million nodes and about 9 million relationships between them.
I now want to find all nodes that are not connected to nodes with a certain label (let's call them unconnected nodes). For example, I have nodes with the labels "Customer" and "Order" (let's call them top-level nodes). I want to find all nodes that have no relationship from or to these top-level nodes. The relationship doesn't have to be direct; the nodes can be connected to the top-level nodes via other nodes.
I have a cypher query which would solve this problem:
MATCH (a) WHERE not ((a)-[*]-(:Customer)) AND not ((a)-[*]-(:Order)) RETURN a;
As you can imagine, the query takes a long time to execute and the performance is bad, most likely because the relationships are undirected and the path can pass through any number of nodes. However, the relationship directions don't matter, and I need to make sure that there is no path from a node to any of the top-level nodes.
Is there any way to find the unconnected nodes faster? Note that the database is really big, and there are more than 2 labels that mark top-level nodes.
You could try this approach, which does involve more operations, but can be run in batches for better performance (see apoc.periodic.commit() in the APOC procedures library).
The idea is to first apply a label (say, :Unconnected) to all nodes in your graph (batch execute with apoc.periodic.commit), and then, taking batches of top level nodes with that label, matching to all nodes in the subgraphs extending from them and removing that label.
When you finally have run out of top level nodes with the :Unconnected label (meaning all top level nodes and their subgraphs no longer have this label) then the only nodes remaining in your graph with the :Unconnected label are not connected to your top level nodes.
Any approach to this kind of operation will likely be slow, but the advantage again is that you can process this in batches, and if you get interrupted, you can resume. Once your queries are done, all the relevant unconnected nodes are now labeled for further processing at your convenience.
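The label-and-clear idea described above can be sketched in plain Python on an in-memory graph. This is only a model of the logic, not APOC or Neo4j code: the set `unconnected` stands in for the :Unconnected label, and the function name and sample data are hypothetical. Because a node is only expanded while it still carries the mark, each node is processed at most once, no matter how many top-level nodes there are.

```python
from collections import deque


def find_unconnected(nodes, edges, top_level):
    """Return the nodes with no path (in either direction) to a top-level node.

    nodes: iterable of node ids.
    edges: iterable of (a, b) pairs, treated as undirected.
    top_level: iterable of node ids that play the role of :Customer / :Order.
    """
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Step 1: mark every node as unconnected.
    unconnected = set(adjacency)

    # Step 2: starting from each still-marked top-level node,
    # clear the mark from everything in its subgraph.
    for start in top_level:
        if start not in unconnected:
            continue  # already cleared by an earlier expansion
        unconnected.discard(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour in unconnected:
                    unconnected.discard(neighbour)
                    queue.append(neighbour)

    # Step 3: whatever is still marked has no path to a top-level node.
    return unconnected


# Hypothetical data: c1 and o1 are top-level; z and w form an isolated pair.
nodes = ["c1", "o1", "x", "y", "z", "w"]
edges = [("c1", "x"), ("x", "y"), ("o1", "y"), ("z", "w")]
top_level = ["c1", "o1"]
leftover = find_unconnected(nodes, edges, top_level)  # {"z", "w"}
```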
Also, one last note: in Neo4j, undirected relationships are written with no arrows in the pattern syntax, as in ()-[*]-().
MATCH (a)
WHERE
not (a:Customer OR a:Order)
AND shortestPath((a)-[*]-(:Customer)) IS NULL
AND shortestPath((a)-[*]-(:Order)) IS NULL
RETURN a;
If you could add relationship types, it would be faster.
One further optimization could be to check the nodes of a :Customer path for an :Order node and vice versa, i.e.
NONE(n in nodes(path) WHERE n:Order)
In general, this is probably better treated as a set operation, i.e.
expand around all order and customer nodes in parallel into two sets
and compute the overlap between the two sets.
Then remove the overlap from the total number of nodes.
I added an issue for APOC to add such a function or procedure:
https://github.com/neo4j-contrib/neo4j-apoc-procedures/issues/223
I am writing my master's thesis using the Neo4j database and I've run into a problem. I need your help.
The picture at left is the data I saved in Neo4j. The whole picture represents how an application could be deployed in the cloud. Every node represents a service.
For example, I have an Apache module that can be "hosted_on" an Apache server. The green line represents a possible option, because an Apache server can be hosted on a Windows system or a Linux system.
So there are two possibilities for deployment, shown at right.
What is at right is what I want. I call it a topology; it defines what an application deployment looks like.
What I want is to retrieve all possible topologies.
How can I get these possible topologies with Cypher or the Java traversal API?
Thanks very much.
I'm not sure if this is what you are getting at, but it might be helpful to consider the "What is related and how?" query:
// What is related, and how
MATCH (a)-[r]->(b)
WHERE labels(a) <> [] AND labels(b) <> []
RETURN DISTINCT head(labels(a)) AS This, type(r) as To, head(labels(b)) AS That
LIMIT 10
This will return Node labels and relationship names that are connected by at least one relationship in the graph. Is that what you mean by topology?