How to query a parent-child tree in Neo4j?

I have a tree and would like to get all nodes at every level. The tree can be of any depth.
node(1)<-[PARENT]-node(2)<-[PARENT]-node(3)<-[PARENT]-node(4)
node(1)<-[PARENT]-node(5)<-[PARENT]-node(6)
node(2)<-[PARENT]-node(7)
node(5)<-[PARENT]-node(8)
node(2)<-[PARENT]-node(9)
so,
node(1) has two children node(2) and node(5)
node(2) has three children node(3),node(7) and node(9)
node(5) has two children node(6) and node(8)
node(3) has one child node(4)
This is an example of the tree. I would like to get all nodes at every level in a separate map. I tried many different Cypher queries but could not figure out a way to do it. I would like to write a single Cypher query for this operation, and any help is appreciated.

I figured out a simple query that keeps track of relationships, but in Java, temple.query() returns Result<Map<String, Object>>, which is awkward because I have to extract the nodes and relationships from that result myself. Here is the query:
MATCH p=(n)<-[r:PARENT*]-(b) RETURN relationships(p);
This returns all relationships in every path. From that list, I have to build up the tree in Java to maintain the parent-children relationships.
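One way to turn such a relationship list into per-level groups on the client side is sketched below. This is a minimal Python sketch, not the asker's Java code. It assumes each relationship has already been reduced to a hypothetical (child, parent) pair of node ids and that the data really is a tree (no cycles). The name levels_from_parent_edges is made up for illustration.

```python
from collections import defaultdict, deque


def levels_from_parent_edges(edges):
    """Group node ids by depth, given (child, parent) pairs."""
    children = defaultdict(list)
    nodes = set()
    has_parent = set()
    # relationships(p) is returned once per path, so the same pair can
    # appear many times; dict.fromkeys drops repeats but keeps order.
    for child, parent in dict.fromkeys(edges):
        children[parent].append(child)
        has_parent.add(child)
        nodes.update((child, parent))

    levels = {}
    # Roots are the nodes that never appear as a child.
    queue = deque((root, 0) for root in sorted(nodes - has_parent))
    while queue:
        node, depth = queue.popleft()
        levels.setdefault(depth, []).append(node)
        for child in children.get(node, []):
            queue.append((child, depth + 1))
    return levels


# (child, parent) pairs taken from the example tree in the question.
EDGES = [(2, 1), (3, 2), (4, 3), (5, 1), (6, 5), (7, 2), (8, 5), (9, 2)]
tree_levels = levels_from_parent_edges(EDGES)
```

For the example tree, level 0 holds node 1, level 1 holds nodes 2 and 5, and so on. The breadth-first walk visits each node once, so the cost is linear in the number of distinct relationships.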

Related

NEO4J - Matching a path where middle node might exist or not

I have the following graph:
I would like to get all contractors, subcontractors, and clients, starting from David.
So I thought of a query like this:
MATCH (a:contractor)-[*0..1]->(b)-[w:works_for]->(c:client) return a,b,c
This would return:
(0:contractor {name:"David"}) (0:contractor {name:"David"}) (56:client {name:"Sarah"})
(0:contractor {name:"David"}) (1:subcontractor {name:"John"}) (56:client {name:"Sarah"})
This returns the desired result. The issue here is performance.
If the DB contains millions of records and I leave (b) without a label, the query will take forever. If I add a label to (b), such as (b:subcontractor), I won't hit millions of rows, but I will only get results with subcontractors:
(0:contractor {name:"David"}) (1:subcontractor {name:"John"}) (56:client {name:"Sarah"})
Is there a more efficient way to do this?
link to graph example: https://console.neo4j.org/r/pry01l
There are some things to consider with your query.
The relationship type is not specified. Is it the case that the only relationships from contractor nodes are works_for and hired? If not, you should constrain the relationship types being matched in your query. For example:
MATCH (a:contractor)-[:works_for|hired*0..1]->(b)-[w:works_for]->(c:client)
RETURN a,b,c
The fact that (b) is unlabelled does not mean that every node in the graph will be matched. (b) is only ever reached by traversal from a :contractor node: via a works_for or hired relationship if you specify those types, via any relationship if you do not, or as the contractor itself in the zero-hop case. In every case it must also have an outgoing works_for relationship to a client.
If you do want to label it, and you have a hierarchy of types, you can assign multiple labels to nodes and just use the most general one in your query. For example, you could have a label such as ExternalStaff as the generic label, and then further add Contractor or SubContractor to distinguish individual nodes. Then you can do something like
MATCH (a:contractor)-[:works_for|hired*0..1]->(b:ExternalStaff)-[w:works_for]->(c:client)
RETURN a,b,c
It really depends on your use case.

Find the nodes in a Neo subgraph

I have a cyclic subgraph. I would like to know all the relationships in that subgraph. I don't know how deep the subgraph is, nor do I want to hardcode any relationship types.
The best thing I have found so far is driven by this snippet.
match(n:X)-[r*]->(m)
From r, I can find what I need. However, even for a small subgraph the cardinality of r* can be 30k or more. There is no point in Neo calculating every path through the subgraph. I really just need the nodes or the individual relationships (preferred).
What is a way to just get the individual relationships in a subgraph? We're using Cypher.
Cypher provides no way to get all the relationships in a subgraph without following the paths. Besides, it has to explore those paths anyway in order to figure out what nodes and relationships belong to the subgraph.
To ensure that you get each relationship in a cyclic subgraph only once, you can do this:
MATCH p=(:Foo)-[*]->()
WITH relationships(p) AS rels
UNWIND rels AS r
RETURN DISTINCT r;
Note, however, that variable-length path queries with no upper bound can be very expensive and may end up running "forever".
Alternate approach
If you can identify all the nodes in the desired subgraph, then there can be a more performant approach.
For example, let's suppose that all the nodes in the desired subgraph (and only those nodes) have the label Foo. In that case, this quick query will return all the relationships in the subgraph:
MATCH (:Foo)-[r]->()
RETURN r;
You can collect all nodes in a connected component with a breadth-first or depth-first search without a filter.
The neo4j REST API has a traversal endpoint which can be used to do exactly that. It's not a Cypher query, but it could solve your problem: http://neo4j.com/docs/stable/rest-api-traverse.html
You can POST something like this against a node; there are options to return only unique nodes. I am not sure, but this might help with a cyclic graph.
{
  "order" : "breadth_first",
  "uniqueness" : "node_global",
  "return_filter" : {
    "language" : "builtin",
    "name" : "all"
  },
  "max_depth" : 20
}
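The idea behind a breadth-first traversal with global node uniqueness can be sketched in plain Python. This is an in-memory analogue only, not a call to the Neo4j REST API. The adjacency dictionary and the name subgraph_relationships are hypothetical, and the graph is assumed to have at most one relationship per ordered node pair. Because every node is expanded at most once, each outgoing relationship is reported once even when the graph contains cycles.

```python
from collections import deque


def subgraph_relationships(adjacency, start, max_depth=20):
    """Breadth-first walk that expands each node once.

    adjacency maps a node to the list of nodes it points to.
    Returns the set of reached nodes and the list of (source, target)
    relationships seen while expanding them.
    """
    seen_nodes = {start}
    edges = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for target in adjacency.get(node, []):
            edges.append((node, target))
            if target not in seen_nodes:
                seen_nodes.add(target)
                queue.append((target, depth + 1))
    return seen_nodes, edges
```

Unlike an unbounded variable-length path match, the work here grows with the number of nodes and relationships rather than with the number of distinct paths.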

Is a DFS Cypher Query possible?

My database contains about 300k nodes and 350k relationships.
My current query is:
start n=node(3) match p=(n)-[r:move*1..2]->(m) where all(r2 in relationships(p) where r2.GameID = STR(id(n))) return m;
The nodes touched in this query are all of the same kind; they are different positions in a game. Each relationship has a property "GameID", which identifies the right relationship to follow when traversing the graph along a path. So if you start at a node and follow the relationships with the right GameID, there won't be another path starting at the first node whose relationships match that GameID.
Some nodes have hundreds of incoming and outgoing relationships; others have only a few.
The problem is that I don't know how to tell Cypher to do this. The above query works for a depth of 1 or 2, but it should use [r:move*] to return the whole path, which is about 20-200 hops.
But if I raise the values, the queries won't finish. I think that Cypher looks at every outgoing relationship at every path depth from the start node, but as I already explained, there is only one right path. So it should do a depth-first search (DFS) rather than a breadth-first search (BFS). Is there a way to do so?
I would consider configuring a relationship index for the GameID property. See http://docs.neo4j.org/chunked/milestone/auto-indexing.html#auto-indexing-config.
Once you have done that, you can try a query like the following (I have not tested this):
START n=node(3), r=relationship:rels(GameID = 3)
MATCH (n)-[r*1..]->(m)
RETURN m;
Such a query would limit the relationships considered by the MATCH clause to just the ones with the GameID you care about. And getting that initial collection of relationships would be fast, because of the indexing.
As an aside: since neo4j reuses its internally generated IDs (for nodes that are deleted), storing those IDs as GameIDs will make your data unreliable (unless you never delete any such nodes). You may want to generate and use your own unique IDs, store them in your nodes, and use them for your GameIDs. If you do this, you should also create a uniqueness constraint for your own IDs; as a nice side effect, this will automatically create an index for your IDs.
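The question's observation that only one outgoing relationship per node carries the right GameID means the traversal is a straight walk rather than a search. A minimal Python sketch of that walk follows. It is an in-memory illustration, not Cypher or a Neo4j API. The moves dictionary and the name follow_game are hypothetical, and the walk stops if a position repeats so that it cannot loop forever.

```python
def follow_game(moves, start, game_id):
    """Follow the single relationship tagged with game_id at each hop.

    moves maps a node to a list of (game_id, target) pairs.
    Returns the visited nodes in order, starting with start.
    """
    path = [start]
    visited = {start}
    node = start
    while True:
        # Pick the one outgoing move that belongs to this game, if any.
        target = next(
            (t for g, t in moves.get(node, []) if g == game_id), None
        )
        if target is None or target in visited:
            return path
        path.append(target)
        visited.add(target)
        node = target
```

Each hop inspects only the outgoing relationships of the current node, so a game of 200 moves costs 200 small lookups no matter how many other games pass through the same positions.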

Cypher: Multiple independent queries in one call

In my Neo4j 2.0 server database I have a forest, i.e. a set of trees. One of my use cases is to get the child nodes of an arbitrary subset of tree nodes.
For instance, I have the root nodes
root1 root2 root3 root4
and now I want the child nodes of root1 and root4. And I need to know which children belong to which root. Each query individually is a simple MATCH Cypher query. But for the sake of performance I would like to keep the amount of database calls low since I use the Neo4j server. Thus I am thinking about a way to tell Cypher "give me the child terms of root1 and root4 and tell me which node belongs to which root in the result". That is, I think of a kind of map. Or a collection of result sets where the first element is the child nodes of the first root, the second element the child nodes of the second root etc.
Is there a way to do this in Cypher or will I have to fall back to a server plugin here?
Thank you and best regards!
Edit:
To clarify: My main concern is that I need to know which children belong to which root. As an example, consider the small graph generated by this command:
create (r1:ROOT {name:"root1"}),
(r2:ROOT {name:"root2"}),
(c11:CHILD {name:"child1_1"}),
(c12:CHILD {name:"child1_2"}),
(c13:CHILD {name:"child1_3"}),
(c21:CHILD {name:"child2_1"}),
(c22:CHILD {name:"child2_2"}),
(c23:CHILD {name:"child2_3"}),
(r1)-[:HAS_CHILD]->(c11),
(r1)-[:HAS_CHILD]->(c12),
(r1)-[:HAS_CHILD]->(c13),
(r2)-[:HAS_CHILD]->(c21),
(r2)-[:HAS_CHILD]->(c22),
(r2)-[:HAS_CHILD]->(c23)
Here, root1 and root2 each have three children.
To get the children of root1 I would issue the following query:
MATCH (r:ROOT)-[:HAS_CHILD]->(c) WHERE r.name='root1' RETURN collect(c)
Now I know the children of root root1.
The question is: what would a query look like that fetches the children of root1 AND root2 and shows which child belongs to which root? Clearly, the query
MATCH (r:ROOT)-[:HAS_CHILD]->(c) WHERE r.name='root1' OR r.name='root2' RETURN collect(c.id)
would give me the children of both roots. But now I would not know which root had which children. So what can I do?
You should give us more details, but a query like this (after adjusting properties and relationships) should work as you want:
MATCH (child) <-[:HAS_CHILD]- (root:ROOT)
WHERE root.name IN ['root1','root4']
RETURN child, root
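Because each result row pairs a child with its root, the map the question asks for can be built on the client in one pass. The Python sketch below assumes the rows have already been reduced to hypothetical (child_name, root_name) tuples. The name children_by_root is made up for illustration and is not part of any Neo4j driver.

```python
from collections import defaultdict


def children_by_root(rows):
    """Group (child, root) result rows into {root: [children]}."""
    grouped = defaultdict(list)
    for child, root in rows:
        grouped[root].append(child)
    return dict(grouped)


# Rows shaped like the output of "RETURN child, root" for the sample graph.
ROWS = [
    ("child1_1", "root1"),
    ("child1_2", "root1"),
    ("child2_1", "root2"),
    ("child1_3", "root1"),
]
by_root = children_by_root(ROWS)
```

The rows do not need to arrive sorted by root, so a single database call can serve any subset of roots.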

How to get count for all nodes/edges downstream of some node in Neo4J

I'm wondering whether there is a way within Cypher to get a count of all nodes downstream of some node x.
For my particular use-case I have a number of graphs, which are separate entities, but stored in the same instance. I would like to find out, for each graph, what the node and relationship count is.
I already have this for relationships
start r=rel(*) return count(*)
and this for nodes
start n=node(*) return count(*)
for everything in the database.
Many thanks,
Eamonn
If you have some "reference" or root node per subgraph you can use path expressions to find all nodes:
start root=node:roots(id="xx")
match root-[*..5]->end
return count(distinct end)
It makes sense to limit the depth of your search.
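What a depth-limited count of distinct downstream nodes computes can be illustrated with a small Python sketch. This is an in-memory analogue under assumed data, not Cypher. The adjacency dictionary and the name count_downstream are hypothetical. The root is counted only if a cycle leads back to it within the depth limit.

```python
def count_downstream(adjacency, root, max_depth=5):
    """Count distinct nodes reachable from root in 1..max_depth hops.

    adjacency maps a node to the list of nodes it points to.
    """
    reached = set()
    frontier = {root}
    for _ in range(max_depth):
        # Nodes one hop further out that have not been counted yet.
        frontier = {
            target
            for node in frontier
            for target in adjacency.get(node, [])
        } - reached
        if not frontier:
            break
        reached |= frontier
    return len(reached)
```

Running this once per root node gives a per-graph node count, and the depth limit bounds the work in the same way the upper bound does in the path expression.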
You must index all the properties on your nodes and relationships. Then you must start at these indexes to get the counts and, if necessary, sum them together for each graph.
Let's assume we have two graphs, a book-author type and a car-color type. Then, to get the overall sum of nodes for each graph in Cypher:
start g1=node:node_auto_index('bookName:*'), g11=node:node_auto_index('authorName:*'),
g2=node:node_auto_index('carName:*'), g22=node:node_auto_index('carColor:*')
return count(g1)+count(g11) as graph1, count(g2)+count(g22) as graph2
Similarly for all relationships. I don't know of any Cypher solution that could simply group by an undefined property; that would solve the problem easily.

Resources