Get all nodes with a specific type of relationship to a root node - neo4j

I have a rather large and complex graph in Neo4j (millions of nodes and relationships of various types). I want to get all child nodes (at any depth) of a specific root node, but only those reached through a specific type of relationship.
I have tried: Match (n:NODE_TYPE)-[*:REL_TYPE]->(r:NODE_TYPE {id:SPECIFIC_ID}) return n
But I get a syntax error at the point where I specify the relationship type.
Without the relationship type, querying the whole graph takes a really long time, and the query also matches nodes whose paths eventually lead to the root node through other types of relationships (which is not what I want for my use case).

You need to swap the order of the relationship type and the variable-length operator (*), so that the type comes first:
Match (n:NODE_TYPE)-[:REL_TYPE*]->(r:NODE_TYPE {id:SPECIFIC_ID})
return n
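Since the graph is large, it may also help to bound the depth and return each node only once. This is only a sketch: the upper bound of 5 and the $rootId parameter are assumptions for illustration, not something from the question.

```cypher
// Same pattern as above, with an assumed maximum depth of 5 hops.
// $rootId is a hypothetical query parameter holding the root node's id property.
MATCH (n:NODE_TYPE)-[:REL_TYPE*1..5]->(r:NODE_TYPE {id: $rootId})
// DISTINCT collapses nodes that reach the root through more than one path.
RETURN DISTINCT n
```

An index on the id property of NODE_TYPE should let the planner start from the root node instead of scanning every NODE_TYPE node.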

Related

NEO4J - Matching a path where middle node might exist or not

I have the following graph:
I would like to get all contractors, subcontractors and clients, starting from David.
So I thought of a query like this:
MATCH (a:contractor)-[*0..1]->(b)-[w:works_for]->(c:client) return a,b,c
This would return:
(0:contractor {name:"David"}) (0:contractor {name:"David"}) (56:client {name:"Sarah"})
(0:contractor {name:"David"}) (1:subcontractor {name:"John"}) (56:client {name:"Sarah"})
Which returns the desired result. The issue here is performance.
If the DB contains millions of records and I leave (b) without a label, the query will take forever. If I add a label to (b) such as (b:subcontractor) I won't hit millions of rows but I will only get results with subcontractors:
(0:contractor {name:"David"}) (1:subcontractor {name:"John"}) (56:client {name:"Sarah"})
Is there a more efficient way to do this?
link to graph example: https://console.neo4j.org/r/pry01l
There are some things to consider with your query.
The relationship type is not specified. Is it the case that the only relationships from contractor nodes are works_for and hired? If not, you should constrain the relationship types being matched in your query. For example:
MATCH (a:contractor)-[:works_for|hired*0..1]->(b)-[w:works_for]->(c:client)
RETURN a,b,c
The fact that (b) is unlabelled does not mean that every node in the graph will be matched. It will be reached either as a result of traversing the works_for or hired relationships if specified, or any relationship from :contractor, or via the works_for relationship.
If you do want to label it, and you have a hierarchy of types, you can assign multiple labels to nodes and just use the most general one in your query. For example, you could have a label such as ExternalStaff as the generic label, and then further add Contractor or SubContractor to distinguish individual nodes. Then you can do something like
MATCH (a:contractor)-[:works_for|hired*0..1]->(b:ExternalStaff)-[w:works_for]->(c:client)
RETURN a,b,c
Depends really on your use cases.
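If you go the multiple-label route, the generic label can be added to existing nodes in one pass. A minimal sketch, assuming the labels contractor and subcontractor from the question and the illustrative ExternalStaff label mentioned above:

```cypher
// Add the generic label to every contractor and subcontractor.
// The existing labels are kept; a node can carry several labels at once.
MATCH (n)
WHERE n:contractor OR n:subcontractor
SET n:ExternalStaff
```

After this, (b:ExternalStaff) matches both David and John, while (a:contractor) still restricts the start of the pattern.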

How to query Neo4j N levels deep with variable length relationship and filters on each level

I'm new(ish) to Neo4j and I'm attempting to build a tool that allows users on a UI to essentially specify a path of nodes they would like to query neo4j for. For each node in the path they can specify specific properties of the node and generally they don't care about the relationship types/properties. The relationships need to be variable in length because the typical use case for them is they have a start node and they want to know if it reaches some end node without caring about (all of) the intermediate nodes between the start and end.
Some restrictions the user has when building the path from the UI are that it can't have cycles, it can't have nodes that have more than one child with children, and nodes can't have more than one incoming edge. This is only enforced from their perspective, not in the query itself.
The issue I'm having is being able to specify filtering on each level of the path without getting strange behavior.
I've tried a lot of variations of my Cypher query such as breaking up the path into multiple MATCH statements, tinkering with the relationships and anything else I could think of.
Here is a Gist of a sample Cypher dump
cypher-dump
This query gives me the path that I'm trying to get; however, it doesn't specify name or type on n_four.
MATCH path = (n_one)-[*0..]->(n_two)-[*0..]->(n_three)-[*0..]->(n_four)
WHERE n_one.type IN ["JCL_JOB"]
AND n_two.type IN ["JCL_PROC"]
AND n_three.name IN ["INPA", "OUTA", "PRGA"]
AND n_three.type IN ["RESOURCE_FILE", "COBOL_PROGRAM"]
RETURN path
This query is the one I'd like to work; however, it leaves out the leaves at the third level, and I am having trouble understanding why.
MATCH path = (n_one)-[*0..]->(n_two)-[*0..]->(n_three)-[*0..]->(n_four)
WHERE n_one.type IN ["JCL_JOB"]
AND n_two.type IN ["JCL_PROC"]
AND n_three.name IN ["INPA", "OUTA", "PRGA"]
AND n_three.type IN ["RESOURCE_FILE", "COBOL_PROGRAM"]
AND n_four.name IN ["TAB1", "TAB2", "COPYA"]
AND n_four.type IN ["RESOURCE_TABLE", "COBOL_COPYBOOK"]
RETURN path
What I've noticed is that when I "... RETURN n_four" in my query it is including nodes that are at the third level as well.
This behavior is caused by your (probably inappropriate) use of [*0..] in your MATCH pattern.
FYI:
[*0..] matches 0 or more relationships. For instance, (a)-[*0..]->(b) would succeed even if a and b are the same node (and there is no relationship from that node back to itself).
The default lower bound is 1. So [*] is equivalent to [*..] and [*1..].
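To see the difference between the two lower bounds, here is a small sketch you can run in a scratch database. The ZeroLenDemo label is made up for the demonstration, and the statements are meant to be run one at a time (or in a client that accepts semicolon-separated statements).

```cypher
// A throwaway node with no relationships at all.
CREATE (:ZeroLenDemo {name: 'solo'});

// Lower bound 0: a and b may be the same node, so this returns one row.
MATCH (a:ZeroLenDemo)-[*0..]->(b)
RETURN a.name, b.name;

// Default lower bound 1: at least one relationship is required, so this returns no rows.
MATCH (a:ZeroLenDemo)-[*]->(b)
RETURN a.name, b.name;
```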
Your 2 queries use the same MATCH pattern, ending in ...->(n_three)-[*0..]->(n_four).
Your first query does not specify any WHERE tests for n_four, so the query is free to return paths in which n_three and n_four are the same node. This lack of specificity is why the query is able to return 2 extra nodes.
Your second query specifies WHERE tests for n_four that make it impossible for n_three and n_four to be the same node. The query is now more picky, and so those 2 extra nodes are no longer returned.
You should not use [*0..] unless you are sure you want to optionally match 0 relationships. It can also add unnecessary overhead. And, as you now know, it also makes the query a bit trickier to understand.
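For comparison, this is roughly what the second query looks like with a lower bound of 1 on every hop. It is only a sketch, assuming each level in the UI path is supposed to be a different node from the level before it; whether that assumption holds depends on your data.

```cypher
// [*1..] (the same as [*]) requires at least one relationship between levels,
// so n_three and n_four can no longer bind to the same node.
MATCH path = (n_one)-[*1..]->(n_two)-[*1..]->(n_three)-[*1..]->(n_four)
WHERE n_one.type IN ["JCL_JOB"]
  AND n_two.type IN ["JCL_PROC"]
  AND n_three.name IN ["INPA", "OUTA", "PRGA"]
  AND n_three.type IN ["RESOURCE_FILE", "COBOL_PROGRAM"]
  AND n_four.name IN ["TAB1", "TAB2", "COPYA"]
  AND n_four.type IN ["RESOURCE_TABLE", "COBOL_COPYBOOK"]
RETURN path
```

Note that third-level nodes with nothing below them still won't show up here, because the pattern requires an n_four to exist. If those leaves must be kept, the last hop probably belongs in a separate OPTIONAL MATCH.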

Neo4j query to ignore parent nodes which doesn't satisfy a condition but keep the same structure

I have a tree-like structure and I'm trying to write a Cypher query that replaces a parent node with its child if the parent node does not have a certain relation.
for example the query: MATCH (c)-[:CHILD_OF*]->(p {id:"123"}) return c returns a structure like so (we don't care about what the other nodes are, the structure is the only thing that needs to be preserved)
()<-(A)
()<-()<-(B)<-()<-(C)
()<-(D)<-(E)<-()<-(F)
\-(G)<-()<-(H)
How could I get the query to ignore all nodes without a certain property but keep it the same structure like so:
(A)
(B)<-(C)
(D)<-(E)<-(F)
(G)<-(H)
You should take a look at the procedures for creating virtual nodes and relationships in APOC Procedures.
These will allow you to create virtual relationships, that will not be saved to the graph, but will be present and viewable in your query.
The tricky part will be creating those new virtual relationships. You'll likely be filtering down nodes in all paths to the nodes you're interested in. At that point you may need to use apoc.coll.pairsMin() in order to get each adjacent pair of nodes in the collection on a row so you can create the virtual relationships between them.
After all the virtual relationships are created (in the same cypher query), match from the root node using those virtual relationships, and you should see the graph you want.
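A rough sketch of that idea, assuming APOC is installed. The property name someProp is a hypothetical stand-in for whatever marks the nodes you want to keep, and the CHILD_OF type on the virtual relationships is just reused from the question.

```cypher
// Walk every leaf-to-root path under the root from the question.
MATCH path = (leaf)-[:CHILD_OF*]->(root {id: "123"})
WHERE NOT ()-[:CHILD_OF]->(leaf)
// Keep only the nodes that have the (hypothetical) property, in path order.
WITH [n IN nodes(path) WHERE n.someProp IS NOT NULL] AS kept
// pairsMin turns [x, y, z] into [[x, y], [y, z]]: each kept node with its next kept ancestor.
UNWIND apoc.coll.pairsMin(kept) AS pair
WITH DISTINCT pair[0] AS child, pair[1] AS parent
// The virtual relationship exists only in this query's result, not in the stored graph.
CALL apoc.create.vRelationship(child, 'CHILD_OF', {}, parent) YIELD rel
RETURN child, rel, parent
```

One caveat with this sketch: a path with only one kept node (like (A) in the example) produces no pairs, so such nodes would have to be returned separately.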

Cypher: Multiple independent queries in one call

In my Neo4j 2.0 server database I have a forest, i.e. a set of trees. One of my use cases is to get the child nodes of an arbitrary subset of tree nodes.
For instance, I have the root nodes
root1 root2 root3 root4
and now I want the child nodes of root1 and root4. And I need to know which children belong to which root. Each query individually is a simple MATCH Cypher query. But for the sake of performance I would like to keep the amount of database calls low since I use the Neo4j server. Thus I am thinking about a way to tell Cypher "give me the child terms of root1 and root4 and tell me which node belongs to which root in the result". That is, I think of a kind of map. Or a collection of result sets where the first element is the child nodes of the first root, the second element the child nodes of the second root etc.
Is there a way to do this in Cypher or will I have to fall back to a server plugin here?
Thank you and best regards!
Edit:
To clarify: My main concern is that I need to know which children belong to which root. As an example, consider the small graph generated by this command:
create (r1:ROOT {name:"root1"}),
(r2:ROOT {name:"root2"}),
(c11:CHILD {name:"child1_1"}),
(c12:CHILD {name:"child1_2"}),
(c13:CHILD {name:"child1_3"}),
(c21:CHILD {name:"child2_1"}),
(c22:CHILD {name:"child2_2"}),
(c23:CHILD {name:"child2_3"}),
(r1)-[:HAS_CHILD]->(c11),
(r1)-[:HAS_CHILD]->(c12),
(r1)-[:HAS_CHILD]->(c13),
(r2)-[:HAS_CHILD]->(c21),
(r2)-[:HAS_CHILD]->(c22),
(r2)-[:HAS_CHILD]->(c23)
Here, we get root1 and root2 with three children, respectively.
To get the children of root1 I would issue the following query:
MATCH (r:ROOT)-[:HAS_CHILD]->(c) WHERE r.name='root1' RETURN collect(c)
Now I know the children of root root1.
The question is: What would a query look like that fetches the children of root1 AND root2 and shows in the result which child belongs to which root? Because clearly the query
MATCH (r:ROOT)-[:HAS_CHILD]->(c) WHERE r.name='root1' OR r.name='root2' RETURN collect(c.id)
would give me the children of both roots. But now I would not know which root had which children. So what can I do?
You should give us more details, but a query like this (adjusting properties and relationships) should work as you want:
MATCH (child) <-[:HAS_CHILD]- (root:ROOT)
WHERE root.name IN ['root1','root4']
RETURN child, root
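If you would rather get one row per root with its children grouped together (closer to the "map" you describe), aggregation does that. A sketch against the sample graph from the question's edit:

```cypher
// One row per root; collect() groups the children by the non-aggregated column.
MATCH (root:ROOT)-[:HAS_CHILD]->(child)
WHERE root.name IN ['root1', 'root2']
RETURN root.name AS root, collect(child.name) AS children
```

With the sample data this returns two rows, one for root1 and one for root2, each with a list of that root's three children. The order of names inside each list is not guaranteed.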

How to find distinct nodes in a Neo4j/Cypher query

I'm trying to do some pattern matching in neo4j/cypher and I came across this issue:
There are two types of graphs I want to search for:
Star graphs: A graph with one center node and multiple outgoing relationships.
n-length line graphs: A line graph with length n where none of the nodes are repeats (I have some bidirectional edges and cycles in my graph)
So the main problem is that when I do something such as:
MATCH (a)-->(b), (a)-->(c), (a)-->(d)
MATCH (a)-->(b)-->(c)-->(d)
Cypher doesn't guarantee (when I tried it) that a, b, c, and d are all different nodes. For small graphs, this can easily be fixed with
WHERE not(a=b) AND not(a=c) AND ...
But I'm trying to match graphs of size 10+, so checking equality between all pairs of nodes isn't a viable option. Afaik, RETURN DISTINCT does not work either, since it doesn't check equality among variables, only across different rows. Is there any simple way I can specify the query to make the differently named nodes distinct?
Old question, but look to APOC Path Expander procedures for how to address these kinds of use cases, as you can change the traversal uniqueness behavior for expansion (the same way you can when using the traversal API...which these procedures use).
Cypher implicitly uses RELATIONSHIP_PATH uniqueness, meaning that per path returned, a relationship must be unique, it cannot be used multiple times in a single path.
While this is good for queries where you need all possible paths, it's not a good fit for queries where you want distinct nodes or a subgraph or to prevent repeating nodes in a path.
For an n-length path, let's say depth 6 with only outgoing relationships of any type, we can change the uniqueness to NODE_PATH, where a node must be unique per path, no repeats in a path:
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.expandConfig(n, {maxLevel:6, relationshipFilter:'>', uniqueness:'NODE_PATH'}) YIELD path
RETURN path
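For comparison, and not part of the APOC approach above, the same "no repeated node in a path" condition can be written in plain Cypher with list predicates. A sketch for a fixed-length line of 3 hops (the length is an arbitrary choice here):

```cypher
// Each node in the path must appear exactly once among the path's nodes.
MATCH p = (a)-[*3]->(d)
WHERE all(n IN nodes(p) WHERE single(m IN nodes(p) WHERE m = n))
RETURN p
```

This avoids writing out every pairwise inequality, but the check runs only after each path has been expanded, so on large graphs the path expander is likely the cheaper option.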
If you want all reachable nodes up to a certain depth (or at any depth by omitting maxLevel), you can use NODE_GLOBAL uniqueness, or instead just use apoc.path.subgraphNodes():
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.subgraphNodes(n, {maxLevel:6}) YIELD node
RETURN node
NODE_GLOBAL uniqueness means that a node must be unique across all paths: it will only be visited once, and there will only be one path to a node from a given start node. This keeps the number of paths that need to be evaluated down significantly, but because of this behavior not all relationships will be traversed if they expand to a node already visited.
You will not get relationships back with this procedure (you can use apoc.path.spanningTree() for that, although as previously mentioned not all relationships will be included, as we will only capture a single path to each node, not all possible paths to nodes). If you want all nodes up to a max level and all possible relationships between those nodes, then use apoc.path.subgraphAll():
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.subgraphAll(n, {maxLevel:6}) YIELD nodes, relationships
RETURN nodes, relationships
Richer options exist for label and relationship filtering, or filtering (whitelist, blacklist, endnode, terminator node) based on lists of pre-matched nodes.
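As a small illustration of those filters, here is a sketch that restricts the expansion by relationship type and by label. HAS_CHILD and CHILD are borrowed from an earlier thread on this page purely as example names; substitute your own.

```cypher
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.subgraphNodes(n, {
  maxLevel: 6,
  relationshipFilter: 'HAS_CHILD>',  // only outgoing HAS_CHILD relationships
  labelFilter: '+CHILD'              // whitelist: only expand through CHILD nodes
}) YIELD node
RETURN node
```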
We also support repeating sequences of relationships or node labels.
If you need filtering by node or relationship properties during expansion, then this won't be a good option, as that feature is not yet supported.

Resources