Iterate through Neo4j graph matching on node properties

For example, I have the following graph in Neo4j:
(startnode)-[:BELONG_TO]-(Interface)-[:IS_CONNECTED]-(Interface)-[:BELONG_TO]-
#the line below can repeat itself 0..n times
(node)-[:BELONG_TO]-(Interface)-[:IS_CONNECTED]-(Interface)-[:BELONG_TO]-
#up to the endnode
(endnode)
There is an Interface property I also need to match on. I do not want to follow all the paths, just the ones whose Interface nodes have the property value I am looking for, for example Interface.VlanList CONTAINS ",23,".
I have written the following in Cypher, but it assumes that I already know how many repetitions I am going to find, which in reality is not the case.
match (n:StartNode {name:"device name"}) -[:BELONG_TO]- (i:Interface) -[:IS_CONNECTED]- (ii:Interface)-[:BELONG_TO]-(nn:Node) -[:BELONG_TO]- (iii:Interface) -[:IS_CONNECTED]- (iiii:Interface) -[:BELONG_TO]-(nnn:Node)
where i.VlanList CONTAINS ",841,"
AND ii.VlanList CONTAINS ",841,"
AND iii.VlanList CONTAINS ",841,"
return n, i,ii,nn,iii,iiii,nnn
I have been looking at the documentation but cannot work out how to solve this.

This should work:
// put the searchstring in a variable
WITH ',841,' AS searchstring
// look up start and end node
MATCH (startNode: .... {...}), (endNode: .... {...})
// look for paths of variable length
// that have your search string in all nodes,
// except the first and the last one
WITH searchstring,startNode,endNode
MATCH path=(startNode)-[:BELONG_TO|IS_CONNECTED*]-(endNode)
WHERE ALL(i IN nodes(path)[1..-1] WHERE i.VlanList CONTAINS searchstring)
RETURN path
You can also look at https://neo4j.com/labs/apoc/4.1/graph-querying/path-expander/ for more ideas about how you can limit the pathfinding.

This query should work for you (assuming that the relationship directions I chose are correct):
MATCH p = (sNode:StartNode)-[:BELONG_TO]->(i1:Interface)-[:IS_CONNECTED]->(i2:Interface)-[:BELONG_TO]->(n1)-[:BELONG_TO|IS_CONNECTED*0..]->(eNode:Node)
WHERE sNode.name = "device name" AND eNode.name = "foo" AND LENGTH(p)%3 = 0
WITH p, i1, i2, n1, eNode, RELATIONSHIPS(p) AS rels, NODES(p) AS ns
WHERE n1 = eNode OR (
ALL(j IN RANGE(3, SIZE(rels)-3, 3) WHERE
'BELONG_TO' = TYPE(rels[j]) = TYPE(rels[j+2]) AND
'IS_CONNECTED' = TYPE(rels[j+1])) AND
ALL(x IN ([i1, i2] + REDUCE(s = [], i IN RANGE(3, SIZE(ns)-2) | CASE WHEN i%3 = 0 THEN s ELSE s + ns[i] END))
WHERE x:Interface AND x.VlanList CONTAINS $substring)
)
RETURN p
It checks that the returned paths have the required pattern of node labels, node property value, and relationship types. It takes advantage of the variable length relationship syntax, using zero as the lower bound. Since there is no upper bound, the variable length relationship query can take "forever" to finish (and in such a situation, you should use a reasonable upper bound).
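To make the pattern check easier to follow, here is a minimal sketch of the same idea in plain Python, outside Neo4j. It assumes a path is represented as a list of relationship type names plus a list of node dicts with "label" and "VlanList" keys; that representation and the name path_matches are made up for illustration and are not Neo4j driver objects. Unlike the query, the sketch checks every interface on the path, including the first two.

```python
# Hypothetical in-memory model of a path, not a Neo4j driver object.
TRIPLE = ["BELONG_TO", "IS_CONNECTED", "BELONG_TO"]


def path_matches(rel_types, nodes, substring):
    """Return True if the path is a repetition of the
    device-interface-interface-device hop and every interface
    on it has `substring` in its VlanList."""
    # The path must consist of one or more whole triples of relationships.
    if len(rel_types) == 0 or len(rel_types) % 3 != 0:
        return False
    for j in range(0, len(rel_types), 3):
        if rel_types[j:j + 3] != TRIPLE:
            return False
    # Nodes at positions 0, 3, 6, ... are devices; all others are interfaces.
    interfaces = [n for i, n in enumerate(nodes) if i % 3 != 0]
    return all(
        n.get("label") == "Interface" and substring in n.get("VlanList", "")
        for n in interfaces
    )
```

For example, a two-hop path has six relationships and seven nodes, and the interfaces sit at node positions 1, 2, 4 and 5.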

Related

Cypher find path avoiding certain node types except at the start and end

I have a graph with the following structure:
(r:Region)-[:CONTAINS]-(s:Station)-[:IS_AT]-(t:TrackLocation)-[:IS_NEXT_TO]-(t:TrackLocation)
I want to find the shortest path between two Stations, using only track locations. My current query is:
match (s1:Station) where s1.crs = 'ADR'
match (s2:Station) where s2.crs = 'NRW'
match p=shortestPath((s1)-[*1..1000]-(s2))
WHERE ALL (n IN nodes(p) WHERE NOT n:Region)
return nodes(p)
The problem is, a station can be at multiple TrackLocations; and sometimes the shortest path can go via an intermediate Station node if it thinks this is quicker than sticking to TrackLocations for the intermediate nodes. If I change the filter to WHERE NOT n:Region AND NOT n:Station, then of course it won't work because the start and end nodes are stations.
Is there any way to adjust this query to do (Station)-> via track locations only ->(Station)?
You can use OR logic here:
match (s1:Station{crs: 'ADR'}), (s2:Station{crs : 'NRW'})
match p=shortestPath((s1)-[*1..1000]-(s2))
WHERE ALL (n IN nodes(p) WHERE n:TrackLocation or n = s1 or n = s2 ) return nodes(p)
If x is a list, then the list operation x[1..-1] returns a list with the second through next-to-last elements of x.
This query uses that list operation to avoid testing the first and last nodes in the path:
MATCH (s1:Station), (s2:Station)
WHERE s1.crs = 'ADR' AND s2.crs = 'NRW'
MATCH p = shortestPath((s1)-[*..1000]-(s2))
WITH NODES(p) AS ns
WHERE ALL(n IN ns[1..-1] WHERE n:TrackLocation)
RETURN ns
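If the slice is unfamiliar, ns[1..-1] behaves much like the Python slice ns[1:-1]. A minimal sketch of the same interior-only test, assuming the path is given as a plain list of node labels (a hypothetical representation, as is the function name):

```python
def interior_is_track_only(labels):
    """labels: the label of each node along a path, in order."""
    # labels[1:-1] drops the first and last element, like ns[1..-1] in Cypher,
    # so the two Station endpoints are never tested.
    return all(label == "TrackLocation" for label in labels[1:-1])
```

One thing to keep in mind: because the WHERE in the query above comes after a WITH, it filters the path that shortestPath has already chosen. As far as I know, it does not make Neo4j look for a longer path that satisfies the condition, so the query can return nothing even when a TrackLocation-only route exists.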

What is the correct code for getting the connecting nodes with constraints in neo4j?

I am trying to get the connecting nodes. The final relationship should have a type(r) of pobj. How can I specify that with the shortest path?
match(c:fdnode{name:'flights'})
match(d:fdnode)
match p = shortestPath((c)-[*..15]-(e)-[r]-(d))
where d.name = '700' and type(r) = 'pobj'
RETURN nodes(p)
If I remove r, the query returns the desired output, but I need to constrain type(r).
pobj is only for this specific case. I have multiple traversal criteria.
The following query may do what you want. It returns the nodes in the shortest undirected path (of up to length 15) that meets these criteria:
The first node has the label 'fdnode', and its name value is 'flights'.
The last node has the label 'fdnode', and its name value is '700'.
The last relationship has the type, pobj.
MATCH p = shortestPath((c:fdnode)-[*..15]-(d:fdnode))
WHERE c.name = 'flights' AND d.name = '700' AND TYPE(LAST(RELATIONSHIPS(p))) = 'pobj'
RETURN NODES(p);
NOTE: The matched path will be "undirected" because this query mimicked your query by not specifying any directionality in the MATCH pattern. So, every matched relationship is allowed to be in either direction. If this is not what you intended, you need to explicitly specify the directionality in your pattern.
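The selection logic can be pictured as "keep the paths whose last relationship is pobj, then take the shortest". Here is a minimal Python sketch of that idea, assuming each candidate path is just a list of relationship type names; the representation and the function name are hypothetical, and this is not how Neo4j enumerates paths internally.

```python
def shortest_ending_with(paths, rel_type):
    """paths: candidate paths, each a list of relationship type names."""
    # Keep only non-empty paths whose last relationship has the wanted type,
    # mirroring TYPE(LAST(RELATIONSHIPS(p))) = 'pobj'.
    candidates = [p for p in paths if p and p[-1] == rel_type]
    # Pick the candidate with the fewest relationships, or None if there is none.
    return min(candidates, key=len, default=None)
```

Swapping in a different rel_type covers the other traversal criteria mentioned in the question.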
Figured it out. This works.
match (a:iknode)-[*..10]->(b:iknode)-[r]->(c:iknode) where type(r) = 'pobj' and a.name = 'flights' return c
This also works:
MATCH p = ((c:fdnode)-[*..4]->(d:fdnode))
WHERE c.name = 'flights' AND TYPE(LAST(RELATIONSHIPS(p))) = 'pobj'
RETURN NODES(p) order by length(p) ;

Neo4J - Test nodes on path with unknown depth, with MATCH only

I want to test all nodes in the path from node a to node b (with only MATCH statement), where the depth is changing (could be any number). In the example below the depth is 2.
START a = node(86)
MATCH p0 = a-[*..2]-b
WHERE (b.attr = 'true') AND (a.attr = 'true')
RETURN p0
My question is how do I test the nodes between a and b for a certain attribute (attr = 'true'), using the MATCH statement, without knowing the depth required.
I found that with the filter function I can filter out all the unwanted nodes, like this:
START a = node(86)
MATCH p0 = a-[*..2]-b
RETURN filter(x IN nodes(p0) WHERE x.attr = 'true')
But that is not what I need, I need to use MATCH.
Take a look at the Cypher refcard, specifically to the List Predicates section. The all() function should do the trick.
Something like:
START a=node(86)
MATCH p0=(a)-[*..2]-(b)
WHERE ALL(node in nodes(p0) WHERE node.attr = 'true')
RETURN p0
This will only match patterns where all the nodes in the pattern have that attribute as true.

Neo4J node traversal cypher where clause for each node

I've been playing with Neo4j for a genealogy site and it's worked great!
I've run into a snag where finding the starting node isn't as easy. Looking through the docs and the posts online I haven't seen anything that hints at this so maybe it isn't possible.
What I would like to do is pass in a list of genders and from that list follow a specific path through the nodes to get a single node.
In the context of the family:
I want to get my mother's father's mother's mother. I have my id, so I would start there and traverse four nodes from mine.
So the pseudo query would be:
select person (follow childof relationship)
where starting node is me
where firstNode.gender == female
AND secondNode.gender == male
AND thirdNode.gender == female
AND fourthNode.gender == female
Focusing on the general solution:
MATCH p = (me:Person)-[:IS_CHILD_OF*]->(ancestor:Person)
WHERE me.uuid = {uuid}
AND length(p) = size({genders})
AND extract(x in tail(nodes(p)) | x.gender) = {genders}
RETURN ancestor
here's how it works:
match the starting node by id
match all the variable-length paths going to any ancestor
constrain the length of the path (i.e. the number of relationships, which is the same as the number of ancestors), as you can't parameterize the length in the query
extract the genders in the path
nodes(p) returns all the nodes in the path, including the starting node
tail(nodes(p)) skips the first element of the list, i.e. the starting node, so now we only have the ancestors
extract() extracts the genders of all the ancestor nodes, i.e. it transforms the list of ancestor nodes into their genders
the extracted list of genders can be compared to the parameter
if the path matched, we can return the bound ancestor, which is the end of the path
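The comparison step above can be sketched in plain Python. This assumes the path's nodes are available as a list of dicts with a "gender" key, starting with the person the search began from; the representation and the name matches_genders are hypothetical.

```python
def matches_genders(path_nodes, genders):
    """path_nodes: nodes along the path, starting node first."""
    # Skip the starting node, like tail(nodes(p)) in the query.
    ancestors = path_nodes[1:]
    # The number of ancestors equals the number of relationships in the path,
    # so this mirrors length(p) = size({genders}).
    if len(ancestors) != len(genders):
        return False
    # Like extract(x IN ... | x.gender) = {genders}: compare the two lists.
    return [a["gender"] for a in ancestors] == genders
```

The order of the genders list matters: the first entry is the parent, the second the grandparent, and so on.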
However, I don't think it will be faster than the explicit solution, though the performance could remain comparable. On my small test data (just 5 nodes), the general solution does 26 DB accesses whereas the specific solution only does 22, as reported by PROFILE. Further profiling would be needed on a larger database to compare the performances:
PROFILE MATCH p = (me:Person)-[:IS_CHILD_OF*]->(ancestor:Person)
WHERE me.uuid = {uuid}
AND length(p) = size({genders})
AND extract(x in tail(nodes(p)) | x.gender) = {genders}
RETURN ancestor
The general solution has the advantage of being a single query which won't need to be parsed again by the Cypher engine, whereas each generated query will need to be parsed.
It was simpler than I thought. Maybe there is still a better way, so I'll leave this open for a bit.
The query would be:
MATCH (n1:Person { Id: 'f59c40de-506d-4829-a765-7a3ae94af8d1' })
<-[:CHILDOF]-(n2 { Gender:'0'})
<-[:CHILDOF]-(n3 { Gender:'1'})
<-[:CHILDOF]-(n4 { Gender:'1'})
RETURN n4
Each generation further back adds a new line to the pattern.
The equivalent query would look something like this:
MATCH (me:Person)
WHERE me.ID = ?
WITH me
MATCH (me)-[r:childof*4]->(ancestor:Person)
WITH ancestor, EXTRACT(rel IN r | endNode(rel).gender) AS genders
WHERE genders = ?
RETURN ancestor
Disclaimer, I haven't double-checked the syntax.
In Neo4j you typically find your start node first, typically by an ID of some sort (modify as required to match on a unique property). We then traverse a number of relationships to an ancestor, extract the gender property of all end nodes in the traversed relationships, and compare the genders to the expected list of genders (you'll need to make sure the argument is a bracketed list in the desired order).
Note that this approach filters all possible results at that degree of childof relationship rather than pruning while walking the graph, so the higher the degree of ancestry you're querying, the slower the call will get.
I'm also unsure if you can parameterize the degree of the variable relationship, so that might prevent this from being a generalized solution for any degree of ancestry.
I'm not sure if you want a generic query which can work whatever the collection of genders you pass, or a specific solution.
Here's the specific solution: you match the path with the wanted length, and match each gender, as you've already noted in your own answer.
MATCH (me:Person)-[:IS_CHILD_OF]->(p1:Person)
-[:IS_CHILD_OF]->(p2:Person)
-[:IS_CHILD_OF]->(p3:Person)
-[:IS_CHILD_OF]->(p4:Person)
WHERE me.uuid = {uuid}
AND p1.gender = {genders}[0]
AND p2.gender = {genders}[1]
AND p3.gender = {genders}[2]
AND p4.gender = {genders}[3]
RETURN p4
Now, if you want to pass in a list of genders of an arbitrary length, it's actually possible. You match a variable-length path, make sure it has the right length (matching the number of genders), then match each gender in sequence.
MATCH p = (me:Person)-[:IS_CHILD_OF*]->(ancestor:Person)
WHERE me.uuid = {uuid}
AND length(p) = size({genders})
AND all(i IN range(0, size({genders}) - 1)
WHERE {genders}[i] = extract(x in tail(nodes(p)) | x.gender)[i])
RETURN ancestor
Building on @InverseFalcon's answer, you can actually compare collections, which simplifies the query:
MATCH p = (me:Person)-[:IS_CHILD_OF*]->(ancestor:Person)
WHERE me.uuid = {uuid}
AND length(p) = size({genders})
AND extract(x in tail(nodes(p)) | x.gender) = {genders}
RETURN ancestor

Cypher path needs to exclude a certain relation

I have this graph:
A-[:X]->B-> a whole tree of badness
A-[:Y]->C-> a whole tree of goodness
I would like to know how to specify a path starting with A that excludes the :X relationship.
In this case "Y" could be any one of a number of different edge types. I do not want to specify them explicitly.
How do I write a path statement that includes A-[*]-B where * is not :X but can be anything else?
Solution for a fixed number of relationships between A and B
You can exclude a relationship type by matching all relationships from A to B and then filter out a specific type with WHERE NOT
MATCH p = (a:Label1)-[]-(b:Label2)
WHERE NOT (a)-[:X]-(b)
RETURN p
Solution for a variable length path between A and B
If you have a variable length path between A and B you cannot put the exact pattern in the WHERE NOT. Instead, you can use a NONE predicate on the path:
MATCH p = (a:Label1)-[*]-(b:Label2)
// this WHERE makes sure that none of the relationships in the
// returned path fulfill the criterion type(relationship) = 'X'
WHERE NONE (r in relationships(p) WHERE type(r) = 'X')
RETURN p
This Cypher query is simpler than the variable-length path query from @MartinPreusse, as it avoids using the RELATIONSHIPS function. Profiling shows that its execution plan is also a bit simpler, so it might be faster.
MATCH p=(a:Label1)-[rels*]-(b:Label2)
WHERE NONE (r IN rels WHERE type(r)= 'X')
RETURN p
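Both variants apply the same test to each candidate path: no relationship on it may have the excluded type. A minimal Python sketch of that predicate, assuming a path is given as a list of relationship type names (a hypothetical representation, as is the function name):

```python
def excludes_type(rel_types, banned="X"):
    """rel_types: the type name of each relationship along a path."""
    # Same idea as NONE(r IN rels WHERE type(r) = 'X'):
    # true only if no relationship has the banned type.
    return not any(t == banned for t in rel_types)
```

Any other relationship type passes, so the allowed types never have to be listed explicitly.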

Resources