Cypher: WHERE ALL Not working as expected - neo4j

So my query is for a "superpath finding problem".
The relevant nodes here are;
route: The overall path object
tlroutesegment: The logical link between the route and the different segments (which compose the full path) (ps: I know this could be better represented using a relationship, however the database is just made this way :S)
oms: The PHYSICAL path segments itself
validochpath: More or less irrelevant for this question; Top level entity of routes
So on to the actual problem I am having; below is a WORKING solution to the above, HOWEVER, I wanted to optimize the query a bit by reducing the # of routes we have to search through in the 4th line here.
MATCH (vp:validochpath {onepID:"some ID HERE"})-->(ort:route)<--
(rs:tlroutesegment)-->(oms:oms)
WITH collect(oms) AS omsNodes
MATCH (ort:route)
WHERE ALL(x in omsNodes WHERE (ort)<--(:tlroutesegment)-->(x))
WITH ort
MATCH (ort)--(vp:validochpath)
RETURN *
This is what the new query looks like, as you can see I use the relation to filter out much of the route nodes.
MATCH (vp:validochpath {onepID:"some ID HERE"})-->(ort:route)<--
(rs:tlroutesegment)-->(oms:oms)<--(rs2:tlroutesegment)
WITH rs2, collect(oms) AS omsNodes
MATCH (rs2)-->(ort2:route)
WHERE ALL(x in omsNodes WHERE (x)<--(:tlroutesegment)-->(ort2))
MATCH (ort2)--(vp:validochpath)
RETURN *
The problem is, this query does not seem to filter out any nodes with the WHERE ALL and just returns everything.

In your second query, the WHERE clause accepts all matches.
From the first MATCH clause, we know that rs2 is a tlroutesegment and that all the nodes in omsNodes are related to rs2. From the second MATCH clause, we also know that ort2 is related to rs2. Your WHERE clause is checking that all the nodes in omsNodes are related to a tlroutesegment that is also related to ort2. Since rs2 is a tlroutesegment, this test always succeeds.
If you want to test for the existence of paths with tlroutesegment nodes that are different than rs2, try this WHERE clause:
WHERE ALL(x in omsNodes WHERE
SIZE([(x)<--(y:tlroutesegment)-->(ort2) WHERE y <> rs2 | y]) > 0)
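Note also that WITH rs2, collect(oms) groups by rs2, so each omsNodes list holds only the oms nodes attached to that particular rs2, not every segment of the original route. If the intent is to require the full set of segments while still narrowing the candidate routes through a relationship, one possible sketch is to collect the complete list first and then start the candidate search from a single segment. This is an assumption about the intent and is untested against this data model; firstOms and vp2 are names introduced here for illustration only.

```cypher
// Collect every oms segment of the original route once.
MATCH (:validochpath {onepID: "some ID HERE"})-->(:route)<--(:tlroutesegment)-->(oms:oms)
WITH collect(DISTINCT oms) AS omsNodes
// Candidate routes: only those that use the first collected segment.
WITH omsNodes, omsNodes[0] AS firstOms
MATCH (firstOms)<--(:tlroutesegment)-->(ort2:route)
WITH DISTINCT ort2, omsNodes
// Keep a candidate only if it is linked to every segment in the list.
WHERE ALL(x IN omsNodes WHERE (x)<--(:tlroutesegment)-->(ort2))
MATCH (ort2)--(vp2:validochpath)
RETURN ort2, vp2
```

Any superpath has to contain the first segment, so starting from it should not lose results, while the ALL predicate is now checked against the complete segment list.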

Related

Why does the query to find intermediate nodes take so long?

The database has a graph with the following 3 nodes:
...->(1) ------>(3)-->...
       \         ^
        \        |
         ---->(2)---/
Now, I want to get all distinct nodes that are reachable from node 1 to node 3, including themselves where I know exactly unique properties of node 1 and node 3 (the nodes are actually commits from a github repository). So, I came up with the following query:
MATCH (origin:App)
WHERE origin.commit='10cb31b0a72525923c01dc34f8690f311a361d42'
MATCH (destination:App)
WHERE destination.commit='51fde433973463f057ffcbcbab0bc8944ab3ec9c'
MATCH (origin)-[:CHANGED_TO*0..]->(intermediate_commit:App)-[:CHANGED_TO*0..]->(destination)
RETURN distinct intermediate_commit
However, the query never finishes or at least takes too long to complete. I know that I could have used MATCH p=(origin:App)-[:CHANGED_TO*0..]->(destination:App) and then UNWIND and return distinct nodes. The problem is, I believe, it queries different paths implying I am interested in relationships between them too. While in fact I am not interested in paths. What I need is only distinct nodes that match the pattern. My understanding is that querying paths is slower than it could be if I could query only the nodes.
Could you please help to understand what I am missing? Thanks!
The solution was quite simple. Instead of specifying a pattern in MATCH clause, we move the pattern to WHERE clause. Also, I split the pattern into 2 parts. I can't explain why exactly it is faster but my understanding is that when we move pattern to WHERE clause and MATCH only nodes, we let neo4j know that we are interested only in nodes and not in all possible paths that match the pattern.
The full query:
MATCH (origin:App)
WHERE origin.commit='10cb31b0a72525923c01dc34f8690f311a361d42'
MATCH (destination:App)
WHERE destination.commit='51fde433973463f057ffcbcbab0bc8944ab3ec9c'
MATCH (intermediate_commit:App)
WHERE (origin)-[:CHANGED_TO*0..]->(intermediate_commit)
AND (intermediate_commit)-[:CHANGED_TO*0..]->(destination)
RETURN distinct intermediate_commit
Also, if you have a lot of nodes, I believe specifying LIMIT 1 to match origin and destination can also improve the query, like this:
MATCH (origin:App)
WHERE origin.commit='10cb31b0a72525923c01dc34f8690f311a361d42'
WITH origin
LIMIT 1
MATCH (destination:App)
WHERE destination.commit='51fde433973463f057ffcbcbab0bc8944ab3ec9c'
WITH origin, destination
LIMIT 1
MATCH (intermediate_commit:App)
WHERE (origin)-[:CHANGED_TO*0..]->(intermediate_commit)
AND (intermediate_commit)-[:CHANGED_TO*0..]->(destination)
RETURN distinct intermediate_commit
That might be an unbounded path search? Do you really want all paths of any length between the two nodes (e.g. paths spanning the entire graph?)
Does this do what you want?
MATCH (origin:App)
WHERE origin.commit='10cb31b0a72525923c01dc34f8690f311a361d42'
MATCH (destination:App)
WHERE destination.commit='51fde433973463f057ffcbcbab0bc8944ab3ec9c'
MATCH (origin:App)-[:CHANGED_TO *0..1]->(intermediate_commit:App)-[:CHANGED_TO *0..1]->(destination:App)
RETURN distinct intermediate_commit
I bounded the path length to one hop, changing from 0.. to 0..1
(which means minimum 0 hop, up to 1 relationship hop)
The pattern and conditions allow for the possibility of paths that extend past the start or end nodes yet reach them again further down, this is why it doesn't stop when it finds one matching path but keeps expanding beyond it. Remember Cypher is concerned with finding all possible paths that meet the pattern that exist in the graph. And because of your pattern, the check-beyond-the-start-and-end-nodes-without-limit doesn't just happen once, but per potential (intermediate_commit:App) found while expanding, this is why your query isn't returning.
One way you can get what you want, all possible paths but stopping when the node is reached, is to use the APOC path expanders, you can supply the node as a terminator node, which will halt further expansion past it.
MATCH (origin:App)
WHERE origin.commit='10cb31b0a72525923c01dc34f8690f311a361d42'
MATCH (destination:App)
WHERE destination.commit='51fde433973463f057ffcbcbab0bc8944ab3ec9c'
CALL apoc.path.expandConfig(destination, {relationshipFilter:'<CHANGED_TO', terminatorNodes:[origin]}) YIELD path
UNWIND nodes(path) AS node
WITH DISTINCT node
WHERE node:App
RETURN node as intermediate_commit
This expands backwards from destination to origin, which seems like it could be more efficient. Once we have the paths, we can UNWIND the nodes from all paths, keep the distinct ones, and make sure we only take the :App nodes.
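If only the distinct nodes matter and not the paths, another option might be to intersect two reachability sets instead of collecting paths. The sketch below assumes APOC is installed and uses apoc.path.subgraphNodes, which visits each node at most once; fwd, bwd and reachableFromOrigin are names made up for this example.

```cypher
MATCH (origin:App {commit: '10cb31b0a72525923c01dc34f8690f311a361d42'})
MATCH (destination:App {commit: '51fde433973463f057ffcbcbab0bc8944ab3ec9c'})
// Every :App node reachable from origin by following CHANGED_TO forwards.
CALL apoc.path.subgraphNodes(origin, {relationshipFilter: 'CHANGED_TO>', labelFilter: '+App'}) YIELD node AS fwd
WITH destination, collect(fwd) AS reachableFromOrigin
// Every :App node that can reach destination (CHANGED_TO followed backwards).
CALL apoc.path.subgraphNodes(destination, {relationshipFilter: '<CHANGED_TO', labelFilter: '+App'}) YIELD node AS bwd
WITH bwd, reachableFromOrigin
// Keep the nodes that appear in both sets.
WHERE bwd IN reachableFromOrigin
RETURN bwd AS intermediate_commit
```

A node is an intermediate commit exactly when it is reachable from origin and can itself reach destination, so the intersection includes both endpoints when they are connected. The forward expansion is not stopped at destination, so on a long history it may visit many commits that come after it.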

simple match query taking ages

I have a simple query
MATCH (n:TYPE {id:123})<-[:CONNECTION*]-(m:TYPE) RETURN m
and when executing the query "manually" (i.e. using the browser interface to follow edges) I only get a single node as a result as there are no further connections. Checking this with the query
MATCH (n:TYPE {id:123})<-[:CONNECTION]-(m:TYPE)<-[:CONNECTION]-(o:TYPE) RETURN m,o
shows no results and
MATCH (n:TYPE {id:123})<-[:CONNECTION]-(m:TYPE) RETURN m
shows a single node so I have made no mistake doing the query manually.
However, the issue is that the first query takes ages to finish and I do not understand why.
Consequently: Why does such a trivial query take so long even though it can return at most one result?
Bonus: How to fix this issue?
As Tezra mentioned, the variable-length pattern match isn't in the same category as the other two queries you listed because there's no restrictions given on any of the nodes in between n and m, they can be of any type. Given that your query is taking a long time, you likely have a fairly dense graph of :CONNECTION relationships between nodes of different types.
If you want to make sure all nodes in your path are of the same label, you need to add that yourself:
MATCH path = (n:TYPE {id:123})<-[:CONNECTION*]-(m:TYPE)
WHERE all(node in nodes(path) WHERE node:TYPE)
RETURN m
Alternately you can use APOC Procedures, which has a fairly efficient means of finding connected nodes (and restricting nodes in the path by label):
MATCH (n:TYPE {id:123})
CALL apoc.path.subgraphNodes(n, {labelFilter:'TYPE', relationshipFilter:'<CONNECTION'}) YIELD node
RETURN node
SKIP 1 // to avoid returning `n`
The query MATCH (n:TYPE {id:123})<-[:CONNECTION]-(m:TYPE)<-[:CONNECTION]-(o:TYPE) RETURN m,o is not a fair test of MATCH (n:TYPE {id:123})<-[:CONNECTION*]-(m:TYPE) RETURN m, because it excludes paths such as MATCH (n:TYPE {id:123})<-[:CONNECTION]-(m:ANYTHING_ELSE)<-[:CONNECTION]-(o:TYPE) RETURN m,o, where an intermediate node carries a different label.
For your main query, you should be returning DISTINCT results: MATCH (n:TYPE {id:123})<-[:CONNECTION*]-(m:TYPE) RETURN DISTINCT m
This is for 2 main reasons.
Without DISTINCT, each node is returned once for every possible path to it.
Because of the previous point, that is a lot of extra work for no additional meaningful information.
If you use RETURN DISTINCT, it gives the Cypher planner the choice to do a pruning search instead of an exhaustive search.
You can also limit the depth of the exhaustive search using ..# so that it doesn't kill your query if you run against a much older version of Neo4j where the Cypher planner hasn't learned pruning search yet. Example use: MATCH (n:TYPE {id:123})<-[:CONNECTION*..10]-(m:TYPE) RETURN m
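To see which kind of search the planner actually picked, the query can be prefixed with PROFILE. The following is a small sketch that combines the DISTINCT and depth-limit suggestions above; the limit of 10 is just the example value from the text.

```cypher
// PROFILE runs the query and reports the operators in the executed plan.
PROFILE
MATCH (n:TYPE {id: 123})<-[:CONNECTION*..10]-(m:TYPE)
RETURN DISTINCT m
```

On versions that support pruning, the plan would be expected to list a pruning variable-length expand (shown as something like VarLengthExpand(Pruning)) instead of the exhaustive VarLengthExpand(All). Whether it does depends on the Neo4j version and on the exact shape of the query.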

Query intersection of Paths in Neo4j using Cypher

Having this query working in Cypher (Neo4j):
MATCH p=(g:Node)-[:FOLLOWED_BY *2..2]->(g2:Node)
WHERE g.group=10 AND g2.group=10
RETURN p
which returns all possible paths belonging to a specific group (group is just a property to classify nodes), I am struggling to write a query that returns the paths in common between both collections of paths. It would be something like this:
MATCH p=(g:Node)-[:FOLLOWED_BY *2..2]->(g2:Node)
WHERE g.group=10 AND g2.group=10
MATCH p=(g3:Node)-[:FOLLOWED_BY *2..2]->(g4:Node)
WHERE g3.group=15 AND g4.group=15
RETURN INTERSECTION(path1, path2)
Of course I made that up. The goal is to get all the paths in common between both queries.
The start/end nodes of your 2 MATCHes have different groups, so they can never find common paths.
Therefore, when you ask for "paths in common", I assume you actually want to find the shared middle nodes (between the 2 sets of 3-node paths). If so, this query should work:
MATCH p1=(g:Node)-[:FOLLOWED_BY *2]->(g2:Node)
WHERE g.group=10 AND g2.group=10
WITH COLLECT(DISTINCT NODES(p1)[1]) AS middle1
MATCH p2=(g3:Node)-[:FOLLOWED_BY *2]->(g4:Node)
WHERE g3.group=15 AND g4.group=15 AND NODES(p2)[1] IN middle1
RETURN DISTINCT NODES(p2)[1] AS common_middle_node;

Neo4j: Express path cypher query result in terms of nodes

I have the following node structure: Emp[e_id, e_name, e_bossid]. In addition, I have a recursive query that traverses the self-referencing relationship e_bossid-[REPORTS_TO]->e_id:
MATCH (e:Employee) WHERE NOT (e)-[:REPORTS_TO]->()
SET e:Root;
MATCH path = (b:Root)<-[:REPORTS_TO*]-(e:Employee)
RETURN path
limit 1000;
However the result is PATH. I would like to have result in form of NODES not the path. I tried to use the nodes(path), but it gives me an error:
org.codehaus.jackson.map.JsonMappingException: Reference node not available (through reference chain: java.util.ArrayList[0]->java.util.HashMap["rel"]->java.util.HashMap["nodes(path)"]->java.util.ArrayList[0]->org.neo4j.rest.graphdb.entity.RestNode["restApi"]->org.neo4j.rest.graphdb.RestAPIFacade["direct"]->org.neo4j.rest.graphdb.ExecutingRestAPI["referenceNode"])
When I query without nodes(path) it seems to return only paths.
How should this be done in a Cypher query?
I'm not sure why you would want to get all possible paths in your organizational hierarchy. Maybe what you want to get is a set of paths from the leaves of the tree to the root of the tree, and to return each unique set as a row of nodes.
MATCH (b:Employee)
WHERE NOT (b)-[:REPORTS_TO]->()
MATCH (l:Employee)
WHERE NOT (l)<-[:REPORTS_TO]-()
MATCH p = shortestPath((b)<-[:REPORTS_TO*]-(l))
RETURN nodes(p) as reports
As far as your error goes, that looks like a bug, although I don't know what version of Neo4j you are using. In all likelihood, your query won't complete because your Root employees still carry the Employee label, which means that this pattern: MATCH path = (b:Root)<-[:REPORTS_TO*]-(e:Employee) matches the Root employees on each side of the variable length traversal.
Give my query a try and let me know what happens.
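If the goal is simply to get individual nodes back instead of path objects, one option is to unwind the nodes of each path into rows. This is a sketch only: it assumes a Neo4j version that supports UNWIND, reuses the Root label set in the question, and the alias employee is made up for this example.

```cypher
MATCH path = (b:Root)<-[:REPORTS_TO*]-(e:Employee)
// Cap the number of paths first, as in the original query.
WITH path LIMIT 1000
// Turn each path into one row per node, then drop duplicates.
UNWIND nodes(path) AS employee
RETURN DISTINCT employee
```

Because the limit is applied to paths before unwinding, the number of distinct nodes returned can be smaller or larger than 1000.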

Cypher query to find all paths with same relationship type

I'm struggling to find a single clean, efficient Cypher query that will let me identify all distinct paths emanating from a start node such that every relationship in the path is of the same type when there are many relationship types.
Here's a simple version of the model:
CREATE (a), (b), (c), (d), (e), (f), (g),
(a)-[:X]->(b)-[:X]->(c)-[:X]->(d)-[:X]->(e),
(a)-[:Y]->(c)-[:Y]->(f)-[:Y]->(g)
In this model (a) has two outgoing relationship types, X and Y. I would like to retrieve all the paths that link nodes along relationship X as well as all the paths that link nodes along relationship Y.
I can do this programmatically outside of Cypher by making a series of queries: the first retrieves the list of outgoing relationship types from the start node, followed by a single query (submitted together as a batch) for each relationship type. That looks like:
START n=node(1)
MATCH n-[r]->()
RETURN COLLECT(DISTINCT TYPE(r)) as rels;
followed by:
START n=node(1)
MATCH p = n-[:`reltype_param`*]->()
RETURN p as path;
The above satisfies my need, but requires at minimum 2 round trips to the server (again, assuming I batch together the second set of queries in one transaction).
A single-query approach that works, but is horribly inefficient is the following single Cypher query:
START n=node(1)
MATCH p = n-[r*]->() WHERE
ALL (x in RELATIONSHIPS(p) WHERE TYPE(x) = TYPE(HEAD(RELATIONSHIPS(p))))
RETURN p as path;
That query uses the ALL predicate to filter the relationships along the paths, enforcing that each relationship in the path matches the first relationship in the path. This, however, is really just a filter operation on what is essentially a combinatorial explosion of all possible paths, which is much less efficient than traversing a relationship of a known, given type first.
I feel like this should be possible with a single Cypher query, but I have not been able to get it right.
Here's a minor optimization; at least the non-matching paths will fail fast:
START n=node(1)
MATCH n-[r]->()
WITH DISTINCT n, type(r) AS t // keep n in scope for the second MATCH
MATCH p = n-[r*]->()
WHERE type(r[-1]) = t // last entry matches
RETURN p AS path
This is probably one of those things that should be in the Java API if you want it to be really performant, though.
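On newer Neo4j versions with APOC installed (an assumption, since the question uses the older START syntax), the two round trips might be folded into a single query by expanding once per relationship type. This is a sketch, not a benchmarked solution; relType is a name introduced here.

```cypher
// Start node looked up by internal id, mirroring START n=node(1) above.
MATCH (n) WHERE id(n) = 1
// One row per distinct outgoing relationship type of the start node.
MATCH (n)-[r]->()
WITH DISTINCT n, type(r) AS t
// Expand along that single type only, outgoing direction, at least one hop.
CALL apoc.path.expandConfig(n, {relationshipFilter: t + '>', minLevel: 1}) YIELD path
RETURN t AS relType, path
```

Each expansion only ever follows one known relationship type, so mixed-type paths are never generated, which is the property the ALL-based filter had to enforce after the fact.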
