I have the following node structure: Emp[e_id, e_name, e_bossid]. I also have a recursive query that traverses the self-relation e_bossid-[REPORTS_TO]->e_id:
MATCH (e:Employee) WHERE NOT (e)-[:REPORTS_TO]->()
SET e:Root;
MATCH path = (b:Root)<-[:REPORTS_TO*]-(e:Employee)
RETURN path
limit 1000;
However, the result is a path. I would like the result as nodes, not as a path. I tried to use nodes(path), but it gives me an error:
org.codehaus.jackson.map.JsonMappingException: Reference node not available (through reference chain: java.util.ArrayList[0]->java.util.HashMap["rel"]->java.util.HashMap["nodes(path)"]->java.util.ArrayList[0]->org.neo4j.rest.graphdb.entity.RestNode["restApi"]->org.neo4j.rest.graphdb.RestAPIFacade["direct"]->org.neo4j.rest.graphdb.ExecutingRestAPI["referenceNode"])
When I query without nodes(path), it seems to return only paths.
How should this be done in a Cypher query?
I'm not sure why you would want to get all possible paths in your organizational hierarchy. Maybe what you want to get is a set of paths from the leaves of the tree to the root of the tree, and to return each unique set as a row of nodes.
MATCH (b:Employee)
WHERE NOT (b)-[:REPORTS_TO]->()
MATCH (l:Employee)
WHERE NOT (l)<-[:REPORTS_TO]-()
MATCH p = shortestPath((b)<-[:REPORTS_TO*]-(l))
RETURN nodes(p) as reports
As far as your error goes, that looks like a bug, although I don't know what version of Neo4j you are using. In all likelihood, your query won't complete because your Root employees still carry the Employee label. That means the pattern MATCH path = (b:Root)<-[:REPORTS_TO*]-(e:Employee) can match the Root employees on each side of the variable-length traversal.
Give my query a try and let me know what happens.
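For readers who want to see the shape of the result outside Cypher, here is a minimal Python sketch of the same idea: find the leaves (employees nobody reports to), then walk from each leaf up to its root and return the nodes of that chain. The `boss_of` dictionary and the employee names are hypothetical sample data, not taken from the question, and the sketch only models the result, not how Neo4j computes it.

```python
# Hypothetical sample data: employee -> boss (None marks a root employee).
boss_of = {"ann": None, "bob": "ann", "cid": "ann", "dee": "bob"}

def leaf_to_root_chains(boss_of):
    """Return one list of nodes per leaf, ordered root first and leaf last."""
    bosses = {b for b in boss_of.values() if b is not None}
    leaves = [e for e in boss_of if e not in bosses]  # nobody reports to them
    chains = []
    for leaf in leaves:
        chain = [leaf]
        while boss_of[chain[-1]] is not None:  # follow REPORTS_TO upwards
            chain.append(boss_of[chain[-1]])
        chains.append(list(reversed(chain)))  # root first, leaf last
    return chains

reports = leaf_to_root_chains(boss_of)
```

Each entry of `reports` corresponds to one row of the query's `reports` column: a plain list of nodes rather than a path object.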
Related
The database has a graph with the following 3 nodes:
...->(1)------->(3)-->...
       \         ^
        \        |
         -->(2)--/
Now, I want to get all distinct nodes that lie on any path from node 1 to node 3, including nodes 1 and 3 themselves. I know the exact unique properties of node 1 and node 3 (the nodes are actually commits from a GitHub repository). So, I came up with the following query:
MATCH (origin:App)
WHERE origin.commit='10cb31b0a72525923c01dc34f8690f311a361d42'
MATCH (destination:App)
WHERE destination.commit='51fde433973463f057ffcbcbab0bc8944ab3ec9c'
MATCH (origin)-[:CHANGED_TO*0..]->(intermediate_commit:App)-[:CHANGED_TO*0..]->(destination)
RETURN distinct intermediate_commit
However, the query never finishes or at least takes too long to complete. I know that I could have used MATCH p=(origin:App)-[:CHANGED_TO*0..]->(destination:App) and then UNWIND and return distinct nodes. The problem is, I believe, it queries different paths implying I am interested in relationships between them too. While in fact I am not interested in paths. What I need is only distinct nodes that match the pattern. My understanding is that querying paths is slower than it could be if I could query only the nodes.
Could you please help to understand what I am missing? Thanks!
The solution was quite simple. Instead of specifying a pattern in MATCH clause, we move the pattern to WHERE clause. Also, I split the pattern into 2 parts. I can't explain why exactly it is faster but my understanding is that when we move pattern to WHERE clause and MATCH only nodes, we let neo4j know that we are interested only in nodes and not in all possible paths that match the pattern.
The full query:
MATCH (origin:App)
WHERE origin.commit='10cb31b0a72525923c01dc34f8690f311a361d42'
MATCH (destination:App)
WHERE destination.commit='51fde433973463f057ffcbcbab0bc8944ab3ec9c'
MATCH (intermediate_commit:App)
WHERE (origin)-[:CHANGED_TO*0..]->(intermediate_commit)
AND (intermediate_commit)-[:CHANGED_TO*0..]->(destination)
RETURN distinct intermediate_commit
Also, if you have a lot of nodes, I believe specifying LIMIT 1 to match origin and destination can also improve the query, like this:
MATCH (origin:App)
WHERE origin.commit='10cb31b0a72525923c01dc34f8690f311a361d42'
WITH origin
LIMIT 1
MATCH (destination:App)
WHERE destination.commit='51fde433973463f057ffcbcbab0bc8944ab3ec9c'
WITH origin, destination
LIMIT 1
MATCH (intermediate_commit:App)
WHERE (origin)-[:CHANGED_TO*0..]->(intermediate_commit)
AND (intermediate_commit)-[:CHANGED_TO*0..]->(destination)
RETURN distinct intermediate_commit
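One way to see why only nodes are needed here: a commit lies between origin and destination exactly when it is reachable from origin and destination is reachable from it. The Python sketch below expresses that as an intersection of two reachability sets. The adjacency dictionary is hypothetical (modelled on the three-node diagram in the question, with nodes 0 and 4 added outside the range), and the sketch illustrates the reasoning only; it makes no claim about how Neo4j evaluates the query.

```python
from collections import deque

# Hypothetical graph: node -> nodes reached via CHANGED_TO.
edges = {0: [1], 1: [2, 3], 2: [3], 3: [4], 4: []}

def reachable(start, adjacency):
    """Breadth-first search; includes start itself (the *0.. case)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reverse(adjacency):
    """Flip every edge so we can search backwards from the destination."""
    rev = {node: [] for node in adjacency}
    for src, targets in adjacency.items():
        for dst in targets:
            rev.setdefault(dst, []).append(src)
    return rev

def between(origin, destination, adjacency):
    forward = reachable(origin, adjacency)
    backward = reachable(destination, reverse(adjacency))
    return forward & backward  # nodes on some origin -> destination path

intermediate = between(1, 3, edges)
```

Each node is visited at most once per search, so the work grows with the number of nodes and relationships rather than with the number of distinct paths.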
That might be an unbounded path search. Do you really want all paths of any length between the two nodes (e.g. paths spanning the entire graph)?
Does this do what you want?
MATCH (origin:App)
WHERE origin.commit='10cb31b0a72525923c01dc34f8690f311a361d42'
MATCH (destination:App)
WHERE destination.commit='51fde433973463f057ffcbcbab0bc8944ab3ec9c'
MATCH (origin:App)-[:CHANGED_TO *0..1]->(intermediate_commit:App)-[:CHANGED_TO *0..1]->(destination:App)
RETURN distinct intermediate_commit
I bounded the path length to one hop, changing from 0.. to 0..1
(which means minimum 0 hop, up to 1 relationship hop)
The pattern and conditions allow for paths that extend past the start or end nodes yet reach them again further down. This is why the query doesn't stop when it finds one matching path but keeps expanding beyond it. Remember that Cypher is concerned with finding all possible paths in the graph that meet the pattern. And because of your pattern, the unbounded check beyond the start and end nodes doesn't happen just once, but once per potential (intermediate_commit:App) found while expanding. This is why your query isn't returning.
One way to get what you want (all possible paths, but stopping when the node is reached) is to use the APOC path expanders. You can supply the node as a terminator node, which halts further expansion past it.
MATCH (origin:App)
WHERE origin.commit='10cb31b0a72525923c01dc34f8690f311a361d42'
MATCH (destination:App)
WHERE destination.commit='51fde433973463f057ffcbcbab0bc8944ab3ec9c'
CALL apoc.path.expandConfig(destination, {relationshipFilter:'<CHANGED_TO', terminatorNodes:[origin]}) YIELD path
UNWIND nodes(path) as node
WITH DISTINCT node
WHERE node:App
RETURN node as intermediate_commit
This expands backwards from destination to origin, which seems like it could be more efficient. Once we have the paths, we can UNWIND the nodes from all paths, keep the distinct ones, and make sure we only take the :App nodes.
So my query is for a "superpath finding problem".
The relevant nodes here are:
route: The overall path object
tlroutesegment: The logical link between the route and the different segments (which compose the full path) (ps: I know this could be better represented using a relationship, however the database is just made this way :S)
oms: The PHYSICAL path segments itself
validochpath: More or less irrelevant for this question; Top level entity of routes
So on to the actual problem I am having; below is a WORKING solution to the above, HOWEVER, I wanted to optimize the query a bit by reducing the # of routes we have to search through in the 4th line here.
MATCH (vp:validochpath {onepID:"some ID HERE"})-->(ort:route)<--
(rs:tlroutesegment)-->(oms:oms)
WITH collect(oms) AS omsNodes
MATCH (ort:route)
WHERE ALL(x in omsNodes WHERE (ort)<--(:tlroutesegment)-->(x))
WITH ort
MATCH (ort)--(vp:validochpath)
RETURN *
This is what the new query looks like. As you can see, I use the relationship to filter out many of the route nodes.
MATCH (vp:validochpath {onepID:"some ID HERE"})-->(ort:route)<--
(rs:tlroutesegment)-->(oms:oms)<--(rs2:tlroutesegment)
WITH rs2, collect(oms) AS omsNodes
MATCH (rs2)-->(ort2:route)
WHERE ALL(x in omsNodes WHERE (x)<--(:tlroutesegment)-->(ort2))
MATCH (ort2)--(vp:validochpath)
RETURN *
The problem is, this query does not seem to filter out any nodes with the WHERE ALL and just returns everything.
In your second query, the WHERE clause accepts all matches.
From the first MATCH clause, we know that rs2 is a tlroutesegment and that all the nodes in omsNodes are related to rs2. From the second MATCH clause, we also know that ort2 is related to rs2. Your WHERE clause is checking that all the nodes in omsNodes are related to a tlroutesegment that is also related to ort2. Since rs2 is a tlroutesegment, this test always succeeds.
If you want to test for the existence of paths with tlroutesegment nodes that are different than rs2, try this WHERE clause:
WHERE ALL(x in omsNodes WHERE
SIZE([(x)<--(y:tlroutesegment)-->(ort2) WHERE y <> rs2 | y]) > 0)
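To make the reasoning concrete, here is a small Python model of the two checks. The segment names and the data are hypothetical; each entry stands for one tlroutesegment linking a route to an oms node. It is only a sketch of the logic, not of how Cypher executes the pattern.

```python
# Hypothetical data: each tlroutesegment links one route to one oms node.
segments = {
    "rs1": {"route": "ortA", "oms": "oms1"},
    "rs2": {"route": "ortB", "oms": "oms1"},
}

def linked(oms, route, exclude=None):
    """True if some segment other than `exclude` joins this oms to this route."""
    return any(
        seg["oms"] == oms and seg["route"] == route
        for name, seg in segments.items()
        if name != exclude
    )

# rs2 was matched through oms1 and leads to ortB, so rs2 itself is a witness.
oms_nodes = ["oms1"]
original_check = all(linked(x, "ortB") for x in oms_nodes)
stricter_check = all(linked(x, "ortB", exclude="rs2") for x in oms_nodes)
```

With this data `original_check` is true purely because of `rs2`, while `stricter_check` is false because no other segment joins `oms1` to `ortB`.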
Having this query working in Cypher (Neo4j):
MATCH p=(g:Node)-[:FOLLOWED_BY *2..2]->(g2:Node)
WHERE g.group=10 AND g2.group=10
RETURN p
which returns all possible paths belonging to a specific group (group is just a property to classify nodes), I am struggling to write a query that returns the paths in common between both collections of paths. It would be something like this:
MATCH path1=(g:Node)-[:FOLLOWED_BY *2..2]->(g2:Node)
WHERE g.group=10 AND g2.group=10
MATCH path2=(g3:Node)-[:FOLLOWED_BY *2..2]->(g4:Node)
WHERE g3.group=15 AND g4.group=15
RETURN INTERSECTION(path1, path2)
Of course I made that up. The goal is to get all the paths in common between both queries.
The start/end nodes of your 2 MATCHes have different groups, so they can never find common paths.
Therefore, when you ask for "paths in common", I assume you actually want to find the shared middle nodes (between the 2 sets of 3-node paths). If so, this query should work:
MATCH p1=(g:Node)-[:FOLLOWED_BY *2]->(g2:Node)
WHERE g.group=10 AND g2.group=10
WITH COLLECT(DISTINCT NODES(p1)[1]) AS middle1
MATCH p2=(g3:Node)-[:FOLLOWED_BY *2]->(g4:Node)
WHERE g3.group=15 AND g4.group=15 AND NODES(p2)[1] IN middle1
RETURN DISTINCT NODES(p2)[1] AS common_middle_node;
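The same idea can be written as a plain set intersection. The sketch below is a Python model with hypothetical edges and group values (none of them come from the question): it collects the middle node of every two-hop path whose endpoints are in a given group, then intersects the two sets.

```python
# Hypothetical data: FOLLOWED_BY edges and a group value per node.
edges = [("a1", "m"), ("m", "a2"), ("b1", "m"), ("m", "b2"), ("a1", "x"), ("x", "a2")]
group = {"a1": 10, "a2": 10, "b1": 15, "b2": 15, "m": 0, "x": 0}

def middle_nodes(edges, group, wanted):
    """Middle nodes of all 2-hop paths whose start and end nodes are in `wanted`."""
    middles = set()
    for start, mid in edges:
        for mid2, end in edges:
            if mid == mid2 and group[start] == wanted and group[end] == wanted:
                middles.add(mid)
    return middles

common_middle_nodes = middle_nodes(edges, group, 10) & middle_nodes(edges, group, 15)
```

Here `x` sits only between group-10 nodes, so it is dropped, while `m` sits between both pairs and is kept.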
In my graph each node has a name and graph is actually a tree, so there exists a /path/to/each/node. Here's the query I currently use to get the path:
MATCH p=(n:Node{id:4})-[:CHILD_OF*0..200]->(r:Root{treeName:"vt"})
RETURN reduce(path = "", node IN nodes(p) | node.name + "/" + path) as path
An actual query is somewhat heavier, but the behavior is the same. So, having a ("")<-("a")<-("b")<-("c")<-("d") path I will get /a/b/c/d/. I don't mind trimming the last /, but I'm really worried about the order of the nodes iterator returned by nodes(p).
So, my question is mainly aimed at the Neo4j team: are there any guarantees as to the order? Would it be better if I just returned the Path and then manually extracted each property? I'm using Cypher with an embedded Neo4j distribution, so that won't be a problem.
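To show how much the result depends on iteration order, here is the same fold written in Python. It assumes the node names arrive in the order described above (start node first, root with the empty name last); it says nothing about what Neo4j guarantees.

```python
from functools import reduce

# Node names in the assumed order: start node first, root (empty name) last.
names = ["d", "c", "b", "a", ""]

# Same fold as the Cypher reduce: prepend each name to the accumulated path.
path = reduce(lambda acc, name: name + "/" + acc, names, "")

# The identical fold over the opposite order yields a different string.
reversed_path = reduce(lambda acc, name: name + "/" + acc, reversed(names), "")
```

Because each step prepends, the first element ends up last in the string, which is why the query produces a root-first path only if nodes(p) starts at the matched start node.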
I'm struggling to find a single clean, efficient Cypher query that will let me identify all distinct paths emanating from a start node such that every relationship in the path is of the same type when there are many relationship types.
Here's a simple version of the model:
CREATE (a), (b), (c), (d), (e), (f), (g),
(a)-[:X]->(b)-[:X]->(c)-[:X]->(d)-[:X]->(e),
(a)-[:Y]->(c)-[:Y]->(f)-[:Y]->(g)
In this model (a) has two outgoing relationship types, X and Y. I would like to retrieve all the paths that link nodes along relationship X as well as all the paths that link nodes along relationship Y.
I can do this programmatically outside of cypher by making a series of queries, the first to retrieve the list of outgoing relationships from the start node, and then a single query (submitted together as a batch) for each relationship. That looks like:
START n=node(1)
MATCH n-[r]->()
RETURN COLLECT(DISTINCT TYPE(r)) as rels;
followed by:
START n=node(1)
MATCH p = n-[:`reltype_param`*]->()
RETURN p as path;
The above satisfies my need, but requires at minimum 2 round trips to the server (again, assuming I batch together the second set of queries in one transaction).
A single-query approach that works, but is horribly inefficient is the following single Cypher query:
START n=node(1)
MATCH p = n-[r*]->() WHERE
ALL (x in RELATIONSHIPS(p) WHERE TYPE(x) = TYPE(HEAD(RELATIONSHIPS(p))))
RETURN p as path;
That query uses the ALL predicate to filter the relationships along the paths, enforcing that each relationship in the path matches the first relationship in the path. This, however, is really just a filter operation on what is essentially a combinatorial explosion of all possible paths, which is much less efficient than traversing a relationship of a known, given type first.
I feel like this should be possible with a single Cypher query, but I have not been able to get it right.
Here's a minor optimization; at least the non-matching paths will fail fast:
START n=node(1)
MATCH n-[r]->()
WITH DISTINCT n, type(r) AS t // keep n in scope for the second MATCH
MATCH p = n-[rs*]->()
WHERE type(rs[-1]) = t // last entry matches
RETURN p AS path
This is probably one of those things that should be in the Java API if you want it to be really performant, though.
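For comparison, here is a minimal Python sketch of the traversal the question is after: fix the relationship type on the first hop and only ever follow that type afterwards, so paths of mixed types are never generated. It uses the sample model from the question as plain triples. The function name and the node-based cycle guard are assumptions made for this illustration, and the guard is a simplification of Cypher's relationship uniqueness.

```python
# The sample model from the question as (start, type, end) triples.
rels = [
    ("a", "X", "b"), ("b", "X", "c"), ("c", "X", "d"), ("d", "X", "e"),
    ("a", "Y", "c"), ("c", "Y", "f"), ("f", "Y", "g"),
]

def same_type_paths(start, rels):
    """All paths from `start` whose relationships share one type."""
    out = {}
    for src, rel_type, dst in rels:
        out.setdefault((src, rel_type), []).append(dst)

    def extend(path, rel_type):
        # Only follow relationships of the type fixed by the first hop.
        for nxt in out.get((path[-1], rel_type), []):
            if nxt in path:  # guard against cycles (simplification)
                continue
            new_path = path + [nxt]
            yield new_path
            yield from extend(new_path, rel_type)

    first_types = sorted({t for s, t, _ in rels if s == start})
    return [p for t in first_types for p in extend([start], t)]

paths = same_type_paths("a", rels)
```

On this model the sketch yields four X paths and three Y paths, and never builds a mixed path such as a-[:Y]->c-[:X]->d.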