I need a query that will get me the shortest circle path between nodes (so if there are multiple paths just returns the shortest one). In addition, these paths shouldn't contain repeated nodes. Examples:
In this case, if I pass "Item B" as input, I should receive the path "Item B -> Item C -> Item E -> Item B", since the other path "Item B -> Item C -> Item A -> Item C -> Item E -> Item B" not only is longer but also contains a repeated node (Item C).
Using the same picture, if I pass "Item A" as input, I should receive the path "Item A -> Item C -> Item A"
In addition, it would be nice if the response could include all the nodes involved, without repeating the starting and final node that is the same in all cases.
Thanks in advance!
Try something like:
MATCH (n:Node{id:"a"})
MATCH p=(n)-[*..20]->(n)
WITH p, length(p) as len
ORDER by len ASC LIMIT 1
UNWIND nodes(p) as node
RETURN distinct node
I'm not sure how well it scales, though. Note that I added an upper bound to the pattern, so it only considers paths of 20 or fewer hops.
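To make the intent of the query easier to check, here is a minimal plain-Python sketch of the same idea: a breadth-first search that returns the shortest cycle through a start node, listing the start node only once. It is a model of the logic, not of how Neo4j executes the query. The function name shortest_cycle and the edge list are hypothetical, and the edges are only reconstructed from the paths quoted in the question.

```python
from collections import deque

def shortest_cycle(graph, start):
    """Return the nodes of the shortest directed cycle through `start`
    (with `start` listed once), or None if no such cycle exists."""
    # Each queue entry is the path walked so far, always beginning at `start`.
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        for nxt in graph.get(path[-1], []):
            if nxt == start:
                # Closing edge found; BFS order makes this cycle minimal.
                return path
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical edges, reconstructed from the paths quoted in the question.
graph = {"B": ["C"], "C": ["E", "A"], "E": ["B"], "A": ["C"]}

print(shortest_cycle(graph, "B"))  # ['B', 'C', 'E']
print(shortest_cycle(graph, "A"))  # ['A', 'C']
```

Because every node is visited at most once, the returned cycle cannot contain repeated nodes, which matches the requirement in the question.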
I have for example the following graph in Neo4j
(startnode)-[:BELONG_TO]-(Interface)-[:IS_CONNECTED]-(Interface)-[:BELONG_TO]-
#the line below can repeat itself 0..n times
(node)-[:BELONG_TO]-(Interface)-[:IS_CONNECTED]-(Interface)-[:BELONG_TO]-
#up to the endnode
(endnode)
There is also an Interface property I need to match on. I do not want to follow all the paths, just the ones whose Interface nodes have the property value I am looking for, for example Interface.VlanList CONTAINS ",23,".
I have done the following in Cypher, but it assumes that I already know how many iterations there will be, which in reality is not the case.
match (n:StartNode {name:"device name"}) -[:BELONG_TO]- (i:Interface) -[:IS_CONNECTED]- (ii:Interface)-[:BELONG_TO]-(nn:Node) -[:BELONG_TO]- (iii:Interface) -[:IS_CONNECTED]- (iiii:Interface) -[:BELONG_TO]-(nnn:Node)
where i.VlanList CONTAINS ",841,"
AND ii.VlanList CONTAINS ",841,"
AND iii.VlanList CONTAINS ",841,"
return n, i,ii,nn,iii,iiii,nnn
I have been looking at the documentation but can not work out how the above could be resolved.
This should work:
// put the searchstring in a variable
WITH ',841,' AS searchstring
// look up the start node and the end node
MATCH (startNode: .... {...}), (endNode: .... {...})
// look for paths of variable length
// that have your search string in all nodes,
// except the first and the last one
WITH searchstring,startNode,endNode
MATCH path=(startNode)-[:BELONG_TO|IS_CONNECTED*]-(endNode)
WHERE ALL(i IN nodes(path)[1..-1] WHERE i.VlanList CONTAINS searchstring)
RETURN path
You can also look at https://neo4j.com/labs/apoc/4.1/graph-querying/path-expander/ for more ideas about how you can limit the pathfinding.
This query should work for you (assuming that the relationship directions I chose are correct):
MATCH p = (sNode:StartNode)-[:BELONG_TO]->(i1:Interface)-[:IS_CONNECTED]->(i2:Interface)-[:BELONG_TO]->(n1)-[:BELONG_TO|IS_CONNECTED*0..]->(eNode:Node)
WHERE sNode.name = "device name" AND eNode.name = "foo" AND LENGTH(p)%3 = 0
WITH p, i1, i2, n1, eNode, RELATIONSHIPS(p) AS rels, NODES(p) AS ns
WHERE n1 = eNode OR (
ALL(j IN RANGE(3, SIZE(rels)-3, 3) WHERE
'BELONG_TO' = TYPE(rels[j]) = TYPE(rels[j+2]) AND
'IS_CONNECTED' = TYPE(rels[j+1])) AND
ALL(x IN ([i1, i2] + REDUCE(s = [], i IN RANGE(3, SIZE(ns)-2, 3) | CASE WHEN i%3 = 0 THEN s ELSE s +ns[i] END))
WHERE x:Interface AND x.VlanList CONTAINS $substring)
)
RETURN p
It checks that the returned paths have the required pattern of node labels, node property value, and relationship types. It takes advantage of the variable-length relationship syntax, using zero as the lower bound. Since there is no upper bound, the variable-length relationship query can take "forever" to finish (in such a situation, you should use a reasonable upper bound).
I am trying to write a query that, starting from a node, returns the missing node that would complete a circle if a new relationship were made to it. It should also return the node that, once the circle is closed, will end up having a relationship with the input node. Example:
Let's say I have B -> C and C -> A. In this case, if I pass A as input, I would like to receive { newRelationToMake: B, relationToInputNode: C } as a result, since connecting A -> B will result in a closed circle ABC and the relation that the node A will be having will come from C.
Ideally, this query should work for a maximum of n depths. For example for a depth of 4, with relations B -> C, C -> D and D -> A, and I pass A as input, I would need to receive { newRelationToMake: C, relationToInputNode: D} (since if I connect A -> C I close the ACD circle) but also receive {newRelationToMake: B, relationToInputNode: D }(since if I connect A -> B I would close the ABCD circle).
Is there any query to get this information?
Thanks in advance!
You are basically asking for all distinct nodes on paths leading to A, but which are not directly connected to A.
Here is one approach (assuming the nodes all have a Foo label and the relationships all have the BAR type):
MATCH (f:Foo)-[:BAR*2..]->(a:Foo)
WHERE a.id = 'A' AND NOT EXISTS((f)-[:BAR]->(a))
RETURN DISTINCT f AS missingNodes
The variable-length relationship pattern [:BAR*2..] looks for all paths of length 2 or more.
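As a rough cross-check of that pattern, the same selection can be modeled in plain Python: collect every node that reaches the target in two or more hops, then drop the nodes that already point directly at it. The function name missing_nodes is hypothetical, and the edges are the ones from the depth-4 example in the question. This sketch also excludes the target itself, which is an assumption about what the asker wants.

```python
def missing_nodes(edges, target):
    """Nodes with a directed path of length >= 2 to `target`
    and no direct edge to it (the target itself is excluded)."""
    # Reverse adjacency map, so edges can be walked backwards from the target.
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)
    direct = preds.get(target, set())
    # Any ancestor of a direct predecessor reaches the target in >= 2 hops.
    ancestors, stack = set(), list(direct)
    while stack:
        node = stack.pop()
        for p in preds.get(node, ()):
            if p not in ancestors:
                ancestors.add(p)
                stack.append(p)
    return sorted(ancestors - direct - {target})

# Relations from the question: B -> C, C -> D, D -> A.
edges = [("B", "C"), ("C", "D"), ("D", "A")]
print(missing_nodes(edges, "A"))  # ['B', 'C']
```

D is left out because it already has a direct relationship to A, which mirrors the NOT EXISTS((f)-[:BAR]->(a)) condition.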
Given a graph with Activity (blue nodes) and Gateway (most of the gray nodes)
When I execute the query (activitiNodeId is node called "Notify host regulators"):
MATCH p =(cur:Activity {projectId: '13', activitiNodeId: 'sid-84FC0D7F-9683-4D63-A2EA-A3ABB2AD10AE_0_null'})-[r:PRECEDES*]->(next)
WHERE ANY (label IN labels(next) WHERE label IN ['Activity', 'End'])
AND NOT (cur)-[:PRECEDES*]->(:Activity)-[:PRECEDES*]->(next)
RETURN p
I expect to get the following subgraph (because condition NOT (cur)-[:PRECEDES*]->(:Activity)-[:PRECEDES*]->(next) says that I expect to find all paths where there's no node of type Activity anywhere in path between cur and next):
But for some reason I got this one (it rejects paths when there's 2 Gateway nodes between Activity nodes):
I managed to achieve the result I want only by manually counting the nodes in every path:
MATCH p =(cur:Activity {projectId: '13', activitiNodeId: 'sid-84FC0D7F-9683-4D63-A2EA-A3ABB2AD10AE_0_null'})-[r:PRECEDES*]->(next)
WHERE ANY (label IN labels(next) WHERE label IN ['Activity', 'End'])
AND SIZE(FILTER(x IN REDUCE(s = [], x IN EXTRACT(n IN NODES(p) | LABELS(n)) | s + x) WHERE x = 'Activity' OR x = 'End')) < 3
RETURN p
I use neo4j 3.2 with cypher.default_language_version=3.1, because of this issue.
Could anybody explain this Cypher behavior to me?
WHERE NOT (cur)-[:PRECEDES*]->(:Activity)-[:PRECEDES*]->(next) actually means "where there is no path between cur and next that contains an Activity node".
Whereas your expected result contains 3 such intermediate Activity nodes, the actual result has no intermediate Activity nodes.
[UPDATED]
The query below should "find all paths where there's no node of type Activity anywhere in path between cur and next". The NONE function is used to filter out all paths that have intermediate Activity nodes. Notice that I also simplified the label tests.
MATCH p =(cur:Activity {projectId: '13', activitiNodeId: 'sid-84FC0D7F-9683-4D63-A2EA-A3ABB2AD10AE_0_null'})-[:PRECEDES*]->(next)
WHERE
(next:Activity OR next:End) AND
NONE(n IN NODES(p)[1..-1] WHERE n:Activity)
RETURN p;
The results of the above query will not match your apparently erroneous "expected" graph, since your expected graph contains 3 paths that do have intermediate Activity nodes.
Here is the breakdown of your query
// Matches start_node | path of 1+ relationships | end_node
MATCH p =(cur:Activity {projectId: '13', activitiNodeId: 'sid-84FC0D7F-9683-4D63-A2EA-A3ABB2AD10AE_0_null'})-[r:PRECEDES*]->(next)
// Where end_node is an Activity or End
WHERE ANY (label IN labels(next) WHERE label IN ['Activity', 'End'])
// And there exists no path from start to end that has an activity between them
AND NOT (cur)-[:PRECEDES*]->(:Activity)-[:PRECEDES*]->(next)
RETURN p
Note that the last check is "if A path exists" not "if this path contains". Because A path exists to the far node, all paths to it get filtered out.
In your corrected query, you are actually checking "if any node in THIS path is" (the iteration of the path p variable) so that is why it has different results.
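The difference between "a path exists" and "this path contains" can be illustrated with a small plain-Python model. The graph, the node names, and the helper functions below are all hypothetical. They only mimic the two predicates and are not how Neo4j evaluates them.

```python
def all_paths(graph, src, dst, path=None):
    """Yield every simple path from src to dst that has at least one edge."""
    path = path or [src]
    for nxt in graph.get(path[-1], []):
        if nxt == dst:
            yield path + [nxt]
        elif nxt not in path:
            yield from all_paths(graph, src, dst, path + [nxt])

# Hypothetical graph: one route via a Gateway (g1), one via an Activity (act).
graph = {"cur": ["g1", "act"], "g1": ["next"], "act": ["next"]}
activities = {"cur", "act", "next"}

def has_inner_activity(path):
    # True if any node strictly between the endpoints is an Activity.
    return any(node in activities for node in path[1:-1])

paths = list(all_paths(graph, "cur", "next"))

# Pattern predicate: if ANY path to `next` has an inner Activity,
# every row ending at `next` is rejected.
pattern_filter = [] if any(has_inner_activity(p) for p in paths) else paths

# NONE(...) over NODES(p): each path is judged on its own nodes.
per_path_filter = [p for p in paths if not has_inner_activity(p)]

print(pattern_filter)   # []
print(per_path_filter)  # [['cur', 'g1', 'next']]
```

The Gateway-only path survives the per-path check but is discarded by the pattern predicate, because another path to the same end node passes through an Activity.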
I have a dataset containing 3M nodes and more than 5M relationships. There are about 8 different relationship types. Now I want to return 2 nodes if they are interconnected. Here the 2 nodes are A & B, and I would like to see if they are interconnected.
MATCH (n:WCD_Ent)
USING INDEX n:WCD_Ent(WCD_NAME)
WHERE n.WCD_NAME = "A"
MATCH (m:WCD_Ent)
USING INDEX m:WCD_Ent(WCD_NAME)
WHERE m.WCD_NAME = "B"
MATCH (n) - [r*] - (m)
RETURN n,r,m
This gives me Java Heap Space error.
Another condition I want to put in my query is that the path between the 2 nodes A & B contains one particular relationship type (NAME_MATCH) at least once. Could you help me with that?
Gabor's suggestion is the most important fix; you are blowing up heap space because you are generating a cartesian product of rows to start, then filtering out using the pattern. Generate rows using the pattern and you'll be much more space efficient. If you have an index on WCD_Ent(WCD_NAME), you don't need to specify the index, either; this is something you only do if your query is running very slow and a PROFILE shows that the query planner is skipping the index. Try this one instead:
MATCH (n:WCD_Ent { WCD_NAME: "A" })-[r*..5]-(m:WCD_Ent { WCD_NAME: "B" })
WHERE ANY(rel IN r WHERE TYPE(rel) = 'NAME_MATCH')
RETURN n, r, m
The WHERE filter here will check all of the relationships in r (which is a collection, the way you've assigned it) and ensure that at least 1 of them matches the desired type.
Tore's answer (including the variable relationship upper bound) is the best one for finding whether two nodes are connected and if a certain relationship exists in a path connecting them.
One weakness with most of the solutions given so far is that there is no limitation on the variable relationship match, meaning the query is going to crawl your entire graph attempting to match on all possible paths, instead of only checking that one such path exists and then stopping. This is likely the cause of your heap space error.
Tore's suggestion to add an upper bound on the variable-length relationships in your match is a great solution, as it also helps out in cases where the two nodes aren't connected, preventing you from having to crawl the entire graph. In all cases, the upper bound should prevent the heap from blowing up.
Here are a couple more possibilities. I'm leaving off the relationship upper bound, but that can easily be added in if needed.
// this one won't check for the particular relationship type in the path
// but doesn't need to match on all possible paths, just find connectedness
MATCH (n:WCD_Ent { WCD_NAME: "A" }), (m:WCD_Ent { WCD_NAME: "B" })
RETURN EXISTS((n)-[*]-(m))
// using shortestPath() will only give you a single path back that works
// however WHERE ANY may be a filter to apply after matches are found
// so this may still blow up, not sure
MATCH (n:WCD_Ent { WCD_NAME: "A" }), (m:WCD_Ent { WCD_NAME: "B" })
MATCH p = shortestPath((n)-[r*]-(m))
WHERE ANY(rel IN r WHERE TYPE(rel) = 'NAME_MATCH')
RETURN p
// Adding LIMIT 1 will only return one path result
// Unsure if this will prevent the heap from blowing up though
// The performance and outcome may be identical to the above query
MATCH (n:WCD_Ent { WCD_NAME: "A" }), (m:WCD_Ent { WCD_NAME: "B" })
MATCH (n)-[r*]-(m)
WHERE ANY(rel IN r WHERE TYPE(rel) = 'NAME_MATCH')
RETURN n, r, m
LIMIT 1
Some enhancements:
Instead of the WHERE condition, you can bind the property value inside the pattern.
You can combine the three MATCH clauses into a single one, which makes sure that the query engine will not calculate a Cartesian product of n and m. (You can also use EXPLAIN to visualize the query plan and check this.)
The resulting query:
MATCH (n:WCD_Ent { WCD_NAME: "A" })-[r*]-(m:WCD_Ent { WCD_NAME: "B" })
RETURN n, r, m
Update: Tore Eschliman pointed out that you don't need to specify the indices, so I removed these two lines from the query:
USING INDEX n:WCD_Ent(WCD_NAME)
USING INDEX m:WCD_Ent(WCD_NAME)
Let's say we have nodes that have an array property.
Node 1
fruits = ['apple','mango']
Node 2
fruits = ['apple']
Node 3
fruits = ['tomato']
and we want to find all nodes wherein one of their fruits exists in Maria's basket.
Maria's basket = ['orange','grape','apple']
So our end result would be : Node 1 and Node 2.
My approach would be to match all nodes that have at least one element of their fruits array in Maria's basket, but I couldn't get it to work:
match (n) where x in n.fruits in ['orange','grape','apple'] return n
I tried the query above, and it returns a syntax error since x is not defined. How do we properly approach this problem?
The second approach I'm thinking of is to match a node if a UNION exists between the node's fruits and Maria's fruits.
If you want to find nodes where exactly one fruit matches:
MATCH (n)
WHERE single(x IN n.fruits WHERE x IN ['orange', 'grape', 'apple'])
RETURN n;
If you want to find nodes where >= 1 fruits match:
MATCH (n)
WHERE any(x IN n.fruits WHERE x IN ['orange', 'grape', 'apple'])
RETURN n;
I wasn't sure which one you wanted based on your wording.
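To show how the two predicates differ, here is a small plain-Python model of them. The three original nodes come from the question. "Node 4" is a hypothetical addition with two matching fruits, included only so that the two results differ.

```python
basket = {"orange", "grape", "apple"}

nodes = {
    "Node 1": ["apple", "mango"],
    "Node 2": ["apple"],
    "Node 3": ["tomato"],
    "Node 4": ["apple", "grape"],  # hypothetical node, not in the question
}

def match_count(fruits):
    # Number of elements in the node's list that also appear in the basket.
    return sum(1 for fruit in fruits if fruit in basket)

# Models any(...): at least one fruit matches.
any_match = [name for name, fruits in nodes.items() if match_count(fruits) >= 1]

# Models single(...): exactly one fruit matches.
single_match = [name for name, fruits in nodes.items() if match_count(fruits) == 1]

print(any_match)     # ['Node 1', 'Node 2', 'Node 4']
print(single_match)  # ['Node 1', 'Node 2']
```

With only the three nodes from the question, both predicates return Node 1 and Node 2, so the choice matters only when a node can share more than one fruit with the basket.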