I am trying to find the shortest path between two nodes but I need to exclude some nodes from the path. The cypher that I am trying is
START a=node(1), b=node(2), c=node(3,4)
MATCH p=a-[rel:RELATIONSHIP*]-b
WHERE NOT (c in nodes(p))
RETURN p
ORDER BY length(p) LIMIT 1
But this is giving me a path which includes one of the nodes in c.
Is there a way to do a traversal excluding some nodes?
Thanks
The MATCH ... WHERE part should be fine, but your START clause may not do what you expect. Do the following and consider the result
START a=node(1), b=node(2), c=node(3,4)
RETURN ID(a), ID(b), ID(c)
You get back
==> +-----------------------+
==> | ID(a) | ID(b) | ID(c) |
==> +-----------------------+
==> | 1 | 2 | 3 |
==> | 1 | 2 | 4 |
==> +-----------------------+
That means the rest of the query is executed twice, once excluding (3) from the path and once excluding (4). But that also means of course that it is run once not excluding each of them, which means you can indeed get results with those nodes present on the path.
If you want to exclude both of those nodes from the path, try collecting them and filtering with NONE or NOT ANY or similar. I think something like this should do it (can't test at the moment).
START a=node(1), b=node(2), c=node(3,4)
WITH a, b, collect (c) as cc
MATCH p=a-[rel:RELATIONSHIP*]-b
WHERE NONE (n IN nodes(p) WHERE n IN cc)
RETURN p
ORDER BY length(p) LIMIT 1
Related
In neo4j my database consists of chains of nodes. For each distinct stucture/layout (does graph theory has a better word?), I want to count the number of chains. For example, the database consists of 9 nodes and 5 relationships as this:
(:a)->(:b)
(:b)->(:a)
(:a)->(:b)
(:a)->(:b)->(:b)
where (:a) is a node with label a. Properties on nodes and relationships are irrelevant.
The result of the counting should be:
------------------------
| Structure | n |
------------------------
| (:a)->(:b) | 2 |
| (:b)->(:a) | 1 |
| (:a)->(:b)->(:b) | 1 |
------------------------
Is there a query that can achieve this?
Appendix
Query to create test data:
create (:a)-[:r]->(:b), (:b)-[:r]->(:a), (:a)-[:r]->(:b), (:a)-[:r]->(:b)-[:r]->(:b)
EDIT:
Thanks for the clarification.
We can get the equivalent of what you want, a capture of the path pattern using the labels present:
MATCH path = (start)-[*]->(end)
WHERE NOT ()-->(start) and NOT (end)-->()
RETURN [node in nodes(path) | labels(node)[0]] as structure, count(path) as n
This will give you a list of the labels of the nodes (the first label present for each...remember that nodes can be multi-labeled, which may throw off your results).
As for getting it into that exact format in your example, that's a different thing. We could do this with some text functions in APOC Procedures, specifically apoc.text.join().
We would need to first add formatting around the extraction of the first label to add the prefixed : as well as the parenthesis. Then we could use apoc.text.join() to get a string where the nodes are joined by your desired '->' symbol:
MATCH path = (start)-[*]->(end)
WHERE NOT ()-->(start) and NOT (end)-->()
WITH [node in nodes(path) | labels(node)[0]] as structure, count(path) as n
RETURN apoc.text.join([label in structure | '(:' + label + ')'], '->') as structure, n
I have nodes in a chain, like this:
(n {height:100})
|
(n)
|
(n)
|
(n)
|
(n)
I can get these nodes with this cypher query:
MATCH chain=(start :n {height:100})-[:chain*4]->(end :n)
RETURN chain
However, each node in this chain also has a single node coming off it with a specific relationship, like this:
(n)-[:single]->(o)
|
(n)-[:single]->(o)
|
(n)-[:single]->(o)
|
(n)-[:single]->(o)
|
(n)-[:single]->(o)
I would like to return each (n), as well as the (o) coming off it.
Is it possible to do this in one cypher query?
Shouldn't be a problem, though this is easier if we don't match on the path, but get all nodes in the chain (and the single node off of each) instead.
MATCH (start :n {height:100})-[rels:chain*0..4]->(chainlink :n)-[:single]->(o)
RETURN chainlink, o
ORDER BY SIZE(rels)
Okay, it seems the use of WITH(chain) and UNWIND does the trick:
MATCH chain=(start :n {height:100})-[:chain*4]->(end :n)
WITH NODES(chain) AS nodes
UNWIND nodes as node
OPTIONAL MATCH (node)-[:single]->(o :o)
RETURN nodes, COLLECT(o) as os
I'm using neo4j 2.1.7 Recently i was experimenting with Match queries, searching for nodes with several labels. And i found out, that generally query
Match (p:A:B) return count(p) as number
and
Match (p:B:A) return count(p) as number
works different time, extremely in cases when you have for example 2 millions of Nodes A and 0 of Nodes B.
So do labels order effects search time? Is this future is documented anywhere?
Neo4j internally maintains a labelscan store - that's basically a lookup to quickly get all nodes carrying a definied label A.
When doing a query like
MATCH (n:A:B) return count(n)
labelscanstore is used to find all A nodes and then they're filtered if those nodes carry label B as well. If n(A) >> n(B) it's way more efficient to do MATCH (n:B:A) instead since you look up only a few B nodes and filter those for A.
You can use PROFILE MATCH (n:A:B) return count(n) to see the query plan. For Neo4j <= 2.1.x you'll see a different query plan depending on the order of the labels you've specified.
Starting with Neo4j 2.2 (milestone M03 available as of writing this reply) there's a cost based Cypher optimizer. Now Cypher is aware of node statistics and they are used to optimize the query.
As an example I've used the following statements to create some test data:
create (:A:B);
with 1 as a foreach (x in range(0,1000000) | create (:A));
with 1 as a foreach (x in range(0,100) | create (:B));
We have now 100 B nodes, 1M A nodes and 1 AB node. In 2.2 the two statements:
MATCH (n:B:A) return count(n)
MATCH (n:A:B) return count(n)
result in the exact same query plan (and therefore in the same execution speed):
+------------------+---------------+------+--------+-------------+---------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+------------------+---------------+------+--------+-------------+---------------+
| EagerAggregation | 3 | 1 | 0 | count(n) | |
| Filter | 12 | 1 | 12 | n | hasLabel(n:A) |
| NodeByLabelScan | 12 | 12 | 13 | n | :B |
+------------------+---------------+------+--------+-------------+---------------+
Since there are only few B nodes, it's cheaper to scan for B's and filter for A. Smart Cypher, isn't it ;-)
how do I express the following in Cypher
"Return all nodes with at least one incoming edge of type A and no outgoing edges".
Best Regards
You can use a pattern to exclude nodes from the result subset like this:
MATCH ()-[:A]->(n) WHERE NOT (n)-->() RETURN n
Try
MATCH (n)
WHERE ()-[:A]->n AND NOT n-->()
RETURN n
or
MATCH ()-[:A]->(n)
WHERE NOT n-->()
RETURN DISTINCT n
Edit
Pattern expressions can be used both for pattern matching and as predicates for filtering. If used in the MATCH clause, the paths that answer the pattern are included in the result. If used for filtering, in the WHERE clause, the pattern serves as a limiting condition on the paths that have previously been matched. The result is limited, not extended to include the filter condition. When a pattern is used as a predicate for filtering, the negation of that predicate is also a predicate that can be used as a filter condition. No path answers to the negation of a pattern (if there is such a thing) so negations of patterns cannot be used in the MATCH clause. The phrase
Return all nodes with at least one incoming edge of type A and no outgoing edges
involves two patterns on nodes n, namely any incoming relationship [:A] on n and any outgoing relationship on n. The second must be interpreted as a pattern for a predicate filter condition since it involves a negation, not any outgoing relationship on n. The first, however, can be interpreted either as a pattern to match along with n, or as another pattern predicate filter condition.
These two interpretations give rise to the two cypher queries above. The first query matches all nodes and uses both patterns to filter the result. The second matches the incoming relationship on n along with n and uses the second pattern to filter the results.
The first query will match every node only once before the filtering happens. It will therefore return one result item per node that meets the criteria. The second query will match the pattern any incoming relationship [:A] on n once for each path, i.e. once for each incoming relationship on n. It may therefore contain a node multiple times in the result, hence the DISTINCT keyword to remove doubles.
If the items of interest are precisely the nodes, then using both patterns for predicates in the WHERE clause seems to me the correct interpretation. It is also more efficient since it needs to find only zero or one incoming [:A] on n to resolve the predicate. If the incoming relationships are also of interest, then some version of the second query is the right choice. One would need to bind the relationship and do something useful with it, such as return it.
Below are the execution plans for the two queries executed on a 'fresh' neo4j console.
First query:
----
Filter
|
+AllNodes
+----------+------+--------+-------------+------------------------------------------------------------------------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------+------+--------+-------------+------------------------------------------------------------------------------------------------------------------------------+
| Filter | 0 | 0 | | (nonEmpty(PathExpression((17)-[ UNNAMED18:A]->(n), true)) AND NOT(nonEmpty(PathExpression((n)-[ UNNAMED36]->(40), true)))) |
| AllNodes | 6 | 7 | n, n | |
+----------+------+--------+-------------+------------------------------------------------------------------------------------------------------------------------------+
Second query:
----
Distinct
|
+Filter
|
+TraversalMatcher
+------------------+------+--------+-------------+--------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+------------------+------+--------+-------------+--------------------------------------------------------------+
| Distinct | 0 | 0 | | |
| Filter | 0 | 0 | | NOT(nonEmpty(PathExpression((n)-[ UNNAMED30]->(34), true))) |
| TraversalMatcher | 0 | 13 | | n, UNNAMED8, n |
+------------------+------+--------+-------------+--------------------------------------------------------------+
Given a Neo4J Node with an array property, how do I create a Cypher query to return only the node(s) that match an array literal?
Using the console I created a node with the array property called "list":
neo4j-sh (0)$ create n = {list: [1,2,3]};
==> +-------------------+
==> | No data returned. |
==> +-------------------+
==> Nodes created: 1
==> Properties set: 1
==> 83 ms
neo4j-sh (0)$ start n=node(1) return n;
==> +-----------------------+
==> | n |
==> +-----------------------+
==> | Node[1]{list:[1,2,3]} |
==> +-----------------------+
==> 1 row
==> 1 ms
However, my query does not return the Node that was just created given a WHERE clause that matches an array literal:
neo4j-sh (0)$ start n=node(1) where n.list=[1,2,3] return n;
==> +---+
==> | n |
==> +---+
==> +---+
==> 0 row
==> 0 ms
It's entirely possible I'm mis-using Cypher. Any tips on doing exact array property matching in Cypher would be helpful.
The console is always running the latest SNAPSHOT builds of Neoj4. the version refers to the Cypher Syntax parswer, we will point that out more clearly :)
Now, there has been some fixing around the Array handling in Cypher, see https://github.com/neo4j/community/pull/815 and https://github.com/neo4j/community/issues/818 which problably are the ones that make the console work. This has been merged in after 1.8.M07, so in order to get it work locally, please download one of the latest 1.8.-SNAPSHOT, build it from GITHUB or wait for 1.8.M08 which is due very soon.
/peter
Which version of Neo4j are you using?
Your same code works for me in 1.8M07.
http://console.neo4j.org/?id=p9cy6l
Update:
I get the same result (no results) in a local install via the web client. Maybe it's a web client issue?