Unexpected behavior combining collections in Cypher

Using http://console.neo4j.org as a sandbox, I have come across the following unexpected behavior:
Statement 1 - Returns 1 row with a collection containing Neo Node
MATCH (n:Crew)
WHERE n.name="Neo"
WITH COLLECT(n) AS c1
WITH c1+[] AS c2
RETURN c2
Statement 2 - Returns 0 rows (unexpected)
MATCH (n:Crew)
WHERE n.name="Neo"
WITH COLLECT(n) AS c1
MATCH (n:Crew)
WHERE n.name="NoOne"
WITH c1+COLLECT(n) AS c2
RETURN c2
Statement 3 - Returns 1 row containing an empty collection
MATCH (n:Crew)
WHERE n.name="NoOne"
WITH COLLECT(n) AS c1
RETURN c1
I fail to see why Statement 2 is not returning the same result as Statement 1, because it should return a collection containing the Neo node, just like in Statement 1.
Statement 3 shows that the second MATCH in Statement 2 should be resulting in an empty collection.
Is this behavior expected in Cypher? If that's the case, I'd be happy about a small explanation to help me understand this behavior.

I've run into this exact behavior before, and it is very frustrating. The issue is with the second MATCH clause in Statement 2: if an existing result row (in this case, your single row with c1) produces no matches for a MATCH, that row is dropped completely after that MATCH clause, even though the same MATCH on its own (without the pre-existing result row) would aggregate to an empty collection. If you convert it to an OPTIONAL MATCH, you keep your result row when there are no matches.
UPDATE: See below for a more thorough analysis, but the tl;dr is that the second COLLECT(n) in Statement 2 would collect an empty list, just like in Statement 3; however, the whole clause WITH c1+COLLECT(n) AS c2 returns no rows, because c1 acts as a grouping key for the aggregation and there are no rows with a c1 value left after the second MATCH.
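
The row-dropping behavior described above can be illustrated with a small Python model. This is only a sketch of the semantics, not Neo4j code: the CREW list and the helpers match, collect, and statement_2 are made up for illustration.

```python
# Toy model of Cypher's row pipeline (illustration only, not Neo4j code).
CREW = ["Neo", "Trinity", "Morpheus"]  # hypothetical sample data


def match(rows, name, optional=False):
    """For every incoming row, emit one output row per matching crew member."""
    out = []
    for row in rows:
        hits = [n for n in CREW if n == name]
        if not hits and optional:
            hits = [None]  # OPTIONAL MATCH keeps the row and binds n to null
        for n in hits:
            out.append({**row, "n": n})
    return out


def collect(rows, keys=()):
    """COLLECT(n) grouped by `keys`; returns (key, collected) pairs."""
    # Without grouping keys there is always exactly one group, even for
    # zero input rows. With grouping keys, zero rows means zero groups.
    groups = {} if keys else {(): []}
    for r in rows:
        key = tuple(r[k] for k in keys)
        groups.setdefault(key, [])
        if r["n"] is not None:  # collect() skips nulls
            groups[key].append(r["n"])
    return [(key, tuple(items)) for key, items in groups.items()]


def statement_2(optional):
    rows = match([{}], "Neo")          # MATCH (n:Crew) WHERE n.name="Neo"
    c1 = collect(rows)[0][1]           # WITH COLLECT(n) AS c1 -> one row
    rows = match([{"c1": c1}], "NoOne", optional)
    # WITH c1+COLLECT(n) AS c2: c1 is the grouping key of the aggregation
    return [key[0] + items for key, items in collect(rows, keys=("c1",))]


print(statement_2(optional=False))  # plain MATCH: no rows survive
print(statement_2(optional=True))   # OPTIONAL MATCH: the c1 row survives
```

In this model the plain MATCH variant yields an empty result, while the OPTIONAL MATCH variant yields one row containing only Neo.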

I can't quite think of the right explanation for why the 2nd query doesn't do what you expect, but if you have multiple optional matches that you want to chain together then you could use an OPTIONAL MATCH to do that:
OPTIONAL MATCH (n:Crew)
WHERE n.name="Neo"
WITH COLLECT(n) AS c1
OPTIONAL MATCH (n:Crew)
WHERE n.name="NoOne"
WITH c1+COLLECT(n) AS c2
RETURN c2

Related

Unexpected relation and node COUNT in Neo4j

Whenever I use a query to count a specific node, I always get a number greater than 1, even though only one node of that type exists.
Sample query:
MATCH (p)-[rel]->(v:myDistinctNode) RETURN COUNT(v)
Output: 80
MATCH (p)-[rel]->(v:myDistinctNode) RETURN COUNT(DISTINCT v)
Output: 1
I see different results when using DISTINCT, but I cannot use DISTINCT all the time. Why am I seeing this, and how can I avoid it? Thanks!
Neo4j Kernel-Version: 3.5.14

The short answer is that you need to use a collect statement to make it work.
MATCH (p)-[rel]->(v:myDistinctNode) WITH collect(v) AS nodes RETURN count(nodes)
This should return one, because collect(v) aggregates all matched rows into a single row, and count(nodes) then counts that one row.
I'm not a Cypher expert, but I believe the reason your query doesn't work is that the Cypher result is more like a table where one column holds p, another holds rel, and the last holds v. Even though v is a unique entity, there are still 80 rows that have v.
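
A rough way to picture this table view is the Python sketch below. The 80 rows and their p/rel values are invented to mirror the numbers in the question; nothing here talks to Neo4j.

```python
# Hypothetical result table: 80 relationships that all end at the same node v.
rows = [{"p": f"p{i}", "rel": f"r{i}", "v": "v"} for i in range(80)]

count_v = len([r["v"] for r in rows if r["v"] is not None])  # COUNT(v)
count_distinct_v = len({r["v"] for r in rows})               # COUNT(DISTINCT v)

collected = [[r["v"] for r in rows]]  # WITH collect(v) AS nodes -> a single row
count_nodes = len(collected)          # count(nodes) counts that single row

print(count_v, count_distinct_v, count_nodes, len(collected[0]))
```

Note that the collected list itself still holds 80 entries; only the number of rows has dropped to one.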

How to stop Neo4J Cypher from processing an empty collection?

I have this kind of Neo4j Cypher query:
MATCH (c1:Concept)
WHERE c1.name in (['word'])
WITH COLLECT(distinct c1) as concepts
MATCH (ctx:Context)
WHERE ALL(c in concepts
WHERE (c)-->(ctx) AND ((ctx.by) = '15229100-b20e-11e3-80d3-6150cb20a1b9'))
RETURN ctx
If there is a c1 with the name word, it gets processed fine and I get acceptable results.
However, if there is no c1 with the name word, an empty collection is returned. The query still continues processing it, and I just get all the ctx:Context nodes that satisfy the ctx.by criteria, which is not right.
How to fix that in the request?

Aggregations (alone, without any non-aggregation variables as grouping keys) will succeed even when there are no rows, emitting a single row with the result, which will allow further processing, since there is a row to operate on.
To get the behavior you want, add a filter after the aggregation to ensure you have a non-empty list. This will ensure that if the list is empty, rows go to 0 and the subsequent operations won't take place:
MATCH (c1:Concept)
WHERE c1.name in (['word'])
WITH COLLECT(distinct c1) as concepts
WHERE size(concepts) <> 0
MATCH (ctx:Context)
WHERE ALL(c in concepts
WHERE (c)-->(ctx) AND ((ctx.by) = '15229100-b20e-11e3-80d3-6150cb20a1b9'))
RETURN ctx
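
The reason the unguarded query returns every context is that ALL over an empty list is vacuously true. The Python sketch below mimics that with the built-in all(); the contexts helper, its parameters, and the sample data are hypothetical, and the ctx.by condition is left out to keep it short.

```python
def contexts(concepts, links, all_contexts, guard):
    """Keep each ctx for which ALL collected concepts link to it."""
    if guard and len(concepts) == 0:  # WHERE size(concepts) <> 0
        return []                     # no rows left, nothing more is processed
    return [
        ctx for ctx in all_contexts
        if all((c, ctx) in links for c in concepts)  # true when concepts is empty
    ]


links = {("word", "x")}  # hypothetical: concept "word" links to context "x"
print(contexts([], links, ["x", "y"], guard=False))        # every context passes
print(contexts([], links, ["x", "y"], guard=True))         # filtered out early
print(contexts(["word"], links, ["x", "y"], guard=True))   # normal case
```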

cypher distinct is returning duplicate using with parameter

MATCH (c:someNode) WHERE LOWER(c.erpId) contains (LOWER("1"))
OR LOWER(c.constructionYear) contains (LOWER("1"))
OR LOWER(c.label) contains (LOWER("1"))
OR LOWER(c.name) contains (LOWER("1"))
OR LOWER(c.description) contains (LOWER("1"))
with collect(distinct c) as rows, count(c) as total
MATCH (c:someNode)-[adtype:OFFICIAL_someNode_ADDRESS]->(ad:anotherObject)
WHERE toString(ad.streetAddress) contains "1"
OR toString(ad.postalCity) contains "1"
with distinct rows+collect( c) as rows, count(c) +total as total
UNWIND rows AS part
RETURN part order by part.name SKIP 20 Limit 20
When I run the Cypher query above, it returns duplicate results. Also, the SKIP does not seem to work. What am I doing wrong?

When you use WITH DISTINCT a, b, c (or RETURN DISTINCT a, b, c), that just means that you want each resulting record ({a: ..., b: ..., c: ...}) to be distinct -- it does not affect in any way the contents of any lists that may be part of a, b, or c.
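
As a loose analogy in Python (the record values are made up), deduplicating whole records leaves the lists inside them untouched:

```python
# Two identical records, each holding a list that itself contains a duplicate.
records = [
    {"rows": ("a", "a", "b"), "total": 3},
    {"rows": ("a", "a", "b"), "total": 3},
]

# WITH DISTINCT rows, total: identical records collapse into one ...
distinct_records = list(dict.fromkeys((r["rows"], r["total"]) for r in records))

# ... but the list inside the surviving record still has "a" twice.
inner_list = distinct_records[0][0]

# Removing duplicates inside the list is a separate step.
deduped = list(dict.fromkeys(inner_list))

print(distinct_records, deduped)
```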
Below is a simplified query that might work for you. It does not use the LOWER() and toString() functions at all, as they appear to be superfluous. It also uses only a single MATCH/WHERE pair to find all the nodes of interest. The pattern comprehension syntax is used as part of the WHERE clause to get a non-empty list of true value(s) iff there are any anotherObject node(s) of interest. Notice that DISTINCT is not needed.
MATCH (c:someNode)
WHERE
ANY(
x IN [c.erpId, c.constructionYear, c.label, c.name, c.description]
WHERE x CONTAINS "1") OR
[(c)-[:OFFICIAL_someNode_ADDRESS]->(ad:anotherObject)
WHERE ad.streetAddress CONTAINS "1" OR ad.postalCity CONTAINS "1"
| true][0]
RETURN c AS part
ORDER BY part.name SKIP 20 LIMIT 20;

Neo4j indices slow when querying across 2 labels

I've got a graph where each node has label either A or B, and an index on the id property for each label:
CREATE INDEX ON :A(id);
CREATE INDEX ON :B(id);
In this graph, I want to find the node(s) with id "42", but I don't know the label a priori. To do this, I am executing the following query:
MATCH (n {id:"42"}) WHERE (n:A OR n:B) RETURN n;
But this query takes 6 seconds to complete. However, doing either of:
MATCH (n:A {id:"42"}) RETURN n;
MATCH (n:B {id:"42"}) RETURN n;
Takes only ~10ms.
Am I not formulating my query correctly? What is the right way to formulate it so that it takes advantage of the installed indices?

Here is one way to use both indices. result will be a collection of matching nodes.
OPTIONAL MATCH (a:B {id:"42"})
OPTIONAL MATCH (b:A {id:"42"})
RETURN
(CASE WHEN a IS NULL THEN [] ELSE [a] END) +
(CASE WHEN b IS NULL THEN [] ELSE [b] END)
AS result;
You should use PROFILE to verify that the execution plan for your neo4j environment uses the NodeIndexSeek operation for both OPTIONAL MATCH clauses. If not, you can use the USING INDEX clause to give a hint to Cypher.

You should use UNION to make sure that both indexes are used. In your question you almost had the answer.
MATCH (n:A {id:"42"}) RETURN n
UNION
MATCH (n:B {id:"42"}) RETURN n
;
This will work. To check whether the indexes are used, put PROFILE or EXPLAIN in front of your query statement.

Indexes are created and used via a node label and property, and to use them you need to form your query the same way. That means queries without a label will scan all nodes, which gives the results you got.

Cypher query doesn't return all the expected nodes

I have this graph:
A<-B->C
B is the root of a tiny tree. There is exactly one relation between A and B, and one between B and C.
When I run the following, one node is returned. Why does this Cypher query not return the A and C nodes?
MATCH(a {name:"A"})<-[]-(rewt)-[]->(c) RETURN c
It would seem to be that the first half of that query would find the root, and the second half would find both child nodes.
Until a few minutes ago, I would have thought it logically identical to the following query which works. What's the difference?
MATCH (a {name:"A"})<-[]-(rewt)
MATCH (rewt)-[]->(c)
RETURN c
EDIT for cybersam
I have abstracted my database so we could discuss my specific issue. Now we still have a tiny tree, but there are 4 nodes that are children of the root. (Sorry this is different, but I'm developing and don't want to change my environment too much.)
This query returns all 4:
match(a)<-[]-(b:ROOT)-[]->(c) return c
One of them has a name of "dddd"...
match(a {name:"dddd"})<-[]-(b:ROOT)-[]->(c) return c
This query only returns three of them. "dddd" is not included. omg.
To answer cybersam's specific question, this query:
MATCH (a {name:"dddd"})<--(rewt:CODE_ROOT)
MATCH (rewt)-->(c)
RETURN a = c;
Returns four rows. The values are true, false, false, false

[UPDATED]
There is a difference between your 2 queries. Within a single MATCH clause, the same relationship is never matched more than once, so any candidate match that would reuse a relationship is filtered out.
Therefore, your first query would filter out all matches where the left-side relationship is the same as the right-side relationship:
MATCH(a {name:"A"})<--(rewt)-->(c)
RETURN c;
Your second query would allow the 2 relationships to be the same, since the relationships are found by 2 separate MATCH clauses:
MATCH (a {name:"A"})<--(rewt)
MATCH (rewt)-->(c)
RETURN c;
If I am right, then the following query should return N rows (where N is the number of outgoing relationships from rewt) and only one value should be true:
MATCH (a {name:"A"})<--(rewt)
MATCH (rewt)-->(c)
RETURN a = c;
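
The difference between the two forms can be mimicked in Python. This is a sketch of the uniqueness rule only: the RELS triples and the relationship ids r1 and r2 are invented stand-ins for the A<-B->C graph in the question.

```python
# Hypothetical graph A <- B -> C, stored as (start, rel_id, end) triples.
RELS = [("B", "r1", "A"), ("B", "r2", "C")]


def single_match(a_name):
    """MATCH (a)<--(rewt)-->(c): both relationships of the pattern must differ."""
    return [
        end2
        for (start1, rel1, end1) in RELS if end1 == a_name
        for (start2, rel2, end2) in RELS if start2 == start1 and rel2 != rel1
    ]


def two_matches(a_name):
    """MATCH (a)<--(rewt) MATCH (rewt)-->(c): no uniqueness across clauses."""
    return [
        end2
        for (start1, rel1, end1) in RELS if end1 == a_name
        for (start2, rel2, end2) in RELS if start2 == start1
    ]


print(single_match("A"))  # only C: the relationship to A cannot be reused
print(two_matches("A"))   # A and C: the second MATCH may reuse r1
```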

Both work just fine for me. I've tried on 2.3.0 Community.
Do you mind posting your CREATE command?

In each MATCH clause, each relationship will be matched only once. See http://neo4j.com/docs/stable/cypherdoc-uniqueness.html for reference.
See this related question as well: What does a comma in a Cypher query do?
