In Neo4j 2.0 this query:
MATCH (n) WHERE n.username = 'blevine'
OPTIONAL MATCH n-[:Person]->person
OPTIONAL MATCH n-[:UserLink]->role
RETURN n AS user,person,collect(role) AS roles
returns different results than this query:
START n = node(*) WHERE n.username = 'blevine'
OPTIONAL MATCH n-[:Person]->person
OPTIONAL MATCH n-[:UserLink]->role
RETURN n AS user,person,collect(role) AS roles
The first query works as expected returning a single Node for 'blevine' and the associated Nodes mentioned in the OPTIONAL MATCH clauses. The second query returns many more Nodes which do not even have a username property. I realize that start n = node(*) is not recommended and that START is not even required in 2.0. But the second form (with OPTIONAL MATCH replaced with question marks on the relationship type) worked prior to 2.0. In the second form, why is 'n' not being constrained to the single 'blevine' node by the first WHERE clause?
To make the second query behave as expected, you just need to add WITH n after the START ... WHERE. The filtered result has to be passed explicitly to the OPTIONAL MATCH clauses, and that is what WITH does:
START n = node(*) WHERE n.username = 'blevine'
WITH n
OPTIONAL MATCH n-[:Person]->person
OPTIONAL MATCH n-[:UserLink]->role
RETURN n AS user,person,collect(role) AS roles
From the documentation
WHERE defines the MATCH patterns in more detail. The predicates are part of the
pattern description, not a filter applied after the matching is done.
This means that WHERE should always be put together with the MATCH clause it belongs to.
When you write START n=node(*) WHERE n.name="xyz", you need to pass the result explicitly into the OPTIONAL MATCH clauses that follow. When you write MATCH (n) WHERE n.name="xyz", the predicate is part of the pattern, so it tells the graph exactly which node to start from.
EDIT
Here is the thing. The documentation says OPTIONAL MATCH returns null if a pattern is not found, so in the START version without WITH the result also includes rows where the n.username property is null, or where n doesn't even have the relationships named in the OPTIONAL MATCH patterns. When you add WITH n, the graph is explicitly told to use only the filtered n.
Excerpt from the documentation:
OPTIONAL MATCH matches patterns against your graph database, just like MATCH does.
The difference is that if no matches are found, OPTIONAL MATCH will use NULLs for
missing parts of the pattern. OPTIONAL MATCH could be considered the Cypher
equivalent of the outer join in SQL.
Either the whole pattern is matched, or nothing is matched. Remember that
WHERE is part of the pattern description, and the predicates will be
considered while looking for matches, not after. This matters especially
in the case of multiple (OPTIONAL) MATCH clauses, where it is crucial to
put WHERE together with the MATCH it belongs to.
The documentation also has a few more things to note about the behaviour of the WHERE clause.
Excerpts:
WHERE is not a clause in it’s own right — rather, it’s part of MATCH,
OPTIONAL MATCH, START and WITH.
In the case of WITH and START, WHERE simply filters the results.
For MATCH and OPTIONAL MATCH on the other hand, WHERE adds constraints
to the patterns described. It should not be seen as a filter after the
matching is finished.
Related
Quite new to Neo4j/Cypher. I'm trying to return a property that is reached by 2 different paths, depending on the label, in this case the label of (n).
MATCH (k:KeyNode)<-[:BASED_ON]-(n)-[:CONTROLS|:MODIFIES]->()
WHERE id(k)=123456
//if label(n) = LabelA
OPTIONAL MATCH (n)<-[:LABEL_A_REL]-(c:Controller)-[:CONTROLS]->(r:Resource)-[:TYPE_OF]->(rt:ResourceType)
//if label(n) = NotLabelA
OPTIONAL MATCH (n)-[:LABEL_NOT_A_REL]->(r:Resource)-[:TYPE_OF]->(rt:ResourceType)
OPTIONAL MATCH (r)-[:PARENT*]->(ro:Room)
RETURN ID(r) AS resourceId, ID(ro) AS siteId, ID(rt) AS resourceTypeId
As is, the path defined in the 1st OPTIONAL MATCH and its nodes take precedence, leaving the node redefinitions in the 2nd OPTIONAL MATCH untouched, I assume because Cypher won't redefine a variable. The goal is to get (r) and (rt) found on either of the 2 possible paths.
I considered using a CASE WHEN structure, but from the documentation I see only the option to return single properties, not multiple (though I could be wrong).
This could be an approach:
MATCH (k:KeyNode)<-[:BASED_ON]-(n)-[:CONTROLS|:MODIFIES]->()
WHERE id(k)=123456
OPTIONAL MATCH (n)<-[:LABEL_A_REL]-(c:Controller)-[:CONTROLS]->(r1:Resource)-[:TYPE_OF]->(rt1:ResourceType)
OPTIONAL MATCH (n)-[:LABEL_NOT_A_REL]->(r2:Resource)-[:TYPE_OF]->(rt2:ResourceType)
// COALESCE to deal with precedence
WITH COALESCE(r1,r2) AS r,
COALESCE(rt1,rt2) AS rt
OPTIONAL MATCH (r)-[:PARENT*]->(ro:Room)
RETURN ID(r) AS resourceId, ID(ro) AS siteId, ID(rt) AS resourceTypeId
Consider the following schema, where orange nodes are of type Person and brown nodes are of type Movie. (This is from the "movies" dataset that is shipped with Neo4j).
The query that I am trying to write goes as follows:
Find all reviewer pairs, one following the other, and return the names
of the two reviewers. If they have both reviewed the same movie,
return the title of the movie as well. Restrict the query so that the first letter of the name of both reviewers is ’J’
Now, consider the following CYPHER query:
MATCH (a:Person)-[:REVIEWED]->(:Movie),
(b:Person)-[:REVIEWED]->(:Movie),
(a:Person)-[:FOLLOWS]->(b:Person)
OPTIONAL MATCH (a:Person)-[:REVIEWED]->(m:Movie)<-[:REVIEWED]-(b:Person)
WHERE a.name STARTS WITH 'J'
AND b.name STARTS WITH 'J'
RETURN DISTINCT a.name, b.name, m.title
This returns the following (incorrect) results:
Why?
What I've gathered so far:
the WHERE applies to the (OPTIONAL) MATCH directly preceding it
the WHERE constraints are considered while looking for matches, not afterwards.
When an OPTIONAL MATCH does not apply fully, null is put for the missing parts of the pattern
I still don't understand why "Angela Scope" shows up in the results. If anything, the predicates should prevent her from ever showing up.
PS: I am aware that the following query returns the correct results
MATCH (a:Person)-[:REVIEWED]->(:Movie),
(b:Person)-[:REVIEWED]->(:Movie),
(a:Person)-[:FOLLOWS]->(b:Person)
WHERE a.name STARTS WITH 'J'
AND b.name STARTS WITH 'J'
OPTIONAL MATCH (a:Person)-[:REVIEWED]->(m:Movie)<-[:REVIEWED]-(b:Person)
RETURN DISTINCT a.name, b.name, m.title
however, I'd like to find out why these two queries return different results and especially why the one mentioned first returns exactly this result.
Sure, you're almost at the answer already:
the WHERE applies to the (OPTIONAL) MATCH directly preceding it
This is important. You should not view the WHERE clause as independent, as it is associated with and modifies the preceding clause. So read it out like MATCH ... WHERE ... and OPTIONAL MATCH ... WHERE ... and WITH ... WHERE ... as a whole.
Remember that an OPTIONAL MATCH will never filter out rows. It will keep existing rows, and for any newly introduced variables, will try to find matches using the pattern provided that passes its WHERE clause. If it doesn't find matches, newly introduced variables will be set to null. And again...no filtering.
So for this snippet:
OPTIONAL MATCH (a:Person)-[:REVIEWED]->(m:Movie)<-[:REVIEWED]-(b:Person)
WHERE a.name STARTS WITH 'J'
AND b.name STARTS WITH 'J'
Angela Scope and Jessica Thompson have a follows relationship between them, and they have reviewed the same movie, The Replacements, but they fail the WHERE clause, since Angela's name doesn't start with a 'J'. Therefore the OPTIONAL MATCH didn't find anything, so the newly introduced variable m will come back as null. Nothing will be filtered.
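To make the "no filtering" point concrete, here is a minimal Python sketch of the row semantics described above. It is a toy model, not how Neo4j executes the query, and the FOLLOWS and REVIEWED data are hypothetical, only loosely based on the names mentioned here.

```python
# Toy model of Cypher row semantics, not Neo4j's implementation.
# Hypothetical data, loosely based on the names mentioned in the answer.
FOLLOWS = [
    ("Angela Scope", "Jessica Thompson"),
    ("James Thompson", "Jessica Thompson"),
]
REVIEWED = {
    "Angela Scope": {"The Replacements"},
    "Jessica Thompson": {"The Replacements", "Jerry Maguire"},
    "James Thompson": {"The Replacements"},
}


def both_start_with_j(a, b):
    return a.startswith("J") and b.startswith("J")


def optional_match_shared_movie(rows, where):
    """Model OPTIONAL MATCH (a)-[:REVIEWED]->(m)<-[:REVIEWED]-(b) WHERE ..."""
    out = []
    for a, b in rows:
        # The predicate is part of the pattern: if it fails, nothing matches.
        movies = sorted(REVIEWED[a] & REVIEWED[b]) if where(a, b) else []
        if movies:
            out.extend((a, b, m) for m in movies)
        else:
            out.append((a, b, None))  # the row survives, m is null
    return out


# WHERE attached to OPTIONAL MATCH: it only decides whether m gets bound.
where_on_optional = optional_match_shared_movie(FOLLOWS, both_start_with_j)

# WHERE attached to MATCH: rows are filtered first, then m is looked up.
filtered = [(a, b) for a, b in FOLLOWS if both_start_with_j(a, b)]
where_on_match = optional_match_shared_movie(filtered, lambda a, b: True)

print(where_on_optional)
print(where_on_match)
```

In this model the Angela row is still present in where_on_optional, with None standing in for the null m, while where_on_match no longer contains it.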
In order to have a predicate filter your rows, the WHERE clause needs to be associated with a MATCH, or a WITH. So we could fix it as in the correct query you added later, or like this:
MATCH (a:Person)-[:REVIEWED]->(:Movie),
(b:Person)-[:REVIEWED]->(:Movie),
(a:Person)-[:FOLLOWS]->(b:Person)
OPTIONAL MATCH (a:Person)-[:REVIEWED]->(m:Movie)<-[:REVIEWED]-(b:Person)
WITH a, m, b
WHERE a.name STARTS WITH 'J'
AND b.name STARTS WITH 'J'
RETURN DISTINCT a.name, b.name, m.title
This is less efficient, though, since the filtering happens after we've done the OPTIONAL MATCH. It is better to filter earlier, so that we only execute the OPTIONAL MATCH on rows that have already been filtered.
Also note that you have an issue with duplicates here, due to matching these patterns at the start: (a:Person)-[:REVIEWED]->(:Movie). While this does indeed find persons who are reviewers, you will get a row per path that matches the pattern. Jessica Thompson, for example, has reviewed 2 movies, so there are two paths that match that pattern, which is why she shows up at least twice per other reviewer in your results (and it will be multiplicative, depending on the number of movies the other reviewer has reviewed).
To fix this, instead of looking for all paths of a :Person reviewing a :Movie, look for a :Person where they have reviewed a movie:
MATCH (a:Person)
WHERE (a)-[:REVIEWED]->()
Because the pattern becomes a predicate, Cypher only has to find at least one :REVIEWED relationship from a :Person, and then it can stop looking, and you won't have those duplicate results.
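The difference between matching paths and using the pattern as a predicate can be sketched in a few lines of Python. This is only an illustration of the row counts, with a hypothetical REVIEWS list and PEOPLE list, not a description of Neo4j's internals.

```python
# Toy model, not Neo4j's implementation. Hypothetical data.
PEOPLE = ["Angela Scope", "Jessica Thompson", "Paul Blythe"]
REVIEWS = [
    ("Jessica Thompson", "The Replacements"),
    ("Jessica Thompson", "Jerry Maguire"),
    ("Angela Scope", "The Replacements"),
]

# MATCH (a:Person)-[:REVIEWED]->(:Movie)
# One row per matching path, so a person appears once per review.
path_rows = [person for person, _movie in REVIEWS]

# MATCH (a:Person) WHERE (a)-[:REVIEWED]->()
# One row per person; any() stops at the first review it finds.
predicate_rows = [
    p for p in PEOPLE if any(reviewer == p for reviewer, _movie in REVIEWS)
]

print(path_rows)
print(predicate_rows)
```

Here Jessica Thompson appears twice in path_rows but only once in predicate_rows, and Paul Blythe, who has no reviews in this sample, appears in neither.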
I'd like to pull and combine data from several different paths that share a path at the beginning, not all of which might exist. For example, I'd like to do something like this:
MATCH (:Complex)-[:PATH]->(s:Somewhere)-[:FETCHING]->(data)
RETURN data.attribute
UNION ALL
MATCH (s)-[:OPTIONAL]->(o:OtherData)
RETURN o.attribute;
so that it doesn't retrace the path up to s. I can't actually do this, though, because UNION separates queries and the (s)-[:OPTIONAL] in the second part will match anything with an outgoing OPTIONAL relation; the s is a loose handle.
Is there a better way of doing this than repeating the path:
MATCH (:Complex)-[:PATH]->(s:Somewhere)-[:FETCHING]->(data)
RETURN data.attribute
UNION ALL
MATCH (:Complex)-[:PATH]->(s:Somewhere)-[:OPTIONAL]->(o:OtherData)
RETURN o.attribute;
I made a few attempts using WITH, but they all either caused the query to return nothing if any part failed, or I could not get them to line up into a single column and instead got rows with redundant data, or (with multiple, nested WITHs, which I'm not sure about the scoping of) just fetching everything.
Have you looked at the semantics of an optional match? So you can match to s, beyond s and your optional component. Something like:
MATCH (:Complex)-[:PATH]->(s:Somewhere)
MATCH (s)-[:FETCHING]->(data)
OPTIONAL MATCH (s)-[:OPTIONAL]->(otherData)
RETURN data.attribute, otherData.attribute
Sorry, I missed the importance of a single column. Is it really important?
You can gather the values into a single collection:
MATCH (:Complex)-[:PATH]->(s:Somewhere)
MATCH (s)-[:FETCHING]->(data)
OPTIONAL MATCH (s)-[:OPTIONAL]->(otherData)
RETURN [data.attribute] + COLLECT(otherData.attribute)
But this should work for a single column. Note that UNWIND is a clause of its own, so it goes before RETURN rather than inside it:
MATCH (:Complex)-[:PATH]->(s:Somewhere)
MATCH (s)-[:FETCHING]->(data)
OPTIONAL MATCH (s)-[:OPTIONAL]->(otherData)
WITH data, COLLECT(otherData.attribute) AS others
UNWIND [data.attribute] + others AS val
RETURN val
I am trying to match multiple outer joins on the same level in neo4j.
My database consists of users and a count of common up or down ratings on articles. The rating counts are on separate edges for up and down ratings between the users.
----------                                -----------
| User n | -[:rating_positive {weight}]-> | User n2 |
----------                                -----------
     |                                         ^
     \-------[:rating_negative {weight}]-------/
Now I want to produce edges that sum up these ratings.
I would love to use multiple optional matches that do so, such as:
MATCH (n:`User`)
OPTIONAL MATCH (n:`User`)-[rating_positive:rating_positive]-(n2:`User`)
OPTIONAL MATCH (n:`User`)-[rating_negative:rating_negative]-(n2:`User`)
RETURN n.uid, n2.uid, rating_positive.weight, rating_negative.weight
But in this example I get all users without any positive ratings and those with both positive and negative ratings, but none with only negative ratings. So there seems to be a sequence in OPTIONAL MATCH.
If I swap the order of the OPTIONAL MATCH clauses, I get those with only negative ratings but not those with only positive ratings.
So is OPTIONAL MATCH somehow a sequence, where I only get something from the second clause when the first one is met, and so on?
Is there a workaround?
Neo4j Version is 2.1.3.
P.S.:
Even more confusing, matching against NULL does not seem to work. So this query:
MATCH (n:`User`)
OPTIONAL MATCH (n:`User`)-[rating_positive:rating_positive]-(n2:`User`)
OPTIONAL MATCH (n:`User`)-[rating_negative:rating_negative]-(n2:`User`)
WHERE rating_positive IS NULL AND rating_negative IS NOT NULL
RETURN n.uid, n2.uid, rating_positive.weight, rating_negative.weight
gives me lots of edges with NULL rating_negative and non-NULL rating_positive. I don't know what is happening with the null matching in WHERE.
Anyway I found a way to recode the nulls to 0 values using "coalesce":
MATCH (n:`User`)
OPTIONAL MATCH (n:`User`)-[rating_positive:rating_positive]-(n2:`User`)
OPTIONAL MATCH (n:`User`)-[rating_negative:rating_negative]-(n2:`User`)
WITH n, n2, coalesce(rating_positive.weight, 0) AS rating_positive, coalesce(rating_negative.weight, 0) as rating_negative
WHERE rating_positive = 0 AND rating_negative > 0
RETURN n.uid, n2.uid, rating_positive, rating_negative
With this query it works as expected.
I believe more than the sequencing of the optional match, it's the fact that you've bound n2.
So the next optional match is restricted to only match the nodes identified to be candidates for n2 in the previous match. And so it appears that the order of the optional match influences it.
If you take a look at a small sample graph I set up here http://console.neo4j.org/r/lrp55o , the following query
MATCH (n:User)
OPTIONAL MATCH (n)-[:rating_negative]->(n2)
OPTIONAL MATCH (n)-[:rating_positive]->(n2)
RETURN n,n2
returns B-[:rating_negative]->C and C-[:rating_negative]->D but it leaves out A-[:rating_positive]->B.
The first optional match for rating_negative bound C and D as nodes for "n2". The second optional match found no n which has a rating_positive to C or D and hence the results.
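The effect of the bound n2 can be sketched with a small Python model. It assumes the sample graph contains only the three relationships named above, and it is a toy model of the row semantics, not Neo4j's implementation. The helper names are made up for this illustration.

```python
# Toy model, not Neo4j's implementation. Assumed graph:
# A-[:rating_positive]->B, B-[:rating_negative]->C, C-[:rating_negative]->D.
RELS = [
    ("A", "rating_positive", "B"),
    ("B", "rating_negative", "C"),
    ("C", "rating_negative", "D"),
]
USERS = ["A", "B", "C", "D"]


def targets(n, rel_type):
    return [t for s, ty, t in RELS if s == n and ty == rel_type]


def introduce_n2(users, rel_type):
    """First OPTIONAL MATCH: n2 is new, so it is bound to a node or to null."""
    rows = []
    for n in users:
        found = targets(n, rel_type)
        rows.extend((n, t) for t in found)
        if not found:
            rows.append((n, None))
    return rows


def check_bound_n2(rows, rel_type):
    """Second OPTIONAL MATCH: n2 is already bound, so it is only checked."""
    return [
        (n, n2, n2 is not None and n2 in targets(n, rel_type))
        for n, n2 in rows
    ]


negative_first = check_bound_n2(
    introduce_n2(USERS, "rating_negative"), "rating_positive"
)
positive_first = check_bound_n2(
    introduce_n2(USERS, "rating_positive"), "rating_negative"
)

print([(n, n2) for n, n2, _ in negative_first])
print([(n, n2) for n, n2, _ in positive_first])
```

With rating_negative first, the pair (A, B) never appears because n2 was already fixed to null for A. Swapping the order brings (A, B) back but loses (B, C) and (C, D), which matches the order dependence described in the question.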
I'm a bit unclear about what you are trying to do with the query and null checks but a union would be one way to give you all the positive and negative relations (which you can add your filters to):
MATCH (n:User)
OPTIONAL MATCH (n)-[rating:rating_negative]->(n2)
RETURN n,n2,rating.weight
UNION ALL
MATCH (n:User)
OPTIONAL MATCH (n)-[rating:rating_positive]->(n2)
RETURN n, n2, rating.weight
If this is not what you're looking for, a small subgraph at http://console.neo4j.org?init=0 would be great to help you further.
EDIT: Since comments indicated that the sum of ratings was required between a pair of users, the following query does the job:
MATCH (u:User)-[rating:rating_positive|:rating_negative]->(u2)
RETURN u,u2,sum(rating.weight)
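As a rough illustration of what that aggregation computes, here is a Python sketch that groups by the (u, u2) pair and sums the weights. The RATINGS rows and their weights are hypothetical.

```python
from collections import defaultdict

# Hypothetical relationships: (u, u2, relationship type, weight).
RATINGS = [
    ("A", "B", "rating_positive", 3),
    ("A", "B", "rating_negative", 1),
    ("B", "C", "rating_negative", 2),
]

# u and u2 act as the grouping key, sum(rating.weight) is the aggregate.
totals = defaultdict(int)
for u, u2, _rel_type, weight in RATINGS:
    totals[(u, u2)] += weight

print(dict(totals))
```

Because both relationship types are matched by one pattern, a pair with only negative ratings, such as (B, C) here, is still included.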
I can't be entirely sure whether this is what is causing your problem but it appears to me that you should be omitting the labels in the OPTIONAL MATCH clauses.
Perhaps try the query below
MATCH (n:`User`)
OPTIONAL MATCH (n)-[rating_positive:rating_positive]-(n2:`User`)
OPTIONAL MATCH (n)-[rating_negative:rating_negative]-(n2)
RETURN n.uid, n2.uid, rating_positive.weight, rating_negative.weight
It may also be worth including the relationship directions.
MATCH (n:`User`)
OPTIONAL MATCH (n)-[rating_positive:rating_positive]->(n2:`User`)
OPTIONAL MATCH (n)-[rating_negative:rating_negative]->(n2)
RETURN n.uid, n2.uid, rating_positive.weight, rating_negative.weight
What is the difference between
OPTIONAL MATCH clauseA, clauseB
and
OPTIONAL MATCH clauseA
OPTIONAL MATCH clauseB
I get different behavior depending on which form I use.
For example:
START n=node(111)
OPTIONAL MATCH n<-[links_n_in]-(n_from),n-[links_n_out]->(n_to)
RETURN n,COLLECT(n_from) AS n_from,COLLECT(links_n_in) AS links_n_in,COLLECT(n_to) AS n_to,COLLECT(links_n_out) AS links_n_out
which is designed to return a node, its incoming relationships and their from nodes, and its outgoing relationships and their to nodes.
I have a test graph consisting of Node 111 which has 4 outgoing relationships each of which points to the same Node (I have other test cases in which 111 points to different Nodes). Executing the query as above returns only Node 111 in column 'n'. The columns for 'n_from', 'links_n_in', 'n_to', 'links_n_out' are empty.
If I modify the query to:
START n=node(111)
OPTIONAL MATCH n<-[links_n_in]-(n_from)
OPTIONAL MATCH n-[links_n_out]->(n_to)
RETURN n,COLLECT(n_from) AS n_from,COLLECT(links_n_in) AS links_n_in,COLLECT(n_to) AS n_to,COLLECT(links_n_out) AS links_n_out
then the n_to and links_n_out columns are populated as expected.
The first form treats it as a single extended pattern that must match entirely.
The second form treats them as distinct optional patterns, and can match the two separately.
So your results make sense, when you think about what it's doing--if the whole OPTIONAL MATCH pattern isn't found, it doesn't match any of the OPTIONAL MATCH pattern.
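A short Python sketch of the two forms, using the test graph from the question (node 111 with four outgoing relationships and no incoming ones). The target node id 222 and the relationship names r1 to r4 are made up for the illustration, and this is a toy model rather than Neo4j's implementation.

```python
# Toy model, not Neo4j's implementation.
INCOMING = []  # (relationship, from_node) pairs: node 111 has none
OUTGOING = [("r1", 222), ("r2", 222), ("r3", 222), ("r4", 222)]

# OPTIONAL MATCH n<-[in]-(f), n-[out]->(t)
# One pattern: every part must match, otherwise every new variable is null.
combined = [
    (i, f, o, t) for i, f in INCOMING for o, t in OUTGOING
] or [(None, None, None, None)]

# OPTIONAL MATCH n<-[in]-(f)
# OPTIONAL MATCH n-[out]->(t)
# Two patterns: each one falls back to null on its own.
after_first = INCOMING or [(None, None)]
separate = [
    (i, f, o, t)
    for i, f in after_first
    for o, t in (OUTGOING or [(None, None)])
]


def collect(rows, column):
    """Model COLLECT(), which skips null values."""
    return [row[column] for row in rows if row[column] is not None]


print(collect(combined, 3))  # n_to with the single combined pattern
print(collect(separate, 3))  # n_to with two OPTIONAL MATCH clauses
```

In the combined form the missing incoming side empties the whole match, so n_to collects nothing. In the separate form only the incoming columns are empty.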