I'm trying to find all nodes that don't connect to a specific node. I have an app where students doing an assignment discover themes in a story, and then write explications. Then, other students do peer reviews of these explications. My data looks like this:
Assignment-hasTheme->Theme-hasChild->Theme
Annotation-theme->Theme
Explication-owner->User
Explication-annotation->Annotation
PeerReview-explication->Explication
As part of the application, when a user has to do a peer review, I have to find all the explications written by other users. It seems to me like this query should work:
MATCH
(u),
(a)-[:hasTheme]->(:Theme)
-[:hasChild*]->(:Theme)
<-[:theme]-(ann:Annotation)
<-[:annotation]-(e:Explication)
OPTIONAL MATCH
(e)<-[:explication]-(p:PeerReview)
WHERE id(a)=7 AND id(u)=4
AND (e)-[:owner]->(u)
RETURN e, count(e) AS explicationCount
ORDER BY explicationCount ASC
The problem is that it doesn't: I get all the explications that all users have written. That includes the explications the user wrote. Can anyone tell me how to exclude those?
The problem is that a WHERE clause is associated with exactly one other clause: the MATCH, OPTIONAL MATCH, or WITH that immediately precedes it. In your query, it's associated with the OPTIONAL MATCH.
If you re-read your query knowing this, you can see that the first MATCH has no WHERE clause, so it's matching on all assignments and all users, finding all explications.
THEN it does the optional match to get :PeerReviews matching on the given assignment and user ids where the explication owner is the user with the given id. The WHERE is only affecting which :PeerReviews (variable p) are matched.
A couple of other things I can see: you're introducing a variable ann on the :Annotations matched in the pattern, and a variable p for the :PeerReview, but you're not actually doing anything with these in the query. This also makes your OPTIONAL MATCH useless, since you're not returning or operating on the matched :PeerReviews.
My recommendation is to remove those variables and remove your OPTIONAL MATCH completely.
MATCH
(u),
(a)-[:hasTheme]->(:Theme)
-[:hasChild*]->(:Theme)
<-[:theme]-(:Annotation)
<-[:annotation]-(e:Explication)
WHERE id(a)=7 AND id(u)=4
AND (e)-[:owner]->(u)
RETURN e, count(e) AS explicationCount
ORDER BY explicationCount ASC
If you do want to add in the OPTIONAL MATCH and use the matched :PeerReview, ensure that it's below the WHERE affecting the MATCH, like so:
MATCH
(u),
(a)-[:hasTheme]->(:Theme)
-[:hasChild*]->(:Theme)
<-[:theme]-(:Annotation)
<-[:annotation]-(e:Explication)
WHERE id(a)=7 AND id(u)=4
AND (e)-[:owner]->(u)
OPTIONAL MATCH
(e)<-[:explication]-(p:PeerReview)
RETURN e, count(e) AS explicationCount, p
ORDER BY explicationCount ASC
EDIT
In response to the comments where the desired result is each :Explication and the count of all linked :PeerReviews, you would use this query:
MATCH
(u),
(a)-[:hasTheme]->(:Theme)
-[:hasChild*0..]->(:Theme)
<-[:theme]-(:Annotation)
<-[:annotation]-(e:Explication)
WHERE id(a)=7 AND id(u)=4
AND (e)-[:owner]->(u)
OPTIONAL MATCH
(e)<-[:explication]-(p:PeerReview)
RETURN e, count(p) as peerReviewCount
ORDER BY peerReviewCount ASC
EDIT
Updated the above query so it will find annotations on the parent theme as well instead of just its children.
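One thing the queries above leave unchanged is the ownership predicate: (e)-[:owner]->(u) keeps explications owned by the user, whereas the question asks for explications written by other users. If that is the goal, the predicate would presumably need to be negated. A sketch under that assumption (count(DISTINCT p) is also an assumption, added to guard against an explication being reached through more than one annotation):

```cypher
MATCH
  (u),
  (a)-[:hasTheme]->(:Theme)
     -[:hasChild*0..]->(:Theme)
     <-[:theme]-(:Annotation)
     <-[:annotation]-(e:Explication)
WHERE id(a) = 7 AND id(u) = 4
  // assumed negation: drop explications owned by the given user
  AND NOT (e)-[:owner]->(u)
OPTIONAL MATCH
  (e)<-[:explication]-(p:PeerReview)
// DISTINCT is an assumption, to avoid counting a review once per matching path
RETURN e, count(DISTINCT p) AS peerReviewCount
ORDER BY peerReviewCount ASC
```

The WHERE still sits directly under the MATCH, so it filters rows before the OPTIONAL MATCH runs.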
Consider the following schema, where orange nodes are of type Person and brown nodes are of type Movie. (This is from the "movies" dataset that is shipped with Neo4j).
The query that I am trying to write goes as follows:
Find all reviewer pairs, one following the other, and return the names of the two reviewers. If they have both reviewed the same movie, return the title of the movie as well. Restrict the query so that the first letter of the name of both reviewers is 'J'.
Now, consider the following Cypher query:
MATCH (a:Person)-[:REVIEWED]->(:Movie),
(b:Person)-[:REVIEWED]->(:Movie),
(a:Person)-[:FOLLOWS]->(b:Person)
OPTIONAL MATCH (a:Person)-[:REVIEWED]->(m:Movie)<-[:REVIEWED]-(b:Person)
WHERE a.name STARTS WITH 'J'
AND b.name STARTS WITH 'J'
RETURN DISTINCT a.name, b.name, m.title
This returns the following (incorrect) results:
Why?
What I've gathered so far:
the WHERE applies to the (OPTIONAL) MATCH directly preceding it
the WHERE constraints are considered while looking for matches, not afterwards.
When an OPTIONAL MATCH does not apply fully, null is put for the missing parts of the pattern
I still don't understand why "Angela Scope" shows up in the results; the predicates should prevent her from ever showing up.
PS: I am aware that the following query returns the correct results
MATCH (a:Person)-[:REVIEWED]->(:Movie),
(b:Person)-[:REVIEWED]->(:Movie),
(a:Person)-[:FOLLOWS]->(b:Person)
WHERE a.name STARTS WITH 'J'
AND b.name STARTS WITH 'J'
OPTIONAL MATCH (a:Person)-[:REVIEWED]->(m:Movie)<-[:REVIEWED]-(b:Person)
RETURN DISTINCT a.name, b.name, m.title
however, I'd like to find out why these two queries return different results and especially why the one mentioned first returns exactly this result.
Sure, you're almost at the answer already:
the WHERE applies to the (OPTIONAL) MATCH directly preceding it
This is important. You should not view the WHERE clause as independent, as it is associated with and modifies the preceding clause. So read it out like MATCH ... WHERE ... and OPTIONAL MATCH ... WHERE ... and WITH ... WHERE ... as a whole.
Remember that an OPTIONAL MATCH will never filter out rows. It will keep existing rows, and for any newly introduced variables, will try to find matches using the pattern provided that passes its WHERE clause. If it doesn't find matches, newly introduced variables will be set to null. And again...no filtering.
So for this snippet:
OPTIONAL MATCH (a:Person)-[:REVIEWED]->(m:Movie)<-[:REVIEWED]-(b:Person)
WHERE a.name STARTS WITH 'J'
AND b.name STARTS WITH 'J'
Angela Scope and Jessica Thompson have a follows relationship between them, and they have reviewed the same movie, The Replacements, but they fail the WHERE clause, since Angela's name doesn't start with a 'J'. Therefore the OPTIONAL MATCH didn't find anything, so the newly introduced variable m will come back as null. Nothing will be filtered.
In order to have a predicate filter your rows, the WHERE clause needs to be associated with a MATCH, or a WITH. So we could fix it as in the correct query you added later, or like this:
MATCH (a:Person)-[:REVIEWED]->(:Movie),
(b:Person)-[:REVIEWED]->(:Movie),
(a:Person)-[:FOLLOWS]->(b:Person)
OPTIONAL MATCH (a:Person)-[:REVIEWED]->(m:Movie)<-[:REVIEWED]-(b:Person)
WITH a, m, b
WHERE a.name STARTS WITH 'J'
AND b.name STARTS WITH 'J'
RETURN DISTINCT a.name, b.name, m.title
And this is less efficient since the filtering happens after we've done the OPTIONAL MATCH. Better to filter earlier, so we only execute the OPTIONAL MATCH when we already have our filtered results.
Also to note, you have an issue with duplicates here due to your matching of these patterns at the start: (a:Person)-[:REVIEWED]->(:Movie). While this does indeed find persons who are reviewers, you will get a row per path that matches the pattern. For Jessica Thompson, for example, you can see she has reviewed 2 movies, so there are two paths that match that pattern, which is why she's showing up at least twice per other reviewer in your results (and it will be multiplicative, depending on the number of movies the other reviewer has reviewed).
To fix this, instead of looking for all paths of a :Person reviewing a :Movie, look for a :Person where they have reviewed a movie:
MATCH (a:Person)
WHERE (a)-[:REVIEWED]->()
Because the pattern becomes a predicate, Cypher only has to find at least one :REVIEWED relationship from a :Person, and then it can stop looking, and you won't have those duplicate results.
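Putting both suggestions together, the full query might look like the sketch below. It assumes the labels and relationship types from the question; the pattern predicates replace the two leading REVIEWED patterns, and the name filter sits on the first MATCH so the OPTIONAL MATCH only runs for pairs that already qualify.

```cypher
MATCH (a:Person)-[:FOLLOWS]->(b:Person)
WHERE a.name STARTS WITH 'J'
  AND b.name STARTS WITH 'J'
  // pattern predicates: true as soon as one :REVIEWED relationship exists
  AND (a)-[:REVIEWED]->()
  AND (b)-[:REVIEWED]->()
// only adds m where both have reviewed the same movie; otherwise m is null
OPTIONAL MATCH (a)-[:REVIEWED]->(m:Movie)<-[:REVIEWED]-(b)
RETURN a.name, b.name, m.title
```

Since the reviewer check no longer multiplies rows, DISTINCT should not be needed here, though adding it back would be harmless.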
I want to create a map projection with node properties and some additional information.
Also I want to collect some ids in a collection and use this later in the query to filter out nodes (where ID(n) in ids...).
The map projection is created in an apoc call which includes several union matches.
call apoc.cypher.run('MATCH (n)-[:IS_A]->({name: "User"}) MATCH (add)-[:IS_A]->({name: "AdditionalInformationForUser"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo UNION MATCH (n)-[:IS_A]->({name: "Department"}) MATCH (add)-[:IS_A]->({name: "AdditionalInformationForDepartment"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo', NULL) YIELD value
WITH (value.nodeWithInfo) AS nodeWithInfo
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds, nodeWithInfo
MATCH (n)-[:has]->({name: "Vacation"})
MATCH (u)-[:is]->({name: "Out of Order"})
WHERE ID(n) in nodesWithAdditionalInfosIds and ID(u) in nodesWithAdditionalInfosIds
return n, u, nodeWithInfo
This does not return anything because, when the WHERE part is evaluated, "nodesWithAdditionalInfosIds" is not a flat list; instead it holds only one id per row.
The problem only exists because I am passing the ids (nodesWithAdditionalInfosIds) AND the node projection (nodeWithInfo) on in the WITH clause.
If I instead only use the id collection and don't use the nodeWithInfo projection, the following adjustment works and returns only the nodes whose ids are in the id collection:
...
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds
MATCH (n)-[:has]->({name: "Vacation"})
MATCH (u)-[:is]->({name: "Out of Order"})
WHERE ID(n) in nodesWithAdditionalInfosIds and ID(u) in nodesWithAdditionalInfosIds
return n, u
If I just return the collection "nodesWithAdditionalInfosIds" directly after the WITH clause in both examples, this becomes obvious: the first one gives me one id per row, and the second one generates a flat list in a single result row.
I have the feeling that I am missing some crucial knowledge about Neo4j's WITH clause.
Is there a way I can pass on my listOfIds and use it as a flat list without the need to have an exclusive WITH clause for the collection?
edit:
Right now I am using the following workaround:
After I do the check on the ID of "n" and "u", I don't return; instead I keep the filtered "n" and "u" nodes and start a second apoc call that returns "nodeWithInfo" like before.
WITH n, u
call apoc.cypher.run('MATCH (n)-[:IS_A]->({name: "User"}) MATCH (add)-[:IS_A]->({name: "AdditionalInformationForUser"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo UNION MATCH (n)-[:IS_A]->({name: "Department"}) MATCH (add)-[:IS_A]->({name: "AdditionalInformationForDepartment"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo', NULL) YIELD value
WITH (value.nodeWithInfo) AS nodeWithInfo, n, u
WHERE nodeWithInfo.id = ID(n) OR nodeWithInfo.id = ID(u)
RETURN nodeWithInfo, n, u
This way I can return the nodes n, u and the additional information (to one of the nodes) per row. But I am sure there must be a better way.
I know ids in Neo4j have to be used with care, if at all. In this case I only need them to be valid inside this query, so it doesn't matter if the same node has a different id next time.
The problem is stripped down to its core (in my opinion). The original query is a little bigger, with several UNION MATCH parts inside the apoc call, and the actual match on nodes whose ids are contained in my collection checks some more restrictions instead of asking for any node.
Aggregating functions, like COLLECT(), aggregate over a set of "grouping keys".
In the following clause:
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds, nodeWithInfo
the grouping key is nodeWithInfo. Therefore, each nodesWithAdditionalInfosIds would always be a list containing one value.
And in the following clause:
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds
there is no grouping key. Therefore, in this situation, nodesWithAdditionalInfosIds will contain all the nodeWithInfo.id values.
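A small sketch of one possible way to keep both the flat id list and the individual projections, using literal maps in place of the real projections (the id and name values are made up for illustration): collect the whole maps without a grouping key, derive the ids from that list, then UNWIND back to one row per projection.

```cypher
// Stand-in data: hypothetical maps playing the role of nodeWithInfo
UNWIND [{id: 1, name: 'a'}, {id: 2, name: 'b'}] AS nodeWithInfo
// No grouping key here, so all maps end up in one list on a single row
WITH collect(nodeWithInfo) AS infos
// Derive the flat id list from the collected maps
WITH infos, [info IN infos | info.id] AS nodesWithAdditionalInfosIds
// Back to one row per projection; every row carries the full id list
UNWIND infos AS nodeWithInfo
RETURN nodeWithInfo, nodesWithAdditionalInfosIds
```

Each returned row should carry the complete id list alongside one projection, so a later WHERE ID(n) IN nodesWithAdditionalInfosIds would see all ids rather than one per row.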
I have a database in which I have Entity nodes, User nodes, and a couple of relationships including LIKES, POSTED_BY. I'm trying to write a query to achieve this objective:
Find all Entity nodes that a particular user LIKES or those that have been POSTED_BY that User
Note that I have simplified my query; in reality I have a bunch of other conditions similar to the above.
I'm trying to use a COLLECT clause to aggregate the list of all Entity nodes, and build on that line by line.
MATCH (e)<-[:LIKES]-(me:User{id: 'rJVbpcqzf'} )
WITH me, COLLECT(e) AS all_entities
MATCH (e)-[:POSTED_BY]->(me)
WITH me, all_entities + COLLECT(e) AS all_entities
UNWIND all_entities AS e
WITH DISTINCT e
RETURN e;
This seems to be returning the correct list ONLY if there is at least one Entity that the user has liked (i.e., if the first COLLECT returns a non-empty list). However, if there is no Entity that I have liked, the entire query returns empty.
Any suggestions on what I'm missing here?
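Use OPTIONAL MATCH. A plain MATCH that finds nothing produces no rows, so the rest of the query has nothing to run on; OPTIONAL MATCH keeps the row and binds e to null instead:
MATCH (me:User {id: 'rJVbpcqzf'})
// undirected, because LIKES points away from the user and POSTED_BY points to the user
OPTIONAL MATCH (me)-[:LIKES|POSTED_BY]-(e)
RETURN collect(DISTINCT e) AS all_entities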
Notes:
Instead of collecting and unwinding, you can simply use DISTINCT. You can also use DISTINCT with collect.
You can also use multiple relationship types, i.e. the LIKES|POSTED_BY for the relationship type here.
I have a graph database where there are user and interest nodes which are connected by IS_INTERESTED relationship. I want to find interests which are not selected by a user. I wrote this query and it is not working
OPTIONAL MATCH (u:User{userId : 1})-[r:IS_INTERESTED] -(i:Interest)
WHERE r is NULL
Return i.name as interest
According to answers to similar questions on SO (like this one), the above query is supposed to work. However, in this case it returns null. But when running the following query it works as expected:
MATCH (u:User{userId : 1}), (i:Interest)
WHERE NOT (u) -[:IS_INTERESTED] -(i)
return i.name as interest
The reason I don't want to run the above query is because Neo4j gives a warning:
This query builds a cartesian product between disconnected patterns.
If a part of a query contains multiple disconnected patterns, this
will build a cartesian product between all those parts. This may
produce a large amount of data and slow down query processing. While
occasionally intended, it may often be possible to reformulate the
query that avoids the use of this cross product, perhaps by adding a
relationship between the different parts or by using OPTIONAL MATCH
(identifier is: (i))
What am I doing wrong in the first query where I use OPTIONAL MATCH to find nonexistent relationships?
1) MATCH looks for the pattern as a whole, and if it cannot find the pattern in its entirety, it does not return anything.
2) I think that this query will be effective:
// Take all user interests
MATCH (u:User{userId: 1})-[r:IS_INTERESTED]-(i:Interest)
WITH collect(i) as interests
// Check what interests are not included
MATCH (ni:Interest) WHERE NOT ni IN interests
RETURN ni.name
When your OPTIONAL MATCH query does not find a match, then both r AND i must be NULL. After all, since there is no relationship, there is no way to get the nodes that it points to.
A WHERE directly after the OPTIONAL MATCH is pulled into the evaluation.
If you want to post-filter you have to use a WITH in between.
MATCH (u:User {userId: 1})
OPTIONAL MATCH (u)-[r:IS_INTERESTED]-(i:Interest)
WITH r, i
WHERE r IS NULL
RETURN i.name AS interest
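One caveat with the query above: i is introduced inside the OPTIONAL MATCH, so whenever r is null, i is null too, and i.name comes back as null. A possible variant, offered as a sketch and not a tested fix, binds every :Interest first so that only r is optional:

```cypher
MATCH (u:User {userId: 1})
MATCH (i:Interest)
// u and i are already bound, so only r can come back as null
OPTIONAL MATCH (u)-[r:IS_INTERESTED]-(i)
WITH i, r
// post-filter: keep the interests that have no relationship to this user
WHERE r IS NULL
RETURN i.name AS interest
```

This still pairs the single user with every :Interest, so the amount of work is comparable to the WHERE NOT version in the question.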
I have users in my graph database who vote for brands. I need to find the users who haven't voted for any brand. I prepared a console view you can play with. In this console example, I need to get the node named 'Trinity':
Console Example
Tried optional match without luck.
The right way with optional match is more cumbersome (but potentially faster):
MATCH (n:User)
OPTIONAL MATCH (n)-[:Voted]->(brand)
WITH n,brand
WHERE brand IS NULL
RETURN n, brand
The WHERE belongs internally to the OPTIONAL MATCH (like the ON (...) of a SQL join), so it can be used to specify constraints that the optional match will adhere to.
So if you want to filter the "results" of the optional matching you have to separate that with WITH.
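For comparison, the less cumbersome form is presumably a pattern predicate in the WHERE of a plain MATCH. A minimal sketch, assuming the same :User label and :Voted relationship type as above:

```cypher
MATCH (n:User)
// keep only users with no outgoing :Voted relationship
WHERE NOT (n)-[:Voted]->()
RETURN n
```

Here the WHERE is attached to a MATCH, so it does filter rows, and no WITH is needed.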