Cypher returning multiple rows per match - neo4j

I am querying neo4j with cypher and am getting two rows per match:
MATCH (g:Group {Name: "Goliath_Treasury237"})-[w:MEMBER]->(a:Account)<-[y:ACCOUNT]-(p:Person)-[MANAGER*0..1]->(b:Person)
WHERE NOT p.staffID = b.staffID
MATCH (g:Group {Name: "Goliath_Treasury237"})-[j:MEMBER]->(v:Account)<-[x:ACCOUNT]-(p:Person)-[f:DEP*0..1]->(d:Department)
RETURN p.GivenName, p.Surname, p.staffID, p.CorpT, b.staffID, d.Name
I'm trying to get the information for both the department and the boss of a person which I'm struggling with declaratively unless I do the double match. This returns two rows for each match where a person has a boss, one with their ID as their bosses ID and one with the correct bosses id. For people without a boss I get one row back but the bosses id is their own.
If I remove the variable length path for boss then I get one row for each individual but no row where someone doesn't have a boss.
I'm at a loss now, any help would be great!

There's several things we can fix here.
For one, you don't need the second match to start over from the beginning, you can reuse the same p node without needing all the stuff before. You also don't need to have a variable on every node and relationship if you're not going to use or return it in some way.
You can use an OPTIONAL MATCH when you don't want the match to filter out rows, newly-introduced variables in the OPTIONAL MATCH will be null if the match fails.
Something like this should work:
MATCH (:Group {Name: "Goliath_Treasury237"})-[:MEMBER]->(:Account)<-[:ACCOUNT]-(p:Person)
OPTIONAL MATCH (p)-[MANAGER]->(b:Person)
WHERE NOT p.staffID = b.staffID
OPTIONAL MATCH (p)-[:DEP]->(d:Department)
RETURN p.GivenName, p.Surname, p.staffID, p.CorpT, b.staffID, d.Name

Related

Optional Match and Where -- How does this query work?

Consider the following schema, where orange nodes are of type Person and brown nodes are of type Movie. (This is from the "movies" dataset that is shipped with Neo4j).
The query that I am trying to write goes as follows:
Find all reviewer pairs, one following the other, and return the names
of the two reviewers. If they have both reviewed the same movie,
return the title of the movie as well. Restrict the query so that the first letter of the name of both reviewers is ’J’
Now, consider the following CYPHER query:
MATCH (a:Person)-[:REVIEWED]->(:Movie),
(b:Person)-[:REVIEWED]->(:Movie),
(a:Person)-[:FOLLOWS]->(b:Person)
OPTIONAL MATCH (a:Person)-[:REVIEWED]->(m:Movie)<-[:REVIEWED]-(b:Person)
WHERE a.name STARTS WITH 'J'
AND b.name STARTS WITH 'J'
RETURN DISTINCT a.name, b.name, m.title
This returns the following (incorrect) results:
Why?
What I've gathered so far:
the WHERE applies to the (OPTIONAL) MATCH directly preceding it
the WHERE constraints are considered while looking for matches, not afterwards.
When an OPTIONAL MATCH does not apply fully, null is put for the missing parts of the pattern
I still don't understand, why "Angela Scope" shows up in the results. In any case, if the predicates should forbid it to ever show up.
PS: I am aware that the following query returns the correct results
MATCH (a:Person)-[:REVIEWED]->(:Movie),
(b:Person)-[:REVIEWED]->(:Movie),
(a:Person)-[:FOLLOWS]->(b:Person)
WHERE a.name STARTS WITH 'J'
AND b.name STARTS WITH 'J'
OPTIONAL MATCH (a:Person)-[:REVIEWED]->(m:Movie)<-[:REVIEWED]-(b:Person)
RETURN DISTINCT a.name, b.name, m.title
however, I'd like to find out why these two queries return different results and especially why the one mentioned first returns exactly this result.
Sure, you're almost at the answer already:
the WHERE applies to the (OPTIONAL) MATCH directly preceding it
This is important. You should not view the WHERE clause as independent, as it is associated with and modifies the preceding clause. So read it out like MATCH ... WHERE ... and OPTIONAL MATCH ... WHERE ... and WITH ... WHERE ... as a whole.
Remember that an OPTIONAL MATCH will never filter out rows. It will keep existing rows, and for any newly introduced variables, will try to find matches using the pattern provided that passes its WHERE clause. If it doesn't find matches, newly introduced variables will be set to null. And again...no filtering.
So for this snippet:
OPTIONAL MATCH (a:Person)-[:REVIEWED]->(m:Movie)<-[:REVIEWED]-(b:Person)
WHERE a.name STARTS WITH 'J'
AND b.name STARTS WITH 'J'
Angela Scope and Jessica Thompson have a follows relationship between them, and they have reviewed the same movie, The Replacements, but they fail the WHERE clause, since Angela's name doesn't start with a 'J'. Therefore the OPTIONAL MATCH didn't find anything, so the newly introduced variable m will come back as null. Nothing will be filtered.
In order to have a predicate filter your rows, the WHERE clause needs to be associated with a MATCH, or a WITH. So we could fix it as in the correct query you added later, or like this:
MATCH (a:Person)-[:REVIEWED]->(:Movie),
(b:Person)-[:REVIEWED]->(:Movie),
(a:Person)-[:FOLLOWS]->(b:Person)
OPTIONAL MATCH (a:Person)-[:REVIEWED]->(m:Movie)<-[:REVIEWED]-(b:Person)
WITH a, m, b
WHERE a.name STARTS WITH 'J'
AND b.name STARTS WITH 'J'
RETURN DISTINCT a.name, b.name, m.title
And this is less efficient since the filtering happens after we've done the OPTIONAL MATCH. Better to filter earlier, so we only execute the OPTIONAL MATCH when we already have our filtered results.
Also to note, you have an issue with duplicates here due to your matching of these patterns at the start: (a:Person)-[:REVIEWED]->(:Movie). While this does indeed find persons who are reviewers, you will get a row per path that matches the pattern...so for Jessica Thompson, for example, you can see she has reviewed 2 movies, so there are two paths that match that pattern, which is why she's showing up at least twice per other reviewer in your results (and it will be multiplicative, depending on the number of movies the other reviewer has reviewed.
To fix this, instead of looking for all paths of a :Person reviewing a :Movie, look for a :Person where they have reviewed a movie:
MATCH (a:Person)
WHERE (a)-[:REVIEWED]->()
Because the pattern becomes a predicate, Cypher only has to find at least one :REVIEWED relationship from a :Person, and then it can stop looking, and you won't have those duplicate results.

Collection in WITH clause gets expanded to one element per row

I want to create a map projection with node properties and some additional information.
Also I want to collect some ids in a collection and use this later in the query to filter out nodes (where ID(n) in ids...).
The map projection is created in an apoc call which includes several union matches.
call apoc.cypher.run('MATCH (n)-[:IS_A]->({name: "User"}) MATCH (add)-[:IS_A]->({name: "AdditionalInformationForUser"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo UNION MATCH (n)-[:IS_A]->({Department}) MATCH (add)-[:IS_A]->({"AdditionalInformationForDepartment"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo', NULL) YIELD value
WITH (value.nodeWithInfo) AS nodeWithInfo
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds, nodeWithInfo
MATCH (n)-[:has]->({"Vacation"})
MATCH (u)-[:is]->({"Out of Order"})
WHERE ID(n) in nodesWithAdditionalInfosIds and ID(u) in nodesWithAdditionalInfosIds
return n, u, nodeWithInfo
This does not return anything because, when the where part is evaluated it doesn´t check "nodesWithAdditionalInfosIds" as a flat list but instead only gets one id per row.
The problem only exists because I am passing the ids (nodesWithAdditionalInfosIds) AND the nodeProjection (nodeWithInfo) on in the WITH clause.
If I instead only use the id collection and don´t use the nodeWithInfo projection the following adjustement works and returns my only the nodes which ids are in the id collection:
...
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds
MATCH (n)-[:has]->({"Urlaub"})
MATCH (u)-[:is]->({"Out of Order"})
WHERE ID(n) in nodesWithAdditionalInfosIds and ID(u) in nodesWithAdditionalInfosIds
return n, u
If I just return the collection "nodesWithAdditionalInfosIds" directly after the WITH clause in both examples this gets obvious. Since the first one generates a flat list in one result row and the second one gives me one id per row.
I have the feeling that I am missing a crucial knowledge about neo4js With clause.
Is there a way I can pass on my listOfIds and use it as a flat list without the need to have an exclusive WITH clause for the collection?
edit:
Right now I am using the following workaround:
After I do the check on the ID of "n" and "u" I don´t return but instead keep the filtered "n" and "u" nodes and start a second apoc call that returns "nodeWithInfo" like before.
WITH n, u
call apoc.cypher.run('MATCH (n)-[:IS_A]->({name: "User"}) MATCH (add)-[:IS_A]->({name: "AdditionalInformationForUser"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo UNION MATCH (n)-[:IS_A]->({Department}) MATCH (add)-[:IS_A]->({"AdditionalInformationForDepartment"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo', NULL) YIELD value
WITH (value.nodeWithInfo) AS nodeWithInfo, n, u
WHERE nodeWithInfo.id = ID(n) OR nodeWithInfo.id = ID(u)
RETURN nodeWithInfo, n, u
This way I can return the nodes n, u and the additional information (to one of the nodes) per row. But I am sure there must be a better way.
I know ids in neo4j have to be used with care, if at all. In this case I only need them to be valid inside this query, so it doesn´t matter if the next time the same node has another id.
The problem is stripped down to the core problem (in my opinion), the original query is a little bigger with several UNION MATCH inside apoc and the actual match on nodes which ids are contained in my collection is checking for some more restrictions instead of asking for any node.
Aggregating functions, like COLLECT(), aggregate over a set of "grouping keys".
In the following clause:
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds, nodeWithInfo
the grouping key is nodeWithInfo. Therefore, each nodesWithAdditionalInfosIds would always be a list containing one value.
And in the following clause:
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds
there is no grouping key. Therefore, in this situation, nodesWithAdditionalInfosIds will contain all the nodeWithInfo.id values.

Query for who you like, who likes you, and mutual likes?

Is there a better way to do this than 3 separate queries?
MATCH (you:User {name: "Alexander"})-[:LIKES]->(youLike:User)
RETURN youLike
MATCH (likesYou:User)-[:LIKES]->(you:User {name: "Alexander"})
RETURN likesYou
MATCH (mutualLike:User)-[:LIKES]->(you:User {name: "Alexander"})-[:LIKES]->(mutualLike:User)
RETURN mutualLike
Here is a shot at a single query.
Essentially, find yourself first, optionally find people that you like and collect them, optionally find people that like you and collect them, then return both collections and the intersection of the two.
By matching the node that identifies you and reusing it you match yourself once instead of three times.
Using the collection filter function allowsyou two find the intersection of the two :LIKES populations without rematching those nodes.
The OPTIONAL keyword allows the query to continue if either :LIKES population is empty.
MATCH (you:User {name: "Alexander"})
WITH you
OPTIONAL MATCH(you)-[:LIKES]->(youLike:User)
WITH you, collect(youLike) as youLike
OPTIONAL MATCH (likesYou:User)-[:LIKES]->(you)
WITH you, youLike, collect(likesYou) as likesYou
RETURN you
, youLike
, likesYou
, filter(n in youLike where n in likesYou) as mutualLike

Cypher Query not returning nonexistent relationships

I have a graph database where there are user and interest nodes which are connected by IS_INTERESTED relationship. I want to find interests which are not selected by a user. I wrote this query and it is not working
OPTIONAL MATCH (u:User{userId : 1})-[r:IS_INTERESTED] -(i:Interest)
WHERE r is NULL
Return i.name as interest
According to answers to similar questions on SO (like this one), the above query is supposed to work.However,in this case it returns null. But when running the following query it works as expected:
MATCH (u:User{userId : 1}), (i:Interest)
WHERE NOT (u) -[:IS_INTERESTED] -(i)
return i.name as interest
The reason I don't want to run the above query is because Neo4j gives a warning:
This query builds a cartesian product between disconnected patterns.
If a part of a query contains multiple disconnected patterns, this
will build a cartesian product between all those parts. This may
produce a large amount of data and slow down query processing. While
occasionally intended, it may often be possible to reformulate the
query that avoids the use of this cross product, perhaps by adding a
relationship between the different parts or by using OPTIONAL MATCH
(identifier is: (i))
What am I doing wrong in the first query where I use OPTIONAL MATCH to find nonexistent relationships?
1) MATCH is looking for the pattern as a whole, and if can not find it in its entirety - does not return anything.
2) I think that this query will be effective:
// Take all user interests
MATCH (u:User{userId: 1})-[r:IS_INTERESTED]-(i:Interest)
WITH collect(i) as interests
// Check what interests are not included
MATCH (ni:Interest) WHERE NOT ni IN interests
RETURN ni.name
When your OPTIONAL MATCH query does not find a match, then both r AND i must be NULL. After all, since there is no relationship, there is no way get the nodes that it points to.
A WHERE directly after the OPTIONAL MATCH is pulled into the evaluation.
If you want to post-filter you have to use a WITH in between.
MATCH (u:User{userId : 1})
OPTIONAL MATCH (u)-[r:IS_INTERESTED] -(i:Interest)
WITH r,i
WHERE r is NULL
Return i.name as interest

How to query for multiple OR'ed Neo4j paths?

Anyone know of a fast way to query multiple paths in Neo4j ?
Lets say I have movie nodes that can have a type that I want to match (this is psuedo-code)
MATCH
(m:Movie)<-[:TYPE]-(g:Genre { name:'action' })
OR
(m:Movie)<-[:TYPE]-(x:Genre)<-[:G_TYPE*1..3]-(g:Genre { name:'action' })
(m)-[:SUBGENRE]->(sg:SubGenre {name: 'comedy'})
OR
(m)-[:SUBGENRE]->(x)<-[:SUB_TYPE*1..3]-(sg:SubGenre {name: 'comedy'})
The problem is, the first "m:Movie" nodes to be matched must match one of the paths specified, and the second SubGenre is depenedent on the first match.
I can make a query that works using MATCH and WHERE, but its really slow (30 seconds with a small 20MB dataset).
The problem is, I don't know how to OR match in Neo4j with other OR matches hanging off of the first results.
If I use WHERE, then I have to declare all the nodes used in any of the statements, in the initial MATCH which makes the query slow (since you cannot introduce new nodes in a WHERE)
Anyone know an elegant way to solve this ?? Thanks !
You can try a variable length path with a minimal length of 0:
MATCH
(m:Movie)<-[:TYPE|:SUBGENRE*0..4]-(g)
WHERE g:Genre and g.name = 'action' OR g:SubGenre and g.name='comedy'
For the query to use an index to find your genre / subgenre I recommend a UNION query though.
MATCH
(m:Movie)<-[:TYPE*0..4]-(g:Genre { name:'action' })
RETURN distinct m
UNION
(m:Movie)-[:SUBGENRE]->(x)<-[:SUB_TYPE*1..3]-(sg:SubGenre {name: 'comedy'})
RETURN distinct m
Perhaps the OPTIONAL MATCH clause might help here. OPTIONAL MATCH beavior is similar to the MATCH statement, except that instead of an all-or-none pattern matching approach, any elements of the pattern that do not match the pattern specific in the statement are bound to null.
For example, to match on a movie, its genre and a possible sub-genre:
OPTIONAL MATCH (m:Movie)-[:IS_GENRE]->(g:Genre)<-[:IS_SUBGENRE]-(sub:Genre)
WHERE m.title = "The Matrix"
RETURN m, g, sub
This will return the movie node, the genre node and if it exists, the sub-genre. If there is no sub-genre then it will return null for sub. You can use variable length paths as you have above as well with OPTIONAL MATCH.
[EDITED]
The following MATCH clause should be equivalent to your pseudocode. There is also a USING INDEX clause that assumes you have first created an index on :SubGenre(name), for efficiency. (You could use an index on :Genre(name) instead, if Genre nodes are more numerous than SubGenre nodes.)
MATCH
(m:Movie)<-[:TYPE*0..4]-(g:Genre { name:'action' }),
(m)-[:SUBGENRE]->()<-[:SUB_TYPE*0..3]-(sg:SubGenre { name: 'comedy' })
USING INDEX sg:SubGenre(name)
Here is a console that shows the results for some sample data.

Resources