Cypher query to find nodes that have 3 relationships - neo4j

I figured out how to write this query when I am looking for 2 relationships, but not sure how to add more relationships to the query.
Assume you have a book club database with 'reader' and 'book' as nodes. The 'book' nodes have a 'genre' attribute (to define that the book is a Fiction, Non-Fiction, Biography, Reference, etc.) There is a Relationship "HasRead" between 'reader' nodes and 'book' nodes where someone has read a particular book.
If I want to find readers that have read both Fiction AND Non-Fiction books, I could execute this Cypher query:
Start b1=node:MyBookIndex('Genre:Fiction'),
b2=node:MyBookIndex('Genre:Non-Fiction')
Match b1-[:HadRead]-r-[:HasRead]-b2
Return r.ReaderName
The key to the above query is the Match clause that has the two book aliases feeding into the r alias for the 'reader' nodes.
Question: How would I write the query to find users that have read Fiction AND Non-Fiction AND Reference books? I'm stuck with how you would write the Match clause when you have more than 2 things you are looking for.

You can have multiple line specified in a single MATCH clause, separated by commas. For example, the following two MATCH clauses are semantically equivalent (and will be evaluated identically by the engine):
//these mean the same thing!
match a--b--c
match a--b, b--c
You can have any number of these matches. So, plugging that into your query, you get this:
start b1=node:MyBookIndex('Genre:Fiction'),
b2=node:MyBookIndex('Genre:Non-Fiction'),
b3=node:MyBookIndex('Genre:Reference')
match b1-[:HasRead]-r,
b2-[:HasRead]-r,
b3-[:HasRead]-r
return r.ReaderName

You can user cypher 'with' clause -
start b1=node:MyBookIndex('Genre:Fiction'),
b2=node:MyBookIndex('Genre:Non-Fiction'),
b3=node:MyBookIndex('Genre:Reference')
match b1-[:HasRead]-r-[:HasRead]-b2
with b3, r
match b3-[:HasRead]-r
return r.ReaderName

Related

how to gather the unique related rows in a neo4j cypher query?

I have 3 posts, 1 of which has two comments. I need to retrieve the 3 post plus the 2 comments and the authors. Problem is cypher returns all the associated links. How can I get only the rows I need? When run this query I get 72 rows....I am only expecting 4 rows:
MATCH (p:Posts)
WITH p
MATCH(c:Comments)-[r:COMMENTED_ON]-()
WITH p, c
MATCH (u)-[:MADE_A_POST]->()
WITH p,c,u
MATCH (n)-[:POSTED_COMMENT]-()
RETURN {post:p,comments:c,author:u,commentator:n}
I am trying to ensure that each row only contain the properties related to that row. In this case I should have 4 rows...since one of the post has 2 comments. I looked at UNION, COLLECT and DISTINCT and none seems to be a good fit. Any help would be appreciated.
In your query you are effectively doing the following:
Match all posts
Match all comments that have a COMMENTED_ON relationship
MATCH all nodes that have a MADE_A_POST relationship
MATCH all nodes that have POSTED_COMMENT relationship
As I don't have sight of your data model, I can only guess, but it looks like you're making no direct connection between posts with comments, and then finding the authors of those.
Taking a guess, I think your query should look like the following:
MATCH (p:Posts)
WITH p
MATCH (n)-[:POSTED_COMMENT]-(c:Comments)-[r:COMMENTED_ON]-(p)
MATCH (u)-[:MADE_A_POST]->(p)
RETURN {post:p,comments:c,author:u,commentator:n}

Cypher COLLECT clause to build a list where some lists could be empty

I have a database in which I have Entity nodes, User nodes, and a couple of relationships including LIKES, POSTED_BY. I'm trying to write a query to achieve this objective:
Find all Entity nodes that a particular user LIKES or those that have been POSTED_BY that User
Note that I have simplified my query - in real I have a bunch of other conditions similar to the above.
I'm trying to use a COLLECT clause to aggregate the list of all Entity nodes, and build on that line by line.
MATCH (e)<-[:LIKES]-(me:User{id: 'rJVbpcqzf'} )
WITH me, COLLECT(e) AS all_entities
MATCH (e)-[:POSTED_BY]->(me)
WITH me, all_entities + COLLECT(e) AS all_entities
UNWIND all_entities AS e
WITH DISTINCT e
RETURN e;
This seems to be returning the correct list ONLY if there is at least one Entity that the user has liked (i.e., if the first COLLECT returns a non-empty list). However, if there is no Entity that I have liked, the entire query returns empty.
Any suggestions on what I'm missing here?
Use OPTIONAL MATCH:
MATCH (me:User {id: 'rJVbpcqzf'})
OPTIONAL MATCH (me)-[:LIKES|POSTED_BY]->(e)
RETURN collect(DISTINCT e) AS all_entities
Notes:
Instead of collecting and unwinding, you can simply use DISTINCT. You can also use DISTINCT with collect.
You can also use multiple relationship types, i.e. the LIKES|POSTED_BY for the relationship type here.

Cypher - Neo4j Query Profiling

I have some questions regarding Neo4j's Query profiling.
Consider below simple Cypher query:
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
and output is:
So according to Neo4j's Documentation:
3.7.2.2. Expand Into
When both the start and end node have already been found, expand-into
is used to find all connecting relationships between the two nodes.
Query.
MATCH (p:Person { name: 'me' })-[:FRIENDS_WITH]->(fof)-->(p) RETURN
> fof
So here in the above query (in my case), first of all, it should find both the StartNode & the EndNode before finding any relationships. But unfortunately, it's just finding the StartNode, and then going to expand all connected :HAS_CONTACT relationships, which results in not using "Expand Into" operator. Why does this work this way? There is only one :HAS_CONTACT relationship between the two nodes. There is a Unique Index constraint on :Consumer{mobileNumber}. Why does the above query expand all 7 relationships?
Another question is about the Filter operator: why does it requires 12 db hits although all nodes/ relationships are already retrieved? Why does this operation require 12 db calls for just 6 rows?
Edited
This is the complete Graph I am querying:
Also I have tested different versions of same above query, but the same Query Profile result is returned:
1
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
MATCH (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
2
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"}), (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
3
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
WITH n
MATCH (n)-[r:HAS_CONTACT]->(m:Consumer{mobileNumber: "xxxxxxxxxxx"})
RETURN n,m,r;
The query you are executing and the example provided in the Neo4j documentation for Expand Into are not the same. The example query starts and ends at the same node.
If you want the planner to find both nodes first and see if there is a relationship then you could use shortestPath with a length of 1 to minimize the DB hits.
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH Path=shortestPath((n)-[r:HAS_CONTACT*1]->(m))
RETURN n,m,r;
Why does this do this?
It appears that this behaviour relates to how the query planner performs a database search in response to your cypher query. Cypher provides an interface to search and perform operations in the graph (alternatives include the Java API, etc.), queries are handled by the query planner and then turned into graph operations by neo4j's internals. It make sense that the query planner will find what is likely to be the most efficient way to search the graph (hence why we love neo), and so just because a cypher query is written one way, it won't necessarily search the graph in the way we imagine it will in our head.
The documentation on this seemed a little sparse (or, rather I couldn't find it properly), any links or further explanations would be much appreciated.
Examining your query, I think you're trying to say this:
"Find two nodes each with a :Consumer label, n and m, with contact numbers x and y respectively, using the mobileNumber index. If you find them, try and find a -[:HAS_CONTACT]-> relationship from n to m. If you find the relationship, return both nodes and the relationship, else return nothing."
Running this query in this way requires a cartesian product to be created (i.e., a little table of all combinations of n and m - in this case only one row - but for other queries potentially many more), and then relationships to be searched for between each of these rows.
Rather than doing that, since a MATCH clause must be met in order to continue with the query, neo knows that the two nodes n and m must be connected via the -[:HAS_CONTACT]-> relationship if the query is to return anything. Thus, the most efficient way to run the query (and avoid the cartesian product) is as below, which is what your query can be simplified to.
"Find a node n with the :Consumer label, and value x for the index mobileNumber, which is connected via a -[:HAS_CONTACT]-> relationshop to a node m with the :Consumer label, and value y for its proprerty mobileNumber. Return both nodes and the relationship, else return nothing."
So, rather than perform two index searches, a cartesian product and a set of expand into operations, neo performs only one index search, an expand all, and a filter.
You can see the result of this simplification by the query planner through the presence of AUTOSTRING parameters in your query profile.
How to Change Query to Implement Search as Desired
If you want to change the query so that it must use an expand into relationship, make the requirement for the relationship optional, or use explicitly iterative execution. Both these queries below will produce the initially expected query profiles.
Optional example:
PROFILE
MATCH (n:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
WITH n,m
OPTIONAL MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
Iterative example:
PROFILE
MATCH (n1:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
UNWIND COLLECT(n1) AS n
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;

How to add multiple relationships to a single node in one query

Consider this example: I have one Author node, and several Book nodes. I want to create the WROTE relationship between the Author and several Book nodes in a single cypher Query statement. Also, the only way I have to look up the nodes for both the Author and all the Book nodes is by their node ID.
Here's what I tried:
MATCH (a:Author) WHERE id(a) = '31'
MATCH (b0:Book) WHERE id(b0) = '32'
MATCH (b1:Book) WHERE id(b1) = '33'
CREATE (b0)<-[:WROTE {order : '0'}]-(a)
CREATE (b1)<-[:WROTE {order : '1'}]-(a)
However, it doesn't seem to work.
Thanks.
Neo4js native ids are stored as numbers, so do not match by string. Also Cypher has IN clause that lets you match arrays, so you can simplify you query to this
MATCH (a:Author) where id(a)=31
MATCH (b:Book) where id(b) in [32,33]
CREATE (a)-[:WROTE]->(b)
Found the issue to be that I shouldn't be using the ' when matching node IDs!

What is the difference between multiple MATCH clauses and a comma in a Cypher query?

In a Cypher query language for Neo4j, what is the difference between one MATCH clause immediately following another like this:
MATCH (d:Document{document_ID:2})
MATCH (d)--(s:Sentence)
RETURN d,s
Versus the comma-separated patterns in the same MATCH clause? E.g.:
MATCH (d:Document{document_ID:2}),(d)--(s:Sentence)
RETURN d,s
In this simple example the result is the same. But are there any "gotchas"?
There is a difference: comma separated matches are actually considered part of the same pattern. So for instance the guarantee that each relationship appears only once in resulting path is upheld here.
Separate MATCHes are separate operations whose paths don't form a single patterns and which don't have these guarantees.
I think it's better to explain providing an example when there's a difference.
Let's say we have the "Movie" database which is provided by official Neo4j tutorials.
And there're 10 :WROTE relationships in total between :Person and :Movie nodes
MATCH (:Person)-[r:WROTE]->(:Movie) RETURN count(r); // returns 10
1) Let's try the next query with two MATCH clauses:
MATCH (p:Person)-[:WROTE]->(m:Movie) MATCH (p2:Person)-[:WROTE]->(m2:Movie)
RETURN p.name, m.title, p2.name, m2.title;
Sure you will see 10*10 = 100 records in the result.
2) Let's try the query with one MATCH clause and two patterns:
MATCH (p:Person)-[:WROTE]->(m:Movie), (p2:Person)-[:WROTE]->(m2:Movie)
RETURN p.name, m.title, p2.name, m2.title;
Now you will see 90 records are returned.
That's because in this case records where p = p2 and m = m2 with the same relationship between them (:WROTE) are excluded.
For example, there IS a record in the first case (two MATCH clauses)
p.name m.title p2.name m2.title
"Aaron Sorkin" "A Few Good Men" "Aaron Sorkin" "A Few Good Men"
while there's NO such a record in the second case (one MATCH, two patterns)
There are no differences between these provided that the clauses are not linked to one another.
If you did this:
MATCH (a:Thing), (b:Thing) RETURN a, b;
That's the same as:
MATCH (a:Thing) MATCH (b:Thing) RETURN a, b;
Because (and only because) a and b are independent. If a and b were linked by a relationship, then the meaning of the query could change.
In a more generic way, "The same relationship cannot be returned more than once in the same result record." [see 1.5. Cypher Result Uniqueness in the Cypher manual]
Both MATCH-after-MATCH, and single MATCH with comma-separated pattern should logically return a Cartesian product. Except, for comma-separated pattern, we must exclude those records for which we already added the relationship(s).
In Andy's answer, this is why we excluded repetitions of the same movie in the second case: because the second expression from each single MATCH was using there the same :WROTE relationship as the first expression.
If a part of a query contains multiple disconnected patterns, this will build a cartesian product between all those parts. This may produce a large amount of data and slow down query processing. While occasionally intended, it may often be possible to reformulate the query that avoids the use of this cross product, perhaps by adding a relationship between the different parts or by using OPTIONAL MATCH (identifier is: (a)) .
IN short their is NO Difference in this both query but used it very carefully.
In a more generic way, "The same relationship cannot be returned more than once in the same result record." [see 1.5. Cypher Result Uniqueness in the Cypher manual]
How about this statement?
MATCH p1=(v:player)-[e1]->(n)
MATCH p2=(n:team)<-[e2]-(m)
WHERE e1=e2
RETURN e1,e2,p1,p2

Resources