how to gather the unique related rows in a neo4j cypher query? - neo4j

I have 3 posts, 1 of which has two comments. I need to retrieve the 3 posts plus the 2 comments and the authors. The problem is that Cypher returns all the associated links. How can I get only the rows I need? When I run this query I get 72 rows, but I am only expecting 4:
MATCH (p:Posts)
WITH p
MATCH(c:Comments)-[r:COMMENTED_ON]-()
WITH p, c
MATCH (u)-[:MADE_A_POST]->()
WITH p,c,u
MATCH (n)-[:POSTED_COMMENT]-()
RETURN {post:p,comments:c,author:u,commentator:n}
I am trying to ensure that each row contains only the properties related to that row. In this case I should have 4 rows, since one of the posts has 2 comments. I looked at UNION, COLLECT and DISTINCT, and none seems to be a good fit. Any help would be appreciated.

In your query you are effectively doing the following:
Match all posts
Match all comments that have a COMMENTED_ON relationship
Match all nodes that have a MADE_A_POST relationship
Match all nodes that have a POSTED_COMMENT relationship
As I can't see your data model, I can only guess, but it looks like you're making no direct connection between the posts and their comments, or between those and their authors. Because the four patterns share no variables, every combination of their matches is returned.
Taking a guess, I think your query should look like the following:
MATCH (p:Posts)
WITH p
MATCH (n)-[:POSTED_COMMENT]-(c:Comments)-[r:COMMENTED_ON]-(p)
MATCH (u)-[:MADE_A_POST]->(p)
RETURN {post:p,comments:c,author:u,commentator:n}
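To see why the original query returns so many rows, it can help to model each MATCH as a list of rows. The following Python sketch uses made-up data (the node names, and the particular breakdown that happens to give 72 rows, are assumptions and not taken from the asker's database). Patterns that share no variables multiply, while patterns joined through p and c do not.

```python
from itertools import product

# Hypothetical data: 3 posts, 2 comments on post "p1", one author per post.
posts = ["p1", "p2", "p3"]
comment_on = {"c1": "p1", "c2": "p1"}                       # comment -> post
post_author = {"p1": "alice", "p2": "bob", "p3": "carol"}   # post -> author
comment_author = {"c1": "dave", "c2": "erin"}               # comment -> commentator

# Original query: the four patterns share no variables, so each one
# contributes its own independent set of rows.
p_rows = posts                              # MATCH (p:Posts)
c_rows = list(comment_on)                   # MATCH (c:Comments)-[:COMMENTED_ON]-()
u_rows = [post_author[p] for p in posts]    # MATCH (u)-[:MADE_A_POST]->()
# An undirected pattern with unlabelled ends matches every relationship
# once from each end, so 2 relationships give 4 rows.
n_rows = [end for c, user in comment_author.items() for end in (user, c)]

unconnected = list(product(p_rows, c_rows, u_rows, n_rows))

# Rewritten query: every pattern is anchored on p (and c), so a row is
# produced only for nodes that are actually related.
connected = [
    (comment_on[c], c, post_author[comment_on[c]], comment_author[c])
    for c in comment_on
]

print(len(unconnected))  # 3 * 2 * 3 * 4 combinations
print(len(connected))    # one row per comment
```

Note that a plain MATCH keeps only the posts that have comments; posts without comments would need OPTIONAL MATCH to appear in the result.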

Related

Is there a simpler version of this cypher query?

I have constructed a query to find the people who follow each other and who have read books in the same genre. Here it is:
MATCH (u1:User)-[:READ]->(b1:Book)
WITH collect(DISTINCT b1.genre) AS genres,u1 AS user1
MATCH (u2:User)-[:READ]->(b2:Book)
WHERE (user1)<-[:FOLLOWS]->(u2) AND b2.genre IN genres
RETURN DISTINCT user1.username AS user1,u2.username AS user2
The idea is that we collect all the book genres for one of them, and if a book read by the other is in that list of genres (and they follow each other), then we return those users. This seems to work: we get a list of distinct pairs of individuals. I wonder, though, if there is a quicker way to do this? My solution seems somewhat clumsy, but I found it surprisingly finicky trying to specify that they have read a book in the same genre without getting back all the pairs of books and duplicating individuals. For example, I first wrote the following:
MATCH (b1:Book)<-[:READ]-(u1:User)-[:FOLLOWS]-(u2:User)-[:READ]->(b2:Book)
WHERE b1.genre = b2.genre
RETURN DISTINCT u1.username AS user1, u2.username AS user2
Which seems simpler, but in fact it returned repeated names for all the books that were read in the same genre. Is my solution the simplest, or is there a simpler one?
This is one way of rewriting the query:
MATCH (n1:User)-[:FOLLOWS]-(n2:User)
MATCH (n1)-[:READ]->(book), (n2)-[:READ]->(book2)
WHERE book.genre = book2.genre
RETURN n1.username, n2.username, count(*)
Here is another, collecting the genres for each user:
MATCH (n1:User)-[:FOLLOWS]-(n2:User)
WITH n1, n2,
[(n1)-[:READ]->(book) | book.genre] AS g1,
[(n2)-[:READ]->(book) | book.genre] AS g2
WHERE ANY(x IN g1 WHERE x IN g2)
RETURN n1, n2, count(*)
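The ANY(x IN g1 WHERE x IN g2) test above is a list-overlap check. A minimal Python sketch of the same idea, using hypothetical users and genres, shows that each pair is tested once no matter how many books fall in the shared genre:

```python
# Hypothetical data: who follows whom, and the genre of each book a user read.
follows = [("ann", "bob"), ("bob", "cat")]
genres_read = {
    "ann": ["fantasy", "crime"],
    "bob": ["crime", "crime", "history"],
    "cat": ["poetry"],
}

def shares_genre(g1, g2):
    # Same test as ANY(x IN g1 WHERE x IN g2)
    return any(x in g2 for x in g1)

pairs = [
    (a, b)
    for a, b in follows
    if shares_genre(genres_read[a], genres_read[b])
]
print(pairs)
```

Unlike the undirected Cypher pattern, which visits each FOLLOWS relationship from both ends, this sketch lists each pair only once.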
Note that query length is not what makes one version better than another; the way the data is retrieved needs to make sense to you.
However, your model clearly shows that you would benefit from a bit of graph refactoring: extracting the genre into its own node, for example:
MATCH (n:Book)
MERGE (g:Genre {name: n.genre})
MERGE (n)-[:HAS_GENRE]->(g)
And this would be the new query, which leverages the graph model:
PROFILE
MATCH (n1:User)-[:FOLLOWS]-(n2:User)
WHERE (n1)-[:READ]->()-[:HAS_GENRE]->()<-[:HAS_GENRE]-()<-[:READ]-(n2)
RETURN n1.username, n2.username, count(*)

How to get counts of Edges related to a node in one query - NEO4J

I have two questions.
What is the best way to index user activities like posts, reposts, comments, upvotes, and downvotes? My current solution represents every activity as a POST. It should work, but I know it's quite expensive to treat upvotes and downvotes as new nodes when I can just use a relationship to represent them. But then, I want to be able to fetch everything at once and order it.
Secondly: when I run the following without the WITH clauses and the MATCH clauses that follow them, the result is larger. But as I try to get the counts of reposts, replies and upvotes, the result keeps getting smaller until eventually nothing is returned.
MATCH (me:User {id: "172ed572-e3af-d3ee-77c0-8d9d181b12f1"})-[:COLLEAGUE_OF]-(u:User)-[posted:POSTED]->(p:Post) WHERE posted.date >= 0
WITH p, posted, u AS user MATCH (p)-[ro:REPOST_OF]-(:Post)
WITH count(ro) AS reposts, posted, ro, user MATCH (p)-[rt:REPLY_TO]->(:Post)
WITH count(rt) AS replies, posted, user, reposts MATCH (p)-[uv:UP_VOTE]->(:Post)
WITH count(uv) AS upvotes, posted, user, reposts, replies, p
RETURN p AS post, posted, user, reposts, replies
ORDER BY -posted.date
You need to read the documentation on aggregating functions (like COUNT). In particular, you need to understand that the WITH (and RETURN) clause treats terms that do not contain aggregating functions as the "grouping keys" for the terms that do contain aggregating functions.
For example, a clause such as WITH foo, COUNT(foo) AS fooCount groups by foo itself, so fooCount will be 1 whenever each foo appears in only one row.
WITH clauses must specify the bound variables whose values you want to use later in the same query; any unspecified variables will be dropped. Since your second and third WITH clauses do not specify p, their subsequent MATCH clauses are NOT using the previously bound value for p (they introduce a totally new p variable, which matches many nodes).
You should use OPTIONAL MATCH instead of MATCH to get the counts of things that may not exist. A MATCH that finds nothing for a row drops that row, so a single failed MATCH can leave the query with no results at all.
You neglected to make the (p)-[ro:REPOST_OF]-(:Post) relationship pattern directional. If you wanted to get a count of the number of times that p was reposted, you should have used the pattern (p)<-[ro:REPOST_OF]-(:Post).
You forgot to return upvotes.
You should use ORDER BY posted.date DESC instead of ORDER BY -posted.date.
This may work better for you:
MATCH (:User {id: "172ed572-e3af-d3ee-77c0-8d9d181b12f1"})-[:COLLEAGUE_OF]-(user:User)-[posted:POSTED]->(p:Post)
WHERE posted.date >= 0
OPTIONAL MATCH (p)<-[ro:REPOST_OF]-(:Post)
WITH p, posted, user, COUNT(ro) AS reposts
OPTIONAL MATCH (p)-[rt:REPLY_TO]->(:Post)
WITH p, posted, user, reposts, COUNT(rt) AS replies
OPTIONAL MATCH (p)-[uv:UP_VOTE]->(:Post)
RETURN p, posted, user, reposts, replies, COUNT(uv) AS upvotes
ORDER BY posted.date DESC
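The grouping-key rule described above can be modelled in a few lines of Python. This is only a sketch with hypothetical rows, not output from the asker's database: counting per post gives real totals, while adding the relationship itself as a grouping key (as the original second WITH did with ro) makes every count 1.

```python
from collections import Counter

# Hypothetical rows produced by MATCH (p)<-[ro:REPOST_OF]-(:Post):
# one row per (post, repost relationship) pair.
rows = [("post1", "ro1"), ("post1", "ro2"), ("post1", "ro3"), ("post2", "ro4")]

# WITH p, COUNT(ro) AS reposts
# p is the only grouping key, so rows are counted per post.
by_post = Counter(p for p, ro in rows)

# WITH p, ro, COUNT(ro) AS reposts
# ro is now a grouping key too, so every group holds exactly one row.
by_post_and_rel = Counter((p, ro) for p, ro in rows)

print(dict(by_post))
print(set(by_post_and_rel.values()))
```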

Neo4j: multiple counts from multiple matches

Given a neo4j schema similar to
(:Person)-[:OWNS]-(:Book)-[:CATEGORIZED_AS]-(:Category)
I'm trying to write a query to get the count of books owned by each person as well as the count of books in each category so that I can calculate the percentage of books in each category for each person.
I've tried queries along the lines of
match (p:Person)-[:OWNS]-(b:Book)-[:CATEGORIZED_AS]-(c:Category)
where p.name in []
with p, b, c
match (p)-[:OWNS]-(b2:Book)-[:CATEGORIZED_AS]-(c2:Category)
with p, b, c, b2
return p.name, b.name, c.name,
count(distinct b) as count_books_in_category,
count(distinct b2) as count_books_total
But the query plan is absolutely horrible when trying to do the second match. I've tried to figure out different ways to write the query so that I can do the two different counts, but haven't figured out anything other than doing two matches. My schema isn't really about people and books. The :CATEGORIZED_AS relationship in my example is actually a few different relationship options, specified as [:option1|option2|option3]. So in my 2nd match I repeat the relationship options so that my total count is constrained by them.
Ideas? This feels similar to Neo4j - apply match to each result of previous match but there didn't seem to be a good answer for that one.
UNWIND is your friend here. First, calculate the total books per person, collecting them as you go.
Then unwind them so you can match which categories they belong to.
Aggregate by category and person, and you should get the number of books in each category for each person:
match (p:Person)-[:OWNS]->(b:Book)
with p,collect(b) as books, count(b) as total
with p,total,books
unwind books as book
match (book)-[:CATEGORIZED_AS]->(c)
return p,c, count(book) as subtotal, total
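The collect, unwind and aggregate steps above can be followed in plain Python. The sketch below uses hypothetical people and books and assumes each book has exactly one category (in a real graph a book may have several); it also computes the percentage the question was ultimately after.

```python
from collections import Counter, defaultdict

# Hypothetical data: (person, book) ownership and one category per book.
owns = [("ann", "b1"), ("ann", "b2"), ("ann", "b3"), ("bob", "b4")]
category_of = {"b1": "scifi", "b2": "scifi", "b3": "crime", "b4": "crime"}

# with p, collect(b) as books, count(b) as total
books = defaultdict(list)
for person, book in owns:
    books[person].append(book)
total = {person: len(owned) for person, owned in books.items()}

# unwind books as book, match its category, then count(book)
# grouped by (person, category)
subtotal = Counter(
    (person, category_of[book])
    for person, owned in books.items()
    for book in owned
)

# Share of each category within a person's collection.
share = {key: subtotal[key] / total[key[0]] for key in subtotal}
print(share)
```

Because total is computed once per person before the unwind, the second expensive match from the question is not needed.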

cypher query get wrong results

I have a graph which contains 3 types of nodes: Post, User and Cat. Every cat has many posts that are linked together with a 'next' relationship. Every user can comment on these posts, and each comment is a 'Commented' relationship with 'content' and 'id' properties.
I want to get the posts of some cat and all the comments related to every post.
I've tried this, but it doesn't give me the posts without comments:
MATCH (cat:Cat {id:1})-[:lastPost]->(last)-[:next*0..]->(rest)
MATCH (rest)<-[c:Commented]-(u:User)
RETURN c, rest
Is there any way to achieve what I want in Cypher? Thank you.
Try changing the second MATCH statement to an OPTIONAL MATCH:
MATCH (cat:Cat {id:1})-[:lastPost]->(last)-[:next*0..]->(rest)
OPTIONAL MATCH (rest)<-[c:Commented]-(u:User)
RETURN c, rest
For any row where the OPTIONAL MATCH pattern finds nothing, the variables it introduces (here c and u) will be null, and the row itself is kept.
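The difference between the two clauses is the same as between an inner join and a left outer join. A minimal Python sketch with hypothetical posts and comments (None stands in for Cypher's null):

```python
# Hypothetical data: three posts, only the first has comments.
posts = ["post1", "post2", "post3"]
comments = {"post1": ["nice", "thanks"]}   # post -> comment contents

# MATCH followed by MATCH: a post with no comment produces no row.
match_rows = [(c, p) for p in posts for c in comments.get(p, [])]

# MATCH followed by OPTIONAL MATCH: a post with no comment is kept,
# with None in place of the comment.
optional_rows = [(c, p) for p in posts for c in (comments.get(p) or [None])]

print(match_rows)
print(optional_rows)
```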

Cypher query to find nodes that have 3 relationships

I figured out how to write this query when I am looking for 2 relationships, but not sure how to add more relationships to the query.
Assume you have a book club database with 'reader' and 'book' as nodes. The 'book' nodes have a 'genre' attribute (to define that the book is a Fiction, Non-Fiction, Biography, Reference, etc.) There is a Relationship "HasRead" between 'reader' nodes and 'book' nodes where someone has read a particular book.
If I want to find readers that have read both Fiction AND Non-Fiction books, I could execute this Cypher query:
Start b1=node:MyBookIndex('Genre:Fiction'),
b2=node:MyBookIndex('Genre:Non-Fiction')
Match b1-[:HasRead]-r-[:HasRead]-b2
Return r.ReaderName
The key to the above query is the Match clause that has the two book aliases feeding into the r alias for the 'reader' nodes.
Question: How would I write the query to find users that have read Fiction AND Non-Fiction AND Reference books? I'm stuck with how you would write the Match clause when you have more than 2 things you are looking for.
You can have multiple patterns specified in a single MATCH clause, separated by commas. For example, the following two MATCH clauses are semantically equivalent (and will be evaluated identically by the engine):
//these mean the same thing!
match a--b--c
match a--b, b--c
You can have any number of these matches. So, plugging that into your query, you get this:
start b1=node:MyBookIndex('Genre:Fiction'),
b2=node:MyBookIndex('Genre:Non-Fiction'),
b3=node:MyBookIndex('Genre:Reference')
match b1-[:HasRead]-r,
b2-[:HasRead]-r,
b3-[:HasRead]-r
return r.ReaderName
You can also use Cypher's WITH clause:
start b1=node:MyBookIndex('Genre:Fiction'),
b2=node:MyBookIndex('Genre:Non-Fiction'),
b3=node:MyBookIndex('Genre:Reference')
match b1-[:HasRead]-r-[:HasRead]-b2
with b3, r
match b3-[:HasRead]-r
return r.ReaderName
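Both answers amount to intersecting the sets of readers for each genre, which is why the approach extends to any number of genres. A small Python sketch of that idea, with hypothetical reader names:

```python
# Hypothetical HasRead data: genre -> readers who read at least one such book.
readers_by_genre = {
    "Fiction": {"ann", "bob", "cat"},
    "Non-Fiction": {"ann", "bob"},
    "Reference": {"bob", "dan"},
}

# Each extra genre adds one more set to the intersection, just as each
# extra pattern sharing r adds one more constraint in the MATCH clause.
wanted = ["Fiction", "Non-Fiction", "Reference"]
readers = set.intersection(*(readers_by_genre[g] for g in wanted))
print(sorted(readers))
```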

Resources