Querying Relationship is very slow - neo4j

I want to create a new table based on relationships, after retrieving results.The query works and outputs for few results.
match (p:Person)-[r:LIVES]->(t:COUNTRY)<-[r2:LIVES]-(p2:Person)
where p<>p2 and r.year = r2.year and r.year >=2015 and r2.year>=2015
return p,r,t,r2,p2 limit 25;
But the server stops responding when I want all results.
match (p:Person)-[r:LIVES]->(t:COUNTRY)<-[r2:LIVES]-(p2:Person)
where p<>p2 and r.year = r2.year and r.year >=2015 and r2.year>=2015
return p,r,t,r2,p2;
This is basically self-joining relationship "LIVES".I have searched a lot but not found a way to create indices on relationships.
Any suggestions?

Since it sound like you aren't bound to the results format, I would like to offer an alternative Cypher.
The thing is that in your cypher, p and p2 basically form a cartesian product of people in the same year and country. So really you just need to group people by year and country, filter out groups of one, and then every pair in that bag is your p and p2 from your original query.
MATCH (p:Person)-[r:LIVES]->(t:COUNTRY)
WHERE r.year >=2015
WITH t, DISTINCT r.year as Year, COLLECT({n:p, r:r}) as people
WHERE SIZE(people) > 1
RETURN t, people

Related

Is there a simpler version of this cypher query?

I have constructed a query to find the people who follow each other and who have read books in the same genre. Here it is:
MATCH (u1:User)-[:READ]->(b1:Book)
WITH collect(DISTINCT b1.genre) AS genres,u1 AS user1
MATCH (u2:User)-[:READ]->(b2:Book)
WHERE (user1)<-[:FOLLOWS]->(u2) AND b2.genre IN genres
RETURN DISTINCT user1.username AS user1,u2.username AS user2
The idea is that we collect all the book genres for one of them, and if a book read by the other is in that list of genres (and they follow each other), then we return those users. This seems to work: we get a list of distinct pairs of individuals. I wonder, though, if there a quicker way to do this? My solution seems somewhat clumsy, but I found it surprisingly finicky trying to specify that they have read a book in the same genre without getting back all the pairs of books and duplicating individuals. For example, I
first wrote the following:
MATCH (b1:Book)<-[:READ]-(u1:User)-[:FOLLOWS]-(u2:User)-[:READ]->(b2:Book)
WHERE b1.genre = b2.genre
RETURN DISTINCT u1.username AS user1, u2.username AS user2
Which seems simpler, but in fact it returned repeated names for all the books that were read in the same genre. Is my solution the simplest, or is there a simpler one?
This is one way of rewriting the query
MATCH (n1:User)-[:FOLLOWS]-(n2:User)
MATCH (n1)-[:READ]->(book), (n2)-[:READ]->(book2)
WHERE book.genre = book2.genre
RETURN n1.username, n2.username, count(*)
Here is another collecting genres for each user
MATCH (n1:User)-[:FOLLOWS]-(n2:User)
WITH n1, n2,
[(n1)-[:READ]->(book) | book.genre] AS g1,
[(n2)-[:READ]->(book) | book.genre] AS g2
WHERE ANY(x IN g1 WHERE x IN g2)
RETURN n1, n2, count(*)
Note that sometimes longer queries are not especially better in the sense that the ways the data are retrieved need to make sense to yourself.
Your model however clearly shows that you would benefit from a bit of graph refactoring, extracting the genre into its own node, for eg
MATCH (n:Book)
MERGE (g:Genre {name: n.genre})
MERGE (n)-[:HAS_GENRE]->(g)
And this would be the new query which leverages a graph model
PROFILE
MATCH (n1:User)-[:FOLLOWS]-(n2:User)
WHERE (n1)-[:READ]->()-[:HAS_GENRE]->()<-[:HAS_GENRE]-()<-[:READ]-(n2)
RETURN n1.username, n2.username, count(*)

Efficient way to find common relationship

I've recently started learning Cypher. I have a database containing four users and films. Users can have can have [:WATCHED] / [:WATCHLISTED] / [:FAVORITED] relationships with films.
I want to get the films which all four users have watched. Here's a working query I've written:
match (u1)-[:WATCHED]->(f)<-[:WATCHED]-(u2),
(u3)-[:WATCHED]->(f)<-[:WATCHED]-(u4)
return u1, u2, u3, u4, f
I wanted to know if there was a more efficient way to do this. Or any another way, which I can't of. I'm asking this out of curiosity.
You can do this for example :
MATCH (f:Film)
WHERE size((f)<-[:WATCHED]-()) = 4
RETURN f, [(f)<-[:WATCHED]-(u:User) | u] as watchers
Here I assume that there is only one relationship of type WATCHED between a user and a movie, even if the user has watched the movie many times.
To avoid having to hardcode a User node count, this query efficiently gets the count using the DB's internal statistics:
MATCH (u:User)
WITH COUNT(u) AS userCount
MATCH (f:Film)
WHERE SIZE((f)<-[:WATCHED]-()) = userCount
RETURN f;
This query does not return the users that watched the film, since that is literally all the users in the DB, and with a sufficiently large number of them your query can run out of memory -- or it can take a very long time for a client (like the neo4j Browser) to receive and process the results. I think the main point of a query like this is to find the films, not the users. If you really want to get all the users, a separate query will do: MATCH (u:Users) RETURN u.
You can use all:
https://neo4j.com/docs/developer-manual/current/cypher/functions/predicate/
This checks if a predicate is true for all elements.

Neo4j: multiple counts from multiple matches

Given a neo4j schema similar to
(:Person)-[:OWNS]-(:Book)-[:CATEGORIZED_AS]-(:Category)
I'm trying to write a query to get the count of books owned by each person as well as the count of books in each category so that I can calculate the percentage of books in each category for each person.
I've tried queries along the lines of
match (p:Person)-[:OWNS]-(b:Book)-[:CATEGORIZED_AS]-(c:Category)
where person.name in []
with p, b, c
match (p)-[:OWNS]-(b2:Book)-[:CATEGORIZED_AS]-(c2:Category)
with p, b, c, b2
return p.name, b.name, c.name,
count(distinct b) as count_books_in_category,
count(distinct b2) as count_books_total
But the query plan is absolutely horrible when trying to do the second match. I've tried to figure out different ways to write the query so that I can do the two different counts, but haven't figured out anything other than doing two matches. My schema isn't really about people and books. The :CATEGORIZED_AS relationship in my example is actually a few different relationship options, specified as [:option1|option2|option3]. So in my 2nd match I repeat the relationship options so that my total count is constrained by them.
Ideas? This feels similar to Neo4j - apply match to each result of previous match but there didn't seem to be a good answer for that one.
UNWIND is your friend here. First, calculate the total books per person, collecting them as you go.
Then unwind them so you can match which categories they belong to.
Aggregate by category and person, and you should get the number of books in each category, for a person
match (p:Person)-[:OWNS]->(b:Book)
with p,collect(b) as books, count(b) as total
with p,total,books
unwind books as book
match (book)-[:CATEGORIZED_AS]->(c)
return p,c, count(book) as subtotal, total

Neo4j / Cypher query syntax feedback

I'm developing a kind of reddit service to learn Neo4j.
Everything works fine, I just want to get some feedback on the Cypher query to get the most recent news stories, the author and number of comments, likes and dislikes.
I'm using Neo4j 2.0.
MATCH comments = (n:news)-[:COMMENT]-(o)
MATCH likes = (n:news)-[:LIKES]-(p)
MATCH dislikes = (n:news)-[:DISLIKES]-(q)
MATCH (n:news)-[:POSTED_BY]-(r)
WITH n, r, count(comments) AS num_comments, count(likes) AS num_likes, count(dislikes) AS num_dislikes
ORDER BY n.post_date
LIMIT 20
RETURN *
o, p, q, r are all nodes with the label user. Should the label be added to the query to speed it up?
Is there anything else you see that I could optimize?
I think you're going to want to get rid of the multiple matches. Cypher will filter on each one, filtering through one another, rather than getting all the information.
I would also avoid the paths like comments, and rather do the count on the nodes you are saving. When you do MATCH xyz = (a)-[:COMMENT]-(b) then xyz is a path, which contains the source, relationship and destination node.
MATCH (news:news)-[:COMMENT]-(comment),(news:news)-[:LIKES]-(like),(news:news)-[:DISLIKES]-(dislike),(news:news)-[:POSTED_BY]-(posted_by)
WHERE news.post_date > 0
WITH news, posted_by, count(comment) AS num_comments, count(like) AS num_likes, count(dislike) AS num_dislikes
ORDER BY news.post_date
LIMIT 20
RETURN *
I would do something like this.
MATCH (n:news)-[:POSTED_BY]->(r)
WHERE n.post_date > {recent_start_time}
RETURN n, r,
length((n)<-[:COMMENT]-()) AS num_comments,
length((n)<-[:LIKES]-()) AS num_likes,
length((n)<-[:DISLIKES]-()) AS num_dislikes,
ORDER BY n.post_date DESC
LIMIT 20
To speed it up and have not neo search over all your posts, I would probably index the post-date field (assuming it doesn't contain time information). And then send this query in for today, yesterday etc. until you have your 20 posts.
MATCH (n:news {post_date: {day}})-[:POSTED_BY]->(r)
RETURN n, r,
length((n)<-[:COMMENT]-()) AS num_comments,
length((n)<-[:LIKES]-()) AS num_likes,
length((n)<-[:DISLIKES]-()) AS num_dislikes,
ORDER BY n.post_date DESC
LIMIT 20

Neo4j Cypher query RETURN distinct set of nodes

I have a simple social network graph db model. Users can follow other users and post posts. I am trying to get a list of all posts that a user has posted along with any that anyone the user follows has posted
START a=node:node_auto_index(UserIdentifier = "USER0")
MATCH (a)-[:POSTED]->(b), (a)-[:FOLLOWS]->(c)-[:POSTED]->(d)
RETURN b, d;
It is returning the cross product of the two, a tuple of all the values in b joined with all the values in d. (b x d) I would like just a straight list of posts. How do I do this? Do I need to do two separate queries?
Anwsered at https://groups.google.com/forum/?fromgroups=#!topic/neo4j/SdM7bKNRDEA :
START a=node:node_auto_index(UserIdentifier = "USER0")
MATCH (a)-[:POSTED]->(b)
WITH a, collect(b) as posts
MATCH (a)-[:FOLLOWS]->(c)-[:POSTED]->(d)
RETURN posts, collect(d) as followersPosts;
Another way you can do it now (and IMHO the cleaner way) is to take advantage of variable length relationships.
START user=node...
MATCH (user) -[:FOLLOWS*0..1]-> (following) -[:POSTED]-> (post)
RETURN post
The advantage to this way is it lets you aggregate both your own queries and your friends/followings' queries uniformly. E.g. sorting, limiting, paginating, etc.

Resources