I've recently started learning Cypher. I have a database containing four users and some films. Users can have [:WATCHED] / [:WATCHLISTED] / [:FAVORITED] relationships with films.
I want to get the films which all four users have watched. Here's a working query I've written:
match (u1)-[:WATCHED]->(f)<-[:WATCHED]-(u2),
(u3)-[:WATCHED]->(f)<-[:WATCHED]-(u4)
return u1, u2, u3, u4, f
I wanted to know if there is a more efficient way to do this, or any other way that I can't think of. I'm asking out of curiosity.
You can do this, for example:
MATCH (f:Film)
WHERE size((f)<-[:WATCHED]-()) = 4
RETURN f, [(f)<-[:WATCHED]-(u:User) | u] as watchers
Here I assume that there is only one relationship of type WATCHED between a user and a movie, even if the user has watched the movie many times.
To avoid having to hardcode a User node count, this query efficiently gets the count using the DB's internal statistics:
MATCH (u:User)
WITH COUNT(u) AS userCount
MATCH (f:Film)
WHERE SIZE((f)<-[:WATCHED]-()) = userCount
RETURN f;
This query does not return the users that watched the film, since that is literally all the users in the DB. With a sufficiently large number of users the query could run out of memory, or a client (like the neo4j Browser) could take a very long time to receive and process the results. I think the main point of a query like this is to find the films, not the users. If you really want to get all the users, a separate query will do: MATCH (u:User) RETURN u.
You can use all:
https://neo4j.com/docs/developer-manual/current/cypher/functions/predicate/
This checks if a predicate is true for all elements.
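As a rough illustration of what an all-style predicate computes here, the following plain-Python sketch models the graph as a dictionary. The user names, film titles, and the function name films_watched_by_all are hypothetical and only stand in for the real User and Film nodes; this is a model of the logic, not Cypher.

```python
# Hypothetical in-memory stand-in for the graph:
# user name -> set of film titles that user has a WATCHED relationship to.
watched = {
    "ann": {"Alien", "Heat", "Up"},
    "bob": {"Alien", "Up"},
    "cat": {"Alien", "Heat", "Up"},
    "dan": {"Alien", "Up", "Jaws"},
}


def films_watched_by_all(watched):
    """Return the films for which 'this user watched it' is true for all users."""
    # Every film that appears in at least one user's watched set.
    all_films = set().union(*watched.values())
    return sorted(
        film
        for film in all_films
        # Same idea as an all(...) predicate: it must hold for every user.
        if all(film in films for films in watched.values())
    )


print(films_watched_by_all(watched))
```

Unlike the size(...) = 4 comparison, this check does not depend on knowing the number of users in advance.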
Related
I have constructed a query to find the people who follow each other and who have read books in the same genre. Here it is:
MATCH (u1:User)-[:READ]->(b1:Book)
WITH collect(DISTINCT b1.genre) AS genres,u1 AS user1
MATCH (u2:User)-[:READ]->(b2:Book)
WHERE (user1)<-[:FOLLOWS]->(u2) AND b2.genre IN genres
RETURN DISTINCT user1.username AS user1,u2.username AS user2
The idea is that we collect all the book genres for one of them, and if a book read by the other is in that list of genres (and they follow each other), then we return those users. This seems to work: we get a list of distinct pairs of individuals. I wonder, though, if there is a quicker way to do this? My solution seems somewhat clumsy, but I found it surprisingly finicky trying to specify that they have read a book in the same genre without getting back all the pairs of books and duplicating individuals. For example, I first wrote the following:
MATCH (b1:Book)<-[:READ]-(u1:User)-[:FOLLOWS]-(u2:User)-[:READ]->(b2:Book)
WHERE b1.genre = b2.genre
RETURN DISTINCT u1.username AS user1, u2.username AS user2
Which seems simpler, but in fact it returned repeated names for all the books that were read in the same genre. Is my solution the simplest, or is there a simpler one?
This is one way of rewriting the query:
MATCH (n1:User)-[:FOLLOWS]-(n2:User)
MATCH (n1)-[:READ]->(book), (n2)-[:READ]->(book2)
WHERE book.genre = book2.genre
RETURN n1.username, n2.username, count(*)
Here is another, which collects the genres for each user:
MATCH (n1:User)-[:FOLLOWS]-(n2:User)
WITH n1, n2,
[(n1)-[:READ]->(book) | book.genre] AS g1,
[(n2)-[:READ]->(book) | book.genre] AS g2
WHERE ANY(x IN g1 WHERE x IN g2)
RETURN n1, n2, count(*)
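The ANY(x IN g1 WHERE x IN g2) check above boils down to asking whether two genre lists overlap. A minimal plain-Python sketch of that check follows; the user names, genres, and the function name pairs_sharing_genre are made up for illustration. Note that it lists each follow pair once, whereas the undirected Cypher pattern would match each pair in both orders.

```python
# Hypothetical data: undirected follow pairs and the genres each user has read.
follows = {("ann", "bob"), ("bob", "cat")}
genres_read = {
    "ann": {"scifi", "crime"},
    "bob": {"crime"},
    "cat": {"poetry"},
}


def pairs_sharing_genre(follows, genres_read):
    """Keep the follow pairs whose genre collections have a common element."""
    result = []
    for u1, u2 in sorted(follows):
        g1 = genres_read.get(u1, set())
        g2 = genres_read.get(u2, set())
        # Mirrors ANY(x IN g1 WHERE x IN g2): one shared genre is enough.
        if any(x in g2 for x in g1):
            result.append((u1, u2))
    return result


print(pairs_sharing_genre(follows, genres_read))
```

Because genres are compared per user rather than per pair of books, each pair of users appears once no matter how many books they share in a genre.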
Note that a longer query is not necessarily a better one; what matters most is that the way the data is retrieved makes sense to you.
Your model, however, clearly shows that you would benefit from a bit of graph refactoring: extracting the genre into its own node, for example:
MATCH (n:Book)
MERGE (g:Genre {name: n.genre})
MERGE (n)-[:HAS_GENRE]->(g)
And this would be the new query, which takes advantage of the refactored graph model:
PROFILE
MATCH (n1:User)-[:FOLLOWS]-(n2:User)
WHERE (n1)-[:READ]->()-[:HAS_GENRE]->()<-[:HAS_GENRE]-()<-[:READ]-(n2)
RETURN n1.username, n2.username, count(*)
I am trying to create a query in Neo4J (which is something I'm literally learning as I go) but I am struggling. So I am using the Neo4J sandbox and movie data set provided (:play movie-graph). This is what I am looking for...
Create a query to retrieve all the movie directors that worked with actors 55 years or older at the time the movie was released
Display the networks of actors, writers, and movies for these actors
This is what I have but I'm sure it is completely wrong. Any help would be appreciated!
MATCH (directors)<-[:DIRECTED]-(directors)-[:ACTED_IN]->(actors)-(actors: Age) WHERE actors.age >= 55 RETURN directors.name
Here is a query which will return all directors who directed movies with actors born before 1970. Change the year to whatever value satisfies your criteria.
MATCH (q)-[r:DIRECTED]->(k)<-[:ACTED_IN]-(s) where s.born<1970 RETURN DISTINCT q;
A few suggestions for you:
. Understand neo4j nodes and relationships. In your query, you have written (directors), (actors) etc. I don't find these entities in the movie dataset. This suggests that you need to understand these concepts and how they are represented in queries. The browser console shows these entities clearly.
. Understand relationship directions and how they are represented in queries.
. Modify the above query in small ways and see how the results change. For example, return s instead of q, remove the DISTINCT keyword, etc.
. Finally, create your own custom data set and try out queries.
Good luck.
To find the names of directors who directed movies that had an actor aged 55+ at the time the movie was released, you can take advantage of the new existential subquery in neo4j 4.0:
MATCH (director:Person)-[:DIRECTED]->(m)
WHERE EXISTS {
  MATCH (m)<-[:ACTED_IN]-(actor:Person)
  WHERE m.released - actor.born >= 55
}
RETURN DISTINCT director.name
To get the actors who were 55+ at the time a movie they were in was released, along with a list of those movies and a list of their writers:
MATCH (w)-[:WROTE]->(m)<-[:ACTED_IN]-(actor:Person)
WHERE m.released-actor.born >= 55
RETURN actor, COLLECT(DISTINCT m) AS movies, COLLECT(DISTINCT w) AS writers
I want to create a new table based on relationships, after retrieving results. The query works and returns output when limited to a few results.
match (p:Person)-[r:LIVES]->(t:COUNTRY)<-[r2:LIVES]-(p2:Person)
where p<>p2 and r.year = r2.year and r.year >=2015 and r2.year>=2015
return p,r,t,r2,p2 limit 25;
But the server stops responding when I want all results.
match (p:Person)-[r:LIVES]->(t:COUNTRY)<-[r2:LIVES]-(p2:Person)
where p<>p2 and r.year = r2.year and r.year >=2015 and r2.year>=2015
return p,r,t,r2,p2;
This is basically a self-join on the relationship "LIVES". I have searched a lot but have not found a way to create indices on relationships.
Any suggestions?
Since it sounds like you aren't bound to the results format, I would like to offer an alternative Cypher query.
The thing is that in your cypher, p and p2 basically form a cartesian product of people in the same year and country. So really you just need to group people by year and country, filter out groups of one, and then every pair in that bag is your p and p2 from your original query.
MATCH (p:Person)-[r:LIVES]->(t:COUNTRY)
WHERE r.year >=2015
WITH t, r.year AS Year, COLLECT({n:p, r:r}) AS people
WHERE SIZE(people) > 1
RETURN t, people
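To make the grouping argument concrete, here is a small plain-Python sketch on made-up (person, country, year) rows. The names lives, group_people, and self_join_pairs are hypothetical. It assumes each person has at most one LIVES entry per country and year, and it checks that the pairs inside each group are the same pairs the original self-join would return.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical rows standing in for (p:Person)-[r:LIVES]->(t:COUNTRY) with r.year.
lives = [
    ("ann", "FR", 2015),
    ("bob", "FR", 2015),
    ("cat", "FR", 2016),
    ("dan", "DE", 2015),
    ("eve", "FR", 2015),
    ("old", "FR", 2010),
]


def group_people(lives, min_year=2015):
    """Group people by (country, year) and drop groups with a single person."""
    groups = defaultdict(list)
    for person, country, year in lives:
        if year >= min_year:
            groups[(country, year)].append(person)
    return {key: people for key, people in groups.items() if len(people) > 1}


def self_join_pairs(lives, min_year=2015):
    """Model of the original query: every ordered pair sharing country and year."""
    return {
        (p, p2)
        for p, c, y in lives
        for p2, c2, y2 in lives
        if p != p2 and c == c2 and y == y2 and y >= min_year
    }


groups = group_people(lives)
# Expanding each group into ordered pairs recovers the self-join result.
pairs_from_groups = {
    pair for people in groups.values() for pair in permutations(people, 2)
}
print(groups)
print(pairs_from_groups == self_join_pairs(lives))
```

The grouped form returns one row per country and year instead of one row per pair, which is why it stays small where the pairwise result grows quadratically with the group size.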
I have the following query
MATCH (User1 )-[:VIEWED]->(page)<-[:VIEWED]- (User2 )
RETURN User1.userId,User2.userId, count(page) as cnt
It's a relatively simple query to find co-page view counts between users.
It's just too slow, and I have to terminate it after some time.
Details
User consists of about 150k Nodes
Page consists of about 180k Nodes
User -VIEWS-> Page has about 380k Relationships
User has 7 attributes, and Page has about 5 attributes.
Both User and Page are indexed on UserId and PageId respectively.
Heap Size is 512mb (tried to run on 1g too)
What would be some ways to optimize this query? I don't think the number of nodes and relationships is very large.
Use Labels
Always use Node labels in your patterns.
MATCH (u1:User)-[:VIEWED]->(p:Page)<-[:VIEWED]-(u2:User)
RETURN u1.userId, u2.userId, count(p) AS cnt;
Don't match on duplicate pairs of users
This query will be executed for all pairs of users (that share a viewed page) twice. Each user will be mapped to User1 and then each user will also be mapped to User2. To limit this:
MATCH (u1:User)-[:VIEWED]->(p:Page)<-[:VIEWED]-(u2:User)
WHERE id(u1) > id(u2)
RETURN u1.userId, u2.userId, count(p) AS cnt;
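The effect of the id(u1) > id(u2) filter can be sketched in plain Python on hypothetical data: each unordered pair of users is counted once instead of twice. The names views and co_view_counts are made up for this illustration, and the sketch orders pairs alphabetically rather than by internal node id, which serves the same purpose.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: user id -> set of page ids that user has viewed.
views = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p2", "p3"},
    "u3": {"p3", "p9"},
}


def co_view_counts(views):
    """Count shared pages per unordered user pair, visiting each pair once."""
    # Invert the mapping: page id -> set of users who viewed it.
    viewers = {}
    for user, pages in views.items():
        for page in pages:
            viewers.setdefault(page, set()).add(user)

    counts = Counter()
    for users in viewers.values():
        # combinations() yields (a, b) with a < b only, so (b, a) never appears.
        for a, b in combinations(sorted(users), 2):
            counts[(a, b)] += 1
    return dict(counts)


print(co_view_counts(views))
```

Without the ordering constraint every pair would show up in both orders, doubling the number of result rows without adding information.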
Query for a specific user
If you can bind either side of the pattern the query will be much faster. Do you need to execute this query for all pairs of users? Would it make sense to execute it relative to a single user only? For example:
MATCH (u1:User {name: "Bob"})-[:VIEWED]->(p:Page)<-[:VIEWED]-(u2:User)
WHERE NOT u1=u2
RETURN u1.userId, u2.userId, count(p) AS cnt;
As you try different queries, you can prepend EXPLAIN or PROFILE to the Cypher query to see the execution plan and the number of database hits.
I have a simple social network graph db model. Users can follow other users and post posts. I am trying to get a list of all posts that a user has posted along with any that anyone the user follows has posted
START a=node:node_auto_index(UserIdentifier = "USER0")
MATCH (a)-[:POSTED]->(b), (a)-[:FOLLOWS]->(c)-[:POSTED]->(d)
RETURN b, d;
It is returning the cross product of the two, a tuple of all the values in b joined with all the values in d. (b x d) I would like just a straight list of posts. How do I do this? Do I need to do two separate queries?
Answered at https://groups.google.com/forum/?fromgroups=#!topic/neo4j/SdM7bKNRDEA :
START a=node:node_auto_index(UserIdentifier = "USER0")
MATCH (a)-[:POSTED]->(b)
WITH a, collect(b) as posts
MATCH (a)-[:FOLLOWS]->(c)-[:POSTED]->(d)
RETURN posts, collect(d) as followersPosts;
Another way you can do it now (and IMHO the cleaner way) is to take advantage of variable length relationships.
START user=node...
MATCH (user) -[:FOLLOWS*0..1]-> (following) -[:POSTED]-> (post)
RETURN post
The advantage of this approach is that it lets you handle both your own posts and the posts of the users you follow uniformly, e.g. for sorting, limiting, paginating, etc.
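A rough plain-Python model of the [:FOLLOWS*0..1] idea, using made-up users and posts: zero hops keeps the user, one hop adds everyone the user follows, and the posts of all those authors end up in one flat list. The names follows, posted, and feed are hypothetical.

```python
# Hypothetical data: who follows whom, and what each user has posted.
follows = {"user0": ["amy", "ben"], "amy": ["zed"]}
posted = {
    "user0": ["post-a"],
    "amy": ["post-b", "post-c"],
    "ben": [],
    "zed": ["post-z"],
}


def feed(user, follows, posted):
    """Posts by the user (zero hops) and by the users they follow (one hop)."""
    # Zero hops: the user. One hop: the accounts the user follows directly.
    authors = [user] + follows.get(user, [])
    return [post for author in authors for post in posted.get(author, [])]


print(feed("user0", follows, posted))
```

Since the result is a single list of posts rather than a pair of columns, there is no cross product, and sorting or paging can be applied to it directly.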