Is there a better way to do this than 3 separate queries?
MATCH (you:User {name: "Alexander"})-[:LIKES]->(youLike:User)
RETURN youLike
MATCH (likesYou:User)-[:LIKES]->(you:User {name: "Alexander"})
RETURN likesYou
MATCH (mutualLike:User)-[:LIKES]->(you:User {name: "Alexander"})-[:LIKES]->(mutualLike:User)
RETURN mutualLike
Here is a shot at a single query.
Essentially, find yourself first, optionally find people that you like and collect them, optionally find people that like you and collect them, then return both collections and the intersection of the two.
By matching the node that identifies you and reusing it you match yourself once instead of three times.
Using the collection filter function allowsyou two find the intersection of the two :LIKES populations without rematching those nodes.
The OPTIONAL keyword allows the query to continue if either :LIKES population is empty.
MATCH (you:User {name: "Alexander"})
WITH you
OPTIONAL MATCH(you)-[:LIKES]->(youLike:User)
WITH you, collect(youLike) as youLike
OPTIONAL MATCH (likesYou:User)-[:LIKES]->(you)
WITH you, youLike, collect(likesYou) as likesYou
RETURN you
, youLike
, likesYou
, filter(n in youLike where n in likesYou) as mutualLike
Related
I'm looking into neo4j as a Graph database, and variable length path queries will be a very important use case. I now think I've found an example query that Cypher will not support.
The main issue is that I want to treat composed relations as a single relation. Let my give an example: finding co-actors. I've done this using the standard database of movies. The goal is to find all actors that have acted alongside Tom Hanks. This can be found with the query:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->()<-[:ACTED_IN]-(a:Person) return a
Now, what if we want to find co-actors of co-actors recursively.
We can rewrite the above query to:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN*2]-(a:Person) return a
And then it becomes clear we can do this with
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN*]-(a:Person) return a
Notably, all odd-length paths are excluded because they do not end in a Person.
Now, I have found a query that I cannot figure out how to make recursive:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->()<-[:DIRECTED]-()-[:DIRECTED]->()<-[:ACTED_IN]-(a:Person) return DISTINCT a
In words, all actors that have a director in common with Tom Hanks.
In order to make this recursive I tried:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN|DIRECTED*]-(a:Person) return DISTINCT a
However, (besides not seeming to complete at all). This will also capture co-actors.
That is, it will match paths of the form
()-[:ACTED_IN]->()<-[:ACTED_IN]-()
So what I am wondering is:
can we somehow restrict the order in which relations occur in a multi-path query?
Something like:
MATCH (tom {name: "Tom Hanks"}){-[:ACTED_IN]->()<-[:DIRECTED]-()-[:DIRECTED]->()<-[:ACTED_IN]-}*(a:Person) return DISTINCT a
Where the * applies to everything in the curly braces.
The path expander procs from APOC Procedures should help here, as we added the ability to express repeating sequences of labels, relationships, or both.
In this case, since you want to match on the actor of the pattern rather than the director (or any of the movies in the path), we need to specify which nodes in the path you want to return, which requires either using the labelFilter in addition to the relationshipFilter, or just to use the combined sequence config property to specify the alternating labels/relationships expected, and making sure we use an end node filter on the :Person node at the point in the pattern that you want.
Here's how you would do this after installing APOC:
MATCH (tom:Person {name: "Tom Hanks"})
CALL apoc.path.expandConfig(tom, {sequence:'>Person, ACTED_IN>, *, <DIRECTED, *, DIRECTED>, *, <ACTED_IN', maxLevel:12}) YIELD path
WITH last(nodes(path)) as person, min(length(path)) as distance
RETURN person.name
We would usually use subgraphNodes() for these, since it's efficient at expanding out and pruning paths to nodes we've already seen, but in this case, we want to keep the ability to revisit already visited nodes, as they may occur in further iterations of the sequence, so to get a correct answer we can't use this or any of the procs that use NODE_GLOBAL uniqueness.
Because of this, we need to guard against exploring too many paths, as the permutations of relationships to explore that fit the path will skyrocket, even after we've already found all distinct nodes possible. To avoid this, we'll have to add a maxLevel, so I'm using 12 in this case.
This procedure will also produce multiple paths to the same node, so we're going to get the minimum length of all paths to each node.
The sequence config property lets us specify alternating label and relationship type filterings for each step in the sequence, starting at the starting node. We are using an end node filter symbol, > before the first Person label (>Person) indicating that we only want paths to the Person node at this point in the sequence (as the first element in the sequence it will also be the last element in the sequence as it repeats). We use the wildcard * for the label filter of all other nodes, meaning the nodes are whitelisted and will be traversed no matter what their label is, but we don't want to return any paths to these nodes.
If you want to see all the actors who acted in movies directed by directors who directed Tom Hanks, but who have never acted with Tom, here is one way:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->(m)
MATCH (m)<-[:ACTED_IN]-(ignoredActor)
WITH COLLECT(DISTINCT m) AS ignoredMovies, COLLECT(DISTINCT ignoredActor) AS ignoredActors
UNWIND ignoredMovies AS movie
MATCH (movie)<-[:DIRECTED]-()-[:DIRECTED]->(m2)
WHERE NOT m2 IN ignoredMovies
MATCH (m2)<-[:ACTED_IN]-(a:Person)
WHERE NOT a IN ignoredActors
RETURN DISTINCT a
The top 2 MATCH clauses are deliberately not combined into one clause, so that the Tom Hanks node will be captured as an ignoredActor. (A MATCH clause filters out any result that use the same relationship twice.)
I'm stumped on this one, and I think the answer will be straightforward, so let me cut right to it.
Given a graph that looks like this:
Created by a query that looks like this:
CREATE (simpsons:Family {name: "Simpson"})
CREATE (homer:Father {name: "Homer"})
CREATE (lisa:Daughter {name: "Lisa"})
CREATE (snowball:Pet {name: "snowball"})
CREATE (lisa)-[:owns]->(snowball)-[:has]->(:Item {name: "catnip"})
CREATE (homer)-[:has]->(:Item {name: "beer"})
CREATE (lisa)-[:has]->(:Item {name: "saxophone"})
CREATE (lisa)<-[:memberOf]-(simpsons)-[:memberOf]->(homer)
Why would a query that looks like this fail?
MATCH (f:Family),
(f)-[*1..10]-(lisa:Daughter),
(lisa)-[*1..10]-(:Item {name: "saxophone"}),
(f)-[*1..10]-(snowball:Pet),
(snowball)-[*1..10]-(:Item {name: "catnip"})
RETURN f;
Taken separately, its two components both find matches.
MATCH (f:Family),
(f)-[*1..10]-(lisa:Daughter),
(lisa)-[*1..10]-(:Item {name: "saxophone"})
RETURN f;
and
MATCH (f:Family),
(f)-[*1..10]-(snowball:Pet),
(snowball)-[*1..10]-(:Item {name: "catnip"})
RETURN f;
But when pieced together there are no matches.
I have tried PROFILEing the query and it seems like Cypher works backwards from Snowball. It can make that first connection between the family and Snowball.
After that it does a VarLengthExpand(All)
snowball, f, lisa
(f)-[ UNNAMED22:*..10]-(lisa)
Which yields 6 rows. We then drop to 0 rows with this Filter:
snowball, f, lisa
lisa: Daughter
I can get the match to work if I declare a connection between the family and a daughter in the first line of the match statement, but for reasons having to do w/ my particular application this is not a useful workaround.
MATCH (f:Family)-[*1..10]-(lisa:Daughter),
(lisa)-[*1..10]-(:Item {name: "saxophone"}),
(lisa)-[*1..10]-(snowball:Pet {name: "snowball"})-[*1..10]-(:Item {name: "catnip"})
RETURN f;
I think I'm missing something about how Cypher searches for these patterns. Does anyone have insight into what that might be? Thank you for your time!
This isn't a Cypher bug, this is a side-effect of relationship uniqueness within a given MATCH pattern.
From the uniqueness section of the docs:
While pattern matching, Neo4j makes sure to not include matches where the same graph relationship is found multiple times in a single pattern.
This type of uniqueness is usually correct, and is great for preventing infinite loops when using variable-length relationships which traverse a cycle.
Relationship uniqueness is preserved for patterns from a MATCH or an OPTIONAL MATCH, even when these include multiple comma separated paths, as in your case.
You have all of the paths within the pattern of a single MATCH, so relationships must be unique; if used in one path, they will not be reused for another path.
The real problem is here: (f)-[*1..10]-(snowball:Pet) because you've already traversed the same relationship (<memberOf between the Simpsons and Lisa) when you did (f)-[*1..10]-(lisa:Daughter) earlier. Since the relationship cannot be reused, one of those two paths will not be able to be matched, so the entire MATCH fails...no such pattern exists with unique relationships.
Note that when you break up the single MATCH into multiple MATCHes, as in stdob--'s answer, the query succeeds. There is no uniqueness in play here between separate MATCH clauses.
I am querying neo4j with cypher and am getting two rows per match:
MATCH (g:Group {Name: "Goliath_Treasury237"})-[w:MEMBER]->(a:Account)<-[y:ACCOUNT]-(p:Person)-[MANAGER*0..1]->(b:Person)
WHERE NOT p.staffID = b.staffID
MATCH (g:Group {Name: "Goliath_Treasury237"})-[j:MEMBER]->(v:Account)<-[x:ACCOUNT]-(p:Person)-[f:DEP*0..1]->(d:Department)
RETURN p.GivenName, p.Surname, p.staffID, p.CorpT, b.staffID, d.Name
I'm trying to get the information for both the department and the boss of a person which I'm struggling with declaratively unless I do the double match. This returns two rows for each match where a person has a boss, one with their ID as their bosses ID and one with the correct bosses id. For people without a boss I get one row back but the bosses id is their own.
If I remove the variable length path for boss then I get one row for each individual but no row where someone doesn't have a boss.
I'm at a loss now, any help would be great!
There's several things we can fix here.
For one, you don't need the second match to start over from the beginning, you can reuse the same p node without needing all the stuff before. You also don't need to have a variable on every node and relationship if you're not going to use or return it in some way.
You can use an OPTIONAL MATCH when you don't want the match to filter out rows, newly-introduced variables in the OPTIONAL MATCH will be null if the match fails.
Something like this should work:
MATCH (:Group {Name: "Goliath_Treasury237"})-[:MEMBER]->(:Account)<-[:ACCOUNT]-(p:Person)
OPTIONAL MATCH (p)-[MANAGER]->(b:Person)
WHERE NOT p.staffID = b.staffID
OPTIONAL MATCH (p)-[:DEP]->(d:Department)
RETURN p.GivenName, p.Surname, p.staffID, p.CorpT, b.staffID, d.Name
I have three types of nodes:
(:Meal), (:User), (:Dish)
With relationships:
(:Meal)<-[:JOIN]-(:User), (:Meal)<-[:ORDERED]-(:Dish)
Now I want to fetch the information of meal in one query. I want to get result like this:
id: 1
name: xxx,
users: [1,2,3,4],
dishes: [23,42,42]
where users and dishes fields contains the ids of those users and dishes.
I tried:
MATCH (meal:Meal)
OPTIONAL MATCH (meal)<-[:JOIN]-(user:User)
OPTIONAL MATCH (meal)<-[:ORDERED]-(dish:Dish)
RETURN id(meal), meal.name, COLLECT(ID(user)), COLLECT(ID(dish))
However, this query will generate a lot duplication of users and dishes. If there are N users and M dishes, it will match N*M user-dish pairs.
I do realize that I can use DISTINCT to remove duplication. However, I am not sure about the efficiency of such query.
Is there any better way?
Try to separate the different parts of your query using WITH:
MATCH (meal:Meal)
OPTIONAL MATCH (meal)<-[:JOIN]-(user:User)
WITH meal, collect(ID(user)) as users
OPTIONAL MATCH (meal)<-[:ORDERED]-(dish:Dish)
RETURN id(meal), meal.name, users, COLLECT(ID(dish)) as dishes
I'd like to combine two requests into one query and I'm not sure what happens when 2 match statements are used in a single cypher query.
say I have a list of friends and I'd like to see a list of my friends with each of their uncles and siblings listed in a collection. Can I have two match statements that would do the job? e.g.
match friends-[:childOf]->parents-[:brother]->uncles
, friends-[:childOf]->parents<-[:childOf]-siblings
return friends, collect(siblings), collect(uncles)
However, if I do a query like this, it always returns no results.
Since you have already chosen parents in your first match class, you can do like this -
match friends-[:childOf]->parents-[:brother]->uncles
with friends, parents, uncles
match parents<-[:childOf]-siblings
return friends, collect(siblings), collect(uncles)
You may want to make some of those relationships optional. For example, if you find a sibling but you don't find any uncles, this query will return null because it didn't satisfy both match clauses. If you make the end relationships optional then you don't have to satisfy both clauses completely to return data. So:
match friends-[:childOf]->parents-[?:brother]->uncles
, friends-[:childOf]->parents<-[?:childOf]-siblings
return friends, collect(siblings), collect(uncles)