I have a graph dataset loaded in Neo4j with nodes being various persons and relationships being some "real" relationships between them. What makes it complicated is that each relationship has a time period during which it was valid. For example:
(p1:PERSON {name: "Andy"})
-[r1:HAS_RELATIONSHIP {from: "20190201", to: "20190215"}]->
(p2:PERSON {name: "Betty"})
-[r2:HAS_RELATIONSHIP {from: "20190301", to: "20190331"}]->
(p3:PERSON {name: "Cecil"})
I'd like to take one concrete person P and get a list of all persons with whom P was in an indirect relationship through other persons. It must hold that the intersection of dates in any relationship chain is nonempty.
So from the previous example, if we take Andy as P, the result should be Andy, Betty, because the relationship with Cecil was valid in a completely different period of time. But in the following case:
(p1:PERSON {name: "Andy"})
-[r1:HAS_RELATIONSHIP {from: "20190201", to: "20190215"}]->
(p2:PERSON {name: "Betty"})
-[r2:HAS_RELATIONSHIP {from: "20190210", to: "20190301"}]->
(p3:PERSON {name: "Cecil"})
the result should be Andy, Betty, Cecil.
Is there a way how to specify this condition in Cypher? I'm looking for an efficient solution which prunes the already found paths.
You basically have a list of intervals from all relationships on a path. For this list of intervals you need to check if they all overlap. This can be done by checking max(from) <= min(to), in cypher:
MATCH path=(p:PERSON {name:'Andy'})-[*..10]-(other) // Doesn't matter how you get the paths
UNWIND relationships(path) as r
WITH path,max(r.from) AS maxFrom,min(r.to) AS minTo
WHERE maxFrom <= minTo
RETURN extract(x in nodes(path) | x.name)
Related
I'm trying to find a match pattern to match paths of certain node types. I don't care about the type of relation. Any relation type may match. I only care about the node types.
Of course the following would work:
MATCH (n)-->(:a)-->(:b)-->(:c) WHERE id(n) = 0
But, some of these paths may have relations to themselves. This could be for :b, so I'd also like to match:
MATCH (n)-->(:a)-->(:b)-->(:b)-->(:c) WHERE id(n) = 0
And:
MATCH (n)-->(:a)-->(:b)-->(:b)-->(:b)-->(:c) WHERE id(n) = 0
I can do this with relations easily enough, but I can't figure out how to do this with nodes, something like:
MATCH (n)-->(:a)-->(:b*1..)-->(:c) WHERE id(n) = 0
As a practical example, let's say I have a database with people, cars and bikes. The cars and bikes are "owned" by people, and people have relationships like son, daughter, husband, wife, etc. What I'm looking for is a query that from a specific node, gets all nodes of related types. So:
MATCH (n)-->(:person*1..)-->(:car) WHERE Id(n) = 0
I would expect that to get node "n", it's parents, grandparents, children, grandchildren, all recursively. And then of those people, their cars. If I could assume that I know the full list of relations, and that they only apply to people, I could get this to work as follows:
MATCH
p = (n)-->(:person)-[:son|daughter|husband|wife|etc*0..]->(:person)-->(:car)
WHERE Id(n) = 0
RETURN nodes(p)
What I'm looking for is the same without having to specify the full list of relations; but just the node label.
Edit:
If you want to find the path from one Person node to each Car node, using only the node labels, and assuming nodes may create cycles, you can use apoc.path.expandConfig.
For example:
MERGE (mark:Person {name: "Mark"})
MERGE (lju:Person {name: "Lju"})
MERGE (praveena:Person {name: "Praveena"})
MERGE (zhen:Person {name: "Zhen"})
MERGE (martin:Person {name: "Martin"})
MERGE (joe:Person {name: "Joe"})
MERGE (stefan:Person {name: "Stefan"})
MERGE (alicia:Person {name: "Alicia"})
MERGE (markCar:Car {name: "Mark's car"})
MERGE (ljuCar:Car {name: "Lju's car"})
MERGE (praveenaCar:Car {name: "Praveena's car"})
MERGE (zhenCar:Car {name: "Zhen's car"})
MERGE (zhen)-[:CHILD_OF]-(mark)
MERGE (praveena)-[:CHILD_OF]-(martin)
MERGE (praveena)-[:MARRIED_TO]-(joe)
MERGE (zhen)-[:CHILD_OF]-(joe)
MERGE (alicia)-[:CHILD_OF]-(joe)
MERGE (zhen)-[:CHILD_OF]-(mark)
MERGE (anthony)-[:CHILD_OF]-(rik)
MERGE (martin)-[:CHILD_OF]-(mark)
MERGE (stefan)-[:CHILD_OF]-(zhen)
MERGE (lju)-[:CHILD_OF]-(stefan)
MERGE (markCar)-[:OWNED]-(mark)
MERGE (ljuCar)-[:OWNED]-(lju)
MERGE (praveenaCar)-[:OWNED]-(praveena)
MERGE (zhenCar)-[:OWNED]-(zhen)
Running a query:
MATCH (n:Person{name:'Joe'})
CALL apoc.path.expandConfig(n, {labelFilter: "Person|/Car", uniqueness: "NODE_GLOBAL"})
YIELD path
RETURN path
will return four unique paths from Joe node to the four car nodes. There are several options for uniqueness of the path, see uniqueness
The /CAR makes it a Termination label, i.e. returned paths are only up to this given label.
I'm new to Neo4J and Cypher and decided to play around with the Movie sample data that is provided when installing Neo4J desktop.
I want to run a very simple query, namely to retrieve the titles of the movies which involved 3 people, Liv Tyler, Charlize Theron, and Bonnie Hunt. Matching up two people is not a problem (see the code below) but including a third one is difficult.
In SQL this wouldn't be a problem for me, but Cypher causes serious headaches. Here is the query so far:
MATCH (Person {name: "Liv Tyler"})-[:ACTED_IN]->(movie:Movie)<-[:DIRECTED]-(Person {name: "Bonnie Hunt"})
RETURN movie.title AS Title
I've tried to use AND statements, but nothing works.
So how to include Charlize Theron in this query?
You can use multiple patterns to match three or more connections to a single node.
You can use the variable movie which you are using in your query to refer same Movie node to include the pattern (:Person {name: "Charlize Thero"})-[:ACTED_IN]->(movie).
MATCH (:Person {name: "Liv Tyler"})-[:ACTED_IN]->(movie:Movie)<-[:DIRECTED]-(:Person {name: "Bonnie Hunt"}),
(:Person {name: "Charlize Theron"})-[:ACTED_IN]->(movie)
RETURN movie.title AS Title
You can also rewrite the above query as follows:
MATCH (:Person {name: "Liv Tyler"})-[:ACTED_IN]->(movie:Movie),
(:Person {name: "Bonnie Hunt"})-[:DIRECTED]->(movie),
(:Person {name: "Charlize Theron"})-[:ACTED_IN]->(movie)
RETURN movie.title AS Title
If you have some arbitrary number of actors (parameterized), where you can't hardcode the :Person nodes in question, you can instead match on :Person nodes with their name in the parameter list, then filter based on the count of patterns found (you want to make sure all of the persons who acted in the movie are counted).
But if we do that first for directors, then we have some movie matches already, and can apply an all() predicate on the list of actors to ensure they all acted in the movie.
Assuming two list parameters, one for actors, one for directors:
MATCH (director:Person)-[:DIRECTED]->(m:Movie)
WHERE director.name in $directors
WITH m, count(director) as directorCount
WHERE directorCount = size($directors)
AND all(actor IN $actors WHERE (:Person {name:actor})-[:ACTED_IN]->(m))
RETURN m.title as Title
If I have a graph that looks like so for nodes (p) and (e):
(p:Person)-[r:WorksFor]->(e:Employer)
And I have the following data:
(Person {name: Andrew})-[r:WorksFor]->(Employer {name: Google})
(Person {name: James})-[r:WorksFor]->(Employer {name: Google})
(Person {name: James})-[r:WorksFor]->(Employer {name: Apple})
(Person {name: Evan})-[r:WorksFor]->(Employer {name: Apple})
How can I query between (Person {name: Evan}) through each relationship and get to (Person {name: Andrew}) returning each employer and person along the way with an arbitrary number of employers and persons in-between?
Ideally the above would return a chain that looked like:
(Andrew)->(Google)->(James)->(Apple)->(Evan)
Thank you for your help.
(EDIT) Addendum:
The following seems to work but only if the players are separated by only two degrees, is there a way to make this completely variable length?
MATCH
(p:Person {name: "Andrew"})-->(e:Employer)<--(p3:Person)-->(e2:Employer)<--(p2:Person {name: "Evan"})
RETURN *
You want variable-length pattern matching.
Depending on your graph you can define the relationships to traverse or leave off the type. We can omit the direction to specify that we don't care about the direction of relationships to traverse:
MATCH path = (p:Person {name: "Andrew"})-[:WorksFor*]-(p2:Person {name: "Evan"})
RETURN path
If you want the nodes in the path, you can return nodes(path) to get that list.
If you only want the shortest path between these two, you can match to both then match using the shortestPath function:
MATCH (p:Person {name: "Andrew"}), (p2:Person {name: "Evan"})
MATCH path = shortestPath((p)-[:WorksFor*]-(p2))
RETURN path
I am trying to find the minimum checkpoints that were traversed between the nodes for each person. There are multiple paths that can be traversed by each person.
Example:
CREATE
(:person {id: 0}),
(:person {id: 1})-[:rel1]->(:chkpt1 {id: '1'})-[:rel2]->(:chkpt2 {id: '2'}),
(:person {id: 2})-[:rel1]->(:chkpt1 {id: '1_1'}),
(:person {id: 2})-[:rel1]->(:chkpt1 {id: '1_2'})-[:rel2]->(:chkpt2 {id: '2_1'}),
(:person {id: 2})-[:rel1]->(:chkpt1 {id: '1_3'})-[:rel2]->(:chkpt2 {id: '2_2'})-[:rel3]->(:chkpt3 {id: '3_1'}),
(:person {id: 3})-[:rel1]->(:chkpt1 {id: '1_4'})-[:rel2]->(:chkpt2 {id: '2_3'})-[:rel3]->(:chkpt3 {id: '3_2'}),
(:person {id: 3})-[:rel1]->(:chkpt1 {id: '1_5'})-[:rel2]->(:chkpt2 {id: '2_4'})-[:rel3]->(:chkpt3 {id: '3_3'}),
(:person {id: 3})-[:rel1]->(:chkpt1 {id: '1_6'})-[:rel2]->(:chkpt2 {id: '2_5'})-[:rel3]->(:chkpt3 {id: '3_4'})
Currently, I am using the OPTIONAL MATCH clause and running multiple queries as follows:
MATCH (p:person)
OPTIONAL MATCH (p)-[:rel1]-(cp1:chkpt1)
WITH p, cp1
WHERE cp1 IS NULL
RETURN p.id
Returns: person0
Then I run a separate query to find all the persons that didn't make it to the next checkpoint.
MATCH (p:person)-[:rel1]-(cp1:chkpt1)
OPTIONAL MATCH (cp1)-[:rel2]-(cp2:chkpt2)
WITH p, cp1, cp2
WHERE cp2 IS NULL
RETURN DISTINCT p.id, cp1.id
Returns: person2
Similarly for the next checkpoint.
MATCH (p:person)-[:rel1]-(cp1:chkpt1)-[:rel2]-(cp2:chkpt2)
OPTIONAL MATCH (cp2)-[:rel3]-(cp3:chkpt3)
WITH p, cp1, cp2, cp3
WHERE cp3 IS NULL
RETURN DISTINCT p.id, cp1.id, cp2.id
Returns: person1 and person2
I want to return only person1 as person2 missed previous traversals.
MATCH (p:person)-[:rel1]-(cp1:chkpt1)-[:rel2]-(cp2:chkpt2)-[:rel3]-(cp3:chkpt3)
RETURN DISTINCT p.id, cp1.id, cp2.id
Returns: person2 and person3
However, I want to only return person3 as person2 did not make it to chkpt3 and chkpt2.
I need to not include the persons that have already been excluded because they did not make it to the previous checkpoint on another traversal.
Example:
person1 should only show up that they did not make it to chkpt1.
person2 should only show up that they did not make it to chkpt3.
person3 shows up in chkpt3 as they completed all the paths to the final chkpt3.
I would like to summarize the counts of the persons that made it to a certain checkpoint. As there could be multiple persons that made it to the shortest checkpoint.
I also tried to combine all queries with multiple OPTIONAL MATCH clauses but that slows down a lot when the number of nodes increases.
There will be 100.000 to a million total nodes. The actual traversal will only involve 1000s of nodes as the persons will be filtered based on some value.
However, I want to only return person3 as person2 did not make it to chkpt3 and chkpt2.
How about this query? It counts the number of traversals a person is involved in and checks whether they made it to checkpoint 3 in all traversals.
MATCH (p:person)-[r]-()
WITH p, count(r) AS allTraversals
MATCH (p)-[:rel1]-(cp1:chkpt1)-[:rel2]-(cp2:chkpt2)-[:rel3]-(cp3:chkpt3)
WITH p, allTraversals, count(cp3) AS cp3s
WHERE allTraversals = cp3s
RETURN p
(Note: this will not work for person0.)
Additionally, a couple of observations:
(1.) You can use the WHERE NOT <pattern> construct to formulate negative conditions in a more succinct way.
MATCH (p:person)
WHERE NOT (p:person)-[:rel1]->(:chkpt1)
RETURN p
(2.) If it's possible, you might consider reviewing your data model and store persons and checkpoints as a single node and add the paths between them. This is a more graph-like representation and should help in formulating efficient queries.
I would like to match all paths from one given node.
-->(c: {name:"*Tom*"})
/
(a)-->(b)-->(d: {name:"*Tom*"})
\
-->(e: {name:"*Tom*"})
These paths have specified structure that:
- the name of all children of the second-last node (b) should contain "Tom" substring.
How to write correct Cypher?
Let's recreate the dataset:
CREATE
(a:Person {name: 'Start'}),
(b:Person),
(c:Person {name: 'Tommy Lee Jones'}),
(d:Person {name: 'Tom Hanks'}),
(e:Person {name: 'Tom the Cat'}),
(a)-[:FRIEND]->(b),
(b)-[:FRIEND]->(c),
(b)-[:FRIEND]->(d),
(b)-[:FRIEND]->(e)
As you said in the comment, all requires a list. To get a list, you should use the collect function on the neighbours of b:
MATCH (:Person)-[:FRIEND]->(b:Person)-[:FRIEND]->(bn:Person)
WITH b, collect(bn) AS bns
WHERE all(bn in bns where bn.name =~ '.*Tom.*')
RETURN b, bns
We call b's neighbours as bn and collect them to a bns list.