Cypher query to find shortest partial paths and aggregate the result - neo4j

For each person, I am trying to find the minimum number of checkpoints traversed across all of their paths. Each person can traverse multiple paths.
Example:
// p2 and p3 are bound once and reused, so each person is a single node with several paths
CREATE
(:person {id: 0}),
(:person {id: 1})-[:rel1]->(:chkpt1 {id: '1'})-[:rel2]->(:chkpt2 {id: '2'}),
(p2:person {id: 2})-[:rel1]->(:chkpt1 {id: '1_1'}),
(p2)-[:rel1]->(:chkpt1 {id: '1_2'})-[:rel2]->(:chkpt2 {id: '2_1'}),
(p2)-[:rel1]->(:chkpt1 {id: '1_3'})-[:rel2]->(:chkpt2 {id: '2_2'})-[:rel3]->(:chkpt3 {id: '3_1'}),
(p3:person {id: 3})-[:rel1]->(:chkpt1 {id: '1_4'})-[:rel2]->(:chkpt2 {id: '2_3'})-[:rel3]->(:chkpt3 {id: '3_2'}),
(p3)-[:rel1]->(:chkpt1 {id: '1_5'})-[:rel2]->(:chkpt2 {id: '2_4'})-[:rel3]->(:chkpt3 {id: '3_3'}),
(p3)-[:rel1]->(:chkpt1 {id: '1_6'})-[:rel2]->(:chkpt2 {id: '2_5'})-[:rel3]->(:chkpt3 {id: '3_4'})
Currently, I am using the OPTIONAL MATCH clause and running multiple queries as follows:
MATCH (p:person)
OPTIONAL MATCH (p)-[:rel1]-(cp1:chkpt1)
WITH p, cp1
WHERE cp1 IS NULL
RETURN p.id
Returns: person0
Then I run a separate query to find all the persons that didn't make it to the next checkpoint.
MATCH (p:person)-[:rel1]-(cp1:chkpt1)
OPTIONAL MATCH (cp1)-[:rel2]-(cp2:chkpt2)
WITH p, cp1, cp2
WHERE cp2 IS NULL
RETURN DISTINCT p.id, cp1.id
Returns: person2
Similarly for the next checkpoint.
MATCH (p:person)-[:rel1]-(cp1:chkpt1)-[:rel2]-(cp2:chkpt2)
OPTIONAL MATCH (cp2)-[:rel3]-(cp3:chkpt3)
WITH p, cp1, cp2, cp3
WHERE cp3 IS NULL
RETURN DISTINCT p.id, cp1.id, cp2.id
Returns: person1 and person2
I want to return only person1, because person2 already failed at an earlier checkpoint on another traversal.
MATCH (p:person)-[:rel1]-(cp1:chkpt1)-[:rel2]-(cp2:chkpt2)-[:rel3]-(cp3:chkpt3)
RETURN DISTINCT p.id, cp1.id, cp2.id
Returns: person2 and person3
However, I want to only return person3 as person2 did not make it to chkpt3 and chkpt2.
I need to not include the persons that have already been excluded because they did not make it to the previous checkpoint on another traversal.
Example:
person1 should only show up as not making it to chkpt3.
person2 should only show up as not making it to chkpt2.
person3 shows up in chkpt3 as they completed all the paths to the final chkpt3.
I would like to summarize the count of persons that made it to each checkpoint, since multiple persons can stop at the same checkpoint.
I also tried to combine all queries with multiple OPTIONAL MATCH clauses, but that slows down a lot as the number of nodes increases.
There will be 100,000 to a million nodes in total. The actual traversal will only involve thousands of nodes, as the persons will be filtered on some value.

Quoting the question: "However, I want to only return person3 as person2 did not make it to chkpt3 and chkpt2."
How about this query? It counts the number of traversals a person is involved in and checks whether they made it to checkpoint 3 in all traversals.
MATCH (p:person)-[r]-()
WITH p, count(r) AS allTraversals
MATCH (p)-[:rel1]-(cp1:chkpt1)-[:rel2]-(cp2:chkpt2)-[:rel3]-(cp3:chkpt3)
WITH p, allTraversals, count(cp3) AS cp3s
WHERE allTraversals = cp3s
RETURN p
(Note: this will not work for person0.)
Additionally, a couple of observations:
(1.) You can use the WHERE NOT <pattern> construct to formulate negative conditions in a more succinct way.
MATCH (p:person)
WHERE NOT (p:person)-[:rel1]->(:chkpt1)
RETURN p
(2.) If possible, you might consider reviewing your data model: store each person and each checkpoint as a single node, and add the paths between them as relationships. This is a more graph-like representation and should help in formulating efficient queries.
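A single-query sketch of the per-checkpoint summary the question asks for. It assumes checkpoints only ever link forward to the next checkpoint, so the length of a maximal path from a person equals the number of checkpoints reached on that traversal. The idea is to keep each person's shallowest maximal path and then count persons per depth. It groups by p.id rather than by node, so it gives the same answer whether each person is one node or several nodes sharing an id. It has only been reasoned through against the example data, not benchmarked at the scale mentioned in the question.

```cypher
// Assumes traversals of the form
// (person)-[:rel1]->(chkpt1)-[:rel2]->(chkpt2)-[:rel3]->(chkpt3)
MATCH (p:person)
// Keep only maximal paths (the last node has no outgoing relationship).
// Because this is an OPTIONAL MATCH, person0 still produces a row with a null path.
OPTIONAL MATCH path = (p)-[:rel1|rel2|rel3*1..3]->(last)
WHERE NOT (last)-->()
// A person's result is their shallowest path; no path at all counts as 0.
WITH p.id AS personId, coalesce(min(length(path)), 0) AS checkpointsReached
// Summarize: how many persons stopped at each checkpoint depth.
RETURN checkpointsReached, count(personId) AS persons, collect(personId) AS personIds
ORDER BY checkpointsReached
```

On the example data this should give one person per depth: person0 at 0, person2 at 1, person1 at 2 and person3 at 3. To work on the filtered subset of persons, add the filter to the first MATCH.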

Related

Neo4j (4.1.3) : How to create relationship on the fly when match returns exactly one row for one label

I am using Neo4j 4.1.3. I have two labels: Person and REC. I need to create a relationship between them only when exactly one person has that last name. When I run the query below, relationships are created for all persons with that last name:
MATCH (echk:REC {id: 'abcdef'})
OPTIONAL MATCH (p:Person) WHERE p.name CONTAINS 'Shirley'
WITH p, echk, count(p) as personcnt
where personcnt=1
CALL apoc.merge.relationship(p, 'Test', {}, {}, echk, {}) YIELD rel
RETURN p, echk, rel
I have 33 people with "Shirley" and 33 relationships are getting created with that REC. However, no relationship should be created.
Run this
MATCH (echk:REC {id: 'abcdef'})
OPTIONAL MATCH (p:Person) WHERE p.name CONTAINS 'Shirley'
RETURN p, echk, count(p) as personcnt
and you'll see why all 33 are created.
If there can only be one match, I suggest finding the "one" Shirley first: if none exists, you are done; if more than one exists, you are also done. Only if exactly one match is returned, continue on to match the REC with id 'abcdef' and do the merge.
If I have both p and count(p) in the WITH, I always get 1 for count(p), which makes sense because p then acts as a grouping key. So I changed my query as below to make it work:
OPTIONAL MATCH (p:Person) WHERE p.name CONTAINS 'Shirley'
WITH count(p) AS cnt WHERE cnt = 1
MATCH (echk:REC {id: 'abcdef'})
MATCH (p:Person) WHERE p.name CONTAINS 'Shirley'
CALL apoc.merge.relationship(p, 'Test', {}, {}, echk, {}) YIELD rel
RETURN p, echk, rel
I have now added this to my Cypher logic to create dynamic relationships, and it works as expected. Hopefully this helps someone at a later date.
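A variant of the same approach that matches the Shirley candidates only once: it collects them into a list, checks the list size, and passes the single element on. Like the original, it assumes APOC is installed for apoc.merge.relationship; with a fixed type such as Test, a plain MERGE (p)-[rel:Test]->(echk) would also do.

```cypher
// With no grouping key, collect() returns a single row (an empty list)
// even when nothing matches, so the size check below always runs.
MATCH (p:Person) WHERE p.name CONTAINS 'Shirley'
WITH collect(p) AS people
// Continue only if exactly one person matched.
WHERE size(people) = 1
MATCH (echk:REC {id: 'abcdef'})
WITH people[0] AS p, echk
CALL apoc.merge.relationship(p, 'Test', {}, {}, echk, {}) YIELD rel
RETURN p, echk, rel
```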

Neo4j: Delete all relationships except for those in a list

I am trying to delete all relationships to a node except those that are in a list. I have already created a node (:Person {name: 'John'}) and 4 other nodes (:Car). Then I MERGE a DRIVES relationship from the person node to each car node. I then want to delete all of the person's relationships except those to the cars in a list (shown below):
UNWIND [{name:'test1'}, {name:'test2'}] AS test
MATCH (p:Person {name:'John'})
OPTIONAL MATCH (p)-[d:DRIVES]->(c:Car)
WHERE NOT EXISTS((p)-[:DRIVES]->(c:Car {name:test.name}))
DELETE d
RETURN p
However, the query above deletes all relationships. When I reduce the list to a single car, the query works (i.e. it only works when the list contains one item, and not when the list is larger). I am not sure why this is the case.
I am using neo4j 4.1.
Thanks in advance.
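This should work. Your version fails because UNWIND produces one row per list entry, and each row deletes every DRIVES relationship whose car does not match that row's name: the test1 row deletes the relationship to test2 and the test2 row deletes the one to test1, so together they delete everything. Testing each car against the whole list avoids this: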
WITH ['test1', 'test2'] AS tests
MATCH (p:Person {name: 'John'})
OPTIONAL MATCH (p)-[d:DRIVES]->(c:Car)
WHERE NOT c.name IN tests
DELETE d
RETURN p
and also this:
WITH ['test1', 'test2'] AS tests
MATCH (p:Person {name: 'John'})
FOREACH(x IN [(p)-[d:DRIVES]->(c:Car) WHERE NOT c.name IN tests | d] | DELETE x)
RETURN p
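If the cars to keep arrive as a list of maps, as in the question's UNWIND input, the list can be turned into names first so that the membership test still runs against the whole list. A sketch along the lines of the first query above, with a count of deleted relationships added for checking:

```cypher
// Convert the list of maps into a plain list of names.
WITH [{name: 'test1'}, {name: 'test2'}] AS tests
WITH [t IN tests | t.name] AS keep
// One row per DRIVES relationship; each car is checked against the whole list.
MATCH (p:Person {name: 'John'})-[d:DRIVES]->(c:Car)
WHERE NOT c.name IN keep
DELETE d
RETURN count(*) AS deleted
```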

Neo4j Cypher: interdependent relationship values in a path

I have a graph dataset loaded in Neo4j with nodes being various persons and relationships being some "real" relationships between them. What makes it complicated is that each relationship has a time period during which it was valid. For example:
(p1:PERSON {name: "Andy"})
-[r1:HAS_RELATIONSHIP {from: "20190201", to: "20190215"}]->
(p2:PERSON {name: "Betty"})
-[r2:HAS_RELATIONSHIP {from: "20190301", to: "20190331"}]->
(p3:PERSON {name: "Cecil"})
I'd like to take one concrete person P and get a list of all persons with whom P was in an indirect relationship through other persons. It must hold that the intersection of dates in any relationship chain is nonempty.
So from the previous example, if we take Andy as P, the result should be Andy, Betty, because the relationship with Cecil was valid in a completely different period of time. But in the following case:
(p1:PERSON {name: "Andy"})
-[r1:HAS_RELATIONSHIP {from: "20190201", to: "20190215"}]->
(p2:PERSON {name: "Betty"})
-[r2:HAS_RELATIONSHIP {from: "20190210", to: "20190301"}]->
(p3:PERSON {name: "Cecil"})
the result should be Andy, Betty, Cecil.
Is there a way to specify this condition in Cypher? I'm looking for an efficient solution which prunes the already found paths.
You basically have a list of intervals from all relationships on a path. For this list of intervals you need to check if they all overlap. This can be done by checking max(from) <= min(to), in cypher:
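The max(from) <= min(to) test works because intervals on a line share a common point exactly when the latest start is no later than the earliest end. Writing f_i and t_i for the from and to values of the i-th relationship on a path:

```latex
\begin{align*}
\bigcap_{i=1}^{k} [f_i, t_i] \neq \emptyset
  &\iff \max_i f_i \le \min_j t_j
   \iff f_i \le t_j \ \text{for all } i, j \\
(\Rightarrow)&\quad \text{a common point } x \text{ gives } f_i \le x \le t_j \text{ for all } i, j \\
(\Leftarrow)&\quad x = \max_i f_i \text{ satisfies } f_i \le x \le \min_j t_j \le t_j \\
\text{Example 1:}&\quad \max(20190201, 20190301) = 20190301 > 20190215 = \min(20190215, 20190331) \\
\text{Example 2:}&\quad \max(20190201, 20190210) = 20190210 \le 20190215 = \min(20190215, 20190301)
\end{align*}
```

So Cecil is excluded in the first example and included in the second. This also explains why the dates can stay as strings: yyyymmdd strings of equal length compare lexicographically in chronological order.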
MATCH path=(p:PERSON {name:'Andy'})-[*..10]-(other) // Doesn't matter how you get the paths
UNWIND relationships(path) AS r
WITH path, max(r.from) AS maxFrom, min(r.to) AS minTo
WHERE maxFrom <= minTo
// List comprehension instead of extract(), which was removed in Neo4j 4.0
RETURN [x IN nodes(path) | x.name] AS names
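The same condition can also be written as a predicate on the path itself, using the pairwise form: every relationship's from must be no later than every relationship's to, which is equivalent to max(from) <= min(to). This avoids the UNWIND and returns each connected person once. It is a sketch; I have not checked whether the planner uses this predicate to prune during expansion, so it may not fully address the efficiency concern.

```cypher
MATCH path = (p:PERSON {name: 'Andy'})-[:HAS_RELATIONSHIP*..10]-(other:PERSON)
// Pairwise check over all relationships on the path (including a = b).
WHERE all(a IN relationships(path)
          WHERE all(b IN relationships(path) WHERE a.from <= b.to))
// Each connected person once; the start person is returned separately.
RETURN p.name AS start, collect(DISTINCT other.name) AS connected
```

On the first example this should return Andy with only Betty connected; on the second, Betty and Cecil.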

Cypher Query to return x Number of a particular type of node

Let's say we have a Neo4j graph such as (Brand)-[:FROM]->(Post)<-[:LIKES]-(Person).
How can I write a Cypher query that returns posts from a minimum number of brands, say 3? I want this to be scalable and not dependent on specific property values.
Hence the results would include at least 3 Brand nodes, plus maybe 5 Post nodes and 15 Person nodes.
I have tried a few different things:
1.) Declare separate variables for each brand (not scalable):
Match (b:Brand)-[]->(p:Post)<-[]-(per:Person)
Match (b1:Brand)-[]->(p1:Post)<-[]-(per2:Person)
Match (b2:Brand)-[]->(p2:Post)<-[]-(per3:Person)
return b,b1,b2,p,p1,p2,per,per2,per3
limit 30
This didn't work because it essentially returns the same as
Match (b:Brand)-[]->(p:Post)<-[]-(per:Person)
return b,p,per
limit 30
2.) Use FOREACH:
Match (b:Brand) WITH collect (distinct b) as bb
FOREACH (b in bb[0..3] | MATCH (b)-[]->(p:Post)<-[]-(per:Person))
RETURN b, p, per LIMIT 40
This didn't work because you can't use Match inside a Foreach call.
The only way I know how to do this is to add a WHERE clause listing each brand's unique name value, which is not scalable. It looks like this:
Match (b:Brand)-[]->(p:Post)<-[]-(per:Person)
where b.brand = "b1" OR b.brand ="b2" or b.brand = "b3"
Return b,p,per
Limit 30
However, even the above doesn't return what I want.
Please help. Here is a quick graph to test on:
Create (b1:Brand {brand:'b1'})
Create (b2:Brand {brand:'b2'})
Create (b3:Brand {brand:'b3'})
Create (p1:Post {id: "001",message: "foo"})
Create (p2:Post {id: "002",message: "bar"})
Create (p3:Post {id: "003",message: "baz"})
Create (p4:Post {id: "004",message: "raz"})
Create (per1:Person {id: "001",name: "foo"})
Create (per2:Person {id: "002",name: "foo"})
Create (per3:Person {id: "003",name: "foo"})
Create (per4:Person {id: "004",name: "foo"})
Create (per5:Person {id: "005",name: "foo"})
Create (per6:Person {id: "006",name: "foo"})
Create (per7:Person {id: "007",name: "foo"})
Merge (b1)-[:FROM]->(p1)
Merge (b1)-[:FROM]->(p2)
Merge (b2)-[:FROM]->(p3)
Merge (b3)-[:FROM]->(p4)
Merge (per1)-[:LIKES]->(p1)
Merge (per1)-[:LIKES]->(p2)
Merge (per1)-[:LIKES]->(p3)
Merge (per2)-[:LIKES]->(p1)
Merge (per2)-[:LIKES]->(p4)
Merge (per3)-[:LIKES]->(p3)
Merge (per4)-[:LIKES]->(p1)
Merge (per5)-[:LIKES]->(p2)
Merge (per6)-[:LIKES]->(p1)
Merge (per6)-[:LIKES]->(p2)
Merge (per6)-[:LIKES]->(p3)
Merge (per6)-[:LIKES]->(p4)
Merge (per7)-[:LIKES]->(p4)
You can use UNWIND instead of FOREACH:
Match (b:Brand) WITH collect (distinct b) as bb
UNWIND bb[0..3] as b
MATCH (b)-[]->(p:Post)<-[]-(per:Person)
RETURN b, p, per LIMIT 40
Or combine WITH and LIMIT:
MATCH (b:Brand) WITH distinct b LIMIT 3
MATCH (b)-[]->(p:Post)<-[]-(per:Person)
RETURN b, p, per LIMIT 40
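With a global LIMIT 40, a brand with many posts and likes can crowd the others out of the result. A sketch that caps rows per brand instead, so each of the three selected brands keeps its own rows. The caps of 5 posts per brand and 5 likers per post are arbitrary example values, and brands without liked posts are dropped by the MATCH (OPTIONAL MATCH would keep them):

```cypher
// Pick 3 brands first (no ORDER BY, so which 3 is arbitrary).
MATCH (b:Brand)
WITH b LIMIT 3
// At most 5 posts per brand.
MATCH (b)-[:FROM]->(p:Post)
WITH b, collect(p)[0..5] AS posts
UNWIND posts AS p
// At most 5 likers per post, returned as one row per post.
MATCH (p)<-[:LIKES]-(per:Person)
RETURN b.brand AS brand, p.id AS post, collect(per.id)[0..5] AS likedBy
```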

How can I use cypher to return some limited amount of nodes, and a count of all nodes?

I currently have this query:
START n=node(*)
MATCH (p:Person)-[:is_member]->(g:Group)
WHERE g.name ='FooManGroup'
RETURN p, count(p)
LIMIT 5
Say there are 42 people in FooManGroup, I want to return 5 of these people, with a count of 42.
Is this possible to do in one query?
Running this now returns 5 rows, which is fine, but a count of 104, which is the total number of nodes of any type in my DB.
Any suggestions?
You can use a WITH clause to count the persons, followed by an identical MATCH clause to match each person. Also drop the START n=node(*) clause: it binds n to every node in the graph, so each person row is repeated once per node, which is why count(p) returned 104:
MATCH (p:Person)-[:is_member]->(g:Group)
WHERE g.name ='FooManGroup'
WITH count(p) as personsInGroup
MATCH (p:Person)-[:is_member]->(g:Group)
WHERE g.name ='FooManGroup'
RETURN p, personsInGroup
LIMIT 5
It may not be the best or most elegant way to do this, but it works. If you use Cypher 2.0, it can be a bit more compact, like this:
MATCH (p:Person)-[:is_member]->(g:Group {name: 'FooManGroup'})
WITH count(p) as personsInGroup
MATCH (p:Person)-[:is_member]->(g:Group {name: 'FooManGroup'})
RETURN p, personsInGroup
LIMIT 5
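By convention, relationship types are written in uppercase in Cypher, so :is_member would be :IS_MEMBER, which I think is more readable. Types are case-sensitive, though, so this only matches if the relationships were created with that type: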
MATCH (p:Person)-[:IS_MEMBER]->(g:Group {name: 'FooManGroup'})
WITH count(p) as personsInGroup
MATCH (p:Person)-[:IS_MEMBER]->(g:Group {name: 'FooManGroup'})
RETURN p, personsInGroup
LIMIT 5
Try this:
MATCH (p:Person)-[:is_member]->(g:Group)
WHERE g.name ='FooManGroup'
RETURN count(p), collect(p)[0..5]
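The count-then-match idea and the collect slice can be combined into a single pass that still returns one row per person: aggregate the count and a five-element sample in the same WITH, then UNWIND the sample. Note that collect() holds every member in memory before slicing, and that if the group has no members, UNWIND of the empty list returns no rows at all rather than a count of 0.

```cypher
// Count all members and keep the first five in one aggregation step.
MATCH (p:Person)-[:is_member]->(:Group {name: 'FooManGroup'})
WITH count(p) AS personsInGroup, collect(p)[0..5] AS sample
// One row per sampled person, each carrying the full count.
UNWIND sample AS p
RETURN p, personsInGroup
```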
