Neo4j Cypher - Query partial fixed route and partial variable route

Let's say I have a graph network like shown here:
I can do a cypher query using something like
MATCH (a:A)-[]->(b:B)-[]->(c:C)-[]-(d1:D),
(a)-[]->(b)-[]->(c)-[]-(d2:D),
(a)-[]->(b)-[]->(c)-[]-(d3:D),
(a)-[]->(b)-[]->(c)-[]-(d4:D)
WHERE d1.val = '1' AND d2.val = '2' AND d3.val = '3' AND d4.val = '4'
RETURN a, b, c, d1, d2, d3, d4
Is there a way to simplify this query without explicitly rewriting the same relationship pattern over and over again? I am trying to find every path that is connected to all the D values I am expecting. That is a large list, so an IN clause would probably be appropriate.
Edit:
Sample data based on answer below
create (a1:A {name: 'A1'})
create (b1:B {name: 'B1'})
create (c1:C {name: 'C1'})
create (d1:D {name: 'D1', val: 1})
create (d2:D {name: 'D2', val: 2})
create (d3:D {name: 'D3', val: 3})
create (d4:D {name: 'D4', val: 4})
create (a1)-[:NEXT]->(b1)
create (b1)-[:NEXT]->(c1)
create (c1)-[:NEXT]->(d1)
create (c1)-[:NEXT]->(d2)
create (c1)-[:NEXT]->(d3)
create (c1)-[:NEXT]->(d4)
create (a2:A {name: 'A2'})
create (b2:B {name: 'B2'})
create (c2:C {name: 'C2'})
create (a2)-[:NEXT]->(b2)
create (b2)-[:NEXT]->(c2)
create (c2)-[:NEXT]->(d1)
create (c2)-[:NEXT]->(d2)
create (a3:A {name: 'A3'})
create (b3:B {name: 'B3'})
create (c3:C {name: 'C3'})
create (a3)-[:NEXT]->(b3)
create (b3)-[:NEXT]->(c3)
create (c3)-[:NEXT]->(d1)
create (c3)-[:NEXT]->(d2)
create (c3)-[:NEXT]->(d3)
create (c3)-[:NEXT]->(d4)
return *
So the query should result in A1-->B1-->C1-->D1,D2,D3,D4 and A3-->B3-->C3-->D1,D2,D3,D4
Since A2-->B2-->C2 links only with D1 and D2, and not with D3 and D4, it should not be in the result.

The beginning of the path is always the same, so you don't need to repeat it. Then, based on a list of values, you want to check if you can find a D for each and every one of them: it could be a job for all.
Mixing all that, we get:
MATCH (a:A)-->(b:B)-->(c:C)-->(d:D)
WHERE d.val IN {values}
WITH a, b, c, collect(d) AS dList
WHERE all(value IN {values} WHERE any(d IN dList WHERE d.val = value))
RETURN a, b, c, dList
However, if n is the number of values, that's an O(n^2) algorithm because of the second WHERE.
Let's collect the values of the nodes while collecting the nodes themselves, to avoid the double loop and turn it into an O(n) algorithm:
MATCH (a:A)-->(b:B)-->(c:C)-->(d:D)
WHERE d.val IN {values}
WITH a, b, c, collect(d) AS dList, collect(DISTINCT d.val) AS dValues
WHERE all(value IN {values} WHERE value IN dValues)
RETURN a, b, c, dList
Assuming the list of values passed as a parameter only contains distinct values, we can even change that into an O(1) algorithm by simply comparing the size of the input list and the distinct values found:
MATCH (a:A)-->(b:B)-->(c:C)-->(d:D)
WHERE d.val IN {values}
WITH a, b, c, collect(d) AS dList, collect(DISTINCT d.val) AS dValues
WHERE size({values}) = size(dValues)
RETURN a, b, c, dList
Because dValues ⊆ values (the first WHERE only keeps D nodes whose val is in the list), if the two sets have the same size, they're equal.
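To see why the size comparison is enough, here is a small plain-Python model of the filtering step (not Cypher, and not run against Neo4j). The dictionary is a hand-written stand-in for what collect(DISTINCT d.val) would produce per (a, b, c) chain on the sample data from the question; the function names are made up for illustration.

```python
# Hand-written stand-in for collect(DISTINCT d.val) per (a, b, c) chain,
# based on the sample data in the question (an assumption, not query output).
d_values_by_chain = {
    ("A1", "B1", "C1"): {1, 2, 3, 4},
    ("A2", "B2", "C2"): {1, 2},
    ("A3", "B3", "C3"): {1, 2, 3, 4},
}
values = [1, 2, 3, 4]  # models the {values} parameter, assumed distinct


def chains_by_all(wanted, by_chain):
    """Keep chains where every wanted value was found (the all() version)."""
    return [chain for chain, found in by_chain.items()
            if all(v in found for v in wanted)]


def chains_by_size(wanted, by_chain):
    """Keep chains by comparing sizes only (the size() version)."""
    wanted_set = set(wanted)
    # Intersecting with wanted_set models the WHERE d.val IN {values} filter,
    # which guarantees that the found values are a subset of the wanted ones.
    return [chain for chain, found in by_chain.items()
            if len(found & wanted_set) == len(wanted)]


print(chains_by_all(values, d_values_by_chain))
print(chains_by_size(values, d_values_by_chain))
```

Both functions keep the A1 and A3 chains and drop A2. The shortcut only holds as long as the wanted list has no duplicates.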
If D.val are globally unique, or at least unique for all the D nodes connected to a single C, it can be further simplified:
MATCH (a:A)-->(b:B)-->(c:C)-->(d:D)
WHERE d.val IN {values}
WITH a, b, c, collect(d) AS dList
WHERE size({values}) = size(dList)
RETURN a, b, c, dList
If the values are globally unique, the query will be faster with a uniqueness constraint, as it will also index the values:
CREATE CONSTRAINT ON (d:D) ASSERT d.val IS UNIQUE

If every D node has a unique val property (if any), this should work:
WITH [1,2,3,4] AS desired
MATCH (a:A)-->(b:B)-->(c:C)-->(d:D)
WHERE d.val IN desired
WITH a, b, c, COLLECT(DISTINCT d) AS ds
WHERE SIZE(ds) = SIZE(desired)
RETURN a, b, c, ds
The result will have a row for every matched A, B, C combination, along with the collection of D nodes.

Assuming the following data set...
create (a:A {name: 'A'})
create (b:B {name: 'B'})
create (c:C {name: 'C'})
create (d1:D {name: 'D1', val: 1})
create (d2:D {name: 'D2', val: 2})
create (d3:D {name: 'D3', val: 3})
create (d4:D {name: 'D4', val: 4})
create (a)-[:NEXT]->(b)
create (b)-[:NEXT]->(c)
create (c)-[:NEXT]->(d1)
create (c)-[:NEXT]->(d2)
create (c)-[:NEXT]->(d3)
create (c)-[:NEXT]->(d4)
return *
You could execute a query something like this to match all of the specific D nodes in a particular value range.
match (a:A)-->(b:B)-->(c:C)-->(d:D)
where d.val in range(1,4)
return *
Here is an updated query based on your updated question. I collected the D values for each A,B,C chain of nodes.
match (a:A)-->(b:B)-->(c:C)-->(d:D)
where d.val in range(1,4)
with a, b, c, d
order by a.name, b.name, c.name, d.name
return a, b, c, collect(d) as d
order by a.name, b.name, c.name

Related

Cypher - how to walk graph while computing

I'm just starting studying Cypher here..
How would I specify a Cypher query that returns the node, from 1 to 3 hops away from the initial node, whose path has the highest average weight?
Example
Graph is:
(I know I'm not using the Cypher's notation here..)
A-[2]-B-[4]-C
A-[3.5]-D
It would return D, because 3.5 > (2+4)/2
And with Graph:
A-[2]-B-[4]-C
A-[3.5]-D
A-[2]-B-[4]-C-[20]-E
A-[2]-B-[4]-C-[20]-E-[80]-F
It would return E, because (2+4+20)/3 > 3.5
and F is more than 3 hops away
One way to write the query, which has the benefit of being easy to read, is
MATCH p=(A {name: 'A'})-[*1..3]-(x)
UNWIND [r IN relationships(p) | r.weight] AS weight
RETURN x.name, avg(weight) AS avgWeight
ORDER BY avgWeight DESC
LIMIT 1
Here we extract the weights in the path into a list, and unwind that list. Try inserting a RETURN there to see what the results look like at that point. Because we unwind we can use the avg() aggregation function. By returning not only the avg(weight), but also the name of the last path node, the aggregation will be grouped by that node name. If you don't want to return the weight, only the node name, then change RETURN to WITH in the query, and add another return clause which only returns the node name.
You can also add something like [n IN nodes(p) | n.name] AS nodesInPath to the return statement to see what the path looks like. I created an example graph based on your question with the query below, with nodes named A, B, C etc.
CREATE (A {name: 'A'}),
(B {name: 'B'}),
(C {name: 'C'}),
(D {name: 'D'}),
(E {name: 'E'}),
(F {name: 'F'}),
(A)-[:R {weight: 2}]->(B),
(B)-[:R {weight: 4}]->(C),
(A)-[:R {weight: 3.5}]->(D),
(C)-[:R {weight: 20}]->(E),
(E)-[:R {weight: 80}]->(F)
1) To select the possible paths with length from one to three - use match with variable length relationships:
MATCH p = (A {name: 'A'})-[*1..3]->(T)
2) Then use the reduce function to calculate the average weight, sort by it, and limit the result to one row:
MATCH p = (A {name: 'A'})-[*1..3]->(T)
WITH p, T,
// start from 0.0 so that integer weights are not divided with integer division
reduce(s = 0.0, r IN relationships(p) | s + r.weight) / length(p) AS weight
RETURN T ORDER BY weight DESC LIMIT 1
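As a sanity check on the expected answer, here is a plain-Python sketch (not Cypher) that walks the example graph by hand. The EDGES list and the best_end_node function are illustrative names, and the walk follows relationship direction and assumes the graph has no cycles, as in the example.

```python
# Directed edges of the example graph: (start, end, weight).
EDGES = [
    ("A", "B", 2), ("B", "C", 4), ("A", "D", 3.5),
    ("C", "E", 20), ("E", "F", 80),
]


def best_end_node(start, max_hops=3):
    """Return (node, average weight) for the best path of 1..max_hops hops."""
    best = None
    # Each stack entry is (current node, weights collected along the path).
    stack = [(start, [])]
    while stack:
        node, weights = stack.pop()
        if weights:
            avg = sum(weights) / len(weights)  # true division, never truncated
            if best is None or avg > best[1]:
                best = (node, avg)
        if len(weights) < max_hops:
            # Extend the path along every outgoing edge of the current node.
            for src, dst, weight in EDGES:
                if src == node:
                    stack.append((dst, weights + [weight]))
    return best


print(best_end_node("A"))
```

With up to 3 hops the best end node is E, with an average of (2+4+20)/3. F is never considered because it is 4 hops away.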

NEO4J: Finding disconnected nodes

I have this sample data
With the sample query
CREATE (a1:A {title: "a1"})
CREATE (a2:A {title: "a2"})
CREATE (a3:A {title: "a3"})
CREATE (b1:B {title: "b1"})
CREATE (b2:B {title: "b2"})
MATCH (a:A {title: "a1"}), (b:B {title: "b1"})
CREATE (a)-[r:LINKS]->(b)
MATCH (a:A), (b:B) return a,b
What I am trying to achieve:
Find all nodes of type A that are not connected to a node of type B (ans: a2, a3)
Find all nodes of type B that are not connected to a node of type A (ans: b2)
Both of these requirements are expected to be bi-directional, and to use the same query template.
Where I have reached
Get all A not connected to B: gets me a2 and a3 as expected
MATCH path=(a:A)-[r]-(b:B)
WHERE (a)-[r]-(b)
WITH collect(a) as al
MATCH (c:A)
WHERE not c IN al
RETURN c
Get all disconnected B: I get both b1 and b2, which is incorrect, and printing "al" revealed that the list is empty
MATCH path=(b:B)-[r]-(a:A)
WHERE (b)-[r]-(a)
WITH collect(b) as al
MATCH (c:B)
WHERE not c IN al
RETURN c
Somehow
WHERE (b)-[r]-(a) != WHERE (a)-[r]-(b)
even though I left the direction unspecified, which should make it bi-directional.
If I change it to WHERE (a)-[r]-(b) in the second query then it works, but I want a generic bi-directional query.
Use the path pattern in where:
MATCH (a:A) WHERE NOT (a)-[:LINKS]-(:B)
RETURN a;
MATCH (b:B) WHERE NOT (b)-[:LINKS]-(:A)
RETURN b;
Or combine into one query:
OPTIONAL MATCH (a:A) WHERE NOT (a)-[:LINKS]-(:B)
WITH collect(a) AS aNodes
OPTIONAL MATCH (b:B) WHERE NOT (b)-[:LINKS]-(:A)
WITH aNodes,
collect(b) AS bNodes
RETURN aNodes, bNodes
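The idea behind these queries is an anti-join that ignores direction. A rough plain-Python model of it (not Cypher), using the sample data from the question; the nodes and links structures and the unlinked function are made up for illustration:

```python
# Sample data from the question: node title -> label, plus one LINKS edge.
nodes = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
links = [("a1", "b1")]  # (start, end) of each LINKS relationship


def unlinked(label, other_label):
    """Nodes with `label` that have no link, in either direction, to `other_label`."""
    connected = set()
    for start, end in links:
        # Check both orientations so that direction never matters.
        for here, there in ((start, end), (end, start)):
            if nodes[here] == label and nodes[there] == other_label:
                connected.add(here)
    return sorted(title for title, node_label in nodes.items()
                  if node_label == label and title not in connected)


print(unlinked("A", "B"))
print(unlinked("B", "A"))
```

The same function serves both requirements, which is the "same query template" property the question asks for.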
Update: why does the original query produce an incorrect result?
I think this is a bug. The problem is that when you use a variable for a relationship in where, the pattern implicitly uses the direction from left to right, even if it is not specified:
// Will return 0, but for test data should return 1
MATCH (b:B)-[r]-(a:A) WHERE (b)-[r]-(a)
RETURN COUNT(*);
// Will return 1
MATCH (b:B)-[r]-(a:A) WHERE (b)<-[r]-(a)
RETURN COUNT(*);
// Will return 1
MATCH (b:B)-[r]-(a:A) WHERE (b)--(a)
RETURN COUNT(*);
// Will return 1
MATCH (b:B)-[r]-(a:A) WHERE (a)-[r]-(b)
RETURN COUNT(*);

How to match all paths that ends with nodes with common properties in Neo4j?

I would like to match all paths from one given node.
            -->(c: {name:"*Tom*"})
           /
(a)-->(b)-->(d: {name:"*Tom*"})
           \
            -->(e: {name:"*Tom*"})
These paths must have the following structure:
- the names of all children of the second-last node (b) should contain the substring "Tom".
How do I write the correct Cypher for this?
Let's recreate the dataset:
CREATE
(a:Person {name: 'Start'}),
(b:Person),
(c:Person {name: 'Tommy Lee Jones'}),
(d:Person {name: 'Tom Hanks'}),
(e:Person {name: 'Tom the Cat'}),
(a)-[:FRIEND]->(b),
(b)-[:FRIEND]->(c),
(b)-[:FRIEND]->(d),
(b)-[:FRIEND]->(e)
As you said in the comment, all requires a list. To get a list, you should use the collect function on the neighbours of b:
MATCH (:Person)-[:FRIEND]->(b:Person)-[:FRIEND]->(bn:Person)
WITH b, collect(bn) AS bns
WHERE all(bn in bns where bn.name =~ '.*Tom.*')
RETURN b, bns
We call each of b's neighbours bn and collect them into the list bns.
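The collect-then-all pattern is the same as grouping and then testing every member of each group. A minimal plain-Python sketch of that logic (not Cypher); the second parent, "b2", and its neighbours are hypothetical and only there to show a group that fails the test:

```python
# Neighbour names grouped per parent node, as collect(bn) would group them.
# "b" mirrors the dataset above; "b2" is a hypothetical counter-example.
neighbours_by_parent = {
    "b": ["Tommy Lee Jones", "Tom Hanks", "Tom the Cat"],
    "b2": ["Tom Hanks", "Jerry the Mouse"],
}


def parents_where_all_match(groups, needle):
    """Parents whose neighbours all contain `needle` in their name."""
    # `names and ...` skips empty groups: a MATCH never yields a parent
    # without neighbours, whereas Python's all([]) would be True.
    return [parent for parent, names in groups.items()
            if names and all(needle in name for name in names)]


print(parents_where_all_match(neighbours_by_parent, "Tom"))
```

Only "b" is kept, because one of the hypothetical neighbours of "b2" does not contain "Tom".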

Creating relation with the values of another node instead of using that node

How to store values in a relation of neo4j in Cypher Query Language?
Example: I have 3 nodes A, B, C. 'A' should be related to 'C' using the values/properties of 'B'. Instead of using node B separately, its values should be stored on the A->C relationship.
Something like this will create a new FOO relationship with the properties of the B node. I made up a data model, since you did not provide yours.
MATCH (a:A {name: 'a'}), (b:B {name: 'b'}), (c:C {name: 'c'})
CREATE (a)-[rel:FOO]->(c)
SET rel = b
RETURN a, b, c, rel;
If you wanted to also delete the b node, you can add a DELETE b clause right before the RETURN (and remove b from the RETURN clause).

Cypher Query to return x Number of a particular type of node

Let's say we have a Neo4j graph such as (Brand)-[:from]->(Post)<-[:likes]-(Person).
How can I write a Cypher query that returns a minimum number of brands, say 3, together with their posts? I want this to be scalable and not dependent on a specific property value.
Hence the results would contain at least 3 instances of the Brand nodes, as well as maybe 5 Post nodes and 15 Person nodes.
I have tried a few different things:
1.) Declare several variable names for each brand (not scalable)
Match (b:Brand)-[]->(p:Post)<-[]-(per:Person)
Match (b1:Brand)-[]->(p1:Post)<-[]-(per2:Person)
Match (b2:Brand)-[]->(p2:Post)<-[]-(per3:Person)
return b,b1,b2,p,p1,p2,per,per2,per3
limit 30
This didn't work because it essentially returns the same as
Match (b:Brand)-[]->(p:Post)<-[]-(per:Person)
return b,p,per
limit 30
2.) Use a FOREACH
Match (b:Brand) WITH collect (distinct b) as bb
FOREACH (b in bb[0..3] | MATCH (b)-[]->(p:Post)<-[]-(per:Person))
RETURN b, p, per LIMIT 40
This didn't work because you can't use Match inside a Foreach call.
The only way I know how to do this is to declare a where clause with their unique property brand name values which is not scalable. It looks like this:
Match (b:Brand)-[]->(p:Post)<-[]-(per:Person)
where b.brand = "b1" OR b.brand ="b2" or b.brand = "b3"
Return b,p,per
Limit 30
However the above still doesn't even return what I want.
Please help. Here is a quick graph to test on:
Create (b1:Brand {brand:'b1'})
Create (b2:Brand {brand:'b2'})
Create (b3:Brand {brand:'b3'})
Create (p1:Post {id: "001",message: "foo"})
Create (p2:Post {id: "002",message: "bar"})
Create (p3:Post {id: "003",message: "baz"})
Create (p4:Post {id: "004",message: "raz"})
Create (per1:Person {id: "001",name: "foo"})
Create (per2:Person {id: "002",name: "foo"})
Create (per3:Person {id: "003",name: "foo"})
Create (per4:Person {id: "004",name: "foo"})
Create (per5:Person {id: "005",name: "foo"})
Create (per6:Person {id: "006",name: "foo"})
Create (per7:Person {id: "007",name: "foo"})
Merge (b1)-[:FROM]->(p1)
Merge (b1)-[:FROM]->(p2)
Merge (b2)-[:FROM]->(p3)
Merge (b3)-[:FROM]->(p4)
Merge (per1)-[:LIKES]->(p1)
Merge (per1)-[:LIKES]->(p2)
Merge (per1)-[:LIKES]->(p3)
Merge (per2)-[:LIKES]->(p1)
Merge (per2)-[:LIKES]->(p4)
Merge (per3)-[:LIKES]->(p3)
Merge (per4)-[:LIKES]->(p1)
Merge (per5)-[:LIKES]->(p2)
Merge (per6)-[:LIKES]->(p1)
Merge (per6)-[:LIKES]->(p2)
Merge (per6)-[:LIKES]->(p3)
Merge (per6)-[:LIKES]->(p4)
Merge (per7)-[:LIKES]->(p4)
You can use UNWIND instead of FOREACH:
Match (b:Brand) WITH collect (distinct b) as bb
UNWIND bb[0..3] as b
MATCH (b)-[]->(p:Post)<-[]-(per:Person)
RETURN b, p, per LIMIT 40
Or combine WITH and LIMIT:
MATCH (b:Brand) WITH distinct b LIMIT 3
MATCH (b)-[]->(p:Post)<-[]-(per:Person)
RETURN b, p, per LIMIT 40
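Both variants limit the brands first and only then expand to posts and persons. A plain-Python model of that two-step shape (not Cypher), built by hand from the test graph in the question. The rows function is an illustrative name, and the insertion order of the dictionaries stands in for whatever order Neo4j happens to return, which is not guaranteed without ORDER BY.

```python
# FROM and LIKES relationships of the test graph, written out by hand.
posts_by_brand = {"b1": ["p1", "p2"], "b2": ["p3"], "b3": ["p4"]}
persons_by_post = {
    "p1": ["per1", "per2", "per4", "per6"],
    "p2": ["per1", "per5", "per6"],
    "p3": ["per1", "per3", "per6"],
    "p4": ["per2", "per6", "per7"],
}


def rows(brand_limit, row_limit):
    """Limit the brands first, then expand each one to (brand, post, person) rows."""
    brands = list(posts_by_brand)[:brand_limit]      # WITH DISTINCT b LIMIT n
    expanded = [(brand, post, person)                # the second MATCH
                for brand in brands
                for post in posts_by_brand[brand]
                for person in persons_by_post[post]]
    return expanded[:row_limit]                      # the final LIMIT


for row in rows(3, 40):
    print(row)
```

Note that the final LIMIT can still cut brands out if it is smaller than the number of expanded rows, so it has to be generous enough for the number of brands requested.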
