How to find the similarity between two graphs in Neo4j

In the graphs below, A, B, C, D, E, G, H, I, J, K form the main graph and L, M, N, O, P, Q, R form the sub graph. First I tried to find where the sub graph exists in the main graph, and I got D, E, G, H. Now I want to apply cosine similarity between the L, M, N, O, P, Q, R graph and D, E, G, H.
Code for creating the graph:
MERGE (a:Node {name:'A', score:1})
MERGE (b:Node {name:'B', score:2})
MERGE (c:Node {name:'C', score:3})
MERGE (d:Code {name:'D', score:4})
MERGE (e:Code {name:'E', score:5})
MERGE (g:Code {name:'G', score:7})
MERGE (h:Code {name:'H', score:8})
MERGE (i:Node {name:'I', score:9})
MERGE (j:Node {name:'J', score:10})
MERGE (k:Node {name:'K', score:11})
MERGE (a)-[:Connects {score:1}]->(b)
MERGE (a)-[:Connects {score:2}]->(c)
MERGE (a)-[:Connects {score:3}]->(d)
MERGE (b)-[:Connects {score:4}]->(c)
MERGE (b)-[:Connects {score:5}]->(d)
MERGE (b)-[:Connects {score:6}]->(j)
MERGE (c)-[:Connects {score:7}]->(d)
MERGE (c)-[:Connects {score:8}]->(e)
MERGE (d)-[:Connects {score:10}]->(g)
MERGE (d)-[:Connects {score:11}]->(h)
MERGE (e)-[:Connects {score:14}]->(g)
MERGE (e)-[:Connects {score:15}]->(h)
MERGE (g)-[:Connects {score:20}]->(h)
MERGE (g)-[:Connects {score:21}]->(i)
MERGE (g)-[:Connects {score:22}]->(j)
MERGE (i)-[:Connects {score:23}]->(j)
MERGE (i)-[:Connects {score:24}]->(k)
MERGE (j)-[:Connects {score:25}]->(k)
CREATE (l:Test {name:'L', score:4})
CREATE (m:Test {name:'M', score:5})
CREATE (n:Test {name:'N', score:6})
CREATE (o:Test {name:'O', score:7})
CREATE (p:Test {name:'P', score:8})
CREATE (q:Test {name:'Q', score:12})
CREATE (r:Test {name:'R', score:13})
CREATE (l)-[:Connects {score:10}]->(o)
CREATE (l)-[:Connects {score:11}]->(p)
CREATE (l)-[:Connects {score:12}]->(n)
CREATE (m)-[:Connects {score:13}]->(n)
CREATE (m)-[:Connects {score:14}]->(o)
CREATE (m)-[:Connects {score:15}]->(p)
CREATE (n)-[:Connects {score:16}]->(o)
CREATE (n)-[:Connects {score:17}]->(p)
CREATE (n)-[:Connects {score:26}]->(q)
CREATE (n)-[:Connects {score:27}]->(r)
CREATE (o)-[:Connects {score:20}]->(p)
I am new to Neo4j Cypher queries. Please suggest how to apply cosine similarity between the graphs (the L, M, N, O, P, Q, R graph and D, E, G, H).
To get the sub graph I used the Cypher query below:
MATCH (n) where n.score IN [4,5,6,7,8,12,13] AND NONE(l IN labels(n) WHERE l=~'Tes.*')
MATCH path = (n)-[l]->(m) where m.score IN [4,5,6,7,8,12,13]
UNWIND nodes(path) as node
RETURN node
I got the sub graph like below

You are almost there in getting the cosine similarity. You just need to collect the distinct scores. For reference, also look at this website. https://neo4j.com/docs/graph-algorithms/current/algorithms/similarity-cosine/
MATCH (n) where n.score IN [4,5,6,7,8,12,13] AND NONE(l IN labels(n) WHERE l=~'Tes.*')
MATCH (t: Test)
MATCH path = (n)-[l]->(m) where m.score IN [4,5,6,7,8,12,13]
UNWIND nodes(path) as node
RETURN algo.similarity.cosine(
collect(node.score), collect(m.score)) AS similarity
Result:
similarity
0.9733970633316996
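For anyone wondering what the function does with the two collected lists, here is a minimal Python sketch of the plain cosine formula (dot product divided by the product of the two vector lengths). I am assuming this is what algo.similarity.cosine computes on equal-length lists. The name cosine_similarity and the sample vectors are made up for illustration and are not taken from the query above.

```python
import math


def cosine_similarity(xs, ys):
    """Cosine similarity of two equal-length numeric vectors."""
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum(x * x for x in xs))
    norm_y = math.sqrt(sum(y * y for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Hypothetical score vectors, only to show the call.
print(cosine_similarity([4, 5, 7, 8], [4, 5, 6, 7]))
print(cosine_similarity([1, 0], [0, 1]))
```

Vectors pointing in the same direction score close to 1, and orthogonal vectors score 0, so the order in which the scores are collected matters.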

Related

Cypher - how to walk graph while computing

I'm just starting to study Cypher here.
How would I specify a Cypher query to return the node, from 1 to 3 hops away from the initial node, whose path has the highest average weight?
Example
Graph is:
(I know I'm not using the Cypher's notation here..)
A-[2]-B-[4]-C
A-[3.5]-D
It would return D, because 3.5 > (2+4)/2
And with Graph:
A-[2]-B-[4]-C
A-[3.5]-D
A-[2]-B-[4]-C-[20]-E
A-[2]-B-[4]-C-[20]-E-[80]-F
It would return E, because (2+4+20)/3 > 3.5
and F is more than 3 hops away
One way to write the query, which has the benefit of being easy to read, is
MATCH p=(A {name: 'A'})-[*1..3]-(x)
UNWIND [r IN relationships(p) | r.weight] AS weight
RETURN x.name, avg(weight) AS avgWeight
ORDER BY avgWeight DESC
LIMIT 1
Here we extract the weights in the path into a list, and unwind that list. Try inserting a RETURN there to see what the results look like at that point. Because we unwind we can use the avg() aggregation function. By returning not only the avg(weight), but also the name of the last path node, the aggregation will be grouped by that node name. If you don't want to return the weight, only the node name, then change RETURN to WITH in the query, and add another return clause which only returns the node name.
You can also add something like [n IN nodes(p) | n.name] AS nodesInPath to the return statement to see what the path looks like. I created an example graph based on your question with below query with nodes named A, B, C etc.
CREATE (A {name: 'A'}),
(B {name: 'B'}),
(C {name: 'C'}),
(D {name: 'D'}),
(E {name: 'E'}),
(F {name: 'F'}),
(A)-[:R {weight: 2}]->(B),
(B)-[:R {weight: 4}]->(C),
(A)-[:R {weight: 3.5}]->(D),
(C)-[:R {weight: 20}]->(E),
(E)-[:R {weight: 80}]->(F)
1) To select the possible paths with a length from one to three, use MATCH with a variable-length relationship:
MATCH p = (A {name: 'A'})-[*1..3]->(T)
2) Then use the reduce function to calculate the average weight, and sort and limit to get one value:
// anchor the start node by name; a bare (A) would match every node
MATCH p = (A {name: 'A'})-[*1..3]->(T)
WITH p, T,
// start the sum at 0.0 so the division is not an integer division
reduce(s = 0.0, r IN relationships(p) | s + r.weight) / length(p) AS weight
RETURN T ORDER BY weight DESC LIMIT 1
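To check the expected answer by hand, the same idea can be sketched in Python: walk every directed path of 1 to 3 hops from the start node, average the weights, and keep the best end node. The edge table is copied from the CREATE statement above. The names EDGES and best_end_node are mine, and the sketch assumes the graph has no cycles.

```python
# Directed, weighted edges copied from the CREATE statement above.
EDGES = {
    "A": [("B", 2), ("D", 3.5)],
    "B": [("C", 4)],
    "C": [("E", 20)],
    "E": [("F", 80)],
}


def best_end_node(start, max_hops=3):
    """Return (node, average weight) for the best path of 1..max_hops hops."""
    best = None
    # Each stack entry is (current node, weights collected along the path).
    stack = [(start, [])]
    while stack:
        node, weights = stack.pop()
        if weights:
            avg = sum(weights) / len(weights)
            if best is None or avg > best[1]:
                best = (node, avg)
        if len(weights) < max_hops:
            for nxt, weight in EDGES.get(node, []):
                stack.append((nxt, weights + [weight]))
    return best


print(best_end_node("A"))
```

For the example graph this picks E, because (2+4+20)/3 is larger than 3.5, and F is never reached because it is 4 hops away.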

Neo4j Cypher : listing edges

Having this data for example :
CREATE
(p1:Person {name:"p1"}),
(p2:Person {name:"p2"}),
(p3:Person {name:"p3"}),
(p4:Person {name:"p4"}),
(p5:Person {name:"p5"}),
(p1)-[:KNOWS]->(p2),
(p1)-[:KNOWS]->(p3),
(p1)-[:KNOWS]->(p4),
(p5)-[:KNOWS]->(p3),
(p5)-[:KNOWS]->(p4)
I want to get common relationships between p1 and p5 :
MATCH (p1:Person {name:"p1"})-[r1:KNOWS]-(p:Person)-[r2:KNOWS]-(p5:Person {name:"p5"})
RETURN p, p1, p5
This returns 4 nodes (p1, p3, p4, p5) and 4 edges.
My aim is to get the edges, with their direction, as table rows with columns from and to. This seems to work:
MATCH (p1:Person {name:"p1"})-[r1:KNOWS]-(p:Person)-[r2:KNOWS]-(p5:Person {name:"p5"})
RETURN startNode(r1).name AS from, endNode(r1).name AS to
UNION
MATCH (p1:Person {name:"p1"})-[r1:KNOWS]-(p:Person)-[r2:KNOWS]-(p5:Person {name:"p5"})
RETURN startNode(r2).name AS from, endNode(r2).name AS to
The result is a table :
from | to
-----|----
p1 | p3
p1 | p4
p5 | p3
p5 | p4
My questions are :
Is it correct ?
Is it the best way to do it ? I mean about performance when there will be thousands of nodes.
And what if i want common nodes to 3 persons ?
The best way to check performance is to PROFILE your queries.
Is it correct ?
I'm not sure why you do a UNION, you can easily use a path check :
PROFILE MATCH (p1:Person {name:"p1"}), (p5:Person {name:"p5"})
MATCH path=(p1)-[*..2]-(p5)
UNWIND rels(path) AS r
RETURN startNode(r).name AS from, endNode(r).name AS to
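As a sanity check of what that query should return on the sample data, here is a small Python sketch of the same logic: find the neighbours shared by the two people (ignoring edge direction), then list the stored edges that touch them, keeping the stored direction. The names KNOWS, neighbours and common_edges are made up for this sketch, and it only covers the 2-hop case, not a direct edge between the two people.

```python
# Directed KNOWS edges copied from the CREATE statement in the question.
KNOWS = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p5", "p3"), ("p5", "p4")]


def neighbours(person):
    """Nodes connected to person by a KNOWS edge in either direction."""
    return {b if a == person else a for a, b in KNOWS if person in (a, b)}


def common_edges(first, second):
    """Directed edges that link first or second to a shared neighbour."""
    shared = neighbours(first) & neighbours(second)
    return sorted(
        (a, b)
        for a, b in KNOWS
        if (a in (first, second) and b in shared)
        or (b in (first, second) and a in shared)
    )


for source, target in common_edges("p1", "p5"):
    print(source, "|", target)
```

On the sample data this lists the same four rows as the table in the question.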
Is it the best way to do it ? I mean about performance when there will be thousands of nodes.
Generally you would first match the start and end nodes of the path you want with single lookups (make sure you have an index/constraint on the label/property pair for the Person nodes).
Depending on the degree of your graph this can be an expensive operation. You can fine-tune it by limiting the max depth of the paths, *..15 for example.
And what if i want common nodes to 3 persons ?
There are multiple ways depending on the size of your graph :
a) if not too many nodes :
Match the 3 nodes and find Persons that have at least one connection to ALL 3:
PROFILE MATCH (p:Person) WHERE p.name IN ["p1","p4","p3"]
WITH collect(p) AS persons
MATCH (p:Person) WHERE ALL(x IN persons WHERE EXISTS((x)--(p)))
RETURN p
b) with some tuning, assuming a common node is directly connected to the first of the 3 nodes:
PROFILE MATCH (p:Person) WHERE p.name IN ["p1","p4","p3"]
WITH collect(p) AS persons
WITH persons, persons[0] as p
MATCH (p)-[:KNOWS]-(other)
WHERE ALL (x IN persons WHERE EXISTS((x)--(other)))
RETURN other
c) if you need the commons in a multiple depth path :
PROFILE MATCH (p:Person) WHERE p.name IN ["p1","p4","p3"]
WITH collect(p) AS persons
WITH persons, persons[0] as p1, persons[1] as p2
MATCH path=(p1)-[*..15]-(p2)
WHERE ANY(x IN nodes(path) WHERE x = persons[2])
UNWIND rels(path) AS commonRel
WITH distinct commonRel AS r
RETURN startNode(r) AS from, endNode(r) AS to
I would suggest growing your graph and then trying and tuning your use cases.

Neo4J/Cypher Filter nodes based on multiple relationships

Using Neo4J and Cypher:
Given the diagram below, I want to be able to start at node 'A' and get all the children that have a 'ChildOf' relationship with 'A', but not an 'InactiveChildOf' relationship. So, in this example, I would get back A, C and G. Also, a node can get a new parent ('H' in the diagram) and if I ask for the children of 'H', I should get B, D and E.
I have tried
match (p:Item{name:'A'}) -[:ChildOf*]-(c:Item) where NOT (p)-[:InactiveChildOf]-(c) return p,c
however, that also returns D and E.
Also tried:
match (p:Item{name:'A'}) -[rels*]-(c:Item) where None (r in rels where type(r) = 'InactiveChildOf') return p,c
But that returns all.
Hopefully, this is easy for Neo4J and I am just missing something obvious. Appreciate the help!
Example data:
MERGE (a:Item {name:'A'})
MERGE (b:Item {name:'B'})
MERGE (c:Item {name:'C'})
MERGE (d:Item {name:'D'})
MERGE (e:Item {name:'E'})
MERGE (f:Item {name:'F'})
MERGE (g:Item {name:'G'})
MERGE (h:Item {name:'H'})
MERGE (b)-[:ChildOf]->(a)
MERGE (b)-[:InactiveChildOf]->(a)
MERGE (c)-[:ChildOf]->(a)
MERGE (d)-[:ChildOf]->(b)
MERGE (e)-[:ChildOf]->(b)
MERGE (f)-[:ChildOf]->(c)
MERGE (f)-[:InactiveChildOf]->(c)
MERGE (g)-[:ChildOf]->(c)
MERGE (b)-[:ChildOf]->(h)
Note, I understand that I could simply put an "isActive" property on the ChildOf relationship or remove the relationship, but I am exploring options and trying to understand if this concept would work.
If the query is interpreted as "find all nodes reachable through a path in which no node is related by InactiveChildOf to the previous node on that path", then the query might look something like this:
match path = (p:Item{name:'A'})<-[:ChildOf*]-(c:Item)
with nodes(path) as nds
unwind range(0,size(nds)-2) as i
with nds,
nds[i] as i1,
nds[i+1] as i2
where not (i1)-[:InactiveChildOf]-(i2)
with nds,
count(i1) as test
where test = size(nds)-1
return head(nds),
last(nds)
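To see what that pairwise check should return on the example data, here is a rough Python sketch of the same rule: follow ChildOf edges downwards from the start node, but do not step across a child/parent pair that also has an InactiveChildOf edge. The names CHILD_OF, INACTIVE and active_descendants are mine. The sketch assumes InactiveChildOf always points from child to parent, as it does in the example data.

```python
# (child, parent) pairs copied from the example data in the question.
CHILD_OF = [
    ("B", "A"), ("C", "A"), ("D", "B"), ("E", "B"),
    ("F", "C"), ("G", "C"), ("B", "H"),
]
INACTIVE = {("B", "A"), ("F", "C")}


def active_descendants(root):
    """Descendants of root reachable without crossing an inactive pair."""
    found = set()
    frontier = [root]
    while frontier:
        parent = frontier.pop()
        for child, child_parent in CHILD_OF:
            if child_parent != parent or child in found:
                continue
            if (child, child_parent) in INACTIVE:
                continue  # this step is switched off, so do not descend
            found.add(child)
            frontier.append(child)
    return found


print(sorted(active_descendants("A")))
print(sorted(active_descendants("H")))
```

Starting at A this gives C and G, and starting at H it gives B, D and E, which matches what the question asks for (the start node itself is not included in the set).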
Update: I think that this version is better (check that between two nodes there is no path that will contain at least one non-active type of relationship):
match path = (p:Item {name:'A'})<-[:ChildOf|InactiveChildOf*]-(c)
with p, c,
collect( filter( r in rels(path)
where type(r) = 'InactiveChildOf'
)
) as test
where all( t in test where size(t) = 0 )
return p, c
By reading and examining the graph, correct me if I'm wrong but the actual text representation of the cypher query should be
Find me nodes in a path to A, all nodes in that path cannot have an outgoing
InactiveChildOf relationship.
So, in Cypher it would be :
MATCH p=(i:Item {name:"A"})<-[:ChildOf*]-(x)
WHERE NONE( x IN nodes(p) WHERE (x)-[:InactiveChildOf]->() )
UNWIND nodes(p) AS n
RETURN distinct n
Which returns A, C and G for the example data.

Neo4j Cypher - Query partial fixed route and partial variable route

Let's say I have a graph network like shown here:
I can do a cypher query using something like
MATCH (a:A)-[]->(b:B)-[]->(c:C)-[]-(d1:D),
(a)-[]->(b)-[]->(c)-[]-(d2:D),
(a)-[]->(b)-[]->(c)-[]-(d3:D),
(a)-[]->(b)-[]->(c)-[]-(d4:D)
WHERE d1.val = '1' AND d2.val = '2' AND d3.val = '3' AND d4.val = '4'
RETURN a, b, c, d1, d2, d3, d4
Is there a way to simplify this query without explicitly rewriting the identical relationships over and over again? I am trying to find every chain that has all the D values I am expecting. The list is large, so an IN clause would probably be appropriate.
Edit:
Sample data based on answer below
create (a1:A {name: 'A1'})
create (b1:B {name: 'B1'})
create (c1:C {name: 'C1'})
create (d1:D {name: 'D1', val: 1})
create (d2:D {name: 'D2', val: 2})
create (d3:D {name: 'D3', val: 3})
create (d4:D {name: 'D4', val: 4})
create (a1)-[:NEXT]->(b1)
create (b1)-[:NEXT]->(c1)
create (c1)-[:NEXT]->(d1)
create (c1)-[:NEXT]->(d2)
create (c1)-[:NEXT]->(d3)
create (c1)-[:NEXT]->(d4)
create (a2:A {name: 'A2'})
create (b2:B {name: 'B2'})
create (c2:C {name: 'C2'})
create (a2)-[:NEXT]->(b2)
create (b2)-[:NEXT]->(c2)
create (c2)-[:NEXT]->(d1)
create (c2)-[:NEXT]->(d2)
create (a3:A {name: 'A3'})
create (b3:B {name: 'B3'})
create (c3:C {name: 'C3'})
create (a3)-[:NEXT]->(b3)
create (b3)-[:NEXT]->(c3)
create (c3)-[:NEXT]->(d1)
create (c3)-[:NEXT]->(d2)
create (c3)-[:NEXT]->(d3)
create (c3)-[:NEXT]->(d4)
return *
So the query should result in A1-->B1-->C1-->D1,D2,D3,D4 and A3-->B3-->C3-->D1,D2,D3,D4
Since A2-->B2--C2 links with only D1,D2 and not D3,D4 it should not be in the result.
The beginning of the path is always the same, so you don't need to repeat it. Then, based on a list of values, you want to check whether you can find a D for each and every one of them: this could be a job for the all() predicate.
Mixing all that, we get:
MATCH (a:A)-->(b:B)-->(c:C)-->(d:D)
WHERE d.val IN {values}
WITH a, b, c, collect(d) AS dList
WHERE all(value IN {values} WHERE any(d IN dList WHERE d.val = value))
RETURN a, b, c, dList
However, if n is the number of values, that's an O(n^2) algorithm because of the second WHERE.
Let's collect the values of the nodes while collecting the nodes themselves, to avoid the double loop and turn it into an O(n) algorithm:
MATCH (a:A)-->(b:B)-->(c:C)-->(d:D)
WHERE d.val IN {values}
WITH a, b, c, collect(d) AS dList, collect(DISTINCT d.val) AS dValues
WHERE all(value IN {values} WHERE value IN dValues)
RETURN a, b, c, dList
Assuming the list of values passed as a parameter only contains distinct values, we can even change that into an O(1) algorithm by simply comparing the size of the input list and the distinct values found:
MATCH (a:A)-->(b:B)-->(c:C)-->(d:D)
WHERE d.val IN {values}
WITH a, b, c, collect(d) AS dList, collect(DISTINCT d.val) AS dValues
WHERE size({values}) = size(dValues)
RETURN a, b, c, dList
Because dValues ⊂ values, if the 2 sets have the same size, they're equal.
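The size comparison can be illustrated outside the database with a few lines of Python. This is only a sketch of the reasoning, not of how Neo4j evaluates the query: the dictionary stands in for the D values reachable from each C node in the sample data from the question, and the names D_VALUES_BY_C and chains_with_all are made up.

```python
# D values reachable from each C node, copied from the sample data.
D_VALUES_BY_C = {
    "C1": {1, 2, 3, 4},
    "C2": {1, 2},
    "C3": {1, 2, 3, 4},
}


def chains_with_all(values):
    """C nodes whose D values cover every requested value."""
    wanted = set(values)
    result = []
    for c_name, d_values in sorted(D_VALUES_BY_C.items()):
        # Mirrors WHERE d.val IN {values}: keep only the requested values.
        matched = d_values & wanted
        # matched is a subset of wanted, so equal sizes mean equal sets.
        if len(matched) == len(wanted):
            result.append(c_name)
    return result


print(chains_with_all([1, 2, 3, 4]))
```

With the values 1 to 4 only C1 and C3 qualify, which corresponds to the A1 and A3 chains expected in the question.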
If D.val are globally unique, or at least unique for all the D nodes connected to a single C, it can be further simplified:
MATCH (a:A)-->(b:B)-->(c:C)-->(d:D)
WHERE d.val IN {values}
WITH a, b, c, collect(d) AS dList
WHERE size({values}) = size(dList)
RETURN a, b, c, dList
If the values are globally unique, the query will be faster with a uniqueness constraint, as it will also index the values:
CREATE CONSTRAINT ON (d:D) ASSERT d.val IS UNIQUE
If every D node has a unique val property (if any), this should work:
WITH [1,2,3,4] AS desired
MATCH (a:A)-->(b:B)-->(c:C)-->(d:D)
WHERE d.val IN desired
WITH a, b, c, COLLECT(DISTINCT d) AS ds
WHERE SIZE(ds) = SIZE(desired)
RETURN a, b, c, ds
The result will have a row for every matched A, B, C combination, along with the collection of D nodes.
Assuming the following data set...
create (a:A {name: 'A'})
create (b:B {name: 'B'})
create (c:C {name: 'C'})
create (d1:D {name: 'D1', val: 1})
create (d2:D {name: 'D2', val: 2})
create (d3:D {name: 'D3', val: 3})
create (d4:D {name: 'D4', val: 4})
create (a)-[:NEXT]->(b)
create (b)-[:NEXT]->(c)
create (c)-[:NEXT]->(d1)
create (c)-[:NEXT]->(d2)
create (c)-[:NEXT]->(d3)
create (c)-[:NEXT]->(d4)
return *
You could execute a query something like this to match all of the specific D nodes in a particular value range.
match (a:A)-->(b:B)-->(c:C)-->(d:D)
where d.val in range(1,4)
return *
Here is an updated query based on your updated question. I collected the D values for each A,B,C chain of nodes.
match (a:A)-->(b:B)-->(c:C)-->(d:D)
where d.val in range(1,4)
with a, b, c, d
order by a.name, b.name, c.name, d.name
return a, b, c, collect(d) as d
order by a.name, b.name, c.name

Need only common nodes across multiple paths - Neo4j Cypher

My Cypher query finds the common child nodes for several starting nodes. Each path has only its node ids extracted and returned, resulting in hundreds of rows for each path. See the p1 and p2 example (showing only 3 rows and two start points).
MATCH p1=(n1:node {name:"x"})-[:sub*]->(i), p2=(n2:node {name:"y"})-[:sub*]->(i)
RETURN DISTINCT i, extract(a IN nodes(p1)| a.id) as p1, extract(b IN nodes(p2)| b.id) as p2
----RESULTS----
p1=1,4,3
p1=1,8,3
p1=1,8,9,3
p2=6,7,3
p2=6,5,9,3
p2=6,7,10,3
What I would like is to intersect the paths in cypher during the query so that I don't have to do it after. In php I would iterate using:
$result = array_intersect($p1,$p2);
This would return 9,3 from the above example because they are the only common nodes shared by all paths. Is there a way to do this in Cypher so that I don't have hundreds of rows returned?
Thanks!
I believe this will meet your needs.
Here is a picture of the data under consideration.
// match the two different paths with the common ending i
match p1=(n1:Node {name: 1 })-[:SUB*]->(i)
, p2=(n2:Node {name: 6 })-[:SUB*]->(i)
// collect both sets of paths for every common ending node i
with i, collect(p1) as p1, collect(p2) as p2
// recombine the nodes of the first path(s) as distinct collections of nodes
unwind p1 as p
unwind nodes(p) as n
with i, p2, collect( distinct n ) as p1
// recombine the nodes of the second path(s) as distinct collections of nodes
unwind p2 as p
unwind nodes(p) as n
with i, p1, collect( distinct n ) as p2
// return the common ending node with the nodes common to each path
return i, [n in p1 where n in p2 | n.name] as n
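The query above does the same thing as the following Python sketch, which may be easier to compare with the array_intersect call in the question: pool the nodes of all paths from each start node into one set per start node, then intersect those sets. The path lists are the example rows from the question, and the names P1_PATHS, P2_PATHS and common_nodes are made up.

```python
# Example rows from the question: node ids along each path.
P1_PATHS = [[1, 4, 3], [1, 8, 3], [1, 8, 9, 3]]
P2_PATHS = [[6, 7, 3], [6, 5, 9, 3], [6, 7, 10, 3]]


def common_nodes(*path_groups):
    """Node ids that occur in at least one path of every group."""
    # One set per start node, holding every node seen on any of its paths.
    node_sets = [{node for path in group for node in path} for group in path_groups]
    return sorted(set.intersection(*node_sets))


print(common_nodes(P1_PATHS, P2_PATHS))
```

For the example rows this leaves 3 and 9. Passing a third list of paths works the same way, which is what the extra p3 handling in a three-path query amounts to.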
EDIT: updated solution to include a third path
// match the three different paths with the common ending i
match p1=(n1:Node {name: 1 })-[:SUB*]->(i)
, p2=(n2:Node {name: 6 })-[:SUB*]->(i)
, p3=(n3:Node {name: 4 })-[:SUB*]->(i)
// collect all three sets of paths for every common ending node i
with i, collect(p1) as p1, collect(p2) as p2, collect(p3) as p3
// recombine the nodes of the first path(s) as distinct collections of nodes
unwind p1 as p
unwind nodes(p) as n
with i, p2, p3, collect( distinct n ) as p1
// recombine the nodes of the second path(s) as distinct collections of nodes
unwind p2 as p
unwind nodes(p) as n
with i, p1, p3, collect( distinct n ) as p2
// recombine the nodes of the third path(s) as distinct collections of nodes
unwind p3 as p
unwind nodes(p) as n
with i, p1, p2, collect( distinct n ) as p3
// return the common ending node with the nodes common to each path
return i, [n in p1 where n in p2 and n in p3 | n.name] as n
