Find a node which is connected to all start nodes - neo4j

I'm trying to write a Cypher query for the following scenario:
I have 3 start nodes A, B, C and I'm trying to find the nodes D that are related to all three start nodes. At the end I will reduce over the weight property of the relationships and choose the node with the highest weight.
Thanks in advance for helping me out!

How about something like this?
MATCH (a:Node {name: "A"})-[r1]-(d)
, (b:Node {name: "B"})-[r2]-(d)
, (c:Node {name: "C"})-[r3]-(d)
RETURN d.name
, r1.weight + r2.weight + r3.weight AS Weight
ORDER BY Weight DESC
LIMIT 1
This returns only the d nodes that are connected to all of a, b, and c; adds up the weights of the respective relationships; sorts descending by total weight; and picks off the first.
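To see what the query computes, here is a minimal Python sketch of the same logic on made-up data. The node names D1, D2, D3 and all the weights are hypothetical, and it assumes at most one relationship per pair of nodes (Cypher would return one row per combination of matching relationships).

```python
# Hypothetical sample data: (node, node) -> weight of the relationship between them.
# Relationships are treated as undirected, like -[r]- in the Cypher pattern.
weights = {
    ("A", "D1"): 2, ("B", "D1"): 3, ("C", "D1"): 1,
    ("A", "D2"): 5, ("B", "D2"): 4, ("C", "D2"): 2,
    ("A", "D3"): 9, ("B", "D3"): 9,  # D3 is not connected to C
}

def weight(u, v):
    """Return the weight between u and v in either direction, or None."""
    if (u, v) in weights:
        return weights[(u, v)]
    return weights.get((v, u))

def best_common_node(starts):
    """Return (node, total weight) for the best node linked to every start node."""
    nodes = {n for pair in weights for n in pair} - set(starts)
    totals = {}
    for d in nodes:
        ws = [weight(s, d) for s in starts]
        if all(w is not None for w in ws):  # d must be connected to all starts
            totals[d] = sum(ws)
    return max(totals.items(), key=lambda kv: kv[1]) if totals else None

print(best_common_node(["A", "B", "C"]))  # ('D2', 11)
```

D3 has the largest individual weights but is dropped because it is not connected to C, which mirrors how the three-part MATCH only keeps a d that satisfies every pattern.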

Related

Cypher - how to walk graph while computing

I'm just starting to study Cypher here.
How would I write a Cypher query that returns the node, from 1 to 3 hops away from the initial node, whose path has the highest average weight?
Example
Graph is:
(I know I'm not using the Cypher's notation here..)
A-[2]-B-[4]-C
A-[3.5]-D
It would return D, because 3.5 > (2+4)/2
And with Graph:
A-[2]-B-[4]-C
A-[3.5]-D
A-[2]-B-[4]-C-[20]-E
A-[2]-B-[4]-C-[20]-E-[80]-F
It would return E, because (2+4+20)/3 > 3.5
and F is more than 3 hops away
One way to write the query, which has the benefit of being easy to read, is
MATCH p=(A {name: 'A'})-[*1..3]-(x)
UNWIND [r IN relationships(p) | r.weight] AS weight
RETURN x.name, avg(weight) AS avgWeight
ORDER BY avgWeight DESC
LIMIT 1
Here we extract the weights in the path into a list, and unwind that list. Try inserting a RETURN there to see what the results look like at that point. Because we unwind, we can use the avg() aggregation function. By returning not only avg(weight) but also the name of the last path node, the aggregation is grouped by that node name. Note that if several paths end at the same node, their weights are pooled into a single average. If you don't want to return the weight, only the node name, change RETURN to WITH in the query and add another RETURN clause that returns only the node name.
You can also add something like [n IN nodes(p) | n.name] AS nodesInPath to the RETURN clause to see what the path looks like. I created an example graph based on your question using the query below, with nodes named A, B, C, etc.
CREATE (A {name: 'A'}),
(B {name: 'B'}),
(C {name: 'C'}),
(D {name: 'D'}),
(E {name: 'E'}),
(F {name: 'F'}),
(A)-[:R {weight: 2}]->(B),
(B)-[:R {weight: 4}]->(C),
(A)-[:R {weight: 3.5}]->(D),
(C)-[:R {weight: 20}]->(E),
(E)-[:R {weight: 80}]->(F)
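1) To select the possible paths with a length from one to three hops, use MATCH with a variable-length relationship, anchored on the node named A:
MATCH p = (A {name: 'A'})-[*1..3]->(T)
2) Then use the reduce function to calculate the average weight, and sort and limit to get a single value. Starting the accumulator at 0.0 keeps the division from being truncated to an integer when all the weights on a path are integers:
MATCH p = (A {name: 'A'})-[*1..3]->(T)
WITH p, T,
reduce(s = 0.0, r IN relationships(p) | s + r.weight) / length(p) AS weight
RETURN T ORDER BY weight DESC LIMIT 1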
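As a sanity check on the expected result, here is a small Python sketch that walks the example graph from the CREATE statement above and computes the average weight per path. It is only a model of the arithmetic, not of how Neo4j evaluates the query, and it relies on this particular graph having exactly one path to each end node.

```python
# Directed edges and weights from the example CREATE statement above.
edges = {
    "A": [("B", 2), ("D", 3.5)],
    "B": [("C", 4)],
    "C": [("E", 20)],
    "E": [("F", 80)],
}

def paths_from(start, max_hops):
    """Yield (end_node, [weights]) for every directed path of 1..max_hops hops."""
    stack = [(start, [])]
    while stack:
        node, ws = stack.pop()
        if ws:  # skip the zero-length path at the start node
            yield node, ws
        if len(ws) < max_hops:
            for nxt, w in edges.get(node, []):
                stack.append((nxt, ws + [w]))

# Average weight of the (single) path to each reachable end node.
averages = {end: sum(ws) / len(ws) for end, ws in paths_from("A", 3)}
best = max(averages, key=averages.get)
print(best, round(averages[best], 2))  # E 8.67
```

F never appears in the result because it is four hops from A, and E wins with (2+4+20)/3, as in the question.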

Finding the most similar node by their shared child nodes

I am trying to find a node which would be the most similar to another one by the child nodes they both share and then list those nodes they share.
For example I have:
N1-[has]->A
N1-[has]->B
N1-[has]->C
N1-[has]->D
N2-[has]->A
N2-[has]->B
N2-[has]->E
N2-[has]->F
N3-[has]->A
N3-[has]->B
N3-[has]->C
N3-[has]->G
So then I want to check which node is the most similar to N1 by its child nodes.
It should be N3, because they share 3 child nodes.
Now I can find which node it is by using
match (n1:Node {name: "some name"})-[:HAS]->(i)<-[:HAS]-(n2:Node)
with n2.name as n, count(*) as c
return n order by c desc limit 1
But I need the list of the nodes they share. I have been sitting on this for quite some time and cannot get my head around it.
You can try using collect() to store similar nodes into a collection and then return it:
match (n1:Node {name: "some name"})-[:HAS]->(i)<-[:HAS]-(n2:Node)
with n2.name as n, collect(i) as similarNodes, count(*) as c
return n, similarNodes
order by c desc limit 1
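For comparison, the same idea expressed as a plain set intersection in Python, using the data from the question. This is just a sketch of what the collect()/count(*) grouping produces; the function name most_similar is made up for the example.

```python
# Child nodes per parent, taken from the example in the question.
has = {
    "N1": {"A", "B", "C", "D"},
    "N2": {"A", "B", "E", "F"},
    "N3": {"A", "B", "C", "G"},
}

def most_similar(name):
    """Return (other node, sorted shared children) with the largest overlap."""
    shared = {other: has[name] & kids for other, kids in has.items() if other != name}
    best = max(shared, key=lambda other: len(shared[other]))
    return best, sorted(shared[best])

print(most_similar("N1"))  # ('N3', ['A', 'B', 'C'])
```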

Neo4j duplicate relationship

I have duplicate relationships between nodes e.g:
(Author)-[:CONNECTED_TO {weight: 1}]->(Coauthor)
(Author)-[:CONNECTED_TO {weight: 1}]->(Coauthor)
(Author)-[:CONNECTED_TO {weight: 1}]->(Coauthor)
and I want to merge these relations into one relation of the form: A->{weight: 3} B for my whole graph.
I tried something like the following (I'm reading the data from a CSV file):
MATCH (a:Author {authorid: csvLine.author_id}),(b:Coauthor { coauthorid: csvLine.coauthor_id})
CREATE UNIQUE (a)-[r:CONNECTED_TO]-(b)
SET r.weight = coalesce(r.weight, 0) + 1
But when I run this query, it creates duplicate coauthor nodes. The weight does update. The result looks like this:
(Author)-[r:CONNECTED_TO]->(Coauthor)
(It creates 3 identical coauthor nodes for the author.)
If you need to fix it after the fact, you could aggregate all of the relationships and the weight between each set of applicable nodes. Then update the first relationship with the new aggregated number. Then with the collection of relationships delete the second through the last. Perform the update only where there is more than one relationship. Something like this...
MATCH (a:Author {name: 'A'})-[r:CONNECTED_TO]->(b:Coauthor {name: 'B'})
// aggregate the relationships and limit it to those with more than 1
WITH a, b, collect(r) AS rels, sum(r.weight) AS new_weight
WHERE size(rels) > 1
// update the first relationship with the new total weight
SET (rels[0]).weight = new_weight
// bring the aggregated data forward
WITH a, b, rels, new_weight
// delete the relationships 1..n
UNWIND range(1,size(rels)-1) AS idx
DELETE rels[idx]
If you are doing it for the whole graph and the graph is large, you may want to perform the update in batches using LIMIT or some other control mechanism. Filter for duplicates before applying the limit, so that each batch contains only pairs that still need merging:
MATCH (a:Author)-[r:CONNECTED_TO]->(b:Coauthor)
WITH a, b, collect(r) AS rels, sum(r.weight) AS new_weight
WHERE size(rels) > 1
WITH a, b, rels, new_weight
LIMIT 100
SET (rels[0]).weight = new_weight
WITH a, b, rels, new_weight
UNWIND range(1,size(rels)-1) AS idx
DELETE rels[idx]
If you want to eliminate the problem when loading...
MATCH (a:Author {authorid: csvLine.author_id}),(b:Coauthor { coauthorid: csvLine.coauthor_id})
MERGE (a)-[r:CONNECTED_TO]->(b)
ON CREATE SET r.weight = 1
ON MATCH SET r.weight = coalesce(r.weight, 0) + 1
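The MERGE with ON CREATE / ON MATCH behaves like an upsert keyed on the (author, coauthor) pair. A rough Python model of that behaviour, with hypothetical IDs standing in for the CSV rows:

```python
# Hypothetical CSV rows: one (author_id, coauthor_id) pair per line.
rows = [("a1", "c1"), ("a1", "c1"), ("a1", "c1"), ("a1", "c2")]

def load(rows):
    """Mimic MERGE ... ON CREATE SET weight = 1 / ON MATCH SET weight + 1."""
    rels = {}
    for pair in rows:
        if pair not in rels:
            rels[pair] = 1   # ON CREATE: first time this pair is seen
        else:
            rels[pair] += 1  # ON MATCH: relationship already exists
    return rels

print(load(rows))  # {('a1', 'c1'): 3, ('a1', 'c2'): 1}
```

Three identical rows end up as a single relationship with weight 3, which is the shape asked for in the question.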
Side Note: not really knowing your data model, I would consider modelling CoAuthor as Author as they are likely authors in their own right. It is probably only in the context of a particular project they would be considered a coauthor.

Make relationship based on other relationships in neo4j?

Suppose I have the following relationships stored in Neo4j:
A->B,A->D,C->B,C->E
Here A and C have the same label, and B and E also share a label.
What is the cypher query to count how many nodes A and C have in common?
Based on that, I want to create a relationship between A and C. I would like to add a relationship rank between them and give it some value, say 0.5, because they have 1 node in common. What would that query look like?
To return the number of common nodes between A and C, match a pattern that has A at one end and C at the other, with an intermediary node between them. Then count the occurrences of the intermediary node.
match (:TypeOne {name: 'A'})--(common)--(:TypeOne {name: 'C'})
return count(common)
If you want to create a relationship directly between A and C as a result of the match then use merge or create with the A and C nodes. And use set to add a value to the newly created relationship.
Something like this should satisfy your requirements.
match (a:TypeOne {name: 'A'})--(common)--(c:TypeOne {name: 'C'})
with a, c, count(common) as in_common
merge (a)-[rel:COMMON_WITH]->(c)
set rel.value = in_common * 0.5
return *
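Using the relationships from the question, the count and the resulting value can be checked with a few lines of Python. This is a sketch only; the function name rank and the per_node parameter are made up here, with 0.5 per common node taken from the question.

```python
# Relationships from the question: A->B, A->D, C->B, C->E.
neighbours = {"A": {"B", "D"}, "C": {"B", "E"}}

def rank(x, y, per_node=0.5):
    """Return (common nodes, rank value) for two nodes."""
    common = neighbours[x] & neighbours[y]
    return sorted(common), len(common) * per_node

print(rank("A", "C"))  # (['B'], 0.5)
```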

Find the distance in a path between each node and the last node of the path

I am very new to Cypher and I need help to solve a problem I am facing..
In my graph I have a path representing a data stream and I need to know, for each node in the path, the distance from the last node of the path.
For example, if I have the following path:
(a)->(b)->(c)->(d)
the distance must be 3 for a, 2 for b, 1 for c and 0 for d.
Is there an efficient way to obtain this result in Cypher?
Thanks a lot!
Mauro
If it is just hops between nodes, then I think this will fit the bill.
match p=(a:Test {name: 'A'})-[r*3]->(d:Test {name: 'D'})
with p, range(length(p),0,-1) as idx
unwind idx as elem
return (nodes(p)[elem]).name as Node
, length(p) - elem as Distance
order by Node
In this answer, I define a path to be "complete" if its start node has no incoming relationship and its end node has no outgoing relationship.
This query returns, for each "complete" path, a collection of objects containing each node's neo4j-generated ID and the number of hops to the end of that path:
MATCH p=(x)-[*]->(y)
WHERE (NOT ()-->(x)) AND (NOT (y)-->())
WITH NODES(p) AS np, LENGTH(p) AS lp
RETURN [i IN RANGE(0, lp, 1) | {id: ID(np[i]), hops: lp - i}]
NOTE: Matching with [*] will be costly with large graphs, so you may need to limit the maximum hop value. For example, use [*..4] instead to limit the max hop value to 4.
Also, qualifying the query with appropriate node labels and relationship types may speed it up.
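Both queries boil down to the same index arithmetic: a node at position i in a path of length lp is lp - i hops from the end. A minimal Python sketch of that, using the path from the question (the function name distances_to_end is made up for the example):

```python
def distances_to_end(path):
    """Map each node in an ordered path to its hop count from the last node."""
    last = len(path) - 1  # number of hops in the path, like length(p) in Cypher
    return {node: last - i for i, node in enumerate(path)}

print(distances_to_end(["a", "b", "c", "d"]))  # {'a': 3, 'b': 2, 'c': 1, 'd': 0}
```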

Resources