I have duplicate relationships between nodes, e.g.:
(Author)-[:CONNECTED_TO {weight: 1}]->(Coauthor)
(Author)-[:CONNECTED_TO {weight: 1}]->(Coauthor)
(Author)-[:CONNECTED_TO {weight: 1}]->(Coauthor)
and I want to merge them into a single relationship of the form (Author)-[:CONNECTED_TO {weight: 3}]->(Coauthor), across my whole graph.
I tried something like the following (I'm reading the data from a CSV file):
MATCH (a:Author {authorid: csvLine.author_id}),(b:Coauthor { coauthorid: csvLine.coauthor_id})
CREATE UNIQUE (a)-[r:CONNECTED_TO]-(b)
SET r.weight = coalesce(r.weight, 0) + 1
But when I run this query, it creates duplicate coauthor nodes. The weight does update, though. It ends up looking like this:
(Author)-[r:CONNECTED_TO]->(Coauthor)
(It creates 3 identical coauthor nodes for the author.)
If you need to fix it after the fact, you can aggregate the relationships and their weights between each pair of affected nodes, update the first relationship with the new aggregated weight, and then delete the second through the last relationships in the collection. Perform the update only where there is more than one relationship. Something like this...
MATCH (a:Author {name: 'A'})-[r:CONNECTED_TO]->(b:CoAuthor {name: 'B'})
// aggregate the relationships and limit it to those with more than 1
WITH a, b, collect(r) AS rels, sum(r.weight) AS new_weight
WHERE size(rels) > 1
// update the first relationship with the new total weight
SET (rels[0]).weight = new_weight
// bring the aggregated data forward
WITH a, b, rels, new_weight
// delete the relationships 1..n
UNWIND rels[1..] AS extraRel
DELETE extraRel
If you are doing it for the whole graph and the graph is large, you may want to perform the update in batches, using LIMIT or some other control mechanism.
MATCH (a:Author)-[r:CONNECTED_TO]->(b:CoAuthor)
WITH a, b, collect(r) AS rels, sum(r.weight) AS new_weight
LIMIT 100
WHERE size(rels) > 1
SET (rels[0]).weight = new_weight
WITH a, b, rels, new_weight
UNWIND rels[1..] AS extraRel
DELETE extraRel
If you want to eliminate the problem when loading...
MATCH (a:Author {authorid: csvLine.author_id}),(b:Coauthor { coauthorid: csvLine.coauthor_id})
MERGE (a)-[r:CONNECTED_TO]->(b)
ON CREATE SET r.weight = 1
ON MATCH SET r.weight = coalesce(r.weight, 0) + 1
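Putting that together with the CSV load, a minimal sketch might look like the following (the file name coauthors.csv and the column names are assumptions; adjust them to your data):
LOAD CSV WITH HEADERS FROM 'file:///coauthors.csv' AS csvLine
// look up the two endpoints for this row
MATCH (a:Author {authorid: csvLine.author_id})
MATCH (b:Coauthor {coauthorid: csvLine.coauthor_id})
// create the relationship once, then increment its weight on every further match
MERGE (a)-[r:CONNECTED_TO]->(b)
ON CREATE SET r.weight = 1
ON MATCH SET r.weight = coalesce(r.weight, 0) + 1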
Side note: not really knowing your data model, I would consider modelling CoAuthor as Author, since coauthors are likely authors in their own right. It is probably only in the context of a particular project that they would be considered a coauthor.
I have created many nodes in Neo4j. The attributes of these nodes are the same; they all have user_id and item_id. The code I used is as follows:
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
CREATE (main:Main_table {USER_ID: row.user_id, ITEM_ID: row.item_id})
CREATE INDEX ON :Main_table(USER_ID);
CREATE INDEX ON :Main_table(ITEM_ID);
Now I want to create relationship between the nodes with the same user_id or item_id. For example, if node A, B and C have the same USER_ID, I want to create (A)-[:EDGE]->(B), (A)-[:EDGE]->(C) and (B)-[:EDGE]->(C). In order to achieve this goal, I tried the following code:
MATCH (a:Main_table),(b:Main_table)
WHERE a.USER_ID = b.USER_ID
CREATE (a)-[:USER_EDGE]->(b);
MATCH (a:Main_table),(b:Main_table)
WHERE a.ITEM_ID = b.ITEM_ID
CREATE (a)-[:ITEM_EDGE]->(b);
But due to the large amount of data (3,000,000 nodes, 100,000 users), this process is very slow. How can I speed it up? Any help would be greatly appreciated!
Your query is causing a cartesian product, and the Cypher planner does not use indexes to optimize node lookups involving node property comparisons.
A query like this (instead of your USER_EDGE query) may be faster, as it does not cause a cartesian product:
MATCH (a:Main_table)
WITH a.USER_ID AS id, COLLECT(a) AS mains
UNWIND mains AS a
UNWIND mains AS b
WITH a, b
WHERE ID(a) < ID(b)
MERGE (a)-[:USER_EDGE]->(b)
This query uses the aggregating function COLLECT to group the nodes that have the same USER_ID value, and uses the ID(a) < ID(b) test to ensure that a and b are not the same node and also to prevent duplicate relationships (in opposite directions).
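The ITEM_EDGE query can presumably be rewritten the same way; a sketch along the same lines, assuming every node has an ITEM_ID:
MATCH (a:Main_table)
WITH a.ITEM_ID AS id, COLLECT(a) AS mains
UNWIND mains AS a
UNWIND mains AS b
WITH a, b
WHERE ID(a) < ID(b)
MERGE (a)-[:ITEM_EDGE]->(b)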
I'm just starting to study Cypher here.
How would I write a Cypher query to return the node, connected 1 to 3 hops away from the initial node, that has the highest average of the weights along the path?
Example
Graph is:
(I know I'm not using Cypher's notation here.)
A-[2]-B-[4]-C
A-[3.5]-D
It would return D, because 3.5 > (2+4)/2
And with Graph:
A-[2]-B-[4]-C
A-[3.5]-D
A-[2]-B-[4]-C-[20]-E
A-[2]-B-[4]-C-[20]-E-[80]-F
It would return E, because (2+4+20)/3 > 3.5
and F is more than 3 hops away
One way to write the query, which has the benefit of being easy to read, is
MATCH p=(A {name: 'A'})-[*1..3]-(x)
UNWIND [r IN relationships(p) | r.weight] AS weight
RETURN x.name, avg(weight) AS avgWeight
ORDER BY avgWeight DESC
LIMIT 1
Here we extract the weights in the path into a list, and unwind that list. Try inserting a RETURN there to see what the results look like at that point. Because we unwind we can use the avg() aggregation function. By returning not only the avg(weight), but also the name of the last path node, the aggregation will be grouped by that node name. If you don't want to return the weight, only the node name, then change RETURN to WITH in the query, and add another return clause which only returns the node name.
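For example, to return only the node name and not the weight, a variant of the same query (just a sketch, not tested against your data) could be:
MATCH p=(A {name: 'A'})-[*1..3]-(x)
UNWIND [r IN relationships(p) | r.weight] AS weight
// aggregate first, grouped by the node name, then return only the name
WITH x.name AS name, avg(weight) AS avgWeight
ORDER BY avgWeight DESC
LIMIT 1
RETURN name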
You can also add something like [n IN nodes(p) | n.name] AS nodesInPath to the return statement to see what the path looks like. I created an example graph based on your question with the query below, with nodes named A, B, C, etc.
CREATE (A {name: 'A'}),
(B {name: 'B'}),
(C {name: 'C'}),
(D {name: 'D'}),
(E {name: 'E'}),
(F {name: 'F'}),
(A)-[:R {weight: 2}]->(B),
(B)-[:R {weight: 4}]->(C),
(A)-[:R {weight: 3.5}]->(D),
(C)-[:R {weight: 20}]->(E),
(E)-[:R {weight: 80}]->(F)
1) To select the possible paths of length one to three, use MATCH with variable-length relationships:
MATCH p = (A {name: 'A'})-[*1..3]->(T)
2) Then use the reduce function to calculate the average weight, and use ORDER BY and LIMIT to get a single value:
MATCH p = (A {name: 'A'})-[*1..3]->(T)
WITH p, T,
     // start the accumulator at 0.0 so integer weights are not truncated by integer division
     reduce(s = 0.0, r IN relationships(p) | s + r.weight) / length(p) AS weight
RETURN T ORDER BY weight DESC LIMIT 1
I am trying to find the node that is most similar to another one based on the child nodes they both share, and then list the nodes they share.
For example I have:
N1-[has]->A
N1-[has]->B
N1-[has]->C
N1-[has]->D
N2-[has]->A
N2-[has]->B
N2-[has]->E
N2-[has]->F
N3-[has]->A
N3-[has]->B
N3-[has]->C
N3-[has]->G
So I want to check which node is most similar to N1 by its child nodes.
It should be N3, because they share 3 child nodes.
Now I can find which node it is by using:
match (n1:Node {name: "some name"})-[:HAS]->(i)<-[:HAS]-(n2:Node)
with n2.name as n, count(*) as c
return n order by c desc limit 1
But I also need the list of the nodes they share. I have been sitting on this for quite some time and cannot get my head around it.
You can try using collect() to store similar nodes into a collection and then return it:
match (n1:Node {name: "some name"})-[:HAS]->(i)<-[:HAS]-(n2:Node)
with n2.name as n, collect(i) as similarNodes, count(*) as c
return n, similarNodes
order by c desc limit 1
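If you would rather see the names of the shared nodes than the node objects themselves, you could collect a property instead (assuming the child nodes have a name property):
match (n1:Node {name: "some name"})-[:HAS]->(i)<-[:HAS]-(n2:Node)
with n2.name as n, collect(i.name) as sharedNames, count(*) as c
return n, sharedNames
order by c desc limit 1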
Suppose I have the following relationships stored in Neo4j:
A->B,A->D,C->B,C->E
Here A and C are nodes with the same label, and B and E also share a label.
What is the Cypher query to count how many nodes A and C have in common?
Based on that, I want to create a relationship between A and C. I would like to add a relationship rank between them and give it some value, say 0.5 because 1 node is in common. What would that query look like?
To return the number of common nodes between A and C, match a pattern that has A at one end and C at the other, with an intermediary node in between, and then count the occurrences of the intermediary node.
match (:TypeOne {name: 'A'})--(common)--(:TypeOne {name: 'C'})
return count(common)
If you want to create a relationship directly between A and C as a result of the match, use merge or create with the A and C nodes, and use set to add a value to the newly created relationship.
Something like this should satisfy your requirements.
match (a:TypeOne {name: 'A'})--(common)--(c:TypeOne {name: 'C'})
with a, c, count(common) as in_common
merge (a)-[rel:COMMON_WITH]->(c)
set rel.value = in_common * 0.5
return *
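And if you eventually want to do this for every pair of TypeOne nodes rather than just A and C, a rough sketch using the id(a) < id(c) trick (so each pair is processed only once) might look like:
match (a:TypeOne)--(common)--(c:TypeOne)
where id(a) < id(c)
with a, c, count(common) as in_common
merge (a)-[rel:COMMON_WITH]->(c)
set rel.value = in_common * 0.5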
I'm trying to model a Cypher query for the following scenario:
I have 3 start nodes A, B, C, and I'm trying to find the n nodes D that are related to all three start nodes. At the end I will reduce over the weight property of the relationships and choose the node with the highest weight.
Thanks in advance for helping me out!
How about something like this?
MATCH (a:Node {name: "A"})-[r1]-(d)
, (b:Node {name: "B"})-[r2]-(d)
, (c:Node {name: "C"})-[r3]-(d)
RETURN d.name
, r1.weight + r2.weight + r3.weight AS Weight
ORDER BY Weight DESC
LIMIT 1
Only return d nodes that match all of a, b, and c; add up the weights of their respective relationships; sort descending by weight; and pick off the first.