Sum relationships in Neo4j Bloom scene action

So in Neo4j, I have Company nodes with invoice relationships between them, and relationship properties such as the invoice amount and a product description. I want to sum the relationship amounts so that only one arrow with the summarized amount is displayed; I don't want to delete the relationships from the DB. I added this query to the Bloom scene action but can't get any result:
match(n:company)-[r:invoice]->(m:company)
where id(n) in $nodes
with n, m, sum(r.amount) as total, collect(r) as relationships
where size(relationships) > 1
with total, head(relationships) as keep, tail(relationships) as delete
set keep.w = total
foreach(r in delete | delete r)
return *

You can use virtual relationships that show the aggregated sums:
MATCH (n:company)-[r:invoice]->(m:company)
WHERE id(n) IN $nodes
WITH n, m, sum(r.amount) AS amount, count(*) AS rels
WHERE rels > 1
RETURN n, m, apoc.create.vRelationship(n, 'TOTAL', {amount: amount}, m) AS rel
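Note that the WHERE rels > 1 filter only creates a virtual relationship for pairs connected by more than one invoice. If every connected pair should get a single TOTAL arrow, a minimal variant along the same lines (a sketch, untested in Bloom) simply drops that filter:
MATCH (n:company)-[r:invoice]->(m:company)
WHERE id(n) IN $nodes
// sum() aggregates per (n, m) pair, even when there is only one invoice
WITH n, m, sum(r.amount) AS amount
RETURN n, m, apoc.create.vRelationship(n, 'TOTAL', {amount: amount}, m) AS rel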

Related

Count the number of relationships between a pair of nodes and set it as parameter in Neo4J

I have a graph where a pair of nodes can have several relationships between them.
I would like to count these relationships for each pair of nodes and store the count as a property on each relationship.
I tried something like:
MATCH (s:LabeledExperience)-[r:NextExp]->(e:LabeledExperience)
with s, e, r, length(r) as cnt
MATCH (s2:LabeledExperience{name:s.name})-[r2:NextExp{name:r.name}]->(e2:LabeledExperience{name: e.name})
SET r2.weight = cnt
But this always sets the weight to one.
I also tried:
MATCH ()-[r:NextExp]->()
with r, length(r) as cnt
MATCH ()-[r2:NextExp{name:r.name}]->()
SET r2.weight = cnt
But this takes too much time, since there are more than 90k relationships and there is no index (it is not possible to have indexes on edges).
They are always set to 1 because of the way you are counting.
When you group by s, e, r, each distinct relationship r gets its own row, so the count per row is always one. But if you collect(r) for every s, e then you will get a collection of all of the :NextExp relationships between those two nodes.
Also, length() is for measuring the length (number of relationships) of a matched path and does not work directly on a relationship.
Match the relationships and put them in a collection for each pair of nodes, then iterate over each relationship in the collection and set its weight to the size of the collection:
MATCH (s:LabeledExperience)-[r:NextExp]->(e:LabeledExperience)
WITH s, e, collect(r) AS rels
UNWIND rels AS rel
SET rel.weight = size(rels)

How to efficiently find multiple relationship sizes

We have a large graph (over 1 billion edges) that has multiple relationship types between nodes.
To count the nodes that have only a single relationship between them (i.e. a single relationship of a given type between two nodes that would otherwise not be connected), we are running the following query:
MATCH (n)-[:REL_TYPE]-(m)
WHERE size((n)-[]-(m))=1 AND id(n)>id(m)
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)
To demonstrate a similar result, the sample code below can be run on the movie graph after running
:play movies in an empty graph, resulting in 4 nodes (in this case we are asking for node pairs with 3 relationships between them):
MATCH (n)-[]-(m)
WHERE size((n)-[]-(m))=3 AND id(n)>id(m)
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)
Is there a better/more efficient way to query the graph?
The following query is more performant, since it only scans each relationship once [whereas size((n)--(m)) will cause relationships to be scanned multiple times]. It also specifies a relationship direction to filter out half of the relationship scans, and to avoid the need for comparing native IDs.
MATCH (n)-->(m)
WITH n, m, COUNT(*) AS cnt
WHERE cnt = 3
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)
NOTE: It is not clear what you are using the COUNT(DISTINCT n) + COUNT(DISTINCT m) result for, but be aware that it is possible for some nodes to be counted twice after the addition.
[UPDATE]
If you want to get the actual number of distinct nodes that pass your filter, here is one way to do that:
MATCH (n)-->(m)
WITH n, m, COUNT(*) AS cnt
WHERE cnt = 3
WITH COLLECT(n) + COLLECT(m) AS nodes
UNWIND nodes AS node
RETURN COUNT(DISTINCT node)
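If the "per type" reading from the question is what you actually need (a pair may be connected by several relationship types, and you want to count parallel relationships of each type separately), a sketch that additionally groups by type(r):
MATCH (n)-[r]->(m)
// grouping by type(r) makes cnt count parallel relationships of the same type
WITH n, m, type(r) AS relType, COUNT(*) AS cnt
WHERE cnt = 3
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)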

Any way to filter out the most frequent terms in Neo4J APOC request?

I have the following request:
CALL apoc.index.relationships('TO','context:34b4a5b0-0dfa-11e9-98ed-7761a512a9c0')
YIELD rel, start, end WITH DISTINCT rel, start, end
RETURN DISTINCT start.uid AS source_id,
start.name AS source_name,
end.uid AS target_id,
end.name AS target_name,
rel.uid AS edge_id,
rel.context AS context_id,
rel.statement AS statement_id,
rel.weight AS weight
This returns a table of results, one row per relationship.
The question:
Is there a way to filter out the top 150 most connected nodes (by source_name/source_id and target_name/target_id)?
I don't think it would work with frequency, as each table row is unique (because of the different edge_id), but maybe there's a function in Neo4j/Cypher that lets me count the most frequent (source_name/source_id and target_name/target_id) nodes?
Thank you!
This query might do what you want:
CALL apoc.index.relationships('TO','context:34b4a5b0-0dfa-11e9-98ed-7761a512a9c0')
YIELD rel, start, end
WITH start, end, COLLECT(rel) AS rs
ORDER BY SIZE(rs) DESC LIMIT 50
RETURN
start.uid AS source_id,
start.name AS source_name,
end.uid AS target_id,
end.name AS target_name,
[r IN rs | {edge_id: r.uid, context_id: r.context, statement_id: r.statement, weight: r.weight}] AS rels
The query uses the aggregating function COLLECT to collect all the relationships for each pair of start/end nodes, keeps the data for the 50 node pairs with the most relationships, and returns a row of data for each pair (with the data for the relationships in a rels list).
You could always use size( (node)-[:REL]->() ) to get the degree.
And if you compute the top-n degrees first, you can filter those nodes out by comparing:
WHERE min < size( (node)-[:REL]->() ) < max
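Putting those two ideas together, here is a sketch (assuming the :TO relationship type and the uid properties from the question, and that "top 150" means dropping the 150 most-connected nodes): first find the degree of the 150th most-connected node, then keep only rows whose endpoints fall below that cutoff.
// find the smallest degree among the 150 most-connected nodes
MATCH (n)
WITH n, size((n)-[:TO]-()) AS degree
ORDER BY degree DESC LIMIT 150
WITH min(degree) AS cutoff
// keep only relationships whose endpoints fall below the cutoff
MATCH (start)-[rel:TO]->(end)
WHERE size((start)-[:TO]-()) < cutoff AND size((end)-[:TO]-()) < cutoff
RETURN start.uid AS source_id, end.uid AS target_id, rel.uid AS edge_id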

Cypher: Returning node relationships and properties for all nodes in a collection

Currently I'm running personalized PageRank over a set of nodes. I want to take the top n nodes and RETURN all relationships between these nodes in addition to the resulting PageRank score for each node and a set of properties for each node.
Right now I'm managing to RETURN the start and end nodes of the relationships, but I'm unable to figure out how to also RETURN the scores and any additional node properties.
My query is as follows:
MATCH (p) WHERE p.paper_id = $paper_id
CALL algo.pageRank.stream(null, null, {direction: "BOTH", sourceNodes: [p]})
YIELD nodeId, score
WITH p, nodeId, score ORDER BY score DESC
LIMIT 25
MATCH (n) WHERE id(n) = nodeId
WITH collect(n) as nodes
UNWIND nodes as n
MATCH (n)-[r:Cites]->(p) WHERE p in nodes
RETURN startNode(r).paper_id as start, collect(endNode(r).paper_id) as end
In the second block of code I collect the matched nodes n in order to then find all relationships between nodes in that collection later on. However, including score in the WITH collect(n) as nodes line results in score being used as a grouping key, when I want to somehow pass it along separately (is this allowed?).
The output format is not important since I can properly format it server side.
APOC can help here: you can use apoc.algo.cover() to get all the relationships between a set of nodes.
As for bringing the score along, it would probably be best to collect from the matched nodes a map projection that includes the score (n {.*, score} copies all of the node's properties and adds score as an extra key):
MATCH (p) WHERE p.paper_id = $paper_id
CALL algo.pageRank.stream(null, null, {direction: "BOTH", sourceNodes: [p]})
YIELD nodeId, score
WITH p, nodeId, score ORDER BY score DESC
LIMIT 25
MATCH (n) WHERE id(n) = nodeId
WITH collect(nodeId) as ids, collect(n {.*, score}) as nodes
CALL apoc.algo.cover(ids) YIELD rel
RETURN nodes, collect(rel) as rels

Neo4j duplicate relationship

I have duplicate relationships between nodes e.g:
(Author)-[:CONNECTED_TO {weight: 1}]->(Coauthor)
(Author)-[:CONNECTED_TO {weight: 1}]->(Coauthor)
(Author)-[:CONNECTED_TO {weight: 1}]->(Coauthor)
and I want to merge these relations into one relation of the form (Author)-[:CONNECTED_TO {weight: 3}]->(Coauthor) across my whole graph.
I tried something like the following (I'm reading the data from a CSV file):
MATCH (a:Author {authorid: csvLine.author_id}),(b:Coauthor { coauthorid: csvLine.coauthor_id})
CREATE UNIQUE (a)-[r:CONNECTED_TO]-(b)
SET r.weight = coalesce(r.weight, 0) + 1
But when I run this query, it creates duplicate coauthor nodes. The weight does update, but the result looks like this:
(Author)-[r:CONNECTED_TO]->(Coauthor)
(it creates 3 identical coauthor nodes for the author)
If you need to fix it after the fact, you could aggregate all of the relationships and the total weight between each pair of applicable nodes, update the first relationship with the aggregated weight, and then use the collection of relationships to delete the second through the last. Perform the update only where there is more than one relationship. Something like this...
MATCH (a:Author {name: 'A'})-[r:CONNECTED_TO]->(b:CoAuthor {name: 'B'})
// aggregate the relationships and limit it to those with more than 1
WITH a, b, collect(r) AS rels, sum(r.weight) AS new_weight
WHERE size(rels) > 1
// update the first relationship with the new total weight
SET (rels[0]).weight = new_weight
// bring the aggregated data forward
WITH a, b, rels, new_weight
// delete the relationships 1..n
UNWIND range(1,size(rels)-1) AS idx
DELETE rels[idx]
If you are doing it for the whole graph and the graph is large, you may want to perform the update in batches, using LIMIT or some other control mechanism:
MATCH (a:Author)-[r:CONNECTED_TO]->(b:CoAuthor)
WITH a, b, collect(r) AS rels, sum(r.weight) AS new_weight
LIMIT 100
WHERE size(rels) > 1
SET (rels[0]).weight = new_weight
WITH a, b, rels, new_weight
UNWIND range(1,size(rels)-1) AS idx
DELETE rels[idx]
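One such control mechanism is APOC's apoc.periodic.iterate, which runs the update in transactional batches; a sketch, assuming the APOC plugin is installed:
// parallel: false avoids lock contention on nodes shared between batches
CALL apoc.periodic.iterate(
  "MATCH (a:Author)-[r:CONNECTED_TO]->(b:CoAuthor)
   WITH a, b, collect(r) AS rels, sum(r.weight) AS new_weight
   WHERE size(rels) > 1
   RETURN rels, new_weight",
  "SET (rels[0]).weight = new_weight
   FOREACH (r IN rels[1..] | DELETE r)",
  {batchSize: 100, parallel: false})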
If you want to eliminate the problem when loading...
MATCH (a:Author {authorid: csvLine.author_id}),(b:Coauthor { coauthorid: csvLine.coauthor_id})
MERGE (a)-[r:CONNECTED_TO]->(b)
ON CREATE SET r.weight = 1
ON MATCH SET r.weight = coalesce(r.weight, 0) + 1
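If the duplicate Coauthor nodes themselves are the problem (as in your example), it is also worth adding uniqueness constraints before loading; a sketch using Neo4j 3.x constraint syntax and the property keys from your query:
// prevents duplicate nodes and backs the MATCH above with an index
CREATE CONSTRAINT ON (a:Author) ASSERT a.authorid IS UNIQUE;
CREATE CONSTRAINT ON (c:Coauthor) ASSERT c.coauthorid IS UNIQUE;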
Side note: not really knowing your data model, I would consider modelling CoAuthor as Author, since coauthors are likely authors in their own right. It is probably only in the context of a particular project that they would be considered a coauthor.
