Cypher - how to walk graph while computing - neo4j

I'm just starting to study Cypher.
How would I write a Cypher query that returns the node, 1 to 3 hops away from the initial node, whose connecting path has the highest average weight?
Example
Graph is:
(I know I'm not using Cypher notation here.)
A-[2]-B-[4]-C
A-[3.5]-D
It would return D, because 3.5 > (2+4)/2
And with Graph:
A-[2]-B-[4]-C
A-[3.5]-D
A-[2]-B-[4]-C-[20]-E
A-[2]-B-[4]-C-[20]-E-[80]-F
It would return E, because (2+4+20)/3 > 3.5, and F is more than 3 hops away.

One way to write the query, which has the benefit of being easy to read, is:
MATCH p=(A {name: 'A'})-[*1..3]-(x)
UNWIND [r IN relationships(p) | r.weight] AS weight
RETURN x.name, avg(weight) AS avgWeight
ORDER BY avgWeight DESC
LIMIT 1
Here we extract the weights in the path into a list and unwind that list. Try inserting a RETURN at that point to see what the intermediate results look like. Because we unwind, we can use the avg() aggregation function. By returning not only avg(weight) but also the name of the last node in the path, the aggregation is grouped by that node name. If you only want to return the node name and not the weight, change RETURN to WITH and add another RETURN clause that returns only the node name.
You can also add something like [n IN nodes(p) | n.name] AS nodesInPath to the RETURN clause to see what each path looks like. I created an example graph based on your question with the query below, using nodes named A, B, C, etc.
CREATE (A {name: 'A'}),
(B {name: 'B'}),
(C {name: 'C'}),
(D {name: 'D'}),
(E {name: 'E'}),
(F {name: 'F'}),
(A)-[:R {weight: 2}]->(B),
(B)-[:R {weight: 4}]->(C),
(A)-[:R {weight: 3.5}]->(D),
(C)-[:R {weight: 20}]->(E),
(E)-[:R {weight: 80}]->(F)
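To check the expected answers in the question without a database, here is a rough Python model of what the UNWIND/avg() query computes. It is only a sketch under assumptions: the edge list and function names are hypothetical, the pattern is treated as undirected like -[*1..3]- above, and a path never reuses a relationship (Cypher's uniqueness rule within a single MATCH).

```python
from collections import defaultdict

# Hypothetical copy of the example graph: (start, end, weight).
EDGES = [("A", "B", 2), ("B", "C", 4), ("A", "D", 3.5),
         ("C", "E", 20), ("E", "F", 80)]


def paths_from(start, edges, min_hops=1, max_hops=3):
    """Yield (end_node, weights) for every undirected path of min_hops..max_hops.
    A relationship (edge index) is never reused within one path."""
    adj = defaultdict(list)
    for i, (u, v, w) in enumerate(edges):
        adj[u].append((i, v, w))
        adj[v].append((i, u, w))

    def walk(node, used, weights):
        if len(weights) >= min_hops:
            yield node, weights
        if len(weights) == max_hops:
            return
        for i, nxt, w in adj[node]:
            if i not in used:
                yield from walk(nxt, used | {i}, weights + [w])

    yield from walk(start, frozenset(), [])


def best_end_node(start, edges):
    # UNWIND the weights, then avg() grouped by the end node x.
    pooled = defaultdict(list)
    for end, weights in paths_from(start, edges):
        pooled[end].extend(weights)
    averages = {node: sum(ws) / len(ws) for node, ws in pooled.items()}
    # ORDER BY avgWeight DESC LIMIT 1
    return max(averages, key=averages.get), averages


print(best_end_node("A", EDGES[:3])[0])  # first graph in the question: D
print(best_end_node("A", EDGES)[0])      # second graph in the question: E
```

Because the weights are pooled per end node, a node reachable by several different paths gets one combined average rather than the score of its best single path. In this tree-shaped example every node has exactly one path from A, so the two readings agree.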

1) To select the possible paths of length one to three, use MATCH with a variable-length relationship:
MATCH p = (A {name: 'A'})-[*1..3]->(T)
2) Then use the reduce function to calculate the average weight, and sort and limit to get a single result. Starting the sum at 0.0 keeps the division from truncating when all the weights are integers:
MATCH p = (A {name: 'A'})-[*1..3]->(T)
WITH p, T,
     reduce(s = 0.0, r IN relationships(p) | s + r.weight) / length(p) AS weight
RETURN T ORDER BY weight DESC LIMIT 1
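One detail worth checking in the reduce() version is Cypher's division: when both operands are integers, / performs integer division. The Python sketch below imitates that rule with a hypothetical helper, cypher_div, to show what happens to the path A-B-C-E, whose weights 2, 4 and 20 are all integers. It only models non-negative values, which is all this example needs.

```python
from functools import reduce


def cypher_div(a, b):
    """Assumed model of Cypher '/': two integers give an integer result
    (the floor, for non-negative values); otherwise ordinary float division."""
    if isinstance(a, int) and isinstance(b, int):
        return a // b
    return a / b


def path_average(weights, start=0):
    # reduce(s = start, r IN relationships(p) | s + r.weight) / length(p)
    total = reduce(lambda s, w: s + w, weights, start)
    return cypher_div(total, len(weights))


# End node -> weights along the single directed path from A in the CREATE graph above.
PATHS = {"B": [2], "C": [2, 4], "E": [2, 4, 20], "D": [3.5]}

print(path_average(PATHS["E"]))             # 8 with an integer seed
print(path_average(PATHS["E"], start=0.0))  # 8.666666666666666 with a float seed

averages = {end: path_average(ws, start=0.0) for end, ws in PATHS.items()}
print(max(averages, key=averages.get))      # E (ORDER BY weight DESC LIMIT 1)
```

Here the truncated 8 still beats 3.5, so the answer is E either way, but with closer averages an integer seed could change the ranking.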

Related

Cypher : Return Nodes that matched along with Nodes that didn't match

I have labels A, B, and Z; A and B each have their own relationships to Z. With the query
MATCH (a:A)
MATCH (b:B { uuid: {id} })
MATCH (a)-[:rel1]->(z:Z)<-[:rel2]-(b)
WITH a, COLLECT(z) AS matched_z
RETURN DISTINCT a, matched_z
which returns each A node together with all the Z nodes that have relationships to both that A and B.
I'm stuck trying to ALSO return a separate array of the Z nodes that are related to B but not to A (i.e. missing_z). I am attempting an initial query to return all the relationships between B and Z:
results = MATCH (b:B { uuid: {id} })
MATCH (b)-[:rel2]->(z:Z)
RETURN DISTINCT COLLECT(z.uuid) AS z
MATCH (a:A)
MATCH (b:B { uuid: {id} })
MATCH (a)-[:rel1]->(z:Z)<-[:rel2]-(b)
WITH a, COLLECT(z) AS matched_z, z
RETURN DISTINCT a, matched_z, filter(skill IN z.array WHERE NOT z.uuid IN {results}) AS missing_z
The results have nil for missing_z where I would expect it to be populated. I'm not sure whether filter is the right approach for a WHERE NOT ... IN scenario. Can the two queries above be combined into one?
The hard part here, in my opinion, is that any failed match will drop every row you have matched so far. But your starting point seems to be "all Z related to the B with this uuid", so start by collecting that and filter/copy from there.
Use WITH plus aggregation functions to copy and filter columns.
Use OPTIONAL MATCH if a failure to match shouldn't drop already collected rows.
If I understand what you are trying to do, this Cypher should do the job; adjust it as needed (let me know if you need help understanding or adapting any part of it):
// Match base set
MATCH (z:Z)<-[:rel2]-(b:B { uuid: $id })
// Collect into a single list
WITH COLLECT(z) AS zs
// Match all A (ignore relation to Zs)
MATCH (a:A)
// For each a, return a, the sub-list of Zs related to a, and the sub-list of Zs not related to a
RETURN a,
       [n IN zs WHERE (a)-[:rel1]->(n)] AS matched,
       [n IN zs WHERE NOT (a)-[:rel1]->(n)] AS unmatched
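The key property of this answer is that every A node produces a row even when it shares no Z with B, because the Z list is built first and only filtered afterwards. A small Python model of that partitioning, with hypothetical node ids standing in for real nodes:

```python
# Hypothetical data: Z nodes reachable from B via rel2, and Z nodes each A reaches via rel1.
ZS = ["z1", "z2", "z3"]  # WITH COLLECT(z) AS zs
A_TO_Z = {"a1": {"z1", "z2"}, "a2": set(), "a3": {"z3", "z9"}}


def partition_per_a(zs, a_to_z):
    rows = []
    for a, linked in a_to_z.items():  # MATCH (a:A): one row per A node
        matched = [z for z in zs if z in linked]        # [n IN zs WHERE (a)-[:rel1]->(n)]
        unmatched = [z for z in zs if z not in linked]  # [n IN zs WHERE NOT (a)-[:rel1]->(n)]
        rows.append((a, matched, unmatched))
    return rows


for row in partition_per_a(ZS, A_TO_Z):
    print(row)
```

a2 still appears, with an empty matched list, and z9 never shows up because it is not linked to B.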
This query might do what you want:
MATCH (z:Z)<-[:rel2]-(b:B { uuid: $id })
WITH COLLECT(z) as all_zs
UNWIND all_zs AS z
MATCH (a:A)-[:rel1]->(z)
WITH all_zs, COLLECT(DISTINCT z) AS matched_zs
RETURN matched_zs, apoc.coll.subtract(all_zs, matched_zs) AS missing_zs;
It first stores in the all_zs variable all the Z nodes that have a rel2 relationship from b. This collection's contents remain unaffected even if the second MATCH clause matches a subset of those Z nodes.
It then stores in matched_zs the distinct all_zs nodes that have a rel1 relationship from any A node.
Finally, it returns:
the matched_zs collection, and
the unique nodes from all_zs that are not also in matched_zs, as missing_zs.
The query uses the convenient APOC function apoc.coll.subtract to generate the latter return value.
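To make the set arithmetic concrete, here is a Python sketch of the same idea. The subtract helper stands in for apoc.coll.subtract under the behaviour described above (unique elements of the first list that are not in the second); keeping first-seen order is this sketch's own choice, not a claim about APOC.

```python
def subtract(first, second):
    """Unique elements of `first` that are not in `second` (first-seen order kept)."""
    exclude, seen, out = set(second), set(), []
    for x in first:
        if x not in exclude and x not in seen:
            seen.add(x)
            out.append(x)
    return out


# Hypothetical ids: Z nodes hanging off B, and the Z nodes each A points to via rel1.
ALL_ZS = ["z1", "z2", "z3", "z4"]
A_TO_Z = {"a1": {"z1"}, "a2": {"z1", "z3"}}

# COLLECT(DISTINCT z) over the Zs matched by any A
matched_zs = [z for z in ALL_ZS if any(z in linked for linked in A_TO_Z.values())]
missing_zs = subtract(ALL_ZS, matched_zs)
print(matched_zs, missing_zs)  # ['z1', 'z3'] ['z2', 'z4']
```

One difference from the Cypher query: if no Z has a rel1 relationship from any A, the plain MATCH leaves no rows, so the query returns nothing at all rather than an empty matched_zs, whereas this sketch would return an empty list and the full ALL_ZS.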

How to find specific subgraph in Neo4j using where clause

I have a large graph where some of the relationships have properties that I want to use to effectively prune the graph as I create a subgraph. For example, I have a property called relevance_score, and I want to start at one node and sprawl out, collecting all nodes and relationships but pruning wherever a relationship has that property.
My attempt to do so netted this query:
start n=node(15) match (n)-[r*]->(x) WHERE NOT HAS(r.relevance_score) return x, r
My attempt has two issues I cannot resolve:
1) On reflection, I believe this will not result in a pruned graph but rather a collection of disjoint graphs.
2) Additionally, I am getting the following error from what looks like a correctly formed Cypher query:
Type mismatch: expected Any, Map, Node or Relationship but was Collection<Relationship> (line 1, column 52 (offset: 51))
"start n=node(15) match (n)-[r*]->(x) WHERE NOT HAS(r.relevance_score) return x, r"
You should be able to use the ALL() function on the collection of relationships to enforce that for all relationships in the path, the property in question is null.
Using Gabor's sample graph, this query should work.
MATCH p = (n {name: 'n1'})-[rs1*]->()
WHERE ALL(rel in rs1 WHERE rel.relevance_score is null)
RETURN p
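Because ALL() rejects a path as soon as any relationship on it has relevance_score, every prefix of an accepted path is accepted too, so the results form one connected tree rooted at the start node rather than disjoint pieces. The Python sketch below models this as a depth-first walk that refuses to cross a scored relationship; the edge list reproduces Gabor's sample graph, and the function name is hypothetical.

```python
from collections import defaultdict

# Gabor's sample graph as directed edges: (start, end, properties).
EDGES = [
    ("n1", "n2", {"relevance_score": 0.5}),
    ("n2", "n3", {}),
    ("n1", "n4", {}),
    ("n4", "n5", {}),
]


def unpruned_paths(start, edges):
    """Yield each directed path from `start` (as a node list) whose relationships
    all lack relevance_score, mirroring ALL(rel IN rs1 WHERE rel.relevance_score IS NULL)."""
    out = defaultdict(list)
    for i, (u, v, props) in enumerate(edges):
        out[u].append((i, v, props))

    def walk(node, used, path):
        for i, nxt, props in out[node]:
            if i in used or props.get("relevance_score") is not None:
                continue  # prune: no accepted path can go through this relationship
            yield path + [nxt]
            yield from walk(nxt, used | {i}, path + [nxt])

    yield from walk(start, frozenset(), [start])


for p in unpruned_paths("n1", EDGES):
    print(p)  # ['n1', 'n4'], then ['n1', 'n4', 'n5']
```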
One solution I can think of is to go through all relationships (with rs*), keep only the ones without the relevance_score property, and check whether the rs "path" is still the same. (I put "path" in quotes because technically it is not a Neo4j path.)
I created a small example graph:
CREATE
(n1:Node {name: 'n1'}),
(n2:Node {name: 'n2'}),
(n3:Node {name: 'n3'}),
(n4:Node {name: 'n4'}),
(n5:Node {name: 'n5'}),
(n1)-[:REL {relevance_score: 0.5}]->(n2)-[:REL]->(n3),
(n1)-[:REL]->(n4)-[:REL]->(n5)
The graph contains a single relevant edge, between nodes n1 and n2.
The query (note that I used {name: 'n1'} to get the start node; you might use START node=...):
MATCH (n {name: 'n1'})-[rs1*]->(x)
UNWIND rs1 AS r
WITH n, rs1, x, r
WHERE r.relevance_score IS NULL
WITH n, rs1, x, collect(r) AS rs2
WHERE rs1 = rs2
RETURN n, x
The results:
╒══════════╤══════════╕
│n │x │
╞══════════╪══════════╡
│{name: n1}│{name: n4}│
├──────────┼──────────┤
│{name: n1}│{name: n5}│
└──────────┴──────────┘
Update: see InverseFalcon's answer for a simpler solution.

Cypher - Only show node name, not full node in path variable

In Cypher I have the following query:
MATCH p=(n1 {name: "Node1"})-[r*..6]-(n2 {name: "Node2"})
RETURN p, reduce(cost = 0, x in r | cost + x.cost) AS cost
It is working as expected. However, it prints the full n1 node, then the full r relationship (with all its attributes), and then full n2.
What I want instead is to just show the value of the name attribute of n1, the type attribute of r and again the name attribute of n2.
How could this be possible?
Thank you.
The tricky part of your request is the type attribute of r, as r is a collection of the relationships in the path, not a single relationship. We can use a list comprehension (EXTRACT in older Neo4j versions) to produce a list of relationship types for all relationships in your path. See if this works for you:
MATCH (n1 {name: "Node1"})-[r*..6]-(n2 {name: "Node2"})
RETURN n1.name, [rel IN r | type(rel)] AS types, n2.name, reduce(cost = 0, x IN r | cost + x.cost) AS cost
You also seem to be calculating a cost for the path. Have you looked at the shortestPath() function?
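The two list operations in that RETURN map directly onto a comprehension and a fold. A hypothetical Python version, with a made-up two-hop path whose relationship types (ROAD, RAIL) and costs are invented purely for illustration:

```python
from functools import reduce

# Hypothetical path: Node1 -[:ROAD {cost: 3}]- X -[:RAIL {cost: 5}]- Node2
nodes = [{"name": "Node1"}, {"name": "X"}, {"name": "Node2"}]
rels = [{"type": "ROAD", "cost": 3}, {"type": "RAIL", "cost": 5}]


def summarize(nodes, rels):
    types = [rel["type"] for rel in rels]                   # [rel IN r | type(rel)]
    cost = reduce(lambda acc, x: acc + x["cost"], rels, 0)  # reduce(cost = 0, x IN r | cost + x.cost)
    return nodes[0]["name"], types, nodes[-1]["name"], cost  # n1.name, types, n2.name, cost


print(summarize(nodes, rels))  # ('Node1', ['ROAD', 'RAIL'], 'Node2', 8)
```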

Neo4j query with logical AND on relationship types instead of OR

UPDATE: I've changed the graphic and example queries to make the request clearer. The basic idea is the same, but now I'm showing that there really are more than just two relationships. The idea is that I want TWO of them to match, not necessarily ALL of them.
Given the following Neo4j graph:
Is it possible to specify a relationship in a query that requires that TWO specific relationships be there for a match, but not necessarily all, without simply stating each full matching path separately? I want a logical AND on the relationship types, just like we have a logical OR using the | character.
This is how you would use a logical OR with the | character:
// OR on MEMBER_OF and GRANT_GROUP_COMP
MATCH (p:Person {name:'John'})-[r:MEMBER_OF|GRANT_GROUP_COMP]->(t:Team {name:'Team 1'})
RETURN p,r,t
What I'm looking for is something like this, an AND with & or similar that REQUIRES both relationships to be present:
// AND type functionality in the relationship I'd like
MATCH (p:Person {name:'John'})-[r:MEMBER_OF&GRANT_GROUP_COMP]->(t:Team {name:'Team 1'})
RETURN p,r,t
Without having to resort to this, which works for me just fine:
// I'd like to avoid this
MATCH (p:Person {name:'John'})-[r:MEMBER_OF]->(t:Team {name:'Team 1'}),
(p)-[r2:GRANT_GROUP_COMP]->(t)
RETURN p,r,r2,t
Any insight would be appreciated, but based on responses so far, it simply doesn't exist.
What about this?
MATCH (D:Person {name:'Donald'})-[r1:WORKS_AT]->
(o:Office {code:'279'})<-[r2:SUPPORTS]-(D)
RETURN *
A version inspired by Dave's answer:
MATCH (D:Person {name:'Donald'})-[r:WORKS_AT|SUPPORTS]->(o:Office {code:'279'})
WITH D, o, collect(r) as rels,
collect(distinct type(r)) as tmp WHERE size(tmp) >= 2
return D, o, rels
Update:
MATCH (D:Person {name:'Donald'})
- [r: MEMBER_OF
| GRANT_INDIRECT_ALERTS
| GRANT_INDIRECT_COMP
| GRANT_GROUP_ALERTS
| GRANT_GROUP_COMP
] ->
(o:Office {code:'279'})
WITH D, o, collect(r) as rels,
collect(distinct type(r)) as tmp WHERE size(tmp) >= 2 AND size(tmp) <= 5
return D, o, rels
This query will return a result if John and Team 1 have MEMBER_OF AND GRANT_GROUP_COMP relationships between them.
(This is very similar to stdob--'s second answer, but requires the number of distinct types to be exactly 2.)
MATCH (p:Person {name: 'John'})-[r:MEMBER_OF|GRANT_GROUP_COMP]->(t:Team {name: 'Team 1'})
WITH p, t, COLLECT(r) AS rels, COLLECT(DISTINCT type(r)) AS types
WHERE SIZE(types) = 2
RETURN p, t, rels;
You could add the second relationship type in a WHERE clause. Something like this...
MATCH (p:Person {name:'John'})-[r:GRANT_GROUP_COMP]->(t:Team {name:'Team 1'})
WHERE (p)-[:MEMBER_OF]->(t)
RETURN *
Or you could make sure that the complete set is in the collection of relationship types. Something like this...
MATCH (p:Person {name:'John'})-[r]->(t:Team {name:'Team 1'})
with p,t,collect(type(r)) as r_types
where all(r in ['MEMBER_OF','GRANT_GROUP_COMP'] where r in r_types)
RETURN p, t, r_types
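Both the size(types) = 2 approach and this all(...) approach boil down to collecting the distinct relationship types between each (person, team) pair and checking that the required types are a subset. A Python sketch of the subset test, using hypothetical data:

```python
from collections import defaultdict

# Hypothetical relationships: (start, type, end).
RELS = [
    ("John", "MEMBER_OF", "Team 1"),
    ("John", "GRANT_GROUP_COMP", "Team 1"),
    ("John", "GRANT_INDIRECT_ALERTS", "Team 1"),
    ("Mary", "MEMBER_OF", "Team 1"),
]
REQUIRED = {"MEMBER_OF", "GRANT_GROUP_COMP"}


def pairs_with_all_types(rels, required):
    types_by_pair = defaultdict(set)
    for start, rel_type, end in rels:
        types_by_pair[(start, end)].add(rel_type)  # collect(type(r)) per (p, t)
    # all(r IN required WHERE r IN r_types), i.e. required is a subset of r_types
    return {pair for pair, types in types_by_pair.items() if required <= types}


print(pairs_with_all_types(RELS, REQUIRED))  # {('John', 'Team 1')}
```

John qualifies even though he has an extra GRANT_INDIRECT_ALERTS relationship, which is the "TWO of them, not necessarily ALL" behaviour the question asks for; Mary has only one of the required types and is dropped.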

Neo4j duplicate relationship

I have duplicate relationships between nodes e.g:
(Author)-[:CONNECTED_TO {weight: 1}]->(Coauthor)
(Author)-[:CONNECTED_TO {weight: 1}]->(Coauthor)
(Author)-[:CONNECTED_TO {weight: 1}]->(Coauthor)
and I want to merge these relationships into a single relationship of the form (Author)-[:CONNECTED_TO {weight: 3}]->(Coauthor) across my whole graph.
I tried something like the following (I'm reading the data from a CSV file):
MATCH (a:Author {authorid: csvLine.author_id}),(b:Coauthor { coauthorid: csvLine.coauthor_id})
CREATE UNIQUE (a)-[r:CONNECTED_TO]-(b)
SET r.weight = coalesce(r.weight, 0) + 1
But when I run this query, it creates duplicate coauthor nodes, although the weight does update. The result looks like this:
(Author)-[r:CONNECTED_TO]->(Coauthor)
(It creates 3 identical coauthor nodes for the author.)
If you need to fix it after the fact, you could aggregate all of the relationships and their weights between each pair of applicable nodes, update the first relationship with the new aggregated total, and then use the collection of relationships to delete the second through the last. Perform the update only where there is more than one relationship. Something like this...
MATCH (a:Author {name: 'A'})-[r:CONNECTED_TO]->(b:Coauthor {name: 'B'})
// aggregate the relationships and limit it to those with more than 1
WITH a, b, collect(r) AS rels, sum(r.weight) AS new_weight
WHERE size(rels) > 1
// update the first relationship with the new total weight
SET (rels[0]).weight = new_weight
// bring the aggregated data forward
WITH a, b, rels, new_weight
// delete the relationships 1..n
UNWIND range(1,size(rels)-1) AS idx
DELETE rels[idx]
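The merge-then-delete logic is easy to check off-line. This Python sketch uses a hypothetical relationship store (id mapped to author, coauthor, weight) and mirrors each step of the query in a comment:

```python
from collections import defaultdict


def merge_duplicates(rels):
    """rels: {rel_id: (author, coauthor, weight)} -> new dict with one rel per pair."""
    groups = defaultdict(list)
    for rel_id, (a, b, _w) in rels.items():
        groups[(a, b)].append(rel_id)           # collect(r) per (a, b)
    merged = dict(rels)
    for (a, b), ids in groups.items():
        if len(ids) < 2:                        # WHERE size(rels) > 1
            continue
        total = sum(rels[i][2] for i in ids)    # sum(r.weight)
        merged[ids[0]] = (a, b, total)          # SET (rels[0]).weight = new_weight
        for i in ids[1:]:                       # UNWIND range(1, size(rels)-1) ... DELETE
            del merged[i]
    return merged


REL_STORE = {1: ("A", "B", 1), 2: ("A", "B", 1), 3: ("A", "B", 1), 4: ("A", "C", 1)}
print(merge_duplicates(REL_STORE))  # {1: ('A', 'B', 3), 4: ('A', 'C', 1)}
```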
If you are doing it for the whole graph and the graph is large, you may want to perform the update in batches using LIMIT or some other control mechanism. Apply the size(rels) > 1 filter before the LIMIT; otherwise a batch of 100 pairs may contain no duplicates at all.
MATCH (a:Author)-[r:CONNECTED_TO]->(b:Coauthor)
WITH a, b, collect(r) AS rels, sum(r.weight) AS new_weight
WHERE size(rels) > 1
WITH a, b, rels, new_weight
LIMIT 100
SET (rels[0]).weight = new_weight
WITH a, b, rels, new_weight
UNWIND range(1,size(rels)-1) AS idx
DELETE rels[idx]
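Where the LIMIT sits relative to the size(rels) > 1 filter matters for batching: limiting first can hand back a batch with nothing to merge even though duplicates remain elsewhere. The two hypothetical Python helpers below contrast the two orders on a toy list of (pair, relationship count) rows. Cypher does not guarantee row order without ORDER BY, so the toy list just fixes one possible order.

```python
# Toy rows: ((author, coauthor), number of CONNECTED_TO relationships).
ROWS = [(("A", "B"), 1), (("A", "C"), 1), (("A", "D"), 3)]


def limit_then_filter(rows, n):
    return [row for row in rows[:n] if row[1] > 1]


def filter_then_limit(rows, n):
    return [row for row in rows if row[1] > 1][:n]


print(limit_then_filter(ROWS, 2))  # []
print(filter_then_limit(ROWS, 2))  # [(('A', 'D'), 3)]
```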
If you want to eliminate the problem when loading...
MATCH (a:Author {authorid: csvLine.author_id}),(b:Coauthor { coauthorid: csvLine.coauthor_id})
MERGE (a)-[r:CONNECTED_TO]->(b)
ON CREATE SET r.weight = 1
ON MATCH SET r.weight = coalesce(r.weight, 0) + 1
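MERGE with ON CREATE / ON MATCH behaves like an upsert keyed on the (author, coauthor) pair. A Python sketch of that counting behaviour over hypothetical CSV rows, assuming each Author and Coauthor node already exists exactly once (the MATCH finds them; MERGE on the relationship does not deduplicate nodes):

```python
def load_rows(rows):
    """rows: iterable of (author_id, coauthor_id); returns {(author_id, coauthor_id): weight}."""
    weights = {}
    for author_id, coauthor_id in rows:
        key = (author_id, coauthor_id)
        if key not in weights:
            weights[key] = 1   # ON CREATE SET r.weight = 1
        else:
            weights[key] += 1  # ON MATCH SET r.weight = coalesce(r.weight, 0) + 1
    return weights


CSV_ROWS = [("a1", "c1"), ("a1", "c1"), ("a1", "c1"), ("a1", "c2")]
print(load_rows(CSV_ROWS))  # {('a1', 'c1'): 3, ('a1', 'c2'): 1}
```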
Side Note: not really knowing your data model, I would consider modelling CoAuthor as Author as they are likely authors in their own right. It is probably only in the context of a particular project they would be considered a coauthor.
