Neo4j aggregate/count of connected nodes

I'm trying to do a count of nodes connected to node A, where node A is part of a Cypher query starting from node B, and I'm getting unexpected results. Here's an example setup. Pretend we're looking at books and owners, and books cite other books while owners of course own books:
Book B1
Book B2 CITES B1
Book B3 CITES B1
Book B4
Owner O1 OWNS B1
Owner O2 OWNS B2
Owner O3 OWNS B3 and B4
So let's say I'm looking at Book B1, and I want to find each book that cites it, then count the books owned by each person who owns the citing book. So if I start with B1, I should find owners O2 and O3, since each owns a book that cites B1. If I count the books they own, I should get 1 for O2, and 2 for O3.
So first, a query to just list the owners works fine:
start a=node(1) MATCH a<-[:CITES]-b<-[:OWNS]-c return c.name
That returns the names as expected. But then this query:
start a=node(1) MATCH a<-[:CITES]-b<-[:OWNS]-c-[:OWNS]->d return c.name, count(d)
It seems as though it should get to c, which is the list of owners, then go through the OWNS relationship to the owned books as d, and count them. But instead I get:
+-------------------+
| c.name | count(d) |
+-------------------+
| "O3"   | 1        |
+-------------------+
It feels like it's leaving out the books/nodes that have already been found through the other OWNS link -- the ones represented by b. Is there any way to do this in a single query, or is it best to gather up the owners as c, then query again for each of them? It feels like this should be possible, but I haven't figured it out yet. Any ideas would be great -- thanks in advance.

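You're right: within a single MATCH, something that has already been matched can't be matched again under a different variable. Here the OWNS relationship that led from c to b is already used, so d can never be that same book. You can break the query up using WITH and then match d in a second MATCH, which will find all of the owned books.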
START a=node(14)
MATCH a<-[:CITES]-b<-[:OWNS]-c
WITH c MATCH c-[:OWNS]->d
RETURN c.name, count(d);
http://console.neo4j.org/?id=x1jst9
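To see why the single-pattern query drops rows, here is a minimal Python sketch of the matching logic (not of Neo4j itself). It assumes the rule described above, that a relationship is not reused within one MATCH, and hard-codes the example graph from the question. The helper names are made up for illustration.

```python
# Toy model of the example graph; each edge is a (start, type, end) tuple.
EDGES = [
    ("B2", "CITES", "B1"),
    ("B3", "CITES", "B1"),
    ("O1", "OWNS", "B1"),
    ("O2", "OWNS", "B2"),
    ("O3", "OWNS", "B3"),
    ("O3", "OWNS", "B4"),
]

def incoming(node, rel_type):
    return [e for e in EDGES if e[1] == rel_type and e[2] == node]

def outgoing(node, rel_type):
    return [e for e in EDGES if e[1] == rel_type and e[0] == node]

def single_match(start):
    """Models a<-[:CITES]-b<-[:OWNS]-c-[:OWNS]->d as ONE pattern."""
    counts = {}
    for cites in incoming(start, "CITES"):
        b = cites[0]
        for owns_b in incoming(b, "OWNS"):
            c = owns_b[0]
            for owns_d in outgoing(c, "OWNS"):
                if owns_d == owns_b:
                    continue  # the same relationship cannot be used twice
                counts[c] = counts.get(c, 0) + 1
    return counts

def two_stage(start):
    """Models the WITH version: find the owners first, then match again."""
    owners = [
        owns[0]
        for cites in incoming(start, "CITES")
        for owns in incoming(cites[0], "OWNS")
    ]
    counts = {}
    for c in owners:
        for _ in outgoing(c, "OWNS"):  # fresh match, nothing is excluded
            counts[c] = counts.get(c, 0) + 1
    return counts

print(single_match("B1"))  # {'O3': 1}
print(two_stage("B1"))     # {'O2': 1, 'O3': 2}
```

If an owner could own more than one citing book, that owner would be carried forward once per book and the counts would be inflated. In Cypher, WITH DISTINCT c would avoid that.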

Related

Aggregate function MAX along with MERGE in Neo4j Cypher

I'm new to Neo4j and I'm looking for help.
I have 2 entities named entity1 and entity2 with a defined relationship (CEO). I was able to load the data and create the relationship using MERGE:
Confidence value | Entity1     | Entity2        | Relationship
0.884799964      | Jamie Dimon | JPMorgan Chase | CEO
0.884799964      | Jamie Dimon | JPMorgan Chase | CEO
0.380894504      | Jamie Dimon | JPMorgan Chase | CEO
My question: the confidence values for Jamie Dimon are 0.88 and 0.38. I want to display a single relationship between Jamie Dimon and JPMorgan Chase that holds the maximum confidence value (0.88).
With this query I get 2 relationships, with confidence values 0.88 and 0.38, instead of 3, but I want a single relationship that holds the maximum confidence.
LOAD CSV WITH HEADERS FROM 'file:/result_is_of_neo4j_final1.csv' AS line
MERGE (e1:Entity1 {name: line.relation_first, e1_confidence: toFloat(line.entities_0_confidence)})
WITH line, e1
MERGE (e2:Entity2 {name : line.relation_second, e2_confidence: toFloat(line.entities_1_confidence)})
WITH e2, e1, line
MERGE (e1)-[r:IS_FROM {relation : line.relation_relation, r_confidence: toFloat(line.relation_confidence)}]->(e2)
RETURN e1,r,e2
How much data are you planning to load this way? If it's a large import, it's likely you'll want to use PERIODIC COMMIT to speed up the import and avoid memory issues. However, doing so will also impact any kind of comparison and conditional logic in your import, as it's not guaranteed that the rows you need to compare are being executed within the same transaction.
I'd recommend importing all of your nodes and relationships without any extra logic, and then running a query after all the data is loaded to remove the unnecessary relationships.
This query should work for you after the graph is loaded. For every :IS_FROM relationship of a given relation, it will only keep the relationship with the highest r_confidence, and delete the others:
MATCH (e1:Entity1)-[r:IS_FROM]->(e2:Entity2)
WITH e1, r.relation as relation, COLLECT(r) as rels, MAX(r.r_confidence) as maxConfidence, e2
WHERE SIZE(rels) > 1
WITH FILTER(r in rels WHERE r.r_confidence <> maxConfidence) as toDelete
FOREACH (rel in toDelete | DELETE rel)
EDIT
If you need to get rid of duplicate relationships too, then an alternate approach that should work better might be to order your relationships of a specific relation between two nodes by confidence, and delete all except the first:
MATCH (e1:Entity1)-[r:IS_FROM]->(e2:Entity2)
WITH e1, r.relation as relation, r, e2
ORDER BY r.r_confidence DESC
// since we just sorted, the collection will be in order
WITH e1, relation, COLLECT(r) as rels, e2
// delete all other relationships except the top
FOREACH (rel in TAIL(rels) | DELETE rel)
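The "sort, then delete everything except the first" idea can be sketched outside the database. This is a minimal Python illustration, assuming each relationship is represented as a plain dict; the ids and the field names are hypothetical and only stand in for the :IS_FROM relationships in the question.

```python
from itertools import groupby

# Hypothetical stand-ins for the :IS_FROM relationships.
rels = [
    {"id": 1, "e1": "Jamie Dimon", "relation": "CEO",
     "e2": "JPMorgan Chase", "confidence": 0.884799964},
    {"id": 2, "e1": "Jamie Dimon", "relation": "CEO",
     "e2": "JPMorgan Chase", "confidence": 0.884799964},
    {"id": 3, "e1": "Jamie Dimon", "relation": "CEO",
     "e2": "JPMorgan Chase", "confidence": 0.380894504},
]

def group_key(rel):
    # Same grouping as the query: start node, relation name, end node.
    return (rel["e1"], rel["relation"], rel["e2"])

def split_keep_delete(rels):
    """Return (keep, delete): the top-confidence rel per group, and the rest."""
    keep, delete = [], []
    # Sort by group, then by confidence descending (ORDER BY ... DESC).
    ordered = sorted(rels, key=lambda r: (group_key(r), -r["confidence"]))
    for _, group in groupby(ordered, key=group_key):
        group = list(group)
        keep.append(group[0])     # head of the sorted collection
        delete.extend(group[1:])  # TAIL(rels)
    return keep, delete

keep, delete = split_keep_delete(rels)
print([r["id"] for r in keep])    # [1]
print([r["id"] for r in delete])  # [2, 3]
```

Because exactly one relationship per group survives, exact duplicates of the top confidence value are removed as well, which the "not equal to the maximum" filter in the first query would leave in place.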

Return Neo4J Combined Relationships When Searching Across Several Relationship Types

I would like to query for various things and return a combined set of relationships. In the example below, I want to return all people named Joe living on Main St. I want to return both the has_address and has_state relationships.
MATCH (p:Person),
(p)-[r:has_address]-(a:Address),
(a)-[r1:has_state]-(s:State)
WHERE p.name =~ ".*Joe.*" AND a.street =~ ".*Main St.*"
RETURN r, r1;
But when I run this query in the Neo4J browser and look under the "Text" view, it seems to put r and r1 as columns in a table (something like this):
│r  │r1 │
╞═══╪═══╡
│{} │{} │
rather than as desired with each relationship on a different row, like:
Joe Smith | has_address | 1 Main Street
1 Main Street | has_state | NY
Joe Richards | has_address | 22 Main Street
I want to download this as a CSV file for filtering elsewhere. How do I re-write the query in Neo4J to get the desired result?
You may want to look at the Cypher cheat sheet, specifically the Relationship Functions.
That said, you have variables on all the nodes you need. You can output all the data you need on each row.
MATCH (p:Person),
(p)-[r:has_address]-(a:Address),
(a)-[r1:has_state]-(s:State)
WHERE p.name =~ ".*Joe.*" AND a.street =~ ".*Main St.*"
RETURN p.name AS name, a.street AS address, s.name AS state
That should be enough.
What you seem to be asking for above is a way to union r and r1, but in such a way that they alternate in-order, one row being r and the next being its corresponding r1. This is a rather atypical kind of query, and as such there isn't a lot of support for easily making this kind of output.
If you don't mind rows being out of order, it's easy to do, but your start and end nodes for each relationship are no longer the same type of thing.
MATCH (p:Person),
(p)-[r:has_address]-(a:Address),
(a)-[r1:has_state]-(s:State)
WHERE p.name =~ ".*Joe.*" AND a.street =~ ".*Main St.*"
WITH COLLECT(r) + COLLECT(r1) as rels
UNWIND rels AS rel
RETURN startNode(rel) AS start, type(rel) AS type, endNode(rel) as end
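As a rough picture of what the COLLECT + UNWIND query produces, here is a small Python sketch. The matched rows are hypothetical (the state for 22 Main Street is an assumption, since the question does not show it), and it assumes the relationships point from person to address and from address to state.

```python
import csv
import io

# Hypothetical result of the MATCH: one (person, address, state) per row.
matches = [
    ("Joe Smith", "1 Main Street", "NY"),
    ("Joe Richards", "22 Main Street", "NY"),
]

def combined_rows(matches):
    """One row per relationship: all has_address rows, then all has_state rows."""
    address_rels = [(p, "has_address", a) for p, a, _ in matches]  # COLLECT(r)
    state_rels = [(a, "has_state", s) for _, a, s in matches]      # COLLECT(r1)
    return address_rels + state_rels                               # UNWIND rels

def to_csv(rows):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["start", "type", "end"])
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(combined_rows(matches)))
```

The rows come out grouped by relationship type rather than interleaved per person, which is the "out of order" trade-off mentioned above.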

Traverse graph within one level only

Within a graph there is a group G1. This group G1 has 3 subgroups: S1, S2 and S3. The relationship is of type IS_SUBGROUP_OF.
G1 itself is again a subgroup of another group, let's call it D1. D1 has a lot of subgroups, of which G1 is only one.
There is a user U1 who is a member of a subgroup of G1, here S1. I want to create a query that gathers all users of subgroup S1, traverses from user U1 to S1 and from there to G1, gets the users of G1, then goes down from G1 to S2 and S3 and grabs all users from S2 and S3 as well. The final result should be all users in the subgroups S1, S2 and S3 of the parent group G1, including the users of G1.
I have tried:
MATCH (d:User) --> (S1:Subgroup)-[:IS_SUBGROUP_OF*0..]->(G1:Group)
WHERE d.name = "U1"
RETURN d
Unfortunately this traverses all groups and gives back all users of any group in the graph. I tried to change the hop level in the relationship (e.g. 1 only) but didn't succeed. Do you have a hint on how to write the query to get only this subset of users?
The names of the groups are just for the example and are not known in the real world. All I know is the username (here: U1), and from there I need to find various groups depending on where the user is situated. So the query cannot use group names, only variables.
* EDITED *
Sorry for the confusion, I wrongly labeled S1 as Subgroup; only the relationship is called 'IS_SUBGROUP_OF', so all group nodes have the label 'Group', and D1 would also have the label 'Group'. I also added the relationship type for users, so the statement now looks like this:
MATCH (d:User) -[:IS_MEMBER_OF]-> (S1:Group)-[:IS_SUBGROUP_OF*0..]->(G1:Group)
WHERE d.name = "U1"
RETURN d
Let's try this, a minor tweak on Dave's answer (which should work fine, as far as I can tell...)
MATCH (:User {name: 'U1'})-[:IS_MEMBER_OF]->(:Group)-[:IS_SUBGROUP_OF]->(superGroup:Group)
WITH superGroup
MATCH (superGroup)<-[:IS_SUBGROUP_OF*0..1]-(:Group)<-[:IS_MEMBER_OF]-(users:User)
RETURN COLLECT(DISTINCT users)
Based upon the starting user, this finds the grandparent group or supergroup (G1 according to your example), then matches on users that are members of G1 or any of its immediate subgroups and returns the distinct collection. It will include the original matched user.
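The "one level up, then zero or one level down" traversal can be sketched in Python as follows. This is only an illustration under simplifying assumptions: every user belongs to exactly one group and every group has at most one parent, so plain dicts are enough. The extra users and the group X1 are made up to show what gets excluded.

```python
# Hypothetical data: user -> group (IS_MEMBER_OF), group -> parent (IS_SUBGROUP_OF).
MEMBER_OF = {"U1": "S1", "U2": "S1", "U3": "S2", "U4": "S3", "U5": "G1", "U6": "X1"}
SUBGROUP_OF = {"S1": "G1", "S2": "G1", "S3": "G1", "G1": "D1", "X1": "D1"}

def peers(user):
    # Up: user -> its group -> that group's parent (the superGroup).
    super_group = SUBGROUP_OF[MEMBER_OF[user]]
    # Down: *0..1 means the superGroup itself plus its immediate subgroups.
    groups = {super_group} | {
        g for g, parent in SUBGROUP_OF.items() if parent == super_group
    }
    # Members of any of those groups, sorted for stable output.
    return sorted(u for u, g in MEMBER_OF.items() if g in groups)

print(peers("U1"))  # ['U1', 'U2', 'U3', 'U4', 'U5']
```

U6 is left out because X1 is a sibling of G1 under D1, which is exactly the over-matching the unbounded `*0..` pattern in the question would cause.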
This answer assumes the user is identified as a member of a group by the relationship IS_MEMBER_OF.
The query first determines the parent group G1 based on the supplied user U1. It then determines all of the users of the child groups of G1 (S1, S2, S3) and returns the collection of distinct users across the child groups.
This is a somewhat generalized approach that could be used to traverse more levels by modifying the number of levels to traverse in each situation.
// follow IS_MEMBER_OF or IS_SUBGROUP_OF relationships up
// the group/user hierarchy to find the parent group two
// levels up
match (u:User {name: 'U1'})-[:IS_MEMBER_OF|IS_SUBGROUP_OF*2]->(g:Group)
// using the parent group
with g
// follow the IS_MEMBER_OF or IS_SUBGROUP_OF relationships back down
// the hierarchy to find all of the peer users or the original user
match (g)<-[:IS_MEMBER_OF|IS_SUBGROUP_OF*2]-(u:User)
return collect(distinct u)
Would this work?
MATCH (d:User)-[*0..1]-(G1:Group)
WHERE d.name= 'U1'
RETURN DISTINCT d

Get all node-relation originating from a node in Neo4J

Let's say we have this Cypher query:
match (n:Person{pid:322})-[k*1..2]->(s) return k
k would be a collection of all relationships originating from a specific node n within at most 2 hops.
How can I get every node-relationship-node triple where the relationship is in k? Something like match (a)-[b]->(c) foreach b in k return a,b,c, but I know this query is invalid since the foreach keyword in Neo4j can't be used this way.
Edit:
I think I should add some illustration to make things clearer. Here I use an example from Neo4J documentation:
When I start from Charlie Sheen and use a 1..2 range, the query must return (assuming the query ends with return a,b,c)
a | b | c
Charlie Sheen | ACTED_IN | Wall Street
Charlie Sheen | FATHER | Martin Sheen
Martin Sheen | ACTED_IN | The American President
Martin Sheen | ACTED_IN | Wall Street
This query should produce the a, b, and c values your edited question asks for (assuming you always want the name property values):
MATCH (n:Person{pid:322})-[k*1..2]->(s)
WITH LAST(k) AS lk
RETURN STARTNODE(lk).name AS a, TYPE(lk) AS b, ENDNODE(lk).name AS c;
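Why taking the last relationship of each path yields the wanted triples can be checked with a small Python sketch. It assumes a toy edge list containing only the four relationships from the table in the question, and a hypothetical `paths` helper that enumerates outgoing paths of 1 to 2 hops.

```python
# Toy graph holding only the relationships listed in the question's table.
EDGES = [
    ("Charlie Sheen", "ACTED_IN", "Wall Street"),
    ("Charlie Sheen", "FATHER", "Martin Sheen"),
    ("Martin Sheen", "ACTED_IN", "The American President"),
    ("Martin Sheen", "ACTED_IN", "Wall Street"),
]

def paths(start, max_hops):
    """Yield every outgoing path of 1..max_hops edges, never reusing an edge."""
    stack = [(start, [])]
    while stack:
        node, path = stack.pop()
        for edge in EDGES:
            if edge[0] == node and edge not in path:
                new_path = path + [edge]
                yield new_path
                if len(new_path) < max_hops:
                    stack.append((edge[2], new_path))

def last_rels(start, max_hops=2):
    # LAST(k) for every path; the set removes repeats, like RETURN DISTINCT.
    return sorted({p[-1] for p in paths(start, max_hops)})

for a, b, c in last_rels("Charlie Sheen"):
    print(a, "|", b, "|", c)
```

Every relationship reachable within the hop limit is the last relationship of at least one path, so nothing is missed. In a graph with parallel relationships the same relationship can end more than one path, in which case the Cypher query would likely need RETURN DISTINCT to avoid repeated rows.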
I'm not sure why that wouldn't just be:
match (n:Person{pid:322})-[k*1..2]->(s) return n,k,s
Unless I'm misunderstanding your question.
A path will give you the sequence of nodes and relationships from n to s via k:
MATCH p=(n:Person{pid:322})-[k*1..2]->(s) return p
You can also extract only the nodes and/or only the relationships from the path using nodes(p) or rels(p)

Neo4J cypher or C# api array prop intersection

I have relationships with a property that is an array. I'm looking to get all relationships based on the array on the first relationship and, from then on, only find connected nodes where there is at least one common element, i.e.:
[a, b, c, d] ->(node)-[a, f, g]-> would link
but
[a,b,c,d]->(node)-[r,f,g]-> would not link
match (c:Company {RegistrationNumber : 'regNumber'}), c<-[r]-(n)
with n, c,r, extract(x in r.AlertIds | x) as alerts
match path= n-[*..7]-p
with c, n, path,extract(alertP in rels(path) | alertP.AlertId) as ap, extract(a in alerts | a) as alert
return distinct n,alert,c,rels(path), ap, nodes(path)
The Cypher above looks a little crazy, but basically I'm trying to find where any element of the array alerts (from the first part) also appears in the array on any relationship found in the second part.
Any help would be greatly appreciated
update 1
http://console.neo4j.org/?id=ee23d3
The link above is hopefully something that can be used to better understand what I'm looking for.
I want to be able to find all links following any path where there are common AlertIds.
So in the linked example I want to see only 1 path returned, with nodes (3,2,1,0) and the AlertIds that were found in the path, i.e. ("e1").
If it would be easier, this could be restricted to paths where the last node is the first company (the one in the initial match).
I hope this helps.
Here is a console that shows a query that may point you in the right direction. It is based on the same data as your console.
Here is the query it uses:
MATCH path=(company:Company {RegNumber : "3254"})-[emp:IsEmployee]->(employee:Person)-[rel*..7]->(company)
WHERE ALL (r IN rel
WHERE ANY (a IN r.AlertIds
WHERE a IN emp.AlertIds))
RETURN company, emp.AlertIds AS alerts, employee, path
This query finds all paths up to length 8 that start and end with the same Company (3254), where the first relationship (emp) is of type IsEmployee, and where all subsequent relationships have at least one AlertIds element in common with the elements in emp.AlertIds.
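The nested ALL/ANY predicate reads more easily as plain code. Below is a minimal Python sketch of the same test, assuming each relationship is reduced to its list of alert ids. The `common_alerts` helper is an addition that is not part of the query above: it shows a stricter variant that reports only ids present on every relationship of the path.

```python
def shares_alert(emp_alert_ids, path_alert_ids):
    """ALL (r IN rel WHERE ANY (a IN r.AlertIds WHERE a IN emp.AlertIds))."""
    wanted = set(emp_alert_ids)
    return all(any(a in wanted for a in ids) for ids in path_alert_ids)

def common_alerts(emp_alert_ids, path_alert_ids):
    """Stricter, hypothetical variant: ids present on every relationship."""
    common = set(emp_alert_ids)
    for ids in path_alert_ids:
        common &= set(ids)
    return common

# Every later relationship shares "e1" with the first one.
print(shares_alert(["e1", "e2"], [["e1"], ["e1", "e3"]]))   # True
print(common_alerts(["e1", "e2"], [["e1"], ["e1", "e3"]]))  # {'e1'}

# Each relationship overlaps with the first, but no single id spans the path.
print(shares_alert(["a", "b"], [["a"], ["b"]]))   # True
print(common_alerts(["a", "b"], [["a"], ["b"]]))  # set()
```

The last two calls show the difference: the predicate used in the query only requires each relationship to overlap with emp.AlertIds, not that one alert id runs through the whole path.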
