Neo4j read all subgraphs without duplications Cypher query

I've been trying to get a subgraph based on a node query.
The query should ignore the relationship directions as long as all of the nodes in the subgraph are connected:
For example:
u1 -FRIEND-> u2 -FRIEND-> u3
u4 -FRIEND-> u5 -FRIEND-> u6
Searching for u1, u2, or u3 should return the set [u1,u2,u3].
I used the following Cypher query:
MATCH (a:User)-[:FRIEND_OF*0..]-(b)
WHERE a.userId = 'some_id'
WITH a, collect(DISTINCT b) AS sets
RETURN DISTINCT sets
The problem is that I'm getting all of the set's permutations like:
DATA: u1 -FRIEND-> u2 -FRIEND-> u3
RETURN: [u1,u2,u3],[u1,u3,u2],[u2,u1,u3]...
How can I deduplicate the sets so that only one permutation is returned?
I would also like to support the case where a user belongs to several subgraphs, so the response should contain several subgraphs.
Thanks

This query should work:
MATCH p=(a:User)-[:FRIEND_OF*0..]-(b)
WHERE a.userId = 'some_id'
WITH DISTINCT a, b
ORDER BY ID(b)
WITH a, COLLECT(b) AS sets
RETURN DISTINCT sets;
It gets distinct a/b pairs, orders the b nodes by native ID, puts the ordered nodes in collections, and finally returns distinct collections.
You may want to create an index for :User(userId) for better performance.
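As a rough illustration of why ordering before collecting removes the permutations, here is a small Python model of the same idea (it is not Cypher and does not talk to Neo4j). The edge list and the function name `connected_ids` are assumptions made up for this sketch; sorting stands in for ORDER BY ID(b).

```python
from collections import deque

# Hypothetical sample data: directed FRIEND_OF edges between node ids.
EDGES = [(1, 2), (2, 3), (4, 5), (5, 6)]


def connected_ids(start, edges):
    """Return the sorted ids reachable from start, ignoring edge direction."""
    neighbours = {}
    for src, dst in edges:
        # Store both directions so the traversal is undirected,
        # like the pattern (a)-[:FRIEND_OF*0..]-(b).
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)

    seen = {start}  # the *0.. lower bound means the start node is included
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # Sorting plays the role of ORDER BY ID(b): one canonical ordering per set.
    return sorted(seen)


if __name__ == "__main__":
    for start in (1, 2, 3):
        print(start, connected_ids(start, EDGES))
```

Whichever member of the component is used as the starting point, the collected list comes out in the same order, so a later DISTINCT has only one representation to keep.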

Related

Neo4j How to drop a duplicated EDGE?

I am learning Cypher / Neo4j, using C#
I created this edge 3 times.
client.Cypher
.Match("(user1:Person)", "(user2:Person)")
.Where((Person user1) => user1.name == "Tony")
.AndWhere((Person user2) => user2.name == "Maria Esther")
//.Create("(user1)-[:PAI]->(user2)")
.Create("(user2)-[:FILHO {DataDeNascimento: '2006'}]->(user1)")
.ExecuteWithoutResults();
How do I drop the 2 other :FILHO relationships (the duplicated edges)?
This query will delete duplicate :FILHO relationships between Person nodes:
MATCH (p1:Person)-[r:FILHO]->(p2:Person)
WITH p1, p2, COLLECT(r) as rels
FOREACH(r IN tail(rels) | DELETE r)
First, it matches all FILHO relationships between Person nodes.
Then it aggregates the relationships for each pair of Person nodes into the rels collection.
Finally, it iterates through the tail of each rels collection (every relationship except the first) and deletes them.
It may be better to avoid creating duplicate edges in the first place: consider using MERGE rather than CREATE.
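The "keep the first, delete the tail" step can be sketched in plain Python as follows. This is only a model of the logic, not Cypher; the relationship ids and the tuple layout are hypothetical values invented for the example.

```python
from collections import defaultdict

# Hypothetical relationship records: (rel_id, start_node, rel_type, end_node).
RELS = [
    (10, "Maria Esther", "FILHO", "Tony"),
    (11, "Maria Esther", "FILHO", "Tony"),
    (12, "Maria Esther", "FILHO", "Tony"),
    (13, "Tony", "PAI", "Maria Esther"),
]


def duplicate_rel_ids(rels, rel_type):
    """Return the ids of all rel_type relationships except the first per (start, end) pair."""
    grouped = defaultdict(list)
    for rel_id, start, kind, end in rels:
        if kind == rel_type:
            # Equivalent of COLLECT(r) grouped by p1, p2.
            grouped[(start, end)].append(rel_id)

    to_delete = []
    for ids in grouped.values():
        # Equivalent of tail(rels): skip the first element, keep the rest.
        to_delete.extend(ids[1:])
    return to_delete


if __name__ == "__main__":
    print(duplicate_rel_ids(RELS, "FILHO"))
```

A pair with a single relationship has an empty tail, so nothing is deleted for it.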

neo4j cypher to filter multi paths based on two relationships

I have the following graph:
I need to get all the AD nodes which are related to a particular User node. If I search by a user B1, I should get all the AD nodes which are connected by HAS relation to B1 node as well as the AD nodes which are connected to its parent by HAS relation. But if any of these AD nodes are connected by an EXCLUDES relation, I should filter that one out.
For example, if I search by B1, I should get AD4,AD2
AD1 has EXCLUDES with D1 and AD3 has excludes with C1, hence filtered out.
I am using the following cypher
MATCH path=(p:AD)-[:HAS|EXCLUDES]-()<-[:CHILD_OF*]-(u:User) USING INDEX u:User(id) WHERE u.id = 'B1'
with p,
collect( filter( r in rels(path)
where type(r) = 'EXCLUDES'
)
) as test
where all( t in test where size(t) = 0 )
return p
The issue is that when I search with C1, it returns AD4,AD3,AD2. How can I eliminate AD3 from the result?
:CHILD_OF* doesn't include your starting node. To include it, set a lower bound of 0:
[:CHILD_OF*0..]
That said, there are probably better ways to form your query. Try this, maybe:
MATCH (u:User)
WHERE u.id = 'B1'
WITH u, [(p:AD)-[:EXCLUDES]-()<-[:CHILD_OF*0..]-(u) | p] as excluded
MATCH (p:AD)-[:HAS]-()<-[:CHILD_OF*0..]-(u)
WHERE not p in excluded
RETURN p
EDIT
The pattern comprehension feature was released with Neo4j 3.1. You won't be able to use that in an older version. Try this instead:
MATCH (u:User)
WHERE u.id = 'B1'
OPTIONAL MATCH (p:AD)-[:EXCLUDES]-()<-[:CHILD_OF*0..]-(u)
WITH u, collect(p) as excluded
MATCH (p:AD)-[:HAS]-()<-[:CHILD_OF*0..]-(u)
WHERE not p in excluded
RETURN p
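To make the effect of the 0 lower bound concrete, here is a small Python model of the "collect the excluded nodes, then filter" approach. The original graph image is not available, so the hierarchy and the HAS / EXCLUDES assignments below are hypothetical data chosen only to illustrate the behaviour; the function names are also made up for this sketch.

```python
# Hypothetical data (not the asker's actual graph).
PARENT = {"B1": "C1", "C1": "D1"}  # child -> parent, i.e. CHILD_OF
HAS = {"C1": {"AD1", "AD4"}, "D1": {"AD2", "AD3"}}
EXCLUDES = {"C1": {"AD3"}, "D1": {"AD1"}}


def chain(user, include_self):
    """Users reached via CHILD_OF: include_self=True models *0.., False models *."""
    nodes = [user] if include_self else []
    while user in PARENT:
        user = PARENT[user]
        nodes.append(user)
    return nodes


def visible_ads(user, include_self=True):
    """AD nodes reachable via HAS, minus those reachable via EXCLUDES."""
    users = chain(user, include_self)
    excluded = set().union(*(EXCLUDES.get(u, set()) for u in users))
    has = set().union(*(HAS.get(u, set()) for u in users))
    return sorted(has - excluded)


if __name__ == "__main__":
    print(visible_ads("C1", include_self=False))
    print(visible_ads("C1", include_self=True))
```

With `include_self=False` the exclusion attached directly to C1 is never seen, which is the same kind of leak the question describes.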

Return multiple sums of relationship weights using cypher

I have a graph with one node type 'nodeName' and one relationship type 'relName'. Each node pair has 0-1 'relName' relationships with each other but each node can be connected to many nodes.
Given an initial list of nodes (I'll refer to this list as the query subset) I want to:
1. Find all the nodes that connect to the query subset.
I'm currently doing this (which may be overly convoluted):
MATCH (a: nodeName)-[r:relName]-()
WHERE (a.name IN ['query list'])
WITH a
MATCH (b: nodeName)-[r2:relName]-()
WHERE NOT (b.name IN ['query list'])
WITH a, b
MATCH (a)--(b)
RETURN DISTINCT b
2. Then, for each connected node (b), return the SUM of the weights that connect it to the query subset.
For example, if node b1 has 4 edges that connect to nodes in the query subset, I would like to RETURN SUM(r2.weight) AS totalWeight for b1. I actually need a list of all the b nodes ordered by totalWeight.
No. 2 is where I'm stuck. I've been reading the docs about FOREACH and reduce() but I'm not sure how to apply them here.
Speed is important, as I have 30,000 nodes and 1.5M edges. If you have any suggestions regarding this, please throw them into the mix.
Many thanks
Matt
Why do you need so many MATCH statements? You can specify the a nodes and the b nodes in a single MATCH statement and select only the pairs that have a relationship between them.
After that, just return the b nodes and the sum of the weights. Because b is returned alongside an aggregation function such as sum(), it automatically acts as the grouping key.
MATCH (a:nodeName)-[r:relName]-(b:nodeName)
WHERE (a.name IN ['query list']) AND NOT((b.name IN ['query list']))
RETURN b.name, sum(r.weight) as weightSum order by weightSum
I think we can simplify that query a bit.
MATCH (a: nodeName)
WHERE (a.name IN ['query list'])
WITH collect(a) as subset
UNWIND subset as a
MATCH (a)-[r:relName]-(b)
WHERE NOT b in subset
RETURN b, sum(r.weight) as totalWeight
ORDER BY totalWeight ASC
Since sum() is an aggregating function, it will make the non-aggregation variables the grouping key (in this case b), so the sum is per b node, then we order them (switch to DESC if needed).
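The implicit grouping can be pictured with a short Python sketch: every edge with exactly one endpoint in the query subset contributes its weight to the total of the outside node. The edge list, node names, and the function name `total_weights` are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical undirected weighted edges: (node, node, weight).
EDGES = [
    ("q1", "b1", 2.0),
    ("q2", "b1", 3.0),
    ("b2", "q1", 1.0),
    ("q1", "q2", 9.0),  # both ends inside the subset: ignored
    ("b1", "b2", 4.0),  # both ends outside the subset: ignored
]


def total_weights(edges, subset):
    """Sum edge weights per outside node, ordered ascending by total."""
    subset = set(subset)
    totals = defaultdict(float)
    for x, y, weight in edges:
        # Keep only edges that cross the subset boundary (a in subset, b not).
        if (x in subset) != (y in subset):
            outside = y if x in subset else x
            totals[outside] += weight  # b is the grouping key for sum()
    return sorted(totals.items(), key=lambda item: item[1])


if __name__ == "__main__":
    print(total_weights(EDGES, ["q1", "q2"]))
```

Reversing the sort corresponds to switching ORDER BY totalWeight to DESC.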

Find nodes (n) with :OWNER relations to nodes (a), (b), and (c) and no other nodes

My Neo4j 3.2 database has nodes (n) which may have :OWNER relations to other nodes. I want to find all the nodes (n) with :OWNER relations specifically to nodes (a), (b), and (c) and specifically not to any other nodes.
I would have thought that this would be fairly easily accomplished with
MATCH (n), (o)
WHERE (
(n)-[:OWNER]->(o) AND o.uuid IN $owner_ids
AND NOT ((n)-[:OWNER]->(o) AND NOT o.uuid IN $owner_ids)
RETURN (n)
But it doesn't work. This query incorrectly returns nodes (n) with :OWNER relations to (a), (b), (c), and (d). I've also tried
MATCH (n), (o)
WHERE (n)-[:OWNER]->(o) AND o.uuid IN $owner_ids
WITH (n),(o)
WHERE NOT ((n)-[:OWNER]->(o) AND NOT o.uuid IN $owner_ids)
RETURN (n)
As well as what feels like a million other permutations to no avail. Any suggestions are greatly appreciated!
UPDATE
The above is a simplified scenario. As requested in a comment, an example closer to reality is:
MATCH (a)<-[:ANSWER]-(:Person {uuid: $person_id}), (o)
WHERE (exists((o)<-[:OWNER]-(:Owner)<-[:OWNER]-(:Form)-[:ANSWER]->(a)) AND o.uuid IN $owner_ids)
AND NOT (exists((o)<-[:OWNER]-(:Owner)<-[:OWNER]-(:Form)-[:ANSWER]->(a)) AND NOT o.uuid IN $owner_ids)
RETURN (a)
ANSWER
The full answer is
MATCH (o)<-[:OWNER]-(:Owner)<-[:OWNER]-(:Form)-[:ANSWER]->(a)<-[:ANSWER]-(:Person {uuid: $person_id})
WHERE o.uuid IN $owner_ids
WITH (a), count(o) as cnt
WHERE cnt = size(()<-[:OWNER]-(:Owner)<-[:OWNER]-(:Form)-[:ANSWER]->(a))
RETURN (a)
Assuming you add labels to your graph (let's use :Node for now, though it's not clear from your description if all nodes should be the same or if some should use different labels), and that you have a unique constraint on :Node(uuid) for quick lookup, this should work:
MATCH (n:Node)-[:OWNER]->(o:Node)
WHERE o.uuid IN $owner_ids
WITH n, count(o) as cnt
WHERE cnt = size((n)-[:OWNER]->())
RETURN n
Your query had a cartesian product between n and o (the cross product of all the nodes of your graph with each other), which won't perform well. You need to specify the relationship in the MATCH, not the WHERE.
As for the rest of the query, we're getting, for each n, the count of o nodes (those with the ids in question), and ensuring that the number of :OWNER relationships for each n is equal to that count. If it's greater, then there are :OWNER relationships to other nodes, so those are filtered out.
The size() function we're using, since we aren't specifying anything for the end node, is efficient at getting relationship counts.
If your requirement was to have all three :OWNER relationships present (not just any subset of them), then I'd use this query:
WITH ['a', 'b', 'c'] AS ids
MATCH (n:Node)-[:OWNER]->(o:Node)
WITH n,
COUNT(CASE WHEN o.uuid IN ids THEN 1 END) AS matches_found,
size(ids) AS matches_desired,
count(o) AS total_relationships
WHERE matches_found = matches_desired
AND matches_found = total_relationships
RETURN n
ORDER BY n.uuid
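Both variants above come down to comparing counts per node. The following Python sketch models that comparison; the ownership data and the function name `only_owned_by` are hypothetical, and `require_all` switches between the "any subset" and the "all of them" readings of the requirement.

```python
# Hypothetical data: node -> set of owner uuids it points to via :OWNER.
OWNERS = {
    "n1": {"a", "b", "c"},
    "n2": {"a", "b", "c", "d"},
    "n3": {"a", "b"},
    "n4": {"d"},
}


def only_owned_by(owners, wanted, require_all=False):
    """Nodes whose :OWNER targets all lie in wanted (optionally covering all of wanted)."""
    wanted = set(wanted)
    result = []
    for node, owned in owners.items():
        matches = len(owned & wanted)  # count(o) for o.uuid IN $owner_ids
        if matches == 0:
            continue  # the MATCH produces no row for this node
        if matches != len(owned):
            continue  # some :OWNER relationship points outside the wanted set
        if require_all and matches != len(wanted):
            continue  # not every wanted owner is present
        result.append(node)
    return sorted(result)


if __name__ == "__main__":
    print(only_owned_by(OWNERS, ["a", "b", "c"]))
    print(only_owned_by(OWNERS, ["a", "b", "c"], require_all=True))
```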

Neo4j duplicate relationship

I have duplicate relationships between nodes, e.g.:
(Author)-[:CONNECTED_TO {weight: 1}]->(Coauthor)
(Author)-[:CONNECTED_TO {weight: 1}]->(Coauthor)
(Author)-[:CONNECTED_TO {weight: 1}]->(Coauthor)
and I want to merge these relations into one relation of the form: A->{weight: 3} B for my whole graph.
I tried something like the following (I'm reading the data from a CSV file):
MATCH (a:Author {authorid: csvLine.author_id}),(b:Coauthor { coauthorid: csvLine.coauthor_id})
CREATE UNIQUE (a)-[r:CONNECTED_TO]-(b)
SET r.weight = coalesce(r.weight, 0) + 1
But when I run this query, it creates duplicate coauthor nodes, although the weight does update. The result looks like this:
(Author)-[r:CONNECTED_TO]->(Coauthor)
(It creates 3 identical coauthor nodes for the author.)
If you need to fix it after the fact, you could aggregate all of the relationships and the weight between each set of applicable nodes. Then update the first relationship with the new aggregated number. Then with the collection of relationships delete the second through the last. Perform the update only where there is more than one relationship. Something like this...
MATCH (a:Author {name: 'A'})-[r:CONNECTED_TO]->(b:Coauthor {name: 'B'})
// aggregate the relationships and limit it to those with more than 1
WITH a, b, collect(r) AS rels, sum(r.weight) AS new_weight
WHERE size(rels) > 1
// update the first relationship with the new total weight
SET (rels[0]).weight = new_weight
// bring the aggregated data forward
WITH a, b, rels, new_weight
// delete the relationships 1..n
UNWIND range(1,size(rels)-1) AS idx
DELETE rels[idx]
If you are doing it for the whole graph and the graph is large, you may want to perform the update in batches using LIMIT or some other control mechanism.
MATCH (a:Author)-[r:CONNECTED_TO]->(b:Coauthor)
WITH a, b, collect(r) AS rels, sum(r.weight) AS new_weight
LIMIT 100
WHERE size(rels) > 1
SET (rels[0]).weight = new_weight
WITH a, b, rels, new_weight
UNWIND range(1,size(rels)-1) AS idx
DELETE rels[idx]
If you want to eliminate the problem when loading...
MATCH (a:Author {authorid: csvLine.author_id}),(b:Coauthor { coauthorid: csvLine.coauthor_id})
MERGE (a)-[r:CONNECTED_TO]->(b)
ON CREATE SET r.weight = 1
ON MATCH SET r.weight = coalesce(r.weight, 0) + 1
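The MERGE with ON CREATE / ON MATCH behaves like an upsert on a counter keyed by the (author, coauthor) pair. A minimal Python sketch of that behaviour, using made-up CSV rows and a hypothetical function name `load_weights`:

```python
# Hypothetical CSV rows: (author_id, coauthor_id).
ROWS = [("a1", "c1"), ("a1", "c1"), ("a1", "c1"), ("a1", "c2")]


def load_weights(rows):
    """Build one weighted relationship per (author, coauthor) pair."""
    weights = {}
    for pair in rows:
        if pair not in weights:
            weights[pair] = 1  # ON CREATE SET r.weight = 1
        else:
            weights[pair] += 1  # ON MATCH SET r.weight = r.weight + 1
    return weights


if __name__ == "__main__":
    print(load_weights(ROWS))
```

Three identical rows end up as a single relationship with weight 3 rather than three parallel relationships.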
Side note: without knowing your data model, I would consider modelling Coauthor as Author, as coauthors are likely authors in their own right. They are probably only considered coauthors in the context of a particular project.

Resources