How to find same nodes in neo4j? - neo4j

I have a question about Neo4j. I am loading data, and with a simple Cypher query I want to extract the nodes that share the same end node.
For example, I have a collection of account IDs:
match (n) where n.ID IN ['4260890','04379258','04643207','2250893','228910','2290','225067003','2002755','225832','2138572','4174122','01884','06563','13397','5216','7789','236740']
WITH collect (n) as IDs
UNWIND IDs AS n
Now I want to see which employee IDs belong to the same department. I tried the following:
match (n)-[r:Has]-(v) with n, collect (v) as V
Unwind V as Vn
match (m)-[r:Has]-(Vm) with m, collect (Vm) as Va
unwind Va as Vm
Where Vm in Vn
AND NOT 'm'='n'
return distinct n.id as EmpId
collect (distinct Vm) as department
But I think this query is not correct: I get an error on the second WHERE clause.
Can someone help me find the departments shared by the employees in one collection?
Thanks in advance.
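A few things in the attempt above would trip the parser or the logic regardless of the data: WHERE can only follow MATCH, OPTIONAL MATCH or WITH (not UNWIND), `'m'='n'` compares two string literals rather than the nodes (that would be `m <> n`), and the RETURN items need a comma between them. One way to read the question is "group the listed employees by the department they share". A minimal sketch under that reading, assuming the listed nodes carry an `ID` property and reach their department through a :Has relationship (both taken from the snippets above, the rest is illustrative):

```cypher
// Assumption: the listed nodes are employees with an `ID` property, and each
// is connected to its department node(s) through a :Has relationship.
MATCH (n)-[:Has]-(dept)
WHERE n.ID IN ['4260890', '04379258', '04643207', '2250893']  // shortened list for illustration
// Group by department: collect() aggregates per distinct `dept`.
WITH dept, collect(DISTINCT n.ID) AS empIds
// Keep only departments shared by two or more of the listed employees.
WHERE size(empIds) > 1
RETURN dept AS department, empIds
```

Grouping by the shared end node avoids the second MATCH and the collect/UNWIND round trip altogether.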

Related

Separating matching nodes in a query result

I defined the directed relation Know on person nodes. For example, if Sara knows Alice then Sara-> Alice. I wrote this Cypher query to find all the people who know both the right and left side of the directed relation.
match ((n:Person)-[:Know]-> (m:Person)),(p:Person)
where EXISTS ((m)<-[:Know]-(p)-[:Know]->(n))
RETURN m,n,p
I need to get subgraphs with 3 nodes in the query's result but the result I get is a graph with many nodes. Is there any method to change the query to generate subgraphs with just 3 nodes (for example, a subgraph of Alex-> Sara, Alex-> Alice, Sara-> Alice and if Sara has the same condition on two other people it is shown in another subgraph). This requires repeating some nodes in the output.
MATCH clauses are more flexible than that. Try this:
MATCH (n:Person)-[:Know]->(m:Person)<-[:Know]-(p:Person)-[:Know]->(n)
WHERE NOT EXISTS (()-[:Know]->(p))
AND NOT EXISTS {
WITH m, n, p
MATCH (q:Person)-[:Know]->(m)
WHERE q <> n
AND q <> p
}
AND NOT EXISTS {
WITH m, n, p
MATCH (q:Person)-[:Know]->(n)
WHERE q <> p
}
RETURN m, n, p
You might have to use a unique ID property, and I'm not sure the WITH clause will work here as I've written it, but with subqueries you can generally import variables from the outer query using WITH.
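On the WITH question: as far as I know, EXISTS { } subqueries can read variables from the enclosing query directly, so the importing WITH should not be needed there (it is CALL { } subqueries that traditionally required it). A sketch of the same filter without the WITH lines, assuming a Neo4j version that supports EXISTS { } subqueries (4.0 or later):

```cypher
MATCH (n:Person)-[:Know]->(m:Person)<-[:Know]-(p:Person)-[:Know]->(n)
// p is known by nobody
WHERE NOT EXISTS { MATCH ()-[:Know]->(p) }
  // m is known only by n and p
  AND NOT EXISTS {
    MATCH (q:Person)-[:Know]->(m)
    WHERE q <> n AND q <> p
  }
  // n is known only by p
  AND NOT EXISTS {
    MATCH (q:Person)-[:Know]->(n)
    WHERE q <> p
  }
RETURN m, n, p
```

The variables m, n and p inside the braces refer to the ones bound by the outer MATCH.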

Neo4j Return nodes such that every link follows an specific pattern

I was practicing with the Neo4j Movie Database and wrote the following query:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(a)
RETURN a
This query returns 3 rows. But if I go to the graph view in the web editor and expand the "Tom Hanks" node, I see one movie that Tom Hanks both directed and acted in, while the rest of his connected nodes have only the ACTED_IN relationship. What I want is to filter Tom Hanks out of the result in this case, since he has at least one movie connected by only one relationship (either ACTED_IN or DIRECTED).
PS: My expected result would be only the row for the "Clint Eastwood" node.
So you only want results where the person acted in and directed the same movies, but never simply acted in, without directing, or directed, without acting.
You could use this approach:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(a)
WITH a, count(m) as actedDirectedCount
WHERE size((a)-[:ACTED_IN]->()) = actedDirectedCount AND size((a)-[:DIRECTED]->()) = actedDirectedCount
RETURN a
Though you can simplify this a bit by combining the relationship types in the pattern used in your WHERE clause like so:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(a)
WITH a, count(m) as actedDirectedCount
WHERE size((a)-[:ACTED_IN|DIRECTED]->()) = actedDirectedCount * 2
RETURN a
If the actedDirectedCount = 3 movies, then there must be at a minimum 3 :ACTED_IN relationships and 3 :DIRECTED relationships, so a minimum of 6 relationships using either relationship. If there are any more than this, then there are additional movies that they either acted in or directed, so we'd filter that out.
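Note that both queries pass a pattern directly to size(). To my knowledge, Neo4j 5 no longer accepts a pattern expression there and offers COUNT { } subqueries instead. A hedged equivalent of the combined-type check, assuming a Neo4j 5 server:

```cypher
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(a)
WITH a, count(m) AS actedDirectedCount
// Every movie counted above contributes exactly one ACTED_IN and one DIRECTED
// relationship, so any surplus means a movie with only one of the two.
WHERE COUNT { (a)-[:ACTED_IN|DIRECTED]->() } = actedDirectedCount * 2
RETURN a
```

On older versions the size(...) form shown above remains the one to use.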
Three options come to mind:
1.
MATCH (m:Movie)<-[:DIRECTED]-(a:Person)
with a, collect(distinct m) as directedMovies
match (a)-[:ACTED_IN]->(m:Movie)
with a, directedMovies, collect(distinct m) as actedMovies
with a where all(x in directedMovies where x in actedMovies) and all(x in actedMovies where x in directedMovies)
return a
2.
MATCH (m:Movie)<-[:DIRECTED]-(a:Person)
with * order by id(m)
with a, collect(distinct m) as directedMovies
match (a)-[:ACTED_IN]->(m:Movie)
with a, directedMovies, m order by id (m)
with a, directedMovies, collect(distinct m) as actedMovies
with a where actedMovies=directedMovies
return a
3.
MATCH (m:Movie)<-[:DIRECTED]-(a:Person)
with a, collect(distinct m) as directedMovies
with * where all(x in directedMovies where (a)-[:ACTED_IN]->(x))
MATCH (m:Movie)<-[:ACTED_IN]-(a)
with a, collect(distinct m) as actedMovies
with * where all(x in actedMovies where (a)-[:DIRECTED]->(x))
return a
The first two are equally expensive and the last one is a bit more expensive.

Nodes with relationship to multiple nodes

I want to get the Persons that know everyone in a group of persons which know some specific places.
This:
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
WITH collect(DISTINCT b) as persons
Match (a:Person)
WHERE ALL(b in persons WHERE (a)-[:knows]->(b))
RETURN a
works, but the second part does a full NodeByLabelScan before applying the WHERE clause, which is extremely slow: in a bigger DB it takes 8-9 seconds. I also tried this:
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
Match (a:Person)-[:knows]->(b)
RETURN a
This only needs 2ms, however it returns all persons that know any person of group b, instead of those that know everyone.
So my question is: is there an efficient, fast query to get what I want?
We have a knowledge base article for this kind of query that shows a few approaches.
One of these is to match to the :Person nodes that know members of the group, and then count the number of times each of those persons shows up in the results. Provided there aren't multiple :knows relationships between the same two people, if the count is equal to the size of the collection of people from your first match, then that person must know all of the people in the collection.
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
WITH collect(b) as persons
UNWIND persons as b // so we have the entire list of persons along with each person
WITH size(persons) as total, b
MATCH (a:Person)-[:knows]->(b)
WITH total, a, count(a) as knownCount
WHERE total = knownCount
RETURN a
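If the "no multiple :knows relationships between the same two people" condition cannot be guaranteed, the same idea can be made robust by counting distinct group members instead of rows. A sketch of that variation (not taken from the knowledge base article):

```cypher
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
// DISTINCT guards against the same person appearing twice in the group
WITH collect(DISTINCT b) AS persons
UNWIND persons AS b
WITH size(persons) AS total, b
MATCH (a:Person)-[:knows]->(b)
// count each group member once per a, even with parallel :knows relationships
WITH total, a, count(DISTINCT b) AS knownCount
WHERE total = knownCount
RETURN a
```

The extra DISTINCT costs a little, so it is only worth adding when duplicates are actually possible.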
Here is a simpler Cypher query that also compares counts -- the same basic idea used by @InverseFalcon.
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'}), (a:Person)-[:knows]->(b)
WITH COLLECT({a:a, b:b}) as data, COUNT(DISTINCT b) AS total
UNWIND data AS d
WITH total, d.a AS a, COUNT(d.b) AS bCount
WHERE total = bCount
RETURN a

Return multiple sums of relationship weights using cypher

I have a graph with one node type 'nodeName' and one relationship type 'relName'. Each node pair has 0-1 'relName' relationships with each other but each node can be connected to many nodes.
Given an initial list of nodes (I'll refer to this list as the query subset) I want to:
1. Find all the nodes that connect to the query subset.
I'm currently doing this (which may be overly convoluted):
MATCH (a: nodeName)-[r:relName]-()
WHERE (a.name IN ['query list'])
WITH a
MATCH (b: nodeName)-[r2:relName]-()
WHERE NOT (b.name IN ['query list'])
WITH a, b
MATCH (a)--(b)
RETURN DISTINCT b
2. Then, for each connected node (b), return the SUM of the weights that connect it to the query subset.
For example, if node b1 has 4 edges that connect to nodes in the query subset, I would like to RETURN SUM(r2.weight) AS totalWeight for b1. I actually need a list of all the b nodes ordered by totalWeight.
No. 2 is where I'm stuck. I've been reading the docs about FOREACH and reduce() but I'm not sure how to apply them here.
Speed is important, as I have 30,000 nodes and 1.5M edges. If you have any suggestions regarding this, please throw them into the mix.
Many thanks
Matt
Why do you need so many MATCH statements? You can specify the a nodes and the b nodes in a single MATCH statement and select only those that have a relationship between them.
After that, just return the b nodes and the sum of the weights. When b is returned alongside an aggregation function such as sum(), it automatically acts as the group-by key.
MATCH (a:nodeName)-[r:relName]-(b:nodeName)
WHERE (a.name IN ['query list']) AND NOT((b.name IN ['query list']))
RETURN b.name, sum(r.weight) as weightSum order by weightSum
I think we can simplify that query a bit.
MATCH (a: nodeName)
WHERE (a.name IN ['query list'])
WITH collect(a) as subset
UNWIND subset as a
MATCH (a)-[r:relName]-(b)
WHERE NOT b in subset
RETURN b, sum(r.weight) as totalWeight
ORDER BY totalWeight ASC
Since sum() is an aggregating function, it will make the non-aggregation variables the grouping key (in this case b), so the sum is per b node, then we order them (switch to DESC if needed).
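To make the grouping behaviour concrete, here is a sketch of the same query returning two aggregates per b node. `$names` is a hypothetical parameter holding the names in the query subset, and `edgeCount` is an illustrative extra column, neither is from the original question:

```cypher
// $names: hypothetical list parameter with the names of the query subset
MATCH (a:nodeName)
WHERE a.name IN $names
WITH collect(a) AS subset
UNWIND subset AS a
MATCH (a)-[r:relName]-(b:nodeName)
WHERE NOT b IN subset
// b.name is the only non-aggregated item, so both aggregates are computed per b
RETURN b.name AS name, sum(r.weight) AS totalWeight, count(r) AS edgeCount
ORDER BY totalWeight DESC
```

For the b1 example in the question, the b1 row would show an edgeCount of 4 next to the summed weight of those four edges.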

Find nodes (n) with :OWNER relations to nodes (a), (b), and (c) and no other nodes

My Neo4j 3.2 database has nodes (n) which may have :OWNER relations to other nodes. I want to find all the nodes (n) with :OWNER relations specifically to nodes (a), (b), and (c) and specifically not to any other nodes.
I would have thought that this would be fairly easily accomplished with
MATCH (n), (o)
WHERE (
(n)-[:OWNER]->(o) AND o.uuid IN $owner_ids
AND NOT ((n)-[:OWNER]->(o) AND NOT o.uuid IN $owner_ids)
RETURN (n)
But it doesn't work. This query incorrectly returns nodes (n) with :OWNER relations to (a), (b), (c), and (d). I've also tried
MATCH (n), (o)
WHERE (n)-[:OWNER]->(o) AND o.uuid IN $owner_ids
WITH (n),(o)
WHERE NOT ((n)-[:OWNER]->(o) AND NOT o.uuid IN $owner_ids)
RETURN (n)
As well as what feels like a million other permutations to no avail. Any suggestions are greatly appreciated!
UPDATE
The above is a simplified scenario. As requested in a comment, an example closer to reality is:
MATCH (a)<-[:ANSWER]-(:Person {uuid: $person_id}), (o)
WHERE (exists((o)<-[:OWNER]-(:Owner)<-[:OWNER]-(:Form)-[:ANSWER]->(a)) AND o.uuid IN $owner_ids)
AND NOT (exists((o)<-[:OWNER]-(:Owner)<-[:OWNER]-(:Form)-[:ANSWER]->(a)) AND NOT o.uuid IN $owner_ids)
RETURN (a)
ANSWER
The full answer is
MATCH (o)<-[:OWNER]-(:Owner)<-[:OWNER]-(:Form)-[:ANSWER]->(a)<-[:ANSWER]-(:Person {uuid: $person_id})
WHERE o.uuid IN $owner_ids
WITH (a), count(o) as cnt
WHERE cnt = size(()<-[:OWNER]-(:Owner)<-[:OWNER]-(:Form)-[:ANSWER]->(a))
RETURN (a)
Assuming you add labels to your graph (let's use :Node for now, though it's not clear from your description if all nodes should be the same or if some should use different labels), and that you have a unique constraint on :Node(uuid) for quick lookup, this should work:
MATCH (n:Node)-[:OWNER]->(o:Node)
WHERE o.uuid IN $owner_ids
WITH n, count(o) as cnt
WHERE cnt = size((n)-[:OWNER]->())
RETURN n
Your query had a cartesian product between n and o (the cross product of all the nodes of your graph with each other), which won't perform well. You need to specify the relationship in the MATCH, not the WHERE.
As for the rest of the query, we're getting, for each n, the count of o nodes (those with the ids in question), and ensuring that the number of :OWNER relationships for each n is equal to that count. If it's greater, then there are :OWNER relationships to other nodes, so those are filtered out.
The size() function we're using, since we aren't specifying anything for the end node, is efficient at getting relationship counts.
If your requirement was to have all three :OWNER relationships present (not just any subset of them), then I'd use this query:
WITH ['a', 'b', 'c'] AS ids
MATCH (n:Node)-[:OWNER]->(o:Node)
WITH n,
COUNT(CASE WHEN o.uuid IN ids THEN 1 END) AS matches_found,
size(ids) AS matches_desired,
count(o) AS total_relationships
WHERE matches_found = matches_desired
AND matches_found = total_relationships
RETURN n
ORDER BY n.uuid
