How to set a count(variable) as a property of a node - neo4j

I'm currently trying to get the count of all movies that each actor has acted in (neo4j example movie database), and then set that as a num_movies_acted attribute for the person node.
So far, I'm able to get a list of all the actors and their respective movie counts (including 0, thanks to the OPTIONAL MATCH).
This is what I have:
MATCH (p:Person)
OPTIONAL MATCH (p)-[:ACTED_IN]->(m:Movie)
RETURN p.name as name, count(m) as num_movies_acted
How would I then set that on the Person node? I know I should use something like:
SET p.num_movies_acted = count(m)
but that fails with:
Invalid use of aggregating function count(...) in this context (line 3, column 26 (offset: 84))
"SET p.num_movies_acted = count(m)"
EDIT: Would this work?
MATCH (p:Person)
OPTIONAL MATCH (p)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) as num_movies_acted
SET p.num_movies_acted = num_movies_acted
RETURN p
since I am "storing" the count(m) into a variable first

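MATCH (p:Person)
OPTIONAL MATCH (p)-[:ACTED_IN]->(m:Movie)
RETURN p.name as name, count(m) as num_movies_acted
This query returns one aggregated row per person, with the count computed in the RETURN clause. An aggregating function such as count() has to be computed in a WITH or RETURN clause first, so it fails when you use it directly in SET to set a property on an individual node.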
EDIT: Would this work?
MATCH (p:Person)
OPTIONAL MATCH (p)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) as num_movies_acted
SET p.num_movies_acted = num_movies_acted
RETURN p
Yes, this works fine: you are counting the Movie nodes for each Person node and then setting the property.
You can also try:
MATCH (p:Person)
OPTIONAL MATCH (p)-[r:ACTED_IN]->(m:Movie)
WITH p, count(r) as num_movies_acted
SET p.num_movies_acted = num_movies_acted
RETURN p
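As a hedged alternative to the WITH-based queries above, the count can also be computed per node without an aggregating function, by taking the size of a pattern comprehension. This is only a sketch against the example movie database; it assumes the same Person, Movie and ACTED_IN names used in the question.

```cypher
// For each person, build a list with one entry per ACTED_IN relationship
// to a Movie, then store the length of that list on the node.
// Persons with no movies get an empty list, so the property is set to 0.
MATCH (p:Person)
SET p.num_movies_acted = size([(p)-[:ACTED_IN]->(m:Movie) | m])
RETURN p.name AS name, p.num_movies_acted AS num_movies_acted
```

Because size() is an ordinary list function rather than an aggregation, it is allowed directly inside SET.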

This is a note for anyone who needs the tree itself along with an aggregated property.
I needed the tree (Person-[ACTED_IN]->Movie) together with p.num_movies_acted,
so I ended up with this Cypher query:
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE m.released > 2000
WITH p, count(r) as num_movies_acted
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE m.released > 2000
SET p.num_movies_acted = num_movies_acted
RETURN p,r,m
I got the same aggregation error, so I worked around it by matching the pattern twice.
I'm not confident in this approach, so please let me know if there is a more efficient one.
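One possible way to avoid matching the pattern twice is to collect the relationships and movies while counting, and then unwind them again after the SET. This is a sketch, not a benchmarked answer; the map keys rel and movie and the variable roles are names made up for this illustration.

```cypher
// Match once, keeping every (relationship, movie) pair per person.
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE m.released > 2000
WITH p, collect({rel: r, movie: m}) AS roles
// The number of collected pairs is the per-person count.
SET p.num_movies_acted = size(roles)
// A WITH is needed between the updating clause and the next reading clause.
WITH p, roles
UNWIND roles AS role
RETURN p, role.rel AS r, role.movie AS m
```

Whether this is actually faster than the two-MATCH version depends on the data; PROFILE on both queries would show the difference.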

Related

Neo4j Cypher : existing match path exists but no match is returned if first node is equal to last node

I have a path like the one in the image shown
When I do this match query, there is no match returned, which is strange:
match (a:A)--(b:B)--(c:C)--(d:D)--(c2:C)--(b2:B)--(a2:A)
where a.id = a2.id and b.id = b2.id
return count(d)
This one is giving no match too:
match (a:A)--(b:B)
with a,b
match (a)--(b)--(c:C)--(d:D)--(c2:C)--(b)--(a)
return count(d)
But this one is giving the paths, if C type nodes have the property B_id which is the ID of their node B type attached to them:
match (c:C)--(d:D)--(c2:C)
where c.B_id = c2.B_id
return count(d)
This seems strange to me.
Any ideas on why those matches are not working?
Which would be the query to retrieve count(d)?
I believe your first query is almost correct. The only issue is that you're introducing second aliases (a2, b2) for what should be the same A and B nodes and then comparing ids; you can reuse the original variables instead.
Try this one:
match (a:A)--(b:B)--(c:C)--(d:D)--(c2:C)--(b)--(a)
return count(d)
Your queries:
match (a:A)--(b:B)--(c:C)--(d:D)--(c2:C)--(b2:B)--(a2:A)
where a.id = a2.id and b.id = b2.id
return count(d)
match (a:A)--(b:B)
with a,b
match (a)--(b)--(c:C)--(d:D)--(c2:C)--(b)--(a)
return count(d)
do not return matches because of the last part of each pattern, (c2:C)--(b2:B)--(a2:A) and (c2:C)--(b)--(a). There is only one B node and one A node in your database, and within a single MATCH pattern Cypher will not traverse the same relationship twice. So at the end of the pattern Neo4j has to find a relationship between b and a other than the one it already used at the beginning. Since there is only one relationship between a and b, no such path exists, and you see no matches. To fix it, you can simply remove a from the end, since it's redundant, like this:
match (a:A)--(b:B)--(c:C)--(d:D)--(c2:C)--(b)
return count(d)
The count returned above might seem too high, because each d is counted once per matching path (that is, once for every combination of c and c2). For the exact count, use this:
match (a:A)--(b:B)--(c:C)--(d:D)--(c2:C)--(b)
return count(DISTINCT d)
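If the full round trip back to a is really wanted, one option is to split the pattern across two MATCH clauses, since the rule against reusing a relationship applies within one MATCH and not across separate ones. This is a sketch using the labels from the question; the extra c2 <> c condition is an assumption added here so that the second clause cannot simply walk back over the same C node.

```cypher
// First clause: walk from a to d.
MATCH (a:A)--(b:B)--(c:C)--(d:D)
// Second clause: walk back from d to a. Relationships used in the first
// clause may be reused here, including the single one between b and a.
MATCH (d)--(c2:C)--(b)--(a)
WHERE c2 <> c
RETURN count(DISTINCT d)
```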
EDIT
The graph you showed above has only one B node linked to A. Your path fails at the end because the (b)--(a) relationship has already been used, and there is no other relationship between b and a. On an empty graph, I created multiple B nodes linked to the A node using these queries:
MERGE (a:A{id: 1})
MERGE (b:B{id: 1})
MERGE (c:C{id: 1})
MERGE (c1:C{id: 2})
MERGE (c2:C{id: 3})
MERGE (d:D{id: 1})
MERGE (a)<-[:R]-(b)
MERGE (c)-[:R]->(b)
MERGE (c1)-[:R]->(b)
MERGE (c2)-[:R]->(b)
MERGE (c)-[:R]->(d)
MERGE (c1)-[:R]->(d)
MERGE (c2)-[:R]->(d)
// This is a separate query.
CREATE (b:B{id: 1})
WITH b
MATCH (c:C{id: 1})
MATCH (c1:C{id: 2})
MATCH (c2:C{id: 3})
MATCH (a:A{id: 1})
MERGE (b)-[:R]->(a)
MERGE (c)-[:R]->(b)
MERGE (c1)-[:R]->(b)
MERGE (c2)-[:R]->(b)
Now, the match is successful and a non-zero count is returned, using the query:
match (a:A)--(b:B)--(c:C)--(d:D)--(c2:C)--(b2:B)--(a2:A)
where a.id = a2.id and b.id = b2.id
return count(d)

How Many Nodes Are Involved in a Match

How can I know how many nodes and edges are involved in a MATCH? Is there another way besides Explain / Profile Match?
If you mean how many nodes are matched in a path, such as a variable-length path, then you can assign a path variable for this:
MATCH p = (k:Person {name:'Keanu Reeves'})-[*..8]-(t:Person {name:'Tom Hanks'})
WITH p LIMIT 1
RETURN p, length(p) as pathLength, length(p) + 1 as numberOfNodesInPath
You can also use nodes(p) and relationships(p) to get the collection of nodes and relationships that make up the path, and you can use size() on those collections to get their size.
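As a small sketch of the nodes(p) and relationships(p) approach just described, reusing the same path query from above:

```cypher
MATCH p = (k:Person {name: 'Keanu Reeves'})-[*..8]-(t:Person {name: 'Tom Hanks'})
WITH p LIMIT 1
// size() on the two lists gives the node and relationship counts of the path.
RETURN size(nodes(p)) AS nodeCount, size(relationships(p)) AS relCount
```

For a single path, relCount equals length(p) and nodeCount equals length(p) + 1.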
Cypher also has the COUNT() function, which counts the number of elements, as in this query:
MATCH (n)
RETURN COUNT(n);
This query will count all nodes in your database.
You can find more information in the cypher manual, under the aggregating functions. Check it out.
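The following Cypher snippet should return the number of distinct nodes, together with the total number of relationships summed over all matched paths, for any given MATCH clause. Just replace <your code here> with your MATCH pattern, making sure it binds the path to the variable p.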
MATCH <your code here>
WITH COLLECT(NODES(p)) AS ns, SUM(SIZE(RELATIONSHIPS(p))) AS relCount
UNWIND ns AS nodeList
UNWIND nodeList AS node
RETURN COUNT(DISTINCT node) AS nodeCount, relCount;

OPTIONAL MATCH returns no path for disconnect nodes

I find it weird that, when using OPTIONAL MATCH, nodes that don't have the expected relationship are not returned as a single-node path.
OPTIONAL MATCH p = (:Person) -[:LIKES]- (:Movie)
UNWIND nodes(p) as n
UNWIND relationships(p) as e
WITH n
WHERE HEAD(LABELS(n)) = "Person"
return COUNT(DISTINCT n)
The number of people returned only includes those who liked a movie. By using OPTIONAL I would have expected all people to be returned.
Is there a workaround for this, or am I doing something wrong in the query?
A better way to go about this would be to match all :Person nodes first, then use the OPTIONAL MATCH to match to movies (or, if you want a collection of the movies they liked, use pattern comprehension).
If you do need to perform an UNWIND on an empty collection without wiping out the row, use a CASE around some condition to use a single-element list rather than the empty list.
MATCH (n:Person) // match all persons
OPTIONAL MATCH p = (n) -[:LIKES]- (m:Movie) // p and m are the optionals
UNWIND CASE WHEN p is null THEN [null] ELSE nodes(p) END as node // already have n, using a different variable
UNWIND CASE WHEN p is null THEN [null] ELSE relationships(p) END as e // forcing a single element list means UNWIND won't wipe out the row
WITH n
WHERE HEAD(LABELS(n)) = "Person" // not really needed at all, and bad practice, you don't know the order of the labels on a node
return COUNT(DISTINCT n) // if this is really all you need, just keep the first match and the return of the query (without distinct), don't need anything else
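The two simpler options mentioned above could look roughly like this. It is a sketch only: the name and title properties are assumptions borrowed from the standard movie example and may differ in your data.

```cypher
// Keep every person, liked a movie or not, and collect the titles of the
// movies they like via a pattern comprehension. People with no LIKES
// relationship get an empty list instead of being dropped.
MATCH (n:Person)
RETURN n.name AS name,
       [(n)-[:LIKES]-(m:Movie) | m.title] AS likedMovies
```

If only the number of people is needed, MATCH (n:Person) RETURN count(n) on its own is enough.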

cypher NOT IN query with Optional Match

NOT RELEVANT - SKIP TO Important Edit.
I have the following query:
MATCH (n)
WHERE (n:person) AND n.id in ['af97ab48544b'] // id is our system identifier
OPTIONAL MATCH (n)-[r:friend|connected|owner]-(m)
WHERE (m:person OR m:dog OR m:cat)
RETURN n,r,m
This query returns all the persons, dogs and cats that have a relationship with a specific person. I would like to invert it, to receive all the nodes and relationships that are NOT included in this query's results.
If it was SQL it would be
select * from graph where id NOT IN (my_query)
I think that the OPTIONAL MATCH is the problematic part. How can I do it?
Any advice?
Thanks.
-- Important Edit --
Hey guys, sorry for changing my question, but my requirements have changed. I need to get the entire graph (all nodes and relationships, connected and disconnected) except for specific nodes identified by id. The following query works, but only for a single id; with more ids it doesn't work.
MATCH (n) WHERE (n:person)
OPTIONAL MATCH (n)-[r:friend|connected|owner]-(m) WHERE (m:person OR m:dog OR m:cat)
WITH n,r,m
MATCH (excludeNode) WHERE excludeNode.id IN ['af97ab48544b']
WITH n,r,m,excludeNode WHERE NOT n.id = excludeNode.id AND (NOT m.id = excludeNode.id OR m is null)
RETURN n,m,r
Alternatively I tried simpler query:
MATCH (n) WHERE (n:person) AND NOT n.id IN ['af97ab48544b'] return n
But this one does not return the relationships (remember, I need disconnected nodes as well).
How can I get the entire graph excluding specific nodes? That includes nodes and relationships, connected nodes and disconnected ones as well.
Try this:
match (n) where not n.id = 'id to remove'
optional match (n)-[r]-(m)
where not n.id in ['id to remove'] and not m.id in ['id to remove']
return n,r,m
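To handle several ids at once with the labels and relationship types from the question, the same idea can be sketched with a single list. The second id below is a made-up placeholder, and excluded is a name introduced only for this example.

```cypher
// 'another-id' is a hypothetical second id to exclude.
WITH ['af97ab48544b', 'another-id'] AS excluded
MATCH (n:person)
WHERE NOT n.id IN excluded
// The WHERE below belongs to the OPTIONAL MATCH, so a person with no
// remaining neighbours is still returned, with r and m set to null.
OPTIONAL MATCH (n)-[r:friend|connected|owner]-(m)
WHERE (m:person OR m:dog OR m:cat) AND NOT m.id IN excluded
RETURN n, r, m
```

Filtering m inside the OPTIONAL MATCH, rather than in a later WITH ... WHERE, is what keeps disconnected persons in the result.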
You've gotta switch the 'perspective' of your query... start by looping over every node, then prune the ones that connect to your person.
MATCH (bad:person) WHERE bad.id IN ['af97ab48544b']
WITH COLLECT(bad) AS bads
MATCH path = (n:person) - [r:friend|connected|owner] -> (m)
WHERE n._id = '' AND (m:person OR m:cat OR m:dog) AND NOT ANY(bad IN bads WHERE bad IN NODES(path))
RETURN path
That said, this is a problem much more suited to SQL than to a graph. Any time you have to loop over every node with a label, you're in relational territory, the graph will be less efficient.

Cypher query to find nodes that are not related to other node by property

Consider the following DB structure:
For your convenience, you can create it using:
create (p1:Person {name: "p1"}),(p2:Person {name: "p2"}),(p3:Person {name: "p3"}),(e1:Expertise {title: "Exp1"}),(e2:Expertise {title: "Exp2"}),(e3:Expertise {title: "Exp3"}),(p1)-[r1:Expert]->(e1),(p1)-[r2:Expert]->(e2),(p2)-[r3:Expert]->(e2),(p3)-[r4:Expert]->(e3),(p2)-[r5:Expert]->(e3)
I want to be able to find all Person nodes that are not related to a specific Expertise node, e.g. "Exp2"
I tried
MATCH (p:Person)--(e:Expertise)
WHERE NOT (e.title = "Exp2")
RETURN p
But it returns all the Person nodes (while I expected it to return only p3).
Logically, this result makes sense because each of these nodes is related to at least one Expertise that is not Exp2.
But what I want is to find all the Person nodes that are not related to Exp2, even if they are related to other nodes as well.
How can this be done?
Edit
It appears that I wasn't clear on the requirements. This is a (very) simplified way of presenting my problem with a much more complicated DB.
Consider the possibility that Expertise has more properties which I would like to use in the same query (not necessarily with negation). For example:
MATCH (p)--(e)
WHERE e.someProp > 5 AND e.anotherProp = "cookie" AND NOT e.title = "Exp2"
UPDATE
You need to restrict it a bit more, by matching the specific Expertise node and keeping only the persons not connected to it:
MATCH (p:Person), (e:Expertise {title: "Exp2"})
WHERE NOT (p)-[]->(e)
RETURN p
I think you will be just fine with the <> operator :
MATCH (p:Person)--(e:Expertise)
WHERE e.title <> "Exp2"
RETURN p
Or you can express it in a pattern :
MATCH (p:Person)
WHERE NOT EXISTS((p)--(:Expertise {title:"Exp2"}))
RETURN p
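For the extended requirement in the question's edit (other conditions on Expertise, plus "not related to Exp2"), the positive conditions and the negated pattern can be combined in one WHERE. This is a sketch: someProp and anotherProp are the placeholder properties from the question and do not exist in the sample data.

```cypher
// Persons that have at least one expertise meeting the positive conditions
// and that are not connected to the "Exp2" expertise at all.
MATCH (p:Person)--(e:Expertise)
WHERE e.someProp > 5
  AND e.anotherProp = "cookie"
  AND NOT (p)--(:Expertise {title: "Exp2"})
RETURN DISTINCT p
```

On the sample graph from the question, the negated pattern on its own (without the someProp and anotherProp conditions) keeps only p3, since p1 and p2 are both linked to Exp2.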
A small change to the query from @ChristopheWillemsen:
MATCH (e:Expertise) WHERE e.someProperty > 5 AND NOT e.title = someValue
WITH collect(e) as es
MATCH (p:Person) WHERE all(e in es WHERE NOT Exists( (p)--(e) ) )
RETURN p
UPDATE:
// Collect the `Expertise` nodes that satisfy the following conditions:
MATCH (e:Expertise) WHERE e.num > 3 AND e.title = 'Exp2'
WITH collect(e) as es
// Select the persons who are not connected to any expertise from the `es` set:
OPTIONAL MATCH (p:Person) WHERE all(e in es WHERE NOT Exists( (p)--(e) ) )
RETURN es, collect(p)
Another query with some optimization:
// Get the set of `Expertise` nodes that satisfy the following conditions:
MATCH (e:Expertise) WHERE e.num > 3 AND e.title = 'Exp2'
// Collect all `Person` nodes connected to a node from that `Expertise` set:
OPTIONAL MATCH (e)--(p:Person)
WITH collect(e) as es, collect(distinct id(p)) as eps
// Get all `Person` nodes not in the `eps` set:
OPTIONAL MATCH (p:Person) WHERE NOT id(p) IN eps
RETURN es, collect(p)
