In this Cypher query, I want to sum all the weights over paths in a graph:
MATCH p=(n:person)-[r*2..3]->(m:person)
WHERE n.name = 'alice' and m.name = 'bob'
WITH REDUCE(weights=0, rel IN r : weights + rel.weight) AS weight_sum, p
return n.name, m.name, weight_sum
LIMIT 10
In this query, I expect to receive a table with 3 columns: n.name, m.name (identical in all the rows), and weight_sum -- according to the weight sum in the specific path.
However, I get this error:
reduce(...) requires '| expression' (an accumulation expression) (line 3,
column 6 (offset: 89))
"WITH REDUCE(weights=0, rel IN r : weights + rel.weight) AS weight_sum, p"
I obviously miss something trivial. But what?
Shouldn't that be
REDUCE(weights=0, rel IN r | weights + rel.weight) AS weight_sum
(with a pipe instead of a colon) as per the documentation in http://neo4j.com/docs/developer-manual/current/cypher/functions/list/ ?
reduce(totalAge = 0, n IN nodes(p)| totalAge + n.age) AS reduction
Hope this helps.
Regards,
Tom
Related
in my neo4j graph DB I have issues and persons as my nodes, and relationships are "links" (issue to issue), and "resource" (issue to person).
I'm interested in finding all paths of issues where the sum of their weights is greater than a threshold y and the overall length of the chain is longer than x.
I'm not sure if the following works, as I think it just gives me issues with 5 links
MATCH (s:Issue)-[rs:links*5..]->(m:Issue)
WITH s, rs, m
unwind rs as r
return s AS source_node,
id(s) AS source_id,
r,
m AS target_node,
id(m) AS target_id
I've tried with count as well but I don't think it is the right way to proceed.
To do this, use REDUCE() to accumulate the weights of relationships. Consider a x = 5 and y = 200:
MATCH (s:Issue)-[rs:links*5..]->(m:Issue) // match depths with 5 (x) or more
WITH REDUCE(weights = 0, rel IN rs | weights + rel.weight) AS total_weight, s, rs, m
WHERE total_weight < 100 // filter by total_weight < y (100)
unwind rs as r
return s AS source_node,
id(s) AS source_id,
r,
m AS target_node,
id(m) AS target_id
I have the following query in neo4j which uses a UNION
MATCH (u:User {userId:'1'})-[dw:DIRECTOR_WEIGHT]->(d:Person)-[:DIRECTED]->(m:Movie)
WITH m, avg(dw.weight) AS mean_dw, 0 AS mean_aw, 0 AS mean_gw
WHERE m.title = 'Bambi'
RETURN m.title, mean_dw, mean_aw, mean_gw, mean_dw + mean_aw + mean_gw AS total
UNION
MATCH (u:User {userId:'1'})-[aw:ACTOR_WEIGHT]->(a:Person)-[:ACTED_IN]->(m:Movie)
WITH m, 0 AS mean_dw, avg(aw.weight) AS mean_aw, 0 AS mean_gw
WHERE m.title = 'Bambi'
RETURN m.title, mean_dw, mean_aw, mean_gw, mean_dw + mean_aw + mean_gw AS total
UNION
MATCH (u:User {userId:'1'})-[gw:GENRE_WEIGHT]->(g:Genre)<-[:GENRE]-(m:Movie)
WITH m, 0 AS mean_dw, 0 AS mean_aw, avg(gw.weight) AS mean_gw
WHERE m.title = 'Bambi'
RETURN m.title, mean_dw, mean_aw, mean_gw, mean_dw + mean_aw + mean_gw AS total
yielding the following result:
╒═════════╤═══════════════╤════════════════╤═════════════════╤═════════════════╕
│"m.title"│"mean_dw" │"mean_aw" │"mean_gw" │"total" │
╞═════════╪═══════════════╪════════════════╪═════════════════╪═════════════════╡
│"Bambi" │7.2916666666667│"0" │"0" │7.2916666666667 │
├─────────┼───────────────┼────────────────┼─────────────────┼─────────────────┤
│"Bambi" │"0" │0.45322110715442│"0" │0.45322110715442 │
├─────────┼───────────────┼────────────────┼─────────────────┼─────────────────┤
│"Bambi" │"0" │"0" │9.289617486338933│9.289617486338933│
└─────────┴───────────────┴────────────────┴─────────────────┴─────────────────┘
My problem is the "total" doesn't do what I intend it to do, since I only want a single total per movie (i.e. the sum of the three non-zero weights: 7.29 + 0.45 + 9.28),
but I cannot find a way to use this returned result further. I.e., I would like to be able to say say something like
RETURN m.title, sum(total)
or
RETURN m.title, mean_dw + mean_aw + mean_gw
after getting the union of mean_dw, mean_aw, and mean_gw respectively
While post-union processing isn't currently supported by Cypher, you can get around this with apoc.cypher.run() in APOC procedures. This will let you perform a union within the run and yield the unioned result, allowing you to finish up whatever remaining processing you want.
Though looking at your queries, you're performing identical operations in each one, the only difference is the relationships followed in the matches. There's also some unnecessary work being done for three separate mean columns, as the only thing you're interested in is getting the average of each specific relationship's weight as the mean, and then summing all the means.
That should allow us to cut out some redundant operations and work with a narrower set of variables.
Something like this:
MATCH (u:User {userId:'1'}), (m:Movie{title:'Bambi'})
CALL apoc.cypher.run("
MATCH (u)-[dw:DIRECTOR_WEIGHT]->()-[:DIRECTED]->(m)
RETURN avg(dw.weight) as mean
UNION ALL
MATCH (u)-[aw:ACTOR_WEIGHT]->()-[:ACTED_IN]->(m)
RETURN avg(aw.weight) AS mean
UNION ALL
MATCH (u)-[gw:GENRE_WEIGHT]->()<-[:GENRE]-(m)
RETURN avg(gw.weight) AS mean
", {u:u, m:m}) YIELD value
RETURN m.title, SUM(value.mean) as total
Now, all that said, you don't necessarily need to use unions at all. You can just use subqueries connected with WITH.
MATCH (u:User {userId:'1'}), (m:Movie{title:'Bambi'})
MATCH (u)-[dw:DIRECTOR_WEIGHT]->()-[:DIRECTED]->(m)
WITH u, m, avg(dw.weight) as total
MATCH (u)-[aw:ACTOR_WEIGHT]->()-[:ACTED_IN]->(m)
WITH u, m, total + avg(aw.weight) AS total
MATCH (u)-[gw:GENRE_WEIGHT]->()<-[:GENRE]-(m)
WITH u, m, total + avg(gw.weight) AS total
RETURN m.title, total
I am writing a Cypher query in Neo4j 2.0.4 that attempts to get the total number of inbound and outbound relationships for a selected node. I can do this easily when I only use this query one-node-at-a-time, like so:
MATCH (g1:someIndex{name:"name1"})
MATCH g1-[r1]-()
RETURN count(r1);
//Returns 305
MATCH (g2:someIndex{name:"name2"})
MATCH g2-[r2]-()
RETURN count(r2);
//Returns 2334
But when I try to run the query with 2 nodes together (i.e. get the total number of relationships for both g1 and g2), I seem to get a bizarre result.
MATCH (g1:someIndex{name:"name1"}), (g2:someIndex{name:"name2"})
MATCH g1-[r1]-(), g2-[r2]-()
RETURN count(r1)+count(r2);
//Returns 1423740
For some reason, the number is much much greater than the total of 305+2334.
It seems like other Neo4j users have run into strange issues when using multiple MATCH clauses, so I read through Michael Hunger's explanation at https://groups.google.com/d/msg/neo4j/7ePLU8y93h8/8jpuopsFEFsJ, which advised Neo4j users to pipe the results of one match using WITH to avoid "identifier uniqueness". However, when I run the following query, it simply times out:
MATCH (g1:gene{name:"SV422_HUMAN"}),(g2:gene{name:"BRCA1_HUMAN"})
MATCH g1-[r1]-()
WITH r1
MATCH g2-[r2]-()
RETURN count(r1)+count(r2);
I suspect this query doesn't return because there's a lot of records returned by r1. In this case, how would I operate my "get-number-of-relationships" query on 2 nodes? Am I just using some incorrect syntax, or is there some fundamental issue with the logic of my "2 node at a time" query?
Your first problem is that you are returning a Cartesian product when you do this:
MATCH (g1:someIndex{name:"name1"}), (g2:someIndex{name:"name2"})
MATCH g1-[r1]-(), g2-[r2]-()
RETURN count(r1)+count(r2);
If there are 305 instances of r1 and 2334 instances of r2, you're returning (305 * 2334) == 711870 rows, and because you are summing this (count(r1)+count(r2)) you're getting a total of 711870 + 711870 == 1423740.
Your second problem is that you are not carrying over g2 in the WITH clause of this query:
MATCH (g1:gene{name:"SV422_HUMAN"}),(g2:gene{name:"BRCA1_HUMAN"})
MATCH g1-[r1]-()
WITH r1
MATCH g2-[r2]-()
RETURN count(r1)+count(r2);
You match on g2 in the first MATCH clause, but then you leave it behind when you only carry over r1 in the WITH clause at line 3. Then, in line 4, when you match on g2-[r2]-() you are matching literally everything in your graph, because g2 has been unbound.
Let me walk through a solution with the movie dataset that ships with the Neo4j browser, as you have not provided sample data. Let's say I want to get the total count of relationships attached to Tom Hanks and Hugo Weaving.
As separate queries:
MATCH (:Person {name:'Tom Hanks'})-[r]-()
RETURN COUNT(r)
=> 13
MATCH (:Person {name:'Hugo Weaving'})-[r]-()
RETURN COUNT(r)
=> 5
If I try to do it your way, I'll get (13 * 5) * 2 == 90, which is incorrect:
MATCH (:Person {name:'Tom Hanks'})-[r1]-(),
(:Person {name:'Hugo Weaving'})-[r2]-()
RETURN COUNT(r1) + COUNT(r2)
=> 90
Again, this is because I've matched on all combinations of r1 and r2, of which there are 65 (13 * 5 == 65) and then summed this to arrive at a total of 90 (65 + 65 == 90).
The solution is to use DISTINCT:
MATCH (:Person {name:'Tom Hanks'})-[r1]-(),
(:Person {name:'Hugo Weaving'})-[r2]-()
RETURN COUNT(DISTINCT r1) + COUNT(DISTINCT r2)
=> 18
Clearly, the DISTINCT modifier only counts the distinct instances of each entity.
You can also accomplish this with WITH if you wanted:
MATCH (:Person {name:'Tom Hanks'})-[r]-()
WITH COUNT(r) AS r1
MATCH (:Person {name:'Hugo Weaving'})-[r]-()
RETURN r1 + COUNT(r)
=> 18
TL;DR - Beware of Cartesian products. DISTINCT is your friend:
MATCH (:someIndex{name:"name1"})-[r1]-(),
(:someIndex{name:"name2"})-[r2]-()
RETURN COUNT(DISTINCT r1) + COUNT(DISTINCT r2);
The explosion of results you're seeing can be easily explained:
MATCH (g1:someIndex{name:"name1"}), (g2:someIndex{name:"name2"})
MATCH g1-[r1]-(), g2-[r2]-()
RETURN count(r1)+count(r2);
//Returns 1423740
In the 2nd line every combination of any relationship from g1 is combined with any relationship of g2, this explains the number since 1423740 = 305 * 2334 * 2. So you're evaluating basically a cross product here.
The right way to calculate the sum of all relationships for name1 and name2 is:
MATCH (g:someIndex)-[r]-()
WHERE g.name in ["name1", "name2"]
RETURN count(r)
I have modeled my neo4j database according to this answer by Nicole White in this link
and I also successfully tested the cypher query
MATCH (a:Stop {name:'A'}), (d:Stop {name:'D'})
MATCH route = allShortestPaths((a)-[:STOPS_AT*]-(d)),
stops = (a)-[:NEXT*]->(d)
RETURN EXTRACT(x IN NODES(route) | CASE WHEN x:Stop THEN 'Stop ' + x.name
WHEN x:Bus THEN 'Bus ' + x.id
ELSE '' END) AS itinerary,
REDUCE(d = 0, x IN RELATIONSHIPS(stops) | d + x.distance) AS distance
against a small test graph with 10 nodes.
But my original graph which contains about 2k nodes and 6k relationships causes trouble with the query. The query simply stops and I get an error:
java.lang.OutOfMemoryError: Java heap space
Can you help me to optimize my query or any other solution?
Thank you
try to introduce a WITH to limit the calculation of :NEXT paths to only those pairs of a, d that are known to be a shortestpath. It's also a good practice to supply an upper limit for variable path length matches - im using 100 here as an example:
MATCH route = allShortestPaths(
(a:Stop {name:'A'})-[:STOPS_AT*100]-(d:Stop {name:'D'})
)
WITH route, a, d
MATCH stops = (a)-[:NEXT*100]->(d)
RETURN EXTRACT(x IN NODES(route) | CASE WHEN x:Stop THEN 'Stop ' + x.name
WHEN x:Bus THEN 'Bus ' + x.id
ELSE '' END) AS itinerary,
REDUCE(d = 0, x IN RELATIONSHIPS(stops) | d + x.distance) AS distance
I'm trying to to get the SUM of the weights on each path my MATCH finds. Query is below:
START n=node(10200)
MATCH p=(n)-[r*1..5]->(m:Facility)
WITH REDUCE(weights=0, rel IN r : weights + rel.weight) AS weight_sum
WHERE ALL(n in nodes(p) WHERE 1=length(filter(m in nodes(p) : m=n)))
RETURN p AS paths, length(p) AS pc,
(weight_sum / (length(p) * (length(p) / 2))) AS sp;
Every time I run it, I'm getting...
Unknown identifier `p`
If I remove my WITH line (and the weight_sum RETURN value), the query knows what 'p' is and executes just fine. Is there a problem with my query that the value of 'p' is being lost? Is there a better alternative to get the SUM of these relationship properties?
You can just pipe "p" to the next part of the query via the WITH:
START n=node(10200)
MATCH p=(n)-[r*1..5]->(m:Facility)
WITH REDUCE(weights=0, rel IN r : weights + rel.weight) AS weight_sum, p
WHERE ALL(n in nodes(p) WHERE 1=length(filter(m in nodes(p) : m=n)))
RETURN p AS paths, length(p) AS pc,
(weight_sum / (length(p) * (length(p) / 2))) AS sp;