Unknown Error with Cypher Request for sum() over Subtrees - neo4j

I'm trying to run the following Cypher query:
MATCH (n:FOLDER)-[r*]->(m:FILE) WITH n, sum(m.size) AS calc SET n.calculatedSize = calc
After about one minute, the Neo4j Browser reports "Unknown error".
The query should sum the size of the whole subtree, so that every folder holds the total size of all its sub-items (FOLDERs and FILEs). In production there will be about 9 million items, with a maximum depth of 15.
Why does the query return "Unknown error", and is there a better way to calculate the sizes?

fadanner,
You might find it faster to first do a one-level calculation that sums the file sizes into their immediate parent folders, and then work your way up the tree.
MATCH (n:FOLDER)-[:CONTAINS]->(m:FILE)
WITH n, sum(m.size) as calc
SET n.calculatedSize = calc
Set a temporary property on all FOLDER nodes to indicate whether they have been visited yet.
MATCH (m:FOLDER) set m.seen = 0
Mark the leaf folders as seen.
MATCH (m:FOLDER)
WHERE NOT (m)-[:CONTAINS]->(:FOLDER)
SET m.seen = 1
Apply this query repeatedly until it returns zero; at that point all the sizes have been calculated.
MATCH (m:FOLDER {seen : 0})-[:CONTAINS]->(n:FOLDER)
WITH m, sum(n.seen) AS val1, count(n) AS val2, sum(n.calculatedSize) AS val3
WHERE val1 = val2
SET m.calculatedSize = coalesce(m.calculatedSize, 0) + val3, m.seen = 1
RETURN count(m)
Once you are done, remove the 'seen' properties with
MATCH (m:FOLDER)
REMOVE m.seen
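To make the iterative procedure easier to follow, here is a minimal in-memory simulation of it in Python (not Neo4j code). The data layout (a list of folder ids, a dict of file sizes, and (parent, child) CONTAINS pairs) and the name calculate_sizes are hypothetical; the sketch assumes every edge is a CONTAINS relationship and treats a folder with no files of its own as starting at size 0.

```python
from collections import defaultdict


def calculate_sizes(folders, files, contains):
    """Simulate the bottom-up size calculation in memory.

    folders:  list of folder ids
    files:    dict mapping file id -> size
    contains: list of (parent, child) CONTAINS edges
    Returns (sizes, runs), where runs counts executions of the iterative
    query, including the final one that returns 0.
    """
    child_folders = defaultdict(list)
    direct = defaultdict(int)
    for parent, child in contains:
        if child in files:
            direct[parent] += files[child]      # step 1: one-level file sums
        else:
            child_folders[parent].append(child)

    size = {f: direct[f] for f in folders}
    # Steps 2-3: folders without sub-folders are already complete ("seen").
    seen = {f: not child_folders[f] for f in folders}

    runs = 0
    while True:
        # Step 4: an unseen folder is ready once all its sub-folders are seen.
        ready = [m for m in folders
                 if not seen[m] and all(seen[c] for c in child_folders[m])]
        runs += 1
        if not ready:                           # RETURN count(m) would be 0
            return size, runs
        for m in ready:
            size[m] += sum(size[c] for c in child_folders[m])
        for m in ready:
            seen[m] = True
```

Each pass finishes one more level of the tree, so the number of runs grows with the tree depth (at most 15 here) rather than with the number of items.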
Hope this helps.
Grace and peace,
Jim

Try specifying an upper bound on the variable-length path:
match (n:FOLDER)-[r*..15]->(m:FILE)
with n,sum(m.size) as calc
SET n.calculatedSize=calc

Related

Get the shortest path from certain node to ANY node of a given label in Neo4J

I have a node, let's call it X. I want to find the closest node to X, let's name it Y, and Y has to have a certain label. If there are multiple such Y nodes at the same distance from X, I'd like all of them to be returned.
Suppose we have nodes A and B with that label. The minimum path length from X to A is 3 and from X to B is 5. I want the query to return A and only A. If the minimum path lengths are equal, I'd like it to return both A and B.
Here's what I have so far:
MATCH p=shortestPath((selectedNode {name:'X'})-[*]-(y:GivenLabel))
WITH y.name as y, length(p)=min(length(p)) AS l
RETURN y
The problem with this query is that it returns both A and B in the example above, regardless of the minimum path length to each of them. I thought about ordering the results and using LIMIT 1, but then only one of them would be displayed, even when the minimum path lengths are equal.
Thanks in advance!
The collect function will let you return a single row containing a list of values.
MATCH p=shortestPath((selectedNode {name:'X'})-[*]-(y:GivenLabel))
RETURN length(p) AS l, collect(y.name) as targets
ORDER BY l
LIMIT 1
If you want to return the values as individual records instead of a list, you can use UNWIND.
MATCH p=shortestPath((selectedNode {name:'X'})-[*]-(y:GivenLabel))
WITH length(p) AS l, collect(y.name) as targets
ORDER BY l
LIMIT 1
UNWIND targets as target
RETURN l, target
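The collect-then-LIMIT 1 pattern works because the non-aggregated column l becomes the grouping key, so every target at the same distance lands in the same row. A small Python sketch of those semantics (not Neo4j code; the (length, name) row format and the name closest_targets are hypothetical):

```python
from collections import defaultdict


def closest_targets(rows):
    """Mimic: RETURN length(p) AS l, collect(y.name) AS targets ORDER BY l LIMIT 1.

    rows: (path_length, target_name) pairs, one per shortestPath() result.
    """
    groups = defaultdict(list)
    for length, name in rows:
        groups[length].append(name)   # grouping key l, collect(y.name)
    if not groups:
        return None                   # no rows at all, so LIMIT 1 yields nothing
    best = min(groups)                # ORDER BY l LIMIT 1
    return best, groups[best]
```

With A at distance 3 and B at 5 only A comes back; if both were at distance 3, both names would share the single returned row.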
If you have APOC Procedures, you can use one of the path expander procs to find a shortest path to a given label and limit it. Unfortunately you'll need a second call to get multiple nodes of the label at the same distance away.
// assume we've already MATCHed to selectedNode
...
CALL apoc.path.expandConfig(selectedNode, {labelFilter:'/GivenLabel', limit:1}) YIELD path
WITH selectedNode, length(path) as pathLength
CALL apoc.path.subgraphNodes(selectedNode, {labelFilter:'/GivenLabel', maxLevel:pathLength}) YIELD node
RETURN node
In the label filter, the / prefix marks the label as a termination filter: when the expansion reaches a node with that label, it stops expanding past it and uses the node as a result.
The path expander procs use breadth-first expansion by default, so it will be the shortest path from your starting node to a node of the given label.
limit:1 ensures that we return after finding the first result (this can be very expensive if there is no such node of the given label, or if it's far away, so you might consider providing a maxLevel as an upper bound).
We then make a similar path expander call (subgraphNodes(), since we no longer need the path, just the node at the end), using the length of the path found previously as our maxLevel; this returns all nodes with the given label at that distance.
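Conceptually, the two calls amount to a breadth-first search that stops at the first level containing a node with the label and returns every such node on that level. Here is a minimal in-memory sketch of that idea in Python; it is not the APOC implementation, and the adjacency/label dict layout and the name nearest_with_label are assumptions. The optional max_level plays the role of the upper bound suggested above.

```python
def nearest_with_label(adj, labels, start, label, max_level=None):
    """Return (distance, sorted nodes) for the nearest nodes carrying `label`.

    adj:    dict node -> neighbours; list both directions to mirror -[*]-
    labels: dict node -> set of labels
    Returns None if no labelled node is reachable (within max_level).
    """
    visited = {start}
    frontier = [start]
    level = 0
    while frontier and (max_level is None or level < max_level):
        level += 1
        next_frontier = []
        for node in frontier:
            for nb in adj.get(node, ()):
                if nb not in visited:          # breadth-first, each node once
                    visited.add(nb)
                    next_frontier.append(nb)
        hits = [n for n in next_frontier if label in labels.get(n, set())]
        if hits:                               # first level with a match:
            return level, sorted(hits)         # return all matches on it
        frontier = next_frontier
    return None
```

Stopping at the first matching level is what keeps a farther node like B out of the result, while a tie on that level returns every tied node.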

Neo4j shortestPath with highest aggregated relationship property

I want to find the shortest path between two nodes. Finding the path itself is not the problem; the bigger problem is that I want to return the path whose aggregated relationship property is the highest.
To make it clearer, here's what I want:
This is my query:
MATCH
(startNode:Person {id:"887111"}),
(endNode:Person {id:"789321"}),
paths = allShortestPaths((startNode)-[r:KNOWS *..20]-(endNode))
RETURN paths
In this example, I want the path from Elissa (id: 887111) to Kasey (id: 789321) where the aggregated count on the relationships is the maximum.
I've also looked at shortestPath, which only gives me one path. The other option is to call the Dijkstra algorithm, but that only gives me the path with the lowest cost (not the highest).
So in my example, the only path that should show up is Elissa->Travon->Kasey.
I don't think the problem is that complex, but at the moment I'm stuck.
Thanks in advance.
UPDATE
After calling the suggested query:
MATCH (startNode:Person {id:"789321"}), (endNode:Person {id:"887111"})
CALL apoc.algo.dijkstra(startNode, endNode, 'KNOWS', '_duration') YIELD path, weight
RETURN path, -weight AS weight
My result is the following:
[UPDATED]
I present 2 answers, depending on what you are trying to do.
1. Finding path with max total weight
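To find the path with max total weight, you can feed the Dijkstra algorithm the negation of the original weight properties. The resulting "lowest" total weight will be a negative value that, when negated, is the highest total weight (based on the original weight properties). Be aware, though, that Dijkstra's algorithm assumes non-negative weights, so with negated weights the result is not guaranteed to be the true maximum; the second approach below does not have this limitation.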
There is an APOC procedure, apoc.algo.dijkstra that implements the Dijkstra algorithm, but it does not allow you to use the negative value of the specified weight property. So, to use that procedure, you would need to add a new property to each KNOWS relationship with the appropriate negative value. For example, to add the negative weights to existing relationships (assuming w is the original weight property, and _w will contain the corresponding negative value):
MATCH ()-[k:KNOWS]->()
SET k._w = -k.w;
Once you have the negative weights, the following should give you the path with the max weight:
MATCH (startNode:Person {id:"887111"}), (endNode:Person {id:"789321"})
CALL apoc.algo.dijkstra(startNode, endNode, 'KNOWS', '_w') YIELD path, weight
RETURN path, -weight AS weight;
2. Choosing from the shortest paths the one with maximum total weight
MATCH
(startNode:Person {id:"887111"}),
(endNode:Person {id:"789321"}),
path = allShortestPaths((startNode)-[:KNOWS *..20]-(endNode))
RETURN path, REDUCE(s = 0, r IN RELATIONSHIPS(path) | s + r.duration) AS weight
ORDER BY weight DESC
LIMIT 1;
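The second query enumerates every shortest path, sums duration along each one with REDUCE, and keeps the heaviest. The following Python sketch reproduces that selection rule on an in-memory graph so it is easy to check; the adjacency and weight dict layout and the function names are hypothetical, and the durations in the test data are made up.

```python
def all_shortest_paths(adj, start, end):
    """All shortest paths from start to end (adj must be symmetric, like -[...]-)."""
    dist = {start: 0}
    frontier = [start]
    while frontier and end not in dist:        # BFS, one full level at a time
        nxt = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    if end not in dist:
        return []
    paths = []

    def back(node, suffix):
        # Walk from end towards start, only along edges that go down one level.
        if node == start:
            paths.append([start] + suffix)
            return
        for prev in adj.get(node, ()):
            if dist.get(prev) == dist[node] - 1:
                back(prev, [node] + suffix)

    back(end, [])
    return paths


def heaviest_shortest_path(adj, weight, start, end):
    """weight: dict frozenset({u, v}) -> duration of that relationship."""
    best = None
    for path in all_shortest_paths(adj, start, end):
        # REDUCE(s = 0, r IN RELATIONSHIPS(path) | s + r.duration)
        total = sum(weight[frozenset(e)] for e in zip(path, path[1:]))
        if best is None or total > best[1]:    # ORDER BY weight DESC LIMIT 1
            best = (path, total)
    return best
```

A longer path is never considered, however heavy it is, because it is not among the shortest paths.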

How to update nodes in a random manner in Neo4j

How can I update a random set of nodes in Neo4j? I tried the following:
match (Firstgraph)
with id(Firstgraph) as Id
return Firstgraph.name, Firstgraph.version,id(Firstgraph)
order by rand();
match (G1:FirstGraph)
where id(G1)=Id
set G1.Version=5
My idea is to get a random set and then update it, but I got the error:
Expected exactly one statement per query but got: 2
Thanks for your help.
Let's find out what the problem is. First of all, your error:
Expected exactly one statement per query but got: 2
This comes from your query: if we look at it, we see that you wrote two statements in the same query, which is why you get this error.
match (Firstgraph) with id(Firstgraph) as Id
return Firstgraph.name, Firstgraph.version,id(Firstgraph) order by
rand(); match (G1:FirstGraph) where id(G1)=Id set G1.Version=5
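This is not a valid query, because ; marks the end of a statement, so you can't put a second statement after it in the same query. UNION doesn't help here either: each part of a UNION is an independent query that must return the same columns, so Id would not be visible in the second part. Instead, chain the steps with WITH so the randomly ordered nodes carry over into the update (the LIMIT of 10 is only an illustrative sample size):
MATCH (g:FirstGraph)
WITH g ORDER BY rand() LIMIT 10
SET g.version = 5
RETURN g.name, g.version, id(g)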
Also, if you want to match a random set of nodes, you can simply do this (in this example, each node has a 50% chance of being selected):
MATCH (node) WHERE rand() > 0.5 RETURN node
Then replace RETURN with WITH and do whatever you want with the selected nodes.
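The two random-selection styles behave differently: WHERE rand() > 0.5 keeps each node independently, so the number of nodes varies, while ORDER BY rand() LIMIT k always returns exactly k (if that many exist). A rough Python analogy of both, followed by the update step; the node dicts, the version key, and the function names are hypothetical stand-ins, not Neo4j APIs.

```python
import random


def bernoulli_sample(nodes, threshold=0.5, rng=random):
    # Like MATCH (node) WHERE rand() > 0.5: each node is kept independently,
    # so the size of the result varies from run to run.
    return [n for n in nodes if rng.random() > threshold]


def fixed_size_sample(nodes, k, rng=random):
    # Like WITH g ORDER BY rand() LIMIT k: one random sort key per node,
    # then keep the first k, so exactly min(k, len(nodes)) are returned.
    return sorted(nodes, key=lambda _node: rng.random())[:k]


def set_version(nodes, k, version, rng=random):
    # Like ... SET g.version = 5 applied to the sampled nodes only.
    chosen = fixed_size_sample(nodes, k, rng)
    for node in chosen:
        node["version"] = version
    return chosen
```

Passing a seeded random.Random makes a run reproducible, which can help when testing the update logic.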

Neo4J/Cypher: is it possible to filter the length of a path in the WHERE clause?

Let's suppose we have this simple pattern:
p=(a)-[r]-(b)
where nodes a and b have their own properties already set in the WHERE clause (e.g. a:Movie AND a.title = "The Matrix" AND b:Movie).
I'd like to add another condition in the WHERE clause like
LENGTH(p) = 2 OR LENGTH(p) > 6
(not the correct syntax, I know)
As far as I know, it is possible to specify the length of a path in the MATCH clause with the syntax [r*min..max], but that doesn't cover the case I'm looking for.
Any help would be appreciated =)
Yes, that does work in Neo4j, exactly as you've specified.
Sample data:
create (a:Foo)-[:stuff]->(b:Foo)-[:stuff]->(c:Foo);
Then this query:
MATCH p=(x:Foo)-[:stuff*]->(y:Foo)
WHERE length(p)>1 RETURN p;
This returns only the 2-step path a->b->c, and not either of the one-step paths (a->b or b->c).
Does this work for you?
MATCH p=(a)-[r*2..]-(b)
WHERE LENGTH(p) = 2 OR LENGTH(p) > 6
RETURN p
Note that with a large DB this query can take a very long time, or not finish, since the MATCH clause does not set an upper bound on the path length.
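To see why the WHERE filter works and why an upper bound matters, here is a rough Python sketch that enumerates directed paths the way a variable-length pattern does (no relationship used twice in one path), up to a caller-chosen max_len, and then applies a length predicate. The function name and the edge-list format are hypothetical; max_len stands in for the upper bound missing from [r*2..].

```python
def paths_with_length(edges, keep, max_len):
    """Enumerate directed paths of length 1..max_len (no relationship reused,
    as in Cypher) and return those whose length satisfies keep(length)."""
    out = {}
    for rel_id, (u, v) in enumerate(edges):
        out.setdefault(u, []).append((rel_id, v))
    found = []

    def extend(path, used):
        if keep(len(path) - 1):                 # WHERE on the path length
            found.append(tuple(path))
        if len(path) - 1 == max_len:            # the upper bound stops the search
            return
        for rel_id, nxt in out.get(path[-1], ()):
            if rel_id not in used:
                extend(path + [nxt], used | {rel_id})

    for rel_id, (u, v) in enumerate(edges):
        extend([u, v], {rel_id})
    return found
```

The filter only discards paths after they have been built, so every path up to the bound is still explored; on a large graph it is the bound, not the WHERE clause, that limits the work.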

Neo4j: Finding simple path between two nodes takes a lot of time

Neo4j: finding a simple path between two nodes takes a lot of time, even with an upper limit (*1..4). I don't want to use allShortestPaths or shortestPath because they don't return all the paths.
Match p=((n {Name:"Node1"}) -[*1..4]-> (m {Name:"Node2"})) return p;
Any suggestions to make it faster?
If you have a lot of nodes, try creating an index so that the neo4j DB engine does not have to search through every node to find the ones with the right Name property values.
I am presuming that, in your example, the n and m nodes are really the same "type" of node. If this is true, then:
Add a label (I'll call it 'X') to every node (of the same type as n and m). You can use the following to add the 'X' label to node(s) represented by the variable n. You'd want to precede it with the appropriate MATCH clause:
SET n:X
Create an index on the Name property of nodes with the X label like this:
CREATE INDEX ON :X(Name);
Modify your query to:
MATCH p=((n:X {Name:"Node1"}) -[*1..4]-> (m:X {Name:"Node2"}))
RETURN p;
If you do the above, then your query should be faster.
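As a loose analogy for why the label plus index helps: without them, finding the start and end nodes means checking every node's Name, while an index maps each Name straight to the matching X nodes. The Python below only illustrates that difference; the node dict layout and the function names are hypothetical and say nothing about how Neo4j stores its indexes.

```python
from collections import defaultdict


def build_index(nodes, label, key):
    """Map property value -> ids of nodes that carry `label`
    (loosely analogous to CREATE INDEX ON :X(Name))."""
    index = defaultdict(list)
    for node_id, (labels, props) in nodes.items():
        if label in labels and key in props:
            index[props[key]].append(node_id)
    return index


def scan(nodes, key, value):
    """What an unlabelled, unindexed lookup has to do: touch every node."""
    return [nid for nid, (_labels, props) in nodes.items() if props.get(key) == value]
```

The index is built once and then answers each lookup directly, and restricting it to label X also keeps unrelated nodes that happen to share a Name out of the result.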
