Neo4j out of memory Error - neo4j

I modeled my Neo4j database following an answer by Nicole White,
and I also successfully tested the Cypher query
MATCH (a:Stop {name:'A'}), (d:Stop {name:'D'})
MATCH route = allShortestPaths((a)-[:STOPS_AT*]-(d)),
stops = (a)-[:NEXT*]->(d)
RETURN EXTRACT(x IN NODES(route) | CASE WHEN x:Stop THEN 'Stop ' + x.name
WHEN x:Bus THEN 'Bus ' + x.id
ELSE '' END) AS itinerary,
REDUCE(d = 0, x IN RELATIONSHIPS(stops) | d + x.distance) AS distance
against a small test graph with 10 nodes.
But my real graph, which contains about 2k nodes and 6k relationships, causes trouble with the query. The query simply stops and I get an error:
java.lang.OutOfMemoryError: Java heap space
Can you help me optimize my query, or suggest another solution?
Thank you

Try introducing a WITH so that the :NEXT paths are calculated only for those pairs of a and d that are known to be joined by a shortest path. It's also good practice to supply an upper limit for variable-length path matches; I'm using 100 here as an example (note that *..100 means "up to 100 hops", whereas *100 would mean "exactly 100 hops"):
MATCH route = allShortestPaths(
(a:Stop {name:'A'})-[:STOPS_AT*..100]-(d:Stop {name:'D'})
)
WITH route, a, d
MATCH stops = (a)-[:NEXT*..100]->(d)
RETURN EXTRACT(x IN NODES(route) | CASE WHEN x:Stop THEN 'Stop ' + x.name
WHEN x:Bus THEN 'Bus ' + x.id
ELSE '' END) AS itinerary,
REDUCE(d = 0, x IN RELATIONSHIPS(stops) | d + x.distance) AS distance
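To see why the upper bound matters, here is a minimal Python sketch (not Neo4j code) that counts paths between two nodes with a hop limit. The `graph` dictionary and the `count_paths` function are hypothetical, and the sketch forbids revisiting nodes, which is a simplification of Cypher's rule that a relationship may not be traversed twice in one path.

```python
def count_paths(graph, start, end, max_hops):
    """Count paths from start to end that use at most max_hops edges
    and never visit the same node twice."""
    def walk(node, hops_left, visited):
        if node == end:
            return 1
        if hops_left == 0:
            return 0
        total = 0
        for nxt in graph.get(node, []):
            if nxt not in visited:
                total += walk(nxt, hops_left - 1, visited | {nxt})
        return total

    return walk(start, max_hops, {start})


# Hypothetical toy graph: adjacency lists of directed edges.
graph = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": [],
}

print(count_paths(graph, "A", "D", 2))  # only the two 2-hop paths
print(count_paths(graph, "A", "D", 3))  # also finds A -> B -> C -> D
```

The number of candidate paths grows with the hop limit, so on a densely connected graph an unbounded match can enumerate far more paths than fit in the heap.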

Related

Pattern Matching in Neo4j

Assume that in an application, the user gives us a graph and we want to treat it as a pattern and find all occurrences of that pattern in the Neo4j database. If we knew the pattern in advance, we could write it as a Cypher query and run it against our database. However, we do not know the pattern beforehand; we receive it from the user in the form of a graph. How can we perform pattern matching on the database based on the given graph (pattern)? Is there an APOC procedure for that, or an external library?
One way of doing this is to decompose your input graph into edges and build a dynamic Cypher query from them. I worked on this quite some time ago, and the solution below is not perfect, but it indicates a possible direction.
For example, if you feed it a graph
and take the id(node) of each node in the graph (I am not taking the relationship ids, which is one of the imperfections),
this query
WITH $nodeids AS selection
UNWIND selection AS s
WITH COLLECT (DISTINCT s) AS selection
WITH selection,
SPLIT(left('a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z',SIZE(selection)*2-1),",") AS nodeletters
WITH selection,
nodeletters,
REDUCE (acc="", nl in nodeletters |
CASE acc
WHEN "" THEN acc+nl
ELSE acc+','+nl
END) AS rtnnodes
MATCH (n) WHERE id(n) IN selection
WITH COLLECT(n) AS nodes,selection,nodeletters,rtnnodes
UNWIND nodes AS n
UNWIND nodes AS m
MATCH (n)-[r]->(m)
WITH DISTINCT "("
+nodeletters[REDUCE(x=[-1,0], i IN selection | CASE WHEN i = id(n) THEN [x[1], x[1]+1] ELSE [x[0], x[1]+1] END)[0]]
+TRIM(REDUCE(acc = '', p IN labels(n)| acc + ':'+ p))+")-[:"+type(r)+"]->("
+ nodeletters[REDUCE(x=[-1,0], i IN selection | CASE WHEN i = id(m) THEN [x[1], x[1]+1] ELSE [x[0], x[1]+1] END)[0]]
+TRIM(REDUCE(acc = '', p IN labels(m)| acc + ':'+ p))+")" as z,rtnnodes
WITH COLLECT(z) AS parts,rtnnodes
WITH REDUCE(y=[], x in range(0, size(parts)-1) | y + replace(parts[x],"[","[r" + (x+1))) AS parts2,
REDUCE (acc="", x in range(0, size(parts)-1) | CASE acc WHEN "" THEN acc+"r"+(x+1) ELSE acc+",r"+(x+1) END) AS rtnrels,
rtnnodes
RETURN
REDUCE (acc="MATCH ",p in parts2 |
CASE acc
WHEN "MATCH " THEN acc+p
ELSE acc+','+p
END)+
" RETURN "+
rtnnodes+","+rtnrels+
" LIMIT "+$limit
AS cypher
returns something like
cypher: "MATCH (a:Person)-[r1:DRIVES]->(b:Car),(a:Person)-[r2:KNOWS]->(c:Person) RETURN a,b,c,r1,r2 LIMIT 50"
which you can feed to the next query.
In Graphileon, you can just select the nodes, and the result will be visualized as well.
Disclosure : I work for Graphileon
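The same decomposition idea can also be done on the client side. The following is a minimal Python sketch, not part of the answer above: `build_match_query`, its `nodes` and `edges` arguments, and the sample data are all hypothetical, and it assumes the labels and relationship types come from a trusted source, since they are interpolated into the query text without escaping. Like the Cypher version, it runs out of variable names after 26 nodes.

```python
import string


def build_match_query(nodes, edges, limit):
    """Build a Cypher MATCH statement from a small pattern graph.

    nodes: dict mapping node id -> list of labels
    edges: list of (source id, relationship type, target id) tuples
    """
    # Assign variable names a, b, c, ... to the node ids in insertion order.
    letters = {node_id: string.ascii_lowercase[i]
               for i, node_id in enumerate(nodes)}

    def node_pattern(node_id):
        labels = "".join(":" + label for label in nodes[node_id])
        return f"({letters[node_id]}{labels})"

    parts = []
    rel_names = []
    for i, (src, rel_type, dst) in enumerate(edges, start=1):
        rel_names.append(f"r{i}")
        parts.append(
            f"{node_pattern(src)}-[r{i}:{rel_type}]->{node_pattern(dst)}"
        )

    returned = ",".join(list(letters.values()) + rel_names)
    return f"MATCH {','.join(parts)} RETURN {returned} LIMIT {limit}"


# Hypothetical pattern: one person drives a car and knows another person.
nodes = {1: ["Person"], 2: ["Car"], 3: ["Person"]}
edges = [(1, "DRIVES", 2), (1, "KNOWS", 3)]
print(build_match_query(nodes, edges, 50))
```

For this sample input the function prints the same statement as the example output shown above.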
I have used patterns in genealogy queries.
The X-chromosome is not transmitted from father to son. As you traverse a family tree you can use the reduce function to create a concatenated string of the sex of the ancestor. You can then accept results that lack MM (father-son). This query gives all the descendants inheriting the ancestor's (RN=32) X-chromosome.
match p=(n:Person{RN:32})<-[:father|mother*..99]-(m)
with m, reduce(status ='', q IN nodes(p)| status + q.sex) AS c
where c=replace(c,'MM','')
return distinct m.fullname as Fullname
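The filter `c=replace(c,'MM','')` is true only when the concatenated string contains no 'MM', because otherwise the replacement would change it. A minimal Python sketch of the same test follows; the function name `can_carry_x` and the sample strings are made up for illustration.

```python
def can_carry_x(sexes):
    """sexes is the concatenated sex of each person along the path,
    for example "MFM". A father-son step shows up as "MM", and the
    X chromosome cannot pass through it."""
    return "MM" not in sexes


# The Cypher test c = replace(c, 'MM', '') expresses the same condition.
for sexes in ["MFM", "FFM", "MMF", "FMM"]:
    assert can_carry_x(sexes) == (sexes == sexes.replace("MM", ""))
    print(sexes, can_carry_x(sexes))
```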
I am developing other pattern specific queries as part of a Neo4j PlugIn for genealogy. These will include patterns of triangulation groups.
GitHub repository for Neo4j Genealogy PlugIn

Can't make reduce work in cypher

In this Cypher query, I want to sum all the weights over paths in a graph:
MATCH p=(n:person)-[r*2..3]->(m:person)
WHERE n.name = 'alice' and m.name = 'bob'
WITH REDUCE(weights=0, rel IN r : weights + rel.weight) AS weight_sum, p
return n.name, m.name, weight_sum
LIMIT 10
In this query, I expect to receive a table with 3 columns: n.name, m.name (identical in all the rows), and weight_sum -- according to the weight sum in the specific path.
However, I get this error:
reduce(...) requires '| expression' (an accumulation expression) (line 3,
column 6 (offset: 89))
"WITH REDUCE(weights=0, rel IN r : weights + rel.weight) AS weight_sum, p"
I'm obviously missing something trivial. But what?
Shouldn't that be
REDUCE(weights=0, rel IN r | weights + rel.weight) AS weight_sum
(with a pipe instead of a colon) as per the documentation in http://neo4j.com/docs/developer-manual/current/cypher/functions/list/ ?
reduce(totalAge = 0, n IN nodes(p)| totalAge + n.age) AS reduction
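If it helps to see the shape of the expression, Cypher's reduce is a fold: an accumulator with an initial value, a list to iterate over, and, after the pipe, the expression that produces the next accumulator value. Here is a rough Python equivalent using `functools.reduce`; the `weight_sum` function and the sample relationships are hypothetical.

```python
from functools import reduce


def weight_sum(relationships):
    """Sum the 'weight' of every relationship, starting from 0, like
    REDUCE(weights = 0, rel IN r | weights + rel.weight)."""
    return reduce(lambda weights, rel: weights + rel["weight"],
                  relationships, 0)


# Hypothetical path with three relationships.
path_relationships = [{"weight": 2}, {"weight": 5}, {"weight": 1}]
print(weight_sum(path_relationships))
```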
Hope this helps.
Regards,
Tom

Optimize cypher query to avoid cartesian product

The purpose of the query is pretty trivial. For a given node ID (userId), I want to return all nodes in the graph that have a relationship to it within X hops, and I want to aggregate and return the distance (a property set on each relationship) between them.
I came up with this:
"MATCH p=shortestPath((user:FOLLOWERS{userId:{1}})-[r:follow]-(f:FOLLOWERS)) " +
"WHERE f <> user " +
"RETURN (f.userId) as userId," +
"reduce(s = '', rel IN r | s + rel.dist + ',') as dist," +
"length(r) as hop"
userId ({1}) is given as input and is indexed.
I believe I am getting a cartesian product here. How would you suggest avoiding it?
You can make the cartesian product less onerous by creating an index on :FOLLOWERS(userId) to speed up one of the two "legs" of the cartesian product:
CREATE INDEX ON :FOLLOWERS(userId);
Even though this will not get rid of the cartesian product, it will run in O(N log N) time, which is much faster than O(N ^ 2).
By the way, your r relationship needs to be variable-length in order for your query to work. You should specify a reasonable upper bound (which depends on your DB) to assure that the query will finish in a reasonable time and not run out of memory. For example:
MATCH p=shortestPath((user:FOLLOWERS { userId: 1 })-[r:follow*..5]-(f:FOLLOWERS))
WHERE f <> user
RETURN (f.userId) AS userId,
REDUCE (s = '', rel IN r | s + rel.dist + ',') AS dist,
LENGTH(p) AS hop;

How to perform distinct while having multiple paths using Cypher

For a given node (sourceNode), I want to retrieve all nodes that have a relationship to my sourceNode within 3 hops.
The problem starts when there are multiple paths between the source and destination nodes.
I don't care which path I get as long as I get one, and I don't want the other ones (it would be great to get only the shortest path).
So this is my code:
MATCH (user:C9 {userId:'70'})-[r:follow*1..3]-f WHERE f <> user
RETURN DISTINCT (f.userId) as userId,
reduce(s = '', rel IN r | s + rel.dist + ',') as dist,
length(r) as hop
The response contains the same userId more than once, so the DISTINCT has no effect.
I would like to avoid the duplicated rows with the same userId.
Any idea how to perform the distinct here?
Thanks,
ray.
How about something like this? Rather than looking for distinct users, just use shortestPath to get to each follower 1..3 hops out from the starting user.
MATCH p=shortestPath((user:C9 {userId:'70'})-[r:follow*1..3]-(f))
WHERE f <> user
RETURN f.userId,
reduce(s = '', rel IN r | s + rel.dist + ',') as dist,
length(p) as hop
Alternatively, if you want the shortest total distance regardless of the number of hops, you could do something like the following. Instead of using shortestPath, sum the distances of the relationships on each path, order the rows by that sum, collect the sums per user, and return the first element of each collection, which will be the shortest.
MATCH p=(user:C9 {userId:'70'})-[r:follow*1..3]-(f)
WHERE f <> user
with f.userId as user_id
, reduce(s = 0, rel IN relationships(p) | s + rel.dist) as dist
, length(p) as hops
order by dist
with user_id, collect(dist) as dists_per_follow, collect(hops) as hops_per_follow
return user_id
, dists_per_follow[0] as shortest
, dists_per_follow, hops_per_follow
order by user_id
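The grouping step in that last query can be sketched in Python as follows. This is only an illustration of the idea, with a hypothetical `shortest_per_follower` function and made-up sample paths; it keeps the path with the smallest summed dist per user rather than collecting every distance.

```python
def shortest_per_follower(paths):
    """paths: iterable of (user_id, list of dist values, one per hop).

    Returns {user_id: (smallest total distance, hops on that path)},
    ordered by user_id."""
    best = {}
    for user_id, dists in paths:
        total = sum(dists)
        hops = len(dists)
        if user_id not in best or total < best[user_id][0]:
            best[user_id] = (total, hops)
    return dict(sorted(best.items()))


# Hypothetical paths: two routes to user '71' and two to user '72'.
paths = [
    ("71", [4, 4]),
    ("71", [3]),
    ("72", [1, 1, 1]),
    ("72", [5]),
]
print(shortest_per_follower(paths))
```

Note that for user '72' the path with the smallest total distance is also the one with the most hops, which is exactly the case where this approach and shortestPath give different answers.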

Remove all labels for Neo4j Node

The following examples are taken from the Neo4j documentation found here.
Using Cypher, it is possible to remove a single, known label using a Cypher statement like so:
MATCH (n { name: 'Peter' })
REMOVE n:German
RETURN n
You can also remove multiple labels like so:
MATCH (n { name: 'Peter' })
REMOVE n:German:Swedish
RETURN n
So how would one remove all labels from a node using simple Cypher statements?
You can also try it this way, using the apoc.cypher.doIt procedure from the APOC library:
match (n {name: 'Peter'})
call apoc.cypher.doIt(
"match (o)" +
" where ID(o) = " + ID(n) +
" remove "+reduce(a="o",b in labels(n) | a+":"+b) +
" return (o);",
null)
yield value
return value
There's no syntax for that yet! Labels are usually things that are known quantities, so you can list them all out if you want. There's no dynamic way to remove them all, though.
So, how about a two-step Cypher approach? Use Cypher to generate some Cypher statements, and then execute those statements in the shell.
You could try something like this to generate the batch of Cypher statements:
match (n)
return distinct "match (n"
+ reduce( lbl_str= "", l in labels(n) | lbl_str + ":" + l)
+ ") remove n"
+ reduce( lbl_str= "", l in labels(n) | lbl_str + ":" + l)
+ ";"
The output should look something like this...
match (n:Label_1:Label_2) remove n:Label_1:Label_2;
match (n:Label_1:Label_3) remove n:Label_1:Label_3;
match (n:Label_2:Label_4) remove n:Label_2:Label_4;
You would probably want to remove any duplicates, and depending on your data there could be quite a few.
It's not exactly what you are looking for, but I think it would get you to the same end state using just Cypher and the Neo4j shell.
Shiny NEW and improved Cypher below...
I edited this down to something that works in the browser alone. I think this is a much better solution. It is still two steps, but it produces a single statement that can be copied and pasted into the browser.
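If you would rather build the statements in a script, the generation step could look roughly like this Python sketch. The `removal_statements` function and the sample label lists are hypothetical; in practice the label lists would come from a query such as `match (n) return distinct labels(n)`. The sketch drops duplicates and skips nodes that have no labels, since there is nothing to remove for them.

```python
def removal_statements(label_sets):
    """label_sets: iterable of label lists, one list per node.

    Returns one 'match ... remove ...;' statement per distinct label
    combination, in first-seen order."""
    seen = set()
    statements = []
    for labels in label_sets:
        if not labels:
            continue  # an unlabeled node has nothing to remove
        suffix = "".join(":" + label for label in labels)
        if suffix in seen:
            continue  # this label combination was already handled
        seen.add(suffix)
        statements.append(f"match (n{suffix}) remove n{suffix};")
    return statements


# Hypothetical label lists, including a duplicate and an unlabeled node.
label_sets = [
    ["Label_1", "Label_2"],
    ["Label_1", "Label_3"],
    ["Label_1", "Label_2"],
    [],
]
for statement in removal_statements(label_sets):
    print(statement)
```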
match (n)
with distinct labels(n) as Labels
with reduce(lbl_str="", l in Labels | lbl_str + ":" + l) as Labels
order by Labels
with collect(Labels) as Labels
with Labels, range(0,size(Labels) - 1) as idx
unwind idx as i
return "match (n" + toString(i) + Labels[i] + ")" as Statement
union
match (n)
with distinct labels(n) as Labels
with reduce(lbl_str="", l in Labels | lbl_str + ":" + l) as Labels
order by Labels
with collect(Labels) as Labels
with Labels, range(0,size(Labels) - 1) as idx
unwind idx as i
return "remove n" + toString(i) + Labels[i] as Statement
which produces output like this...
match (n0:Label_A)
match (n1:Label_B)
match (n2:Label_C:Label_D)
match (n3:Label_E)
remove n0:Label_A
remove n1:Label_B
remove n2:Label_C:Label_D
remove n3:Label_E
which can then be cut and paste into the Neo4j browser.

Resources