How to generate k-ary trees in Neo4j quickly?

I'm generating perfect k-ary trees in Neo4j, but my queries for doing so don't seem very efficient, and I was wondering whether I could improve them. My Go code below shows the three queries I'm running to generate the trees; k is the number of children per node and h is the tree height:
func createPerfectKaryTreeInNeo(k, h int, execNeo func(string) error) error {
	lastNode := ((iPow(k, (h + 1)) - 1) / (k - 1)) - 1
	err := execNeo(fmt.Sprintf("FOREACH(i IN RANGE(0, %d, 1) | CREATE (:NODE {id:i, value:i}))", lastNode))
	if err != nil {
		return err
	}
	err = execNeo(fmt.Sprintf("MATCH (a:NODE), (b:NODE) WHERE b.id = a.id * %d + 1 CREATE (a)-[:FIRST_CHILD]->(b)", k))
	if err != nil {
		return err
	}
	err = execNeo(fmt.Sprintf("MATCH (a:NODE), (b:NODE) WHERE b.id = a.id + 1 AND a.id %% %d <> 0 CREATE (a)-[:NEXT_SIBLING]->(b)", k))
	if err != nil {
		return err
	}
	return nil
}
I think this is slow for h > 9 because of the last two queries, each of which MATCHes on two unconnected nodes. When I run them in the Neo4j web client, it warns:
This query builds a cartesian product between disconnected patterns.
If a part of a query contains multiple disconnected patterns, this
will build a cartesian product between all those parts. This may
produce a large amount of data and slow down query processing. While
occasionally intended, it may often be possible to reformulate the
query that avoids the use of this cross product, perhaps by adding a
relationship between the different parts or by using OPTIONAL MATCH
(identifier is: (b))
Is there a way I can reformulate these queries to be more efficient?
EDIT:
The code is here if you wish to run it: https://github.com/robsix/data_model_perf_test

Graphs are designed to quickly identify a single point and then traverse from there. Your query structure (write all the nodes, then sort them and add relationships) does pretty much the opposite, which is why you're getting all those warnings. Unfortunately, to provide variable children per node, you will need to be able to query the id property quickly, so make sure that you have an index on :NODE(id) and then try a single big query like this:
WITH 3 AS k, 2 AS h
WITH k, REDUCE(s = toFloat(0), x IN RANGE(1, h-1)|s + k^x) AS max_parent_id
UNWIND RANGE(0, toInt(max_parent_id)) AS parent_id
WITH k, parent_id, k*parent_id+1 AS first_child_id
MERGE (parent:NODE {id:parent_id, value:parent_id})
MERGE (child:NODE {id: first_child_id, value:first_child_id})
MERGE (parent) - [:FIRST_CHILD] -> (child)
WITH k, first_child_id
UNWIND RANGE(first_child_id + 1, first_child_id + k - 1) AS next_child_id
MERGE (last_child:NODE {id:next_child_id -1, value:next_child_id -1})
MERGE (next_child:NODE {id:next_child_id, value:next_child_id})
MERGE (last_child) - [:NEXT_SIBLING] -> (next_child)
This will run through all the possible parent ids, and for each one, will MERGE (match or create) a node with the right ID. It will then MERGE the first child node, whose ID you can already calculate, along with the FIRST_CHILD relationship. This will avoid your cartesian problem. The query will then go through the ids of each remaining sibling of the first child, MERGE the previous sibling (which already exists by that point), and MERGE the next sibling along with the NEXT_SIBLING relationship.
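The id arithmetic the query relies on can be checked outside the database. The following is a minimal Go sketch, assuming breadth-first ids starting at 0 and k >= 2; iPow is named in the question but its body is not shown there, so the version here is an assumed integer power helper, and lastNodeID, maxParentID and childIDs are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// iPow returns base raised to exp for small non-negative exponents.
// Assumption: this mirrors the helper the question calls but does not show.
func iPow(base, exp int) int {
	result := 1
	for i := 0; i < exp; i++ {
		result *= base
	}
	return result
}

// lastNodeID is the highest id in a perfect k-ary tree of height h
// (root at level 0, ids assigned breadth first from 0). Requires k >= 2.
func lastNodeID(k, h int) int {
	return (iPow(k, h+1)-1)/(k-1) - 1
}

// maxParentID is the highest id that still has children:
// k^1 + k^2 + ... + k^(h-1), the same sum the REDUCE in the query builds.
func maxParentID(k, h int) int {
	sum := 0
	for x := 1; x < h; x++ {
		sum += iPow(k, x)
	}
	return sum
}

// childIDs lists the ids of the k children of parent in sibling order.
// The first entry is the FIRST_CHILD target; consecutive entries are NEXT_SIBLINGs.
func childIDs(k, parent int) []int {
	ids := make([]int, 0, k)
	for c := k*parent + 1; c <= k*parent+k; c++ {
		ids = append(ids, c)
	}
	return ids
}

func main() {
	k, h := 3, 2
	fmt.Println("last node:", lastNodeID(k, h), "last parent:", maxParentID(k, h))
	for p := 0; p <= maxParentID(k, h); p++ {
		fmt.Println(p, "->", childIDs(k, p))
	}
}
```

For k = 3 and h = 2 this gives parents 0 through 3 and node ids 0 through 12, which matches the 13 nodes the query above creates.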
UPDATE: I am so sorry, I totally overlooked the node visualization when testing it. I've updated the query since, to solve an index error and account for some reordering that I didn't know Cypher did. You learn something every day! But yeah, what's up there now generates the right graph.

The best I can come up with still uses three queries, but they are used in an interesting way to create the k-ary tree without making Neo4j do too many searches:
func createPerfectKaryTreeInNeo(k, h int, execNeo func(string) error) error {
	lastNode := ((iPow(k, (h+1)) - 1) / (k - 1)) - 1
	if lastNode%2 != 0 {
		err := execNeo(fmt.Sprintf("UNWIND RANGE(0, %d, 2) AS id CREATE (a:NODE {id:id, value: id})-[:NEXT_SIBLING]->(b:NODE {id: id+1, value: id+1}) WITH a, b MATCH (c:NODE {id: b.id+1}) CREATE (b)-[:NEXT_SIBLING]->(c)", lastNode - 1))
		if err != nil {
			return err
		}
	} else {
		err := execNeo(fmt.Sprintf("UNWIND RANGE(1, %d, 2) AS id CREATE (a:NODE {id:id, value: id})-[:NEXT_SIBLING]->(b:NODE {id: id+1, value: id+1}) WITH a, b MATCH (c:NODE {id: b.id+1}) CREATE (b)-[:NEXT_SIBLING]->(c)", lastNode))
		if err != nil {
			return err
		}
		// Node 0 is not covered by the pairs above, so create it and chain it to node 1.
		err = execNeo("MATCH (a:NODE {id:1}) CREATE (:NODE {id:0, value:0})-[:NEXT_SIBLING]->(a)")
		if err != nil {
			return err
		}
	}
	lastParentNode := (lastNode - 1) / k
	err := execNeo(fmt.Sprintf("UNWIND RANGE(0, %d, 1) AS id MATCH shortestPath((a:NODE {id:id})-[:NEXT_SIBLING *]->(b:NODE {id:id*%d+1})) CREATE (a)-[:FIRST_CHILD]->(b)", lastParentNode, k))
	if err != nil {
		return err
	}
	err = execNeo(fmt.Sprintf("MATCH (a:NODE)-[r:NEXT_SIBLING]->(b:NODE) WHERE a.id %% %d = 0 DELETE r", k))
	if err != nil {
		return err
	}
	return nil
}
I should note that this algorithm is specifically for perfect k-ary trees with node ids allocated in breadth-first order. The way it works is:
1) Generate all the nodes in pairs and link them all, in order, as NEXT_SIBLINGs of one another, i.e. 0->1->2->3->4, so you end up with a straight chain.
2) Loop through all the ids small enough to have children and match using the shortestPath function, in the hope that Neo4j is smart enough to work out that, given the current shape of the graph, the first match it finds is the shortest possible path, so it can return early without searching further.
3) The last query then grabs adjacent nodes that should not be considered NEXT_SIBLINGs and deletes the relationship between them, leaving behind a perfect k-ary tree with depth h.
The changes have sped up the data creation by at least an order of magnitude.
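The three steps can be sanity-checked without a database. Below is a minimal in-memory sketch in Go, assuming the same breadth-first id layout; the edge type and buildTree function are hypothetical names used only for this illustration, and the sketch models the resulting relationships rather than the Cypher execution itself.

```go
package main

import "fmt"

// edge is a directed relationship between two node ids.
type edge struct{ from, to int }

// buildTree mimics the three steps on plain Go maps and returns the
// FIRST_CHILD and NEXT_SIBLING relationships that remain at the end.
func buildTree(k, lastNode int) (firstChild, nextSibling map[edge]bool) {
	firstChild = map[edge]bool{}
	nextSibling = map[edge]bool{}

	// Step 1: chain every node to its successor: 0->1->2->...->lastNode.
	for id := 0; id < lastNode; id++ {
		nextSibling[edge{id, id + 1}] = true
	}

	// Step 2: link each id small enough to have children to its first child.
	lastParent := (lastNode - 1) / k
	for id := 0; id <= lastParent; id++ {
		firstChild[edge{id, id*k + 1}] = true
	}

	// Step 3: delete chain links that start at the root or at a last child,
	// which are exactly the nodes whose id is a multiple of k.
	for e := range nextSibling {
		if e.from%k == 0 {
			delete(nextSibling, e)
		}
	}
	return firstChild, nextSibling
}

func main() {
	firstChild, nextSibling := buildTree(3, 12) // k = 3, h = 2
	fmt.Println("FIRST_CHILD count:", len(firstChild))
	fmt.Println("NEXT_SIBLING count:", len(nextSibling))
}
```

For k = 3 and h = 2 each of the 4 parents keeps one FIRST_CHILD link and k - 1 = 2 NEXT_SIBLING links among its children.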
UPDATE:
The accepted answer above is correct; this is just the Go code that matches it:
func createPerfectKaryTreeInNeo(k, h int, execNeo func(string) error) error {
	return execNeo(fmt.Sprintf(`
WITH %d AS k, %d AS h
WITH k AS k, REDUCE(s = toFloat(0), x IN RANGE(1, h-1)|s + k^x) AS max_parent_id
UNWIND RANGE(0, toInt(max_parent_id)) AS parent_id
WITH k AS k, parent_id, k*parent_id+1 AS first_child_id
MERGE (parent:NODE {id:parent_id, value:parent_id})
MERGE (child:NODE {id: first_child_id, value:first_child_id})
MERGE (parent) - [:FIRST_CHILD] -> (child)
WITH k AS k, first_child_id
UNWIND RANGE(first_child_id + 1, first_child_id + k - 1) AS next_child_id
MERGE (last_child:NODE {id:next_child_id -1, value:next_child_id -1})
MERGE (next_child:NODE {id:next_child_id, value:next_child_id})
MERGE (last_child) - [:NEXT_SIBLING] -> (next_child)
`, k, h))
}
It is several orders of magnitude faster than the improvements I originally described in this answer.

Related

Bidirectional recursion in neo4j

I cannot find an answer online for this. I want to run a recursive query both upstream and downstream on a protein interaction map. If the user enters a protein (protein 'C') and depth N=2, I want to return the proteins up to 2 steps upstream and 2 steps downstream in the interaction map, along with the regulation. However, if the direction is upstream, protein 'b' on the right side of the MATCH needs to come first in the return table, and if the direction is downstream, protein 'a' on the left side of the MATCH needs to come first. How can I do this?
For instance, this is the bidirectional query, but half of the rows have columns 1 and 3 in the wrong order.
MATCH p = (a:Protein { name:'C' })<-[:REGULATES*1..2]->(b:Protein)
WITH *, relationships(p) as r
RETURN nodes(p)[length(p)-1].name AS Protein1, r[length(p)-1] as Regulates, b.name AS Protein2
I can only get what I want with two calls, switching the order of the RETURN columns.
MATCH p = (a:Protein { name:'C' })-[:REGULATES*1..2]->(b:Protein)
WITH *, relationships(p) as r
RETURN nodes(p)[length(p)-1].name AS Protein1, r[length(p)-1] as Regulates, length(p), b.name AS Protein2
MATCH p = (a:Protein { name:'C' })<-[:REGULATES*1..2]-(b:Protein)
WITH *, relationships(p) as r
RETURN b.name AS Protein1, r[length(p)-1] as Regulates, nodes(p)[length(p)-1].name AS Protein2
Figured it out using functions startNode and endNode. The last() and head() functions are also handy.
MATCH p = (n:Protein { name:'C' })<-[:REGULATES*1..3]->(b:Protein)
WITH *, relationships(p) as rs
RETURN startNode(last(rs)).name as Protein1, last(rs).direction as Regulates, endNode(last(rs)).name as Protein2, length(p)

How to execute cypher query within a CASE WHEN THEN clause in Neo4j Cypher

I have a use case where I am trying to optimize my Neo4j db calls and code by using the RETURN CASE WHEN THEN clauses in Cypher to run different queries depending on the WHEN result. This is my example:
MATCH (n {email: 'abc123#abc.com'})
RETURN
CASE WHEN n.category='Owner' THEN MATCH '(n)-[r:OWNS]->(m)'
WHEN n.category='Dealer' THEN MATCH (n)-[r:SUPPLY_PARTS_FOR]->(m)
WHEN n.category='Mechanic' THEN MATCH (n)-[r:SERVICE]-(m) END
AS result;
I am not sure this is legal but this is what I want to achieve. I am getting syntax errors like Invalid input '>'. How can I achieve this in the best manner?
EDIT for possible APOC solution:
This was my plan before discovering the limitation of FOREACH...
MATCH (user:Person {email:{paramEmail}})
FOREACH (_ IN case when 'Owner' = {paramCategory} then [1] else [] end|
SET user:Owner, user += queryObj
WITH user, {paramVehicles} AS coll
UNWIND coll AS vehicle
MATCH(v:Vehicles {name:vehicle})
CREATE UNIQUE (user)-[r:OWNS {since: timestamp()}]->(v)
SET r += paramVehicleProps
)
FOREACH (_ IN case when 'Mechanic' = {Category} then [1] else [] end|
SET user:Owner, user += queryObj
WITH user, {paramVehicles} AS coll
….
)
FOREACH (_ IN case when 'Dealer' = {paramCategory} then [1] else [] end|
SET user:Owner, user += queryObj
WITH user, {paramVehicles} AS coll
…...
)
RETURN user,
CASE {paramCategory}
WHEN 'Owner' THEN [(n)-[r:OWNS]->(m) | m and r]
WHEN 'Dealer' THEN [(n)-[r:SUPPLY_PARTS_FOR]->(m) | m]
WHEN 'Mechanic' THEN [(n)-[r:SERVICE]-(m) | m]
END AS result`,{
paramQueryObj: queryObj,
paramVehicles: makeVehicleArray,
paramVehicleProps: vehiclePropsArray,
paramSalesAgent: dealerSalesAgentObjarray,
paramWarehouseAgent: dealerWarehouseAgentObjarray
}).....
Does anyone know how to convert this using apoc.do.when()? Note that I need both 'm' and 'r' in the first THEN.
You should still use a label in your first MATCH; otherwise you get a full database scan rather than an index lookup by email.
For your query you can use pattern comprehensions:
MATCH (n:Person {email: 'abc123#abc.com'})
RETURN
CASE n.category
WHEN 'Owner' THEN [(n)-[r:OWNS]->(m) | m]
WHEN 'Dealer' THEN [(n)-[r:SUPPLY_PARTS_FOR]->(m) | m]
WHEN 'Mechanic' THEN [(n)-[r:SERVICE]-(m) | m] END
AS result;
It is also possible to use apoc.do.case: https://neo4j.com/labs/apoc/4.4/overview/apoc.do/apoc.do.case/
CALL apoc.do.case([
false,
'CREATE (a:Node{name:"A"}) RETURN a AS node',
true,
'CREATE (b:Node{name:"B"}) RETURN b AS node'
],
'CREATE (c:Node{name:"C"}) RETURN c AS node',{})
YIELD value
RETURN value.node AS node;
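As I understand the linked documentation, apoc.do.case takes alternating condition/query entries, runs the query paired with the first condition that is true, and falls back to the else query when none is. The selection rule can be modelled in a few lines of Go; this is only an in-memory sketch of that rule, not a call into Neo4j or APOC, and the branch type and pickQuery function are hypothetical names.

```go
package main

import "fmt"

// branch pairs a condition with the query to run when it holds.
type branch struct {
	cond  bool
	query string
}

// pickQuery returns the query of the first branch whose condition is true,
// or elseQuery when no condition holds.
func pickQuery(branches []branch, elseQuery string) string {
	for _, b := range branches {
		if b.cond {
			return b.query
		}
	}
	return elseQuery
}

func main() {
	// Same shape as the example call above: false/A, true/B, else C.
	chosen := pickQuery([]branch{
		{false, `CREATE (a:Node{name:"A"}) RETURN a AS node`},
		{true, `CREATE (b:Node{name:"B"}) RETURN b AS node`},
	}, `CREATE (c:Node{name:"C"}) RETURN c AS node`)
	fmt.Println(chosen)
}
```

With the conditions from the example, the second query is the one selected, so only node B would be created.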

How to set enumeration on properties for selected nodes in one cypher statement?

My graph contains a set of nodes which are enumerated using a dedicated field fid. I want to update this enumeration periodically.
My current approach is to reset the enumeration and execute multiple statements that increase the fid for each node.
1. (f:File) set f.fid = -1
for(int i = 0; i < count ; i++) {
2. (f:File) set f.fid = i where id(f) = nodeId
}
I guess it should be possible to execute this task using a single cypher statement using the foreach clause.
MATCH p=(f:File)
FOREACH (n IN nodes(p)| SET f.fid = -1 )
I was looking for something similar to this statement.
MATCH (f:File)
WITH COLLECT(f) AS fs
WITH fs, i = 0
FOREACH (f in fs, i=i+1| SET f.fid = i ) return f.fid, f.name
Based on the following console set: http://console.neo4j.org/r/447qni
The following query seems to do the trick (it writes to a property named iteration; use fid in your case):
MATCH (f:File)
WITH collect(f) as f, count(f) AS c
UNWIND range(0,c-1) AS x
WITH f[x] AS file,x
SET file.iteration = x+1

Shortest path through a node

How can I get the shortest path between two nodes, where the path goes through a specific node? This is what I've got so far:
MATCH (a:User { username:"User1" }),(b:User { username:"User2" }),(c:User { username:"User3" }),
p = allShortestPaths((a)-[*]-(b))
RETURN p
I want a result in which the path goes through User3 (c).
You can find each leg separately:
MATCH (a:User {username:"User1"}),(b:User {username:"User2"}),(c:User {username:"User3"}),
p = allShortestPaths((a)-[*]-(c)), q = allShortestPaths((c)-[*]-(b))
RETURN p, q
Be aware, though, that if you have very long paths, or cycles, this query can take a long time or never finish. You should consider putting an upper bound on the path lengths, e.g.:
MATCH (a:User {username:"User1"}),(b:User {username:"User2"}),(c:User {username:"User3"}),
p = allShortestPaths((a)-[*..10]-(c)), q = allShortestPaths((c)-[*..10]-(b))
RETURN p, q
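The idea of finding each leg separately, with a bound on path length, can be illustrated outside Neo4j. The sketch below is an assumed in-memory version in Go using breadth-first search on an undirected adjacency list; shortestPath, pathVia and the sample graph are hypothetical, and unlike allShortestPaths it returns only one shortest path per leg.

```go
package main

import "fmt"

// shortestPath runs a breadth-first search and returns one shortest path
// from src to dst, or nil if dst is not reachable within maxHops hops.
func shortestPath(adj map[string][]string, src, dst string, maxHops int) []string {
	prev := map[string]string{}
	dist := map[string]int{src: 0}
	queue := []string{src}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == dst {
			break
		}
		if dist[cur] == maxHops {
			continue // upper bound reached, do not expand further
		}
		for _, next := range adj[cur] {
			if _, seen := dist[next]; !seen {
				dist[next] = dist[cur] + 1
				prev[next] = cur
				queue = append(queue, next)
			}
		}
	}
	if _, found := dist[dst]; !found {
		return nil
	}
	// Walk the predecessor links back from dst to src.
	path := []string{dst}
	for node := dst; node != src; {
		node = prev[node]
		path = append([]string{node}, path...)
	}
	return path
}

// pathVia joins the a-to-c leg and the c-to-b leg; c appears once in the result.
func pathVia(adj map[string][]string, a, c, b string, maxHops int) []string {
	first := shortestPath(adj, a, c, maxHops)
	second := shortestPath(adj, c, b, maxHops)
	if first == nil || second == nil {
		return nil
	}
	return append(first, second[1:]...)
}

func main() {
	adj := map[string][]string{
		"User1": {"User2", "A"},
		"User2": {"User1", "B"},
		"A":     {"User1", "User3"},
		"B":     {"User3", "User2"},
		"User3": {"A", "B"},
	}
	fmt.Println(pathVia(adj, "User1", "User3", "User2", 10))
}
```

Note that the two legs are computed independently, so the combined path may revisit a node, just as p and q in the query above may overlap.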

Cypher query produces incomplete results (neo4j-1.9-SNAPSHOT)

I have run into a problem when executing a Cypher query against a "neo4j-1.9-SNAPSHOT" database on Windows 7.
The database can be downloaded from the topic in Google Groups.
When I run the first 2 queries in the web admin console, I do not get the node with id="45" as the first node in the path in the result list.
1) start a = node:my_nodes(label='2826'), b = node:my_nodes(label='2826')
match a-[r1]-b
with a, b, r1
match b-[r2]-c
where c.label = 2826 and r1.label = r2.label and id(r1) <> id(r2)
return id(a), id(b), id(c), id(r1), id(r2);
2) START n0=node:my_nodes(label='2826'), n1=node:my_nodes(label='2826'),
n2=node:my_nodes(label='2826')
MATCH n0-[r0]-n1-[r1]-n2
where r0.label = r1.label and id(r0)<>id(r1)
RETURN id(n0), id(n1), id(n2), id(r0), id(r1);
However, the 3rd query shows that the node with id="45" should definitely be in the result list of the first two queries. Checking the database suggests the same.
3) start a = node(45), b = node:my_nodes(label='2826')
match a-[r1]-b
with a, b, r1
match b-[r2]-c
where a.label = 2826 and c.label = 2826 and r1.label = r2.label and id(r1) <> id(r2)
return id(a), a.label, id(b), id(c), id(r1), id(r2);
Running the Cypher query:
start a = node:my_nodes(label='2826')
return id(a);
shows that the node with id="45" is in the index.
Any ideas what can be wrong with the first 2 queries?
