Call APOC procedure in apoc.periodic.iterate - neo4j

I am trying to add new relationships based on a property of existing relationships with the apoc.create.relationship function:
:auto CALL apoc.periodic.iterate(
"MATCH (source:Entity)-[r:TEMP_RELATION]->(target:Entity) RETURN source, r, target",
"CALL apoc.create.relationship(source, r.`Interaction-type`, r, target) YIELD rel RETURN rel",
{batchSize:5}
);
When I run this query I get Java heap errors (max heap is 8g). It looks like iterate is not actually iterating but loading too much into memory. I use Neo4j 4.4.8 on a Mac (M1).
Any ideas why there is a memory leak here?

Since Neo4j 4 the behavior has changed: when you pass a node or relationship to a separate transaction or statement, it carries along the transaction it originated from.
So all updates accumulate on that original transaction.
To avoid that, you have to "rebind" the nodes and relationships, preferably by returning id(n) AS id or id(r) AS relId.
Then you can re-match the nodes and relationships in the update statement with WHERE id(n) = id and work with them from there.
In your example:
:auto CALL apoc.periodic.iterate(
"MATCH (source:Entity)-[r:TEMP_RELATION]->(target:Entity) RETURN id(source) as sId, properties(r) as props, r.`Interaction-type` as type, id(target) as tId",
"MATCH (source), (target) where id(source) = sId AND id(target) = tId
CALL apoc.create.relationship(source, type, props, target) YIELD rel RETURN count(*)",
{batchSize:10000}
);
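The same rebinding applies to relationships. As a minimal sketch, assuming the TEMP_RELATION relationships are no longer needed once the typed copies exist (the question does not say so), they could be removed in batches by passing only id(r) into the update statement:

```cypher
// Sketch only: assumes TEMP_RELATION can be dropped after the copy step.
CALL apoc.periodic.iterate(
  // Driving statement: return plain ids, not relationship objects.
  "MATCH (:Entity)-[r:TEMP_RELATION]->(:Entity) RETURN id(r) AS relId",
  // Update statement: re-match the relationship by id inside the batch transaction.
  "MATCH ()-[r:TEMP_RELATION]->() WHERE id(r) = relId DELETE r",
  {batchSize: 10000}
);
```

Because only numeric ids cross from the driving statement into the update statement, nothing stays bound to the original transaction.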

Related

Mass delete empty properties in a Neo4j database

I have a Neo4j database with 100M nodes. A lot of those nodes contain empty properties and I would like to remove these properties.
I have tried the following query:
:auto MATCH (n)
WITH n
call { with n
UNWIND keys(n) as k
WITH n, k
WHERE n[k] = ''
WITH n, collect(k) as propertyKeys
CALL apoc.create.removeProperties(n, propertyKeys)
YIELD node
RETURN node
} in transactions of 50000 rows;
I get the following error message:
Query cannot conclude with CALL (must be a RETURN clause, an update clause, a unit subquery call, or a procedure call with no YIELD) (line 3, column 1 (offset: 19))
"call { with n"
^
Can someone tell me what I'm doing wrong and how to fix that?
Thanks for your help!
I propose an alternative to your query. The version below uses apoc.periodic.iterate, which finds the nodes with empty properties and removes those properties in batches of 50,000, in parallel.
CALL apoc.periodic.iterate(
"MATCH (n) UNWIND keys(n) as k WITH n, k WHERE n[k] = '' RETURN n, k",
"WITH n, collect(k) as propertyKeys
CALL apoc.create.removeProperties(n, propertyKeys) YIELD node
RETURN node",
{batchSize:50000, parallel:true})
As for the error you are getting: the message itself lists what a query may end with (a RETURN clause, an update clause, a unit subquery call, or a procedure call with no YIELD). Your query ends with a CALL { ... } subquery that itself ends in RETURN node, so it is not a unit subquery, and nothing follows it.

Cypher apoc.export.json.query is painstakingly slow

I'm trying to export a subgraph (all nodes and relationships on some path) from Neo4j to JSON.
I'm running a Cypher export query with
WITH "{cypher_query}" AS query CALL apoc.export.json.query(query, "filename.jsonl", {}) YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data;
Where cypher_query is
MATCH p = (ancestor: Term {term_id: 'root_id'})<-[:IS_A*..]-(children: Term) WITH nodes(p) as term, relationships(p) AS r, children AS x RETURN term, r, x
Ideally, I'd have the json be triples of subject, relationship, object of (node1, relationship between nodes, node2) - my understanding is that in this case I'm getting more than two nodes per line because of the aggregation that I use.
It takes more than two hours to export something like 80k nodes and it would be great to speed up this query.
Would it benefit from being wrapped in apoc.periodic.iterate? I thought apoc.export.json.query was already optimized in this regard, but maybe I'm wrong.
Would it benefit from replacing the path-matching query in standard cypher syntax with some apoc function?
Is there a more efficient way of exporting a subgraph from a neo4j database to json? I thought that maybe creating a graph object and exporting it would work but have no clue where the bottleneck is here and hence don't know how to progress.
You could try this (although I do not see why you would need the relationships in the result, unless they have properties):
// limit the number of paths
MATCH p = (root: Term {term_id: 'root_id'})<-[:IS_A*..]-(leaf: Term)
WHERE NOT EXISTS ((leaf)<-[:IS_A]-())
// extract all relationships
UNWIND relationships(p) AS rel
// Return what you need (probably a subset of what I indicated below, eg. some properties)
RETURN startNode(rel) AS child,
rel,
endNode(rel) AS parent
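As a hedged sketch of how the query above might be passed to the export call from the question: the file name and the apoc.export.json.query call are taken from the question, while the DISTINCT and the NOT (leaf)<-[:IS_A]-() spelling of the leaf filter are my own assumptions. DISTINCT is there because the same relationship lies on several root-to-leaf paths and would otherwise be written once per path.

```cypher
// Sketch only: one (child, rel, parent) triple per exported row.
WITH "MATCH p = (root:Term {term_id: 'root_id'})<-[:IS_A*]-(leaf:Term)
      WHERE NOT (leaf)<-[:IS_A]-()
      UNWIND relationships(p) AS rel
      RETURN DISTINCT startNode(rel) AS child, rel, endNode(rel) AS parent" AS query
CALL apoc.export.json.query(query, "filename.jsonl", {})
YIELD file, rows, time
RETURN file, rows, time;
```

Whether this is fast enough depends on how many paths the variable-length match has to expand, so it is worth running the inner query on its own first.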

How to avoid specific paths in allShortestPath function?

The graph is supposed to represent a system similar to github, with commits (commit1, commit2, commit3 and commit4), documents (d1, d2) and changes on those documents (green nodes).
I am trying to use Cypher to get the values of all the documents at a specific commit. In other words, I am trying to find the shortest path between the specific commit and each of the documents in my graph, while avoiding some paths.
Imagine I am on commit4: d1 should be equal to foo2 and d2 should be equal to spain. This could be solved with the following Cypher query:
MATCH (c:Commit {id: 'commit4'})-[:FOLLOWS|CHANGED*]->(:Value)<-[:EQUALS]-(d:Document), p = allShortestPaths((c)-[*]-(d))
RETURN p
This would give the following response:
Now, imagine that I want to get the values at commit3. The request should not return any changes from commit4. However, if I use the allShortestPaths function the way I do, it goes through commit4, since that is actually the shortest path to d1, and returns exactly the same response as if my starting node were commit4.
MATCH (c:Commit {id: 'commit3'})-[:FOLLOWS|CHANGED*]->(:Value)<-[:EQUALS]-(d:Document), p = allShortestPaths((c)-[*]-(d))
RETURN p
How can I prevent the allShortestPaths function from going through a [:FOLLOWS]->(c) relationship and so solve my problem?
From what you explained, I understand that you don't want to traverse the FOLLOWS edge against its direction. To do so, you can use a Cypher projection in algo.shortestPath:
MATCH (start:Commit {id:'commit4'})
MATCH (end:Document)
CALL algo.shortestPath.stream(start, end, null,{
nodeQuery:'MATCH (n) RETURN id(n) AS id',
relationshipQuery:'MATCH (n)-[r:FOLLOWS|CHANGED]->(p) RETURN id(n) AS source, id(p) AS target UNION MATCH (n)-[r:EQUALS]-(p) RETURN id(n) AS source, id(p) AS target',
graph:'cypher'})
YIELD nodeId, cost
WITH end as document, algo.asNode(nodeId) AS value WHERE "Value" in labels(value)
return document, value
Replace "commit4" with any other commit name.

Cypher merge nodes with same property and collected the other property

I have nodes with this structure
(g:Giocatore { nome, match, nazionale})
(nome:'Del Piero', match:'45343', nazionale:'ITA')
(nome:'Messi', match:'65324', nazionale:'ARG')
(nome:'Del Piero', match:'18235', nazionale:'ITA')
The property 'match' is unique (ID's of match) while there are several 'nome' with the same name.
I want to merge all the nodes with the same 'nome' and create a collection of different 'match' like this
(nome:'Del Piero', match:[45343,18235], nazionale:'ITA')
(nome:'Messi', match:'65324', nazionale:'ARG')
I tried with apoc library too but nothing works.
Any idea?
Can you try this query:
MATCH (n:Giocatore)
WITH n.nome AS nome, collect(n) AS node2Merge
WITH node2Merge, extract(x IN node2Merge | x.match) AS matches
CALL apoc.refactor.mergeNodes(node2Merge) YIELD node
SET node.match = matches
Here I'm using APOC to merge the nodes, but then I do a map transformation on the node list to have an array of match, and I set it on the merged node.
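One caveat: extract() was removed in Neo4j 4.0. On 4.x and later, a list comprehension does the same job. A minimal sketch of the query above with only that substitution (everything else is unchanged from the answer):

```cypher
MATCH (n:Giocatore)
// Group the nodes that share the same nome.
WITH n.nome AS nome, collect(n) AS node2Merge
// List comprehension replaces extract(x IN node2Merge | x.match).
WITH node2Merge, [x IN node2Merge | x.match] AS matches
CALL apoc.refactor.mergeNodes(node2Merge) YIELD node
SET node.match = matches
```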
I don't know how many Giocatore nodes you have, so this query may fail with an OutOfMemory error, in which case you will have to batch it. For example, you can replace the first line with MATCH (n:Giocatore) WHERE n.nome STARTS WITH 'A' and repeat the query for each letter, or you can use the apoc.periodic.iterate procedure:
CALL apoc.periodic.iterate(
'MATCH (n:Giocatore) WITH n.nome AS nome, collect(n) AS node2Merge RETURN node2Merge, extract(x IN node2Merge | x.match) AS matches',
'CALL apoc.refactor.mergeNodes(node2Merge) YIELD node
SET node.match = matches',
{batchSize:1000,parallel:true,retries:3,iterateList:true}
) YIELD batches, total

Neo4J Graph Algorithms Cypher Projection should return only numbers?

Hello, I am running a Neo4j Graph Algorithms request in Cypher of the following kind, which first finds the nodes and then the relationships between them:
CALL algo.pageRank.stream('MATCH (u:User{uid:"0ee14110-426a-11e8-9d67-e79789c69fd7"}),
(ctx:Context{name:"news180417"}), (u)<-[:BY]-(c:Concept)-[:AT]->(ctx)
RETURN DISTINCT id(c) as id',
'CALL apoc.index.relationships("TO","user:0ee14110-426a-11e8-9d67-e79789c69fd7")
YIELD rel, start, end WITH DISTINCT rel, start, end MATCH (ctx:Context)
WHERE rel.context = ctx.uid AND (ctx.name="news180417" )
RETURN DISTINCT id(start) AS source, id(end) AS target',
{graph:'cypher', iterations:5});
Which works fine. However, when I try to return c.uid instead of its Neo4J id() the Graph Algorithms don't accept it.
Does it mean I can only operate using Neo4J ids in Graph Algorithms?
When you use Cypher projection with the Graph Algorithms procedures, you pass 2 Cypher statements (and a config map).
The first Cypher statement must return an id variable whose value is the native ID of a node.
The second Cypher statement must return source and target variables whose values are also node IDs.
So, yes, your Cypher statements must always return neo4j native IDs.
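If the goal is to see c.uid in the output, the native ids can be mapped back to nodes after the algorithm has run. A minimal sketch, assuming simplified projection statements (the Concept label and the TO relationship type come from the question, but the two statements here are placeholders, not the asker's real ones):

```cypher
CALL algo.pageRank.stream(
  // Node statement: must return the native id as `id`.
  'MATCH (c:Concept) RETURN id(c) AS id',
  // Relationship statement: must return native ids as `source` and `target`.
  'MATCH (c1:Concept)-[:TO]->(c2:Concept) RETURN id(c1) AS source, id(c2) AS target',
  {graph: 'cypher', iterations: 5})
YIELD nodeId, score
// Look the node up again to read its own uid property.
RETURN algo.asNode(nodeId).uid AS uid, score
ORDER BY score DESC;
```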
