Mass delete empty properties in a Neo4j database

I have a Neo4j database with 100M nodes. A lot of those nodes contain empty properties and I would like to remove these properties.
I have tried the following query:
:auto MATCH (n)
WITH n
call { with n
UNWIND keys(n) as k
WITH n, k
WHERE n[k] = ''
WITH n, collect(k) as propertyKeys
CALL apoc.create.removeProperties(n, propertyKeys)
YIELD node
RETURN node
} in transactions of 50000 rows;
I get the following error message:
Query cannot conclude with CALL (must be a RETURN clause, an update clause, a unit subquery call, or a procedure call with no YIELD) (line 3, column 1 (offset: 19))
"call { with n"
^
Can someone tell me what I'm doing wrong and how to fix that?
Thanks for your help !

I'd suggest an alternative to your query. The version below uses apoc.periodic.iterate: the first statement finds every node together with its empty-string property keys, and the second removes those properties in batches of 50,000, run in parallel.
CALL apoc.periodic.iterate(
"MATCH (n) UNWIND keys(n) as k WITH n, k WHERE n[k] = '' RETURN n, k",
"WITH n, collect(k) as propertyKeys
CALL apoc.create.removeProperties(n, propertyKeys) YIELD node
RETURN node",
{batchSize:50000, parallel:true})
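As for the error you are getting: your subquery ends with RETURN node, so it is a returning subquery rather than a unit subquery, and a query may not end with one. The error message lists what the query is allowed to conclude with:
Query cannot conclude with CALL (must be a RETURN clause, an update clause, a unit subquery call, or a procedure call with no YIELD)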
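If you would rather keep the CALL { ... } IN TRANSACTIONS form from the question, one way to satisfy the rule in the error message is to add a final RETURN after the subquery. This is a minimal sketch, assuming Neo4j 4.4 or later with APOC installed; the updatedNodes alias is only illustrative:

```cypher
:auto MATCH (n)
CALL {
  WITH n
  UNWIND keys(n) AS k
  WITH n, k
  WHERE n[k] = ''
  // one row per node that has at least one empty-string property
  WITH n, collect(k) AS propertyKeys
  CALL apoc.create.removeProperties(n, propertyKeys) YIELD node
  RETURN node
} IN TRANSACTIONS OF 50000 ROWS
// the outer query now concludes with a RETURN clause
RETURN count(node) AS updatedNodes;
```

Nodes without empty-string properties produce no row inside the subquery, so the count covers only the nodes that were actually changed.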

Related

Call APOC procedure in apoc.periodic.iterate

I am trying to add new relationships based on a property of existing relationships with the apoc.create.relationship function:
:auto CALL apoc.periodic.iterate(
"MATCH (source:Entity)-[r:TEMP_RELATION]->(target:Entity) RETURN source, r, target",
"CALL apoc.create.relationship(source, r.`Interaction-type`, r, target) YIELD rel RETURN rel",
{batchSize:5}
);
When I run this query I get Java heap errors (max heap is 8g). It looks like iterate is not actually iterating but loading too much into memory. I use Neo4j 4.4.8 on a Mac (M1).
Any ideas why there is a memory leak here?
Since Neo4j 4 the behavior has changed: when you pass a node or relationship to a separate transaction or statement, it carries along the transaction it originated from.
So all updates accumulate on that original transaction.
To avoid that, you have to "rebind" the nodes and relationships, ideally by returning id(n) AS id or id(r) AS relId from the first statement.
Then you can re-match them in the update statement with WHERE id(n) = id and work with the re-matched entities from there.
In your example:
:auto CALL apoc.periodic.iterate(
"MATCH (source:Entity)-[r:TEMP_RELATION]->(target:Entity) RETURN id(source) as sId, properties(r) as props, r.`Interaction-type` as type, id(target) as tId",
"MATCH (source), (target) where id(source) = sId AND id(target) = tId
CALL apoc.create.relationship(source, type, props, target) YIELD rel RETURN count(*)",
{batchSize:10000}
);

Cypher: return a string if no match using fulltext search

I have a Cypher query
CALL db.index.fulltext.queryNodes('myIndex', 'coding')
YIELD node
RETURN node
which returns the coding node if the index actually matches any existing nodes, and returns null if there is no match.
Instead of returning a null value if there is no match, I want to return a string or a message like No match found.
I was thinking I can combine apoc.when() like
CALL db.index.fulltext.queryNodes('myIndex', 'coding')
YIELD node
WITH node
CALL apoc.when(node is not null, 'RETURN node', 'RETURN "No match found"', {node:node})
but I get an error
Query cannot conclude with CALL (must be RETURN or an update clause) (line 5, column 1 (offset: 77))
"CALL apoc.when(node is not null, 'RETURN node', 'RETURN "No match found"', {node:node})"
I tried adding
YIELD value
RETURN value
at the end of the statement, but it does not return the message when there is no match and works as if the apoc.when() is not used.
You can simply use coalesce:
CALL db.index.fulltext.queryNodes('myIndex', 'coding')
YIELD node
RETURN coalesce(node, "No match found") as result
The problem was that when the index has no match, the procedure yields no rows at all rather than a null value, so none of the operations after YIELD are computed (similar to how MATCH vs OPTIONAL MATCH work).
I was able to overcome this by creating a collection, unwind with case, and coalesce as Tomaz suggested in his answer.
CALL db.index.fulltext.queryNodes('myIndex', 'coding')
YIELD node
WITH collect(node) as nodes
UNWIND (CASE nodes WHEN [] then [null] else nodes end) as n
RETURN coalesce(n, "No match found")
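The reason the collect() step helps is that an aggregation without grouping keys always emits exactly one row, even when the procedure yields none. A slightly shorter variant of the same idea, sketched here with an illustrative result alias, returns either the message or the whole list of matches in that single row:

```cypher
CALL db.index.fulltext.queryNodes('myIndex', 'coding')
YIELD node
// collect() turns zero rows into one row holding an empty list
WITH collect(node) AS nodes
RETURN CASE WHEN size(nodes) = 0 THEN 'No match found' ELSE nodes END AS result
```

Unlike the UNWIND version, this returns one row containing a list rather than one row per node.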

Unexpected Behavior Chaining Any() and None() in Neo4j

What I'm trying to get is nodes having certain property values for any property name (key) and not having some other values for any property.
So in short, a pseudo-Google query would look like:
+Tom +val1 +val2 (...) -Cruise -valX -valY (...)
and the Cypher query would look like:
MATCH (n) WHERE (
ANY ( p in KEYS(n) WHERE n[p] CONTAINS 'Tom' ) AND
NONE ( p in KEYS(n) WHERE n[p] CONTAINS 'Cruise')
)
RETURN n
But the test result with movie database (:play movie graph) was just an empty list, while there are other actors named 'Tom' in the database, such as Tom Hanks.
(match (n) where (any( p in KEYS(n) WHERE n[p] contains 'Tom')) return n
gives [Tom Tykwer, Tom Hanks, Tom Cruise, Tom Skerritt])
So I experimented with 'om' instead of 'Tom', and this time the result is an incomplete list of 'om's:
match (n) where (
any( p in KEYS(n) WHERE n[p] contains 'om') and
none( p in Keys(n) WHERE n[p] contains 'Cruise')
)
return n
gives
[Romantic (genre), Naomie Harris, James Thompson, Jessica Thompson]
(No Tom's -- why?)
Also tried NOT ANY() in place of NONE() and had same results.
Where does this inconsistency come from?
@stdob-- offers an accurate explanation of the issue.
But there are simpler workarounds. For instance, you can use the COALESCE() function to force a NULL value to be treated as FALSE:
MATCH (n)
WHERE
ANY ( p in KEYS(n) WHERE n[p] CONTAINS 'Tom' ) AND
NONE( p in KEYS(n) WHERE COALESCE(n[p] CONTAINS 'Cruise', FALSE))
RETURN n
The problem is that nodes have properties with a type other than string. For those properties CONTAINS evaluates to null, so the NONE check evaluates to null as well, which makes the entire WHERE predicate null and filters the node out. For example, this query returns nothing:
WITH {k1: 1, k2: '2'} AS test
WHERE NONE(key IN keys(test) WHERE test[key] CONTAINS '1')
RETURN test
So in this case you need to check the type of the property. Since there is no native type-checking function, you can use the function from the APOC library:
MATCH (n) WHERE (
ANY(p in KEYS(n) WHERE apoc.meta.cypher.type(n[p]) = 'STRING' AND n[p] CONTAINS 'Tom') AND
NONE(p in KEYS(n) WHERE apoc.meta.cypher.type(n[p]) = 'STRING' AND n[p] CONTAINS 'Cruise')
)
RETURN n
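To see the null propagation and the COALESCE workaround side by side, the literal-map example can be extended as below. This is a small sketch that needs no data; the column aliases are illustrative. Following the explanation above, the first column should come back as null (the integer value makes CONTAINS evaluate to null) and the second as true:

```cypher
WITH {k1: 1, k2: '2'} AS test
RETURN
  // predicate is null for k1 and false for k2, so NONE cannot decide
  none(key IN keys(test) WHERE test[key] CONTAINS '1') AS plainNone,
  // null is mapped to false first, so NONE sees only false values
  none(key IN keys(test) WHERE coalesce(test[key] CONTAINS '1', false)) AS coalescedNone
```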

Cypher merge nodes with same property and collected the other property

I have nodes with this structure
(g:Giocatore { nome, match, nazionale})
(nome:'Del Piero', match:'45343', nazionale:'ITA')
(nome:'Messi', match:'65324', nazionale:'ARG')
(nome:'Del Piero', match:'18235', nazionale:'ITA')
The property 'match' is unique (ID's of match) while there are several 'nome' with the same name.
I want to merge all the nodes with the same 'nome' and create a collection of different 'match' like this
(nome:'Del Piero', match:[45343,18235], nazionale:'ITA')
(nome:'Messi', match:'65324', nazionale:'ARG')
I tried with apoc library too but nothing works.
Any idea?
Can you try this query :
MATCH (n:Giocatore)
WITH n.nome AS nome, collect(n) AS node2Merge
WITH node2Merge, [x IN node2Merge | x.match] AS matches
CALL apoc.refactor.mergeNodes(node2Merge) YIELD node
SET node.match = matches
Here I'm using APOC to merge the nodes, and before that I build a list of the match values from the collected nodes so that I can set it on the merged node.
I don't know how many Giocatore nodes you have, so this query may run out of memory, in which case you will have to batch it. You can, for example, replace the first line with MATCH (n:Giocatore) WHERE n.nome STARTS WITH 'A' and repeat it for each letter, or you can use the apoc.periodic.iterate procedure:
CALL apoc.periodic.iterate(
'MATCH (n:Giocatore) WITH n.nome AS nome, collect(n) AS node2Merge RETURN node2Merge, [x IN node2Merge | x.match] AS matches',
'CALL apoc.refactor.mergeNodes(node2Merge) YIELD node
SET node.match = matches',
{batchSize:1000,parallel:true,retries:3,iterateList:true}
) YIELD batches, total

Neo4j indices slow when querying across 2 labels

I've got a graph where each node has label either A or B, and an index on the id property for each label:
CREATE INDEX ON :A(id);
CREATE INDEX ON :B(id);
In this graph, I want to find the node(s) with id "42", but I don't know a-priori the label. To do this I am executing the following query:
MATCH (n {id:"42"}) WHERE (n:A OR n:B) RETURN n;
But this query takes 6 seconds to complete. However, doing either of:
MATCH (n:A {id:"42"}) RETURN n;
MATCH (n:B {id:"42"}) RETURN n;
Takes only ~10ms.
Am I not formulating my query correctly? What is the right way to formulate it so that it takes advantage of the installed indices?
Here is one way to use both indices. result will be a collection of matching nodes.
OPTIONAL MATCH (a:B {id:"42"})
OPTIONAL MATCH (b:A {id:"42"})
RETURN
(CASE WHEN a IS NULL THEN [] ELSE [a] END) +
(CASE WHEN b IS NULL THEN [] ELSE [b] END)
AS result;
You should use PROFILE to verify that the execution plan for your neo4j environment uses the NodeIndexSeek operation for both OPTIONAL MATCH clauses. If not, you can use the USING INDEX clause to give a hint to Cypher.
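If the plan does not show an index seek, the hinted form of the same query might look like the sketch below. It assumes the two indexes from the question exist; the hints force the planner's choice, so they are normally only worth adding when PROFILE shows a label scan or an all-nodes scan:

```cypher
PROFILE
OPTIONAL MATCH (a:A {id: "42"})
USING INDEX a:A(id)
OPTIONAL MATCH (b:B {id: "42"})
USING INDEX b:B(id)
RETURN
  (CASE WHEN a IS NULL THEN [] ELSE [a] END) +
  (CASE WHEN b IS NULL THEN [] ELSE [b] END)
  AS result;
```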
You should use UNION to make sure that both indexes are used. In your question you almost had the answer.
MATCH (n:A {id:"42"}) RETURN n
UNION
MATCH (n:B {id:"42"}) RETURN n
;
This will work. To verify it, put PROFILE or EXPLAIN in front of your query and check that the indexes are used.
Indexes are created and used per node label and property, so you need to formulate your query the same way. That means queries without a label will scan all nodes, which explains the results you got.
