I use the Neo4j APOC library to load data from a JSON file stored on HDFS. The file content looks like this:
{"id":"572911761","label":"Label1","nickName":"xxx","screenName":"xxx","userType":2}
{"id":"111117971217247","label2":"Label","nickName":"dada","userType":2}
{"id":"111112559184932","label3":"Label","nickName":"Kwok","screenName":"kwok","userType":2}
{"id":"1447694416","label":"Label4","nickName":"Sylar","screenName":"sylar","userType":2}
{"id":"111111154273959","label":"Label2","nickName":"Chan","screenName":"kmuhk","userType":2}
The label field holds the label of the node in Neo4j; that is, I want to set a dynamic label by reading it from each line of the file. My Cypher input is:
CALL apoc.load.json("hdfs://hdp1:8020/apoc/graph/apoc_graph_20200422202753_0_nodes") yield value
call apoc.merge.node(value.label, {uid:value.uid}, {nickname:value.nickName,screen_name: value.screenName })
It fails with this error:
Neo.ClientError.Statement.SyntaxError
Neo.ClientError.Statement.SyntaxError: Query cannot conclude with CALL (must be RETURN or an update clause) (line 2, column 1 (offset: 98))
"call apoc.merge.node(value.label, {uid:value.uid}, {nickname:value.nickName,screen_name: value.screenName })"
^
Any help is appreciated, thanks!
The error message is pretty helpful here: it says that the last clause of your query must be either an update to the graph (like a CREATE or a MERGE) or a RETURN. Here, the query ends with a CALL.
To fix it, yield the merged node from apoc.merge.node, then return it:
CALL apoc.load.json("hdfs://hdp1:8020/apoc/graph/apoc_graph_20200422202753_0_nodes") yield value
CALL apoc.merge.node(value.label, {uid:value.uid}, {nickname:value.nickName,screen_name: value.screenName }) YIELD node
RETURN node
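Two further details may be worth checking once the syntax error is gone. This is a sketch based only on the sample lines in the question: those lines carry an "id" field rather than "uid", some of them use "label2" or "label3" instead of "label", and apoc.merge.node takes its labels as a list of strings. Under those assumptions, a variant that skips lines without a usable label or id could look like this:

```cypher
CALL apoc.load.json("hdfs://hdp1:8020/apoc/graph/apoc_graph_20200422202753_0_nodes") YIELD value
// Assumption: rows without "label" or "id" should be skipped rather than merged
WITH value
WHERE value.label IS NOT NULL AND value.id IS NOT NULL
// Labels are passed as a list; the merge key is taken from the "id" field of the sample data
CALL apoc.merge.node([value.label], {uid: value.id}, {nickname: value.nickName, screen_name: value.screenName}) YIELD node
RETURN count(node) AS merged
```

Returning a count instead of every node keeps the result small when the file is large.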
I'm using cypher and the neo4j browser to create nodes from csv input.
I want to read in each row of my csv file with headers and then create a node with that row as properties.
My current code is:
LOAD CSV WITH HEADERS FROM '<yourFilePath>' AS ROW
WITH ROW
CREATE (n:node $ROW)
This throws an error saying parameter missing.
Try this:
LOAD CSV WITH HEADERS FROM '<yourFilePath>' AS row
CREATE (n:node)
SET n += row
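In Cypher, names that start with "$" are parameters and must be passed to the query from outside. Your code binds each row to the local variable ROW instead, so there is no $ROW parameter to find. Note that a node pattern accepts only a map literal or a parameter for its properties, so simply changing $ROW to ROW inside CREATE will not parse; create the node first and copy the row onto it with SET n += ROW (or SET n = ROW).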
In addition, if you want to make sure that you do not generate duplicate nodes, you should consider using MERGE instead of CREATE. But before you do so, you must carefully read the documentation on MERGE to understand how to use it properly.
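As a rough sketch of the MERGE variant: merge on one column that identifies the row, then copy the remaining columns onto the node. The column name id is hypothetical here; substitute whichever header is unique in your file.

```cypher
LOAD CSV WITH HEADERS FROM '<yourFilePath>' AS row
// `id` is a hypothetical header used as the unique key; it must not be null for any row
MERGE (n:node {id: row.id})
// Copy every column of the row onto the node (all LOAD CSV values arrive as strings)
SET n += row
```

Merging on the whole row instead would create a new node whenever any single column differs, which is usually not what is wanted.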
I have a task where, for an input pair of job titles, I have to output how similar the two titles are. For similarity I will take into account different factors, such as common relationships between the two job title nodes.
I am currently on the first part, i.e. getting the best matching job title node for an input job title. I am using the full-text search capability of Neo4j to solve this, and have come up with the following query:
CALL db.index.fulltext.queryNodes("full-text-job-title", '"software engineer" OR (software~0.7 engineer~0.7)') yield node, score
with collect(node)[..1] as matchedTitles1
CALL db.index.fulltext.queryNodes("full-text-job-title", '"software developer" OR (software~0.7 developer~0.7)') yield node, score
with collect(node)[..1] as matchedTitles2
return matchedTitles1[0], matchedTitles2[0]
It returns the following error
Neo.ClientError.Statement.SyntaxError
Neo.ClientError.Statement.SyntaxError: Variable `matchedTitles1` not defined (line 5, column 8 (offset: 351))
"return matchedTitles1[0], matchedTitles2[0]"
^
I am unable to solve this error. I also want to return the top matching job title node for each title in the input pair. Currently I use with collect(node)[..1] as matchedTitles1 for that, but I think there has to be a better way to return the top matching job title node.
Any help will be deeply appreciated.
Your syntax error is because you're not passing through matchedTitles1 in your second WITH statement. Everything from before the WITH that you want to refer to after the WITH needs to be included in the WITH statement.
The following is valid Cypher:
CALL db.index.fulltext.queryNodes("full-text-job-title", '"software engineer" OR (software~0.7 engineer~0.7)') yield node, score
with collect(node)[..1] as matchedTitles1
CALL db.index.fulltext.queryNodes("full-text-job-title", '"software developer" OR (software~0.7 developer~0.7)') yield node, score
with collect(node)[..1] as matchedTitles2, matchedTitles1 // Pass through matchedTitles1 from the first CALL so it's visible in the RETURN
return matchedTitles1[0], matchedTitles2[0]
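On the second part of the question (a cleaner way to get the top match), one possible sketch is to sort by score and keep a single row after each CALL instead of collecting and slicing. The aliases title1, score1 and title2 are made up for this example.

```cypher
CALL db.index.fulltext.queryNodes("full-text-job-title", '"software engineer" OR (software~0.7 engineer~0.7)') YIELD node, score
// Keep only the best hit; rename so `node` and `score` are free for the second CALL
WITH node AS title1, score AS score1
ORDER BY score1 DESC LIMIT 1
CALL db.index.fulltext.queryNodes("full-text-job-title", '"software developer" OR (software~0.7 developer~0.7)') YIELD node, score
// Carry title1 through, exactly as matchedTitles1 has to be carried through above
WITH title1, node AS title2, score
ORDER BY score DESC LIMIT 1
RETURN title1, title2
```

One difference from the collect version: if either search finds nothing, this query returns no row at all, whereas collect(node)[..1] yields an empty list and the RETURN produces null.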
I am trying to write a Cypher query that extracts the set of labels occurring on nodes that share one specific label. After selecting the labels, I want to rename them, i.e. add a prefix to each label and rename it in the graph with the help of apoc.refactor.rename.label. I wrote the following query:
match (c:TheLabel)
with collect(distinct filter( l in labels(c) where not l in ["UNIQUE IMPORT LABEL","TheLabel"])[0]) as curr_label
unwind curr_label as cl
with cl as cl, "AR_"+cl as nl
call apoc.refactor.rename.label(cl, nl)
return null
But this query fails with the following error message:
Neo.ClientError.Statement.SyntaxError: Procedure call inside a query does not support naming results implicitly (name explicitly using `YIELD` instead) (line 5, column 1 (offset: 214))
"call apoc.refactor.rename.label(cl, nl) return null"
I can't see where I could use YIELD to get this query to run.
I tried the first part separately, i.e. returning nl and cl after the WITH, and it works fine. I also called the rename procedure with one specific cl and nl that I got from the first part, and that works fine too. Only the combination does not work.
Edit:
I figured out that any UNWIND seems to break the query, regardless of whether I use the variable defined by the UNWIND.
Minimal example that produces the same error:
unwind [1,2,3,4] as cl
call apoc.refactor.rename.label("Test", "Test")
return cl
Thanks in advance for any help or solutions.
If a procedure is defined to return any results, then the Cypher language requires that the CALL clause must be paired with a YIELD clause -- even if you don't care about any of the results. The only exception is when the entire Cypher statement consists of just a CALL clause (this is referred to in the docs as a "standalone procedure call").
To quote from the docs:
If the called procedure declares at least one result field, YIELD may
generally not be omitted. However YIELD may always be omitted in a
standalone procedure call. In this case, all result fields are yielded
as newly-bound variables from the procedure call to the user.
OK, after some experimenting I figured it out. You need to yield at least one of the fields returned by the call, for example:
unwind [1,2,3,4] as cl
call apoc.refactor.rename.label("Test", "Test")
yield total
return null // everything is possible for return.
I don't know why this works, but it does. Maybe it has something to do with the stream that the procedure produces, but I'm really not sure. If somebody knows why it solves my problem, please comment.
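Applied to the original query from the question, the same fix might look like the sketch below. It assumes total is the result field you want to see, and it swaps the filter() call for a list comprehension, since filter() was removed in Neo4j 4.x; the selection logic is otherwise unchanged.

```cypher
MATCH (c:TheLabel)
// For each node, take its first label that is not one of the two excluded ones
WITH collect(DISTINCT [l IN labels(c) WHERE NOT l IN ["UNIQUE IMPORT LABEL", "TheLabel"]][0]) AS curr_label
UNWIND curr_label AS cl
WITH cl, "AR_" + cl AS nl
// YIELD is mandatory because this CALL is embedded in a larger query
CALL apoc.refactor.rename.label(cl, nl) YIELD total
RETURN cl, nl, total
```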
I'd like to write a Cypher query that uses APOC procedures to remove all of the existing triggers.
I'm trying the following query:
CALL apoc.trigger.list() yield name
CALL apoc.trigger.remove(name) yield name, installed
but it fails with the following error:
Neo.ClientError.Statement.SyntaxError: Query cannot conclude with CALL
(must be RETURN or an update clause) (line 1, column 37 (offset: 36))
"CALL apoc.trigger.list() yield name CALL apoc.trigger.remove(name)
yield name, installed" ^
How to properly implement this query ?
As the error says, a query cannot end with a CALL (unless the CALL is the only statement in the query). It needs either a write operation (MERGE, CREATE, SET, REMOVE, DELETE) or a return.
You can add RETURN name, installed at the end, if you want to return the values yielded by the call. Otherwise, if you really don't care about what is returned, RETURN DISTINCT true ought to do the trick.
Oh, and you may want to alias name in one of your YIELDs or the other, as you may get an error of a variable name conflict.
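Putting both suggestions together, a minimal sketch could look like this (the alias triggerName is made up for the example):

```cypher
// Alias the first `name` so it does not clash with the `name` yielded by the second CALL
CALL apoc.trigger.list() YIELD name AS triggerName
CALL apoc.trigger.remove(triggerName) YIELD name, installed
RETURN name, installed
```

If your APOC version provides apoc.trigger.removeAll(), calling it on its own may be a simpler way to get the same result.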
I have a column in a csv that looks like this:
I am using this code to test how the splitting of the dates is working:
LOAD CSV WITH HEADERS FROM
'file:///..some_csv.csv' AS line
WITH
SPLIT(line.date_of_birth, '/') AS date_of_birth
return date_of_birth;
This code block works fine and gives me what I'd expect, which is a collection of three values for each date, or a null if there was no date, e.g.:
[4, 5, 1971]
[0, 0, 2003]
[0, 0, 2005]
. . .
null
null
. . .
My question is, what is this problem with the nulls that are created, and why can't I do a MERGE when there are nulls?
LOAD CSV WITH HEADERS FROM
'file:///..some_csv.csv' AS line
WITH
SPLIT(line.date_of_birth, '/') AS date_of_birth, line
MERGE (p:Person {
date_of_birth: date_of_birth
});
This block above gives me the error:
Cannot merge node using null property value for date_of_birth
I have searched around and have only found one other SO question about this error, which has no answer. Other searches didn't help.
I was under the impression that if there isn't a value, then Neo4j simply doesn't create the element.
I figured maybe the node can't be generated since, after all, how can a node be generated if there is no value to generate it from? So, since I know there are no ID's missing, maybe I could MERGE with ID and date, so Neo4j always sees a value.
But this code didn't fare any better (same error message):
LOAD CSV WITH HEADERS FROM
'file:///..some_csv.csv' AS line
WITH
SPLIT(line.date_of_birth, '/') AS date_of_birth, line
MERGE (p:Person {
ID: line.ID
,date_of_birth: date_of_birth
});
My next idea is that maybe this error is because I'm trying to split a null value on slashes? Maybe the whole issue is due to the SPLIT.
But alas, same error when simplified to this:
LOAD CSV WITH HEADERS FROM
'file:///..some_csv.csv' AS line
WITH line
MERGE (p:Person {
subject_person_id: line.subject_person_id
,date_of_birth: line.date_of_birth
});
So I don't really understand the cause of the error. Thanks for looking at this.
EDIT
@stdob-- and @cybersam have both answered with equally excellent responses. If you came here via Google, please consider them as if both were accepted.
As @cybersam said, MERGE does not work well when a property in its pattern is null. So, merge on the key that is never null and set the nullable property with ON CREATE and ON MATCH:
LOAD CSV WITH HEADERS FROM
'file:///..some_csv.csv' AS line
MERGE (p:Person {
subject_person_id: line.subject_person_id
})
ON CREATE SET p.date_of_birth = line.date_of_birth
ON MATCH SET p.date_of_birth = line.date_of_birth
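Because ON CREATE and ON MATCH set the same value here, a plain SET after the MERGE should behave the same way. The sketch below also applies the SPLIT from the question; it relies on SPLIT returning null for a null input and on SET removing (or simply not creating) a property that is assigned null.

```cypher
LOAD CSV WITH HEADERS FROM 'file:///..some_csv.csv' AS line
// Merge only on the key that is known to be present in every row
MERGE (p:Person {subject_person_id: line.subject_person_id})
// Runs for both new and matched nodes; a null date leaves the property absent
SET p.date_of_birth = SPLIT(line.date_of_birth, '/')
```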
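Some Cypher clauses, like MERGE, do not work well with NULL values: a comparison with null never evaluates to true, so a pattern with a null property could never match an existing node, and MERGE rejects it outright.
A somewhat tricky workaround for this situation is to use the FOREACH clause to perform the MERGE conditionally. This query might work for you: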
LOAD CSV WITH HEADERS FROM 'file:///..some_csv.csv' AS line
FOREACH (x IN CASE WHEN line.date_of_birth IS NULL THEN [] ELSE [1] END |
MERGE (:Person {date_of_birth: SPLIT(line.date_of_birth, '/')})
);
Another solution that I've been rather fond of is to just tell Cypher to skip rows in which the field of interest is NULL, as follows:
USING PERIODIC COMMIT #
LOAD CSV WITH HEADERS FROM
'file:///.../csv.csv' AS line
WITH line, SPLIT(line.somedatefield, delimiter) AS date
WHERE NOT line.somedatefield IS NULL
[THE REST OF YOUR QUERY INVOLVING THE FIELD]
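Filled in for the date_of_birth column from the question, the skip-null pattern could look like the sketch below. It assumes that rows without a date can be ignored entirely, which may not hold if you still want a Person node for them.

```cypher
LOAD CSV WITH HEADERS FROM 'file:///..some_csv.csv' AS line
// Drop rows with no date before they reach MERGE
WITH line
WHERE line.date_of_birth IS NOT NULL
MERGE (p:Person {
  subject_person_id: line.subject_person_id,
  date_of_birth: SPLIT(line.date_of_birth, '/')
})
```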
Or you can use COALESCE(n.property, defaultValue), which returns its first non-null argument.
Following Vojtech Ruzicka's approach, you can use something like this: your_value: COALESCE(line.your_value, 'default value')
Link to the documentation here, in case you need more information.
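As a sketch of how COALESCE would slot into the MERGE from the question: the default 'unknown' is a made-up placeholder, and every row without a date then shares that same value.

```cypher
LOAD CSV WITH HEADERS FROM 'file:///..some_csv.csv' AS line
MERGE (p:Person {
  subject_person_id: line.subject_person_id,
  // 'unknown' is a hypothetical default; MERGE never sees a null here
  date_of_birth: COALESCE(line.date_of_birth, 'unknown')
})
```

Whether a sentinel value is preferable to a missing property depends on how the data is queried later.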