How to create a parameter on a node with the name of the parameter being a value from the file being imported with Cypher/ APOC? - neo4j

I am new to neo4j and am trying to import some JSON-formatted data.
I have managed the first steps: reading all the JSON files and turning some of the data into nodes and edges. What I could not figure out is how to create a property on a node on the fly.
SET j[issn.type] = issn.value should create a new property on j, named after the value found in the JSON data, and give it the value issn.value. The latter should be fine, but j[issn.type] does not seem to work.
How do I achieve this?
Thanks
Full Query
call apoc.load.directory("*.json") yield value as files unwind files as file
CALL apoc.load.json(file) YIELD value as object
UNWIND object.items AS entry
MERGE (p:Publisher {name: entry.publisher})
MERGE (j:Journal {name: entry.`container-title`})
ON CREATE SET j.created = timestamp()
FOREACH (issn IN entry.`issn-type` |
SET j[issn.type] = issn.value
)
MERGE (p)-[r:publishes]->(j)
ON CREATE SET r.created = timestamp()
RETURN p

Cypher does not yet support the SET j[issn.type] = issn.value syntax for setting a property whose name is computed at run time. To achieve this, use the apoc.create.setProperty procedure, something like this:
call apoc.load.directory("*.json") yield value as files
unwind files as file
CALL apoc.load.json(file) YIELD value as object
UNWIND object.items AS entry
MERGE (p:Publisher {name: entry.publisher})
MERGE (j:Journal {name: entry.`container-title`})
ON CREATE SET j.created = timestamp()
WITH p, j, entry
UNWIND entry.`issn-type` AS issn
CALL apoc.create.setProperty(j, issn.type, issn.value)
YIELD node
MERGE (p)-[r:publishes]->(node)
ON CREATE SET r.created = timestamp()
RETURN p
See the APOC documentation for apoc.create.setProperty. I am assuming in the query that entry.`issn-type` is a list, which is why I UNWIND it: a CALL to an APOC procedure cannot be used inside a FOREACH loop.
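As an alternative sketch, assuming the JSON is read client-side in Python rather than through apoc.load.json, the dynamic property names can be collected into a map first and applied with SET j += $props. The entry below is hypothetical sample data shaped like the question's input, and issn_properties is an illustrative helper name, not part of any library.

```python
def issn_properties(entry):
    """Map each ISSN type to its value, e.g. {"print": "1234-5678"}."""
    return {issn["type"]: issn["value"] for issn in entry.get("issn-type", [])}


# Hypothetical entry, shaped like the data in the question.
entry = {
    "publisher": "Example Press",
    "container-title": "Journal of Examples",
    "issn-type": [
        {"type": "print", "value": "1234-5678"},
        {"type": "electronic", "value": "8765-4321"},
    ],
}

# SET j += $props adds or overwrites only the keys present in the map,
# so the property names never have to appear in the query text.
QUERY = """
MERGE (j:Journal {name: $name})
ON CREATE SET j.created = timestamp()
SET j += $props
"""

params = {"name": entry["container-title"], "props": issn_properties(entry)}
print(params["props"])
```

QUERY and params would then be passed together to whichever Neo4j client is in use.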

Related

Creating property-less nodes in Neo4j

I have a schema like (:A)-[:TYPE_1]-(:B)-[:TYPE_2]-(:A). I need to link [:TYPE_1] and [:TYPE_2] Relationships to certain other Nodes (Say, types C,D,E etc.). I had to create some Nodes without any properties, like (:A)-[:TYPE_1]-(:Action)--(:B)--(:Action)-[:TYPE_2]-(:A). The only purpose of the (:Action) Nodes is to enable me to link the action to some other Nodes (because I can't link a relationship to a Node). Thus, there are no properties associated with them. Since I changed my schema, I am finding that MERGE queries have slowed down incredibly. Obviously, I can't index the (:Action) Nodes, but all other Indexes are in place. What could be going wrong?
Edit:
My logic is:
1) There are multiple csv files.
2) Each row in each file provides one (a1:A)-[:TYPE_1]-(type_1:Action)--(b:B)--(type_2:Action)-[:TYPE_2]-(a2:A) pattern.
3) Two different files may provide the same a1, a2 and b entities.
4) However, if the file pertains to a1, it will give qualifiers for type_1, and if the file pertains to a2, it will give qualifiers for type_2.
5) Hence, I do an OPTIONAL MATCH to see if the pattern exists.
6) If it doesn't, I create the pattern, qualifying either type_1 or type_2 based on a parameter in the row called qualifier, which can be type_1 or type_2.
7) If it does, then I just qualify the type_1 or type_2 as the case may be.
statement = """
MERGE (file:File {id:$file})
WITH file
UNWIND $rows as row
MERGE (a1:A {id:row.a1})
ON CREATE
SET a1.name=row.a1_name
MERGE (a2:A {id:row.a2})
ON CREATE
SET a2.name=row.a2_name
MERGE (b:B {id:row.b})
ON CREATE
SET b.name = row.b_name
MERGE (c:C {id:row.c})
MERGE (d:D {id:row.d})
MERGE (e:E {id:row.e})
MERGE (b)-[:FROM_FILE]->(file)
WITH b,c,d,e,a1,a2,row
OPTIONAL MATCH (a1)-[:TYPE_1]->(type_1:Action)-[:INITIATED]->(b)<-[:INITIATED]-(type_2:Action)<-[:TYPE_2]-(a2)
WITH a1,b,a2,row,c,d,e,type_1,type_2
CALL apoc.do.when(type_1 is null,
"WITH a1,b,a2,row,c,d,e
CALL apoc.do.when(row.qualifier = 'type1',
'CREATE (type_1:Action)
CREATE (type_2:Action)
MERGE (a1)-[:TYPE_1 ]->(type_1)-[:INITIATED]->(b)<-[:INITIATED]-(type_2)<-[:TYPE_2 ]-(a2)
MERGE (type_1)-[:WITH_C]->(c)
MERGE (type_1)-[:WITH_D]->(d)
MERGE (type_1)-[:WITH_E]->(e)',
'CREATE (type_1:Action)
CREATE (type_2:Action)
MERGE (a1)-[:TYPE_1]->(type_1)-[:INITIATED]->(b)<-[:INITIATED]-(type_2)<-[:TYPE_2 ]-(a2)
MERGE (type_2)-[:WITH_C]->(c)
MERGE (type_2)-[:WITH_D]->(d)
MERGE (type_2)-[:WITH_E]->(e)',
{row:row,a1:a1,a2:a2,b:b,c:c,d:d,e:e})
YIELD value
RETURN value",
"
WITH row,c,d,e,type_1,type_2
CALL apoc.do.when(row.qualifier = 'type1',
'MERGE (type_1)-[:WITH_C]->(c)
MERGE (type_1)-[:WITH_D]->(d)
MERGE (type_1)-[:WITH_E]->(e)',
'MERGE (type_2)-[:WITH_C]->(c)
MERGE (type_2)-[:WITH_D]->(d)
MERGE (type_2)-[:WITH_E]->(e)',
{row:row,type_1:type_1,type_2:type_2,c:c,d:d,e:e})
YIELD value
RETURN value",
{row:row,a1:a1,a2:a2,b:b,c:c,d:d,e:e,type_1:type_1,type_2:type_2})
YIELD value
RETURN count(*) as count
"""
params = []
for row in df.itertuples():
    params_dict = {'a1': row[1], 'a1_name': row[-3], 'a2': row[2], 'a2_name': row[-4],
                   'b_name': row[3], 'b': row[-2], 'c': int(row[6]), 'd': row[7],
                   'e': row[5], 'qualifier': row[-1]}
    params.append(params_dict)
    if row[0] % 5000 == 0:
        graph.run(statement, parameters = {"rows" : params,'file':file})
        params = []
graph.run(statement, parameters = {"rows" : params,'file':file})
It's hard to say exactly what the issue is but I do notice that you use MERGE a bit more than you actually need to. In your apoc.do.when call you call
MERGE (a1)-[:TYPE_1 ]->(type_1)-[:INITIATED]->(b)<-[:INITIATED]-(type_2)<-[:TYPE_2 ]-(a2)
even though you know that you just created type_1 and type_2 so none of the relationships exist. If you change that to a CREATE you should see a speedup. The same logic applies to the other MERGE calls in that statement.
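As a rough sketch of that suggestion, here is how the inner branch for the "pattern does not exist yet, qualifier is type1" case might look with every MERGE swapped for CREATE. It is kept as a Python string in the same way as the question's statement; the constant name CREATE_BRANCH_TYPE_1 is hypothetical, and the snippet assumes a1, a2, b, c, d and e are already bound when the branch runs.

```python
# type_1 and type_2 are brand new in this branch, so nothing attached to them
# can already exist. CREATE skips the existence check that MERGE performs.
CREATE_BRANCH_TYPE_1 = """
CREATE (type_1:Action)
CREATE (type_2:Action)
CREATE (a1)-[:TYPE_1]->(type_1)-[:INITIATED]->(b)<-[:INITIATED]-(type_2)<-[:TYPE_2]-(a2)
CREATE (type_1)-[:WITH_C]->(c)
CREATE (type_1)-[:WITH_D]->(d)
CREATE (type_1)-[:WITH_E]->(e)
"""

print(CREATE_BRANCH_TYPE_1)
```

The branch for the other qualifier would follow the same shape, attaching c, d and e to type_2 instead.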

apoc.load.jdbc check row property before create

I'm using apoc.load.jdbc to get data from an Oracle database and create nodes from the rows. Here is the code:
call apoc.load.driver('oracle.jdbc.driver.OracleDriver')
WITH "jdbc:oracle:thin:@10.82.14.170:1521/ORACLE" as url
CALL apoc.load.jdbc(url,"select * from Patients",[],{credentials:{user:'KCB',password:'123'}}) YIELD row
Create (p:Person) set p=row
return p
That code works fine, but I want to check a row property before creating the node. Something like:
If (row.ID!=p.ID)
{
set p=row
}
Else{Not set}
How can I do that with my code? Thanks a lot!
As @TomažBratanič mentions in his answer, your desired conditional check makes no sense. That is, unless you also replace your CREATE clause.
Your query uses CREATE, which always creates a new p with no properties. A brand-new p has no ID, so the check can never find row.ID already present on it, and you would always end up executing the SET clause.
However, I believe your real intention is to avoid changing an existing Person node (and to avoid creating a duplicate Person node for the same person). So, below is a query that uses MERGE and ON CREATE to do that. I assume that people have unique ID values.
CALL apoc.load.driver('oracle.jdbc.driver.OracleDriver')
CALL apoc.load.jdbc(
"jdbc:oracle:thin:@10.82.14.170:1521/ORACLE",
"select * from Patients",[],{credentials:{user:'KCB',password:'123'}}
) YIELD row
MERGE (p:Person {ID: row.ID})
ON CREATE SET p = row
RETURN p
Also, you should consider creating an index (or uniqueness constraint) on :Person(ID) to optimize the lookup of existing Person nodes.
You can use a CASE statement to achieve this:
call apoc.load.driver('oracle.jdbc.driver.OracleDriver')
WITH "jdbc:oracle:thin:@10.82.14.170:1521/ORACLE" as url
CALL apoc.load.jdbc(url,"select * from Patients",[],{credentials:{user:'KCB',password:'123'}}) YIELD row
Create (p:Person) set p = CASE WHEN row.ID <> p.id THEN row ELSE null END
return p
However, this statement does not make sense, because you always create a new Person, so the row.ID will never be the same as p.id.

Neo4j/Cypher Create nodes from multiple nested json

I am trying to create a graph from the sample data below. I am new to Cypher and have learned from tutorials and Stack Overflow help, but I am stuck on the problem below: I am trying to create nodes from nested arrays for multiple properties.
I am following this question: UNWIND multiple unrelated arrays loaded from JSON file
Sample Data:
[ { 'organization': ['MIT','Univ. of CT'],
'student_names': ['Adam Smith'],
'unique_id': 'ABC123'},
{ 'organization': ['Harvard'],
'student_names': ['Adam Smith', 'Cate Scott'],
'unique_id': 'ABC124'},
{ 'organization': ['Harvard'],
'student_names': ['Mandy T.', 'Bob Smith'],
'unique_id': 'ABC125'}]
Here is what I tried:
CALL apoc.load.json('file:///test2.json') YIELD value AS class
MERGE (c:Class {name: class.name})
SET
c.organization = class.organization,
c.student_names = class.student_names
WITH c, class
UNWIND class.organization AS org
MERGE (o:Organization {name: org})
MERGE (o)-[:ACCEPTED]->(c)
WITH c, class
UNWIND class.student_names AS student
MERGE (s:StudentName {name: student})
MERGE (s)-[:ATTENDS]->(o)
I keep getting error Neo.ClientError.Statement.SemanticError: Cannot merge node using null property value for name. I don't see any null values in the data. What is causing this? How can I fix this? Thanks!!!
MERGE doesn't work if the property you are merging on has a null value.
Here, with MERGE (c:Class {name: class.name}), you are trying to merge the Class node on the property name, but there is no such property in the JSON, so class.name is null.
I guess you wanted to merge on the unique_id property, so you can change that line to:
MERGE (c:Class {unique_id: class.unique_id})
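The rest of the query runs once that is fixed. Note, though, that the second WITH c, class drops o from scope, so the final MERGE (s)-[:ATTENDS]->(o) no longer refers to the Organization node; add o to that WITH if students should be linked to the organizations.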
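A quick pre-flight check in Python can confirm this kind of diagnosis before loading. The sketch below uses the question's sample records; records_missing is a hypothetical helper name.

```python
records = [
    {"organization": ["MIT", "Univ. of CT"],
     "student_names": ["Adam Smith"],
     "unique_id": "ABC123"},
    {"organization": ["Harvard"],
     "student_names": ["Adam Smith", "Cate Scott"],
     "unique_id": "ABC124"},
    {"organization": ["Harvard"],
     "student_names": ["Mandy T.", "Bob Smith"],
     "unique_id": "ABC125"},
]


def records_missing(records, key):
    """Return the indexes of records whose `key` is absent or None."""
    return [i for i, rec in enumerate(records) if rec.get(key) is None]


# Every record lacks "name", which is what makes MERGE fail on class.name.
print(records_missing(records, "name"))       # [0, 1, 2]
# "unique_id" is present everywhere, so it is safe to MERGE on.
print(records_missing(records, "unique_id"))  # []
```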

Create node and relationship given parent node

I am creating a word tree but when I execute this cypher query:
word = "MATCH {} MERGE {}-[:contains]->(w:WORD {{name:'{}'}}) RETURN w".format(
    parent_node, parent_node, locality[i])
where parent_node has a type Node
It throws this error:
py2neo.cypher.error.statement.InvalidSyntax: Can't create `n8823` with properties or labels here. It already exists in this context
formatted query looks like this:
'MATCH (n8823:HEAD {name:"sanjay"}) MERGE (n8823:HEAD {name:"sanjay"})-[:contains]->(w:WORD {name:\'colony\'}) RETURN w'
The formatted query is broken and won't work, but I also don't see how that could be what the formatted query actually looks like. When you do your string format you pass the same parameter (parent_node) twice so the final string should repeat whatever that parameter looks like. It doesn't, and instead has two different patterns for the match and merge clauses.
Your query should look something like
MATCH (n8823:Head {name: "sanjay"})
MERGE (n8823)-[:CONTAINS]->(w:Word {name: "colony"})
RETURN w
It's probably a bad idea to do string formatting on a Node object. Better to either use property values from your node object in a Cypher query to match the right node (and only the variable that you bind the matched node to in the merge clause) or use the methods of the node object to do the merge.
Although the MERGE clause is able to bind identifiers (like n8823), Cypher unfortunately does not allow MERGE to re-bind an identifier that had already been bound -- even if it would not actually change the binding. (On the other hand, the MATCH clause does allow "rebinding" to the same binding.) Simply re-using a bound identifier is OK, though.
So, the workaround is to change your Cypher query to re-use the bound identifier. Also, the recommended way to dynamically specify query data without changing the overall structure of a query is to use "query parameters". For py2neo, code along these lines should work for you (note that the parent_name variable would contain a name string, like "sanjay"):
from py2neo import Graph

graph = Graph()
cypher = graph.cypher
# {a} and {b} are Cypher query parameters, filled in from the keyword arguments
results = cypher.execute(
    "MATCH (foo:HEAD {name:{a}}) MERGE (foo)-[:contains]->(w:WORD {name:{b}}) RETURN w",
    a=parent_name, b=locality[i])
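As a minimal sketch of the query-parameter idea, the helper below keeps the Cypher text fixed and returns the values separately. The function name build_contains_query and the parameter names parent_name and word are hypothetical. The $name parameter syntax assumes Neo4j 3.0 or later, and the HEAD and WORD labels are taken from the question.

```python
def build_contains_query(parent_name, word):
    """Return a fixed Cypher statement plus its parameter map.

    No value is formatted into the query text, so the statement stays the
    same for every word and the values cannot break its syntax.
    """
    query = (
        "MATCH (parent:HEAD {name: $parent_name}) "
        "MERGE (parent)-[:contains]->(w:WORD {name: $word}) "
        "RETURN w"
    )
    params = {"parent_name": parent_name, "word": word}
    return query, params


query, params = build_contains_query("sanjay", "colony")
print(query)
print(params)
```

The pair can then be handed to whichever client is in use, for example graph.run(query, params) in more recent py2neo versions (an assumption about your setup).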

Neo4j merge return something only if it was created

Neo4j's MERGE will create a new node if it doesn't exist, and it offers ON CREATE and ON MATCH to distinguish the two cases. However, is there a way to return different information depending on whether the node was created or matched?
MERGE (charlie { name:'Charlie Sheen' })
ON CREATE SET charlie.name = 'Charlie'
RETURN charlie
Something like: ON CREATE RETURN 1, ON MERGE RETURN 0
There's a relevant example on the merge page of the documentation:
MERGE (keanu:Person { name:'Keanu Reeves' })
ON CREATE SET keanu.created = timestamp()
ON MATCH SET keanu.lastSeen = timestamp()
RETURN keanu.name, has(keanu.lastSeen);
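Basically, this encodes your 0 or 1 flag in the presence or absence of the lastSeen property, which is only set when the node was matched. (has() is the older spelling; newer Cypher versions use keanu.lastSeen IS NOT NULL.)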
Your query has this oddity that you're matching on "Charlie Sheen" but then modifying the value you matched (name) to be "Charlie". That's odd -- it means that because you're modifying it each time, even if you add an ON MATCH clause, it will never fire. You'll create it new each time, then modify it, guaranteeing it will be created new the next time you run that query. I would guess that it's a bad idea to modify the terms of your match later in the query, unless it's a special circumstance.
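To make that last point concrete, here is a toy in-memory model of MERGE in Python. It is only an illustration of the match-or-create logic, not how Neo4j works internally; merge_by_name and the list-of-dicts store are hypothetical.

```python
def merge_by_name(store, name):
    """Toy MERGE: return (node, created) for a node with the given name."""
    for node in store:
        if node["name"] == name:
            return node, False   # matched an existing node
    node = {"name": name}
    store.append(node)
    return node, True            # had to create a new node


store = []

# Merging on a stable key: created the first time, matched the second time.
_, created_first = merge_by_name(store, "Keanu Reeves")
_, created_again = merge_by_name(store, "Keanu Reeves")

# The question's query: merge on 'Charlie Sheen', then rename on create.
charlie_flags = []
for _ in range(2):
    node, created = merge_by_name(store, "Charlie Sheen")
    if created:
        node["name"] = "Charlie"  # ON CREATE SET charlie.name = 'Charlie'
    charlie_flags.append(created)

print(created_first, created_again)  # True False
print(charlie_flags)                 # [True, True]
```

Because the rename removes the very value the merge looks for, each run finds nothing to match and adds another node named Charlie.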
