Creating relationships based on nested list - neo4j

I'm building a graph database, and some of the relationships are based on information in nested lists. The relevant nodes are (b:Bundle) and (o:Object); the bundles require certain objects with different quantities and qualities. The nested list that contains these requirements has the format [ [object1, quantity1, quality1], [object2, quantity2, quality2], ... ]
but in the .csv file that I'm using the field has the format
o1,qn1,ql1|o2,qn2,ql2|... The relationship I want to create is
(b)-[r:REQUIRES {quantity, quality}]->(o).
I've tried using various combinations of SPLIT, UNWIND, and FOREACH. A minimal example from my data set:
id: 1
requirements: 24,1,0|188,1,0|190,1,0|192,1,0
That is to say, (b:Bundle {id:1}) -[r:REQUIRES {quantity:1, quality:0}]-> (o:Object {id:24}) and so on.
LOAD CSV WITH HEADERS FROM 'file:///bundles.csv' AS line
WITH SPLIT( UNWIND SPLIT ( line.requirements, '|' ), ',') as reqList
MATCH ( o:Object { id:TOINTEGER(reqList[0]) } )
MATCH ( b:Bundle { id:TOINTEGER(line.id) } )
MERGE (b) -[r:REQUIRES]-> (o)
ON CREATE SET r.quantity = TOINTEGER(reqList[1]),
r.quality = TOINTEGER(reqList[2]);
The error this query gives is
Neo.ClientError.Statement.SyntaxError: Invalid input 'P': expected 't/T' (line 2, column 22 (offset: 78))
" WITH SPLIT( UNWIND SPLIT ( line.requirements, '|' ), ',') as reqList"
^

Assuming your CSV file actually looks like this:
id requirements
1 24,1,0|188,1,0|190,1,0|192,1,0
then this query should work. Note that UNWIND is a clause, not a function, so it cannot appear inside a SPLIT() call (which is what caused the syntax error); split on '|' first, then UNWIND the resulting list:
LOAD CSV WITH HEADERS FROM 'file:///bundles.csv' AS line FIELDTERMINATOR ' '
WITH line.id AS id, SPLIT(line.requirements, '|' ) AS reqsList
UNWIND reqsList AS reqsString
WITH id, SPLIT(reqsString, ',') AS reqs
MATCH ( o:Object { id:TOINTEGER(reqs[0]) } )
MATCH ( b:Bundle { id:TOINTEGER(id) } )
MERGE (b) -[r:REQUIRES]-> (o)
ON CREATE SET r.quantity = TOINTEGER(reqs[1]),
r.quality = TOINTEGER(reqs[2]);
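The transformation the corrected query performs (split on '|', UNWIND, then split on ',') can be sketched in plain Python, assuming the same requirements format:

```python
# Parse a requirements field of the form "o1,qn1,ql1|o2,qn2,ql2|..."
# into a list of [object_id, quantity, quality] integer triples,
# mirroring the SPLIT/UNWIND pipeline in the Cypher query above.
def parse_requirements(field):
    triples = []
    for reqs_string in field.split('|'):       # UNWIND SPLIT(field, '|')
        reqs = reqs_string.split(',')          # SPLIT(reqsString, ',')
        triples.append([int(x) for x in reqs]) # TOINTEGER(...)
    return triples

print(parse_requirements("24,1,0|188,1,0|190,1,0|192,1,0"))
# [[24, 1, 0], [188, 1, 0], [190, 1, 0], [192, 1, 0]]
```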

Related

How do I use these CALLs correctly?

I've used the first CALL after LOAD CSV to split the import into transactions, and the second to create relations between two types of nodes, Legal_Entity and Natural_Person. The SYSTEM node is my analog of a dict: it holds associations between numbers (numeric codes for relation types) and the text of those relations.
Depending on whether the relation type targets a Legal_Entity or a Natural_Person, I need to connect the other node accordingly (e.g., code 100 means a LEGAL-LEGAL connection with the text "has a division", while code 110 means a LEGAL-PHYSICAL connection with the text "founded by").
Now I need a query that determines which type of node the connection is being built to and builds it accordingly.
If needed, I can add more clarifying info about the data used.
:auto LOAD CSV WITH HEADERS FROM 'file:///CSVs/связи_фикс_все.csv' as row
call {
with row
match (legal:Legal_Entity {hid_party: row['first_related_hid']})
match (sys:SYSTEM)
with sys, legal,
row,
sys['type_' + tostring(row['id_relation_type'])] as relation_data,
sys[row['second_related_type']] as rel_type
CALL {
WITH rel_type, relation_data, row, legal
WITH rel_type, relation_data, row, legal
WHERE rel_type = 'Legal_Entity'
match (legal_2:Legal_Entity {hid_party : row['second_related_hid']})
CALL apoc.create.relationship(legal, relation_data[1], NULL, legal_2) YIELD rel1
CALL apoc.create.relationship(legal_2, relation_data[3], NULL, legal) YIELD rel2
return rel_type as rel_type_2
UNION
WITH rel_type, relation_data, row, legal
WITH rel_type, relation_data, row, legal
WHERE rel_type = 'Natural_Person'
match (natural:Legal_Entity {hid_party : row['second_related_hid']})
CALL apoc.create.relationship(legal, relation_data[1], NULL, natural) YIELD rel1
CALL apoc.create.relationship(natural, relation_data[3], NULL, legal) YIELD rel2
return rel_type as rel_type_2
} return rel_type_2
} IN TRANSACTIONS of 10 rows
The subqueries seem to behave oddly when nested. You can simply use PERIODIC COMMIT if you want to import data in batches, and get rid of the top-level subquery.
:auto USING PERIODIC COMMIT 10
LOAD CSV WITH HEADERS FROM 'file:///CSVs/связи_фикс_все.csv' as row
with row
MERGE (legal:Legal_Entity {hid_party: row['first_related_hid']})
MERGE (sys:SYSTEM)
with sys, legal,
row,
sys['type_' + tostring(row['id_relation_type'])] as relation_data,
sys[row['second_related_type']] as rel_type
CALL {
WITH rel_type, relation_data, row, legal
WITH rel_type, relation_data, row, legal
WHERE rel_type = 'Legal_Entity'
match (legal_2:Legal_Entity {hid_party : row['second_related_hid']})
CALL apoc.create.relationship(legal, relation_data[1], NULL, legal_2) YIELD rel AS rel1
CALL apoc.create.relationship(legal_2, relation_data[3], NULL, legal) YIELD rel AS rel2
return rel_type as rel_type_2
UNION
WITH rel_type, relation_data, row, legal
WITH rel_type, relation_data, row, legal
WHERE rel_type = 'Natural_Person'
match (natural:Natural_Person {hid_party : row['second_related_hid']})
CALL apoc.create.relationship(legal, relation_data[1], NULL, natural) YIELD rel AS rel1
CALL apoc.create.relationship(natural, relation_data[3], NULL, legal) YIELD rel AS rel2
return rel_type as rel_type_2
}
RETURN distinct 'done'
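Conceptually, the SYSTEM node here is a lookup table from relation-type codes to target labels and relation texts. In plain Python the dispatch amounts to something like the sketch below; the codes and texts are taken from the question, but the table layout is an assumption:

```python
# Hypothetical lookup table mirroring the SYSTEM node: a relation-type
# code maps to the target node label and the relation text.
RELATION_TYPES = {
    100: ("Legal_Entity",   "has a division"),  # LEGAL-LEGAL
    110: ("Natural_Person", "founded by"),      # LEGAL-PHYSICAL
}

def dispatch(code):
    """Return (target_label, relation_text) for a relation-type code."""
    return RELATION_TYPES[code]

label, text = dispatch(110)  # connect to a Natural_Person node
```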

Toggle relationship in Neo4J

I'm trying to implement follow/unfollow in Neo4J. I would like to write a query that would toggle the relationship between two nodes.
I currently have the following query:
neoSession.writeTransaction(tx => tx.run('MATCH (me:User), (other:User) WHERE ID(me) = $me AND ID(other) = $other OPTIONAL MATCH (me)-[af:FOLLOWS]->(other) CALL apoc.do.when(af IS NULL, CREATE (me)-[f:FOLLOWS]->(other), DELETE af)', { me: req.user_id, other: req.body.user, datetime: Date.now() }));
Prettified query-only:
MATCH (me:User), (other:User)
WHERE ID(me) = $me AND ID(other) = $other
OPTIONAL MATCH (me)-[af:FOLLOWS]->(other)
CALL
apoc.do.when(
af IS NULL,
CREATE (me)-[f:FOLLOWS]->(other),
DELETE af
)
But this results in the error
Neo4jError: Invalid input '>' (line 1, column 169 (offset: 168))
"MATCH (me:User), (other:User) WHERE ID(me) = $me AND ID(other) = $other OPTIONAL MATCH (me)-[af:FOLLOWS]->(other) CALL apoc.do.when(af IS NULL, CREATE (me)-[f:FOLLOWS]->(other), DELETE af)"
The queries (last two arguments) to apoc.do.when() have to be strings, so quote each of them.
Also, in order for each of those queries to use those variables, you need to pass those variables in a parameter map as a 4th argument.
Each of the conditional queries must RETURN something, otherwise there will be no rows yielded and anything after would be a no-op.
The call must YIELD value, so that needs to be present, and last, a query cannot end with a procedure call, so you need to RETURN something.
This one should work, you can adjust it as needed:
MATCH (me:User), (other:User)
WHERE ID(me) = $me AND ID(other) = $other
OPTIONAL MATCH (me)-[af:FOLLOWS]->(other)
CALL
apoc.do.when(
af IS NULL,
"CREATE (me)-[f:FOLLOWS]->(other) RETURN f",
"DELETE af RETURN null as f",
{me:me, other:other, af:af}
) YIELD value
RETURN value.f as f
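The toggle logic that apoc.do.when implements here can be sketched in Python with a set of (follower, followee) pairs, assuming integer user ids:

```python
# Toggle a FOLLOWS relationship: create it if absent, delete it if
# present, mirroring the apoc.do.when(af IS NULL, ...) branches above.
def toggle_follow(follows, me, other):
    pair = (me, other)
    if pair in follows:   # af IS NOT NULL -> DELETE af
        follows.remove(pair)
        return None
    follows.add(pair)     # af IS NULL -> CREATE (me)-[:FOLLOWS]->(other)
    return pair

follows = set()
toggle_follow(follows, 1, 2)   # follow
toggle_follow(follows, 1, 2)   # unfollow
```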

Handling empty array types when using APOC to import a CSV to neo4j

I have a csv file wherein some fields are array-types. Fields are separated with , and array items are separated with ;. For example:
index, name, friends, neighbors
0,Jim,John;Tim;Fred,Susan;Megan;Cheryl
1,Susan,Jim;John,Megan;Cheryl
2,Sean,,,
where Jim has three friends, John, Tim, and Fred, and three neighbors, Susan, Megan, and Cheryl, and Sean has no friends and no neighbors.
However, when I read this into neo4j using apoc.load.csv, I end up with list properties with empty strings inside of them (rather than empty lists). For example:
CALL apoc.periodic.iterate("
CALL apoc.load.csv('file.csv',
{header:true,sep:',',
mapping:{
friends:{array:true},
neighbors:{array:true}}
})
YIELD map as row RETURN row
","
CREATE (p:Person) SET p = row
",
{batchsize:50000, iterateList:true, parallel:true});
This gives me a Person named Sean, but with friends=[ "" ] and neighbors=[ "" ].
What I want is Sean to have friends=[] and neighbors=[].
Thank you!
Make sure there are no extraneous spaces in your CSV file header (or else some property names will start with a space):
index,name,friends,neighbors
0,Jim,John;Tim;Fred,Susan;Megan;Cheryl
1,Susan,Jim;John,Megan;Cheryl
2,Sean,,
Use list comprehension to help eliminate all friends and neighbors elements that are empty strings:
CALL apoc.periodic.iterate(
"CALL apoc.load.csv(
'file.csv',
{
header:true, sep:',',
mapping: {
friends: {array: true},
neighbors: {array: true}
}
}) YIELD map
RETURN map
",
"CREATE (p:Person)
SET p = map
SET p.friends = [f IN p.friends WHERE f <> '']
SET p.neighbors = [n IN p.neighbors WHERE n <> '']
",
{batchsize:50000, iterateList:true, parallel:true}
);
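The list comprehension in the second statement behaves like a plain filter; in Python terms:

```python
# apoc.load.csv turns an empty array field into [""], so filter out
# empty strings, mirroring [f IN p.friends WHERE f <> ''].
def drop_empty(items):
    return [item for item in items if item != '']

drop_empty([""])                     # -> []
drop_empty(["John", "Tim", "Fred"])  # -> unchanged
```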
With the above changes, this query:
MATCH (person:Person) RETURN person;
returns this result:
╒══════════════════════════════════════════════════════════════════════╕
│"person" │
╞══════════════════════════════════════════════════════════════════════╡
│{"name":"Jim","index":"0","neighbors":["Susan","Megan","Cheryl"],"frie│
│nds":["John","Tim","Fred"]} │
├──────────────────────────────────────────────────────────────────────┤
│{"name":"Susan","index":"1","neighbors":["Megan","Cheryl"],"friends":[│
│"Jim","John"]} │
├──────────────────────────────────────────────────────────────────────┤
│{"name":"Sean","index":"2","neighbors":[],"friends":[]} │
└──────────────────────────────────────────────────────────────────────┘
[UPDATED]
Also, if it is not possible for your CSV file to contain an "empty" friend or neighbor substring (e.g., John;;Fred), then this version of the query that uses CASE instead of list comprehension would be more efficient:
CALL apoc.periodic.iterate(
"CALL apoc.load.csv(
'file.csv',
{
header:true, sep:',',
mapping: {
friends: {array: true},
neighbors: {array: true, arraySep:';'}
}
}) YIELD map
RETURN map
",
"CREATE (p:Person)
SET p = map
SET p.friends = CASE p.friends WHEN [''] THEN [] ELSE p.friends END
SET p.neighbors = CASE p.neighbors WHEN [''] THEN [] ELSE p.neighbors END
",
{batchsize:50000, iterateList:true, parallel:true}
);

Node creation using cypher Foreach

I have 2 csv files and their structure is as follows:
1.csv
id name age
1 aa 23
2 bb 24
2.csv
id product location
1 apple CA
2 samsung PA
1 HTC AR
2 philips CA
3 sony AR
// 1.csv
LOAD CSV WITH HEADERS FROM "file:///G:/1.csv" AS csvLine
CREATE (a:first { id: toInt(csvLine.id), name: csvLine.name, age: csvLine.age})
// 2.csv
LOAD CSV WITH HEADERS FROM "file:///G:/2.csv" AS csvLine
CREATE (b:second { id: toInt(csvLine.id), product: csvLine.product, location: csvLine.location})
Now I want to create another node labeled "third", using the following Cypher query.
LOAD CSV WITH HEADERS FROM "file:///G:/1.csv" AS csvLine
MATCH c = (a:first), d = (b.second)
FOREACH (n IN nodes(c) |
CREATE (e:third)
SET e.name = label(a) + label(b) + "id"
SET e.origin = label(a)
SET e.destination = label(b)
SET e.param = a.id)
But the above query gives me duplicate entries; I think it runs twice after the load. Please suggest a fix or an alternative way to do this.
CREATE always creates, even if something is already there. So that's why you're getting duplicates. You probably want MERGE which only creates an item if it doesn't already exist.
I wouldn't ever do CREATE (e:third) or MERGE (e:third) because without specifying properties, you'll end up with duplicates anyway. I'd change this:
CREATE (e:third)
SET e.name = label(a) + label(b) + "id"
SET e.origin = label(a)
SET e.destination = label(b)
SET e.param = a.id)
To this:
MERGE (e:third { name: labels(a)[0] + labels(b)[0] + "id",
origin: labels(a)[0],
destination: labels(b)[0],
param: a.id })
This then would create the same node when necessary, but avoid creating duplicates with all the same property values.
Here's the documentation on MERGE
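The difference between CREATE and MERGE can be sketched with a plain Python list of property tuples: CREATE appends unconditionally, while MERGE is an idempotent insert (the tuples below are hypothetical stand-ins for node properties):

```python
# A minimal sketch of CREATE vs MERGE semantics over "property tuples".
nodes = []

def create(props):
    nodes.append(props)       # CREATE always adds, even if a duplicate

def merge(props):
    if props not in nodes:    # MERGE adds only when no match exists
        nodes.append(props)

create(("firstsecondid", "first", "second", 1))
create(("firstsecondid", "first", "second", 1))  # duplicate created
merge(("firstsecondid", "first", "second", 1))   # no-op: already present
```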
You don't use csvLine at all for matching the :first and :second nodes, so your query doesn't make sense.
This doesn't make sense either:
MATCH c = (a:first), d = (b.second)
FOREACH (n IN nodes(c) |
CREATE (e:third)
c is a path containing a single node, i.e. (a), so instead of the FOREACH you can use a directly.

Neo4j: Conditional return/IF clause/String manipulation

This is in continuation of Neo4j: Listing node labels
I am constructing a dynamic MATCH statement to return the hierarchy structure and use the output as Neo4j JDBC input to query the data from a Java method:
MATCH p=(:Service)<-[*]-(:Anomaly)
WITH head(nodes(p)) AS Service, p, count(p) AS cnt
RETURN DISTINCT Service.company_id, Service.company_site_id,
"MATCH srvhier=(" +
reduce(labels = "", n IN nodes(p) | labels + labels(n)[0] +
"<-[:BELONGS_TO]-") + ") WHERE Service.company_id = {1} AND
Service.company_site_id = {2} AND Anomaly.name={3} RETURN " +
reduce(labels = "", n IN nodes(p) | labels + labels(n)[0] + ".name,");
The output is as follows:
MATCH srvhier=(Service<-[:BELONGS_TO]-Category<-[:BELONGS_TO]-SubService<-
[:BELONGS_TO]-Assets<-[:BELONGS_TO]-Anomaly<-[:BELONGS_TO]-) WHERE
Service.company_id = {1} and Service.company_site_id = {21} and
Anomaly.name={3} RETURN Service.name, Category.name, SubService.name,
Assets.name, Anomaly.name,
The problem I am seeing:
The "BELONGS_TO" gets appended after my last node
Line 2: Assets<-[:BELONGS_TO]-Anomaly<-[:BELONGS_TO]- (note the trailing <-[:BELONGS_TO]-)
Are there string functions (I have looked at substring...) that can be used to remove it? Or can I use a CASE statement with the condition n=cnt to decide whether to append "BELONGS_TO"?
The same problem occurs on my last line:
Line 5: Assets.name,Anomaly.name, (the extra trailing "," that I need to eliminate)
Thanks.
I think you need to introduce a CASE expression into the reduce clause, something like the snippet below. If the node isn't the last element of the collection, append the "<-[:BELONGS_TO]-" relationship; if it is the last element, don't.
...
reduce(labels = "", n IN nodes(p) |
CASE
WHEN n <> nodes(p)[length(nodes(p))-1] THEN
labels + labels(n)[0] + "<-[:BELONGS_TO]-"
ELSE
labels + labels(n)[0]
END
...
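The CASE-in-reduce above is solving the classic trailing-separator problem; in an ordinary language it is just a join. A Python sketch over the label list from the question:

```python
# Build "Service<-[:BELONGS_TO]-...-Anomaly" with no trailing separator,
# equivalent to the CASE-guarded reduce above.
def build_pattern(labels):
    return "<-[:BELONGS_TO]-".join(labels)

build_pattern(["Service", "Category", "SubService", "Assets", "Anomaly"])
```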
Cypher has a substring function that works basically like you'd expect. An example: here's how you'd return everything but the last three characters of a string:
return substring("hello", 0, length("hello")-3);
(That returns "he")
So you could use substring to trim the last separator off of your query that you don't want.
But I don't understand why you're building your query in such a complex way. You're using Cypher to write Cypher (which is OK), but (and I don't understand your data model 100%) it seems to me that there's probably an easier way to write this query.
