Creating property-less nodes in Neo4j - neo4j

I have a schema like (:A)-[:TYPE_1]-(:B)-[:TYPE_2]-(:A). I need to link [:TYPE_1] and [:TYPE_2] Relationships to certain other Nodes (Say, types C,D,E etc.). I had to create some Nodes without any properties, like (:A)-[:TYPE_1]-(:Action)--(:B)--(:Action)-[:TYPE_2]-(:A). The only purpose of the (:Action) Nodes is to enable me to link the action to some other Nodes (because I can't link a relationship to a Node). Thus, there are no properties associated with them. Since I changed my schema, I am finding that MERGE queries have slowed down incredibly. Obviously, I can't index the (:Action) Nodes, but all other Indexes are in place. What could be going wrong?
Edit:
My logic is as follows:
1) There are multiple csv files.
2) Each row in each file provides one (a1:A)-[:TYPE_1]-(type_1:Action)--(b:B)--(type_2:Action)-[:TYPE_2]-(a2:A) pattern.
3) Two different files may provide the same a1, a2 and b entities.
4) However, if the file pertains to a1, it will give qualifiers for type_1, and if the file pertains to a2, it will give qualifiers for type_2.
5) Hence, I do an OPTIONAL MATCH to see if the pattern exists.
6) If it doesn't, I create the pattern, qualifying either type_1 or type_2 based on a parameter in the row called qualifier, which can be type_1 or type_2.
7) If it does, then I just qualify type_1 or type_2, as the case may be.
statement = """
MERGE (file:File {id:$file})
WITH file
UNWIND $rows as row
MERGE (a1:A {id:row.a1})
ON CREATE
SET a1.name=row.a1_name
MERGE (a2:A {id:row.a2})
ON CREATE
SET a2.name=row.a2_name
MERGE (b:B {id:row.b})
ON CREATE
SET b.name = row.b_name
MERGE (c:C {id:row.c})
MERGE (d:D {id:row.d})
MERGE (e:E {id:row.e})
MERGE (b)-[:FROM_FILE]->(file)
WITH b,c,d,e,a1,a2,row
OPTIONAL MATCH (a1)-[:TYPE_1]->(type_1:Action)-[:INITIATED]->(b)<-[:INITIATED]-(type_2:Action)<-[:TYPE_2]-(a2)
WITH a1,b,a2,row,c,d,e,type_1,type_2
CALL apoc.do.when(type_1 is null,
"WITH a1,b,a2,row,c,d,e
CALL apoc.do.when(row.qualifier = 'type1',
'CREATE (type_1:Action)
CREATE (type_2:Action)
MERGE (a1)-[:TYPE_1 ]->(type_1)-[:INITIATED]->(b)<-[:INITIATED]-(type_2)<-[:TYPE_2 ]-(a2)
MERGE (type_1)-[:WITH_C]->(c)
MERGE (type_1)-[:WITH_D]->(d)
MERGE (type_1)-[:WITH_E]->(e)',
'CREATE (type_1:Action)
CREATE (type_2:Action)
MERGE (a1)-[:TYPE_1]->(type_1)-[:INITIATED]->(b)<-[:INITIATED]-(type_2)<-[:TYPE_2 ]-(a2)
MERGE (type_2)-[:WITH_C]->(c)
MERGE (type_2)-[:WITH_D]->(d)
MERGE (type_2)-[:WITH_E]->(e)',
{row:row,a1:a1,a2:a2,b:b,c:c,d:d,e:e})
YIELD value
RETURN value",
"
WITH row,c,d,e,type_1,type_2
CALL apoc.do.when(row.qualifier = 'type1',
'MERGE (type_1)-[:WITH_C]->(c)
MERGE (type_1)-[:WITH_D]->(d)
MERGE (type_1)-[:WITH_E]->(e)',
'MERGE (type_2)-[:WITH_C]->(c)
MERGE (type_2)-[:WITH_D]->(d)
MERGE (type_2)-[:WITH_E]->(e)',
{row:row,type_1:type_1,type_2:type_2,c:c,d:d,e:e})
YIELD value
RETURN value",
{row:row,a1:a1,a2:a2,b:b,c:c,d:d,e:e,type_1:type_1,type_2:type_2})
YIELD value
RETURN count(*) as count
"""
params = []
for row in df.itertuples():
    params_dict = {'a1': row[1], 'a1_name': row[-3],'a2':row[2],'a2_name':row[-4],'b_name':row[3],'b':row[-2],'c':int(row[6]),'d':row[7],'e':row[5],'qualifier':row[-1]}
    params.append(params_dict)
    if row[0] % 5000 == 0:
        graph.run(statement, parameters = {"rows" : params,'file':file})
        params = []
graph.run(statement, parameters = {"rows" : params,'file':file})

It's hard to say exactly what the issue is but I do notice that you use MERGE a bit more than you actually need to. In your apoc.do.when call you call
MERGE (a1)-[:TYPE_1 ]->(type_1)-[:INITIATED]->(b)<-[:INITIATED]-(type_2)<-[:TYPE_2 ]-(a2)
even though you know that you just created type_1 and type_2 so none of the relationships exist. If you change that to a CREATE you should see a speedup. The same logic applies to the other MERGE calls in that statement.
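On the client side, the batching loop in the question could also be written with a small chunking helper. Assuming the DataFrame has a default integer index, the `row[0] % 5000 == 0` check sends a one-row batch for the very first row and may send an empty batch at the end. The sketch below avoids both. The helper name `batched_rows` is hypothetical, and the `graph.run` call is left as a comment because it needs a live database:

```python
from typing import Dict, Iterable, Iterator, List


def batched_rows(rows: Iterable[Dict], size: int = 5000) -> Iterator[List[Dict]]:
    """Yield lists of at most `size` rows; never yields an empty list."""
    batch: List[Dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


# Hypothetical usage with the question's objects (requires a live Neo4j connection):
# for batch in batched_rows(all_param_dicts):
#     graph.run(statement, parameters={"rows": batch, "file": file})

sizes = [len(b) for b in batched_rows(({"a1": i} for i in range(12)), size=5)]
print(sizes)  # [5, 5, 2]
```

This only changes how rows are grouped before they are sent; it does not by itself address the MERGE cost discussed above.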

Related

How to return nodes that have only one given relationship

I have nodes that represent documents, and nodes that represent entities. Entities can be referenced in document, if so, they are linked together with a relationship like that :
(doc)<-[:IS_REFERENCED_IN]-(entity)
The same entity can be referenced in several documents, and a document can reference several entities.
I'd like to delete, for a given document, every entity that is referenced only in that document.
I thought of two different ways to do this.
The first one uses java to make a foreach and would basically be something like that :
List<Entity> entities = MATCH (d:Document {id:0})<-[:IS_REFERENCED_IN]-(e:Entity) return e
for (Entity entity : entities){
MATCH (e:Entity)-[r:IS_REFERENCED_IN]->(d:Document) WITH *, count(r) as nb_document_linked WHERE nb_document_linked = 1 DELETE e
}
This method would work, but I'd rather not use a foreach loop or Java code. I'd like to do it in a single Cypher query.
The second one uses only one cypher query but doesn't work. It's something like that :
MATCH (d:Document {id:0})<-[:IS_REFERENCED_IN]-(e:Entity)-[r:IS_REFERENCED_IN]->(d:Document) WITH *, count(r) as nb_document_linked WHERE nb_document_linked = 1 DELETE e
The problem here is that nb_document_linked is not computed per entity; it is a single value for all the entities, which means it counts every relationship of every entity, which I don't want.
So how could I make a kind of a foreach in my cypher query to make it work?
Sorry for my english, I hope the question is clear, if you need any information please ask me.
You can do something like:
MATCH (d:Document{key:1})<-[:IS_REFERENCED_IN]-(e:Entity)
WITH e
MATCH (d:Document)<-[:IS_REFERENCED_IN]-(e)
WITH COUNT (d) AS countD, e
WHERE countD=1
DETACH DELETE e
Which you can see working on this sample data:
MERGE (a:Document {key: 1})
MERGE (b:Document {key: 2})
MERGE (c:Document {key: 3})
MERGE (d:Entity{key: 4})
MERGE (e:Entity{key: 5})
MERGE (f:Entity{key: 6})
MERGE (g:Entity{key: 7})
MERGE (h:Entity{key: 8})
MERGE (i:Entity{key: 9})
MERGE (j:Entity{key: 10})
MERGE (k:Entity{key: 11})
MERGE (l:Entity{key: 12})
MERGE (m:Entity{key: 13})
MERGE (d)-[:IS_REFERENCED_IN]-(a)
MERGE (e)-[:IS_REFERENCED_IN]-(a)
MERGE (f)-[:IS_REFERENCED_IN]-(a)
MERGE (g)-[:IS_REFERENCED_IN]-(a)
MERGE (d)-[:IS_REFERENCED_IN]-(b)
MERGE (e)-[:IS_REFERENCED_IN]-(b)
MERGE (f)-[:IS_REFERENCED_IN]-(c)
MERGE (g)-[:IS_REFERENCED_IN]-(c)
MERGE (j)-[:IS_REFERENCED_IN]-(a)
MERGE (h)-[:IS_REFERENCED_IN]-(a)
MERGE (i)-[:IS_REFERENCED_IN]-(a)
MERGE (g)-[:IS_REFERENCED_IN]-(c)
MERGE (k)-[:IS_REFERENCED_IN]-(c)
MERGE (l)-[:IS_REFERENCED_IN]-(c)
MERGE (m)-[:IS_REFERENCED_IN]-(c)
On which it removes 3 Entities.
The first MATCH finds the entities that are attached to your input doc, and the second MATCH finds the number of documents that each of these entities is connected to.
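The same two-step logic can be traced outside the database. The following is only a plain-Python sketch of the counting, not how Neo4j executes the query; the function name `entities_only_in` is hypothetical, and the pairs mirror the sample data above:

```python
from collections import defaultdict

# (entity_key, document_key) pairs mirroring the IS_REFERENCED_IN sample data
references = [
    (4, 1), (5, 1), (6, 1), (7, 1),
    (4, 2), (5, 2),
    (6, 3), (7, 3),
    (10, 1), (8, 1), (9, 1),
    (11, 3), (12, 3), (13, 3),
]


def entities_only_in(doc_key, refs):
    """Return entities referenced by `doc_key` and by no other document."""
    docs_per_entity = defaultdict(set)
    for entity, doc in refs:
        docs_per_entity[entity].add(doc)
    # First MATCH: entities attached to the input document.
    candidates = [e for e, docs in docs_per_entity.items() if doc_key in docs]
    # Second MATCH + WITH: count documents per entity, keep those with exactly one.
    return sorted(e for e in candidates if len(docs_per_entity[e]) == 1)


print(entities_only_in(1, references))  # [8, 9, 10]
```

The three keys returned correspond to the three entities the Cypher query deletes for the document with key 1.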

Merge statement in Cypher

I came across this statement in an Intro to Cypher video:
Ignoring the last MERGE statement, does the MERGE essentially do an INSERT...ON DUPLICATE KEY ? For example:
MERGE (a:Person {name: "Ann"})
ON CREATE SET a.twitter = "#ann"
Would correspond to:
INSERT INTO Person (name) VALUES ("Ann")
ON DUPLICATE KEY SET twitter = "#ann"
And by extension, if there is a MERGE on a node that doesn't already exist does it act as if it is a CREATE keyword?
Yes, that is what MERGE does. Note that it is not limited to just key fields. It takes into account all fields you provide in the MERGE clause. See also https://neo4j.com/docs/cypher-manual/current/clauses/merge/
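To make the "all fields you provide" point concrete, here is a toy in-memory model of a single-node MERGE. It is a sketch for illustration only, not Neo4j's implementation; `merge_node` and the list-of-dicts store are assumptions:

```python
def merge_node(nodes, label, props, on_create=None):
    """Toy get-or-create: match on label and ALL given props, else create."""
    for node in nodes:
        if node["label"] == label and all(
            node["props"].get(k) == v for k, v in props.items()
        ):
            return node, False  # matched: the ON CREATE part is skipped
    node = {"label": label, "props": dict(props)}
    node["props"].update(on_create or {})  # ON CREATE SET runs only here
    nodes.append(node)
    return node, True


nodes = []
ann, created = merge_node(nodes, "Person", {"name": "Ann"}, on_create={"twitter": "#ann"})
again, created_again = merge_node(nodes, "Person", {"name": "Ann"}, on_create={"twitter": "#other"})
print(created, created_again, again["props"]["twitter"], len(nodes))  # True False #ann 1

# Adding a property to the pattern changes what counts as a match.
_, created_third = merge_node(nodes, "Person", {"name": "Ann", "age": 30})
print(created_third, len(nodes))  # True 2
```

The last call creates a second node because the existing Ann has no `age` of 30, which is why it usually helps to MERGE on the identifying property only and set the rest with ON CREATE SET.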

Remove unnecessary relationships between nodes?

I tried to build a graph model using data, here is the cypher query
LOAD CSV WITH HEADERS FROM 'file:///y.csv' AS line
MERGE (a:Employee {empid:line.EmpID})
ON CREATE SET a.firstname = line.FirstName, a.lastname = line.LastName
MERGE (y:Year {year:toInteger(line.YearofJoining)})
ON CREATE SET y.month = line.MonthNamofJoining
MERGE (c:Location {city:line.City})
ON CREATE SET c.pincode = line.PinCode,c.county = line.County,c.state =
line.State,c.region = line.Region
MERGE (ag:Age {age:toInteger(line.AgeinYrs)})
MERGE (a)-[:AGE]->(ag)
MERGE (ag)-[:LOCALITY]->(c)
MERGE (c)-[:JOINING_YEAR]->(y)
I need to return all connecting path between four employees, so I tried below query
MATCH p = (a:Employee)-[:AGE]->(ag)-[:LOCALITY]-(c)-[:JOINING_YEAR]-(y)
WHERE a.empid IN ['840300','840967','346058','320954']
return p limit 25
The result I got is correct, but it contains many unnecessary paths. I am attaching the resulting graph; please check it and tell me what I am doing wrong.
There are potentially several things to fix in the import query:
The year nodes are misleading. I think you should extract the month attribute to a separate node, like this:
MERGE (y:Year {year:toInteger(line.YearofJoining)})
MERGE (m:Month {month:line.MonthNamofJoining})-[:MONTH_IN_YEAR]->(y)
Also, the modelling seems wrong. Currently, a Location is linked to a year (or soon: a month in a year) via JOINING_YEAR, and an age is linked to a location. This does not seem to make sense.
You probably want an intermediate node to represent the fact that an employee has joined a location (given that Neo4j doesn't support relationships between more than 2 nodes).
LOAD CSV WITH HEADERS FROM 'file:///y.csv' AS line
MERGE (a:Employee {empid:line.EmpID})
ON CREATE SET a.firstname = line.FirstName, a.lastname = line.LastName
MERGE (ag:Age {age:toInteger(line.AgeinYrs)})
MERGE (a)-[:AGE]->(ag)
MERGE (y:Year {year:toInteger(line.YearofJoining)})
MERGE (m:Month {month:line.MonthNamofJoining})-[:MONTH_IN_YEAR]->(y)
MERGE (c:Location {city:line.City})
ON CREATE SET c.pincode = line.PinCode,c.county = line.County,c.state =
line.State,c.region = line.Region
MERGE (j:Join {empid:line.EmpID}) // need a property to merge on
MERGE (a)-[:JOINED]->(j)
MERGE (j)-[:LOCALITY]->(c)
MERGE (j)-[:JOINING_MONTH]->(m)
Your read query becomes:
MATCH p = (:Location)<-[:LOCALITY]-(:Join)<-[:JOINED]-(a:Employee)-[:AGE]->(:Age)
WHERE a.empid IN ['840300','840967','346058','320954']
return p limit 25
Unrelated formatting note:
the recommended case for attribute is camelCase (e.g. empId instead of empid) and for relation types is SNAKE_CASE (e.g. JOINING_YEAR instead of JOININGYEAR).
By convention, relation types are verbs more often than not.

Cypher: Adding properties to relationship as distinct values

What I'm trying to do is to write a query - I already made it a webservice(working on local machine, so I get the name and people as parameters) - which connects people who share the same hobbies and set the hobbies as the relationship property as an array.
My first attempt was;
MERGE (aa:Person{name:$name})
WITH aa, $people as people
FOREACH (person IN people |
MERGE (bb:Person{name:person.name})
MERGE (bb)-[r:SHARESSAMEHOBBY]->(aa)
ON MATCH SET r.hobbies = r.hobbies + person.hobby
ON CREATE SET r.hobbies = [person.hobby])
However this caused duplicated property elements like ["swimming","swimming"]
I'm trying to set only unique properties. Then I tried the following query;
MERGE (aa:Person{name:$name})
WITH aa, $people as people FOREACH (person IN people | MERGE (bb:Person{name:person.name}) MERGE (bb)-[r:SHARESSAMEHOBBY]->(aa)
WITH r, COALESCE(r.hobbies, []) + person.hobby AS hobbies
UNWIND hobbies as unwindedHobbies
WITH r, collect(distinct, unwindedHobbies) AS unique
set r.as = unique)
However now it gives me syntax error;
errorMessage = "[Neo.ClientError.Statement.SyntaxError] Invalid use of WITH inside FOREACH
Any help is appreciated.
This should work:
MERGE (aa:Person {name: $name})
WITH aa
UNWIND $people AS person
MERGE (bb:Person {name: person.name})
MERGE (bb)-[r:SHARESSAMEHOBBY]-(aa)
WITH r, person, CASE
WHEN r.hobbies IS NULL THEN {new: true}
WHEN NOT (person.hobby IN r.hobbies) THEN {add: true}
END AS todo
FOREACH(ignored IN todo.new | SET r.hobbies = [person.hobby])
FOREACH(ignored IN todo.add | SET r.hobbies = r.hobbies + person.hobby);
You actually had 2 issues, and the above query addresses both:
If a SHARESSAMEHOBBY relationship already existed in the opposite direction (from aa to bb), the following MERGE clause would have caused the unnecessary creation of a second SHARESSAMEHOBBY relationship (from bb to aa):
MERGE (bb)-[r:SHARESSAMEHOBBY]->(aa)
To avoid this, you should have used a non-directional relationship pattern (which is permitted by MERGE, but not CREATE) to match a relationship in either direction, like this:
MERGE (bb)-[r:SHARESSAMEHOBBY]-(aa)
You needed to determine whether it is necessary to initialize a new hobbies list or to add the person.hobby value to an existing r.hobbies list that did not already have that value. The above query uses a CASE clause to assign to todo either NULL, or a map with a key indicating what additional work to do. It then uses a FOREACH clause to execute each thing to do, as appropriate.
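The decision made by the CASE expression can be spelled out as ordinary branching. This is a plain-Python sketch of the same three outcomes; the function name `updated_hobbies` is hypothetical, and `None` stands in for a missing r.hobbies property:

```python
def updated_hobbies(existing, hobby):
    """Mirror the CASE/FOREACH logic: initialize, append if missing, else leave as-is."""
    if existing is None:        # no r.hobbies property yet
        return [hobby]
    if hobby not in existing:   # list exists but lacks this value
        return existing + [hobby]
    return existing             # already present: nothing to do


hobbies = None
for h in ["swimming", "swimming", "chess"]:
    hobbies = updated_hobbies(hobbies, h)
print(hobbies)  # ['swimming', 'chess']
```

The repeated "swimming" falls into the third branch, which is the case that produced duplicates in the first attempt.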

Where vs property specification

Trying to understand when to use a property value vs a WHERE clause.
$Match (g:GROUP {GroupID: 1}) RETURN g
gives the expected response (all reported properties as expected).
And,
$match (a:ADDRESS {AddressID: 454}) return a
gives the expected response (all reported properties as expected).
However, the combo in a MERGE
MERGE (g:GROUP {GroupID: 1})-[r:USES]->(a:ADDRESS {AddressID: 454}) Return g.ShortName, type(r), a.Line1;
creates two new nodes (with no properties, of course, except a redundant AddressID and GroupID). The AddressID and GroupID were created with toInt(), and I tried wrapping the property values in toInt() as well (same result):
Added 2 labels, created 2 nodes, set 2 properties, created 1 relationship, returned 1 row in 77 ms.
So, after DETACH DELETE the extraneous nodes, I try again with (which works)
Match (g:GROUP) WHERE g.GroupID = 1
Match (a:ADDRESS) WHERE a.AddressID = 454
MERGE (g)-[r:USES]->(a)
RETURN g.ShortName, type(r), a.Line1
Returned 1 row in 14 ms.
WHY does the separate MATCHing work while the property spec does not?
MERGE is one of the trickier clauses for exactly this behavior.
From the Cypher documentation for MERGE:
When using MERGE on full patterns, the behavior is that either the
whole pattern matches, or the whole pattern is created. MERGE will not
partially use existing patterns — it’s all or nothing. If partial
matches are needed, this can be accomplished by splitting a pattern up
into multiple MERGE clauses.
So when you're going to MERGE a pattern, and you aren't using variables bound to already existing nodes, then the entire pattern is matched, or if it doesn't exist, the entire pattern is created, which, in your case, creates duplicate nodes, as your intent is to use existing nodes in the MERGE.
In general, when you want to MERGE a relationship or pattern between nodes that already exist, it's best to MATCH or MERGE on the nodes which should already exist first, and then MERGE the pattern with the matched or merged variables.
EDIT
I think there's some confusion here about the reasons for the differences in the queries.
This doesn't have anything to do with whether the properties are defined in a WHERE clause, or inline on the nodes in the MATCH clauses.
In fact, you can do this just fine with your last query, and it will behave identically:
Match (g:GROUP {GroupID:1})
Match (a:ADDRESS {AddressID:454})
MERGE (g)-[r:USES]->(a)
RETURN g.ShortName, type(r), a.Line1
The reason for the difference is, again, the behavior of MERGE.
Really the easiest way to grasp what's going on is to consider what the behavior would be if MERGE were substituted first with MATCH, and then if no match was found, with CREATE.
MATCH (g)-[r:USES]->(a)
and if there is no match, it does CREATE instead
CREATE (g)-[r:USES]->(a)
That should make sense...a CREATE with existing nodes will create the missing part, the relationship.
Contrast that with using MERGE on the entire pattern:
MERGE (g:GROUP {GroupID: 1})-[r:USES]->(a:ADDRESS {AddressID: 454})
Return g.ShortName, type(r), a.Line1;
First this will attempt a MATCH:
MATCH (g:GROUP {GroupID: 1})-[r:USES]->(a:ADDRESS {AddressID: 454})
and then when no match is found, a CREATE
CREATE (g:GROUP {GroupID: 1})-[r:USES]->(a:ADDRESS {AddressID: 454})
And given what we know of how CREATE works, it doesn't attempt to match parts of the pattern (and there are no variables that have already matched to existing elements of the graph), it creates the pattern as a whole, creating a brand new :GROUP and :ADDRESS node with the given properties, and the new :USES relationship.
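The match-then-create reading can be tried on a toy in-memory graph. This is a sketch of the all-or-nothing rule only, not Neo4j's implementation; `ToyGraph`, its methods, and the ShortName and Line1 values are made up for illustration:

```python
def matches(node, label, props):
    """True if the node has the label and every given property value."""
    return node["label"] == label and all(
        node["props"].get(k) == v for k, v in props.items()
    )


class ToyGraph:
    def __init__(self):
        self.nodes = []  # each node: {"label": ..., "props": {...}}
        self.rels = []   # each relationship: (start_node, type, end_node)

    def create_node(self, label, props):
        node = {"label": label, "props": dict(props)}
        self.nodes.append(node)
        return node

    def match_node(self, label, props):
        return next(n for n in self.nodes if matches(n, label, props))

    def merge_pattern(self, start, rel_type, end):
        """MERGE on a whole pattern of unbound nodes: match all of it or create all of it."""
        for s, t, e in self.rels:
            if t == rel_type and matches(s, *start) and matches(e, *end):
                return s, e
        s, e = self.create_node(*start), self.create_node(*end)
        self.rels.append((s, rel_type, e))
        return s, e

    def merge_rel(self, s, rel_type, e):
        """MERGE between already-bound nodes: only the relationship can be created."""
        if not any(a is s and t == rel_type and b is e for a, t, b in self.rels):
            self.rels.append((s, rel_type, e))


def fresh_graph():
    g = ToyGraph()
    g.create_node("GROUP", {"GroupID": 1, "ShortName": "G1"})        # made-up values
    g.create_node("ADDRESS", {"AddressID": 454, "Line1": "1 Main"})  # made-up values
    return g


whole = fresh_graph()
whole.merge_pattern(("GROUP", {"GroupID": 1}), "USES", ("ADDRESS", {"AddressID": 454}))
print(len(whole.nodes), len(whole.rels))  # 4 1

split = fresh_graph()
grp = split.match_node("GROUP", {"GroupID": 1})
adr = split.match_node("ADDRESS", {"AddressID": 454})
split.merge_rel(grp, "USES", adr)
print(len(split.nodes), len(split.rels))  # 2 1
```

In the first run no USES relationship exists yet, so the whole pattern is created and the node count doubles; in the second run the nodes are bound first, so only the relationship is added.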
MERGE (g:GROUP {GroupID: 1})-[r:USES]->(a:ADDRESS {AddressID: 454}) Return g.ShortName, type(r), a.Line1; most likely creates two nodes, because those properties (GroupID for the GROUP node / AddressID for ADDRESS node) are not the only properties on those nodes.
Matching the nodes first ensures that you're getting nodes with matching properties (which could have other properties, too) and merge those.
If you had an index with uniqueness constraint on both GroupID for GROUP nodes and AddressID for ADDRESS nodes, then the MERGE without matching first, should still make that connection.
