Remove unnecessary relationships between nodes? - neo4j

I tried to build a graph model from my CSV data. Here is the Cypher query:
LOAD CSV WITH HEADERS FROM 'file:///y.csv' AS line
MERGE (a:Employee {empid:line.EmpID})
ON CREATE SET a.firstname = line.FirstName, a.lastname = line.LastName
MERGE (y:Year {year:toInteger(line.YearofJoining)})
ON CREATE SET y.month = line.MonthNamofJoining
MERGE (c:Location {city:line.City})
ON CREATE SET c.pincode = line.PinCode, c.county = line.County, c.state = line.State, c.region = line.Region
MERGE (ag:Age {age:toInteger(line.AgeinYrs)})
MERGE (a)-[:AGE]->(ag)
MERGE (ag)-[:LOCALITY]->(c)
MERGE (c)-[:JOINING_YEAR]->(y)
I need to return all connecting paths between four employees, so I tried the query below:
MATCH p = (a:Employee)-[:AGE]->(ag)-[:LOCALITY]-(c)-[:JOINING_YEAR]-(y)
WHERE a.empid IN ['840300','840967','346058','320954']
return p limit 25
The result is correct, but it contains many unnecessary paths. I have attached the resulting graph; please check it and tell me where I am going wrong.

There are potentially several things to fix in the import query:
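The year nodes are misleading. Year is merged on year alone and month is only set ON CREATE, so all employees who joined in a given year share one Year node whose month is whichever row created it first. I think you should extract the month attribute to a separate node, like this: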
MERGE (y:Year {year:toInteger(line.YearofJoining)})
MERGE (m:Month {month:line.MonthNamofJoining})-[:MONTH_IN_YEAR]->(y)
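Also, the modelling seems wrong. Currently, a Location is linked to a Year (or, with the change above, to a month in a year) via JOINING_YEAR, and an Age is linked to a Location. This does not seem to make sense. Age, Location and Year nodes are shared by every employee with the same value. As a result, the chain (Employee)-[:AGE]->(Age)-[:LOCALITY]->(Location)-[:JOINING_YEAR]->(Year) connects each employee to every location reached by anyone of the same age, and to every joining year recorded for those locations. That is where the unnecessary paths come from.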
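To see where the extra paths come from, here is a minimal in-memory Python sketch of the question's model (not Neo4j code). The two employees are hypothetical: they share an age but live in different cities and joined in different years. Because the Age and Location nodes are shared, e1's traversal also passes through e2's city. An intermediate per-employee node yields only e1's own location.

```python
# Hypothetical sample rows: (empid, age, city, joining year)
ROWS = [
    ("e1", 30, "Pune", 2010),
    ("e2", 30, "Delhi", 2015),
]


def old_model_paths(rows, empid):
    """Enumerate Employee->Age->Location->Year paths in the question's model.

    Age, Location and Year are shared nodes (MERGEd on one property),
    so their edges are stored as sets keyed by the node's value.
    """
    age_of = {e: age for e, age, _, _ in rows}
    age_to_city = {}
    city_to_year = {}
    for _, age, city, year in rows:
        age_to_city.setdefault(age, set()).add(city)
        city_to_year.setdefault(city, set()).add(year)
    age = age_of[empid]
    return sorted(
        (empid, age, city, year)
        for city in age_to_city[age]
        for year in city_to_year[city]
    )


def join_model_paths(rows, empid):
    """Employee->Join->Location paths: one hypothetical Join node per employee."""
    return sorted((e, f"join:{e}", city) for e, _, city, _ in rows if e == empid)
```

For e1, old_model_paths also returns a path through Delhi, although only e2 is located there. join_model_paths returns just the Pune path.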
You probably want an intermediate node to represent the fact that an employee has joined a location (Neo4j doesn't support relationships that connect more than two nodes).
LOAD CSV WITH HEADERS FROM 'file:///y.csv' AS line
MERGE (a:Employee {empid:line.EmpID})
ON CREATE SET a.firstname = line.FirstName, a.lastname = line.LastName
MERGE (ag:Age {age:toInteger(line.AgeinYrs)})
MERGE (a)-[:AGE]->(ag)
MERGE (y:Year {year:toInteger(line.YearofJoining)})
MERGE (m:Month {month:line.MonthNamofJoining})-[:MONTH_IN_YEAR]->(y)
MERGE (c:Location {city:line.City})
ON CREATE SET c.pincode = line.PinCode, c.county = line.County, c.state = line.State, c.region = line.Region
MERGE (j:Join {empid:line.EmpID}) // need a property to merge on
MERGE (a)-[:JOINED]->(j)
MERGE (j)-[:LOCALITY]->(c)
MERGE (j)-[:JOINING_MONTH]->(m)
Your read query becomes:
MATCH p = (:Location)<-[:LOCALITY]-(:Join)<-[:JOINED]-(a:Employee)-[:AGE]->(:Age)
WHERE a.empid IN ['840300','840967','346058','320954']
return p limit 25
Unrelated formatting note:
the recommended case for property names is camelCase (e.g. empId instead of empid), and for relationship types it is upper-case SNAKE_CASE (e.g. JOINING_YEAR instead of JOININGYEAR).
By convention, relationship types are verbs more often than not.

Related

Why is LOAD CSV WITH HEADERS only importing the first row of my dataset?

I am working on a project where I must import a dataset into Neo4j. After using the LOAD CSV WITH HEADERS statement, I noticed that it only imported the first row of my file. I then tried the APOC procedure apoc.periodic.iterate. My reasoning was that, because my dataset has 16719 rows, the import needed to be processed in batches so it would not fail.
apoc.periodic.iterate attempt:
CALL apoc.periodic.iterate
(
"LOAD CSV WITH HEADERS FROM 'file:///Video_Games_Sales_as_at_22_Dec_2016.csv' as row
WITH row
RETURN row",
"MERGE (g:Game)
ON CREATE SET g.Name = row.Name,
g.Release = row.Release,
g.NASales = row.NASales,
g.EUSales = row.EUSales,
g.JPSales = row.JPSales,
g.OtherSales = row.OtherSales,
g.GlobalSales = row.GlobalSales
MERGE (p:Platform)
ON CREATE SET p.Name = row.Platform
MERGE (c:Genre)
ON CREATE SET c.Type = row.Genre
MERGE (v:Publisher)
ON CREATE SET v.Name = row.Publisher
MERGE (x:Developer)
ON CREATE SET x.Name = row.Developer
MERGE (r:Rating)
ON CREATE SET r.Rating = row.Rating
MERGE (g)-[:ON_PLATFORM]-(p)
MERGE (g)-[:GENRE]-(c)
MERGE (g)-[:PUBLISHEDBY]-(v)
MERGE (g)-[:DEVELOPEDBY]-(x)
MERGE (g)-[:RATED]-(r)",
{batchSize: 10000, iterateList: true}
)
YIELD batches, total
RETURN batches, total;
Even with this new statement, only the first row and its relationships were imported.
Has anyone experienced a similar issue?
If you can see where I am going wrong, please point me in the right direction.
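The problem is most likely clauses like
MERGE (g:Game)
With no properties in the pattern, this matches any existing :Game node. The first row creates one Game node, and every later row matches that same node. Its ON CREATE SET therefore never runs again, and only the first row's values are stored. The other unkeyed MERGEs (Platform, Genre, Publisher, Developer, Rating) behave the same way.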
Normally you do
MERGE (g:Game {Name: row.Name})
assuming that Name is an identifying property.
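Also, make sure you have a uniqueness constraint on the Name property. Besides enforcing uniqueness, it creates an index, which keeps MERGE from scanning every :Game node.
The same applies, of course, to all the other node types you are using.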
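The difference can be reproduced with a toy in-memory model of MERGE in Python. This is a sketch only. ToyGraph and its merge method are hypothetical and only approximate Neo4j (real MERGE returns every matching node, not just the first), and the game titles are made-up sample rows.

```python
class ToyGraph:
    """A toy, in-memory stand-in for MERGE on a single label (not Neo4j's API)."""

    def __init__(self):
        self.nodes = []  # list of (label, props) tuples

    def merge(self, label, key, on_create=None):
        # Match any node with this label whose properties contain every key/value.
        # An empty key therefore matches any node with the label.
        for node_label, props in self.nodes:
            if node_label == label and all(props.get(k) == v for k, v in key.items()):
                return props  # matched: ON CREATE values are not applied
        # No match: create the node and apply the ON CREATE SET values.
        props = dict(key)
        props.update(on_create or {})
        self.nodes.append((label, props))
        return props

    def count(self, label):
        return sum(1 for node_label, _ in self.nodes if node_label == label)
```

With rows for three titles, calling merge("Game", {}, {"Name": ...}) for each row leaves a single Game node holding the first title. Calling merge("Game", {"Name": ...}) creates one node per title.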

Creating property-less nodes in Neo4j

I have a schema like (:A)-[:TYPE_1]-(:B)-[:TYPE_2]-(:A). I need to link the [:TYPE_1] and [:TYPE_2] relationships to certain other nodes (say, of types C, D, E, etc.). Because a relationship cannot be linked to a node, I had to create some nodes without any properties, like (:A)-[:TYPE_1]-(:Action)--(:B)--(:Action)-[:TYPE_2]-(:A). The only purpose of the (:Action) nodes is to let me link the action to other nodes, so they have no properties. Since I changed my schema, MERGE queries have slowed down dramatically. Obviously I can't index the (:Action) nodes, but all other indexes are in place. What could be going wrong?
Edit:
My logic is:
1) There are multiple CSV files.
2) Each row in each file provides one (a1:A)-[:TYPE_1]-(type_1:Action)--(b:B)--(type_2:Action)-[:TYPE_2]-(a2:A) pattern.
3) Two different files may provide the same a1, a2 and b entities.
4) However, if the file pertains to a1, it gives qualifiers for type_1, and if it pertains to a2, it gives qualifiers for type_2.
5) Hence, I do an OPTIONAL MATCH to see whether the pattern exists.
6) If it doesn't, I create the pattern, qualifying either type_1 or type_2 based on a row parameter called qualifier, which can be type_1 or type_2.
7) If it does, I just qualify type_1 or type_2, as the case may be.
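Steps 5 to 7 amount to a create-or-qualify upsert keyed on the (a1, b, a2) triple. Below is a minimal in-memory Python sketch of that logic. It is not Neo4j code, apply_row and its dictionary layout are assumptions for illustration, and it takes the qualifier values as 'type_1' and 'type_2' as described above (the query below compares against 'type1').

```python
def apply_row(patterns, row):
    """Create-or-qualify logic from steps 5-7, modelled in memory.

    `patterns` maps (a1, b, a2) -> {"type_1": set, "type_2": set}; each set
    holds the (c, d, e) qualifiers attached to that Action node.
    """
    key = (row["a1"], row["b"], row["a2"])
    # OPTIONAL MATCH: look up an existing pattern for this triple.
    actions = patterns.get(key)
    if actions is None:
        # Not found: create the pattern with both (still unqualified) Action nodes.
        actions = {"type_1": set(), "type_2": set()}
        patterns[key] = actions
    # Qualify whichever Action node the row's qualifier names.
    actions[row["qualifier"]].add((row["c"], row["d"], row["e"]))
    return patterns
```

Two rows for the same triple but different qualifiers end up in one pattern, with one qualifier set on each Action node.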
statement = """
MERGE (file:File {id:$file})
WITH file
UNWIND $rows as row
MERGE (a1:A {id:row.a1})
ON CREATE
SET a1.name=row.a1_name
MERGE (a2:A {id:row.a2})
ON CREATE
SET a2.name=row.a2_name
MERGE (b:B {id:row.b})
ON CREATE
SET b.name = row.b_name
MERGE (c:C {id:row.c})
MERGE (d:D {id:row.d})
MERGE (e:E {id:row.e})
MERGE (b)-[:FROM_FILE]->(file)
WITH b,c,d,e,a1,a2,row
OPTIONAL MATCH (a1)-[:TYPE_1]->(type_1:Action)-[:INITIATED]->(b)<-[:INITIATED]-(type_2:Action)<-[:TYPE_2]-(a2)
WITH a1,b,a2,row,c,d,e,type_1,type_2
CALL apoc.do.when(type_1 is null,
"WITH a1,b,a2,row,c,d,e
CALL apoc.do.when(row.qualifier = 'type1',
'CREATE (type_1:Action)
CREATE (type_2:Action)
MERGE (a1)-[:TYPE_1 ]->(type_1)-[:INITIATED]->(b)<-[:INITIATED]-(type_2)<-[:TYPE_2 ]-(a2)
MERGE (type_1)-[:WITH_C]->(c)
MERGE (type_1)-[:WITH_D]->(d)
MERGE (type_1)-[:WITH_E]->(e)',
'CREATE (type_1:Action)
CREATE (type_2:Action)
MERGE (a1)-[:TYPE_1]->(type_1)-[:INITIATED]->(b)<-[:INITIATED]-(type_2)<-[:TYPE_2 ]-(a2)
MERGE (type_2)-[:WITH_C]->(c)
MERGE (type_2)-[:WITH_D]->(d)
MERGE (type_2)-[:WITH_E]->(e)',
{row:row,a1:a1,a2:a2,b:b,c:c,d:d,e:e})
YIELD value
RETURN value",
"
WITH row,c,d,e,type_1,type_2
CALL apoc.do.when(row.qualifier = 'type1',
'MERGE (type_1)-[:WITH_C]->(c)
MERGE (type_1)-[:WITH_D]->(d)
MERGE (type_1)-[:WITH_E]->(e)',
'MERGE (type_2)-[:WITH_C]->(c)
MERGE (type_2)-[:WITH_D]->(d)
MERGE (type_2)-[:WITH_E]->(e)',
{row:row,type_1:type_1,type_2:type_2,c:c,d:d,e:e})
YIELD value
RETURN value",
{row:row,a1:a1,a2:a2,b:b,c:c,d:d,e:e,type_1:type_1,type_2:type_2})
YIELD value
RETURN count(*) as count
"""
params = []
for row in df.itertuples():
    params_dict = {'a1': row[1], 'a1_name': row[-3], 'a2': row[2], 'a2_name': row[-4], 'b_name': row[3], 'b': row[-2], 'c': int(row[6]), 'd': row[7], 'e': row[5], 'qualifier': row[-1]}
    params.append(params_dict)
    if row[0] % 5000 == 0:
        graph.run(statement, parameters={"rows": params, 'file': file})
        params = []
graph.run(statement, parameters={"rows": params, 'file': file})
It's hard to say exactly what the issue is, but I do notice that you use MERGE more than you need to. In your apoc.do.when call you run
MERGE (a1)-[:TYPE_1 ]->(type_1)-[:INITIATED]->(b)<-[:INITIATED]-(type_2)<-[:TYPE_2 ]-(a2)
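even though type_1 and type_2 were created just before, so none of those relationships can exist yet. MERGE still has to check for each of them before creating it. If you change that to a CREATE, you should see a speedup. The same logic applies to the other MERGE calls in that branch (the WITH_C, WITH_D and WITH_E relationships on the newly created Action nodes).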
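Unrelated to the MERGE cost, the batching loop in the question has two quirks. It sends the very first row on its own, because index 0 satisfies row[0] % 5000 == 0, and it depends on the DataFrame having a default 0-based index. Below is a sketch of a generic chunking helper using only itertools.islice; the name batches is hypothetical.

```python
from itertools import islice


def batches(records, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return  # iterable exhausted; no empty trailing batch is produced
        yield chunk
```

Each chunk could then be passed as the rows parameter, e.g. graph.run(statement, parameters={"rows": chunk, "file": file}) with the py2neo Graph object used in the question.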

Merge statement in Cypher

I came across this statement in an Intro to Cypher video:
Ignoring the last MERGE statement, does MERGE essentially do an INSERT ... ON DUPLICATE KEY UPDATE? For example:
MERGE (a:Person {name: "Ann"})
ON CREATE SET a.twitter = "#ann"
Would correspond to:
INSERT INTO Person (name) VALUES ("Ann")
ON DUPLICATE KEY UPDATE twitter = "#ann"
And by extension, if there is a MERGE on a node that doesn't already exist, does it act like CREATE?
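Broadly, yes. MERGE matches the whole pattern and creates it if it does not exist, so a MERGE on a node that doesn't exist yet behaves like CREATE. Note that it is not limited to key fields; it takes into account all properties you provide in the MERGE clause. One difference from your SQL: ON CREATE SET only runs when the node is created, so it corresponds to setting twitter in the INSERT itself. The analogue of ON DUPLICATE KEY UPDATE is ON MATCH SET. See also https://neo4j.com/docs/cypher-manual/current/clauses/merge/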
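The ON CREATE / ON MATCH distinction can be illustrated with a toy Python upsert over a dictionary. This is a sketch, not a full model of MERGE: merge_person is a hypothetical helper that keys only on name, whereas real MERGE matches on every property in the pattern.

```python
def merge_person(people, name, on_create=None, on_match=None):
    """Toy upsert keyed on `name`, mirroring MERGE ... ON CREATE SET ... ON MATCH SET."""
    if name in people:
        # Node already exists: only the ON MATCH SET values are applied.
        people[name].update(on_match or {})
        return "matched"
    # Node does not exist: behave like CREATE, then apply ON CREATE SET values.
    people[name] = {"name": name, **(on_create or {})}
    return "created"
```

Merging "Ann" twice creates her once with twitter = "#ann". The second call's on_create values are ignored, and only its on_match values are applied.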

How to conditionally add property to relationship from CSV in Neo4j

I am making a Neo4j graph to show a network of music artists.
I have a CSV with a few columns. The first column is called Artist and is the person who made the song. The second and third columns are called Feature1 and Feature2, respectively, and represent the featured artists on a song (see example https://docs.google.com/spreadsheets/d/1TE8MtNy6XnR2_QE_0W8iwoWVifd6b7KXl20oCTVo5Ug/edit?usp=sharing)
I use MERGE so that each artist has just a single node. Artists are connected by a FEATURES relationship with a strength property that represents the number of times someone has been featured. When the relationship is created, its strength property is set to 1. For example, the first time (X)-[r:FEATURES]->(Y) occurs, r.strength = 1.
// Labels are case-sensitive: the constraint must be on :Artist, the label used by the import below
CREATE CONSTRAINT ON (a:Artist) ASSERT a.artistName IS UNIQUE;
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS from 'aws/artist-test.csv' as line
MERGE (artist:Artist {artistName: line.Artist})
MERGE (feature:Artist {artistName: line.Feature1})
MERGE (feature1:Artist {artistName: line.Feature2})
CREATE (artist)-[:FEATURES {strength:1}]->(feature)
CREATE (artist)-[:FEATURES {strength:1}]->(feature1)
Then I deleted the "None" node that was created for songs with no features:
MATCH (artist:Artist {artistName:'None'})
OPTIONAL MATCH (artist)-[r]-()
DELETE artist, r
If X features Y on another song further down the CSV, the code currently creates another (duplicate) relationship with r.strength = 1. Rather than creating a new relationship, I'd like to have only the one (previously created) relationship and increase the value of r.strength by 1.
Any idea how I can do this? My current approach has been to create all the duplicate relationships, then go back, count the duplicates, and set r.strength to that count. However, I haven't been able to get this to work, and before I spend more time on it, I suspect there is a more efficient way to accomplish this.
Any help is greatly appreciated. Thanks!
Use MERGE instead of CREATE for the relationships, with ON CREATE SET to initialise strength and ON MATCH SET to increment it:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS from 'aws/artist-test.csv' as line
MERGE (artist:Artist {artistName: line.Artist})
MERGE (feature:Artist {artistName: line.Feature1})
MERGE (feature1:Artist {artistName: line.Feature2})
MERGE (artist)-[f1:FEATURES]->(feature)
ON CREATE SET f1.strength = 1
ON MATCH SET f1.strength = f1.strength + 1
MERGE (artist)-[f2:FEATURES]->(feature1)
ON CREATE SET f2.strength = 1
ON MATCH SET f2.strength = f2.strength + 1
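To sanity-check the strength values after the import, the same counts can be computed outside Neo4j. Below is a minimal Python sketch. It assumes the CSV rows are read as dictionaries with the Artist, Feature1 and Feature2 columns, and that songs without features use the literal string "None", as described in the question. feature_strengths is a hypothetical helper.

```python
from collections import Counter


def feature_strengths(rows, missing="None"):
    """Count how often each (artist, featured artist) pair occurs.

    This is the value that MERGE ... ON CREATE SET strength = 1
    ON MATCH SET strength = strength + 1 accumulates per FEATURES relationship.
    """
    strengths = Counter()
    for row in rows:
        for col in ("Feature1", "Feature2"):
            featured = row.get(col)
            # Skip empty cells and the "None" placeholder used for no feature.
            if featured and featured != missing:
                strengths[(row["Artist"], featured)] += 1
    return strengths
```

Each entry should match the strength stored on the corresponding FEATURES relationship once the "None" node has been deleted.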

Creating relationship type between two nodes neo4j

I'm trying to create a relationship between two nodes, and for some reason I am unable to do so.
MATCH (C:Company {Company: 'Node1'})
MATCH (J:Company {Company: 'Node2'})
MERGE (C)-[:Partner]-(J);
I'm getting the result (no changes, no records). Before trying to create this relationship, I loaded a CSV with the following Cypher:
LOAD CSV WITH HEADERS FROM 'FILE:///company_info.csv' AS line
MERGE (C:Company {Company: line.Company })
ON CREATE SET
C.Partner = line.Partner,
C.Product = line.Product,
C.Partners = line.Partners,
C.Customers = line.Customers
ON MATCH SET
C.Partner = line.Partner,
C.Product = line.Product,
C.Partners = line.Partners,
C.Customers = line.Customers
I know that C.Partner = line.Partner creates a Partner property, not a relationship. Any suggestions on how I can create the relationship instead?
So, according to a comment on the other answer, your actual issue is that you created a Company node with the wrong property value ("Node1 " instead of "Node1"). Therefore, your first MATCH clause found nothing. The query produced no rows, and MERGE had nothing to connect.
To change the node property value from "Node1 " to "Node1" via Cypher, you can do this:
MATCH (c:Company {Company: 'Node1 '})
SET c.Company = 'Node1';
If this is a general problem, you can trim whitespace from both ends of that property value in all Company nodes this way:
MATCH (c:Company)
SET c.Company = TRIM(c.Company);
If you only want to trim the right side, you can use the RTRIM function instead of TRIM.
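To avoid the trailing-space problem at import time, the values can be trimmed before they reach Neo4j. Below is a minimal Python sketch using the standard csv module. read_trimmed is a hypothetical helper, and it assumes a well-formed file where no row has more fields than the header. Within Cypher itself, the equivalent is to MERGE on trim(line.Company) in the LOAD CSV query.

```python
import csv
import io


def read_trimmed(text):
    """Parse CSV text into dicts, stripping surrounding whitespace from headers and values."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        # Missing trailing fields come back as None; treat them as empty strings.
        {key.strip(): (value or "").strip() for key, value in row.items()}
        for row in reader
    ]
```

A row such as "Node1 ,Node2" is read back as Company "Node1", so a later MATCH on {Company: 'Node1'} finds it.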
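Separately, MERGE accepts an undirected relationship pattern: when it has to create the relationship, it picks a direction itself. CREATE, by contrast, requires a direction. Giving the direction explicitly makes the intent clear: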
MATCH (C:Company {Company: 'Node1'})
MATCH (J:Company {Company: 'Node2'})
MERGE (C)-[:Partner]->(J);
