Joins not working due to missing properties in Neo4J - neo4j

I imported data into a Neo4j database. Some values in the source were blank, and for those rows the corresponding property was not created on the node. For example:
Student
Name | City
Amit | Delhi
Akshay |
So two nodes are created: one with two properties and one with a single property.
I then created a relationship with something like:
LOAD CSV WITH HEADERS FROM "file:///Student.csv" AS row
MATCH (e:College {College_ID: row.College_ID})
MATCH (c:Student {Name: row.Name,City: row.City})
MERGE (c)-[:REQ_TestedBy_TC]->(e);
Now, when I try to join these nodes with a pattern like:
(c:College)-[r:CollegeHaveStudent]->(s:Student)
Then it returns only the first row, because the second student has no City property and so the join does not match.
What would be a workaround for this situation?
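One possible workaround, as a sketch rather than a tested fix: in the situation described, the blank field evidently came through as null (which is why the property was never created), and a property map such as {City: null} never matches because a comparison with null does not yield true. The variant below matches on Name and only compares City when the row has one. It assumes Name is enough to identify a student, and it assumes the relationship should follow the CollegeHaveStudent pattern used in the query above (the import statement creates REQ_TestedBy_TC in the opposite direction, so that pattern would not find it).

```cypher
LOAD CSV WITH HEADERS FROM "file:///Student.csv" AS row
MATCH (e:College {College_ID: row.College_ID})
// Match on Name only; a City comparison inside the property map fails when row.City is null
MATCH (c:Student {Name: row.Name})
WHERE (row.City IS NULL AND c.City IS NULL) OR c.City = row.City
// Assumed relationship type and direction, taken from the query pattern in the question
MERGE (e)-[:CollegeHaveStudent]->(c);
```

If two students can share a name, a dedicated ID column would be a safer key than Name.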

Related

Trying to create a relationships between nodes with 2 different properties names but some of them have the same value

Hey all, a beginner Neo4j student here.
I created 40 nodes:
20 :EMPLOYEE and 20 :ORGANIZATION nodes.
Employee nodes have the owns_organizationNr property.
Organization nodes have the ownedBy_organizationNr property.
Some of those nodes have the same value, and I tried the query below
to create relationships between the nodes whose property values match, and to copy the
n1.owns_organizationNr property and its value onto the new relationship, but something is missing.
Can you help me, please?
MATCH (n1:EMPLOYEE) , (n2:ORGANIZATION)
WHERE
HAS(n1.owns_organizationNr) AND HAS(n2.ownedBy_organizationNr) AND n1.owns_organizationNr = n2.ownedBy_organizationNr
CREATE (n1) -[:OWNS_ORG{n1.owns_organizationNr}]->(n2)
The query builds a Cartesian product, so it produces a lot of rows. You can use the query below instead. There was also a syntax error in the CREATE statement (the relationship property had no key); it is fixed in the last line.
MATCH (n1:EMPLOYEE) WHERE n1.owns_organizationNr is not null
WITH n1
MATCH (n2:ORGANIZATION)
WHERE n2.ownedBy_organizationNr is not null AND n1.owns_organizationNr = n2.ownedBy_organizationNr
CREATE (n1) -[:OWNS_ORG{owns_organizationNr: n1.owns_organizationNr}]->(n2)

Neo4J set new property on Relationship by Relationship ID

I have a Neo4J Graph and want to add a new Property on the Relationships based on the ID of the Relationship, which is already set. The ID is a Property and looks like this:
id:16_0beta1_1b500480_1221807483755_439038_8369
In a CSV-File I have stored 400 IDs and a type corresponding to the IDs. Neo4J should load the CSV-File and look through all relationships. When a Relationship ID matches an ID from the CSV-File it should set the new Property like this: set r.SysML=row.type and create a new Property on the Relationship:
SysML:Block
For the nodes the following clause worked well:
LOAD CSV WITH HEADERS FROM "file:///SysML.csv" AS row
merge(n:name {id:row.sysID})
on match set n.SysML=row.type
For the relationship property I tried:
LOAD CSV WITH HEADERS FROM "file:///SysML.csv" AS row
merge ()-[r:rel {id:row.sysID}]->()
on match set r.SysML=row.type
I couldn't solve it even with many variations of the relationship...
You should NEVER do this:
merge ()-[r:rel {id:row.sysID}]->()
If the relationship does not already exist, that clause would create a new relationship between 2 brand new nodes having no labels or properties (and your on match clause would also not be applied).
Since your question indicates you just want to update the SysML property of existing rel relationships, you should use MATCH instead of MERGE:
LOAD CSV WITH HEADERS FROM "file:///SysML.csv" AS row
MATCH ()-[r:rel {id: row.sysID}]->()
SET r.SysML = row.type
By the way, it would be more efficient if you qualified the end nodes (e.g., by supplying labels, or even property values) to avoid having to scan through every relationship in the DB.
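As a sketch of that suggestion, assuming both end nodes of the rel relationships carry the name label used in the node import from the question (adjust the labels if your data differs):

```cypher
LOAD CSV WITH HEADERS FROM "file:///SysML.csv" AS row
// Labels on both ends narrow the search; :name on both ends is an assumption
MATCH (:name)-[r:rel {id: row.sysID}]->(:name)
SET r.SysML = row.type
```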

Update existing relationship property value in Neo4j using CSV

I already have some data in Neo4j, data is modelled in the below fashion :
(:A {ID:"123",Group:"ABC",Family:"XYZ"})
(:B {ID:"456",Group:"ABC",Family:"XYZ"})
(:A)-[:SCORE{score:'2'}]-(:B)
Now I am importing some new data from a CSV file which has 5 columns:
A's ID
B's ID
Score through which A is attached to B
Group
Family
The new data can contain some new A IDs or some new B IDs.
Question:
I want to create those new nodes of type A and B, create a 'Score' relationship between them, and store the score as a property on that relationship.
It is also possible that existing scores between A and B have changed, in which case I just want to update the previous score with the new one.
How do I write Cypher to achieve this using a CSV import?
I used the below cypher query to model data for the first time:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///ABC.csv" AS line
MERGE (a:A {ID: line.A, Group: line.Group, Family: line.Family})
MERGE (b:B {ID: line.A, Group: line.Group, Family: line.Family})
MERGE (a)-[:Score {score: toFloat(line.Score)}]-(b)
Note: Family and Group are same for both type of nodes 'A' and 'B'
Thanks in advance.
You can MERGE the relationship without the score and set the score afterwards, so that a new SCORE relationship is not created for every new value.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///ABC.csv" AS line
MERGE (a:A {ID: line.A, Group:line.Group, Family:line.Family})
// line.B is an assumed column name for B's ID; the query as originally posted reused line.A here
MERGE (b:B {ID: line.B, Group:line.Group, Family:line.Family})
// merge on the pattern alone so an existing relationship is reused
MERGE (a)-[score:SCORE]-(b)
SET score.score = toFloat(line.Score)

neo4j relationships between PK and FK

I have two CSV files, laid out like this:
movies.csv
movieId | title | genres
links.csv
movieId | tmdbId | imdbId
I've tried this Cypher query:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///links.csv" AS row
WITH row
MATCH (movie:Movie {id: toInt(row.movieId)})
MERGE (link)-[r:LINK]->(movie)
ON CREATE SET r.tmdbId = toInt(row.tmdbId)
This didn't work for me: it doesn't create a new label "LINK" or form the relationship correctly.
Given a movieId, I want to be able to get its corresponding tmdbId.
I've tried several methods but none of them worked. I'm new to Neo4j and still more familiar with SQL.
Your usage of link is as a variable, not a label (you would use :Link if you wanted to create a new node with that label), and it's not really clear what your link is supposed to be, as you don't have any ids or any properties on it. It's also not clear what you need other nodes for, as a movie node can easily have properties for related ids (so you look up the :Movie node by its movieId and then get the tmdbId from that node).
If you could provide more details about your use cases, and what you want to model and how it's connected, that would help.
EDIT
Okay, so it sounds like you're modeling :Movies, and you also want :Link nodes that hold both tmdbId and imdbId properties. As mentioned above, in reality you should just set the properties on the :Movie node itself and not bother with :Link nodes at all, but this is for the sake of getting used to neo4j, so okay.
First of all, to make sure our matches are fast as we build these relationships, we need unique constraints on nodes through their unique IDs.
CREATE CONSTRAINT ON (m:Movie)
ASSERT m.id IS UNIQUE;
CREATE CONSTRAINT ON (l:Link)
ASSERT l.tmdbId IS UNIQUE;
CREATE CONSTRAINT ON (l:Link)
ASSERT l.imdbId IS UNIQUE;
Your import would be:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///links.csv" AS row
WITH row
MATCH (movie:Movie {id: toInt(row.movieId)})
MERGE (link:Link{imdbId: toInt(row.imdbId), tmdbId: toInt(row.tmdbId)})
MERGE (link)-[:LINK]->(movie)
And a query to get the movie by one id would be:
MATCH (link:Link)-[:LINK]->(movie:Movie)
WHERE link.imdbId = 123
RETURN movie
You should be able to infer the query for going the opposite direction, starting with a movieId and traversing the :LINK relationship to the :Link node (you may want to change one of these, as having the same name for a node label and a relationship type might get confusing since you're new to this) to get the relevant ID.
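For illustration, the opposite direction might look like the sketch below. The movieId value 1 is just a placeholder, and the query assumes the :Link nodes and :LINK relationships were created by the import above.

```cypher
// Start from a movie by its id and follow :LINK back to the :Link node
MATCH (link:Link)-[:LINK]->(movie:Movie)
WHERE movie.id = 1
RETURN link.tmdbId
```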

Error creating relationships over huge dataset

My question is similar to the one pointed here :
Creating unique node and relationship NEO4J over huge dataset
I have 2 tables, Entity (Entities.txt) and Relationships (EntitiesRelationships_Updated.txt), shown below. Both files are in the import folder of the Neo4j database. I am trying to load them with LOAD CSV and then create relationships.
In the Entity table, a PARENTID of 0 means that the ENT_ID has no parent; otherwise it holds the parent's ID. For example, ENT_ID = 3 is the parent of ENT_ID = 4, and ENT_ID = 1 is the parent of ENT_ID = 2.
**Entity Table**
ENT_ID | Name | PARENTID
1 | ABC | 0
2 | DEF | 1
3 | GHI | 0
4 | JKG | 3
**Relationship Table**
RID | ENT_IDPARENT | ENT_IDCHILD
1 | 1 | 2
2 | 3 | 5
The Entity table has 2 million records and the relationship table has about 400K lines.
Each RID has a particular tag associated with it. For example, RID = 1 means that the relation is A FATHER_OF B; RID = 2 means that the relation is A MOTHER_OF B. There are 20 such RIDs in total.
Both files are in txt format.
My first step is to load the entity table. I used the following script:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///Entities.txt" AS Entity FIELDTERMINATOR '|'
CREATE (n:Entity{ENT_ID: toInt(Entity.ENT_ID),NAME: Entity.NAME,PARENTID: toInt(Entity.PARENTID)})
This query works fine. It takes about 10 minutes to load 2.8 million records. The next step is to index the records:
CREATE INDEX ON :Entity(PARENTID)
CREATE INDEX ON :Entity(ENT_ID)
This query runs fine as well. Following this I tried creating the relationships from the relationship table using a similar query as in the above link:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///EntitiesRelationships_Updated.txt" AS Rships FIELDTERMINATOR '|'
MATCH (n:A {ENT_IDPARENT : Rships.ENT_IDPARENT})
with Entity, n
MATCH (m:B {ENT_IDCHILD : Rships.ENT_IDCHILD})
with m,n
MERGE (n)-[r:RELATION_OF]->(m);
When I run this, the query keeps running for about an hour and then stops at a particular size (in my case 2.2 GB). I based this query on the link above. It includes the edit from the solution below and still does not work.
I have one more query, shown below (also based on the above link). I run it because I want to create a relationship based on the Entity table:
PROFILE
MATCH(Entity)
MATCH (a:Entity {ENT_ID : Entity.ENT_ID})
WITH Entity, a
MATCH (b:Entity {PARENTID : Entity.PARENTID})
WITH a,b
MERGE (a)-[r:PARENT_OF]->(b)
When I tried running this query, I got a Java heap space error. Unfortunately, I have not been able to find a solution for either problem.
Could you please advise if I am doing something wrong?
This query allows you to take advantage of your :Entity(ENT_ID) index:
MATCH (child:Entity)
WHERE child.PARENTID > 0
WITH child.PARENTID AS pid, child
MATCH (parent:Entity {ENT_ID : pid})
MERGE (parent)-[:PARENT_OF]->(child);
Cypher does not use indices when the property value comes from another node. To get around that, the above query uses a WITH clause to represent child.PARENTID as a variable (pid). The time complexity of this query should be O(N). Your original query has a complexity of O(N * N).
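One way to check whether the index is actually being used, offered as a sketch: prefix a read-only version of the query with EXPLAIN, which shows the plan without running the query. A NodeIndexSeek operator for parent in the plan would indicate an index lookup, while a NodeByLabelScan for parent would indicate a scan. The exact plan depends on the Neo4j version, so treat this only as a diagnostic.

```cypher
EXPLAIN
// Read-only variant of the query above: nothing is created, only the plan is shown
MATCH (child:Entity)
WHERE child.PARENTID > 0
WITH child.PARENTID AS pid, child
MATCH (parent:Entity {ENT_ID : pid})
RETURN count(*);
```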
[EDITED]
If the above query takes too long or encounters errors that might be related to running out of memory, try this variant, which creates 1000 new relationships at a time. You can change 1000 to any number that is workable for you.
MATCH (child:Entity)
WHERE child.PARENTID > 0 AND NOT ()-[:PARENT_OF]->(child)
WITH child.PARENTID AS pid, child
LIMIT 1000
MATCH (parent:Entity {ENT_ID : pid})
CREATE (parent)-[:PARENT_OF]->(child)
RETURN COUNT(*);
The WHERE clause filters out child nodes that already have a parent relationship. And the MERGE operation has been changed to a simpler CREATE operation, since we have already ascertained that the relationship does not yet exist. The query returns a count of the number of relationships created. If the result is less than 1000, then all parent relationships have been created.
Finally, to make the repeated queries automated, you can install the APOC plugin on the neo4j server and use the apoc.periodic.commit procedure, which will repeatedly invoke a query until it returns 0. In this example, I use a limit parameter of 10000:
CALL apoc.periodic.commit(
"MATCH (child:Entity)
WHERE child.PARENTID > 0 AND NOT ()-[:PARENT_OF]->(child)
WITH child.PARENTID AS pid, child
LIMIT {limit}
MATCH (parent:Entity {ENT_ID : pid})
CREATE (parent)-[:PARENT_OF]->(child)
RETURN COUNT(*);",
{limit: 10000});
Your entity creation Cypher looks fine, as do your indexes.
I am rather confused about the last two Cypher fragments though.
Since your relationships have a specific label or id associated with them, it's probably best to add your relationships by loading from the relationship table data, though the node labels in your query (A and B) aren't used in your Entity creation and aren't in your graph, and neither are ENT_IDPARENT or ENT_IDCHILD fields. Looks like this isn't really the Cypher you used, but an example you built off of?
I'd change this relationship creation query to the following, storing the RID on the relationship for post-processing later (this assumes that there can only be one :RELATION_OF relation between the same two nodes). Note the toInt() calls: ENT_ID was stored as an integer during the entity import, while LOAD CSV yields strings.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///EntitiesRelationships_Updated.txt" AS Rships FIELDTERMINATOR '|'
MATCH (parent:Entity {ENT_ID : toInt(Rships.ENT_IDPARENT)})
MATCH (child:Entity {ENT_ID : toInt(Rships.ENT_IDCHILD)})
MERGE (parent)-[r:RELATION_OF]->(child)
ON CREATE SET r.RID = Rships.RID;
Later on, if you like, you can match on your relationships with an RID, and add the corresponding type ("FATHER_OF", "MOTHER_OF", etc) property.
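A sketch of that later step, assuming the RID was stored as the string read from the CSV (as in the query above) and using the RID-to-type mapping from the question (1 is FATHER_OF, 2 is MOTHER_OF). The property name type is an illustrative choice.

```cypher
// Tag every :RELATION_OF relationship whose RID is "1"
MATCH ()-[r:RELATION_OF]->()
WHERE r.RID = "1"
SET r.type = "FATHER_OF";
```

The same statement can be repeated for the other RID values, changing the RID and the type string each time.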
As for creating the :PARENT_OF relationship, you're doing some extra match on an Entity variable bound to every single node in your graph - get rid of that.
Instead, use this:
PROFILE
// first, match on all Entities with a PARENTID property
MATCH(child:Entity)
WHERE EXISTS(child.PARENTID)
// next, find the parent for each child by the child's PARENTID
WITH child
MATCH (parent:Entity {ENT_ID : child.PARENTID})
MERGE (parent)-[:PARENT_OF]->(child)
// lastly remove the parentid from the child, so it won't be reprocessed
// if we run the query again.
REMOVE child.PARENTID
EDITED the above query to use an existence check on child.PARENTID, and to remove child.PARENTID after the corresponding relationship has been created.
If you need a solution that uses batching, you could do this manually (adding LIMIT 100000 to your WITH child line), or you could install the APOC Procedures Library and use its periodic.commit() procedure to batch your processing.
