Here is my model:
(domain)-[:has]-(data_file)-[:contains]-(entities)-[:have]-(columns)-[:have]-(datatypes)
Code to import data
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS from "file:///test.csv" as row
MERGE(domain:Domain{name:row.Domain})
MERGE(data_file:SourceFile{name:row.data_File})
MERGE(entity:Entity{name:row.Entity})
MERGE(column:ColumnName{name:row.Column_Name, Data_Type: row.Data_Type})
MERGE (domain)-[:HAS]->(data_file)
MERGE (data_file)-[:HAS]->(entity)
MERGE (entity)-[:HAS]->(column)
The problem I have now is that when two different entity nodes have the same Column_Name but different data types, both entity nodes point to a single node (matched by its Column_Name).
e.g.
1. data_file-A has Entity-A, Column_Name: user_id (Data_Type: String)
2. data_file-B has Entity-B, Column_Name: user_id (Data_Type: Boolean)
When I run the import above, I expect (1.) to have an edge to a user_id node whose data type is String, and (2.) to have an edge to a different user_id node whose data type is Boolean. But this is not happening. Both point to a single node whose name is user_id.
How can I resolve this? Is there a way to refer to the dynamic id created by Neo4j while creating edges, so that I can create edges between those dynamic ids?
Something like MERGE (entity.id)-[:HAS]->(column.id)
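One possible approach, offered as a sketch rather than a confirmed fix: the variables bound by the earlier MERGE clauses (entity, column) already refer to specific nodes, so no internal id is needed to connect them. What may help is merging the column as part of the pattern that hangs off the entity, so that the lookup is scoped to that entity instead of matching any ColumnName node in the graph. The query below assumes the CSV headers used in the question and assumes that a column node should belong to exactly one entity.

```cypher
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///test.csv" AS row
MERGE (domain:Domain {name: row.Domain})
MERGE (data_file:SourceFile {name: row.data_File})
MERGE (entity:Entity {name: row.Entity})
MERGE (domain)-[:HAS]->(data_file)
MERGE (data_file)-[:HAS]->(entity)
// Assumption: each column belongs to one entity, so it is merged as part
// of the entity's pattern rather than looked up globally by name.
MERGE (entity)-[:HAS]->(column:ColumnName {name: row.Column_Name, Data_Type: row.Data_Type})
```

With this form, Entity-A and Entity-B each get their own user_id node, because the pattern is only considered matched when that particular entity already has a :HAS relationship to a column with the same name and data type.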
I'm very new to the Neo4j world so please forgive me if this is a trivial question. I have 2 tables I've loaded into the database using LOAD CSV
artists:
artist_name,artist_id
"Bob","abc"
"Jack","def"
"James","ghi"
"Someone","jkl"
"John","mno"
agency_list:
"Agency"
"A"
"B"
"C"
"D"
Finally, I have an intermediary table that has the artist and the agencies that represent them.
artist_agencies:
artist_name,artist_id,agency
"Bob","abc", "A"
"Bob","abc", "B"
"Jack","def", "C"
"James","ghi", "C"
"Someone","jkl","B"
"Someone","jkl", "C"
"John","mno", "D"
Notice some artists can be a part of multiple agencies (which is why I didn't include the agency variable in the Artist table)
I'm trying to get four agency nodes that connect to each artist based on a :REPRESENTS relationship. Basically something like:
(agency:Agency) - [:REPRESENTS] -> (artist:Artist)
The code I've tried is:
LOAD CSV WITH HEADERS FROM "file:///agency_list.csv" as agencies
CREATE (agency:Agency {agency: agencies.Agency})
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artists.csv" as artists
CREATE (artist:Artist {artist: artists.artist_name, artist_id: artists.artist_id})
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artist_agencies.csv" as line
CREATE (ag:Agency) - [:REPRESENTS] -> (ar:Artist {track_artist_uri:line.track_artist_uri})
So far I'm getting this: each blue node is a duplicate of an agency name, rather than one single agency node that connects to all of its artists via the :REPRESENTS relationship.
I guess my problem is that I don't know how to relate the artists table to the agency_list table via this intermediate artist_agencies table. Is there a better way to do this or am I on the right track?
Thanks!
Joey
The artist_agencies.csv query needs to find the appropriate Agency and Artist nodes before creating a relationship between them. For example:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artist_agencies.csv" as line
MATCH (ag:Agency) WHERE ag.agency = line.agency
MATCH (ar:Artist) WHERE ar.artist_id = line.artist_id
CREATE (ag)-[:REPRESENTS]->(ar)
Aside: The artist_agencies.csv file does not need the artist_name column.
[UPDATE]
If the artist_agencies.csv data could cause duplicate relationships to be created, replace CREATE with (the more expensive) MERGE to avoid that. And make sure you do not have duplicate Agency or Artist nodes.
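As a rough sketch of that advice: uniqueness constraints can guard against duplicate Agency and Artist nodes, and MERGE can guard against duplicate relationships. The constraint syntax below is the older ON ... ASSERT form, which I am assuming matches a Neo4j version that still supports USING PERIODIC COMMIT; newer releases use a different constraint syntax. Creating a constraint will fail if duplicates already exist, so those would have to be cleaned up first.

```cypher
// Assumption: agency and artist_id are the identifying properties.
CREATE CONSTRAINT ON (ag:Agency) ASSERT ag.agency IS UNIQUE;
CREATE CONSTRAINT ON (ar:Artist) ASSERT ar.artist_id IS UNIQUE;

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artist_agencies.csv" AS line
MATCH (ag:Agency {agency: line.agency})
MATCH (ar:Artist {artist_id: line.artist_id})
// MERGE only creates the relationship if this pair is not already connected.
MERGE (ag)-[:REPRESENTS]->(ar);
```

The constraints also create indexes on those properties, which should make the two MATCH lookups cheaper on larger files.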
I'm learning about neo4j and I have the following question.
I have two groups of nodes, the first one is called Workers who have an ID and the name of the worker.
On the other hand, there is another group of nodes, called products, which, apart from the id, have the following attributes: price and name.
I want to make a relationship called "manipulate" where I relate a worker to the product that he is going to manipulate.
For this I have a trabajaensector.csv file which relates the workers by id, along with the products they are going to manipulate, also by id.
This is its form:
id1,id2,sector
1,1,fruteria
2,2,fruteria
3,2,fruteria
4,7,panaderia
5,5,fruteria
6,5,fruteria
7,9,bebidas
8,9,bebidas
9,10,bebidas
10,10,bebidas
11,3,pescaderia
12,8,panaderia
13,7,panaderia
14,9,bebidas
15,10,bebidas
16,4,pescaderia
17,2,fruteria
18,4,pescaderia
In summary, id1 (worker) manipulates id2 (product) and its sector is "fruteria/pescaderia/panaderia o bebida"
This is my CQL for creating manipulate relationship:
LOAD CSV WITH HEADERS FROM "file:///trabajaensector.csv" AS csvLine
MATCH (w:Worker), (p:Product)
WHERE w.id = toInt(csvLine.id1) AND p.id = toInt(csvLine.id2)
CREATE (w)-[sect:trabajasec]->(p)
RETURN sect
Here is my problem: the relationship is apparently created correctly, but I am losing the third field, "sector", which indicates the sector where the worker works when manipulating that product.
For example, the relationship for a worker named Juan who manipulates apples should have in the relation the variable / attribute "fruteria" or for fish "pescaderia".
Any idea of how to properly include that data in the relationship and how to recover it?
You can add a sector property to the trabajasec relationships:
LOAD CSV WITH HEADERS FROM "file:///trabajaensector.csv" AS csvLine
MATCH (w:Worker), (p:Product)
WHERE w.id = TOINT(csvLine.id1) AND p.id = TOINT(csvLine.id2)
CREATE (w)-[sect:trabajasec {sector: csvLine.sector}]->(p)
RETURN sect;
To use the above query, you should first delete the trabajasec relationships created by your earlier LOAD CSV query.
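A minimal sketch of the cleanup and of reading the sector back afterwards. Run the first statement before re-importing and the second one after. The name property on Worker and Product is an assumption based on the description in the question; adjust it to whatever the nodes actually use.

```cypher
// Step 1 (before re-running the import): remove the old relationships.
MATCH (:Worker)-[old:trabajasec]->(:Product)
DELETE old;

// Step 2 (after re-running the import): read the sector from the relationship.
MATCH (w:Worker)-[sect:trabajasec]->(p:Product)
WHERE sect.sector = 'fruteria'
RETURN w.name, p.name, sect.sector;
```

Dropping the WHERE line returns every worker/product pair together with its sector.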
I'm trying to solve a problem with displaying a one-to-many relationship in Neo4j. My dataset is below:
child,desc,type,parent
1,PGD,Exchange,0
2,MSE 1,MSE,1
3,MSE 2,MSE,1
4,MSE 3,MSE,1
5,MSE 4,MSE,1
6,BRAS 1,BRAS,2
6,BRAS 1,BRAS,3
7,BRAS 2,BRAS,4
7,BRAS 2,BRAS,5
10,NPE 1,NPE,6
11,NPE 2,NPE,7
12,OLT,OLT,10
12,OLT,OLT,11
13,FDC,FDC,12
14,FDP,FDP,13
15,Cust 1,Customer,14
16,Cust 2,Customer,14
17,Cust 3,Customer,14
LOAD CSV WITH HEADERS FROM 'file:///FTTH_sample.csv' AS line
CREATE(:ftthsample
{child_id:line.child,
desc:line.desc,
type:line.type,
parent_id:line.parent});
//Relations
match (child:ftthsample),(parent:ftthsample)
where child.child_id=parent.parent_id
create (child)-[:test]->(parent)
//Query:
MATCH (child)-[childrel:test*]-(elem)-[parentrel:test*]->(parent)
WHERE elem.desc='FDP'
RETURN child,childrel,elem,parentrel
It returns a display as below.
I want the duplicate nodes to be displayed as one. I'm a newbie with Neo4j. Can any of the experts help, please?
This seems like an error in your graph creation query. A few lines in your CSV specify the same node multiple times, but with different parents:
6,BRAS 1,BRAS,2
6,BRAS 1,BRAS,3
I'm guessing you actually want this to be a single node, with parent relationships to nodes with the given parent ids, instead of two separate nodes.
Let's adjust your import query. Instead of using a CREATE on each line, we'll use MERGE, and just on the child_id, which seems to be your primary key (maybe consider just using id instead, as a node can have an id on its own, without having to consider the context of whether it's a parent or child). We can use the ON CREATE clause after MERGE to add in the remaining properties only if the MERGE resulted in node creation (instead of matching an existing node).
That will ensure we only have one node created per child_id.
Rather than having to rematch the child, we can use the child node we just created, match on the parent, and create the relationship.
LOAD CSV WITH HEADERS FROM 'file:///FTTH_sample.csv' AS line
MERGE(child:ftthsample {child_id:line.child})
ON CREATE SET
child.desc = line.desc,
child.type = line.type
WITH child, line.parent as parentId
MATCH (parent:ftthsample)
WHERE parent.child_id = parentId
MERGE (child)-[:test]->(parent)
Note that we haven't added line.parent as a property. It's not needed, since we only use that to create relationships, and after the relationships are there, we won't need those again.
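One caveat with a single-pass import like this: the parent is looked up with MATCH, so a relationship is only created if the parent's row has already been processed. That holds for the sample data, where parents appear before their children. If the row order cannot be relied on, a two-pass variant is one option. This is a sketch under that assumption, not something the sample file requires.

```cypher
// Pass 1: create exactly one node per child_id.
LOAD CSV WITH HEADERS FROM 'file:///FTTH_sample.csv' AS line
MERGE (child:ftthsample {child_id: line.child})
ON CREATE SET child.desc = line.desc, child.type = line.type;

// Pass 2: link each child to its parent. Rows whose parent does not
// exist as a node (such as parent 0) produce no match and are skipped.
LOAD CSV WITH HEADERS FROM 'file:///FTTH_sample.csv' AS line
MATCH (child:ftthsample {child_id: line.child})
MATCH (parent:ftthsample {child_id: line.parent})
MERGE (child)-[:test]->(parent);
```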
I have a Node table and an Edge table, both available as CSV files.
I managed to load the Node table by:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///NodesETL.csv' AS line
CREATE (:InfoNodes {id: toString(line.id), description: toString(line.description)})
This query creates the InfoNodes with the field values of the CSV file as properties of :InfoNodes which is fine.
InfoNodes have relationships with other InfoNodes, i.e. these relationships exist between nodes with the same label.
These relationships are stored in an Edge table available as an additional CSV file.
Every row of this Edge table holds idfrom and idto fields that define the relationships between InfoNodes on the basis of their id property.
The Edge table also holds 3 additional fields representing properties of the relationship. The firstproperty is always a string and never NULL, i.e. never an empty string. The secondproperty and thirdproperty, both of type string, can be NULL (empty, like "").
I try to use this Edge table to create the [:RELATION {firstproperty:, secondproperty:, thirdproperty:}] relationships between (:InfoNodes) by:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///EdgesETL.csv' AS line
MATCH (from:InfoNodes{id: toString(line.idfrom)})
MATCH (to:InfoNodes{id: toString(line.idto)})
MERGE (from)-[:RELATION {firstproperty: toString(line.firstproperty), secondproperty: toString(line.secondproperty), thirdproperty: toString(line.thirdproperty)}]->(to)
This second Cypher script results in an error when secondproperty and thirdproperty in the Edge table contain NULL values.
The Neo4j error message is: Cannot merge relationship using null property value for secondproperty.
When I remove the secondproperty field and the secondproperty: property from the second script, the same type of error occurs, this time mentioning thirdproperty: Cannot merge relationship using null property value for thirdproperty.
When I remove the secondproperty and thirdproperty fields and properties from the script, the [:RELATION] relationships between InfoNodes are created, with the firstproperty field stored as the firstproperty: property of the [:RELATION] relationship.
Question: How can I extend the second script so that the secondproperty and thirdproperty fields from the Edge table are loaded into secondproperty: and thirdproperty: of the [:RELATION] relationships, including NULL values?
The question "Can't MERGE with null values; 'Cannot merge node using null property value' in neo4j" describes the same problem but doesn't answer my question for the case of multiple fields/properties with NULL values.
You'll want to re-review the MERGE section in the developer guide. Specifically, in the introduction, there's mention of ON CREATE and ON MATCH. This allows you to set properties in cases where the MERGE resulted in a creation, or when instead the MERGE matched upon existing elements.
Typically you will want to MERGE only properties that uniquely define the thing, like IDs, and set the rest of the properties within ON CREATE.
Your query after this change might look something like this:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///EdgesETL.csv' AS line
MATCH (from:InfoNodes{id: toString(line.idfrom)})
MATCH (to:InfoNodes{id: toString(line.idto)})
MERGE (from)-[r:RELATION {firstproperty: toString(line.firstproperty)}]->(to)
ON CREATE SET r.secondproperty = toString(line.secondproperty), r.thirdproperty = toString(line.thirdproperty)
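Note that setting a property to null in SET simply leaves that property absent on the relationship. If secondproperty and thirdproperty are instead meant to be part of what distinguishes one relationship from another, a different option is to substitute a placeholder for null with coalesce() so the values can stay inside the MERGE pattern. The sketch below assumes an empty string is an acceptable stand-in for a missing value.

```cypher
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///EdgesETL.csv' AS line
MATCH (from:InfoNodes {id: toString(line.idfrom)})
MATCH (to:InfoNodes {id: toString(line.idto)})
// coalesce() returns its first non-null argument, so MERGE never sees null.
MERGE (from)-[:RELATION {
  firstproperty: toString(line.firstproperty),
  secondproperty: coalesce(line.secondproperty, ''),
  thirdproperty: coalesce(line.thirdproperty, '')
}]->(to)
```

The trade-off is that every relationship then carries all three properties, with empty strings where the CSV had no value.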
I'm very new to Neo4j, been playing around with it for a couple of days now.
I'm trying to use Neo4j to map our company's database by showing how one table is related to another (data is pulled to or pushed from one table to another) and what scripts are used to do this pulling and pushing. To do this, I'm using three different properties: TableName, ScriptName, and TableTouch.
TableName: Table node, which corresponds to the name of a table
ScriptName: Script node, which corresponds to the script which updates a table
TableTouch: Used to show which table affects another table
Here is an example of the .CSV I'm importing:
TableName ScriptName TableTouch
Source ScriptA Water/Oil
Water ScriptB Source
Oil ScriptC Source
Here is the code I have thus far:
CREATE CONSTRAINT ON (c:Table) ASSERT c.TableName IS UNIQUE;
CREATE CONSTRAINT ON (c:Scripts) ASSERT c.ScriptName IS UNIQUE;
LOAD CSV WITH HEADERS FROM
"file:///C:\\NeoTest.CSV" AS line
MERGE (table:Table {TableName: UPPER(line.TableName)})
SET table.TableTouch = UPPER(line.TableTouch)
MERGE (script:Scripts {ScriptName: UPPER(line.ScriptName)})
MERGE (table) - [:UPDATED_BY] -> (script)
This will relate scripts to their appropriate tables and load in all the table and script nodes.
Now an example of what I need is for Node "Source" to connect to Node "Water" because "Source" = Water.TableTouch and "Water" = Source.TableTouch.
Assume any given table could have multiple tables listed in the TableTouch property.
I want the TableName nodes to connect to other TableName nodes where the TableName of one node is found in the TableName.TableTouch of another node. How would I go about doing this? Do I need to have my .CSV formatted differently for this?
Thanks,
-Andrew
Edit: This may make things more clear
What I have:
What I'd like to have (red arrows):
[UPDATED]
If I understand your scenario, you want to represent the Script that is used to generate each Table, and what other table was used by that Script.
And, if I understand the meaning of your CSV file and your pictures, it looks like the Source table is generated by ScriptA without using data from any other tables. If so, you can create your CSV file to look something like this (where the Source table row's TableTouch column has the special value NOTHING -- you can use some other name -- to indicate that column actually has no value):
TableName,ScriptName,TableTouch
Source,ScriptA,NOTHING
Water,ScriptB,Source
Oil,ScriptC,Source
Data model:
(src:Table {name: 'Source'})<-[:USES]-(s:Script {name: 'ScriptC'})-[:MAKES]->(dest:Table {name: 'Oil'})
Note: This data model allows a single Script to "use" any number of source Tables and "make" any number of destination Tables.
Create Constraints
CREATE CONSTRAINT ON (t:Table) ASSERT t.name IS UNIQUE;
CREATE CONSTRAINT ON (s:Script) ASSERT s.name IS UNIQUE;
Import data
LOAD CSV WITH HEADERS FROM "file:///C:\\NeoTest.CSV" AS line
MERGE (src:Table {name: line.TableTouch})
MERGE (dest:Table {name: line.TableName})
MERGE (script:Script {name: line.ScriptName})
MERGE (script)-[:USES]->(src)
MERGE (script)-[:MAKES]->(dest)
Note: To keep the query simple, we just go ahead and create (at most) one NOTHING node to represent the absence of a source Table.
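If a TableTouch cell can list several tables, as with Water/Oil in the question's sample, one way to handle it is to split the cell and merge one USES relationship per entry. This is only a sketch: the '/' delimiter is taken from the question's sample, and skipping the NOTHING placeholder instead of creating a node for it is my own variation on the query above.

```cypher
LOAD CSV WITH HEADERS FROM "file:///C:\\NeoTest.CSV" AS line
MERGE (dest:Table {name: line.TableName})
MERGE (script:Script {name: line.ScriptName})
MERGE (script)-[:MAKES]->(dest)
WITH script, line
// One row per source table named in the TableTouch cell.
UNWIND split(line.TableTouch, '/') AS srcName
WITH script, srcName
WHERE srcName <> 'NOTHING'
MERGE (src:Table {name: srcName})
MERGE (script)-[:USES]->(src)
```

The MAKES relationship is merged before the UNWIND, so a row whose TableTouch is NOTHING still creates its destination table and script.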
Results