We are using Neo4j 2.1.4 Community edition.
We are having trouble getting specific paths in Neo4j.
Below is the csv file used.
In the graph database we create Product, Company, Country and Zipcode nodes, with a relationship of type ‘MyRel’ at each level. In the data above we want to distinguish each path, that is:
Mobile, Google, US, 88888 -- path 1
Mobile, Google, US -- path 2
Mobile, Google -- path 3
That’s why we added a Path column to the data file and store the Path value as a relationship property. Whenever someone wants to see a particular path, they can query on that relationship property (1, 2 or 3). For example, when we query for Path 2, we should get Mobile, Google, US.
But when I do this, the graph ends up with dummy Country and Zipcode nodes. This is because the Zipcode value is empty in the 2nd row, and both the Country and Zipcode values are empty in the 3rd row.
Query used:
LOAD CSV WITH HEADERS FROM "file:C:\\WorkingFolder\\Neo4j\\EDGE_Graph_POC\\newdata\\trial1.csv " as file
MERGE (p:Product {Name:file.Product})
MERGE (comp:Company {Name:file.Company})
MERGE (c:Country {Name:file.Country})
MERGE (zip:Zipcode{Code:file.Zipcode})
CREATE (p)-[:MyRel{Path:file.Path}]->(comp)-[:MyRel{Path:file.Path}]->(c)-[:MyRel{Path:file.Path}]->(zip)
Resultant graph:
So how can I avoid creating these dummy nodes?
Is there a better alternative for getting the proper path?
Thanks,
First, a simple solution is to follow the LOAD CSV query with further queries that clean up your graph. Run the queries
MATCH (zip:Zipcode { Code : ''})<-[r]-()
DELETE zip, r
and
MATCH (c:Country { Name : ''})<-[r]-()
DELETE c, r
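Run them in this order: the empty Country node still has an outgoing relationship to the empty Zipcode node until that node is deleted, and DELETE c, r only removes the Country node's incoming relationships. You will then have the graph you want.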
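Second, you can avoid creating them in the first place by filtering out empty values during the load. WHERE cannot follow LOAD CSV or CREATE directly, so introduce it with WITH, and test Country and Zipcode separately so that a row with a country but no zip code (path 2) still gets its Company -> Country relationship. Keep the Product and Company MERGEs and split up your CREATE, e.g.
CREATE (p)-[:MyRel{Path:file.Path}]->(comp)
WITH comp, file
WHERE file.Country <> ''
MERGE (c:Country {Name:file.Country})
CREATE (comp)-[:MyRel{Path:file.Path}]->(c)
WITH c, file
WHERE file.Zipcode <> ''
MERGE (zip:Zipcode {Code:file.Zipcode})
CREATE (c)-[:MyRel{Path:file.Path}]->(zip)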
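As a rough check of which relationships the filtered load should produce, the Python sketch below applies the same per-row rules to in-memory data. The CSV content is a hypothetical reconstruction from the three example paths in the question (the original file is not shown), and it assumes empty cells arrive as empty strings, as they evidently did here.

```python
import csv
import io

# Hypothetical CSV reconstructed from the three example paths in the question.
DATA = """Product,Company,Country,Zipcode,Path
Mobile,Google,US,88888,1
Mobile,Google,US,,2
Mobile,Google,,,3
"""

def load_edges(text):
    """Return (from, to, path) triples, skipping empty Country/Zipcode cells
    the same way the WITH ... WHERE filters do."""
    edges = []
    for row in csv.DictReader(io.StringIO(text)):
        path = row["Path"]
        edges.append((row["Product"], row["Company"], path))
        if row["Country"] == "":
            continue  # no Country: stop after Product -> Company
        edges.append((row["Company"], row["Country"], path))
        if row["Zipcode"] == "":
            continue  # no Zipcode: stop after Company -> Country
        edges.append((row["Country"], row["Zipcode"], path))
    return edges

def nodes_on_path(edges, path):
    """Follow the edges tagged with one Path value, in insertion order."""
    chain = [e for e in edges if e[2] == path]
    return [chain[0][0]] + [e[1] for e in chain] if chain else []

edges = load_edges(DATA)
print(nodes_on_path(edges, "2"))  # ['Mobile', 'Google', 'US']
```

Selecting the relationships tagged with Path 2 gives Mobile, Google, US, and no node with an empty name is ever produced.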
I am trying to create a relationship between two existing nodes. I read the node IDs from a CSV and create the relationship with the following query:
LOAD CSV WITH HEADERS FROM "file:///8245.csv" AS f
MATCH (Ev:Event) where id(Ev) =f.first
MATCH (Ev_sec:Event) where id(Ev_sec) = f.second
WITH Ev, Ev_sec
MERGE (Ev)-[:DF_mat]->(Ev_sec)
However, it does not change anything in the database. How can I solve this problem?
Thanks!
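I solved the problem. I queried for the node IDs again, this time exporting them as strings (using toString(ID(node))), and converted them to integers when loading them into the database. The underlying cause is that LOAD CSV returns every field as a string, while id() returns an integer, so the original comparison id(Ev) = f.first never matched. The query is as follows: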
LOAD CSV WITH HEADERS FROM "file:///8245_new.csv" AS csvLine
match (ev:Event) where id(ev)=toInteger(csvLine.first)
match (ev_sec:Event) where id(ev_sec)=toInteger(csvLine.second)
merge (ev)-[:DF_mat]-> (ev_sec)
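The same type mismatch can be reproduced outside Neo4j: Python's csv module, like LOAD CSV, hands back every field as a string, so comparing it with an integer id never matches until it is converted. A minimal sketch; the file content and the set of node ids are hypothetical stand-ins.

```python
import csv
import io

# Hypothetical file in the same shape as 8245.csv: two node ids per row.
DATA = "first,second\n0,1\n1,2\n"

node_ids = {0, 1, 2}  # stand-in for the integer values returned by id(ev)

rows = list(csv.DictReader(io.StringIO(DATA)))

# Comparing the raw string to an integer id never matches (like id(Ev) = f.first).
raw_matches = [r for r in rows if r["first"] in node_ids]

# Converting first (like toInteger(csvLine.first)) makes the lookup succeed.
pairs = [(int(r["first"]), int(r["second"])) for r in rows
         if int(r["first"]) in node_ids and int(r["second"]) in node_ids]

print(len(raw_matches), pairs)  # 0 [(0, 1), (1, 2)]
```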
I'm trying to load a sparse (co-occurrence) matrix in Neo4j but after many failed queries, it's getting frustrating.
Raw data
Basically, I want to create a node for each id, and the weight of the relationship to every other node (including itself) should be the corresponding value in the matrix.
So, for example, 'nhs' should have a self-relationship with weight 41 and a relationship with weight 16 to 'england', and so on.
I was trying things like:
LOAD CSV WITH HEADERS FROM 'file:///matpharma.csv' AS row
MERGE (a: node{name: row.id})
MERGE (b: node{name: row.key})
MERGE (a)-[:w]-(b);
I'm not sure how to attach the edge values though (and not yet sure if the merges are producing the expected result).
Thanks in advance for the assistance
If you just need to add a property on a relationship, where the property value is in your CSV, then it's just a matter of adding a variable for the relationship that you MERGE in, and then using SET (or ON CREATE SET, if you only want to set the property if the relationship didn't exist and needed to be created). So something like:
LOAD CSV WITH HEADERS FROM 'file:///matpharma.csv' AS row
MERGE (a: node{name: row.id})
MERGE (b: node{name: row.key})
MERGE (a)-[r:w]-(b)
SET r.weight = row.weight
EDIT
Ah, took a look at the CSV clip. This is a very strange way to format your data. You have data in your header (that is, your headers are trying to define the other node to lookup) which is the wrong way to go about this. You should instead have, per row, one column that defines one of the two nodes to connect (like the "id" column) and then another column for the other node (something like an "id2"). That way you can just do two MATCHes to get your nodes, then a MERGE between them, and then setting the relationship property, similar to the sample query I provided above.
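Following this advice, the matrix could be flattened before import. Below is a Python sketch, assuming the layout described in the question (an "id" column followed by one column per other id, with empty cells where there is no co-occurrence). Only the nhs/nhs (41) and nhs/england (16) values come from the question; the rest of the sample is made up.

```python
import csv
import io

# Hypothetical matrix; only nhs/nhs = 41 and nhs/england = 16 come from
# the question, the other values are invented for illustration.
MATRIX = """id,nhs,england,wales
nhs,41,16,
england,16,30,5
wales,,5,12
"""

def flatten(text):
    """Turn the wide matrix into id,id2,weight rows, skipping empty cells."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "id2", "weight"])
    for row in csv.DictReader(io.StringIO(text)):
        for key, value in row.items():
            if key != "id" and value != "":
                writer.writerow([row["id"], key, value])
    return out.getvalue()

print(flatten(MATRIX))
```

The resulting file has one relationship per row, so it can be loaded with a lookup on id and id2, a MERGE between the two nodes, and a SET of the weight, as in the first query above.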
But if you're set on this format, then it's going to be a more complicated query, since we have to deal with dynamic access of the row keys and values.
Something like:
LOAD CSV WITH HEADERS FROM 'file:///matpharma.csv' AS row
MERGE (start:Node {name:row.id})
WITH start, row, [key in keys(row) WHERE key <> 'id'] as keys
FOREACH (key in keys |
MERGE (end:Node {name:key})
MERGE (start)-[r:w]-(end)
ON CREATE SET r.weight = row[key] )
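Because this MERGE is undirected and the weight is only set ON CREATE, each unordered pair of nodes should end up with exactly one relationship, carrying the weight from whichever row reached it first. The Python sketch below models that behaviour with a dict keyed on the unordered pair; it illustrates the semantics only (it is not Neo4j code), and the sample matrix is hypothetical apart from the 41 and 16 values from the question.

```python
import csv
import io

# Hypothetical symmetric matrix (england/england = 30 is invented).
MATRIX = """id,nhs,england
nhs,41,16
england,16,30
"""

def merge_undirected(text):
    """Model MERGE (start)-[r:w]-(end) ON CREATE SET r.weight = row[key]:
    one relationship per unordered pair, weight kept from the first row."""
    rels = {}
    for row in csv.DictReader(io.StringIO(text)):
        start = row["id"]
        for key in row:
            if key == "id":
                continue
            pair = frozenset((start, key))  # undirected: {a, b} == {b, a}
            if pair not in rels:            # ON CREATE only
                rels[pair] = row[key]       # still a string, as in LOAD CSV
    return rels

rels = merge_undirected(MATRIX)
print(len(rels))  # 3
```

Note that the weights stay strings, just as LOAD CSV returns them; converting with toInteger() or toFloat() in the Cypher query would be needed if numeric weights are wanted.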
This is a nice Cypher challenge :) That said, LOAD CSV is not really meant for this, and you would probably be happier flattening your data first.
Here is what I came up with :
LOAD CSV FROM "https://gist.githubusercontent.com/ikwattro/a5260d131f25bcce97c945cb97bc0bee/raw/4ce2b3421ad80ca946329a0be8a6e79ca025f253/data.csv" AS row
WITH collect(row) AS rows
WITH rows, rows[0] AS firstRow
UNWIND rows AS row
WITH firstRow, row SKIP 1
UNWIND range(0, size(row)-2) AS i
RETURN firstRow[i+1], row[0], row[i+1]
You can take a look at the gist
I am working on a StreamSets pipeline that reads data from an active file directory, where .csv files are uploaded remotely, and puts that data into a Neo4j database.
The steps I use are:
Create an OBSERVATION node for each row in the .csv
Create a CSV node and a relationship between the CSV node and each record
Update the timestamp on the Burn_In_Test nodes (already created in the graph by a different pipeline) with the timestamp from the CSV node, if it is the latest
Create a relationship from the CSV node to the burn-in test
Delete outdated relationships based on the latest timestamp
Right now I am doing all of this with a JDBC query, and the Cypher query used is:
MERGE (m:OBSERVATION{
SerialNumber: "${record:value('/SerialNumber')}",
Test_Stage: "${record:value('/Test_Stage')}",
CUR: "${record:value('/CUR')}",
VOLT: "${record:value('/VOLT')}",
Rel_Lot: "${record:value('/Rel_Lot')}",
TimestampINT: "${record:value('/TimestampINT')}",
Temp: "${record:value('/Temp')}",
LP: "${record:value('/LP')}",
MON: "${record:value('/MON')}"
})
MERGE (t:CSV{
SerialNumber: "${record:value('/SerialNumber')}",
Test_Stage: "${record:value('/Test_Stage')}",
TimestampINT: "${record:value('/TimestampINT')}"
})
WITH m
MATCH (t:CSV) where t.SerialNumber=m.SerialNumber and t.Test_Stage=m.Test_Stage and t.TimestampINT=m.TimestampINT MERGE (m)-[:PART_OF]->(t)
WITH t, t.TimestampINT AS TimestampINT
MATCH (rl:Burn_In_Test) where rl.SerialNumber=t.SerialNumber and rl.Test_Stage=t.Test_Stage and rl.TimestampINT<TimestampINT
SET rl.TimestampINT=TimestampINT
WITH t
MATCH (rl:Burn_In_Test) where rl.SerialNumber=t.SerialNumber and rl.Test_Stage=t.Test_Stage
MERGE (t)-[:POINTS_TO]->(rl)
WITH rl
MATCH (t:CSV)-[r:POINTS_TO]->(rl) WHERE t.TimestampINT<rl.TimestampINT
DELETE r
Right now this process is very slow: it takes about 15 minutes for 10 records. Can this be optimized further?
Best practice when using MERGE is to merge on a single identifying property and then use SET to add the other properties.
Assuming the SerialNumber property is unique for every node (it might not be), it would look like:
MERGE (m:OBSERVATION{SerialNumber: "${record:value('/SerialNumber')}"})
SET m.Test_Stage = "${record:value('/Test_Stage')}",
m.CUR= "${record:value('/CUR')}",
m.VOLT= "${record:value('/VOLT')}",
m.Rel_Lot= "${record:value('/Rel_Lot')}",
m.TimestampINT = "${record:value('/TimestampINT')}",
m.Temp= "${record:value('/Temp')}",
m.LP= "${record:value('/LP')}",
m.MON= "${record:value('/MON')}"
MERGE (t:CSV{
SerialNumber: "${record:value('/SerialNumber')}"
})
SET t.Test_Stage = "${record:value('/Test_Stage')}",
t.TimestampINT = "${record:value('/TimestampINT')}"
WITH m
MATCH (t:CSV) where t.SerialNumber=m.SerialNumber and t.Test_Stage=m.Test_Stage and t.TimestampINT=m.TimestampINT MERGE (m)-[:PART_OF]->(t)
WITH t, t.TimestampINT AS TimestampINT
MATCH (rl:Burn_In_Test) where rl.SerialNumber=t.SerialNumber and rl.Test_Stage=t.Test_Stage and rl.TimestampINT<TimestampINT
SET rl.TimestampINT=TimestampINT
WITH t
MATCH (rl:Burn_In_Test) where rl.SerialNumber=t.SerialNumber and rl.Test_Stage=t.Test_Stage
MERGE (t)-[:POINTS_TO]->(rl)
WITH rl
MATCH (t:CSV)-[r:POINTS_TO]->(rl) WHERE t.TimestampINT<rl.TimestampINT
DELETE r
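The difference between the two MERGE styles can be illustrated with a small in-memory model: MERGE matches on the whole property map it is given, so a map that differs in any value (for example a new CUR reading for the same serial number) yields a second node, whereas merging on one key and then using SET updates the existing node. This is a hypothetical Python model with made-up readings, not Neo4j code; which behaviour is right depends on whether SerialNumber really identifies one node, as cautioned above.

```python
def merge_full(store, props):
    """Model MERGE (m:OBSERVATION {all props}): match only an identical map."""
    for node in store:
        if node == props:
            return node
    node = dict(props)
    store.append(node)
    return node

def merge_key_then_set(store, key, props):
    """Model MERGE (m {key: ...}) SET m.other = ...: match one key, update the rest."""
    for node in store:
        if node[key] == props[key]:
            node.update(props)
            return node
    node = dict(props)
    store.append(node)
    return node

readings = [{"SerialNumber": "SN1", "CUR": "1.0"},
            {"SerialNumber": "SN1", "CUR": "1.2"}]  # hypothetical values

full, keyed = [], []
for r in readings:
    merge_full(full, r)
    merge_key_then_set(keyed, "SerialNumber", r)
print(len(full), len(keyed))  # 2 1
```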
Another thing: I would probably split this into two queries.
The first would do the import, and the second would delete the outdated relationships. Also add uniqueness constraints and indexes where possible.
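For reference, the timestamp bookkeeping in steps 3 to 5 of the question boils down to "keep only the POINTS_TO link from the newest CSV node". Here is a hypothetical in-memory Python model of that logic for a single Burn_In_Test node; it uses integer timestamps, whereas the query above stores TimestampINT as a quoted string, and strings compare lexicographically rather than numerically.

```python
from dataclasses import dataclass, field

@dataclass
class BurnInTest:
    serial: str
    stage: str
    timestamp: int  # integers assumed; the query stores a string
    sources: set = field(default_factory=set)  # timestamps of CSV nodes pointing here

def apply_csv(test, csv_serial, csv_stage, csv_timestamp):
    """Mirror steps 3-5 for one CSV node (hypothetical in-memory model)."""
    if (csv_serial, csv_stage) != (test.serial, test.stage):
        return
    # Step 3: move the test's timestamp forward if this CSV is newer.
    test.timestamp = max(test.timestamp, csv_timestamp)
    # Step 4: MERGE (t)-[:POINTS_TO]->(rl).
    test.sources.add(csv_timestamp)
    # Step 5: drop links from CSV nodes older than the test's timestamp.
    test.sources = {ts for ts in test.sources if ts >= test.timestamp}

rl = BurnInTest("SN1", "S1", timestamp=100)
for ts in (120, 110, 150):
    apply_csv(rl, "SN1", "S1", ts)
print(rl.timestamp, rl.sources)  # 150 {150}
```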
I have the following file A.csv
"NODE","PREDECESSORS"
"1",""
"2","1"
"3","1;2"
I want to create the nodes 1, 2 and 3 and the relationships 1->2->3 and 1->3.
I have already tried this:
LOAD CSV WITH HEADERS FROM 'file:///A.csv' AS line
CREATE (:Task { NODE: line.NODE, PREDECESSORS: SPLIT(line.PREDECESSORS ';')})
FOREACH (value IN line.PREDECESSORS |
MERGE (PREDECESSORS:value)-[r:RELATIONSHIP]->(NODE) )
But it does not work: it does not create any relationships.
Could you please help me?
The problem is in your MERGE:
MERGE (PREDECESSORS:value)-[r:RELATIONSHIP]->(NODE)
This is merging a :value labeled node and assigning it to the variable PREDECESSORS, which can't be what you want to do.
A better approach is not to save the predecessor data on the node, but to use it only to match the relevant nodes and create the relationships.
It will also help to have an index on :Task(NODE) so your matches to the predecessors are quick.
Remember also that Cypher does not process the entire query for each row in turn; rather, each operation in the query is processed for every row. So once the CREATE has executed, all the nodes will exist, and there's no need to MERGE the predecessor nodes.
Try something like this:
LOAD CSV WITH HEADERS FROM 'file:///A.csv' AS line
CREATE (node:Task { NODE: line.NODE})
WITH node, SPLIT(line.PREDECESSORS, ';') as predecessors
MATCH (p:Task)
WHERE p.NODE in predecessors
MERGE (p)-[:RELATIONSHIP]->(node)
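The split-and-match logic can be checked against the sample file in plain Python. This sketch builds the (predecessor, node) pairs that the query above should create from A.csv, treating the set of NODE values as the Task nodes that already exist once the CREATE has run.

```python
import csv
import io

# The A.csv sample from the question.
A_CSV = '''"NODE","PREDECESSORS"
"1",""
"2","1"
"3","1;2"
'''

def predecessor_edges(text):
    rows = list(csv.DictReader(io.StringIO(text)))
    nodes = {row["NODE"] for row in rows}  # all Task nodes exist after CREATE
    edges = []
    for row in rows:
        for p in row["PREDECESSORS"].split(";"):
            if p in nodes:  # like MATCH (p:Task) WHERE p.NODE IN predecessors
                edges.append((p, row["NODE"]))
    return edges

print(predecessor_edges(A_CSV))  # [('1', '2'), ('1', '3'), ('2', '3')]
```

An empty PREDECESSORS cell splits into a single empty string, which matches no node, so node 1 simply gets no incoming relationship.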
In Neo4j, I am trying to load a CSV file whilst creating a relationship between nodes based on the condition that a certain property is matched.
My Cypher code is:
LOAD CSV WITH HEADERS FROM "file:C:/Users/George.Kyle/Simple/Simple scream v3.csv" AS
csvLine
MATCH (g:simplepages { page: csvLine.page}),(y:simplepages {pagekeyword: csvLine.keyword} )
MATCH (n:sensitiveskin)
WHERE g.keyword = n.keyword
CREATE (f)-[:_]->(n)
You can see I am trying to create a relationship between 'simplepages' and 'sensitiveskin' based on their keyword properties being the same.
The query executes, but no relationships are formed.
What I hope for is that when I execute a query such as
MATCH (n:sensitiveskin) RETURN n LIMIT 25
I would see all the nodes (both sensitiveskin and simplepages) with auto-complete switched on.
CREATE (f)-[:_]->(n) is using an f variable that was not previously defined, so it is creating a new node (with no label or properties) instead, and then creating a relationship from that new node. I think you meant to use either g or y instead of f. (Probably y, since you don't otherwise use it?)