USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///dept_emp.csv" AS row
MATCH (emp_no:Employee {emp_no: row.emp_no})
MATCH (dept_no:departments {dept_no: row.dept_no})
MERGE(Employee)-[:belongs_to{from_date: row.from_date,to_date:
row.to_date}]->(departments);
i want to make a relationship with properties using this query the structure of node
employee is
[
gender M
emp_no 10001
birth_date 1953-09-02
last_name Facello
hire_date 1986-06-26
first_name Georgi
]
node departements is
[
dept_no d009
dept_name Customer Service
]
structure of file
dept_emp.csv is
(
emp_no dept_name from_date to_date
)
the ide does'nt showing an error just start processing and after 6 hour still processing.
I think you may be a bit confused about match syntax, for which part is the variable, and which is the node label.
MATCH (emp_no:Employee {emp_no: row.emp_no})
In the above match, :Employee is the label of the node. emp_no is the variable bound to the :Employee node that is matched.
Later in the query, you have this:
MERGE(Employee)-[:belongs_to{from_date: row.from_date,to_date:
row.to_date}]->(departments);
The problem here is that Employee and departments don't refer to anything you've matched to previously, this is the first occurrence of these variables, and that will throw off what this MERGE is doing. As it is, it is checking all relationships between all nodes (and doing this for every row in your CSV), looking for :belongs_to relationships with the given date properties.
I suggest you stop the query (by killing Neo4j if necessary) cleaning up the data (in case it needs it) and trying again, but try a MERGE with the variables you previously bound:
MERGE(emp_no)-[:belongs_to{from_date: row.from_date,to_date:
row.to_date}]->(dept_no);
Make sure you have indexes or unique constraints on :Employee(emp_no) and :departments(dept_no) for fast matching.
Related
I am loading simple csv data into neo4j. The data is simple as follows :-
uniqueId compound value category
ACT12_M_609 mesulfen 21 carbon
ACT12_M_609 MNAF 23 carbon
ACT12_M_609 nifluridide 20 suphate
ACT12_M_609 sulfur 23 carbon
I am loading the data from the URL using the following query -
LOAD CSV WITH HEADERS
FROM "url"
AS row
MERGE( t: Transaction { transactionId: row.uniqueId })
MERGE(c:Compound {name: row.compound})
MERGE (t)-[r:CONTAINS]->(c)
ON CREATE SET c.category= row.category
ON CREATE SET r.price =row.value
Next I do the aggregation to count total orders for a compound and create property for a node in the following way -
MATCH (c:Compound) <-[:CONTAINS]- (t:Transaction)
with c.name as name, count( distinct t.transactionId) as ord
set c.orders = ord
So far so good. I can accomplish what I want but I have the following 2 questions -
How can I create the orders property for compound node in the first step itself? .i.e. when I am loading the data I would like to perform the aggregation straight away.
For a compound node I am also setting the property for category. Theoretically, it can also be modelled as category -contains-> compound by creating Categorynode. But what advantage will I have if I do it? Because I can execute the queries and get the expected output without creating this additional node.
Thank you for your answer.
I don't think that's possible, LOAD CSV goes over one row at a time, so at row 1, it doesn't know how many more rows will follow.
I guess you could create virtual nodes and relationships, aggregate those and then use those to create the real nodes, but that would be way more complicated. Virtual Nodes/Rels
That depends on the questions/queries you want to ask.
A graph database is optimised for following relationships, so if you often do a query where the category is a criteria (e.g. MATCH (c: Category {category_id: 12})-[r]-(:Compound) ), it might be more performant to create a label for it.
If you just want to get the category in the results (e.g. RETURN compound.category), then it's fine as a property.
My current graph monitors board members at a company through time.
However, I'm only interested in currently employed directors. This can be observed because director nodes connect to company nodes through an employment path which includes an end date (r.to) when the director is no longer employed at the firm. If he is currently employed, there will be no end date(null as per below picture). Therefore, I would like to filter the path not containing an end date. I am not sure if the value is an empty string, a null value, or other types so I've been trying different ways without much success. Thanks for any tips!
Current formula
MATCH (c2:Company)-[r2:MANAGED]-(d:Director)-[r:MANAGED]-(c:Company {ticker:'COMS'})
WHERE r.to Is null
RETURN c,d,c2
Unless the response from the Neo4j browser was edited, it looks like the value of r.to is not null or empty, but the string None.
This query will help verify if this is the case:
MATCH (d:Director)-[r:MANAGED]-(c:Company {ticker:'COMS'})
RETURN DISTINCT r.to ORDER by r.to DESC
Absence of the property will show a null in the tabular response. Any other value is a real value of that property. If None shows up, then your query would be
MATCH (c2:Company)-[r2:MANAGED]-(d:Director)-[r:MANAGED]-(c:Company {ticker:'COMS'})
WHERE r.to="None"
RETURN c,d,c2
I'm learning about neo4j and I have the following question.
I have two groups of nodes, the first one is called Workers who have an ID and the name of the worker.
On the other hand there is another group of nodes, called products, which apart from the id, has the following attributes; price, name.
I want to make a relationship called "manipulate" where I relate a worker to the product that he is going to manipulate.
For this I have a trabajaensector.csv file which relates the workers by id, along with the products they are going to manipulate, also by id.
This is its form:
id1,id2,sector
1,1,fruteria
2,2,fruteria
3,2,fruteria
4,7,panaderia
5,5,fruteria
6,5,fruteria
7,9,bebidas
8,9,bebidas
9,10,bebidas
10,10,bebidas
11,3,pescaderia
12,8,panaderia
13,7,panaderia
14,9,bebidas
15,10,bebidas
16,4,pescaderia
17,2,fruteria
18,4,pescaderia
In summary, id1 (worker) manipulates id2 (product) and its sector is "fruteria/pescaderia/panaderia o bebida"
This is my CQL for creating manipulate relationship:
LOAD CSV WITH HEADERS FROM "file:///trabajaensector.csv" AS csvLine
MATCH(w:Worker),(p:Product) where w.id= toInt(csvLine.id1) and p.id=
toInt(csvLine.id2) create (w)-[sect:trabajasec]->(p) return sect
Here is my problem, the relationship is apparently creating well, however I am losing that third "sector" data, which indicates the sector where the worker works by manipulating that product.
For example, the relationship for a worker named Juan who manipulates apples should have in the relation the variable / attribute "fruteria" or for fish "pescaderia".
Any idea of how to properly include that data in the relationship and how to recover it?
You can add a sector property to the trabajasec relationships:
LOAD CSV WITH HEADERS FROM "file:///trabajaensector.csv" AS csvLine
MATCH (w:Worker), (p:Product)
WHERE w.id = TOINT(csvLine.id1) AND p.id = TOINT(csvLine.id2)
CREATE (w)-[sect:trabajasec {sector: csvLine.sector}]->(p)
RETURN sect;
To use the above query, you should first delete the trabajasec relationships created by your earlier LOAD CSV query.
I am trying to load large dataset into neo4j-3 and looking for the options. I found one neo4j-import but the problem with that is it is for initial load only. I have to load 2M records around every week.
I tried loading through shell but having some performance issue, I tried following.
1) Creating constraint upfront.
2) Creating Node and relationships in separate query.
3) Heap space 8G
4) dbms.memory.pagecache 4G
Many times the import just hangs and does nothing for hours.
Edit - CSV load being executed:
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS
FROM "file:///my_sds_39_joe.csv"
AS row
OPTIONAL MATCH (per:Person {UID : "Person."+row.player_cardnum})
WHERE per IS NULL
MERGE (p:Person {CardNumber : row.player_cardnum})
ON CREATE SET p.Creation Date = timestamp(), p.Modification Date = timestamp() ;
EDIT
On a second look, seems like you're trying to implement some kind of conditional logic to your insert.
It looks like what you're trying to do is figure out if a :Person exists with a UID (derived from some concatenation with row.player_cardnum), and in the case where that :Person doesn't exist and the match fails, MERGE a :Person with the CardNumber given by row.player_cardnum.
If this is your goal, you're ALMOST there with your query. The problem is with your WHERE clause.
Understand that WHERE clauses are linked with a preceding MATCH, OPTIONAL MATCH, or WITH, and only affects the linked clause.
With that WHERE on that OPTIONAL MATCH, per will always be null, but more importantly, your row will still exist, and the following MERGE will ALWAYS take place for all rows in the CSV. This is probably the source of your slowdown, as it's creating new :Person nodes for all rows.
If you're trying to null out the row completely when the OPTIONAL MATCH hits on an existing :Person (so the MERGE won't happen in that case), you'll need to add a WITH clause, and make sure your WHERE clause is applied to it instead of the OPTIONAL MATCH.
Additionally, make sure that you have either unique constraints or indexes on Person.UID and Person.CardNumber. As for the UID match, I've heard that indexes are not used when there's some kind of string concatenation of the thing you're matching upon, so you may need to assemble it first and pass it in with a WITH.
Your final query would look like this:
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS
FROM "file:///my_sds_39_joe.csv"
AS row
// first build the UID so we can take advantage of the index
WITH row, "Person." + row.player_cardnum AS UID
OPTIONAL MATCH (per:Person {UID : UID})
// the WHERE now applies to the WITH, which will filter out and null out the row when an OPTIONAL MATCH is found
WITH row, per
WHERE per IS NULL
MERGE (p:Person {CardNumber : row.player_cardnum})
ON CREATE SET p.Creation Date = timestamp(), p.Modification Date = timestamp() ;
I failed to create relationships in Neo4J and I would like to encourage anyone who has sucessfully done it to help me.
The desired result is to have a detailed visualisation of who is a brother to whom, who is who's mother and so on. I want to extract the data from single parent-child relationships. That means, setting a relationship like [:relatedTo {:how['daughter']}] if a node has a parent whose name corresponds to the field node.name and the gender of the node is F.
I have my CSV file that looks like this.
1;Jakub Hančin;M;1994;4;3
2;Hana Hančinová;F;1991;4;3
3;Alojz Hančin jr.;M;1968;15;14
4;Viera Hančinová;F;1968;9;
5;Miroslav Barus sr.;M;1965;9;
6;Helena Barusová;F;1942;;
7;Miroslav Barus jr.;M;1995;6;5
8;Martin Barus;M;1991;6;5
9;Hedviga Barusová;F;1945;;
10;Peter Hančin jr.;M;1991;12;13
11;Zuzka Hančinová;F;1996;12;13
12;Andrea Hančinová;F;1966;;
13;Peter Hančin sr.;M;1965;15;14
14;Alojz Hančin sr.;M;1937;;
15;Anna Hančinová;F;1945;;
This is my personal family tree and I would like to visualize it through Neo4J.
It is a file created with Excel, where I put the information into a table and create a database. Then it was converted to .csv file which is importable into Neo4J. I have sucessfully installed it and now I am at the point of writing the Cypher script to manage it. So far, I have this:
LOAD CSV WITH HEADERS FROM "file:c:/users/Skelo/Desktop/Family Database/Family Database CSV UTF.txt" AS row FIELDTERMINATOR ';'
CREATE (n:Person)
SET n = row, n.name = row.name,
n.personID = toInt(row.personID) , n.G = row.G,
n.Year = toInt(row.Year), n.Parent1 = row.Parent1, n.Parent2 = row.Parent2
WITH n
MATCH(n:Person),(b:Person)
WHERE n.Parent1 = b.name OR n.Parent2 = b.name
CASE b.gender
WHEN b.gender = 'F' THEN
CREATE (b)-[:isRelatedTo{how:['mother']}]->(n)
WHEN b.gender = 'M' THEN
CREATE (b)-[:isRelatedTo{how:['father']}]->(n)
RETURN *
The error message shown looks like this.
Invalid input 'A': expected 'r/R' (line 11, column 2 (offset: 389))
"CASE b.gender"
^
Somehow, I can't figure out why this does not work. Why can't I use the Case command? The Neo4J does not allow me to use anything but the command CREATE (it expects a letter R after C and not an A, this means the CREATE command).
Again, I want to do this. I have a few nodes that are correctly set. For each of those nodes (they represent people), I want to look into the Parent1 and Parent2 fields and to look for a node that has the same name as one of these fields. If it matches one of these, I want to mark that node as a father or a mother to the previous node (judging by the gender of the node, which represents the person).
This way I would like to fill the graph database with many relationships, but I fail at this very basic step. Please help me. If you can, please do not only say what is wrong and why it is wrong, but present a solution that works.
Since you want to create the isRelatedTo relationship regardless of gender and only the property is dependent upon a conditional, do this:
CREATE (b)-[r:isRelatedTo]->(n)
SET r.how = CASE b.gender WHEN 'F' THEN 'mother' ELSE 'father' END