Neo4j: Optimize a relationship check (query)

After importing data via LOAD CSV, I want to connect the imported nodes to customer nodes that are already in the DB. The idea was to look up all imported nodes with the label Ticket, run through the result set, and create the relationship for each one.
Here is my first approach:
// Find Ticket nodes without a relationship to a customer
MATCH (t:Ticket), (c:Customer)
WHERE NOT (t)--(c)
RETURN t.number as ticket_number, t.type as ticket_type,t.sid as ticket_sid
// Run through the result set and execute this for each node found
MATCH (t:Ticket { number: "xxx" }), (c:Customer {code: "xxx"})
MERGE (t)-[:IS_TICKET_OF]->(c);
There are indexes
ON :Ticket(number)
ON :Customer(code)
This approach is very slow: it took minutes to run through the CSV file. I hope there is a way to optimize the query, or an easier way to create the missing relationships than looking them all up first and then looping over them.
The CSV load is:
LOAD CSV FROM "file:c:..." AS csvLine
MERGE (t:Ticket { number: csvLine[0]})
Maybe it's also fine to create the relationship during the CSV import, with something like
MATCH (c:Customer {code:"xxx"})
MERGE (t) - [:IS_TICKET_OF]-> (c)
But I would need to figure out in the query how to extract the code from a field: the CSV contains values like "aaa/vvv/bbb/1234", and I need only aaa for the match above, because that is what the customer node stores as its ID.
Any hint is much appreciated.
Thanks!

Does this query work for you?
It stores the aaa part of the input string in num, makes sure the ticket with that number exists, and then makes sure a relationship exists to the matching customer (if there is such a customer).
LOAD CSV FROM "file:c:..." AS csvLine
WITH SPLIT(csvLine[0], '/')[0] AS num
MERGE (t:Ticket {number: num})
WITH num, t
MATCH (c:Customer {code: num})
MERGE (t)-[:IS_TICKET_OF]->(c);
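
If the ticket number should stay the full CSV field and only its first segment identifies the customer, the query could be adapted as in the sketch below. This is only a sketch under assumptions: the file name tickets.csv is hypothetical, and column 0 is assumed to hold values like "aaa/vvv/bbb/1234". It uses a plain MATCH, so rows without a matching customer are skipped after the ticket has been merged.

```cypher
// Assumption: column 0 looks like "aaa/vvv/bbb/1234".
// 'file:///tickets.csv' is a hypothetical file name.
LOAD CSV FROM 'file:///tickets.csv' AS csvLine
// Keep the whole field as the ticket number, the first segment as the customer code
WITH csvLine[0] AS number, split(csvLine[0], '/')[0] AS code
MERGE (t:Ticket {number: number})
WITH t, code
// Rows with no matching customer stop here; the ticket itself already exists
MATCH (c:Customer {code: code})
MERGE (t)-[:IS_TICKET_OF]->(c);
```

Because both lookups use indexed properties (Ticket.number and Customer.code), each row needs two index seeks rather than a scan over all Ticket/Customer pairs.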

Related

Unable to link a node with itself using Neo4j

How can I create a relationship from a node to itself? I have one node label (p:person) and my CSV has 2 columns: name and vice. Each row in my CSV represents a person who was a CEO and their VP at the time. Sometimes a VP was also a CEO, so I want to show that relationship. Here is what I tried, with no luck. If I do not include the WITH, I get an error saying I need it, but when I add the * or a property, it says it cannot find row. I'm stuck.
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///ceo_vp.csv' AS row
CREATE (p:person {name:coalesce(row.name,'UNK')})
MATCH (p:person {name:row.vice })
WITH *
CREATE (p)-[:was_vp_for]->(p)
The problem is the variable p: it is used for both the CEO and the VP. You must assign a different variable name to the VP. Here is the script:
LOAD CSV WITH HEADERS FROM 'file:///ceo_vp.csv' AS row
MERGE (ceo:person {name:coalesce(row.name,'UNK')})
MERGE (vice:person {name:row.vice })
CREATE (vice)-[:was_vp_for]->(ceo)
Notice that I used MERGE because, as you said, a VP can be a former CEO (and vice versa), so MERGE is better than CREATE. MERGE matches the person if it already exists instead of creating a duplicate.

My match/merge process is not creating relationships in the Neo4J database

I am very new to Neo4j/cypher/graph databases, and have been trying to follow the Neo4j tutorial to import data I have in a csv and create relationships.
The following code does what I want in terms of reading in the data, creating nodes, and setting properties.
/* Importing data on seller-buyer relationships */
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///customer_rel_table.tsv' AS row
FIELDTERMINATOR '\t'
MERGE (seller:Seller {sellerID: row.seller})
ON CREATE SET seller += {name: row.seller_name,
root_eid: row.vendor_eid,
city: row.city}
MERGE (buyer:Buyer {buyerID: row.buyer})
ON CREATE SET buyer += {name: row.buyer_name};
/* Creating indices for the properties I might want to match on */
CREATE INDEX seller_id FOR (s:Seller) on (s.seller_name);
CREATE INDEX buyer_id FOR (b:Buyer) on (b.buyer_name);
/* Creating constraints to guarantee buyer-seller pairs are not duplicated */
CREATE CONSTRAINT sellerID ON (s:Seller) ASSERT s.sellerID IS UNIQUE;
CREATE CONSTRAINT buyerID on (b:Buyer) ASSERT b.buyerID IS UNIQUE;
Now I have the nodes (sellers and buyers) that I want, and I would like to link buyers and sellers. The code I have tried for this is:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///customer_rel_table.tsv' AS row
MATCH (s:Seller {sellerID: row.seller})
MATCH (b:Buyer {buyerID: row.buyer})
MERGE (s)-[st:SOLD_TO]->(b)
The query runs, but I don't get any relationships:
Query executed in 294ms. Query type: WRITE_ONLY.
No results.
Since I'm not asking it to RETURN anything, I think the "No results" comment is correct, but when I look at metadata for the DB, no relationships appear. Also, my data has ~220K rows, so 294ms seems fast.
EDIT: At @cybersam's prompting, I tried this query:
MATCH p=(:Seller)-[:SOLD_TO]->(:Buyer) RETURN p, which gives No results.
For clarity, there are two fields in my data that are the heart of the relationship:
seller and buyer, where the seller sells stuff to the buyer. The seller identifiers are repeated, but for each seller there are unique seller-buyer pairs.
What do I need to fix in my code to get relationships between the sellers and buyers? Thank you!
Your second query's LOAD CSV clause does not specify FIELDTERMINATOR '\t'. The default terminator is a comma (','). That is probably why it fails to MATCH anything.
Try adding FIELDTERMINATOR '\t' at the end of that clause.
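
For reference, the relationship query with the terminator added might look like the following. It is the asker's own query with one extra line. The count query at the end is just one possible way to confirm that relationships were created.

```cypher
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///customer_rel_table.tsv' AS row
FIELDTERMINATOR '\t'   // without this, each line is parsed as comma-separated
MATCH (s:Seller {sellerID: row.seller})
MATCH (b:Buyer {buyerID: row.buyer})
MERGE (s)-[:SOLD_TO]->(b);

// Quick check afterwards
MATCH (:Seller)-[r:SOLD_TO]->(:Buyer) RETURN count(r);
```

With the default comma terminator, a tab-separated header line is read as a single column name, so row.seller and row.buyer are null and neither MATCH finds a node.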

Efficient way to import multiple csv's in neo4j

I am working on creating a graph database in neo4j for a CALL dataset. The dataset is stored in csv file with following columns: Source, Target, Timestamp, Duration. Here Source and Target are Person id's (numeric), Timestamp is datetime and duration is in seconds (integer).
I modeled my graph with persons as nodes (person_id as a property) and calls as relationships (time and duration as properties).
There are around 200,000 nodes and around 70 million relationships. I have a separate CSV file with person IDs, which I used to create the nodes. I also added a uniqueness constraint on the person IDs.
CREATE CONSTRAINT ON ( person:Person ) ASSERT (person.pid) IS UNIQUE
I didn't completely understand how bulk import works, so I wrote a Python script to split my CSV into 70 CSVs of 1 million rows each (saved as calls_0, calls_1, ..., calls_69). I then ran a Cypher query manually, changing the filename each time. It worked well (fast enough) for the first few files (around 10), but then I noticed that after adding the relationships from one file, the import gets slower for the next file. Now it takes almost 25 minutes to import a file.
Can someone link me to an efficient and easy way of doing it?
Here is the cypher query:
:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///calls/calls_28.csv' AS line
WITH toInteger(line.Source) AS Source,
datetime(replace(line.Time,' ','T')) AS time,
toInteger(line.Target) AS Target,
toInteger(line.Duration) AS Duration
MATCH (p1:Person {pid: Source})
MATCH (p2:Person {pid: Target})
MERGE (p1)-[rel:CALLS {time: time, duration: Duration}]->(p2)
RETURN count(rel)
I am using Neo4j 4.0.3
Your MERGE clause has to check for an existing matching relationship (to avoid creating duplicates). If you added a lot of relationships between Person nodes, that could make the MERGE clause slower.
You should consider whether it is safe for you to use CREATE instead of MERGE.
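
A minimal sketch of the CREATE variant, assuming each call appears in exactly one row and no file is ever imported twice (otherwise CREATE produces duplicate relationships):

```cypher
:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///calls/calls_28.csv' AS line
WITH toInteger(line.Source) AS Source,
     datetime(replace(line.Time, ' ', 'T')) AS time,
     toInteger(line.Target) AS Target,
     toInteger(line.Duration) AS Duration
MATCH (p1:Person {pid: Source})
MATCH (p2:Person {pid: Target})
// CREATE does not look for an existing CALLS relationship first
CREATE (p1)-[:CALLS {time: time, duration: Duration}]->(p2);
```

The two MATCH clauses still use the uniqueness constraint on Person.pid, so only the relationship check is skipped.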
It is much better to export the matches using the internal ID of each node and then create the relationships from that export.
POC
CREATE INDEX ON :Person(`pid`);
CALL apoc.export.csv.query("LOAD CSV WITH HEADERS FROM 'file:///calls/calls_28.csv' AS line
WITH toInteger(line.Source) AS Source,
datetime(replace(line.Time,' ','T')) AS time,
toInteger(line.Target) AS Target,
toInteger(line.Duration) AS Duration
MATCH (p1:Person {pid: Source})
MATCH (p2:Person {pid: Target})
RETURN ID(p1) AS ida, ID(p2) AS idb, time, Duration", "rels.csv", {});
and then
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///rels.csv' AS row
MATCH (a:Person) WHERE ID(a) = toInteger(row.ida)
MATCH (b:Person) WHERE ID(b) = toInteger(row.idb)
// row.time is read back as a string; convert it if a temporal value is needed
MERGE (a)-[:CALLS {time: row.time, duration: toInteger(row.Duration)}]->(b);
For me this is the best way to do this.

How to create relationship between two existing nodes by using node id?

I am trying to create a relationship between two existing nodes. I am reading the node ID's from a CSV and creating the relationship with the following query:
LOAD CSV WITH HEADERS FROM "file:///8245.csv" AS f
MATCH (Ev:Event) where id(Ev) =f.first
MATCH (Ev_sec:Event) where id(Ev_sec) = f.second
WITH Ev, Ev_sec
MERGE (Ev) - [:DF_mat] - > (Ev_sec)
However, it is not changing anything in the database. How can I solve this problem?
Thanks!
I solved the problem. LOAD CSV reads every field as a string, so id(Ev) = f.first compared an integer with a string and never matched. I queried for the ID(node) again, this time exporting the IDs as strings (using toString(ID(node))), and converted them to integers while loading into the database. The query is as follows:
LOAD CSV WITH HEADERS FROM "file:///8245_new.csv" AS csvLine
match (ev:Event) where id(ev)=toInteger(csvLine.first)
match (ev_sec:Event) where id(ev_sec)=toInteger(csvLine.second)
merge (ev)-[:DF_mat]-> (ev_sec)

Load csv in neo4j with nodes and relationships in one csv file

Apologies as I am new to neo4j and struggling with what I imagine is a very simple example.
I would like to model an org chart which I have stored as a csv like so
id,name,manager_id
1,allan,2
2,bob,4
3,john,2
4,sam,
5,Jim,2
Note that Bob has 3 direct reports and Bob reports into Sam who doesn't report into anyone.
I would like to produce a graph which shows the management chain. I have tried the following, but it produces relationships which are disjoint from the people:
LOAD CSV WITH HEADERS FROM "file:///employees.csv" AS csvLine
CREATE (p:Person {id: csvLine.id, name: csvLine.name})
CREATE (p)-[:MANAGED_BY {manager: csvLine.manager_id}]->(p)
This query creates a bunch of self-referencing relationships. Is there any way to populate the graph with one command over the single CSV? I must be missing something, and any help is appreciated. Thanks
I think this is what you are looking for.
In your query you are creating a relationship between p and p, hence the self-referencing relationships.
I added a coalesce call to deal with people who do not have a manager_id value. This way Sam reports to himself.
LOAD CSV WITH HEADERS FROM "file:///employees.csv" AS csvLine
// create or match the person in the left column
MERGE (p:Person {id: csvLine.id })
// always set the name: the node may already exist, created earlier as someone's manager
SET p.name = csvLine.name
// create or match the person/manager in the right column
MERGE (p1:Person {id: coalesce(csvLine.manager_id, csvLine.id) })
// create the reporting relationship
CREATE (p)-[:MANAGED_BY]->(p1)
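
If a self-loop on the top-level person is not wanted, one possible variation filters out rows with no manager before creating the relationship. This is a sketch rather than part of the original answer. It relies on LOAD CSV WITH HEADERS returning null for an empty field, as in Sam's row.

```cypher
LOAD CSV WITH HEADERS FROM "file:///employees.csv" AS csvLine
// create or match the person and record their name
MERGE (p:Person {id: csvLine.id})
SET p.name = csvLine.name
// keep only rows that actually name a manager
WITH p, csvLine
WHERE csvLine.manager_id IS NOT NULL
// create or match the manager, then link the two
MERGE (m:Person {id: csvLine.manager_id})
MERGE (p)-[:MANAGED_BY]->(m)
```

With the sample data, this yields four MANAGED_BY relationships and leaves Sam as the only Person without an outgoing one.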

Resources