I am very new to Neo4j/cypher/graph databases, and have been trying to follow the Neo4j tutorial to import data I have in a csv and create relationships.
The following code does what I want in terms of reading in the data, creating nodes, and setting properties.
/* Importing data on seller-buyer relationshsips */
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///customer_rel_table.tsv' AS row
FIELDTERMINATOR '\t'
MERGE (seller:Seller {sellerID: row.seller})
ON CREATE SET seller += {name: row.seller_name,
root_eid: row.vendor_eid,
city: row.city}
MERGE (buyer:Buyer {buyerID: row.buyer})
ON CREATE SET buyer += {name: row.buyer_name};
/* Creating indices for the properties I might want to match on */
CREATE INDEX seller_id FOR (s:Seller) on (s.seller_name);
CREATE INDEX buyer_id FOR (b:Buyer) on (b.buyer_name);
/* Creating constraints to guarantee buyer-seller pairs are not duplicated */
CREATE CONSTRAINT sellerID ON (s:Seller) ASSERT s.sellerID IS UNIQUE;
CREATE CONSTRAINT buyerID on (b:Buyer) ASSERT b.buyerID IS UNIQUE;
Now I have the nodes (sellers and buyers) that I want, and I would like to link buyers and sellers. The code I have tried for this is:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///customer_rel_table.tsv' AS row
MATCH (s:Seller {sellerID: row.seller})
MATCH (b:Buyer {buyerID: row.buyer})
MERGE (s)-[st:SOLD_TO]->(b)
The query runs, but I don't get any relationships:
Query executed in 294ms. Query type: WRITE_ONLY.
No results.
Since I'm not asking it to RETURN anything, I think the "No results" comment is correct, but when I look at metadata for the DB, no relationships appear. Also, my data has ~220K rows, so 294ms seems fast.
EDIT: At #cybersam's prompting, I tried this query:
MATCH p=(:Seller)-[:SOLD_TO]->(:Buyer) RETURN p, which gives No results.
For clarity, there are two fields in my data that are the heart of the relationship:
seller and buyer, where the seller sells stuff to the buyer. The seller identifiers are repeated, but for each seller there are unique seller-buyer pairs.
What do I need to fix in my code to get relationships between the sellers and buyers? Thank you!
Your second query's LOAD CSV clause does not specify FIELDTERMINATOR '\t'. The default terminator is a comma (','). That is probably why it fails to MATCH anything.
Try adding FIELDTERMINATOR '\t' at the end of that clause.
Related
I am trying to load a CSV file using LOAD FROM CSV and build a relationship. I have a crossover table that I am using to support a many to many relationship. For my example I will use two (2) primary nodes, Car and Driver.
A Car can be driven by one or more Drivers and a Driver can drive one or more Cars.
My cross over table looks like this
CarID (int)
DriverID (int)
Here is my code to successfully load it into Neo4j
LOAD CSV WITH HEADERS FROM 'FILE:///CarToDriverXFER.csv' AS row FIELDTERMINATOR ','
MATCH (c:Cars {carID:row.carID})
MATCH (d:Drivers {driverID:row.driverID})
MERGE (c)-[:DRIVES]->(d)
I want to add an attribute into this relationship. Now the table looks like this:
CarID (int)
DriverID (int)
Rating (int)
I am not sure how to do this. I know how to do this if the object is a node but I am not getting the syntax correct with regards to build a relationship. Here is my attempt at a solution but I am getting an error.
LOAD CSV WITH HEADERS FROM 'FILE:///CarToDriverXFER.csv' AS row FIELDTERMINATOR ','
MATCH (c:Cars {carID:row.carID})
MATCH (d:Drivers {driverID:row.driverID})
CREATE ({Rating:row.Rating})
MERGE (c)-[:DRIVES]->(d)
The above script loaded the relationships but the attribute "Rating" is not listed on the attribute.
Can anyone offer help?
You can add the attribute to the relationship the same way you add attributes to nodes, that is:
LOAD CSV WITH HEADERS FROM 'FILE:///CarToDriverXFER.csv' AS row FIELDTERMINATOR ','
MATCH (c:Cars {carID:row.carID})
MATCH (d:Drivers {driverID:row.driverID})
// adding 'Rating' attribute to ':Drives' relationship between 'c' and 'd'
MERGE (c)-[:DRIVES {Rating:row.Rating}]->(d)
I have a data set, which looks like this
Now, as one can see that a single person has multiple skillid, along with this data, i also have a skill_ref table, which has 2 columns(skillid and skillname) so from the image above, i can look and say that last person has multiple skills, Now, i want this data to be put in Neo4j, with person and skillname as node, and a relationship of has_skill. But i dont know how to handle the multiple instances, if i split the skillid , then i will have multiple instances of person name, but this is not what i want, i want something like this
in the graph,the center node is the name of the person and the others have skill name, with the arrows pointing a relation has_skill.
I am new to neo4j as well as cypher, any help guys will be highly appreciated.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///people_data.csv" AS line
CREATE (p:Person{id:line.people_uid})
WITH line, p
SET p.firstName = line.first_name,p.lastname=line.last_name
WITH line,split(line.skillid,' ') as skill_ids
UNWIND skill_ids as skill_Id
MERGE (skill:Skill{id:skill_Id})
LOAD CSV WITH HEADERS FROM "file:///skills_ref.csv" AS line
WITH line
MERGE(skill:line.skillid{name:line.skillname})
CREATE (p)-[:HAS_SKILL]->(skill)
You've got the right idea.
The general approach is to first MERGE (or CREATE) the :Person node, then split() the skillid into a list of skill ids, UNWIND the skill id list into rows, then MERGE the skill for the given id (and make sure you have an index or unique constraint on :Skill(id)), then MERGE (or CREATE) the relationship between the :Person node and the :Skill.
Here's an example, loading from a CSV file:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///your.file.url" AS line
CREATE (p:Person{id:line.people_uid})
WITH line, p
SET p.firstName = line.f_name ... <same for the rest of the properties>
UNWIND split(line.skillid, ' ') as skillId
MERGE (skill:Skill{id:skillId})
CREATE (p)-[:HAS_SKILL]->(skill)
EDIT
Regarding the revised query you're attempting, it's actually better to use separate queries for each csv load, using the first to create the nodes and merge the relationships, and the second just to match/merge to skills and add the skill name:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///people_data.csv" AS line
CREATE (p:Person{id:line.people_uid})
SET p.firstName = line.first_name, p.lastname=line.last_name
WITH line, split(line.skillid, ' ') as skill_ids
UNWIND skill_ids as skill_Id
MERGE (skill:Skill{id:skill_Id})
CREATE (p)-[:HAS_SKILL]->(skill)
Then your next query:
LOAD CSV WITH HEADERS FROM "file:///skills_ref.csv" AS line
MERGE (skill:Skill{id:line.skillid})
SET skill.name = line.skillname
Remember that you should have an a unique constraint created on :Skill(id) and :Person(id) first.
I am relatively new to neo4j.
I have imported dataset of 12 million records and I have created a relationship between two nodes. When I created the relationship, I forgot to attach a property to the relationship. Now I am trying to set the property for the relationship as follows.
LOAD CSV WITH HEADERS FROM 'file:///FileName.csv' AS row
MATCH (user:User{userID: USERID})
MATCH (order:Order{orderID: OrderId})
MATCH(user)-[acc:ORDERED]->(order)
SET acc.field1=field1,
acc.field2=field2;
But this query is taking too much time to execute,
I even tried USING index on user and order node.
MATCH (user:User{userID: USERID}) USING INDEX user:User(userID)
Isn't it possible to create new attributes for the relationship at a later point?
Please let me know, how can I do this operation in a quick and efficient way.
You also forgot to prefix your query with USING PERIODIC COMMIT,
your query will build up transaction state for 24 million changes (property updates) and won't have enough memory to keep all that state.
You also forgot row. for the data that comes from your CSV and those names are inconsistently spelled.
If you run this from neo4j browser pay attention to any YELLOW warning signs.
Run
CREATE CONSTRAINT ON (u:User) ASSERT u.userID IS UNIQUE;
Run
CREATE CONSTRAINT ON (o:Order) ASSERT o.orderID IS UNIQUE;
Run
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///FileName.csv' AS row
with row.USERID as userID, row.OrderId as orderID
MATCH (user:User{userID: userID})
USING INDEX user:User(userID)
MATCH (order:Order{orderID: orderID})
USING INDEX order:Order(orderID)
MATCH(user)-[acc:ORDERED]->(order)
SET acc.field1=row.field1, acc.field2=row.field2;
I'm doing a project on credit card fraud, and I've got some generated sample data in .CSV (pipe delimited) where each line is basically the person's info, the transaction details along with the merchant name, etc.. Since this is generated data, there's also a flag that indicates if this transaction was fraudulent or not.
What I'm attempting to do is to load the data into Neo4j, create nodes (persons, transactions, and merchants), and then visualize a graph of the fraudulent charges to see if there are any common merchants. (I am aware there is a sample neo4j data set similar to this, but I'm attempting to apply this concept to a separate project).
I load the data in, create constraints, and them attempt my query, which seems to run forever.
Here are a few lines of example data..
ssn|cc_num|first|last|gender|street|city|state|zip|lat|long|city_pop|job|dob|acct_num|profile|trans_num|trans_date|trans_time|unix_time|category|amt|is_fraud|merchant|merch_lat|merch_long
692-42-2939|5270441615999263|Eliza|Stokes|F|684 Abigayle Port Suite 372|Tucson|AZ|85718|32.3112|-110.9179|865276|Science writer|1962-12-06|563973647649|40_60_bigger_cities.json|2e5186427c626815e47725e59cb04c9f|2013-03-21|02:01:05|1363831265|misc_net|838.47|1|fraud_Greenfelder, Bartoletti and Davis|31.616203|-110.221915
692-42-2939|5270441615999263|Eliza|Stokes|F|684 Abigayle Port Suite 372|Tucson|AZ|85718|32.3112|-110.9179|865276|Science writer|1962-12-06|563973647649|40_60_bigger_cities.json|7d3f5eae923428c51b6bb396a3b50aab|2013-03-22|22:36:52|1363991812|shopping_net|907.03|1|fraud_Gerlach Inc|32.142740|-111.675048
692-42-2939|5270441615999263|Eliza|Stokes|F|684 Abigayle Port Suite 372|Tucson|AZ|85718|32.3112|-110.9179|865276|Science writer|1962-12-06|563973647649|40_60_bigger_cities.json|76083345f18c5fa4be6e51e4d0ea3580|2013-03-22|16:40:20|1363970420|shopping_pos|912.03|1|fraud_Morissette PLC|31.909227|-111.3878746
The sample file I'm using has about 60k transactions
Below is my cypher query / code thus far.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "card_data.csv"
AS line FIELDTERMINATOR '|'
CREATE (p:Person { id: toInt(line.cc_num), name_first: line.first, name_last: line.last })
CREATE (m:Merchant { id: line.merchant, name: line.merchant })
CREATE (t:Transaction { id: line.trans_num, merchant_name: line.merchant, card_number:line.cc_num, amount:line.amt, is_fraud:line.is_fraud, trans_date:line.trans_date, trans_time:line.trans_time })
create constraint on (t:Transaction) assert t.trans_num is unique;
create constraint on (p:Person) assert p.cc_num is unique;
MATCH (m:Merchant)
WITH m
MATCH (t:Transaction{merchant_name:m.merchant,is_fraud:1})
CREATE (m)-[:processed]->(t)
You can see in the 2nd MATCH query, I am attempting to specify that we only examine fraudulent transactions (is_fraud:1), and of the roughly 65k transactions, 230 have is_fraud:1.
Any ideas why this query would seen to run endlessly? I do have MUCH larger sets of data I'd like to examine this way, and the small data results thus far are not promising (I'm sure due to my lack of understanding, not Neo4j's fault).
You don't show any index creation. To speed things up, you should create an index on both merchant_name and is_fraud, to avoid going through all transaction nodes sequentially for a given merchant:
CREATE INDEX ON :Transaction(merchant_name)
CREATE INDEX ON :Transaction(is_fraud)
You create duplicate entries both for merchants as well as for people.
// not really needed if you don't merge transactions
// and if you don't look up transactions by trans_num
// create constraint on (t:Transaction) assert t.trans_num is unique;
// can't a person use multiple credit cards?
create constraint on (p:Person) assert p.cc_num is unique;
create constraint on (p:Person) assert p.id is unique;
create constraint on (m:Merchant) assert m.id is unique;
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "card_data.csv" AS line FIELDTERMINATOR '|'
MERGE (p:Person { id: toInt(line.cc_num)})
ON CREATE SET p.name_first=line.first, p.name_last=line.las
MERGE (m:Merchant { id: line.merchant}) ON CREATE SET m.name = line.merchant
CREATE (t:Transaction { id: line.trans_num, card_number:line.cc_num, amount:line.amt, merchant_name: line.merchant,
is_fraud:line.is_fraud, trans_date:line.trans_date, trans_time:line.trans_time })
CREATE (p)-[:issued]->(t)
// only connect fraudulent transactions to the merchant
WHERE t.is_fraud = 1
// also add indicator label to transaction for easier selection / processing later
SET t:Fraudulent
CREATE (m)-[:processed]->(t);
Alternatively you can connect all tx to the merchant and indicate the fraud only via label / alternative rel-types.
after importing data via CSV LOAD I want to connect the imported nodes to customer nodes that are already in the DB. The idea was to look up all imported nodes with the Label TICKET and run through the result set and create the relationship.
Here is the code I come up with first approach:
# Find nodes without relationship for label Ticket
MATCH (t:Ticket), (c:Customer)
WHERE NOT (t)--(c)
RETURN t.number as ticket_number, t.type as ticket_type,t.sid as ticket_sid
# Run through the resultset and execute for each found node
MATCH (t:Ticket { number: "xxx" }), (c:Customer {code: "xxx"})
MERGE (t)-[:IS_TICKET_OF]->(c);
There is an index
ON :Ticket (number)
ON :Customer(code)
This way to handle it is very slow and it took minutes to run through the CSV file. I hope there is a way to optimize the query or maybe to find a way to create the missing relationship easier as first to look them all up and then run through a loop.
The CSV Load is :
LOAD CSV FROM "file:c:..." AS csvLine
MERGE (t:Ticket { number: csvLine[0]})
Maybe its also fine to create the relation already in the CSV import - maybe something like
MATCH (c:Customer {code:"xxx"})
MERGE (t) - [:IS_TICKET_OF]-> (c)
But I would need to figure out in the query how to extract the code from a field as I have something like "aaa/vvv/bbb/1234" in the CSV import and would need only aaa for the match above as this is stored in the customer node as ID.
Any hint is very appreciated.
Thanks!
Does this query work for you?
It stores the aaa part of the input string in num, makes sure the ticket with that number exists, and then makes sure a relationship exists to the matching customer (if there is such a customer).
LOAD CSV FROM "file:c:..." AS csvLine
WITH SPLIT(csvLine[0], '/')[0] AS num
MERGE (t:Ticket {number: num})
WITH num, t
OPTIONAL MATCH (c:Customer {code: num})
MERGE (t)-[:IS_TICKET_OF]->(c);