Neo4j Join from CSV

I have the following sample nodes:
{
"name": "host_1",
"id": 0
}
{
"name": "host_2",
"id": 1
}
Then I have connections/authentications between those nodes in a CSV file.
{
"src_id": "291",
"dest_id": "162"
}
{
"src_id": "291",
"dest_id": "257"
}
I am trying to build the relationships (authentications between hosts) with the CSV file, but I'm having trouble getting the query finalized before I can create the relationship.
Is there a way to make an alias for a match similar to a SQL join?
LOAD CSV WITH HEADERS FROM "file:///redteam_connections.csv" AS row
MATCH (n:nodes {id: toInteger(row.dest_id)}), (n:nodes {id: toInteger(row.src_id)})
I'd like to make an alias such as
(n:nodes {id: toInteger(row.dest_id)}) AS dest_node, (n:nodes {id: toInteger(row.src_id)}) AS src_node
RETURN src_node.name, dest_node.name
Based on my research, this doesn't appear to be possible. Any suggestions would be appreciated. Is it a limitation of Cypher, or a problem with the structure of my dataset?

The problem you're running into is you're using the same variable, n, to refer to both nodes, so that isn't going to work. If you want to use src_node and dest_node as variables, you can:
LOAD CSV WITH HEADERS FROM "file:///redteam_connections.csv" AS row
MATCH (destNode:nodes {id: toInteger(row.dest_id)}), (srcNode:nodes {id: toInteger(row.src_id)})
CREATE (srcNode)-[:AUTHENTICATION]->(destNode)
You definitely want to add an index on :nodes(id) so your lookups are fast, and you may want to reconsider the :nodes label. By convention, labels are capitalized and singular (plural names are usually reserved for cases where you actually collect() items into a list), so :Node would be more appropriate here.
If your CSV is large, I also recommend you use periodic commit to allow batching and prevent blowing your heap.
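A minimal sketch of the index and batching suggestions, assuming Neo4j 3.x syntax (both CREATE INDEX ON and USING PERIODIC COMMIT changed or were removed in later major versions). The :Node label and the batch size of 1000 are assumptions for illustration, and the relationship direction (source to destination) is a modelling choice. Run the two statements separately, index first:

```cypher
// Run once, before the import, so each lookup by id uses the index.
CREATE INDEX ON :Node(id);

// Commit every 1000 rows (an arbitrary batch size) to keep memory use bounded.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///redteam_connections.csv" AS row
MATCH (srcNode:Node {id: toInteger(row.src_id)})
MATCH (destNode:Node {id: toInteger(row.dest_id)})
CREATE (srcNode)-[:AUTHENTICATION]->(destNode);
```

Rows whose src_id or dest_id has no matching node are skipped, because MATCH returns nothing for them.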

Related

Load csv in neo4j with nodes and relationships in one csv file

Apologies as I am new to neo4j and struggling with what I imagine is a very simple example.
I would like to model an org chart which I have stored as a csv like so
id,name,manager_id
1,allan,2
2,bob,4
3,john,2
4,sam,
5,Jim,2
Note that Bob has 3 direct reports and Bob reports into Sam who doesn't report into anyone.
I would like to produce a graph which shows the management chain. I have tried the following, but it produces relationships which are disjoint from the people:
LOAD CSV WITH HEADERS FROM "file:///employees.csv" AS csvLine
CREATE (p:Person {id: csvLine.id, name: csvLine.name})
CREATE (p)-[:MANAGED_BY {manager: csvLine.manager_id}]->(p)
This query creates a bunch of self-referencing relationships. Is there any way to populate the graph with one command over the single CSV? I must be missing something, and any help is appreciated. Thanks
I think this is what you are looking for.
In your query you are creating a relationship between p and p, hence the self-referencing relationships.
I added a coalesce call to deal with people who do not have a manager_id value. This way Sam reports to himself.
LOAD CSV WITH HEADERS FROM "file:///employees.csv" AS csvLine
// create or match the person in the left column
MERGE (p:Person {id: csvLine.id })
// always set the name: the person may already exist, created earlier as someone's manager
SET p.name = csvLine.name
// create or match the person/manager in the right column
MERGE (p1:Person {id: coalesce(csvLine.manager_id, csvLine.id) })
// create the reporting relationship
CREATE (p)-[:MANAGED_BY]->(p1)
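If you would rather not have Sam report to himself, one possible variant is to skip the relationship when there is no manager. This is a sketch that assumes LOAD CSV reads the empty manager_id field as null (the same assumption the coalesce version relies on):

```cypher
LOAD CSV WITH HEADERS FROM "file:///employees.csv" AS csvLine
// create or match the person and always record the name
MERGE (p:Person {id: csvLine.id})
SET p.name = csvLine.name
// carry on only for rows that actually name a manager
WITH p, csvLine
WHERE csvLine.manager_id IS NOT NULL
MERGE (m:Person {id: csvLine.manager_id})
// MERGE avoids duplicate relationships if the import is re-run
MERGE (p)-[:MANAGED_BY]->(m)
```

With the sample data this should leave Sam as the only Person without an outgoing MANAGED_BY relationship.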

Neo4J CSV relationships

I am a Neo4J newbie and I have a simple CSV with source and dest IPs. I'd like to create a relationship between nodes with the same labels.
Something like ... source_ip >> ALERTS >> dest_ip, or the reverse.
"dest_ip","source_ip"
"130.102.82.16","54.231.19.32"
"130.102.82.116","114.30.64.11"
"130.102.82.116","114.30.64.11"
...
LOAD CSV WITH HEADERS
FROM "file:///Users/me/Desktop/query_result.csv" AS csvLine
CREATE (alert:Alert { source_ip: csvLine.source_ip, dest_ip: csvLine.dest_ip})
MATCH (n:Alert) RETURN n LIMIT 25
dest_ip 130.102.82.16 source_ip 54.231.19.32
....
This works fine. My question is how I create the relationship between the labels inside the alerts? I've tried and failed a slew of times. I'm guessing I need to set up separate Nodes for Source and Dest and then link them, just unsure how.
Thanks in advance!
Peace,
Tom
First create a constraint like this, to guarantee uniqueness and speed up the MERGE operation.
CREATE CONSTRAINT ON (a:Alert) ASSERT a.ip IS UNIQUE;
You can use as many MERGE clauses as you need to create the nodes, and then MERGE the relationship, like this:
LOAD CSV WITH HEADERS
FROM "file:///Users/me/Desktop/query_result.csv" AS csvLine
MERGE (node1:Alert { ip: csvLine.source_ip })
MERGE (node2:Alert { ip: csvLine.dest_ip })
MERGE (node1)-[r:ALERT]->(node2)
By the way, I'd recommend using MERGE in most places to make sure you don't end up creating duplicates. In this file a given IP address might be listed many times. You don't want a new node each time it appears; you probably want all references to hang off a single node for that IP address, hence MERGE here instead of CREATE.
Assuming that your graph model is something like
(:Source)-[:ALERTS]->(:Destination)
The following Cypher query will create that relationship
LOAD CSV WITH HEADERS FROM "file:///Users/me/Desktop/query_result.csv" AS csvLine
CREATE (source:Source { ip: csvLine.source_ip })-[:ALERTS]->(dest:Destination { ip: csvLine.dest_ip})
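Note that CREATE on the whole pattern makes a fresh pair of nodes for every row, so an IP that appears in several rows (as in the sample data) ends up duplicated. A sketch of the same model using MERGE instead, under the assumption that one node per distinct IP is wanted:

```cypher
LOAD CSV WITH HEADERS FROM "file:///Users/me/Desktop/query_result.csv" AS csvLine
// one Source node per distinct source IP
MERGE (source:Source {ip: csvLine.source_ip})
// one Destination node per distinct destination IP
MERGE (dest:Destination {ip: csvLine.dest_ip})
// one ALERTS relationship per distinct (source, destination) pair
MERGE (source)-[:ALERTS]->(dest)
```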

Extraction of unique nodes from csv in neo4j

In short, my problem is as follows:
I need to get from the following csv file
(https...)drive.google.com/file/d/0B-y9nPaqlH6XdXZsYzAwLThacTg/view?usp=sharing
The following data-structure in neo4j (Using cypher import):
https://drive.google.com/file/d/0B-y9nPaqlH6XdlZHM216eDRSX3c/view?usp=sharing
Instead of:
[https://drive.google.com/file/d/0B-y9nPaqlH6XdE9vZ0gyNU1lR0U/view?usp=sharing]
The longer interpretation:
I thought the solution was just a matter of understanding bound and unbound elements.
But I have tried many times, in many ways (with and without creating the single nodes first, and in an empty database):
LOAD CSV with headers FROM "file:///C:/Users/user/Desktop/neo4j help/calling.csv"
AS csvLine
MERGE (u1:Person { number:(csvLine.A), name:(csvLine.name_A)}) MERGE (u2:Person { number:(csvLine.B), name:(csvLine.name_B)})
MERGE (u1:Person { number:(csvLine.A), name:(csvLine.name_A)})-[c:called]->(u2:Person { number:(csvLine.B), name:(csvLine.name_B)})
RETURN u1.name,c,u2.name
Instead of the expected result, I just got an error message:
Can't create u1 with properties or labels here. It already exists in
this context
And without "pre-merging" the nodes, I get the result above (the pink picture).
What do I need to do to obtain the wanted result (the first picture)?
You don't need to redefine the u1 and u2 nodes. Just reuse the identifiers and MERGE the relationship:
LOAD CSV with headers FROM "file:///C:/Users/user/Desktop/neo4j help/calling.csv"
AS csvLine
MERGE (u1:Person { number:(csvLine.A), name:(csvLine.name_A)})
MERGE (u2:Person { number:(csvLine.B), name:(csvLine.name_B)})
MERGE (u1)-[c:CALLED]->(u2)
RETURN u1.name,c,u2.name
NB: I think your images are both the same. You can also post them directly in your question; many people will skip a question if they need to open two or three more browser windows.
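One further point, offered as a sketch: MERGE matches on every property in the pattern, so a number that appears with two different spellings of the name would still produce two nodes. Assuming the number alone identifies a person (an assumption about the data, not something stated in the question), the nodes can be merged on the number and the name set separately:

```cypher
LOAD CSV WITH HEADERS FROM "file:///C:/Users/user/Desktop/neo4j help/calling.csv" AS csvLine
// match or create each person by number only
MERGE (u1:Person {number: csvLine.A})
  ON CREATE SET u1.name = csvLine.name_A
MERGE (u2:Person {number: csvLine.B})
  ON CREATE SET u2.name = csvLine.name_B
// reuse the bound identifiers for the relationship
MERGE (u1)-[c:CALLED]->(u2)
RETURN u1.name, c, u2.name
```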

can't create links from CSV in Neo4j

I can't figure out how to create links out of CSV tables in Neo4j. I've read several parts of the manual (match, loadCSV, etc), that free book, and several tutorials I've found. None of them seems to contemplate my use case (which is weird, because I think it's a pretty simple use case). I've tried adapting the code they have in all sorts of ways, but nothing seems to work.
So, I have three CSV tables: parent companies, child companies, and parent-child pairs. I begin by loading the first two tables (and that works fine - all the properties are there, all the info is correct):
LOAD CSV FROM "file:/C:/Users/thiago.marzagao/Desktop/CSVs/children.csv" AS node
CREATE (:Children {id: node[0], name: node[1]})
LOAD CSV FROM "file:/C:/Users/thiago.marzagao/Desktop/CSVs/parents.csv" AS node
CREATE (:Parent {id: node[0], name: node[1]})
Now, here's the structure of the third table:
child_id,parent_id
Here's some of the things I've tried:
LOAD CSV FROM "file:/C:/Users/thiago.marzagao/Desktop/CSVs/link.csv" AS rels
MATCH (FROM {Parent: rels[1]}), (TO {Children: rels[0]})
CREATE (Parent)-[:OWNS]->(Children)
This doesn't give me an error, but it returns zero rows.
LOAD CSV FROM "file:/C:/Users/thiago.marzagao/Desktop/CSVs/link.csv" AS rels
MATCH (FROM {id: rels[1]}), (TO {id: rels[0]})
CREATE (Parent)-[:OWNS]->(Children)
This doesn't give me an error, but it just returns a bunch of pairs of empty nodes. So, it creates the links, but somehow it doesn't link the actual nodes.
LOAD CSV FROM "file:/C:/Users/thiago.marzagao/Desktop/CSVs/link.csv" AS rels
MATCH (FROM {Parent.id: rels[1]}), (TO {Children.id: rels[0]})
CREATE (Parent)-[:OWNS]->(Children)
This gives me a syntax error (Neo.ClientError.Statement.InvalidSyntax)
I also tried several variations of the code blocks above, but to no avail. So, what am I doing wrong? (I'm on Neo4j 2.1.6, in case that matters.)
In your Cypher statement, the CREATE does not reference the identifiers bound in the MATCH (FROM and TO). Parent and Children are therefore new, unbound identifiers, so Cypher just creates new empty nodes.
Look at the difference:
MATCH (FROM {id: rels[1]}), (TO {id: rels[0]})
CREATE (Parent)-[:OWNS]->(Children)
Instead it should be:
LOAD CSV FROM "file:/C:/Users/thiago.marzagao/Desktop/CSVs/link.csv" AS rels
MATCH (Parent {id: rels[1]}), (Children {id: rels[0]})
CREATE (Parent)-[:OWNS]->(Children)
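Without labels, each MATCH has to scan every node in the database. A sketch that uses the :Parent and :Children labels from the question and indexes the id property, assuming the CREATE INDEX ON syntax of the Neo4j 2.x/3.x era; the identifiers parent and child are illustrative names. The index statements are run separately, before the import:

```cypher
// Run once each, before loading the links.
CREATE INDEX ON :Parent(id);
CREATE INDEX ON :Children(id);

LOAD CSV FROM "file:/C:/Users/thiago.marzagao/Desktop/CSVs/link.csv" AS rels
// rels[0] is child_id and rels[1] is parent_id, as in the question
MATCH (parent:Parent {id: rels[1]})
MATCH (child:Children {id: rels[0]})
// the same identifiers are reused here, so existing nodes are linked
CREATE (parent)-[:OWNS]->(child);
```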

neo4j Optimize a relationship check (query)

After importing data via LOAD CSV, I want to connect the imported nodes to customer nodes that are already in the DB. The idea was to look up all imported nodes with the label Ticket, run through the result set, and create the relationship for each one.
Here is my first approach:
// Find nodes without relationship for label Ticket
MATCH (t:Ticket), (c:Customer)
WHERE NOT (t)--(c)
RETURN t.number as ticket_number, t.type as ticket_type,t.sid as ticket_sid
// Run through the result set and execute for each found node
MATCH (t:Ticket { number: "xxx" }), (c:Customer {code: "xxx"})
MERGE (t)-[:IS_TICKET_OF]->(c);
There is an index
ON :Ticket (number)
ON :Customer(code)
This approach is very slow: it takes minutes to run through the CSV file. I hope there is a way to optimize the query, or an easier way to create the missing relationships than first looking them all up and then running through a loop.
The CSV Load is :
LOAD CSV FROM "file:c:..." AS csvLine
MERGE (t:Ticket { number: csvLine[0]})
Maybe it would also be fine to create the relationship during the CSV import, with something like
MATCH (c:Customer {code:"xxx"})
MERGE (t) - [:IS_TICKET_OF]-> (c)
But I would need to figure out how to extract the code from a field within the query: the CSV contains something like "aaa/vvv/bbb/1234", and I need only aaa for the match above, as this is what is stored in the customer node as its ID.
Any hint is much appreciated.
Thanks!
Does this query work for you?
It stores the aaa part of the input string in num, makes sure the ticket with that number exists, and then makes sure a relationship exists to the matching customer (if there is such a customer).
LOAD CSV FROM "file:c:..." AS csvLine
WITH SPLIT(csvLine[0], '/')[0] AS num
MERGE (t:Ticket {number: num})
WITH num, t
// plain MATCH skips rows with no such customer; MERGE would fail on a null c
MATCH (c:Customer {code: num})
MERGE (t)-[:IS_TICKET_OF]->(c);
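A variant sketch, under the assumption that the ticket number is the full CSV field and only its first segment is the customer code (the question is not explicit about this). The names ticketNumber and customerCode are illustrative:

```cypher
LOAD CSV FROM "file:c:..." AS csvLine
// keep the whole field as the ticket number; the first segment is the customer code
WITH csvLine[0] AS ticketNumber, split(csvLine[0], '/')[0] AS customerCode
MERGE (t:Ticket {number: ticketNumber})
WITH t, customerCode
// plain MATCH drops rows with no matching customer, so MERGE never sees a null node
MATCH (c:Customer {code: customerCode})
MERGE (t)-[:IS_TICKET_OF]->(c);
```

Because both lookups go through the existing indexes on :Ticket(number) and :Customer(code), this avoids the Ticket-by-Customer cross product of the first approach.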

Resources