Get the leaves of every node in Neo4j

For example, take this data, which is stored in a CSV file:
source,child
A,B
B,C
C,D
X,Y
Y,Z
And I load it like this:
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS line
MERGE (s:src {id: line.source})
MERGE (d:dst {id: line.child})
CREATE (s)-[:FEEDs_INTO]->(d)
In my example we have 2 leaves - A and X, but there may be multiple leaves for one node. Now I want to get every leaf-node connection. So for my example I want something like this:
A,B
A,C
A,D
X,Y
X,Z
How can I do it?

Your data model can be improved. Since src and dst nodes are connected to each other and play the same role, you can give them a single label (say, "node"). Your loading script then becomes:
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS line
MERGE (s:node {id: line.source})
MERGE (d:node {id: line.child})
MERGE (s)-[:FEEDs_INTO]->(d)
Then your query will be as simple as below:
MATCH (child:node)-[:FEEDs_INTO*]->(parent:node)
WHERE NOT EXISTS((:node)-->(child))
RETURN child, parent
Here the * in the relationship pattern makes it variable-length: it matches paths of one or more FEEDs_INTO hops, with no upper bound unless you give one (for example, *1..3). This lets the match go from A to B (length 1), to C (length 2), to D (length 3), and so on. The WHERE clause makes sure that child has no incoming relationship, so it is one of the starting nodes that the question calls leaves.
Result:
╒══════════╤══════════╕
│"child" │"parent" │
╞══════════╪══════════╡
│{"id":"A"}│{"id":"B"}│
├──────────┼──────────┤
│{"id":"A"}│{"id":"C"}│
├──────────┼──────────┤
│{"id":"A"}│{"id":"D"}│
├──────────┼──────────┤
│{"id":"X"}│{"id":"Y"}│
├──────────┼──────────┤
│{"id":"X"}│{"id":"Z"}│
└──────────┴──────────┘
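If only the id pairs are needed, as in the output the question asks for, a variant along these lines may help. It is a sketch that assumes the single-label model above; the variable names src and dst and the column aliases source and child are illustrative, and DISTINCT is added in case two different paths connect the same pair of nodes.

```cypher
// Return one row per (starting node, reachable node) pair as plain ids.
MATCH (src:node)-[:FEEDs_INTO*]->(dst:node)
WHERE NOT (:node)-->(src)          // src has no incoming relationship
RETURN DISTINCT src.id AS source, dst.id AS child
ORDER BY source, child
```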

Related

Can you add a value as the relationship type from the CSV?

E.g., instead of [:owes] I would like the relationship type to be the amount they owe (row.amount).
I couldn't come up with much.
The simple Cypher script below loads the CSV file and creates a relationship whose type is taken from row.amount. It uses APOC (Awesome Procedures on Cypher).
LOAD CSV WITH HEADERS FROM "file:///testing.csv" AS row
MERGE (p:Person {name: row.fromPerson})
MERGE (m:Person {name: row.toPerson})
WITH p, m, row
CALL apoc.create.relationship(p, row.amount, {amount: row.amount}, m) YIELD rel
RETURN p, m, rel;
Sample testing.csv:
fromPerson,amount,toPerson
"Tom Hanks",100,"Meg Ryan"
You wouldn't want to have the amount as the relationship type. The standard way to store such information is to keep OWES as the relationship type and store the amount as a relationship property.
Example statement:
LOAD CSV WITH HEADERS FROM 'file:///...' AS row
MERGE (from:User {id: row.from_id})
MERGE (to:User {id: row.to_id})
MERGE (from)-[r:OWES]->(to)
SET r.amount = row.amount
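With the amount stored as a property, it can be filtered and aggregated like any other value. The sketch below assumes the User/OWES model from the statement above and that amount holds numeric text (LOAD CSV reads every field as a string, hence the toInteger call). The variable debtor and the aliases debtor_id and total_owed are only illustrative.

```cypher
// Total amount owed per user, largest first.
MATCH (debtor:User)-[r:OWES]->(:User)
RETURN debtor.id AS debtor_id, sum(toInteger(r.amount)) AS total_owed
ORDER BY total_owed DESC
```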
If, for visualisation purposes, you want to see the amount as the caption of the relationship in the Neo4j Browser, you can do the following:
1. Click on the relationship type in the panel on the right.
2. Select the property you want to use as the caption.

Import data from 2 csv files in neo4j

As a follow-up to this post, in which I explained in full what I'm trying to do: if my central node is located in another .csv file, how can I import it into my graph?
The content of names.csv (2 columns: Lname & Fname):
Lname,Fname
Brown,Helen
Right,Eliza
Green,Helen
Pink,Kate
Yellow,Helen
The content of central.csv (2 columns: central & value):
central,value
cent1,10
I tried something like this:
LOAD CSV WITH HEADERS FROM 'file:///central.csv' AS frow
MERGE (c:center {name: frow.central})
WITH *
LOAD CSV WITH HEADERS FROM 'file:///names.csv' AS srow
WITH srow.Fname AS first, srow.Lname AS last
MERGE (p:la {last: last})
MERGE (o:fi {first: first})
MERGE (c)-[r:CONTAINS {first:first}]->(o)
MERGE (o)-[rel:CONTAINS {first: first}]->(p)
RETURN count(o)
But it didn't work for me. It created the central node, but the central node is not connected to the first-name nodes as it was supposed to be. What's wrong? I wanted it to look like this (image not shown):
Your second WITH clause does not contain c, so c becomes an unbound variable after that clause.
Change this:
WITH srow.Fname AS first, srow.Lname AS last
to this:
WITH c, srow.Fname AS first, srow.Lname AS last
By the way, your query will only work as you expect if central.csv contains just one data row (as it does currently).
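Put together, the query with that one change applied would look roughly like this. It is the question's own query with c carried through the second WITH; nothing else is altered, and the single-row caveat for central.csv still applies.

```cypher
LOAD CSV WITH HEADERS FROM 'file:///central.csv' AS frow
MERGE (c:center {name: frow.central})
WITH *
LOAD CSV WITH HEADERS FROM 'file:///names.csv' AS srow
// Carry c forward so it is still bound in the MERGE clauses below.
WITH c, srow.Fname AS first, srow.Lname AS last
MERGE (p:la {last: last})
MERGE (o:fi {first: first})
MERGE (c)-[r:CONTAINS {first: first}]->(o)
MERGE (o)-[rel:CONTAINS {first: first}]->(p)
RETURN count(o)
```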

Longest Path Neo4j returning incorrect path

I have the following graph stored in csv format:
graphUnioned.csv:
a b
b c
The above graph denotes a path from Node:a to Node:b. Note that the first column in the file denotes the source and the second column denotes the destination. By the same logic, the second path in the graph is from Node:b to Node:c, and the longest path in the graph is Node:a to Node:b to Node:c.
I loaded the above csv in Neo4j desktop using the following command:
LOAD CSV WITH HEADERS FROM "file:\\graphUnioned.csv" AS csvLine
MERGE (s:s {s:csvLine.s})
MERGE (o:o {o:csvLine.o})
MERGE (s)-[]->(o)
RETURN *;
And then for finding longest path I run the following command:
match (n:s)
where (n:s)-[]->()
match p = (n:s)-[*1..]->(m:o)
return p, length(p) as L
order by L desc
limit 1;
Unfortunately, this command only gives me the path from Node:a to Node:b and does not return the longest path. Can someone please help me understand where I am going wrong?
There are two mistakes in your CSV import query.
First, you need to give the relationship a type when you MERGE it between nodes; the query won't compile otherwise. You likely supplied one and forgot to include it when you pasted the query here.
Second, and this is the big one, your query merges nodes with different labels and different property keys, which throws everything off. Your intent was to create 3 nodes with a single longest path connecting them, but your query creates 4 nodes in two isolated groups of two nodes each.
It creates 2 b nodes: (:s {s:b}) and (:o {o:b}). Each of them is connected to a different node, because the nodes built from each CSV column are given a different label and property key.
What you should be doing is using the same label and property key for all of the nodes involved, and this will allow the match to the b node to only refer to a single node and not create two:
LOAD CSV WITH HEADERS FROM "file:\\graphUnioned.csv" AS csvLine
MERGE (s:Node {value:csvLine.s})
MERGE (o:Node {value:csvLine.o})
MERGE (s)-[:REL]->(o)
RETURN *;
You'll also want an index on :Node(value) (or whatever your equivalent is when you import real data) so that your MERGEs and subsequent MATCHes are fast when performing lookups of the nodes by property.
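A sketch of such an index follows, assuming the :Node label and value property used above. The exact syntax depends on the Neo4j version, and the index name node_value is made up for illustration.

```cypher
// Neo4j 4.x and later:
CREATE INDEX node_value FOR (n:Node) ON (n.value);

// Neo4j 3.x (older syntax, no longer accepted by recent versions):
// CREATE INDEX ON :Node(value);
```

Creating the index before running the import lets the MERGE lookups use it from the first row.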
Now, to get to your longest path query.
If you are assuming that the start node has no incoming relationships, and that the end node has no outgoing relationships, then you can use a query like this:
match (start:Node)
where not ()-->(start)
match p = (start)-[*]->(end)
where not (end)-->()
return p, length(p) as L
order by L desc
limit 1;

csv load into Neo4j and create relationship

I have a csv file with below columns and sample data provided, and I've loaded into Neo4j and got stuck when I was trying to create relationships.
source destination miles
a b 5
a c 6
a d 20
Now I want to create a graph with the source in the middle and the connected destinations around it, with each relationship labeled with the miles between the two stops (a star graph with the source in the middle). I tried the query below, but it's not returning miles on the label. I'm new to Neo4j, so any help is appreciated. Thanks in advance.
LOAD CSV WITH HEADERS FROM "file:///sample.csv" AS line
CREATE (s:src{id: line.source})
CREATE (d:dst{id: line.destination})
CREATE (s)-[r:trips {total: [line.miles]}]->(d)
RETURN s, d, r;
By default, LOAD CSV expects the CSV file to use comma separators, and it does not support extraneous whitespace. Try changing the content of your CSV file to this:
source,destination,miles
a,b,5
a,c,6
a,d,20
Also, you should use MERGE instead of CREATE to avoid creating duplicate nodes. And there is no evident need to store the miles value in an array, so this query stores it as a scalar value:
LOAD CSV WITH HEADERS FROM "file:///sample.csv" AS line
MERGE (s:src {id: line.source})
MERGE (d:dst {id: line.destination})
CREATE (s)-[r:trips {miles: line.miles}]->(d)
RETURN s, d, r;
The result of the above is:
╒══════════╤══════════╤══════════════╕
│"s" │"d" │"r" │
╞══════════╪══════════╪══════════════╡
│{"id":"a"}│{"id":"b"}│{"miles":"5"} │
├──────────┼──────────┼──────────────┤
│{"id":"a"}│{"id":"c"}│{"miles":"6"} │
├──────────┼──────────┼──────────────┤
│{"id":"a"}│{"id":"d"}│{"miles":"20"}│
└──────────┴──────────┴──────────────┘
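Note that miles comes back as a string ("5", "6", "20"), because LOAD CSV reads every field as a string. If numeric values are wanted, for sorting or summing for instance, one option is to convert on import. This is a sketch that assumes the miles column always holds whole numbers.

```cypher
LOAD CSV WITH HEADERS FROM "file:///sample.csv" AS line
MERGE (s:src {id: line.source})
MERGE (d:dst {id: line.destination})
// toInteger converts the CSV text to a number (null if it cannot be parsed).
CREATE (s)-[r:trips {miles: toInteger(line.miles)}]->(d)
RETURN s, d, r;
```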

Cypher query for isolating records on time basis

I am trying to do CDR (Call Details Record) Analysis on mobile calls data. Calls are made by a PERSON, THROUGH a tower and CONNECTS to a number. I want to isolate calls that were made prior to a certain date and time and the calling number does not exist after that particular date and time in the records. My current query only shows me data prior to the particular occurrence I am looking for:
MATCH (a:PERSON)-[t:THROUGH]->()-[:CONNECTS]->(b)
WHERE toInteger(t.time)<1500399900
RETURN a,b
However, how do I now isolate only those records which exist before t.time=1500399900 and not after that? Also, if I do not limit the above query to say 1000, my browser (Google Chrome), crashes. Any solution for that please?
After running the query as suggested this is what EXPLAIN looks like:
If it helps, this is how I loaded the csv file in neo4j:
//Setup initial constraints
CREATE CONSTRAINT ON (a:PERSON) assert a.number is unique;
CREATE CONSTRAINT ON (b:TOWER) assert b.id is unique;
//Create the appropriate nodes
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///combined.csv" AS line
MERGE (a:PERSON {number: line.Calling})
MERGE (b:PERSON {number: line.Called})
MERGE (c:TOWER {id: line.CellID1})
//Setup proper indexing
DROP CONSTRAINT ON (a:PERSON) ASSERT a.number IS UNIQUE;
DROP CONSTRAINT ON (a:TOWER) ASSERT a.id IS UNIQUE;
CREATE INDEX ON :PERSON(number);
CREATE INDEX ON :TOWER(id);
//Create relationships between people and calls
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///combined.csv" AS line
MATCH (a:PERSON {number: line.Calling}),(b:PERSON {number: line.Called}),(c:TOWER {id: line.CellID1})
CREATE (a)-[t:THROUGH]->(c)-[x:CONNECTS]->(b)
SET x.calltype = line.CallType, x.provider = line.Provider, t.time=toInteger(line.ts), t.duration=toInteger(line.Duration)
You asked: "However, how do I now isolate only those records which exist before t.time=1500399900 and not after that?"
Let's create a small example data set:
CREATE
(a1:PERSON {name: 'a1'}), (a2:PERSON {name: 'a2'}),
(b1:PERSON {name: 'b1'}), (b2:PERSON {name: 'b2'}),
(b3:PERSON {name: 'b3'}), (b4:PERSON {name: 'b4'}),
(a1)-[:THROUGH {time: 1}]->(:TOWER)-[:CONNECTS]->(b1),
(a1)-[:THROUGH {time: 3}]->(:TOWER)-[:CONNECTS]->(b2),
(a2)-[:THROUGH {time: 2}]->(:TOWER)-[:CONNECTS]->(b3),
(a2)-[:THROUGH {time: 15}]->(:TOWER)-[:CONNECTS]->(b4)
It looks like this when visualized:
This query might do the trick for you:
MATCH (a:PERSON)-[t1:THROUGH]->(:TOWER)-[:CONNECTS]->(b:PERSON)
WHERE toInteger(t1.time) < 5
OPTIONAL MATCH (a)-[t2:THROUGH]->(:TOWER)
WHERE t2.time >= 5
WITH a, b, t1, t2
WHERE t2 IS NULL
RETURN a, b, t1
After the first match, the query looks for calls by PERSON a that were initiated at or after timestamp 5. There might be no such calls, hence it uses OPTIONAL MATCH. The value of t2 will be null if there were no calls at or after the specified timestamp, so we do an IS NULL check and return the filtered results.
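The same filter can also be written as an existential subquery, which some find easier to read. This is an alternative sketch, not the query from the answer, and it assumes Neo4j 4.0 or later, where the EXISTS { ... } form is available.

```cypher
MATCH (a:PERSON)-[t1:THROUGH]->(:TOWER)-[:CONNECTS]->(b:PERSON)
WHERE t1.time < 5
  // Keep only callers that have no call at or after the cutoff.
  AND NOT EXISTS {
    MATCH (a)-[t2:THROUGH]->(:TOWER)
    WHERE t2.time >= 5
  }
RETURN a, b, t1
```

On the small example data set above, this keeps a1's two calls (to b1 and b2) and drops a2, who has a call at time 15.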
You also asked: "Also, if I do not limit the above query to say 1000, my browser (Google Chrome), crashes. Any solution for that please?"
If you use the graph visualizer, it usually cannot render more than a few hundred nodes. Possible workarounds:
Use the Text view of the Neo4j Browser, which scales better.
Paginate by using SKIP ... LIMIT ....
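As a sketch of the pagination approach, the question's query could be paged as below. It assumes the number and time properties set by the loading script; the page size of 1000 and the column aliases are arbitrary choices. Returning scalar properties rather than whole nodes also keeps the browser from trying to draw a graph.

```cypher
MATCH (a:PERSON)-[t:THROUGH]->()-[:CONNECTS]->(b)
WHERE t.time < 1500399900
RETURN a.number AS calling, b.number AS called, t.time AS time
ORDER BY time          // a stable order keeps pages consistent
SKIP 0 LIMIT 1000      // next page: SKIP 1000 LIMIT 1000, and so on
```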
