I am trying to do CDR (Call Detail Record) analysis on mobile call data. A call is made by a PERSON, goes THROUGH a tower, and CONNECTS to a number. I want to isolate calls that were made before a certain date and time and whose calling number does not appear in the records after that date and time. My current query only shows me the data prior to the occurrence I am looking for:
MATCH (a:PERSON)-[t:THROUGH]->()-[:CONNECTS]->(b)
WHERE toInteger(t.time)<1500399900
RETURN a,b
However, how do I now isolate only those records which exist before t.time=1500399900 and not after that? Also, if I do not limit the above query to say 1000, my browser (Google Chrome), crashes. Any solution for that please?
After running the query as suggested this is what EXPLAIN looks like:
If it helps, this is how I loaded the csv file in neo4j:
//Setup initial constraints
CREATE CONSTRAINT ON (a:PERSON) assert a.number is unique;
CREATE CONSTRAINT ON (b:TOWER) assert b.id is unique;
//Create the appropriate nodes
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///combined.csv" AS line
MERGE (a:PERSON {number: line.Calling})
MERGE (b:PERSON {number: line.Called})
MERGE (c:TOWER {id: line.CellID1})
//Setup proper indexing
DROP CONSTRAINT ON (a:PERSON) ASSERT a.number IS UNIQUE;
DROP CONSTRAINT ON (a:TOWER) ASSERT a.id IS UNIQUE;
CREATE INDEX ON :PERSON(number);
CREATE INDEX ON :TOWER(id);
//Create relationships between people and calls
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///combined.csv" AS line
MATCH (a:PERSON {number: line.Calling}),(b:PERSON {number: line.Called}),(c:TOWER {id: line.CellID1})
CREATE (a)-[t:THROUGH]->(c)-[x:CONNECTS]->(b)
SET x.calltype = line.CallType, x.provider = line.Provider, t.time=toInteger(line.ts), t.duration=toInteger(line.Duration)
However, how do I now isolate only those records which exist before t.time=1500399900 and not after that?
Let's create a small example data set:
CREATE
(a1:PERSON {name: 'a1'}), (a2:PERSON {name: 'a2'}),
(b1:PERSON {name: 'b1'}), (b2:PERSON {name: 'b2'}),
(b3:PERSON {name: 'b3'}), (b4:PERSON {name: 'b4'}),
(a1)-[:THROUGH {time: 1}]->(:TOWER)-[:CONNECTS]->(b1),
(a1)-[:THROUGH {time: 3}]->(:TOWER)-[:CONNECTS]->(b2),
(a2)-[:THROUGH {time: 2}]->(:TOWER)-[:CONNECTS]->(b3),
(a2)-[:THROUGH {time: 15}]->(:TOWER)-[:CONNECTS]->(b4)
It looks like this when visualized:
This query might do the trick for you:
MATCH (a:PERSON)-[t1:THROUGH]->(:TOWER)-[:CONNECTS]->(b:PERSON)
WHERE toInteger(t1.time) < 5
OPTIONAL MATCH (a)-[t2:THROUGH]->(:TOWER)
WHERE t2.time >= 5
WITH a, b, t1, t2
WHERE t2 IS NULL
RETURN a, b, t1
After the first match, the query looks for calls by PERSON a that were initiated at or after timestamp 5. There might be no such calls, hence it uses OPTIONAL MATCH. The value of t2 will be null if there were no calls at or after the specified timestamp, so we do an IS NULL check and return the filtered results.
Also, if I do not limit the above query to say 1000, my browser (Google Chrome), crashes. Any solution for that please?
If you use the graph visualizer, it usually cannot render more than a few hundred nodes. Possible workarounds:
Use the Text view of the Neo4j Browser, which scales better.
Paginate by using SKIP ... LIMIT ....
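For the pagination workaround, one possible approach is to keep a single parameterized query and vary only the SKIP value. The sketch below is an assumption-laden illustration: the returned column aliases, the `$cutoff`, `$skip` and `$limit` parameter names, and the helper `page_params` are all hypothetical. An ORDER BY is included because Cypher does not guarantee a row order without one, so pages could otherwise overlap.

```python
# Hypothetical paged version of the question's query; $skip and $limit are parameters.
PAGED_QUERY = (
    "MATCH (a:PERSON)-[t:THROUGH]->()-[:CONNECTS]->(b) "
    "WHERE t.time < $cutoff "
    "RETURN a.number AS caller, b.number AS callee, t.time AS time "
    "ORDER BY time, caller, callee "
    "SKIP $skip LIMIT $limit"
)


def page_params(cutoff, page_size, pages):
    """Build one parameter dict per page for PAGED_QUERY."""
    return [
        {"cutoff": cutoff, "skip": page * page_size, "limit": page_size}
        for page in range(pages)
    ]


for params in page_params(cutoff=1500399900, page_size=1000, pages=3):
    print(params)
```

Each dict would be passed as the parameter map when running PAGED_QUERY, for example through a driver session.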
For example, instead of [:owes], I would like the relationship to show the amount they owe (row.amount).
I couldn't come up with much.
The simple Cypher script below loads the CSV file and creates a relationship whose type is taken from row.amount. It uses APOC (Awesome Procedures on Cypher).
LOAD CSV WITH HEADERS FROM "file:///testing.csv" AS row
MERGE (p:Person {name: row.fromPerson})
MERGE (m:Person {name: row.toPerson})
WITH p, m, row
CALL apoc.create.relationship(p, row.amount, {amount: row.amount}, m) YIELD rel
RETURN p, m, rel;
Sample testing.csv:
fromPerson,amount,toPerson
"Tom Hanks",100,"Meg Ryan"
Sample Result:
You wouldn't want to use the amount as the relationship type. The standard way of storing such information is to keep OWES as the relationship type and store the amount as a relationship property.
Example statement :
LOAD CSV WITH HEADERS FROM "file:///..." AS row
MERGE (from:User {id: row.from_id})
MERGE (to:User {id: row.to_id})
MERGE (from)-[r:OWES]->(to)
SET r.amount = row.amount
If for visualisation purposes you want to see the amount as the caption for the relationship in the Neo4j browser, you can do the following.
Click on the relationship type in the panel on the right
Select the property you want to use as caption
I am working on creating a graph database in neo4j for a CALL dataset. The dataset is stored in csv file with following columns: Source, Target, Timestamp, Duration. Here Source and Target are Person id's (numeric), Timestamp is datetime and duration is in seconds (integer).
I modeled my graph with persons as nodes (person_id as a property) and calls as relationships (time and duration as properties).
There are around 200,000 nodes and around 70 million relationships. I have a separate CSV file with the person ids, which I used to create the nodes. I also added a uniqueness constraint on the person ids.
CREATE CONSTRAINT ON ( person:Person ) ASSERT (person.pid) IS UNIQUE
I didn't completely understand how bulk import works, so I wrote a Python script to split my CSV into 70 CSVs, each with 1 million rows (saved as calls_0, calls_1, ..., calls_69). I then ran a Cypher query manually, changing the filename each time. It worked well (fast enough) for the first few files (around 10), but then I noticed that after adding the relationships from one file, the import got slower for the next file. Now it takes almost 25 minutes to import a file.
Can someone link me to an efficient and easy way of doing it?
Here is the cypher query:
:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///calls/calls_28.csv' AS line
WITH toInteger(line.Source) AS Source,
datetime(replace(line.Time,' ','T')) AS time,
toInteger(line.Target) AS Target,
toInteger(line.Duration) AS Duration
MATCH (p1:Person {pid: Source})
MATCH (p2:Person {pid: Target})
MERGE (p1)-[rel:CALLS {time: time, duration: Duration}]->(p2)
RETURN count(rel)
I am using Neo4j 4.0.3
Your MERGE clause has to check for an existing matching relationship (to avoid creating duplicates). If you added a lot of relationships between Person nodes, that could make the MERGE clause slower.
You should consider whether it is safe for you to use CREATE instead of MERGE.
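To illustrate the difference in semantics, here is a toy Python model. It is a sketch only and does not reflect how Neo4j implements either clause; the rows and the function names `load_with_create` and `load_with_merge` are made up for the example.

```python
# Hypothetical call rows: (source, target, time, duration).
rows = [
    (1, 2, "2020-01-01T10:00:00", 60),
    (1, 2, "2020-01-01T10:00:00", 60),  # exact duplicate of the first row
    (1, 2, "2020-01-02T09:30:00", 15),
]


def load_with_create(rows):
    """CREATE-like behaviour: one relationship per row, no lookup."""
    return list(rows)


def load_with_merge(rows):
    """MERGE-like behaviour: look for an identical relationship first."""
    rels = []
    for row in rows:
        # This lookup is the extra work; it grows with the relationships present.
        if row not in rels:
            rels.append(row)
    return rels


print(len(load_with_create(rows)))  # 3
print(len(load_with_merge(rows)))   # 2
```

If the CSV is known to contain no duplicate calls, both approaches produce the same graph and the lookup is pure overhead.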
It is much better to export the matches using the internal ID of each node and then create the relationships from that export.
POC
CREATE INDEX ON :Person(`pid`);
CALL apoc.export.csv.query("LOAD CSV WITH HEADERS FROM 'file:///calls/calls_28.csv' AS line
WITH toInteger(line.Source) AS Source,
datetime(replace(line.Time,' ','T')) AS time,
toInteger(line.Target) AS Target,
toInteger(line.Duration) AS Duration
MATCH (p1:Person {pid: Source})
MATCH (p2:Person {pid: Target})
RETURN ID(p1) AS ida, ID(p2) AS idb, time, Duration","rels.csv", {});
and then
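USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///rels.csv' AS row
MATCH (a:Person) WHERE ID(a) = toInteger(row.ida)
MATCH (b:Person) WHERE ID(b) = toInteger(row.idb)
// a is the caller (Source) and b the callee (Target), as in the original query.
// Assumes the exported time column is an ISO 8601 string that datetime() can parse.
MERGE (a)-[:CALLS {time: datetime(row.time), duration: toInteger(row.Duration)}]->(b);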
For me this is the best way to do this.
This is my database in Neo4j:
CREATE (Alex:Person {name:'Alex', phone:'0420965111'})
CREATE (Oxana:Person {name:'Oxana', email:'oxana#mail.com'})
CREATE (Tango:Dance {name:'Tango'})
CREATE (Ballet:Dance {name:'Ballet'})
CREATE (Zouk:Dance {name:'Zouk'})
CREATE (Saturday:Day {name:'Saturday'})
CREATE (Sunday:Day {name:'Sunday'})
CREATE (Wednesday:Day {name:'Wednesday'})
MERGE (Alex)-[:LIKES]->(Tango)
MERGE (Alex)-[:LIKES]->(Zouk)
MERGE (Oxana)-[:LIKES]->(Tango)
MERGE (Oxana)-[:LIKES]->(Ballet)
MERGE (Alex)-[:AVAILABLE_ON]->(Sunday)
MERGE (Alex)-[:AVAILABLE_ON]->(Wednesday)
MERGE (Oxana)-[:AVAILABLE_ON]->(Sunday)
MERGE (Oxana)-[:AVAILABLE_ON]->(Saturday)
I need a list of groups of more than one person who like the same dance and are available on the same day. How do I write a query which returns this?
"Sunday", "Tango", ["Alex","Oxana"]
This almost works: match (p:Person), (d:Dance), (day:Day) where (p)-[:LIKES]->(d) and (p)-[:AVAILABLE_ON]->(day) return day.name, d.name, collect(p.name), count(*) But I don't know how to exclude records where count(*) is less than 2.
You can use WITH:
match (p:Person), (d:Dance), (day:Day)
where (p)-[:LIKES]->(d) and (p)-[:AVAILABLE_ON]->(day)
with day.name as day, d.name as dance, collect(p.name) as names, count(*) as count
where count >= 2
return day, dance, names
From the docs:
The WITH clause allows query parts to be chained together, piping the
results from one to be used as starting points or criteria in the
next.
Also, you can add a constraint (WHERE clause) to filter data.
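The effect of aggregating in WITH and then filtering on the aggregate can be traced by hand on the question's data. The following Python sketch mirrors that logic under the assumption that the LIKES and AVAILABLE_ON relationships are represented as plain dictionaries; it is an illustration, not a replacement for the query.

```python
from collections import defaultdict

# The question's relationships, written as plain dictionaries.
likes = {"Alex": ["Tango", "Zouk"], "Oxana": ["Tango", "Ballet"]}
available_on = {"Alex": ["Sunday", "Wednesday"], "Oxana": ["Sunday", "Saturday"]}

# Equivalent of grouping by (day.name, d.name) and collect(p.name).
groups = defaultdict(list)
for person in likes:
    for dance in likes[person]:
        for day in available_on[person]:
            groups[(day, dance)].append(person)

# Equivalent of `WHERE count >= 2` applied after the aggregation.
matches = [
    (day, dance, names)
    for (day, dance), names in groups.items()
    if len(names) >= 2
]
print(matches)  # [('Sunday', 'Tango', ['Alex', 'Oxana'])]
```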
When using LIMIT with ORDER BY, every node with the selected label still gets scanned (even with index).
For example, let's say I have the following:
MERGE (:Test {name:'b'})
MERGE (:Test {name:'c'})
MERGE (:Test {name:'a'})
MERGE (:Test {name:'d'})
Running the following gets us :Test {name: 'a'}, however using PROFILE we can see the entire list get scanned, which obviously will not scale well.
MATCH (n:Test)
RETURN n
ORDER BY n.name
LIMIT 1
I have a few sorting options available for this label. The order of nodes within these sorts should not change often; however, I can't cache these lists because each list is personalized for a user, e.g. a user may have hidden :Test {name:'b'}.
Is there a golden rule for something like this? Would creating pointers from node to node for each sort option be a good option here? Something like
(n {name:'a'})-[:ABC_NEXT]->(n {name:'b'})-[:ABC_NEXT]->(n {name:'c'})-...
Would I be able to have multiple sort pointers? Would that be overkill?
Ref:
https://neo4j.com/blog/moving-relationships-neo4j/
http://www.markhneedham.com/blog/2014/04/19/neo4j-cypher-creating-relationships-between-a-collection-of-nodes-invalid-input/
Here's what I ended up doing for anyone interested:
// connect nodes
MATCH (n:Test)
WITH n
ORDER BY n.name
WITH COLLECT(n) AS nodes
FOREACH(i in RANGE(0, size(nodes)-2) |
FOREACH(node1 in [nodes[i]] |
FOREACH(node2 in [nodes[i+1]] |
MERGE (node1)-[:IN_ORDER_NAME]->(node2))))
// create list, point first item to list
CREATE (l:List { name: 'name' })
WITH l
MATCH (n:Test) WHERE NOT (n)<-[:IN_ORDER_NAME]-()
MERGE (l)-[:IN_ORDER_NAME]->(n)
// getting 10 nodes sorted alphabetically
MATCH (:List { name: 'name' })-[:IN_ORDER_NAME*]->(n)
RETURN n
LIMIT 10
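The idea behind the statements above is a linked list: each node points to its successor in sort order, so reading the first few items means following pointers from the head instead of sorting everything. Here is a small Python sketch of that idea, assuming node names as plain strings; the helper names and the `hidden` parameter (modelling the per-user hidden nodes from the question) are hypothetical.

```python
def build_next_pointers(names):
    """Sort once and return (head, mapping from each name to its successor)."""
    ordered = sorted(names)
    head = ordered[0] if ordered else None
    next_of = dict(zip(ordered, ordered[1:]))
    return head, next_of


def first_n(head, next_of, n, hidden=frozenset()):
    """Walk the chain from the head and collect up to n visible names."""
    out = []
    node = head
    while node is not None and len(out) < n:
        if node not in hidden:
            out.append(node)
        node = next_of.get(node)  # None once the tail is reached
    return out


head, next_of = build_next_pointers(["b", "c", "a", "d"])
print(first_n(head, next_of, 2, hidden={"b"}))  # ['a', 'c']
```

The walk stops as soon as enough visible items are found, which is the property the pointer approach is meant to provide.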
In Neo4j I am trying to visualize a small number of calls taken from a CSV file (sample with fake numbers below):
A,B
1,4
1,5
1,2
2,7
2,9
2,11
3,15
I am treating the values in each column (A, B) as phone numbers, which become the nodes, and the presence of a call between them (A to B) as the relationship.
Ideally, the graph produced should show multiple relationships between the nodes (e.g. the node with value 1 would have three connections to other nodes, one of which is the node with value 2, which has another three connections; finally, the node with value 3 would have one connection but be separate from the rest).
The code I am trying:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:blahblahblah.csv" AS row
CREATE (A:phone {number: row.A})
CREATE (B:phone {number: row.B})
WITH A as a MATCH (a)-[:CALLED*]-(m)
RETURN a,m
Obviously, it's producing repeated nodes and only single relationships, with no second-level arrows.
What am I doing wrong here?
This should be most efficient:
create constraint on (p:phone) assert p.number is unique;
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:blahblahblah.csv" AS row
with distinct row.A as value
MERGE (:phone {number: value});
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:blahblahblah.csv" AS row
with distinct row.B as value
MERGE (:phone {number: value});
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:blahblahblah.csv" AS row
MATCH (A:phone {number: row.A})
MATCH (B:phone {number: row.B})
MERGE (A)-[r:CALLED]->(B)
ON CREATE SET r.count = 1
ON MATCH SET r.count = r.count + 1;
Not really sure what you want to query though?
E.g.
MATCH (a:phone)-[r:CALLED]->(b)
RETURN a, sum(r.count) as calls
ORDER BY calls DESC LIMIT 10;
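As a sanity check on what the load script and the query above should produce for the sample CSV, the same counting can be done in plain Python. This is only a sketch of the expected numbers, with the sample rows typed in by hand as strings (LOAD CSV also reads values as strings).

```python
from collections import Counter

# The sample CSV rows from the question as (A, B) pairs.
rows = [("1", "4"), ("1", "5"), ("1", "2"), ("2", "7"),
        ("2", "9"), ("2", "11"), ("3", "15")]

# Distinct phone numbers: what the two MERGE passes over columns A and B create.
phones = {number for row in rows for number in row}

# One CALLED relationship per distinct (A, B) pair; r.count is the number of repeats.
call_counts = Counter(rows)

# Equivalent of sum(r.count) grouped by the calling phone.
calls_per_caller = Counter()
for (caller, _callee), count in call_counts.items():
    calls_per_caller[caller] += count

print(len(phones))             # 9
print(calls_per_caller["1"])   # 3
```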