Aggregate Cypher query - Neo4j

This is my database in Neo4j:
CREATE (Alex:Person {name:'Alex', phone:'0420965111'})
CREATE (Oxana:Person {name:'Oxana', email:'oxana#mail.com'})
CREATE (Tango:Dance {name:'Tango'})
CREATE (Ballet:Dance {name:'Ballet'})
CREATE (Zouk:Dance {name:'Zouk'})
CREATE (Saturday:Day {name:'Saturday'})
CREATE (Sunday:Day {name:'Sunday'})
CREATE (Wednesday:Day {name:'Wednesday'})
MERGE (Alex)-[:LIKES]->(Tango)
MERGE (Alex)-[:LIKES]->(Zouk)
MERGE (Oxana)-[:LIKES]->(Tango)
MERGE (Oxana)-[:LIKES]->(Ballet)
MERGE (Alex)-[:AVAILABLE_ON]->(Sunday)
MERGE (Alex)-[:AVAILABLE_ON]->(Wednesday)
MERGE (Oxana)-[:AVAILABLE_ON]->(Sunday)
MERGE (Oxana)-[:AVAILABLE_ON]->(Saturday)
I need a list of more than one person who likes the same dance and is available on the same day. How do I write a query which returns this?:
"Sunday", "Tango", ["Alex","Oxana"]
This almost works:
match (p:Person), (d:Dance), (day:Day)
where (p)-[:LIKES]->(d) and (p)-[:AVAILABLE_ON]->(day)
return day.name, d.name, collect(p.name), count(*)
But I don't know how to exclude records where count(*) is less than 2.

You can use WITH:
match (p:Person), (d:Dance), (day:Day)
where (p)-[:LIKES]->(d) and (p)-[:AVAILABLE_ON]->(day)
with day.name as day, d.name as dance, collect(p.name) as names, count(*) as count
where count >= 2
return day, dance, names
From the docs:
The WITH clause allows query parts to be chained together, piping the
results from one to be used as starting points or criteria in the
next.
Also, you can add a constraint (WHERE clause) to filter data.
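As a side note, the same result can be expressed by putting the patterns directly in the MATCH clause instead of using them as WHERE predicates; this is just an equivalent sketch of the query above:
MATCH (day:Day)<-[:AVAILABLE_ON]-(p:Person)-[:LIKES]->(d:Dance)
WITH day.name AS day, d.name AS dance, collect(p.name) AS names, count(*) AS count
WHERE count >= 2
RETURN day, dance, names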

Related

How to do this in a single Cypher Query?

So this is a very basic question. I am trying to make a Cypher query that creates a node and connects it to multiple nodes.
As an example, let's say I have a database with towns and cars. I want to create a query that:
creates people, and
connects them with the town they live in and any cars they may own.
So here goes:
Here's one way I tried this query (I have WHERE clauses that specify which town and which cars, but I've left them out to keep this simple):
MATCH (t: Town)
OPTIONAL MATCH (c: Car)
MERGE a = (c)<-[:OWNS_CAR]-(p:Person {name: "John"})-[:LIVES_IN]->(t)
RETURN a
But this returns multiple people named John - one for each car he owns!
In two queries:
MATCH (t:Town)
MERGE a = (p:Person {name: "John"})-[:LIVES_IN]->(t);

MATCH (p:Person {name: "John"})
OPTIONAL MATCH (c:Car)
MERGE a = (p)-[:OWNS_CAR]->(c)
This gives me the result I want, but I was wondering if I could do this in 1 query. I don't like the idea that I have to find John again! Any suggestions?
It took me a bit to wrap my head around why MERGE sometimes creates duplicate nodes when I didn't intend that. This article helped me.
The basic insight is that it would be best to merge the Person node first before you match the towns and cars. That way you won't get a new Person node for each relationship pattern.
If Person nodes are uniquely identified by their name properties, a unique constraint would prevent you from creating duplicates even if you run a mistaken query.
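For example, a minimal sketch using the legacy constraint syntax that also appears in the import scripts further down this page (newer Neo4j versions use CREATE CONSTRAINT ... FOR (p:Person) REQUIRE p.name IS UNIQUE):
CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE;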
If a person can have multiple cars and residences in multiple towns, you also want to avoid a cartesian product of cars and towns in your result set before you do the merge. Try using the table output in Neo4j Browser to see how many rows are getting returned before you do the MERGE to create relationships.
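A quick way to check, sketched here with the hypothetical licensePlate values and town names used in the query below, is to run just the matching part and count the rows the MERGE would operate on:
MATCH (c:Car) WHERE c.licensePlate IN ["xyz123", "999aaa"]
MATCH (t:Town) WHERE t.name IN ["Lexington", "Concord"]
// one row per car/town combination, i.e. the cartesian product
RETURN count(*)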
Here's how I would approach your query.
MERGE (p:Person {name:"John"})
WITH p
OPTIONAL MATCH (c:Car)
WHERE c.licensePlate in ["xyz123", "999aaa"]
WITH p, COLLECT(c) as cars
OPTIONAL MATCH (t:Town)
WHERE t.name in ["Lexington", "Concord"]
WITH p, cars, COLLECT(t) as towns
FOREACH(car in cars | MERGE (p)-[:OWNS_CAR]->(car))
FOREACH(town in towns | MERGE (p)-[:LIVES_IN]->(town))
RETURN p, towns, cars

Remove unnecessary relationships between nodes?

I tried to build a graph model from my data; here is the Cypher query:
LOAD CSV WITH HEADERS FROM 'file:///y.csv' AS line
MERGE (a:Employee {empid:line.EmpID})
ON CREATE SET a.firstname = line.FirstName, a.lastname = line.LastName
MERGE (y:Year {year:toInteger(line.YearofJoining)})
ON CREATE SET y.month = line.MonthNamofJoining
MERGE (c:Location {city:line.City})
ON CREATE SET c.pincode = line.PinCode, c.county = line.County, c.state = line.State, c.region = line.Region
MERGE (ag:Age {age:toInteger(line.AgeinYrs)})
MERGE (a)-[:AGE]->(ag)
MERGE (ag)-[:LOCALITY]->(c)
MERGE (c)-[:JOINING_YEAR]->(y)
I need to return all connecting paths between four employees, so I tried the query below:
MATCH p = (a:Employee)-[:AGE]->(ag)-[:LOCALITY]-(c)-[:JOINING_YEAR]-(y)
WHERE a.empid IN ['840300','840967','346058','320954']
return p limit 25
The result I got is correct, but there are many unnecessary paths. I am uploading the resulting graph image; please check and point out where I am going wrong.
There are potentially several things to fix in the import query:
The year nodes are misleading. I think you should extract the month attribute to a separate node, like this:
MERGE (y:Year {year:toInteger(line.YearofJoining)})
MERGE (m:Month {month:line.MonthNamofJoining})-[:MONTH_IN_YEAR]->(y)
Also, the modelling seems wrong. Currently, a Location is linked to a year (or soon: a month in a year) via JOINING_YEAR, and an age is linked to a location. This does not seem to make sense.
You probably want an intermediate node to represent the fact that an employee joined at a location (given that Neo4j doesn't support relationships between more than 2 nodes):
LOAD CSV WITH HEADERS FROM 'file:///y.csv' AS line
MERGE (a:Employee {empid:line.EmpID})
ON CREATE SET a.firstname = line.FirstName, a.lastname = line.LastName
MERGE (ag:Age {age:toInteger(line.AgeinYrs)})
MERGE (a)-[:AGE]->(ag)
MERGE (y:Year {year:toInteger(line.YearofJoining)})
MERGE (m:Month {month:line.MonthNamofJoining})-[:MONTH_IN_YEAR]->(y)
MERGE (c:Location {city:line.City})
ON CREATE SET c.pincode = line.PinCode, c.county = line.County, c.state = line.State, c.region = line.Region
MERGE (j:Join {empid:line.EmpID}) // need a property to merge on
MERGE (a)-[:JOINED]->(j)
MERGE (j)-[:LOCALITY]->(c)
MERGE (j)-[:JOINING_MONTH]->(m)
Your read query becomes:
MATCH p = (:Location)<-[:LOCALITY]-(:Join)<-[:JOINED]-(a:Employee)-[:AGE]->(:Age)
WHERE a.empid IN ['840300','840967','346058','320954']
return p limit 25
Unrelated formatting note:
The recommended case for attributes is camelCase (e.g. empId instead of empid), and for relationship types it is SNAKE_CASE (e.g. JOINING_YEAR instead of JOININGYEAR).
By convention, relationship types are verbs more often than not.

Cypher query for isolating records on time basis

I am doing CDR (Call Detail Record) analysis on mobile call data. Calls are made by a PERSON, THROUGH a tower, and CONNECTS to a number. I want to isolate calls that were made prior to a certain date and time, where the calling number does not appear in the records after that date and time. My current query only shows me data prior to the cutoff I am looking for:
MATCH (a:PERSON)-[t:THROUGH]->()-[:CONNECTS]->(b)
WHERE toInteger(t.time)<1500399900
RETURN a,b
However, how do I now isolate only those records which exist before t.time=1500399900 and not after that? Also, if I do not limit the above query to, say, 1000, my browser (Google Chrome) crashes. Any solution for that, please?
After running the query as suggested this is what EXPLAIN looks like:
If it helps, this is how I loaded the CSV file in Neo4j:
//Setup initial constraints
CREATE CONSTRAINT ON (a:PERSON) ASSERT a.number IS UNIQUE;
CREATE CONSTRAINT ON (b:TOWER) ASSERT b.id IS UNIQUE;
//Create the appropriate nodes
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///combined.csv" AS line
MERGE (a:PERSON {number: line.Calling})
MERGE (b:PERSON {number: line.Called})
MERGE (c:TOWER {id: line.CellID1})
//Setup proper indexing
DROP CONSTRAINT ON (a:PERSON) ASSERT a.number IS UNIQUE;
DROP CONSTRAINT ON (a:TOWER) ASSERT a.id IS UNIQUE;
CREATE INDEX ON :PERSON(number);
CREATE INDEX ON :TOWER(id);
//Create relationships between people and calls
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///combined.csv" AS line
MATCH (a:PERSON {number: line.Calling}),(b:PERSON {number: line.Called}),(c:TOWER {id: line.CellID1})
CREATE (a)-[t:THROUGH]->(c)-[x:CONNECTS]->(b)
SET x.calltype = line.CallType, x.provider = line.Provider, t.time=toInteger(line.ts), t.duration=toInteger(line.Duration)
However, how do I now isolate only those records which exist before t.time=1500399900 and not after that?
Let's create a small example data set:
CREATE
(a1:PERSON {name: 'a1'}), (a2:PERSON {name: 'a2'}),
(b1:PERSON {name: 'b1'}), (b2:PERSON {name: 'b2'}),
(b3:PERSON {name: 'b3'}), (b4:PERSON {name: 'b4'}),
(a1)-[:THROUGH {time: 1}]->(:TOWER)-[:CONNECTS]->(b1),
(a1)-[:THROUGH {time: 3}]->(:TOWER)-[:CONNECTS]->(b2),
(a2)-[:THROUGH {time: 2}]->(:TOWER)-[:CONNECTS]->(b3),
(a2)-[:THROUGH {time: 15}]->(:TOWER)-[:CONNECTS]->(b4)
It looks like this when visualized:
This query might do the trick for you:
MATCH (a:PERSON)-[t1:THROUGH]->(:TOWER)-[:CONNECTS]->(b:PERSON)
WHERE toInteger(t1.time) < 5
OPTIONAL MATCH (a)-[t2:THROUGH]->(:TOWER)
WHERE t2.time >= 5
WITH a, b, t1, t2
WHERE t2 IS NULL
RETURN a, b, t1
After the first MATCH, the query looks for calls of PERSON a that were initiated after timestamp 5. There might be no such calls, hence the OPTIONAL MATCH. The value of t2 will be null if there were no calls after the specified timestamp, so we do an IS NULL check and return the filtered results.
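If you prefer a single MATCH, the same filter can also be sketched with a pattern comprehension and none() (this assumes Neo4j 3.1+ and uses the sample data above):
MATCH (a:PERSON)-[t1:THROUGH]->(:TOWER)-[:CONNECTS]->(b:PERSON)
WHERE toInteger(t1.time) < 5
// collect a's outgoing THROUGH relationships and require that none of them are at or after the cutoff
AND none(x IN [(a)-[r:THROUGH]->(:TOWER) | r] WHERE x.time >= 5)
RETURN a, b, t1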
Also, if I do not limit the above query to, say, 1000, my browser (Google Chrome) crashes. Any solution for that, please?
If you use the graph visualizer, it usually cannot render more than a few hundred nodes. Possible workarounds:
Use the Text view in Neo4j Browser, which scales better.
Paginate using SKIP ... LIMIT ... (a sketch follows below).
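For example, a paginated version of the original query might look like the sketch below (the page size of 1000 is arbitrary; the ORDER BY keeps pages stable, and subsequent pages would use SKIP 1000, SKIP 2000, and so on):
MATCH (a:PERSON)-[t:THROUGH]->()-[:CONNECTS]->(b)
WHERE toInteger(t.time) < 1500399900
RETURN a, b
ORDER BY toInteger(t.time), a.number
SKIP 0
LIMIT 1000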

Some way to create 1 million relationships with a Neo4j query

With this query I am importing 75,000 Category nodes from my CSV file:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM "file:///prodcategory.csv" AS row
CREATE (:Category {id: row.idProdCategory, name: row.name, idRestaurant: row.idRestaurant});
And with this query I am importing 1 million Product nodes from my CSV file:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM "file:///products.csv" AS row
CREATE (:Product {id: row.idProduct, idProductCategory: row.idProductCategory,name: row.name,idRestaurant:row.idRestaurant ,description: row.description, price: row.price, shipping_price: row.shippingPrice});
I am using this query to create the relationships between categories (id) and products (idProductCategory):
MATCH (category:Category {id: category.id})
MATCH (Product:Product {idProductCategory: Product.idProductCategory})
WHERE Product.idProductCategory=category.id
MERGE (category)-[:OF_CATEGORY]->(Product);
This query only creates 2,999 relationships, not the roughly 1 million I believe it should create. If there is a method or configuration that makes it possible to create more than 1 million relationships, please help; I would be very grateful.
Ensure you have an index on Product.idProductCategory.
I assume that the category id is unique across categories.
CREATE CONSTRAINT ON (category:Category) ASSERT category.id IS UNIQUE;
I assume that there are multiple products with the same category ID.
CREATE INDEX ON :Product(idProductCategory);
Then you can simply match each category and then for each category find the appropriate products and create the relationships.
// match all of your categories
MATCH (category:Category)
// then with each category find all the products
WITH category
MATCH (Product:Product {idProductCategory: category.id })
// and then create the relationships
MERGE (category)-[:OF_CATEGORY]->(Product);
If you are running into memory constraints you could use the APOC periodic commit to wrap your query...
call apoc.periodic.commit("
MATCH (category:Category)
MATCH (Product:Product {idProductCategory: category.id})
WHERE NOT (category)-[:OF_CATEGORY]->(Product)
WITH category, Product LIMIT $limit
MERGE (category)-[:OF_CATEGORY]->(Product)
RETURN count(*)
",{limit:10000})
(apoc.periodic.commit re-runs the inner statement until it returns 0, so the statement needs to take a batch of not-yet-linked pairs via LIMIT $limit and report how many it processed via RETURN count(*).)
Try changing your query to this; you are using too many filters in your query.
Check the docs for MATCH.
MATCH (category:Category),(Product:Product)
WHERE Product.idProductCategory=category.id
MERGE (category)-[:OF_CATEGORY]->(Product)
You can also just change your second import query, so you do not need a separate query for linking:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM "file:///products.csv" AS row
CREATE (p:Product {id: row.idProduct, name: row.name, idRestaurant: row.idRestaurant, description: row.description, price: row.price, shipping_price: row.shippingPrice})
WITH p, row
MATCH (c:Category {id: row.idProductCategory})
MERGE (p)-[:OF_CATEGORY]->(c)

Order list without scanning every node

When using LIMIT with ORDER BY, every node with the selected label still gets scanned (even with index).
For example, let's say I have the following:
MERGE (:Test {name:'b'})
MERGE (:Test {name:'c'})
MERGE (:Test {name:'a'})
MERGE (:Test {name:'d'})
Running the following gets us :Test {name: 'a'}; however, using PROFILE we can see the entire list gets scanned, which obviously will not scale well.
MATCH (n:Test)
RETURN n
ORDER BY n.name
LIMIT 1
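(The scan is visible by prefixing the same query with PROFILE; on a data set like this the plan will typically show a NodeByLabelScan feeding a Top operator for the ORDER BY ... LIMIT.)
PROFILE
MATCH (n:Test)
RETURN n
ORDER BY n.name
LIMIT 1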
I have a few sorting options available for this label. The order of nodes within these sorts should not change often; however, I can't cache these lists because each list is personalized per user, i.e. a user may have hidden :Test {name:'b'}.
Is there a golden rule for something like this? Would creating pointers from node to node for each sort option be a good option here? Something like
(n {name:'a'})-[:ABC_NEXT]->(n {name:'b'})-[:ABC_NEXT]->(n {name:'c'})-...
Would I be able to have multiple sort pointers? Would that be overkill?
Ref:
https://neo4j.com/blog/moving-relationships-neo4j/
http://www.markhneedham.com/blog/2014/04/19/neo4j-cypher-creating-relationships-between-a-collection-of-nodes-invalid-input/
Here's what I ended up doing for anyone interested:
// connect nodes
MATCH (n:Test)
WITH n
ORDER BY n.name
WITH COLLECT(n) AS nodes
FOREACH(i in RANGE(0, length(nodes)-2) |
FOREACH(node1 in [nodes[i]] |
FOREACH(node2 in [nodes[i+1]] |
CREATE UNIQUE (node1)-[:IN_ORDER_NAME]->(node2))))
// create a list node and point it at the first item in the order
CREATE (l:List { name: 'name' })
WITH l
MATCH (n:Test) WHERE NOT (n)<-[:IN_ORDER_NAME]-()
MERGE (l)-[:IN_ORDER_NAME]->(n)
// getting 10 nodes sorted alphabetically
MATCH (:List { name: 'name' })-[:IN_ORDER_NAME*]->(n)
RETURN n
LIMIT 10
