Window in cypher - neo4j

So basically it comes down to this. I have a (:PERSON) that used his (:CAR) at a given (:TIME). This triplet is fully connected. A (:CAR) may be used by other (:PERSON) nodes, and a (:PERSON) can use multiple (:CAR) nodes, all at different (:TIME) nodes.
What I want to query: for each combination (p:PERSON)-[:AT]-(t:TIME), the number of cars that p used in the 6 hours up to t, i.e. (p)-[:USED]-(c:CAR)-[:AT]-(o:TIME) with o between t-6H and t.
Here is what I have achieved so far, but this only returns each :PERSON once.
MATCH (n:PERSON)-[:AT]-(t:TIME)
WITH n,t
MATCH (n)-[:USED]-(c:CAR)-[:AT]-(o:TIME)
WITH n,t,c,toFloat(t.id) as current, toFloat(o.id) as previous
WITH n,t,c,current-previous as diff
WHERE (diff) >= 0 AND (diff) <= 3600*6
WITH n, count(distinct c) as cnt
RETURN n, cnt
Where :TIME(id) is a String containing the time in seconds
Hope this is clear. Thanks for the help.

You should aggregate on both the person and t, so keep t in the WITH that does the count:
MATCH (n:PERSON)-[:AT]-(t:TIME)
WITH n,t
MATCH (n)-[:USED]-(c:CAR)-[:AT]-(o:TIME)
WITH n,t,c,toFloat(t.id) - toFloat(o.id) as diff
WHERE (diff) >= 0 AND (diff) <= 3600*6
WITH n,t, count(distinct c) as cnt
RETURN n,t, cnt
Also, you should store TIME(id) as a numeric value so you can drop the toFloat calls from your query, which will improve performance.
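To make the grouping concrete, here is a minimal plain-Python sketch of what the corrected query computes. It is only an illustration, not Neo4j code: the function name cars_in_window and the tuple layouts are made up for the example, and times are assumed to be numeric seconds already.

```python
from collections import defaultdict

WINDOW_SECONDS = 6 * 3600  # same 6-hour window as 3600*6 in the query


def cars_in_window(person_times, usages, window=WINDOW_SECONDS):
    """Count distinct cars per (person, t) used within [t - window, t].

    person_times: iterable of (person, t) pairs, t in seconds (hypothetical layout).
    usages: iterable of (person, car, o) triples, o in seconds (hypothetical layout).
    """
    by_person = defaultdict(list)
    for person, car, o in usages:
        by_person[person].append((car, o))

    counts = {}
    for person, t in person_times:
        # Same filter as: WHERE diff >= 0 AND diff <= 3600*6
        cars = {car for car, o in by_person[person] if 0 <= t - o <= window}
        # Like the Cypher query, a (person, t) pair with no matching car yields no row.
        if cars:
            counts[(person, t)] = len(cars)
    return counts
```

The key point is that the result is keyed by (person, t) rather than by person alone, which is what adding t to the aggregating WITH achieves.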

Maybe you should put the time on your USED relationship instead.
Either you keep only one USED per person and car and store a collection of times on it (not nice for querying),
or you create multiple USED relationships, one per use.

Related

Return node after aggregation in Cypher

I am having a hard time understanding how to properly use aggregate functions in Cypher.
Let say I have nodes labelled as Animal, with properties size and species.
For each species, I want to get the largest.
So far, I understand I can do it with the following :
MATCH (n:Animal)
WITH n.species as species, max(n.size) as size
RETURN species, size
And I will effectively get the largest sizes with corresponding species.
But how can I get the nodes instead of species ?
I can't return n because of the WITH statement, and I can't inject it into the WITH because it will break species aggregation.
I know this question has already been asked a few times, but the different solutions I came across were case-specific and used relations.
Any advice is very welcome
EDIT: I finally made it work with :
MATCH (n:Animal)
WITH n.species as species, max(n.size) as size, collect(n) as ns
UNWIND ns as n
WITH n
WHERE n.size = size
RETURN n
Is this the Cypher-way to settle things ? Seems a bit verbose and not efficient (all nodes are fetched here) to me, isn't there a more straightforward option ?
Since the MAX aggregation function does not return the node with the max value, you should not use it. Otherwise, you'd have to test the size of every animal twice to get both the max value and the node of interest (as you discovered).
You can instead use the REDUCE function to test the size of every animal just once:
MATCH (n:Animal)
WITH n.species AS species, COLLECT(n) as ns
RETURN species, REDUCE(s = {size: -1}, a IN ns |
CASE WHEN a.size > s.size THEN {size: a.size, a: a} ELSE s END
) AS result;
This is a frequently encountered limitation with our max() and min() aggregation functions, so we added an APOC function that can help: apoc.agg.maxItems():
apoc.agg.maxItems(item, value, groupLimit: -1) - returns a map {items:[], value:n} where value is the maximum value present, and items are all items with the same value. The number of items can be optionally limited.
MATCH (n:Animal)
WITH n.species as species, apoc.agg.maxItems(n, n.size) as sizeData
RETURN species, sizeData.value as size, sizeData.items as animals
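The single-pass idea behind the REDUCE version can be sketched in plain Python. This is an illustration under assumptions, not Neo4j code: animals are modelled as dicts with species and size keys, and the function name largest_per_species is hypothetical.

```python
def largest_per_species(animals):
    """Return {species: animal}, keeping the largest animal of each species.

    Each animal is tested exactly once, like the REDUCE expression.
    On ties the first animal seen is kept.
    """
    best = {}
    for animal in animals:
        current = best.get(animal["species"])
        if current is None or animal["size"] > current["size"]:
            best[animal["species"]] = animal
    return best
```

Unlike apoc.agg.maxItems, this sketch keeps only one animal per species when several share the maximum size.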

Neo4j pipe data

Hi there, I am on Neo4j and I am having some trouble. I have one query where I want to return the cuisine with the highest percentage, like so:
// 1. Find the most_popular_cuisine
MATCH (n:restaurants)
WITH COUNT(n.cuisine) as total
MATCH (r:restaurants)
RETURN r.cuisine , 100 * count(*)/total as percentage
order by percentage desc
limit 1
I am trying to extend this further by taking the top result and matching on it, to get only the nodes with that property, like so:
WITH COUNT(n.cuisine) as total
MATCH (r:restaurants)
WITH r.cuisine as cuisine , count(*) as cnt
MATCH (t:restaurants)
WHERE t.cuisine = cuisine AND count(*) = MAX(cnt)
RETURN t
I think you might be better off refactoring your model a little bit such that a :Cuisine is a label and each cuisine has its own node.
(:Restaurant)-[:OFFERS]->(:Cuisine)
or
(:Restaurant)-[:SPECIALIZES_IN]->(:Cuisine)
Then your query can look like this
MATCH (cuisine:Cuisine)
RETURN cuisine, size((cuisine)<-[:OFFERS]-()) AS number_of_restaurants
ORDER BY number_of_restaurants DESC
I wasn't able to get WITH r.cuisine as cuisine, count(*) as cnt to work in a WITH rather than a RETURN clause, so I had to resort to a slightly more long-winded approach.
There might be a more optimized way to do this, but this works too:
// Get all unique cuisines in a list
MATCH (n:Restaurants)
WITH COUNT(n.Cuisine) as total, COLLECT(DISTINCT(n.Cuisine)) as cuisineList
// Go through each cuisine and find the number of restaurants associated with each
UNWIND cuisineList as c
MATCH (r:Restaurants{Cuisine:c})
WITH total, r.Cuisine as c, count(r) as cnt
ORDER BY cnt DESC
WITH COLLECT({Cuisine: c, Count:cnt}) as list
// For the most popular cuisine, find all the restaurants offering it
MATCH (t:Restaurants{Cuisine:list[0].Cuisine})
RETURN t
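For comparison, the two steps (find the most popular cuisine, then fetch the restaurants that offer it) look like this in plain Python. This is a sketch, not Neo4j code: restaurants are modelled as dicts with a cuisine key, every restaurant is assumed to have one, and the function name is hypothetical.

```python
from collections import Counter


def restaurants_with_top_cuisine(restaurants):
    """Return (cuisine, percentage, matching restaurants) for the most common cuisine."""
    counts = Counter(r["cuisine"] for r in restaurants)
    if not counts:
        return None, 0.0, []
    top_cuisine, top_count = counts.most_common(1)[0]
    # Float division; the Cypher expression 100 * count(*)/total divides
    # two integers, which truncates the percentage.
    percentage = 100.0 * top_count / len(restaurants)
    matching = [r for r in restaurants if r["cuisine"] == top_cuisine]
    return top_cuisine, percentage, matching
```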

Traversing through all nodes and comparing each one with every other one

I am working on a little project and I have a dataset of about 60k nodes and 500k relationships between those nodes. The nodes are of two types: the first type is recipes and the second type is ingredients. Recipes are composed of ingredients like:
(ingredient)-[:IS_PART_OF]->(recipe)
My objective is to find how many common ingredients two recipes share. I have managed to obtain this information with the following query that compares one recipe to all others (the first one with all others):
MATCH (recipe:RECIPE{ ID: 1000000 }),(other)
WHERE (other.ID >= 1000001 AND other.ID <= 1057690)
OPTIONAL MATCH (recipe:RECIPE)<-[:IS_PART_OF]-(ingredient:INGREDIENT)-[:IS_PART_OF]->(other)
WITH ingredient, other
RETURN other.ID, count(distinct ingredient.name)
ORDER BY other.ID DESC
My first question: How can I obtain the number of all ingredients of two recipes in a way that the mutual ones are counted only once (union of R1 and R2 --> R1 U R2)
My second question: is it possible to write a loop that would iterate through all the recipes and check for common ingredients? The objective is to compare each recipe with all others. I think this should return (n-1)*(n/2) rows.
I have tried the above and the problem remains. Even with LIMIT and SKIP I can not run the code on the whole set. I have changed my query so it allows me to partition my set accordingly:
MATCH (recipe1)<-[:IS_PART_OF]-(ingredient:INGREDIENT)-[:IS_PART_OF]->(recipe2)
WHERE (recipe2.ID >= 1000000 AND recipe2.ID <= 1000009) AND (recipe1.ID >= 1000000 AND recipe1.ID <= 1000009) AND (recipe1.ID < recipe2.ID)
RETURN recipe1.ID, count(distinct ingredient.name) AS MutualIngredients, recipe2.ID
ORDER BY recipe1.ID
Until I get my hands on a better machine this will suffice.
I still haven't solved my first question: how can I obtain the number of all ingredients of two recipes in a way that the mutual ones are counted only once (union of R1 and R2 --> R1 U R2)
You'll need to play with this, but it's going to be something similar to this:
MATCH (recipe1:RECIPE)<-[:IS_PART_OF]-(ingred:INGREDIENT)-[:IS_PART_OF]->(recipe2:RECIPE)
WHERE ID(recipe1) < ID(recipe2)
RETURN recipe1, collect(ingred.name), recipe2
ORDER BY recipe1.ID
The match pattern gets you all of the common ingredients between two recipes. The WHERE clause ensures that you're not comparing a recipe to itself (because it would share all ingredients with itself). The return clause just gives you the two recipes you're comparing, and what they have in common.
This will be O(n^2) though, and will be very slow.
UPDATE: I took Nicole's suggestion, which is a good one. That should guarantee each pair is only considered once.
SOLVED: Just to share it if someone else will need it:
MATCH (recipe1)<-[:IS_PART_OF]-(ingredient:INGREDIENT)-[:IS_PART_OF]->(recipe2)
MATCH (recipe1)<-[:IS_PART_OF]-(ingredient1:INGREDIENT)
MATCH (recipe2)<-[:IS_PART_OF]-(ingredient2:INGREDIENT)
WHERE (recipe2.ID >= 1000000 AND recipe2.ID <= 1000009) AND (recipe1.ID >= 1000000 AND recipe1.ID <= 1000009) AND (recipe1.ID < recipe2.ID)
RETURN recipe1.ID, count(distinct ingredient1.name) + count(distinct ingredient2.name) - count(distinct ingredient.name) AS RecipesUnion, recipe2.ID
ORDER BY recipe1.ID
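The union count in that query is inclusion-exclusion: |R1 U R2| = |R1| + |R2| - |R1 ∩ R2|. A minimal plain-Python sketch of the same pairwise computation, assuming recipes are given as a dict from recipe ID to a set of ingredient names (the layout and the function name are made up for the example):

```python
from itertools import combinations


def pairwise_ingredient_counts(recipes):
    """Return (id1, id2, shared, union) rows for every pair with id1 < id2.

    recipes: {recipe_id: set of ingredient names} (hypothetical layout).
    """
    rows = []
    for id1, id2 in combinations(sorted(recipes), 2):
        ing1, ing2 = recipes[id1], recipes[id2]
        shared = len(ing1 & ing2)
        if shared == 0:
            # The Cypher pattern only matches pairs sharing at least one ingredient.
            continue
        union = len(ing1) + len(ing2) - shared
        rows.append((id1, id2, shared, union))
    return rows
```

combinations over the sorted IDs plays the role of recipe1.ID < recipe2.ID, so each pair appears once and the row count is at most n(n-1)/2.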

Neo4j Return Top Level Nodes in Tree

I have the following data in Neo4j:
CREATE (t1:T {start:1, end:8})
CREATE (t2:T {start:1, end:4})
CREATE (t3:T {start:1, end:2})
CREATE (t4:T {start:3, end:4})
CREATE (t5:T {start:5, end:6})
CREATE (t6:T {start:7, end:8})
CREATE (t2)-[r1:T_OF]->(t1)
CREATE (t3)-[r2:T_OF]->(t2)
CREATE (t4)-[r3:T_OF]->(t2)
CREATE (t5)-[r4:T_OF]->(t1)
CREATE (t6)-[r5:T_OF]->(t1)
This creates a tree with start and end values, which in my actual application are epoch dates. I want to be able to find the nodes that don't have shorter/smaller nodes attached to them in a given range.
MATCH (t:T)
WHERE t.start >= 1 AND t.end <= 6
(MAGIC)
RETURN t
My goal is for this to only return t2 and t5, even though t3 and t4 fall in the range. Since they have a T_OF relationship to t2, they should be ignored.
I've tried a few different ways, but unfortunately I can't figure this one out.
Please let me know if I should explain better.
Does this work for you?
It collects all the T nodes with the right date range, and then filters out all the nodes that have a T_OF relationship to any node in the collection.
MATCH (t:T)
WHERE t.start >= 1 AND t.end <= 6
WITH COLLECT(t) AS ct
RETURN FILTER(x IN ct
WHERE ALL (y IN ct
WHERE NOT ((x)-[:T_OF]->(y))))
AS result;
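The collect-and-filter logic can be checked against the sample data with a small plain-Python sketch. It is an illustration only: the dict layouts and the function name top_level_in_range are assumptions, and each node is assumed to have at most one T_OF parent, as in the tree above.

```python
def top_level_in_range(nodes, parent_of, start, end):
    """Return names of in-range nodes whose parent is not itself in range.

    nodes: {name: (start, end)}; parent_of: {child: parent} (hypothetical layouts).
    """
    in_range = {
        name for name, (s, e) in nodes.items() if s >= start and e <= end
    }
    # Keep a node unless it has a T_OF parent that is also in the collection.
    return sorted(n for n in in_range if parent_of.get(n) not in in_range)


nodes = {
    "t1": (1, 8), "t2": (1, 4), "t3": (1, 2),
    "t4": (3, 4), "t5": (5, 6), "t6": (7, 8),
}
parent_of = {"t2": "t1", "t3": "t2", "t4": "t2", "t5": "t1", "t6": "t1"}
print(top_level_in_range(nodes, parent_of, 1, 6))
```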
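You can also do this per node without collecting: optionally match a parent that is itself inside the range, and keep only the nodes that have none.
MATCH (t:T)
WHERE t.start >= 1 AND t.end <= 6
OPTIONAL MATCH (t)-[:T_OF]->(p:T)
WHERE p.start >= 1 AND p.end <= 6
WITH t, p
WHERE p IS NULL
RETURN t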

neo4j distinct two columns

How to return two different columns with cypher in Neo4j? The query which I've got is that:
MATCH (a:Person)-[r:WorksFOR]->(b:Boss), (c:Boss)<-[r2:WorksFOR]-(d:Person)
WHERE b.sex = c.sex
RETURN a, d;
And it returns:
a d
John Will
Will John
I want to get rid of one of the columns.
The OP needs to reword the question to clarify that they want to get rid of one of the rows.
Here's a query that does that:
MATCH (a:Person)-[r:WorksFOR]->(b:Boss), (c:Boss)<-[r2:WorksFOR]-(d:Person)
WHERE b.name < c.name AND
b.sex = c.sex AND
b <> c
RETURN a, d;
The problem with your query is that b and c can match any Boss. To force them to match in one order, I've added b.name < c.name. The order doesn't matter, this just forces it to match one way, but not the other. I've added b <> c because you have to handle the case where they work for the same boss, which I don't think you want.
Once you add the ordering, the boss matching (b and c) can only happen one way, not the other way, so your second row of results gets eliminated.
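If the mirrored rows have already been returned to the client, they can also be dropped there. The sketch below is a different technique from the ordering predicate in the answer: it treats each pair as unordered and keeps the first occurrence. The function name and the row layout are made up for the example.

```python
def unordered_pairs(rows):
    """Drop mirrored duplicates such as (a, d) and (d, a), keeping the first seen."""
    seen = set()
    result = []
    for a, d in rows:
        key = frozenset((a, d))  # same key for (a, d) and (d, a)
        if key not in seen:
            seen.add(key)
            result.append((a, d))
    return result
```

Doing it in the query, as in the answer, is usually preferable because the mirrored rows are never produced in the first place.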
