MATCH based on sum of common node property - neo4j

I need help figuring out a MATCH statement.
My data model is as follows:
(a:musician {name}) //individual musicians
(b:jamSession {date, durationInHours}) //the date and length of jam sessions where 2 or more musicians participated together
and the relation
[r:PLAYED]
I've already figured out how to find all of the jam sessions a specific musician played at:
MATCH (a:musician {name:"Joe Smith"})-[r:PLAYED]->(b:jamSession) RETURN a.name, b.date
and all of the musicians a specific musician played with
MATCH (a:musician {name:"Joe Smith"})-[r:PLAYED]->(b:jamSession)<-[r2:PLAYED]-(c:musician) RETURN c.name
But how do I get only the musicians that Joe Smith has played with where the sum total time of their common jam sessions was >=100 hours, and the date on which each pair of musicians met the 100-hour milestone?

This may work for you (assuming b.date is suitable for sorting):
MATCH (a:musician)-[:PLAYED]->(b:jamSession)<-[:PLAYED]-(c:musician)
WHERE a.name = "Joe Smith"
WITH a, c, b ORDER BY b.date
WITH a, c,
REDUCE(s = {sum: 0}, x IN COLLECT(b) |
CASE WHEN s.date IS NULL AND x.durationInHours + s.sum >= 100
THEN {sum: s.sum + x.durationInHours, date: x.date}
ELSE {sum: s.sum + x.durationInHours, date: s.date}
END
) AS data
WHERE data.date IS NOT NULL
RETURN c.name AS name, data.date AS date
The aggregating function COLLECT is used to collect the date-ordered b nodes that are shared by the same a and c node pair. The REDUCE function then iterates through the ordered b nodes, keeping a running total of durationInHours, to find the date on which the 100-hour threshold is first met.
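The running-total idea behind COLLECT plus REDUCE can be sketched outside the database. The following is a minimal Python illustration, not part of the original answer; the function name milestone_date and the sample sessions are hypothetical, and dates are assumed to be ISO strings so that they sort chronologically.

```python
from functools import reduce


def milestone_date(sessions, threshold=100):
    """Return the date on which the running total first reaches threshold, else None."""
    # Mirror ORDER BY b.date: ISO date strings sort chronologically.
    ordered = sorted(sessions, key=lambda s: s["date"])

    def step(state, session):
        # Once the milestone date is recorded, leave the state untouched.
        if state["date"] is not None:
            return state
        total = state["sum"] + session["durationInHours"]
        reached = session["date"] if total >= threshold else None
        return {"sum": total, "date": reached}

    return reduce(step, ordered, {"sum": 0, "date": None})["date"]


# Hypothetical sessions shared by one pair of musicians.
sessions = [
    {"date": "2015-03-01", "durationInHours": 40},
    {"date": "2015-01-10", "durationInHours": 50},
    {"date": "2015-06-20", "durationInHours": 30},
    {"date": "2015-09-05", "durationInHours": 10},
]

print(milestone_date(sessions))
```

A pair that never reaches the threshold yields None, which corresponds to the rows dropped by WHERE data.date IS NOT NULL.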

Related

How to loop through properties for comparison for a certain type of node in Neo4j using Cypher

I am using Neo4j and wondering how to use Cypher to loop through the properties of the nodes connected to a given node, compare them, and keep only the ones that satisfy a condition.
Here is the sample data:
Person Movie Publish_Date
Tina A 2016-01-01
Tina B 2016-01-01
Tina C 2016-03-05
Tina D 2016-03-06
Tina X 2018-03-09
Bob E 2016-08-01
Bob F 2016-08-08
Ana G 2016-04-05
Ana H 2016-08-05
Ana I 2016-12-05
Here is what I want:
Person Movie Publish_Date
Tina A 2016-01-01
Tina B 2016-01-01
Tina C 2016-03-05
Tina D 2016-03-06
Tina X 2018-03-09
Bob E 2016-08-01
Bob F 2016-08-08
I want to return each person who participated in more than 2 movies published within 30 days, along with the movie information.
My idea is, for each Person, to loop through the publish dates of the movie nodes connected to that person and retain the ones that satisfy the condition in the result table.
Here is my query for getting the sample data:
MATCH (p:Person)-[r1:ACTED_IN]->(m:Movie)
WITH p, m
ORDER BY p.name DESC, m.publish_date
RETURN p.name AS Person, m.title AS Movie, m.publish_date AS Publish_Date
Please suggest.
Thanks in advance!
I'm taking a few liberties in reinterpreting your requirements, so there are some assumptions here you'll have to confirm for this to be valid.
I'm assuming you are looking for, per person, at least one set of 2 movies that were published within 30 days of each other (otherwise you would not be expecting Bob's entries in your results).
I'm also assuming you meant to have movie X's publish_date to be 2016-03-09 instead of 2018-03-09 otherwise it should not be in the expected results.
With those assumptions, this query should do the trick:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, m
ORDER BY m.publish_date
WITH p, collect(m) as movies, collect(m.publish_date) as dates
UNWIND range(0, size(movies)-2) as index
WITH p, movies, dates, index
WHERE duration.inDays(dates[index], dates[index+1]).days <= 30
UNWIND [movies[index], movies[index+1]] as movieInRange
RETURN DISTINCT p, movieInRange
ORDER BY p.name DESC
We UNWIND a range from 0 to 2 less than the size of the movies list per person so that we can do indexing into the lists (we will be evaluating movie dates in pairs so we can do the comparison).
For the adjacent pairs published within 30 days of each other, we UNWIND the collection of the adjacent movies so that the movies are both under the same variable, then we return the sorted DISTINCT values (since the same movie might occur twice, being within 30 days of both the movie before and the movie after).
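The adjacent-pair comparison described above can be sketched in plain Python. This is only an illustration of the logic, not a replacement for the Cypher query; the function name movies_in_close_pairs is hypothetical, and Tina's movie X uses the assumed corrected date 2016-03-09.

```python
from datetime import date


def movies_in_close_pairs(movies, max_gap_days=30):
    """Keep movies that sit in a date-adjacent pair at most max_gap_days apart."""
    ordered = sorted(movies, key=lambda m: m[1])  # mirrors ORDER BY m.publish_date
    kept = []
    # zip(ordered, ordered[1:]) plays the role of indexing [index] and [index+1].
    for first, second in zip(ordered, ordered[1:]):
        if (second[1] - first[1]).days <= max_gap_days:
            for movie in (first, second):
                if movie not in kept:  # mirrors RETURN DISTINCT
                    kept.append(movie)
    return kept


tina = [("A", date(2016, 1, 1)), ("B", date(2016, 1, 1)), ("C", date(2016, 3, 5)),
        ("D", date(2016, 3, 6)), ("X", date(2016, 3, 9))]
bob = [("E", date(2016, 8, 1)), ("F", date(2016, 8, 8))]
ana = [("G", date(2016, 4, 5)), ("H", date(2016, 8, 5)), ("I", date(2016, 12, 5))]

for person, movies in (("Tina", tina), ("Bob", bob), ("Ana", ana)):
    print(person, [title for title, _ in movies_in_close_pairs(movies)])
```

Ana ends up with an empty list because none of her date-adjacent movies are within 30 days of each other, which matches her absence from the expected results.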
This query will tell you whether any two movies from the same person are less than 30 days apart. From there you can filter as you wish:
MATCH (m1:Movie)<-[:ACTED_IN]-(p:Person)-[:ACTED_IN]->(m2:Movie)
WITH p, m1,m2, datetime(m1.Publish_Date) as date1, datetime(m2.Publish_Date) as date2
return p.name,m1.title,m2.title,
CASE WHEN date1<date2
THEN date1+duration("P30D")>date2
ELSE date2+duration("P30D")>date1 END AS lessThan30DaysApart

Cypher query which returns the students born in a certain year

In my Neo4j database I have nodes labeled as students. Student nodes have a property date_of_birth, which is of type date (for example: date_of_birth:"1997-01-01"). I want to return all the students who were born in a given year, for example "1997".
I tried to do something like this:
match (n:Student)
with n.date_of_birth as d
where d.year="1997"
return n
But I am getting this error:
Neo.ClientError.Statement.SyntaxError: Variable n not defined ("return n"^)
Why is n not defined in this query, and how should I change the query to get the result I need?
There are two things:
1) The WITH clause only carries forward what you explicitly tell it to, so if you want n to be usable in the RETURN you need to include n in the WITH
2) The .year of a date is numeric, so you need to compare it to a number and not a string: 1997 instead of "1997"
This'll make the query:
MATCH (n:Student)
WITH n, n.date_of_birth as d
WHERE d.year = 1997
RETURN n

neo4j cypher suggestion based on common relation rating

Scenario:
graph image
John Doe has rated 2 ingredients, 2 of those ingredients happen to belong to a soup recipe, and only 1 to pizza. The query should return the soup recipe because the average of those ingredient ratings is > 5
What I have:
I started with below query:
MATCH (:Subject {ref: 1})-[ir:INGREDIENT_RATING]->(:Ingredient)<-[:HAS_INGREDIENT]-(r:Recipe)
WHERE ir.value > 5 return r;
What I would like to happen:
This returns recipes where an ingredient has a rating above 5, but this does not take into account that other ingredients of that recipe could have lower ratings given by that user.
So I have to expand on above query but I'm a bit clueless where to start.
Thanks in advance,
Update 1:
Based on @InverseFalcon's answer I came up with this, which gives me the results I expect:
MATCH (:Subject {ref: '1'})-[ir:INGREDIENT_RATING]->(i:Ingredient)-[:HAS_INGREDIENT]-(r:Recipe)-[:KITCHEN]->(k:Kitchen)
MATCH (r)-[:HAS_INGREDIENT]-(in:Ingredient)
WITH r, k, in, sum(ir.value) AS sum
WHERE sum > 10
RETURN DISTINCT r, collect(DISTINCT in) AS ingredients, k AS kitchen, sum
ORDER BY sum DESC
The second MATCH is there because without it the query only returns ingredients that have a rating, and I need all of them.
There is only one oddity: I get a duplicate result even though I use DISTINCT on r.
Sounds like you need the avg() aggregation function to take the average of multiple values. Does this work for you?
MATCH (:Subject {ref: 1})-[ir:INGREDIENT_RATING]->(:Ingredient)<-[:HAS_INGREDIENT]-(r:Recipe)
WITH r, avg(ir.value) as avg
WHERE avg > 5
RETURN r;
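The grouping that avg() performs per recipe can be sketched in plain Python. This is a minimal illustration only; the function name recipes_above_average and the rating values are hypothetical, since the original question does not state the actual ratings.

```python
from collections import defaultdict


def recipes_above_average(rated, threshold=5):
    """Group rating values by recipe and keep recipes whose average exceeds threshold."""
    by_recipe = defaultdict(list)
    for recipe, value in rated:
        by_recipe[recipe].append(value)  # one entry per rated ingredient of the recipe
    averages = {recipe: sum(values) / len(values) for recipe, values in by_recipe.items()}
    return {recipe: avg for recipe, avg in averages.items() if avg > threshold}


# Hypothetical (recipe, rating) rows, one per rated ingredient reached through the pattern.
rated = [("soup", 8), ("soup", 6), ("pizza", 4)]

print(recipes_above_average(rated))
```

Each row corresponds to one match of the pattern, so a recipe with a single highly rated ingredient and several poorly rated ones is judged on all of its rated ingredients together.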

Neo4j cypher query for X-chromosome ancestors

In genetic genealogy X-chromosome data is useful linking to certain ancestors. This is well illustrated at: X-DNA Inheritance Chart
My Neo4j database has nodes for each Person and relationships connecting them of father and mother. Each node has a property sex (for the Person's gender; M or F). A female has two X-chromosomes, one from either parent. A male has one X-chromosome, always from the mother.
You can use reduce to see the genders involved in the inheritance from ancestors:
match p=(n:Person{RN:1})-[:father|mother*..20]->m
return m.fullname as FullName
,reduce(status ='', q IN nodes(p)| status + q.sex) AS c
order by length(p), c
So, starting with a male (RN:1), the result for c is MM for his father and MF for his mother, MMM for the paternal grandfather and MFM for the maternal grandfather, etc. This pattern shows that when c contains MM (two Ms together in sequence) that these are NOT contributing to the X-chromosome of the start Person.
I want to remove any node that has the MM pattern. It's easy to do this with external code, but I cannot figure out how to do it within the cypher query.
This should work for you:
MATCH p=(n:Person { RN:1 })-[:father|mother*..20]->(m)
WITH m, NODES(p) AS a
WITH m, REDUCE(c = "", i IN RANGE(0, SIZE(a)-1)| CASE
WHEN c IS NULL OR (i > 0 AND (a[i-1]).sex = "M" AND (a[i]).sex = "M") THEN
NULL
ELSE
c + (a[i]).sex
END ) AS c
WHERE c IS NOT NULL
RETURN m.fullName AS fullName, c
ORDER BY LENGTH(c);
And here is a console that demonstrates the results.
A little late to the party, and with the same thought process as @cybersam's solution.
match p=(n:Person { RN: 1 })-[:father|mother*..20]->(m)
with p, m, [g in nodes(p) | g.sex] as genders
with p, m, genders, range(0, size(genders)-2) as gender_index
unwind gender_index as idx
with p, m, genders, collect([genders[idx], genders[idx+1]]) as pairs
where not ['M','M'] in pairs
return m.fullName
,reduce(status ='', q IN nodes(p)| status + q.sex) AS c
order by length(p), c
This query gets me only ancestors contributing an X-chromosome:
match p=(n:Person{RN:1})-[:father|mother*..20]->(m)
with m, reduce(status ='', q IN nodes(p)| status + q.sex) AS c
where c=replace(c,'MM','')
return m.RN,m.fullname as Name, c
The collection of genders adds a gender for each generation and is filtered to exclude any MM since a male cannot transmit his X to another male (e.g., son).
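The filter where c=replace(c,'MM','') keeps a path only when its gender string contains no "MM" substring. A minimal Python sketch of the same check follows; the function name x_contributors and the ancestor labels are hypothetical, and the gender strings are the ones described in the question for a male starting person.

```python
def x_contributors(paths):
    """Keep ancestors whose gender string from the start person has no 'MM' run."""
    result = []
    for name, sexes in paths:
        c = "".join(sexes)  # mirrors reduce(status = '', q IN nodes(p) | status + q.sex)
        if "MM" not in c:   # equivalent to c = replace(c, 'MM', '')
            result.append((name, c))
    return result


# Paths start at a male (RN:1); each list holds the sex of every node on the path.
paths = [
    ("father", ["M", "M"]),
    ("mother", ["M", "F"]),
    ("paternal grandfather", ["M", "M", "M"]),
    ("paternal grandmother", ["M", "M", "F"]),
    ("maternal grandfather", ["M", "F", "M"]),
    ("maternal grandmother", ["M", "F", "F"]),
]

for name, c in x_contributors(paths):
    print(name, c)
```

Only the mother and both maternal grandparents remain, since every path through the father contains two consecutive males.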

Neo4J - select records with max count

I struggle to write a query, that will return info about most played tracks for every user.
I go with something like this:
MATCH (l:Listener)-[lo:LOGS]->(s:Scrobble)-[f:FEATURES]->(t:Track)<-[p:PERFORMS]-(a:Artist)
with l,a,count(*) as numberOfScrobbles
return l.name, a.title, numberOfScrobbles
and get a list of values: User name - Artist name - Number of scrobbled tracks created by given artist.
My goal is to get the favourite artist for every user (the artist with the most scrobbles for each user). The closest I get is with this:
MATCH (l:Listener)-[lo:LOGS]->(s:Scrobble)-[f:FEATURES]->(t:Track)<-[p:PERFORMS]-(a:Artist)
with l,a,count(*) as numberOfScrobbles
return l.name, max(numberOfScrobbles)
which gives me number of tracks played by a favourite artist for given user, but how can I join proper artist's name to this result?
Any clues/tips?
One idea (maybe there's a much simpler solution):
MATCH (l:Listener)-[lo:LOGS]->(s:Scrobble)-[f:FEATURES]->(t:Track)<-[p:PERFORMS]-(a:Artist)
with l,a,count(*) as numberOfScrobbles
with l, collect(a) as artists, collect(numberOfScrobbles) as counts
with l, artists, reduce(x=[0,0], idx in range(0,size(counts)-1) | case when counts[idx] > x[1] then [idx,counts[idx]] else x end)[0] as index
return l.name, artists[index]
The reduce function is used to find the position of the largest element in the array. That index is then used to subscript the artists array.
Here is a query that should improve on @StefanAmbruster's fine answer. It uses the MAX() function to find the max numberOfScrobbles per listener; extracts all the artists that scored that max number for that listener; and then returns each listener, its collection of winning artists, and the max count.
MATCH (l:Listener)-[:LOGS]->(:Scrobble)-[:FEATURES]->(:Track)<-[:PERFORMS]-(a:Artist)
WITH l, a, count(*) as numberOfScrobbles
WITH l, collect(a) as artists, collect(numberOfScrobbles) as counts, MAX(numberOfScrobbles) AS max_nos
WITH l, max_nos, [i IN range(0, size(counts)-1) WHERE counts[i] = max_nos | artists[i]] AS winners
RETURN l.name, winners, max_nos;
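The count, max, and filter steps can be sketched in plain Python to show how ties are handled. This is an illustration under assumed data, not the answer's own code; the function name favourite_artists and the sample scrobbles are hypothetical.

```python
from collections import Counter, defaultdict


def favourite_artists(scrobbles):
    """Map each listener to (artists with the highest scrobble count, that count)."""
    counts = Counter(scrobbles)  # mirrors WITH l, a, count(*) AS numberOfScrobbles
    per_listener = defaultdict(dict)
    for (listener, artist), n in counts.items():
        per_listener[listener][artist] = n
    result = {}
    for listener, artist_counts in per_listener.items():
        top = max(artist_counts.values())  # mirrors MAX(numberOfScrobbles)
        winners = sorted(a for a, n in artist_counts.items() if n == top)
        result[listener] = (winners, top)
    return result


# Hypothetical (listener, artist) pairs, one per scrobble.
scrobbles = [
    ("ann", "Blur"), ("ann", "Blur"), ("ann", "Pulp"),
    ("bob", "Pulp"), ("bob", "Oasis"),
]

print(favourite_artists(scrobbles))
```

Unlike the index-based reduce approach, which keeps only the first artist to reach the maximum, this version returns every artist tied for the top count.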
