I am struggling to write a query that returns information about the most played tracks for every user.
I started with something like this:
MATCH (l:Listener)-[lo:LOGS]->(s:Scrobble)-[f:FEATURES]->(t:Track)<-[p:PERFORMS]-(a:Artist)
with l,a,count(*) as numberOfScrobbles
return l.name, a.title, numberOfScrobbles
This gives me a list of rows: user name, artist name, and the number of scrobbled tracks by that artist.
My goal is to get the favorite artist for every user (the artist with the most scrobbles for each user). The closest I have got is this:
MATCH (l:Listener)-[lo:LOGS]->(s:Scrobble)-[f:FEATURES]->(t:Track)<-[p:PERFORMS]-(a:Artist)
with l,a,count(*) as numberOfScrobbles
return l.name, max(numberOfScrobbles)
which gives me the number of tracks played by the favourite artist of each user, but how can I attach the matching artist's name to this result?
Any clues/tips?
One idea (maybe there's a much simpler solution):
MATCH (l:Listener)-[lo:LOGS]->(s:Scrobble)-[f:FEATURES]->(t:Track)<-[p:PERFORMS]-(a:Artist)
with l,a,count(*) as numberOfScrobbles
with l, collect(a) as artists, collect(numberOfScrobbles) as counts
with l, artists, reduce(x=[0,0], idx in range(0,size(counts)-1) | case when counts[idx] > x[1] then [idx,counts[idx]] else x end)[0] as index
return l.name, artists[index]
The reduce function is used to find the position of the largest element in the array. That index is then used to subscript the artists array.
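To make the reduce step concrete, here is a small Python model of the same fold (plain Python, not Cypher). The accumulator is an [index, best count] pair, like x in the query above; the artist names and counts are made up for illustration.

```python
from functools import reduce

def index_of_max(counts):
    """Return the position of the largest value, mirroring the Cypher reduce."""
    def step(x, idx):
        # x is [best_index, best_count]; replace it only on a strictly larger count
        return [idx, counts[idx]] if counts[idx] > x[1] else x
    return reduce(step, range(len(counts)), [0, 0])[0]

# Hypothetical collected lists for a single listener
artists = ["Artist A", "Artist B", "Artist C"]
counts = [3, 7, 7]
favourite = artists[index_of_max(counts)]
print(favourite)
```

Because the comparison is strict, a tie keeps the earliest artist in the list, so this approach returns exactly one favourite per listener.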
Here is a query that should improve on @StefanAmbruster's fine answer. It uses the MAX() function to find the maximum numberOfScrobbles per listener, keeps all the artists that reached that maximum for that listener, and then returns each listener, its collection of winning artists, and the max count.
MATCH (l:Listener)-[:LOGS]->(:Scrobble)-[:FEATURES]->(:Track)<-[:PERFORMS]-(a:Artist)
WITH l, a, count(*) as numberOfScrobbles
WITH l, collect(a) as artists, collect(numberOfScrobbles) as counts, MAX(numberOfScrobbles) AS max_nos
WITH l, max_nos, [i IN range(0, size(counts)-1) WHERE counts[i] = max_nos | artists[i]] AS top_artists
RETURN l.name, top_artists, max_nos;
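The per-listener logic (find the maximum, then keep every artist that reaches it) can be sketched in plain Python as follows. This is only a model of the idea on made-up rows, not Cypher; the function name and the sample data are assumptions for illustration.

```python
from collections import defaultdict

def top_artists(rows):
    """rows: iterable of (listener, artist, scrobbles).

    Returns {listener: (winning_artists, max_scrobbles)}.
    """
    per_listener = defaultdict(list)
    for listener, artist, n in rows:
        per_listener[listener].append((artist, n))

    result = {}
    for listener, pairs in per_listener.items():
        max_nos = max(n for _, n in pairs)  # MAX(numberOfScrobbles)
        # keep every artist tied for the maximum, not just the first one
        winners = [artist for artist, n in pairs if n == max_nos]
        result[listener] = (winners, max_nos)
    return result

# Hypothetical aggregated rows: (listener, artist, numberOfScrobbles)
rows = [("ann", "X", 4), ("ann", "Y", 4), ("ann", "Z", 1), ("bob", "X", 2)]
print(top_artists(rows))
```

Unlike the index-based approach, ties are preserved: a listener with two equally played artists gets both back.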
I would like to find all persons who participated in all specified movies, for example in 2 movies: "The Terminator", "True Lies"
I have the following query:
MATCH (t:Title)-[:ACTS_IN]-(p:Principal)-[:ACTS_BY]->(n:Name)
WHERE t.originalTitle IN ["The Terminator", "True Lies"]
WITH n, collect(n) as names
WHERE SIZE(names) >= 2
RETURN n.primaryName
which works fine if every person participated (:ACTS_BY relationship) only once in every movie. But according to my database schema design, every person can have 0-N :ACTS_BY relationships between Principal and Name nodes (for example, the same person can be both producer and actor of the movie).
The issue is that this Cypher query will also return the person (Name node) when that person participated 2+ times in one movie and 0 times in the other, but I only need the Name node returned when the person participated in each movie.
Please help to improve the query in order to achieve it.
To fix this, you'll want to get distinct values of t, p, n to weed out the duplicates, and only then do a count:
MATCH (t:Title)-[:ACTS_IN]-(p:Principal)-[:ACTS_BY]->(n:Name)
WHERE t.originalTitle IN ["The Terminator", "True Lies"]
WITH DISTINCT t, p, n
WITH n, count(n) as occurrences
WHERE occurrences >= 2
RETURN n.primaryName
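The effect of deduplicating before counting can be seen in a small Python model (not Cypher). The rows stand for (title, principal, name) matches; the principal ids and person names are hypothetical, and the duplicate row represents one person linked twice to the same principal.

```python
from collections import Counter

def names_in_all(rows, wanted):
    """rows: list of (title, principal, name) matches, possibly with duplicates."""
    # WHERE t.originalTitle IN wanted
    filtered = [row for row in rows if row[0] in wanted]
    # WITH DISTINCT t, p, n: identical rows collapse into one
    unique_rows = set(filtered)
    # WITH n, count(n) AS occurrences
    occurrences = Counter(name for _, _, name in unique_rows)
    return sorted(name for name, c in occurrences.items() if c >= len(wanted))

wanted = ["The Terminator", "True Lies"]
rows = [
    ("The Terminator", "p1", "Person A"),
    ("True Lies", "p2", "Person A"),
    ("The Terminator", "p3", "Person B"),
    ("The Terminator", "p3", "Person B"),  # same match produced twice
]
print(names_in_all(rows, wanted))
```

Without the set() step, Person B would reach a count of 2 from a single movie and be returned by mistake.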
Hi there, I am using Neo4j and I am having some trouble. I have one query where I want to return the node property (cuisine) with the highest percentage, like so:
// 1. Find the most_popular_cuisine
MATCH (n:restaurants)
WITH COUNT(n.cuisine) as total
MATCH (r:restaurants)
RETURN r.cuisine , 100 * count(*)/total as percentage
order by percentage desc
limit 1
I am trying to extend this further by taking the top result and matching on it to get only the nodes with that property, like so:
WITH COUNT(n.cuisine) as total
MATCH (r:restaurants)
WITH r.cuisine as cuisine , count(*) as cnt
MATCH (t:restaurants)
WHERE t.cuisine = cuisine AND count(*) = MAX(cnt)
RETURN t
I think you might be better off refactoring your model a little so that :Cuisine is a label and each cuisine has its own node.
(:Restaurant)-[:OFFERS]->(:Cuisine)
or
(:Restaurant)-[:SPECIALIZES_IN]->(:Cuisine)
Then your query can look like this
MATCH (cuisine:Cuisine)
RETURN cuisine, size((cuisine)<-[:OFFERS]-()) AS number_of_restaurants
ORDER BY number_of_restaurants DESC
I wasn't able to get WITH r.cuisine as cuisine, count(*) as cnt to work in a WITH clause rather than a RETURN clause, so I had to resort to a slightly more long-winded approach.
There might be a more optimized way to do this, but this works too:
// Get all unique cuisines in a list
MATCH (n:Restaurants)
WITH COUNT(n.Cuisine) as total, COLLECT(DISTINCT(n.Cuisine)) as cuisineList
// Go through each cuisine and find the number of restaurants associated with each
UNWIND cuisineList as c
MATCH (r:Restaurants{Cuisine:c})
WITH total, r.Cuisine as c, count(r) as cnt
ORDER BY cnt DESC
WITH COLLECT({Cuisine: c, Count:cnt}) as list
// For the most popular cuisine, find all the restaurants offering it
MATCH (t:Restaurants{Cuisine:list[0].Cuisine})
RETURN t
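The two stages of this approach (count restaurants per cuisine, then fetch the restaurants of the winner) look like this in a plain Python model. It is a sketch on made-up data, not Cypher; the dictionary keys and restaurant names are assumptions.

```python
from collections import Counter

def restaurants_with_top_cuisine(restaurants):
    """restaurants: list of dicts with 'name' and 'cuisine' keys."""
    # count restaurants per cuisine, like count(r) grouped by r.Cuisine
    counts = Counter(r["cuisine"] for r in restaurants)
    # ORDER BY cnt DESC and take the first entry, like list[0].Cuisine
    top_cuisine, _ = counts.most_common(1)[0]
    # final MATCH: every restaurant offering the most popular cuisine
    return [r for r in restaurants if r["cuisine"] == top_cuisine]

# Hypothetical data
restaurants = [
    {"name": "R1", "cuisine": "thai"},
    {"name": "R2", "cuisine": "italian"},
    {"name": "R3", "cuisine": "thai"},
]
print([r["name"] for r in restaurants_with_top_cuisine(restaurants)])
```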
Scenario:
graph image
John Doe has rated 2 ingredients. Both of those ingredients happen to belong to a soup recipe, and only 1 belongs to pizza. The query should return the soup recipe because the average of those ingredient ratings is > 5.
What I have:
I started with below query:
MATCH (:Subject {ref: 1})-[ir:INGREDIENT_RATING]->(:Ingredient)<-[:HAS_INGREDIENT]-(r:Recipe)
WHERE ir.value > 5
RETURN r;
What I would like to happen:
This returns recipes where an ingredient has a rating above 5, but this does not take into account that other ingredients of that recipe could have lower ratings given by that user.
So I have to expand on the above query, but I'm a bit clueless about where to start.
Thanks in advance,
Update 1:
Based on @InverseFalcon's answer I came up with this, which gives me the results I expect:
MATCH (:Subject {ref: '1'})-[ir:INGREDIENT_RATING]->(i:Ingredient)-[:HAS_INGREDIENT]-(r:Recipe)-[:KITCHEN]->(k:Kitchen)
MATCH (r)-[:HAS_INGREDIENT]-(in:Ingredient)
WITH r, k, in, sum(ir.value) AS sum
WHERE sum > 10
RETURN DISTINCT r, collect(DISTINCT in) AS ingredients, k AS kitchen, sum
ORDER BY sum DESC
The second match is there because without it the query only returns ingredients that have a rating, and I need all of them.
There is only one oddity: I get a duplicate result even though I use DISTINCT on r.
Sounds like you need the avg() aggregation function to take the average of multiple values. Does this work for you?
MATCH (:Subject {ref: 1})-[ir:INGREDIENT_RATING]->(:Ingredient)<-[:HAS_INGREDIENT]-(r:Recipe)
WITH r, avg(ir.value) as avg
WHERE avg > 5
RETURN r;
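The difference between filtering on each rating and filtering on the average per recipe can be seen in a short Python model (not Cypher). The rating values below are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def recipes_above(ratings, threshold=5):
    """ratings: list of (recipe, rating_value), one row per rated ingredient."""
    by_recipe = defaultdict(list)
    for recipe, value in ratings:
        by_recipe[recipe].append(value)
    # WITH r, avg(ir.value) AS avg WHERE avg > threshold
    averages = {recipe: mean(values) for recipe, values in by_recipe.items()}
    return {recipe: avg for recipe, avg in averages.items() if avg > threshold}

# Hypothetical ratings by one user
ratings = [("soup", 8), ("soup", 4), ("pizza", 4)]
print(recipes_above(ratings))
```

Here soup qualifies on its average of 6 even though one of its ingredients is rated below the threshold, while pizza is excluded.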
I'm working on an application using Neo4J and I'm having problems with the sorting in some of the queries. I have a list of stores that have promotions so I have a node for each store and a node for each promotion. Each store can have multiple promotions so it's a one to many relationship. Some of the promotions are featured (featured property = true) so I want those to appear first. I'm trying to construct a query that does the following:
Returns a list of stores with the promotions as a collection (returning it like this is ideal for paging)
Sorts the stores so the ones with most featured promotions appear first
Sorts the collection so that the promotions that are featured appear first
So far I have the following:
MATCH (p:Promotion)-[r:BELONGS_TO_STORE]->(s:Store)
WITH p, s, collect(p.featured) as featuredCol
WITH p, s, LENGTH(FILTER(i IN featuredCol where i='true')) as featuredCount
ORDER BY p.featured DESC, featuredCount DESC
RETURN s, collect(p) skip 0 limit 10
First, I try to build a collection from the featured property in a WITH clause. Then, in a second WITH clause, I try to filter that collection down to the entries where featured is true and take its length. This sorts the promotions within each collection correctly, but not the stores. If I add another sort at the end, as in the second query below, I get an error because the featuredCount variable is not in the RETURN clause. I don't want featuredCount in the RETURN clause because it throws my pagination off.
Here is my second query:
MATCH (p:Promotion)-[r:BELONGS_TO_STORE]->(s:Store)
WITH p, s, collect(p.featured) as featuredCol
WITH p, s, LENGTH(FILTER(i IN featuredCol where i='true')) as featuredCount
ORDER BY p.featured DESC, featuredCount DESC
RETURN s, collect(p)
ORDER BY featuredCount skip 0 limit 10
I'm very new to Neo4J so any help will be greatly appreciated.
Does this query (see this console) work for you?
MATCH (p:Promotion)-[r:BELONGS_TO_STORE]->(s:Store)
WITH p, s
ORDER BY p.featured DESC
WITH s, COLLECT(p) AS pColl
WITH s, pColl, REDUCE(n = 0, p IN pColl | CASE
WHEN p.featured
THEN n + 1
ELSE n END ) AS featuredCount
ORDER BY featuredCount DESC
RETURN s, pColl
LIMIT 10
This query performs the following steps:
It orders the matched rows so that the rows with featured promotions are first.
It aggregates all the p nodes for each distinct s into a pColl collection. The featured promotions still appear first within each pColl.
It calculates the number of featured promotions in each pColl, and orders the stores so that the ones with the most featured promotions appear first.
It then returns the results.
Note: This query assumes that featured has a boolean value, not a string. (FYI: ORDER BY considers true to be greater than false). If this assumption is not correct, you can change the WHEN clause to WHEN p.featured = 'true'.
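The steps listed above can be modelled in plain Python (not Cypher) to check the resulting order. The store and promotion names are hypothetical, and featured is assumed to be a boolean, as in the note.

```python
def stores_page(rows, limit=10):
    """rows: list of (store, promotion, featured) with featured as a bool."""
    # Step 1: featured rows first; sorted() is stable, so ties keep their order
    ordered = sorted(rows, key=lambda row: row[2], reverse=True)

    # Step 2: collect promotions per store, preserving the row order
    collections = {}
    for store, promo, featured in ordered:
        collections.setdefault(store, []).append((promo, featured))

    # Step 3: count featured promotions and put the stores with the most first
    def featured_count(item):
        return sum(1 for _, featured in item[1] if featured)

    ranked = sorted(collections.items(), key=featured_count, reverse=True)

    # Step 4: return one page of results
    return ranked[:limit]

# Hypothetical data
rows = [
    ("s1", "a", False), ("s1", "b", True),
    ("s2", "c", True), ("s2", "d", True), ("s2", "e", False),
]
for store, promos in stores_page(rows):
    print(store, promos)
```

The count is used only for sorting and is not part of the returned rows, which is what keeps the pagination intact.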
I'm working on a simple demo in Neo4j where I want to make recommendations based on orders and who has bought what. I've created a graph here: http://console.neo4j.org/?id=jvqr95.
Basically I have many relations like:
(o:Order)-[:INCLUDES]->(p:Product)
An order can have multiple products.
Given a specific product id, I would like to find the other products that were in an order containing that product, and I would like to order them by the number of orders each product is in.
I've tried the following:
MATCH (p:Product)--(o)-[:INCLUDES]->(p2:Product)--(o2)
WHERE p.name = "chocolate"
RETURN p2, COUNT(DISTINCT o2)
but that doesn't give me the result I want. For that query I expected to get chips back with a count of 2, but I only get a count of 1.
And for the following query:
MATCH (p:Product)--(o)-[:INCLUDES]->(p2:Product)--(o2)
WHERE p.name = "chips"
RETURN p2, COUNT(DISTINCT o2)
I expect to get chocolate and ball back where each has a count of 1, but I don't get anything back. What have I missed?
You're matching on too many things in your initial MATCH.
MATCH (o:Order)-[:INCLUDES]->(p:Product { name:'ball' })
MATCH (o)-[:INCLUDES]->(p2:Product)
WHERE p2 <> p
MATCH (o2:Order)-[:INCLUDES]->(p2)
RETURN p2.name AS Product, COUNT(o2) AS Count
ORDER BY Count DESC
In English: "Match on orders that include a specific product. For these orders, get all included products that are not the initial product. For these products, match on orders that they are included in. Return the product name and the count of how many orders it's been included in."
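The three matches described above can be traced in a small Python model (not Cypher), where each nested loop stands for one MATCH. The order ids and their contents are made up and are not the data from the linked consoles.

```python
from collections import Counter

def co_purchased(orders, product):
    """orders: dict mapping an order id to the set of product names it includes."""
    counts = Counter()
    for items in orders.values():           # orders that include the given product
        if product not in items:
            continue
        for other in items:                 # other products in the same order
            if other == product:
                continue
            for items2 in orders.values():  # every order that includes that other product
                if other in items2:
                    counts[other] += 1      # one result row per (o, p2, o2) combination
    return counts.most_common()             # ORDER BY Count DESC

# Hypothetical orders
orders = {"o1": {"chocolate", "chips"}, "o2": {"chips", "ball"}}
print(co_purchased(orders, "ball"))
```

Note that the count is a count of result rows: if the given product and another product share several orders, each shared order contributes its own set of rows, so COUNT(DISTINCT o2) may be preferable when that matters.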
http://console.neo4j.org/?id=q49sx6
http://console.neo4j.org/?id=uy3t9e