neo4j cypher suggestion based on common relation rating

Scenario:
[graph image]
John Doe has rated 2 ingredients. Both of them belong to a soup recipe, and only 1 belongs to a pizza recipe. The query should return the soup recipe because the average of those ingredient ratings is above 5.
What I have:
I started with the query below:
MATCH (:Subject {ref: 1})-[ir:INGREDIENT_RATING]->(:Ingredient)<-[:HAS_INGREDIENT]-(r:Recipe)
WHERE ir.value > 5
RETURN r;
What I would like to happen:
This returns recipes where an ingredient has a rating above 5, but it does not take into account that other ingredients of that recipe could have lower ratings from the same user.
So I have to expand on the above query, but I'm a bit clueless about where to start.
Thanks in advance,
Update 1:
Based on @InverseFalcon's answer I came up with this, which gives me the results I expect:
MATCH (:Subject {ref: '1'})-[ir:INGREDIENT_RATING]->(i:Ingredient)-[:HAS_INGREDIENT]-(r:Recipe)-[:KITCHEN]->(k:Kitchen)
MATCH (r)-[:HAS_INGREDIENT]-(in:Ingredient)
WITH r, k, in, sum(ir.value) AS sum
WHERE sum > 10
RETURN DISTINCT r, collect(DISTINCT in) AS ingredients, k AS kitchen, sum
ORDER BY sum DESC
The second MATCH is there because without it the query only returns ingredients that have a rating, and I need all of them.
There is only one oddity: I get a duplicate result even though I use DISTINCT on r.

Sounds like you need the avg() aggregation function to take the average of multiple values. Does this work for you?
MATCH (:Subject {ref: 1})-[ir:INGREDIENT_RATING]->(:Ingredient)<-[:HAS_INGREDIENT]-(r:Recipe)
WITH r, avg(ir.value) as avg
WHERE avg > 5
RETURN r;
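If the full ingredient list and the kitchen are also needed, as in Update 1, one possible sketch is to aggregate the ratings first and gather everything else afterwards, so that each recipe stays on a single row. It assumes HAS_INGREDIENT points from Recipe to Ingredient as in the query above, and the names avgRating, ingredients and kitchens are only illustrative. Unlike Update 1, it uses OPTIONAL MATCH, so recipes without a kitchen are kept.

```cypher
// Aggregate the user's ratings per recipe before touching anything else.
MATCH (:Subject {ref: 1})-[ir:INGREDIENT_RATING]->(:Ingredient)<-[:HAS_INGREDIENT]-(r:Recipe)
WITH r, avg(ir.value) AS avgRating
WHERE avgRating > 5
// Collect kitchens into a list so several kitchens do not duplicate the recipe row.
OPTIONAL MATCH (r)-[:KITCHEN]->(k:Kitchen)
WITH r, avgRating, collect(DISTINCT k) AS kitchens
// The pattern comprehension lists all ingredients, rated or not.
RETURN r,
       [(r)-[:HAS_INGREDIENT]->(i:Ingredient) | i] AS ingredients,
       kitchens,
       avgRating
ORDER BY avgRating DESC;
```

Because the rating aggregation happens before the extra matches, the additional rows they produce cannot inflate the average or a sum.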

Related

Neo4j count based on single relationship to node

I would like to find all persons who participated in all specified movies, for example in 2 movies: "The Terminator", "True Lies"
I have the following query:
MATCH (t:Title)-[:ACTS_IN]-(p:Principal)-[:ACTS_BY]->(n:Name)
WHERE t.originalTitle IN ["The Terminator", "True Lies"]
WITH n, collect(n) as names
WHERE SIZE(names) >= 2
RETURN n.primaryName
which works fine if every person participated (:ACTS_BY relationship) only once in every movie. But according to my database schema design, every person can have 0-N :ACTS_BY relationships between Principal and Name nodes (for example, the same person can be producer and actor of the movie at the same time).
The issue is that the query above will also return the person (Name node) if that person participated 2+ times in one movie and 0 times in the other, but I only need the Name node returned when the person participated in each movie.
Please help me improve the query to achieve this.

To fix this, you'll want to get distinct values of t, p, n to weed out the duplicates, and only then do a count:
MATCH (t:Title)-[:ACTS_IN]-(p:Principal)-[:ACTS_BY]->(n:Name)
WHERE t.originalTitle IN ["The Terminator", "True Lies"]
WITH DISTINCT t, p, n
WITH n, count(n) as occurrences
WHERE occurrences >= 2
RETURN n.primaryName
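A hedged alternative sketch: instead of deduplicating rows first, count the distinct Title nodes per person and compare that to the number of requested titles. It assumes each requested title matches exactly one Title node (two nodes sharing an originalTitle would be counted separately); the alias movies is illustrative.

```cypher
MATCH (t:Title)-[:ACTS_IN]-(:Principal)-[:ACTS_BY]->(n:Name)
WHERE t.originalTitle IN ["The Terminator", "True Lies"]
// Several roles in the same movie still count as one distinct title.
WITH n, count(DISTINCT t) AS movies
WHERE movies = 2
RETURN n.primaryName
```

This form should also hold up when a person is linked to one movie through more than one Principal node.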

Nodes with relationship to multiple nodes

I want to get the Persons that know everyone in a group of persons which know some specific places.
This:
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
WITH collect(DISTINCT b) as persons
Match (a:Person)
WHERE ALL(b in persons WHERE (a)-[:knows]->(b))
RETURN a
works, but the second part does a full node label scan before applying the WHERE clause, which is extremely slow: in a bigger database it takes 8 to 9 seconds. I also tried this:
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
Match (a:Person)-[:knows]->(b)
RETURN a
This only needs 2 ms; however, it returns all persons that know any person of group b, instead of those that know everyone.
So my question is: is there an efficient query to get what I want?

We have a knowledge base article for this kind of query that shows a few approaches.
One of these is to match the :Person nodes that know members of the group, and then count the number of times each of those persons shows up in the results. Provided there aren't multiple :knows relationships between the same two people, if the count is equal to the size of the collection of people from your first match, then that person must know all of the people in the collection.
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
WITH collect(DISTINCT b) as persons
UNWIND persons as b // so we have the entire list of persons along with each person
WITH size(persons) as total, b
MATCH (a:Person)-[:knows]->(b)
WITH total, a, count(a) as knownCount
WHERE total = knownCount
RETURN a
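The caveat about multiple :knows relationships between the same two people can be sidestepped by counting distinct persons rather than rows. This is a minimal sketch of that variation, not a tuned query; it keeps the variable names used above.

```cypher
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
// DISTINCT keeps each group member once, however many paths matched.
WITH collect(DISTINCT b) AS persons
UNWIND persons AS b
WITH size(persons) AS total, b
MATCH (a:Person)-[:knows]->(b)
// Count distinct group members known by a, ignoring duplicate relationships.
WITH total, a, count(DISTINCT b) AS knownCount
WHERE total = knownCount
RETURN a
```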

Here is a simpler Cypher query that also compares counts, the same basic idea used by @InverseFalcon.
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'}), (a:Person)-[:knows]->(b)
WITH COLLECT({a:a, b:b}) as data, COUNT(DISTINCT b) AS total
UNWIND data AS d
WITH total, d.a AS a, COUNT(d.b) AS bCount
WHERE total = bCount
RETURN a

compare sum and sum of different conditions - post union processing?

Hi, I have a relationship Artist - Collaborated -> Writer and would like to find the artists who mainly write their own songs. That is, the weight of the edge between a writer and an artist with the same name should be bigger than the sum of all other weights.
I managed to do this:
MATCH (n:Artist)-[r:Collaborated]-(m:Writer)
WITH n, m, sum(r.weight) as wrote
WHERE n.name = toLower(m.name)
RETURN n.name as Node, wrote ORDER BY wrote descending;
but I am not sure how to incorporate the second condition. Do I have to use post-union processing? Any help, please?
To join the two WHERE conditions, I tried something like this to compare the first sum to the second sum, but it doesn't work:
MATCH (o:Artist)-[q:Collaborated]-(p:Writer)
WITH o, p, sum(q.weight) as wrote1
WHERE o.name <> toLower(p.name)
MATCH (n:Artist)-[r:Collaborated]-(m:Writer)
WITH n, m, sum(r.weight) as wrote2
WHERE n.name = toLower(m.name) and wrote2>wrote1
RETURN n.name as Node, wrote2;
This is an example of what my graph looks like:
[graph image]
I would like to know whether the weight between eminem and eminem is bigger than all the other weights combined.

Firstly, your model is a little unusual: you have two Eminem nodes, one with the label Artist and another with the label Writer.
From my point of view, you should have only one Eminem node with both labels.
To answer your question, I think this query can help you:
MATCH (o:Artist)-[r:Collaborated]->(p:Writer)
WITH o, CASE WHEN o.name = toLower(p.name) THEN r.weight ELSE -1*r.weight END AS score
RETURN o, sum(score) AS score
If the score is greater than 0, then you know that the weight between eminem and eminem is bigger than the sum of all the other weights.
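To return only the artists who mainly write their own songs, the score can be filtered directly. This is a sketch along the same lines; it compares names with toLower as in the question, and the aliases artist and score are illustrative.

```cypher
MATCH (o:Artist)-[r:Collaborated]->(p:Writer)
// Own songs add to the score, collaborations with other writers subtract from it.
WITH o, sum(CASE WHEN o.name = toLower(p.name) THEN r.weight ELSE -r.weight END) AS score
WHERE score > 0
RETURN o.name AS artist, score
ORDER BY score DESC
```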

Neo4j: multiple counts from multiple matches

Given a neo4j schema similar to
(:Person)-[:OWNS]-(:Book)-[:CATEGORIZED_AS]-(:Category)
I'm trying to write a query to get the count of books owned by each person as well as the count of books in each category so that I can calculate the percentage of books in each category for each person.
I've tried queries along the lines of
match (p:Person)-[:OWNS]-(b:Book)-[:CATEGORIZED_AS]-(c:Category)
where p.name in []
with p, b, c
match (p)-[:OWNS]-(b2:Book)-[:CATEGORIZED_AS]-(c2:Category)
with p, b, c, b2
return p.name, b.name, c.name,
count(distinct b) as count_books_in_category,
count(distinct b2) as count_books_total
But the query plan is absolutely horrible when trying to do the second match. I've tried to figure out different ways to write the query so that I can do the two different counts, but haven't figured out anything other than doing two matches. My schema isn't really about people and books. The :CATEGORIZED_AS relationship in my example is actually a few different relationship options, specified as [:option1|option2|option3]. So in my 2nd match I repeat the relationship options so that my total count is constrained by them.
Ideas? This feels similar to "Neo4j - apply match to each result of previous match", but there didn't seem to be a good answer for that one.

UNWIND is your friend here. First, calculate the total number of books per person, collecting them as you go.
Then unwind them so you can match the categories they belong to.
Aggregate by category and person, and you should get the number of books in each category for each person:
match (p:Person)-[:OWNS]->(b:Book)
with p,collect(b) as books, count(b) as total
with p,total,books
unwind books as book
match (book)-[:CATEGORIZED_AS]->(c)
return p,c, count(book) as subtotal, total
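Since the stated goal is a percentage per category, the same shape can compute it in the final projection. This is a hedged sketch: the name properties come from the question, while the aliases are illustrative. Note that a book filed under several categories is counted in each of them, so the percentages for one person can add up to more than 100.

```cypher
MATCH (p:Person)-[:OWNS]->(b:Book)
WITH p, collect(b) AS books, count(b) AS total
UNWIND books AS book
MATCH (book)-[:CATEGORIZED_AS]->(c:Category)
WITH p, c, count(DISTINCT book) AS subtotal, total
// Multiply by 100.0 so the division is done in floating point.
RETURN p.name AS person, c.name AS category, subtotal, total,
       100.0 * subtotal / total AS percentage
ORDER BY person, percentage DESC
```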

Traversing through all nodes and comparing each one with every other one

I am working on a little project and I have a dataset of about 60k nodes and 500k relationships between those nodes. The nodes are of two types: the first type is recipes and the second type is ingredients. Recipes are composed of ingredients like this:
(ingredient)-[:IS_PART_OF]->(recipe)
My objective is to find how many common ingredients two recipes share. I have managed to obtain this information with the following query that compares one recipe to all others (the first one with all others):
MATCH (recipe:RECIPE{ ID: 1000000 }),(other)
WHERE (other.ID >= 1000001 AND other.ID <= 1057690)
OPTIONAL MATCH (recipe:RECIPE)<-[:IS_PART_OF]-(ingredient:INGREDIENT)-[:IS_PART_OF]->(other)
WITH ingredient, other
RETURN other.ID, count(distinct ingredient.name)
ORDER BY other.ID DESC
My first question: how can I obtain the number of all ingredients of two recipes in a way that the mutual ones are counted only once (union of R1 and R2 --> R1 U R2)?
My second question: is it possible to write a loop that would iterate through all the recipes and check for common ingredients? The objective is to compare each recipe with all others. I think this should return (n-1)*(n/2) rows.
I have tried the above and the problem remains. Even with LIMIT and SKIP I cannot run the code on the whole set. I have changed my query so that it allows me to partition my set accordingly:
MATCH (recipe1)<-[:IS_PART_OF]-(ingredient:INGREDIENT)-[:IS_PART_OF]->(recipe2)
WHERE (recipe2.ID >= 1000000 AND recipe2.ID <= 1000009) AND (recipe1.ID >= 1000000 AND recipe1.ID <= 1000009) AND (recipe1.ID < recipe2.ID)
RETURN recipe1.ID, count(distinct ingredient.name) AS MutualIngredients, recipe2.ID
ORDER BY recipe1.ID
Until I get my hands on a better machine this will suffice.
I still haven't solved my first question: how can I obtain the number of all ingredients of two recipes in a way that the mutual ones are counted only once (union of R1 and R2 --> R1 U R2)?

You'll need to play with this, but it's going to be something similar to this:
MATCH (recipe1:RECIPE)<-[:IS_PART_OF]-(ingred:INGREDIENT)-[:IS_PART_OF]->(recipe2:RECIPE)
WHERE ID(recipe1) < ID(recipe2)
RETURN recipe1, collect(ingred.name), recipe2
ORDER BY recipe1.ID
The MATCH pattern gets you all of the common ingredients between two recipes. The WHERE clause ensures that you're not comparing a recipe to itself (because it would share all ingredients with itself). The RETURN clause just gives you the two recipes you're comparing, and what they have in common.
This will be O(n^2) though, and will be very slow.
UPDATE: took Nicole's suggestion, which is a good one. That should guarantee each pair is only considered once.

SOLVED: sharing this in case someone else needs it:
MATCH (recipe1)<-[:IS_PART_OF]-(ingredient:INGREDIENT)-[:IS_PART_OF]->(recipe2)
MATCH (recipe1)<-[:IS_PART_OF]-(ingredient1:INGREDIENT)
MATCH (recipe2)<-[:IS_PART_OF]-(ingredient2:INGREDIENT)
WHERE (recipe2.ID >= 1000000 AND recipe2.ID <= 1000009) AND (recipe1.ID >= 1000000 AND recipe1.ID <= 1000009) AND (recipe1.ID < recipe2.ID)
RETURN recipe1.ID, count(distinct ingredient1.name) + count(distinct ingredient2.name) - count(distinct ingredient.name) AS RecipesUnion, recipe2.ID
ORDER BY recipe1.ID
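The three MATCH clauses above multiply rows (mutual x ingredients of recipe1 x ingredients of recipe2) before the distinct counts collapse them again. A possible sketch that avoids this counts the mutual ingredients first and then measures each recipe's ingredient list separately. It counts nodes rather than names, so it assumes ingredient names are unique per node and that there is at most one IS_PART_OF relationship per ingredient and recipe; the alias mutual is illustrative.

```cypher
MATCH (recipe1)<-[:IS_PART_OF]-(ingredient:INGREDIENT)-[:IS_PART_OF]->(recipe2)
WHERE recipe1.ID >= 1000000 AND recipe1.ID <= 1000009
  AND recipe2.ID >= 1000000 AND recipe2.ID <= 1000009
  AND recipe1.ID < recipe2.ID
WITH recipe1, recipe2, count(DISTINCT ingredient) AS mutual
// |R1 U R2| = |R1| + |R2| - |R1 n R2|
RETURN recipe1.ID,
       size([(recipe1)<-[:IS_PART_OF]-(i:INGREDIENT) | i]) +
       size([(recipe2)<-[:IS_PART_OF]-(i:INGREDIENT) | i]) - mutual AS RecipesUnion,
       recipe2.ID
ORDER BY recipe1.ID
```

As with the query above, only pairs that share at least one ingredient are returned.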
