I'm working on a simple demo in Neo4j where I want to make recommendations based on orders and who has bought what. I've created a graph here: http://console.neo4j.org/?id=jvqr95.
Basically I have many relations like:
(o:Order)-[:INCLUDES]->(p:Product)
An order can have multiple products.
Given a specific product id, I would like to find the other products that were in an order containing that product, sorted by the number of orders each of those products appears in.
I've tried the following:
MATCH (p:Product)--(o)-[:INCLUDES]->(p2:Product)--(o2)
WHERE p.name = "chocolate"
RETURN p2, COUNT(DISTINCT o2)
but that doesn't give me the result I want. For that query I expected to get chips back with a count of 2, but I only get a count of 1.
And for the following query:
MATCH (p:Product)--(o)-[:INCLUDES]->(p2:Product)--(o2)
WHERE p.name = "chips"
RETURN p2, COUNT(DISTINCT o2)
I expect to get chocolate and ball back where each has a count of 1, but I don't get anything back. What have I missed?
You're matching on too many things in your initial MATCH.
MATCH (o:Order)-[:INCLUDES]->(p:Product { name:'ball' })
MATCH (o)-[:INCLUDES]->(p2:Product)
WHERE p2 <> p
MATCH (o2:Order)-[:INCLUDES]->(p2)
RETURN p2.name AS Product, COUNT(o2) AS Count
ORDER BY Count DESC
In English: "Match on orders that include a specific product. For these orders, get all included products that are not the initial product. For these products, match on orders that they are included in. Return the product name and the count of how many orders it's been included in."
http://console.neo4j.org/?id=q49sx6
http://console.neo4j.org/?id=uy3t9e
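The three steps in the English description can also be traced outside the database. The following is a minimal Python sketch, not the answer's method: the ORDERS data and the co_purchased helper are hypothetical, chosen only to be consistent with the counts the question expects. It counts each order once per product, which corresponds to COUNT(DISTINCT o2); the Cypher above uses COUNT(o2), which could count an order more than once if the given product shares several orders with the same other product.

```python
from collections import Counter

# Hypothetical data: order id -> set of product names it INCLUDES.
ORDERS = {
    "order1": {"chocolate", "chips"},
    "order2": {"chips", "ball"},
}


def co_purchased(orders, product):
    """Return (other product, number of orders containing it), most frequent first."""
    # Step 1: orders that include the given product.
    matching = [items for items in orders.values() if product in items]
    # Step 2: all products in those orders except the given one.
    others = set().union(*matching) - {product}
    # Step 3: for each of those products, count every order that includes it.
    counts = Counter()
    for items in orders.values():
        for name in items & others:
            counts[name] += 1
    # Sort by count descending, then by name so ties come out in a stable order.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    print(co_purchased(ORDERS, "chocolate"))
    print(co_purchased(ORDERS, "chips"))
```

With this sample data, "chocolate" leads to chips with a count of 2, and "chips" leads to ball and chocolate with a count of 1 each, which is what the question expected.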
I have a requirement where I want to include multiple destination nodes in my path.
My requirement:
I want to find all the suppliers and the RISK associated with the Suppliers for a particular product.
Query to return all suppliers for a Product.
match path =(p:Product{name:"Product2"}) <-[*..10] -(:Supplier)
return path
This query returns all the suppliers for a particular product.
Query to return suppliers affected by RISK for a product
match path =(p:Product{name:"Product2"}) <-[*..10] -(:Supplier)-[:AFFECTEDBY]-(:RISK)
return path
As you can see, the 2 suppliers (name:"SupplierN+1") were not retrieved in the above graph.
Can you please help me with a query that retrieves BOTH the suppliers (3 suppliers) and the RISK associated with them for a particular product?
Thanks
MATCH (p:Product {name: "Product2"})<-[*..10]-(s:Supplier)
OPTIONAL MATCH (s)-[:AFFECTEDBY]-(r:RISK)
RETURN s, r, p
OPTIONAL MATCH is like a LEFT OUTER JOIN in SQL. It optionally matches :RISK nodes for each :Supplier node, so the query returns every :Supplier node reachable from the product, whether or not it has a relationship to a :RISK node. For suppliers with no risk, r is null.
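The outer-join behaviour can be illustrated with plain data. This is a rough Python sketch under assumed data: the supplier and risk names and both helper functions are made up for illustration and do not come from the graph in the question.

```python
# Hypothetical suppliers of one product, and the risks that affect them.
SUPPLIERS = ["Supplier1", "Supplier2", "Supplier3"]
AFFECTED_BY = {"Supplier1": ["Risk1"]}


def match_rows(suppliers, affected_by):
    """Model of a plain MATCH: a supplier with no risk produces no row at all."""
    return [(s, r) for s in suppliers for r in affected_by.get(s, [])]


def optional_match_rows(suppliers, affected_by):
    """Model of OPTIONAL MATCH: a supplier with no risk is kept, paired with None."""
    rows = []
    for s in suppliers:
        risks = affected_by.get(s) or [None]
        rows.extend((s, r) for r in risks)
    return rows


if __name__ == "__main__":
    print(match_rows(SUPPLIERS, AFFECTED_BY))
    print(optional_match_rows(SUPPLIERS, AFFECTED_BY))
```

The first function drops Supplier2 and Supplier3, which mirrors the problem in the question; the second keeps all three suppliers.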
The query below is taken from the Neo4j movie review dataset sandbox:
MATCH (u:User {name: "Some User"})-[r:RATED]->(m:Movie)
WITH u, avg(r.rating) AS mean
MATCH (u)-[r:RATED]->(m:Movie)-[:IN_GENRE]->(g:Genre)
WHERE r.rating > mean
WITH u, g, COUNT(*) AS score
MATCH (g)<-[:IN_GENRE]-(rec:Movie)
WHERE NOT EXISTS((u)-[:RATED]->(rec))
RETURN rec.title AS recommendation, rec.year AS year, COLLECT(DISTINCT g.name) AS genres, SUM(score) AS sscore
ORDER BY sscore DESC LIMIT 10
What I cannot understand is why the DISTINCT keyword is required in the query's RETURN clause, because the expected result of the last MATCH clause is something like this:
g1,x
g1,y
...
g2,z
g2,v
g2,m
...
gn,m
gn,b
gn,x
where g1, g2, ..., gn are the genres and x, y, z, v, m, b, ... are movies (the user and score columns are omitted for readability).
So, according to my understanding, this query returns, for each movie, its genres and the sum of their scores.
Assumptions:
Every Movie has a unique title. (This is required for the query to work as is.)
Every Genre has a unique name.
Every Movie has at most one IN_GENRE relationship to each distinct Genre.
Given the above assumptions, you are correct that the DISTINCT is not necessary. That is because the RETURN clause uses rec.title as one of the aggregation grouping keys, so each group holds the rows for a single movie, and each genre appears at most once among those rows.
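To see why the assumptions matter, the grouping can be mimicked on a handful of rows. This is a small Python sketch with invented rows and a hypothetical helper, aggregate_by_title; it only models the final RETURN clause, treating each tuple as one (genre, movie title, score) row produced by the last MATCH.

```python
from collections import defaultdict


def aggregate_by_title(rows):
    """Group (genre, title, score) rows by title, as the RETURN clause does."""
    genres = defaultdict(list)  # models COLLECT(g.name) without DISTINCT
    totals = defaultdict(int)   # models SUM(score)
    for genre, title, score in rows:
        genres[title].append(genre)
        totals[title] += score
    return {title: (genres[title], totals[title]) for title in genres}


# Invented rows: each (genre, movie) pair occurs once, as the assumptions require.
ROWS = [("Comedy", "X", 3), ("Drama", "X", 2), ("Comedy", "Y", 3)]

# Same rows plus a second IN_GENRE relationship between "X" and "Comedy",
# which violates the third assumption.
ROWS_WITH_DUPLICATE = ROWS + [("Comedy", "X", 3)]

if __name__ == "__main__":
    print(aggregate_by_title(ROWS))
    print(aggregate_by_title(ROWS_WITH_DUPLICATE))
```

When the assumptions hold, no genre repeats within a movie's group. With the duplicate relationship, "Comedy" appears twice for movie "X"; DISTINCT would hide that in the collected names, but the summed score would still count the genre twice.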
I have a db of drugs and manufacturers and I want to find all manufacturers who have produced multiple drugs. How can I get only the manufacturers and the drugs they have produced?
I'm currently using
match (a:Brand), (c:Manufacturer) where size((c)-[:PRODUCED]->()) >1 return a,c;
which returns manufacturers with more than one drug produced but also all drugs, regardless of manufacturer
This query uses the aggregating function, COLLECT, to return a record for each manufacturer who makes multiple brands, along with a collection of those brands:
MATCH (m:Manufacturer)-[:PRODUCED]->(b:Brand)
WITH m, COLLECT(b) AS brands
WHERE SIZE(brands) > 1
RETURN m, brands;
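The same collect-then-filter idea, sketched in Python on invented data (the manufacturer and brand names and the helper function are hypothetical):

```python
from collections import defaultdict

# Invented (manufacturer, brand) pairs, one per PRODUCED relationship.
PRODUCED = [("Acme", "DrugA"), ("Acme", "DrugB"), ("Bolt", "DrugC")]


def multi_brand_manufacturers(produced):
    """Collect brands per manufacturer and keep manufacturers with more than one."""
    brands = defaultdict(list)
    for manufacturer, brand in produced:
        brands[manufacturer].append(brand)  # models COLLECT(b) grouped by m
    # Models WHERE SIZE(brands) > 1.
    return {m: b for m, b in brands.items() if len(b) > 1}


if __name__ == "__main__":
    print(multi_brand_manufacturers(PRODUCED))
```

Because the brands are gathered per manufacturer before filtering, only the brands of the qualifying manufacturers remain in the result.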
Sounds like you only need to select the manufacturers, like so:
MATCH (c:Manufacturer) WHERE size((c)-[:PRODUCED]->()) > 1 RETURN c;
I am experimenting with a graph representing (:Shopper)'s who -[:Make]->(:Purchase)'s and each purchase -[:Contains]->(:Item)'s. The challenge is that I want to compare the quantity of Item A each Shopper bought on their most recent purchase. Eliminating Items with only one :Contains relationship won't work, because the Item may have been bought in an earlier purchase as well.
I can get data on the set of all Items in all Shoppers' most recent Purchases with
MATCH (s:Shopper)-->(p:Purchase)
WITH s, max(p.Time) AS latest
MATCH (s)-->(p:Purchase)
WHERE p.Time = latest
MATCH (p)-[c:Contains]->(i:Item)
RETURN s.Name, p.Time, c.Quantity, i.Name
but now I want to replace the second MATCH clause with something like
MATCH (p:Purchase)-[c1:Contains]->(i:Item)<-[c2:Contains]-(p:Purchase)
and it doesn't return any results. I suspect that this looks for items that have two :Contains relationships to the SAME Purchase. I want to get the :Contains relationships on two DIFFERENT Purchases in the same filtered group. How can I do this efficiently? I really want to avoid having to redo the filtering process on the second Purchase node.
[UPDATED]
In your top query, you do not need to MATCH twice to get the latest Purchase for each Shopper (see below).
In your MATCH snippet, you are using the same p variable for both Purchase nodes, so of course they are forced to be the same node.
Here is a query that should return a set of data for each Item that was in the latest Purchases of multiple Shoppers:
MATCH (s:Shopper)-[:Make]->(pur:Purchase)
WITH s, pur
ORDER BY pur.Time DESC
WITH s, HEAD(COLLECT(pur)) AS p
MATCH (p)-[c:Contains]->(i:Item)
WITH i, COLLECT({shopper: s.Name, time: p.Time, quantity: c.Quantity}) AS set
WHERE SIZE(set) > 1
RETURN i.Name AS item, set;
Here is a console that demonstrates the query with your sample data (with corrections to label and type names). It produces this result:
+-------------------------------------------------------------------------------------------------------------------------------+
| item | set |
+-------------------------------------------------------------------------------------------------------------------------------+
| "Banana" | [{shopper=Mandy, time=213, quantity=12},{shopper=Joe, time=431, quantity=5},{shopper=Steve, time=320, quantity=1}] |
+-------------------------------------------------------------------------------------------------------------------------------+
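For readers who want to check the logic without a database, here is a rough Python sketch of the same idea. The CONTAINS rows are invented (only the three Banana rows echo the result above), the helper name is hypothetical, and a purchase is identified by its time, which assumes that no shopper has two purchases with the same timestamp and that every purchase contains at least one item.

```python
from collections import defaultdict

# Invented rows: (shopper, purchase time, item, quantity).
CONTAINS = [
    ("Mandy", 100, "Banana", 3),
    ("Mandy", 213, "Banana", 12),
    ("Joe", 431, "Banana", 5),
    ("Joe", 431, "Apple", 2),
    ("Steve", 320, "Banana", 1),
]


def shared_latest_items(contains):
    """Items that appear in the latest purchase of more than one shopper."""
    # Find each shopper's most recent purchase time.
    latest = {}
    for shopper, time, _item, _quantity in contains:
        latest[shopper] = max(time, latest.get(shopper, time))
    # Keep only rows from those latest purchases, grouped by item.
    by_item = defaultdict(list)
    for shopper, time, item, quantity in contains:
        if time == latest[shopper]:
            by_item[item].append(
                {"shopper": shopper, "time": time, "quantity": quantity}
            )
    # Keep items bought by more than one shopper (SIZE(set) > 1).
    return {item: rows for item, rows in by_item.items() if len(rows) > 1}


if __name__ == "__main__":
    print(shared_latest_items(CONTAINS))
```

Mandy's earlier Banana purchase is ignored because it is not her latest, and Apple is dropped because only one shopper bought it in a latest purchase.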
Given a neo4j schema similar to
(:Person)-[:OWNS]-(:Book)-[:CATEGORIZED_AS]-(:Category)
I'm trying to write a query to get the count of books owned by each person as well as the count of books in each category so that I can calculate the percentage of books in each category for each person.
I've tried queries along the lines of
match (p:Person)-[:OWNS]-(b:Book)-[:CATEGORIZED_AS]-(c:Category)
where p.name in []
with p, b, c
match (p)-[:OWNS]-(b2:Book)-[:CATEGORIZED_AS]-(c2:Category)
with p, b, c, b2
return p.name, b.name, c.name,
count(distinct b) as count_books_in_category,
count(distinct b2) as count_books_total
But the query plan is absolutely horrible when trying to do the second match. I've tried to figure out different ways to write the query so that I can do the two different counts, but haven't figured out anything other than doing two matches. My schema isn't really about people and books. The :CATEGORIZED_AS relationship in my example is actually a few different relationship options, specified as [:option1|option2|option3]. So in my 2nd match I repeat the relationship options so that my total count is constrained by them.
Ideas? This feels similar to the question "Neo4j - apply match to each result of previous match", but there didn't seem to be a good answer for that one.
UNWIND is your friend here. First, calculate the total books per person, collecting them as you go.
Then unwind them so you can match which categories they belong to.
Aggregate by category and person, and you should get the number of books in each category for each person:
match (p:Person)-[:OWNS]->(b:Book)
with p, collect(b) as books, count(b) as total
unwind books as book
match (book)-[:CATEGORIZED_AS]->(c)
return p, c, count(book) as subtotal, total
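The question's end goal is a percentage per category, which follows from dividing subtotal by total. A minimal Python sketch of that arithmetic, on invented data with a hypothetical helper, might look like this. As in the query, total counts every owned book, including books with no category.

```python
from collections import Counter

# Invented data: person -> owned books, and book -> categories.
OWNS = {"Ann": ["B1", "B2", "B3", "B4"]}
CATEGORIZED_AS = {"B1": ["Fiction"], "B2": ["Fiction"], "B3": ["History"], "B4": []}


def category_shares(owns, categorized_as):
    """For each person, the fraction of their books that falls in each category."""
    shares = {}
    for person, books in owns.items():
        total = len(books)  # models count(b) as total
        # Models unwind + match + count(book) grouped by category.
        subtotals = Counter(
            category for book in books for category in categorized_as.get(book, [])
        )
        shares[person] = {c: n / total for c, n in subtotals.items()}
    return shares


if __name__ == "__main__":
    print(category_shares(OWNS, CATEGORIZED_AS))
```

Here Ann owns four books, two of them Fiction and one History, so the shares are 0.5 and 0.25; the uncategorized book still counts toward the total.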