Product recommendation cypher - neo4j

I have the following simple graph -
I wish to build a simple recommendation system on the basis of the following example:
Consider that we have invoice 1 with an Article "Apple".
We also have invoice 2 which has "Apple" and "Oranges".
Customer of invoice 1 should be recommended "Oranges".
Basically, when a customer adds an item to an invoice, we need to recommend articles from any other invoice that shares at least one article with the current invoice, excluding articles that are already on the current invoice.
Another way to say this -
When an article A exists in Invoice 1 AND Invoice 2 also contains article A, then list all other articles in Invoice 2 provided they do not exist in Invoice 1.
However, as a complete beginner I can't figure out how to write the Cypher query. Any help on how to write such a query?

Something like below should work to start with:
MATCH (i:Invoice)-[]-(a:Article)-[]-(:Invoice)-[]-(b:Article)
WHERE i.invoiceNumber = 123
AND NOT (i)--(b) // skip articles already on the current invoice
RETURN DISTINCT b;
What it does: it starts from the invoice, navigates through the articles connected to that invoice, and moves on to all other invoices that share one of those articles. From there it collects the articles connected to those invoices, leaving out any that are already on the starting invoice.
(This assumes that you are using unique Article nodes and connecting the invoices to them.)
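The rule from the question can also be checked outside the database. Below is a minimal Python sketch, assuming invoices are modeled as plain sets of article names; the `recommend` function and the sample data are illustrative only and not part of the original graph.

```python
def recommend(current, invoices):
    """Articles from invoices sharing an article with `current`, minus `current`'s own."""
    recommended = set()
    for other in invoices:
        # An invoice qualifies only if it shares at least one article with the current one.
        if other & current:
            # Keep only the articles that are not already on the current invoice.
            recommended |= other - current
    return recommended


# Hypothetical sample data mirroring the example in the question.
invoice_1 = {"Apple"}
invoice_2 = {"Apple", "Oranges"}
invoice_3 = {"Bananas"}  # shares nothing with invoice 1, so it contributes nothing

print(recommend(invoice_1, [invoice_2, invoice_3]))  # {'Oranges'}
```

This is the same traversal as the Cypher pattern (invoice, shared article, other invoice, other article), just expressed with set operations.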

You can use the query below for a given customer (say Customer1). It returns the other customers who have at least one food in common with Customer1, together with the foods Customer1 ordered that they have not.
MATCH (c1:Customer {name: 'Customer1'})<-[:GENERATED_FOR]-(:Invoice)<-[:ITEMIZED_IN]-(:Article)-[:TYPE]->(f:FoodArticle)
WITH c1, collect(f) as food
MATCH (c2:Customer)<-[:GENERATED_FOR]-(:Invoice)<-[:ITEMIZED_IN]-(:Article)-[:TYPE]->(f2:FoodArticle)
WHERE c1 <> c2 AND f2 in food
WITH c2, food, collect(f2) as food2
WITH c2, [fd IN food WHERE NOT fd IN food2] as recommendations
WHERE size(recommendations) > 0
RETURN c2.name, recommendations
First, get all the food that Customer1 has ordered.
Next, find all other customers who have at least one food contained in Customer1's food list.
For each such customer (c2), collect the foods they have in common with Customer1 (food2).
Create a list of recommended food from the items found in Customer1's food list BUT NOT found in food2.
Return the customer's name and the recommended food, but only when there is at least one food in Customer1's list that is not found in food2.
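To make the steps above concrete, here is a small Python sketch of the same logic. It assumes each customer's orders are already flattened into a set of food names; `recommendations_for_others` and the sample data are hypothetical and only mirror the shape of the query.

```python
def recommendations_for_others(orders, base):
    """Map each other customer to the foods `base` ordered that they have not."""
    base_food = orders[base]
    result = {}
    for customer, food in orders.items():
        if customer == base:
            continue  # mirrors `c1 <> c2`
        shared = food & base_food  # mirrors `f2 IN food`
        if not shared:
            continue  # no common food, so this customer is never matched
        missing = base_food - shared  # mirrors `NOT fd IN food2`
        if missing:  # mirrors `size(recommendations) > 0`
            result[customer] = missing
    return result


# Hypothetical sample data.
orders = {
    "Customer1": {"Pizza", "Salad"},
    "Customer2": {"Pizza"},
    "Customer3": {"Soup"},
    "Customer4": {"Pizza", "Salad", "Soup"},
}

print(recommendations_for_others(orders, "Customer1"))  # {'Customer2': {'Salad'}}
```

Customer3 shares nothing with Customer1 and Customer4 already has everything Customer1 ordered, so only Customer2 gets a recommendation.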

Related

Cypher : book recommendation

I have 3 nodes:
Users (id, age).
Ratings (isbn, id, rating (this has a value of 0 to 10)).
Books (isbn, title, ...)
And the relationships:
Users - [GIVE_RATINGS]-Ratings -[BELONGS_TO]- Books
I need to create a recommendation where the input is one or more books the reader liked, and the output is books rated positively by users who also rated the books the reader has already read.
I tried to create such a query, but it doesn't work.
MATCH (u:Users{id:'11676'})-[:GIVE_RATING]->(book)<-[:GIVE_RATING]-(person), (person)-[:GIVE_RATING]->(book2)<-[:GIVE_RATING]-(r:Ratings{rating:'9'})
WHERE NOT EXIST (book2)-[:GIVE_RATING]->(u)
RETURN book2.isbn,person.id
You probably want to store your ratings as integers or floats, not strings. In newer versions it is also better to use [NOT] EXISTS { pattern }.
A common recommendation statement would look like this:
MATCH (u:Users{id:$user})-[:GIVE_RATING]->(rating)
<-[:GIVE_RATING]-(person)-[:GIVE_RATING]->(rating2)
<-[:GIVE_RATING]-(rating3)
WHERE abs(rating2.rating - rating.rating) <= 2 // similar ratings
AND rating3.rating >= 9
AND NOT EXISTS { (rating3)<-[:GIVE_RATING]-(u) }
WITH rating3, count(*) as freq
// person is no longer in scope after the WITH, so return the frequency instead
RETURN rating3.isbn, freq
ORDER BY freq DESC LIMIT 5
You could also represent your rating information on the relationship between user and book, no need for the extra Node.

Iterate over a list in Neo4j

I am working on Neo4j database and I want to replicate the scenario mentioned below,
I have 2 nodes, Product and Customer. In the Customer node I am storing the customer id and a list of products, and in the Product node I am storing only productid.
Customer has values {custId:1,products:[1,2,3,4]}
Product has values {productid:1},{productid:2},{productid:3},{productid:4}
Now what I want to do is,
I need to replace all these ids with autogenerated ids after adding the nodes to the graph database, something like set custId=ID(customer) and productId=ID(product). What I am stuck on is how to iterate over the list of products in the customer node and change each product id to the autogenerated id.
Any help is appreciated.
Storing database-generated product IDs in an array property on the customer node is the wrong idea, in every sense.
The graph way is to create a relationship between each Customer node and its corresponding Product nodes, and then remove the products property from Customer and the productid property from Product:
MATCH (Customer:Customer)
UNWIND Customer.products as prodID
MATCH (Product:Product {productid: prodID})
MERGE (Customer)-[r:hasProduct]->(Product)
WITH Customer, count(Product) as mergedProduct
REMOVE Customer.products
WITH count(Customer) as totalMerged
MATCH (Product:Product)
REMOVE Product.productid

How to get unique product list?

We have a list of category products with duplicate names.
How can we get a list of products without duplicate product names in Postgres?
We currently find the minimum product id for each name with a GROUP BY, then look up the products with those ids.
category = Category.first
ids = Product.select("MIN(id) as id").where(deleted: false).group(:name).collect(&:id)
category.products.where("products.id IN (?)", ids).find_active
How can we optimize the queries?
You can do Product.all.pluck(:name).uniq to get just the product names in an array.
But I think you're solving the wrong problem, in that this problem has a bad 'smell'. If you have products that have identical names, how do you differentiate them from a UX perspective? And why only get the first created product by that name vs. the most 'popular' product? I'm trying to imagine how this solution would work for the user and I'm coming up blank, perhaps because I don't know enough about the context.
Edit: Also, could you clarify what you mean by 'should not have duplicate product name'? Is it to get a list of products, but only the first product if there's multiple products with the same name? Or are you looking for items to correct?
The simple solution in Postgres is with DISTINCT ON:
SELECT DISTINCT ON (name)
id, name -- add more columns if you need
FROM Product
WHERE deleted = FALSE
ORDER BY name, id;
This returns unique product names (sorted alphabetically). From each set of duplicates it keeps the row with the smallest id. Details:
Select first row in each GROUP BY group?

Find nodes with same relationships that initial node

I have customers (id, name, type), commerces (id, name, type) and relationships between them (idcustomer, idcommerce, quantity) that indicates that a customer has bought in a commerce and the quantity.
I want to retrieve the nodes that have the same relationships as the origin node. That is, if customer 1 bought in commerces id=10 and id=11, I want to get the other customers who have bought in at least those same commerces, in order to recommend their remaining commerces to customer 1.
I currently have the following command, but it doesn't work: it returns all customers that have bought in any one of the commerces where customer 1 bought, not only those who bought in all of them.
START m=node:id(id="1") MATCH (m)-[:BUY]->(commerces)<-[:BUY]-(customers) RETURN customers;
Example
Customer 1 bought commerce 10, 11
Customer 2 bought commerce 10, 3
Customer 3 bought commerce 10, 11, 4
Customer 4 bought commerce 5, 8, 10
The return that I want is Customer 3 in order to recommend commerce 4.
Thank you.
Here is one solution.
The first query gets all of the commerces the start node m buys from; that is the collect(commerce) of the first WITH clause.
The second query gets the commerces each customer shares with m; that is the customerCommerces of the second WITH clause.
Then the WHERE clause eliminates the customers who share only a subset of m's commerces, so only the customers who share all of them are returned.
START m=node:id(id="1")
Match (m)-[:BUY]->(commerce)
With collect(commerce) as mCos
START m=node:id(id="1")
Match (m)-[:BUY]->(commerce)<-[:BUY]-(customer)
with mCos, customer, collect(commerce) as customerCommerces
Where length(mCos) = length(customerCommerces)
Return customer
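The same "bought in at least all of the same commerces" check can be sketched in Python with a subset test, using the example data from the question. The `superset_customers` function is a hypothetical helper for illustration; it also returns the extra commerces that could be recommended.

```python
def superset_customers(purchases, origin):
    """Customers who bought in every commerce `origin` did, with their extra commerces."""
    origin_commerces = purchases[origin]
    result = {}
    for customer, commerces in purchases.items():
        # `<=` on sets is the subset test: every origin commerce must appear here too.
        if customer != origin and origin_commerces <= commerces:
            result[customer] = commerces - origin_commerces
    return result


# Example data from the question: customer id -> commerce ids.
purchases = {
    1: {10, 11},
    2: {10, 3},
    3: {10, 11, 4},
    4: {5, 8, 10},
}

print(superset_customers(purchases, 1))  # {3: {4}}
```

Comparing collection lengths in the Cypher query plays the same role as the subset test here: a customer shares as many commerces with m as m has only when they share all of them.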

Change Data Capture with table joins in ETL

In my ETL process I am using Change Data Capture (CDC) to discover only the rows that have changed in the source tables since the last extraction. Then I do the transformation only for these rows. The problem arises when I have, for example, 2 tables that I want to join into one dimension, and only one of them has changed. For example, I have the tables Countries and Towns as follows:
Countries:
ID Name
1 France
Towns:
ID Name Country_ID
1 Lyon 1
Now let's say a new row is added to the Towns table:
ID Name Country_ID
1 Lyon 1
2 Paris 2
The Countries table has not been changed, so CDC for these tables shows me only the row from the Towns table. The problem is that when I do the join between Countries and Towns, there is no row in the Countries change set, so the join results in an empty set.
Do you have an idea how to solve it? Of course there might be more difficult cases, involving 3 and more tables, and consequential joins.
This is a typical problem found when doing Realtime Change-Data-Capture, or even Incremental-only daily changes.
There are multiple ways to solve this.
One way would be to do your joins on the natural keys in the dimension or mapping table, to get the associated country (SELECT distinct country_name, [..other attributes..] from dim_table where country_id = X).
Another alternative would be to do the join as part of the change capture process - when a row is loaded to towns, a trigger goes off that loads the foreign key values into the associated staging tables (country, etc).
There is a lot more I could say, but I will stick to what is in your question. I would suggest the following to get the results:
1st Pass is where everything matches via the join...
Union All
2nd Pass Gets all towns where there isn't a country
(left outer join with a where condition that
requires the ID in the countries table to be null/missing).
In that unmatched join, you would default the Country ID to a value designated as an "unmatched value". Typically 0 or -1 is used, or a series of standard negative numbers to which you can later assign descriptions that identify why the data is bad. For your example, -1 could be "Found Town Without Country".
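A rough sketch of the two-pass approach described above, run against an in-memory SQLite database from Python. The table and column names, and the use of -1 as the sentinel, are assumptions for illustration; the `countries` table here stands for whatever country rows are visible to the join in your pipeline.

```python
import sqlite3

UNMATCHED_ID = -1  # assumed sentinel meaning "Found Town Without Country"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE towns (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
    INSERT INTO countries VALUES (1, 'France');
    INSERT INTO towns VALUES (1, 'Lyon', 1), (2, 'Paris', 2);
""")

rows = conn.execute("""
    -- 1st pass: towns whose country is found by the join
    SELECT t.name, c.id, c.name
    FROM towns t
    JOIN countries c ON c.id = t.country_id
    UNION ALL
    -- 2nd pass: towns with no matching country get the sentinel id
    SELECT t.name, ?, 'Found Town Without Country'
    FROM towns t
    LEFT OUTER JOIN countries c ON c.id = t.country_id
    WHERE c.id IS NULL
    ORDER BY 1
""", (UNMATCHED_ID,)).fetchall()

print(rows)
# [('Lyon', 1, 'France'), ('Paris', -1, 'Found Town Without Country')]
```

With the sample rows from the question, Lyon matches France in the first pass, while Paris (Country_ID 2) has no country row and falls through to the second pass with the sentinel value.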
