Cypher : book recommendation - neo4j

I have 3 nodes:
Users (id, age).
Ratings (isbn, id, rating (this has a value of 0 to 10)).
Books (isbn, title, ...)
And the relationships:
Users - [GIVE_RATINGS]-Ratings -[BELONGS_TO]- Books
I need to create a recommendation where the input is one or more books the reader liked, and the output is books rated positively by users who also rated the books the reader has already read.
I tried to write such a query, but it doesn't work:
MATCH (u:Users{id:'11676'})-[:GIVE_RATING]->(book)<-[:GIVE_RATING]-(person), (person)-[:GIVE_RATING]->(book2)<-[:GIVE_RATING]-(r:Ratings{rating:'9'})
WHERE NOT EXIST (book2)-[:GIVE_RATING]->(u)
RETURN book2.isbn,person.id

You probably want to store your ratings as integers or floats, not strings. In newer versions it is also better to use [NOT] EXISTS { pattern } for the existence check.
A common recommendation statement would look like this:
MATCH (u:Users {id: $user})-[:GIVE_RATING]->(rating:Ratings)-[:BELONGS_TO]-(book:Books)
-[:BELONGS_TO]-(rating2:Ratings)<-[:GIVE_RATING]-(person:Users)
-[:GIVE_RATING]->(rating3:Ratings)-[:BELONGS_TO]-(book2:Books)
WHERE abs(rating2.rating - rating.rating) <= 2 // similar ratings
AND rating3.rating >= 9
AND NOT EXISTS { (u)-[:GIVE_RATING]->(:Ratings)-[:BELONGS_TO]-(book2) }
WITH book2, count(*) AS freq
RETURN book2.isbn, freq
ORDER BY freq DESC LIMIT 5
You could also store the rating as a property on a relationship between user and book; there is no need for the extra node.
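To make the counting logic of such a recommendation easier to follow, here is a minimal in-memory sketch in plain Ruby. It does not talk to Neo4j; the BookRating struct, the sample data, and the recommend method are all hypothetical names made up for illustration. It assumes ratings are stored as integers and uses the same thresholds as above (ratings within 2 points count as similar, 9 or higher counts as a strong rating).

```ruby
# Hypothetical in-memory stand-in for the Ratings nodes: (user id, isbn, rating).
BookRating = Struct.new(:user_id, :isbn, :rating)

SAMPLE_RATINGS = [
  BookRating.new('11676', 'A', 9),
  BookRating.new('11676', 'B', 8),
  BookRating.new('2', 'A', 10),
  BookRating.new('2', 'C', 9),
  BookRating.new('2', 'B', 9),
  BookRating.new('3', 'A', 8),
  BookRating.new('3', 'C', 10),
  BookRating.new('3', 'D', 9),
  BookRating.new('4', 'A', 2), # rated A very differently, so not a similar reader
  BookRating.new('4', 'E', 10)
].freeze

def recommend(ratings, user_id, max_diff: 2, min_rating: 9, limit: 5)
  mine = ratings.select { |r| r.user_id == user_id }
  already_read = mine.map(&:isbn)
  freq = Hash.new(0)

  mine.each do |my_rating|
    # Other users' ratings of the same book that are close to mine.
    similar = ratings.select do |r|
      r.user_id != user_id && r.isbn == my_rating.isbn &&
        (r.rating - my_rating.rating).abs <= max_diff
    end

    similar.each do |their_rating|
      # Books that user rated highly and that I have not rated yet.
      ratings.each do |r|
        next unless r.user_id == their_rating.user_id
        next if r.rating < min_rating || already_read.include?(r.isbn)

        freq[r.isbn] += 1
      end
    end
  end

  # Most frequently reached books first; isbn breaks ties deterministically.
  freq.sort_by { |isbn, count| [-count, isbn] }.first(limit)
end

recommend(SAMPLE_RATINGS, '11676') # => [["C", 3], ["D", 1]]
```

Each path from one of the reader's ratings through a similar reader to a highly rated unread book adds one to that book's count, which is what count(*) does in the query.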

Related

Rails: Collect records whose join tables appear in two queries

There are three models that matter here: Objective, Student, and Seminar. All are associated with has_and_belongs_to_many.
There is an ObjectiveStudent join model that includes columns "ready" and "points_all_time". There is an ObjectiveSeminar join model that includes column "priority".
I need to collect all of the objectives that are associated with a given student and also with a given seminar.
They need to also be marked with a "priority" above zero in the seminar. So I think I need this line:
obj_sems = ObjectiveSeminar.where(:seminar => given_seminar).where("priority > ?", 0)
Finally, they need to also be objectives where the student is ready, but has not scored above 7. So I think I need this line:
obj_studs = ObjectiveStudent.where(:user => given_student, :ready => true).where("points_all_time <= ?", 7)
Is there a way to gather all the objectives whose join table records appear in both of the above queries? Note that neither of the lists return objectives; they return objective_seminars, and objective_students, respectively. My end goal is to collect the objectives that meet all of the above criteria.
Or am I approaching this all wrong?
Bonus question: I would also love to sort the objectives by their priority in the given seminar. But I'm afraid that would add too much to the database load. What are your thoughts on this?
Thank you in advance for any insight.
To get Objectives back, you need to start your query from Objective.
To apply conditions on both associated tables at once (an AND), you need inner joins with those tables.
Finally, you need a distinct operator so that each objective is fetched only once.
The extended version of what (I think) you need is:
Objective.joins(objective_seminars: :seminar, objective_students: :student).
where(seminars: seminar_search_params, students: student_search_params).
where('objective_seminars.priority > 0').
where('objective_students.ready = 1 AND objective_students.points_all_time <= 7').
order('objective_seminars.priority ASC').
distinct
Now for the database load it all depends on your indexes and the size of your tables.
The above query will translate to the following SQL (or something similar).
SELECT DISTINCT objectives.* FROM objectives
INNER JOIN objective_students ON objective_students.objective_id = objectives.id
INNER JOIN students ON students.id = objective_students.student_id
INNER JOIN objective_seminars ON objective_seminars.objective_id = objectives.id
INNER JOIN seminars ON seminars.id = objective_seminars.seminar_id
WHERE seminars_query AND
students_query AND
objective_seminars.priority > 0 AND
objective_students.ready = 1 AND objective_students.points_all_time <= 7
ORDER BY objective_seminars.priority ASC
So you'll need to add or extend your indexes so that the lookups on all 5 tables can be served by an index. The actual index design is up to you and depends on your application's specifics (read/write load, table sizes, cardinality, etc.).
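For intuition about what the joins compute, here is a plain-Ruby sketch that intersects the two join-table result sets by objective id and sorts by seminar priority. It is only an illustration on in-memory data, not ActiveRecord code; the struct names, the sample rows, and matching_objective_ids are hypothetical.

```ruby
# Hypothetical stand-ins for rows of the two join tables.
ObjectiveSeminarRow = Struct.new(:objective_id, :seminar_id, :priority)
ObjectiveStudentRow = Struct.new(:objective_id, :student_id, :ready, :points_all_time)

OBJ_SEMS = [
  ObjectiveSeminarRow.new(1, 10, 3),
  ObjectiveSeminarRow.new(2, 10, 1),
  ObjectiveSeminarRow.new(3, 10, 0), # priority 0, excluded
  ObjectiveSeminarRow.new(4, 11, 5)  # different seminar
].freeze

OBJ_STUDS = [
  ObjectiveStudentRow.new(1, 7, true, 5),
  ObjectiveStudentRow.new(2, 7, true, 7),
  ObjectiveStudentRow.new(3, 7, true, 2),
  ObjectiveStudentRow.new(4, 7, true, 1),
  ObjectiveStudentRow.new(2, 8, true, 1), # different student
  ObjectiveStudentRow.new(5, 7, false, 0) # not ready
].freeze

def matching_objective_ids(obj_sems, obj_studs, seminar_id:, student_id:)
  # Objective ids with a positive priority in the given seminar, mapped to that priority.
  priority_by_objective = obj_sems
    .select { |row| row.seminar_id == seminar_id && row.priority > 0 }
    .to_h { |row| [row.objective_id, row.priority] }

  # Objective ids where the given student is ready but has not scored above 7.
  student_ids = obj_studs
    .select { |row| row.student_id == student_id && row.ready && row.points_all_time <= 7 }
    .map(&:objective_id)

  # Keep ids present on both sides, ordered by seminar priority (ascending).
  (priority_by_objective.keys & student_ids).sort_by { |id| priority_by_objective[id] }
end

matching_objective_ids(OBJ_SEMS, OBJ_STUDS, seminar_id: 10, student_id: 7) # => [2, 1]
```

Doing this in Ruby loads both row sets into memory, so for real tables the joined database query is normally the better choice.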

Get the average of the most recent records within groups with ActiveRecord

I have the following query, which calculates the average number of impressions across all teams for a given name and league:
@all_team_avg = NielsenData
.where('name = ? and league = ?', name, league)
.average('impressions')
.to_i
However, there can be multiple entries for each name/league/team combination. I need to modify the query to only average the most recent records by created_at.
With the help of this answer I came up with a query which gets the result that I need (I would replace the hard-coded WHERE clause with name and league in the application), but it seems excessively complicated and I have no idea how to translate it nicely into ActiveRecord:
SELECT avg(sub.impressions)
FROM (
WITH summary AS (
SELECT n.team,
n.name,
n.league,
n.impressions,
n.created_at,
ROW_NUMBER() OVER(PARTITION BY n.team
ORDER BY n.created_at DESC) AS rowcount
FROM nielsen_data n
WHERE n.name = 'Social Media - Twitter Followers'
AND n.league = 'National Football League'
)
SELECT s.*
FROM summary s
WHERE s.rowcount = 1) sub;
How can I rewrite this query using ActiveRecord or achieve the same result in a simpler way?
When all you have is a hammer, everything looks like a nail.
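Sometimes raw SQL is the best choice. Note that find_by_sql returns an array of model instances, so alias the aggregate in your SQL (for example, AS avg_impressions) and read it from the first element. You can do something like:
@all_team_avg = NielsenData.find_by_sql("...your_sql_statement_here...")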
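If the filtered rows are few enough to load into memory, the same "latest record per team, then average" logic can also be expressed in plain Ruby. The sketch below works on in-memory structs rather than ActiveRecord; NielsenRow, the sample data, and latest_team_average are hypothetical names used only for illustration.

```ruby
# Hypothetical stand-in for rows of the nielsen_data table.
NielsenRow = Struct.new(:team, :name, :league, :impressions, :created_at)

NIELSEN_ROWS = [
  NielsenRow.new('X', 'Followers', 'NFL', 100, Time.utc(2020, 1, 1)),
  NielsenRow.new('X', 'Followers', 'NFL', 200, Time.utc(2020, 2, 1)), # latest for X
  NielsenRow.new('Y', 'Followers', 'NFL', 400, Time.utc(2020, 1, 15)),
  NielsenRow.new('Z', 'Followers', 'NBA', 900, Time.utc(2020, 3, 1))  # other league
].freeze

def latest_team_average(rows, name, league)
  # One row per team: the most recent by created_at.
  latest = rows
    .select { |r| r.name == name && r.league == league }
    .group_by(&:team)
    .map { |_team, team_rows| team_rows.max_by(&:created_at) }

  return 0 if latest.empty?

  # Average the latest impressions and truncate, like .average(...).to_i.
  (latest.sum(&:impressions) / latest.size.to_f).to_i
end

latest_team_average(NIELSEN_ROWS, 'Followers', 'NFL') # => 300
```

This trades database work for application memory, so it only makes sense when the name/league filter already narrows the data to a small set.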

How do I access the data fields inside a bag in pig latin?

I am using the IMDB database to find the actor/actress who had the highest rating and was in the most movies in a given year. I am trying to join the actors dataset with their ratings, then filter by year and sort the data by highest rating and movie count.
joinedActorRating = JOIN ratings by movie, actors BY movie;
actorRating = FOREACH joinedActorRating GENERATE *;
actorsYear = FILTER actorRating BY(year MATCHES '2000');
groupedYear = GROUP actorsYear BY (year,rating,firstName,lastName);
aggregatedYear = FOREACH groupedYear GENERATE group, COUNT (actorsYear) AS movieCount;
unaggregatedYear = FOREACH aggregatedYear GENERATE FLATTEN(group) AS (year,rating,firstName,lastName);
sortRating = ORDER unaggregatedYear BY rating ASC, count ASC;
dump sortRating;
The compiler says that the second line is an "Invalid field projection" but I am not sure how to access the year field after joining the two datasets. Does anyone know how to fix this?
After your join, you need to project the fields you want through to your current relation.
joinedActorRating = JOIN ratings by movie, actors BY movie;
actorRating = FOREACH joinedActorRating GENERATE ratings::movie as movie
, ratings::rank as rank, ratings::year as year, actors::firstName as firstName
, actors::lastName as lastName;
I'm not sure which columns are in which table (other than movie is in both) because you didn't include the two tables, so I just guessed. You can modify the projections as needed.

How to get records from multiple condition from a same column through associated table

Let's say a Book model HABTM categories; for example, book A has categories "CA" and "CB". How can I retrieve book A if I query using "CA" and "CB" only? I know about .where("category_id in (1,2)"), but it uses an OR operation. I need something like an AND operation.
Edited
I also need to be able to get books from category CA only. And how do I include query criteria such as .where("book.p_year = 2012")?
ca = Category.find_by_name('CA')
cb = Category.find_by_name('CB')
Book.where(:id => (ca.book_ids & cb.book_ids)) # & returns elements common to both arrays.
Otherwise you'd need to abuse the join table directly in SQL, group the results by book_id, count them, and only return rows where the count is at least equal to the number of categories... something like this (but I'm sure it's wrong so double check the syntax if you go this route. Also not sure it would be any faster than the above):
SELECT book_id, count(*) as c from books_categories where category_id IN (1,2) group by book_id having count(*) >= 2;
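The group-and-count idea can be checked in plain Ruby before committing to SQL. The sketch below runs on a hypothetical in-memory copy of the books_categories join table; the constant and the book_ids_in_all method are made-up names for illustration.

```ruby
# Hypothetical rows of the books_categories join table: [book_id, category_id].
BOOKS_CATEGORIES = [[1, 1], [1, 2], [2, 1], [3, 2], [4, 1], [4, 2], [4, 3]].freeze

def book_ids_in_all(rows, category_ids)
  rows
    .select { |_book_id, category_id| category_ids.include?(category_id) } # WHERE ... IN (...)
    .uniq                                                                   # guard against duplicate join rows
    .group_by { |book_id, _category_id| book_id }                           # GROUP BY book_id
    .select { |_book_id, matches| matches.size >= category_ids.size }       # HAVING count(*) >= n
    .keys
end

book_ids_in_all(BOOKS_CATEGORIES, [1, 2]) # => [1, 4]
book_ids_in_all(BOOKS_CATEGORIES, [1])    # => [1, 2, 4]
```

Passing a single category id covers the "books from category CA only" case with the same method.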

Ranking position

I have a Rails application with the following models:
User
Bet
User has_many bets and Bet belongs_to User. Every Bet has a profitloss value, which states how much the User has won or lost on that Bet.
So to calculate how much a specific User has won overall, I sum up his bets in the following way:
user.bets.sum(:profitloss)
I would like to show the User his ranking compared to all the other Users, which could look something like this:
"Your overall ranking: 37th place"
To do so I need to sum up the overall winnings per User, and find out in which position the current user is.
How do I do that, and how do I do it so that it doesn't overload the server? :)
Thanks!
You can try something similar to
User.joins(:bets).
select("users.id, sum(bets.profitloss) as accumulated").
group("users.id").
order("accumulated DESC")
and then search the resulting list of "users" (not real users; they have only two meaningful attributes, their ID and an accumulated attribute with the sum) for the one corresponding to the current user.
In any case to get a single user's position, you have to calculate all users' accumulated, but at least this is only one query. Even better, you can store in the user model the accumulated value, and query just it for ranking.
If you have a large number of Users and Bets, you won't be able to compute and sort the global profitloss of each user "on demand", so I suggest that you use a rake task that you schedule regularly (once a day, every hour, etc...)
Add a column position in the User model, get the list of all Users, compute their global profitloss, sort the list of Users with their profitloss, and finally update the position attribute of each User with their position in the list.
The best way to do it is to keep a precalculated total in your database, either on the user model itself or on a separate model that has a 1:1 relation to user. If you don't, you will have to calculate the sum for all users every time in order to get their rating, which means a full-table operation on the bets table. That said, this query will give you the desired result; if more than one person has the same total, they will share the same rating:
select id, (select count(h.id) from users u inner join
(select user_id, sum(profitloss) as `total` from bets group by user_id) b2
on b2.user_id = u.id, (select id from users) h inner join
(select user_id, sum(profitloss) as `total` from bets group by user_id) b
on b.user_id = h.id where u.id = 1 and (b.total > b2.total))
as `rating` from users where id = 1;
You will need to plug the user's id into the query wherever it says id = 1.
If you add a column to the users table to keep track of each user's total, the query is a little simpler. In this example the column name is total_profit_loss:
select id, total_profit_loss, (select count(h.id)+1 from users u,
(select id, total_profit_loss from users) h
where u.id = 1 and (h.total_profit_loss > u.total_profit_loss))
as `rating` from users where id = 1;
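The ranking rule these queries implement (rank = 1 + the number of users with a strictly higher total, so ties share a position) can be sketched in plain Ruby. This is an in-memory illustration only; BetRow, the sample bets, and the ranking method are hypothetical names, and users without any bets are simply absent from the result.

```ruby
# Hypothetical stand-in for rows of the bets table.
BetRow = Struct.new(:user_id, :profitloss)

SAMPLE_BETS = [
  BetRow.new(1, 10), BetRow.new(1, -5), # user 1 total: 5
  BetRow.new(2, 20),                    # user 2 total: 20
  BetRow.new(3, 5),                     # user 3 total: 5 (tied with user 1)
  BetRow.new(4, -3)                     # user 4 total: -3
].freeze

def ranking(bets)
  # Total profit/loss per user, like SUM(profitloss) ... GROUP BY user_id.
  totals = bets.group_by(&:user_id)
               .transform_values { |user_bets| user_bets.sum(&:profitloss) }

  # Rank = 1 + number of users with a strictly higher total, so ties share a rank.
  totals.to_h do |user_id, total|
    [user_id, 1 + totals.count { |_other_id, other_total| other_total > total }]
  end
end

ranking(SAMPLE_BETS) # => {1=>2, 2=>1, 3=>2, 4=>4}
```

In a scheduled task, the resulting hash could be used to update a stored position per user, as suggested above.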

Resources