How to use ActiveRecord calculations with relations - ruby-on-rails

I am trying to make an admin dashboard that shows an administrator relevant statistics about the site. For example, given a company has many users, finding the average number of users per company, or the maximum number of users a company has.
I have found ActiveRecord::Calculations, which seems to do most of what I want, but as far as I can tell, it doesn't let you do anything with relations. How would I go about finding counts or averages that are grouped by relations?

You have to think of it in terms of the User.
The simplest way would be
# get a hash of company_ids and user counts
User.group(:company_id).count
But then you'll have to load the Companies and match them up.
Then you could try and do
user_counts = User.group(:company_id).count
# companies with no users are missing from the hash, so default them to 0
company_users = Company.all.map { |company| user_counts.fetch(company.id, 0) }
# the maximum
company_users.max
# the average
company_users.sum.to_f / company_users.length
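The matching-up step can be tried without a database. This is a minimal sketch, assuming a hash shaped like the one User.group(:company_id).count returns and a plain list of company ids; user_count_stats and the sample numbers are hypothetical names and data used only for illustration.

```ruby
# counts: hash shaped like the result of User.group(:company_id).count
#         (company_id => number of users; companies with no users are absent)
# company_ids: ids of every company, including those with no users
def user_count_stats(counts, company_ids)
  return { max: 0, average: 0.0 } if company_ids.empty?

  # Missing companies count as 0 so max and sum never see nil.
  per_company = company_ids.map { |id| counts.fetch(id, 0) }
  { max: per_company.max, average: per_company.sum.to_f / per_company.length }
end

# Hypothetical data: company 3 has no users.
stats = user_count_stats({ 1 => 4, 2 => 10 }, [1, 2, 3])
puts stats[:max]
puts stats[:average].round(2)
```

Counting companies without users as zero lowers the average, which is usually what a dashboard should show. Drop them from company_ids if only companies with at least one user should be considered.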

Related

Rails: Return highest and lowest associated records from a grouped association value taking the total number of associated records into account

In our Rails application we have two tables:
products:
- id
- name
feedbacks
- id
- product_id
- rating
So Products can have many feedbacks which have a rating (1-5).
What we want to do is get the BEST and WORST Product by counting the number of feedbacks per product, using ratings of 4-5 for BEST and 1-2 for WORST.
However we also need to take into account the total number of feedbacks for the product, otherwise if a product happened to have more feedback than another, it could end up being both the best and worst...
As an example, we've tried the following that 'should' return a list of products where the feedback rating is 4+ and then the first element would be the highest.
Product.joins(:feedbacks).group('feedbacks.rating').having('Max(rating) >= 4')
First question... is this the correct way to return this? Is there a better way to return only one record rather than returning an ActiveRecord relation and pulling the first record?
Second question... how do we take into account the number of feedbacks against the product? So the query becomes... 'Rating 4+ against total number of feedbacks on the product'
Writing an SQL query that calculates a quality ranking sounds intimidating.
This sounds like it could be an XY Problem, I'd approach it a different way:
Goal:
A Product needs to have a ranking compared to its siblings.
I think the mathematicians can better speak to the algorithm you choose to set that rank.
My process for calculating rank would be something like this:
Add a ranking column to Product
Write a job that performs the "ranking" calculation
Trigger this job to run in the background either on a schedule, or when a Feedback is created / updated.
Use that ranking column for a very simple .order(ranking: :desc)
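As one possible shape for the "ranking" calculation in the job, here is a minimal sketch in plain Ruby. The formula (share of 4-5 ratings minus share of 1-2 ratings) is an assumption chosen for illustration, not something the question prescribes, and ranking_score and the sample data are hypothetical.

```ruby
# Net share of positive feedback: (count of 4-5 ratings) minus
# (count of 1-2 ratings), divided by the total number of feedbacks.
# The result ranges from -1.0 (all bad) to 1.0 (all good).
def ranking_score(ratings)
  return 0.0 if ratings.empty?

  positive = ratings.count { |r| r >= 4 }
  negative = ratings.count { |r| r <= 2 }
  (positive - negative).to_f / ratings.length
end

# Hypothetical sample data: product name => feedback ratings.
ratings_by_product = {
  "A" => [5, 5, 4, 1],
  "B" => [5, 4],
  "C" => [1, 2, 3, 5]
}

# Highest score first, mirroring an ORDER BY ranking DESC.
ranked = ratings_by_product.sort_by { |_name, ratings| -ranking_score(ratings) }
puts "best: #{ranked.first.first}, worst: #{ranked.last.first}"
```

Dividing by the total keeps a heavily reviewed product from being both best and worst at once. A product with very few feedbacks can still reach an extreme score, so a minimum feedback count may be worth adding.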

Include vs Join

I have 3 models
User - has many debits and has many credits
Debit - belongs to User
Credit - belongs to User
Debit and credit are very similar. The fields are basically the same.
I'm trying to run a query on my models to return all fields from debit and credit where user is current_user
User.left_outer_joins(:debits, :credits).where("users.id = ?", @user.id)
As expected, this returned all fields from User as many times as there were records in credits and debits.
User.includes(:credits, :debits).order(created_at: :asc).where("users.id = ?", @user.id)
It ran 3 queries and I thought it should be done in one.
The second part of this question is: how could I add the record type into the query?
That is, records from credits would have an extra field marking them as credits, and the same for debits.
I have looked into ActiveRecordUnion gem but I did not see how it would solve the problem here
includes can't magically retrieve everything you want it to in one query; it will run one query per model (typically) that you need to hit. Instead, it eliminates future unnecessary queries. Take the following examples:
Bad
users = User.first(5)
users.each do |user|
p user.debits.first
end
There will be 6 queries in total here, one to User retrieving all the users, then one for each .debits call in the loop.
Good!
users = User.includes(:debits).first(5)
users.each do |user|
p user.debits.first
end
You'll only make two queries here: one for the users and one for their associated debits. This is how includes speeds up your application, by eagerly loading things you know you'll need.
As for your comment, yes it seems to make sense to combine them into one table. Depending on your situation, I'd recommend looking into Single Table Inheritance (STI). If you don't go this route, be careful with adding a column called type, Rails won't like that!
First of all, by calling the first query on the User class you are asking for records of type User. If you do not want user objects, you are performing an extra join, which could be costly (could be, not will be).
If you want credit and debit records, simply call queries on the Credit and Debit models. If you load the user object somewhere prior to this point, use includes, preload or eager_load to load the linked credit and debit records all at once.
There are two ways of pre-loading records in Rails. In the first, Rails performs a single query for each type of record; in the second, Rails performs only one query and loads objects of different types from the data returned.
includes is a smart pre-loader that performs either one of the ways depending on which one it thinks would be faster.
If you want to force Rails to use one query no matter what, eager_load is what you are looking for.
Please read all about includes, eager_load and preload in the article here.
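For the second part of the question (marking each record as a credit or a debit), one option is to load the two collections separately and tag them in Ruby. The sketch below uses plain hashes in place of loaded records; Entry, combined_entries and the amount field are hypothetical names, and this is only one way to do it, not the approach the answers above prescribe.

```ruby
# A tagged row: kind is "credit" or "debit".
Entry = Struct.new(:kind, :amount, :created_at)

# credits and debits: arrays of hashes standing in for loaded records
# (for example the results of separate Credit and Debit queries).
def combined_entries(credits, debits)
  tagged = credits.map { |c| Entry.new("credit", c[:amount], c[:created_at]) } +
           debits.map { |d| Entry.new("debit", d[:amount], d[:created_at]) }
  # Oldest first, like order(created_at: :asc).
  tagged.sort_by(&:created_at)
end

# Hypothetical sample data.
credits = [{ amount: 50, created_at: Time.utc(2020, 1, 3) }]
debits  = [{ amount: 20, created_at: Time.utc(2020, 1, 1) },
           { amount: 5,  created_at: Time.utc(2020, 1, 5) }]

combined_entries(credits, debits).each do |entry|
  puts "#{entry.kind} #{entry.amount}"
end
```

This costs two queries rather than one, but it avoids the duplicated user rows that the join produces.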

how to specify two conditions in a collect in rails 4

In my system I have the notion of a cost. A cost belongs to one user and to one trip.
I am trying to extract all the costs that belong to a specific user and to a specific trip.
doing
@trip.costs.collect {|cost| [cost.value, cost.exchange_id]}
will get me all the costs for a specific trip
and doing
current_user.costs.collect {|cost| [cost.value, cost.exchange_id]}
will get me all the costs for a specific user.
How do I combine both of them?
collecting costs attributes for a specific trip of a specific user:
current_user.costs.where(trip_id: @trip.id).collect { |c| [c.value, c.exchange_id] }

rails 3.1 How to retrieve in a query those elements with the maximum number of relationships in a many to many?

I have tried different combinations but it seems that I cannot work this out.
I want to retrieve, from an Event model, those events which have the biggest number of users.
For example, I retrieve users of an event like this
@users = Event.find(x).users
They can be counted using this
Event.find(x).users.count
So, how should I order the list by the number of users each event has, and then retrieve the first 8?
The same issue was resolved in: How to get highest count of associated model (Rails)?
# assumes the events table has a users_count counter cache column
Event.order("events.users_count DESC").limit(8)

user matching system, efficient search approach?

EDIT: I know it's been over a year, but I finally got something new to this problem. To see an update for this look at this question: Rails 3 user matching-algorithm to SQL Query (COMPLICATED)
I'm working on a site where users are matched based on answered questions.
The match percentage is calculated each time a user, for example, visits another user's profile page. So the match percentage is not stored in the database and is recalculated every time.
Now I want to build in a search where users can search for their best match.
The question I have is, what is the most efficient way to do this?
What if I have 50k users and I have to list them ordered by match percentages. Do I have to calculate each matching percentage between one and the other 50k users and then create a list out of that? Sounds kind of inefficient to me. Wouldn't that slow down the app drastically?
I hope someone can help me with this, because this gives me kind of a headache.
EDIT:
To clear things up a bit, here is my database model for user, questions, answers, user_answers and accepted_answers:
Tables:
Users(:id, :username, etc.)
Questions(:id, :text)
Answers(:id, :question_id, :text)
UserAnswers(:id, :user_id, :question_id, :answer_id, :importance)
AcceptedAnswers(:id, :user_answer_id, :answer_id)
Questions <-> Answers: one-to-many
Questions <-> UserAnswers: one-to-many
Users <-> UserAnswers: one-to-many
UserAnswers <-> AcceptedAnswers: one-to-many
So there is a list of Questions(with possible answers to this question) and Users give their "UserAnswers" to those questions, assign how important that question is to them and what answers they accept from other users.
Then if you take User1 and User2, you look for commonly answered questions, i.e. UserAnswers where the question_id is the same. Say they have 10 questions in common. User1 gave the importance value 10 to the first five questions and the importance value 20 to the other five. User2 gave acceptable answers to two 20-value and three 10-value questions, for a total of 70 points. The highest reachable score is of course 20x5 + 10x5 = 150, so User2 reached 70/150 * 100 = 46.67%. The same thing is done the other way around for how much User1 reached of User2's assigned points to those questions. Those two percentages are then combined through the geometric mean, sqrt(percentage1 * percentage2), which gives the final match percentage.
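The arithmetic in the example above can be written out as a short Ruby sketch. The method names and the [importance, accepted] pair layout are assumptions made for illustration; they are not part of the schema above.

```ruby
# answers: one [importance, accepted] pair per common question, where
# importance is the value the first user assigned and accepted says
# whether the other user gave an answer the first user accepts.
def earned_percentage(answers)
  possible = answers.sum { |importance, _accepted| importance }
  return 0.0 if possible.zero?

  earned = answers.sum { |importance, accepted| accepted ? importance : 0 }
  earned.to_f / possible * 100
end

# Geometric mean of the two directional percentages.
def match_percentage(percentage1, percentage2)
  Math.sqrt(percentage1 * percentage2)
end

# User2 measured against User1's importance values, as in the example:
# three of five 10-point and two of five 20-point questions accepted.
user2_on_user1 = [[10, true]] * 3 + [[10, false]] * 2 +
                 [[20, true]] * 2 + [[20, false]] * 3
puts earned_percentage(user2_on_user1).round(2)
```

The geometric mean drops to zero when either direction scores zero, so a pair only ranks highly if both users accept each other's answers.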
@Wassem's answer seems spot on for your problem. I would also suggest you take an approach where percentages are updated on new answers and new accepted answers.
I have created a DB-only solution (gist), which would work but has the additional complexity of an intermediate table.
Ideally you should create two more tables, one for importance and another for percentage matches. You should insert, update or delete rows in these tables when a user assigns or updates the importance of an answer or marks an answer as acceptable. You can also use delayed_job or resque to update the tables in the background on those actions.
You may need to run the SQL once in a while to sync up the data in the two new tables, as inconsistencies can arise from concurrency and, in certain cases, from the ordering of update actions.
Updates on an accepted answer should be straightforward, as you only need to update one pair. But when somebody assigns importance to a question, there can be a lot of calculations and a lot of percentages might need updating. To avoid this you might choose to maintain only the table with sums of importance for each pair, update it when required, and calculate the actual percentages on the fly (in the DB, of course).
I suggest you keep the match percentages of the users in your database. Create a table matches that holds the match percentage for a pair of users. You do not need to save a match percentage for every pair of users. A valid match percentage is calculated for two users only when one of them has accepted an answer from the other. Most users will not accept the answers of most other users.
I suggest you calculate and save the match percentage not when a user visits another user's profile, but when a user accepts another user's answers. This makes sure that you do not make any unnecessary calculations and that the match percentage for a pair of users is always fresh.
