user matching system, efficient search approach? - ruby-on-rails

EDIT: I know it's been over a year, but I finally got something new to this problem. To see an update for this look at this question: Rails 3 user matching-algorithm to SQL Query (COMPLICATED)
I'm working on a site where users are matched based on answered questions.
The match percentage is calculated each time a user, for example, visits another user's profile page. So the match percentage is not stored in the database and is recalculated every time.
Now I want to build in a search where users can search for their best match.
The question I have is, what is the most efficient way to do this?
What if I have 50k users and I have to list them ordered by match percentages. Do I have to calculate each matching percentage between one and the other 50k users and then create a list out of that? Sounds kind of inefficient to me. Wouldn't that slow down the app drastically?
I hope someone can help me with this, because this gives me kind of a headache.
EDIT:
To clear things up a bit, here is my database model for user, questions, answers, user_answers and accepted_answers:
Tables:
Users(:id, :username, etc.)
Questions(:id, :text)
Answers(:id, :question_id, :text)
UserAnswers(:id, :user_id, :question_id, :answer_id, :importance)
AcceptedAnswers(:id, :user_answer_id, :answer_id)
Questions <-> Answers: one-to-many
Questions <-> UserAnswers: one-to-many
Users <-> UserAnswers: one-to-many
UserAnswers <-> AcceptedAnswers: one-to-many
So there is a list of Questions (each with its possible answers), and Users give their "UserAnswers" to those questions, assign how important each question is to them, and say which answers they accept from other users.
Then, if you take User1 and User2, you look for the questions both have answered, i.e. UserAnswers where the question_id is the same. Say they have 10 questions in common. User1 gave the importance value 10 to the first five questions and the importance value 20 to the other five. User2 gave acceptable answers to two of the 20-value questions and three of the 10-value questions, a total of 2x20 + 3x10 = 70 points. The highest reachable score is of course 20x5 + 10x5 = 150, so User2 reached 70/150 * 100 = 46.67%. The same thing is done the other way around to see how many of User2's assigned points User1 reached. Those two percentages are then combined through the geometric mean, sqrt(percentage1 * percentage2), which gives the final match percentage.
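The calculation described above can be sketched in plain Ruby. This is only a minimal sketch under assumptions: the UserAnswer struct is a hypothetical in-memory stand-in for a UserAnswers row together with its accepted answers, and the satisfaction and match_percentage helpers are illustrative names, not part of the app.

```ruby
# Hypothetical in-memory stand-in for a UserAnswers row plus its accepted answers.
UserAnswer = Struct.new(:question_id, :answer_id, :importance, :accepted_answer_ids)

# Share (0.0..1.0) of the judge's importance points that the other user earns
# on the questions both users have answered.
def satisfaction(judge, other)
  other_by_question = other.map { |ua| [ua.question_id, ua] }.to_h
  common = judge.select { |ua| other_by_question.key?(ua.question_id) }
  possible = common.sum(&:importance)
  return 0.0 if possible.zero?

  earned = common.sum do |ua|
    theirs = other_by_question[ua.question_id]
    ua.accepted_answer_ids.include?(theirs.answer_id) ? ua.importance : 0
  end
  earned.to_f / possible
end

# Geometric mean of both directions, expressed as a percentage.
def match_percentage(a, b)
  Math.sqrt(satisfaction(a, b) * satisfaction(b, a)) * 100
end

# The example from the question: User1 rates questions 1-5 with importance 10
# and questions 6-10 with importance 20, and accepts only answer 1.
user1 = (1..10).map { |q| UserAnswer.new(q, 1, q <= 5 ? 10 : 20, [1]) }
# User2 gives the accepted answer on three 10-value and two 20-value questions.
hits = [1, 2, 3, 6, 7]
user2 = (1..10).map { |q| UserAnswer.new(q, hits.include?(q) ? 1 : 2, 10, [1]) }

puts((satisfaction(user1, user2) * 100).round(2)) # 46.67
puts(match_percentage(user1, user2).round(2))
```

Here User2 accepts every answer User1 gave, so the reverse direction is 100% and the final value is simply the geometric mean of 46.67% and 100%.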

@Wassem's answer seems spot on for your problem. I would also suggest an approach where the percentages are updated whenever there are new answers or new accepted answers.
I have created a DB-only solution (gist), which would work but has the additional complexity of an intermediate table.
Ideally you should create two more tables, one for importance and another for percentage matches. You should insert, update or delete rows in these tables when a user assigns or updates the importance of an answer or marks an answer as acceptable. You can also leverage delayed_job or Resque to update the tables in the background on those actions.
You may need to run the SQL once in a while to sync up the data in the two new tables, as inconsistencies can arise from concurrency and, in certain cases, from the ordering of update actions.
Updates on an accepted answer should be straightforward, as you only need to update one pair. But when somebody assigns importance to a question, there can be a lot of calculation and a lot of percentages might need updating. To avoid this you might choose to maintain only the table with the sums of importance for each pair, update it when required, and calculate the actual percentages on the fly (in the DB, of course).
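The idea of keeping only per-pair sums and deriving the percentage on demand might look like the following minimal sketch in plain Ruby. The PairSums class and its adjust, ratio and match_percentage methods are hypothetical names, and a Hash stands in for the database table; in a real app the same bookkeeping would live in a table updated by the background job.

```ruby
# Hypothetical stand-in for one row of a pair-sums table: points earned and
# points possible for one ordered pair (judge_id, other_id).
PairSum = Struct.new(:earned, :possible)

class PairSums
  def initialize
    # Rows are created lazily with zero sums the first time a pair is adjusted.
    @rows = Hash.new { |hash, key| hash[key] = PairSum.new(0, 0) }
  end

  # Apply a delta to one ordered pair, e.g. after an answer is accepted.
  def adjust(judge_id, other_id, earned: 0, possible: 0)
    row = @rows[[judge_id, other_id]]
    row.earned += earned
    row.possible += possible
  end

  # Fraction of the judge's points that the other user has earned so far.
  def ratio(judge_id, other_id)
    row = @rows.fetch([judge_id, other_id], nil)
    return 0.0 if row.nil? || row.possible.zero?
    row.earned.to_f / row.possible
  end

  # The percentage is computed on the fly from the stored sums.
  def match_percentage(a_id, b_id)
    Math.sqrt(ratio(a_id, b_id) * ratio(b_id, a_id)) * 100
  end
end

sums = PairSums.new
sums.adjust(1, 2, earned: 70, possible: 150)  # user 2 earned 70 of user 1's 150 points
sums.adjust(2, 1, earned: 100, possible: 100) # user 1 earned all of user 2's points
sums.adjust(1, 2, earned: 10)                 # user 1 accepts one more 10-point answer
puts sums.match_percentage(1, 2).round(2)
```

Accepting one more answer touches a single row, which is why that update stays cheap; only a change of importance fans out to many pairs.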

I suggest you keep the match percentages in your database. Create a table matches that holds the match percentage for a pair of users. You do not need to save a match percentage for every pair of users. A valid match percentage exists for two users only when at least one of them has accepted an answer from the other, and most users will not accept the answers of most other users.
I suggest you calculate and save the match percentage not when a user visits another user's profile, but when a user accepts another user's answers. This makes sure that you do not do any unnecessary calculation and that the match percentage for a pair of users is always fresh.

Related

Is it possible to store a list of ids as an attribute for an object in rails?

I'm trying to implement a voting system where users can upvote/downvote links posted by other users. A user can only vote on a link once, so before I execute an upvote or downvote I need to check whether the user has already voted and, if so, whether they upvoted or downvoted, so that I can disable the button for the other.
There are a few ways to do this. The most immediate solution that comes to me is to have two additional columns in the link model, one to store a list of ids of users that upvoted and the other to store a list of ids of users that downvoted.
Two concerns arise in my mind. One, is this even considered a good practice (in terms of database efficiency) and if it is the best way to do it, how do I store a list of ids as an attribute for the model? What would be the data type I need to enter for the migration?
No, storing votes as a list of ids in a field is not good practice. It violates the first normal form (1NF) of your database. See the Wikipedia article on 1NF.
Imagine this on a scale of millions of votes: not only is the storage inefficient, you would also have to fetch and scan the whole list just to see whether a voter has voted for a given object.
A better solution is to have a "Vote" table with columns like "voter_id", "voted_for_id" and "vote_value".
Proper indexes will ensure that most of your operations stay efficient even on very large data, e.g. finding the number of upvotes/downvotes for a candidate, or finding whether a person has already voted for a candidate.
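To illustrate why a separate votes table with a composite index answers these questions cheaply, here is a minimal in-memory sketch in plain Ruby. The VoteStore class and its methods are hypothetical; the Hash keyed by (voter_id, voted_for_id) only imitates what a unique index on those two columns would give you in the database.

```ruby
# Hypothetical in-memory model of a votes table with a unique index on
# (voter_id, voted_for_id).
class VoteStore
  def initialize
    @votes = {}            # [voter_id, voted_for_id] => 1 or -1
    @totals = Hash.new(0)  # voted_for_id => running score
  end

  # Returns true if the vote was recorded, false if this voter already voted.
  def cast(voter_id, voted_for_id, value)
    raise ArgumentError, "value must be 1 or -1" unless [1, -1].include?(value)
    key = [voter_id, voted_for_id]
    return false if @votes.key?(key)

    @votes[key] = value
    @totals[voted_for_id] += value
    true
  end

  # The existing vote (1 or -1), or nil if the voter has not voted yet.
  def vote_of(voter_id, voted_for_id)
    @votes[[voter_id, voted_for_id]]
  end

  def score(voted_for_id)
    @totals[voted_for_id]
  end
end

store = VoteStore.new
store.cast(1, 42, 1)   # user 1 upvotes link 42
store.cast(1, 42, -1)  # rejected: user 1 has already voted on link 42
store.cast(2, 42, -1)  # user 2 downvotes link 42
puts store.vote_of(1, 42).inspect
puts store.score(42)
```

The lookup in vote_of is what decides which button to disable, and it never has to scan a list of ids.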
Is it possible to store a list of ids as an attribute for an object in rails?
Yes, it is possible. One way, on PostgreSQL, is to use an array column:
def change
  # PostgreSQL only: integer array column, empty by default
  add_column :links, :upvote_user_ids, :integer, array: true, default: []
end
is this even considered a good practice (in terms of database efficiency)
No, it is not at all recommended. Over time the list will keep growing and degrade your system.
Consider the acts_as_votable gem, which solves this problem elegantly.

calculated fields: to store in DB or not to store?

I am building a ruby on rails application where a user can learn words from a story (having many stories on his list of stories to learn from), and conversely, a story can belong to many users. Although the story is not owned by the user (it's owned by the author), the user can track certain personal things about each story that relate to him and only to him, such as how many words are left to learn in each of his stories (which will obviously differ from user to user).
Currently, I have a has_many :through relationship set up through a third table called users_stories. My concern/question has to do with "calculated fields": is it really necessary to store things like words_learnt_in_this_story (or conversely, words_not_yet_learnt_in_this_story) in the database? It seems to me that things like this could be calculated by simply looking at a list of all the words that the user has already learnt (present on his learnt_words_list), and then simply contrast/compare that master list with the list of words in the story in order to calculate how many words are unlearnt.
The dilemma here is that if this is the case, if all these fields can simply be calculated, then there seems to be no reason to have a separate model. If this is the case, then there should just be a join model in the middle and have it be a has_and_belongs_to_many relationship, no? Furthermore, in such a scenario, where do calculated attributes such as words_to_learn get stored? Or maybe they don't need to get stored at all, and rather just get calculated on the fly every time the user loads his homepage?
Any thoughts on this would be much appreciated! Thanks, Michael.
If you're asking "is it really necessary to store calculated values in the DB", my answer is no, it's not necessary.
But it can have advantages. For example, if you have lots of users and they trigger those calculations a lot, it can be a better strategy to calculate the values once in a while and store them. That will save your server resources.
Your real question now is "What will be more effective for you? Calculate values each time or calculate them once in a while and store in DB?"
In a true relational data model you don't need to store anything that can be calculated from the existing data.
If I understand you correctly, you just want to have a master word list (table) and reference those words in a relation. That is exactly how it should be modelled in a relational database, and I suggest you stick with it for consistency reasons. Just make sure you set the indexes right in the database.
If further down the road you run into performance issues (usually you don't), you can solve those problems then with caching, views, etc.
It is not necessary to store calculated values in the DB, but if the values are often used in logic or views, it's a good idea to store them in the database once (recalculating on change) and use them from there rather than calculating them in views or models.
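The "calculate once, recalculate on change" idea from the answers above can be sketched in plain Ruby. This is a minimal sketch under assumptions: StoryProgress is a hypothetical class, the word lists are plain strings, and an instance variable plays the role of the stored column.

```ruby
require "set"

# Hypothetical tracker for one user and one story.
class StoryProgress
  def initialize(story_words)
    @story_words = story_words.to_set
    @learnt = Set.new
    @words_left = nil # cached count, nil means "needs recalculating"
  end

  # Record a learnt word; the cache is dropped only if the word is new.
  def learn(word)
    @words_left = nil if @learnt.add?(word)
  end

  # Calculated on first use after a change, then served from the cache.
  def words_left
    @words_left ||= (@story_words - @learnt).size
  end
end

progress = StoryProgress.new(%w[cat dog bird])
progress.learn("cat")
puts progress.words_left # 2
```

The set difference is the on-the-fly calculation from the question; the cached count corresponds to a stored column that is refreshed whenever the learnt list changes.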

Display Sorted Column on Static Page in Rails

I am creating a website that allows users to evaluate their coworkers. My boss would like the averages to be displayed from best to worst on a static page that she can print and hang up in our store, so the employees can see their results compared to other employees. I have been searching for awhile now on how to easily sort a column. I found a Railscast on sorting columns, but it seems a lot more detailed than I truly need. I found the order API, but I don't think I'm implementing it the way I need to. I am hoping that maybe there is a one-liner that can help me solve this problem, such as:
@user = User.all.order(average: :asc)
Where I can load a static page that prints the user's name and their score. Thank you in advance!
Have you tried that code of yours? It should do exactly what you're asking except from lowest score to highest score.
You could simplify it a little and sort from highest to lowest by doing:
@users = User.order(average: :desc)
Like MarsAtomic said, this assumes that you actually have a column in your users table called average. If not we need more information on how your database is set up.

Rails 4 - Ordering by something not stored in the database

I am using Rails 4. I have a Room model with hour_price day_price and week_price attributes.
On the index, users are able to enter different times and dates they would like to stay in a room. Based on these values, I have a helper method that then calculates the total price it would cost them using the price attributes mentioned above.
My question is what is the best way to sort through the rooms and order them least to greatest (in terms of price). Having a hard time figuring out the best way to do this, especially when considering the price value is calculated by a helper and isn't stored in the database.
You could load all of them and do an array sort, as is suggested here, and here. That would not scale well, but if you've already filtered down to the rooms that are available it might be sufficient.
You might be able to push it back to the database by building a custom sql order by.
Room.order("(#{days.to_i} * day_price) asc")
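The in-memory alternative (load the filtered rooms, then sort the array by the calculated price) could look like this minimal sketch in plain Ruby. The Room struct is a stand-in for the model, and total_price is a hypothetical pricing rule, since the question does not show the actual helper.

```ruby
# Stand-in for the Room model with the three price attributes.
Room = Struct.new(:name, :hour_price, :day_price, :week_price)

# Hypothetical pricing rule: charge whole weeks, then whole days, then hours.
def total_price(room, hours)
  weeks, rest = hours.divmod(24 * 7)
  days, remaining_hours = rest.divmod(24)
  weeks * room.week_price + days * room.day_price + remaining_hours * room.hour_price
end

rooms = [
  Room.new("Suite", 20, 150, 900),
  Room.new("Studio", 10, 100, 450),
  Room.new("Loft", 12, 90, 500)
]

stay_hours = 2 * 24 + 3 # two days and three hours

# Least to greatest by the calculated price.
sorted = rooms.sort_by { |room| total_price(room, stay_hours) }
sorted.each { |room| puts "#{room.name}: #{total_price(room, stay_hours)}" }
```

sort_by calls the price calculation once per room, so this stays reasonable as long as the list has already been narrowed to available rooms.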

How can I access complex data (like children data) in the Rails index view?

I'm developing a Rails app that will display foods with their nutrients. I want to show only the nutrients that the user wants to see.
So, I have the models:
Food:
Nutrient:
FoodNutrient: Specifies the quantity of each nutrient in each food
UserNutrient: Specifies which nutrients the user wants to see
I will have thousands of foods and more than 100 nutrients
I saw several sources that give hints on how to deal with this type of complexity (for now I'm considering trying Arel). However, these sources usually provide neither examples nor hints on how to deal with this in the views. I found this one but would love more opinions on the issue, especially concerning the large amount of data involved.
So, what is the best way to deal with this in my index view?
Another doubt I have is whether it is better for performance to have the FoodNutrient model, or to add columns to the Food model where each new column represents a nutrient. I suppose the FoodNutrient option is better, as the user will choose which nutrients to see, but I'm not sure.
I would appreciate any example, explanation, advice, feedback or reference that may help me.
Edited
As there were some comments from people that didn't understand my question, I will try to summarize it in other words.
I want to get data from the first 3 models, and the last one (UserNutrient) I would use to reduce the number of rows shown to the user.
As I want to show something like:
Food Name | Nutrient 1 | Nutrient 2 | Nutrient 3
----------+------------+------------+-----------
Food 1    | 10         | 40         | 7.3
Food 2    | 9          | 4.4        | 9.1
I understand that I would have one loop over Food that iterates once per row shown above. Inside that loop I would also iterate over UserNutrient to show the quantity of each selected nutrient in each food (the quantity itself is stored in FoodNutrient). The main question is how to write these loops, especially considering that the tables will hold lots of data. This one seems somewhat similar, although I didn't understand it well.
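The two nested loops described above can be sketched in plain Ruby. This is only a sketch under assumptions: the structs stand in for the Food, Nutrient and FoodNutrient models, wanted_nutrient_ids stands in for the user's UserNutrient rows, and nutrient_rows is a hypothetical helper. The quantities are indexed once in a Hash so the inner loop is a lookup rather than a search.

```ruby
# Stand-ins for the models in the question.
Food = Struct.new(:id, :name)
Nutrient = Struct.new(:id, :name)
FoodNutrient = Struct.new(:food_id, :nutrient_id, :quantity)

# Builds the table rows: one header row, then one row per food with only
# the nutrients the user selected.
def nutrient_rows(foods, nutrients, food_nutrients, wanted_nutrient_ids)
  columns = nutrients.select { |n| wanted_nutrient_ids.include?(n.id) }

  # Index quantities by [food_id, nutrient_id] once, up front.
  quantity = {}
  food_nutrients.each { |fn| quantity[[fn.food_id, fn.nutrient_id]] = fn.quantity }

  header = ["Food Name"] + columns.map(&:name)
  body = foods.map do |food|
    [food.name] + columns.map { |n| quantity[[food.id, n.id]] }
  end
  [header] + body
end

foods = [Food.new(1, "Food 1"), Food.new(2, "Food 2")]
nutrients = (1..4).map { |i| Nutrient.new(i, "Nutrient #{i}") }
food_nutrients = [
  FoodNutrient.new(1, 1, 10), FoodNutrient.new(1, 2, 40), FoodNutrient.new(1, 3, 7.3),
  FoodNutrient.new(2, 1, 9), FoodNutrient.new(2, 2, 4.4), FoodNutrient.new(2, 3, 9.1),
  FoodNutrient.new(1, 4, 99) # a nutrient the user did not select
]

rows = nutrient_rows(foods, nutrients, food_nutrients, [1, 2, 3])
rows.each { |row| puts row.join(" | ") }
```

In a view, the outer loop would render one table row per entry of rows and the inner loop one cell per value.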
My other doubt is if the structure is the best one. The FoodNutrient and Food tables could be merged.
I have researched this problem and for now I have decided to merge the FoodNutrient and Food tables/models into Food.
I believe that a FoodNutrient table with lots of rows would be worse, as it would have a huge index, than a Food table with lots of columns.
This article helped me to decide:
http://rails-bestpractices.com/posts/58-select-specific-fields-for-performance
If you have something to add, please, answer the question too or add a comment.
