Storing data popularity in a Ruby on Rails app

I was reading this Stack Overflow question and was wondering: what is a common practice for storing popularity values for data in a Ruby on Rails application?
My thinking is to have 2 models, a regular model and a popular one that has data from the regular model sorted by a popularity formula. A cronjob would populate the latter model at some specific interval.
Any thoughts?

It would depend on the specifics, but I think you can store the popularity information as a column in the model. For example, if you had Questions which you wanted to sort by popularity, you could generate a migration with rails generate migration AddPopularityToQuestions popularity:float.
You could then run a script at set intervals (e.g. with Whenever) to update the popularity value for each question. However, if there isn't that much activity, it might make more sense to update the popularity for a question whenever something happens that will change it. For example, if popularity is mainly determined by votes, you could update a question's popularity whenever there are new votes.
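A minimal sketch of the "update on every vote" idea, in plain Ruby without Rails. The Question struct, the register_upvote helper and the formula (net votes damped by age) are illustrative assumptions, not something prescribed above:

```ruby
# Hypothetical stand-in for a Question record with a popularity column.
Question = Struct.new(:title, :upvotes, :downvotes, :created_at, :popularity)

# Illustrative formula (an assumption): net votes damped by age in hours.
def popularity_for(question, now)
  age_hours = (now - question.created_at) / 3600.0
  net_votes = question.upvotes - question.downvotes
  net_votes / ((age_hours + 2)**1.5)
end

# Recompute the stored value whenever a vote arrives.
def register_upvote(question, now)
  question.upvotes += 1
  question.popularity = popularity_for(question, now)
  question
end

now = Time.at(1_000_000)
fresh = Question.new("New question", 3, 0, now - 3600, 0.0)
stale = Question.new("Old question", 3, 0, now - 48 * 3600, 0.0)
register_upvote(fresh, now)
register_upvote(stale, now)
```

With the same number of votes, the newer question ends up with the higher stored popularity. In a Rails app the same recalculation could live in a callback on the vote model or in the scheduled script.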

Use a single rake task to update the popularity and run it with Heroku Scheduler.

I would imagine popularity to be statistical. For example, views over the last 7 days, etc. The measurement you might want to use might change, even allowing the user to select which formula to use.
In this case, you'd want a Statistics model that belongs_to the regular model, and you can join the tables to get the popularity measurement that you're interested in.

I would go with a popularity_score field in the model and a scope that returns the popular items, for example:
def self.popular(count = 10)
  order('popularity_score DESC').limit(count)
end
Depending on the scoring algorithm, I might add an additional model to hold the statistics:
class ModelStat < ActiveRecord::Base
  # statistic and value are columns on the model_stats table
  belongs_to :model
end
This could hold stats like views, up_votes or shares which would be periodically aggregated using your preferred popularity algorithm and the result saved into popularity_score. (managed as a rake task kicked off by cron or similar)
I would make sure that popularity_score was an indexed field!
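To make the aggregation step concrete, here is a rough plain-Ruby sketch of what the periodic task might compute. The weights, the sample rows and the popularity_scores helper are assumptions for illustration only:

```ruby
# Hypothetical weights per statistic (an assumption for illustration).
WEIGHTS = { "views" => 1, "up_votes" => 5, "shares" => 10 }.freeze

# Each row mimics a model_stats record: which item, which statistic, its value.
stats = [
  { model_id: 1, statistic: "views",    value: 120 },
  { model_id: 1, statistic: "up_votes", value: 4 },
  { model_id: 2, statistic: "views",    value: 30 },
  { model_id: 2, statistic: "shares",   value: 12 },
]

# Aggregate all statistics for each item into one weighted score.
def popularity_scores(stats, weights)
  stats.each_with_object(Hash.new(0)) do |row, scores|
    scores[row[:model_id]] += weights.fetch(row[:statistic], 0) * row[:value]
  end
end

scores = popularity_scores(stats, WEIGHTS)
# Equivalent of the popular scope: highest score first, limited to a count.
top = scores.sort_by { |_id, score| -score }.first(10).map(&:first)
```

In the real app the result of popularity_scores would be written back to the popularity_score column, so that reads only ever touch the indexed field.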

Related

Is it possible to store a list of ids as an attribute for an object in rails?

I'm trying to implement a voting system where users can upvote/downvote links posted by other users. A user can only vote on a link once, so before I execute upvote or downvote I need to check whether the user has already voted and, if so, whether they upvoted or downvoted, so that I can disable the button for the other.
There are a few ways to do this. The most immediate solution that comes to me is to have two additional columns in the link model, one to store a list of ids of users that upvoted and the other to store a list of ids of users that downvoted.
Two concerns arise in my mind. One, is this even considered a good practice (in terms of database efficiency) and if it is the best way to do it, how do I store a list of ids as an attribute for the model? What would be the data type I need to enter for the migration?
No, storing votes as a list of ids in a field is not good practice. It violates the first normal form (1NF) of your database.
Imagine this at a scale of millions of votes: not only is the storage inefficient, but you would also have to fetch and scan the whole list just to see whether a voter has voted for a given object.
A better solution is to have a "Vote" table with columns like "voter_id", "voted_for_id" and "vote_value".
Proper indexes will ensure that most operations stay efficient even on very large data, e.g. finding the number of upvotes/downvotes for a candidate, or finding whether a person has already voted for a candidate.
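As a rough illustration of why one row per vote works well, here is an in-memory stand-in for such a table, keyed the way a unique index on (voter_id, voted_for_id) would be. The VoteTable class and its method names are hypothetical:

```ruby
# In-memory stand-in for a votes table with a unique index on
# (voter_id, voted_for_id). All names here are illustrative assumptions.
class VoteTable
  def initialize
    @votes = {} # [voter_id, voted_for_id] => vote_value (+1 or -1)
  end

  # Returns true if the vote was recorded, false if this voter already voted.
  def cast(voter_id, voted_for_id, vote_value)
    key = [voter_id, voted_for_id]
    return false if @votes.key?(key)
    @votes[key] = vote_value
    true
  end

  # The previous vote value for this pair, or nil if none exists.
  def vote_by(voter_id, voted_for_id)
    @votes[[voter_id, voted_for_id]]
  end

  # Net score for one item: upvotes minus downvotes.
  def score(voted_for_id)
    @votes.sum { |(_voter, target), value| target == voted_for_id ? value : 0 }
  end
end

votes = VoteTable.new
votes.cast(1, 42, +1)
votes.cast(2, 42, -1)
votes.cast(3, 42, +1)
duplicate = votes.cast(1, 42, -1)
```

The lookup in vote_by is what lets the view decide which button to disable, and the duplicate vote is rejected without scanning any list.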
"Is it possible to store a list of ids as an attribute for an object in rails?"
Yes, it is possible. One way, assuming PostgreSQL and Rails 4 or later, is an array column:
def change
  add_column :links, :upvote_user_ids, :integer, array: true, default: []
end
"is this even considered a good practice (in terms of database efficiency)"
No, it is not recommended at all. Over time the list will keep growing and degrade your system.
Consider the acts_as_votable gem, which solves this problem elegantly.

calculated fields: to store in DB or not to store?

I am building a ruby on rails application where a user can learn words from a story (having many stories on his list of stories to learn from), and conversely, a story can belong to many users. Although the story is not owned by the user (it's owned by the author), the user can track certain personal things about each story that relate to him and only to him, such as how many words are left to learn in each of his stories (which will obviously differ from user to user).
Currently, I have a has_many :through relationship set up through a third table called users_stories. My concern/question has to do with "calculated fields": is it really necessary to store things like words_learnt_in_this_story (or conversely, words_not_yet_learnt_in_this_story) in the database? It seems to me that things like this could be calculated by simply looking at a list of all the words that the user has already learnt (present on his learnt_words_list), and then simply contrast/compare that master list with the list of words in the story in order to calculate how many words are unlearnt.
The dilemma here is that if this is the case, if all these fields can simply be calculated, then there seems to be no reason to have a separate model. If this is the case, then there should just be a join model in the middle and have it be a has_and_belongs_to_many relationship, no? Furthermore, in such a scenario, where do calculated attributes such as words_to_learn get stored? Or maybe they don't need to get stored at all, and rather just get calculated on the fly every time the user loads his homepage?
Any thoughts on this would be much appreciated! Thanks, Michael.
If you're asking "is it really necessary to store calculated values in the DB", the answer is no, it's not necessary.
But it can have advantages. For example, if you have lots of users who trigger these calculations often, it could be a better strategy to calculate the values once in a while and store them. That saves server resources.
Your real question now is: "What will be more effective for you? Calculating the values each time, or calculating them once in a while and storing them in the DB?"
In a true relational data model you don't need to store anything that can be calculated from the existing data.
If I understand you correctly, you just want to have a master word list (table) and reference those words in a relation. That is exactly how it should be modelled in a relational database, and I suggest you stick with it for consistency reasons. Just make sure you set the indices right in the database.
If you run into performance issues further down the road (usually you don't), you can solve those problems then with caching, views, etc.
It is not necessary to store calculated values in the DB, but if the values are used often in logic or views, it's a good idea to store them in the database once (recalculating on change) and read them from there rather than calculating in views or models.
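To compare the two options discussed in these answers, here is a small plain-Ruby sketch. The word lists, the unlearnt_words helper and the cached_counts hash are made up for illustration:

```ruby
require "set"

# Hypothetical data: the user's master list of learnt words and one story's words.
learnt_words = Set.new(%w[cat dog house tree])
story_words  = Set.new(%w[cat river tree mountain dog])

# Computed on the fly: words in the story the user has not learnt yet.
def unlearnt_words(story_words, learnt_words)
  story_words - learnt_words
end

# Stored alternative: keep the count and recalculate only when the
# learnt list changes.
cached_counts = {}
cached_counts[:story_1] = unlearnt_words(story_words, learnt_words).size

learnt_words << "river"
cached_counts[:story_1] = unlearnt_words(story_words, learnt_words).size
```

The on-the-fly version is a single set difference, so it is probably cheap enough until the lists get large or the page is hit very often. The stored version trades that work for the need to refresh the count on every change.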

Rails 4 - Ordering by something not stored in the database

I am using Rails 4. I have a Room model with hour_price, day_price and week_price attributes.
On the index, users are able to enter different times and dates they would like to stay in a room. Based on these values, I have a helper method that then calculates the total price it would cost them using the price attributes mentioned above.
My question is what is the best way to sort through the rooms and order them least to greatest (in terms of price). Having a hard time figuring out the best way to do this, especially when considering the price value is calculated by a helper and isn't stored in the database.
You could load all of them and do an array sort, as suggested here and here. That would not scale well, but if you've already filtered down to the rooms that are available it might be sufficient.
You might be able to push it back to the database by building a custom SQL ORDER BY:
Room.order("(#{days.to_i} * day_price) ASC")
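For the in-memory option, a sketch along these lines might do. The Room struct stands in for the real model, and the pricing rule in total_price (full weeks, then days, then hours) is only an assumption about what the helper does:

```ruby
# Hypothetical stand-in for the Room model with its three price attributes.
Room = Struct.new(:name, :hour_price, :day_price, :week_price)

# Assumed pricing rule for illustration: charge full weeks, then days, then hours.
def total_price(room, hours)
  weeks, rest = hours.divmod(24 * 7)
  days, hrs = rest.divmod(24)
  weeks * room.week_price + days * room.day_price + hrs * room.hour_price
end

rooms = [
  Room.new("Suite", 20, 150, 900),
  Room.new("Single", 10, 80, 500),
  Room.new("Double", 15, 100, 600),
]

# Sort in memory, cheapest first, for a 3-day stay (72 hours).
sorted = rooms.sort_by { |room| total_price(room, 72) }
```

sort_by calls the price calculation once per room, which is fine for a filtered list of available rooms but is the part that will not scale to a whole table.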

Optimising a large number of rows in a Rails app database

We have a number of Meters which read a number of Parameters at a given interval and upload the data (in CSV format) to a MySQL database.
I have modelled this in Rails as follows:
Meter
  has_many :parameters
Parameter
  belongs_to :meter
  has_many :readings
Reading
  belongs_to :parameter
(I've used normal foreign keys - meter_id and parameter_id - to link the tables)
This is working great with my seed data and I'm using self.readings.pluck(:value).latest in my Parameter model in order to grab the latest value and pass it to the view.
The only problem is that the meters upload the data every 30 seconds. This means that, as there are currently 20 parameters, just over a month's worth of data has left me with over 20,000,000 rows in my Readings table, and the queries to grab the latest value are taking around 500ms each.
I'm after suggestions of ways to optimise this. I've added an index to the parameter_id field but, other than that, I'm not really sure of the best way to proceed...
It may be that I need to rethink the way that my database is structured, but this seemed to make most sense as I want to be able to dynamically add new parameters down the line (hence why I couldn't just make my columns the parameter names) and this seems to be the way that Rails stores data by default.
Thanks in advance.
If you are using Rails 3 and want to keep using a relational database, your best option is table partitioning.
If you use PostgreSQL, you can use the partitioned gem; check these slides for an overview.
If you want to use Rails 4, the partitioned gem is not compatible with ActiveRecord 4, so I would advise manual partitioning. You can use the year as your partition point, for example.
Check this blog post on Sharding and Partitioning and evaluate what would work best.
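To show the idea behind using the year as the partition point, here is a toy plain-Ruby sketch. The Reading struct, the partition_for helper and the "readings_YYYY" naming scheme are assumptions, and real partitioning would of course happen in the database rather than in Ruby:

```ruby
# Hypothetical reading rows: parameter id, value, and the time it was taken.
Reading = Struct.new(:parameter_id, :value, :taken_at)

# Assumed naming scheme: one partition (table) per year, e.g. "readings_2014".
def partition_for(reading)
  "readings_#{reading.taken_at.year}"
end

readings = [
  Reading.new(1, 20.5, Time.utc(2013, 12, 31, 23, 59, 30)),
  Reading.new(1, 20.7, Time.utc(2014, 1, 1, 0, 0, 0)),
  Reading.new(2, 3.1,  Time.utc(2014, 1, 1, 0, 0, 30)),
]

# Group rows by partition so a "latest value" query only scans the newest one.
partitions = readings.group_by { |reading| partition_for(reading) }
latest = partitions["readings_2014"]
  .select { |reading| reading.parameter_id == 1 }
  .max_by(&:taken_at)
```

The point is that a query for the latest value only needs to look at the current partition, so its cost stops growing with the total history.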

Rails Caching DB Queries and Best Practices

The DB load on my site is getting really high so it is time for me to cache common queries that are being called 1000s of times an hour where the results are not changing.
So for instance on my city model I do the following:
def self.fetch(id)
  Rails.cache.fetch("city_#{id}") { City.find(id) }
end
def after_save
  Rails.cache.delete("city_#{self.id}")
end
def after_destroy
  Rails.cache.delete("city_#{self.id}")
end
So now when I call City.fetch(1) the first time I hit the DB, but the next 1000 times I get the result from memory. Great. But most of the calls to city are not City.find(1) but @user.city.name, where Rails does not use the fetch but queries the DB again... which makes sense, but is not exactly what I want it to do.
I can do City.fetch(@user.city_id) but that is ugly.
So my question to you guys: what are the smart people doing? What is the right way to do this?
With respect to the caching, a couple of minor points:
It's worth using a slash to separate object type and id, which is the Rails convention. Even better, ActiveRecord models provide the cache_key instance method, which gives an identifier built from the table name and id, "cities/13" etc.
One minor correction to your after_save callback. Since you have the data on hand, you might as well write it back to the cache as opposed to deleting it. That saves you a single trip to the database ;)
def after_save
  Rails.cache.write(cache_key, self)
end
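To see the difference between deleting and writing back, here is a tiny in-memory cache with the same fetch/write/delete shape as Rails.cache. TinyCache, load_city and the city hashes are stand-ins invented for this sketch, not Rails code:

```ruby
# Minimal in-memory cache with the same fetch/write/delete shape as Rails.cache.
# This is a stand-in for illustration, not the Rails implementation.
class TinyCache
  def initialize
    @store = {}
  end

  # Return the cached value, or run the block once and cache its result.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end

  def write(key, value)
    @store[key] = value
  end

  def delete(key)
    @store.delete(key)
  end
end

cache = TinyCache.new
db_hits = 0
load_city = lambda do |id|
  db_hits += 1 # pretend this is City.find(id)
  { id: id, name: "Springfield" }
end

cache.fetch("cities/1") { load_city.call(1) } # miss: hits the "database"
cache.fetch("cities/1") { load_city.call(1) } # hit: served from memory

# Write-through on save: replace the entry instead of deleting it.
cache.write("cities/1", { id: 1, name: "Shelbyville" })
city = cache.fetch("cities/1") { load_city.call(1) }
```

After the write, the next fetch returns the updated record without another load, whereas a delete would have forced one more trip to the database.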
As to the root of the question, if you're continuously pulling @user.city.name, there are two real choices:
Denormalize the user's city name to the user row: @user.city_name (keep the city_id foreign key). This value should be written at save time.
-or-
Implement your User.fetch method to eager load the city. Only do this if the contents of the city row never change (i.e. name etc.), otherwise you can potentially open up a can of worms with respect to cache invalidation.
Personal opinion:
Implement basic id based fetch methods (or use a plugin) to integrate with memcached, and denormalize the city name to the user's row.
I'm personally not a huge fan of cached model style plugins, I've never seen one that's saved a significant amount of development time that I haven't grown out of in a hurry.
If you're getting way too many database queries, it's definitely worth checking out eager loading (through :include, or the includes query method in Rails 3 and later) if you haven't already. That should be the first step in reducing the number of database queries.
If you need to speed up SQL queries on data that doesn't change much over time, you can use materialized views.
A matview stores the results of a query into a table-like structure of its own, from which the data can be queried. It is not possible to add or delete rows, but the rest of the time it behaves just like an actual table. Queries are faster, and the matview itself can be indexed.
At the time of this writing, matviews are natively available in Oracle DB, PostgreSQL, Sybase, IBM DB2, and Microsoft SQL Server. MySQL doesn’t provide native support for matviews, unfortunately, but there are open source alternatives to it.
Here are some good articles on how to use matviews in Rails:
sitepoint.com/speed-up-with-materialized-views-on-postgresql-and-rails
hashrocket.com/materialized-view-strategies-using-postgresql
I would go ahead and take a look at Memoization, which is now in Rails 2.2.
"Memoization is a pattern of initializing a method once and then stashing its value away for repeat use."
There was a great Railscast episode on it recently that should get you up and running nicely.
Quick code sample from the Railscast:
class Product < ActiveRecord::Base
  extend ActiveSupport::Memoizable
  belongs_to :category
  def filesize(num = 1)
    # some expensive operation
    sleep 2
    12345789 * num
  end
  memoize :filesize
end
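ActiveSupport::Memoizable was later deprecated (around Rails 3.2, as far as I know), so on newer versions the same pattern is usually written by hand. A minimal sketch in plain Ruby, where the calls counter and the expensive_filesize helper are added purely for illustration:

```ruby
class Product
  attr_reader :calls

  def initialize
    @calls = 0
    @filesize_cache = {}
  end

  # Memoized per argument: the expensive body runs once for each distinct num.
  def filesize(num = 1)
    @filesize_cache.fetch(num) do
      @filesize_cache[num] = expensive_filesize(num)
    end
  end

  private

  # Stand-in for the slow operation from the Railscast sample.
  def expensive_filesize(num)
    @calls += 1
    12345789 * num
  end
end

product = Product.new
first = product.filesize(2)
second = product.filesize(2)
```

Note that this only caches per object, so it helps within one request but does not replace a shared cache such as memcached.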
More on Memoization
Check out cached_model
