How do I avoid constantly re-calculating summary data using Rails? - ruby-on-rails

I have a user profile page that has a sidebar with user stats, not unlike the Stack Overflow profile page (e.g., total visits, number of badges).
The trouble is that currently I'm hitting the database and calculating these stats with every single request. I can implement fragment caching to cut down on this, but is there a better way to handle this type of thing?
Storing the aggregated summary data in the database seems like it might lead to problems (i.e., inconsistency).

You could store this information in the database instead of recalculating it, using:
Counter caching
Custom callbacks
Counter Caching
As an example, if measuring the number of badges, you could create a database field in User called badges_count, then in the Badge model, have belongs_to :user, :counter_cache => true. Rails then keeps the field up to date whenever a badge is created or destroyed, and you can read the count without any new calculations via @user.badges_count.
A basic implementation: http://asciicasts.com/episodes/23-counter-cache-column
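For illustration, the bookkeeping that a counter cache performs can be sketched in plain Ruby, without ActiveRecord. The User and Badge classes below are simplified stand-ins written for this sketch, not Rails models; in Rails the same effect comes from the badges_count column plus the belongs_to declaration shown in the comment.

```ruby
# Plain-Ruby stand-ins for the models; not ActiveRecord.
# In Rails the equivalent declaration would be:
#   class Badge < ActiveRecord::Base
#     belongs_to :user, :counter_cache => true
#   end
class User
  attr_reader :badges
  attr_accessor :badges_count

  def initialize
    @badges = []
    @badges_count = 0 # plays the role of the cached count column
  end
end

class Badge
  attr_reader :user, :name

  # Creating a badge increments the owner's cached count.
  def initialize(user, name)
    @user = user
    @name = name
    user.badges << self
    user.badges_count += 1
  end

  # Destroying a badge decrements it again (only once per badge).
  def destroy
    return unless user.badges.delete(self)
    user.badges_count -= 1
  end
end

user = User.new
gold = Badge.new(user, "gold")
Badge.new(user, "silver")
gold.destroy
puts user.badges_count # reads the stored value; no recount needed
```

The point of the sketch is that both the create path and the destroy path touch the stored count, which is what keeps it in step with the real number of badges.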
Custom Callbacks
Let's say you have a field that measures behavior that is more complex than a simple count. In this case, just implement callbacks that update a field whenever a certain action occurs using before_save, after_save, before_create, etc.
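As a rough sketch of that idea in plain Ruby (not ActiveRecord), the example below keeps a stored score that is more than a simple count. Profile, Post, the score field and the weights are all hypothetical names chosen for illustration; in Rails the explicit call inside save would be replaced by registering the method with after_save.

```ruby
# Plain-Ruby sketch of a callback-maintained summary field; not ActiveRecord.
class Profile
  attr_accessor :score # stored summary, more complex than a plain count
  attr_reader :posts

  def initialize
    @score = 0
    @posts = []
  end

  # Recalculate only when something changed, not on every page view.
  def refresh_score
    self.score = posts.sum { |post| post.votes * 10 + (post.accepted ? 15 : 0) }
  end
end

class Post
  attr_accessor :votes, :accepted

  def initialize(profile, votes: 0, accepted: false)
    @profile = profile
    @votes = votes
    @accepted = accepted
  end

  def save
    @profile.posts << self unless @profile.posts.include?(self)
    after_save
    self
  end

  private

  # Stand-in for a method registered with after_save in Rails.
  def after_save
    @profile.refresh_score
  end
end

profile = Profile.new
post = Post.new(profile, votes: 2).save
post.accepted = true
post.save
puts profile.score
```

A page view then only reads profile.score; the weighted sum is recomputed on writes, which are usually far less frequent.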
Concerns about Inconsistency
Storing the data in your database will only be inconsistent if you're doing it wrong. There are a finite number of paths through which any statistic can be updated, and you should ensure that all paths are covered in updating whichever field you are using. Rails does it for you with counter_cache, and you have to do it yourself if you use custom callbacks or you have some unusual situation.

You could use a hidden div like in this question (Caching data by using hidden divs). Depending on how much data you want to cache this might be a good solution.

Related

calculated fields: to store in DB or not to store?

I am building a ruby on rails application where a user can learn words from a story (having many stories on his list of stories to learn from), and conversely, a story can belong to many users. Although the story is not owned by the user (it's owned by the author), the user can track certain personal things about each story that relate to him and only to him, such as how many words are left to learn in each of his stories (which will obviously differ from user to user).
Currently, I have a has_many :through relationship set up through a third table called users_stories. My concern/question has to do with "calculated fields": is it really necessary to store things like words_learnt_in_this_story (or conversely, words_not_yet_learnt_in_this_story) in the database? It seems to me that things like this could be calculated by simply looking at a list of all the words that the user has already learnt (present on his learnt_words_list), and then simply contrast/compare that master list with the list of words in the story in order to calculate how many words are unlearnt.
The dilemma here is that if this is the case, if all these fields can simply be calculated, then there seems to be no reason to have a separate model. If this is the case, then there should just be a join model in the middle and have it be a has_and_belongs_to_many relationship, no? Furthermore, in such a scenario, where do calculated attributes such as words_to_learn get stored? Or maybe they don't need to get stored at all, and rather just get calculated on the fly every time the user loads his homepage?
Any thoughts on this would be much appreciated! Thanks, Michael.
If you're asking "is it really necessary to store calculated values in the DB", the answer is no, it's not necessary.
But it can have advantages. For example, if you have lots of users and those users trigger the calculation often, it could be a better strategy to calculate the values once in a while and store them. That saves server resources.
Your real question now is "What will be more effective for you: calculating the values each time, or calculating them once in a while and storing them in the DB?"
In a true relational data model you don't need to store anything that can be calculated from the existing data.
If I understand you correctly you just want to have a master word list (table) and just reference those words in a relation. That is exactly how it should be modelled in a relational database and I suggest you stick with it for consistency reasons. Just make sure you set the indices right in the database.
If further down the road you run into performance issues (usually you don't) you can solve those problems then with caching, views, etc.
It is not necessary to store calculated values in the DB, but if the values are often used in logic or views it's a good idea to store them in the database once (recalculating on change) and read them from there rather than calculating them in views or models.
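To make the "calculate on the fly" option concrete, here is a minimal plain-Ruby sketch of the comparison described in the question. The word lists and the words_to_learn helper are made up for illustration; in the real application both lists would come from the database.

```ruby
require "set"

# Hypothetical data: words the user has learnt, and the words in one story.
learnt_words = Set.new(%w[cat dog house river])
story_words  = %w[the cat sat by the river and the old mill]

# Words still to learn = distinct story words minus the learnt list.
def words_to_learn(story_words, learnt_words)
  story_words.uniq.reject { |word| learnt_words.include?(word) }
end

remaining = words_to_learn(story_words, learnt_words)
puts remaining.size
puts remaining.join(", ")
```

Because the result depends only on the two lists, nothing has to be stored per user and story unless the calculation later proves too slow.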

rails sums model

I have a lot of sum functions in my app. I am moving all data sums to a separate model; let's call it sums. This will make retrieving those numbers faster, as I won't hit the database every time I need to sum many rows.
What I would like to do is update the sum row (many attributes) each time certain other models are created or updated. I am trying to do this from each model using a class method, but for some reason it's not working.
I'm wondering where I can create a universal method that can be called from an after_create callback in whichever models I choose. The sum is associated with an account, which is not the model being updated. Other account-associated models are the ones that will update the sums model. Therefore, I will probably need to pass self.account_id to the callback method.
This would be like PaperTrail's versions model, which is updated every time another model is saved.
Have a look at Statistics gem.
It has a cache option that uses Rails cache to prevent repeated aggregation calls to database.
I would prefer something like this to firing an update every time a property changes. If you are not planning to hold the sums model in memory, then you will have to fire as many updates to the sums table as there are actual updates.
While it may be OK to have write-time overheads, as opposed to read-time, Statistics comes with other options to optimize.
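The read-time caching idea can be sketched in plain Ruby as a small read-through cache with expiry. This is not the Statistics gem's API; SumCache and its methods are hypothetical names, and the behaviour is modelled loosely on what Rails.cache.fetch with an expires_in option provides.

```ruby
# Minimal read-through cache with expiry. All names here are hypothetical.
class SumCache
  Entry = Struct.new(:value, :expires_at)

  # clock is injectable so expiry can be exercised without sleeping.
  def initialize(ttl_seconds, clock: -> { Time.now })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}
  end

  # Returns the cached value, or runs the block and stores its result.
  def fetch(key)
    entry = @entries[key]
    return entry.value if entry && entry.expires_at > @clock.call

    value = yield
    @entries[key] = Entry.new(value, @clock.call + @ttl)
    value
  end

  # Call from a write path to force recalculation on the next read.
  def expire(key)
    @entries.delete(key)
  end
end

orders = [120, 80, 45] # pretend these are rows belonging to one account
cache = SumCache.new(300)
total = cache.fetch("account:1:order_total") { orders.sum }
puts total
```

With this shape the aggregation runs at most once per expiry window, and a write that must be visible immediately can call expire instead of updating a sums row itself.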

Ruby on Rails Active Record Validation for "Draft" state best practices

I'm developing a form that I would like the user to have the option to return to. Ultimately all the fields need to be completed and I would like to incorporate the proper model level field validation before labeling the record as "complete". I can think of a few ways to do this:
Create two tables, one for records in "draft" state, with looser validation rules (ie fields don't necessarily need to be complete for the record to be saved), and a second table to store the records that have been submitted as "complete", obviously with the more stringent validation rules.
Create only one table to store the records, with a field that is labeled "isComplete", and based on this value determine which validation rules to apply.
I'm leaning towards option #2 because it would involve fewer moving parts (in option #1 I would have to ensure that when I change a record's state from "Draft" to "Complete" it gets deleted from one table and added to the other). The issue is that I don't know how to do this elegantly in Rails.
Ultimately, as I'm sure this problem has been solved before, my question is:
What is the best practice in this situation?
Definitely option number two. Have an isComplete or draft option that is a boolean. Then in your models you can control what validations to run based on the state of the isComplete field. This can be done in a number of ways, for instance Rails has the concept of conditional validation which allows you to specify :if options on the validations so you can restrict what runs based on the complete state. You can also add a before_save or before_update hook to run methods based on if the record is a draft or not. With these two tools you should be able to have everything in a single table in an intuitive way.
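A minimal plain-Ruby sketch of state-dependent validation follows. It does not use ActiveModel; FormEntry, its fields and the error messages are hypothetical, and is_complete stands in for the isComplete column from the question. In Rails the strict rules would instead be declared with the :if option on each validation, as noted in the comment.

```ruby
# Plain-Ruby sketch of conditional validation; not ActiveModel.
# In Rails the strict rules would roughly be declared as
#   validates :title, :presence => true, :if => :complete?
# where complete? is a predicate defined on the model.
class FormEntry
  attr_accessor :title, :body, :is_complete
  attr_reader :errors

  def initialize(title: nil, body: nil, is_complete: false)
    @title = title
    @body = body
    @is_complete = is_complete
    @errors = []
  end

  def complete?
    is_complete
  end

  def valid?
    @errors = []
    # Strict rules apply only once the record is marked complete.
    if complete?
      @errors << "title can't be blank" if blank?(title)
      @errors << "body can't be blank" if blank?(body)
    end
    @errors.empty?
  end

  private

  def blank?(value)
    value.nil? || value.strip.empty?
  end
end

draft = FormEntry.new(title: "Half-finished")
puts draft.valid?         # draft rules only
draft.is_complete = true
puts draft.valid?         # strict rules now apply
puts draft.errors.inspect
```

Keeping both states in one class means that promoting a draft is just a flag change followed by a normal validation run, with no copying between tables.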

Tracking changes on instances of a class and their associations - thoughts?

I have a class Question which has a lot of associated models. On one page in my app i list a summary of all the current questions, with various info from associated records. Ultimately this is a hash of values that i then just print out into a csv-style row (i'll call this the 'row hash' from here on).
I now have a requirement to only show the changes to questions (or their associated data) over a given period. I'm currently deliberating the best way to do this. Here's some ideas i've had so far, i'd welcome any feedback, thoughts, suggestions etc.
1) Approach 1 - acts_as_audited
This was my first thought as i've used this before in other apps. The problem with aaa though is that it only tracks changes to the record's own data (ie it doesn't care if the associations change). So, i could audit all of the associated records as well, but then trying to piece together what had changed by tying different audit records together sounds like a nightmare.
2) Save the old and new hash out into serialized fields: ie
- when someone goes to the question/edit page, i calculate the current row hash and save it in a serialized field "old_data" in the question table. Then after they save the question i calculate the new current row hash and save it into a serialized field "new_data" in the question table. Also, i compare the two serialized hashes and save the differences into another serialized hash field 'changes'. Now to do my report i just look for questions updated in the last x days and output their changes data.
3) make a view
- i make a view which corresponds to the data that i want to output (ie that amalgamates all the data that i pull into my report). Then i track changes to the view - somehow. I'm not sure how exactly i would do that.
I'm leaning towards option 2 right now.
Any other thoughts/comments? grateful for any suggestions - max.
So, like you said, you only want to show changes to the records between time x and time y, right? This would seem perfect to me using the acts_as_audited plugin because you end up with a table of changes, right? So make a has_many_through association from Question to all these related tables, then search it for related changes, where date created is after time X. This would return a list of changes. From there, you could connect this list back to the parent object if you need to, or whatever - but it in the end seems like a more reasonable thing to search. You're not looking for a list of related objects, after all, you're looking for a list of changes, so having a table of changes seems a reasonable way to accomplish that?
Hey I had a similar problem, check this out. If you can, go with Mongoid or Mongomapper, embedded versioned documents are sweet.
Thanks guys. I ended up rolling my own solution because what i really needed to do was to capture changes in the results of various methods called on the object, some of which involved associated objects. I wasn't so much interested in the associated objects as (for example) a text string generated as a result of looking at a few different associated objects. I had methods to do all of this already so i really just needed to track changes in the results of calling these methods.
None of the plugins i saw could really do that simply and effectively, so i ended making a table called states which just holds a serialized hash with results of all of these method calls. This gets saved when the record is altered and saved or when any of the relevant associated objects get altered and saved. Then i have some methods to return the differences between different saved state records. It works well for my needs. Thanks very much for your advice anyway.
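The comparison step of such a solution might look like the plain-Ruby sketch below. It is not the poster's actual code: state_diff and the example keys are hypothetical, and each hash stands for one saved, deserialized state record.

```ruby
# Compare two snapshots of computed values and report what changed.
# Each result entry maps a key to its [old, new] pair.
def state_diff(old_state, new_state)
  keys = old_state.keys | new_state.keys
  keys.each_with_object({}) do |key, changes|
    before = old_state[key]
    after = new_state[key]
    changes[key] = [before, after] unless before == after
  end
end

old_state = { title: "What is Ruby?", tags: "ruby", answer_summary: "2 answers" }
new_state = { title: "What is Ruby?", tags: "ruby, rails", answer_summary: "3 answers" }

state_diff(old_state, new_state).each do |field, (before, after)|
  puts "#{field}: #{before.inspect} -> #{after.inspect}"
end
```

Because the snapshots hold method results rather than raw columns, a change in an associated record shows up as a changed value under the key it feeds into.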

How to implement row ordering in Rails?

I'm trying to implement a UI feature for a listings page where the user can change the order of the records they have created.
I'd assume one way to do it would be to store a position field with some kind of editable auto-increment rule; the position values of rows could then be swapped as the user raises or lowers the position. However, I'm not quite sure how that would be done, as I'm still a Rails newbie.
I should also mention that I am trying to avoid solutions that tie me to a particular database.
Any suggestions?
acts_as_list is the standard solution here. You will have a position column in your model that will hold the ordering.
This is a comment/addition to the accepted answer above.
acts_as_list is not designed for anything beyond a prototype, and out of the box it does not handle concurrency or data integrity at all. At the very least, you must set your isolation level to serializable! Here's a short screen cast with more information, and some workarounds.
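For readers new to the idea, the position-swapping logic itself can be sketched in plain Ruby. This is not the acts_as_list implementation, and Item, OrderedList, move_up and move_down are names invented for the sketch; it also runs in memory, so it says nothing about the concurrency concerns raised above.

```ruby
# Plain-Ruby sketch of ordering rows by a position value.
Item = Struct.new(:name, :position)

class OrderedList
  def initialize(names)
    # Positions start at 1 and are contiguous.
    @items = names.each_with_index.map { |name, i| Item.new(name, i + 1) }
  end

  def names_in_order
    @items.sort_by(&:position).map(&:name)
  end

  # Swap positions with the neighbour one step above; no-op at the top.
  def move_up(name)
    swap_with_neighbour(name, -1)
  end

  # Swap positions with the neighbour one step below; no-op at the bottom.
  def move_down(name)
    swap_with_neighbour(name, 1)
  end

  private

  def swap_with_neighbour(name, offset)
    item = @items.find { |i| i.name == name }
    return unless item
    other = @items.find { |i| i.position == item.position + offset }
    return unless other
    item.position, other.position = other.position, item.position
  end
end

list = OrderedList.new(%w[alpha beta gamma])
list.move_up("gamma")
puts list.names_in_order.join(", ")
```

In a database-backed version the two position updates would presumably need to happen inside one transaction so that a concurrent reorder cannot leave two rows with the same position.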

Resources