memcached scalability and efficient voting stats - ruby-on-rails

How scalable is memcached in an environment where a cache entry is potentially expiring every second? In fact, my question is not just about the scalability of memcached but about situations where a model is continuously changing and the best way to scale that type of environment. One might say: why cache at all if the cache is getting expired every second?
Consider a hypothetical app where people are marking posts as favorites, and let's just say there are thousands of people constantly favoriting posts and creating a favorite record as a result. With each insertion the post view needs to be updated to show the current stats about the post: how many people favorited it, a user's favorite count, and so on.
We were thinking this could be cached to show only a snapshot taken every x minutes, but is there a good way to make this more real-time in Rails?

Try an algorithm like this (sketched in code below):
Update the DB record for the post with +1 vote
Create a cache value, or increment the existing cached post favorite count by 1
Create a cache value, or increment the existing cached user favorite count by 1
Because the cache has no transactions, cached values may not be consistent with the current DB values:
a rare parallel read/write race condition may occur. But these inconsistencies disappear as soon as the cache entry is invalidated and the values are recalculated from the DB.
In the above scenario the cache-sweeping operation may be performed every minute or even every hour, depending on how accurate the stats should be. Remember that you don't lose anything: all the accurate data is kept in the DB.
Most important here is that users see the value change in real time when they vote.
The memcached read/write cost is definitely lower than the DB's here, because memcached is just a multi-access hash store with no transactions.
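A minimal sketch of that flow in a Rails controller action, assuming a Favorite model and a memcached-backed Rails.cache; the cache key names and fallback counts are illustrative, not from the question:

    # Sketch only: assumes config.cache_store = :mem_cache_store and a Favorite model.
    class FavoritesController < ApplicationController
      def create
        favorite = Favorite.create!(post_id: params[:post_id], user_id: current_user.id)

        post_key = "post:#{favorite.post_id}:fav_count"
        user_key = "user:#{favorite.user_id}:fav_count"

        # increment returns nil if the key is missing; in that case seed it from the DB.
        # This is not transactional with the DB, so the counter may briefly drift until
        # the cache entry expires and is recalculated.
        Rails.cache.increment(post_key) ||
          Rails.cache.write(post_key, Favorite.where(post_id: favorite.post_id).count, raw: true)
        Rails.cache.increment(user_key) ||
          Rails.cache.write(user_key, Favorite.where(user_id: favorite.user_id).count, raw: true)

        head :created
      end
    end

The cache entries can be given a TTL (or swept periodically), at which point the counts are rebuilt from the authoritative DB values.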

Related

Efficient way to retrieve user count in Swift app

I'm making a simple Swift meditation app and want to have a feature that lets users see how many others have installed the app as well ("You're part of a community of 354 other meditators").
My current plan: save a "blank" record to the public DB in CloudKit on first load.
Then, on login, each client retrieves all the records and counts how many there are?
Is there a better solution? I can imagine this getting slow if there are lots of users...
Thanks!
In terms of your CloudKit example, as far as I'm aware there is no option to return just the number of records; CloudKit returns the actual records in batches (it decides how many to return). However, you may specify a limit on the number of records it returns.
If you did specify a limit, you would need to keep increasing it: once the number of records grows larger than the limit, CloudKit will no longer retrieve them all and your count will be wrong.
This would probably be a bad idea, since you would have to continually release app updates to increase the limit (unless you stored the limit in some other external DB, which would then probably be preferable to CloudKit itself). Basically, CloudKit is probably not the best fit for this.
It would probably be much easier to use a different public DB setup. Either set up your own or use a service like Parse.com, which makes setting up and connecting to a public DB very simple. An additional benefit of doing it this way is that you can run the count query on the server and return just the count value itself, rather than returning all the records and counting them locally, which is very inefficient.

Ruby on Rails - Most efficient solution for this Class?

I'm a senior Comp. Sci. major working on a senior design project for our faculty. The name of this project is "Gradebook", and it is responsible for allowing instructors to record grades for students and for students to check their grades in a class. This project is written in Ruby on Rails, and this feature set is integrated into our current CS Website.
One requirement for our project is to constantly keep the course average and each of the student's averages updated. So I designed a CourseInfo class and a StudentInfo class to help with this process.
The CourseInfo class accepts a Gradebook (an ActiveRecord object) as a parameter and calculates the course average. It creates an Associative Array of StudentInfo objects, with each StudentInfo object containing the student's overall average in the class. The benefit of this is that I can calculate the Course Average with one line of code that initializes the class, and it is very clean.
But there is one issue that I'm mulling over. The problem is that the CourseInfo object does not survive when another HTTP request is made; I have to keep recreating it. Whether I'm adding an assignment, editing a category, or recording grades, I have to keep it updated because this project uses AJAX requests all the time. Instructors do not have to refresh any pages, because AJAX requests are created with every action.
For example, suppose I'm recording grades for a specific assignment. With each grade I record into the spreadsheet, an AJAX request is made and the course average updates with each new grade. But the problem is, if I want to update the course average after recording a student's grade, since the CourseInfo object does not stay alive into the next request, I have to recreate the object to keep the average updated. But that is a LOT of work. That involves calculating each student's average for EACH assignment, and then calculating the course average from EACH student's average. I know, a lot of work, and it could be simpler, right?
So naturally, I want this CourseInfo object to live forever as long as the client is using the website. I've thought of many different ways to solve this problem:
1) Global Variables or Class Variables - I honestly want to stay away from this approach because I hear it is bad design. I also hear that this approach is not thread-safe. But it seems to provide a simple solution to my problem?
2) Serialize the Object in the Database - This is what I'm leaning towards the most. I hear that sometimes people will serialize a Hash that contains user preferences in a web app, so why not serialize my CourseInfo object? I've also done some research on the MessagePack gem, and I could potentially encode the CourseInfo object using MessagePack and then store it in the database. I feel like this would be a noticeable performance increase.
3) Use some kind of cache - Gems such as Redis act as a cache, and I liked Redis because it is a key-value store. I can store a CourseInfo object for each Gradebook that was used during the session, and if I need to update the CourseInfo object, I can simply fetch it by using the Gradebook's ID as a key. But I'm not sure if this is thread-safe. What if two instructors attempt to update two different grades at the same time? Will there be multiple instances of this CourseInfo object for each client using Gradebook?
4) Store it in the Session - Yeah I pretty much crossed this option off my list. I researched this approach, and I hear it is horrible to store a lot of data in the session. I don't want to do this.
What do you think? If I don't want to reinitialize this large object for each request, how can I make it live forever? What is the most efficient solution? What do you think about my design?
Help would be much appreciated! Thanks!
Use
2) Serialize the Object in the Database
due to the agile philosophy of implementing the simplest thing that could possibly work first.
See "Saving arrays, hashes, and other non-mappable objects in text columns" in the ActiveRecord serialize documentation.
The course_average always reflects the persistent state of the users' records. Serializing it is a no-brainer in ActiveRecord. If you are using Postgres, you can even use the native JSON store, which you can not only deserialize but also query through. No need for additional complexity to maintain an extra store. This solution also has the benefit of acting as a persistent counter cache (no need to recalculate if nothing changes).
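A minimal sketch of the serialized-column approach, assuming a course_info text column on the gradebooks table; the column name and the compute_* helpers are illustrative, not from the question:

    # Migration (text column; use :jsonb instead on Postgres to make it queryable):
    #   add_column :gradebooks, :course_info, :text

    class Gradebook < ActiveRecord::Base
      # Persists the computed averages hash as YAML in the text column.
      serialize :course_info, Hash

      # Recompute and store the snapshot whenever grades change.
      def refresh_course_info!
        self.course_info = {
          course_average:   compute_course_average,    # assumed helper
          student_averages: compute_student_averages   # assumed helper
        }
        save!
      end
    end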
However, using a cache is also a valuable option. Just remember, if you want to use Redis as a cache store you have to explicitly configure a cache-expiry policy, because by default none of the keys will expire and you will receive an out-of-memory error when Redis grows beyond the size of RAM on the machine.
The redis-rails gem will set Rails up to use Redis for caching.
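A minimal configuration sketch, assuming the redis-rails gem; the Redis URL, expiry, and maxmemory values are illustrative:

    # Gemfile
    gem 'redis-rails'

    # config/environments/production.rb
    config.cache_store = :redis_store, 'redis://localhost:6379/0/cache', { expires_in: 1.hour }

    # redis.conf - make Redis evict old keys instead of running out of memory
    #   maxmemory 256mb
    #   maxmemory-policy allkeys-lru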
Storing this information in the session might also work, but watch out that your session does not get too big. The whole session payload is always loaded completely into memory, regardless of whether the information in it is required or not. Always loading megabytes of data into memory for every HTTP request might not be a great idea.
There is also a 5th option I would evaluate first: check whether the computation of averages really takes that long, or whether its performance can be improved, e.g. by removing N+1 queries, setting proper indexes, doing the whole computation in SQL, or preparing the necessary data completely in SQL so that everything needed can be fetched in one query.
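A sketch of pushing that computation into SQL via ActiveRecord, assuming a grades table with gradebook_id, student_id, and score columns (these names are illustrative):

    # Per-student averages in a single grouped query: { student_id => average }
    student_averages = Grade.where(gradebook_id: gradebook.id)
                            .group(:student_id)
                            .average(:score)

    # Course average computed by the database in one more query.
    course_average = Grade.where(gradebook_id: gradebook.id).average(:score)

Both calls return results computed entirely by the database, so no per-assignment loop runs in Ruby.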

calculating lots of statistics on database user data: optimizing performance

I have the following situation (in Rails 3): my table contains financial transactions for each user (users can buy and sell products). Since lots of such transactions occur, I present statistics related to the current user on the website, e.g. current balance, overall profit, how many products were sold/bought overall, averages, etc. (the same also on a per-month/per-year basis instead of overall). Parts of this information are displayed to the user on many forms/pages so that the user can always see his current account information (different bits of statistics are displayed on different pages, though).
My question is: how can I optimize database performance (and is it worth it)? Surely, if the user is just browsing, there is no need to re-calculate all of the values every time a new page is loaded unless a change to the underlying database has been made?
My first solution would be to store these statistics in their own table and update them once a financial transaction has been added/edited (in Rails, maybe using an :after_update callback?). Taking this further, if, for example, a new transaction has been made, then I can just modify the average instead of re-calculating the whole thing.
My second idea would be to use some kind of caching (if this is possible?), or to store these values in the session object.
Which one is the preferred/recommended way, or is all of this a waste of time as the current largest number of financial transactions is in the range of 7000-9000?
You probably want to investigate summary tables, also known as materialized views.
This link may be helpful:
http://wiki.postgresql.org/wiki/Materialized_Views
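A minimal sketch of a materialized view for this case, assuming PostgreSQL 9.3+ and a financial_transactions table with user_id and amount columns; table, view, and column names are illustrative:

    # db/migrate/20xx_create_user_stats_view.rb
    class CreateUserStatsView < ActiveRecord::Migration
      def up
        execute <<-SQL
          CREATE MATERIALIZED VIEW user_stats AS
          SELECT user_id,
                 SUM(amount) AS balance,
                 COUNT(*)    AS transactions_count,
                 AVG(amount) AS average_amount
          FROM financial_transactions
          GROUP BY user_id;
        SQL
      end

      def down
        execute "DROP MATERIALIZED VIEW user_stats;"
      end
    end

    # Refresh periodically, or after a batch of writes:
    #   ActiveRecord::Base.connection.execute("REFRESH MATERIALIZED VIEW user_stats;")

On older PostgreSQL versions (the wiki page above predates 9.3), the same idea is implemented manually with a summary table kept up to date by triggers or callbacks.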

How to build cached stats in database without taking down site?

I'm working on a Ruby on Rails site.
In order to improve performance, I'd like to build up some caches of various stats so that in the future when displaying them, I only have to display the caches instead of pulling all database records to calculate those stats.
Example:
A model Users has_many Comments. I'd like to store into a user cache model how many comments they have. That way when I need to display the number of comments a user has made, it's only a simple query of the stats model. Every time a new comment is created or destroyed, it simply increments or decrements the counter.
How can I build these stats while the site is live? What I'm concerned about is that after I ask the database to count the number of Comments a User has, but before it is able to execute the command to save that count into the stats table, that user might sneak in and add another comment somewhere. This would increment the counter, which would then be immediately overwritten by the other thread, resulting in incorrect stats being saved.
I'm familiar with the ActiveRecord transactions blocks, but as I understand it, those are to guarantee that all or none succeed as a whole, rather than to act as mutex protection for data on the database.
Is it basically necessary to take down the site for changes like these?
Your use case is already handled by Rails. It's called a counter cache. There is a RailsCast on it here: http://railscasts.com/episodes/23-counter-cache-column
Since it is so old, it might be out of date. The general idea is there though.
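A minimal sketch of the counter cache setup, assuming Comment belongs_to :user; the comments_count column name follows Rails' convention:

    # Migration: add the counter column and backfill existing counts.
    class AddCommentsCountToUsers < ActiveRecord::Migration
      def up
        add_column :users, :comments_count, :integer, default: 0, null: false
        User.find_each { |user| User.reset_counters(user.id, :comments) }
      end

      def down
        remove_column :users, :comments_count
      end
    end

    class Comment < ActiveRecord::Base
      # Rails increments/decrements users.comments_count atomically on create/destroy.
      belongs_to :user, counter_cache: true
    end

The increment is issued as a single SQL UPDATE, so the read-modify-write race described in the question does not apply.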
It's generally not a best practice to co-mingle application and reporting logic. Send your reporting data outside the application, either to another database, to log files that are read by daemons, or to some other API that handles the storage particulars.
If all that sounds like too much work, then you don't really want real-time reporting. Assuming you have a backup of some sort (hot or cold), run the aggregations and generate the reports on the backup. That way it doesn't affect the running application, and your data shouldn't be more than 24 hours stale.
FYI, I think I found the solution here:
http://guides.ruby.tw/rails3/active_record_querying.html#5
What I'm looking for is called pessimistic locking, and is addressed in 2.10.2.
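A minimal sketch of pessimistic locking while rebuilding a cached stat, assuming a Stat model with user_id and comments_count columns (names are illustrative):

    Stat.transaction do
      # SELECT ... FOR UPDATE: concurrent writers block until this transaction commits,
      # so the count cannot be overwritten between our read and our write.
      stat = Stat.lock.where(user_id: user.id).first
      stat.comments_count = user.comments.count
      stat.save!
    end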

Recommendations on handling object status fields in rails apps: store versus calculate?

I have a rails app that tracks membership cardholders, and needs to report on a cardholder's status. The status is defined - by business rule - as being either "in good standing," "in arrears," or "canceled," depending on whether the cardholder's most recent invoice has been paid.
Invoices are sent 30 days in advance, so a customer who has just been invoiced is still in good standing, one who is 20 days past the payment due date is in arrears, and a member who fails to pay his invoice more than 30 days after it is due would be canceled.
I'm looking for advice on whether it would be better to store the cardholder's current status as a field at the customer level (and deal with the potential update anomalies resulting from potential updates of invoice records without updating the corresponding cardholder's record), or whether it makes more sense to simply calculate the current cardholder status based on data in the database every time the status is requested (which could place a lot of load on the database and slow down the app).
Recommendations? Or other ideas I haven't thought of?
One important constraint: while it's unlikely that anyone will modify the database directly, there's always that possibility, so I need to try to put some safeguards in place to prevent the various database records from becoming out of sync with each other.
The storage of calculated data in your database is generally an optimisation. I would suggest that you calculate the value on every request and then monitor the performance of your application. If the fact that this data is not stored becomes an issue for you, then that is the time to refactor and store the value within the database.
Storing calculated values, particularly those that can affect multiple tables, is generally a bad idea for the reasons that you have mentioned.
When/if you do refactor and store the value in the DB then you probably want a batch job that checks the value for data integrity on a regular basis.
The simplest approach would be to calculate the current cardholder status based on data in the database every time the status is requested. That way you have no duplication of data, and therefore no potential problems with the duplicates becoming out of step.
If, and only if, your measurements show that this calculation is causing a significant slowdown, then you can think about caching the value.
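A minimal sketch of the calculate-on-request approach, assuming a Cardholder has_many :invoices with due_at and paid_at columns; the model and column names are illustrative, and the thresholds follow the business rule above:

    class Cardholder < ActiveRecord::Base
      has_many :invoices

      # Derives the status from the most recent invoice each time it is asked for,
      # so there is no stored field that can drift out of sync.
      def status
        invoice = invoices.order(:due_at).last
        return "in good standing" if invoice.nil? || invoice.paid_at.present?

        days_overdue = (Date.current - invoice.due_at.to_date).to_i
        if days_overdue <= 0
          "in good standing"
        elsif days_overdue <= 30
          "in arrears"
        else
          "canceled"
        end
      end
    end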
Recently I had a similar decision to make, and I decided to store the status as a field in the database. This is because I wanted to reduce SQL queries and it looks simpler. I chose to do it that way because I will need this status very often and calculating it is (at least in my case) a bit complicated.
A possible problem with this is that it can get out of sync, so I added some after_save and after_destroy callbacks to the child model to keep it synchronized. And of course, if somebody modified the database some other way, it would cause problems.
You can write a simple rake task that checks all statuses and, if needed, corrects them. You can run it from cron so you don't have to worry about it.
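A minimal sketch of those callbacks plus the integrity-check rake task, assuming the status lives in a cardholders.status column, Invoice belongs_to :cardholder, and a calculated_status method on Cardholder that derives the status from its invoices (all names are illustrative):

    class Invoice < ActiveRecord::Base
      belongs_to :cardholder

      # Keep the stored status in sync whenever an invoice changes or is removed.
      after_save    :refresh_cardholder_status
      after_destroy :refresh_cardholder_status

      private

      def refresh_cardholder_status
        cardholder.update_column(:status, cardholder.calculated_status)
      end
    end

    # lib/tasks/cardholders.rake - run from cron to repair any drift.
    namespace :cardholders do
      desc "Recalculate and correct any stale cardholder statuses"
      task :sync_statuses => :environment do
        Cardholder.find_each do |c|
          correct = c.calculated_status
          c.update_column(:status, correct) unless c.status == correct
        end
      end
    end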
