I'm a senior Comp. Sci. major working on a senior design project for our faculty. The name of this project is "Gradebook", and it is responsible for allowing instructors to record grades for students and for students to check their grades in a class. This project is written in Ruby on Rails, and this feature set is integrated into our current CS Website.
One requirement for our project is to constantly keep the course average and each student's average updated. So I designed a CourseInfo class and a StudentInfo class to help with this process.
The CourseInfo class accepts a Gradebook (an ActiveRecord object) as a parameter and calculates the course average. It creates an Associative Array of StudentInfo objects, with each StudentInfo object containing the student's overall average in the class. The benefit of this is that I can calculate the Course Average with one line of code that initializes the class, and it is very clean.
But there is one issue that I'm mulling over. The problem is that the CourseInfo object does not survive when another HTTP request is made, so I have to keep recreating it. Whether I'm adding an assignment, editing a category, or recording grades, I have to keep it updated because this project uses AJAX requests all the time. Instructors do not have to refresh any pages, because AJAX requests are created with every action.
For example, suppose I'm recording grades for a specific assignment. With each grade I record into the spreadsheet, an AJAX request is made and the course average updates with each new grade. But since the CourseInfo object does not stay alive in the next request, I have to recreate the object to keep the average updated after recording a student's grade. That is a LOT of work. It involves calculating each student's average for EACH assignment, and then calculating the course average from EACH student. I know, that's a lot of work and it could be simpler, right?
So naturally, I want this CourseInfo object to live forever as long as the client is using the website. I've thought of many different ways to solve this problem:
1) Global Variables or Class Variables - I honestly want to stay away from this approach because I hear it is bad design. I also hear that this approach is not thread-safe. But it seems to provide a simple solution to my problem?
2) Serialize the Object in the Database - This is what I'm leaning towards the most. I hear that sometimes people will serialize a Hash that contains user preferences in a web app, so why not serialize my CourseInfo object? I've also done some research on the MessagePack gem, and I could potentially encode the CourseInfo object using MessagePack and then store it in the database. I feel like this would be a noticeable performance increase.
3) Use some kind of cache - Tools such as Redis can act as a cache, and I liked Redis because it is a key-value store. I can store a CourseInfo object for each Gradebook that was used during the session, and if I need to update the CourseInfo object, I can simply fetch it by using the Gradebook's ID as a key. But I'm not sure if this is thread-safe. What if two instructors attempt to update two different grades at the same time? Will there be multiple instances of this CourseInfo object for each client using Gradebook?
4) Store it in the Session - Yeah I pretty much crossed this option off my list. I researched this approach, and I hear it is horrible to store a lot of data in the session. I don't want to do this.
What do you think? If I don't want to reinitialize this large object for each request, how can I make it live forever? What is the most efficient solution? What do you think about my design?
Help would be much appreciated! Thanks!
Use option 2) Serialize the Object in the Database, following the agile philosophy of implementing the simplest thing that could possibly work first.
See "Saving arrays, hashes, and other non-mappable objects in text columns".
The course_average always reflects the persistent state of the users' records. Serializing it is a no-brainer in ActiveRecord. If you are using PostgreSQL, you can even use the native JSON column type, which you can not only deserialize but also query. There is no need for the additional complexity of maintaining an extra store. This solution also has the benefit of acting as a persistent counter cache (no need to recalculate if nothing changes).
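As a rough sketch of what would actually be stored, assuming the averages can be reduced to a plain Hash of numbers: the shape of `averages` and the helper names `to_column` and `from_column` are hypothetical and not part of the original CourseInfo class. The round trip through a text or JSON column looks something like this in plain Ruby:

```ruby
require 'json'

# Hypothetical snapshot of what CourseInfo computes: one overall average
# per student ID plus the course average. Plain numbers only, so the data
# survives a round trip through a text or JSON column.
averages = {
  'course_average' => 84.5,
  'students' => { '101' => 91.0, '102' => 78.0 }
}

# What would be written to the column.
def to_column(snapshot)
  JSON.generate(snapshot)
end

# What would be read back on the next request.
def from_column(text)
  JSON.parse(text)
end

stored = to_column(averages)
restored = from_column(stored)
puts restored['course_average']
```

Storing plain data rather than the CourseInfo object itself keeps the stored value independent of the class definition. In ActiveRecord, `serialize` on a text column or a native JSON column would handle this conversion for you.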
However, using a cache is also a viable option. Just remember that if you want to use Redis as a cache store, you have to explicitly configure a cache expiry policy: by default none of the keys expire, and you will receive an out-of-memory error when Redis grows beyond the size of the RAM on the machine.
The redis-rails gem will set up Rails to use Redis for caching.
Storing this information in the session might also work, but watch out for your session getting too big. The whole session is always loaded completely into memory, regardless of whether the information in it is required or not. Always loading megabytes of data into memory for every HTTP connection might not be a great idea.
There is also a 5th option, which I would evaluate first. Check whether the computation of averages really takes that long, or whether its performance can be improved, e.g. by eliminating N+1 queries, setting proper indexes, doing the whole computation in SQL, or preparing the data completely in SQL so that everything needed can be fetched in one query.
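To illustrate the "fetch everything in one query" idea, here is a minimal sketch that computes all averages in a single pass over rows that could come from one joined query. The `Row` struct, its column names, and `averages_from_rows` are hypothetical. It also assumes a student's average is total points earned over total points possible, which ignores weighted categories.

```ruby
# Rows as they might come back from a single joined query:
# one row per recorded grade (hypothetical column names).
Row = Struct.new(:student_id, :points_earned, :points_possible)

# Returns [per_student_averages, course_average], both as percentages.
def averages_from_rows(rows)
  # student_id => [total earned, total possible]
  totals = Hash.new { |hash, key| hash[key] = [0.0, 0.0] }
  rows.each do |row|
    totals[row.student_id][0] += row.points_earned
    totals[row.student_id][1] += row.points_possible
  end

  per_student = totals.transform_values do |(earned, possible)|
    possible.zero? ? 0.0 : (earned / possible * 100).round(2)
  end

  # Course average taken as the mean of the student averages.
  course = per_student.empty? ? 0.0 : (per_student.values.sum / per_student.size).round(2)
  [per_student, course]
end

rows = [
  Row.new(1, 45, 50), Row.new(1, 40, 50),
  Row.new(2, 30, 50), Row.new(2, 35, 50)
]
per_student, course = averages_from_rows(rows)
puts per_student.inspect
puts course
```

If a single pass like this is fast enough for a realistic class size, recomputing on every request may be acceptable and no caching layer is needed at all.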
I am facing a problem because I have millions of domain objects in one table. When I try to get all objects by using Domain.findAllBy(), after a couple of minutes I get an OutOfMemoryError. I would like to know if there is an efficient way to load all of them without getting this error again.
Should I page the results and only load the ones I need?
Please tell me if I am doing it wrong as well.
Thank you for your help and happy new year ;)
The real solution is not to try to retrieve all domain objects in one go into memory. No matter how much memory you buy, you can't guarantee that the domain objects won't grow more quickly than your RAM.
Next, even if you could store all the objects in memory, it would take a non-trivial amount of time to retrieve them all. Any operations you want to undertake on the objects - modifying attributes, calling methods - would take even longer.
I can't imagine a scenario where a human user would want to see millions of business objects on a web page - even paging through them all doesn't make sense.
So, if you are retrieving the objects to modify them, do so in the database. If you are retrieving them to run a method on the business objects, use paging, or consider if you can implement that method as a database call. If you're retrieving them for display, you'll need to allow the user to filter their request, and provide pagination.
I'm working on a Ruby on Rails site.
In order to improve performance, I'd like to build up some caches of various stats so that in the future when displaying them, I only have to display the caches instead of pulling all database records to calculate those stats.
Example:
A model Users has_many Comments. I'd like to store into a user cache model how many comments they have. That way when I need to display the number of comments a user has made, it's only a simple query of the stats model. Every time a new comment is created or destroyed, it simply increments or decrements the counter.
How can I build these stats while the site is live? What I'm concerned about is that after I ask the database to count the number of Comments a User has, but before the result is saved into stats, that user might sneak in and add another comment somewhere. This would increment the counter, which would then be immediately overwritten by the other thread, resulting in incorrect stats being saved.
I'm familiar with ActiveRecord transaction blocks, but as I understand it, those guarantee that all or none of the operations succeed as a whole, rather than acting as mutex protection for data in the database.
Is it basically necessary to take down the site for changes like these?
Your use case is already handled by Rails. It's called a counter cache. There is a RailsCast here: http://railscasts.com/episodes/23-counter-cache-column
Since it is so old, it might be out of date. The general idea is there though.
It's generally not a best practice to co-mingle application and reporting logic. Send your reporting data outside the application, either to another database, to log files that are read by daemons, or to some other API that handles the storage particulars.
If all that sounds like too much work, then you don't really want real-time reporting. Assuming you have a backup of some sort (hot or cold), run the aggregations and generate the reports on the backup. That way it doesn't affect the running application, and your data shouldn't be more than 24 hours stale.
FYI, I think I found the solution here:
http://guides.ruby.tw/rails3/active_record_querying.html#5
What I'm looking for is called pessimistic locking, and is addressed in 2.10.2.
I have inherited an app that generates a large array for every user that visits the app. I recently discovered that it is identical for nearly all the users!!
Now I want to somehow make one copy of it so it is not built over and over again. I have thought of a few options and wanted input to see which one is the best:
1) Create a model and shove the data into the database
2) Create a YAML file and have the app load it when it initializes.
I personally like the model idea, but a few engineers at work feel that it does not deserve to be a full model. 97% of the time users will see the exact same thing, but 3% of the time users will get a slightly different array (a few elements will have changed).
Are there any other approaches that I should consider? Thanks in advance.
Remember that if you store the data in the DB, each request which requires the data will have to execute a DB query to pull it out. If you are running multiple server threads, each thread could have its own copy in memory (if they are all handling requests which require the use of the array). In that case, you wouldn't be saving any memory (though you might save time from not having to regenerate the array).
If you are running multiple server processes (not threads), and if the array contents change as the application is running, and the changes have to be visible to all the processes, caching in memory won't work. You will have to use the DB in that case.
From the information in your comment, I suggest you try something like this:
Store the array in your DB, and make sure that the record(s) used have created/updated timestamps. Cache the contents in memory using a constant/global variable/class variable. Also store the last time the cache was updated.
Every time you need to use the array, retrieve the relevant "updated" timestamp from the DB. (You may need to use hand-coded SQL and ModelName.connection.execute to avoid pulling back all the data in the record, which ActiveRecord will probably do.) If the timestamp is later than the last time your cache was updated, pull the array from the DB and update your cache.
Use a Mutex (require 'thread') when retrieving/updating the cached data, in case your server setup may use multiple threads. (I don't think that Passenger does, but I have had problems similar to threading problems when using Passenger+RMagick, so I would still use a Mutex to be safe.)
Wrap all the code which deals with the cached array in a library class (or a class method on the model used to store the data), so the details of cache management don't spill over into the rest of the application.
Do a little bit of performance testing on the cache setup using Benchmark.measure {}. If a bug in the setup actually made performance worse rather than better, that would be sad...
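The steps above can be sketched roughly as follows. This is a minimal plain-Ruby illustration, not the poster's implementation: the `ArrayCache` class and the two lambdas (stand-ins for the cheap timestamp query and the expensive load from the DB) are hypothetical. The sketch also remembers the record's own "updated" timestamp instead of the wall-clock time of the last refresh, which is a small variation on the description above.

```ruby
require 'benchmark'

# Process-local cache that reloads only when the backing store reports a
# newer "updated" timestamp. All access goes through one Mutex.
class ArrayCache
  def initialize(fetch_updated_at:, fetch_data:)
    @fetch_updated_at = fetch_updated_at # cheap: returns a Time
    @fetch_data = fetch_data             # expensive: returns the array
    @mutex = Mutex.new
    @data = nil
    @cached_at = nil
  end

  def get
    @mutex.synchronize do
      updated_at = @fetch_updated_at.call
      if @cached_at.nil? || updated_at > @cached_at
        # Frozen so callers cannot mutate the shared copy.
        @data = @fetch_data.call.freeze
        @cached_at = updated_at
      end
      @data
    end
  end
end

# Fake "database" for demonstration only.
db = { updated_at: Time.at(100), data: [1, 2, 3] }
loads = 0
cache = ArrayCache.new(
  fetch_updated_at: -> { db[:updated_at] },
  fetch_data: -> { loads += 1; db[:data].dup }
)

cache.get                      # first call loads from the "DB"
cache.get                      # timestamp unchanged, served from memory
db[:data] = [1, 2, 4]
db[:updated_at] = Time.at(200)
cache.get                      # timestamp moved forward, cache reloads

# Quick sanity check that cache hits are cheap.
puts Benchmark.measure { 10_000.times { cache.get } }
```

Keeping the Mutex and the timestamp check inside one class means the rest of the application only ever calls `get`, which matches the advice to keep cache management from spilling into other code.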
I'd go with option 2. You can add two constants (for the 97% and 3%) that load from a YAML file when the app initializes. That ought to shrink your memory footprint considerably.
Having said that, yikes, this is just a band-aid on a hack, but you knew that already. I'd consider putting some time into a redesign, if you have that luxury.
I have always been taught that storing objects in a session was a bad idea. Instead IDs should be stored that retrieve the record when needed.
However, I have an application that I wonder is an exception to this rule. I'm building a flashcard application, and the words being quizzed are in a table in the database whose schema doesn't change. I want to store the words currently being quizzed in a session, so a user can finish where they started in case they move on to a separate page.
In this case, is it possible to get away with storing these words as objects in the session? If so, why? The reason I ask is that the quiz is designed to move quickly, and I'd hate to waste a database call on retrieving a record that never changes in the first place. However, perhaps there are other negatives to a large session that I'm not aware of.
*For the record, I have tried caching it with the built-in memcache methods in Rails 2.3, but apparently that has a maximum size per item of 1MB.
The main reason not to store objects in the session is that if the object structure changes, you will get an exception. Consider the following:
    class Foo
      attr_accessor :bar
    end

    class Bar
    end

    foo = Foo.new
    foo.bar = Bar.new
    put_in_session(foo)  # pseudocode: store foo in the session
Then, in a subsequent release of the project, you change Bar's name. You reboot the server, and try to grab foo out of the session. When it tries to deserialize, it fails to find Bar and explodes.
It might seem like it would be easy to avoid this pitfall, but in practice, I've seen it bite a number of people. This is just because serializing an object can sometimes take more along with it than is immediately apparent (this sort of thing is supposed to be transparent) and unless you have rigorous rules about this, things will tend to get flummoxed up.
The reason it's normally frowned upon is that it's extremely common for this to bite people in ActiveRecord, since it's quite common for the structure of your app to shift over time, and sessions can be deserialized a week or longer after they were originally created.
If you understand all that and are willing to put in the energy to be sure that your model does not change and is not serializing anything extra, you're probably fine. But be careful :)
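The failure described above can be reproduced in plain Ruby, assuming the session store serializes objects with Ruby's Marshal. The `stored` variable is a stand-in for the session, and removing the `Bar` constant simulates a later release in which the class was renamed.

```ruby
class Bar
end

class Foo
  attr_accessor :bar
end

foo = Foo.new
foo.bar = Bar.new

# Stand-in for writing to the session: the whole object graph,
# including the nested Bar instance, is serialized.
stored = Marshal.dump(foo)

# Simulate a later release in which Bar no longer exists under that name.
Object.send(:remove_const, :Bar)

# Stand-in for reading the old session back after the deploy.
begin
  Marshal.load(stored)
  result = :loaded
rescue ArgumentError => e
  result = e.message
end

puts result
```

Note that `Foo` itself did not change at all. The load fails only because something it references was serialized along with it, which is the "takes more along with it than is immediately apparent" problem.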
Rails tends to encourage RESTful design, and using sessions isn't very RESTful. I'd probably make a Quiz resource that has a bunch of words, as well as a current_word. This way, when they come back, you'll know where they were.
Now, REST isn't everything (depending on who you talk to), but there's a pretty good case against large sessions. Remember that sessions write things to and from disk, and the more data that you're writing, the longer it takes to read back...
Since your app is a Rails app, I would suggest either:
1) Using your clients' ability to cache, by caching the cards in JavaScript (you'd need a fairly ajaxy app to do this; see the latest RailsCast for some interesting points on JavaScript page caching).
2) Using one of the many other Rails-supported server-side caching options (e.g. MemCached) to cache this data.
A much more insidious issue you'll encounter storing objects directly in the session is when you're using CookieStore (the default in Rails 2+ I believe). It's very easy to get CookieOverflow errors which are very hard to recover from.
I have a rails app that tracks membership cardholders, and needs to report on a cardholder's status. The status is defined - by business rule - as being either "in good standing," "in arrears," or "canceled," depending on whether the cardholder's most recent invoice has been paid.
Invoices are sent 30 days in advance, so a customer who has just been invoiced is still in good standing, one who is 20 days past the payment due date is in arrears, and a member who fails to pay his invoice more than 30 days after it is due would be canceled.
I'm looking for advice on whether it would be better to store the cardholder's current status as a field at the customer level (and deal with the potential update anomalies resulting from potential updates of invoice records without updating the corresponding cardholder's record), or whether it makes more sense to simply calculate the current cardholder status based on data in the database every time the status is requested (which could place a lot of load on the database and slow down the app).
Recommendations? Or other ideas I haven't thought of?
One important constraint: while it's unlikely that anyone will modify the database directly, there's always that possibility, so I need to try to put some safeguards in place to prevent the various database records from becoming out of sync with each other.
The storage of calculated data in your database is generally an optimisation. I would suggest that you calculate the value on every request and then monitor the performance of your application. If the fact that this data is not stored becomes an issue for you, that is the time to refactor and store the value within the database.
Storing calculated values, particularly those that can be affected by multiple tables, is generally a bad idea for the reasons that you have mentioned.
When/if you do refactor and store the value in the DB then you probably want a batch job that checks the value for data integrity on a regular basis.
The simplest approach would be to calculate the current cardholder status based on data in the database every time the status is requested. That way you have no duplication of data, and therefore no potential problems with the duplicates becoming out of step.
If, and only if, your measurements show that this calculation is causing a significant slowdown, then you can think about caching the value.
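A minimal sketch of calculating the status on demand might look like this. It assumes the rule from the question can be reduced to the most recent invoice's due date and whether it has been paid; the method name, keyword arguments, and the 30-day constant are hypothetical.

```ruby
require 'date'

# Assumed threshold from the question: unpaid and up to 30 days past due
# is "in arrears"; more than 30 days past due is "canceled".
CANCEL_AFTER_DAYS = 30

# Derives the status from the most recent invoice, with no stored copy.
def cardholder_status(due_on:, paid:, today: Date.today)
  return 'in good standing' if paid || today <= due_on

  days_late = (today - due_on).to_i
  days_late > CANCEL_AFTER_DAYS ? 'canceled' : 'in arrears'
end

due = Date.new(2024, 1, 31)
puts cardholder_status(due_on: due, paid: false, today: Date.new(2024, 1, 15))
puts cardholder_status(due_on: due, paid: false, today: Date.new(2024, 2, 20))
puts cardholder_status(due_on: due, paid: false, today: Date.new(2024, 3, 5))
```

Because nothing is stored, a direct edit to an invoice record cannot leave the status out of sync. The cost is one lookup of the latest invoice per status request.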
Recently I had a similar decision to make, and I decided to store the status as a field in the database. I wanted to reduce SQL queries, and it looks simpler. I chose to do it that way because I very often need this status, and calculating it is (at least in my case) a bit complicated.
A possible problem is that it can get out of sync, so I added some after_save and after_destroy callbacks to the child model to keep it synchronized. Of course, if somebody modified the database in a different way, it would still cause problems.
You can write a simple rake task that checks all statuses and corrects them if needed. You can run it from cron so you don't have to worry about it.
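The core of such a check could look something like the sketch below, written in plain Ruby so it stands alone. The `Cardholder` struct, its fields, and both method names are hypothetical. In a real app the loop would run over ActiveRecord models inside a rake task and save each corrected record.

```ruby
# Hypothetical record shape: :status is the stored (cached) value,
# :invoice_paid and :days_past_due are the data it is derived from.
Cardholder = Struct.new(:id, :status, :invoice_paid, :days_past_due)

# The status the record should have according to the business rule.
def expected_status(holder)
  return 'in good standing' if holder.invoice_paid || holder.days_past_due <= 0

  holder.days_past_due > 30 ? 'canceled' : 'in arrears'
end

# Recomputes every status and corrects the stored one if it has drifted.
# Returns the IDs that were fixed so the job can log them.
def reconcile_statuses(holders)
  holders.each_with_object([]) do |holder, fixed|
    expected = expected_status(holder)
    next if holder.status == expected

    holder.status = expected
    fixed << holder.id
  end
end

holders = [
  Cardholder.new(1, 'in good standing', true, 0),
  Cardholder.new(2, 'in good standing', false, 45), # stale stored status
  Cardholder.new(3, 'in arrears', false, 10)
]
fixed = reconcile_statuses(holders)
puts "corrected: #{fixed.inspect}"
```

Returning the corrected IDs makes it easy to log how often the stored value drifts, which is a useful signal for whether the callbacks are missing a case.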