rails activerecord objects persistence for a specified time - ruby-on-rails

In a rails 3 app, I need to create objects who are persistent for specified time. For example, say I created an activerecord object (of a model) but need it to be around only for say 12 hours. The activerecord object needs to be automatically destroyed from the database after 12 hours. There is no need for the deletion to happen through controller, it could happen directly on the database. What is the best way to achieve this? There might be thousands of such objects getting created and destroyed at any timer and each needs to be destroyed at specified time.
I looked around and found some gems - whenever & delayed job. whenever is more of a wrapper on cron and hence not suitable considering the number of activerecord objects to be managed. I am yet to investigate 'delayed job' which uses background process, but wanted to get some feedback on available options.
Thanks in advance.

Why don't you run a cron/whenever job every few minutes that queries the database for objects where created_at > 12.hours.ago? Thus you manage to delete all objects by query instead of going one by one.
every 5.minutes do
Model.where(:created_at > 12.hours.ago).delete_all
end
If you need to be more precise, I can imagine using Redis keys for your objects and setting the time expiration on keys at 12 hours, though you would need to figure out how to connect with Redis.
In any case, something needs to be observing the database for this to happen, or you have expiring keys.

Related

Ruby on Rails - Most efficient solution for this Class?

I'm a senior Comp. Sci. major working on a senior design project for our faculty. The name of this project is "Gradebook", and it is responsible for allowing instructors to record grades for students and for students to check their grades in a class. This project is written in Ruby on Rails, and this feature set is integrated into our current CS Website.
One requirement for our project is to constantly keep the course average and each of the student's averages updated. So I designed a CourseInfo class and a StudentInfo class to help with this process.
The CourseInfo class accepts a Gradebook (an ActiveRecord object) as a parameter and calculates the course average. It creates an Associative Array of StudentInfo objects, with each StudentInfo object containing the student's overall average in the class. The benefit of this is that I can calculate the Course Average with one line of code that initializes the class, and it is very clean.
But there is one issue that I'm mulling over. The problem is, the CourseInfo object does not survive when another HTTP request is made, I have to keep recreating it. Whether I'm adding an assignment, editing a category, or recording grades, I have to keep it updated because this project uses AJAX requests all the time. Instructors do not have to refresh any pages, because AJAX requests are created with every action.
For example, suppose I'm recording grades for a specific assignment. With each grade I record into the spreadsheet, an AJAX request is made and the course average updates with each new grade. But the problem is, if I want to update the Course Average after recording a student's grade, since the CourseInfo object does not stay alive in the next request, I have to recreate the object to keep the average updated. But that is a LOT of work. That involves calculating each of the student's average for EACH assignment, and then calculating the course average for EACH student. I know, a lot of work and could be simpler right?
So naturally, I want this CourseInfo object to live forever as long as the client is using the website. I've thought of many different ways to solve this problem:
1) Global Variables or Class Variables - I honestly want to stay away from this approach because I hear it is bad design. I also hear that this approach is not thread-safe. But it seems to provide a simple solution to my problem?
2) Serialize the Object in the Database - This is what I'm learning towards the most. I hear that sometimes people will serialize a Hash that contains user preferences in a web app, why not serialize my CourseInfo object? I've also done some research on the MessagePack gem, and I could potentially encode the CourseInfo object using MessagePack and then store it into the database. I feel like this would be a noticeable performance increase.
3) Use some kind of cache - Gems such as Redis act as a cache, and I liked Redis because it is a key value store. I can store a CourseInfo object for each Gradebook that was used during the session, and if I need to update the CourseInfo object, I can simply fetch the CourseInfo object by using the Gradebok's ID as a key. But I'm not sure if this is thread-safe. What if two instructors attempt to update two different grades at the same time? Will there be multiple instances of this CourseInfo object for each client using Gradebook?
4) Store it in the Session - Yeah I pretty much crossed this option off my list. I researched this approach, and I hear it is horrible to store a lot of data in the session. I don't want to do this.
What do you think? If I don't want to reinitialize this large object for each request, how can I make it live forever? What is the most efficient solution? What do you think about my design?
Help would be much appreciated! Thanks!
Use
2) Serialize the Object in the Database
due to agile philosophy of implementing the simplest thing that could possibly work first.
see Saving arrays, hashes, and other non-mappable objects in text columns
The course_average allways reflects the persistent state of the users records. Serializing it is a no braner in ActiveRecord. If you are using postgres , you can even use the native json store, which you can not only deserialize but also query through. No need for additional complexity to maintain an extra store. This solution has also the benefit of having a persistent counter cache.(no need to recalculate if nothing changes)
However using a cache is also a valuable option. Just remember, if you want to use redis as a cache store you have to explicitly configure a cache expiring policy, as by default none of the keys will expire and you will recieve an out of memory error, when redis grows beyound the size of RAM on the machine.
The redis-rails gem will setup rails to use redis for caching.
Storing this information in the session might also work, but watch out you session not getting to big. The whole session data is allways loaded completely into memory, regardles of some information in it is required or not. Allways loading megabytes of data into memory for every http connection might be not a great idea.
There is also a 5th option, i would evaluate first. Check, does the computation of averages really takes so long. Or can the peformance of it, pobably be improved, e.g. by reducing n+1 queries, setting proper indexes, doing the whole computation in sql or preparing the necessary data completly in sql, so that all the necessary data can be fetched in 1 query.

How to build cached stats in database without taking down site?

I'm working on a Ruby on Rails site.
In order to improve performance, I'd like to build up some caches of various stats so that in the future when displaying them, I only have to display the caches instead of pulling all database records to calculate those stats.
Example:
A model Users has_many Comments. I'd like to store into a user cache model how many comments they have. That way when I need to display the number of comments a user has made, it's only a simple query of the stats model. Every time a new comment is created or destroyed, it simply increments or decrements the counter.
How can I build these stats while the site is live? What I'm concerned about is that after I request the database to count the number of Comments a User has, but before it is able to execute the command to save it into stats, that user might sneak in and add another comment somewhere. This would increment the counter, but then by immediately overwritten by the other thread, resulting in incorrect stats being saved.
I'm familiar with the ActiveRecord transactions blocks, but as I understand it, those are to guarantee that all or none succeed as a whole, rather than to act as mutex protection for data on the database.
Is it basically necessary to take down the site for changes like these?
Your use case is already handled by rails. It's called counter cache. There is a rails cast here: http://railscasts.com/episodes/23-counter-cache-column
Since it is so old, it might be out of date. The general idea is there though.
It's generally not a best practice to co-mingle application and reporting logic. Send your reporting data outside the application, either to another database, to log files that are read by daemons, or to some other API that handle the storage particulars.
If all that sounds like too much work then, you don't really want real time reporting. Assuming you have a backup of some sort (hot or cold) run the aggregations and generate the reports on the backup. That way it doesn't affect running application and you data shouldn't be more than 24 hours stale.
FYI, I think I found the solution here:
http://guides.ruby.tw/rails3/active_record_querying.html#5
What I'm looking for is called pessimistic locking, and is addressed in 2.10.2.

Sharing an large array with all users on a rails app

I have inherited an app that generates a large array for every user that visit the app. I recently discovered that it is identical for nearly all the users!!
Now I want to somehow make one copy of it so it is not built over and over again. I have thought of a few options and wanted input to see which one is the best:
1) Create a model and shove the data into the database
2) Create a YAML file and have the app load it when it initializes.
I personally like the model idea but a few engineers at work feel as though it does not deserve to be a full model. 97% of the times users will see the same exact thing but 3% of the time users will get a slightly different array (a few elements will have changed).
Any other approaches that I should consider.??..thanks in advance.
Remember that if you store the data in the DB, each request which requires the data will have to execute a DB query to pull it out. If you are running multiple server threads, each thread could have its own copy in memory (if they are all handling requests which require the use of the array). In that case, you wouldn't be saving any memory (though you might save time from not having to regenerate the array).
If you are running multiple server processes (not threads), and if the array contents change as the application is running, and the changes have to be visible to all the processes, caching in memory won't work. You will have to use the DB in that case.
From the information in your comment, I suggest you try something like this:
Store the array in your DB, and make sure that the record(s) used have created/updated timestamps. Cache the contents in memory using a constant/global variable/class variable. Also store the last time the cache was updated.
Every time you need to use the array, retrieve the relevant "updated" timestamp from the DB. (You may need to use hand-coded SQL and ModelName.connection.execute to avoid pulling back all the data in the record, which ActiveRecord will probably do.) If the timestamp is later than the last time your cache was updated, pull the array from the DB and update your cache.
Use a Mutex ('require thread') when retrieving/updating the cached data, in case your server setup may use multiple threads. (I don't think that Passenger does, but I have had problems similar to threading problems when using Passenger+RMagick, so I would still use a Mutex to be safe.)
Wrap all the code which deals with the cached array in a library class (or a class method on the model used to store the data), so the details of cache management don't spill over into the rest of the application.
Do a little bit of performance testing on the cache setup using Benchmark.measure {}. If a bug in the setup actually made performance worse rather than better, that would be sad...
I'd go with option 2. You can add two constants (for the 97% and 3%) that load from a YAML file when the app initializes. That ought to shrink your memory footprint considerably.
Having said that, yikes, this is just a band-aid on a hack, but you knew that already. I'd consider putting some time into a redesign, if you have that luxury.

rails 3 & activerecord: do I need to take special record locking precautions to update a counter field?

Each time a user's profile gets displayed we update an integer counter in the Users table.
I started thinking about high-concurrencey situations and wondered what happens if a bunch of people hit a user's profile page at the exact same time: does rails/activerecord magically handle the record locking and semaphore stuff for me?
Or does my app need to explicitly call some sort of mechanism to avoid missing update events when concurrent calls are made to my update method?
def profile_was_hit
self.update_attributes :hitcounter => self.hitcount + 1
end
And along those lines, when should I use something like Users.increment_counter(:hit counter, self.id) instead?
In the default configuration, a single Rails instance is only handling a single request at a time, so you don't have to worry about any concurrency trouble on the application layer.
If you have multiple instances of your application running (which you probably do/will), they will all make requests to the database without talking to one another about it. This is why this is handled at the database layer. MySQL, PostgreSQL, etc. are all going to lock the row on write.
The way you are handling this situation isn't ideal for performance though because your application is reading the value, incrementing it, and then writing it. That lag between read and write does allow for you to miss values. You can avoid this by pushing the increment responsibility to your database (UPDATE hitcounter SET hitcount = hitcount + 1;). I believe ActiveRecord has support for this built in, I'll/you'll have to go dig around for it though. Update: Oh, duh, yes you want to use the increment_counter method to do this. Reference/Documentation.
Once you update your process push incrementing responsibility to the database I wouldn't worry about performance for a while. I once had a PHP app do this once per request and it scaled gloriously for me to 100+ updates/second (mysqld on the same machine, no persistent connections, and always < 10% cpu).

Recommendations on handling object status fields in rails apps: store versus calculate?

I have a rails app that tracks membership cardholders, and needs to report on a cardholder's status. The status is defined - by business rule - as being either "in good standing," "in arrears," or "canceled," depending on whether the cardholder's most recent invoice has been paid.
Invoices are sent 30 days in advance, so a customer who has just been invoiced is still in good standing, one who is 20 days past the payment due date is in arrears, and a member who fails to pay his invoice more than 30 days after it is due would be canceled.
I'm looking for advice on whether it would be better to store the cardholder's current status as a field at the customer level (and deal with the potential update anomalies resulting from potential updates of invoice records without updating the corresponding cardholder's record), or whether it makes more sense to simply calculate the current cardholder status based on data in the database every time the status is requested (which could place a lot of load on the database and slow down the app).
Recommendations? Or other ideas I haven't thought of?
One important constraint: while it's unlikely that anyone will modify the database directly, there's always that possibility, so I need to try to put some safeguards in place to prevent the various database records from becoming out of sync with each other.
The storage of calculated data in your database is generally an optimisation. I would suggest that you calculate the value on every request and then monitor the performance of your application. If the fact that this data is not stored becomes an issue for you then is the time to refactor and store the value within the database.
Storing calculated values, particularly those that can affect multiple tables are generally a bad idea for the reasons that you have mentioned.
When/if you do refactor and store the value in the DB then you probably want a batch job that checks the value for data integrity on a regular basis.
The simplest approach would be to calculate the current cardholder status based on data in the database every time the status is requested. That way you have no duplication of data, and therefore no potential problems with the duplicates becoming out of step.
If, and only if, your measurements show that this calculation is causing a significant slowdown, then you can think about caching the value.
Recently I had similar decision to take and I decided to store status as a field in database. This is because I wanted to reduce sql queries and it looks simpler. I choose to do it that way because I will very often need to get this status and calculating it is (at least in my case) a bit complicated.
Possible problem with it is that it get out of sync, so I added some after_save and after_destroy to child model, to keep it synchronized. And of course if somebody would modify database in different way, it would make some problems.
You can write simple rake task that will check all statuses and, if needed, correct them. You can run it in cron so you don't have to worry about it.

Resources