Efficient way to retrieve user count in Swift app - ios

I'm making a simple Swift meditation app and want a feature that lets users see how many others have also installed the app ("You're part of a community of 354 other meditators").
My current plan: save a "blank" record to the public database in CloudKit on first launch.
Then each client, on login, retrieves all the records and counts how many there are?
Is there a better solution? I can imagine this getting slow if there are lots of users...
Thanks!

Regarding your CloudKit approach: as far as I'm aware, there is no option to return just the number of records. Instead, CloudKit returns the actual records in batches (it decides how many to return per batch). However, you can specify a limit on the number of records it returns.
If you did specify a limit, you would need to keep raising it: once the number of records grows larger than the limit, the query will no longer retrieve them all and your count will be wrong.
That is probably a bad idea, since you would have to keep releasing app updates to increase the limit (unless you stored that value in some other external database, which would then probably be preferable to CloudKit itself). In short, CloudKit is probably not the best fit for this.
It would probably be much easier to use a different public database setup. Either set up your own or use a service like 'Parse.com', which makes setting up and connecting to a public database very simple. An additional benefit of this approach is that you can run the count query on the server and return just the count value, rather than returning all the records and counting them locally, which is very inefficient.
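To make the difference concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for whatever server-side database you pick. The installs table and its columns are hypothetical. The server runs a single COUNT query and returns one number, instead of shipping every record to the client to be counted there.

```python
import sqlite3

def create_demo_db(n_installs: int) -> sqlite3.Connection:
    """Build an in-memory database with one row per (hypothetical) app install."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE installs (id INTEGER PRIMARY KEY, created_at TEXT)")
    conn.executemany(
        "INSERT INTO installs (created_at) VALUES (?)",
        [("2024-01-01",)] * n_installs,
    )
    conn.commit()
    return conn

def count_client_side(conn: sqlite3.Connection) -> int:
    """The inefficient approach: transfer every record, then count locally."""
    rows = conn.execute("SELECT * FROM installs").fetchall()
    return len(rows)

def count_server_side(conn: sqlite3.Connection) -> int:
    """The efficient approach: the database counts and returns a single value."""
    (count,) = conn.execute("SELECT COUNT(*) FROM installs").fetchone()
    return count

def community_message(conn: sqlite3.Connection) -> str:
    # Assumption: the current user's own install is included in the count,
    # so subtract 1 to get "other" meditators.
    others = count_server_side(conn) - 1
    return f"You're part of a community of {others} other meditators"

if __name__ == "__main__":
    db = create_demo_db(355)
    print(community_message(db))  # You're part of a community of 354 other meditators
```

Both functions return the same number; the difference is that the server-side version transfers one integer regardless of how many users there are.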

Related

Ruby on Rails - Most efficient solution for this Class?

I'm a senior Comp. Sci. major working on a senior design project for our faculty. The name of this project is "Gradebook", and it is responsible for allowing instructors to record grades for students and for students to check their grades in a class. This project is written in Ruby on Rails, and this feature set is integrated into our current CS Website.
One requirement for our project is to keep the course average and each student's average constantly up to date. So I designed a CourseInfo class and a StudentInfo class to help with this process.
The CourseInfo class accepts a Gradebook (an ActiveRecord object) as a parameter and calculates the course average. It builds a hash (associative array) of StudentInfo objects, each containing that student's overall average in the class. The benefit is that I can calculate the course average with one line of code that initializes the class, and it is very clean.
But there is one issue I'm mulling over: the CourseInfo object does not survive to the next HTTP request, so I have to keep recreating it. Whether I'm adding an assignment, editing a category, or recording grades, I have to keep it updated, because this project uses AJAX requests all the time. Instructors never have to refresh a page, because every action sends an AJAX request.
For example, suppose I'm recording grades for a specific assignment. With each grade I enter into the spreadsheet, an AJAX request is made and the course average updates. The problem is that, since the CourseInfo object does not survive to the next request, I have to recreate it every time I want to update the course average after recording a student's grade. That is a LOT of work: it involves calculating each student's average for EACH assignment, and then calculating the course average over EACH student. A lot of work that could be simpler, right?
So naturally, I want this CourseInfo object to live as long as the client is using the website. I've thought of several ways to solve this problem:
1) Global Variables or Class Variables - I honestly want to stay away from this approach because I hear it is bad design. I also hear that this approach is not thread-safe. But it seems to provide a simple solution to my problem?
2) Serialize the Object in the Database - This is the option I'm leaning towards the most. I hear that people sometimes serialize a Hash containing user preferences in a web app, so why not serialize my CourseInfo object? I've also done some research on the MessagePack gem, and I could potentially encode the CourseInfo object using MessagePack and then store it in the database. I feel like this would give a noticeable performance increase.
3) Use some kind of cache - Stores such as Redis act as a cache, and I like Redis because it is a key-value store. I can store a CourseInfo object for each Gradebook used during the session, and when I need to update it, I can simply fetch the CourseInfo object using the Gradebook's ID as the key. But I'm not sure whether this is thread-safe. What if two instructors attempt to update two different grades at the same time? Will there be multiple instances of this CourseInfo object, one for each client using Gradebook?
4) Store it in the Session - Yeah I pretty much crossed this option off my list. I researched this approach, and I hear it is horrible to store a lot of data in the session. I don't want to do this.
What do you think? If I don't want to reinitialize this large object for each request, how can I make it live forever? What is the most efficient solution? What do you think about my design?
Help would be much appreciated! Thanks!
Use option 2) Serialize the Object in the Database, in keeping with the agile philosophy of implementing the simplest thing that could possibly work first.
See "Saving arrays, hashes, and other non-mappable objects in text columns".
The course_average always reflects the persisted state of the users' records. Serializing it is a no-brainer in ActiveRecord. If you are using PostgreSQL, you can even use the native JSON type, which you can not only deserialize but also query. There is no need for the additional complexity of maintaining an extra store. This solution also has the benefit of acting as a persistent counter cache (no need to recalculate if nothing changes).
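As a rough illustration of option 2, the sketch below stores a computed course summary as JSON in a text column and recomputes it only when grades have changed. It uses Python and sqlite3, not ActiveRecord. The gradebooks and grades tables, the summary_json column and the dirty flag are all hypothetical. In Rails you would use a serialized (or PostgreSQL JSON) column kept up to date by model callbacks instead.

```python
import json
import sqlite3

def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gradebooks (
            id INTEGER PRIMARY KEY,
            summary_json TEXT,                 -- serialized course summary
            dirty INTEGER NOT NULL DEFAULT 1   -- 1 = summary is stale
        );
        CREATE TABLE grades (gradebook_id INTEGER, student TEXT, score REAL);
    """)
    conn.execute("INSERT INTO gradebooks (id) VALUES (1)")
    conn.commit()
    return conn

def record_grade(conn, gradebook_id, student, score):
    """Insert a grade and mark the stored summary as stale, atomically."""
    with conn:
        conn.execute("INSERT INTO grades VALUES (?, ?, ?)", (gradebook_id, student, score))
        conn.execute("UPDATE gradebooks SET dirty = 1 WHERE id = ?", (gradebook_id,))

def compute_summary(conn, gradebook_id) -> dict:
    rows = conn.execute(
        "SELECT student, AVG(score) FROM grades WHERE gradebook_id = ? "
        "GROUP BY student ORDER BY student",
        (gradebook_id,),
    ).fetchall()
    students = {name: avg for name, avg in rows}
    # Course average = mean of the student averages (unweighted, an assumption).
    course_avg = sum(students.values()) / len(students) if students else None
    return {"students": students, "course_average": course_avg}

def get_summary(conn, gradebook_id) -> dict:
    """Return the stored summary, recomputing and re-serializing it only if stale."""
    summary_json, dirty = conn.execute(
        "SELECT summary_json, dirty FROM gradebooks WHERE id = ?", (gradebook_id,)
    ).fetchone()
    if dirty or summary_json is None:
        summary = compute_summary(conn, gradebook_id)
        with conn:
            conn.execute(
                "UPDATE gradebooks SET summary_json = ?, dirty = 0 WHERE id = ?",
                (json.dumps(summary), gradebook_id),
            )
        return summary
    return json.loads(summary_json)
```

Reads that follow an unchanged gradebook only deserialize a single column. A write only flips a flag, so the expensive recomputation happens at most once per change.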
However, using a cache is also a valid option. Just remember that if you use Redis as a cache store, you have to explicitly configure an expiration policy (for example, give keys a TTL, or set maxmemory together with an eviction maxmemory-policy such as allkeys-lru). By default none of the keys expire, and you will run into out-of-memory errors once Redis grows beyond the RAM on the machine.
The redis-rails gem will set up Rails to use Redis for caching.
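Redis handles expiry itself, through per-key TTLs (for example SET key value EX seconds) or an eviction policy. The Python sketch below is only an in-process stand-in, not the Redis API. It illustrates why every cached entry should carry an expiry. The clock parameter and the key format course_info:<gradebook id> are hypothetical choices made for testability.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

class TTLCache:
    """Minimal in-process cache where every entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (self.clock() + self.ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Expired: drop the entry and report a miss (lazy expiry on read).
            del self._data[key]
            return None
        return value

    def fetch(self, key: str, compute: Callable[[], Any]) -> Any:
        """Return the cached value, or compute, store and return it on a miss.
        A computed value of None is treated as a miss and recomputed next time."""
        value = self.get(key)
        if value is None:
            value = compute()
            self.set(key, value)
        return value

    def purge_expired(self) -> int:
        """Remove every expired entry, including keys nobody reads again."""
        now = self.clock()
        expired = [k for k, (exp, _) in self._data.items() if now >= exp]
        for k in expired:
            del self._data[k]
        return len(expired)
```

Without purge_expired, entries that are never read again would sit in memory forever. That is the in-process version of the unbounded Redis growth described above.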
Storing this information in the session might also work, but watch out that your session doesn't get too big. The whole session is always loaded into memory, regardless of whether the information in it is needed. Loading megabytes of data into memory on every HTTP request is probably not a great idea.
There is also a 5th option, which I would evaluate first: check whether computing the averages really takes that long, or whether its performance can be improved, e.g. by eliminating N+1 queries, adding proper indexes, doing the whole computation in SQL, or preparing all the necessary data in SQL so it can be fetched in a single query.
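As an example of "doing the whole computation in SQL", the sketch below returns every student's average and the course average in one query. It uses Python and sqlite3. The grades table is hypothetical, and category weights, which a real gradebook would likely have, are deliberately not modeled.

```python
import sqlite3

# One round trip: per-student averages plus the course average (the mean of
# the student averages). Unweighted; category weights are not modeled.
AVERAGES_SQL = """
WITH per_student AS (
    SELECT student, AVG(score) AS avg_score
    FROM grades
    WHERE gradebook_id = ?
    GROUP BY student
)
SELECT student, avg_score, (SELECT AVG(avg_score) FROM per_student) AS course_avg
FROM per_student
ORDER BY student
"""

def setup_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE grades (gradebook_id INTEGER, student TEXT, score REAL)")
    # An index on the filter column avoids scanning every gradebook's grades.
    conn.execute("CREATE INDEX idx_grades_gradebook ON grades (gradebook_id)")
    conn.executemany(
        "INSERT INTO grades VALUES (?, ?, ?)",
        [(1, "alice", 90), (1, "alice", 80), (1, "bob", 70), (2, "carol", 100)],
    )
    conn.commit()
    return conn

def gradebook_averages(conn: sqlite3.Connection, gradebook_id: int):
    """Return ({student: average}, course_average) for one gradebook."""
    rows = conn.execute(AVERAGES_SQL, (gradebook_id,)).fetchall()
    student_avgs = {student: avg for student, avg, _ in rows}
    course_avg = rows[0][2] if rows else None
    return student_avgs, course_avg
```

If a query like this is fast enough, there may be no need to keep a CourseInfo object alive between requests at all.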

How to efficiently load millions of domain objects

I am facing a problem because I have millions of domain objects in one table. When I try to get all objects using Domain.findAllBy(), after a couple of minutes I get an OutOfMemoryError. Is there an efficient way to load all of them without getting this error again?
Should I page the results and load only what is necessary?
Please tell me if I am doing it wrong as well.
Thank you for your help and happy new year ;)
The real solution is not to try to retrieve all domain objects in one go into memory. No matter how much memory you buy, you can't guarantee that the domain objects won't grow more quickly than your RAM.
Next, even if you could store all the objects in memory, it would take a non-trivial amount of time to retrieve them all. Any operations you want to undertake on the objects - modifying attributes, calling methods - would take even longer.
I can't imagine a scenario where a human user would want to see millions of business objects on a web page - even paging through them all doesn't make sense.
So, if you are retrieving the objects to modify them, do so in the database. If you are retrieving them to run a method on the business objects, use paging, or consider if you can implement that method as a database call. If you're retrieving them for display, you'll need to allow the user to filter their request, and provide pagination.
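The two suggestions above are processing in pages and pushing modifications into the database. The Python/sqlite3 sketch below illustrates both. It is not GORM code, and the domain_objects table and its columns are hypothetical. Paging uses keyset pagination (WHERE id > last_id) rather than OFFSET, because with a large OFFSET the database has to skip over more and more rows on every page.

```python
import sqlite3
from typing import Iterator, List, Tuple

def setup_demo(n_rows: int) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE domain_objects (id INTEGER PRIMARY KEY, name TEXT, active INTEGER)"
    )
    conn.executemany(
        "INSERT INTO domain_objects (name, active) VALUES (?, 1)",
        [(f"obj{i}",) for i in range(n_rows)],
    )
    conn.commit()
    return conn

def iter_in_batches(conn: sqlite3.Connection,
                    batch_size: int = 1000) -> Iterator[List[Tuple[int, str]]]:
    """Yield rows one page at a time, so only batch_size rows are in memory."""
    last_id = 0
    while True:
        batch = conn.execute(
            "SELECT id, name FROM domain_objects WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            return
        yield batch
        last_id = batch[-1][0]  # resume after the last id seen

def deactivate_below(conn: sqlite3.Connection, cutoff_id: int) -> int:
    """Modify rows inside the database with one UPDATE; nothing is loaded."""
    with conn:
        cur = conn.execute("UPDATE domain_objects SET active = 0 WHERE id < ?", (cutoff_id,))
    return cur.rowcount
```

Each batch can be processed and discarded before the next is fetched, so memory use stays bounded no matter how large the table grows.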

How to build cached stats in database without taking down site?

I'm working on a Ruby on Rails site.
In order to improve performance, I'd like to build up some caches of various stats so that in the future when displaying them, I only have to display the caches instead of pulling all database records to calculate those stats.
Example:
A User model has_many Comments. I'd like to store, in a user stats (cache) model, how many comments each user has. That way, when I need to display the number of comments a user has made, it's only a simple query of the stats model. Every time a comment is created or destroyed, the counter is simply incremented or decremented.
How can I build these stats while the site is live? What I'm concerned about is this: after I ask the database to count a user's comments, but before it saves that count into the stats, the user might sneak in and add another comment somewhere. That would increment the counter, but the increment would then be immediately overwritten by the other thread, resulting in incorrect stats being saved.
I'm familiar with the ActiveRecord transactions blocks, but as I understand it, those are to guarantee that all or none succeed as a whole, rather than to act as mutex protection for data on the database.
Is it basically necessary to take down the site for changes like these?
Your use case is already handled by Rails. It's called a counter cache. There is a RailsCast about it here: http://railscasts.com/episodes/23-counter-cache-column
Since it is quite old, it might be out of date, but the general idea still holds.
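The core of a counter cache is that the counter changes inside the database, in a single UPDATE statement, rather than being read into the application, changed and written back. That read-modify-write round trip is where two writers can overwrite each other. The Python/sqlite3 sketch below shows the idea with hypothetical tables. In Rails, the counter_cache: true option on belongs_to maintains a comments_count column along these lines.

```python
import sqlite3

def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY,
                            comments_count INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE comments (id INTEGER PRIMARY KEY,
                               user_id INTEGER NOT NULL REFERENCES users(id),
                               body TEXT);
        INSERT INTO users (id) VALUES (1);
    """)
    return conn

def create_comment(conn: sqlite3.Connection, user_id: int, body: str) -> int:
    with conn:  # both statements commit together or not at all
        cur = conn.execute(
            "INSERT INTO comments (user_id, body) VALUES (?, ?)", (user_id, body)
        )
        # Increment in SQL rather than in application code, so concurrent
        # writers each add 1 instead of overwriting one another's value.
        conn.execute(
            "UPDATE users SET comments_count = comments_count + 1 WHERE id = ?",
            (user_id,),
        )
    return cur.lastrowid

def destroy_comment(conn: sqlite3.Connection, comment_id: int) -> None:
    with conn:
        row = conn.execute(
            "SELECT user_id FROM comments WHERE id = ?", (comment_id,)
        ).fetchone()
        if row is None:
            return  # already gone; leave the counter alone
        conn.execute("DELETE FROM comments WHERE id = ?", (comment_id,))
        conn.execute(
            "UPDATE users SET comments_count = comments_count - 1 WHERE id = ?",
            (row[0],),
        )
```

This keeps the counter correct going forward. The initial backfill for existing users is the part that still needs care.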
It's generally not a best practice to commingle application and reporting logic. Send your reporting data outside the application: to another database, to log files that are read by daemons, or to some other API that handles the storage particulars.
If all that sounds like too much work, then you don't really need real-time reporting. Assuming you have a backup of some sort (hot or cold), run the aggregations and generate the reports on the backup. That way the running application isn't affected, and (with daily backups) your data shouldn't be more than 24 hours stale.
FYI, I think I found the solution here:
http://guides.ruby.tw/rails3/active_record_querying.html#5
What I'm looking for is called pessimistic locking, and is addressed in 2.10.2.
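In Rails, pessimistic locking is exposed through lock / lock! (which issue SELECT ... FOR UPDATE on databases that support it) inside a transaction. SQLite has no row-level locks, so the Python sketch below uses BEGIN IMMEDIATE instead. It is an SQLite analogue, not the Rails mechanism. BEGIN IMMEDIATE takes the database write lock before the count is read, so no other writer can add a comment between the COUNT and the UPDATE. The users and comments tables are hypothetical.

```python
import sqlite3

def connect(path: str) -> sqlite3.Connection:
    # isolation_level=None: the sqlite3 module won't open transactions
    # implicitly, so BEGIN/COMMIT/ROLLBACK below are fully under our control.
    return sqlite3.connect(path, isolation_level=None)

def backfill_comment_count(conn: sqlite3.Connection, user_id: int) -> int:
    """Recount a user's comments and store the result without a race window."""
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock *before* reading
    try:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM comments WHERE user_id = ?", (user_id,)
        ).fetchone()
        conn.execute(
            "UPDATE users SET comments_count = ? WHERE id = ?", (count, user_id)
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return count
```

Other writers wait (or time out) while the lock is held, so the site can stay up. The backfill should be run per user, in short transactions, to keep those waits brief.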

Questions about caching in high-traffic website

Suppose we are building an e-commerce site that lets consumers search for products by typing in keywords. Say there are at most 200,000 products, and there are millions of consumers using the system. Let's say the product table is updated fairly frequently. Since the number of products is not that high, we can probably store the entire product table in memory and search against it instead of hitting the database. We are hoping to create distributed caches that store the same data but reside on different servers (for high availability and performance reasons), and we need to be able to synchronize data among these caches and invalidate them when the product table is modified.
Our application is built using ASP.NET MVC and NHibernate. I am trying to understand whether NHibernate's level-2 caching would help in my situation. I would really appreciate it if you could shed some light on this.
I understand that level-2 caching will help cache query results, so if two different users search using the same keyword, the L2 cache will serve the result from the cache instead of from the database. But that doesn't help us much, since the product table is updated frequently and the cached results will become stale.
My question is: am I understanding L2 caching correctly, and does anything exist that helps manage caching the way I would like (multiple caches holding the same data, synchronization between them, and cache invalidation)? Any thoughts are highly appreciated.
Having used both the second-level cache (with the memcached provider) and the NHibernate.Search add-on, it seems to me you could benefit from both.
The NHibernate.Search component depends on Lucene.Net, so keyword search is decoupled from the database itself. A separate index file is created per mapped class, and optimizations can be set at the property level using attributes, giving you an extra level of granularity. Additionally, you can implement best-match ranking and suggestions (see Lucene in Action and/or Hibernate Search in Action). Note that you don't have to maintain the index yourself (unless you explicitly request an index rebuild); the implementation manages everything behind the scenes, although you can manipulate the index if you wish. So adding, deleting, or updating a product will automatically update the corresponding index.
The second-level cache gives you an instant performance boost. On a test environment with a data set of approximately 2 million rows, I saw more than a 20% improvement even at an extremely low request count. The boost grows as the request count increases: the application first checks the second-level cache, and if the data is not there, it hits the database to fetch the required rows and inserts them into the cache for future queries. Again, you can manage cache duration and other configuration settings, and explicitly clear the cache (all of it, part of it, or particular entries) if you wish. Note that cache state is managed by the application during save/update/delete.
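NHibernate does this internally. The Python sketch below only illustrates the read path and invalidate-on-write behaviour the answer describes. A dict stands in for the cache provider and another dict stands in for the database. ProductRepository and db_reads are hypothetical names, and this is not NHibernate's API.

```python
from typing import Dict, Optional

class ProductRepository:
    """Cache-aside reads with invalidation on save/delete."""

    def __init__(self, db: Dict[int, dict]):
        self.db = db                      # stand-in for the database
        self.cache: Dict[int, dict] = {}  # stand-in for the second-level cache
        self.db_reads = 0                 # counts how often we hit the "database"

    def get(self, product_id: int) -> Optional[dict]:
        if product_id in self.cache:
            return self.cache[product_id]  # hit: no database access
        self.db_reads += 1
        product = self.db.get(product_id)
        if product is not None:
            self.cache[product_id] = product  # populate for future reads
        return product

    def save(self, product_id: int, product: dict) -> None:
        self.db[product_id] = product
        self.cache.pop(product_id, None)  # invalidate so the next read is fresh

    def delete(self, product_id: int) -> None:
        self.db.pop(product_id, None)
        self.cache.pop(product_id, None)
```

Because every write goes through the same repository, the cache never serves a product that was changed through it. Changes made behind the application's back are a different matter.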
For scalability:
* the second-level cache depends on the provider (e.g. memcached is highly performant and scalable, and supports distributed instances).
* for Lucene.Net/NHibernate.Search, you will need to set up a specific location where the indexes reside, and that location must be readable and writable by all web application instances. Note that the weak link here is I/O and file contention, so a machine with a very fast file system will help prevent problems there (I am speaking of your scenario, with many thousands of search requests per second).
As a side note, I would highly recommend NHibernate.Search, since it is far faster than LIKE queries and easier to use than implementing SQL Server's Full-Text Search inside the application (which I have done).
Whether a second level cache will help depends on exactly how frequently your product table is updated in relation to cache hits. If you add 100 new products an hour but receive 10,000 queries an hour, even a 10% cache hit rate will make a big difference. If the rates are reversed, a second level cache will be of almost no value.
I suggest you set up a stress test environment that closely approximates your production environment and perform benchmarking on the various second level cache providers.
Also check that your DB is configured properly for an update-heavy scenario.
I recommend using NHibernate.Search with Lucene. It works together with the second-level cache: Lucene does sophisticated text searching very fast and then returns the entity keys to NHibernate, which pulls the full entities out of its second-level cache. The NHibernate.Search extension does the work of keeping your Lucene index in sync.
TekPub did a recent episode on your exact scenario of searching product descriptions. The episode compares NHibernate queries, SQL full-text indexing, and Lucene with NHibernate.Search.
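Lucene's core data structure is an inverted index. The toy Python sketch below is not Lucene or NHibernate.Search. It only shows the flow described above: the index maps words to entity keys, and the full entities are then looked up by key. A products dict stands in for the second-level cache. The class name, tokenizer and AND-only query semantics are all assumptions, with no ranking or stemming.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

class InvertedIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)  # term -> keys
        self.docs: Dict[int, Set[str]] = {}                    # key -> terms

    def index(self, key: int, text: str) -> None:
        self.remove(key)  # re-indexing an updated product replaces its old terms
        terms = set(tokenize(text))
        self.docs[key] = terms
        for term in terms:
            self.postings[term].add(key)

    def remove(self, key: int) -> None:
        for term in self.docs.pop(key, set()):
            self.postings[term].discard(key)

    def search(self, query: str) -> Set[int]:
        """Return keys of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

if __name__ == "__main__":
    products = {1: {"name": "Cork yoga mat"},
                2: {"name": "Meditation cushion"},
                3: {"name": "Travel yoga mat bag"}}
    idx = InvertedIndex()
    for pid, p in products.items():
        idx.index(pid, p["name"])
    # The index yields keys; full entities come from the (stand-in) cache.
    print([products[k]["name"] for k in sorted(idx.search("yoga mat"))])
```

A search touches only the posting sets of the query terms, which is why this beats a LIKE '%...%' scan over every row.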

Recommendations on handling object status fields in rails apps: store versus calculate?

I have a rails app that tracks membership cardholders, and needs to report on a cardholder's status. The status is defined - by business rule - as being either "in good standing," "in arrears," or "canceled," depending on whether the cardholder's most recent invoice has been paid.
Invoices are sent 30 days in advance, so a customer who has just been invoiced is still in good standing, one who is 20 days past the payment due date is in arrears, and a member who fails to pay his invoice more than 30 days after it is due would be canceled.
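The business rule above is small enough to express as a pure function, which is what "calculate every time the status is requested" would call. The Python sketch below is an illustration, not code from the app. The Invoice shape is hypothetical. Two boundary cases are assumptions: exactly 30 days overdue counts as "in arrears", and a cardholder with no invoice yet is "in good standing".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumption: "more than 30 days after it is due" means day 31 onward is canceled.
CANCEL_AFTER_DAYS = 30

@dataclass
class Invoice:
    due_date: date
    paid_on: Optional[date] = None  # None means the invoice is still unpaid

def cardholder_status(latest_invoice: Optional[Invoice], today: date) -> str:
    """Derive the status from the most recent invoice instead of storing it."""
    # Assumption: no invoice yet, or latest invoice paid -> in good standing.
    if latest_invoice is None or latest_invoice.paid_on is not None:
        return "in good standing"
    days_overdue = (today - latest_invoice.due_date).days
    if days_overdue <= 0:
        # Invoiced but not yet due (invoices go out 30 days in advance).
        return "in good standing"
    if days_overdue <= CANCEL_AFTER_DAYS:
        return "in arrears"
    return "canceled"
```

Since the function depends only on the latest invoice and today's date, it cannot drift out of sync with the invoice records.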
I'm looking for advice on whether it would be better to store the cardholder's current status as a field at the customer level (and deal with the potential update anomalies resulting from potential updates of invoice records without updating the corresponding cardholder's record), or whether it makes more sense to simply calculate the current cardholder status based on data in the database every time the status is requested (which could place a lot of load on the database and slow down the app).
Recommendations? Or other ideas I haven't thought of?
One important constraint: while it's unlikely that anyone will modify the database directly, there's always that possibility, so I need to try to put some safeguards in place to prevent the various database records from becoming out of sync with each other.
Storing calculated data in your database is generally an optimisation. I would suggest that you calculate the value on every request and then monitor the performance of your application. If not storing this data becomes a problem, that is the time to refactor and store the value in the database.
Storing calculated values, particularly ones that involve multiple tables, is generally a bad idea for the reasons you have mentioned.
When/if you do refactor and store the value in the DB then you probably want a batch job that checks the value for data integrity on a regular basis.
The simplest approach would be to calculate the current cardholder status based on data in the database every time the status is requested. That way you have no duplication of data, and therefore no potential problems with the duplicates becoming out of step.
If, and only if, your measurements show that this calculation is causing a significant slowdown, then you can think about caching the value.
Recently I had a similar decision to make, and I decided to store the status as a field in the database. I wanted to reduce SQL queries, and it looked simpler. I chose to do it that way because I very often need to get this status, and calculating it is (at least in my case) a bit complicated.
A possible problem is that it can get out of sync, so I added after_save and after_destroy callbacks to the child model to keep it synchronized. And of course, if somebody modified the database in some other way, it could still cause problems.
You can write a simple rake task that checks all statuses and corrects them if needed. You can run it from cron so you don't have to worry about it.
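A periodic check of this kind boils down to recomputing each stored status and fixing mismatches. The Python/sqlite3 sketch below is a stand-in for such a rake task, not the task itself. The cardholders and invoices tables, ISO-string dates and the status strings are assumptions. The status rule mirrors the business rule from the question. Correlated subqueries fetch each cardholder's latest invoice in a single query, avoiding one query per cardholder.

```python
import sqlite3
from datetime import date
from typing import List, Optional

CANCEL_AFTER_DAYS = 30  # assumption: day 31 past due onward is "canceled"

def compute_status(due_date: Optional[str], paid_on: Optional[str], today: date) -> str:
    # Same rule as the question; dates are stored as ISO strings (assumption).
    if due_date is None or paid_on is not None:
        return "in good standing"
    overdue = (today - date.fromisoformat(due_date)).days
    if overdue <= 0:
        return "in good standing"
    return "in arrears" if overdue <= CANCEL_AFTER_DAYS else "canceled"

LATEST_SQL = """
SELECT c.id, c.status,
  (SELECT due_date FROM invoices WHERE cardholder_id = c.id
     ORDER BY due_date DESC, id DESC LIMIT 1),
  (SELECT paid_on FROM invoices WHERE cardholder_id = c.id
     ORDER BY due_date DESC, id DESC LIMIT 1)
FROM cardholders c
ORDER BY c.id
"""

def reconcile_statuses(conn: sqlite3.Connection, today: date) -> List[int]:
    """Correct any stored status that disagrees with the computed one.
    Returns the ids of the cardholders that were fixed."""
    rows = conn.execute(LATEST_SQL).fetchall()
    fixed = []
    with conn:  # apply all corrections in one transaction
        for cid, stored, due, paid in rows:
            expected = compute_status(due, paid, today)
            if stored != expected:
                conn.execute(
                    "UPDATE cardholders SET status = ? WHERE id = ?", (expected, cid)
                )
                fixed.append(cid)
    return fixed
```

Logging the returned ids is worthwhile: a non-empty list means something is writing to the database without going through the callbacks.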
