How to efficiently load millions of domain objects - grails

I am facing a problem: I have millions of domain objects in one table. When I try to get all of them using Domain.findAllBy(), I get an OutOfMemoryError after a couple of minutes. I would like to know if there is an efficient way to load all of them without getting this error.
Should I page the results and only load the necessary ones?
Please tell me if I am doing it wrong as well.
Thank you for your help and happy new year ;)

The real solution is not to try to retrieve all domain objects in one go into memory. No matter how much memory you buy, you can't guarantee that the domain objects won't grow more quickly than your RAM.
Next, even if you could store all the objects in memory, it would take a non-trivial amount of time to retrieve them all. Any operations you want to undertake on the objects - modifying attributes, calling methods - would take even longer.
I can't imagine a scenario where a human user would want to see millions of business objects on a web page - even paging through them all doesn't make sense.
So, if you are retrieving the objects to modify them, do so in the database. If you are retrieving them to run a method on the business objects, use paging, or consider if you can implement that method as a database call. If you're retrieving them for display, you'll need to allow the user to filter their request, and provide pagination.
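The paging suggestion can be sketched roughly as follows. This is a plain Ruby illustration rather than Grails code: `fetch_page` is a hypothetical stand-in for a paged query (in Grails, something along the lines of `Domain.list(max: ..., offset: ...)`), and the in-memory `TABLE` array only simulates the database table. The point is that no more than one page of objects needs to be held at a time.

```ruby
# Simulated table; in a real application these rows live in the database.
TABLE = (1..10_000).map { |id| { id: id, amount: id % 7 } }.freeze

# Hypothetical stand-in for a paged query (LIMIT max OFFSET offset).
def fetch_page(offset, max)
  TABLE[offset, max] || []
end

# Yields every row, one page at a time, so only one page of rows
# has to be loaded at any moment.
def each_in_pages(page_size)
  offset = 0
  loop do
    page = fetch_page(offset, page_size)
    break if page.empty?
    page.each { |row| yield row }
    offset += page.size
  end
end

# Example: aggregate over all rows without ever building one big list.
total = 0
each_in_pages(500) { |row| total += row[:amount] }
puts total
```

A real paged query would also need a stable sort order (for example by id) so that pages neither overlap nor skip rows.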

Related

Efficient way to retrieve user count in Swift app

I'm making a simple Swift meditation app and want to have a feature to allow users to see how many others have installed the app as well ("You're part of a community of 354 other meditators")
My current plan - save a "blank" record on first load to public DB in CloudKit.
Then - each client on login retrieves all the records and counts how many there are?
Is there a better solution? I could imagine this getting slow if there are lots of users...
Thanks!

In terms of your CloudKit example, as far as I'm aware there is no option to return the number of records, instead CloudKit just returns the actual records in batches (it decides how many to return). However, you may specify a limit of records for it to return.
If you did specify a limit, you would need to continually update it since once the number of records grows larger than the limit it will no longer retrieve them all and your count will be wrong.
This would be a bad idea probably since you will have to continually release app updates to increase the limit (unless you stored this value in some kind of other external DB which would then probably be preferable to CloudKit itself). Basically, CloudKit is probably not the best idea for this.
It would probably be much easier to use a different public DB setup. Either set up your own or use a service like 'Parse.com' which makes setting up and connecting to a public DB very simple. An additional benefit of doing it this way is that you can run the count query on the server and return just the count value, rather than returning all records and counting them locally, which is very inefficient.

Ruby on Rails - Most efficient solution for this Class?

I'm a senior Comp. Sci. major working on a senior design project for our faculty. The name of this project is "Gradebook", and it is responsible for allowing instructors to record grades for students and for students to check their grades in a class. This project is written in Ruby on Rails, and this feature set is integrated into our current CS Website.
One requirement for our project is to constantly keep the course average and each of the student's averages updated. So I designed a CourseInfo class and a StudentInfo class to help with this process.
The CourseInfo class accepts a Gradebook (an ActiveRecord object) as a parameter and calculates the course average. It creates an Associative Array of StudentInfo objects, with each StudentInfo object containing the student's overall average in the class. The benefit of this is that I can calculate the Course Average with one line of code that initializes the class, and it is very clean.
But there is one issue that I'm mulling over. The problem is that the CourseInfo object does not survive when another HTTP request is made; I have to keep recreating it. Whether I'm adding an assignment, editing a category, or recording grades, I have to keep it updated because this project uses AJAX requests all the time. Instructors do not have to refresh any pages, because AJAX requests are created with every action.
For example, suppose I'm recording grades for a specific assignment. With each grade I record into the spreadsheet, an AJAX request is made and the course average updates with each new grade. But the problem is, if I want to update the Course Average after recording a student's grade, since the CourseInfo object does not stay alive in the next request, I have to recreate the object to keep the average updated. But that is a LOT of work. That involves calculating each student's average for EACH assignment, and then calculating the course average for EACH student. I know, it's a lot of work and could be simpler, right?
So naturally, I want this CourseInfo object to live forever as long as the client is using the website. I've thought of many different ways to solve this problem:
1) Global Variables or Class Variables - I honestly want to stay away from this approach because I hear it is bad design. I also hear that this approach is not thread-safe. But it seems to provide a simple solution to my problem?
2) Serialize the Object in the Database - This is what I'm leaning towards the most. I hear that sometimes people will serialize a Hash that contains user preferences in a web app, so why not serialize my CourseInfo object? I've also done some research on the MessagePack gem, and I could potentially encode the CourseInfo object using MessagePack and then store it into the database. I feel like this would be a noticeable performance increase.
3) Use some kind of cache - Stores such as Redis can act as a cache, and I liked Redis because it is a key-value store. I can store a CourseInfo object for each Gradebook that was used during the session, and if I need to update the CourseInfo object, I can simply fetch the CourseInfo object by using the Gradebook's ID as a key. But I'm not sure if this is thread-safe. What if two instructors attempt to update two different grades at the same time? Will there be multiple instances of this CourseInfo object for each client using Gradebook?
4) Store it in the Session - Yeah I pretty much crossed this option off my list. I researched this approach, and I hear it is horrible to store a lot of data in the session. I don't want to do this.
What do you think? If I don't want to reinitialize this large object for each request, how can I make it live forever? What is the most efficient solution? What do you think about my design?
Help would be much appreciated! Thanks!

Use option 2) Serialize the Object in the Database, following the agile philosophy of implementing the simplest thing that could possibly work first.
See "Saving arrays, hashes, and other non-mappable objects in text columns".
The course_average always reflects the persistent state of the users' records. Serializing it is a no-brainer in ActiveRecord. If you are using Postgres, you can even use the native JSON store, which you can not only deserialize but also query. There is no need for the additional complexity of maintaining an extra store. This solution also has the benefit of acting as a persistent counter cache (no need to recalculate if nothing changes).
However, using a cache is also a valid option. Just remember, if you want to use Redis as a cache store you have to explicitly configure a cache expiry policy, as by default none of the keys will expire and you will receive an out of memory error when Redis grows beyond the size of RAM on the machine.
The redis-rails gem will set up Rails to use Redis for caching.
Storing this information in the session might also work, but watch out that your session doesn't get too big. The whole session data is always loaded completely into memory, regardless of whether the information in it is required or not. Always loading megabytes of data into memory for every HTTP connection might not be a great idea.
There is also a 5th option, which I would evaluate first. Check whether the computation of averages really takes that long, or whether its performance can be improved, e.g. by reducing N+1 queries, setting proper indexes, doing the whole computation in SQL, or preparing the necessary data completely in SQL so that everything can be fetched in one query.
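As a rough sketch of the "persistent counter cache" idea: instead of rebuilding everything on each request, keep a running sum and count, adjust them when a single grade changes, and serialize that small structure into a text or JSON column. The `AverageCache` class and its field names are hypothetical and not part of the original project; plain Ruby with the standard `json` library stands in here for the ActiveRecord column.

```ruby
require "json"

# Hypothetical cache of an average; it stores only a sum and a count,
# so recording or editing one grade avoids a full recalculation.
class AverageCache
  attr_reader :sum, :count

  def initialize(sum: 0.0, count: 0)
    @sum = sum.to_f
    @count = count
  end

  # A new grade was recorded.
  def add(grade)
    @sum += grade
    @count += 1
  end

  # An existing grade was edited from old_grade to new_grade.
  def replace(old_grade, new_grade)
    @sum += new_grade - old_grade
  end

  # Returns nil while no grades have been recorded.
  def average
    @count.zero? ? nil : @sum / @count
  end

  # Serialized form that could be stored in a text/JSON column.
  def to_json(*args)
    { sum: @sum, count: @count }.to_json(*args)
  end

  def self.from_json(text)
    data = JSON.parse(text)
    new(sum: data["sum"], count: data["count"])
  end
end

cache = AverageCache.new
[90, 80, 70].each { |g| cache.add(g) }
stored = cache.to_json                     # persisted between requests
restored = AverageCache.from_json(stored)  # loaded in a later request
restored.replace(70, 100)                  # one grade edited
puts restored.average                      # average of 90, 80 and 100
```

This only covers a simple unweighted mean; weighted categories would need one such sum/count pair per category.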

The best way to handle erratic data on iOS

I am working on an application that connects to a database. The database contains from 300MB to 4GB of data, as each customer has their own database. My issue is with gathering the data: because of the potential database size, just downloading and storing the information locally isn't possible. The data can get quite complex and can vary. For example:
A customer has a Job and they want to search for that job from the app.
I then fetch a list of jobs matching the search criteria.
The customer sees the job they want to view and I start the gathering process.
This job can potentially touch many tables, sometimes repeatedly.
There is the jobs table, and a relational table to map it to a person. Then there is another table that contains non-customer relational information, then there are calendar events associated with the job, which in turn can be associated with different people. Then there are emails attached to the job, which in turn can bring in additional people and events.
So I have a working model that gathers all of this information. The problem I have is that I cannot figure out a good method of signaling to my view that the data is completely downloaded. My initial thought was to use the NotificationCenter to post a message when certain parts of the task are finished, allowing the core Job object to notify the view when everything is complete.
I know this is a pretty generalized question, but I'm honestly stumped as to how to take an unknown number of table results and translate that into a notice that my app can actually use.

My initial recommendation would be Core Data. It's designed for this kind of problem. No, I'm not saying to download the entire database into Core Data. I'm saying to use Core Data to manage your object model, because that's what it's good at.
As you receive data from the server, compose it into NSManagedObjects and stick them in the data store. On the UI side, create an NSFetchedResultsController to keep you informed as the data updates asynchronously. You don't necessarily need to persist this store. You could just keep it in memory and throw it away whenever you're done with the query, but keeping it on disk could be a nice caching solution. Again, don't think of Core Data as "a local database." Think of it as a model persistence engine that you can query for objects.
One advantage of this model is that you can provide the best available data to the user as it becomes available. But say you really don't want to show the information until it's all available. That's fine, too. Just let the network side keep updating its context, and then only save it when everything's complete. That way NSFetchedResultsController gets a single atomic update. The nice thing about Core Data is that it has these concepts built in, so you can adjust your update strategy without requiring a massive redesign.

The Notification Center will work great for this.
Post the notification at logical points in your data load to trigger a UI update for your users.

Core Data confusion in retrieving records

I'm currently building a Core Data app and I've hit a snag. I guess here's some context on the schema:
The app is to keep track of a therapist's session with her clients. So the schema is organized thus: there's a table of clients, clients have sessions, sessions have activities, and activities have metrics. In the app these metrics translate to simple counters, timers, and NSSliders.
The crux is that the client wants to be able to insert previously made activities into new sessions for new clients. So, I've tried just doing a simple fetch request and then moved on to an NSFetchedResultsController. I keep running into the issue that since Core Data is an object graph, I get a ton of activity entries with virtually the same data. The only differentiating property would be whatever the session is (and if you want to go further back, the client itself).
I'm not sure if this is something I need to change in the schema itself, or if there's some kind of workaround I can do within Core Data. I've already tried doing distinct fetch results with the NSFetchedResultsController by using the result type NSDictionaryResultType. It kind of accomplishes what I want but I only get the associated properties of the entity, and not any children entities associated with it (I need those metrics, you see).
Any help is appreciated, and I can post code if desired even though I don't really have a specific coding error.

I don't see the problem. If you modeled things with the Client, Session, Activity, and Metric entities, each having a to-many relationship to the one to its right and to-one/to-many inverse relationship to the one to its left (in the order I listed the entities), there is nothing stopping you from adding a particular activity into another session (of another client), is there?
Maybe I'm misunderstanding the question.

Just use a simple NSFetchRequest and set the predicate for exactly what you are looking for. You can set the fetch limit if you are getting too many results, but your question doesn't exactly sound like a question IMO.
I believe what you are looking for is an NSPredicate to narrow your results down. Once you fetch a specific object you can assign any relation or attribute to that object easily with dot notation then save the context.

Large database - Best way to display data on device?

I am currently creating an iOS app, which connects to a database and asynchronously downloads a JSON object of data to display in a table view.
As it currently stands, this is an ok way to do it. However, when the database starts getting much larger, this will cause a massive inconvenience. I'm reasonably proficient in Objective-C but not so much in the database side of things. What would be the best way to get this data from the server, and keep it in the app? At the moment, I have a custom class object storing the data for each of the 'objects' in the JSON object. There will however be many other aspects of the app that the database will handle, such as invites, logins and user details.
Would Core Data be the way to go? I.e., duplicating the database (to a certain extent) and storing it locally, then accessing it from there. As I said, I'm not really sure which route to take here, so any advice would be really appreciated.

Core Location is for handling location (satellite and Wi-Fi positioning).
I guess you mean Core Data. Core Data is an object graph framework that lets you manipulate data as objects. You don't dig directly into the database; you ask for objects to be instantiated through predicates (similar to a WHERE clause in SQL) and then manipulate the objects.
That said, it all depends on what counts as a "big" database. If it's really big, you could consider copying part of it locally and requesting the rest from the server through your web service.
Another question to ask yourself is how much of the data never changes, and whether your website database and your app database need to stay synchronized (if your website database is always changing, it would be pointless to copy it entirely into your app and keep it constantly synced).
Links :
Introduction to Core Data
Difference between Core Data and a Database (Cocoa With Love)
edit :
A question you can ask yourself is where your data needs to be saved.
If your app just displays 20 cells out of a total of 200, then I would download all 200 cells at once. The other cells will then load with no delay after the first download, which is especially welcome if you're using a table view with reusable cells.
Is a delay of a few seconds acceptable between the first 20 cells and the next 20? I think there is no single "good" answer to your question; it depends on many factors (the purpose of your app, the acceptable time between loads, whether the info needs to be modified and saved back to the server or locally, what kind of customers you have, what your app will do with the cells, whether a local database would be totally independent from the "mother" database (and if not, what kind of synchronization is needed), etc.).
To sum up, based on what I've understood of your needs: web services are good if you just need to retrieve info and use it without saving it back (although you can also expose services that allow saving), while a local database is good if you need your app to be independent from your server in some ways.
Only you hold the key to answering all this and making a decision according to your needs and your knowledge of your application and your customers.

Something like JSON or SOAP is the way to go with getting structured data from a web service into objects in your iPhone app.
Storing relational data on the iPhone itself is easy with SQLite. Here's a decent looking tutorial.
Make things easy for yourself by writing a data layer, abstracting away calls to the database, to avoid dotting SQL queries all over your code in places it shouldn't be, like the UI.
