Disappointing iOS search times with CoreData - ios

I have a coredata db running on an ipad with iOS 5.1.1. My database has around 50,000 companies in it. I have created an index on the company name attribute. I am searching on this attribute and on occasion get several thousand records returned with a fetchRequest.
When several thousand records are returned then it can take a couple of seconds to return from the fetch. This makes type-ahead searching pretty clunky.
I anticipate having much larger databases in the future. What are my options for implementing a really fast search function?
Thanks.

I recommend watching the core-data performance videos from the last few WWDCs. They often talk about strategies for improving this kind of bottleneck. Some suggestions from the videos :
De-normalise the name field into a separate "case and diacritic insensitive" searchString field and search on that field instead, using <, <= or BEGINSWITH. Avoid MATCHES and wildcards.
Limit the number of results returned by the NSFetchRequest using fetchLimit and fetchBatchSize
If your company object is large you can extract some of the key data items off onto a separate smaller "header" object that is used just for the search interface. Then add a relationship back to the main object when the user makes a selection.
Some pointers to a couple of videos (there are more from other years also):
WWDC 2012: Session 214 - Core Data Best Practices : 45:00
WWDC 2010: Session 137 - Optimizing Core Data Performance on iPhone OS: 34:00

While Core Data is the correct tool in many cases, it's not a silver bullet by any means.
Check out this post which discusses a few optimizing strategies and also cases when an SQL database is your better choice instead of Core Data:
http://inessential.com/2010/02/26/on_switching_away_from_core_data
You may honestly be better off using an SQL database instead of Core Data in this case because when you try to access attribute values on Core Data entities, it typically results in a fault and pulls the object into active memory... this can definitely have performance and speed costs. Using a true database - Core Data is not a true database, see http://cocoawithlove.com/2010/02/differences-between-core-data-and.html - you can query the database without creating objects from it.

Related

NSKeyedArchiver vs Core Data

I am building an app with Objective-C and I would like to persist data. I am hesitating between NSKeyedArchiver and core Data. I am aware there are plenty of ressources about this on the web (including Objective-C best choice for saving data) but I am still doubtful about the one I should use. Here are the two things that make me wonder :
(1) I am assuming I will have around 1000-10000 objects to handle for a data volume of 1-10 Mb. I will do standard database queries on these objects. I would like to be able to load all these objects on launching and to save them from time to time -- a 1 second processing time for loading or saving would be fine by me.
(2) For the moment my model is rather intricate : for instance classA contains among other properties an array of classB which is itself formed by (among other) a property of type classC and a property of type classD. And class D itself contains properties of type classE.
Am I right to assume that (1) means that NSKeyedArchiver will still work fine and that (2) means that using core Data may not be very simple ? I have tried to look for cases where core Data was used with complex object graph structure like my case (2) on the web but haven't found that many ressources. This is for the moment what refrains me the most from using it.
The two things you identify both make me lean towards using CoreData rather than NSKeyedArchiver:
CoreData is well able to cope with 10,000 objects (if not considerably more), and it can support relatively straight-forward "database-like" queries of the data (sorting with NSSortDescriptors, filtering with NSPredicate). There are limitations on what can be achieved, but worst case you can load all the data into memory - which is what you would have to do with the NSKeyedArchiver solution.
Loading in sub-second times should be achievable (I've just tested with 10,000 objects, totalling 14Mb, in 0.17 secs in the simulator), particularly if you optimise to load only essential data initially, and let CoreData's faulting process bring in the additional data when necessary. Again, this will be better than NSKeyedArchiver.
Although most demos/tutorials opt for relatively straight forward data models (enough to demonstrate attributes and relationships), CoreData can cope with much more sophisticated data models. Below is a mock-up of the relationships that you describe, which took a few minutes to put together:
If you generate subclasses for all those entities, then traversing those relationships is simple (both forwards and backwards - inverse relationships are managed automatically for you). Again, there are limitations (CoreData does the SQL work for you, but in so doing it is less flexible than using a relational database directly).
Hope that helps.

Write native SQL in Core Data

I need to write a native SQL Query while I'm using Core Data in my project. I really need to do that, since I'm using NSPredicate right now and it's not efficient enough (in just one single case). I just need to write a couple of subqueries and joins to fetch a big number of rows and sort them by a special field. In particular, I need to sort it by the sum of values of their child-entites. Right now I'm fetching everything using NSPredicate and then I'm sorting my result (array) manually, but this just takes too long since there are many thousands of results.
Please correct me if I'm wrong, but I'm pretty sure this can't be a huge challenge, since there's a way of using sqlite in iOS applications.
It would be awesome if someone could guide me into the right direction.
Thanks in advance.
EDIT:
Let me explain what I'm doing.
Here's my Coredata model:
And here's how my result looks on the iPad:
I'm showing a table with one row per customer, where every customer has an amount of sales he made from January to June 2012 (Last) AND 2013 (Curr). Next to the Curr there's the variance between those two values. The same thing for gross margin and coverage ratio.
Every customer is saved in the Kunde table and every Kunde has a couple of PbsRows. PbsRow actually holds the sum of sales amounts per month.
So what I'm doing in order to show these results, is to fetch all the PbsRows between January and June 2013 and then do this:
self.kunden = [NSMutableOrderedSet orderedSetWithArray:[pbsRows valueForKeyPath:#"kunde"]];
Now I have all customers (Kunde) which have records between January and June 2013.
Then I'm using a for loop to calculate the sum for each single customer.
The idea is to get the amounts of sales of the current year and compare them to the last year.
The bad thing is that there are a lot of customers and the for-loop just takes very long :-(
This is a bit of a hack, but... The SQLite library is capable of opening more than one database file at a given time. It would be quite feasible to open the Core Data DB file (read/only usage) directly with SQLite and open a second file in conjunction with this (reporting/temporary tables). One could then execute direct SQL queries on the data in the Core Data DB and persist them into a second file (if persistence is needed).
I have done this sort of thing a few times. There are features available in the SQLite library (example: full-text search engine) that are not exposed through Core Data.
If you want to use Core Data there is no supported way to do a SQL query. You can fetch specific values and use [NSExpression expressionForFunction:arguments:] with a sum: function.
To see what SQL commands Core Data executes add -com.apple.CoreData.SQLDebug 1 to "Arguments Passed on Launch". Note that this should not tempt you to use the SQL commands youself, it's just for debugging purposes.
Short answer: you can't do this.
Long answer: Core Data is not a database per se - it's not guaranteed to have anything relational backing it, let alone a specific version of SQLite that you can query against. Furthermore, going mucking around in Core Data's persistent store files is a recipe for disaster, especially if Apple decides to change the format of that file in some way. You should instead try to find better ways to optimize your usage of NSPredicate or start caching the values you care about yourself.
Have you considered using the KVC collection operators? For example, if you have an entity Foo each with a bunch of children Bar, and those Bars have a Baz integer value, I think you can get the sum of those for each Foo by doing something like:
foo.bars.#sum.baz
Not sure if these are applicable to predicates, but it's worth looking into.

Is core data performance better with less attributes?

I have a core data entity called DiveSite that has a large number of attributes of which many are booleans that represent features or conditions affecting a dive site.
In fact, I have so many attributes that xCode gives me a warning - "Misconfigured Entity - DiveSite has more than 100 properties; consider a more shallow entity hierarchy or denormalize properties"
Many of these properties could be grouped reducing the overall number of attributes on the entity - I could possibly change groups of booleans into a series of integers and do a logical and to check the factors i want.
I also realise that I could make these groups into separate entities - some of which would have a 1-1 relationship and some a 1-many relationship
In terms of performance would changing my DiveSite entity to have fewer attributes be a positive thing to do?
If so would it likely be better performance-wise to have separate entities or to have perhaps 6 attributes which I call using a predicate to filter on. ?
Thinking about it whilst phrasing this question, I realise that if I go the separate entities route I allow myself to add factors to some of the entities just by adding them as instances of the entity without changing my code.
I may have answered my own question as I write this but would appreciate the opinions of experience core data /and database users out there
Cheers
Yes it's adviced to keep your entities small. When you have a list view for example, you generally don't need all the information on the objects, but when you click one and go to the detail view, you would want to fetch more detailed information. Then you can fetch it from the other entities.
Of course, you should make relationships between these entities.
I can't say if it's a good practice to keep the entities "small" or not. But from my experience with Core Data, big entities aren't an issue.
By big, I mean an entity with 25 to 50 attributes, with per example a lot of long strings or binary data. The query time is, for entities of that size, more often than not, greater than the load and instantiation time. Fetching 1000 full entities in one fetch is usually faster than fetching 1000 partial entities then faulting 100 missing attributes.
On a side note, I must add I very rarely used entities of that size in a shipping product. Large entities are almost always refactored in several smaller related entities.
Now you told you reached something like 100 attributes. Wow. I think I never hit that mark in any of my projects - Core Data or any "classic" database. I would say the first issue here is readability & maintainability. I'm pretty sure you can refactor such a big entity in smaller ones, define the core attributes defining the principal entity, find some shared values here and there, etc. That would certainly helps.
Performance wise, as always, the answer lies in the profiler to accurately measure where the time is spent. Fetching too many happens, but fetching too little (aka loads of queries) happens more often in my experience.

Improve process of mirroring server database to a client database via JSON?

I have a ready enterprise (non-AppStore) legacy iOS application for iPad that I need to refactor (it was written by another developer, my predecessor on my current job).
This application fetches its data via JSON from a server having MSSQL database. The database schema has about 30 tables, the most capacious are: Client, City, Agency each having about 10.000 records each and the further growth is expected in the future. After the JSON is received (one JSON request-and-response pair for each table) - it is mapped to the CoreData - the process which also includes glueing together the corresponding CoreData entities (Client, City, Agency and others) with each other i.e. setting the relations beetween these entities on the CoreData layer.
In itself the project's CoreData fetch-part (or read-part) is heavily optimized - it uses, I guess, almost all possible performance and memory tweaks CoreData has, that is why UI layer of application is very fast and responsive, so that I consider its work as completely satisfactory and adequate.
The problem is the process of preparation of CoreData layer, i.e. the server-to-client synchronization process: it takes too much time. Consider 30 network requests resulting in 30 JSON packs ("pack" I mean "one table - one JSON"), which are then mapped to 30 CoreData entities, which are then glued together (the appropriate CoreData relations are set beetween them). When I first saw how all this is done in this project (too slow), the first idea to come into my head was:
"For the first time a complete synchronization is performed (app's first launch time) - perform a fetch of the whole database data in, say, one archived file (something like database dump) and then somehow import it as a whole to a Core Data land".
But then I realized that, even if such transmission of such one-file dump was possible, CoreData would still require me to perform a gluing of the corresponding CoreData entities to set the appropriate relations beetween them so that it is hard to imagine that I could benefit in performance if I would rely on this scheme.
Also, my colleague suggested me to consider SQLite as a complete alternative to Core Data, but unfortunately I don't have an experience of using it, that's why I am completely blind to foresee all the consequences of such serious design decision (even having the synchronization process very slow, my app does work, especially its UI performance is very good now). The only thing I can imagine about SQLite that in contrast to Core Data it will not push me to glue some additional relations on a client side, because SQLite has its good old foreign key system, doesn't it?
And so here are the questions (Respondents, please do not mix these points when you answer - there is too much confusion I have about all of them):
Does anybody have such experience of taking "first-time large import of the whole database" approach in a way I have described above? I would be very thankful to know about any solutions should they exploit JSON<->CoreData pair or not.
Does Core Data has some global import mechanism which can allow mass-creation of corresponding 30-tables-schema (possibly using some specific source other than "30 packs of JSON" described above) without a need of setting up corresponding relations for 30 entities?
Are there any possibilities to speed up the synchronization process if 2) is impossible? Here I mean the improvements of current JSON<->CoreData scheme my app uses.
Migration to SQLite: should I consider such migration? What I will benefit from it? How the whole process of replication->transmission->client preparations could look like then?
Other alternatives to CoreData and SQLite - what could they be or look like?
Any other thoughts or visions you may have about the situation I've described?
UPDATE 1
Though the answer written by Mundi is good (one large JSON, "No" for using SQLite), I am still interested if there are any other insights into the the problem I've described.
UPDATE 2
I did try to use my russian english the best way I could to describe my situation in a hope for my question could become pretty clear to everyone who will read it. By this second update I will try to provide it with some more guides to make my question even more clear.
Please, consider two dichotomies:
What can/should I use as a data layer on iOS client - CoreData vs SQLite?
What can/should I use as a transport layer - JSON (single-JSON-at-once as suggested in the answer, even zipped maybe) or some DB-itself-dumps (if it is even possible, of course - notice I am also asking this in my question).
I think it is pretty obvious the "sector" which is formed by intersection of these two dichotomies, choosing CoreData from the first one and JSON from the second is the most wide-spread default in iOS development world and also it is used by my app from this question.
Having that said, I claim that I would be thankful to see any answers regarding CoreData-JSON pair as well as the answers considering using any other "sectors" (what about opting SQLite and some kind of its dumps approach, why not?)
Also, important to note, that I don't want to just drop the current option for some other alternatives, I just want to get the solution working fast on both synchronization and UI phases of its usage. So answers about improving current scheme as well as answers suggesting the other schemes are welcome!
Now, please see the following update #3 which provides more details for my current CoreData-JSON situation:
UPDATE 3
As I have said, currently my app receives 30 packs of JSON - one pack for the whole table. Let's take capacious tables for example: Client, Agency, City.
It is Core Data, so if a client record has non-empty agency_id field, I need to create new Core Data entity of class Agency (NSManagedObject subclass) and fill it with this record's JSON data, that's why I need to already have corresponding Core Data entity for this agency of class Agency (NSManagedObject's subclass), and finally I need to do something like client.agency = agency; and then call [currentManagedObjectContext save:&error]. Having it done this way, later I can then ask this client to be fetched and ask its .agency property to find corresponding entity. I hope I am completely sane when I do it this way.
Now imagine this pattern applied to the following situation:
I have just received the following 3 separate JSON packs: 10000 clients and 4000 cities and 6000 agencies (client has one city, city has many clients; client has agency, agency has many clients, agency has one city, city has many agencies).
Now I want to setup the following relations on Core Data level: I want my client entity client to be connected to a corresponding city and corresponding agency.
The current implementation of this in the project does very ugly thing:
Since dependency order is the following: City -> Agency -> Client i.e. the City needs to be baked first, the application begins creating entities for City and persists them to Core Data.
Then it deals with the JSON of agencies: it iterates through every JSON record - for every agency, it creates a new entity agency and by its city_id, it fetches corresponding entity city and connects it using the agency.city = city. After the iteration through the whole agencies JSON array is done, current managed object context is saved (actually the -[managedObjectContext save:] is done several times, each after 500 records processed). At this step it is obvious that fetching one of 4000 cities for every client for every of 6000 agencies has a huge performance impact on the whole synchronization process.
Then, finally it deals with the JSON of clients: like in previous 2 stage, it iterates through the whole 10000-elements JSON array and one by one performs the fetch of corresponding agencies and ZOMG cities, and this impacts the overall performance in the same manner like previous stage 2 does.
It is all very BAD.
The only performance optimization I can see here, is that the first stage could leave a large dictionary with cities ids (I mean NSNumber's of real ids) and faulted City entities as values) so it would be possible to prevent ugly find process of the following stage 2 and then do the same on the stage 3 using the analogous caching trick, but the problem is that there are much more relations beetween all the 30 tables that just-described [ Client-City, Client-Agency, Agency-City ] so the final procedure involving a caching of all the entities will the most probably hit the resources iPad device reserves for my app.
UPDATE 4
Message for future respondents: I've tried my best to make this answer well-detailed and well-formed and I really expect you to answer with verbose answers. It would be great if your answer would really address the complexity of problem discussed here and complement my efforts I've made to make my question clear and general as much as possible. Thanks.
UPDATE 5
Related topics: Core Data on client (iOS) to cache data from a server Strategy, Trying to make a POST request with RestKit and map the response to Core Data.
UPDATE 6
Even after it is no more possible to open new bounties and there is accepted answer, I still would be glad to see any other answers containing additional information about the problem this topic addresses. Thanks in advance.
I have experience in a very similar project. The Core Data insertions take some time, so we condition the user that this will take a while, but only the first time. The best performance tweak was of course to get the batch size right between saves, but I am sure you are aware of that.
One performance suggestion: I have tried a few things and found that creating many download threads can be a hit on performance, I suppose because for each request there is some latency from the server etc.
Instead, I discovered that downloading all the JSON in one go was much faster. I do not know how much data you have, but I tested with > 100.000 records and a 40MB+ JSON string this works really fast, so the bottleneck is just the Core Data insertions. With an #autorelease pool this even performed acceptably on a first generation iPad.
Stay away from the SQLite API - it will take you more than a man year (provided high productivity) to replicate the performance optimizations you get out of the box with Core Data.
First off, you're doing a lot of work, and it will take some time no matter how you slice it, but there are ways to improve things.
I'd recommend doing your fetches in batches, with a batch size matching your batch size for processing new objects. For example, when creating new Agency records, do something like:
Make sure the current Agency batch is sorted by city_id. (I'll explain why later).
Get the City ID for each Agency in the batch. Depending on how your JSON is structured, this is probably a one-liner like this (since valueForKey works on arrays):
NSArray *cityIDs = [myAgencyBatch valueForKey:#"city_id"];
Get all the City instances for the current pass in one fetch by using the IDs you found in the previous step. Sort the results by city_id. Something like:
NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:#"City"];
NSPredicate *predicate = [NSPredicate predicateWithFormat:#"city_id in %#", cityIDs];
[request setPredicate:predicate];
[request setSortDescriptors:#[ [NSSortDescriptor sortDescriptorWithKey:#"city_id" ascending:YES] ]];
NSArray *cities = [context executeFetchRequest:request error:nil];
Now, you have one array of Agency and another one of City, both sorted by city_id. Match them up to set up the relationships (check city_id in case things don't match). Save changes, and go on to the next batch.
This will dramatically reduce the number of fetches you need to do, which should speed things up. For more on this technique, see Implementing Find-or-Create Efficiently in Apple's docs.
Another thing that may help is to "warm up" Core Data's internal cache with the objects you need before you start fetching them. This will save time later on because getting property values won't require a trip to the data store. For this you'd do something like:
NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:#"City"];
// no predicate, get everything
[request setResultType:NSManagedObjectIDResultType];
NSArray *notUsed = [context executeFetchRequest:request error:nil];
..and then just forget about the results. This is superficially useless but will alter the internal Core Data state for faster access to City instances later on.
Now as for your other questions,
Using SQLite directly instead of Core Data might not be a terrible choice for your situation. The benefit would be that you'd have no need to set up the relationships, since you could use use fields like city_id as foreign keys. So, fast importing. The downside, of course, is that you'll have to do your own work converting your model objects to/from SQL records, and probably rewrite quite a lot of existing code that assumes Core Data (e.g. every time you follow a relationship you now need to look up records by that foreign key). This change might fix your import performance issues, but the side effects could be significant.
JSON is generally a very good format if you're transmitting data as text. If you could prepare a Core Data store on the server, and if you would use that file as-is instead of trying to merge it into an existing data store, then that would almost certainly speed things up. Your import process would run once on the server and then never again. But those are big "if"s, especially the second one. If you get to where you need to merge a new server data store with existing data, you're right back to where you are now.
Do you have control of the server? I ask, because it sounds like you do from the following paragraph:
"For the first time a complete synchronization is performed (app's first launch time) - perform the fetch of the whole database data in, say, one archived file (something like database dump) and then somehow import it as a whole to the CoreData land".
If sending a dump is possible, why not send the Core Data file itself? Core Data (by default) is backed by a SQLite database -- why not generate that database on the server, zip it and send it across the wire?
This would mean you could eliminate all the JSON parsing, network requests etc and replace it with a simple file download and archive extraction. We did this on a project and it improved performance immeasurably.
For each row in your table there must be a timestamp column. If there isn't one, you should add it.
First time and each time you fetch database dump you store last update date and time.
On every next time you instruct the database to return only those records that were changed or updated since the previous download operation. There also should be a "deleted" flag for you to remove vanished records.
Then you only need to update certain matching records saving time on all fronts.
To speed up the first time sync you can also ship a seed database with the app, so that it could be imported immediately without any network operations.
Download the JSON files by hand.
Put them into your project.
Somewhere in the project configuration or header files take a note of download date and time.
On the first run, locate and load said files, then proceed like you're updating them.
If in doubt, refer to the manual.
Example:
NSString *filePath = [[NSBundle mainBundle] pathForResource:#"cities"
ofType:#"json"];
NSData *citiesData = [NSData dataWithContentsOfFile:filePath];
// I assume that you're loading an array
NSArray *citiesSeed = [NSJSONSerialization JSONObjectWithData:citiesData
options:NSJSONReadingMutableContainers error:nil];
Here you have my recommendations:
Use magicalrecord. It's a CoreData wrapper that saves you a lot of boilerplate code, plus it comes with very interesting features.
Download all the JSON in one request, as others suggested. If you can embed the first JSON document into the app, you can save the download time and start populating the database right when you open the app for the first time. Also, with magicalrecord is quite easy to perform this save operation in a separate thread and then sync all contexts automatically. This can improve the responsiveness of your app.
It seems that you should refactor that ugly method once you have solved the first import issue. Again, I would suggest to use magicalrecord to easily create those entities.
We've recently moved a fairly large project from Core Data to SQLite, and one of the main reasons was bulk insert performance. There were quite a few features we lost in the transition, and I would not advise you to make the switch if you can avoid it. After the transition to SQLite, we actually had performance issues in areas other than bulk inserts which Core Data was transparently handling for us, and even though we fixed those new issues, it took some amount of time getting back up and running. Although we've spent some time and effort in transitioning from Core Data to SQLite, I can't say that there are any regrets.
With that cleared up, I'd suggest you get some baseline measurements before you go about fixing the bulk insert performance.
Measure how long it takes to insert those records in the current state.
Skip setting up the relationships between those objects altogether, and then measure the insert performance.
Create a simple SQLite database, and measure the insert performance with that. This should give you a very good baseline estimate of how long it takes to perform the actual SQL inserts and will also give you a good idea of the Core Data overhead.
A few things you can try off the bat to speed up inserts:
Ensure that there are no active fetched results controllers when you are performing the bulk inserts. By active, I mean fetched result controllers that have a non-nil delegate. In my experience, Core Data's change tracking was the single most expensive operation when trying to do bulk insertions.
Perform all changes in a single context, and stop merging changes from different contexts until this bulk inserts are done.
To get more insight into what's really going on under the hood, enable Core Data SQL debugging and see the SQL queries that are being executed. Ideally, you'd want to see a lot of INSERTs, and a few UPDATEs. But if you come across too many SELECTs, and/or UPDATEs, then that's a sign that you are doing too much reading, or updating of objects.
Use the Core-Data profiler instrument to get a better high-level overview of what's going on with Core Data.
I've decided to write my own answer summarizing the techniques and advices I found useful for my situation. Thanks to all folks who posted their answers.
I. Transport
"One JSON". This is the idea that I want to give a try. Thanks #mundi.
The idea of archiving JSON before sending it to a client, be it a one JSON pack or a 30 separate 'one table - one pack'.
II. Setting up Core Data relations
I will describe a process of importing JSON->CoreData import using imaginary large import operation as if it was performed in one method (I am not sure will it be so or not - maybe I split it into a logical chunks).
Let's imagine that in my imaginary app there are 15 capacious tables, where "capacious" means "cannot be held in memory at once, should be imported using batches" and 15 non-capacious tables each having <500 records, for example:
Capacious:
cities (15k+)
clients (30k+)
users (15k+)
events (5k+)
actions (2k+)
...
Small:
client_types (20-)
visit_types (10-)
positions (10-)
...
Let's imagine, that I already have JSON packs downloaded and parsed into composite NSArray/NSDictionary variables: I have citiesJSON, clientsJSON, usersJSON, ...
1. Work with small tables first
My pseudo-method starts with import of tiny tables first. Let's take client_types table: I iterate through clientTypesJSON and create ClientType objects (NSManagedObject's subclasses). More than that I collect resulting objects in a dictionary with these objects as its values and "ids" (foreign keys) of these objects as keys.
Here is the pseudocode:
NSMutableDictionary *clientTypesIdsAndClientTypes = [NSMutableDictionary dictionary];
for (NSDictionary *clientTypeJSON in clientsJSON) {
ClientType *clientType = [NSEntityDescription insertNewObjectForEntityForName:#"ClientType" inManagedObjectContext:managedObjectContext];
// fill the properties of clientType from clientTypeJSON
// Write prepared clientType to a cache
[clientTypesIdsAndClientTypes setValue:clientType forKey:clientType.id];
}
// Persist all clientTypes to a store.
NSArray *clientTypes = [clientTypesIdsAndClientTypes allValues];
[managedObjectContext obtainPermanentIDsForObjects:clientTypes error:...];
// Un-fault (unload from RAM) all the records in the cache - because we don't need them in memory anymore.
for (ClientType *clientType in clientTypes) {
[managedObjectContext refreshObject:clientType mergeChanges:NO];
}
The result is that we have a bunch of dictionaries of small tables, each having corresponding set of objects and their ids. We will use them later without a refetching because they are small and their values (NSManagedObjects) are now faults.
2. Use the cache dictionary of objects from small tables obtained during step 1 to set up relationships with them
Let's consider complex table clients: we have clientsJSON and we need to set up a clientType relation for each client record, it is easy because we do have a cache with clientTypes and their ids:
for (NSDictionary *clientJSON in clientsJSON) {
Client *client = [NSEntityDescription insertNewObjectForEntityForName:#"Client" inManagedObjectContext:managedObjectContext];
// Setting up SQLite field
client.client_type_id = clientJSON[#"client_type_id"];
// Setting up Core Data relationship beetween client and clientType
client.clientType = clientTypesIdsAndClientTypes[client.client_type_id];
}
// Save and persist
3. Dealing with large tables - batches
Let's consider a large clientsJSON having 30k+ clients in it. We do not iterate through the whole clientsJSON but split it into a chunks of appropriate size (500 records), so that [managedObjectContext save:...] is called every 500 records. Also it is important to wrap operation with each 500-records batch into an #autoreleasepool block - see Reducing memory overhead in Core Data Performance guide
Be careful - the step 4 describes the operation applied to a batch of 500 records not to a whole clientsJSON!
4. Dealing with large tables - setting up relationships with large tables
Consider the following method, we will use in a moment:
#implementation NSManagedObject (Extensions)
+ (NSDictionary *)dictionaryOfExistingObjectsByIds:(NSArray *)objectIds inManagedObjectContext:(NSManagedObjectContext *)managedObjectContext {
NSDictionary *dictionaryOfObjects;
NSArray *sortedObjectIds = [objectIds sortedArrayUsingSelector:#selector(compare:)];
NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] initWithEntityName:NSStringFromClass(self)];
fetchRequest.predicate = [NSPredicate predicateWithFormat:#"(id IN %#)", sortedObjectIds];
fetchRequest.sortDescriptors = #[[[NSSortDescriptor alloc] initWithKey: #"id" ascending:YES]];
fetchRequest.includesPropertyValues = NO;
fetchRequest.returnsObjectsAsFaults = YES;
NSError *error;
NSArray *fetchResult = [managedObjectContext executeFetchRequest:fetchRequest error:&error];
dictionaryOfObjects = [NSMutableDictionary dictionaryWithObjects:fetchResult forKeys:sortedObjectIds];
return dictionaryOfObjects;
}
#end
Let's consider clientsJSON pack containing a batch (500) of Client records we need to save. Also we need to set up a relationship beetween these clients and their agencies (Agency, foreign key is agency_id).
NSMutableArray *agenciesIds = [NSMutableArray array];
NSMutableArray *clients = [NSMutableArray array];
for (NSDictionary *clientJSON in clientsJSON) {
Client *client = [NSEntityDescription insertNewObjectForEntityForName:#"Client" inManagedObjectContext:managedObjectContext];
// fill client fields...
// Also collect agencies ids
if ([agenciesIds containsObject:client.agency_id] == NO) {
[agenciesIds addObject:client.agency_id];
}
[clients addObject:client];
}
NSDictionary *agenciesIdsAndAgenciesObjects = [Agency dictionaryOfExistingObjectsByIds:agenciesIds];
// Setting up Core Data relationship beetween Client and Agency
for (Client *client in clients) {
client.agency = agenciesIdsAndAgenciesObjects[client.agency_id];
}
// Persist all Clients to a store.
[managedObjectContext obtainPermanentIDsForObjects:clients error:...];
// Un-fault all the records in the cache - because we don't need them in memory anymore.
for (Client *client in clients) {
[managedObjectContext refreshObject:client mergeChanges:NO];
}
Most of what I use here is described in these Apple guides: Core Data performance, Efficiently importing data. So the summary for steps 1-4 is the following:
Turn objects into faults when they are persisted and so their property values become unnecessary as import operation goes futher.
Construct dictionaries with objects as values and their ids as keys, so these dictionaries can serve as lookup tables when constructing a relationships beetween these objects and other objects.
Use #autoreleasepool when iterating through a large number of records.
Use a method similar to dictionaryOfExistingObjectsByIds or to a method that Tom references in his answer, from Efficiently importing data - a method that has SQL IN predicate behind it to significantly reduce a number of fetches. Read Tom's answer and referenced Apple's corresponding guide to better understand this technique.
Good reading on this topic
objc.io issue #4: Importing Large Data Sets

SQL SELECT with table aliases in Core Data

I have the following SQL query that I want to do using Core Data:
SELECT t1.date, t1.amount + SUM(t2.amount) AS importantvalue
FROM specifictable AS t1, specifictable AS t2
WHERE t1.amount < 0 AND t2.amount < 0 AND t1.date IS NOT NULL AND t2.date IS NULL
GROUP BY t1.date, t1.amount;
Now, it looks like CoreData fetch requests can only fetch from a single entity. Is there a way to do this entire query in a single fetch request?
The best way I know is to crate an abstract parent entity for entities you wish to fetch together.
So if you have - 'Meat' 'Vegetables' and 'Fruits' entities, you can create a parent abstract entity for 'Food' and then fetch for all the sweet entities in the 'Food' entity.
This way you will get all the sweet 'Meat' 'Vegetables' and 'Fruits'.
Look here:
Entity Inheritance in Apple documentation.
Nikolay,
Core Data is not a SQL system. It has a more primitive query language. While this appears to be a deficit, it really isn't. It forces you to bring things into RAM and do your complex calculations there instead of in the DB. The NSSet/NSMutableSet operations are extremely fast and effective. This also results in a faster app. (This is particularly apparent on iOS where the flash is slow and, hence, big fetches are to be preferred.)
In answer to your question, yes, a fetch request operates on a single entity. No, you are not limited to data on that entity. One uses key paths to traverse relationships in the predicate language.
Shannoga's answer is one good way to solve your problem. But I don't know enough about what you are actually trying to accomplish with your data model to judge whether using entity inheritance is the right path for your app. It may not be.
Your SQL schema from a server may not make sense in a CD app. Both the query language and how the data is used in the UI probably force a different structure. (For example, using a fetched results controller on iOS can force you to denormalize your data differently than you would on a server.)
Entity inheritance, like inheritance in OOP, is a stiff technology. It is hard to change. Hence, I use it carefully. When I do use it, I gain performance in some fetches and some simplification in other calculations. At other times, it is the wrong answer, performance wise.
The real answer is a question: what are you really trying to do?
Andrew

Resources