I have an app with a lot of data (including NSMutableArrays, NSNumbers, various custom classes) that uses NSCoding protocol presently. However, I would like to implement an incremental saving system, to save time during the "saving process". The loading time is not important.
Is there any existing container that checks its members for "dirty" and only updates those values when writing to file; or better yet, a protocol that can be implemented to do the same; or any other simple, available way of doing this?
For large amount of data its better to change data model to Core Data. Otherwise, you may want to save changes after specific events, or, bad solution is to use NSTimer, to save all data every time you want to.
Related
Background story
I am developing a big iOS app. This app works under specific assumptions. The main of them is that app should work offline with internal storage which is a snapshot of last synchronized state of data saved on server. I decided to use CoreData to handle this storage. Every time app launches I check if WiFi connection is enabled and then try to synchronize storage with server. The synchronization can take about 3 minutes because of size of data.
The synchronization process consists of several stages and in each of them I:
fetch some data from the server (XML)
deserialize it
save it in Core Data
Problem
Synchronization process can be interrupted for several reasons (internet connection, server down, user leaving application, etc). This may cause data to be out-of-sync.
Let's assume that synchronization process has 5 stages and it breaks after third. It results in 3/5 of data being updated in internal storage and the rest being out of sync. I can't allow it because data are strongly connected to each other (business logic).
Goal
I don't know if it is possible but I'm thinking about implementing one solution. On start of synchronization process I would like to create snapshot (some kind of copy) of current state of Core Date and during synchronization process work on it. When synchronization process completes with success then this snapshot could overwrite current CoreData state. When synchronization interrupts then snapshot can be simply aborted. My internal storage will be secured.
Questions
How to create CoreData snapshot?
How to work with CoreData snapshot?
How to overwrite CoreDate state with snapshot?
Thanks in advice for any help. Code examples, if it is possible, will be appreciated.
EDIT 1
The size of data is too big to handle it with multiple CoreData's contexts. During synchronization I am saving current context multiple times to cleanup memory. If I do not do it, the application will crash with memory error.
I think it should be resolved with multiple NSPersistentStoreCoordinators using for example this method: link. Unfortunately, I don't know how to implement this.
You should do exactly what you said. Just create class (lets call it SyncBuffer) with methods "load", "sync" and "save".
The "load" method should read all entities from CoreData and store it in class variables.
The "sync" method should make all the synchronisation using class variables.
Finally the "save" method should save all values from class variables to CoreData - here you can even remove all data from CoreData and save brand new values from SyncBuffer.
A CoreData stack is composed at its core by three components: A context (NSManagedObjectContext) a model (NSManagedObjectModel) and the store coordinator (NSPersistentStoreCoordinator/NSPersistentStore).
What you want is to have two different contexts, that shares the same model but use two different stores. The store itself will be of the same type (i.e. an SQLite db) but use a different source file.
At this page you can see some documentation about the stack:
https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/CoreData/InitializingtheCoreDataStack.html#//apple_ref/doc/uid/TP40001075-CH4-SW1
The NSPersistentContainer is a convenience class to initialise the CoreData stack.
Take the example of the initialisation of a NSPersistentContainer from the link: you can have the exact same code to initialise it, twice, but with the only difference that the two NSPersistentContainer use a different .sqlite file: i.e. you can have two properties in your app delegate called managedObjectContextForUI and managedObjectContextForSyncing that loads different .sqlite files. Then in your program you can use the context from one store to get current data to show to the user and you can use the context that use the other store with a different .sqlite if you are doing sync operations. When the sync operations are finally done you can eventually swap the two files and after clearing and reloading the NSPersistentContainer (this might be tricky, because you will want to invalidate and reload all managed objects: you are switching to an entirely new context) you can then show the newly synced data to the user and start syncing again on a new .sqlite file.
The way I understand the problem is that you wish to be able download a large "object graph". It is however so large that it cannot be loaded at once in memory, so you would have to break it in chunks and then merge it locally into to Core data.
If that is the case, I think that's not trivial. I am not sure I can think of direct solution without understanding the object relations and even then it may be really overwhelming.
An overly simplistic solution may be to generate the sqlite file on the backend then download it in chunks; it seems ugly, but it serves to separate the business logic from the sync, i.e. the sqlite file becomes the transport layer. So, I think the essence to the solution would be to find a way to physically represent the data you are syncing in a format that allows for splitting it in chunks and that can afterwards be merged into a sqlite file (if you insist on using Core data).
Please also note that as far as I know Amazon (https://aws.amazon.com/appsync/) and Realm (https://realm.io/blog/introducing-realm-mobile-platform/) provide background sync of you local database, but those are paid services and you would have to be careful not be locked in (should not depend on their libs in your model layer, instead have a translation layer).
This question is not about the technical problem, but rather the approach.
I know two more or less common approaches to store the data received from the server in your app:
1) Using managers, data holders etc to store the data. They are most often some kind of singleton and are used to store the models received from the server. (E.g. - the array of the posts/places/users) Singletons are needed to be able to access the data from any screen. I think the majority of apps uses this approach.
2) Using Core Data (or maybe Realm) as in-memory storage. This approach avoids having singletons, but, I guess, it is a bit more complex (and crash risky) to maintain and support.
How do you store data and why?
P.S. Any answers would help. But big "thank you" for detailed ones, with reasons.
The reason people opt to use Core Data/Relam/Shark or any other iOS ORM is mainly for the purpose of persisting data between runs of the app.
Currently there are two ways of doing this, for single values and very small (not that I encourage it) objects you can use the UserDefaults to persist between app launches. For a approach closer to a database, infact in the case of Core Data and SharkORM, they are built on top of SQLite, you need to use an ORM.
Using a manager to store an array of a data models will only persist said models for the lifetime of the app. For example when the user force quits the app, restarts their device or in some circumstances when iOS terminates your app, all that data will be lost permanently. This is because it is stored in RAM which is volatile memory, rather than in a database on the disk itself.
Using a database layer even if you don't specifically require persistence between launches can have its advantages though; for instance SharkORM allows you to execute raw SQL queries on your objects if you don't want to use the built in powerful query builder. This can be useful to quickly pull the model you are interested in rather than iterating through a local array.
In reply to your question, how do I store data?
Well, I use a combination of all three. Say for instance I called to an API for some data which I wanted to display there and then to the user, I would use a manager instance with an array to hold the data model.
But on the flipside if I wanted to store that data for later or if I needed to execute a complex query on it, I would store it on disk using Shark.
If however I just wanted to store whether or not the user had seen my on boarding flow I would just persist a boolean value into UserDefaults.
I hope this is detailed enough for you.
CoreData isn't strictly "in-memory". You can load objects into your data model and save them into their context, then they might actually be on disk and out of main memory, and they can easily be brought back via fetch requests.
Singletons, on the other hand, do typically stay in main memory all the time until the user terminates the app. If you have larger objects that you are storing in some data structure (e.g. full resolution images when all you really needed was a thumbnail), this can be quite a resource hog.
Design question:
My app talks to a server. Json data being sent/received.
Data on server is always changing, and I want users to see most current data, not stored/cached data. So I require a user to be logged in order to use the app, and care not to persist data in the app.
Should I still use CoreData and map it to Json's.?
Or can I just create custom model classes and map Json's to it's properties, and have nsarray properties, which point to its child objects, etc. ?
Which is better?
Thanks
If you dont want to persist data, I personally think core data would be overkill for this application
Core Data is really for local persistance. If the data was not changing so often and you didnt want them to have to get an updated data everytime the user visited the page, then you would load the JSON and store it locally using CoreData.
Use plain old objective-c objects for now. It's not hard to switch to Core Data in future, but once you've done so it gets a lot harder to change your schema.
That depends on what your needs are.
If you need the app to work offline, you need to store your information somehow in the client.
In order to save on network usage, you could store locally, then query the server to see if it had an updated answer -- you could do this by sending a time stamp to the server and return a 304 Not Modified if the entity hasn't changed.
Generally, it depends on how much time you have to put into the app and what your specific requirements are, but as a general rule I would optimise for as low bandwidth usage as possible, as that not only reduces potential data costs, but also means the answers will be more quickly available to your users (when online and they have not changed) and also available offline.
If you do not wish to store data locally at all,
The problem:
I have been using for some time now my own cache system by using NSFileManager. Normally the data I receive is JSON and I just save the dictionary directly into cache (in the Documents Folder). When I need it back I will just go get it. I also implement, sometimes when I feel it's better, a NSDictionary on the root Folder with keys/values for the path for a given resource. For example:
Resource about weather in Geneve 17/02/2013, so I would have a key called GE_17_02_2013 and the value the path to the NSDictionary with the information.
Normally I don't need to do any complex queries. But, somehow, and from what I have been reading, when you have a lot of data, you should stick with Core Data. In my case, I normally have a lot of data, but I never actually felt the application going down, or suffering in terms of performance. So my questions are:
In this case, where sometimes (the weather thing was just an
example) I need to just remove all the data (a Twitter feed, for
example) and replace it by a completely new stream of data, is Core
Data worth? I think removing all the data, and inserting (populating) it, is heavier than just store the NSDictionary and replacing the old one.
Sometimes it would envolve images, textFiles, etc and the
NSFileManager does it perfectly, so what advantages could Core
Data bring in this cases?
P.S: I just saw this post, where this kind of question is made and numbers prove which one is actually faster. Still, I would like as well an empiric answer.
Core Data is worth using in every scenario you described. In fact, if an app stores more than preferences, you should probably use Core Data. Here are some reasons, among which, you'll find answers to your own problems:
is definitely faster than the filesystem, even if you wipe out everything and write it again as you describe (so you don't benefit to much from caching). This is basically because you can coalesce your operations and only access the store when needed. So if you read, write and read, you can save only once, the rest is done in memory, which is, needless to say, very fast.
everything is versioned and you can migrate from one version to another easily (while keeping the content the user has on the device)
80% of your model operations come free. Like, when something changes, you can override the willSave managed object method and notify your controllers.
using cascade makes it trivial to delete even very complex object structures
while is a bad idea to keep images in the database, you can still keep them on the filesystem and have core data delete them automatically when the managed object that represents them is deleted
is flexible, in fact is so flexible that you could migrate your project from using the local filesystem to using a server with very little modifications by writing a custom data store.
the core data designer basically creates the model objects for you. You don't need to create your own model classes (which you would have to if using the filesystem)
In this case ... is Core Data worth it?
Yes, to the extent that you need something more centrally managed than trying to draw up your own file-system schema. Core Data, and its lower-level cousin SQL, are still the best choice for persistence that we have right now. Besides, the performance hit of using NSKeyed(Un)Archiver to keep serializing/deserializing a dictionary over and over again becomes increasingly noticeable with larger datasets.
I think removing all the data, and inserting (populating) it, is heavier
than just store the NSDictionary and replacing the old one.
Absolutely, yes. But, you don't have to think about cache turnover like that. If you imagine Core Data as a static model, you can use a cache layer to ferry data in and out of the store. Need that resource about the weather? Check the cache layer. If it's not in there, make the cache run a fetch request. Need to turn over the whole cache? Have the cache empty itself then run a request to mark every entity with some kind of flag to show they are invalid. The expensive deletion you're worrying about can be done by a background process when you see that all your new data has been safely interned in the cache.
Sometimes it would envolve images, textFiles, etc and the
NSFileManager does it perfectly, so what advantages could Core Data
bring in this cases?
Unfortunately, not many. For blobs of data (which is essentially what Core Data does in these situations), storage and fetches to and from Core Data can quickly get costly. They can also take up a noticeably larger space on disk if they aren't compressed (which further decreases performance). If you need a faster alternative, use a store more suited to the task like Tokyo Cabinet or LevelDB, and use the entities in the Core Data store as a kind of stand-in that would, say, contain the key to the resource in one of those relational databases.
I have what I would presume is a very common situation, but as I'm new to iOS programming, I'm not sure of the optimal way to code it.
Synopsis:
I have data on a server which can be retrieved by the iPhone app via a REST service. On the server side, the data is objects with a foreign key (an integer id number).
I'm storing the data retrieved via REST in Core Data. The managed objects have an "objId" attribute so that I can uniquely identify the managed objects in the rest of my code.
My app must always reflect the server data.
On subsequent requests made to the server:
some objects may not be returned, they have been deleted on the server - in which case I need to delete the corresponding objects from Core Data - so that I'm reflecting the state of the server correctly.
some objects have attributes which have changed, therefore the corresponding managed objects need updating with the new data.
my solution - and question to you
To get things going in my app, I made the easiest solution of deleting all objects in Core Data, then adding all new objects in, created with the latest server side data.
I don't think this is the best way to approach it :) As I progress on with my app, I now want to link up my tableview with NSFetchedResultsController, and have realised that my approach of deleting everything and re-adding is not going to work any more.
What is the tried and trusted way of syncing Core Data with server side data?
Do I need to make a fetch request for each object id I get back from the server, and then update the object with the new data?
And then go through all of the objects in core data and see which ones have not been updated, and delete those?
Is that the best way to do it? It just seems a little expensive to do a fetch for each object in Core Data, that's all.
Pseudo code is fine for any answers :)
thanks in advance!
Well, consider your download. First, you should be doing this in a background thread (if not, there are lots of SO posts that talk about how to do that).
I would suggest that you implement what makes sense first, and then, after you can get valid performance data from running Instruments, consider performance optimization. Of course, use some common sense on "easy" performance stuff (your design can take care of the big ones easily enough).
Anyway, get your data from the online resource, and then, for each object fetched, use the "unique object id" to fetch the object from core data. You know there is only one object with that ID, so you can set fetchLimit to 1 on your fetch request. You can also configure your "object id" attribute to be an INDEX in the database. This way, you get the fastest search from the underlying database, and it knows to stop looking once it finds your one object. This should be pretty snappy.
Now you have your object. Change any attributes necessary. Save, rinse, and repeat.
Furthermore, for several reasons, you may want to know when objects were last updated. I'd suggest adding a timestamp to each object that gets changed with the current time every time an object is changed. This will also help in deleting objects. Since your online database does not tell you which objects are deleted, you must have some way to know that an item is "old and no longer needed."
An easy way to do this is to remember the time you started your update. After processing all objects from the download, you now have a way to find all the objects that were deleted from the online database. Basically, any object with a "last update" timestamp before the time you began the update should be removed (since they were not added or modified in the last update). You can also index the database on this field, which will make finding those objects faster - unless your database is huge, I'd wait to see what Instruments has to say about this one though.