Core Data many-to-many relationship & data integrity


I'm working with Core Data and a many-to-many relationship: a building can have multiple departments, and a department can be in multiple buildings. Having worked with databases before, I was unsure of how to implement this in Core Data, but I found this in the Core Data Programming Guide:
If you have a background in database management and this causes you
concern, don't worry: if you use a SQLite store, Core Data
automatically creates the intermediate join table for you.
However, there's not really any data integrity. I've tried inserting a few building objects, which for now only have one attribute (number), and every time I set the department object (relationship) it relates to. This results in the database containing multiple building objects with the same building number, all relating to a different department object. Ideally, there would be one object per building number, with in it all the different departments that are located in it.
So, my question is: can Core Data maintain data integrity somehow, or should I check to see if a building object with that number already exists before inserting it? It looks like I'll have to manually check it, but it would be cool if Core Data could do this for me.

What melsam wrote is right. In addition to his answer, I suggest using inverse relationships. About inverses, Apple says:
You should typically model relationships in both directions, and
specify the inverse relationships appropriately. Core Data uses this
information to ensure the consistency of the object graph if a change
is made (see “Manipulating Relationships and Object Graph Integrity”).
For a discussion of some of the reasons why you might want to not
model a relationship in both directions, and some of the problems that
might arise if you don’t, see “Unidirectional Relationships.”
A key point to understand is that when you work with Core Data, you work with objects. So, integrity criteria are resolved when you save the context or when you explicitly tell the context to process pending changes (see the processPendingChanges method).
As for your question, I think you have to create a fetch request and retrieve the object(s) you are looking for (e.g. you could give each object a specific id and set up a predicate with the id you want).
If the fetch request returns some objects, you can update them. If not, create a new object with insertNewObjectForEntityForName:inManagedObjectContext:. Finally, save the context.
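The fetch-then-insert steps above could look roughly like this in Swift. This is only a sketch: the entity name "Building" and the attribute "number" come from the question, while the function name findOrCreateBuilding is made up for illustration.

```swift
import CoreData

/// Returns the Building with the given number if one exists in the context
/// (including unsaved objects), otherwise inserts a new one.
/// "Building" and "number" are assumed entity and attribute names.
func findOrCreateBuilding(number: Int,
                          in context: NSManagedObjectContext) throws -> NSManagedObject {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Building")
    request.predicate = NSPredicate(format: "number == %@", NSNumber(value: number))
    request.fetchLimit = 1

    // Reuse the existing object so there is only one Building per number.
    if let existing = try context.fetch(request).first {
        return existing
    }

    // Nothing found: insert a new object and set its identifying attribute.
    let building = NSEntityDescription.insertNewObject(forEntityName: "Building",
                                                       into: context)
    building.setValue(number, forKey: "number")
    return building
}
```

The caller would then add the department to the returned building's relationship and save the context once, rather than inserting a fresh building for every department.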
I suggest you read about Efficiently Importing Data.
Hope it helps.

Core Data maintains data integrity for you. I can assure you (from lots of experience with Core Data) that you do not have to manually check integrity. Double-check how your relationships and delete rules are set up in Xcode's Core Data Model Editor. I can't tell exactly what may be wrong from the details you've provided, but you'll find it if you poke around.

Related

managed objects vs. business objects

I'm trying to figure out how to use Core Data in my App. I already have in mind what the object graph would be like at runtime:
An Account object owns a TransactionList object.
A TransactionList object contains all the transactions of the account. Rather than being a flat list, it organizes transactions per day. So it contains a list of DailyTransactions objects sorted by date.
A DailyTransactions contains a list of Transaction objects which occur in a single day.
At first I thought Core Data was an ORM so I thought I might just need two tables: Account table and Transaction table which contained all transactions and set up the above object graph (i.e., organizing transactions per date and generating DailyTransactions objects, etc.) using application code at run time.
When I started to learn Core Data, however, I realized Core Data was more of an object graph manager than an ORM. So I'm thinking about using Core Data to implement above runtime object relationship directly (it's not clear to me what's the benefit but I believe Core Data must have some features that will be helpful).
So I'm thinking about a data model in Core Data like the following:
Account <--> TransactionList -->> DailyTransactions -->> Transaction
Since I'm still learning Core Data, I'm not able to verify the design yet. I suppose this is the right way to use Core Data. But doesn't this put too many implementation details, instead of raw data, in persistent store? The issue with saving implementation details, I think, is that they are far more complex than raw data and they may contain duplicate data. To put it in another way, what exactly does the "data" in data model means, raw data or any useful runtime objects?
An alternative approach is to use Core Data as ORM by defining a data model like:
Account <-->> Transactions
and setting up the runtime object graph using application code. This leads to more complex application code but a simpler database design (I understand the user doesn't need to deal with the database directly when using Core Data, but it's still good to have a simpler system). That said, I doubt this is the right way to use Core Data.
A more general question. I did little database programming before, but I had the impression that there was usually a business object layer above plain old data object layer in server side programming framework like J2EE. In those architectures, objects that encapsulate application business are not same as the objects loaded from database. It seems that's not the case with Core Data?
Thanks for any explanations or suggestions in advance.
(Note: the example above is a simplification. A transaction like a transfer involves two accounts. I ignore that detail for simplicity.)

Now that I read more about the Core Data, I'll try to answer my own question since no one did it. I hope this may help other people who have the same confusion as I did. Note the answer is based on my current (limited) understanding.
1. Core Data is an object graph manager for data to be persistently stored
There are a lot of articles on the net emphasizing that Core Data manages an object graph and is not an ORM or a database. While they might be technically correct, they unfortunately cause confusion for beginners like me. In my opinion, it's equally important to point out that objects managed by Core Data are not arbitrary runtime objects but ones that are suitable for being saved in a database. By suitable I mean that all these objects conform to the principles of database schema design.
So, what a proper data model looks like is very much a database design question (it's important to point this out because most articles ask their readers to forget about databases).
For example, in the account and transactions example I gave above, I'd like to organize transactions per day at runtime (e.g., putting them in a two-level list, first by date, then by transaction timestamp). But the best practice in database design is to save all transactions in a single table and generate the two-level list at runtime using application code (I believe so).
So the data model in Core Data should be like:
Account <->> Transaction
The remaining question is where to add the code that generates the runtime structure (e.g., the two-level list) I'd like to have. I think the answer is to extend the Account class.
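To make the "flat storage, grouped at runtime" idea concrete, here is a minimal sketch of the grouping step. TransactionRecord is a hypothetical plain struct standing in for the Transaction managed object, and groupByDay is an illustrative name; in a real app this logic might live in an extension of the Account class and read from its transactions relationship.

```swift
import Foundation

// Hypothetical stand-in for a Transaction managed object.
struct TransactionRecord {
    let timestamp: Date
    let amount: Double
}

/// Builds the two-level list: days in ascending order, and within each day
/// the transactions in ascending timestamp order.
func groupByDay(_ transactions: [TransactionRecord],
                calendar: Calendar = .current)
    -> [(day: Date, transactions: [TransactionRecord])] {
    // Bucket every transaction under the start of its calendar day.
    let buckets = Dictionary(grouping: transactions) { record in
        calendar.startOfDay(for: record.timestamp)
    }
    // Sort inside each bucket, then sort the buckets themselves by day.
    return buckets
        .map { entry in
            (day: entry.key,
             transactions: entry.value.sorted { lhs, rhs in lhs.timestamp < rhs.timestamp })
        }
        .sorted { lhs, rhs in lhs.day < rhs.day }
}
```

The persistent model stays at Account <->> Transaction, and the per-day structure is derived whenever it is needed instead of being stored.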
2. Constraints of Core Data
The fact that Core Data is designed to work with database (see 1) explains why it has some constraints on the data model design (i.e., attribute can't be of an arbitrary type, etc.).
While I haven't seen anyone mention this on the net, personally I think relationships in Core Data are quite limited. A relationship can't be of a custom type (e.g., a class) but has to be a single reference (to-one) or a set (to-many) at run time. That makes it far less expressive. Note: I guess this is due to some technical reason. I just wish it could be a class and hence more flexible.
For example, in my App I actually have complex logic between Account and its Transaction and want to encapsulate it into a single class. So I'm thinking to introduce an entity to represent the relationship explicitly:
Account <->> AccountTransactionMap <-> Transaction
I know it's odd to do this in Core Data. I'll see how it works and update the answer when I finish my app. If someone knows a better way to not do this, please let me know!
3. Benefits of Core Data
If one is writing a simple App (for example, an App where data model changes are driven by the user, hence occur in sequence, and there are no asynchronous data changes from iCloud), I think it's OK to ignore all the discussions about object graph vs ORM, etc. and just use the basic features of Core Data.
From the documents I have read so far (there are still a lot I haven't finished), the benefits of Core Data include automatic establishment and cleanup of mutual references, live and automatically updated relationship property values, undo, etc. But if your App is not complex, it might be easier to implement these features using application code.
That said, it's interesting to learn a new technology which has limitations but at the same time can be very powerful in more complex situations. BTW, just curious, is there a framework similar to Core Data on other platforms (either open source or commercial)? I don't think I have read about similar things before.
I'll leave the question open for other answers and comments :) I'll update my answer when I have more practical experience with Core Data.

"Core Data is not a relational database." Why exactly is this important to know?

I realize this may be common sense for a lot of people, so apologies if this seems like a stupid question.
I am trying to learn Core Data for iOS programming, and I have repeatedly read and heard it said that Core Data (CD) is not a relational database. But very little else is said about this, or why exactly it is important to know beyond an academic sense. I mean functionally at least, it seems you can use CD as though it were a database for most things - storing and fetching data, running queries etc. From my very rudimentary understanding of it, I don't really see how it differs from a database.
I am not questioning the fact that the distinction is important. I believe that a lot of smart people would not be wasting their time on this point if it weren't useful to understand. But I would like someone please to explain - ideally with examples - how CD not being a relational database affects how we use it? Or perhaps, if I were not told that CD isn't a relational database, how would this adversely impact my performance as an Objective-C/Swift programmer?
Are there things that one might try to do incorrectly if they treated CD as a relational database? Or, are there things which a relational database cannot do or does less well that CD is designed to do?
Thank you all for your collective wisdom.

People stress the "not a relational database" angle because people with some database experience are prone to specific errors with Core Data that result from trying to apply their experience too directly. Some examples:
Creating entities that are essentially SQL junction tables. This is almost never necessary and usually makes things more complex and error prone. Core Data supports many-to-many relationships directly.
Creating a unique ID field in an entity because they think they need one to ensure uniqueness and to create relationships. Sometimes creating custom unique IDs is useful, usually not.
Setting up relationships between objects based on these unique IDs instead of using Core Data relationships-- i.e. saving the unique ID of a related object instead of using ObjC/Swift semantics to relate the objects.
Core Data can and often does serve as a database, but thinking of it in terms of other relational databases is a good way to screw it up.

Core Data is a technology with many powerful features and tools such as:
Change tracking (undo/redo)
Faulting (not having to load entire objects which can save memory)
Persistence
The list goes on..
The persistence part of Core Data is usually backed by SQLite, which is a relational database.
One of the reasons I think people stress that Core Data is not a relational database is that it is so much more than just persistence, and it can be taken advantage of without using persistence at all.
By treating Core Data as a relational database, I assume you mean that relationships between objects are mapped by ids, i.e. a Customer has a customerId and a product has a productId.
This would certainly be incorrect, because Core Data lets you define powerful relationships between object models that make things easy to manage.
For example, if you want your customer to have multiple products and want to delete them all when you delete the customer, Core Data gives you the ability to do that without having to manage customerIds/productIds and figuring out how to format complex SQL queries to match your in-memory model. With Core Data, you simply update your model and save your context; the SQL is done for you under the hood. (In fact, you can even turn on debugging to print out the SQL Core Data is performing for you by passing '-com.apple.CoreData.SQLDebug 1' as a launch argument.)
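As a rough illustration of the customer/products example, the sketch below builds the model in code (normally you would do this in Xcode's model editor) and sets a cascade delete rule. The entity and relationship names (Customer, Product, products, customer) and both function names are assumptions made for this example.

```swift
import CoreData

// Builds a tiny model in code: Customer <->> Product, with a cascade delete
// rule on Customer.products. All names here are illustrative.
func makeCustomerProductModel() -> NSManagedObjectModel {
    let customer = NSEntityDescription()
    customer.name = "Customer"
    let product = NSEntityDescription()
    product.name = "Product"

    let products = NSRelationshipDescription()
    products.name = "products"
    products.destinationEntity = product
    products.maxCount = 0                      // 0 = no upper bound, i.e. to-many
    products.deleteRule = .cascadeDeleteRule   // deleting a customer deletes its products

    let owner = NSRelationshipDescription()
    owner.name = "customer"
    owner.destinationEntity = customer
    owner.maxCount = 1                         // to-one
    owner.deleteRule = .nullifyDeleteRule

    // Wire the two directions together as inverses of each other.
    products.inverseRelationship = owner
    owner.inverseRelationship = products
    customer.properties = [products]
    product.properties = [owner]

    let model = NSManagedObjectModel()
    model.entities = [customer, product]
    return model
}

// Creates a context backed by an in-memory store, handy for experiments.
func makeInMemoryContext(model: NSManagedObjectModel) throws -> NSManagedObjectContext {
    let coordinator = NSPersistentStoreCoordinator(managedObjectModel: model)
    _ = try coordinator.addPersistentStore(ofType: NSInMemoryStoreType,
                                           configurationName: nil,
                                           at: nil,
                                           options: nil)
    let context = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
    context.persistentStoreCoordinator = coordinator
    return context
}
```

With this model, relating a product to a customer, calling delete on the customer and saving the context removes the product as well, with no ids or SQL in application code.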
In terms of performance, Core Data does some serious optimizations under the hood that make accessing data much easier without having to dive deep into SQL, concurrency, or validation.

I THINK the point is that it is different from a relational database and that trying to apply relational techniques will lead the developer astray as others have mentioned. It actually operates at a higher level by abstracting the functionality of the relational database out of your code.
A key difference, from a programming standpoint, is that you don't need unique identifiers because core data just handles that. If you tried to create your own, you will come to see that they are redundant and a whole lot of extra trouble.
From the programmer's perspective, whenever you access an entity "record", you will have a pointer to any relationship -- perhaps a single pointer for a "to-one" relationship, or a set of pointers to the records in a "to-many" relationship. Core Data handles the retrieval of the actual "records" when you use one of the pointers.
Because Core Data efficiently handles faults (where the "record" (object) referenced by a pointer is not in memory) you do not have to concern yourself with their retrieval generally. When they're needed by your program Core Data will make them available.
At the end of the day, it provides similar functionality but under the hood it is different. It does require some different thinking in that ordinary SQL doesn't make sense in the context of Core Data as the SQL (in the case of a sqlite store) is handled for you.
The main adjustments for me in transitioning to Core Data were as noted -- getting rid of the concept of unique identifiers. They're going on behind the scenes but you never have to worry about them and should not try to define your own. The second adjustment for me was that whenever you need an object that is related to yours, you just grab it by using the appropriate pointer(s) in the entity object you already have.

Core Data Memory Efficient Migration

I am currently building a CoreData migration for an app which has 200k / 500k average rows of data per entity. There are currently 15 entities within the CoreData Model.
This is the 7th migration I have built for this app. All of the previous ones have been simple (add 1 or 2 columns) migrations, which have not been any trouble and have not needed any mapping models.
This Migration
The migration we are working on is fairly sizeable in comparison to previous migrations and adds a new entity between two existing entities. This requires a custom NSEntityMigrationPolicy which we have built to map the new entity relationships. We also have a *.xcmappingmodel, which defines the mapping between model 6 and the new model 7.
We have implemented our own subclass of NSMigrationManager (as per http://www.objc.io/issue-4/core-data-migration.html + http://www.amazon.com/Core-Data-Management-Pragmatic-Programmers/dp/1937785084/ref=dp_ob_image_bk).
The Problem
Apple uses the migrateStoreFromURL method of NSMigrationManager to migrate the model, however, this seems to be built for low/medium dataset sizes, which do not overload the memory.
We are finding that the app crashes due to memory overload (at around 500-600 MB on iPad Air/iPad 2) because the following Apple method does not release memory frequently enough during the data transfer.
[manager migrateStoreFromURL:sourceStoreURL type:type options:nil withMappingModel:mappingModel toDestinationURL:destinationStoreURL destinationType:type destinationOptions:nil error:error];
Apple's Suggested Solution
Apple suggests that we should divide the *.xcmappingmodel up into a series of mapping models, one per entity - https://developer.apple.com/library/ios/documentation/cocoa/conceptual/CoreDataVersioning/Articles/vmCustomizing.html#//apple_ref/doc/uid/TP40004399-CH8-SW2. This would work neatly with the progressivelyMigrateURL methods defined in the above NSMigrationManager subclasses. However, we are not able to use this method, as one entity alone will still lead to a memory overload due to its size.
My guess would be that we would need to write our own migrateStoreFromURL method, but would like to keep this as close to as Apple would have intended as possible. Has anyone done this before and/or have any ideas for how we could achieve this?

The short answer is that heavy migrations are not good for iOS and should be avoided at literally any cost. They were never designed to work on a memory constrained device.
Having said that, a few questions for you before we discuss a resolution:
Is the data recoverable? Can you download it again or is this user data?
Can you resolve the relationships between the entities without having the old relationship in place? Can it be reconstructed?
I have a few solutions but they are data dependent, hence the questions back to you.
Update 1
The data is not recoverable and cannot be re-downloaded. The data is formed from user activity within the application over a time period (reaching up to 1 year in the past). The relationships are also not reconstructable, unless we store them before we lose access to the old relationships.
Ok, what you are describing is the worst case and therefore the hardest case. Fortunately it isn't unsolvable.
First, heavy migration is not going to work. We must write code to solve this issue.
First Option
My preferred solution is to do a lightweight migration that only adds the new relationship between the (now) three entities, it does not remove the old relationship. This lightweight migration will occur in SQLite and will be very quick.
Once that migration has been completed, we iterate over the objects and set up the new relationship based on the old relationship. This can be done as a background process or it can be done piecemeal as the objects are used, etc. That is a business decision.
Once that conversion has been completed you can then do another migration, if needed, to remove the old relationship. This step is not necessary but it does help to keep the model clean.
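A rough sketch of the first two steps of this option follows. The two option keys are the standard ones for requesting an automatic, inferred (lightweight) migration when adding the store. The backfill function, its name, its parameters and the link closure are all hypothetical: the closure is where app-specific code would read the old relationship and set the new one.

```swift
import CoreData

// Pass these as the options when adding the persistent store to request a
// lightweight migration with an inferred mapping model.
let lightweightMigrationOptions: [AnyHashable: Any] = [
    NSMigratePersistentStoresAutomaticallyOption: true,
    NSInferMappingModelAutomaticallyOption: true
]

/// Walks every object of `entityName` in batches, applies `link` to each one,
/// then saves and resets the context after each batch to keep memory bounded.
/// Call this on the context's own queue (e.g. inside perform on a background context).
func backfill(entityName: String,
              batchSize: Int,
              in context: NSManagedObjectContext,
              link: (NSManagedObject) -> Void) throws {
    // Fetch only object IDs up front; they are much lighter than full objects.
    let request = NSFetchRequest<NSManagedObjectID>(entityName: entityName)
    request.resultType = .managedObjectIDResultType
    let ids = try context.fetch(request)

    var start = 0
    while start < ids.count {
        let end = min(start + batchSize, ids.count)
        try autoreleasepool {
            for id in ids[start..<end] {
                let object = try context.existingObject(with: id)
                link(object)          // app-specific: set up the new relationship
            }
            try context.save()        // persist this batch
            context.reset()           // drop the batch's objects from memory
        }
        start = end
    }
}
```

Because each batch is saved on its own, an interrupted backfill leaves part of the data converted, so a real implementation would probably need a way to tell converted objects from unconverted ones and resume.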
Second Option
Another option which has value is to export and re-import the data. This has the added value of setting up code to back up the user's data in a format that is readable on other platforms. It is fairly simple to export the data out to JSON and then set up an import routine that pulls the data into the new model along with the new relationship.
The second option has the advantage of being cleaner but requires more code as well as a "pause" in the user's activities. The first option can be done without the user even being aware there is a migration taking place.

If I understand this correctly, you have one entity that is so big that migrating it causes the memory overload. In this case, how about splitting the migration of this one entity into several steps, migrating only some properties in each iteration?
That way you won't need to write your own code and can still benefit from the "standard" code.

iOS Core Data Wants All Relationships to be bi-directional

I am new to iOS programming but have done SQL stuff for years. I am trying to use Core Data to build my model. Following the tutorials I have created a schema for my application that involves a number of one-to-many relationships that are not bi-directional.
For example, I have a Game entity and a Player entity. A Game includes a collection of Players. Because a Player can be involved in more than one game, an inverse relationship does not make any sense and is not needed.
Yet when I compile my application, I get Consistency Error messages in two forms. One says.
Game.players does not have an inverse; this is an advanced setting.
Really? This is an "advanced" capability enough to earn a warning message? Should I just ignore this message or am I actually doing something wrong here that Core Data is not designed to do?
The other is of the form Misconfigured Property and logs the text:
Something.something should have an inverse.
So why would it think that?
I can't find any pattern to why it picks one error message over the other. Any tips for an iOS newb would be appreciated.
This is under Xcode 5.0.2.

Core Data is not a database. This is an important fact to grasp otherwise you will be fighting the framework for a long time.
Core Data is your data model that happens to persist to a database as one of its options. That is not its main function, it is a secondary function.
Core Data requires/recommends that you use inverse relationships so that it can keep referential integrity in check without costly maintenance. For example, if you have a one way between A and B (expressed A --> B) and you delete a B, Core Data may need to walk the entire A table looking for references to B so that it can clean them up. This is expensive. If you have a proper bi-directional relationship (A <-> B) then Core Data knows exactly which A objects it needs to touch to keep the referential integrity.
This is just one example.
Bi-directionality is not required but it is recommended highly enough that it really should be considered a requirement. 99.999% of the time you want that bi-directional relationship even if you never use it. Core Data will use it.
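For the Game/Player case from the question, the inverse of a to-many relationship can itself be to-many. Here is a minimal sketch of that model built in code (normally done in the model editor); the relationship name "games" and the function name are assumptions made for illustration.

```swift
import CoreData

// Game <<->> Player: a game has many players and a player takes part in many
// games, so each relationship is to-many and each is the other's inverse.
func makeGamePlayerModel() -> NSManagedObjectModel {
    let game = NSEntityDescription()
    game.name = "Game"
    let player = NSEntityDescription()
    player.name = "Player"

    let players = NSRelationshipDescription()
    players.name = "players"
    players.destinationEntity = player
    players.maxCount = 0                    // 0 = no upper bound, i.e. to-many
    players.deleteRule = .nullifyDeleteRule

    let games = NSRelationshipDescription()
    games.name = "games"                    // hypothetical inverse name
    games.destinationEntity = game
    games.maxCount = 0
    games.deleteRule = .nullifyDeleteRule

    // Declaring the inverses is what lets Core Data keep both sides in sync.
    players.inverseRelationship = games
    games.inverseRelationship = players

    game.properties = [players]
    player.properties = [games]

    let model = NSManagedObjectModel()
    model.entities = [game, player]
    return model
}
```

With the inverse in place, adding a player to a game's players set also makes that game show up in the player's games set, and deleting either object cleans up the other side without scanning the store.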

Why not just add the inverse relationship? It can be to-many as well, and you may well end up using it - often fetch requests or object graph navigation works faster or better coming from a different end of a relationship.
Core Data prefers you to define relationships in both directions (hence the warnings) and it costs you nothing to do so, so you may as well. Don't fight the frameworks - Core Data isn't an SQLite "manager"; it is an object graph and persistence tool that can use SQLite as a back end.

Keeping Core Data Objects in multiple stores

I'm developing an iOS application using Core Data. I want to have the persistent store located in a shared location, such as a network drive, so that multiple users can work on the data (at different times i.e. concurrency is not part of the question).
But I also want to offer the ability to work on the data "offline", i.e. by keeping a local persistent store on the iPad. So far, I read that I could do this to some degree by using the persistent store coordinator's migration function, but this seems to imply the old store is then invalidated. Furthermore, I don't necessarily want to move the complete store "offline", but just a part of it: going with the simple "company department" example that Apple offers, I want users to be able to check out one department, along with all the employees associated with that department (and all the attributes associated with each employee). Then, the users can work on the department data locally on their iPad and, some time later, synchronize those changes back to the server's persistent store.
So, what I need is to copy a core data object from one store to another, along with all objects referenced through relationships. And this copy process needs to also ensure that if an object already exists in the target persistent store, that it's overwritten rather than a new object added to the store (I am already giving each object a UID for another reason, so I might be able to re-use the UID).
From all I've seen so far, it looks like there is no simple way to synchronize or copy Core Data persistent stores, is that a fair assessment?
So would I really need to write a piece of code that does the following:
1. retrieve object "A" through a MOC
2. retrieve all objects, across all entities, that have a relationship to object "A"
3. instantiate a new MOC for the target persistent store
4. for each object retrieved, check the target store if the object exists
5. if the object exists, overwrite it with the attributes from the object retrieved in steps 1 & 2
6. if the object doesn't exist, create it and set all attributes as per object retrieved in steps 1 & 2
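The "check, then overwrite or create" part of these steps might be sketched as below. It assumes that every entity has a string attribute named "uid" (a made-up name for the UID mentioned above) and that both stores use the same model. The names upsert and CopyError are hypothetical. Only attributes are copied; relationships would need a second pass once all related objects exist in the target store.

```swift
import CoreData

// Hypothetical error for objects that cannot be matched by UID.
enum CopyError: Error {
    case missingUID
}

/// Copies the attribute values of `source` onto the object with the same
/// "uid" in the `target` context, inserting that object if it does not exist.
@discardableResult
func upsert(_ source: NSManagedObject,
            into target: NSManagedObjectContext) throws -> NSManagedObject {
    guard let entityName = source.entity.name,
          let uid = source.value(forKey: "uid") as? String else {
        throw CopyError.missingUID
    }

    // Look for an existing counterpart in the target store.
    let request = NSFetchRequest<NSManagedObject>(entityName: entityName)
    request.predicate = NSPredicate(format: "uid == %@", uid)
    request.fetchLimit = 1
    let destination = try target.fetch(request).first
        ?? NSEntityDescription.insertNewObject(forEntityName: entityName, into: target)

    // Overwrite all attributes (not relationships) with the source values.
    let keys = Array(source.entity.attributesByName.keys)
    destination.setValuesForKeys(source.dictionaryWithValues(forKeys: keys))
    return destination
}
```

This says nothing about conflict handling: if the same object was changed in both stores, the source simply wins.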
While it's not the most complicated thing in the world to do, I would've still thought that this requirement for "online / offline editing" is common enough for some standard functionality to be available for synchronizing parts of persistent stores?
Your points of view are greatly appreciated,
thanks,
da_h-man

I was just half-kidding with the comment above. You really are describing a pretty hard problem - it's very difficult to nail this sort of synchronization, and there's seldom, in any development environment, going to be a turn-key solution that will "just work". I think your pseudo-code description above is a pretty accurate description of what you'll need to do. Although some of the work of traversing the relationships and checking for existing objects can be generalized, you're talking about some potentially complicated exception handling situations - for example, if updating an object, and only 1 out of 5 related objects is somehow out of date, do you throw away the update or apply part of it? You say "concurrency" is not a part of the question, but if multiple users can "check out" objects at the same time, unless you plan to have a locking mechanism on those, you would start having conflicts when trying to make updates.
Something to check into are the new features in Core Data for leveraging iCloud - I doubt that's going to help with your problem, but it's generally related.
Since you want to be out on the network with your data, another thing to consider is whether Core Data is the right fit to your problem in general. Since Core Data is very much a technology designed to support the UI and MVC pattern in general, if your data needs are not especially bound to the UI, you might consider another type of DB solution.
If you are in fact leveraging Core Data in significant ways beyond just modeling, in terms of driving your UI, and you want to stick with it, I think you are correct in your analysis: you're going to have to roll your own solution. I think it will be a non-trivial thing to build and test.
An option to consider is CouchDB and an iOS implementation called TouchDB. It would mean adopting more of a document-oriented (JSON) approach to your problem, which may in fact be suitable, based on what you've described.

From what I've seen so far, I reckon the best approach is RestKit. It offers a Core Data wrapper that uses JSON to move data between remote and local stores. I haven't fully tried it yet, but from what the documentation reads, it sounds quite powerful and ideally suited for my needs.

You definitely should check these things:
Parse.com - cloud based data store
PFIncrementalStore https://github.com/sbonami/PFIncrementalStore - subclass of NSIncrementalStore which allows your Persistent Store Coordinator to store data both locally and remotely (on Parse Cloud) at the same time
All of this is well documented. Also, Parse.com is going to release an iOS local datastore SDK http://blog.parse.com/2014/04/30/take-your-app-offline-with-parse-local-datastore/ which is going to help keep your data synced.
