I am required to develop a big application,required to know graph database concepts the link http://sparsity-technologies.com/UserManual/API.html#transactions.I am planning to use core data instead of above link frame work. I want answerers for the following questions.
1)What is Graph Database exactly?.Explain with simple general example.which we can not perform with sqlite.
2)Does core data come under relational data base or not ? Explain.
3)Does core data come under Graph Database? But in apple documentation they mentioned that core data is for object graph management.object graph management means Graph Database .If i want to make relation ships ,weighted edge between objects core data is suitable?.
1)What is Graph Database exactly?.Explain with simple general
example.which we can not perform with sqlite.
Well, since this is all Turing complete, you can do it any database operation with any other database, the real question is a matter of efficiency.
In conventional "relational" databases the "relationships" are nothing but pointers to entries in other tables. They don't inherently communicate any information other than, "A is connected to B" To capture and structure anything more complex than that, you have to build a lot of pseudo-structure.
A1-->B1 // e.g. first-name, last-name
Which is fine but the relationship doesn't necessarily have a reciprocal, nor does the data in each table cell have to be names. To make the relationship always make sense, you've got build a lot of logic to put the data into the tables directly. Ditto for getting it out.
In a GraphDB you have "nodes" and "relationships". Nodes are not entries in a table. They can be arbitrarily complex objects, persisted or not, and persisted in a variety of ways. Nodes general model some "real-world" object like a person.
"Relationships" GraphDBs, owing to the previous meaning in SQL et al, really need another term because instead of be simple pointers, they to can be arbitrarily complex objects. In a node of names (way to simple to actually justify it)
Node-Name-A--(comes before)-->Node-Name-B
Node-Name-B--(comes after)-->Node-Name-B
In a sqlite, to find first and last names you query both tables. In a Graph, you grab one of the nodes and follow its relationship to other node.
(Come to think of it, graph theory in math started out as a way to model bridges of Konigsberg connecting the islands that made up the city. So maybe a transportation map would be a better example)
If cities are nodes, the roads are relationships. The road objects/descriptors would just connect the two but would contain their own logic and data such as their direction, length, conditions, traffic, suseptiblity to weather, and so on.
When you wanted to fetch and optimum route between widely separated cities, nodes for any particular time, traffic weather etc between two different nodes, you'd start with the node representing the start city and the follow the relationship/road-descriptors. In a complex model, any two nearby city-nodes might have several roads connecting them each best in certain circumstances.
All you have to do computationally though is compare the relationships between any to nodes. This is called "walking the graph" The huge benefit is that no matter how big the overall DB is, you only have to process the relationships coming out of the first node, say 3, and ignore utterly the the millions of other nodes and relationships that might be in the DB.
Can't do that in sqlite. The more data, the more "relationships" the more you have to process
2)Does core data come under relational data base or not ? Explain.
No, but if you hum a few bars you can fake it. By default, Core Data is an Object graph, which means it does connect object/nodes, but the relationships are themselves not objects but are instead defined by information contained in the class for each Object. E.g. you could have a Core Data of the usual Company, manager and employee.
CompanyClass
set_of_manager_objects
min_managers==1, max_managers==undefined
delete_Company_Object_delete_all_manager_objects
reciprocal_relationship_from_manager_is_company
ManagerClass
one company object
min_companies==1, max_companies==1
delete_manager_object_nullify (remove from set in company class)
recipocal_relationship_from_company_is_manager
So, Core Data a kind of "missing link" in the evolution of GraphDBs. I has relationships but they're not objects of themselves. They're inside the object/node. The relationship properties are hard coded into the classes themselves and just a few, but not all values can be changed. Still, Core Data does have the advantage of walking the graph. To find the Employees of one manager at one company. You just start at the company object, go through a small set of managers to find the right one, then walk down to the employee set. Even if you had hundreds of companies, thousands of managers and tens of thousands of employees. You can find one employee out of tens of thousands with a couple of hops.
But you can fake a GraphDB by creating relationship objects and putting them between any two object/nodes. Because Core Data allows any subclasses of relationship definition to be in the same relationship set e.g. ManagerClass--> LowManager,MidManager,HighManager, you can define a simple relationship in any given class and then populate with objects of arbitrary complexity as long as they are subclasses. These are usually termed "linking classes" or "linking relationships"
The normal pattern is to have the linking class have a relationship to the two or more classes it might have to link (which can be generic as well, I've started class trees with a base class with nothing but relationship properties, although their is a performance penalty if you get huge.)
If you give each node/object several relationships all defined on separate base linking classes, you can link the same nodes together in multiple ways.
3)Does core data come under Graph Database?
No, because the fundamental task of a database is persistence, saving the data. The fundamental task of Core Data is modeling the logic of the data inside the app.
Two different things. For example, when I start building a Core Data model, I start with an in-memory store, usually with test. The model graph is built from scratch every run, in memory, never touches the disk. As it progresses, I will shift to an XML store on disk, so I can examine it if necessary. The XML and binary stores are written out once entire and read in the same way. Only, at the end do I change the store to MySQL or something custom.
In a GraphDB, the nodes, relationships and the general graph are tied to the persistence systems innately AFAK and can't be altered. When you walk the graph, you walk the persistence, every time (except for caching.)
The usual question people ask is when to use Core Data and when to use SQL in the Apple Ecosystem.
The answer is pretty simple:
Core Data handles complexity inside the running app. The more complex the data model interactions, the more you get free with Core Data.
SQL derived solutions handle volumes of simple data. If the data model inside the app has little or no logic and there's a lot of it.
If your app is displaying something that would fit on a bunch of index cards, library book records, baseball cards etc, the an SQL solution is best because of the logic is just getting particular cards in and out of persistence.
If your app is complex vector drawing app, where every document will be different and of arbitrary complexity, or you're modeling an V8 engine, then most of the logic in the active data model while the app is running while persistence is trivial, then Core Data is the better choice.
Graph Databases are catching on because our data is getting 1) really, really big and 2) increasing complex. We need to model the complexity in the node-relationship graph in persistence so we don't have chew through the entire DB to find the data and then have to add an additional layer of logic
Core data is nothing but Data Model Layer, core data is NOT a datatbase and far away from being a graph database.
Core data only helps you to
Create Tables (Entities)
Columns in a table (Attribute)
Relationship (such as primary key, foreign key, one to one, one to many)
Core Data uses sqlite to store data and make queries.
Core Data is used in iOS mobile apps, I believe what you want is a backend solution for database.
Related
I'm trying to figure out how to use Core Data in my App. I already have in mind what the object graph would be like at runtime:
An Account object owns a TransactionList object.
A TransactionList object contains all the transactions of the account. Rather than being a flat list, it organizes transactions per day. So it contains a list of DailyTransactions objects sorted by date.
A DailyTransactions contains a list of Transaction objects which occur in a single day.
At first I thought Core Data was an ORM so I thought I might just need two tables: Account table and Transaction table which contained all transactions and set up the above object graph (i.e., organizing transactions per date and generating DailyTransactions objects, etc.) using application code at run time.
When I started to learn Core Data, however, I realized Core Data was more of an object graph manager than an ORM. So I'm thinking about using Core Data to implement above runtime object relationship directly (it's not clear to me what's the benefit but I believe Core Data must have some features that will be helpful).
So I'm thinking about a data model in Core Data like the following:
Acount <--> TransactionList -->> DailyTransactions -->> Transaction
Since I'm still learning Core Data, I'm not able to verify the design yet. I suppose this is the right way to use Core Data. But doesn't this put too many implementation details, instead of raw data, in persistent store? The issue with saving implementation details, I think, is that they are far more complex than raw data and they may contain duplicate data. To put it in another way, what exactly does the "data" in data model means, raw data or any useful runtime objects?
An alternative approach is to use Core Data as ORM by defining a data model like:
Account <-->> Transactions
and setting up the runtime object graph using application code. This leads to more complex application code but simpler database design (I understand user doesn't need to deal with database directly when using Core Data, but still it's good to have a simpler system). That said, I doubt this is not the right way to use Cord Data.
A more general question. I did little database programming before, but I had the impression that there was usually a business object layer above plain old data object layer in server side programming framework like J2EE. In those architectures, objects that encapsulate application business are not same as the objects loaded from database. It seems that's not the case with Core Data?
Thanks for any explanations or suggestions in advance.
(Note: the example above is an simplification. A transaction like transfer involves two accounts. I ignore that detail for simplification.)
Now that I read more about the Core Data, I'll try to answer my own question since no one did it. I hope this may help other people who have the same confusion as I did. Note the answer is based on my current (limited) understanding.
1. Core Data is an object graph manager for data to be persistently stored
There are a lot articles on the net emphasizing that Core Data manages object graph and it's not an ORM or database. While they might be technically correct, they unfortunately cause confusion to beginner like me. In my opinion, it's equally important to point out that objects managed by Core Data are not arbitrary runtime objects but those that are suitable for being saved in database. By suitable it means all these objects conform to principles of database schema design.
So, what' a proper data model is very much a database design question (it's important to point out this because most articles try to ask their readers to forget about database).
For example, in the account and transactions example I gave above, while I'd like to organize transactions per day (e,g., putting them in a two-level list, first by date, then by transaction timestamp) at runtime. But the best practice in database design is to save all transactions in a single table and generating the two-level list at runtime using application code (I believe so).
So the data model in Core Data should be like:
Account <->> Transaction
The question left is where I can add the code to generate the runtime structure (e.g., two-level list) I'd like to have. I think it's to extend Account class.
2. Constraints of Core Data
The fact that Core Data is designed to work with database (see 1) explains why it has some constraints on the data model design (i.e., attribute can't be of an arbitrary type, etc.).
While I don't see anyone mentioned this on the net, personally I think relationship in Core Data is quite limited. It can't be of a custom type (e.g, class) but has to be a variable (to-one) or an array (to-many) at run time. That makes it far less expressive. Note: I guess it's so due to some technical reason. I just hope it could be a class and hence more flexible.
For example, in my App I actually have complex logic between Account and its Transaction and want to encapsulate it into a single class. So I'm thinking to introduce an entity to represent the relationship explicitly:
Account <->> AccountTranstionMap <-> Transaction
I know it's odd to do this in Core Data. I'll see how it works and update the answer when I finish my app. If someone knows a better way to not do this, please let me know!
3. Benefits of Core Data
If one is writing a simple App, (for example, an App that data modal change are driven by user and hence occurs in sequence and don't have asynchronous data change from iCloud), I think it's OK to ignore all the discussions about object graph vs ORM, etc. and just use the basic features of Core Data.
From the documents I have read so far (there are still a lot I haven't finished), the benefits of Core Data includes automatic mutual reference establishment and clean up, live and automatically updated relationship property value, undo, etc. But if your App is not complex, it might be easier to implement these features using application code.
That said, it's interesting to learn a new technology which has limitation but at the same time can be very powerful in more complex situations. BTW, just curious, is there similar framework like Core Data on other platforms (either open source or commercial)? I don't think I read about similar things before.
I'll leave the question open for other answers and comments :) I'll update my answer when I have more practical experience with Core Data.
I'm trying to learn Azure Machine Learning and it seems the data sources for all the algorithms are two dimensional. Is there any way I can use one to many relational tables as a data source? or is it even possible?
It's not possible as far as I'm aware :(
However, the general rule is that you should flatten a relational graph into a single array of values. Remember, though, that you should have one array of values per main entity, it looks to me like your main entity in your example is the one with the Visits in.
Effectively, you'd be saying that all diagnoses are a property of Visit, but because there's potentially more than one, you'd have to have properties such as Diagnosis1, Diagnosis2, Diagnosis3 ...etc.
I'm playing around with neo4j - seeing what I can and can't do with it before suggesting it for something serious. One of the things I'm looking at now is Data Partitioning. By this I mean having a single data store that contains data from many different customers, and knowing which customer the data belongs to.
In the SQL world, we've always done this by having a customer_id field on the tables that are customer specific, and then always including that in the queries and indices. This works perfectly well for us, but in the Graph DB world it feels like we can do better.
The options that I've come up with some far are:
The same as before - including a property on the nodes that is the Customer ID
Storing a Label on each Node that identifies the Customer. However, as far as I can tell you can't bind parameters to labels so this would mean that the queries are generated slightly awkwardly.
Storing a Customer Node, and linking all of the other nodes to it.
Number #3 seems to be the "correct" Graph DB way of managing this, but I'm concerned with the impact of this on the performance of the data. It's perfectly feasible that there will be hundreds of thousands of links from a single Customer Node to the other data nodes, and there will be hundreds of different Customer Nodes. (Based on the volume of data in the existing SQL database)
What's the recommended way of achieving this level of data partitioning whilst maintaining performance?
I realize this may be common sense for a lot of people, so apologies if this seems like a stupid question.
I am trying to learn core data for iOS programming, and I have repeatedly read and heard it said that Core Data (CD) is not a relational database. But very little else is said about this, or why exactly it is important to know beyond an academic sense. I mean functionally at least, it seems you can use CD as though it were a database for most things - storing and fetching data, runnings queries etc. From my very rudimentary understanding of it, I don't really see how it differs from a database.
I am not questioning the fact that the distinction is important. I believe that a lot of smart people would not be wasting their time on this point if it weren't useful to understand. But I would like someone please to explain - ideally with examples - how CD not being a relational database affects how we use it? Or perhaps, if I were not told that CD isn't a relational database, how would this adversely impact my performance as an Objective-C/Swift programmer?
Are there things that one might try to do incorrectly if they treated CD as a relational database? Or, are there things which a relational database cannot do or does less well that CD is designed to do?
Thank you all for your collective wisdom.
People stress the "not a relational database" angle because people with some database experience are prone to specific errors with Core Data that result from trying to apply their experience too directly. Some examples:
Creating entities that are essentially SQL junction tables. This is almost never necessary and usually makes things more complex and error prone. Core Data supports many-to-many relationships directly.
Creating a unique ID field in an entity because they think they need one to ensure uniqueness and to create relationships. Sometimes creating custom unique IDs is useful, usually not.
Setting up relationships between objects based on these unique IDs instead of using Core Data relationships-- i.e. saving the unique ID of a related object instead of using ObjC/Swift semantics to relate the objects.
Core Data can and often does serve as a database, but thinking of it in terms of other relational databases is a good way to screw it up.
Core Data is a technology with many powerful features and tools such as:
Change tracking (undo/redo)
Faulting (not having to load entire objects which can save memory)
Persistence
The list goes on..
The persistence part of Core Data is backed by SQLite, which is a relational database.
One of the reasons I think people stress that Core Data is not a relational database is because is it so much more than just persistence, and can be taken advantage of without using persistence at all.
By treating Core Data as a relational database, I assume you mean that relationships between objects are mapped by ids, i.e. a Customer has a customerId and a product has a productId.
This would certainly be incorrect because Core Data let's you define powerful relationships between object models that make things easy to manage.
For example, if you want to have your customer have multiple products and want to delete them all when you delete the customer, Core Data gives you the ability to do that without having to manage customerIds/productIds and figuring out how to format complex SQL queries to match your in-memory model. With Core Data, you simply update your model and save your context, the SQL is done for you under the hood. (In fact you can even turn on debugging to print out the SQL Core Data is performing for you by passing '-com.apple.CoreData.SQLDebug 1' as a launch argument.
In terms of performance, Core Data does some serious optimizations under the hood that make accessing data much easier without having to dive deep into SQL, concurrency, or validation.
I THINK the point is that it is different from a relational database and that trying to apply relational techniques will lead the developer astray as others have mentioned. It actually operates at a higher level by abstracting the functionality of the relational database out of your code.
A key difference, from a programming standpoint, is that you don't need unique identifiers because core data just handles that. If you tried to create your own, you will come to see that they are redundant and a whole lot of extra trouble.
From the programmer's perspective, whenever you access an entity "record", you will have a pointer to any relationship -- perhaps a single pointer for a "to-one" relationship, or a set of pointers to the records in a "to-many" relationship. Core Data handles the retrieval of the actual "records" when you use one of the pointers.
Because Core Data efficiently handles faults (where the "record" (object) referenced by a pointer is not in memory) you do not have to concern yourself with their retrieval generally. When they're needed by your program Core Data will make them available.
At the end of the day, it provides similar functionality but under the hood it is different. It does require some different thinking in that ordinary SQL doesn't make sense in the context of Core Data as the SQL (in the case of a sqlite store) is handled for you.
The main adjustments for me in transitioning to Core Data were as noted -- getting rid of the concept of unique identifiers. They're going on behind the scenes but you never have to worry about them and should not try to define your own. The second adjustment for me was that whenever you need an object that is related to yours, you just grab it by using the appropriate pointer(s) in the entity object you already have.
I am new to iOS programming but have done SQL stuff for years. I am trying to use Core Data to build my model. Following the tutorials I have created a schema for my application that involves a number of one-to-many relationships that are not bi-directional.
For example I have a Games entity and a Player entity. A Game includes a collection of Players. Because a Player can be involved in more than one game, an inverse relationship does not make any sense and is not needed.
Yet when I compile my application, I get Consistency Error messages in two forms. One says.
Game.players does not have an inverse; this is an advanced setting.
Really? This is an "advanced" capability enough to earn a warning message? Should I just ignore this message or am I actually doing something wrong here that Core Data is not designed to do?
The other is of the form Misconfigured Property and logs the text:
Something.something should have an inverse.
So why would it think that?
I can't find any pattern to why it picks one error message over the other. Any tips for an iOS newb would be appreciated.
This is under Xcode 5.0.2.
Core Data is not a database. This is an important fact to grasp otherwise you will be fighting the framework for a long time.
Core Data is your data model that happens to persist to a database as one of its options. That is not its main function, it is a secondary function.
Core Data requires/recommends that you use inverse relationships so that it can keep referential integrity in check without costly maintenance. For example, if you have a one way between A and B (expressed A --> B) and you delete a B, Core Data may need to walk the entire A table looking for references to B so that it can clean them up. This is expensive. If you have a proper bi-directional relationship (A <-> B) then Core Data knows exactly which A objects it needs to touch to keep the referential integrity.
This is just one example.
Bi-directionality is not required but it is recommended highly enough that it really should be considered a requirement. 99.999% of the time you want that bi-directional relationship even if you never use it. Core Data will use it.
Why not just add the inverse relationship? It can be to-many as well, and you may well end up using it - often fetch requests or object graph navigation works faster or better coming from a different end of a relationship.
Core Data prefers you to define relationships in both directions (hence the warnings) and it costs you nothing to do so, so you may as well. Don't fight the frameworks - core data isn't an SQLLite "manager", it is an object graph and persistence tool, that can use SQL as a back end.