I am new to Neo4j and I am trying to convert a relational model to a graph model. In this model, I have two labels, X and Y, with a relationship between them. This relationship has a property P. The problem is that P should take its values from an external table (a list of the possible values for P). How should I model this so that the property's values come from that external table?
I can't say I'm completely following, but at the most basic level, if you already have x and y nodes modeled and populated (with unique constraints on the primary keys), and if you have a join table with x and y primary keys and a value that should be on the relationship, then it's a matter of reading in the import file of the join table, matching to the corresponding x and y nodes via the primary keys, then merging the appropriate relationship between them, adding any additional properties on the relationship as needed.
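To make the steps above concrete, here is a minimal sketch in plain Python, with dictionaries standing in for the graph. All names (x_nodes, y_nodes, allowed_p, join_rows, RELATED_TO) and the sample rows are made up for illustration. In Neo4j the same logic would be a match on each primary key followed by a merge of the relationship.

```python
# Hypothetical stand-ins for already-imported X and Y nodes, keyed by primary key.
x_nodes = {1: {"label": "X", "id": 1}, 2: {"label": "X", "id": 2}}
y_nodes = {10: {"label": "Y", "id": 10}}

# The external table listing the permitted values of property P (made-up values).
allowed_p = {"low", "medium", "high"}

# Rows of the join table: (x primary key, y primary key, value of P).
join_rows = [(1, 10, "low"), (2, 10, "high"), (2, 10, "high"), (1, 10, "bogus")]


def import_relationships(rows):
    """Match both endpoints by key, then merge one relationship per pair."""
    rels = {}       # (x_id, y_id) -> relationship properties
    rejected = []   # rows with an unknown node or a value outside the lookup list
    for x_id, y_id, p in rows:
        if x_id not in x_nodes or y_id not in y_nodes or p not in allowed_p:
            rejected.append((x_id, y_id, p))
            continue
        # Merge semantics: reuse the relationship if it already exists.
        rels.setdefault((x_id, y_id), {"type": "RELATED_TO"})["P"] = p
    return rels, rejected


relationships, rejected_rows = import_relationships(join_rows)
print(relationships)
print(rejected_rows)
```

The duplicate row does not create a second relationship, and the row whose value is not in the lookup list is set aside rather than imported.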
However, it's always a good idea to check if this is the best way to model what you want in a graph db. So far you've only been describing tables and how they relate, but getting a better description of the big picture of what this data represents and logically how it relates to each other might provide insights for modeling the data in a way that makes more sense for a graph db. Could you provide in your description a more verbal description of what exactly you're trying to model, how it relates to each other, and the kind of questions you want to ask of your data?
How can I represent my lookup tables in technical reports?
In other words, the ER model is used to represent a database,
but what about lookup tables?
To recover a conceptual model (entity sets, attributes and relationships) from a physical model (tables and columns), we first have to understand the logical model. This means understanding the domains and functional dependencies which are represented by the lookup table.
Lookup table is a common term which can mean different things. I generally understand it as a table which represents a domain with a surrogate key, and associates it with a name and/or a few other attributes. In the ER model, these would be simple entity relations, and leaves / terminal nodes in the graph of entity sets.
If a lookup table records facts about only one type of thing (represented by the key of the lookup table), then you can represent that type as an entity set (rectangle) with an attribute (oval) for each dependent column, and draw relationships (diamonds) to connect it to other entity sets as required. Look for foreign key columns / constraints in other tables to find these relationships.
For example, consider the following physical model:
CarMake and CarModel are examples of lookup tables. This isn't a very good model, since in the real world CarModelId determines CarMakeId, while the model treats them as independent elements in CarSales. However, since the point of the example is to focus on lookup tables, I'll use it as is.
In this case, CarMake and CarModel describe a single entity set each. Their functional dependencies are CarMakeId -> CarMakeName and CarModelId -> CarModelName. In CarSales, we've got CarSaleId -> RegNumber, Price, SoldOn (attributes) and CarSaleId -> CarMakeId, CarModelId (relationships).
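As a rough illustration of these functional dependencies, the sketch below uses Python dictionaries in place of the three tables. The table and column names come from the example; the row values are invented, and describe_sale is a hypothetical helper.

```python
# Lookup tables: surrogate key -> name
# (CarMakeId -> CarMakeName, CarModelId -> CarModelName).
car_make = {1: "Toyota", 2: "Ford"}
car_model = {10: "Corolla", 20: "Focus"}

# CarSales: CarSaleId determines the attributes and both foreign keys.
car_sales = {
    100: {"RegNumber": "ABC123", "Price": 9500, "SoldOn": "2015-03-01",
          "CarMakeId": 1, "CarModelId": 10},
}


def describe_sale(sale_id):
    """Follow the two relationships from a sale to its lookup entities."""
    sale = car_sales[sale_id]
    return {
        "RegNumber": sale["RegNumber"],
        "Price": sale["Price"],
        "SoldOn": sale["SoldOn"],
        "CarMakeName": car_make[sale["CarMakeId"]],
        "CarModelName": car_model[sale["CarModelId"]],
    }


print(describe_sale(100))
```

Each dictionary key plays the role of a table key, so every lookup table maps to one entity set with a single dependent attribute, and the two foreign keys in car_sales are the relationships.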
In this case our ER model is similar to the physical model:
However, in some cases, you may find multiple types of things combined into one lookup table due to the similar physical structure. This doesn't affect the logical or conceptual models, but makes it more complicated to recover since we have to understand how the table is used to unpack it.
The Entity-Attribute-Value (EAV) model is really powerful, but complex to implement using SQL, so people often look for alternatives to EAV. It seems like the perfect candidate for graph databases. I understand how to build a movie database where you have nodes with the Neo4j label "Movie" with the property "release_date" right on the node. How would you make this more generic, such that movies have the Neo4j label "Entity" following the general EAV model?
I've thought a lot about this, but I'm not confident I have a good solution. I'll take a stab at it anyway. Here's the most basic model:
<node> <relationship> <node>
Attribute --> :VALUE --> Entity
name="Label",type="string" --> value="Movie" --> name="The Matrix"
With this model, you can write code for how to display and edit Attribute.type. For example, maybe all labels have a text field with finite options on the front-end and all dates have a date-picker. You could break Attribute.type out into its own node, Type, if that was preferable (this would particularly make sense for handling composite types). In that case, you have the relationship TYPE between Attribute and Type nodes.
This becomes a problem if entities have multiple relationships, as is the case for reviews, or if you want to relate the value to something else, such as the user who assigned the value. Now, I think, the relationship "VALUE" has to be its own node of type "Value" (i.e. has the Neo4j label "Value") with an incoming relationship from both Attribute and User nodes.
The full form has Type nodes, Attribute nodes, User nodes, Value nodes, and Entity nodes, where the relationships have basically no properties on them.
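Here is a rough sketch of that full form using Python dataclasses, just to show the shape. The field names, the values_for helper and the user "alice" are my own placeholders, and object references stand in for the property-less relationships.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Type:
    name: str


@dataclass(frozen=True)
class Attribute:
    name: str
    type: Type          # the TYPE relationship


@dataclass(frozen=True)
class Entity:
    name: str


@dataclass(frozen=True)
class User:
    name: str


@dataclass(frozen=True)
class Value:
    value: str
    attribute: Attribute   # incoming relationship from Attribute
    entity: Entity         # relationship to the Entity being described
    assigned_by: User      # incoming relationship from User


def values_for(entity, values):
    """Collect every value attached to an entity, grouped by attribute name."""
    grouped = {}
    for v in values:
        if v.entity == entity:
            grouped.setdefault(v.attribute.name, []).append(v.value)
    return grouped


string_type = Type("string")
label = Attribute("Label", string_type)
matrix = Entity("The Matrix")
alice = User("alice")  # hypothetical user
all_values = [Value("Movie", label, matrix, alice)]
print(values_for(matrix, all_values))
```

Grouping into lists allows an entity to carry several values for the same attribute, which is the case that forced Value to become its own node.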
Why do you need it in the first place?
I always thought that EAV was just a workaround for relational databases not being schema free.
Neo4j, like other NoSQL databases, is schema-free, so you can just add the attributes that you want to both nodes and relationships.
If you need to you can also record the EAV model in a meta-schema within the graph but in most cases it is good enough if the meta-schema lives within the application that creates and uses your attributes.
Usually I treat labels as roles which in a certain context provide certain properties and relationships. A node can have many labels, each of which represents one of those roles.
E.g. for the same node
:Person(name)-[:LIVES_IN]->(:City)
:Employee(empNo)-[:WORKS_AT]->(:Company)
:Developer()-[:HAS_SKILL]->(:CompSkill)
...
So in your case :Entity would just be a label that implies the name property.
And :Movie is a label that implies a release_date property and e.g. ACTED_IN relationships.
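A minimal sketch of such a meta-schema living in the application, written in Python. The LABEL_PROPERTIES mapping and the missing_properties helper are hypothetical names, not anything Neo4j provides.

```python
# Application-side meta-schema: the properties each label (role) implies.
LABEL_PROPERTIES = {
    "Entity": {"name"},
    "Movie": {"release_date"},
    "Person": {"name"},
    "Employee": {"empNo"},
}


def missing_properties(node):
    """Return the properties implied by the node's labels that it lacks."""
    required = set()
    for label in node["labels"]:
        required |= LABEL_PROPERTIES.get(label, set())
    return sorted(required - set(node["properties"]))


# One node carrying two roles at once.
matrix = {"labels": {"Entity", "Movie"}, "properties": {"name": "The Matrix"}}
print(missing_properties(matrix))
```

For the node above, the check reports that release_date is still missing, because the Movie role implies it.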
I'm new to Core Data and I'm trying to implement it into my existing project. Here is my model:
Now, there are some things that don't make sense to me, likely because I haven't modelled it correctly.
CMAJournal is my top level object with an ordered set of CMAEntry objects and an ordered set of CMAUserDefine objects.
Here's my problem:
Each CMAUserDefine object has an ordered set of objects. For example, the "Baits" CMAUserDefine will have an ordered set of CMABait objects, the "Species" CMAUserDefine will have an ordered set of CMASpecies objects, etc.
Each CMAEntry object has attributes like baitUsed, fishSpecies, etc. that point to an object in the respective CMAUserDefine object. This is so if changes are made, each CMAEntry that references that object is also changed.
Now, from what I've read I should have inverses for each of my relationships. This doesn't make sense in my model. For example, I could have 5 CMAEntry objects whose baitUsed property points to the same CMABait object. Which CMAEntry does the CMABait's entry property point to if there are 5 CMAEntry objects that reference that CMABait? I don't think it should point to anything.
What I want is for all CMAUserDefine objects (i.e. all CMABait, CMASpecies, CMALocation, etc. objects) to be stored in the CMAJournal userDefines set, and have those objects be referenced in each CMAEntry.
I originally had this working great with NSArchiving, but the archive file size was MASSIVE. I mean, 18+ MB for 16 or so entries (which included about 20 images). And from what I've read, Core Data is something I should learn anyway.
So I'm wondering, is my model wrong? Did I take the wrong approach? Is there a more efficient way of using NSArchiver that will better fit my needs?
I hope that makes sense. Please let me know if I need to explain it better.
Thanks!
E: What led me to this question is getting a bunch of "Dangling reference to an invalid object." = "" errors when trying to save.
A. Some Basics
Core Data needs an inverse relationship to model the relationship. To make a long story short:
In an object graph as modeled by Core Data, a reference semantically points from the source object to a destination object. Therefore you use a single reference, such as CMAEntry's fishSpecies, to model a to-one relationship and a collection, such as an NSSet, to model a to-many relationship. You do not care about the type of the inverse relationship. In many cases you do not have one at all.
In a relational database, relationships are modeled differently: if you have a 1:N (one-to-many) relationship, the relationship is stored on the destination side. The reason is that in an rDB every entity has a fixed size and therefore cannot reference a variable number of destinations. If you have a many-to-many relationship (N:M), an additional table is needed.
As you can see, in an object graph the types of relationships are to-one and to-many only depending on the source, while in rDB the types of relationships are one-to-one, one-to-many, many-to-many depending on both source and destination.
To select the right kind of rDB modeling Core Data wants to know the type of the inverse relationship.
Type  Object graph  Inverse     | rDB
1:1   to-one id     to-one id   | source or destination attribute
1:N   collection    to-one id   | destination attribute
N:M   collection    collection  | additional table with two attributes
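The table can be read as a small decision function. The Python sketch below only restates the table; storage_strategy is a hypothetical name and this is not how Core Data is actually implemented.

```python
def storage_strategy(forward, inverse):
    """Map the two directions of a relationship to a relational layout.

    Both arguments are either "to-one" or "to-many".
    """
    allowed = {"to-one", "to-many"}
    if forward not in allowed or inverse not in allowed:
        raise ValueError("expected 'to-one' or 'to-many'")
    kinds = {forward, inverse}
    if kinds == {"to-one"}:
        return "1:1", "attribute on source or destination"
    if kinds == {"to-one", "to-many"}:
        # The row holding the single reference stores the key.
        return "1:N", "attribute on the side whose reference is to-one"
    return "N:M", "additional table with two attributes"


print(storage_strategy("to-many", "to-one"))
print(storage_strategy("to-many", "to-many"))
```

The point is that the result depends on both arguments, which is why the type of the inverse has to be declared.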
B. To your Q
In your case, if a CMAEntry object refers to exactly one CMASpecies object, but a CMASpecies object can be referred to by many CMAEntry objects, this simply means that the inverse relationship is a to-many relationship.
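To illustrate with the bait example from the question, here is a plain Python sketch (not Core Data code). The Bait and Entry classes and the set_bait method are hypothetical stand-ins for CMABait and CMAEntry. In Core Data, setting one side of a relationship that has a declared inverse updates the other side for you, so the manual bookkeeping here is only for illustration.

```python
class Bait:
    def __init__(self, name):
        self.name = name
        self.entries = []      # inverse side: to-many


class Entry:
    def __init__(self, title):
        self.title = title
        self.bait_used = None  # forward side: to-one

    def set_bait(self, bait):
        """Set the to-one reference and keep the to-many inverse in sync."""
        if self.bait_used is not None:
            self.bait_used.entries.remove(self)
        self.bait_used = bait
        if bait is not None:
            bait.entries.append(self)


worm = Bait("Worm")
entries = [Entry(f"Trip {i}") for i in range(5)]
for entry in entries:
    entry.set_bait(worm)

print(len(worm.entries))
```

All five entries point at the same bait, and the bait's inverse simply holds all five of them, so there is no need to pick a single entry.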
Yes, it is strange for an OOP developer to have such inverse relationships. For a SQL developer, it is the usual case. When developing an ORM (object-relational mapper), this is one of the problems. (I know that because I'm doing that for Objective-Cloud right now. But I did it differently, more from the OOP point of view.) Every solution is somewhat unusual for one side. Somebody called ORM the "Vietnam of software development".
For a simpler example: when modeling a sports league you will find yourself with an entity Match that has the properties homeTeam and guestTeam. On the team side you want a single relationship, not homeMatches and guestMatches, but simply matches. This is obviously not an inverse. Simply add the inverse relationships if Core Data wants them, and don't worry about them.
I'm struggling with creating a suitable Core Data model for my app. I'm hoping someone here can provide some guidance.
I have two entities -- "Goals" and "Items". The Goals entity contains only a goal description, but any goal may have any number of subgoals, and these may extend multiple levels in a tree structure. Subgoals are to be contained within the same entity, so presumably the Goal entity will contain a pointer to "parent" which will be the parent goal of any subgoal.
There will also be an "Items" entity that contains a couple of text fields and a couple of binary items, and must be linked (ideally, by a unique identifier, perhaps objectID) to the particular goal or subgoal the item(s) are related to.
I am totally fumbling with how to set this model up. I know what attributes need to be in each entity, but the relationships, particularly between goals and "subgoals", has me stumped. I don't seem to be able to turn up any good examples of tree structures in Core Data on the Internet, and even the couple of books I have on Core Data don't seem to address it.
Can anyone here help an old SQL programmer get headed the right direction with these relationships in Core Data? Thanks.
Have you tried creating a one-to-many from Goal to itself, and a one-to-one from Goal to Item? The only thing I would worry about here is circular references.
Also, read Relationships and Fetched Properties in the CoreData Programming Guide.
Here is how it is done:
You set up a to-many relationship from Goal to Item in the model editor. Don't use any ids, foreign keys etc. This is old-fashioned database thinking - you can forget about it. Here we are only dealing with an object graph. The database layer is just an implementation detail for persisting the data.
Make two more relationships in entity Goal to itself: a to-one called parent, a to-many called subGoals. Make them the inverse of each other. Simple!
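To show the shape of that object graph, here is a plain Python sketch (not Core Data code). The class names, add_sub_goal and add_item are hypothetical. In Core Data the inverse sides are maintained for you once the relationships are marked as inverses. The ancestor check is an addition of mine to guard against circular references.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Item:
    text: str
    goal: Optional["Goal"] = None      # to-one, inverse of Goal.items


@dataclass(eq=False)
class Goal:
    description: str
    parent: Optional["Goal"] = None    # to-one, inverse of sub_goals
    sub_goals: List["Goal"] = field(default_factory=list)  # to-many
    items: List[Item] = field(default_factory=list)        # to-many

    def add_sub_goal(self, child):
        # Walk up the tree so a goal can never become its own ancestor.
        node = self
        while node is not None:
            if node is child:
                raise ValueError("a goal cannot be its own ancestor")
            node = node.parent
        if child.parent is not None:
            child.parent.sub_goals.remove(child)
        child.parent = self
        self.sub_goals.append(child)

    def add_item(self, item):
        if item.goal is not None:
            item.goal.items.remove(item)
        item.goal = self
        self.items.append(item)


root = Goal("Get fit")
sub = Goal("Run 5k")
root.add_sub_goal(sub)
sub.add_item(Item("Buy running shoes"))
print(sub.parent.description, len(root.sub_goals), sub.items[0].goal.description)
```

No ids or foreign keys appear anywhere: an item belongs to a goal simply by being in that goal's items collection.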
QED is correct, you can create a to-many relationship on goal (call it subgoals) as well as a to-one relationship on goal (call it parentGoal) and set them as inverses of each other.
Then create another to-many relationship (call it items) on the goal entity, with the inverse being a to-one relationship on the item entity (call it goal). Then you're all set. You don't need to link items with a unique id, just add them to the items relationship.
Also note that if you did want to give items a unique id, do not use the objectID. The objectID should only be used as a temporary id as they are not guaranteed to remain the same. In fact they will change if you ever do a Core Data migration.
One way, though not really great, is to create another entity, say subGoal, where each goal has one subGoal and each subGoal object has many goals.
In my system I have a relational DB table with "id" columns, and I am representing some of that same data in Neo4J.
My first approach is to make an "id" attribute in Neo which correlates to the id column.
Is there any reason that this isn't a good practice? Does it conflict technically or conceptually with the node IDs that Neo generates?
If the ids serve the purpose of uniquely distinguishing the nodes that will be generated, then yes, it's good to have one.
But keep in mind that the graph may grow: if another DB table later needs to be modelled in the graph and some ids in the new table collide with ids from the old one, you will have trouble keeping nodes unique.
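One possible way around such collisions, sketched in Python with a dictionary standing in for the graph, is to treat the source table name plus the row id as the key. The upsert_node helper and the table names are hypothetical, and this is only one option among several.

```python
# Graph stand-in: nodes keyed by (source table, id from that table).
nodes = {}


def upsert_node(source_table, row_id, properties):
    """Create or update a node identified by its table name and row id."""
    key = (source_table, row_id)
    node = nodes.setdefault(key, {"source": source_table, "id": row_id})
    node.update(properties)
    return node


# Two tables that both contain a row with id 1 no longer collide.
upsert_node("customers", 1, {"name": "Ann"})
upsert_node("orders", 1, {"total": 20})
print(len(nodes))
```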
Also, using the node ids that Neo4j generates is not recommended, as they can be reused after nodes are deleted.
If you just want to model the DB table in the graph database and don't need to relate the graph data back to your DB table later on, you can use UUID.randomUUID().toString() to generate random UUIDs (with an extremely low probability of duplicates) as node ids.
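The call above is Java. For comparison, a minimal Python equivalent uses the standard uuid module. The new_node_id helper is a made-up name.

```python
import uuid


def new_node_id():
    """Return a random (version 4) UUID as a string, for use as a node id."""
    return str(uuid.uuid4())


first = new_node_id()
second = new_node_id()
print(first, second)
```

The resulting string would then be stored as an ordinary property on the node, independent of the ids Neo4j assigns internally.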