How to design a graph database in this scenario?


Here is my scenario. I have a predefined data type structure for books (take it as an example, for the sake of simplicity). The structure looks like the image below. It's a Labeled Property Graph and the information is self-explanatory. This data type structure is fixed; I cannot change it, I just use it.
When there is one book in the system, let's call it Harry Potter, it might look like this:
So, the book has its own properties (ID, Name, ...) and also contains a field of type MandatoryData. By looking at this graph, we can know all the information about the book.
The problem happens when I have 2 books in the system, which looks like this:
In this case, there is another book called Graph DB, whose information is highlighted.
The problem with this design is that we don't know which information belongs to which book. For example, we can no longer tell which publishedYear belongs to which book.
My question is: how can I solve or avoid this problem? Should I create one MandatoryData for each book? Could you propose a design?
I'm using Neo4j and Cypher. Thank you for your help!

UPDATE
From the comments (by #AnhTriet):
Thanks for your suggestion. But I want to have some sort of connection
between those books. If we create new MandatoryData, those books will
be completely separated. (...) I meant, 2 books should point to some
same nodes if they have the same author or published year, right?
After some clarification in the comments, I suggest creating a MandatoryData node for each distinct property value in the database. Then you connect a given book to several MandatoryData nodes, one for each property of the book.
This way two books with the same author will be connected to the same MandatoryData node.
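A minimal in-memory sketch of this per-value design, in Python rather than Cypher. It assumes a plain dictionary stands in for the graph; the `ValueGraph` class, its methods, and the sample authors and years are hypothetical. Each distinct (field, value) pair gets exactly one node, so books that share a value end up pointing at the same node:

```python
from collections import defaultdict

class ValueGraph:
    """Toy stand-in for the graph: one node per distinct (field, value) pair."""

    def __init__(self):
        self.value_nodes = {}                  # (field, value) -> node id
        self.book_links = defaultdict(set)     # book id -> value node ids it points to
        self.value_links = defaultdict(set)    # value node id -> book ids pointing to it

    def value_node(self, field, value):
        # Reuse the existing node for this value; create it only the first time
        key = (field, value)
        if key not in self.value_nodes:
            self.value_nodes[key] = len(self.value_nodes)
        return self.value_nodes[key]

    def add_book(self, book_id, **fields):
        for field, value in fields.items():
            node = self.value_node(field, value)
            self.book_links[book_id].add(node)
            self.value_links[node].add(book_id)

    def books_sharing(self, book_id, field):
        # Other books linked to the same value node as book_id for this field
        related = set()
        for (f, _), node in self.value_nodes.items():
            if f == field and node in self.book_links[book_id]:
                related |= self.value_links[node]
        related.discard(book_id)
        return related

g = ValueGraph()
g.add_book(1, author="Author A", publishedYear=1997)
g.add_book(2, author="Author B", publishedYear=2015)
g.add_book(3, author="Author A", publishedYear=2015)
print(sorted(g.books_sharing(1, "author")))         # [3]
print(sorted(g.books_sharing(2, "publishedYear")))  # [3]
```

In Neo4j, the get-or-create step in `value_node` is roughly what Cypher's `MERGE` clause does, so importing a second book by the same author would reuse the existing node instead of duplicating it.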
Since you cannot change the data model, I strongly recommend creating a new MandatoryData node for each new book added to the system.
This way you will be able to get information about a specific book with queries like:
// Get the author's name of the book with ID = 1
MATCH (:Book {ID : 1})-->(:MandatoryData)-->(:Author)-->(:Name)-->(v:Value)
RETURN v.value
The model proposed in your question is not viable, since it has no way to identify the owner of a specific property, as you pointed out.
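To see the ambiguity and the fix side by side, here is a small Python sketch that mimics the Cypher path `(:Book)-->(:MandatoryData)-->(:Author)-->(:Name)-->(:Value)` over an in-memory graph. It is not Neo4j code; the `Graph` class, the node ids, and the "Author A"/"Author B" values are hypothetical. With one shared MandatoryData node, the traversal from book 1 reaches both authors; with one MandatoryData node per book, it reaches only its own:

```python
from collections import defaultdict

class Graph:
    """Minimal stand-in for a labeled property graph (illustrative only)."""

    def __init__(self):
        self.nodes = {}                  # node id -> (label, properties)
        self.out = defaultdict(list)     # node id -> ids of nodes it points to

    def node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def link(self, src, dst):
        self.out[src].append(dst)

    def follow(self, start, labels):
        # Walk outgoing edges through nodes carrying each label in turn,
        # roughly like (start)-->(:L1)-->(:L2)... in a Cypher MATCH
        current = list(start)
        for label in labels:
            current = [t for s in current for t in self.out[s] if self.nodes[t][0] == label]
        return current

    def author_names(self, book_id):
        books = [n for n, (lbl, p) in self.nodes.items() if lbl == "Book" and p.get("ID") == book_id]
        values = self.follow(books, ["MandatoryData", "Author", "Name", "Value"])
        return sorted(self.nodes[v][1]["value"] for v in values)

def build(shared_mandatory_data):
    g = Graph()
    for book_id, author in [(1, "Author A"), (2, "Author B")]:
        book = f"book{book_id}"
        # Either one MandatoryData node for all books, or a fresh one per book
        md = "md" if shared_mandatory_data else f"md{book_id}"
        g.node(book, "Book", ID=book_id)
        g.node(md, "MandatoryData")
        g.node(f"author{book_id}", "Author")
        g.node(f"name{book_id}", "Name")
        g.node(f"value{book_id}", "Value", value=author)
        g.link(book, md)
        g.link(md, f"author{book_id}")
        g.link(f"author{book_id}", f"name{book_id}")
        g.link(f"name{book_id}", f"value{book_id}")
    return g

print(build(shared_mandatory_data=True).author_names(1))   # ['Author A', 'Author B']
print(build(shared_mandatory_data=False).author_names(1))  # ['Author A']
```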

Related

Neo4j relationship property - array of values vs duplicate relationship

Let's say I have two nodes of type (:City).
What's the better approach to storing the relationship when a user walks from one city to another? Please note that in this case we want to save the day value (NOT the time of day) and the name of the user.
Let's say that the same user walks between the same two cities on 5 different days.
What I have been thinking about is:
1) There will be a new relationship each time a user walks from City(a) to City(b). However, that would create 5 different relationships with pretty much the same content (the same user name in this case); only the day value will be different.
2) There will be one relationship for each user, and the single long value of the day property will be replaced with an array of long values.
What do you think is the better approach? Either create multiple relationships even if they share most of their properties, or create one relationship with those shared properties and put the variable ones into an array.
If you have any other ideas or suggestions please write them :)
Thanks
c.

In the spirit of graphs, you can:
have a node for city
have a node for user
have a node for each walk (the fact of walking)
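A rough Python sketch of that third option, assuming each walk becomes its own record that references the user and both cities and carries the day. The class names, `walks` store, and dates are hypothetical; in Neo4j each `Walk` would be a node connected to one User node and two City nodes:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class City:
    name: str

@dataclass(frozen=True)
class User:
    name: str

@dataclass(frozen=True)
class Walk:
    """One node per walk: links a user and two cities, and stores the day."""
    user: User
    origin: City
    destination: City
    day: date

walks = []  # hypothetical in-memory store of walk nodes

def record_walk(user, origin, destination, day):
    walks.append(Walk(user, origin, destination, day))

def days_walked(user, origin, destination):
    return sorted(w.day for w in walks
                  if w.user == user and w.origin == origin and w.destination == destination)

alice = User("alice")
a, b = City("A"), City("B")
for d in range(1, 6):
    record_walk(alice, a, b, date(2017, 3, d))
print(len(walks), days_walked(alice, a, b)[0])  # 5 2017-03-01
```

The user and the cities are stored once; only the per-walk fact (the day) repeats, and extra details about a single walk can later be added to its node without touching an array.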

Core Data Relationship For Unidirectional One to Many

What is the best practice for creating Unidirectional One to Many Relationships in Core Data?
For example...
Let's take two classic entity examples, "teacher" and "student".
Each student has one teacher, and each teacher has many students.
In Core Data right now you are forced to provide an inverse, so the teacher is forced to have a reference to a 'student'. If you don't, you get a warning that says something along the lines of...
file:///Users/josephastrahan/Documents/VisualStudioProjects/Swift3WorkOrders/WorkOrders/WorkOrders/WorkOrders.xcdatamodeld/WorkOrders.xcdatamodel/: warning: Misconfigured Property: Teacher.student should have an inverse
What if I don't want teacher to have a reference to student?
Some other posts have suggested that I should just keep the inverse anyway, but I think this inverse may be causing an issue in one of my projects.
That said, let me explain my exact issue.
Let's say that our teacher has a unique int64 attribute called 'id', and that the students also have a unique int64 attribute called 'id'.
The id is enforced to be unique by adding a constraint on id to the Teacher entity in the model (refer to the image below to see how that is done).
Every year there are new students, but the teachers stay the same. So I decided that I want to delete all the students without deleting the reference to the teacher. So I set the delete rule to 'nullify' for both the teacher-to-student and the student-to-teacher relationships.
Now when I create a new student, I want to assign one of the existing teachers to that student (something like student.teacher = the teacher object with id 1, the same id as before). However, because the teacher has an inverse relationship to a student that no longer exists (which in theory should be null), the program crashes!
I know this is the case because I've used print console logs to narrow it down to the exact point where it occurs. I also know it because if I set the delete rule to cascade for student, the crash goes away, but then I lose my teacher, which I don't want...
Some things that I think might be the issue:
1.) I do my testing at the startup of the program, which creates a new context every time. Could it be that, because I never deleted the teacher, it still thinks it refers to a student from a context that no longer exists? (if I'm even saying this right...)
I'm not sure about the best way to achieve what I'm trying to do with Core Data, and any advice is much appreciated!
Note:
I forgot to mention that I also use the merge policy NSMergeByPropertyObjectTrumpMergePolicy, which overwrites the old data with the new. When I create new students, I'm also creating new teachers with the same id, which should be handled by this policy.

You are almost there.
The advice to keep the inverse relationship is a good one. Keep it.
Your issue is likely caused by different contexts. Instead of holding on to a teacher object in memory, you should fetch the teacher (based on the id) in the context in which you intend to use it.
Your nullified students should not have any impact. A to-many relationship is really a Set<Student>. Make sure the set is empty.
NB:
If you want to keep the student in the database (for historical purposes) - it seems from your description that this is the case - you might also consider another scheme: give your students another attribute (such as a year) and use that to filter the student list. You would not have to delete or nullify anything. You could also do some more interesting time-based queries on the data.
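A language-neutral sketch of the year-attribute idea, written in Python rather than Swift/Core Data. It assumes the to-many side is a set (as the answer describes) and adds a hypothetical `year` attribute; nothing is deleted or nullified, old students simply stop matching the filter:

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass(eq=False)  # identity-based equality/hash, so instances can live in sets
class Teacher:
    id: int
    students: Set["Student"] = field(default_factory=set)  # to-many side

@dataclass(eq=False)
class Student:
    id: int
    year: int                          # hypothetical attribute: school year
    teacher: Optional[Teacher] = None  # to-one side (inverse of Teacher.students)

def assign(student: Student, teacher: Teacher) -> None:
    # Set both sides, as Core Data does automatically when an inverse exists
    student.teacher = teacher
    teacher.students.add(student)

def students_in_year(teacher: Teacher, year: int) -> list:
    # Filter by year instead of deleting last year's students
    return sorted(s.id for s in teacher.students if s.year == year)

t = Teacher(1)
for student_id, year in [(10, 2016), (11, 2016), (20, 2017)]:
    assign(Student(student_id, year), t)
print(students_in_year(t, 2017))  # [20]
print(students_in_year(t, 2016))  # [10, 11]
```

In Core Data the filter would typically be a predicate on the fetch request; the point here is only that the teacher object and its relationship stay intact from year to year.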

Unique constraints have been available since iOS 9, and they have helped iOS developers add and update records in Core Data.
Unique constraints make sure that records in an entity are unique by the given fields. But unique constraints combined with to-many relationships lead to a lot of weird issues while resolving conflicts,
e.g. “Dangling reference to an invalid object.”
This post focuses on a small problem that may take days to fix:
http://muhammadzahidimran.com/2016/12/08/coredata-unique-constraints-and-to-many-relationship/

Preventing duplicate NSManagedObjects

Consider two entities, Author and Book, that are in a many-to-many relationship and are imported into my Core Data store from an external database. What I am confused about is: should I create a new NSManagedObject for each author, even if this author is already in the store? How do I even know that two authors with the same name are the same person? I could, for instance, end up with 10 John Smiths, 5 of whom are the same person, but there is no way to check this when importing the data, right? Suppose I want to do a fetch request for one of these John Smiths; I will still get 10 results. He may also appear as J. Smith, or J.A. Smith. But J. Smith could also be Jenny Smith.
Should I just create an NSManagedObject for each author, and not worry about possible duplicates, or are there other ways around this?

How do I even know that two authors with the same name are the same person?
You don't, and that's the core of your problem right there. You need to allow duplicate names, because names are (usually) not unique. Any technical solution to avoiding or removing duplicates based on name is virtually guaranteed to corrupt your data.
It's not clear where your data is coming from, so it's hard to say what the best fix is. If this is user-entered data, let the user edit an existing author to add or remove titles, to prevent a duplicate. Offer the option to merge two entries in case the user accidentally creates a duplicate.
If the data comes from an online service of some kind, you pretty much have to take what they give you. If they have duplicate entries for authors, you can't reliably do anything about it. You could easily find duplicate names, but that doesn't mean they're the same person.

Use a fetch-or-create pattern as explained in Apple's Core Data docs.
Core Data doesn't have an implicit uniquing algorithm.
https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/CoreData/Articles/cdImporting.html
(They call it find-or-create.) ;)
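The core of find-or-create, sketched in Python with a dict standing in for the persistent store. It assumes the external source provides a stable, unique `author_id`; the `Author` class and the sample records are hypothetical. Note that the lookup key is the id, never the name, so two different John Smiths stay separate while a repeated id is reused:

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Author:
    author_id: str  # assumed stable, unique id from the external source
    name: str

def find_or_create(store: Dict[str, Author], author_id: str, name: str) -> Author:
    """Return the stored Author for author_id, creating it only if missing.

    `store` is a plain dict standing in for the persistent store; with Core Data
    the lookup would be a fetch request on the id attribute instead.
    """
    author = store.get(author_id)
    if author is None:
        author = Author(author_id, name)
        store[author_id] = author
    return author

store: Dict[str, Author] = {}
imported = [("a1", "John Smith"), ("a2", "John Smith"), ("a1", "J. Smith")]
for author_id, name in imported:
    find_or_create(store, author_id, name)
print(len(store), store["a1"].name)  # 2 John Smith
```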

To disambiguate people (or authors), one approach is a "unique" attribute, say an author_id, which is guaranteed to be unique when an author is created.
The other approach is to use heuristics to determine whether an object possibly has duplicates. This second approach sounds more complex, and it actually IS more complex ;)
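A deliberately crude heuristic, sketched in Python, to show why this route is harder. The key used here (last name plus first initial) is an arbitrary assumption, and the function names are hypothetical. It only groups *candidates* for human review; as the example shows, "Jenny Smith" lands in the same group as "John Smith", so merging automatically on such a key would corrupt the data:

```python
def name_key(full_name: str):
    """Crude heuristic key: (last name, first initial). Assumes a non-empty name."""
    parts = full_name.replace(".", " ").split()
    return parts[-1].lower(), parts[0][0].lower()

def possible_duplicates(names):
    # Group names sharing a key; groups of 2+ are candidates, not confirmed duplicates
    groups = {}
    for n in names:
        groups.setdefault(name_key(n), []).append(n)
    return [g for g in groups.values() if len(g) > 1]

print(possible_duplicates(["John Smith", "J. Smith", "J.A. Smith", "Jenny Smith", "Jane Doe"]))
# [['John Smith', 'J. Smith', 'J.A. Smith', 'Jenny Smith']]
```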
Unfortunately, Core Data does not support "unique attributes" (unique keys).
Both approaches can be implemented as proper managed object "validations", which get invoked when the context is saved.
A sophisticated solution would use a separate index maintained per unique attribute and per context. Running Core Data queries, as shown in the sample snippets in "Implementing Find-or-Create Efficiently", to confirm that the "unique constraint" is satisfied each time the context is saved will become quite slow for large data sets.
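A Python sketch of the "index per unique attribute and per context" idea, not Core Data code. The `Context` class is hypothetical and objects are plain dicts for brevity. The uniqueness check at save time becomes a dictionary lookup, and the index is only updated once every pending object has passed:

```python
class Context:
    """Hypothetical object context with one in-memory index per unique attribute."""

    def __init__(self, unique_attrs):
        self.indexes = {attr: {} for attr in unique_attrs}  # attr -> {value: object}
        self.pending = []

    def insert(self, obj):
        self.pending.append(obj)

    def save(self):
        # Validate every pending object first; commit to the indexes only if all pass
        staged = {attr: {} for attr in self.indexes}
        for obj in self.pending:
            for attr, index in self.indexes.items():
                value = obj[attr]
                if value in index or value in staged[attr]:
                    raise ValueError(f"duplicate {attr}={value!r}")
                staged[attr][value] = obj
        for attr, entries in staged.items():
            self.indexes[attr].update(entries)
        self.pending.clear()

ctx = Context(["author_id"])
ctx.insert({"author_id": 1, "name": "John Smith"})
ctx.save()
ctx.insert({"author_id": 1, "name": "J. Smith"})
try:
    ctx.save()
except ValueError as err:
    print(err)  # duplicate author_id=1
```

After a failed save the offending object stays pending, so the caller can fix or discard it; keeping such an index consistent across several contexts is the hard part this sketch leaves out.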

With iOS 9, Apple introduced unique constraints to Core Data. Now you can specify an attribute that has to be unique.

datamodel connections

I'm struggling with a new application where I have a User model which has several associations with itself.
For example, a user can have students / parents / administrators, but all of those associations are users as well.
My idea was to create a connection model where I specify the associated ids and the association type. Unfortunately I don't know how to implement this.
Any help would be much appreciated.
Thank you!

When a model references itself, it is a self-join. See here, and also google for "self join".
Re: "connection model" needed?
Answer: Rather than "connection model," better terms are "many to many table" or "junction table"
A many to many table is only needed if your data has a many to many relationship. Otherwise, you just need a one to one or many to one relationship.
"A user can have students" The key question is, can one student also have many "users"? If so, then you need a many to many table, otherwise not.
For parents, you could say that a user has exactly zero or one father. If so then a many to many table is not needed.
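Independent of the web framework, the junction-table idea can be sketched with Python's built-in sqlite3 module. The table names, column names, and sample users are hypothetical. Each row of `user_connections` links two users and records the kind of association, which is essentially the "connection model" from the question; querying it joins `users` back to itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Junction table: each row links two users and says how they are related
CREATE TABLE user_connections (
    user_id    INTEGER NOT NULL REFERENCES users(id),
    related_id INTEGER NOT NULL REFERENCES users(id),
    kind       TEXT NOT NULL,  -- e.g. 'student', 'parent', 'administrator'
    PRIMARY KEY (user_id, related_id, kind)
);
""")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "Teacher T"), (2, "Student S1"), (3, "Student S2"),
                  (4, "Parent P"), (5, "Teacher U")])
conn.executemany("INSERT INTO user_connections VALUES (?, ?, ?)",
                 [(1, 2, "student"), (1, 3, "student"),
                  (5, 2, "student"),   # S1 has two teachers: many-to-many
                  (2, 4, "parent")])

def related(user_id, kind):
    # Self-join: users -> user_connections -> users
    rows = conn.execute("""
        SELECT r.name FROM user_connections c
        JOIN users r ON r.id = c.related_id
        WHERE c.user_id = ? AND c.kind = ?
        ORDER BY r.name
    """, (user_id, kind))
    return [name for (name,) in rows]

print(related(1, "student"))  # ['Student S1', 'Student S2']
print(related(2, "parent"))   # ['Parent P']
```

If every association turned out to be one-to-many (for example, a student has exactly one teacher), a plain foreign-key column on `users` would be enough and the junction table could go.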
Edited: Oops, I realize that I no longer know this "cold". I'd have to experiment with sample code to get it right. And unfortunately I don't have the time right now. My apologies.
See Self-Joins doc

Domain Driven Design: When to make an Aggregate Root?

I'm attempting to implement DDD for the first time with an ASP.NET MVC project and I'm struggling with a few things.
I have 2 related entities, a Company and a Supplier. My initial thought was that Company was an aggregate root and that Supplier was a value object for Company. So I have a Repository for company and none for Supplier.
But as I started to build out my app, I ended up needing separate list, create, and update forms for the Supplier. The list was easy: I could call Company.Suppliers. Create wasn't horrible: I could do Company.Suppliers.Add(supplier). But update is giving me a headache. Since I need just one entity and I can't exactly keep it in memory between forms, I ended up needing to refetch the company and all of its suppliers, find the one I needed to bind to, and then do it all again to modify it and persist it back to the db.
I really just needed to do a GetOne if I had a repository for Supplier. I could add some work arounds by adding a GetOneSupplier to my Company or CompanyRepository, but that seems junky.
So, I'm really wondering if it's actually a Value Object, and not a full domain entity itself.
tldr;
Is needing separate list/create/update views/pages a sign that an entity should be its own root?

Based on your terminology I assume you are performing DDD based on Eric Evans' book. It sounds like you have already identified a problem with your initial go at modeling and you are right on.
You mention you thought of supplier as a Value Object... I suggest it is not. A Value Object is something primarily identified by its properties. For example, the date "September 30th, 2009" is a value object. Why? Because all date instances with a different month/day/year combo are different dates. All date instances with the same month/day/year combo are considered identical. We would never argue over swapping my "September 30th, 2009" for yours because they are the same :-)
An Entity on the other hand is primarily identified by its "ID". For example, bank accounts have IDs - they all have account numbers. If there are two accounts at a bank, each with $500, if their account numbers are different, so are they. Their properties (in this example, their balance) do not identify them or imply equality. I bet we would argue over swapping bank accounts even if their balances were the same :-)
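The two kinds of equality can be contrasted in a few lines of Python. The `Date` and `BankAccount` classes are hypothetical illustrations of the examples above, not part of any framework: the value object compares by all of its properties, while the entity compares only by its identifier:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Date:
    """Value object: two instances with the same properties are interchangeable."""
    year: int
    month: int
    day: int

class BankAccount:
    """Entity: identity is the account number; the balance does not define equality."""

    def __init__(self, account_number: str, balance: int):
        self.account_number = account_number
        self.balance = balance

    def __eq__(self, other):
        return isinstance(other, BankAccount) and self.account_number == other.account_number

    def __hash__(self):
        return hash(self.account_number)

print(Date(2009, 9, 30) == Date(2009, 9, 30))              # True
print(BankAccount("111", 500) == BankAccount("222", 500))  # False
print(BankAccount("111", 500) == BankAccount("111", 300))  # True
```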
So, in your example, I would consider a supplier an Entity, as I would presume each supplier is primarily identified by its ID rather than its properties. My own company shares its name with two others in the world - yet we are not all interchangeable.
I think your suggestion that if you need views for CRUDing an object then it is an Entity probably holds true as a rule of thumb, but you should focus more on what makes one object different from others: properties or ID.
Now as far as Aggregate Roots go, you want to focus on the lifecycle and access control of the objects. Consider that I have a blog with many posts each with many comments - where is/are the Aggregate Root(s)? Let's start with comments. Does it make sense to have a comment without a post? Would you create a comment, then go find a post and attach it to it? If you delete a post, would you keep its comments around? I suggest a post is an Aggregate Root with one "leaf" - comments. Now consider the blog itself - its relationship with its posts is similar to that between posts and comments. It too in my opinion is an Aggregate Root with one "leaf" - posts.
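A minimal Python sketch of the blog example, assuming a post is the aggregate root and comments are reachable only through it. The class names, the `posts` dict acting as a repository, and the sample data are hypothetical. Comments are created via the root, exposed read-only, and there is no separate comment repository, so removing the post removes its comments:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Comment:
    number: int  # local identity, only meaningful inside its post
    author: str
    text: str

@dataclass
class Post:
    """Aggregate root: comments are created only through the post that owns them."""
    post_id: int
    title: str
    _comments: List[Comment] = field(default_factory=list)

    def add_comment(self, author: str, text: str) -> Comment:
        comment = Comment(len(self._comments) + 1, author, text)
        self._comments.append(comment)
        return comment

    @property
    def comments(self) -> Tuple[Comment, ...]:
        # Read-only view: callers cannot attach comments behind the root's back
        return tuple(self._comments)

# Hypothetical repository for the root only; there is no comment repository
posts: Dict[int, Post] = {}
post = Post(1, "First post")
post.add_comment("ann", "Nice post")
post.add_comment("bob", "Agreed")
posts[post.post_id] = post
print([c.number for c in posts[1].comments])  # [1, 2]
del posts[1]  # removing the post removes its comments with it
```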
So in your example, is there a strong relationship between company and supplier whereby if you delete a company (I know... you probably only have one instance of company) you would also delete its suppliers? If you delete "Starbucks" (a coffee company in the US) do all its coffee bean suppliers cease to exist? This all depends on your domain and application, but I suggest more than likely neither of your Entities are Aggregate Roots, or perhaps a better way to think about them is that they are Aggregate Roots each with no "leaves" (nothing to aggregate). In other words, company does not control access to or control the lifecycle of suppliers. It simply has a one-to-many relationship with suppliers (or perhaps many-to-many).
This brings us to Repositories. A Repository is for storing and retrieving Aggregate Roots. You have two (technically they are not aggregating anything, but it's easier than saying "repositories store aggregate roots or entities that are not leaves in an aggregate"), therefore you need two Repositories: one for company and one for suppliers.
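A small Python sketch of that two-repository setup. The generic `Repository` class, its `add`/`get_one` methods, and the sample companies and supplier are hypothetical; a real repository would wrap the database. Companies reference suppliers by id instead of holding copies, and the supplier edit form can fetch just the one supplier it needs:

```python
from dataclasses import dataclass, field
from typing import Dict, Generic, Optional, Set, TypeVar

@dataclass
class Supplier:
    supplier_id: int
    name: str

@dataclass
class Company:
    company_id: int
    name: str
    supplier_ids: Set[int] = field(default_factory=set)  # references, not copies

T = TypeVar("T")

class Repository(Generic[T]):
    """Hypothetical in-memory repository; a real one would wrap the database."""

    def __init__(self):
        self._items: Dict[int, T] = {}

    def add(self, item_id: int, item: T) -> None:
        self._items[item_id] = item

    def get_one(self, item_id: int) -> Optional[T]:
        return self._items.get(item_id)

companies: Repository[Company] = Repository()
suppliers: Repository[Supplier] = Repository()

suppliers.add(7, Supplier(7, "Supplier S"))
companies.add(1, Company(1, "Company A", {7}))
companies.add(2, Company(2, "Company B", {7}))  # shared supplier, not duplicated

# Update flow for the supplier form: fetch only the supplier, change it
s = suppliers.get_one(7)
s.name = "Supplier S Ltd"
print(suppliers.get_one(7).name)  # Supplier S Ltd
```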
I hope this helps. Perhaps Eric Evans lurks around here and will tell me where I deviated from his paradigm.

Sounds like a no-brainer to me - Supplier should have its own repository. If there is any logical possibility that an entity could exist independently in the model then it should be a root entity, otherwise you'll just end up refactoring later on anyway, which is redundant work.
Root entities are always more flexible than value objects, despite the extra implementation work up front. I find that value objects in a model become rarer over time as the model evolves, and entities that remain value objects were usually the ones that you could logically constrain that way from day one.
If companies share suppliers then having supplier as a root entity removes data redundancy as well, as you do not duplicate the supplier definition per company but share the reference instead, and the association between Company and Supplier can be bi-directional as well, which may yield more benefits.
