How to model data with NoSQL

My team and I are all beginners with NoSQL. We were using Entity Framework with SQL Server 2008 on a project, but over time the project grew bigger and more complex, to the point where EF no longer met our needs, so we decided to adopt MongoDB. We still have many doubts because of the big paradigm shift, so I'm posting them here to hear your opinions.
I have the entities "Person Fisica" (Physical Person), "Patient" and "Professional". Both the Patient and the Professional are a Person, and sometimes the Patient and the Professional are the same person (for example, a professional at a health unit who is also a patient). In SQL Server, Patient had a reference to the Physical Person and Professional also had a reference to the Person. When Patient and Professional were the same person, both referenced the same Person. With Mongo we are unsure what to do. Some team members want to do much the same thing: Patient and Professional would each hold the Id of the Person. I would rather have the Patient and the Professional each contain the full Person object, but then how would we keep the data consistent? Technically, the Physical Person inside the Patient would be different from the Physical Person inside the Professional. This and other questions are giving us headaches. For several shared entities, we do not know whether to put the entity inside the object that owns it or to have the object hold only the Id of the entity, as in a relational DB. Another example: the Health Unit and the types of UnidadeDeSaude. A Health Unit type has several Health Units and a Health Unit has one type. Would the correct approach be to place the Unit Type object inside the Health Unit, or just to reference it by Id?
We found a few articles through Google, but we're still in doubt about these cases:
http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/
http://blog.fiesta.cc/post/11319522700/walkthrough-mongodb-data-modeling

I can't see exactly what you have, so speaking generally: in MongoDB you wouldn't JOIN tables in the same way you would with an RDBMS. Typically, if you have a Person entity you store the whole Person as a Person. This is a nice mapping from your code classes.
In the case where you have references to other entities, say where a single Person is shared between Patient and Professional, you would do this with a foreign key reference in an RDBMS. You can do this with Mongo, but Mongo won't do the JOIN for you. That would have to be done by the caller. The recommended approach is to put a copy of the Person entity in both Patient AND Professional. The impact of this is that if you update the Person entity you now have to update the data in two places, but that's not necessarily as bad as it sounds. It's usually "quick" to update and you can update both 'atomically', so in practice there's little difference between that and updating a single entity, except that you don't have to do the JOIN, so your reads are simpler and usually faster.
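As a rough illustration of the "copy the Person into both" approach, here is a minimal Python sketch. It uses plain dictionaries as stand-ins for two MongoDB collections, so none of it is real driver code; the collection names, field names, and the `embed_person` / `update_person` helpers are all made up for the example.

```python
import copy

# Hypothetical in-memory stand-ins for two MongoDB collections.
patients = {}
professionals = {}


def embed_person(collection, doc_id, person, extra):
    """Store a document that carries its own full copy of the Person."""
    collection[doc_id] = {"_id": doc_id, "person": copy.deepcopy(person), **extra}


def update_person(person_id, changes):
    """Apply the same change to every embedded copy of one Person.

    Returns the number of documents that were touched.
    """
    updated = 0
    for collection in (patients, professionals):
        for doc in collection.values():
            if doc["person"]["person_id"] == person_id:
                doc["person"].update(changes)
                updated += 1
    return updated


maria = {"person_id": 1, "name": "Maria Silva", "phone": "555-0100"}
embed_person(patients, "pat-1", maria, {"blood_type": "O+"})
embed_person(professionals, "pro-1", maria, {"specialty": "Nursing"})

# One logical change, written to both places where the Person is embedded.
changed = update_person(1, {"phone": "555-0199"})
```

Reading either the Patient or the Professional now returns the full, current Person without any lookup in a second collection; the cost is that every write has to visit each copy.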
The most powerful tool you have for fetching data is the Collection's (table's) index over your documents (entities), and any way you can leverage that will be the fastest way to return data. So, counterintuitively, if you need to filter and process parts of a document more often than you need the whole, you are better off breaking it up into entities that share an indexed key. That would mean storing Person, Patient and Professional in the same collection and using two keys. One key is shared by the Person and its derived class (Patient), and the other is a type discriminator that selects one part or the other. In other words, use the index to find whole entities, or collections of whole entities.
Aside from that, once you have used the index to locate an entity (Person, Patient or Professional), read the whole entity and have it contain everything you need to fulfill the request without a JOIN. So whether you request the Patient or the Person (both refer to the same Person), you get the same Person data whichever object you read.
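The shared-key-plus-discriminator layout can be sketched like this in Python. The list and the dictionary below only imitate a collection and a compound index; the field names `person_key` and `type` and the sample data are assumptions for illustration, not anything MongoDB prescribes.

```python
from collections import defaultdict

# Hypothetical single collection holding the Person, Patient and Professional parts.
documents = [
    {"person_key": 1, "type": "Person", "name": "Maria Silva"},
    {"person_key": 1, "type": "Patient", "blood_type": "O+"},
    {"person_key": 1, "type": "Professional", "specialty": "Nursing"},
    {"person_key": 2, "type": "Person", "name": "Joao Souza"},
    {"person_key": 2, "type": "Patient", "blood_type": "A-"},
]

# Stand-in for a compound index on (person_key, type).
index = defaultdict(list)
for doc in documents:
    index[(doc["person_key"], doc["type"])].append(doc)


def find_part(person_key, doc_type):
    """Fetch exactly one part of a person via the (shared key, discriminator) pair."""
    hits = index.get((person_key, doc_type), [])
    return hits[0] if hits else None


def find_all_parts(person_key):
    """Fetch every part that shares the same person key."""
    return [d for d in documents if d["person_key"] == person_key]


professional_part = find_part(1, "Professional")
everything_about_maria = find_all_parts(1)
```

The point of the split is that a request which only needs the Professional part never has to load or filter the rest of the person's data.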
In short, you'll be replicating data in Mongo just about anywhere you'd have used a Join in SQL.
Are you able to draw what your class hierarchy looks like?

Related

Best Way to Store Contextual Attributes in Core Data?

I am using Core Data to store objects. What is the most efficient possibility for me (i.e. best execution efficiency, least code required, greatest simplicity and greatest compatibility with existing functions/libraries/frameworks) to store different attribute values for each object depending on the context, knowing that the contexts cannot be pre-defined, will be legion and constantly edited by the user?
Example:
An Object is a Person (Potentially =Employer / =Employee)
Each person works for several other persons and has different titles in relation to their work relationships, and their title may change from one year to another (in case this detail matters: each person may also concomitantly employ one or several other persons, which is why a person is an employee but potentially also an employer)
So one attribute of my object would be “Title vs Employer vs Year Ended”
The best I could do with my current knowledge is save all three elements together as a string which would be an attribute value assigned to each object, and constantly parse that string to be able to use it, but this has the following (HUGE) disadvantages:
(1) Unduly Slowed Execution & Increased Energy Use. Using this contextual attribute is at the very core of my prospective App's core function (so it would literally be used 10-100 times every minute). Having to constantly parse this information to be able to use it adds undue processing that I’d very much like to avoid
(2) Undue Coding Overhead. Saving this contextual attribute as a string will unduly make additional coding for me necessary each time I’ll use this central information (i.e. very often).
(3) Undue Complexity & Potential Incompatibility. It will also add undue complexity and by departing from the expected practice it will escape the advantages of Core Data.
What would be the most efficient way to achieve my intended purpose without the aforementioned disadvantages?

Taking your example, one option is to create an Employment entity, with attributes for the title and yearEnded and two (to-one) relationships to Person. One relationship represents the employer and the other represents the employee.
The inverse relationships are in both cases to-many. One represents the employments where the Person is the employee (so you might name it employmentsTaken) and the other relationship represents the employments where the Person is the Employer (so you might name it employmentsGiven).
Generalising, this is the solution recommended by Apple for many-many relationships which have attributes (see "Modelling a relationship based on its semantics" in their documentation).
Whether that will address all of the concerns listed in your question, I leave to your experimentation: if things are changing 10-100 times a minute, the overhead of fetch requests and creating/updating/deleting the intermediate (Employment) entity might be worse than your string representation.
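To make the shape of the Employment entity concrete, here is a language-neutral sketch in Python rather than actual Core Data code. The class and attribute names follow the ones suggested above; the `title_for` helper and the sample people are hypothetical. In Core Data the inverse relationships are kept in sync for you once they are declared in the model, whereas this sketch maintains them by hand.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Person:
    name: str
    # Inverse, to-many sides of the two relationships.
    employments_taken: List["Employment"] = field(default_factory=list)
    employments_given: List["Employment"] = field(default_factory=list)


@dataclass(eq=False)
class Employment:
    title: str
    year_ended: Optional[int]
    employer: Person  # to-one
    employee: Person  # to-one

    def __post_init__(self):
        # Keep both inverse relationships in sync (done by hand in this sketch).
        self.employer.employments_given.append(self)
        self.employee.employments_taken.append(self)


def title_for(employee, employer, year_ended):
    """Look up the title an employee held with a given employer and end year."""
    for employment in employee.employments_taken:
        if employment.employer is employer and employment.year_ended == year_ended:
            return employment.title
    return None


alice, bob, carol = Person("Alice"), Person("Bob"), Person("Carol")
Employment("Analyst", 2019, employer=bob, employee=alice)
Employment("Manager", 2021, employer=bob, employee=alice)
Employment("Advisor", None, employer=alice, employee=carol)
```

Each title, employer and year is a typed attribute or relationship, so nothing has to be parsed out of a string, and Alice can be an employee and an employer at the same time.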

Domain service with too many repositories

I have 4 related entities:
District (id, name, municipality, zip_code)
Municipality (id, name, city)
City (id, name, province)
Province (id, name)
I just made a domain service to get all the data related to a zip code. I need to find the Districts, Municipality, City and Province related to it, so I'm injecting those 4 repos into my service. I read data from every repository and format it to (id, name), because that is all the data I need from them.
I think this violates SRP, but I can't find a better way to do it. I have already read Refactor to Facade Service, but I don't think it applies to my problem.
My questions are:
1. Should I put all those entities into an aggregation?
2. Where should the data formatting be done? In the service, in the repo, or in another class called from the service?
3. Any other better solution?
Thanks in advance

As you discovered, one repository per domain entity does not scale well. It's basically ignoring the relations between the entities.
In DDD there is the concept of an aggregate root (AR) object, which is basically a master node object with associated child objects. Different domain contexts will have different ARs. Functionality is usually designed around ARs as opposed to individual entities.
So think in terms of having a repository support what is needed for a given AR. That means being able to do one zip code query and return an AR consisting of a zip code root with the attached districts, cities, etc.
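One way to picture a single repository that returns the whole zip code aggregate is the following Python sketch. The in-memory tables, the `ZipCodeArea` class and the `ZipCodeAreaRepository` name are all invented for the example, and it assumes that every district sharing a zip code belongs to the same municipality, which the question implies but does not state.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical in-memory tables mirroring the four entities in the question.
PROVINCES = {1: {"name": "Province A"}}
CITIES = {10: {"name": "City A", "province": 1}}
MUNICIPALITIES = {100: {"name": "Municipality A", "city": 10}}
DISTRICTS = {
    1000: {"name": "District A", "municipality": 100, "zip_code": "12345"},
    1001: {"name": "District B", "municipality": 100, "zip_code": "12345"},
    1002: {"name": "District C", "municipality": 100, "zip_code": "99999"},
}

IdName = Tuple[int, str]


@dataclass(frozen=True)
class ZipCodeArea:
    """Aggregate: a zip code root with everything attached to it as (id, name)."""
    zip_code: str
    districts: List[IdName]
    municipality: IdName
    city: IdName
    province: IdName


class ZipCodeAreaRepository:
    """One repository that knows the relations and returns the whole aggregate."""

    def find_by_zip_code(self, zip_code: str) -> Optional[ZipCodeArea]:
        matches = [(i, d) for i, d in DISTRICTS.items() if d["zip_code"] == zip_code]
        if not matches:
            return None
        # Assumption: all districts with this zip code share one municipality.
        municipality_id = matches[0][1]["municipality"]
        municipality = MUNICIPALITIES[municipality_id]
        city_id = municipality["city"]
        city = CITIES[city_id]
        province_id = city["province"]
        province = PROVINCES[province_id]
        return ZipCodeArea(
            zip_code=zip_code,
            districts=[(i, d["name"]) for i, d in matches],
            municipality=(municipality_id, municipality["name"]),
            city=(city_id, city["name"]),
            province=(province_id, province["name"]),
        )


area = ZipCodeAreaRepository().find_by_zip_code("12345")
```

With this shape the domain service depends on one repository, and the (id, name) formatting lives where the aggregate is assembled instead of being repeated in the service.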
To implement you will probably need a master object containing all the individual entity database mappings as well as their relations. Again, it's the relations that are important. Each repository will have access to the complete mapping information.
You did not mention a language, but in PHP here is an example of an object-relational mapper that follows these concepts: http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/

Is this pattern suitable for Core Data?

The only database I've worked with before is MySQL, so the database design of CoreData is confusing me a little bit.
Briefly, the design consists of a many-to-many relationship between people and businesses. Many people can own one business. One person can own many businesses.
In this simplified design, there are 3 tables:
PERSON    BUSINESS    OWNED BUSINESS
------    --------    --------------
id        id          personID
name      name        businessID
email     website     acquisitionDate
The OwnedBusiness table is the one that's confusing me. In MySQL, this table is used to support many-to-many relationships. I understand that CoreData doesn't require this, however I have an extra field in OwnedBusiness: acquisitionDate.
Does the extra field, acquisitionDate warrant the use of the extra entity/table? If not, where would that field go?

First, Core Data is not a database, full stop.
Core Data is an object graph management framework, your model in your application.
It can persist to disk in a database. It can also persist as binary, XML and just about anything else. It does not even need to persist.
Think about Core Data as an object graph only. In your example you would have a Person entity, a Business entity and an OwnedBusiness entity.
The OwnedBusiness entity would have two relationships and one property. You would not manage the foreign keys because Core Data handles that if you end up persisting to a database. Otherwise they are object pointers.
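As a plain-object sketch of that graph (Python here, not Core Data code), the OwnedBusiness entity holds direct references to a Person and a Business plus the one property, with no personID or businessID fields. The `owners_of` helper, the inverse list names and the sample data are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass(eq=False)
class Person:
    name: str
    email: str
    owned_businesses: List["OwnedBusiness"] = field(default_factory=list)


@dataclass(eq=False)
class Business:
    name: str
    website: str
    ownerships: List["OwnedBusiness"] = field(default_factory=list)


@dataclass(eq=False)
class OwnedBusiness:
    person: Person          # relationship (an object reference, not an id)
    business: Business      # relationship (an object reference, not an id)
    acquisition_date: date  # the one property

    def __post_init__(self):
        # Maintain the inverse relationships by hand in this sketch.
        self.person.owned_businesses.append(self)
        self.business.ownerships.append(self)


def owners_of(business):
    """Names of everyone who owns the given business."""
    return [ownership.person.name for ownership in business.ownerships]


ann = Person("Ann", "ann@example.com")
ben = Person("Ben", "ben@example.com")
cafe = Business("Cafe", "cafe.example.com")
OwnedBusiness(ann, cafe, date(2010, 5, 1))
OwnedBusiness(ben, cafe, date(2012, 3, 15))
```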

So first of all, just to clear this up, CoreData is not a relational db.
Second, I think you should have a quick look at the CoreData documentation. Since you are familiar with MySQL it will be easy reading, and I think you will be rather impressed by the extra features that CoreData provides.
Regarding the many-to-many relationship, CoreData supports this relationship without the need for extra tables. Also, relationships are not based on ids; they are based directly on objects.
So in your case, you don't have to use the person id and business id to create the relationship. You can create the relationship in the Relationship section of your xcdatamodel, where you can set the relationship class (or Destination), an inverse to that relationship (a useful thing) and of course the type of relationship (to-many, to-one).
So to answer your question, you can add it there depending on your business logic. As a short piece of advice, please don't try to normalise the database as you would on a normal MySQL instance; you will lose a lot of performance by normalising, which is something devs often overlook.

Graph Database Data Model of One Type of Object

Say I'm a mechanic who's worked on many different cars and would like to keep a database of the cars I've worked on. These cars have different manufacturers, models, and some customers have modified versions of these cars with different parts so it's not guaranteed the same model gives you the same car. In addition, I would like to see all these different cars and their similarities/differences easily. Basically the database needs to both represent the logical similarities/differences between all cars that I encounter while still giving me the ability to push/pull each instance of a car I've encountered.
Is this more set up for a relational or graph database?
If a graph database, how would you go about designing it? Each of the relationship labels would just be a 'has_a' or 'is_a_type_of'. Would you have the logical structure amongst all the cars and for each individual car have them point to the leaf nodes? Or would you have each relationship represent each specific car and have those relationships span the logical tree structure of the cars?

Alright, so a "graphy" way to go about this would be to create a node type for each kind of domain object. You have a Car identified by a VIN, and it can be linked to a Make, Model, and Year. You also have Mechanic nodes that [:works_on] various Car nodes. Don't store make/model/year with the Car, but rather link via relationships, e.g.:
CREATE (c:Car { VIN: "ABC"})-[:make]->(m:Make {label:"Toyota"});
...and so on.
"Each of the relationship labels would just be a 'has_a' or 'is_a_type_of'."
Probably not. I'd create different relationship types unique to pairings of node types. So Mechanic -> Car would be [:works_on], Car -> Model would be [:model], and so on. I don't recommend using the same relationship type like has_a everywhere, because from a modeling perspective it's harder to sort out the valid domains and ranges of those relationships (e.g. you'll end up in a situation where has_a can go from just about anything to just about anything, and picking out which has_a relationships you want will be hard).
"Or would you have each relationship represent each specific car and have those relationships span the logical tree structure of the cars?"
Each car is its own node, identified by something like a VIN, not by a make/model/year. (Splitting out make/model/year later will allow you to very easily query for all Volvos, etc).
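To show why distinct relationship types and split-out make/model nodes make lookups easy, here is a toy typed-edge graph in Python. It is only an in-memory illustration and not how a graph database stores anything; the `Graph` class, the node naming scheme and the sample cars are all made up for the sketch.

```python
from collections import defaultdict


class Graph:
    """Tiny directed graph whose edges carry a relationship type."""

    def __init__(self):
        # (source node, relationship type) -> list of target nodes
        self._out = defaultdict(list)
        # (relationship type, target node) -> list of source nodes
        self._in = defaultdict(list)

    def relate(self, source, rel_type, target):
        self._out[(source, rel_type)].append(target)
        self._in[(rel_type, target)].append(source)

    def targets(self, source, rel_type):
        return list(self._out.get((source, rel_type), []))

    def sources(self, rel_type, target):
        return list(self._in.get((rel_type, target), []))


g = Graph()
g.relate("car:VIN-ABC", "make", "make:Toyota")
g.relate("car:VIN-ABC", "model", "model:Corolla")
g.relate("car:VIN-DEF", "make", "make:Volvo")
g.relate("car:VIN-GHI", "make", "make:Volvo")
g.relate("mechanic:Sam", "works_on", "car:VIN-DEF")

# "All Volvos" is one lookup on the make relationship.
volvos = g.sources("make", "make:Volvo")
```

Because each relationship type only ever connects one pairing of node kinds, asking for "make" edges can never accidentally return a mechanic or a model, which is the problem a catch-all has_a type would create.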
Your last question (and the toughest one):
"Is this more set up for a relational or graph database?"
This question attracts opinionated answers, so let me put it to you this way: any data under the sun can be modeled both relationally and as a graph. So I could answer both yes, relational, and yes, graph. Your data and your domain don't determine whether you should use an RDBMS or a graph database. Your queries and access patterns do. If you know how you need to use your data, which queries you'll run, and what you're trying to do, then with that information in hand, you can do your own analysis and determine which one is better. Both have strengths and weaknesses and many points of tradeoff. Without knowing how you'll access the data, it's impossible to answer this question in a really fair way.

Domain Driven Design: When to make an Aggregate Root?

I'm attempting to implement DDD for the first time with an ASP.NET MVC project and I'm struggling with a few things.
I have 2 related entities, a Company and a Supplier. My initial thought was that Company was an aggregate root and that Supplier was a value object for Company. So I have a Repository for company and none for Supplier.
But as I started to build out my app, I ended up needing separate list, create, and update forms for the Supplier. The list was easy: I could call Company.Suppliers. Create wasn't horrible: I could do Company.Suppliers.Add(supplier). But update is giving me a headache. Since I need just one entity and I can't exactly keep it in memory between forms, I ended up having to refetch the company and all of its suppliers and find the one I needed in order to bind to it, and then do the same again to modify it and persist it back to the db.
I really just needed to do a GetOne if I had a repository for Supplier. I could add some work arounds by adding a GetOneSupplier to my Company or CompanyRepository, but that seems junky.
So, I'm really wondering whether Supplier is actually a Value Object, or a full domain entity itself.
tl;dr
Is needing separate list/create/update views/pages a sign that an entity should be its own root?

Based on your terminology I assume you are performing DDD based on Eric Evans' book. It sounds like you have already identified a problem with your initial go at modeling and you are right on.
You mention you thought of supplier as a Value Object... I suggest it is not. A Value Object is something primarily identified by its properties. For example, the date "September 30th, 2009" is a value object. Why? Because all date instances with a different month/day/year combo are different dates. All date instances with the same month/day/year combo are considered identical. We would never argue over swapping my "September 30th, 2009" for yours because they are the same :-)
An Entity on the other hand is primarily identified by its "ID". For example, bank accounts have IDs - they all have account numbers. If there are two accounts at a bank, each with $500, if their account numbers are different, so are they. Their properties (in this example, their balance) do not identify them or imply equality. I bet we would argue over swapping bank accounts even if their balances were the same :-)
So, in your example, I would consider a supplier an Entity, as I would presume each supplier is primarily identified by its ID rather than its properties. My own company shares its name with two others in the world - yet we are not all interchangeable.
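The difference in how the two kinds of object are compared can be sketched in a few lines of Python. The class names and sample values are only illustrative: the date compares by all of its properties, while the bank account compares by its account number and ignores its balance.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CalendarDate:
    """Value Object: identified by its properties, so equality uses all of them."""
    year: int
    month: int
    day: int


@dataclass
class BankAccount:
    """Entity: identified by its ID, so the balance is excluded from equality."""
    account_number: str
    balance: float = field(compare=False)


mine = CalendarDate(2009, 9, 30)
yours = CalendarDate(2009, 9, 30)

account_a = BankAccount("ACC-001", 500.0)
account_b = BankAccount("ACC-002", 500.0)
account_a_later = BankAccount("ACC-001", 750.0)

same_date = mine == yours                    # True: same properties
same_account = account_a == account_b        # False: same balance, different ID
still_account_a = account_a == account_a_later  # True: same ID, different balance
```

A Supplier would follow the BankAccount pattern: two suppliers with the same name are still different suppliers.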
I think your suggestion that if you need views for CRUDing an object then it is an Entity probably holds true as a rule of thumb, but you should focus more on what makes one object different from others: properties or ID.
Now as far as Aggregate Roots go, you want to focus on the lifecycle and access control of the objects. Consider that I have a blog with many posts each with many comments - where is/are the Aggregate Root(s)? Let's start with comments. Does it make sense to have a comment without a post? Would you create a comment, then go find a post and attach it to it? If you delete a post, would you keep its comments around? I suggest a post is an Aggregate Root with one "leaf" - comments. Now consider the blog itself - its relationship with its posts is similar to that between posts and comments. It too in my opinion is an Aggregate Root with one "leaf" - posts.
So in your example, is there a strong relationship between company and supplier whereby if you delete a company (I know... you probably only have one instance of company) you would also delete its suppliers? If you delete "Starbucks" (a coffee company in the US), do all its coffee bean suppliers cease to exist? This all depends on your domain and application, but I suggest that more than likely neither of your Entities is an Aggregate Root, or perhaps a better way to think about them is that they are Aggregate Roots each with no "leaves" (nothing to aggregate). In other words, company does not control access to or control the lifecycle of suppliers. It simply has a one-to-many relationship with suppliers (or perhaps many-to-many).
This brings us to Repositories. A Repository is for storing and retrieving Aggregate Roots. You have two (technically they are not aggregating anything, but it's easier than saying "repositories store aggregate roots or entities that are not leaves in an aggregate"), therefore you need two Repositories. One for company and one for suppliers.
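A minimal sketch of the two-repository arrangement, written in Python with in-memory storage rather than the asker's ASP.NET stack. The `InMemoryRepository` base class, its method names and the sample suppliers are hypothetical; the point is only that a single Supplier can be fetched and updated without loading the Company.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Supplier:
    id: int
    name: str


@dataclass
class Company:
    id: int
    name: str
    # The company refers to suppliers by ID; it does not own their lifecycle.
    supplier_ids: List[int] = field(default_factory=list)


class InMemoryRepository:
    """Hypothetical storage shared by both repositories in this sketch."""

    def __init__(self):
        self._items = {}

    def save(self, item):
        self._items[item.id] = item

    def get_one(self, item_id):
        return self._items.get(item_id)

    def list_all(self):
        return list(self._items.values())


class CompanyRepository(InMemoryRepository):
    pass


class SupplierRepository(InMemoryRepository):
    pass


companies = CompanyRepository()
suppliers = SupplierRepository()
suppliers.save(Supplier(1, "Bean Co"))
suppliers.save(Supplier(2, "Cup Co"))
companies.save(Company(1, "Starbucks", [1, 2]))

# The update form only needs the one supplier.
supplier = suppliers.get_one(2)
supplier.name = "Cup Company"
suppliers.save(supplier)
```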
I hope this helps. Perhaps Eric Evans lurks around here and will tell me where I deviated from his paradigm.

Sounds like a no-brainer to me - Supplier should have its own repository. If there is any logical possibility that an entity could exist independently in the model then it should be a root entity, otherwise you'll just end up refactoring later on anyway, which is redundant work.
Root entities are always more flexible than value objects, despite the extra implementation work up front. I find that value objects in a model become rarer over time as the model evolves, and entities that remain value objects were usually the ones that you could logically constrain that way from day one.
If companies share suppliers then having supplier as a root entity removes data redundancy as well, as you do not duplicate the supplier definition per company but share the reference instead, and the association between Company and Supplier can be bi-directional as well, which may yield more benefits.
