Neo4j Choice Values Design Decision

TL;DR: For a generic ETL tool, should I model single- and multi-select choice fields as nodes connected by relationships, or as labels applied directly to the entry node?
Detail:
I'm designing an ETL tool/workflow that pulls from an API source to create graphs on the fly. The API generally returns data structured as individual entries that belong to "lists" such as People, Company, City, etc., with individual attributes/properties also present on the entries.
I know that I want to create a node label for each 'List', BUT several list types have single- and multi-select choice fields on an individual entry (for example, company type on the Company list: options might include institution, bank, school, investor, real estate company, etc., and a company can have one or many types).
Ideally I want a generic rule that handles all choice fields across all lists when writing to the Neo4j graph. (Another choice field is "Tier", for example, with Tier 1, Tier 2, Tier 3 and Tier 4 as possible values.)
Would it be best practice to:
create a node label for each choice field category (i.e. a node label called "CompanyType" with X instances of that node, one per company type, each with a "name" property equal to the company type), and then link the Company node to its one or many CompanyType nodes via a "company_type" edge
or
create a node label for each choice field option and apply that label to the Company node in question (so a Company node might then have 5-6 labels, such as Company:Tier1:Tier2:Investor:Bank)?
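To make the two options concrete, here is a minimal sketch of a generic rule that emits parameterized Cypher for either one. The function names, the `id` match property and the label-sanitizing rule are assumptions for illustration, not part of any existing tool. Values can be passed as query parameters, but in most Neo4j versions labels and relationship types cannot, so they are sanitized and backtick-quoted before being spliced into the query text.

```python
import re


def safe_token(value: str) -> str:
    """CamelCase a choice value for use as a label, e.g. 'real estate company' -> 'RealEstateCompany'."""
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", value) if p]
    if not parts:
        raise ValueError(f"cannot build a label from {value!r}")
    return "".join(p[:1].upper() + p[1:] for p in parts)


def choice_nodes_query(list_name, entry_id, field, values):
    """Option 1: one shared node per choice value, linked from the entry by a relationship."""
    entry_label = safe_token(list_name)    # e.g. 'Company'
    choice_label = safe_token(field)       # e.g. 'company_type' -> 'CompanyType'
    rel_type = re.sub(r"\W+", "_", field)  # e.g. 'company_type'
    query = (
        f"MATCH (e:`{entry_label}` {{id: $id}})\n"
        "UNWIND $values AS v\n"
        f"MERGE (c:`{choice_label}` {{name: v}})\n"
        f"MERGE (e)-[:`{rel_type}`]->(c)"
    )
    return query, {"id": entry_id, "values": list(values)}


def choice_labels_query(list_name, entry_id, values):
    """Option 2: add one label per selected value directly to the entry node."""
    labels = "".join(f":`{safe_token(v)}`" for v in values)
    if not labels:
        return None, {}  # nothing selected, nothing to SET
    query = f"MATCH (e:`{safe_token(list_name)}` {{id: $id}})\nSET e{labels}"
    return query, {"id": entry_id}


q1, p1 = choice_nodes_query("Company", "c-42", "company_type", ["bank", "investor"])
q2, p2 = choice_labels_query("Company", "c-42", ["Tier 1", "real estate company"])
print(q1)
print(q2)
```

The sketch makes one trade-off visible: option 1 keeps a fixed set of labels and records which field a value came from in the relationship type, while option 2 creates a new label for every distinct value, so a value that appears in two different fields ends up as the same label. Changing a single-select value under option 1 would also require deleting the old relationship, which is left out here.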

Related

ER Model representing entities not stored in DB and user choice

I'm trying to create an ER diagram of a simple retail-chain database model. You have your customers, the various stores, inventory, etc.
My first question is how to represent a customer placing an order in a store. If the customer is a discount card holder, the company has their name, address, etc., so I can have a cardHolder entity connected to item and store with an order relationship. But how do I represent an order placed by a customer who is not really an entity in the database?
Secondly, how are conditional choices represented in ER diagrams? For example, in a car dealership, a customer may choose one or more optional extras when buying a car. I would think there is a Car entity with the relevant attributes and the options as a multi-valued attribute, but how do you represent a user picking those options (i.e. the order table shows the car ordered, the extras chosen and the added cost of the extras) in the order relationship?
First, do you really need to model customers as distinct entities, or do you just need order, payment and delivery details? Many retail systems don't track individual customers. If you need to, you can have a customer table with a surrogate key and unique constraints on identifying attributes like SSN or discount card number (even if those attributes are optional). It's generally hard to prevent duplication in customer tables since there's no ideal natural key for people, so consider whether this is really required.
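A minimal sketch of such a customer table, using Python's built-in sqlite3 module. The column names, the nullable `customer_id` on orders for walk-in customers, and the table name `orders` (ORDER is a reserved word) are assumptions for illustration. It shows that a unique identifying attribute can stay optional: in SQLite, NULLs do not collide under a UNIQUE constraint (behaviour differs in some other databases), so many customers can lack a card while a duplicate card number is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id      INTEGER PRIMARY KEY,  -- surrogate key
    name             TEXT NOT NULL,
    discount_card_no TEXT UNIQUE           -- optional, but unique when present
);
CREATE TABLE orders (                      -- "order" is a reserved word
    order_id    INTEGER PRIMARY KEY,
    date_time   TEXT NOT NULL,
    customer_id INTEGER REFERENCES customer(customer_id)  -- NULL = anonymous walk-in
);
""")

conn.execute("INSERT INTO customer (name, discount_card_no) VALUES ('Ann', 'DC-001')")
conn.execute("INSERT INTO customer (name) VALUES ('Ben')")  # no card
conn.execute("INSERT INTO customer (name) VALUES ('Cat')")  # a second NULL card is allowed

try:
    conn.execute("INSERT INTO customer (name, discount_card_no) VALUES ('Ann again', 'DC-001')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# An order from a customer the shop knows nothing about
conn.execute("INSERT INTO orders (date_time, customer_id) VALUES ('2024-01-01 10:00', NULL)")
print(duplicate_rejected)  # True
```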
How to model optional extras depends on what they depend on. Some extras might be make- or model-specific, e.g. the choice of certain colors or manual/automatic transmission. Extended warranties might be available across the board.
Here's an example of car-specific optional extras:
car (car_id PK, make, model, color, vin, price, ...)
car_extras (extra_id PK, car_id FK, option_name, price)
order (order_id PK, date_time, car_id FK, customer_id FK, payment_id FK, discount)
order_extras (order_id PK/FK, car_id FK, extra_id PK/FK)
I excluded price totals since those can be calculated via aggregate queries.
In my example, order_extras.car_id is redundant, but it supports better integrity through composite FK constraints: (order_id, car_id) references the corresponding columns in order, and (car_id, extra_id) references the corresponding columns in car_extras, which prevents invalid extras from being linked to an order.
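A sketch of those tables in SQLite (via Python's sqlite3), trimmed to the columns needed to show the two composite foreign keys and the aggregate price total. The table name `orders`, the `UNIQUE` constraints and the sample rows are assumptions: a composite FK must reference columns covered by a primary key or unique constraint, so `car_extras` and `orders` declare one on the referenced pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE car (
    car_id INTEGER PRIMARY KEY,
    make   TEXT,
    model  TEXT,
    price  INTEGER NOT NULL
);
CREATE TABLE car_extras (
    extra_id    INTEGER PRIMARY KEY,
    car_id      INTEGER NOT NULL REFERENCES car(car_id),
    option_name TEXT NOT NULL,
    price       INTEGER NOT NULL,
    UNIQUE (car_id, extra_id)            -- parent key for the composite FK below
);
CREATE TABLE orders (                    -- "order" is a reserved word
    order_id INTEGER PRIMARY KEY,
    car_id   INTEGER NOT NULL REFERENCES car(car_id),
    discount INTEGER NOT NULL DEFAULT 0,
    UNIQUE (order_id, car_id)            -- parent key for the composite FK below
);
CREATE TABLE order_extras (
    order_id INTEGER NOT NULL,
    car_id   INTEGER NOT NULL,           -- redundant, but lets both composite FKs exist
    extra_id INTEGER NOT NULL,
    PRIMARY KEY (order_id, extra_id),
    FOREIGN KEY (order_id, car_id) REFERENCES orders (order_id, car_id),
    FOREIGN KEY (car_id, extra_id) REFERENCES car_extras (car_id, extra_id)
);
""")
conn.executemany("INSERT INTO car VALUES (?, ?, ?, ?)",
                 [(1, "Volvo", "V60", 40000), (2, "Toyota", "Corolla", 20000)])
conn.executemany("INSERT INTO car_extras VALUES (?, ?, ?, ?)",
                 [(10, 1, "Tow bar", 800), (11, 1, "Heated seats", 500), (20, 2, "Sunroof", 1000)])
conn.execute("INSERT INTO orders VALUES (100, 1, 300)")
conn.executemany("INSERT INTO order_extras VALUES (?, ?, ?)", [(100, 1, 10), (100, 1, 11)])

# Extra 20 belongs to car 2, so linking it to an order for car 1 violates the composite FK
try:
    conn.execute("INSERT INTO order_extras VALUES (100, 1, 20)")
    invalid_extra_rejected = False
except sqlite3.IntegrityError:
    invalid_extra_rejected = True

# Totals are derived with an aggregate query instead of being stored
total = conn.execute("""
SELECT c.price + COALESCE(SUM(e.price), 0) - o.discount
FROM orders o
JOIN car c ON c.car_id = o.car_id
LEFT JOIN order_extras oe ON oe.order_id = o.order_id
LEFT JOIN car_extras e ON e.car_id = oe.car_id AND e.extra_id = oe.extra_id
WHERE o.order_id = 100
GROUP BY o.order_id, c.price, o.discount
""").fetchone()[0]
print(invalid_extra_rejected, total)  # True 41000
```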
Here's an ER diagram for the tables above:
First, as you suspected, you can definitely have two kinds of customers: discount card holders, whose details the company holds, and new customers, whose details it doesn't have.
There are three possible ways to achieve what you are trying to do:
1) Have two different order tables in the system (which I personally wouldn't suggest).
2) Have a single Order table in the system and keep customer details only for discount card holders.
3) Have only one order table, and insert a row in the discount card holder table for new/unregistered customers.
Having a single order table standardizes the system and makes many other operations more convenient.
Secondly, to address your concern, you should normalize your schema. This solves the current problem, removes redundancy and keeps the entities lightweight, which helps performance as the system grows.
The chosen extras can be recorded against the customer's order when the bill is generated, using foreign keys. Referencing data through keys gives fast, robust results instead of storing redundant, repeated details in various places.
In general, apply foreign keys wherever you need to refer to data stored elsewhere; this avoids inconsistencies and errors.
Aiming for fourth normal form (4NF) is preferable. Have a look at the following link to get started with normalization:
http://www.w3schools.in/dbms/database-normalization/

How to model country, state and cities using Neo4j

I'm building a registration form for my website (which uses Neo4j) and need to populate the country, state and city fields. These fields are interlinked, i.e. the state field depends on the selected country, and the city field depends on the selected state. I'm trying to figure out the best approach to model this in Neo4j. Do I need to create nodes for each country, state and city, and then create relationships between them? For instance, Detroit - belongs to - Michigan - belongs to - United States. What would be the best approach to handle this in Neo4j? Are there any examples to look at? Would it be efficient to do this in Neo4j, or is it better to use a document-based DB such as MongoDB?
I don't see any reason you can't do what you suggested: create nodes for City, State and Country and wire them up (I'm planning to do exactly the same thing in my upcoming project). This also lets you reuse those nodes in other parts of your graph, potentially allowing interesting queries on common locations that run faster than property comparisons.
If I understand your requirements correctly, you'll have dropdowns, autocomplete fields or similar to drill down through each level (populate a dropdown with countries -> populate the next dropdown with states in the selected country -> populate the last dropdown with cities in the selected state). Just add indexes on the identifier or abbreviation for quick node lookup and you're good; it should work quite fast.
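The drill-down can be sketched as two Cypher queries plus an in-memory Python stand-in that answers the same questions, so the cascade can be tried without a database. The `:Country`/`:State`/`:City` labels, the `BELONGS_TO` relationship (taken from the question's "belongs to" phrasing), the `code` property and the index names are assumptions; the Cypher strings are not executed here, and the index syntax is the Neo4j 4.x/5.x form.

```python
from collections import defaultdict

# Equivalent Cypher: BELONGS_TO points from child to parent, nodes carry a `code` property
CREATE_INDEXES = [
    "CREATE INDEX country_code FOR (c:Country) ON (c.code)",
    "CREATE INDEX state_code FOR (s:State) ON (s.code)",
]
STATES_IN_COUNTRY = (
    "MATCH (s:State)-[:BELONGS_TO]->(:Country {code: $country}) "
    "RETURN s.code, s.name ORDER BY s.name"
)
CITIES_IN_STATE = (
    "MATCH (c:City)-[:BELONGS_TO]->(:State {code: $state})-[:BELONGS_TO]->(:Country {code: $country}) "
    "RETURN c.name ORDER BY c.name"
)


class LocationTree:
    """In-memory stand-in for the same graph, answering the two dropdown queries."""

    def __init__(self):
        self.states = defaultdict(set)  # country code -> state codes
        self.cities = defaultdict(set)  # (country code, state code) -> city names

    def add_city(self, country, state, city):
        self.states[country].add(state)
        self.cities[(country, state)].add(city)

    def states_in(self, country):
        return sorted(self.states.get(country, ()))

    def cities_in(self, country, state):
        # State codes are only unique within a country, so the lookup needs both
        return sorted(self.cities.get((country, state), ()))


geo = LocationTree()
geo.add_city("US", "MI", "Detroit")
geo.add_city("US", "MI", "Lansing")
geo.add_city("US", "WA", "Seattle")
geo.add_city("AU", "WA", "Perth")
print(geo.states_in("US"))        # ['MI', 'WA']
print(geo.cities_in("AU", "WA"))  # ['Perth']
```

Matching the city query through both the state and the country avoids mixing up states that share an abbreviation (WA is both Washington and Western Australia).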
If you're adding ZIP codes in there, that could be tricky, as you can't really model them in the same way. You'll have one-to-many relationships from both state and city to ZIP, and unless I'm mistaken there are a few interesting ZIPs that span more than a single state and/or city, so in practice the relationship is many-to-many. Other factors that can complicate things include 5- vs 9-digit ZIPs (or more for other countries), and the handling of ZIP equivalents in other countries, which may follow different logic.
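A minimal sketch of treating ZIPs as their own nodes with many-to-many links to (state, city) pairs, rather than as leaves of the country/state/city tree. It assumes US-style codes only and normalizes ZIP+4 to the 5-digit part before linking; other countries' postal codes would need their own rules. The class, function names and all sample codes and places are made up for illustration.

```python
from collections import defaultdict


def zip5(code: str) -> str:
    """Normalize a US ZIP or ZIP+4 ('12345-6789' or '123456789') to its 5-digit part."""
    digits = code.replace("-", "")
    if len(digits) not in (5, 9) or not digits.isdigit():
        raise ValueError(f"not a US ZIP: {code!r}")
    return digits[:5]


class ZipIndex:
    """ZIPs as their own nodes, linked many-to-many to (state, city) pairs."""

    def __init__(self):
        self.places_by_zip = defaultdict(set)  # zip5 -> {(state, city)}
        self.zips_by_place = defaultdict(set)  # (state, city) -> {zip5}

    def link(self, code, state, city):
        z = zip5(code)
        self.places_by_zip[z].add((state, city))
        self.zips_by_place[(state, city)].add(z)

    def states_for(self, code):
        return sorted({state for state, _ in self.places_by_zip.get(zip5(code), ())})


idx = ZipIndex()
idx.link("11111", "AA", "Northtown")
idx.link("11111-2222", "BB", "Southtown")  # the same ZIP spanning two states
idx.link("33333", "AA", "Northtown")       # one city with several ZIPs
print(idx.states_for("11111"))  # ['AA', 'BB']
```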

Is it possible to have conditional OLAP dimension aggregators?

I have a set of OLAP cubes, in the form of a snowflake schema, each representing one factory.
I have three concepts that, for some factories, clearly behave as 3 dimensions and, for other factories, clearly behave as 2 dimensions.
The concepts are always the same: "products", "sales agents" and "customers".
But in some cases I'm unsure whether I should model it as a pure 3-dimensional cube or use some tweak or trick with a 2-dimensional cube.
Cases A and B are clear to me; Case C is the one that raises my doubts.
CASE A: Clearly a 3 dimensional cube
Any agent can sell any product to any company. Several agents are jointly responsible for the same set of customers.
I model this case as this:
CASE B: Clearly a 2 dimensional cube
Every agent is 'responsible' for a portfolio of customers, and he can sell any product, but only to his customers. The analysis is based on 'current responsibility for the portfolio', so if an agent leaves the company, all his customers are reassigned to a new agent and each customer belongs uniquely to the new agent.
I model this case as this:
CASE C: My doubts
A customer may be assigned either a single agent or a set of several agents, each responsible for one ProductCategory.
For example:
Alice manages TablesAndWoods ltd and GreenForest ltd.
Bob manages Chairs ltd and FastWheels ltd.
Carol manages Forniture ltd ONLY for ProductType = 'machinery' and also manages FrozenBottles ltd for ANY type of product.
Dave also manages Forniture ltd but ONLY for ProductType = 'consumables' and also manages HighCeilings ltd for ANY type of product.
QUESTION:
In this example "Case C":
Are customer and agent independent dimensions, because Forniture ltd is related to both Carol and Dave, so it is a 3D cube?
Or is it a 2D cube, where agent is not an independent dimension but an aggregator of customer, somehow "conditioned" by the ProductCategory aggregator of product?
I would like to see how you would model this.
Thanks in advance.
Here is how I would model it:
Your fact table is Sales.
Your dimensions are (probably) Date, Product, Customer and Agent. This is closest to your Case A.
Collapse your snowflake (white entities) into the dimensions. The presence of these entities suggests that you should consider whether type-2 slowly changing dimensions are needed for point-in-time analysis.
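A minimal in-memory sketch of what a type-2 slowly changing dimension buys you, assuming the responsible agent is tracked as an attribute of the customer dimension (as in Case B). Fact rows would store the surrogate `customer_key`, so a sale keeps pointing at the version that was current when it happened. The class and function names are hypothetical; Bob managing Chairs ltd comes from Case C, while the reassignment to Alice is invented sample data.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CustomerVersion:
    customer_key: int  # surrogate key: a new one per version
    customer_id: str   # business key, stable across versions
    agent: str         # the tracked (type-2) attribute
    valid_from: date
    valid_to: date     # exclusive; date.max marks the current version


def reassign(rows, customer_id, new_agent, when):
    """Type-2 change: close the current version and append a new one."""
    out = [
        replace(r, valid_to=when) if r.customer_id == customer_id and r.valid_to == date.max else r
        for r in rows
    ]
    next_key = max((r.customer_key for r in rows), default=0) + 1
    out.append(CustomerVersion(next_key, customer_id, new_agent, when, date.max))
    return out


def version_at(rows, customer_id, when) -> Optional[CustomerVersion]:
    """Point-in-time lookup: the version that was valid on `when`."""
    for r in rows:
        if r.customer_id == customer_id and r.valid_from <= when < r.valid_to:
            return r
    return None


rows = [CustomerVersion(1, "Chairs ltd", "Bob", date(2020, 1, 1), date.max)]
rows = reassign(rows, "Chairs ltd", "Alice", date(2023, 6, 1))
print(version_at(rows, "Chairs ltd", date(2022, 1, 1)).agent)  # Bob
print(version_at(rows, "Chairs ltd", date(2024, 1, 1)).agent)  # Alice
```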
Consider a bridge table to capture the many-to-many relationship between Agent and Product.
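One way to represent Case C with a bridge table is sketched below in SQLite via Python. It assumes the bridge is keyed on customer and product type, with a NULL product type meaning "any product", since that is how the Case C assignments are stated; the table and column names and the sales amounts are hypothetical. Each sale is attributed to exactly one agent as long as a customer never has both a specific and an "any" assignment covering the same product type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agent_bridge (
    customer     TEXT NOT NULL,
    product_type TEXT,           -- NULL = responsible for any product type
    agent        TEXT NOT NULL
);
CREATE TABLE sales (customer TEXT, product_type TEXT, amount INTEGER);
""")
conn.executemany("INSERT INTO agent_bridge VALUES (?, ?, ?)", [
    ("TablesAndWoods ltd", None, "Alice"),
    ("GreenForest ltd", None, "Alice"),
    ("Chairs ltd", None, "Bob"),
    ("FastWheels ltd", None, "Bob"),
    ("Forniture ltd", "machinery", "Carol"),
    ("FrozenBottles ltd", None, "Carol"),
    ("Forniture ltd", "consumables", "Dave"),
    ("HighCeilings ltd", None, "Dave"),
])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("Forniture ltd", "machinery", 100),
    ("Forniture ltd", "consumables", 40),
    ("FrozenBottles ltd", "consumables", 25),
])

# Attribute each sale to the agent whose assignment covers its customer and product type
sales_by_agent = conn.execute("""
SELECT b.agent, SUM(s.amount)
FROM sales s
JOIN agent_bridge b
  ON b.customer = s.customer
 AND (b.product_type IS NULL OR b.product_type = s.product_type)
GROUP BY b.agent
ORDER BY b.agent
""").fetchall()
print(sales_by_agent)  # [('Carol', 125), ('Dave', 40)]
```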

Graph Database Data Model of One Type of Object

Say I'm a mechanic who's worked on many different cars and would like to keep a database of the cars I've worked on. These cars have different manufacturers, models, and some customers have modified versions of these cars with different parts so it's not guaranteed the same model gives you the same car. In addition, I would like to see all these different cars and their similarities/differences easily. Basically the database needs to both represent the logical similarities/differences between all cars that I encounter while still giving me the ability to push/pull each instance of a car I've encountered.
Is this more set up for a relational or graph database?
If a graph database, how would you go about designing it? Each of the relationship labels would just be a 'has_a' or 'is_a_type_of'. Would you have the logical structure amongst all the cars and for each individual car have them point to the leaf nodes? Or would you have each relationship represent each specific car and have those relationships span the logical tree structure of the cars?
Alright, so a "graphy" way to go about this would be to create a node type for each kind of domain object. You have a Car identified by a VIN, which can be linked to a Make, Model and Year. You also have Mechanic nodes that [:works_on] various Car nodes. Don't store make/model/year on the Car; instead, link via relationships, e.g.:
CREATE (c:Car { VIN: "ABC"})-[:make]->(m:Make {label:"Toyota"});
...and so on.
"Each of the relationship labels would just be a 'has_a' or 'is_a_type_of'."
Probably not. I'd create different relationship types unique to each pairing of node types, so Mechanic -> Car would be [:works_on], Car -> Model would be [:model], and so on. I don't recommend using the same relationship type like has_a everywhere, because from a modeling perspective it makes the valid domain and range of each relationship harder to sort out (you'll end up in a situation where has_a can go from just about anything to just about anything, and picking out the has_a relationships you want will be hard).
"Or would you have each relationship represent each specific car and have those relationships span the logical tree structure of the cars?"
Each car is its own node, identified by something like a VIN, not by a make/model/year. (Splitting out make/model/year later will allow you to very easily query for all Volvos, etc).
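To show why distinct relationship types help, here is a small in-memory sketch in Python (not Neo4j) in which each relationship type has a declared domain and range, and "all Volvos" is a simple filter. The schema dictionary, helper names and VINs are hypothetical; the relationship names and the `VIN`/`label` properties follow the answer's Cypher example.

```python
# Allowed (from-label, to-label) pair for each relationship type
SCHEMA = {
    "works_on": ("Mechanic", "Car"),
    "make": ("Car", "Make"),
    "model": ("Car", "Model"),
    "year": ("Car", "Year"),
}

# Cypher equivalent of cars_with("make", ("Make", "Volvo")) (not executed here)
ALL_VOLVOS = 'MATCH (c:Car)-[:make]->(:Make {label: "Volvo"}) RETURN c.VIN'

edges = []  # (source, relationship type, target); each node is a (label, key) tuple


def relate(src, rel, dst):
    """Add an edge, rejecting pairs outside the relationship's domain and range."""
    domain, range_ = SCHEMA[rel]
    if src[0] != domain or dst[0] != range_:
        raise ValueError(f"{rel} must go from {domain} to {range_}")
    edges.append((src, rel, dst))


def cars_with(rel, target):
    """Keys of the Car nodes linked to `target` by a Car-sourced relationship type."""
    return sorted(src[1] for src, r, dst in edges if r == rel and dst == target)


relate(("Mechanic", "me"), "works_on", ("Car", "VIN-001"))
relate(("Car", "VIN-001"), "make", ("Make", "Volvo"))
relate(("Car", "VIN-002"), "make", ("Make", "Toyota"))
relate(("Car", "VIN-003"), "make", ("Make", "Volvo"))

try:
    relate(("Make", "Volvo"), "make", ("Car", "VIN-001"))  # wrong direction
    rejected = False
except ValueError:
    rejected = True

print(cars_with("make", ("Make", "Volvo")))  # ['VIN-001', 'VIN-003']
```

With a single generic has_a type, the check in `relate` would have nothing to validate against, which is the modeling problem described above.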
Your last question (and the toughest one):
Is this more set up for a relational or graph database?
This is an opinion-based question (it attracts opinionated answers), so let me put it this way: any data under the sun can be modeled both relationally and as a graph. So I could answer both "yes, relational" and "yes, graph". Your data and your domain don't decide between RDBMS and graph; your queries and access patterns do. If you know how you need to use your data, which queries you'll run and what you're trying to do, then with that information in hand you can do your own analysis and determine which one is better. Both have strengths and weaknesses and many trade-offs. Without knowing how you'll access the data, it's impossible to answer this question fairly.

RoR: Tagging tags with other tags

I'm trying to prototype a system in Rails. Essentially, it's an abstract relational data model that takes user input to create nodes of information. Each node can have meta-information associated with it, so some nodes may have CreateDate and DueDate while others may have StartDate, DueDate and PersonResponsible. In this way we're simply collecting lots of notes and attaching information that a person would want to remember in relation to each note. Easy.
What I want to do to build on that is to make each node act as a tag which can be applied to any other node, building trees that can be browsed down with every node leading you to other nodes that are relationally its children. That way you can start by showing a list of your top level nodes (those not tagged by any others) and as each item is focused on, present a list of that node's children (all the other nodes which are tagged by it).
So my question is, which rails plugins should I look into to do this?
If I understand correctly, the data model you are describing is a graph.
Unfortunately... I haven't found a plugin that implements graphs with all the characteristics you need (the acts_as_graph plugin cannot do it), so you could try programming the model yourself.
You will need 3 tables and 2 ActiveRecord classes (one table is the join table for the many-to-many relationship).
Classes
1. Node
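# self-referential many-to-many through a join table (join table and key names are illustrative)
has_and_belongs_to_many :children, class_name: "Node", join_table: "node_tags", foreign_key: "parent_id", association_foreign_key: "child_id"
has_and_belongs_to_many :parents, class_name: "Node", join_table: "node_tags", foreign_key: "child_id", association_foreign_key: "parent_id"
has_many :metadata, class_name: "Metadata"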
2. Metadata
belongs_to :node
Since you need dynamic metadata, you can use 2 columns, Name (string) and Data (text), but you'll have to serialize the value when you put it in the Data field (since you need its class information as well as the data in order to use it).
I think this model should be able to hold your data. It's up to you to program the user interface part.
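As a language-neutral illustration of the same data model (written in Python rather than Ruby), the class below mirrors the three tables: nodes, a join table of tagging edges, and name/data metadata rows. Values are serialized as JSON with a small type tag so a date comes back as a date, which stands in for the class information mentioned above. The class, method names and sample notes are hypothetical.

```python
import json
from collections import defaultdict
from datetime import date


def encode(value):
    """Serialize with a type tag so the value's class survives the round trip."""
    if type(value) is date:
        return json.dumps({"type": "date", "value": value.isoformat()})
    return json.dumps({"type": "json", "value": value})


def decode(data):
    obj = json.loads(data)
    return date.fromisoformat(obj["value"]) if obj["type"] == "date" else obj["value"]


class NoteGraph:
    def __init__(self):
        self.nodes = {}                    # nodes table: id -> title
        self.tags = defaultdict(set)       # join table: tagging node id -> tagged node ids
        self.metadata = defaultdict(dict)  # metadata table: node id -> {name: serialized data}

    def add_node(self, node_id, title):
        self.nodes[node_id] = title

    def tag(self, tag_id, node_id):
        """Use node `tag_id` as a tag on node `node_id`, making it a child of the tag."""
        self.tags[tag_id].add(node_id)

    def set_meta(self, node_id, name, value):
        self.metadata[node_id][name] = encode(value)

    def get_meta(self, node_id, name):
        data = self.metadata.get(node_id, {}).get(name)
        return None if data is None else decode(data)

    def top_level(self):
        """Nodes not tagged by any other node."""
        tagged = set().union(*self.tags.values())
        return sorted(n for n in self.nodes if n not in tagged)

    def children_of(self, node_id):
        return sorted(self.tags.get(node_id, ()))


g = NoteGraph()
g.add_node(1, "Project X")
g.add_node(2, "Write spec")
g.add_node(3, "Alice")
g.tag(1, 2)  # "Write spec" is tagged by "Project X"
g.tag(3, 2)  # ...and by "Alice"
g.set_meta(2, "DueDate", date(2024, 5, 1))
print(g.top_level())     # [1, 3]
print(g.children_of(1))  # [2]
```

Because any node can tag any other, cycles are possible; nodes that sit in a cycle with no untagged ancestor never show up in `top_level()`, so the browsing UI may need another entry point for them.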
