Neo4j Cypher - how to query 'inherited' reations on a child-node? - neo4j

I want to use Cypher to query all specifications that are valif for a product, but with the specifications defined at different parent levels of the product.
I have a data model that represents a product categorisation tree with levels C1, C2, C3, ... and at the lowest level products P. To simplify the maintenance and data-entry of product specifications, the validity of product specifications is defined at the categorisation levels. The products 'inherit' the specifications that are valid for all their parent categories, up to the root of the categorisation tree.
The (simplified) data-model is shown in the image. In this case product specifications are defined for categorisation levels C1, C2 and C3. The product is connected to the lowest categorisation level C3.
My objective is to query all specifications that are valid for product P, based on their relations to the categorisation levels C1, C2 and C3.
I have the following questions:
Is this possible with a single Cypher query?
What is the best query strategy in a large database? Use a query? Create real relations for all valid specifications for a product instead of querying the 'inherited' specifications?
Change the datamodel?
Other tips?
thanks

You could find all specifications of a product by MATCHing patterns of variable length.
Assuming you have a parameter productId, you would use something like this
MATCH (p:PRODUCT {productId:$productiId)-[:BELONGS_TO*]->(c:Category)<-[:VALID_FOR]-(s:Specification)
RETURN s
to retrieve the relevant specifications.
Since you seem to be working on bills of materials, some things you may want to look at:
Management of complex product specifications by splitting it up in "atoms" https://www.slideshare.net/neo4j/graphtour-neo4j-murrelektronik
and
An example of how you can keep track of versions of your BOM :
https://www.youtube.com/watch?v=7iMraBHtTqE
Disclosure : I'm a member of the Graphileon team, and involved in what is shown in the slide deck and video.

Related

Developing graph database model for department/supplier/items

I'm currently ramping up on graph databases and to do that am working through a set of questions to learn Cypher. However, I'm not 100% happy with the design I've chosen since I have to match relationships to nodes to make some of the queries work.
I found Neo4j: Suggestions for ways to model a graph with shared nodes but has a unique path based on some property with some suggestions that are relevant, but they involve copying nodes (repeating them) when in fact they do represent the same thing. That seems like an update issue waiting to happen.
My design currently has
(:Dept {name,floor})-[:SOLD {quantity}]->(:Item {name,type})<-[:SUPPLIES {dept,volume)]-(:Company {name,address})
As you can see, to figure out which department a company supplied an item to, I have to check the :SUPPLIES dept property. This leads to somewhat awkward queries - it feels that way to me, anyway.
I've tried other relationships, like having (:Company)-[:SUPPLIES {item,vol}]->(:Dept) but then the problem just shifts to matching :SUPPLIES relationship properties to :Item nodes.
The types of queries I am building are of the nature: Find departments that sell all of the items they are supplied.
Is there some other way to model this that I am overlooking? Or is this sort of relationship, where a supplier is related to two things, an item and a department, just something that doesn't fit the graph model very well?
You want to store and query a triangular relationship between :Dept, :Item, and :Company. This can't be accomplished by a linear relationship pattern. Comparing IDs of entities is not the Neo4j way, you would neglect the strengths of a graph database.
(Assuming that I understood your use case scenario) I would introduce an additional node of type :SupplyEvent that has relationships to :Dept, :Item, and :Company. You could also split up :SOLD relationship in a similar way, if you want relations between department, item, and, e.g., a customer.
Now, you can query all companies that supplied which items to which departments (without comparing any IDs):
MATCH (company:Company)<-[:SUPPLIED_FROM]-(se:SupplyEvent)-[:SUPPLIED_TO]->(dept:Dept),
(se)-[:SUPPLIED]->(item:Item)
RETURN company, item, dept

Relational Algebra Join - necessary to rename?

Let's assume i have some 2 simple tables:
IMPORTANT: This is about relational algebra, not SQL.
Band table:
band_name founded
Gambo 1975
John. 1342
Album table:
album_name band_name
Celsius. Gambo
Trambo Gambo
Now, since the Band and the Album table share the same column name "band_name", would it be necessary to rename it when i would join them?
As far as i know, the join eliminates the duplicate entry that is shared amongst the join. This example, where i simply pick all Bands that are existing in the Album table (obviously just 'Gambo' in this giving example)
Πfounded, band_name(Band ⋈ Album)
should therefore work fine, right? Can somebody confirm?
(Have to enter a caveat that there are many variants of Relational Algebra; that they differ in semantics; and they differ in syntax. Assuming you intend a variant similar to that in wikipedia ...)
Yes that expression should work fine. The natural join operator ⋈ matches same-named attributes between its two operands. So the subexpression Band ⋈ Album produces a result with attributes {band_name, founded, album_name}. Your expression projects two of those.
Note the attributes for a relation value are a set not a sequence; therefore any operation over relation operands with same-named attributes must match attributes.
In contrast, Cartesian Product × requires its operands to have disjoint attribute names. Then Band × Album is ill-formed and would be rejected. (So you'd need to Rename band_name in one of them, to get relations that could be operands.)
I'm not all that happy with your way of putting it "the join eliminates the duplicate entry that is shared amongst the join." Because only in SQL do you get a duplicate (from SELECT * FROM Band, Album ... -- which results in a table with four columns, of which two are named band_name). SQL FROM list of tables is a botch-up: neither join nor Cartesian Product, but something trying to be both, and succeeding only in being neither. RA's ⋈ never produces a "duplicate" so never does it "eliminate" anything.
Particularly if there's Keys declared and a Foreign Key constraint (from Album's band_name to Band's) I see those as identifying the same band, then the natural operation is to bring together that which has been taken apart, so the name 'Natural Join'.

ER Model representing entities not stored in DB and user choice

I'm trying to create a ER diagram of a simple retail chain type database model. You have your customer, the various stores, inventory etc.
My first question is, how to represent a customer placing an order in a store. If the customer is a discount card holder, the company has their name, address etc, so I can have a cardHolder entity connect to item and store with an order relationship. But how do I represent an order being placed by a customer who is not really an entity in the database?
Secondly, how are conditional... stuff represented in ER diagrams, e.g. in a car dealership, a customer may choose one or more optional extra when buying a car. I would think that there is a Car entity with the relevant attributes and the options as a multi-valued attribute, but how do you represent a user picking those options (I.e. order table shows the car ordered, extras chosen and the added cost of extras) in the order relationship?
First, do you really need to model customers as distinct entities, or do you just need order, payment and delivery details? Many retail systems don't track individual customers. If you need to, you can have a customer table with a surrogate key and unique constraints on identifying attributes like SSN or discount card number (even if those attributes are optional). It's generally hard to prevent duplication in customer tables since there's no ideal natural key for people, so consider whether this is really required.
How to model optional extras depends on what they depends on. Some extras might be make or model-specific, e.g. the choice of certain colors or manual/automatic transmission. Extended warranties might be available across the board.
Here's an example of car-specific optional extras:
car (car_id PK, make, model, color, vin, price, ...)
car_extras (extra_id PK, car_id FK, option_name, price)
order (order_id PK, date_time, car_id FK, customer_id FK, payment_id FK, discount)
order_extras (order_id PK/FK, car_id FK, extra_id PK/FK)
I excluded price totals since those can be calculated via aggregate queries.
In my example, order_extras.car_id is redundant, but supports better integrity via the use of composite FK constraints (i.e. (order_id, car_id) references the corresponding columns in order, and (car_id, extra_id) references the corresponding columns in car_optional_extras to prevent invalid extras from being linked to an order).
Here's an ER diagram for the tables above:
First, as per your thought you can definitely have two kinds of customers. Discount card holders whose details are present with the company and new customers whose details aren't available with the company.
There are three possible ways to achieve what you are trying,
1) Have two different order table in the system(which I personally wouldn't suggest)
2) Have a single Order table in the system and getting the details of those who are a discount card holder.
3) Insert a row in the discount card holder table for new/unregistered customers having only one order table in the system.
Having a single order table would make the system standardized and would be more convenient while performing many other operations.
Secondly, to solve your concern, you need to follow normalization. It will reduce the current problem faced and will also make the system redundant free and will make the entities light weighted which will directly impact on the performance when you grow large.
The extra chosen items can be listed in the order against the customer by adding it at the time of generating a bill using foreign key. Dealing with keys will result in fast and robust results instead of storing redundant/repeating details at various places.
By following normalization, the problem can be handled by applying foreign keys wherever you want to refer data to avoid problems or errors.
Preferably NF 4 would be better. Have a look at the following link for getting started with normalization.
http://www.w3schools.in/dbms/database-normalization/

Select rows with "one of each" in relational algebra

Say I have a Personstable with attributes {name, pet}. How do I select the names of people where they have one of each kind of pet (dog, cat, bird), but a person only has one of each kind of pet if they pet is in the table.
Example: Bob, Dog and Bob, Cat are the only rows in the table. Therefore, Bob has one of each kind of pet. But the moment Lynda, Bird are added, Bob doesn't have one of each type of pet anymore.
I think the first step to this is to π(pet). You get a list of all kinds of pets since relational algebra removes duplicates. Not sure what to do after this, but I have think I need to join π(pet) and Persons.
I've tried a few things like Natural Join and Cross products but I haven't arrived at a result yet and I'm out of ideas.
The answer to the question can be found with the Division operator:
Persons ÷ πpet(Persons)
This relational algebra expression returns a relation with only the column name, containing all the names of the persons that have all the different kind of pets currently present in the Persons table itself.
The division is an operator that, in some sense, is the inverse of the product operator (the name is derived exactly from this fact). It is a derived operator that can be defined in terms of projection, set difference and product (see for instance this answer).

Relating an entity to a relationship proper in Neo4j?

I'm attempting to use Neo4j to model the relationship between projects, staff, and project roles. Each project has a role called "project manager" and a role called "director". What I'm trying to accomplish in the data model is the ability to say "for project A, the director is staff X." For my purposes, it's important that "project", "staff", and "role" are all entities (as opposed to properties). Is this possible in Neo4j? In simpler terms, are associative entities possible in Neo4j? In MySQL, this would be represented with a junction table with a unique id column and three foreign key columns, one for project, staff, and role respectively, which would allow me to identify the relationship between those entities as an entity itself. Thoughts?
#wassgren's answer is a solid one, worth considering.
I'll offer one additional option. That is, you can "Reify" that relationship. Reificiation is sort of when you take a relationship and turn it into a node. You're taking an abstract association (relationship between staff and project) and your'e turning it into a concrete entity (a Role) All of the other answer options involve basically two nodes Project and Staff, with variations on relationships between them. These approaches do not reify role, but store it as a property, or a label, of a relationship.
(director:Staff {name: "Joe"})-[:plays]->(r:Role {label:"Director"})-[:member_of]->(p:Project { name: "Project X"});
So...people don't contribute to projects directly, roles do. And people play roles. Which makes an intuitive sense.
The advantages of this approach is that you get to treat the "Role" as a first-class citizen, and assert relationships and properties about it. If you don't split the "Role" out into a separate node, you won't be able to hang relationships off of the node. Further, if you add extra properties to a relationship that is masquerading as a role, you might end up with confusions about when a property applies to the role, and when it applies to the association between a staff member and a project.
Want to know who is on a project? That's just:
MATCH (p:Project {label: "Project X"})<-[:member_of]-(r:Role)<-[:plays]-(s:Staff)
RETURN s;
So I think what I'm suggesting is more flexible for the long term, but it might also be overkill for you.
Consider a hypothetical future requirement: we want to associate roles with a technical level or job category. I.e. the project manager should always be a VP or higher (silly example). If your role is a relationship, you can't do that. If your role is a proper node, you can.
Conceptually, a Neo4j graph is built based on two main types - nodes and relationships. Nodes can be connected with relationships. However, both nodes and relationships can have properties.
To connect the Project and Staff nodes, the following Cypher statement can be used:
CREATE (p:Project {name:"Project X"})-[:IS_DIRECTOR]->(director:Staff {firstName:"Jane"})
This creates two nodes. The project node has a Label of type Project and the staff node has a Label of type Staff. Between these node there is a relationhip of type IS_DIRECTOR which indicates that Jane is the director of the project. Note that a relationship is always directed.
So, to find all directors of all project the following can be used:
MATCH (p:Project)-[:IS_DIRECTOR]->(director:Staff) RETURN director
The other approach is to add properties to a more general relationship type:
create (p:Project {name:"Project X"})<-[:MEMBER_OF {role:"Director"}]-(director:Staff {firstName:"Jane"})
This shows how you can add properties to a relationship. Notice that the direction of the relationship was changed for the second example.
To find all directors using the property based relationship the following can be used:
MATCH (p:Project)<-[:MEMBER_OF {role:"Director"}]-(directors:Staff) RETURN directors
To retrieve all role types (e.g. director) the following can be used:
MATCH
(p:Project)-[r]->(s:Staff)
RETURN
r, // The actual relationship
type(r), // The relationship type e.g. IS_DIRECTOR
r.role // If properties are available they can be accessed like this
And, to get a unique list of role names COLLECT and DISTINCT can be used:
MATCH
(p:Project)-[r]->(s:Staff)
RETURN
COLLECT(DISTINCT type(r)) // Distinct types
Or, for properties on the relationship:
MATCH
(p:Project)-[r]->(s:Staff)
RETURN
COLLECT(DISTINCT r.role) // The "role" property if that is used
The COLLECT returns a list result and the DISTINCT keyword makes sure that there are no duplicates in the list.

Resources