I'm attempting to use Neo4j to model the relationship between projects, staff, and project roles. Each project has a role called "project manager" and a role called "director". What I'm trying to accomplish in the data model is the ability to say "for project A, the director is staff X." For my purposes, it's important that "project", "staff", and "role" are all entities (as opposed to properties). Is this possible in Neo4j? In simpler terms, are associative entities possible in Neo4j? In MySQL, this would be represented with a junction table with a unique id column and three foreign key columns, one for project, staff, and role respectively, which would allow me to identify the relationship between those entities as an entity itself. Thoughts?
@wassgren's answer is a solid one, worth considering.
I'll offer one additional option: you can "reify" that relationship. Reification is when you take a relationship and turn it into a node. You're taking an abstract association (the relationship between staff and project) and turning it into a concrete entity (a Role). All of the other answer options involve basically two nodes, Project and Staff, with variations on the relationships between them. These approaches do not reify the role, but store it as a property, or a type, of a relationship.
CREATE (director:Staff {name: "Joe"})-[:plays]->(r:Role {label: "Director"})-[:member_of]->(p:Project {name: "Project X"});
So people don't contribute to projects directly, roles do. And people play roles. That makes intuitive sense.
The advantage of this approach is that you get to treat the "Role" as a first-class citizen, and assert relationships and properties about it. If you don't split the "Role" out into a separate node, you won't be able to hang relationships off of it. Further, if you add extra properties to a relationship that is masquerading as a role, you might end up confused about when a property applies to the role, and when it applies to the association between a staff member and a project.
Want to know who is on a project? That's just:
MATCH (p:Project {name: "Project X"})<-[:member_of]-(r:Role)<-[:plays]-(s:Staff)
RETURN s;
So I think what I'm suggesting is more flexible for the long term, but it might also be overkill for you.
Consider a hypothetical future requirement: we want to associate roles with a technical level or job category. I.e. the project manager should always be a VP or higher (silly example). If your role is a relationship, you can't do that. If your role is a proper node, you can.
Conceptually, a Neo4j graph is built from two main types: nodes and relationships. Nodes can be connected with relationships, and both nodes and relationships can have properties.
To connect the Project and Staff nodes, the following Cypher statement can be used:
CREATE (p:Project {name:"Project X"})-[:IS_DIRECTOR]->(director:Staff {firstName:"Jane"})
This creates two nodes. The project node has a Label of type Project and the staff node has a Label of type Staff. Between these nodes there is a relationship of type IS_DIRECTOR which indicates that Jane is the director of the project. Note that a relationship is always directed.
So, to find all directors of all projects the following can be used:
MATCH (p:Project)-[:IS_DIRECTOR]->(director:Staff) RETURN director
The other approach is to add properties to a more general relationship type:
CREATE (p:Project {name:"Project X"})<-[:MEMBER_OF {role:"Director"}]-(director:Staff {firstName:"Jane"})
This shows how you can add properties to a relationship. Notice that the direction of the relationship was changed for the second example.
To find all directors using the property based relationship the following can be used:
MATCH (p:Project)<-[:MEMBER_OF {role:"Director"}]-(directors:Staff) RETURN directors
To retrieve all role types (e.g. director) the following can be used:
MATCH
(p:Project)-[r]->(s:Staff)
RETURN
r, // The actual relationship
type(r), // The relationship type e.g. IS_DIRECTOR
r.role // If properties are available they can be accessed like this
And, to get a unique list of role names COLLECT and DISTINCT can be used:
MATCH
(p:Project)-[r]->(s:Staff)
RETURN
COLLECT(DISTINCT type(r)) // Distinct types
Or, for properties on the relationship:
MATCH
(p:Project)-[r]->(s:Staff)
RETURN
COLLECT(DISTINCT r.role) // The "role" property if that is used
The COLLECT returns a list result and the DISTINCT keyword makes sure that there are no duplicates in the list.
I'm currently ramping up on graph databases and to do that am working through a set of questions to learn Cypher. However, I'm not 100% happy with the design I've chosen since I have to match relationships to nodes to make some of the queries work.
I found "Neo4j: Suggestions for ways to model a graph with shared nodes but has a unique path based on some property", which has some relevant suggestions, but they involve copying nodes (repeating them) when in fact the copies represent the same thing. That seems like an update issue waiting to happen.
My design currently has
(:Dept {name,floor})-[:SOLD {quantity}]->(:Item {name,type})<-[:SUPPLIES {dept,volume}]-(:Company {name,address})
As you can see, to figure out which department a company supplied an item to, I have to check the :SUPPLIES dept property. This leads to somewhat awkward queries - it feels that way to me, anyway.
I've tried other relationships, like having (:Company)-[:SUPPLIES {item,vol}]->(:Dept) but then the problem just shifts to matching :SUPPLIES relationship properties to :Item nodes.
The types of queries I am building are of the nature: Find departments that sell all of the items they are supplied.
Is there some other way to model this that I am overlooking? Or is this sort of relationship, where a supplier is related to two things, an item and a department, just something that doesn't fit the graph model very well?
You want to store and query a triangular relationship between :Dept, :Item, and :Company. This can't be accomplished by a linear relationship pattern. Comparing IDs of entities is not the Neo4j way; doing so would neglect the strengths of a graph database.
(Assuming that I understood your use case scenario) I would introduce an additional node of type :SupplyEvent that has relationships to :Dept, :Item, and :Company. You could also split up :SOLD relationship in a similar way, if you want relations between department, item, and, e.g., a customer.
Now you can query which companies supplied which items to which departments (without comparing any IDs):
MATCH (company:Company)<-[:SUPPLIED_FROM]-(se:SupplyEvent)-[:SUPPLIED_TO]->(dept:Dept),
(se)-[:SUPPLIED]->(item:Item)
RETURN company, item, dept
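Once every supply is its own record, the question's target query (departments that sell all of the items they are supplied) reduces to a set-containment check. Here is a minimal sketch of that logic in plain Python, outside Neo4j. The sample data, tuple layouts, and function name are hypothetical and not from the original posts; each supply tuple stands in for one :SupplyEvent node with its three relationships.

```python
from collections import defaultdict

# Hypothetical sample data. Each tuple stands in for one SupplyEvent node
# and its relationships: (company, item, department).
supply_events = [
    ("Acme", "Tent", "Outdoors"),
    ("Acme", "Stove", "Outdoors"),
    ("Globex", "Lamp", "Home"),
]
# Each tuple stands in for one (:Dept)-[:SOLD]->(:Item) relationship.
sold = [("Outdoors", "Tent"), ("Home", "Lamp"), ("Home", "Rug")]


def depts_selling_all_supplied(supply_events, sold):
    """Return departments whose supplied items are all also sold by them."""
    supplied_by_dept = defaultdict(set)
    for _company, item, dept in supply_events:
        supplied_by_dept[dept].add(item)

    sold_by_dept = defaultdict(set)
    for dept, item in sold:
        sold_by_dept[dept].add(item)

    # A department qualifies when its supplied set is a subset of its sold set.
    return sorted(
        dept
        for dept, items in supplied_by_dept.items()
        if items <= sold_by_dept.get(dept, set())
    )


print(depts_selling_all_supplied(supply_events, sold))
```

In this sample, "Outdoors" is supplied a stove it never sells, so only "Home" qualifies. No property on a relationship has to be compared against a node, which is the point of giving the supply its own node.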
I have a network of nodes which represent People which are connected by relationships (Emails).
The Receiver of the Email is m.slug
Based on: ()-[r]-(m)
I wish to split the property (in this case "Sender" / m.slug, i.e. Larry#google.com) and create a new node "Google.com" AS Company (so that I now have a set of Company nodes created from the existing information).
I want to then link Google (the company) to my Person Node (Larry#Google.com).
--
How would you approach this, without creating multiple duplicate Company nodes? (i.e for Sergey#Google.com & Larry#Google.com should be connected to the same Google.com Company Node).
[Image: Visual representation of People and relationships]
[Image: Example syntax of queries and relationship properties]
This is how you can ensure there is a unique Company node for each email address domain name, and associate it (via an AT relationship) with each Person with an email address in that domain. The domain name is lower-cased before storing, to ensure uniqueness, since email addresses frequently come with different casing.
MATCH (n:Person)
MERGE (c:Company {name: TOLOWER(SPLIT(n.slug, '#')[1])})
CREATE (n)-[:AT]->(c);
NOTE: The above query should only be executed ONCE, since the CREATE clause would create the relationship every time, even if it already exists. You can replace CREATE with MERGE if you need to run the query multiple times.
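To see what the MERGE key evaluates to, here is a small Python sketch that mirrors the expression TOLOWER(SPLIT(n.slug, '#')[1]). The function name and the sample slugs are hypothetical; the '#' separator is taken from the query above, so pass a different separator if your data uses one.

```python
def company_name(slug, separator="#"):
    """Return the lower-cased text after the separator, or None if absent.

    Mirrors TOLOWER(SPLIT(slug, '#')[1]) from the query above.
    """
    parts = slug.split(separator)
    if len(parts) < 2:
        return None  # the slug has no domain part
    return parts[1].lower()


# Hypothetical slugs that differ only in the casing of the domain.
slugs = ["Larry#Google.com", "Sergey#google.com"]
companies = {company_name(slug) for slug in slugs}
print(companies)
```

Both slugs collapse to the same key, which is why a single Company node is enough for the two Person nodes.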
Consider Person nodes and Item nodes.
What is the best way to prevent having both 'Purchased' type relationships and 'Bought' type relationships in the graph that have the same meaning, but are simply named differently?
E.g. if we end up with our graph in a state like:
(Alice) -[Bought] -> (Pickles)
(Bob) -[Purchased]-> (Pickles)
and I want to know everyone who has bought a jar of pickles. Clearly someone made a mistake when creating one of these relationships. How do I prevent that class of mistake?
Limit the relationships a user can create to a specific set of names, and don't allow any other relationship names.
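One way to enforce such a limit is in the application layer, before any relationship is written. The following Python sketch assumes a hypothetical allowed set and helper name; nothing here is a Neo4j API, and upper-casing the input is an extra design choice so that "Purchased" and "PURCHASED" cannot diverge either.

```python
# Hypothetical whitelist; extend it with the types your model really needs.
ALLOWED_RELATIONSHIP_TYPES = frozenset({"PURCHASED"})


def checked_relationship_type(rel_type):
    """Return the normalized type, or raise if it is not on the whitelist."""
    normalized = rel_type.strip().upper()
    if normalized not in ALLOWED_RELATIONSHIP_TYPES:
        raise ValueError(
            f"Relationship type {rel_type!r} is not allowed; "
            f"use one of {sorted(ALLOWED_RELATIONSHIP_TYPES)}"
        )
    return normalized


print(checked_relationship_type("Purchased"))
```

Calling checked_relationship_type("Bought") raises a ValueError, so the mistake is caught before it reaches the graph.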
The Entity-Attribute-Value (EAV) model is really powerful, but complex to implement using SQL, so people often look for alternatives to EAV. It seems like the perfect candidate for graph databases. I understand how to build a movie database where you have nodes with the Neo4j label "Movie" with the property "release_date" right on the node. How would you make this more generic, such that movies have the Neo4j label "Entity" following the general EAV model?
I've thought a lot about this, but I'm not confident I have a good solution. I'll take a stab at it anyway. Here's the most basic model:
<node> <relationship> <node>
Attribute --> :VALUE --> Entity
name="Label",type="string" --> value="Movie" --> name="The Matrix"
With this model, you can write code for how to display and edit Attribute.type. For example, maybe all labels have a text field with finite options on the front-end and all dates have a date-picker. You could break Attribute.type out into its own node, Type, if that was preferable (particularly would make sense for handling composite types). In that case, you have the relationship TYPE between Attribute and Type nodes.
This becomes a problem if entities have multiple relationships, as is the case for reviews, or if you want to relate the value to something else, such as the user who assigned the value. Now, I think, the relationship "VALUE" has to be its own node of type "Value" (i.e. has the Neo4j label "Value") with an incoming relationship from both Attribute and User nodes.
The full form has Type nodes, Attribute nodes, User nodes, Value nodes, and Entity nodes, where the relationships have basically no properties on them.
Why do you need it in the first place?
I always thought that EAV was just a workaround for relational databases not being schema free.
Neo4j, like other NoSQL databases, is schema free, so you can just add the attributes that you want to both nodes and relationships.
If you need to, you can also record the EAV model in a meta-schema within the graph, but in most cases it is good enough if the meta-schema lives within the application that creates and uses your attributes.
Usually I treat labels as roles which, in a certain context, provide certain properties and relationships. A node can have many labels, each representing one of those roles.
E.g. for the same node
:Person(name)-[:LIVES_IN]->(:City)
:Employee(empNo)-[:WORKS_AT]->(:Company)
:Developer()-[:HAS_SKILL]->(:CompSkill)
...
So in your case :Entity would just be a label that implies the name property.
And :Movie is a label that implies a release_date property and e.g. ACTED_IN relationships.
Maybe it is a long shot but worth trying...
I have the following relation: User1-[:MATCHED]-User2. I want to allow other users to give feedback (Like) on that relation. I am guessing that the obvious answer is to define a new node of type Match, created for every two matched users, and then relate each user who liked the match to that node with a LIKE relation.
I am trying to think of another way to model that in the graph without the overhead of creating a new node for each match...
Can a relation relate to nodes other than its start/end nodes?
Any help will be appreciated, thanks.
Neo4j does not support hypergraphs or relationships to relationships. Modelling your MATCHED relationship with a node is probably the way to go.
An alternative is to reference the relationship id from another node:
User1-[MATCHED]->User2 (where MATCHED has the id xyz)
User3-[LIKES]->Relationship(relId = xyz)
The "Relationship" node would contain the id of the MATCHED relationship as a property. This relId property would need to be indexed to find all LIKES of a given MATCHED relationship.
This solution is not well suited for traversals though.