Node based properties on a relationship - neo4j

I'm starting out with Neo4j to create a graph of users and their relationships. At the moment there is a single 'KNOWS' relationship between users.
What I want to do now is specify properties on the relationship specifically for each of the users. For example, "interest" which indicates how much a user is interested in the other user. Can I specify this for each user on a single KNOWS relationship or would I need to create two relationships between the users and set the attribute on each of the relationships?
Any help would be appreciated.

Can I specify this (property: interest) for each user on a single KNOWS relationship or would I need to create two relationships between the users and set the attribute on each of the relationships?
You will need two relationships.
You could do it with one, but then you would have to keep two properties on the relationship, plus information about which property goes with which node. It is much easier with two relationships.
From comment:
Can I keep them as bi-directional or would I need to use directional in this case?
Relationships are always directional. The concept of bi-directional only appears when you query, and even then it is not really bi-directional but rather without direction, e.g. (a)-[r]-(b). So you would use (a)-[r1]->(b) and (b)-[r2]->(a), where the latter can also be written (a)<-[r2]-(b). If you query with the direction, then you know how to apply the relationship property.
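As a rough sketch of the two-relationship approach: the snippet below assumes :User nodes with a name property already exist, and the names 'Alice' and 'Bob' and the 0..1 interest scale are made up for illustration.

```cypher
// Assumes both users already exist; otherwise the MATCH finds nothing
// and no relationships are created.
MATCH (a:User {name: 'Alice'}), (b:User {name: 'Bob'})
MERGE (a)-[ab:KNOWS]->(b)
MERGE (b)-[ba:KNOWS]->(a)
SET ab.interest = 0.8,   // Alice's interest in Bob (hypothetical value)
    ba.interest = 0.3;   // Bob's interest in Alice (hypothetical value)

// Directed query: the direction tells you whose interest you are reading.
MATCH (:User {name: 'Alice'})-[r:KNOWS]->(:User {name: 'Bob'})
RETURN r.interest;

// Undirected query: matches both relationships, so report the direction explicitly.
MATCH (:User {name: 'Alice'})-[r:KNOWS]-(:User {name: 'Bob'})
RETURN startNode(r).name AS source, endNode(r).name AS target, r.interest;
```

With the directed form, the property always belongs to the user at the start of the relationship, so no extra bookkeeping is needed.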
I typically do more of my work with Java as an embedded application instead of Cypher and it pays to use directional queries as it makes for less code to do the associations.
Note
Since your case is so simple, just try various methods and see what works. Remember to keep track of how long the queries take and, if necessary, add indexes. Also use the query profiling tool to make sure you are making effective queries.
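For instance, profiling and indexing might look like the sketch below. The :User label and name property are assumptions, and the index syntax shown is the Neo4j 4.x and later form (older 3.x versions used CREATE INDEX ON :User(name) instead).

```cypher
// Index the property used to look up the starting node (assumed: User.name).
CREATE INDEX FOR (u:User) ON (u.name);

// Prefix a query with PROFILE to run it and see the executed plan with db hits.
PROFILE
MATCH (:User {name: 'Alice'})-[r:KNOWS]->(other:User)
RETURN other.name, r.interest
ORDER BY r.interest DESC;
```

Comparing the plan before and after adding the index should show whether the lookup of the starting node switches from a label scan to an index seek.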

Related

Developing graph database model for department/supplier/items

I'm currently ramping up on graph databases, and to do that I am working through a set of questions to learn Cypher. However, I'm not 100% happy with the design I've chosen, since I have to match relationships to nodes to make some of the queries work.
I found "Neo4j: Suggestions for ways to model a graph with shared nodes but has a unique path based on some property", which has some suggestions that are relevant, but they involve copying nodes (repeating them) when in fact they represent the same thing. That seems like an update issue waiting to happen.
My design currently has
(:Dept {name,floor})-[:SOLD {quantity}]->(:Item {name,type})<-[:SUPPLIES {dept,volume}]-(:Company {name,address})
As you can see, to figure out which department a company supplied an item to, I have to check the :SUPPLIES dept property. This leads to somewhat awkward queries - it feels that way to me, anyway.
I've tried other relationships, like having (:Company)-[:SUPPLIES {item,vol}]->(:Dept) but then the problem just shifts to matching :SUPPLIES relationship properties to :Item nodes.
The types of queries I am building are of the nature: Find departments that sell all of the items they are supplied.
Is there some other way to model this that I am overlooking? Or is this sort of relationship, where a supplier is related to two things, an item and a department, just something that doesn't fit the graph model very well?
You want to store and query a triangular relationship between :Dept, :Item, and :Company. This can't be accomplished by a linear relationship pattern. Comparing IDs of entities is not the Neo4j way; doing so neglects the strengths of a graph database.
(Assuming that I understood your use case scenario) I would introduce an additional node of type :SupplyEvent that has relationships to :Dept, :Item, and :Company. You could also split up :SOLD relationship in a similar way, if you want relations between department, item, and, e.g., a customer.
Now you can query which companies supplied which items to which departments (without comparing any IDs):
MATCH (company:Company)<-[:SUPPLIED_FROM]-(se:SupplyEvent)-[:SUPPLIED_TO]->(dept:Dept),
(se)-[:SUPPLIED]->(item:Item)
RETURN company, item, dept
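To make this concrete, here is a sketch of how a :SupplyEvent could be created, followed by one possible way to answer the "departments that sell all of the items they are supplied" question. The sample names and the placement of volume on the :SupplyEvent node are assumptions, and the second query is an untested suggestion rather than a definitive solution.

```cypher
// Hypothetical sample data; all names and values are made up.
MERGE (c:Company {name: 'Acme Supplies'})
MERGE (d:Dept {name: 'Toys'})
MERGE (i:Item {name: 'Kite'})
CREATE (se:SupplyEvent {volume: 100})
CREATE (se)-[:SUPPLIED_FROM]->(c)
CREATE (se)-[:SUPPLIED_TO]->(d)
CREATE (se)-[:SUPPLIED]->(i);

// Departments that sell every item they are supplied:
// collect the supplied items per department, then require a :SOLD
// relationship from the department to each of them.
MATCH (dept:Dept)<-[:SUPPLIED_TO]-(:SupplyEvent)-[:SUPPLIED]->(item:Item)
WITH dept, collect(DISTINCT item) AS suppliedItems
WHERE all(i IN suppliedItems WHERE (dept)-[:SOLD]->(i))
RETURN dept.name AS department;
```

Note that no relationship property has to be compared against a node name anywhere; every association is expressed by a relationship.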

In Neo4j, is it possible to have the same relationship name for different entities

Let's use the movie DB as an example.
If I were to insert all the people that worked on a movie into the DB, it would be difficult to find relationship names for everyone. Would it be a problem to have entities like sound_designer, sound_engineer, set_designer, set_assistance, cable_guy, etc. with the same relationship "WORKS_IN" to a Movie entity? Is it possible? Is it a good solution? Would I have problems? Are there alternatives?
Gabor's answer in the comments is a good one: there are no problems with nodes of differing labels having relationships of the same type to the same node.
Multi-labeling nodes with their role isn't a bad idea; however, that assumes that a person's role is constant throughout the years captured by the graph, which may not hold true. Or rather, the labels would capture the roles they have held over their entire history, but the specific role they played within a particular movie is likely something you want as a property of the relationship itself, like a role property. That property might even be a list, if a person might have multiple roles for the same movie, similar to actors playing a part (where there is a roles list property on :ACTED_IN relationships).
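A minimal sketch of the relationship-property idea, assuming :Person nodes with a name property and :Movie nodes with a title property (the person, movie, and roles property name on WORKS_IN are all made up for illustration):

```cypher
// Assumes the person and the movie already exist.
MATCH (p:Person {name: 'Alice Example'}), (m:Movie {title: 'Example Movie'})
MERGE (p)-[w:WORKS_IN]->(m)
SET w.roles = ['sound_designer', 'sound_engineer'];

// Everyone who worked as a sound designer on that movie.
MATCH (p:Person)-[w:WORKS_IN]->(:Movie {title: 'Example Movie'})
WHERE 'sound_designer' IN w.roles
RETURN p.name;
```

The relationship type stays the same for everyone, and the per-movie role lives on the relationship rather than in a label.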

Better way to model RATED relationship in neo4j movie graph database

I want to know which is the better approach to model the [:RATED] relationship in a movie database in Neo4j. I can think of the following two approaches:
Approach 1 feels more straightforward and somehow more correct from an academic design standpoint.
However, approach 1 requires n (:Movie) nodes. One might say that approach 2 looks more natural, as the graph can contain only one (:Movie) node for a particular movie ("The Matrix" in this case), which can exist regardless of whether anyone rates it or not. However, I feel less comfortable storing rating values on the [:RATED] relationship. Is it correct from a pure design perspective?
Also, what if we are dealing with a node which does not represent an entity? For example, a bunch of cars replacing the users in the above image, and an accident replacing "The Matrix". In this case the (:Accident) node may not exist by default, but is only created when an accident occurs. Also, accidents faced by two different cars are different instances of (:Accident) and have many attributes associated with them, like time, place, etc. In this case it makes more design sense to create a separate (:Accident) node for each car whenever it encounters an accident and have its properties associated with it, instead of having a single (:Accident) and having the properties associated with the relationships pointing from (:Car) to (:Accident). But then it will create a lot of (:Accident) nodes. What would be the best approach for this scenario from a design perspective and a performance perspective?
Summarizing:
Is approach 2 perfectly fine from a design perspective? (Especially storing properties on relationships which might have been stored on nodes instead)
What are the possible design and performance drawbacks of approach 2?
In general, whatever approach you choose to use should fit your use cases and queries.
Given your example, approach 2, using one Matrix :Movie node, is a perfectly fine design given the use case of tracking movie ratings. This is the same approach used in the Movie graph you can load up in Neo4j. Try that out, and note that the graph would be chaotic and difficult to query if there were a separate :Movie node for every single relationship to a :Movie.
You'll note that in approach 1, there is absolutely nothing different between each of the Matrix :Movie nodes. That's a strong indicator that you should be modeling the thing as a single node instead of multiple. It's also more difficult to query if you're using multiple nodes for the same thing, as the database can no longer use a single node as a starting point for the movie to get data based on relationships from it. Your queries about the movie itself also become slightly more complicated, in that you will need to add LIMIT 1 when matching to the movie by name, otherwise the query will match to all the multiple Matrix movies, which could be in the thousands or more depending on how many ratings there are.
Even though some of the other queries you might use for this model are going to use similar Cypher, or even the same Cypher queries, you will be needlessly impacting db operations through this data model. Consider an average rating query. With a single Matrix :Movie node, it's a matter of matching on the single :Movie node (by indexed or unique name), then taking the average of all its relationships. With multiple Matrix :Movie nodes, your match will match on thousands (or more) redundant nodes, and for all of those nodes it will need to pull those relationships and average them together. That's a ton of db hits you didn't need to do.
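With a single :Movie node, the average rating query could look something like this. The :User label, the title property, and the rating property on :RATED are assumptions about your model:

```cypher
// One starting node (ideally found via an index or uniqueness constraint on title),
// then aggregate over its incoming :RATED relationships.
MATCH (m:Movie {title: 'The Matrix'})<-[r:RATED]-(:User)
RETURN m.title AS movie, avg(r.rating) AS averageRating, count(r) AS ratings;
```

The same Cypher would also run against the multi-node model, but there the MATCH would first have to find every duplicate Matrix node before touching any relationships.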
Also, keep in mind the difficulty of using this approach when combining this for other use cases. For example, consider if we had to change your data model to include actors and directors, similar to the movie db you can import in neo4j. If we had multiple nodes for every single rating for every single movie, which node would we use when creating relationships between actors and directors and the movie they worked in? With that kind of data model, there are no good choices for modeling this kind of data efficiently or clearly.
Considering your second case, it makes sense to make a new :Accident node with each accident, with details of the accident in each node. If two or more cars in your db are involved in the same accident, then it makes sense to use the same accident node to represent the accident, and make relationships from the multiple cars to the same accident they were involved in. That saves you from duplicating data about the same accident instance, and clearly models the participants in the accident, along with any other related data that is associated with the accident. You could always store accident data specific to the car in question on the relationship between the car and the accident, such as the damage sustained, and whether the driver of the car was found at fault.
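As a sketch of that model: the :INVOLVED_IN relationship type, the plate, occurredAt, location, damage, and atFault property names, and all values below are hypothetical.

```cypher
// Assumes both cars already exist.
MATCH (a:Car {plate: 'ABC-123'}), (b:Car {plate: 'XYZ-789'})
// One node per accident, holding the data shared by everyone involved.
CREATE (acc:Accident {occurredAt: datetime('2020-05-17T08:30:00'), location: 'Main St'})
// Car-specific details go on the relationship to the shared accident.
CREATE (a)-[:INVOLVED_IN {damage: 'front bumper', atFault: true}]->(acc)
CREATE (b)-[:INVOLVED_IN {damage: 'rear door', atFault: false}]->(acc);
```

A single-car accident would simply have one :INVOLVED_IN relationship pointing at its :Accident node.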
It should be clear in this data model that there should be separate :Accident nodes (unless, as mentioned, it's the same accident for multiple cars), as the data between accidents will differ, and requires you to capture them in separate nodes. This is far different than your movie data model, where it does not make sense to use multiple :Movie nodes for the same movie, since the data is all the same.
As for storing data in relationships, again that depends upon your data model, and what makes the most sense. For ratings, storing the rating on the relationship to the movie looks fine to me.
There are cases where you may consider creating intermediary nodes to store data on a node instead of a relationship. Consider an employment graph, with :Person and :Company nodes. You could model this simply with :WORKS_AT relationships between nodes, but you would need to store data about the employment on the relationship, such as hireDate, salary, jobTitle, etc. That might be fine...but you could always extract that into its own node, an :Employment node between a :Person and a :Company to hold that data. That could let us index those properties, making it easier to query :Persons for a :Company in order of hireDate, for example, which wouldn't be as efficient if the data was on the relationships, as you can't index on relationship properties.
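A rough sketch of the intermediary-node version follows. The :HAS_EMPLOYMENT and :AT_COMPANY relationship types and all sample values are made up, and the index syntax is the Neo4j 4.x and later form (3.x used CREATE INDEX ON :Employment(hireDate)).

```cypher
// Index the property we want to look up and sort by.
CREATE INDEX FOR (e:Employment) ON (e.hireDate);

// Assumes the person and company already exist.
MATCH (p:Person {name: 'Alice'}), (c:Company {name: 'Acme'})
CREATE (p)-[:HAS_EMPLOYMENT]->(e:Employment {
         hireDate: date('2015-03-01'),
         jobTitle: 'Engineer',
         salary: 85000
       })-[:AT_COMPANY]->(c);

// People at a company in order of hire date.
MATCH (p:Person)-[:HAS_EMPLOYMENT]->(e:Employment)-[:AT_COMPANY]->(:Company {name: 'Acme'})
RETURN p.name, e.jobTitle, e.hireDate
ORDER BY e.hireDate;
```

The cost is one extra hop per employment, so whether this pays off depends on how often you query by those properties.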
EDIT
Concerning cardinality of nodes, when to use a single node instance vs multiple node instances, again, that's usually best answered as you answer the questions of "does this make logical sense for this data model" and "is this easy and efficient to query this data?"
The two cases you presented, for Matrix :Movie nodes and :Accident nodes, each demonstrate opposite cases for this.
A single Matrix :Movie node makes sense; I think it may be a stretch to find use cases which would require multiple copies of Matrix nodes.
However, if you had to model movie showings of The Matrix, then that might call for a :Showing node, of which there would be several (per time and per theater), but all of them referencing the same Matrix :Movie node. It's the same movie, but it has multiple showings.
For :Accidents, it makes sense to use multiple :Accident nodes, each one representing a particular instance of an accident. In many cases there will be only one :Car associated with a single :Accident node, a driver crashing into something without involving other drivers. In other cases, when it's a multi-car collision, then several cars are involved in the same :Accident, so you would have the :Accident node with the time and location and details, and relationships with the :Cars involved in that particular accident.
While it's possible to use a single :Accident node for ALL accidents, and have the details on the relationships, you'll quickly encounter problems with some of the likely queries you'll need to make. For example, how do you know which accidents were multi-car accidents, and which cars were involved? We would have to examine all relationships to the single :Accident node, and even then we'd have to do extra logic to figure out the associations. What if we wanted to order :Accidents by date? We can't use indexes on relationship properties, so again we have to touch all relationships, inspect their properties, and sort them all. What if we wanted to indicate location based on the closest city to the accident, for fast lookup of accidents in certain cities? Again, we can't use indexes on relationship properties for fast lookup. And if we already have :City nodes, we can't create a relationship between the relevant :City node and the crash relationship; we need a node for that.
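By contrast, with one :Accident node per accident, those same questions become short queries. The :INVOLVED_IN and :NEAR relationship types, the occurredAt property, and the city name are assumptions for illustration:

```cypher
// Multi-car accidents and the cars involved.
MATCH (acc:Accident)<-[:INVOLVED_IN]-(car:Car)
WITH acc, collect(car) AS cars
WHERE size(cars) > 1
RETURN acc, cars;

// Accidents ordered by date; occurredAt lives on the node, so it can be indexed.
MATCH (acc:Accident)
RETURN acc
ORDER BY acc.occurredAt;

// Accidents near a given city, via a relationship from the accident node.
MATCH (:City {name: 'Springfield'})<-[:NEAR]-(acc:Accident)
RETURN acc;
```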
I could list more cases, but it's fairly clear that multiple :Accident nodes are needed per accident (again, sharing the node for :Cars involved in the same :Accident).
This is one of those cases where even if you missed it when thinking about if the data model makes sense, consideration about the kind of queries you want to make, and their efficiency, should push you toward a better means of modeling your data...in this case, using multiple :Accident nodes.

Neo4J - Prevent duplicated relationship types with identical meaning

Consider Person nodes and Item nodes.
What is the best way to prevent having both 'Purchased' type relationships and 'Bought' type relationships in the graph that have the same meaning, but are simply named differently?
E.g. if we end up with our graph in a state like:
(Alice) -[Bought] -> (Pickles)
(Bob) -[Purchased]-> (Pickles)
and I want to know everyone who has bought a jar of pickles. Clearly someone made a mistake when creating one of these relationships. How do I prevent that class of mistake?
Limit the relationships a user can create to a specific set of names, and don't allow any other relationship names.
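The restriction itself would typically live in whatever layer users go through to create relationships, but you can at least audit an existing graph from Cypher. The sketch below assumes 'Purchased' is the only allowed type; db.relationshipTypes() is a built-in procedure, while the allowed list and the cleanup step are only a suggestion.

```cypher
// List relationship types that are not in the allowed set.
CALL db.relationshipTypes() YIELD relationshipType
WITH relationshipType
WHERE NOT relationshipType IN ['Purchased']
RETURN relationshipType AS unexpectedType;

// Fold stray 'Bought' relationships into 'Purchased'.
// Note: any properties on the old relationship would need to be copied explicitly.
MATCH (p)-[old:Bought]->(i)
MERGE (p)-[:Purchased]->(i)
DELETE old;
```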

Relations to relation neo4j

Maybe it is a long shot but worth trying...
I have the following relation: User1-[:MATCHED]-User2. I want to allow other users to give feedback (Like) on that relation. I am guessing that the obvious answer is to define a new node of type Match, which will be created for every two matched users, and then relate to that node with a LIKE relation from each user who liked the match.
I am trying to think of another way to model that in the graph without the overhead of creating a new node for each match...
Can a relation relate to other nodes except the start/end nodes?
Any help will be appreciated thanks.
Neo4j does not support hypergraphs or relationships to relationships. Modelling your MATCHED relationship with a node is probably the way to go.
An alternative is to reference the relationship id from another node:
User1-[MATCHED]->User2 (where MATCHED has the id xyz)
User3-[LIKES]->Relationship(relId = xyz)
The "Relationship" node would contain the id of the MATCHED relationship as a property. This relId property would need to be indexed to find all LIKES of a given MATCHED relationship.
This solution is not well suited for traversals though.
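For comparison, the node-based approach recommended at the start of this answer might be sketched like this. The :Match label, the :MATCHED_IN relationship type, and the user names are made up for illustration:

```cypher
// Assumes the three users already exist.
MATCH (u1:User {name: 'User1'}), (u2:User {name: 'User2'}), (u3:User {name: 'User3'})
// The match becomes a node that both users point to...
CREATE (m:Match)
CREATE (u1)-[:MATCHED_IN]->(m)
CREATE (u2)-[:MATCHED_IN]->(m)
// ...so other users can relate to it directly.
CREATE (u3)-[:LIKES]->(m);

// Number of likes per matched pair.
MATCH (a:User)-[:MATCHED_IN]->(m:Match)<-[:MATCHED_IN]-(b:User)
WHERE a.name < b.name   // report each pair only once
OPTIONAL MATCH (liker:User)-[:LIKES]->(m)
RETURN a.name, b.name, count(liker) AS likes;
```

Because the likes are ordinary relationships to a node, they can be traversed normally, which is exactly what the relId workaround gives up.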
