Simple social network design flaw with graph database - neo4j

I was looking at graph databases and Neo4j. As suggested, I tried to draw a simple social networking graph on white paper, and after a few sketches I got stuck at a few similar points.
At first I designed a social network where users can like posts.
(u1:User)-[:LIKED]->(p:Post)<-[:POSTED]-(u2:User)
Now I want to notify u2 about the like, so I drew this on the paper:
(u1:User)-[:LIKED]->(p:Post)<-[:POSTED]-(u2:User)
             |                              ^
             |__________[:NOTIFY]___________|
I'm not sure this is clear, but I have just drawn a relationship between a node and another relationship, which is not possible in graph databases, at least not in Neo4j. So I decided that a Like should be a node instead of a relationship, and my graph turned into this:
(u1:User)-[:CREATED]->(l:Like)-[:BELONGS_TO]->(p:Post)<-[:POSTED]-(u2:User)
                         |                                            ^
                         |_________________[:NOTIFY]__________________|
Now everything is OK. Then I added a comments feature to the system, first as a relationship, but once notifications were involved it again turned into a node. The same happened when I added a "liking comments" feature: likes on comments first seemed to be relationships, but once again they turned into nodes when notifications were involved.
In general, at some point I find myself drawing a relationship between a node and another relationship. My solution feels like turning entities that naturally look like relationships into nodes, and this makes me think I have trouble deciding what should be a node and what should be a relationship.
So my question is: does anyone else run into this "relationship between a node and another relationship" issue, and if so, how do you solve it?

It all depends on your use cases. In many cases a simple relationship is good enough, but if you want to do more with that entity or fact, you turn it into a node; it often turns out to be quite an important concept in the domain.
In our data modeling class there is a specific section on this, and it is also discussed in detail in the "Graph Databases" book (which is available as a free PDF).
Sometimes it makes sense to keep the original relationship around as a fast shortcut across the intermediate node, for queries that don't need that detail.
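A minimal in-memory sketch in Python of the refactoring discussed above, assuming a toy graph rather than a real database: the like becomes a Like node, NOTIFY points from that node to the post's author, and an optional LIKED shortcut is kept alongside it. The Graph class and the like_post helper are hypothetical illustrations, not the Neo4j driver API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str
    props: dict = field(default_factory=dict)


@dataclass
class Rel:
    start: str
    rel_type: str
    end: str


class Graph:
    """Tiny in-memory property graph (hypothetical helper, not the Neo4j API)."""

    def __init__(self):
        self.nodes = {}
        self.rels = []

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = Node(node_id, label, props)

    def add_rel(self, start, rel_type, end):
        # As in Neo4j, a relationship can only connect two existing nodes.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError(f"unknown node in ({start})-[:{rel_type}]->({end})")
        self.rels.append(Rel(start, rel_type, end))

    def out(self, node_id, rel_type):
        """Ids of nodes reached from node_id over outgoing rel_type relationships."""
        return [r.end for r in self.rels if r.start == node_id and r.rel_type == rel_type]


def like_post(g, user_id, post_id, like_id, keep_shortcut=True):
    """Model a like as a node so other relationships (e.g. NOTIFY) can attach to it."""
    g.add_node(like_id, "Like")
    g.add_rel(user_id, "CREATED", like_id)
    g.add_rel(like_id, "BELONGS_TO", post_id)
    # The post's author is whoever has a POSTED relationship to it.
    author = next(r.start for r in g.rels if r.end == post_id and r.rel_type == "POSTED")
    g.add_rel(like_id, "NOTIFY", author)
    if keep_shortcut:
        # Optional direct relationship that skips the intermediate Like node.
        g.add_rel(user_id, "LIKED", post_id)


g = Graph()
for uid in ("u1", "u2"):
    g.add_node(uid, "User")
g.add_node("p", "Post")
g.add_rel("u2", "POSTED", "p")
like_post(g, "u1", "p", "l1")
print(g.out("l1", "NOTIFY"))  # ['u2']
print(g.out("u1", "LIKED"))   # ['p']
```

Keeping the LIKED shortcut duplicates information, so both relationships have to be created and deleted together; the benefit is that "which posts did u1 like?" stays a single hop.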

Related

Why it is not recommended to index relationships in a graph database

In the book Neo4j in Action by Aleksa Vukotic and Nicki Watt, the authors say:
In our experience, it is less common for relationship indexes to be good solutions. We are not saying that relationship indexing is poor practice, but if you find yourself adding lots of relationship indexes, it is worth asking why.
It sounds as though the authors do not recommend indexing relationships in a graph database, but no explanation is given. Does anyone know why?
I've voted for this question to be migrated to SO and am answering it in the hope that it will be. I used Neo4j for a couple of years. Although it has changed a lot since then, I believe the principles of a graph database won't have changed much. In my opinion, if you need a lot of indexes to quickly query the relationships between nodes, you could probably have designed your data model differently so that it focuses more on the nodes (for example, with relationships as your nodes and nodes as your relationships, as in a line graph), because the querying mechanism (e.g. a Cypher query) is generally optimised for nodes.
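For readers unfamiliar with the term: the line graph L(G) turns each edge of G into a vertex and joins two of those vertices when the original edges share an endpoint. A short Python sketch of the construction, as a general graph-theory illustration (not a Neo4j feature), assuming an undirected graph given as an edge list:

```python
from itertools import combinations


def line_graph(edges):
    """Line graph L(G) of an undirected graph G given as an edge list.

    Every edge of G becomes a vertex of L(G); two of those vertices are
    adjacent when the original edges share an endpoint.
    """
    # frozenset makes (a, b) and (b, a) the same edge; dict.fromkeys drops duplicates.
    vertices = list(dict.fromkeys(frozenset(e) for e in edges))
    adjacency = {v: set() for v in vertices}
    for e1, e2 in combinations(vertices, 2):
        if e1 & e2:  # the two edges meet at a common endpoint
            adjacency[e1].add(e2)
            adjacency[e2].add(e1)
    return adjacency


# Path graph a - b - c - d: its line graph is the path ab - bc - cd.
lg = line_graph([("a", "b"), ("b", "c"), ("c", "d")])
for vertex, nbrs in lg.items():
    print(sorted(vertex), sorted(sorted(n) for n in nbrs))
```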
First, it's important to understand the role of indexes in Neo4j, in that indexes are used to find starting points in the graph, after which relationship traversal and filtering are used to perform the remainder of the pattern matching and to complete the query.
The advice therefore is about the same as: "we do not recommend using relationships as starting points in the graph", and we find that true more often than not.
Usually when you need to do index lookups, you have certain "things" in mind as your starting places, and important things in graphs are typically represented by nodes. If we're asking "what employees are connected to this particular company" we're interested in starting quickly by finding that particular company and expanding out, not in finding all :EMPLOYED_BY relationships in the graph and filtering by the connected company, which would take far more time.
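To make the contrast concrete, here is a toy Python sketch with hypothetical data (it only loosely mimics how a graph store works and is not Neo4j's internals). One function uses a node index to find the company and then expands that node's own relationships; the other scans every EMPLOYED_BY relationship in the graph and filters by company. Both return the same answer, but the first touches only the company's relationships, while the second touches all of them.

```python
from collections import defaultdict

# Hypothetical toy graph: node id -> properties.
nodes = {
    "c1": {"label": "Company", "name": "Acme"},
    "c2": {"label": "Company", "name": "Globex"},
    "p1": {"label": "Person", "name": "Alice"},
    "p2": {"label": "Person", "name": "Bob"},
    "p3": {"label": "Person", "name": "Carol"},
}
# (start, type, end) relationships.
relationships = [
    ("p1", "EMPLOYED_BY", "c1"),
    ("p2", "EMPLOYED_BY", "c2"),
    ("p3", "EMPLOYED_BY", "c1"),
]

# Node index on :Company(name), used only to find the starting point.
company_by_name = {p["name"]: nid for nid, p in nodes.items() if p["label"] == "Company"}

# Each node keeps a list of its own incoming relationships, so expanding
# from a node does not require looking at the rest of the graph.
incoming = defaultdict(list)
for start, rel_type, end in relationships:
    incoming[end].append((rel_type, start))


def employees_by_start_node(company_name):
    """Index lookup for the company, then expand its EMPLOYED_BY relationships."""
    company = company_by_name[company_name]
    return sorted(nodes[s]["name"] for t, s in incoming[company] if t == "EMPLOYED_BY")


def employees_by_relationship_scan(company_name):
    """Touch every EMPLOYED_BY relationship in the graph, then filter by company."""
    return sorted(
        nodes[s]["name"]
        for s, t, e in relationships
        if t == "EMPLOYED_BY" and nodes[e]["name"] == company_name
    )


print(employees_by_start_node("Acme"))  # ['Alice', 'Carol']
```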
We often find that those who run into this restriction, and need this kind of fast relationship lookup anyway, may need to rethink their model. A need to look up relationships as starting places in the graph frequently indicates that the thing represented by the relationship is important enough that it really should be a node in the graph (with its own relationships to the previously connected nodes), so this becomes a "modeling smell" that drives refactoring of the model. The change usually feels more natural afterwards, and it gives the thing, as a node, capabilities it did not have as a relationship (for example, the ability to apply multiple labels to it, or to connect it via relationships to more nodes than just the original two).
All that said, there will be cases where a relationship really does just need to be a relationship (either for business reasons, or because keeping it as a relationship truly is the most practical modeling choice), and using those relationships as starting points in the graph makes sense.
With the fulltext schema indexes introduced in Neo4j 3.5, we added the ability to index relationships by relationship type(s) and property (or properties). So the capability is there if you need it, after you've ruled out refactoring your model.

What is the use of properties on Neo4J relationships?

I am concerned that I am not getting the full benefit from relationships in Neo4j. While we use them to relate two nodes (of course), we rarely add properties to relationships, and I feel like we're missing the bigger picture.
Consider a case where there's an EVENT and affected people. We want confirmation from every person that they have been informed of the event.
Here is what we do, and I think it is not great:
(e:EVENT)-[:NOTIFICATION]->(:EVENT_STATUS)-[:AFFECTED]->(a:PERSON)
Now it isn't so bad, because we need EVENTs and we already have PERSON. So we're adding the stuff that connects them. It works. However, the only purpose of EVENT_STATUS is to track a notification date and the PERSON's confirmation information. The fact is, it feels like we're implementing a relational database structure.
Would it be wrong/suicidal to add the notification date and the PERSON's confirmation to the relation?
(e:EVENT)-[:INFORMED {notification_date: 123123123,
confirmation_date: 123123999,
confirmation_type: 'ATTENDING'}]->(a:PERSON)
Help me understand the purpose of properties on Relationships, please!
Your proposed solution is just fine, since you are tracking different pieces of information about a particular type of relationship between 2 nodes. This is exactly what relationship properties are for.
There is no need to add extra relationships and nodes, as you are now doing. Not only are you wasting resources, but your queries are made unnecessarily complex.
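A small Python sketch of the proposed model, with the notification and confirmation data living on the INFORMED relationship itself rather than on an intermediate EVENT_STATUS node. The Informed class, the sample data and the helper functions are hypothetical; the filter in unconfirmed() is roughly what a Cypher query along the lines of `MATCH (e:EVENT)-[r:INFORMED]->(a:PERSON) WHERE r.confirmation_date IS NULL RETURN a` would express.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Informed:
    """(EVENT)-[:INFORMED {...}]->(PERSON), with the status data stored on the relationship."""
    event: str
    person: str
    notification_date: int
    confirmation_date: Optional[int] = None
    confirmation_type: Optional[str] = None


# Hypothetical data: Alice has confirmed, Bob has only been notified.
informed = [
    Informed("e1", "alice", notification_date=123123123,
             confirmation_date=123123999, confirmation_type="ATTENDING"),
    Informed("e1", "bob", notification_date=123123123),
]


def unconfirmed(event_id):
    """People notified about the event who have not confirmed yet."""
    return [r.person for r in informed
            if r.event == event_id and r.confirmation_date is None]


def confirm(event_id, person, date, kind):
    """Record a confirmation by updating properties on the existing relationship."""
    for r in informed:
        if r.event == event_id and r.person == person:
            r.confirmation_date = date
            r.confirmation_type = kind
            return r
    raise KeyError((event_id, person))


print(unconfirmed("e1"))  # ['bob']
confirm("e1", "bob", 123124000, "ATTENDING")
print(unconfirmed("e1"))  # []
```

Confirming is a property update on an existing relationship, so no extra node or relationship has to be created or cleaned up.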

When to not use neo4j?

Neo4j is a great tool for mapping relational data, but I am curious under what conditions it would not be a good tool to use.
In which use cases would using neo4j be a bad idea?
You might want to check out this slide deck and in particular slides 18-22.
Your question could have a lot of details to it, but let me try to focus on the big pieces. Graph databases are naturally indexed by relationships. So graph databases will be good when you need to traverse a lot of relationships. Graphs themselves are very flexible, so they'll be good when the inter-connections between your data need to change from time to time, or when the data about your core objects that's important to store needs to change. Graphs are a very natural method of modeling some (but not all) data sources, things like peer to peer networks, road maps, organizational structures, etc.
Graphs tend not to be good at managing huge lists of things. For example, if you were going to build a customer transaction database with analytics (where you need 1 million customers, 50 million transactions, and all you do is post transactions all day long), then it's probably not a good fit. An RDBMS is great at that; notice that this use case doesn't really exploit relationships.
Make sure to read those two links I provided, they have much more discussion.
For maintenance reasons, any service aggregating data feeds has until now been well advised to keep its sources independent.
If I want to explore relationships between different feeds, this can be done at the application level, for example by tracking user preferences across the different feeds.
Graph databases are about managing relationship complexity, but in many cases this complexity is a design choice. Putting all your kids in one bathtub is fine until you drop the soap.

Graph Databases and MDM

One of the problems I'm trying to solve with Master Data Management (MDM) is merging duplicate entities that look different because of things like misspellings. For instance, John Doe and Jon Doe might in reality be the same person.
I've read that graph databases like Neo4j can be used for MDM, and I have a vague sense that graph theory might help me resolve duplicate entities. Basically, if I look at the relationships of John Doe and Jon Doe, could the similarity of how those two nodes connect to other pieces of data offer a way to decide whether they are in fact the same object?
If so, how can I go about doing this with Neo4J?
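One common way to quantify the neighborhood overlap the question describes is the Jaccard similarity of two nodes' neighbor sets, optionally combined with a string-similarity check on the names. A minimal Python sketch, assuming hypothetical toy data and arbitrary thresholds; it only flags candidate pairs for review and is neither a Neo4j feature nor a complete entity-resolution method:

```python
from difflib import SequenceMatcher

# Hypothetical toy data: each person node and the set of nodes it is connected to.
neighbors = {
    "John Doe": {"addr:12 Main St", "phone:555-0100", "employer:Acme"},
    "Jon Doe": {"addr:12 Main St", "phone:555-0100", "employer:Globex"},
    "Jane Roe": {"addr:9 Elm St", "employer:Acme"},
}


def jaccard(a, b):
    """Share of neighbors two nodes have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def name_similarity(x, y):
    """Rough string similarity of two names, via difflib's ratio()."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def duplicate_candidates(threshold=0.5, name_threshold=0.8):
    """Pairs whose names look alike AND whose neighborhoods overlap substantially.

    The thresholds are arbitrary illustration values, not recommendations.
    """
    people = sorted(neighbors)
    pairs = []
    for i, x in enumerate(people):
        for y in people[i + 1:]:
            if (name_similarity(x, y) >= name_threshold
                    and jaccard(neighbors[x], neighbors[y]) >= threshold):
                pairs.append((x, y))
    return pairs


print(duplicate_candidates())  # [('John Doe', 'Jon Doe')]
```

Here John Doe and Jon Doe share two of their four distinct neighbors (Jaccard 0.5) and have very similar names, so they are flagged; Jane Roe shares an employer with John Doe but her name is too different.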

Can this be accomplished by a Graph Database?

I have been asked to develop an application that keeps track of the movements of a certain item (or items). To better demonstrate what the application must do, I drew a diagram (a simplified abstraction).
As I have never worked with any databases other than relational ones, I really don't know whether I can solve this problem with a graph database.
These questions must be answered by the system:
What path did a certain pen drive walk?
I passed on some pen drives. Where are they now?
Which pen drives did I receive, where did they come from, and where did they go?
Where are the pen drives I burned and passed on? And with whom?
In Neo4j everything is either a node or a relationship. So it's useful to think: what would be my nodes and relationships?
Here it might be, for example, that every "pen drive", "person" and "location" is a node. Verbs like "walk" or "give" would be your relationships.
In this model, you'd be able to use Cypher to query for things like "give me all location nodes connected to pen drive nodes by a walk relationship." Or, say, "start at all person nodes and return the nodes that have a give relationship to a pen drive node which has no give relationship connecting back to the starting person node."
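A plain-Python sketch of how the first three questions become simple traversals over hand-over records. Assumptions not in the original: each hand-over is a hypothetical (giver, receiver, pen, time) tuple standing in for a GAVE relationship between two people with pen and time properties, which flattens the answer's model (pen drives as their own nodes) for brevity; locations are left out and all names are made up.

```python
# Hypothetical hand-over log: each tuple stands in for a relationship
# (giver)-[:GAVE {pen, time}]->(receiver).
handovers = [
    ("alice", "bob", "pen1", 1),
    ("bob", "carol", "pen1", 2),
    ("alice", "dave", "pen2", 3),
    ("carol", "erin", "pen1", 4),
]


def path_of(pen):
    """Ordered list of everyone who held the pen drive, first holder first."""
    steps = sorted((t, giver, receiver) for giver, receiver, p, t in handovers if p == pen)
    if not steps:
        return []
    return [steps[0][1]] + [receiver for _, _, receiver in steps]


def current_holder(pen):
    """Last receiver of the pen drive, or None if it was never handed over."""
    path = path_of(pen)
    return path[-1] if path else None


def passed_on_by(person):
    """Pen drives this person handed over, mapped to whoever holds them now."""
    pens = sorted({p for giver, _, p, _ in handovers if giver == person})
    return {pen: current_holder(pen) for pen in pens}


def received_by(person):
    """(pen, received_from, passed_to) for each pen drive this person received."""
    result = []
    for giver, receiver, pen, t in handovers:
        if receiver != person:
            continue
        # Earliest later hand-over of the same pen by this person, if any.
        later = sorted((t2, r2) for g2, r2, p2, t2 in handovers
                       if p2 == pen and g2 == person and t2 > t)
        result.append((pen, giver, later[0][1] if later else None))
    return result


print(path_of("pen1"))        # ['alice', 'bob', 'carol', 'erin']
print(passed_on_by("alice"))  # {'pen1': 'erin', 'pen2': 'dave'}
print(received_by("bob"))     # [('pen1', 'alice', 'carol')]
```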
This rich graph query language also gives you useful algorithms like shortest path for free, so beyond a transactional record you could determine, for example, whether a pen drive made it from A to B via the optimal path. But as you can see above, such "relational joins" do not make for simple queries, or even simple descriptions of them.
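Cypher exposes this through its shortestPath() function. As a language-neutral illustration of the idea, here is an unweighted breadth-first search in Python over a hypothetical location graph. It finds a route with the fewest hops, so weighted distances would need something like Dijkstra's algorithm instead.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search: a fewest-hops path from start to goal, or None."""
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adjacency.get(node, ()):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical directed location graph and a hypothetical observed route.
locations = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
actual_route = ["A", "C", "D", "E"]
best = shortest_path(locations, "A", "E")
print(best)                            # ['A', 'B', 'D', 'E']
print(len(actual_route) == len(best))  # True
```

The observed route A, C, D, E differs from the one the search returns but has the same number of hops, so by this measure the pen drive did take an optimal path.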
When it comes to database design, if the model becomes cumbersome to map mentally, it's going to be a pain to develop against too. Design your database based on how you plan to query it. If you're unable to easily explain those queries in terms of Neo4j, it's possible that Neo4j isn't going to be the best fit.

Resources