Efficiently check if there is at least one relationship of type connected to node, if not - remove node

Let's assume this Neo4j data structure:
(:Store)<-[:FROM]-(:Notification)-[:FOR]->(users:User)
It serves as, for example, a notification to users about a new sale in a store.
Such a notification may be addressed to a large number of users simultaneously. In this case I can see two approaches to the data modelling:
1. Create a separate :Notification, with its relationships, for each :User, all connected to a single :Store node. When the notification is received, the relationships and the :Notification node are removed.
2. The approach I thought should perform better, and which my question is about: create a single :Notification node connected to the store, with multiple [:FOR] relationships for the different users. When a user receives the notification, the :FOR for that user is removed; if no :FOR relationships are left, the :Notification itself is removed.
So my questions are:
1. Am I correct in assuming that the 2nd one is the better practice?
2. How can I, after deleting a relationship, check whether more relationships of that type remain without making Neo4j match all connected ones of this type, i.e. check only that at least one is left?

Creating fewer objects in the first place will generally be more efficient. For n users, the second approach creates roughly 2n fewer objects (one node and one relationship per user). The exception would be if you had so many users per notification that the node density became too high. To avoid that, you could simply create an additional notification node whenever a particular user threshold is reached.
The following query is adapted from @Luanne's answer to this question, which I thought was pretty slick.
It presumes that, besides the User nodes, the only other connection to a Notification node is its Store node. If the degree of the Notification node is one, the only thing still connected must be the Store node. In that case, remove the relationship to the Store node and then remove the Notification node.
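// Match the notification together with its relationship to the store
MATCH (n:Notification {name: 'Notification One'} )-[store_rel:FROM]->(s:Store)
// Degree of exactly one means only the :FROM relationship is left
WHERE size((n)--())=1
DELETE store_rel, n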
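
The degree check in the query above can also be modelled outside the database. The following is a minimal in-memory sketch in Python, not Neo4j code: the `Graph` class and the `acknowledge` function are hypothetical names made up for illustration, and the sketch assumes, as the answer does, that a notification's only non-user connection is its store.

```python
from collections import defaultdict


class Graph:
    """Tiny undirected adjacency model (hypothetical, for illustration only)."""

    def __init__(self):
        # node -> set of (relationship type, other node)
        self.rels = defaultdict(set)

    def connect(self, a, rel_type, b):
        self.rels[a].add((rel_type, b))
        self.rels[b].add((rel_type, a))

    def disconnect(self, a, rel_type, b):
        self.rels[a].discard((rel_type, b))
        self.rels[b].discard((rel_type, a))

    def degree(self, node):
        # Number of relationships attached to the node, regardless of type
        return len(self.rels[node])

    def delete_node(self, node):
        # Remove every remaining relationship, then the node itself
        for rel_type, other in list(self.rels[node]):
            self.disconnect(node, rel_type, other)
        del self.rels[node]


def acknowledge(graph, notification, user):
    """Remove the user's FOR relationship; delete the notification if only
    the store relationship is left. Returns True if the node was deleted."""
    graph.disconnect(notification, "FOR", user)
    if graph.degree(notification) == 1:
        graph.delete_node(notification)
        return True
    return False


g = Graph()
g.connect("n1", "FROM", "store")
g.connect("n1", "FOR", "alice")
g.connect("n1", "FOR", "bob")
first = acknowledge(g, "n1", "alice")   # bob is still pending
second = acknowledge(g, "n1", "bob")    # last recipient, node is removed
```

The point of the design is that the decision needs only the node's relationship count, not a scan of the individual :FOR relationships.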

Related

Device Delete event Handling in Rule chain being able to reduce the total device count at Customer Level

I keep the total count of devices as a server attribute at the customer entity level, which is in turn used by dashboard widgets such as doughnut charts. To maintain that count, I have put a rule chain in place that handles the device addition / assignment event and increments the "totalDeviceCount" attribute at the customer level. But when a device is deleted / unassigned, I cannot reach the Customer entity using an "Enrichment" node, because the relation has already been removed by the time the event is triggered. As a result, I cannot keep the count correct for the widgets.
Has anyone come across similar requirement? How to handle this scenario?

What you could do is count your devices periodically instead of tracking each individual addition/removal.
You can achieve this with the Aggregate Latest Node, where you specify a period (say, each minute), the entity or devices you want to count, and the variable name under which to save the result.
This node outputs a POST_TELEMETRY_REQUEST. If you are ok with that then just route that node to Save Timeseries. If you want an attribute, route that node to a Script Transformation Node and change the msgType to POST_ATTRIBUTE_REQUEST.

Neo4J - Which is better to store element as a property of user or as a node & relationship?

I have a problem designing a graph model with a million users. I need to store whether each user is registered or not.
As I see it, we have 2 options:
1. Store a property "register = true/false" on each user node. With 1 million users, we have 1 million "register" properties.
2. Store a Registered node and create a relationship to it only for registered users. The number of relationships then equals the number of registered users exactly.
Which option is better for search performance and for minimum storage?
Thanks in advance,

Modeling your data as a graph is a difficult thing to pin down exactly. Typically, when it comes to NoSQL databases, the most important thing to consider is how you will be using your data, and to model it based on that.
Using the external node might run into performance problems, as Neo4J typically starts to run into issues during traversing as it approaches around 10,000 relationships in a single node. You will be well above that limit with an external "Registered" node; on the other hand as long as you are not anchoring your search to that node, it should be okay.
No matter which route you go, the query you described in the comments will likely anchor on (start with) the user, then traverse to who their friends are, and then for each friend, it will check whether it
A. has the "registered" property set to 'true'
B. has a relationship to the "Registered" node.
Each of these methods appears to have a similar execution time, and indexing on the "registered" property will have negligible impact because it is not being used as an anchor (presumably; you would have to PROFILE your query with both methods to find out for sure). So, like you mentioned, one might consider the space constraints.
Besides that, there is not much difference from a performance analysis perspective between the two methods that I can see.
A third option, mentioned by @InverseFalcon, is to use an additional label, ':Registered', on those nodes that are registered. This might well result in a faster comparison than keeping it in a property, as labels will be inlined in the node store and can be checked there, whereas properties might have an additional level of indirection to the property store.

store files in Neo4j : neostore.nodestore.db

I'm reading about Neo4j's underlying infrastructure in its book and I think I found a contradiction. In the text it is mentioned that: "The next four bytes represent the ID of the first relationship connected to the node, and the following four bytes represent the ID of the first property for the node".
But as you can see in figure 6-4, the field is called Nextrelid! Which one is correct? And if we only store the first relationship in the node store file, what happens to the other relationships?

From the point of view of the node, the next relationship id is the same thing as "the id of the first relationship connected to the node". They're different ways of describing the same thing.
The pattern here is that relationships are stored as a chain. To iterate over all relationships, from the node, you use the id of the first relationship to jump to that relationship in memory, then jump to the area in memory on that relationship where the next rel id is stored and pointer chase across the rest of the chain.
That said, when relationships reach a particular density (I think it's 50 rels per node) then the structure is somewhat different, a new entity is present between the node and its relationships to allow for more efficient navigation of its relationships.
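
The pointer chasing described above can be sketched in a few lines of Python. This is a simplified illustration, not Neo4j's actual record layout: the record classes, field names, and the `NO_REL` sentinel are assumptions made for the example, the chain is singly linked here, and self-loops and the dense-node structure are ignored.

```python
from dataclasses import dataclass
from typing import Dict, Iterator

NO_REL = -1  # assumed sentinel meaning "end of chain"


@dataclass
class NodeRecord:
    next_rel_id: int  # id of the first relationship in this node's chain


@dataclass
class RelRecord:
    start_node: int
    end_node: int
    start_next_rel: int  # next relationship in the start node's chain
    end_next_rel: int    # next relationship in the end node's chain


def relationships_of(node_id: int,
                     nodes: Dict[int, NodeRecord],
                     rels: Dict[int, RelRecord]) -> Iterator[int]:
    """Yield the ids of all relationships attached to node_id by following
    the chain that starts at the node's first relationship id."""
    rel_id = nodes[node_id].next_rel_id
    while rel_id != NO_REL:
        rel = rels[rel_id]
        yield rel_id
        # Each relationship sits in two chains; follow the one for this node
        if rel.start_node == node_id:
            rel_id = rel.start_next_rel
        else:
            rel_id = rel.end_next_rel


# Three nodes and three relationships: 0->1 (rel 0), 0->2 (rel 1), 1->2 (rel 2)
rels = {
    0: RelRecord(0, 1, NO_REL, NO_REL),
    1: RelRecord(0, 2, 0, NO_REL),
    2: RelRecord(1, 2, 0, 1),
}
nodes = {0: NodeRecord(1), 1: NodeRecord(2), 2: NodeRecord(2)}

chain_of_node_0 = list(relationships_of(0, nodes, rels))
chain_of_node_2 = list(relationships_of(2, nodes, rels))
```

The node record stores only one relationship id, yet every relationship of the node is reachable because each relationship record carries the id of the next one.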

Chord Join DHT - join protocol for second node

I have a distributed hash table (DHT) which is running on multiple instances of the same program, either on multiple machines or for testing on different ports on the same machine. These instances are started after each other. First, the base node is started, then the other nodes join it.
I am a little bit troubled how I should implement the join of the second node, in a way that it works with all the other nodes as well (all have of course the same program) without defining all border cases.
For a node to join, it sends a join message first, which gets passed to the correct node (here it's just the base node) and then answered with a notify message.
With these two messages the predecessor of the base node and the successor of the existing nodes get set. But how does the other property get set? I know, that occasionally the nodes send a stabilise message to their successor, which compares it to its predecessor and returns it with a notify message and the predecessor in case it varies from the sender of the message.
Now, the base node can't send a message, because it doesn't know its successor, the new node can send one, but the predecessor is already valid.
I am guessing, both properties should point to the other node in the end, to be fully joined.
Here is another diagram of what I think the sequence should be if a third node joins. But again, when do I update the properties based on a stabilise message, and when do I send a notify message back? In the diagram it is easy to see, but in code it is hard to decide.

The trick here is to set the successor to the same value as the predecessor if the successor is still NULL after the join message has been received. Everything else gets handled nicely by the rest of the protocol.
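
A minimal sketch of that trick for the two-node case, in Python. The `ChordNode` class and its method names are hypothetical, messages are modelled as direct method calls instead of network traffic, and the stabilise handler only covers the "no predecessor yet" case; the usual ring-interval comparison is left out and would be needed for a third node.

```python
class ChordNode:
    """Hypothetical node with only the state needed for the join."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.predecessor = None
        self.successor = None

    def handle_join(self, joiner):
        # The joining node becomes this node's predecessor
        self.predecessor = joiner
        # The trick: with no successor yet, reuse the predecessor
        if self.successor is None:
            self.successor = self.predecessor
        # Answer the join with a notify message
        joiner.handle_notify(self)

    def handle_notify(self, sender):
        # The notify answer tells the joiner who its successor is
        self.successor = sender

    def stabilise(self):
        # Periodic stabilise message to the successor, if one is known
        if self.successor is not None:
            self.successor.handle_stabilise(self)

    def handle_stabilise(self, sender):
        # Assumption: adopt the sender when no predecessor is known yet
        if self.predecessor is None:
            self.predecessor = sender


base = ChordNode(1)
new = ChordNode(2)
base.handle_join(new)  # join + notify set base.predecessor and new.successor
base.stabilise()       # now possible, because base has a successor
```

After the join, the base node finally has a successor to send stabilise messages to, which is what lets the rest of the protocol fill in the new node's predecessor.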

data model for notification in social network?

I am building a social network with Neo4j. It includes:
Node labels: User, Post, Comment, Page, Group
Relationships: LIKE, WRITE, HAS, JOIN, FOLLOW,...
It is like Facebook.
Example: user A follows user B. When B performs an action such as liking a post, commenting, following another user, following a page, joining a group, etc., that action is sent to A. Similarly, users C, D and E who follow B receive the same notification.
I don't know how to design the data model for this problem, and I have some candidate solutions:
1. Create Notification nodes for every user. When an action is executed, create n notifications for n followers. Benefit: we can check whether a user has seen the notification, right? But the number of nodes increases quickly, power of n.
2. Run a query on every notification API call (from the client application). The query only fetches the list of actions by followed users within a given time window (24 hours, or 2 to 3 days). But then we cannot tell whether a follower has seen a notification, and the query may slow the server down.
3. Create a limited number of nodes, such as 20 or 30 nodes per user.
4. Create unlimited nodes (including the time of the action) and delete those whose time-of-action property is older than 24 hours (the expiry time could be 2 or 3 days).
Who can help me solve this problem? Which solution should I choose, or is there another way?

I believe the best approach is option 1. As you said, you will be able to know whether the follower has read the notification or not. As for the number of notification nodes per follower: this problem is called "supernodes" or "dense nodes", meaning nodes that have too many connections.
The book Learning Neo4j (by Rik Van Bruggen, available for download on the Neo4j web site) talks about the "dense node" or "supernode" and says:
"[supernodes] becomes a real problem for graph traversals because the graph
database management system will have to evaluate all of the connected
relationships to that node in order to determine what the next step
will be in the graph traversal."
The book proposes a solution that consists of adding meta nodes between the follower and the notification (in your case). Each meta node should have at most a hundred connections. If the current meta node reaches 100 connections, a new meta node must be created and added to the hierarchy. The book illustrates this with a figure showing popular artists and their fans.
I think you should not worry about it right now. If your follower nodes become a problem in the future, you will be able to refactor your database schema. For now, keep things simple!
In the series of posts called "Building a Twitter clone with Neo4j", Max de Marzi describes the process of building the model. Maybe it can help you make better decisions about your model!
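
The meta-node idea can be sketched as follows in Python. This is an in-memory illustration under stated assumptions, not code from the book: the class names `MetaNode` and `DenseNode` are hypothetical, the threshold of 100 comes from the description above, and only a single level of meta nodes is modelled instead of a full hierarchy.

```python
MAX_FANOUT = 100  # connections allowed per meta node, as described above


class MetaNode:
    """Intermediate node holding a bounded number of connections."""

    def __init__(self):
        self.members = []


class DenseNode:
    """Node that would otherwise become a supernode (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.meta_nodes = []

    def add(self, member):
        # Start a new meta node when there is none or the current one is full
        if not self.meta_nodes or len(self.meta_nodes[-1].members) >= MAX_FANOUT:
            self.meta_nodes.append(MetaNode())
        self.meta_nodes[-1].members.append(member)


followed = DenseNode("B")
for i in range(250):
    followed.add(f"follower-{i}")

sizes = [len(m.members) for m in followed.meta_nodes]
```

With this layout no single node carries more than `MAX_FANOUT` member connections, at the cost of one extra hop during traversal.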
