Chord Join DHT - join protocol for second node - join

I have a distributed hash table (DHT) which is running on multiple instances of the same program, either on multiple machines or for testing on different ports on the same machine. These instances are started after each other. First, the base node is started, then the other nodes join it.
I am a little bit troubled how I should implement the join of the second node, in a way that it works with all the other nodes as well (all have of course the same program) without defining all border cases.
For a node to join, it sends a join message first, which gets passed to the correct node (here it's just the base node) and then answered with a notify message.
With these two messages the predecessor of the base node and the successor of the existing nodes get set. But how does the other property get set? I know, that occasionally the nodes send a stabilise message to their successor, which compares it to its predecessor and returns it with a notify message and the predecessor in case it varies from the sender of the message.
Now, the base node can't send a message, because it doesn't know its successor, the new node can send one, but the predecessor is already valid.
I am guessing, both properties should point to the other node in the end, to be fully joined.
Here another diagram what I think should be the sequence i the third node joins. But again, when do I update the properties based on a stabilise message, when do I send a notify message back? In the diagram it is easy to see, but in code it is hard to decide.

Th trick is here to set the successor to the same value as the predecessor if it is NULL after the join-message has been received. Everything else gets handled nicely by the rest of the protocol.

Related

How does the Minimum spanning tree in neo4j work

I am playing around with some graph theory algorithms in neo4j. I am trying to find the minimum spanning tree (mst) within my network. I synthetically created a network of 10 000 people. Each person has 12 relationship types each one linking him back to the other 9999 and each relationship with its own weight assigned.
The problem I have however is the fact that according to the definition the results must be a tree spanning over the ENTIRE network. The neo4j function however only returns a very small sub-graph (only about 12 nodes) of the entire network.
The code I am using looks like this:
MATCH (a:Name {Name:"Dillon Snow"})
CALL algo.mst(a,"Weight",{stats:true})
YIELD loadMillis, computeMillis, writeMillis, weightSum, weightMin, weightMax, relationshipCount
RETURN loadMillis, computeMillis, writeMillis, weightSum, weightMin, weightMax, relationshipCount
What can I change to get the function to return the mst spreading through the entire network
algo.mst.* has not been adapted to the matured Neo4j-Graph-Algorithms-CoreAPI in its current release (3.2.5.2/3.3.0.0 # Dec 2017) which might lead to unexpected results. But there is a pull request in the pipe, you can expect some changes in the next release.
Anyway.. The procedure should add a new relationship-type (default mst) to your nodes. In a connected graph each node should be connected as well while a disconnected graph leads to connections only between the nodes of this particular connected component (from your startNode).
If i understand you right you have multiple relationship types and more then one of them between a pair of nodes? E.g. Node A is connected to Node B with several relations, each of them with a different type and property value. This is a problem. In general the Graph-Algorithms-API does not support multible releationships. Each pair of nodes can only have one connection per direction. Although you can import multible types the core-api itself has no idea of the underlying type. If multible relationships between a pair of nodes get imported usualy the last one wins. This has been mentioned in the documentation ;)
To overcome this limitation you could replace your relationship types with some kind of artificial nodes. When traversing over the result tree the occurence of one of those nodes would indicate the original relationship.

Efficiently check if there is at least one relationship of type connected to node, if not - remove node

Let's assume this Neo4j data construct.
(:Store)<-[:FROM]-(:Notification)-[:FOR]->(//users//:User)
Which should serve as, for example, a notification for users on a new sale in a store.
Such a notification may be addressed to a large number of users simultaneously, in this case I can see two approaches for this data modelling:
Create separate :Notification with relationships for each of the :User's; All connected to single :Store node. So when notification is received - relationships and :Notification node is removed.
Approach which i thought should be more effective performance wise and about which my question is: Create a single :Notification node connected to a store and multiple [:FOR]'s for different users. Notification received - :FOR for this user removed, if no :FOR's left - :Notification itself removed .
So my questions are:
Am I correct in assuming that the 2nd one is a better practice?
How can I generally, after deleting a relationship, check if there are more such relationships, and do it without making Neo4j match all connected ones of this type, i.e check if there is at least one relationship of this type left?
Definitely having fewer objects created in the first place would be more efficient in general. It will result in 2n fewer objects (1 node and relationship per user). The exception would be if you had so many users per notification that the node density was too high. To avoid that though you could simply create additional notifications at a particular user threshold.
The following query is adapted from #Luanne's answer to this question which i though was pretty slick.
It presumes that besides the User nodes, the only other connection to the Notification nodes is the store nodes. If the degree of nodes connected to the Notification node is one then it must be just a Store node remaining. Remove the relationship to the Store node and remove then remove the Notification node.
MATCH (n:Notification {name: 'Notification One'} )-[store_rel:FROM]->(s:Store)
WHERE size((n)--())=1
DELETE store_rel, n

neo4j - how to snapshot everything in a label

Im using neo4j to store information about maps and sensors. Every time the map or sensor layout changes I need to keep a copy. I can imagine querying and manually creating said copy but I'm wondering if it's possible to build a neo4j type query that would do this for me.
So far all I've come up with is a way to replicate the nodes in a given label:
match ( a:some_label { some_params }) with a create ( b:some_label ) set b=a,b.other_id=value;
This would allow me to put version and time stamp info on a given snap shot.
What it doesn't do is copy the edge information. Suggestions? Maybe a second (similar) query?
If I understand you correctly, you are essentially trying to maintain a history of the state of a node and the state of its incoming relationship. One way to do this is to chain the nodes in reverse chronological order.
For example, suppose the nodes in the chain are labeled Some_label and the relationships are of type SOME_TYPE. The head node of the chain is always the current (most recent) node. Unless a Some_label node is chronologically the earliest node in the chain, it will have a SOME_TYPE relationship to the previous version of the node.
Here is how you'd insert a new relationship and node (with some properties) at the head of the chain. (Just to set up this example, I assume that the first node in the chain is linked to by some node labeled HeadRef).
MATCH (x:HeadRef)-[r1:SOME_TYPE]->(a1:Some_label)
CREATE (x)-[r2:SOME_TYPE {x: "ghi"}]->(a2:Some_label {a:123, b: true})-[r:SOME_TYPE]->(a1)
SET r=r1
WITH r1
DELETE r1
Note that this approach is also much more performant than maintaining your own other_id property to link nodes together. You should always use relationships instead -- that is the graph DB way.

Is node reference equality in embedded neo4j guaranteed?

I am using an embedded graph database as part of a java application. Suppose that I carry out some type of cypher query, and return an ExecutionResult which contains a collection of nodes.
These nodes may be assumed to form a connected graph.
Each of these nodes has some relationships, which I can access using node.getRelationships(Direction.OUTGOING). My question is, if the target of one of these relationships already occurs in the Execution result (i.e. the relationship is part of the query template), is it guaranteed that Relationship.getEndPoint == Node X.
I suppose that what I am really asking is, when a transaction in Neo4j returns a node, does it return just the one object, and different queries will just keep returning references to that one object, or does it keep producing new objects which happen to refer to the same data point? Since Node doesn't override the equalsTo method, I have been assuming the former, but I was hoping someone could tell me.
Nodes are not reference-equals. You'll only get NodeProxy objects which are created on the fly in different operations.
But the equals()-method does id-equality so you should use that.
n1.equals(n2)
or if you keep the node id around use
n1.getId() == n2.getId()
See when you create a node neo4j internally assigns it a node-id. All the relationships you create will have reference to the start node id and end node id.
For checking do this
First create a node and save its node id by calling method node.getId()
Now create a relationship to it from another node. And call your relationship.getEndNode().getId() .
You will see the node-ids are same.
It sounds like your asking - does Neo 'out of the box' give concurrency control of database entities, like n-hibernate or entity framework does for SQL.
The answer is no! You will have to manage it yourself. If you do delelop it though, could make you a few bob

Erlang scalability questions related to gen_server:call()

in erlang otp when making a gen_server:call(), you have to send in the name of the node, to which you are making the call.
Lets say I have this usecase:
I have two nodes: 'node1' and 'node2' running. I can use those nodes to make gen_server:call() to each other.
Now lets say I added 2 more nodes: 'node3' and 'node4' and pinged each other so that all nodes can see and make gen_server:calls to each other.
How do the erlang pros handle the dynamic adding of new nodes like that so that they know the new node names to enter into the gen_server calls, or is it a requirement to know beforehand the names of all the nodes so that they are hardcoded in somewhere like the sys.config?
you can use:
erlang:nodes()
to get a "now" view of the node list
Also, you can use:
net_kernel:monitor_nodes(true) to be notified as nodes come and go (via ping / crash / etc)
To see if your module is running on that node you can either call the gen_server with some kind of ping callback
That, or you can use the rpc module to call erlang:whereis(name) on the foreign node.

Resources