What is the difference between leaf set and routing table entries in DHT? - dht

I am new to DHT (Distributed Hash Table). I have read theory regarding DHT (Pastry implementation - FreePastry). But I was really confused about distinction between leaf set, routing table and neighborhood set. What is their significance?
Also what is the difference between keys and nodeIds in the DHT ring? It would be really helpful if someone could provide an insight into it.
Thanks in advance.

I Was able to find the following facts about DHT implementation of FreePastry:
Leaf Set- It is the no. of L closest nodes to a given node in the DHT ring. L/2 nodes are larger than the given nodeId and the rest L/2 is smaller than the given nodeId. The leaf set size 'L' is configurable in most of the cases.
Routing table - contains the information about the nodes connected to a given node to which it can send direct messages for routing. [For the process of routing, each node checks whether the address is present in its leaf set. If found then it delivers, else it routes the message to closest Id in the routing table.]
Neighbourhood set - It is deprecated as of now and I ahven't been able to find much info about it.
If someone has a better insight, plz share!

Related

Neo4j query execution process

I am using Neo4j 5.1 Enterprise edition.
I performed the following code:
profile MATCH(d:Dataset {name:'dataset2'})<-[:`has_d`]-(s:Score)-[:`has_a`]->(a:Algorithm {name:'algorithm1'})
MATCH (t:Tag) WHERE t.name IN ['tag1', 'tag2', 'tag3', 'tag4', 'tag5']`
MATCH (i:Image)-[:has_score]->(s)-[:`has_tag`]->(t)
RETURN i LIMIT 100
Due to the profile result is too big, i only post here the important part:
I was expecting it to filter Tag by name before doing Expand.
Why Neo4j did Expand before Filter?
How can i fix it? Is the order of execution irrelevant?
Is Filter#Neo4j a simple filter or it uses our index?
I'm very sorry for asking so many questions, maybe some of them are stupid and obvious, but I don't understand why.
Any help would be greatly appreciated
It needs to follow the relationships by type and direction first from the source nodes.
So it does the expand by type and direction
and only then sees the end node and then can filter those nodes if they have the matching label(s).
If your relationships are already uniquely identifying the target node label, then you can leave off the labels from those (but not from the start node otherwise it won't use the index).

Neo4j cyper query: How to travese

I am trying to learn neo4j, so I just took a use case of a travel app to learn but I am not sure about the optimal way to solve it. Any help will be appreciated.
Thanks in advance.
So consider a use case in which I have to travel from one place (PLACE A) to other (PLACE C) by train, but there is no direct connection between the two places. And so we have to change our train in PLACE B.
Two places are connected via a relation IS_CONNECTED relation. refering to green nodes in the image
And then if there is an is_connected relation between two place then there will be an out going relation i.e. CONNECTED_VIA to a common train from both the node which implies how they are connected referring to red nodes in image
my question is how are we suppose to know that we have to change the station from place b
My understanding is:
We will check where the two places are connected via IS_CONNECTED relationship
match (start:place{name:"heidelberg"}), (end:place{name:"frankfurt"})
MATCH path = (start)-[:IS_CONNECTED*..]->(end)
RETURN path
this will show that these two places are connected
Then we will see that if place A and place c are directly connected or not by the query
match (p:place{name:"heidelberg"})-[:CONNECTED_VIA]->(q)<-[:CONNECTED_VIA]-(t:place{name:"frankfurt"})
return q
And this will return nothing because there is no direct connections
My brain stopped functioning after this. I am trying to figure how from past 3 days. I am sorry I look ao confused
Please click here for the image of what i am referring
You'll want to use variable-length relationships in your :CONNECTED_VIA match, and then get the :Place nodes that are in your path. And it's usually a good idea to use an upper bound, whatever makes sense in your graph.
Then we can use a filter on the nodes in your path to only keep the ones that are :Place nodes.
match path = (p:place{name:"heidelberg"})-[:CONNECTED_VIA*..4]-(t:place{name:"frankfurt"})
return path, [node in nodes(path)[1..-1] where node:Place] as connectionPlaces
And if you're only interested in the shortest paths, you may want to check the shortestPath() or shortestPaths() functions.
One last thing to note...when determining if two locations are connected, if all you need is a true or false if they're connected, you can use the EXISTS() function to return whether such a pattern exists:
match (start:place{name:"heidelberg"}), (end:place{name:"frankfurt"})
return exists((start)-[:IS_CONNECTED*..5]->(end))

how new traverse api works

nowadaya i m learning new traverse api of neo4j and i followed the link below
http://neo4j.com/docs/stable/tutorial-traversal-java-api.html
so now i know how to use uniqueness,evaluater etc.
that is i know how to change beahviours of the api.
but the thing i want to know is that how exactly it traverse.
for example im trying to find neighbours of a node.
does neo4j use index to find this?
does neo4j keep a hash to find neighbours?
more specifically, when i write the following code for example.
TraversalDescription desc = database.traversalDescription().breadthFirst().evaluator( Evaluators.toDepth( 3) );
node =database.getNodeById(4601410);
Traverser traverser = desc.traverse(node);
in my description i used breadthFirst. So it means that when i give node to traverse, the code should find the first neighbours. So how the api finds the first neighbours is the thing i want to know. Is there a pointer to neighbours in node? So when i say traverse until to depth 3 it finds the first neighbours and then take the neighbours as node in a recursive function and so on? So if we say to depth 10 then it can be slow?
so what i want exactly is how i can change the natural behaviour of the api to traverse?
Simplified, Neo4j stores records representing nodes and relationships a.s.o. in its store. Every node is represented by a node record on disk, that record contains a pointer (direct offset into relationship store) for the first relationship (neighbour if you will). Relationship records link to each other, so getting all neighbours for a node will read the node record, its relationship pointer to that relationship record and continue following those forward pointers until the end of that chain. Does that answer your question?
TraversalDescription features a concept of PathExpander - that is the component deciding which relationships will be used for the next step. Use TraversalDescription.expand() for this.
You can either use your own implementation for PathExpander or use one of the predefined methods in PathExpanders.
If you just want your traversal follow specific relationship types you can use TraversalDescription.relationships() to specify those.

Cypher: Ordering Nodes on Same Level by Property on Relationship

I am new to Neo4j and currently playing with this tree structure:
The numbers in the yellow boxes are a property named order on the relationship CHILD_OF.
My goal was
a) to manage the sorting order of nodes at the same level through this property rather than through directed relationships (like e.g. LEFT, RIGHT or IS_NEXT_SIBLING, etc.).
b) being able to use plain integers instead of complete paths for the order property (i.e. not maintaining sth. like 0001.0001.0002).
I can't however find the right hint on how or if it is possible to recursively query the graph so that it keeps returning the nodes depth-first but for the sorting at each level consider the order property on the relationship.
I expect that if it is possible it might include matching the complete path iterating over it with the collection utilities of Cypher, but I am not even close enough to post some good starting point.
Question
What I'd expect from answers to this question is not necessarily a solution, but a hint on whether this is a bad approach that would perform badly anyways. In terms of Cypher I am interested if there is a practical solution to this.
I have a general idea on how I would tackle it as a Neo4j server plugin with the Java traversal or core api (which doesn't mean that it would perform well, but that's another topic), so this question really targets the design and Cypher aspect.
This might work:
match path = (n:Root {id:9})-[:CHILD_OF*]->(m)
WITH path, extract(r in rels(path) | r.order) as orders
ORDER BY orders
if it complains about sorting arrays then computing a number where each digit (or two digits) are your order and order by that number
match path = (n:Root {id:9})-[:CHILD_OF*]->(m)
WITH path, reduce(a=1, r in rels(path) | a*10+r.order) as orders
ORDER BY orders

Nearest nodes to a give node, assigning dynamically weight to relationship types

I need to find the N nodes "nearest" to a given node in a graph, meaning the ones with least combined weight of relationships along the path from given node.
Is is possible to do so with a pure Cypher only solution? I was looking about path functions but couldn't find a viable way to express my query.
Moreover, is it possible to assign a default weight to a relationship at query time, according to its type/label (or somehow else map the relationship type to the weight)? The idea is to experiment with different weights without having to change a property for every relationship.
Otherwise I would have to change the weight property's value to each relationship and re-do it to before each query, which is very time-consuming (my graph has around 10M relationships).
Again, a pure Cypher solution would be the best, or please point me in the right direction.
Please use variable length Cypher queries to find the nearest nodes from a single node.
MATCH (n:Start { id: 0 }),
(n)-[:CONNECTED*0..2]-(x)
RETURN x
Note that the syntax [CONNECTED*0..2] is a range parameter specifying the min and max relationship distance from a given node, with relationship type CONNECTED.
You can swap this relationship type for other types.
In the case you wanted to traverse variably from the start node to surrounding nodes but constrain via a stop criteria to a threshold, that is a bit more difficult. For these kinds of things it is useful to get acquainted with Neo4j's spatial plugin. A good starting point to learn more about Neo4j spatial can be found in this blog post: http://neo4j.com/blog/neo4j-spatial-part1-finding-things-close-to-other-things
The post is a little outdated but if you do some Google searching you can find more updated materials.
GitHub repository: https://github.com/neo4j-contrib/spatial

Resources