Kadelmia Join operation: what does "refresh" mean in the context of join opreation? - dht

The paper says:
To join the network, a node u must have a contact to an already
participating node w. u inserts w into the appropriate k-bucket. u
then performs a node lookup for its own node ID. Finally, u refreshes
all kbuckets further away than its closest neighbor. During the
refreshes, u both populates its own k-buckets and inserts itself into
other nodes’ k-buckets as necessary
The definition of the refresh is the following:
Refreshing means picking a random ID in the bucket’s range and
performing a node search for that ID.
I don't understand the last part. Per my understanding, in the intermediate steps of the node lookup operation, the neighbors of the node u will have stored node u inside their own k-buckets. At the end, node u will store its k neighbors inside the appropriate k-buckets. (as shown in the visualization here)
So I don't understand what value is being gained from performing a refresh after doing the lookup.
A walkthrough example with simple parameters would super helpful!

A lookup targeting the node's own ID is not guaranteed to visit enough other nodes to fill all buckets. So after performing a lookup that fills the home bucket additional refreshes targeting the other buckets are performed to completely fill them.

Related

Creating Stateful nodes in Neo4j

When I say Stateful Node, I mean a node that carries ‘state info,’ such as the path that leads to this node. E.g. R1 is a node, and
state1: link coming from path 1
state2: link coming from path 2
Is there any way I could create such a node in Neo4j? While traversing such a node, I expect it to behave like this:
if state 1, and input is x, then [:has] node1
if state one and input is y, then stop
if state two and input is z, then [: has] node 2.
I want to convert node R1 to a stateful node so that it keeps the information mentioned above. Does Neo4J support such nodes? If so, could you guide me to a resource? Also, does the cipher query support the ‘stateful’ approach so I can set the state according to the path from which R1 is produced?
In the Neo4j architecture, a relationship is a doubly linked-list that stores pointers to the start and end nodes.
It sounds like what you're looking to do is create nodes that store that same information for all relationships that touch it, and then have behavior based on how the graph reaches them.
This is more akin to logic control, and Cypher handles that through filters on relationship type, node labels, and properties.
However, you can always set properties of nodes based on queries. For example:
MATCH (:AUTH_T)-[:HAS]->(n:R1)
SET R1.reached_by = "HAS"
Then you could do something with that in the future, like if you want to know if node n was reached by another method.

neo4j getting from a list of labels that which is a child and which one is parent

I have a problem in which there a number of nodes A,B,C,D
where
B-->A
C-->B
D-->B
and the relation between them is children.
Now I want to query Neo4j to find that from a list of labels (B,C,D) which nodes exists at the bottom of the graph
I am making a bot application. In the neo4j database relations would be stored between different terms.
Like :dog-->:animal
:labra-->:dog
:germanShepard-->:dog
Now If a user asks a qustion tell me about dog then i should be able to get dog label data and if the user asks tell me about labra dog then i should be able to get labra label data.I am breaking the user input into tokens and then trying to find which label is at the bottom.
You can try something like
Match (a:Label) where not (a)<--(:Label) return a
(should work but I didn't test it)
As mentioned in my comment, using a unique label for every single node is going to be costly in the long run, and is going to impact your lookup speed on your queries.
So, if I'm understanding your use case correctly, you're breaking up user input into tokens, and the tokens should match to nodes on the same path in your graph. You want to find the label on the "bottom" of the graph, basically a leaf node, though in your description child nodes point toward their parent. I'll assume it's a :Parent relationship from the child to the parent node.
Here's a query which might do what you want. We'll assume you pass in the list of tokens as a parameter {tokens}. Please review the developer documentation for using parameters.
UNWIND {tokens} as token
MATCH (n)
WHERE labels(n) = token
AND NOT ()-[:Parent]->(n)
RETURN n
This will ensure the nodes you return are not themselves parents of any other node.
However, if you want instead wanted to be able to return nodes even if they were parents of other nodes, then we could instead return the node that is farthest from the root node. This requires a :Root node at the root of your entire graph. For your example in your description, :Root would be the parent of :animal.
UNWIND {tokens} as token
MATCH (n)
WHERE labels(n) = token
MATCH (n)-[r:Parent*]->(:Root)
RETURN n
ORDER BY SIZE(r)
LIMIT 1
Keep in mind that this query isn't guaranteed to work when there are multiple nodes with the same distance to the :Root. For example, if "germanShepard" and "labra" were given as elements of the tokens list, only one of the corresponding nodes would be returned because of the LIMIT 1, with no guarantee of which node would be returned.

Most efficient way to get all connected nodes in neo4j

The answer to this question shows how to get a list of all nodes connected to a particular node via a path of known relationship types.
As a follow up to that question, I'm trying to determine if traversing the graph like this is the most efficient way to get all nodes connected to a particular node via any path.
My scenario: I have a tree of groups (group can have any number of children). This I model with IS_PARENT_OF relationships. Groups can also relate to any other groups via a special relationship called role playing. This I model with PLAYS_ROLE_IN relationships.
The most common question I want to ask is MATCH(n {name: "xxx") -[*]-> (o) RETURN o.name, but this seems to be extremely slow on even a small number of nodes (4000 nodes - takes 5s to return an answer). Note that the graph may contain cycles (n-IS_PARENT_OF->o, n<-PLAYS_ROLE_IN-o).
Is connectedness via any path not something that can be indexed?
As a first point, by not using labels and an indexed property for your starting node, this will already need to first find ALL the nodes in the graph and opening the PropertyContainer to see if the node has the property name with a value "xxx".
Secondly, if you now an approximate maximum depth of parentship, you may want to limit the depth of the search
I would suggest you add a label of your choice to your nodes and index the name property.
Use label, e.g. :Group for your starting point and an index for :Group(name)
Then Neo4j can quickly find your starting point without scanning the whole graph.
You can easily see where the time is spent by prefixing your query with PROFILE.
Do you really want all arbitrarily long paths from the starting point? Or just all pairs of connected nodes?
If the latter then this query would be more efficient.
MATCH (n:Group)-[:IS_PARENT_OF|:PLAYS_ROLE_IN]->(m:Group)
RETURN n,m

neo4j - how to snapshot everything in a label

Im using neo4j to store information about maps and sensors. Every time the map or sensor layout changes I need to keep a copy. I can imagine querying and manually creating said copy but I'm wondering if it's possible to build a neo4j type query that would do this for me.
So far all I've come up with is a way to replicate the nodes in a given label:
match ( a:some_label { some_params }) with a create ( b:some_label ) set b=a,b.other_id=value;
This would allow me to put version and time stamp info on a given snap shot.
What it doesn't do is copy the edge information. Suggestions? Maybe a second (similar) query?
If I understand you correctly, you are essentially trying to maintain a history of the state of a node and the state of its incoming relationship. One way to do this is to chain the nodes in reverse chronological order.
For example, suppose the nodes in the chain are labeled Some_label and the relationships are of type SOME_TYPE. The head node of the chain is always the current (most recent) node. Unless a Some_label node is chronologically the earliest node in the chain, it will have a SOME_TYPE relationship to the previous version of the node.
Here is how you'd insert a new relationship and node (with some properties) at the head of the chain. (Just to set up this example, I assume that the first node in the chain is linked to by some node labeled HeadRef).
MATCH (x:HeadRef)-[r1:SOME_TYPE]->(a1:Some_label)
CREATE (x)-[r2:SOME_TYPE {x: "ghi"}]->(a2:Some_label {a:123, b: true})-[r:SOME_TYPE]->(a1)
SET r=r1
WITH r1
DELETE r1
Note that this approach is also much more performant than maintaining your own other_id property to link nodes together. You should always use relationships instead -- that is the graph DB way.

Neo4j Key-Value List recommended implementation

I've been using Neo4j for a little while now and have an app up and running using Neo4j, its all working really well and Neo4j has been really cool at solving this problem, but I now need to extend the app and having been trying to impl. a Key-Value List of data into Neo4j and I'm not sure the best way to go about it.
I have a List, the list is around 7 million elements in length and so a bit long for just storing the whole list in memory and managing it myself. I tested this and it would consume 3Gb.
My choices are either:
(a) Neo4j is just the wrong tool for the job and I should use an actual key-value data store. A little adverse to do this as I'd have to introduce another data store just for this list of data.
(b) Use Neo4j, by creating a node per key-value setting the key and value as properties on the node, but there is no relationship other then having an index to group these nodes together, exposing the key of the key-value as the key on the index.
(c) Create a single node and hold all key-values as properties, this feels wrong, because when getting the node it will load the whole thing into memory, then I'd have to search the properties for the one I'm interested in, and I might as well manage the List myself.
(d) The key is a two part key that actually points to two nodes, so create a relationship and set the value as a property on the relationship. I started down this path, but when it came to doing a lookup for a specific key/value it's not simple and fast, so backed away from this.
Either options 'a' or 'b' feel the way to go.
Any advice would be appreciated.
Example scenario
We have Node A and Node B which has a relationship between the two Nodes.
The nodes all have a property of 'foo', with foo having some value.
In this example node A has foo=X and Node B has foo=Y
We then have this list of K/Vs. One of those K/V is Key:X+Y=Value:Z
So, the original idea was to create another relationship between Node A and Node B and store a property on the relationship holding Z. Then create an index on 'foo' and a relationship idx on the new relationship.
When given Key X+Y get the value.
Lookup logic would be get Node A (from X) and Node B (from y), then walk through Node A relationships to Node B lookup for this new relationship type.
While this will work, I do not like the fact I have to lookup through all relationships to/from the nodes looking for a specific type this is inefficient. Especially if there are many relationships of different types.
So the conclusion to go with options 'A' or 'B', or I'm trying to do something impractical with Neo.
Don't try to store 7 million items in a Neo4j property -- you're right, that's wrong.
Redis and Neo4j often make a good pairing, but I don't quite understand what you're trying to do or what you mean in "d" -- what are the key/value pairs, and how do they relate to the nodes and relationships in the graph? Examples would help.
UPDATE: The most natural way to do this with a graph database is to store it as a property on the edge between the two nodes. Then you can use Gremlin to get its value.
For example, to return a property on an edge that exists between two vertices (nodes) that have some properties:
start = g.idx('vertices')[[key:value]] // start vertex
edge = start.outE(label).as('e') // edge
end = edge.inV.filter{it.someprop == somevalue} // end vertex
prop = end.back('e').prop // edge property
return prop
You could store it in an index like you suggested, but this adds more complexity to your system, and if you need to reference the data as part of the traversal, then you will either have to store duplicate data or look it up in Redis during the traversal, which you can do, see:
Have Gremlin Talk to Redis in Real Time while It's Walking the Graph
https://groups.google.com/d/msg/gremlin-users/xhqP-0wIg5s/bxkNEh9jSw4J
UPDATE 2:
If the ID of vertex a and b are known ahead of time, then it's even easier:
g.v(a).outE(label).filter{it.inVertex.id == b}.prop
If vertex a and b are known ahead of time, then it's:
a.outE(label).filter{it.inVertex == b}.prop

Resources