how new traverse api works - neo4j

nowadaya i m learning new traverse api of neo4j and i followed the link below
http://neo4j.com/docs/stable/tutorial-traversal-java-api.html
so now i know how to use uniqueness,evaluater etc.
that is i know how to change beahviours of the api.
but the thing i want to know is that how exactly it traverse.
for example im trying to find neighbours of a node.
does neo4j use index to find this?
does neo4j keep a hash to find neighbours?
more specifically, when i write the following code for example.
TraversalDescription desc = database.traversalDescription().breadthFirst().evaluator( Evaluators.toDepth( 3) );
node =database.getNodeById(4601410);
Traverser traverser = desc.traverse(node);
in my description i used breadthFirst. So it means that when i give node to traverse, the code should find the first neighbours. So how the api finds the first neighbours is the thing i want to know. Is there a pointer to neighbours in node? So when i say traverse until to depth 3 it finds the first neighbours and then take the neighbours as node in a recursive function and so on? So if we say to depth 10 then it can be slow?
so what i want exactly is how i can change the natural behaviour of the api to traverse?

Simplified, Neo4j stores records representing nodes and relationships a.s.o. in its store. Every node is represented by a node record on disk, that record contains a pointer (direct offset into relationship store) for the first relationship (neighbour if you will). Relationship records link to each other, so getting all neighbours for a node will read the node record, its relationship pointer to that relationship record and continue following those forward pointers until the end of that chain. Does that answer your question?

TraversalDescription features a concept of PathExpander - that is the component deciding which relationships will be used for the next step. Use TraversalDescription.expand() for this.
You can either use your own implementation for PathExpander or use one of the predefined methods in PathExpanders.
If you just want your traversal follow specific relationship types you can use TraversalDescription.relationships() to specify those.

Related

How to optimise recursive query - Neo4j?

I am developing a contact tracing framework using Neo4j. There are 2 types of nodes, namely Person and Location. There exists a relationship VISITED between a Person and a Location, which has properties startTS and endTS. Example:
Now suppose person 1 is infected. I need to find all the persons who have been in contact with this person. For each person identified, I need to find all other persons who have been in contact with that person. This process is repeated until an identified person has not met anyone. Here is a working code:
MATCH path = (infected:Person {id:'1'})-[*]-(otherPerson:Person)
WITH relationships(path) as rels, otherPerson
WHERE all(i in range(1, size(rels)-1)
WHERE i % 2 = 0
OR (rels[i].endTS >= rels[i-1].startTS AND rels[i].startTS <= rels[i-1].endTS)
)
RETURN otherPerson
The problem is that the process is taking way too much time to complete with large datasets. Can the above query be optimised? Thank you for your help.
For this one, unfortunately, there are some limitations on our syntax for filtering these more complex conditions during expansion. We can cover post-expansion filtering, but you'd want an upper bound otherwise this won't perform well on a more complex graph.
To get what you need today (filtering during-expansion instead of after), you would need to implement a custom procedure in Java leveraging our traversal API, and then call the procedure in your Cypher query.
Advanced syntax that can cover these cases has already been proposed for GQL, and we definitely want that in Cypher. It's on our backlog.

How does Neo4j Traverse a graph using relationship ID's in nodestore file

Something has confused me a lot I was Wondering If you could help me with this please
According to Neo4j graph database book, there are 4 bytes in node store file contains the ID of the nodes relationship . If the node has 100 relationship (and all of them are the node's first relationship in the relationship chain) how does neo4j understand which id to choose??? for example I wrote Match(a:user{Name:'a')-[r:Has-skill]->(b:skill)
Imagine The user node has lot's of relationship but we are interested in [has_skill] relationship how does neo4j understand which id in related to this relationship?
The relationship chain that you are talking about is not the same as a "path". A node does not have more than one relationship that is the first in the chain.
The chain of relationships is a doubly-linked list that contains that Node's relationships. Given that Neo4J already has found the first user in the pattern, it will perform the following steps (or something similar):
Follow the pointer from the node record to the first element of the linked list that contains all of that node's relationships (this first element is the "first relationship in the chain").
For each element of the linked list:
Check if matches the criteria for the searched-for relationship (here, it would be that it has the type HAS_SKILL).
If it does match the criteria, the relationship is kept for future following; if it does not match, it is discarded.
Follow the pointer to the next element in the relationship linked-list (in the "chain"); if at the last element already, exit the loop.
For each of the relationships retrieved by scanning the linked list, follow them to the node they point to and continue evaluating the pattern.
The actual algorithm may differ slightly; e.g. it may use depth-first traversal instead of breadth-first, or it may be optimized in a different way, but the end result is the same.
From Graph Databases, 2nd Edition by Ian Robinson, Jim Webber and Emil Eifrem, page 154:
To find a relationship for a node, we follow that node’s relationship pointer to its first relationship (the LIKES relation‐ ship in this example). From here, we then follow the doubly linked list of relation‐ ships for that particular node (that is, either the start node doubly linked list, or the end node doubly linked list) until we find the relationship we’re interested in.
Finally, #InverseFalcon points out that this will be implemented differently for densely-related nodes, by their estimate at around 50+ relationships. At this point, a slightly different structure is used which groups by types and direction so the cost to search through is reduced.

Most efficient way to get all connected nodes in neo4j

The answer to this question shows how to get a list of all nodes connected to a particular node via a path of known relationship types.
As a follow up to that question, I'm trying to determine if traversing the graph like this is the most efficient way to get all nodes connected to a particular node via any path.
My scenario: I have a tree of groups (group can have any number of children). This I model with IS_PARENT_OF relationships. Groups can also relate to any other groups via a special relationship called role playing. This I model with PLAYS_ROLE_IN relationships.
The most common question I want to ask is MATCH(n {name: "xxx") -[*]-> (o) RETURN o.name, but this seems to be extremely slow on even a small number of nodes (4000 nodes - takes 5s to return an answer). Note that the graph may contain cycles (n-IS_PARENT_OF->o, n<-PLAYS_ROLE_IN-o).
Is connectedness via any path not something that can be indexed?
As a first point, by not using labels and an indexed property for your starting node, this will already need to first find ALL the nodes in the graph and opening the PropertyContainer to see if the node has the property name with a value "xxx".
Secondly, if you now an approximate maximum depth of parentship, you may want to limit the depth of the search
I would suggest you add a label of your choice to your nodes and index the name property.
Use label, e.g. :Group for your starting point and an index for :Group(name)
Then Neo4j can quickly find your starting point without scanning the whole graph.
You can easily see where the time is spent by prefixing your query with PROFILE.
Do you really want all arbitrarily long paths from the starting point? Or just all pairs of connected nodes?
If the latter then this query would be more efficient.
MATCH (n:Group)-[:IS_PARENT_OF|:PLAYS_ROLE_IN]->(m:Group)
RETURN n,m

Nearest nodes to a give node, assigning dynamically weight to relationship types

I need to find the N nodes "nearest" to a given node in a graph, meaning the ones with least combined weight of relationships along the path from given node.
Is is possible to do so with a pure Cypher only solution? I was looking about path functions but couldn't find a viable way to express my query.
Moreover, is it possible to assign a default weight to a relationship at query time, according to its type/label (or somehow else map the relationship type to the weight)? The idea is to experiment with different weights without having to change a property for every relationship.
Otherwise I would have to change the weight property's value to each relationship and re-do it to before each query, which is very time-consuming (my graph has around 10M relationships).
Again, a pure Cypher solution would be the best, or please point me in the right direction.
Please use variable length Cypher queries to find the nearest nodes from a single node.
MATCH (n:Start { id: 0 }),
(n)-[:CONNECTED*0..2]-(x)
RETURN x
Note that the syntax [CONNECTED*0..2] is a range parameter specifying the min and max relationship distance from a given node, with relationship type CONNECTED.
You can swap this relationship type for other types.
In the case you wanted to traverse variably from the start node to surrounding nodes but constrain via a stop criteria to a threshold, that is a bit more difficult. For these kinds of things it is useful to get acquainted with Neo4j's spatial plugin. A good starting point to learn more about Neo4j spatial can be found in this blog post: http://neo4j.com/blog/neo4j-spatial-part1-finding-things-close-to-other-things
The post is a little outdated but if you do some Google searching you can find more updated materials.
GitHub repository: https://github.com/neo4j-contrib/spatial

Neo4j Key-Value List recommended implementation

I've been using Neo4j for a little while now and have an app up and running using Neo4j, its all working really well and Neo4j has been really cool at solving this problem, but I now need to extend the app and having been trying to impl. a Key-Value List of data into Neo4j and I'm not sure the best way to go about it.
I have a List, the list is around 7 million elements in length and so a bit long for just storing the whole list in memory and managing it myself. I tested this and it would consume 3Gb.
My choices are either:
(a) Neo4j is just the wrong tool for the job and I should use an actual key-value data store. A little adverse to do this as I'd have to introduce another data store just for this list of data.
(b) Use Neo4j, by creating a node per key-value setting the key and value as properties on the node, but there is no relationship other then having an index to group these nodes together, exposing the key of the key-value as the key on the index.
(c) Create a single node and hold all key-values as properties, this feels wrong, because when getting the node it will load the whole thing into memory, then I'd have to search the properties for the one I'm interested in, and I might as well manage the List myself.
(d) The key is a two part key that actually points to two nodes, so create a relationship and set the value as a property on the relationship. I started down this path, but when it came to doing a lookup for a specific key/value it's not simple and fast, so backed away from this.
Either options 'a' or 'b' feel the way to go.
Any advice would be appreciated.
Example scenario
We have Node A and Node B which has a relationship between the two Nodes.
The nodes all have a property of 'foo', with foo having some value.
In this example node A has foo=X and Node B has foo=Y
We then have this list of K/Vs. One of those K/V is Key:X+Y=Value:Z
So, the original idea was to create another relationship between Node A and Node B and store a property on the relationship holding Z. Then create an index on 'foo' and a relationship idx on the new relationship.
When given Key X+Y get the value.
Lookup logic would be get Node A (from X) and Node B (from y), then walk through Node A relationships to Node B lookup for this new relationship type.
While this will work, I do not like the fact I have to lookup through all relationships to/from the nodes looking for a specific type this is inefficient. Especially if there are many relationships of different types.
So the conclusion to go with options 'A' or 'B', or I'm trying to do something impractical with Neo.
Don't try to store 7 million items in a Neo4j property -- you're right, that's wrong.
Redis and Neo4j often make a good pairing, but I don't quite understand what you're trying to do or what you mean in "d" -- what are the key/value pairs, and how do they relate to the nodes and relationships in the graph? Examples would help.
UPDATE: The most natural way to do this with a graph database is to store it as a property on the edge between the two nodes. Then you can use Gremlin to get its value.
For example, to return a property on an edge that exists between two vertices (nodes) that have some properties:
start = g.idx('vertices')[[key:value]] // start vertex
edge = start.outE(label).as('e') // edge
end = edge.inV.filter{it.someprop == somevalue} // end vertex
prop = end.back('e').prop // edge property
return prop
You could store it in an index like you suggested, but this adds more complexity to your system, and if you need to reference the data as part of the traversal, then you will either have to store duplicate data or look it up in Redis during the traversal, which you can do, see:
Have Gremlin Talk to Redis in Real Time while It's Walking the Graph
https://groups.google.com/d/msg/gremlin-users/xhqP-0wIg5s/bxkNEh9jSw4J
UPDATE 2:
If the ID of vertex a and b are known ahead of time, then it's even easier:
g.v(a).outE(label).filter{it.inVertex.id == b}.prop
If vertex a and b are known ahead of time, then it's:
a.outE(label).filter{it.inVertex == b}.prop

Resources