Neo4j Key-Value List recommended implementation - neo4j

I've been using Neo4j for a little while now and have an app up and running using Neo4j, its all working really well and Neo4j has been really cool at solving this problem, but I now need to extend the app and having been trying to impl. a Key-Value List of data into Neo4j and I'm not sure the best way to go about it.
I have a List, the list is around 7 million elements in length and so a bit long for just storing the whole list in memory and managing it myself. I tested this and it would consume 3Gb.
My choices are either:
(a) Neo4j is just the wrong tool for the job and I should use an actual key-value data store. A little adverse to do this as I'd have to introduce another data store just for this list of data.
(b) Use Neo4j, by creating a node per key-value setting the key and value as properties on the node, but there is no relationship other then having an index to group these nodes together, exposing the key of the key-value as the key on the index.
(c) Create a single node and hold all key-values as properties, this feels wrong, because when getting the node it will load the whole thing into memory, then I'd have to search the properties for the one I'm interested in, and I might as well manage the List myself.
(d) The key is a two part key that actually points to two nodes, so create a relationship and set the value as a property on the relationship. I started down this path, but when it came to doing a lookup for a specific key/value it's not simple and fast, so backed away from this.
Either options 'a' or 'b' feel the way to go.
Any advice would be appreciated.
Example scenario
We have Node A and Node B which has a relationship between the two Nodes.
The nodes all have a property of 'foo', with foo having some value.
In this example node A has foo=X and Node B has foo=Y
We then have this list of K/Vs. One of those K/V is Key:X+Y=Value:Z
So, the original idea was to create another relationship between Node A and Node B and store a property on the relationship holding Z. Then create an index on 'foo' and a relationship idx on the new relationship.
When given Key X+Y get the value.
Lookup logic would be get Node A (from X) and Node B (from y), then walk through Node A relationships to Node B lookup for this new relationship type.
While this will work, I do not like the fact I have to lookup through all relationships to/from the nodes looking for a specific type this is inefficient. Especially if there are many relationships of different types.
So the conclusion to go with options 'A' or 'B', or I'm trying to do something impractical with Neo.

Don't try to store 7 million items in a Neo4j property -- you're right, that's wrong.
Redis and Neo4j often make a good pairing, but I don't quite understand what you're trying to do or what you mean in "d" -- what are the key/value pairs, and how do they relate to the nodes and relationships in the graph? Examples would help.
UPDATE: The most natural way to do this with a graph database is to store it as a property on the edge between the two nodes. Then you can use Gremlin to get its value.
For example, to return a property on an edge that exists between two vertices (nodes) that have some properties:
start = g.idx('vertices')[[key:value]] // start vertex
edge = start.outE(label).as('e') // edge
end = edge.inV.filter{it.someprop == somevalue} // end vertex
prop = end.back('e').prop // edge property
return prop
You could store it in an index like you suggested, but this adds more complexity to your system, and if you need to reference the data as part of the traversal, then you will either have to store duplicate data or look it up in Redis during the traversal, which you can do, see:
Have Gremlin Talk to Redis in Real Time while It's Walking the Graph
https://groups.google.com/d/msg/gremlin-users/xhqP-0wIg5s/bxkNEh9jSw4J
UPDATE 2:
If the ID of vertex a and b are known ahead of time, then it's even easier:
g.v(a).outE(label).filter{it.inVertex.id == b}.prop
If vertex a and b are known ahead of time, then it's:
a.outE(label).filter{it.inVertex == b}.prop

Related

Creating Stateful nodes in Neo4j

When I say Stateful Node, I mean a node that carries ‘state info,’ such as the path that leads to this node. E.g. R1 is a node, and
state1: link coming from path 1
state2: link coming from path 2
Is there any way I could create such a node in Neo4j? While traversing such a node, I expect it to behave like this:
if state 1, and input is x, then [:has] node1
if state one and input is y, then stop
if state two and input is z, then [: has] node 2.
I want to convert node R1 to a stateful node so that it keeps the information mentioned above. Does Neo4J support such nodes? If so, could you guide me to a resource? Also, does the cipher query support the ‘stateful’ approach so I can set the state according to the path from which R1 is produced?
In the Neo4j architecture, a relationship is a doubly linked-list that stores pointers to the start and end nodes.
It sounds like what you're looking to do is create nodes that store that same information for all relationships that touch it, and then have behavior based on how the graph reaches them.
This is more akin to logic control, and Cypher handles that through filters on relationship type, node labels, and properties.
However, you can always set properties of nodes based on queries. For example:
MATCH (:AUTH_T)-[:HAS]->(n:R1)
SET R1.reached_by = "HAS"
Then you could do something with that in the future, like if you want to know if node n was reached by another method.

Most efficient way to get all connected nodes in neo4j

The answer to this question shows how to get a list of all nodes connected to a particular node via a path of known relationship types.
As a follow up to that question, I'm trying to determine if traversing the graph like this is the most efficient way to get all nodes connected to a particular node via any path.
My scenario: I have a tree of groups (group can have any number of children). This I model with IS_PARENT_OF relationships. Groups can also relate to any other groups via a special relationship called role playing. This I model with PLAYS_ROLE_IN relationships.
The most common question I want to ask is MATCH(n {name: "xxx") -[*]-> (o) RETURN o.name, but this seems to be extremely slow on even a small number of nodes (4000 nodes - takes 5s to return an answer). Note that the graph may contain cycles (n-IS_PARENT_OF->o, n<-PLAYS_ROLE_IN-o).
Is connectedness via any path not something that can be indexed?
As a first point, by not using labels and an indexed property for your starting node, this will already need to first find ALL the nodes in the graph and opening the PropertyContainer to see if the node has the property name with a value "xxx".
Secondly, if you now an approximate maximum depth of parentship, you may want to limit the depth of the search
I would suggest you add a label of your choice to your nodes and index the name property.
Use label, e.g. :Group for your starting point and an index for :Group(name)
Then Neo4j can quickly find your starting point without scanning the whole graph.
You can easily see where the time is spent by prefixing your query with PROFILE.
Do you really want all arbitrarily long paths from the starting point? Or just all pairs of connected nodes?
If the latter then this query would be more efficient.
MATCH (n:Group)-[:IS_PARENT_OF|:PLAYS_ROLE_IN]->(m:Group)
RETURN n,m

how new traverse api works

nowadaya i m learning new traverse api of neo4j and i followed the link below
http://neo4j.com/docs/stable/tutorial-traversal-java-api.html
so now i know how to use uniqueness,evaluater etc.
that is i know how to change beahviours of the api.
but the thing i want to know is that how exactly it traverse.
for example im trying to find neighbours of a node.
does neo4j use index to find this?
does neo4j keep a hash to find neighbours?
more specifically, when i write the following code for example.
TraversalDescription desc = database.traversalDescription().breadthFirst().evaluator( Evaluators.toDepth( 3) );
node =database.getNodeById(4601410);
Traverser traverser = desc.traverse(node);
in my description i used breadthFirst. So it means that when i give node to traverse, the code should find the first neighbours. So how the api finds the first neighbours is the thing i want to know. Is there a pointer to neighbours in node? So when i say traverse until to depth 3 it finds the first neighbours and then take the neighbours as node in a recursive function and so on? So if we say to depth 10 then it can be slow?
so what i want exactly is how i can change the natural behaviour of the api to traverse?
Simplified, Neo4j stores records representing nodes and relationships a.s.o. in its store. Every node is represented by a node record on disk, that record contains a pointer (direct offset into relationship store) for the first relationship (neighbour if you will). Relationship records link to each other, so getting all neighbours for a node will read the node record, its relationship pointer to that relationship record and continue following those forward pointers until the end of that chain. Does that answer your question?
TraversalDescription features a concept of PathExpander - that is the component deciding which relationships will be used for the next step. Use TraversalDescription.expand() for this.
You can either use your own implementation for PathExpander or use one of the predefined methods in PathExpanders.
If you just want your traversal follow specific relationship types you can use TraversalDescription.relationships() to specify those.

neo4j - how to snapshot everything in a label

Im using neo4j to store information about maps and sensors. Every time the map or sensor layout changes I need to keep a copy. I can imagine querying and manually creating said copy but I'm wondering if it's possible to build a neo4j type query that would do this for me.
So far all I've come up with is a way to replicate the nodes in a given label:
match ( a:some_label { some_params }) with a create ( b:some_label ) set b=a,b.other_id=value;
This would allow me to put version and time stamp info on a given snap shot.
What it doesn't do is copy the edge information. Suggestions? Maybe a second (similar) query?
If I understand you correctly, you are essentially trying to maintain a history of the state of a node and the state of its incoming relationship. One way to do this is to chain the nodes in reverse chronological order.
For example, suppose the nodes in the chain are labeled Some_label and the relationships are of type SOME_TYPE. The head node of the chain is always the current (most recent) node. Unless a Some_label node is chronologically the earliest node in the chain, it will have a SOME_TYPE relationship to the previous version of the node.
Here is how you'd insert a new relationship and node (with some properties) at the head of the chain. (Just to set up this example, I assume that the first node in the chain is linked to by some node labeled HeadRef).
MATCH (x:HeadRef)-[r1:SOME_TYPE]->(a1:Some_label)
CREATE (x)-[r2:SOME_TYPE {x: "ghi"}]->(a2:Some_label {a:123, b: true})-[r:SOME_TYPE]->(a1)
SET r=r1
WITH r1
DELETE r1
Note that this approach is also much more performant than maintaining your own other_id property to link nodes together. You should always use relationships instead -- that is the graph DB way.

Assumptions regarding Node ID strings in Neo4j - cypher

In my recent question, Modeling conditional relationships in neo4j v.2 (cypher), the answer has led me to another question regarding my data model and the cypher syntax to represent it. Lets say in my model, there is a node CLT1 that is what I'll call the Source node. CLT1 has relationships to other 286 Target nodes. This is a model of a target node:
CREATE
(Abnormally_high:Label1:Label2:Label3:Label4:Label5:Label6:Label7:Label8:Label9:Label10
{Pro1:'x',Prop2:'y',Prop3:'z'})
Key point: I am assuming the string after the CREATE clause is
The ID of this target node
The ID is significant because its content has domain-specific meaning
and is query-able.
in this case its the phrase ...."Abnormally_high".
I made this assumption based on the movie database example.
CREATE (Keanu:Person {name:'Keanu Reeves', born:1964})
CREATE (Carrie:Person {name:'Carrie-Anne Moss', born:1967})
The first strings after CREATE definitely have domain-specific meaning!
In my earlier post I discuss Problem 2. I find that problem 2 arises because among the 286 target nodes, there are many instances where there was at least one more Target node who shares the identical ID. In this instance, the ID is "Abnormally_high". The other Target nodes may differ in the value of any of Label1 - Label10 or the associated properties.
Apparently, Cypher doesn't like that. In Problem 2, I was discussing the ways to deal with the fact that cypher doesn't like using the same node ID multiple times even though the labels or properties were different.
My problem are my assumptions about the Target node ID.
AM I RIGHT?
I am now thinking that I could instead use this....
CREATE (CLT1_target_1:Label1:Label2:Label3:Label4:Label5:Label6:Label7:Label8:Label9:Label10
{name:'Abnormally_high',Prop2:'y',Prop3:'z'})
If indeed the first string after the CREATE clause is an ID, then all I have to do is put a unique target node identifier.... like CLT1_target_1 and increment up to CLT1_target_286. If I do this, then I can have the name as a property and change whatever label or property I want.
Do I have this right?
You are wrong. In Cypher, a node name (like "Abnormally_high") is just a variable name that exists for the lifetime of the query (and sometimes not even that long). The node name used in a Cypher query is never persisted in any way, and can be any arbitrary string.
Also, in neo4j, the term "ID" has a specific meaning. The neo4j DB will automatically assign a (currently) unique integer ID to each new node. You have no control over the ID value assigned to a node. And when a node is deleted, neo4j can reassign its ID to a new node.
You should read the neo4j manual (available at docs.neo4j.org), especially the section on Cypher, to get a better understanding.

Resources