Creating stateful nodes in Neo4j

When I say Stateful Node, I mean a node that carries ‘state info,’ such as the path that leads to this node. E.g. R1 is a node, and
state1: link coming from path 1
state2: link coming from path 2
Is there any way I could create such a node in Neo4j? While traversing such a node, I expect it to behave like this:
if state 1 and input is x, then [:has] node1
if state 1 and input is y, then stop
if state 2 and input is z, then [:has] node2
I want to convert node R1 to a stateful node so that it keeps the information mentioned above. Does Neo4j support such nodes? If so, could you guide me to a resource? Also, does Cypher support the ‘stateful’ approach, so that I can set the state according to the path from which R1 is reached?

In the Neo4j architecture, each relationship record stores pointers to its start and end nodes and is chained into doubly linked lists of the relationships attached to those nodes.
It sounds like what you're looking to do is create nodes that store that same information for all relationships that touch them, and then have behavior based on how the graph reaches them.
This is more akin to logic control, and Cypher handles that through filters on relationship type, node labels, and properties.
However, you can always set properties of nodes based on queries. For example:
MATCH (:AUTH_T)-[:HAS]->(n:R1)
SET n.reached_by = "HAS"
Then you could do something with that in the future, like if you want to know if node n was reached by another method.
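The per-state behaviour described in the question is essentially a transition table. As a minimal sketch outside the database (plain Python, not a Neo4j API; the names TRANSITIONS and next_step are hypothetical), the rules can be keyed on the pair (state, input), where the state would be whatever property was set on the node, such as reached_by:

```python
# Hypothetical transition table for node R1:
# (state, input) -> node to follow via [:has], or None to stop.
TRANSITIONS = {
    ("state1", "x"): "node1",
    ("state1", "y"): None,      # explicit stop
    ("state2", "z"): "node2",
}

def next_step(state, value):
    """Return the node to follow, or None if traversal should stop."""
    # Assumption of this sketch: unknown (state, input) pairs also stop.
    return TRANSITIONS.get((state, value))

print(next_step("state1", "x"))
print(next_step("state2", "z"))
```

Application code could read the state property from the node after the query and consult a table like this to decide which relationship to follow next.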

Related

Using cypher or traversal api to match only a single node on extreme sides of a path

Say I have following path in the graph:
(:Type1)<-[:RelType1]-(:Type2)<-[:RelType2]-()<-[*]-(centernode)-[*]->()-[:RelType2]->(:Type2)-[:RelType1]->(:Type1)
Given the <id> of the (:Type1) node on the left side, I am able to MATCH the above path and get the corresponding (:Type1) node on the right side (notice that the path is symmetric and its center is (centernode)). In my use case we receive <id>s of (:Type1) nodes, fetch the corresponding (:Type1) node on the other side, and then process further.
However, it may happen that I get the <id>s of both (:Type1) nodes. In that case separate queries will be fired, one starting at each node, and each will evaluate to the (:Type1) node on the other side, so further execution will continue on both nodes.
Q1. How can I avoid processing both nodes? That is, given the <id>s of two (:Type1) nodes that sit at opposite ends of the same path, how can I ensure that only one of the two queries is executed, so that only one node is processed further and the other is, say, held in a temporary buffer to be processed afterwards (if processing of the first node fails)?
Added fact: Above I have a single path with two (:Type1) nodes at its ends. I may have three or more paths emanating from (centernode), each ending in a (:Type1) node. So I want only one of those (:Type1) nodes to be processed first, and the next (:Type1) node to be processed only if the earlier processing fails.
Q2. Is this scenario even possible with pure Cypher, or do I have to end up using the Neo4j Traversal API? If so, how can this be done, given that I have to ensure uniqueness of the nodes/relationships visited across two different traversals?
Q3. How can I add a path expander in the Traversal API to match a path of type (:Type1)<-[:RelType1]-(:Type2)<-[:RelType2]-()? Should I be doing something like this:
at each traversal `next()`
    if (node is of Type1)
        follow <-[:RelType1]-
    if (node is of Type2)
        follow <-[:RelType2]-
(The above is pseudocode. I am new to the Traversal API. I have gone through all the docs and examples, so I am guessing that inside the expander I have to put if() filters that check the current node's type and decide which relationship type and direction to expand next. The pseudocode above is meant to indicate that.)
Is this how such a Cypher pattern can be written in the Traversal API? Or is there a better way?
An old trick is to use node ids to order pairs (ID(a) < ID(b)), which filters out "duplicate" results. So if you feed all your source IDs into a single query, you can use this trick to filter out duplicates:
WITH [1, 2, 3, 4] AS sourceIds
UNWIND sourceIds AS sourceId
MATCH (source:Type1)
WHERE ID(source) = sourceId
MATCH
(source)<-[:RelType1]-(:Type2)<-[:RelType2]-
()<-[*]-(centernode)-[*]->()
-[:RelType2]->(:Type2)-[:RelType1]->(target:Type1)
WHERE ID(source) < ID(target)
RETURN source, target
Could this work for your use case?
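To see why the ordering filter drops the mirrored result, here is a minimal sketch in plain Python (no database involved; dedupe_pairs and the sample ids are hypothetical). When both end ids are supplied, every symmetric path shows up once per direction, and requiring source < target keeps exactly one of the two:

```python
def dedupe_pairs(pairs):
    """Keep each unordered (source, target) pair once by requiring source < target."""
    return [(s, t) for (s, t) in pairs if s < t]

# Both directions of the same two paths, as separate result rows.
matches = [(1, 4), (4, 1), (2, 3), (3, 2)]
print(dedupe_pairs(matches))  # [(1, 4), (2, 3)]
```

Note that with three or more end nodes around one center, the filter still returns several pairs, so picking a single node to process first would need an extra step, for example taking the smallest id among them.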

Cypher query - Property value change propagation

Hi,
In the above graph, we have a scenario where, when the value property of any one node is updated, the effect of that change must be propagated to the remaining nodes. How should this value change be propagated through a Cypher query?
Appreciate your support
Is the requirement that this particular property should always be the same for this group of nodes? If it must be the same, then I would recommend extracting it into a node instead, and create relationships to that node from all nodes that should be using it.
With the value in a single place, it will only require a single property change on that node and everything will be in the right state.
EDIT
Requirements are rather fuzzy, so my answer will be fuzzy as well.
If you're matching based upon relationship types, then you'll want a variable-length relationship pattern, possibly specifying the allowed types in the match. Such as:
MATCH (start:RNode)-[:R45|R34|R23|R12*]->(r:RNode)
WHERE start.ID = 123 // or however you're matching on your start node
RETURN r
That will match on every single node from your startNode up the relationship chain until there are no more of the allowed relationships to continue traversing.
If you need a more complicated expansion, you may want to look at the APOC Procedure library's Path Expander.
After you find the right matching query, then it should just be a matter of doing the recalculation for all matched nodes.

Most efficient way to get all connected nodes in neo4j

The answer to this question shows how to get a list of all nodes connected to a particular node via a path of known relationship types.
As a follow up to that question, I'm trying to determine if traversing the graph like this is the most efficient way to get all nodes connected to a particular node via any path.
My scenario: I have a tree of groups (group can have any number of children). This I model with IS_PARENT_OF relationships. Groups can also relate to any other groups via a special relationship called role playing. This I model with PLAYS_ROLE_IN relationships.
The most common question I want to ask is MATCH (n {name: "xxx"})-[*]->(o) RETURN o.name, but this seems to be extremely slow even on a small number of nodes (4000 nodes; it takes 5s to return an answer). Note that the graph may contain cycles (n-IS_PARENT_OF->o, n<-PLAYS_ROLE_IN-o).
Is connectedness via any path not something that can be indexed?
As a first point, because the starting node is matched without a label and an indexed property, the query first has to visit ALL the nodes in the graph and open each PropertyContainer to see whether the node has the property name with the value "xxx".
Secondly, if you know an approximate maximum depth of parentship, you may want to limit the depth of the search.
I would suggest you add a label of your choice to your nodes and index the name property.
Use label, e.g. :Group for your starting point and an index for :Group(name)
Then Neo4j can quickly find your starting point without scanning the whole graph.
You can easily see where the time is spent by prefixing your query with PROFILE.
Do you really want all arbitrarily long paths from the starting point? Or just all pairs of connected nodes?
If the latter then this query would be more efficient.
MATCH (n:Group)-[:IS_PARENT_OF|PLAYS_ROLE_IN]->(m:Group)
RETURN n,m
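To illustrate the difference between the two questions above: enumerating every path can revisit the same nodes many times when they are connected by several routes or cycles, whereas plain reachability only needs to visit each node once. A minimal sketch in plain Python (a toy adjacency dict, not how Neo4j evaluates the query; reachable and the sample graph are hypothetical):

```python
from collections import deque

def reachable(graph, start):
    """Return every node reachable from start (excluding start), visiting each once."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:   # the visited set makes cycles harmless
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen

# Toy graph with a cycle: a -> b -> c -> a, plus c -> d.
graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"]}
print(sorted(reachable(graph, "a")))  # ['b', 'c', 'd']
```

The work here is bounded by the number of nodes and relationships, regardless of how many distinct paths exist between them.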

neo4j - how to snapshot everything in a label

I'm using Neo4j to store information about maps and sensors. Every time the map or sensor layout changes I need to keep a copy. I can imagine querying and manually creating said copy, but I'm wondering if it's possible to build a Neo4j query that would do this for me.
So far all I've come up with is a way to replicate the nodes in a given label:
match ( a:some_label { some_params }) with a create ( b:some_label ) set b=a,b.other_id=value;
This would allow me to put version and time stamp info on a given snap shot.
What it doesn't do is copy the edge information. Suggestions? Maybe a second (similar) query?
If I understand you correctly, you are essentially trying to maintain a history of the state of a node and the state of its incoming relationship. One way to do this is to chain the nodes in reverse chronological order.
For example, suppose the nodes in the chain are labeled Some_label and the relationships are of type SOME_TYPE. The head node of the chain is always the current (most recent) node. Unless a Some_label node is chronologically the earliest node in the chain, it will have a SOME_TYPE relationship to the previous version of the node.
Here is how you'd insert a new relationship and node (with some properties) at the head of the chain. (Just to set up this example, I assume that the first node in the chain is linked to by some node labeled HeadRef).
MATCH (x:HeadRef)-[r1:SOME_TYPE]->(a1:Some_label)
CREATE (x)-[r2:SOME_TYPE {x: "ghi"}]->(a2:Some_label {a:123, b: true})-[r:SOME_TYPE]->(a1)
SET r=r1
WITH r1
DELETE r1
Note that this approach is also much more performant than maintaining your own other_id property to link nodes together. You should always use relationships instead -- that is the graph DB way.
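The chain described above behaves like a singly linked list whose head is always the newest version. A minimal in-memory sketch in plain Python (not Neo4j code; the Version and HeadRef classes and their methods are hypothetical names that mirror the labels in the query):

```python
class Version:
    def __init__(self, props, previous=None):
        self.props = props          # snapshot of the node's properties
        self.previous = previous    # plays the role of the SOME_TYPE relationship

class HeadRef:
    def __init__(self):
        self.head = None            # current (most recent) version, if any

    def push(self, props):
        """Insert a new version at the head; the old head becomes its predecessor."""
        self.head = Version(dict(props), previous=self.head)

    def history(self):
        """Yield property snapshots from newest to oldest."""
        node = self.head
        while node is not None:
            yield node.props
            node = node.previous

ref = HeadRef()
ref.push({"a": 1})
ref.push({"a": 123, "b": True})
print(list(ref.history()))  # [{'a': 123, 'b': True}, {'a': 1}]
```

Inserting at the head touches only the reference and the new version, which matches the query's pattern of creating a2 between x and a1 and then deleting the old relationship.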

Neo4j Key-Value List recommended implementation

I've been using Neo4j for a little while now and have an app up and running on it. It's all working really well, and Neo4j has been really good at solving this problem, but I now need to extend the app. I have been trying to implement a key-value list of data in Neo4j, and I'm not sure of the best way to go about it.
I have a List, the list is around 7 million elements in length and so a bit long for just storing the whole list in memory and managing it myself. I tested this and it would consume 3Gb.
My choices are either:
(a) Neo4j is just the wrong tool for the job and I should use an actual key-value data store. I'm a little averse to doing this, as I'd have to introduce another data store just for this list of data.
(b) Use Neo4j by creating a node per key-value pair, setting the key and value as properties on the node. There would be no relationships, just an index to group these nodes together, with the key of the key-value pair exposed as the key on the index.
(c) Create a single node and hold all key-values as properties, this feels wrong, because when getting the node it will load the whole thing into memory, then I'd have to search the properties for the one I'm interested in, and I might as well manage the List myself.
(d) The key is a two part key that actually points to two nodes, so create a relationship and set the value as a property on the relationship. I started down this path, but when it came to doing a lookup for a specific key/value it's not simple and fast, so backed away from this.
Either options 'a' or 'b' feel the way to go.
Any advice would be appreciated.
Example scenario
We have Node A and Node B, with a relationship between the two nodes.
The nodes all have a property of 'foo', with foo having some value.
In this example node A has foo=X and Node B has foo=Y
We then have this list of K/Vs. One of those K/V is Key:X+Y=Value:Z
So, the original idea was to create another relationship between Node A and Node B and store a property on the relationship holding Z. Then create an index on 'foo' and a relationship idx on the new relationship.
When given Key X+Y get the value.
The lookup logic would be: get Node A (from X) and Node B (from Y), then walk through Node A's relationships to Node B, looking for this new relationship type.
While this will work, I do not like the fact that I have to look through all relationships to/from the nodes for a specific type; this is inefficient, especially if there are many relationships of different types.
So the conclusion is to go with option 'a' or 'b', or else I'm trying to do something impractical with Neo4j.
Don't try to store 7 million items in a Neo4j property -- you're right, that's wrong.
Redis and Neo4j often make a good pairing, but I don't quite understand what you're trying to do or what you mean in "d" -- what are the key/value pairs, and how do they relate to the nodes and relationships in the graph? Examples would help.
UPDATE: The most natural way to do this with a graph database is to store it as a property on the edge between the two nodes. Then you can use Gremlin to get its value.
For example, to return a property on an edge that exists between two vertices (nodes) that have some properties:
start = g.idx('vertices')[[key:value]] // start vertex
edge = start.outE(label).as('e') // edge
end = edge.inV.filter{it.someprop == somevalue} // end vertex
prop = end.back('e').prop // edge property
return prop
You could store it in an index like you suggested, but this adds more complexity to your system, and if you need to reference the data as part of the traversal, then you will either have to store duplicate data or look it up in Redis during the traversal, which you can do, see:
Have Gremlin Talk to Redis in Real Time while It's Walking the Graph
https://groups.google.com/d/msg/gremlin-users/xhqP-0wIg5s/bxkNEh9jSw4J
UPDATE 2:
If the IDs of vertices a and b are known ahead of time, then it's even easier:
g.v(a).outE(label).filter{it.inVertex.id == b}.prop
If the vertex objects a and b themselves are already in hand, then it's:
a.outE(label).filter{it.inVertex == b}.prop
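As a rough in-memory analogy for storing the value on the edge (plain Python, not Gremlin or Neo4j; the Graph class, the "KV" label, and the property name are hypothetical), grouping outgoing edges by (start, label) means a lookup for key X+Y only touches edges of the one relevant type:

```python
from collections import defaultdict

class Graph:
    def __init__(self):
        # out_edges[(start, label)][end] -> dict of edge properties
        self.out_edges = defaultdict(dict)

    def add_edge(self, start, label, end, **props):
        self.out_edges[(start, label)][end] = props

    def edge_prop(self, start, label, end, name):
        """Return one property of the start-[label]->end edge, or None if absent."""
        return self.out_edges.get((start, label), {}).get(end, {}).get(name)

g = Graph()
g.add_edge("A", "KV", "B", value="Z")        # Key X+Y -> Value Z, stored on the edge
g.add_edge("A", "OTHER", "B", weight=1)      # unrelated edge type is never scanned
print(g.edge_prop("A", "KV", "B", "value"))  # Z
```

Whether the real store avoids scanning other relationship types depends on the database and version, so treat this only as a picture of the data layout, not a performance claim.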

Resources