Hi,
In the above graph, we have a scenario where in any one of value property of a node is updating, the effect of that value, to be propagated to the remaining nodes. How should this value change event be propagated thru' the cypher query.
Appreciate your support
Is the requirement that this particular property should always be the same for this group of nodes? If it must be the same, then I would recommend extracting it into a node instead, and create relationships to that node from all nodes that should be using it.
With the value in a single place, it will only require a single property change on that node and everything will be in the right state.
EDIT
Requirements are rather fuzzy, so my answer will be fuzzy as well.
If you're matching based upon relationship types, then you'll want some kind of multiplicity on the relationship and maybe specifying allowed types in the match. Such as:
MATCH (start:RNode)-[:R45|R34|R23|R12*]->(r:RNode)
WHERE start.ID = 123 (or however you're matching on your start node)
That will match on every single node from your startNode up the relationship chain until there are no more of the allowed relationships to continue traversing.
If you need a more complicated expansion, you may want to look at the APOC Procedure library's Path Expander.
After you find the right matching query, then it should just be a matter of doing the recalculation for all matched nodes.
Related
I'm new(ish) to Neo4j and I'm attempting to build a tool that allows users on a UI to essentially specify a path of nodes they would like to query neo4j for. For each node in the path they can specify specific properties of the node and generally they don't care about the relationship types/properties. The relationships need to be variable in length because the typical use case for them is they have a start node and they want to know if it reaches some end node without caring about (all of) the intermediate nodes between the start and end.
Some restrictions the user has when building the path from the UI is that it can't have cycles, it can't have nodes who has more than one child with children and nodes can't have more than one incoming edge. This is only enforced from their perspective, not in the query itself.
The issue I'm having is being able to specify filtering on each level of the path without getting strange behavior.
I've tried a lot of variations of my Cypher query such as breaking up the path into multiple MATCH statements, tinkering with the relationships and anything else I could think of.
Here is a Gist of a sample Cypher dump
cypher-dump
This query gives me the path that I'm trying to get however it doesn't specify name or type on n_four.
MATCH path = (n_one)-[*0..]->(n_two)-[*0..]->(n_three)-[*0..]->(n_four)
WHERE n_one.type IN ["JCL_JOB"]
AND n_two.type IN ["JCL_PROC"]
AND n_three.name IN ["INPA", "OUTA", "PRGA"]
AND n_three.type IN ["RESOURCE_FILE", "COBOL_PROGRAM"]
RETURN path
This query is what I'd like to work however it leaves out the leafs at the third level which I am having trouble understanding.
MATCH path = (n_one)-[*0..]->(n_two)-[*0..]->(n_three)-[*0..]->(n_four)
WHERE n_one.type IN ["JCL_JOB"]
AND n_two.type IN ["JCL_PROC"]
AND n_three.name IN ["INPA", "OUTA", "PRGA"]
AND n_three.type IN ["RESOURCE_FILE", "COBOL_PROGRAM"]
AND n_four.name IN ["TAB1", "TAB2", "COPYA"]
AND n_four.type IN ["RESOURCE_TABLE", "COBOL_COPYBOOK"]
RETURN path
What I've noticed is that when I "... RETURN n_four" in my query it is including nodes that are at the third level as well.
This behavior is caused by your (probably inappropriate) use of [*0..] in your MATCH pattern.
FYI:
[*0..] matches 0 or more relationships. For instance, (a)-[*0..]->(b) would succeed even if a and b are the same node (and there is no relationship from that node back to itself).
The default lower bound is 1. So [*] is equivalent to [*..] and [*1..].
Your 2 queries use the same MATCH pattern, ending in ...->(n_three)-[*0..]->(n_four).
Your first query does not specify any WHERE tests for n_four, so the query is free to return paths in which n_three and n_four are the same node. This lack of specificity is why the query is able to return 2 extra nodes.
Your second query specifies WHERE tests for n_four that make it impossible for n_three and n_four to be the same node. The query is now more picky, and so those 2 extra nodes are no longer returned.
You should not use [*0..] unless you are sure you want to optionally match 0 relationships. It can also add unnecessary overhead. And, as you now know, it also makes the query a bit trickier to understand.
Suppose I have 3 subgraphs in Neo4j and I would like to select and delete the whole subgraph if all the nodes in the subgraph matching the filtering criteria that is each node's property value <= 1. However if there is atleast one node within the subgraph that is not matching the criteria then the subgraph will not be deleted.
In this case the left subgraph will be deleted but the right subgraph and the middle one will stay. The right one will not be deleted even though it has some nodes with value 1 because there are also nodes with values greater than 1.
userids and values are the node properties.
I will be thankful if anyone can suggest me the cypher query that can be used to do that. Please note that the query will be on the whole graph, that is on all three subgraphs or more if there are anymore.
Thanks for the clarification, that's a tricky requirement, and it's not immediately clear to me what the best approach is that will scale well with large graphs, as most possibilities seem to be expensive full graph operations. We'll likely need to use a few steps to set up the graph for easier querying later. I'm also assuming you mean "disconnected subgraphs", otherwise this answer won't work.
One start might be to label nodes as :Alive or :Dead based upon the property value. It should help if all nodes are of the same label, and if there's an index on the value property for that label, as our match operations could take advantage of the index instead of having to do a full label scan and property comparison.
MATCH (a:MyNode)
WHERE a.value <= 1
SET a:Dead
And separately
MATCH (a:MyNode)
WHERE a.value > 1
SET a:Alive
Then your query to mark nodes to delete would be:
MATCH (a:Dead)
WHERE NOT (a)-[*]-(:Alive)
SET a:ToDelete
And if all looks good with the nodes you've marked for delete, you can run your delete operation, using apoc.periodic.commit() from APOC Procedures to batch the operation if necessary.
MATCH (a:ToDelete)
DETACH DELETE a
If operations on disconnected subgraphs are going to be common, I highly encourage using a special node connected to each subgraph you create (such as a single :Cluster node at the head of the subgraph) so you can begin such operations on :Cluster nodes, which would greatly speed up these kind of queries, since your query operations would be executed per cluster, instead of per :Dead node.
Setup:
Neo4j and Cypher version 2.2.0.
I'm querying Neo4j as an in-memory instance in Eclipse created TestGraphDatabaseFactory().newImpermanentDatabase();.
I'm using this approach as it seems faster than the embedded version and I assume it has the same functionality.
My graph database is randomly generated programmatically with varying numbers of nodes.
Background:
I generate cypher queries automatically. These queries are used to try and identify a single 'target' node. I can limit the possible matches of the queries by using known 'node' properties. I only use a 'name' property in this case. If there is a known name for a node, I can use it to find the node id and use this in the start clause. As well as known names, I also know (for some nodes) if there are names known not to belong to a node. I specify this in the where clause.
The sorts of queries that I am running look like this...
START
nvari = node(5)
MATCH
(target:C5)-[:IN_LOCATION]->(nvara:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarb:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvare:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarh:LOCATION),
(nvari:C4)-[:IN_LOCATION]->(nvarg:LOCATION),
(nvarj:C2)-[:IN_LOCATION]->(nvarg:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvark:LOCATION),
(nvarm:C3)-[:IN_LOCATION]->(nvarg:LOCATION),
WHERE
NOT(nvarj.Name IN ['nf']) AND NOT(nvarm.Name IN ['nb','nj'])
RETURN DISTINCT target
Another way to think about this (if it helps), is that this is an isomorphism testing problem where we have some information about how nodes in a query and target graph correspond to each other based on restrictions on labels.
Question:
With regards to optimisation:
Would it help to include relation variables in the match clause? I took them out because the node variables are sufficient to distinguish between relationships but this might slow it down?
Should I restructure the match clause to have match/where couples including the where clauses from my previous example first? My expectation is that they can limit possible bindings early on. For example...
START
nvari = node(5)
MATCH
(nvarj:C2)-[:IN_LOCATION]->(nvarg:LOCATION)
WHERE NOT(nvarj.Name IN ['nf'])
MATCH
(nvarm:C3)-[:IN_LOCATION]->(nvarg:LOCATION)
WHERE NOT(nvarm.Name IN ['nb','nj'])
MATCH
(target:C5)-[:IN_LOCATION]->(nvara:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarb:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvare:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarh:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvark:LOCATION)
RETURN DISTINCT target
On the side:
(Less important but still an interest) If I make each relationship in a match clause an optional match except for relationships containing the target node, would cypher essentially be finding a maximum common sub-graph between the query and the graph data base with the constraint that the MCS contains the target node?
Thanks a lot in advance! I hope I have made my requirements clear but I appreciate that this is not a typical use-case for Neo4j.
I think querying with node properties is almost always preferable to using relationship properties (if you had a choice), as that opens up the possibility that indexing can help speed up the query.
As an aside, I would avoid using the IN operator if the collection of possible values only has a single element. For example, this snippet: NOT(nvarj.Name IN ['nf']), should be (nvarj.Name <> 'nf'). The current versions of Cypher might not use an index for the IN operator.
Restructuring a query to eliminate undesirable bindings earlier is exactly what you should be doing.
First of all, you would need to keep using MATCH for at least the first relationship in your query (which binds target), or else your result would contain a lot of null rows -- not very useful.
But, thinking clearly about this, if all the other relationships were placed in separate OPTIONAl MATCH clauses, you'd be essentially saying that you want a match even if none of the optional matches succeeded. Therefore, the logical equivalent would be:
MATCH (target:C5)-[:IN_LOCATION]->(nvara:LOCATION)
RETURN DISTINCT target
I don't think this is a useful result.
In my recent question, Modeling conditional relationships in neo4j v.2 (cypher), the answer has led me to another question regarding my data model and the cypher syntax to represent it. Lets say in my model, there is a node CLT1 that is what I'll call the Source node. CLT1 has relationships to other 286 Target nodes. This is a model of a target node:
CREATE
(Abnormally_high:Label1:Label2:Label3:Label4:Label5:Label6:Label7:Label8:Label9:Label10
{Pro1:'x',Prop2:'y',Prop3:'z'})
Key point: I am assuming the string after the CREATE clause is
The ID of this target node
The ID is significant because its content has domain-specific meaning
and is query-able.
in this case its the phrase ...."Abnormally_high".
I made this assumption based on the movie database example.
CREATE (Keanu:Person {name:'Keanu Reeves', born:1964})
CREATE (Carrie:Person {name:'Carrie-Anne Moss', born:1967})
The first strings after CREATE definitely have domain-specific meaning!
In my earlier post I discuss Problem 2. I find that problem 2 arises because among the 286 target nodes, there are many instances where there was at least one more Target node who shares the identical ID. In this instance, the ID is "Abnormally_high". The other Target nodes may differ in the value of any of Label1 - Label10 or the associated properties.
Apparently, Cypher doesn't like that. In Problem 2, I was discussing the ways to deal with the fact that cypher doesn't like using the same node ID multiple times even though the labels or properties were different.
My problem are my assumptions about the Target node ID.
AM I RIGHT?
I am now thinking that I could instead use this....
CREATE (CLT1_target_1:Label1:Label2:Label3:Label4:Label5:Label6:Label7:Label8:Label9:Label10
{name:'Abnormally_high',Prop2:'y',Prop3:'z'})
If indeed the first string after the CREATE clause is an ID, then all I have to do is put a unique target node identifier.... like CLT1_target_1 and increment up to CLT1_target_286. If I do this, then I can have the name as a property and change whatever label or property I want.
Do I have this right?
You are wrong. In Cypher, a node name (like "Abnormally_high") is just a variable name that exists for the lifetime of the query (and sometimes not even that long). The node name used in a Cypher query is never persisted in any way, and can be any arbitrary string.
Also, in neo4j, the term "ID" has a specific meaning. The neo4j DB will automatically assign a (currently) unique integer ID to each new node. You have no control over the ID value assigned to a node. And when a node is deleted, neo4j can reassign its ID to a new node.
You should read the neo4j manual (available at docs.neo4j.org), especially the section on Cypher, to get a better understanding.
My graph is composed of multiple "sub-graphes" that are disconnected from one another. These sub-graphes are composed of nodes that are connected with a given relation type.
I would like to get (for example) the list of sub-graphes that contain at least one node that has the property "name" equals "John".
It's equivalent to finding one node per subgraph having this property.
One solution would be to find all the nodes having this property and loop through this list to only pick the ones that are not connected to the previously picked ones. But that would be ugly and quite heavy. Is there an elegant way to do that with Cypher?
I'm trying with something along this direction but have no success so far:
START source=node:user('name:"John"')
MATCH source-[r?:KNOWS*]-target
WHERE r is null
RETURN source
Try this one it may help
START source=node:user('name:"John"')
MATCH source-[r:KNOWS]-()-[r2:KNOWS]-target
WHERE NOT(source-[r:KNOWS]-target)
RETURN target