Neo4j: Conditions on Relationships with Depth

What I'm trying to do
Being relatively new to Neo4j, I'm trying to find certain nodes with Cypher in a Neo4j graph database. The nodes should be connected by a chain of relationships of a certain type with further conditions on the relationships:
// Cypher
START self = node(3413)
MATCH (self)<-[rel:is_parent_of*1..100]-(ancestors)
WHERE rel.some_property = 'foo'
RETURN DISTINCT ancestors
What goes wrong
If I drop the depth part *1..100, the query works, but then, of course, it allows only one relationship between self and the ancestors.
But, if I allow the ancestors to be several steps away from self by introducing the depth *1..100, the query fails:
Error: Expected rel to be a Map but it was a Collection
I thought, maybe this syntax defines rel to be is_parent_of*1..100 rather than defining rel to be a relationship of type is_parent_of and allowing a larger relationship depth.
So, I've tried to make my intentions clear by using parentheses: [(rel:is_parent_of)*1..100]. But this causes a syntax error.
I'd appreciate any help to fix this. Thanks!

Nomenclature
Calling *1..100 depth originates in the nomenclature of the neography ruby gem, where this is done using the abstract depth method.
In neo4j, this is called variable length relationships, as can be seen here in the documentation: MATCH / Variable length relationships.
Reason for the error
The reason for the "Expected rel to be a Map but it was a Collection" error is, indeed, that rel does not refer to each single relationship but rather to the entire collection of matched relationships.
For an example, see here in the documentation: MATCH / Relationship variable in variable length relationships.
Solution
First, acknowledge that the identifier refers to a collection (i.e. a set of multiple items) and call it rels instead of rel. Then, in the WHERE clause, state that the condition has to apply to all rel items in the rels collection using the ALL predicate.
// Cypher
START self = node(3413)
MATCH (self)<-[rels:is_parent_of*1..100]-(ancestors)
WHERE ALL (rel in rels WHERE rel.some_property = 'foo')
RETURN DISTINCT ancestors
The ALL predicate is explained here in the documentation: Functions / Predicate Functions.
I was led to this solution by this stackoverflow answer of a related question.
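To make the difference between a single relationship and a collection of relationships concrete, here is a minimal Python sketch of what the query above asks for. It is not how Neo4j evaluates the query; the graph data, the RELS list and the ancestors function are made up for illustration.

```python
# Toy in-memory graph: each relationship is (child, parent, properties),
# meaning parent -[:is_parent_of]-> child. All names and data are made up.
RELS = [
    ("self", "mother", {"some_property": "foo"}),
    ("mother", "grandmother", {"some_property": "foo"}),
    ("self", "father", {"some_property": "bar"}),
    ("father", "grandfather", {"some_property": "foo"}),
]


def ancestors(start, max_depth=100):
    """Collect nodes reachable via chains in which every relationship matches."""
    found = set()
    # Each stack entry is (node, list of relationships walked so far).
    stack = [(start, [])]
    while stack:
        node, rels = stack.pop()
        for rel in RELS:
            child, parent, props = rel
            if child != node or rel in rels or len(rels) >= max_depth:
                continue
            path = rels + [rel]
            # `path` plays the role of `rels` in the query: a list, not one item,
            # so the condition has to be checked for every element.
            if all(r[2]["some_property"] == "foo" for r in path):
                found.add(parent)
                stack.append((parent, path))
    return found
```

In this toy data, grandfather is not returned: the chain leading to it contains one relationship whose some_property is 'bar', so the condition does not hold for all relationships of that chain.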
Long query time
Unfortunately, asking for relationship properties costs a lot of time. The above query, with only a couple of nodes in the database, takes over 3000ms on my development machine.

Related

How can I filter relationship condition for multiple iteration specified with Cypher?

I would like to get all nodes and relationships matching certain condition in relationship.
MATCH(a:DB {TABLE:'CONT',COLUMN:'STATUS_CDE'})-[b:RELATED*..]->(c:DB)
WHERE b.CLAUSE IN ['where','join','unknown']
RETURN a,b,c
But I got the below error message, when I tried to execute the above query.
Type mismatch: expected Map, Node or Relationship but was Collection<Relationship>
I am using Neo4j community edition v3.0.1.
How can I achieve my goal?
This is because you use a variable depth for relationship type RELATED - b is now a collection of relationships, not a single relation upon which you can use the IN operator.
Depending on whether you want every relationship to have one of these values, or just some/one, you can use one of the predicate functions (all, any, none, single) like this:
MATCH(a:DB {TABLE:'CONT',COLUMN:'STATUS_CDE'})-[b:RELATED*..]->(c:DB)
WHERE all(rel IN b WHERE rel.CLAUSE IN ['where','join','unknown'])
RETURN a,b,c
(untested)
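The choice between "every relationship" and "some relationship" can be illustrated with a small Python sketch. The paths dictionary and its CLAUSE values are hypothetical; Python's all and any stand in for the Cypher predicate functions of the same names, and "not any" stands in for none.

```python
# Each list stands for the collection `b` of one matched path;
# the strings are hypothetical CLAUSE values of its relationships.
ALLOWED = {"where", "join", "unknown"}

paths = {
    "p1": ["where", "join"],
    "p2": ["where", "select"],
    "p3": ["select", "order"],
}


def matching(predicate):
    """Names of the paths whose CLAUSE values satisfy the predicate."""
    return sorted(name for name, clauses in paths.items() if predicate(clauses))


# Every relationship on the path has an allowed CLAUSE (Cypher: all).
every = matching(lambda cs: all(c in ALLOWED for c in cs))
# At least one relationship has an allowed CLAUSE (Cypher: any).
some = matching(lambda cs: any(c in ALLOWED for c in cs))
# No relationship has an allowed CLAUSE (Cypher: none).
no_rel = matching(lambda cs: not any(c in ALLOWED for c in cs))
```

With this data, only p1 passes the "all" check, p1 and p2 pass the "any" check, and p3 is the only path for which none of the relationships qualifies.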

How *not* to return the relationships that link a node to itself in neo4J

In the database that I am using, nodes often have multiple relationships to themselves, which makes the resulting graph very messy. As this is for a presentation, how do we structure a Cypher query that does not return the self-referencing relationships?
I have tried
match p=((n:actor) -[*1..3]-> (nd:movie)) where n.name='Craig' and nd.name='Pride_and_prejudice' and not (n)-[]->(n) return p
but it didn't give the desired result.
If you have a lot of relationships from Actor to itself, the variable length path query might not be ideal. It will always include the self-referencing relationships which limits performance and gives too many results. One solution would be to explicitly MATCH the first step and filter for the label:
MATCH p=( (n:actor)-[r1]-(n1)-[*0..2]->(nd:movie) )
WHERE NOT n1:actor
RETURN ...
The *0..2 relationship will catch cases where n1 is a Movie.
Alternatively, you can filter the variable length path for a property as described here: http://neo4j.com/docs/stable/query-match.html#match-match-with-properties-on-a-variable-length-path

Cypher query optimisation - Utilising known properties of nodes

Setup:
Neo4j and Cypher version 2.2.0.
I'm querying Neo4j as an in-memory instance in Eclipse, created with new TestGraphDatabaseFactory().newImpermanentDatabase().
I'm using this approach as it seems faster than the embedded version and I assume it has the same functionality.
My graph database is randomly generated programmatically with varying numbers of nodes.
Background:
I generate cypher queries automatically. These queries are used to try and identify a single 'target' node. I can limit the possible matches of the queries by using known 'node' properties. I only use a 'name' property in this case. If there is a known name for a node, I can use it to find the node id and use this in the start clause. As well as known names, I also know (for some nodes) if there are names known not to belong to a node. I specify this in the where clause.
The sorts of queries that I am running look like this...
START
nvari = node(5)
MATCH
(target:C5)-[:IN_LOCATION]->(nvara:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarb:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvare:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarh:LOCATION),
(nvari:C4)-[:IN_LOCATION]->(nvarg:LOCATION),
(nvarj:C2)-[:IN_LOCATION]->(nvarg:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvark:LOCATION),
(nvarm:C3)-[:IN_LOCATION]->(nvarg:LOCATION)
WHERE
NOT(nvarj.Name IN ['nf']) AND NOT(nvarm.Name IN ['nb','nj'])
RETURN DISTINCT target
Another way to think about this (if it helps), is that this is an isomorphism testing problem where we have some information about how nodes in a query and target graph correspond to each other based on restrictions on labels.
Question:
With regards to optimisation:
Would it help to include relation variables in the match clause? I took them out because the node variables are sufficient to distinguish between relationships but this might slow it down?
Should I restructure the match clause to have match/where couples including the where clauses from my previous example first? My expectation is that they can limit possible bindings early on. For example...
START
nvari = node(5)
MATCH
(nvarj:C2)-[:IN_LOCATION]->(nvarg:LOCATION)
WHERE NOT(nvarj.Name IN ['nf'])
MATCH
(nvarm:C3)-[:IN_LOCATION]->(nvarg:LOCATION)
WHERE NOT(nvarm.Name IN ['nb','nj'])
MATCH
(target:C5)-[:IN_LOCATION]->(nvara:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarb:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvare:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarh:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvark:LOCATION)
RETURN DISTINCT target
On the side:
(Less important, but still of interest.) If I make each relationship in a match clause an optional match, except for relationships containing the target node, would Cypher essentially be finding a maximum common sub-graph (MCS) between the query and the graph database, with the constraint that the MCS contains the target node?
Thanks a lot in advance! I hope I have made my requirements clear but I appreciate that this is not a typical use-case for Neo4j.
I think querying with node properties is almost always preferable to using relationship properties (if you had a choice), as that opens up the possibility that indexing can help speed up the query.
As an aside, I would avoid using the IN operator if the collection of possible values only has a single element. For example, this snippet: NOT(nvarj.Name IN ['nf']), should be (nvarj.Name <> 'nf'). The current versions of Cypher might not use an index for the IN operator.
Restructuring a query to eliminate undesirable bindings earlier is exactly what you should be doing.
First of all, you would need to keep using MATCH for at least the first relationship in your query (which binds target), or else your result would contain a lot of null rows -- not very useful.
But, thinking clearly about this, if all the other relationships were placed in separate OPTIONAL MATCH clauses, you'd essentially be saying that you want a match even if none of the optional matches succeeded. Therefore, the logical equivalent would be:
MATCH (target:C5)-[:IN_LOCATION]->(nvara:LOCATION)
RETURN DISTINCT target
I don't think this is a useful result.

NEO4J shortestPath taking into account a particular relationship pattern

I have a graph where I have chains of nodes that have a relationship [:LINKS_TO] and I can successfully get the shortestPath function to work.
For most of my users this level of detail is fine.
I have another set of users where there is a need for a richer set of information on the relationship. Given that properties on relationships are supposed to represent strengths or scores for the relationship I have created specific nodes to hold descriptive metadata.
This means I have a pattern that says (start)-[:PARTICIPATES]-(middle)-[:REFERENCES]->(end)
There can be any number of nodes between the start and end points in the chain.
I am struggling to get the shortestPath function to return any results for the more detailed chain. Is there a way to do this using Cypher?
You could also have kept your metadata information on the relationships.
For your needs, this should work:
MATCH p = shortestPath((start)-[:PARTICIPATES|:REFERENCES*]->(end))
RETURN nodes(p)
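As a rough illustration of what the pattern above expresses (a shortest path that may only use relationships of the listed types), here is a small Python sketch using breadth-first search. The EDGES data and the shortest_path function are hypothetical, and this is not a description of how Neo4j implements shortestPath.

```python
from collections import deque

# Hypothetical data: (source, relationship type, target).
EDGES = [
    ("a", "PARTICIPATES", "m1"),
    ("m1", "REFERENCES", "b"),
    ("b", "PARTICIPATES", "m2"),
    ("m2", "REFERENCES", "c"),
    ("a", "LINKS_TO", "c"),  # skipped unless LINKS_TO is in the allowed types
]


def shortest_path(start, end, allowed=("PARTICIPATES", "REFERENCES")):
    """Breadth-first search that only follows relationships of allowed types."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for source, rel_type, target in EDGES:
            if source == path[-1] and rel_type in allowed and target not in seen:
                seen.add(target)
                queue.append(path + [target])
    return None  # no path exists using only the allowed types
```

Restricting the allowed types changes the answer: with the default types, the path from a to c runs through the metadata nodes m1 and m2, while allowing only LINKS_TO yields the direct hop.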

Is a DFS Cypher Query possible?

My database contains about 300k nodes and 350k relationships.
My current query is:
start n=node(3) match p=(n)-[r:move*1..2]->(m) where all(r2 in relationships(p) where r2.GameID = STR(id(n))) return m;
The nodes touched in this query are all of the same kind, they are different positions in a game. Each of the relationships contains a property "GameID", which is used to identify the right relationship if you want to pass the graph via a path. So if you start traversing the graph at a node and follow the relationship with the right GameID, there won't be another path starting at the first node with a relationship that fits the GameID.
There are nodes that have hundreds of in and outgoing relationships, some others only have a few.
The problem is, that I don't know how to tell Cypher how to do this. The above query works for a depth of 1 or 2, but it should look like [r:move*] to return the whole path, which is about 20-200 hops.
But if I raise the values, the queries won't finish. I think that Cypher looks at each outgoing relationship at every single path depth relating to the start node, but as I already explained, there is only one right path. So it should do some kind of depth-first search (DFS) instead of a breadth-first search (BFS). Is there a way to do so?
I would consider configuring a relationship index for the GameID property. See http://docs.neo4j.org/chunked/milestone/auto-indexing.html#auto-indexing-config.
Once you have done that, you can try a query like the following (I have not tested this):
START n=node(3), r=relationship:rels(GameID = 3)
MATCH (n)-[r*1..]->(m)
RETURN m;
Such a query would limit the relationships considered by the MATCH clause to just the ones with the GameID you care about. And getting that initial collection of relationships would be fast, because of the indexing.
As an aside: since neo4j reuses its internally-generated IDs (for nodes that are deleted), storing those IDs as GameIDs will make your data unreliable (unless you never delete any such nodes). You may want to generate and use your own unique IDs, store them in your nodes, and use them for your GameIDs; if you do this, you should also create a uniqueness constraint for your own IDs -- this will, as a nice side effect, automatically create an index for your IDs.
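The traversal the question has in mind, following at each node only the one outgoing relationship whose GameID matches, can be sketched in Python as follows. This is only an illustration of the idea, not a Cypher feature; the MOVES data and the follow_game function are made up.

```python
# Hypothetical move graph: node -> list of (GameID, next node).
MOVES = {
    "start": [("g1", "a1"), ("g2", "b1"), ("g3", "c1")],
    "a1": [("g1", "a2"), ("g2", "b2")],
    "a2": [("g1", "a3")],
    "b1": [("g2", "a1")],
}


def follow_game(start, game_id):
    """Walk the single chain of moves labelled with game_id."""
    path = [start]
    node = start
    while True:
        # By the question's assumption, at most one outgoing move matches.
        nxt = next((t for g, t in MOVES.get(node, []) if g == game_id), None)
        if nxt is None or nxt in path:  # end of the game, or guard against cycles
            return path
        path.append(nxt)
        node = nxt
```

Because only the matching relationship is followed at each step, the work grows with the length of the game rather than with the number of relationships attached to each position.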

Resources