Neo4j match Relationship parameters satisfying a certain schema

Using Cypher, is there a generic way to match a path whose relationships satisfy a certain input schema?
I know I can do something like
parameters: {
"age": 20
}
MATCH (n)-[r:MY_RELATION]-() WHERE $age>18 AND $age<24 ...
when I want to match only relationships satisfying the schema { "type": "integer", "minimum": 19, "maximum": 23 }.
But then I have to specify the min and max range within the relationship. What if I want to match strings against a schema, or even more complex types, like an address with sub-parameters, etc.?
Is there a generic way to do it?
Edit:
I'll try to state the question more clearly. What I want is a graph and a (parameterized) query to traverse that graph, such that:
I get all the nodes in the traversal
the relationships in the graph impose certain constraints on the query parameters (like a minimum on an integer)
the traversal only follows edges where the constraint is met
I can put constraints on integers, like min/max, but also on strings, like a pattern, etc.
Edit 2:
What I want may not even be possible.
I want all of the information about the constraint to reside in the edge, including the name of the parameter to test against. So I would want something along the lines of:
parameters: { "age": 20, "location": "A" }
MATCH (n)-[r]-()
WHERE r.testtype='integer' AND getParameterByName(r.testparamname) < r.max
OR r.testtype='string' AND getParameterByName(r.testparamname)=r.allowedStringValue
Of course, as the Neo4j documentation on parameters explains, it should not be possible to dynamically look up a parameter by a name stored in the database.
Is there perhaps some workaround?

[UPDATED]
Original answer:
Your question is not stated very clearly, but I'll try to answer anyway.
I think something like this is what you want:
parameters: {
"minimum": 19,
"maximum": 23
}
MATCH (n)-[r:MY_RELATION]-() WHERE $maximum >= r.age >= $minimum
...
There is no need to specify a "type" parameter. Just make sure your parameter values are of the appropriate type.
New answer (based on updated question):
Suppose the parameters are specified this way (where test indicates the type of test):
parameters: {
"age": 20,
"test": "age_range"
}
Then you could do this (where r would contain the properties test, min, and max):
MATCH (n)-[r:MY_RELATION]-(m)
WHERE r.test = $test AND r.min <= $age <= r.max
RETURN n, r, m;
Or, if you do not need all the relationships to be of the same type, this should also work and may be easier to visualize (where r would be of, say, type "age_range", and contain the properties min and max):
MATCH (n)-[r]-(m)
WHERE TYPE(r) = $test AND r.min <= $age <= r.max
RETURN n, r, m;
To decide which approach to use, profile both with your code and some actual data to see which is faster for you.
Even newer answer (based on edit 2 of the question):
The following parameter and query should do what you want. Square brackets can be used to dynamically specify the name of a property.
parameters: {
"data": {
"age": 20,
"location": "A"
}
}
MATCH (n)-[r]-()
WHERE r.testtype='integer' AND $data[r.testparamname] < r.max
OR r.testtype='string' AND $data[r.testparamname]=r.allowedStringValue
...
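To make the evaluation concrete, here is a plain-Python model of that WHERE clause (not Neo4j driver code). The relationship dicts, their ids, and the helper name edge_allows are hypothetical. It shows two things: $data[r.testparamname] is a lookup whose key comes from the edge itself, and Cypher's AND binds more tightly than OR, so the clause reads as (A AND B) OR (C AND D) without extra parentheses.

```python
# Plain-Python model of the Cypher WHERE clause above; not Neo4j code.
# `data` plays the role of the $data parameter map.
data = {"age": 20, "location": "A"}

# Hypothetical relationships, each represented by its property map.
relationships = [
    {"id": "r1", "testtype": "integer", "testparamname": "age", "max": 30},
    {"id": "r2", "testtype": "integer", "testparamname": "age", "max": 18},
    {"id": "r3", "testtype": "string", "testparamname": "location", "allowedStringValue": "A"},
    {"id": "r4", "testtype": "string", "testparamname": "location", "allowedStringValue": "B"},
]


def edge_allows(r, params):
    # Equivalent of $data[r.testparamname]: the key is read from the edge.
    value = params.get(r["testparamname"])
    if value is None:
        # In Cypher a missing key yields null; in these expressions that leaves
        # the WHERE null, so the row is dropped.
        return False
    # AND binds tighter than OR, so this is (A AND B) OR (C AND D).
    return (r["testtype"] == "integer" and value < r["max"]) or (
        r["testtype"] == "string" and value == r["allowedStringValue"]
    )


allowed = [r["id"] for r in relationships if edge_allows(r, data)]
print(allowed)  # ['r1', 'r3']
```

Because the parameter name lives on the edge, adding a new kind of constraint only needs new relationship properties and one more OR branch, not new query parameters.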

Does this solution meet your requirements?
Consider the following small sample data set:
MERGE (p1:Person {name: 'P 01'})
MERGE (p2:Person {name: 'P 02'})
MERGE (p3:Person {name: 'P 03'})
MERGE (p1)-[:MY_RELATION { minimum: 19, maximum: 23 }]->(p2)
MERGE (p2)-[:MY_RELATION { minimum: 19, maximum: 20 }]->(p3)
This query will only return the nodes and relationships where the supplied parameter fits the relationship constraints (e.g. $age = 21 should return only a single row). It is basically the inverse of cybersam's proposal.
MATCH (s:Person)-[r:MY_RELATION]->(e:Person)
WHERE r.minimum <= $age <= r.maximum
RETURN *
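A quick plain-Python check of the claim that $age = 21 returns a single row on this sample data. The list mirrors the two MERGE'd relationships, and Python's chained comparison has the same inclusive meaning as Cypher's r.minimum <= $age <= r.maximum. This models the filter only (it does not talk to Neo4j), and the helper name matching is hypothetical.

```python
# The two MY_RELATION relationships from the sample data: (start, end, properties).
relationships = [
    ("P 01", "P 02", {"minimum": 19, "maximum": 23}),
    ("P 02", "P 03", {"minimum": 19, "maximum": 20}),
]


def matching(age):
    # Same test as WHERE r.minimum <= $age <= r.maximum (inclusive on both ends).
    return [(s, e) for s, e, r in relationships if r["minimum"] <= age <= r["maximum"]]


print(matching(21))  # [('P 01', 'P 02')]
print(matching(20))  # [('P 01', 'P 02'), ('P 02', 'P 03')]
print(matching(24))  # []
```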

Related

Cypher: how do I check that at least one of the nodes in a set matches a given property?

I have a data model in Neo4j where a Person node may be "merged" with another (not literally merged, just a relationship of the form):
(a:Person)-[:MERGED]-(other:Person)
And, of course, other can in turn be merged with someone else, in a potentially endless path.
I have a query to return a list of persons, with the 'merged' persons (that is, anyone in the :MERGED path) embedded as a property.
MATCH (a:Person)
CALL {
WITH a
MATCH path = (a)-[:MERGED*]-(other)
RETURN COLLECT(other{.label}) as b
}
RETURN a{.label, merged_items:b}
This returns, for example, something like:
{
"label": "John Smith",
"merged_items": [
{
"label": "Toby Jones"
},
{
"label": "Seamus McGibbon"
},
{
"label": "Aaron Drew"
}
]
}
for each of the Persons in this chain of merges (so the full result actually has four items, with each of the connected people appearing as a in turn; this is precisely what I want).
Now, I want to be able to filter the results by the Person.label, but any one of the Persons in the chain could match (either a OR any of the others).
Any idea how I might go about this?
I've tried a lot of different things (any(), for example) but can't get it to work.
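The syntax for any() is WHERE any(e IN list WHERE predicate(e)).
So in your case, something like this should work after the CALL { ... } block, in place of the final RETURN, where $label is the label you want to filter on (a is tested directly, and the merged persons through any()):
WITH a, b
WHERE a.label = $label OR any(e IN b WHERE e.label = $label)
RETURN a{.label, merged_items: b}
You could in principle already apply the filter to the path before you collect. tail(list) excludes a, which is the first node of the path and is already tested on its own:
MATCH ...
WHERE a.label = $label OR any(n IN tail(nodes(path)) WHERE n.label = $label)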
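The intended filter (keep a row when either a itself or anyone in its :MERGED chain has the searched label) can be modelled in plain Python as below. The extra unmerged person Jane Doe and the helper names are illustrative assumptions; the breadth-first search stands in for the undirected (a)-[:MERGED*]-(other) pattern and, unlike COLLECT, de-duplicates people reached by more than one path.

```python
from collections import deque

# Hypothetical data: the chain from the question plus one unmerged person.
people = ["John Smith", "Toby Jones", "Seamus McGibbon", "Aaron Drew", "Jane Doe"]
merged = [
    ("John Smith", "Toby Jones"),
    ("Toby Jones", "Seamus McGibbon"),
    ("Seamus McGibbon", "Aaron Drew"),
]

# :MERGED is matched without direction, so record each edge both ways.
neighbours = {p: set() for p in people}
for x, y in merged:
    neighbours[x].add(y)
    neighbours[y].add(x)


def merged_with(person):
    """Everyone reachable over one or more :MERGED hops, excluding the person."""
    seen = {person}
    queue = deque([person])
    while queue:
        for nxt in neighbours[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(person)
    return sorted(seen)


def filter_by_label(label):
    """Keep a row if a matches OR any merged person matches (the any() test)."""
    rows = []
    for a in people:
        b = merged_with(a)
        if a == label or any(other == label for other in b):
            rows.append({"label": a, "merged_items": b})
    return rows


print([row["label"] for row in filter_by_label("Toby Jones")])
# ['John Smith', 'Toby Jones', 'Seamus McGibbon', 'Aaron Drew']
print(filter_by_label("Jane Doe"))
# [{'label': 'Jane Doe', 'merged_items': []}]
```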

neo4j all relations to certain depth between 2 sets

I'm playing with simple Neo4j queries. My base match is:
MATCH (:Movie { id: '10' })-[*0..3]-(p:Producer)
RETURN p.id
This returns some results, so there are clearly some relationships between movie 10 and some producers. Part of the result set is:
'producer_12'
'producer_18'
'producer_36'
.........
Now I want to return all relationships between movie 10 and producer_12 or producer_18, up to 3 hops. I modified my match:
MATCH (:Movie { id: '10' })-[*0..3]-(p:Producer)
WHERE p.id IN ['producer_12', 'producer_18']
RETURN p.id
This already returns no rows, although I expected producers 12 and 18 to be in the answer. Also, I can't find a way to name the relationship: [r:*0..3] is not accepted.
My final query must get all relationships between two sets, for example (movies 10, 12 or 15) and (producers 12 or 18).
I simulated your scenario.
The sample data:
CREATE (movie:Movie {id : '10'})
CREATE (producer12:Producer {id:'producer_12'})
CREATE (producer18:Producer {id:'producer_18'})
CREATE (producer36:Producer {id:'producer_36'})
CREATE (movie)-[:PRODUCTED_BY]->(producer12)
CREATE (movie)-[:PRODUCTED_BY]->(producer18)
CREATE (movie)-[:PRODUCTED_BY]->(producer36)
Querying:
MATCH (:Movie { id: '10' })-[*0..3]-(p:Producer)
WHERE p.id IN ['producer_12', 'producer_18']
RETURN p.id
The result:
╒═════════════╕
│"p.id"       │
╞═════════════╡
│"producer_12"│
├─────────────┤
│"producer_18"│
└─────────────┘
Your id property on :Movie nodes is probably an integer, not a string. So try changing your query to:
MATCH (:Movie { id: 10 })-[*0..3]-(p:Producer)
WHERE p.id IN ['producer_12', 'producer_18']
RETURN p.id
That is: change '10' to 10.
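The answer's point is that a property filter such as { id: '10' } only matches when the stored value has the same type: Cypher does not coerce the string '10' to the integer 10. The plain-Python model below assumes, as the answer guesses, that the movie id is stored as an integer; the graph keys and helper names are hypothetical. It also models the 0..3 hop bound with a breadth-first search, which finds exactly the nodes that some path of at most three hops reaches.

```python
from collections import deque

# Hypothetical graph mirroring the sample data, but with the movie id stored as 10 (an integer).
nodes = {
    "movie": {"labels": {"Movie"}, "id": 10},
    "p12": {"labels": {"Producer"}, "id": "producer_12"},
    "p18": {"labels": {"Producer"}, "id": "producer_18"},
    "p36": {"labels": {"Producer"}, "id": "producer_36"},
}
edges = [("movie", "p12"), ("movie", "p18"), ("movie", "p36")]

# The pattern -[*0..3]- has no direction, so index each edge both ways.
adjacent = {key: set() for key in nodes}
for a, b in edges:
    adjacent[a].add(b)
    adjacent[b].add(a)


def within_hops(start, max_hops):
    """Keys of nodes reachable from start in 0..max_hops hops (breadth-first)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if depth[current] == max_hops:
            continue  # do not expand beyond the upper bound
        for nxt in adjacent[current]:
            if nxt not in depth:
                depth[nxt] = depth[current] + 1
                queue.append(nxt)
    return set(depth)


def producers_near_movie(movie_id, producer_ids, max_hops=3):
    # Like (:Movie {id: movie_id}): an exact, type-sensitive equality test.
    starts = [k for k, v in nodes.items() if "Movie" in v["labels"] and v["id"] == movie_id]
    found = set()
    for start in starts:
        for key in within_hops(start, max_hops):
            props = nodes[key]
            if "Producer" in props["labels"] and props["id"] in producer_ids:
                found.add(props["id"])
    return sorted(found)


print(producers_near_movie("10", ["producer_12", "producer_18"]))  # []
print(producers_near_movie(10, ["producer_12", "producer_18"]))    # ['producer_12', 'producer_18']
```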
Besides I can't find the way to label the relation. This is not accepted. [r:*0..3].
This is because you are not using a type in the relationship. The : is only used together with a type (for example, [r:SOME_TYPE*0..3]), so remove the : like this: [r *0..3].
EDIT:
From comments:
About the last sentence: It's still working but it says that "This feature is deprecated and will be removed in future versions. Binding relationships to a list in a variable length pattern is deprecated" – user732456
Binding relationships to a list in a variable length pattern has been deprecated since 3.2.0-rc1.
According to this pull request, Cypher queries like:
MATCH (n)-[rs*]-() RETURN rs
will generate a warning and the canonical way to write the same query is:
MATCH p=(n)-[*]-() RETURN relationships(p) AS rs

Return Relationship belonging to a list Neo4j cypher

I have a dataset containing 3M nodes and more than 5M relationships, with about 8 different relationship types. I want to return two nodes, A and B, if they are interconnected.
MATCH (n:WCD_Ent)
USING INDEX n:WCD_Ent(WCD_NAME)
WHERE n.WCD_NAME = "A"
MATCH (m:WCD_Ent)
USING INDEX m:WCD_Ent(WCD_NAME)
WHERE m.WCD_NAME = "B"
MATCH (n) - [r*] - (m)
RETURN n,r,m
This gives me a Java heap space error.
Another condition I want to put in my query: the path between the two nodes A and B must contain one particular relationship type (NAME_MATCH) at least once. Could you help me with this?
Gabor's suggestion is the most important fix; you are blowing up heap space because you are generating a Cartesian product of rows to start, then filtering using the pattern. Generate rows using the pattern and you'll be much more space efficient. If you have an index on WCD_Ent(WCD_NAME), you don't need to specify the index, either; this is something you only do if your query is running very slowly and a PROFILE shows that the query planner is skipping the index. Try this one instead:
MATCH (n:WCD_Ent { WCD_NAME: "A" })-[r*..5]-(m:WCD_Ent { WCD_NAME: "B" })
WHERE ANY(rel IN r WHERE TYPE(rel) = 'NAME_MATCH')
RETURN n, r, m
The WHERE filter here will check all of the relationships in r (which is a list, the way you've assigned it) and ensure that at least one of them matches the desired type.
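A plain-Python model of what -[r*..5]- plus the ANY() filter computes: every path from A to B with one to five hops that never reuses a relationship (Cypher's uniqueness rule within a single pattern), kept only if at least one relationship has type NAME_MATCH. The tiny graph, the X and Y nodes, and the helper names are hypothetical; the point is that the hop bound caps how far the enumeration can run.

```python
# Hypothetical relationships: (id, start node, end node, type). Direction is ignored,
# as in the undirected pattern (n)-[r*..5]-(m).
rels = [
    (1, "A", "X", "KNOWS"),
    (2, "X", "B", "NAME_MATCH"),
    (3, "A", "B", "KNOWS"),
    (4, "B", "Y", "KNOWS"),
]


def paths(start, end, max_hops):
    """Yield each path from start to end with 1..max_hops relationships, none reused."""
    def walk(node, used):
        if used and node == end:
            yield list(used)
        if len(used) == max_hops:
            return  # the upper bound stops further expansion
        for rel in rels:
            _, s, e, _ = rel
            if rel in used or node not in (s, e):
                continue
            used.append(rel)
            yield from walk(e if node == s else s, used)
            used.pop()

    yield from walk(start, [])


def has_type(path, rel_type):
    # Mirrors ANY(rel IN r WHERE TYPE(rel) = 'NAME_MATCH').
    return any(t == rel_type for _, _, _, t in path)


all_paths = [[rel[0] for rel in p] for p in paths("A", "B", 5)]
print(all_paths)  # [[1, 2], [3]]
name_match_paths = [[rel[0] for rel in p] for p in paths("A", "B", 5) if has_type(p, "NAME_MATCH")]
print(name_match_paths)  # [[1, 2]]
```

With a bound of 1 only the direct path [3] is found, which is how the upper bound limits the work done.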
Tore's answer (including the variable relationship upper bound) is the best one for finding whether two nodes are connected and if a certain relationship exists in a path connecting them.
One weakness of most of the solutions given so far is that there is no limit on the variable-length relationship match, meaning the query will crawl your entire graph attempting to match all possible paths, instead of only checking that one such path exists and then stopping. This is likely the cause of your heap space error.
Tore's suggestion of adding an upper bound on the variable-length relationships in your match is a great solution, as it also helps in cases where the two nodes aren't connected, preventing you from having to crawl the entire graph. In all cases, the upper bound should prevent the heap from blowing up.
Here are a couple more possibilities. I'm leaving off the relationship upper bound, but that can easily be added in if needed.
// this one won't check for the particular relationship type in the path
// but doesn't need to match on all possible paths, just find connectedness
MATCH (n:WCD_Ent { WCD_NAME: "A" }), (m:WCD_Ent { WCD_NAME: "B" })
RETURN EXISTS((n)-[*]-(m))
// using shortestPath() will only give you a single path back that works
// however WHERE ANY may be a filter to apply after matches are found
// so this may still blow up, not sure
// (WHERE has to follow the MATCH that binds the path; it cannot follow RETURN)
MATCH (n:WCD_Ent { WCD_NAME: "A" }), (m:WCD_Ent { WCD_NAME: "B" })
MATCH p = shortestPath((n)-[r*]-(m))
WHERE ANY(rel IN r WHERE TYPE(rel) = 'NAME_MATCH')
RETURN p
// Adding LIMIT 1 will only return one path result
// Unsure if this will prevent the heap from blowing up though
// The performance and outcome may be identical to the above query
MATCH (n:WCD_Ent { WCD_NAME: "A" }), (m:WCD_Ent { WCD_NAME: "B" })
MATCH (n)-[r*]-(m)
WHERE ANY(rel IN r WHERE TYPE(rel) = 'NAME_MATCH')
RETURN n, r, m
LIMIT 1
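Why an existence check can be cheaper than enumerating paths, as a plain-Python sketch: the breadth-first search below visits each node at most once and returns as soon as it sees the target, whereas enumerating every path (which an unbounded (n)-[r*]-(m) asks for) can grow exponentially with the size of the graph. This only illustrates the idea behind EXISTS and LIMIT 1; how Neo4j actually plans those queries depends on the version, so PROFILE remains the real test. The graph and the function name connected are hypothetical.

```python
from collections import deque

# Hypothetical undirected graph as an adjacency map; Z is disconnected.
graph = {
    "A": ["C", "D"],
    "C": ["A", "D", "B"],
    "D": ["A", "C", "E"],
    "E": ["D"],
    "B": ["C"],
    "Z": [],
}


def connected(start, target):
    """True if any path links start and target; stops at the first one found."""
    if start == target:
        return True
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt == target:
                return True  # first path found: no need to look for others
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(connected("A", "B"))  # True
print(connected("A", "Z"))  # False
```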
Some enhancements:
Instead of the WHERE condition, you can specify the property value inside the pattern.
You can combine the three MATCH clauses into a single one, which ensures that the query engine will not calculate a Cartesian product of n and m. (You can also use EXPLAIN to visualize the query plan and check this.)
The resulting query:
MATCH (n:WCD_Ent { WCD_NAME: "A" })-[r*]-(m:WCD_Ent { WCD_NAME: "B" })
RETURN n, r, m
Update: Tore Eschliman pointed out that you don't need to specify the indices, so I removed these two lines from the query:
USING INDEX n:WCD_Ent(WCD_NAME)
USING INDEX m:WCD_Ent(WCD_NAME)

Can Cypher filter results based on an attribute of the first encountered node of a given type?

I'm using Neo4j and learning Cypher, and have a question about filtering results based on an attribute of the first encountered node of a given type (in the OPTIONAL MATCH line of the example code below).
My query is as follows:
MATCH
(a:Word),
(b:Word)
WHERE a.lemma IN [ "enjoy" ]
AND b.lemma IN [ "control", "achievement" ]
OPTIONAL MATCH p = shortestPath((a)-[:IS_DEFINED_AS|IS_A_FORM_OF*..15]-(b))
RETURN
a.lemma as From, b.lemma as To,
length(
filter(n in nodes(p) WHERE 'Word' in labels(n))
) - 1 as Shortest_Number_of_Hops_Only_Counting_Words,
length(p) as Shortest_Number_of_Hops_Counting_All_Nodes
Two general types of paths might occur in the database:
(a:Word) <-[IS_A_FORM_OF]- (Morph) -[IS_A_FORM_OF]-> (Word) -[IS_DEFINED_AS]-> (Synset) <-[IS_DEFINED_AS]- (Word) -[IS_DEFINED_AS]-> (Synset) <-[IS_DEFINED_AS]- (b:Word)
and
(a:Word) -[IS_DEFINED_AS]-> (Synset) <-[IS_DEFINED_AS]- (Word) -[IS_DEFINED_AS]-> (Synset) <-[IS_DEFINED_AS]- (b:Word)
There might be any number of hops (currently capped at 15 in the query above) between a and b.
I've tried to give a very specific example above, but my question really is a very general one about using Cypher: I would like to filter for paths in which the first Synset node encountered contains a certain attribute (for example, {part_of_speech: 'verb'}). I've been reading the Cypher refcard and am wondering whether the head() function should be used to somehow select the first Synset node in the path, but I'm unsure how to do it. Is there a straightforward way to add this to the MATCH / WHERE statement?
You can match a Synset node by its property like this:
MATCH (verb:Synset {part_of_speech: 'verb'})
RETURN verb
The variable verb will then match only Synset nodes whose part_of_speech property is "verb".
You can use this variable later in your query. For example, you can write essentially the same query by restricting the node's property value in the WHERE clause:
MATCH (verb:Synset)
WHERE verb.part_of_speech = 'verb'
RETURN verb
Applied to your query, you might rewrite it like this:
MATCH
(a:Word) -[:IS_DEFINED_AS]-> (verb:Synset {part_of_speech: "verb"}),
(b:Word)
WHERE a.lemma IN [ "enjoy" ]
AND b.lemma IN [ "control", "achievement" ]
OPTIONAL MATCH p = shortestPath((a)-[:IS_DEFINED_AS]-(verb)-[:IS_DEFINED_AS|IS_A_FORM_OF*..15]-(b))
RETURN
a.lemma as From, b.lemma as To,
length(
filter(n in nodes(p) WHERE 'Word' in labels(n))
) - 1 as Shortest_Number_of_Hops_Only_Counting_Words,
length(p) as Shortest_Number_of_Hops_Counting_All_Nodes
Oleg Kurbatov's answer does work, but only if (a:Word) is directly connected to a Synset: it doesn't account for cases where (a:Word) must pass through a node of type Morph, etc., before reaching a Synset (as in my first example path in the original question). Additionally, the adding-paths-together approach seems more computationally intensive: 802 ms for my original query vs. 2364 ms for a slightly modified version of Oleg's suggested implementation (modified because Cypher/Neo4j doesn't allow specifying more than one specific hop when using shortestPath()):
MATCH
(a:Word),
(b:Word)
WHERE a.lemma IN [ "enjoy" ]
AND b.lemma IN [ "control", "achievement" ]
MATCH p1 = (a)-[:IS_DEFINED_AS]-> (initial_synset:Synset{pos: 'v'})
OPTIONAL MATCH p2 = shortestPath((initial_synset)-[:IS_DEFINED_AS|IS_A_FORM_OF*..15]-(b))
RETURN
a.lemma as From, b.lemma as To,
length(
filter(n in nodes(p2) WHERE 'Word' in labels(n))
) as Shortest_Number_of_Hops_Only_Counting_Words,
length(p1) + length(p2) as Shortest_Number_of_Hops_Counting_All_Nodes
Taking Oleg's suggestion as a starting point, though, I did figure out one way to filter shortestPath() so that it only settles on a path where the first encountered Synset node has a 'pos' attribute of 'v', without increasing query execution time: I amended the OPTIONAL MATCH line in my original question to read:
OPTIONAL MATCH p = shortestPath((a)-[:IS_DEFINED_AS|IS_A_FORM_OF*..15]-(b))
WHERE head(filter(x in nodes(p) WHERE x:Synset)).pos = 'v'
As I understand it, filter(x in nodes(p) WHERE x:Synset) gets a list of all Synset-type nodes in the path being considered, head(...) gets the first node from that list, and .pos = 'v' checks that this node's "pos" attribute is "v".
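The head(filter(...)) test can be checked with a plain-Python model in which a path is a list of node dicts. The two sample paths follow the shapes given in the question, but their pos values and the helper names are made up for illustration. Note for newer servers: filter() was removed in Neo4j 4.0, and the list-comprehension form head([x IN nodes(p) WHERE x:Synset]).pos = 'v' expresses the same test.

```python
def node(label, **props):
    # Hypothetical node representation: a label set plus properties.
    return {"labels": {label}, **props}


# Paths shaped like the two in the question; pos values are illustrative only.
path_via_morph = [
    node("Word"), node("Morph"), node("Word"),
    node("Synset", pos="v"), node("Word"), node("Synset", pos="n"), node("Word"),
]
path_direct = [
    node("Word"), node("Synset", pos="n"), node("Word"), node("Synset", pos="v"), node("Word"),
]


def first_synset_pos(path):
    # filter(x IN nodes(p) WHERE x:Synset) ...
    synsets = [n for n in path if "Synset" in n["labels"]]
    # ... then head(...).pos; head() of an empty list is null in Cypher, modelled as None.
    return synsets[0].get("pos") if synsets else None


def word_hops(path):
    # Mirrors length(filter(n IN nodes(p) WHERE 'Word' IN labels(n))) - 1.
    return len([n for n in path if "Word" in n["labels"]]) - 1


print(first_synset_pos(path_via_morph))  # v
print(first_synset_pos(path_direct))     # n
print(word_hops(path_via_morph), len(path_via_morph) - 1)  # 3 6
```

Only the first path would pass WHERE head(...).pos = 'v', even though the second also contains a verb Synset further along.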

How to get nodes from relationships with any depth including relation filter

I'm using a query similar to this one:
(n)-[*]->(m)
Any depth.
But I cannot filter the relation name in such a query like this:
(n)-[*:DOES]->(m)
Any depth.
I need to filter on the relationship type, since there are different relationship types along the path. If it helps, here is my graph:
CREATE (Computer { name:'Computer' }), (Programming { name:'Programming' }), (Java { name:'Java' }), (GUI { name:'GUI' }), (Button { name:'Button' }), (Ekin { name:'Ekin' }), (Gunes { name:'Gunes' }), (Ilker { name:'Ilker' }),
(Computer)<-[:IS]-(Programming), (Programming)<-[:IS]-(Java), (Java)<-[:IS]-(GUI), (GUI)<-[:IS]-(Button), (Ekin)-[:DOES]->(Programming), (Ilker)-[:DOES]->(Java), (Ilker)-[:DOES]->(Button), (Gunes)-[:DOES]->(Java)
I'd like to get the names (Ekin, Ilker and Gunes) that have a "DOES" relationship connected to "Programming" at any depth.
Edit:
I'm able to get the values I want by merging the results of two different queries (assume 13 is the id of the top node I want to reach):
START n=node(13)
MATCH p-[:DOES]->()-[*]->(n)
RETURN DISTINCT p
START n=node(13)
MATCH p-[:DOES]->(n)
RETURN DISTINCT p
I want to do it in a single query.
Change the matching pattern to "p-[:DOES]->()-[*0..]->n":
Match p-[:DOES]->()-[*0..]->n
Return distinct p.name
The variable length relationship "[*]" means 1..*. You need 0..* length relationships on the path.
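The difference between [*] (one or more hops) and [*0..] (zero or more) shows up clearly in a plain-Python model of p-[:DOES]->()-[*0..]->n on the question's graph, taking n to be Programming. Relationships are followed only in their stored direction; node names stand in for node ids, and the helper names are hypothetical.

```python
# The question's graph. (Computer)<-[:IS]-(Programming) is stored as Programming -> Computer.
IS = [("Programming", "Computer"), ("Java", "Programming"), ("GUI", "Java"), ("Button", "GUI")]
DOES = [("Ekin", "Programming"), ("Ilker", "Java"), ("Ilker", "Button"), ("Gunes", "Java")]

outgoing = {}
for s, e in IS + DOES:  # the untyped [*] segment may follow any relationship type
    outgoing.setdefault(s, []).append(e)


def reachable(start, min_hops):
    """Nodes reachable from start along outgoing edges in min_hops or more hops (0 or 1)."""
    seen, stack = set(), [start]
    while stack:
        for nxt in outgoing.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return (seen | {start}) if min_hops == 0 else seen


def doers(target, min_hops):
    # p-[:DOES]->(x)-[*min_hops..]->(target), returning DISTINCT p
    return sorted({p for p, x in DOES if target in reachable(x, min_hops)})


print(doers("Programming", 1))  # ['Gunes', 'Ilker']  ([*] misses Ekin)
print(doers("Programming", 0))  # ['Ekin', 'Gunes', 'Ilker']  ([*0..])
```

Ekin's DOES relationship points straight at Programming, so the trailing segment has to be allowed zero hops for Ekin to be included.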
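To update the answer for Neo4j 3.0 syntax (node variables in parentheses), with n bound to the Programming node:
MATCH (p)-[:DOES]->()-[*0..]->(n {name: 'Programming'})
RETURN DISTINCT p.name
It returns the same result as the accepted answer.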
