Cypher: how do I check that at least one of the nodes in a set matches a given property? - neo4j

I have a data model in neo4j where a Person node may be "merged" with another — not literally merged, just a relation in the form:
(a:Person)-[:MERGED]-(other:Person)
And, of course, other can be merged with someone else, in a potentially endless path.
I have a query to return a list of persons, with the 'merged' persons — that is, anyone in the :MERGED path — embedded as a property.
MATCH (a:Person)
CALL {
WITH a
MATCH path = (a)-[:MERGED*]-(other)
RETURN COLLECT(other{.label}) as b
}
RETURN a{.label, merged_items:b}
This returns, for example, something like:
{
"label": "John Smith",
"merged_items": [
{
"label": "Toby Jones"
},
{
"label": "Seamus McGibbon"
},
{
"label": "Aaron Drew"
}
]
}
for each of the Persons in this chain of merges (so actually the full result has four items, with each of the connected people being a — this is precisely what I want).
Now, I want to be able to filter the results by the Person.label, but any one of the Persons in the chain could match (either a OR any of the others).
Any idea how I might go about this?
I've tried a lot of different things (any(), for example) but can't get it to work.

The syntax for any() is WHERE any(e IN list WHERE predicate(e))
So in your case, this should work.
WITH a, COLLECT(other{.label}) as b
WHERE any(e IN b WHERE e.label = a.label)
RETURN b
In principle you could apply the filter to the path itself, before you collect. The tail() call drops the first node of the path, which is a itself.
MATCH ...
WHERE any(n in tail(nodes(path)) WHERE n.label = a.label)
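If the goal is to filter by a label value supplied from outside, one possible shape is to keep the subquery from the question and filter after it. This is only a sketch: the parameter name $label is an assumption and does not appear in the original posts.

```cypher
// Sketch only: $label is a hypothetical query parameter holding the name to search for.
MATCH (a:Person)
CALL {
  WITH a
  MATCH (a)-[:MERGED*]-(other)
  RETURN collect(other{.label}) AS b
}
// Keep the row when a itself matches OR any person in its merge chain matches.
WITH a, b
WHERE a.label = $label OR any(e IN b WHERE e.label = $label)
RETURN a{.label, merged_items: b}
```

Because collect() returns an empty list when nothing matches, a person with no merges still produces a row and is kept only if their own label matches.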

Related

Neo4j: Cypher query returns wrong json result

I have a problem with my cypher query.
Situation explained:
A user is able to connect to other CONTACT nodes, but he can also connect to EVENT nodes. Other users can also connect to these event nodes. We expect to retrieve the nodes we are connected to (CONTACT & EVENT) but we also need to retrieve the event nodes of the CONTACT nodes that we are connected to.
This is the graph we want to see when we retrieve the connected nodes from the bottom center CONTACT node:
But we receive this json output:
{
"_type": "Node",
"_id": 1,
"nodeType": "EVENT",
"nodeId": 1,
"connected_with": [
{
"_type": "Node",
"_id": 0,
"nodeType": "CONTACT",
"nodeId": 1
},
{
"_type": "Node",
"_id": 2,
"nodeType": "CONTACT",
"nodeId": 2,
"connected_with": [
{
"_type": "Node",
"_id": 0,
"nodeType": "CONTACT",
"nodeId": 1
}
]
}
]
}
We want to go 2 levels deep: we want to see the contacts we are connected to, and also the contacts we "met" at an event.
We currently have this cypher query running but as previously mentioned, it's not working.
MATCH path = (n:Node {nodeId: 1})<-[:CONNECTED_WITH*]-(nodes)
WITH collect(path) as paths
CALL apoc.convert.toTree(paths) yield value as json
RETURN json
Any help would be appreciated!
Your results seem to match what you say you want, except that it is in tree form (which you asked for).
You state that you do not "see" what you expected (presumably in the neo4j Browser). This is because the results you asked for are not plain nodes, relationships, and/or paths.
Try this, instead (note also the upper bound of 2 on the depth of the variable-length path pattern):
MATCH path = (n:Node {nodeId: 1})<-[:CONNECTED_WITH*..2]-(nodes)
RETURN path
Aside: Having just a single node label, Node, with a nodeType property that specifies the exact "type" of node is not generally the right way to model things. It makes it harder to understand the DB, tends to complicate your code, and makes it harder to take advantage of indexing. You probably want to have separate labels (say, Person and Event). You may also want to have different relationship types as well.
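As a rough illustration of the aside above, the same kind of data could be modelled with separate labels and relationship types. Everything named here (the Person and Event labels, the KNOWS and ATTENDED types, the personId and eventId properties) is hypothetical, and the statement writes three throwaway nodes, so run it against a scratch database.

```cypher
// Hypothetical model: Person and Event labels, KNOWS and ATTENDED relationship types.
CREATE (me:Person {personId: 1}),
       (friend:Person {personId: 2}),
       (party:Event {eventId: 1}),
       (me)-[:KNOWS]->(friend),
       (me)-[:ATTENDED]->(party),
       (friend)-[:ATTENDED]->(party)
WITH me
// Up to two hops from me, in either direction, over both relationship types.
MATCH path = (me)-[:KNOWS|ATTENDED*..2]-(other)
RETURN path
```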

Neo4j match Relationship parameters satisfying a certain schema

Using cypher, is there any way to match a path where the relationships satisfy a certain input schema generically?
I know I can do something like
parameters: {
"age": 20
}
MATCH (n)-[r:MY_RELATION]-() WHERE $age>18 AND $age<24 ...
when I want to match only relations satisfying the schema { "type": "integer", "minimum": 19, "maximum": 23 }.
But then I have to hard-code the min and max range in the query. What if I want to match strings against a schema, or even more complex types like an address with subparameters?
Is there a generic way to do it?
edit:
I'll try and state the question more clearly. What I want is a graph and a (parameterized) query to traverse that graph, such that:
I get all the nodes in the traversal
the relationships in the graph pose certain constraints on the query parameters (like a minimum on an integer)
the traversal only follows edges where the constraint is met
I need to put constraints on integers (like min/max) as well as on strings (like a pattern), etc.
edit 2:
What I want may not even be possible.
I want all of the information about the constraint to reside in the edge, including the parameter to test against. So I would want something along the lines of
parameters: { "age": 20, "location": "A" }
MATCH (n)-[r]-()
WHERE r.testtype='integer' AND getParameterByName(r.testparamname) < r.max
OR r.testtype='string' AND getParameterByName(r.testparamname)=r.allowedStringValue
Of course, as can be read in the neo4j documentation about parameter functionality it should not be possible to dynamically load the parameter via a name that resides in the DB.
There may yet be some workaround?
[UPDATED]
Original answer:
Your question is not stated very clearly, but I'll try to answer anyway.
I think something like this is what you want:
parameters: {
"minimum": 19,
"maximum": 23
}
MATCH (n)-[r:MY_RELATION]-() WHERE $maximum >= r.age >= $minimum
...
There is no need to specify a "type" parameter. Just make sure your parameter values are of the appropriate type.
New answer (based on updated question):
Suppose the parameters are specified this way (where test indicates the type of test):
parameters: {
"age": 20,
"test": "age_range"
}
Then you could do this (where r would contain the properties test, min, and max):
MATCH (n)-[r:MY_RELATION]-(m)
WHERE r.test = $test AND r.min <= $age <= r.max
RETURN n, r, m;
Or, if you do not need all the relationships to be of the same type, this should also work and may be easier to visualize (where r would be of, say, type "age_range", and contain the properties min and max):
MATCH (n)-[r]-(m)
WHERE TYPE(r) = $test AND r.min <= $age <= r.max
RETURN n, r, m;
To help you decide which approach to use, you should profile the two approaches with your code and some actual data to see which is faster for you.
Even Newer answer (based on edit 2 in question)
The following parameter and query should do what you want. Square brackets can be used to dynamically specify the name of a property.
parameters: {
"data": {
"age": 20,
"location": "A"
}
}
MATCH (n)-[r]-()
WHERE r.testtype='integer' AND $data[r.testparamname] < r.max
OR r.testtype='string' AND $data[r.testparamname]=r.allowedStringValue
...
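To see the bracket lookup in isolation, here is a small sketch that runs without any stored data. The literal map stands in for the $data parameter and the unwound maps stand in for relationship properties; these stand-ins, and the value 'B' for allowedStringValue, are assumptions made for illustration.

```cypher
// Stand-in for the $data parameter so the snippet runs without parameters.
WITH {age: 20, location: 'A'} AS data
// Stand-ins for relationship properties; on real data these would come from r.
UNWIND [
  {testtype: 'integer', testparamname: 'age', max: 23},
  {testtype: 'string', testparamname: 'location', allowedStringValue: 'B'}
] AS r
RETURN r.testparamname AS param,
       (r.testtype = 'integer' AND data[r.testparamname] < r.max)
       OR (r.testtype = 'string' AND data[r.testparamname] = r.allowedStringValue) AS passes
```

With these stand-in values the age row should evaluate to true (20 < 23) and the location row to false ('A' is not 'B'). The parentheses only make the AND/OR grouping explicit; AND already binds tighter than OR.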
Does this solution meet your requirements?
Considering the following small sample data set
MERGE (p1:Person {name: 'P 01'})
MERGE (p2:Person {name: 'P 02'})
MERGE (p3:Person {name: 'P 03'})
MERGE (p1)-[:MY_RELATION { minimum: 19, maximum: 23 }]->(p2)
MERGE (p2)-[:MY_RELATION { minimum: 19, maximum: 20 }]->(p3)
This query will only return the nodes and relationship where the supplied parameter fits the relationship constraints (e.g. $age = 21 should only return a single row). It is basically the inverse of @cybersam's proposal.
MATCH (s:Person)-[r:MY_RELATION]->(e:Person)
WHERE r.minimum <= $age <= r.maximum
RETURN *

Return Relationship belonging to a list Neo4j cypher

I have this dataset containing 3M nodes and more than 5M relationships. There are about 8 different relationship types. Now I want to return 2 nodes if they are inter-connected. Here the 2 nodes are A & B, and I would like to see if they are inter-connected.
MATCH (n:WCD_Ent)
USING INDEX n:WCD_Ent(WCD_NAME)
WHERE n.WCD_NAME = "A"
MATCH (m:WCD_Ent)
USING INDEX m:WCD_Ent(WCD_NAME)
WHERE m.WCD_NAME = "B"
MATCH (n) - [r*] - (m)
RETURN n,r,m
This gives me Java Heap Space error.
Another condition I would like to add to my query is that the path between the 2 nodes A & B contains one particular relationship type (NAME_MATCH) at least once. Could you help me with that?
Gabor's suggestion is the most important fix; you are blowing up heap space because you are generating a cartesian product of rows to start, then filtering out using the pattern. Generate rows using the pattern and you'll be much more space efficient. If you have an index on WCD_Ent(WCD_NAME), you don't need to specify the index, either; this is something you only do if your query is running very slow and a PROFILE shows that the query planner is skipping the index. Try this one instead:
MATCH (n:WCD_Ent { WCD_NAME: "A" })-[r*..5]-(m:WCD_Ent { WCD_NAME: "B" })
WHERE ANY(rel IN r WHERE TYPE(rel) = 'NAME_MATCH')
RETURN n, r, m
The WHERE filter here will check all of the relationships in r (which is a collection, the way you've assigned it) and ensure that at least 1 of them matches the desired type.
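The point about r being a collection can be checked on a tiny throwaway path. This is a sketch under assumptions: the ADDRESS_MATCH type and the intermediate node 'X' are made up for the example, and the statement creates three nodes, so use a scratch database.

```cypher
// Throwaway data: ADDRESS_MATCH and the intermediate node 'X' are hypothetical.
CREATE (a:WCD_Ent {WCD_NAME: 'A'})-[:ADDRESS_MATCH]->(:WCD_Ent {WCD_NAME: 'X'})-[:NAME_MATCH]->(b:WCD_Ent {WCD_NAME: 'B'})
WITH a, b
MATCH (a)-[r*..5]-(b)
// r is a list of relationships, so list functions such as any() apply to it.
RETURN [rel IN r | type(rel)] AS types,
       any(rel IN r WHERE type(rel) = 'NAME_MATCH') AS hasNameMatch
```

Returning the list of types next to the any() result makes it easy to see which paths would pass the WHERE filter and which would not.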
Tore's answer (including the variable relationship upper bound) is the best one for finding whether two nodes are connected and if a certain relationship exists in a path connecting them.
One weakness with most of the solutions given so far is that there is no limitation on the variable relationship match, meaning the query is going to crawl your entire graph attempting to match on all possible paths, instead of only checking that one such path exists and then stopping. This is likely the cause of your heap space error.
Tore's suggestion of adding an upper bound on the variable-length relationships in your match is a great solution, as it also helps out in cases where the two nodes aren't connected, preventing you from having to crawl the entire graph. In all cases, the upper bound should prevent the heap from blowing up.
Here are a couple more possibilities. I'm leaving off the relationship upper bound, but that can easily be added in if needed.
// this one won't check for the particular relationship type in the path
// but doesn't need to match on all possible paths, just find connectedness
MATCH (n:WCD_Ent { WCD_NAME: "A" }), (m:WCD_Ent { WCD_NAME: "B" })
RETURN EXISTS((n)-[*]-(m))
// using shortestPath() will only give you a single path back that works
// however WHERE ANY may be a filter to apply after matches are found
// so this may still blow up, not sure
MATCH (n:WCD_Ent { WCD_NAME: "A" }), (m:WCD_Ent { WCD_NAME: "B" })
MATCH p = shortestPath((n)-[r*]-(m))
WHERE ANY(rel IN r WHERE TYPE(rel) = 'NAME_MATCH')
RETURN p
// Adding LIMIT 1 will only return one path result
// Unsure if this will prevent the heap from blowing up though
// The performance and outcome may be identical to the above query
MATCH (n:WCD_Ent { WCD_NAME: "A" }), (m:WCD_Ent { WCD_NAME: "B" })
MATCH (n)-[r*]-(m)
WHERE ANY(rel IN r WHERE TYPE(rel) = 'NAME_MATCH')
RETURN n, r, m
LIMIT 1
Some enhancements:
Instead of the WHERE condition, you can bind the property value inside the pattern.
You can combine the three MATCH clauses into a single one, which makes sure that the query engine will not calculate a Cartesian product of n and m. (You can also use EXPLAIN to visualize the query plan and check this.)
The resulting query:
MATCH (n:WCD_Ent { WCD_NAME: "A" })-[r*]-(m:WCD_Ent { WCD_NAME: "B" })
RETURN n, r, m
Update: Tore Eschliman pointed out that you don't need to specify the indices, so I removed these two lines from the query:
USING INDEX n:WCD_Ent(WCD_NAME)
USING INDEX m:WCD_Ent(WCD_NAME)

Can I create and relate two nodes with the same name but different ids in neo4j

I have created two nodes in neo4j with the same name and label but with different ids:
CREATE (P:names {id:"1"})
CREATE (P:names{id:"2"})
My question is if I can create a relationship between these two nodes like this:
MATCH (P:names),(P:names)
WHERE P.id = "1" AND P.id = "2"
CREATE (P)-[r:is_connected_with]->(P) RETURN r
I tried it but it doesn't work.
Is it that I shouldn't create nodes with the same name or there is a workaround?
How about the following?
First run the create statements:
CREATE (p1:Node {id:"1"}) // note, named p1 here
CREATE (p2:Node {id:"2"})
Then, do the matching:
MATCH (pFirst:Node {id:"1"}), (pSecond:Node {id:"2"}) // and here we can call it something else
CREATE (pFirst)-[r:is_connected_with]->(pSecond)
RETURN r
Basically, you are matching two nodes (with the label Node). The CREATE statements call them p1 and p2 and the MATCH calls them pFirst and pSecond, but you can choose these identifiers freely. Then, simply create the relationship between them.
You should not use the same identifier for two different nodes in one query. Also note that p1 and p2 are not names of the nodes; they are identifiers that exist only within this particular query.
EDIT: After input from the OP I have created a small Gist that illustrates some basics regarding Cypher.
@wassgren has the right answer about how to fix your query, but I might be able to fill in some details about why, and it's too long to leave in a comment.
The character before the colon when describing a node or relationship is referred to as an identifier, it's just a variable representing a node/rel within a Cypher query. Neo4j has some naming conventions that you are not following and as a result, it makes your query harder to read and will be harder for you to get help in the future. Best practices are:
Identifiers start lowercase: person instead of Person1, p instead of P
Labels are singular and have their first character capitalized: (p1:Name), not (p1:Names) or (p1:names) or (p1:name)
Relationships are all caps, [r:IS_CONNECTED_WITH], not [r:is_connected_with], though this one gets broken all the time ;-)
Back to your query, it both won't work and it doesn't follow conventions.
Won't work:
MATCH (P:names),(P:names)
WHERE P.id = "1" AND P.id = "2"
CREATE (P)-[r:is_connected_with]->(P) RETURN r
Will work, looks so much better(!):
MATCH (p1:Name),(p2:Name)
WHERE p1.id = "1" AND p2.id = "2"
CREATE (p1)-[r:IS_CONNECTED_WITH]->(p2) RETURN r
The reason your query doesn't work, though, is that by writing MATCH (P:names),(P:names) WHERE P.id = "1" AND P.id = "2", you are essentially saying "find a node, call it 'P', with an ID of both 1 and 2." That's not what you want and it obviously won't work!
If you're trying to create many nodes, you would rerun this query for each pair of nodes you want to create, changing the ID you assign each time. You can create the nodes and their relationship in one query, too:
CREATE (p1:Name {id:"1"})-[r:IS_CONNECTED_WITH]->(p2:Name {id:"2"}) RETURN r
In the app, just change the ID you want to assign to the nodes before you run the query. The identifiers are instance variables, they disappear when the query is complete.
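One way to change the IDs from the app without editing the query text is to pass them as query parameters. A minimal sketch, assuming the two nodes already exist; the parameter names $firstId and $secondId are hypothetical.

```cypher
// $firstId and $secondId are hypothetical parameter names supplied by the application.
MATCH (p1:Name {id: $firstId}), (p2:Name {id: $secondId})
// p1 and p2 are distinct identifiers, so each is bound to its own node.
CREATE (p1)-[r:IS_CONNECTED_WITH]->(p2)
RETURN r
```

The application would then send, for example, {"firstId": "1", "secondId": "2"} alongside the query, keeping the values the same type as the stored id property.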
EDIT #1!
One more thing, setting the id property within your app and assigning it to the database instead of relying on the Neo4j-created internal ID is a best practice. I suggest avoiding sequential IDs and instead using something to create a unique ID. In Ruby, many people use SecureRandom::uuid for this, I'm sure there's a parallel in whatever language(s) you are using.
EDIT #2!
Neo4j supports integer properties. {id:"1"} != {id: 1}. If your field is supposed to be an integer, use an integer.

How to get nodes from relationships with any depth including relation filter

I'm using a query similar to this one:
(n)-[*]->(m)
Any depth.
But I cannot filter the relation name in such a query like this:
(n)-[*:DOES]->(m)
Any depth.
I need to filter the relation name since there are different relations on the related path. If it helps, here is my graph:
CREATE (Computer { name:'Computer' }), (Programming { name:'Programming' }), (Java { name:'Java' }), (GUI { name:'GUI' }), (Button { name:'Button' }),
(Computer)<-[:IS]-(Programming), (Programming)<-[:IS]-(Java), (Java)<-[:IS]-(GUI), (GUI)<-[:IS]-(Button),
(Ekin { name:'Ekin' }), (Gunes { name:'Gunes' }), (Ilker { name:'Ilker' }),
(Ekin)-[:DOES]->(Programming), (Ilker)-[:DOES]->(Java), (Ilker)-[:DOES]->(Button), (Gunes)-[:DOES]->(Java)
I'd like to get the names (Ekin, Ilker and Gunes) which have "DOES" relationship connected to "Programming" with any depth.
Edit:
I'm able to get the values I want by merging two different queries' results (think 13 is the top node that I want to reach):
START n=node(13)
MATCH p-[:DOES]->()-[*]->(n)
RETURN DISTINCT p
START n=node(13)
MATCH p-[:DOES]->(n)
RETURN DISTINCT p
I want to do it in a single query.
Change the matching pattern to "(p)-[:DOES]->()-[*0..]->(n)":
MATCH (p)-[:DOES]->()-[*0..]->(n)
RETURN DISTINCT p.name
The variable length relationship "[*]" means 1..*. You need 0..* length relationships on the path.
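The effect of the zero lower bound can be tried on a trimmed-down copy of the question's graph. This is a sketch, not the original data: it keeps only four of the nodes, uses parenthesized node patterns, and creates throwaway nodes, so run it on a scratch database.

```cypher
// Trimmed-down version of the question's graph, written with parenthesized node patterns.
CREATE (programming {name: 'Programming'}),
       (java {name: 'Java'}),
       (ekin {name: 'Ekin'}),
       (gunes {name: 'Gunes'}),
       (java)-[:IS]->(programming),
       (ekin)-[:DOES]->(programming),
       (gunes)-[:DOES]->(java)
WITH programming AS n
// *0.. lets the part after DOES be empty (Ekin) or one or more hops long (Gunes via Java).
MATCH (p)-[:DOES]->()-[*0..]->(n)
RETURN DISTINCT p.name
```

On this small graph the query should return both Ekin and Gunes; with a plain [*] in place of [*0..], Ekin would be missed because at least one hop is required after the DOES relationship.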
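Just to update the answer for Neo4j 3.0, where node patterns must be wrapped in parentheses (n is assumed to be bound to the target node, as in the question):
MATCH (p)-[:DOES]->()-[*0..]->(n)
RETURN DISTINCT p.name
It returns the same result as the accepted answer.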
