I will explain briefly my query:
match (a)-[:requires]-(b),
(a)-[:instanceOf]->(n)<-[:superclassOf*]-(c:Host_configuration),
(h)-[:instanceOf]->(z)<-[:superclassOf*]-(t:Host)
where not b = h
return distinct a, b
My wish is to return all (a)-[:requires]-(b) patterns (where a is somehow a subclass of Host_configuration but b is not a subclass of Host.
This query however returns also nodes that actually are subclasses of Host
EDIT
I don't want to retrieve all a elements connected to b elements that are not tied to Host. I want to retrieve all patterns between a and b that are not like (a)-[:requires]-(h)
Try this query:
match (a)-[:requires]-(b),
(a)-[:instanceOf]->(n)<-[:superclassOf*]-(:Host_configuration)
where not (b)-[:instanceOf]->(z)<-[:superclassOf*]-(:Host)
return distinct a, b
It's possible to directly update the where clause with the path you want to exclude. You can define the where clause to exclude b where it is a subclass of Host.
I believe you should care about the direction of the (a)-[:requires]-(b) pattern. If you do not specify a direction, you would not know who requires whom AND you might also get the same node pair twice (in opposite orders). In my answer, I assume you meant (a)-[:requires]->(b), but you can easily reverse the direction if need be.
For efficiency, you should perform both instanceOf/superclassOf tests using path patterns in a WHERE clause. Such a path pattern just checks for a single match before succeeding, and does not bother to expend the resources to hunt down all possible matches. (By the way, a path pattern in a WHERE clause cannot introduce new variables.)
Once the above issues are taken care of, your MATCH clause would just be MATCH (a)-[:requires]->(b), and any given a/b pair would only be found once (as long as your DB has at most one requires relationship going from a given node to another given node). So that should mean that your RETURN clause can omit the DISTINCT option, which would be more efficient.
So, this may work better for you:
MATCH (a)-[:requires]->(b)
WHERE
(a)-[:instanceOf]->()<-[:superclassOf*]-(:Host_configuration) AND
NOT (b)-[:instanceOf]->()<-[:superclassOf*]-(:Host)
RETURN a, b
By the way, it would also be more efficient for the MATCH clause to specify the node labels for a and b, so that the DB does not have to scan every node in the DB. I have not done that in my answer.
Related
Suppose you've got two nodes that represent the same thing, and you want to merge those two nodes. Both nodes can have any number of relations with other nodes.
The basics are fairly easy, and would look something like this:
MATCH (a), (b) WHERE a.id == b.id
MATCH (b)-[r]->()
CREATE (a)-[s]->()
SET s = PROPERTIES(r)
DELETE DETACH b
Only I can't create a relation without a type. And Cypher doesn't support variable labels either. I'd love to be able to do something like
CREATE (a)-[s:{LABELS(r)}]->(o)
but that doesn't work. To create the relation, you need to know the type of the relation, and in this case I really don't.
Is there a way to dynamically assign types to relationships, or am I going to have to query the types of the old relation, and then string concat new queries with the proper types? That's not impossible, but a lot slower and more complex. And this could potentially match a lot of elements and even more relationships, so having to generate a separate query for every instance is going to slow things down quite a lot.
Or is there a way to change the target of the old relationship? That would probably be the fastest, but I'm not aware of any way to do that.
I think you need to take a look at APOC, especially apoc.create.relationship which enable creating relationships with dynamic type.
Adapting your example, you should end up with something along the line of (not tested):
MATCH (a), (b) WHERE a.id == b.id
MATCH (b)-[r]->(n)
CALL apoc.create.relationship(a, type(r), properties(r), n)
DETACH DELETE b
NB
relationships have TYPE and not label
the proper cypher statement to delete relationships attached to a node and the node itself is DETACH DELETE (and not DELETE DETACH)
Related resource: https://markhneedham.com/blog/2016/10/30/neo4j-create-dynamic-relationship-type/
The APOC procedure apoc.refactor.mergeNodes should be very helpful. That procedure is very powerful, and you need to read the documentation to understand how to configure it to do what you want in your specific situation.
Here is a simple example that shows how to use the procedure's default configuration to merge nodes with the same id:
MATCH (node:Foo)
WITH node.id AS id, COLLECT(node) AS nodes
WHERE SIZE(nodes) > 1
CALL apoc.refactor.mergeNodes(nodes, {}) YIELD node
RETURN node
In this example, I specified an arbitrary Foo label to avoid accidentally merging unwanted nodes. Doing so also helps to speed up the query if you have a lot of nodes with other labels (since they will not need to be scanned for the id property).
The aggregating function COLLECT is used to collect a list of all the nodes with the same id. After checking the size of the list, it is passed to the procedure.
Can I use some back-reference sort of mechanism in neo4j? I'm not interested in what matches the query, just that it is the same thing in many places. Something like:
MATCH (a:Event {diagnosis1:11})
MATCH (b:Event {diagnosis1:15})
MATCH (c:Event {diagnosis1:5})
MATCH (a)-[rel:Next {PatientID:*}]->(b)
MATCH (b)-[rel1:Next {PatientID:\{1}]->(c)
The idea is that I just require the attribute IDs from both edges to be the same, without specifying it. The whole purpose of it would be not generating all possible matchings, to then filter them, but only hop in the specific places.
I've asked something similar in a more specific way here.
Edit: I know WHERE clauses can be used for that, but they filter the query AFTER matching the edges and nodes. I want to do that DURING the matching!
Use a WHERE clause with simple references, there's no need for back references:
MATCH (a)-[rel:Next]->(b)
MATCH (b)-[rel1:Next]->(c)
WHERE rel.PatientID = rel1.PatientID
Update
First of all, Cypher is a declarative query language: you express what you want, the runtime takes care of executing and optimizing it any way it can, so it's not that obvious that it would do it the way you think it will, or that using "back references" would magically solve the problem; it's just another way of writing the same thing.
So, your problem is that the match creates all the relationship pairs before filtering them. How about splitting the match in 2 phases using WITH?
MATCH (a:Event {diagnosis1:11})-[rel:Next]->(b:Event {diagnosis1:15})
WITH a, b, rel
MATCH (b)-[rel1:Next]->(c:Event {diagnosis1:5})
WHERE rel1.PatientID = rel.PatientID
That should only select the second relationships that match the first, but I'm not sure if it's an O(n^2) algorithm in Cypher's runtime.
Otherwise, if you drop to the Java API (which would mean either an extension or a procedure, depending on your version of Neo4j), you can probably implement in O(n) by
scanning all the relationships between a and b, indexing them by PatientID in some multimap (see Guava, or use a Map<K, Collection<V>>); this is O(n)
then doing the same for all the relationships between b and c, still O(n)
iterate on the keys of one multimap to get the values in both and match them, still O(n)
A co-worker coded something like this:
match (a)-[r]->(b), (c) set c.x=y
What does the comma do? Is it just shorthand for MATCH?
Since Cypher's ASCII-art syntax can only let you specify one linear chain of connections in a row, the comma is there, at least in part, to allow you to specify things that might branch off. For example:
MATCH (a)-->(b)<--(c), (b)-->(d)
That represents three nodes which are all connected to b (two incoming relationships, and one outgoing relationship.
The comma can also be useful for separating lines if your match gets too long, like so:
MATCH
(a)-->(b)<--(c),
(c)-->(d)
Obviously that's not a very long line, but that's equivalent to:
MATCH
(a)-->(b)<--(c)-->(d)
But in general, any MATCH statement is specifying a pattern that you want to search for. All of the parts of that MATCH form the pattern. In your case you're actually looking for two unconnected patterns ((a)-[r]->(b) and (c)) and so Neo4j will find every combination of each instance of both patterns, which could potentially be very expensive. In Neo4j 2.3 you'd also probably get a warning about this being a query which would give you a cartesian product.
If you specify multiple matches, however, you're asking to search for different patterns. So if you did:
MATCH (a)-[r]->(b)
MATCH (c)
Conceptually I think it's a bit different, but the result is the same. I know it's definitely different with OPTIONAL MATCH, though. If you did:
MATCH (a:Foo)
OPTIONAL MATCH (a)-->(b:Bar), (a)-->(c:Baz)
You would only find instances where there is a Foo node connected to nothing, or connected to both a Bar and a Baz node. Whereas if you do this:
MATCH (a:Foo)
OPTIONAL MATCH (a)-->(b:Bar)
OPTIONAL MATCH (a)-->(c:Baz)
You'll find every single Foo node, and you'll match zero or more connected Bar and Baz nodes independently.
EDIT:
In the comments Stefan Armbruster made a good point that commas can also be used to assign subpatterns to individual identifiers. Such as in:
MATCH path1=(a)-[:REL1]->(b), path2=(b)<-[:REL2*..10]-(c)
Thanks Stefan!
EDIT2: Also see Mats' answer below
Brian does a good job of explaining how the comma can be used to construct larger subgraph patterns, but there's also a subtle yet important difference between using the comma and a second MATCH clause.
Consider the simple graph of two nodes with one relationship between them. The query
MATCH ()-->() MATCH ()-->() RETURN 1
will return one row with the number 1. Replace the second MATCH with a comma, however, and no rows will be returned at all:
MATCH ()-->(), ()-->() RETURN 1
This is because of the notion of relationship uniqueness. Inside each MATCH clause, each relationship will be traversed only once. That means that for my second query, the one relationship in the graph will be matched by the first pattern, and the second pattern will not be able to match anything, leading to the whole pattern not matching. My first query will match the one relationship once in each of the clauses, and thus create a row for the result.
Read more about this in the Neo4j manual: http://neo4j.com/docs/stable/cypherdoc-uniqueness.html
i am running the queries to find if node a is connected to node b directly or indirectly. for directly i can use
MATCH (n)-[r]->(a) OR MATCH (n)-[r]->(b)
when i use the query
MATCH (b)-[r*1..2]->(a)
the results are different. i am confused to understand what is the difference between below mentioned two queries.
1- OPTIONAL MATCH L=a-->c-->e-->b with a,b,L,p,q,n
2- OPTIONAL MATCH M=(a)-[r*1..2]->(b)
Are these both queries are the same. if they are then the results for both in my case are different.
what i wanted to see, a is connected to b after two hop distance.
i will be very grateful for your contribution. Thanks in advance
This query:
MATCH (b)-[r*1..2]->(a)
Means match one or two hops away. So the result is different from your first queries, because your first queries match exactly one hop away. This one goes further, so the results are different. Here, "hops" mean relationships not nodes.
This query:
OPTIONAL MATCH L=a-->c-->e-->b with a,b,L,p,q,n
Is very different because you're navigating through 2 intermediate nodes (c and e) with 3 intermediate relationships (a->c, c->e, e->b).
By the way, you can of course use optional match here, but for you it's not needed. If a and b must be connected, then using optional match doesn't really change anything for you here.
So you need to decide whether you want something 2 hops/relationships away, or if you want something 2 hops/nodes away, that's different.
Another way of writing 2 hops/relationships away would be this:
MATCH p=(a)-[r1]-(m)-[r2]-(b)
RETURN p
I am trying to find relations between nodes with optional but specific nodes/relationships in between (neo4j 2.0 M6).
In my data model, 'Gene' can be 'PARTOF' a 'Group. I have 'INTERACT' relationships between 'Gene'-'Gene', 'Gene'-'Group' and 'Group'-'Group' (red lines in model image).
I want to boil this down to all 'INTERACT' relationships between 'Gene': both direct (Gene-INTERACT-Gene) and via one or two 'Group' (Gene-PARTOF-Group-INTERACT-Gene).
Of course this is easy with multiple Cypher queries:
# direct INTERACT
MATCH (g1:Gene)-[r:INTERACT]-(g2:Gene) RETURN g1, g2
# INTERACT via one Group
MATCH (g1:Gene)-[:PARTOF]-(gr:Group)-[r:INTERACT]-(g2:Gene) RETURN g1, g2
# INTERACT via two Group
MATCH (g1:Gene)-[:PARTOF]-(gr1:Group)-[r:INTERACT]-(gr2:Group)-[:PARTOF]-(g2:Gene)
RETURN g1, g2
But would it be possible to construct a single Cypher query that takes optional 'Group steps' in the path? So far I only used optional relationships and shortestPaths, but I have no idea if I can filter for one or two optional nodes in between two genes.
You can assign a depth between zero and one for each of the relationships you add to the path. Try something like
MATCH (g1:Gene)-[:PARTOF*0..1]-(gr1)-[:INTERACT]-(gr2)-[:PARTOF*0..1]-(g2:Gene)
RETURN g1,g2
and to see what the matched paths actually look like, just return the whole path
MATCH p=(g1:Gene)-[:PARTOF*0..1]-(gr1)-[:INTERACT]-(gr2)-[:PARTOF*0..1]-(g2:Gene)
RETURN p
There is a bit of a bother with declaring node labels for the optional parts of this pattern, however, so this query assumes that genes are not part of anything other than groups, and that they only interact with groups and other genes. If a gene can have [:PARTOF] to something else, then (gr1) will bind that something, and the query is no longer reliable. Simply adding a where clause like
WHERE gr1:Group AND gr2:Group
excludes the case where the optional parts are not matched, so that won't work (it'd be like your third query). I'm sure it can be solved but if your actual model is not much more complex than what you describe in your question, this should.
I took the liberty of interpreting your model in a console here, check it out to see if it does what you want.