I have a query that I'm not sure how to implement, or whether it's efficient to do in Cypher. Anyway, here's what I'm trying to do.
I have basically this graph:
I want to get all the nodes/relationships from 1 to 3 (note: the empty node can be any number of nodes). I also want any incoming edges to the last two nodes, and only the last two nodes, that are not in the original path. In this case the edges that are in red should also be added to the result.
I already know the path that I want. So in this example I would have been given node ids 1, ..., 2, 3 and I think I know how to get the path of the first part.
MATCH (n)-->() WHERE n.nid IN ['1', '...', '2', '3'] RETURN n
I just can't figure out how to get the red edges for the last two nodes in the path. Also, I'm not given node ids 4 and 5. We can assume the edges connecting 1, ..., 2, 3 all have the same label and all the other edges have a different label.
I think I need to use merge but can't figure out how to do it yet.
Or if someone knows how to do this in Gremlin, I'm all ears.
Does this work for you?
MATCH ({nid: '1'})-[:t*]->(n2 {nid: '2'})-[:t]->(n3 {nid: '3'})
OPTIONAL MATCH ()-[t42]->(n2)
WHERE (TYPE(t42) <> 't')
OPTIONAL MATCH ()-[t53]->(n3)
WHERE (TYPE(t53) <> 't')
RETURN COLLECT(DISTINCT t42) AS c42, COLLECT(DISTINCT t53) AS c53;
I gave all the relationships on the left path (in your diagram) the type "t". (The term label is used for nodes, not relationships.) You said we can assume that the other relationships do not have that type, so this query takes advantage of that fact to filter out type "t" relationships from the result.
This query also makes the 4-2 and 5-3 relationships optional.
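To make the filtering rule concrete, here is a minimal Python sketch of the same idea on an in-memory edge list. It is only a model, not how Neo4j evaluates the query; the middle node id "x" and the type name "other" are assumptions standing in for the parts of the diagram that are not spelled out.

```python
# Hypothetical in-memory model of the diagram: (source, type, target) triples.
# The middle node id "x" and the type name "other" are assumptions.
EDGES = [
    ("1", "t", "x"),
    ("x", "t", "2"),
    ("2", "t", "3"),
    ("4", "other", "2"),
    ("5", "other", "3"),
]


def extra_incoming(edges, targets, path_type="t"):
    """Return, per target node, the incoming edges whose type is not path_type."""
    result = {target: [] for target in targets}
    for source, rel_type, target in edges:
        if target in result and rel_type != path_type:
            result[target].append((source, rel_type, target))
    return result


# The last two nodes of the known path are "2" and "3".
print(extra_incoming(EDGES, ["2", "3"]))
```

A node with no such incoming edge keeps an empty list, which mirrors what the OPTIONAL MATCH clauses do in the query.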
I had another thread about this where someone suggested to do
MATCH (p:Person {person_id: '123'})
WHERE ANY(x IN $names WHERE
EXISTS((p)-[:BELONGS]-(:Face)-[:CORRESPONDS]-(:Image)-[:HAS_ACCESS_TO]-(:Dias {group_name: x})))
MATCH path=(p)-[:ASSOCIATED_WITH]-(:Person)
RETURN path
This does what I need it to, returns nodes that fit the criteria without returning the relationships, but now I need to include another param that is a list.
....(:Dias {group_name: x, second_name: y}))
I'm unsure of the syntax.. here's what I tried
WHERE ANY(x IN $names and y IN $names_2 WHERE..
this gives me a syntax error :/
Since the ANY() function can only iterate over a single list, it would be difficult to continue to use that for iteration over 2 lists (but still possible, if you create a single list with all possible x/y combinations) AND also be efficient (since each combination would be tested separately).
However, the new existential subquery syntax introduced in Neo4j 4.0 will be very helpful for this use case (I assume the 2 lists are passed as the parameters names1 and names2):
MATCH (p:Person {person_id: '123'})
WHERE EXISTS {
MATCH (p)-[:BELONGS]-(:Face)-[:CORRESPONDS]-(:Image)-[:HAS_ACCESS_TO]-(d:Dias)
WHERE d.group_name IN $names1 AND d.second_name IN $names2
}
MATCH path=(p)-[:ASSOCIATED_WITH]-(:Person)
RETURN path
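For comparison, the two strategies discussed in this answer can be modelled in plain Python. This is a sketch under assumptions: the Dias nodes are hypothetical property dicts, and the functions only imitate the logic of the ANY()-over-combinations idea and of the two IN tests, not the database's execution.

```python
from itertools import product

# Hypothetical Dias nodes reachable from the person, as property dicts.
DIAS = [
    {"group_name": "g1", "second_name": "s9"},
    {"group_name": "g2", "second_name": "s1"},
]


def any_over_pairs(dias, names1, names2):
    """ANY()-style check: test every (x, y) combination separately."""
    pairs = list(product(names1, names2))  # the single combined list
    return any(
        d["group_name"] == x and d["second_name"] == y
        for (x, y) in pairs
        for d in dias
    )


def in_filter(dias, names1, names2):
    """Subquery-style check: one pass with two membership tests."""
    return any(
        d["group_name"] in names1 and d["second_name"] in names2 for d in dias
    )


print(any_over_pairs(DIAS, ["g1", "g2"], ["s1"]), in_filter(DIAS, ["g1", "g2"], ["s1"]))
```

Both functions give the same answer; the first one simply does more work, because the number of combinations grows with the product of the two list lengths.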
By the way, here are some more tips:
If it is possible to specify the direction of each relationship in your query, that would help to speed up the query.
If it is possible to remove any node labels from a (sub)query and still get the same results, that would also be faster. There is an exception, though: if the (sub)query has no variables that are already bound to a value, then you would normally want to specify the node label for the one node that would be used to kick off that (sub)query (you can do a PROFILE to see which node that would be).
Look at following example graph (from Neo4j reference):
And the query is:
MATCH (david { name: 'David' })--(otherPerson)-->()
WITH otherPerson, count(*) AS foaf
WHERE foaf > 1
RETURN otherPerson.name
The result is:
"Anders"
I can't understand why this result was returned. First of all, what does this mean:
MATCH (david { name: 'David' })--(otherPerson)-->()
WITH otherPerson, count(*) AS foaf
In particular, Bossman also has (like Anders) two outgoing edges and is connected to David.
Can someone explain the semantics of this query to me?
So as you noted there are two nodes which look like they fit the pattern you described. Both Anders and Bossman are connected to David, and both have two outgoing relationships.
The thing you're missing is that with Cypher patterns, relationships are unique for the pattern, they will not be reused (this is actually very useful, for example it prevents infinite loops when using variable-length relationships when a cycle is present).
So in this MATCH pattern:
MATCH (david { name: 'David' })--(otherPerson)-->()
the relationship used to get from David to Bossman (the :BLOCKS relationship) will not be reused in the pattern (specifically the (otherPerson)-->() part), so you will only get a single result row for this, while for Anders you will get 2. Your WHERE clause then rules out the match for Bossman, since the count of foaf is 1.
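The counting can be reproduced with a small Python model. The edge list below is an assumption: it is reconstructed from the names used in this thread and the :BLOCKS relationship mentioned above, since the graph image is not reproduced here. The function is only a sketch of the uniqueness rule, not of Neo4j's matcher.

```python
from collections import Counter

# Assumed edge list for the example graph (source, type, target).
EDGES = [
    ("Anders", "KNOWS", "Bossman"),
    ("Anders", "BLOCKS", "Caesar"),
    ("David", "KNOWS", "Anders"),
    ("Bossman", "KNOWS", "George"),
    ("Caesar", "KNOWS", "George"),
    ("Bossman", "BLOCKS", "David"),
]


def count_foaf(edges, start, unique=True):
    """Count matches of (start)--(other)-->() per other node."""
    counts = Counter()
    for i, (src, _type, dst) in enumerate(edges):
        # (start)--(other): the first hop ignores direction.
        if src == start:
            other = dst
        elif dst == start:
            other = src
        else:
            continue
        # (other)-->(): the second hop must leave `other`.
        for j, (src2, _type2, _dst2) in enumerate(edges):
            if src2 != other:
                continue
            if unique and j == i:
                continue  # this relationship is already used in the match
            counts[other] += 1
    return dict(counts)


print(count_foaf(EDGES, "David"))
print(count_foaf(EDGES, "David", unique=False))
```

With uniqueness enforced, Bossman ends up with a count of 1 and is dropped by `foaf > 1`; with `unique=False` both Anders and Bossman would reach 2, which is the result the question expected.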
One way you could alter this query to get the desired result is to check the node's relationship degree in the WHERE clause rather than in the MATCH pattern. This is also more efficient, as checking the degree doesn't have to perform an expand operation; the relationship degree data is on the node itself.
MATCH ({ name: 'David' })--(otherPerson)
WHERE size((otherPerson)-->()) > 1
RETURN otherPerson.name
(Also, it's a good idea to use node labels in your matches, at least for your intended starting nodes. Indexes, if present, will only be used when you explicitly use both the label and the indexed property in the match; they won't be used when you omit the label or use a label that's not a part of the index.)
I've built a graph with 40 million nodes and 40 million relationships with Neo4j.
Mostly I search for different shortest paths, and queries need to be very fast. Right now it usually takes a few milliseconds per query.
For speed I encode all parameters in the relationship property val, so an ordinary query looks like this:
MATCH (one:Obj{oid:'1'})
with one
MATCH (two:Obj{oid:'2'}), path=shortestPath((one) -[*0..50]-(two))
WHERE ALL (x IN RELATIONSHIPS(path) WHERE ((x.val > 50 and x.val<109) ))
return path
But one filter cannot be done this way, as it should evaluate (on each step) the property of the starting node, the property of the relationship, and the property of the ending node together, for example:
Path: n1(==1)-r1(==2)-n2(==1)-r2(==5)-n3(==3)
On step1: properties of n1 and n2 equal 1 and relation's property equals 2, that's OK, going further
On step2: property of n2 equals 1, but property of n3 equals 3, so we stop. If it was 1, we would stop anyway, because relation r2 is not 2, but 5.
I've used predicates over RELATIONSHIPS() and NODES(), but they seem to work separately.
Also, I guess this can be done with the traversal API, but I'd have to rewrite a lot of my other code, so it is not desirable.
Am I missing some fast solution?
It looks like your basic query is running quite fast. If you want to filter at additional steps, you probably have to add additional OPTIONAL MATCH and WITH clauses to accommodate the filters. Undesired elements should drop out.
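The per-step rule from the question (start node, relationship, and end node checked together on every hop) can be modelled outside the database. The following is a minimal Python sketch using breadth-first search over an in-memory graph; the data structures, the function name, and the default values 1 and 2 are assumptions taken from the example path, and it says nothing about how fast an equivalent Cypher query would be.

```python
from collections import deque


def shortest_valid_path(node_val, edges, start, goal, node_ok=1, rel_ok=2):
    """BFS that only takes a hop when start node, relationship and end node all pass."""
    # Undirected adjacency: node -> list of (neighbour, relationship value).
    adj = {}
    for a, b, val in edges:
        adj.setdefault(a, []).append((b, val))
        adj.setdefault(b, []).append((a, val))

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == goal:
            return path
        for neighbour, val in adj.get(current, []):
            step_ok = (
                node_val[current] == node_ok
                and val == rel_ok
                and node_val[neighbour] == node_ok
            )
            if step_ok and neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no path satisfies the rule on every step


# The example from the question: n1(==1)-r1(==2)-n2(==1)-r2(==5)-n3(==3)
NODE_VAL = {"n1": 1, "n2": 1, "n3": 3}
EDGES = [("n1", "n2", 2), ("n2", "n3", 5)]

print(shortest_valid_path(NODE_VAL, EDGES, "n1", "n2"))
print(shortest_valid_path(NODE_VAL, EDGES, "n1", "n3"))
```

The first call finds the one-hop path, and the second returns None because the second step fails the combined check.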
I have a DAG which for the most part is a tree... but there are a few cycles in it. I mention it in case it matters.
I have to translate the graph into pairs of relations. If:
A -> B
     C
     D -> 1
          2 -> X
               Y
Then I would produce ArB, ArC, ArD, Dr1, Dr2, 2rX, 2rY, where r is some relationship information (in other words, the query cannot totally ignore it).
Also, in my graph, node A has many cousins, so I need to 'anchor' my query to A.
My current attempt generates all possible pairs, so I get many unhelpful pairs such as ArY since A can eventually traverse to Y.
What is a query that starts (or ends) with A, that returns a list of pairs? I don't want to query Neo individually for each node - I want to get the list in one shot if possible.
The query would be great, doc pages that explain would be great. Any help is appreciated.
EDIT Here's what I have so far, using Frobber's post as inspiration:
1. MATCH p=(n {id:"some_id"})-[*]->(m)
2. WITH DISTINCT(NODES(p)) as zoot
3. MATCH (x)-[r]->(y)
4. WHERE x IN zoot AND y IN zoot
5. RETURN DISTINCT x, TYPE(r) as r, y
In line 1, I make a path that includes all the nodes under the one I care about.
In line 2, I convert the path of nodes to a collection of nodes.
In line 3, I start a new match that is intended to return my pairs.
In line 4, I accept only x and y nodes that were scooped up by the first match. I am not sure why I have to include y in the condition, but it seems to matter.
In line 5, I return the results. I do not know why I need a DISTINCT here. I thought the one on line 2 would do the trick.
So far, this is working for me. I have no insight into its performance in a large graph.
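The idea behind the query (collect everything reachable from the anchor, then keep only direct edges) can be checked on a toy graph in plain Python. This is a sketch under assumptions: the edge list encodes the small tree from the question, the relationship label "r" is a placeholder, and the extra Q -> Z edge is a hypothetical "cousin" branch that should not appear in the output.

```python
from collections import deque

# Hypothetical edges (parent, relationship, child) for the tree in the question,
# plus one "cousin" edge (Q -> Z) that is not reachable from A.
EDGES = [
    ("A", "r", "B"), ("A", "r", "C"), ("A", "r", "D"),
    ("D", "r", "1"), ("D", "r", "2"),
    ("2", "r", "X"), ("2", "r", "Y"),
    ("Q", "r", "Z"),
]


def direct_pairs(edges, anchor):
    """Return every direct (parent, rel, child) edge reachable from anchor."""
    outgoing = {}
    for parent, rel, child in edges:
        outgoing.setdefault(parent, []).append((rel, child))

    pairs = []
    seen = {anchor}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        for rel, child in outgoing.get(node, []):
            pairs.append((node, rel, child))  # each edge is emitted once
            if child not in seen:  # guards against revisiting shared nodes or cycles
                seen.add(child)
                queue.append(child)
    return pairs


for parent, rel, child in direct_pairs(EDGES, "A"):
    print(parent, rel, child)
```

Because each node is expanded only once, transitive pairs such as A to Y never appear, and no DISTINCT step is needed in this model.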
Here's an approach to try - this query is modeled off of the sample matrix data you can find online so you can play with it before adapting it to your schema.
MATCH p=(n:Crew)-[r:KNOWS*]-(m)
WHERE n.name='Neo'
WITH p, size(nodes(p)) AS nCount, size(relationships(p)) AS rCount
RETURN nodes(p)[nCount-2], relationships(p)[rCount-1], nodes(p)[nCount-1]
ORDER BY length(p) ASC;
A couple of notes about what's going on here:
Consider the "Neo" node (n.name="Neo") to be your "A" here. You're rooting this path traversal in some particular node you pick out.
We're matching paths, not nodes or edges.
We're going through all paths rooted at the A node, ordering by path length. This gets the near nodes before the distant nodes.
For each path we find, we're looking at the nodes and relationships in the path, and then returning the last pair: the second-to-last node (nodes(p)[nCount-2]), the last relationship in the path (relationships(p)[rCount-1]), and the last node (nodes(p)[nCount-1]).
This query basically returns the node, the relationship, and the connected node showing that you can get those items; from there you just customize the query to pull out whatever about those nodes/rels you might need pursuant to your schema.
The basic formula starts with matching p=(someNode {startingPoint: "A"})-[r*]->(otherStuff); from there it's just processing paths as you go.
I have a scenario where I have more than 2 random nodes.
I need to get all possible paths connecting all three nodes. I do not know the direction of relation and the relationship type.
Example: I have a graph database with three nodes, person->Purchase->Product.
I need to get the path connecting these three nodes. But I do not know the order in which I need to query, for example if I give the query as person-Product-Purchase, it will return no rows as the order is incorrect.
So in this case how should I frame the query?
In a nutshell, I need to find the path between more than two nodes where the match clause may be given in whatever order the user knows.
You could list all of the nodes in multiple bound identifiers in the start, and then your match would find the ones that match, in any order. And you could do this for N items, if needed. For example, here is a query for 3 items:
start a=node:node_auto_index('name:(person product purchase)'),
b=node:node_auto_index('name:(person product purchase)'),
c=node:node_auto_index('name:(person product purchase)')
match p=a-->b-->c
return p;
http://console.neo4j.org/r/tbwu2d
I actually just made a blog post about how start works, which might help:
http://wes.skeweredrook.com/cypher-it-all-starts-with-the-start/
Wouldn't it be acceptable to make several queries? In your case you'd automatically generate 6 queries with all the possible orderings (the factorial of the number of variables).
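Generating those queries is mechanical. Here is a minimal Python sketch that builds one parameterised query per ordering; the lookup by a `name` property, the undirected `--` pattern, and the function name are all assumptions for illustration, so adapt the template to your own schema.

```python
from itertools import permutations


def ordered_queries(names):
    """Build one (query, params) pair per ordering of the given node names."""
    queries = []
    for ordering in permutations(names):
        # Hypothetical pattern: nodes looked up by a `name` property, any direction.
        pattern = "--".join(
            f"(n{i} {{name: $name{i}}})" for i in range(len(ordering))
        )
        params = {f"name{i}": name for i, name in enumerate(ordering)}
        queries.append((f"MATCH p={pattern} RETURN p", params))
    return queries


for query, params in ordered_queries(["person", "product", "purchase"]):
    print(query, params)
```

The query text is the same for every ordering; only the parameter map changes, so three names produce six parameter sets to run.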
A possible solution would be to first get three sets of nodes (s,m,e). These sets may be the same as in the question (or contain partially or completely different nodes). The sets are important, because starting, middle and end node are not fixed.
Here is the code for the Matrix example with added nodes.
match (s) where s.name in ["Oracle", "Neo", "Cypher"]
match (m) where m.name in ["Oracle", "Neo", "Cypher"] and s <> m
match (e) where e.name in ["Oracle", "Neo", "Cypher"] and s <> e and m <> e
match rel=(s)-[r1*1..]-(m)-[r2*1..]-(e)
return s, r1, m, r2, e, rel;
The additional where clause makes sure the same node is not used twice in one result row.
The relations are matched with one or more edges (*1..) or hops between the nodes s and m or m and e respectively and disregarding the directions.
Note that Cypher 3 syntax is used here.