I am trying to match multiple outer joins on the same level in neo4j.
My database consists of users and a count of common up ur downratings on articles. The ratings counts are on seprate edges for up and downratings between the users.
---------- -----------
| User n | -[:rating_positive {weight}]-> | User n2 |
---------- -----------
| ^
\-----[:rating_negative {weight}]-------/
Now i want to produce edges that sum up these ratings.
I would love to use multiple optional merges, that do so sch as e.g.:
MATCH (n:`User`)
OPTIONAL MATCH (n:`User`)-[rating_positive:rating_positive]-(n2:`User`)
OPTIONAL MATCH (n:`User`)-[rating_negative:rating_negative]-(n2:`User`)
RETURN n.uid, n2.uid, rating_positive.weight, rating_negative.weight
But: In this example I get all users without any positive ratings and those with positive and negatice ratings but none with only negative ratings. So there seems to be a sequence in OPTIONAL MATCH.
If I swap the order of the "OPTIONAL MATCHes" I get those with only negative ratings but not those with onl positive ratings.
So "OPTIONAL MATCH" is somehow a sequence where only when the first
sequence is met I get something from the second and so on?
Is there a workaround?
Neo4j Version is 2.1.3.
P.S.:
Even more confusing matching against NULL does not seem to work. So this query:
MATCH (n:`User`)
OPTIONAL MATCH (n:`User`)-[rating_positive:rating_positive]-(n2:`User`)
OPTIONAL MATCH (n:`User`)-[rating_negative:rating_negative]-(n2:`User`)
WHERE rating_positive IS NULL AND rating_negative IS NOT NULL
RETURN n.uid, n2.uid, rating_positive.weight, rating_negative.weight
will give me lots of edges with NULL rating_negative and NON NULL rating_positive. I don't know what is happening with null matching in WHERE?
Anyway I found a way to recode the nulls to 0 values using "coalesce":
MATCH (n:`User`)
OPTIONAL MATCH (n:`User`)-[rating_positive:rating_positive]-(n2:`User`)
OPTIONAL MATCH (n:`User`)-[rating_negative:rating_negative]-(n2:`User`)
WITH n, n2, coalesce(rating_positive.weight, 0) AS rating_positive, coalesce(rating_negative.weight, 0) as rating_negative
WHERE rating_positive = 0 AND rating_negative > 0
RETURN n.uid, n2.uid, rating_positive, rating_negative
With this query it works as expected.
I believe more than the sequencing of the optional match, it's the fact that you've bound n2.
So the next optional match is restricted to only match the nodes identified to be candidates for n2 in the previous match. And so it appears that the order of the optional match influences it.
If you take a look at a small sample graph I set up here http://console.neo4j.org/r/lrp55o , the following query
MATCH (n:User)
OPTIONAL
MATCH (n)-[:rating_negative]->(n2)
OPTIONAL
MATCH (n)-[:rating_positive]->(n2)
RETURN n,n2
returns B-[:rating_negative]->C and C-[:rating_negative]->D but it leaves out A-[:rating_positive]->B.
The first optional match for rating_negative bound C and D as nodes for "n2". The second optional match found no n which has a rating_positive to C or D and hence the results.
I'm a bit unclear about what you are trying to do with the query and null checks but a union would be one way to give you all the positive and negative relations (which you can add your filters to):
MATCH (n:User)
OPTIONAL
MATCH (n)-[rating:rating_negative]->(n2)
RETURN n,n2,rating.weight
UNION ALL
MATCH (n:User)
OPTIONAL
MATCH (n)-[rating:rating_positive]->(n2)
RETURN n, n2, rating.weight
If this is not what you're looking for, a small subgraph at http://console.neo4j.org?init=0 would be great to help you further.
EDIT: Since comments indicated that the sum of ratings was required between a pair of users, the following query does the job:
MATCH (u:User)-[rating:rating_positive|:rating_negative]->(u2)
RETURN u,u2,sum(rating.weight)
I can't be entirely sure whether this is what is causing your problem but it appears to me that you should be omitting the labels in the OPTIONAL MATCH clauses.
Perhaps try the query below
MATCH (n:`User`)
OPTIONAL MATCH (n)-[rating_positive:rating_positive]-(n2:`User`)
OPTIONAL MATCH (n)-[rating_negative:rating_negative]-(n2)
RETURN n.uid, n2.uid, rating_positive.weight, rating_negative.weight
It may also be worth including the relationship directions.
MATCH (n:`User`)
OPTIONAL MATCH (n)-[rating_positive:rating_positive]->(n2:`User`)
OPTIONAL MATCH (n)-[rating_negative:rating_negative]->(n2)
RETURN n.uid, n2.uid, rating_positive.weight, rating_negative.weight
Related
In my graph, I want to get the first-degree, second-degree, and third-degree neighbors of a certain node. If my graph is A -> B -> C -> D -> E, then
first-degree neighbor of C is B
second-degree neighbor of C is A
third-degree neighbor of C is none
When checking neighbors, I go in the reverse direction of the edge. To get these nodes, I wrote the following query.
MATCH (changedNode: Function) WHERE changedNode.signature IN [...]
MATCH (neig1: Function)-[:CALLS]->(changedNode)
MATCH (neig2: Function)-[:CALLS]->(neig1)
MATCH (neig3: Function)-[:CALLS]->(neig2)
RETURN DISTINCT neig1.functionName, neig2.functionName, neig3.functionName
I realized that this code does not return B as the first-degree neighbor of C since A does not have any neighbors(neig3 is empty). In other words, this query requires a node to have a third-degree neighbor. I understood this but could not update my code. How should I revise my query?
You can use OPTIONAL MATCH since A may not have a neighbor. Then the query will return a null value for neigh3.
MATCH (changedNode: Function) WHERE changedNode.signature IN [...]
MATCH (neig1: Function)-[:CALLS]->(changedNode)
OPTIONAL MATCH (neig2: Function)-[:CALLS]->(neig1)
OPTIONAL MATCH (neig3: Function)-[:CALLS]->(neig2)
RETURN DISTINCT neig1.functionName, neig2.functionName, neig3.functionName
Use of OPTIONAL MATCH matches patterns against your graph database, just like a MATCH does. The difference is that if no matches are found, OPTIONAL MATCH will use a null for missing parts of the pattern. OPTIONAL MATCH could be considered the Cypher equivalent of the outer join in SQL.
MATCH (changedNode: Function) WHERE changedNode.signature IN [...]
MATCH (neig1: Function)-[:CALLS]->(changedNode)
OPTIONAL MATCH (neig2: Function)-[:CALLS]->(neig1)
OPTIONAL MATCH (neig3: Function)-[:CALLS]->(neig2)
RETURN DISTINCT neig1.functionName, neig2.functionName, neig3.functionName
For more explanation, visit: https://neo4j.com/developer/kb/a-note-on-optional-matches/
I have a cypher query that is not behaving as expected and I'm trying to figure out why. I suspect I don't fully understand how OPTIONAL MATCH works.
The database has one (:'Person::Current') node and one (:'Trait::Current') node. It does not have a (:'PersonTrait::Current') node.
If I run this query, it correctly returns a count(t) of 1
MATCH (n:`Person::Current` {uuid: $person_id}), (t:`Trait::Current` {uuid: $trait_id})
WHERE NOT (
(n)-[:PERSON_TRAIT]->(:`PersonTrait::Current` {has: true})-[:PERSON_TRAIT]->(t) OR
(n)-[:PERSON_TRAIT]->(:`PersonTrait::Current` {has: false})-[:PERSON_TRAIT]->(t) OR
(t)-[:GIVES_TRAIT]->(:`GivesTrait::Current`)-[:GIVES_TRAIT]->(:`Trait::Current`)<-[:PERSON_TRAIT]-(:`PersonTrait::Current` {has: false})<-[:PERSON_TRAIT]-(n)
)
RETURN count(t) as res
When a (:'PersonTrait::Current') node is added to the database in the form
(:`Person::Current`)-[:PERSON_TRAIT]->(:`PersonTrait::Current` {has: true})-[:PERSON_TRAIT]->(:`Trait::Current`)
My query correctly returns a count(t) of 0.
However, if I try and DRY up the query by making use of OPTIONAL MATCH, like so
MATCH (n:`Person::Current` {uuid: $person_id}), (t:`Trait::Current` {uuid: $trait_id})
OPTIONAL MATCH (pt:`PersonTrait::Current`)
WHERE NOT (
((n)-[:PERSON_TRAIT]->(pt)-[:PERSON_TRAIT]->(t) AND exists(pt.has)) OR
(t)-[:GIVES_TRAIT]->(:`GivesTrait::Current`)-[:GIVES_TRAIT]->(:`Trait::Current`)<-[:PERSON_TRAIT]-(pt {has: false})<-[:PERSON_TRAIT]-(n)
)
RETURN count(t) as res
Then the query incorrectly returns a count(t) of 1 when a (:'PersonTrait::Current') node is added to the database in the form
(:`Person::Current`)-[:PERSON_TRAIT]->(:`PersonTrait::Current` {has: true})-[:PERSON_TRAIT]->(:`Trait::Current`)
Anyone know what's going wrong? The WHERE NOT clause should be filtering out (t) nodes if a (pt) node is present with the appropriate pattern.
THANKS!!!
I think the issue is understanding the WHERE clause, in that WHERE only applies to the previous MATCH, OPTIONAL MATCH, or WITH clause.
In this case, it's paired with the OPTIONAL MATCH, so rows won't be filtered out when the WHERE is false, it will behave the same as if the OPTIONAL MATCH failed, so newly introduced variables in the OPTIONAL MATCH would be set to null.
If you want the WHERE to filter out rows, pair it with a WITH clause instead:
MATCH (n:`Person::Current` {uuid: $person_id}), (t:`Trait::Current` {uuid: $trait_id})
OPTIONAL MATCH (pt:`PersonTrait::Current`)
WITH n, t, pt
WHERE NOT (
((n)-[:PERSON_TRAIT]->(pt)-[:PERSON_TRAIT]->(t) AND exists(pt.has)) OR
(t)-[:GIVES_TRAIT]->(:`GivesTrait::Current`)-[:GIVES_TRAIT]->(:`Trait::Current`)<-[:PERSON_TRAIT]-(pt {has: false})<-[:PERSON_TRAIT]-(n)
)
RETURN count(t) as res
I need to return a node and all the nodes associated with it based on a relationship. An example of the query would be like:
MATCH (n) where id(n)= {neo_id}
OPTIONAL MATCH p=(n)-[:OWNS]->(q)
Would it be more efficient to get node 'n' on its own and then get the paths in a separate call, or should I do a callthat returns 'n' and 'p'.
Addt. information: I have to do this for multiple relationships and I have noticed that every time I add a relationship, the combinatorics between all the paths causes the performance to deteriorate. Example:
MATCH (n) where id(n)= {neo_id}
OPTIONAL MATCH p=(n)-[:OWNS]->(q:Something)
OPTIONAL MATCH o=(n)-[:USES]->(r:SomethingElse)
.
.
.
OPTIONAL MATCH l=(n)-[:LOCATED_IN]->(r:NthSomethingElse)
RETURN n, p, o,..., l
or
//Call 1
MATCH (n) where id(n)= {neo_id}
RETURN n
//Call 2
MATCH (n) where id(n)= {neo_id}
OPTIONAL MATCH p=(n)-[:OWNS]->(q:Something)
RETURN p
//Call 3
MATCH (n) where id(n)= {neo_id}
OPTIONAL MATCH o=(n)-[:USES]->(r:SomethingElse)
RETURN o
.
.
.
//Call nth
MATCH (n) where id(n)= {neo_id}
OPTIONAL MATCH l=(n)-[:LOCATED_IN]->(r:NthSomethingElse)
RETURN l
If you always want to get n (if it exists), even if it has no relationships, then your first query is really the only way to do that. If you don't want that, then combining the 2 clauses into 1 probably has little impact on performance.
The reason you notice a slow down every time you add another MATCH is because of "cartesian products". That is: if a MATCH or OPTIONAL MATCH clause would "normally" produce N rows of data, but previous clauses in the same query had already produced M rows of data, the actual resulting number of rows would be M*N. So, every extra MATCH has a multiplicative effect on the number of data rows.
Depending on your use case, you might be able to avoid cartesian products by using aggregation on the results of all (or hopefully most) MATCH clauses, which can turn N into 1 (or some other smaller number). For example:
MATCH (n) where id(n)= {neo_id}
OPTIONAL MATCH p=(n)-[:OWNS]->(:Something)
WITH n, COLLECT(p) AS owns
OPTIONAL MATCH o=(n)-[:USES]->(:SomethingElse)
WITH n, owns, COLLECT(o) AS uses
OPTIONAL MATCH l=(n)-[:LOCATED_IN]->(:NthSomethingElse)
WITH n, owns, uses, COLLECT(l) AS located_in
RETURN n, owns, uses, located_in;
I want to do something like this:
MATCH (p:person)-[a:UPVOTED]->(t:topic),(p:person)-[b:DOWNVOTED]->(t:topic),(p:person)-[c:FLAGGED]->(t:topic) WHERE ID(t)=4 RETURN COUNT(a),COUNT(b),COUNT(c)
..but I get all 0 counts when I should get 2, 1, 1
A better solution is to use size which improve drastically the performance of the query :
MATCH (t:Topic)
WHERE id(t) = 4
RETURN size((t)<-[:DOWNVOTED]-(:Person)) as downvoted,
size((t)<-[:UPVOTED]-(:Person)) as upvoted,
size((t)<-[:FLAGGED]-(:Person)) as flagged
If you are sure that the other nodes on the relationships are always labelled with Person, you can remove them from the query and it will be a bit faster again
Let's start with refactoring the query a bit (hopefully the meaning of it isn't lost):
MATCH
(t:topic)
(p:person)-[upvote:UPVOTED]-(t),
(p:person)-[downvote:DOWNVOTED]->(t),
(p:person)-[flag:FLAGGED]->(t)
WHERE ID(t)=4
RETURN COUNT(upvote), COUNT(downvote), COUNT(flag)
Since t is your primary variable (since you are filtering on it), I've matched once with the label and then used just the variable throughout the rest of the matches. Seeing the query cleaned up like this, it seems to me that you're trying to count all upvotes/downvotes/flags for a topic, but you don't care who did those things. Currently, since you're using the same variable p Cypher is going to try to match the same person for all three lines. So you could have different variables:
(p1:person)-[upvote:UPVOTED]-(t),
(p2:person)-[downvote:DOWNVOTED]->(t),
(p3:person)-[flag:FLAGGED]->(t)
Or better, since you're not referencing the people anywhere else, you can just leave the variables out:
(:person)-[upvote:UPVOTED]-(t),
(:person)-[downvote:DOWNVOTED]->(t),
(:person)-[flag:FLAGGED]->(t)
And stylistically, I would also suggest starting your matches with the item that you're filtering on:
(t)<-[upvote:UPVOTED]-(:person)
(t)<-[downvote:DOWNVOTED]-(:person)
(t)<-[flag:FLAGGED]-(:person)
The next problem comes in because by making these a MATCH, you're saying that there NEEDS to be a match. Which means you'll never get cases with zeros. So you'll want OPTIONAL MATCH:
MATCH (t:topic)
WHERE ID(t)=4
OPTIONAL MATCH (t)<-[upvote:UPVOTED]-(:person)
OPTIONAL MATCH (t)<-[downvote:DOWNVOTED]-(:person)
OPTIONAL MATCH (t)<-[flag:FLAGGED]-(:person)
RETURN COUNT(upvote), COUNT(downvote), COUNT(flag)
Even then, though what you're saying is: "Find a topic and find all cases where there is 1 upvote, no downvote, no flag, 1 upvote, 1 downvote, no flag, etc... to all permutations). That means you'll want to COUNT one at a time:
MATCH (t:topic)
WHERE ID(t)=4
OPTIONAL MATCH (t)<-[r:UPVOTED]-(:person)
WITH t, COUNT(r) AS upvotes
OPTIONAL MATCH (t)<-[r:DOWNVOTED]-(:person)
WITH t, upvotes, COUNT(r) AS downvotes
OPTIONAL MATCH (t)<-[r:FLAGGED]-(:person)
RETURN upvotes, downvotes, COUNT(r) AS flags
A couple of miscellaneous items:
Be careful about using Neo IDs as a long-term reference because they can be recycled.
Use parameters whenever possible for performance / security (WHERE ID(t)={topic_id})
Also, labels are generally TitleCase. See The Zen of Cypher guide.
Check this query, i think it will help you.
MATCH (p:person)-[a:UPVOTED]->(t:topic),
(p)-[b:DOWNVOTED]->(t),(p)-[c:FLAGGED]->(t)
WHERE ID(t)=4
RETURN COUNT(a) as a_count,COUNT(b) as b_count,COUNT(c) as c_count;
Your current MATCH requires that the same person node (identified by p) have relationships of all 3 types with t. This is because an identifier is bound to a specific node (or relationship, or value), and (unless hidden by a WITH clause, which you do not have in your query) will reference that same node (or relationship, or value) throughout a query.
Based on your expected results, I am assuming that you are just trying to count the number of relationships of those 3 types between any person and t. If so, this is a performant way to do that:
MATCH (t:topic)
WHERE ID(t) = 4
MATCH (:person)-[r:UPVOTED|DOWNVOTED|FLAGGED]->(t)
RETURN REDUCE(s=[0,0,0], x IN COLLECT(r) |
CASE TYPE(x)
WHEN 'UPVOTED' THEN [s[0]+1, s[1], s[2]]
WHEN 'DOWNVOTED' THEN [s[0], s[1]+1, s[2]]
ELSE [s[0], s[1], s[2]+1]
END
) As res;
res is an array with the number of UPVOTED, DOWNVOTED, and FLAGGED relationships, respectively, between any person and t.
Another approach would be to use separate OPTIONAL MATCH statements for each relationship type, returning three COUNT(DISTINCT x) values. But the above query uses a single MATCH statement, greatly reducing the number of DB hits, which are generally expensive.
In Neo4j 2.0 this query:
MATCH (n) WHERE n.username = 'blevine'
OPTIONAL MATCH n-[:Person]->person
OPTIONAL MATCH n-[:UserLink]->role
RETURN n AS user,person,collect(role) AS roles
returns different results than this query:
START n = node(*) WHERE n.username = 'blevine'
OPTIONAL MATCH n-[:Person]->person
OPTIONAL MATCH n-[:UserLink]->role
RETURN n AS user,person,collect(role) AS roles
The first query works as expected returning a single Node for 'blevine' and the associated Nodes mentioned in the OPTIONAL MATCH clauses. The second query returns many more Nodes which do not even have a username property. I realize that start n = node(*) is not recommended and that START is not even required in 2.0. But the second form (with OPTIONAL MATCH replaced with question marks on the relationship type) worked prior to 2.0. In the second form, why is 'n' not being constrained to the single 'blevine' node by the first WHERE clause?
To run the second query as expected you would just need to add WITH n. In your query you would need to filter the result and pass it for optional match which is to be done using WITH
START n = node(*) WHERE n.username = 'blevine'
WITH n
OPTIONAL MATCH n-[:Person]->person
OPTIONAL MATCH n-[:UserLink]->role
RETURN n AS user,person,collect(role) AS roles
From the documentation
WHERE defines the MATCH patterns in more detail. The predicates are part of the
pattern description, not a filter applied after the matching is done.
This means that WHERE should always be put together with the MATCH clause it belongs to.
when you do start n=node(*) where n.name="xyz" you need to pass the result explicitly into your next optional matches. But when you do MATCH (n) WHERE n.name="xyz" this tells graph specifically what node to start looking into.
EDIT
Here is the thing. The documentation says Optional Match returns null if a pattern is not found so in your first case, it includes all those results too where n.username property is null or cases where n doesnt even have a relationship suggested in the OPTIONAL MATCH pattern. So when you do a WITH n , the graph is explicitly told to use only n.
Excerpt from the documentation (link : here)
OPTIONAL MATCH matches patterns against your graph database, just like MATCH does.
The difference is that if no matches are found, OPTIONAL MATCH will use NULLs for
missing parts of the pattern. OPTIONAL MATCH could be considered the Cypher
equivalent of the outer join in SQL.
Either the whole pattern is matched, or nothing is matched. Remember that
WHERE is part of the pattern description, and the predicates will be
considered while looking for matches, not after. This matters especially
in the case of multiple (OPTIONAL) MATCH clauses, where it is crucial to
put WHERE together with the MATCH it belongs to.
Also few more things to note about the behaviour of WHERE clause: here
Excerpts:
WHERE is not a clause in it’s own right — rather, it’s part of MATCH,
OPTIONAL MATCH, START and WITH.
In the case of WITH and START, WHERE simply filters the results.
For MATCH and OPTIONAL MATCH on the other hand, WHERE adds constraints
to the patterns described. It should not be seen as a filter after the
matching is finished.