I have a pattern like so. A person can make visits (v:Visit {type:'introduction'}) to a property. Visits can have different types, such as inventory check, inspection, and so on.
Visits have a chronology, which is described by a (v:Visit)-[r:NEXT]->(vn:Visit) relationship. At a visit the person may recommend that another visit is scheduled (it may or may not be booked): (v:Visit)-[r:RECOMMENDED]->(i:Inspection).
My question is: is it possible to shape a Cypher query to find Visit nodes which have a -[r:RECOMMENDED]->(i:Inspection) pattern but DON'T have a Visit of {type: 'inspection'} within 2 [:NEXT] hops?
I have this, but the issue is that it returns a collection of relationships.
MATCH (v:Visit)-[r:RECOMMENDED]->(i:Inspection)
WITH v
MATCH (v)-[n:NEXT*1..2]->(vis:Visit)
WHERE NOT((v)-[n]->(vis:Visit {type:'Inspection'}))
RETURN v
LIMIT 10
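You might want to try OPTIONAL MATCH and then keep only the rows where the optional part came back null, like this (adjust the type value to match the casing stored in your data):
MATCH (v:Visit)-[r:RECOMMENDED]->(i:Inspection)
OPTIONAL MATCH (v)-[n:NEXT*1..2]->(vis:Visit {type:'inspection'})
WITH v, vis
WHERE vis IS NULL
RETURN v
LIMIT 10
An OPTIONAL MATCH lets you look for a pattern that may or may not be there. The WITH matters: a WHERE placed directly after OPTIONAL MATCH is part of the pattern match itself, so it would not remove any rows. After the WITH, WHERE vis IS NULL keeps only the visits for which no inspection visit was found within two NEXT hops. So that's your proof that those Visit items don't have a downstream inspection Visit.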
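The same "pattern must be absent" check can also be written as a negated pattern predicate. This is only a sketch: it assumes the type value is stored in lowercase ('inspection'), as in the question text, and it adds DISTINCT because a visit with several recommendations would otherwise appear more than once.

```cypher
// Visits that recommended an inspection...
MATCH (v:Visit)-[:RECOMMENDED]->(:Inspection)
// ...but have no inspection-type visit within one or two NEXT hops.
// Assumption: the type is stored in lowercase, as in the question text.
WHERE NOT (v)-[:NEXT*1..2]->(:Visit {type: 'inspection'})
RETURN DISTINCT v
LIMIT 10
```

Because the variable-length relationship is never bound to a variable here, nothing in the query returns a collection of relationships.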
I have two questions.
What is the best way to index user activities like posts, reposts, comments, upvotes, and downvotes? My current solution is representing every activity as a POST. It should work, but I know it's quite expensive to treat upvotes and downvotes as new nodes when I can just use a relationship to represent them. But then, I want to be able to fetch everything at once and order it.
Secondly: when I run the following without the WITH clauses and the MATCH clauses that follow them, the result is larger, but as I try to get the counts of reposts, replies and upvotes, the result keeps getting smaller and eventually there is nothing.
MATCH (me:User {id: "172ed572-e3af-d3ee-77c0-8d9d181b12f1"})-[:COLLEAGUE_OF]-(u:User)-[posted:POSTED]->(p:Post) WHERE posted.date >= 0
WITH p, posted, u AS user MATCH (p)-[ro:REPOST_OF]-(:Post)
WITH count(ro) AS reposts, posted, ro, user MATCH (p)-[rt:REPLY_TO]->(:Post)
WITH count(rt) AS replies, posted, user, reposts MATCH (p)-[uv:UP_VOTE]->(:Post)
WITH count(uv) AS upvotes, posted, user, reposts, replies, p
RETURN p AS post, posted, user, reposts, replies
ORDER BY -posted.date
You need to read the documentation on aggregating functions (like COUNT). In particular, you need to understand that the WITH (and RETURN) clause treats terms that do not contain aggregating functions as the "grouping keys" for the terms that do contain aggregating functions.
For example, a clause such as WITH foo, COUNT(foo) AS fooCount groups by foo, so when each row has a different non-null foo it will always produce a fooCount of 1.
WITH clauses must specify the bound variables whose values you want to use later in the same query; any unspecified variables will be dropped. Since your second and third WITH clauses do not specify p, their subsequent MATCH clauses are actually NOT using the previously bound value for p (but creating totally new p variables, each having multiple values).
You should use OPTIONAL MATCH instead of MATCH to get the counts of things that may not exist. When a MATCH fails to find a match for a row, that row is dropped, so a post with no reposts, replies, or upvotes disappears from the result.
You neglected to make the (p)-[ro:REPOST_OF]-(:Post) relationship pattern directional. If you wanted to get a count of the number of times that p was reposted, you should have used the pattern (p)<-[ro:REPOST_OF]-(:Post).
You forgot to return upvotes.
You should use ORDER BY posted.date DESC instead of ORDER BY -posted.date.
This may work better for you:
MATCH (:User {id: "172ed572-e3af-d3ee-77c0-8d9d181b12f1"})-[:COLLEAGUE_OF]-(user:User)-[posted:POSTED]->(p:Post)
WHERE posted.date >= 0
OPTIONAL MATCH (p)<-[ro:REPOST_OF]-(:Post)
WITH p, posted, user, COUNT(ro) AS reposts
OPTIONAL MATCH (p)-[rt:REPLY_TO]->(:Post)
WITH p, posted, user, reposts, COUNT(rt) AS replies
OPTIONAL MATCH (p)-[uv:UP_VOTE]->(:Post)
RETURN p, posted, user, reposts, replies, COUNT(uv) AS upvotes
ORDER BY posted.date DESC
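To see the grouping-key behaviour described above in isolation, here is a small self-contained sketch that needs no data in the database. The literal list and the names total and perValueCounts are made up for illustration.

```cypher
// Three distinct values, one per row.
UNWIND ['a', 'b', 'c'] AS foo
// foo is a grouping key here, so count(foo) is computed separately per value.
WITH foo, count(foo) AS fooCount
// No non-aggregated term below, so both aggregates span all three rows.
RETURN count(foo) AS total, collect(fooCount) AS perValueCounts
// total = 3, perValueCounts = [1, 1, 1]
```

The first count is grouped by foo and so yields 1 per row; the second has no grouping key and yields the overall total.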
My database contains information about the nominations for the Academy Awards.
I want to know how many directors have won an Oscar for "Best Director" more than once.
I can't quite get to the result that I want, a list of nominees.
The closest I've been is with this query:
MATCH (n:Nominee)-[n1:NOMINATED]->(c:Category)
WHERE c.name="Best Director" AND n1.win=true
RETURN count(n1.win), n.name
ORDER BY n.name;
which returns the directors' names and the number of times they won an Oscar.
I tried to do something like
MATCH (n:Nominee)-[n1:NOMINATED]->(c:Category)
WHERE c.name="Best Director" AND n1.win=true AND count(n1.win)>1
RETURN n.name;
but got an error that says
Invalid use of aggregating function count(...) in this context (line
2, column 50 (offset: 96)) "WHERE c.name="Best Director" AND
n1.win=true AND count(n1.win)>1"
Can someone help me with this?
Use WITH to aggregate the wins first. According to the docs:
[...] WITH is used to introduce aggregates which can then be used in predicates in WHERE. These aggregate expressions create new bindings in the results. WITH can also, like RETURN, alias expressions that are introduced into the results using the aliases as binding name.
So a query like this should work:
MATCH (n:Nominee)-[n1:NOMINATED]->(c:Category)
WHERE c.name="Best Director" AND n1.win=true
WITH n, count(n1.win) AS winCount
WHERE winCount > 1
RETURN n.name;
See also the docs on WHERE:
WHERE adds constraints to the patterns in a MATCH or OPTIONAL MATCH clause or filters the results of a WITH clause.
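If the goal is the number of such directors rather than their names, the same WITH ... WHERE shape can feed a second aggregation. This is a sketch built on the labels and properties from the question; the alias directorsWithMultipleWins is made up.

```cypher
MATCH (n:Nominee)-[n1:NOMINATED]->(c:Category)
WHERE c.name = "Best Director" AND n1.win = true
// First aggregation: one row per nominee with their number of wins.
WITH n, count(n1) AS winCount
WHERE winCount > 1
// Second aggregation: no grouping key, so this counts the remaining nominees.
RETURN count(n) AS directorsWithMultipleWins;
```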
Context:
I'm working on an Alumni project to understand the difference between giving and engagement (engagement = showing up, attending events, volunteering, etc.). The value in the work will come from the insight gained from understanding the behavior patterns.
In the query below I've been effective at bringing back the "Biggest spenders"; however, I'd like to list the name of the (n) Alumni and the (a,b) gifts. There are 30 gift types that fit into (a,b).
Please let me know your thoughts... Innosoljim
//Who are Alumni that give the most?
MATCH (n:Alumni)-[r:Supportfin]->(b)
MATCH (n:Alumni)-[t:Gavefin]->(a)
RETURN n,b,a LIMIT 1500
Thanks for the answer. Let me restate the goal for clarity: I'm trying to consolidate (into n:Alumni) many relationships -[Gave|Support]-> to unique nodes (various gifts) so that I can obtain a report on an Alumni's activity (giving, support) by n.name. The graph model places the Alumni node at the center of each unique behavior (giving, support, graddate, address, degree, greeklife, etc.). Does this help?
MATCH (a:Alumni)-[r:Supportfin|Gavefin]->(gift)
RETURN a.name, collect(gift)
ORDER BY (a)-[r:Supportfin|Gavefin]-> count(*) DESC
Something like this maybe although this isn't working (syntaxerror)
Match Alumni to the gifts with both relationship types and return:
MATCH (a:Alumni)-[r:Supportfin|Gavefin]->(gift)
RETURN a.name, collect(gift)
Or split it by the different relationship types:
MATCH (a:Alumni)
OPTIONAL MATCH (a)-[:Supportfin]->(sup_gift)
OPTIONAL MATCH (a)-[:Gavefin]->(gave_gift)
RETURN a.name, collect(DISTINCT sup_gift), collect(DISTINCT gave_gift)
Without a proper description of your graph model and problem, the question is difficult to answer.
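The ordering that the comment above was reaching for can be sketched by aggregating first and then sorting on the aggregate's alias. Note the assumption: this ranks alumni by the number of gift nodes, not by amount, since no amount property is described in the question. The aliases and the LIMIT value are made up.

```cypher
MATCH (a:Alumni)-[:Supportfin|Gavefin]->(gift)
// One row per alumnus, with the gifts collected and counted.
WITH a, collect(gift) AS gifts, count(gift) AS giftCount
RETURN a.name AS name, giftCount, gifts
// ORDER BY refers to the alias, not to a pattern.
ORDER BY giftCount DESC
LIMIT 25
```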
I'm brand new to neo4j and graph databases.
I'm trying to create a query I would describe as a 'contains all'; however, I think I'm very far away and not sure how to progress.
MATCH (movie:Movie {name:'tropic thunder'})-[:stars_in]-(actors)
-[:guest_stars_in]-(movie2)
RETURN movie2.name
Let's say
MATCH (movie:Movie{name:'tropic thunder'})-[:stars_in]-(actors)
returns 5 actors
I'm looking to match exactly (all 5 actors -> same 5 actors as guest stars) or as a subset (all 5 actors are a subset of a movie which has 10 guest stars).
Hope that makes sense. Thanks for your help :D
The first thing I would point out is that you should call the variable actor instead of actors. It may seem picky, but it's a common confusion with Cypher. With the MATCH you are matching one sub-pattern at a time.
So to start out let's find each movie2 and get an array of the actors in question:
MATCH (movie:Movie {name:'tropic thunder'})-[:stars_in]-(actor)
-[:guest_stars_in]-(movie2)
RETURN movie2.name, collect(actor)
A first instinct might be to extend the path like so:
MATCH (movie:Movie {name:'tropic thunder'})-[:stars_in]-(actor)
-[:guest_stars_in]-(movie2)-[:guest_stars_in]-(actor2)
But again, we're matching every possible match of that path in the database. So for each actor, we're going to match all possible actor2s, which would lead to duplicates.
What we can do, though, is to take our first query and change the RETURN to a WITH in order to pass our data onto a second part of the query:
MATCH (movie:Movie {name:'tropic thunder'})-[:stars_in]-(actor)
-[:guest_stars_in]-(movie2)
WITH movie2, collect(actor) AS original_movie_actors
MATCH (movie2)-[:guest_stars_in]-(guest_star)
RETURN movie2.name, original_movie_actors, collect(guest_star) AS guest_stars
This gives us
a list of movies in question
the list of the actors who both starred in "tropic thunder" and guest starred in the movie in question
all guest stars for the movie in question
From here you could probably figure it out in your programming language of choice. But we can figure this out in Cypher too:
MATCH (movie:Movie {name:'tropic thunder'})-[:stars_in]-(actor)
-[:guest_stars_in]-(movie2)
WITH movie2, collect(actor) AS original_movie_actors
MATCH (movie2)-[:guest_stars_in]-(guest_star)
WITH movie2, original_movie_actors, collect(guest_star) AS guest_stars
RETURN
movie2.name,
ALL(guest_star IN guest_stars WHERE guest_star IN original_movie_actors) AS all_matched,
toFloat(size(original_movie_actors)) / size(guest_stars) AS percentage_match
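I threw in a percentage_match as a double-check and in case that's useful. Note that movie is no longer in scope after the WITH clauses, so the RETURN uses movie2, and that the division is done on a float so the ratio is not truncated to an integer.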
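For the stricter "contains all" reading of the question (every actor of the original movie must appear among the guest stars), one possible sketch is to collect the full cast first and test it against each candidate movie. The names original_cast and exact_match are made up, and the second MATCH scans every guest_stars_in relationship, so it may be slow on a large graph.

```cypher
// Collect the complete cast of the original movie once.
MATCH (:Movie {name: 'tropic thunder'})-[:stars_in]-(actor)
WITH collect(DISTINCT actor) AS original_cast
// Collect the guest stars of every candidate movie.
MATCH (movie2)-[:guest_stars_in]-(guest_star)
WITH movie2, original_cast, collect(DISTINCT guest_star) AS guest_stars
// Keep only movies whose guest stars include the whole original cast.
WHERE all(a IN original_cast WHERE a IN guest_stars)
RETURN movie2.name AS movie,
       size(guest_stars) = size(original_cast) AS exact_match
```

exact_match is true when the two sets are identical and false when the original cast is a proper subset of the guest stars. If the original movie has no cast at all, the all(...) test is trivially true for every movie.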
I want to do something like this:
MATCH (p:person)-[a:UPVOTED]->(t:topic),(p:person)-[b:DOWNVOTED]->(t:topic),(p:person)-[c:FLAGGED]->(t:topic) WHERE ID(t)=4 RETURN COUNT(a),COUNT(b),COUNT(c)
..but I get all 0 counts when I should get 2, 1, 1
A better solution is to use size, which drastically improves the performance of the query:
MATCH (t:Topic)
WHERE id(t) = 4
RETURN size((t)<-[:DOWNVOTED]-(:Person)) as downvoted,
size((t)<-[:UPVOTED]-(:Person)) as upvoted,
size((t)<-[:FLAGGED]-(:Person)) as flagged
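If you are sure that the other nodes on the relationships are always labelled with Person, you can remove the labels from the query and it will be a bit faster again. (Labels are case-sensitive, so use topic and person here if that is how they are stored in your data.)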
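As far as I know, Neo4j 5 no longer accepts size() wrapped around a pattern; there the same idea can be sketched with COUNT subqueries. This assumes a 5.x server and the Topic/Person labels used in the query above.

```cypher
MATCH (t:Topic)
WHERE id(t) = 4
// Each COUNT { ... } counts the matches of its own pattern independently.
RETURN COUNT { (t)<-[:DOWNVOTED]-(:Person) } AS downvoted,
       COUNT { (t)<-[:UPVOTED]-(:Person) } AS upvoted,
       COUNT { (t)<-[:FLAGGED]-(:Person) } AS flagged
```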
Let's start with refactoring the query a bit (hopefully the meaning of it isn't lost):
MATCH
(t:topic),
(p:person)-[upvote:UPVOTED]->(t),
(p:person)-[downvote:DOWNVOTED]->(t),
(p:person)-[flag:FLAGGED]->(t)
WHERE ID(t)=4
RETURN COUNT(upvote), COUNT(downvote), COUNT(flag)
Since t is your primary variable (you are filtering on it), I've matched it once with the label and then used just the variable throughout the rest of the matches. Seeing the query cleaned up like this, it seems to me that you're trying to count all upvotes/downvotes/flags for a topic, but you don't care who did those things. Currently, because you're using the same variable p, Cypher is going to try to match the same person for all three lines. So you could have different variables:
(p1:person)-[upvote:UPVOTED]->(t),
(p2:person)-[downvote:DOWNVOTED]->(t),
(p3:person)-[flag:FLAGGED]->(t)
Or better, since you're not referencing the people anywhere else, you can just leave the variables out:
(:person)-[upvote:UPVOTED]->(t),
(:person)-[downvote:DOWNVOTED]->(t),
(:person)-[flag:FLAGGED]->(t)
And stylistically, I would also suggest starting your matches with the item that you're filtering on:
(t)<-[upvote:UPVOTED]-(:person),
(t)<-[downvote:DOWNVOTED]-(:person),
(t)<-[flag:FLAGGED]-(:person)
The next problem comes in because by making these a MATCH, you're saying that there NEEDS to be a match. Which means you'll never get cases with zeros. So you'll want OPTIONAL MATCH:
MATCH (t:topic)
WHERE ID(t)=4
OPTIONAL MATCH (t)<-[upvote:UPVOTED]-(:person)
OPTIONAL MATCH (t)<-[downvote:DOWNVOTED]-(:person)
OPTIONAL MATCH (t)<-[flag:FLAGGED]-(:person)
RETURN COUNT(upvote), COUNT(downvote), COUNT(flag)
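Even then, though, what you're saying is: "Find a topic and find every combination of one upvote, one downvote and one flag on it." The three OPTIONAL MATCH clauses multiply together, so each relationship is counted once per combination; for example, 2 upvotes, 1 downvote and 1 flag produce 2 rows, and all three counts come out as 2. That means you'll want to COUNT one at a time: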
MATCH (t:topic)
WHERE ID(t)=4
OPTIONAL MATCH (t)<-[r:UPVOTED]-(:person)
WITH t, COUNT(r) AS upvotes
OPTIONAL MATCH (t)<-[r:DOWNVOTED]-(:person)
WITH t, upvotes, COUNT(r) AS downvotes
OPTIONAL MATCH (t)<-[r:FLAGGED]-(:person)
RETURN upvotes, downvotes, COUNT(r) AS flags
A couple of miscellaneous items:
Be careful about using Neo IDs as a long-term reference because they can be recycled.
Use parameters whenever possible for performance / security (WHERE ID(t) = $topic_id; the older {topic_id} form was removed in Neo4j 4.0)
Also, labels are generally TitleCase. See The Zen of Cypher guide.
Check this query; I think it will help you.
MATCH (p:person)-[a:UPVOTED]->(t:topic),
(p)-[b:DOWNVOTED]->(t),(p)-[c:FLAGGED]->(t)
WHERE ID(t)=4
RETURN COUNT(a) as a_count,COUNT(b) as b_count,COUNT(c) as c_count;
Your current MATCH requires that the same person node (identified by p) have relationships of all 3 types with t. This is because an identifier is bound to a specific node (or relationship, or value), and (unless hidden by a WITH clause, which you do not have in your query) will reference that same node (or relationship, or value) throughout a query.
Based on your expected results, I am assuming that you are just trying to count the number of relationships of those 3 types between any person and t. If so, this is a performant way to do that:
MATCH (t:topic)
WHERE ID(t) = 4
MATCH (:person)-[r:UPVOTED|DOWNVOTED|FLAGGED]->(t)
RETURN REDUCE(s=[0,0,0], x IN COLLECT(r) |
CASE TYPE(x)
WHEN 'UPVOTED' THEN [s[0]+1, s[1], s[2]]
WHEN 'DOWNVOTED' THEN [s[0], s[1]+1, s[2]]
ELSE [s[0], s[1], s[2]+1]
END
) As res;
res is an array with the number of UPVOTED, DOWNVOTED, and FLAGGED relationships, respectively, between any person and t.
Another approach would be to use separate OPTIONAL MATCH statements for each relationship type, returning three COUNT(DISTINCT x) values. But the above query uses a single MATCH statement, greatly reducing the number of DB hits, which are generally expensive.
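If a list-valued result is not required, a simpler variant of the single-MATCH idea is to let the relationship type act as the grouping key. This is a sketch with made-up aliases kind and total; unlike the REDUCE version, a type with no relationships produces no row at all rather than a zero.

```cypher
MATCH (t:topic)
WHERE ID(t) = 4
MATCH (:person)-[r:UPVOTED|DOWNVOTED|FLAGGED]->(t)
// type(r) is the grouping key, so count(r) is computed per relationship type.
RETURN type(r) AS kind, count(r) AS total
```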