I am following Neo4j's 'Intermediate Cypher Queries' course in the neo4j graph academy, and I've been introduced to the WITH clause, whose basic function is to define or re-define the scope of variables. Now for some reason I can't quite wrap my head around the use of the WITH clause with the examples they have given, especially in regard to pipelining. For example, in one of the exercises I am told to use WITH to aggregate intermediate results. Here is the correct answer:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[r:RATED]-(:User)
WHERE p.name = 'Tom Hanks'
WITH m, avg(r.rating) AS avgRating
RETURN m.title AS Movie, avgRating AS `AverageRating`
ORDER BY avgRating DESC
But, to my mind, the WITH clause doesn't really do much work. To convince myself, I re-wrote the query to get the same result without the WITH clause:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[r:RATED]-(:User)
WHERE p.name = 'Tom Hanks'
RETURN m.title AS Movie, avg(r.rating) AS `AverageRating`
ORDER BY avg(r.rating) DESC
This works fine, with one less line of code. Perhaps the issue is just of example – in much longer queries the 'WITH method' would come into its own. But, as it stands, I can't fully account for the real use of WITH. So, for example, they talk about pipelining results, but we specified 'm' right at the start in the MATCH clause, so why are we bothering to have a WITH clause with it the 'm' variable in it again? As for the 'avg(r.rating)', really it just seems like we're wasting time renaming the result of a query when this is something we can just do as the end as I have done. So, what's really going on here? Can someone enlighten me?
The WITH clause is helpful when you want to do intermediate aggregations or do several aggregations in sequence. You could also do intermediate filtering. Think of it as an option to manipulate/transform data in the middle of a query statement.
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[r:RATED]-(:User)
WITH m, avg(r.rating) AS avgRating
WHERE avgRating > 8
RETURN m.title AS Movie, avgRating AS `AverageRating`
ORDER BY avgRating DESC
Here is one example where you perform intermediate aggregation combined with filtering, that otherwise wouldn't be possible without a WITH statement as the average rating has to be calculated, and you can't filter results in the RETURN statement
Related
I have this graph where the nodes are researchers, and they are related by a relationship named R1, the relationship has a "value" property. How can I get the name of the researchers that are in the relationships with the greatest value? It's like get all the relationships order by r.value DESC but getting only the first relationship per researcher, because I don't want to see on the table duplicated researcher names. By the way, is there a way to get the name of the researchers order by the mean of their relationship "values"? Sorry about the confused topic, I don't speak English very well, thank you very much.
I've been trying things like the Cypher query bellow:
MATCH p=(n)-[r:R1]->(c)
WHERE id(n) < id(c) and r.coauthors = false
return DISTINCT n.name order by n.campus, r.value DESC
Correct me if I am wrong, but you want one result per "n" with the highest value from "r"?
MATCH (n)-[r:R1]->(c)
WHERE r.coauthors = false
WITH n, r ORDER BY r.value DESC
WITH n, head(collect(r)) AS highR
RETURN n.name, highR.value ORDER BY n.campus, highR.value DESC
This will get you all the r's in order and pick the first head(collect(r)) after first doing an ORDER BY. Then you just need to return the values you want. Check out Neo4j Aggregation Functions for some documentation on how aggregation functions work. Good luck!
As an aside, if there is a label that all "n" have, you should add that in your MATCH: MATCH (n:Person) .... it will help speed up your query!
I have some sample tweets stored as neo4j. Below query finds top hashtags from specific country. It is taking a lot of time because the time filter for status type nodes is in where clause and is slowing the response. Is it possible to move this filter to MATCH clause so that status nodes are filtered before relationships are found?
match (c:country{countryCode:"PK"})-[*0..4]->(s:status)-[*0..1]->(h:hashtag) where (s.createdAt >= datetime('2017-06-01T00:00:00') AND s.createdAt
>= datetime('2017-06-01T23:59:59')) return h.name,count(h.name) as hCount order by hCount desc limit 100
thanks
As mentioned in my comment, whether a predicate for a property is in the MATCH clause or the WHERE clause shouldn't matter, as this is just syntactical sugar and is interpreted the same way by the query planner.
You can use PROFILE or EXPLAIN to see the query plan to see what it's doing. PROFILE will give you more information but will have to actually execute the query. You can attempt to use planner hints to force the planner to plan the match differently which may yield a better approach.
You will want to ensure you have an index on :status(createdAt).
You can also try altering your match a little, and moving the portion connecting to the country in question into your WHERE clause instead. Also it's a good idea to get the count based upon the hashtag node itself (assuming there's only one :hashtag node for a given name) so you can order and limit before you do property access:
MATCH (s:status)-[*0..1]->(h:hashtag)
WHERE (s.createdAt >= datetime('2017-06-01T00:00:00') AND s.createdAt
>= datetime('2017-06-01T23:59:59'))
AND (:country{countryCode:"PK"})-[*0..4]->(s)
WITH h, count(h) as hCount
ORDER BY hCount DESC
LIMIT 100
RETURN h.name, hCount
I'd like to write a cypher query which will tell me how frequently a particular node property occurs in a set of matches. For example, in
MATCH (:left)-->(p:right)
I'd like to know how many times the right nodes p.id are "id 1" or "id 2" and so on.
Currently I'm returning all the matches and then (using a separate tool - python) counting the number of times each id occurs in the records.
I'm sure there must be a way to do this purely in cypher using DISTINCT, collect() and count(), but I've got myself stuck...
I think that what your are searching is this query :
MATCH (:left)-->(p:right)
RETURN p.id, count(DISTINCT p)
Cheers
I'm very new to Neo4J and I can't get this simple query work.
The data I have looks like this:
(a)-[:likes]->(b)
(a)-[:likes]->(c)
Now I'd like to extract a list with everyone who likes someone else.
Tried
match (u)-[:likes]->(p) return u order by p.id desc;
This gives me a duplicate of (a).
I tried using distinct:
match (u)-[:likes]->(p) return distinct u order by p.id desc;
This gives me 'variable p undefined'.
I know that if I drop the ordering, distinct works and gives me (a) once.
But how can I work with distinct and order by in the same time?
Consider why your query isn't working:
Without the distinct, you have rows with each pairing of u and p. When you use DISTINCT, how is it supposed to order when there are multiple lines for the same u, matching to multiple p's? That's an impossible task.
If you change it to order by u.id instead, then it works just fine.
I do encourage you to use labels, by the way, to restrict your query only to relevant nodes. You can also rework your query to prevent it from emitting duplicates and avoid the need for DISTINCT completely.
If we assume the nodes you're interested in are labeled with :Person, your query might be:
MATCH (p:Person)
WHERE EXISTS( (p)-[:likes]-() )
RETURN p ORDER BY p.id DESC
I have a query that takes too long to execute:
MATCH (s:Person{id:"103"}), s-[rel]-a WITH rel, s
MATCH c1-[:friend]->s<-[:friend]-c2, c1-[fol:follows]->c2
RETURN DISTINCT c1,c2;
However, when I split it in two:
MATCH (s:Person{id:"103"}), s-[rel]-a
RETURN rel, s;
and
MATCH (s:Person{id:"103"}),
c1-[:friend]->s<-[:friend]-c2, c1-[fol:follows]->c2
RETURN DISTINCT c1,c2;
they are much faster.
Why is it that passing rel and s to the next query makes it so much slower?
(I'm asking because that sample query is only a part of a bigger one and I pass on rel and s with the WITH instead of RETURN to the next part of the query)
Thank you
The first cycles for each node and relation found in the first MATCH:
MATCH (s:Person{id:"103"}), s-[rel]-a WITH rel, s
One row for each relation involving that node. I would use the third query, since rel is never used.
Maybe you try to "profile" the query in Neo4J console, it will give you some clues of how the query actually executed in server.
btw, why would you need to pass the on the "rel" since it never been used