With the following graph:
How can I write a query that returns the N latest relationships per unique target node?
For example, this query:
MATCH (p)-[r:RATED_IN]->(s) WHERE id(p)={person} RETURN p,s,r ORDER BY r.measurementDate DESC LIMIT {N}
with N = 1 returns the latest relationship, whether it is RATED_IN Team Lead or Programming, but I would like to get the N latest per skill. With N = 2, for instance, I would like the 2 latest measurements per skill node.
I would like the latest relationship by a person for Team Lead and the latest one for Programming.
How can I write such a query?
-- EDIT --
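MATCH (p:Person)-[:RATED_IN]->(s)
WHERE id(p)=175
// one row per (person, skill) pair, so the subquery's LIMIT applies per skill
WITH DISTINCT p, s
CALL apoc.cypher.run('
WITH {p} AS p, {s} AS s
MATCH (p)-[r:RATED_IN]->(s)
RETURN r ORDER BY r.measurementDate DESC LIMIT 2',
{p:p, s:s}) YIELD value
RETURN p, s, value.r AS r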
Here's a Cypher knowledge base article on limiting MATCH results per row, with a few different suggestions for working around current limitations. Using APOC's apoc.cypher.run() to run a subquery whose RETURN uses LIMIT will do the trick, because the subquery is executed once per row (so the LIMIT applies per row).
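Outside the database, the per-row LIMIT amounts to a top-N-per-group computation: group the relationships by target node, sort each group newest first and keep N. A minimal Python sketch of that logic, assuming the ratings have already been fetched as (skill, measurementDate, relationship id) tuples with ISO-8601 date strings (the function name and sample data are hypothetical):

```python
from collections import defaultdict


def latest_per_target(rels, n):
    """Return up to n most recent relationship ids per target node.

    rels: iterable of (target, measurement_date, rel_id) tuples, a
    hypothetical in-memory stand-in for the (s, r.measurementDate, r) rows.
    """
    by_target = defaultdict(list)
    for target, measured, rel_id in rels:
        by_target[target].append((measured, rel_id))
    latest = {}
    for target, items in by_target.items():
        # Newest first, then keep n: the per-row ORDER BY ... DESC LIMIT n.
        items.sort(key=lambda item: item[0], reverse=True)
        latest[target] = [rel_id for _, rel_id in items[:n]]
    return latest


# Hypothetical sample data: ISO dates sort correctly as plain strings.
RATINGS = [
    ("Team Lead", "2019-01-10", "r1"),
    ("Programming", "2019-03-02", "r2"),
    ("Team Lead", "2019-05-20", "r3"),
    ("Programming", "2018-11-30", "r4"),
    ("Team Lead", "2018-07-01", "r5"),
]
```

latest_per_target(RATINGS, 2) keeps the two newest ratings for each skill rather than the two newest overall, which is exactly the difference between limiting per row and limiting the whole result.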
Note that for the upcoming Neo4j 4.0 release at the end of the year we're going to be getting some nice Cypher goodies that will make this significantly easier. Stay tuned as we reveal more details as we approach its release!
Related
I have a large DB (>1M customers, >100k products).
I am trying to build product recommendations from real-time data. However, the first query I built never resolves (or at least I stop it after 30 min).
I want to input a customer and get all the uses of the products they purchased. Then I want to get all customers who have those same uses and more. (I was going to do things after that, but I can't get past this part.)
All nodes are indexed on a unique ID.
MATCH (c:customer {customer_id:'0c4c518e5d1eaf3fc39f93463c2406ad8b659d6c22c9107179e3992f647b12aa'})-[:PURCHASE]->(p:product)-[:HAS]->(u:use)
WITH DISTINCT(u.section_id) as uses
MATCH (ac:customer)-[:PURCHASE]->(ap:product)-[:HAS]->(au:use)
WHERE au.section_id in uses
RETURN ac
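Each MATCH produces one row per matching path, so in the second MATCH above every (customer, product, use) combination becomes its own row and RETURN ac repeats the same customer many times. A small Python model of that row expansion, using hypothetical in-memory purchases and product uses rather than the real graph:

```python
def matching_customers(purchases, product_uses, customer):
    """Model the two MATCH clauses over hypothetical in-memory data.

    purchases: list of (customer_id, product_id) pairs
    product_uses: dict mapping product_id -> set of use section_ids
    """
    # First MATCH + WITH DISTINCT: the section_ids of everything `customer` bought.
    uses = {u for c, p in purchases if c == customer for u in product_uses.get(p, ())}
    # Second MATCH: one row per (customer, product, use) path whose use is in `uses`.
    rows = [c for c, p in purchases for u in product_uses.get(p, ()) if u in uses]
    # RETURN DISTINCT ac would keep each customer only once.
    return rows, set(rows)


# Hypothetical sample data.
PURCHASES = [("alice", "p1"), ("bob", "p1"), ("bob", "p2"), ("carol", "p3")]
PRODUCT_USES = {"p1": {"u1", "u2"}, "p2": {"u1"}, "p3": {"u9"}}
```

Here two customers produce five rows. In Cypher, RETURN DISTINCT ac collapses the duplicates the same way the set does, although in general the paths still have to be expanded before they can be de-duplicated.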
I built a new query that does the full recommendation, but it still takes 1 min.
I believe the issue was that I was matching up to :use and then going back down, which must have been producing far too many nodes to sort through.
MATCH (c:customer {customer_id:'0c4c518e5d1eaf3fc39f93463c2406ad8b659d6c22c9107179e3992f647b12aa'})-[:PURCHASE]->(p)<-[:PURCHASE]-(oc)
MATCH (u:use)<-[:HAS]-(p)-[:HAS]->(h:color)-[:IS_A]->(ph)
WHERE c <> oc AND oc.age > (c.age-10) AND oc.age < (c.age+10)
MATCH (oc)-[:PURCHASE]->(np)-[:HAS]->(u)
WITH p, np, ph, u, COLLECT(DISTINCT p.gra_id) as styles
WHERE p.product_code <> np.product_code AND np.gra_id in styles AND (np)-[:HAS]->(:color)-[:IS_A]->(ph)
WITH u.name as use, np.article_id as product, count(*) as score
ORDER BY score DESC
RETURN use, collect(product)[0..6] as products
ORDER BY use
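The last lines of that query rely on a common idiom: compute a score per (use, product), sort by score, then collect per use and slice, so each list holds that use's top-scoring products. A Python sketch of the same steps, assuming the matched rows are available as hypothetical (use, product) pairs, one per path:

```python
from collections import Counter, defaultdict


def top_products_per_use(pairs, k=6):
    """pairs: hypothetical (use, product) rows, one per matched path."""
    # count(*) grouped by (use, product)
    scores = Counter(pairs)
    per_use = defaultdict(list)
    # ORDER BY score DESC, then collect per use: each list follows the sort order.
    for (use, product), _score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        per_use[use].append(product)
    # collect(product)[0..k] keeps the first k of each list; ORDER BY use.
    return {use: products[:k] for use, products in sorted(per_use.items())}
```

Because the sort happens before the per-use collection, slicing the collected list is enough to get a top-k per use without a separate ranking step.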
I am using Neo4j CE 3.1.1 and I have a WRITES relationship between authors and books. I want to find the N books (say N=10) with the largest number of authors. Following some examples I found, I came up with this query:
MATCH (a)-[r:WRITES]->(b)
RETURN r,
COUNT(r) ORDER BY COUNT(r) DESC LIMIT 10
When I execute this query in the Neo4j Browser I get 10 books, but these do not look like the ones with the most authors, as they show only a few WRITES relationships to authors. If I change the query to
MATCH (a)-[r:WRITES]->(b)
RETURN b,
COUNT(r) ORDER BY COUNT(r) DESC LIMIT 10
Then I get the 10 books with the most authors, but I don't see their relationship to authors. To do so, I have to write additional queries explicitly stating the name of a book I found in the previous query:
MATCH ()-[r:WRITES]->(b)
WHERE b.title="Title of a book with many authors"
RETURN r
What am I doing wrong? Why isn't the first query working as expected?
Aggregations are grouped by the non-aggregated columns, and with your MATCH, each relationship occurs only once in your results.
So your first query asks for each relationship on its own row, together with the count of that particular relationship, which is always 1.
You could rewrite this in a couple of different ways.
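The grouping key is easy to see in a tiny Python model: counting per relationship gives 1 for every row, while counting per book gives the number of authors. The sample WRITES tuples are hypothetical:

```python
from collections import Counter

# Hypothetical WRITES relationships as (rel_id, author, book).
WRITES = [
    ("r1", "ann", "b1"), ("r2", "bob", "b1"), ("r3", "cy", "b1"),
    ("r4", "ann", "b2"),
]

# RETURN r, COUNT(r): the relationship itself is the grouping key.
per_relationship = Counter(rel_id for rel_id, _, _ in WRITES)

# RETURN b, COUNT(r): the book is the grouping key.
per_book = Counter(book for _, _, book in WRITES)
```

Every value in per_relationship is 1, so ordering by it is arbitrary; per_book is the count the question actually wants.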
One is to collect the authors and order on the size of the author list:
MATCH (a)-[:WRITES]->(b)
RETURN b, COLLECT(a) as authors
ORDER BY SIZE(authors) DESC LIMIT 10
You can always collect the author and its relationship, if the relationship itself is interesting to you.
EDIT
If you happen to have labels on your nodes (you absolutely SHOULD have labels on your nodes), you can try a different approach by matching to all books, getting the size of the incoming :WRITES relationships to each book, ordering and limiting on that, and then performing the match to the authors:
MATCH (b:Book)
WITH b, SIZE(()-[:WRITES]->(b)) as authorCnt
ORDER BY authorCnt DESC LIMIT 10
MATCH (a)-[:WRITES]->(b)
RETURN b, a
You can collect on the authors and/or return the relationship as well, depending on what you need from the output.
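This count, order, limit, then re-match shape can be sketched in Python over hypothetical (author, book) pairs: the first phase ranks books using only a count, and only the N winners are expanded back to their authors. It models the query's logic, not how Neo4j executes it internally:

```python
from collections import Counter, defaultdict


def top_books_with_authors(writes, n):
    """writes: hypothetical (author, book) pairs standing in for (a)-[:WRITES]->(b)."""
    # Phase 1: only a count per book (like SIZE(()-[:WRITES]->(b))), no author lists.
    degree = Counter(book for _, book in writes)
    top = [book for book, _ in degree.most_common(n)]
    # Phase 2: re-match authors for the n selected books only.
    wanted = set(top)
    authors = defaultdict(list)
    for author, book in writes:
        if book in wanted:
            authors[book].append(author)
    return [(book, authors[book]) for book in top]
```

Collecting authors is deferred until after the LIMIT, so the expensive part runs for N books instead of all of them.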
You are very close: after sorting, it is necessary to rediscover the authors. For example:
MATCH (a:Author)-[r:WRITES]->(b:Book)
WITH b,
COUNT(r) AS authorsCount
ORDER BY authorsCount DESC LIMIT 10
MATCH (b)<-[:WRITES]-(a:Author)
RETURN b,
COLLECT(a) AS authors
ORDER BY size(authors) DESC
I have a collection of :Product nodes and I want to return the latest 100. Consider a global query like:
MATCH (p:Product) RETURN p LIMIT 100
From what I can see it returns oldest nodes first. Is there a way of getting newest on top?
ORDER BY won't be an option, as the number of products can be in the millions.
UPDATE
I ended up creating a dense node (:ProductIndex). Each time I create a product, I add it to the index: (:Product)-[:INDEXED]->(:ProductIndex). With dense nodes, the relationship chain is ordered latest first, so the query below returns the newest records on top (without an ORDER BY, though, this order is an implementation detail rather than a guarantee):
MATCH (p:Product)-[:INDEXED]->(:ProductIndex)
RETURN p
LIMIT 1000
I can always keep the index at a fixed size, as I don't need to preserve the full history.
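A fixed-size index of recent products behaves like a bounded queue: new entries go in at one end and the oldest fall out once the capacity is reached. A minimal Python sketch of that idea using collections.deque with maxlen; the class name is hypothetical, and in the graph the eviction would still have to be done by the application (for example by deleting the oldest :INDEXED relationship):

```python
from collections import deque


class RecentProducts:
    """Bounded, newest-first index of product ids (hypothetical helper)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item when full.
        self._items = deque(maxlen=capacity)

    def add(self, product_id):
        self._items.append(product_id)

    def latest(self, n):
        # Newest first, matching the latest-first order the index is meant to give.
        return list(reversed(self._items))[:n]
```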
Is the data model such that the products are connected in a list (e.g. (:Product)-[:PREVIOUS]->(:Product))?
Can you keep track of the most recent node, either with a timestamp that you can easily locate or with another node connected to your most recent product node?
If so, you could always query out the most recent ones with a query similar to the following.
match (max:Date {name: 'Last Product Date'})-->(latest:Product)
with latest
match p=(latest)-[:PREVIOUS*..100]->(:Product)
return nodes(p)
order by length(p) desc
limit 1
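Walking the list means following PREVIOUS pointers from the newest product until enough nodes have been collected. A Python sketch, assuming the chain has been loaded into a hypothetical dict that maps each product id to the next older one:

```python
def latest_products(head, previous, limit):
    """Follow PREVIOUS pointers from the newest product.

    head: id of the most recent product
    previous: hypothetical dict, product id -> next older product id
    """
    result = []
    node = head
    while node is not None and len(result) < limit:
        result.append(node)
        node = previous.get(node)  # None once the oldest product is reached
    return result
```

Collecting `limit` nodes takes `limit - 1` hops, so a variable-length pattern such as -[:PREVIOUS*..100]-> can return up to 101 nodes.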
Or something like this, where you select the most recent date first and then match the product with that date:
match (max:Date {name: 'Product Date'})
with max
match p=(latest:Product)-[:PREVIOUS*..100]->(:Product)
where latest.date = max.date
return nodes(p)
order by length(p) desc
limit 1
Another approach, still using a list, could be to keep an indexed creation-date property on each product. When looking for the most recent one, pick a control date that doesn't go back to the beginning of time, so you have a smaller pool of nodes (i.e. not millions). Then use the max() function on that smaller pool to find the most recent node and follow the list back by however many you want.
match (latest:Product)
where latest.date > {control_date}
with max(latest.date) as latest_date
match p=(product:Product)-[:PREVIOUS*..100]->(:Product)
where product.date = latest_date
return nodes(p)
order by length(p) desc
limit 1
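The control-date idea in Python: only products newer than the control date are scanned to find the starting point, but the walk back along the list is not restricted by it. The data structures and names are hypothetical:

```python
def latest_via_control_date(dates, previous, control_date, limit):
    """dates: hypothetical dict, product id -> ISO date string.
    previous: hypothetical dict, product id -> next older product id.
    """
    # Only the (indexed) range newer than the control date is scanned.
    recent = {pid: d for pid, d in dates.items() if d > control_date}
    if not recent:
        return []  # control date too recent: pick an earlier one and retry
    newest = max(recent, key=recent.get)
    # Follow the PREVIOUS chain back from the newest product.
    result, node = [], newest
    while node is not None and len(result) < limit:
        result.append(node)
        node = previous.get(node)
    return result
```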
Deleting a node in a linked list is pretty simple. If you need to perform this search a lot and you don't want to order the products, I think keeping the products in a list is a pretty good graph application. Here is an example of a delete that maintains the list.
match (previous:Product)<-[:PREVIOUS]-(product_to_delete:Product)<-[:PREVIOUS]-(next:Product)
where product_to_delete.name = 'name of node to delete'
create (previous)<-[:PREVIOUS]-(next)
detach delete product_to_delete
return previous, next
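The relinking step of the delete can be modelled with a dict that maps each product to the next older one. This hypothetical Python sketch also handles the head and tail of the list; the Cypher query above only matches a product that has both a previous and a next neighbour, so deleting an end node needs a separate query:

```python
def delete_product(previous, product):
    """Remove `product` from a PREVIOUS chain stored as a dict newer -> older."""
    older = previous.pop(product, None)
    newer = next((n for n, o in previous.items() if o == product), None)
    if newer is not None:
        if older is not None:
            # Interior node: (previous)<-[:PREVIOUS]-(next)
            previous[newer] = older
        else:
            # Deleted the oldest product: `newer` becomes the new tail.
            del previous[newer]
    return previous
```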
I have graph: (:Sector)<-[:BELONGS_TO]-(:Company)-[:PRODUCE]->(:Product).
I'm looking for the query below.
Start with a (:Sector). Then match the first 50 companies in that sector and, for each company, match the first 10 products.
The first limit is simple, but what about limiting the products?
Is it possible with cypher?
UPDATE
As @cybersam suggested, the query below returns valid results:
MATCH (s:Sector)<-[:BELONGS_TO]-(c:Company)
WITH c
LIMIT 50
MATCH (c)-[:PRODUCE]->(p:Product)
WITH c, (COLLECT(p))[0..10] AS products
RETURN c, products
However, this solution doesn't scale, as it still traverses all products for each company: the slice is applied only after each company's products have been collected. As the number of products grows, query performance will degrade.
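The scaling concern is the difference between slicing a collection that has already been fully built and stopping a traversal after k items. A Python analogy, using a hypothetical generator in place of a company's PRODUCE relationships, where itertools.islice stops pulling items once it has k of them:

```python
from itertools import islice


def product_stream(total, touched):
    """Hypothetical lazy traversal of one company's products."""
    for i in range(total):
        touched.append(i)  # record every product the traversal reaches
        yield f"product-{i}"


def collect_then_slice(total, k):
    touched = []
    products = list(product_stream(total, touched))[:k]  # like COLLECT(p)[0..k]
    return products, len(touched)


def stop_after_k(total, k):
    touched = []
    products = list(islice(product_stream(total, touched), k))  # stops pulling at k
    return products, len(touched)
```

Both return the same k products, but the first touches every product while the second touches only k, which is why the per-row LIMIT approaches below are preferable for companies with many products.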
Each returned row of this query will contain: a sector, one of its companies (at most 50 per sector), and a collection of up to 10 products for that company:
MATCH (s:Sector)<-[:BELONGS_TO]-(c:Company)
WITH s, (COLLECT(c))[0..50] AS companies
UNWIND companies AS company
MATCH (company)-[:PRODUCE]->(p:Product)
RETURN s, company, (COLLECT(p))[0..10] AS products
Updating with some solutions using APOC Procedures.
This Neo4j knowledge base article on limiting results per row describes a few different ways to do this.
One way is to use apoc.cypher.run() to execute a limited subquery per row. Applied to the query in question, this would work:
MATCH (s:Sector)<-[:BELONGS_TO]-(c:Company)
WITH c
LIMIT 50
CALL apoc.cypher.run('MATCH (c)-[:PRODUCE]->(p:Product) WITH p LIMIT 10 RETURN collect(p) as products', {c:c}) YIELD value
RETURN c, value.products AS products
The other alternative mentioned is using APOC path expander procedures, providing the label on a termination filter and a limit:
MATCH (s:Sector)<-[:BELONGS_TO]-(c:Company)
WITH c
LIMIT 50
CALL apoc.path.subgraphNodes(c, {maxLevel:1, relationshipFilter:'PRODUCE>', labelFilter:'/Product', limit:10}) YIELD node
RETURN c, collect(node) AS products
I'm developing a Reddit-like service to learn Neo4j.
Everything works fine; I just want some feedback on the Cypher query that gets the most recent news stories, along with the author and the number of comments, likes and dislikes.
I'm using Neo4j 2.0.
MATCH comments = (n:news)-[:COMMENT]-(o)
MATCH likes = (n:news)-[:LIKES]-(p)
MATCH dislikes = (n:news)-[:DISLIKES]-(q)
MATCH (n:news)-[:POSTED_BY]-(r)
WITH n, r, count(comments) AS num_comments, count(likes) AS num_likes, count(dislikes) AS num_dislikes
ORDER BY n.post_date
LIMIT 20
RETURN *
o, p, q, r are all nodes with the label user. Should the label be added to the query to speed it up?
Is there anything else you see that I could optimize?
I think you're going to want to get rid of the multiple matches. Cypher uses each one as a filter on the rows produced by the others, rather than gathering all the information independently.
I would also avoid paths like comments, and instead count the nodes you are binding. When you write MATCH xyz = (a)-[:COMMENT]-(b), xyz is a path, which contains the source node, the relationship and the destination node.
MATCH (news:news)-[:COMMENT]-(comment),(news:news)-[:LIKES]-(like),(news:news)-[:DISLIKES]-(dislike),(news:news)-[:POSTED_BY]-(posted_by)
WHERE news.post_date > 0
WITH news, posted_by, count(DISTINCT comment) AS num_comments, count(DISTINCT like) AS num_likes, count(DISTINCT dislike) AS num_dislikes
ORDER BY news.post_date DESC
LIMIT 20
RETURN *
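Combining several patterns on the same news node produces one row per combination of comment, like and dislike, so a plain count() returns the product of the three numbers, and a story with zero matches for any one pattern yields no rows at all. A small Python model of that behaviour for a single hypothetical news node, compared with counting each pattern independently:

```python
from itertools import product


def pattern_counts(comments, likes, dislikes):
    """Counts for one news node under different query shapes (hypothetical data)."""
    # Several patterns on the same node: one row per (comment, like, dislike).
    rows = list(product(comments, likes, dislikes))
    return {
        # count(x) counts rows, so every column gets the same number.
        "count": (len(rows), len(rows), len(rows)),
        # count(DISTINCT x) recovers each number, but only if any rows exist.
        "distinct": (
            len({c for c, _, _ in rows}),
            len({lk for _, lk, _ in rows}),
            len({d for _, _, d in rows}),
        ),
        # Counting each pattern on its own, independent of the others.
        "independent": (len(comments), len(likes), len(dislikes)),
    }
```

The "independent" numbers are what counting each relationship pattern separately per node gives, which is the idea behind the next answer.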
I would do something like this.
MATCH (n:news)-[:POSTED_BY]->(r)
WHERE n.post_date > {recent_start_time}
RETURN n, r,
length((n)<-[:COMMENT]-()) AS num_comments,
length((n)<-[:LIKES]-()) AS num_likes,
length((n)<-[:DISLIKES]-()) AS num_dislikes
ORDER BY n.post_date DESC
LIMIT 20
To speed it up and avoid having Neo4j search through all your posts, I would probably index the post_date property (assuming it doesn't contain time information), and then send this query for today, yesterday, etc. until you have your 20 posts.
MATCH (n:news {post_date: {day}})-[:POSTED_BY]->(r)
RETURN n, r,
length((n)<-[:COMMENT]-()) AS num_comments,
length((n)<-[:LIKES]-()) AS num_likes,
length((n)<-[:DISLIKES]-()) AS num_dislikes
ORDER BY n.post_date DESC
LIMIT 20
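The day-by-day strategy is a loop that issues one query per day, walking backwards until enough posts have been collected. A Python sketch, assuming a hypothetical posts_by_day mapping stands in for running the indexed query for one day, with a bound on how far back it looks:

```python
from datetime import date, timedelta


def recent_posts(posts_by_day, start, wanted=20, max_days=365):
    """posts_by_day: hypothetical dict, date -> list of posts for that day."""
    collected = []
    day = start
    for _ in range(max_days):  # safety bound so a sparse history still terminates
        collected.extend(posts_by_day.get(day, []))  # one per-day lookup
        if len(collected) >= wanted:
            break
        day -= timedelta(days=1)
    return collected[:wanted]
```

Each iteration corresponds to one run of the query above with {day} set to that date, so only a few days' worth of posts are ever read.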