Neo4j/Cypher matching first n nodes in the traversal branch - neo4j

I have a graph: (:Sector)<-[:BELONGS_TO]-(:Company)-[:PRODUCE]->(:Product).
I'm looking for a query that does the following.
Start with (:Sector). Then match the first 50 companies in that sector and, for each company, match the first 10 products.
The first limit is simple. But what about limiting the products?
Is it possible with Cypher?
UPDATE
As @cybersam suggested below, this query returns valid results:
MATCH (s:Sector)<-[:BELONGS_TO]-(c:Company)
WITH c
LIMIT 50
MATCH (c)-[:PRODUCE]->(p:Product)
WITH c, (COLLECT(p))[0..10] AS products
RETURN c, products
However, this solution doesn't scale because it still traverses all products for each company: the slice is applied only after all of a company's products have been collected. As the number of products grows, query performance will degrade.
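The scaling concern can be illustrated outside Neo4j with a small plain-Python model. This is only a sketch: `products_of` is a hypothetical stand-in for expanding (c)-[:PRODUCE]->(p), and the `visited` list is an assumption used to count how many products each strategy touches. It says nothing about how the Cypher planner actually executes either query.

```python
from itertools import islice

def products_of(company, n_products, visited):
    """Hypothetical stand-in for expanding (c)-[:PRODUCE]->(p); records each visit."""
    for i in range(n_products):
        visited.append((company, i))
        yield f"{company}-p{i}"

def collect_then_slice(company, n_products, visited):
    # Mirrors COLLECT(p)[0..10]: materialise every product, then keep 10.
    return list(products_of(company, n_products, visited))[:10]

def limit_while_expanding(company, n_products, visited):
    # Mirrors a per-company LIMIT 10: stop pulling after 10 products.
    return list(islice(products_of(company, n_products, visited), 10))

visited_a, visited_b = [], []
a = collect_then_slice("acme", 1000, visited_a)
b = limit_while_expanding("acme", 1000, visited_b)

# Same 10 products either way, but very different amounts of work.
print(a == b, len(visited_a), len(visited_b))
```

Both strategies return the same 10 products; the difference is that the first visits every product of the company before discarding most of them.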

Each returned row of this query will contain: a sector, one of its companies (at most 50 per sector), and a collection of up to 10 products for that company:
MATCH (s:Sector)<-[:BELONGS_TO]-(c:Company)
WITH s, (COLLECT(c))[0..50] AS companies
UNWIND companies AS company
MATCH (company)-[:PRODUCE]->(p:Product)
RETURN s, company, (COLLECT(p))[0..10] AS products;

Updating with some solutions using APOC Procedures.
This Neo4j knowledge base article on limiting results per row describes a few different ways to do this.
One way is to use apoc.cypher.run() to execute a limited subquery per row. Applied to the query in question, this would work:
MATCH (s:Sector)<-[:BELONGS_TO]-(c:Company)
WITH c
LIMIT 50
CALL apoc.cypher.run('MATCH (c)-[:PRODUCE]->(p:Product) WITH p LIMIT 10 RETURN collect(p) as products', {c:c}) YIELD value
RETURN c, value.products AS products
The other alternative mentioned is using APOC path expander procedures, providing the label on a termination filter and a limit:
MATCH (s:Sector)<-[:BELONGS_TO]-(c:Company)
WITH c
LIMIT 50
CALL apoc.path.subgraphNodes(c, {maxLevel:1, relationshipFilter:'PRODUCE>', labelFilter:'/Product', limit:10}) YIELD node
RETURN c, collect(node) AS products

Related

Neo4J get N latest relationships per unique target node?

With the following graph:
How can I write a query that would return N latest relationships by the unique target node?
For example, this query: MATCH (p)-[r:RATED_IN]->(s) WHERE id(p)={person} RETURN p,s,r ORDER BY r.measurementDate DESC LIMIT {N} with N = 1 would return the single latest relationship, whether it is RATED_IN Team Lead or Programming. What I would like instead is the N latest per target node. With N = 2, I would like the 2 latest measurements per skill node.
I would like the latest relationship by a person for Team Lead and the latest one for Programming.
How can I write such a query?
-- EDIT --
MATCH (p:Person) WHERE id(p)=175
CALL apoc.cypher.run('
WITH {p} AS p
MATCH (p)-[r:RATED_IN]->(s)
RETURN DISTINCT s, r ORDER BY r.measurementDate DESC LIMIT 2',
{p:p}) YIELD value
RETURN p,value.r AS r, value.s AS s
Here's a Cypher knowledge base article on limiting MATCH results per row, with a few different suggestions on how to accomplish this given current limitations. Using APOC's apoc.cypher.run() to perform a subquery with a RETURN using a LIMIT will do the trick, as it gets executed per row (thus the LIMIT is per row).
Note that for the upcoming Neo4j 4.0 release at the end of the year we're going to be getting some nice Cypher goodies that will make this significantly easier. Stay tuned as we reveal more details as we approach its release!

Find all items that weren't bought by a person and count the times they were bought

I have a graph that looks like this.
I want to find all the items bought by the people who bought the same items as Gremlin, using Cypher.
Basically, I want to imitate the query in the Gremlin examples that looks like this:
g.V().has("name","gremlin")
.out("bought").aggregate("stash")
.in("bought").out("bought")
.where(not(within("stash")))
.groupCount()
.order(local).by(values,desc)
I was trying to do it like this
MATCH (n)-[:BOUGHT]->(g_item)<-[:BOUGHT]-(r),
(r)-[:BOUGHT]->(n_item)
WHERE
n.name = 'Gremlin'
AND NOT (n)-[:BOUGHT]->(n_item)
RETURN n_item.id, count(*) as frequency
ORDER by frequency DESC
but it doesn't seem to count frequencies properly: they come out twice as big.
4 - 4
5 - 2
3 - 2
However, items 3 and 5 were each bought only once, and item 4 was bought 2 times.
What's the problem?
Cypher is interested in paths, and your MATCH finds the following:
2 paths to item 3 both through Rexter (via items 2 and 1)
2 paths to item 5 through Pipes (via items 1 and 2)
4 paths to item 4 via Rexter and Pipes (via items 1 and 2 for each person)
Basically the items are being counted multiple times because there are multiple paths to that same item per individual person via different common items with Gremlin.
To get accurate counts, you either need to match to distinct r users, and only then match out to items the r users bought (as long as they aren't in the collection of items bought by Gremlin), OR you need to do the entire match, but before doing the counts, get distinct items with respect to each person so each item per person only occurs once...then get the count per item (counts across all persons).
Here's a query that uses the second approach:
MATCH (n:Person)-[:BOUGHT]->(g_item)
WHERE n.name = 'Gremlin'
WITH n, collect(g_item) as excluded
UNWIND excluded as g_item // now you have excluded list to use later
MATCH (g_item)<-[:BOUGHT]-(r)-[:BOUGHT]->(n_item)
WHERE r <> n AND NOT n_item in excluded
WITH DISTINCT r, n_item
WITH n_item, count(*) as frequency
RETURN n_item.id, frequency
ORDER by frequency DESC
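The double counting and its fix can be reproduced in plain Python. This is a minimal sketch, not Neo4j code: the `bought` data is an assumption reconstructed from the paths listed above (Gremlin bought items 1 and 2, Rexter 1 to 4, Pipes 1, 2, 4 and 5), and `matched_rows` is a hypothetical helper that emits one row per matched path.

```python
from collections import Counter

# Assumed purchases, reconstructed from the paths described in the answer.
bought = {
    "Gremlin": {1, 2},
    "Rexter": {1, 2, 3, 4},
    "Pipes": {1, 2, 4, 5},
}

def matched_rows(start):
    """One row per path: start -> shared item <- other person -> new item."""
    excluded = bought[start]
    rows = []
    for g_item in sorted(excluded):
        for person, items in bought.items():
            if person != start and g_item in items:
                for n_item in sorted(items - excluded):
                    rows.append((person, n_item))
    return rows

rows = matched_rows("Gremlin")

# Counting every path inflates the numbers (the original query).
per_path = Counter(item for _, item in rows)

# Deduplicating (person, item) pairs first mirrors WITH DISTINCT r, n_item.
per_person = Counter(item for _, item in set(rows))

print(dict(per_path))
print(dict(per_person))
```

Counting per path gives 4, 2 and 2 for items 4, 3 and 5, matching the inflated output in the question, while counting distinct (person, item) pairs gives 2, 1 and 1.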
You should be using labels in your graph, and you should use them in your query in order to leverage indexes and quickly find a starting point in the graph. In your case, an index on :Person(name), and usage of the :Person label in the query, should make this quick even as more nodes and more :Persons are added to the graph.
EDIT
If you're just looking for conciseness of the query, and don't have a large enough graph where performance will be an issue, then you can use your original query but add one extra line to get distinct rows of r and n_item before you count the item. This ensures that you only count an item per person once when you get the count.
Note that this forgoes optimizations for handling excluded items (it will do a pattern match per item rather than aggregating the collection of bought items and doing a collection membership check), and it aggregates on items while doing property access rather than doing property access only after aggregating by the node.
MATCH (n:Person)-[:BOUGHT*2]-(r)-[:BOUGHT]->(n_item)
WHERE n.name = 'Gremlin'
WITH DISTINCT n, r, n_item
WHERE NOT (n)-[:BOUGHT]->(n_item)
RETURN n_item.id, count(*) as frequency
ORDER by frequency DESC
I am adding a quick shortcut in your match, using :BOUGHT*2 to indicate two :BOUGHT hops to r, since we don't really care about the item in-between.

Neo4j: Query to find the nodes with most relationships, and their connected nodes

I am using Neo4j CE 3.1.1 and I have a relationship WRITES between authors and books. I want to find the N (say N=10 for example) books with the largest number of authors. Following some examples I found, I came up with the query:
MATCH (a)-[r:WRITES]->(b)
RETURN r,
COUNT(r) ORDER BY COUNT(r) DESC LIMIT 10
When I execute this query in the Neo4j browser I get 10 books, but these do not look like the ones written by most authors, as they show only a few WRITES relationships to authors. If I change the query to
MATCH (a)-[r:WRITES]->(b)
RETURN b,
COUNT(r) ORDER BY COUNT(r) DESC LIMIT 10
Then I get the 10 books with the most authors, but I don't see their relationship to authors. To do so, I have to write additional queries explicitly stating the name of a book I found in the previous query:
MATCH ()-[r:WRITES]->(b)
WHERE b.title="Title of a book with many authors"
RETURN r
What am I doing wrong? Why isn't the first query working as expected?
Aggregations only have context based on the non-aggregation columns, and with your match, a unique relationship will only occur once in your results.
So your first query is asking for each relationship on a row, and the count of that particular relationship, which is 1.
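The effect of the grouping key can be seen in a small plain-Python model. This is a sketch only: the `writes` rows are made-up data, and `Counter` stands in for Cypher's implicit grouping by the non-aggregated columns.

```python
from collections import Counter

# Hypothetical WRITES relationships: (relationship id, author, book).
writes = [
    (1, "Ann", "Graphs"),
    (2, "Bob", "Graphs"),
    (3, "Cy", "Graphs"),
    (4, "Ann", "Cypher"),
]

# RETURN r, COUNT(r): the grouping key is the relationship itself,
# so every group holds exactly one row.
count_by_rel = Counter(rel_id for rel_id, _, _ in writes)

# RETURN b, COUNT(r): the grouping key is the book,
# so each group holds all relationships pointing at that book.
count_by_book = Counter(book for _, _, book in writes)

print(set(count_by_rel.values()))
print(count_by_book.most_common(1))
```

With the relationship as the key every count is 1, so ORDER BY ... LIMIT 10 simply picks ten arbitrary relationships; with the book as the key the counts become meaningful.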
You might rewrite this in a couple different ways.
One is to collect the authors and order on the size of the author list:
MATCH (a)-[:WRITES]->(b)
RETURN b, COLLECT(a) as authors
ORDER BY SIZE(authors) DESC LIMIT 10
You can always collect the author and its relationship, if the relationship itself is interesting to you.
EDIT
If you happen to have labels on your nodes (you absolutely SHOULD have labels on your nodes), you can try a different approach by matching to all books, getting the size of the incoming :WRITES relationships to each book, ordering and limiting on that, and then performing the match to the authors:
MATCH (b:Book)
WITH b, SIZE(()-[:WRITES]->(b)) as authorCnt
ORDER BY authorCnt DESC LIMIT 10
MATCH (a)-[:WRITES]->(b)
RETURN b, a
You can collect on the authors and/or return the relationship as well, depending on what you need from the output.
You are very close: after sorting, it is necessary to rediscover the authors. For example:
MATCH (a:Author)-[r:WRITES]->(b:Book)
WITH b,
COUNT(r) AS authorsCount
ORDER BY authorsCount DESC LIMIT 10
MATCH (b)<-[:WRITES]-(a:Author)
RETURN b,
COLLECT(a) AS authors
ORDER BY size(authors) DESC

How to get the total count of users excluding location="Hyderabad" and including deviceBrand="lenovo"?

I am using Neo4j version 3.0.3. I executed the query below. It returns the count of users who have the HAS_VISITED_LOCATION relationship, but I want the total to also include users who don't have the HAS_VISITED_LOCATION relationship.
MATCH (c:Consumer)-[:HAS_VISITED_LOCATION]-(l:Location)
WHERE NOT l.AreaName="hyderabad"
MATCH(c)-[:HAS_DEVICE_BRAND]-(d:DeviceBrand{BrandName:"lenovo"})
RETURN count(c)
So you're asking for the count of all consumers who have the lenovo device brand and who have not visited hyderabad.
This query should do that:
MATCH (l:Location {AreaName:'hyderabad'})
MATCH (c:Consumer)-[:HAS_DEVICE_BRAND]->(:DeviceBrand{BrandName:"lenovo"})
WHERE NOT (c)-[:HAS_VISITED_LOCATION]->(l)
RETURN COUNT(DISTINCT c)
EDIT - New (but related) question on how to get consumers who have not visited hyderabad and who don't have the lenovo brand.
This new question is trickier in that it's matching on the absence of relationships.
The straightforward approach is to simply match on consumers where the consumer has not visited hyderabad and doesn't have the lenovo device brand:
MATCH (c:Consumer)
WHERE NOT (c)-[:HAS_VISITED_LOCATION]->(:Location {AreaName:'hyderabad'})
AND NOT (c)-[:HAS_DEVICE_BRAND]->(:DeviceBrand{BrandName:"lenovo"})
RETURN COUNT(c) as count
While this is correct, it may not be the most efficient query.
If we look at the logical representation of what you want, we might see an alternate approach:
NOT (visited hyderabad) AND NOT (has lenovo)
If we take the negation of your requirement:
NOT (NOT (visited hyderabad) AND NOT (has lenovo)) = (visited hyderabad) OR (has lenovo)
So an alternate query can be to find the count of the negation of what you want (the count of consumers who have visited hyderabad OR who have lenovo), and subtract it from the total consumer count to get the actual count you want.
You can try this query and see if it performs better than the straightforward approach:
// first get the total count of consumers, should be very fast
MATCH (c:Consumer)
WITH COUNT(c) as totalCount
MATCH (lenovo:DeviceBrand{BrandName:'lenovo'}), (hyderabad:Location{AreaName:'hyderabad'})
// union lenovo and hyderabad into one column through collecting and combining and unwinding
// (this is a workaround since Cypher can't do post-union processing)
WITH totalCount, COLLECT(lenovo) + COLLECT(hyderabad) as excludeNodes
UNWIND excludeNodes as excludeNode
// get all consumers attached to these nodes
MATCH (excludeNode)<-[:HAS_DEVICE_BRAND|:HAS_VISITED_LOCATION]-(c:Consumer)
WITH totalCount, COUNT(DISTINCT c) as excludeCount
RETURN totalCount - excludeCount as count
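The equivalence between the direct count and the subtraction approach can be checked with plain Python sets. This is a minimal sketch on made-up data: the consumer names and the two membership sets are assumptions for illustration only.

```python
# Hypothetical consumers and the two groups to exclude.
consumers = {"a", "b", "c", "d", "e"}
visited_hyderabad = {"a", "b"}
has_lenovo = {"b", "c"}

# Direct form: NOT (visited hyderabad) AND NOT (has lenovo).
direct = {
    c for c in consumers
    if c not in visited_hyderabad and c not in has_lenovo
}

# Complement form: total count minus |visited hyderabad OR has lenovo|.
# The set union plays the role of COUNT(DISTINCT c): consumer "b" is in
# both groups and must only be subtracted once.
exclude_count = len(visited_hyderabad | has_lenovo)
by_subtraction = len(consumers) - exclude_count

print(len(direct), by_subtraction)
```

Without the distinct step, a consumer who both visited hyderabad and has lenovo would be subtracted twice and the result would be too small.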

Neo4j: multiple counts from multiple matches

Given a neo4j schema similar to
(:Person)-[:OWNS]-(:Book)-[:CATEGORIZED_AS]-(:Category)
I'm trying to write a query to get the count of books owned by each person as well as the count of books in each category so that I can calculate the percentage of books in each category for each person.
I've tried queries along the lines of
match (p:Person)-[:OWNS]-(b:Book)-[:CATEGORIZED_AS]-(c:Category)
where p.name in []
with p, b, c
match (p)-[:OWNS]-(b2:Book)-[:CATEGORIZED_AS]-(c2:Category)
with p, b, c, b2
return p.name, b.name, c.name,
count(distinct b) as count_books_in_category,
count(distinct b2) as count_books_total
But the query plan is absolutely horrible when trying to do the second match. I've tried to figure out different ways to write the query so that I can do the two different counts, but haven't figured out anything other than doing two matches. My schema isn't really about people and books. The :CATEGORIZED_AS relationship in my example is actually a few different relationship options, specified as [:option1|option2|option3]. So in my 2nd match I repeat the relationship options so that my total count is constrained by them.
Ideas? This feels similar to Neo4j - apply match to each result of previous match but there didn't seem to be a good answer for that one.
UNWIND is your friend here. First, calculate the total books per person, collecting them as you go.
Then unwind them so you can match which categories they belong to.
Aggregate by category and person, and you should get the number of books in each category for a person:
match (p:Person)-[:OWNS]->(b:Book)
with p,collect(b) as books, count(b) as total
with p,total,books
unwind books as book
match (book)-[:CATEGORIZED_AS]->(c)
return p,c, count(book) as subtotal, total
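The collect-then-unwind steps above, plus the percentage the question asks for, can be sketched in plain Python. The `owns` and `category_of` data and the `category_shares` helper are hypothetical, and the sketch assumes each book has exactly one category; with several categories per book the subtotals would add up to more than the total.

```python
from collections import Counter

# Hypothetical data: books owned per person and one category per book.
owns = {"Ann": ["b1", "b2", "b3", "b4"], "Bob": ["b2", "b5"]}
category_of = {
    "b1": "SciFi", "b2": "SciFi", "b3": "History",
    "b4": "SciFi", "b5": "History",
}

def category_shares(owns, category_of):
    """Map (person, category) to (subtotal, total, percentage)."""
    shares = {}
    for person, books in owns.items():
        total = len(books)  # count(b) as total, computed once per person
        # unwind the books and count them per category: count(book) as subtotal
        subtotals = Counter(category_of[book] for book in books)
        for category, subtotal in subtotals.items():
            shares[(person, category)] = (subtotal, total, 100.0 * subtotal / total)
    return shares

shares = category_shares(owns, category_of)
for key, value in sorted(shares.items()):
    print(key, value)
```

Because the total is computed before the books are unwound, it is carried along unchanged on every row and no second pass over the ownership data is needed.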

Resources