I'm working on an application using Neo4J and I'm having problems with the sorting in some of the queries. I have a list of stores that have promotions so I have a node for each store and a node for each promotion. Each store can have multiple promotions so it's a one to many relationship. Some of the promotions are featured (featured property = true) so I want those to appear first. I'm trying to construct a query that does the following:
Returns a list of stores with the promotoions as a collection (returning it like this is ideal for paging)
Sorts the stores so the ones with most featured promotions appear first
Sorts the collection so that the promotions that are featured appear first
So far I have the following:
MATCH (p:Promotion)-[r:BELONGS_TO_STORE]->(s:Store) WITH p, s, collect(p.featured) as featuredCol WITH p, s, LENGTH(FILTER(i IN featuredCol where i='true')) as featuredCount ORDER BY p.featured DESC, featuredCount DESC RETURN s, collect(p) skip 0 limit 10
First, I try to create a collection using the featured property using a WITH clause. Then, I try to create a second collection where the featured property is equal to true and then get the length in a second WITH clause. This sorts the collection with the promotions correctly but not the stores. I get an error if I try to add another sort at the end like this because the featuredCount variable is not in the RETURN clause. I don't want the featuredCount variable in the RETURN clause because it throws my pagination off.
Here is my second query:
MATCH (p:Promotion)-[r:BELONGS_TO_STORE]->(s:Store) WITH p, s, collect(p.featured) as featuredCol WITH p, s, LENGTH(FILTER(i IN featuredCol where i='true')) as featuredCount ORDER BY p.featured DESC, featuredCount DESC RETURN s, collect(p) ORDER BY featuredCount skip 0 limit 10
I'm very new to Neo4J so any help will be greatly appreciated.
Does this query (see this console) work for you?
MATCH (p:Promotion)-[r:BELONGS_TO_STORE]->(s:Store)
WITH p, s
ORDER BY p.featured DESC
WITH s, COLLECT(p) AS pColl
WITH s, pColl, REDUCE(n = 0, p IN pColl | CASE
WHEN p.featured
THEN n + 1
ELSE n END ) AS featuredCount
ORDER BY featuredCount DESC
RETURN s, pColl
LIMIT 10
This query performs the following steps:
It orders the matched rows so that the rows with featured promotions are first.
It aggregates all the p nodes for each distinct s into a pColl collection. The featured promotions still appear first within each pColl.
It calculates the number of featured promotions in each pColl, and orders the stores so that the ones with the most features promotions appear first.
It then returns the results.
Note: This query assumes that featured has a boolean value, not a string. (FYI: ORDER BY considers true to be greater than false). If this assumption is not correct, you can change the WHEN clause to WHEN p.featured = 'true'.
Related
I am trying to return a set of a node from 2 sessions with a condition that returned node should not be present in another session (third session). I am using the following code but it is not working as intended.
MATCH (:Session {session_id: 'abc3'})-[:HAS_PRODUCT]->(p:Product)
UNWIND ['abc1', 'abc2'] as session_id
MATCH (target:Session {session_id: session_id})-[r:HAS_PRODUCT]->(product:Product)
where p<>product
WITH distinct product.products_id as products_id, r
RETURN products_id, count(r) as score
ORDER BY score desc
This query was supposed to return all nodes present in abc1 & abc2 but not in abc3. This query is not excluding all products present in abc3. Is there any way I can get it working?
UPDATE 1:
I tried to simplify it without UNWIND as this
match (:Session {session_id: 'abc3'})-[:HAS_PRODUCT]->(p:Product)
MATCH (target:Session {session_id: 'abc1'})-[r:HAS_PRODUCT]->(product:Product)
where product <> p
WITH distinct product.products_id as products_id
RETURN products_id
Even this is also not working. It is returning all items present in abc1 without removing those which are already in abc3. Seems like where product <> p is not working correctly.
I would suggest it would be best to check if the nodes are in a list, and to prove out the approach, start with a very simple example.
Here is a simple cypher showing one way to do it. This approach can then be extended into the complex query,
// get first two product IDs as a list
MATCH (p:Product)
WITH p LIMIT 2
WITH COLLECT(ID(p)) as list
RETURN list
// now show two more product IDs which not in that list
MATCH (p:Product)
WITH p LIMIT 2
WITH COLLECT(ID(p)) as list
MATCH (p2:Product)
WHERE NOT ID(p2) in list
RETURN ID(p2) LIMIT 2
Note: I'm using the ID() of the nodes instead of the entire node, same dbhits but may be more performant...
I have a bunch of venues in my Neo4J DB. Each venue object has the property 'catIds' that is an array and contains the Ids for the type of venue it is. I want to query the database so that I get all Venues but they are ordered where their catIds match or contain some off a list of Ids that I give the query. I hope that makes sense :)
Please, could someone point me in the direction of how to write this query?
Since you're working in a graph database you could think about modeling your data in the graph, not in a property where it's hard to get at it. For example, in this case you might create a bunch of (v:venue) nodes and a bunch of (t:type) nodes, then link them by an [:is] relation. Each venue is linked to one or more type nodes. Each type node has an 'id' property: {id:'t1'}, {id:'t2'}, etc.
Then you could do a query like this:
match (v:venue)-[r:is]->(:type) return v, count(r) as n order by n desc;
This finds all your venues, along with ALL their type relations and returns them ordered by how many type-relations they have.
If you only want to get nodes of particular venue types on your list:
match (v:venue)-[r:is]-(t:type) where t.id in ['t1','t2'] return v, count(r) as n order by n desc;
And if you want ALL venues but rank ordered according to how well they fit your list, as I think you were looking for:
match (v:venue) optional match (v)-[r:is]->(t:type) where t.id in ['t1','t2'] return v, count(r) as n order by n desc;
The match will get all your venues; the optional match will find relations on your list if the node has any. If a node has no links on your list, the optional match will fail and return null for count(r) and should sort to the bottom.
Suppose tha I have the default database Movies and I want to find the total number of people that have participated in each movie, no matter their role (i.e. including the actors, the producers, the directors e.t.c.)
I have already done that using the query:
MATCH (m:Movie)<-[r]-(n:Person)
WITH m, COUNT(n) as count_people
RETURN m, count_people
ORDER BY count_people DESC
LIMIT 3
Ok, I have included some extra options but that doesn't really matter in my actual question. From the above query, I will get 3 movies.
Q. How can I enrich the above query, so I can get a graph including all the relationships regarding these 3 movies (i.e.DIRECTED, ACTED_IN,PRODUCED e.t.c)?
I know that I can deploy all the relationships regarding each movie through the buttons on each movie node, but I would like to know whether I can do so through cypher.
Use additional optional match:
MATCH (m:Movie)<--(n:Person)
WITH m,
COUNT(n) as count_people
ORDER BY count_people DESC
LIMIT 3
OPTIONAL MATCH p = (m)-[r]-(RN) WHERE type(r) IN ['DIRECTED', 'ACTED_IN', 'PRODUCED']
RETURN m,
collect(p) as graphPaths,
count_people
ORDER BY count_people DESC
I have graph: (:Sector)<-[:BELONGS_TO]-(:Company)-[:PRODUCE]->(:Product).
I'm looking for the query below.
Start with (:Sector). Then match first 50 companies in that sector and for each company match first 10 products.
First limit is simple. But what about limiting products.
Is it possible with cypher?
UPDATE
As #cybersam suggested below query will return valid results
MATCH (s:Sector)<-[:BELONGS_TO]-(c:Company)
WITH c
LIMIT 50
MATCH (c)-[:PRODUCE]->(p:Product)
WITH c, (COLLECT(p))[0..10] AS products
RETURN c, products
However this solution doesn't scale as it still traverses all products per company. Slice applied after each company products collected. As number of products grows query performance will degrade.
Each returned row of this query will contain: a sector, one of its companies (at most 50 per sector), and a collection of up to 10 products for that company:
MATCH (s:Sector)<-[:BELONGS_TO]-(c:Company)
WITH s, (COLLECT(c))[0..50] AS companies
UNWIND companies AS company
MATCH (company)-[:PRODUCE]->(p:Product)
WITH s, company, (COLLECT(p))[0..10] AS products;
Updating with some solutions using APOC Procedures.
This Neo4j knowledge base article on limiting results per row describes a few different ways to do this.
One way is to use apoc.cypher.run() to execute a limited subquery per row. Applied to the query in question, this would work:
MATCH (s:Sector)<-[:BELONGS_TO]-(c:Company)
WITH c
LIMIT 50
CALL apoc.cypher.run('MATCH (c)-[:PRODUCE]->(p:Product) WITH p LIMIT 10 RETURN collect(p) as products', {c:c}) YIELD value
RETURN c, value.products AS products
The other alternative mentioned is using APOC path expander procedures, providing the label on a termination filter and a limit:
MATCH (s:Sector)<-[:BELONGS_TO]-(c:Company)
WITH c
LIMIT 50
CALL apoc.path.subgraphNodes(c, {maxLevel:1, relationshipFilter:'PRODUCE>', labelFilter:'/Product', limit:10}) YIELD node
RETURN c, collect(node) AS products
As part of a larger query, I am trying to select PRODUCTs that have a relationship to more than one SKU. I subsequently want to delete these relationships and perform other modifications not included here for the sake of simplicity.
I was surprised to find that while the following query returns a single node:
MATCH (p:PRODUCT)-[p_s_rel:PRODUCT_SKU]->(s:SKU)
WITH s, p, count(s) AS sCount
WHERE sCount> 1 AND id(s) IN [9220, 9201]
RETURN s
adding the relationships p_s_rel in the WITH clause changes the result and returns no nodes:
MATCH (p:PRODUCT)-[p_s_rel:PRODUCT_SKU]->(s:SKU)
WITH s, p, p_s_rel, count(s) AS sCount
WHERE sCount> 1 AND id(s) IN [9220, 9201]
RETURN s
Based on the documentation for WITH, I didn't expect this behavior. Is it not possible to specify relationship identifiers in the WITH clause?
If that's the case, how can I delete the p_s_rel relationships in my example?
Edit: This is on version 2.1.5
Are you sure that the first query is doing what you think that it is? In your query if sCount > 1 then I think that means you have more than one relationship between p and some SKU (i.e multiple relationships between the same nodes). If your WITH statement were WITH p, count(s) AS sCount then you would be matching a single Product with multiple SKUs.
By executing WITH s, p, count(s) AS sCount you are saying carry the currently matched SKU, the currently matched PRODUCT and the count of PRODUCT_SKU relationships between them, whereas WITH p, count(s) AS sCount would be taken to mean carry the currently matched PRODUCT and the count of all PRODUCT_SKU relationships that it has.
In your second query, by including p_s_rel in your WITH clause you will be propagating only a single row with each result - as there will only ever be one distinct p_s_rel to carry forward in each match (a relationship only has one start node, and one end node). This means that sCount will never be greater than 1 - hence your empty resultset. i.e you are saying carry the currently matched SKU, the currently matched PRODUCT, the current realtionship between those two nodes and the count of the end nodes on that relationship (1).
If you want to carry the relationships forward you could use COLLECT, or, as you are going to be restricting your resultset by SKU, maybe you would be better off matching those first:
MATCH (s1:SKU), (s2:SKU)
WHERE ID(s1) = 9220 AND ID(s2) = 9201
WITH s1, s2
MATCH (s1)<-[r1:PRODUCT_SKU]-(p:PRODUCT)-[r2:PRODUCT_SKU]->(s2)
DELETE r1, r2,
RETURN p