Neo4j/Cypher query limit order by latest created

I have a collection of :Product nodes and I want to return the latest 100. Consider a global query like:
MATCH (p:Product) RETURN p LIMIT 100
From what I can see, it returns the oldest nodes first. Is there a way of getting the newest on top?
ORDER BY won't be an option, as the number of products can be in the millions.
UPDATE
I ended up creating a dense node (:ProductIndex). Each time I create a product, I add it to the index: (:Product)-[:INDEXED]->(:ProductIndex). With dense nodes the relationship chain is ordered latest first, so the query below returns the newest records on top:
MATCH (p:Product)-[:INDEXED]->(:ProductIndex)
RETURN p
LIMIT 1000
I can always keep the index at a fixed size, as I don't need to preserve the full history.

Is the data model such that the products are connected in a list, e.g. (:Product)-[:PREVIOUS]->(:Product)?
Can you keep track of the most recent node, either with a timestamp that you can easily locate or with another node connected to your most recent product node?
If so, you could always query out the most recent ones with a query similar to the following:
match (max:Date {name: 'Last Product Date'})-->(latest:Product)
with latest
match p=(latest)-[:PREVIOUS*..100]->(:Product)
return nodes(p)
order by length(p) desc
limit 1
Or something like this, where you select the latest product by comparing its date with the stored maximum date:
match (max:Date {name: 'Product Date'})
with max
match p=(latest:Product)-[:PREVIOUS*..100]->(:Product)
where latest.date = max.date
return nodes(p)
order by length(p) desc
limit 1
Another approach, still using a list, could be to keep an indexed create date property on each product. When looking for the most recent product, pick a control date that doesn't go back to the beginning of time, so you have a smaller pool of nodes (i.e. not millions). Then use the max function on that smaller pool to find the most recent node and follow the list back by however many nodes you want.
match (latest:Product)
where latest.date > {control_date}
with max(latest.date) as latest_date
match p=(product:Product)-[:PREVIOUS*..100]->(:Product)
where product.date = latest_date
return nodes(p)
order by length(p) desc
limit 1
Deleting a node in a linked list is pretty simple. If you need to perform this search a lot and you don't want to order the products, I think keeping the products in a list is a pretty good graph application. Here is an example of a delete that maintains the list.
match (previous:Product)<-[:PREVIOUS]-(product_to_delete:Product)<-[:PREVIOUS]-(next:Product)
where product_to_delete.name = 'name of node to delete'
create (previous)<-[:PREVIOUS]-(next)
detach delete product_to_delete
return previous, next
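The linked-list idea above can be sketched outside the database. The following is a minimal plain-Python model, not Cypher: the Product class, latest_n and delete_after are hypothetical names introduced only for illustration, and the previous attribute stands in for the :PREVIOUS relationship. It shows that the newest N items can be read by walking back from the latest node without sorting, and that a delete only needs to relink the neighbours.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Product:
    name: str
    previous: Optional["Product"] = None  # stands in for the :PREVIOUS relationship


def latest_n(latest: Optional[Product], n: int) -> List[str]:
    """Walk back from the newest product, collecting at most n names."""
    names: List[str] = []
    node = latest
    while node is not None and len(names) < n:
        names.append(node.name)
        node = node.previous
    return names


def delete_after(next_node: Product) -> None:
    """Unlink the product that next_node points to, keeping the chain intact."""
    doomed = next_node.previous
    if doomed is not None:
        next_node.previous = doomed.previous


# Build a chain a <- b <- c <- d, where d is the newest product.
a = Product("a")
b = Product("b", a)
c = Product("c", b)
d = Product("d", c)

newest_three = latest_n(d, 3)   # walk back three steps from d
delete_after(d)                 # removes c; d now points to b
after_delete = latest_n(d, 3)
```

The cost of reading the newest N items depends only on N, not on the total number of products, which is the point of keeping the list.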

Related

Neo4j Cypher - Filtering nodes by Input nodes in common Doesn't resolve

I have a large DB (>1m customers, >100k products).
I am trying to build a product recommendation out of real-time data. However, the first query I built never resolves (or at least I stop it after 30 min).
I want to input a customer and get all the uses of the products they purchased. Then I want to get all customers who have those same uses and more. (I was going to do things after that but can't get past this part.)
All nodes are indexed on a unique ID.
MATCH (c:customer {customer_id:'0c4c518e5d1eaf3fc39f93463c2406ad8b659d6c22c9107179e3992f647b12aa'})-[:PURCHASE]->(p:product)-[:HAS]->(u:use)
WITH DISTINCT(u.section_id) as uses
MATCH (ac:customer)-[:PURCHASE]->(ap:product)-[:HAS]->(au:use)
WHERE au.section_id in uses
RETURN ac
I built a new query that does the full recommendation, but it still takes 1 min...
I believe the issue was that I was matching up to :use and then going back, which must have produced far too many nodes to sort through.
MATCH (c:customer {customer_id:'0c4c518e5d1eaf3fc39f93463c2406ad8b659d6c22c9107179e3992f647b12aa'})-[:PURCHASE]->(p)<-[:PURCHASE]-(oc)
MATCH (u:use)<-[:HAS]-(p)-[:HAS]->(h:color)-[:IS_A]->(ph)
WHERE c <> oc AND oc.age > (c.age-10) AND oc.age < (c.age+10)
MATCH (oc)-[:PURCHASE]->(np)-[:HAS]->(u)
WITH p, np, ph, u, COLLECT(DISTINCT p.gra_id) as styles
WHERE p.product_code <> np.product_code AND np.gra_id in styles AND (np)-[:HAS]->(:color)-[:IS_A]->(ph)
WITH u.name as use, np.article_id as product, count(*) as score
ORDER BY score DESC
RETURN use, collect(product)[0..6] as products
ORDER BY use

Do not return set of nodes from a specific path in Cypher

I am trying to return a set of nodes from 2 sessions, with the condition that a returned node should not be present in another (third) session. I am using the following code, but it is not working as intended.
MATCH (:Session {session_id: 'abc3'})-[:HAS_PRODUCT]->(p:Product)
UNWIND ['abc1', 'abc2'] as session_id
MATCH (target:Session {session_id: session_id})-[r:HAS_PRODUCT]->(product:Product)
where p<>product
WITH distinct product.products_id as products_id, r
RETURN products_id, count(r) as score
ORDER BY score desc
This query was supposed to return all nodes present in abc1 and abc2 but not in abc3. However, it is not excluding all products present in abc3. Is there any way I can get it working?
UPDATE 1:
I tried to simplify it, without UNWIND, like this:
match (:Session {session_id: 'abc3'})-[:HAS_PRODUCT]->(p:Product)
MATCH (target:Session {session_id: 'abc1'})-[r:HAS_PRODUCT]->(product:Product)
where product <> p
WITH distinct product.products_id as products_id
RETURN products_id
Even this is not working. It returns all items present in abc1 without removing those which are already in abc3. It seems like where product <> p is not working correctly.
I would suggest checking whether the nodes are in a list. To prove out the approach, start with a very simple example.
Here is a simple Cypher query showing one way to do it. This approach can then be extended into the complex query:
// get first two product IDs as a list
MATCH (p:Product)
WITH p LIMIT 2
WITH COLLECT(ID(p)) as list
RETURN list
// now show two more product IDs which are not in that list
MATCH (p:Product)
WITH p LIMIT 2
WITH COLLECT(ID(p)) as list
MATCH (p2:Product)
WHERE NOT ID(p2) in list
RETURN ID(p2) LIMIT 2
Note: I'm using the ID() of the nodes instead of the entire node; the db hits are the same, but it may be more performant.
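Why the pairwise where product <> p filter in the question lets abc3's products through can be seen in a small plain-Python model. The session contents below are hypothetical and exist only for illustration. Two consecutive MATCH clauses produce one row per (p, product) pair, so the inequality only removes the rows where the pair is equal; the same product still survives in rows paired with a different p. Collecting the excluded products first and testing membership avoids this.

```python
# Hypothetical session contents, invented for illustration.
sessions = {
    "abc1": ["p1", "p2", "p3"],
    "abc3": ["p2", "p4"],
}

# Two consecutive MATCH clauses pair every abc3 product with every abc1 product.
# Filtering the pairs with "product != p" only drops rows where the pair is equal.
pairwise = sorted({
    product
    for p in sessions["abc3"]
    for product in sessions["abc1"]
    if product != p
})

# Collecting abc3's products first and testing membership excludes them properly.
excluded = set(sessions["abc3"])
by_membership = sorted(
    product for product in sessions["abc1"] if product not in excluded
)
```

In this model p2 is still present in pairwise, because it survives in the row where p is p4, while by_membership drops it.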

Neo4J get N latest relationships per unique target node?

With the following graph:
How can I write a query that would return N latest relationships by the unique target node?
For example, this query:
MATCH (p)-[r:RATED_IN]->(s) WHERE id(p)={person} RETURN p,s,r ORDER BY r.measurementDate DESC LIMIT {N}
with N = 1 would return only the single latest relationship, whether it is RATED_IN Team Lead or Programming, but I would like to get the N latest for each skill. With N = 2, I would like the 2 latest measurements per skill node.
I would like the latest relationship by a person for Team Lead and the latest one for Programming.
How can I write such a query?
-- EDIT --
MATCH (p:Person) WHERE id(p)=175
CALL apoc.cypher.run('
WITH {p} AS p
MATCH (p)-[r:RATED_IN]->(s)
RETURN DISTINCT s, r ORDER BY r.measurementDate DESC LIMIT 2',
{p:p}) YIELD value
RETURN p,value.r AS r, value.s AS s
Here's a Cypher knowledge base article on limiting MATCH results per row, with a few different suggestions on how to accomplish this given current limitations. Using APOC's apoc.cypher.run() to perform a subquery with a RETURN using a LIMIT will do the trick, as it gets executed per row (thus the LIMIT is per row).
Note that for the upcoming Neo4j 4.0 release at the end of the year we're going to be getting some nice Cypher goodies that will make this significantly easier. Stay tuned as we reveal more details as we approach its release!
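The result the question asks for, the N latest ratings per skill rather than N overall, can be stated precisely in a short plain-Python sketch. The rows and the latest_per_skill helper are hypothetical and only model the intended output; they are not a replacement for the APOC approach described above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical (skill, measurementDate, rating) rows for one person.
Row = Tuple[str, str, int]
ratings: List[Row] = [
    ("Programming", "2019-01-10", 3),
    ("Team Lead", "2019-02-01", 2),
    ("Programming", "2019-03-15", 4),
    ("Team Lead", "2019-04-20", 3),
    ("Programming", "2019-05-05", 5),
]


def latest_per_skill(rows: List[Row], n: int) -> Dict[str, List[Row]]:
    """Return the n most recent rows for each skill, newest first."""
    by_skill: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        by_skill[row[0]].append(row)
    # ISO-formatted dates sort correctly as plain strings.
    return {
        skill: sorted(group, key=lambda r: r[1], reverse=True)[:n]
        for skill, group in by_skill.items()
    }


top_two = latest_per_skill(ratings, 2)
```

The limit is applied inside each group, which is what running the LIMIT once per row achieves in the subquery approach.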

find all items that wasn't bought by a person and count the times it was bought

I have a graph that looks like this.
Using Cypher, I want to find all the items bought by the people who bought the same items as Gremlin.
Basically, I want to imitate the query in the Gremlin examples that looks like this:
g.V().has("name","gremlin")
.out("bought").aggregate("stash")
.in("bought").out("bought")
.where(not(within("stash")))
.groupCount()
.order(local).by(values,desc)
I was trying to do it like this
MATCH (n)-[:BOUGHT]->(g_item)<-[:BOUGHT]-(r),
(r)-[:BOUGHT]->(n_item)
WHERE
n.name = 'Gremlin'
AND NOT (n)-[:BOUGHT]->(n_item)
RETURN n_item.id, count(*) as frequency
ORDER by frequency DESC
but it doesn't seem to count frequencies properly; they are twice as big as expected:
4 - 4
5 - 2
3 - 2
However, 3 and 5 were each bought only once, and 4 was bought 2 times.
What's the problem?
Cypher is interested in paths, and your MATCH finds the following:
2 paths to item 3 both through Rexter (via items 2 and 1)
2 paths to item 5 through Pipes (via items 1 and 2)
4 paths to item 4 via Rexter and Pipes (via items 1 and 2 for each person)
Basically the items are being counted multiple times because there are multiple paths to that same item per individual person via different common items with Gremlin.
To get accurate counts, you either need to match to distinct r users, and only then match out to items the r users bought (as long as they aren't in the collection of items bought by Gremlin), OR you need to do the entire match, but before doing the counts, get distinct items with respect to each person so each item per person only occurs once...then get the count per item (counts across all persons).
Here's a query that uses the second approach
MATCH (n:Person)-[:BOUGHT]->(g_item)
WHERE n.name = 'Gremlin'
WITH n, collect(g_item) as excluded
UNWIND excluded as g_item // now you have excluded list to use later
MATCH (g_item)<-[:BOUGHT]-(r)-[:BOUGHT]->(n_item)
WHERE r <> n AND NOT n_item in excluded
WITH DISTINCT r, n_item
WITH n_item, count(*) as frequency
RETURN n_item.id, frequency
ORDER by frequency DESC
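The double counting described above, and the effect of taking distinct (r, n_item) pairs before counting, can be reproduced in a small plain-Python model. The purchase sets below are an assumption reconstructed from the path description in this answer, since the original graph image is not available.

```python
from collections import Counter

# Purchases reconstructed from the path description in the answer (an assumption;
# the original graph image is not available).
bought = {
    "Gremlin": {1, 2},
    "Rexter": {1, 2, 3, 4},
    "Pipes": {1, 2, 4, 5},
}

excluded = bought["Gremlin"]

# One row per path: Gremlin -> shared item -> other person -> new item.
paths = [
    (person, item)
    for shared in sorted(excluded)
    for person, items in bought.items()
    if person != "Gremlin" and shared in items
    for item in sorted(items - excluded)
]

per_path = Counter(item for _, item in paths)         # counts every path
per_person = Counter(item for _, item in set(paths))  # counts each (person, item) once
```

Under these assumed purchases, per_path reproduces the inflated numbers from the question (4 for item 4, 2 each for items 3 and 5), while per_person gives one count per buyer.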
You should be using labels in your graph, and you should use them in your query in order to leverage indexes and quickly find a starting point in the graph. In your case, an index on :Person(name), and usage of the :Person label in the query, should make this quick even as more nodes and more :Persons are added to the graph.
EDIT
If you're just looking for conciseness of the query, and don't have a large enough graph where performance will be an issue, then you can use your original query but add one extra line to get distinct rows of r and n_item before you count the item. This ensures that you only count an item per person once when you get the count.
Note that this forgoes optimizations for handling excluded items (it will do a pattern match per item rather than aggregating the collection of bought items and doing a collection membership check), and it aggregates on items while doing property access rather than doing property access only after aggregating by the node.
MATCH (n:Person)-[:BOUGHT*2]-(r)-[:BOUGHT]->(n_item)
WHERE n.name = 'Gremlin'
WITH DISTINCT n, r, n_item
WHERE NOT (n)-[:BOUGHT]->(n_item)
RETURN n_item.id, count(*) as frequency
ORDER by frequency DESC
I am adding a quick shortcut in your match, using :BOUGHT*2 to indicate two :BOUGHT hops to r, since we don't really care about the item in-between.

How can I optimise my neo4j cypher query?

Please check my Cypher below. The query returns results quickly with few records, but as the number of records increases it takes a long time, about 1601152 ms:
I found a suggestion to add USING INDEX, and I applied it in the query.
PROFILE MATCH (m:Movie)-[:IN_APP]->(a:App {app_id: '1'})<-[:USER_IN]-(p:Person)-[:WATCHED]->(ma:Movie)-[:HAS_TAG]->(t:Tag)<-[:HAS_TAG]-(mb:Movie)-[:IN_APP]->(a)
USING INDEX a:App(app_id) WHERE p.person_id= '1'
AND NOT (p:Person)-[:WATCHED]-(mb)
RETURN DISTINCT(mb.movie_id) , mb.title, mb.imdb_rating, mb.runtime, mb.award, mb.watch_count, COLLECT(DISTINCT(t.tag_id)) as Tag, count(DISTINCT(t.tag_id)) as matched_tags
ORDER BY matched_tags DESC SKIP 0 LIMIT 50
Can you help me figure out what I can do?
I am trying to find 100 movies to recommend on the basis of tags, i.e. 100 movies which I have not watched and which share tags with movies I have watched.
The following query may work better for you [assuming you have indexes on both :App(app_id) and :Person(person_id)]. By the way, I presumed that in your query the identifier ma should have been m (or vice versa).
MATCH (m:Movie)-[:IN_APP]->(a:App {app_id: '1'})<-[:USER_IN]-(p:Person {person_id: '1'})-[:WATCHED]->(m)
WITH a, p, COLLECT(m) AS movies
UNWIND movies AS movie
MATCH (movie)-[:HAS_TAG]->(t)<-[:HAS_TAG]-(mb:Movie)-[:IN_APP]->(a)
WHERE NOT mb IN movies
WITH DISTINCT mb, t
RETURN mb.movie_id, mb.title, mb.imdb_rating, mb.runtime, mb.award, mb.watch_count, COLLECT(t.tag_id) as Tag, COUNT(t.tag_id) as matched_tags
ORDER BY matched_tags DESC SKIP 0 LIMIT 50;
If you PROFILE this query, you should see that it performs NodeIndexSeek operations (instead of the much slower NodeByLabelScan) to quickly execute the first MATCH. The query also collects all the movies watched by the specified person and uses that collection later to speed up the WHERE clause (which no longer needs to hit the DB). In addition, the query removes some labels from some of the node patterns (where doing so seemed likely to be unambiguous) to speed up processing further.
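The collect-then-filter logic of the rewritten query can be modeled in plain Python. The movie and tag data below are hypothetical, invented only to illustrate the shape of the computation: the watched movies are gathered once, candidates are filtered by in-memory membership, and the remaining movies are ranked by the number of distinct tags they share with watched movies.

```python
from collections import defaultdict

# Hypothetical movie -> tag data, invented for illustration.
tags = {
    "m1": {"action", "space"},
    "m2": {"space", "drama"},
    "m3": {"action", "space", "drama"},
    "m4": {"comedy"},
    "m5": {"drama"},
}
watched = {"m1", "m2"}  # collected once, like COLLECT(m) AS movies

# For each unwatched movie, gather the distinct tags it shares with watched movies.
matched = defaultdict(set)
watched_tags = set().union(*(tags[m] for m in watched))
for movie, movie_tags in tags.items():
    if movie in watched:          # in-memory check, like WHERE NOT mb IN movies
        continue
    shared = movie_tags & watched_tags
    if shared:
        matched[movie] = shared

# Rank by number of matched tags, highest first (ties broken by movie id).
ranking = sorted(
    ((movie, len(shared)) for movie, shared in matched.items()),
    key=lambda pair: (-pair[1], pair[0]),
)
```

Using sets for the shared tags plays the role of WITH DISTINCT mb, t, so a tag reached through several watched movies is counted once.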
