I am currently working with Neo4j. Does anybody know how to do pagination over the results that a Cypher query returns, especially when the result set is very big, about 100 million records?
I know the method of SKIP and LIMIT with ORDER BY (which is not good; it takes a long time). Does anyone know another, more efficient method to do the pagination?
Thank you in advance.
The APOC periodic execution procedures may work for you.
For example, apoc.periodic.iterate allows you to execute a query (referred to as the "inner query") in batches of a specific size.
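For example, to process every node of a huge label in batches of 10,000 per transaction, a call could look like the sketch below (the :Item label and the processed flag are illustrative placeholders, not from your data):

// outer query streams the nodes to process; inner query runs once per batch
CALL apoc.periodic.iterate(
  'MATCH (n:Item) RETURN n',
  'SET n.processed = true',
  {batchSize: 10000, parallel: false});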
Actually, you don't need the ORDER BY clause. You can use SKIP and LIMIT on their own, like RETURN x SKIP 0 LIMIT 15. I think its performance should be better.
I'm unable to test this on the big data set right now, but I'm wondering which Cypher query in Neo4j will run faster, so that I can design the system properly:
Approach #1:
WHERE apoc.coll.containsAllSorted($profileDetailedCriterionIds, childD.mandatoryCriterionIds)
Approach #2:
MATCH (p:Profile {id: $profileId})-[:HAS_VOTE_ON]-(c:Criterion)<-[:HAS_VOTE_ON]-(childD)
WHERE c.id IN childD.mandatoryCriterionIds
WITH childD, COLLECT(c.id) AS cIds
WHERE size(cIds) >= size(childD.mandatoryCriterionIds)
where $profileDetailedCriterionIds is a set of ids provided via a query parameter.
Which approach should I choose for better performance?
Run both queries in the Neo4j Browser, but put the keyword PROFILE at the start of each query. When a query finishes, the browser displays a profile explaining how the query was executed. Then go to the last tab on the left, look at the part where you use the APOC function, and compare the db hits and page cache usage against the version without the APOC function.
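For example, sketched on approach #2 (with a RETURN added so the query is complete):

PROFILE
MATCH (p:Profile {id: $profileId})-[:HAS_VOTE_ON]-(c:Criterion)<-[:HAS_VOTE_ON]-(childD)
WHERE c.id IN childD.mandatoryCriterionIds
WITH childD, COLLECT(c.id) AS cIds
WHERE size(cIds) >= size(childD.mandatoryCriterionIds)
RETURN childD;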
Why is Neo4j ORDER BY very slow for a large database? :(
Here is the example query:
PROFILE MATCH (n:Item) RETURN n ORDER BY n.name DESC LIMIT 25
The result shows that it reads all records, even though I already have an index on the name property.
It reads all nodes, which is a real mess for a large number of records.
Is there any solution for this? Or is Neo4j just not a good choice for us? :(
Also, is there any way to get the last record from the nodes?
Your question and problem are not very clear.
1) Are you sure that you added the index correctly?
CREATE INDEX ON :Item(name)
In the Neo4j browser execute :schema to see all your indexes.
2) How many Items does your database hold and what running time are you expecting and achieving?
3) What do you mean by 'last record from nodes'?
Indexes are currently only used to find entry points into the graph, but not for other uses including ordering of results.
Index-backed ORDER BY operations have been a highly requested feature for a while, and while we've been tracking and prioritizing it, several other features have taken priority over this work.
I believe index-backed ORDER BY operations are currently scheduled for our 3.5 release, coming in the last few months of 2018.
I am trying to implement a user-journey analytics solution: simply put, to analyze on which screens users leave the application.
For this, I have modeled the data like this:
I modeled each activity as its own node, since I want to index some attributes, and relationship properties cannot be indexed in Neo4j.
With this model, I am trying to follow three successive event types with the query below:
MATCH (eventType1:EventType {eventName:'viewStart-home'})<--(event:EventNode)
      <--(eventType2:EventType {eventName:'viewStart-payment'})
WITH DISTINCT event.deviceId AS eUsers, event.clientCreationDate AS eDate
MATCH (eventType2)<--(event2:EventNode)
      <--(eventType3:EventType {eventName:'viewStart-screen1'})
WITH DISTINCT event2.deviceId AS e2Users, event2.clientCreationDate AS e2Date
RETURN e2Users LIMIT 200000
And the execution plan is below:
I could not figure out the reason for this behavior. Can you help me?
Your query is doing a lot more work than it needs to.
The first WITH clause is not needed at all, since its generated eUsers and eDate variables are never used. And the second WITH clause does not need to generate the unused e2Date variable.
In addition, you could first add an index for :EventType(eventName) to speed up the processing:
CREATE INDEX ON :EventType(eventName);
With these changes, your query's profile could be simpler and the processing would be faster.
Here is an updated query (that should use the index to quickly find the EventType node at one end of the path, to kick off the query):
MATCH (:EventType {eventName:'viewStart-home'})<--(:EventNode)
<--(:EventType{eventName:'viewStart-payment'})<--(event2:EventNode)
<--(:EventType{eventName:'viewStart-screen1'})
RETURN DISTINCT event2.deviceId AS e2Users
LIMIT 200000;
Here is an alternate query that uses 2 USING INDEX hints to tell the planner to quickly find the :EventType nodes at both ends of the path to kick off the query. This might be even faster than the first query:
MATCH (a:EventType {eventName:'viewStart-home'})<--(:EventNode)
<--(:EventType{eventName:'viewStart-payment'})<--(event2:EventNode)
<--(b:EventType{eventName:'viewStart-screen1'})
USING INDEX a:EventType(eventName)
USING INDEX b:EventType(eventName)
RETURN DISTINCT event2.deviceId AS e2Users
LIMIT 200000;
Try profiling them both on your DB, and pick the best one or keep tweaking further.
Let's say I have a select query in Cypher:
MATCH (n:PERSON) RETURN n
This query should give me 70 billion results, but it can't, since the result data is really big.
Or an update query:
MATCH (n:PERSON)
SET n.name = NULL RETURN n
I know the queries are ridiculous, but I gave the examples above to make the point that I have to work with really big data.
So now I want something that works asynchronously and shows me the progress. Sometimes we make typos in queries, or pick the wrong value, and the query runs for hours. That can be acceptable, but we want to see when the result will come, or at least how far along the process is.
I wrote Cypher, but I have read some articles saying there are other ways to access or change the data.
So what should I do?
How about writing some kind of batch job where you perform updates on smaller sets of nodes:
MATCH (n:PERSON)
WITH n
SKIP ${nodes-to-be-skipped} LIMIT ${maximal-nodes-to-be-match-or-updated}
SET n.name = NULL
RETURN n
If you know how many nodes are to be updated, you can compute the % done.
Keep in mind that a bigger LIMIT value will need more memory, while a smaller one means more commits (and therefore more time), so you can adapt it to your needs.
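A minimal sketch of one page of such a job, assuming you drive it from the Neo4j Browser and that a page size of 10,000 suits your memory budget (the parameter names are illustrative):

:param skip => 0
:param limit => 10000

MATCH (n:PERSON)
WITH n
SKIP $skip LIMIT $limit
SET n.name = NULL
RETURN count(n) AS updated;

Rerun it with a larger $skip until updated comes back smaller than $limit; together with the total node count, updated also gives you the % done.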
If you need bulk operations, you can take a look at batch insertion (http://neo4j.com/docs/stable/batchinsert-examples.html), which is much faster but works without transactions or indexing, and therefore no consistency is checked.
Which of these is the better solution depends on your problem. From what I can gather from your post, I would just run a job with paginated Cypher queries.
Hope that helps.
Is it possible to have a Cypher query paginated? For instance, a list of products, but I don't want to display/retrieve/cache all the results, as I can have a lot of them.
I'm looking for something similar to OFFSET / LIMIT in SQL.
Is Cypher SKIP + LIMIT + ORDER BY a good option? http://docs.neo4j.org/chunked/stable/query-skip.html
SKIP and LIMIT combined is indeed the way to go. Using ORDER BY inevitably makes Cypher scan every node that is relevant to your query; the same goes for a WHERE clause. Performance should not be that bad, though.
It's like normal SQL; the syntax is as follows:
MATCH (user:USER_PROFILE)-[:USAGE]->(uUsage)
WHERE HAS(uUsage.impressionsPerHour) AND uUsage.impressionsPerHour > 100
RETURN user, uUsage
ORDER BY user.hashID
SKIP 10
LIMIT 10;
This syntax suits the latest version (2.x).
Neo4j nowadays uses index-backed ORDER BY, which means that if you use an alphabetical ORDER BY on indexed node properties within your SKIP/LIMIT query, Neo4j will not perform a full scan of all "relevant nodes" as others have mentioned (their responses were written long ago, so keep that in mind). The index lets Neo4j exploit the fact that it already stores indexed properties in ORDER BY (alphabetical) order, so your pagination will be even faster than without the index.
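A minimal sketch of what that looks like, assuming Neo4j 3.5+ and an illustrative :Product(name) index (the label and property are placeholders):

CREATE INDEX ON :Product(name);

// the exists() predicate lets the planner use the index, which already
// stores names in order, so no Sort operator is needed for the ORDER BY
PROFILE MATCH (p:Product)
WHERE exists(p.name)
RETURN p.name
ORDER BY p.name
SKIP 100 LIMIT 20;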