I understand that with Parse there is a PFQuery limit where you can only retrieve 1000 objects at a time. I presume it doesn't, but does this also limit the number of whereKey comparisons that can be carried out? E.g.
var query = PFQuery(className: "Photos")
query.whereKey("Name", equalTo: someString)
query.findObjectsInBackgroundWithBlock { objects, error in
    // handle the fetched objects here
}
If there are more than 1000 objects in the class, will the whereKey comparison stop after it has compared 1000 objects, or does the limit only apply to the number of objects actually retrieved?
The reason I presume there is no limit on this is that if you have more than 1000 users, there would be no straightforward way to do a standard user query.
Using whereKey constraints does not affect your fetch limit; if anything, it reduces the number of results returned, simply by virtue of its purpose. The point of adding constraints is to narrow the results down, correct? You can even include multiple keys or whereKey statements in the same query. By narrowing things down further, you reduce the number of candidate objects to be fetched. So, in short, your presumption is correct.
First, let's be clear: whereKey isn't actually doing anything by itself; it sets a filter [parameter] that is applied to your asynchronous call, so the given block has something to work with. findObjects is what returns your limit, which you now know is 1000. You can also skip queries (see the skip property in the Parse docs), which effectively means you can query for the first 1000, then skip those you've already fetched once you're ready to display further results [pagination]. So, to answer your second question: the whereKey parameter won't stop doing anything (because it kind of isn't doing anything anyway), nor will you stop retrieving objects; you just have to learn how to page past your first 1000 returned objects, for example:
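Here is a minimal sketch of that skip-based pagination in Swift, using the Photos class from the question. It assumes the older block-style SDK API as in your snippet; the exact closure types vary by SDK and Swift version, and the function name is just for illustration.

func fetchAllPhotos(named someString: String, skip: Int = 0, fetched: [PFObject] = []) {
    let query = PFQuery(className: "Photos")
    query.whereKey("Name", equalTo: someString)
    query.limit = 1000   // the per-request maximum
    query.skip = skip    // skip pages already fetched (note: Parse historically capped skip at 10000)
    query.findObjectsInBackgroundWithBlock { objects, error in
        guard error == nil else { return }
        let page = (objects as? [PFObject]) ?? []
        let all = fetched + page
        if page.count == 1000 {
            // A full page suggests there may be more; fetch the next one.
            fetchAllPhotos(named: someString, skip: skip + 1000, fetched: all)
        } else {
            print("Fetched \(all.count) matching objects in total")
        }
    }
}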
There are numerous ways of querying users; it all depends on your app's direction and current setup. You have to think about Parse as a business and not a service: they make money off API requests, so the more you make, the better for them. I would suggest coming back to SO once you hit an actual issue with this so someone can help you out, if you need it.
I have a rather long and complex paginated query that I'm trying to optimize. In the worst case, I first have to execute the data query in one call to Neo4j, and then execute pretty much the same query again for the count. I do everything in one transaction. Still, I don't like the overall execution time, so I extracted the part common to both the data and count queries and execute it in the first call. This common query returns the IDs of nodes, which I then pass as parameters into the rest of the data and count queries. Now everything works much faster. One thing I don't like is that the common query can sometimes return quite a large set of IDs; it can be 20k to 50k Long IDs.
So my question is: since I'm doing this in one transaction, is there a way to keep such a set of IDs somewhere in Neo4j between the common query and the data/count query calls, and just refer to it in the subsequent data/count queries, without moving it between the app JVM and Neo4j?
Also, am I crazy for doing this, or is this a good approach to optimizing a complex paginated query?
Only with a custom procedure.
Otherwise you'd need to return them.
But it's usually uncommon to provide both counts and data (even Google doesn't provide "real" counts).
One way is to just stream the results with the reactive driver as long as the user scrolls.
Otherwise I would just query for pageSize+1 and return "more than pageSize results".
If you just stream the IDs back (and don't collect them as an aggregation), you can start using the IDs already received to issue your new queries (even in parallel).
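For the pageSize+1 approach, a minimal Cypher sketch; the Item label, the filter, and the property names are placeholders. The app passes $limit = pageSize + 1 and, if it gets a full $limit rows back, displays "more than pageSize results":

MATCH (n:Item)
WHERE n.active = true   // stand-in for the real, complex filter
RETURN id(n) AS id
ORDER BY n.name
SKIP $offset
LIMIT $limit            // pass pageSize + 1 from the application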
This is a follow-up to this last question I asked: Sort Users by Number of Followers. That code is:
@ordered_users = User.all.sort { |a, b| b.followers.count <=> a.followers.count }
What I hope to accomplish is take the ordered users and get the top 100 of those and then randomly choose 5 out of that 100. Is there a way to accomplish this?
Thanks.
users_in_descending_order_of_followers = User.all.sort_by { |u| -u.followers.count }
sample_of_top = users_in_descending_order_of_followers.take(100).sample(5)
You can use sort_by which can be easier to use than sort, and combine take and sample to get the top 100 users and sample 5 of those users.
User.all.sort can potentially pose problems in the long run, depending on the total number of users and on the availability of resources, particularly memory. It is also a lot slower, because you call .followers.count twice inside the sort block, and each call issues its own COUNT query, for every comparison the sort makes. This is because User.all.sort immediately executes the User.all query, fetching all User records into memory, as opposed to a bare User.all, which is lazily loaded until you use .each (or better yet, .find_each) somewhere down the line.
I suggest something like the query below (I extended Deekshith's answer, referring to your link to the other question):
User.joins(:followers).group('users.id').order('count(followers.user_id) desc').limit(100).sample(5)
.joins, .group, .order, and .limit above all compose a single SQL query, which is executed once; .sample(5) (not SQL any more, just a plain Ruby method at this point) then picks the final result you need. Note the .group is needed so that count(followers.user_id) is computed per user.
I would strongly consider using a counter cache on the User model, to hold the count of followers.
This would give a very small performance impact on adding or removing followers, and greatly increase performance when performing sorts:
User.order(followers_count: :desc)
This would be particularly noticeable if you wanted the top-n users by follower count, or finding users with no followers.
User.order(followers_count: :desc).limit(100).sample(5)
This method will outperform the others that use count(*). Add an index on followers_count for best effect.
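A minimal sketch of the counter cache setup, assuming followers are modelled through a Follow join model; the model, association, and column names here are assumptions to adapt to your schema:

# Migration: add the cache column (and the index suggested above).
class AddFollowersCountToUsers < ActiveRecord::Migration # add a version suffix like [7.0] on Rails 5+
  def change
    add_column :users, :followers_count, :integer, default: 0, null: false
    add_index  :users, :followers_count
  end
end

# app/models/follow.rb -- counter_cache keeps users.followers_count in sync automatically.
class Follow < ApplicationRecord # ActiveRecord::Base on older Rails
  belongs_to :follower, class_name: "User"
  belongs_to :followed, class_name: "User", counter_cache: :followers_count
end

Existing rows can then be backfilled once with User.reset_counters(user.id, :followers), assuming the association is named :followers.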
I want to query some objects from the database using a WHERE clause similar to the following:
@monuments = Monument.where("... lots of SQL ...").limit(6)
Later on, in my view I use methods like @monuments.first, then I loop through @monuments, then I display @monuments.count.
When I look at the Rails console, I see that Rails queries the database multiple times: first with a limit of 1 (for @monuments.first), then with a limit of 6 (for looping through all of them), and finally it issues a count() query.
How can I tell ActiveRecord to only execute the query once? Just executing the query once with a limit of 6 should be enough to get all the data I need. Since the query is slow (80ms), repeating it costs a lot of time.
In your situation you'll want to trigger the query before your call to first, because while first is a method on Array, it's also a "finder method" on ActiveRecord relations that will fetch just the first record.
You can prompt this with any method that requires data to work with. I prefer using to_a since it's clear that we'll be dealing with an array after:
@monuments = Monument.where(foo: true).to_a
# SQL query executed
@monuments.first #=> (Array#first) <Monument foo: true>
@monuments.count #=> (Array#count) 42
In this case, you can also use first(6) in place of limit(6), which will also trigger the query. It may be less obvious to another developer on your team that this is intentional, however.
AFAIK, @monuments.first should not hit the DB; I confirmed it in my console. Maybe you have multiple instances with the same variable, or you are doing something else (which you haven't shared here). Share the exact code and query and we can try to debug it.
Since ActiveRecord collections act like arrays, you can use array analogues to avoid querying the DB.
Regarding first, you can do:
@monuments[0]
Regarding the count: yes, it is a different query, which hits the DB. To avoid it you can use length, as in:
@monuments.length
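Putting the two answers together, a small sketch; the SQL fragment is the stand-in from the question. Calling .load (or .to_a) executes the query exactly once, and the array-style calls afterwards reuse the loaded records:

@monuments = Monument.where("... lots of SQL ...").limit(6).load # one query, executed here
@monuments.first  # no extra query: reads from the loaded records
@monuments.length # no extra query: array length, not a SQL count(*)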
I need to count the number of objects in a Core Data collection that satisfy certain criteria
(e.g. count the number of employees in each distinct department).
There are two solutions to my problem:
(1) Fetch the collection in a single request and filter the array locally for each department using NSPredicate
(2) Execute multiple NSFetchRequests directly against the store
The question is which solution will be fastest and use the least memory, given that this is only for instrumentation purposes and has no importance for the app's behavior/UI.
Counter question: if it is (1), which is the best way to filter the array: manual looping and counting, or NSPredicate?
P.S:
a. The names of the departments are known to me (it's actually an enum).
b. The collection is small; it will be 50 at most.
(1) is the fastest and takes the most memory.
(2) will use the least memory but may take longer.
However, this is not always true. If your individual fetch requests would return many of the same employee records that other fetch requests also return, it may even be the other way around. But as you are fetching per department, that will not be the case.
For a small collection it may not make much of a difference anyway.
Counter question: this, too, depends. However, I'd go for the predicate, as that is safer for the future if the collection grows.
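If you do go with (2), counts are cheap because the store never materialises the objects. A minimal Swift sketch, assuming an Employee entity with a department string attribute (both names are assumptions):

import CoreData

// Count employees per department with one count request each (approach 2).
func employeeCounts(for departments: [String],
                    in context: NSManagedObjectContext) throws -> [String: Int] {
    var counts: [String: Int] = [:]
    for department in departments {
        let request = NSFetchRequest<NSManagedObject>(entityName: "Employee")
        request.predicate = NSPredicate(format: "department == %@", department)
        // count(for:) returns the count from the store without loading the objects
        counts[department] = try context.count(for: request)
    }
    return counts
}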
I am trying to sort my query results using the nearGeoPoint constraint. However, Parse docs mention a caveat that "Using the nearGeoPoint constraint will also limit results to within 100 miles."
How do I continue sorting results beyond the first 100 miles, especially if I have thousands of results that need to be sorted?
Your best bet is to try:
PFQuery whereKey:nearGeoPoint:withinMiles:
and set your withinMiles to > 100.
I can't guarantee it will work, but it's certainly worth a shot! Worst-case scenario, it will error out and you'll know the limitations of Parse for certain.
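For reference, a quick sketch of that call in Swift; the Places class, location column, coordinates, and 500-mile radius are all placeholders. Note that nearGeoPoint queries already return results sorted nearest-first, so no extra ordering is needed:

let userLocation = PFGeoPoint(latitude: 37.77, longitude: -122.42) // placeholder point
let query = PFQuery(className: "Places") // assumed class with a PFGeoPoint column
query.whereKey("location", nearGeoPoint: userLocation, withinMiles: 500)
query.limit = 1000 // per-request maximum still applies
query.findObjectsInBackgroundWithBlock { objects, error in
    // objects (if any) come back ordered by distance from userLocation
}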