nearGeoPoint over 100 miles - iOS

I am trying to sort my query results using the nearGeoPoint constraint. However, Parse docs mention a caveat that "Using the nearGeoPoint constraint will also limit results to within 100 miles."
How do I continue sorting results beyond the first 100 miles, especially if I have thousands of results that need to be sorted?

Your best bet is to try:
PFQuery whereKey:nearGeoPoint:withinMiles:
and set your withinMiles to > 100.
I can't guarantee it will work, but it's certainly worth a shot! Worst case scenario, it will error out and you'll know the limitations of Parse for certain.
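For example, a minimal sketch in Swift, assuming a hypothetical "Place" class that stores its position in a PFGeoPoint column named "location" (adjust the class and key names to your schema):
import Parse

let userLocation = PFGeoPoint(latitude: 37.78, longitude: -122.40)
let query = PFQuery(className: "Place")
// Ask for results within 500 miles instead of relying on the default ~100-mile cap
query.whereKey("location", nearGeoPoint: userLocation, withinMiles: 500.0)
query.findObjectsInBackgroundWithBlock { (objects, error) in
    if let objects = objects where error == nil {
        // Results come back ordered by distance from userLocation
        print("Found \(objects.count) objects")
    }
}
If Parse still enforces the 100-mile cap server-side, you'll either get clamped results or an error, which at least confirms the limitation the docs describe.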

Related

Randomize Selections in a List of 100

This is a follow-up to this last question I asked: Sort Users by Number of Followers. That code is:
@ordered_users = User.all.sort { |a, b| b.followers.count <=> a.followers.count }
What I hope to accomplish is take the ordered users and get the top 100 of those and then randomly choose 5 out of that 100. Is there a way to accomplish this?
Thanks.
users_in_descending_order_of_followers = User.all.sort_by { |u| -u.followers.count }
sample_of_top = users_in_descending_order_of_followers.take(100).sample(5)
You can use sort_by, which is often easier to use than sort, and then combine take and sample to get the top 100 users and pick 5 of them at random.
User.all.sort can pose some problems in the long run, depending on the total number of users and the resources available, particularly memory. It is also a lot slower, because you call .followers.count twice inside the sort block, and each of those calls issues another COUNT query against the database. This is because User.all.sort immediately executes the User.all query, fetching all User records into memory, as opposed to a plain User.all, which is lazily loaded until you iterate it (for example with .each, or better yet .find_each) somewhere down the line.
I suggest something like the below (I extended Deekshith's answer from the other question you linked):
User.joins(:followers).group('users.id').order('COUNT(followers.user_id) DESC').limit(100).sample(5)
.joins, .group, .order, and .limit above all build up a single SQL query; that one query is executed, and then .sample(5) (no longer SQL at this point, just a plain Ruby Array method) runs on the returned records, yielding the result you need.
I would strongly consider using a counter cache on the User model, to hold the count of followers.
This adds a very small cost when adding or removing followers, and greatly improves performance when sorting:
User.order(followers_count: :desc)
This would be particularly noticeable if you wanted the top-n users by follower count, or when finding users with no followers.
User.order(followers_count: :desc).limit(100).sample(5)
This approach will outperform ones that use count(*). Add an index on followers_count for best effect.
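A rough sketch of that counter cache setup, assuming followers are stored through a join model (here called Follow; the model and column names are illustrative, so adjust them to your schema):
# Migration: add the cached count to users
class AddFollowersCountToUsers < ActiveRecord::Migration
  def change
    add_column :users, :followers_count, :integer, default: 0, null: false
    add_index  :users, :followers_count
  end
end

# app/models/follow.rb
class Follow < ActiveRecord::Base
  # counter_cache keeps users.followers_count in sync on create/destroy
  belongs_to :followed, class_name: 'User', counter_cache: :followers_count
  belongs_to :follower, class_name: 'User'
end
With the column in place, User.order(followers_count: :desc) reads a plain indexed integer instead of counting join rows.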

Parse.com Query Limit - Affects whereKey Limit?

I understand that with Parse there is a PFQuery limit where you can only retrieve 1000 objects at a time. I presume it doesn't, but does this also limit the number of whereKey comparisons that can be carried out? E.g.
var query = PFQuery(className: "Photos")
query.whereKey("Name", equalTo: someString)
query.findObjectsInBackgroundWithBlock { (objects, error) in
    // ...
}
If there are more than 1000 objects in the class, will the whereKey comparison stop after it has compared 1000 objects, or is the issue with only actually retrieving more than 1000 objects?
The reason I presume there is not a limit on this is that if you have more than 1000 users, there would be no straightforward way to do a standard user query.
Using the whereKey parameters does not affect your fetch limit; in fact, it reduces the number of results simply by virtue of its purpose. The point of including keys is to narrow things down, correct? You can even include multiple keys or whereKey statements in the same query, so by narrowing it down further you reduce the number of objects likely to be fetched. So in short, your presumption is correct.
Let's be clear first: whereKey isn't actually doing anything by itself; it's setting a filter [parameter] that gets applied to your asynchronous call so the given block has something to work with. The findObjects call is what is subject to the limit that you now know is 1000. You can also skip results (See Here), which effectively means you can query the first 1000 and then skip the ones you've already queried once you're ready to display further results [pagination]. So to answer your second question: the whereKey parameter won't stop doing anything (it was never "running" in the first place), and you won't stop retrieving objects; you just have to learn how to navigate around your first 1000 returned objects.
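As a rough illustration of that pagination idea in Swift (using the Photos class from the question; page is just a counter you keep yourself and increment as you load more):
let pageSize = 1000                      // Parse's per-request cap
let query = PFQuery(className: "Photos")
query.whereKey("Name", equalTo: someString)
query.limit = pageSize
query.skip = page * pageSize             // page = 0, 1, 2, ...
query.findObjectsInBackgroundWithBlock { (objects, error) in
    // Append this page's objects to your data source, then bump page and repeat
}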
There are numerous ways of querying users; it all depends on your app's direction and current setup. You have to think about Parse as a business and not just a service: they make money off API requests, so the more you make, the better it is for them. I would suggest coming back to SO once you run into an actual issue with this so someone can help you out, if you need it.

Neo4j performance - counting nodes - linked list performance - alternatives?

UPDATED: Wes hit a home run here! Thanks. I've added a Rails version I was developing using the Neography gem. It accomplishes the same thing, but his version is much faster; see the comparison below.
I am using a linked list in Neo4j (1.9, REST, Cypher) to help keep the comments in proper order (Yes I know I can sort on the time etc).
(object node)---[:comment]--->(comment)--->(comment)--->(comment).... etc
Currently I have 900 comments and it's taking 7 seconds to get through the whole list, which is completely unacceptable. I'm just returning the ID of each node (I know, don't do this, but it's not the point of my post).
What I'm trying to do is find the IDs of users who commented so I can return a count (like "Joe and 405 others commented on your post"). Now, I'm not even counting the unique nodes at this point; I'm just returning the author_id for each record. (I'll worry about counting later; first I want to take care of the basic performance issue.)
start object=node(15837) match object-[:COMMENTS*]->comments return comments.author_id
7 seconds is waaaay too long.
Instead of using a linked list, I could simply link all the comments directly to the object node, but that could lead to a supernode that gets bogged down, and then finding the most recent comments, even with skip and limit, would be dog slow.
Will relationship indexes help here? I've never used them other than to ensure a unique relationship, or to see if a relationship exists, but can I use them in a cypher query to help speed things up?
If not, what else can I do to decrease the time it takes to return the IDs?
COMPARISON: Here is the Rails version using "phase II" methods of the Neography gem:
next_node_id=18233
@neo = Neography::Rest.new
start_node = Neography::Node.load(next_node_id, @neo)
all_nodes = start_node.outgoing(:COMMENTS).depth(10000)
raise all_nodes.size.to_s
Result: 526 nodes found in 290ms..
Wes' solution took 5 ms.. :-)
Relationship indexes will not help. I'd suggest using an unmanaged extension and the traversal API--it will be a lot faster than Cypher for this particular query on long lists. This example should get you close:
https://github.com/wfreeman/linkedlistlength
I based it on Mark Needham's example here:
http://www.markhneedham.com/blog/2014/07/20/neo4j-2-1-2-finding-where-i-am-in-a-linked-list/
If you're only doing this to return a count, the best solution is not to compute it on every query, since it isn't changing that often. Cache the result in a total_comments property on the node, and every time a relationship is added or removed, update that count. If you also want to know whether any of the current user's friends commented, so you can say "Joe and 700 others commented on this," you could do a second query:
start joe=node(15830), object=node(15838) match joe-[:FRIENDS]->friend-[:POSTED_COMMENT]->comment<-[:COMMENTS]-object RETURN friend LIMIT 1
You limit it to 1 since you only need the name of one friend who commented. If it returns someone, adjust the displayed count by 1 and include that user's name. You could do that with JS so it doesn't delay your page load. Sorry if my Cypher is a little off; I'm not used to the pre-2.0 syntax.
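If you go the cached-count route, a rough 1.9-style Cypher sketch of the write side might look like this, assuming the node starts out with a total_comments property set to 0 (the property name is just an illustration):
start object=node(15837)
set object.total_comments = object.total_comments + 1
return object.total_comments
Run the equivalent with - 1 when a comment is removed; the read side then becomes a single property lookup instead of a walk over the whole list.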

Solr join and non-Join queries give different results

I am trying to link two types of documents in my Solr index. The parent is named "house" and the child is named "available". So, I want to return a list of houses that have available documents with some filtering. However, the following query gives me around 18 documents, which is wrong. It should return 0 documents.
q=*:*
&fq={!join from=house_id_fk to=house_id}doctype:available AND discount:[1 TO *] AND start_date:[NOW/DAY TO NOW/DAY%2B21DAYS]
&fq={!join from=house_id_fk to=house_id}doctype:available AND sd_year:2014 AND sd_month:11
To debug it, I tried first to check whether there is any available documents with the given filter queries. So, I tried the following query:
q=*:*
&fq=doctype:available AND discount:[1 TO *] AND start_date:[NOW/DAY TO NOW/DAY%2B21DAYS]
&fq=doctype:available AND sd_year:2014 AND sd_month:11
The query gives 0 results, which is correct. As you can see, both queries are the same; the only difference is the use of the join query parser. I am a bit confused about why the first query returns results. My understanding is that this should not happen, because the second query shows that there are no available documents that satisfy the given filter queries.
I have figured it out.
The reason is simply the type of join in Solr: it is an outer join. Since both filter queries are executed separately, a house that has available documents with discount > 1 or with (sd_year:2014 AND sd_month:11) will be returned, even though my intention was to apply both conditions at the same time.
In the second case, however, both conditions are applied at the same time when finding available documents, and houses are then matched against those available documents. Since no available document satisfies both conditions, there is no matching house, which gives zero results.
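So, to apply both conditions to the same available document, they have to live inside a single join filter query rather than two separate ones, along the lines of:
q=*:*
&fq={!join from=house_id_fk to=house_id}doctype:available AND discount:[1 TO *] AND start_date:[NOW/DAY TO NOW/DAY%2B21DAYS] AND sd_year:2014 AND sd_month:11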
It really took some time to figure this out; I hope this will help someone else.

CoreData: max number of rows for sort operation?

Is there a maximum number of rows a table should have for a sort operation to stay fast? I've got a table with a last-modified date on each entry. If I'd like to get, e.g., the 50 most recently modified entries, I first sort by the last-modified date and then fetch 50 entries. The fetch should be no problem, but is there a limit beyond which the sorting could become slow (e.g. > 1 second)?
Thanks a lot,
Stefan
If you declare the sort descriptor as usual and set your fetchLimit to the desired value, Core Data is going to take care of all the necessary optimizations.
In my experience, for more than 1 second sort time you would need at least tens of thousands of records. Clearly, this depends on hardware and other factors, but you can test it if you want to be sure.
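A minimal sketch of that fetch in Swift, assuming an entity named "Entry" with a "lastModified" date attribute and an existing NSManagedObjectContext called context:
import CoreData

let request = NSFetchRequest(entityName: "Entry")
request.sortDescriptors = [NSSortDescriptor(key: "lastModified", ascending: false)]
request.fetchLimit = 50   // the store only has to hand back the 50 newest entries
do {
    let latestEntries = try context.executeFetchRequest(request) as! [NSManagedObject]
    print("Fetched \(latestEntries.count) entries")
} catch {
    print("Fetch failed: \(error)")
}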

Resources