Is it possible to sort returned objects from Backand based on how near the location field of type "point" is to the querying user's current location?
From the Backand docs I have only seen support for querying based on a maximum distance from a point but nothing about sorting by geo points.
I was able to create a custom query in Backand which I can hit from the Backand API. Unfortunately, in order to sort on the distance of nearby users I need to calculate the distance from the current user to every other user in the database and then sort based on this. That seems very complex - a lot of calculations every time the query is called! It will probably see big performance hits as the database gets larger. I guess that answers this question, but I'm still hopeful of finding a better alternative.
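For reference, that brute-force approach boils down to one haversine distance per user plus a sort. A minimal sketch (the math itself is language-agnostic; users, me, and the lat/lng accessors are hypothetical names, not Backand API):

    EARTH_RADIUS_KM = 6371.0

    # Great-circle (haversine) distance between two lat/lng points, in km.
    def distance_km(lat1, lng1, lat2, lng2)
      rad  = Math::PI / 180
      dlat = (lat2 - lat1) * rad
      dlng = (lng2 - lng1) * rad
      a = Math.sin(dlat / 2)**2 +
          Math.cos(lat1 * rad) * Math.cos(lat2 * rad) * Math.sin(dlng / 2)**2
      2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
    end

    # me: the current user; users: every other user fetched from the backend.
    nearest = users.sort_by { |u| distance_km(me.lat, me.lng, u.lat, u.lng) }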
Related
In our app we originally had User records with latitude/longitude, and that worked fine. As we've gotten bigger and have more people using it, the number of location updates/checks has gotten large, and I thought we could lighten the load by decoupling location from the User record: we can update and check location independently of the User record without blowing away the serializer cache on User data every X seconds when location updates...
However this has led to an interesting problem: When trying to find Users near a certain location we're now slightly screwed. When latitude/longitude are coupled to User, you can simply do User.near(#geocoded_record) and have a distance sorted list of Users. With Location being independent it gets harder and I'm looking for advice on how to properly query this.
I tried User.some_scopes.joins(:location).merge(Location.near(#geocoded_record)) but that returns an ActiveRecord_Relation with "User records" that only contain a nil id, latitude, and longitude... This DOES NOT happen when applying any other sort of scope/query to the Location merge for some reason.
So... Anyone have a suggestion on the best way to fetch User records sorted by distance to a geocoded record through the association without going back to having latitude/longitude directly on User?
joins and near with Geocoder produce some unexpected results.
We faced the same issue and created a scope that seems to be working fine for us.
https://github.com/alexreisner/geocoder/issues/627
dkniffin provides the scopes in that issue.
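These aren't the exact scopes from that issue, just a minimal sketch of the general shape, assuming a hypothetical has_one :location on User and user_id/latitude/longitude columns on Location: let Geocoder order the locations, then load the users in that order.

    class User < ActiveRecord::Base
      has_one :location

      # Returns an Array (not a Relation) of Users sorted by distance to coords.
      def self.sorted_by_distance_from(coords, radius_miles = 20)
        ordered_user_ids = Location.near(coords, radius_miles).map(&:user_id)
        where(id: ordered_user_ids)
          .sort_by { |user| ordered_user_ids.index(user.id) }
      end
    end

    # Other scopes chain as usual before the final sort, e.g.:
    # User.some_scopes.sorted_by_distance_from([lat, lng], 10)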
I had the same problem recently. In my case I have a Travel model and a Destination model which contains the lat and long values for the travel. I finally got it working like this; it's probably not the best in terms of optimization:
The scope for Travel:
scope :near_of, ->(target_lat, target_lng) { joins(:destination).merge(Destination.near([target_lat, target_lng], 3)) }
And the controller:
travels = Travel.includes(:destination).near_of(params[:destination_latitude], params[:destination_longitude])
To perform geoqueries in Firebase or Firestore, there are libraries like GeoFire and GeoFirestore. But to sort the results of that geoquery by distance, the entire dataset must be read, correct? If a geoquery produces a large number of results, there is no way to paginate those results (on the backend, not to the user) when sorting by distance, is there?
Yes, in order to sort by distance you must read all results that fall into the Geoquery range.
The reason for this is how such queries work: they return the set of documents that fall within a range of geohash values, and that order is not necessarily the same as ordering by distance to the center of the query.
This also means that there is no way to do meaningful pagination in a list of documents that are ordered by their distance, since you need to read all results anyway. The best I can think of is implementing the Geoquery in Cloud Functions, so that you can do the sort/filter there, and only return the page-full of results to the client. While this doesn't save on your cost (as you're still reading all documents in the range), it will save bandwidth in sending documents to the user.
To learn more about how such geoqueries work, which explains why they can't be optimized the way you're looking to do, have a look at the video of my talk here or this article+shorter video on Jeff Delaney's site.
I have an app that displays information about certain venues. Each venue is awarded a rating on a scale from 0-100. The app includes a map, and on the map I'd like to show the best nearby venues. (The point is to recommend to the user alternative venues that they might like.)
What is the best way to approach this problem?
If I fetch the nearest x venues, many bad venues (i.e. those with a low rating) show.
If I fetch the highest-rated venues, many of them will be too far away to be useful as recommendations.
This seems like a pretty common challenge for any geolocation app, so I'm interested to know what approach other people have taken.
I have considered "scoring" each possible venue by taking into account its rating and its distance in miles.
I've also considered fetching the highest rated venues within a y mile radius, but this gets problematic because in some cities there are a lot of venues in a small area (e.g. New York) and in others it's reasonable to recommend venues that are farther away.
(This is a Rails app, and I'm using Solr with the Sunspot gem to retrieve the data. But I'm not necessarily looking for answers in code here, more just advice about the logic.)
Personally, I would implement a few formulas and use some form of A/B testing to get an idea as to which ones yield the best results on some outcome metric. What exactly that metric is is up to you. It could be clicks, or it could be something more complicated.
Start out with the simplest formula you can think of (ideally one that is computationally cheap as well) to establish a baseline. From there, you can iterate, but the absolute key concept is that you'll have hard data to tell you if you're getting better or worse, not just a hunch (perhaps that a more complicated formula is better). Even if you got your hands on Yelp's formula, it might not work for you.
For instance, as you mentioned, a single score calculated based on some linear combination of inverse distance and establishment quality would be a good starting point and you can roll it out in a few minutes. Make sure to normalize each component score in some way. Here's a possible very simple algorithm you could start with:
Filter venues as much as possible on fast-to-query attributes (by type, country, etc.)
Filter remaining venues within a fairly wide radius (you'll need to do some research into exactly how to do this in a performant way; there are plenty of posts on Stack Overflow and elsewhere on this. You'll want to index your database table on latitude and longitude, and follow a number of other best practices).
Score the remaining venues using some weights that seem intuitive to you (I arbitrarily picked 0.25 and 0.75, but they should add up to 1; see the sketch after this list):
score = 0.25 * (1 - (distance - closest_distance) / (furthest_distance - closest_distance))
        + 0.75 * (quality_score / highest_quality_score)

where closest_distance, furthest_distance, and highest_quality_score are all taken over the remaining set.
Sort them by score and take the top n
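To make that concrete, here's a rough Ruby sketch of steps 3 and 4 (the Venue struct, its distance and rating fields, and the 0.25/0.75 weights are all illustrative; candidates is whatever survived steps 1 and 2):

    Venue = Struct.new(:name, :distance, :rating)

    def top_recommendations(candidates, n: 10, distance_weight: 0.25, rating_weight: 0.75)
      return [] if candidates.empty?

      min_d, max_d = candidates.map(&:distance).minmax
      max_rating   = candidates.map(&:rating).max.to_f

      scored = candidates.map do |venue|
        # Normalize both components to 0..1 and invert distance so closer is better.
        distance_score = max_d == min_d ? 1.0 : 1.0 - (venue.distance - min_d).to_f / (max_d - min_d)
        rating_score   = venue.rating / max_rating
        [venue, distance_weight * distance_score + rating_weight * rating_score]
      end

      scored.sort_by { |_venue, score| -score }.first(n).map(&:first)
    end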
I would put money on Yelp using some fancy-pants version of this simple idea. They may be using machine learning to actually select the weights for each component score, but the conceptual basis is similar.
While there are plenty of possibilities for calculating formulas of varying complexity, the only way to truly know which one works best is to gather data.
I would fix the number of venues returned at, say, 7.
Discard all venues with scores in the lowest quartile of reviewer scores, to avoid bad customer experiences, then return the top 7 within a postcode. If this results in fewer than 7 entries, look to the neighboring postcodes to find the best scores to complete the list.
This would result in a list of top to mediocre scores locally, perhaps with some really good scores only a short distance away.
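A rough sketch of that idea, assuming a hypothetical Venue model with postcode and rating columns, a precomputed lowest-quartile cutoff, and a neighboring_postcodes helper (the "tricky bit" mentioned below):

    # rating_floor: the lowest-quartile cutoff, computed elsewhere (e.g. nightly).
    def best_nearby(postcode, rating_floor, limit = 7)
      results = Venue.where(postcode: postcode)
                     .where("rating >= ?", rating_floor)
                     .order("rating DESC")
                     .limit(limit)
                     .to_a

      if results.size < limit
        # neighboring_postcodes(postcode) is the "tricky bit": a lookup of adjacent areas.
        results += Venue.where(postcode: neighboring_postcodes(postcode))
                        .where("rating >= ?", rating_floor)
                        .order("rating DESC")
                        .limit(limit - results.size)
                        .to_a
      end

      results
    end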
From a UX perspective this would easily allow users to either select a postcode/area they are interested in or allow the app to determine its location.
From a data perspective, you already have addresses. The only "tricky" bit is determining what the neighboring postcodes/areas are, but I'm sure someone has figured that out already.
As an aside, I'm a great believer in things changing. Like restaurants changing hands or the owners waking up and getting better. I would consider offering a "dangerous" list of sub-standard eateries "at your own risk" as another form of evening entertainment. Personally I have found some of my worst dining experiences have formed some of my best dining out stories :-) And if the place has been harshly judged in the past you can sometimes find it is now a gem in the making.
First, I suggest that you use a Bayesian average to maintain an overall rating for all the venues; more info here: https://github.com/tyrauber/acts_rateable
Then you can retrieve the nearest venues ordered by distance and then by rating: two sort criteria in your query's ORDER BY.
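For example, Geocoder's near scope accepts an :order option, so the SQL can sort by the calculated distance column first and the rating second (bayesian_rating is a hypothetical column maintained by whatever rating mechanism you use):

    venues = Venue.near([params[:lat], params[:lng]], 10,
                        order: "distance ASC, bayesian_rating DESC")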
I'm using neo4j for a, let's call it, social network where users will have the ability to log their position during workouts (think Runkeeper and Strava).
I'm thinking about how I want to save the coordinates.
Is it a good idea to have it like node(user)-has->node(workouts)<-is a-node(workout)-start->node(coord)-next->node(coord)-next->.... i.e. a linked list with coordinates for every workout?
I will never query the db for individual points, the workout will always be retrieved as a whole.
Is there a better way to solve this?
I can imagine that a graph db isn't the ideal db to store this type of data, but I don't want to take on the complexity of adding another db right now.
Can someone give me any insight on this?
I would suggest you store it as:
user --has--> workout --positionedAt--> coord
This design feels more natural to me as the linked list design you mentioned in your question just produces a really deep traversal which might be annoying to query. In this way you can easily find all the coordinates for a particular workout by simply iterating edges on the workout vertex. I would recommend storing a datetime stamp on the positionedAt edge so that you can sort your coordinates easily.
The downside is that depending on how many coord vertices you intend to have you might end up with some fat workout vertices, but that may not really affect your use case. I can't think of a workout that would generate something like 100000 coordinates (and hence 100000 edges), but perhaps you can. If so, I suppose I could amend my answer a bit.
I currently have a Postgres DB filled with approx. 300,000 records of moving vehicles all over the world. My very frequently repeated query is: give me all vehicles within a 5/10/20-mile radius. Currently I spend around 600 to 1200 ms in the DB to prepare the set of located vehicle objects.
I am looking to vastly improve this time by ideally one or two orders of magnitude if possible. I am working in a Ruby on Rails 3.0beta environment if this is relevant.
Any ideas how to architect the whole system to accelerate this query? Any NoSQL database able to deliver this kind of geolocation performance? I know of MongoDB working on an extension to facilitate this scenario but haven't tried it yet. Any intelligent use of Redis to achieve this?
One problem with SQL DBs here seems to be that I can't really make use of indexes, because my vehicles are mostly moving around, meaning I would have to constantly update the DB indexes, which, by itself, is probably more expensive than just doing the search without an index.
Looking forward to your thoughts. Thanks!
If you use the right algorithm for organizing your data, you will be able to use a spatial index which can dramatically speed up your queries.
The best practice for the geolocation domain is to use a geohash, quad-tree, R-tree or similar data structure (R-trees are the most generic, but it sounds like you're querying point data, so that may not matter). In each case, you can create a spatial index that uses a single, linear column where each value represents a bounding box of varying size and shape. This should let you answer most queries with a single range query in your database. Spatial indices can be implemented in SQL (PostGIS, MS SQL, MySQL all have spatial datatypes and spatial indices which use one of these techniques) or NoSQL (popular for its horizontal scalability; AppEngine has geomodel, SimpleGeo uses Cassandra, Foursquare uses MongoDB).
Using an index can be complicated by constantly moving points, but I would suspect that writes, even slightly heavier writes that update indices, wouldn't be your bottleneck.
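For the PostGIS route, a hedged sketch might look like this (assuming a hypothetical geography(Point) column named lonlat on vehicles with a GiST index; ST_DWithin on geography types takes the radius in meters):

    #   CREATE INDEX index_vehicles_on_lonlat ON vehicles USING gist (lonlat);
    class Vehicle < ActiveRecord::Base
      # Radius in meters; ST_DWithin lets PostGIS answer this from the spatial index.
      def self.within_radius(lat, lng, radius_m)
        where("ST_DWithin(lonlat, ST_SetSRID(ST_MakePoint(?, ?), 4326)::geography, ?)",
              lng, lat, radius_m)
      end
    end

    # 5 miles is roughly 8047 meters:
    # Vehicle.within_radius(52.52, 13.40, 8047)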
Even though your vehicles are moving around all the time, I assume they have some kind of speed limit. What you can do is create some kind of discrete coordinate system; one example would be the integer part of the lat/long coordinate. Then you put those values in separate columns, keeping the exact location in another column. You should then be able to index the integer columns, as the vehicles won't move so much that they change those values very often.
When doing a search, you first find out which "squares" are interesting and restrict your query to the vehicles within those squares, using the indexed columns. Then you have to do a full search over the vehicles within each square. The number of vehicles you have to do a full search over should now only be a small fraction of all vehicles. The efficiency of this strategy of course depends on the distribution of your vehicles. If 50% of them are in a certain city somewhere this will not work, but assuming the largest group of vehicles in one place is 5-10% it should improve performance.
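A rough Ruby/ActiveRecord sketch of that grid idea (hypothetical lat_grid/lng_grid integer columns with a composite index, plus the exact latitude/longitude; the one-degree cell size and the fixed margin are simplifications, since longitude degrees shrink toward the poles):

    class Vehicle < ActiveRecord::Base
      # Keep the coarse grid cell in sync with the exact position.
      before_save do
        self.lat_grid = latitude.floor
        self.lng_grid = longitude.floor
      end

      # Coarse pass over the indexed grid columns, then an exact distance check.
      def self.within_radius(lat, lng, radius_km)
        margin = (radius_km / 111.0).ceil  # ~111 km per degree of latitude
        where(lat_grid: (lat.floor - margin)..(lat.floor + margin),
              lng_grid: (lng.floor - margin)..(lng.floor + margin))
          .select { |v| haversine_km(lat, lng, v.latitude, v.longitude) <= radius_km }
      end

      def self.haversine_km(lat1, lng1, lat2, lng2)
        rad  = Math::PI / 180
        dlat = (lat2 - lat1) * rad
        dlng = (lng2 - lng1) * rad
        a = Math.sin(dlat / 2)**2 +
            Math.cos(lat1 * rad) * Math.cos(lat2 * rad) * Math.sin(dlng / 2)**2
        2 * 6371 * Math.asin(Math.sqrt(a))
      end
    end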