Decoupling Location from Users and Rails Geocoder - ruby-on-rails

In our app we originally had User records with latitude/longitude, and that worked fine. As we've gotten bigger and have more people using it, the number of location updates/checks has gotten large, so I thought we could lighten the load by decoupling location from the User record: we can then update and check location independently of the User record, without blowing away the serializer cache on User data every X seconds when we update location...
However this has led to an interesting problem: When trying to find Users near a certain location we're now slightly screwed. When latitude/longitude are coupled to User, you can simply do User.near(#geocoded_record) and have a distance sorted list of Users. With Location being independent it gets harder and I'm looking for advice on how to properly query this.
I tried User.some_scopes.joins(:location).merge(Location.near(#geocoded_record)) but that returns an ActiveRecord_Relation with "User records" that only contain a nil id, latitude, and longitude... This DOES NOT happen when applying any other sort of scope/query to the Location merge for some reason.
So... Anyone have a suggestion on the best way to fetch User records sorted by distance to a geocoded record through the association without going back to having latitude/longitude directly on User?

Combining joins and near with Geocoder produces some unexpected results.
We faced the same issue and created a scope, which seems to be working fine for us.
https://github.com/alexreisner/geocoder/issues/627
dkniffin provides the scopes below.

I had the same problem recently. In my case I have a Travel model and a Destination model which contains the lat and long values for the travel. I finally got it working like this, though it is probably not the best in terms of optimization:
The scope for Travel:
scope :near_of, ->(target_lat, target_lng) { joins(:destination).merge(Destination.near([target_lat, target_lng], 3)) }
And the controller:
travels = Travel.includes(:destination).near_of(params[:destination_latitude], params[:destination_longitude])

Related

Sorting by nearest locations in Backand?

Is it possible to sort returned objects from Backand based on how near the location field of type "point" is to the querying user's current location?
From the Backand docs I have only seen support for querying based on a maximum distance from a point but nothing about sorting by geo points.
I was able to create a custom query in Backand which I can hit from the Backand API. Unfortunately in order to sort on the distance of nearby users I need to calculate the distance from the current user to every other user in the database and then sort based on this. Seems very complex - a lot of calculations every time the query is called! Will probably see big performance hits as the database gets larger. Guess it answers this question, but I am hopeful still of finding a better alternative.
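For reference, the "calculate the distance to every other user, then sort" approach described above could look roughly like this in plain Ruby. This is only a sketch and not Backand's API: the names haversine_km, within_box? and nearest_users, and the hash layout of a user, are made up for illustration. The bounding-box prefilter is one common way to avoid running the exact distance calculation on every row.

```ruby
# Mean Earth radius in kilometres.
EARTH_RADIUS_KM = 6371.0

def to_rad(deg)
  deg * Math::PI / 180.0
end

# Great-circle distance in km between two [lat, lng] pairs (haversine formula).
def haversine_km(a, b)
  dlat = to_rad(b[0] - a[0])
  dlng = to_rad(b[1] - a[1])
  h = Math.sin(dlat / 2)**2 +
      Math.cos(to_rad(a[0])) * Math.cos(to_rad(b[0])) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h))
end

# Cheap prefilter: is the point inside a lat/lng box around the origin?
# Assumes roughly 111 km per degree of latitude and ignores the antimeridian.
def within_box?(origin, point, radius_km)
  dlat = radius_km / 111.0
  dlng = radius_km / (111.0 * Math.cos(to_rad(origin[0])).abs)
  (point[0] - origin[0]).abs <= dlat && (point[1] - origin[1]).abs <= dlng
end

# users: array of hashes such as { name: "x", location: [lat, lng] } (hypothetical shape).
def nearest_users(origin, users, radius_km)
  users
    .select { |u| within_box?(origin, u[:location], radius_km) }
    .map { |u| u.merge(distance_km: haversine_km(origin, u[:location])) }
    .select { |u| u[:distance_km] <= radius_km } # the box corners lie outside the circle
    .sort_by { |u| u[:distance_km] }
end
```

In a database the box check would normally become a range condition on indexed latitude/longitude columns, so only the surviving rows pay for the trigonometry.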

Logic for selecting best nearby venues for display on a map

I have an app that displays information about certain venues. Each venue is awarded a rating on a scale from 0-100. The app includes a map, and on the map I'd like to show the best nearby venues. (The point is to recommend to the user alternative venues that they might like.)
What is the best way to approach this problem?
If I fetch the nearest x venues, many bad venues (i.e. those with a low rating) show.
If I fetch the highest rated venues, many of them will be too far away to be useful as recommendations.
This seems like a pretty common challenge for any geolocation app, so I'm interested to know what approach other people have taken.
I have considered "scoring" each possible venue by taking into account its rating and its distance in miles.
I've also considered fetching the highest rated venues within a y mile radius, but this gets problematic because in some cities there are a lot of venues in a small area (e.g. New York) and in others it's reasonable to recommend venues that are farther away.
(This is a Rails app, and I'm using Solr with the Sunspot gem to retrieve the data. But I'm not necessarily looking for answers in code here, more just advice about the logic.)
Personally, I would implement a few formulas and use some form of A/B testing to get an idea as to which ones yield the best results on some outcome metric. What exactly that metric is is up to you. It could be clicks, or it could be something more complicated.
Start out with the simplest formula you can think of (ideally one that is computationally cheap as well) to establish a baseline. From there, you can iterate, but the absolute key concept is that you'll have hard data to tell you if you're getting better or worse, not just a hunch (perhaps that a more complicated formula is better). Even if you got your hands on Yelp's formula, it might not work for you.
For instance, as you mentioned, a single score calculated based on some linear combination of inverse distance and establishment quality would be a good starting point and you can roll it out in a few minutes. Make sure to normalize each component score in some way. Here's a possible very simple algorithm you could start with:
Filter venues as much as possible on fast-to-query attributes (by type, country, etc.)
Filter remaining venues within a fairly wide radius (you'll need to do some research into exactly how to do this in a performant way; there are plenty of posts on Stack Overflow and elsewhere on this. You'll want to index your database table on latitude and longitude, and follow a number of other best practices).
Score the remaining venues using some weights that seem intuitive to you (I arbitrarily picked 0.25 and 0.75, but they should add up to 1):
score = 0.25*(1-((distance/distance of furthest venue in remaining set)-distance of closest venue)) + 0.75*(quality score/highest quality score in remaining set)
Sort them by score and take the top n
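The scoring and sorting steps above can be sketched in plain Ruby as follows. The formula as written subtracts a raw distance from a ratio, so this sketch assumes that min-max normalisation of the distance was intended; that is my reading, not something the answer states. The names normalise and top_venues and the venue hash layout are hypothetical.

```ruby
# Min-max normalise a value into 0.0..1.0; returns 0.0 when all values are equal.
def normalise(value, min, max)
  return 0.0 if max == min
  (value - min).to_f / (max - min)
end

# venues: array of hashes with :name, :distance (any unit) and :quality (e.g. 0-100).
# The weights are the arbitrary 0.25 / 0.75 from the answer and should sum to 1.
def top_venues(venues, n, distance_weight: 0.25, quality_weight: 0.75)
  return [] if venues.empty?

  d_min, d_max = venues.map { |v| v[:distance] }.minmax
  q_max = venues.map { |v| v[:quality] }.max

  scored = venues.map do |v|
    closeness = 1.0 - normalise(v[:distance], d_min, d_max) # 1.0 = closest venue
    quality = q_max.zero? ? 0.0 : v[:quality].to_f / q_max   # 1.0 = best venue
    v.merge(score: distance_weight * closeness + quality_weight * quality)
  end

  # Highest score first, then keep the top n.
  scored.sort_by { |v| -v[:score] }.first(n)
end
```

Because both components end up in the 0..1 range, changing the weights is the only knob needed for the A/B variants mentioned earlier.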
I would put money on Yelp using some fancy-pants version of this simple idea. They may be using machine learning to actually select the weights for each component score, but the conceptual basis is similar.
While there are plenty of possibilities for calculating formulas of varying complexity, the only way to truly know which one works best is to gather data.
I would fix the number of venues returned at say 7.
Discard all venues with scores in the lowest quartile of reviewers' scores, to avoid bad customer experiences, then return the top 7 within a postcode. If this results in fewer than 7 entries, then look to the neighboring postcodes to find the best scores to complete the list.
This would result in a list of top to mediocre scores locally, perhaps with some really good scores only a short distance away.
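A rough Ruby sketch of that idea, under some assumptions: the neighbours lookup (postcode to adjacent postcodes) is hypothetical and would have to come from some external dataset, the quartile cutoff uses a simple index into the sorted ratings, and the method names are made up.

```ruby
# Rating at the lower-quartile boundary (simple index into the sorted list).
def lowest_quartile_cutoff(ratings)
  sorted = ratings.sort
  sorted[(sorted.size * 0.25).floor]
end

# venues: hashes with :name, :postcode and :rating.
# neighbours: hypothetical hash of postcode => array of adjacent postcodes.
def recommend(venues, postcode, neighbours, limit: 7)
  return [] if venues.empty?

  # Drop everything that falls below the lower-quartile boundary.
  cutoff = lowest_quartile_cutoff(venues.map { |v| v[:rating] })
  decent = venues.select { |v| v[:rating] >= cutoff }

  by_rating = ->(list) { list.sort_by { |v| -v[:rating] } }

  local = by_rating.call(decent.select { |v| v[:postcode] == postcode })
  return local.first(limit) if local.size >= limit

  # Not enough local venues: top up with the best ones from adjacent postcodes.
  nearby_codes = neighbours.fetch(postcode, [])
  nearby = by_rating.call(decent.select { |v| nearby_codes.include?(v[:postcode]) })
  local + nearby.first(limit - local.size)
end
```

Local venues always come first here, so a very good venue next door is only shown when the local list runs short, which matches the description above.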
From a UX perspective this would easily allow users to either select a postcode/area they are interested in or allow the app to determine its location.
From a data perspective, you already have addresses. The only "tricky" bit is determining what the neighboring postcodes/areas are, but I'm sure someone has figured that out already.
As an aside, I'm a great believer in things changing. Like restaurants changing hands or the owners waking up and getting better. I would consider offering a "dangerous" list of sub-standard eateries "at your own risk" as another form of evening entertainment. Personally I have found some of my worst dining experiences have formed some of my best dining out stories :-) And if the place has been harshly judged in the past you can sometimes find it is now a gem in the making.
First, I suggest using a Bayesian average to maintain an overall rating for all the venues; more info here: https://github.com/tyrauber/acts_rateable
Then you can retrieve the nearest venues ordered by distance and then by rating, using two ORDER BY terms in your query.
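To make the two parts concrete, here is a small plain-Ruby sketch. It uses the common textbook form of a Bayesian average (a prior mean weighted as if it were prior_weight extra votes); I have not checked that acts_rateable computes it exactly this way. Rounding the distance to whole kilometres before sorting is also my assumption: without some banding, two venues almost never tie on distance and the rating would never act as a tie-breaker.

```ruby
# Bayesian average: pull a venue's mean rating towards prior_mean, with the
# prior counting as prior_weight votes. With no ratings it equals prior_mean.
def bayesian_average(ratings, prior_mean, prior_weight)
  (prior_weight * prior_mean + ratings.sum) / (prior_weight + ratings.size).to_f
end

# Order by distance band first (whole km, an assumption), then best rating first.
# venues: hashes with :name, :distance_km and :rating (hypothetical shape).
def order_by_distance_then_rating(venues)
  venues.sort_by { |v| [v[:distance_km].round, -v[:rating]] }
end
```

For example, two 5-star votes against a prior of 3.0 with weight 10 give about 3.33, so a venue with only a couple of reviews cannot jump straight to the top.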

Neo4j Structure for GPS coordinates log

I'm using neo4j for a, let's call it, social network where users will have the ability to log their position during workouts (think Runkeeper and Strava).
I'm thinking about how I want to save the coordinates.
Is it a good idea to have it like node(user)-has->node(workouts)<-is a-node(workout)-start->node(coord)-next->node(coord)-next->.... i.e. a linked list with coordinates for every workout?
I will never query the db for individual points, the workout will always be retrieved as a whole.
Is there a better way to solve this?
I can imagine that a graph db isn't the ideal db to store this type of data, but I don't want to add the complexity of adding another db right now.
Can someone give me any insight on this?
I would suggest you store it as:
user --has--> workout --positionedAt--> coord
This design feels more natural to me as the linked list design you mentioned in your question just produces a really deep traversal which might be annoying to query. In this way you can easily find all the coordinates for a particular workout by simply iterating edges on the workout vertex. I would recommend storing a datetime stamp on the positionedAt edge so that you can sort your coordinates easily.
The downside is that depending on how many coord vertices you intend to have you might end up with some fat workout vertices, but that may not really affect your use case. I can't think of a workout that would generate something like 100000 coordinates (and hence 100000 edges), but perhaps you can. If so, I suppose I could amend my answer a bit.

Order Solr/sunspot search results by geo location

I'd like to be able to order my search results by score and location. Each user in the DB has lat/lon and I am currently indexing:
location :coordinates do
  Sunspot::Util::Coordinates.new latlon[0], latlon[1]
end
The model which I would be performing the search against is also indexed in the same manner. Essentially what I am trying to achieve is that the results be ordered by score and then by location. So if I search for Walmart, I would like to see all Walmarts ordered by their geo proximity to my location.
I remember reading something about Solr's new geo-sort, but I'm not sure if it is out of alpha and/or if Sunspot has implemented a wrapper.
What would you recommend?
Because of the way that Sunspot calculates location types you'll need to do some extra leg work to have it sort by distance from your target as well. The way it works is that it creates a geo-hash for each point and then searches using regular fulltext search on that geo-hash. The result is that you probably won't be able to determine if a point 10km away is further than a point that is 5km away, but you'll be able to tell if a point 50km away is further than a point 1-2km away. The exact distances are arbitrary but the result is that you probably won't have as fine-grained of a result as you would like and the search acts more as a way to filter points that are within an acceptable proximity. After you have filtered your points using the built-in location search, there are three ways to accomplish what you want:
Upgrade to Solr 3.1 or later and upgrade your schema.xml to use the new spatial search columns. You'll then need to make custom modifications to Sunspot to create fields and orderings that work with these new data types. As far as I know these aren't available in Sunspot yet, so you'll have to make those connections on your own and you'll have to dig around in Solr to do some manual configurations.
Leverage the Spatial Solr Plugin. You'll have to install a new JAR into your Solr directory and you'll have to make some modifications to Sunspot, but they are relatively painless and the full instructions can be found here.
Leverage your DB. If your DB is also indexed on the location columns, you can use the Sunspot built-in location search to filter your results down to a reasonably sized set. You can then query the DB for those results and order them by proximity to your location using your own distance function.
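To illustrate why a geohash-based search behaves like a coarse proximity filter rather than a distance sort, here is a minimal geohash encoder in plain Ruby. It follows the standard public geohash scheme; it is not Sunspot's internal code, and the geohash method name is just for this sketch.

```ruby
# Standard geohash alphabet (no a, i, l, o).
GEOHASH_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

# Encode a lat/lng pair as a geohash string of the given length.
def geohash(lat, lng, precision)
  lat_range = [-90.0, 90.0]
  lng_range = [-180.0, 180.0]
  bits = []
  use_lng = true # bits alternate between longitude and latitude, longitude first

  while bits.size < precision * 5
    range, value = use_lng ? [lng_range, lng] : [lat_range, lat]
    mid = (range[0] + range[1]) / 2
    if value >= mid
      bits << 1
      range[0] = mid # keep the upper half of the interval
    else
      bits << 0
      range[1] = mid # keep the lower half of the interval
    end
    use_lng = !use_lng
  end

  # Every 5 bits become one base-32 character.
  bits.each_slice(5).map { |chunk| GEOHASH_BASE32[chunk.join.to_i(2)] }.join
end
```

Each extra character only narrows the cell a point falls into, so matching on a shared prefix tells you that two points share a cell, not which of them is closer. That is why one of the three options above is still needed for a true distance ordering.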

Ride matching in Google Maps with Rails 3 for a carpooling system

We want to search through carpools to find an optimal carpool whose route passes through the user's location. The data in the carpool is just the start and end points. Assuming the end point is a common end point for the user and the car pool creator, what would be the best way to determine an appropriate carpool for the user? We are using Rails 3. Both Google Maps API v2 and v3 are possible solutions.
What you really want to know is the detour, both in time and distance. There's no simple way to determine this mathematically from just coordinates. Luckily, it's trivial to do with a route planner: just calculate the time both with and without the waypoint.
If you have a very large set of carpoolers, it helps to start with the ones that are physically closest. Once you get a carpool with a detour of N kilometers, you know that you can exclude all other begin/end pairs where the begin-user-end distance in a straight line is at least N kilometers more than the best route from begin to end. This is the logic behind A*; start with what looks best geometrically so you quickly establish an upper bound and need not spend a lot of work on long detours.
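A sketch of that pruning in plain Ruby, with heavy assumptions: route_length is a hypothetical callable standing in for whatever route planner you use (for example a wrapper around a directions API), points are plain [x, y] pairs on a flat plane to keep the example short, and the direct begin-to-end route of each carpool is assumed to be cheap to obtain (for example cached).

```ruby
# Straight-line distance on a flat plane; stands in for a great-circle distance.
def straight_line(a, b)
  Math.hypot(a[0] - b[0], a[1] - b[1])
end

# carpools: hashes with :name, :start and :finish points.
# route_length: hypothetical callable that takes an array of points and returns
# the driving distance along them, as reported by a route planner.
def best_carpool(carpools, rider, route_length)
  best = nil

  # Try the geometrically closest carpools first to get a tight bound early.
  carpools.sort_by { |c| straight_line(c[:start], rider) }.each do |c|
    direct = route_length.call([c[:start], c[:finish]])

    # A road route is never shorter than the straight line through the same
    # points, so this is a lower bound on the detour for this carpool.
    lower_bound = straight_line(c[:start], rider) +
                  straight_line(rider, c[:finish]) - direct
    next if best && lower_bound >= best[:detour]

    detour = route_length.call([c[:start], rider, c[:finish]]) - direct
    best = { carpool: c, detour: detour } if best.nil? || detour < best[:detour]
  end

  best
end
```

The expensive with-waypoint route is only requested for carpools that could still beat the best detour found so far.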
