Order Solr/sunspot search results by geo location - ruby-on-rails

I'd like to be able to order my search results by score and location. Each user in the DB has lat/lon coordinates, which I am currently indexing like this:
location :coordinates do
  Sunspot::Util::Coordinates.new latlon[0], latlon[1]
end
The model which I would be performing the search against is also indexed in the same manner. Essentially what I am trying to achieve is that the results be ordered by score and then by location. So if I search for Walmart, I would like to see all Walmarts ordered by their geo proximity to my location.
I remember reading something about Solr's new geo-sort, but I am not sure whether it is out of alpha or whether Sunspot has implemented a wrapper for it.
What would you recommend?

Because of the way Sunspot handles location types, you'll need to do some extra legwork to sort by distance from your target as well. It creates a geohash for each point and then runs a regular fulltext search on that geohash. As a result you probably won't be able to tell whether a point 10km away is farther than a point 5km away, but you will be able to tell that a point 50km away is farther than a point 1-2km away. The exact distances are arbitrary; the point is that the ordering is not as fine-grained as you would like, and the search acts more as a filter for points within an acceptable proximity. After you have filtered your points using the built-in location search, there are three ways to accomplish what you want:
Upgrade to Solr 3.1 or later and update your schema.xml to use the new spatial search field types. You'll then need to make custom modifications to Sunspot to create fields and orderings that work with these new data types. As far as I know these aren't available in Sunspot yet, so you'll have to make those connections on your own and dig around in Solr to do some manual configuration.
Leverage the Spatial Solr Plugin. You'll have to install a new JAR into your Solr directory and you'll have to make some modifications to Sunspot, but they are relatively painless and the full instructions can be found here.
Leverage your DB. If your DB is also indexed on the location columns, you can use Sunspot's built-in location search to filter your results down to a reasonably sized set. You can then query the DB for those results and order them by proximity to your location using your own distance function.
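For the third option, the "own distance function" could be something like the haversine formula. This is only a minimal sketch in plain Ruby: Place, haversine_km and sort_by_proximity are hypothetical names, and in a real app the list would be the records loaded for the IDs that the Sunspot location filter returned.

```ruby
EARTH_RADIUS_KM = 6371.0 # mean Earth radius; an approximation

# Great-circle distance in km between two lat/lon points (haversine formula).
def haversine_km(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180.0
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

# Hypothetical stand-in for an ActiveRecord model with lat/lon columns.
Place = Struct.new(:name, :lat, :lon)

# Orders already-filtered records by distance from the given point, nearest first.
def sort_by_proximity(places, lat, lon)
  places.sort_by { |place| haversine_km(lat, lon, place.lat, place.lon) }
end

places = [
  Place.new("far",  40.5, -83.0),
  Place.new("near", 40.0, -83.0),
  Place.new("mid",  40.2, -83.0)
]
sorted = sort_by_proximity(places, 39.96, -83.0)
puts sorted.map(&:name).join(", ")
```

Since the set has already been narrowed by the location filter, sorting it in Ruby (or with an equivalent SQL expression) should stay cheap.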

Related

Sorting by nearest locations in Backand?

Is it possible to sort returned objects from Backand based on how near the location field of type "point" is to the querying user's current location?
From the Backand docs I have only seen support for querying based on a maximum distance from a point but nothing about sorting by geo points.
I was able to create a custom query in Backand which I can hit from the Backand API. Unfortunately in order to sort on the distance of nearby users I need to calculate the distance from the current user to every other user in the database and then sort based on this. Seems very complex - a lot of calculations every time the query is called! Will probably see big performance hits as the database gets larger. Guess it answers this question, but I am hopeful still of finding a better alternative.

Import OSM map and reverse geocode to town/city

I'm pretty new to neo4j and the spatial plugin for it so bear with me.
I've used the OSM importer to import the whole of Ireland into the db and now I'm able to query it with the rest API to find nodes within X km of a point. (side note: I am unable to get the Cypher query to return any results? Does the OSMImporter add the data to an index for querying or must I loop through it all and add to an index myself now?)
What I actually want is a rudimentary reverse-geocoder style query. I want to query the graph for the geometry that contains a geo coordinate and check whether it is a town/city/village etc.; if not, check its ancestors until it can tell me what town/county/state I am inside.
Unfortunately I'm quite lost and I've tried, unsuccessfully, looking through the neo4j-spatial code and examples for a start point.

Sunspot Spatial Search Not Returning Results

I've just implemented the Sunspot gem into my application and I really like it except for the fact that when I do a location search it seems to be excluding some results. For example: I live in Columbus Ohio so if I search for "Columbus Ohio" my application translates that into a lat/lng and I do:
@search = (Skatepark.search {
  with(:coordinates).near lat, lng, :precision => 3
  fulltext text
  paginate :page => params[:page], :per_page => 15
})
This returns some records that are geocoded on the west side of Columbus but none of the records in my DB that are on the east side. Am I doing something wrong with my search?
You can try it out for yourself at http://skateparks.co/search
If you search for "Columbus Ohio" you'll get totally different results than if you search for "Lancaster Ohio" which is only a few miles to the southeast.
This is because Sunspot depends on gem 'pr_geohash' which generates the geospatial index.
A geohash is a form of z-order curve.
You are attempting to solve a nearest neighbor problem using this index, which by its nature can only produce approximate results. The author would have chosen this approach since Solr is designed to deal with large datasets, which suffer from the curse of dimensionality.
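To see why a geohash prefix match can miss close neighbours, here is a minimal sketch of the standard geohash encoding written by hand in Ruby. It is not the pr_geohash or Sunspot code, and geohash_encode is a hypothetical name; it only illustrates the technique.

```ruby
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz" # geohash alphabet

# Encodes a lat/lon pair by repeatedly halving the longitude and latitude
# ranges in turn and emitting one base32 character per 5 bits.
def geohash_encode(lat, lon, precision = 6)
  lat_range = [-90.0, 90.0]
  lon_range = [-180.0, 180.0]
  hash = +""
  bits = 0
  bit_count = 0
  use_lon = true # bits alternate, starting with longitude
  while hash.length < precision
    range, value = use_lon ? [lon_range, lon] : [lat_range, lat]
    mid = (range[0] + range[1]) / 2.0
    if value >= mid
      bits = (bits << 1) | 1
      range[0] = mid # keep the upper half
    else
      bits <<= 1
      range[1] = mid # keep the lower half
    end
    use_lon = !use_lon
    bit_count += 1
    next unless bit_count == 5
    hash << BASE32[bits]
    bits = 0
    bit_count = 0
  end
  hash
end

# Two points just either side of the prime meridian.
west = geohash_encode(51.5, -0.001, 6)
east = geohash_encode(51.5, 0.001, 6)
puts west, east
```

Under this sketch the two points get hashes starting with g and u respectively, so they share no prefix at all even though they are well under a kilometre apart. A prefix search around one of them would not find the other, which is the same effect as records on one side of a city going missing.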
Depending on your requirements, perhaps you should try
a siloless solution, e.g. Freebase (there are several gems),
a SaaS solution,
a spatial database,
a geodatabase,
or your own approach?
Proposed Solution
Use GeoJSON objects for cities, towns, and villages rather than plain lat/long. Use semantic autocompletion so that users can select the polygon directly from Freebase (interface example: the Quora search box). This way, all results in the requested city will be returned, because you are now doing a polygonal bounding search rather than a radial search.
I urge you to make your data available as Freebase Locations where the parent is a City/Town/Village if this is at all possible given your business model. Turns out there is both a location type and a location topic all ready for you.
Update 1
I notice that you're on Heroku.
If you have a dedicated db you could use PostGIS.
If not, read How do you do GIS queries on Heroku using the shared database?

Order Solr results by degrees of friendship

I am currently using Solr 1.4 (soon to upgrade to 3.3). The friendship table is pretty standard:
id | follower_id | user_id
I would like to perform a regular keyword Solr search and order the results by degrees of separation as well as by the standard score. Within the result set, matches that are my immediate friends would show up first, then friends of my friends, then friends at the 3rd degree of separation. All other results would come after.
I am pretty sure Solr doesn't offer any 'pre-baked' way of doing this therefore I would likely have to do a join on MySQL to properly order the results. Curious if anyone has done this before and/or has some insights.
It's simply not possible in Solr. However, if you aren't too restricted and could use another platform for this, consider neo4j?
Connections and degrees of separation are exactly where Neo4j steps in.
http://neo4j.org/
One way might be to create fields like degree_1, degree_2 etc. and store the list of friends at degree x in the field degree_x. Then you could fire multiple queries - the first restricting the results to those who have you in degree_1, the second restricting the results to those who have you in degree_2 and so on.
It is a bit complicated, but the only solution I could think of using Solr.
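The lists for degree_1, degree_2 and so on could be built with a breadth-first walk over the friendship rows. This is a rough sketch only: friends_by_degree is a hypothetical helper, the rows are assumed to be plain [follower_id, user_id] pairs, and friendships are treated as symmetric, which may not match your data.

```ruby
require "set"

# Returns { degree => [ids] } for everyone within max_degree hops of user_id.
# ASSUMPTION: edges are [follower_id, user_id] pairs, treated as undirected.
def friends_by_degree(edges, user_id, max_degree = 3)
  adjacency = Hash.new { |hash, key| hash[key] = Set.new }
  edges.each do |follower_id, followed_id|
    adjacency[follower_id] << followed_id
    adjacency[followed_id] << follower_id
  end

  seen = Set[user_id]
  frontier = [user_id]
  degrees = {}
  (1..max_degree).each do |degree|
    # Everyone one hop from the current frontier who has not been seen yet.
    frontier = frontier.flat_map { |id| adjacency[id].to_a }
                       .uniq
                       .reject { |id| seen.include?(id) }
    break if frontier.empty?
    degrees[degree] = frontier
    seen.merge(frontier)
  end
  degrees
end

edges = [[1, 2], [2, 3], [3, 4], [4, 5], [1, 6]]
p friends_by_degree(edges, 1)
```

Each person only appears at their closest degree, so the per-degree queries never return the same user twice.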
I haven't represented a graph in solr before, but I think at a high level, this is what you could do. First, represent people as nodes and the social network as a graph in the database. Implement transitive closure function in sql to allow you to walk the graph. Then you would index the result into solr with the social network info stored into payloads, for example.
I was able to achieve this by performing multiple queries, using the "with" scope to restrict each one to the IDs of colleagues, 2nd degree colleagues and 3rd degree colleagues. The IDs themselves are selected from MySQL.
@search_1 = perform_search(1, options)
@search_2 = perform_search(2, options)
if degree == 1
  with(:id).any_of(options[:colleague_ids])
elsif degree == 2
  with(:id).any_of(options[:second_degree_colleagues])
end
It's kind of a dirty solution, as I have to perform multiple Solr queries, but until I can use dynamic field sorting options (Solr 3.3, not currently supported by Sunspot) I really don't know any other way to achieve this.

Using Ferret to build unique tag clouds

I've been using Ferret as my full-text search engine in a small project I'm working on.
Through the documentation and a few examples online, I've been able to pull together a tag cloud generator that uses the full-text index via the IndexReader.terms method.
It has worked quite well up to now, but I now want to get term data based on a search result.
For example, if the user searches for "cake", I want to show them a tag cloud of terms used in association with the term "cake".
I've been looking for examples of how the terms method can be used in association with a search result set or similar.
Currently I'm using the following method to generate my list of tags:
reader = Ferret::Index::IndexReader.new(Scrape.find_last_index_version)
terms = []
reader.terms(:all_quotes).each do |term, doc_freq|
  terms << [term, doc_freq]
end
Cheers.
It's more like a term frequency chart (like a wordle) than a tag cloud? Or are these in a tag field? Anyway, the index doesn't keep track of term frequency within each possible document subset (such as the results of a search), so that method wouldn't be fast, even if it existed. For a single document, you can get the TermFreqVector and provide suggested documents that are good matches for other frequent terms in that document. So, you could take some of the top results, grab the term vectors from each one, and just add them up, but those aggregate functions don't exist natively (they generally try not to put slow operations in there.)
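The "add them up" step could look roughly like this. It is a sketch under assumptions: each term vector is represented as a plain Hash of term => frequency (how you extract that from Ferret for each top result is not shown here), and aggregate_term_counts is a hypothetical helper name.

```ruby
# Sums per-document term frequencies and returns the most frequent terms.
# ASSUMPTION: each element of term_vectors is a Hash of term => frequency.
def aggregate_term_counts(term_vectors, exclude: [], limit: 20)
  totals = Hash.new(0)
  term_vectors.each do |vector|
    vector.each { |term, freq| totals[term] += freq }
  end
  exclude.each { |term| totals.delete(term) } # e.g. drop the search term itself
  # Highest frequency first; ties broken alphabetically for a stable order.
  totals.sort_by { |term, freq| [-freq, term] }.first(limit)
end

top_results = [
  { "cake" => 3, "chocolate" => 2, "icing" => 1 },
  { "cake" => 1, "chocolate" => 1, "birthday" => 4 }
]
cloud = aggregate_term_counts(top_results, exclude: ["cake"], limit: 3)
p cloud
```

Because this only looks at a handful of top results, the cost stays proportional to the number of documents you choose to sample rather than to the size of the index.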
