Geo distance filter with Thinking Sphinx - ruby-on-rails

I am considering Thinking Sphinx, as I have already used ElasticSearch and would like to try something new.
With Thinking Sphinx, how would one go about setting up a geo distance filter? There would be a User model containing a user's basic information, including their zip code. There is also a Locations model holding geographical information for the United States (zip code, latitude, longitude, state).
EXAMPLE: The current user, “Michael”, has the zip code 30601. Michael types “programmer, video games” into the search form. The results should show Users who have the words “programmer” or “video games” in an attribute of the User model and who are located within 100 miles of Michael’s zip code, 30601.
I have installed Thinking Sphinx, and when I perform the search described in the example above it returns the “programmer” or “video games” matches, but only for users whose zip code matches exactly (the geo distance part is effectively cancelled out). With the code I have, I can perform a geo distance search using the zip code alone, which returns the surrounding Users. The geo distance filter doesn't seem to work once I also search on attributes from the User model.
This was done with ease using Elastic in the past, but I wanted to see how Thinking Sphinx works. If someone has a clue about how this would look in the Searches controller, please

From the perspective of Sphinx, zip codes are not useful - it all comes down to latitude and longitude.
So, if in your example Michael is current_user, you might have a search call looking something like this:
User.search 'programmer video games',
:geo => [current_user.latitude, current_user.longitude],
:with => {:geodist => 0.0..161_000.0},
:order => 'geodist ASC'
Keep in mind this presumes you have latitude and longitude values stored in radians, not degrees. If they are in degrees, then you'll want to convert them in your index definition (as noted in the docs) and when you're searching as well (e.g. current_user.latitude * Math::PI / 180.0).
When filtering by distance, Sphinx uses metres. One mile is about 1,609 metres, so 100 miles is roughly 161,000 metres, hence the range in my example above.
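The two conversions mentioned above (degrees to radians, miles to metres) can be wrapped in small helpers. This is a minimal sketch: the method names and the sample coordinates are assumptions made for illustration, and the resulting hash only mirrors the option keys used in the search call above.

```ruby
# Hypothetical helpers; none of these names come from Thinking Sphinx.

# Sphinx expects latitude and longitude in radians.
def degrees_to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Sphinx measures geodist in metres; one mile is 1609.344 metres.
def miles_to_metres(miles)
  miles * 1609.344
end

# Build the same option keys used in the search call above
# from a point given in degrees and a radius given in miles.
def geo_search_options(lat_deg, lng_deg, radius_miles)
  {
    :geo   => [degrees_to_radians(lat_deg), degrees_to_radians(lng_deg)],
    :with  => { :geodist => 0.0..miles_to_metres(radius_miles) },
    :order => 'geodist ASC'
  }
end

# Illustrative coordinates only (degrees), not taken from the question.
options = geo_search_options(33.95, -83.38, 100)
puts options.inspect
```

With these in place, the call would look like User.search 'programmer video games', options, assuming the current user's coordinates are available in degrees.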

Related

Does Google Firestore and/or their Realtime DB have the querying capability to get posts by location (within x miles), order by date, and limit?

I am currently using Firestore for my iOS app and I need to implement a scalable solution for my posts feed. I need to get posts within, say, 20 miles, order them by date, and limit the number of posts fetched for pagination. Any and all database solutions would be very much appreciated! Thank you!
As a low budget/time alternative to libraries, we have implemented storing the first few digits of lat/long coordinates as a document or collection name and then accessed data that way. The first decimal place gives a resolution of around 7 miles in latitude (the exact value for longitude changes depending on what latitude you are at). So in your database you could have a collection or document named something like +33.6-112.0. This would be the Firestore reference under which to put all data near (33.6 N, 112.0 W). Be careful with how you round the exact location data before placing it in the respective document or collection.
Then you can retrieve all data at any location you want. This may not give you exactly 20 miles, but some client-side sorting can handle that. Note that you could make the reference go to any decimal place necessary to achieve the level of precision you are looking for, to minimize database calls (to save you money) and minimize the impact on the user's cell data plan.
This is a rather simple solution with limitations, maybe for an MVP, and if not careful could pull way more data than anticipated.
A chart (not reproduced here) showed the approximate physical distance covered by each decimal place at the equator. For example, the distance between (33.3 N, 0 W) and (33.5 N, 0 W) would be about 14 miles.
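The cell-naming idea in this answer can be sketched as follows. The method names are hypothetical, and rounding to the nearest tenth of a degree is only one possible choice (the answer leaves the rounding rule open). The sketch also lists the eight surrounding cells, since a 20 mile radius spans more than one cell. It does not handle wrap-around at the poles or the antimeridian.

```ruby
# Hypothetical helpers for a coordinate-prefix key such as "+33.6-112.0".

# Round to the given number of decimal places and format with an explicit sign.
# Adding 0.0 turns -0.0 into 0.0 so that one cell never gets two names.
def cell_component(value, places = 1)
  rounded = value.round(places) + 0.0
  format("%+.#{places}f", rounded)
end

# Key of the cell that a point falls into.
def cell_key(lat, lng, places = 1)
  cell_component(lat, places) + cell_component(lng, places)
end

# Keys of the cell and its eight neighbours (a 3x3 block).
def neighbour_keys(lat, lng, places = 1)
  step = 1.0 / 10**places
  keys = []
  [-1, 0, 1].each do |dy|
    [-1, 0, 1].each do |dx|
      keys << cell_key(lat.round(places) + dy * step,
                       lng.round(places) + dx * step,
                       places)
    end
  end
  keys.uniq
end

puts cell_key(33.64, -112.04)
puts neighbour_keys(33.64, -112.04).inspect
```

Each key would name one document or collection to read. The results still need client-side filtering by true distance, sorting by date, and limiting.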
Neither of those databases has native geospatial querying capabilities. You would have to use some sort of add-on library to help with that. Geofire and Geofirestore are popular for this.

Flickr API for location based images

I want to get images of cities when a city name is entered in a search field in iOS. I am using the Flickr API, but whenever I enter any longitude and latitude values, I only get an empty array back. I am using this URL:
http://api.flickr.com/services/rest/?method=flickr.photos.geo.photosForLocation&api_key=e3d577010e5979a2ad2a22714abd901e&lat=40.6700&lon=73.9400&format=json&nojsoncallback=1&auth_token=72157638668602974-e1a3a3aa1e6d3dd8&api_sig=a0233b016c863b1662aeb21a664c351a
Please tell me what I should do. Any help is appreciated.
I suspect that you are seeking too precise a match on your lat-long. Use the &accuracy parameter to specify a less precise match. (The default value of 16 specifies a very precise match.) Flickr suggests a value of 11 to match at the city level, so add
&accuracy=11
to your URL.
Update
I have not had any luck retrieving images with flickr.photos.geo.photosForLocation, but I have retrieved images by lat-long with flickr.photos.search. Note this comment in the documentation:
Geo queries require some sort of limiting agent in order to prevent
the database from crying. This is basically like the check against
"parameterless searches" for queries without a geo component.
A tag, for instance, is considered a limiting agent as are user
defined min_date_taken and min_date_upload parameters — If no limiting
factor is passed we return only photos added in the last 12 hours
(though we may extend the limit in the future).
Also remember that longitudes of places in the Western Hemisphere are specified as negative numbers.
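Putting these points together, a request to flickr.photos.search might be built as below. This is a sketch under assumptions: the method name is made up, the API key is a placeholder, and a tag is used as the limiting agent. Only URL construction is shown, not the HTTP call or response parsing.

```ruby
require 'uri'

# Hypothetical helper: builds a flickr.photos.search URL for a lat/long point.
# A tag is passed as the "limiting agent" the documentation asks for.
def flickr_search_url(api_key:, lat:, lon:, tags:, accuracy: 11)
  params = {
    method: 'flickr.photos.search',
    api_key: api_key,
    lat: lat,
    lon: lon,            # negative for the Western Hemisphere
    accuracy: accuracy,  # 11 is the city-level value suggested above
    tags: tags,
    format: 'json',
    nojsoncallback: 1
  }
  'https://api.flickr.com/services/rest/?' + URI.encode_www_form(params)
end

# New York City lies west of Greenwich, so the longitude is -73.94, not 73.94.
puts flickr_search_url(api_key: 'YOUR_API_KEY', lat: 40.67, lon: -73.94, tags: 'skyline')
```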

Identifying the most relevant document in an information retrieval system

I am developing a search engine modeled after Google in my spare time.
I am using the original Google research paper, located at http://infolab.stanford.edu/~backrub/google.html, as my guideline.
As I am developing a very simplified version of Google, I am not using the PageRank algorithm at all for now.
So far I have developed a simple parser and indexer, which give me an inverted index containing the number of hits, the hit locations, and the document hash for each unique word.
Now I am trying to develop a query engine. However, I am finding it hard to identify the most relevant document for a multi-token query.
Specifically, I am having difficulty calculating the proximity of the query words to each other in a document.
I have thought of an algorithm that scans each document for the query words and calculates a proximity score based on how close the query words are to each other. However, I suspect this would take a long time, and I think there is a better way to do this that I am not aware of. The research paper is too general to give an answer.
I am just looking for a pointer in the right direction.
Any sort of help would be very much appreciated.
Look at the inverted index section of "Search Engine Indexing" on Wikipedia http://en.wikipedia.org/wiki/Search_engine_indexing#Inverted_indices
Basically, you want to save the position of each occurrence of a word within a document, which makes it easy to compute proximity. This information is saved in the index.
The key point is to index your documents so you don't need to scan them every time. The search for keywords is done on the index, which points to the documents containing those keywords.
P.S. Don't forget that you're trying to keep the index as small as possible, so storing gaps (differences) between word positions instead of absolute positions will save some memory (as explained in J. Zobel, A. Moffat, "Inverted Files for Text Search Engines", page 23).
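To make the idea concrete, here is a small sketch of a positional inverted index, a proximity measure computed from the stored positions alone, and the gap encoding mentioned in the P.S. All names are illustrative and the tokenizer is deliberately naive. A real engine would also compress the gaps (for example with a variable-length byte code) and combine proximity with other ranking signals.

```ruby
# Build a positional index: word => { doc_id => [positions in ascending order] }.
def build_index(docs)
  index = {}
  docs.each do |doc_id, text|
    text.downcase.scan(/\w+/).each_with_index do |word, pos|
      index[word] ||= {}
      index[word][doc_id] ||= []
      index[word][doc_id] << pos
    end
  end
  index
end

# Smallest distance between any occurrence of two words in one document,
# found by walking the two sorted position lists in step (no document scan).
def min_distance(positions_a, positions_b)
  i = 0
  j = 0
  best = nil
  while i < positions_a.length && j < positions_b.length
    gap = (positions_a[i] - positions_b[j]).abs
    best = gap if best.nil? || gap < best
    if positions_a[i] < positions_b[j]
      i += 1
    else
      j += 1
    end
  end
  best
end

# Store differences between consecutive positions instead of absolute values.
def to_gaps(positions)
  positions.each_with_index.map { |pos, i| i.zero? ? pos : pos - positions[i - 1] }
end

# Rebuild absolute positions from the gaps.
def from_gaps(gaps)
  total = 0
  gaps.map { |gap| total += gap }
end

docs = {
  1 => 'the quick brown fox',
  2 => 'quick thinking beats a slow brown fox'
}
index = build_index(docs)
docs.each_key do |doc_id|
  distance = min_distance(index['quick'][doc_id], index['brown'][doc_id])
  puts "doc #{doc_id}: distance #{distance}"
end
```

A document in which the query words sit closer together would then receive a higher proximity score.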

Twitter Stream api filter by location AND track

I'm using the following line in order to get geolocated tweets that contain a certain keyword. (I'm using the word Madonna)
https://stream.twitter.com/1.1/statuses/filter.json?track=Madonna&locations=-180,-90,180,90
My problem is that the result does not consist of geolocated tweets that contain the keyword Madonna, but of geolocated tweets in general.
Any help on what I'm doing wrong here?
"-180,-90,180,90" is a bounding box that covers the whole world.
Currently, to get "AND" instead of "OR" from the Twitter stream API, you need to make a request like this: https://stream.twitter.com/1.1/statuses/filter.json?locations=-74,40,-73,41 and then filter the results by "Madonna" inside your app. Unfortunately, I cannot find another way at the moment.
Also note what a match on locations can include:
If coordinates is empty but place is populated, the region defined in
place is checked for intersection against the locations bounding box.
Any overlap will match.
Another, somewhat hack-y solution is to use a track keyword that would never match, such as "dkghaskldfnascjkawenaf", and add a location bounding box.
The API applies an OR relationship between track and locations, so with a keyword that never matches you'll only receive tweets from within (or very near) the bounding box.
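Either way, the keyword condition ends up being applied in your own code. A minimal sketch of that client-side step, assuming each tweet has already been parsed from JSON into a Hash with a 'text' key (the method name is made up for this example):

```ruby
# Keep only the tweets whose text contains the keyword as a whole word,
# ignoring case. Tweets without a text field are skipped.
def matching_tweets(tweets, keyword)
  pattern = /\b#{Regexp.escape(keyword)}\b/i
  tweets.select { |tweet| tweet['text'].to_s.match?(pattern) }
end

# Hand-written sample data standing in for tweets received from a
# locations-only stream.
tweets = [
  { 'text' => 'Madonna is on tour again' },
  { 'text' => 'Nice weather in New York today' },
  { 'text' => 'I love MADONNA!' }
]
puts matching_tweets(tweets, 'Madonna').map { |tweet| tweet['text'] }.inspect
```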

Lookup telephone area code by latitude and longitude

Looking for a way to get a list of telephone area codes for a given latitude and longitude (and if necessary a given intl. code.) Note, I'm not talking about international dialing prefixes but the area codes within them.
For example, Denver Colorado is covered by the area codes 303 and 720. It's at 39.739 -104.985 and is in NANP 1. So given 39.739,-104.985,1 I'd like to get back [303,720].
Libraries, web services, DBs, or raw data that needs to be parsed into a DB (e.g., a web page of shape points) are all fine. The more global coverage the better, but just NANP 1 would be a great help.
Note I already use MaxMind and could turn the lat-lng into a fake IP and use that as the lookup key, but MaxMind claims only U.S. area codes (whether they truly mean U.S. or actually NANP I haven't tested) and seemingly only 1 per location (e.g. just 303 for Denver.) So it's a possibility, just not a great one.
UPDATE: I found some more relevant information, but no definitive solutions so I'm listing it here rather than in an answer:
I was able to find two U.S. databases http://www.area-codes.com/area-code-database.asp and http://www.nationalnanpa.com/area_codes/index.html (50% down the page, MS Access file.) The former includes lat/lng for $450 and the latter would require nearest-neighbor matching as KeithS talks about (it's probably the same DB underlying the NANPA City Query he found.)
Additionally I found information that implies Teleatlas has area code boundary maps and that ESRI includes area code shape files with copies of ArcGIS. Maponics seems to have data available: there's a Google Maps implementation of Maponics' data at http://www.usnaviguide.com/areacode.htm.
Wow. You'll definitely need some sort of pre-existing database of points. My first thought was ZIPList5 Geocode. It includes lat-long data for each active U.S. ZIP code, so you can throw this data in a DB table, index the hell out of it, and search by just about any geographic info you'd have access to. You can buy one copy for $40, with enterprise-level use for $100. Only problem is that this DB has only the "primary" area code for each ZIP code, so metro areas that have more than one (Dallas, Chicago, NYC) aren't going to show all of them.
You could try a two-pronged approach with some free data I found: for a given latitude and longitude, do a nearest-neighbors search of the data in the USGS Geographic Names Information System; it includes information on every human habitation center, and every named landmark feature, with lat/long coordinates of their centers. You now have your lat/long point mapped to the nearest town/city, ZIP code, county, and state. Now, you can compare that against this list of U.S. Area Codes, to find area codes matching any or all of the identifying information from the USGS. This is all free, and will eventually get you what you need, but you'll probably have to do some work to "massage" the two sets of data into something you can efficiently cross-reference, and/or you'll need to implement a good "search engine" that will accurately find nearest-neighbor named points, and then find area codes for locations matching the names.
One more thing to look at is NANPA, which administers area code assignment to begin with. I'm sure they have a more comprehensive downloadable DB, but the only free public access I could find was this search page, which will find area codes for any city with >20k people. You could turn your lat/long data into a city and state, and then hit this search page: NANPA City Query
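The nearest-neighbour step of the two-pronged approach can be sketched with a haversine distance and a linear scan. Everything here is illustrative: the method names are made up, and the sample rows are hand-written stand-ins for data you would load from the USGS names file and an area code list. For a full data set you would want a spatial index rather than a scan over every point.

```ruby
# Great-circle distance in miles between two points given in degrees.
def haversine_miles(lat1, lng1, lat2, lng2)
  to_rad = Math::PI / 180.0
  dlat = (lat2 - lat1) * to_rad
  dlng = (lng2 - lng1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlng / 2)**2
  2 * 3958.8 * Math.asin(Math.sqrt(a)) # 3958.8 = mean Earth radius in miles
end

# Linear scan for the closest named place.
def nearest_place(lat, lng, places)
  places.min_by { |place| haversine_miles(lat, lng, place[:lat], place[:lng]) }
end

# Cross-reference the nearest place against an area code table keyed by
# [city, state]. Returns an empty list when the place is unknown.
def area_codes_for(lat, lng, places, codes_by_place)
  place = nearest_place(lat, lng, places)
  return [] if place.nil?
  codes_by_place.fetch([place[:name], place[:state]], [])
end

# Hand-written sample rows, not real database extracts.
places = [
  { name: 'Denver', state: 'CO', lat: 39.739, lng: -104.985 },
  { name: 'Colorado Springs', state: 'CO', lat: 38.834, lng: -104.821 }
]
codes_by_place = {
  ['Denver', 'CO'] => [303, 720],
  ['Colorado Springs', 'CO'] => [719]
}
puts area_codes_for(39.74, -104.99, places, codes_by_place).inspect
```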
Here is an option:
http://geocoder.ca/39.739,-104.985?geoit=xml
<TimeZone>America/Denver</TimeZone>
<AreaCode>720,303</AreaCode>
