Geo: Need to convert a "place name" into a "canonical place name" - geolocation

Our users can enter their location into a free-form text field, like "Austin TX", or even "The Motor City". How can I take this information and turn it into a canonical name? For example, "The Motor City" would be Detroit, MI. They might even enter a zipcode or an address.
I'm planning on storing the place name, or the lat/long, or the unique id code for their location in the database. Then I can determine proximity. Is there a geo lookup api I can use for this?

Yahoo has a nice open API for this kind of thing, built around WOEIDs (Where On Earth IDs). Query the WOE database with your free-form string and it will give you back a WOEID; then query that WOEID for whatever data you actually want to store.
http://developer.yahoo.com/geo/geoplanet/guide/concepts.html
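Once a latitude/longitude is stored for each user, as the question plans, proximity can be estimated with the haversine (great-circle) formula. Here is a minimal Python sketch; the function names `haversine_km` and `nearest` are made up for illustration, and the Earth is treated as a sphere with a mean radius of 6371 km, which is an approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(origin, places):
    """Return the (name, lat, lon) tuple in places closest to origin (lat, lon)."""
    return min(places,
               key=lambda p: haversine_km(origin[0], origin[1], p[1], p[2]))
```

For a large table you would probably push this into the database (a spatial index or a bounding-box prefilter) rather than scanning every row in Python.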

Related

Geolocation: How to derive the Country using an address/city/place?

I have a .csv file with Twitter profiles including information such as username, name, description etc. One column is geolocation. In this text the user may have a country (e.g., UK), a city or town (e.g., Cambridge), an actual address (5 Tyrian Place, WR5 TY1), a state (e.g., California, CA) or something silly (e.g., West of Hell).
Is there an API/library/automatic way of taking this information and deriving the country? For example, if the location is Cambridge the output should be UK, if the address is in the UK, the output should be UK, etc.
Google has a geocoding service which you can access through their Maps API:
https://developers.google.com/maps/documentation/geocoding/start
They let you make 2500 free requests per day (at the time of writing). One nice feature is that it will give you a latitude, longitude, state, country, etc. for things like "Golden Gate Bridge" and "The Big Apple." Twitter users enter all sorts of (sarcastic) phrases for their location, like "West of Hell" or "Mars," and Google will try to geocode those as well, though the result may not be very useful.
As another level of checking, you can compare the user's timezone ("utc_offset"), if it is present, to the place that Google returns. It's a bit involved and requires that you compare the timezone's geographic boundaries (roughly, its longitude range) to the latitude and longitude in Google's response.
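A very rough version of that timezone check can be sketched in Python. The assumptions here are mine: that "utc_offset" is given in seconds, that each hour of offset corresponds to about 15 degrees of longitude, and that a generous tolerance (30 degrees by default) is acceptable because real timezone borders follow politics rather than meridians. The function name `offset_matches_longitude` is hypothetical.

```python
def offset_matches_longitude(utc_offset_seconds, longitude, tolerance_deg=30.0):
    """Rough check: is this longitude plausible for the given UTC offset?"""
    # Nominal central meridian of the zone: 15 degrees per hour of offset.
    nominal = (utc_offset_seconds / 3600.0) * 15.0
    # Smallest angular difference, handling wrap-around at +/-180 degrees.
    diff = abs((longitude - nominal + 180.0) % 360.0 - 180.0)
    return diff <= tolerance_deg


# A UTC-5 profile geocoded to Detroit's longitude is plausible;
# a UTC-8 profile geocoded to London's longitude is not.
detroit_ok = offset_matches_longitude(-5 * 3600, -83.05)
london_ok = offset_matches_longitude(-8 * 3600, -0.13)
```

This only flags gross mismatches. Proper validation would need real timezone boundary data, which this sketch does not use.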

Matching MapKit Places with Facebook Places

In my application, I am saving photos with city names to a server. First, I get the city name from MapKit using the latitude and longitude, and then I save the photo and city name to the database. Later, when a user wants to search for a photo, he/she types the city name, and I use autocomplete with Facebook Places (Graph API).
The problem is that Facebook Places and MapKit might use different names (spellings), even though both are in English. I am wondering how to query my own server, which stores MapKit city names, using Facebook Places city names.
I assume it is a little more complicated than it first seems. As long as Facebook and Apple are not using the same data source for their city names, it will be hard to match cities whose names are not exactly the same if you rely on the "raw" string that you get from Facebook Places.
Maybe there is a much easier way to achieve it, but my first attempts would cover these options:
1. Save the geo points when you upload the photo, then find a library or API that returns latitude/longitude data for the Facebook city name, and use that to query for the closest result in your database (based on photo location).
2. Suppose the user typed in a city name and you have a string value (call it rawCity) with the desired city name. Now rawCity should be contained in, or equal to, the string that represents the city's MapKit name.
Assign rawCity to a new string called searchStringCity, remove the whitespace from it, and make the whole string lowercase (non-ASCII characters can cause trouble too).
Now you have two strings that should be added to a dictionary (pseudocode):
rawCity = "Sample City Name"
searchStringCity = "samplecityname"
fbCityDictionary = {"rawName": rawCity, "searchString": searchStringCity}
Once you have fbCityDictionary, the Facebook part is done.
The second step is some database work: I would create a searchString column in my database and fill it with the "standardized" (whitespace removed, lowercased, character encoding normalized) form of each MapKit city name.
Now you can write a query where a db item's searchString value is equal to fbCityDictionary[searchString]. However, this won't solve your problem perfectly: it works when whitespace or letter case was the only difference, but there are a lot of city names that don't have an English version, and they can differ considerably between map databases.
So you will be fine in cases like these:
Facebook version:
Sample City Name ---> samplecityname
Mapkit version:
Samplecity Name ---> samplecityname
These solutions can improve the results, but I would be curious to hear a better solution.
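The normalization step described in option 2 could look roughly like this in Python. This is a sketch, not a complete solution: `search_key` is a name made up here, and stripping accents via Unicode decomposition is my own addition to cover the "non-ASCII" concern mentioned above.

```python
import unicodedata


def search_key(name):
    """Normalize a city name into a comparison key."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold and keep only letters and digits (drops spaces, punctuation).
    return "".join(ch for ch in stripped.casefold() if ch.isalnum())


# Index the stored (MapKit-style) names by key, then look up the typed name.
stored_names = ["Samplecity Name", "São Paulo"]
index = {search_key(n): n for n in stored_names}
match = index.get(search_key("Sample City Name"))
```

As noted above, this only helps with spacing, case, and accent differences. It does nothing for cities that have genuinely different names in the two data sources.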

How do I check whether a given string is a valid geographical location or not?

I have a list of strings (noun phrases) and I want to identify all the valid geographical locations among them so that I can filter them out. Most of these unwanted location names are countries, cities, or states. What would be a way to do this? Is there an open-source lookup table that contains all the countries, states, and cities of the world?
Example desired output:
TREC4: false, Vienna: true, Ministry: false, IBM: false, Montreal: true, Singapore: true
Unlike in this post (Verify user input location string is a valid geographic location?), I have a large number of strings like these (~0.7 million), so the Google geolocation API is probably not an option for me.
You can use the GeoPlanet data from Yahoo, or the GeoNames data from geonames.org.
Here is a link to the GeoPlanet TSV file, which contains 5 million geographical places around the world:
https://developer.yahoo.com/geo/geoplanet/data/
Moreover, the GeoPlanet data gives you the type (city, country, suburb, etc.) of each geographical place, along with a unique ID.
https://developer.yahoo.com/geo/geoplanet/guide/concepts.html
You can do a lowercase, sanitized (e.g. with special characters and other anomalies removed) match of your needle string against the names present in this data.
If you do not want full file scans, it helps to preprocess this data and store it in a fast lookup database such as MongoDB or Redis.
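For ~0.7 million strings, even a plain in-memory dictionary may be enough before reaching for a database. Below is a minimal Python sketch of the sanitized match. The tab-separated sample and its column names (`id`, `name`, `type`) are invented for illustration, so check them against the real data file before relying on this.

```python
import csv
import io

# Hypothetical sample in a tab-separated layout; the column names are
# assumptions and may not match the real GeoPlanet or GeoNames files.
SAMPLE_TSV = (
    "id\tname\ttype\n"
    "1\tVienna\tTown\n"
    "2\tMontreal\tTown\n"
    "3\tSingapore\tCountry\n"
)


def normalize(text):
    """Lowercase and drop everything except letters, digits and spaces."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def load_place_index(tsv_file):
    """Map each normalized place name to the set of place types seen for it."""
    index = {}
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        index.setdefault(normalize(row["name"]), set()).add(row["type"])
    return index


index = load_place_index(io.StringIO(SAMPLE_TSV))
candidates = ["TREC4", "Vienna", "Ministry", "IBM", "Montreal", "Singapore"]
results = {c: normalize(c) in index for c in candidates}
```

With the real file you would pass an open file handle instead of the `io.StringIO` sample. Exact matching like this will also flag ordinary words that happen to be place names somewhere, so filtering on the place type may be worthwhile.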
I can suggest the following three options:
a) Using the Alchemy API: http://www.alchemyapi.com/
If you try their demo, places like France and Honolulu come back with the entity type Country or City.
b) Using TAGME: http://tagme.di.unipi.it/
TAGME connects every entity in a given text to the corresponding Wikipedia page. Crawl the Wikipedia page, check the infobox, and filter on that.
c) Using Wikipedia Miner: I was unable to find relevant links for this. However, it works like TAGME.
I suggest trying all three and taking a majority vote for each instance.
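The voting step itself is simple once each tool's answer has been reduced to true/false. A small Python sketch follows; the per-string votes are placeholder values, not real output from any of the three services.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label chosen by most voters; ties and empty input give None."""
    counts = Counter(votes).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Placeholder votes: one boolean per tool, per candidate string.
votes_by_string = {
    "Vienna": [True, True, False],
    "IBM": [False, False, True],
}
is_location = {s: majority_vote(v) for s, v in votes_by_string.items()}
```

With three voters and boolean answers there is never a tie, but the tie handling matters if one service fails and you are left with two votes.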

Address field validation for iOS / Mac

I want to create an "Add Address" view, a very basic "Street, City, Zip, Country" type of page: multiple text fields inside a table view. This would be simple if I only ever handled U.S. addresses, but I'm not sure how to do it the right way, handling all the international use cases as well. Essentially:
1. How do you pick the right field label for each country? For example, for US / Australian addresses the field should be called "State"; for the UK it's "County"; in some places it's "Province". How do you know what the label should say (short of hard-coding the logic myself for each country)?
2. How do you validate the values for those fields? UK postal codes have a certain format, whereas in the US it's a 5-digit ZIP code. Also, in the US there is a list of states that the user can select from. How do you get that list?
I've looked into NSLocale, and can't find any way to do this. Surely there must be a good and easy way to do this?
I dug around, and in the end the best thing I found was a guide on "The good international address field form", but it will still be hard to validate the input. I don't think this is a solved problem.
http://www.uxmatters.com/mt/archives/2008/06/international-address-fields-in-web-forms.php
One method could be to look the address up through MapKit.
You can try to simplify the UI by having just one text field and asking the user to enter the address in free form, then using the CLGeocoder class to convert the string into an instance of CLPlacemark, which is a convenient container for information such as country, postal code, etc.

How to do a city/state/country code lookup based on zip/country input by the user?

Would there be a way to do a city/state/country code lookup based on zip/country input by the user? My site will be international, hence the reason for asking the user to input their country.
I'm thinking the user inputs the zip/post-code and country, which gets saved to the database and then the Google geocode API will convert this to city, state and country and print the output to their user profile. For example:
User input:
Zip - 92646
Country - USA
Output:
Huntington Beach, CA, USA
I could just let the users input their city, state and country, but in the future I want to do some geocoding. So it makes sense to set it up now rather than migrate the database at a later stage. Or do you think I'm doing the wrong thing here? I have a site built in Rails. Thanks in advance.
** Comment: It looks like the demo for the Ruby Geocoder gem lets you input the zip and country and prints the City/State/Country/Zip, which is exactly what I'm after. Thoughts on Geocoder versus the Google geocoder API?
I would advise using the geocoder gem and letting the user enter their address in a single field. It is easier for the user to fill in just one field, in whatever format is convenient. Store that string as the full address, then pass it to geocoder (in general the gem will do this automatically) and take the coordinates, city, state, etc. from the result.
If the user enters a bad address, they can simply correct it. This is just my opinion, not a rule.
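If you go with the Google geocoding API mentioned in the question instead, the city/state/country have to be picked out of the response yourself. Here is a Python sketch of that step. The sample dictionary is hand-written and trimmed for illustration, not a recorded response, and it assumes the `results` / `address_components` / `types` layout described in the Geocoding API documentation; `summarize` is a hypothetical helper name.

```python
# Hand-written, trimmed sample in the shape the Geocoding API documents;
# treat the exact fields as an assumption and verify against a real response.
SAMPLE = {
    "status": "OK",
    "results": [{
        "address_components": [
            {"long_name": "92646", "short_name": "92646",
             "types": ["postal_code"]},
            {"long_name": "Huntington Beach", "short_name": "Huntington Beach",
             "types": ["locality", "political"]},
            {"long_name": "California", "short_name": "CA",
             "types": ["administrative_area_level_1", "political"]},
            {"long_name": "United States", "short_name": "US",
             "types": ["country", "political"]},
        ],
    }],
}


def summarize(response):
    """Pull city, state and country out of a geocoding-style response dict."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    wanted = {
        "locality": "city",
        "administrative_area_level_1": "state",
        "country": "country",
    }
    found = {}
    # Only the first (best-ranked) result is inspected.
    for component in response["results"][0]["address_components"]:
        for kind in component["types"]:
            if kind in wanted and wanted[kind] not in found:
                found[wanted[kind]] = component["short_name"]
    return found


profile_location = summarize(SAMPLE)
```

Note that the short country name is the two-letter code ("US"), so printing "USA" as in the question's example would need a separate mapping. Not every address has a `locality` component, so the returned dictionary may be missing keys.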

Resources