Normalizing location data - machine-learning

Before I ask the question, I should say that I am a backend engineer with no data science experience. I'm looking into a machine learning solution for this problem, and I'd appreciate any answer, including one that says what I'm trying to do is impossible or that I should be looking into something different.
I'm working on a project where we want to normalize location data.
Users enter free text, and we want to map that input to a row in a predefined table of all the locations in the world, if possible, and get the country, state and city.
So we want to map input => {country, state, city}, for example:
Schwäbisch Gmünd, Baden-Württemberg, Deutschland => Germany, Baden-Württemberg, Schwäbisch Gmünd
United States, Los Angeles => United States, California, Los Angeles
United States => United States, null, null
Tuscany => Italy, Tuscany, null
Wien => Austria, Vienna, Vienna
Is it possible to do this with machine learning? If so, what should we be looking into?
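To make the target mapping concrete, here is a minimal non-ML baseline sketch: split the input on commas, look each part up in an alias table, and keep the most specific row found. Everything here is an assumption for illustration: `_ROWS`, `GAZETTEER`, `_key` and `normalize` are hypothetical names, and the handful of rows stands in for the real predefined table plus its alternate names.

```python
import unicodedata

def _key(text):
    # Normalize Unicode form, trim whitespace, and compare case-insensitively.
    return unicodedata.normalize("NFC", text).strip().casefold()

# Hypothetical mini-gazetteer: alias -> (country, state, city).
_ROWS = {
    "United States": ("United States", None, None),
    "Deutschland": ("Germany", None, None),
    "Tuscany": ("Italy", "Tuscany", None),
    "Wien": ("Austria", "Vienna", "Vienna"),
    "Los Angeles": ("United States", "California", "Los Angeles"),
    "Baden-Württemberg": ("Germany", "Baden-Württemberg", None),
    "Schwäbisch Gmünd": ("Germany", "Baden-Württemberg", "Schwäbisch Gmünd"),
}
GAZETTEER = {_key(alias): row for alias, row in _ROWS.items()}

def _specificity(row):
    # Number of non-null fields: country only = 1, down to city = 3.
    return sum(field is not None for field in row)

def normalize(free_text):
    """Return the most specific (country, state, city) matched by any
    comma-separated part of the input, or None if nothing matches."""
    best = None
    for part in free_text.split(","):
        row = GAZETTEER.get(_key(part))
        if row is not None and (best is None or _specificity(row) > _specificity(best)):
            best = row
    return best
```

A sketch like this only handles exact alias matches. Misspellings, ambiguous names and inputs without commas are where fuzzy matching or a learned model would have to take over.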

Related

MapKit to show results formatted like so: city, state, country

I have been developing an application and I am new to MapKit. I want my app to let a user search for a location but only display results like city, state, country, not addresses or points of interest. I know this may be a simple question, but I am having a hard time finding the answer. For example, if a user typed "San" into my search bar, it would pull up "San Antonio, TX, United States" and every other location that has "San" in it. I'm sure there is a way to implement this without having to create a giant database or JSON tree. Thank you in advance!
Take a look at MKLocalSearchRequest; that's basic MapKit searching.

Geolocation: How to derive the Country using an address/city/place?

I have a .csv file with Twitter profiles including information such as username, name, description, etc. One column is geolocation. In this text the user may have a country (e.g., UK), a city or town (e.g., Cambridge), an actual address (5 Tyrian Place, WR5 TY1), a state (e.g., California, CA) or something silly (e.g., West of Hell).
Is there an API/library/automatic way of taking this information and deriving the country? For example, if the location is Cambridge the output should be UK, if the address is in the UK, the output should be UK, etc.
Google has a geocoding service which you can access through their Maps API:
https://developers.google.com/maps/documentation/geocoding/start
They let you make 2500 free requests per day. One nice feature is that it will give you the correct latitude, longitude, state, country, etc. for things like "Golden Gate Bridge" and "The Big Apple." Twitter users enter all sorts of (sarcastic) phrases for their location, like "West of Hell," "Mars," etc., and Google will geocode those as well, though that may not be very useful.
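As a rough sketch of how this could look in practice: the request goes to the Geocoding endpoint with `address` and `key` parameters, and the country is the entry in `address_components` whose `types` include `"country"`. The helper names (`build_geocode_url`, `extract_country`) and `SAMPLE_RESPONSE` are made up for illustration; the sample is a trimmed, hand-written dictionary in the documented response shape, not real API output, and no network call is made here.

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def build_geocode_url(location_text, api_key):
    """Build the request URL for a free-text location."""
    return GEOCODE_URL + "?" + urlencode({"address": location_text, "key": api_key})

def extract_country(response):
    """Return (long_name, short_name) of the country in the first result,
    or None if the request failed or no country component is present."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    for component in response["results"][0]["address_components"]:
        if "country" in component["types"]:
            return component["long_name"], component["short_name"]
    return None

# Hand-written sample in the documented response shape (not real API output).
SAMPLE_RESPONSE = {
    "status": "OK",
    "results": [
        {
            "formatted_address": "Cambridge, UK",
            "address_components": [
                {"long_name": "Cambridge", "short_name": "Cambridge",
                 "types": ["locality", "political"]},
                {"long_name": "United Kingdom", "short_name": "GB",
                 "types": ["country", "political"]},
            ],
        }
    ],
}

print(build_geocode_url("Cambridge, UK", "YOUR_KEY_HERE"))
print(extract_country(SAMPLE_RESPONSE))
```

Only the first result is inspected here. For ambiguous names such as "Cambridge" the service decides which place comes first, so a stricter pipeline might look at every result.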
As another level of checking, you can compare the user's timezone ("utc_offset"), if it is present, to the place that Google returns. It's a bit involved and requires that you compare the timezone's longitude boundaries to the longitude in Google's response.
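One crude way to approximate the timezone check is to treat each hour of UTC offset as roughly 15 degrees of longitude and accept the geocoded point if it falls within a generous tolerance. This is a sketch under assumptions: `utc_offset` is taken to be in seconds, the function names are hypothetical, and the 30-degree default tolerance is an arbitrary choice, since real timezone borders follow politics rather than meridians.

```python
def expected_longitude(utc_offset_seconds):
    """Idealized central longitude for a UTC offset: 15 degrees per hour."""
    return utc_offset_seconds / 3600 * 15.0

def offset_matches_longitude(utc_offset_seconds, longitude, tolerance_deg=30.0):
    """True if the geocoded longitude is plausibly inside the user's timezone."""
    diff = abs(expected_longitude(utc_offset_seconds) - longitude)
    diff = min(diff, 360.0 - diff)  # handle wrap-around at the antimeridian
    return diff <= tolerance_deg

# A profile with utc_offset 0 fits Cambridge, UK (longitude about 0.12)
# but not Cambridge, Massachusetts (longitude about -71.1).
print(offset_matches_longitude(0, 0.12))
print(offset_matches_longitude(0, -71.1))
```

A check like this can only reject clearly inconsistent results; it cannot tell apart two places at similar longitudes.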

Trouble parsing autocomplete response

Predictions for addresses in the U.S. seem consistent. However, when I get predictions for a U.K. address, I get inconsistent results. For example, here are some results I receive:
* Pinewood Green, Iver, Buckinghamshire SL0 0QH, United Kingdom
* Berkshire, William Street, Windsor SL4 1AA, United Kingdom
The first one is Address, City, County, Postal code, Country
The second is County, Address, City, Postal code, Country
The county's position changes. I can find nothing in the response that tells me which field is which.
Additionally, with a response such as this
* 20 High Street, East Hoathly, East Sussex BN8 6EB, United Kingdom
how do I tell where the county stops and the postal code starts? Terms/Offsets?
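For the narrower problem of separating the county from the postal code inside one comma-separated segment, one option is to match the postcode by its shape. The sketch below assumes a simplified UK postcode pattern (outward code, then inward code) that does not cover every special case, and `split_postcode` is a hypothetical helper name. It does not solve the question of which segment is the county.

```python
import re

# Simplified UK postcode shape: 1-2 letters, a digit, an optional letter or
# digit, then a digit and two letters. An approximation, not the full format.
UK_POSTCODE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b", re.IGNORECASE)

def split_postcode(segment):
    """Split 'East Sussex BN8 6EB' into ('East Sussex', 'BN8 6EB').
    Returns (segment, None) when no postcode-shaped text is found."""
    match = UK_POSTCODE.search(segment)
    if match is None:
        return segment.strip(), None
    name = segment[:match.start()].strip()
    postcode = f"{match.group(1).upper()} {match.group(2).upper()}"
    return name, postcode

description = "20 High Street, East Hoathly, East Sussex BN8 6EB, United Kingdom"
for part in description.split(","):
    print(split_postcode(part))
```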

Get a list of cities around a city with a given radius with Google Places API

I'm using the Google Places API for autocomplete on a RoR project.
I want to get a list of cities around the typed city with a given radius.
For instance:
I type "Paris, France" in the input. I want to have a list (JSON or whatever) which contains all the cities around the city with a given radius (maybe 10 miles or more, it'll be a constant in the project).
How can I do that?
Thanks!
-EDIT-
I've ended up with this:
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=48.534031,2.632121999999981&language=fr&types=locality&sensor=false&rankby=distance&key=YOUR_KEY_HERE
The lat and lng must point to a town near Paris called "Le Mee sur Seine" (https://maps.google.fr/maps?hl=fr&q=48.534031,2.632121999999981).
I want to list the towns surrounding this city ordered by distance, but I get "ZERO_RESULTS" as a result...
The type you're trying to filter on, "locality", is specifically listed as not supported. That is, Google will not let you search specifically for locality or a number of other political geo types. See the full list of unsupported types here: https://developers.google.com/places/documentation/supported_types#table2

GeoIP nearest (closest) country

Guys, I have a little problem: I tried to find some examples of a GeoIP-based system that expands searches based on the nearest neighboring countries. For example, a visitor from the UK tries to find IPs from France, Spain, Belgium, etc., not from Brazil, Argentina or China. So how can I get the nearest countries for a given country/IP and expand by incrementing the distance?
Edit: I'm using the free MaxMind version, since I don't care about cities that much. My project is C# based.
Well, the first step to reduce the problem is to use basic geography and categorize the countries by continent.
From there you can make a list of distances within a given continent, and sort based on those "distances" or "weights".
The geographic distance (e.g. km or miles) between capital cities should be a "good enough" approximation to get started if you want to be fancy. I bet you could even find such a list with a bit of searching on the Internet.
From there you have the "post-office problem" (Knuth) or "nearest neighbor search" optimization problem, and in this case I suspect you can simply go with a linear search within the continent partitions. If you need better performance, then an approximate algorithm should suffice (answers are not guaranteed to be the best solution, but should be reasonable most of the time).
Note this form of geography based "routing" is weak in a few exceptional cases, such as Cuba, which does not get Internet access from its (naive) obvious geographic neighbor, USA, and some "black-hole" type countries due to political relations. North Korea and Tibet I suspect are similar cases.
MaxMind gives you the lat/long of each country, so you can just calculate the distance from your country to the others and you're done. See this thread for geolocation distance calculation or use a library of your choice.
But keep in mind that the geolocation of a country is just a single point somewhere in that country, not the nearest point to you.
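The distance calculation could be sketched with the haversine formula followed by a plain linear sort, which also gives the "expand by incrementing the distance" order. The project in the question is C#, so treat this Python version as a language-neutral outline. The coordinates in `COUNTRY_POINTS` are rough hand-picked values for illustration only, not MaxMind data, and `nearest_countries` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Rough illustrative reference points (lat, lon) per country code.
COUNTRY_POINTS = {
    "GB": (54.0, -2.0),
    "FR": (46.0, 2.0),
    "BE": (50.8, 4.0),
    "ES": (40.0, -4.0),
    "BR": (-10.0, -55.0),
    "CN": (35.0, 105.0),
}

def nearest_countries(origin, points=COUNTRY_POINTS):
    """Return [(country_code, distance_km), ...] sorted nearest first."""
    lat0, lon0 = points[origin]
    others = [
        (code, haversine_km(lat0, lon0, lat, lon))
        for code, (lat, lon) in points.items()
        if code != origin
    ]
    return sorted(others, key=lambda item: item[1])

for code, km in nearest_countries("GB"):
    print(code, round(km))
```

With only a few hundred countries, the linear scan is cheap enough that no spatial index is needed.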
