Guys, I have a little problem: I tried to find some examples of a GeoIP-based system that expands searches based on the nearest neighboring countries. For example, the visitor is from the UK and the search tries IPs from France, Spain, Belgium, etc., not, for example, Brazil, Argentina, or China. So how can I get the nearest countries for a given country/IP and expand the search by incrementing the distance?
Edit: I'm using the free MaxMind version, since I don't care about cities that much. And my project is C#-based.
Well, the first step to reduce the problem is to use basic geography, and categorize the countries by continents.
From there you can make a list of distances within a given continent, and sort based on those "distances" or "weights".
The geographic distance (e.g. km or miles) between capital cities should be a "good enough" approximation to get started if you want to be fancy. I bet you could even find such a list with a bit of searching on the Internet.
From there you have the "post-office problem" (Knuth), also known as the "nearest neighbor search" optimization problem, and in this case I suspect you can simply go with a linear search within the continent partitions. If you need better performance, an approximate algorithm should suffice (answers are not guaranteed to be the best solution, but should be reasonable most of the time).
Note this form of geography based "routing" is weak in a few exceptional cases, such as Cuba, which does not get Internet access from its (naive) obvious geographic neighbor, USA, and some "black-hole" type countries due to political relations. North Korea and Tibet I suspect are similar cases.
MaxMind gives you the lat/long of each country, so you can just calculate the distance from your country to the others and you're done. See this thread for geolocation distance calculation or use a library of your choice.
But keep in mind that the geolocation of a country is just a single point somewhere in that country, not the point nearest to you.
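A minimal sketch of the lat/long approach, in Python for brevity (the logic should port to C# directly). The centroid values are rough illustrative numbers, not MaxMind's actual data, and `nearest_countries` is a hypothetical helper name. Distances use the haversine great-circle formula.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Rough, illustrative country centroids (assumed values, not MaxMind data).
CENTROIDS = {
    "GB": (54.0, -2.0),
    "FR": (46.0, 2.0),
    "BE": (50.8, 4.0),
    "ES": (40.0, -4.0),
    "BR": (-10.0, -55.0),
    "AR": (-34.0, -64.0),
    "CN": (35.0, 105.0),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_countries(code, max_km=None):
    """Return (country, distance_km) pairs sorted by centroid distance.

    If max_km is given, only countries within that distance are returned,
    so the caller can widen the search by calling again with a larger value.
    """
    lat, lon = CENTROIDS[code]
    ranked = sorted(
        (haversine_km(lat, lon, clat, clon), other)
        for other, (clat, clon) in CENTROIDS.items()
        if other != code
    )
    return [(other, dist) for dist, other in ranked if max_km is None or dist <= max_km]
```

Calling `nearest_countries("GB", max_km=1000)` and then again with a larger `max_km` gives the "expand by incrementing the distance" behaviour. Because each country is reduced to one point, large countries will rank worse than their real borders suggest.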
Related
Is there a dynamic hierarchical data source out there that I can use to identify a lat lng point into a neighborhood?
For instance, if I was in Manhattan, it would recognize that I'm in Chinatown, Manhattan, New York City in that order. And if I was in a less densely populated area it would just put me into a neighborhood that would span a larger area. It can be a bit fuzzy in this concept.
Ultimately I want to group people into their nearest neighborhood, given evenly sized neighborhood populations.
I know that zip codes can roll up into a metro area, but I wonder if there is something that's more granular or more dynamic.
Google's geocoding API can give a variety of levels of detail about a location. It varies by region, country, and even at state/local levels but you should be able to get close to what you're looking for.
I'm working on a geolocation-based personal project where I'd like to fetch suppliers based on the user's latitude and longitude. The catch is that suppliers have variable supply radii: a few suppliers supply only within 5 km of their location, while some may supply across the entire city.
The general way to go about this is to calculate, for each supplier, the distance between the supplier and the user. If it is less than or equal to its supply radius, then display that supplier in the results.
But this might be very slow, so I thought I'd split the city into four zones (pick four latitude and longitude values from Google Maps for north, east, west, and south), and whenever a supplier is added I'll do the math and store in the database the zones to which they can supply. Then, whenever I get the user's latitude and longitude, I'd determine the zone, fetch the suppliers that can supply to that zone, do the distance calculation, and filter them. This way I do the calculation on fewer suppliers instead of the entire list.
But is it a good idea, or can I do better?
If you are using Postgres/PostGIS, you can make use of spatial indexes and then use ST_DWithin(geom1, geom2, distance) type queries; see the ST_DWithin docs. The spatial index will partition the space for you, making this kind of query very efficient and saving you from having to come up with a spatial partitioning scheme of your own.
Another operator you can use is the <-> operator, which is very efficient with a spatial index and is used in the ORDER BY clause to get the nearest y things to some point x (k-nearest-neighbour search); see the <-> operator docs. One caveat: for this operator to work properly with the index, the point you are searching for needs to be a constant, as it sounds like it would be in your case.
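To illustrate what a spatial partition buys you, here is a rough Python sketch of a fixed grid. It is a simplified stand-in, not how PostGIS indexes work internally. Each supplier is registered in every grid cell its supply circle might touch, so a lookup only runs the exact distance check on suppliers in the user's cell. The `SupplierGrid` class, the cell size, and the 10% padding are all assumptions made for the example.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0
KM_PER_DEG_LAT = math.radians(1) * EARTH_RADIUS_KM  # about 111.2 km
CELL_DEG = 0.05  # grid cell size in degrees (arbitrary choice)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cell_of(lat, lon):
    """Map a point to the (row, col) index of its grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


class SupplierGrid:
    def __init__(self):
        self._cells = defaultdict(list)

    def add(self, name, lat, lon, radius_km):
        """Register a supplier in every cell overlapped by its padded bounding box."""
        dlat = 1.1 * radius_km / KM_PER_DEG_LAT  # 10% padding for safety
        dlon = dlat / max(math.cos(math.radians(lat)), 0.01)
        lo_row, lo_col = cell_of(lat - dlat, lon - dlon)
        hi_row, hi_col = cell_of(lat + dlat, lon + dlon)
        supplier = (name, lat, lon, radius_km)
        for row in range(lo_row, hi_row + 1):
            for col in range(lo_col, hi_col + 1):
                self._cells[(row, col)].append(supplier)

    def suppliers_for(self, lat, lon):
        """Names of suppliers whose radius covers the point, nearest first."""
        hits = []
        for name, slat, slon, radius_km in self._cells.get(cell_of(lat, lon), []):
            dist = haversine_km(lat, lon, slat, slon)
            if dist <= radius_km:
                hits.append((dist, name))
        return [name for _, name in sorted(hits)]
```

This is essentially the four-zone idea from the question with much smaller zones. It ignores the poles and the ±180° longitude seam, which should not matter at city scale.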
I'm currently using a very large geo-IP database that I've built as a mixture from many freeware sites.
The problem is that the mapping in all those databases is: (ip) -> (latitude, longitude)
I'm looking for a way to deduce the location of those latitude/longitude points at city resolution and, if possible, offline.
thanks
You may want to try Google Geocoding http://code.google.com/intl/en/apis/maps/documentation/geocoding/
To do it offline, you'll need a database of long/lat coordinates, such as this: http://www.maxmind.com/app/worldcities
Then, to match the long/lat to the cities, you'll have to build an algorithm which narrows it down to within a margin of error.
A brute-force method might be to measure the distance to every city using Pythagoras' theorem, but that would rapidly kill your CPU. A better way may be to start by excluding results that are 1 degree or more above or below your target lat/long, then do your measurements on the remaining results.
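A small Python sketch of that two-step idea: a cheap bounding-box filter first, then a Pythagoras-style (equirectangular) distance on the survivors. The city rows are illustrative sample values, and `nearest_city` and `window_deg` are hypothetical names. In a real database the first step would be an indexed range query on the lat and long columns.

```python
import math

KM_PER_DEG = 111.2  # approximate kilometres per degree of latitude

# Illustrative sample rows: (name, latitude, longitude).
CITIES = [
    ("Denver", 39.739, -104.985),
    ("Boulder", 40.015, -105.271),
    ("Colorado Springs", 38.834, -104.821),
    ("London", 51.507, -0.128),
]


def approx_km(lat1, lon1, lat2, lon2):
    """Flat-earth distance: Pythagoras on degrees scaled to kilometres.

    Longitude degrees shrink with latitude, hence the cos() factor.
    Accurate enough over short distances, which is all we need after filtering.
    """
    dy = (lat2 - lat1) * KM_PER_DEG
    dx = (lon2 - lon1) * KM_PER_DEG * math.cos(math.radians((lat1 + lat2) / 2))
    return math.hypot(dx, dy)


def nearest_city(lat, lon, cities=CITIES, window_deg=1.0):
    """Return the name of the closest city within the window, or None."""
    # Step 1: discard anything more than window_deg away in lat or long.
    candidates = [
        c for c in cities
        if abs(c[1] - lat) <= window_deg and abs(c[2] - lon) <= window_deg
    ]
    if not candidates:
        return None
    # Step 2: measure only the remaining rows.
    return min(candidates, key=lambda c: approx_km(lat, lon, c[1], c[2]))[0]
```

If nothing falls inside the window, the caller can retry with a larger `window_deg`. The sketch does not handle longitude wrap-around near ±180°.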
You can get city and region lat/lon information from citycsv.com if you really need your info offline. It would be easy to query the data for lat/lon and get a city or region back. However, as stated, Google would be able to take a lot of overhead off your hands with their online geocoding tools.
You could run Google's geocoder in burst mode (2,500 max per day) through a cron job and fill up your offline database over the course of ....
Looking for a way to get a list of telephone area codes for a given latitude and longitude (and if necessary a given intl. code.) Note, I'm not talking about international dialing prefixes but the area codes within them.
For example, Denver Colorado is covered by the area codes 303 and 720. It's at 39.739 -104.985 and is in NANP 1. So given 39.739,-104.985,1 I'd like to get back [303,720].
Libraries, web services, DB's, or raw data that needs to be parsed into a DB, e.g., a web page of shape points, are all fine and the more global coverage the better, but just NANP 1 would be a great help.
Note I already use MaxMind and could turn the lat-lng into a fake IP and use that as the lookup key, but MaxMind claims only U.S. area codes (whether they truly mean U.S. or actually NANP I haven't tested) and seemingly only 1 per location (e.g. just 303 for Denver.) So it's a possibility, just not a great one.
UPDATE: I found some more relevant information, but no definitive solutions so I'm listing it here rather than in an answer:
I was able to find two U.S. databases http://www.area-codes.com/area-code-database.asp and http://www.nationalnanpa.com/area_codes/index.html (50% down the page, MS Access file.) The former includes lat/lng for $450 and the latter would require nearest-neighbor matching as KeithS talks about (it's probably the same DB underlying the NANPA City Query he found.)
Additionally I found information that implies Teleatlas has area code boundary maps and that ESRI includes area code shape files with copies of ArcGIS. Maponics seems to have data available: there's a Google Maps implementation of Maponics' data at http://www.usnaviguide.com/areacode.htm.
Wow. You'll definitely need some sort of pre-existing database of points. My first thought was ZIPList5 Geocode. It includes lat-long data for each active U.S. ZIP code, so you can throw this data in a DB table, index the hell out of it, and search by just about any geographic info you'd have access to. You can buy one copy for $40, with enterprise-level use for $100. Only problem is that this DB has only the "primary" area code for each ZIP code, so metro areas that have more than one (Dallas, Chicago, NYC) aren't going to show all of them.
You could try a two-pronged approach with some free data I found: for a given latitude and longitude, do a nearest-neighbors search of the data in the USGS Geographic Names Information System; it includes information on every human habitation center, and every named landmark feature, with lat/long coordinates of their centers. You now have your lat/long point mapped to the nearest town/city, ZIP code, county, and state. Now, you can compare that against this list of U.S. Area Codes, to find area codes matching any or all of the identifying information from the USGS. This is all free, and will eventually get you what you need, but you'll probably have to do some work to "massage" the two sets of data into something you can efficiently cross-reference, and/or you'll need to implement a good "search engine" that will accurately find nearest-neighbor named points, and then find area codes for locations matching the names.
One more thing to look at is NANPA, which administers area code assignment to begin with. I'm sure they have a more comprehensive downloadable DB, but the only free public access I could find was this search page, which will find area codes for any city with >20k people. You could turn your lat/long data into a city and state, and then hit this search page: NANPA City Query
Here is an option:
http://geocoder.ca/39.739,-104.985?geoit=xml
<TimeZone>America/Denver</TimeZone>
<AreaCode>720,303</AreaCode>
I'm looking for advice on parsing input from a user in multiple combinations of City / State / Zip Code / Country.
A common example would be what Google maps does.
Some examples of input would be:
"City, State, Country"
"City, Country"
"City, Zip Code, Country"
"City, State, Zip Code"
"Zip Code"
What would be an efficient and correct way to parse this input from a user?
If you are aware of any example implementations please share :)
The first step would be to break up the text into individual tokens using spaces or commas as the delimiting characters. For scalability, you can then hand each token to a thread or server (if using a MapReduce-like architecture) to figure out what each token is. For instance:
If we have numbers in the pattern, then it's probably a zip code.
Is the item in the list of known states?
Countries are also fairly easy to handle like states, there's a limited number.
What order are the tokens in compared to the common ways of writing an address? Most input will probably follow the local post office custom for address formats.
Once you have the individual token results, you can glue the parts back together to get a full address. In the cases where there are questions, you can prompt the user what they really meant (like Google maps) and add that information to a learned list.
The easiest way to add that support to an application, assuming you're not trying to build a map system, is to query Google or Yahoo and ask them to parse the data for you.
I am myself very fascinated with how Google handles that. I do not remember seeing anything similar anywhere else.
I believe you try to split the input string into words using various delimiters: space, comma, semicolon, etc. That gives you several combinations. For each combination, you take each word and match it against a country, city, town, and postal code database. Then you define some metric to evaluate the group match result for each combination. There should also be cross rules: for example, if the postal code does not match well, but the country, city, and town match well and in combination refer to a valid address, then the metric yields a high mark.
It sure is difficult and not an evening's coding exercise. It also requires strong computational resources: shared hosting would probably crack under just 10 requests, but a data center could serve it well.
Not sure if there is an example implementation. Many geographical services are offered on a paid basis. Something as sophisticated as Google Maps would likely cost a fortune.
Correct me if I'm wrong.
I found a simple PHP implementation
http://www.eotz.com/2008/07/parsing-location-string-php/
Yahoo seems to have a webservice that offers the functionality (sort of)
http://developer.yahoo.com/geo/placemaker/
OpenStreetMap seems to offer the same search functionality on its homepage
http://www.openstreetmap.org/
Assuming you're only dealing with those four fields (City Zip State Country), there are finite values for all fields except for City, and even that I guess if you have a big city list is also finite. So just split each field by comma then check against each field list.
Assuming we're talking US addresses:
Zip is most obvious, so check for that first.
State has 50x2 options (California or CA), check that next.
Country has ~190x2 options, depending on how encompassing you want to be (US, United States, USA).
Whatever is left over is probably your City.
As far as efficiency goes, it might make sense to check a handful of 'standard' formats first, like Dan suggests.
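The check-in-order approach above can be sketched in a few lines of Python. The state and country tables are deliberately tiny placeholders (real ones would hold all 50 states and about 190 countries), and `parse_location` is a hypothetical name. It assumes comma-separated US-style input and does not try to resolve ambiguous tokens such as "CA" (California or Canada).

```python
import re

# Placeholder lookup data; a real list would be complete.
STATES = {"CA": "California", "CO": "Colorado", "NY": "New York", "TX": "Texas"}
COUNTRIES = {"US": "United States", "USA": "United States", "UNITED STATES": "United States"}

# Accept both "CO" and "Colorado" by indexing on the upper-cased form.
STATE_LOOKUP = {abbr: name for abbr, name in STATES.items()}
STATE_LOOKUP.update({name.upper(): name for name in STATES.values()})

ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")  # US ZIP or ZIP+4


def parse_location(text):
    """Classify comma-separated tokens as zip, state, country, or city."""
    result = {"city": None, "state": None, "zip": None, "country": None}
    leftovers = []
    for token in (part.strip() for part in text.split(",")):
        if not token:
            continue
        key = token.upper()
        if result["zip"] is None and ZIP_RE.match(token):
            result["zip"] = token                    # zip is the most obvious
        elif result["state"] is None and key in STATE_LOOKUP:
            result["state"] = STATE_LOOKUP[key]      # then the state list
        elif result["country"] is None and key in COUNTRIES:
            result["country"] = COUNTRIES[key]       # then the country list
        else:
            leftovers.append(token)                  # whatever is left over
    if leftovers:
        result["city"] = " ".join(leftovers)
    return result
```

For example, `parse_location("Denver, CO, 80202")` fills in city, state, and zip and leaves country as `None`. Splitting only on commas keeps multi-word cities intact. Handling space-separated input would need the scoring approach described in the earlier answers.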