This question is a general version of a more specific question asked here. However, those answers were unusable.
Question: What is the raw source for geoIP data?
Many websites will tell me where my IP is, but they all appear to use databases from fewer than 5 companies (most use a database from MaxMind). These companies offer limited free versions of their databases, but I'm trying to determine what they use as their source data.
I've tried using Linux/Unix commands such as ping, traceroute, dig, whois, etc., but they don't provide predictably accurate information.
Preamble: I believe this is actually a very valid question for the SO website, as understanding how such things work is important to understanding how such datasets can be used in software. However, the answer to this question is rather complex and full of historical remarks.
First, it is worth mentioning that there is NO unified raw geoip data. Such a thing simply does not exist. Second, the data comes from multiple sources and is often unreliable and/or outdated.
To understand how this came to be, one needs to know how the Internet came into existence and spread around the world. A short summary is below:
IANA is a global (non-profit) organization which manages the assignment of IP blocks to regional organizations (the Regional Internet Registries): https://www.iana.org/numbers This happens on request, and the regional organization specifies the block size it needs.
Regional organizations may assign those IP blocks either directly to ISPs or to country-level sub-organizations (which then assign them to ISPs).
ISPs assign IP addresses to their local branches, etc.
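Because of this chain, a single address is usually covered by several nested blocks, one per level of delegation, and the most specific (longest) prefix is the one that says the most about where the address actually ends up. A minimal Python sketch of that longest-prefix lookup; the blocks and holder labels are made up for illustration and are not real assignments:

```python
import ipaddress
from typing import Dict, Optional

# Made-up delegation chain for illustration only (not real assignments).
DELEGATIONS: Dict[str, str] = {
    "198.0.0.0/8": "IANA -> regional registry",
    "198.51.0.0/16": "regional registry -> example ISP",
    "198.51.100.0/24": "example ISP -> city branch",
}


def most_specific(ip: str, table: Dict[str, str]) -> Optional[str]:
    """Return the holder of the longest prefix in table that contains ip."""
    addr = ipaddress.ip_address(ip)
    best: Optional[str] = None
    best_len = -1
    for cidr, holder in table.items():
        net = ipaddress.ip_network(cidr)
        # A longer prefix is a smaller, more specific block.
        if addr in net and net.prefixlen > best_len:
            best, best_len = holder, net.prefixlen
    return best
```

For example, `most_specific("198.51.100.7", DELEGATIONS)` picks the /24 entry, while an address outside every block returns `None`. The catch described below is that the lowest levels of such a table are exactly the ones nobody is obliged to publish.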
From the above you can easily see that:
There is no single body responsible for assigning IP blocks to particular locations.
Decisions about how (and whether) to release information on which IP belongs to which location are not made uniformly; instead, each organization decides for itself how to release that information, or whether to release it at all.
All of the above creates a whole lot of mess. It takes a lot of dedication and a long time to obtain, aggregate and sort this data, which is why the most up-to-date and detailed geoip datasets are commercial products.
Whoever takes on the challenge of building their own dataset needs to obtain this information directly from the end users (ISPs), because higher-level organizations do not know which location each IP address will be assigned to. Higher-level organizations only distribute IP blocks among applicants (keeping some in reserve for faster processing); it is the lowest-level organizations that decide which location gets which IP address, and they are not obligated to release this information publicly.
UPD:
To start building your own dataset, you can begin with this list of blocks and how they are assigned.
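Besides that list, each Regional Internet Registry publishes delegation statistics files. The sketch below assumes their pipe-separated record layout `registry|cc|type|start|value|date|status` (check the RIR's own format description before relying on it) and turns IPv4 records into CIDR blocks. Note that the country code is the one the holder registered, which, by the reasoning above, says nothing reliable about where individual addresses are actually used.

```python
import ipaddress
from typing import Iterator, NamedTuple


class Delegation(NamedTuple):
    registry: str
    country: str  # holder's registered country, NOT where the IPs are used
    network: ipaddress.IPv4Network
    status: str


def parse_delegated(text: str) -> Iterator[Delegation]:
    """Yield IPv4 allocations/assignments from a delegation statistics file."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("|")
        # Skip the version header, "summary" lines and non-IPv4 records.
        if len(fields) < 7 or fields[2] != "ipv4" or fields[1] in ("", "*"):
            continue
        registry, cc, _type, start, count, _date, status = fields[:7]
        if status not in ("allocated", "assigned"):
            continue
        first = ipaddress.IPv4Address(start)
        last = first + int(count) - 1  # for IPv4 the value field is a host count
        # Host counts need not be powers of two, so split into CIDR blocks.
        for net in ipaddress.summarize_address_range(first, last):
            yield Delegation(registry, cc, net, status)
```

A record with a host count of 768, for instance, comes out as one /23 plus one /24.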
I've searched around for a while, and all of the IP --> hostname tools only end up giving an ISP. Is there something that goes beyond that? I'm only finding paid services that go further, not something where I can just tap a nice API and do it programmatically.
http://ipinfo.io/ just shows the ISP for many of the addresses I've sampled. I've seen that the person behind it posts here fairly often.
whoisvisiting.com costs about $99/month for my company site's traffic, but at that price I'd rather code something myself. I'm using the free trial right now and logging the IPs to analytics, so I'm comparing what it returns, what IIS returns as the hostname, and what a couple of sources like ipinfo.io show; whoisvisiting somehow actually shows what I'm looking for.
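For reference, the IP --> hostname step these tools perform is a reverse DNS (PTR) lookup. A minimal Python sketch using the standard library; `naive_owner_domain` is a hypothetical helper that keeps the last two labels of the hostname, which is usually the ISP's domain rather than the visiting company's (and is wrong for multi-label suffixes such as .co.uk, where a public-suffix list would be needed):

```python
import socket
from typing import Optional


def reverse_dns(ip: str) -> Optional[str]:
    """Return the PTR hostname for ip, or None if there is none."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # socket.herror / socket.gaierror: no PTR record
        return None


def naive_owner_domain(hostname: str) -> str:
    """Naively keep the last two DNS labels of a hostname."""
    labels = hostname.rstrip(".").split(".")
    return ".".join(labels[-2:])
```

For a hypothetical PTR name such as `host-1.dsl.example-isp.net`, `naive_owner_domain` returns `example-isp.net`: the access provider, not the organisation behind the connection.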
There's no way to do so. There's no central registry of which company has which address ranges. In fact, most companies will only be identifiable by their ISP.
Your paid services might be scams, by the way, or might only work for a few select companies and universities that actually act as autonomous entities in the IP sense.
It is unlikely that you can reliably differentiate between an ISP's and a company's IP address. Some geolocation providers use the range size or the level of allocation to label a range as ISP or business. However, this approach is not always accurate.
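A rough Python sketch of that kind of heuristic. The inputs are assumptions: `allocation_depth` would have to come from something like WHOIS data, and the /24 cut-off is arbitrary. As noted above, the labels it produces are guesses, not facts:

```python
import ipaddress


def guess_holder_type(cidr: str, allocation_depth: int, small_prefixlen: int = 24) -> str:
    """Rough ISP-vs-business label for an IPv4 block.

    allocation_depth: 1 if the block is held directly from a regional registry,
    2 or more if it was sub-assigned out of a larger block (hypothetical input).
    small_prefixlen: arbitrary cut-off; prefixes this long or longer count as small.
    """
    net = ipaddress.IPv4Network(cidr, strict=False)
    # Small blocks carved out of someone else's allocation are often end customers.
    if allocation_depth >= 2 and net.prefixlen >= small_prefixlen:
        return "probably business"
    return "probably ISP"
```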
I've read on Wikipedia that one of the ways to obtain geolocation information for a given IP is to use a DNSBL. Here is the link: http://en.wikipedia.org/wiki/Geolocation_software#Data_sources
Could someone explain to me how this is done? And in general, what is a DNSBL, other than a ban list?
A DNSBL is a blacklist/database that is queried over DNS. DNS is just the API you use to get a specific result; other options could be HTTP or a simple local file.
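Concretely, a DNSBL-style query reverses the octets of the IPv4 address, appends the list's zone and asks for an A record; no answer (NXDOMAIN) means "not listed". In the sketch below, the zone `geo.example` and the idea of encoding an ISO 3166-1 numeric country code in the returned 127.0.X.Y address are hypothetical, shown only to illustrate how a geolocation list could reuse the same mechanism:

```python
import ipaddress
import socket
from typing import Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """192.0.2.10 + zone -> 10.2.0.192.<zone>"""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def dnsbl_lookup(ip: str, zone: str) -> Optional[str]:
    """Return the A record the list answers with, or None if not listed."""
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:  # socket.gaierror, e.g. NXDOMAIN
        return None


def decode_country_number(answer: str) -> int:
    """Hypothetical encoding: 127.0.X.Y carries a numeric country code X*256 + Y."""
    _, _, x, y = (int(part) for part in answer.split("."))
    return x * 256 + y
```

With this made-up encoding, an answer of `127.0.3.72` would decode to 840.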
IP traffic needs routing, so the physical machines doing that routing are placed in specific locations. Knowing that, it is possible to collect data on where the routing points are and thus find the closest location for a given IP address. (Knowing that there are 5 big co
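A minimal sketch of the "closest routing point" idea in Python, assuming you already have the hops toward the target (for example parsed from traceroute output) and a hypothetical table of router locations you have collected yourself:

```python
from typing import Dict, List, Optional, Tuple


def locate_by_last_known_hop(
    hops: List[str], router_locations: Dict[str, str]
) -> Optional[Tuple[str, str]]:
    """Return (hop, location) for the known router closest to the target.

    hops are ordered from the source toward the target, so walking them in
    reverse finds the geolocated routing point nearest the target first.
    """
    for hop in reversed(hops):
        if hop in router_locations:
            return hop, router_locations[hop]
    return None
```

The result is only as good as the router table, and the target may still sit some distance past the last known hop.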
http://en.wikipedia.org/wiki/Geo_targeting
http://en.wikipedia.org/wiki/LOC_record
http://en.wikipedia.org/wiki/Regional_Internet_registry
I have a geo-sensitive webapp for which I send a request's IP to a remote, commercial ip-to-location service, and get back the country, city, ISP, etc. for the IP.
I currently cache the IP lookups in my database in order to make subsequent lookups faster and free (the commercial service charges per lookup).
I wonder if I can further optimize my caching by assuming that the first 16 bits (i.e. the aaa.bbb in an aaa.bbb.ccc.ddd address) always share a uniform location. That way I would have at most 2^16 (65,536) records to cache.
I don't care as much about uniformity of ISP, but that info would be helpful as well.
I'd recommend going down to at least /24 resolution. Oftentimes a /16 will tell you the ISP but not the city, or vice versa.
If you want a good idea of what the maps really look like, you can spend 49 USD on a developer license for Geobytes' GeoNetMap database. A developer license lets you download the entire map from IP blocks to locations as a set of CSV files, but doesn't cover deploying it onto a production server. Geobytes has the added advantage of being entirely local, so lookups are very fast.
MaxMind also has a free downloadable map offering, although it is somewhat cut down from the full map, producing approximately double the error rate.
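With a downloaded map, a lookup is just a search over sorted, non-overlapping IP ranges, which is why local databases are fast. The sketch below assumes a hypothetical CSV layout with `start_ip,end_ip,location` columns (real vendors' files use their own column names and formats) and uses binary search:

```python
import bisect
import csv
import io
import ipaddress
from typing import List, Optional, Tuple


class RangeTable:
    """Inclusive IPv4 ranges -> location, assuming the ranges do not overlap."""

    def __init__(self, rows: List[Tuple[int, int, str]]):
        self._rows = sorted(rows)
        self._starts = [start for start, _end, _loc in self._rows]

    @classmethod
    def from_csv(cls, text: str) -> "RangeTable":
        rows = []
        for rec in csv.DictReader(io.StringIO(text)):
            start = int(ipaddress.IPv4Address(rec["start_ip"]))
            end = int(ipaddress.IPv4Address(rec["end_ip"]))
            rows.append((start, end, rec["location"]))
        return cls(rows)

    def lookup(self, ip: str) -> Optional[str]:
        n = int(ipaddress.IPv4Address(ip))
        # Index of the last range starting at or before n.
        i = bisect.bisect_right(self._starts, n) - 1
        if i >= 0 and n <= self._rows[i][1]:
            return self._rows[i][2]
        return None
```

Each lookup is O(log n) in the number of ranges; overlapping ranges would need a different structure.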
No, it's not safe. For example, if you do a GeoIP lookup on 216.34.181.45 (Slashdot) you get Mountain View, California. If you do a lookup on 216.34.1.1 you get Chesterfield, Missouri.
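The two addresses quoted above share a /16 but not a /24, so a /16 cache key would hand both of them the same cached city. A quick check with Python's `ipaddress` module:

```python
import ipaddress


def same_prefix(a: str, b: str, prefixlen: int) -> bool:
    """True if b falls inside the /prefixlen network that contains a."""
    net = ipaddress.ip_network(f"{a}/{prefixlen}", strict=False)
    return ipaddress.ip_address(b) in net


print(same_prefix("216.34.181.45", "216.34.1.1", 16))  # True: same /16
print(same_prefix("216.34.181.45", "216.34.1.1", 24))  # False: different /24s
```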
With respect to your caching, keep in mind that IPs can move around spatially. If an ISP goes bankrupt and its block gets bought by someone else, that block of IPs will move location.
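Putting these answers together, a cache keyed by /24 with an expiry time could look like the sketch below. `lookup` stands in for the commercial service's client (hypothetical), and the 30-day TTL is an arbitrary assumption; as the answers point out, even a /24 is not guaranteed to be uniform.

```python
import ipaddress
import time
from typing import Callable, Dict, Tuple


class PrefixGeoCache:
    """Caches a paid per-IP lookup per network prefix, with expiry."""

    def __init__(
        self,
        lookup: Callable[[str], dict],
        prefixlen: int = 24,
        ttl: float = 30 * 24 * 3600,  # arbitrary: blocks can be resold and move
        clock: Callable[[], float] = time.time,
    ):
        self._lookup = lookup
        self._prefixlen = prefixlen
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, ip: str) -> dict:
        key = str(ipaddress.ip_network(f"{ip}/{self._prefixlen}", strict=False))
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        result = self._lookup(ip)  # at most one paid call per prefix per TTL window
        self._entries[key] = (now, result)
        return result
```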
Right, this is confusing me quite a bit. I'm not sure if any of you have noticed or used the "my location" feature in Google Maps on a desktop (or any non-GPS, non-mobile device). If you have a browser with Google Gears (the easiest to use is Google Chrome), you will see a blue circle above the zoom control in Google Maps. When I click it (without being logged into my Google account), using standard Wi-Fi to my own personal router and a normal internet connection to my ISP, it somehow manages to pinpoint my exact location with 100% accuracy (at this moment in time).
How does it do it? They briefly mention it here, but it doesn't really explain it; it just says that my browser knows where I am...
...I am baffled. How?
I am intrigued because I would love to integrate this into my programming projects in the future. I'd just like some background understanding, and it doesn't seem to be well documented at the moment.
I am currently in Tokyo, and I used to be in Switzerland. Yet until a few days ago my location was not pinpointed exactly, only to the broad Tokyo area. Today I tried, and I appear to be in Switzerland. How?
Well, the secret is that I am now connected through wireless, and my wireless router has been identified (thanks to its association with other Wi-Fi networks around me at that time) within a very precise area in Switzerland. Now my Wi-Fi router has moved to Tokyo, but the queried system still thinks it is in Switzerland, because either it has no information about the additional Wi-Fi networks around me right now, or it cannot resolve the conflicting information (namely, the specific info about my Wi-Fi router versus my IP geolocation, which places me in the Far East).
So, to answer your question: Google, or someone on its behalf, went "wardriving", mapping the presence of Wi-Fi networks. Every time a query is made to the system (probably in compliance with the W3C draft for the Geolocation API), your computer sends the identifiers of the Wi-Fi networks it sees, and the system does two things:
It queries its database for geolocation data on any of the Wi-Fi networks you sent, and returns the "wardriven" position if found, possibly refined by triangulation if signal strengths are included. The more Wi-Fi networks around, the higher the accuracy of the positioning.
It adds the networks you see that are not yet in its database, so they can be reused later.
As you can see, the system builds itself up. The only thing you need is good seeding. After that, it extends in "50-meter chunks" (the range of a newly found Wi-Fi network).
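A minimal Python sketch of the two steps described above. It substitutes a simple weighted centroid for true triangulation (stronger signal = more weight, after converting dBm to milliwatts), averages latitude/longitude directly (only reasonable over small areas), and uses a hypothetical in-memory dict as the database:

```python
from typing import Dict, Optional, Tuple

LatLon = Tuple[float, float]


def estimate_position(scan: Dict[str, float], known: Dict[str, LatLon]) -> Optional[LatLon]:
    """Step 1: scan maps BSSID -> signal strength in dBm (e.g. -45 strong, -85 weak)."""
    total = lat = lon = 0.0
    for bssid, rssi in scan.items():
        if bssid not in known:
            continue
        weight = 10 ** (rssi / 10)  # dBm -> milliwatts
        ap_lat, ap_lon = known[bssid]
        lat += weight * ap_lat
        lon += weight * ap_lon
        total += weight
    if total == 0.0:
        return None  # no known access points: fall back to IP geolocation
    return lat / total, lon / total


def learn_unknown(scan: Dict[str, float], known: Dict[str, LatLon], position: LatLon) -> None:
    """Step 2: record networks not yet in the database at the estimated position."""
    for bssid in scan:
        known.setdefault(bssid, position)
```

The second step is also why a router carried from Switzerland to Tokyo keeps reporting Switzerland: its stored entry is never challenged unless other known networks are seen around it.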
Of course, if you really want the system to go bananas, you can start exchanging Wi-Fi routers around the globe with fellow revolutionaries of the no-global-positioning movement.
It's a lot simpler than you think. You've signed into both your mobile and Chrome on your desktop using the same Google account. Google simply expects you to have your mobile with you most of the time, so they take the location data from your phone and assume the location of your current desktop session is the same.
I proved this by RDPing into my Windows machine at home from work and checking Google Maps remotely. It showed my location as the same as Chrome on Linux at work.
If you don't have a mobile that is signed into Google, then all they can do is look up GeoIP data for the IP address assigned by your ISP, which will typically be wildly inaccurate.
They use a combination of IP geolocation and a comparison of the results of a scan for nearby wireless networks against a database on their side (which is built by collecting GPS coordinates alongside Wi-Fi scan data when Android phone users use their GPS).
I've finally worked it out. The biggest issue is how they managed to work out what Wireless networks were around me and how do they know where these networks are.
It "seems" to be something similar to this:
skyhookwireless.com (or a similar company) has mapped the locations of many wireless access points, I assume by similar means to how Google Street View went around taking all its photos.
Using Google Gears and my browser, they can see which wireless networks are around me.
They compare these wireless access points against their geolocation data and triangulate my position.
Reference: Slashdot
According to Google Maps' own help:
Rejecting the WiFi networks idea!
Sorry folks... I don't see it. Using WiFi networks around you seems to be a highly inaccurate and ineffective method of collecting data. WiFi networks these days simply don't stay long in one place.
Think about it: WiFi networks change every day. Not to mention MiFi and ad hoc networks, which are "designed" to be mobile and travel with their users. Equipment breaks, network settings change, people move... Relying on the WiFi networks in your area seems highly inaccurate and in the end may not even offer a significant improvement in granularity over an IP lookup.
I think the idea that iPhone users are "scanning and sending" WiFi survey data back to Google, combined with wardriving, perhaps in conjunction with Google Maps "Street View" mapping, might seem like a very plausible method of collecting this data. However, in practice, it does not work as a business model.
Oh, and by the way, I forgot to mention in my prior post... when I originally pulled up my location, the time I was pinpointed "precisely" on the map, I was connecting to a router from my desktop over an Ethernet connection. I don't have a WiFi card in my desktop.
So if that "nearby WiFi networks" theory were true... then I shouldn't have been able to pinpoint my location with such precision.
I'll call my ISP, SKyrim, and ask them whether they share their network topology to enable geolocation on their networks.
I know you can look up IP address to get approximate location, but it's not always accurate. Perhaps they're using that?
update:
Typically, your browser uses information about the Wi-Fi access points around you to estimate your location. If no Wi-Fi access points are in range, or your computer doesn't have Wi-Fi, it may resort to using your computer's IP address to get an approximate location.
It is possible to get your approximate location based on your IP address (wireless or fixed).
See for example hostip.info or MaxMind, which basically provide a mapping from IP addresses to geographical coordinates. They probably use many kinds of heuristics and data sources. This kind of system probably has enough accuracy to put you in the right major city in most cases.
Google probably uses a somewhat similar approach in addition to its WiFi tricks.
So Google keeps records of WiFi router locations using the GPS of any cellphone connected to that router while you use Google Maps or location services on the phone. Google then assumes that every device connected to that WiFi router shares the same location. When GPS is off, or no cellphone is connected to the router, Google uses IP geolocation.
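A minimal sketch of the mechanism this answer describes, with hypothetical names throughout: `router_id` could be the router's BSSID or public IP, and `ip_geolocate` is a placeholder for any IP-to-location service. It illustrates the described behaviour and is not Google's actual implementation:

```python
from typing import Callable, Dict, Optional, Tuple

LatLon = Tuple[float, float]


class RouterLocator:
    """Reuses a phone's GPS fix for every device behind the same router."""

    def __init__(self, ip_geolocate: Callable[[str], Optional[LatLon]]):
        self._ip_geolocate = ip_geolocate
        self._fixes: Dict[str, LatLon] = {}

    def report_gps(self, router_id: str, fix: LatLon) -> None:
        # A GPS-enabled phone connected to this router reports where it is.
        self._fixes[router_id] = fix

    def locate(self, router_id: str, public_ip: str) -> Optional[LatLon]:
        if router_id in self._fixes:
            return self._fixes[router_id]
        # No GPS fix recorded for this router: fall back to IP geolocation.
        return self._ip_geolocate(public_ip)
```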