Nearest zip code of a zip code? - geolocation

I need to find the 5 nearest zip codes to a zip code.
For example, I have 33304, and I need to find the nearest ones, like 33309, 33308 ...
Is there a database or a web service somewhere that would help me with that?
Will I have to build my own database to do that? I know how to do it, but in case it has already been done ...

What country? You may want to check out GeoNames: http://www.geonames.org/
Especially: http://www.geonames.org/export/web-services.html#findNearbyPostalCodes

This answer is in response to the recent bounty requesting a location of US Zip Codes.
The United States Census Bureau has a list of zip codes (with latitude and longitude) on their web site.
The ZIP Code Tabulation Areas file is a CSV that contains, among other fields (as described in the description of the file format):
Column 1 GEOID Five digit ZIP Code Tabulation Area Census Code
...
Column 6 INTPTLAT Latitude (decimal degrees) First character is blank or "-" denoting North or South latitude respectively
Column 7 INTPTLONG Longitude (decimal degrees) First character is blank or "-" denoting East or West longitude respectively
I'd cross-reference the above with data from the Census Bureau's County Business Patterns CSV to get city, state and county names. Its file format is described (in the file you select to download) as containing, among other fields:
ZIP C ZIP Code
NAME C ZIP Code Name
...
CITY C ZIP City Name
STABBR C ZIP State Abbreviation
CTY_NAME C ZIP County Name
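Once the ZCTA file is loaded, the "5 nearest" lookup is a small amount of code. Below is a minimal Ruby sketch of the centroid approach, assuming the rows have already been read into `[zip, lat, lon]` triples (GEOID, INTPTLAT, INTPTLONG above). It ranks every other ZIP by great-circle distance using the haversine formula; `nearest_zips` and `haversine_miles` are hypothetical helper names, and 3958.8 miles is a commonly used mean Earth radius.

```ruby
# Mean Earth radius in miles (a common approximation).
EARTH_RADIUS_MI = 3958.8

# Great-circle distance between two points given in decimal degrees (haversine formula).
def haversine_miles(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  # Clamp guards against rounding pushing `a` slightly above 1.
  2 * EARTH_RADIUS_MI * Math.asin(Math.sqrt([a, 1.0].min))
end

# rows: array of [zip, lat, lon] triples (e.g. GEOID, INTPTLAT, INTPTLONG).
# Returns the k ZIPs whose centroids are closest to the centroid of `origin`.
def nearest_zips(rows, origin, k = 5)
  origin_row = rows.find { |zip, _lat, _lon| zip == origin }
  raise ArgumentError, "unknown ZIP #{origin}" unless origin_row
  _, lat0, lon0 = origin_row
  rows.reject { |zip, _lat, _lon| zip == origin }
      .min_by(k) { |_zip, lat, lon| haversine_miles(lat0, lon0, lat, lon) }
      .map(&:first)
end
```

A linear scan over every ZCTA is usually fast enough for occasional lookups; for many queries, a spatial index such as the PostgreSQL KNN approach mentioned below is a better fit. The centroid caveat about large or long ZIPs also applies here.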

Here is a free web service to do exactly what you need:
http://www.zipwise.com/webservices/

I tested GeoNames and did not find it good enough: test areas near your zip code and you will see that it does not find all the zip codes. This site has good results, but I have not found an API for it yet:
http://www.searchbug.com/tools/zip-radius.aspx

To find the nearest ZIP Codes, there are two approaches: find contiguous (adjoining) ZIP Codes and/or find the nearest ZIPs based on the distance from the center of one ZIP Code to the center of another, using latitude and longitude.
To choose the contiguous ZIP approach, you will need ZIP Code Boundaries data. Alignstar makes one that we're very pleased with (we resell it), but ESRI and a couple of other companies have good products as well. My company, GreatData.com, developed a contiguous counties product and could develop a contiguous ZIP Codes product, but so far, nobody has been asking for it. This could be a data file or an API.
To find the nearest ZIP Codes based on centroids (latitude/longitude center points of the ZIP Codes), you will need a ZIP Code database with latitude/longitude data (we provide one, or see http://uszipcodes.com/free-zip-code-lookup.htm for links to some free resources). If you just want an API and not the hassle of doing it yourself, let us know. We've started to develop similar APIs and will develop more based on demand.
One caveat of doing this by centroids: we've seen cases where, if an adjoining ZIP is large (or long), its center can be farther away than the center of another, smaller ZIP Code, so your list of the 5 nearest could miss an adjoining ZIP.

PostgreSQL 9.1 added a feature called "KNN", specifically for nearest-neighbor distance searches. From benchmarking and testing it myself, it is very fast, and it can be used for zip codes. Here is a quick introduction to KNN.
You can also find a thread about applying KNN to zip codes, and a more recent follow-up.

Related

How to best use zipcodes in Random Forest model training?

I have a dataset with a zip code column. Zip codes have some significance for the output, and I want to use them as a feature. I am using a random forest model.
I need suggestions on the best way to use the zip code column as a feature (for example, should I get the lat/long for each zip code rather than feeding the zip codes in directly?).
Thanks in advance!
A common way of handling zip codes or any high cardinality categorical column is called "target encoding" or "impact encoding". In H2O, you can apply target encoding to any categorical columns. As of H2O 3.20, this is only available in R, but in the next stable release, 3.22, it will be available in all clients (JIRA ticket here).
If you are using R, my advice is to try both target encoding and also the GLRM method mentioned by Lauren and compare the results. If you're in Python or another language, then try GLRM for now and give target encoding a try when H2O 3.22 is released.
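For anyone not using H2O, here is a minimal, library-independent Ruby sketch of the idea behind target encoding (this is not H2O's implementation): each zip code is replaced by the mean target for that zip, smoothed toward the global mean so that rarely seen zips do not get extreme values. The smoothing weight `m` is an assumed tuning parameter, and `target_encode` is a hypothetical helper name.

```ruby
# Smoothed target (mean) encoding for one categorical column.
# zips:    array of category values (e.g. zip code strings)
# targets: array of numeric targets, same length as zips
# m:       smoothing weight; larger m pulls rare categories toward the global mean
def target_encode(zips, targets, m: 10.0)
  raise ArgumentError, "length mismatch" unless zips.size == targets.size
  prior = targets.sum.to_f / targets.size
  stats = Hash.new { |h, k| h[k] = [0.0, 0] } # zip => [sum of targets, count]
  zips.zip(targets).each do |zip, y|
    stats[zip][0] += y
    stats[zip][1] += 1
  end
  # (n * mean + m * prior) / (n + m), written as (sum + m * prior) / (n + m)
  encoding = stats.transform_values { |sum, n| (sum + m * prior) / (n + m) }
  encoding.default = prior # unseen zips fall back to the global mean
  encoding
end
```

Encoding the same rows you train on leaks the target into the feature, so in practice you would build the hash on training folds and apply it to the held-out rows.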
I'd second what Erin LeDell says about target encoding.
Here are some other options and not all of them may apply:
Reduce the granularity of the zip code to the first 1, 2, 3 or 4 digits. So zip code 90210 becomes 902 (902XX) and would represent Los Angeles County (902 zip codes).
Can you group zip codes by MSA or CBSA?
Is there a feature about zip codes that can be appended, e.g. city/urban/rural?
Can you pull in some zip code demographics, such as population size or income?
Distance to/from a key location (airport, city center, etc.)
Target encode, but then group into bins such as very high, high, medium and low (or whatever makes sense); this will help prevent overfitting your models.
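Truncating to a prefix, the first suggestion in the list, is easy to get subtly wrong when zip codes were loaded as integers, because leading zeros disappear (a zip such as 02134 stored as the integer 2134). A minimal Ruby sketch, assuming US 5-digit zips that may arrive as integers or as ZIP+4 strings; `zip_prefix` is a hypothetical helper name:

```ruby
# Reduce a US zip code to its first `digits` digits (1 to 5).
# Accepts integers (which have lost leading zeros) and strings like "02134" or "90210-1234".
def zip_prefix(zip, digits = 3)
  raise ArgumentError, "digits must be 1..5" unless (1..5).cover?(digits)
  five = zip.to_s.split("-").first.rjust(5, "0") # restore leading zeros, drop any +4 suffix
  five[0, digits]
end
```

The resulting prefix can be used as a categorical feature directly or target encoded like the full zip code.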

Distance Calculation between US Zipcodes

I have Zipcode, City and State stored as variables for each user. I need to calculate and display the distance in miles between all users from the database and the current user.
I came across the Geo-magic and Geo-distance Ruby gems, but they use lat and lon for the calculation.
I would like to achieve something like this:
https://www.freemaptools.com/distance-between-usa-zip-codes.htm
https://www.mapdevelopers.com/distance_from_to.php
I would like some help implementing this in Rails in the most efficient way. TIA.
Take a look at the Geokit gem. It can give you the distance between places based on zip code or address, but it uses Google Maps or Bing Maps to geocode them.
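Example:
require 'geokit'
# Geocode two addresses (this calls Google's geocoding service, which may require an API key)
a = Geokit::Geocoders::GoogleGeocoder.geocode '140 Market St, San Francisco, CA'
b = Geokit::Geocoders::GoogleGeocoder.geocode '789 Geary St, San Francisco, CA'
a.distance_to(b) # distance between 'a' and 'b', in Geokit's default units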
Check the gem home page for more usage.
The Geocoder gem is also a good option.
However, as the other posters have indicated, at some point you're going to need to translate your location data down to latitude/longitude. You can find databases such as this one from SoftwareTools that can translate zip codes to latitude/longitude, but there's not a lot of precision. Zip codes are somewhat deceptive; they vary greatly in terms of size, and their boundaries change with some frequency.
For example, zip code 89049 covers an area of 10,000 square miles, so your distance calculation could be off by over a hundred miles depending on where in the zip code the user is located.
You'd be better off using a geolocation tool to translate each user's street address to a latitude/longitude and using that for your distance calculations. The Haversine gem can help with the distance calculation itself, or if you use PostgreSQL, you can use the earthdistance module.
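Whatever precision you settle on, the distance step itself is simple once each user has a latitude/longitude. A minimal Ruby sketch using a standalone implementation of the haversine formula (not the Haversine gem's API), assuming a `zip_coords` hash built from a zip code database; `miles_between` and `distances_from` are hypothetical helper names:

```ruby
EARTH_RADIUS_MILES = 3958.8 # mean Earth radius, a common approximation

# Haversine distance between two [lat, lon] pairs in decimal degrees.
def miles_between(from, to)
  lat1, lon1 = from.map { |d| d * Math::PI / 180 }
  lat2, lon2 = to.map { |d| d * Math::PI / 180 }
  a = Math.sin((lat2 - lat1) / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin((lon2 - lon1) / 2)**2
  2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt([a, 1.0].min))
end

# zip_coords: { "33304" => [lat, lon], ... } loaded from a zip code database.
# users: array of [user_id, zip]. Returns { user_id => miles from current_zip },
# with nil for users whose zip is not in zip_coords.
def distances_from(current_zip, users, zip_coords)
  origin = zip_coords.fetch(current_zip)
  users.each_with_object({}) do |(id, zip), result|
    coords = zip_coords[zip]
    result[id] = coords && miles_between(origin, coords).round(1)
  end
end
```

In a Rails app you would more likely store each user's latitude/longitude on the record once, rather than resolving zips on every request.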
(I have no relation to SoftwareTools other than as a satisfied customer.)

Is there a reputable source that provides mappings of UN/LOCODEs to Olson Timezones?

I've been researching CLDR and IANA in order to find a centralized mapping of UN/LOCODEs to Olson Timezones.
Ideally I would like to have for example:
+--------------+---------------------+
| un_locode    | timezone            |
+--------------+---------------------+
| USLAX        | America/Los_Angeles |
+--------------+---------------------+
for every UN/LOCODE.
Are my newbie skills failing me in understanding how to use these sources to reach my goal? (If so, please point me towards the scripting that would let me automate these mappings.)
Or, do these sources fail to have the data correlation that I'm looking for? (If so please let me know if you have a reliable source).
We faced the exact same problem and hence had to provide a solution.
This solution involves linking the UN/LOCODES database with a geolocation/timezone database.
There are a few caveats to this approach that were captured by Matt Johnson's answer and the accompanying comments.
Namely:
the UN/LOCODE database of coordinates is not complete[1] and sometimes has inaccurate data[2]
in some cases, a 1 to 1 mapping between the UN/LOCODE and a timezone is impossible due to the political nature of the timezones.
the two points above are worsened by the inaccuracy of free coordinates-to-timezone databases. It helps to get a dataset that also includes territorial waters, so that ports' timezones can be properly linked to the country they belong to.
The following repository https://github.com/Portchain/un_locodes_sql contains the code to extract and link the data. It outputs a SQL file that can be imported into a PostgreSQL DB.
The geolocation/timezone data is based on the geo-tz[3] module which seems to source its data from timezone-boundary-builder[4].
Again, the list provided by our repository is of course incomplete and inaccurate. If you see any error in the data, please open a github issue and let's make an accurate, open source list of UN/LOCODE, coordinates and timezone information.
[1] For example, both Los Angeles and San Francisco, USA (USLAX & USSFO) are missing coordinates in the UN/LOCODE database.
[2] The petroleum port of Abu al Bukhoosh (AEABU) is situated in Abu Dhabi (UAE). Its coordinates in the UN/LOCODE database position the port right in the middle of the Persian Gulf (https://www.port-directory.com/ports/abu_al_bukhoosh/). When resolved, this causes the timezone to be unknown.
[3] https://github.com/evansiroky/node-geo-tz
[4] https://github.com/evansiroky/timezone-boundary-builder
The GeoNames free database of cities (which is available to download) provides: city names, latitude/longitude and, most importantly, timezone information. You can fairly quickly make your own database connecting this information with the UN/LOCODE code lists based on the name/country/coordinates.
I've not seen such a source. You could try to create one by mapping the lat/lon coordinates for those entries that have them, and correlating to IANA time zone by one of the methods listed here.
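For the entries that do have coordinates, UN/LOCODE stores them as degrees and minutes with a hemisphere letter (for example `4230N 00131E`), so they need converting to decimal degrees before a coordinates-to-timezone lookup such as geo-tz. A minimal Ruby sketch; `parse_locode_coordinates` is a hypothetical helper, and the `ddmmN dddmmE` layout is my understanding of the format, so check it against the file you download:

```ruby
# Convert a UN/LOCODE coordinate string such as "3403N 11815W" to decimal degrees.
# Latitude is ddmm + N/S, longitude is dddmm + E/W. Returns nil for blank or malformed fields.
def parse_locode_coordinates(field)
  m = field.to_s.strip.match(/\A(\d{2})(\d{2})([NS])\s+(\d{3})(\d{2})([EW])\z/)
  return nil unless m
  lat = m[1].to_i + m[2].to_i / 60.0
  lon = m[4].to_i + m[5].to_i / 60.0
  lat = -lat if m[3] == "S"
  lon = -lon if m[6] == "W"
  [lat, lon]
end
```

Keep the precision in mind: one minute of latitude is roughly one nautical mile, which can matter for ports near a time zone boundary or offshore.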
However, be sure to read Wikipedia's article about UN/LOCODE, especially the part describing errors in the coordinates. Also note that many entries simply have no coordinates in the data; why, I don't know.
The list of UN/LOCODEs for the US is here, and it shows Los Angeles as US LAX (not UNLAX). Its coordinates field is blank.
If you can find some other reliable source of UN/LOCODE to lat/lon, then you are in business. A quick search found that GeoNames claims to have this in their premium data subscription, but I haven't investigated further.
CLDR's map is here: https://unicode.org/reports/tr35/#Time_Zone_Identifiers
I saw CLDR tagged but not mentioned.

Telephone Number to Geolocation UK

Is there a service that provides latitude and longitude for UK phone numbers?
For example:
Query: 0141 574 xxx, Returns: (55.8659829, -4.2602205) [Glasgow City Centre]
Allow me to stress that I am not looking for reverse directory enquiries. I am more interested in the 'local area', for things like weather by phone or "Where's my nearest pizza shop?"
If this service doesn't exist your suggestions on how to implement it or where to get data from would also be incredibly useful.
I am aware that Ofcom provides a list of area codes with a place name [1] suitable for geolocation, but I have my concerns about resolution. I see this as a particular problem in smaller towns and rural areas where an area code will cover a large geographical area.
Second Example:
Area Code: 01555, Ofcom: Lanark
However:
01555 860xxx is Crossford (4 miles W of Lanark)
01555 77xxxx is Carluke (5 miles NW)
01555 89xxxx is Lesmahagow (5 miles SW)
01555 840xxx is Carnwath (7 miles NE)
Therefore 01555 covers roughly 80 sq miles. That's not particularly local.
[1] Ofcom Area Code Tool: http://www.ofcom.org.uk/consumer/2009/09/telephone-area-codes-tool/
You can get a reasonable location for numbers allocated to BT.
The "L" digits map to a particular exchange within that area:
(02X) LLLL XXXX (2+8)
(011X) LLL XXXX (3+7)
(01X1) LLL XXXX (3+7)
(01XXX) LLXXXX (4+6)
(01XXX) LLXXX (4+5)
(01XXXX) LXXXX (5+5)
(01XXXX) LXXX (5+4)
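The layouts above can be turned into a lookup key for BT numbers: take the area code plus its "L" digits, then look the result up in an exchange-to-location table that you would need to source separately. A minimal Ruby sketch; because 5- and 6-digit area codes (01XXX vs 01XXXX) cannot be told apart from the digits alone, it assumes the caller supplies the set of 6-digit area codes, and `exchange_key` is a hypothetical name:

```ruby
# Returns "area code + L digits" for a UK 01/02 geographic number, following the layouts above:
#   02X   -> 4 L digits      011X / 01X1 -> 3 L digits
#   01XXX -> 2 L digits      01XXXX      -> 1 L digit
# six_digit_codes: collection of 6-digit area codes (e.g. "016977"), supplied by the caller.
def exchange_key(number, six_digit_codes)
  digits = number.gsub(/\D/, "")
  return nil unless digits.match?(/\A0[12]/) # only 01/02 geographic numbers
  area_len, l_len =
    if digits.start_with?("02") then [3, 4]
    elsif digits.match?(/\A01(1\d|\d1)/) then [4, 3]
    elsif six_digit_codes.include?(digits[0, 6]) then [6, 1]
    else [5, 2]
    end
  digits[0, area_len + l_len]
end
```

With the second example from the question, `exchange_key("01555 860123", six_digit_codes)` yields "0155586", the key you would map to Crossford. The caveats about cable operators, VoIP and ported numbers still apply.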
For cable providers (especially those using fibre optic delivery), there is sometimes only one exchange per area code and therefore the numbers in each LL range cover the entire area code.
For numbers allocated to other providers there's a similar problem. Additionally, those numbers may be allocated as VoIP and in use in another area or even in a completely different country. For non-BT numbers, location data cannot be relied on.
For people who have moved and kept their number, location data will also be inaccurate.
That said, CodeLook does a reasonable job of showing the right data: http://www.telecom-tariffs.co.uk/codelook.htm
You may have a problem in that not all number ranges after area codes are geographic. Some have been block-allocated to cable providers. My own number has belonged both to me and to a person who lived about 5 miles northeast of my current location; the link is that we are with the same cable provider.
What sort of telephone numbers are they? If they are businesses, what about searching for the whole number using, say, Google's API, and lifting the actual address from the page? I know that's harder than it sounds; I'm just exploring some possibilities. ;-)

Is there a formula to convert from Thomas Bros Map page & grid to a latitude/longitude?

I'm working on a project that contains Thomas Brothers Map page and grid numbers. Is there a way to programmatically convert from a map page and grid to a latitude and longitude?
An example would be the intersection of the US 101 and I-405 freeways.
ThomasBrothers: 561-3G (page-grid)
Not that I know of, but I don't have a lot of experience with Thomas Bros. maps. Are you talking about the printed versions of the maps, or is there a link somewhere to an online map?
If you just need a few lat/longs, then you can look up the locations that correspond to the grid and get the lats and longs manually at many websites, including http://itouchmap.com/latlong.html
If you provide a link to a Thomas Bros. map that you are using, I might be able to help further.
By looking at the link above, you can determine that US 101 and I-405 has a latitude of 34.16073390017978 and a longitude of -118.46952438354492.
Your best source would be the map publisher. If they choose to help, someone there can tell you exactly what you need to know. If they won't help you, it's unlikely that they've released the information to anyone else.
If that's the case, you could do some work by hand to correlate one point from the map grid to your target coordinate system. Effectively, you could reverse engineer a mapping "datum" for each page. You'd also have to know what map projection was used to render the maps, so that you can calculate the transform from the map coordinates to the geographic coordinates as you move away from your "origin". Finally, you'll need to establish the orientation of the map, since different notions of "north" exist.
It sounds like the Thomas maps use a new grid for every page, rather than bleeding the grid continuously from page to page. If that's the case, you'll have to correlate one point on each map. For example, find a spot where a map grid intersection coincides with a notable road intersection. Then you can find the coordinates of the road intersection using a map with latitude and longitude (a topographic map, TerraServer, etc.). Doing this with two points on the same vertical grid line should help you establish the north used on the map as well.
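The two-point calibration described above can be sketched as a 2D similarity transform (uniform scale, rotation, translation). This is only an approximation that ignores projection distortion across a page, and it assumes both the map grid coordinates and the target coordinates are planar (for example easting/northing in a projected system, not raw lat/lon). `SimilarityFit` is a hypothetical name; Ruby's built-in `Complex` keeps the fit to two lines:

```ruby
# Fit w = a*z + b from two control points, with z = map grid (x, y) and w = target (x, y),
# both treated as complex numbers x + yi. |a| is the scale; arg(a) is the rotation,
# i.e. how far the map's "north" is turned relative to the target system's.
class SimilarityFit
  attr_reader :scale, :rotation_degrees

  def initialize(grid1, target1, grid2, target2)
    z1, z2 = to_complex(grid1), to_complex(grid2)
    w1, w2 = to_complex(target1), to_complex(target2)
    raise ArgumentError, "control points must differ" if z1 == z2
    @a = (w2 - w1) / (z2 - z1) # scale and rotation
    @b = w1 - @a * z1          # translation
    @scale = @a.abs
    @rotation_degrees = @a.arg * 180 / Math::PI
  end

  # Map a grid point (x, y) to target coordinates [x, y].
  def apply(x, y)
    w = @a * to_complex([x, y]) + @b
    [w.real, w.imaginary]
  end

  private

  def to_complex(point)
    Complex(point[0].to_f, point[1].to_f)
  end
end
```

Fitting with two points on the same vertical grid line, as suggested above, makes `rotation_degrees` read directly as the angle between the map's vertical and the target system's north, plus 90 degrees.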
The short answer is that each of the nine regions has a grid derived from a Lambert conformal conic projection with custom parameters, so you cannot write a conversion program without the parameters.
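To make that concrete, here is the standard spherical form of the Lambert conformal conic projection (as given in Snyder's formulas) in Ruby. It is only a sketch: the Thomas Bros. parameters are unknown, so the standard parallels, origin, radius and false easting/northing are placeholders you would have to estimate, for example by fitting them to matching TBXY and lat/lon pairs. A production version would more likely use the ellipsoidal formulas. `SphericalLCC` is a hypothetical name.

```ruby
# Spherical Lambert conformal conic projection (forward and inverse).
class SphericalLCC
  DEG = Math::PI / 180

  def initialize(lat1:, lat2:, lat0:, lon0:, radius:, false_easting: 0.0, false_northing: 0.0)
    p1 = lat1 * DEG
    p2 = lat2 * DEG
    @lon0 = lon0 * DEG
    @fe = false_easting
    @fn = false_northing
    # Cone constant n: sin(lat1) for a single standard parallel, else the two-parallel formula.
    @n = if (lat1 - lat2).abs < 1e-12
           Math.sin(p1)
         else
           Math.log(Math.cos(p1) / Math.cos(p2)) /
             Math.log(Math.tan(Math::PI / 4 + p2 / 2) / Math.tan(Math::PI / 4 + p1 / 2))
         end
    # R * F, where F = cos(lat1) * tan^n(pi/4 + lat1/2) / n
    @rf = radius * Math.cos(p1) * Math.tan(Math::PI / 4 + p1 / 2)**@n / @n
    @rho0 = rho(lat0 * DEG)
  end

  # lat/lon in decimal degrees -> [x, y] in the units of `radius` (e.g. feet)
  def forward(lat, lon)
    r = rho(lat * DEG)
    theta = @n * (lon * DEG - @lon0)
    [@fe + r * Math.sin(theta), @fn + @rho0 - r * Math.cos(theta)]
  end

  # [x, y] -> [lat, lon] in decimal degrees
  def inverse(x, y)
    sign = @n.negative? ? -1.0 : 1.0
    dx = x - @fe
    dy = @rho0 - (y - @fn)
    r = sign * Math.hypot(dx, dy)
    theta = Math.atan2(sign * dx, sign * dy)
    phi = 2 * Math.atan((@rf / r)**(1.0 / @n)) - Math::PI / 2
    [phi / DEG, (theta / @n + @lon0) / DEG]
  end

  private

  # Radius of the parallel at latitude phi (radians) on the projected cone.
  def rho(phi)
    @rf / Math.tan(Math::PI / 4 + phi / 2)**@n
  end
end
```

With guessed parameters, the inverse turns a grid coordinate into lat/lon; comparing the output against known points tells you how far off the guesses are.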
I've also got Thomas Bros. pages that I would like to convert to lat/long for lookup against the Google Maps API. They also provide something called TBXY ... not sure what this is; perhaps some notation for GPS/lat/long?
<Area>"El Cajon"</Area>
<ThomasBrothers>"1297 5E"</ThomasBrothers>
<TBXY>"6481390:1827008"</TBXY>
Thomas Brothers Maps invested a lot when developing their GIS system to create their digital mapping system. Though the first "digitally produced" map was Sacramento County-1990, the development began back in 1986. I expect that their map projection equations are a well-guarded trade secret, which Rand McNally now owns. I don't know those equations, but would also like to know them.
There are 9 projections covering the 48 states. If you know the equations for Los Angeles, it is valid across California & Nevada. Oregon & Washington have their own projection. Arizona, New Mexico, Colorado, and Utah share another projection.
I do know this...
As many know, the page grid is an exact 1/2 mile square, or 2640 feet by 2640 feet. The coordinate measurement unit is 1 foot.
To determine the Thomas Brothers XY Coordinate, get one or more of the Thomas Guide CD-ROM maps, which were recently discontinued. The last ones produced for certain California counties were the 2008 edition. Last editions for Seattle, Portland, Las Vegas, and Phoenix/Tucson were the 2007 edition. Each is still available on the Rand McNally website for $20.
When you geocode a group of addresses, you'll see an output file with the TGXY coordinates and Lat/Lon for the addresses you specified, and the page # and grid that each point is in. Once that file is open, you can click on the map to add additional geocoded points, which will also give both sets of coordinates. The output file is saved in an Access database ".mdb" file.
If you know a lot about map projections or solid geometry, the set of corresponding TGXY and Lat/Lon coordinates will provide you some good data for testing.
As you mentioned San Diego Page 1297, I'll provide its bordering coordinates.
West x=3062760
East x=3086520
North y=0985040
South y=0966560
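Combining the 2640-foot grid with the page 1297 bounds above gives a consistency check and a way to get TB coordinates for a page-grid reference: the page spans 23,760 ft (9 cells) east-west and 18,480 ft (7 rows) north-south. A minimal Ruby sketch, assuming rows are numbered 1 to 7 from north to south and columns are lettered west to east as A to H, then J (skipping I); both the numbering direction and the lettering are my assumptions, and `grid_cell_center` is a hypothetical helper:

```ruby
CELL_FT = 2640 # one page grid cell is half a mile square; coordinates are in feet
COLUMNS = %w[A B C D E F G H J].freeze # assumed lettering, west to east (no "I")

# Returns the TB [x, y] of the center of a grid cell such as "5E" or "3G".
# page: hash with :west, :east, :north, :south page bounds in feet.
def grid_cell_center(page, cell)
  m = cell.upcase.match(/\A(\d)([A-Z])\z/) or raise ArgumentError, "bad cell #{cell}"
  row = m[1].to_i
  col = COLUMNS.index(m[2]) or raise ArgumentError, "bad column #{m[2]}"
  cols = (page[:east] - page[:west]) / CELL_FT
  rows = (page[:north] - page[:south]) / CELL_FT
  raise ArgumentError, "cell #{cell} outside page" unless row.between?(1, rows) && col < cols
  [page[:west] + col * CELL_FT + CELL_FT / 2,
   page[:north] - (row - 1) * CELL_FT - CELL_FT / 2]
end
```

For the "1297 5E" reference in the question, this gives [3074640, 973160] under those assumptions; pairing such points with lat/lon from the CD-ROM output is the kind of test data described above.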
This is not in range of the "TBXY" you found on Google. Maybe it's the same projection, with a relocated origin.
