One of the parameters my iOS application must meet is searching in full or partially by address, city, state, and zipcode.
I cannot depend on users using commas to separate the data. I also cannot scan a string for a zip code, since the street number could potentially be 5 digits.
I was wondering what standard practice was used to analyze this sort of input. Any help or reference would be greatly appreciated.
One (slightly inefficient) way of doing this would be to make a call to Google's Geocoding API. You'll get results with address component types, such as street_address, postal_code, administrative_area_level_n, etc.
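As a rough sketch of what working with those component types could look like, the snippet below indexes the components of a response by type. The sample payload is hand-written for illustration (the address and values are made up), and the field names `results`, `address_components`, `short_name` and `types` are assumed to follow the documented geocoding response layout, so check them against a real response before relying on this.

```python
# Hand-written sample shaped like a geocoding response; the values are made up.
SAMPLE_RESPONSE = {
    "status": "OK",
    "results": [
        {
            "address_components": [
                {"long_name": "128", "short_name": "128", "types": ["street_number"]},
                {"long_name": "East Beaumont Street", "short_name": "E Beaumont St", "types": ["route"]},
                {"long_name": "Springfield", "short_name": "Springfield", "types": ["locality", "political"]},
                {"long_name": "Illinois", "short_name": "IL", "types": ["administrative_area_level_1", "political"]},
                {"long_name": "62701", "short_name": "62701", "types": ["postal_code"]},
            ]
        }
    ],
}


def components_by_type(response):
    """Map each component type in the first result to its short name."""
    if response.get("status") != "OK" or not response.get("results"):
        return {}
    found = {}
    for component in response["results"][0]["address_components"]:
        for kind in component["types"]:
            # Keep the first component seen for each type.
            found.setdefault(kind, component["short_name"])
    return found


parts = components_by_type(SAMPLE_RESPONSE)
print(parts.get("street_number"), parts.get("route"), parts.get("postal_code"))
```

Because the service labels each piece, the 5-digit street number versus zip code ambiguity is resolved by the `types` list rather than by position in the string.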
Hope this helps!
I want to be able to run queries locally comparing latitude and longitude of locations so I can run queries for certain addresses I've captured based on distance.
I found a free database that has this information for zip codes but I want this information for more specific addresses. I've looked at google's geolocation service and it appears it's against the TOS to store these values in my database or to use them for anything other than doing stuff with google maps. (If somebody's looked deeper into this and I'm incorrect let me know)
Am I likely to find any (free or pay) service that will let me store these lat/lon values locally? The number of addresses I need is currently pretty small but if my site becomes popular it could expand quite a bit over time to a large number. I just need to get the coordinates of each address entered once though.
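For the local distance comparison itself, once the coordinates are stored, a common approach is the haversine great-circle formula. This is only a minimal sketch: the function names and the `places` dictionary are hypothetical, and it treats the Earth as a sphere with a mean radius of 6371 km, which is usually accurate enough for "within N km" queries.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(origin, places, radius_km):
    """Return the names of stored places within radius_km of origin (lat, lon)."""
    return [
        name
        for name, (lat, lon) in places.items()
        if haversine_km(origin[0], origin[1], lat, lon) <= radius_km
    ]
```

With a large table you would normally pre-filter by a bounding box in the database and only run the exact distance check on the remaining rows.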
This question hasn't received enough attention...
You're correct -- it can't be done with Google's service and still conform to the TOS. Cheers to you for honestly seeking to comply with the TOS.
I work at a company called SmartyStreets where we process and verify addresses -- and geocode them, too. Google's terms don't allow you to store the data returned from the API, and there are pretty strict usage limits before they throttle or cut off your access.
Screen scraping presents many challenges and problems which are both technical and ethical, and I don't suppose I'll get into them here. The Microsoft library linked to by Giorgio is for .NET only.
If you're still serious about doing this, we have a service called LiveAddress which is accessible from any platform or language. It's a RESTful API which can be called using GET or POST for example, and the output is JSON which is easy to parse in pretty much every common language/platform.
Our terms allow you to store the data you collect as long as you don't re-manufacture our product or build your own database in an attempt to duplicate ours (or something of the like). For what you've described, though, it shouldn't be a problem.
Let me know if you have further questions about address geocoding; I'll be happy to help.
By the way, there's some sample code at our GitHub repo: https://github.com/smartystreets/LiveAddressSamples
If you just need to get them once, you could use a screen scraper against http://www.zip-info.com/cgi-local/zipsrch.exe?ll=ll&zip=13206&Go=Go
Also Microsoft provides this service. Check if this can help you http://msdn.microsoft.com/en-us/library/cc966913.aspx
Are there any open source/commercial libraries out there that can detect mailing addresses in text, just like how Apple's Mail app underlines addresses on the Mac/iPhone?
I've been doing a little online research and the ideas seem to be either to use Google, regex, or a full-on NLP package such as Stanford's NLP, which are usually pretty massive. I doubt the iPhone has a 500MB NLP package in there, or connects to Google every time you read an email, which leads me to believe there should be an easier way. Too bad UIDataDetectors is not open source.
I know this question has been asked before, but there were no conclusive answers, so here's my try.
As for Python you can try Pyap:
https://pypi.python.org/pypi/pyap
It currently supports US and Canadian addresses
Parsing addresses isn't a science. At my office we have been dealing with address parsing for years, and the problem is that there aren't any rules about what constitutes a valid address. We use the USPS address database for cleaning addresses, which is actually pretty fast and way more accurate than we were ever able to get on our own. It gets us 98% accuracy, whereas before we got about 90% cleaned addresses.
The bigger problem with address parsing tends to be that people don't input the address the same way. The same address might be in all the following forms.
128 E Beaumont St
128 East Beaumont Street
128 E Bmt St
128 Beaumont Street
128 Highway 88
The third one looks totally wrong, but people will type that sometimes. Sometimes a street is also a highway. There are a bunch of possibilities. Just try to catch 90% and accept that this is as good as it gets for address parsing.
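To illustrate how the first two forms can be made to compare equal, here is a small sketch that expands a handful of abbreviations before comparing. The table is a tiny hypothetical subset (the USPS publishes a much longer list), and it is deliberately naive: "St" can also mean "Saint", and nothing here would recover "Bmt" or a missing direction.

```python
import re

# Tiny illustrative table; a real one would be far longer.
ABBREVIATIONS = {
    "e": "east", "w": "west", "n": "north", "s": "south",
    "st": "street", "ave": "avenue", "rd": "road", "hwy": "highway",
}


def normalize_street(line):
    """Lowercase, drop punctuation, and expand known abbreviations token by token."""
    tokens = re.findall(r"[a-z0-9]+", line.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


same = normalize_street("128 E Beaumont St") == normalize_street("128 East Beaumont Street")
print(same)
```

Anything this does not catch is where comparing against an authoritative address database earns its keep.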
Extractiv provides commercial NLP powered by Language Computer Corporation that can parse entities and relations in either uploaded documents or web crawls. The former service utilizes a REST API. I dropped this URL in, and it extracted 4 of the 5 addresses. Note that having them strung together like that makes them especially difficult.
Search for "address" in this JSON output:
http://rest.extractiv.com/extractiv/?url=https://stackoverflow.com/questions/5099684/detect-parse-mailing-addresses-in-text&output_format=json
One of them:
{
"id": 11,
"len": 17,
"offset": 1557,
"text": "128 E Beaumont St",
"type": "ADDRESS"
},
(Note: if you use the HTML output, which is more for demos, it filters out non-sentence content, which is why I showed the JSON instead).
Disclaimer: I work at Extractiv.
Update:
Extractiv is no more.
You can actually get extremely high accuracy, as Drew mentioned, by extracting the addresses and then comparing them against the USPS data. Getting a DVD from the USPS yearly will certainly work but doesn't factor in the addresses that change. For that, you would want a more up-to-date version. The USPS publishes its updated address data (in a proprietary format) monthly, so that would be a good source of authoritative addresses.
On top of that, using an address validation service (after you extract the address data) will standardize the addresses for you and then check them for deliverability and/or vacancy status. As Drew mentioned, the same address can be written in many different ways that still work. However, the USPS will always use the standardized format.
In order to do what you are looking for programmatically, you'll definitely want an API, although list processing services are also available.
SmartyStreets has a free address validation API called LiveAddress that will standardize, verify, and then validate any US postal address. In the interest of full disclosure, I'm the founder of SmartyStreets.
I have a (potentially international) phone number. It may or may not have a country code prefix. Does anyone know of a database that will help me map phone number => time zone? I would even just like to map phone number => country, since I can probably create country => time zone by scraping existing data on the web. This is a more complicated problem than it looks; for example, how do I know if it's a US-based number -- e.g. is it a USA area code, or an international country calling code?
Any language is fine; I can port it.
The best library I know of for parsing phone numbers in arbitrary formats is libphonenumber.
You can't map countries to time zones, for what should be obvious reasons. You and someone in California may be in the same country, but you're not necessarily in anything close to the same time zone. Other countries are even wider.
No. Your primary problem here will be that lots of countries share each time zone, e.g. UK and Ireland, or France, Germany, Italy, etc.
You may be able to guess based on number structure, but your best bet I'm afraid is to try to get the code in any new data.
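As a sketch of the "guess based on number structure" idea, the snippet below matches the leading digits of a number written in international format against a table of country calling codes. The table is a small illustrative subset and the function name is hypothetical; for real use, libphonenumber (or one of its ports) handles national formats, trunk prefixes and validation, which this does not.

```python
import re

# Small illustrative subset of country calling codes.
CALLING_CODES = {
    "1": "NANP (US, Canada and others)",
    "33": "France",
    "39": "Italy",
    "44": "United Kingdom",
    "49": "Germany",
    "353": "Ireland",
}


def guess_region(number):
    """Guess the region of a number written in international format (+...)."""
    if not number.strip().startswith("+"):
        # Without a country code the digits alone are ambiguous.
        return None
    digits = re.sub(r"\D", "", number)
    # Calling codes are one to three digits long and no code is a prefix
    # of another, so the first table hit is the answer.
    for length in (3, 2, 1):
        region = CALLING_CODES.get(digits[:length])
        if region:
            return region
    return None
```

Note that "+1" only narrows things down to the North American Numbering Plan, which spans several countries and many time zones, so the area code is still needed after that.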
Is there a good physical address to GeoLocation conversion database in the UK? I am trying to use this to build a globrix style search box http://www.globrix.com/ for a web application. Any pointers will be nice. I have been searching for hours. I have found several that convert UK Postcodes into Geolocation. But I need the addresses listed as on Globrix.
The Google Maps API provides a geocoder webservice that you can actually use independently of Google Maps itself. You send it the address/postcode, and it responds with a lat/long plus disambiguated addresses. We use it server-side in the UK to do address lookup. It's incredibly quick, too.
http://code.google.com/apis/maps/documentation/geocoding/index.html
http://www.postcodeanywhere.co.uk should be able to help with this. Alternatively, you can buy the "PAF" (Postcode Address File) from the Royal Mail, but it is expensive.
Update for information relating UK geolocations in 2020. Since 2009:
Google's Geocoder has gotten an order of magnitude more expensive in 2018. It's ~0.5c per search with no free tier
Office for National Statistics have released a free postcode directory called ONSPD. This means if you have the postcode of your address, you can resolve a geolocation accurate to the postcode centroid (this may be 10-100m or so out). There's a free public service API available at https://postcodes.io which allows you to forward or reverse geocode a postcode. There are also public docker data and application images which allow you to host this easily
If you're interested in rooftop-accurate geocodes, a change in Ordnance Survey licensing in 2020 has meant it's much simpler and cheaper to access geolocations for almost every premises in Great Britain from Ordnance Survey by combining it with Royal Mail PAF (Postcode Address File). As of September 2020, I think https://ideal-postcodes.co.uk is currently the only company to offer complete and authoritative rooftop geolocations under these new rules. It's likely other PAF vendors will catch up over the coming years.
Disclaimer: I'm the author of postcodes.io and work for ideal-postcodes.co.uk
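A minimal sketch of forward geocoding a postcode against the public API mentioned above. The endpoint root `https://api.postcodes.io/postcodes/` and the `result.latitude` / `result.longitude` fields are my assumptions about the service's layout, so confirm them against the postcodes.io documentation; the helper names are made up for this example.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed endpoint root; check the postcodes.io documentation.
API_ROOT = "https://api.postcodes.io/postcodes/"


def lookup_url(postcode):
    """Build the lookup URL, percent-encoding the space in the postcode."""
    return API_ROOT + quote(postcode.strip())


def centroid(payload):
    """Return (latitude, longitude) from a decoded response, or None if absent."""
    result = payload.get("result") or {}
    lat, lon = result.get("latitude"), result.get("longitude")
    if lat is None or lon is None:
        return None
    return (lat, lon)


def geocode_postcode(postcode):
    """Perform the network call and return the postcode centroid."""
    with urlopen(lookup_url(postcode), timeout=10) as response:
        return centroid(json.load(response))
```

Remember this gives the postcode centroid, not the rooftop, so the result can be off by the distances mentioned above.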
Given a latitude and longitude, what is the easiest way to find the name of the city and the US zip code of that location?
(This is similar to https://stackoverflow.com/questions/23572/latitude-longitude-database, except I want to convert in the opposite direction.)
Related question: Get street address at lat/long pair
Any of the online services mentioned, and their competitors, offer "reverse geocoding", which does what you ask: convert lon/lat coordinates into a street address of some sort.
If you only need the zip codes and/or cities, then I would obtain the Zip Code database and urban area database from the US Census Bureau which is FREE (paid for by your tax dollars). http://www.census.gov/geo/www/cob/zt_metadata.html.
From there, you can either come up with your own search algorithm for the spatial data or make use of one of the spatial databases such as Microsoft SQL Server, PostGIS, Oracle Spatial, ArcSDE, etc.
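If you go the "own search algorithm" route, the core piece is a point-in-polygon test. Below is a minimal ray-casting sketch; the `AREAS` table holds one hypothetical rectangular area with a made-up code, whereas real boundaries would be loaded from the Census files and a linear scan like this would need a spatial index to be fast.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting test; polygon is a list of (lon, lat) vertices, not closed."""
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray going east from the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical area with a made-up code, for illustration only.
AREAS = {"00001": [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]}


def find_area(lon, lat, areas=AREAS):
    """Return the code of the first area containing the point, or None."""
    for code, polygon in areas.items():
        if point_in_polygon(lon, lat, polygon):
            return code
    return None
```

A spatial database does the same test for you, plus the indexing, which is why it is usually the less painful option.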
Update: The 2010 Census data can be found at:
http://www2.census.gov/census_2010/
This is the web service to call.
http://developer.yahoo.com/search/local/V2/localSearch.html
This site has ok web services, but not exactly what you're asking for here.
http://www.usps.com/webtools/
You have two main options:
Use a Reverse Geocoding Service
Google's can only be used in conjunction with an embedded Google Map on the same page, so I don't recommend it unless that is what you are doing.
Yahoo has a good one, see http://developer.yahoo.com/search/local/V3/localSearch.html
I've not used OpenStreetMap's. Their maps look very detailed and thorough, and are always getting better, but I'd be worried about latency and reliability, and whether their address data is complete (address data is not directly visible on a map, and OpenStreetMap is primarily an interactive map).
Use a Map of the ZIP Codes
The US Census publishes a map of US ZIP codes here. They build this from their smallest statistical unit, a Census Block, which corresponds to a city block in most cases. For each block, they find what ZIP code is most common on that block (most blocks have only one ZIP code, but blocks near the border between ZIP codes might have more than one). They then aggregate all the blocks with a given ZIP code into a single area called a Zip Code Tabulation Area. They publish a map of those areas in ESRI shapefile format.
I know about this because I wrote a Java Library and web service that (among other things) uses this map to return the ZIP code for a given latitude and longitude. It is a commercial product, so it won't be for everyone, but it is fast, easy to use, and solves this specific problem without an API. You can read about this product here:
http://askgeo.com/database/UsZcta2010
And about all of our geographic offerings here:
http://askgeo.com
Unlike reverse geocoding solutions, which are only available as Web APIs because running your own service would be extremely difficult, you can run this library on your own server and not depend on an external resource.
If your call volume to the service gets too high, you should definitely consider getting your own set of postal data. In most cases, that will provide all of the information that you need, and there are plenty of db tools for indexing location data (e.g. PostGIS for PostgreSQL).
You can buy a fairly inexpensive subscription to zipcodes with lat and long info here: http://www.zipcodedownload.com/
Or Google's reverse geocoding:
http://maps.google.com/maps/geo?output=xml&q={0},{1}&key={2}&sensor=true&oe=utf8
where {0} is the latitude, {1} is the longitude, and {2} is your API key.
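Filling in that template is just string formatting. A small sketch, assuming the URL exactly as quoted in this answer; it is a legacy endpoint that may no longer be served, and the function name is made up for illustration.

```python
from urllib.parse import quote

# URL template copied from the answer above; placeholders are
# {0} = latitude, {1} = longitude, {2} = API key.
TEMPLATE = "http://maps.google.com/maps/geo?output=xml&q={0},{1}&key={2}&sensor=true&oe=utf8"


def reverse_geocode_url(latitude, longitude, api_key):
    """Substitute the coordinates and a percent-encoded key into the template."""
    return TEMPLATE.format(latitude, longitude, quote(api_key, safe=""))


print(reverse_geocode_url(43.07, -76.1, "YOUR_KEY"))
```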
geonames has an extensive set of ws that can handle this (among others):
http://www.geonames.org/export/web-services.html#findNearbyPostalCodes
http://www.geonames.org/export/web-services.html#findNearbyPlaceName
Another reverse geocoding provider that hasn't been listed here yet is OpenStreetMap: you can use their Nominatim search service.
OSM has the (potentially?) added bonus of being entirely user editable (wiki-like) and thus having a very liberal licencing scheme for all this data. Think of it as open source map data.
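A rough sketch of a Nominatim reverse lookup, for reference. The `/reverse` endpoint and the `format`, `lat` and `lon` parameters are what I understand the public instance to accept, and the `address` keys (`city`, `town`, `village`, `postcode`) vary by location, so treat all of them as assumptions to verify against the Nominatim documentation. The sample place name used for testing is invented. As far as I know, the public instance's usage policy also asks for an identifying User-Agent and low request rates.

```python
from urllib.parse import urlencode

# Assumed public endpoint; self-hosted instances use their own host.
REVERSE_ENDPOINT = "https://nominatim.openstreetmap.org/reverse"


def reverse_url(lat, lon):
    """Build a reverse-geocoding URL that asks for a JSON response."""
    return REVERSE_ENDPOINT + "?" + urlencode({"format": "jsonv2", "lat": lat, "lon": lon})


def city_and_postcode(payload):
    """Pull a place name and postcode out of a decoded response, if present."""
    address = payload.get("address", {})
    # Smaller settlements are reported under "town" or "village" instead of "city".
    city = address.get("city") or address.get("town") or address.get("village")
    return city, address.get("postcode")
```

Fetching `reverse_url(lat, lon)` with any HTTP client and passing the decoded JSON to `city_and_postcode` gives the city name and zip code asked for in the question, when OSM has them.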