Getting the human-readable abbreviations for admin1 level records in Geonames - geonames

For the United States, the admin1 code corresponds to the state abbreviation, which makes it very convenient to do a city, state lookup.
However, for countries like Canada where the admin1 code is a number (e.g. 01 for Alberta), a city, state lookup is no longer possible. I looked at the postal codes file for Canada, but it doesn't map the postal abbreviations to the numerical (I think FIPS) code for the province. So even though the postal code file shows AB for Alberta, it doesn't show 01 for that same record, so there's no way to correlate the records.
To add insult to injury, in the postal codes file the dataset actually lists AB in the admin1 code field even though in the other file it is entered as 01. Very annoying.
I'm wondering if there are any data files out there that link the numerical FIPS codes for a country's admin1 record to its more human-readable abbreviation.
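One possible workaround, sketched here under assumptions rather than taken from any official mapping: if both files carry the province name, the name can act as the join key between the numeric admin1 code and the abbreviation. The column positions below (an admin1 names file with `CC.code` and name, and a postal code file with the admin name in the fourth column and the abbreviation in the fifth) are assumptions about the dump layout, and the sample rows are made up apart from the Alberta 01/AB pairing mentioned above.

```python
import csv
import io

# Illustrative rows only. The geonameid "0" and the place name are made up,
# and the column positions are assumptions: check the readme of your download.
ADMIN1_SAMPLE = "CA.01\tAlberta\tAlberta\t0\n"
POSTAL_SAMPLE = "CA\tT0A\tExample Town\tAlberta\tAB\n"


def admin1_abbreviations(admin1_text, postal_text):
    """Map 'CC.admin1code' to an abbreviation by joining on (country, province name)."""
    name_to_abbrev = {}
    postal_rows = csv.reader(io.StringIO(postal_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in postal_rows:
        country, admin_name, admin_abbrev = row[0], row[3], row[4]
        name_to_abbrev[(country, admin_name.casefold())] = admin_abbrev

    result = {}
    admin1_rows = csv.reader(io.StringIO(admin1_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in admin1_rows:
        key, name = row[0], row[1]
        country = key.split(".", 1)[0]
        abbrev = name_to_abbrev.get((country, name.casefold()))
        if abbrev is not None:  # names that do not match exactly are skipped
            result[key] = abbrev
    return result


mapping = admin1_abbreviations(ADMIN1_SAMPLE, POSTAL_SAMPLE)
```

With the sample rows, `mapping` is `{"CA.01": "AB"}`. A name-based join is fragile where the two files spell a province differently, so any unmatched codes would need to be reviewed by hand.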

Related

Is there a function in Swift to get the ISO 3166-2 codes (provinces, states) for a country?

For an app I need the users to create addresses. It is important that the addresses are valid and include an ISO 3166 ALPHA-2 country code as well as an ISO 3166-2 code for the province, state, etc.
I can use NSLocale to get all country codes and the localized names for a country code, but is there a similar way to get the codes and names (they don't have to be localized) for the subdivisions of a country, or do I have to create that massive dictionary by myself? :D
Thank you in advance

Geolocation: How to derive the Country using an address/city/place?

I have a .csv file with Twitter profiles including information such as username, name, description etc. One column is geolocation. In this text the user may have a country (e.g., UK), a city or town (e.g., Cambridge), an actual address (5 Tyrian Place, WR5 TY1), a state (e.g., California, CA) or something silly (e.g., West of Hell).
Is there an API/library/automatic way of taking this information and deriving the country? For example, if the location is Cambridge the output should be UK, if the address is in the UK, the output should be UK, etc.
Google has a geocoding service which you can access through their Maps API:
https://developers.google.com/maps/documentation/geocoding/start
They let you make 2500 free requests per day. One nice feature is that it will give you the correct latitude, longitude, state, country, etc. for things like "Golden Gate Bridge" and "The Big Apple." Twitter users enter all sorts of (sarcastic) phrases for their location -- like "West of Hell," "Mars," etc. -- and Google will geocode those as well, though that may not be very useful.
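As a rough illustration of pulling the country out of such a result, here is a minimal sketch. It assumes the JSON layout of a geocoding response (a `results` list whose entries hold `address_components`, each with `types` and a `short_name`); the sample response is hand-written for the example, not real API output, and no network request is made.

```python
def country_from_geocode(response):
    """Return the short country code from a geocoding-style response, or None."""
    for result in response.get("results", []):
        for component in result.get("address_components", []):
            if "country" in component.get("types", []):
                return component.get("short_name")
    return None


# Hand-written sample shaped like a geocoding response (an assumption).
sample_response = {
    "status": "OK",
    "results": [
        {
            "address_components": [
                {"long_name": "Cambridge", "short_name": "Cambridge",
                 "types": ["locality", "political"]},
                {"long_name": "United Kingdom", "short_name": "GB",
                 "types": ["country", "political"]},
            ]
        }
    ],
}

country = country_from_geocode(sample_response)
no_match = country_from_geocode({"status": "ZERO_RESULTS", "results": []})
```

For joke locations that return no results, the function returns `None`, which makes it easy to leave those rows blank in the .csv.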
As another level of checking, you can compare the user's timezone ("utc_offset"), if it is present, to the place that Google returns. It's a bit involved and requires that you compare the timezone's latitude boundaries to the latitude and longitude in Google's response.
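A cruder stand-in for that boundary comparison, sketched under assumptions: instead of real timezone boundaries, it uses the rule of thumb that each hour of UTC offset spans about 15 degrees of longitude. It assumes `utc_offset` is given in seconds, and the function name and tolerance are made up for the example.

```python
def offset_matches_longitude(utc_offset_seconds, longitude, tolerance_hours=2.0):
    """Rough plausibility check: does the UTC offset fit the geocoded longitude?"""
    expected_hours = longitude / 15.0          # 360 degrees / 24 hours
    actual_hours = utc_offset_seconds / 3600.0
    diff = abs(actual_hours - expected_hours)
    diff = min(diff, 24.0 - diff)              # offsets wrap around the date line
    return diff <= tolerance_hours


# A profile with a UTC-8 offset geocoded to San Francisco (longitude about -122.4).
plausible = offset_matches_longitude(-8 * 3600, -122.4)
# The same offset geocoded to Cambridge, UK (longitude about 0.12).
implausible = offset_matches_longitude(-8 * 3600, 0.12)
```

Political timezones deviate a lot from the 15-degree rule, so this only flags gross mismatches; a generous tolerance keeps false alarms down.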

Get ISO 3166-1 country code as a GMSPlace result

I need to implement a city search dialogue field in my app and use the ISO 3166-1 country code of the search result.
The country in the result is returned as a string in the current locale (e.g. "United Kingdom", "Vereinigtes Königreich", ...) instead of just a code such as "uk", which is a problem. I could do a string search for all 248 countries in all languages Google Maps supports, but that is very error-prone.
Is there a way to include the country code in the search result?
My workaround for now is to reverse-look up the reported coordinate afterwards, but that's an additional request, resulting in wait time for the user.

Database, Currency Modeling, Localization strategies

What's the best strategy for storing e-commerce product information (i.e. product price, current price) in a localized-currency environment?
I came across an issue in Spree, an e-commerce engine for Ruby on Rails regarding the display of currency using localization, delimiters, precision digits, etc.
However, resolving the display of price became more complex when we had to figure out whether values stored in the database should include the localization delimiter / precision digits or be normalized. The solution involves localizing the display of the value and potentially normalizing the stored value in the database. But I'm not sure which is the standard practice: scrubbing the data to fit a "standard" precision and delimiter, or modifying the model to take in a "currency" field and keeping the input standard.
CASE STUDY:
If a product from the USA (using "en" localization file) is priced at 2.99, then it is stored in the database as 2.99.
If the site updates to be localized for Germany (using "de" localization file), then it is priced at 2,99.
But should updates to that price (and cost_price) value be stored as 2.99, or 2,99? If they're stored at 2.99 and the value is returned back to the view from the model, then the localization will modify the value to be 2,99.
I'm hesitant to standardize user input without their knowledge. Is standardizing currency values normal, or should the model change to handle multi-currency formats?
An extra issue to note is that even though the Spree engine can change localization, I don't believe it can flip by user-demand yet. So it's not technically a "multi-currency" environment, I believe? I'd like to pick a choice that can scale.
RELATED QUESTIONS:
database design: accounts with multi currency
Currency modeling in database
The issue is that you have a product, selling in different exchange-regimes with different cultures. Say it's $1,450.00 USD in America, and €1111,11 in Germany. There are two main factors:
A. There are different prices in different currencies
B. There are different ways to display a money amount in different cultures
Regarding A, you could
store in one price/currency, and adjust to different exchange rates on the fly
or adjust nightly
or just have different prices in different countries
I would go with a table of prices, segregated by currency. Updating nightly is probably reasonable:
ProductId  Currency  Price
1          EUR       1111.11
1          CAD       1436.65
1          USD       1450.00
These values should be numbers, so that you can easily do math on them if necessary. Use decimal(10,2) in your database.
Regarding B
You should format the selected price to a given culture upon display. Imagine an American paying in Euros. What do they want to see? Your output would look like this, depending on the selected culture:
Say it's 1,111.11 Euros:
Culture  Price     Long Name
de_de    1.111,11  (German)
fr_ca    1 111,11  (Quebec)
en_us    1,111.11  (US English)
It's all the same amount, just formatted differently, depending on the user's preferences.
If users are entering in different amounts, you will also have to parse their values based on the selected culture. Check out Yii's (sorry, PHP) L10N and I18N features.
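To make the format-on-display and parse-on-input idea concrete, here is a minimal sketch in Python rather than Yii. The separator table is copied from the culture examples above, and the function names are made up for illustration; a real application would more likely rely on its framework's locale data.

```python
from decimal import Decimal

# (grouping separator, decimal separator) per culture, taken from the examples above.
# fr_ca uses a plain space here; real locale data may use a no-break space instead.
SEPARATORS = {
    "de_de": (".", ","),
    "fr_ca": (" ", ","),
    "en_us": (",", "."),
}


def format_amount(amount, culture):
    """Render a Decimal with the culture's separators; the stored value is untouched."""
    group_sep, decimal_sep = SEPARATORS[culture]
    text = format(amount, ",.2f")              # neutral form, e.g. '1,111.11'
    whole, _, fraction = text.partition(".")
    return whole.replace(",", group_sep) + decimal_sep + fraction


def parse_amount(text, culture):
    """Turn culture-formatted user input back into a normalized Decimal."""
    group_sep, decimal_sep = SEPARATORS[culture]
    normalized = text.strip().replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(normalized)


stored = Decimal("1111.11")
shown_in_germany = format_amount(stored, "de_de")
entered_in_germany = parse_amount("2,99", "de_de")
```

Here `shown_in_germany` is `"1.111,11"` and `entered_in_germany` is `Decimal("2.99")`, so the database only ever sees the normalized value while each user sees their own convention.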
Notes:
Whatever you do, don't store it as a float, or you will get subtle errors over time. Use the decimal type
Consider using 4 digits after the decimal place for fields that are the result of calculations
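A short sketch of why these two notes matter, using Python's `decimal` module as a stand-in for the database decimal type. The 19% tax rate and the half-up rounding mode are assumptions chosen for the example.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats cannot represent most decimal fractions exactly.
float_total = 0.1 + 0.2                        # not exactly 0.3
decimal_total = Decimal("0.10") + Decimal("0.20")


def to_intermediate(value):
    """Keep 4 decimal places for values that are the result of calculations."""
    return value.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


def to_price(value):
    """Round to 2 decimal places only when producing the final price."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


gross_intermediate = to_intermediate(Decimal("2.99") * Decimal("1.19"))
gross_price = to_price(gross_intermediate)
```

The float sum does not compare equal to `0.3`, while the decimal sum equals `Decimal("0.30")` exactly. The intermediate gross value is `Decimal("3.5581")`, rounded to `Decimal("3.56")` only at the end.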

User input parsing - city / state / zipcode / country

I'm looking for advice on parsing input from a user in multiple combinations of City / State / Zip Code / Country.
A common example would be what Google maps does.
Some examples of input would be:
"City, State, Country"
"City, Country"
"City, Zip Code, Country"
"City, State, Zip Code"
"Zip Code"
What would be an efficient and correct way to parse this input from a user?
If you are aware of any example implementations please share :)
The first step would be to break up the text into individual tokens using spaces or commas as the delimiting characters. For scalability, you can then hand each token to a thread or server (if using a MapReduce-like architecture) to figure out what each token is. For instance,
If we have numbers in the pattern, then it's probably a zip code.
Is the item in the list of known states?
Countries are also fairly easy to handle like states, there's a limited number.
What order are the tokens in compared to the common ways of writing an address? Most input will probably follow the local post office custom for address formats.
Once you have the individual token results, you can glue the parts back together to get a full address. In the cases where there are questions, you can prompt the user what they really meant (like Google maps) and add that information to a learned list.
The easiest method to add that support to an application, assuming you're not trying to build a map system, is to query Google or Yahoo and ask them to parse the data for you.
I am myself very fascinated with how Google handles that. I do not remember seeing anything similar anywhere else.
I believe you try to separate the input string into words using various delimiters: space, comma, semicolon etc. That gives you several combinations. For each combination, you take each word and match it against a country, city, town and postal code database. Then you define some metric to evaluate the group match result for each combination. There should also be cross rules, e.g. if the postal code does not match well, but country, city and town match well and in combination refer to a valid address, then the metric yields a high mark.
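A toy sketch of that scoring idea, under heavy assumptions: the gazetteers are tiny made-up sets, the metric is simply the fraction of recognized tokens, and the only cross rule is a bonus when a matched city belongs to a matched country. This is a guess at the general shape, not how Google does it.

```python
import re

# Tiny hypothetical gazetteers; a real system would query a places database.
COUNTRIES = {"uk", "usa", "canada"}
CITIES = {"cambridge": "uk", "san francisco": "usa", "calgary": "canada"}
POSTAL_CODE = re.compile(r"^\d{5}$")           # US-style ZIP only, for brevity


def classify(token):
    """Label one token as country, city or postal_code, or None if unknown."""
    key = token.strip().lower()
    if key in COUNTRIES:
        return "country"
    if key in CITIES:
        return "city"
    if POSTAL_CODE.match(key):
        return "postal_code"
    return None


def score_split(tokens):
    """Score one candidate split: share of recognized tokens plus a cross-rule bonus."""
    labels = [classify(token) for token in tokens]
    score = sum(1 for label in labels if label is not None) / len(tokens)
    keys = [token.strip().lower() for token in tokens]
    cities = [k for k, label in zip(keys, labels) if label == "city"]
    countries = {k for k, label in zip(keys, labels) if label == "country"}
    if any(CITIES[city] in countries for city in cities):
        score += 0.5                           # city and country agree with each other
    return score, labels


def best_split(text, delimiters=(",", ";", " ")):
    """Try each delimiter and keep the highest-scoring interpretation."""
    best = None
    for delimiter in delimiters:
        tokens = [t.strip() for t in text.split(delimiter) if t.strip()]
        if not tokens:
            continue
        score, labels = score_split(tokens)
        if best is None or score > best[0]:
            best = (score, tokens, labels)
    return best


score, tokens, labels = best_split("San Francisco, USA")
```

For this input the comma split wins with tokens `["San Francisco", "USA"]` labeled `["city", "country"]`, because splitting on spaces breaks the city name apart and scores lower.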
It is sure difficult and not an evening code exercise. It also requires strong computational resources - a shared hosting would probably crack under just 10 requests, but a data center could serve it well.
Not sure if there is an example implementation. Many geographical services are offered on a paid basis. Something as sophisticated as Google Maps would likely cost a fortune.
Correct me if I'm wrong.
I found a simple PHP implementation
http://www.eotz.com/2008/07/parsing-location-string-php/
Yahoo seems to have a webservice that offers the functionality (sort of)
http://developer.yahoo.com/geo/placemaker/
Openstreetmap seems to offer the same search functionality on its homepage
http://www.openstreetmap.org/
Assuming you're only dealing with those four fields (City, Zip, State, Country), there are finite values for all fields except City, and even City is finite if you have a big enough city list. So just split the input on commas, then check each piece against each field's list.
Assuming we're talking US addresses:
Zip is most obvious, so check for that first.
State has 50x2 options (California or CA), check that next.
Country has ~190x2 options, depending on how encompassing you want to be (US, United States, USA).
Whatever is left over is probably your City.
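The elimination order above can be sketched as follows. The state and country lookups are deliberately abbreviated, hypothetical stand-ins for the full lists, and the function name is made up for the example.

```python
import re

# Abbreviated, hypothetical lookup tables; real ones would hold all states and countries.
STATES = {"california": "CA", "ca": "CA", "new york": "NY", "ny": "NY"}
COUNTRIES = {"us": "US", "usa": "US", "united states": "US"}
ZIP = re.compile(r"^\d{5}(-\d{4})?$")          # 12345 or 12345-6789


def parse_location(text):
    """Split on commas, then claim zip, state and country; the rest is the city."""
    result = {"city": None, "state": None, "zip": None, "country": None}
    leftovers = []
    for part in (piece.strip() for piece in text.split(",")):
        if not part:
            continue
        key = part.lower()
        if result["zip"] is None and ZIP.match(part):
            result["zip"] = part
        elif result["state"] is None and key in STATES:
            result["state"] = STATES[key]
        elif result["country"] is None and key in COUNTRIES:
            result["country"] = COUNTRIES[key]
        else:
            leftovers.append(part)
    if leftovers:
        result["city"] = " ".join(leftovers)
    return result


parsed = parse_location("San Francisco, CA, 94105, USA")
```

Here `parsed` comes back with city `"San Francisco"`, state `"CA"`, zip `"94105"` and country `"US"`. Ambiguous tokens are the weak spot: "CA" could also mean Canada, which is why checking state before country only makes sense under the US-only assumption.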
As far as efficiency goes, it might make sense to check a handful of 'standard' formats first, like Dan suggests.
