Information for localization

Is there a list with complete information about characteristics like:
currency
date and time (including if it is 12 or 24 hours) format
measurement units (distance, speed, temperature...)
preferred language
masks for phone and local documents
timezones (at least the main ones / variations if daylight saving time is applicable)
decimal and thousand separators
for countries around the world?
I am compiling this myself, but gathering the data takes so long that I thought someone might already have done it.

Don't reinvent the wheel.
Start with CLDR, the Common Locale Data Repository (http://cldr.unicode.org/).
Or, if you want to honor the locale preferences in your application, use standard I18N APIs (from your platform, whatever that is, or from a popular library like ICU, http://site.icu-project.org/).

For currencies you can rely on the international standard ISO 4217. It also lists the country associated with each currency code. This website provides this dataset for download.
For date formats, the best reference seems to be Wikipedia.
Measurement units are a very complex domain, because you need to know which dimension you are measuring (speed, distance, volume, ...) and which units apply (paper size in cm is not the same as road distance in km). Here you have some lists per type of unit, but not per country. This website shows a list of the systems of measurement in use per country. You'll see that, fortunately, many of them share the metric system, so you could take a "by exception" approach and document only the remaining ones.
For languages, you have the international standard ISO 639 or IANA, but these are country independent. You can look at reference lists for locales such as here: they associate a language code with a country code, so that you can complete the standard information. Note that some countries have several languages, and you cannot and should not decide which one is preferred.
For telephone masks, there is only an international list of prefixes. Usage varies greatly across countries. Some have a fixed format, some use variable formats, some have zone prefixes and some do not. Sometimes there is not even a clear standard within a country, and several usages coexist. I'm not aware of any global list of these.
For timezones around the world, you could have a look at IANA, which is extremely comprehensive.
For decimal and thousand separators, there is no international standard. Again, I'd suggest referring to Wikipedia.

Related

How can I change the formatter's decimal separator in Rust?

The expression below produces "10.000". Where I live, this means "ten thousand".
format!("{:.3}", 10.0);
I would like the output to be "10,000".

There is no support for internationalization (i18n) or localization (l10n) baked into the Rust standard library.
There are several reasons, in no particular order:
a locale-dependent output should be a conscious choice, not a default,
i18n and l10n are much more complicated than just formatting numbers,
the Rust std aims at being small.
The format! machinery is going to be used to write JSON or XML files. You really do NOT want to end up with a differently formatted file depending on the locale of the machine that encoded it. It's a recipe for disaster.
The detection of locale at run-time is also optimization unfriendly. Suddenly you cannot pre-compute things at compile-time (even partially), you cannot even know which size of buffer to allocate at compile-time.
And this ties in with a dubious usefulness. Dates and numbers are arguably important, however this American vs English formatting war is ultimately a drop in the ocean. A French grammar schooler will certainly appreciate that the number is formatted in the typical French format... but it will be of no avail to her if the surrounding text is in English (we French are notoriously bad at teaching/learning foreign languages). Locale should influence language selection, sorting order, etc... merely changing the format of numbers is pointless, everything should switch with it, and this requires much more serious support (check gettext for a C library that provides a good base).
Basing the detection of the locale on the host locale, and it being global to the whole process, is also a very dubious architectural choice in this age of multi-threaded web servers. Imagine if Facebook was served in Swedish in Europe just because its datacenter is running there.
Finally, all this language/date/... support requires a humongous amount of data. ICU has several dozens (or is it hundreds?) of MBs of such data embedded inside it. This would make the size of the std explode, and make it completely unsuitable for embedded development, which probably does not care about this anyway.
Of course, you could cut down on this significantly if you only chose to support a handful of languages... which is yet another argument for putting this outside the standard library.

Since the standard library doesn't have this functionality (localization of number format), you can just replace the dot with a comma:
fn main() {
println!("{}", format!("{:.3}", 10.0).replacen(".", ",", 1));
}
There are other ways of doing this, but this is probably the most straightforward solution.
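If you also need thousands grouping, or want to pass the separators explicitly per request instead of relying on anything global, the same idea can be taken one step further. This is only a minimal sketch: the `Separators` struct and the `format_number` function are hypothetical names, not part of any library, and grouping by three digits is an assumption that does not hold for every locale (Indian-style grouping, for instance, works differently).

```rust
/// Separators for one locale, passed in explicitly rather than read from the host.
/// (Hypothetical type, for illustration only.)
#[derive(Clone, Copy)]
struct Separators {
    decimal: char,
    group: char,
}

/// Formats `value` with `precision` fractional digits using the given separators.
/// Assumes a finite value and groups the integer part in blocks of three digits.
fn format_number(value: f64, precision: usize, sep: Separators) -> String {
    // Let the standard formatter do the rounding, then rewrite the separators.
    let plain = format!("{:.*}", precision, value);
    let (sign, digits) = match plain.strip_prefix('-') {
        Some(rest) => ("-", rest),
        None => ("", plain.as_str()),
    };
    let (int_part, frac_part) = match digits.split_once('.') {
        Some((int_part, frac_part)) => (int_part, Some(frac_part)),
        None => (digits, None),
    };

    let mut out = String::from(sign);
    let len = int_part.len();
    for (i, ch) in int_part.chars().enumerate() {
        // Insert a group separator before every block of three remaining digits.
        if i > 0 && (len - i) % 3 == 0 {
            out.push(sep.group);
        }
        out.push(ch);
    }
    if let Some(frac) = frac_part {
        out.push(sep.decimal);
        out.push_str(frac);
    }
    out
}

fn main() {
    let continental = Separators { decimal: ',', group: '.' };
    println!("{}", format_number(10.0, 3, continental));
    println!("{}", format_number(1234567.891, 2, continental));
}
```

Because the separators are an argument, two requests handled by the same process can be formatted differently, which sidesteps the process-global locale problem described in the other answer.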

This is not the role of the format! macro. This option should be handled by Rust itself. Unfortunately, my search led me to the conclusion that Rust doesn't handle locales (yet?).
There is a library, rust-locale, but it is still in alpha.

Common sense when storing currencies?

After reading up on how best to handle users in multiple timezones, I've learned that the way to go is to store all dates in a normalized, application-wide timezone (UTC) and then apply the offset between the normalized timezone and the individual user's timezone when outputting. Today I wondered whether it would be appropriate to apply the same approach to handling currency in software:
All stored amounts are converted to an application-wide currency, let's say EUR (€), and when outputting, the amount is converted back into the user's own currency at the exchange rate of the day?
What's common sense here? How is this generally solved and what should I be aware of before choosing a way to handle this?

One standard approach is to store both an amount and a currency whenever monetary values are held and manipulated.
See the Money Pattern in Martin Fowler's Patterns of Enterprise Application Architecture.
Fowler describes defining a simple datatype to hold the two primitive components, with overloaded arithmetical operators for performing monetary operations:
"The basic idea is to have a Money class with fields for the numeric
amount and the currency. You can store the amount as either an
integral type or a fixed decimal type. The decimal type is easier for
some manipulations, the integral for others. You should absolutely
avoid any kind of floating point type, as that will introduce the kind
of rounding problems that Money is intended to avoid. Most of the time
people want monetary values rounded to the smallest complete unit,
such as cents in the dollar. However, there are times when fractional
units are needed. It’s important to make it clear what kind of money
you’re working with, especially in an application that uses both
kinds. It makes sense to have different types for the two cases as
they behave quite differently under arithmetic.
Money needs arithmetic operations so that you can use money objects as
easily as you use numbers. But arithmetic operations for money have
some important differences to money operations in numbers. Most
obviously, any addition or subtraction needs to be currency aware so
you can react if you try to add together monies of different
currencies. The simplest, and most common, response is to treat the
adding together of disparate currencies as an error. In some more
sophisticated situations you can use Ward Cunningham’s idea of a money
bag. This is an object that contains monies of multiple currencies
together in one object. This object can then participate in
calculations just like any money object. It can also be valued into a
currency."
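A minimal sketch of the pattern in Rust, under a few assumptions: the amount is held as an integer count of minor units (e.g. cents), only two currencies are modelled, and adding different currencies is treated as an error, which is the simplest response the quote mentions. The names `Money`, `Currency` and `CurrencyMismatch` are hypothetical and this is not Fowler's own code.

```rust
use std::ops::Add;

/// Hypothetical, deliberately tiny set of currencies for the example.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Currency {
    Eur,
    Usd,
}

/// An amount in minor units (e.g. cents) together with its currency.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Money {
    minor_units: i64,
    currency: Currency,
}

/// Error returned when two amounts in different currencies are added.
#[derive(Debug, PartialEq, Eq)]
struct CurrencyMismatch {
    left: Currency,
    right: Currency,
}

impl Money {
    fn new(minor_units: i64, currency: Currency) -> Self {
        Money { minor_units, currency }
    }
}

impl Add for Money {
    // The operator is currency aware: the result is an error on mismatch.
    type Output = Result<Money, CurrencyMismatch>;

    fn add(self, rhs: Money) -> Self::Output {
        if self.currency != rhs.currency {
            return Err(CurrencyMismatch { left: self.currency, right: rhs.currency });
        }
        // Plain integer addition; overflow handling is left out of this sketch.
        Ok(Money::new(self.minor_units + rhs.minor_units, self.currency))
    }
}

fn main() {
    let total = Money::new(1999, Currency::Eur) + Money::new(1, Currency::Eur);
    println!("{:?}", total);

    let mixed = Money::new(100, Currency::Eur) + Money::new(100, Currency::Usd);
    println!("{:?}", mixed);
}
```

Storing the currency next to the amount means no exchange rate is involved until someone explicitly asks for a conversion.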

The difference between handling time and currency is that time zones don't shift in value.
When handling monetary values you have to consider what the currency of the actual money is. If the actual money is in USD and you store it as EUR, you will get a discrepancy between the actual value and the stored value when the exchange rate shifts.
Alternatively, you would have to recalculate all values whenever the exchange rate is updated, but that would defeat the purpose of storing the values in a single currency.
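To make the discrepancy concrete, here is a small worked example. The exchange rates are invented purely for illustration, and the rate representation (parts per 10,000, so 9_000 means 0.9 EUR per USD) and the function names are assumptions of this sketch.

```rust
/// Converts an amount in minor units from USD to EUR.
/// `rate_per_10k` is the EUR-per-USD rate in parts per 10,000 (hypothetical representation).
/// Integer division truncates; real code would need an explicit rounding rule.
fn usd_to_eur(amount_minor: i64, rate_per_10k: i64) -> i64 {
    amount_minor * rate_per_10k / 10_000
}

/// Converts an amount in minor units from EUR back to USD with the same rate representation.
fn eur_to_usd(amount_minor: i64, rate_per_10k: i64) -> i64 {
    amount_minor * 10_000 / rate_per_10k
}

fn main() {
    let original_usd = 10_000; // 100.00 USD actually held by the user

    // Stored in the application-wide currency at an invented rate of 0.9.
    let stored_eur = usd_to_eur(original_usd, 9_000);

    // Converted back the same day: the original amount is recovered.
    let same_day_usd = eur_to_usd(stored_eur, 9_000);

    // Converted back after the invented rate has moved to 0.8.
    let later_usd = eur_to_usd(stored_eur, 8_000);

    println!("stored: {} EUR cents", stored_eur);
    println!("same day: {} USD cents", same_day_usd);
    println!("later: {} USD cents", later_usd);
}
```

With these made-up rates the user's 100.00 USD is later displayed as 112.50 USD, even though the real balance never changed.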

Is there a "proper" order for listing languages?

Our application is being translated into a number of languages, and we need to have a combo box that lists the possible languages. We'd like to use the name of the language in that language (e.g. Français for French).
Is there any "proper" order for listing these languages? Do we alphabetize them based on their English names?
Update:
Here is my current list (I want to explore the Unicode Collating Algorithm that Brian Campbell mentioned):
"العربية",
"中文",
"Nederlands",
"English",
"Français",
"Deutsch",
"日本語",
"한국어",
"Polski",
"Русский язык",
"Español",
"ภาษาไทย"
Update 2: Here is the list generated by the ICU Demonstration tool, sorting for an en-US locale.
Deutsch
English
Español
Français
Nederlands
Polski
Русский язык
العربية
ภาษาไทย
한국어
中文
日本語

This is a tough question without a single, easy answer. First of all, by default you should use the user's preferred language, as given to you by the operating system, if that is one of your available languages (for example, in Windows, you would use GetUserPreferredUILanguages, and find the first one on that list that you have a translation for).
If the user still needs to select a language (you would like them to be able to override their default language, or select another language if you don't support their preferred language), then you'll need to worry about how to sort the languages. If you have 5 or 10 languages, the order probably doesn't matter that much; you might go for sorting them in alphabetical order. For a longer list, I'd put your most common languages at the top, and perhaps the user's preferred languages at the top as well, and then sort the rest in alphabetical order after that.
Of course, this brings up how to sort alphabetically when languages might be written in different scripts. For instance, how does Ελληνικά (Ellinika, Greek) compare to 日本語 (Nihongo, Japanese)? There are a few possible solutions. You could sort each script together, with, for instance, Roman based scripts coming first, followed by Cyrillic, Greek, Han, Hangul, and so on. Or you could sort non-Roman scripts by their English name, or by a Roman transliteration of their native name. Probably the first or third solution should be preferred; people may not know the English name for their language, but many languages have English transliterations that people may know about. The first solution (each script sorted separately) is how the Mac OS X languages selection works; the second (sorted by their Roman transliteration) appears to be how Wikipedia sorts languages.
I don't believe that there is a standard for this particular usage, though there is the Unicode Collation Algorithm which is probably the most common standard for sorting text in mixed scripts in a relatively language-neutral way.
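The "sort by Roman transliteration" option can be sketched with nothing but the standard library. The `Language` struct is hypothetical, and the romanized keys below are supplied by hand as assumptions; this is a plain comparison of lowercase keys, not an implementation of the Unicode Collation Algorithm.

```rust
/// A language entry: the name shown to the user and a hand-written Roman
/// transliteration used only as the sort key. (Hypothetical type.)
#[derive(Debug, Clone, Copy)]
struct Language {
    native: &'static str,
    romanized: &'static str,
}

/// Returns the native names ordered by their romanized key, ignoring case.
fn sorted_native_names(mut langs: Vec<Language>) -> Vec<&'static str> {
    langs.sort_by_key(|lang| lang.romanized.to_lowercase());
    langs.into_iter().map(|lang| lang.native).collect()
}

/// Sample data; the romanizations are assumptions made for this example.
fn sample() -> Vec<Language> {
    vec![
        Language { native: "中文", romanized: "Zhongwen" },
        Language { native: "English", romanized: "English" },
        Language { native: "Русский", romanized: "Russkiy" },
        Language { native: "日本語", romanized: "Nihongo" },
        Language { native: "Ελληνικά", romanized: "Ellinika" },
        Language { native: "Deutsch", romanized: "Deutsch" },
    ]
}

fn main() {
    for name in sorted_native_names(sample()) {
        println!("{}", name);
    }
}
```

Keeping the sort key separate from the displayed name also makes it easy to switch to the "each script sorted together" option later by changing only the key.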

I would say it depends on the length of your list.
If you have 5 languages (or any number which easily fits into the dropdown without scrolling) then I'd say put your most common language at the top and then alphabetize them... but just alphabetizing them wouldn't make it less user friendly IMHO.
If you have enough that you'd need to scroll, I would put your top 3 or 5 (or some appropriate number of) most common languages at the top and bold them in the list, then alphabetize the rest of the options.
For a long list I would probably list common languages twice.
That is, "English" would appear at the top of the list and at the point in the alphabetized list where you'd expect.
EDIT: I think you would still want to alphabetize them according to how they're listed... that is, "Español" would appear in the E's, not in the S's as if it were "Spanish".
Users will be able to pick up on the fact that languages are listed according to their translated name.
EDIT2: Now that you've edited to show the languages you're interested in I can see how a sort routine would be a bit more challenging!

The ISO has codes for languages (here's the Library of Congress description), which are offered in order by the code, by the English name, and by the French name.
It's tricky. I think as a user I would expect any list to be ordered based on how the items are represented in the list. So as much as possible, I would use alphabetical order based on the names you are actually displaying.
Now, you can't always do that, as many will use other alphabets. In those cases there may be a roman-alphabet way of transliterating the name (for example, the Pinyin system for Mandarin Chinese) and it could make sense to alphabetize based on that. However, romanization isn't a simple subject; there are at least a dozen ways for romanizing Arabic, for example.

You could alphabetize them based on their ISO 639 language code.

.NET Currency exponent ISO_4217

I'm developing something for international use. Wondering if anyone can shed any light on whether the CultureInfo class has support for finding currency exponents for particular countries, or whether I need to feed this data in at the database level.
I can't see any property that represents this at the minute, so I'd like to know definitively whether it exists before I look for the data elsewhere or buy it from ISO.
Currency Exponent is the minor units of the currency.
http://en.wikipedia.org/wiki/ISO_4217 - e.g. UK is "2"

Take a look at this blog post on getting CultureInfo for a region. Basically, Windows and .NET know about the user's region but not their currency. A region implies a currency, but a country can have more than one currency. For example, a person in Cambodia would more than likely want to enter and use USD rather than Riel. If possible, when capturing any currency amount in a multi-currency system you should capture the currency ISO code.
If you just want to make a quick guess, you can create a CultureInfo object and use its NumberFormat.CurrencyDecimalDigits property. This also creates a problem when countries switch currencies. For example, if Belarus joined the EU and adopted the euro, its currency would change from BYR to EUR, and its currency symbol and exponent would be out of date.
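If you do capture the ISO code with each amount, the exponent becomes a lookup keyed by currency rather than by culture. A minimal sketch of that idea follows, written in Rust purely for illustration (the question itself is about .NET). The function names are hypothetical and the table holds only a handful of ISO 4217 entries; a real table would have to be loaded from the published list and kept up to date.

```rust
/// Minor-unit exponent for a few ISO 4217 codes (deliberately incomplete sample).
fn minor_unit_exponent(code: &str) -> Option<u32> {
    match code {
        "JPY" => Some(0),
        "EUR" | "GBP" | "USD" => Some(2),
        "BHD" | "KWD" => Some(3),
        _ => None, // unknown to this sketch, not necessarily invalid
    }
}

/// Renders an amount held in minor units as "<major>.<minor> <code>".
/// Returns `None` when the currency is not in the sample table.
fn format_minor_units(amount: i64, code: &str) -> Option<String> {
    let exponent = minor_unit_exponent(code)?;
    let sign = if amount < 0 { "-" } else { "" };
    let magnitude = amount.unsigned_abs();
    if exponent == 0 {
        return Some(format!("{}{} {}", sign, magnitude, code));
    }
    let scale = 10_u64.pow(exponent);
    Some(format!(
        "{}{}.{:0width$} {}",
        sign,
        magnitude / scale,
        magnitude % scale,
        code,
        width = exponent as usize
    ))
}

fn main() {
    for (amount, code) in [(1999, "GBP"), (500, "JPY"), (1500, "KWD"), (-5, "USD")] {
        println!("{:?}", format_minor_units(amount, code));
    }
}
```

The decimal point here is fixed; locale-specific separators are a separate concern from the exponent.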

I looked at this question and provided a solution which may or may not meet your needs here: http://www.codeproject.com/KB/recipes/MoneyTypeForCLR.aspx#CurrencyType
The short of it: I implemented the ISO spec as a custom type using the spec itself to generate the values. Obviously this would need to be regularly updated in production...

What things should be localized in an application

When thinking about what areas should be taken into account for a localized version of an application a number of things pop up right away:
Text display
Date and time
Units
Numbers and decimals
User input formats
LeftToRight support
Dialog and control sizes
Are there other things/areas to remember or keep in mind when building a localizable application? Are there any resources out there which provide a listing of best practices not just for text localization but for all things around localization?

After Kudzu's talk about l10n I left the room with way more questions than I had before and none of my old questions answered. But it gave me something to think about and got the message "depends on how far you can/want to go" across.
Translate text bodies with aforementioned things
Test all your controls for length/alignment in LTR/RTL, TTB (TopToBottom), BTT and all their combinations.
Look out for special characters and encodings
Look out for combinations of different alignments (LTR, RTL, TTB, BTT) and how they affect punctuation and quotation marks.
Align controls according to text alignment (Hebrew Windows has its start menu on the right)
Take string lengths into account. They can overflow in other languages.
Put labels at the correct side of icons (LTR, TTB etc)
Translate language selection controls
No texts in images (can't be translated)
Translate EVERYTHING (headers, logos, some languages use different brand names, product names etc)
Does the region have a 24:00 or a 00:00 (changes the AM/PM that goes with it too)
Does the region use AM/PM or the 24:00 system
What calendar system are they using
What digit is for what part of the date (day, month, year in all its combinations)
Try to avoid "copying [number] files" equivalents. Some regions have different rules about changing words according to quantities. (This is an extremely complicated topic that I will elaborate on if desired)
Translate sentences, not words. Syntax rules are too complicated to put in your business logic.
Don't use flags for regions. Languages != countries
Consider what languages / dialects you can support (e.g. India has a gazillion languages)
Encoding
Cultural rules (some western images displaying businesswomen can be near offensive in some other cultures)
Look out for language generalizations (e.g. boot(UK) != boot(US))
Those are the ones from the top of my head. The list just went on and on...
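To illustrate the point about "copying [number] files": different languages pick the word form from different rules. The sketch below hand-codes the integer plural rules for English and Polish as I understand them from CLDR; treat the rules, the `Plural` enum and the function names as assumptions of this example, and use a library backed by CLDR data in real code.

```rust
/// Plural categories needed by the two languages in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Plural {
    One,
    Few,
    Many,
    Other,
}

/// English: one form for exactly 1, another for everything else.
fn english_plural(n: u64) -> Plural {
    if n == 1 { Plural::One } else { Plural::Other }
}

/// Polish (integers only): 1 is "one"; numbers ending in 2-4, except those
/// ending in 12-14, are "few"; everything else is "many".
fn polish_plural(n: u64) -> Plural {
    let last_digit = n % 10;
    let last_two = n % 100;
    if n == 1 {
        Plural::One
    } else if (2..=4).contains(&last_digit) && !(12..=14).contains(&last_two) {
        Plural::Few
    } else {
        Plural::Many
    }
}

/// "n file(s)" in English.
fn files_en(n: u64) -> String {
    match english_plural(n) {
        Plural::One => format!("{} file", n),
        _ => format!("{} files", n),
    }
}

/// "n file(s)" in Polish, which needs three different word forms.
fn files_pl(n: u64) -> String {
    match polish_plural(n) {
        Plural::One => format!("{} plik", n),
        Plural::Few => format!("{} pliki", n),
        _ => format!("{} plików", n),
    }
}

fn main() {
    for n in [1, 2, 5, 12, 22] {
        println!("{} | {}", files_en(n), files_pl(n));
    }
}
```

A single "add an s when n != 1" check in the business logic cannot express the Polish case, which is why whole messages per plural category are usually handed to the translator.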

Don't forget the overhead of converting all documentation and help files.

A couple of hints from my J2ME apps days:
Don't translate separate words, translate whole phrases, even if there are matching repetitions. You'll later have to translate to a language where words have to be modified differently in different contexts, and you may end up with an analog of "color: greenish"
Right-to-left includes numbering of lists, alignment, and alternative scroll bars
Arabic script writes the same letter differently depending on the surrounding letters. You can't just print a string from a character buffer; you'll need a special control to output those, or support from your platform
Alphabetical sorting is HARD. No native Chinese speaker could ever explain the rules to me, but they will always spot wrongly sorted words. There appear to be a number of options for sorting Chinese. I guess other languages may have the same problem
