Is there a "proper" order for listing languages? - localization

Our application is being translated into a number of languages, and we need to have a combo box that lists the possible languages. We'd like to use the name of the language in that language (e.g. Français for French).
Is there any "proper" order for listing these languages? Do we alphabetize them based on their English names?
Update:
Here is my current list (I want to explore the Unicode Collating Algorithm that Brian Campbell mentioned):
"العربية",
"中文",
"Nederlands",
"English",
"Français",
"Deutsch",
"日本語",
"한국어",
"Polski",
"Русский язык",
"Español",
"ภาษาไทย"
Update 2: Here is the list generated by the ICU Demonstration tool, sorting for an en-US locale.
Deutsch
English
Español
Français
Nederlands
Polski
Русский язык
العربية
ภาษาไทย
한국어
中文
日本語

This is a tough question without a single, easy answer. First of all, by default you should use the user's preferred language, as given to you by the operating system, if that is one of your available languages (for example, in Windows, you would use GetUserPreferredUILanguages, and find the first one on that list that you have a translation for).
If the user still needs to select a language (you would like them to be able to override their default language, or select another language if you don't support their preferred language), then you'll need to worry about how to sort the languages. If you have 5 or 10 languages, the order probably doesn't matter that much; you might go for sorting them in alphabetical order. For a longer list, I'd put your most common languages at the top, and perhaps the users preferred languages at the top as well, and then sort the rest in alphabetical order after that.
Of course, this brings up how to sort alphabetically when languages might be written in different scripts. For instance, how does Ελληνικά (Ellinika, Greek) compare to 日本語 (Nihongo, Japanese)? There are a few possible solutions. You could sort each script together, with, for instance, Roman based scripts coming first, followed by Cyrillic, Greek, Han, Hangul, and so on. Or you could sort non-Roman scripts by their English name, or by a Roman transliteration of their native name. Probably the first or third solution should be preferred; people may not know the English name for their language, but many languages have English transliterations that people may know about. The first solution (each script sorted separately) is how the Mac OS X languages selection works; the second (sorted by their Roman transliteration) appears to be how Wikipedia sorts languages.
I don't believe that there is a standard for this particular usage, though there is the Unicode Collation Algorithm which is probably the most common standard for sorting text in mixed scripts in a relatively language-neutral way.

I would say it depends on the length of your list.
If you have 5 languages (or any number which easily fits into the dropdown without scrolling) then I'd say put your most common language at the top and then alphabetize them... but just alphabetizing them wouldn't make it less user friendly IMHO.
If you have enough the you'd need to scroll I would put your top 3 or 5 (or some appropriate number of) most common languages at the top and bold them in the list then alphabetize the rest of the options.
For a long list I would probably list common languages twice.
That is, "English" would appear at the top of the list and at the point in the alphabetized list where you'd expect.
EDIT: I think you would still want to alphabetize them according so how they're listed... that is "Espanol" would appear in the E's, not in the S's as if it were "Spanish"
Users will be able to pick up on the fact that languages are listed according to their translated name.
EDIT2: Now that you've edited to show the languages you're interested in I can see how a sort routine would be a bit more challenging!

The ISO has codes for languages (here's the Library of Congress description), which are offered in order by the code, by the English name, and by the French name.
It's tricky. I think as a user I would expect any list to be ordered based on how the items are represented in the list. So as much as possible, I would use alphabetical order based on the names you are actually displaying.
Now, you can't always do that, as many will use other alphabets. In those cases there may be a roman-alphabet way of transliterating the name (for example, the Pinyin system for Mandarin Chinese) and it could make sense to alphabetize based on that. However, romanization isn't a simple subject; there are at least a dozen ways for romanizing Arabic, for example.

You could alphabetize them based on their ISO 639 language code.

Related

List of iOS fonts grouped by supported script (ISO 15924)

So, obviously there's iosfonts.com which has been incredibly helpful, but how can I determine that, for example, HiraKakuProN-W3 contains the code points for Japanese, (Jpan, 413 in ISO 15924)
Furthermore, I'd like to know more specific information. I imagine that, continuing the example, HiraKakuProN contains the characters for Hiragana and Katakana, but does it also contain all the CJK unified ideographs, just the ones needed for Japanese, or none of them?
Where can I find exhaustive tables of unicode characters per language (IETF language tag)? It's easy to find a listing of all Hani characters, but Unicode (and the Hani code point table) doesn't make a distinction between Hans, Hant, Jpan, etc. I ask this because, if there is no readily available info on which iOS font is for which language, I will programmatically determine this myself, but will need to know what characters to look for.
Thanks for any leads.
The list of supported ScriptCodes for Arial Unicode ( The most polyvalent font as far as I know) is there :
http://en.wikipedia.org/wiki/Arial_Unicode_MS
From this site, you can find a link to fonts supporting a given ScriptCode.
But it may need some font installations.
I hope it helps… This is a complex domain ;)
http://scriptsource.org/cms/scripts/page.php

Locale-specific lookup table

I'm using a lookup table for optimizing an algorithm that works on single characters. Currently I'm adding a..z, A..Z, 0..9 to the lookup table. This works fine in european countries, but in asian countries it doesn't make much sense.
My idea was that I could perhaps use the characters in the windows default code page as an alphabet for the lookup table.
Pseudocode:
for Ch in DefaultCodePage.Characters do
LookupTable.Add (Ch, ComputeValue (Ch));
What do you think and how could this be achieved? Any alternative suggestions?
As you mentioned, it does not make much sense for different scripts. It may only make some sense for alphabet-based languages.
BTW. A-Z is not enough for most of European languages.
I don't quite know what you are doing and what you need this look-up table for but it seems that what you are looking for are Index Characters. You could find such information in CLDR – look for indexCharacters. The resources for various languages are available here.
The only problem you'll face that in fact for some languages Index Characters tend to be Latin based. That is just because these languages do not actually have them... In that case you might want to use so called Exemplar Characters instead but please be warned that it might be just not enough for some use cases.

If you have an application localized in pt-br and pt-pt, what language you should choose if the system is reporting only "pt" code?

If you have an application localized in pt-br and pt-pt, what language you should choose if the system is reporting only pt code (generic Portuguese)?
This question is independent of the nature of the application, desktop, mobile or browser based. Let's assume you are not able to get region information from another source and you have to choose one language as the default one.
The question does apply as well for more case including:
pt-pt and pt-br
en-us and en-gb
fr-fr and fr-CA
zh-cn, zh-tw, .... - in fact in this case I know that zh can be used as predominant language for Simplified Chinese where full code is zh-hans. For Traditional Chinese, with codes like zh-tw, zh-hant-tw, zh-hk, zh-mo the proper code (canonical) should be zh-hant.
Q1: How to I determine the predominant languages for a specified meta-language?
I need a solution that will include at least Portuguese, English and French.
Q2: If the system reported Simplified Chinese (PRC) (zh-cn) as preferred language of the user and I have translation only for English and Traditional Chinese (en,zh-tw) what should I choose from the two options: en or zh-tw?
In general you should separate the "guess the missing parameters" problem from the "matching a list of locales I want vs. a list of locales I have" problem. They are different.
Guessing the missing parts
These are all tricky areas, and even (potentially) politically charged.
But with very few exceptions the rule is to select the "original country" of the language.
The exceptions are mostly based on population.
So fr-FR for fr, es-ES, etc.
Some exceptions: pt-BR instead of pt-PT, en-US instead of en-GB.
It is also commonly accepted (and required by the Chinese standards) that zh maps to zh-CN.
You might also have to look at the country to determine the script, or the other way around.
For instance az => az-AZ but az-Arab => az-Arab-IR, and az_IR => az_Arab_IR
Matching 'want' vs. 'have'
This involves matching a list of want vs. a list of have languages.
Dealing with lists makes it harder. And the result should also be sorted in a smart way, if possible. (for instance if want = [ fr ro ] and have = [ en fr_CA fr_FR ro_RO ] then you probably want [ fr_FR fr_CA ro_RO ] as result.
There should be no match between language with different scripts. So zh-TW should not fallback to zh-CN, and mn-Mong should not fallback to mn-Cyrl.
Tricky areas: sr-Cyrl should not fallback to sr-Latn in theory, but it might be understood by users. ro-Cyrl might fallback to ro-Latn, but not the other way around.
Some references
RFC 4647 deals with language fallback (but is not very useful in this case, because it follows the "cut from the right" rule).
ICU 4.2 and newer (draft in 4.0, I think) has uloc_addLikelySubtags (and uloc_minimizeSubtags) in uloc.h. That implements http://www.unicode.org/reports/tr35/#Likely_Subtags
Also in ICU uloc.h there are uloc_acceptLanguageFromHTTP and uloc_acceptLanguage that deal with want vs have. But kind of useless as they are, because they take a UEnumeration* as input, and there is no public API to build a UEnumeration.
There is some work on language matching going beyond the simple RFC 4647. See http://cldr.unicode.org/development/design-proposals/languagedistance
Locale matching in ActionScript at http://code.google.com/p/as3localelib/
The APIs in the new Flash Player 10.1 flash.globalization namespace do both tag guessing and language matching (http://help.adobe.com/en_US/FlashPlatform/beta/reference/actionscript/3/flash/globalization/package-detail.html). It works on TR-35 and can look beyond the # and consider the operation. For instance, if have = [ ja ja#collation=radical ja#calendar=japanese ] and want = [ ja#calendar=japanese;collation=radical ] then the best match depends on the operation you want. For date formatting ja#calendar=japanese is the better match, but for collation you want ja#collation=radical
Do you expect to have more users in Portugal or in Brazil? Pick accordingly.
For your general solution, you find out by reading up on Ethnologue.

Coding in Other (Spoken) Languages

This is something I've always wondered, and I can't find any mention of it anywhere online. When a shop from, say Japan, writes code, would I be able to read it in English? Or do languages, like C, PHP, anything, have Japanese translations that they write?
I guess what I'm asking is does every single coder in the world know enough English to use the exact same reserved words I do?
Would this code:
If (i < size){
switch
case 1:
print "hi there"
default:
print "no, thank you"
} else {
print "yes, thank you"
}
display the exact same as I'm seeing it right now in English, or would some other non-English-speaking person see the words "if", "switch", "case", "default", "print", and "else" in their native language?
EDIT - yes, this is serious. I didn't know if different localizations of a language have different keywords. or if there are even different localizations at all.
If I understood well the question actually is: "does every single coder in the world know enough English to use the exact same reserved words as I do?"
Well.. English is not the subject here but programming language reserved words. I mean, when I started about 10 yrs ago, I didn't have any clue of English, and still I was able to program simple things by learning the programming language, even when I did not know what they meant ( in English ). As a matter of fact this helped me to learn English.
For example. I know to do an "iteración" ( iteration of course ) I had to write:
for( i = 0 ; i < 100 ; i++ ) {}
To me, the "for", the ";" and the "++" were simple foreign words or symbols. Later I learned that "for" meant "para", "while" meant "mientras", etc. But, in the meantime, I did not need to know English, what I needed was to know was "C".
Of course when I needed to learn more things, I had to learn English, for the documentation is written in that language.
So the answer is: No, I don't see if, while, for etc. in my native language. I see them in English, but they didn't mean to me any other thing that they meant for the programming language in turn.
Is like switch statement in bash: case .. esac. What Is "esac"... for me the end of the switch statement in bash.
I guess that's what we call "abstraction"
In the Java language some methods must be named (at least partially) using the English language because of the JavaBeans convention.
This convention requires that a property X be established via a pair of getX() and setX() methods. Here in French-Canada, where some developers are obliged to code in the French language this leads to the following travesty:
interface Foo {
Color getCouleur();
void setCouleur(Color couleur);
}
I'm having trouble finding references, but I'm reminded of three stories.
A Lisp hacker defends meaningless functions like "cdr" and "car" by comparing them to programming in your non-native language:
http://people.csail.mit.edu/gregs/ll1-discuss-archive-html/msg01171.html
When Yukihiro Matsumoto ("Matz") started developing Ruby, he used english keywords even though he was writing all the documentation in Japanese!. There was no English documentation for Ruby for a couple years, and very few Americans using the language. But now it's a world-class language, and it the fact that it was born in Japan is only of historical interest. If the language had been using keywords in hiragana, it would have had a much more difficult time gaining popularity.
I read an essay once -- maybe someone else can find it, Google is no help today -- that suggested that translating keywords was misguided because the words aren't actually English-- they're jargon. Not only do (to use the examples above) para and pour not quite have the exact meaning that for has in English, to non-programmers the phrase "for loop" is jibberish. Even Americans have to learn a new meaning. So to translate the words's superficial meaning into another language is more like making a cross-language pun rather than actually being helpful.
I really have not thought too much about programming in Japanese before, but here we go, using the question's code sample.
Using only the language statements in Japanese with the variables in English:
// In Japanese, it makes more sense to put the keywords/modifiers as
// postfix expressions rather than prefix expressions.
(i < size)か {
(l[i])は {
1だ:
「もしもし。」を書く;
省略時値:
「いいえ、いいですよ。」を書く;
}
} ない {
「はい、ありがとうございます。」を書く;
}
As many people already pointed out, in most programming languages you just have to learn a few keywords, so it doesn't matter that much if they're in English (or a language other than yours, for that matter). It's just a symbol you associate with some construct. For instance, in VB you have "THEN", which in many C-style languages would be "{" and it doesn't make a big difference in readability (well, at least that's how I see it, being a Non-English native speaker).
But where things can sometimes get hairy, and where the choice of (natural) language matters is in naming identifiers. If the names of variables, functions, classes, etc, don't have a meaningful name for you because of a language barrier, following even the simplest code can be rather challenging.
I remember someone once gave me a short snippet of Actionscript taken from some blog. The names were in German and since I don't speak a word of that language, stuff could have been called var_123, var_562 or func_333 as well (and probably it would have been easier for me to remember the names or at least to have a chance of spelling them right without copying and pasting). Since this was a short, self-contained snippet, I used an online translator to give those vars and functions meaningful names in my native language (Spanish) and after that, everything was clear. The point is that the code was actually simple, but I was only able to make sense out of it without too much (unnecessary) extra effort just when I overcame the language barrier.
Since then, I've switched to using English for naming identifiers. Whether you like it or not, it's the "koine" for programming, engineering and generally technical stuff. Most of the APIs are written in English and so is most documentation (and probably the best resources you can find are in English as well). As a nice aside, it keeps your code more coherent with the code you're likely to be interacting with, and I think it tends to be more compact and succinct than other languages like Spanish (which otherwise would be my natural choice).
Of course, if you can't understand at least some English, the problem remains the same, so it's not a perfect solution. But, given a number of developers from many different countries, chances are that the common language for them to communicate (through code and of course other means) will be English. So, choosing English is perhaps the best option, even though it would be not the perfect solution to this problem.
The programming language defines keywords and standard class names, and it's best practice to give user defined types, variables and functions also English names (as a non-native speaker I can tell ;-).
So yes, if all is well, you'll be able to read the code.
However languages like Java and Perl allow the full Unicode set for identifiers, so if somebody writes his class names in Kanji, you'll likely have a problem.
Update: For Perl there's a joke module that allows you to write Perl in Latin. But it's really just that, a joke. Nobody uses things like this seriously.
Second Update: The idea of localized programming languages isn't that ridiculous. Excel's macro language is localized, but luckily it's stored in one canonical language (English) in the file, so the localization is just a layer on top of the normal thing. Such things only make sense for small "programs", for "real" programs it becomes hard to maintain.
Actually there are some Non-English-based programming languages (Wikipedia)
I'm Norwegian but I've allways used English for all code except output (ignoring some silly code from school). Actually I usually write everything in English and then translate it to my native language, using gettext (or something).
I am British and a problem we often run into is the American/British spelling clash. This often occurs with programming related terms such as Initialise() or Initialize(), Analyse() or Analyze() etc. This can (has) lead to problems trying to overriding methods, and is sometimes difficult to spot.
Since the framework (in our case C#) was designed by Americans, we found that it is best to be consistent and use American spellings. We even adopt Color.
We have a mix of nationalities in our development teams and most non-British people tend towards American spellings naturally.
AppleScript was once available in French and Japanese dialects. I do not know why it was withdrawn.
Taking this to the next level, what about being able to substitute symbols?
After seeing languages like Brainf**k and Whitespace I thought of making a language like this: it'd be identical to C except you use closing braces to open, opening braces to close, swap the meanings of + and -, * and /, ; and :, > and <, etc.
The concept is nothing more than a gimmicky altered C compiler. But, like thinking of keywords differently, it challenges you to rethink some basic assumptions if you've never thought of such things before. Ex:
int foo)int i, char c( }
int six = 2 / 3:
int two = six + 4:
if )i > 0( }
printf)"i is negative"(:
{
{
I'm in a French team developing a software system in C#. Despite the fact that the programming language keywords are ostensibly English, I imagine that you would have great difficulty reading the code as all the function names, variables, code comments, database tables and columns, technical specifications, protocols and so on, are all in French, including those lovely accented characters ç, é, è, ù, etc. I'm not even certain if the system would even run elsewhere due to localisation bugs, such as relying on the comma to be the default decimal seperator.
Otherwise, WinDev is a popular programming platform in France, and its programming language WLanguage has keywords in either French or English, see and example here : link text
The only language I saw localized is Excel with its macros. If you try to sum a column using an Italian version of Office you have to write SOMMA(A1:A10) and not SUM. That's a shame.
By the way, just because it's fun, here's how your code should look like with Italian keywords:
se (i < size){
commuta
caso 1:
stampa "hi there"
normalmente:
stampa "no, thank you"
} altrimenti {
stampa "yes, thank you"
}
i've seen VBA translated into spanish-like commands. it's one of the ugliest things ever seen. i would be ashamed to have something like this on my computer.
PD: i happen to think that spanish is a much nicer language than english; but translating is WRONG
Well, As others pointed-out, the keywords and system calls would likely remain in English.
However, understanding the keywords of the language is only a small part in understanding the code. Variable names, function names and comments all risk being in the native language of the author.
Edit: I just flashed-back to my youth where I went in the mapping tables of my TRS-80 built-in BASIC to switch the keywords to French. I could change all the keywords but I couldn't make any of them larger. Made for funny programs.
Don't make fun of this. Some years ago, Microsoft had announced G# (German Sharp) - C# with German keywords and API. Of course, it was an April Fools joke, but the entire site about that looked so real and professional (and was on microsoft.com). Scary.
At work, we use two field bus systems, both developed in German-speaking countries, which have a scary mix of German and English for identifiers, including some lovely false friends. It's a mess.
No, English keywords and identifiers are fine. Though some might argue if it should be Color or Colour :)
In several VBA project I've worked on (yes, very early in my career) we had to detect the version of office which was installed on the user's machine and change the formulas used in the speradsheets accordingly.
As i program in portuguese"SUM" would have to be translated into "SOMA" and so on and so forth. I just can't imagine the necessary work to make this happen in several languages. Has anyone else suffered with this problem?
There are some languages that have translated keywords. Excel formulas, for example. If you write some calculations in a spreadsheet, this will be in your language.
Fortunately, this is not a general practice, and even non-English speakers like me thank God that there is a standard language for keywords :
it's easier to share you work.
it prevent documentation from becoming a bigger nightmare that it already is.
English words and sentences are usually short and syntactically pragmatic. In literature, Latin languages are much more beautiful, but for technical stuff, English rocks.
And where to stop ? Can you imagine a C in ancient Greek ?
Keywords must stay in one language, and well, it started with English, let it stay that way. This could have been worst (Asian language ?). And so we have to write methods and comments in English. Ok, more work for us, but at least the international code base stay congruent.
There is, however, one case where using native language method names and comments can be a good practice : in third world country. I'm going to Senegal in some months to manage a Django project. Senegal have a huge analphabetization rate, and therefor it's already great that they spead energy in improving they programming knowledge. French is the native language here, so it would be inefficient to force them to learn computing AND a new tongue at the same time.
BTW, that would be your code with French keywords :
Si (i < taille) {
cas par cas :
cas 1:
afficher "salut"
défaut:
afficher "non merci"
} sinon {
afficher "oui, merci"
}
Not that translating the keywords have nothing to do with translating the strings. Of course, we have "hi, there" translated in our language. European coders even tend to use I18N much more than American sot their service can reach a wider audience.
Generally speaking, most programmers adapt to the English form.
I learned to program when I was 7 years old and only spoke Hebrew (which is right to left) and with no english, which made it quite a fascinating experience.
The problem you would usually get is with documentation, variables, and function names. I have seen my share of variables in other languages using english alphabet.
The only language I'm familiar with that actually got translated was good old Logo (still amazing to this day).
When I was a kid we went to France, and in a museum we went to, I remember finding a display which showed you how to write computer programmes. The language was some kind of BASIC variant and I distinctly remember it using POUR instead of FOR, and so on. I was 7 years old and had only just learned BASIC, and it seemed completely natural to me that the French would have their own dialect like this!!
I guess it may have been LSE that I saw?
Filemaker's scripting language is localized. The scripts (and data!) are stored in a terrible "sorta canonical" form.
So if you write a script in the American version, then open it up in the French version, all the keywords and built-in function names will be in French. But why won't it run?! Aha! The French version uses "," as the decimal point, and therefore to avoid ambiguity uses ";" to separate function arguments -- where the American version uses "." and "," respectively. This conversion you have to do yourself.
So you work through the incredibly bad script editing interface (you can't write scripts as text files) to fix all these things. It runs! Great! The results are all wrong! Oh no! Aha! The Jan-7-2004 date you entered in the American version is being interpreted as July-1-2004 -- apparently dates are not only displayed but stored in locale-dependent order. Am I kidding you? No.
[Note: Filemaker 8 and 9 may be sane -- I only ever worked with 3 - 7.]
Your question is an interesting one with regard to Perl because it's syntax is designed to follow (English) natural language. I wonder if that makes it more difficult for non-English speakers...
Of course, Perl and Perlers refuse to play by conventional rules. Mad scientist Damian Conway wrote the Lingua::Romana::Perligata module which uses the black magic of source filters to allow you to write Perl in latin!
Here in Australia we still need to spell colour like color.
However, I do find it annoying when other (Australian) developers, working on an Australian project, decide that internal variable names need to be spelt the american way.
It would be pointless, IMHO, to i18n a language syntax. It would just kill any sort of portability.
The only exception are educational languages, such as LOGO. They were designed for ease learning, so portability is not an issue.
I read a lot of code, but the problem always is at variable/method names and comments, if they are commenting their code on their own language, using a language special characters like Japanese or Cyrillic, we are in trouble! but the keywords I think they will stay in English as they are.
in Italian
se (i < dimensione){
scegli
caso 1:
stampa "ciao"
mancante:
stampa "no, grazie"
} altrimenti {
stampa "sì, grazie"
}
To confirm the worries of some previous poster I've seen a Fortran code with a macro include to translate all the keywords from English to French. Allow me not to continue on this.
I also had to work with a code simultaneously containing identifiers in Italian, German, English and French, not only because it was developed in many different places, but also because the main developer thought it was fun and helped him not to duplicate identifier names (of course, with a routine 2000 lines long....)
I think WordBasic was localized. WordBasic was used to write macro's for in Word before VBA was used.
If I remember it correctly, only WordBasic written in the English version would execute on all localized version. If you would write a Dutch version, you could only execute it on a Dutch Word.

What things should be localized in an application

When thinking about what areas should be taken into account for a localized version of an application a number of things pop up right away:
Text display
Date and time
Units
Numbers and decimals
User input formats
LeftToRight support
Dialog and control sizes
Are there other things/areas to remember or keep in mind when building a localizable application? Are there any resources out there which provide a listing of best practices not just for text localization but for all things around localization?
After Kudzu's talk about l10N I left the room with way more questions then I had before and none of my old questions answered. But it gave me something to think about and brought the message "depends on how far you can/want to go" accross.
Translate text bodies with aforementioned things
Test all your controls for length/alignment in LTR/RTL, TTB(TopToBottom) BTT and all it's combinations.
Look out for special characters and encodings
Look out for combinations of different alignments (LTR, RTL, TTB, BTT) and how they effect punctuation and quotation signs.
Align controls according to text alignment (Hebrew Win has its start menu at the right
Take string lengths into account. They can overflow in other languages.
Put labels at the correct side of icons (LTR, TTB etc)
Translate language selection controls
No texts in images (can't be translated)
Translate EVERYTHING (headers, logos, some languages use different brand names, product names etc)
Does the region have a 24:00 or a 00:00 (changes the AM/PM that goes with it too)
Does the region use AM/PM or the 24:00 system
What calendar system are they using
What digit is for what part of the date (day, month, year in all its combinations)
Try to avoid "copying [number] files" equivalents. Some regions have different rules about changing words according to quantities. (This is an extremely complicated topic that I will elaborate on if desired)
Translate sentences, not words. Syntax rules are too complicated to put in your business logic.
Don't use flags for regions. Languages != countries
Consider what languages / dialects you can support (e.g. India has a gazillion of languages)
Encoding
Cultural rules (some western images displaying business woman can be near offensive in some other cultures)
Look out for language generalizations (e.g. boot(UK) != boot(US))
Those are the ones from the top of my head. The list just went on and on...
Don't forget the overhead of converting all documentation and help files.
a couple hints from my J2ME apps days:
don't translate separate words, translate whole phrases, even if there are matching repetitions. You'll later have to translate to a language where words have to be modified differently in different contexts and you may end up with an analog of "color: greenish"
Right2Lelf includes numbering of lists, alignment, and alternative scroll bars
Arabic languages write the same letter differently based on surrounding letters. You can't just print a string from a character buffer, you'll need a special control to output those or support from you platform
alphabetical sorting is HARD. No native Chinese could ever explain me the rules, but they will always spot wrongly sorted words. There appear to be a number of options to sort Chinese. I guess other languages may have the same problem

Resources