I need to handle declensions. The app is in a different language (Czech), where words change between singular and plural, and based on gender as well.
Example in English
1 item, 2 items, 5 items, ...
In the target language (Czech):
1 položka, 2 položky, 5 položek, ...
I have found a few repositories that I am currently going through:
https://github.com/adamelliot/Inflections
https://github.com/mattt/InflectorKit
On Android, there is a way to do it via XML. Is there a recommended way to handle this on iOS? I don't want to use ifs or switches.
Thank you for any suggestions.
Matti
In iOS (and other Apple platforms), plural declensions and other localized strings that change along with numeric values run through the same API as other localized strings. So you just call NSLocalizedString in your code (no extra ifs or switches), but you provide more supporting localized data — in addition to the Localizable.strings file that contains the main (number-independent) localized strings, you add a stringsdict file for the declensions.
Apple's docs run through this step-by-step: see Handling Noun Plurals and Units of Measurement in their Internationalization and Localization Guide. The example there is Russian, which IIUC has similar plural rules to Czech... but the stringsdict format supports the full set of Unicode CLDR Plural Rules if you need to handle more.
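To see which CLDR categories the Czech forms from the question fall into (and so which keys a stringsdict entry would need), here is a small sketch. It is JavaScript using the standard Intl.PluralRules API, not the iOS API, and is only meant as an illustration. The forms table and itemLabel function are hypothetical names, and it assumes a runtime that ships Czech locale data.

```javascript
// Hypothetical lookup table: one Czech form per CLDR plural category.
const forms = {
  one: 'položka',   // 1
  few: 'položky',   // 2 to 4
  many: 'položky',  // fractions such as 1.5
  other: 'položek', // 0 and 5 or more
};

const czechRules = new Intl.PluralRules('cs');

// Picks the form by plural category instead of hand-written ifs or switches.
function itemLabel(count) {
  return `${count} ${forms[czechRules.select(count)]}`;
}

console.log([1, 2, 5].map(itemLabel).join(', '));
```

The same four categories (one, few, many, other) are the ones you would fill in for Czech in the stringsdict file.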
Related
I am trying to add Turkish support to my product. Turkish is an agglutinative language, which means that it tends to express concepts in complex words consisting of many elements, rather than by inflection or by using isolated elements.
Currently we have created keys for i18next like the following:
tr/resourceExample.json
{
  "comment": "Yorum",
  "comment_plural": "Yorumlar",
  "select_label": "{{label}} seç"
}
Whenever we want to add a sentence like "Select comments" we use
t("resourceExample:select_label",{label:t("resourceExample:comment_plural")})
Now this works properly for languages like English or Spanish. But for Turkish, the suffix of comment changes when the word is used with a verb.
For example, our current key structure gives the following output for Turkish:
Yorumlar seç
But the actual expected result for Turkish is:
Yorumları seç
The reason behind keeping this structure is that we didn't want to create new keys for select_label because Select something is used in many places where something can be replaced by many different words.
So my question is: is there any functionality in i18next that can help in this situation?
If I understood you correctly, you can add a custom format function.
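i18next.services.formatter.add('objectify', (value, lng, options) => {
  if (lng === 'tr') {
    // Add the suffix or any other decoration here.
    // A fixed "ı" only suits words whose last vowel is a or ı, such as "Yorumlar".
    value = value + 'ı';
  }
  return value;
});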
Read more at i18next Docs
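Since a fixed "ı" only fits some words, the body of that formatter could choose the suffix by vowel harmony instead. Below is a minimal sketch in plain JavaScript with no i18next calls. The accusativeTr helper and the SUFFIX_BY_VOWEL table are hypothetical names, and the rule is deliberately simplified: it ignores consonant softening, irregular loanwords and the apostrophe used with proper nouns.

```javascript
// Accusative suffix vowel, chosen from the last vowel of the word.
const SUFFIX_BY_VOWEL = {
  'a': 'ı', 'ı': 'ı', 'A': 'ı', 'I': 'ı',
  'e': 'i', 'i': 'i', 'E': 'i', 'İ': 'i',
  'o': 'u', 'u': 'u', 'O': 'u', 'U': 'u',
  'ö': 'ü', 'ü': 'ü', 'Ö': 'ü', 'Ü': 'ü',
};

// Hypothetical helper: appends a simplified Turkish accusative ending.
function accusativeTr(word) {
  const chars = Array.from(word);
  for (let i = chars.length - 1; i >= 0; i--) {
    const suffix = SUFFIX_BY_VOWEL[chars[i]];
    if (suffix !== undefined) {
      // Words ending in a vowel take a buffer "y" before the suffix.
      const endsInVowel = i === chars.length - 1;
      return word + (endsInVowel ? 'y' : '') + suffix;
    }
  }
  return word; // no vowel found, leave the word unchanged
}

console.log(accusativeTr('Yorumlar') + ' seç');
```

Inside the formatter you would then return accusativeTr(value) for lng 'tr'. If I remember the i18next syntax correctly, the key references the formatter as {{label, objectify}}, but treat that as an assumption and check the docs.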
I am getting started with iOS stringsdict files and found some existing code in a project that uses the following syntax:
<key>zero</key>
<string>You no message.</string>
Per the CLDR, zero is not a valid plural category in English, so we would expect to use an explicit plural rule instead (=0 when using ICU MessageFormat).
I tried to find how to use explicit plural rules in iOS Stringsdict files and could not find any way to achieve this. Can someone confirm if this is supported or not?
Examples of possible solutions (I cannot test them, but maybe someone can?):
<key>0</key>
<string>You no message.</string>
Or
<key>=0</key>
<string>You no message.</string>
Extra reference on explicit plural rules part of the CLDR implementation of ICU MessageFormat:
https://formatjs.io/guides/message-syntax/#plural-format
=value
This is used to match a specific value regardless of the plural categories of the current locale.
If you are only interested in the zero rule, it is handled in .stringsdict files for any language.
Source: Foundation Release Notes for OS X v10.9
If "zero" is present, the value is used for mapping the argument value zero regardless of what CLDR rule specifies for the numeric value.
Otherwise, these are the only rules handled (which ones apply depends on the language): zero, one, two, few, many, other
Short Answer
.stringsdict files have no way to support explicit plural rules (other than a custom Apple implementation of zero which is detailed below)
Detailed Answer
Normal CLDR implementation:
All rules that are not in the CLDR for a given language will be ignored
If you use the zero rule, it follows the CLDR values (in most languages that have it, zero covers only 0). This also applies to languages like Latvian, where values such as 20, 30, etc. are mapped to zero, and it contradicts Apple's own documentation (this behavior was verified):
If "zero" is present, the value is used for mapping the argument value
zero regardless of what CLDR rule specifies for the numeric value.
Source: Foundation Release Notes for OS X v10.9
Custom (Apple) CLDR implementation:
All languages can use the zero category from the CLDR even if the rule is not defined for this language (reference here)
Presumably, they implemented this to facilitate negative forms of sentences which is a common use case (this can even be found in their examples). For example instead of writing:
You have 0 emails.
You can write:
You have no emails.
This is a very common use case, but it is typically handled with explicit values rather than CLDR categories. For example, in ICU MessageFormat you would use =0, not zero, for negative forms.
While this seems convenient, it creates a big problem: what if you want to use negative forms for Latvian using the zero category? You simply can't. Basically, Apple broke linguistic rules by overriding the CLDR.
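To make the Latvian problem concrete, here is a quick sketch that prints the CLDR category for a few counts. It uses JavaScript's standard Intl.PluralRules rather than anything from iOS, and assumes a runtime with Latvian locale data. The categories helper is a hypothetical name.

```javascript
// Lists the CLDR plural category a locale assigns to each count.
function categories(locale, counts) {
  const rules = new Intl.PluralRules(locale);
  return counts.map((n) => `${n}:${rules.select(n)}`).join(' ');
}

// English has no "zero" category, so 0 falls under "other".
console.log(categories('en', [0, 1, 20]));
// Latvian puts 0 and 20 in the same "zero" category.
console.log(categories('lv', [0, 1, 20]));
```

Because 0 and 20 share a category in Latvian, a "You have no emails" string stored under zero would also be picked for 20 emails.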
Complementary details:
There are only two languages in the CLDR where zero does not equal 0:
Latvian: 1.3 million speakers worldwide
Prussian: dead language since the 18th century
Neither iOS nor macOS is available in the Latvian language, but they support Latvian locale settings (keyboard and date formats)
This means that there are probably few applications that will support Latvian, unless they have a manual way to change the language inside the application itself (this is a less common scenario for iOS apps, which typically honor the device's settings)
Conclusion
Tip #1: If you need to use Latvian, you should probably avoid using zero for negative forms, and use code instead, with strings outside of the stringsdict file
Tip #2: Make sure that your translation process supports this behavior correctly!
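For Tip #1, handling the negative form in code amounts to checking for an exact value before falling back to the plural category, which is what =0 does in ICU MessageFormat. A rough sketch of that lookup order follows, in JavaScript for illustration only. The selectForm helper and the placeholder form strings are hypothetical, and none of this is part of the stringsdict format.

```javascript
// Hypothetical helper: an explicit "=N" entry wins over the plural category.
function selectForm(count, locale, forms) {
  const exact = forms['=' + count];
  if (exact !== undefined) {
    return exact;
  }
  const category = new Intl.PluralRules(locale).select(count);
  return category in forms ? forms[category] : forms.other;
}

// Placeholder strings standing in for real Latvian translations.
const latvianForms = {
  '=0': 'negative form',
  zero: 'zero-category form',
  one: 'one-category form',
  other: 'other-category form',
};

console.log(selectForm(0, 'lv', latvianForms));
console.log(selectForm(20, 'lv', latvianForms));
```

With this order, 0 gets the negative form while 20 still gets the regular zero-category form.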
I am currently working on a project that would benefit from localized locale codes. For example, RFC 5646 and the parent standard BCP 47 define locale codes for various locales, such as en-GB for British English and zh-Hans-SG for Singaporean Chinese using simplified Chinese characters. Unfortunately, these codes use only a small subset of the Latin alphabet.
I am looking for a similar standard or commonly used system that defines a set of language codes in the respective writing system of each language (somewhat akin to an autoglossonym).
EDIT: I am strictly seeking localized locale codes since in the problem's context (URI i18n/l10n), it would be unreasonable to use an autoglossonym or other verbose equivalent.
Locale codes as specified by RFC 5646 and BCP 47 are meant to be machine parseable. Thus, en-GB is "English (Great Britain)" and zh-Hans-SG is "Chinese (Singapore, Simplified Chinese Script)".
They are designed so that web pages, e-books and other documents can specify the language and script they are written in in a standard way.
Thus, each language, script and country is given a unique code from the respective standards and collated in the IANA Language Subtag Registry (http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry).
For a localized version of this, you are better off mapping the codes to a localized name (e.g. localizing the Description field of the subtag registry database, or using a project like iso-codes) and formatting that in a presentable way, keeping the locale code as an internal representation.
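As one way of doing that mapping without maintaining the translations yourself, here is a small sketch using JavaScript's standard Intl.DisplayNames, which draws on CLDR data. The localizedName wrapper is a hypothetical name, and the exact wording returned depends on the locale data shipped with your runtime.

```javascript
// Returns the name of a BCP 47 tag, written in the given display locale.
function localizedName(tag, displayLocale) {
  const names = new Intl.DisplayNames([displayLocale], { type: 'language' });
  return names.of(tag);
}

// The tag stays the internal representation; only the label is localized.
console.log(localizedName('en-GB', 'en'));
console.log(localizedName('zh-Hans-SG', 'fr'));
console.log(localizedName('ja', 'ja'));
```

Passing the tag itself as the display locale (as in the last call) gives the language's name in its own writing system, which is close to the autoglossonym the question mentions.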
So, obviously there's iosfonts.com, which has been incredibly helpful, but how can I determine that, for example, HiraKakuProN-W3 contains the code points for Japanese (Jpan, 413 in ISO 15924)?
Furthermore, I'd like to know more specific information. I imagine that, continuing the example, HiraKakuProN contains the characters for Hiragana and Katakana, but does it also contain all the CJK unified ideographs, just the ones needed for Japanese, or none of them?
Where can I find exhaustive tables of Unicode characters per language (IETF language tag)? It's easy to find a listing of all Hani characters, but Unicode (and the Hani code point table) doesn't make a distinction between Hans, Hant, Jpan, etc. I ask this because, if there is no readily available info on which iOS font is for which language, I will programmatically determine this myself, but will need to know what characters to look for.
Thanks for any leads.
The list of supported script codes for Arial Unicode MS (the most versatile font as far as I know) is here:
http://en.wikipedia.org/wiki/Arial_Unicode_MS
From the following site, you can find links to fonts supporting a given script code, though it may require installing some fonts:
http://scriptsource.org/cms/scripts/page.php
I hope it helps. This is a complex domain ;)
Our application is being translated into a number of languages, and we need to have a combo box that lists the possible languages. We'd like to use the name of the language in that language (e.g. Français for French).
Is there any "proper" order for listing these languages? Do we alphabetize them based on their English names?
Update:
Here is my current list (I want to explore the Unicode Collation Algorithm that Brian Campbell mentioned):
"العربية",
"中文",
"Nederlands",
"English",
"Français",
"Deutsch",
"日本語",
"한국어",
"Polski",
"Русский язык",
"Español",
"ภาษาไทย"
Update 2: Here is the list generated by the ICU Demonstration tool, sorting for an en-US locale.
Deutsch
English
Español
Français
Nederlands
Polski
Русский язык
العربية
ภาษาไทย
한국어
中文
日本語
This is a tough question without a single, easy answer. First of all, by default you should use the user's preferred language, as given to you by the operating system, if that is one of your available languages (for example, in Windows, you would use GetUserPreferredUILanguages, and find the first one on that list that you have a translation for).
If the user still needs to select a language (you would like them to be able to override their default language, or select another language if you don't support their preferred language), then you'll need to worry about how to sort the languages. If you have 5 or 10 languages, the order probably doesn't matter that much; you might go for sorting them in alphabetical order. For a longer list, I'd put your most common languages at the top, and perhaps the user's preferred languages at the top as well, and then sort the rest in alphabetical order after that.
Of course, this brings up how to sort alphabetically when languages might be written in different scripts. For instance, how does Ελληνικά (Ellinika, Greek) compare to 日本語 (Nihongo, Japanese)? There are a few possible solutions. You could sort each script together, with, for instance, Roman-based scripts coming first, followed by Cyrillic, Greek, Han, Hangul, and so on. Or you could sort non-Roman scripts by their English name, or by a Roman transliteration of their native name. Probably the first or third solution should be preferred; people may not know the English name for their language, but many languages have English transliterations that people may know about. The first solution (each script sorted separately) is how the Mac OS X language selection works; the third (sorted by Roman transliteration) appears to be how Wikipedia sorts languages.
I don't believe that there is a standard for this particular usage, though there is the Unicode Collation Algorithm which is probably the most common standard for sorting text in mixed scripts in a relatively language-neutral way.
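As a rough illustration of collation-based sorting, here is a sketch using JavaScript's standard Intl.Collator on the list from the question. Intl.Collator is typically backed by ICU's tailoring of the Unicode Collation Algorithm, but the exact ordering of the non-Latin names depends on the runtime's data, so treat the result as indicative only.

```javascript
// Language names from the question, each in its own language.
const languages = [
  'العربية', '中文', 'Nederlands', 'English', 'Français', 'Deutsch',
  '日本語', '한국어', 'Polski', 'Русский язык', 'Español', 'ภาษาไทย',
];

// Collate for an en-US user; scripts are grouped, Latin names come first.
const collator = new Intl.Collator('en-US');
const sorted = [...languages].sort(collator.compare);

console.log(sorted.join('\n'));
```

This is the "each script sorted together" option: no transliteration is involved, the collator just applies its script order and then sorts within each script.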
I would say it depends on the length of your list.
If you have 5 languages (or any number which easily fits into the dropdown without scrolling) then I'd say put your most common language at the top and then alphabetize them... but just alphabetizing them wouldn't make it less user friendly IMHO.
If you have enough that you'd need to scroll, I would put your top 3 or 5 (or some appropriate number of) most common languages at the top and bold them in the list, then alphabetize the rest of the options.
For a long list I would probably list common languages twice.
That is, "English" would appear at the top of the list and at the point in the alphabetized list where you'd expect.
EDIT: I think you would still want to alphabetize them according to how they're listed... that is, "Español" would appear in the E's, not in the S's as if it were "Spanish".
Users will be able to pick up on the fact that languages are listed according to their translated name.
EDIT2: Now that you've edited to show the languages you're interested in I can see how a sort routine would be a bit more challenging!
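The "common languages on top, then the full alphabetized list" idea could be sketched like this, in JavaScript and assuming the standard Intl.Collator for the alphabetical part. The buildLanguageMenu helper and the choice of pinned languages are made up for the example.

```javascript
// Hypothetical helper: pinned names first (in the given order), followed by
// the full list sorted by displayed name, so pinned entries appear twice.
function buildLanguageMenu(allNames, pinnedNames, locale) {
  const collator = new Intl.Collator(locale);
  const sorted = [...allNames].sort(collator.compare);
  return [...pinnedNames, ...sorted];
}

const menu = buildLanguageMenu(
  ['Polski', 'Español', 'English', 'Deutsch', 'Français'],
  ['English', 'Español'],
  'en-US'
);

console.log(menu.join(', '));
```

Bolding the pinned entries or adding a separator is left to whatever UI toolkit draws the combo box.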
The ISO has codes for languages (here's the Library of Congress description), which are offered in order by the code, by the English name, and by the French name.
It's tricky. I think as a user I would expect any list to be ordered based on how the items are represented in the list. So as much as possible, I would use alphabetical order based on the names you are actually displaying.
Now, you can't always do that, as many will use other alphabets. In those cases there may be a roman-alphabet way of transliterating the name (for example, the Pinyin system for Mandarin Chinese) and it could make sense to alphabetize based on that. However, romanization isn't a simple subject; there are at least a dozen ways for romanizing Arabic, for example.
You could alphabetize them based on their ISO 639 language code.