How to handle different suffixes in i18next for agglutinative languages (e.g. Turkish, Japanese, etc.) - localization

I am trying to add Turkish support to my product. Turkish is an agglutinative language, which means that it tends to express concepts in complex words consisting of many elements, rather than by inflection or by using isolated elements.
Currently we have created keys for i18next like the following:
tr/resourceExample.json
{
  "comment": "Yorum",
  "comment_plural": "Yorumlar",
  "select_label": "{{label}} seç"
}
Whenever we want to add a sentence like "Select comments" we use
t("resourceExample:select_label",{label:t("resourceExample:comment_plural")})
Now this works properly for languages like English or Spanish. But for Turkish, the suffix of "comment" changes when the word is used with a verb.
For example, our current key structure will give the following output for Turkish:
Yorumlar seç
But the actual expected result for Turkish is:
Yorumları seç
The reason behind keeping this structure is that we didn't want to create new keys for select_label, because "Select something" is used in many places, where "something" can be replaced by many different words.
So my question is: is there any functionality in i18next which can help in this situation?

If I got you right, you can add a custom format function.
i18next.services.formatter.add('objectify', (value, lng, options) => {
  if (lng === 'tr') {
    // add the suffix or any other decoration here
    value = value + 'ı';
  }
  return value;
});
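The format name then has to be referenced from the translation string itself so that interpolation runs the value through it. A rough sketch of how the pieces fit together (the modified select_label key is an assumption on top of the question's JSON, and the hard-coded "ı" ignores Turkish vowel harmony, which the real suffix logic would handle):
// tr/resourceExample.json
// "select_label": "{{label, objectify}} seç"

t("resourceExample:select_label", { label: t("resourceExample:comment_plural") });
// label resolves to "Yorumlar", the formatter appends the suffix,
// and the rendered string becomes "Yorumları seç"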
Read more at i18next Docs

Related

Transform Text into Different Languages

I want to make some words and phrases in different languages from Google Translator without translating their actual meaning. Is it possible to convert the text to other languages rather than translating it?
Example:
I want a plain conversion like cambridge - كامبردج, कैंब्रिज, cambridge, 剑桥, Кембридж
I do not want a translation like university - جامعة, विश्वविद्यालय, universitet, 大学, Университет
Yes. This is called "transliteration". There are multiple ways to do it programmatically depending on which programming language you are using. Here, for demonstration, I'm using the ICU4J library in Groovy:
// https://mvnrepository.com/artifact/com.ibm.icu/icu4j
@Grapes(
    @Grab(group='com.ibm.icu', module='icu4j', version='59.1')
)
import com.ibm.icu.text.Transliterator;
String sourceString = "cambridge";
List<String> transformSchemes = ["Latin-Arabic", "Latin-Cyrillic", "Latin-Devanagari", "Latin-Hiragana"]
for (t in transformSchemes) {
println "${t}: " + Transliterator.getInstance(t).transform(sourceString);
}
Which returns:
Latin-Arabic: كَمبرِدگِ
Latin-Cyrillic: цамбридге
Latin-Devanagari: चंब्रिद्गॆ
Latin-Hiragana: かんぶりでげ
Obviously, since these are rule-based transformations from one language to another, they tend to be imperfect.
Therefore, if you are looking for names of places (since you mentioned "Cambridge" as an example), you'll have better luck using a database of names of places; ICU has some names of cities and many names of countries. You could also use Wikidata API to retrieve such information; here is a sample call: https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q350
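If you go the Wikidata route, here is a rough sketch of pulling the localized labels from that sample call, assuming a JavaScript runtime with fetch and top-level await (props=labels and format=json are standard wbgetentities parameters):
// request only the labels of the entity from the sample call above
const url = "https://www.wikidata.org/w/api.php"
  + "?action=wbgetentities&ids=Q350&props=labels&format=json&origin=*";

const response = await fetch(url);
const data = await response.json();

// labels is keyed by language code: { en: { language: "en", value: "..." }, ... }
for (const [lang, label] of Object.entries(data.entities.Q350.labels)) {
  console.log(`${lang}: ${label.value}`);
}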

Handling Declension in iOS

I need to handle declensions. The app is in a different language (Czech), where the words change between singular and plural, and based on gender as well.
Example in English
1 item, 2 items, 5 items, ...
In target language => Czech Language
1 položka, 2 položky, 5 položek, ...
I have found a few repositories that I am currently going through.
https://github.com/adamelliot/Inflections
https://github.com/mattt/InflectorKit
On Android, there is a way to do it via XML. Is there any recommended way to handle this on iOS? I don't want to use ifs or switches.
Thank you for any suggestions.
Matti
In iOS (and other Apple platforms), plural declensions and other localized strings that change along with numeric values run through the same API as other localized strings. So you just call NSLocalizedString in your code (no extra ifs or switches), but you provide more supporting localized data — in addition to the Localizable.strings file that contains the main (number-independent) localized strings, you add a stringsdict file for the declensions.
Apple's docs run through this step-by-step: see Handling Noun Plurals and Units of Measurement in their Internationalization and Localization Guide. The example there is Russian, which IIUC has similar plural rules to Czech... but the stringsdict format supports the full set of Unicode CLDR Plural Rules if you need to handle more.
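To make that concrete, a minimal Localizable.stringsdict entry for the Czech example might look like the following; the key name item_count and the variable name items are made up, and the "many" category (used for fractional values in Czech) is left out for brevity:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>item_count</key>
  <dict>
    <key>NSStringLocalizedFormatKey</key>
    <string>%#@items@</string>
    <key>items</key>
    <dict>
      <key>NSStringFormatSpecTypeKey</key>
      <string>NSStringPluralRuleType</string>
      <key>NSStringFormatValueTypeKey</key>
      <string>d</string>
      <key>one</key>
      <string>%d položka</string>
      <key>few</key>
      <string>%d položky</string>
      <key>other</key>
      <string>%d položek</string>
    </dict>
  </dict>
</dict>
</plist>
In code you then call something like String.localizedStringWithFormat(NSLocalizedString("item_count", comment: ""), count) and the correct Czech form is chosen for you.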

Replacement fields inside a multi-language application

I am developing a project which supports multiple languages. One of the functions we have is to support replacement parameters.
Here is a simplified example of what I mean:
A string "{CUSTNAME} has 10 customers" is defined somewhere. It includes one parameter {CUSTNAME}, which will be defined within the hierarchy where this string will be used. When the item with this string is opened up, the {CUSTNAME} resolves to its defined value.
Since in some languages a single word or a phrase can actually change the previous or the following character(s) in the sentence, how do I implement the replacement field functionality in that situation?
You'll need to do a few things.
(1). Set up some functions that return different translations based on the quantity and the rules of that language.
Aside from your customer name replacement, the part that says "10 customers" will also need some replacement, and will need to be built with a function call that looks more like:
ngettext( 'customer', 'customers', 10 )
This is along the lines of how Gettext works.
(2). Set up your translation source strings such that they're aware of pluralization rules.
You haven't said what technology you're working with, but Gettext has this built in and many languages including PHP can interact with your system Gettext.
(3). Organize your text replacement into two stages. Possibly using sprintf instead of your token replacement, but that part is up to you.
Because you're using stored translations plus your own customer name replacement I'd do as follows:
Set up translation strings with your full template in each language, perhaps like this in a Gettext PO file:
# ....
msgid "%1$s has one customer"
msgid_plural "%1$s has %2$u customers"
msgstr[0] "%1$s a un client"
msgstr[1] "%2$u clients pour %1$s"
You would then fetch the required template based on quantity and perform your replacement afterwards. For example, in PHP:
$n = 10;
$name = "Pierre";
$template = ngettext( '%1$s has one customer', '%1$s has %2$u customers', $n );
$rendered = sprintf( $template, $name, $n );
There are lots of gotchas here, and not all language pack formats support plurals. If you can't use Gettext in your system then have a look at Loco as a way to manage the rules of plurals and export to a file format you can work with.

Latin inflection:

I have a database of words (including nouns and verbs). Now I would like to generate all the different (inflected) forms of those nouns and verbs. What would be the best strategy to do this?
As Latin is a highly inflected language, there is:
a) the declension of nouns
b) the conjugation of verbs
See this translated page for an example of a verb's conjugation ("mandare"): conjugation
I don't want to type in all those forms for all the words manually.
How can I generate them automatically? What is the best approach?
a list of complex rules how to inflect all the words
Bayesian methods
...
There's a program called "William Whitaker's Words". It creates inflections for Latin words as well, so it's exactly doing what I want to do.
Wikipedia says that the program works like this:
Words uses a set of rules based on natural pre-, in-, and suffixation, declension, and conjugation to determine the possibility of an entry. As a consequence of this approach of analysing the structure of words, there is no guarantee that these words were ever used in Latin literature or speech, even if the program finds a possible meaning to a given word.
The program's source is also available here. But I don't really understand how it works. Can you help me? Maybe this would be the solution to my question ...
You could do something similar to the hunspell dictionary format (see http://www.manpagez.com/man/4/hunspell/).
You define two tables. One contains the roots of the words (the part that never changes), and the other contains modifications for a given class. For a given class, for each declension (or conjugation), it tells what characters to add at the end (or the beginning) of the root. It can even specify replacing a given number of characters. Now, to get a word at a specific declension, you take the root, apply the transformation from the class it belongs to, and voilà!
For example, for mandare, the root would be mand, and the class would contain suffixes like o, as, at, amus, atis... for the active indicative present.
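To illustrate the two-table idea, here is a minimal sketch (in JavaScript, since no particular language was asked for); the data covers only the mandare example above and all names are made up:
// table 1: roots, each pointing at an inflection class
const roots = {
  mandare: { root: "mand", inflectionClass: "a-conjugation" }
};

// table 2: per-class endings, here only the present active indicative
const classes = {
  "a-conjugation": {
    presentActiveIndicative: ["o", "as", "at", "amus", "atis", "ant"]
  }
};

// build every form of a lemma for a given paradigm
function inflect(lemma, paradigm) {
  const { root, inflectionClass } = roots[lemma];
  return classes[inflectionClass][paradigm].map(suffix => root + suffix);
}

console.log(inflect("mandare", "presentActiveIndicative"));
// -> ["mando", "mandas", "mandat", "mandamus", "mandatis", "mandant"]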
I'll use as example the nouns, but it applies also to verbs.
First, I would create two classes: Regular and Irregular. For the Regular nouns, I would make three classes for the three declensions, and make them all implement a Declensable (or however the word is in English :) interface (FirstDeclension extends Regular implements Declensable). The interface would define two static enums (NOMINATIVE, VOCATIVE, etc, and SINGULAR, PLURAL).
All would have a string for the root and a static hashmap of suffixes. The method FirstDeclension#get (case, number) would then append the right suffix based on the hashmap.
The Irregular class should have to define a local hashmap for each word and then implement the same Declensable interface.
Does it make any sense?
Addendum: To clarify, the constructor of class Regular would be
public Regular(String stem) {
    this.stem = stem;
}
Perhaps, you could follow the line of AOT in your implementation. (It's under LGPL.)
http://prometheus.altlinux.org/en/Sisyphus/srpms/aot
http://seman.sourceforge.net/
http://aot.ru/
There's no Latin morphology in AOT, only Russian, German, and English; but Russian is of course an example of an inflectional morphology as complex as Latin's, so AOT should be ready as a framework for implementing it.
Still, I believe one has to have an elaborate precise formal system for the morphology already clearly defined before one goes on to programming. As for Russian, I guess, most of the working morphological computer systems are based on the serious analysis of Russian morphology done by Andrey Zalizniak and in the Grammatical Dictionary of Russian and related works.

Is there a "proper" order for listing languages?

Our application is being translated into a number of languages, and we need to have a combo box that lists the possible languages. We'd like to use the name of the language in that language (e.g. Français for French).
Is there any "proper" order for listing these languages? Do we alphabetize them based on their English names?
Update:
Here is my current list (I want to explore the Unicode Collation Algorithm that Brian Campbell mentioned):
"العربية",
"中文",
"Nederlands",
"English",
"Français",
"Deutsch",
"日本語",
"한국어",
"Polski",
"Русский язык",
"Español",
"ภาษาไทย"
Update 2: Here is the list generated by the ICU Demonstration tool, sorting for an en-US locale.
Deutsch
English
Español
Français
Nederlands
Polski
Русский язык
العربية
ภาษาไทย
한국어
中文
日本語
This is a tough question without a single, easy answer. First of all, by default you should use the user's preferred language, as given to you by the operating system, if that is one of your available languages (for example, in Windows, you would use GetUserPreferredUILanguages, and find the first one on that list that you have a translation for).
If the user still needs to select a language (you would like them to be able to override their default language, or select another language if you don't support their preferred language), then you'll need to worry about how to sort the languages. If you have 5 or 10 languages, the order probably doesn't matter that much; you might go for sorting them in alphabetical order. For a longer list, I'd put your most common languages at the top, and perhaps the user's preferred languages at the top as well, and then sort the rest in alphabetical order after that.
Of course, this brings up how to sort alphabetically when languages might be written in different scripts. For instance, how does Ελληνικά (Ellinika, Greek) compare to 日本語 (Nihongo, Japanese)? There are a few possible solutions. You could sort each script together, with, for instance, Roman based scripts coming first, followed by Cyrillic, Greek, Han, Hangul, and so on. Or you could sort non-Roman scripts by their English name, or by a Roman transliteration of their native name. Probably the first or third solution should be preferred; people may not know the English name for their language, but many languages have English transliterations that people may know about. The first solution (each script sorted separately) is how the Mac OS X languages selection works; the second (sorted by their Roman transliteration) appears to be how Wikipedia sorts languages.
I don't believe that there is a standard for this particular usage, though there is the Unicode Collation Algorithm which is probably the most common standard for sorting text in mixed scripts in a relatively language-neutral way.
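For the mixed-script list in the question, here is a rough sketch of what such collation looks like from JavaScript via the built-in Intl.Collator (ICU-backed in most engines); the en-US locale is only an example:
const languages = [
  "العربية", "中文", "Nederlands", "English", "Français", "Deutsch",
  "日本語", "한국어", "Polski", "Русский язык", "Español", "ภาษาไทย"
];

// Intl.Collator compares strings using locale-aware (UCA-based) rules
const collator = new Intl.Collator("en-US");
const sorted = [...languages].sort(collator.compare);

console.log(sorted);
The exact order depends on the collation data shipped with the runtime, but for en-US it should come out close to the ICU demonstration output above.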
I would say it depends on the length of your list.
If you have 5 languages (or any number which easily fits into the dropdown without scrolling) then I'd say put your most common language at the top and then alphabetize them... but just alphabetizing them wouldn't make it less user friendly IMHO.
If you have enough that you'd need to scroll, I would put your top 3 or 5 (or some appropriate number of) most common languages at the top and bold them in the list, then alphabetize the rest of the options.
For a long list I would probably list common languages twice.
That is, "English" would appear at the top of the list and at the point in the alphabetized list where you'd expect.
EDIT: I think you would still want to alphabetize them according to how they're listed... that is, "Español" would appear under E, not under S as if it were "Spanish".
Users will be able to pick up on the fact that languages are listed according to their translated name.
EDIT2: Now that you've edited to show the languages you're interested in I can see how a sort routine would be a bit more challenging!
The ISO has codes for languages (here's the Library of Congress description), which are offered in order by the code, by the English name, and by the French name.
It's tricky. I think as a user I would expect any list to be ordered based on how the items are represented in the list. So as much as possible, I would use alphabetical order based on the names you are actually displaying.
Now, you can't always do that, as many will use other alphabets. In those cases there may be a roman-alphabet way of transliterating the name (for example, the Pinyin system for Mandarin Chinese) and it could make sense to alphabetize based on that. However, romanization isn't a simple subject; there are at least a dozen ways for romanizing Arabic, for example.
You could alphabetize them based on their ISO 639 language code.
