Inverse translation with rails-i18n - ruby-on-rails

I've been happily using the built-in rails i18n support for translating strings to different languages, which works great. Recently though I've had a need for something that goes a bit beyond the default behaviour of this gem.
I'll call this "inverse translation" for lack of a better word. Basically the idea is that I have some string in some language, and I want to be able to call a method with another locale, and get back the string translated to that locale if a mapping exists in the locale strings.
For example, assume I have in config/locales/en.yml:
en:
  hello: Hello World!
and in config/locales/ja.yml:
ja:
  hello: Konnichi wa!
then when I call this method l2l_translate ("locale to locale translate") while in the English locale, with the string and the locale as arguments, I get back the Japanese translation:
I18n.locale = :en
l2l_translate("Hello World!", :ja) #=> "Konnichi wa!"
Also, and this is more tricky, I want to be able to inverse match interpolated strings. So say I have:
config/locales/en.yml:
en:
  minutes: "%d minutes"
config/locales/ja.yml:
ja:
  minutes: "%d分"
Then I should be able to translate from English to Japanese like so:
l2l_translate("5 minutes", :ja) #=> "5分"
So basically the string should be matched with a regex to the English translation string, and the "5" pulled out and sent as an argument "%d" to the Japanese translation.
Obviously there are potential problems here, if: 1) there is no match, or 2) there are multiple matches. Those could be handled by raising an exception, for example, or by returning nil in the former case and an array of translations in the latter. In any case those are minor points.
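A minimal sketch of the matching described above, not an existing gem. It assumes the locale strings are available as plain hashes (the `TRANSLATIONS` constant is a hypothetical stand-in for the YAML files), that `%d` is the only placeholder, and that the source locale is passed explicitly instead of being read from `I18n.locale`. It returns an array, so no match gives an empty array and multiple matches are all returned.

```ruby
# Hypothetical translation tables standing in for config/locales/*.yml.
TRANSLATIONS = {
  en: { "hello" => "Hello World!", "minutes" => "%d minutes" },
  ja: { "hello" => "Konnichi wa!", "minutes" => "%d分" }
}.freeze

# Turn a template such as "%d minutes" into an anchored regex that captures
# each %d as a run of digits; all other text is matched literally.
def template_to_regex(template)
  parts = template.split("%d", -1).map { |part| Regexp.escape(part) }
  /\A#{parts.join('(\d+)')}\z/
end

# Hypothetical helper: returns every translation in locale `to` whose
# template in locale `from` matches `string`.
def l2l_translate(string, to, from: :en)
  TRANSLATIONS.fetch(from).filter_map do |key, template|
    match = template_to_regex(template).match(string)
    target = TRANSLATIONS.fetch(to)[key]
    next unless match && target

    # Feed the captured numbers into the target template.
    format(target, *match.captures.map(&:to_i))
  end
end

p l2l_translate("Hello World!", :ja)
p l2l_translate("5 minutes", :ja)
```

A target string containing a literal `%` would need escaping before being passed to `format`; that case is left out here.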
My basic question is: does anything like this exist? And if not, does anyone have any suggestions on how to go about developing it (say as a gem)?
The application I'm specifically thinking of is an API wrapper for a service in Japanese. I want to be able to specify patterns in Japanese which can be matched and translated into other languages. The default i18n support won't do this, and I don't know of any other gems that will.
Any advice or suggestions would be much appreciated! For reference see also this discussion in 2010 on the topic of inverse translation with i18n-rails.

We use gettext, which is a standard Unix i18n solution. For Rails, you can use gettext_i18n_rails. One caveat is that FastGettext, which backs gettext_i18n_rails, doesn't seem to have complete gettext support, and some advanced features such as pluralization didn't work as expected.

Related

Elixir/Erlang - Split paragraph into sentences based on the language

In Java there is a class called BreakIterator which allows me to pass a paragraph of text in any language (the language it is written in is known) and it will split the text into separate sentences. The magic is that it can take as an argument the locale of the language the text is written in, and it will split the text according to that language's rules (if you look into it, it is actually a very complex issue even in English; it is certainly not a case of 'split by full-stops/periods').
Does anybody know how I would do this in Elixir? I can't find anything in a Google search.
I am almost at the point of deploying a very thin public API that does only this basic task, which I could call from Elixir, but this is really not desirable.
Any help would be really appreciated.
The i18n library should be usable for this. Just going from the examples provided, since I have no experience using it, something like the following should work (:en is the locale code):
str = :i18n_string.from("some string")
iter = :i18n_iterator.open(:en, :sentence)
sentences = :i18n_string.split(iter, str)
There's also Cldr, which implements a lot of locale-dependent Unicode algorithms directly in Elixir, but it doesn't seem to include iteration in particular at the moment (you may want to raise an issue there).

Different date format and one locale

I use I18n.t('date.formats.default') for date formatting.
The issue is that different countries use different date formats, but there is only one English locale.
For example, '%m.%d.%Y' for the US and '%d.%m.%Y' for Australia.
I need ideas on how to handle this.
While you might simply use something else for date formats, the easiest drop-in solution would be to store all possible variants in the same string and on retrieval do (assuming the country code is known):
'date.formats.default': 'US[%m.%d.%Y],AU[%d.%m.%Y]'
code = 'AU'
format = I18n.t('date.formats.default')
format[/(?<=#{code}\[).*?(?=\])/] || format
#⇒ "%d.%m.%Y"
The trailing || format is needed to support a plain format without brackets.
If you don’t like regular expressions, store JSON there instead, containing a hash {CODE => FORMAT}, then parse it and retrieve the value.
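The JSON variant could look roughly like this. It is only a sketch: `STORED_FORMATS` is a hypothetical stand-in for the value returned by I18n.t('date.formats.default'), and `date_format_for` is an illustrative helper name. A plain format string that is not JSON is returned unchanged, which mirrors the || format fallback above.

```ruby
require "json"
require "date"

# Hypothetical stored value: a JSON object mapping country code to format.
STORED_FORMATS = '{"US": "%m.%d.%Y", "AU": "%d.%m.%Y"}'

# Returns the format for the given country code, or the stored string itself
# when it is a plain format rather than a JSON object.
def date_format_for(stored, code)
  formats = JSON.parse(stored)
  formats.is_a?(Hash) ? formats[code] : stored
rescue JSON::ParserError
  stored
end

puts Date.new(2024, 3, 31).strftime(date_format_for(STORED_FORMATS, "AU")) # prints "31.03.2024"
puts Date.new(2024, 3, 31).strftime(date_format_for(STORED_FORMATS, "US")) # prints "03.31.2024"
```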
I think it is more convenient to use different locales.
For example en-AU.yml, en-US.yml, en-CA.yml, etc., especially since i18n supports this.
Australia has different time format too.
Every time you have to take into account all these nuances for each country.
Using different locales simplifies this.

How to display the internationalization "second"/"seconds" string for a number?

I am using Ruby on Rails 4 and, given a number, I would like to display the internationalized "second"/"seconds" string for that number. That is, I have a number (for example, 1 or 20) and I would like to display 1 second or 20 seconds (in English).
I know the date helpers, but no method seems to fit my case. How can I do that?
The usual t function eventually ends up inside the i18n gem's translate method. translate, like any sensible i18n/l10n tool, already knows about the current locale's pluralization rules. That means that you should just tell the translation system which message/string you want and how many of them you have, something like:
t('message-identifier', :count => n)
Then t will use the appropriate pluralization rules for n things in the current locale.
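To make the mechanism concrete, here is a simplified, standalone model of what happens for English. It does not call the i18n gem: `MESSAGES` and `pluralized` are hypothetical names, and the hash stands in for a locale entry with `one` and `other` keys, which is the shape the real t(..., :count => n) lookup expects for a pluralized message.

```ruby
# Hypothetical stand-in for a "seconds" entry in an English locale file;
# in YAML this would be the keys `one` and `other` nested under `seconds`.
MESSAGES = {
  "seconds" => { one: "%{count} second", other: "%{count} seconds" }
}.freeze

# Simplified model of English pluralization: a count of exactly 1 selects
# :one, every other count selects :other. The count is then interpolated.
def pluralized(key, count)
  form = count == 1 ? :one : :other
  format(MESSAGES.fetch(key).fetch(form), count: count)
end

puts pluralized("seconds", 1)   # prints "1 second"
puts pluralized("seconds", 20)  # prints "20 seconds"
```

Other locales have different rules (more plural forms, or none at all), which is why the selection belongs in the translation system and not in the view.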
I use gettext for all my translation needs and it behaves this way. But there's no possible way that t wouldn't work this way too; it must work this way or it is utterly useless.

Best way to handle foreign number in Rails?

We have a Rails app that has been translated into 30 to 40 languages, including exotic ones. We wonder what the best way is to handle numbers and number translation.
For example, we have in english, a string 3 items, but in persian or arabic, it will become something like ٣ سلع‎. Unfortunately, I am getting 3 سلع using I18n gem.
I am using this file as locale:
https://raw.github.com/svenfuchs/rails-i18n/master/rails/locale/ar.yml
The locale is correctly set to ar, but it just outputs 3 in place of ٣ (the Arabic/Persian digit).
Any ideas?

If you have an application localized in pt-br and pt-pt, what language you should choose if the system is reporting only "pt" code?

If you have an application localized in pt-br and pt-pt, which language should you choose if the system reports only the pt code (generic Portuguese)?
This question is independent of the nature of the application, desktop, mobile or browser based. Let's assume you are not able to get region information from another source and you have to choose one language as the default one.
The question applies equally to other cases, including:
pt-pt and pt-br
en-us and en-gb
fr-fr and fr-ca
zh-cn, zh-tw, .... - in fact in this case I know that zh can be used as predominant language for Simplified Chinese where full code is zh-hans. For Traditional Chinese, with codes like zh-tw, zh-hant-tw, zh-hk, zh-mo the proper code (canonical) should be zh-hant.
Q1: How do I determine the predominant language for a specified meta-language?
I need a solution that will include at least Portuguese, English and French.
Q2: If the system reported Simplified Chinese (PRC) (zh-cn) as preferred language of the user and I have translation only for English and Traditional Chinese (en,zh-tw) what should I choose from the two options: en or zh-tw?
In general you should separate the "guess the missing parameters" problem from the "matching a list of locales I want vs. a list of locales I have" problem. They are different.
Guessing the missing parts
These are all tricky areas, and even (potentially) politically charged.
But with very few exceptions the rule is to select the "original country" of the language.
The exceptions are mostly based on population.
So fr-FR for fr, es-ES for es, etc.
Some exceptions: pt-BR instead of pt-PT, en-US instead of en-GB.
It is also commonly accepted (and required by the Chinese standards) that zh maps to zh-CN.
You might also have to look at the country to determine the script, or the other way around.
For instance az => az-AZ but az-Arab => az-Arab-IR, and az_IR => az_Arab_IR
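As a rough illustration, the guesses listed above can be kept in a lookup table. This is only a sketch: the table contains just the examples from this answer, `LIKELY_LOCALE` and `fill_missing_subtags` are hypothetical names, and a real implementation would use the CLDR likely-subtags data instead of a hand-written hash.

```ruby
# Defaults taken from the examples above; not a complete table.
LIKELY_LOCALE = {
  "fr" => "fr-FR", "es" => "es-ES",
  "pt" => "pt-BR", "en" => "en-US",
  "zh" => "zh-CN",
  "az" => "az-AZ", "az-Arab" => "az-Arab-IR", "az-IR" => "az-Arab-IR"
}.freeze

# Hypothetical helper: fills in the missing parts of a tag when a default
# is known, and returns the tag unchanged otherwise. Underscore-separated
# tags are looked up in their hyphenated form.
def fill_missing_subtags(tag)
  LIKELY_LOCALE.fetch(tag.tr("_", "-"), tag)
end

puts fill_missing_subtags("pt")     # prints "pt-BR"
puts fill_missing_subtags("az_IR")  # prints "az-Arab-IR"
puts fill_missing_subtags("ja")     # prints "ja"
```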
Matching 'want' vs. 'have'
This involves matching a list of want vs. a list of have languages.
Dealing with lists makes it harder. The result should also be sorted in a smart way, if possible (for instance, if want = [ fr ro ] and have = [ en fr_CA fr_FR ro_RO ] then you probably want [ fr_FR fr_CA ro_RO ] as the result).
There should be no match between languages with different scripts. So zh-TW should not fall back to zh-CN, and mn-Mong should not fall back to mn-Cyrl.
Tricky areas: sr-Cyrl should not fall back to sr-Latn in theory, but it might be understood by users. ro-Cyrl might fall back to ro-Latn, but not the other way around.
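A very small sketch of the want vs. have matching, reproducing the [ fr ro ] example above. It compares only the language subtag and puts an assumed default region first; it deliberately ignores scripts, so it does not implement the script rules just described. `DEFAULT_REGION` and `match_locales` are hypothetical names, and the region table holds only illustrative entries.

```ruby
# Illustrative default regions; a real table would come from CLDR data.
DEFAULT_REGION = { "fr" => "FR", "ro" => "RO", "pt" => "BR", "en" => "US" }.freeze

# Hypothetical helper: for each wanted language, in order, list the available
# locales with the same language subtag, default region first. The remaining
# candidates keep the order they have in `have`.
def match_locales(want, have)
  want.flat_map do |wanted|
    language = wanted.split(/[-_]/).first
    preferred = "#{language}_#{DEFAULT_REGION[language]}"
    candidates = have.select { |tag| tag.split(/[-_]/).first == language }
    candidates.partition { |tag| tag.tr("-", "_") == preferred }.flatten
  end
end

p match_locales(%w[fr ro], %w[en fr_CA fr_FR ro_RO])
# => ["fr_FR", "fr_CA", "ro_RO"]
```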
Some references
RFC 4647 deals with language fallback (but is not very useful in this case, because it follows the "cut from the right" rule).
ICU 4.2 and newer (draft in 4.0, I think) has uloc_addLikelySubtags (and uloc_minimizeSubtags) in uloc.h. That implements http://www.unicode.org/reports/tr35/#Likely_Subtags
Also in ICU uloc.h there are uloc_acceptLanguageFromHTTP and uloc_acceptLanguage, which deal with want vs. have. But they are of limited use as they stand, because they take a UEnumeration* as input, and there is no public API to build a UEnumeration.
There is some work on language matching going beyond the simple RFC 4647. See http://cldr.unicode.org/development/design-proposals/languagedistance
Locale matching in ActionScript at http://code.google.com/p/as3localelib/
The APIs in the new Flash Player 10.1 flash.globalization namespace do both tag guessing and language matching (http://help.adobe.com/en_US/FlashPlatform/beta/reference/actionscript/3/flash/globalization/package-detail.html). It works on TR-35 and can look beyond the # and consider the operation. For instance, if have = [ ja ja#collation=radical ja#calendar=japanese ] and want = [ ja#calendar=japanese;collation=radical ] then the best match depends on the operation you want. For date formatting ja#calendar=japanese is the better match, but for collation you want ja#collation=radical
Do you expect to have more users in Portugal or in Brazil? Pick accordingly.
For your general solution, you find out by reading up on Ethnologue.

Resources