ICU/CLDR | ISO 639 Changes - Monitoring & Maintenance

International Components for Unicode (ICU) is the underlying library that provides localisation features in various programming languages and environments; I approach this from PHP, where the intl extension uses ICU for this purpose.
As I understand it, ICU uses the Unicode Common Locale Data Repository (CLDR) for identifying ISO 639 language codes.
Language codes are subject to change
The ISO 639 language codes are actively maintained and have been subject to quite substantial changes over the years; see the change-log for ISO 639-3.
If I am aggressively looking to support internationalisation/localisation features, the implementation of ISO 639 and associated standards becomes important to the requirements specification and delivery of my application. Ensuring that my application can at least identify all valid language codes is important for delivering relevant content.
How can I Monitor the ISO 639 Implementation in ICU/CLDR?
The most important aspect of this is simply having a traceable source for updates to the ISO 639 data, so that, if a problem is encountered, I know where to look for upcoming releases and changelogs, where to report bugs, and so on.
When using ICU/CLDR how can I monitor and maintain the ISO 639 implementation?
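A practical starting point is to make the versions inspectable at runtime, since the ISO 639/CLDR data you get is pinned to the ICU build behind ext/intl. Below is a minimal PHP sketch (assuming ext/intl is loaded); the icu_knows_language() helper is my own illustration, and the "ICU echoes back unknown codes" heuristic is an assumption to verify against your ICU version, not a documented contract.

```php
<?php
// Sketch: inspect the ICU versions behind ext/intl and probe a language code.
// INTL_ICU_VERSION is defined by ext/intl; INTL_ICU_DATA_VERSION may be
// missing on older builds, hence the defined() guard.
printf("ICU library version: %s\n", INTL_ICU_VERSION);
printf("ICU data version:    %s\n",
    defined('INTL_ICU_DATA_VERSION') ? INTL_ICU_DATA_VERSION : 'unknown');

// Heuristic: ICU tends to return the input unchanged when it has no
// display name for a code. Verify this behaviour per ICU version.
function icu_knows_language(string $code): bool
{
    return Locale::getDisplayLanguage($code, 'en') !== $code;
}

var_dump(icu_knows_language('de')); // bool(true) - German is known
var_dump(icu_knows_language('zz')); // likely bool(false)
```

From there, monitoring becomes a matter of watching the ICU and CLDR release notes for the data versions these constants report.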

Related

iCalendar Recurrence Rule to Human Readable Text - Localization for CJK Languages

So far, all the libraries I've found online offer functionality for converting RRULE strings to human-readable strings for English only.
I tried modifying the code from the rrule package for Flutter (which is largely based on rrule.js) to add support for CJK languages, but the result doesn't satisfy me yet, and I think I might need a slightly different approach.
Looking at the big-tech implementations: Google doesn't show anything (at least in their calendar app, and Google Translate does an awful job of translating more complex recurrence rules from English into, e.g., Korean); Microsoft and Samsung calendars support only very simple recurrence rules; Apple seems to have one of the best solutions in its calendar.
So I was wondering: is there any open-source or otherwise accessible code for converting RRULE strings to human-readable strings for CJK languages that I could refer to? The more complex the recurrence rules supported, the better.
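One way to see why a word-by-word port falls short: CJK output cannot be assembled in English word order, so each locale has to own its entire sentence pattern. Here is a minimal PHP sketch of that template-per-locale idea; the tiny RRULE subset, the template strings and the describe() helper are hypothetical illustrations, not part of any existing library.

```php
<?php
// Sketch: template-per-locale rendering for a tiny, hypothetical RRULE
// subset (FREQ=WEEKLY;INTERVAL=n;BYDAY=day). Each locale owns its whole
// sentence pattern, so word order never comes from English fragments.
$templates = [
    'en' => 'every %d week(s) on %s',
    'ko' => '%d주마다 %s요일에', // the particle attaches to the weekday stem
];
$weekdays = [
    'MO' => ['en' => 'Monday', 'ko' => '월'],
];

function describe(array $rule, string $locale, array $templates, array $weekdays): string
{
    return sprintf($templates[$locale], $rule['INTERVAL'],
                   $weekdays[$rule['BYDAY']][$locale]);
}

echo describe(['INTERVAL' => 2, 'BYDAY' => 'MO'], 'en', $templates, $weekdays), "\n";
// every 2 week(s) on Monday
echo describe(['INTERVAL' => 2, 'BYDAY' => 'MO'], 'ko', $templates, $weekdays), "\n";
// 2주마다 월요일에
```

A real implementation would need a template per combination of FREQ and BY-rules; the point is that the templates, not the code, carry the grammar.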

Is there a translated / localized ISO 4217 currency code list?

I am trying to find a localized/translated ISO 4217 currency code list. What I have found so far is only an English version of ISO 4217, but currency names like "Swiss Franc" have different translations per language (as per https://www.wikidata.org/wiki/Q25344). Are there any lists or databases out there that could be used inside an app without reinventing the wheel?
The ISO 4217 standard defining the international currency codes seems to be provided by ISO in English only. This is quite unusual, since many general-purpose standards are provided by ISO in English, French and Russian.
Since currency codes per country evolve more quickly than a standards committee can follow, maintenance of the standard was delegated to an agency that provides a static list for free, but again only in English. The Publications Office of the European Union provides the list of currencies on a page translated into the 24 official languages of the European Union. There are a couple of other websites, not to mention Wikipedia, that also provide some more translations.
But before you start developing a web-scraping app to collect all the translations across all the known public sources, have a look at this amazing GitHub repository, which provides the list in almost any language and in a lot of different formats (I recently discovered the link on the Wikipedia page that you referenced).
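If your app already depends on ICU (directly, or through something like PHP's intl extension), you may not need an external list at all: CLDR ships localized currency names, and you can read them from the bundled resource data. A hedged PHP sketch follows; the 'ICUDATA-curr' bundle name and the index layout are ICU internals rather than a stable public API, so verify them against the ICU version you deploy.

```php
<?php
// Sketch: reading localized ISO 4217 currency names from ICU/CLDR via
// ext/intl. The 'ICUDATA-curr' bundle name and the symbol/display-name
// layout are ICU internals, not a stable public API - verify per version.
function currency_display_name(string $code, string $locale): ?string
{
    $bundle = ResourceBundle::create($locale, 'ICUDATA-curr');
    $entry  = $bundle?->get('Currencies')?->get($code);
    return $entry?->get(1); // index 0 = symbol, index 1 = localized name
}

echo currency_display_name('CHF', 'de'), "\n"; // e.g. "Schweizer Franken"
echo currency_display_name('CHF', 'fr'), "\n"; // e.g. "franc suisse"
```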

Is there a Way to localize an Application on Various Platforms

We are developing an application which runs on various platforms (Windows, Windows RT, Mac OS X, iOS, Android).
The problem is how to manage the different localizations on the different platforms in an easy way. The language files on the different platforms have various formats (some are XML-based, others are simple key-value pairs, and others are totally crazy formats, like on Mac OS).
I'm sure we aren't the first company with this problem, but I wasn't able to find an easy-to-use solution that achieves one "datasource" where the strings are collected in different languages (the best would be a user interface for the translators) and can then be exported to the different formats for the different platforms.
Does anybody have a solution for this problem?
Greetings
Alexander
I recommend using the GNU Gettext toolchain for management, and at runtime using either
some alternate implementation for runtime reading, like Boost.Locale,
your own implementation (the .mo format is pretty trivial), or
the Translate Toolkit to convert the message catalogs to some other format of your liking.
You can't use the libintl component of GNU Gettext, because it is licensed under the LGPL, and the terms of both the Apple App Store and the Windows Store are incompatible with that license. But it is really trivial to reimplement the bit you need at runtime.
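To back up the claim that the .mo format is trivial, here is a minimal PHP reader; it is a sketch that assumes a little-endian catalog as msgfmt produces, and it deliberately skips the hash table, plural forms and context markers.

```php
<?php
// Minimal .mo catalog reader - a sketch, not a libintl replacement.
// Assumes a little-endian file (magic 0x950412de) as msgfmt produces;
// ignores the hash table, plural forms and context markers.
function read_mo(string $path): array
{
    $data   = file_get_contents($path);
    $header = unpack('Vmagic/Vrevision/Vcount/Vorig/Vtrans', $data);
    if ($header['magic'] !== 0x950412de) {
        throw new RuntimeException("$path is not a little-endian .mo file");
    }

    $catalog = [];
    for ($i = 0; $i < $header['count']; $i++) {
        // Both string tables hold (length, offset) pairs of 32-bit ints.
        [, $len, $off] = unpack('V2', substr($data, $header['orig'] + $i * 8, 8));
        $msgid = substr($data, $off, $len);
        [, $len, $off] = unpack('V2', substr($data, $header['trans'] + $i * 8, 8));
        $catalog[$msgid] = substr($data, $off, $len);
    }
    return $catalog; // msgid => msgstr
}

$messages = read_mo('de/LC_MESSAGES/myapp.mo'); // hypothetical path
echo $messages['Hello'] ?? 'Hello';
```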
The Translate Toolkit actually reimplements all or most of GNU Gettext and supports many additional localization formats, but the Gettext .po format has the most free tools built around it (e.g. Poedit for local editing and Weblate for online editing), so I recommend sticking with it anyway. And read the GNU Gettext manual; it describes the intended process and the rationale behind it well.
I have quite good experience with the toolchain. The Translate Toolkit is easy to script when you need special processing, like extracting translatable strings from your custom resource files, and Weblate is easy for your translators to use, especially when, as we do, you rely on business partners and testers in various countries for most translations.
The Translate Toolkit also supports extracting translatable strings from HTML, so the same process can be used for translating your website.
I did a project for iPhone and Android that had many translations, and I think I have exactly the solution you're looking for.
The way I solved it was to put all translation texts in an Excel spreadsheet and use a VBA macro to generate the .strings and .xml translation files from there. You can download my example Excel sheet plus VBA macro here:
http://members.home.nl/bas.de.reuver/files/multilanguage.zip
Just recently I've also added preliminary Visual Studio .resx output, although that's untested.
Edit: by the way, my JavaScript Xcode/Eclipse converter might also be of use.
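The same one-datasource/many-exports idea is easy to prototype without Excel. Here is a hedged PHP sketch with hypothetical key/text rows that emits both an iOS Localizable.strings file and an Android strings.xml; real escaping rules for .strings files are more involved than the addslashes() shortcut used here.

```php
<?php
// Sketch: one translation datasource exported to two platform formats.
// The $rows layout (key, translated text) is hypothetical; .strings
// escaping is simplified to addslashes() and would need hardening.
function export_translations(array $rows): array
{
    $strings = '';
    $xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<resources>\n";
    foreach ($rows as [$key, $text]) {
        $strings .= sprintf("\"%s\" = \"%s\";\n", $key, addslashes($text));
        $xml     .= sprintf("    <string name=\"%s\">%s</string>\n",
                            $key, htmlspecialchars($text, ENT_XML1));
    }
    $xml .= "</resources>\n";
    return ['Localizable.strings' => $strings, 'strings.xml' => $xml];
}

// Usage: write one file per platform from the same rows.
$rows = [['greeting', 'Hello'], ['farewell', 'Goodbye']];
foreach (export_translations($rows) as $filename => $content) {
    file_put_contents($filename, $content);
}
```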
You can store your translations on https://l10n.ws and get them via their API.
Disclaimer: I am the CTO and Co-Founder at Tethras, but will try to answer this in a way that is not just "Use our service".
As loldop points out above, you really need to normalize your content across all platforms if you want a one-stop solution for managing your localized content. This can be a lot of work, requiring much coding and scripting and calling of various tools from the different SDKs to arrive at a common format that would serve the localization needs of all the file formats you need to support. The length and complexity of my previous sentence is inversely proportional to the amount of work you would need to do to arrive at a favorable solution for all of this.
At Tethras, we have built a platform that alleviates the need for multi-platform software publishers to have to do this. We support all of the native formats from the platforms you list above, and can leverage translations from one file format to another. For example, translate the content in Localizable.strings from your iOS app into a number of languages, then upload your equivalent strings.xml file from Android or foo.resx from Windows RT to the system, and it will leverage translations for you automatically. Any untranslated strings will be flagged and you can order updates for these strings.
In effect, Tethras is a CMS for localized content across many different native file formats.

What is the official standard for pthreads?

I am trying to find the document that specifies the standard for pthreads. I've seen various links that point to IEEE 1003.1c-1995 (e.g. Wikipedia or the Open Group). However, when I searched for this document on the IEEE standards site, I eventually found this page, which said "Superseded Standard."
The IEEE page for 1003.1c-1995 did have a note that said: "Abstract not available. See ISO/IEC 9945-1." Searching for that on Google led me to a page for ISO/IEC 9945-1:1996 but the status said "withdrawn."
So my question is what is the current active standard for pthreads? Even better would be if there was a link to a free version of the standard, but it looks like most of the links I've seen for standards cost money. But I figure if I can find out the actual standard then I might try to see if I can access it through my school's library. But first I want to know what document I should be looking for.
I believe you want ISO/IEC/IEEE 9945:2009, as it is the newest: ISO/IEC 9945-1:1996 was revised by ISO/IEC 9945-1:2003, and ISO/IEC/IEEE 9945:2009 revised that in turn.
The POSIX FAQ provides additional information; Q4, "Where can I download the 1003.1 standard from?", is specifically relevant and includes links to a free HTML online version here (registration required).
There is understandably a lot of confusion around the relevant standards. We have:
ISO/IEC 9945
IEEE 1003.1
POSIX.1
Single Unix Specification
The Open Group Base Specifications
Possibly others
Why so many different standards? I'm sure it's mostly historical. At one point some or all of these standards might have referred to their own thing. But the simplest answer is that, today, all of these specifications are now just different names for the same thing*. Here is the opening sentence from the online version of The Open Group Base Specifications, Issue 7:
POSIX.1-2008 is simultaneously IEEE Std 1003.1™-2008 and The Open Group Technical Standard Base Specifications, Issue 7
Some of the standards bodies do not provide free or registration-free access to their copies of the standard. However, The Open Group does allow free (and registration-free) access to the current issue of their online copy.
*The Single Unix Specification may not be exactly the same; it seems it contains everything in POSIX, plus the X/Open Curses standard.

Open-source OCR package that can handle unknown characters?

I want to find a (preferably) open-source OCR package (for any OS) that is capable of handling a new character set.
The language is Latin, but with some scribal abbreviations, about 10 different abbreviations that aren't in Unicode.
The text has been printed using specially-developed fonts, and I have high-res images of the text.
I'm assuming some training is going to be needed, first to map the scribal abbreviations to ASCII, and then presumably corpus-specific training for the software to learn where the abbreviations tend to appear within words.
Could anyone recommend a (preferably) open-source package capable of handling this?
AFAIK there is no library (free or commercial) that can be used as-is for what you describe (a language with characters not representable in Unicode). But as a good starting point there is an open-source OCR engine called Tesseract, which you could take and modify for your special scenario. Another interesting base could be OCRopus. But beware: this will mean a lot of work.
