So far, all the online libraries I've found offer functionality for converting Rrule strings to human-readable strings for English only.
I tried to modify the code from the rrule package (Flutter), which is mainly based on rrule.js, to add support for CJK languages, but the result doesn't satisfy me yet and I think I might need a slightly different approach.
Looking at the big-tech implementations: Google doesn't show anything (at least in their calendar app), and Google Translate does an awful job of translating more complex recurrence rules from English to, e.g., Korean. Microsoft and Samsung calendars support only very simple recurrence rules. Apple seems to have one of the best solutions in their calendar.
So I was wondering if there is any open-source or otherwise accessible code for converting Rrule strings to human-readable strings for CJK languages that I could refer to. The more complex the recurrence rules it supports, the better.
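One possible "slightly different approach" is to build the sentence from whole-phrase templates per language instead of concatenating fragments in English word order. The sketch below is only an illustration for Chinese: the function names and the wording of the templates are my own assumptions, not taken from rrule.js or the Flutter package, and it only handles FREQ, INTERVAL, plain BYDAY values and COUNT.

```python
# Illustrative sketch only: template-based RRULE-to-text for Chinese.
# All names and phrasings here are hypothetical, not from any existing library.

FREQ_UNITS_ZH = {"DAILY": "天", "WEEKLY": "周", "MONTHLY": "个月", "YEARLY": "年"}
WEEKDAYS_ZH = {
    "MO": "周一", "TU": "周二", "WE": "周三", "TH": "周四",
    "FR": "周五", "SA": "周六", "SU": "周日",
}


def parse_rrule(rrule):
    """Split 'FREQ=WEEKLY;INTERVAL=2' (optionally prefixed 'RRULE:') into a dict."""
    if rrule.upper().startswith("RRULE:"):
        rrule = rrule[len("RRULE:"):]
    return dict(item.split("=", 1) for item in rrule.split(";") if item)


def rrule_to_chinese(rrule):
    parts = parse_rrule(rrule)
    unit = FREQ_UNITS_ZH[parts["FREQ"]]
    interval = int(parts.get("INTERVAL", "1"))

    # "every <unit>" vs. "every <n> <unit>"
    text = f"每{unit}" if interval == 1 else f"每{interval}{unit}"

    # Plain weekday lists only; ordinals such as "1MO" are not handled here.
    if "BYDAY" in parts:
        days = "、".join(WEEKDAYS_ZH[d] for d in parts["BYDAY"].split(","))
        text += f"的{days}"

    if "COUNT" in parts:
        text += f"，共{parts['COUNT']}次"
    return text


if __name__ == "__main__":
    print(rrule_to_chinese("FREQ=WEEKLY;INTERVAL=2;BYDAY=MO,WE"))
```

Keeping one small formatter per language makes it easier to reorder the parts of the sentence, which is where English-centric code tends to break down for CJK output.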
I noticed that when transcribing speech in multiple languages with the OpenAI Whisper speech-to-text library, it sometimes accurately recognizes inserts in another language and provides the expected output, for example: 八十多个人 is the same as 八十几个人. So 多 and 几 are interchangeable and they can both mean several.
Yet the same audio input on a different pass (with the same model, or a smaller or bigger model) intermittently results in glitches where the entire sentence is translated rather than transcribed, i.e. a fragment is translated into either the first or the second language that appears in the audio. With the example input above, either the entire sentence would be in English (with the Chinese bits translated to English), or the entire sentence would be in Chinese (with the English bits translated to Chinese). Important: in both cases no input language was specified and no task type was passed (which implies the default --task transcribe).
The Whisper docs mention English as the only available target language for translation (with the option --task translate in the command-line version); there is no mention of translating to other target languages. Yet the behavior described above indicates that the models are capable of translating to other languages too.
The question is whether there is a known way to configure the models to do just text-to-text translation. Or is the behavior just some sort of glitch that cannot be 'exploited' or configured at a lower level to allow using the models purely for text translation between any of the supported languages?
According to a comment in Whisper's issue tracker, this might be a possible answer:
From the paper, the dataset that was used did not use any English audio to Polish text samples. The dataset was cleaned by using a different model to match spoken language with text language. If they did not match, the sample was excluded. An exception was made for a portion of the training data to match any spoken language to English text (X->en) translation.
So unfortunately there is no direct way; the model wasn't trained on it. For your use case, this can transcribe to English text, but there has to be an outside system to translate from English text to Polish text.
The --language parameter is defined in the CLI as:
--language
language spoken in the audio, specify None
to perform language detection (default: None)
Yet, despite the help text above, this option can have potentially useful undocumented side effects.
The 'exploit'
The undocumented glitch that was observed is that if you set a source language, e.g. es, but the audio input contains English, then the English part of the input will be translated to Spanish. Parts of the audio input that are not in English will be transcribed, although depending on the language it might not always work, or it might generate garbage translations.
So the 'exploit' is that the models can be used to parse English audio and then translate it to a supported language.
The behaviour above occurs with the regular transcribe mode (the default, i.e. --task transcribe), and is reproducible with both the original Whisper implementation in Python and the CPU-optimized C++ port whisper.cpp, which uses the same models but apparently with different parameters.
The quality of the non-English translation depends on the language, and seems to be generally lower than that of translating from English with the open-source Hugging Face models (e.g. Helsinki-NLP/opus-mt-es-en, facebook/m2m100_418M, facebook/m2m100_1.2B etc.).
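To try the behaviour described above, the invocation is simply the regular transcribe task with a --language that differs from the language actually spoken. The helper below is a minimal sketch that only assembles the command line; the function names and the audio file name are hypothetical, and since the side effect is undocumented, the result may vary between models and runs.

```python
import subprocess


def build_whisper_command(audio_path, language=None, model="small"):
    """Assemble a whisper CLI call using the default transcribe task.

    Passing a language other than the spoken one is what triggered the
    undocumented translation-like output described above.
    """
    cmd = ["whisper", audio_path, "--model", model, "--task", "transcribe"]
    if language is not None:
        cmd += ["--language", language]
    return cmd


def run_whisper(cmd):
    """Run the command; requires the whisper CLI to be installed."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "english_speech.mp3" is a hypothetical file containing English audio.
    command = build_whisper_command("english_speech.mp3", language="es")
    print(" ".join(command))
    # run_whisper(command)  # uncomment once whisper and the audio file exist
```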
I am trying to find a localized / translated ISO 4217 currency code list. What I have found so far is only an English version of ISO 4217, but currency names like "Swiss Franc" have different translations per language (as per https://www.wikidata.org/wiki/Q25344). Are there any lists or databases out there that could be used inside an app without reinventing the wheel?
The ISO 4217 standard defining the international currency codes seems to be provided by ISO only in English. This is quite unusual, since many of the general-purpose standards are provided by ISO in English, French and Russian.
Since currency codes per country evolve more quickly than a standards committee can follow, the maintenance of the standard was delegated to an agency that provides a static list for free, but also only in English. The Publications Office of the European Union provides the list of currencies on a page that is translated into the 24 official languages of the European Union. There are a couple of other websites, not to mention Wikipedia, which also provide some more translations.
But before you start to develop a web-scraping app to get all the translations across all the known public sources, maybe you could just have a look at this amazing GitHub repository, which provides the list in almost any language and in a lot of different formats (I recently discovered this link on the Wikipedia page that you referenced).
I read a few papers about machine translation but did not understand them well.
The language models (in Google translate) use phonetics and machine learning as best as I can tell.
My question then becomes: is it possible to take an Arabic word that is spelled phonetically in English and recover the user's intended Arabic word?
For instance, the word 'Hadith' is an English phonetic spelling of the Arabic word 'حديث'. Can I programmatically go from 'Hadith' to Arabic?
Thanks to the Wiki article, I learned there's an entire field of work in the area of transliteration. There was a Google API for this that was deprecated in 2011 and moved to the Google Input Tools service.
The simplest answer is Buckwalter transliteration, but at first glance a 1:1 mapping doesn't seem like a good enough idea.
I am going to try to see if there's a way to hack the Google Input Tools and call it even at CLI level, because their online demo works very well.
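To make the concern about a 1:1 mapping concrete, here is a minimal sketch of Buckwalter-style conversion using only a partial table (the function name and the choice of which symbols to include are mine). It shows that the strict Buckwalter spelling of the word converts cleanly, while the casual English spelling 'Hadith' does not land on the intended word.

```python
# Partial Buckwalter table (ASCII symbol -> Arabic character).
# Only a subset of the scheme is listed; this is a sketch, not a full converter.
BUCKWALTER_TO_ARABIC = {
    "A": "\u0627",  # alif
    "b": "\u0628",  # ba
    "t": "\u062a",  # ta
    "v": "\u062b",  # tha
    "j": "\u062c",  # jim
    "H": "\u062d",  # Ha
    "x": "\u062e",  # kha
    "d": "\u062f",  # dal
    "r": "\u0631",  # ra
    "s": "\u0633",  # sin
    "l": "\u0644",  # lam
    "m": "\u0645",  # mim
    "n": "\u0646",  # nun
    "h": "\u0647",  # ha
    "w": "\u0648",  # waw
    "y": "\u064a",  # ya
    "a": "\u064e",  # fatha (short vowel mark)
    "u": "\u064f",  # damma (short vowel mark)
    "i": "\u0650",  # kasra (short vowel mark)
}


def buckwalter_to_arabic(text):
    """Map each symbol independently; unknown symbols pass through unchanged."""
    return "".join(BUCKWALTER_TO_ARABIC.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(buckwalter_to_arabic("Hdyv"))    # strict Buckwalter spelling
    print(buckwalter_to_arabic("Hadith"))  # casual English spelling
```

In Buckwalter notation the word is written "Hdyv", which maps letter by letter to حديث. The input "Hadith" instead treats "t" and "h" as two separate letters and reads "a" and "i" as short vowel marks, so the output is a different string. A usable phonetic input method therefore needs something beyond a character table, such as digraph handling or a dictionary of candidate words.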
I have a keyboard app designed for the Serbian language. My keys have labels based on the Serbian Cyrillic alphabet. The XML strings used for those labels are enclosed in <xliff:g></xliff:g> tags, but a certain provider on a certain type of phone still translates them into a different language. Just in case, I also have my strings in language-specific folders, but it still happens. Does anyone know of any other way I could disable translation of all my strings?
There are providers who can handle translation of technical files, i.e. who know what to translate in technical files. Some also let you manage the translations yourself. OneSky is one of these platforms, and we also provide a translation service.
See GIF of how placeholder validation works in OneSky
Disclaimer: I work at OneSky
I want to experiment with an idea I have of automatically localizing software, or at least suggesting a reasonable translation if a localized string is not available.
I'm not sure this will be working satisfactorily tomorrow morning but I just wanted to play with this idea.
Does anybody know of a dictionary that is free to use and in an easy-to-parse format, that can help me automatically translate words from English to other European languages (French, German, Spanish, etc.)?
The FreeDict project has quite a few relatively complete dictionaries. Most are from one language to English or vice versa, but some are between two non-English languages as well.
I don't know of any dictionary, but I would like to point something out. You have to bear in mind that translating is not a direct word-for-word technique in any sense. The rules of the language change as well, and a word-for-word result can leave sentences unreadable. This is why even companies like Google have trouble making good translation software. Context is very hard to detect programmatically, and context means everything in choosing the right word, the right structure and so on.
Maybe use a translation API, if there is one. Google only seems to offer a JavaScript API for language.
You can't even expect to get a reasonable translation with an automatic method. Translating full texts is too hard for a computer to handle completely correctly, and translating short phrases correctly is impossible.
Take for example the simple text "Open": without context it's not even possible to tell whether it's a verb or an adjective. I know that at least in German the verb and the adjective translate into two different words.
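The "Open" problem can be sketched in a few lines. The tiny dictionary below is hypothetical (it is not taken from FreeDict or any other source); it is keyed by word and part of speech, so a bare lookup without context can only return several candidates rather than one translation.

```python
# Hypothetical mini dictionary: (English word, part of speech) -> German word.
EN_TO_DE = {
    ("open", "verb"): "öffnen",
    ("open", "adjective"): "offen",
}


def suggest_translations(word, part_of_speech=None):
    """Return candidate translations; without context there may be several."""
    candidates = {
        pos: target
        for (source, pos), target in EN_TO_DE.items()
        if source == word.lower()
    }
    if part_of_speech is not None:
        return [candidates[part_of_speech]] if part_of_speech in candidates else []
    return sorted(candidates.values())


if __name__ == "__main__":
    print(suggest_translations("Open"))          # ambiguous: two candidates
    print(suggest_translations("Open", "verb"))  # context picks one
```

For UI strings, this suggests that any automatic suggestion tool would need some hint about how the string is used (button label, status text, and so on) before it can pick between the candidates.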
Also, computer specific concepts often borrow words from similar concepts outside the computer sphere. Those concepts often have a specific translation, but an automatic translation would sometimes try to translate it as if it was the original meaning, which can give you very strange translations.
After a while of searching, I solved the problem myself by starting to create my own dictionary. I do a lot of translations in my free time. In the beginning it is really boring work, but after a while you get a really good dictionary. Some friends of mine are using it too, so we all benefit from every new word we translate.