(Mis)-using open.ai whisper for text-to-text translation - machine-learning

I noticed that when transcribing speech in multiple languages, the OpenAI Whisper speech-to-text library sometimes accurately recognizes inserts in another language and produces the expected output, for example: 八十多个人 is the same as 八十几个人. So 多 and 几 are interchangeable and they can both mean several.
Yet, the same audio input on a different pass (with the same model, or a smaller/bigger model) would intermittently result in glitches where the entire sentence is being translated rather than transcribed. I.e. a fragment would be translated either into the first or the second language that appears in the audio. With the example input above either the entire sentence would be in English (with Chinese bits translated to English), or the entire sentence would be in Chinese (with the English bits translated to Chinese). Important: in both cases no input language was specified, and no task type was passed (which implies the default --task transcribe).
The docs for whisper mention translation to English as the only available target language (with the option --task translate in the command line version), but there is no mention of translating to other target languages. Yet the behavior mentioned above indicates that the models are capable of doing translation to other languages too.
The question is if there is a known way to configure the models to do just text-to-text translation? Or is the behavior just some sort of glitch that is not something that can be 'exploited' or configured on a lower level that would allow using the models just for text translation between any of the supported languages?

According to a comment in Whisper's issue tracker, this might be a possible answer:
From the paper, the dataset that was used did not include any English-audio-to-Polish-text samples. The dataset was cleaned by using a different model to match spoken language with text language. If they did not match, the sample was excluded. An exception was made for a portion of the training data, which matches any spoken language to English text (X->en translation).
So unfortunately there is no direct way; the model wasn't trained on it. For your use case, this can transcribe to English text, but there has to be some outside system to translate from English text to Polish text.
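The two-step route described in that comment (Whisper for X->en, then a separate text model for en->target) could look roughly like the sketch below. It assumes the openai-whisper and transformers packages are installed. The function names, the "small" model size, and the choice of facebook/m2m100_418M as the "outside system" are illustrative assumptions, not something the quoted comment prescribes.

```python
from typing import Callable


def speech_to_target_text(
    audio_path: str,
    target_lang: str,
    to_english: Callable[[str], str],
    translate: Callable[[str, str], str],
) -> str:
    """Chain speech -> English text -> target-language text."""
    english = to_english(audio_path)
    if target_lang == "en":
        return english  # nothing left to translate
    return translate(english, target_lang)


def whisper_to_english(audio_path: str) -> str:
    """Stage 1: Whisper's documented X->en mode (task='translate')."""
    import whisper  # imported lazily; requires the openai-whisper package

    model = whisper.load_model("small")  # model size is an arbitrary choice
    return model.transcribe(audio_path, task="translate")["text"].strip()


def m2m100_translate(text: str, target_lang: str) -> str:
    """Stage 2: English text -> target language with M2M100 (assumed choice)."""
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    name = "facebook/m2m100_418M"
    tokenizer = M2M100Tokenizer.from_pretrained(name)
    model = M2M100ForConditionalGeneration.from_pretrained(name)
    tokenizer.src_lang = "en"
    encoded = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **encoded, forced_bos_token_id=tokenizer.get_lang_id(target_lang)
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

A call such as speech_to_target_text("talk.mp3", "pl", whisper_to_english, m2m100_translate) would then run both stages. Passing the two stages in as functions keeps either model easy to swap out.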
The --language parameter is defined in the cli as:
--language
    language spoken in the audio, specify None to perform language detection (default: None)
Yet, despite the help text above this can have potentially useful undocumented side effects.
The 'exploit'
The undocumented glitch that was observed is that if you set a source language e.g. es but the audio input contains English then the English part of the input will be translated to Spanish. Parts of the audio input that are not in English will be transcribed although depending on the language it might not always work or it might generate garbage translations.
So the 'exploit' is that the models can be used to parse English audio and then translate it to a supported language.
The behaviour above occurs with the regular transcribe mode (the default, i.e. --task transcribe), and is reproducible with both the original whisper implementation in Python and the CPU-optimized C++ port whisper.cpp, which uses the same models but apparently with different parameters.
The quality of the non-English translation depends on the language, and seems to be generally lower than that of translating from English with the open-source Hugging Face models (e.g. Helsinki-NLP/opus-mt-es-en, facebook/m2m100_418M, facebook/m2m100_1.2B etc).
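For the Python implementation, the 'exploit' amounts to pinning the language while leaving the task on transcribe. A minimal sketch, assuming the openai-whisper package is installed; the helper names and the "small" model size are hypothetical choices, and since the behaviour is undocumented the output may differ between models and versions:

```python
def forced_language_options(target_lang: str) -> dict:
    """Decoding options: keep the default task but pin the language."""
    return {"language": target_lang, "task": "transcribe"}


def transcribe_with_forced_language(
    audio_path: str, target_lang: str, model_name: str = "small"
) -> str:
    """Transcribe while claiming the audio is in target_lang.

    With English audio this has been observed to yield text in target_lang
    instead of English; it is a side effect, not a supported feature.
    """
    import whisper  # imported lazily; requires the openai-whisper package

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **forced_language_options(target_lang))
    return result["text"].strip()
```

For example, transcribe_with_forced_language("english_talk.mp3", "es") corresponds to running the command line version with --language es and the default --task transcribe.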

Related

Web Page From English To Urdu converter

I need to convert my website's pages from English to the Urdu language. For this I was using Google's Translation API, but Google translate API is not returning the correct translation of the pages.
What should I use to get 99% accurate results when translating pages from English to Urdu?
There are only a few parameters that you can specify when using the Google Translate API that can make a difference to your results: the source and model parameters.
Source is the language of the source text. If you don't specify it, it will be detected automatically. As your source language is English, I don't think this will be causing any trouble.
Model: As Urdu is supported by the Neural Machine Translation model, the nmt model will be used if you don't specify one. You can try the base model, however the nmt one is supposed to "provide improved translation for longer and more complex content".
Maybe expecting the model to get 99% accuracy is expecting it to be almost perfect.
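Setting both parameters explicitly could look like the sketch below. It assumes the google-cloud-translate package (the v2 client) and configured credentials; the helper names are hypothetical, and whether a given model value is accepted depends on the API version in use.

```python
def translate_request(
    text: str, target: str = "ur", source: str = "en", model: str = "nmt"
) -> dict:
    """Keyword arguments for the v2 client's translate() call."""
    return {
        "values": text,
        "target_language": target,
        "source_language": source,  # explicit, so no auto-detection happens
        "model": model,             # "nmt" or "base", as discussed above
    }


def translate_page_text(text: str) -> str:
    """Translate one English string to Urdu; needs Google Cloud credentials."""
    from google.cloud import translate_v2 as translate  # lazy import

    client = translate.Client()
    result = client.translate(**translate_request(text))
    return result["translatedText"]
```

Comparing the output of model="nmt" and model="base" on a few representative pages is a cheap way to check whether the parameter makes any difference for your content.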

Android localization/translation

I have a keyboard app designed for the Serbian language. My keys have labels based on the Serbian Cyrillic alphabet. The xml strings that are used for those labels are enclosed in <xliff:g></xliff:g> tags, but a certain provider on a certain type of phone still translates these into a different language. Just in case, I also have my strings in language-specific folders, but it still happens. Does anyone know of any other way I could disable translation of all my strings?
There are providers who can handle translation of technical files, i.e. who know what to translate in technical files. Some also let you manage the translations yourself. OneSky is one of these platforms, and we also provide a translation service.
See GIF of how placeholder validation works in OneSky
Disclaimer: I work in OneSky

Detect when to use a vs an

I have a service that allows users (admins) to change the terminology the site uses. My designer wants me to use the format "A Group". The problem is, for some terminology, it should be "An" not "A".
Is there any way to reliably detect which to use? What about localization?
I can brute force it and get 90% of the way by checking the first letter for consonant vs vowel. That won't work for all words though. And that doesn't cover any language except English.
In my opinion you've got only 2 ways:
1- Check the first letter, and scan the whole sentence to see whether it contains any non-English letters.
2- Provide a dictionary of English nouns; then you can easily look up your word to find out whether it needs an "a" or an "an".
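The two options can be combined: look the word up in a small exception dictionary first and fall back to the first-letter check. The sketch below is English-only, and its exception sets are illustrative assumptions rather than a complete list; the function names are hypothetical.

```python
# Words whose first letter is a consonant but whose first sound is a vowel.
VOWEL_SOUND_EXCEPTIONS = {"hour", "honest", "honor", "heir"}
# Words whose first letter is a vowel but whose first sound is a consonant.
CONSONANT_SOUND_EXCEPTIONS = {"user", "university", "unit", "unicorn", "european", "one"}


def indefinite_article(term: str) -> str:
    """Return 'a' or 'an' for an English term, based on its first word."""
    words = term.strip().split()
    if not words:
        raise ValueError("term must not be empty")
    first = words[0].lower()
    if first in VOWEL_SOUND_EXCEPTIONS:
        return "an"
    if first in CONSONANT_SOUND_EXCEPTIONS:
        return "a"
    # Fallback: the plain first-letter heuristic from the question.
    return "an" if first[0] in "aeiou" else "a"


def with_article(term: str) -> str:
    """Format a term the way the designer asked, e.g. 'A Group'."""
    return f"{indefinite_article(term).capitalize()} {term}"
```

with_article("Group") gives "A Group" and with_article("Hour") gives "An Hour". Admin-supplied terms that the dictionary misses still fall through to the heuristic, so the exception sets would need to grow over time.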
Although the "a versus an" issue is very specific, what you're describing here is a natural language processing issue. Essentially you are being asked to write code that generates a grammatically correct piece of text.
I think you should try to explain the implications to the designer, especially if you end up localizing into other languages. Your time is probably better spent working on your app's business logic than on language processing.

Open-source OCR package that can handle unknown characters?

I want to find a (preferably) open-source OCR package (for any OS) that is capable of handling a new character set.
The language is Latin, but with some scribal abbreviations, about 10 different abbreviations that aren't in Unicode.
The text has been printed using specially-developed fonts, and I have high-res images of the text.
I'm assuming some training is going to be needed, first to map the scribal abbreviations to ASCII, and then presumably corpus-specific training for the software to learn where the abbreviations tend to appear within words.
Could anyone recommend a (preferably) open-source package capable of handling this?
AFAIK there is no library (free or commercial) that can be used as-is for what you describe (a language with characters not representable in Unicode)... BUT as a good starting point there is an open-source OCR engine called Tesseract which you could take and modify for your special scenario... another interesting base could be OCRopus... but beware: this will mean lots of work.

Is there a programming language with semantics close to English?

Most languages allow you to 'tweak', to a certain extent, parts of the syntax (C++, C#) and/or semantics that you will be using in your code (Katahdin, Lua). But I have not heard of a language that lets you completely define what your code will look like. So isn't there some existing language with the capability to override all syntax and define semantics?
Example of what I want to do is basically from the C# code below:
foreach (Fruit fruit in Fruits)
{
    if (fruit is Apple)
    {
        fruit.Price = fruit.Price / 2;
    }
}
I want to be able to write the above code in my perfect language like this:
Check if any fruits are Macintosh apples and discount the price by 50%.
The advantages that come to my mind looking from a coder's perspective in this "imaginary" language are:
It's very clear what is going on (self-descriptive): it's plain English after all, so even a kid would understand my program
It hides all the complexity I have to write out in C#. Why should I care to learn about if statements, arithmetic operators, etc., since they are already implemented?
The disadvantages that I see for a coder who will maintain this program are:
Maybe you would express this program differently from me, so you may not get all the information that I've expressed in my sentence
Programs can be quite verbose and hard to debug, but if it is possible to even approximate the type of syntax above, maybe more people would start programming, right? That would be amazing, I think. I could go to work and just write an essay to draw a square on a WinForm, like this:
Create a form called MyGreetingForm. Draw a square in the middle of MyGreetingForm with a side of 100 points. In the middle of the square write "Hello! Click here to continue" in Arial font.
In the above code the parser must basically guess that I want to use the unnamed square from the previous sentence. It'd be hard to write such a smart parser, I guess, yet what I want to do is so simple.
If the user clicks on square in the middle of MyGreetingForm show MyMainForm.
In the above code the compiler must 'basically': 1) generate an event handler, 2) check if there is any square in the middle of the form, and if there is, 3) hide the form and show another form.
It looks very hard to do, but IMO it doesn't look impossible to at least approximate this. (I could personally write a parser to perform the 3 steps above, no problem, and it's basically what has to happen anyway when you write a.MyEvent += handler; in C#, so I don't see a problem here.) So I'm thinking maybe somebody has already done something like this? Or is there some practical burden of complexity in creating such an 'essay style' programming language that I can't see? I mean, what's the worst that can happen if the parser is not that good? Your program will crash, so you have to re-word it :)
Check out:
The Osmosian Order of Plain English Programmers
Code Example:
The background is a picture.
A button has a box and a name.
To clear the status:
Clear the status' string.
Show everything.
To create the background:
Draw the screen's box with the white color.
Loop.
Pick a spot anywhere in the screen's box.
Pick a color between the lightest gray color and the white color.
Dab the color on the spot.
If a counter is past 80000, break.
If the counter is evenly divisible by 1000, refresh the screen.
Repeat.
Extract the background given the screen's box. \or Create the background from the screen. Or something.
Some Interactive fiction designers use a language syntax extremely close to the English language. Here's some Inform 7 code, which you can play online:
The foyer is a room.
The apple is in the foyer. It is edible. The description is "This is a ripe, green granny smith apple."
The apple core is a thing. The description is "This apple core is all that is left of that granny smith apple you just consumed."
After eating the apple:
	now the apple core is in the player;
	say "You gobble down the apple careful not to eat any of those cyanide-laced seeds you heard about."
I tutored a course that used Inform 7. One of the tutors had the impression the assignment was to design, not write a game. So he marked the programs by reading them, without realising they were actual programs.
I don't think that this would be an easy task nor do I think it is going to make life easier for debugging
How would you deal with these issues?
spelling mistakes
different dialects in different parts of world
different dialects in the same part of the world
synonyms
which part of sentence do you parse first?
homographs: tear (rip) and tear (from the eye) are spelled the same but mean two different things
Bring back COBOL or can you remember "Walk West", "Examine Door", "Push Door", "Open Door", "Use key on door" :)
edit - how would you strongly type this?
I have written an extensible English-to-Python compiler called EngScript, which converts structured English into working Python code.
This is an example of EngScript code:
print{create a string from the file called "README.txt"}
print{save the string "Woohoo!" to a file called "ExampleText.txt"}
print{the first 3 letters of "EngScript"}
This is the output that was generated by the EngScript compiler:
print(pythonFunctions.stringFromTextFile("README.txt"))
print(pythonFunctions.writeStringToFile("ExampleText.txt", "Woohoo!"))
print("EngScript"[0:(3 - 1)+1])
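The general idea of mapping a fixed English sentence pattern onto code can be sketched in a few lines. This is not how EngScript itself is implemented; the single pattern, the function name, and the dictionary layout of the items are all hypothetical, and the singularization is deliberately crude. It handles the sentence from the question and nothing else.

```python
import re

# One hard-coded sentence pattern; a real tool would need many of these.
DISCOUNT_RULE = re.compile(
    r"Check if any \w+ are (?P<kind>.+?) and discount the price by (?P<percent>\d+)%\.?$"
)


def run_sentence(sentence: str, items: list[dict]) -> int:
    """Apply the discount sentence to items; return how many were changed."""
    match = DISCOUNT_RULE.match(sentence.strip())
    if match is None:
        raise ValueError(f"cannot parse: {sentence!r}")
    # Crude singularization: 'Macintosh apples' -> 'macintosh apple'.
    kind = match["kind"].lower().removesuffix("s")
    factor = 1 - int(match["percent"]) / 100
    changed = 0
    for item in items:
        if item["kind"].lower() == kind:
            item["price"] *= factor
            changed += 1
    return changed
```

Given fruits = [{"kind": "Macintosh apple", "price": 2.0}, {"kind": "Banana", "price": 1.0}], calling run_sentence("Check if any fruits are Macintosh apples and discount the price by 50%.", fruits) halves the price of the first item only. Any rewording of the sentence raises ValueError, which is exactly the brittleness the other answers warn about.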
LiveCode!
There are a few "natural language", high-level, English-like programming or scripting languages. Probably all of them were inspired by the oldest, COBOL. My personal favorite of these languages is LiveCode. LiveCode is a descendant of MetaCard, a Linux clone of Apple’s now defunct HyperCard that used an English-like scripting language called HyperTalk, which was inspired by SmallTalk, and in turn inspired JavaScript (as well as the entire World-Wide-Web). HyperTalk was the basis for another English-like scripting language called AppleScript (and later AppleScriptObjC), which still comes with macOS to this very day. LiveCode uses a language called LiveCodeScript, or LCS, which, like other HyperCard clones that have existed over the years (SuperCard, Adobe’s Lingo/Flash ActionScript, Open Xion, Oracle’s Toolbook, etc.), is very similar to HyperTalk at its core, often referred to as an X-Talk language. LiveCode has several advantages: it’s very much still in production, it has a dual license (open source and commercial versions), the engine is cross-platform (Mac, Win, Linux, HTML5, iOS, Android, and a server version), and like HyperCard it is also a GUI toolkit and it is extensible. The LiveCode team is currently working on a new lower-level programming language called LiveCode Builder, or LCB. LCB is also English-like, although it is a bit less readable than LCS; it has the goals of having capabilities on par with lower-level languages like C++, Objective C, etc., allowing for extending the LiveCode platform with code libraries and frameworks produced by other programming language libraries, and ultimately allowing for the LiveCode IDE to be written in its own language.
Try using the programming language called 'Google' - it has a natural English interface and your code fragment throws back all the answers you are suggesting. Interestingly just six minutes after you asked this question, this very page is #1 for the query:
Check if any fruits are Macintosh apples and discount the price by 50%
Use the Google API and I think you have the basis of a natural English programming language.
