Google Cloud Speech API capability for nonsense words or phonetics - google-cloud-speech

Is it possible for the API to return the phonetics of what the sound file says?
Or is it possible to provide vocabulary words that are not real words?
I have a foreign language tutorial where I might be able to use this. It teaches non-Latin alphabets such as Cyrillic, Hebrew, Arabic, and Chinese.
I have a library of nonsense words to help the student learn.
The reason for using nonsense words rather than real words is that the tutorial breaks the steps down to just two letters at a time, and at first there aren't many real words that can be created with just those letters.
I'd like to show one of these nonsense words, record the student saying it, then verify whether they said it correctly in order to give them feedback.

It is possible to add phrases, but not using a phonetic alphabet. This, for instance, would recognise the fictitious word "Affelfaffel", provided it's pronounced as it should be according to the specified language code:
var speech = SpeechClient.Create();
string url = @"gs://your-bucket-name/your-file";
StringBuilder sb = new StringBuilder();
RecognitionConfig rc = new RecognitionConfig()
{
    Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
    SampleRate = 16000,
    LanguageCode = LanguageCodes.English.UnitedKingdom
};
// Hint the recognizer with the made-up word
rc.SpeechContext = new SpeechContext();
rc.SpeechContext.Phrases.Add("Affelfaffel");
// Start the long-running recognition and wait for it to finish
var longOperation = speech.AsyncRecognize(rc, RecognitionAudio.FromStorageUri(url));
longOperation = await longOperation.PollUntilCompletedAsync();
var response = longOperation.Result;
// Collect every alternative transcript
foreach (var result in response.Results)
{
    foreach (var alternative in result.Alternatives)
    {
        sb.Append(alternative.Transcript);
    }
}

Related

Splitting a string based on a certain set of words?

I'm trying to figure out how to take a phrase and split it up into a list of separate strings based on the occurrence of certain words.
Examples are probably the easiest way to explain what I'm hoping to achieve:
List splitters = ['ABOVE', 'AT', 'NEAR', 'IN'];
INPUT: "ALFALFA DITCH IN ECKERT CO";
OUTPUT: ["ALFALFA DITCH", "IN ECKERT CO"];
INPUT: 'ANIMAS RIVER AT DURANGO, CO';
OUTPUT: ['ANIMAS RIVER', 'AT DURANGO, CO'];
INPUT: 'ALAMOSA RIVER ABOVE WILSON CREEK IN JASPER, CO';
OUTPUT: ['ALAMOSA RIVER', 'ABOVE WILSON CREEK IN JASPER, CO'];
Notice in the third example, when there are multiple occurrences of splitters in the input phrase, I only want to use the first one.
To my knowledge, the split() method doesn't support multiple strings, and I can't find a single example of this in Dart. I would think there is a simple solution?
I'd use a RegExp then:
var splitters = ['ABOVE', 'AT', 'NEAR', 'IN'];
var s = "ALFALFA DITCH IN ECKERT CO";
var splitterRE = RegExp(splitters.join('|'));
var match = splitterRE.firstMatch(s);
if (match != null) {
  var partOne = s.substring(0, match.start).trimRight();
  var partTwo = s.substring(match.start);
}
That does what you ask for, but it's slightly unsafe.
It will find "IN" in "BEHIND" if given "BEHIND THE FARM IN ALABAMA".
You likely want to match only complete words. In that case, RegExps are even more helpful, since they can do that too. Change the line to:
var splitterRE = RegExp(r'\b(?:' + splitters.join('|') + r')\b');
then it will only match entire words.
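For comparison, here is the same first-match, whole-word split sketched in JavaScript rather than Dart, so it can be tried in a browser console or Node. The helper name splitOnFirstKeyword is made up for illustration. It assumes the splitters are plain words with no regex metacharacters, and it returns the phrase unsplit when a splitter is the very first word.

```javascript
// Split a phrase at the first whole-word occurrence of any splitter.
// `splitOnFirstKeyword` is a hypothetical helper name, not a library function.
// Assumption: splitters are plain words (no regex metacharacters to escape).
function splitOnFirstKeyword(phrase, splitters) {
  // \b...\b restricts matches to complete words, so "IN" is not found inside "BEHIND"
  const re = new RegExp('\\b(?:' + splitters.join('|') + ')\\b');
  const match = re.exec(phrase);
  if (match === null || match.index === 0) {
    return [phrase]; // no splitter found, or nothing precedes it
  }
  return [phrase.slice(0, match.index).trimEnd(), phrase.slice(match.index)];
}

const splitters = ['ABOVE', 'AT', 'NEAR', 'IN'];
console.log(splitOnFirstKeyword('ALAMOSA RIVER ABOVE WILSON CREEK IN JASPER, CO', splitters));
console.log(splitOnFirstKeyword('BEHIND THE FARM IN ALABAMA', splitters));
```

Only the first match is used, so later splitters such as the "IN" after "ABOVE" stay in the second part.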

Swift, iOS: How to convert a string containing a number and characters (i.e. ',' or '.') into a number?

I have a double that I am converting from kg to lb using NSMassFormatter.
let massFormatter = NSMassFormatter()
var xyz = massFormatter.stringFromKilograms(10000.000)
// xyz "22,046.226 lb"
Now I want a way to extract the number from the string. Also, if I change the locale to, say, es (Spain), then the value becomes "10.000,000 kg" (it actually returns "10.000 kg", removing the decimal places for unknown reasons), but I want a way to extract the number regardless of the locale. Is there any standard way, such as a regex or some function in NSNumberFormatter?
Thank you
There is no way to do that fully independent of locale. The main problem is that an identical string will be interpreted differently depending on which locale it is parsed against: "10.000" is ten thousand in es but ten in en.
The best solution is to identify all the possible formats, define a formatter for each, and try numberFromString: on each formatter in turn until one returns a valid result.
The other solution, if you're getting the data from user input, is to explain the correct format to users and provide them with instant validation, i.e. showing an "incorrect format" error message. Some apps have used the UIKeyboardTypeNumberPad keyboard to restrict input, so that only numeric values can be entered.
There are two keys to the problem: finding the localized units (the "kg" part in your example), and converting the string using localized grouping and decimal separators:
// convert mass to string
var lbs = massFormatter.stringFromKilograms(10000)
println("\(lbs)")
// get localized unit specifier and remove from formatted string
var units = massFormatter.unitStringFromKilograms(10000, usedUnit: nil)
if let range = lbs.rangeOfString(units) {
    lbs.replaceRange(range, with: "")
}
// get number formatter and set it to use grouping separator (, or .)
let numberFormatter = NSNumberFormatter()
numberFormatter.usesGroupingSeparator = true
// get number back
var kg = numberFormatter.numberFromString(lbs)
println("\(kg)")

Google Spreadsheet Translate, ignore variable names

An interesting Google Spreadsheet problem: I have a language file based on key=value pairs that I have copied into a spreadsheet, e.g.
titleMessage=Welcome to My Website
youAreLoggedIn=Hello #{user.name} you are now logged in
facebookPublish=Facebook Publishing
I have managed to split the key/value pairs into two columns, translate the value column, and re-join it with the keys, and voila, this gives me a translated language file back.
But as you may have spotted, there are some variables in there (e.g. #{user.name}) which are injected by my application; obviously I don't want to translate them.
So here is my question, given the following cell contents...
Hello #{user.name} you are now logged in
Is there a function that will translate the contents using the TRANSLATE function, but ignore anything inside #{ } (this could be at any point in the sentence)?
Do any Google Spreadsheet gurus have a solution for me?
Many thanks
If there is at most one occurrence of #{} then you could use the SPLIT function to divide the string into three parts, arranged as below.
A         B                   C            D           E
Original  =SPLIT(An, "#{}")   First piece  Tag         Rest of string
                              Translate    Keep as is  Translate
Put the pieces together with CONCATENATE.
=CONCATENATE(Cn,Dn,En)
I ran into the same question.
Assume the escape pattern is #{sth.sth} (as a regex, #{[\w.]+}). Replace each occurrence with a string that Google Translate treats as an untranslatable term, like VAR.
After translation, replace the term with the original pattern.
Here is how I did this in the spreadsheet's script editor:
function myTranslate(text, source_language, target_language) {
  if (!text || !text.toString()) {
    return null;
  }
  var str = text.toString();
  var regex = /#{[\w.]+}/g; // g flag for multiple matches
  var replace = 'VAR'; // Stand-in for #{variable} to protect it from translation
  var vars = str.match(regex) || []; // original patterns, in order of appearance
  str = str.replace(regex, replace);
  str = LanguageApp.translate(str, source_language, target_language);
  // Put the original patterns back, one per placeholder, left to right
  var next = 0;
  return str.replace(/VAR/g, function (placeholder) {
    return next < vars.length ? vars[next++] : placeholder;
  });
}
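The replace-then-restore idea can also be tried outside the spreadsheet. The sketch below is plain JavaScript with a stub in place of LanguageApp.translate; the names maskVariables, restoreVariables and fakeTranslate are invented for illustration, and the stub only swaps a couple of fixed strings, so it says nothing about how Google Translate actually treats the VAR placeholder.

```javascript
// Matches application variables such as #{user.name}
const VARIABLE = /#\{[\w.]+\}/g;

// Replace every variable with the placeholder and remember the originals in order.
function maskVariables(text) {
  const vars = text.match(VARIABLE) || []; // [] when the text has no variables
  const masked = text.replace(VARIABLE, 'VAR');
  return { masked, vars };
}

// Swap each placeholder, left to right, for the variable it stood in for.
function restoreVariables(translated, vars) {
  let next = 0;
  return translated.replace(/VAR/g, (m) => (next < vars.length ? vars[next++] : m));
}

// Stub translator (assumption): stands in for LanguageApp.translate.
function fakeTranslate(text) {
  return text.replace('Hello', 'Hola').replace('you are now logged in', 'ya has entrado');
}

const { masked, vars } = maskVariables('Hello #{user.name} you are now logged in');
console.log(masked);
console.log(restoreVariables(fakeTranslate(masked), vars));
```

Restoring left to right assumes the translation keeps the placeholders in their original order; with several variables in one sentence, a translation that reorders them would put the wrong variable in each slot.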
You can't just split and concatenate, because different languages use different word order of subject/predicate/object etc., and also because several languages modify nouns with different prefixes/suffixes/spelling changes depending on what they are doing in the sentence. It's all very complicated. Google needs to enable some sort of enclosing parentheses around any term we want to be quoted rather than translated.

Detect ASCII codes for Asian double byte / Cyrillic character sets?

Is it possible to detect whether a character belongs to an Asian double-byte or Cyrillic character set? Perhaps by specific code ranges? I've googled, but I'm not finding anything at first glance.
There's an RSS feed I'm tapping into that has the locale set as 'en-gb'. But there are some Asian double byte characters in the feed itself - which I need to handle differently. Just not sure how to detect it since the meta locale data is incorrect. I do not have access to correct the public feed.
If your RSS feed uses UTF-8, which it probably does, just check whether the character's code is greater than 255.
A quick Google search suggests that you might want to look at String.charCodeAt.
I don't know ActionScript, but I would expect a code snippet to look something like
var stringToTest : String;
for (var i : int = 0; i < stringToTest.length; i++) {
    if (stringToTest.charCodeAt(i) > 255) {
        // Do something to your double-byte character here
    } else {
        // You have a single-byte (ASCII or Latin-1) character here
    }
}
I hope this helps!
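Since the question asks about specific code ranges, here is a minimal sketch that checks character codes against a few Unicode blocks. It is written in JavaScript, whose String.charCodeAt returns UTF-16 code units; the same comparisons should carry over to ActionScript, though that is an assumption. The function name scriptsIn and the choice of blocks are illustrative, and the list is not exhaustive (it omits, for example, the Cyrillic Supplement and the CJK extension blocks).

```javascript
// Inclusive Unicode block ranges for some of the scripts mentioned in the question
const SCRIPT_RANGES = [
  { name: 'Cyrillic', from: 0x0400, to: 0x04ff },
  { name: 'Hiragana', from: 0x3040, to: 0x309f },
  { name: 'Katakana', from: 0x30a0, to: 0x30ff },
  { name: 'CJK Unified Ideographs', from: 0x4e00, to: 0x9fff },
  { name: 'Hangul Syllables', from: 0xac00, to: 0xd7af },
];

// Return the names of the listed blocks that occur in the text, in order of first appearance.
// `scriptsIn` is a hypothetical helper name.
function scriptsIn(text) {
  const found = new Set();
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    for (const range of SCRIPT_RANGES) {
      if (code >= range.from && code <= range.to) {
        found.add(range.name);
      }
    }
  }
  return [...found];
}

console.log(scriptsIn('Hello'));
console.log(scriptsIn('\u041f\u0440\u0438\u0432\u0435\u0442')); // a Cyrillic word
console.log(scriptsIn('\u65e5\u672c\u8a9e')); // three CJK ideographs
```

Compared with the "greater than 255" test, range checks distinguish Cyrillic from Asian text, which matters if the two need different handling.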

Capital in Bibtex

I want to show some letters in Bibliography as capital. For example:
@misc{libsvm,
abstract = {LIBSVM is an implementation of Support vector machine (SVM).},
author = {Chang, Chih-Chung},
howpublished = {\url{http://www.csie.ntu.edu.tw/~cjlin/libsvm/}},
keywords = {svm},
posted-at = {2010-04-08 00:05:04},
priority = {2},
title = {LIBSVM.},
url = "http://www.csie.ntu.edu.tw/~cjlin/libsvm/",
year = {2008}
}
But "LIBSVM" is not shown as it is:
[3] Chih-Chung Chang. Libsvm. http://www.csie.ntu.edu.tw/~cjlin/libsvm/, 2008.
How can I make the letters capital? Thanks and regards!
Generally, to keep BibTeX from turning your letters lowercase, enclose them in {}:
title = {A History Of {StudlyCaps}}
will produce "A history of StudlyCaps."
Alceu Costa is correct that all-capital abbreviations should be formatted in small capitals, but that is a different matter than this.
The \textsc is used to format the text in small capitals. You can do this in your .bib file:
title = {\textsc{LIBSVM}}
Place {} brackets around anything you want to keep in CAPS.
For example:
@type{bibkey,
title = "{M}y {B}ibliography is the {B}est!",
author = "{ABCDE}",
}
In the case of the IEEEtran LaTeX class template, one can simply do the following:
title = "{3D Open Source Framework for Something you need}"
A template can be found at the following link
http://www.ieee.org/conferences_events/conferences/publishing/templates.html

Resources