iOS Speech framework yields zero confidence for certain locales - ios

When using the Speech framework, I am consistently noticing zero confidence values for certain locales (e.g. "vi-VN", "pt-PT", ...), while non-zero, accurate confidence values are returned for other locales (e.g. "ko-KR", "ja-JP", ...).
Looking at the documentation, the confidence would be zero if there was no recognition. However, when the zero confidence occurs, the formattedString of the bestTranscription is populated and accurate (same for each segment substring text).
I have tried instantiating the locales in various ways (language code only, language and region code, -/_ formatting, grabbing an instance directly off of the SFSpeechRecognizer.supportedLocales() array). I have also tried setting the defaultTaskHint of SFSpeechRecognizer and taskHint of SFSpeechRecognitionRequest to dictation.
I am stuck at this point. Any help would be appreciated. Thanks in advance :)
// Locale(identifier:) is not failable, so it cannot appear in a guard-let.
let locale = Locale(identifier: "vi-VN")
guard let recognizer = SFSpeechRecognizer(locale: locale),
recognizer.isAvailable else {
return
}
recognizer.defaultTaskHint = .dictation
let request = SFSpeechURLRecognitionRequest(url: ...)
request.contextualStrings = ...
request.shouldReportPartialResults = true
request.taskHint = .dictation
recognizer.recognitionTask(with: request) { (result, error) in
...
if (result.isFinal) {
let transcription = result.bestTranscription
/// transcription.formattedString is correct
/// all segments confidence values are 0, but with the properly recognized substring text.
}
...
}
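Until the cause is known, one defensive option is to treat an all-zero set of segment confidences on a populated transcription as "confidence not reported" rather than "nothing recognized". The following is only a sketch: `reportedConfidence(for:)` is a hypothetical helper, not part of the Speech framework, and the "all zeros means unavailable" rule is an assumption based on the behaviour described above.

```swift
/// Hypothetical helper, not part of the Speech framework.
/// Returns the mean segment confidence, or nil when every value is 0,
/// which is treated here as "the recognizer did not report confidence".
func reportedConfidence(for confidences: [Float]) -> Float? {
    guard !confidences.isEmpty, confidences.contains(where: { $0 > 0 }) else {
        return nil
    }
    return confidences.reduce(0, +) / Float(confidences.count)
}

// With an SFTranscription this could be called as:
// reportedConfidence(for: transcription.segments.map { $0.confidence })
```

Returning nil instead of 0 lets calling code fall back to showing the transcription without a confidence indicator for the affected locales.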

Related

Getting numbers in word format in SFSpeechRecognizer instead of numbers

Is there any way of printing numbers into proper spellings instead of throwing numbers while recording voice via SFSpeechRecognizer? I've tried to get the word format by implementing the code below:
if let resultString = result?.bestTranscription.formattedString {
if let number = Double(resultString) {
let numberFormatter = NumberFormatter()
numberFormatter.numberStyle = .spellOut
let numberString = numberFormatter.string(from: NSNumber(value: number))
let numberStringWithoutHyphen = numberString?.replacingOccurrences(of: "-", with: " ")
print(numberStringWithoutHyphen)
}
}
This solution works great if the user is speaking whole numbers or even decimal numbers but there are some cases where this solution doesn't work at all and makes this solution look dumb. For example, if the user says "Fifty five point zero", the speech recognizer picks it up as "55.0". But the number formatter returns "Fifty five". In an extreme case, if the user says "One two three four", the speech recognizer picks it up as "1234" but the number formatter returns "One thousand two hundred thirty four".
What I am aiming for is if the user says any number, the speech recognizer should return the same, word by word. If the user says "Fifty five point zero", it should return "Fifty five point zero". If the user says "One two three four", it should return "One two three four".
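A partial workaround for the decimal case could look like the sketch below. It assumes the recognizer returns a string such as "55.0" with "." as the separator; `spellOut(recognizedNumber:locale:)` is a hypothetical helper. It spells the integer part as a whole number and the fractional digits one at a time. It cannot recover "One two three four" from "1234", because the recognizer has already merged the digits by the time the string arrives.

```swift
import Foundation

/// Hypothetical helper: spells the integer part as a whole number and the
/// fractional digits one by one, e.g. "55.0" -> "fifty five point zero".
func spellOut(recognizedNumber text: String,
              locale: Locale = Locale(identifier: "en_US")) -> String? {
    let formatter = NumberFormatter()
    formatter.locale = locale
    formatter.numberStyle = .spellOut

    // Split once at the decimal point; "55.0" becomes ["55", "0"].
    let parts = text.split(separator: ".", maxSplits: 1, omittingEmptySubsequences: false)
    guard let first = parts.first, let whole = Int(first),
          let wholeWords = formatter.string(from: NSNumber(value: whole)) else {
        return nil
    }
    var words = [wholeWords.replacingOccurrences(of: "-", with: " ")]

    if parts.count == 2 {
        guard !parts[1].isEmpty else { return nil }
        words.append("point")
        // Spell each fractional digit separately so "0" stays "zero".
        for character in parts[1] {
            guard let digit = character.wholeNumberValue,
                  let digitWord = formatter.string(from: NSNumber(value: digit)) else {
                return nil
            }
            words.append(digitWord)
        }
    }
    return words.joined(separator: " ")
}
```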

NSDecimalNumber from user input

I'm trying to get conversion from user entered
money strings to NSDecimalNumber
var nsdecimalNumberFromUserInput: NSDecimalNumber? {
let parsed = NSDecimalNumber(string: self, locale: Locale.current)
if parsed == .notANumber {
return nil
}
return parsed
}
UIKeyboardType == .numberPad in UITextField
somehow displays either . or , depending on
the device and iOS version (12.4[.1] & 13 beta 8).
All current locales set to Belarus.
What's a reliable way to parse money regardless
of the curveballs .numberPad sends my way?
The decimal separator in the above mentioned locale is , (comma)
Thanks
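One possible approach, sketched here under the assumption that the input contains only digits and at most one decimal separator (no grouping separators), is to normalize "," to "." and parse with a fixed locale instead of Locale.current. The property name `decimalNumberFromKeypadInput` is hypothetical.

```swift
import Foundation

extension String {
    /// Hypothetical variant of the property above: accepts "." or "," as the
    /// decimal separator, whichever the keypad produced.
    var decimalNumberFromKeypadInput: NSDecimalNumber? {
        let trimmed = trimmingCharacters(in: .whitespaces)
        let normalized = trimmed.replacingOccurrences(of: ",", with: ".")

        // Reject anything that is not ASCII digits with at most one separator.
        let separatorCount = normalized.filter { $0 == "." }.count
        let onlyDigitsAndSeparator = normalized.allSatisfy {
            $0.isASCII && ($0.isNumber || $0 == ".")
        }
        guard !normalized.isEmpty, normalized != ".",
              separatorCount <= 1, onlyDigitsAndSeparator else {
            return nil
        }

        // Parse with a fixed locale whose decimal separator is ".".
        let parsed = NSDecimalNumber(string: normalized,
                                     locale: Locale(identifier: "en_US_POSIX"))
        return parsed == .notANumber ? nil : parsed
    }
}
```

Because the separator is normalized before parsing, "12,50" and "12.50" yield the same value regardless of which key the keypad showed. Input with grouping separators such as "1.234,50" is rejected rather than guessed at.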

How to detect text (string) language in iOS?

For instance, given the following strings:
let textEN = "The quick brown fox jumps over the lazy dog"
let textES = "El zorro marrón rápido salta sobre el perro perezoso"
let textAR = "الثعلب البني السريع يقفز فوق الكلب الكسول"
let textDE = "Der schnelle braune Fuchs springt über den faulen Hund"
I want to detect the used language in each of them.
Let's assume the signature for the implemented function is:
func detectedLanguage<T: StringProtocol>(_ forString: T) -> String?
It returns an optional String, which is nil when no language is detected.
Thus the expected results would be:
let englishDetectedLanguage = detectedLanguage(textEN) // => English
let spanishDetectedLanguage = detectedLanguage(textES) // => Spanish
let arabicDetectedLanguage = detectedLanguage(textAR) // => Arabic
let germanDetectedLanguage = detectedLanguage(textDE) // => German
Is there an easy approach to achieve it?
Latest versions (iOS 12+)
Briefly:
You could achieve it by using NLLanguageRecognizer, as:
import NaturalLanguage
func detectedLanguage(for string: String) -> String? {
let recognizer = NLLanguageRecognizer()
recognizer.processString(string)
guard let languageCode = recognizer.dominantLanguage?.rawValue else { return nil }
let detectedLanguage = Locale.current.localizedString(forIdentifier: languageCode)
return detectedLanguage
}
Older versions (iOS 11+)
Briefly:
You could achieve it by using NSLinguisticTagger, as:
func detectedLanguage<T: StringProtocol>(for string: T) -> String? {
guard let languageCode = NSLinguisticTagger.dominantLanguage(for: String(string)) else { return nil }
let detectedLanguage = Locale.current.localizedString(forIdentifier: languageCode)
return detectedLanguage
}
Details:
First of all, you should be aware that what you are asking about mainly relates to the world of natural language processing (NLP).
Since NLP is more than text language detection, the rest of the answer will not contain specific NLP information.
Obviously, implementing such functionality is not that easy, especially once you start to care about the details of the process, such as splitting text into sentences and even into words, and after that recognizing names, punctuation, etc. I bet you would think "what a painful process! It is not even logical to do it myself". Fortunately, iOS does support NLP (actually, NLP APIs are available on all Apple platforms, not only iOS), which makes what you are aiming for easy to implement. The core component that you would work with is NSLinguisticTagger:
Analyze natural language text to tag part of speech and lexical class,
identify names, perform lemmatization, and determine the language and
script.
NSLinguisticTagger provides a uniform interface to a variety of
natural language processing functionality with support for many
different languages and scripts. You can use this class to segment
natural language text into paragraphs, sentences, or words, and tag
information about those segments, such as part of speech, lexical
class, lemma, script, and language.
As mentioned in the class documentation, the method that you are looking for, under the "Determining the Dominant Language and Orthography" section, is dominantLanguage(for:):
Returns the dominant language for the specified string.
.
.
Return Value
The BCP-47 tag identifying the dominant language of the string, or the
tag "und" if a specific language cannot be determined.
You might notice that NSLinguisticTagger has existed since iOS 5. However, the dominantLanguage(for:) method is only supported on iOS 11 and above, because it has been developed on top of the Core ML framework:
. . .
Core ML is the foundation for domain-specific frameworks and
functionality. Core ML supports Vision for image analysis, Foundation
for natural language processing (for example, the NSLinguisticTagger
class), and GameplayKit for evaluating learned decision trees. Core ML
itself builds on top of low-level primitives like Accelerate and BNNS,
as well as Metal Performance Shaders.
The value returned from calling dominantLanguage(for:) with "The quick brown fox jumps over the lazy dog":
NSLinguisticTagger.dominantLanguage(for: "The quick brown fox jumps over the lazy dog")
would be the optional string "en". However, that is not yet the desired output; the expectation is to get "English" instead. That is exactly what you get by calling the localizedString(forIdentifier:) method of the Locale structure and passing the language code you obtained:
Locale.current.localizedString(forIdentifier: "en") // English
Putting it all together:
As shown in the "Briefly" code snippet above, the function would be:
func detectedLanguage<T: StringProtocol>(_ forString: T) -> String? {
guard let languageCode = NSLinguisticTagger.dominantLanguage(for: String(forString)) else {
return nil
}
let detectedLanguage = Locale.current.localizedString(forIdentifier: languageCode)
return detectedLanguage
}
Output:
It would be as expected:
let englishDetectedLanguage = detectedLanguage(textEN) // => English
let spanishDetectedLanguage = detectedLanguage(textES) // => Spanish
let arabicDetectedLanguage = detectedLanguage(textAR) // => Arabic
let germanDetectedLanguage = detectedLanguage(textDE) // => German
Note that:
There are still cases where you do not get a language name for a given string, like:
let textUND = "SdsOE"
let undefinedDetectedLanguage = detectedLanguage(textUND) // => Unknown language
Or it could be even nil:
let rubbish = "000747322"
let rubbishDetectedLanguage = detectedLanguage(rubbish) // => nil
I still find that a decent result for providing useful output...
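If you would rather get nil than "Unknown language" in the undetermined case, the code-to-name step can be wrapped in a small helper. This is only a sketch; `languageName(forCode:in:)` is a hypothetical name, and it relies on the documented "und" tag mentioned above.

```swift
import Foundation

/// Hypothetical helper: maps a BCP-47 code to a display name,
/// treating nil and "und" (undetermined) as "no language detected".
func languageName(forCode code: String?, in locale: Locale = .current) -> String? {
    guard let code = code, code != "und" else { return nil }
    return locale.localizedString(forIdentifier: code)
}

// Possible usage on Apple platforms:
// languageName(forCode: NSLinguisticTagger.dominantLanguage(for: text))
```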
Furthermore:
About NSLinguisticTagger:
Although I am not going to dive deep into NSLinguisticTagger usage, I would like to note that it has a couple of really cool features beyond simply detecting the language of a given text. As a pretty simple example: using the lemma when enumerating tags is very helpful when working with information retrieval, since you would be able to match the word "driving" when passing the word "drive".
Official Resources
Apple Video Sessions:
For more about Natural Language Processing and how NSLinguisticTagger works: Natural Language Processing and your Apps.
Also, to get familiar with Core ML:
Introducing Core ML.
Core ML in depth.
You can use NSLinguisticTagger's tag(at:scheme:tokenRange:sentenceRange:) method. It supports iOS 5 and later.
func detectLanguage<T: StringProtocol>(for text: T) -> String? {
let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
tagger.string = String(text)
// tag(at:...) returns an NSLinguisticTag, so take its rawValue to get the language code string.
guard let languageCode = tagger.tag(at: 0, scheme: .language, tokenRange: nil, sentenceRange: nil)?.rawValue else { return nil }
return Locale.current.localizedString(forIdentifier: languageCode)
}
detectLanguage(for: "The quick brown fox jumps over the lazy dog") // English
detectLanguage(for: "El zorro marrón rápido salta sobre el perro perezoso") // Spanish
detectLanguage(for: "الثعلب البني السريع يقفز فوق الكلب الكسول") // Arabic
detectLanguage(for: "Der schnelle braune Fuchs springt über den faulen Hund") // German
I tried NSLinguisticTagger with short input text like "hello", and it always recognizes it as Italian.
Luckily, Apple recently added NLLanguageRecognizer, available on iOS 12, and it seems to be more accurate :D
import NaturalLanguage
if #available(iOS 12.0, *) {
let languageRecognizer = NLLanguageRecognizer()
languageRecognizer.processString(text)
let code = languageRecognizer.dominantLanguage!.rawValue
let language = Locale.current.localizedString(forIdentifier: code)
}

Speech API - confidence always 0

I tried using the official Apple example:
SpeakToMe
I edited the example in the following way to get the confidence levels:
if let result = result {
for t in result.transcriptions
{
for s in t.segments
{
print("POSSIBLE TRANSCRIPTION: \(s.substring) confidence: \(s.confidence)")
}
}
self.textView.text = result.bestTranscription.formattedString
isFinal = result.isFinal
}
The problem is that the confidence levels are always 0.
I found similar questions but setting the defaultTaskHint to dictation (or anything else) didn't help.
Does anyone have any suggestions on how to get the proper confidence values?

Using NSDataDetector to just like Apple's Notes app

I'm trying to find several different data types including Dates, Addresses, Phone numbers, and Links. I'm already able to find them but I want to be able to format them by underlining and changing their color. This is my code so far.
func detectData() {
let text = self.textView.text
let types: NSTextCheckingType = .Date | .Address | .PhoneNumber | .Link
var error: NSError?
let detector = NSDataDetector(types: types.rawValue, error: &error)
var dataMatches: NSArray = [detector!.matchesInString(text, options: nil, range: NSMakeRange(0, (text as NSString).length))]
for match in dataMatches {
I was thinking I should first get each result out of the loop then
1) turn them into strings 2)format them.
First question. How will I put my formatted string back into my UITextView at the same place?
Second question. I'm thinking about creating a switch like so
switch match {
case match == NSTextCheckingType.date
but now that I have a specific type of NSTextCheckingType, what do I have to do to give the matches the functionality I want? (e.g. call a phone number, open up maps for an address, create an event for a date)
To do what Notes does you just need to set the dataDetectorTypes property on your text view. That's all! No NSDataDetector involved.
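A minimal sketch of that approach follows. The function name `makeDetectingTextView(text:)` and the chosen set of detector types are illustrative assumptions; detection only applies when the text view is not editable.

```swift
import UIKit

/// Hypothetical factory: builds a text view that detects and styles
/// phone numbers, links, addresses, and dates on its own.
func makeDetectingTextView(text: String) -> UITextView {
    let textView = UITextView()
    textView.isEditable = false   // data detection does not occur in editable text views
    textView.isSelectable = true  // needed so detected items can be tapped
    textView.dataDetectorTypes = [.phoneNumber, .link, .address, .calendarEvent]
    textView.text = text
    return textView
}
```

Tapping a detected item then hands it to the system (for example the Phone or Maps app), so no manual switch over NSTextCheckingType is needed for this use case.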
