How to detect text (string) language in iOS?

For instance, given the following strings:
let textEN = "The quick brown fox jumps over the lazy dog"
let textES = "El zorro marrón rápido salta sobre el perro perezoso"
let textAR = "الثعلب البني السريع يقفز فوق الكلب الكسول"
let textDE = "Der schnelle braune Fuchs springt über den faulen Hund"
I want to detect the used language in each of them.
Let's assume the signature for the implemented function is:
func detectedLanguage<T: StringProtocol>(_ forString: T) -> String?
It returns an optional string, which is nil in case no language is detected.
Thus, the appropriate result would be:
let englishDetectedLanguage = detectedLanguage(textEN) // => English
let spanishDetectedLanguage = detectedLanguage(textES) // => Spanish
let arabicDetectedLanguage = detectedLanguage(textAR) // => Arabic
let germanDetectedLanguage = detectedLanguage(textDE) // => German
Is there an easy approach to achieve it?

Latest versions (iOS 12+)
Briefly:
You could achieve it by using NLLanguageRecognizer, as:
import NaturalLanguage

func detectedLanguage(for string: String) -> String? {
    let recognizer = NLLanguageRecognizer()
    recognizer.processString(string)
    guard let languageCode = recognizer.dominantLanguage?.rawValue else { return nil }
    let detectedLanguage = Locale.current.localizedString(forIdentifier: languageCode)
    return detectedLanguage
}
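As a side note (not part of the original answer): NLLanguageRecognizer can also report several candidate languages with confidence values via languageHypotheses(withMaximum:), which helps with ambiguous or short input. A minimal sketch:
import NaturalLanguage

let recognizer = NLLanguageRecognizer()
recognizer.processString("El zorro marrón rápido salta sobre el perro perezoso")
// The two most likely languages and their probabilities, e.g. "es 0.99...", "pt 0.00..."
for (language, confidence) in recognizer.languageHypotheses(withMaximum: 2) {
    print(language.rawValue, confidence)
}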
Older versions (iOS 11+)
Briefly:
You could achieve it by using NSLinguisticTagger, as:
func detectedLanguage<T: StringProtocol>(for string: T) -> String? {
    guard let languageCode = NSLinguisticTagger.dominantLanguage(for: String(string)) else { return nil }
    return Locale.current.localizedString(forIdentifier: languageCode)
}
Details:
First of all, you should be aware that what you are asking about mainly relates to the field of Natural Language Processing (NLP).
Since NLP is more than text language detection, the rest of the answer will not contain NLP-specific information.
Obviously, implementing such functionality from scratch is not easy, especially once you start caring about the details of the process, such as splitting text into sentences and even into words, and then recognizing names, punctuation, and so on. I bet you are thinking "what a painful process! It is not even logical to do it by myself". Fortunately, iOS does support NLP (actually, the NLP APIs are available on all Apple platforms, not only iOS), which makes what you are aiming for easy to implement. The core component that you would work with is NSLinguisticTagger:
Analyze natural language text to tag part of speech and lexical class,
identify names, perform lemmatization, and determine the language and
script.
NSLinguisticTagger provides a uniform interface to a variety of
natural language processing functionality with support for many
different languages and scripts. You can use this class to segment
natural language text into paragraphs, sentences, or words, and tag
information about those segments, such as part of speech, lexical
class, lemma, script, and language.
As mentioned in the class documentation, the method that you are looking for - under the "Determining the Dominant Language and Orthography" section - is dominantLanguage(for:):
Returns the dominant language for the specified string.
...
Return Value
The BCP-47 tag identifying the dominant language of the string, or the
tag "und" if a specific language cannot be determined.
You might notice that NSLinguisticTagger has existed since iOS 5. However, the dominantLanguage(for:) method is only available on iOS 11 and above, because it is built on top of the Core ML framework:
...
Core ML is the foundation for domain-specific frameworks and
functionality. Core ML supports Vision for image analysis, Foundation
for natural language processing (for example, the NSLinguisticTagger
class), and GameplayKit for evaluating learned decision trees. Core ML
itself builds on top of low-level primitives like Accelerate and BNNS,
as well as Metal Performance Shaders.
Based on the returned value from calling dominantLanguage(for:) by passing "The quick brown fox jumps over the lazy dog":
NSLinguisticTagger.dominantLanguage(for: "The quick brown fox jumps over the lazy dog")
would be "en" optional string. However, so far that is not the desired output, the expectation is to get "English" instead! Well, that is exactly what you should get by calling the localizedString(forLanguageCode:) method from Locale Structure and passing the gotten language code:
Locale.current.localizedString(forIdentifier: "en") // English
Putting all together:
As mentioned in the "Quick Answer" code snippet, the function would be:
func detectedLanguage<T: StringProtocol>(_ forString: T) -> String? {
    guard let languageCode = NSLinguisticTagger.dominantLanguage(for: String(forString)) else {
        return nil
    }
    let detectedLanguage = Locale.current.localizedString(forIdentifier: languageCode)
    return detectedLanguage
}
Output:
It would be as expected:
let englishDetectedLanguage = detectedLanguage(textEN) // => English
let spanishDetectedLanguage = detectedLanguage(textES) // => Spanish
let arabicDetectedLanguage = detectedLanguage(textAR) // => Arabic
let germanDetectedLanguage = detectedLanguage(textDE) // => German
Note That:
There are still cases where you won't get a language name for a given string, such as:
let textUND = "SdsOE"
let undefinedDetectedLanguage = detectedLanguage(textUND) // => Unknown language
Or it could be even nil:
let rubbish = "000747322"
let rubbishDetectedLanguage = detectedLanguage(rubbish) // => nil
Still, it's not a bad result for providing useful output...
Furthermore:
About NSLinguisticTagger:
Although I am not going to dive deep into NSLinguisticTagger usage, I would like to note that it has a couple of really cool features beyond simply detecting the language of a given text. As a pretty simple example: using the lemma tag scheme when enumerating tags is very helpful when working with information retrieval, since it lets you match the word "driving" when searching for "drive".
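For illustration, here is a minimal lemma-enumeration sketch (my own example, not from Apple's docs; iOS 11+, the sample sentence is arbitrary):
import Foundation

let text = "The quick brown fox jumps over the lazy dogs"
let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
tagger.string = text
let range = NSRange(location: 0, length: text.utf16.count)
tagger.enumerateTags(in: range, unit: .word, scheme: .lemma, options: [.omitPunctuation, .omitWhitespace]) { tag, tokenRange, _ in
    if let lemma = tag?.rawValue {
        // e.g. "jumps -> jump", "dogs -> dog"
        print("\((text as NSString).substring(with: tokenRange)) -> \(lemma)")
    }
}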
Official Resources
Apple Video Sessions:
For more about Natural Language Processing and how NSLinguisticTagger works: Natural Language Processing and your Apps.
Also, for getting familiar with the CoreML:
Introducing Core ML.
Core ML in depth.

You can use NSLinguisticTagger's tag(at:) method. It supports iOS 5 and later.
func detectLanguage<T: StringProtocol>(for text: T) -> String? {
    let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
    tagger.string = String(text)
    guard let languageCode = tagger.tag(at: 0, scheme: .language, tokenRange: nil, sentenceRange: nil) else { return nil }
    return Locale.current.localizedString(forIdentifier: languageCode.rawValue)
}
detectLanguage(for: "The quick brown fox jumps over the lazy dog") // English
detectLanguage(for: "El zorro marrón rápido salta sobre el perro perezoso") // Spanish
detectLanguage(for: "الثعلب البني السريع يقفز فوق الكلب الكسول") // Arabic
detectLanguage(for: "Der schnelle braune Fuchs springt über den faulen Hund") // German

I tried NSLinguisticTagger with short input text like "hello", and it always recognized it as Italian.
Luckily, Apple recently added NLLanguageRecognizer, available on iOS 12, and it seems more accurate :D
import NaturalLanguage
if #available(iOS 12.0, *) {
    let languageRecognizer = NLLanguageRecognizer()
    languageRecognizer.processString(text)
    if let code = languageRecognizer.dominantLanguage?.rawValue {
        let language = Locale.current.localizedString(forIdentifier: code)
    }
}
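If you mostly deal with short strings, you can also constrain or bias the recognizer before processing. Both properties below are part of NLLanguageRecognizer on iOS 12+; the languages and weights are just example values:
let recognizer = NLLanguageRecognizer()
// Restrict recognition to a known set of languages
recognizer.languageConstraints = [.english, .italian, .spanish]
// Bias towards the languages your users are most likely to type
recognizer.languageHints = [.english: 0.8, .italian: 0.1, .spanish: 0.1]
recognizer.processString("hello")
print(recognizer.dominantLanguage?.rawValue ?? "und") // now far more likely to be "en"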

Related

iOS Speech framework yields zero confidence for certain locales

When using the Speech framework, I am consistently noticing zero confidence values for certain locales (e.g. "vi-VN", "pt-PT", ...), while non-zero, accurate confidence values are returned for other locales (e.g. "ko-KR", "ja-JP", ...).
Looking at the documentation, the confidence would be zero if there was no recognition. However, when the zero confidence occurs, the formattedString of the bestTranscription is populated and accurate (same for each segment substring text).
I have tried instantiating the locales in various ways (language code only, language and region code, -/_ formatting, grabbing an instance directly off of the SFSpeechRecognizer.supportedLocales() array). I have also tried setting the defaultTaskHint of SFSpeechRecognizer and taskHint of SFSpeechRecognitionRequest to dictation.
I am stuck at this point. Any help would be appreciated. Thanks in advance :)
let locale = Locale(identifier: "vi-VN")
guard let recognizer = SFSpeechRecognizer(locale: locale),
      recognizer.isAvailable else {
    return
}
recognizer.defaultTaskHint = .dictation
let request = SFSpeechURLRecognitionRequest(url: ...)
request.contextualStrings = ...
request.shouldReportPartialResults = true
request.taskHint = .dictation
recognizer.recognitionTask(with: request) { (result, error) in
    ...
    if (result.isFinal) {
        let transcription = result.bestTranscription
        /// transcription.formattedString is correct
        /// all segments confidence values are 0, but with the properly recognized substring text.
    }
    ...
}

Adding custom language dialect in iOS App

I was doing localization of an app, in which I faced an issue regarding dialects of a country's language. My main question is: is there any provision for adding a custom language?
Eg:
Suppose there are two languages:
PL for Poland
UK for Ukraine
I need to support pl-UK, i.e. Polish as used in Ukraine.
Adding a pl-UK.lproj would have made sense if this dialect could be chosen from the system preferences, which is not the case. If you have a local setting, I'm afraid there's no other solution than managing the localisations yourself - and it won't work for Interface Builder files.
The simplest is to store all the pl-UK differences in a separate file (it can be a .strings file that you store in the pl.lproj folder, which you localise in standard Polish to respect the semantics of the system). Then, in a custom function, you load those strings:
func localize(_ string: String, comment: String) -> String {
    guard isUkrainianPolish else {
        return NSLocalizedString(string, comment: comment)
    }
    // retrieve the cache and check if a key with string exists
    if let url = Bundle.main.url(forResource: "localizable_pl_UK" /* or any other name */, withExtension: "strings", subdirectory: nil, localization: "pl"),
        let data = try? Data(contentsOf: url),
        let plist = (try? PropertyListSerialization.propertyList(from: data, options: [], format: nil)) as? [String: String] {
        // cache the dictionary where you want
        return plist[string] ?? NSLocalizedString(string, comment: comment)
    }
    return NSLocalizedString(string, comment: comment)
}
Depending on the organisation of your code, you can implement the function in a singleton or the class that handle localizations.
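For completeness, a hypothetical usage sketch (the key name and the isUkrainianPolish flag are placeholders for whatever your app actually uses):
// Somewhere in your UI code:
let title = localize("settings_title", comment: "Title of the settings screen")
// Falls back to the standard pl.lproj localization when the key is missing
// from localizable_pl_UK.strings or when isUkrainianPolish is false.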

Up-to-date list of built-in Core Image filters?

Apple's Core Image Filter Reference, which describes all of the built-in CIFilters, is marked as "no longer being updated".
Looks like it was last updated in 2016. Since then, WWDC videos for 2017 and 2018 have announced additional filters (which, indeed, don't appear on this page).
Does anybody know of a more up-to-date list of built-in Core Image filters?
(Question has also been asked, but so far not answered, on the Apple Dev Forum.)
I've created a website called CIFilter.io which lists all the built-in CIFilters and a companion app which you can use to try the filters out if you like. This should have all the up to date CIFilter information - I've updated it for iOS 13 and intend to continue to keep it updated.
More info about the project is available in this blog post.
I created a small project to query an iOS device and (1) list all available filters and (2) list everything about each filter's input attributes. This project can be found here.
The relevant code:
var ciFilterList = CIFilter.filterNames(inCategories: nil)
This line creates a [String] of all available filters. If you only want the filters of a particular category, such as "CICategoryBlur", pass that category instead of nil.
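For example, to list only the blur filters (using the built-in kCICategoryBlur constant), the call would look like this:
let blurFilterList = CIFilter.filterNames(inCategories: [kCICategoryBlur])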
print("=======")
print("List of available filters")
print("-------")
for ciFilterName in ciFilterList {
print(ciFilterName)
}
print("-------")
print("Total: " + String(ciFilterList.count))
Pretty self-explanatory. When I ran this on an iPad mini running iOS 12.0.1, 207 filters were listed. NOTE: I have never tried this on macOS, but since it really doesn't use UIKit I believe it will work.
let filterName = "CIZoomBlur"
let filter = CIFilter(name: filterName)
print("=======")
print("Filter Name: " + filterName)
let inputKeys = filter?.inputKeys
if inputKeys?.count == 0 {
    print("-------")
    print("No input attributes.")
} else {
    for inputKey in inputKeys! {
        print("-------")
        print("Input Key: " + inputKey)
        if let attribute = filter?.attributes[inputKey] as? [String: AnyObject],
            let attributeClass = attribute[kCIAttributeClass] as? String,
            let attributeDisplayName = attribute[kCIAttributeDisplayName] as? String,
            let attributeDescription = attribute[kCIAttributeDescription] as? String {
            print("Display name: " + attributeDisplayName)
            print("Description: " + attributeDescription)
            print("Attribute type: " + attributeClass)
            switch attributeClass {
            case "NSNumber":
                let minimumValue = (attribute[kCIAttributeSliderMin] as! NSNumber).floatValue
                let maximumValue = (attribute[kCIAttributeSliderMax] as! NSNumber).floatValue
                let defaultValue = (attribute[kCIAttributeDefault] as! NSNumber).floatValue
                print("Default value: " + String(defaultValue))
                print("Minimum value: " + String(minimumValue))
                print("Maximum value: " + String(maximumValue))
            case "CIColor":
                let defaultValue = attribute[kCIAttributeDefault] as! CIColor
                print(defaultValue)
            case "CIVector":
                let defaultValue = attribute[kCIAttributeDefault] as! CIVector
                print(defaultValue)
            default:
                // if you wish, just dump the attribute variable to look at everything!
                print("No code to parse an attribute of type: " + attributeClass)
            }
        }
    }
}
print("=======")
Again, fairly self-explanatory. The app I'm writing only works with filters using a single CIImage and with attributes restricted to NSNumber, CIColor, and CIVector, so anything else falls into the default part of the switch statement. However, it should get you started! If you wish to see the "raw" version, just look at the attribute variable.
Finally, I'd recommend something developed by Simon Gladman called Filterpedia. It's an iPad app (restricted to landscape) that allows you to experiment with pretty much all available filters along with all attributes and their default/max/min values. Be aware of two things though. (1) It's written in Swift 2, but there is a Swift 4 fork here. (2) There are also numerous custom filters using custom CIKernels.

How to provide hint to iOS speech recognition API?

I want to create an app that receive voice input using iOS speech API.
In google's API, there is an option for speechContext which I can provide hint or bias to some uncommon words.
Does the iOS API provide this feature? I've been searching the site for a while but didn't find any.
There is no sample code online about implementing hints for Google Cloud Speech in Swift, so I made it up!
Open this class: SpeechRecognitionService.swift
You have to add your hint list array to the SpeechContext, add the SpeechContext to RecognitionConfig, and finally add RecognitionConfig to Streaming recognition config. Like this:
let recognitionConfig = RecognitionConfig()
recognitionConfig.encoding = .linear16
recognitionConfig.sampleRateHertz = Int32(sampleRate)
recognitionConfig.languageCode = "en-US"
recognitionConfig.maxAlternatives = 3
recognitionConfig.enableWordTimeOffsets = true

let streamingRecognitionConfig = StreamingRecognitionConfig()
streamingRecognitionConfig.singleUtterance = true
streamingRecognitionConfig.interimResults = true

// Custom vocabulary (hints) code
let phraseArray = NSMutableArray(array: ["my donkey is yayeerobee", "my horse is tekkadan", "bet four for kalamazoo"])
let mySpeechContext = SpeechContext()
mySpeechContext.phrasesArray = phraseArray
recognitionConfig.speechContextsArray = NSMutableArray(array: [mySpeechContext])
streamingRecognitionConfig.config = recognitionConfig
// Custom vocabulary (hints) code

let streamingRecognizeRequest = StreamingRecognizeRequest()
streamingRecognizeRequest.streamingConfig = streamingRecognitionConfig
Bonus: Adding your custom words mixed inside a simple phrase instead of adding the word alone gave me better results.

Calculating word count from a url in swift

I'm creating a reading list app, and I'd like to pass the read time of a user added link to a table cell in their reading list - and the only way to get that number is from that page's word count. I've found a few solutions, namely Parsehub, Parse and Mercury but they seem to be geared more towards use cases that need more advanced things to be scraped from a url. Is there a simpler way in Swift to calculate word count of a url?
First of all, you need to parse the HTML. HTML can only be parsed reliably with a dedicated HTML parser; please don't use regular expressions or any other search method to parse HTML (you can read why at this link). If you are using Swift, you may try Fuzi or Kanna. After you get the body text with either library, you have to remove extra whitespace and count the words. I have written some basic code with the Fuzi library to get you started.
import Fuzi

// Trim leading/trailing whitespace
func trim(src: String) -> String {
    return src.trimmingCharacters(in: CharacterSet.whitespacesAndNewlines)
}

// Collapse extra spaces and newlines into single spaces
func clean(src: String) -> String {
    return src.replacingOccurrences(
        of: "\\s+",
        with: " ",
        options: .regularExpression)
}

let htmlUrl = URL(fileURLWithPath: ((#file as NSString).deletingLastPathComponent as NSString).appendingPathComponent("test.html"))
do {
    let data = try Data(contentsOf: htmlUrl)
    let document = try HTMLDocument(data: data)
    // get the text of the body element
    if let body = document.xpath("//body").first?.stringValue {
        let cleanBody = clean(src: body)
        let trimmedBody = trim(src: cleanBody)
        print(trimmedBody.components(separatedBy: " ").count)
    }
} catch {
    print(error)
}
If you are fancy, you may turn my global functions into a String extension, or combine them into a single function. I wrote them this way for clarity.
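To get from the word count to the read time the question is ultimately after, a rough heuristic (my assumption, not part of the answer above) is an average reading speed of about 200 words per minute:
// A minimal sketch: estimate reading time from a word count,
// assuming ~200 words per minute (a rough average for adult readers).
func estimatedReadingMinutes(wordCount: Int, wordsPerMinute: Int = 200) -> Int {
    return max(1, Int((Double(wordCount) / Double(wordsPerMinute)).rounded(.up)))
}

estimatedReadingMinutes(wordCount: 1240) // => 7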
