UITextChecker completionsForPartialWordRange - ios

I'm building custom keyboard extension and want to implement autocompletion, like Apple does.
As I see method completionsForPartialWordRangereturns list of words sorted alphabetically. How can I get results sorted by usage?

The docs for completionsForPartialWordRange:inString:language: say:
The strings in the array are in the order they should be presented to the user—that is, more probable completions come first in the array.
However, the results are very clearly sorted in alphabetical order, and it's not true that "more probable completions come first in the array." The below was tested with iOS 9:
NSString *partialWord = #"th";
UITextChecker *textChecker = [[UITextChecker alloc] init];
NSArray *completions = [textChecker completionsForPartialWordRange:NSMakeRange(0, partialWord.length) inString:partialWord language:#"en"];
iOS word completions for "th":
thalami,
thalamic,
thalamus,
thalassic,
thalidomide,
thallium,
...
the,
...
So, the results will need to be sorted again after obtaining the word completions.
The OS X NSSpellChecker version of this method does not have the same problem:
NSString *partialWord = #"th";
NSArray *completions = [[NSSpellChecker sharedSpellChecker] completionsForPartialWordRange:NSMakeRange(0, partialWord.length) inString:partialWord language:#"en" inSpellDocumentWithTag:0];
List of complete words from the spell checker dictionary in the order they should be presented to the user.
Mac OS X word completions for "th":
the,
this,
that,
they,
thanks,
there,
that's,
...
Filing a radar bug report would be a good idea, so that the behavior will hopefully be fixed in a later version of iOS. I've reported this as rdar://24226582 if you'd like to duplicate.

Swift 4.0
func autoSuggest(_ word: String) -> [String]? {
let textChecker = UITextChecker()
let availableLangueages = UITextChecker.availableLanguages
let preferredLanguage = (availableLangueages.count > 0 ? availableLangueages[0] : "en-US");
let completions = textChecker.completions(forPartialWordRange: NSRange(0..<word.utf8.count), in: word, language: preferredLanguage)
return completions
}

Related

Handling Windows Hebrew Encoding in Objective-c

I am working on an ancient iOS app for a program that runs off VB6. It is passing strings over in a non unicode format including Hebrew characters which once they are parsed are displayed as "àáðø ãøåøé"
I am assuming it is being encoded in Windows Hebrew.
I can't seem to find anything in the apple documentation that explains how to handle this case. And most searches bring up solutions in Swift, no Obj-C. I tried this:
NSString *hebrewPickup = [pickupText stringByApplyingTransform:NSStringTransformLatinToHebrew reverse:false];
But that just gave me this:
"ðø ַ̃øַ̊øֵ"
I am stumped.
EDIT: Based on JosefZ's comment I have tried to encode back using CP1252, but the issue is that CP1255 is not in the list of NSStringEncodings. But seems like it would solve my issue.
NSData *pickupdata = [pickupText dataUsingEncoding:NSWindowsCP1252StringEncoding];
NSString *convPick = [[NSString alloc] initWithData:pickupdata encoding:NSWindowsCP1254StringEncoding];
NSString *hebrewPickup = [convPick stringByApplyingTransform:NSStringTransformLatinToHebrew reverse:false];
Ok, if any poor soul ends up here, this is how I ended up fixing it. I needed to add some Swift into my Obj-C code. (If only I could magically just rebuild the whole project in Swift instead.)
Here is the info on that: https://developer.apple.com/documentation/swift/imported_c_and_objective-c_apis/importing_swift_into_objective-c
Making use of this Swift Package: https://github.com/Cosmo/ISO8859
I added the following code to a new swift file.
#objc class ConvertString: NSObject {
#objc func convertToHebrew(str:String) -> NSString {
let strData = str.data(using: .windowsCP1252);
let bytes: Data = strData!;
if let test = String(bytes, iso8859Encoding: ISO8859.part8) {
return test as NSString;
}
let test = "";
return test as NSString;
}
}
Then in the Obj-C project I was able to call it like so:
ConvertString *stringConverter = [ConvertString new];
NSString *pickupTextFixed = [stringConverter convertToHebrewWithStr:pickupText];
NSString *deliverTextFixed = [stringConverter convertToHebrewWithStr:deliverText];

How to detect text (string) language in iOS?

For instance, given the following strings:
let textEN = "The quick brown fox jumps over the lazy dog"
let textES = "El zorro marrón rápido salta sobre el perro perezoso"
let textAR = "الثعلب البني السريع يقفز فوق الكلب الكسول"
let textDE = "Der schnelle braune Fuchs springt über den faulen Hund"
I want to detect the used language in each of them.
Let's assume the signature for the implemented function is:
func detectedLanguage<T: StringProtocol>(_ forString: T) -> String?
returns an Optional string in case of no detected language.
thus the appropriate result would be:
let englishDetectedLanguage = detectedLanguage(textEN) // => English
let spanishDetectedLanguage = detectedLanguage(textES) // => Spanish
let arabicDetectedLanguage = detectedLanguage(textAR) // => Arabic
let germanDetectedLanguage = detectedLanguage(textDE) // => German
Is there an easy approach to achieve it?
Latest versions (iOS 12+)
Briefly:
You could achieve it by using NLLanguageRecognizer, as:
import NaturalLanguage
func detectedLanguage(for string: String) -> String? {
let recognizer = NLLanguageRecognizer()
recognizer.processString(string)
guard let languageCode = recognizer.dominantLanguage?.rawValue else { return nil }
let detectedLanguage = Locale.current.localizedString(forIdentifier: languageCode)
return detectedLanguage
}
Older versions (iOS 11+)
Briefly:
You could achieve it by using NSLinguisticTagger, as:
func detectedLanguage<T: StringProtocol>(for string: T) -> String? {
let recognizer = NLLanguageRecognizer()
recognizer.processString(String(string))
guard let languageCode = recognizer.dominantLanguage?.rawValue else { return nil }
let detectedLanguage = Locale.current.localizedString(forIdentifier: languageCode)
return detectedLanguage
}
Details:
First of all, you should be aware of what are you asking about is mainly relates to the world of Natural language processing (NLP).
Since NLP is more than text language detection, the rest of the answer will not contains specific NLP information.
Obviously, implementing such a functionality is not that easy, especially when starting to care about the details of the process such as splitting into sentences and even into words, after that recognising names and punctuations etc... I bet you would think of "what a painful process! it is not even logical to do it by myself"; Fortunately, iOS does supports NLP (actually, NLP APIs are available for all Apple platforms, not only the iOS) to make what are you aiming for to be easy to be implemented. The core component that you would work with is NSLinguisticTagger:
Analyze natural language text to tag part of speech and lexical class,
identify names, perform lemmatization, and determine the language and
script.
NSLinguisticTagger provides a uniform interface to a variety of
natural language processing functionality with support for many
different languages and scripts. You can use this class to segment
natural language text into paragraphs, sentences, or words, and tag
information about those segments, such as part of speech, lexical
class, lemma, script, and language.
As mentioned in the class documentation, the method that you are looking for - under Determining the Dominant Language and Orthography section- is dominantLanguage(for:):
Returns the dominant language for the specified string.
.
.
Return Value
The BCP-47 tag identifying the dominant language of the string, or the
tag "und" if a specific language cannot be determined.
You might notice that the NSLinguisticTagger is exist since back to iOS 5. However, dominantLanguage(for:) method is only supported for iOS 11 and above, that's because it has been developed on top of the Core ML Framework:
. . .
Core ML is the foundation for domain-specific frameworks and
functionality. Core ML supports Vision for image analysis, Foundation
for natural language processing (for example, the NSLinguisticTagger
class), and GameplayKit for evaluating learned decision trees. Core ML
itself builds on top of low-level primitives like Accelerate and BNNS,
as well as Metal Performance Shaders.
Based on the returned value from calling dominantLanguage(for:) by passing "The quick brown fox jumps over the lazy dog":
NSLinguisticTagger.dominantLanguage(for: "The quick brown fox jumps over the lazy dog")
would be "en" optional string. However, so far that is not the desired output, the expectation is to get "English" instead! Well, that is exactly what you should get by calling the localizedString(forLanguageCode:) method from Locale Structure and passing the gotten language code:
Locale.current.localizedString(forIdentifier: "en") // English
Putting all together:
As mentioned in the "Quick Answer" code snippet, the function would be:
func detectedLanguage<T: StringProtocol>(_ forString: T) -> String? {
guard let languageCode = NSLinguisticTagger.dominantLanguage(for: String(forString)) else {
return nil
}
let detectedLanguage = Locale.current.localizedString(forIdentifier: languageCode)
return detectedLanguage
}
Output:
It would be as expected:
let englishDetectedLanguage = detectedLanguage(textEN) // => English
let spanishDetectedLanguage = detectedLanguage(textES) // => Spanish
let arabicDetectedLanguage = detectedLanguage(textAR) // => Arabic
let germanDetectedLanguage = detectedLanguage(textDE) // => German
Note That:
There still cases for not getting a language name for a given string, like:
let textUND = "SdsOE"
let undefinedDetectedLanguage = detectedLanguage(textUND) // => Unknown language
Or it could be even nil:
let rubbish = "000747322"
let rubbishDetectedLanguage = detectedLanguage(rubbish) // => nil
Still find it a not bad result for providing a useful output...
Furthermore:
About NSLinguisticTagger:
Although I will not going to dive deep in NSLinguisticTagger usage, I would like to note that there are couple of really cool features exist in it more than just simply detecting the language for a given a text; As a pretty simple example: using the lemma when enumerating tags would be so helpful when working with Information retrieval, since you would be able to recognize the word "driving" passing "drive" word.
Official Resources
Apple Video Sessions:
For more about Natural Language Processing and how NSLinguisticTagger works: Natural Language Processing and your Apps.
Also, for getting familiar with the CoreML:
Introducing Core ML.
Core ML in depth.
You can use NSLinguisticTagger's tagAt method. It support iOS 5 and later.
func detectLanguage<T: StringProtocol>(for text: T) -> String? {
let tagger = NSLinguisticTagger.init(tagSchemes: [.language], options: 0)
tagger.string = String(text)
guard let languageCode = tagger.tag(at: 0, scheme: .language, tokenRange: nil, sentenceRange: nil) else { return nil }
return Locale.current.localizedString(forIdentifier: languageCode)
}
detectLanguage(for: "The quick brown fox jumps over the lazy dog") // English
detectLanguage(for: "El zorro marrón rápido salta sobre el perro perezoso") // Spanish
detectLanguage(for: "الثعلب البني السريع يقفز فوق الكلب الكسول") // Arabic
detectLanguage(for: "Der schnelle braune Fuchs springt über den faulen Hund") // German
I tried NSLinguisticTagger with short input text like hello, it always recognizes as Italian.
Luckily, Apple recently added NLLanguageRecognizer available on iOS 12, and seems like it more accurate :D
import NaturalLanguage
if #available(iOS 12.0, *) {
let languageRecognizer = NLLanguageRecognizer()
languageRecognizer.processString(text)
let code = languageRecognizer.dominantLanguage!.rawValue
let language = Locale.current.localizedString(forIdentifier: code)
}

`NSLocale preferredLanguages` contains "-US" since iOS 9

We have code like the following to retrieved the user language preference:
NSString *language = [[NSLocale preferredLanguages] firstObject];
Before iOS 8.4, language is "zh-Hans", "de", "ru", "ja" and etc. But since iOS 9, I notice that there is additional three characters "-US" appended to language. For example, "zh-Hans" becomes "zh-Hans-US"
I can find any documentation about this change. I assume that I could do something like the following to workaround this issue.
NSRange range = [language rangeOfString:#"-US"];
if (range.location!=NSNotFound && language.length==range.location+3) {
// If the last 3 chars are "-US", remove it
language = [language substringToIndex:range.location];
}
However, I am not sure whether it is safe to do so. It seems that "-US" is the location where the user is using the app? But this doesn't really make sense because we are in Canada. Has any body from other part of the world tried this?
Apple has started adding regions onto the language locales in iOS 9. Per Apple's docs, it has a fallback mechanism now if no region is specified. If you need to only support some languages, here is how I worked around it, per Apple's docs suggestion:
NSArray<NSString *> *availableLanguages = #[#"en", #"es", #"de", #"ru", #"zh-Hans", #"ja", #"pt"];
self.currentLanguage = [[[NSBundle preferredLocalizationsFromArray:availableLanguages] firstObject] mutableCopy];
This will automatically assign one of the languages in the array based off the User's language settings without having to worry about regions.
Source: Technical Note TN2418
To extract the region I think this is a better solution:
// Format is Lang - Region
NSString *fullString = [[NSLocale preferredLanguages] firstObject];
NSMutableArray *langAndRegion = [NSMutableArray arrayWithArray:[fullString componentsSeparatedByString:#"-"]];
// Region is the last item
NSString *region = [langAndRegion objectAtIndex:langAndRegion.count - 1];
// We remove region
[langAndRegion removeLastObject];
// We recreate array with the lang
NSString *lang = [langAndRegion componentsJoinedByString:#"-"];
Swift 5: Remove region from preferred language
Using Locale.preferredLanguages.first gives you the preferred App language (which can be different than device language for the user).
In order to support the script code and language code (but to remove the region code) I think it is best to create a locale given the preferred language and grab the information we need from there.
if let pref = Locale.preferredLanguages.first {
let locale = Locale(identifier: pref)
let code = [locale.languageCode, locale.scriptCode].compactMap{$0}.joined(separator: "-")
print(code)
}
So first we get the preferred app language, Then create a locale from the language.
To get the language code we create an array with locale.languageCode and the locale.scriptCode (which may be nil), remove any nil values with compactMap and then join the values with a "-".
This should allow support for Simplified Chinese and Traditional, and let Apple handle the region instead of assuming it will always be there.

Arabic characters (effective power bug) crash my swift iOS app. how do I properly sanitize inputs to avoid this and related problems?

the text that caused the crash is the following:
the error occurred at the following line:
let size = CGSize(width: 250, height: DBL_MAX)
let font = UIFont.systemFontOfSize(16.0)
let attributes = [
NSFontAttributeName:font ,
NSParagraphStyleAttributeName: paraStyle
]
var rect = text.boundingRectWithSize(size, options:.UsesLineFragmentOrigin, attributes: attributes, context: nil)
where text variable contains the inputted string
parastyle is declared as follows:
let paraStyle = NSMutableParagraphStyle()
paraStyle.lineBreakMode = NSLineBreakMode.ByWordWrapping
My initial idea is that the system font can't handle these characters and I need to do an NSCharacterSet, but I'm not sure how to either just ban characters that'll crash my app or make it so i can handle this input (ideal). I don't want to ban emojis/emoticons either.
Thanks!
Not an answer but some information and that possibly provids a way code way to avoid it.
Updated to information from The Register:
The problem isn’t with the Arabic characters themselves, but in how the unicode representing them is processed by CoreText, which is a library of software routines to help apps display text on screens.
The bug causes CoreText to access memory that is invalid, which forces the operating system to kill off the currently running program: which could be your text message app, your terminal, or in the case of the notification screen, a core part of the OS.
From Reddit but this may not be completely correct:
It only works when the message has to be abbreviated with ‘…’. This is usually on the lock screen and main menu of Messages.app.
The words effective and power can be anything as long as they’re on two different lines, which forces the Arabic text farther down the message where some of the letters will be replaced with ‘…’
The crash happens when the first dot replaces part of one of the Arabic characters (they require more than one byte to store) Normally there are safety checks to make sure half characters aren’t stored, but this replacement bypasses those checks for whatever reason.
My solution is the next category:
static NSString *const CRASH_STRING = #"\u0963h \u0963 \u0963";
#implementation NSString (CONEffectivePower)
- (BOOL)isDangerousStringForCurrentOS
{
if (IS_IOS_7_OR_LESS || IS_IOS_8_4_OR_HIGHER) {
return NO;
}
return [self containsEffectivePowerText];
}
- (BOOL)containsEffectivePowerText
{
return [self containsString:CRASH_STRING];
}
#end
Filter all characters to have same directionality. Unfortunately, I'm only aware of such API in Java.
Don't even try. This is a bug in the operating system that will be fixed. It's not your problem. If you try to fix it, you are just wasting your time. And you are very likely to introduce bugs - when you say you "sanitise" input that means you cannot handle some perfectly fine input.
The company I work at develops a multiplatform group video chat.
In Crashlytics report we started noticing that some users are "effectively" trolling iOS users using this famous unicode sequence.
We can't just sit and wait for Apple to fix this bug.
So, I've worked on this problem, this is the shortest crashing sequence I got:
// unichar representation
unichar crashChars[8] = {1585, 1611, 32, 2403, 32, 2403, 32, 2403};
// string representation
NSString *crashString = #"\u0631\u064b \u0963 \u0963 \u0963"
So, I decided to filter out all text messages that contains two U+0963 'ॣ' symbols with one symbol between them (hope you are able to decipher this phrase)
My code from NSString+Extensions category.
static const unichar kDangerousSymbol = 2403;
- (BOOL)isDangerousUnicode {
NSUInteger distance = 0;
NSUInteger charactersFound = 0;
for (NSUInteger i = 0; i < self.length; i++) {
unichar character = [self characterAtIndex:i];
if (charactersFound) {
distance++;
}
if (distance > 2) {
charactersFound = 0;
}
if (kDangerousSymbol == character) {
charactersFound++;
}
if (charactersFound > 1 && distance > 0) {
return YES;
}
}
return NO;
}
Lousy Specta test:
SpecBegin(NSStringExtensions)
describe(#"NSString+Extensions", ^{
//....
it(#"should detect dangerous Unicode sequences", ^{
expect([#"\u0963 \u0963" isDangerousUnicode]).to.beTruthy();
expect([#"\u0631\u064b \u0963 \u0963 \u0963" isDangerousUnicode]).to.beTruthy();
expect([#"\u0631\u064b \u0963 \u0963 \u0963" isDangerousUnicode]).to.beFalsy();
});
//....
});
SpecEnd
I'm not sure if it's OK to "discriminate" messages with too many "devanagari vowel sign vocalic ll".
I'm open to corrections, suggestions, criticism :).
I would love to see a better solution to this problem.

Swift optimization level 'Fastest' breaks sorting of array

I have a really strange issue. I'm sorting an array of NSDictionary objects in my app, but it only works correctly when the app is running from Xcode. As soon as I distribute the app and install & run it on a device, the sorting no longer works.
Here's the code can be run in a playground, with some example NSDictionary objects. The code in the app is the same.
import UIKit
let p1 = NSDictionary(objects: ["Zoe", 32], forKeys: ["name", "age"])
let p2 = NSDictionary(objects: ["Adrian", 54], forKeys: ["name", "age"])
let p3 = NSDictionary(objects: ["Jeff", 23], forKeys: ["name", "age"])
let p4 = NSDictionary(objects: ["", 66], forKeys: ["name", "age"])
let p5 = NSDictionary(objects: [23], forKeys: ["age"])
let persons = [p1,p2,p3,p4,p5]
let sortedPersons = persons.sorted { (p1, p2) -> Bool in
(p2["name"] as? String) > (p1["name"] as? String)
}
As you can see, sorting in the playground does work correctly. Does anyone know what could be wrong?
Update
I found that the Swift Optimization Level is causing the problem. Setting this to -O (Fastest) will cause the sort to fail. Setting it to -Onone (None) will cause the sort to work properly.
Does anyone have any suggestions on how to change the code, so it will work with -O optimization?
Update 2
I've filed a bug report at Apple. For the time being I'm using an NSSet to sort the array, which seems to be working fine.
Last update
I haven't been able to reproduce this since Xcode 6.1.1
This appears to be down to your naming convention within your sorted closure. Changing (p1, p2) to different names will resolve it. With -Ofastest, the compiler seems to be incorrectly doing 2 things:
1) causing p1 and p2 within the closure to refer to the NSDictionarys themselves rather than the closure parameters
2) cleaning up the references to the NSDictionary objects prematurely, given #1
Change the code so the last section shows:
let sortedPersons = persons.sorted { (d1, d2) -> Bool in
(d2["name"] as? String) > (d1["name"] as? String)
}

Resources