Which characters does NSLineBreakByWordWrapping break on? - ios

Short version
From the docs:
NSLineBreakByWordWrapping
Wrapping occurs at word boundaries, unless the word itself doesn’t fit on a single line.
What is the set of all word boundary characters?
Longer version
I have a set of UILabels, which contain text, sometimes including URLs. I need to know the exact location (frame, not range) of the URLs so that I can make them tappable. My math mostly works, but I had to build in a test for certain characters:
// This code is only reached if the URL is longer than the available width
NSString *theText = // A string containing an HTTP/HTTPS URL
NSCharacterSet *breakChars = [NSCharacterSet characterSetWithCharactersInString:@"?-"];
NSString *charsInRemainingSpace = // NSString with remaining text on this line
NSUInteger breakIndex = NSNotFound;
if (charsInRemainingSpace)
    breakIndex = [charsInRemainingSpace rangeOfCharacterFromSet:breakChars
                                                        options:NSBackwardsSearch].location;
if (breakIndex != NSNotFound && breakIndex != theText.length-1) {
    // There is a breakable char in the middle, so draw a URL through that, then break
    // ...
} else {
    // There is no breakable char (or it's at the end), so start this word on a new line
    // ...
}
The characters in my NSCharacterSet are just ? and -, which I discovered NSLineBreakByWordWrapping breaks on. It does not break on some other characters I see in URLs like % and =. Is there a complete list of characters I should be breaking on?

I recommend using "not alphanumeric" as your test.
[[NSCharacterSet alphanumericCharacterSet] invertedSet]
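As a rough illustration of that test, here is a minimal Swift sketch that finds the last "not alphanumeric" character in a run of text. The function name lastBreakOffset is made up for this example, and treating every non-alphanumeric character as a break opportunity is the heuristic suggested above, not a documented description of what NSLineBreakByWordWrapping actually does.

```swift
import Foundation

// Hypothetical helper: returns the UTF-16 offset of the last character that is
// not alphanumeric, or nil if the text contains no such character.
// Assumption: "not alphanumeric" approximates the label's break opportunities.
func lastBreakOffset(in text: String) -> Int? {
    let breakChars = CharacterSet.alphanumerics.inverted
    guard let range = text.rangeOfCharacter(from: breakChars, options: .backwards) else {
        return nil
    }
    // Convert the String.Index into an NSString-style (UTF-16) offset.
    return text.utf16.distance(from: text.utf16.startIndex, to: range.lowerBound)
}
```

The returned offset is expressed in UTF-16 units, so it can be compared directly with NSString indices such as the breakIndex in the question.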
That said, if you're doing a lot of this, you may want to consider using CTFramesetter to do your layout instead of UILabel. Then you can use CTRunGetImageBounds to calculate actual rects. You wouldn't have to calculate your word breaks, since CTFrame will already have done this for you (and it would be guaranteed to be the same algorithm).

Related

Replace iOS app emoji with twitter open source twemoji

I want to replace all standard iOS emoji in a UILabel or UITextView with Twitter's open source Twemoji.
I can't find any library or documentation to do this in iOS. Does anyone have a solution that does not involve me implementing this from scratch?
The solution needs to be efficient and work offline.
The question got me intrigued, and after a bit of searching on how it would be possible to replace all standard iOS emoji with a custom set, I noticed that even Twitter's own iOS app doesn't use Twemoji.
In the end, I came to the same conclusion as you:
I can't find any library or documentation to do this in iOS.
So, I created a framework in Swift for this exact purpose.
It does all the work for you, but if you want to implement your own solution, I'll describe below how to replace all standard emoji with Twemoji.
1. Document all characters that can be represented as emoji
There are 1126 base characters that have emoji representations, and over a thousand additional representations formed by sequences. Although most base characters are confined to six Unicode blocks, all but one of these blocks are mixed with non-emoji characters and/or unassigned code points. The remaining base characters outside these blocks are scattered across various other blocks.
My implementation simply declares the UTF-32 code points for these characters, as the value property of UnicodeScalar is exactly this.
2. Check whether a character is an emoji
In Swift, a String contains a collection of Character objects, each of which represents a single extended grapheme cluster. An extended grapheme cluster is a sequence of Unicode scalars that together represent one [1] human-readable character, which is helpful since you can loop through the Characters of a string and handle them based on the UnicodeScalars they contain (rather than looping through the UTF-16 values of the string).
To identify whether a Character is an emoji, only the first UnicodeScalar is significant, so comparing this value to your table of emoji characters is enough. However, I'd also recommend checking if the Character contains a Variation Selector, and if it does, make sure that it's VS16 – otherwise the character shouldn't be presented as emoji.
Extracting the UnicodeScalars from a Character requires a tiny hack:
let c: Character = "A"
let scalars = String(c).unicodeScalars
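The check described in this section could be sketched roughly as follows. The table emojiBaseCodePoints is a tiny hypothetical stand-in for the full list from step 1, and shouldPresentAsEmoji is a name invented for this example; it is a sketch of the idea, not the framework's actual implementation.

```swift
// Hypothetical, heavily truncated table of base code points (see step 1).
let emojiBaseCodePoints: Set<UInt32> = [0x2764, 0x1F643]

// Returns true if the character's first scalar is in the table and any
// variation selector it carries is VS16 (U+FE0F, emoji presentation).
func shouldPresentAsEmoji(_ character: Character) -> Bool {
    let scalars = Array(String(character).unicodeScalars)
    guard let first = scalars.first, emojiBaseCodePoints.contains(first.value) else {
        return false
    }
    // Variation selectors VS1...VS16 occupy U+FE00...U+FE0F.
    let variationSelectors: ClosedRange<UInt32> = 0xFE00...0xFE0F
    if let selector = scalars.dropFirst().first(where: { variationSelectors.contains($0.value) }) {
        return selector.value == 0xFE0F
    }
    return true
}
```

A character such as U+2764 followed by VS15 (U+FE0E) requests text presentation, so the sketch rejects it even though its base scalar is in the table.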
3. Convert the code points into the correct format
Twemoji images are named according to their corresponding code points [2], which makes sense. So, the next step is to convert the Character into a string equivalent to the image name:
let codePoint = String("🙃").unicodeScalars.first!.value // 128579
let imageName = String(codePoint, radix: 16) // "1f643"
Great, but this won't work for flags or keycaps, so we'll have to modify our code to take those into account:
let scalars = String("🇧🇪").unicodeScalars
let filtered = scalars.filter{ $0.value != 0xfe0f } // Remove VS16 from variants, including keycaps.
let mapped = filtered.map{ String($0.value, radix: 16) }
let imageName = mapped.joined(separator: "-") // "1f1e7-1f1ea"
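Wrapped up as a reusable function, the conversion above might look like the sketch below. The name twemojiImageName(for:) is invented for this example, and it assumes the image files are named exactly as the joined lowercase hex code points with VS16 stripped, as described in this section.

```swift
// Hypothetical helper: builds the Twemoji image name for a single Character.
func twemojiImageName(for character: Character) -> String {
    return String(character).unicodeScalars
        .filter { $0.value != 0xFE0F }        // Drop VS16, as in the snippet above.
        .map { String($0.value, radix: 16) }  // Lowercase hex for each code point.
        .joined(separator: "-")
}
```

Single-scalar emoji produce a plain hex name, while flags and other multi-scalar characters produce hyphen-separated names.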
4. Replace the emoji in the string
In order to replace the emoji in a given String, we'll need to use NSMutableAttributedString for storing the original string, and replace the emoji with NSTextAttachment objects containing the corresponding Twemoji image.
let originalString = "🙃"
let attributedString = NSMutableAttributedString(string: originalString)
for character in originalString.characters {
    // Check if character is emoji, see section 2.
    ...
    // Get the image name from the character, see section 3.
    let imageName = ...
    // Safely unwrapping to make sure the image exists.
    if let image = UIImage(named: imageName) {
        let attachment = NSTextAttachment()
        attachment.image = image
        // Create an attributed string from the attachment.
        let twemoji = NSAttributedString(attachment: attachment)
        // Get the range of the character in attributedString.
        let range = attributedString.mutableString.range(of: String(character))
        // Replace the emoji with the corresponding Twemoji.
        attributedString.replaceCharacters(in: range, with: twemoji)
    }
}
To display the resulting attributed string, just set it as the attributedText property of a UITextView/UILabel.
Note that the above method doesn't take into account zero-width joiners or modifier sequences, but I feel like this answer is already too long as it stands.
1. There is a quirk with the Character type that interprets a sequence of joined regional indicator symbols as one object, despite containing a theoretically unlimited number of Unicode scalars. Try "🇩🇰🇫🇮🇮🇸🇳🇴🇸🇪".characters.count in a playground.
2. The naming pattern varies slightly when it comes to zero-width joiners and variation selectors, so it's easier to strip these out of the image names – see here.
Easiest thing to do:
1) Load the twemoji images into your project.
2) Create an NSDictionary that correlates the emoji codes supported by iOS with the paths to the respective twemoji images:
NSArray *iOSEmojis = @[@"iOSEmoji1", @"iOSEmoji2"];
NSDictionary *twemojiPaths = [NSDictionary dictionaryWithObjects:@[@"Project/twemoji1.png", @"Project/twemoji2.png"] forKeys:@[@"iOSEmoji1", @"iOSEmoji2"]];
3) Code your app to search for emoji strings and display the twemojis where the regular emojis would go:
for (NSString *emoji in iOSEmojis)
{
    NSString *twemojiPath = [twemojiPaths valueForKey:emoji];
    // Find the position of the emoji string in the text
    // and put an image view there.
    NSRange range = [label.text rangeOfString:emoji];
    if (range.location == NSNotFound) continue; // This emoji does not occur in the text.
    NSString *prefix = [label.text substringToIndex:range.location];
    NSDictionary *attributes = @{NSFontAttributeName: [UIFont fontWithName:@"HelveticaNeue" size:14]};
    CGSize prefixSize = [prefix sizeWithAttributes:attributes];
    // Measure the emoji itself, not the whole label text.
    CGSize emojiSize = [emoji sizeWithAttributes:attributes];
    // The rect is in the label's own coordinate space, so it starts at y = 0.
    CGRect imageViewFrame = CGRectMake(prefixSize.width, 0, emojiSize.width, label.frame.size.height);
    imageViewFrame = [self.view convertRect:imageViewFrame fromView:label];
    UIImageView *imageView = [[UIImageView alloc] initWithFrame:imageViewFrame];
    imageView.image = [UIImage imageWithContentsOfFile:twemojiPath];
    [self.view addSubview:imageView];
}

Prevent UILabel from character wrapping symbols

I have a multi-line label that has the following text:
Lots of text here · $$$$
Since the text at the beginning is freeform, sometimes the wrapping ends up looking like this:
Lots of text here · $$$
$
How do I prevent this from happening? I want it to look like this:
Lots of text here ·
$$$$
I've tried every lineBreakMode to little avail. Word wrap doesn't work because it doesn't treat $$$$ as a word.
It seems that you might benefit from subclassing UILabel, which would treat a string differently for the NSLineBreakByWordWrapping line break mode, which treats your phonetic words like words. You will effectively be expanding the definition by which your custom linebreakmode considers a word.
You would have to roll your own line-breaking algorithm. The approach to determining the location of your line-breaks would be similar to the following:
Loop through the string, to get each character, until one of two conditions is met: a) you have reached the width of the view, or b) you have reached a space, and the next word (delimited by a space) doesn't fit on the same line.
If you have reached condition a, you have two options: you could either adopt a policy that never splits words across multiple lines, or you could apply the non-split rule only to your phonetic words. Either way, you will need to insert a line break at the beginning of the phonetic word when there is no more room on a given line.
You may want to use two separate strings, to keep the source string separate from the display string that contains your formatting.
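A minimal Swift sketch of such a policy (never split a space-delimited word) is shown below. The function name wrapWithoutSplittingWords is invented for this example, and the default width closure simply counts characters; a real label would need to measure rendered text width with its font instead, which is an assumption left out here to keep the sketch self-contained.

```swift
// Greedy wrapping that only breaks at spaces, so "$$$$" is never split.
// `width` is a stand-in measurement; it counts characters by default.
func wrapWithoutSplittingWords(_ text: String,
                               maxWidth: Int,
                               width: (String) -> Int = { $0.count }) -> [String] {
    var lines: [String] = []
    var current = ""
    for word in text.split(separator: " ").map(String.init) {
        let candidate = current.isEmpty ? word : current + " " + word
        if current.isEmpty || width(candidate) <= maxWidth {
            // The word fits (or it is the first word on the line, which is
            // kept whole even if it is wider than the line).
            current = candidate
        } else {
            // No room left: close the line and start the word on a new one.
            lines.append(current)
            current = word
        }
    }
    if !current.isEmpty { lines.append(current) }
    return lines
}
```

Joining the resulting lines with "\n" gives a display string that can be kept separate from the source string, as suggested above.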
Let me know if that helps!
This might be very late, but at least it could help someone.
The way I have done it is as follows:
UIFont *fontUsed = [UIFont systemFontOfSize:17];
NSDictionary *dictFont = [NSDictionary dictionaryWithObject:fontUsed forKey:NSFontAttributeName];
NSString *strTextToShow = @"text that has to be displayed but without $$$$";
CGRect rectForSimpleText = [strTextToShow boundingRectWithSize:CGSizeMake(154, 258) options:NSStringDrawingUsesLineFragmentOrigin attributes:dictFont context:nil];
NSString *strTextAdded = [NSString stringWithFormat:@"%@ $$$$", strTextToShow];
CGFloat oldHeight = rectForSimpleText.size.height;
CGRect rectForAppendedText = [strTextAdded boundingRectWithSize:CGSizeMake(154, 258) options:NSStringDrawingUsesLineFragmentOrigin attributes:dictFont context:nil];
CGFloat newHeight = rectForAppendedText.size.height;
if (oldHeight < newHeight) {
    strTextAdded = [strTextAdded stringByReplacingOccurrencesOfString:@"$$$$" withString:@"\n$$$$"];
}
[lblLongText setText:strTextAdded];
lblLongText here is the IBOutlet of UILabel and CGSizeMake(154, 258) is the size of UILabel I have used. Let me know if there is any other way you have found.
Try inserting a line break in your input text.
Lots of text here ·\n $$$
It should print the $$$ in the next line.

Move Cursor One Word (Including Languages like Chinese) at a Time in UITextView

I have read this, but it only works well for English because it just uses whitespace and something like the newline character set as separators.
I want to add a left arrow and a right arrow to the input accessory view to move the cursor in a UITextView word by word.
I am wondering how to support that feature for Asian languages such as Chinese.
PS: I have added an example where CFStringTokenizer fails to work when the text contains both English and Chinese characters.
test string:
Happy Christmas! Text view test 云存储容器测试开心 yeap
the expected boundaries:
Happy/ Christmas!/ Text/ view/ test/ 云/存储/容器/测试/开心/ yeap/
the boundaries show in reality:
Happy/ Christmas!/ Text/ view/ test/ 云存储容器测试开心/ yeap/
I don't speak Chinese, but according to the documentation,
CFStringTokenizer is able to find word boundaries in many languages,
including Asian languages.
The following code shows how to advance from one word ("world" at position 6)
to the next word ("This" at position 13):
// Test string.
NSString *string = @"Hello world. This is great.";
// Create tokenizer
CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(NULL,
                                                         (__bridge CFStringRef)(string),
                                                         CFRangeMake(0, [string length]),
                                                         kCFStringTokenizerUnitWordBoundary,
                                                         CFLocaleCopyCurrent());
// Start with a position that is inside the word "world".
CFIndex position = 6;
// Goto current token ("world")
CFStringTokenizerTokenType tokenType;
tokenType = CFStringTokenizerGoToTokenAtIndex(tokenizer, position);
if (tokenType != kCFStringTokenizerTokenNone) {
    // Advance to next "normal" token:
    tokenType = CFStringTokenizerAdvanceToNextToken(tokenizer);
    while (tokenType != kCFStringTokenizerTokenNone && tokenType != kCFStringTokenizerTokenNormal) {
        tokenType = CFStringTokenizerAdvanceToNextToken(tokenizer);
    }
    if (tokenType != kCFStringTokenizerTokenNone) {
        // Get the location of next token in the string:
        CFRange range = CFStringTokenizerGetCurrentTokenRange(tokenizer);
        position = range.location;
        NSLog(@"%ld", position);
        // Output: 13 = position of the word "This"
    }
}
There is no CFStringTokenizerAdvanceToPreviousToken() function, so to move to
the previous word you have to start at the beginning of the string and advance forward.
In the end, I used UITextInputTokenizer to implement this feature.

Text string with EMOJI causing issues with NSRange

I am using TTTAttributedLabel to apply formatting to text, however it seems to crash because I am trying to apply formatting to a range which includes emoji. Example:
NSString *text = @"@user1234 🍺🍺 #hashtag"; // text.length reported as 22 by NSLog as each emoji is 2 chars in length
cell.textLabel.text = text;
int length = 8;
int start = 13;
NSRange range = NSMakeRange(start, length);
if (!NSEqualRanges(range, NSMakeRange(NSNotFound, 0))) {
    // apply formatting to TTTAttributedLabel
    [cell.textLabel addLinkToURL:[NSURL URLWithString:[NSString stringWithFormat:@"someaction://hashtag/%@", [cell.textLabel.text substringWithRange:range]]] withRange:range];
}
Note: I am passed the NSRange values from an API, as well as the text string.
In the above I am attempting to apply formatting to #hashtag. Normally this works fine, but because I have emoji involved in the string, I believe the range identified is attempting to format the emoji, as they are actually UTF values, which in TTTAttributedLabel causes a crash (it actually hangs with no crash, but...)
Strangely, it works fine if there is 1 emoji, but breaks if there are 2.
Can anyone help me figure out what to do here?
The problem is that any Unicode character in your string with a code point of U+10000 or higher appears as two characters (a UTF-16 surrogate pair) in NSString.
Since you want to format the hashtag, you should obtain the start and length values more dynamically. Use NSString's rangeOfString: to find the location of the # character, then use that result and the string's length to get the needed length.
NSString *text = @"@user1234 🍺🍺 #hashtag"; // text.length reported as 22 by NSLog as each emoji is 2 chars in length
cell.textLabel.text = text;
NSUInteger start = [text rangeOfString:@"#"].location;
if (start != NSNotFound) {
    NSUInteger length = text.length - start;
    NSRange range = NSMakeRange(start, length);
    // apply formatting to TTTAttributedLabel
    [cell.textLabel addLinkToURL:[NSURL URLWithString:[NSString stringWithFormat:@"someaction://hashtag/%@", [cell.textLabel.text substringWithRange:range]]] withRange:range];
}
I assume this is from the Twitter API, and you are trying to use the entities dictionary they return. I have just been writing code to support handling those ranges along with NSString's version of the range of a string.
My approach was to "fix" the entities dictionary that Twitter return to cope with the extra characters. I can't share code, for various reasons, but this is what I did:
Make a deep mutable copy of the entities dictionary.
Loop through the entire range of the string, unichar by unichar, doing this:
Check if the unichar is in the surrogate pair range (0xd800 -> 0xdfff).
If it is a surrogate pair codepoint, then go through all the entries in the entities dictionary and shift the indices by 1 if they are greater than the current location in the string (in terms of unichars). Then increment the loop counter by 1 to skip the partner of this surrogate pair as it's been handled now.
If it's not a surrogate pair, do nothing.
Loop through all entities and check that none of them overrun the end of the string. They shouldn't, but just in case. I found some cases where Twitter returned duff data.
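For illustration only, here is a Swift sketch of the same index correction done a different way: instead of shifting every entity while walking the unichars, it converts one code-point-based range directly into UTF-16 offsets. The function name utf16Range and the assumption that the API counts one index per Unicode code point are mine, not taken from the code described above.

```swift
// Hypothetical helper: converts a range counted in Unicode code points
// (assumed to be what the API returns) into NSString-style UTF-16 offsets.
// Returns nil if the range overruns the end of the string.
func utf16Range(in text: String,
                scalarStart: Int,
                scalarLength: Int) -> (location: Int, length: Int)? {
    let scalars = text.unicodeScalars
    guard scalarStart >= 0, scalarLength >= 0,
          scalarStart + scalarLength <= scalars.count else {
        return nil
    }
    let lower = scalars.index(scalars.startIndex, offsetBy: scalarStart)
    let upper = scalars.index(lower, offsetBy: scalarLength)
    let utf16 = text.utf16
    // Each code point above U+FFFF contributes two UTF-16 units here.
    return (utf16.distance(from: utf16.startIndex, to: lower),
            utf16.distance(from: lower, to: upper))
}
```

With the string from the question, a code-point range of (13, 8) maps to a UTF-16 location of 15, because each of the two emoji before the hashtag occupies two unichars.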
I hope that helps! I also hope that one day I can open source this code as I think it would be incredibly useful!

Character code in NSString to unicode character

I have an NSString with a charactercode like this: 0x1F514.
I want to take this NSString and add it to another NSString, but not with the literal value of it, but the icon hidden behind it. In this case an emoticon of a bell.
How can I easily convert this NSString to show the emoticon instead of the character code?
Something like this would do:
NSString *c = @"0x1F514";
unsigned intVal;
NSScanner *scanner = [NSScanner scannerWithString:c];
[scanner scanHexInt:&intVal];
NSString *str = nil;
if (intVal > 0xFFFF) {
    unsigned remainder = intVal - 0x10000;
    unsigned topTenBits = (remainder >> 10) & 0x3FF;
    unsigned botTenBits = (remainder >> 0) & 0x3FF;
    unichar hi = topTenBits + 0xD800;
    unichar lo = botTenBits + 0xDC00;
    unichar unicodeChars[2] = {hi, lo};
    str = [NSString stringWithCharacters:unicodeChars length:2];
} else {
    unichar lo = (unichar)(intVal & 0xFFFF);
    str = [NSString stringWithCharacters:&lo length:1];
}
NSLog(@"str = %@", str);
The reason simply @"\u1f514" doesn't work is because those \u values cannot be outside the BMP, i.e. >0xFFFF, i.e. >16-bit.
So, what my code does is check for that scenario and does the relevant surrogate pair magic to make the right string.
Hopefully that is actually what you want and makes sense!
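For comparison, the same idea can be sketched in Swift, where the standard library builds the string from a code point directly and the surrogate arithmetic is only needed if you want the raw UTF-16 units. Both function names below are invented for this example.

```swift
// Hypothetical helper: turns text such as "0x1F514" into the character it names.
func string(fromHexCodePoint text: String) -> String? {
    let hasHexPrefix = text.hasPrefix("0x") || text.hasPrefix("0X")
    let digits = hasHexPrefix ? String(text.dropFirst(2)) : text
    guard let value = UInt32(digits, radix: 16),
          let scalar = Unicode.Scalar(value) else {
        return nil  // Not valid hex, or not a valid Unicode scalar value.
    }
    return String(Character(scalar))
}

// The surrogate pair computation from the answer above, for code points
// outside the BMP (U+10000...U+10FFFF).
func surrogatePair(for value: UInt32) -> (high: UInt16, low: UInt16)? {
    guard value > 0xFFFF, value <= 0x10FFFF else { return nil }
    let remainder = value - 0x10000
    let high = UInt16(0xD800 + (remainder >> 10))    // Top ten bits.
    let low = UInt16(0xDC00 + (remainder & 0x3FF))   // Bottom ten bits.
    return (high, low)
}
```

For 0x1F514 the pair is 0xD83D, 0xDD14, which matches the UTF-16 units of the bell character.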
If your NSString contains this "bell" character, then it does. You just append strings the usual way, like with stringByAppendingString.
The drawing of a bell instead of something denoting an unknown character is a completely separate issue. Your best bet is to ensure you're not using CoreText for drawing this, as it's been reported elsewhere, and I've seen it myself at work, that various non-standard characters may not work when printed that way. They do work, however, when printed with UIKit (that should be standard UI components, UIKitAdditions, and so on).
If using CoreText, you might get a bit lucky if you disable some text properties for the string with this special character, or choose appropriate font (but I won't help you here; we decided to leave the issue as Won't fix).
Having said that, the last time I was dealing with those was in pre-iOS 6 days...
Summary: your problem is not appending strings, but how you draw them.
