I'm trying to calculate the byte size of a String in Swift, but I don't know the size of a single character. What is the byte size of a character?
Let's say I have a string:
let str = "hello, world"
I want to send that to some web service endpoint, but that endpoint only accepts strings whose size is under 32 bytes. How would I control the byte size of my string?
It all depends on the character encoding; let's assume UTF-8:
let string = "abde"
let size = string.utf8.count
Note that not all characters have the same byte size in UTF8.
If your string is ASCII, you can assume 1 byte per character.
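If you need to enforce the 32-byte limit from the question, one approach (a sketch, not the only way) is to drop trailing Characters until the UTF-8 count fits, which guarantees you never cut a multi-byte sequence in half:

```swift
func truncated(_ s: String, toUTF8Bytes limit: Int) -> String {
    var result = s
    // Remove whole Characters (grapheme clusters) until the encoded
    // size fits, so no multi-byte character is split.
    while result.utf8.count > limit {
        result.removeLast()
    }
    return result
}

let message = String(repeating: "ü", count: 40) // 80 bytes in UTF-8
let safe = truncated(message, toUTF8Bytes: 32)
print(safe.utf8.count) // at most 32
```

Dropping from the end is O(n) in the worst case, but for short payloads like this it keeps the logic obvious.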
I have an example:
let stringToCheck: String = "42"
let numbers: CharacterSet = CharacterSet.decimalDigits
let stringIsANumber: Bool = stringToCheck.rangeOfCharacter(from: numbers.inverted) == nil
and I have two questions:
1. How does the inverted function work? What does it do?
2. What range does rangeOfCharacter return?
inverted means the opposite. For example, if the only characters that exist are a, b, and c, and you have a character set consisting of a, its inversion is b and c. So your decimalDigits character set, inverted, means everything that is not a decimal digit.
A range is a contiguous stretch of something, specified numerically. For example, if you have the string "abc", the range of "bc" is "the second and third characters". The range of something that isn't there at all can be expressed as nil.
So the code you have shown looks for a character that is not a digit in the original string, and if it fails to find one (so that the range is nil), it says that the string is entirely a number.
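To make that concrete, here is a small sketch (the input strings are made up for illustration):

```swift
import Foundation

let digits = CharacterSet.decimalDigits

// "42" contains no character outside decimalDigits, so searching for
// the inverted set finds nothing and the range is nil.
let allDigits = "42".rangeOfCharacter(from: digits.inverted) == nil   // true

// "4a2" contains "a", which is in the inverted set, so
// rangeOfCharacter returns the range of that first match (non-nil).
let mixed = "4a2".rangeOfCharacter(from: digits.inverted) == nil      // false
```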
I want to replace all standard iOS emoji in a UILabel or UITextView with Twitter's open-source Twemoji.
I can't find any library or documentation to do this in iOS. Does anyone have a solution that does not involve me implementing this from scratch?
The solution needs to be efficient and work offline.
The question got me intrigued, and after a bit of searching on how it would be possible to replace all standard iOS emoji with a custom set, I noticed that even Twitter's own iOS app doesn't use Twemoji.
In the end, I came to the same conclusion as you:
I can't find any library or documentation to do this in iOS.
So, I created a framework in Swift for this exact purpose.
It does all the work for you, but if you want to implement your own solution, I'll describe below how to replace all standard emoji with Twemoji.
1. Document all characters that can be represented as emoji
There are 1126 base characters that have emoji representations, and over a thousand additional representations formed by sequences. Although most base characters are confined to six Unicode blocks, all but one of these blocks are mixed with non-emoji characters and/or unassigned code points. The remaining base characters outside these blocks are scattered across various other blocks.
My implementation simply declares the UTF-32 code points for these characters, as the value property of UnicodeScalar is exactly this.
2. Check whether a character is an emoji
In Swift, a String contains a collection of Character objects, each of which represents a single extended grapheme cluster. An extended grapheme cluster is a sequence of Unicode scalars that together represent one1 human-readable character, which is helpful since you can loop through the Characters of a string and handle them based on the UnicodeScalars they contain (rather than looping through the UTF-16 values of the string).
To identify whether a Character is an emoji, only the first UnicodeScalar is significant, so comparing this value to your table of emoji characters is enough. However, I'd also recommend checking if the Character contains a Variation Selector, and if it does, make sure that it's VS16; otherwise the character shouldn't be presented as emoji.
Extracting the UnicodeScalars from a Character requires a tiny hack:
let c: Character = "A"
let scalars = String(c).unicodeScalars
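Putting sections 1 and 2 together, a check could look like the sketch below. The `emojiBaseCodePoints` set here holds only a few entries for illustration; a real table would contain all the base code points described in section 1:

```swift
// Partial table for illustration only -- a real one has ~1126 entries.
let emojiBaseCodePoints: Set<UInt32> = [0x1F600, 0x1F643, 0x2764]

func isEmoji(_ c: Character) -> Bool {
    let scalars = Array(String(c).unicodeScalars)
    guard let first = scalars.first else { return false }
    // If a variation selector is present, it must be VS16 (U+FE0F);
    // any other selector means a non-emoji presentation.
    if let vs = scalars.first(where: { (0xFE00...0xFE0F).contains($0.value) }),
       vs.value != 0xFE0F {
        return false
    }
    return emojiBaseCodePoints.contains(first.value)
}
```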
3. Convert the code points into the correct format
Twemoji images are named according to their corresponding code points2, which makes sense. So, the next step is to convert the Character into a string equivalent to the image name:
let codePoint = String("🙃").unicodeScalars.first!.value // 128579
let imageName = String(codePoint, radix: 16) // "1f643"
Great, but this won't work for flags or keycaps, so we'll have to modify our code to take those into account:
let scalars = String("🇧🇪").unicodeScalars
let filtered = scalars.filter{ $0.value != 0xfe0f } // Remove VS16 from variants, including keycaps.
let mapped = filtered.map{ String($0.value, radix: 16) }
let imageName = mapped.joined(separator: "-") // "1f1e7-1f1ea"
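The same pipeline handles keycap sequences: keycap one is DIGIT ONE + VS16 + COMBINING ENCLOSING KEYCAP, and stripping the VS16 yields the name Twemoji uses for that image:

```swift
let keycap = "1️⃣" // U+0031 U+FE0F U+20E3
let keycapName = keycap.unicodeScalars
    .filter { $0.value != 0xfe0f }          // drop VS16
    .map { String($0.value, radix: 16) }    // hex code points
    .joined(separator: "-")                 // "31-20e3"
```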
4. Replace the emoji in the string
In order to replace the emoji in a given String, we'll need to use NSMutableAttributedString for storing the original string, and replace the emoji with NSTextAttachment objects containing the corresponding Twemoji image.
let originalString = "🙃"
let attributedString = NSMutableAttributedString(string: originalString)
for character in originalString.characters {
// Check if character is emoji, see section 2.
...
// Get the image name from the character, see section 3.
let imageName = ...
// Safely unwrapping to make sure the image exists.
if let image = UIImage(named: imageName) {
let attachment = NSTextAttachment()
attachment.image = image
// Create an attributed string from the attachment.
let twemoji = NSAttributedString(attachment: attachment)
// Get the range of the character in attributedString.
let range = attributedString.mutableString.range(of: String(character))
// Replace the emoji with the corresponding Twemoji.
attributedString.replaceCharacters(in: range, with: twemoji)
}
}
To display the resulting attributed string, just set it as the attributedText property of a UITextView/UILabel.
Note that the above method doesn't take into account zero-width joiners or modifier sequences, but I feel like this answer is already too long as it stands.
1. There is a quirk with the Character type that interprets a sequence of joined regional indicator symbols as one object, despite containing a theoretically unlimited number of Unicode scalars. Try "🇩🇰🇫🇮🇮🇸🇳🇴🇸🇪".characters.count in a playground.
2. The naming pattern varies slightly when it comes to zero-width joiners and variation selectors, so it's easier to strip these out of the image names; see here.
Easiest thing to do:
1) Load the twemoji images into your project.
2) Create an NSDictionary that correlates the emoji codes supported by iOS with the paths to the respective twemoji images:
NSArray *iOSEmojis = @[@"iOSEmoji1", @"iOSEmoji2"];
NSDictionary *twemojiPaths = [NSDictionary dictionaryWithObjects:@[@"Project/twemoji1.png", @"Project/twemoji2.png"]
                                                          forKeys:@[@"iOSEmoji1", @"iOSEmoji2"]];
3) Code your app to search for emoji strings and display the twemojis where the regular emojis would go:
for (NSString *emoji in iOSEmojis)
{
NSString *twemojiPath = [twemojiPaths valueForKey:emoji];
// Find the position of the emoji string in the text
// and put an image view there.
NSRange range = [label.text rangeOfString:emoji];
NSString *prefix = [label.text substringToIndex:range.location];
CGSize prefixSize = [prefix sizeWithAttributes:@{NSFontAttributeName: [UIFont fontWithName:@"HelveticaNeue" size:14]}];
CGSize emojiSize = [emoji sizeWithAttributes:@{NSFontAttributeName: [UIFont fontWithName:@"HelveticaNeue" size:14]}];
CGRect imageViewFrame = CGRectMake(prefixSize.width, 0, emojiSize.width, label.frame.size.height);
imageViewFrame = [self.view convertRect:imageViewFrame fromView:label];
UIImageView *imageView = [[UIImageView alloc] initWithFrame:imageViewFrame];
imageView.image = [UIImage imageWithContentsOfFile:twemojiPath];
[self.view addSubview:imageView];
}
let str1 = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪"
let str2 = "🇩🇪.🇩🇪.🇩🇪.🇩🇪.🇩🇪."
println("\(countElements(str1)), \(countElements(str2))")
Result: 1, 10
But shouldn't str1 have 5 elements?
The bug only seems to occur when I use flag emoji.
Update for Swift 4 (Xcode 9)
As of Swift 4 (tested with Xcode 9 beta), grapheme clusters break after every second regional indicator symbol, as mandated by the Unicode 9 standard:
let str1 = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪"
print(str1.count) // 5
print(Array(str1)) // ["🇩🇪", "🇩🇪", "🇩🇪", "🇩🇪", "🇩🇪"]
Also String is a collection of its characters (again), so one can
obtain the character count with str1.count.
(Old answer for Swift 3 and older:)
From "3 Grapheme Cluster Boundaries" in Unicode Standard Annex #29, "Unicode Text Segmentation" (emphasis added):
A legacy grapheme cluster is defined as a base (such as A or カ)
followed by zero or more continuing characters. One way to think of
this is as a sequence of characters that form a "stack".
The base can be single characters, or be any sequence of Hangul Jamo
characters that form a Hangul Syllable, as defined by D133 in The
Unicode Standard, or be any sequence of Regional_Indicator (RI) characters. The RI characters are used in pairs to denote Emoji
national flag symbols corresponding to ISO country codes. Sequences of
more than two RI characters should be separated by other characters,
such as U+200B ZWSP.
(Thanks to @rintaro for the link.)
A Swift Character represents an extended grapheme cluster, so it is (according
to this reference) correct that any sequence of regional indicator symbols
is counted as a single character.
You can separate the "flags" by a ZERO WIDTH NON-JOINER:
let str1 = "🇩🇪\u{200C}🇩🇪"
print(str1.characters.count) // 2
or insert a ZERO WIDTH SPACE:
let str2 = "🇩🇪\u{200B}🇩🇪"
print(str2.characters.count) // 3
This also resolves possible ambiguities, e.g. should "🇫🇷🇺🇸"
be grouped as "🇫🇷 🇺🇸" or as "🇫 🇷🇺 🇸"?
See also How to know if two emojis will be displayed as one emoji? about a possible method
to count the number of "composed characters" in a Swift string,
which would return 5 for your let str1 = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪".
Here's how I solved that problem, for Swift 3:
let str = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪" // or whatever the string of emojis is
let range = str.startIndex..<str.endIndex
var length = 0
str.enumerateSubstrings(in: range, options: NSString.EnumerationOptions.byComposedCharacterSequences) { (substring, substringRange, enclosingRange, stop) -> () in
length = length + 1
}
print("Character Count: \(length)")
This fixes all the problems with character count and emojis, and is the simplest method I have found.
I am making an app that downloads a 32-bit integer from a server and uses the first 16 bits and the second 16 bits for different purposes...
I am responsible for the second 16 bits, which should be used to form an int. I know I should use bitwise operations to do this, but I haven't been able to make it work. Below is the code I am using; please give me more information.
// CID is a 32-bit integer; in NSLog it shows as 68913219 - it's different for every user
Byte lowByte = (CID>>16)&0xFF; //the 3rd byte
Byte highByte = (CID>>24)&0xFF; //the 4th byte
uint16_t value = lowByte & highByte; //combine, but the result is 0.
uint16_t value = lowByte & highByte; //combine, but the result is 0.
This is not how you combine two bytes into a single uint16_t: you are ANDing them in place, while you need to shift the high byte, and OR it with the low byte:
uint16_t value = lowByte | (((uint16_t)highByte) << 8);
However, this is suboptimal in terms of readability: if you start with a 32-bit integer, and you need to cut out the upper 16 bits, you could simply shift by 16, and mask with 0xFFFF - i.e. the same way that you cut out the third byte, but with a 16-bit mask.
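As a quick sanity check with the value from the question (sketched here in Swift; the same shift-and-mask works identically in C/Objective-C):

```swift
let cid: UInt32 = 68913219                      // 0x041B8843
// Direct approach: shift out the low 16 bits, mask the rest.
let upper = UInt16((cid >> 16) & 0xFFFF)        // 0x041B = 1051

// Byte-by-byte equivalent: shift the high byte and OR, never AND.
let lowByte  = UInt8((cid >> 16) & 0xFF)        // 0x1B
let highByte = UInt8((cid >> 24) & 0xFF)        // 0x04
let combined = UInt16(lowByte) | (UInt16(highByte) << 8)
```

Both give the same result, which is why the single shift-and-mask is preferable for readability.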
You shouldn't need to do this byte-by-byte. Try this:
//CID is a 32bit integer, in nslog it shows as 68913219 - its different for every user
uint16_t value = (CID>>16)&0xffff;
You can also use NSData.
If you have this integer:
int i = 1;
You can use :
NSData *data = [NSData dataWithBytes: &i length: sizeof(i)];
and then, to get the second 16bit do:
int16_t recover;
[data getBytes:&recover range:NSMakeRange(2, 2)];
Then you have your 16 bits in recover.
This way may not be as efficient as bit operations, but it is clearer.
:D
In TCPDF there is GetCellHeight() for measuring the height of a cell.
How do you measure the length or width of a string (or block of string) in TCPDF?
The method to get the string width is GetStringWidth(). Check also the getNumLines() and getStringHeight() methods.