How to express Strings in Swift using Unicode hexadecimal values (UTF-16) - ios

I want to write a Unicode string using hexadecimal values in Swift. I have read the documentation for String and Character so I know that I can use special Unicode characters directly in strings like the following:
var variableString = "Cat‼🐱" // "Cat" + Double Exclamation + cat emoji
But I would like to do it using the Unicode code points. The docs (and this question) show it for characters, but are not very clear about how to do it for strings.
(Note: Although the answer seems obvious to me now, it wasn't obvious at all just a short time ago. I am answering my own question below as a means of learning how to do this and also to help myself understand Unicode terminology and how Swift Characters and Strings work.)

Character
The Swift syntax for forming a hexadecimal code point is
\u{n}
where n is a hexadecimal number from one to eight digits long. The valid range for a Unicode scalar is U+0000 to U+D7FF and U+E000 to U+10FFFF inclusive. (The U+D800 to U+DFFF range is reserved for surrogate pairs, which are not scalars themselves but are used in UTF-16 to encode the higher-value scalars.)
Examples:
// The following forms are equivalent. They all produce "C".
let char1: Character = "\u{43}"
let char2: Character = "\u{0043}"
let char3: Character = "\u{00000043}"
// Higher value Unicode scalars are done similarly
let char4: Character = "\u{203C}" // ‼ (DOUBLE EXCLAMATION MARK character)
let char5: Character = "\u{1F431}" // 🐱 (cat emoji)
// Characters can be made up of multiple scalars
let char7: Character = "\u{65}\u{301}" // é = "e" + accent mark
let char8: Character = "\u{65}\u{301}\u{20DD}" // é⃝ = "e" + accent mark + circle
Notes:
Leading zeros can be added or omitted
Characters are known as extended grapheme clusters. Even when they are composed of multiple scalars, they are still considered a single character. What is key is that they appear to be a single character (grapheme) to the user.
TODO: How to convert surrogate pair to Unicode scalar in Swift
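As a possible starting point for that TODO (a minimal sketch, not part of the original answer), a UTF-16 surrogate pair can be combined into a single scalar with the standard formula scalar = 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00):
// Sketch only: assumes `high` and `low` really form a valid surrogate pair.
let high: UInt16 = 0xD83D // high surrogate of U+1F431
let low: UInt16 = 0xDC31  // low surrogate of U+1F431
let scalarValue = 0x10000 + (UInt32(high) - 0xD800) * 0x400 + (UInt32(low) - 0xDC00)
if let scalar = UnicodeScalar(scalarValue) {
    print(Character(scalar)) // 🐱
}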
String
Strings are composed of characters. See the following examples for some ways to form them using hexadecimal code points.
Examples:
var string1 = "\u{0043}\u{0061}\u{0074}\u{203C}\u{1F431}" // Cat‼🐱
// pass an array of characters to a String initializer
let catCharacters: [Character] = ["\u{0043}", "\u{0061}", "\u{0074}", "\u{203C}", "\u{1F431}"] // ["C", "a", "t", "‼", "🐱"]
let string2 = String(catCharacters) // Cat‼🐱
Converting Hex Values at Runtime
At runtime you can convert hexadecimal or Int values into a Character or String by first converting them to a UnicodeScalar.
Examples:
// hex values
let value0: UInt8 = 0x43 // 67
let value1: UInt16 = 0x203C // 8252
let value2: UInt32 = 0x1F431 // 128049
// convert hex to UnicodeScalar
let scalar0 = UnicodeScalar(value0)
// make sure that UInt16 and UInt32 form valid Unicode values
guard
    let scalar1 = UnicodeScalar(value1),
    let scalar2 = UnicodeScalar(value2) else {
        return
}
// convert to Character
let character0 = Character(scalar0) // C
let character1 = Character(scalar1) // ‼
let character2 = Character(scalar2) // 🐱
// convert to String
let string0 = String(scalar0) // C
let string1 = String(scalar1) // ‼
let string2 = String(scalar2) // 🐱
// convert hex array to String
let myHexArray = [0x43, 0x61, 0x74, 0x203C, 0x1F431] // an Int array
var myString = ""
for hexValue in myHexArray {
    if let scalar = UnicodeScalar(hexValue) {
        myString.append(Character(scalar))
    }
}
print(myString) // Cat‼🐱
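As an aside, a hedged alternative (not in the original answer) appends to the string's unicodeScalars view and makes the Int-to-UInt32 conversion explicit:
var myString2 = ""
for hexValue in myHexArray {
    if let scalar = UnicodeScalar(UInt32(hexValue)) {
        myString2.unicodeScalars.append(scalar)
    }
}
print(myString2) // Cat‼🐱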
Further reading
Strings and Characters docs
Glossary of Unicode Terms
Strings in Swift
Working with Unicode code points in Swift

From your hex value "0x1F52D" to an actual emoji:
let c = 0x1F602
The next step is getting a UInt32 scalar value from your hex value:
let intEmoji = UnicodeScalar(UInt32(c))!.value
From this you can do something like:
titleLabel.text = String(UnicodeScalar(intEmoji)!)
Here you have a "😂".
It works with ranges of hexadecimal values too:
let emojiRanges = [
    0x1F600...0x1F636,
    0x1F645...0x1F64F,
    0x1F910...0x1F91F,
    0x1F30D...0x1F52D
]
var data = [UInt32]()
for range in emojiRanges {
    for i in range {
        let c = UnicodeScalar(UInt32(i))!.value
        data.append(c)
    }
}
to get multiple UInt32 values from your hex ranges, for example.
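As a follow-up (a hedged sketch, not part of the original answer), the collected UInt32 values in `data` can be turned back into a single String of emoji with the same UnicodeScalarView idiom used further below:
// Assumes `data` is the [UInt32] array filled in the loop above.
let emojis = String(String.UnicodeScalarView(data.compactMap(Unicode.Scalar.init)))
print(emojis) // 😀😁😂... and the rest of the selected ranges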

Related

How to split unicode string into characters

I have strings like
"\U0aac\U0ab9\U0ac1\U0ab5\U0a9a\U0aa8",
"\U0a97\U0ac1\U0ab8\U0acd\U0ab8\U0acb",
"\U0aa6\U0abe\U0ab5\U0acb",
"\U0a96\U0a82\U0aa1"
But I want to split these strings by Unicode character.
I don't know how to do it. I know the components(separatedBy:) function, but it's no use here.
Any help would be appreciated.
If the strings you're getting really contain \U characters, you need to parse them manually and extract the unicode scalar values. Something like this:
let strings = [
"\\U0aac\\U0ab9\\U0ac1\\U0ab5\\U0a9a\\U0aa8",
"\\U0a97\\U0ac1\\U0ab8\\U0acd\\U0ab8\\U0acb",
"\\U0aa6\\U0abe\\U0ab5\\U0acb",
"\\U0a96\\U0a82\\U0aa1"
]
for str in strings {
    let chars = str.components(separatedBy: "\\U")
    var string = ""
    for ch in chars {
        if let val = Int(ch, radix: 16), let uni = Unicode.Scalar(val) {
            string.unicodeScalars.append(uni)
        }
    }
    print(string)
}
You can map your array, split its elements at non-hex-digit characters, compact-map them into UInt32 values, initialize Unicode scalars with them, map the resulting elements of your array into a UnicodeScalarView, and initialize a new string with it:
let arr = [
    #"\U0aac\U0ab9\U0ac1\U0ab5\U0a9a\U0aa8"#,
    #"\U0a97\U0ac1\U0ab8\U0acd\U0ab8\U0acb"#,
    #"\U0aa6\U0abe\U0ab5\U0acb"#,
    #"\U0a96\U0a82\U0aa1"#]
let strings = arr.map {
    $0.split { !$0.isHexDigit }
        .compactMap { UInt32($0, radix: 16) }
        .compactMap(Unicode.Scalar.init)
}.map { String(String.UnicodeScalarView($0)) }
print(strings)
This will print
["બહુવચન", "ગુસ્સો", "દાવો", "ખંડ"]
So, the strings that come back already contain the "\"; in order to use components(separatedBy:) you need an additional escaping "\", so that you can do the following:
var listofCodes = ["\\U0aac\\U0ab9\\U0ac1\\U0ab5\\U0a9a\\U0aa8", "\\U0aac\\U0ab9\\U0ac1\\U0ab5\\U0a9a\\U0aa8"]
var unicodeArray: [String] = []
listofCodes.forEach { string in
    unicodeArray.append(contentsOf: string.components(separatedBy: "\\"))
    unicodeArray.removeAll(where: { value in value == "" })
}
print(unicodeArray)
I will revise this answer once you specify how you are obtaining these strings; as is, I get an invalid-string error from the start.
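As a hedged continuation of the snippet above (not part of the original answer), each "U…" component can then be converted to a scalar by dropping the leading "U" and parsing the rest as hex:
// Assumes `unicodeArray` holds components like "U0aac" from the code above.
var decoded = ""
for code in unicodeArray {
    if let value = UInt32(code.dropFirst(), radix: 16),
       let scalar = Unicode.Scalar(value) {
        decoded.unicodeScalars.append(scalar)
    }
}
print(decoded) // બહુવચનબહુવચન (both input strings, concatenated)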

How to convert sequence of ASCII code into string in swift 4?

I have a sequence of ASCII codes in string format, like (7297112112121326610511411610410097121). How can I convert this into text?
I tried the code below:
func convertAscii(asciiStr: String) {
    var asciiString = ""
    for asciiChar in asciiStr {
        if let number = UInt8(asciiChar, radix: 2) { // Cannot invoke initializer for type 'UInt8' with an argument list of type '(Character, radix: Int)'
            print(number)
            let character = String(describing: UnicodeScalar(number))
            asciiString.append(character)
        }
    }
}
convertAscii(asciiStr: "7297112112121326610511411610410097121")
But I am getting an error on the if let number line.
As already mentioned, decimal ASCII values are in the range 0-255 and can have more than 2 digits.
Based on Sulthan's answer, and assuming there are no characters < 32 (0x20) or > 199 (0xC7) in the text, this approach checks the first character of the cropped string: if it's "1", the character is represented by 3 digits, otherwise 2.
func convertAscii(asciiStr: String) {
    var source = asciiStr
    var result = ""
    while source.count >= 2 {
        let digitsPerCharacter = source.hasPrefix("1") ? 3 : 2
        let charBytes = source.prefix(digitsPerCharacter)
        source = String(source.dropFirst(digitsPerCharacter))
        let number = Int(charBytes)!
        let character = UnicodeScalar(number)!
        result += String(character)
    }
    print(result) // "Happy Birthday"
}
convertAscii(asciiStr: "7297112112121326610511411610410097121")
If we consider the string to be composed of characters where every character is represented by 2 decimal digits, then something like this would work (this is just an example, not optimal).
func convertAscii(asciiStr: String) {
    var source = asciiStr
    var characters: [String] = []
    let digitsPerCharacter = 2
    while source.count >= digitsPerCharacter {
        let charBytes = source.prefix(digitsPerCharacter)
        source = String(source.dropFirst(digitsPerCharacter))
        let number = Int(charBytes, radix: 10)!
        let character = UnicodeScalar(number)!
        characters.append(String(character))
    }
    let result: String = characters.joined()
    print(result)
}
convertAscii(asciiStr: "7297112112121326610511411610410097121")
However, the format itself is ambiguous because ASCII characters can take from 1 to 3 decimal digits; therefore, to parse correctly, you need all characters to have the same length (e.g. 1 should be 001).
Note that I always take the same number of digits, convert them to a number, and then create a character from that number.
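Along the lines of that note, here is a minimal sketch (my own assumption, not from the original answer) for the unambiguous case where every character is zero-padded to exactly 3 decimal digits; convertPaddedAscii is a hypothetical helper name:
// Hypothetical helper: assumes a fixed width of 3 decimal digits per character.
func convertPaddedAscii(asciiStr: String) -> String {
    var source = Substring(asciiStr)
    var result = ""
    while source.count >= 3 {
        let chunk = source.prefix(3)
        source = source.dropFirst(3)
        if let number = UInt32(chunk), let scalar = UnicodeScalar(number) {
            result.append(Character(scalar))
        }
    }
    return result
}
print(convertPaddedAscii(asciiStr: "072097112112121")) // Happy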

How to handle the %s format specifier

Objective-C code:
NSString *str = @"hi";
NSString *strDigit = @"1934"; // (or @"193"; it may be a 3- or 4-digit value)
[dayText appendFormat:@"%@%4s", str, [strDigit UTF8String]];
The Objective-C code aligns the output correctly whether the value has 3 or 4 digits; it aligns to the left no matter how many digits there are. Does anyone know how to handle this in Swift?
In Swift I tried the code below, but the string does not adjust its alignment according to the number of digits.
textForTrip += "\(str) \(String(format:"%4s", (strDigit.utf8))"
The %s format expects a pointer to a (NULL-terminated) C string as its argument, which can be obtained with the withCString method. This would produce the same output as your Objective-C code:
let str = "Hi"
let strDigit = "193"
let text = strDigit.withCString {
    String(format: "%@%4s", str, $0)
}
print(text)
It becomes easier if you store the number as integer instead of a
string:
let str = "Hi"
let number = 934
let text = String(format: "%@%4d", str, number)
print(text)
Try the approach below; it might help you:
let strDigit = "\("1934".utf8)" // (or "193"; it may be a 3- or 4-digit value)
var dayText = "Hello, good morning."
dayText += "\(strDigit.prefix(3))"

How to convert from unichar to character

Is there a better way to convert from unichar to Character?
I tried:
var unichar = ...
let str = NSString(characters: &unichar, length: 1) as String
let character = Array(str.characters)[0]
You can convert unichar -> UnicodeScalar -> Character:
let c = unichar(8364)
if let uc = UnicodeScalar(c) {
    let char = Character(uc)
    print(char) // €
} else {
    print("illegal input")
}
Input values in the range 0xD800...0xDFFF
(high and low surrogates) are not allowed because they do not
correspond to valid Unicode scalar values.
If it is guaranteed that those input values do not occur then you
can simplify the conversion to
let char = Character(UnicodeScalar(c)!)
To replace a possible invalid input value by a default character
(e.g. a question mark), use
let char = Character(UnicodeScalar(c) ?? "?")
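If the input may actually contain surrogate pairs, one hedged alternative (not from the original answer) is to let UTF-16 decoding handle them via String(utf16CodeUnits:count:):
import Foundation
// Assumption: the two unichar values form a valid UTF-16 surrogate pair.
let units: [unichar] = [0xD83D, 0xDC31] // surrogate pair for U+1F431
let decodedString = String(utf16CodeUnits: units, count: units.count)
print(decodedString) // 🐱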
public class func stringForIcon(_ icon: NSInteger) -> Character {
    return Character(UnicodeScalar(icon)!)
}

String with Unicode (variable) [duplicate]

I have a problem I couldn't find a solution to.
I have a string variable holding the unicode "1f44d" and I want to convert it to a unicode character 👍.
Usually one would do something like this:
println("\u{1f44d}") // 👍
Here is what I mean:
let charAsString = "1f44d" // code in variable
println("\u{\(charAsString)}") // not working
I have tried several other ways but somehow the workings behind this magic stay hidden for me.
One should imagine the value of charAsString coming from an API call or from another object.
One possible solution (explanations "inline"):
let charAsString = "1f44d"
// Convert hex string to numeric value first:
var charCode : UInt32 = 0
let scanner = NSScanner(string: charAsString)
if scanner.scanHexInt(&charCode) {
    // Create string from Unicode code point:
    let str = String(UnicodeScalar(charCode))
    println(str) // 👍
} else {
    println("invalid input")
}
Slightly simpler with Swift 2:
let charAsString = "1f44d"
// Convert hex string to numeric value first:
if let charCode = UInt32(charAsString, radix: 16) {
    // Create string from Unicode code point:
    let str = String(UnicodeScalar(charCode))
    print(str) // 👍
} else {
    print("invalid input")
}
Note also that not all code points are valid Unicode scalars,
compare Validate Unicode code point in Swift.
Update for Swift 3:
public init?(_ v: UInt32)
is now a failable initializer of UnicodeScalar and checks if the
given numeric input is a valid Unicode scalar value:
let charAsString = "1f44d"
// Convert hex string to numeric value first:
if let charCode = UInt32(charAsString, radix: 16),
   let unicode = UnicodeScalar(charCode) {
    // Create string from Unicode code point:
    let str = String(unicode)
    print(str) // 👍
} else {
    print("invalid input")
}
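As a quick hedged check (not in the original answer), the failable initializer above rejects surrogate code points and out-of-range values:
print(UnicodeScalar(0xD800 as UInt32) == nil)   // true, surrogate code point
print(UnicodeScalar(0x110000 as UInt32) == nil) // true, beyond U+10FFFF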
This can be done in two steps:
1. convert charAsString to an Int code
2. convert the code to a Unicode character
The second step can be done e.g. like this:
var code = 0x1f44d
var scalar = UnicodeScalar(code)
var string = "\(scalar)"
As for the first step, see here: how to convert a String in hex representation to an Int.
As of Swift 2.0, every Int type has an initializer that can take a String as input. You can then easily generate the corresponding UnicodeScalar and print it afterwards, without having to change your representation of chars as strings ;).
UPDATED: Swift 3.0 changed the UnicodeScalar initializer.
print("\u{1f44d}") // 👍
let charAsString = "1f44d" // code in variable
let charAsInt = Int(charAsString, radix: 16)! // As indicated by @MartinR, the radix is required; the default won't do it
let uScalar = UnicodeScalar(charAsInt)! // In Swift 3.0 this initializer is failable, so you'll need either force unwrapping or optional binding
print("\(uScalar)")
You can use:
let char = "-12"
print(char.unicodeScalars.map { $0.value })
You'll get the values as:
[45, 49, 50]
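Going the other way (a hedged sketch, not in the original answer), those scalar values can be mapped back into a String with the UnicodeScalarView idiom used earlier:
let values: [UInt32] = [45, 49, 50]
let rebuilt = String(String.UnicodeScalarView(values.compactMap(Unicode.Scalar.init)))
print(rebuilt) // -12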
Here are a couple ways to do it:
let string = "1f44d"
Solution 1:
"&#x\(string);".applyingTransform(.toXMLHex, reverse: true)
Solution 2:
"U+\(string)".applyingTransform(StringTransform("Hex/Unicode"), reverse: true)
I made this extension that works pretty well:
extension String {
    var unicode: String? {
        if let charCode = UInt32(self, radix: 16),
           let unicode = UnicodeScalar(charCode) {
            let str = String(unicode)
            return str
        }
        return nil
    }
}
How to test it:
if let test = "e9c8".unicode {
    print(test)
}
//print:
You cannot use string interpolation in Swift as you try to use it. Therefore, the following code won't compile:
let charAsString = "1f44d"
print("\u{\(charAsString)}")
You will have to convert your string variable into an integer (using the init(_:radix:) initializer) and then create a Unicode scalar from that integer. The Swift 5 Playground sample code below shows how to proceed:
let validCodeString = "1f44d"
let validUnicodeScalarValue = Int(validCodeString, radix: 16)!
let validUnicodeScalar = Unicode.Scalar(validUnicodeScalarValue)!
print(validUnicodeScalar) // 👍
