How to split a Unicode string into characters - iOS

I have strings like
"\U0aac\U0ab9\U0ac1\U0ab5\U0a9a\U0aa8",
"\U0a97\U0ac1\U0ab8\U0acd\U0ab8\U0acb",
"\U0aa6\U0abe\U0ab5\U0acb",
"\U0a96\U0a82\U0aa1"
But I want to split these strings by Unicode character.
I don't know how to do it. I know the components(separatedBy:) function, but it's no use here.
Any help would be appreciated.

If the strings you're getting really contain literal \U sequences, you need to parse them manually and extract the Unicode scalar values. Something like this:
import Foundation // needed for components(separatedBy:)

let strings = [
    "\\U0aac\\U0ab9\\U0ac1\\U0ab5\\U0a9a\\U0aa8",
    "\\U0a97\\U0ac1\\U0ab8\\U0acd\\U0ab8\\U0acb",
    "\\U0aa6\\U0abe\\U0ab5\\U0acb",
    "\\U0a96\\U0a82\\U0aa1"
]
for str in strings {
    // Each component is the hex code that followed a "\U"
    let chars = str.components(separatedBy: "\\U")
    var string = ""
    for ch in chars {
        if let val = Int(ch, radix: 16), let uni = Unicode.Scalar(val) {
            string.unicodeScalars.append(uni)
        }
    }
    print(string)
}

You can map your array: split each element at the non-hex-digit characters, compact-map the pieces into UInt32 values, initialize Unicode scalars from them, collect the scalars into a UnicodeScalarView, and initialize a new string from it:
let arr = [
    #"\U0aac\U0ab9\U0ac1\U0ab5\U0a9a\U0aa8"#,
    #"\U0a97\U0ac1\U0ab8\U0acd\U0ab8\U0acb"#,
    #"\U0aa6\U0abe\U0ab5\U0acb"#,
    #"\U0a96\U0a82\U0aa1"#
]
let strings = arr.map {
    $0.split { !$0.isHexDigit }
        .compactMap { UInt32($0, radix: 16) }
        .compactMap(Unicode.Scalar.init)
}.map { String(String.UnicodeScalarView($0)) }
print(strings)
This will print
["બહુવચન", "ગુસ્સો", "દાવો", "ખંડ"]
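The answers above recover the readable text. To then split it, as the question asks, one option is to iterate over the decoded string. The following is a minimal sketch, assuming Swift 5 and input that holds literal \U escapes; decodeEscapes is a hypothetical helper name, not a library function. It contrasts splitting by Unicode scalar with splitting by Character (grapheme cluster), where a combining sign stays attached to its base letter:

```swift
// Hypothetical helper: turns literal "\Uxxxx" escapes into the text they encode.
func decodeEscapes(_ raw: String) -> String {
    var text = ""
    // Splitting at every non-hex character leaves only the hex runs, e.g. "0a96".
    for piece in raw.split(whereSeparator: { !$0.isHexDigit }) {
        if let value = UInt32(piece, radix: 16), let scalar = Unicode.Scalar(value) {
            text.unicodeScalars.append(scalar)
        }
    }
    return text
}

let decoded = decodeEscapes(#"\U0a96\U0a82\U0aa1"#)

// One entry per Unicode scalar (code point).
let scalarPieces = decoded.unicodeScalars.map { String($0) }
// One entry per Character; the combining sign U+0A82 stays with the preceding U+0A96.
let characterPieces = decoded.map { String($0) }

print(scalarPieces.count, characterPieces.count)
```

Which of the two views is the right "character" depends on the use case: scalars mirror the escape codes one to one, while Characters match what a reader perceives as a single unit.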

The string that comes back already contains a literal "\". To reproduce it in a Swift string literal you need an additional escaping "\", after which you can use components(separatedBy:):
let listofCodes = ["\\U0aac\\U0ab9\\U0ac1\\U0ab5\\U0a9a\\U0aa8", "\\U0aac\\U0ab9\\U0ac1\\U0ab5\\U0a9a\\U0aa8"]
var unicodeArray: [String] = []
listofCodes.forEach { string in
    unicodeArray.append(contentsOf: string.components(separatedBy: "\\"))
    unicodeArray.removeAll(where: { value in value == "" })
}
print(unicodeArray)
I will revise this answer once you specify how you are obtaining these strings; as written in the question, I get an invalid-string error from the start.

Related

How to convert sequence of ASCII code into string in swift 4?

I have a sequence of ASCII codes in string format, like (7297112112121326610511411610410097121). How can I convert this into text?
I tried the code below:
func convertAscii(asciiStr: String) {
var asciiString = ""
for asciiChar in asciiStr {
if let number = UInt8(asciiChar, radix: 2) { // Cannot invoke initializer for type 'UInt8' with an argument list of type '(Character, radix: Int)'
print(number)
let character = String(describing: UnicodeScalar(number))
asciiString.append(character)
}
}
}
convertAscii(asciiStr: "7297112112121326610511411610410097121")
But I get an error on the if let number line.
As already mentioned, decimal ASCII values are in the range 0-127 (0-255 for 8-bit extensions) and can have more than 2 digits.
Based on Sulthan's answer, and assuming there are no characters < 32 (0x20) and > 199 (0xc7) in the text, this approach checks the first digit of the remaining string. If it's "1", the character is represented by 3 digits, otherwise by 2.
func convertAscii(asciiStr: String) {
    var source = asciiStr
    var result = ""
    while source.count >= 2 {
        let digitsPerCharacter = source.hasPrefix("1") ? 3 : 2
        let charBytes = source.prefix(digitsPerCharacter)
        source = String(source.dropFirst(digitsPerCharacter))
        let number = Int(charBytes)!
        let character = UnicodeScalar(number)!
        result += String(character)
    }
    print(result) // "Happy Birthday"
}
convertAscii(asciiStr: "7297112112121326610511411610410097121")
If we consider the string to be composed of characters where every character is represented by 2 decimal digits, then something like this would work (this is just an example, not optimal).
func convertAscii(asciiStr: String) {
    var source = asciiStr
    var characters: [String] = []
    let digitsPerCharacter = 2
    while source.count >= digitsPerCharacter {
        let charBytes = source.prefix(digitsPerCharacter)
        source = String(source.dropFirst(digitsPerCharacter))
        let number = Int(charBytes, radix: 10)!
        let character = UnicodeScalar(number)!
        characters.append(String(character))
    }
    let result: String = characters.joined()
    print(result)
}
convertAscii(asciiStr: "7297112112121326610511411610410097121")
However, the format itself is ambiguous because ASCII characters can take from 1 to 3 decimal digits. Therefore, to parse correctly, you need all characters to have the same length (e.g. 1 should be 001).
Note that I always take the same number of digits, convert them to a number, and then create a character from that number.
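To illustrate the fixed-width idea, here is a minimal sketch in which every character is written as exactly three decimal digits, so that decoding is unambiguous. It assumes ASCII-only input and Swift 5; encodeFixedWidth and decodeFixedWidth are hypothetical helper names introduced only for this example.

```swift
// Hypothetical helper: writes each byte as exactly three decimal digits ("H" -> "072").
// Assumes the text is plain ASCII.
func encodeFixedWidth(_ text: String) -> String {
    return text.utf8.map { byte -> String in
        let digits = String(byte)
        return String(repeating: "0", count: 3 - digits.count) + digits
    }.joined()
}

// Hypothetical helper: reads the digits back three at a time.
// Returns nil if the length is not a multiple of 3 or a group is not a valid byte.
func decodeFixedWidth(_ digits: String) -> String? {
    guard digits.count % 3 == 0 else { return nil }
    var result = ""
    var rest = Substring(digits)
    while !rest.isEmpty {
        guard let value = UInt8(rest.prefix(3)) else { return nil }
        result.unicodeScalars.append(Unicode.Scalar(value))
        rest = rest.dropFirst(3)
    }
    return result
}

let encoded = encodeFixedWidth("Happy Birthday")
print(encoded)
print(decodeFixedWidth(encoded) ?? "invalid input")
```

The padded form is longer than the original digit string, but it removes the need to guess where one code ends and the next begins.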

How to capitalize each alternate character of a string?

Let's say there is a string "johngoestoschool"; it should become "JoHnGoEsToScHoOl". If there are special characters in between, they should be ignored: for example, given the string "jo$%##hn^goe!st#os&choo)l", the answer should be "Jo$%##Hn^GoE!sT#oS&cHoO)l".
From this answer, in order to iterate we can do:
let s = "alpha"
for i in s.characters.indices[s.startIndex..<s.endIndex]
{
print(s[i])
}
Why can't we print the value of "i" here?
When we do i.customPlaygroundQuickLook, it shows Int 0 to Int 4.
So my idea is to do something like:
if (i.customPlaygroundQuickLook == 3) {
s.characters.currentindex = capitalized
}
Kindly help
This should solve your problem. The hard part is just checking whether a character is a letter or not. Using inout and replacing ranges in place would give better performance:
func altCaptalized(string: String) -> String {
    var result = ""
    var numOfLetters = 0
    // Iterate over Characters so that each letter is handled as one unit
    for ch in string {
        if ch.isLetter {
            // Uppercase every other letter, starting with the first one
            result += numOfLetters % 2 == 0 ? ch.uppercased() : String(ch)
            numOfLetters += 1
        } else {
            // Non-letters are copied unchanged and do not count
            result.append(ch)
        }
    }
    return result
}

String with Unicode (variable) [duplicate]

I have a problem I couldn't find a solution to.
I have a string variable holding the unicode "1f44d" and I want to convert it to a unicode character 👍.
Usually one would do something like this:
println("\u{1f44d}") // 👍
Here is what I mean:
let charAsString = "1f44d" // code in variable
println("\u{\(charAsString)}") // not working
I have tried several other ways but somehow the workings behind this magic stay hidden for me.
One should imagine the value of charAsString coming from an API call or from another object.
One possible solution (explanations "inline"):
let charAsString = "1f44d"
// Convert hex string to numeric value first:
var charCode : UInt32 = 0
let scanner = NSScanner(string: charAsString)
if scanner.scanHexInt(&charCode) {
// Create string from Unicode code point:
let str = String(UnicodeScalar(charCode))
println(str) // 👍
} else {
println("invalid input")
}
Slightly simpler with Swift 2:
let charAsString = "1f44d"
// Convert hex string to numeric value first:
if let charCode = UInt32(charAsString, radix: 16) {
// Create string from Unicode code point:
let str = String(UnicodeScalar(charCode))
print(str) // 👍
} else {
print("invalid input")
}
Note also that not all code points are valid Unicode scalars,
compare Validate Unicode code point in Swift.
Update for Swift 3:
public init?(_ v: UInt32)
is now a failable initializer of UnicodeScalar and checks if the
given numeric input is a valid Unicode scalar value:
let charAsString = "1f44d"
// Convert hex string to numeric value first:
if let charCode = UInt32(charAsString, radix: 16),
let unicode = UnicodeScalar(charCode) {
// Create string from Unicode code point:
let str = String(unicode)
print(str) // 👍
} else {
print("invalid input")
}
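To see which inputs the failable initializer rejects, here is a small sketch. It assumes Swift 5; scalarString(fromHex:) is a hypothetical helper name used only for illustration.

```swift
// Hypothetical helper: returns nil instead of crashing on bad input.
func scalarString(fromHex hex: String) -> String? {
    guard let code = UInt32(hex, radix: 16),   // nil if `hex` is not a hexadecimal number
          let scalar = Unicode.Scalar(code)    // nil for surrogates and values above U+10FFFF
    else { return nil }
    return String(scalar)
}

print(scalarString(fromHex: "1f44d") ?? "invalid input")   // a valid scalar
print(scalarString(fromHex: "d800") ?? "invalid input")    // a surrogate code point
print(scalarString(fromHex: "110000") ?? "invalid input")  // beyond the Unicode range
print(scalarString(fromHex: "xyz") ?? "invalid input")     // not hexadecimal at all
```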
This can be done in two steps:
Convert charAsString to an Int code.
Convert the code to a Unicode character.
The second step can be done e.g. like this (in Swift 3 and later the initializer is failable, hence the if let):
let code = 0x1f44d
if let scalar = UnicodeScalar(code) {
    let string = "\(scalar)"
    print(string)
}
As for the first step, see here how to convert a String in hex representation to an Int.
As of Swift 2.0, every integer type has an initializer that takes a String as input. You can then easily generate the corresponding UnicodeScalar and print it afterwards, without having to change your representation of characters as strings ;).
UPDATED: Swift 3.0 changed UnicodeScalar initializer
print("\u{1f44d}") // 👍
let charAsString = "1f44d" // code in variable
let charAsInt = Int(charAsString, radix: 16)! // As indicated by @MartinR, radix is required; the default won't do it
let uScalar = UnicodeScalar(charAsInt)! // In Swift 3.0 this initializer is failable, so you need either a force unwrap or optional binding
print("\(uScalar)")
You can use
let char = "-12"
print(char.unicodeScalars.map { $0.value })
You'll get the values as:
[45, 49, 50]
Here are a couple ways to do it:
let string = "1f44d"
Solution 1:
"&#x\(string);".applyingTransform(.toXMLHex, reverse: true)
Solution 2:
"U+\(string)".applyingTransform(StringTransform("Hex/Unicode"), reverse: true)
I made this extension that works pretty well:
extension String {
var unicode: String? {
if let charCode = UInt32(self, radix: 16),
let unicode = UnicodeScalar(charCode) {
let str = String(unicode)
return str
}
return nil
}
}
How to test it:
if let test = "e9c8".unicode {
print(test)
}
//print:
You cannot use string interpolation inside a \u{...} escape the way you are trying to: the escape must contain literal hexadecimal digits. Therefore, the following code won't compile:
let charAsString = "1f44d"
print("\u{\(charAsString)}")
You will have to convert your string variable into an integer (using init(_:radix:) initializer) then create a Unicode scalar from this integer. The Swift 5 Playground sample code below shows how to proceed:
let validCodeString = "1f44d"
let validUnicodeScalarValue = Int(validCodeString, radix: 16)!
let validUnicodeScalar = Unicode.Scalar(validUnicodeScalarValue)!
print(validUnicodeScalar) // 👍

How to express Strings in Swift using Unicode hexadecimal values (UTF-16)

I want to write a Unicode string using hexadecimal values in Swift. I have read the documentation for String and Character so I know that I can use special Unicode characters directly in strings like the following:
var variableString = "Cat‼🐱" // "Cat" + Double Exclamation + cat emoji
But I would like to do it using the Unicode code points. The docs (and this question) show it for characters, but are not very clear about how to do it for strings.
(Note: Although the answer seems obvious to me now, it wasn't obvious at all just a short time ago. I am answering my own question below as a means of learning how to do this and also to help myself understand Unicode terminology and how Swift Characters and Strings work.)
Character
The Swift syntax for forming a hexadecimal code point is
\u{n}
where n is a hexadecimal number up to 8 digits long. The valid range for a Unicode scalar is U+0 to U+D7FF and U+E000 to U+10FFFF inclusive. (The U+D800 to U+DFFF range is for surrogate pairs, which are not scalars themselves, but are used in UTF-16 for encoding the higher value scalars.)
Examples:
// The following forms are equivalent. They all produce "C".
let char1: Character = "\u{43}"
let char2: Character = "\u{0043}"
let char3: Character = "\u{00000043}"
// Higher value Unicode scalars are done similarly
let char4: Character = "\u{203C}" // ‼ (DOUBLE EXCLAMATION MARK character)
let char5: Character = "\u{1F431}" // 🐱 (cat emoji)
// Characters can be made up of multiple scalars
let char7: Character = "\u{65}\u{301}" // é = "e" + accent mark
let char8: Character = "\u{65}\u{301}\u{20DD}" // é⃝ = "e" + accent mark + circle
Notes:
Leading zeros can be added or omitted
Characters are known as extended grapheme clusters. Even when they are composed of multiple scalars, they are still considered a single character. What is key is that they appear to be a single character (grapheme) to the user.
TODO: How to convert surrogate pair to Unicode scalar in Swift
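For the surrogate pair conversion mentioned above, a minimal sketch follows. It assumes Swift 5; scalarFromSurrogates is a hypothetical helper name. It applies the standard UTF-16 arithmetic by hand and, for comparison, lets the standard library decode the same two code units.

```swift
// Hypothetical helper: combines a UTF-16 high/low surrogate pair into one scalar.
// Returns nil if either value is outside its surrogate range.
func scalarFromSurrogates(high: UInt16, low: UInt16) -> Unicode.Scalar? {
    guard (0xD800...0xDBFF).contains(high), (0xDC00...0xDFFF).contains(low) else {
        return nil
    }
    // Each surrogate carries 10 bits; the result is offset by 0x10000.
    let value = 0x10000 + ((UInt32(high) - 0xD800) << 10) + (UInt32(low) - 0xDC00)
    return Unicode.Scalar(value)
}

// 0xD83D 0xDC31 is the UTF-16 encoding of U+1F431 (the cat emoji used above).
let catScalar = scalarFromSurrogates(high: 0xD83D, low: 0xDC31)

// The standard library can decode UTF-16 code units directly as well.
let catString = String(decoding: [0xD83D, 0xDC31] as [UInt16], as: UTF16.self)

print(catScalar.map { String($0) } ?? "invalid pair", catString)
```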
String
Strings are composed of characters. See the following examples for some ways to form them using hexadecimal code points.
Examples:
var string1 = "\u{0043}\u{0061}\u{0074}\u{203C}\u{1F431}" // Cat‼🐱
// pass an array of characters to a String initializer
let catCharacters: [Character] = ["\u{0043}", "\u{0061}", "\u{0074}", "\u{203C}", "\u{1F431}"] // ["C", "a", "t", "‼", "🐱"]
let string2 = String(catCharacters) // Cat‼🐱
Converting Hex Values at Runtime
At runtime you can convert hexadecimal or Int values into a Character or String by first converting it to a UnicodeScalar.
Examples:
// hex values
let value0: UInt8 = 0x43 // 67
let value1: UInt16 = 0x203C // 8252
let value2: UInt32 = 0x1F431 // 128049
// convert hex to UnicodeScalar
let scalar0 = UnicodeScalar(value0)
// make sure that UInt16 and UInt32 form valid Unicode values
guard
let scalar1 = UnicodeScalar(value1),
let scalar2 = UnicodeScalar(value2) else {
return
}
// convert to Character
let character0 = Character(scalar0) // C
let character1 = Character(scalar1) // ‼
let character2 = Character(scalar2) // 🐱
// convert to String
let string0 = String(scalar0) // C
let string1 = String(scalar1) // ‼
let string2 = String(scalar2) // 🐱
// convert hex array to String
let myHexArray = [0x43, 0x61, 0x74, 0x203C, 0x1F431] // an Int array
var myString = ""
for hexValue in myHexArray {
if let scalar = UnicodeScalar(hexValue) {
myString.append(Character(scalar))
}
}
print(myString) // Cat‼🐱
Further reading
Strings and Characters docs
Glossary of Unicode Terms
Strings in Swift
Working with Unicode code points in Swift
To go from a hex value (0x1F602 in this example) to the actual emoji:
let c = 0x1F602
The next step would be getting a UInt32 from your hex value:
let intEmoji = UnicodeScalar(c)!.value
From this you can do something like:
titleLabel.text = String(UnicodeScalar(intEmoji)!)
Here you have a "😂".
It works with ranges of hexadecimal values too:
var data = [UInt32]()
let emojiRanges = [
    0x1F600...0x1F636,
    0x1F645...0x1F64F,
    0x1F910...0x1F91F,
    0x1F30D...0x1F52D
]
for range in emojiRanges {
    for i in range {
        let c = UnicodeScalar(i)!.value
        data.append(c)
    }
}
This gives you multiple UInt32 values from your hex ranges, for example.

Swift: Split String into sentences

I'm wondering how I can split a string containing several sentences into an array of the sentences.
I know about the split function, but splitting by "." doesn't suit all cases.
Is there something like what is mentioned in this answer?
You can use NSLinguisticTagger to identify SentenceTerminator tokens and then split into an array of strings from there.
I used this code and it worked great.
https://stackoverflow.com/a/57985302/10736184
let text = "My paragraph with weird punctuation like Nov. 17th."
var r = [Range<String.Index>]()
let t = text.linguisticTags(
in: text.startIndex..<text.endIndex,
scheme: NSLinguisticTagScheme.lexicalClass.rawValue,
tokenRanges: &r)
var result = [String]()
let ixs = t.enumerated().filter {
$0.1 == "SentenceTerminator"
}.map {r[$0.0].lowerBound}
var prev = text.startIndex
for ix in ixs {
let r = prev...ix
result.append(
text[r].trimmingCharacters(
in: NSCharacterSet.whitespaces))
prev = text.index(after: ix)
}
Where result will now be an array of sentence strings. Note that the sentence will have to be terminated with '?', '!', '.', etc to count. If you want to split on newlines as well, or other Lexical Classes, you can add
|| $0.1 == "ParagraphBreak"
after
$0.1 == "SentenceTerminator"
to do that.
If you can use Apple's Foundation, the solution can be quite straightforward.
import Foundation
var text = """
Let's split some text into sentences.
The text might include dates like Jan.13, 2020, words like S.A.E and numbers like 2.2 or $9,999.99 as well as emojis like 👨‍👩‍👧‍👦! How do I split this?
"""
var sentences: [String] = []
text.enumerateSubstrings(in: text.startIndex..., options: [.localized, .bySentences]) { (tag, _, _, _) in
sentences.append(tag ?? "")
}
There are ways to do it in pure Swift, of course. Here is a quick and dirty split:
let simpleText = """
This is a very simple text.
It doesn't include dates, abbreviations, and numbers, but it includes emojis like 👨‍👩‍👧‍👦! How do I split this?
"""
let sentencesPureSwift = simpleText.split(omittingEmptySubsequences:true) { $0.isPunctuation && !Set("',").contains($0)}
It could be refined with reduce().
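One possible refinement is sketched below. It uses a plain loop rather than reduce(), keeps the terminating punctuation, and splits only when ".", "!" or "?" is followed by whitespace or the end of the text, so numbers like 2.2 survive. splitSentences is a hypothetical helper name, and the rule is a heuristic under these assumptions: abbreviations such as "Nov. 17th" would still be split.

```swift
// Hypothetical helper: splits after ".", "!" or "?" only when whitespace
// or the end of the text follows.
func splitSentences(_ text: String) -> [String] {
    let chars = Array(text)
    var sentences: [String] = []
    var current = ""
    for (i, ch) in chars.enumerated() {
        // Skip whitespace between sentences.
        if current.isEmpty && ch.isWhitespace { continue }
        current.append(ch)
        let isTerminator = ch == "." || ch == "!" || ch == "?"
        let boundaryFollows = i == chars.count - 1 || chars[i + 1].isWhitespace
        if isTerminator && boundaryFollows {
            sentences.append(current)
            current = ""
        }
    }
    // Keep trailing text that has no terminator, minus trailing whitespace.
    while current.last?.isWhitespace == true { current.removeLast() }
    if !current.isEmpty { sentences.append(current) }
    return sentences
}

let parts = splitSentences("It costs $9,999.99 today! Version 2.2 is out. How do I split this?")
print(parts)
```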
Take a look at this link:
How to create String split extension with regex in Swift?
It shows how to combine regex and componentsSeparatedByString.
Try this:
import Foundation // needed for components(separatedBy:)
let myString = "This is a test"
let myWords = myString.components(separatedBy: " ")
// myWords is now: ["This", "is", "a", "test"]
