Here is a weird problem in iOS 14.1/Swift 5 that caused me a lot of grief and consumed hours until I understood what was going on. I still want to know whether this is a bug or "by design". If it is by design, please provide a link describing the behavior and a list of other characters that are not counted.
Let us assume that I have a string like below:
let data = "l1\r\nl2\r\nl3"
I need to create an HTTP response manually and replace the Content-Length with the data length. I use a template for that:
static let RESPONSE =
"""
HTTP/1.1 200 OK
Content-Length: LENGTH
Content-Type: text/plain
DATA
""".trimmingCharacters(in: .whitespaces)
Finally, I create a response from a template:
let response = RESPONSE.replacingOccurrences(of: ": LENGTH", with: ": \(data.count)")
.replacingOccurrences(of:"DATA", with:data)
As a result, Content-Length was set to 8, not to 10, and a client didn't receive "l3".
Please note that the string with carriage returns has been generated by Apple's own BASE64 API, so there is nothing "special" that I did here myself:
Data(digest).base64EncodedString(options: [.lineLength64Characters])
This is very strange behavior that I haven't seen in any other language.
Swift treats the combination of \r\n as a single newline character (abbreviated in the docs to CR-LF).
let combo = Character("\r\n")
print(combo.isNewline) // true
So when you convert this Character to a String and count it, you get 1.
print(String(combo).count) // 1
Character has no count because by definition it represents a single user-perceived character even if it is constructed from a number of components.
I guess Swift's developers decided that the count property of String should return the number of user-perceived characters, and since \r\n to all intents and purposes has the same effect as a single newline character, it is counted as a single character. This matches Unicode's extended grapheme cluster rules (UAX #29), which never break between CR and LF.
Note however that String does not throw away the data from which it was constructed; you can still get the 'raw' count property that is most relevant to your case via the unicodeScalars property.
let data = "l1\r\nl2\r\nl3"
print(data.count) // 8
print(data.unicodeScalars.count) // 10
By the way, it's not just CR-LF that gets this special treatment; a national flag emoji is a single user-perceived character that is actually composed of two scalars.
let unionJack = Character("🇬🇧")
for scalar in unionJack.unicodeScalars {
print(String(scalar.value, radix: 16).uppercased() )
}
// 1F1EC
// 1F1E7
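Change data.count to data.utf16.count in order to get the "outside world" view of how "long" the string is.
(Alternatively you could say (data as NSString).length, which is the same UTF-16 count.)
Note that Content-Length is measured in bytes, so if the body can contain non-ASCII text and is sent as UTF-8, data.utf8.count is the value to use. For pure ASCII such as Base64 output, utf8.count and utf16.count are equal.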
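To make the difference between the counts concrete, here is a minimal sketch that builds the response from bytes rather than from Characters. The helper name makeResponse is hypothetical, and the sketch assumes the body is sent as UTF-8 with CRLF line endings in the header block.

```swift
import Foundation

// Hypothetical helper (not from the original post): builds a minimal
// HTTP/1.1 response whose Content-Length is the UTF-8 byte count of the body.
func makeResponse(body: String) -> Data {
    let bodyBytes = Data(body.utf8)
    // Header lines are joined with CRLF; the two trailing empty strings
    // produce the blank line that separates headers from the body.
    let head = [
        "HTTP/1.1 200 OK",
        "Content-Length: \(bodyBytes.count)",
        "Content-Type: text/plain",
        "",
        ""
    ].joined(separator: "\r\n")
    var response = Data(head.utf8)
    response.append(bodyBytes)
    return response
}

let body = "l1\r\nl2\r\nl3"
print(body.count)                 // 8  (each \r\n is one Character)
print(body.unicodeScalars.count)  // 10
print(body.utf16.count)           // 10
print(body.utf8.count)            // 10

// With non-ASCII text the views diverge, so only utf8.count matches the wire size.
let accented = "\u{e9}\r\n"       // é followed by CR-LF
print(accented.count)             // 2
print(accented.utf16.count)       // 3
print(accented.utf8.count)        // 4

let response = makeResponse(body: body)
```

Counting the encoded Data instead of the String avoids having to think about which string view matches the encoding that is actually sent.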
I have two strings encoded differently for spacing:
let first = "https://joytst.page.link/zMx3nAj9DxwcE1JC9?title=New+calendar+test"
let second = "https://joytst.page.link/zMx3nAj9DxwcE1JC9?title=New%20calendar%20test"
let firstOutput = first.removingPercentEncoding //https://joytst.page.link/zMx3nAj9DxwcE1JC9?title=New+calendar+test
let secondOutput = second.removingPercentEncoding //https://joytst.page.link/zMx3nAj9DxwcE1JC9?title=New calendar test
Why doesn't it remove the encoding correctly, since + is a valid encoding for a space?
How can I correctly decode both of them, no matter which one I receive?
“Why” is a difficult question to answer except for the people who had a hand in implementing CFURLCreateStringByReplacingPercentEscapes. The fact is that it doesn't.
I can speculate that it doesn't because the + for space replacement is not part of RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. It should only be used in the query part of the URL, which is of type application/x-www-form-urlencoded. But this is just my guess.
Anyway, if you want to convert + to space, you should do so before performing percent-decoding, lest you decode %2b into + and then further decode it into a space, leaving no way for your URL to contain a genuine + after decoding.
let firstOutput = first
.replacingOccurrences(of: "+", with: " ")
.removingPercentEncoding
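As a small sketch of why the order matters, the hypothetical helper decodeFormValue below applies the two steps to a single query value. It assumes the input is one form-encoded value, not a whole URL, since a + outside the query part is a literal plus sign.

```swift
import Foundation

// Hypothetical helper: decodes one application/x-www-form-urlencoded value.
// The "+" replacement must happen before percent-decoding so that an
// encoded "%2B" survives as a literal "+".
func decodeFormValue(_ value: String) -> String? {
    return value
        .replacingOccurrences(of: "+", with: " ")
        .removingPercentEncoding
}

print(decodeFormValue("New+calendar+test") ?? "invalid")      // New calendar test
print(decodeFormValue("New%20calendar%20test") ?? "invalid")  // New calendar test
print(decodeFormValue("1%2B1") ?? "invalid")                  // 1+1
```

removingPercentEncoding returns nil for malformed escapes, which is why the helper returns an optional.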
If you could decode a space from two different patterns, which one should be used when you want to do the opposite and encode it?
That's the reason why removingPercentEncoding only supports one of them.
I am working on string manipulation using Lua and am having trouble with the following problem.
Using this as an example of the original data I am given -
"[0;1;36m(Web): You say, "Text here."[0;37m"
I want to keep the string intact except for removing the ANSI codes.
I have been pointed toward using gsub with Lua pattern matching, but I cannot seem to get the pattern correct. I am also unsure how to reference the exact escape character sent.
text:gsub("[\27\[([\d\;]+)m]", "")
or
text:gsub("%x%[[%d+;+]m", "")
If successful, all I want to be left with, using the above example, would be:
(Web): You say, "Text here."
Your string example is missing the escape character, ASCII 27.
Here's one way:
s = '\x1b[0;1;36m(Web): You say, "Text here."\x1b[0;37m'
s = s:gsub('\x1b%[%d+;%d+;%d+;%d+;%d+m','')
:gsub('\x1b%[%d+;%d+;%d+;%d+m','')
:gsub('\x1b%[%d+;%d+;%d+m','')
:gsub('\x1b%[%d+;%d+m','')
:gsub('\x1b%[%d+m','')
print(s)
Using iOS + Swift, what's the best method to allow special characters .$#[]/ in my Firebase database keys (node names)?
Add percent encoding & decoding! Remember to allow alphanumeric characters (see example below).
var str = "this.is/a#crazy[string]right$here.$[]#/"
if let strEncoded = str.addingPercentEncoding(withAllowedCharacters: .alphanumerics) {
print(strEncoded)
if let strDecoded = strEncoded.removingPercentEncoding {
print(strDecoded)
}
}
The question is
How Do I Allow Special Characters in My Firebase Realtime Database?
The actual answer is there is nothing required to allow Special Characters in Firebase Realtime Database.
For example: given the following code
//self.ref is a reference to the Firebase Database
let str = "this.is/a#crazy[string]right$here.$[]#/"
let ref = self.ref.childByAutoId()
ref.setValue(str)
When the code is run, the following is written to firebase
{
"-KlZovTc2uhQXNzDodW_" : "this.is/a#crazy[string]right$here.$[]#/"
}
As you can see the string is identical to the given string, including the special characters.
It's important to note the question asks about allowing special characters in strings. Everything in Firebase is stored as key: value pairs and the Values can be strings so that's what this answer addresses.
Keys are different
If you create your own keys, they must be UTF-8 encoded, can be a maximum of 768 bytes, and cannot contain ., $, #, [, ], /, or ASCII control characters 0-31 or 127.
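The key rules quoted above can be checked before writing. The function isValidFirebaseKey below is a hypothetical helper, not part of the Firebase SDK, and it checks only the rules stated in this answer; the server may enforce others.

```swift
import Foundation

// Hypothetical validator (not a Firebase API): checks only the rules quoted
// above, i.e. at most 768 UTF-8 bytes, none of . $ # [ ] / and no ASCII
// control characters (0-31 or 127).
func isValidFirebaseKey(_ key: String) -> Bool {
    let forbidden: Set<Unicode.Scalar> = [".", "$", "#", "[", "]", "/"]
    guard key.utf8.count <= 768 else { return false }
    // Scalars are inspected rather than Characters so that a forbidden
    // scalar followed by a combining mark is still detected.
    for scalar in key.unicodeScalars {
        if scalar.value < 32 || scalar.value == 127 { return false }
        if forbidden.contains(scalar) { return false }
    }
    return true
}

let raw = "this.is/a#crazy[string]right$here.$[]#/"
let encoded = raw.addingPercentEncoding(withAllowedCharacters: .alphanumerics)

print(isValidFirebaseKey("users"))         // true
print(isValidFirebaseKey(raw))             // false
print(isValidFirebaseKey(encoded ?? raw))  // true
```

The last line shows that the percent-encoded form from the earlier answer passes these checks, because the escapes contain only %, digits, and letters.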
The bigger question is this: a structure that requires those characters in a key could (and probably should) be rethought, as there are generally better solutions.
I found something very weird in Firebase Database/Storage. I don't know whether Firebase or Swift is failing to detect umlauts (e.g. ä, ö, ü).
I did some simple things with Firebase, like uploading images to Firebase Storage and then downloading them into a table view. Some of my .png files had umlauts in the name, for example Röda.png.
The problem occurs when I download them. The only time my download URL is nil is when the file name contains one of those umlauts.
So I tried some alternatives, such as the HTML entity (ö - &ouml;), but this does not work. Can you suggest something? I can't replace ö with o, ü with u, etc.
This is the code when url is nil when trying to set some values into Firebase:
FIRStorage.storage().reference()
.child("\(productImageref!).png")
.downloadURLWithCompletion({(url, error)in
FIRDatabase.database().reference()
.child("Snuses").child(productImageref!).child("productUrl")
.setValue(url!.absoluteString)
let resource = Resource(downloadURL: url!, cacheKey: productImageref)
After spending a fair bit of time researching your problem, I found that the difference boils down to how the character ö is encoded, and I traced it to Unicode normalization forms.
The letter ö can be written in two ways, and String / NSString considers them equal:
let str1 = "o\u{308}" // decomposed : latin small letter o + combining diaeresis
let str2 = "\u{f6}" // precomposed: latin small letter o with diaeresis
print(str1, str2, str1 == str2) // ö ö true
But when you percent-encode them, they produce different results:
print(str1.stringByAddingPercentEncodingWithAllowedCharacters(.URLPathAllowedCharacterSet())!)
print(str2.stringByAddingPercentEncodingWithAllowedCharacters(.URLPathAllowedCharacterSet())!)
// o%CC%88
// %C3%B6
My guess is that Google / Firebase chooses the decomposed form while Apple prefers the other in its text input system. You can convert the file name to its decomposed form to match Firebase:
let str3 = str2.decomposedStringWithCanonicalMapping
print(str3.stringByAddingPercentEncodingWithAllowedCharacters(.URLPathAllowedCharacterSet())!)
// o%CC%88
This is irrelevant for ASCII-ranged characters. Unicode can be very confusing.
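The snippets above use Swift 2 method names. As a rough equivalent in current Swift (a sketch assuming Foundation is available, with variable names chosen for illustration), the same experiment looks like this:

```swift
import Foundation

let decomposed = "o\u{308}"   // latin small letter o + combining diaeresis
let precomposed = "\u{f6}"    // latin small letter o with diaeresis

// Swift compares by canonical equivalence, so these are equal.
print(decomposed == precomposed)  // true

// Percent-encoding works on the underlying UTF-8 bytes, which differ.
let a = decomposed.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)
let b = precomposed.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)
print(a ?? "nil")  // o%CC%88
print(b ?? "nil")  // %C3%B6

// Normalizing both names to one form before encoding removes the mismatch.
let nfd = precomposed.decomposedStringWithCanonicalMapping
let c = nfd.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)
print(c ?? "nil")  // o%CC%88
```

Which form the server expects is the answerer's guess, so the safer habit is to normalize consistently on both upload and download.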
References:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (highly recommended)
Strings in Swift 2
NSString and Unicode
Hooray for Unicode!
The short answer is that no, we're actually not doing anything special here. Basically all we do under the hood is:
// This is the list at https://cloud.google.com/storage/docs/json_api/ without the & because query parameters
NSString *const kGCSObjectAllowedCharacterSet =
@"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~!$'()*+,;=:#";
- (nullable NSString *)GCSEscapedString:(NSString *)string {
NSCharacterSet *allowedCharacters =
[NSCharacterSet characterSetWithCharactersInString:kGCSObjectAllowedCharacterSet];
return [string stringByAddingPercentEncodingWithAllowedCharacters:allowedCharacters];
}
What blows my mind is that:
let str1 = "o\u{308}" // decomposed : latin small letter o + combining diaeresis
let str2 = "\u{f6}" // precomposed: latin small letter o with diaeresis
print(str1, str2, str1 == str2) // ö ö true
returns true. In Objective-C (which the Firebase Storage client is built in), it totally shouldn't, as they're two totally different characters (in actuality, the length of str1 is 2 while the length of str2 is 1 in Obj-C, while in Swift I assume the answer is 1 for both).
Apple must be normalizing strings before comparison in Swift (probably a reasonable thing to do, since otherwise it leads to bugs like this where strings are "the same" but compare differently). Turns out, this is exactly what they do (see the "Extended Grapheme Clusters" section of their docs).
So, when you provide two different characters in Swift, they're being propagated to Obj-C as different characters and thus are encoded differently. Not a bug, just one of the many differences between Swift's String type and Obj-C's NSString type. When in doubt, choose a canonical representation you expect and stick with it, but as a library developer, it's very hard for us to choose that representation for you.
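The difference between the two views can be checked directly. This is only an illustrative sketch; the names twoScalars and oneScalar are made up, and NSString(string:) is used instead of an as cast so the snippet does not depend on Objective-C bridging.

```swift
import Foundation

let twoScalars = "o\u{308}"  // decomposed ö
let oneScalar = "\u{f6}"     // precomposed ö

// Swift counts user-perceived characters (extended grapheme clusters).
print(twoScalars.count, oneScalar.count)              // 1 1

// NSString length counts UTF-16 code units, like the utf16 view.
print(NSString(string: twoScalars).length,
      NSString(string: oneScalar).length)             // 2 1
print(twoScalars.utf16.count, oneScalar.utf16.count)  // 2 1

// Equal as Swift strings, different as scalar sequences.
print(twoScalars == oneScalar)                                             // true
print(twoScalars.unicodeScalars.elementsEqual(oneScalar.unicodeScalars))   // false
```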
Thus, when naming files that contain Unicode characters, make sure to pick a standard representation (C, D, KC, or KD) and always use it when creating references.
let imageName = "smorgasbörd.jpg"
let path = "images/\(imageName)"
let decomposedPath = path.decomposedStringWithCanonicalMapping // Unicode Form D
let ref = FIRStorage.storage().reference().child(decomposedPath)
// use this ref and you'll always get the same objects
let word = "sample string"
let firstLetter = Character(word.substringToIndex(advance(word.startIndex,1)).uppercaseString)
I got the above example from a tutorial. Does anyone know what they mean by "advance", and what is the difference between "substringToIndex" and "substringWithRange"?
This advance syntax is from Swift 1, it's different now.
Swift 2
let firstLetter = Character(word.substringToIndex(word.startIndex.advancedBy(1)).uppercaseString)
The advancedBy method moves the current index along the String.
With substringToIndex you slice a part of the String, beginning at the start of the String and ending at the index defined by advancedBy.
Here you advance by 1 in the String, so it means that substringToIndex will get the first character from the String.
Swift 3
The syntax has changed again, we now use substring and an index with an offset:
let firstLetter = Character(word.substring(to: word.index(word.startIndex, offsetBy: 1)).uppercased())
substringToIndex
Returns a new string containing the characters of the receiver up to, but not including, the one at a given index.
Return Value: A new string containing the characters of the receiver up to, but not including, the one at anIndex. If anIndex is equal to the length of the string, returns a copy of the receiver.
substringWithRange
Returns a string object containing the characters of the receiver that lie within a given range.
Return Value: A string object containing the characters of the receiver that lie within aRange.
Special Considerations: This method detects all invalid ranges (including those with negative lengths). For applications linked against OS X v10.6 and later, this error causes an exception; for applications linked against earlier releases, this error causes a warning, which is displayed just once per application execution.
For more detail, see the Apple NSString Class Reference.
Your tutorial is outdated. advance was deprecated in Swift 2. Strings in Swift cannot be randomly accessed, i.e. there's no word[0] to get the first letter of the string. Instead, you need an Index object to specify the position of the character. You create that index by starting with another index, usually the startIndex or endIndex of the string, then advance it to the character you want:
let word = "sample string"
let index0 = word.startIndex // the first letter, an 's'
let index6 = word.startIndex.advancedBy(6) // the seventh letter, the whitespace
substringToIndex takes all characters from the left of string, stopping before the index you specified. These two are equivalent:
print("'\(word.substringToIndex(index6))'")
print("'\(word[index0..<index6])'")
Both print 'sample'
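For readers on Swift 4 or later, where substring(to:) is deprecated in favor of slicing, the same operations can be sketched as follows. The variable names are illustrative only.

```swift
let word = "sample string"

// An index six Characters past the start (the space).
let index6 = word.index(word.startIndex, offsetBy: 6)

// Equivalent of substringToIndex: everything before index6.
let prefixBySlice = String(word[..<index6])      // "sample"
let prefixByCount = String(word.prefix(6))       // "sample"

// Equivalent of substringWithRange: an explicit range of indices.
let start = word.index(word.startIndex, offsetBy: 7)
let tail = String(word[start..<word.endIndex])   // "string"

// First letter, uppercased, as in the original tutorial line.
let firstLetter = String(word.prefix(1)).uppercased()  // "S"

print("'\(prefixBySlice)'", "'\(tail)'", firstLetter)
```

Slicing returns a Substring that shares storage with the original string, so it is wrapped in String(...) when the result needs to outlive it.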