Inconsistently handled emoji sequences on iOS?

On both iOS and macOS, sequences of regional indicator symbols are rendered as national flag emoji, and if the sequence is invalid the actual symbols are presented instead:
However, if the sequence happens to contain a pair of regional indicator symbols that doesn't map to a flag emoji, the potential flags are rendered on a first-match basis:
iOS/macOS rendering the symbols: F F I S E S.
In Swift 3, consecutive regional indicator symbols were all lumped into one Character, meaning that a single Character could contain a theoretically limitless number of UnicodeScalar values, as long as they were all regional indicator symbols. In essence, Swift 3 didn't break sequences of regional indicator symbols at all.
In Swift 4, on the other hand, one Character contains at most two regional indicator symbols in its Unicode scalar representation. Additionally, and understandably, the validity of the sequence isn't considered, so regional indicator symbol sequences are simply broken up after every two scalars, each pair forming one Character. Now, iterating over the same string as above and printing each character produces the following:
Swift 4 string containing the symbols: F F I S E S.
Which brings us to the actual question: is the issue with how iOS and macOS render the sequences, or with how Swift 4 constructs the Character representation in strings?
I'm curious as to which party would be the most appropriate to report this peculiarity to.
Here is a minimal reproducible snippet for the behaviour in Swift 4:
// Regional indicator symbols: F F I S E S
let string = "\u{1f1eb}\u{1f1eb}\u{1f1ee}\u{1f1f8}\u{1f1ea}\u{1f1f8}"
for character in string {
    print(character)
}
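Under Swift 4's grouping, the loop prints three Character values of two scalars each: the invalid pair F F (displayed as two dotted-box letters), then I S (rendered as the flag of Iceland) and E S (the flag of Spain).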

After some investigation, it appears that neither is wrong, although the method implemented in Swift 4 is truer to the standard's recommendations.
As per the Unicode standard (emphasis mine):
The representative glyph for a single regional indicator symbol is just a dotted box containing a capital Latin letter. The Unicode Standard does not prescribe how the pairs of regional indicator symbols should be rendered. However, current industry practice widely interprets pairs of regional indicator symbols as representing a flag associated with the corresponding ISO 3166 region code.
– The Unicode Standard, Version 10.0 – Core Specification, page 836.
Then, on the following page:
Conformance to the Unicode Standard does not require conformance to UTS #51. However, the interpretation and display of pairs of regional indicator symbols as specified in UTS #51 is now widely deployed, so in practice it is not advisable to attempt to interpret pairs of regional indicator symbols as representing anything other than an emoji flag.
– The Unicode Standard, Version 10.0 – Core Specification, page 837.
From this I gather that while the standard doesn't set any rules for how the flags should be rendered, the path iOS and macOS have chosen for rendering invalid flag sequences is inadvisable. So, even if a valid flag exists further along the sequence, the renderer should always treat two consecutive regional indicator symbols as one flag.
Finally, taking a look at UTS #51, or "the emoji specification":
Options for presenting an emoji_flag_sequence for which a system does not have a specific flag or other glyph include:
Displaying each REGIONAL INDICATOR symbol separately as a letter in a dotted square, as shown in the Unicode charts. This provides information about the specific region indicated, but may be mystifying to some users.
For all unsupported REGIONAL INDICATOR pairs, displaying the same “missing flag” glyph, such as the image shown below. This would indicate that the supported pair was intended to represent the flag of some region, without indicating which one.
– Unicode Technical Standard #51, revision 12, Annex B.
So, in conclusion, best practice would be representing invalid flag sequences as a pair of regional indicator symbols – exactly as is the case with Character objects in Swift 4 strings – or as a generic missing flag glyph.
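As an aside, Swift 4's grouping makes it straightforward to detect such pairs in code. A minimal sketch (the helper name is mine, not a standard API):

// A Character is a potential flag if it consists of exactly two
// regional indicator symbols (U+1F1E6 through U+1F1FF).
func isPotentialFlag(_ character: Character) -> Bool {
    let scalars = Array(String(character).unicodeScalars)
    guard scalars.count == 2 else { return false }
    return (0x1F1E6...0x1F1FF).contains(scalars[0].value)
        && (0x1F1E6...0x1F1FF).contains(scalars[1].value)
}

print(isPotentialFlag("🇮🇸")) // true – a valid flag pair
print(isPotentialFlag("🇫🇫")) // true – invalid pair, but still two regional indicators
print(isPotentialFlag("a"))   // false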

Related

String comparison (>) returns different results on different platforms? [duplicate]

Consider the following predicate
print("S" > "g")
Running this in Xcode yields false, whereas running it on an online compiler such as tutorialspoint's or the IBM Swift Sandbox (Swift Dev. 4.0 (Sep 5, 2017) / Platform: Linux (x86_64)) yields true.
How come the predicate gives a different result on the online compilers (Linux?) compared to Xcode?
This is a known open "bug" (or perhaps rather a known limitation):
SR-530 - [String] sort order varies on Darwin vs. Linux
Quoting Dave Abrahams' comment to the open bug report:
This will mostly be fixed by the new string work, wherein String's
default sort order will be implemented as a lexicographical ordering
of FCC-normalized UTF16 code units.
Note that on both platforms we rely on ICU for normalization services,
and normalization differences among different implementations of ICU
are a real possibility, so there will never be a guarantee that two
arbitrary strings sort the same on both platforms.
However, for Latin-1 strings such as those in the example, the new
work will fix the problem.
Moreover, from the String Manifesto:
Comparing and Hashing Strings
...
Following this scheme everywhere would also allow us to make sorting
behavior consistent across platforms. Currently, we sort String
according to the UCA, except that--only on Apple platforms--pairs of
ASCII characters are ordered by unicode scalar value.
Most likely, in the OP's particular example (covering solely ASCII characters), comparison according to the UCA (Unicode Collation Algorithm) is used on Linux platforms, whereas on Apple platforms these single-ASCII-character String values (or String instances starting with ASCII characters) are ordered by Unicode scalar value.
import Foundation // for String(format:)

// ASCII (decimal) scalar values
print("S".unicodeScalars.first!.value) // 83
print("g".unicodeScalars.first!.value) // 103
// Unicode scalar values (hexadecimal)
print(String(format: "%04X", "S".unicodeScalars.first!.value)) // 0053
print(String(format: "%04X", "g".unicodeScalars.first!.value)) // 0067
print("S" < "g") // 'true' on Apple platforms (comparison by Unicode scalar value),
                 // 'false' on Linux platforms (comparison according to the UCA)
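If a platform-independent ordering matters more than linguistic correctness, one workaround (a sketch; scalarOrdered is a hypothetical helper, not a standard library API) is to compare the Unicode scalars explicitly:

// Compare two strings by raw Unicode scalar values, bypassing the
// platform-dependent default ordering of String.
func scalarOrdered(_ lhs: String, _ rhs: String) -> Bool {
    return lhs.unicodeScalars.lexicographicallyPrecedes(rhs.unicodeScalars)
}

print(scalarOrdered("S", "g")) // true on all platforms (83 < 103)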
See also the excellent accepted answer to the following Q&A:
What does it mean that string and character comparisons in Swift are not locale-sensitive?

NSRegularExpression not matching number sign (#)

I'm working on a Guitar Chord transposer, and so from a given text file, I want to identify guitar chords. e.g. G#, Ab, F#m, etc.
I'm almost there! I have run into a few problems already due to the number sign (hash tag).
#
For example, you can't include the number sign in your regex pattern. The NSRegularExpression will not initialize with this:
let fail: String = "\\b[ABCDEFG](b|#)?\\b"
let success: String = "\\b[CDEFGAB](b|\\u0023)?\\b"
I had to specifically provide the unicode character. I can live with that.
However, now that I have a NSRegularExpression object, it won't match these (sharps = number sign) when I have a line of text such as:
Am Bb G# C Dm F E
When it starts processing the G#, the sharp associated with that second capture group is not matched (i.e. the NSTextCheckingResult's second range has a location of NSNotFound). Note that it works for Bb: it matches the 'b'.
I'm wondering what I need to do here. It would seem the documentation doesn't cover this case of '#', which IS in fact sometimes used in regex patterns (I think related to comments or something similar).
One thing that would be great would be to not have to look up the unicode identifier for a #, but just use it as a String "#" then convert that so it plays nicely with the pattern. There exists the chance that \u0023 is in fact not the code associated with # ...
The \b word boundary is a context-dependent construct. It matches in 4 contexts: 1) between the start of string and a word char, 2) between a word char and the end of string, 3) between a word char and a non-word char, and 4) between a non-word char and a word char.
Your regex is written in such a way that ultimately the regex engine sees a \b after # and that means a # will only match if there is a word char after it.
If you replace \b with (?!\w), a negative lookahead that fails the match if there is a word char immediately to the right of the current location, it will work.
So, you may use
\\b[CDEFGAB](b|\\u0023)?(?!\\w)
See the regex demo.
Details
\b - a word boundary
[CDEFGAB] - a char from the set
(b|\\u0023)? - an optional sequence of b or #
(?!\\w) - a negative lookahead that fails the match if there is a word char immediately to the right of the current position. (It also causes backtracking into the preceding pattern; to avoid that, add + after ? to make the quantifier possessive.)
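Putting it together in Swift (a sketch; the variable names are mine), the lookahead version matches all the sharps in the sample line:

import Foundation

// '#' is escaped as \u0023, as in the answer above.
let pattern = "\\b[CDEFGAB](b|\\u0023)?(?!\\w)"
let text = "Am Bb G# C Dm F E"
let regex = try! NSRegularExpression(pattern: pattern) // pattern is known to be valid
let wholeRange = NSRange(text.startIndex..., in: text)
for match in regex.matches(in: text, range: wholeRange) {
    if let range = Range(match.range, in: text) {
        print(text[range]) // Bb, G#, C, F, E
    }
}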
(I'd like to first say @WiktorStribiżew has been a tremendous help and what I am writing now would not have been possible without him! I'm not concerned about StackOverflow points and rep, so if you like this answer, please upvote his answer.)
This issue took many turns and had a few issues going on. Ultimately this question should be called How do I use Regex on iOS to detect Musical Chords in a text file?
The answer is (so far), not simply.
CRASH COURSE IN MUSIC THEORY
In music you have notes. They are made up of a letter between A and G and an optional symbol called an accidental. (A note relates to the acoustic frequency of the sound you hear when that note is played.) An accidental can be a flat (represented as a ♭ or simply a b) or a sharp (represented as a ♯ or simply a #, as these are easier to type on a keyboard). An accidental serves to make a note a semitone higher (#) or lower (b). As such, an F# is the same acoustic frequency as a Gb. On a piano, the white keys are notes with no accidentals, and the black keys represent notes with an accidental. A given piece of music generally won't mix accidental types: it will be either flats throughout the piece or sharps. (This depends on the musical key of the composition, but that is not too relevant here.)
In terms of regex, you have something like [ABCDEFG](b|#)? to determine the note. In reality it's more complicated.
Then, a Musical Chord is comprised of the root note and its chord type. There are over 50 types of chords. They have a 'text signature' that is unique. Also, a 'major' chord has an empty signature. So in terms of pseudo-regex you have for a Chord:
[ABCDEFG](b|#)?(...|...|...)?
where the first part you recognize as the note (as before), and the last optional is to determine the chord type. The different types were omitted, but can be as simple as a m (for Minor chord), or maj7#5 (for a major 7th chord with an augmented 5th... don't worry about it. Just know there are many string constants to represent a chord type)
Then finally, with guitar you often have a corresponding bass note that changes the chord's tonality somewhat. You denote this by adding a slash and then the note, giving the general pseudoform:
[ABCDEFG](b|#)?(...|...|...)?(/[ABCDEFG](b|#)?)? // NOT real Regex
real examples: C/F or C#m/G# and so on
where the last part has a slash then the same pattern to recognize a note.
So putting these all together, in general we want to find chords that could take on many forms, such as:
F Gm C#maj7/G# F/C Am A7 A7/F# Bmaj13#11
I was hoping to find one Regex to rule them all. I ended up writing code that works, though it seems like I kind of hacked around a bit to get the results I desired.
You can see this code here, written in Swift. It is not complete for my purposes, but it will parse a string, return a list of Chord Results and their text range within the original string. From there you would have to finish the implementation to suit your needs.
There have been a few issues on iOS:
iOS does not handle the number sign (#) well at all. When providing regex patterns or match text, I either had to replace the # with its unicode \u0023, or, what ultimately worked, replace all occurrences of # with another character (such as 'S') and convert it back once regex did its thing. So this code I wrote often has to 'sanitize' the pattern or the input text before doing anything (see the sketch after this list).
I couldn't get a single regex pattern to perfectly parse a chord structure. It would successfully match a Chord with a bass note, but not fully parse it, so I had to split those 2 components, parse them separately, and then recombine them.
Regex is really a bit of voodoo, and I think it sucks that, for something so confusing to many people, there are also different platform-dependent implementations of it. For example, Wiktor referred me to regex patterns he wrote on www.regex101.com to help me solve the problem; these would WORK on that website, but would not work on iOS, where NSRegularExpression would throw an error (often it had something to do with this # character).
My solution pays absolutely no regard to performance. I just wanted it to work.
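For reference, a minimal sketch of the '#' sanitizing workaround from the first point above (the placeholder character 'S' and the flow are illustrative):

import Foundation

// Replace '#' with a placeholder before matching, restore it afterwards.
let rawInput = "C#m/G#"
let sanitized = rawInput.replacingOccurrences(of: "#", with: "S")
// ... run the NSRegularExpression matching on `sanitized` here ...
let restored = sanitized.replacingOccurrences(of: "S", with: "#")
print(restored) // C#m/G#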

Print a code128C barcode pair

In bar code 128 subset C, the number of digits should always be even.
How can I print a bar code with an odd number of digits? Example:
1517072011170323703007607271023031701
Using Delphi XE7 and Fortes Report 4.0 VCL
Is this question related to the Finnish banking bar code?
if YES: You must pad the data to be of even length, according to the documentation published by the bank. Switching the barcode encoding system is not allowed by the relevant banking standard.
reference URL: http://www.finanssiala.fi/maksujenvalitys/dokumentit/Bank_bar_code_guide.pdf
if NO: Just first encode the even-length part, then switch to code 128A or 128B using the special code-switch "character", and finally encode the last digit using either 128A or 128B, whichever serves you better. A sketch of this split follows.
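A sketch of that split in code (Swift for illustration; how the subset switch is actually emitted depends entirely on your barcode library):

// Split an odd-length digit string: the even-length head is encoded
// in subset C; the final digit is encoded after switching to 128A/128B.
let digits = "1517072011170323703007607271023031701" // 37 digits (odd)
let head = String(digits.dropLast()) // 36 digits for subset C
let tail = String(digits.suffix(1))  // "1", encoded in subset A or B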

Unicode filenames in iOS

Is it possible to use the full range of (let's say) the Chinese language in filenames of assets (images) within iOS? If not, what portions of big languages are supported in filenames, string searches and other file handling activities?
iOS and Mac OS currently use the HFS+ filesystem, which supports full Unicode in filenames. This means essentially any character, including Chinese and other human languages. The filesystem allows up to 255 characters, which for most languages is about 255 code points. (The length limit is based on UTF-16-encoded characters; characters which require more than 16 bits to encode, like emoji, can also be used, but they reduce the number of characters allowed.)
The file APIs on iOS (NSFileManager, etc.) should accommodate Unicode strings without any extra work. Do note that Unicode sequences are canonicalized in a particular way: e.g. an é character can be represented in multiple different ways in Unicode, but will be decomposed in a standardized way as a filename.
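To see that canonicalization in action, Foundation exposes the canonical decomposition directly; a quick sketch:

import Foundation

// "é" as a single precomposed scalar (U+00E9) vs. its canonically
// decomposed form (e + combining acute accent U+0301), which is the
// kind of form HFS+ stores in filenames.
let precomposed = "\u{E9}"
let decomposed = precomposed.decomposedStringWithCanonicalMapping
print(precomposed.unicodeScalars.count) // 1
print(decomposed.unicodeScalars.count)  // 2
print(precomposed == decomposed)        // true – Swift compares canonically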
The bottom line is, you can feel free to use Unicode strings as your filenames as long as they are of reasonable length. Because superlong Unicode names will start running into length issues in a slightly unpredictable way (really just complicated and unnecessary to compute), you should probably set some sane self-imposed length limits.
APFS is the next-gen filesystem that Apple is developing, and it will appear on iOS at some point soon. I can't find info on filename encoding, but it's a fair assumption that it will support anything HFS+ supports, if not more.
The iOS filesystem uses case-sensitive HFSX, which is a variant of HFS Plus and uses the same rules for filenames and character encodings.
Those rules are laid out in several sections of Apple Technote 1150.
The important considerations are:
You may use up to 255 16-bit Unicode characters per file or folder name as described in the HFS Plus Names section of Technote 1150.
The filesystem at its base level uses Unicode v2.0 (this is fixed) and strings must be stored in fully decomposed, canonical order. This precludes the use of some "equivalent forms" -- i.e. they must be converted to decomposed form. This is described in detail in the Unicode Subtleties section of Technote 1150. This section details other issues and should be read carefully.
A list of illegal characters can be found in this Decomposition Table.
The colon character ':' is used as a directory separator and is invalid in file and folder names.

Inconsistent Unicode Emoji Glyphs/Symbols

I've been trying to make use of the Unicode symbols for astrology in products for both macOS and iOS. I'm getting inconsistent results, as shown here:
Most of these are coming out as I like, but for some reason the Taurus symbol is appearing one way on the first line, following the Moon, and a very different way, with the Emoji-like purple button, when it follows Mars. These results are consistent for different symbols and across Apple hardware; here's a screen capture from my phone showing the same problem with some other signs - Scorpio comes out all right, but Libra and Cancer are buttons.
The strings are extremely straightforward; "Moon Taurus" in the first image is \u263D for Moon, \u2649 for Taurus, basically assembled as [NSString stringWithFormat:@"%@%@", @"\u263D", @"\u2649"]. The "Mars Taurus" image is the same, only with \u2642 for Mars. The string formatting is identical in the different cells of the OSX table, and in the iOS AttributedString.
Any idea what makes these symbols appear one way sometimes, and another way other times?
Unicode uses variation sequences to select between different renderings for certain code points, listed in the StandardizedVariants.txt file. In your case, the astrological symbols have both "text style" and "emoji style" variants that are selected by a U+FE0E (text style) or U+FE0F (emoji style) following the code point:
U+2650 U+FE0E: ♐︎
U+2650 U+FE0F: ♐️
Note that correct interpretation of the variation selector depends on support from both the application/framework and the fonts being used. On Chrome (42) there doesn't appear to be any difference between my examples above, but on Safari (8) they are distinct.
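In Swift, the equivalent sequences can be built directly from scalar literals; a quick sketch using the same Sagittarius code point as in the examples above:

let textStyle  = "\u{2650}\u{FE0E}" // ♐︎ requests the text-style glyph
let emojiStyle = "\u{2650}\u{FE0F}" // ♐️ requests the emoji-style glyph
print(textStyle, emojiStyle)

For the Taurus case from the question, appending U+FE0E to U+2649 should likewise force the plain text-style symbol.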
