UILabel: "opposite" of non breaking space? I.e. show where a word can be hyphenated - ios

I need to mark the positions in a word where it can be hyphenated if there's not enough room, e.g.
loremip|sum
So if there's enough room, it should show loremipsum; if not, it should be loremip- on the first line and sum on the second.
Bonus point: if the character is ignored by search, i.e. searching for loremips would find that occurrence. Does something like that exist on iOS?
Using NSParagraphStyle isn't enough, as it splits words at points where that isn't allowed (note: I'm using German). Also, I would like to use words that aren't really common, as they are dialect.
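One candidate for such a marker is Unicode's soft hyphen (U+00AD), which is normally invisible and only signals a permitted break point. The sketch below is a hedged illustration, not a confirmed answer: the helper names and the "|" annotation convention are hypothetical, and whether UILabel actually breaks and draws a hyphen at U+00AD should be checked on the target iOS version. For the search part, it simply strips the soft hyphens before comparing.

```swift
import Foundation

// U+00AD SOFT HYPHEN: an invisible "may break here" hint.
let softHyphen = "\u{00AD}"

// Hypothetical helper: turns a hand-annotated word such as "loremip|sum"
// into display text carrying a soft hyphen at each marked position.
func displayText(fromAnnotated annotated: String, marker: Character = "|") -> String {
    return annotated.replacingOccurrences(of: String(marker), with: softHyphen)
}

// Hypothetical helper: removes soft hyphens so a plain-text search
// for "loremips" still finds the word.
func searchableText(_ display: String) -> String {
    return display.replacingOccurrences(of: softHyphen, with: "")
}

let display = displayText(fromAnnotated: "loremip|sum")
let found = searchableText(display).contains("loremips")
```

Assigning `display` to a label's text is the part that needs verifying on device; the string handling itself is platform independent.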

Related

How to make a word mentioned in column A bold in column B where that word is mentioned in a sentence (Google Sheets)?

Basically, I am trying to create conditional formatting that does the following:
Simple example
I really would like to just make bold a word (not a whole sentence) in column B that is mentioned in column A.
I tried many different formulas in the "Value or formula" field:
=REGEXMATCH($B1,A1)
=REGEXMATCH($B1,"<"&A1&">")
=REGEXMATCH(B1,"\b"&A1&"\b")
=ARRAYFORMULA(REGEXMATCH(SPLIT(B1," "),"\b"&A1&"\b"))
...but none of these work.
This is actually sort of possible, if you are willing to stretch your definition of 'bold'. It's absolutely right that you can't make a particular part of a text string within a cell bold (or italic/underlined/coloured/etc.) in a way that replicates the effect if you manually select an area of a text string and format it using the menu bar options - effectively you are setting some metadata which sits 'outside' of the cell so isn't accessible to a formula.
However, Unicode contains a block (Mathematical Alphanumeric Symbols) with bold-styled versions of the Latin letters and digits, and so with a little bit of trickery it's possible to substitute characters with their bold version. Here's a formula which achieves something like your original request:
=let(wordstobold,A1:A2,
sentencestosplit,B1:B2,
boldwords,map(wordstobold,lambda(eachword,concatenate(map(split(regexreplace(eachword,"(.)","$1_"),"_"),lambda(eachchar,filter(BOLDCHARS(),exact(STDCHARS(),eachchar))))))),
splitsentences,map(sentencestosplit,wordstobold,lambda(sentence,word,split(sentence,word,false))),
presplits,choosecols(splitsentences,1),
postsplits,choosecols(splitsentences,2),
map(presplits,boldwords,postsplits,lambda(x,y,z,x&y&z)))
The formula relies on SPLIT treating a multicharacter delimiter as a single entity when its third argument (split_by_each) is FALSE, as it is here, so if you pass the word to be bolded to SPLIT as the delimiter for the sentence it will split the sentence into two halves, presplit and postsplit. The substitution of the normal characters for bold characters could be done in a number of ways, but what I'm doing here is exploding the word to be bolded into individual characters and then MAPping the equivalent bold character onto each one using FILTER/EXACT.
A couple of 'helper' Named Functions are required: STDCHARS() & BOLDCHARS(); these don't accept any parameters and are a means of storing the character sets for the part of the formula which swaps characters for their bold equivalents. It would be possible to integrate these into the formula, albeit at the expense of readability.
STDCHARS():
={"a";"b";"c";"d";"e";"f";"g";"h";"i";"j";"k";"l";"m";"n";"o";"p";"q";"r";"s";"t";"u";"v";"w";"x";"y";"z";"A";"B";"C";"D";"E";"F";"G";"H";"I";"J";"K";"L";"M";"N";"O";"P";"Q";"R";"S";"T";"U";"V";"W";"X";"Y";"Z";"0";"1";"2";"3";"4";"5";"6";"7";"8";"9"}
BOLDCHARS():
={"𝗮";"𝗯";"𝗰";"𝗱";"𝗲";"𝗳";"𝗴";"𝗵";"𝗶";"𝗷";"𝗸";"𝗹";"𝗺";"𝗻";"𝗼";"𝗽";"𝗾";"𝗿";"𝘀";"𝘁";"𝘂";"𝘃";"𝘄";"𝘅";"𝘆";"𝘇";"𝗔";"𝗕";"𝗖";"𝗗";"𝗘";"𝗙";"𝗚";"𝗛";"𝗜";"𝗝";"𝗞";"𝗟";"𝗠";"𝗡";"𝗢";"𝗣";"𝗤";"𝗥";"𝗦";"𝗧";"𝗨";"𝗩";"𝗪";"𝗫";"𝗬";"𝗭";"𝟬";"𝟭";"𝟮";"𝟯";"𝟰";"𝟱";"𝟲";"𝟳";"𝟴";"𝟵"}
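For readers who want to see the character substitution outside of Sheets, here is a rough sketch of the same idea in Swift. It is not part of the formula above: the function names are made up for illustration, and instead of a lookup table it uses the fact that the sans-serif bold letters and digits sit in contiguous runs starting at U+1D5EE (a), U+1D5D4 (A) and U+1D7EC (0).

```swift
import Foundation

// Maps ASCII letters and digits to their Mathematical Sans-Serif Bold
// counterparts; every other character is passed through unchanged.
func sansSerifBold(_ text: String) -> String {
    var result = String.UnicodeScalarView()
    for scalar in text.unicodeScalars {
        let value = scalar.value
        var mapped = value
        switch value {
        case 0x61...0x7A: mapped = 0x1D5EE + (value - 0x61) // a-z
        case 0x41...0x5A: mapped = 0x1D5D4 + (value - 0x41) // A-Z
        case 0x30...0x39: mapped = 0x1D7EC + (value - 0x30) // 0-9
        default: break
        }
        result.append(Unicode.Scalar(mapped) ?? scalar)
    }
    return String(result)
}

// Replaces each occurrence of `word` in `sentence` with its bold form.
func bolding(_ word: String, in sentence: String) -> String {
    return sentence.replacingOccurrences(of: word, with: sansSerifBold(word))
}
```

As with the formula, the result is different characters rather than real formatting, so searching or sorting the text afterwards will not treat the bold word as the original one.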

This regex matches in BBEdit and regex.com, but not on iOS - why?

I am trying to "highlight" references to law statutes in some text I'm displaying. These references are of the form <number>-<number>-<number>(char)(char), where:
"number" may be whole numbers 18 or decimal numbers 12.5;
the parenthetical terms are entirely optional: zero or one or more;
if a parenthetical term does exist, there may or may not be a space between the last number and the first parenthesis, as in 18-1.3-401(8)(g) or 18-3-402 (2).
I am using the regex
((\d+(\.\d+)*-){2}(\d+(\.\d+)*))( ?(\([0-9a-zA-Z]+\))*)
to find the ranges of these strings and then highlight them in my text. This expression works perfectly, 100% of the time, in all of the cases I've tried (dozens), in BBEdit, and on regex101.com and regexr.com.
However, when I use that exact same expression in my code, on iOS 12.2, it is extremely hit-or-miss as to whether a string matching the regex is actually found. So hit-or-miss, in fact, that a string of the exact same form of two other matches in a specific bit of text is NOT found. E.g., in this one paragraph I have, there are five instances of xxx-x-xxx; the first and the last are matched, but the middle three are not matched. This makes no sense to me.
I'm using the String method func range(of:options:range:locale:) with options of .regularExpression (and nil locale) to do the matching. I see that iOS uses ICU-compatible regexes, whereas these other tools use PCRE (I think). But, from what I can tell, my expression should be compatible and valid for my case with the ICU parsing. But, something is definitely different, and I cannot figure out what it is.
Anyone? (I'm going to give NSRegularExpression a go and see if it behaves differently, but I'd still like to figure out what's going on here.)
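For comparison, this is roughly what the NSRegularExpression attempt could look like. It is only a sketch and does not diagnose the mismatch described above; the function name is hypothetical. One thing it does make explicit is that `matches(in:options:range:)` returns every match in the searched range, whereas a single `range(of:options:range:locale:)` call returns only the first one.

```swift
import Foundation

// The pattern from the question, written as a raw string so the
// backslashes reach the regex engine unchanged.
let statutePattern = #"((\d+(\.\d+)*-){2}(\d+(\.\d+)*))( ?(\([0-9a-zA-Z]+\))*)"#

// Returns every statute reference found in `text`, in order of appearance.
func statuteReferences(in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: statutePattern, options: []) else {
        return []
    }
    let whole = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: whole).compactMap { result in
        // Convert the NSRange (UTF-16 based) back to a String range.
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

let sample = "See 18-1.3-401(8)(g) and 18-3-402 (2)."
let references = statuteReferences(in: sample)
```

Note that the optional space in the pattern means a reference followed by a space but no parenthetical is matched with that trailing space included.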

NSRegularExpression not matching number sign (#)

I'm working on a Guitar Chord transposer, and so from a given text file, I want to identify guitar chords. e.g. G#, Ab, F#m, etc.
I'm almost there! I have run into a few problems already due to the number sign (hash tag).
#
For example, you can't include the number sign in your regex pattern. The NSRegularExpression will not initialize with this:
let fail: String = "\\b[ABCDEFG](b|#)?\\b"
let success: String = "\\b[CDEFGAB](b|\\u0023)?\\b"
I had to specifically provide the unicode character. I can live with that.
However, now that I have a NSRegularExpression object, it won't match these (sharps = number sign) when I have a line of text such as:
Am Bb G# C Dm F E
When it starts processing the G#, the sharp associated with that second capture group is not matched. (i.e. the NSTextCheckingResult's second range has a location of NSNotFound) Note, it works for Bb... it matches the 'b'
I'm wondering what I need to do here. It would seem the documentation doesn't cover this case of '#', which IS in fact sometimes used in regex patterns (I think it's related to comments or something).
One thing that would be great would be to not have to look up the unicode identifier for a #, but just use it as a String "#" then convert that so it plays nicely with the pattern. There exists the chance that \u0023 is in fact not the code associated with # ...
The \b word boundary is a context-dependent construct. It matches in four contexts: 1) between the start of the string and a word char, 2) between a word char and the end of the string, 3) between a word char and a non-word char, and 4) between a non-word char and a word char.
Your regex is written in such a way that ultimately the regex engine sees a \b after # and that means a # will only match if there is a word char after it.
If you replace \b with (?!\w), a negative lookahead that fails the match if there is a word char immediately to the right of the current location, it will work.
So, you may use
\\b[CDEFGAB](b|\\u0023)?(?!\\w)
See the regex demo.
Details
\b - a word boundary
[CDEFGAB] - a char from the set
(b|\\u0023)? - an optional sequence of b or #
(?!\\w) - a negative lookahead failing the match (and causing backtracking into the preceding pattern! To avoid that, add + after ? to prevent backtracking into that pattern) if there is a word char immediately to the right of the current position.
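To see the difference on the sample line from the question, here is a small sketch that runs both patterns through NSRegularExpression. The helper name `matches(of:in:)` is made up for this example; the patterns are the ones discussed above, written as raw strings.

```swift
import Foundation

// Returns every substring of `text` matched by `pattern`, in order.
func matches(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return []
    }
    let whole = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: whole).compactMap { result in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

let line = "Am Bb G# C Dm F E"

// Trailing \b: "#" followed by a space is not a word boundary,
// so the engine backtracks and matches only "G".
let withTrailingBoundary = matches(of: #"\b[CDEFGAB](b|\u0023)?\b"#, in: line)
// withTrailingBoundary is ["Bb", "G", "C", "F", "E"]

// Negative lookahead: only requires that no word char follows.
let withLookahead = matches(of: #"\b[CDEFGAB](b|\u0023)?(?!\w)"#, in: line)
// withLookahead is ["Bb", "G#", "C", "F", "E"]
```

Am and Dm are skipped by both patterns because the letter after the root is a word char; handling chord types needs a longer pattern.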
(I'd like to first say @WiktorStribiżew has been a tremendous help and what I am writing now would not have been possible without him! I'm not concerned about StackOverflow points and rep, so if you like this answer, please upvote his answer.)
This issue took many turns and had a few issues going on. Ultimately this question should be called How do I use Regex on iOS to detect Musical Chords in a text file?
The answer is (so far), not simply.
CRASH COURSE IN MUSIC THEORY
In music you have notes. They are made up of a letter from A to G and an optional symbol called an accidental. (A note relates to the acoustic frequency of the sound you hear when that note is played.) An accidental can be a flat (represented as ♭ or simply b) or a sharp (represented as ♯ or simply #, as these are easier to type on a keyboard). An accidental serves to make a note a semitone higher (#) or lower (b). As such, an F# is the same acoustic frequency as a Gb. On a piano, the white keys are notes with no accidentals, and the black keys represent notes with an accidental. Depending on some factors of the piece of music, that piece won't mix accidental types. It will either be flats throughout the piece or sharps. (This depends on the musical key of the composition, but that is not very relevant here.)
In terms of regex, you have something like [ABCDEFG](b|#)? to determine the note. In reality it's more complicated.
Then, a Musical Chord is comprised of the root note and its chord type. There are over 50 types of chords. They have a 'text signature' that is unique. Also, a 'major' chord has an empty signature. So in terms of pseudo-regex you have for a Chord:
[ABCDEFG](b|#)?(...|...|...)?
where the first part you recognize as the note (as before), and the last optional is to determine the chord type. The different types were omitted, but can be as simple as a m (for Minor chord), or maj7#5 (for a major 7th chord with an augmented 5th... don't worry about it. Just know there are many string constants to represent a chord type)
Then finally, with guitar you often have a corresponding bass note that changes the chord's tonality somewhat. You denote this by adding a slash and then the note, giving the general pseudoform:
[ABCDEFG](b|#)?(...|...|...)?(/[ABCDEFG](b|#)?)? // NOT real Regex
real examples: C/F or C#m/G# and so on
where the last part has a slash then the same pattern to recognize a note.
So putting these all together, in general we want to find chords that could take on many forms, such as:
F Gm C#maj7/G# F/C Am A7 A7/F# Bmaj13#11
I was hoping to find one Regex to rule them all. I ended up writing code that works, though it seems like I kind of hacked around a bit to get the results I desired.
You can see this code here, written in Swift. It is not complete for my purposes, but it will parse a string, return a list of Chord Results and their text range within the original string. From there you would have to finish the implementation to suit your needs.
There have been a few issues on iOS:
iOS does not handle the number sign (#) well at all. When providing regex patterns or match text, I either had to replace the # with its unicode \u0023, or what ultimately worked was replacing all occurrences of # with another character (such as 'S'), and then converting it back once regex did its thing. So this code I wrote often has to 'sanitize' the pattern or the input text before doing anything.
I couldn't get a Regex Pattern to perfectly parse a chord structure. It didn't fully parse a Chord with a bass note: it would successfully match such a Chord as a whole, but then I had to split those 2 components, parse them separately, and recombine them.
Regex is really a bit of voodoo, and I think it sucks that for something so confusing to many people, there are also different platform-dependent implementations of it. For example, Wiktor referred me to Regex patterns he wrote to help me solve the problem on www.regex101.com, that would WORK on that website, but these would not work on iOS, and NSRegularExpression would throw an error (often it had something to do with this # character)
My solution pays absolutely no regard to performance. I just wanted it to work.
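As a rough illustration of the pseudo-regex above (and not the code linked in this answer), here is a sketch that parses one whitespace-separated token at a time with an anchored pattern. The type `ParsedChord`, the function `parseChord` and the very short `chordTypes` list are all hypothetical; a real list would have many more signatures. `NSRegularExpression.escapedPattern(for:)` is used so that the # in a signature does not have to be replaced by hand.

```swift
import Foundation

struct ParsedChord: Equatable {
    let root: String   // e.g. "C#"
    let type: String   // "" means a major chord
    let bass: String?  // e.g. "G#" in "C#m/G#"
}

// Hypothetical, deliberately short list of chord-type signatures.
let chordTypes = ["maj13#11", "maj7#5", "maj7", "m7", "m", "7"]

// Parses a single token such as "C#maj7/G#"; returns nil if it is not a chord.
func parseChord(_ token: String) -> ParsedChord? {
    let types = chordTypes
        .map { NSRegularExpression.escapedPattern(for: $0) }
        .joined(separator: "|")
    // Group 1: root note, group 2: chord type, group 3: bass note.
    let pattern = "^([A-G][b#]?)(\(types))?(?:/([A-G][b#]?))?$"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []),
          let match = regex.firstMatch(in: token, options: [],
                                       range: NSRange(token.startIndex..., in: token))
    else { return nil }

    // Returns the text of a capture group, or nil if it did not participate.
    func group(_ index: Int) -> String? {
        let nsRange = match.range(at: index)
        guard nsRange.location != NSNotFound,
              let range = Range(nsRange, in: token) else { return nil }
        return String(token[range])
    }

    guard let root = group(1) else { return nil }
    return ParsedChord(root: root, type: group(2) ?? "", bass: group(3))
}

let chordLine = "F Gm C#maj7/G# F/C Am A7 A7/F# Bmaj13#11"
let parsed = chordLine.split(separator: " ").compactMap { parseChord(String($0)) }
```

Working per token avoids the word-boundary problems with # entirely, at the cost of needing a separate step if the text range of each chord within the original string is required.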

Regex that finds a line with exactly 3 words in it

I have a problem that requires me to write a regex that finds a line containing exactly 3 groups of characters (they could be words or numbers) and ending with another specific word. The way I had in mind was to find a pattern that ended in a space, and look for it 3 times. Assuming this is the correct way to go about it, I do not know how to find a space, but I thought it would look like .*"find a space"{3} endword$. Is this the way it would be done? Even if it is not the way to do it, how do you find a space? Any suggestions?
Assuming by three groups of words you would accept any non-space character, you could write:
/^\s*(?:\S+\s+){3}endword$/
The initial caret is to make sure you have exactly 3 non-space groups on the line.
Of course you need to consider whether things like control characters could appear, and adjust accordingly.
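A quick way to check the pattern against a few lines, sketched in Swift (the function name is made up for illustration, and "endword" stands in for the specific word, as above):

```swift
import Foundation

// True if `line` consists of exactly three whitespace-separated groups
// followed by the literal word "endword".
func hasThreeGroupsBeforeEndword(_ line: String) -> Bool {
    let pattern = #"^\s*(?:\S+\s+){3}endword$"#
    return line.range(of: pattern, options: .regularExpression) != nil
}

let ok = hasThreeGroupsBeforeEndword("one two 3 endword")
let tooFew = hasThreeGroupsBeforeEndword("one two endword")
let tooMany = hasThreeGroupsBeforeEndword("one two three four endword")
```

Here `ok` is true and the other two are false, because the caret pins the first group to the start of the line and `{3}` leaves no room for a fourth group before endword.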
Depending on your flavor, something like the below would do it:
\b+.+?\b+.+?\b+.+?\bendword$
This makes use of the word boundary mark (\b) and non-greedy repetitions (+?), so it may be slightly different in your specific implementation, especially if you're using something old like grep.

Smarter Autocapitalization

I've been looking around, and I am wondering whether there is a simple way to capitalize all words in a UITextField, while leaving certain words (such as of, the, or, etc.) lowercase, unless they are the first word of the phrase.
This is an
Example of the Effect I'm Trying to Convey.
One of the methods I've found is to search the text field value for the certain words and replace them with lowercase versions, as the user types a new word or character, perhaps listening for the space bar.
I'm not sure if the method above is best practice, or whether my searches haven't been broad enough to find a solution already in the mix.
I was originally thinking something along these "pseudocode" lines:
When value of textfield is changed
    Get current value textfield
    For each word in value:
        If the word matches ("For", "Of", "The", etc.) and the word is not the first word in the value:
            Change the word to lowercase, and replace word
        Go to next word
My actual question is mainly one of performance. Would this method be overly strenuous on my application? If so, are there any better solutions?
Thank you all for your assistance!
Update:
Thanks to holex, cluemein, and others who have already commented and answered. I will try your solutions when I get the opportunity to do so.
A better way than converting the words to lowercase is to capitalize the words that are NOT those words you specified. Set up if statements to capitalize the beginning letter of the first word, and to capitalize the words following that if they are not the words you specified. Then, if you want to make sure the specified words weren't capitalized after the first word, use an else statement. "pseudocode" example:
Capitalize letter of first word;
Move on to next word;
While not end of textfield (or while typing):
    if word is not ("the"|"and"|"of"|"or"|...):
        capitalize first letter;
    else:
        set first letter to lowercase;
    move to next word at space;
This will on average be roughly twice as fast, in terms of runtime, as going back through the text looking for the specified words. This isn't the code you would use, but the algorithm you would implement. Also, take into account what holex said about spaces. I leave how you implement this algorithm up to you. Just to clarify, this algorithm handles both auto-capitalizing and auto-setting to lowercase.
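One possible shape of that algorithm in Swift, as a sketch only: the function name and the contents of `minorWords` are assumptions, and hooking it up to a UITextField (for example when the text changes) is left out.

```swift
import Foundation

// Hypothetical list of words that stay lowercase unless they come first.
let minorWords: Set<String> = ["a", "an", "and", "for", "of", "or", "the"]

// Capitalizes the first letter of every word except the minor words,
// which are lowercased unless they are the first word of the phrase.
func smartCapitalized(_ text: String) -> String {
    // Keep empty pieces so runs of spaces survive the round trip.
    let words = text.split(separator: " ", omittingEmptySubsequences: false)
    var output: [String] = []
    for (index, word) in words.enumerated() {
        let lowered = word.lowercased()
        if index > 0 && minorWords.contains(lowered) {
            output.append(lowered)
        } else {
            // Only the first letter is changed; the rest is left as typed.
            output.append(word.prefix(1).uppercased() + String(word.dropFirst()))
        }
    }
    return output.joined(separator: " ")
}

let title = smartCapitalized("the lord of the rings")
```

This walks the text once per call, which is cheap for the length of a typical text field value.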
