Inverse homomorphism of (00+1)* - automata

I found an example of the inverse homomorphism of the regular expression (00+1)* (on page 131 of the Hopcroft, Motwani, Ullman book).
If h(a)=01 and h(b)=10, then the author says that the inverse homomorphism of the given regular expression is the regular expression (ba)*.
But there are strings 00 and 1 in the language of (00+1)* which cannot be represented by any string in the language of (ba)*.
Is this example wrong, or am I thinking in the wrong direction?

It is not strings in the language (ba)* that represent strings in (00+1)* but the other way round. The homomorphism h maps symbols from the alphabet of (ba)* to strings over the alphabet of (00+1)*. Therefore the INVERSE homomorphism maps the other way round.
The inverse image contains all strings over {a, b} whose image under h lies in (00+1)*. As you observe correctly, 00 and 1 do not have preimages under h. Therefore they do not contribute anything. This is how inverse homomorphisms differ from non-inverse ones. The decisive fact is that all the strings that do contribute, contribute strings from (ba)*.
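The claim can be checked by brute force for short strings. The following is only a sketch in Python (the names H, h and inverse_image are made up for illustration): it enumerates every string over {a, b} up to a given length and keeps those whose image under h lies in (00+1)*.

```python
import re
from itertools import product

# The homomorphism from the question: h(a) = 01, h(b) = 10.
H = {"a": "01", "b": "10"}

# (00+1)* in textbook notation is (00|1)* in Python's regex syntax.
TARGET = re.compile(r"(?:00|1)*")


def h(word):
    """Apply the homomorphism symbol by symbol."""
    return "".join(H[symbol] for symbol in word)


def inverse_image(max_len):
    """All words over {a, b} of length <= max_len whose image is in (00+1)*."""
    found = set()
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if TARGET.fullmatch(h(word)):
                found.add(word)
    return found


print(sorted(inverse_image(6), key=len))
```

Every word that survives the filter has the form (ba)^k, and no word maps to 00 or 1, which is exactly why those two strings contribute nothing to the inverse image.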

NSRegularExpression not matching number sign (#)

I'm working on a Guitar Chord transposer, and so from a given text file, I want to identify guitar chords. e.g. G#, Ab, F#m, etc.
I'm almost there! I have run into a few problems already due to the number sign (hash tag), #.
For example, you can't include the number sign in your regex pattern. The NSRegularExpression will not initialize with this:
let fail: String = "\\b[ABCDEFG](b|#)?\\b"
let success: String = "\\b[CDEFGAB](b|\\u0023)?\\b"
I had to specifically provide the unicode character. I can live with that.
However, now that I have a NSRegularExpression object, it won't match these (sharps = number sign) when I have a line of text such as:
Am Bb G# C Dm F E
When it starts processing the G#, the sharp associated with that second capture group is not matched (i.e. the NSTextCheckingResult's second range has a location of NSNotFound). Note that it works for Bb: it matches the 'b'.
I'm wondering what I need to do here. It would seem the documentation doesn't cover this case of '#', which IS in fact sometimes used in regex patterns (I think related to comments or something).
One thing that would be great would be to not have to look up the unicode identifier for a #, but just use it as a String "#" then convert that so it plays nicely with the pattern. There exists the chance that \u0023 is in fact not the code associated with # ...
The \b word boundary is a context-dependent construct. It matches in 4 contexts: 1) between the start of the string and a word char, 2) between a word char and the end of the string, 3) between a word char and a non-word char, and 4) between a non-word char and a word char.
Your regex is written in such a way that ultimately the regex engine sees a \b after # and that means a # will only match if there is a word char after it.
If you replace \b with (?!\w), a negative lookahead that fails the match if there is a word char immediately to the right of the current location, it will work.
So, you may use
\\b[CDEFGAB](b|\\u0023)?(?!\\w)
Details
\b - a word boundary
[CDEFGAB] - a char from the set
(b|\\u0023)? - an optional sequence of b or #
(?!\\w) - a negative lookahead that fails the match if there is a word char immediately to the right of the current position. Note that a failed lookahead causes backtracking into the preceding optional group; to avoid that, add + after the ? (making the quantifier possessive) so the engine cannot backtrack into that group.
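To see the difference between the two patterns, here is a small sketch using Python's re module rather than NSRegularExpression (so the escaping differs and # is written literally; the variable names are only for illustration). It runs both patterns over the line from the question, and then shows the backtracking effect mentioned above on the input A#m.

```python
import re

LINE = "Am Bb G# C Dm F E"

# Original pattern: a trailing \b cannot match between '#' and a space,
# because both are non-word characters.
WITH_BOUNDARY = re.compile(r"\b[CDEFGAB](b|#)?\b")

# Suggested pattern: only require that no word char follows.
WITH_LOOKAHEAD = re.compile(r"\b[CDEFGAB](b|#)?(?!\w)")


def matches(pattern, text):
    """Return the full text of every match, left to right."""
    return [m.group(0) for m in pattern.finditer(text)]


boundary_hits = matches(WITH_BOUNDARY, LINE)
lookahead_hits = matches(WITH_LOOKAHEAD, LINE)

# Backtracking: after '#' the lookahead sees 'm' and fails, so the engine
# gives the '#' back and matches the bare letter instead.
backtracked = matches(WITH_LOOKAHEAD, "A#m")

print(boundary_hits)
print(lookahead_hits)
print(backtracked)
```

With the trailing \b the G# is reported as a bare G, while the lookahead version keeps the sharp. Neither pattern matches Am or Dm, since nothing in them accounts for the trailing m.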
(I'd like to first say #WiktorStribiżew has been a tremendous help and what I am writing now would not have been possible without him! I'm not concerned about StackOverflow points and rep, so if you like this answer, please upvote his answer.)
This issue took many turns and had a few issues going on. Ultimately this question should be called How do I use Regex on iOS to detect Musical Chords in a text file?
The answer is (so far), not simply.
CRASH COURSE IN MUSIC THEORY
In music you have notes. Each is made up of a letter from A to G and an optional symbol called an accidental. (A note relates to the acoustic frequency of the sound you hear when that note is played.) An accidental can be a flat (represented as a ♭ or simply a b) or a sharp (represented as a ♯ or simply a #, as these are easier to type on a keyboard). An accidental makes a note a semitone higher (#) or lower (b). As such, an F# is the same acoustic frequency as a Gb. On a piano, the white keys are notes with no accidentals, and the black keys represent notes with an accidental. A piece of music generally doesn't mix accidental types: it will use either flats or sharps throughout. (This depends on the musical key of the composition, but that is not very relevant here.)
In terms of regex, you have something like [ABCDEFG](b|#)? to determine the note. In reality it's more complicated.
Then, a musical chord is composed of the root note and its chord type. There are over 50 types of chords. Each has a unique 'text signature'. Also, a 'major' chord has an empty signature. So in terms of pseudo-regex you have for a chord:
[ABCDEFG](b|#)?(...|...|...)?
where the first part you recognize as the note (as before), and the last optional part determines the chord type. The different types were omitted, but can be as simple as m (for a minor chord) or maj7#5 (for a major 7th chord with an augmented 5th; don't worry about it, just know there are many string constants to represent a chord type).
Then finally, with guitar you often have a corresponding bass note that changes the chord's tonality somewhat. You denote this by adding a slash and then the note, giving the general pseudoform:
[ABCDEFG](b|#)?(...|...|...)?(/[ABCDEFG](b|#)?)? // NOT real Regex
real examples: C/F or C#m/G# and so on
where the last part has a slash then the same pattern to recognize a note.
So putting these all together, in general we want to find chords that could take on many forms, such as:
F Gm C#maj7/G# F/C Am A7 A7/F# Bmaj13#11
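For illustration only, here is how the pseudoform could be turned into a working pattern in Python (not the Swift code mentioned below, and not NSRegularExpression). The list CHORD_TYPES is a made-up, tiny subset of chord signatures, and the lookbehind/lookahead guards are my own assumption about where a chord may start and end.

```python
import re

# Hypothetical, deliberately tiny subset of chord-type signatures.
CHORD_TYPES = ["maj13#11", "maj7", "m7", "m", "7"]

NOTE = r"[A-G][b#]?"

# Try longer signatures first so "maj7" is not cut short by "m".
TYPES = "|".join(
    re.escape(t) for t in sorted(CHORD_TYPES, key=len, reverse=True)
)

CHORD_RE = re.compile(
    rf"(?<![\w#/])(?P<root>{NOTE})(?P<type>{TYPES})?"
    rf"(?:/(?P<bass>{NOTE}))?(?![\w#/])"
)


def find_chords(text):
    """Return (root, chord type, bass note or None, (start, end)) per chord."""
    return [
        (m.group("root"), m.group("type") or "", m.group("bass"), m.span())
        for m in CHORD_RE.finditer(text)
    ]


for chord in find_chords("F Gm C#maj7/G# F/C Am A7 A7/F# Bmaj13#11"):
    print(chord)
```

The guards replace \b, which does not behave well next to # or /. A real chord list would be much longer, and an ordinary word such as "Ab" would still be read as a chord, so this only works on lines that are known to contain chords.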
I was hoping to find one Regex to rule them all. I ended up writing code that works, though it seems like I kind of hacked around a bit to get the results I desired.
You can see this code here, written in Swift. It is not complete for my purposes, but it will parse a string, return a list of Chord Results and their text range within the original string. From there you would have to finish the implementation to suit your needs.
There have been a few issues on iOS:
iOS does not handle the number sign (#) well at all. When providing regex patterns or match text, I either had to replace the # with its unicode \u0023, or, what ultimately worked, replace all occurrences of # with another character (such as 'S') and then convert it back once regex did its thing. So this code I wrote often has to 'sanitize' the pattern or the input text before doing anything.
I couldn't get a regex pattern to perfectly parse a chord structure. It would successfully match a chord with a bass note but not fully parse it, so I had to split those 2 components, parse them separately, then recombine them.
Regex is really a bit of voodoo, and I think it sucks that for something so confusing to many people, there are also different platform-dependent implementations of it. For example, Wiktor referred me to regex patterns he wrote on www.regex101.com to help me solve the problem. They would WORK on that website, but they would not work on iOS, and NSRegularExpression would throw an error (often it had something to do with this # character).
My solution pays absolutely no regard to performance. I just wanted it to work.

Converting a function (as a string) to be graphed by TChart?

I am getting the user to input a function, e.g. y = 2x^2 + 3, as a string. What I am looking to do is to enter that string into TChart and for TChart to graph the function.
As far as I know, TChart/TeeChart will only accept X values that are assigned values, e.g. -10 to 10 for X, so the X value would need to be calculated each time - this isn't an issue.
The issue is getting each part of the inputted function and substituting the X-values into each part. The workaround I have found is to get the user to enter the degree for each part of the function, e.g. 2 for X^2, 3 for X^3, etc. but is there a cleaner way of doing this?
If I could convert the inputted string into a Mathematical formula which TeeChart would accept, that would be the ideal outcome.
Saying that you can't use external units effectively makes your question unanswerable in the SO format, because the topic is far broader (and deeper) than can comfortably be dealt with in SO's Q&A format. So the following is at best an outline:
If you want to, or have to, write a DIY expression evaluator, one way to do it is to proceed as follows:
Write yourself a class that takes a string as input and snips it up into a series of symbols, aka "tokens", which represent the component parts of the expression, e.g. numbers, operators, parentheses, names of functions, names of variables, etc. These tokens might themselves be records or class instances and need to include a mechanism for storing values associated with particular symbols (e.g. the tokens that represent numbers in the input). This step is called "tokenisation" or "lexing". Store the resulting symbols in a list or similar structure. This class needs to implement a mechanism to retrieve the next symbol from the list (usually, this method is called something like "NextToken") and indicate whether there are any symbols left. This class also needs a mechanism to "put back" a symbol (or, equivalently, "peek" the symbol following the current one).
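As a rough illustration of that tokenising class, here is a sketch in Python rather than Delphi (the shape carries over directly). The names Token, Lexer, next_token, peek and has_more are my own choices, and the set of recognised operators is an assumption.

```python
import re
from typing import NamedTuple, Optional


class Token(NamedTuple):
    kind: str                      # "NUMBER", "NAME" or "OP"
    text: str                      # the characters as they appeared
    value: Optional[float] = None  # numeric value, for NUMBER tokens only


TOKEN_RE = re.compile(
    r"(?P<NUMBER>\d+(?:\.\d+)?)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[-+*/^()=])"
)


class Lexer:
    def __init__(self, source):
        self.tokens = []
        self.index = 0
        pos = 0
        while pos < len(source):
            if source[pos].isspace():
                pos += 1
                continue
            m = TOKEN_RE.match(source, pos)
            if m is None:
                raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
            kind, text = m.lastgroup, m.group()
            value = float(text) if kind == "NUMBER" else None
            self.tokens.append(Token(kind, text, value))
            pos = m.end()

    def has_more(self):
        """True while there are tokens that have not been consumed."""
        return self.index < len(self.tokens)

    def peek(self):
        """Look at the next token without consuming it (None at the end)."""
        return self.tokens[self.index] if self.has_more() else None

    def next_token(self):
        """Consume and return the next token (None at the end)."""
        tok = self.peek()
        if tok is not None:
            self.index += 1
        return tok


lexer = Lexer("y = 2x^2 + 3")
print([t.text for t in lexer.tokens])
```

Here peek plays the role of the "put back" mechanism: the parser can inspect the upcoming token and decide whether to consume it.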
Then, write yourself a software machine which takes the tokenised symbols and "evaluates" the list of symbols to produce the (mathematical) result you're after. This step is an order of magnitude or two more difficult than the tokenisation step. There are numerous ways to do it. As I said in a comment earlier, a recursive descent parser is probably the most tractable approach if you've never done anything like this before. There are countless examples in textbooks, but here's a link to an article about a Delphi implementation that should be understandable as an intro:
http://www8.umoncton.ca/umcm-deslierres_michel/Calcs/ParsingMathExpr-1.html
That article begins by noting that there are numerous pre-existing Delphi expression evaluators but makes the point that they are not necessarily the best place to start for someone wanting to learn how to write an evaluator/parser rather than just use one. Instead it goes through the coding of an evaluator to implement this simple expression grammar:
expression : term | term + term | term − term
term : factor | factor * factor | factor / factor
factor : number | ( expression ) | + factor | − factor
(the vertical bar | denotes ‘or’)
The article has a link to a second part which shows how to add exponentiation to the evaluator. This is trickier than it might sound and involves issues of ambiguity: e.g. how to evaluate, and what does it mean to write, an expression like
x^y^z
? This relates to the issue of "associativity": most operators are "left associative", which means that a chain such as a - b - c groups from the left, as (a - b) - c. The exponentiation operator is an example of the reverse: it is conventionally right associative, so x^y^z is read as x^(y^z).
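To make the expression/term/factor idea concrete, here is a compact sketch of a recursive descent evaluator, written in Python for brevity (it is not the code from the linked article). It assumes the "y =" part has already been stripped, allows repeated + and * rather than a single one, treats ^ as right associative, and accepts implicit multiplication such as 2x; all class and method names are my own.

```python
import re

# Characters the pattern does not recognise are silently skipped in this sketch.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+|[-+*/^()]")


class Parser:
    def __init__(self, text, variables):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0
        self.variables = variables

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expression(self):              # term {(+|-) term}
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):                    # factor {(*|/) factor}
        value = self.factor()
        while True:
            tok = self.peek()
            if tok == "*":
                self.advance()
                value *= self.factor()
            elif tok == "/":
                self.advance()
                value /= self.factor()
            elif tok is not None and (tok == "(" or tok.isalpha()):
                value *= self.power()  # implicit multiplication, e.g. 2x
            else:
                return value

    def factor(self):                  # unary + and -
        tok = self.peek()
        if tok == "+":
            self.advance()
            return self.factor()
        if tok == "-":
            self.advance()
            return -self.factor()
        return self.power()

    def power(self):                   # primary [^ factor]
        base = self.primary()
        if self.peek() == "^":
            self.advance()
            return base ** self.factor()  # recursing on the right: x^(y^z)
        return base

    def primary(self):                 # number | variable | ( expression )
        tok = self.advance()
        if tok is None:
            raise ValueError("unexpected end of input")
        if tok == "(":
            value = self.expression()
            if self.advance() != ")":
                raise ValueError("expected ')'")
            return value
        if tok[0].isdigit():
            return float(tok)
        if tok in self.variables:
            return self.variables[tok]
        raise ValueError(f"unexpected token {tok!r}")


def evaluate(text, x):
    parser = Parser(text, {"x": x})
    value = parser.expression()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return value


# One (x, y) pair per integer x from -10 to 10, ready to feed to a chart series.
points = [(x, evaluate("2x^2 + 3", x)) for x in range(-10, 11)]
```

Each grammar rule becomes one method, which is what makes this style approachable. The list points is the kind of data you would then add to the chart one pair at a time.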
Have fun!
By the way, you used to see suggestions to implement an evaluator using the "shunting yard algorithm"
http://en.wikipedia.org/wiki/Shunting-yard_algorithm
to convert an "infix" expression, where the operators are between the operands, as in 1 + 3 * 4, to RPN (reverse Polish notation), as used on older HP calculators. The reason to do that was that RPN makes for much more efficient evaluation of an expression than the infix equivalent. Ymmv, but personally I found that implementing the SY algorithm properly was actually trickier than learning how to write an evaluator in the expression/term/factor style.
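For comparison, here is a minimal sketch of the shunting-yard conversion and an RPN evaluator in Python. It handles only numbers, parentheses and the five binary operators; unary minus, variables and functions are left out, and the function names to_rpn and eval_rpn are my own.

```python
import operator
import re

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOC = {"^"}
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}


def to_rpn(text):
    """Convert an infix expression to a list of RPN tokens."""
    output, stack = [], []
    for tok in re.findall(r"\d+(?:\.\d+)?|[-+*/^()]", text):
        if tok in PRECEDENCE:
            # Pop operators that bind at least as tightly (strictly tighter
            # when the incoming operator is right associative).
            while stack and stack[-1] in PRECEDENCE and (
                PRECEDENCE[stack[-1]] > PRECEDENCE[tok]
                or (PRECEDENCE[stack[-1]] == PRECEDENCE[tok]
                    and tok not in RIGHT_ASSOC)
            ):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced ')'")
            stack.pop()  # discard the matching "("
        else:
            output.append(tok)  # a number goes straight to the output
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced '('")
        output.append(op)
    return output


def eval_rpn(tokens):
    """Evaluate RPN with a single stack of operands."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(float(tok))
    return stack[0]


print(to_rpn("1 + 3 * 4"), eval_rpn(to_rpn("1 + 3 * 4")))
```

The evaluation half is trivially simple, which is the attraction of RPN; the fiddly part is the precedence and associativity test in to_rpn.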
Fwiw, RPN is the basis of the Forth programming language, http://en.wikipedia.org/wiki/Forth_%28programming_language%29, so you could write a Forth implementation in Delphi if you wanted!

Extended Huffman Coding

I know this is not a coding issue, but since I found some Huffman questions here I am posting it here, as I still need this for my implementation. When doing extended Huffman coding, I understand that you form, for example, a1a1, a1a2, a1a3, etc. and multiply their probabilities. However, how do you get the codeword? For example, from the image below, how do you get that 0.6400 = 0 and 0.0160 = 10101, etc.?
First, let me describe how a Huffman tree works, then I will explain how extended Huffman encoding works.
Some terms: a codeword is a sequence of bits in our compressed, encoded output.
Terms like a1, a2 or a3 are our input characters; we can think of them as letters for now.
We have two rules:
More common letters map to shorter codewords than less common letters.
The two least likely letters have codewords of the same length.
These two requirements lead to a simple way of building a binary tree describing an optimum prefix code - THE Huffman Code.
Start with the two least likely letters. We know their codewords will be p0 and p1 for some prefix p. Now merge them, consider them as one super-letter, and find the two least common letters again.
Repeat until only two letters remain; their prefix is empty, so they get the codewords 0 and 1.
Right, now for the extended code, we just group a sequence of letters, pairs in your example, and treat them as one letter in a much larger alphabet.
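Here is a sketch of that procedure in Python. The single-letter probabilities 0.8, 0.02 and 0.18 are an assumption on my part: they are consistent with the 0.6400 and 0.0160 quoted in the question, but the image itself is not reproduced here. The function names are also my own.

```python
import heapq
from itertools import count, product


def extended_alphabet(probs, n=2):
    """Group letters into blocks of n and multiply their probabilities."""
    blocks = {}
    for combo in product(probs, repeat=n):
        p = 1.0
        for letter in combo:
            p *= probs[letter]
        blocks["".join(combo)] = p
    return blocks


def huffman_code(probs):
    """Repeatedly merge the two least likely (super-)letters."""
    tie = count()  # tie-breaker so equal probabilities never compare dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)
        p1, _, group1 = heapq.heappop(heap)
        # Everything in the first group gets a leading 0, the second a 1.
        merged = {s: "0" + w for s, w in group0.items()}
        merged.update({s: "1" + w for s, w in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


# Assumed letter probabilities (see the note above).
letters = {"a1": 0.8, "a2": 0.02, "a3": 0.18}
pairs = extended_alphabet(letters, n=2)
code = huffman_code(pairs)
average = sum(pairs[s] * len(code[s]) for s in pairs)

for symbol in sorted(pairs, key=pairs.get, reverse=True):
    print(symbol, round(pairs[symbol], 4), code[symbol])
```

The exact bit patterns depend on how ties are broken and on which branch is labelled 0, so they may not match a textbook table bit for bit. The codeword lengths do match: the pair with probability 0.64 gets a 1-bit codeword, and one of the pairs with probability 0.016 gets a 5-bit codeword.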
Source: http://www.ws.binghamton.edu/fowler/fowler%20personal%20page/EE523_files/Ch_03%20Huffman%20&%20Extended%20Huffman%20%28PPT%29.pdf

When to use V instead of a decimal in Cobol Pic Clauses

Studying for a test right now and can't seem to wrap my head around when to use "V" for a decimal instead of an actual decimal in PIC clauses. I've done some research but can't find anything I understand. I've only been learning COBOL for about a week, so is there a rule of thumb here? Thanks for your time.
You use an actual decimal point when you want to "output" a value which has decimal places: a report line, a position on a screen, or an item in an output file which is going to a "different" system which doesn't understand the format with an implied decimal place.
That's what the V is, it is an implied decimal place. It tells the compiler where to align results from calculations, MOVEs, whatever. Computer chips, and the machine instructions they support, don't know about actual decimal points for their internal processing.
COBOL is a language with fixed-length fields. The machine instructions don't need to know where the decimal point is (effectively it can deal with everything as integer values) but the compiler does, and the compiler has to do the correct scaling and alignment of results.
When storing data in your own files, use V, the implied decimal place.
For data which is to be "human readable", or read by a system which cannot understand your character set or cannot scale what looks like an integer, use an actual decimal point, . (for computer-readable stuff, you can sometimes use a separate scaling factor, if that is more convenient for the receiving system).
Basically, V for internal, . for external, should be a rule of thumb to get you there.
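If it helps to see the idea outside COBOL, here is a small Python sketch of what the implied decimal place amounts to. It is only an analogy, assuming unsigned values, and the function names are made up: the stored field holds digits only, and the position of the point lives in the picture, not in the data.

```python
from decimal import Decimal


def store_implied(value, int_digits, decimals):
    """Digits as they would sit in a 9(int_digits)V9(decimals) style field.

    No decimal point is stored; extra decimals are truncated in this sketch.
    """
    scaled = int(value * (10 ** decimals))
    return str(scaled).zfill(int_digits + decimals)


def read_implied(digits, decimals):
    """Recover the value by applying the scale that the picture implies."""
    return Decimal(int(digits)) / (10 ** decimals)


def edit_for_output(digits, decimals):
    """Insert an actual '.' for display (decimals must be at least 1)."""
    return digits[:-decimals] + "." + digits[-decimals:]


stored = store_implied(Decimal("3.5"), int_digits=3, decimals=2)
print(stored)                      # five digits, no point
print(read_implied(stored, 2))     # the value the program works with
print(edit_for_output(stored, 2))  # six characters, for a report line
```

The stored form takes five characters and the edited form takes six, which is the practical difference between the two kinds of PIC clause.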
Which COBOL are you using? I'm surprised it is not covered in your documentation.

OCR - most "different" or "recognizable" ASCII characters?

I am looking for a way to determine the most "different" or "recognizable" N ASCII characters... For example, if N = 10, what would be the most different N characters in the ASCII set from 0x21 to 0x7E? Obviously, the character "X" is very different from "O" (the letter), but "O" (the letter) is very similar to "0" (zero). Assuming a restricted OCR character subset, such that zero and the letter O would be detected as one or the other only, and one didn't have to worry about whether it was a zero or a letter O, what would be the most different N characters that typical OCR engines (for example Tesseract) recognize easily from a poor quality input image? Assumptions can be made, such as that "+" and "t" could widely be mistaken for one another, and thus each input character, whether it's "+" or "t", would only correspond to one or the other.
Thanks,
Ben
Unfortunately I don't think there will be a single unique answer for this.
It'll depend on the font: Compare the different ways that 0, f, s are represented and also stylistic flourishes.
It'll depend on the type of damage the characters receive before being scanned, some may be more resilient against smudging, others against cuts, others against over-writing.
If you're looking for a representation that's best at surviving being printed, scanned and OCRed, then maybe a 1D or 2D barcode would be a better choice?
Only one way to answer this question: test it. Create a set of samples for each letter, and run OCR on each sample. The letters that OCR gets right the most often are the most "recognizable"; the letters that OCR gets wrong most often are the most "different".
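A test harness for this could look roughly like the Python sketch below. The recognize argument is a hypothetical stand-in for whatever OCR call you use (nothing here depends on a particular engine), and the samples would be your own degraded images paired with the character each one really shows.

```python
from collections import Counter


def rank_by_accuracy(samples, recognize):
    """Rank characters from most to least often recognised correctly.

    samples:   iterable of (true_char, image) pairs
    recognize: hypothetical callable taking an image and returning a string
    """
    total, correct = Counter(), Counter()
    for truth, image in samples:
        total[truth] += 1
        if recognize(image) == truth:
            correct[truth] += 1
    # Highest accuracy first; ties are broken alphabetically.
    return sorted(total, key=lambda ch: (-correct[ch] / total[ch], ch))


def most_recognizable(samples, recognize, n):
    """The n characters with the best accuracy on the given samples."""
    return rank_by_accuracy(samples, recognize)[:n]
```

The result is only as good as the samples: it reflects the font, the kind of damage and the OCR engine used in the test, which is why there is no universal answer.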
