Can the intersection of 2 non-regular languages be a regular language? - automata

Can the intersection of 2 non-regular languages be a regular language?

Suppose that the two non-regular languages are disjoint, that is, they have no strings in common. The intersection of these 2 languages is then the empty set, since no string exists in both languages.
The empty set is a regular language, so yes, the intersection of two non-regular languages can be regular.
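As a concrete illustration, here is a minimal sketch using one pair of languages chosen for this example (they are not from the answer above): { a^n b^n | n >= 1 } and { c^n d^n | n >= 1 }. Both are non-regular, and a brute-force search over short strings finds no common member. The helper names are hypothetical, and a bounded search illustrates the claim rather than proving it.

```python
from itertools import product

def in_anbn(s):
    """Membership test for { a^n b^n | n >= 1 }."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n

def in_cndn(s):
    """Membership test for { c^n d^n | n >= 1 }."""
    n = len(s) // 2
    return n >= 1 and s == "c" * n + "d" * n

def common_strings(max_len=6):
    """List every string over {a, b, c, d} up to max_len that lies in both languages."""
    found = []
    for length in range(max_len + 1):
        for letters in product("abcd", repeat=length):
            s = "".join(letters)
            if in_anbn(s) and in_cndn(s):
                found.append(s)
    return found
```

The search comes back empty because a string cannot start with both a and c, which is the whole argument in miniature.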

Suppose L and L' are non-regular languages. Can we conclude that L intersection L' is not regular?
The answer to this question can be found here

Related

Is $E_{LBA}$ a Turing recognizable language?

I know that $E_{LBA} = \{ \langle M \rangle \mid L(M) = \emptyset \}$ is an undecidable language, but is it also recognizable? It seems that its complement is recognizable, since a machine could enumerate all strings and see if any belong to the language. If both were recognizable, then $E_{LBA}$ would be decidable, but it isn't, which leads me to think it isn't recognizable. Is this true?
Indeed, the language of all Turing-machine encodings which accept the empty language is:
undecidable, since there is no TM which answers yes for strings in the language and no for strings not in the language;
co-recursively-enumerable, in that there is a TM which answers no for every string not in the language (using dovetailing, simulate the encoded machine on all input strings in an interleaved fashion; if it accepts any of them, you will eventually find out and answer no);
not recursively enumerable (or recognizable), because if it were, it would be decidable, given that we know it is co-recursively-enumerable.
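The dovetailing argument can be sketched in code. This is only an illustration under stated assumptions: `accepts_within(w, steps)` is a hypothetical stand-in for a step-bounded simulator of the encoded machine, and `max_budget` exists only so the demo can stop; a real semi-decider would run forever on a machine whose language is empty.

```python
from itertools import count, product

def strings_up_to(alphabet, max_len):
    """Yield every string over alphabet of length 0..max_len."""
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

def find_accepted_string(accepts_within, alphabet, max_budget=None):
    """Dovetail: in round k, run every string of length <= k for k steps.

    Returns a witness string if one is accepted, which shows the
    language is non-empty. Returns None only when max_budget is hit.
    """
    rounds = count(0) if max_budget is None else range(max_budget + 1)
    for k in rounds:
        for w in strings_up_to(alphabet, k):
            if accepts_within(w, k):
                return w
    return None

# Toy machine (assumption): accepts only "abba", after 2 * len(w) steps.
def toy_machine(w, steps):
    return w == "abba" and steps >= 2 * len(w)
```

With the toy machine, `find_accepted_string(toy_machine, "ab")` returns the witness `"abba"`; no single simulation is ever run to completion, which is what keeps one non-halting input from blocking the search.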

How can I tell that a language is context-free from first sight?

My professor expects us to quickly tell if a given language is regular, context-free but not regular, or not context-free (in other words, without drawing a PDA, writing a context-free grammar, and using the pumping lemma for context-free languages).
I'm aware of tips that help us quickly tell what a regular language is at first sight, but not whether or not a language is context-free.
Thank you.
Of course, there is no universal answer. But there are some general patterns, showing up in different variants, that context-free languages (CF) can or cannot express. Things CF can do (and REG cannot):
count simultaneously in two places like in a^n b^n,
also repeatedly like in a^n b^n a^m b^m
or nested like in a^n b^m a^m b^n
palindromic patterns, i.e. w followed by the reverse of w
count the number of one letter against another like in "words with an equal number of a and b" or "words with 5 more a than b"
Typical things CF cannot do:
count simultaneously in three places like in a^n b^n c^n
count simultaneously twice in two crossing pairs of places like in a^n b^m a^n b^m
two ordered copies like ww
compare the numbers of three letters like in "words with an equal number of a, b, and c".
With these patterns in mind, you should be able to determine context-freeness of most common example languages.
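To see why the nested pattern is within reach of a pushdown automaton, here is a minimal sketch of a single-stack recognizer for a^n b^m a^m b^n. It assumes n, m >= 1 to keep the blocks unambiguous, and the function name is hypothetical. The crossing pattern a^n b^m a^n b^m has no analogous trick, since the count needed first is buried under the one pushed last.

```python
def accepts_nested(s):
    """Recognize { a^n b^m a^m b^n : n, m >= 1 } with one stack."""
    stack = []
    i = 0
    # Block 1: push one marker per leading 'a'.
    while i < len(s) and s[i] == "a":
        stack.append("A")
        i += 1
    n = i
    # Block 2: push one marker per 'b'.
    while i < len(s) and s[i] == "b":
        stack.append("B")
        i += 1
    m = i - n
    # Block 3: each 'a' must cancel a 'B' (the innermost pair).
    while i < len(s) and s[i] == "a" and stack and stack[-1] == "B":
        stack.pop()
        i += 1
    # Block 4: each 'b' must cancel an 'A' (the outermost pair).
    while i < len(s) and s[i] == "b" and stack and stack[-1] == "A":
        stack.pop()
        i += 1
    return n >= 1 and m >= 1 and i == len(s) and not stack
```

For instance, `accepts_nested("aabbbaaabb")` is true (n = 2, m = 3), while the crossing string `"aabaab"` is rejected.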

Java underscore equivalent in F#?

In Java it's possible to write 1_000_000 instead of 1000000 for better readability. Is there something equivalent in F#?
This question was already asked on the feature request page, and the current status of this change request is "planned" and "approved in principle".
So it may be implemented in one of the next releases.
You can find more information about this request (like a summary of the feature, the motivation and suggested implementation details) on the GitHub page for F#:
Summary
Allow underscores between any digits in numeric literals. This feature enables you, for example, to separate groups of digits in numeric literals, which can improve the readability of your code.
For instance, if your code contains numbers with many digits, you can use an underscore character to separate digits in groups of three, similar to how you would use a punctuation mark like a comma, or a space, as a separator.
Motivation
This is a popular feature in other languages. Some other languages with a similar feature:
Perl
Ruby
Java 7
C++11 (use single quote)
just to name a few...
Detailed design
You can place underscores only between digits. You cannot place underscores in the following places:
At the beginning or end of a number
Adjacent to a decimal point in a floating point literal
Prior to an F or L or other suffix
In positions where a string of digits is expected
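The placement rules above can be made concrete with a small checker. This is a sketch in Python, not the F# lexer: `underscores_well_placed` is a hypothetical helper, it covers only decimal literals with an optional fractional part and an optional letter suffix, and allowing runs of several underscores is an assumption.

```python
import re

# One or more digits, optionally followed by groups of underscores
# that are themselves followed by more digits.
_DIGITS = r"[0-9]+(?:_+[0-9]+)*"

# Digit group, optional fractional part, optional letter suffix (e.g. F, L).
_LITERAL = re.compile(rf"{_DIGITS}(?:\.{_DIGITS})?[A-Za-z]*")

def underscores_well_placed(literal):
    """True if every underscore in the literal sits between two digits."""
    return _LITERAL.fullmatch(literal) is not None
```

Under these assumptions `1_000_000` and `1_000L` pass, while `_1000`, `1000_`, `3_.14`, `3._14` and `3.14_F` are rejected, one for each forbidden position in the list.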

If a language a^n b^m has some relation between n and m, does that imply the language cannot be regular? Am I correct? [duplicate]

I know a^n b^n for n > 0 is not regular by the pumping lemma, but I would imagine a*b* to be regular, since the runs of a and b don't have to be the same length. Is there a proof for it being regular or not?
Answer to your question:
imagine a*b* to be regular, Is there a proof for it being regular or not?
No need to imagine: the expression a*b* is a regular expression (RE), and regular expressions exist only for regular languages. If a language is not regular, then no regular expression describes it, and if a language is regular, then we can always represent it by some regular expression.
Yes, a*b* represents a regular language.
Language description: any number of a's followed by any number of b's (by any number I mean zero or more times, so the empty string ^ is included). Some example strings are:
{^, a, b, aab, abbb, aabbb, ...}
DFA for RE a*b* will be as follows:
      a-            b-
      ||            ||
      ▼|            ▼|
---►((Q0))---b---►((Q1))
In figure: `(())` means final state, so both `{Q0, Q1}` are final states.
You need to understand the following basic concept:
What exactly is a regular language? And why is an infinite language like `a*b*` regular, whereas languages like `{ a^n b^n | n > 0 }` are not?
A language (a set of strings) is called regular if only a bounded (finite) amount of information needs to be stored at any instant while processing the strings of the language.
So, what is 'bounded' information?
For example: Consider a fan 'on'/'off' switch. By viewing fan switch we can say whether the fan is in the on or off state (this is bounded or limited information). But we cannot tell 'how many times' a fan has been switched to on or off in the past! (to memorize this, we require a mechanism to store an 'unbounded' amount of information to count — 'how many times' e.g. the meter used in our cars/bikes).
The language { a^n b^n | n > 0 } is not a regular language because here n is unbounded (it can be arbitrarily large). To verify strings in the language a^n b^n, we need to remember how many a symbols there were, and counting them requires unbounded memory because the number of a symbols in the string can be arbitrarily large!
That means an automaton can only process strings of the language a^n b^n if it has unbounded memory, e.g. a PDA.
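The need for a counter can be made concrete with a minimal sketch (the function name is hypothetical). The variable `count` may grow without bound as the input gets longer, which is exactly the resource a finite automaton lacks.

```python
def accepts_anbn(s):
    """Return True iff s is a^n b^n for some n > 0."""
    count = 0  # unbounded counter: Python ints grow as needed
    i = 0
    # Count the leading a's.
    while i < len(s) and s[i] == "a":
        count += 1
        i += 1
    if count == 0:
        return False
    # Cancel one counted 'a' per following 'b'.
    while i < len(s) and s[i] == "b":
        count -= 1
        i += 1
    # Accept only if the whole input was consumed and the counts matched.
    return i == len(s) and count == 0
```

By contrast, recognizing a*b* only requires remembering whether a b has been seen yet, which fits in a fixed number of states.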
Whereas a*b* is regular by its nature, because the only restriction is a bounded one: b may come after some a, and a cannot come after b. That is why every string of this language can be processed (or recognized) by an automaton with finite memory, and finite automata are exactly the class of automata whose memory is finite. In a finite automaton, the finite amount of memory takes the form of states.
(Memory in a finite automaton is present in the form of the states Q, and by definition any such automaton has only finitely many states. Hence finite automata have finite memory, which is why the class of automata for regular languages is called finite automata. You can think of a finite automaton as a CPU without memory that has only a finite register to remember its internal state.)
Finite states ⇒ finite memory ⇒ only languages that need a finite amount of stored information at any instant while processing a string can be handled ⇒ such a language is called a regular language.
The absence of external memory is the limitation of finite automata; put differently, this limitation of finite automata defines the class of languages called regular languages.
You should read the other answer, "finiteness of regular language", to learn the scope of regular languages.
Side note:
The language { a^n b^n | n > 0 } is a subset of a*b*.
Also, a language such as { a^n b^n | 10^100 > n > 0 } is regular: a large set, but regular because n is bounded, so the language is finite and both a finite automaton and a regular expression exist for it.
You should also read: How to prove a language is regular?
The proof is: ((a*)(b*)) is a well-formed regular expression, hence it matches a regular language. a*b* is a syntactic shorthand for the same expression.
Another proof: regular languages are closed under concatenation. a* is a regular language and b* is a regular language, therefore their concatenation, a*b*, is also a regular language.
You can build an automaton for it:
0 ->(a) 1
0 ->(b) 2
1 ->(a) 1
1 ->(b) 2
2 ->(b) 2
2 ->(a) 3
3 ->(a,b) 3
where only 3 is not an accepting state, and prove that the language is a*b*.
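The transition list above can be written down as a table and simulated. The sketch below also compares the automaton with Python's `re` pattern `a*b*` on every string up to a small length; that is a sanity check rather than a proof, and the names `DELTA`, `dfa_accepts` and `agrees_with_regex` are made up for this example.

```python
import re
from itertools import product

# Transition table copied from the list above; state 0 is the start state.
DELTA = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 3, (2, "b"): 2,
    (3, "a"): 3, (3, "b"): 3,  # state 3 is the dead state
}
ACCEPTING = {0, 1, 2}

def dfa_accepts(s):
    """Run the DFA on s (a string over {a, b}) and report acceptance."""
    state = 0
    for ch in s:
        state = DELTA[(state, ch)]
    return state in ACCEPTING

def agrees_with_regex(max_len=8):
    """Check the DFA against a*b* on all strings of length <= max_len."""
    pattern = re.compile(r"a*b*")
    for length in range(max_len + 1):
        for letters in product("ab", repeat=length):
            s = "".join(letters)
            if dfa_accepts(s) != (pattern.fullmatch(s) is not None):
                return False
    return True
```

The only way to reach the dead state 3 is to read an a after a b, which matches the informal description of the language.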
To prove that a language is regular, it is sufficient to show either:
1) There exists some DFA that recognizes it. In this case, the DFA is trivial.
2) The language can be expressed as a regular expression, as mentioned in another answer. a*b* is a regular expression to recognize this language.
A regular language is a language that can be expressed with a regular expression or with a deterministic or non-deterministic finite automaton (state machine).
A language is a set of strings made up of characters from a specified alphabet, or set of symbols. Every language is a subset of the set of all strings over its alphabet, and the regular languages are a subclass of all such languages.
A closure property is a statement that a certain operation on languages, when applied to languages in a class (e.g., the regular languages), produces a result that is also in that class.
This RE describes the language whose strings have all their a's (if any) before all their b's,
that is, the language of strings that do not contain the substring ba.
A regular language need not be a subset of a given context-free language. For example, a*b* is regular, comprising all the strings made of a run of a's followed by a run of b's. It is not a subset of a^n b^n, but a superset. (As a class, however, the regular languages are contained in the context-free languages.)

OCR - most "different" or "recognizable" ASCII characters?

I am looking for a way to determine the most "different" or "recognizable" N ASCII characters... For example, if N = 10, what would be the most different N characters in the ASCII set from 0x21 to 0x7E? Obviously, the character "X" is very different from "O" (the letter), but "O" (the letter) is very similar to "0" (zero). Assuming a restricted OCR character subset, such that zero and the letter O would be detected as one or the other only, and one didn't have to worry about whether it was a zero or a letter O, what would be the most different N characters that typical OCR engines (for example Tesseract) recognize easily from a poor quality input image? Assumptions can be made, such as that "+" and "t" could widely be mistaken for one another, so that each input character, whether it's "+" or "t", would only correspond to one or the other.
Thanks,
Ben
Unfortunately I don't think there will be a single unique answer for this.
It'll depend on the font: Compare the different ways that 0, f, s are represented and also stylistic flourishes.
It'll depend on the type of damage the characters receive before being scanned, some may be more resilient against smudging, others against cuts, others against over-writing.
If you're looking for a representation that's best at surviving being printed, scanned and OCRed, then maybe a 1D or 2D barcode would be a better choice?
Only one way to answer this question: test it. Create a set of samples for each letter, and run OCR on each sample. The letters that OCR gets right the most often are the most "recognizable"; the letters that OCR gets wrong most often are the most "different".
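The test-it approach can be sketched as a small harness. Everything here is an assumption made for illustration: `recognize` stands for whatever OCR call you use (for example a wrapper around Tesseract), `samples` is your own labelled set of degraded images, and ranking by per-character accuracy is just one possible scoring rule.

```python
from collections import Counter

def rank_by_accuracy(samples, recognize):
    """Rank characters from most to least reliably recognized.

    samples:   iterable of (true_char, image) pairs
    recognize: callable mapping an image to the recognized character
    """
    total = Counter()
    correct = Counter()
    for true_char, image in samples:
        total[true_char] += 1
        if recognize(image) == true_char:
            correct[true_char] += 1
    # Highest accuracy first; ties are broken alphabetically.
    return sorted(total, key=lambda c: (-correct[c] / total[c], c))

# Stand-in data: each "image" is simply the text a fake OCR engine returns.
fake_samples = [
    ("X", "X"), ("X", "X"),
    ("O", "O"), ("O", "0"),
    ("t", "+"), ("t", "+"),
]

def fake_ocr(image):
    return image
```

With the stand-in data, `rank_by_accuracy(fake_samples, fake_ocr)` gives `['X', 'O', 't']`; taking the first N entries of the ranking from a real sample set yields a candidate character subset for the font and damage types you tested.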

Resources