Deterministic Pushdown automa vs Non-deterministic pushdown Automata - automata

how can you show a) Deterministic pushdown automata (DPDA) are more powerful than finite automata and less powerful than a non-determinstic pushdown automata?

(1) First, show that any language that can be accepted by a finite automaton can also be accepted by a deterministic pushdown automaton. Recall that any language accepted by a finite automaton is accepted by a deterministic finite automaton, and a deterministic pushdown automaton can simulate a deterministic finite automaton simply by doing nothing interesting with its stack. Next, show there is a non-regular language accepted by a DPDA. 0^n 1^n is a good candidate. Prove this language is non-regular using the pumping lemma or Myhill-Nerode theorem, thenshow the DPDA that pushes on 0s and then switches to popping on 1s works.
(2) First, note that NPDAs can accept any language accepted by DPDAs since all DPDAs are also NPDAs that happen not to make use of nondeterminism. Next, find a language that has an NPDA but no DPDA. Even-length palindromes over the alphabet {0, 1} might work. There is a simple NPDA for this that nondeterministically guesses when the first half of the input has been read and switches from pushing to popping. To show there is no DPDA is more challenging. Perhaps you could argue as follows: suppose there were a DPDA. Then, in any configuration of the DPDA, only one transition would be possible. If string w leads to an accepting state in the DPDA and empties the stack, x00 may lead either to an accepting or non-accepting state (since x00 either may or may not be an even-length palindrome). This is a contradiction, though, so our DPDA does not exist. The same argument fails for NPDAs, by the way, because there can be multiple paths through, so one failed choice means nothing.

NDPA is more powerful than DPDA because we can add more transitions to it. It is possible for every language to add a transition. For some languages, we can construct DPDA there exist an NPDA but there are some languages that are accepted by NPDA but are not by DPDA. This is said to be powerful when it accepts more sets of languages than other automata.
In fact, it is more powerful than DFA(Deterministic finite automata) and NFA(Non-deterministic finite automata) also because, In the case of DFA and NFA, they are equivalent in power. So for every language accepted by DFA there exist an NFA and Vice-Versa. There is not any language for which we construct NFA but not DFA. Hence, we can’t convert NPDA to DPDA always and we can convert NFA to equivalent dfa.

Related

What is the point of the 4 grammars specified in Chomsky hierarchy?

I'm currently studying compiler's and am on the topic of "Chomsky Hierarchy and the 4 languages." But it beats me as to what the practical purpose of all this is?
It'd be great if I could see real-life examples of the 4 grammars: Unrestricted, CSG, CFG, Regular Grammer come to play.
I found online that Chomsky hierarchy along with the 4 grammars is used to evaluate proposals within cognitive science but this goes way over my head. It'd be great if someone could break it down for me, thanks a lot!
There is no practical value. That's the whole point.
Let me try to break that down a bit. It's useful to remember that Chomsky is a linguist --someone who studies human languages-- and that he was writing in the late 1950s when computational theory was not as well-developed as it is today. (To put it mildly.) His goal was to find a mathematical model which could provide some insights into the mechanisms by which human beings generate and understand sentences, and he took as his starting point a particular simple model of sentence generation.
In this model, a grammar is a function F, which transforms elements from an arbitrary sequence of symbols from some alphabet onto another sequence of symbols from the same alphabet. F is defined by a finite set of pairs (called productions) α → β. We then say that F(ω) = F(ζ) if the definition of F contains some pair α → β such that α is a substring of ω and ζ is the result of substituting a single instance of α in ω with β.
That's not very interesting in and of itself; we make that into a full language by starting with some designated starting sequence, normally represented as the single symbol S, and repeatedly apply F as many times as is necessary. (In all interesting grammars, the set so constructed is infinite, so it cannot actually be constructed. But we can imagine proceeding from the starting point until we find the sentence we wanted to generate.)
The problem with this model is that it can be used to describe an arbitrary Turing Machine. Or, if you like, an arbitrary computer program, although the equivalence is easier to see with a Turing Machine. In other words, it is at least theoretically possible to construct a finite grammar which will recognise strings consisting of the description of a Turing Machine (i.e., a program written in some programming language) followed by an input and an output only if the Turing Machine applied to the input would produce the output. In other words, there exists (in the mathematical sense) a grammar of this form which is computationally equivalent to a general purpose computer.
Unfortunately, that's not actually very useful if our goal is to understand sentences, because there is actually no algorithm for running computers backwards. The best we can do as a general solution is to enumerate all possible inputs and run the program on each of them until we find the output we hoped for. And that doesn't actually work because there is no limit to the amount of time the program might take to produce an output and no way to even know if the program will eventually come to an end. (This is called the "halting problem".) So we might get stuck on some possible input, and we'll never know if some other input might have produced the desired output.
As a result, we cannot tell whether the provided input was "grammatical", that is, whether it conformed to the grammar provided. And that's not just the case with the particular grammar we built to emulate Turing Machines. It means that we have no confidence that we can recognise sentences from any arbitrary grammar, and even if we stumble upon an answer, we have no way to limit how much time it might take to get there.
Clearly, this is not how human beings understand each other. So if it is to serve a practical purpose, we must restrict the possible grammars in some way to make them computationally feasible.
On the other end of the spectrum, a lot was known about finite-state machines. A finite-state machine is a Turing Machine without a tape; that is to say, it is simply a finite collection of states. In each state, the machine reads a single input symbol and uses it to decide what the next state will be. It turns out that finite-state machines can be modelled using a grammar (as above) restricted to very simple productions, each of which is either of the form A → a B or A → a, where a is a symbol from the "terminal alphabet" (that is, a word) and A and B are single grammatical symbols. These grammars are called "regular grammars" and they are computationally equivalent to what mathematicians call "regular expressions" (which are a small subset of what is recognised by "regex" libraries, but that's a whole other discussion).
Regular grammars are easy to parse. All that is needed to is trace through the state machine, so it can be done without backtracking in time proportional to the length of the input. But regular grammars are far too weak to be able to represent human language, or even most computer languages. As a simple example, algebraic expressions with parentheses cannot be recognised with a regular grammar (or with a finite-state machine) because there is no way to count the parenthesis depth; the finite-state machine has no memory at all (other than knowing which state it is in, and there are only a finite number of states).
So unrestricted grammars are too powerful to parse and regular grammars are too weak to be useful. (Useful for complex parsing problems, that is. There are certain applications for regular expressions, but parsing complete computer programs is not one of them.)
The next step, then, was to try to find a restriction on grammars which was still powerful enough to represent human language without being so powerful that parsing became impossible.
That, finally, is the origin of Chomsky's hierarchy. Between the two extremes described above (type 0 and type 3 grammars), Chomsky proposed two possible intermediate restrictions -- type 1 and type 2 grammars -- and proved a number of important properties about each of them.
While this work turned out to be fundamental in the development of formal language theory, it cannot really be said to have answered the question Chomsky started with. Type 2 grammars -- context-free grammars -- are indeed computationally tractable; they can be parsed with simple algorithms in polynomial time, and can represent a large number of useful languages. But they are still too weak to represent human language. In particular, context-free grammars cannot represent a language as simple as "all strings which contain two instances of the same substring". Type 1 grammars -- context-sensitive grammars -- can probably represent any useful language, and are not quite as unruly as unrestricted grammars, but they are still too powerful to parse. (Since the derivation steps in a context-sensitive grammar never get shorter, it is possible to enumerate all possible derivations from a starting point in order by length, which means that you can decide whether a sentence is generated by the grammar without running into the halting problem. But that's as good as it gets; that procedure takes exponential time and is not remotely feasible for non-trivial inputs.)
In the six decades since Chomsky published his seminal papers, a lot of work has been done to try to find useful intermediate restrictions between type 1 and type 2 grammars. And there has been a lot of useful study into algorithms for parsing context-free languages, which is of enormous utility in building compilers. All of this builds on the crucial work done by Chomsky and the other computational theorists whose work he built on -- Markov, Turing, Church and Kleene, just to name a few worthy of study. But Chomsky's original project remains unsolved.
So if your goal is to build a simple parser for a programming language, the Chomsky hierarchy is probably just an interesting footnote. But if you are interested in the academic study of formal language theory, there are still lots of interesting unsolved problems to work on.

Theory: LL(k) parser vs parser for LL(k) grammars

I'm concerned about the very important difference between the therms: "LL(k) parser" and "parser for LL(k) grammars". When a LL(1) backtracking parser is in question, it IS a parser for LL(k) grammars, because it can parse them, but its NOT LL(k) parser, because it does not use k tokens to look ahead from a single position in the grammar, but its exploring with backtracking the possible cases, regardless that it still uses k tokens to explore.
Am I right?
The question may break down to the way the look-ahead is performed. If the look-ahead is actually still processing the grammar with backtracking, that does not make it LL(k) parser. To be LL(k) parser the parser must not use the grammar with backtracking mechanism, because then it would be "LL(1) parser with backtracking that can parser LL(k) grammars".
Am I right again?
I think the difference is related to the expectation that LL(1) parser is using a constant time per token, and LL(k) parser is using at most k * constant (linear to the look-ahead) time per token, not an exponential time as it would be in the case of a backtracking parser.
Update 1: to simplify - per token, is the parsing LL(k) expected to run exponentially in respect to k or in a linear time in respect to k?
Update 2: I have changed it to LL(k) because the question is irrelevant to the range of which k is (integer or infinity).
An LL(k) parser needs to do the following at each point in the inner loop:
Collect the next k input symbols. Since this is done at each point in the input, this can be done in constant time by keeping the lookahead vector in a circular buffer.
If the top of the prediction stack is a terminal, then it is compared with the next input symbol; either both are discarded or an error is signalled. This is clearly constant time.
If the top of the prediction stack is a non-terminal, the action table is consulted, using the non-terminal, the current state and the current lookahead vector as keys. (Not all LL(k) parsers need to maintain a state; this is the most general formulation. But it doesn't make a difference to complexity.) This lookup can also be done in constant time, again by taking advantage of the incremental nature of the lookahead vector.
The prediction action is normally done by pushing the right-hand side of the selected production onto the stack. A naive implementation would take time proportional to the length of the right-hand side, which is not correlated with either the lookahead k nor the length of the input N, but rather is related to the size of the grammar itself. It's possible to avoid the variability of this work by simply pushing a reference to the right-hand side, which can be used as though it were the list of symbols (since the list can't change during the parse).
However, that's not the full story. Executing a prediction action does not consume an input, and it's possible -- even likely -- that multiple predictions will be made for a single input symbol. Again, the maximum number of predictions is only related to the grammar itself, not to k nor to N.
More specifically, since the same non-terminal cannot be predicted twice in the same place without violating the LL property, the total number of predictions cannot exceed the number of non-terminals in the grammar. Therefore, even if you do push the entire right-hand side onto the stack, the total number of symbols pushed between consecutive shift actions cannot exceed the size of the grammar. (Each right-hand side can be pushed at most once. In fact, only one right-hand side for a given non-terminal can be pushed, but it's possible that almost every non-terminal has only one right-hand side, so that doesn't reduce the asymptote.) If instead only a reference is pushed onto the stack, the number of objects pushed between consecutive shift actions -- that is, the number of predict actions between two consecutive shift actions -- cannot exceed the size of the non-terminal alphabet. (But, again, it's possible that |V| is O(|G|).
The linearity of LL(k) parsing was established, I believe, in Lewis and Stearns (1968), but I don't have that paper at hand right now so I'll refer you to the proof in Sippu & Soisalon-Soininen's Parsing Theory (1988), where it is proved in Chapter 5 for Strong LL(K) (as defined by Rosenkrantz & Stearns 1970), and in Chapter 8 for Canonical LL(K).
In short, the time the LL(k) algorithm spends between shifting two successive input symbols is expected to be O(|G|), which is independent of both k and N (and, of course, constant for a given grammar).
This does not really have any relation to LL(*) parsers, since an LL(*) parser does not just try successive LL(k) parses (which would not be possible, anyway). For the LL(*) algorithm presented by Terence Parr (which is the only reference I know of which defines what LL(*) means), there is no bound to the amount of time which could be taken between successive shift actions. The parser might expand the lookahead to the entire remaining input (which would, therefore, make the time complexity dependent on the total size of the input), or it might fail over to a backtracking algorithm, in which case it is more complicated to define what is meant by "processing an input symbol".
I suggest you to read the chapter 5.1 of Aho Ullman Volume 1.
https://dl.acm.org/doi/book/10.5555/578789
A LL(k) parser is a k-predictive algorithm (k is the lookahead integer >= 1).
A LL(k) parser can parse any LL(k) grammar. (chapter 5.1.2)
for all a, b you have a < b => LL(b) grammar is also a LL(a) grammar. But the reverse is not true.
A LL(k) parser is PREDICTIVE. So there is NO backtracking.
All LL(k) parsers are O(n) n is the length of the parsed sentence.
It is important to understand that a LL(3) parser do not parse faster than a LL(1). But the LL(3) parser can parse MORE grammars than the LL(1). (see the point #2 and #3)

LR(k) to LR(1) grammar conversion

I am confused by the following quote from Wikipedia:
In other words, if a language was reasonable enough to allow an
efficient one-pass parser, it could be described by an LR(k) grammar.
And that grammar could always be mechanically transformed into an
equivalent (but larger) LR(1) grammar. So an LR(1) parsing method was,
in theory, powerful enough to handle any reasonable language. In
practice, the natural grammars for many programming languages are
close to being LR(1).[citation needed]
This means that a parser generator, like bison, is very powerful (since it can handle LR(k) grammars), if one is able to convert a LR(k) grammar to a LR(1) grammar. Do some examples of this exist, or a recipe on how to do this? I'd like to know this since I have a shift/reduce conflict in my grammar, but I think this is because it is a LR(2) grammar and would like to convert it to a LR(1) grammar. Side question: is C++ an unreasonable language, since I've read, that bison-generated parsers cannot parse it.
For references on the general purpose algorithm to find a covering LR(1) grammar for an LR(k) grammar, see Real-world LR(k > 1) grammars?
The general purpose algorithm produces quite large grammars; in fact, I'm pretty sure that the resulting PDA is the same size as the LR(k) PDA would be. However, in particular cases it's possible to come up with simpler solutions. The general principle applies, though: you need to defer the shift/reduce decision by unconditionally shifting until the decision can be made with a single lookahead token.
One example: Is C#'s lambda expression grammar LALR(1)?
Without knowing more details about your grammar, I can't really help more than that.
With regard to C++, the things that make it tricky to parse are the preprocessor and some corner cases in parsing (and lexing) template instantiations. The fact that the parse of an expression depends on the "kind" (not type) of a symbol (in the context in which the symbol occurs) makes precise parsing with bison complicated. [1] "Unreasonable" is a value judgement which I'm not comfortable making; certainly, tool support (like accurate syntax colourizers and tab-completers) would have been simple with a different grammar, but the evidence is that it is not that hard to write (or even read) good C++ code.
Notes:
[1] The classic tricky parse, which also applies to C, is (a)*b, which is a cast of a dereference if a represents a type, and otherwise a multiplication. If you were to write it in the context: c/(a)*b, it would be clear that an AST cannot be constructed without knowing whether it's a cast or a product, since that affects the shape of the AST,
A more C++-specific issue is: x<y>(z) (or x<y<z>>(3)) which parse (and arguably tokenise) differently depending on whether x names a template or not.

Can a Regular Language have a Linear Bounded Automaton

As the question states:
I am trying to understand automata. Can every regular language have a linear bounded automaton?
Yes, it is possible to have Linear Bounded Automata for each and every regular languages. Regular languages are proper subset of Context Sensitive Languages. CSL is the language associated with Linear Bounded Automata (LBA). For more information on hierarchy of classes of formal grammars and languages read about Chomsky Hierarchy.
http://en.wikipedia.org/wiki/Chomsky_hierarchy

Converting a context-free grammar into a LL(1) grammar

First off, I have checked similar questions and none has quite the information I seek.
I want to know if, given a context-free grammar, it is possible to:
Know if there exists or not an equivalent LL(1) grammar. Equivalent in the sense that it should generate the same language.
Convert the context-free grammar to the equivalent LL(1) grammar, given it exists. The conversion should succeed if an equivalent LL(1) grammar exists. It is OK if it does not terminate if an equivalent does not exists.
If the answer to those questions are yes, are such algorithms or tools available somewhere ? My own researches have been fruitless.
Another answer mentions that the Dragon Book has an algorithms to eliminate left recursion and to left factor a context-free grammar. I have access to the book and checked it but it is unclear to me if the grammar is guaranteed to be LL(1). The restriction imposed on the context-free grammar (no null production and no cycle) are agreeable to me.
From university compiler courses I took I remember that LL(1) grammar is context free grammar, but context free grammar is much bigger than LL(1). There is no general algorithm (meaning not NP hard) to check and convert from context-free (that can be transformed to LL(1)) to LL(1).
Applying the bag of tricks like removing left recursion, removing first-follow conflict, left-factoring, etc. are similar to mathematical transformation when you want to integrate a function... You need experience that is sometimes very close to an art. The transformations are often inverse of each other.
One reason why LR type grammars are being used now a lot for generated parsers is that they cover much wider spectrum of context free grammars than LL(1).
Btw. e.g. C grammar can be expressed as LL(1), but C# cannot (e.g. lambda function x -> x + 1 comes to mind, where you cannot decide if you are seeing a parameter of lambda or a known variable).

Resources