I'm trying to learn how to solve the following exercise. I don't understand how to start; it's overwhelming. I do understand DFAs and NFAs and how to convert between them. I also understand the formal notation.
This is not a homework exercise; it's just for studying. I do have the solution, but I can't make sense of it either.
If someone could ELI5 the exercise, that would be amazing. Examples of solving exercises like these (with proper explanations) would be great too; I have yet to find a similar exercise on the web.
Given:
Alphabet Σ
A symbol c ∈ Σ
Regular language L over Σ
Language L-c = {uv | ucv ∈ L}
Let D = (Q, Σ, δ, q₀, F) be a DFA with ℒ(D) = L.
Show how to construct an NFA N using copies of D such that ℒ(N)=L-c. Provide a formal definition of N.
I am not going to write down the formal definition in full detail. Basically, you can use two copies of the DFA. You start in the first copy, which has no final states. For every transition reading c from a state p to a state q, you add a transition reading nothing (an ε-transition) from p in the first copy to q in the second copy. This way you skip exactly one c. Once you are in the second copy, you know that a c has been skipped, and you can accept if the remaining input leads to an accepting state.
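Concretely, one way to write it out: N = (Q × {0, 1}, Σ, δ′, (q₀, 0), F × {1}), where δ′((p, i), a) = {(δ(p, a), i)} for every a ∈ Σ and i ∈ {0, 1}, plus the ε-transitions δ′((p, 0), ε) = {(δ(p, c), 1)}. Here is a small Python sketch of the same construction, with a standard NFA simulation to sanity-check it; the dict-based DFA encoding is just an assumption for illustration, not part of the exercise.

# Build the two-copy NFA for L-c from a DFA for L.
def build_nfa(dfa_delta, dfa_start, dfa_final, c):
    """dfa_delta maps (state, symbol) -> state. NFA states are pairs:
    (q, 0) = no c skipped yet, (q, 1) = exactly one c skipped."""
    nfa_delta = {}  # (state, symbol or None) -> set of states; None = epsilon
    for (p, a), q in dfa_delta.items():
        nfa_delta.setdefault(((p, 0), a), set()).add((q, 0))  # first copy
        nfa_delta.setdefault(((p, 1), a), set()).add((q, 1))  # second copy
        if a == c:
            # the only epsilon-moves: skip this one c, jump to the second copy
            nfa_delta.setdefault(((p, 0), None), set()).add((q, 1))
    return nfa_delta, (dfa_start, 0), {(q, 1) for q in dfa_final}

def accepts(nfa_delta, start, final, word):
    """Standard NFA simulation with epsilon-closure."""
    def closure(states):
        todo, seen = list(states), set(states)
        while todo:
            s = todo.pop()
            for t in nfa_delta.get((s, None), ()):
                if t not in seen:
                    seen.add(t); todo.append(t)
        return seen
    current = closure({start})
    for a in word:
        current = closure({t for s in current for t in nfa_delta.get((s, a), ())})
    return bool(current & final)

# Tiny check: D accepts exactly "bcb", so L-c should be {"bb"}.
delta = {(0, 'b'): 1, (1, 'c'): 2, (2, 'b'): 3}
nfa = build_nfa(delta, 0, {3}, 'c')
print(accepts(*nfa, "bb"), accepts(*nfa, "bcb"))  # True False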
I'm working on a language very similar to STLC that I'm converting to Z3 propositions, hence a few (sub)questions about how Z3 treats (uninterpreted) functions:
Functions are naturally curried in my language and as I'm converting the terms of my language recursively, I'd like to be able to build the corresponding Z3 AST recursively as well. That is, when I have a term f x y I'd like to first apply f to x and then apply that to y. Is there a way to do this? The API I've found so far (Z3_mk_func_decl/Z3_mk_app) seems to require me to collect all arguments first and apply them all at once.
Is there a reasonable way to represent something like (if b then f else g) x?
In both cases, I'm totally fine with functions being uninterpreted and restricting the reasoning to things like "b = True /\ f x = 0 => (if b then f else g) x = 0 holds".
SMTLib (as described in http://smtlib.cs.uiowa.edu/papers/smt-lib-reference-v2.6-r2017-07-18.pdf) is a many-sorted first-order logic. All functions (uninterpreted or not) must be applied to all their arguments, and you cannot have any form of currying. You also cannot do a higher-order if-then-else, i.e., the branches of an if-then-else have to be first-order values. (However, they can be arrays, and you can imagine "faking" functions with arrays. But that's beside the point.)
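If you're driving Z3 from Python, here's a small z3py sketch of that first-order workaround (the names b, f, g, h, x just mirror your example): declare each uninterpreted function with its full signature, apply it to all its arguments at once, and push the if-then-else inside the applications.

from z3 import Function, If, Bool, Int, IntSort, Solver

# Uninterpreted functions are declared with their full signature
# (the last sort is the range) and applied to all arguments at once.
f = Function('f', IntSort(), IntSort())
g = Function('g', IntSort(), IntSort())
h = Function('h', IntSort(), IntSort(), IntSort())  # a curried "f x y" becomes h(x, y)
b = Bool('b')
x = Int('x')

# (if b then f else g) x is not directly expressible, but the
# first-order rewrite If(b, f(x), g(x)) says the same thing:
app = If(b, f(x), g(x))

s = Solver()
s.add(b, f(x) == 0)  # b = True /\ f x = 0
s.add(app != 0)      # negate the conclusion...
print(s.check())     # ...and get unsat, so the implication holds

The catch is that you have to do this rewrite at every application site yourself; this is essentially the kind of translation SBV (mentioned below) automates behind the scenes.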
It should be noted that the next iteration of SMTLib (v3) will be based on a higher-order logic, at which point features like the ones you're asking about might become available. See: http://smtlib.cs.uiowa.edu/version3.shtml. Of course, this is still a proposal, and it'll take a while before it's settled and actual solvers start implementing it faithfully. Eventually that'll happen, but I wouldn't expect it in the very near term.
Aside: Since you mentioned STLC (the simply typed lambda calculus), I presume you might be familiar with functional languages like Haskell. If that's the case, you might want to look into using SBV: https://hackage.haskell.org/package/sbv. It provides a framework for doing some of these things by carefully translating them away behind the scenes. Here's an example:
Prelude Data.SBV> sat $ \b -> (ite b (uninterpret "f") (uninterpret "g")) (0::SInteger) .== (0::SInteger)
Satisfiable. Model:
s0 = True :: Bool
f :: Integer -> Integer
f _ = 0
g :: Integer -> Integer
g _ = 2
Here we created two functions and used the ite construct to "merge" them; and got the solver to return us a model. Behind the scenes, SBV will fully saturate these applications and let you "pretend" you're programming in a higher-order sense, just like in STLC or Haskell. Of course, the devil is in the details and there are limitations to the approach, but modeling STLC in Haskell is a classic pastime for many people and doing it symbolically using SBV can be a fun exercise.
So far, my understanding of the bottom-up parsing algorithm is this:
Shift a token onto the stack.
Check whether some elements at the top of the stack, including the top, can be reduced by some production rule.
If they can be reduced, pop them and push the left-hand side of the production rule.
Continue these steps until the top of the stack is the start symbol and the next input is EOF.
So to support my question with an example grammar,
S → aABe
A → Abc
A → b
B → d
if we have input string as
abbcde$
we will shift a onto the stack,
and because there is no production rule that reduces a, we shift the next token b.
Then we can find the production rule A → b and reduce b to A.
Then my question is this: we have aA on the stack and the next input is b. How can the parser determine whether we should reduce b to A or wait for c to come and use the rule A → Abc?
Well, of course, reducing b to A at that point results in an error. But how does the parser know at that point that it should wait for c?
I'm sorry if I missed something while studying.
That's an excellent question, and it will be addressed in the next part of your course.
For now, it's sufficient to pretend that there is some magic black box which tells the parser when it should reduce (and, sometimes, which of several possible productions to use).
The various parsing algorithms explain the construction of this black box. Note that one possible solution is to fork reality and try both actions in parallel, but a more common solution is to process the grammar in order to work out how to predict the correct action.
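To make the black box a bit more concrete, here's a toy Python trace for your grammar, with the decisions hardcoded the way an LR-style table would resolve them; the oracle function below is a hand-written stand-in for the real table, not an actual algorithm.

# Toy shift-reduce trace for:  S -> aABe,  A -> Abc | b,  B -> d
def oracle(stack, lookahead):
    # A real parser consults a table built from the grammar; for this
    # tiny grammar the stack alone happens to determine the action.
    if stack[-1:] == ['b'] and stack[-2:-1] == ['a']:
        return ('reduce', 'A', ['b'])        # reduce A -> b only right after a
    if stack[-3:] == ['A', 'b', 'c']:
        return ('reduce', 'A', ['A', 'b', 'c'])
    if stack[-1:] == ['d']:
        return ('reduce', 'B', ['d'])
    if stack[-4:] == ['a', 'A', 'B', 'e']:
        return ('reduce', 'S', ['a', 'A', 'B', 'e'])
    return ('shift',)                        # b on top of A: keep shifting for Abc

def parse(tokens):
    stack, i = [], 0
    while True:
        lookahead = tokens[i] if i < len(tokens) else '$'
        action = oracle(stack, lookahead)
        if action[0] == 'shift' and lookahead == '$':
            return stack == ['S']            # accept iff everything reduced to S
        print(f"stack={''.join(stack):7} input={tokens[i:]:7} -> {action}")
        if action[0] == 'shift':
            stack.append(tokens[i]); i += 1
        else:
            _, lhs, rhs = action
            del stack[len(stack) - len(rhs):]
            stack.append(lhs)

print(parse("abbcde"))  # True

Notice the step with stack aA and input bcde: the table says shift, because a b sitting on top of an A can only be the middle of A → Abc, whereas A → b is reduced only when the b sits directly on top of the a.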
I'm delving deeper into parsing and came across an issue I don't quite understand. I made up the following grammar:
S = R | aSc
R = b | RbR
where S is the start symbol. It is possible to show that abbbc is a valid sentence based on this grammar; hopefully that is correct, but I may have completely misunderstood something. If I try to implement this using recursive descent, I seem to have a problem when trying to parse abbbc using a leftmost derivation, e.g.
S => aSc
aSc => aRc
at this point I would have thought that recursive descent would pick the first option in the second production, because the next token is b, leading to:
aRc => abc
and we're finished since there are no more non-terminals, which of course isn't abbbc. The only way to derive abbbc is to pick the second option, but with one token of lookahead I assume it would always pick b. I don't think the grammar is ambiguous, unless I missed something. So what am I doing wrong?
Update: I came across this nice derivation app at https://web.stanford.edu/class/archive/cs/cs103/cs103.1156/tools/cfg/. I used it to do a sanity check that abbbc is a valid sentence, and it is.
Thinking more about this problem, is it true to say that I can't use LL(1) to parse this grammar, but in fact need LL(2)? With two tokens of lookahead I could correctly pick the second option in the second production, because I would then also know there are more tokens to be read, so picking b would prematurely terminate the derivation.
For starters, I’m glad you’re finding our CFG tool useful! A few of my TAs made that a while back and we’ve gotten a lot of mileage out of it.
Your grammar is indeed ambiguous. This stems from your R nonterminal:
R → b | RbR
Generally speaking, if you have recursive production rules with two copies of the same nonterminal in them, it will lead to ambiguities, because there will be multiple options for how to apply the rule twice. For example, in this case, you can derive bbbbb by first expanding R to RbR, then either
expanding the left R to RbR and converting each R to a b, or
expanding the right R to RbR and converting each R to a b.
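Spelled out, those two choices give two different leftmost derivations of the same string, which is exactly what ambiguity means:

R ⇒ RbR ⇒ RbRbR ⇒ bbRbR ⇒ bbbbR ⇒ bbbbb   (the left R expanded to RbR)
R ⇒ RbR ⇒ bbR ⇒ bbRbR ⇒ bbbbR ⇒ bbbbb     (the right R expanded to RbR)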
Because this grammar is ambiguous, it isn’t going to be LL(k) for any choice of k because all LL(k) grammars must be unambiguous. That means that stepping up the power of your parser won’t help here. You’ll need to rewrite the grammar to not be ambiguous.
The nonterminal R that you've described here generates strings with an odd number of b's in them, so we could try redesigning R to achieve this more directly. An initial try might be something like this:
R → b | bbR
This, unfortunately, isn’t LL(1), since after seeing a single b it’s unclear whether you’re supposed to apply the first production rule or the second. However, it is LL(2).
If you’d like an LL(1) grammar, you could do something like this:
R → bX
X → bbX | ε
This works by laying down a single b, then laying down as many optional pairs of b’s as you’d like.
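A quick recursive-descent sketch in Python, combining your original S rule with the rewritten R, to check that abbbc now parses with a single token of lookahead (the peek/eat reader is just the usual hand-rolled setup):

# Recursive descent for:  S -> aSc | R,  R -> bX,  X -> bbX | eps
def parse(s):
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else '$'

    def eat(t):
        nonlocal pos
        if peek() != t:
            raise SyntaxError(f"expected {t!r}, got {peek()!r} at {pos}")
        pos += 1

    def S():
        if peek() == 'a':      # FIRST(aSc) = {a}
            eat('a'); S(); eat('c')
        else:                  # FIRST(R) = {b}
            R()

    def R():
        eat('b'); X()

    def X():
        if peek() == 'b':      # FIRST(bbX) = {b}
            eat('b'); eat('b'); X()
        # else take X -> eps, since FOLLOW(X) = {c, $}

    S()
    if peek() != '$':
        raise SyntaxError("trailing input")
    return True

print(parse("abbbc"))  # True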
So I have this set of grammar rules:
S -> X Y
X -> a X
X ->
Y -> b
Z -> a Z
Z -> a
My only confusion with this grammar is the 2nd production for X.
There is nothing on the right-hand side. Is that equivalent to using epsilon (ε) or lambda (λ)?
I am assuming it is merely a difference in notation between grammars, but I wanted to be sure, as I am trying to build the FIRST and FOLLOW sets.
Both ε and λ (and sometimes Λ) are used by different writers to represent the empty string. In modern writing, ε is much more common but you'll often find λ in older textbooks, and Λ in even older ones.
The point of using these symbols is to make the empty sequence visible. However it is written, it is the empty sequence and should be read as though it were nothing, as in your production X ⇒ .
If you find it difficult getting your head around the idea that a symbol means nothing, then you might enjoy reading Charles Seife's Zero: The Biography of a Dangerous Idea or Robert Kaplan's The Nothing that Is: A Natural History of Zero, both published in the emblematic year 2K and both of which explore the long and difficult struggle to understand the concept of nothing. ("No one goes out to buy zero fish" -- Alfred North Whitehead).
It has been suggested that Λ/λ comes from the German word "leer", meaning empty, while ε comes from English "empty". There was a time when German was more common in academic discussion of mathematical logic, so the theory seems reasonable.
I have the grammar below and am trying to figure out whether it can be parsed using an LL parser. If not, please explain why.
S --> ab | cB
A --> b | Bb
B --> aAb | cC
C --> cA | Aba
From what I understand, the intersection of any two of the sets must be empty to pass the pairwise disjointness test.
But I am not sure where to begin. I have been looking through my textbook and http://en.wikipedia.org/wiki/LL_parser#Parsing_procedure but can't quite understand it or find any examples to follow along with. I just need to see the procedure or steps to understand how to do other similar problems. Any help is appreciated.
Compute the FIRST sets for all the non-terminals, and check whether the FIRST sets of the alternatives of each non-terminal are pairwise disjoint. If they all are, the grammar is LL; if there is any non-terminal for which they are not, it is not. If there are any ε-rules, you'll need FOLLOW sets as well.
Computing FIRST₁ sets is quite easy and will tell you if the grammar is LL(1). Computing FIRSTₖ sets is quite a bit more involved, but will tell you if the grammar is LL(k) for any specific k you calculate the FIRSTₖ sets for.
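To see the procedure run mechanically, here is a small Python sketch that computes the FIRST₁ sets for the grammar above by fixed-point iteration and then applies the pairwise disjointness test (there are no ε-rules here, so FOLLOW sets are not needed):

from itertools import combinations

# Grammar from the question: uppercase = non-terminal, lowercase = terminal.
grammar = {
    'S': [['a', 'b'], ['c', 'B']],
    'A': [['b'], ['B', 'b']],
    'B': [['a', 'A', 'b'], ['c', 'C']],
    'C': [['c', 'A'], ['A', 'b', 'a']],
}

first = {nt: set() for nt in grammar}

def first_of(prod):
    # With no eps-rules, FIRST of a production is FIRST of its leading symbol.
    sym = prod[0]
    return {sym} if sym.islower() else first[sym]

# Fixed-point iteration: keep folding in FIRST sets until nothing changes.
changed = True
while changed:
    changed = False
    for nt, prods in grammar.items():
        for prod in prods:
            new = first_of(prod) - first[nt]
            if new:
                first[nt] |= new
                changed = True

for nt in grammar:
    print(f"FIRST({nt}) = {sorted(first[nt])}")

# Pairwise disjointness: the FIRST sets of each non-terminal's
# alternatives must not overlap for the grammar to be LL(1).
for nt, prods in grammar.items():
    for p, q in combinations(prods, 2):
        overlap = first_of(p) & first_of(q)
        if overlap:
            print(f"{nt}: {''.join(p)} and {''.join(q)} overlap on {sorted(overlap)}")

Running this reports FIRST(A) = {a, b, c}, so C's two alternatives cA and Aba overlap on c; the grammar fails the pairwise disjointness test and is therefore not LL(1).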