Image of DFA: https://ibb.co/LCW99q9
From my understanding, any string is accepted as long as it contains the substring “abc”; anything before is okay, and everything after is okay, including “λ”.
My problem is that I’m not sure how to write the notation, so is this correct?
L = {wabcv: v,w ∈ {a,b,c}*}
Yes, your answer is correct. That is:
L = {wabcv: v,w ∈ {a,b,c}*}
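As a quick sanity check of that set notation, here is a minimal sketch in Python. The transition table is a hypothetical 4-state DFA for "contains abc" written from scratch (it is an assumption, not necessarily the DFA in the linked image), and `agree_up_to` is an illustrative helper that compares it with the substring definition on all short strings.

```python
from itertools import product

# Hypothetical DFA (not taken from the linked image).
# States 0-2: length of the longest suffix read so far that is a prefix of "abc".
# State 3: "abc" has already been seen; it is accepting and absorbing.
DELTA = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 1, "b": 0, "c": 3},
    3: {"a": 3, "b": 3, "c": 3},
}

def dfa_accepts(s):
    """Run the DFA on s and report whether it ends in the accepting state."""
    state = 0
    for ch in s:
        state = DELTA[state][ch]
    return state == 3

def in_language(s):
    """s = w + 'abc' + v for some w, v in {a,b,c}*  <=>  'abc' is a substring of s."""
    return "abc" in s

def agree_up_to(max_len):
    """Compare both definitions on every string over {a,b,c} up to max_len."""
    return all(
        dfa_accepts("".join(t)) == in_language("".join(t))
        for n in range(max_len + 1)
        for t in product("abc", repeat=n)
    )
```

Here both w and v may be the empty string, so "abc" itself is in L, while the empty string is not.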
I'm making a parser for a DSL in Haskell using Alex + Happy.
My DSL uses dice rolls as part of the possible expressions.
Sometimes I have an expression that I want to parse that looks like:
[some code...] 3D6 [... rest of the code]
Which should translate roughly to:
TokenInt {... value = 3}, TokenD, TokenInt {... value = 6}
My DSL also uses variables (basically, Strings), so I have a special token that handles variable names.
So, with these tokens:
"D" { \pos str -> TokenD pos }
$alpha [$alpha $digit \_ \']* { \pos str -> TokenName pos str}
$digit+ { \pos str -> TokenInt pos (read str) }
The result I'm getting when using my parser now is:
TokenInt {... value = 3}, TokenName { ... , name = "D6"}
Which means that my lexer "reads" an Integer and a Variable named "D6".
I have tried many things. For example, I changed the token D to:
$digit "D" $digit { \pos str -> TokenD pos }
But that just consumes the digits :(
Can I parse the dice roll with the numbers?
Or at least parse TokenInt-TokenD-TokenInt?
PS: I'm using PosN as a wrapper, not sure if relevant.
The way I'd go about it would be to extend the TokenD type to TokenD Int Int, so, using the basic wrapper for convenience, I would do
$digit+ D $digit+ { dice }
...
dice :: String -> Token
dice s = TokenD (read $ head ls) (read $ last ls)
where ls = split 'D' s
split can be found here.
This is an extra step that'd usually be done during syntactic analysis, but it doesn't hurt much here.
Also I can't make Alex parse $alpha for TokenD instead of TokenName. If we had Di instead of D that'd be no problem. From Alex's docs:
When the input stream matches more than one rule, the rule which matches the longest prefix of the input stream wins. If there are still several rules which match an equal number of characters, then the rule which appears earliest in the file wins.
But then your code should work. I don't know if this is an issue with Alex.
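To see what the quoted longest-match rule does with the three rules from the question, here is a minimal sketch in Python. It is not Alex itself: the regular expressions are hypothetical stand-ins for the rules (assuming `$alpha` means ASCII letters and `$digit` means 0-9), and `lex` is an illustrative helper.

```python
import re

# Hypothetical stand-ins for the three Alex rules, in the same order as in the file.
RULES = [
    ("TokenD", re.compile(r"D")),
    ("TokenName", re.compile(r"[A-Za-z][A-Za-z0-9_']*")),
    ("TokenInt", re.compile(r"[0-9]+")),
]

def lex(text):
    """Tokenize text: the longest match wins, ties go to the earliest rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None  # (rule name, end position) of the best match so far
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strictly longer only, so an earlier rule keeps winning on ties.
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        name, end = best
        tokens.append((name, text[pos:end]))
        pos = end
    return tokens
```

With these stand-in rules, `lex("3D6")` gives `[("TokenInt", "3"), ("TokenName", "D6")]`: after the 3, the name rule matches the two characters "D6" while "D" matches only one, so the name rule wins. With spaces, `lex("3 D 6")` gives TokenInt, TokenD, TokenInt, because "D" alone is a tie and the earlier rule wins.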
I decided that I could survive with variables starting with lowercase letters (like Haskell variables), so I changed my lexer to parse variables only if they start with a lowercase letter.
That also solved some possible problems with some other reserved words.
I'm still curious to know if there were other solutions, but the problem in itself was solved.
Thank you all!
I understand the concept of LR(1) parsing and lookahead symbols. I have the solution to the exercise and it does not agree with my solution.
I'm trying to fill the LR(1) parsing table for the grammar below:
S->xAz
S->BAx
A->Ay
A->e
B->yB
B->y
I don't have to extend the grammar since S does not appear on the right-hand side of any rule.
First(A)=y,e
First(Ax)=x,y
First(B)=y
First(Ay)=y
Lookahead symbols in brackets.
So, I0 = Closure(S->.xAz($) , S->.BAx($) ) =
S->.xAz($)
S->.BAx($)
B->.yB(x,y)
B->.y(x,y)
When I try GOTO(0,x) I think that I should go to:
S->x.Az($)
A->.Ay(z)
A->. (z)
To find the lookahead symbol for A->. and A->.Ay I take First(z). But the official book solution says the lookahead is (z,y).
Where does that y come from?
Thank you in advance
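One way to check a closure like this by hand is to run the standard LR(1) closure rule mechanically: for an item [X -> α.Yβ, a], add [Y -> .γ, b] for every b in First(βa), and repeat until nothing new appears. Below is a minimal sketch in Python under that assumption; the `FIRST` and `NULLABLE` tables are hard-coded from the sets listed above, and the function names are illustrative.

```python
# Grammar from the question; the empty tuple stands for A -> e.
GRAMMAR = {
    "S": [("x", "A", "z"), ("B", "A", "x")],
    "A": [("A", "y"), ()],
    "B": [("y", "B"), ("y",)],
}
NULLABLE = {"A"}
# First sets without e (nullability is tracked separately in NULLABLE).
FIRST = {"S": {"x", "y"}, "A": {"y"}, "B": {"y"}}

def first_of(symbols):
    """First set of a sequence of grammar symbols (terminals and nonterminals)."""
    result = set()
    for sym in symbols:
        if sym in GRAMMAR:
            result |= FIRST[sym]
            if sym not in NULLABLE:
                return result
        else:
            result.add(sym)
            return result
    return result

def closure(kernel):
    """LR(1) closure; an item is (lhs, rhs, dot position, lookahead)."""
    items = set(kernel)
    work = list(kernel)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            # Lookaheads for the new items: First(beta followed by la).
            for t in first_of(rhs[dot + 1:] + (la,)):
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], prod, 0, t)
                    if item not in items:
                        items.add(item)
                        work.append(item)
    return items

def lookaheads(items, lhs, rhs):
    """Collect the lookaheads of all items lhs -> .rhs (dot at position 0)."""
    return {la for (l, r, d, la) in items if (l, r, d) == (lhs, rhs, 0)}

# GOTO(0, x) starts from the kernel item S -> x.Az with lookahead $.
state = closure({("S", ("x", "A", "z"), 1, "$")})
```

In this sketch, the item A->.Ay(z) itself has the dot in front of A, so the closure rule applies a second time with First(yz) = y. That adds A->.Ay(y) and A->.(y), which is where the y in (z,y) comes from: `lookaheads(state, "A", ("A", "y"))` and `lookaheads(state, "A", ())` both give `{"z", "y"}`.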
I'm following the algorithm for eliminating left recursion from a grammar. It says to remove the epsilon productions, if there are any.
I have the following grammar:
S-->Aa/b
A-->Ac/Sd/∈
I can see that after removing the epsilon productions the grammar becomes
1) S-->Aa/a/b
2)A-->Ac/Sd/c/d
I'm confused about where the a/b in 1) and the c/d in 2) come from.
Can someone explain this?
Let's look at the rule S->Aa: if A->∈, then S->∈a, giving just S->a, so together with the previous rules we get S->Aa|a|b.
Now let's check the rule A->Ac: with A->∈ we get A->∈c, which gives us A->c.
What about A->Sd? I don't see how you got A->d as a rule. If that were a rule, then the string "da" would be accepted by this grammar (S->Aa and A->d give "da"). But try to construct this string with the original grammar: if you start with S and the string ends with a, you must use S->Aa; then, in order to get the "d", you must use A->Sd, which forces us to have another "a" or "b". So we cannot construct this string, and the rule A->d is not correct.
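The same reasoning can be run mechanically. Below is a minimal sketch in Python of the usual epsilon-removal step: find the nullable nonterminals, then for each rule emit every variant with some nullable occurrences dropped, discarding empty right-hand sides. The function name and the grammar encoding are illustrative assumptions.

```python
from itertools import product

def remove_epsilon(grammar):
    """Return the grammar without epsilon productions (start symbol assumed non-nullable)."""
    # 1. Find nullable nonterminals by iterating to a fixed point.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, prods in grammar.items():
            if lhs not in nullable and any(all(s in nullable for s in p) for p in prods):
                nullable.add(lhs)
                changed = True
    # 2. For each rule, keep or drop every nullable symbol in all combinations.
    result = {}
    for lhs, prods in grammar.items():
        new_prods = set()
        for p in prods:
            options = [((s,), ()) if s in nullable else ((s,),) for s in p]
            for choice in product(*options):
                rhs = tuple(sym for part in choice for sym in part)
                if rhs:  # skip the empty right-hand side
                    new_prods.add(rhs)
        result[lhs] = new_prods
    return result

# Grammar from the question; the empty tuple stands for the epsilon production.
ORIGINAL = {
    "S": [("A", "a"), ("b",)],
    "A": [("A", "c"), ("S", "d"), ()],
}
CLEANED = remove_epsilon(ORIGINAL)
```

For this grammar only A is nullable, so `CLEANED` contains S->Aa|a|b and A->Ac|c|Sd. S is not nullable, so dropping it from A->Sd is never an option and A->d does not appear.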
I am new to automata theory. This question below is for practice:
Let there be a language that is made of words that start and end with different symbols and have the alphabet {0,1}. For example, 001, 10110101010100, 10 and 01 are all accepted. But 101, 1, 0, and 1010001101 are rejected.
How do I:
Construct a Deterministic Finite Automaton (DFA) diagram?
Find the regular expression for the DFA?
I tried to post an image of the DFA I drew, but I need 10 reputations to post images unfortunately, which I do not yet have.
To answer this question, I think it's easier to identify the regular expression first.
Regular Expression
1(1|0)*0 | 0(1|0)*1
(* denotes Kleene's star operation)
Now we convert this regular expression into an equivalent finite automaton.
Constructing a DFA
You can easily design the NFA-∧ (NFA-ε in some texts) for a given language (regex) using Thompson's construction [1], which is then converted into an NFA without lambda transitions.
This NFA can then be mapped to an equivalent DFA using the subset construction method [2].
If you want, you can further reduce this DFA to obtain a minimal DFA, which is unique for a given regular language (Myhill-Nerode theorem) [3].
Regex → NFA-∧ → NFA → DFA → DFA (minimal)
This is the standard procedure.
[1] http://en.wikipedia.org/wiki/Thompson%27s_construction_algorithm
[2] http://www.cs.nuim.ie/~jpower/Courses/Previous/parsing/node9.html
[3] http://en.wikipedia.org/wiki/Myhill%E2%80%93Nerode_theorem
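For this particular language the resulting DFA is small enough to write down directly. Here is a minimal sketch in Python of one possible 5-state DFA, checked against the regular expression above with Python's `re` module. The state names are made up for illustration: each one records the first symbol read and the most recent one.

```python
import re

# Hypothetical state names: "0..1" means "started with 0, last symbol read was 1".
DELTA = {
    "start": {"0": "0..0", "1": "1..1"},
    "0..0": {"0": "0..0", "1": "0..1"},
    "0..1": {"0": "0..0", "1": "0..1"},
    "1..1": {"0": "1..0", "1": "1..1"},
    "1..0": {"0": "1..0", "1": "1..1"},
}
ACCEPTING = {"0..1", "1..0"}  # first and last symbols differ

def dfa_accepts(word):
    """Run the DFA on a word over {0,1}."""
    state = "start"
    for ch in word:
        state = DELTA[state][ch]
    return state in ACCEPTING

# The regular expression from the answer, in Python syntax.
REGEX = re.compile(r"1(1|0)*0|0(1|0)*1")

def regex_accepts(word):
    """Match the whole word against the regular expression."""
    return REGEX.fullmatch(word) is not None
```

On the examples from the question, both functions accept 001, 10110101010100, 10 and 01, and reject 101, 1, 0 and 1010001101.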
There are two possibilities here:
1) Strings starting with 0 and ending with 1 => [0(0|1)*1]
2) Strings starting with 1 and ending with 0 => [1(0|1)*0]
Also, from the rejected strings we know that the minimum length is 2.
Therefore the final expression is [0(0|1)*1]|[1(0|1)*0]
The NFA would be something like this:
[Image: NFA for the given language]
Hi, I am using LaTeX and Texmaker to do the following:
$\mathcal{a( X, Y )= a_i \circ a_j}$
which I expect to give
a(X,Y)= a subscript {i} circle a subscript {j}
but instead I get weird symbols in place of the a's (on the right side of the equation), the i and the j. Can you tell me why? Thanks.
The problem is that \mathcal only works on uppercase letters.
Never mind, I did
$a(X,Y)=a_i \circ a_j$
and it solved the problem.
Thanks anyway!