NFA for this specific design - automata

I have this NFA below, and I think it corresponds to the regular expression (0+1)*00. But looking at it, I also feel that I should add something to the RE for the transition q2->q0, and I can't quite work out how to do it. Could anyone give me any tips? Thank you.

If you want an NFA for (0 + 1)*00, your NFA works fine. Indeed, the transition from q2 to q0 is unnecessary; the NFA works just as well without it. An NFA does not need to define every transition, and it accepts any string that has at least one accepting path through it. Assuming q2 is your accepting state, it accepts every string in (0 + 1)*00 (take the self-loop on q0 once for each copy of (0 + 1) produced by the Kleene closure, then go from q0 to q1 and from q1 to q2) and only strings in (0 + 1)*00 (any string that gets from q0 to q2 and ends there must end with two 0s, which is the only requirement on strings in our language). In particular, it is legal for an NFA to "crash" on certain configurations; that just means that particular path through the NFA does not accept the string, but some other one might. If there are 10^1000000 paths through an NFA and all but one crash or reject, and just one leads to an accepting state, that string is in the language of the NFA.
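To make the "one accepting path is enough" idea concrete, here is a minimal Python sketch that simulates an NFA by tracking the set of states reachable after each symbol. The transition table is an assumption reconstructed from the description above (self-loop on q0, then q0 to q1 and q1 to q2 on 0); the original diagram is not available here, and the q2->q0 transition is left out because its label is not given. Missing table entries model the "crash" case.

```python
import re
from itertools import product

# Assumed transition table (reconstructed from the answer, not from the diagram).
# Missing entries mean that path simply dies ("crashes").
DELTA = {
    ("q0", "0"): {"q0", "q1"},  # stay in the loop, or guess this is the second-to-last symbol
    ("q0", "1"): {"q0"},
    ("q1", "0"): {"q2"},
}
START = "q0"
ACCEPTING = {"q2"}


def nfa_accepts(word):
    """Return True if at least one path through the NFA ends in an accepting state."""
    current = {START}
    for symbol in word:
        # Follow every possible transition from every currently reachable state.
        current = set().union(*(DELTA.get((state, symbol), set()) for state in current))
    return bool(current & ACCEPTING)


def agrees_with_regex(max_len=8):
    """Compare the NFA with the regex (0|1)*00 on all strings up to max_len."""
    pattern = re.compile(r"(0|1)*00")
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            word = "".join(letters)
            if nfa_accepts(word) != bool(pattern.fullmatch(word)):
                return False
    return True
```

The brute-force comparison is not a proof, but it is a quick sanity check that the three-state machine and the regular expression describe the same strings up to a given length.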

Related

How can all keywords in a grammar be set to accept upper and lower case in an existing grammar?

I have a grammar that works, except the keywords must be upper case. Is there a way to shotgun all the keywords such that lower case equivalents will not be rejected? If not, how do I affect each of them individually?
I don't recommend input streams that convert case to make the keyword recognition case-insensitive. Such a stream converts everything (strings, comments, etc.), which is a total waste of CPU cycles. A better approach is to state explicitly in your grammar that you want (only) certain keywords to be case-insensitive. The grammar is trivial:
fragment A: [aA];
fragment B: [bB];
...
fragment Z: [zZ];
KEYWORD1: K E Y W O R D '1';
...
The ATN for these rules is only marginally more complex (using 2 intervals instead of one for each letter, which is (in total) faster than a case conversion):
and, as an example, the letter S:
Each node is a step the ATN simulator has to walk to parse a rule. Edge labels are symbols to match to allow this transition (with ɛ being the epsilon transition, i.e. an unconditional step without input consumption).
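Writing the 26 letter fragments and spelling out each keyword by hand is tedious, so one option is to generate them. The following Python sketch prints rules in the same shape as the grammar above; the helper names `letter_fragments` and `keyword_rule` are made up for illustration, and it assumes keywords consist of ASCII letters and digits only.

```python
import string


def letter_fragments():
    """One fragment per letter, matching either case: fragment A: [aA];"""
    return [f"fragment {c}: [{c.lower()}{c}];" for c in string.ascii_uppercase]


def keyword_rule(keyword):
    """Spell a keyword as a sequence of letter fragments.

    Letters become fragment references; any other character (assumed to be
    a digit here) is emitted as a quoted literal.
    """
    parts = [
        ch.upper() if ch in string.ascii_letters else f"'{ch}'"
        for ch in keyword
    ]
    return f"{keyword.upper()}: {' '.join(parts)};"


if __name__ == "__main__":
    print("\n".join(letter_fragments()))
    for kw in ["keyword1", "select"]:
        print(keyword_rule(kw))
```

The generated text can be pasted into the lexer section of the grammar; the keyword list in the `__main__` block is only a placeholder.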

Eliminating Epsilon Production for Left Recursion Elimination

I'm following the algorithm for eliminating left recursion from a grammar. It says to remove the epsilon productions, if there are any.
I have the following grammar:
S-->Aa/b
A-->Ac/Sd/ε
As far as I can see, after removing the epsilon productions the grammar becomes
1) S-->Aa/a/b
2) A-->Ac/Sd/c/d
I'm confused about where the a/b in 1) and the c/d in 2) come from.
Can someone explain this?
Let's look at the rule S->Aa. If A->ε, then S->εa, which gives just S->a, so together with the previous rules we get S->Aa|a|b.
Now let's check the rule A->Ac: with A->ε we get A->εc, which gives us A->c.
What about A->Sd? I don't see how you got A->d as a rule: S cannot derive ε, so S cannot be dropped from Sd. If A->d were a rule, then the string "da" would be accepted by this grammar (S->Aa and A->d give "da"). But try to construct this string with the original grammar: if you start with S and the string ends with a, you must use S->Aa, but then in order to have a "d" you must use A->Sd, which forces us to have another "a" or "b" before the "d". So we cannot construct this string, and the rule A->d is not correct.
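The reasoning above can be checked mechanically. Below is a rough Python sketch of the usual textbook procedure: first find the nullable nonterminals, then for every production emit each variant obtained by dropping some subset of its nullable symbols. The representation (a dict of tuples, with the empty tuple standing for ε) is my own choice, and the sketch deliberately ignores the special case of a nullable start symbol.

```python
from itertools import product


def eliminate_epsilon(grammar):
    """grammar maps a nonterminal to a list of bodies; a body is a tuple of
    symbols and the empty tuple () stands for an epsilon production."""
    # Step 1: fixed-point computation of the nullable nonterminals.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                all(sym in nullable for sym in body) for body in bodies
            ):
                nullable.add(head)
                changed = True

    # Step 2: for each body, keep or drop every nullable symbol.
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                new_body = tuple(sym for part in choice for sym in part)
                if new_body and new_body not in new_bodies:
                    new_bodies.append(new_body)
        result[head] = new_bodies
    return result


GRAMMAR = {
    "S": [("A", "a"), ("b",)],
    "A": [("A", "c"), ("S", "d"), ()],
}

if __name__ == "__main__":
    for head, bodies in eliminate_epsilon(GRAMMAR).items():
        print(head, "->", " | ".join("".join(b) for b in bodies))
```

For this grammar only A is nullable, so S gains the body a, A gains the body c, and no production A->d appears, in line with the answer above.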

Deterministic Finite Automata

I am new to automata theory. This question below is for practice:
Let there be a language that is made of words that start and end with different symbols and have the alphabet {0,1}. For example, 001, 10110101010100, 10 and 01 are all accepted. But 101, 1, 0, and 1010001101 are rejected.
How do I:
Construct a Deterministic Finite Automaton (DFA) diagram?
Find the regular expression for the DFA?
I tried to post an image of the DFA I drew, but I need 10 reputations to post images unfortunately, which I do not yet have.
To answer this question, I think it's easier to identify the regular expression first.
Regular Expression
1(1|0)*0 | 0(1|0)*1
(* denotes Kleene's star operation)
Now we convert this regular expression into an equivalent finite automaton.
Constructing a DFA
You can easily design the NFA-∧ (or NFA-ε in some texts) for a given language (regex) using Thompson's construction [1]; this is then converted into an NFA without lambda transitions.
This NFA can then be mapped to an equivalent DFA using the subset construction method [2].
If you want, you can further reduce this DFA to obtain a minimal DFA, which is unique for a given regular language (Myhill-Nerode theorem) [3].
Regex → NFA-∧ → NFA → DFA → DFA (minimal)
This is the standard procedure.
[1] http://en.wikipedia.org/wiki/Thompson%27s_construction_algorithm
[2] http://www.cs.nuim.ie/~jpower/Courses/Previous/parsing/node9.html
[3] http://en.wikipedia.org/wiki/Myhill%E2%80%93Nerode_theorem
There are two possibilities here:
1) Strings starting with 0 and ending with 1 => [0(0|1)*1]
2) Strings starting with 1 and ending with 0 => [1(0|1)*0]
Also, from the rejected strings we know that the minimum length is 2.
Therefore the final expression is [0(0|1)*1]|[1(0|1)*0]
The NFA would be something like this:
[Figure: NFA for given language]
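Since the diagrams are not reproduced here, the following Python sketch shows one possible DFA for this language as a transition table. It is not necessarily the machine either answer had in mind: the state names are my own, and each state simply remembers the first symbol read and the most recent one. The sketch also cross-checks the DFA against the regular expression from the first answer on all short strings.

```python
import re
from itertools import product

# State "a..b" means: the word started with a and the last symbol read is b.
DELTA = {
    ("start", "0"): "0..0", ("start", "1"): "1..1",
    ("0..0", "0"): "0..0", ("0..0", "1"): "0..1",
    ("0..1", "0"): "0..0", ("0..1", "1"): "0..1",
    ("1..1", "0"): "1..0", ("1..1", "1"): "1..1",
    ("1..0", "0"): "1..0", ("1..0", "1"): "1..1",
}
ACCEPTING = {"0..1", "1..0"}  # first and last symbols differ

REGEX = re.compile(r"1(1|0)*0|0(1|0)*1")


def dfa_accepts(word):
    """Run the DFA over a word from the alphabet {0, 1}."""
    state = "start"
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


def agrees_with_regex(max_len=10):
    """Check that DFA and regex agree on every string up to max_len."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            word = "".join(letters)
            if dfa_accepts(word) != bool(REGEX.fullmatch(word)):
                return False
    return True
```

Five states are enough here because the only information the machine needs is the first symbol and the latest symbol; every transition is defined, as a DFA requires.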

In the PowerShell grammar, what is the `lvalueExpression` rule saying?

I was reviewing the PowerShell grammar posted here: http://www.manning.com/payette/AppCexcerpt.pdf
(I don't think it has been updated since PowerShell v1, and there are some typos. So, it's clearly not the true PowerShell Grammar, but a human-oriented document.)
In section C.2.1, it says:
<lvalueExpression> = <lvalue> [? |? <lvalue>]*
What is the meaning of the question marks? I can't tell if it means "match any character" or "match a question mark" or it's a typo.
I'm not sure what inputs this is intended to match, but maybe it's this:
$a,$b = 1, 2
in which case maybe the question mark is supposed to be a comma?
Based on its use in the preceding rule (<assignmentStatementRule> = <lvalueExpression> <AssignmentOperatorToken> <pipelineRule>), it appears that lvalueExpression in Appendix C of Windows PowerShell in Action corresponds to expression in section B.2.3 of The PowerShell Language Specification that Joey linked to. Matching it further than this is difficult, but I'll add some speculation anyway :)
The ? characters in [? |? <lvalue>]* are very likely erroneous. If it had been used to represent "the previous token is optional", then:
the [ and | tokens it was applied to should have been quoted
only [ makes sense as part of a value expression, but indexing is already covered later by the propertyOrArrayReferenceOperator rule
? is not used anywhere else in the grammar, but {0|1} is used multiple times to indicate "can appear zero or one times"
Given its similarity to [ '|' <cmdletCall> ]* at the end of the first rule in the section, it may have been a copy-and-paste error, compounded by a ‘smart quote’ round-trip encoding error. Assuming this was copied with the intent of editing later, then ?|? may have become '.' to represent multiple property accesses (but again, this is covered by the propertyOrArrayReferenceOperator rule).
Though, based on the statement at the end of section C.2.1 that "[the pipeline rule] also handles parsing assignment expressions", lvalueExpression was probably intended to list all the assignable expressions besides simpleLvalue (e.g. cast-expression for [int]$x = 1, array-literal-expression for $a,$b,$c = 1,2,3, etc.).
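The "smart quote" hypothesis mentioned above is easy to reproduce. As an illustration only (how the PDF was actually produced is unknown), this Python sketch takes a rule fragment written with typographic quotes and pushes it through an ASCII encoding that replaces unencodable characters; the rule text is my guess at what the author may have typed, not a quotation from the book.

```python
# Hypothetical original: the literal '|' typed with curly quotes
# (U+2018 and U+2019) instead of straight apostrophes.
rule = "[ \u2018|\u2019 <lvalue> ]*"

# Encoding to ASCII with errors="replace" substitutes "?" for every
# character that ASCII cannot represent, which is a common way for
# question marks to appear in place of smart quotes.
garbled = rule.encode("ascii", errors="replace").decode("ascii")

print(garbled)  # [ ?|? <lvalue> ]*
```

The result differs from the printed rule only in spacing, which is consistent with the question marks being damaged quote characters around a literal |.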

REBOL path operator vs division ambiguity

I've started looking into REBOL, just for fun, and as a fan of programming languages, I really like seeing new ideas and even just alternative syntaxes. REBOL is definitely full of these. One thing I noticed is the use of '/' as the path operator which can be used similarly to the '.' operator in most object-oriented programming languages. I have not programmed in REBOL extensively, just looked at some examples and read some documentation, but it isn't clear to me why there's no ambiguity with the '/' operator.
x: 4
y: 2
result: x/y
In my example, this should be division, but it seems like it could just as easily be the path operator if x were an object or function refinement. How does REBOL handle the ambiguity? Is it just a matter of an overloaded operator and the type system so it doesn't know until runtime? Or is it something I'm missing in the grammar and there really is a difference?
UPDATE Found a good piece of example code:
sp: to-integer (100 * 2 * length? buf) / d/3 / 1024 / 1024
It appears that arithmetic division requires whitespace, while the path operator requires no whitespace. Is that it?
This question deserves an answer from the syntactic point of view. In Rebol there is, in fact, no "path operator". The x/y is a syntactic element called a path. By contrast, the standalone / (delimited by spaces) is not a path; it is a word (which is usually interpreted as the division operator). In Rebol you can examine syntactic elements like this:
length? code: [x/y x / y] ; == 4
type? first code ; == path!
type? second code
and so on for the remaining elements.
The code guide says:
White-space is used in general for delimiting (for separating symbols).
This is especially important because words may contain characters such as + and -.
http://www.rebol.com/r3/docs/guide/code-syntax.html
One acquired skill of being a REBOler is to get the hang of inserting whitespace in expressions where other languages usually do not require it :)
Spaces are generally needed in Rebol, but there are exceptions here and there for "special" characters, such as those delimiting series. For instance:
[a b c] is the same as [ a b c ]
(a b c) is the same as ( a b c )
[a b c]def is the same as [a b c] def
Some fairly powerful tools for doing introspection of syntactic elements are type?, quote, and probe. The quote operator prevents the interpreter from giving behavior to things. So if you tried something like:
>> data: [x [y 10]]
>> type? data/x/y
>> probe data/x/y
The "live" nature of the code would dig through the path and give you an integer! of value 10. But if you use quote:
>> data: [x [y 10]]
>> type? quote data/x/y
>> probe quote data/x/y
Then you wind up with a path! whose value is simply data/x/y, it never gets evaluated.
In the internal representation, a PATH! is quite similar to a BLOCK! or a PAREN!. It just has this special distinctive lexical type, which allows it to be treated differently. Although you've noticed that it can behave like a "dot" by picking members out of an object or series, that is only how it is used by the DO dialect. You could invent your own ideas, let's say you make the "russell" command:
russell [
x: 10
y: 20
z: 30
x/y/z
(
print x
print y
print z
)
]
Imagine that in my fanciful example, this outputs 30, 10, 20...because what the russell function does is evaluate its block in such a way that a path is treated as an instruction to shift values. So x/y/z means x=>y, y=>z, and z=>x. Then any code in parentheses is run in the DO dialect. Assignments are treated normally.
When you want to make up a fun new riff on how to express yourself, Rebol takes care of a lot of the grunt work. So for example the parentheses are guaranteed to have matched up to get a paren!. You don't have to go looking for all that yourself, you just build your dialect up from the building blocks of all those different types...and hook into existing behaviors (such as the DO dialect for basics like math and general computation, and the mind-bending PARSE dialect for some rather amazing pattern matching muscle).
But speaking of "all those different types", there's yet another weirdo situation for slash that can create another type:
>> type? quote /foo
This is called a refinement!, and happens when you start a lexical element with a slash. You'll see it used in the DO dialect to call out optional parameter sets to a function. But once again, it's just another symbolic LEGO in the parts box. You can ascribe meaning to it in your own dialects that is completely different...
While I didn't find any definitive written clarification, I did find that +, -, * and others are valid characters in a word, so an operator clearly requires spaces around it.
x*y
is a valid identifier, whereas
x * y
performs multiplication. It looks like the path "operator" is just another case of this.
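The whitespace rule discussed in these answers can be mimicked with a toy classifier. The Python sketch below is not Rebol's real lexer (it ignores blocks, parens, strings, set-words and most datatypes); it only splits on whitespace and applies the few rules mentioned in this thread, and the function names `classify` and `scan` are made up for illustration.

```python
def classify(token):
    """Very rough stand-in for how a single whitespace-delimited token is typed."""
    if token.strip("/") == "":
        return "word!"        # a lone "/" (or "//") is an ordinary word
    if token.startswith("/"):
        return "refinement!"  # leading slash, e.g. /foo
    if "/" in token:
        return "path!"        # slash inside the token, e.g. x/y or d/3
    if token.isdigit():
        return "integer!"
    return "word!"            # includes things like x*y


def scan(source):
    """Split on whitespace and pair each token with its toy type."""
    return [(tok, classify(tok)) for tok in source.split()]


if __name__ == "__main__":
    for tok, kind in scan("x/y x / y x*y /foo d/3 1024"):
        print(f"{tok:6} {kind}")
```

The point of the sketch is that the decision is made per token, before any evaluation: x/y and x / y differ lexically, so no type information is needed to tell a path from a division.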
