I want to draw a parse tree for an if statement that has multiple statements if the condition is true, and multiple statements if the condition is false.
I know how to draw for 1 statement.
You are missing the syntactic element of a block or statement-list, which may consist of multiple statements but may also itself be used as a statement.
In your diagram, statement would then expand to a block, which in turn contains the statements.
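For instance, in BNF-like notation (the nonterminal and keyword names here are illustrative, not from any particular language), the relevant productions might look like:

```
statement      -> if-statement | block | ...
if-statement   -> "if" condition "then" statement "else" statement
block          -> "begin" statement-list "end"
statement-list -> statement | statement statement-list
```

Because a block is itself a statement, either branch of the if can expand to a block containing as many statements as needed.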
I have a DSL where a file consists of multiple named blocks.
Ideally, each block should occur only once, but the order doesn't matter.
How do I write a parser that ignores block order, but gives syntax errors if the same block is repeated?
One option is to detect the error after parsing, perhaps with a walker.
If you need to detect the errors during parsing, then add a semantics class that stores the block identifiers and raises SemanticError if a block name is repeated.
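Either way, the check itself is small. Here is a minimal Python sketch of the post-parse approach; the `SemanticError` name and `Block` shape are illustrative assumptions, not tied to any particular parser generator:

```python
class SemanticError(Exception):
    """Raised when the parsed input violates a semantic rule."""


def check_unique_blocks(block_names):
    """Reject the parse if any named block occurs more than once.

    Order is ignored; only repetition is an error.
    """
    seen = set()
    for name in block_names:
        if name in seen:
            raise SemanticError(f"block '{name}' is defined more than once")
        seen.add(name)


# Any order is fine:
check_unique_blocks(["footer", "header", "body"])

# A repeated block is an error:
try:
    check_unique_blocks(["header", "body", "header"])
except SemanticError as e:
    print(e)
```

To detect the error during parsing instead, call the same bookkeeping (a `seen` set plus a raise) from the semantic action that fires as each block is reduced.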
One of the biggest problems with designing a lexical analyzer/parser combination is overzealousness in designing the analyzer. (f)lex isn't designed to contain parser logic, which can sometimes interfere with the design of mini-parsers (by means of yy_push_state(), yy_pop_state(), and yy_top_state()).
My goal is to parse a document of the form:
CODE1 this is the text that might appear for a 'CODE' entry
SUBCODE1 the CODE group will have several subcodes, which
may extend onto subsequent lines.
SUBCODE2 however, not all SUBCODEs span multiple lines
SUBCODE3 still, however, there are some SUBCODES that span
not only one or two lines, but any number of lines.
this makes it a challenge to use something like \r\n
as a record delimiter.
CODE2 Moreover, it's not guaranteed that a SUBCODE is the
only way to exit another SUBCODE's scope. There may
be CODE blocks that accomplish this.
In the end, I've decided that this section of the project is better left to the lexical analyzer, since I don't want to create a pattern that matches each line (and identifies continuation records). Part of the reason is that I want the lexical analyzer to have knowledge of the contents of each line, without incorporating its own tokenizing logic. That is to say, if I match ^SUBCODE[ ][ ].{71}\r\n (all records are blocked in 80-character records), I would not be able to harness the power of flex to tokenize the structured data residing in .{71}.
Given these constraints, I'm thinking about doing the following:
Entering a CODE1 state from the <INITIAL> start condition results
in calls to:
yy_push_state(CODE_STATE)
yy_push_state(CODE_CODE1_STATE)
(do something with the contents of the CODE1 state identifier, if such contents exist)
yy_push_state(SUBCODE_STATE) (to tell the analyzer to expect SUBCODE states belonging to the CODE_CODE1_STATE). This is where the analyzer begins to masquerade as a parser.
The <SUBCODE1_STATE> start condition is nested as follows: <CODE_STATE>{ <CODE_CODE1_STATE>{ <SUBCODE_STATE>{ <SUBCODE1_STATE>{ (perform actions based on the matching patterns) } } } }. It also sets the global previous_state variable to yy_top_state(), to wit SUBCODE1_STATE.
Within <SUBCODE1_STATE>'s scope, \r\n will call yy_pop_state(). If a continuation record is present (which is a pattern at the highest scope, against which all text is matched), yy_push_state(continuation_record_states[previous_state]) is called, bringing us back to the scope in step 2. continuation_record_states[] maps each state to its continuation-record state, which is used by the parser.
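The push/pop bookkeeping in the steps above can be simulated outside of flex. The following is only a Python sketch of the state-stack idea; the state names mirror the question and nothing here is a real flex API:

```python
# Hypothetical state names mirroring the question's start conditions.
CODE_STATE = "CODE_STATE"
CODE_CODE1_STATE = "CODE_CODE1_STATE"
SUBCODE_STATE = "SUBCODE_STATE"
SUBCODE1_STATE = "SUBCODE1_STATE"
SUBCODE1_CONT_STATE = "SUBCODE1_CONT_STATE"

# Maps each state to its continuation-record state, as in step 3.
continuation_record_states = {SUBCODE1_STATE: SUBCODE1_CONT_STATE}

stack = []            # plays the role of flex's start-condition stack
previous_state = None

def yy_push_state(s):
    stack.append(s)

def yy_pop_state():
    return stack.pop()

def yy_top_state():
    return stack[-1]

# Entering CODE1 from <INITIAL> (step 1):
yy_push_state(CODE_STATE)
yy_push_state(CODE_CODE1_STATE)
yy_push_state(SUBCODE_STATE)

# Entering <SUBCODE1_STATE> (step 2):
yy_push_state(SUBCODE1_STATE)
previous_state = yy_top_state()

# Hitting \r\n, then seeing a continuation record (step 3):
yy_pop_state()
yy_push_state(continuation_record_states[previous_state])
print(yy_top_state())   # the continuation scope for SUBCODE1
```

Writing it out this way makes the complexity visible: three pushes just to enter one record type, plus a global to remember where to resume.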
As you can see, this is quite complicated, which leads me to conclude that I'm massively over-complicating the task.
Questions
For states lacking an extremely clear token signifying the end of their scope, is my proposed solution acceptable?
Given that I want to tokenize the input using flex, is there any way to do so without start conditions?
The biggest problem I'm having is that each record (beginning with the (SUB)CODE prefix) is unique, but the information appearing after the (SUB)CODE prefix is not. Therefore, it almost appears mandatory to have multiple states like this, and the abstract CODE_STATE and SUBCODE_STATE states would act as groupings for each of the concrete SUBCODE[0-9]+_STATE and CODE[0-9]+_STATE states.
I would look at how the OMeta parser handles these things.
If I evaluate independent variables in the same EVALUATE TRUE block, are they evaluated in the order they are listed?
E.g., if it's "COLD" and "SUNNY", would I ever "BRING SUNGLASSES?" Or would I just "WEAR SWEATER" and exit the block?
EVALUATE TRUE
    WHEN COLD
        WEAR SWEATER
    WHEN SUNNY
        BRING SUNGLASSES
END-EVALUATE
In many other languages, we often need to insert a break statement (or similar) into each selection so that it does not fall through. However, that's not the case with COBOL's EVALUATE: it ends as soon as one of the selections is satisfied (or when none are).
Yes, they are evaluated in the order they are listed. Once the condition of one of the WHEN clauses is met, it stops evaluating and proceeds to END-EVALUATE.
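In if/elif terms (a Python sketch of the semantics, not COBOL), the EVALUATE above behaves like this:

```python
def evaluate_true(cold, sunny):
    """Mimics EVALUATE TRUE: WHEN clauses are tried in order,
    only the first true one runs, and there is no fall-through."""
    actions = []
    if cold:                               # WHEN COLD
        actions.append("WEAR SWEATER")
    elif sunny:                            # WHEN SUNNY
        actions.append("BRING SUNGLASSES")
    return actions                         # END-EVALUATE

# COLD and SUNNY: only the first matching WHEN runs.
print(evaluate_true(cold=True, sunny=True))   # ['WEAR SWEATER']
```

So if it's both "COLD" and "SUNNY", you would only "WEAR SWEATER"; the sunglasses clause is never reached.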
I have a grammar
S->a{b}
and I'm trying to rewrite it to avoid using {}. If I write
S->a|aB
B->b|bB
then I'm unable to parse predictively in the second rule. If I write
S->a|aB
B->b|Bb
then I become left-recursive in the second rule.
Trying to do left-factoring,
B->bC
C->(e)|B
I'm introducing empty symbols. The goal is a grammar without (e), suitable for predictive parsing and not left-recursive.
Is it possible?
I don't think so. Essentially, you can drop the a part of your grammar, and the first of the b's; they are not relevant to your problem. You have then enumerated all three styles of declaring an infinite number of b's. You'll have to choose one of them.
I'd advise just going with the empty symbol.
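The ε alternative causes no trouble for predictive parsing, because one token of lookahead decides it. Here is a minimal Python sketch of a recursive-descent parser for the fully left-factored grammar S -> a X, X -> (e) | B, B -> b X (an equivalent ε-based factoring of the original, recognizing a followed by zero or more b's):

```python
def parse(s):
    """Predictively parse S -> a X, X -> (e) | B, B -> b X.
    Returns True iff the whole string is a followed by zero or more b's."""
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def S():
        expect("a")
        X()

    def X():
        # One token of lookahead decides: a 'b' means X -> B,
        # anything else means X -> (e).
        if peek() == "b":
            B()

    def B():
        expect("b")
        X()

    S()
    return pos == len(s)

print(parse("abbb"))   # True
```

Each nonterminal's alternatives start with distinct lookahead tokens, so the ε rule never causes a conflict; it is simply the "stop repeating" choice.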
What is the difference between using as="element(data)+" and as="element(data)" in xsl:variable? The XSL solution below works if I use "+", but not when I leave it out. Can someone clarify?
element(data)+
means a sequence of one or more data elements. That is, the sequence cannot be empty.
element(data)*
means a sequence of zero or more data elements. That is, the sequence can be empty.
element(data) on its own, with no occurrence indicator, means exactly one data element; that is why binding a sequence of several data elements fails without the "+".