Meaning of rule "<*>" in flex

I have seen this rule and I'm not sure what it means:
<*> --
Does it mean that the rule covers every state (INITIAL plus all the ones declared with %x)?

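Yes, that's exactly what it means. See the start conditions section in the flex manual.
Note that start conditions can be declared with either %s (inclusive) or %x (exclusive). In an exclusive start condition, only rules explicitly qualified with that condition (or with <*>) are active; in an inclusive one, rules with no start condition are active as well. The start conditions section of the manual explains the difference in detail.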
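To make the distinction concrete, here is a small Python model (not flex itself) of which rules are eligible in a given start condition. The names Rule and active_rules and the example conditions COMMENT and STRING are made up for illustration. The behavior modeled is the one the flex manual describes: a rule with no start condition is active in INITIAL and in every inclusive (%s) condition, while <*> is active in every condition, exclusive ones included.

```python
from dataclasses import dataclass

INITIAL = "INITIAL"

@dataclass(frozen=True)
class Rule:
    name: str
    # None  -> rule written with no <...> prefix
    # "*"   -> rule written as <*>
    # tuple -> rule written as <A,B,...>
    conditions: object = None

def active_rules(rules, current, exclusive):
    """Return names of rules eligible while the scanner is in `current`.

    `exclusive` is the set of start conditions declared with %x;
    anything else (including INITIAL) is treated as inclusive.
    """
    result = []
    for rule in rules:
        if rule.conditions == "*":
            ok = True                        # <*> matches in every condition
        elif rule.conditions is None:
            ok = current not in exclusive    # bare rules skip %x conditions
        else:
            ok = current in rule.conditions
        if ok:
            result.append(rule.name)
    return result

rules = [
    Rule("bare"),
    Rule("star", "*"),
    Rule("in_comment", ("COMMENT",)),
]
exclusive = {"COMMENT"}   # as if declared with: %x COMMENT
                          # STRING is treated as:  %s STRING

print(active_rules(rules, INITIAL, exclusive))    # ['bare', 'star']
print(active_rules(rules, "STRING", exclusive))   # ['bare', 'star']
print(active_rules(rules, "COMMENT", exclusive))  # ['star', 'in_comment']
```

Only the exclusive condition hides the bare rule; the <*> rule shows up in all three lists.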

Related

Is there a guide/convention to the markup being used in the lua official documentation?

I consistently have a really hard time reading official documentation when it's related to coding. I generally don't understand it unless it's paired with an example. I am seeking clarification on what kind of conventions are in place when reading docs, if any. Take the example below from the Lua manual (https://www.lua.org/manual/5.1/manual.html#2.1):
stat ::= if exp then block {elseif exp then block} [else block] end
The first word, Stat, is defined as a statement and "this set includes assignments, control structures, function calls, and variable declarations."
::= Is not defined in the docs, it can be googled thankfully.
Exp is linked and explained.
Block has a section as well.
But then they do {} and []. They literally stated "Square brackets are used to index a table" just a few lines above. And that squiggly brackets are for writing a table. So what am I supposed to deduce from this? That {} and [] are being used to denote separate sections as a markup to make it easier to see certain components? Or that {elseif exp then block} is a table with those values inside of itself and [else block] is a key-value indexing a table? If I was writing a doc where that was indeed the case, wouldn't I write it this way?
Then I see
var ::= prefixexp `[´ exp `]´
' ' defines a string, but I have to assume that '[' ']' is used to highlight the position of the square brackets, because they were talking about what square brackets do in the previous section, and that the quotes should not be included in the code. I only know to make this assumption because I know it doesn't work when you put them in there.
But then I see this:
chunk ::= {stat [`;´]}
Similarly, they are talking about the placement of the semicolon before listing that code, but the entirety of the line of code was also newly explained and being talked about. Why would I assume that it's written without the parentheses if it's written with the parentheses? And I see they are using {} and [] again, and I have no idea what they are referencing because it's not stated explicitly that we're talking about a table. It's simply using the code itself to show whether it's talking about a table or not with the {}, but we have that first line of code where {} is being used and it's not talking about a table.
What is the convention being used? What are they actually trying to do/show by using {} and [] in the first line of code?
As stated both at the beginning of the Lua documentation and in the section on the Lua grammar, Lua presents its grammar in extended BNF format.
EBNF has its own punctuation with its own meaning, like ::= as you discovered. But as a grammar, there needs to be a distinction between the EBNF meaning of a piece of punctuation and "this punctuation appears in the language defined by the grammar". The former meaning is therefore always assumed; the latter meaning can only be achieved by quoting the punctuation.
So this:
var ::= prefixexp `[´ exp `]´
Means a prefixexp followed by an open bracket followed by exp followed by a close bracket.
By contrast, this:
funcname ::= Name {`.´ Name} [`:´ Name]
Means Name followed by zero or more sub-sequences of . followed by Name, followed by an optional sub-sequence of : followed by Name. Because those are what {} and [] mean to EBNF.
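One way to internalize the notation is to translate that rule into a regular expression: {X} becomes (X)* and [X] becomes (X)?. The sketch below does this in Python. The simplified NAME pattern and the function is_funcname are assumptions made for the example; they are not part of the Lua manual.

```python
import re

# Simplified stand-in for Lua's Name token (letters, digits, underscores,
# not starting with a digit). Reserved words are not excluded here.
NAME = r"[A-Za-z_][A-Za-z0-9_]*"

# funcname ::= Name {`.´ Name} [`:´ Name]
#   {`.´ Name}  -> zero or more ".Name"  -> (?:\.Name)*
#   [`:´ Name]  -> optional ":Name"      -> (?::Name)?
FUNCNAME = re.compile(rf"{NAME}(?:\.{NAME})*(?::{NAME})?")

def is_funcname(text):
    """True if `text` (with no whitespace) matches the funcname rule."""
    return FUNCNAME.fullmatch(text) is not None

print(is_funcname("f"))        # True
print(is_funcname("a.b.c"))    # True
print(is_funcname("a.b:m"))    # True
print(is_funcname("a:m.b"))    # False: nothing may follow the ":Name" part
print(is_funcname("a{.b}"))    # False: braces are EBNF notation, not Lua
```

The last case is the point of the quoting convention: unquoted braces and brackets never appear in the language itself.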

Why do we need both a lookahead symbol and a read-ahead symbol in a compiler?

I was reading about some common concepts in compiler parsing and came across the lookahead symbol and the read-ahead symbol. I have searched and read about them, but I am stuck on why we need both. I would be grateful for any suggestions.
Lookahead symbol: when the node being considered in the parse tree is for a terminal, and the terminal matches the lookahead symbol, then we advance in both the parse and the input.
Read-ahead symbol: the lexical analyzer may need to read some characters before it can decide on the token to be returned.
One of these is about parsing and refers to the next token to be produced by the lexical scanner. The other one, which is less formal, is about lexical analysis and refers to the next character in the input stream. It should be clear which is which.
Note that while most parsers only require a single lookahead token, it is not uncommon for lexical analysis to have to backtrack, which is equivalent to examining several unconsumed input characters.
I hope I got your question right.
Consider C.
It has several punctuators that begin the same way:
+, ++, +=
-, --, -=, ->
<, <=, <<, <<=
...
In order to figure out which one it is when you see the first + or - or <, you need to look ahead one character in the input (and then maybe one more for <<=).
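A minimal sketch of that character-level read-ahead, in Python rather than C for brevity. It uses the common "longest match" strategy: peek at up to three characters and keep the longest candidate that is a real punctuator. The names PUNCTUATORS, next_punctuator and scan are illustrative, and the set contains only the punctuators listed above.

```python
# Punctuators from the example above; an illustrative subset of C.
PUNCTUATORS = {"+", "++", "+=", "-", "--", "-=", "->", "<", "<=", "<<", "<<="}
MAX_LEN = max(len(p) for p in PUNCTUATORS)

def next_punctuator(src, pos):
    """Return the longest punctuator starting at src[pos], or None.

    The scanner reads ahead up to MAX_LEN characters and keeps the longest
    candidate that is a real punctuator; the rest stays unconsumed.
    """
    for length in range(MAX_LEN, 0, -1):
        candidate = src[pos:pos + length]
        if candidate in PUNCTUATORS:
            return candidate
    return None

def scan(src):
    """Split a string made only of punctuators and spaces into tokens."""
    tokens, pos = [], 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        tok = next_punctuator(src, pos)
        if tok is None:
            raise ValueError(f"unexpected character {src[pos]!r} at {pos}")
        tokens.append(tok)
        pos += len(tok)
    return tokens

print(scan("<<= << <= <"))   # ['<<=', '<<', '<=', '<']
print(scan("+++"))           # ['++', '+']
print(scan("--->"))          # ['--', '->']
```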
A similar thing can happen at a higher level:
{
ident1 ident2;
ident3;
ident4:;
}
Here ident1, ident3 and ident4 can begin a declaration, an expression or a label. You can't tell which one immediately. You can consult your existing declarations to see if ident1 or ident3 is already known (as a type or variable/function/enumeration), but it's still ambiguous because a colon may follow and if it does, it's a label because it's permitted to use the same identifier for both a label and a type/variable/function/enumeration (those two name spaces do not intersect), e.g.:
{
typedef int ident1;
ident1 ident2; // same as int ident2
int ident3 = 0;
ident3; // unused expression of value 0
ident1:; // unused label
ident2:; // unused label
ident3:; // unused label
}
So, you may very well need to look ahead a character or a token (or "unread" one) to deal with situations like these.
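The token-level case can be sketched the same way. This is a drastically simplified model, not how a real C parser is structured: statements are given as lists of token strings, typedef_names stands in for the table of known type names, and classify is a hypothetical helper. The only point is that the decision is made by peeking at the token after the identifier.

```python
def classify(tokens, typedef_names):
    """Classify a statement that begins with an identifier.

    Looks one token past the identifier: a following ":" always means a
    label, whatever else the identifier is known as, because labels live
    in their own name space.
    """
    first, following = tokens[0], tokens[1]
    if following == ":":
        return "label"
    if first in typedef_names:
        return "declaration"
    return "expression"

typedef_names = {"ident1"}   # as if: typedef int ident1;

print(classify(["ident1", "ident2", ";"], typedef_names))  # declaration
print(classify(["ident3", ";"], typedef_names))            # expression
print(classify(["ident1", ":", ";"], typedef_names))       # label
```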

Pin & recoverWhile in a .bnf (Parsing)

I've searched the internet far and wide (for at least half a day now) and I can't seem to find the answers needed.
Currently I'm trying to create a .bnf-file for an IntelliJ-Plugin with custom language support.
A few tutorials mention the existence of {pin=1}, {pin=2} and {recoverWhile=xyz}, but I didn't find any real explanation of their uses, or whether there are any other things I should know (maybe a {pin=3} also exists?).
So could somebody tell me what exactly those flags, methods or however they're called are, and how to use them in my .bnf, please?
Thank you for your help and best regards,
Fuchs
These attributes are explained here:
https://github.com/JetBrains/Grammar-Kit/blob/master/HOWTO.md#22-using-recoverwhile-attribute
https://github.com/JetBrains/Grammar-Kit/blob/master/TUTORIAL.md
But the usage is not trivial. A good idea is to use Live Preview to play around with it.
My understanding:
The pin and recoverWhile attributes are used to let the parser recover from errors.
Pin specifies a part of the rule (by index or literally); once that part has been parsed successfully, the whole rule is considered successful.
In the example:
expr ::= expr1 "+" expr2 {pin=1}
If expr1 is matched, the whole rule will be considered successful and the parser will try to match the rest.
With pin=2, the rule will be considered successful after matching "+" and will fail if expr1 or "+" is not matched.
The recoverWhile attribute specifies which input to skip after the rule has been parsed, regardless of whether it succeeded.
For example
{recoverWhile=expr_recover}
expr_recover ::= !(";" | ".")
will skip all input before ";" or ".". I.e. parser will start matching next rule from ";" or ".".
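The following toy model in Python shows my understanding of the two attributes described above. It is not Grammar-Kit's generated code: parse_sequence, recover_while, the "NUM" token kind and the token lists are all invented for the illustration.

```python
def matches(expected, token):
    """"NUM" matches any digit string; anything else must match literally."""
    return token.isdigit() if expected == "NUM" else token == expected

def parse_sequence(tokens, pos, items, pin):
    """Match `items` in order starting at tokens[pos].

    Returns (matched, new_pos, errors). With pin=N (1-based), once the
    first N items have matched, the rule counts as matched even if a
    later item is missing; the miss is recorded as an error instead.
    """
    start, errors = pos, []
    for index, expected in enumerate(items, start=1):
        if pos < len(tokens) and matches(expected, tokens[pos]):
            pos += 1
        elif index > pin:
            errors.append(f"expected {expected}")
            break                       # pinned: keep the partial match
        else:
            return False, start, errors # not pinned yet: plain failure
    return True, pos, errors

def recover_while(tokens, pos, stop_tokens):
    """Skip tokens until one in stop_tokens, like !(";" | ".")."""
    while pos < len(tokens) and tokens[pos] not in stop_tokens:
        pos += 1
    return pos

EXPR = ["NUM", "+", "NUM"]              # expr ::= expr1 "+" expr2
tokens = ["1", "+", "@", "#", ";", "2"]

ok, pos, errors = parse_sequence(tokens, 0, EXPR, pin=1)
print(ok, pos, errors)                  # True 2 ['expected NUM']
pos = recover_while(tokens, pos, {";", "."})
print(pos, tokens[pos])                 # 4 ;

print(parse_sequence(["1", "@"], 0, EXPR, pin=2))   # (False, 0, [])
print(parse_sequence(["1", "@"], 0, EXPR, pin=1))   # (True, 1, ['expected +'])
```

In this model the garbage tokens "@" and "#" are skipped by the recovery step, so whatever rule comes next starts at ";".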

How do you read the "<<" and ">>" symbols out loud?

I'm wondering if there is a standard way, if we are pronouncing typographical symbols out loud, for reading the << and >> symbols? This comes up for me when teaching first-time C++ students and discussing/fixing exactly what symbols need to be written in particular places.
The best answer should not be names such as "bitwise shift" or "insertion", because those refer to more specific C++ operators, as opposed to the context-free symbol itself (which is what we want here). In that sense, this question is not the same as questions such as this or this, none of whose answers satisfy this question.
Some comparative examples:
We can read #include <iostream> as "pound include bracket iostream bracket".
We can read int a, b, c; as "int a comma b comma c semicolon".
We can read if (a && b) c = 0; as "if open parenthesis a double ampersand b close parenthesis c equals zero semicolon".
So an equivalent question would be: How do we similarly read cout << "Hello";? At the current time in class we are referring to these symbols as "left arrow" and "right arrow", but if there is a more conventional phrasing I would prefer to use that.
Other equivalent ways of stating this question:
How do we typographically read <<?
What is the general name of the symbol <<, whether being used for bit-shifts, insertion, or overloaded for something entirely new?
If a student said, "Professor, I don't remember how to make an insertion operator; please tell me what symbol to type", then what is the best verbal response?
What is the best way to fill in this analogy? "For the multiplication operation we use an asterisk; for the division operation we use a forward-slash; for the insertion operation we use ____."
Saw this question through your comment on Slashdot. I suggest a simpler name for students that uses an already common understanding of the symbol. In the same way that + is called "plus" and - is (often) called "minus," you can call < by the name "less" or "less-than" and > by "greater" or "greater-than." This recalls math operations and symbols that are taught very early for most students and should be easy for them to remember. Plus, you can use the same name when discussing the comparison operators. So, you would read
std::cout << "Hello, world!" << std::endl;
as
S T D colon colon C out less less double-quote Hello comma world exclamation-point double-quote less less S T D colon colon end L semicolon.
Also,
#include <iostream>
as
pound include less I O stream greater
So, the answer to
"Professor, I don't remember how to make an insertion operator; please tell me what symbol to type."
is "Less less."
The more customary name "left/right angle bracket" should be taught at the same time to teach the more common name, but "less/greater" is a good reminder of what the actual symbol is, I think.
Chevron is also a neat name, but a bit obscure in my opinion, not to mention the corporate affiliation.
A proposal: Taking the appearance of the insertion/extraction operators as similar to the Guillemet symbols, we might look to the Unicode description of those symbols. There they are described as "Left-pointing double angle quotation mark" and "Right-pointing double angle quotation mark" (link).
So perhaps we could be calling the symbols "double-left angle" and "double-right angle".
My comment was mistaken (Chrome's PDF Reader has a buggy "Find in File" feature that didn't give me all of the results at first).
Regarding the OP's specific question about the name of the operator, regardless of context - then there is no answer, because the ISO C++ specification does not name the operators outside of a use context (e.g. the + operator is named "addition" but only with number types, it is not named as such when called to perform string concatenation, for example). That is, the ISO C++ standard does not give operator tokens a specific name.
The section on Shift Operators (5.8) only defines and names them for integral/enum types, and the section on Overloaded Operators does not confer upon them a name.
Myself, if I were teaching C++ and explaining the <</>> operators I would say "the double-angle-bracket operator is used to denote bitshifts with integer types, and insertion/extraction with streams and strings". Or if I were being terse I'd overload the word and simply say "the bitshift operator is overloaded for streams to mean something completely different".
Regarding the secondary question (in the comment thread) about the name of the <</>> operators in the context of streams and strings, the C++14 ISO specification (final working-draft: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf ) does refer to them as "extractors and inserters":
21.4.8.9 Inserters and extractors
template<class charT, class traits, class Allocator>
basic_istream<charT,traits>&
operator>>(
basic_istream<charT,traits>& is,
basic_string<charT,traits,Allocator>& str
);
(and the rest of the >> operator overload definitions follow)
This is further expanded upon in 27.7.2.2.2:
27.7.2.2.2 Arithmetic extractors
operator>>(unsigned short& val);
operator>>(unsigned int& val);
operator>>(long& val);
(and so on...)
cout << "string" << endl;// I really just say "send string to see out. Add end line."
i++; // i plus plus
auto x = class.func() // auto x equal class dot func
10 - ( i %4) * x; // ten minus the quantity i mod four times x
stdout // stud-out
stderr // stud-err
argc // arg see
argv // arg vee
char* // char pointer
&f // address of f
Just because it's an "extraction" or an "insertion" operator does not mean that is the OPERATION.
The operation is "input" and "output"
They are stream operators.
The natural reading would be: c-out stream output double-quote Hello world exclamation double-quote stream output endline.
This is the OPERATION you are doing (the verb).
What the ARM calls the operator is irrelevant: that is a systemic way of looking at things, and we are trying to help humans understand things instead.

Haskell/Parsec: How do you use the functions in Text.Parsec.Indent?

I'm having trouble working out how to use any of the functions in the Text.Parsec.Indent module provided by the indents package for Haskell, which is a sort of add-on for Parsec.
What do all these functions do? How are they to be used?
I can understand the brief Haddock description of withBlock, and I've found examples of how to use withBlock, runIndent and the IndentParser type here, here and here. I can also understand the documentation for the four parsers indentBrackets and friends. But many things are still confusing me.
In particular:
What is the difference between withBlock f a p and
do aa <- a
   pp <- block p
   return (f aa pp)
Likewise, what's the difference between withBlock' a p and do {a; block p}?
In the family of functions indented and friends, what is ‘the level of the reference’? That is, what is ‘the reference’?
Again, with the functions indented and friends, how are they to be used? With the exception of withPos, it looks like they take no arguments and are all of type IParser () (IParser defined like this or this) so I'm guessing that all they can do is to produce an error or not and that they should appear in a do block, but I can't figure out the details.
I did at least find some examples on the usage of withPos in the source code, so I can probably figure that out if I stare at it for long enough.
<+/> comes with the helpful description “<+/> is to indentation sensitive parsers what ap is to monads” which is great if you want to spend several sessions trying to wrap your head around ap and then work out how that's analogous to a parser. The other three combinators are then defined with reference to <+/>, making the whole group unapproachable to a newcomer.
Do I need to use these? Can I just ignore them and use do instead?
The ordinary lexeme combinator and whiteSpace parser from Parsec will happily consume newlines in the middle of a multi-token construct without complaining. But in an indentation-style language, sometimes you want to stop parsing a lexical construct or throw an error if a line is broken and the next line is indented less than it should be. How do I go about doing this in Parsec?
In the language I am trying to parse, ideally the rules for when a lexical structure is allowed to continue on to the next line should depend on what tokens appear at the end of the first line or the beginning of the subsequent line. Is there an easy way to achieve this in Parsec? (If it is difficult then it is not something which I need to concern myself with at this time.)
So, the first hint is to take a look at IndentParser
type IndentParser s u a = ParsecT s u (State SourcePos) a
I.e. it's a ParsecT keeping an extra close watch on SourcePos, an abstract container which can be used to access, among other things, the current column number. So, it's probably storing the current "level of indentation" in SourcePos. That'd be my initial guess as to what "level of reference" means.
In short, indents gives you a new kind of Parsec which is context sensitive—in particular, sensitive to the current indentation. I'll answer your questions out of order.
(2) The "level of reference" is the position, recorded in the parser's state, at which the parser currently believes this indentation level starts. To make this clearer, let me give some test cases in (3).
(3) In order to start experimenting with these functions, we'll build a little test runner. It'll run the parser with a string that we give it and then unwrap the inner State part using an initialPos which we get to modify. In code
import Text.Parsec
import Text.Parsec.Pos
import Text.Parsec.Indent
import Control.Monad.State
testParse :: (SourcePos -> SourcePos)
          -> IndentParser String () a
          -> String -> Either ParseError a
testParse f p src = fst $ flip runState (f $ initialPos "") $ runParserT p () "" src
(Note that this is almost runIndent, except I gave a backdoor to modify the initialPos.)
Now we can take a look at indented. By examining the source, I can tell it does two things. First, it'll fail if the current SourcePos column number is less-than-or-equal-to the "level of reference" stored in the SourcePos stored in the State. Second, it somewhat mysteriously updates the State SourcePos's line counter (not column counter) to be current.
Only the first behavior is important, to my understanding. We can see the difference here.
>>> testParse id indented ""
Left (line 1, column 1): not indented
>>> testParse id (spaces >> indented) " "
Right ()
>>> testParse id (many (char 'x') >> indented) "xxxx"
Right ()
So, in order to have indented succeed, we need to have consumed enough whitespace (or anything else!) to push our column position out past the "reference" column position. Otherwise, it'll fail saying "not indented". Similar behavior exists for the next three functions: same fails unless the current position and reference position are on the same line, sameOrIndented fails if the current column is strictly less than the reference column, unless they are on the same line, and checkIndent fails unless the current and reference columns match.
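As a plain restatement of three of these checks, here is a small Python model that compares a current position against a reference position. It only mirrors the description above, not the library's source; Pos and the function names are made up for the illustration, and sameOrIndented is left out.

```python
from typing import NamedTuple

class Pos(NamedTuple):
    line: int
    column: int

def indented(current, reference):
    """Succeeds only if we are strictly to the right of the reference column."""
    return current.column > reference.column

def same(current, reference):
    """Succeeds only if we are still on the reference line."""
    return current.line == reference.line

def check_indent(current, reference):
    """Succeeds only if we are exactly at the reference column."""
    return current.column == reference.column

reference = Pos(line=1, column=1)            # like initialPos

print(indented(Pos(1, 1), reference))        # False: nothing consumed yet
print(indented(Pos(1, 2), reference))        # True: one space consumed
print(indented(Pos(1, 5), reference))        # True: "xxxx" consumed
print(same(Pos(2, 5), reference))            # False: we moved to line 2
print(check_indent(Pos(3, 1), reference))    # True: back at column 1
```

The first three calls correspond to the three testParse runs shown earlier.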
withPos is slightly different. It's not just an IndentParser, it's an IndentParser-combinator: it transforms the input IndentParser into one that thinks the "reference column" (the SourcePos in the State) is exactly where it was when we called withPos.
This gives us another hint, btw. It lets us know we have the power to change the reference column.
(1) So now let's take a look at how block and withBlock work using our new, lower level reference column operators. withBlock is implemented in terms of block, so we'll start with block.
-- simplified from the actual source
block p = withPos $ many1 (checkIndent >> p)
So, block resets the "reference column" to whatever the current column is and then runs p one or more times, as long as each parse starts at exactly this newly set "reference column". Now we can take a look at withBlock
withBlock f a p = withPos $ do
    r1 <- a
    r2 <- option [] (indented >> block p)
    return (f r1 r2)
So, it resets the "reference column" to the current column, parses a single a parse, tries to parse an indented block of ps, then combines the results using f. Your implementation is almost correct, except that you need to use withPos to choose the correct "reference column".
Then, once you have withBlock, withBlock' = withBlock (\_ bs -> bs).
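To see the shape of block and withBlock without any parser machinery, here is a rough Python analogue that works on pre-split items, each given as a (column, text) pair. This is only a model of the two simplified definitions above: real parsers consume characters, and the helper names block and with_block and the sample items are invented for the example.

```python
def block(items, start):
    """Collect consecutive items whose column equals that of items[start].

    Mirrors `withPos $ many1 (checkIndent >> p)`: the first item's column
    becomes the reference column, and at least one item is required.
    """
    if start >= len(items):
        raise ValueError("block needs at least one item")
    reference = items[start][0]
    end = start
    while end < len(items) and items[end][0] == reference:
        end += 1
    return [text for _, text in items[start:end]], end

def with_block(f, items, start):
    """Take one header item, then an optional more-indented block."""
    header_col, header = items[start]
    children, end = [], start + 1
    if end < len(items) and items[end][0] > header_col:   # like `indented`
        children, end = block(items, end)
    return f(header, children), end

items = [(1, "if x"), (5, "a"), (5, "b"), (1, "done")]

print(with_block(lambda h, cs: (h, cs), items, 0))
# (('if x', ['a', 'b']), 3)
print(with_block(lambda h, cs: (h, cs), items, 3))
# (('done', []), 4)
```

The second call shows the effect of option []: a header with no indented children still succeeds, with an empty list.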
(5) So, indented and friends are exactly the tools for doing this: they'll cause a parse to fail immediately if it's indented incorrectly with respect to the "reference position" chosen by withPos.
(4) Yes, don't worry about these guys until you learn how to use Applicative style parsing in base Parsec. It's often a much cleaner, faster, simpler way of specifying parses. Sometimes they're even more powerful, but if you understand Monads then they're almost always completely equivalent.
(6) And this is the crux. The tools mentioned so far can only do indentation failure if you can describe your intended indentation using withPos. Quickly, I don't think it's possible to specify withPos based on the success or failure of other parses... so you'll have to go another level deeper. Fortunately, the mechanism that makes IndentParsers work is obvious—it's just an inner State monad containing SourcePos. You can use lift :: MonadTrans t => m a -> t m a to manipulate this inner state and set the "reference column" however you like.
Cheers!
