I currently have a method:
pMain
    | parser |
    parser := 'proc' asParser, #space asParser, "<-- I am trying to use the method identifier here, so I tried 'self identifier' instead of 'proc' asParser"
        #letter asParser plus, $( asParser,
        'int' asParser, #space asParser,
        #letter asParser plus, $) asParser,
        #space asParser, 'corp' asParser.
    ^ parser
I also have these two methods.
1. The keywords method:
keywords
^ keywords ifNil: [
keywords := Set newFrom: #(
proc corp
if then else fi
do od
print as
while
for from to by
return
int string real array format bool
true false
mutable static
)
]
2. The identifier method:
identifier
^ ((#letter asParser , #word asParser star) flatten) >=> [ : ctxt : aBlock | | parse |
parse := aBlock value.
(self keywords includes: parse) ifTrue: [
PPFailure message: 'keyword matched' context: ctxt
] ifFalse: [
parse
]]
Question: how is the identifier parser used in pMain?
I feed it this line:
MyParser new pMain parse: 'proc a( int a ) corp'
'proc' asParser returns a parser that accepts the string 'proc'; this is similar to $p asParser that returns a parser that accepts the character $p.
I guess your question is about how to refer to parser productions. In subclasses of PPCompositeParser you can do this by creating a method that returns its parser (you did this). Productions then refer to each other by reading the instance variable of the same name (you have to create those instance variables yourself, unless you use the PetitParser tools).
You can find a tutorial about composite parsers in the documentation.
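The keyword-rejection idea in the identifier method is language-independent, so it may help to see it outside PetitParser. Below is a minimal Python sketch of the same technique (the function name and keyword subset are illustrative, not part of PetitParser): match letter (letter|digit)*, then veto the result if the lexeme is a reserved word.

```python
import re

# Illustrative subset of the keyword set from the question.
KEYWORDS = {"proc", "corp", "if", "then", "else", "fi", "do", "od", "while"}

_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def parse_identifier(text, pos=0):
    """Return (lexeme, new_pos) on success, or None on failure.

    Mirrors the `>=>` wrapper in the question: first run the plain
    identifier parser, then reject the result if it is a keyword.
    """
    m = _IDENT.match(text, pos)
    if m is None:
        return None              # no identifier at this position at all
    lexeme = m.group(0)
    if lexeme in KEYWORDS:
        return None              # same role as PPFailure 'keyword matched'
    return lexeme, m.end()
```

Here parse_identifier('proc') fails while parse_identifier('myProc') yields ('myProc', 6), which is exactly the behavior pMain needs when it uses self identifier instead of a bare letter-run.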
Problem
When I use do, a Lua keyword, as a table's key, it gives the following error:
> table.newKey = { do = 'test' }
stdin:1: unexpected symbol near 'do'
>
I need to use do as a key. What should I do?
sometable.somekey is syntactic sugar for sometable['somekey']; similarly, { somekey = somevalue } is sugar for { ['somekey'] = somevalue }.
Information like this can be found in this very good resource:
For such needs, there is another, more general, format. In this format, we explicitly write the index to be initialized as an expression, between square brackets:
opnames = {["+"] = "add", ["-"] = "sub",
["*"] = "mul", ["/"] = "div"}
-- Programming in Lua: 3.6 – Table Constructors
Use this syntax:
t = { ['do'] = 'test' }
or t['do'] to get or set a value.
I need to use do as a key. What should I do?
Read the Lua 5.4 Reference Manual and understand that something like t = { a = b} or t.a = b only works if a is a valid Lua identifier.
3.4.9 - Table constructors
The general syntax for constructors is
tableconstructor ::= ‘{’ [fieldlist] ‘}’
fieldlist ::= field {fieldsep field} [fieldsep]
field ::= ‘[’ exp ‘]’ ‘=’ exp | Name ‘=’ exp | exp
fieldsep ::= ‘,’ | ‘;’
A field of the form name = exp is equivalent to ["name"] = exp.
So why does this not work for do?
3.1 - Lexical Conventions
Names (also called identifiers) in Lua can be any string of Latin
letters, Arabic-Indic digits, and underscores, not beginning with a
digit and not being a reserved word. Identifiers are used to name
variables, table fields, and labels.
The following keywords are reserved and cannot be used as names:
and break do else elseif end
false for function goto if in
local nil not or repeat return
then true until while
do is not a Name, so you need to use the field ::= ‘[’ exp ‘]’ ‘=’ exp form,
which in your example is table.newKey = { ['do'] = 'test' }
I'm trying to implement a parser using monadic parser combinators with FParsec. I also use an indentation module, but it is not important for the current problem.
So I'm trying to parse this branch of my little AST:
type Identifier = string
type Expression =
...
| Call of Identifier * Expression list
...
type Program = Program of Expression list
I have this implementation:
// Identifier parser
let private identifier =
many1Satisfy2L isLetter
(fun c -> isLetter c || isDigit c) "identifier"
// Monadic parser
let call = parse {
let! id = identifier
let! parameters = sepBy parameter spaces1
return Call(id, parameters)
}
and expression =
... // <|>
attempt call // <|>
...
parameter represents all acceptable expressions in a function call as a parameter:
// All possible parameter
let parameter =
attempt pint32 <|> // A number, for example.
attempt (identifier |>> fun callid -> Call(callid, [])) <|>
attempt (between (pstring "(") (pstring ")") call)
As you can see, two elements of the parser produce a Call. They correspond, in order, to inputs like these:
add x 1 // Call the function `add` and passing it another call (`x`) and the simple literal `1`
add (add 8 2) 10 // Call the function `add` and passing it another call (`add 8 2`) and the simple literal `10`
And of course, these elements can also be intertwined:
add (add x 1) 7
The problem, which I obviously can't solve (otherwise I wouldn't be asking), is that the generated tree doesn't look like what I expected:
add x 1 gives:
Success: Program [Call ("add",[Call ("x",[]); Literal (Int 1)])]
In other words, the parser seems to turn the bare identifier x into a call of x with no arguments.
However, the second way works. add (add 8 2) 10 gives:
Success: Program
[Call
("add",[Call ("add",[Literal (Int 8); Literal (Int 2)]); Literal (Int 10)])]
Could you put me on track?
To me it looks like the following line matches x:
attempt (identifier |>> fun callid -> Call(callid, [])) <|>
A single identifier is seen as a call.
It is therefore no surprise that you get: [Call ("add",[Call ("x",[]); Literal (Int 1)])]
Since you think this is wrong, what is the result you expected? I would have expected this to be an identifier-dereference expression.
Also, a tip: you seem to have opted for the OCaml style of invoking functions, like so: f x y.
Perhaps you should also adopt the OCaml convention that a function always takes a single argument and returns a single value.
add x 1 would then be parsed into something like: Apply (Apply (Identifier "add", Identifier "x"), Literal 1)
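That curried shape is easy to build with a left fold over the argument list. A small Python sketch (tuples stand in for the AST constructors; the names are my own stand-ins, not FParsec types):

```python
from functools import reduce

def apply_chain(head, args):
    """Fold `f x y` into Apply(Apply(f, x), y): one application per argument."""
    return reduce(lambda fn, arg: ("Apply", fn, arg), args, head)
```

For example, apply_chain(("Identifier", "add"), [("Identifier", "x"), ("Literal", 1)]) produces ("Apply", ("Apply", ("Identifier", "add"), ("Identifier", "x")), ("Literal", 1)), matching the nested Apply shape suggested above.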
I'm stuck on this problem for a while now, hope you can help. I've got the following (shortened) language grammar:
lexical Id = [a-zA-Z][a-zA-Z]* !>> [a-zA-Z] \ MyKeywords;
lexical Natural = [1-9][0-9]* !>> [0-9];
lexical StringConst = "\"" ![\"]* "\"";
keyword MyKeywords = "value" | "Male" | "Female";
start syntax Program = program: Model* models;
syntax Model = Declaration;
syntax Declaration = decl: "value" Id name ':' Type t "=" Expression v ;
syntax Type = gender: "Gender";
syntax Expression = Terminal;
syntax Terminal = id: Id name
| constructor: Type t '(' {Expression ','}* ')'
| Gender;
syntax Gender = male: "Male"
| female: "Female";
alias ASLId = str;
data TYPE = gender();
public data PROGRAM = program(list[MODEL] models);
data MODEL = decl(ASLId name, TYPE t, EXPR v);
data EXPR = constructor(TYPE t, list[EXPR] args)
| id(ASLId name)
| male()
| female();
Now, I'm trying to parse:
value mannetje : Gender = Male
This parses fine, but fails on implode, unless I remove the id: Id name constructor from the grammar. I expected that the \ MyKeywords would prevent this, but unfortunately it doesn't. Can you help me fix this, or point me in the right direction for how to debug? I'm having some trouble debugging the concrete and abstract syntax.
Thanks!
It does not seem to parse at all (I get a ParseError if I try your example).
One of the problems is probably that you don't define Layout. This causes the ParseError with your given example. One of the easiest fixes is to extend the standard layout in lang::std::Layout. This layout defines all the default whitespace (and comment) characters.
For more information on nonterminals see here.
I took the liberty of simplifying your example a bit further so that parsing and imploding work. I removed some unused nonterminals to keep the parse tree more concise. You probably want more than Declarations in your Program, but I leave that up to you.
extend lang::std::Layout;
lexical Id = ([a-z] !<< [a-z][a-zA-Z]* !>> [a-zA-Z]) \ MyKeywords;
keyword MyKeywords = "value" | "Male" | "Female" | "Gender";
start syntax Program = program: Declaration* decls;
syntax Declaration = decl: "value" Id name ':' Type t "=" Expression v ;
syntax Type = gender: "Gender";
syntax Expression
= id: Id name
| constructor: Type t '(' {Expression ','}* ')'
| Gender
;
syntax Gender
= male: "Male"
| female: "Female"
;
data PROGRAM = program(list[DECL] exprs);
data DECL = decl(str name, TYPE t, EXPR v);
data EXPR = constructor(TYPE t, list[EXPR] args)
| id(str name)
| male()
| female()
;
data TYPE = gender();
Two things:
The names of the ADTs should correspond to the nonterminal names (you have different cases, and EXPR is not Expression). That is the only way implode can know how to do its work. Put the data declarations in their own module and implode as follows: implode(#AST::Program, pt), where pt is the parse tree.
The grammar was ambiguous: the \ MyKeywords only applied to the tail of the identifier syntax. Use the fix: ([a-zA-Z][a-zA-Z]* !>> [a-zA-Z]) \ MyKeywords;.
Here's what worked for me (grammar unchanged except for the fix):
module AST
alias ASLId = str;
data Type = gender();
public data Program = program(list[Model] models);
data Model = decl(ASLId name, Type t, Expression v);
data Expression = constructor(Type t, list[Expression] args)
| id(ASLId name)
| male()
| female();
First, the code:
package com.digitaldoodles.markup
import scala.util.parsing.combinator.{Parsers, RegexParsers}
import com.digitaldoodles.rex._
class MarkupParser extends RegexParsers {
val stopTokens = (Lit("{{") | "}}" | ";;" | ",,").lookahead
val name: Parser[String] = """[##!$]?[a-zA-Z][a-zA-Z0-9]*""".r
val content: Parser[String] = (patterns.CharAny ** 0 & stopTokens).regex
val function: Parser[Any] = name ~ repsep(content, "::") <~ ";;"
val block1: Parser[Any] = "{{" ~> function
val block2: Parser[Any] = "{{" ~> function <~ "}}"
val lst: Parser[Any] = repsep("[a-z]", ",")
}
object ParseExpr extends MarkupParser {
def main(args: Array[String]) {
println("Content regex is ", (patterns.CharAny ** 0 & stopTokens).regex)
println(parseAll(block1, "{{#name 3:4:foo;;"))
println(parseAll(block2, "{{#name 3:4:foo;; stuff}}"))
println(parseAll(lst, "a,b,c"))
}
}
then, the run results:
[info] == run ==
[info] Running com.digitaldoodles.markup.ParseExpr
(Content regex is ,(?:[\s\S]{0,})(?=(?:(?:\{\{|\}\})|;;)|\,\,))
[1.18] parsed: (#name~List(3:4:foo))
[1.24] failure: `;;' expected but `}' found
{{#name 3:4:foo;; stuff}}
^
[1.1] failure: string matching regex `\z' expected but `a' found
a,b,c
^
I use a custom library to assemble some of my regexes, so I've printed out the "content" regex; it's supposed to be basically any text up to but not including certain token patterns, enforced using a positive lookahead assertion.
Finally, the problems:
1) The first run on "block1" succeeds, but shouldn't, because the separator in the "repsep" function is "::", yet ":" are parsed as separators.
2) The run on "block2" fails, presumably because the lookahead clause isn't working--but I can't figure out why this should be. The lookahead clause was already exercised in the "repsep" on the run on "block1" and seemed to work there, so why should it fail on block 2?
3) The simple repsep exercise on "lst" fails because internally, the parser engine seems to be looking for a boundary--is this something I need to work around somehow?
Thanks,
Ken
1) No, ":" is not parsed as a separator. If it were, the output would be (#name~List(3, 4, foo)).
2) It happens because "}}" is also a delimiter, so the content expression takes the longest match it can, the one that includes the ";;" as well. If you make the preceding expression non-eager, it will then fail at the "s" of "stuff", which I presume is what you expected.
3) You passed a literal, not a regex. Modify "[a-z]" to "[a-z]".r and it will work.
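The greediness issue in point 2 can be reproduced with plain regular expressions, independent of the Scala library. A Python sketch, using the question's stop tokens as the lookahead:

```python
import re

text = "#name 3:4:foo;; stuff}}"
STOP = r"(?=\{\{|\}\}|;;|,,)"   # positive lookahead for any stop token

# Greedy: consumes as much as possible while a stop token still follows,
# so it runs past the ";;" and stops only just before the final "}}".
greedy = re.search(r"[\s\S]*" + STOP, text).group(0)

# Non-eager (lazy): stops at the first position where a stop token follows.
lazy = re.search(r"[\s\S]*?" + STOP, text).group(0)
```

Here greedy is "#name 3:4:foo;; stuff" while lazy is "#name 3:4:foo", which is why the content parser swallowed the ";;" that function needed.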
I am using Scala's combinator parsers by extending scala.util.parsing.combinator.syntactical.StandardTokenParsers. This class provides the following methods:
def ident : Parser[String] for parsing identifiers, and
def numericLit : Parser[String] for parsing a number (decimal, I suppose).
I am using scala.util.parsing.combinator.lexical.Scanners from scala.util.parsing.combinator.lexical.StdLexical for lexing.
My requirement is to parse a hexadecimal number (without the 0x prefix) which can be of any length. Basically a grammar like: ([0-9]|[a-f])+
I tried integrating a regex parser, but there were type issues. Other attempts to extend the lexer's delimiters and grammar rules led to "token not found" errors.
As I thought, the problem can be solved by extending the behavior of the lexer, not the parser. The standard lexer accepts only decimal digits, so I created a new lexer:
class MyLexer extends StdLexical {
override type Elem = Char
override def digit = ( super.digit | hexDigit )
lazy val hexDigits = Set[Char]() ++ "0123456789abcdefABCDEF".toArray
lazy val hexDigit = elem("hex digit", hexDigits.contains(_))
}
And my parser (which has to be a StandardTokenParsers) can be extended as follows:
object ParseAST extends StandardTokenParsers{
override val lexical:MyLexer = new MyLexer()
lexical.delimiters += ( "(" , ")" , "," , "#")
...
}
The construction of the "number" from digits is taken care of by the StdLexical class:
class StdLexical {
...
def token: Parser[Token] =
...
| digit~rep(digit)^^{case first ~ rest => NumericLit(first :: rest mkString "")}
}
Since StdLexical gives the parsed number as just a String, this is not a problem for me, as I am not interested in the numeric value anyway.
You can use RegexParsers with an action associated with the token in question.
import scala.util.parsing.combinator._
object HexParser extends RegexParsers {
val hexNum: Parser[Int] = """[0-9a-f]+""".r ^^
{ case s:String => Integer.parseInt(s,16) }
def seq: Parser[Any] = repsep(hexNum, ",")
}
This defines a parser that reads comma-separated hex numbers with no 0x prefix, and it actually returns an Int.
val result = HexParser.parse(HexParser.seq, "1, 2, f, 10, 1a2b34d")
scala> println(result)
[1.21] parsed: List(1, 2, 15, 16, 27439949)
Note that there is no way to distinguish decimal-notation numbers. Also, I'm using Integer.parseInt, which is limited to the size of your Int; to accept numbers of any length you may have to write your own conversion and use BigInteger or arrays.
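For comparison, the arbitrary-length requirement is easy to check outside the JVM: Python's built-in int is unbounded, which is the role BigInteger would play in Scala. A quick sketch of the same comma-separated hex grammar (the function name is mine):

```python
def parse_hex_list(s):
    """Parse comma-separated hex numbers (no 0x prefix) of any length."""
    return [int(token.strip(), 16) for token in s.split(",")]
```

parse_hex_list("1, 2, f, 10, 1a2b34d") returns [1, 2, 15, 16, 27439949], matching the Scala run above, and a 40-digit hex value converts just as well.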