I have created a custom DSL in Xtext with LSP support. The grammar looks something like this:
grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals
generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"
Model:
(intParameters+=IntParameter ("," intParameters+=IntParameter)*)?
stringParameters+=StringParameter*
elements+=Element*
otherElements+=AnotherElement*
;
Element returns Element:
'Element'
'{'
'name' name=StringValue
'}';
ParameterElement returns ParameterElement:
{ParameterElement} ref=([StringParameter|STRING])?
;
AnotherElement returns AnotherElement:
'AnotherElement'
'{'
'name' name=StringValue
'value' value=[Element]
'}';
StringValue:
{StringValue} ref=([StringParameter|STRING])?("+" STRING)? | value=ID
;
StringElement:
Element | StringParameter
;
StringParameter returns StringParameter:
{StringParameter}
'StringParameter'
name=ID
'{'
('value' value=STRING)?
'}';
IntValue:
ref=[IntParameter] | value=DECINT
;
IntParameter returns IntParameter:
{IntParameter}
'IntParameter'
name=ID
'{'
('value' value=DECINT)?
'}';
terminal fragment DIGIT: '0'..'9';
terminal DECINT: '0' | ('1'..'9' DIGIT*) | ('-''0'..'9' DIGIT*) ;
I was able to create a VS Code extension in which I get code completion and keyword highlighting. I have also implemented some basic validation in Xtext, which works in VS Code as well.
Now my question is: how can I parse my DSL file? I have access to the current file and am able to print its text:
const document = vscode.window.activeTextEditor?.document;
console.log(document?.getText());
For xml files, I saw examples which used
xml2js.parseStringPromise(document.getText(), {
mergeAttrs: true,
explicitArray: false
});
How can I do something like this for my custom language? Since I have used Xtext with LSP support, I should be able to use Xtext's underlying parser, right? I am able to get all the symbols in the file with
let symbols = await vscode.commands.executeCommand ('vscode.executeDocumentSymbolProvider', uri);
console.log (symbols);
But I don't want just the symbols; I want the entire text to be parsed.
Edit: I found https://github.com/tunnelvisionlabs/antlr4ts which generates parsers in TypeScript, which is exactly what I wanted! The only problem is that it needs a .g4 grammar, but Xtext generates .g (I guess this is ANTLR v3). There is another nice tool, https://github.com/kaby76/Domemtech.Trash, which converts .g (v2/v3) grammars to .g4.
But now I get these errors:
line 1556:29 extraneous input '(' expecting {TOKEN_REF, LEXER_CHAR_SET, STRING_LITERAL}
line 1556:62 extraneous input '(' expecting {TOKEN_REF, LEXER_CHAR_SET, STRING_LITERAL}
line 1556:74 mismatched input ')' expecting SEMI
line 1560:25 extraneous input '(' expecting {TOKEN_REF, LEXER_CHAR_SET, STRING_LITERAL}
line 1560:36 mismatched input ')' expecting SEMI
Error in parse of /home/parser/InternalMyDsl.g4
The rules look like
1548 fragment RULE_DIGIT : '0'..'9';
1549
1550 RULE_DECINT : ('0'|'1'..'9' RULE_DIGIT*|'-' '0'..'9' RULE_DIGIT*);
1551
1552 RULE_ID : '^'? ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
1553
1554 RULE_INT : ('0'..'9')+;
1555
1556 RULE_STRING : ('"' ('\\' .|~(('\\'|'"')))* '"'|'\'' ('\\' .|~(('\\'|'\'')))* '\'');
1557
1558 RULE_ML_COMMENT : '/*' ( . ) * ?'*/';
1559
1560 RULE_SL_COMMENT : '//' ~(('\n'|'\r'))* ('\r'? '\n')?;
1561
1562 RULE_WS : (' '|'\t'|'\r'|'\n')+;
1563
1564 RULE_ANY_OTHER : .;
The grammar has been auto-generated by Xtext, so I am not sure whether I should be modifying it.
I found https://github.com/tunnelvisionlabs/antlr4ts which generates parsers in typescript, which is exactly what I wanted! Once I run the command
antlr4ts -visitor src/InternalKinematics.g
I got some errors
line 1556:29 extraneous input '(' expecting {TOKEN_REF, LEXER_CHAR_SET, STRING_LITERAL}
line 1556:62 extraneous input '(' expecting {TOKEN_REF, LEXER_CHAR_SET, STRING_LITERAL}
line 1556:74 mismatched input ')' expecting SEMI
line 1560:25 extraneous input '(' expecting {TOKEN_REF, LEXER_CHAR_SET, STRING_LITERAL}
line 1560:36 mismatched input ')' expecting SEMI
Error in parse of /home/parser/InternalMyDsl.g4
Based on the answer in "Simple Xtext example generates grammar that Antlr4 doesn't like - who's to blame?", I just removed the extra parentheses, and it worked!
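If the grammar is regenerated often, the manual fix could be scripted. The following is a minimal sketch, assuming the only incompatibility is the doubled parentheses after `~` that the error lines point at (columns 29 and 62 of line 1556, column 25 of line 1560). `stripDoubleParens` is a hypothetical helper, not part of Xtext, Trash, or antlr4ts.

```typescript
// Hypothetical helper: rewrites `~((...))` to `~(...)` in grammar text.
// Assumption: the inner alternatives contain no parentheses of their own;
// anything that does not fit this shape is left untouched.
function stripDoubleParens(grammar: string): string {
  return grammar.replace(/~\(\(([^()]*)\)\)/g, "~($1)");
}

// One of the offending rules from the generated grammar, kept verbatim via String.raw.
const generatedRule = String.raw`RULE_SL_COMMENT : '//' ~(('\n'|'\r'))* ('\r'? '\n')?;`;
const fixedRule = stripDoubleParens(generatedRule);
console.log(fixedRule);
```

The same function could be applied to the whole file contents before running antlr4ts; since the file is generated, the rewrite would have to be repeated after every Xtext build.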
Related
I have a parser laid out similarly to (though not exactly like) this:
compilationUnit: statement* EOF;
methodBody: statement*;
ifBlock: IF '(' expression ')' '{' statement* '}' ;
statement: methodBody | ifBlock | returnStatement | ... ;
This parser works fine, and I can use it. However, it has the flaw that returnStatement will parse even if it's not in a method body. Ideally, I would be able to set it up such that returnStatement will only match in the statement rule if one of its parents down the line was methodBody. Is there a way to do that?
You have to differentiate the statements that appear inside the methodBody from the ones that appear outside of that scope. Ideally you will have two different productions. Something like this:
compilationUnit: member* EOF;
member: method | class | ... ;
method: methodName '(' parameters ')' '{' methodBody '}' ;
methodBody: statement*;
statement: methodBody | ifBlock | returnStatement | ... ;
ifBlock: IF '(' expression ')' '{' statement* '}' ;
You are trying to solve this problem at the wrong level. It shouldn't be handled at the grammar level, but in the subsequent semantic phase, which finds logical/semantic errors rather than the syntax errors your parser deals with. You can see the same approach in the C grammar: the statement rule references the jumpStatement rule, which in turn matches (among others) the return statement.
Handling such semantic errors also allows for better error messages. Instead of "no viable alt" you can print an error that says "return only allowed in a function body" or similar. To do that examine the generated parse tree, search for return statements and check the parent context(s) of that sub tree to know if it is valid or not.
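The parent-context check described above can be sketched as follows. This is not the ANTLR runtime API: `ParseNode`, `node`, `hasAncestor`, and `findInvalidReturns` are hypothetical stand-ins for the generated context classes, and the rule names are taken from the grammar in the question.

```typescript
// Hypothetical parse tree node; a real ANTLR context exposes a similar parent link.
interface ParseNode {
  rule: string;
  children: ParseNode[];
  parent?: ParseNode;
}

// Builds a node and wires up the parent links of its children.
function node(rule: string, ...children: ParseNode[]): ParseNode {
  const n: ParseNode = { rule, children };
  for (const c of children) c.parent = n;
  return n;
}

// True if any ancestor of `n` was produced by the given rule.
function hasAncestor(n: ParseNode, rule: string): boolean {
  for (let p = n.parent; p !== undefined; p = p.parent) {
    if (p.rule === rule) return true;
  }
  return false;
}

// Collects one message per return statement that is not inside a method body.
function findInvalidReturns(root: ParseNode): string[] {
  const errors: string[] = [];
  const visit = (n: ParseNode): void => {
    if (n.rule === "returnStatement" && !hasAncestor(n, "methodBody")) {
      errors.push("return only allowed in a function body");
    }
    n.children.forEach(visit);
  };
  visit(root);
  return errors;
}

// One return nested in an ifBlock inside a methodBody (valid), one at top level (invalid).
const sampleTree = node(
  "compilationUnit",
  node("statement", node("methodBody", node("statement", node("ifBlock", node("statement", node("returnStatement")))))),
  node("statement", node("returnStatement")),
);
console.log(findInvalidReturns(sampleTree));
```

With a generated parser, the same walk would typically live in a listener or visitor, using whatever parent accessor the runtime's context objects provide.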
I'm trying to parse a 'for' loop according to this (partial) grammar:
grammar GaleugParserNew;
/*
* PARSER RULES
*/
relational
: '>'
| '<'
;
varChange
: '++'
| '--'
;
values
: ID
| DIGIT
;
for_stat
: FOR '(' ID '=' values ';' values relational values ';' ID varChange ')' '{' '}'
;
/*
* LEXER RULES
*/
FOR : 'for' ;
ID : [a-zA-Z_] [a-zA-Z_0-9]* ;
DIGIT : [0-9]+ ;
SPACE : [ \t\r\n] -> skip ;
When I try to generate the gui of how it's parsed, it's not following the grammar I provided above. This is what it produces:
I've encountered this problem before, what I did then was simply exit cmd, open it again and compile everything and somehow that worked then. It's not working now though.
I'm not really very knowledgeable about antlr4 so I'm not sure where to look to solve this problem.
This must be a problem with the IDE you are using. The grammar is fine and produces this parse tree in Visual Studio Code:
I guess the IDE is using the wrong parser or lexer (maybe from a different work file?). Print the lexer tokens to see if they are what you expect. Hint: avoid defining implicit lexer tokens (like '(', '}' etc.); explicit lexer rules let you give the tokens meaningful names.
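To know what token stream to expect when printing the lexer tokens, a small hand-written stand-in for the lexer rules can serve as a reference. This sketch is not ANTLR output: `tokenize` and the catch-all `OP` token for the implicit literals are assumptions made for illustration, and the regexes only mirror the FOR, ID, DIGIT and SPACE rules from the question.

```typescript
// Hand-written stand-in for the lexer rules in the question (not generated by ANTLR).
// FOR is listed before ID so that the input "for" is not reported as an ID;
// the lookahead keeps longer identifiers such as "fork" as ID.
const lexerRules: Array<[string, RegExp]> = [
  ["SPACE", /^[ \t\r\n]+/],
  ["FOR", /^for(?![a-zA-Z_0-9])/],
  ["ID", /^[a-zA-Z_][a-zA-Z_0-9]*/],
  ["DIGIT", /^[0-9]+/],
  ["OP", /^(\+\+|--|[(){};=<>])/], // stands in for the implicit literal tokens
];

function tokenize(input: string): string[] {
  const out: string[] = [];
  let rest = input;
  while (rest.length > 0) {
    // Take the first rule that matches at the current position.
    const hit = lexerRules
      .map(([type, re]) => ({ type, text: re.exec(rest)?.[0] }))
      .find((m) => m.text !== undefined);
    if (!hit || hit.text === undefined) throw new Error(`unexpected input at: ${rest}`);
    if (hit.type !== "SPACE") out.push(`${hit.type}(${hit.text})`); // SPACE is skipped
    rest = rest.slice(hit.text.length);
  }
  return out;
}

console.log(tokenize("for (i = 0; i < 10; i++) {}").join(" "));
```

If the tokens printed by the real lexer differ from this kind of reference (for example, `for` arriving as an ID), the tool is most likely running a stale or different lexer.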
I don't understand what is wrong with this grammar:
grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals
generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"
Model:
header=Header (elements+=Element)*;
Header:
'Test:Revision' version=Decimal ';'
;
Decimal:
INT'.'INT
;
Element:
TableRow
;
TableRow:
'__Row' name=ID '{'
'__Alias' '=' Alias(','Alias)* ';'
'}'
;
Alias:
'0'|'1'|'H'|'L'
;
The following simple test statement fails in JUnit with the message "mismatched input '0' expecting RULE_INT" on Header:
Test:Revision2.0;
Everything works fine if I remove '0' from the Alias rule or I change the test statement to:
Test:Revision2.00;
Can you please tell me what is wrong with this grammar?
With Alias you turn '0' into a keyword, so it can never be matched by the INT terminal rule. The same would happen if you created an element named 'L' or 'H'. You could introduce a datatype rule like
IntValue: INT | '0' | '1';
and use that one instead of INT inside Decimal
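Why `2.0` fails while `2.00` works can be illustrated with a simplified model of the lexer. This is a sketch of the general "longest match, keyword wins a tie" behaviour, not the lexer Xtext actually generates; `lex`, `grammarKeywords`, and the token labels are hypothetical, and ID and whitespace handling are left out.

```typescript
// Keywords used in the grammar; each one becomes its own token type.
const grammarKeywords = ["Test:Revision", "__Row", "__Alias", "0", "1", "H", "L", ".", ";", "=", ",", "{", "}"];

// Simplified model: take the longest match; on a tie the keyword wins over the INT terminal.
function lex(input: string): string[] {
  const out: string[] = [];
  let pos = 0;
  while (pos < input.length) {
    const rest = input.slice(pos);
    const kw: string | undefined = grammarKeywords
      .filter((k) => rest.startsWith(k))
      .sort((a, b) => b.length - a.length)[0];
    const digits = /^[0-9]+/.exec(rest)?.[0];
    if (digits !== undefined && (kw === undefined || digits.length > kw.length)) {
      out.push(`INT(${digits})`);
      pos += digits.length;
    } else if (kw !== undefined) {
      out.push(`'${kw}'`);
      pos += kw.length;
    } else {
      throw new Error(`unexpected character at ${pos}`);
    }
  }
  return out;
}

console.log(lex("Test:Revision2.0;").join(" "));
console.log(lex("Test:Revision2.00;").join(" "));
```

In the first input the trailing `0` comes out as the keyword `'0'`, so the parser never sees the second INT that Decimal requires; in the second input `00` is longer than the keyword and is lexed as an INT.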
I am currently writing a parser with yecc in Erlang.
Nonterminals expression.
Terminals '{' '}' '+' '*' 'atom' 'app' 'integer' 'if0' 'fun' 'rec'.
Rootsymbol expression.
expression -> '{' '+' expression expression '}' : {'AddExpression', '$3','$4'}.
expression -> '{' 'if0' expression expression expression '}' : {'if0', '$3', '$4', '$5'}.
expression -> '{' '*' expression expression '}' : {'MultExpression', '$3','$4'}.
expression -> '{' 'app' expression expression '}' : {'AppExpression', '$3','$4'}.
expression -> '{' 'fun' '{' expression '}' expression '}': {'FunExpression', '$4', '$6'}.
expression -> '{' 'rec' '{' expression expression '}' expression '}' : {'RecExpression', '$4', '$5', '$7'}.
expression -> atom : '$1'.
expression -> integer : '$1'.
I also have an Erlang project that tokenizes the input before parsing:
tok(X) ->
element(2, erl_scan:string(X)).
get_Value(X)->
element(2, parse(tok(X))).
These cases are accepted:
interp:get_Value("{+ {+ 4 6} 6}").
interp:get_Value("{+ 4 2}").
These return:
{'AddExpression' {'AddExpression' {integer, 1,6} {integer,1,6}}{integer,1,6}}
and
{'AddExpression' {integer,1,4} {integer,1,2}}
But this test case:
interp:get_Value("{if0 3 4 5}").
Returns:
{1,string_parser,["syntax error before: ","if0"]}
The grammar rules refer to the category of each terminal token, not its value. So you can match against an atom, but not against a specific atom. If you are using the Erlang tokenizer, the token generated for "if0" will be {atom,Line,if0}, while your grammar expects an {if0,Line} token. This is what the "Pre-processing" section of the yecc documentation is trying to explain.
You will need a special tokenizer for this. If you want to keep using the Erlang tokenizer, a simple way of handling this is to add a pre-processing pass that scans the token list and converts {atom,Line,if0} tokens to {if0,Line} tokens.
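The pre-processing pass amounts to a map over the token list. The sketch below models it in TypeScript rather than Erlang, so the object shapes are only an approximation of the scanner's tuples: `{category, line, value}` stands for {Category,Line,Value} and `{category, line}` for {Symbol,Line}. `promoteKeywords` and `reservedWords` are hypothetical names.

```typescript
// Approximate model of a scanner token (an assumption, not the Erlang representation).
interface Token {
  category: string;
  line: number;
  value?: string | number;
}

// Words that the grammar above lists as terminals of their own.
const reservedWords = new Set(["if0", "app", "fun", "rec"]);

// Turns {atom, Line, if0} into {if0, Line}; every other token is passed through unchanged.
function promoteKeywords(tokens: Token[]): Token[] {
  return tokens.map((t) =>
    t.category === "atom" && typeof t.value === "string" && reservedWords.has(t.value)
      ? { category: t.value, line: t.line }
      : t,
  );
}

// Tokens for "{if0 3 4 5}" as a generic scanner would report them.
const scannedTokens: Token[] = [
  { category: "{", line: 1 },
  { category: "atom", line: 1, value: "if0" },
  { category: "integer", line: 1, value: 3 },
  { category: "integer", line: 1, value: 4 },
  { category: "integer", line: 1, value: 5 },
  { category: "}", line: 1 },
];
console.log(JSON.stringify(promoteKeywords(scannedTokens)));
```

In Erlang the same idea would be a list comprehension or `lists:map/2` call placed between the scanner and the call to the generated parser.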
I need a Xtext grammar rule (or multiple) working similar to the following:
1: CollectionGetElement:
2: val=[VariableReference] '='
3: (ref=[List] | ref=[Bytefield] | ref=[Map])
4: '[' keys+=GetElementKeyType ']' ('[' keys+=GetElementKeyType ']')* ';';
5: GetElementKeyType:
6: key=[VariableReference] | INT | STRING;
Unfortunately, this doesn't work because of line 3!
I also tried 3 separate rules (for map, list and bytefield), but then it's difficult (impossible) for the parser to recognize the correct rule.
ListGetElement:
val=[VariableReference] '='
ref=[List]
'[' key+=GetElementKeyType ']' ('[' key+=GetElementKeyType ']')* ';';
... same for the others
Error then is:
Decision can match input such as "RULE_ID '=' RULE_ID '[' RULE_ID ']' '[' RULE_ID ']' ';'" using multiple alternatives: 5, 6
The following alternatives can never be matched: 6
What's the best way to achieve this?
There are two problems in your grammar:
assigning 3 different types to the attribute 'ref'
generating 3 different types by parsing the same ID
I am not sure what you want to do, but I can give you an example. I hope it helps.
e.g.
List:
'list' '(' elements += Element * ')';
Map:
'map' '(' pairs += Pair * ')';
GeneralDataType:
List | Map;
CollectionGetElement:
val=[VariableReference] '='
ref = GeneralDataType
;