Check if a condition is met before executing the action in JFlex

I am writing a lexical analyzer using JFlex. When the word co is matched, everything after it up to the end of the line must be ignored (because it's a comment). For the moment, I have a boolean variable that is set to true whenever this word is matched; if an identifier or an operator is matched after co and before the end of the line, I simply ignore it, because I have an if condition in my Identifier and Operator actions.
I am wondering if there is a better way to do this that gets rid of the if statement that appears everywhere?
Here is the code:
%% // Options of the scanner
%class Lexer
%unicode
%line
%column
%standalone
%{
private boolean isCommentOpen = false;
private void toggleIsCommentOpen() {
this.isCommentOpen = ! this.isCommentOpen;
}
private boolean getIsCommentOpen() {
return this.isCommentOpen;
}
%}
Operators = [\+\-]
Identifier = [A-Z]*
EndOfLine = \r|\n|\r\n
%%
{Operators} {
if (! getIsBlockCommentOpen() && ! getIsCommentOpen()) {
// Do Code
}
}
{Identifier} {
if (! getIsBlockCommentOpen() && ! getIsCommentOpen()) {
// Do Code
}
}
"co" {
toggleIsCommentOpen();
}
. {}
{EndOfLine} {
if (getIsCommentOpen()) {
toggleIsCommentOpen();
}
}

One way to do this is to use states in JFlex. Every time the word co is matched, the lexer enters a state named COMMENT_STATE and does nothing until the end of the line. At the end of the line, it leaves COMMENT_STATE and returns to YYINITIAL. Here is the code:
%% // Options of the scanner
%class Lexer
%unicode
%line
%column
%standalone
Operators = [\+\-]
Identifier = [A-Z]*
EndOfLine = \r|\n|\r\n
%xstate COMMENT_STATE
%%
<YYINITIAL> {
"co" {yybegin(COMMENT_STATE);}
}
<COMMENT_STATE> {
{EndOfLine} {yybegin(YYINITIAL);}
. {}
}
{Operators} { /* Do Code */ }
{Identifier} { /* Do Code */ }
. {}
{EndOfLine} {}
With this new approach, the lexer is simpler and more readable.
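To make the control flow concrete, here is a minimal hand-written sketch in plain Java of what the two states amount to. It is not JFlex output: the class name, the scan method and the State enum are hypothetical stand-ins, and identifiers are matched as one or more uppercase letters (rather than [A-Z]*) so that the loop always advances.

```java
import java.util.ArrayList;
import java.util.List;

public class CommentStateSketch {
    // Hypothetical stand-ins for the two JFlex lexical states.
    enum State { YYINITIAL, COMMENT_STATE }

    // Returns the operator and identifier tokens found outside comments.
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.YYINITIAL;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (state == State.COMMENT_STATE) {
                // Exclusive state: only the end-of-line rule has any effect.
                if (c == '\n' || c == '\r') {
                    state = State.YYINITIAL; // plays the role of yybegin(YYINITIAL)
                }
                i++;
            } else if (input.startsWith("co", i)) {
                state = State.COMMENT_STATE; // plays the role of yybegin(COMMENT_STATE)
                i += 2;
            } else if (c == '+' || c == '-') {
                tokens.add(String.valueOf(c)); // operator
                i++;
            } else if (c >= 'A' && c <= 'Z') {
                int start = i;
                while (i < input.length()
                        && input.charAt(i) >= 'A' && input.charAt(i) <= 'Z') {
                    i++;
                }
                tokens.add(input.substring(start, i)); // identifier
            } else {
                i++; // any other character is ignored
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("AB + CD co EF - GH\nIJ - KL"));
    }
}
```

Running main prints [AB, +, CD, IJ, -, KL]: everything between co and the end of the first line is skipped without any if check in the operator or identifier branches.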

Related

Highlight / parse PSI elements inside another PSI element

I developed a Custom Language plugin based on this tutorial.
My plugin parses key/value language files in the format shown below. Values can contain HTML tags such as <br>, <i>, <b>, <span>, as well as \n. I want to highlight these tags as separate PSI elements inside the green PSI elements (the values) (see pic). How can I rewrite my rules to achieve this?
#Section header
KEY1 = First<br>Value
KEY2 = Second\nValue
The BNF rules I use:
lngFile ::= item_*
private item_ ::= (property|header|COMMENT|CRLF)
property ::= (KEY? SEPARATOR VALUE?) | KEY {
mixin="someClass"
implements="someClass"
methods=[getKey getValue getName setName getNameIdentifier getPresentation]
}
header ::= HEADER {
mixin="someClass"
implements="someClass"
methods=[getName setName getNameIdentifier getPresentation]
}
Flex
%%
%class LngLexer
%implements FlexLexer
%unicode
%function advance
%type IElementType
%eof{ return;
%eof}
CRLF=\R
WHITE_SPACE=[\ \n\t\f]
FIRST_VALUE_CHARACTER=[^ \n\f\\] | "\\"{CRLF} | "\\".
VALUE_CHARACTER=[^\n\f\\] | "\\"{CRLF} | "\\".
END_OF_LINE_COMMENT=("//")[^\r\n]*
HEADER=("#")[^\r\n]*
SEPARATOR=[:=]
KEY_CHARACTER=[^:=\ \n\t\f\\] | "\\ "
%state WAITING_VALUE
%%
<YYINITIAL> {END_OF_LINE_COMMENT} { yybegin(YYINITIAL); return LngTypes.COMMENT; }
<YYINITIAL> {HEADER} { yybegin(YYINITIAL); return LngTypes.HEADER; }
<YYINITIAL> {KEY_CHARACTER}+ { yybegin(YYINITIAL); return LngTypes.KEY; }
<YYINITIAL> {SEPARATOR} { yybegin(WAITING_VALUE); return LngTypes.SEPARATOR; }
<WAITING_VALUE> {CRLF}({CRLF}|{WHITE_SPACE})+ { yybegin(YYINITIAL); return TokenType.WHITE_SPACE; }
<WAITING_VALUE> {WHITE_SPACE}+ { yybegin(WAITING_VALUE); return TokenType.WHITE_SPACE; }
<WAITING_VALUE> {FIRST_VALUE_CHARACTER}{VALUE_CHARACTER}* { yybegin(YYINITIAL); return LngTypes.VALUE; }
({CRLF}|{WHITE_SPACE})+ { yybegin(YYINITIAL); return TokenType.WHITE_SPACE; }
[^] { return TokenType.BAD_CHARACTER; }

How to write a loop in REPL mode?

I would like to execute a loop in REPL mode, but I am getting SyntaxError: expecting '(':
var methods = eval(ObjC.classes.UIViewController.$methods);
for item in methods { console.log(item) }
Here is an example of iterating over an object's own methods and invoking them:
var UIDevice = ObjC.classes.UIDevice.currentDevice();
UIDevice.$ownMethods
.filter(function(method) {
return method.indexOf(':') == -1 /* filter out methods with parameters */
&& method.indexOf('+') == -1 /* filter out class methods (prefixed with '+') */
})
.forEach(function(method) {
console.log(method, ':', UIDevice[method]())
})
Update:
var UIViewControllerInstance = ObjC.chooseSync(ObjC.classes.UIViewController)[0];
console.log('Sanity check =', UIViewControllerInstance, JSON.stringify(UIViewControllerInstance.$ownMethods, null, 2));
UIViewControllerInstance.$ownMethods
.filter(method => { return method.indexOf(':') == -1 && method.indexOf('+') == -1 })
.forEach(method => {
console.log(method, ':', UIViewControllerInstance[method]())
})
Instead of looking for UIViewController instances on the heap, you have direct access through UIApplication.
Take a look at https://frida.re/docs/examples/ios/
This problem is related to command-line shells rather than to Frida or any other REPL tool.
It is a matter of entering a single command that spans multiple lines in a terminal.
To solve it, add "\" at the end of each line.
Example:
var methods = eval(ObjC.classes.UIViewController.$methods);\
for (item in methods) { console.log(item) }\

How to match whitespace and comments with re2c

I started very recently to use bison for writing small compiler exercises. I am having some issues with whitespace and comments. While trying to debug the problem, I arrived at this source, which looks like what I am looking for. I tried to change and erase some characters as advised, but it didn't work.
Also, during compilation I get the following error: re2c: error: line 2963, column 0: can only difference char sets.
Below is the relevant part of the code:
yy::conj_parser::symbol_type yy::yylex(lexcontext& ctx)
{
const char* anchor = ctx.cursor;
ctx.loc.step();
// Add a lambda function to avoid repetition
auto s = [&](auto func, auto&&... params) { ctx.loc.columns(ctx.cursor - anchor); return func(params..., ctx.loc); };
%{ /* Begin re2c lexer : Tokenization process starts */
re2c:yyfill:enable = 0;
re2c:define:YYCTYPE = "char";
re2c:define:YYCURSOR = "ctx.cursor";
"return" { return s(conj_parser::make_RETURN); }
"while" | "for" { return s(conj_parser::make_WHILE); }
"var" { return s(conj_parser::make_VAR); }
"if" { return s(conj_parser::make_IF); }
// Identifiers
[a-zA-Z_] [a-zA-Z_0-9]* { return s(conj_parser::make_IDENTIFIER, std::string(anchor, ctx.cursor)); }
// String and integers:
"\""" [^\"]* "\"" { return s(conj_parser::make_STRINGCONST, std::string(anchor+1, ctx.cursor-1)); }
[0-9]+ { return s(conj_parser::make_NUMCONST, std::stol(std::string(anchor, ctx.cursor))); }
// Whitespace and comments:
"\000" { return s(conj_parser::make_END); }
"\r\n" | [\r\n] { ctx.loc.lines(); return yylex(ctx); }
"//" [^\r\n]* { return yylex(ctx); }
[\t\v\b\f ] { ctx.loc.columns(); return yylex(ctx); }
Thank you very much for pointing me in the right direction or shedding some light on how this error could be solved.
You really should mention which line is line 2963. Perhaps it is this one, because there seems to be an extra quotation mark in it:
"\""" [^\"]* "\""
    ^

Setting up Cup/JLex parsing properly

I have a very basic lexer here:
import java_cup.runtime.*;
import java.io.IOException;
%%
%class AnalyzerLex
%function next_token
%type java_cup.runtime.Symbol
%unicode
//%line
//%column
// %public
%final
// %abstract
%cupsym sym
%cup
%cupdebug
%eofval{
return sym(sym.EOF);
%eofval}
%init{
// TODO: code that goes to constructor
%init}
%{
private Symbol sym(int type)
{
return sym(type, yytext());
}
private Symbol sym(int type, Object value)
{
return new Symbol(type, yyline, yycolumn, value);
}
private void error()
throws IOException
{
throw new IOException("Illegal text at line = "+yyline+", column = "+yycolumn+", text = '"+yytext()+"'");
}
%}
ANY = .
%%
{ANY} { return sym(sym.ANY); }
"\n" { }
And this is my very basic parser:
import java_cup.runtime.*;
parser code
{:
public void syntax_error(Symbol cur_token) {
System.err.println("syntax_error " + cur_token );
}
:}
action code
{:
:}
terminal ANY;
non terminal grammar;
grammar ::= ANY : a
{:
//System.out.println(a);
:}
;
I am trying to parse a sample file. I made a method like this:
AnalyzerLex scanner = null;
ParserCup pc = null;
try {
scanner = new AnalyzerLex( new java.io.FileReader(argv[i]) );
pc = new ParserCup(scanner);
while ( !scanner.zzAtEOF ){
pc.parse_debug();
}
}
But the above code throws an error:
#2
Unexpected exception:
# Initializing parser
# Current Symbol is #2
# Shift under term #2 to state #2
# Current token is #2
syntax_error #2
# Attempting error recovery
# Finding recovery state on stack
# Pop stack by one, state was # 2
# Pop stack by one, state was # 0
# No recovery state found on stack
# Error recovery fails
Couldn't repair and continue parse at character 0 of input
java.lang.Exception: Can't recover from previous error(s)
at java_cup.runtime.lr_parser.report_fatal_error(lr_parser.java:375)
at java_cup.runtime.lr_parser.unrecovered_syntax_error(lr_parser.java:424)
at java_cup.runtime.lr_parser.debug_parse(lr_parser.java:816)
at AnalyzerLex.main(AnalyzerLex.java:622)
I think I am not setting up the lexer/parser properly.
I am not an expert, but I recommend the following:
You may have to specify which non-terminal to start with, for example:
start with compilation_unit;
You can also improve your syntax_error method by reporting the line and column, so that it is clearer where the error is.
public void syntax_error(Symbol s){
System.out.println("compiler has detected a syntax error at line " + s.left
+ " column " + s.right);
}

Human readable key names for characters

I want to trigger some code if the 'A' key is pressed:
document.on.keyDown.add((event) {
if (event.keyIdentifier == 'U+0041') {
...
}
});
Using the Unicode code point (U+0041) of the character is not very readable. Is there any method I can use to convert the code to a character or vice versa? I would like to do something like this:
document.on.keyDown.add((event) {
if (event.keyIdentifier == unicodeCode('A')) {
...
}
});
I hope this will help:
document.on.keyPress.add((KeyboardEvent event) {
final String char = new String.fromCharCodes([event.charCode]);
print('Key: $char');
if (event.keyIdentifier == 'U+0041') {
print('$char pressed.');
}
if (char == 'a') {
print('Lowercase "a" has been pressed.');
}
});
Currently the keyDown event doesn't provide the appropriate charCodes.
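As for the unicodeCode helper wished for in the question, the conversion itself is just hexadecimal formatting of the character's code. A minimal sketch of that logic, written in Java rather than Dart; the class name and both method names are hypothetical, and it assumes identifiers of the form U+XXXX for characters in the Basic Multilingual Plane:

```java
public class KeyNames {
    // Hypothetical helper named after the one in the question: 'A' -> "U+0041".
    static String unicodeCode(char c) {
        return String.format("U+%04X", (int) c);
    }

    // Reverse direction: "U+0041" -> 'A'. Assumes the "U+" prefix is present.
    static char fromUnicodeCode(String identifier) {
        return (char) Integer.parseInt(identifier.substring(2), 16);
    }

    public static void main(String[] args) {
        System.out.println(unicodeCode('A'));          // prints U+0041
        System.out.println(fromUnicodeCode("U+0041")); // prints A
    }
}
```

The same two steps (format the code as four uppercase hex digits, or parse the hex digits back) carry over to any language that exposes a character's numeric code.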
