Setting up CUP/JLex parsing properly

I have a very basic lexer here:
import java_cup.runtime.*;
import java.io.IOException;
%%
%class AnalyzerLex
%function next_token
%type java_cup.runtime.Symbol
%unicode
//%line
//%column
// %public
%final
// %abstract
%cupsym sym
%cup
%cupdebug
%eofval{
return sym(sym.EOF);
%eofval}
%init{
// TODO: code that goes to constructor
%init}
%{
    private Symbol sym(int type)
    {
        return sym(type, yytext());
    }
    private Symbol sym(int type, Object value)
    {
        return new Symbol(type, yyline, yycolumn, value);
    }
    private void error()
    throws IOException
    {
        throw new IOException("Illegal text at line = "+yyline+", column = "+yycolumn+", text = '"+yytext()+"'");
    }
%}
ANY = .
%%
{ANY} { return sym(sym.ANY); }
"\n" { }
And this is my very basic parser:
import java_cup.runtime.*;
parser code
{:
public void syntax_error(Symbol cur_token) {
System.err.println("syntax_error " + cur_token );
}
:}
action code
{:
:}
terminal ANY;
non terminal grammar;
grammar ::= ANY : a
{:
//System.out.println(a);
:}
;
I am trying to parse a sample file. I made a method like this:
AnalyzerLex scanner = null;
ParserCup pc = null;
try {
    scanner = new AnalyzerLex( new java.io.FileReader(argv[i]) );
    pc = new ParserCup(scanner);
    while ( !scanner.zzAtEOF ){
        // debug_parse() is the method name that appears in the stack trace below
        pc.debug_parse();
    }
}
But the above code throws an error:
#2
Unexpected exception:
# Initializing parser
# Current Symbol is #2
# Shift under term #2 to state #2
# Current token is #2
syntax_error #2
# Attempting error recovery
# Finding recovery state on stack
# Pop stack by one, state was # 2
# Pop stack by one, state was # 0
# No recovery state found on stack
# Error recovery fails
Couldn't repair and continue parse at character 0 of input
java.lang.Exception: Can't recover from previous error(s)
at java_cup.runtime.lr_parser.report_fatal_error(lr_parser.java:375)
at java_cup.runtime.lr_parser.unrecovered_syntax_error(lr_parser.java:424)
at java_cup.runtime.lr_parser.debug_parse(lr_parser.java:816)
at AnalyzerLex.main(AnalyzerLex.java:622)
I think I am not setting up the lexer/parser properly.

I am not an expert, but I can recommend the following:
You may have to specify which non-terminal to start with, for example:
start with compilation_unit;
You can also improve your syntax error method by reporting the line and column, so that it is clearer where the error is:
public void syntax_error(Symbol s){
    System.out.println("compiler has detected a syntax error at line " + s.left
        + " column " + s.right);
}


Check if a condition is met before executing the action in JFlex

I am writing a lexical analyzer using JFlex. When the word co is matched, everything after it up to the end of the line has to be ignored (because it's a comment). For the moment, I have a boolean variable that is set to true whenever this word is matched. If an identifier or an operator is matched after co and before the end of the line, I simply ignore it, because the actions for my Identifier and Operator tokens are wrapped in an if condition.
Is there a better way to do this that gets rid of the if statement that appears everywhere?
Here is the code:
%% // Options of the scanner
%class Lexer
%unicode
%line
%column
%standalone
%{
private boolean isCommentOpen = false;
private void toggleIsCommentOpen() {
this.isCommentOpen = ! this.isCommentOpen;
}
private boolean getIsCommentOpen() {
return this.isCommentOpen;
}
%}
Operators = [\+\-]
Identifier = [A-Z]*
EndOfLine = \r|\n|\r\n
%%
{Operators} {
if (! getIsBlockCommentOpen() && ! getIsCommentOpen()) {
// Do Code
}
}
{Identifier} {
if (! getIsBlockCommentOpen() && ! getIsCommentOpen()) {
// Do Code
}
}
"co" {
toggleIsCommentOpen();
}
. {}
{EndOfLine} {
if (getIsCommentOpen()) {
toggleIsCommentOpen();
}
}
One way to do this is to use lexical states in JFlex. Every time the word co is matched, the lexer enters a state named COMMENT_STATE and does nothing until the end of the line. At the end of the line, it leaves COMMENT_STATE and returns to YYINITIAL. Here is the code:
%% // Options of the scanner
%class Lexer
%unicode
%line
%column
%standalone
Operators = [\+\-]
Identifier = [A-Z]*
EndOfLine = \r|\n|\r\n
%xstate COMMENT_STATE
%%
<YYINITIAL> {
    "co" { yybegin(COMMENT_STATE); }
}
<COMMENT_STATE> {
    {EndOfLine} { yybegin(YYINITIAL); }
    . {}
}
{Operators} { /* Do Code */ }
{Identifier} { /* Do Code */ }
. {}
{EndOfLine} {}
With this new approach, the lexer is simpler and more readable.
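To make the state idea concrete outside of JFlex, here is a minimal hand-written sketch in plain Java that mimics the two states. It is only an illustration, not what JFlex generates: the class name CommentStateDemo, the method tokenize and the token strings are all made up for this example, and identifiers are taken to be one or more uppercase letters.

```java
import java.util.ArrayList;
import java.util.List;

public class CommentStateDemo {
    // Mirrors YYINITIAL and COMMENT_STATE from the JFlex specification.
    private enum State { INITIAL, COMMENT }

    // Hypothetical helper: returns operator and identifier tokens, skipping
    // everything between "co" and the end of the line.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.INITIAL;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (state == State.COMMENT) {
                // Exclusive state: only an end of line has any effect here.
                if (c == '\n' || c == '\r') {
                    state = State.INITIAL;
                }
                i++;
            } else if (input.startsWith("co", i)) {
                state = State.COMMENT;
                i += 2;
            } else if (c == '+' || c == '-') {
                tokens.add("OP(" + c + ")");
                i++;
            } else if (c >= 'A' && c <= 'Z') {
                int start = i;
                while (i < input.length()
                        && input.charAt(i) >= 'A' && input.charAt(i) <= 'Z') {
                    i++;
                }
                tokens.add("ID(" + input.substring(start, i) + ")");
            } else {
                i++; // any other character is ignored, like the ". {}" rule
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("A+B co C-D\nE"));
    }
}
```

No token action needs to check a flag: the current state alone decides which branch can fire, which is the same effect the state blocks have in the JFlex specification.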

Reading a file in Dart and splitting the string gives different results in the console than in VS Code

I'm new to Dart. I'm trying to read information from a txt file and use the data to create objects of a class (in this case about Pokémon). When I run my program in the terminal, it doesn't print the correct information, but when I run it in VS Code (with the Dart extension, using the "run" button), it prints the correct information in the debug console. What is the problem?
When I run the program in VS Code, my print method (printP) outputs this, which is what I want:
vscode:
Print method:
1+: Bulbasaur GRASS | POISON
But when I run the program in the terminal, I get this:
Terminal:
Print method:
| POISONsaur
Here is the dart code.
main.dart
import 'dart:io';
import 'pokemon.dart';
void main() {
var file = new File("/home/ariel/Documents/script/pokemon.txt");
String str = file.readAsStringSync();
var pokes = str.split("[");
pokes = pokes.sublist(1, pokes.length);
getPokemon(pokes[0]).printP();
}
Pokemon getPokemon(String str) {
Pokemon p = new Pokemon();
print("string: " + str);
var aux = str.split("\n");
print(aux.length);
for (var i in aux) {
print("line: " + i);
}
p.number = int.parse(aux[0].split("]")[0]);
p.name = aux[1].split("=")[1];
p.type1 = aux[3].split("=")[1];
p.type2 = aux[4].split("=")[1];
return p;
}
pokemon.dart
class Pokemon {
String _name, _type1, _type2;
int _number;
Pokemon() {
this._name = "";
this._number = 0;
this._type1 = "";
this._type2 = "";
}
void printP() {
print("Print method:");
print("${this._number}+: ${this._name} ${this._type1} | ${this._type2}");
}
void set number(int n) {
this._number = n;
}
void set name(String nm) {
this._name = nm;
}
void set type1(String t) {
this._type1 = t;
}
void set type2(String t) {
this._type2 = t;
}
}
And here is the txt file
pokemon.txt
[1]
Name=Bulbasaur
InternalName=BULBASAUR
Type1=GRASS
Type2=POISON
BaseStats=45,49,49,45,65,65
GenderRate=FemaleOneEighth
GrowthRate=Parabolic
BaseEXP=64
EffortPoints=0,0,0,0,1,0
Rareness=45
Happiness=70
Abilities=OVERGROW
HiddenAbility=CHLOROPHYLL
Moves=1,TACKLE,3,GROWL,7,LEECHSEED,9,VINEWHIP,13,POISONPOWDER,13,SLEEPPOWDER,15,TAKEDOWN,19,RAZORLEAF,21,SWEETSCENT,25,GROWTH,27,DOUBLEEDGE,31,WORRYSEED,33,SYNTHESIS,37,SEEDBOMB
EggMoves=AMNESIA,CHARM,CURSE,ENDURE,GIGADRAIN,GRASSWHISTLE,INGRAIN,LEAFSTORM,MAGICALLEAF,NATUREPOWER,PETALDANCE,POWERWHIP,SKULLBASH,SLUDGE
Compatibility=Monster,Grass
StepsToHatch=5355
Height=0.7
Weight=6.9
Color=Green
Habitat=Grassland
Kind=Seed
Pokedex=Almacena energía en el bulbo de su espalda para alimentarse durante épocas de escasez de recursos o para atacar liberándola de golpe.
BattlerPlayerY=0
BattlerEnemyY=25
BattlerAltitude=0
Evolutions=IVYSAUR,Level,16
Your code depends on the newline format of your txt file. I recommend using the LineSplitter class from dart:convert to split your lines.
The problem is that Windows newlines consist of both '\r' and '\n', but you are only removing the '\n' part. '\r' essentially tells the terminal to move the cursor back to the beginning of the line, so whatever is printed next overwrites what is already there.
You can think of this like a typewriter, where you first move the head back and then move the paper to the next line. You can read a lot more about this topic here: https://en.wikipedia.org/wiki/Newline
The purpose of the LineSplitter class is to abstract all of this logic away and give you behavior that works on all platforms.
So import dart:convert and change this line:
var aux = str.split("\n");
Into:
var aux = LineSplitter.split(str).toList();
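The effect is not specific to Dart. As a rough illustration, here is the same pitfall sketched in Java; the class and method names are invented for this example, and the regex \R (any line terminator) plays the role LineSplitter plays in Dart.

```java
import java.util.Arrays;
import java.util.List;

public class NewlineSplitDemo {
    // Splits on '\n' only, like str.split("\n") in the question;
    // with Windows line endings every line keeps a trailing '\r'.
    static List<String> splitOnLineFeed(String text) {
        return Arrays.asList(text.split("\n"));
    }

    // Splits on any line terminator (\r\n, \n or \r).
    static List<String> splitOnAnyNewline(String text) {
        return Arrays.asList(text.split("\\R"));
    }

    public static void main(String[] args) {
        String text = "Name=Bulbasaur\r\nType1=GRASS\r\nType2=POISON\r\n";

        for (String line : splitOnLineFeed(text)) {
            // The hidden '\r' shows up as an extra character in the length.
            System.out.println(line.length() + " chars, ends with CR: " + line.endsWith("\r"));
        }
        for (String line : splitOnAnyNewline(text)) {
            System.out.println(line.length() + " chars, ends with CR: " + line.endsWith("\r"));
        }
    }
}
```

When a value that still ends in '\r' is printed in the middle of a line, a terminal jumps back to column 0 and the rest of the line overwrites the beginning, which matches the "| POISONsaur" output in the question.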

How to match whitespace and comments with re2c

I recently started using bison to write small compiler exercises, and I am having some issues with whitespace and comments. While debugging the problem I arrived at this source, which looks like what I am looking for. I tried changing and erasing some characters as advised, but it didn't work.
Also during compilation I have the following error: re2c: error: line 2963, column 0: can only difference char sets.
Below the part of the code:
yy::conj_parser::symbol_type yy::yylex(lexcontext& ctx)
{
const char* anchor = ctx.cursor;
ctx.loc.step();
// Add a lambda function to avoid repetition
auto s = [&](auto func, auto&&... params) { ctx.loc.columns(ctx.cursor - anchor); return func(params..., ctx.loc); };
%{ /* Begin re2c lexer : Tokenization process starts */
re2c:yyfill:enable = 0;
re2c:define:YYCTYPE = "char";
re2c:define:YYCURSOR = "ctx.cursor";
"return" { return s(conj_parser::make_RETURN); }
"while" | "for" { return s(conj_parser::make_WHILE); }
"var" { return s(conj_parser::make_VAR); }
"if" { return s(conj_parser::make_IF); }
// Identifiers
[a-zA-Z_] [a-zA-Z_0-9]* { return s(conj_parser::make_IDENTIFIER, std::string(anchor, ctx.cursor)); }
// String and integers:
"\""" [^\"]* "\"" { return s(conj_parser::make_STRINGCONST, std::string(anchor+1, ctx.cursor-1)); }
[0-9]+ { return s(conj_parser::make_NUMCONST, std::stol(std::string(anchor, ctx.cursor))); }
// Whitespace and comments:
"\000" { return s(conj_parser::make_END); }
"\r\n" | [\r\n] { ctx.loc.lines(); return yylex(ctx); }
"//" [^\r\n]* { return yylex(ctx); }
[\t\v\b\f ] { ctx.loc.columns(); return yylex(ctx); }
Thank you very much for pointing me in the right direction or shedding some light on how this error could be solved.
You really should mention which line is line 2963. Perhaps it is this one, because there seems to be an extra quotation mark in it:
"\""" [^\"]* "\""
    ^
Without the extra quotation mark, the pattern would read "\"" [^\"]* "\"".

How to pipe to a process using vala/glib

I'm trying to pipe output from echo into a command using GLib's spawn_command_line_sync method. The problem I've run into is echo is interpreting the entire command as the argument.
To better explain, I run this in my code:
string command = "echo \"" + some_var + "\" | command";
Process.spawn_command_line_sync (command.escape (),
out r, out e, out s);
I would expect the variable to be echoed to the pipe and the command run with the data piped, however when I check on the result it's just echoing everything after echo like this:
"some_var's value" | command
I think I could just use the Posix class to run the command but I like having the result, error and status values to listen to that the spawn_command_line_sync method provides.
The problem is that you are providing shell syntax to what is essentially the kernel’s exec() syscall. The shell pipe operator redirects the stdout of one process to the stdin of the next. To implement that using Vala, you need to get the file descriptor for the stdin of the command process which you’re running, and write some_var to it manually.
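The same idea (start the process directly, write to its stdin yourself, then read its stdout) can be sketched in Java rather than Vala, purely as an illustration. The class name PipeToProcess and the method pipe are invented for this example, and it assumes a sort executable is available on the PATH.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class PipeToProcess {
    // Starts the command directly (no shell involved), writes `input` to its
    // stdin, closes stdin so the child sees end-of-file, and returns its stdout.
    static String pipe(List<String> command, String input) {
        try {
            Process process = new ProcessBuilder(command).start();
            try (OutputStream stdin = process.getOutputStream()) {
                stdin.write(input.getBytes(StandardCharsets.UTF_8));
            }
            // Writing everything before reading is fine for small inputs; large
            // inputs would need a separate writer thread so the pipes cannot fill up.
            String output = new String(
                    process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            process.waitFor();
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the child", e);
        }
    }

    public static void main(String[] args) {
        // Equivalent in spirit to: echo "cc ab aa b" (one per line) | sort
        System.out.print(pipe(List.of("sort"), "cc\nab\naa\nb\n"));
    }
}
```

No echo process is needed at all: the parent already has the string, so it only has to spawn the consuming command and feed it.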
You are combining two subprocesses into one. Instead, echo and command should be treated as separate processes with a pipe set up between them. For some reason many examples on Stack Overflow and other sites use the Process.spawn_* functions, but GSubprocess offers a simpler API.
This example pipes the output of find . to sort and then prints the output to the console. The example is a bit longer because it is a fully working example and makes use of a GMainContext for asynchronous calls. GMainContext is used by GMainLoop, GApplication and GtkApplication:
void main () {
var mainloop = new MainLoop ();
SourceFunc quit = ()=> {
mainloop.quit ();
return Source.REMOVE;
};
read_piped_commands.begin ("find .", "sort", quit);
mainloop.run ();
}
async void read_piped_commands (string first_command, string second_command, SourceFunc quit) {
var output = splice_subprocesses (first_command, second_command);
try {
string? line = null;
do {
line = yield output.read_line_async ();
print (#"$(line ?? "")\n");
}
while (line != null);
} catch (Error error) {
print (#"Error: $(error.message)\n");
}
quit ();
}
DataInputStream splice_subprocesses (string first_command, string second_command) {
InputStream end_pipe = null;
try {
var first = new Subprocess.newv (first_command.split (" "), STDOUT_PIPE);
var second = new Subprocess.newv (second_command.split (" "), STDIN_PIPE | STDOUT_PIPE);
second.get_stdin_pipe ().splice (first.get_stdout_pipe (), CLOSE_TARGET);
end_pipe = second.get_stdout_pipe ();
} catch (Error error) {
print (#"Error: $(error.message)\n");
}
return new DataInputStream (end_pipe);
}
It is the splice_subprocesses function that answers your question. It takes the STDOUT from the first command as an InputStream and splices it with the OutputStream (STDIN) for the second command.
The read_piped_commands function takes the output from the end of the pipe. This is an InputStream that has been wrapped in a DataInputStream to give access to the read_line_async convenience method.
Here's the full, working implementation:
try {
    string[] command = {"command", "-options", "-etc"};
    string[] env = Environ.get ();
    Pid child_pid;
    string some_string = "This is what gets piped to stdin";
    int stdin;
    int stdout;
    int stderr;
    Process.spawn_async_with_pipes ("/",
        command,
        env,
        SpawnFlags.SEARCH_PATH | SpawnFlags.DO_NOT_REAP_CHILD,
        null,
        out child_pid,
        out stdin,
        out stdout,
        out stderr);
    FileStream input = FileStream.fdopen (stdin, "w");
    input.write (some_string.data);
    /* Make sure we close the process using its pid */
    ChildWatch.add (child_pid, (pid, status) => {
        Process.close_pid (pid);
    });
} catch (SpawnError e) {
    /* Do something with the error */
}
I guess playing with the FileStream is what really made it hard to figure this out. Turned out to be pretty straightforward.
Building on the previous answers, an interesting case is a general-purpose program that takes the command and the input as program arguments and pipes the input into the command:
pipe.vala:
void main (string[] args) {
try {
string command = args[1];
var subproc = new Subprocess(STDIN_PIPE | STDOUT_PIPE, command);
var data = args[2].data;
var input = new MemoryInputStream.from_data(data, GLib.free);
subproc.get_stdin_pipe ().splice (input, CLOSE_TARGET);
var end_pipe = subproc.get_stdout_pipe ();
var output = new DataInputStream (end_pipe);
string? line = null;
do {
line = output.read_line();
print (#"$(line ?? "")\n");
} while (line != null);
} catch (Error error) {
print (#"Error: $(error.message)\n");
}
}
build:
$ valac --pkg gio-2.0 pipe.vala
and run:
$ ./pipe sort "cc
ab
aa
b
"
Output:
aa
ab
b
cc

Apache Beam TextIO.Read with line number

Is it possible to get access to line numbers with the lines read into the PCollection from TextIO.Read? For context here, I'm processing a CSV file and need access to the line number for a given line.
If not possible through TextIO.Read it seems like it should be possible using some kind of custom Read or transform, but I'm having trouble figuring out where to begin.
You can use FileIO to read the file manually, where you can determine the line number when you read from the ReadableFile.
A simple solution can look as follows:
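// Imports this fragment relies on:
import static org.apache.beam.sdk.values.TypeDescriptors.strings;
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.channels.Channels;
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.transforms.FlatMapElements;

p
    .apply(FileIO.match().filepattern("/file.csv"))
    .apply(FileIO.readMatches())
    .apply(FlatMapElements
        .into(strings())
        .via((FileIO.ReadableFile f) -> {
            List<String> result = new ArrayList<>();
            try (BufferedReader br = new BufferedReader(Channels.newReader(f.open(), "UTF-8"))) {
                int lineNr = 1;
                String line = br.readLine();
                while (line != null) {
                    result.add(lineNr + "," + line);
                    line = br.readLine();
                    lineNr++;
                }
            } catch (IOException e) {
                throw new RuntimeException("Error while reading", e);
            }
            return result;
        }));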
The solution above just prepends the line number to each input line.
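The numbering logic itself is ordinary line-by-line reading and can be tried without Beam. Below is a minimal standalone sketch of just that part; the class name LineNumbering and the method numberLines are made up for this example and are not part of the Beam API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class LineNumbering {
    // Reads every line from the source and prefixes it with its 1-based
    // line number and a comma, as the lambda in the pipeline does.
    static List<String> numberLines(Reader source) {
        List<String> result = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            int lineNr = 1;
            for (String line = br.readLine(); line != null; line = br.readLine()) {
                result.add(lineNr + "," + line);
                lineNr++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Error while reading", e);
        }
        return result;
    }

    public static void main(String[] args) {
        // A small in-memory CSV stands in for the real file.
        numberLines(new StringReader("id,name\n7,Ada\n")).forEach(System.out::println);
    }
}
```

Because the whole file is read inside a single function call, the numbers are assigned sequentially per file; the trade-off is that one file is not split across workers while it is being read.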
