How to match a variable to a token in JavaCC?

I am trying to match a variable (a string) to one of my defined tokens in JavaCC. The pseudocode for what I am trying to do is:
String x;
if (x matches <FUNCTIONNAME>) {...}
How would I go about achieving this?
Thank you

Here is one way to do it. Use the STATIC = false option. The following code should do what you need:
public boolean matches(String str, int k) {
    // Precondition: k should be one of the integers
    // given a name in XXXConstants.
    // Postcondition: the result is true if and only if str would be lexed by
    // the lexer as a single token of kind k, possibly
    // preceded and followed by any number of skipped and special tokens.
    StringReader sr = new StringReader(str);
    SimpleCharStream scs = new SimpleCharStream(sr);
    XXXTokenManager lexer = new XXXTokenManager(scs);
    boolean matches = false;
    try {
        Token a = lexer.getNextToken();
        Token b = lexer.getNextToken();
        matches = a.kind == k && b.kind == 0; // token kind 0 is EOF
    } catch (Throwable t) {}
    return matches;
}
One problem with this is that it will skip tokens declared as SKIP or SPECIAL_TOKEN. E.g. if I use a Java lexer, then "/*hello*/\tworld // \n" will still match JavaParserConstants.ID. If you don't want this, you need to do two things. First, go into the .jj file and convert any SKIP tokens to SPECIAL_TOKEN tokens. Second, add a check that no special tokens were found:
matches = a.kind == k && b.kind == 0 && a.specialToken == null && b.specialToken == null ;
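For illustration, here is a hedged usage sketch of the helper above. The input string and the exact constant name are assumptions made for the example; <FUNCTIONNAME> is the token from the question, and XXXConstants is the constants interface JavaCC generates for a grammar named XXX (generated with STATIC = false, as recommended above):
// Hedged usage sketch, not part of the original answer.
String x = "someName"; // hypothetical input
if (matches(x, XXXConstants.FUNCTIONNAME)) {
    // x would be lexed as a single <FUNCTIONNAME> token
    System.out.println(x + " matches <FUNCTIONNAME>");
}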

Related

Weird antlr grammar rule

I have found an old file that defines ANTLR grammar rules like this:
rule_name[ ParamType *param ] > [ReturnType *retval]:
<<
$retval = NULL;
OtherType1 *new_var1 = NULL;
OtherType2 *new_var2 = NULL;
>>
subrule1[ param ] > [ $retval ]
| subrule2 > [new_var2]
<<
if( new_var2 == SOMETHING ){
$retval = something_related_to_new_var2;
}
else{
$retval = new_var2;
}
>>
{
somethingelse > [new_var_1]
<<
/* Do something with new_var_1 */
$retval = new_var_1;
>>
}
;
I'm not an ANTLR expert, and it's the first time that I have seen this kind of notation for a rule definition.
Does anybody know where I can find documentation/information about this?
Even a keyword for a Google search is welcome.
Edit:
It should be ANTLR Version 1.33MR33.
OK, I found it! Here is the guide:
http://www.antlr2.org/book/pcctsbk.pdf
I quote the interesting parts of the PDF that answer my question.
1) Page 47:
poly > [float r]
: <<float f;>>
term>[$r] ( "\+" term>[f] <<$r += f;>> )*
;
Rule poly is defined to have a return value called $r via the "> [float r]" notation; this is similar to the output redirection character of UNIX shells. Setting the value of $r sets the return value of poly. The first action after the ":" is an init-action (because it is the first action of a rule or subrule). The init-action defines a local variable called f that will be used in the (...)* loop to hold the return value of term.
2) Page 85:
A rule looks like:
rule : alternative1
| alternative2
...
| alternativen
;
where each alternative production is composed of a list of elements that can be references to rules, references to tokens, actions, predicates, and subrules. Argument and return value definitions look like the following, where there are n arguments and m return values:
rule[arg1,...,argn] > [retval1,...,retvalm] : ... ;
The syntax for using a rule mirrors its definition:
a : ... rule[arg1,...,argn] > [v1,...,vm] ...
;
Here, the various vi receive the return values from rule rule; each vi must be an l-value.
3) Page 87:
Actions are of the form <<...>> and contain user-supplied C or C++ code that must be executed during the parse.

ANTLR fetch comment text for additional processing

While developing an experimental language, I need to fetch the comment text for further processing.
At the token level, this does not work:
COMMENT : comm = ('/*' ~'*' .* '*/') { System.out.println($comm.text); } ;
I tried to add it to the statement and/or expression rules, yet this desired syntax is not parsed either:
x = myFunction(x1, /* comment x1 */
x2, /* comment x2 */
x3)
Update: using ANTLR 3.1.3.
I found this working approach, though it is not fully sound with respect to associating a statement/expression with a comment:
@lexer::members {
    public static final int COMMENTS = 2;
}
and so comments are diverted to a numbered channel:
COMMENT : '/*' ~'*' .* '*/' {$channel=COMMENTS;} ;
Then,
NetworkLexer lexer = new NetworkLexer(sourceStream);
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
Parser parser = new Parser(tokenStream);
parser.prog();
To fetch the comment text with line numbers, consider this Java code:
for (Object tk : tokenStream.getTokens()) {
    CommonToken ctk = (CommonToken) tk;
    if (ctk.getChannel() == NetworkLexer.COMMENTS) {
        if (ctk.getText() != null && !ctk.getText().trim().isEmpty()) {
            System.out.println(String.format("tk channel %s line %s text %s",
                    ctk.getChannel(), ctk.getLine(), ctk.getText()));
        }
    }
}
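To go one step further and associate a comment with the statement/expression near it (the part that is not fully sound yet), one possible workaround, shown here only as a hedged sketch, is to key the comments by source line and look them up while walking the parse result. The exprToken below is a hypothetical token taken from whatever node is being visited, and the code assumes java.util.Map/HashMap plus the same NetworkLexer/COMMENTS setup as above:
// Hedged sketch: collect comment text keyed by its source line.
Map<Integer, String> commentsByLine = new HashMap<Integer, String>();
for (Object tk : tokenStream.getTokens()) {
    CommonToken ctk = (CommonToken) tk;
    if (ctk.getChannel() == NetworkLexer.COMMENTS && ctk.getText() != null) {
        commentsByLine.put(ctk.getLine(), ctk.getText().trim());
    }
}
// While visiting the expression for x1, commentsByLine.get(exprToken.getLine())
// would return "/* comment x1 */" for the input shown above.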

How to match the start and end of a block

I want to define a special code block, which may start with any combination of the characters {[<#, and ends with the mirrored combination of }]>#.
Some examples:
{
block-content
}
{##
block-content
##}
#[[<{###
block-content
###}>]]#
Is it possible with petitparser-dart?
Yes, back-references are possible, but it is not that straightforward.
First we need a function that can reverse our delimiter. I came up with the following:
String reverseDelimiter(String token) {
  return token.split('').reversed.map((char) {
    if (char == '[') return ']';
    if (char == '{') return '}';
    if (char == '<') return '>';
    return char;
  }).join();
}
Then we have to declare the stopDelimiter parser. It is undefined at this point, but will be replaced with the real parser as soon as we know it.
var stopDelimiter = undefined();
In the action of the startDelimiter we replace the stopDelimiter with a dynamically created parser as follows:
var startDelimiter = pattern('#<{[').plus().flatten().map((String start) {
  stopDelimiter.set(string(reverseDelimiter(start)).flatten());
  return start;
});
The rest is trivial, but depends on your exact requirements:
var blockContents = any().starLazy(stopDelimiter).flatten();
var parser = startDelimiter & blockContents & stopDelimiter;
The code above defines the blockContents parser so that it reads through anything until the matching stopDelimiter is encountered. The provided examples pass:
print(parser.parse('{ block-content }'));
// Success[1:18]: [{, block-content , }]
print(parser.parse('{## block-content ##}'));
// Success[1:22]: [{##, block-content , ##}]
print(parser.parse('#[[<{### block-content ###}>]]#'));
// Success[1:32]: [#[[<{###, block-content , ###}>]]#]
The above code doesn't work if you want to nest the parser. If necessary, that problem can be avoided by remembering the previous stopDelimiter and restoring it.

LPeg Increment for Each Match

I'm making a serialization library for Lua, and I'm using LPeg to parse the string. I've got K/V pairs working (with the key explicitly named), but now I'm going to add auto-indexing.
It'll work like so:
#"value"
#"value2"
Will evaluate to
{
[1] = "value"
[2] = "value2"
}
I've already got the value matching working (strings, tables, numbers, and Booleans all work perfectly), so I don't need help with that; what I'm looking for is the indexing. For each match of #[value pattern], it should capture the number of #[value pattern]s found so far; in other words, I can match a sequence of values (#"value1" #"value2"), but I don't know how to assign them indexes according to the number of matches. If that's not clear enough, just comment and I'll attempt to explain it better.
Here's something of what my current pattern looks like (using compressed notation):
local process = {} -- Process a captured value
process.number = tonumber
process.string = function(s) return s:sub(2, -2) end -- Strip off the opening and closing tags
process.boolean = function(s) if s == "true" then return true else return false end end
number = [decimal number, scientific notation] / process.number
string = [double or single quoted string, supports escaped quotation characters] / process.string
boolean = P("true") + "false" / process.boolean
table = [balanced brackets] / [parse the table]
type = number + string + boolean + table
at_notation = (P("#") * whitespace * type) / [creates a table that includes the key and value]
As you can see in the last line of code, I've got a function that does this:
k,v matched in the pattern
-- turns into --
{k, v}
-- which is then added into an "entry table" (I loop through it and add it into the return table)
Based on what you've described so far, you should be able to accomplish this using a simple capture and table capture.
Here's a simplified example I knocked up to illustrate:
lpeg = require 'lpeg'
l = lpeg.locale(lpeg)
whitesp = l.space ^ 0
bool_val = (l.P "true" + "false") / function (s) return s == "true" end
num_val = l.digit ^ 1 / tonumber
string_val = '"' * l.C(l.alnum ^ 1) * '"'
val = bool_val + num_val + string_val
at_notation = l.Ct( (l.P "#" * whitesp * val * whitesp) ^ 0 )
local testdata = [[
#"value1"
#42
# "value2"
#true
]]
local res = l.match(at_notation, testdata)
The match returns a table containing the contents:
{
[1] = "value1",
[2] = 42,
[3] = "value2",
[4] = true
}

JSON2 Error / Conflict with another script

I am using the JSON2 library in order to use JSON.stringify to send some JSON data to my MVC controller.
When I include another script in my view (Telerik MVC) I start to get script conflicts when using IE7.
When I click the refresh button in the grid, I get the following error:
Line: 191
Error: Object doesn't support this property or method
String.prototype.toJSON =
Number.prototype.toJSON =
Boolean.prototype.toJSON = function (key) {
return this.valueOf();
};
The error occurs on the following line specifically:
return this.valueOf();
Does anyone have any insight into why this conflict is occurring and how to resolve it? Specifically, why would this work in IE8/Chrome but fail in IE7? What would cause the error? Are both scripts trying to define the same method and that's why it is failing, or is it impossible to tell without digging through tons of code?
Edit:
This is the json2.js library I am speaking of: https://github.com/douglascrockford/JSON-js
Probably the reply is too late, but I thought it's worth replying as this might save some valuable lives ;)
The JSON2 script won't initialize/extend the JSON object if there is an existing implementation (native or included). However, if the JSON object does not exist, the script will create that object and attach a few methods to it (JSON.stringify and JSON.parse, to be precise). In order to make those methods work, other objects (like the Date, String, Number and Boolean objects) need to be extended to support certain methods (like the toJSON method). The JSON2 script takes care of extending the required objects as well.
Now coming to the specific issue here (Telerik MVC). I faced the same problem while working with Telerik on one of my projects, and I was able to trace it. The probable cause is a conflict between the Telerik scripts and the current JSON2 script. The Date and Boolean objects' toJSON method somehow conflicts with Telerik's implementation of the same method for those two objects, which breaks the Telerik script in some places. I have modified the JSON2 library to use more robust checks that don't fail in any scenario (even when Telerik MVC is used on the page). I have tested the script and it works fine for me; however, if someone finds any further conflicts, please reply back.
var JSON;
if (!JSON) {
JSON = {};
}
(function () {
'use strict';
function f(n) {
// Format integers to have at least two digits.
return n < 10 ? '0' + n : n;
}
if (typeof Date.prototype.toJSON !== 'function') {
Date.prototype.toJSON = function (key) {
return isFinite(this.valueOf())
? this.getUTCFullYear() + '-' +
f(this.getUTCMonth() + 1) + '-' +
f(this.getUTCDate()) + 'T' +
f(this.getUTCHours()) + ':' +
f(this.getUTCMinutes()) + ':' +
f(this.getUTCSeconds()) + /*added - start*/ '.'+
f(this.getUTCMilliseconds()) + /*added - end*/ 'Z'
: null;
};
//pushed the below code outside current if block
// String.prototype.toJSON =
// Number.prototype.toJSON =
// Boolean.prototype.toJSON = function (key) {
// return this.valueOf();
// };
}
/*added - start*/
if (typeof String.prototype.toJSON !== 'function') {
String.prototype.toJSON = function (key) {
return ((typeof this.valueOf === 'function') ? this.valueOf(): this.toString());
};
}
if (typeof Number.prototype.toJSON !== 'function') {
Number.prototype.toJSON = function (key) {
return ((typeof this.valueOf === 'function') ? this.valueOf(): this.toString());
};
}
if (typeof Boolean.prototype.toJSON !== 'function') {
Boolean.prototype.toJSON = function (key) {
return ((typeof this.valueOf === 'function') ? this.valueOf(): this.toString());
};
}
/*added - end*/
var cx = /[\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g,
escapable = /[\\\"\x00-\x1f\x7f-\x9f\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g,
gap,
indent,
meta = { // table of character substitutions
'\b': '\\b',
'\t': '\\t',
'\n': '\\n',
'\f': '\\f',
'\r': '\\r',
'"' : '\\"',
'\\': '\\\\'
},
rep;
function quote(string) {
// If the string contains no control characters, no quote characters, and no
// backslash characters, then we can safely slap some quotes around it.
// Otherwise we must also replace the offending characters with safe escape
// sequences.
escapable.lastIndex = 0;
return escapable.test(string) ? '"' + string.replace(escapable, function (a) {
var c = meta[a];
return typeof c === 'string'
? c
: '\\u' + ('0000' + a.charCodeAt(0).toString(16)).slice(-4);
}) + '"' : '"' + string + '"';
}
function str(key, holder) {
// Produce a string from holder[key].
var i, // The loop counter.
k, // The member key.
v, // The member value.
length,
mind = gap,
partial,
value = holder[key];
// If the value has a toJSON method, call it to obtain a replacement value.
if (value && typeof value === 'object' &&
typeof value.toJSON === 'function') {
value = value.toJSON(key);
}
// If we were called with a replacer function, then call the replacer to
// obtain a replacement value.
if (typeof rep === 'function') {
value = rep.call(holder, key, value);
}
// What happens next depends on the value's type.
switch (typeof value) {
case 'string':
return quote(value);
case 'number':
// JSON numbers must be finite. Encode non-finite numbers as null.
return isFinite(value) ? String(value) : 'null';
case 'boolean':
case 'null':
// If the value is a boolean or null, convert it to a string. Note:
// typeof null does not produce 'null'. The case is included here in
// the remote chance that this gets fixed someday.
return String(value);
// If the type is 'object', we might be dealing with an object or an array or
// null.
case 'object':
// Due to a specification blunder in ECMAScript, typeof null is 'object',
// so watch out for that case.
if (!value) {
return 'null';
}
// Make an array to hold the partial results of stringifying this object value.
gap += indent;
partial = [];
// Is the value an array?
if (Object.prototype.toString.apply(value) === '[object Array]') {
// The value is an array. Stringify every element. Use null as a placeholder
// for non-JSON values.
length = value.length;
for (i = 0; i < length; i += 1) {
partial[i] = str(i, value) || 'null';
}
// Join all of the elements together, separated with commas, and wrap them in
// brackets.
v = partial.length === 0
? '[]'
: gap
? '[\n' + gap + partial.join(',\n' + gap) + '\n' + mind + ']'
: '[' + partial.join(',') + ']';
gap = mind;
return v;
}
// If the replacer is an array, use it to select the members to be stringified.
if (rep && typeof rep === 'object') {
length = rep.length;
for (i = 0; i < length; i += 1) {
if (typeof rep[i] === 'string') {
k = rep[i];
v = str(k, value);
if (v) {
partial.push(quote(k) + (gap ? ': ' : ':') + v);
}
}
}
} else {
// Otherwise, iterate through all of the keys in the object.
for (k in value) {
if (Object.prototype.hasOwnProperty.call(value, k)) {
v = str(k, value);
if (v) {
partial.push(quote(k) + (gap ? ': ' : ':') + v);
}
}
}
}
// Join all of the member texts together, separated with commas,
// and wrap them in braces.
v = partial.length === 0
? '{}'
: gap
? '{\n' + gap + partial.join(',\n' + gap) + '\n' + mind + '}'
: '{' + partial.join(',') + '}';
gap = mind;
return v;
}
}
// If the JSON object does not yet have a stringify method, give it one.
if (typeof JSON.stringify !== 'function') {
JSON.stringify = function (value, replacer, space) {
// The stringify method takes a value and an optional replacer, and an optional
// space parameter, and returns a JSON text. The replacer can be a function
// that can replace values, or an array of strings that will select the keys.
// A default replacer method can be provided. Use of the space parameter can
// produce text that is more easily readable.
var i;
gap = '';
indent = '';
// If the space parameter is a number, make an indent string containing that
// many spaces.
if (typeof space === 'number') {
for (i = 0; i < space; i += 1) {
indent += ' ';
}
// If the space parameter is a string, it will be used as the indent string.
} else if (typeof space === 'string') {
indent = space;
}
// If there is a replacer, it must be a function or an array.
// Otherwise, throw an error.
rep = replacer;
if (replacer && typeof replacer !== 'function' &&
(typeof replacer !== 'object' ||
typeof replacer.length !== 'number')) {
throw new Error('JSON.stringify');
}
// Make a fake root object containing our value under the key of ''.
// Return the result of stringifying the value.
return str('', {'': value});
};
}
// If the JSON object does not yet have a parse method, give it one.
if (typeof JSON.parse !== 'function') {
JSON.parse = function (text, reviver) {
// The parse method takes a text and an optional reviver function, and returns
// a JavaScript value if the text is a valid JSON text.
var j;
function walk(holder, key) {
// The walk method is used to recursively walk the resulting structure so
// that modifications can be made.
var k, v, value = holder[key];
if (value && typeof value === 'object') {
for (k in value) {
if (Object.prototype.hasOwnProperty.call(value, k)) {
v = walk(value, k);
if (v !== undefined) {
value[k] = v;
} else {
delete value[k];
}
}
}
}
return reviver.call(holder, key, value);
}
// Parsing happens in four stages. In the first stage, we replace certain
// Unicode characters with escape sequences. JavaScript handles many characters
// incorrectly, either silently deleting them, or treating them as line endings.
text = String(text);
cx.lastIndex = 0;
if (cx.test(text)) {
text = text.replace(cx, function (a) {
return '\\u' +
('0000' + a.charCodeAt(0).toString(16)).slice(-4);
});
}
// In the second stage, we run the text against regular expressions that look
// for non-JSON patterns. We are especially concerned with '()' and 'new'
// because they can cause invocation, and '=' because it can cause mutation.
// But just to be safe, we want to reject all unexpected forms.
// We split the second stage into 4 regexp operations in order to work around
// crippling inefficiencies in IE's and Safari's regexp engines. First we
// replace the JSON backslash pairs with '#' (a non-JSON character). Second, we
// replace all simple value tokens with ']' characters. Third, we delete all
// open brackets that follow a colon or comma or that begin the text. Finally,
// we look to see that the remaining characters are only whitespace or ']' or
// ',' or ':' or '{' or '}'. If that is so, then the text is safe for eval.
if (/^[\],:{}\s]*$/
.test(text.replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, '#')
.replace(/"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g, ']')
.replace(/(?:^|:|,)(?:\s*\[)+/g, ''))) {
// In the third stage we use the eval function to compile the text into a
// JavaScript structure. The '{' operator is subject to a syntactic ambiguity
// in JavaScript: it can begin a block or an object literal. We wrap the text
// in parens to eliminate the ambiguity.
j = eval('(' + text + ')');
// In the optional fourth stage, we recursively walk the new structure, passing
// each name/value pair to a reviver function for possible transformation.
return typeof reviver === 'function'
? walk({'': j}, '')
: j;
}
// If the text is not JSON parseable, then a SyntaxError is thrown.
throw new SyntaxError('JSON.parse');
};
}
}());
Note: The above code is not my implementation. It comes from https://github.com/douglascrockford/JSON-js; I have just modified it a little to avoid conflicts with Telerik or otherwise.
I have had exactly the same problem.
I couldn't find any other solution than to edit the json2.js file like you suggested, thanks for that.
However, I found that this would fix the issue for IE7 and still work in IE8/9 as well as Firefox, but it then stopped working in Chrome ("this" in the [typeof this.valueOf === 'function'] check is undefined).
Have you run into that issue, too, or did yours work in Chrome? I'm trying to figure out if this is related to my data or telerik-internal.
Thanks for your post!
Edit:
For now I have just returned null if "this" is undefined/null (in all three functions). Seems to work in all browsers and allows the Telerik grid to rebind without problems.
I don't know how correct this is in the global context of json2.js .toJSON method, though.
