I want to define a special code block, which may starts by any combination of characters of {[<#, and the end will be }]>#.
Some example:
{
block-content
}
{##
block-content
##}
#[[<{###
block-content
###}>]]#
Is it possible with petitparser-dart?
Yes, back-references are possible, but it is not that straight-forward.
First we need a function that can reverse our delimiter. I came up with the following:
String reverseDelimiter(String token) {
return token.split('').reversed.map((char) {
if (char == '[') return ']';
if (char == '{') return '}';
if (char == '<') return '>';
return char;
}).join();
}
Then we have to declare the stopDelimiter parser. It is undefined at this point, but will be replaced with the real parser as soon as we know it.
var stopDelimiter = undefined();
In the action of the startDelimiter we replace the stopDelimiter with a dynamically created parser as follows:
var startDelimiter = pattern('#<{[').plus().flatten().map((String start) {
stopDelimiter.set(string(reverseDelimiter(start)).flatten());
return start;
});
The rest is trivial, but depends on your exact requirements:
var blockContents = any().starLazy(stopDelimiter).flatten();
var parser = startDelimiter & blockContents & stopDelimiter;
The code above defines the blockContents parser so that it reads through anything until the matching stopDelimiter is encountered. The provided examples pass:
print(parser.parse('{ block-content }'));
// Success[1:18]: [{, block-content , }]
print(parser.parse('{## block-content ##}'));
// Success[1:22]: [{##, block-content , ##}]
print(parser.parse('#[[<{### block-content ###}>]]#'));
// Success[1:32]: [#[[<{###, block-content , ###}>]]#]
The above code doesn't work if you want to nest the parser. If necessary, that problem can be avoided by remembering the previous stopDelimiter and restoring it.
Related
I wanted to match for a string which starts with a '#', then matches everything until it matches the character that follows '#'. This can be achieved using capturing groups like this: #(.)[^(?1)]*(?1)(EDIT this regex is also erroneous). This matches #$foo$, does not match #%bar&, matches first 6 characters of #"foo"bar.
But since flex lex does not support capturing groups, what is the workaround here?
As you say, (f)lex does not support capturing groups, and it certainly doesn't support backreferences.
So there is no simple workaround, but there are workarounds. Here are a few possibilities:
You can read the input one character at a time using the input() function, until you find the matching character (but you have to create your own buffer to store the characters, because characters read by input() are not added to the current token). This is not the most efficient because reading one character at a time is a bit clunky, but it's the only interface that (f)lex offers. (The following snippet assumes you have some kind of expandable stringBuilder; if you are using C++, this would just be replaced with a std::string.)
#. { StringBuilder sb = string_builder_new();
int delim = yytext[1];
for (;;) {
int next = input();
if (next == delim) break;
if (next == EOF ) { /* Signal error */; break; }
string_builder_addchar(next);
}
yylval = string_builder_release();
return DELIMITED_STRING;
}
Even less efficiently, but perhaps more conveniently, you can get (f)lex to accumulate the characters in yytext using yymore(), matching one character at a time in a start condition:
%x DELIMITED
%%
int delim;
#. { delim = yytext[1]; BEGIN(DELIMITED); }
<DELIMITED>.|\n { if (yytext[0] == delim) {
yylval = strdup(yytext);
BEGIN(INITIAL);
return DELIMITED_STRING;
}
yymore();
}
<DELIMITED><<EOF>> { /* Signal unterminated string error */ }
The most efficient solution (in (f)lex) is to just write one rule for each possible delimiter. While that's a lot of rules, they could be easily generated with a small script in whatever scripting language you prefer. And, actually, there are not that many rules, particularly if you don't allow alphabetic and non-printing characters to be delimiters. This has the additional advantage that if you want Perl-like parenthetic delimiters (#(Hello) instead of #(Hello(), you can just modify the individual pattern to suit (as I've done below). [Note 1] Since all the actions are the same; it might be easier to use a macro for the action, making it easier to modify.
/* Ordinary punctuation */
#:[^:]*: { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
#:[^:]*: { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
#![^!]*! { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
#\.[^.]*\. { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
/* Matched pairs */
#<[^>]*> { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
#\[[^]]*] { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
/* Trap errors */
# { /* Report unmatched or invalid delimiter error */ }
If I were writing a script to generate these rules, I would use hexadecimal escapes for all the delimiter characters rather than trying to figure out which ones needed escapes.
Notes:
Perl requires nested balanced parentheses in constructs like that. But you can't do that with regular expressions; if you wanted to reproduce Perl behaviour, you'd need to use some variation on one of the other suggestions. I'll try to revisit this answer later to address that feature.
In an experimental language development, need to fetch the comment text for further processing.
At the tokens level this does not work,
COMMENT : comm = ('/*' ~'*' .* '*/') { System.out.println($comm.text); } ;
Tried to add statement and/or expression, yet this desired syntax is not parsed either,
x = myFunction(x1, /* comment x1 */
x2, /* comment x2 */
x3)
Update: using ANTLR 3.1.3.
Found this working approach though not fully sound as of associating a statement/expression to a comment,
#lexer::members {
public static final int COMMENTS = 2;
}
and so comments are deviated to numbered channel,
COMMENT : '/*' ~'*' .* '*/' {$channel=COMMENTS;} ;
Then,
NetworkLexer lexer = new NetworkLexer(sourceStream);
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
Parser parser = new Parser(tokenStream);
parser.prog();
To fetch the comment text with line numbers consider this Java code,
for (Object tk: tokenStream.getTokens()) {
CommonToken ctk = (CommonToken) tk;
if (ctk.getChannel() == 2) {
if (ctk.getText() != null && ctk.getText().trim().isEmpty() == false) {
System.out.println(String.format("tk channel %s line %s text %s"),
ctk.getChannel(), ctk.getLine(), ctk.getText());
}
}
I'm trying to implement a parser by directly reading a treeWalker and implementing the commands needed for the compiler on the fly. So if I have a command like:
statement
:
^('WRITE' expression)
{
//Here is the command that is created by my Tree Parser
ch.emitRO("OUT",0,0,0,"write out the value of ac");
//and then I handle it in my other classes
}
;
I want it to write OUT 0,0,0; to a file. That's my grammar.
I have a problem though with the loop section in my grammar it is:
'WHILE'^ expression 'DO' stat_seq 'ENDDO'
and in the tree parser:
doWhileStatement
:
^('WHILE' expression 'DO' stat_seq 'ENDDO')
;
What I want to do is directly parse the code from the while loop into the commands I need. I came up with this solution but it doesn't work:
doWhileStatement
:
^('WHILE' e=expression head='DO'
{
int loopHead =((CommonTree) head).getTokenStartIndex();
}
stat_seq
{
if ($e.result==1) {
input.seek(loopHead);
doWhileStatement();
}
}
'ENDDO')
;
for the record here are some of the other commands I've written:
(ignore the code written in brackets, it's for the generation of the commands in a text file.)
stat_seq
:
(statement)+
;
statement
:
^(':=' ID e=expression) { variables.put($ID.text,e); }
| ^('WRITE' expression)
{
ch.emitRM("LDC",ac,$expression.result,0,"pass the expression value to the ac reg");
ch.emitRO("OUT",ac,0,0,"write out the value of ac");
}
| ^('READ' ID)
{
ch.emitRO("IN",ac,0,0,"read value");
}
| ^('IF' expression 'THEN'
{
ch.emitRM("LDC",ac1,$expression.result,0,"pass the expression result to the ac reg");
int savedLoc1 = ch.emitSkip(1);
}
sseq1=stat_seq
'ELSE'
{
int savedLoc2 = ch.emitSkip(1);
ch.emitBackup(savedLoc1);
ch.emitRM("JEQ",ac1,savedLoc2+1,0,"skip as many places as needed depending on the expression");
ch.emitRestore();
}
sseq2=stat_seq
{
int savedLoc3 = ch.emitSkip(0);
ch.emitBackup(savedLoc2);
ch.emitRM("LDC",PC_REG,savedLoc3,0,"skip for the else command");
ch.emitRestore();
}
'ENDIF')
| doWhileStatement
;
Any help would be appreciated, thank you
I found it for everyone who has the same problem I did it like this and it's working:
^('WHILE'
{int c = input.index();}
expression
{int s=input.index();}
.* )// .* is a sequence of statements
{
int next = input.index(); // index of node following WHILE
input.seek(c);
match(input, Token.DOWN, null);
pushFollow(FOLLOW_expression_in_statement339);
int condition = expression();
state._fsp--;
//there is a problem here
//expression() seemed to be reading from the grammar file and I couldn't
//get it to read from the tree walker rule somehow
//It printed something like no viable alt at input 'DOWN'
//I googled it and found this mistake
// So I copied the code from the normal while statement
// And pasted it here and it works like a charm
// Normally there should only be int condition = expression()
while ( condition == 1 ) {
input.seek(s);
stat_seq();//stat_seq is a sequence of statements: (statement ';')+
input.seek(c);
match(input, Token.DOWN, null); //Copied value from EvaluatorWalker.java
//cause couldn't find another way to do it
pushFollow(FOLLOW_expression_in_statement339);
condition = expression();
state._fsp--;
System.out.println("condition:"+condition + " i:"+ variables.get("i"));
}
input.seek(next);
}
I wrote the problem at the comments of my code. If anyone can help me out and answer this for me how to do it I would be grateful. It's so weird that there is nearly no feedback on a correct way to implement loops within a tree grammar on the fly.
Regards,
Alex
I am using the JSON2 library in order to use JSON.stringify to send some JSON data to my MVC controller.
When I include another script in my view (Telerik MVC) I start to get script conflicts when using IE7.
When I click the refresh button in the grid, I get the following error:
Line: 191
Error: Object doesn't support this property or method
String.prototype.toJSON =
Number.prototype.toJSON =
Boolean.prototype.toJSON = function (key) {
return this.valueOf();
};
The error occurs on the following line specifically:
return this.valueOf();
Does anyone have any insight into why this conflict is occurring and how to resolve it? Specifically, why would this work in IE8/Chrome but fail in IE7. What would cause the error? Are both scripts trying to define the same method and that's why it is failing or is it impossible to tell without digging through tons of code?
Edit:
This is the json2.js library I am speaking of: https://github.com/douglascrockford/JSON-js
Probably the reply is too late, but I thought it's worth replying as this might save some valuable lives ;)
The JSON2 script won't initialize/extend the JSON object if there is an existing implementation(Native or Included). However if the JSON object does not exist, the script will create that object and attach few methods to it (JSON.stringify and JSON.parse to be precise). However in order to make those methods to work, there are other objects (like Date, String, Number and Boolean objects) which need to be extended to support certain methods (like toJSON method). The JSON2 script takes care of extending the required objects as well.
Now coming to the specific issue here (Telerik MVC). I faced the same problem while working with Telerik for one of the Projects. However I was able to trace it. The probable cause is the conflict between Telerik scripts and the current JSON2 script. The Date and Boolean Objects' toJSON method somehow conflicts with Telerik's implmentation of the same method for those two objects which breaks the Telerik script at some places. I have modified the JSON2 library for a more robust check which doesn't fail in any scenario (even on use of Telerik MVC on the page). I have tested the script and it works fine for me, however in case someone finds any further conflicts, please reply back.
var JSON;
if (!JSON) {
JSON = {};
}
(function () {
'use strict';
function f(n) {
// Format integers to have at least two digits.
return n < 10 ? '0' + n : n;
}
if (typeof Date.prototype.toJSON !== 'function') {
Date.prototype.toJSON = function (key) {
return isFinite(this.valueOf())
? this.getUTCFullYear() + '-' +
f(this.getUTCMonth() + 1) + '-' +
f(this.getUTCDate()) + 'T' +
f(this.getUTCHours()) + ':' +
f(this.getUTCMinutes()) + ':' +
f(this.getUTCSeconds()) + /*added - start*/ '.'+
f(this.getUTCMilliseconds()) + /*added - end*/ 'Z'
: null;
};
//pushed the below code outside current if block
// String.prototype.toJSON =
// Number.prototype.toJSON =
// Boolean.prototype.toJSON = function (key) {
// return this.valueOf();
// };
}
/*added - start*/
if (typeof String.prototype.toJSON !== 'function') {
String.prototype.toJSON = function (key) {
return ((typeof this.valueOf === 'function') ? this.valueOf(): this.toString());
};
}
if (typeof Number.prototype.toJSON !== 'function') {
Number.prototype.toJSON = function (key) {
return ((typeof this.valueOf === 'function') ? this.valueOf(): this.toString());
};
}
if (typeof Boolean.prototype.toJSON !== 'function') {
Boolean.prototype.toJSON = function (key) {
return ((typeof this.valueOf === 'function') ? this.valueOf(): this.toString());
};
}
/*added - end*/
var cx = /[\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g,
escapable = /[\\\"\x00-\x1f\x7f-\x9f\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g,
gap,
indent,
meta = { // table of character substitutions
'\b': '\\b',
'\t': '\\t',
'\n': '\\n',
'\f': '\\f',
'\r': '\\r',
'"' : '\\"',
'\\': '\\\\'
},
rep;
function quote(string) {
// If the string contains no control characters, no quote characters, and no
// backslash characters, then we can safely slap some quotes around it.
// Otherwise we must also replace the offending characters with safe escape
// sequences.
escapable.lastIndex = 0;
return escapable.test(string) ? '"' + string.replace(escapable, function (a) {
var c = meta[a];
return typeof c === 'string'
? c
: '\\u' + ('0000' + a.charCodeAt(0).toString(16)).slice(-4);
}) + '"' : '"' + string + '"';
}
function str(key, holder) {
// Produce a string from holder[key].
var i, // The loop counter.
k, // The member key.
v, // The member value.
length,
mind = gap,
partial,
value = holder[key];
// If the value has a toJSON method, call it to obtain a replacement value.
if (value && typeof value === 'object' &&
typeof value.toJSON === 'function') {
value = value.toJSON(key);
}
// If we were called with a replacer function, then call the replacer to
// obtain a replacement value.
if (typeof rep === 'function') {
value = rep.call(holder, key, value);
}
// What happens next depends on the value's type.
switch (typeof value) {
case 'string':
return quote(value);
case 'number':
// JSON numbers must be finite. Encode non-finite numbers as null.
return isFinite(value) ? String(value) : 'null';
case 'boolean':
case 'null':
// If the value is a boolean or null, convert it to a string. Note:
// typeof null does not produce 'null'. The case is included here in
// the remote chance that this gets fixed someday.
return String(value);
// If the type is 'object', we might be dealing with an object or an array or
// null.
case 'object':
// Due to a specification blunder in ECMAScript, typeof null is 'object',
// so watch out for that case.
if (!value) {
return 'null';
}
// Make an array to hold the partial results of stringifying this object value.
gap += indent;
partial = [];
// Is the value an array?
if (Object.prototype.toString.apply(value) === '[object Array]') {
// The value is an array. Stringify every element. Use null as a placeholder
// for non-JSON values.
length = value.length;
for (i = 0; i < length; i += 1) {
partial[i] = str(i, value) || 'null';
}
// Join all of the elements together, separated with commas, and wrap them in
// brackets.
v = partial.length === 0
? '[]'
: gap
? '[\n' + gap + partial.join(',\n' + gap) + '\n' + mind + ']'
: '[' + partial.join(',') + ']';
gap = mind;
return v;
}
// If the replacer is an array, use it to select the members to be stringified.
if (rep && typeof rep === 'object') {
length = rep.length;
for (i = 0; i < length; i += 1) {
if (typeof rep[i] === 'string') {
k = rep[i];
v = str(k, value);
if (v) {
partial.push(quote(k) + (gap ? ': ' : ':') + v);
}
}
}
} else {
// Otherwise, iterate through all of the keys in the object.
for (k in value) {
if (Object.prototype.hasOwnProperty.call(value, k)) {
v = str(k, value);
if (v) {
partial.push(quote(k) + (gap ? ': ' : ':') + v);
}
}
}
}
// Join all of the member texts together, separated with commas,
// and wrap them in braces.
v = partial.length === 0
? '{}'
: gap
? '{\n' + gap + partial.join(',\n' + gap) + '\n' + mind + '}'
: '{' + partial.join(',') + '}';
gap = mind;
return v;
}
}
// If the JSON object does not yet have a stringify method, give it one.
if (typeof JSON.stringify !== 'function') {
JSON.stringify = function (value, replacer, space) {
// The stringify method takes a value and an optional replacer, and an optional
// space parameter, and returns a JSON text. The replacer can be a function
// that can replace values, or an array of strings that will select the keys.
// A default replacer method can be provided. Use of the space parameter can
// produce text that is more easily readable.
var i;
gap = '';
indent = '';
// If the space parameter is a number, make an indent string containing that
// many spaces.
if (typeof space === 'number') {
for (i = 0; i < space; i += 1) {
indent += ' ';
}
// If the space parameter is a string, it will be used as the indent string.
} else if (typeof space === 'string') {
indent = space;
}
// If there is a replacer, it must be a function or an array.
// Otherwise, throw an error.
rep = replacer;
if (replacer && typeof replacer !== 'function' &&
(typeof replacer !== 'object' ||
typeof replacer.length !== 'number')) {
throw new Error('JSON.stringify');
}
// Make a fake root object containing our value under the key of ''.
// Return the result of stringifying the value.
return str('', {'': value});
};
}
// If the JSON object does not yet have a parse method, give it one.
if (typeof JSON.parse !== 'function') {
JSON.parse = function (text, reviver) {
// The parse method takes a text and an optional reviver function, and returns
// a JavaScript value if the text is a valid JSON text.
var j;
function walk(holder, key) {
// The walk method is used to recursively walk the resulting structure so
// that modifications can be made.
var k, v, value = holder[key];
if (value && typeof value === 'object') {
for (k in value) {
if (Object.prototype.hasOwnProperty.call(value, k)) {
v = walk(value, k);
if (v !== undefined) {
value[k] = v;
} else {
delete value[k];
}
}
}
}
return reviver.call(holder, key, value);
}
// Parsing happens in four stages. In the first stage, we replace certain
// Unicode characters with escape sequences. JavaScript handles many characters
// incorrectly, either silently deleting them, or treating them as line endings.
text = String(text);
cx.lastIndex = 0;
if (cx.test(text)) {
text = text.replace(cx, function (a) {
return '\\u' +
('0000' + a.charCodeAt(0).toString(16)).slice(-4);
});
}
// In the second stage, we run the text against regular expressions that look
// for non-JSON patterns. We are especially concerned with '()' and 'new'
// because they can cause invocation, and '=' because it can cause mutation.
// But just to be safe, we want to reject all unexpected forms.
// We split the second stage into 4 regexp operations in order to work around
// crippling inefficiencies in IE's and Safari's regexp engines. First we
// replace the JSON backslash pairs with '#' (a non-JSON character). Second, we
// replace all simple value tokens with ']' characters. Third, we delete all
// open brackets that follow a colon or comma or that begin the text. Finally,
// we look to see that the remaining characters are only whitespace or ']' or
// ',' or ':' or '{' or '}'. If that is so, then the text is safe for eval.
if (/^[\],:{}\s]*$/
.test(text.replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, '#')
.replace(/"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g, ']')
.replace(/(?:^|:|,)(?:\s*\[)+/g, ''))) {
// In the third stage we use the eval function to compile the text into a
// JavaScript structure. The '{' operator is subject to a syntactic ambiguity
// in JavaScript: it can begin a block or an object literal. We wrap the text
// in parens to eliminate the ambiguity.
j = eval('(' + text + ')');
// In the optional fourth stage, we recursively walk the new structure, passing
// each name/value pair to a reviver function for possible transformation.
return typeof reviver === 'function'
? walk({'': j}, '')
: j;
}
// If the text is not JSON parseable, then a SyntaxError is thrown.
throw new SyntaxError('JSON.parse');
};
}
}());
Note: The above code is not my implementation. It is from the source https://github.com/douglascrockford/JSON-js I have just modified it a little to avoid any conflicts with Telerik or otherwise.
I have had exactly the same problem.
I couldn't find any other solution than to edit the json2.js file like you suggested, thanks for that.
However, I found that this would fix the issue for IE7 and still work in IE8/9 as well as firefox, but it now stopped working in Chrome ("this" for [this.valueOf === 'function'] is undefined).
Have you run into that issue, too, or did yours work in Chrome? I'm trying to figure out if this is related to my data or telerik-internal.
Thanks for your post!
Edit:
For now I have just returned null if "this" is undefined/null (in all three functions). Seems to work in all browsers and allows the Telerik grid to rebind without problems.
I don't know how correct this is in the global context of json2.js .toJSON method, though.
I have to parse a document containing groups of variable-value-pairs which is serialized to a string e.g. like this:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
Here are the different elements:
Group IDs:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
Length of string representation of each group:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
One of the groups:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14 ^VAR1^6^VALUE1^^
Variables:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
Length of string representation of the values:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
The values themselves:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
Variables consist only of alphanumeric characters.
No assumption is made about the values, i.e. they may contain any character, including ^.
Is there a name for this kind of grammar? Is there a parsing library that can handle this mess?
So far I am using my own parser, but due to the fact that I need to detect and handle corrupt serializations the code looks rather messy, thus my question for a parser library that could lift the burden.
The simplest way to approach it is to note that there are two nested levels that work the same way. The pattern is extremely simple:
id^length^content^
At the outer level, this produces a set of groups. Within each group, the content follows exactly the same pattern, only here the id is the variable name, and the content is the variable value.
So you only need to write that logic once and you can use it to parse both levels. Just write a function that breaks a string up into a list of id/content pairs. Call it once to get the groups, and then loop through them calling it again for each content to get the variables in that group.
Breaking it down into these steps, first we need a way to get "tokens" from the string. This function returns an object with three methods, to find out if we're at "end of file", and to grab the next delimited or counted substring:
var tokens = function(str) {
var pos = 0;
return {
eof: function() {
return pos == str.length;
},
delimited: function(d) {
var end = str.indexOf(d, pos);
if (end == -1) {
throw new Error('Expected delimiter');
}
var result = str.substr(pos, end - pos);
pos = end + d.length;
return result;
},
counted: function(c) {
var result = str.substr(pos, c);
pos += c;
return result;
}
};
};
Now we can conveniently write the reusable parse function:
var parse = function(str) {
var parts = {};
var t = tokens(str);
while (!t.eof()) {
var id = t.delimited('^');
var len = t.delimited('^');
var content = t.counted(parseInt(len, 10));
var end = t.counted(1);
if (end !== '^') {
throw new Error('Expected ^ after counted string, instead found: ' + end);
}
parts[id] = content;
}
return parts;
};
It builds an object where the keys are the IDs (or variable names). I'm asuming as they have names that the order isn't significant.
Then we can use that at both levels to create the function to do the whole job:
var parseGroups = function(str) {
var groups = parse(str);
Object.keys(groups).forEach(function(id) {
groups[id] = parse(groups[id]);
});
return groups;
}
For your example, it produces this object:
{
'1': {
VAR1: 'VALUE1'
},
'4': {
VAR1: 'VALUE1',
VAR2: 'VAL2'
}
}
I don't think it's a trivial task to create a grammar for this. But on the other hand, a simple straight forward approach is not that hard. You know the corresponding string length for every critical string. So you just chop your string according to those lengths apart..
where do you see problems?