Hi I'm trying to build a regular expression builder to detect 2 or more spaces or a tab, so (let twoOrMoreSpacesOrTab = /\s{2,}|\t/)
How to build this using a Regex Builder?
I tried this but its not 100% acurate:
ChoiceOf {
OneOrMore(" ")
One("\t")
}
The problem here is that is trying to match multiples of 2 white spaces, and I want to consume the whole thing.
This might work.
let twoOrMoreSpacesOrTab = Regex {
ChoiceOf {
Repeat(2...) {
One(.whitespace)
}
One("\t")
}
}
Related
I working on a language similar to ruby called gaiman and I'm using PEG.js to generate the parser.
Do you know if there is a way to implement heredocs with proper indentation?
xxx = <<<END
hello
world
END
the output should be:
"hello
world"
I need this because this code doesn't look very nice:
def foo(arg) {
if arg == "here" then
return <<<END
xxx
xxx
END
end
end
this is a function where the user wants to return:
"xxx
xxx"
I would prefer the code to look like this:
def foo(arg) {
if arg == "here" then
return <<<END
xxx
xxx
END
end
end
If I trim all the lines user will not be able to use a string with leading spaces when he wants. Does anyone know if PEG.js allows this?
I don't have any code yet for heredocs, just want to be sure if something that I want is possible.
EDIT:
So I've tried to implement heredocs and the problem is that PEG doesn't allow back-references.
heredoc = "<<<" marker:[\w]+ "\n" text:[\s\S]+ marker {
return text.join('');
}
It says that the marker is not defined. As for trimming I think I can use location() function
I don't think that's a reasonable expectation for a parser generator; few if any would be equal to the challenge.
For a start, recognising the here-string syntax is inherently context-sensitive, since the end-delimiter must be a precise copy of the delimiter provided after the <<< token. So you would need a custom lexical analyser, and that means that you need a parser generator which allows you to use a custom lexical analyser. (So a parser generator which assumes you want a scannerless parser might not be the optimal choice.)
Recognising the end of the here-string token shouldn't be too difficult, although you can't do it with a single regular expression. My approach would be to use a custom scanning function which breaks the here-string into a series of lines, concatenating them as it goes until it reaches a line containing only the end-delimiter.
Once you've recognised the text of the literal, all you need to normalise the spaces in the way you want is the column number at which the <<< starts. With that, you can trim each line in the string literal. So you only need a lexical scanner which accurately reports token position. Trimming wouldn't normally be done inside the generated lexical scanner; rather, it would be the associated semantic action. (Equally, it could be a semantic action in the grammar. But it's always going to be code that you write.)
When you trim the literal, you'll need to deal with the cases in which it is impossible, because the user has not respected the indentation requirement. And you'll need to do something with tab characters; getting those right probably means that you'll want a lexical scanner which computes visible column positions rather than character offsets.
I don't know if peg.js corresponds with those requirements, since I don't use it. (I did look at the documentation, and failed to see any indication as to how you might incorporate a custom scanner function. But that doesn't mean there isn't a way to do it.) I hope that the discussion above at least lets you check the detailed documentation for the parser generator you want to use, and otherwise find a different parser generator which will work for you in this use case.
Here is the implementation of heredocs in Peggy successor to PEG.js that is not maintained anymore. This code was based on the GitHub issue.
heredoc = "<<<" begin:marker "\n" text:($any_char+ "\n")+ _ end:marker (
&{ return begin === end; }
/ '' { error(`Expected matched marker "${begin}", but marker "${end}" was found`); }
) {
const loc = location();
const min = loc.start.column - 1;
const re = new RegExp(`\\s{${min}}`);
return text.map(line => {
return line[0].replace(re, '');
}).join('\n');
}
any_char = (!"\n" .)
marker_char = (!" " !"\n" .)
marker "Marker" = $marker_char+
_ "whitespace"
= [ \t\n\r]* { return []; }
EDIT: above didn't work with another piece of code after heredoc, here is better grammar:
{ let heredoc_begin = null; }
heredoc = "<<<" beginMarker "\n" text:content endMarker {
const loc = location();
const min = loc.start.column - 1;
const re = new RegExp(`^\\s{${min}}`, 'mg');
return {
type: 'Literal',
value: text.replace(re, '')
};
}
__ = (!"\n" !" " .)
marker 'Marker' = $__+
beginMarker = m:marker { heredoc_begin = m; }
endMarker = "\n" " "* end:marker &{ return heredoc_begin === end; }
content = $(!endMarker .)*
im trying to make locals with Chinese letters
local 屁 = p
or
屁 = p
none of those work
any ways to do it?
You can't do this as "屁" is not a valid Lua identifier.
Lua identifiers can only have letters, numbers and underscores and must not start with a number.
However, you can create a table with a key 屁:
local chinese_letters = {
["屁"] = p
}
And access it as chinese_letters["屁"], for example
local chinese_letters = {
["屁"] = 10
}
print(chinese_letters["屁"])
By the way, the correct name for these chinese characters is Hanzi
I am parsing BASIC:
530 FOR I=1 TO 9:C(I,1)=0:C(I,2)=0:NEXT I
The patterns that are used in this case are:
FOR { return TOK_FOR; }
TO { return TOK_TO; }
NEXT { return TOK_NEXT; }
(many lines later...)
[A-Za-z_#][A-Za-z0-9_]*[\$%\!#]? {
yylval.s = g_string_new(yytext);
return IDENTIFIER;
}
(many lines later...)
[ \t\r\l] { /* eat non-string whitespace */ }
The problem occurs when the spaces are removed, which was common in the era of 8k RAM. So the line that is actually found in Super Star Trek is:
530 FORI=1TO9:C(I,1)=0:C(I,2)=0:NEXTI
Now I know why this is happening: "FORI" is longer than "FOR", it's a valid IDENTIFIER in my pattern, so it matches IDENTIFIER.
The original rule in MS BASIC was that variable names could be only two characters, so there was no * so the match would fail. But this version is also supporting GW BASIC and Atari BASIC, which allow variables with long names. So "FORI" is a legal variable name in my scanner, so that matches as it is the longest hit.
Now when I look at the manual, and the only similar example deliberately returns an error. It seems what I need is "match the ID, but only if it's not the same as defined %token", is there such a thing?
It's easy to recognise keywords even if they have an identifier concatenated. What's tricky is deciding in what contexts you should apply that technique.
Here's a simple pattern for recognising keywords, using trailing context:
tail [[:alnum:]]*[$%!#]?
%%
FOR/{tail} { return TOK_FOR; }
TO/{tail} { return TOK_TO; }
NEXT/{tail} { return TOK_NEXT; }
/* etc. */
[[:alpha:]]{tail} { /* Handle an ID */ }
Effectively, that just extends the keyword match without extending the matched token.
But I doubt the problem is so simple. How should FORFORTO be tokenised, for example?
Is it possible to format a document as follows, using Xtext formatting? As you can see, Test children are indented with 4 spaces while External children are indented with 2 spaces only. I am using Xtext 2.12.0.
Test my_prog {
Device = "my_device";
Param = 0;
}
External {
Path = "my_path";
File = "my_file";
}
you could try to work with custom replacers, dont know if this will work with nested block though
def dispatch void format(External model, extension IFormattableDocument document) {
model.regionFor.keyword("}").prepend[newLine]
for (l : model.ids) {
val region = l.regionFor.feature(MyDslPackage.Literals.IDX__NAME)
region.prepend[newLine]
val r = new AbstractTextReplacer(document, region) {
override createReplacements(ITextReplacerContext it) {
val offset = region.offset
it.addReplacement(region.textRegionAccess.rewriter.createReplacement(offset, 0, " "))
it
}
}
addReplacer(r)
}
}
I am using corona (lua) with parse.com and I have hit a problem constructing an $in query using values from another table / array.
My code is a little like this:
local usersToFetch = {}
table.insert( usersToFetch, "KnVvDiV2Cj")
table.insert( usersToFetch, "Paf6LDmykp")
and the working query I want to perform is the following lua table (which will get encoded before heading to parse). As I said, this works when I am hard coding the values as shown
local queryTable = {
["where"] = {
["objectId"] = { ["$in"] = {"KnVvDiV2Cj","Paf6LDmykp" }}
},
["limit"] = 1000
}
My problem is how do I incorporate my 'usersToFetch' table in the above table so that it will work the same as hard coding the values?
I swore I tried that but clearly I did not.. I think that I placed it inside the braces whereas they are not needed which is where I went wrong.
Thanks, hjpotte92 - what you put worked fine but this is my final solution in a single declaration:
Where I was going wrong before was because I had too many braces ["objectId"] = { ["$in"] = { usersToFetch } }
local queryTable = {
["where"] = {
["objectId"] = { ["$in"] = usersToFetch}
},
["limit"] = 1000
}