How can you define alternative literals in a syntax definition? - rascal

I am attempting to define a syntax to parse data definitions in COBOL and had a particular definition for picture clauses like this:
syntax PictureClause = pic: "PIC" PictureStringType PictureStringLen ("VALUE"|"VALUES") ValueSpec;
My matching ADT for this syntax was as so:
data PictureClause = pic(str pictype, PictureStringLen plen, str valuespec);
However, I noticed that the implode function seemed to be trying to match the parenthesized alternative to the second str parameter, instead of ignoring it like the "PIC" string literal. This syntax definition, on the other hand, worked as expected:
syntax PictureClause = pic: "PIC" PictureStringType PictureStringLen "VALUE" ValueSpec
|pic: "PIC" PictureStringType PictureStringLen "VALUES" ValueSpec;
As the title states, how can I define, in a single statement, alternatives for literals that I do not want in my ADT? I can see that separate alternatives are possible, but I'm wondering if there is a more concise way of defining this, in the spirit of my first attempt.

I seem to recall that the current version of implode treats alternatives as nodes and does not flatten them, even if the alternatives are merely literals. Your definition is perfect, nevertheless.
It's a relatively simple feature request imho, if you have time to register it on GitHub.
Another option is to not implode at all and use concrete syntax instead.

Related

Subscript operator with variable or how to cut the end-of-line characters?

I have an expression in Jenkinsfile:
myvar = myvar.substring(0, myvar.length() - 2)
The goal is to cut the EoL characters from the string (which contains the execution result of the batch command).
Recently I've installed GroovyLint plugin for VSCode and it complained about this line:
Violation in class None. The String.substring(int, int) method can be replaced with the subscript operator GroovyLint(UnnecessarySubstring-1)
I googled the subscript operator, and it looks like the replacement would be something like this:
myvar = myvar[0..myvar.length() - 2]
but unfortunately, it does not work: it gives no visible error, but also makes no changes to myvar.
What am I missing? Maybe you can't use variables as part of the subscript operator?
Maybe there is a better way to cut those end-of-line characters? I guess I could use regexp, but to me, that sounds like overkill.
Thanks!
Thanks to ernest_k, I found the answer!
The problem was that the subscript operator includes the upper bound. In my original scenario, the last 2 characters were not printable, which is why I did not see the difference while debugging. I had to use "myvar.length()-3".
But as indicated by ernest_k, instead of calculating the length of the string you need to trim, we can also use other options of the operator. All these examples work as expected:
println myvar[0..myvar.length() - 3]
println myvar[0..<-2]
println myvar[0..-3]
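Ruby's string ranges follow a similar inclusive/exclusive convention, so the off-by-one described above can be illustrated there. This is only an analogy, not Groovy code: the sample value is hypothetical, and Ruby's `...` plays the role of Groovy's `..<`.

```ruby
# Hypothetical command output that ends in CR LF (two unprintable characters).
myvar = "build ok\r\n"

# An inclusive range keeps the character at the upper bound,
# so length - 2 removes only the final "\n".
one_char_removed = myvar[0..myvar.length - 2]

# Three ways to drop both trailing characters.
by_length    = myvar[0..myvar.length - 3] # inclusive upper bound
by_exclusive = myvar[0...-2]              # exclusive upper bound
by_negative  = myvar[0..-3]               # negative index counts from the end

p one_char_removed
p by_length
```

Printing with `p` rather than `puts` shows the escape sequences, which makes the leftover "\r" visible.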

How can I simplify statements like these in an =OR() statement?

isnumber(search("-tr",right(j2,3))),isnumber(search("-trus",right(j2,5))),isnumber(search(" ll",right(j2,3))),isnumber(search(" homes",right(j2,6))),isnumber(search("the ",left(j2,4))),isnumber(search(" hoa",right(j2,4))),isnumber(search("b ch",right(j2,4))),isnumber(search(" ch",right(j2,3))),isnumber(search("-trs",right(j2,4))),isnumber(search(" prop",right(j2,5))),isnumber(search(" st",right(j2,3))),isnumber(search(" av",right(j2,3))),isnumber(search(" ave",right(j2,4))),isnumber(search(" servi",right(j2,6))),isnumber(search(" maint",right(j2,6))),isnumber(search(" home",right(j2,5))),isnumber(search(" tr",right(j2,3))),isnumber(search(" assn",right(j2,5))),isnumber(search(" co",right(j2,3))),isnumber(search(" trus",right(j2,5))),isnumber(search(" trs",right(j2,4))),isnumber(search("-trs",right(j2,4))),isnumber(search(" tru",right(j2,4))),isnumber(search("jtrs",right(j2,4))),isnumber(search(" est of",right(j2,7))),isnumber(search(" trs",right(j2,4))),isnumber(value(LEFT(j2,1))),isnumber(search(" apts",right(j2,5))),isnumber(value(right(j2,3))),isnumber(search(" grp",right(j2,4))),isnumber(value(left(right(j2,4),1))),isnumber(search(" mgmt",right(j2,5))),isnumber(search(" props",right(j2,6))),isnumber(search(" tr",right(j2,3))),isnumber(search(" dev",right(j2,4))),isnumber(search(" tr",right(j2,3))),isnumber(search(" fdn",right(j2,4))),isnumber(search(" ent",right(j2,4))),isnumber(search(" PRPTS",right(j2,6))),isnumber(search(" ARPTS",right(j2,6))),isnumber(search(" univ",right(j2,5)))
So I have this giant =OR() statement containing a bunch of isnumber(search()) calls that check whether the string in a cell ends in (or, in a few cases, starts with) these phrases. Its purpose is to identify company names in lists that contain both people's names and company names. I feel like there must be a more efficient way. Adding them all together in one isnumber(search()) in the format {item1|item2|item3} does not work.
Building on the answer provided here, matching the end of the string can be done with the $ sign (which means "end of the string" in regular expressions). Matching the beginning of the string, on the other hand, is done by putting the pattern after a caret (^), which indicates the beginning of the string.
So, if you want to add both to the formula provided in the other thread:
(LP|JT/RS)$ : match LP OR JT/RS at the end of the string
^(ABC|DEF) : match ABC OR DEF at the beginning of the string
That would make the formula look something like:
=REGEXMATCH(A2, "(?i)LLC|CORPORATION|COMPANY|HOLDINGS|PARTNERS|EQUITY|(LP|JT/RS)$|^(ABC|DEF)")
REFERENCE:
REGEXMATCH()
RE2 SYNTAX
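The same anchoring idea can be tried outside a spreadsheet. The following is a minimal Ruby sketch, not the REGEXMATCH formula itself: the suffix and prefix lists are a small hypothetical subset of the phrases in the question, `company?` is a made-up helper name, and Ruby's `\A` and `\z` stand in for the roles that `^` and `$` play in the RE2 pattern above.

```ruby
# Hypothetical subset of the phrases from the question.
SUFFIXES = [" hoa", " apts", " mgmt", "-trs"].freeze
PREFIXES = ["the "].freeze

# Regexp.union escapes each literal; .source returns the pattern text
# so it can be embedded in a larger expression.
suffix_part = Regexp.union(SUFFIXES).source
prefix_part = Regexp.union(PREFIXES).source

# One alternation anchored at the end, one anchored at the start,
# matched case-insensitively.
COMPANY_RE = /(?:#{suffix_part})\z|\A(?:#{prefix_part})/i

# Returns true when the name ends or starts with one of the phrases.
def company?(name)
  COMPANY_RE.match?(name)
end

puts company?("Sunset HOA")
puts company?("John Smith")
```

Keeping the phrases in plain arrays means a new suffix is a one-word change rather than another nested function call.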

Rails: Given a String, check if an Array (of strings) contains a substring of String

Is there a more Railsy way to do this (without explicit regex, perhaps?):
array_o_strings = ["some strings", "I'd like", "to parse"]
string = "like to parse"
re = Regexp.union(array_o_strings.map { |i| Regexp.new(i) })
string =~ re
Just pining for magical Rails methods.
There's really nothing wrong with using a regular expression here if that's your intent. It's generally more efficient to use one of those than to go through the trouble of comparing arrays.
It's worth noting you don't have to do that much work to get this:
re = Regexp.union(array_o_strings)
That should automatically handle escaping those strings and compiling them into a single regular expression. Test with strings containing * and ? to be sure.
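A quick way to run the suggested check is sketched below. The sample strings are made up for illustration; the point is only that metacharacters passed to `Regexp.union` as strings are treated literally.

```ruby
# Hypothetical inputs containing regex metacharacters.
patterns = ["what?", "a*b", "to parse"]
re = Regexp.union(patterns)

# Matches only when the literal text is present.
literal_question = re.match?("so, what? nothing")
literal_star     = re.match?("compute a*b now")

# These would match if "?" and "*" were left unescaped,
# but they do not match the escaped union.
optional_t = re.match?("wha")
repeated_a = re.match?("aaab")

puts re.source
```

Printing `re.source` shows the escaped pattern that was actually compiled.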
One note to add on style is that the =~ operator is a hold-over from Perl. It's preferable to use string.match(re) to make it clear what's going on there.
How big is the array? It may be worth comparing the speed using a regex vs checking each element. If the array is sorted shortest to longest that would help when checking one by one as you're more likely to find a match first.
In any event, this is one way:
array_o_strings.any?{|e| string.index(e) }
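A slightly more explicit variant of the one-by-one check, as a sketch using the question's sample data: `String#include?` states the intent directly, and `Enumerable#find` returns the matching element when you need to know which one matched.

```ruby
array_o_strings = ["some strings", "I'd like", "to parse"]
string = "like to parse"

# true if any array element occurs inside the string
found = array_o_strings.any? { |e| string.include?(e) }

# the first element that occurs inside the string, or nil if none does
first_match = array_o_strings.find { |e| string.include?(e) }

puts found
puts first_match
```

Both methods stop at the first hit, so ordering the array with the likeliest matches first keeps the scan short.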

Why are redundant parentheses not allowed in syntax definitions?

This syntax module is syntactically valid:
module mod1
syntax Empty =
;
And so is this one, which should be an equivalent grammar to the previous one:
module mod2
syntax Empty =
( )
;
(The resulting parser accepts only empty strings.)
Which means that you can make grammars such as this one:
module mod3
syntax EmptyOrKitchen =
( ) | "kitchen"
;
But the following is not allowed (nested parentheses):
module mod4
syntax Empty =
(( ))
;
I would have guessed that redundant parentheses are allowed, since they are allowed in things like expressions, e.g. ((2)) + 2.
This problem came up when working with the data types for the internal representation of Rascal syntax definitions. The following code will create the same module as in the last example, namely mod4 (modulo some whitespace):
import Grammar;
import lang::rascal::format::Grammar;
str sm1 = definition2rascal(\definition("unknown_main",("the-module":\module("unknown",{},{},grammar({sort("Empty")},(sort("Empty"):prod(sort("Empty"),[
alt({seq([])})
],{})))))));
The problematic part of the data is on its own line - alt({seq([])}). If this code is changed to seq([]), then you get the same syntax module as mod2. If you further delete this whole expression, i.e. so that you get this:
str sm3 =
definition2rascal(\definition("unknown_main",("the-module":\module("unknown",{},{},grammar({sort("Empty")},(sort("Empty"):prod(sort("Empty"),[
], {})))))));
Then you get mod1.
So should such redundant parentheses be printed by the definition2rascal(...) function? And should it matter with regard to making the resulting module valid or not?
They are not allowed basically because we wanted to see if we could do without them. There is currently no priority relation between the symbol kinds, so in general there is no need for a bracket syntax (as there is for + and * in expressions).
The brackets already have two different semantics: one, (), being the epsilon symbol, and two, (Sym1 Sym2 ...), being a nested sequence. This nested sequence is defined (syntactically) to expect at least two symbols. Now we could, without ambiguity, introduce a third semantics for brackets around a single symbol, or relax the requirement for sequences. But we reckoned it would be confusing that in one case you would get an extra layer in the resulting parse tree (sequence), while in the other case you would not (ignored superfluous bracket).
In more detail, the problem of printing seq([]) is not so much a problem of the meta syntax; rather, the backing abstract notation is more relaxed than the concrete notation (i.e. it is a bigger language, or an over-approximation). The parser generator will generate a working parser for seq([]). But there is no Rascal notation for an empty sequence, and I guess the pretty printer should throw an exception.

Remove "[string]" from BUILD_LOG_REGEX extracted lines

Here is my sample string.
[echo] The SampleProject solution currently has 85% code coverage.
My desired output should be.
The SampleProject solution currently has 85% code coverage.
Btw, I need this because I'm going through the logs in my CI using Jenkins.
Any help? Thanks..
You can try the substText parameter of the BUILD_LOG_REGEX token to substitute the text matching your regex:
New optional arg: ${BUILD_LOG_REGEX, regex, linesBefore, linesAfter, maxMatches, showTruncatedLines, substText} which allows substituting text for the matched regex. This is particularly useful when the text contains references to capture groups (i.e. $1, $2, etc.)
Using the following will remove the [echo] prefix from all of your log lines:
${BUILD_LOG_REGEX, regex="^\[echo] (.*)$", maxMatches=0, showTruncatedLines=false, substText="$1"}
\[[^\]]*\] will match the bit you want to remove. Just use a string replace function to replace that bit with an empty string.
Andrew has the right idea, but with Perl-style regex syntaxes (which include Java's built-in regex engine), you can do even better:
str.replaceAll("\\[.*?\\]", "");
(i.e., use the matching expression \[.*?\]. The ? specifies minimal match: so it will finish matching upon the first ] found.)
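The difference between the two patterns, and what a greedy match would do instead, can be seen in a short Ruby sketch. The sample line comes from the question; the second string and the whitespace-trimming variant are assumptions added for illustration.

```ruby
line = "[echo] The SampleProject solution currently has 85% code coverage."

# Negated character class: matches up to the first closing bracket.
with_class = line.gsub(/\[[^\]]*\]/, "")

# Lazy quantifier: same result, also stops at the first closing bracket.
with_lazy = line.gsub(/\[.*?\]/, "")

# Assumed variant: anchor at the start and also consume the whitespace
# around the tag, so no leading space is left behind.
trimmed = line.sub(/\A\s*\[[^\]]*\]\s*/, "")

# A greedy .* runs to the LAST closing bracket and removes too much.
two_tags = "[a] keep [b]"
greedy = two_tags.gsub(/\[.*\]/, "")
lazy   = two_tags.gsub(/\[.*?\]/, "")

puts trimmed
```

Note that both plain replacements leave the space that followed the tag, which is why the anchored variant also consumes surrounding whitespace.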

Resources