How to parse text in Groovy - parsing

I need to parse a text (output from a svn command) in order to retrieve a number (svn revision).
This is my code. Note that I need to retrieve all the output stream as a text to do other operations.
def proc = cmdLine.execute() // Call *execute* on the strin
proc.waitFor() // Wait for the command to finish
def output = proc.in.text
//other stuff happening here
output.eachLine {
line ->
def revisionPrefix = "Last Changed Rev: "
if (line.startsWith(revisionPrefix)) res = new Integer(line.substring(revisionPrefix.length()).trim())
}
This code is working fine, but since I'm still a novice in Groovy, I'm wondering if there were a better idiomatic way to avoid the ugly if...
Example of svn output (but of course the problem is more general)
Path: .
Working Copy Root Path: /svn
URL: svn+ssh://svn.company.com/opt/svnserve/repos/project/trunk
Repository Root: svn+ssh://svn.company.com/opt/svnserve/repos
Repository UUID: 516c549e-805d-4d3d-bafa-98aea39579ae
Revision: 25447
Node Kind: directory
Schedule: normal
Last Changed Author: ubi
Last Changed Rev: 25362
Last Changed Date: 2012-11-22 10:27:00 +0000 (Thu, 22 Nov 2012)
I've got inspiration from the answer below and I solved using find(). My solution is:
def revisionPrefix = "Last Changed Rev: "
def line = output.readLines().find { line -> line.startsWith(revisionPrefix) }
def res = new Integer(line?.substring(revisionPrefix.length())?.trim()?:"0")
3 lines, no if, very clean

One possible alternative is:
def output = cmdLine.execute().text
Integer res = output.readLines().findResult { line ->
(line =~ /^Last Changed Rev: (\d+)$/).with { m ->
if( m.matches() ) {
m[ 0 ][ 1 ] as Integer
}
}
}
Not sure it's better or not. I'm sure others will have different alternatives
Edit:
Also, beware of using proc.text. if your proc outputs a lot of stuff, then you could end up blocking when the inputstream gets full...
Here is a heavily commented alternative, using consumeProcessOutput:
// Run the command
String output = cmdLine.execute().with { proc ->
// Then, with a StringWriter
new StringWriter().with { sw ->
// Consume the output of the process
proc.consumeProcessOutput( sw, System.err )
// Make sure we worked
assert proc.waitFor() == 0
// Return the output (goes into `output` var)
sw.toString()
}
}
// Extract the version from by looking through all the lines
Integer version = output.readLines().findResult { line ->
// Pass the line through a regular expression
(line =~ /Last Changed Rev: (\d+)/).with { m ->
// And if it matches
if( m.matches() ) {
// Return the \d+ part as an Integer
m[ 0 ][ 1 ] as Integer
}
}
}

Related

Unexpected error "incorrect use of identifier" when running DXL code

I have a script which calls functions from this script:
https://www.baselinesinc.com/dxl-repository/?filerepoaction=showpost&filepost_id=13
which is distributed under the LGPL. I have found several references to this script online, indicating that it works. For my own purposes, I removed the part of the code which executes on load (some 20 lines at the bottom of the file) and converted the file to an .inc, to serve as a library. I haven't done any other changes.
My own script has worked for a few months at this point. But now, all of a sudden, when I try to run it, I get these errors:
-E- DXL: <DOORSImporter\Import_From_Excel.inc:291> incorrect use of identifier (rtfValue)
-E- DXL: <DOORSImporter\Import_From_Excel.inc:292> incorrect use of identifier (nonRTFValue)
-E- DXL: <DOORSImporter\Import_From_Excel.inc:346> incorrect use of identifier (temp)
-E- DXL: <DOORSImporter\Import_From_Excel.inc:464> incorrect use of identifier (columnHeading)
-E- DXL: <DOORSImporter\Import_From_Excel.inc:1127> incorrect use of identifier (cellValue)
-I- DXL: All done. Errors reported: 5. Warnings reported: 0.
Below is a snippet of code which shows this error. I marked lines 291 and 292
bool richTextTruncated(int rowNumber, int columnNumber) {
Buffer rtfValue = create
Buffer nonRTFValue = create
rtfValue = ""
nonRTFValue = ""
rtfValue = copyRichTextFromCell(""intToCol(columnNumber) "" rowNumber"") // get the rich text
rtfValue = trimWhitespace(stripRichText(tempStringOf(rtfValue))) // strip off RTF formatting. <----- 291
nonRTFValue = trimWhitespace(getCellValue(rowNumber, columnNumber)) //<----- 292
if(tempStringOf(rtfValue) != tempStringOf(nonRTFValue)) { // if the two strings aren't equal
delete(rtfValue)
delete(nonRTFValue)
return true // return that text was truncated
}
delete(rtfValue)
delete(nonRTFValue)
return false
}
The function trimWhitespace itself is defined as
string trimWhitespace(string s) so it's OK to assign its output to a buffer.
So this is all very confusing because, like I said, this has worked and the code has not been changed (last check in of the file DOORSImporter\Import_From_Excel.inc is from last November, whereas I have used the script even in January.
Our IT department might have updated DOORS recently, I don't know. I checked the latest version of the dxl reference manual for any recent changes which might have caused this, but I couldn't find any change to the way buffers are handled.
To be fair, I have found working with buffers quite confusing. In my own code, I ended up discarding them altogether. However, the script Import_From_Excel has worked unchanged for a long time now.
Any support with this will be greatly appreciated.
********** Edit 27.02.2020 **************
The function copyRichTextFromCell is defined as below. Its sourced is from the link below.
https://www.baselinesinc.com/dxl-repository/?filerepoaction=showpost&filepost_id=10
string copyRichTextFromCell(string loc) {
if(!getCell(loc)) {
return("error");
}
if(!checkResult(oleMethod(objCell, cMethodSelect))) { // selects the cell
return("error");
}
if(!checkResult(oleMethod(objCell, cMethodCopy))) { // copies the contents of the cell to the clipboard
return("error");
}
return stringOf(richClip)
}
********** Edit 28.02.2020 **************
The function trimWhitespace is as below.
string trimWhitespace(string s) {
int first = 0;
int last = length(s) - 1;
while(last > 0 && (isspace(s[last]) || s[last] == '\n' || s[last] == '\r' || s[last] == '\t')) {
last--;
}
while(isspace(s[first]) && first < last) {
first++;
}
if(s[first:last] == " ") {
return("");
}
return(s[first:last]);
}
While playing around, I discovered that if I modify as below, one of the errors disappears. It is beyond me as to why that might happen, because the code is equivalent, as far as I can tell...
bool richTextTruncated(int rowNumber, int columnNumber) {
Buffer rtfValue = create
Buffer nonRTFValue = create
string fooString
rtfValue = ""
nonRTFValue = ""
rtfValue = copyRichTextFromCell(""intToCol(columnNumber) "" rowNumber"") // get the rich text
fooString = trimWhitespace(stripRichText(tempStringOf(rtfValue)))
rtfValue = fooString // strip off RTF formatting.
fooString = trimWhitespace(getCellValue(rowNumber, columnNumber))
nonRTFValue = fooString
if(tempStringOf(rtfValue) != tempStringOf(nonRTFValue)) { // if the two strings aren't equal
delete(rtfValue)
delete(nonRTFValue)
return true // return that text was truncated
}
delete(rtfValue)
delete(nonRTFValue)
return false
}
So the lines with fooString do not throw an error anymore. In the initial version, these lines were just e.g. rtfValue = trimWhitespace(stripRichText(tempStringOf(rtfValue))) and this would throw an error.

Grails config file closure definition

I'm writing a config file for grails app where I want to define redirect patterns.
I've written a config script RedirectMappingsConfig.groovy:
import java.util.regex.Pattern
def c = {pattern, goto, path ->
if (pattern instanceof Pattern && pattern.matcher(path).matches()) {
return goto
}
return false
}
def redirectFromTo = [
c.curry(Pattern.compile('/si/reference.*'), '/enterprise-solutions/references-and-partners#references'),
c.curry(Pattern.compile('/si/kontakt.*'), '/contact-us'),
c.curry(Pattern.compile('/si/zaposlitve.*'), '/careers'),
c.curry(Pattern.compile('/aa'), '/')
]
This list will be read in a filter which will perform redirect if some pattern matches request uri.
Problem: application does not compile, the error is:
Compilation error: startup failed:
RedirectMappingsConfig.groovy: 3: unexpected token: pattern # line 3, column 10.
def c = {pattern, goto, path ->
^
Any idea what is wrong with the syntax?
I'm using grails 2.1.1.
goto is a reserved word in Groovy... Change your closure to:
def c = {pattern, addr, path ->
if (pattern instanceof Pattern && pattern.matcher(path).matches()) {
return addr
}
return false
}
And this error should go away :-)

WLST Ant Task white space within arguments

I am using the WLST Ant task which allows a list of space delimited arguments to be passed in under the arguments attribute.
The issue is when I pass a file directory which contains a space. For instance "Program Files" which becomes two arguments of Program and Files.
Is there any suggestions to get around this?
My suggestion below would only work with one value.
For example append the "Program Files" argument to the end and loop from the known end argument to the actual end of sys.argv.
IE If we want "Program Files" to be the 4th system argument then inside the WLST script we append sys.argv[4],[5]...[end].
Short answer for WLST 11.1.1.9.0: You can't get around this.
I have the same problem and debugged a bit.
My findings:
The class WLSTTask in weblogic-11.1.1.9.jar calls via command line WLSTInterpreterInvoker which parses the args:
private void parseArgs(String[] arg) {
for (int i = 0; i < arg.length; i++) {
this.arguments = (this.arguments + " " + arg[i]);
}
[...]
For reasons I don't know these args are parsed again, before the python script is invoked:
private void executePyScript() {
[...]
if (this.arguments != null) {
String[] args = StringUtils.splitCompletely(this.arguments, " ");
[...]
public static String[] splitCompletely(String paramString1, String paramString2)
{
return splitCompletely(new StringTokenizer(paramString1, paramString2));
}
private static String[] splitCompletely(StringTokenizer paramStringTokenizer) {
int i = paramStringTokenizer.countTokens();
String[] arrayOfString = new String[i];
for (int j = 0; j < i; j++) arrayOfString[j] = paramStringTokenizer.nextToken();
return arrayOfString;
}
Unfortunately the StringTokenizer method does not distinguish quoted strings and so sys.argv in Python gets separate arguments, even if you quote the parameter
There are two possible alternatives:
Replace spaces in Ant with something else (eg %20) and 'decode' them in Python.
Write a property file in Ant and read from that in Pyhton.
The code for executePyScript() in 12.2.1 has changed a lot and it seems that the problem may be gone there (I haven't checked)
if ((this.arguments.indexOf("\"") == -1) && (this.arguments.indexOf("'") == -1))
args = StringUtils.splitCompletely(this.arguments, " ");
else {
args = splitQuotedString(this.arguments);
}

PEG for Python style indentation

How would you write a Parsing Expression Grammar in any of the following Parser Generators (PEG.js, Citrus, Treetop) which can handle Python/Haskell/CoffeScript style indentation:
Examples of a not-yet-existing programming language:
square x =
x * x
cube x =
x * square x
fib n =
if n <= 1
0
else
fib(n - 2) + fib(n - 1) # some cheating allowed here with brackets
Update:
Don't try to write an interpreter for the examples above. I'm only interested in the indentation problem. Another example might be parsing the following:
foo
bar = 1
baz = 2
tap
zap = 3
# should yield (ruby style hashmap):
# {:foo => { :bar => 1, :baz => 2}, :tap => { :zap => 3 } }
Pure PEG cannot parse indentation.
But peg.js can.
I did a quick-and-dirty experiment (being inspired by Ira Baxter's comment about cheating) and wrote a simple tokenizer.
For a more complete solution (a complete parser) please see this question: Parse indentation level with PEG.js
/* Initializations */
{
function start(first, tail) {
var done = [first[1]];
for (var i = 0; i < tail.length; i++) {
done = done.concat(tail[i][1][0])
done.push(tail[i][1][1]);
}
return done;
}
var depths = [0];
function indent(s) {
var depth = s.length;
if (depth == depths[0]) return [];
if (depth > depths[0]) {
depths.unshift(depth);
return ["INDENT"];
}
var dents = [];
while (depth < depths[0]) {
depths.shift();
dents.push("DEDENT");
}
if (depth != depths[0]) dents.push("BADDENT");
return dents;
}
}
/* The real grammar */
start = first:line tail:(newline line)* newline? { return start(first, tail) }
line = depth:indent s:text { return [depth, s] }
indent = s:" "* { return indent(s) }
text = c:[^\n]* { return c.join("") }
newline = "\n" {}
depths is a stack of indentations. indent() gives back an array of indentation tokens and start() unwraps the array to make the parser behave somewhat like a stream.
peg.js produces for the text:
alpha
beta
gamma
delta
epsilon
zeta
eta
theta
iota
these results:
[
"alpha",
"INDENT",
"beta",
"gamma",
"INDENT",
"delta",
"DEDENT",
"DEDENT",
"epsilon",
"INDENT",
"zeta",
"DEDENT",
"BADDENT",
"eta",
"theta",
"INDENT",
"iota",
"DEDENT",
"",
""
]
This tokenizer even catches bad indents.
I think an indentation-sensitive language like that is context-sensitive. I believe PEG can only do context-free langauges.
Note that, while nalply's answer is certainly correct that PEG.js can do it via external state (ie the dreaded global variables), it can be a dangerous path to walk down (worse than the usual problems with global variables). Some rules can initially match (and then run their actions) but parent rules can fail thus causing the action run to be invalid. If external state is changed in such an action, you can end up with invalid state. This is super awful, and could lead to tremors, vomiting, and death. Some issues and solutions to this are in the comments here: https://github.com/dmajda/pegjs/issues/45
So what we are really doing here with indentation is creating something like a C-style blocks which often have their own lexical scope. If I were writing a compiler for a language like that I think I would try and have the lexer keep track of the indentation. Every time the indentation increases it could insert a '{' token. Likewise every time it decreases it could inset an '}' token. Then writing an expression grammar with explicit curly braces to represent lexical scope becomes more straight forward.
You can do this in Treetop by using semantic predicates. In this case you need a semantic predicate that detects closing a white-space indented block due to the occurrence of another line that has the same or lesser indentation. The predicate must count the indentation from the opening line, and return true (block closed) if the current line's indentation has finished at the same or shorter length. Because the closing condition is context-dependent, it must not be memoized.
Here's the example code I'm about to add to Treetop's documentation. Note that I've overridden Treetop's SyntaxNode inspect method to make it easier to visualise the result.
grammar IndentedBlocks
rule top
# Initialise the indent stack with a sentinel:
&{|s| #indents = [-1] }
nested_blocks
{
def inspect
nested_blocks.inspect
end
}
end
rule nested_blocks
(
# Do not try to extract this semantic predicate into a new rule.
# It will be memo-ized incorrectly because #indents.last will change.
!{|s|
# Peek at the following indentation:
save = index; i = _nt_indentation; index = save
# We're closing if the indentation is less or the same as our enclosing block's:
closing = i.text_value.length <= #indents.last
}
block
)*
{
def inspect
elements.map{|e| e.block.inspect}*"\n"
end
}
end
rule block
indented_line # The block's opening line
&{|s| # Push the indent level to the stack
level = s[0].indentation.text_value.length
#indents << level
true
}
nested_blocks # Parse any nested blocks
&{|s| # Pop the indent stack
# Note that under no circumstances should "nested_blocks" fail, or the stack will be mis-aligned
#indents.pop
true
}
{
def inspect
indented_line.inspect +
(nested_blocks.elements.size > 0 ? (
"\n{\n" +
nested_blocks.elements.map { |content|
content.block.inspect+"\n"
}*'' +
"}"
)
: "")
end
}
end
rule indented_line
indentation text:((!"\n" .)*) "\n"
{
def inspect
text.text_value
end
}
end
rule indentation
' '*
end
end
Here's a little test driver program so you can try it easily:
require 'polyglot'
require 'treetop'
require 'indented_blocks'
parser = IndentedBlocksParser.new
input = <<END
def foo
here is some indented text
here it's further indented
and here the same
but here it's further again
and some more like that
before going back to here
down again
back twice
and start from the beginning again
with only a small block this time
END
parse_tree = parser.parse input
p parse_tree
I know this is an old thread, but I just wanted to add some PEGjs code to the answers. This code will parse a piece of text and "nest" it into a sort of "AST-ish" structure. It only goes one deep and it looks ugly, furthermore it does not really use the return values to create the right structure but keeps an in-memory tree of your syntax and it will return that at the end. This might well become unwieldy and cause some performance issues, but at least it does what it's supposed to.
Note: Make sure you have tabs instead of spaces!
{
var indentStack = [],
rootScope = {
value: "PROGRAM",
values: [],
scopes: []
};
function addToRootScope(text) {
// Here we wiggle with the form and append the new
// scope to the rootScope.
if (!text) return;
if (indentStack.length === 0) {
rootScope.scopes.unshift({
text: text,
statements: []
});
}
else {
rootScope.scopes[0].statements.push(text);
}
}
}
/* Add some grammar */
start
= lines: (line EOL+)*
{
return rootScope;
}
line
= line: (samedent t:text { addToRootScope(t); }) &EOL
/ line: (indent t:text { addToRootScope(t); }) &EOL
/ line: (dedent t:text { addToRootScope(t); }) &EOL
/ line: [ \t]* &EOL
/ EOF
samedent
= i:[\t]* &{ return i.length === indentStack.length; }
{
console.log("s:", i.length, " level:", indentStack.length);
}
indent
= i:[\t]+ &{ return i.length > indentStack.length; }
{
indentStack.push("");
console.log("i:", i.length, " level:", indentStack.length);
}
dedent
= i:[\t]* &{ return i.length < indentStack.length; }
{
for (var j = 0; j < i.length + 1; j++) {
indentStack.pop();
}
console.log("d:", i.length + 1, " level:", indentStack.length);
}
text
= numbers: number+ { return numbers.join(""); }
/ txt: character+ { return txt.join(""); }
number
= $[0-9]
character
= $[ a-zA-Z->+]
__
= [ ]+
_
= [ ]*
EOF
= !.
EOL
= "\r\n"
/ "\n"
/ "\r"

How to find functions in a cpp file that contain a specific word

using grep, vim's grep, or another unix shell command, I'd like to find the functions in a large cpp file that contain a specific word in their body.
In the files that I'm working with the word I'm looking for is on an indented line, the corresponding function header is the first line above the indented line that starts at position 0 and is not a '{'.
For example searching for JOHN_DOE in the following code snippet
int foo ( int arg1 )
{
/// code
}
void bar ( std::string arg2 )
{
/// code
aFunctionCall( JOHN_DOE );
/// more code
}
should give me
void bar ( std::string arg2 )
The algorithm that I hope to catch in grep/vim/unix shell scripts would probably best use the indentation and formatting assumptions, rather than attempting to parse C/C++.
Thanks for your suggestions.
I'll probably get voted down for this!
I am an avid (G)VIM user but when I want to review or understand some code I use Source Insight. I almost never use it as an actual editor though.
It does exactly what you want in this case, e.g. show all the functions/methods that use some highlighted data type/define/constant/etc... in a relations window...
(source: sourceinsight.com)
Ouch! There goes my rep.
As far as I know, this can't be done. Here's why:
First, you have to search across lines. No problem, in vim adding a _ to a character class tells it to include new lines. so {_.*} would match everything between those brackets across multiple lines.
So now you need to match whatever the pattern is for a function header(brittle even if you get it to work), then , and here's the problem, whatever lines are between it and your search string, and finally match your search string. So you might have a regex like
/^\(void \+\a\+ *(.*)\)\_.*JOHN_DOE
But what happens is the first time vim finds a function header, it starts matching. It then matches every character until it finds JOHN_DOE. Which includes all the function headers in the file.
So the problem is that, as far as I know, there's no way to tell vim to match every character except for this regex pattern. And even if there was, a regex is not the tool for this job. It's like opening a beer with a hammer. What we should do is write a simple script that gives you this info, and I have.
fun! FindMyFunction(searchPattern, funcPattern)
call search(a:searchPattern)
let lineNumber = line(".")
let lineNumber = lineNumber - 1
"call setpos(".", [0, lineNumber, 0, 0])
let lineString = getline(lineNumber)
while lineString !~ a:funcPattern
let lineNumber = lineNumber - 1
if lineNumber < 0
echo "Function not found :/"
endif
let lineString = getline(lineNumber)
endwhile
echo lineString
endfunction
That should give you the result you want and it's way easier to share, debug, and repurpose than a regular expression spit from the mouth of Cthulhu himself.
Tough call, although as a starting point I would suggest this wonderful VIM Regex Tutorial.
You cannot do that reliably with a regular expression, because code is not a regular language. You need a real parser for the language in question.
Arggh! I admit this is a bit over the top:
A little program to filter stdin, strip comments, and put function bodies on the same line. It'll get fooled by namespaces and function definitions inside class declarations, besides other things. But it might be a good start:
#include <stdio.h>
#include <assert.h>
int main() {
enum {
NORMAL,
LINE_COMMENT,
MULTI_COMMENT,
IN_STRING,
} state = NORMAL;
unsigned depth = 0;
for(char c=getchar(),prev=0; !feof(stdin); prev=c,c=getchar()) {
switch(state) {
case NORMAL:
if('/'==c && '/'==prev)
state = LINE_COMMENT;
else if('*'==c && '/'==prev)
state = MULTI_COMMENT;
else if('#'==c)
state = LINE_COMMENT;
else if('\"'==c) {
state = IN_STRING;
putchar(c);
} else {
if(('}'==c && !--depth) || (';'==c && !depth)) {
putchar(c);
putchar('\n');
} else {
if('{'==c)
depth++;
else if('/'==prev && NORMAL==state)
putchar(prev);
else if('\t'==c)
c = ' ';
if(' '==c && ' '!=prev)
putchar(c);
else if(' '<c && '/'!=c)
putchar(c);
}
}
break;
case LINE_COMMENT:
if(' '>c)
state = NORMAL;
break;
case MULTI_COMMENT:
if('/'==c && '*'==prev) {
c = '\0';
state = NORMAL;
}
break;
case IN_STRING:
if('\"'==c && '\\'!=prev)
state = NORMAL;
putchar(c);
break;
default:
assert(!"bug");
}
}
putchar('\n');
return 0;
}
Its c++, so just it in a file, compile it to a file named 'stripper', and then:
cat my_source.cpp | ./stripper | grep JOHN_DOE
So consider the input:
int foo ( int arg1 )
{
/// code
}
void bar ( std::string arg2 )
{
/// code
aFunctionCall( JOHN_DOE );
/// more code
}
The output of "cat example.cpp | ./stripper" is:
int foo ( int arg1 ) { }
void bar ( std::string arg2 ){ aFunctionCall( JOHN_DOE ); }
The output of "cat example.cpp | ./stripper | grep JOHN_DOE" is:
void bar ( std::string arg2 ){ aFunctionCall( JOHN_DOE ); }
The job of finding the function name (guess its the last identifier to precede a "(") is left as an exercise to the reader.
For that kind of stuff, although it comes to primitive searching again, I would recommend compview plugin. It will open up a search window, so you can see the entire line where the search occured and automatically jump to it. Gives a nice overview.
(source: axisym3.net)
Like Robert said Regex will help. In command mode start a regex search by typing the "/" character followed by your regex.
Ctags1 may also be of use to you. It can generate a tag file for a project. This tag file allows a user to jump directly from a function call to it's definition even if it's in another file using "CTRL+]".
u can use grep -r -n -H JOHN_DOE * it will look for "JOHN_DOE" in the files recursively starting from the current directory
you can use the following code to practically find the function which contains the text expression:
public void findFunction(File file, String expression) {
Reader r = null;
try {
r = new FileReader(file);
} catch (FileNotFoundException ex) {
ex.printStackTrace();
}
BufferedReader br = new BufferedReader(r);
String match = "";
String lineWithNameOfFunction = "";
Boolean matchFound = false;
try {
while(br.read() > 0) {
match = br.readLine();
if((match.endsWith(") {")) ||
(match.endsWith("){")) ||
(match.endsWith("()")) ||
(match.endsWith(")")) ||
(match.endsWith("( )"))) {
// this here is because i guessed that method will start
// at the 0
if((match.charAt(0)!=' ') && !(match.startsWith("\t"))) {
lineWithNameOfFunction = match;
}
}
if(match.contains(expression)) {
matchFound = true;
break;
}
}
if(matchFound)
System.out.println(lineWithNameOfFunction);
else
System.out.println("No matching function found");
} catch (IOException ex) {
ex.printStackTrace();
}
}
i wrote this in JAVA, tested it and works like a charm. has few drawbacks though, but for starters it's fine. didn't add support for multiple functions containing same expression and maybe some other things. try it.

Resources