Adding numbers in a dynamic string separated by some token in a Kusto table - parsing

Suppose there is a table like below:
datatable(str:string) [
"a,b,2,10,d,e;a,b,c,14,d,e;a,b,c,10,d,e",
"a,b,c,11,d,e;a,b,c,12,d,e;a,b,c,13,d,e;a,b,c,10,d,e",
"a,b,c,20,d,e;a,b,c,25,d,e",
]
I need to sum the 4th comma-separated value from each semicolon-delimited segment,
e.g. the answer for the above table is:
10+14+10=34
11+12+13+10=46
20+25=45
I tried the below, which works for a single row:
let calculateCostForARow = (str:string) {
print row = split(str,";")
| mv-expand row
| parse row with * "," * "," * "," cost:long "," *
| summarize sum(cost)
};
calculateCostForARow("a,b,c,11,d,e;a,b,c,12,d,e;a,b,c,13,d,e;a,b,c,10,d,e")
but it doesn't work for a table, as toscalar() can't be applied at the row level (the for-each-row scenario):
let calculateCostForARow = (str:string) {
toscalar(print row = split(str,";")
| mv-expand row
| parse row with * "," * "," * "," cost:long "," *
| summarize sum(cost))
};
datatable(str:string) [
"a,b,c,10,d,e;a,b,c,10,d,e;a,b,c,10,d,e",
"a,b,c,10,d,e;a,b,c,10,d,e;a,b,c,10,d,e;a,b,c,10,d,e",
"a,b,c,10,d,e;a,b,c,10,d,e",
]
| project calculateCostForARow(str)
Are there other ways to do this?

You could try this, using mv-apply:
datatable(str:string) [
"a,b,2,10,d,e;a,b,c,14,d,e;a,b,c,10,d,e",
"a,b,c,11,d,e;a,b,c,12,d,e;a,b,c,13,d,e;a,b,c,10,d,e",
"a,b,c,20,d,e;a,b,c,25,d,e",
]
| mv-apply s = split(str, ";") on (
summarize result = sum(tolong(split(s, ",", 3)[0]))
)
str                                                  result
a,b,2,10,d,e;a,b,c,14,d,e;a,b,c,10,d,e               34
a,b,c,11,d,e;a,b,c,12,d,e;a,b,c,13,d,e;a,b,c,10,d,e  46
a,b,c,20,d,e;a,b,c,25,d,e                            45
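Note that split(s, ",", 3) returns a single-element array holding only the item at zero-based index 3, so [0] pulls out the 4th comma-separated value. If you'd rather keep the parse-based extraction from your original function, the same shape should also work inside mv-apply (an untested sketch against the same datatable; parse needs a string input, hence the tostring() on the expanded element):
| mv-apply s = split(str, ";") on (
    // pull the 4th comma-separated field from each segment, then sum
    parse tostring(s) with * "," * "," * "," cost:long "," *
    | summarize result = sum(cost)
)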

Related

Need DXL code to arrange attribute lines into table (converting DOORS data to LaTeX source)

I have a DXL script which parses all data in DOORS columns into a LaTeX-compatible text source file. What I can't figure out is how to re-order some data into a tabular-compatible format. The attributes in question are DXL links to a reference DOORS module, so there is one line (separated by a line feed) per link in each cell. Currently I loop through all columns for each object (row) with the code snippet below (part of the full script):
for col in doorsModule do {
var_name = title( col )
if( ! main( col ) && search( regexp "Absolute Number", var_name, 0 ) == false )
{
// oss is my output stream variable
if ( length(text(col, obj) ) > 0 )
{
oss << "\\textbf{";
oss << var_name; // still the column title here
oss << "}\t";
var_name = text( col, obj );
oss << var_name;
oss << "\n\n";
c++;
}
}
}
Examples of the contents of a cell, where I have separately parsed the column name to bold and collected it prior to collecting the cell contents. Each bolded name below is followed by the four lines that make up a single cell:
\textbf{LinkedItemName}
DISTANCE
MinSpeed
MaxSpeed
Time
\textbf{Unit}
m
km/h
km/h
minutes
\textbf{Driver1}
100
30
80
20
\textbf{Driver2}
50
20
60
10
\textbf{Driver3}
60
30
60
30
What I want to do is re-arrange the data so that I can write the source code for a table, to wit:
\textbf{LinkedItemName} & \textbf{Unit} & \textbf{Driver1} & \textbf{Driver2} & \textbf{Driver3} \\
DISTANCE & m & 100 & 50 & 60 \\
MinSpeed & km/h & 30 & 20 & 30 \\
MaxSpeed & km/h & 80 & 60 & 60 \\
Time & minutes & 20 & 10 & 30 \\
I know in advance the exact attribute names I'm "collecting". I can't figure out how to manipulate the data returned from each cell (regex or otherwise) to create my desired final output. I'm guessing some regex code (in DXL) might be able to assign the contents of each line within a cell to a series of variables, but I don't quite see how.
A combination of regex and string assembly seems to work. Here's a sample bit of code (some of which is straight from the DOORS DXL Reference Manual):
int idx = 0
Array thewords = create(1,1)
Array thelen = create(1,1)
Regexp getaline = regexp2 ".*"
// matches any character except newline
string txt1 = "line 1\nline two\nline three\n"
// 3 line string
while (!null txt1 && getaline txt1) {
int ilen = length(txt1[match 0])
print "ilen is " ilen "\n"
put(thelen, ilen, idx, 0)
putString(thewords,txt1[match 0],0,idx)
idx ++
// match 0 is whole of match
txt1 = txt1[end 0 + 2:] // move past newline
}
int jj
// initialize to simplify adding the "&"
int lenone = (int get(thelen,0,0) )
string foo = (string get(thewords, 0, 0,lenone ) )
int lenout
for (jj = 1; jj < idx; jj++) {
lenout = (int get(thelen,jj,0) )
foo = foo "&" (string get(thewords, 0, jj,lenout ) )
}
foo = foo "\\\\"
// foo is now "line 1&line two&line three\\ " (without quotes) as LaTeX wants
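From there, getting to the transposed LaTeX rows is a matter of repeating that per attribute and then joining element jj of each attribute's array. A hypothetical sketch reusing the same get() calls (names, units, driver1 and the matching length arrays namelen, unitlen, d1len are placeholder Arrays assumed to have been filled by loops like the one above):
string row = ""
int jj
for (jj = 0; jj < idx; jj++) {
// pull element jj out of each attribute's array and join with " & "
row = (string get(names, 0, jj, (int get(namelen, jj, 0))))
row = row " & " (string get(units, 0, jj, (int get(unitlen, jj, 0))))
row = row " & " (string get(driver1, 0, jj, (int get(d1len, jj, 0))))
// ...repeat for the remaining driver arrays...
row = row " \\\\"
oss << row
oss << "\n"
}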

How to handle text tables with FastParse?

I have a text file with a single-row table (tab-separated) and I need to parse it to receive Map("one" -> 1, "two" -> 2, "three" -> 3). I can't figure out how to do it and am not even sure it is possible at all. Any ideas?
one two three
1 2 3
OK, I've figured out how to do it myself.
// imports needed for this snippet to compile
import scala.io.Source
import fastparse._
import NoWhitespace._

val lines = Source.fromResource("test.txt").getLines().mkString("\r\n")
def sentence[_: P] = P(CharIn("0-9", "a-z").rep(1).!)
def tableHeader[_: P] = P((sentence.! ~ "\t".?).rep ~ lineSeparator)
def tableRow[_: P](h: Seq[String]) = P((sentence.! ~ "\t".?).rep ~ (lineSeparator | End))
.map(r => println(h.zip(r).toMap))
def singleRowTable[_: P] = P(tableHeader.flatMap(tableRow))
def lineSeparator[_: P] = P("\r\n" | "\r" | "\n")
def parseA[_: P] = P(singleRowTable)
parse(lines, parseA(_), true) match {
case Parsed.Success(value, successIndex) =>
println("Success value=" + value +" successIndex=" + successIndex)
case f @ Parsed.Failure(label, index, extra) =>
println("Failure " + f.trace(true))
}
It will print
Map(one -> 1, two -> 2, three -> 3)
Success value=() successIndex=20
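Note that value=() because the tableRow action prints the Map and returns Unit. If you'd rather have the Map as the parse result, return it from the action instead, e.g. (a small sketch of the adjusted rule):
// return the zipped Map so Parsed.Success carries it as its value
def tableRow[_: P](h: Seq[String]) =
  P((sentence.! ~ "\t".?).rep ~ (lineSeparator | End))
    .map(r => h.zip(r).toMap)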

How to parse only comments using pegjs grammar?

I've written a pegjs grammar that is supposed to parse any kind of js/c-style comments. However, it's not quite working: I've only managed to match a lone comment, and the parse fails as soon as anything else surrounds it. How should I alter this grammar to parse only the comments out of any kind of input?
Grammar:
Start
= Comment
Character
= .
Comment
= MultiLineComment
/ SingleLineComment
LineTerminator
= [\n\r\u2028\u2029]
MultiLineComment
= "/*" (!"*/" Character)* "*/"
MultiLineCommentNoLineTerminator
= "/*" (!("*/" / LineTerminator) Character)* "*/"
SingleLineComment
= "//" (!LineTerminator Character)*
Input:
/**
* Trending Content
* Returns visible videos that have the largest view percentage increase over
* the time period.
*/
Other text here
Error
Line 5, column 4: Expected end of input but "\n" found.
You need to refactor to specifically capture the line content before you consider the comment (either single- or multi-line), as in:
lines = result:line* {
return result
}
line = WS* line:$( !'//' CHAR )* single_comment ( EOL / EOF ) { // single-comment line
return line.replace(/^\s+|\s+$/g,'')
}
/ WS* line:$( !'/*' CHAR )* multi_comment ( EOL / EOF ) { // multi-comment line
return line.replace(/^\s+|\s+$/g,'')
}
/ WS* line:$CHAR+ ( EOL / EOF ) { // non-blank line
return line.replace(/^\s+|\s+$/g,'')
}
/ WS* EOL { // blank line
return ''
}
single_comment = WS* '//' CHAR* WS*
multi_comment = WS* '/*' ( !'*/' ( CHAR / EOL ) )* '*/' WS*
CHAR = [^\n]
WS = [ \t]
EOF = !.
EOL = '\n'
which, when run against:
no comment here
single line comment // single-comment HERE
test of multi line comment /*
multi-comment HERE
*/
last line
returns:
[
"no comment here",
"",
"single line comment",
"",
"test of multi line comment",
"",
"last line"
]
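If you want the opposite output, i.e. the comments themselves rather than the text around them, the same line-by-line structure works with the captures swapped. A hypothetical variant (untested) that collects each comment and drops everything else:
lines = result:line* { return result.filter(function (c) { return c !== '' }) }
line = WS* ( !'//' CHAR )* c:$single_comment ( EOL / EOF ) { return c.trim() }
     / WS* ( !'/*' CHAR )* c:$multi_comment ( EOL / EOF ) { return c.trim() }
     / WS* CHAR+ ( EOL / EOF ) { return '' }
     / WS* EOL { return '' }
with single_comment, multi_comment, CHAR, WS, EOL and EOF as defined above.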

Parsing a TeX-like language with lpeg

I am struggling to get my head around LPEG. I have managed to produce one grammar which does what I want, but I have been beating my head against this one and not getting far. The idea is to parse a document which is a simplified form of TeX. I want to split a document into:
Environments, which are \begin{cmd} and \end{cmd} pairs.
Commands which can either take an argument like so: \foo{bar} or can be bare: \foo.
Both environments and commands can have parameters like so: \command[color=green,background=blue]{content}.
Other stuff.
I also would like to keep track of line number information for error handling purposes. Here's what I have so far:
lpeg = require("lpeg")
lpeg.locale(lpeg)
-- Assume a lot of "X = lpeg.X" here.
-- Line number handling from http://lua-users.org/lists/lua-l/2011-05/msg00607.html
-- with additional print statements to check they are working.
local newline = P"\r"^-1 * "\n" / function (a) print("New"); end
local incrementline = Cg( Cb"linenum" )/ function ( a ) print("NL"); return a + 1 end , "linenum"
local setup = Cg ( Cc ( 1) , "linenum" )
nl = newline * incrementline
space = nl + lpeg.space
-- Taken from "Name-value lists" in http://www.inf.puc-rio.br/~roberto/lpeg/
local identifier = (R("AZ") + R("az") + P("_") + R("09"))^1
local sep = lpeg.S(",;") * space^0
local value = (1-lpeg.S(",;]"))^1
local pair = lpeg.Cg(C(identifier) * space ^0 * "=" * space ^0 * C(value)) * sep^-1
local list = lpeg.Cf(lpeg.Ct("") * pair^0, rawset)
local parameters = (P("[") * list * P("]")) ^-1
-- And the rest is mine
anything = C( (space^1 + (1-lpeg.S("\\{}")) )^1) * Cb("linenum") / function (a,b) return { text = a, line = b } end
begin_environment = P("\\begin") * Ct(parameters) * P("{") * Cg(identifier, "environment") * Cb("environment") * P("}") / function (a,b) return { params = a[1], environment = b } end
end_environment = P("\\end{") * Cg(identifier) * P("}")
texlike = lpeg.P{
"document";
document = setup * V("stuff") * -1,
stuff = Cg(V"environment" + anything + V"bracketed_stuff" + V"command_with" + V"command_without")^0,
bracketed_stuff = P"{" * V"stuff" * P"}" / function (a) return a end,
command_with =((P("\\") * Cg(identifier) * Ct(parameters) * Ct(V"bracketed_stuff"))-P("\\end{")) / function (i,p,n) return { command = i, parameters = p, nodes = n } end,
command_without = (( P("\\") * Cg(identifier) * Ct(parameters) )-P("\\end{")) / function (i,p) return { command = i, parameters = p } end,
environment = Cg(begin_environment * Ct(V("stuff")) * end_environment) / function (b,stuff, e) return { b = b, stuff = stuff, e = e} end
}
It almost works!
> texlike:match("\\foo[one=two]thing\\bar")
{
command = "foo",
parameters = {
{
one = "two",
},
},
}
{
line = 1,
text = "thing",
}
{
command = "bar",
parameters = {
},
}
But! First, I can't get the line number handling part to work at all. The function within incrementline is never fired.
I also can't quite work out how nested capture information is passed to handling functions (which is why I have scattered Cg, C and Ct semi-randomly over the grammar). This means that only one item is returned from within a command_with:
> texlike:match("\\foo{text \\command moretext}")
{
command = "foo",
nodes = {
{
line = 1,
text = "text ",
},
},
parameters = {
},
}
I would also love to be able to check that the environment start and ends match up but when I tried to do so, my back references from "begin" were not in scope by the time I got to "end". I don't know where to go from here.
Late answer but hopefully it'll offer some insight if you're still looking for a solution or wondering what the problem was.
There are a couple of issues with your grammar, some of which can be tricky to spot.
Your line increment here looks incorrect:
local incrementline = Cg( Cb"linenum" ) /
function ( a ) print("NL"); return a + 1 end,
"linenum"
It looks like you meant to create a named capture group and not an anonymous group. The backcapture linenum is essentially being used like a variable. The problem is that, because this is inside an anonymous capture, linenum will not update properly -- function(a) will always receive 1 when called. You need to move the closing ) to the end so "linenum" is included:
local incrementline = Cg( Cb"linenum" /
function ( a ) print("NL"); return a + 1 end,
"linenum")
Relevant LPeg documentation for Cg capture.
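As a quick standalone check (a hypothetical probe, not part of the original grammar) that the fixed rule now threads the counter through:
local lpeg = require("lpeg")
local P, Cg, Cb, Cc = lpeg.P, lpeg.Cg, lpeg.Cb, lpeg.Cc
local setup = Cg(Cc(1), "linenum")
local incrementline = Cg(Cb"linenum" / function (n) return n + 1 end, "linenum")
local nl = P"\r"^-1 * P"\n" * incrementline
-- consume anything, bumping the counter at each newline, then read it back
local probe = setup * (nl + 1)^0 * Cb"linenum" * -1
print(probe:match("a\nb\nc")) --> 3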
The second problem is with your anything non-terminal rule:
anything = C( (space^1 + (1-lpeg.S("\\{}")) )^1) * Cb("linenum") ...
There are several things to be careful of here. First, a named Cg capture (from the incrementline rule once it's fixed) doesn't produce anything unless it's in a table or you backref it. The second major thing is that it has an ad hoc scope like a variable. More precisely, its scope ends once you close it in an outer capture -- like what you're doing here:
C( (space^1 + (...) )^1)
Which means by the time you reference its backcapture with * Cb("linenum"), that's already too late -- the linenum you really want already closed its scope.
I always found LPeg's re syntax a bit easier to grok so I've rewritten the grammar with that instead:
local grammar_cb =
{
fold = pairfold,
resetlinenum = resetlinenum,
incrementlinenum = incrementlinenum, getlinenum = getlinenum,
error = error
}
local texlike_grammar = re.compile(
[[
document <- '' -> resetlinenum {| docpiece* |} !.
docpiece <- {| envcmd |} / {| cmd |} / multiline
beginslash <- cmdslash 'begin'
endslash <- cmdslash 'end'
envcmd <- beginslash paramblock? {:beginenv: envblock :} (!endslash docpiece)*
endslash openbrace {:endenv: =beginenv :} closebrace / &beginslash {} -> error .
envblock <- openbrace key closebrace
cmd <- cmdslash {:command: identifier :} (paramblock? cmdblock)?
cmdblock <- openbrace {:nodes: {| docpiece* |} :} closebrace
paramblock <- opensq ( {:parameters: {| parampairs |} -> fold :} / whitesp) closesq
parampairs <- parampair (sep parampair)*
parampair <- key assign value
key <- whitesp { identifier }
value <- whitesp { [^],;%s]+ }
multiline <- (nl? text)+
text <- {| {:text: (!cmd !closebrace !%nl [_%w%p%s])+ :} {:line: '' -> getlinenum :} |}
identifier <- [_%w]+
cmdslash <- whitesp '\'
assign <- whitesp '='
sep <- whitesp ','
openbrace <- whitesp '{'
closebrace <- whitesp '}'
opensq <- whitesp '['
closesq <- whitesp ']'
nl <- {%nl+} -> incrementlinenum
whitesp <- (nl / %s)*
]], grammar_cb)
The callback functions are straightforwardly defined as:
local function pairfold(...)
local t, kv = {}, ...
if #kv % 2 == 1 then return ... end
for i = #kv, 2, -2 do
t[ kv[i - 1] ] = kv[i]
end
return t
end
local incrementlinenum, getlinenum, resetlinenum do
local line = 1
function incrementlinenum(nl)
assert(not nl:match "%S")
line = line + #nl
end
function getlinenum() return line end
function resetlinenum() line = 1 end
end
Testing the grammar with a non-trivial TeX-like string spanning multiple lines:
local test1 = [[\foo{text \bar[color = red, background = black]{
moretext \baz{
even
more text} }
this time skipping multiple
lines even, such wow!}]]
This produces the following AST in Lua table format:
{
command = "foo",
nodes = {
{
text = "text",
line = 1
},
{
parameters = {
color = "red",
background = "black"
},
command = "bar",
nodes = {
{
text = " moretext",
line = 2
},
{
command = "baz",
nodes = {
{
text = "even ",
line = 3
},
{
text = "more text",
line = 4
}
}
}
}
},
{
text = "this time skipping multiple",
line = 7
},
{
text = "lines even, such wow!",
line = 9
}
}
}
And a second test for begin/end environments:
local test2 = [[\begin[p1
=apple,
p2=blue]{scope} scope foobar
\end{scope} global foobar]]
Which seems to give approximately what you're looking for:
{
{
{
text = " scope foobar",
line = 3
},
parameters = {
p1 = "apple",
p2 = "blue"
},
beginenv = "scope",
endenv = "scope"
},
{
text = " global foobar",
line = 4
}
}
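And because {:endenv: =beginenv :} back-references the environment name captured at \begin, a mismatched pair falls through to the error callback. A quick hypothetical check:
local ok, err = pcall(function ()
    return texlike_grammar:match [[\begin{scope} text \end{other}]]
end)
print(ok, err) -- false, followed by the input position passed to error()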

Xtext grammar error "Decision can match input ... using multiple alternatives: 1, 3, 4, 5"

I got stuck with my Xtext grammar definition. Basically I'd like to define multiple parameters for a component. The component should contain at least one parameter definition: paramA OR paramB OR paramC OR (paramA AND paramB) OR (paramA AND paramC) OR (paramB AND paramC) OR (paramA AND paramB AND paramC).
Overall these are seven cases, as you can see in my grammar definition:
Component:
'Define available parameters:' (
(newParamA = ParamA | newParamB = ParamB | newParamC = ParamC)
| (newParamA = ParamA & newParamB = ParamB)
| (newParamA = ParamA & newParamC = ParamC)
| (newParamB = ParamB & newParamC = ParamC)
| (newParamA = ParamA & newParamB = ParamB & newParamC = ParamC)
)
;
ParamA: ('paramA = ' paramA=Integer ';');
ParamB: ('paramB = ' paramB=Integer ';');
ParamC: ('paramC = ' paramC=Integer ';');
// Datatype
Integer returns ecore::EIntegerObject: '-'? INT;
Here is what works when I reduce my grammar to (newParamA = ParamA | newParamB = ParamB | newParamC = ParamC) only, i.e. without the other cases from the first code snippet:
Define available parameters:
paramA = 1;
...
Define available parameters:
paramB = 2;
...
Define available parameters:
paramC = 3;
But I'd like to be able to define multiple available params in my DSL, e.g.
Define available parameters:
paramA = 1; paramB = 2;
...
Define available parameters:
paramB = 2; paramC = 3;
...
Define available parameters:
paramA = 1; paramB = 2; paramC = 3;
Any idea how to resolve this issue? I'd appreciate any help!
This is the error I get when generating the grammar from code snippet #1:
warning(200): ../my.packagename/src-gen/my/packagename/projectname/parser/antlr/internal/InternalMyDSL.g:722:1: Decision can match input such as "'paramC = ' '-' RULE_INT ';'" using multiple alternatives: 1, 3, 4, 5
As a result, alternative(s) 3,5,4 were disabled for that input
Semantic predicates were present but were hidden by actions.
...
4514 [main] ERROR enerator.CompositeGeneratorFragment - java.io.FileNotFoundException: ..\my.packagename.ui\src-gen\my\packagename\projectname\ui\contentassist\antlr\internal\InternalMyDSLParser.java (The system cannot find the file specified)
org.eclipse.emf.common.util.WrappedException: java.io.FileNotFoundException: ..\my.packagename.ui\src-gen\my\packagename\projectname\ui\contentassist\antlr\internal\InternalMyDSLParser.java (The system cannot find the file specified)
at org.eclipse.xtext.util.Files.readFileIntoString(Files.java:129)
at org.eclipse.xtext.generator.parser.antlr.AbstractAntlrGeneratorFragment.simplifyUnorderedGroupPredicates(AbstractAntlrGeneratorFragment.java:130)
at org.eclipse.xtext.generator.parser.antlr.AbstractAntlrGeneratorFragment.simplifyUnorderedGroupPredicatesIfRequired(AbstractAntlrGeneratorFragment.java:118)
at org.eclipse.xtext.generator.parser.antlr.XtextAntlrUiGeneratorFragment.generate(XtextAntlrUiGeneratorFragment.java:86)
Here is a workaround I've tried (which works), but it's not a real solution because the keywords within the language have to change to avoid the parser error:
('newParamA1 = ' paramA1=Integer ';')
| ('newParamB1 = ' paramB1=Integer ';')
| ('newParamC1 = ' paramC1=Integer ';')
| (('newParamA2 = ' paramA2=Integer ';') & ('newParamB2 = ' paramB2=Integer ';'))
| (('newParamA3 = ' paramA3=Integer ';') & ('newParamC2 = ' paramC2=Integer ';'))
| (('newParamB3 = ' paramB3=Integer ';') & ('newParamC3 = ' paramC3=Integer ';'))
| (('newParamA4 = ' paramA4=Integer ';') & ('newParamB4 = ' paramB4=Integer ';') & ('newParamC4 = ' paramC4=Integer ';'))
I think what you really want is a validation that ensures that at least one parameter is given on the semantic level rather than on the syntactic level. This will greatly simplify your grammar, e.g. you could just use
(newParamA = ParamA)? & (newParamB = ParamB)? & (newParamC = ParamC)?
(parentheses added for clarity)
Also note that it's generally a good idea to avoid spaces in keywords. You should prefer 'paramA' '=' over 'paramA ='. This will greatly improve the error handling in the lexer / parser.
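Putting both suggestions together, a sketch of the simplified grammar (keeping the rule names from the question, with keywords split so none contain spaces):
Component:
    'Define' 'available' 'parameters' ':' (
        (newParamA = ParamA)? & (newParamB = ParamB)? & (newParamC = ParamC)?
    )
;
ParamA: 'paramA' '=' paramA=Integer ';';
ParamB: 'paramB' '=' paramB=Integer ';';
ParamC: 'paramC' '=' paramC=Integer ';';
With this, the parameters parse in any combination and any order, and the at-least-one check moves into a validator, as shown in the next answer.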
What you want to do is something like this:
You want a simple grammar (as Sebastian described it):
(newParamA = ParamA)? & (newParamB = ParamB)? & (newParamC = ParamC)?
To make sure that at least one parameter is required, you can write your own validator, which could look like this:
class MyDSLValidator extends AbstractMyDSLValidator {
@Check
def void atLeastOneParameter(Component component) {
if (component.newParamA == null && component.newParamB == null && component.newParamC == null) {
error('requires at least one parameter definition', MyDSLPackage.Literals.COMPONENT__NEW_PARAM_A);
}
}
}
