I have some data in a file and need to print it in a specific output format.
Example content to parse:
012231-33339411.sxz.ree.fg*-*
U2FsdGVkX1+1pfXeR/h4u6P/BrItX75L0wHVIka4yA6tqS9a5CFUWvLu1AB4x2m8NpmJ>fyoXdADqlWDiGWi6Pw1a8NgNDbdTOlMtGBz4FCi8n97UdVQX9f0a2u9d5l7lOCxVDDzd>wJXbi9x4O+Dmo/lm9DbWAjBGKwWu0tTQxsU2TIpqv
FhUZmGd3E6vN+puPXz4yXeVQhMfQ+K8OpSM2ZuTpKCtDgm0SdUDyFnalA4lxHaFZqh+E>3+9JgHK7/KiiZmIJshUmqrwnkX0yKihCcOXCzaFITiByxBM/7PGeJo0IBAjyKI/GflgQ>8GsIWWRkCJnz2OMiYKr8uOMOAfTHnW57Dq+orDG1p
012236-33349111.sxz.ree.fg*-*
bCRIVArOSClIWrZz6KciBFT2iPjqsS/qMRSBYinBzpDmESj8kZHoGQ46BMq+LgHJiY5P>7yygNxCkEv25GKGViKTX1X6KSSLZ+RVNEts4N7jzVLoufZ+X/TAv2Ib7pnnEj7h4rWDn>y7KP1XrTynItaas5z5fpFt2zUHFNElvNmyrjbFZVp
DUsnWWDuvemWUr5YwOLxeRCnwTvfw71gwGEVeBzIJq4TsZb2/G8j9vpb/L7KNybsyQNN>DlOTMW5CHzd5otyYaNBcYo9V/4ky63q2vZMzQDWtCwVPaTKREPUqPLRKea3VkQnnsUic>/iBe+6Sv5GYl+XPGbIjWbTJWLQmc1kv8LXPyvUmTm
cUVypKp9fDlyFUkOkEVAxW8dMxHJ0c83BPw37GkCvsR9itkzO0FpX0Zn+OvRQRkUCyzr>dgijhcH
I need a way in awk to take the first variable as the text from the beginning of the line up to the "-".
Example:
variable1=012231
and
variable1=012236
Variable 2 is the 4 digits after the - character.
Example:
Variable2=3333
and
variable2=3334
Variable 3 is the 2 digits that follow the 4 digits of variable2.
Example:
variable3=94
and
variable3=91
Variable 4 is the block of text on the lines that follow, up to the next header line.
Example:
variable4=U2FsdGVkX1+1pfXeR/h4u6P/BrItX75L0wHVIka4yA6tqS9a5CFUWvLu1AB4x2m8NpmJ>fyoXdADqlWDiGWi6Pw1a8NgNDbdTOlMtGBz4FCi8n97UdVQX9f0a2u9d5l7lOCxVDDzd>wJXbi9x4O+Dmo/lm9DbWAjBGKwWu0tTQxsU2TIpqv
FhUZmGd3E6vN+puPXz4yXeVQhMfQ+K8OpSM2ZuTpKCtDgm0SdUDyFnalA4lxHaFZqh+E>3+9JgHK7/KiiZmIJshUmqrwnkX0yKihCcOXCzaFITiByxBM/7PGeJo0IBAjyKI/GflgQ>8GsIWWRkCJnz2OMiYKr8uOMOAfTHnW57Dq+orDG1p
and
variable4=bCRIVArOSClIWrZz6KciBFT2iPjqsS/qMRSBYinBzpDmESj8kZHoGQ46BMq+LgHJiY5P>7yygNxCkEv25GKGViKTX1X6KSSLZ+RVNEts4N7jzVLoufZ+X/TAv2Ib7pnnEj7h4rWDn>y7KP1XrTynItaas5z5fpFt2zUHFNElvNmyrjbFZVp
DUsnWWDuvemWUr5YwOLxeRCnwTvfw71gwGEVeBzIJq4TsZb2/G8j9vpb/L7KNybsyQNN>DlOTMW5CHzd5otyYaNBcYo9V/4ky63q2vZMzQDWtCwVPaTKREPUqPLRKea3VkQnnsUic>/iBe+6Sv5GYl+XPGbIjWbTJWLQmc1kv8LXPyvUmTm
cUVypKp9fDlyFUkOkEVAxW8dMxHJ0c83BPw37GkCvsR9itkzO0FpX0Zn+OvRQRkUCyzr>dgijhcH
Example print expected in output:
'012231' '3333' '94' 'U2FsdGVkX1+1pfXeR/h4u6P/BrItX75L0wHVIka4yA6tqS9a5CFUWvLu1AB4x2m8NpmJ>fyoXdADqlWDiGWi6Pw1a8NgNDbdTOlMtGBz4FCi8n97UdVQX9f0a2u9d5l7lOCxVDDzd>wJXbi9x4O+Dmo/lm9DbWAjBGKwWu0tTQxsU2TIpqv
FhUZmGd3E6vN+puPXz4yXeVQhMfQ+K8OpSM2ZuTpKCtDgm0SdUDyFnalA4lxHaFZqh+E>3+9JgHK7/KiiZmIJshUmqrwnkX0yKihCcOXCzaFITiByxBM/7PGeJo0IBAjyKI/GflgQ>8GsIWWRkCJnz2OMiYKr8uOMOAfTHnW57Dq+orDG1p'
'012236' '3334' '91' 'bCRIVArOSClIWrZz6KciBFT2iPjqsS/qMRSBYinBzpDmESj8kZHoGQ46BMq+LgHJiY5P>7yygNxCkEv25GKGViKTX1X6KSSLZ+RVNEts4N7jzVLoufZ+X/TAv2Ib7pnnEj7h4rWDn>y7KP1XrTynItaas5z5fpFt2zUHFNElvNmyrjbFZVp
DUsnWWDuvemWUr5YwOLxeRCnwTvfw71gwGEVeBzIJq4TsZb2/G8j9vpb/L7KNybsyQNN>DlOTMW5CHzd5otyYaNBcYo9V/4ky63q2vZMzQDWtCwVPaTKREPUqPLRKea3VkQnnsUic>/iBe+6Sv5GYl+XPGbIjWbTJWLQmc1kv8LXPyvUmTm
cUVypKp9fDlyFUkOkEVAxW8dMxHJ0c83BPw37GkCvsR9itkzO0FpX0Zn+OvRQRkUCyzr>dgijhcH'
I have tested the following code, which selects records by record number and splits them by fixed field widths, without regard to the format or shape of the content.
awk -v FIELDWIDTHS="6 1 4 2 2 15" 'NR==1{print $1" "$3" "$4}NR==2{print}NR==3{print $1" "$3" "$4}NR==4{print}' file
But it is a large file, and the long string spans a variable number of records, so matching on fixed record numbers (NR==...) will not work here. I need to capture this string in a variable so that I can print it later as a field of each output record.
Could you help me with some code to parse the input and print output as close as possible to what I need? Please also explain how to take the positions from the input.
Thanks in advance.
Using any awk in any shell on every Unix box:
$ cat tst.awk
split($0,f,"-") > 1 {
    if ( NR > 1 ) {
        prt()
        delete var
    }
    var[1] = f[1]
    var[2] = substr(f[2],1,4)
    var[3] = substr(f[2],5,2)
    next
}
{ var[4] = var[4] $0 }
END { prt() }
function prt(    i) {
    for ( i=1; i<=4; i++ ) {
        printf "\047%s\047%s", var[i], (i<4 ? OFS : ORS)
    }
}
$ awk -f tst.awk file
'012231' '3333' '94' 'U2FsdGVkX1+1pfXeR/h4u6P/BrItX75L0wHVIka4yA6tqS9a5CFUWvLu1AB4x2m8NpmJ>fyoXdADqlWDiGWi6Pw1a8NgNDbdTOlMtGBz4FCi8n97UdVQX9f0a2u9d5l7lOCxVDDzd>wJXbi9x4O+Dmo/lm9DbWAjBGKwWu0tTQxsU2TIpqvFhUZmGd3E6vN+puPXz4yXeVQhMfQ+K8OpSM2ZuTpKCtDgm0SdUDyFnalA4lxHaFZqh+E>3+9JgHK7/KiiZmIJshUmqrwnkX0yKihCcOXCzaFITiByxBM/7PGeJo0IBAjyKI/GflgQ>8GsIWWRkCJnz2OMiYKr8uOMOAfTHnW57Dq+orDG1p'
'012236' '3334' '91' 'bCRIVArOSClIWrZz6KciBFT2iPjqsS/qMRSBYinBzpDmESj8kZHoGQ46BMq+LgHJiY5P>7yygNxCkEv25GKGViKTX1X6KSSLZ+RVNEts4N7jzVLoufZ+X/TAv2Ib7pnnEj7h4rWDn>y7KP1XrTynItaas5z5fpFt2zUHFNElvNmyrjbFZVpDUsnWWDuvemWUr5YwOLxeRCnwTvfw71gwGEVeBzIJq4TsZb2/G8j9vpb/L7KNybsyQNN>DlOTMW5CHzd5otyYaNBcYo9V/4ky63q2vZMzQDWtCwVPaTKREPUqPLRKea3VkQnnsUic>/iBe+6Sv5GYl+XPGbIjWbTJWLQmc1kv8LXPyvUmTmcUVypKp9fDlyFUkOkEVAxW8dMxHJ0c83BPw37GkCvsR9itkzO0FpX0Zn+OvRQRkUCyzr>dgijhcH'
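Since the question also asks how the positions are taken: the script splits the header line at "-", keeps everything before it as the first field, then takes characters 1 to 4 and 5 to 6 of what follows as fields two and three. Every other line is appended to the fourth field until the next header appears. For readers more comfortable outside awk, here is a rough Python sketch of the same idea. The names `HEADER`, `parse_records` and `format_record` are made up for illustration, and the regular expression assumes that header lines always start with digits, a "-", and at least six more digits, which is slightly stricter than the awk test above.

```python
import re

# Assumption: a header line looks like "012231-33339411...", i.e. digits,
# a dash, then at least six digits (4 for field two, 2 for field three).
HEADER = re.compile(r"^(\d+)-(\d{4})(\d{2})")

def parse_records(lines):
    """Group a header line with the data lines that follow it."""
    records = []
    current = None
    for line in lines:
        line = line.rstrip("\n")
        m = HEADER.match(line)
        if m:
            # A new header closes the previous record, if there is one.
            if current is not None:
                records.append(current)
            current = [m.group(1), m.group(2), m.group(3), ""]
        elif current is not None:
            # Data line: append to the fourth field, without a separator,
            # like the awk script's `var[4] = var[4] $0`.
            current[3] += line
    if current is not None:
        records.append(current)
    return records

def format_record(record):
    """Quote every field and join the fields with single spaces."""
    return " ".join("'%s'" % field for field in record)
```

Calling `format_record` on each item returned by `parse_records(open("file"))` should give lines shaped like the awk output above. To keep the line breaks inside the fourth field, as in the expected output of the question, the data lines could be joined with "\n" instead of being concatenated.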
I have found an old file that defines ANTLR grammar rules like this:
rule_name[ ParamType *param ] > [ReturnType *retval]:
<<
$retval = NULL;
OtherType1 *new_var1 = NULL;
OtherType2 *new_var2 = NULL;
>>
subrule1[ param ] > [ $retval ]
| subrule2 > [new_var2]
<<
if( new_var2 == SOMETHING ){
$retval = something_related_to_new_var2;
}
else{
$retval = new_var2;
}
>>
{
somethingelse > [new_var_1]
<<
/* Do something with new_var_1 */
$retval = new_var_1;
>>
}
;
I'm not an ANTLR expert, and this is the first time I have seen this kind of syntax for a rule definition.
Does anybody know where I can find documentation or information about this?
Even a keyword for a Google search is welcome.
Edit:
It should be ANTLR Version 1.33MR33.
OK, I found it! Here is the guide:
http://www.antlr2.org/book/pcctsbk.pdf
I quote the parts of the PDF that answer my question.
1) Page 47:
poly > [float r]
: <<float f;>>
term>[$r] ( "\+" term>[f] <<$r += f;>> )*
;
Rule poly is defined to have a return value called $r via the "> [float r]" notation; this is similar to the output redirection character of UNIX shells. Setting the value of $r sets the return value of poly. The first action after the ":" is an init-action (because it is the first action of a rule or subrule). The init-action defines a local variable called f that will be used in the (...)* loop to hold the return value of the term.
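To make the notation more concrete for myself, I find it helpful to read a rule with "> [float r]" as a function that returns a value. The following Python sketch is only an analogy written by hand, not what PCCTS generates. It assumes a hypothetical `term` that simply reads one number from a token list (the real term rule in the book is different).

```python
def term(tokens, pos):
    """Hypothetical stand-in for rule `term`: read one number.

    Returns the value and the position of the next unread token.
    """
    return float(tokens[pos]), pos + 1

def poly(tokens, pos=0):
    """Analogy for: poly > [float r] : <<float f;>> term>[$r] ( "\\+" term>[f] <<$r += f;>> )* ;"""
    r, pos = term(tokens, pos)              # term > [$r]
    while pos < len(tokens) and tokens[pos] == "+":
        f, pos = term(tokens, pos + 1)      # term > [f], f is the init-action local
        r += f                              # <<$r += f;>>
    return r, pos                           # the value of $r is the rule's return value
```

With the token list `["1", "+", "2.5", "+", "3"]`, `poly` returns the sum 6.5 together with the end position 5.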
2) Page 85:
A rule looks like:
rule : alternative1
| alternative2
...
| alternativen
;
where each alternative production is composed of a list of elements that can be references to rules, references to tokens, actions, predicates, and subrules. Argument and return value definitions look like the following, where there are n arguments and m return values:
rule[arg1,...,argn] > [retval1,...,retvalm] : ... ;
The syntax for using a rule mirrors its definition:
a : ... rule[arg1,...,argn] > [v1,...,vm] ...
;
Here, the various vi receive the return values from the rule rule, each vi must be an l-value.
3) Page 87:
Actions are of the form <<...>> and contain user-supplied C or C++ code that must be executed during the parse.
To begin with, I should say that I have little to no experience with PowerShell so far. An upstream system generates output in the wrong format for me, so I want to use PowerShell to change it. From that system I get output that looks like this:
TEST1^|^9999^|^Y^|^NOT IN^|^('1','2','3')^|^N^|^LIKE^|^('4','5','6','7')^|^...^|^Y^|^NOT IN^|^('8','9','10','11','12')
TEST2^|^9998^|^Y^|^NOT IN^|^('4','5','6')^|^N^|^LIKE^|^('6','7','8','9')^|^...^|^Y^|^NOT IN^|^('1','2','15','16','17')^|^Y^|^NOT IN^|^('18','19','20','21','22')
Each line has a starting part (TEST1^|^9999^|^) followed by 1 to n tuples (example: Y^|^NOT IN^|^('1','2','3')^|^).
This is how I want the output to look:
TEST1^|^9999^|^Y^|^NOT IN^|^('1','2','3')
TEST1^|^9999^|^N^|^LIKE^|^('4','5','6','7')
TEST1^|^9999^|^Y^|^NOT IN^|^('8','9','10','11','12')
TEST2^|^9998^|^Y^|^NOT IN^|^('4','5','6')
TEST2^|^9998^|^N^|^LIKE^|^('6','7','8','9')
TEST2^|^9998^|^Y^|^NOT IN^|^('1','2','15','16','17')
TEST2^|^9998^|^Y^|^NOT IN^|^('18','19','20','21','22')
So the tuples shall be printed out per line, with the starting part attached in front.
My approach is to do the PowerShell equivalent of an AWK script, but so far I do not understand how to handle an indeterminate number of tuples and how to repeat the starting block for each of them.
I thank you so much in advance for your help!
I'd split the lines at ^|^ and recombine the fields of the resulting array in a loop. Something like this:
$sp = '^|^'
Get-Content 'C:\path\to\input.txt' | % {
$a = $_ -split [regex]::Escape($sp)
for ($i=2; $i -lt $a.length; $i+=3) {
"{0}$sp{1}$sp{2}$sp{3}$sp{4}" -f $a[0,1,$i,($i+1),($i+2)]
}
} | Set-Content 'C:\path\to\output.txt'
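For comparison, the same split-and-recombine idea can be sketched in Python. This is only an illustration: the function name `explode` and its parameters are invented here, and it assumes every line has exactly two leading fields followed by complete groups of three.

```python
SEP = "^|^"

def explode(line, sep=SEP, head_len=2, tuple_len=3):
    """Turn one input line into one output row per tuple.

    The first `head_len` fields are repeated in front of every group of
    `tuple_len` fields that follows.
    """
    fields = line.split(sep)          # plain string split, no regex escaping needed
    head = fields[:head_len]
    rows = []
    # Step through the remaining fields three at a time. The range bound
    # makes a trailing, incomplete group get dropped instead of printed.
    for i in range(head_len, len(fields) - tuple_len + 1, tuple_len):
        rows.append(sep.join(head + fields[i:i + tuple_len]))
    return rows
```

Unlike the PowerShell loop above, this version silently ignores leftover fields that do not form a full tuple. Whether that or an error is preferable depends on how much the input can be trusted.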
The data looks quite regular so you could loop over it using | as the delimiter and counting the following cells in 3s:
$data = @"
TEST1^|^9999^|^Y^|^NOT IN^|^('1','2','3')^|^N^|^LIKE^|^('4','5','6','7')^|^Y^|^NOT IN^|^('8','9','10','11','12')
TEST2^|^9998^|^Y^|^NOT IN^|^('4','5','6')^|^N^|^LIKE^|^('6','7','8','9')^|^Y^|^NOT IN^|^('1','2','15','16','17')^|^Y^|^NOT IN^|^('18','19','20','21','22')
"@
$data.split("`n") | % {
$ds = $_.split("|")
$heading = "$($ds[0])|$($ds[1])"
$j = 0
for($i = 2; $i -lt $ds.length; $i += 1) {
$line += "|$($ds[$i])" -replace "\^(\((?:'\d+',?)+\))\^?",'$1'
$j += 1
if($j -eq 3) {
write-host $heading$line
$line = ""
$j = 0
}
}
}
Parsing an arbitrary-length string record into row records is quite error-prone. A simple solution is to process the data row by row and create the output as you go.
Here is a simple illustration of how to process a single row. Processing the whole input file and writing the output is left as a trivial exercise for the reader.
$s = "TEST1^|^9999^|^Y^|^NOT IN^|^('1','2','3')^|^N^|^LIKE^|^('4','5','6','7')^|^Y^|^NOT IN^|^('8','9','10','11','12')"
$t = $s.split('\)', [StringSplitOptions]::RemoveEmptyEntries)
$testNum = ([regex]::match($t[0], "(?i)(test\d+\^\|\^\d+)")).value # Hunt for the 1st column values
$t[0] = $t[0] + ')' # Restore the ')' removed by the split
for($i=1;$i -lt $t.Length; ++$i) { $t[$i] = $testNum + $t[$i] + ')' } # Prepend the 1st column values and restore the ')'
$t
TEST1^|^9999^|^Y^|^NOT IN^|^('1','2','3')
TEST1^|^9999^|^N^|^LIKE^|^('4','5','6','7')
TEST1^|^9999^|^Y^|^NOT IN^|^('8','9','10','11','12')
I have a simple question, which is almost too simple to find on this forum or on awk learning sites.
I have some awk code that matches a line beginning with a number, and prints the 6th column of that line:
/^[1-9]/ {
print $6
}
How do I tell it to print only the first 50 rows of the column from the match?
ADDITIONAL QUESTION
I tried my own version of the answers below and got it to print 50 lines. However, now I am trying to choose which 50 lines I print. I do this by skipping a line that starts with a number and contains the word 'residue'. Then I skip 5 lines that start with a number and contain a 'w'. The script behaves as if I am only skipping the line with 'residue', and it prints from the first line starting with a number after that. Do you know why my 'w' lines are not being considered?
#!/usr/bin/awk -f
BEGIN {
line = 0;
skipW = 0;
}
# Ignore all lines beginning with a number until I find one I'm interested in.
/^[0-9]+ residue/ { next }
# Ignore the first five lines beginning with a number followed by a 'w'.
/^[0-9]+ w/ {
skipW += 1;
if (skipW <= 5) next
}
# For all other lines beginning with a number, perform the following. If we are
# "printing", increment the line count. When we've printed 50 lines turn off
# printing from that point on.
/^[0-9]+/ {
++line
if ((line > 0) && (line <= 50)) print $6
}
Use a match counter as part of your condition:
/^[1-9]/ && matched < 50 {
print $6
matched++
}
You can use a shortcut method also:
/^[1-9]/ { print $6; matched++ }
matched == 50 { exit }
But this may not always work in a pipeline if the producer command does not handle SIGPIPE gracefully.
awk '/^[1-9]/ { if (num_printed++ < 50) print $6 }'
This increments num_printed each time a match is found and prints the first 50 such lines, regardless of where they occur in the input files.
This reads through all the input. If an early exit is OK, then you can use:
awk '/^[1-9]/ { print $6; if (++num_printed == 50) exit }'
Note the switch from post-increment to pre-increment.
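If it helps to see the counter-plus-early-exit pattern outside awk, here is a small Python sketch of the same behaviour. The function name and its parameters are invented for this example. It stops reading as soon as the limit is reached, like the `exit` variant, and it yields an empty string when a matching line has fewer than six fields, which is what awk prints for a missing `$6`.

```python
import re
from itertools import islice

def sixth_column_of_matches(lines, limit=50, column=6):
    """Return `column` of the first `limit` lines that start with 1-9."""
    def matching():
        for line in lines:
            if re.match(r"[1-9]", line):      # same test as /^[1-9]/
                fields = line.split()         # whitespace-separated, like awk's default
                yield fields[column - 1] if len(fields) >= column else ""
    # islice stops pulling from the generator after `limit` items,
    # so the rest of the input is never read.
    return list(islice(matching(), limit))
```

Passing an open file object as `lines` means only as much of the file is read as is needed to find the first 50 matches.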
I'm trying to implement a parser that works directly in a tree walker and emits the commands needed by the compiler on the fly. So if I have a rule like:
statement
:
^('WRITE' expression)
{
//Here is the command that is created by my Tree Parser
ch.emitRO("OUT",0,0,0,"write out the value of ac");
//and then I handle it in my other classes
}
;
I want it to write OUT 0,0,0; to a file. That's my grammar.
However, I have a problem with the loop section of my grammar, which is:
'WHILE'^ expression 'DO' stat_seq 'ENDDO'
and in the tree parser:
doWhileStatement
:
^('WHILE' expression 'DO' stat_seq 'ENDDO')
;
What I want to do is directly parse the code from the while loop into the commands I need. I came up with this solution but it doesn't work:
doWhileStatement
:
^('WHILE' e=expression head='DO'
{
int loopHead =((CommonTree) head).getTokenStartIndex();
}
stat_seq
{
if ($e.result==1) {
input.seek(loopHead);
doWhileStatement();
}
}
'ENDDO')
;
For the record, here are some of the other rules I've written:
(Ignore the code written in curly brackets; it generates the commands in a text file.)
stat_seq
:
(statement)+
;
statement
:
^(':=' ID e=expression) { variables.put($ID.text,e); }
| ^('WRITE' expression)
{
ch.emitRM("LDC",ac,$expression.result,0,"pass the expression value to the ac reg");
ch.emitRO("OUT",ac,0,0,"write out the value of ac");
}
| ^('READ' ID)
{
ch.emitRO("IN",ac,0,0,"read value");
}
| ^('IF' expression 'THEN'
{
ch.emitRM("LDC",ac1,$expression.result,0,"pass the expression result to the ac reg");
int savedLoc1 = ch.emitSkip(1);
}
sseq1=stat_seq
'ELSE'
{
int savedLoc2 = ch.emitSkip(1);
ch.emitBackup(savedLoc1);
ch.emitRM("JEQ",ac1,savedLoc2+1,0,"skip as many places as needed depending on the expression");
ch.emitRestore();
}
sseq2=stat_seq
{
int savedLoc3 = ch.emitSkip(0);
ch.emitBackup(savedLoc2);
ch.emitRM("LDC",PC_REG,savedLoc3,0,"skip for the else command");
ch.emitRestore();
}
'ENDIF')
| doWhileStatement
;
Any help would be appreciated, thank you.
I found a solution. For everyone who has the same problem: I did it like this and it works:
^('WHILE'
{int c = input.index();}
expression
{int s=input.index();}
.* )// .* is a sequence of statements
{
int next = input.index(); // index of node following WHILE
input.seek(c);
match(input, Token.DOWN, null);
pushFollow(FOLLOW_expression_in_statement339);
int condition = expression();
state._fsp--;
//there is a problem here
//expression() seemed to be reading from the grammar file and I couldn't
//get it to read from the tree walker rule somehow
//It printed something like no viable alt at input 'DOWN'
//I googled it and found this mistake
// So I copied the code from the normal while statement
// And pasted it here and it works like a charm
// Normally there should only be int condition = expression()
while ( condition == 1 ) {
input.seek(s);
stat_seq();//stat_seq is a sequence of statements: (statement ';')+
input.seek(c);
match(input, Token.DOWN, null); //Copied value from EvaluatorWalker.java
//cause couldn't find another way to do it
pushFollow(FOLLOW_expression_in_statement339);
condition = expression();
state._fsp--;
System.out.println("condition:"+condition + " i:"+ variables.get("i"));
}
input.seek(next);
}
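Stripped of the ANTLR details, the idea in the code above is: remember the stream index of the condition, remember the index of the loop body, and keep seeking back and forth between them until the condition is false, then jump past the loop. The following Python sketch shows only that control flow on a toy flat node list. It is not ANTLR code. The `Walker` class, the "LT" and "INC" node names and the node layout are all invented for the illustration, and nested loops are not handled.

```python
class Walker:
    """Toy interpreter over a flat node list with index()/seek() like a node stream."""

    def __init__(self, nodes, variables):
        self.nodes = nodes
        self.pos = 0
        self.variables = variables

    def index(self):
        return self.pos

    def seek(self, position):
        self.pos = position

    def next(self):
        node = self.nodes[self.pos]
        self.pos += 1
        return node

    def expression(self):
        # Only one hypothetical form is supported: LT <name> <constant>.
        assert self.next() == "LT"
        name = self.next()
        limit = int(self.next())
        return 1 if self.variables[name] < limit else 0

    def stat_seq(self):
        # Only one hypothetical statement is supported: INC <name>.
        while self.nodes[self.pos] != "ENDDO":
            assert self.next() == "INC"
            self.variables[self.next()] += 1

    def while_statement(self):
        assert self.next() == "WHILE"
        c = self.index()                               # index of the condition
        condition = self.expression()
        assert self.next() == "DO"
        s = self.index()                               # index of the loop body
        after = self.nodes.index("ENDDO", s) + 1       # node following the loop (no nesting)
        while condition == 1:
            self.seek(s)
            self.stat_seq()                            # run the body once
            self.seek(c)
            condition = self.expression()              # re-evaluate the condition
        self.seek(after)                               # continue after the loop
```

For example, walking `["WHILE", "LT", "i", "3", "DO", "INC", "i", "ENDDO"]` with `i` starting at 0 leaves `i` at 3 and the position just past "ENDDO".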
I described the remaining problem in the comments of my code. If anyone can tell me the proper way to do this, I would be grateful. It's strange that there is almost no guidance on the correct way to implement loops on the fly within a tree grammar.
Regards,
Alex