Parse and change the output of a system through PowerShell

Initially I have to state that I have little to no experience with PowerShell so far. An upstream system generates output in the wrong shape for me, so I want to use PowerShell to change it. From the system I get output looking like this:
TEST1^|^9999^|^Y^|^NOT IN^|^('1','2','3')^|^N^|^LIKE^|^('4','5','6','7')^|^...^|^Y^|^NOT IN^|^('8','9','10','11','12')
TEST2^|^9998^|^Y^|^NOT IN^|^('4','5','6')^|^N^|^LIKE^|^('6','7','8','9')^|^...^|^Y^|^NOT IN^|^('1','2','15','16','17')^|^Y^|^NOT IN^|^('18','19','20','21','22')
When you look at it, there is a starting part for each line (TEST1^|^9999^|^) followed by 1 to n tuples (example: Y^|^NOT IN^|^('1','2','3')^|^).
The way I want this to look like is here:
TEST1^|^9999^|^Y^|^NOT IN^|^('1','2','3')
TEST1^|^9999^|^N^|^LIKE^|^('4','5','6','7')
TEST1^|^9999^|^Y^|^NOT IN^|^('8','9','10','11','12')
TEST2^|^9998^|^Y^|^NOT IN^|^('4','5','6')
TEST2^|^9998^|^N^|^LIKE^|^('6','7','8','9')
TEST2^|^9998^|^Y^|^NOT IN^|^('1','2','15','16','17')
TEST2^|^9998^|^Y^|^NOT IN^|^('18','19','20','21','22')
So each tuple should be printed on its own line, with the starting part prepended.
My solution approach would be the AWK equivalent in PowerShell, but so far I lack an understanding of how to deal with an undetermined number of tuples and how to repeat the starting block.
Thank you so much in advance for your help!

I'd split the lines at ^|^ and recombine the fields of the resulting array in a loop. Something like this:
$sp = '^|^'
Get-Content 'C:\path\to\input.txt' | % {
    $a = $_ -split [regex]::Escape($sp)
    for ($i=2; $i -lt $a.length; $i+=3) {
        "{0}$sp{1}$sp{2}$sp{3}$sp{4}" -f $a[0,1,$i,($i+1),($i+2)]
    }
} | Set-Content 'C:\path\to\output.txt'

The data looks quite regular so you could loop over it using | as the delimiter and counting the following cells in 3s:
$data = @"
TEST1^|^9999^|^Y^|^NOT IN^|^('1','2','3')^|^N^|^LIKE^|^('4','5','6','7')^|^Y^|^NOT IN^|^('8','9','10','11','12')
TEST2^|^9998^|^Y^|^NOT IN^|^('4','5','6')^|^N^|^LIKE^|^('6','7','8','9')^|^Y^|^NOT IN^|^('1','2','15','16','17')^|^Y^|^NOT IN^|^('18','19','20','21','22')
"@
$data.split("`n") | % {
    $ds = $_.split("|")
    $heading = "$($ds[0])|$($ds[1])"
    $j = 0
    for($i = 2; $i -lt $ds.length; $i += 1) {
        $line += "|$($ds[$i])" -replace "\^(\((?:'\d+',?)+\))\^?",'$1'
        $j += 1
        if($j -eq 3) {
            write-host $heading$line
            $line = ""
            $j = 0
        }
    }
}

Parsing an arbitrary-length string record into row records is quite error prone. A simple solution is to process the data row by row and create the output as you go.
Here is a simple illustration of how to process a single row; processing the whole input file and writing the output is left as a trivial exercise for the reader (a sketch of that whole-file version follows the example output below).
$s = "TEST1^|^9999^|^Y^|^NOT IN^|^('1','2','3')^|^N^|^LIKE^|^('4','5','6','7')^|^Y^|^NOT IN^|^('8','9','10','11','12')"
$t = $s.split(')', [StringSplitOptions]::RemoveEmptyEntries)
$testNum = ([regex]::match($t[0], "(?i)(test\d+\^\|\^\d+)")).value # Hunt for the 1st-column values
$t[0] = $t[0] + ')' # Restore the ')' removed by the split
for($i=1;$i -lt $t.Length; ++$i) { $t[$i] = $testNum + $t[$i] + ')' } # Prepend the 1st columns and restore the ')'
$t
TEST1^|^9999^|^Y^|^NOT IN^|^('1','2','3')
TEST1^|^9999^|^N^|^LIKE^|^('4','5','6','7')
TEST1^|^9999^|^Y^|^NOT IN^|^('8','9','10','11','12')
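A hedged sketch of the whole-file version, applying the same per-row logic inside a Get-Content / Set-Content pipeline (the input and output paths are placeholders):
Get-Content 'C:\path\to\input.txt' | ForEach-Object {
    $t = $_.Split(')', [StringSplitOptions]::RemoveEmptyEntries)
    $testNum = ([regex]::Match($t[0], '(?i)(test\d+\^\|\^\d+)')).Value   # first-column values
    $t[0] = $t[0] + ')'                                                  # restore the ')' removed by the split
    for ($i = 1; $i -lt $t.Length; ++$i) { $t[$i] = $testNum + $t[$i] + ')' }
    $t
} | Set-Content 'C:\path\to\output.txt'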

Related

AWK: take some data input from a file and set it as a variable in the output

I have some data in a file and need to print it in the output in a specific format.
Example content to parse:
012231-33339411.sxz.ree.fg*-*
U2FsdGVkX1+1pfXeR/h4u6P/BrItX75L0wHVIka4yA6tqS9a5CFUWvLu1AB4x2m8NpmJ>fyoXdADqlWDiGWi6Pw1a8NgNDbdTOlMtGBz4FCi8n97UdVQX9f0a2u9d5l7lOCxVDDzd>wJXbi9x4O+Dmo/lm9DbWAjBGKwWu0tTQxsU2TIpqv
FhUZmGd3E6vN+puPXz4yXeVQhMfQ+K8OpSM2ZuTpKCtDgm0SdUDyFnalA4lxHaFZqh+E>3+9JgHK7/KiiZmIJshUmqrwnkX0yKihCcOXCzaFITiByxBM/7PGeJo0IBAjyKI/GflgQ>8GsIWWRkCJnz2OMiYKr8uOMOAfTHnW57Dq+orDG1p
012236-33349111.sxz.ree.fg*-*
bCRIVArOSClIWrZz6KciBFT2iPjqsS/qMRSBYinBzpDmESj8kZHoGQ46BMq+LgHJiY5P>7yygNxCkEv25GKGViKTX1X6KSSLZ+RVNEts4N7jzVLoufZ+X/TAv2Ib7pnnEj7h4rWDn>y7KP1XrTynItaas5z5fpFt2zUHFNElvNmyrjbFZVp
DUsnWWDuvemWUr5YwOLxeRCnwTvfw71gwGEVeBzIJq4TsZb2/G8j9vpb/L7KNybsyQNN>DlOTMW5CHzd5otyYaNBcYo9V/4ky63q2vZMzQDWtCwVPaTKREPUqPLRKea3VkQnnsUic>/iBe+6Sv5GYl+XPGbIjWbTJWLQmc1kv8LXPyvUmTm
cUVypKp9fDlyFUkOkEVAxW8dMxHJ0c83BPw37GkCvsR9itkzO0FpX0Zn+OvRQRkUCyzr>dgijhcH
I need some way in awk to take, as the first variable, the part from the beginning of the line up to the '-'.
Example:
variable1=012231
and
variable1=012236
Variable 2: the 4 digits after the '-' character.
Example:
Variable2=3333
and
variable2=3334
Variable 3: the 2 digits after the 4 digits of variable 2.
Example:
variable3=94
and
variable3=91
Variable 4: the text that follows, up to the next header line.
Example:
variable4=U2FsdGVkX1+1pfXeR/h4u6P/BrItX75L0wHVIka4yA6tqS9a5CFUWvLu1AB4x2m8NpmJ>fyoXdADqlWDiGWi6Pw1a8NgNDbdTOlMtGBz4FCi8n97UdVQX9f0a2u9d5l7lOCxVDDzd>wJXbi9x4O+Dmo/lm9DbWAjBGKwWu0tTQxsU2TIpqv
FhUZmGd3E6vN+puPXz4yXeVQhMfQ+K8OpSM2ZuTpKCtDgm0SdUDyFnalA4lxHaFZqh+E>3+9JgHK7/KiiZmIJshUmqrwnkX0yKihCcOXCzaFITiByxBM/7PGeJo0IBAjyKI/GflgQ>8GsIWWRkCJnz2OMiYKr8uOMOAfTHnW57Dq+orDG1p
and
variable4=bCRIVArOSClIWrZz6KciBFT2iPjqsS/qMRSBYinBzpDmESj8kZHoGQ46BMq+LgHJiY5P>7yygNxCkEv25GKGViKTX1X6KSSLZ+RVNEts4N7jzVLoufZ+X/TAv2Ib7pnnEj7h4rWDn>y7KP1XrTynItaas5z5fpFt2zUHFNElvNmyrjbFZVp
DUsnWWDuvemWUr5YwOLxeRCnwTvfw71gwGEVeBzIJq4TsZb2/G8j9vpb/L7KNybsyQNN>DlOTMW5CHzd5otyYaNBcYo9V/4ky63q2vZMzQDWtCwVPaTKREPUqPLRKea3VkQnnsUic>/iBe+6Sv5GYl+XPGbIjWbTJWLQmc1kv8LXPyvUmTm
cUVypKp9fDlyFUkOkEVAxW8dMxHJ0c83BPw37GkCvsR9itkzO0FpX0Zn+OvRQRkUCyzr>dgijhcH
Example of the expected output:
'012231' '3333' '94' 'U2FsdGVkX1+1pfXeR/h4u6P/BrItX75L0wHVIka4yA6tqS9a5CFUWvLu1AB4x2m8NpmJ>fyoXdADqlWDiGWi6Pw1a8NgNDbdTOlMtGBz4FCi8n97UdVQX9f0a2u9d5l7lOCxVDDzd>wJXbi9x4O+Dmo/lm9DbWAjBGKwWu0tTQxsU2TIpqv
FhUZmGd3E6vN+puPXz4yXeVQhMfQ+K8OpSM2ZuTpKCtDgm0SdUDyFnalA4lxHaFZqh+E>3+9JgHK7/KiiZmIJshUmqrwnkX0yKihCcOXCzaFITiByxBM/7PGeJo0IBAjyKI/GflgQ>8GsIWWRkCJnz2OMiYKr8uOMOAfTHnW57Dq+orDG1p'
'012236' '3334' '91' 'bCRIVArOSClIWrZz6KciBFT2iPjqsS/qMRSBYinBzpDmESj8kZHoGQ46BMq+LgHJiY5P>7yygNxCkEv25GKGViKTX1X6KSSLZ+RVNEts4N7jzVLoufZ+X/TAv2Ib7pnnEj7h4rWDn>y7KP1XrTynItaas5z5fpFt2zUHFNElvNmyrjbFZVp
DUsnWWDuvemWUr5YwOLxeRCnwTvfw71gwGEVeBzIJq4TsZb2/G8j9vpb/L7KNybsyQNN>DlOTMW5CHzd5otyYaNBcYo9V/4ky63q2vZMzQDWtCwVPaTKREPUqPLRKea3VkQnnsUic>/iBe+6Sv5GYl+XPGbIjWbTJWLQmc1kv8LXPyvUmTm
cUVypKp9fDlyFUkOkEVAxW8dMxHJ0c83BPw37GkCvsR9itkzO0FpX0Zn+OvRQRkUCyzr>dgijhcH'
I have tested the following code, which prints by selecting record numbers and counting fixed field widths, without regard to the format or shape of the content:
awk -v FIELDWIDTHS="6 1 4 2 2 15" 'NR==1{print $1" "$3" "$4}NR==2{print}NR==3{print $1" "$3" "$4}NR==4{print}' file
But it's a large file and the number of lines per record varies, so matching fixed record numbers will not work here; I need to capture that long string into a variable so I can print it later in the output as a field wherever it is needed.
Could you help me with some code to parse the input and print output as close as possible to what I need? Please explain how the positions in the input are taken.
Thanks in advance.
Using any awk in any shell on every Unix box:
$ cat tst.awk
split($0,f,"-") > 1 {
    if ( NR > 1 ) {
        prt()
        delete var
    }
    var[1] = f[1]
    var[2] = substr(f[2],1,4)
    var[3] = substr(f[2],5,2)
    next
}
{ var[4] = var[4] $0 }
END { prt() }
function prt( i) {
    for ( i=1; i<=4; i++ ) {
        printf "\047%s\047%s", var[i], (i<4 ? OFS : ORS)
    }
}
$ awk -f tst.awk file
'012231' '3333' '94' 'U2FsdGVkX1+1pfXeR/h4u6P/BrItX75L0wHVIka4yA6tqS9a5CFUWvLu1AB4x2m8NpmJ>fyoXdADqlWDiGWi6Pw1a8NgNDbdTOlMtGBz4FCi8n97UdVQX9f0a2u9d5l7lOCxVDDzd>wJXbi9x4O+Dmo/lm9DbWAjBGKwWu0tTQxsU2TIpqvFhUZmGd3E6vN+puPXz4yXeVQhMfQ+K8OpSM2ZuTpKCtDgm0SdUDyFnalA4lxHaFZqh+E>3+9JgHK7/KiiZmIJshUmqrwnkX0yKihCcOXCzaFITiByxBM/7PGeJo0IBAjyKI/GflgQ>8GsIWWRkCJnz2OMiYKr8uOMOAfTHnW57Dq+orDG1p'
'012236' '3334' '91' 'bCRIVArOSClIWrZz6KciBFT2iPjqsS/qMRSBYinBzpDmESj8kZHoGQ46BMq+LgHJiY5P>7yygNxCkEv25GKGViKTX1X6KSSLZ+RVNEts4N7jzVLoufZ+X/TAv2Ib7pnnEj7h4rWDn>y7KP1XrTynItaas5z5fpFt2zUHFNElvNmyrjbFZVpDUsnWWDuvemWUr5YwOLxeRCnwTvfw71gwGEVeBzIJq4TsZb2/G8j9vpb/L7KNybsyQNN>DlOTMW5CHzd5otyYaNBcYo9V/4ky63q2vZMzQDWtCwVPaTKREPUqPLRKea3VkQnnsUic>/iBe+6Sv5GYl+XPGbIjWbTJWLQmc1kv8LXPyvUmTmcUVypKp9fDlyFUkOkEVAxW8dMxHJ0c83BPw37GkCvsR9itkzO0FpX0Zn+OvRQRkUCyzr>dgijhcH'

splitting a string into similar elements along with the values

I have a string like
$query = "date=20.10.2007&amount=400+date=11.02.2008&amount=1400+date=12.03.2008&amount=1500";
There are two variables named date and amount, e.g. date=20.10.2007 and amount=400; these two variables repeat with different values, and each set (date & amount) is separated by a '+' sign. Now I want to display this string like this:
Date Amount
20.10.2007 400
11.02.2008 1400
12.03.2008 1500
Need help
We can make judicious use of explode and preg_split here to get the output you want:
$query = "date=20.10.2007&amount=400+date=11.02.2008&amount=1400+date=12.03.2008&amount=1500";
$array = explode("+", $query);
$counter = 0;
echo "Date Amount\n";
foreach($array as $item) {
    if ($counter > 0) echo "\n";
    $parts = explode("&", $item);
    echo preg_split("/=/", $parts[0])[1] . " ";
    echo preg_split("/=/", $parts[1])[1];
    $counter = $counter + 1;
}
This prints:
Date Amount
20.10.2007 400
11.02.2008 1400
12.03.2008 1500
The logic here is that we first split the query string on + to obtain components looking like:
date=20.10.2007&amount=400
Then, inside the loop over all such components, we split again by & to obtain the date and amount terms. Finally, each of these are split again on = to get the actual values.
Thanks a lot Tim Biegeleisen for your kind guidance. With your help I did it with the code below:
$str = "date=20.10.2007&amount=400+date=11.02.2008&amount=1400+date=12.03.2008&amount=1500";
$array = explode("+", $str);
$i = 0;
echo nl2br("Date Amount \n");
foreach($array as $item[$i])
{
    parse_str($item[$i]);
    echo $date;
    echo $amount."<br>";
    $i++;
}

Parse Binary file with Powershell

I am trying to search through a binary file. After reviewing the file in a hex editor I found patterns throughout it; you can see them below. As you can see, they appear before and after each file listing.
/% ......C:\Users\\Desktop\test1.pdf..9
/% ......C:\Users\\Desktop\testtesttesttest.pdf..9
What I would like to do is find ..9 (hex = 000039), then "back up" until I find /% ...... (hex = 2F25A01C1000000000), and then move forward x bytes so I can get the complete path. The code I have now is below:
$file = 'C:\Users\<username>\Desktop\bc03160ee1a59fc1.automaticDestinations-ms'
$begin_pattern = '2F25A01C1000000000' #/% ......
$end_pattern = '000039' #..9
$prevBytes = '8'
$bytes = [string]::join('', (gc $file -en byte | % {'{0:x2}' -f $_}))
[regex]::matches($bytes, $end_pattern) |
    % {
        $i = $_.index - $prevBytes * 2
        [string]::join('', $bytes[$i..($i + $prevBytes * 2 - 1)])
    }
Some of the output roughly translates to this:
ffff2e0000002f000000300000003b0000003200000033000000340000003500000036000000370000003800
655c4465736b746f705c466f72656e7369635f426f6f6b735c5b656e5d646566745f6d616e75616c2e706466
0000000000000000000000000000010000000a00000000000000000020410a000000000000000a00000000
ÿÿ./0;2345678?e\Desktop\deft_manual.pdf?
?sic Science, Computers, and the Internet.pdf
?ware\Desktop\Dive Into Python 3.pdf?
You can use the System.IO.BinaryReader class from PowerShell.
$path = "<yourPathToTheBinaryFile>"
$binaryReader = New-Object System.IO.BinaryReader([System.IO.File]::Open($path, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read, [System.IO.FileShare]::ReadWrite))
Then you have access to all the methods like:
$binaryReader.BaseStream.Seek($pos, [System.IO.SeekOrigin]::Begin)
AFAIK, there is no easy way to "find" a pattern without reading the bytes (using ReadBytes) and implementing the search yourself.
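If you do end up searching by hand, here is a hedged sketch, not a drop-in solution: it reads the whole file through a BinaryReader, scans for the start-marker bytes from the question, and decodes everything up to the 00 00 39 end marker as single-byte text. The exact marker bytes, the int-sized read, and the single-byte decoding are assumptions based on the hex dump above.
$path   = '<yourPathToTheBinaryFile>'
$reader = New-Object System.IO.BinaryReader([System.IO.File]::OpenRead($path))
$bytes  = $reader.ReadBytes([int]$reader.BaseStream.Length)   # assumes the file fits in an int-sized buffer
$reader.Close()
$marker = [byte[]](0x2F, 0x25, 0xA0, 0x1C, 0x10, 0x00, 0x00, 0x00, 0x00)   # /% ......

for ($i = 0; $i -le $bytes.Length - $marker.Length; $i++) {
    # Compare the bytes at offset $i against the marker.
    $found = $true
    for ($j = 0; $j -lt $marker.Length; $j++) {
        if ($bytes[$i + $j] -ne $marker[$j]) { $found = $false; break }
    }
    if (-not $found) { continue }

    # Collect everything after the marker up to the 00 00 39 end marker.
    $k  = $i + $marker.Length
    $sb = New-Object System.Text.StringBuilder
    while ($k -lt $bytes.Length - 2 -and
           -not ($bytes[$k] -eq 0x00 -and $bytes[$k + 1] -eq 0x00 -and $bytes[$k + 2] -eq 0x39)) {
        [void]$sb.Append([char]$bytes[$k])
        $k++
    }
    $sb.ToString()   # emit the recovered path text
}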

Text parsing in Powershell: Identify a target line and parse the next X lines to create objects

I am parsing text output from a disk array that lists information about LUN snapshots in a predictable format. After trying every other way to get this data out of the array in a useable manner, the only thing I can do is generate this text file and parse it. The output looks like this:
SnapView logical unit name: deleted_for_security_reasons
SnapView logical unit ID: 60:06:01:60:52:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX
Target Logical Unit: 291
State: Inactive
This repeats all through the file with one line break between each group. I want to identify a group, parse each of the four lines, create a new PSObject, add the value for each line as a new NoteProperty, and then add the new object to a collection.
What I can't figure out is, once I identify the first line in the block of four lines, how to then process the text from lines two, three, and four. I'm looping through each line, finding the start of a block, and then processing it. Here's what I have so far, with comments where the magic goes:
$snaps = get-content C:\powershell\snaplist.txt
$snapObjects = @()
foreach ($line in $snaps)
{
    if ([regex]::ismatch($line,"SnapView logical unit name"))
    {
        $snapObject = new-object system.Management.Automation.PSObject
        $snapObject | add-member -membertype noteproperty -name "SnapName" -value $line.replace("SnapView logical unit name: ","")
        # Go to the next line and add the UID
        # Go to the next line and add the TLU
        # Go to the next line and add the State
        $snapObjects += $snapObject
    }
}
I have scoured Google and Stack Overflow trying to figure out how I can reference the line number of the object I'm iterating over, and I can't figure it out. I may rely on foreach loops too much and that's affecting my thinking, I don't know.
As you say, I think you're thinking too much foreach when you should be thinking for. The below modification should be more along the lines of what you're looking for:
$snaps = get-content C:\powershell\snaplist.txt
$snapObjects = @()
for ($i = 0; $i -lt $snaps.length; $i++)
{
    if ([regex]::ismatch($snaps[$i],"SnapView logical unit name"))
    {
        $snapObject = new-object system.Management.Automation.PSObject
        $snapObject | add-member -membertype noteproperty -name "SnapName" -value ($snaps[$i]).replace("SnapView logical unit name: ","")
        # $snaps[$i+1] Go to the next line and add the UID
        # $snaps[$i+2] Go to the next line and add the TLU
        # $snaps[$i+3] Go to the next line and add the State
        $snapObjects += $snapObject
    }
}
A while loop may be even cleaner because then you can increment $i by 4 instead of 1 when you hit this case, but since the other 3 lines won't trigger the "if" statement... there's no danger, just a few wasted cycles.
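For completeness, here is a hedged sketch of that while-loop variant; the property names and the -replace patterns are my own assumptions, and it presumes the four lines always appear in the same order:
$snaps = Get-Content C:\powershell\snaplist.txt
$snapObjects = @()
$i = 0
while ($i -lt $snaps.Length) {
    if ($snaps[$i] -match 'SnapView logical unit name') {
        $snapObjects += New-Object PSObject -Property @{
            SnapName = $snaps[$i]     -replace '^SnapView logical unit name: ', ''
            UID      = $snaps[$i + 1] -replace '^SnapView logical unit ID: ', ''
            TLU      = $snaps[$i + 2] -replace '^Target Logical Unit: ', ''
            State    = $snaps[$i + 3] -replace '^State: ', ''
        }
        $i += 4    # skip the block we just consumed
    }
    else {
        $i++
    }
}
$snapObjects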
Another possibility
function Get-Data {
    $foreach.MoveNext() | Out-Null
    $null, $returnValue = $foreach.Current.Split(":")
    $returnValue
}
foreach($line in (Get-Content "C:\test.dat")) {
    if($line -match "SnapView logical unit name") {
        $null, $Name = $line.Split(":")
        $ID = Get-Data
        $Unit = Get-Data
        $State = Get-Data
        New-Object PSObject -Property @{
            Name = $Name.Trim()
            ID = ($ID -join ":").Trim()
            Unit = $Unit.Trim()
            State = $State.Trim()
        }
    }
}
Name ID Unit State
---- -- ---- -----
deleted_for_security_reasons 60:06:01:60:52:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX 291 Inactive
switch -regex -file C:\powershell\snaplist.txt {
    '^.+me:\s+(\S*)' {$SnapName = $Matches[1]}
    '^.+ID:\s+(\S*)' {$UID = $Matches[1]}
    '^.+it:\s+(\S*)' {$TLU = $Matches[1]}
    '^.+te:\s+(\S*)' {
        New-Object PSObject -Property @{
            SnapName = $SnapName
            UID = $UID
            TLU = $TLU
            State = $Matches[1]
        }
    }
}
try this
Get-Content "c:\temp\test.txt" | ConvertFrom-String -Delimiter ": " -PropertyNames Intitule, Value
If you have multiple blocks, try this:
$template=@"
{Data:SnapView logical unit name: {UnitName:reasons}
SnapView logical unit ID: {UnitId:12:3456:Zz}
Target Logical Unit: {Target:123456789}
State: {State:A State}}
"@
Get-Content "c:\temp\test.txt" | ConvertFrom-String -TemplateContent $template | % {
    [pscustomobject]@{
        UnitName=$_.Data.UnitName
        UnitId=$_.Data.UnitId
        Target=$_.Data.Target
        State=$_.Data.State
    }
}

parsing issue with comma separated csv file

I am trying to extract the 4th column from a CSV file (comma separated, skipping the first 2 header lines) using this command:
awk 'NR <2 {next}{FS =","}{print $4}' filename.csv | more
However, it doesn't work because the first column contains commas, so the 4th field is not really the 4th column. Below is an example of a row:
"sdfsdfsd, sfsdf", 454,fgdfg, I_want_this_column,sdfgdg,34546, 456465, etc
Unless you have specific reasons for using awk, I would recommend using a CSV parsing library. Many scripting languages have one built-in (or at least available) and they'll save you from these headaches.
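Since the rest of this page is PowerShell-heavy, here is a hedged sketch of that idea using PowerShell's built-in CSV parser, which copes with the quoted, comma-containing first field. The column names, the file name, and the two skipped header lines are assumptions:
Get-Content 'filename.csv' |
    Select-Object -Skip 2 |
    ConvertFrom-Csv -Header c1, c2, c3, c4, c5, c6, c7, c8 |
    ForEach-Object { $_.c4 }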
If your first column always has quotes:
$ awk 'BEGIN{ FS="\042[ ]*," } { m=split($2,a,","); print a[3] } ' file
I_want_this_column
If the column you want is always the second to last:
$ awk -F"," '{print $(NF-1)}' file
I_want_this_column
You can try this demo script to break down the columns
awk 'BEGIN{ FS="," }
{
    for(i=1;i<=NF;i++){
        # save normal
        if($i !~ /^[ ]*\042|[ ]*\042[ ]*$/){
            a[++j]=$i
        }
        # if quotes at the end
        if(f==1 && $i ~ /[ ]*\042[ ]*$/){
            s=s","$i
            a[++j]=s
            #reset
            s="";f=0
        }
        # if quotes in front
        if($i ~ /^[ ]*\042/){
            s=s $i
            f=1
        }
        if(f==1 && ( $i !~/\042/ ) ){
            s=s","$i
        }
    }
}
END{
    # print columns
    for(p=1;p<=j;p++){
        print "Field "p,": "a[p]
    }
} ' file
output
$ cat file
"sdfsdfsd, sfsdf", "454,fgdfg blah , words ", I_want_this_column,sdfgdg
$ ./shell.sh
Field 1 : "sdfsdfsd, sfsdf"
Field 2 : fgdfg blah
Field 3 : "454,fgdfg blah , words "
Field 4 : I_want_this_column
Field 5 : sdfgdg
You shouldn't use awk here. Use Python csv module or Perl Text::CSV or Text::CSV_XS modules or another real csv parser.
Related question -
parse csv file using gawk
If you can't avoid awk, this piece of code does the job you need:
BEGIN {FS=",";}
{
    f=0;
    j=0;
    for (i = 1; i <= NF; ++i) {
        if (f) {
            a[j] = a[j] "," $(i);
            if ($(i) ~ "\"$") {
                f = 0;
            }
        }
        else {
            ++j;
            a[j] = $(i);
            if ((a[j] ~ "^\"[^\"]*$")) {
                f = 1;
            }
        }
    }
    for (i = 1; i <= j; ++i) {
        gsub("^\"","",a[i]);
        gsub("\"$","",a[i]);
        gsub("\"\"","\"",a[i]);
        print "i = \"" a[i] "\"";
    }
}
Working with CSV files that have quoted fields with commas inside can be difficult with the standard UNIX text tools.
I wrote a program called csvquote to make the data easy for them to handle. In your case, you could use it like this:
csvquote filename.csv | awk 'NR <2 {next}{FS =","}{print $4}' | csvquote -u | more
or you could use cut and tail like this:
csvquote filename.csv | tail -n +3 | cut -d, -f4 | csvquote -u | more
The code and docs are here: https://github.com/dbro/csvquote
