PowerShell: parse parts of a text file and save to CSV

All, I'm very new to PowerShell and am hoping someone can get me going on what I think would be a simple script.
I need to parse a text file, capture certain lines from it, and save those lines as a CSV file.
For example, each alert is in its own text file. Each file is similar to this:
--start of file ---
Name John Smith
Dept Accounting
Codes bas-2349,cav-3928,deg-3942
iye-2830,tel-3890
Urls hxxp://blah.com
hxxp://foo.com, hxxp://foo2.com
Some text I dont care about
More text i dont care about
Comments
---------
"here is a multi line
comment I need
to capture"
Some text I dont care about
More text i dont care about
Date 3/12/2013
---END of file---
For each text file I want to write only Name, Codes, and Urls to a CSV file. Could someone help me get going on this?
I'm more of a Perl guy, so I know I could write a regex for capturing a single line beginning with Name. However, I am completely lost on how to read the "Codes" line when it might be one line or X lines long before I run into the Urls field.
Any help would be greatly appreciated!

Text parsing usually means regex. With regex, sometimes you need anchors to know when to stop a match and that can make you care about text you otherwise wouldn't. If you can specify that first line of "Some text I don't care about" you can use that to "anchor" your match of the URLs so you know when to stop matching.
$regex = @'
(?ms)Name (.+)?
Dept .+?
Codes (.+)?
Urls (.+)?
Some text I dont care about.+
Comments
---------
(.+)?
Some text I dont care about
'@

$file = 'c:\somedir\somefile.txt'

if ([IO.File]::ReadAllText($file) -match $regex)
{
    $Name    = $matches[1]
    $Codes   = $matches[2] -replace '\s+',','
    $Urls    = $matches[3] -replace '\s+',','
    $comment = $matches[4] -replace '\s+',' '
}

$Name
$Codes
$Urls
$comment
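If you need the result as CSV rather than loose variables, a minimal sketch (my addition, reusing the $regex above; the folder and output path are assumptions) is to wrap the captures in one object per alert file and pipe everything to Export-Csv:

# Sketch only: one object per alert file, appended to a single CSV
Get-ChildItem 'c:\somedir' -Filter *.txt | ForEach-Object {
    if ([IO.File]::ReadAllText($_.FullName) -match $regex) {
        [PSCustomObject]@{
            Name  = $matches[1]
            Codes = $matches[2] -replace '\s+',','
            Urls  = $matches[3] -replace '\s+',','
        }
    }
} | Export-Csv 'c:\somedir\alerts.csv' -NoTypeInformation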

If the file is not too big to be processed in memory, the simple way is to read it as an array of strings. (What "too big" means depends on your system; anything sub-gigabyte should work without much of a hiccup.)
After you've read the file, set up head and tail counters pointing at element zero. Move the tail pointer forward row by row until you find the date row (you can match the rows with regexes). Now you know the start and end of a single record. For the next record, set the head counter to tail+1 and the tail to tail+2, and start scanning rows again. Lather, rinse, repeat until the end of the array is reached.
Once a record is isolated, you can extract the name with a regex. Codes and Urls are a bit trickier: match the Codes row with a regex, then keep extracting the following rows as long as they still match the code pattern. The same goes for the Urls data. If the continuation rows for Codes and Urls always have whitespace padding, you could match that leading whitespace with a regex to pick up the continuation rows too.
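Here is a minimal sketch of that head/tail scan, assuming records end with a "Date ..." line and the field layout of the sample file; the path, the continuation handling, and the output property names are illustrative assumptions, not a tested solution.

$lines = Get-Content 'C:\somedir\somefile.txt'
$head  = 0
for ($tail = 0; $tail -lt $lines.Count; $tail++) {
    if ($lines[$tail] -notmatch '^Date ') { continue }    # not at the end of a record yet
    $record = $lines[$head .. $tail]                      # one complete record

    # Name is always a single line
    $name = ($record -match '^Name ')[0] -replace '^Name ',''

    # Codes: take the "Codes" row, then keep taking rows until the "Urls" row starts
    $codes   = @()
    $inCodes = $false
    foreach ($line in $record) {
        if     ($line -match '^Codes ') { $inCodes = $true; $codes += ($line -replace '^Codes ','') }
        elseif ($line -match '^Urls ')  { $inCodes = $false }
        elseif ($inCodes)               { $codes += $line.Trim() }
    }

    [PSCustomObject]@{ Name = $name; Codes = ($codes -join ',') }
    $head = $tail + 1                                      # next record starts after the Date row
}

The Urls block can be collected the same way, starting at the '^Urls ' row and stopping at the first row that no longer looks like a URL.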

Maybe something like this would do it:
foreach ($Line in gc file.txt) {
    switch -regex ($Line) {
        '^(Name|Dept|Codes|Urls)' {
            $Capture = $true
            break
        }
        '^[A-Za-z0-9_-]+' {
            $Capture = $false
            break
        }
    }
    if ($Capture) {
        $Line
    }
}
If you want the end result as a CSV file then you may use the Export-Csv cmdlet.
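Export-Csv wants objects rather than raw strings, so one way (a sketch adapting the idea above, not part of the original answer; the indented-continuation assumption and file names are mine) is to collect the captured lines into key/value pairs first and export the resulting object:

$props   = [ordered]@{}
$lastKey = $null
foreach ($Line in Get-Content file.txt) {
    if ($Line -match '^(Name|Codes|Urls)\s+(.*)') {
        $lastKey = $matches[1]                     # start of a field we care about
        $props[$lastKey] = $matches[2]
    }
    elseif ($Line -match '^\s+\S' -and $lastKey) {
        $props[$lastKey] += ',' + $Line.Trim()     # indented continuation line
    }
    else {
        $lastKey = $null                           # any other line ends the capture
    }
}
[PSCustomObject]$props | Export-Csv out.csv -NoTypeInformation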

Given that c:\temp\file.txt contains:
Name John Smith
Dept Accounting
Codes bas-2349,cav-3928,deg-3942
iye-2830,tel-3890
Urls hxxp://blah.com
hxxp://foo.com
hxxp://foo2.com
Some text I dont care about
More text i dont care about
.
.
Date 3/12/2013
You can use regular expressions like this:
$a = Get-Content C:\temp\file.txt
$b = [regex]::match($a, "^.*Codes (.*)Urls (.*)Some.*$", "Multiline")
$codes = $b.groups[1].value -replace '[ ]{2,}',','
$urls = $b.groups[2].value -replace '[ ]{2,}',','

If all files have the same structure you could do something like this:
$srcdir = "C:\Test"
$outfile = "$srcdir\out.csv"

$re = '^Name (.*(?:\r\n .*)*)\r\n' +
      'Dept .*(?:\r\n .*)*\r\n' +
      'Codes (.*(?:\r\n .*)*)\r\n' +
      'Urls (.*(?:\r\n .*)*)' +
      '[\s\S]*$'

Get-ChildItem $srcdir -Filter *.txt | % {
    [io.file]::ReadAllText($_.FullName)
} | Select-String $re | % {
    $f = $_.Matches | % { $_.Groups } | ? { $_.Index -gt 0 }
    New-Object -TypeName PSObject -Prop @{
        'Name'  = $f[0].Value;
        'Codes' = $f[1].Value;
        'Urls'  = $f[2].Value;
    }
} | Export-Csv $outfile -NoTypeInformation

Related

I need to get numbers out of a website script after a certain string

I'm trying to get a certain string of numbers (the numbers vary in length on each reload) out of a script tag on a website. However, I am struggling to figure out how to do it, as I am stuck on PowerShell v2 and cannot upgrade it.
I've managed to get the full script by loading the site in IE and getting the elements by tag name "script", and I've attempted some regex to find the string but can't quite figure it out.
I have also tried stripping the characters off the front and back of the script; that's when I realised the lengths of the numbers change each time.
Part of the script is:
var value = document.wizform.selActivities.options[document.wizform.selActivities.selectedIndex].value;
if (value == "Terminate") {
    if (confirm("Are you sure you want to terminate the selected business process(es)?")) {
        document.wizform.action = "./Page?next=page.actionrpt&action=terminate&pos=0&1006999619";
        javascript:document.wizform.submit();
    }
} else if (value == "TerminateAndRestart") {
    if (confirm("Are you sure you want to terminate and restart the selected business process(es)?")) {
        document.wizform.action = "./Page?next=page.actionrpt&action=terminateandrestart&pos=0&237893352";
        javascript:document.wizform.submit();
    }
}
The part I want to capture is the numbers here
document.wizform.action = "./Page?next=page.actionrpt&action=terminateandrestart&pos=0&237893352";
The PowerShell code I have so far is
$checkbox = $ie.Document.getElementsByTagName("script") | Where-Object {
    $_.outerHTML -like "*./Page?next=page.actionrpt&action=terminate*"
} # | select -Expand outerHTML
$content = $checkbox
$matches = [regex]::Matches($content, '".\action=terminate\.([^"]+)')
$matches | ForEach-Object {
    $_.Groups[1].Value
}
What I would like is for PowerShell to have just the number as a variable, so in the example above I would like to end up with either 0&237893352 or just 237893352 (the 0& part does not change, so I can add it back in afterwards if I need to).
Use a positive lookbehind assertion for matching the particular action you're interested in:
$re = '(?<=action=terminateandrestart&pos=)0&\d+'
$content |
Select-String -Pattern $re |
Select-Object -Expand Matches |
Select-Object -Expand Value
(?<=...) is a regular expression construct called a "positive lookbehind assertion". It allows matching something that is preceded by a particular string (in your case "action=terminateandrestart&pos=") without making that string part of the returned match. This way you can look for the string "action=terminateandrestart&pos=" followed by "0&" and one or more digits (\d+), and return only the "0&" and the digits.
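If you only want the digits without the leading "0&" (the question says either form is fine), a small variant (my addition, not part of the original answer) is to fold the "0&" into the lookbehind so only the digits are returned:

$re = '(?<=action=terminateandrestart&pos=0&)\d+'    # match only the digits after "0&"
$content |
    Select-String -Pattern $re |
    Select-Object -Expand Matches |
    Select-Object -Expand Value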

Join multiple lines into One (.cap file) CentOS

A single entry has multiple lines. Each entry is separated by two blank lines.
Each entry has to be made into a single line followed by a delimiter (;).
Sample Input:
Name:Sid
ID:123

Name:Jai
ID:234

Name:Arun
ID:12
Tried replacing the blank lines with cat test.cap | tr -s [:space:] ';'
Output:
Name:Sid;ID:123;Name:Jai;ID:234;Name:Arun;ID:12;
Expected Output:
Name:SidID:123;Name:JaiID:234;Name:ArunID:12;
The same happens with xargs.
I've used a sed command as well, but it only joined two lines into one, whereas I have 132 lines in one entry and 1000 such entries in one file.
You may use
cat file | awk 'BEGIN { FS = "\n"; RS = "\n\n"; ORS=";" } { gsub(/\n/, "", $0); print }' | sed 's/;;*$//' > output.file
Output:
Name:SidID:123;Name:JaiID:234;Name:ArunID:12
Notes:
FS = "\n" will set field separators to a newline`
RS = "\n\n" will set your record separators to double newline
gsub(/\n/, "", $0) will remove all newlines from a found record
sed 's/;;*$//' will remove the trailing ; added by awk
Could you please try the following.
awk 'NF{val=(val?$0~/^ID/?val $0";":val $0:$0)} END{print val}' Input_file
Output will be as follows.
Name:SidID:123;Name:JaiID:234;Name:ArunID:12;
Explanation: here is an explanation of the above code.
awk ' ##Starting awk program here.
NF{ ##Checking condition if a LINE is NOT NULL and having some value in it.
val=(val?$0~/^ID/?val $0";":val $0:$0) ##Creating a variable val here whose value is concatenating its own value along with check if a line starts with string ID then add a semi colon at last else no need to add it then.
}
END{ ##Starting END section of awk here.
print val ##Printing value of variable val here.
}
' Input_file ##Mentioning Input_file name here.
This might work for you (GNU sed):
sed -r '/./{N;s/\n//;H};$!d;x;s/.//;s/\n|$/;/g' file
If it is not a blank line, append the following line and remove the newline between them. Append the result to the hold space and if it is not the end of the file, delete the current line. At the end of the file, swap to the hold space, remove the first character (which will be a newline) and then replace all newlines (append an extra semi-colon for the last line only) with semi-colons.

PowerShell Parse INF file

I am trying to parse an INF file; specifically, the driver version. I am new to PowerShell, so I've gotten only this far.
The file looks like this:
[Version]
Signature = "$WINDOWS NT$"
Class = Bluetooth
ClassGuid = {e0cbf06c-cd8b-4647-bb8a-263b43f0f974}
Provider = %PROVIDER_NAME%
CatalogFile = ibtusb.cat
DriverVer=11/04/2014,17.1.1440.02
CatalogFile=ibtusb.cat
The second-to-last line has the information I am looking for; I am trying to parse out just 17.1.1440.02.
One file may contain multiple lines with DriverVer=..., but I am only interested in the first instance.
Right now I have the following script:
$path = "C:\FilePath\file.inf"
$driverVersoin = Select-String -Pattern "DriverVer" -path $path
$driverVersoin[0] # lists only first instance of 'DriverVer'
$driverVersoin # lists all of the instances with 'DriverVer'
Output is:
Filepath\file.inf:7:DriverVer=11/04/2014,17.1.1440.02
But I am only looking for 17.1.1440.02
Make your expression more specific and make the part you want to extract a capturing group.
$pattern = 'DriverVer\s*=\s*(?:\d+/\d+/\d+,)?(.*)'
Select-String -Pattern $pattern -Path $path |
select -Expand Matches -First 1 |
% { $_.Groups[1].Value }
Regular expression breakdown:
DriverVer\s*=\s* matches the string "DriverVer" followed by any amount of whitespace, an equals sign and again any amount of whitespace.
(?:\d+/\d+/\d+,)? matches an optional date followed by a comma in a non-capturing group ((?:...)).
(.*) matches the rest of the line, i.e. the version number you want to extract. The parentheses without the ?: make it a capturing group.
Another option (if the version number is always preceded by a date) would be to just split the line at the comma and select the last field (index -1):
Get-Content $path |
Where-Object { $_ -like 'DriverVer*' } |
Select-Object -First 1 |
ForEach-Object { $_.Split(',')[-1] }
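As a follow-up (my addition, not part of the original answers): if you need to compare driver versions later, you can cast the extracted string to [version] so the comparison is numeric rather than textual:

# Sketch: reuse the split approach above and cast the result to System.Version
$verString = Get-Content $path |
    Where-Object { $_ -like 'DriverVer*' } |
    Select-Object -First 1 |
    ForEach-Object { $_.Split(',')[-1] }
[version]$verString -ge [version]'17.0.0.0'   # e.g. True for 17.1.1440.02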

Given repeating sections, how do I find sections matching certain criteria using Powershell

I need to parse a text file and retrieve data from it, based on other data in the same file.
I need to find the lines that say "Not ok" and then find the Nodes they are under.
I know how to pull the data in and find the "Not ok" lines and the Nodes. I also have an idea that I'm sure is overly complicated: I can parse the Node lines into an array, like
$test = (select-string -path C:\error.txt -Pattern "Node:").linenumber
then find the line numbers of the "Not ok" lines and work backwards from there, but this seems like the most difficult way to do this. I'm familiar with PS but not an expert.
$test2 = (select-string -path C:\error.txt -Pattern "Not ok").linenumber
So, to spell out what I need: parse the file for each Node, check whether the lines below it contain "Not ok", and if so set the node name to a variable; if "Not ok" isn't found, move on to the next node.
Thanks for any help.
An example txt file is below:
Node: Server
*********************
Line 1 ok
line 2 ok
line 3 ok
Line 4 Not ok
line 5 ok
line 6 ok
*********************
Node: Server2
*********************
Line 1 ok
line 2 ok
line 3 Not ok
Line 4 ok
line 5 ok
line 6 ok
*********************
$errorNodes = @()
Get-Content C:\temp\test.txt | ForEach-Object {
    if ($_ -imatch 'Node: (.+)$') {
        $node = $Matches[1]
    }
    if ($_ -imatch 'not ok') {
        $errorNodes += $node
    }
}
$errorNodes
$errorNodes
Explanation
Get-Content reads a file line by line.
For each line, first check to see if it's a node; if so, set the $node variable to the current node's name.
Then check to see if the line matches the text 'not ok'. If so, add the node name to the list of error nodes (the array variable $errorNodes).
So at the end, $errorNodes will contain the nodes with problems.
If your list is long, this should be a quicker way to parse (also less code :)):
$nodes = [Regex]::Split((Get-Content info.txt), 'Node:')
# '?' is an alias for Where-Object
$bad = $nodes | ? { $_.ToLower().Contains('not ok') }
$bad now also contains all the text under each node containing "not ok" (in the event there are multiple lines that are not ok).
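If you only want the node names rather than the full text chunks, a hedged follow-up (my addition; it relies on the node name being the first token after the "Node:" split) would be:

# Sketch: pull the first token (the server name) out of each "not ok" chunk
$badNames = $bad | ForEach-Object {
    if ($_ -match '^\s*(\S+)') { $matches[1] }
}
$badNames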
This answer is most likely more complicated than it needs to be, but it returns useful objects that, depending on what else the OP needs to do in his code, can be handy for further processing. For this example I used the OP's file structure and added some extra nodes to make the output a little more verbose.
$file = Get-Content "c:\temp\test.txt" -Raw
$file -split '\s+(?=Node: \w+)' | % {
    $stringData = (($_ -replace ": "," = ") -replace 'line\W+(\d+)\W+','Line$1 = ') -replace '\*+'
    New-Object PSObject -Property $(ConvertFrom-StringData $Stringdata)
} | select node,line* | Format-Table
Using PowerShell 3.0: the code reads the file as a whole string (not a string array) using the -Raw parameter. $file is then split at the text "Node: ", which breaks the nodes up into separate objects.
In order to create the custom object we need to make sure all the items of a node are name=value pairs. To accomplish this I nested some -replace operations.
$_ -replace ": "," = " - changes the first line to "Node = Servername"
-replace 'line\W+(\d+)\W+','Line$1 = ' - converts "Line # ok" into "Line# = Ok/Not Ok", where # is the particular line 1-6
-replace '\*+' - removes the lines that contain just asterisks
The formatted string is used as input for New-Object PSObject -Property $(ConvertFrom-StringData $Stringdata).
After that we can control the piped output like we would almost any other object. The Select-Object statement ensures that Node appears first in the output.
The following is my sample output:
Node    Line4  Line5 Line6 Line1  Line2 Line3
----    -----  ----- ----- -----  ----- -----
Server  Not ok ok    ok    ok     ok    ok
Server2 ok     ok    ok    ok     ok    Not ok
Server4 ok     ok    ok    Not ok ok    ok
Server3 ok     ok    ok    ok     ok    ok
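If the goal is still just the list of nodes with a problem, a hedged follow-up (my addition; it assumes the pipeline above is captured in a variable instead of ending in Format-Table) could filter the objects:

# Sketch: keep only nodes where any LineN property reports "Not ok"
$nodes = $file -split '\s+(?=Node: \w+)' | % {
    $stringData = (($_ -replace ": "," = ") -replace 'line\W+(\d+)\W+','Line$1 = ') -replace '\*+'
    New-Object PSObject -Property $(ConvertFrom-StringData $Stringdata)
}
$nodes | Where-Object {
    $_.PSObject.Properties | Where-Object { $_.Name -like 'Line*' -and $_.Value -match 'Not ok' }
} | Select-Object -ExpandProperty Node    # e.g. Server, Server2, Server4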

Parse out spaces in between words in PowerShell

My goal is to eventually generate a CSV file that I can use in PowerShell. I am having trouble with the delimiter aspect of the command, though.
My problem is that there are different amounts of whitespace between the headers. Should I try to replace all the whitespace with a "," so I can Import-Csv easily? If so, how? Is there another way to convert this to a CSV easily?
At first I tried to replace every space with a ",", but it obviously didn't turn out right. Here's what I have so far:
$Data = GC $FileLocation|Select -skip 1|%{
    [PSCustomObject][Ordered]@{
        "Interface"=$_.substring(0,40).Trim()
        "IP-Address"=$_.Substring(41,24).Trim()
        "OK?"=$_.Substring(66,4).Trim()
        "Method"=$_.Substring(70,7).Trim()
        "Status"=$_.Substring(76,35).Trim()
        "Protocol"=$_.Substring(112,($_.length - 111))
    }
}
$data
Try this; it will replace one or more instances of whitespace with a single comma:
$data = Get-Content $filelocation
$cleanedUpData = $data -replace "\s+",","
$csv = Convertfrom-Csv $cleanedUpData
$csv ## this line will output the resultant csv object
I'm actually going to go a whole other route with this. Since you know where the columns should be, I would suggest pulling substrings and trimming them to get your data. Check this out:
$Data = GC $FileLocation|Select -skip 1|%{
    [PSCustomObject][Ordered]@{
        "Interface"=$_.substring(0,40).Trim()
        "IP-Address"=$_.Substring(41,24).Trim()
        "OK?"=$_.Substring(66,4).Trim()
        "Method"=$_.Substring(70,7).Trim()
        "Status"=$_.Substring(76,35).Trim()
        "Protocol"=$_.Substring(112,($_.length - 112))
    }
}
Now, I do not know where the actual columns line up, so you will need to modify the Substring qualifiers for each property listed there. Remember that it is the starting column (counting from 0, not 1) and then how many characters you want to capture (everything up to the next field, including spaces, since the code trims those for you). The numbers listed are just some fabricated ones that worked for the test I ran; yours will be different!
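Since the stated goal was a CSV file, a one-line follow-up (my addition, with an assumed output path) is to pipe the resulting objects to Export-Csv:

# Sketch: write the parsed objects out as a CSV file
$Data | Export-Csv "C:\temp\interfaces.csv" -NoTypeInformation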
