Split EDI X12 files using PowerShell (powershell-2.0)

I am likely reinventing the wheel here, but this is my attempt at partly solving an issue, and I'm asking the community for help with the rest.
My task is to split EDI X12 documents into their own files (one ISA-to-IEA interchange per file) and put each segment on its own CRLF-terminated line (similar to EDI2.EDI below).
Below are my PowerShell script and example EDI documents 1, 2, and 3.
My script successfully splits a contiguous X12 EDI document from ISA to IEA and adds a CRLF after each segment, so that one contiguous string becomes something more readable. This works well and even handles any element separator and any segment terminator.
My problem is dealing with non-contiguous documents (e.g. EDI2) or combined ones (e.g. EDI3). The source folder could contain files in any of the formats shown below. If a file already contains CRLFs, I just need to split it from ISA to IEA. My script fails when I pull in files that already have CRLFs.
Could someone help me solve this?
$sourceDir = "Z:\temp\EDI\temp\"
$targetDir = "Z:\temp\EDI\temp\archive"
<##### F U N C T I O N S #####>
<#############################>
Function FindNewFile
{
Param (
[Parameter(mandatory=$true)]
[string]$filename,
[int]$counter)
$filename = Resolve-Path $filename
$validFileName = "{0}\{1} {2}{3}" -f $targetDir, #([system.io.fileinfo]$filename).DirectoryName,
([system.io.fileinfo]$filename).basename,
$counter, #"1", #([guid]::newguid()).tostring("N"),
([system.io.fileinfo]$filename).extension
Return $validFileName
}
<###### M A I N L I N E ######>
<#############################>
If(test-path $sourceDir)
{
$files = @(Get-ChildItem $sourceDir | Where {!$_.PsIsContainer -and $_.extension -eq ".edi" -and $_.length -gt 0})
"{0} files to process. . ." -f $files.count
If($files)
{
If(!(test-path $targetDir))
{
New-Item $targetDir -ItemType Directory | Out-Null
}
foreach ($file in $files)
{
$me = $file.fullname
# Get the new file name
$isaCount = 1
$newFile = FindNewFile $me $isaCount
$data = get-content $me
# Reset variables for each new file
$dataLen = [int] $data.length
$linDelim = $null
$textLine = $null
$firstRun = $True
$errorFlag = $False
for($x=0; $x -lt $data.length; $x++)
{
$textLine = $data.substring($x, $dataLen)
$findISA = "ISA{0}" -f $textLine.substring(3,1)
If($textLine.substring(0,4) -eq $findISA)
{
$linDelim = $textLine.substring(105, 1)
If(!($FirstRun))
{
$isaCount++
$newFile = FindNewFile $me $isaCount
}
$FirstRun = $False
}
If($linDelim)
{
$delimI = $textLine.IndexOf($linDelim) + 1
$textLine = $textLine.substring(0,$delimI)
$fLine = $textLine
add-content $newFile $fLine
$x += $fLine.length - 1
$dataLen = $data.length - ($x + 1)
}
Else
{
$errorFlag = $True
"`t=====> {0} is not a valid EDI X12 file!" -f $me
$x += $data.length
}
}
If(!($errorFlag))
{
"{0} contained {1} ISA's" -f $me, $isaCount
}
}
}
Else
{
"No files in {0}." -f $sourceDir
}
}
Else
{
"{0} does not exist!" -f $sourceDir
}
Filename: EDI1.EDI
ISA*00* *00* *08*925xxxxxx0 *01*78xxxx100 *170331*1630*U*00401*000000114*0*P*>~GS*FA*8473293489*782702100*20170331*1630*42*T*004010UCS~ST*997*116303723~SE*6*116303723~GE*1*42~IEA*1*000000114~ISA*00* *00* *08*WARxxxxxx *01*78xxxxxx0 *170331*1545*U*00401*000002408*0*T*>~GS*FA*5035816100*782702100*20170331*1545*1331*T*004010UCS~ST*997*000001331~~SE*24*000001331~GE*1*1331~IEA*1*000002408~
Filename: EDI2.EDI
ISA*00* *00* *ZZ*REINxxxxxxxDSER*01*78xxxx100 *170404*0819*|*00501*100000097*0*P*}~
GS*PO*REINHxxxxxxDSER*782702100*20170404*0819*1097*X*005010~
ST*850*1097~
SE*14*1097~
GE*1*1097~
IEA*1*100000097~
Filename: EDI3.EDI
ISA*00* *00* *08*925xxxxxx0 *01*78xxxx100 *170331*1630*U*00401*000000114*0*P*>~GS*FA*8473293489*782702100*20170331*1630*42*T*004010UCS~ST*997*116303723~SE*6*116303723~GE*1*42~IEA*1*000000114~ISA*00* *00* *08*WARxxxxxx *01*78xxxxxx0 *170331*1545*U*00401*000002408*0*T*>~GS*FA*5035816100*782702100*20170331*1545*1331*T*004010UCS~ST*997*000001331~~SE*24*000001331~GE*1*1331~IEA*1*000002408~
ISA*00* *00* *ZZ*REINxxxxxxxDSER*01*78xxxx100 *170404*0819*|*00501*100000097*0*P*}~
GS*PO*REINHxxxxxxDSER*78xxxxxx0*20170404*0819*1097*X*005010~
ST*850*1097~
SE*14*1097~
GE*1*1097~
IEA*1*100000097~
FWIW, I've compiled this code from all over the net including stackoverflow.com. If you see your code and desire recognition, let me know and I'll add it. I'm not claiming any of this is original! My motto is "ARRRGH!"

EDI3 is an invalid X12 document: each file should contain only one ISA segment, with repeated envelopes if required.
The segment terminator should also be used consistently. In EDI3, some segments end with ~ alone and others with ~ followed by a line break, which is again invalid.

The segment terminator is usually a tilde ("~"). Strictly, it is whatever character follows ISA16 in the ISA segment.
It can be followed by nothing, "\n", or "\r\n". That suffix is optional and exists only for human readability. Some implementations might be more relaxed about the X12 standard.
https://www.ibm.com/support/knowledgecenter/en/SS6V3G_5.3.1/com.ibm.help.gswformstutscreen.doc/GSW_EDI_Delimiters.html
https://docs.oracle.com/cd/E19398-01/820-1275/agdbj/index.html
https://support.microsoft.com/en-sg/help/2723596/biztalk-2010-configuring-segment-terminator-for-an-x12-encoded-interch
BTW, check my splitter/viewer: https://gist.github.com/ppazos/94a63ab18910ab0c0d23c9ff4ff7e5c2
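A likely reason the script breaks on CRLF'd files is how `Get-Content` reads them. When a file contains line breaks, it returns one string per line, so `$data` becomes an array. At that point `$data.substring(...)` and `$data.length` no longer operate on the file's text.

Below is a minimal sketch of a different approach, not the original script. It reads the whole file as one string, then walks it interchange by interchange. `Split-X12Interchange` is a hypothetical name, and the sketch rests on these assumptions:

- Each interchange starts with a well-formed ISA segment.
- The segment terminator is not itself CR or LF.
- The terminator is the character after ISA16, found by counting 16 element separators, rather than a fixed offset of 105. This also copes with samples whose fixed-width ISA padding was trimmed.
- Delimiters are re-read at every ISA.
- An optional CR/LF suffix may follow each terminator, so EDI1-, EDI2- and EDI3-style input all take the same path.
- Empty segments such as the `~~` in EDI1 are dropped.

```powershell
# Hypothetical helper (not from the original script): splits raw X12 text into
# one string per ISA...IEA interchange, one segment per CRLF-terminated line.
function Split-X12Interchange {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Text
    )

    $pos = 0
    while ($pos -lt $Text.Length) {
        # Skip any CR/LF sitting between interchanges.
        while ($pos -lt $Text.Length -and "`r`n".IndexOf($Text[$pos]) -ge 0) { $pos++ }
        if ($pos -ge $Text.Length) { break }

        if ($pos + 4 -gt $Text.Length -or $Text.Substring($pos, 3) -ne 'ISA') {
            throw "Expected an ISA segment at offset $pos"
        }

        # The element separator is the character right after "ISA".
        $sep = $Text[$pos + 3]

        # Walk to the 16th separator; ISA16 (one char) follows it, then the segment terminator.
        $idx = $pos + 3
        for ($n = 1; $n -lt 16; $n++) {
            $idx = $Text.IndexOf($sep, $idx + 1)
            if ($idx -lt 0) { throw "Truncated ISA segment at offset $pos" }
        }
        if ($idx + 2 -ge $Text.Length) { throw "Truncated ISA segment at offset $pos" }
        $term = $Text[$idx + 2]

        # Collect segments until the IEA segment closes the interchange.
        $segments = New-Object System.Collections.Generic.List[string]
        $start = $pos
        $closed = $false
        while (-not $closed) {
            $end = $Text.IndexOf($term, $start)
            if ($end -lt 0) { throw "No IEA segment for the interchange at offset $pos" }
            # Remove an optional CR/LF suffix left over after the previous terminator.
            $segment = $Text.Substring($start, $end - $start).TrimStart([char[]]"`r`n")
            if ($segment.Length -gt 0) { $segments.Add($segment) }   # also drops empty "~~" segments
            $start = $end + 1
            if ($segment.StartsWith("IEA$sep")) { $closed = $true }
        }

        # Emit the interchange with every segment on its own CRLF-terminated line.
        (($segments | ForEach-Object { $_ + $term }) -join "`r`n") + "`r`n"
        $pos = $start
    }
}
```

To use it in place of the inner `for` loop, read the file as one string with `$text = [System.IO.File]::ReadAllText($me)`. Then write each returned interchange with `[System.IO.File]::WriteAllText((FindNewFile $me $isaCount), $interchange)`, incrementing `$isaCount` each time. That should keep the existing `FindNewFile` naming.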

Related

ProgressBar Overlay Or ProgressBar from Copy Functions

Function GetTotalBytesOfCopyDestination{
param($destinationPath);
$colItems = (Get-ChildItem $destinationPath | Measure-Object -property length -sum)
return $colItems.sum;
}
Function GetBytesOfFile{
param($sourcePath);
return (Get-Item $sourcePath).length;
}
Function GetPosition{
param([double]$currentOfBytesSended)
param([double]$countsOfBytesWillSend)
$position = ($currentOfBytesSended / $countsOfBytesWillSend) * 100;
#range 0 - 100
#(15800 bytes / 1975633689 bytes)*100
#
return $position;
}
Function Copy-File {
#.Synopsis
# Copies all files and folders in $source folder to $destination folder, but with .copy inserted before the extension if the file already exists
param($source,$Destination2)
# create destination if it's not there ...
mkdir $Destination2 -force -erroraction SilentlyContinue
[double]$currentOfBytesSended = 0;
[double]$countsOfBytesWillSend = 0;
[double]$countsOfBytesWillSend = GetTotalBytesOfCopyDestination($source);
$progressbar6.Maximum = 100;
$progressbar6.Step = 1;
foreach($original in ls $source -recurse) {
$result = $original.FullName.Replace($source,$Destination2)
while(test-path $result -type leaf){ $result = [IO.Path]::ChangeExtension($result,"copy$([IO.Path]::GetExtension($result))") }
[System.Windows.Forms.Application]::DoEvents()
if($original.PSIsContainer) {
mkdir $result -ErrorAction SilentlyContinue
} else {
copy $original.FullName -destination $result
[System.Windows.Forms.Application]::DoEvents()
$currentCopyingFileSizeInBytes = 0;
$currentCopyingFileSizeInBytes = GetBytesOfFile($original.FullName);
$currentOfBytesSended = [double]$currentOfBytesSended + [double]$currentCopyingFileSizeInBytes;
#$currentOfBytesSended += $currentCopyingFileSizeInBytes;
$progressbar6.Value=GetPosition([double]$currentOfBytesSended, [double]$countsOfBytesWillSend);
[System.Windows.Forms.Application]::DoEvents()
#$progressbar6.PerformStep();
$progressbar6.Refresh();
}
}
}
What I'm trying to do is have the Copy-File function copy files and directories from a remote machine to the local machine. While it copies, the progress bar should move according to the total amount to copy and the amount already copied, with GetPosition computing the bar's position. Instead, I get this error:
ERROR: GetPosition : Cannot process argument transformation on parameter 'currentOfBytesSended'. Cannot convert the "System.Object[]" value of type "System.Object[]"
ERROR: to type "System.Double".
Assit App.pff (337): ERROR: At Line: 337 char: 35
ERROR: + $progressbar6.Value=GetPosition([double]$currentOfBytesSended, [double]$coun ...
ERROR: + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ERROR: + CategoryInfo : InvalidData: (:) [GetPosition], ParameterBindingArgumentTransformationException
ERROR: + FullyQualifiedErrorId : ParameterArgumentTransformationError,GetPosition
ERROR:
>> Script Ended
First of all, welcome to StackOverflow. Try to break up your questions into individual questions, rather than grouping many of them together. The format of this site is better suited to one question, one answer, unless maybe there is guidance that states otherwise.
Excluding Folders
You can ignore folders by using the Exclude parameter on the Get-ChildItem command.
Imagine that you have a folder structure similar to the following:
c:\gci
|
|\a
| \a.txt
|\b
| \b.txt
|\c
| \c.txt
If you only want to get the contents of c:\gci\a and c:\gci\c, you can exclude c:\gci\b using the command:
Get-ChildItem -Path c:\gci -Exclude b* -Recurse;
Keep in mind that this will also exclude other items starting with "b" such as c:\gci\bcd.txt.
Progress Bar
You can create a progress bar using the Write-Progress command. You will have to write your own custom logic to determine which tasks to report progress on and what the percentage of progress is. You can base it on the number of bytes copied vs. the total number of bytes to be copied, the number of files copied vs. the total number of files, or some other metric.
There's no simple answer to this one. You will have to perform the calculations yourself, and call Write-Progress at the appropriate times.
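As for the error itself: PowerShell functions are called like cmdlets, with space-separated arguments. `GetPosition([double]$currentOfBytesSended, [double]$countsOfBytesWillSend)` passes a single argument, an array of two values. That is exactly the `System.Object[]` to `System.Double` conversion the error reports. A single parenthesized argument, as in `GetBytesOfFile($original.FullName)`, happens to work.

PowerShell also recognizes only one `param()` block, at the top of the function, so both parameters belong in it.

Below is a minimal sketch of a corrected `GetPosition` that keeps the asker's parameter names. The cap at 100 is an addition. A WinForms `ProgressBar` throws if `Value` exceeds `Maximum`, and the copied total may overshoot: `GetTotalBytesOfCopyDestination` sums only the top level of `$source` (no `-Recurse`), while the copy loop recurses. The `Write-Progress` line shows the console alternative from the answer, and its activity text is made up.

```powershell
# GetPosition with a single param block holding both parameters.
function GetPosition {
    param(
        [double]$currentOfBytesSended,
        [double]$countsOfBytesWillSend
    )
    if ($countsOfBytesWillSend -le 0) { return 100 }
    # Cap at 100: ProgressBar.Value (Maximum = 100) and -PercentComplete expect 0-100.
    [Math]::Min(100, [int](($currentOfBytesSended / $countsOfBytesWillSend) * 100))
}

# Call functions like commands: space-separated arguments, no parentheses, no commas.
# GetPosition([double]$a, [double]$b) would pass ONE argument (an Object[] array),
# which is what produces "Cannot convert the System.Object[] value ... to type System.Double".
$percent = GetPosition 15800 1975633689

# Console equivalent of the WinForms bar (hypothetical activity text).
Write-Progress -Activity 'Copying files' -Status "$percent% complete" -PercentComplete $percent
```

In the form, the call would become `$progressbar6.Value = GetPosition $currentOfBytesSended $countsOfBytesWillSend`.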

PowerShell parsing a PDF and extracting multiple lines

I'm using iTextSharp to search a PDF for a keyword, and extract any line(s) that contain that keyword. What I'd like to do is not only extract the line(s) with the keyword but subsequent lines.
For example: the line with the keyword and the next line, or the line with the keyword and the next 2 lines, etc.
I've been hung up on this for a while, trying arrays, hash tables, iterators... none of them work right. Any help is appreciated. This is the basic design I've been working with:
$reader = New-Object iTextSharp.text.pdf.pdfreader -ArgumentList anypdf.pdf
for ($page = 1; $page -le $reader.NumberOfPages; $page++) {
$lines = [char[]]$reader.GetPageContent($page) -join "" -split "`n"
foreach ($line in $lines) {
if ($line -match $searchstring) {
$line = $line -replace "^\[\(|\)\]TJ$", "" -split "\)\-?\d+\.?\d*\(" -join ""
$line = $line -replace "\\([\S])", $matches[1]
Write-host $line
}
}
}
I can't take credit for the logic that strips out the unwanted characters from the PDF, and that may be why I haven't figured this out yet. The above code gets me any line that contains the keyword. The problem seems to be the PDF is split into pages and those pages are split into lines (which are each an array of characters). It would be nice and efficient if I could simply create a hash table of every line in the PDF from the start.
That's what Select-String was invented for.
for ($page = 1; $page -le $reader.NumberOfPages; $page++) {
[char[]]$reader.GetPageContent($page) -join "" -split "`n" `
| Select-String $searchstring -Context 0,2 `
| % {
$_ -replace "^\[\(|\)\]TJ$", "" `
-split "\)\-?\d+\.?\d*\(" -join "" `
-replace "\\([\S])", $_.Matches.Value
}
}
I don't quite understand all the splitting, joining, and replacing you're doing there, so you may need to adjust that.
Also, the above doesn't include the after context, since I wouldn't know where you want it to go. It can be accessed via $_.Context.PostContext.
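Here is a minimal sketch of using that after-context. It runs on a stand-in array of lines rather than real iTextSharp output; the sample lines and the `Total` pattern are made up. `$_.Line` is the matching line, and `$_.Context.PostContext` holds the following lines, up to the count given by `-Context 0,2`.

Because the loop runs `Select-String` once per page, context will not continue across a page break. Collecting the lines from every page into one array first and running `Select-String` once over it would avoid that. It would also give the single list of lines the question asks for.

```powershell
# Sketch: emit each matching line followed by up to two lines of after-context.
$lines = @('header', 'Total: 42', 'line A', 'line B', 'footer')   # stand-in for the PDF text lines
$result = @(
    $lines | Select-String -Pattern 'Total' -Context 0,2 | ForEach-Object {
        $_.Line                   # the line that matched
        $_.Context.PostContext    # the next (up to) 2 lines
    }
)
$result
```

With the real PDF text, the cleanup `-replace`/`-split` expressions would be applied to each emitted line.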

Why will Powershell write to the console, but not to a file?

I have a PowerShell script that sets flags based on various conditions of the file. I've abbreviated it here.
$path = "c:\path"
$srcfiles = Get-ChildItem $path -filter *.htm*
ForEach ($doc in $srcfiles) {
$s = $doc.Fullname
Write-Host "Processing :" $doc.FullName
if (stuff) {$flag1 = 1} else {$flag1 = 0}
if (stuff) {$flag2 = 1} else {$flag2 = 0}
if (stuff) {$flag3 = 1} else {$flag3 = 0}
$t = "$s;$flag1;$flag2;$flag3"
Write-Host "Output: " $t
This all works great. My file processes, the flags are set properly, and a neat semicolon delimited line is generated as $t. However, if I slap these two lines at the end of the function,
$stream = [System.IO.StreamWriter] "flags.txt"
$stream.WriteLine $t
I get this error.
Unexpected token 't' in expression or statement.
At C:\CGC003a.ps1:53 char:25
+ $stream.WriteLine $t <<<<
+ CategoryInfo : ParserError: (t:String) [], ParseException
+ FullyQualifiedErrorId : UnexpectedToken
If I'm reading this right, it appears that Write-Host flushed my variable $t before it got to WriteLine. Before I try Out-File, though, I want to understand what's happening to $t after Write-Host that prevents WriteLine from seeing it as valid. Or is there something else I'm missing?
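Try:
$stream.WriteLine($t)
WriteLine is a method of the .NET StreamWriter object. To pass it a value, you need to enclose the argument in parentheses. Without them, the parser does not treat $t as an argument, hence the "Unexpected token" error; it has nothing to do with Write-Host.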
If you need to append with a StreamWriter, create it like this:
$a = new-object 'System.IO.StreamWriter' -ArgumentList "c:\path\to\flags.txt",$true
where the Boolean argument is $true to append data to the file or $false to overwrite the file.
I suggest passing a full path:
$stream = [System.IO.StreamWriter] "c:\path\to\flags.txt"
Otherwise, the file is created in the .NET current directory, which can differ from PowerShell's current location. It is probably C:\Windows\System32 if you run the PowerShell console as administrator. To check it, type [System.IO.Directory]::GetCurrentDirectory().
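Alternatively, you could use redirection. Inside a loop, use >> rather than > so each line is appended instead of overwriting the file: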
$t > c:\path\to\flags.txt
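Putting the StreamWriter advice together, here is a minimal sketch:

- Create the writer once, before the loop. Creating it inside the loop, as in the question, would try to reopen the file for every document.
- Pass a full path.
- Call `WriteLine` with parentheses.
- Close the writer in a `finally` block. `StreamWriter` buffers its output, so without `Close()` (or `Dispose()`) the text may never reach the file.

The temp-folder path and the sample `$t` values are stand-ins for the real ones.

```powershell
# Sketch: open the writer once, append one line per processed file, then close it.
# The temp-folder path and the sample $t values are stand-ins.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'flags.txt'
$stream = New-Object System.IO.StreamWriter -ArgumentList $path, $true   # $true = append
try {
    foreach ($t in @('c:\path\a.htm;1;0;1', 'c:\path\b.html;0;0;1')) {
        $stream.WriteLine($t)   # .NET method call: the argument goes in parentheses
    }
}
finally {
    $stream.Close()   # flushes buffered text to disk and releases the file
}
```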

Using System.Drawing.Image to Export-Csv with $imageFile.Width and $_.Fullname

I wonder if there is a better way to write this script to gather image dimensions and file paths. The script works great on small to medium-size directories, but I'm not sure it can handle 100,000+ files/folders.
Measure-Command {
[Void][System.Reflection.Assembly]::LoadFile( "C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.Drawing.dll")
$path = "\\servername.corp.company.com\top_directory"
$data = Get-ChildItem -Recurse $path | % {
$imageFile = [System.Drawing.Image]::FromFile($_.FullName) ;
New-Object PSObject -Property @{
name = $_.Name
fullname = $_.Fullname
width = $imageFile.Width
height = $imageFile.Height
length = $_.Length
}
}
$data | Where-Object {$_.width -eq 500 -or $_.width -eq 250 -or $_.width -eq 1250 } |
Export-Csv \\servername.corp.company.com\top_directory\some_directory\log_file.csv -NoTypeInformation }
I don't actually use the Where-Object filter right now.
When I run the above script on a remote directory with approx. 20,000 files and folders, it takes approx. 26 minutes before creating a .csv.
I am running the script from the PowerShell v2 ISE on Windows 7, and I believe the remote server runs Windows Server 2003.
Would running the script directly from the remote server be faster?
Is exporting the CSV slow because all the data is collected in memory before being written to the CSV?
If all I had to go through was 20,000 files, I'd wait the 26 minutes, but 500,000 files and folders is a long wait.
I'm testing out the method below, since I think my real issue is not speed but storing too much data in memory. Thanks to a post from George Howarth for this, and to PoSherLife for the top script: http://powershellcommunity.org/tabid/54/aft/4844/Default.aspx
Measure-Command {
[System.Reflection.Assembly]::LoadFile( "C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.Drawing.dll")
"Name|SizeInBytes|Width|Height|FullName" >> C:\Users\fcool\Documents\JPGInfo.txt
$path = "C:\Users\fcool\Documents"
$images = Get-ChildItem -Recurse $path -Include *.jpg
foreach ($image in $images)
{
$name = $image.Name
$length = $image.Length
$imageFile = [System.Drawing.Image]::FromFile($image.FullName)
$width = $imageFile.Width
$height = $imageFile.Height
$FullName = $image.Fullname
"$name|$length|$width|$height|$FullName" >> C:\Users\fcool\Documents\JPGInfo.txt
$imageFile.Dispose()
}
}
Is there any risk or loss of performance when running these scripts on non-image file types?
When I don't exclude non-images, I get this error:
Exception calling "FromFile" with "1" argument(s): "Out of memory."
At C:\scripts\directory_contents_IMAGE_DIMS_ALT_method.ps1:13 char:46
+ $imageFile = [System.Drawing.Image]::FromFile <<<< ($image.FullName)
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : DotNetMethodException
Thanks for any advice!
And thanks again to George Howarth and PoSherLife for the scripts!
Using -Filter with Get-ChildItem is much faster than -Include; however, you can only apply one filter string. So if you only want to match *.jpg, you can use -Filter. In my testing, -Filter was close to 5 times faster than -Include.
Get-ChildItem -Recurse \\server\Photos -Filter *.jpg | % {
$image = [System.Drawing.Image]::FromFile($_.FullName)
if ($image.width -eq 500 -or $image.width -eq 250 -or $image.width -eq 1250) {
New-Object PSObject -Property @{
name = $_.Name
fullname = $_.Fullname
width = $image.Width
height = $image.Height
length = $_.Length
}
}
# Release the image (and its file handle) before moving on to the next file
$image.Dispose()
} | Export-Csv 'C:\log.csv' -NoTypeInformation
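Below is a sketch that combines the answer's `-Filter` approach with two further changes. Both are assumptions rather than anything from the posts:

- Each image, and the file stream under it, is disposed after reading.
- Files that fail to load are skipped with a warning instead of stopping the run.

The "Out of memory" error on non-image files is how `Image.FromFile` reports an unrecognized image format (GDI+ surfaces it as `OutOfMemoryException`). It does not necessarily mean memory actually ran out.

`Get-ImageInfo` is a hypothetical name. It uses `Image.FromStream(stream, $false, $false)`, whose last argument skips full validation of the image data and may be faster when only width and height are needed. Piping straight into `Export-Csv`, instead of collecting everything in `$data` first, lets rows be written as they arrive.

```powershell
# Sketch only: loads System.Drawing by name instead of the hard-coded Framework path.
Add-Type -AssemblyName System.Drawing

# Hypothetical helper: emits name/fullname/width/height/length for each readable image.
function Get-ImageInfo {
    param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [System.IO.FileInfo]$File
    )
    process {
        $stream = $null
        $image = $null
        try {
            $stream = $File.OpenRead()
            # useEmbeddedColorManagement = $false, validateImageData = $false
            $image = [System.Drawing.Image]::FromStream($stream, $false, $false)
            New-Object PSObject -Property @{
                name     = $File.Name
                fullname = $File.FullName
                width    = $image.Width
                height   = $image.Height
                length   = $File.Length
            }
        }
        catch {
            # Not a readable image: report it and move on to the next file.
            Write-Warning "Skipping $($File.FullName): $($_.Exception.Message)"
        }
        finally {
            # Release GDI+ memory and the file handle before the next file.
            if ($image) { $image.Dispose() }
            if ($stream) { $stream.Dispose() }
        }
    }
}
```

For example: `Get-ChildItem -Recurse \\server\Photos -Filter *.jpg | Get-ImageInfo | Export-Csv C:\log.csv -NoTypeInformation`. If needed, insert the width `Where-Object` filter before `Export-Csv`.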

Text parsing in Powershell: Identify a target line and parse the next X lines to create objects

I am parsing text output from a disk array that lists information about LUN snapshots in a predictable format. After trying every other way to get this data out of the array in a usable manner, the only thing I can do is generate this text file and parse it. The output looks like this:
SnapView logical unit name: deleted_for_security_reasons
SnapView logical unit ID: 60:06:01:60:52:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX
Target Logical Unit: 291
State: Inactive
This repeats all through the file with one line break between each group. I want to identify a group, parse each of the four lines, create a new PSObject, add the value for each line as a new NoteProperty, and then add the new object to a collection.
What I can't figure out is how, once I identify the first line in the block of four lines, to then process the text from lines two, three, and four. I'm looping through each line, finding the start of a block, and then processing it. Here's what I have so far, with comments where the magic goes:
$snaps = get-content C:\powershell\snaplist.txt
$snapObjects = @()
foreach ($line in $snaps)
{
if ([regex]::ismatch($line,"SnapView logical unit name"))
{
$snapObject = new-object system.Management.Automation.PSObject
$snapObject | add-member -membertype noteproperty -name "SnapName" -value $line.replace("SnapView logical unit name: ","")
#Go to the next line and add the UID
#Go to the next line and add the TLU
#Go to the next line and add the State
$snapObjects += $snapObject
}
}
I have scoured Google and Stack Overflow trying to figure out how to reference the line number of the object I'm iterating through, and I can't figure it out. I may rely on foreach loops too much, and that may be affecting my thinking; I don't know.
As you say, I think you're thinking too much foreach when you should be thinking for. The below modification should be more along the lines of what you're looking for:
$snaps = get-content C:\powershell\snaplist.txt
$snapObjects = @()
for ($i = 0; $i -lt $snaps.length; $i++)
{
if ([regex]::ismatch($snaps[$i],"SnapView logical unit name"))
{
$snapObject = new-object system.Management.Automation.PSObject
$snapObject | add-member -membertype noteproperty -name "SnapName" -value ($snaps[$i]).replace("SnapView logical unit name: ","")
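# $snaps[$i+1]: the next line holds the UID
$snapObject | add-member -membertype noteproperty -name "UID" -value ($snaps[$i+1]).replace("SnapView logical unit ID: ","")
# $snaps[$i+2]: the line after that holds the TLU
$snapObject | add-member -membertype noteproperty -name "TLU" -value ($snaps[$i+2]).replace("Target Logical Unit: ","")
# $snaps[$i+3]: the fourth line holds the State
$snapObject | add-member -membertype noteproperty -name "State" -value ($snaps[$i+3]).replace("State: ","")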
$snapObjects += $snapObject
}
}
A while loop may be even cleaner because then you can increment $i by 4 instead of 1 when you hit this case, but since the other 3 lines won't trigger the "if" statement... there's no danger, just a few wasted cycles.
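Here is a minimal sketch of that `while` variant. When a block header is found, the function reads the next three lines and jumps `$i` by 4; otherwise it advances by 1. `ConvertFrom-SnapList` and its property names are hypothetical. It assumes the four lines of each block always appear together, in the order shown in the question.

```powershell
# Hypothetical helper: turns the snapshot listing into one object per 4-line block.
function ConvertFrom-SnapList {
    param([string[]]$Lines)
    $i = 0
    while ($i -lt $Lines.Count) {
        if ($Lines[$i] -match '^SnapView logical unit name:\s*(.*)$' -and $i + 3 -lt $Lines.Count) {
            New-Object PSObject -Property @{
                SnapName = $Matches[1]
                UID      = ($Lines[$i + 1] -replace '^SnapView logical unit ID:\s*', '')
                TLU      = ($Lines[$i + 2] -replace '^Target Logical Unit:\s*', '')
                State    = ($Lines[$i + 3] -replace '^State:\s*', '')
            }
            $i += 4   # skip past the rest of this block
        }
        else {
            $i++      # blank separator line or anything unexpected
        }
    }
}
```

For example: `$snapObjects = @(ConvertFrom-SnapList -Lines (Get-Content C:\powershell\snaplist.txt))`.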
Another possibility
function Get-Data {
$foreach.MoveNext() | Out-Null
$null, $returnValue = $foreach.Current.Split(":")
$returnValue
}
foreach($line in (Get-Content "C:\test.dat")) {
if($line -match "SnapView logical unit name") {
$null, $Name = $line.Split(":")
$ID = Get-Data
$Unit = Get-Data
$State = Get-Data
New-Object PSObject -Property @{
Name = $Name.Trim()
ID = ($ID -join ":").Trim()
Unit = $Unit.Trim()
State = $State.Trim()
}
}
}
Name ID Unit State
---- -- ---- -----
deleted_for_security_reasons 60:06:01:60:52:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX 291 Inactive
switch -regex -file C:\powershell\snaplist.txt {
'^.+me:\s+(\S*)' {$SnapName = $Matches[1]}
'^.+ID:\s+(\S*)' {$UID = $Matches[1]}
'^.+it:\s+(\S*)' {$TLU = $Matches[1]}
'^.+te:\s+(\S*)' {
New-Object PSObject -Property @{
SnapName = $SnapName
UID = $UID
TLU = $TLU
State = $Matches[1]
}
}
}
Try this (ConvertFrom-String requires Windows PowerShell 5.0 or 5.1):
Get-Content "c:\temp\test.txt" | ConvertFrom-String -Delimiter ": " -PropertyNames Intitule, Value
If you have multiple blocks, try this:
$template=@"
{Data:SnapView logical unit name: {UnitName:reasons}
SnapView logical unit ID: {UnitId:12:3456:Zz}
Target Logical Unit: {Target:123456789}
State: {State:A State}}
"@
Get-Content "c:\temp\test.txt" | ConvertFrom-String -TemplateContent $template | % {
[pscustomobject]@{
UnitName=$_.Data.UnitName
UnitId=$_.Data.UnitId
Target=$_.Data.Target
State=$_.Data.State
}
}
