Parse through text file and write out data - parsing

I'm working on the first steps towards creating a PowerShell script that will read through printer logs (probably using the Get-WmiObject cmdlet) and parse them. Afterwards, I plan on having the script output to a .txt file the name of the printer, a counter of the number of times each printer was used (if possible), and specific info found in the logs.
In order to do this, I've decided to try working backwards. Below is a small portion of what the logs will look like:
10 Document 81, A361058/GPR0000151814_1: owned by A361058 was printed on R3556 via port IP_***.***.***.***. Size in bytes: 53704; pages printed: 2 20130219123105.000000-300
10 Document 80, A361058/GPR0000151802_1: owned by A361058 was printed on R3556 via port IP_***.***.***.***. Size in bytes: 53700; pages printed: 2
Working backwards and just focusing on parsing first, I'd like to be able to specifically get the "/GPR", the "R3556" (in general, R**, as this is the printer name), and a counter that shows how often a specific printer appears in the log files.
It has been a while since I last worked with Powershell, however at the moment this is what I've managed to create in order to try accomplishing my goal:
Select-String -Path "C:\Documents and Settings\a411882\My Documents\Scripts\Print Parse Test.txt" -Pattern "/GPR", " R****" -AllMatches -SimpleMatch
The code does not produce any errors, however I'm also unable to get any output to appear on screen to see if I'm capturing the /GPR and printer name. At the moment I'm just trying to ensure I'm gathering the right output before worrying about any counters. Would anyone be able to assist me and tell me what I'm doing wrong with my code?
Thanks!
EDIT: Fixed a small error in my code that was causing no data to appear on screen. At the moment the code outputs the entire two lines of test text instead of only the /GPR and printer name. The new output is the following:
My Documents\Scripts\Print Parse Test.txt:1:10 Document 81, A361058/GPR0000151814_1: owned by A361058 was printed on
R3556 via port IP_***.***.***.***. Size in bytes: 53704; pages printed: 2
20130219123105.000000-300
My Documents\Scripts\Print Parse Test.txt:2:10 Document 80, A361058/GPR0000151802_1: owned by A361058 was printed on
R3556 via port IP_***.***.***.***. Size in bytes: 53700; pages printed: 2
I'd like to try having it eventually look something like the following:
/GPR, R****, count: ## (although for now I'm less concerned about the counter)

You can try this. It only returns a line when /GPR (and "on" from "printed on") is present.
Get-Content .\test.txt | % {
    if ($_ -match '(?:.*)(/GPR)(?:.*)(?<=on\s)(\w+)(?:.*)') {
        $_ -replace '(?:.*)(/GPR)(?:.*)(?<=on\s)(\w+)(?:.*)', '$1,$2'
    }
}
Output:
/GPR,R3556
/GPR,R3556
I'm sure there are better regex versions. I'm still learning it :-)
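For instance, anchoring on the literal "printed on" gives a shorter pattern with the same two captures (a sketch against the same test.txt):
Get-Content .\test.txt | % {
    if ($_ -match 'printed on') {
        # greedy .* on both sides; capture 1 is the code, capture 2 the printer
        $_ -replace '^.*(/GPR).*printed on (\w+).*$', '$1,$2'
    }
}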
EDIT: this is easier to read. The regex is still there for the extraction, but I first narrow to the lines containing /GPR using Select-String:
Get-Content .\test.txt | Select-String -SimpleMatch -AllMatches -Pattern "/GPR" | % {
    $_.Line -replace '(?:.*)(/GPR)(?:.*)(?<=on\s)(\w+)(?:.*)', '$1,$2'
}
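For the counter part of the question, one sketch (same test.txt, same regex) is to pipe the extracted pairs through Group-Object:
Get-Content .\test.txt | Select-String -SimpleMatch -Pattern "/GPR" | % {
    $_.Line -replace '(?:.*)(/GPR)(?:.*)(?<=on\s)(\w+)(?:.*)', '$1,$2'
} | Group-Object | % {
    # $_.Name is the extracted "/GPR,printer" pair, $_.Count how often it occurred
    "{0}, count: {1}" -f $_.Name, $_.Count
}
For the two test lines this prints /GPR,R3556, count: 2.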

I generally start with an example of the line I'm matching, and build a regex from that, substituting regex metacharacters for the variable parts of the text. This makes the regex longer, but much more intuitive to read later.
Assign the regex to a variable, and then use that variable in subsequent code to keep the messy details of the regex from cluttering up the rest of the code:
[regex]$DocPrinted =
    'Document \d\d, \w+/(\D{3})[0-9_]+: owned by \w+ was printed on (\w+) via port IP_[0-9.]+ Size in bytes: \d+; pages printed: \d+'
get-content <log file> |
    foreach {
        if ($_ -match $DocPrinted)
        {
            # -match has already populated $matches, so just emit it
            $matches
        }
    }
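Building on that, a minimal sketch of the per-printer counter the question asks for, keyed on the second capture group (<log file> is still a placeholder):
$counts = @{}
get-content <log file> |
    foreach {
        if ($_ -match $DocPrinted) {
            # $matches[1] is the three-letter code (GPR), $matches[2] the printer name
            $counts[$matches[2]] += 1
        }
    }
$counts.GetEnumerator() | foreach { "/GPR, {0}, count: {1}" -f $_.Key, $_.Value }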

Related

Sending IFS File to Outq Prints Line of "#" Symbols

I am attempting to send a file from IFS to an outq on our AS/400 system. Whenever I do, I get exactly what I send, as well as a line of "#" symbols of varying lengths appended to the end.
Here's the command I'm using:
qsh cmd('cat -c /path/test.txt | Rfile -wbQ -c "ovrprtf file(qprint)
outq(*LIBL/ABCD) devtype(*USERASCII) rplunprt(*no) splfname(test) hold(*no)"
qprint')
The content of test.txt is just Hello World!
The output I get when I send the command is
Hello World!####################################################################
I have not found any posts online about a similar problem, and have tried changing values and looking for additional switches to get it to work. Nothing I'm doing seems to fix the issue.
Is there a command or switch that I am missing, or is something I have in there already causing this?
EDIT:
I found this documentation which is the first time I've seen this issue mentioned, but it's not very helpful:
“Messages for a Take Action command might consist of a long string of "at" symbols (#) in a pop-up message. (The Reflex automation Take Action command, which is configured in situations, does not have this problem.) A resolution for this problem is under construction. This problem might be resolved by the time of the product release. If you see this problem, contact IBM Software Support.”
The only differences are: 1) this is not a pop-up message, it's printed. 2) I don't believe we use Tivoli Monitoring, although I could be wrong.
Assuming we do use Tivoli Monitoring, what would the solution be? There's no additional documentation past that, and I am not a system administrator, so I can't really make the call to IBM Software Support myself. And assuming we DON'T use it, what else could cause this issue?
I get different, yet similar, results. I created a test.txt with Windows Explorer, put Hello, world! in it, saved it, and tried the script. I got gibberish for the 'Hello, world!' and then the line of # symbols.
My system is 7.3 TR5, CCSID 37 (US English) and my IFS file is CCSID 1252 (Windows English). Results did not change if I used a stream file of CCSID 819 (US ASCII).
I didn't have any luck modifying Rfile switches.
I found that removing devtype(*userascii) produced printed output in plain English without the # symbols. Do you really need *USERASCII? I would think that would be more for a pre-formatted 'print-ready' file like Postscript or the like.
EDIT: some more things to try
I don't understand why *USERASCII is adding those # symbols; it looks like a translation issue.
I tried this and still got the extra ###... You might have to play with the TOCCSID() parameter. Although a failure, it did give me an idea: what if those # symbols are EBCDIC spaces being sent as-is to the *USERASCII print stream? All we'd need is a way to send only the number of bytes in the stream file, without any padding.
CRTPF FILE(QTEMP/PRTSTMF) RCDLEN(132)
CPY OBJ('/path/test.txt') TOOBJ('/qsys.lib/qtemp.lib/prtstmf.file/prtstmf.mbr') replace(*yes)
ovrprtf file(qprint) outq(*LIBL/prt3812) devtype(*USERASCII) rplunprt(*no) splfname(test) hold(*no)
cpyf prtstmf qprint
The data in QTEMP/PRTSTMF is in ASCII; DSPPFM shows that much. It also shows a bunch of spaces: after all, it is a fixed length file. My next step was to write an RPG program to read the stream file and print it, but Scott Klement already did that: http://www.scottklement.com/PrtStmf.zip
This works on my system:
ovrprtf file(qsysprt) outq(*LIBL/abcd) devtype(*USERASCII) rplunprt(*no) splfname(test) hold(*no)
prtstmf stmf('/path/test.txt') outq(abcd)

I want to count detected strings in file with grep, excluding a specific phrase

Hi, I'm trying to search a log file for the following words and assign the number of matches to a variable as a number:
errors
error
fail
failure
can't
But I don't want to match the words error or errors if they're preceded by "No ".
So ignore error if it's "no error" and ignore errors if it's "no errors".
Here's what I have so far
ErrorCheck=$(grep -vi "No errors" $LOGFILE | grep -ciE "error|fail|can't" $LOGFILE)
It's not working out for me unfortunately; any suggestions would be great.
PS I'm using microcore running busybox shell, so I have a slightly lean environment to work in.
All comments and suggestions welcome.
Thanks for your input.
Well, I think one problem is that the "-v" flag for grep will omit the entire line containing 'No errors' or any other string you specify, so if a single line contains both "no failures" and "can't", for example, you'd have a problem. Also note that the second grep in your pipeline is given $LOGFILE as an argument, so it reads the file directly and ignores the filtered input coming through the pipe.
One possible (sort of janky) way to do this could be to store values in three different variables: NUM_COUNTED minus NUM_NOTS = ErrorCheck, which should account for having a "no failure" and an actual failure indicator in the same line.
NUM_COUNTED=$(grep -ciF "error
fail
can't" "$LOGFILE")
NUM_NOTS=$(grep -ciF "no error
no fail" "$LOGFILE")
ErrorCheck=`expr $NUM_COUNTED - $NUM_NOTS`
Alternatively, this seems to give (mostly accurate) results:
ErrorCheck=$(grep -vi 'No errors' "$LOGFILE" | grep -ciF "error
fail
can't")
The -F flag just tells grep to look for string literals (error, fail, and can't) which are newline separated.
Hope this helps.
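One more caveat: grep -c counts matching lines, not individual matches, so two failures on one line count only once. If that matters, here is a rough sketch, assuming your busybox grep was built with -o support:
# strip the "No error"/"No errors" phrases first, then count every remaining match
ErrorCheck=$(sed 's/[Nn]o errors*//g' "$LOGFILE" | grep -oiE "error|fail|can't" | wc -l)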

Search string occurrence and display directory wise count

We have an error log directory structure wherein we store all error log files for a particular day in date-wise directories:
errorbackup/20150629/errorlogFile3453123.log.xml
errorbackup/20150629/errorlogFile5676934.log.xml
errorbackup/20150629/errorlogFile9812387.log.xml
errorbackup/20150628/errorlogFile1097172.log.xml
errorbackup/20150628/errorlogFile1908071_log.xml
errorbackup/20150627/errorlogFile5675733.log.xml
errorbackup/20150627/errorlogFile9452344.log.xml
errorbackup/20150626/errorlogFile6363446.log.xml
I want to search for a particular string in the error log files and get output such that I get a directory-wise count of that string's occurrence. For example, grep "blahblahSQLError" should output something like:
20150629:0
20150628:0
20150627:1
20150626:1
This is needed because we fixed some errors in one of the release and I want to make sure that there are no occurrences of that error since the day it was deployed to Prod. Also note that there are thousands of error log files created every day. Each error log file is created with a random number in its name to ensure uniqueness.
If you are sure the filenames of the log files will not contain any "odd" characters or newlines then something like the following should work.
for dir in errorbackup/*; do
    printf '%s:%s\n' "${dir#*/}" "$(grep -l blahblahSQLError "$dir/"*.xml | wc -l)"
done
If they can have unexpected names then you would need to use multiple calls to grep and count the matching files manually, I believe. Something like this:
for dir in errorbackup/*; do
    _dcount=0
    for log in "$dir"/*.xml; do
        # -q suppresses output; we only need the exit status here
        grep -q blahblahSQLError "$log" && _dcount=$((_dcount + 1))
    done
    printf '%s:%s\n' "${dir#*/}" "$_dcount"
done
Something like this should do it:
for dir in errorbackup/*
do
    awk -v dir="${dir##*/}" -v OFS=':' '/blahblahSQLError/{c++} END{print dir, c+0}' "$dir"/*
done
There's probably a cuter way to do it with find and xargs to avoid the loop and you could certainly do it all within one awk command but life's too short....
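For what it's worth, a loop-free sketch along those lines (assuming the file names contain no colons or newlines, and that more than one log file exists overall, so grep prefixes each count with its file name):
# grep -c emits path:count lines; awk sums the counts per date directory
find errorbackup -name '*.xml' -exec grep -c blahblahSQLError {} + |
    awk -F: '{ split($1, p, "/"); c[p[2]] += $2 } END { for (d in c) print d ":" c[d] }'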

Simple program that reads and writes to a pipe

Although I am quite familiar with Tcl this is a beginner question. I would like to read and write from a pipe. I would like a solution in pure Tcl and not use a library like Expect. I copied an example from the tcl wiki but could not get it running.
My code is:
cd /tmp
catch {
    console show
    update
}
proc go {} {
    puts "executing go"
    set pipe [open "|cat" RDWR]
    fconfigure $pipe -buffering line -blocking 0
    fileevent $pipe readable [list piperead $pipe]
    if {![eof $pipe]} {
        puts $pipe "hello cat program!"
        flush $pipe
        set got [gets $pipe]
        puts "result: $got"
    }
}
go
The output is "executing go" followed by an empty "result:"; however, I would expect that reading from the pipe would return the line that I sent to the cat program.
What is my error?
EDIT:
I followed potrzebie's answer and got a small example working. That's enough to get me going. A quick workaround to test my setup was the following code (not a real solution but a quick fix for the moment).
cd /home/stephan/tmp
catch {
    console show
    update
}
puts "starting pipe"
set pipe [open "|cat" RDWR]
fconfigure $pipe -buffering line -blocking 0
after 10
puts $pipe "hello cat!"
flush $pipe
set got [gets $pipe]
puts "got from pipe: $got"
Writing to the pipe and flushing won't make the OS immediately switch from your program to the cat program. Try putting after 1000 between the puts and the gets commands, and you'll see that you'll probably get the string back. cat has then been given some time slices and has had the chance to read its input and write its output.
You can't control when cat reads your input and writes it back, so you'll have to either use fileevent and enter the event loop to wait (or periodically call update), or periodically try reading from the stream. Or you can keep it in blocking mode, in which case gets will do the waiting for you: it will block until there's a line to read, but meanwhile no other events will be responded to. A GUI, for example, will stop responding.
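To illustrate the blocking-mode option, a minimal sketch (the same pipe as in the question, just without -blocking 0):
set pipe [open "|cat" RDWR]
fconfigure $pipe -buffering line
puts $pipe "hello cat program!"
# gets now blocks until cat has echoed the line back
set got [gets $pipe]
puts "result: $got"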
The example seems to be for Tk and meant to be run by wish, which enters the event loop automatically at the end of the script. Add the piperead procedure and either run the script with wish or add a vwait command to the end of the script and run it with tclsh.
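For reference, a minimal sketch of such a piperead proc, with a vwait so tclsh enters the event loop (the proc name and argument come from the fileevent line in the question):
proc piperead {pipe} {
    # called by the event loop whenever the pipe has data or hits EOF
    if {[gets $pipe line] >= 0} {
        puts "result: $line"
    } elseif {[eof $pipe]} {
        close $pipe
        set ::done 1
    }
}
vwait done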
PS: For line-buffered I/O to work for a pipe, both programs involved have to use it (or no buffering). Many programs (grep, sed, etc.) use full buffering when they're not connected to a terminal. One way to prevent that is with the unbuffer program, which is part of Expect (you don't have to write an Expect script; it's a stand-alone program that just happens to be included with the Expect package).
set pipe [open "|[list unbuffer grep .]" {RDWR}]
I guess you're executing the code from http://wiki.tcl.tk/3846, the page entitled "Pipe vs Expect". You seem to have omitted the definition of the piperead proc, indeed, when I copy-and-pasted the code from your question, I got an error invalid command name "piperead". If you copy-and-paste the definition from the wiki, you should find that the code works. It certainly did for me.

Capturing output from WshShell.Exec using Windows Script Host

I wrote the following two functions, and call the second ("callAndWait") from JavaScript running inside Windows Script Host. My overall intent is to call one command-line program from another: I'm running the initial script using cscript, and then trying to run something else (Ant) from that script.
function readAllFromAny(oExec)
{
    if (!oExec.StdOut.AtEndOfStream)
        return oExec.StdOut.ReadLine();
    if (!oExec.StdErr.AtEndOfStream)
        return "STDERR: " + oExec.StdErr.ReadLine();
    return -1;
}
// Execute a command line function....
function callAndWait(execStr) {
    var oExec = WshShell.Exec(execStr);
    while (oExec.Status == 0)
    {
        WScript.Sleep(100);
        var output;
        while ( (output = readAllFromAny(oExec)) != -1) {
            WScript.StdOut.WriteLine(output);
        }
    }
}
Unfortunately, when I run my program, I don't get immediate feedback about what the called program is doing. Instead, the output seems to come in fits and starts, sometimes waiting until the original program has finished, and sometimes it appears to have deadlocked. What I really want to do is have the spawned process actually share the same StdOut as the calling process, but I don't see a way to do that. Just setting oExec.StdOut = WScript.StdOut doesn't work.
Is there an alternate way to spawn processes that will share the StdOut & StdErr of the launching process? I tried using WshShell.Run(), but that gives me a "permission denied" error. That's problematic, because I don't want to have to tell my clients to change how their Windows environment is configured just to run my program.
What can I do?
You cannot read from StdErr and StdOut in the script engine in this way, as there is no non-blocking IO as Code Master Bob says. If the called process fills up the buffer (about 4KB) on StdErr while you are attempting to read from StdOut, or vice-versa, then you will deadlock/hang. You will starve while waiting for StdOut and it will block waiting for you to read from StdErr.
The practical solution is to redirect StdErr to StdOut like this:
sCommandLine = """c:\Path\To\prog.exe"" Argument1 argument2"
Dim oExec
Set oExec = WshShell.Exec("CMD /S /C "" " & sCommandLine & " 2>&1 """)
In other words, what gets passed to CreateProcess is this:
CMD /S /C " "c:\Path\To\prog.exe" Argument1 argument2 2>&1 "
This invokes CMD.EXE, which interprets the command line. /S /C invokes a special parsing rule so that the first and last quote are stripped off, and the remainder used as-is and executed by CMD.EXE. So CMD.EXE executes this:
"c:\Path\To\prog.exe" Argument1 argument2 2>&1
The incantation 2>&1 redirects prog.exe's StdErr to StdOut. CMD.EXE will propagate the exit code.
You can now succeed by reading from StdOut and ignoring StdErr.
The downside is that the StdErr and StdOut output get mixed together. As long as they are recognisable you can probably work with this.
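For the JScript side of the question, the same wrapper sketched out (the path and arguments are placeholders):
// build the CMD /S /C wrapper so cmd.exe applies the 2>&1 redirection
var WshShell = WScript.CreateObject("WScript.Shell");
var sCommandLine = '"c:\\Path\\To\\prog.exe" Argument1 argument2';
var oExec = WshShell.Exec('CMD /S /C " ' + sCommandLine + ' 2>&1 "');
// with StdErr merged into StdOut, reading StdOut to EOF cannot deadlock
while (!oExec.StdOut.AtEndOfStream) {
    WScript.StdOut.WriteLine(oExec.StdOut.ReadLine());
}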
Another technique which might help in this situation is to redirect the standard error stream of the command to accompany the standard output.
Do this by adding "%comspec% /c" to the front and "2>&1" to the end of the execStr string.
That is, change the command you run from:
zzz
to:
%comspec% /c zzz 2>&1
The "2>&1" is a redirect instruction which causes the StdErr output (file descriptor 2) to be written to the StdOut stream (file descriptor 1).
You need to include the "%comspec% /c" part because it is the command interpreter which understands about the command line redirect. See http://technet.microsoft.com/en-us/library/ee156605.aspx
Using "%comspec%" instead of "cmd" gives portability to a wider range of Windows versions.
If your command contains quoted string arguments, it may be tricky to get them right: the specification for how cmd handles quotes after "/c" seems to be incomplete.
With this, your script needs only to read the StdOut stream, and will receive both standard output and standard error.
I used this with "net stop wuauserv", which writes to StdOut on success (if the service is running) and to StdErr on failure (if the service is already stopped).
First, your loop is broken in that it always tries to read from oExec.StdOut first. If there is no actual output then it will hang until there is. You won't see any StdErr output until StdOut.AtEndOfStream becomes true (probably when the child terminates). Unfortunately, there is no concept of non-blocking I/O in the script engine; that would mean calling read and having it return immediately if there is no data in the buffer. Thus there is probably no way to get this loop to work as you want.
Second, WshShell.Run does not provide any properties or methods to access the standard I/O of the child process. It creates the child in a separate window, totally isolated from the parent except for the return code. However, if all you want is to be able to SEE the output from the child then this might be acceptable. You will also be able to interact with the child (input), but only through the new window (see SendKeys).
As for using ReadAll(), this would be even worse, since it collects all the input from the stream before returning, so you wouldn't see anything at all until the stream was closed. I have no idea why the example places the ReadAll in a loop which builds a string; a single if (!WScript.StdIn.AtEndOfStream) should be sufficient to avoid exceptions.
Another alternative might be to use the process creation methods in WMI. How standard I/O is handled is not clear, and there doesn't appear to be any way to allocate specific streams as StdIn/Out/Err. The only hope would be that the child would inherit these from the parent, but that's what you want, isn't it? (This comment is based upon an idea and a little bit of research, but no actual testing.)
Basically, the scripting system is not designed for complicated interprocess communication/synchronisation.
Note: Tests confirming the above were performed on Windows XP Sp2 using Script version 5.6. Reference to current (5.8) manuals suggests no change.
Yes, the Exec function seems to be broken when it comes to terminal output.
I have been using a similar function that I call in a loop similar to yours:
function ConsumeStd(e) {
    WScript.StdOut.Write(e.StdOut.ReadAll());
    WScript.StdErr.Write(e.StdErr.ReadAll());
}
Not sure if checking for EOF and reading line by line is better or worse.
You might have hit the deadlock issue described on this Microsoft Support site.
One suggestion is to always read both from stdout and stderr.
You could change readAllFromAny to:
function readAllFromAny(oExec)
{
    var output = "";
    if (!oExec.StdOut.AtEndOfStream)
        output = output + oExec.StdOut.ReadLine();
    if (!oExec.StdErr.AtEndOfStream)
        output = output + "STDERR: " + oExec.StdErr.ReadLine();
    return output ? output : -1;
}
