Tailing a binary file in Erlang adds mysterious bit-string - erlang

I want to run tail on a named pipe to facilitate some binary logfile processing. The problem is that mysterious data is being added to the beginning of the stream. I run my tests by starting the erlang process with the opened port (open_port) and then I use another shell to cat the bin into the named pipe.
Here is a simple function for getting data from the port:
bin_from_tail() ->
open_port({spawn,"/usr/bin/tail -F named_pipe"},
[binary,in,eof]),
receive
{_,{data,<<Data/binary>>}} -> Data
end.
So here are two ways for me to grab the same data...
Create the named pipe
mkfifo named_pipe
This command blocks until you run "cat log.bin > named_pipe" from another shell
TailBin = bin_from_tail().
Read the entire file into memory using the erlang file library
{ok,FileBin} = file:read_file("log.bin").
But TailBin and FileBin are not the same! TailBin has a mysterious 120-bit (15-byte) string at the beginning:
<<40,6,161,69,172,216,56,14,100,0,80,6,0,0,0>>

Thanks for the idea about the endlessly looping cat/restarting a dead port. It appears that named pipes buffer just a little bit, so if the port opens up fast enough the writer process (another program) won't crash! Definitely risky stuff, but as far as hacks go... it works.
Because all the mailing list posts just said do this, do that without examples, I'm going to post how mine works! If anyone wants to offer up improvements, please feel free to do so. My solution:
read() ->
    Port = open_port({spawn,"/bin/cat /path/to/pipe"},
                     [binary,in,eof]),
    do_read(Port).
do_read(Port) ->
    receive
        {Port,{data,<<Data/binary>>}} ->
            case do_something:with(Data) of
                ok ->
                    io:format("G"); % Good
                _Other ->
                    io:format("B")  % Bad
            end,
            do_read(Port);
        {Port,eof} ->
            %% cat exited at EOF: close the old port, then open a new one
            port_close(Port),
            read();
        Any ->
            io:format("No match fifo_client:do_read/1, ~p~n",[Any]),
            do_read(Port)
    end.

I found the same thing happened outside erlang. The problem is that tail is trying to show you the end of the file, not the whole file. If you use it on a normal file, anything written would be new, and picked up by -f, but in this case it looks like tail is waiting until the end of the file (the eof that comes through the pipe) and then showing the last 10 lines (treating the binary as text).
tail -F -c 9999999
(assuming your log is 9999999 bytes or less) would probably work.
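To make the difference between "last 10 lines" and "last N bytes" concrete, here is a rough Python sketch. The helpers last_lines and last_bytes are made up for illustration and only approximate what tail does; the sample data is invented too.

```python
def last_lines(data: bytes, n: int = 10) -> bytes:
    """Approximate `tail -n N`: keep the final n newline-terminated lines."""
    pos = len(data)
    if data.endswith(b"\n"):
        pos -= 1  # the final newline ends the last line, it does not start one
    for _ in range(n):
        pos = data.rfind(b"\n", 0, pos)
        if pos == -1:
            return data  # fewer than n lines: everything is kept
    return data[pos + 1:]


def last_bytes(data: bytes, n: int) -> bytes:
    """Approximate `tail -c N`: keep the final n bytes."""
    return data[-n:] if n > 0 else b""


# Invented "binary log": 20 two-byte records, each ending in byte 10 (newline).
log = b"".join(bytes([i, 10]) for i in range(40, 60))

tail_default = last_lines(log)          # drops the first 10 records
tail_big_c = last_bytes(log, 9999999)   # keeps the whole log
```

With binary data the "lines" are wherever byte 10 happens to occur, so the default mode silently drops the start of the stream, while a large enough byte count returns it all.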
Maybe try using cat instead of tail -F, that seemed to work for me. Then you just need to avoid the fact that cat exits upon eof, which I assume you were trying to avoid by using tail.
So a shell script which loops cat endlessly, maybe?
Or get Erlang to close and recreate the port when it dies, since you're getting the eof signal anyway. Or use the exit_status flag to open_port to be signalled when the process exits, in case you need to distinguish eof from process exit. (If you use both exit_status and eof, the eof never comes, as a brief test with cat < /dev/null indicates.)
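The "reopen after every eof" idea is not specific to Erlang. Below is a minimal Python sketch of it, assuming a POSIX system where os.mkfifo is available. The function names and the temporary pipe path are made up for the demo, and the writer thread stands in for running cat log.bin > named_pipe in another shell.

```python
import os
import tempfile
import threading


def read_until_eof(path: str) -> bytes:
    """Open the FIFO, read until every writer has closed it, then return."""
    with open(path, "rb") as fifo:  # blocks until a writer opens the FIFO
        return fifo.read()          # returns once EOF arrives


def write_once(path: str, payload: bytes) -> None:
    """Stand-in for `cat log.bin > named_pipe` run from another shell."""
    with open(path, "wb") as fifo:
        fifo.write(payload)


def demo() -> list:
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "named_pipe")
    os.mkfifo(path)
    received = []
    try:
        for payload in (bytes([40, 6, 161, 69]), b"second batch"):
            writer = threading.Thread(target=write_once, args=(path, payload))
            writer.start()
            received.append(read_until_eof(path))  # reopen after each EOF
            writer.join()
    finally:
        os.remove(path)
        os.rmdir(tmpdir)
    return received


batches = demo()
```

Each pass through the loop plays the role of one cat process: it ends when the writer closes the pipe, and the next pass opens the pipe again. The demo runs the writers one at a time; a writer that reconnects before the reader has seen eof would simply have its data appended to the current batch.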

Related

How do I intercept the unbuffered output of a Proc::Async in Raku?

With a snippet like
# Contents of ./run
my $p = Proc::Async.new: @*ARGS;
react {
whenever Promise.in: 5 { $p.kill }
whenever $p.stdout { say "OUT: { .chomp }" }
whenever $p.ready { say "PID: $_" }
whenever $p.start { say "Done" }
}
executed like
./run raku -e 'react whenever Supply.interval: 1 { .say }'
I expected to see something like
PID: 1234
OUT: 0
OUT: 1
OUT: 2
OUT: 3
OUT: 4
Done
but instead I see
PID: 1234
OUT: 0
Done
I understand that this has to do with buffering: if I change that command into something like
# The $|++ disables buffering
./run perl -E '$|++; while(1) { state $i; say $i++; sleep 1 }'
I get the desired output.
I know that TTY IO::Handle objects are unbuffered, and that in this case the $*OUT of the spawned process is not one. And I've read that IO::Pipe objects are buffered "so that a write without a read doesn't immediately block" (although I cannot say I entirely understand what this means).
But no matter what I've tried, I cannot get the unbuffered output stream of a Proc::Async. How do I do this?
I've tried binding an open IO::Handle using $proc.bind-stdout but I still get the same issue.
Note that doing something like $proc.bind-stdout: $*OUT does work, in the sense that the Proc::Async object no longer buffers, but it's also not a solution to my problem, because I cannot tap into the output before it goes out. It does suggest to me that if I can bind the Proc::Async to an unbuffered handle, it should do the right thing. But I haven't been able to get that to work either.
For clarification: as suggested with the Perl example, I know I can fix this by disabling the buffering on the command I'll be passing as input, but I'm looking for a way to do this from the side that creates the Proc::Async object.
You can set the .out-buffer of a handle (such as $*OUT or $*ERR) to 0:
$ ./run raku -e '$*OUT.out-buffer = 0; react whenever Supply.interval: 1 { .say }'
PID: 11340
OUT: 0
OUT: 1
OUT: 2
OUT: 3
OUT: 4
Done
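The same effect can be seen outside Raku. As a rough analogue, here is a Python sketch in which the child is another Python interpreter and the -u flag plays the role of setting out-buffer to 0. It assumes a POSIX system (select on pipes); the names CHILD and line_arrives are made up for the example.

```python
import os
import select
import subprocess
import sys

# The child prints one line, then waits on stdin so that it stays alive.
CHILD = "import sys; print('tick'); sys.stdin.readline()"


def line_arrives(unbuffered: bool, wait: float) -> bool:
    """Return True if the child's first line reaches us within `wait` seconds."""
    args = [sys.executable] + (["-u"] if unbuffered else []) + ["-c", CHILD]
    # Drop PYTHONUNBUFFERED so only the -u flag decides the child's buffering.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    proc = subprocess.Popen(args, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, env=env)
    try:
        ready, _, _ = select.select([proc.stdout], [], [], wait)
        return bool(ready)
    finally:
        proc.stdin.write(b"\n")  # let the child finish
        proc.stdin.close()
        proc.stdout.read()       # drain whatever was flushed at exit
        proc.stdout.close()
        proc.wait()


seen_unbuffered = line_arrives(True, 10.0)
seen_buffered = line_arrives(False, 1.0)
```

Without -u the line sits in the child's own buffer until the child exits, which is the same situation as the raku child whose output only shows up in bursts.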
Proc::Async itself isn't performing buffering on the received data. However, spawned processes may do their own depending on what they are outputting to, and that's what is being observed here.
Many programs make decisions about their output buffering (among other things, such as whether to emit color codes) based on whether the output handle is attached to a TTY (a terminal). The assumption is that a TTY means a human is going to be watching the output, and thus low latency is preferable to high throughput, so buffering is disabled (or restricted to line buffering). If, on the other hand, the output is going to a pipe or a file, then the assumption is that latency is not so important, and buffering is used to achieve a significant throughput win (far fewer system calls to write data).
When we spawn something with Proc::Async, the standard output of the spawned process is bound to a pipe - which is not a TTY. Thus the invoked program may use this to decide to apply output buffering.
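The TTY check is easy to observe from any language. A small Python sketch (the function name is made up; the child is simply another Python interpreter) that asks a child process what its standard output looks like when it is attached to a pipe:

```python
import subprocess
import sys


def child_stdout_is_tty() -> str:
    """Ask a child process whether its stdout looks like a terminal."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.isatty())"],
        stdout=subprocess.PIPE,  # the child's stdout is a pipe, not a TTY
        text=True,
    )
    return result.stdout.strip()


answer = child_stdout_is_tty()
```

The child reports False, and that is the cue many programs use to switch to full buffering.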
If you're willing to have another dependency, then you can invoke the program via something that fakes up a TTY, such as unbuffer (part of the expect package, it seems). Here's an example of a program that is suffering from buffering:
my $proc = Proc::Async.new: 'raku', '-e',
'react whenever Supply.interval(1) { .say }';
react whenever $proc.stdout {
.print
}
We only see a 0 and then have to wait a long time for more output. Running it via unbuffer:
my $proc = Proc::Async.new: 'unbuffer', 'raku', '-e',
'react whenever Supply.interval(1) { .say }';
react whenever $proc.stdout {
.print
}
Means that we see a number output every second.
Could Raku provide a built-in solution to this some day? Yes - by doing the "magic" that unbuffer itself does (I presume allocating a pty - kind of a fake TTY). This isn't trivial - although it is being explored by the libuv developers; at least so far as Rakudo on MoarVM goes, the moment there's a libuv release available offering such a feature, we'll work on exposing it.
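For a feel of what "allocating a pty" involves, here is a rough Python sketch using the standard pty module. It assumes a Unix-like system with pseudo-terminals available; the function name is made up, and this is not a description of how unbuffer or Rakudo actually implement it.

```python
import os
import pty
import subprocess
import sys


def child_stdout_is_tty_with_pty() -> str:
    """Give the child a pseudo-terminal as stdout and ask what it sees."""
    master_fd, slave_fd = pty.openpty()
    try:
        proc = subprocess.Popen(
            [sys.executable, "-c", "import sys; print(sys.stdout.isatty())"],
            stdout=slave_fd,  # the child writes to the terminal side
        )
        output = b""
        while not output.endswith(b"\n"):
            output += os.read(master_fd, 1024)  # we read from the master side
        proc.wait()
    finally:
        os.close(slave_fd)
        os.close(master_fd)
    return output.decode().strip()


answer = child_stdout_is_tty_with_pty()
```

Because the child now believes it is talking to a terminal, it picks its interactive buffering mode, and the parent can still see every byte by reading the master side.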

Broken pipe error in CCL Lisp

I am using CCL Lisp to run batches of experiments in parallel. On my machine, everything is running fine. However, I would like to use this on a server. When I execute this on a server, I always get the following error message:
> Error: on #<BASIC-CHARACTER-OUTPUT-STREAM UTF-8 (PIPE/7) #x302001C2725D> :
> Broken pipe during write
> While executing: #<CCL::STANDARD-KERNEL-METHOD CCL::STREAM-IO-ERROR (STREAM T T)>, in process listener(1).
My code always reaches the same point when throwing this error. An excerpt of the code is given below:
;; ... A really long function
;; write commands to processes
(format t ".. writing commands to process ~a:~%" counter)
(loop for c in commands
do
(format t " ~a~%" c)
(write-string c output-stream)
(princ #\lf output-stream))
(force-output t)
(force-output output-stream)
(finish-output output-stream)
#-lispworks
(close output-stream))
I think this error occurs inside the loop statement, since not all of the commands are written to the output stream.
How can I further debug this and solve this issue?
"Broken pipe" means that the process which is supposed to be reading from the pipe is dead when the Lisp process is writing to the pipe.
IOW, the problem is probably outside of Lisp. You need to see what is happening with the other process.
PS. You can combine your write-string and princ into a single write-line. Also, you don't need force-output if you are calling finish-output immediately.
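The condition is easy to reproduce outside Lisp, which may help when checking what the other process is doing. A minimal Python sketch (the function name is made up) in which the reading process has already exited by the time the parent writes:

```python
import subprocess
import sys


def write_to_exited_reader() -> str:
    # The child exits at once, so nothing is left to read our commands.
    proc = subprocess.Popen([sys.executable, "-c", "pass"],
                            stdin=subprocess.PIPE)
    proc.wait()
    try:
        proc.stdin.write(b"some command\n")
        proc.stdin.flush()  # the data reaches the OS here and the write fails
        return "written"
    except BrokenPipeError:
        return "broken pipe"
    finally:
        try:
            proc.stdin.close()
        except BrokenPipeError:
            pass  # close() retries the flush and fails the same way


outcome = write_to_exited_reader()
```

If the server behaves like this, the first thing to check is whether the spawned program starts at all there (wrong path, missing library, immediate crash), for example by looking at its exit status and its stderr.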

Simple program that reads and writes to a pipe

Although I am quite familiar with Tcl this is a beginner question. I would like to read and write from a pipe. I would like a solution in pure Tcl and not use a library like Expect. I copied an example from the tcl wiki but could not get it running.
My code is:
cd /tmp
catch {
console show
update
}
proc go {} {
puts "executing go"
set pipe [open "|cat" RDWR]
fconfigure $pipe -buffering line -blocking 0
fileevent $pipe readable [list piperead $pipe]
if {![eof $pipe]} {
puts $pipe "hello cat program!"
flush $pipe
set got [gets $pipe]
puts "result: $got"
}
}
go
The output is executing go\n result:, however I would expect that reading a value from the pipe would return the line that I have sent to the cat program.
What is my error?
--
EDIT:
I followed potrzebie's answer and got a small example working. That's enough to get me going. A quick workaround to test my setup was the following code (not a real solution but a quick fix for the moment).
cd /home/stephan/tmp
catch {
console show
update
}
puts "starting pipe"
set pipe [open "|cat" RDWR]
fconfigure $pipe -buffering line -blocking 0
after 10
puts $pipe "hello cat!"
flush $pipe
set got [gets $pipe]
puts "got from pipe: $got"
Writing to the pipe and flushing won't make the OS multitasking immediately leave your program and switch to the cat program. Try putting after 1000 between the puts and the gets command, and you'll see that you'll probably get the string back. cat has then been given some time slices and has had the chance to read its input and write its output.
You can't control when cat reads your input and writes it back, so you'll have to either use fileevent and enter the event loop to wait (or periodically call update), or periodically try reading from the stream. Or you can keep it in blocking mode, in which case gets will do the waiting for you. It will block until there's a line to read, but meanwhile no other events will be responded to. A GUI for example, will stop responding.
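The "wait until there is something to read" step looks much the same in other languages. A rough Python sketch, assuming a POSIX system with cat on the PATH (the function name and the 5 second timeout are arbitrary choices for the example):

```python
import select
import subprocess


def echo_through_cat(line: bytes, timeout: float = 5.0) -> bytes:
    proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    try:
        proc.stdin.write(line)
        proc.stdin.flush()
        # Wait (up to `timeout` seconds) until cat has written something back,
        # instead of reading immediately and finding nothing there yet.
        ready, _, _ = select.select([proc.stdout], [], [], timeout)
        return proc.stdout.readline() if ready else b""
    finally:
        proc.stdin.close()   # EOF makes cat exit
        proc.wait()
        proc.stdout.close()


reply = echo_through_cat(b"hello cat program!\n")
```

select here does the job that fileevent plus the event loop does in Tcl: the read only happens once the other process has had its turn.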
The example seems to be for Tk and meant to be run by wish, which enters the event loop automatically at the end of the script. Add the piperead procedure and either run the script with wish or add a vwait command to the end of the script and run it with tclsh.
PS: For line-buffered I/O to work for a pipe, both programs involved have to use it (or no buffering). Many programs (grep, sed, etc.) use full buffering when they're not connected to a terminal. One way to prevent this is the unbuffer program, which is part of Expect (you don't have to write an Expect script; it's a stand-alone program that just happens to be included with the Expect package).
set pipe [open "|[list unbuffer grep .]" {RDWR}]
I guess you're executing the code from http://wiki.tcl.tk/3846, the page entitled "Pipe vs Expect". You seem to have omitted the definition of the piperead proc; indeed, when I copied and pasted the code from your question, I got the error invalid command name "piperead". If you copy the definition from the wiki, you should find that the code works. It certainly did for me.

On external port how to only close the output and wait for exit_status

I'm using a port to run a pipeline which uncompresses some data and dd's it:
Port = open_port({spawn, "bzcat | sudo dd of=/dev/foo"},
[stream, use_stdio, exit_status]),
What I would like to do is produce an end-of-file situation on the output, which causes the pipeline to complete and eventually exit.
I would like to wait for this completion and also capture the exit_status.
When I just call port_close it looks to me as if the pipeline is just terminated and there is no wait for completion. Also I don't get any exit_status ....
How can I wait for exit before my next step (which requires the dd to have completed)?
I did some experiments, and it looks like port_close at least doesn't kill the process; you just don't find out when it's done. Is this correct?
If you just need to wait for the command spawned by open_port to complete, wait for the exit_status message:
1> Port = open_port({spawn, "sleep 7"}, [exit_status]).
#Port<0.497>
2> receive {Port, {exit_status, Code}} -> Code after 10000 -> timeout end.
0
Update (on closing only the output pipe of a port): I don't think you can close just the output pipe with the default spawn driver. The default driver has no control commands, and port_close, although it does not kill the spawned command, completely erases the port's state.
Possible solutions:
Write input stream to a file first and then run bzip/dd sequence on that file;
Write your own driver or NIF (Maybe some open source implementations already exist?)
Use some external script and control protocol, for example full (or chunk) length can be transferred before the actual content so the script will know when to close the connection
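The last option can be sketched as a small wrapper that sits between the port and the real pipeline. The Python below is only an illustration of the idea: the function name, the 4-byte big-endian header and the use of cat as a stand-in for the bzcat/dd pipeline are all assumptions made for the example. (On the Erlang side, the {packet, 4} port option prepends this kind of header to each message, so it may be a convenient way to produce it.)

```python
import io
import struct
import subprocess


def run_with_length_prefix(stream, command):
    """Feed exactly one length-prefixed payload to `command`, then send EOF."""
    (length,) = struct.unpack(">I", stream.read(4))  # 4-byte big-endian size
    payload = stream.read(length)
    proc = subprocess.Popen(command, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    # communicate() writes the payload, closes stdin (EOF) and waits for exit.
    output, _ = proc.communicate(payload)
    return proc.returncode, output


# Stand-in for the data arriving from the port: header, payload, extra bytes.
framed = io.BytesIO(struct.pack(">I", 5) + b"hello" + b"trailing bytes")
status, output = run_with_length_prefix(framed, ["cat"])
```

Because the wrapper knows the length up front, it can close the pipeline's input itself and report the exit status back, while the port to the wrapper stays open.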
Several rather ugly workarounds to this problem can be found here: limitations of erlang:open_port() and os:cmd()
Some even use netcat to map the problem to a tcp connection.

Capturing output from WshShell.Exec using Windows Script Host

I wrote the following two functions, and call the second ("callAndWait") from JavaScript running inside Windows Script Host. My overall intent is to call one command line program from another. That is, I'm running the initial script using cscript, and then trying to run something else (Ant) from that script.
function readAllFromAny(oExec)
{
if (!oExec.StdOut.AtEndOfStream)
return oExec.StdOut.ReadLine();
if (!oExec.StdErr.AtEndOfStream)
return "STDERR: " + oExec.StdErr.ReadLine();
return -1;
}
// Execute a command line function....
function callAndWait(execStr) {
var oExec = WshShell.Exec(execStr);
while (oExec.Status == 0)
{
WScript.Sleep(100);
var output;
while ( (output = readAllFromAny(oExec)) != -1) {
WScript.StdOut.WriteLine(output);
}
}
}
Unfortunately, when I run my program, I don't get immediate feedback about what the called program is doing. Instead, the output seems to come in fits and starts, sometimes waiting until the original program has finished, and sometimes it appears to have deadlocked. What I really want to do is have the spawned process actually share the same StdOut as the calling process, but I don't see a way to do that. Just setting oExec.StdOut = WScript.StdOut doesn't work.
Is there an alternate way to spawn processes that will share the StdOut & StdErr of the launching process? I tried using WshShell.Run(), but that gives me a "permission denied" error. That's problematic, because I don't want to have to tell my clients to change how their Windows environment is configured just to run my program.
What can I do?
You cannot read from StdErr and StdOut in the script engine in this way, as there is no non-blocking IO as Code Master Bob says. If the called process fills up the buffer (about 4KB) on StdErr while you are attempting to read from StdOut, or vice-versa, then you will deadlock/hang. You will starve while waiting for StdOut and it will block waiting for you to read from StdErr.
The practical solution is to redirect StdErr to StdOut like this:
sCommandLine = """c:\Path\To\prog.exe"" Argument1 argument2"
Dim oExec
Set oExec = WshShell.Exec("CMD /S /C "" " & sCommandLine & " 2>&1 """)
In other words, what gets passed to CreateProcess is this:
CMD /S /C " "c:\Path\To\prog.exe" Argument1 argument2 2>&1 "
This invokes CMD.EXE, which interprets the command line. /S /C invokes a special parsing rule so that the first and last quote are stripped off, and the remainder used as-is and executed by CMD.EXE. So CMD.EXE executes this:
"c:\Path\To\prog.exe" Argument1 argument2 2>&1
The incantation 2>&1 redirects prog.exe's StdErr to StdOut. CMD.EXE will propagate the exit code.
You can now succeed by reading from StdOut and ignoring StdErr.
The downside is that the StdErr and StdOut output get mixed together. As long as they are recognisable you can probably work with this.
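The same merge-the-streams approach exists in most process APIs. As a point of comparison, a minimal Python sketch (the names CHILD and run_merged are made up, and the child is just a Python one-liner that writes to both streams):

```python
import subprocess
import sys

CHILD = (
    "import sys; "
    "print('to stdout'); sys.stdout.flush(); "
    "print('to stderr', file=sys.stderr)"
)


def run_merged(args):
    """Run a command with stderr redirected into stdout, like 2>&1."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    lines = [raw.decode().rstrip("\r\n") for raw in proc.stdout]
    proc.stdout.close()
    return proc.wait(), lines


status, lines = run_merged([sys.executable, "-c", CHILD])
```

With only one pipe to read there is nothing to deadlock on, at the cost of no longer knowing which stream a line came from.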
Another technique which might help in this situation is to redirect the standard error stream of the command to accompany the standard output.
Do this by adding "%comspec% /c" to the front and "2>&1" to the end of the execStr string.
That is, change the command you run from:
zzz
to:
%comspec% /c zzz 2>&1
The "2>&1" is a redirect instruction which causes the StdErr output (file descriptor 2) to be written to the StdOut stream (file descriptor 1).
You need to include the "%comspec% /c" part because it is the command interpreter which understands about the command line redirect. See http://technet.microsoft.com/en-us/library/ee156605.aspx
Using "%comspec%" instead of "cmd" gives portability to a wider range of Windows versions.
If your command contains quoted string arguments, it may be tricky to get them right:
the specification for how cmd handles quotes after "/c" seems to be incomplete.
With this, your script needs only to read the StdOut stream, and will receive both standard output and standard error.
I used this with "net stop wuauserv", which writes to StdOut on success (if the service is running)
and StdErr on failure (if the service is already stopped).
First, your loop is broken in that it always tries to read from oExec.StdOut first. If there is no actual output then it will hang until there is. You won't see any StdErr output until StdOut.AtEndOfStream becomes true (probably when the child terminates). Unfortunately, there is no concept of non-blocking I/O in the script engine, that is, calling read and having it return immediately if there is no data in the buffer. Thus there is probably no way to get this loop to work as you want.
Second, WshShell.Run does not provide any properties or methods to access the standard I/O of the child process. It creates the child in a separate window, totally isolated from the parent except for the return code. However, if all you want is to be able to SEE the output from the child then this might be acceptable. You will also be able to interact with the child (input) but only through the new window (see SendKeys).
As for using ReadAll(), this would be even worse since it collects all the input from the stream before returning so you wouldn't see anything at all until the stream was closed. I have no idea why the example places the ReadAll in a loop which builds a string, a single if (!WScript.StdIn.AtEndOfStream) should be sufficient to avoid exceptions.
Another alternative might be to use the process creation methods in WMI. How standard I/O is handled is not clear and there doesn't appear to be any way to allocate specific streams as StdIn/Out/Err. The only hope would be that the child would inherit these from the parent but that's what you want, isn't it? (This comment based upon an idea and a little bit of research but no actual testing.)
Basically, the scripting system is not designed for complicated interprocess communication/synchronisation.
Note: Tests confirming the above were performed on Windows XP Sp2 using Script version 5.6. Reference to current (5.8) manuals suggests no change.
Yes, the Exec function seems to be broken when it comes to terminal output.
I have been using a similar function that I call in a loop similar to yours:
function ConsumeStd(e) {
    WScript.StdOut.Write(e.StdOut.ReadAll());
    WScript.StdErr.Write(e.StdErr.ReadAll());
}
Not sure if checking for EOF and reading line by line is better or worse.
You might have hit the deadlock issue described on this Microsoft Support site.
One suggestion is to always read both from stdout and stderr.
You could change readAllFromAny to:
function readAllFromAny(oExec)
{
var output = "";
if (!oExec.StdOut.AtEndOfStream)
output = output + oExec.StdOut.ReadLine();
if (!oExec.StdErr.AtEndOfStream)
output = output + "STDERR: " + oExec.StdErr.ReadLine();
return output ? output : -1;
}
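For contrast, this is what "always read both streams" looks like in an environment that does have threads. The Python sketch below is not something Windows Script Host can do; it is only meant to illustrate why the deadlock goes away when both pipes are drained at the same time. The names CHILD and run_and_drain are made up, and the child deliberately writes far more to stderr than a pipe buffer holds.

```python
import subprocess
import sys
import threading

# The child fills stderr well past a typical pipe buffer before using stdout.
CHILD = "import sys; sys.stderr.write('e' * 100000); sys.stdout.write('done')"


def run_and_drain(args):
    """Read stdout and stderr concurrently so neither pipe can fill up."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    captured = {}

    def drain(name, stream):
        captured[name] = stream.read()
        stream.close()

    readers = [
        threading.Thread(target=drain, args=("stdout", proc.stdout)),
        threading.Thread(target=drain, args=("stderr", proc.stderr)),
    ]
    for reader in readers:
        reader.start()
    for reader in readers:
        reader.join()
    return proc.wait(), captured["stdout"], captured["stderr"]


status, out, err = run_and_drain([sys.executable, "-c", CHILD])
```

Reading StdOut to the end first, as the original loop does, would hang on a child like this one, because the child blocks on its full stderr pipe before it ever writes to stdout.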
