Akka Streams: a Flow over a running process - stream

How do you make a Flow over a process? Lets say I have a process that gets input via stdin and sends output via stdout. I want to have a Flow over it. So the Flow will initially startup the process, control its input and output stream and then act as a Map, mapping the input to output by feeding it to the process? And ultimately terminating the process when the stream ends. Also how would back-pressure control work in this situation?

Here is a very basic Flow that is "coupled" so that it terminates due to an interrupt from either input or output (sink/source). For simplicity I've assumed the case where there is no stdin, but you could adapt this to add a Sink that isn't ignored and pipes input into the process:
// capture stdout from the "ls" command
val runCmd: Source[Message, NotUsed] = {
Source.fromIterator[Message](() => Process("ls").lineStream.toIterator.map(line => TextMessage.Strict(line)))
}
// create flow for emitting stdout source to client
val messageStdout: Flow[Any, Message, NotUsed] = Flow
.fromSinkAndSourceCoupled(Sink.ignore, runCmd)
As for backpressure, notice above my code example is a blocking map operation (one line at a time). As the rate of processing grows, you can develop an asynchronous solution, and one that has a maximum parallelism obeying backpressure as defined by mapasync.

Related

How do I intercept the unbuffered output of a Proc::Async in Raku?

With a snippet like
# Contents of ./run
my $p = Proc::Async.new: #*ARGS;
react {
whenever Promise.in: 5 { $p.kill }
whenever $p.stdout { say "OUT: { .chomp }" }
whenever $p.ready { say "PID: $_" }
whenever $p.start { say "Done" }
}
executed like
./run raku -e 'react whenever Supply.interval: 1 { .say }'
I expected to see something like
PID: 1234
OUT: 0
OUT: 1
OUT: 2
OUT: 3
OUT: 4
Done
but instead I see
PID: 1234
OUT: 0
Done
I understand that this has to do with buffering: if I change that command into something like
# The $|++ disables buffering
./run perl -E '$|++; while(1) { state $i; say $i++; sleep 1 }'
I get the desired output.
I know that TTY IO::Handle objects are unbuffered, and that in this case the $*OUT of the spawned process is not one. And I've read that IO::Pipe objects are buffered "so that a write without a read doesn't immediately block" (although I cannot say I entirely understand what this means).
But no matter what I've tried, I cannot get the unbuffered output stream of a Proc::Async. How do I do this?
I've tried binding an open IO::Handle using $proc.bind-stdout but I still get the same issue.
Note that doing something like $proc.bind-stdout: $*OUT does work, in the sense that the Proc::Async object no longer buffers, but it's also not a solution to my problem, because I cannot tap into the output before it goes out. It does suggest to me that if I can bind the Proc::Async to an unbuffered handle, it should do the right thing. But I haven't been able to get that to work either.
For clarification: as suggested with the Perl example, I know I can fix this by disabling the buffering on the command I'll be passing as input, but I'm looking for a way to do this from the side that creates the Proc::Async object.
You can set the .out-buffer of a handle (such as $*OUT or $*ERR) to 0:
$ ./run raku -e '$*OUT.out-buffer = 0; react whenever Supply.interval: 1 { .say }'
PID: 11340
OUT: 0
OUT: 1
OUT: 2
OUT: 3
OUT: 4
Done
Proc::Async itself isn't performing buffering on the received data. However, spawned processes may do their own depending on what they are outputting to, and that's what is being observed here.
Many programs make decisions about their output buffering (among other things, such as whether to emit color codes) based on whether the output handle is attached to a TTY (a terminal). The assumption is that a TTY means a human is going to be watching the output, and thus latency is preferable to throughput, so buffering is disabled (or restricted to line buffering). If, on the other hand, the output is going to a pipe or a file, then the assumption is that latency is not so important, and buffering is used to achieve a significant throughput win (a lot less system calls to write data).
When we spawn something with Proc::Async, the standard output of the spawned process is bound to a pipe - which is not a TTY. Thus the invoked program may use this to decide to apply output buffering.
If you're willing to have another dependency, then you can invoke the program via. something that fakes up a TTY, such as unbuffer (part of the expect package, it seems). Here's an example of a program that is suffering from buffering:
my $proc = Proc::Async.new: 'raku', '-e',
'react whenever Supply.interval(1) { .say }';
react whenever $proc.stdout {
.print
}
We only see a 0 and then have to wait a long time for more output. Running it via unbuffer:
my $proc = Proc::Async.new: 'unbuffer', 'raku', '-e',
'react whenever Supply.interval(1) { .say }';
react whenever $proc.stdout {
.print
}
Means that we see a number output every second.
Could Raku provide a built-in solution to this some day? Yes - by doing the "magic" that unbuffer itself does (I presume allocating a pty - kind of a fake TTY). This isn't trivial - although it is being explored by the libuv developers; at least so far as Rakudo on MoarVM goes, the moment there's a libuv release available offering such a feature, we'll work on exposing it.

What standard input and output would be if there's no terminal connected to server?

This question came up into my mind when I was thinking about ways of server logging yesterday.
Normally, we open a terminal connected to local computer or remote server, run an executable, and print (printf, cout) some debug/log information in the terminal.
But for those processes/executables/scripts running on the server which are not connected to a terminal, what are the standard input and output?
For example:
Suppose I have a crontab task, running a program on the server many times a day. If I write something like cout << "blablabla" << endl; in the program. What's gonna happen? Where those output will flow into?
Another example I came up and wanted to know is, if I write a CGI program (use C or C++) for let's say a Apache web server, what is the standard input and output of my CGI program ? (According to this C++ CGI tutorial, I guess the standard input and output of the CGI program are in some ways redirected to the Apache server. Because it's using cout to output the html contents, not by return. )
I've read this What is “standard input”? before asking, which told me standard input isn't necessary to be tied to keyboard while standard output isn't necessary to be tied to a terminal/console/screen.
OS is Linux.
The standard input and standard output (and standard error) streams can point to basically any I/O device. This is commonly a terminal, but it can also be a file, a pipe, a network socket, a printer, etc. What exactly those streams direct their I/O to is usually determined by the process that launches your process, be that a shell or a daemon like cron or apache, but a process can redirect those streams itself it it would like.
I'll use Linux as an example, but the concepts are similar on most other OSes. On Linux, the standard input and standard output stream are represented by file descriptors 0 and 1. The macros STDIN_FILENO and STDOUT_FILENO are just for convenience and clarity. A file descriptor is just a number that matches up to some file description that the OS kernel maintains that tells it how to write to that device. That means that from a user-space process's perspective, you write to pretty much anything the same way: write(some_file_descriptor, some_string, some_string_length) (higher-level I/O functions like printf or cout are just wrappers around one or more calls to write). To the process, it doesn't matter what type of device some_file_descriptor represents. The OS kernel will figure that out for you and pass your data to the appropriate device driver.
The standard way to launch a new process is to call fork to duplicate the parent process, and then later to call one of the exec family of functions in the child process to start executing some new program. In between, it will often close the standard streams it inherited from its parent and open new ones to redirect the child process's output somewhere new. For instance, to have the child pipe its output back to the parent, you could do something like this in C++:
int main()
{
// create a pipe for the child process to use for its
// standard output stream
int pipefds[2];
pipe(pipefds);
// spawn a child process that's a copy of this process
pid_t pid = fork();
if (pid == 0)
{
// we're now in the child process
// we won't be reading from this pipe, so close its read end
close(pipefds[0]);
// we won't be reading anything
close(STDIN_FILENO);
// close the stdout stream we inherited from our parent
close(STDOUT_FILENO);
// make stdout's file descriptor refer to the write end of our pipe
dup2(pipefds[1], STDOUT_FILENO);
// we don't need the old file descriptor anymore.
// stdout points to this pipe now
close(pipefds[1]);
// replace this process's code with another program
execlp("ls", "ls", nullptr);
} else {
// we're still in the parent process
// we won't be writing to this pipe, so close its write end
close(pipefds[1]);
// now we can read from the pipe that the
// child is using for its standard output stream
std::string read_from_child;
ssize_t count;
constexpr size_t BUF_SIZE = 100;
char buf[BUF_SIZE];
while((count = read(pipefds[0], buf, BUF_SIZE)) > 0) {
std::cout << "Read " << count << " bytes from child process\n";
read_from_child.append(buf, count);
}
std::cout << "Read output from child:\n" << read_from_child << '\n';
return EXIT_SUCCESS;
}
}
Note: I've omitted error handling for clarity
This example creates a child process and redirects its output to a pipe. The program run in the child process (ls) can treat the standard output stream just as it would if it were referencing a terminal (though ls changes some behaviors if it detects its standard output isn't a terminal).
This sort of redirection can also be done from a terminal. When you run a command you can use the redirection operators to tell your shell to redirect that commands standard streams to some other location than the terminal. For instance, here's a convoluted way to copy a file from one machine to another using an sh-like shell:
gzip < some_file | ssh some_server 'zcat > some_file'
This does the following:
create a pipe
run gzip redirecting its standard input stream to read from "some_file" and redirecting its standard output stream to write to the pipe
run ssh and redirect its standard input stream to read from the pipe
on the server, run zcat with its standard input redirected from the data read from the ssh connection and its standard output redirected to write to "some_file"

Default process flags

Is there a way to instruct the Erlang VM to apply a set of process flags to every new process that is spawned in the system?
For example in testing environment I would like every process to have save_calls flag set.
One way for doing this is to combine the Erlang tracing functionalities with a .erlang file.
Specifically, you could either use the low-level tracing capabilities provided by erlang:trace/3 or you could simply exploit the dbg:tracer/2 function to create a new tracing process which executes your custom handler function every time a tracing message is received.
To automate things a bit, you could then create an Erlang Start Up File in the directory where you're running your code or in your home directory. The Erlang Start Up File is a special file, called .erlang, which gets executed every time you start the run-time system.
Something like the following should do the job:
% -*- Erlang -*-
erlang:display("This is automatically executed.").
dbg:tracer(process, {fun ({trace, Pid, spawn, Pid2, {M, F, Args}}, Data) ->
process_flag(Pid2, save_calls, Data),
Data;
(_Trace, Data) ->
Data
end, 100}).
dbg:p(new, [procs, sos]).
Basically, I'm creating a new tracing process, which will trace processes (first argument). I'm specifying an handler function to get executed and some initial data. In the handler function, I'm setting the save_calls flag for newly spawned processes, whilst I'm ignoring all other tracing messages. I set the save_calls' option to 100, using the Initial Data parameter. In the last call, I'm telling dbg that I'm interested only in newly created processes. I'm also setting the sos (set_on_spawn) option to ensure inheritance of the tracing flags.
Finally, note how you need to use a variant of the process_flag function, which takes an extra argument (the Pid of the process you want to set the flag for).

On external port how to only close the output and wait for exit_status

Im using a port to run a pipeline with uncompresses and dd's some data:
Port = open_port({spawn, "bzcat | sudo dd of=/dev/foo},
[stream, use_stdio, exit_status]),
What I would like to do is produce a end-of-file situation on the output which causes the pipeline to complete and eventually exit.
I would like to wait for this completion and also capture the exit_status.
When I just call port_close it looks to me as if the pipeline is just terminated and there is no wait for completion. Also I don't get any exit_status ....
How can I accomplish waiting for exit before my next step (which requires the dd to have completed).
Did some experiments and it looks like at least port_close doesn't kill the process, you just don't find out when its done. Is this correct?
If you just need to wait for spawned by open_port command to complete you need to wait for exit_status message:
1> Port = open_port({spawn, "sleep 7"}, [exit_status]).
#Port<0.497>
2> receive {Port, {exit_status, Code}} -> Code after 10000 -> timeout end.
0
Update (about to say a port just close the output pipe): I think you can't just close the output pipe with the default spawn driver. Default driver doesn't have any control commands and port_close although don't kill spawned command but completely erase all port's state.
Possible solutions:
Write input stream to a file first and then run bzip/dd sequence on that file;
Write your own driver or NIF (Maybe some open source implementations already exist?)
Use some external script and control protocol, for example full (or chunk) length can be transferred before the actual content so the script will know when to close the connection
Several rather ugly workarounds to this problem can be found here: limitations of erlang:open_port() and os:cmd()
Some even use netcat to map the problem to a tcp connection.

Capturing output from WshShell.Exec using Windows Script Host

I wrote the following two functions, and call the second ("callAndWait") from JavaScript running inside Windows Script Host. My overall intent is to call one command line program from another. That is, I'm running the initial scripting using cscript, and then trying to run something else (Ant) from that script.
function readAllFromAny(oExec)
{
if (!oExec.StdOut.AtEndOfStream)
return oExec.StdOut.ReadLine();
if (!oExec.StdErr.AtEndOfStream)
return "STDERR: " + oExec.StdErr.ReadLine();
return -1;
}
// Execute a command line function....
function callAndWait(execStr) {
var oExec = WshShell.Exec(execStr);
while (oExec.Status == 0)
{
WScript.Sleep(100);
var output;
while ( (output = readAllFromAny(oExec)) != -1) {
WScript.StdOut.WriteLine(output);
}
}
}
Unfortunately, when I run my program, I don't get immediate feedback about what the called program is doing. Instead, the output seems to come in fits and starts, sometimes waiting until the original program has finished, and sometimes it appears to have deadlocked. What I really want to do is have the spawned process actually share the same StdOut as the calling process, but I don't see a way to do that. Just setting oExec.StdOut = WScript.StdOut doesn't work.
Is there an alternate way to spawn processes that will share the StdOut & StdErr of the launching process? I tried using "WshShell.Run(), but that gives me a "permission denied" error. That's problematic, because I don't want to have to tell my clients to change how their Windows environment is configured just to run my program.
What can I do?
You cannot read from StdErr and StdOut in the script engine in this way, as there is no non-blocking IO as Code Master Bob says. If the called process fills up the buffer (about 4KB) on StdErr while you are attempting to read from StdOut, or vice-versa, then you will deadlock/hang. You will starve while waiting for StdOut and it will block waiting for you to read from StdErr.
The practical solution is to redirect StdErr to StdOut like this:
sCommandLine = """c:\Path\To\prog.exe"" Argument1 argument2"
Dim oExec
Set oExec = WshShell.Exec("CMD /S /C "" " & sCommandLine & " 2>&1 """)
In other words, what gets passed to CreateProcess is this:
CMD /S /C " "c:\Path\To\prog.exe" Argument1 argument2 2>&1 "
This invokes CMD.EXE, which interprets the command line. /S /C invokes a special parsing rule so that the first and last quote are stripped off, and the remainder used as-is and executed by CMD.EXE. So CMD.EXE executes this:
"c:\Path\To\prog.exe" Argument1 argument2 2>&1
The incantation 2>&1 redirects prog.exe's StdErr to StdOut. CMD.EXE will propagate the exit code.
You can now succeed by reading from StdOut and ignoring StdErr.
The downside is that the StdErr and StdOut output get mixed together. As long as they are recognisable you can probably work with this.
Another technique which might help in this situation is to redirect the standard error stream of the command to accompany the standard output.
Do this by adding "%comspec% /c" to the front and "2>&1" to the end of the execStr string.
That is, change the command you run from:
zzz
to:
%comspec% /c zzz 2>&1
The "2>&1" is a redirect instruction which causes the StdErr output (file descriptor 2) to be written to the StdOut stream (file descriptor 1).
You need to include the "%comspec% /c" part because it is the command interpreter which understands about the command line redirect. See http://technet.microsoft.com/en-us/library/ee156605.aspx
Using "%comspec%" instead of "cmd" gives portability to a wider range of Windows versions.
If your command contains quoted string arguments, it may be tricky to get them right:
the specification for how cmd handles quotes after "/c" seems to be incomplete.
With this, your script needs only to read the StdOut stream, and will receive both standard output and standard error.
I used this with "net stop wuauserv", which writes to StdOut on success (if the service is running)
and StdErr on failure (if the service is already stopped).
First, your loop is broken in that it always tries to read from oExec.StdOut first. If there is no actual output then it will hang until there is. You wont see any StdErr output until StdOut.atEndOfStream becomes true (probably when the child terminates). Unfortunately, there is no concept of non-blocking I/O in the script engine. That means calling read and having it return immediately if there is no data in the buffer. Thus there is probably no way to get this loop to work as you want. Second, WShell.Run does not provide any properties or methods to access the standard I/O of the child process. It creates the child in a separate window, totally isolated from the parent except for the return code. However, if all you want is to be able to SEE the output from the child then this might be acceptable. You will also be able to interact with the child (input) but only through the new window (see SendKeys).
As for using ReadAll(), this would be even worse since it collects all the input from the stream before returning so you wouldn't see anything at all until the stream was closed. I have no idea why the example places the ReadAll in a loop which builds a string, a single if (!WScript.StdIn.AtEndOfStream) should be sufficient to avoid exceptions.
Another alternative might be to use the process creation methods in WMI. How standard I/O is handled is not clear and there doesn't appear to be any way to allocate specific streams as StdIn/Out/Err. The only hope would be that the child would inherit these from the parent but that's what you want, isn't it? (This comment based upon an idea and a little bit of research but no actual testing.)
Basically, the scripting system is not designed for complicated interprocess communication/synchronisation.
Note: Tests confirming the above were performed on Windows XP Sp2 using Script version 5.6. Reference to current (5.8) manuals suggests no change.
Yes, the Exec function seems to be broken when it comes to terminal output.
I have been using a similar function function ConsumeStd(e) {WScript.StdOut.Write(e.StdOut.ReadAll());WScript.StdErr.Write(e.StdErr.ReadAll());} that I call in a loop similar to yours. Not sure if checking for EOF and reading line by line is better or worse.
You might have hit the deadlock issue described on this Microsoft Support site.
One suggestion is to always read both from stdout and stderr.
You could change readAllFromAny to:
function readAllFromAny(oExec)
{
var output = "";
if (!oExec.StdOut.AtEndOfStream)
output = output + oExec.StdOut.ReadLine();
if (!oExec.StdErr.AtEndOfStream)
output = output + "STDERR: " + oExec.StdErr.ReadLine();
return output ? output : -1;
}

Resources