Is stdout line buffered, unbuffered or indeterminate by default?

Section 7.19.3/7 of c99 states that:
At program start-up, three text streams are predefined and need not be opened explicitly - standard input (for reading conventional input), standard output (for writing conventional output), and standard error (for writing diagnostic output).
As initially opened, the standard error stream is not fully buffered; the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device.
So that makes sense. If you're pushing your standard output to a file, you want it fully buffered for efficiency.
But I can find no mention in the standard as to whether the output is line buffered or unbuffered when you can't determine that the device is non-interactive (i.e., normal output to a terminal).
The reason I ask is a comment on my answer here suggesting that I should insert an fflush(stdout); between the two statements:
printf ("Enter number> ");
// fflush (stdout); needed ?
if (fgets (buff, sizeof(buff), stdin) == NULL) { ... }
because I wasn't terminating the printf with a newline. Can anyone clear this up?

The C99 standard does not specify whether the three standard streams are unbuffered or line buffered: it is up to the implementation. All UNIX implementations I know have a line buffered stdin. On Linux, stdout is line buffered and stderr unbuffered.
As far as I know, POSIX does not impose additional restrictions. POSIX's fflush page does note in the EXAMPLES section:
[...] The fflush() function is used because standard output is usually buffered and the prompt may not immediately be printed on the output or terminal.
So the remark that you should add fflush(stdout); is correct.
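For illustration, here is a minimal, self-contained sketch of the prompt-then-read pattern under discussion; the explicit fflush makes the prompt appear whether stdout turns out to be line buffered or fully buffered:
#include <stdio.h>

int main(void)
{
    char buff[64];

    printf("Enter number> ");
    fflush(stdout);                      /* push the prompt out regardless of buffering mode */

    if (fgets(buff, sizeof buff, stdin) == NULL)
        return 1;                        /* EOF or read error */

    printf("You entered: %s", buff);     /* fgets keeps the trailing newline */
    return 0;
}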
An alternative could be to make stdout unbuffered:
setbuf(stdout, NULL);
/* or */
setvbuf(stdout, NULL, _IONBF, 0);
But as R. notes, you can only do this once, and it must be done before you write to stdout or perform any other operation on it (C99 7.19.5.5p2).
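As a sketch, the call therefore goes at the very top of main, before anything else touches stdout:
#include <stdio.h>

int main(void)
{
    setvbuf(stdout, NULL, _IONBF, 0);    /* must precede any other operation on stdout */

    printf("Enter number> ");            /* appears immediately; no fflush needed */
    /* ... */
    return 0;
}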
I just read a recent thread on comp.lang.c about the same thing. One of the remarks:
Unix convention is that stdin and stdout are line-buffered when associated with a terminal, and fully-buffered (aka block-buffered) otherwise. stderr is always unbuffered.

Related

Getting the current index in the input string (flex lexer)

I am using the flex lexer. Is there a way to (1) get the current index in the input string and (2) jump back to that index at a future point in time?
Thanks.
It's fairly easy to maintain the current input position. When any rule is matched, yyleng contains the length of the match, so it is sufficient to add yyleng to the cumulative length processed. Assuming you are using flex, it is not necessary to insert the code directly into every rule action, which would be tedious. Instead, you can use the YY_USER_ACTION macro:
#define YY_USER_ACTION input_pos += yyleng;
(This assumes that you have defined input_pos somewhere, and arranged for it to be initialized to 0 when the lexical scan commences.)
This will lead to incorrect results if you use REJECT, yymore(), yyless() or input(); in all of these cases, you will have to adjust the value of input_pos. For every call to yymore(), you need to subtract yyleng from input_pos; this will also work for REJECT. For a call to yyless(), you can subtract yyleng before the call and add it back after the call. For each call to input(), you need to add one to input_pos.
Within a rule, you can then use input_pos as the position at the end of the match, or input_pos - yyleng as the position at the beginning of the match.
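For example, a rule action could record the start offset of each token like this (a sketch; record_token is a hypothetical helper, and input_pos is the counter maintained by the YY_USER_ACTION macro above):
[[:alpha:]][[:alnum:]]*  { long start = input_pos - yyleng;  /* offset of the token's first character */
                           record_token(yytext, start); }    /* hypothetical helper */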
Returning to a saved position is trickier.
(F)lex does not maintain the entire input in memory, so in principle you would need to use fseek() to rewind yyin to the correct place. However, in the common case where yyin has not been opened in binary mode, you cannot reliably use fseek() to return to a computed input offset. So at a minimum, you would have to ensure that yyin was opened (or reopened) in binary mode.
Moreover, it is not in general possible to guarantee that whatever stream yyin is attached to can be rewound at all (it might be console input, a pipe, or some other non-seekable device). So to be fully general, you might have to use a temporary file to store data read from the stream. This will create additional complications when you attempt to reread previous input, because you will have to switch to the temporary file for reading until it is finished, at which point you would have to return to the main file. Creative use of yywrap will simplify this procedure.
Note that after you rewind the input stream -- whether or not you switch to reading from a temporary file -- you must call yyrestart() to reset the scanner's input buffer. (This is also a flex-only feature; Posix lex does not specify the mechanism by which you inform the scanner that its buffer needs to be reset, so if you are not using flex you will have to consult the relevant documentation for your scanner generator.)
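Putting the pieces together, a minimal sketch of returning to a saved offset might look like this (it assumes yyin is a seekable file opened in binary mode; saved_pos is illustrative):
long saved_pos = input_pos;          /* remember the current offset */
/* ... continue scanning ... */
fseek(yyin, saved_pos, SEEK_SET);    /* rewind the underlying stream */
yyrestart(yyin);                     /* make flex discard its stale buffer */
input_pos = saved_pos;               /* keep our own counter consistent */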

Going back to old position in lex

During my lex processing, I need to go back in the lex input file, to process the same input several times with different local settings.
However, just doing fseek(yyin, old_pos, SEEK_SET); does not work, since the input data are buffered by lex. How can I (portably) deal with this?
I tried to add a YY_FLUSH_BUFFER after the fseek(), but it didn't help, since the old file position was incorrect (it was set to the point after filling the buffer, not to the point where I evaluate the token).
The combination of YY_FLUSH_BUFFER (a macro, used without arguments) and fseek(yyin, position, SEEK_SET) (in either order, but I would do the YY_FLUSH_BUFFER first) will certainly cause the next token to be scanned starting at position. The problem is figuring out the correct value of position.
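In code, that combination is just (position being the offset you computed):
YY_FLUSH_BUFFER;                     /* discard whatever flex has buffered */
fseek(yyin, position, SEEK_SET);     /* the next buffer refill starts here */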
It is relatively simple to track the character offset (but see the disclaimer below if you require a portable scanner which could run on non-Posix platforms such as Windows):
%{
long scan_position = 0;
%}
%%
[[:space:]]+ scan_position += yyleng;
"some pattern" { scan_position += yyleng; ... }
Since it's a bit tedious to insert scan_position += yyleng; into every rule, you can use flex's helpful YY_USER_ACTION macro hook: this macro is expanded at the beginning of every action (even empty actions). So you could write the above more simply:
%{
long scan_position = 0;
#define YY_USER_ACTION scan_position += yyleng;
%}
%%
[[:space:]]+
"some pattern" { ... }
One caveat: this will not work if you use any of the flex actions which adjust the token length or otherwise alter the normal scanning procedure. That includes at least yyless, yymore, REJECT, unput and input. If you use any of the first three, you need to adjust the position with scan_position -= yyleng; just before the invocation of yyless, yymore or REJECT (for yyless, add yyleng back after the call, since yyless resets yyleng to the retained length). For input and unput, you need to increment or decrement scan_position to account for the character read or pushed back outside of the scanning process.
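For instance, the yyless adjustment described above would look like this (n being whatever prefix length you decide to keep):
scan_position -= yyleng;    /* back out the full match */
yyless(n);                  /* push back all but the first n characters; yyleng becomes n */
scan_position += yyleng;    /* re-add the n characters actually consumed */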
Disclaimer:
Tracking positions like that assumes that there is a one-to-one correspondence between bytes read from an input stream and raw bytes in the underlying file system. For Posix systems, this is guaranteed to be the case: fread(3) and read(2) will read the same bytes and the b open mode flag has no effect.
In general, though, there is no reliable way of tracking file position. You could open the stream in binary mode and deal with the system's idiosyncratic line endings yourself (this will work on Windows, but there is no portable way of establishing what the line-ending sequence is, so it is not portable either). But on other non-Posix systems, it is possible that a binary read produces a completely different result (for example, the underlying file might use fixed-length records, so that each line is padded with some system-specific padding character to make it the correct length).
That's why the C standard prohibits the use of computed offset values:
For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET. (§7.21.9.2 "The fseek function", paragraph 4)
There is no way to turn buffering off in flex -- or any version of lex that I know of -- because correctly handling fallback depends on being able to buffer. (Fallback happens when the scan has proceeded beyond the end of a token, because the token matches the prefix of a longer token which happens not to be present.)
I think the only portable solution would be to copy the input stream token by token into your own buffer (or temporary file) and then use yypush_buffer_state and yy_scan_buffer (if you're using a buffer) to insert that buffer into the input stream. That solution would look a lot like the tracking code above, except that YY_USER_ACTION would append the tokens read to your own string buffer or temporary file. (You would want to make that conditional on a flag so that it only happens in the segment of the file you want to rescan.) If you have nested repeats, you could track the position in your own buffer/file in order to be able to return to it.
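A sketch of the rescan step, assuming the captured text ended up in a NUL-terminated string saved (all names illustrative; this code would live in the scanner file, where YY_CURRENT_BUFFER is visible):
YY_BUFFER_STATE old = YY_CURRENT_BUFFER;
YY_BUFFER_STATE tmp = yy_scan_string(saved);  /* copies saved and switches to it */
while (yylex() != 0)
    ;                                         /* rescan with the new local settings */
yy_delete_buffer(tmp);
yy_switch_to_buffer(old);                     /* resume reading the main input */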

Disable Tcl's input buffering?

Does Tcl do any internal input buffering that's out of the script writer's control? Will the following code possibly waste entropy (read more than 1 byte), and if so, how can I prevent it?
set stream [open "/dev/srandom"]
chan configure $stream -translation binary
set randomByte [chan read $stream 1]
Yes, Tcl defaults to buffering and will waste entropy (as much as a single read call decides to hand over).
I thought that you can prevent it with
chan configure $stream -buffering none
But no: -buffering has no effect on the input queue (internally, input is not a single buffer).
However,
chan configure $stream -buffersize 0
does the trick, as I've seen from an experiment with stdin under strace. It makes all input arrive via read(2) calls of size 1 (regardless of the size argument to Tcl's read), so it would be extremely slow for normal use.

Common Lisp struggle with read-byte/write-byte

I want to be able to write bytes and read them from standard input/output, but when I try this in SBCL I get the error "The stream has no suitable method [...]". Why is this, and how would I go about making my own stream which can handle bytes?
This seems to be because the standard input and output streams are streams with element type character, not (unsigned-byte 8). The element type of a stream is usually configured when the stream is opened, which, in the case of standard input/output, happens automatically when the interpreter starts.
However, SBCL has the notion of bivalent streams, which can support both character and byte-oriented I/O. As it happens, on my machine,
* (read-byte *standard-input* nil)
a
97
* (read-char *standard-input* nil)
a
#\a
works fine. So, which version of SBCL are you using? Mine is SBCL 1.0.49.

popen() call hangs on HP-UX 11.11

I have a program which calculates the 'Printer Queues Total' value using '/usr/bin/lpstat' through a popen() call.
#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(void)
{
int n=0;
FILE *fp=NULL;
printf("Before popen()");
fp = popen("/usr/bin/lpstat -o | grep '^[^ ]*-[0-9]*[ \t]' | wc -l", "r");
printf("After popen()");
if (fp == NULL)
{
printf("Failed to start lpstat - %s", strerror(errno));
return -1;
}
printf("Before fscanf");
fscanf(fp, "%d", &n);
printf("After fscanf");
printf("Before pclose()");
pclose(fp);
printf("After pclose()");
printf("Value=%d",n);
printf("=== END ===");
return 0;
}
Note: On the command line, the '/usr/bin/lpstat' command itself hangs for some time, as there are many printers available on the network.
The problem here is that execution hangs at the popen() call, whereas I would expect it to hang at fscanf(), which reads the output from the file stream fp.
If anybody can tell me the reason for the hang at the popen() call, it will help me modify the program to work for my requirement.
Thanks for taking the time to read this post.
What people expect does not always have a basis in reality :-)
The command you're running doesn't actually generate any output until it's finished. That would be why it would seem to be hung in the popen rather than the fscanf.
There are two possible reasons for that which spring to mind immediately.
The first is that it's implemented this way, with popen capturing the output in full before delivering the first line. Based on my knowledge of UNIX, this seems unlikely but I can't be sure.
Far more likely is the impact of the pipe. One thing I've noticed is that some filters (like grep) batch up their lines for efficiency. So, while lpstat itself may be spewing forth its lines immediately (well, until it gets to the delay bit anyway), the fact that grep is holding on to the lines until it gets a big enough block may be causing the delay.
In fact, it's almost certainly the pipe-through-wc, which cannot generate any output until all lines are received from lpstat (you cannot figure out how many lines there are until all the lines have been received). So, even if popen just waited for the first character to be available, that would seem to be where the hang was.
It would be a simple matter to test this by simply removing the pipe-through-grep-and-wc bit and seeing what happens.
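A hypothetical version of that test (line being a suitably sized char array) would be:
fp = popen("/usr/bin/lpstat -o", "r");      /* no grep/wc stages in the way */
while (fgets(line, sizeof line, fp) != NULL)
    printf("got: %s", line);                /* lines should now appear as lpstat emits them */
pclose(fp);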
Just one other point I'd like to raise. Your printf statements do not have newlines following and, even if they did, there are circumstances where the output may still be fully buffered (so that you probably wouldn't see anything until that program exited, or the buffer filled up).
I would start by changing them to the form:
printf ("message here\n"); fflush (stdout); fsync (fileno (stdout));
to ensure they're flushed fully before continuing. I'd hate this to be a simple misunderstanding of a buffering issue :-)
It sounds as if popen may be hanging whilst lpstat attempts to retrieve information from remote printers. There is a fair amount of discussion on this particular problem. Have a look at that thread, and especially the ones that are linked from that.
