Reading the binary output of an external program in Common Lisp

I'm trying to run an external program in SBCL and capture its output.
The output is binary data (a png image), while SBCL insists on interpreting it as strings.
I tried a number of approaches, such as:
(trivial-shell:shell-command "/path/to/png-generator" :input "some input")
(with-input-from-string (input "some input")
  (with-output-to-string (output)
    (run-program "/path/to/png-generator" () :input input :output output)))
(with-input-from-string (input "some input")
  (flexi-streams:with-output-to-sequence (output)
    (run-program "/path/to/png-generator" () :input input :output output)))
But I get errors like
Illegal :UTF-8 character starting at byte position 0.
It seems to me that SBCL is trying to interpret the binary data as text and decode it. How do I change this behaviour? I'm interested only in obtaining a vector of octets.
Edit: Since it is not clear from the text above, I'd like to add that, at least in the case of flexi-streams, the element-type of the stream is flexi-streams:octet (which is (unsigned-byte 8)).
I would expect run-program, at least in this case, to read the raw bytes without issue. Instead I get a message like: Don't know how to copy to stream of element-type (UNSIGNED-BYTE 8)

Edit: I got angry at not being able to do this very simple task and solved the problem.
Functionally, the ability to send a stream of type UNSIGNED-BYTE into run-program and have it work correctly is severely limited, for reasons I don't understand. I tried gray streams, flexi-streams, fd streams, and a few other mechanisms, like you.
However, perusing run-program's source (for the fifth or sixth time), I noticed that there's an option :STREAM you can pass to output. Given that, I wondered if read-byte would work... and it did. For more performant work, one could determine how to get the length of a non-file stream and run READ-SEQUENCE on it.
(let* (;; Get random bytes.
       (proc-var (sb-ext:run-program "head" '("-c" "10" "/dev/urandom")
                                     :search t
                                     ;; Let SBCL figure out the storage type. This is what solved the problem.
                                     :output :stream
                                     ;; Also request a stream for stderr; otherwise process-error returns NIL.
                                     :error :stream))
       ;; Obtain the streams from the process object.
       (output (sb-ext:process-output proc-var))
       (err (sb-ext:process-error proc-var)))
  (values
   ;; Return both stdout and stderr, just for polish.
   ;; Do a byte read and turn it into a vector.
   (concatenate 'vector
                ;; A byte with value 0 is *not* value nil. Yay for Lisp!
                (loop for byte = (read-byte output nil)
                      while byte
                      collect byte))
   ;; Repeat for stderr.
   (concatenate 'vector
                (loop for byte = (read-byte err nil)
                      while byte
                      collect byte))))
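On the READ-SEQUENCE remark above: a chunked read does not strictly need the stream length, because an empty or short read already signals end of file. Here is a minimal sketch of that loop, written in Python purely to illustrate the idea and not as SBCL code. The helper name read_all_binary and the child command are made up for the example.

```python
import subprocess
import sys

def read_all_binary(argv, chunk_size=4096):
    """Run argv and return its stdout as bytes, reading fixed-size chunks."""
    chunks = []
    with subprocess.Popen(argv, stdout=subprocess.PIPE) as proc:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:          # an empty read means EOF on the pipe
                break
            chunks.append(chunk)
    return b"".join(chunks)

# Hypothetical child command: emit every byte value 0..255 exactly once.
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(bytes(range(256)))"]
data = read_all_binary(child, chunk_size=100)
```

The pipe is opened in binary mode, so no decoding ever happens and a zero byte is just another element of the result.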

If you're willing to use some external libraries, this can be done with babel-streams. This is a function I use to safely get content from a program. I use :latin-1 because it maps each of the 256 possible byte values directly to the character with the same code, so no byte sequence is ever rejected. You could remove the octets-to-string call and keep the vector.
If you wanted stderr as well, you could use nested 'with-output-to-sequence' to get both.
(defun safe-shell (command &rest args)
  (octets-to-string
   (with-output-to-sequence (stream :external-format :latin-1)
     (let ((proc (sb-ext:run-program command args :search t :wait t :output stream)))
       (case (sb-ext:process-status proc)
         (:exited (unless (zerop (sb-ext:process-exit-code proc))
                    (error "Error in command")))
         (t (error "Unable to terminate process")))))
   :encoding :latin-1))
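To see why Latin-1 is a safe choice here while UTF-8 is not, it may help to look at the bytes themselves. The sketch below is in Python, used only as a neutral illustration and unrelated to the Lisp libraries above. It decodes the standard 8-byte PNG signature as UTF-8, which fails at byte position 0 just like the error in the question, and then round-trips all 256 byte values through Latin-1.

```python
# First eight bytes of every PNG file (the PNG signature).
png_signature = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

# UTF-8 does not allow 0x89 as the first byte of a character,
# so decoding fails right at position 0.
try:
    png_signature.decode("utf-8")
    utf8_error_position = None
except UnicodeDecodeError as exc:
    utf8_error_position = exc.start

# Latin-1 maps byte value n to code point n, so every byte string decodes,
# and encoding the result gives back the original bytes.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
round_trip = as_text.encode("latin-1")
```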

Paul Nathan already gave a pretty complete answer as to how to read I/O from a program as binary, so I'll just add why your code didn't work: you explicitly asked SBCL to interpret the I/O as a string of UTF-8 characters, by using with-input-from-string and with-output-to-string.
Also, I'd like to point that you don't need to go as far as run-program's source code to get to the solution. It's clearly documented in SBCL's manual.

Erlang equivalent of javascript codePointAt?

Is there an erlang equivalent of codePointAt from js? One that gets the code point starting at a byte offset, without modifying the underlying string/binary?

You can use bit syntax pattern matching to skip the first N bytes and decode the first character from the remaining bytes as UTF-8:
1> CodePointAt = fun(Binary, Offset) ->
       <<_:Offset/binary, Char/utf8, _/binary>> = Binary,
       Char
   end.
Test:
2> CodePointAt(<<"πr²"/utf8>>, 0).
960
3> CodePointAt(<<"πr²"/utf8>>, 1).
** exception error: no match of right hand side value <<207,128,114,194,178>>
4> CodePointAt(<<"πr²"/utf8>>, 2).
114
5> CodePointAt(<<"πr²"/utf8>>, 3).
178
6> CodePointAt(<<"πr²"/utf8>>, 4).
** exception error: no match of right hand side value <<207,128,114,194,178>>
7> CodePointAt(<<"πr²"/utf8>>, 5).
** exception error: no match of right hand side value <<207,128,114,194,178>>
As you can see, if the offset does not fall on a UTF-8 character boundary (offsets 1 and 4), or if it is at or past the end of the binary (offset 5), the match fails and the function raises an error. You can handle that differently using a case expression if needed.
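For readers who want to see what "character boundary" means at the byte level, here is a rough Python sketch of the same lookup. It is only an illustration, not Erlang and not part of the answer's code. The function name code_point_at is made up for the example, and it returns None where the Erlang fun would raise.

```python
def code_point_at(data: bytes, offset: int):
    """Return the code point that starts at byte `offset`, or None."""
    if not 0 <= offset < len(data):
        return None                      # offset is outside the data
    first = data[offset]
    if first < 0x80:
        length = 1                       # ASCII
    elif 0xC2 <= first <= 0xDF:
        length = 2
    elif 0xE0 <= first <= 0xEF:
        length = 3
    elif 0xF0 <= first <= 0xF4:
        length = 4
    else:
        return None                      # continuation byte or invalid lead byte
    try:
        return ord(data[offset:offset + length].decode("utf-8"))
    except UnicodeDecodeError:
        return None                      # truncated or malformed sequence

data = "πr²".encode("utf-8")             # bytes 207,128,114,194,178
results = [code_point_at(data, i) for i in range(6)]
```

With the same test string as above, offsets 0, 2 and 3 give 960, 114 and 178, and offsets 1, 4 and 5 give None.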

First, remember that only binary strings are using UTF-8 in Erlang. Plain double-quote strings are already just lists of code points (much like UTF-32). The unicode:chardata() type represents both of these kinds of strings, including mixed lists like ["Hello", $\s, [<<"Filip"/utf8>>, $!]]. You can use unicode:characters_to_list(Chardata) or unicode:characters_to_binary(Chardata) to get a flattened version to work with if needed.
Meanwhile, the JS codePointAt function works on UTF-16 encoded strings, which is what JavaScript uses. Note that the index in this case is not a byte position, but the index of the 16-bit units of the encoding. And UTF-16 is also a variable length encoding: code points that need more than 16 bits use a kind of escape sequence called "surrogate pairs" - for example emojis like 👍 - so if such characters can occur, the index is misleading: in "a👍z" (in JavaScript), the a is at 0, but the z is not at 2 but at 3.
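The index shift described above can be checked by listing the UTF-16 code units explicitly. This is a small Python sketch, used only because it makes the units easy to print; the helper name utf16_units is made up for the example.

```python
def utf16_units(text: str):
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

text = "a\U0001F44Dz"                    # the string "a👍z"
units = utf16_units(text)                # what JavaScript indexes into
code_points = [ord(ch) for ch in text]   # what an Erlang list string holds

z_index_in_units = units.index(ord("z"))
z_index_in_code_points = code_points.index(ord("z"))
```

The emoji occupies two code units (the surrogate pair 0xD83D, 0xDC4D), so z sits at index 3 in the UTF-16 view but at index 2 in the list of code points.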
What you want is probably what's called the "grapheme clusters" - those that look like a single thing when printed (see the docs for Erlang's string module: https://www.erlang.org/doc/man/string.html). And you can't really use numerical indexes to dig the grapheme clusters out from a string - you need to iterate over the string from the start, getting them out one at a time. This can be done with string:next_grapheme(Chardata) (see https://www.erlang.org/doc/man/string.html#next_grapheme-1) or if you for some reason really need to index them numerically, you could insert the individual cluster substrings in an array (see https://www.erlang.org/doc/man/array.html). For example: array:from_list(string:to_graphemes(Chardata)).

How to open multiple URLs at the same time in an Emacs buffer?

I am using Emacs together with org-mode and evil-mode, mainly for text handling and documentation. Often a single topic has several different URLs that belong to it.
Example: I have a text snippet on how to install Emacs:
*** install emacs
emacs - I want to try org-mode. What's the shortest path from zero to typing? - Stack Overflow
https://stackoverflow.com/questions/4940680/i-want-to-try-org-mode-whats-the-shortest-path-from-zero-to-typing
Index of /gnu/emacs/windows/emacs-26
http://ftp.gnu.org/gnu/emacs/windows/emacs-26/emacs-26.3-x86_64.zip
Installation target:
file://C:\Lupo_Pensuite\MyApps\emacs
How to
file://C:\Lupo_Pensuite\MyDocs\howto.txt
Is it possible to select the region and have all the URLs opened in my default web browser, the file link opened by Windows Explorer, and the text file opened with its associated editor?
Or even better: Emacs is aware that the text snippet above is actually an org-mode chapter. Regardless of where in that chapter the cursor is positioned, something like M-x open-all-links-in-chapter opens all links mentioned in the current chapter.
Prio 1: does something like that already exist in emacs/org-mode/evil-mode?
Prio 2: do you know an elisp function that can achieve this use case?
Environment: Cygwin under Windows 10, emacs 26.3, org-mode 9.1.9

It turns out that org-mode already has this built in!
Today I was browsing the org-mode documentation, wondering how exactly C-c C-o works. That key combination calls the org-mode function "org-open-at-point", which opens the URL where the cursor (in Emacs speak: point) is positioned.
Now if C-c C-o is pressed on a heading, all URLs beneath that heading are offered for opening! Which is exactly what I asked for from the beginning. Thanks a lot, NickD, for your constructive contributions!
Here the original help text:
When point is on a headline, display a list of every link in the entry, so it is possible to pick one, or all, of them.

Warning: used without thought, the following can bring your machine to its knees. I will add some more specific warnings at the end, but be careful!
The basic idea of the code below is to parse the buffer of an Org mode file, in order to get a parse tree of the buffer: that is done by org-element-parse-buffer. We can then use org-element-map to walk the parse tree and select only nodes of type link, applying a function to each one as we go. The function we apply, get-link, munges through the contents of the link node, extracting the type and path and returning a list of those two. Here's how it looks so far:
(defun get-link (x)
  (let* ((link (cadr x))
         (type (plist-get link :type))
         (path (plist-get link :path)))
    (if (or (string= type "http") (string= type "https"))
        (list type path))))
(defun visit-all-http-links ()
  (interactive)
  (let* ((parse-tree (org-element-parse-buffer))
         (links (org-element-map parse-tree 'link #'get-link)))
    links))
Note that I only keep http and https links: you may want to add extra types.
This already goes a long way towards getting you what you want. In fact, if you load the file with the two functions above, you can try it on the following sample Org mode file:
* foo
** foo 1
http://www.google.com
https://redhat.com
* bar
** bar 2
[[https://gnome.org][Gnome]] is a FLOSS project. So is Fedora: https://fedoraproject.org.
* Code
#+begin_src emacs-lisp :results value verbatim :wrap example
(visit-all-http-links)
#+end_src
#+RESULTS:
#+begin_example
(("http" "//www.google.com") ("https" "//redhat.com") ("https" "//gnome.org") ("https" "//fedoraproject.org"))
#+end_example
and evaluating the source block with C-c C-c, you get the results shown.
Now all we need to do is convert each (TYPE PATH) pair in the result list to a real URL and then visit it - here's the final version of the code:
(defun get-link (x)
  "Assuming x is a LINK node in an Org mode parse tree,
return a list consisting of its type (e.g. \"http\")
and its path."
  (let* ((link (cadr x))
         (type (plist-get link :type))
         (path (plist-get link :path)))
    (if (or (string= type "http") (string= type "https"))
        (list type path))))
(defun format-url (x)
  "Take a (TYPE PATH) list and return a proper URL. Note
the following works for http- and https-type links, but
might need modification for other types."
  (format "%s:%s" (nth 0 x) (nth 1 x)))
(defun visit-all-http-links ()
  (interactive)
  (let* ((parse-tree (org-element-parse-buffer))
         (links (org-element-map parse-tree 'link #'get-link)))
    (mapcar #'browse-url (mapcar #'format-url links))))
We add a function format-url that does this: ("http" "//example.com") --> "http://example.com" and map it over the links list, producing a new list of URLs. Then we map the function browse-url (which is provided by Emacs) over the resulting list and watch the browser open them all.
WARNINGS:
If you have hundreds or thousands of links in the file, then you are going to create hundreds or thousands of tabs in your browser. Are you SURE your machine can take it?
If your links point to big objects, that's going to put another kind of memory pressure on your system. Are you SURE your machine can take it?
If your Org mode buffer is big, then org-element-parse-buffer can take a LONG time to process it. Moreover, even though there is a caching mechanism, it is not enabled by default because of bugs, so every time you execute the function you are going to parse the buffer AGAIN from scratch.
Every time you execute the function, you are going to open NEW tabs in your browser.
EDIT in response to questions in comments:
Q1: "visit-all-http-links opens all URLs in the file. My original question was, whether it is possible to open only the URLs which are being found in the current org-mode chapter."
A1: Doing just a region is a bit harder but possible, if you guarantee that the region is syntactically correct Org mode (e.g. a collection of headlines and their contents). You just write the region to a temporary buffer and then do what I did on the temp buffer instead of the original.
Here's the modified code using the visit-url function from Question 2:
(defun visit-all-http-links-in-region (beg end)
  (interactive "r")
  (let ((s (buffer-substring beg end)))
    ;; with-temp-buffer already makes the temporary buffer current.
    (with-temp-buffer
      (insert s)
      (let* ((parse-tree (org-element-parse-buffer))
             (links (org-element-map parse-tree 'link #'get-link)))
        (mapcar #'visit-url (mapcar #'format-url links))))))
(defun visit-all-http-links ()
  (interactive)
  (visit-all-http-links-in-region (point-min) (point-max)))
Very lightly tested.
Q2: "Every time I execute your function with your example URLs, the URLs are being opened with a different sequence - is it possible to open the URLs in that very sequence which is found in the org file?"
A2: The links are found deterministically in the order that they occur in the file. But the moment you call browse-url, all bets are off, because the URL now belongs to the browser, which will try to open each URL it receives in a separate tab and using a separate thread - in other words asynchronously. You might try introducing a delay between calls, but there are no guarantees:
(defun visit-url (url)
  (browse-url url)
  (sit-for 1 t))
and then use visit-url instead of browse-url in visit-all-http-links.

Common Lisp output file streams SBCL

I am on SBCL on debian.
For some reason if I use this:
(with-open-file (output (open #p"file.txt"
:direction :output
:if-exists :overwrite))
(format output "test")))
Where file.txt is a plain text file.
I get the error
#<SB-SYS:FD-STREAM for "file /home/me/file.txt" {1004A90813}> is not
a character output stream.
Even using :element-type 'character doesn't save me. I haven't been able to get any output stream opened by any method. If I try to use write-bit it says that it isn't a binary output stream. No other write functions work either, such as write-sequence or write-line. They all return this error. How do I fix this?

The problem is actually trickier than one might think.
Let's look at the form.
First mistake: it's not indented correctly. Let's indent:
(with-open-file (output (open #p"file.txt"
                              :direction :output
                              :if-exists :overwrite))
  (format output "test")))
Now we can see more mistakes. There is an extra parenthesis:
(with-open-file (output (open #p"file.txt"
                              :direction :output
                              :if-exists :overwrite))
  (format output "test"))) ; <- extra parenthesis
But more important:
(open #p"file.txt"
      :direction :output
      :if-exists :overwrite)
The form above opens a file for writing and returns a stream.
WITH-OPEN-FILE also opens a file. So you open the file TWICE, first for writing...
(with-open-file (output stream)
  (format output "test"))
...and then the form above opens it again for reading, because WITH-OPEN-FILE was given no :direction and the default is :input.
Now you try to write with FORMAT to an input stream.
Now you try to write with FORMAT to an input stream.
The slightly surprising part is this: both open and with-open-file can take a file stream as a file spec. If it gets a file stream as a file spec, then the associated pathname is used for the open operation.
So, as mentioned in another answer, this would be more correct:
(with-open-file (output #p"file.txt"
                        :direction :output
                        :if-exists :supersede)
  (format output "Hello"))
SBCL error message:
#<SB-SYS:FD-STREAM for "file /home/me/file.txt" {1004A90813}>
is not a character output stream.
The point of the error message here is not that the stream is not a character stream. It's not an output stream. The stream actually is a character input stream! Thus calling FORMAT using the stream won't work. Let's write an assert to verify this:
CL-USER 18 > (with-open-file (output (open #p"/tmp/file.txt"
                                           :direction :output
                                           :if-does-not-exist :create
                                           :if-exists :overwrite))
               (assert (output-stream-p output) (output)
                       "The stream ~a is not an output stream!"
                       output)
               (format output "test"))
Error: The stream #<STREAM::LATIN-1-FILE-STREAM /tmp/file.txt>
       is not an output stream!
Your extra question: Why is the following form working?
(with-open-file (input (open #p"file.txt")) ...)
It just opens the file TWICE for reading.

Your usage of with-open-file is incorrect.
(with-open-file (output #p"file.txt"
                        :direction :output
                        :if-exists :supersede)
  (format output "Hello"))

why does io:get_line return "\n" in erlang shell?

When using io:get_line("prompt") in the Erlang shell, the function returns immediately with a return value of "\n":
io:get_line("prompt").
prompt
"\n"
But, as suggested in another thread, the following reads from standard_io correctly:
spawn(fun() -> timer:sleep(100),io:get_line("prompt") end).
It waits for user input and reads from standard I/O (the shell). It was mentioned that this is a race condition. Can anyone tell me why that is, and how it is possible to read a value from the Erlang shell?

io:get_line/1 and io:get_line/2 return data ending in \n every time.
get_line(Prompt) -> Data | server_no_data()
Where:
Data
The characters in the line terminated by a LF (or end of file). If the
IO device supports Unicode, the data may represent codepoints larger
than 255 (the latin1 range). If the I/O server is set to deliver
binaries, they will be encoded in UTF-8 (regardless of if the IO
device actually supports Unicode or not).
In the first case you got \n. In the second case, capture and print the result of io:get_line to see what was read:
spawn(fun() ->
          timer:sleep(100),
          Result = io:get_line("prompt"),
          io:format("Result: ~p~n", [Result])
      end).

Let's break it down...
Why io:get_line/1 returns a \n?
io:get_line/1 returns a "line" and a \n ("end of a line" or "new line") constitutes a line together with the string you entered.
> io:get_line("Prompt:").
Prompt:TheStringIEntered
"TheStringIEntered\n"
How to read a value from the Erlang shell?
> Data = string:strip(io:get_line("Prompt:"), right, $\n).
Prompt:TheStringIEntered
"TheStringIEntered"
> Data.
"TheStringIEntered"
Note that the value (Data) here is always of string type. You can convert it into other types, but you always start with a string.
Why does spawn(fun() -> timer:sleep(100),io:get_line("prompt") end). behave differently?
Because spawn spawns a new process that temporarily takes over the shell. Once this process gets TheStringIEntered, it also reaches the end of its life. So it dies without having its return value (TheStringIEntered) printed to the shell.

How to check for EOF/EOL with Stream I/O in Fortran?

I would like to use FORTRAN streaming I/O to make a program that tells me how many lines a text-file has. The idea is to make something like this:
OPEN(UNIT=10,ACCESS='STREAM',FILE='testfile.txt')
nLines=0
bContinue=.TRUE.
DO WHILE (bContinue)
   READ(UNIT=10) cCharacter
   IF (cCharacter.EQ.{EOL-char}) nLines=nLines+1
   IF (cCharacter.EQ.{EOF-char}) bContinue=.FALSE.
ENDDO
(I didn't include variable declarations but I think you get the idea of what they are; the only important clarification would be that cCharacter has LEN=1)
My problem is that I don't know how to check if the character I just read from the file is an end-of-line or end-of-file (the "ifs" in the code). When you read and print characters this way, you eventually get newlines in the same place you had them in the original text, so I think it does read and recognize them as "characters", somehow. Perhaps turning the characters into integers and comparing to the appropriate number? Or is there a more direct way?
(I know that you can use the register reading (EDIT: I meant record reading) to do a program that reads lines more easily and add an IOstatus to check for eof, but the "line counter" is just a useful example, the idea is to learn how to move in a more controlled way through a textfile)

Checking for a specific character as line terminator makes your program OS dependent. It would be better to use the facilities of the language so that your program is compiler and OS independent. Since lines are basically records, why do this with stream I/O? That request seems to make an easy job into a hard one. If you can use regular I/O, here is an example program to count the lines in a text file.
EDIT: the code fragment was changed into a program to answer questions in the comments. With "line" as a character variable, when I test the program with gfortran and ifort I don't see a problem when the input file has empty or blank lines.
program test_lc
   use, intrinsic :: iso_fortran_env
   integer :: LineCount, Read_Code
   character (len=200) :: line
   open (unit=51, file="temp.txt", status="old", access='sequential', form='formatted', action='read' )
   LineCount = 0
   ReadLoop: do
      read (51, '(A)', iostat=Read_Code) line
      if ( Read_Code /= 0 ) then
         if ( Read_Code == iostat_end ) then
            exit ReadLoop ! end of file --> line count found
         else
            write ( *, '( / "read error: ", I0 )' ) Read_Code
            stop
         end if
      end if
      LineCount = LineCount + 1
      write (*, '( I0, ": ''", A, "''" )' ) LineCount, trim (line)
      if ( len_trim (line) == 0 ) write (*, '("The above is an empty or all blank line.")' )
   end do ReadLoop
   write (*, *) "found", LineCount, " lines"
end program test_lc
If you want to do further processing of the file, you can rewind it.
P.S.
The main reason that I have used Fortran Stream IO is to read files produced by other languages, e.g., C
Portable methods are provided to write new-line boundaries; I'm not aware of a portable method to test for such.
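For comparison, here is roughly what the byte-level approach from the question looks like when the line terminator is simply assumed. This is a Python sketch, not Fortran, and it bakes in the assumption that every line ends with an LF byte (value 10). That covers Unix LF and Windows CRLF files but not every possible convention, which is exactly the OS dependence warned about above. The function name count_lines is made up for the example.

```python
import io

def count_lines(stream, chunk_size=65536):
    """Count lines in a binary stream by counting LF (byte 10) terminators."""
    lines = 0
    last_byte = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:                    # an empty read means end of file
            break
        lines += chunk.count(b"\n")
        last_byte = chunk[-1:]
    # A final line without a trailing newline still counts as a line.
    if last_byte and last_byte != b"\n":
        lines += 1
    return lines

unix_count = count_lines(io.BytesIO(b"a\nb\n"))
crlf_count = count_lines(io.BytesIO(b"a\r\nb\r\nc"))
```

Note that end of file is never a character in the data: it shows up as a read that returns nothing, which plays the same role as the iostat_end check in the Fortran program.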
