Emacs ESS keybinding "less than" dash - binding

How can I program, in Emacs ESS mode, the key
"<" ([less than])
to insert
"<- " ([less than][dash][space])
just like the macOS version of R does?

Looks like there is an existing function for it in the file ess-s-l.el. It would appear that you can use the variable ess-S-assign-key for this:
;; This is by Seth Falcon, modeled after ess-toggle-underscore (see below).
(defun ess-toggle-S-assign-key (force)
  "Possibly bind the key in `ess-S-assign-key' to inserting `ess-S-assign'.
If `ess-S-assign-key' is \"_\", simply use \\[ess-toggle-underscore].
Otherwise, unless the prefix argument FORCE is set,
toggle between the new and the previous assignment."
  (interactive "P")
  (require 'ess-mode)
  (require 'ess-inf)
  (let ((current-action (lookup-key ess-mode-map ess-S-assign-key))
        (insert-S-assign (lambda () (interactive)
                           (delete-horizontal-space) (insert ess-S-assign))))
    (if (and (stringp ess-S-assign-key)
             (string= ess-S-assign-key "_"))
        (ess-toggle-underscore force)
      ;; else "do things here"
      (let* ((current-is-S-assign (eq current-action insert-S-assign))
             (new-action (if force insert-S-assign
                           ;; else "not force" (default):
                           (if (or current-is-S-assign
                                   (eq ess-S-assign-key-last insert-S-assign))
                               ess-S-assign-key-last
                             insert-S-assign))))
        (message "[ess-toggle-S-assign-key:] current: '%s', new: '%s'"
                 current-action new-action)
        (define-key ess-mode-map ess-S-assign-key new-action)
        (define-key inferior-ess-mode-map ess-S-assign-key new-action)
        (if (not (and force current-is-S-assign))
            (setq ess-S-assign-key-last current-action))))))
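A minimal usage sketch (illustrative only; it assumes the function above is loaded and that your ESS honours ess-S-assign-key as a plain key string):

(setq ess-S-assign-key "<")   ; the key the question asks about
(ess-toggle-S-assign-key t)   ; FORCE: bind it to inserting `ess-S-assign'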

Perhaps this is ESS version dependent.
In my version of ESS (12.03), it seems that you can bind ">" to 'ess-insert-S-assign to get what you like.
Look at the ess- commands available to you (M-x ess-<TAB><TAB> and search in the *Completions* buffer that just popped up for assign) to see which command will be the likely culprit that you should bind to ">".
If that does not work for you -- perhaps you might need to upgrade.
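For example, a hedged sketch of that binding (assuming your ESS version does provide ess-insert-S-assign; the hook-based local-set-key approach is just one way to do it):

(add-hook 'ess-mode-hook
          (lambda () (local-set-key (kbd ">") #'ess-insert-S-assign)))
(add-hook 'inferior-ess-mode-hook
          (lambda () (local-set-key (kbd ">") #'ess-insert-S-assign)))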

Related

Starting a parser for scheme language

I am writing a basic parser for a Scheme interpreter and here are the definitions I have set up to define the various type of tokens:
# 1. Parens
Type: PAREN
Subtype: LEFT_PAREN
Value: '('

# 2. Operators (<=, =, +, ...)
Type: OPERATOR
Subtype: EQUALS
Value: '='
Arity: 2

# 3. Types (2.5, "Hello", #f, etc.)
Type: DATA
Subtype: NUMBER
Value: 2.4

# 4. Procedures, builtins, and such
Type: KEYWORD
Subtype: BUILTIN
Value: "set"
Arity: 2
PROCEDURE: ... // probably need a new class for this
Does the above seem like it's a good starting place? Are there some obvious things I'm missing here, or does this give me a "good-enough" foundation?
Your approach makes distinctions which really don't exist in the syntax of the language, and also makes decisions far too early. For example consider this program:
(let ((x 1))
  (with-assignment-notes
   (set! x 2)
   (set! x 3)
   x))
When I run this:
> (let ((x 1))
    (with-assignment-notes
     (set! x 2)
     (set! x 3)
     x))
setting x to 2
setting x to 3
3
In order for this to work with-assignment-notes has to somehow redefine what (set! ...) means in its body. Here's a hacky and probably incorrect (Racket) implementation of that:
(define-syntax with-assignment-notes
  (syntax-rules (set!)
    [(_ form ...)
     (let-syntax ([rewrite/maybe
                   (syntax-rules (set!)
                     [(_ (set! var val))
                      (let ([r val])
                        (printf "setting ~A to ~A~%" 'var r)
                        (set! var r))]
                     [(_ thing)
                      thing])])
       (rewrite/maybe form) ...)]))
So the critical features of any parser for a Lisp-family language are:
it should not make any decision about the semantics of the language that it can avoid making;
the structure it constructs must be available to the language itself as first-class objects;
(and optionally) the parser should be modifiable from the language itself.
As examples:
it is probably inevitable that the parser needs to make decisions about what is and is not a number and what sort of number it is;
it would be nice if it had default handling for strings, but this should ideally be controllable by the user;
it should make no decision at all about what, say (< x y) means but rather should return a structure representing it for interpretation by the language.
The reason for the last, optional, requirement is that Lisp-family languages are used by people who are interested in using them for implementing languages. Allowing the reader to be altered from within the language makes that hugely easier, since you don't have to start from scratch each time you want to make a language which is a bit like the one you started with but not completely.
Parsing Lisp
The usual approach to parsing Lisp-family languages is to have machinery which will turn a sequence of characters into a sequence of s-expressions consisting of objects which are defined by the language itself, notably symbols and conses (but also numbers, strings &c). Once you have this structure you then walk over it to interpret it as a program: either evaluating it on the fly or compiling it. Critically, you can also write programs which manipulate this structure itself: macros.
In 'traditional' Lisps such as CL this process is explicit: there is a 'reader' which turns a sequence of characters into a sequence of s-expressions, and macros explicitly manipulate the list structure of these s-expressions, after which the evaluator/compiler processes them. So in a traditional Lisp (< x y) would be parsed as (a cons of a symbol < and (a cons of a symbol x and (a cons of a symbol y and the empty list object)), or (< . (x . (y . ()))), and this structure gets handed to the macro expander and hence to the evaluator or compiler.
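To make that structure concrete, here is a rough Emacs Lisp illustration (Emacs Lisp also has a traditional-style reader; the Scheme and CL details differ, but the shape of the result is the same):

;; `read' turns a sequence of characters into symbols and conses.
(read "(< x y)")          ;=> (< x y)
(car (read "(< x y)"))    ;=> the symbol <
(cdr (read "(< x y)"))    ;=> (x y), i.e. (x . (y . nil))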
In Scheme it is a little more subtle: macros are specified (portably, anyway) in terms of rules which turn a bit of syntax into another bit of syntax, and it's not (I think) explicit whether such objects are made of conses & symbols or not. But the structure which is available to syntax rules needs to be as rich as something made of conses and symbols, because syntax rules get to poke around inside it. If you want to write something like the following macro:
(define-syntax with-silly-escape
  (syntax-rules ()
    [(_ (escape) form ...)
     (call/cc (λ (c)
                (define (escape) (c 'escaped))
                form ...))]
    [(_ (escape val ...) form ...)
     (call/cc (λ (c)
                (define (escape) (c val ...))
                form ...))]))
then you need to be able to look into the structure of what came from the reader, and that structure needs to be as rich as something made of lists and conses.
A toy reader: reeder
Reeder is a little Lisp reader written in Common Lisp that I wrote a little while ago for reasons I forget (but perhaps to help me learn CL-PPCRE, which it uses). It is emphatically a toy, but it is also small enough and simple enough to understand: certainly it is much smaller and simpler than the standard CL reader, and it demonstrates one approach to solving this problem. It is driven by a table known as a reedtable which defines how parsing proceeds.
So, for instance:
> (with-input-from-string (in "(defun foo (x) x)")
    (reed :from in))
(defun foo (x) x)
Reeding
To read (reed) something using a reedtable:
look for the next interesting character, which is the next character not defined as whitespace in the table (reedtables have a configurable list of whitespace characters);
if that character is defined as a macro character in the table, call its function to read something;
otherwise call the table's token reader to read and interpret a token.
Reeding tokens
The token reader lives in the reedtable and is responsible for accumulating and interpreting a token:
it accumulates a token in ways known to itself (but the default one does this by just trundling along the string handling single (\) and multiple (|) escapes defined in the reedtable until it gets to something that is whitespace in the table);
at this point it has a string and it asks the reedtable to turn this string into something, which it does by means of token parsers.
There is a small kludge in the second step: as the token reader accumulates a token it keeps track of whether it is 'denatured' which means that there were escaped characters in it. It hands this information to the token parsers, which allows them, for instance, to interpret |1|, which is denatured, differently to 1, which is not.
Token parsers are also defined in the reedtable: there is a define-token-parser form to define them. They have priorities, so that the highest priority one gets to be tried first and they get to say whether they should be tried for denatured tokens. Some token parser should always apply: it's an error if none do.
The default reedtable has token parsers which can parse integers and rational numbers, and a fallback one which parses a symbol. Here is an example of how you would replace this fallback parser so that instead of returning symbols it returns objects called 'cymbals' which might be the representation of symbols in some embedded language:
Firstly we want a copy of the reedtable, and we need to remove the symbol parser from that copy (having previously checked its name using reedtable-token-parser-names).
(defvar *cymbal-reedtable* (copy-reedtable nil))
(remove-token-parser 'symbol *cymbal-reedtable*)
Now here's an implementation of cymbals:
(defvar *namespace* (make-hash-table :test #'equal))

(defstruct cymbal
  name)

(defgeneric ensure-cymbal (thing))

(defmethod ensure-cymbal ((thing string))
  (or (gethash thing *namespace*)
      (setf (gethash thing *namespace*)
            (make-cymbal :name thing))))

(defmethod ensure-cymbal ((thing cymbal))
  thing)
And finally here is the cymbal token parser:
(define-token-parser (cymbal 0 :denatured t :reedtable *cymbal-reedtable*)
    ((:sequence
      :start-anchor
      (:register (:greedy-repetition 0 nil :everything))
      :end-anchor)
     name)
  (ensure-cymbal name))
An example of this. Before modifying the reedtable:
> (with-input-from-string (in "(x y . z)")
    (reed :from in :reedtable *cymbal-reedtable*))
(x y . z)
After:
> (with-input-from-string (in "(x y . z)")
    (reed :from in :reedtable *cymbal-reedtable*))
(#S(cymbal :name "x") #S(cymbal :name "y") . #S(cymbal :name "z"))
Macro characters
If something isn't the start of a token then it's a macro character. Macro characters have associated functions and these functions get called to read one object, however they choose to do that. The default reedtable has two-and-a-half macro characters:
" reads a string, using the reedtable's single & multiple escape characters;
( reads a list or a cons.
) is defined to raise an exception, as it can only occur if there are unbalanced parens.
The string reader is pretty straightforward (it has a lot in common with the token reader although it's not the same code).
The list/cons reader is mildly fiddly: most of the fiddliness is dealing with consing dots which it does by a slightly disgusting trick: it installs a secret token parser which will parse a consing dot as a special object if a dynamic variable is true, but otherwise will raise an exception. The cons reader then binds this variable appropriately to make sure that consing dots are parsed only where they are allowed. Obviously the list/cons reader invokes the whole reader recursively in many places.
And that's all the macro characters. So, for instance in the default setup, ' would read as a symbol (or a cymbal). But you can just install a macro character:
(defvar *qr-reedtable* (copy-reedtable nil))

(setf (reedtable-macro-character #\' *qr-reedtable*)
      (lambda (from quote table)
        (declare (ignore quote))
        (values `(quote ,(reed :from from :reedtable table))
                (inch from nil))))
And now 'x will read as (quote x) in *qr-reedtable*.
Similarly you could add a more complicated macro character on # to read objects depending on their next character in the way CL does.
An example of the quote reader. Before:
> (with-input-from-string (in "'(x y . z)")
    (reed :from in :reedtable *qr-reedtable*))
\'
The object it has returned is a symbol whose name is "'", and it didn't read beyond that of course. After:
> (with-input-from-string (in "'(x y . z)")
    (reed :from in :reedtable *qr-reedtable*))
'(x y . z)
Other notes
Everything works one-character-ahead, so all of the various functions get the stream being read, the first character they should be interested in and the reedtable, and return both their value and the next character. This avoids endlessly unreading characters, and probably tells you what grammar class it can handle natively; obviously macro character parsers can do whatever they like so long as things are sane when they return.
It probably doesn't use anything which isn't moderately implementable in non-Lisp languages. Some caveats:
Macros will cause pain in the usual way, but the only one is define-token-parser. I think the solution to that is the usual expand-the-macro-by-hand-and-write-that-code, but you could probably help a bit by having an install-or-replace-token-parser function which dealt with the bookkeeping of keeping the list sorted etc.
You'll need a language with dynamic variables to implement something like the cons reeder.
It uses CL-PPCRE's s-expression representation of regexps. I'm sure other languages have something like this (Perl does) because no-one wants to write stringy regexps: they must have died out decades ago.
It's a toy: it may be interesting to read but it's not suitable for any serious use. I found at least one bug while writing this: there will be many more.

How to open multiple URLs at the same time in an Emacs buffer?

I am using the Emacs editor together with org-mode and evil-mode, mainly for text handling and documentation. Often there is a topic to which several different URLs belong.
Example: I have a text snippet on how to install Emacs:
*** install emacs
emacs - I want to try org-mode. What's the shortest path from zero to typing? - Stack Overflow
https://stackoverflow.com/questions/4940680/i-want-to-try-org-mode-whats-the-shortest-path-from-zero-to-typing
Index of /gnu/emacs/windows/emacs-26
http://ftp.gnu.org/gnu/emacs/windows/emacs-26/emacs-26.3-x86_64.zip
Installation target:
file://C:\Lupo_Pensuite\MyApps\emacs
How to
file://C:\Lupo_Pensuite\MyDocs\howto.txt
Is it possible to select the region and have all the URLs opened in my default web browser, the file links opened by Windows Explorer, and the text file opened with the associated editor?
Or even better: Emacs is aware that the above-mentioned text snippet is actually an org-mode chapter, and regardless of where within that chapter the cursor is positioned, something like M-x open-all-links-in-chapter opens all mentioned links in the current chapter.
Prio 1: does something like that already exist in emacs/org-mode/evil-mode?
Prio 2: is there an elisp function you know of which can achieve this use case?
Environment: Cygwin under Windows 10, emacs 26.3, org-mode 9.1.9
It turns out that org-mode already has this built in!
Today I was browsing the documentation of org-mode, wondering how exactly C-c C-o works. That key combo calls the org-mode function "org-open-at-point", which opens the URL where the cursor (in Emacs speak: point) is positioned.
Now if C-c C-o is pressed on a heading, then all URLs beneath that heading are opened! Which is exactly what I asked for from the beginning. Thanks a lot, NickD, for your constructive contributions!
Here the original help text:
When point is on a headline, display a list of every link in the entry, so it is possible to pick one, or all, of them.
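If you want a single command that works from anywhere inside the chapter, a minimal sketch (assuming plain org-mode; the command name my/open-all-links-in-chapter is made up here) is to jump to the enclosing headline first and then call org-open-at-point:

(defun my/open-all-links-in-chapter ()
  "Go to the current entry's headline, then let `org-open-at-point'
offer every link in the entry."
  (interactive)
  (save-excursion
    (org-back-to-heading t)
    (org-open-at-point)))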
Warning: used without thought, the following can bring your machine to its knees. I will add some more specific warnings at the end, but be careful!
The basic idea of the code below is to parse the buffer of an Org mode file, in order to get a parse tree of the buffer: that is done by org-element-parse-buffer. We can then use org-element-map to walk the parse tree and select only nodes of type link, applying a function to each one as we go. The function we apply, get-link, munges through the contents of the link node, extracting the type and path and returning a list of those two. Here's how it looks so far:
(defun get-link (x)
  (let* ((link (cadr x))
         (type (plist-get link :type))
         (path (plist-get link :path)))
    (if (or (string= type "http") (string= type "https"))
        (list type path))))

(defun visit-all-http-links ()
  (interactive)
  (let* ((parse-tree (org-element-parse-buffer))
         (links (org-element-map parse-tree 'link #'get-link)))
    links))
Note that I only keep http and https links: you may want to add extra types.
This already goes a long way towards getting you what you want. In fact, if you load the file with the two functions above, you can try it on the following sample Org mode file:
* foo
** foo 1
http://www.google.com
https://redhat.com
* bar
** bar 2
[[https://gnome.org][Gnome]] is a FLOSS project. So is Fedora: https://fedoraproject.org.
* Code
#+begin_src emacs-lisp :results value verbatim :wrap example
(visit-all-http-links)
#+end_src
#+RESULTS:
#+begin_example
(("http" "//www.google.com") ("https" "//redhat.com") ("https" "//gnome.org") ("https" "//fedoraproject.com"))
#+end_example
and evaluating the source block with C-c C-c, you get the results shown.
Now all we need to do is convert each (TYPE PATH) pair in the result list to a real URL and then visit it - here's the final version of the code:
(defun get-link (x)
  "Assuming x is a LINK node in an Org mode parse tree,
return a list consisting of its type (e.g. \"http\")
and its path."
  (let* ((link (cadr x))
         (type (plist-get link :type))
         (path (plist-get link :path)))
    (if (or (string= type "http") (string= type "https"))
        (list type path))))

(defun format-url (x)
  "Take a (TYPE PATH) list and return a proper URL. Note
the following works for http- and https-type links, but
might need modification for other types."
  (format "%s:%s" (nth 0 x) (nth 1 x)))

(defun visit-all-http-links ()
  (interactive)
  (let* ((parse-tree (org-element-parse-buffer))
         (links (org-element-map parse-tree 'link #'get-link)))
    (mapcar #'browse-url (mapcar #'format-url links))))
We add a function format-url that does this: ("http" "//example.com") --> "http://example.com" and map it over the links list, producing a new list of URLs. Then we map the function browse-url (which is provided by Emacs) over the resulting list and watch the browser open them all.
WARNINGS:
If you have hundreds or thousands of links in the file, then you are going to create hundreds or thousands of tabs in your browser. Are you SURE your machine can take it?
If your links point to big objects, that's going to put another kind of memory pressure on your system. Are you SURE your machine can take it?
If your Org mode buffer is big, then org-element-parse-buffer can take a LONG time to process it. Moreover, even though there is a caching mechanism, it is not enabled by default because of bugs, so every time you execute the function you are going to parse the buffer AGAIN from scratch.
Every time you execute the function, you are going to open NEW tabs in your browser.
EDIT in response to questions in comments:
Q1: "visit-all-http-links opens all URLs in the file. My original question was, whether it is possible to open only the URLs which are being found in the current org-mode chapter."
A1: Doing just a region is a bit harder but possible, if you guarantee that the region is syntactically correct Org mode (e.g. a collection of headlines and their contents). You just write the region to a temporary buffer and then do what I did on the temp buffer instead of the original.
Here's the modified code using the visit-url function from Question 2:
(defun visit-all-http-links-in-region (beg end)
  (interactive "r")
  (let ((s (buffer-substring beg end)))
    (with-temp-buffer
      (set-buffer (current-buffer))
      (insert s)
      (let* ((parse-tree (org-element-parse-buffer))
             (links (org-element-map parse-tree 'link #'get-link)))
        (mapcar #'visit-url (mapcar #'format-url links))))))

(defun visit-all-http-links ()
  (interactive)
  (visit-all-http-links-in-region (point-min) (point-max)))
Very lightly tested.
Q2: "Every time I execute your function with your example URLs, the URLs are being opened with a different sequence - is it possible to open the URLs in that very sequence which is found in the org file?"
A2: The links are found deterministically in the order that they occur in the file. But the moment you call browse-url, all bets are off, because the URL now belongs to the browser, which will try to open each URL it receives in a separate tab and using a separate thread - in other words asynchronously. You might try introducing a delay between calls, but there are no guarantees:
(defun visit-url (url)
  (browse-url url)
  (sit-for 1 t))
and then use visit-url instead of browse-url in visit-all-http-links.

Haskell/Parsec: How do you use the functions in Text.Parsec.Indent?

I'm having trouble working out how to use any of the functions in the Text.Parsec.Indent module provided by the indents package for Haskell, which is a sort of add-on for Parsec.
What do all these functions do? How are they to be used?
I can understand the brief Haddock description of withBlock, and I've found examples of how to use withBlock, runIndent and the IndentParser type here, here and here. I can also understand the documentation for the four parsers indentBrackets and friends. But many things are still confusing me.
In particular:
What is the difference between withBlock f a p and
do aa <- a
   pp <- block p
   return (f aa pp)
Likewise, what's the difference between withBlock' a p and do {a; block p}
In the family of functions indented and friends, what is ‘the level of the reference’? That is, what is ‘the reference’?
Again, with the functions indented and friends, how are they to be used? With the exception of withPos, it looks like they take no arguments and are all of type IParser () (IParser defined like this or this) so I'm guessing that all they can do is to produce an error or not and that they should appear in a do block, but I can't figure out the details.
I did at least find some examples on the usage of withPos in the source code, so I can probably figure that out if I stare at it for long enough.
<+/> comes with the helpful description “<+/> is to indentation sensitive parsers what ap is to monads” which is great if you want to spend several sessions trying to wrap your head around ap and then work out how that's analogous to a parser. The other three combinators are then defined with reference to <+/>, making the whole group unapproachable to a newcomer.
Do I need to use these? Can I just ignore them and use do instead?
The ordinary lexeme combinator and whiteSpace parser from Parsec will happily consume newlines in the middle of a multi-token construct without complaining. But in an indentation-style language, sometimes you want to stop parsing a lexical construct or throw an error if a line is broken and the next line is indented less than it should be. How do I go about doing this in Parsec?
In the language I am trying to parse, ideally the rules for when a lexical structure is allowed to continue on to the next line should depend on what tokens appear at the end of the first line or the beginning of the subsequent line. Is there an easy way to achieve this in Parsec? (If it is difficult then it is not something which I need to concern myself with at this time.)
So, the first hint is to take a look at IndentParser
type IndentParser s u a = ParsecT s u (State SourcePos) a
I.e. it's a ParsecT keeping an extra close watch on SourcePos, an abstract container which can be used to access, among other things, the current column number. So, it's probably storing the current "level of indentation" in SourcePos. That'd be my initial guess as to what "level of reference" means.
In short, indents gives you a new kind of Parsec which is context sensitive—in particular, sensitive to the current indentation. I'll answer your questions out of order.
(2) The "level of reference" is the "belief" referred in the current parser context state of where this indentation level starts. To be more clear, let me give some test cases on (3).
(3) In order to start experimenting with these functions, we'll build a little test runner. It'll run the parser with a string that we give it and then unwrap the inner State part using an initialPos which we get to modify. In code
import Text.Parsec
import Text.Parsec.Pos
import Text.Parsec.Indent
import Control.Monad.State
testParse :: (SourcePos -> SourcePos)
          -> IndentParser String () a
          -> String -> Either ParseError a
testParse f p src = fst $ flip runState (f $ initialPos "") $ runParserT p () "" src
(Note that this is almost runIndent, except I gave a backdoor to modify the initialPos.)
Now we can take a look at indented. By examining the source, I can tell it does two things. First, it'll fail if the current SourcePos column number is less-than-or-equal-to the "level of reference" stored in the SourcePos stored in the State. Second, it somewhat mysteriously updates the State SourcePos's line counter (not column counter) to be current.
Only the first behavior is important, to my understanding. We can see the difference here.
>>> testParse id indented ""
Left (line 1, column 1): not indented
>>> testParse id (spaces >> indented) " "
Right ()
>>> testParse id (many (char 'x') >> indented) "xxxx"
Right ()
So, in order to have indented succeed, we need to have consumed enough whitespace (or anything else!) to push our column position out past the "reference" column position. Otherwise, it'll fail saying "not indented". Similar behavior exists for the next three functions: same fails unless the current position and reference position are on the same line, sameOrIndented fails if the current column is strictly less than the reference column, unless they are on the same line, and checkIndent fails unless the current and reference columns match.
withPos is slightly different. It's not just a IndentParser, it's an IndentParser-combinator—it transforms the input IndentParser into one that thinks the "reference column" (the SourcePos in the State) is exactly where it was when we called withPos.
This gives us another hint, btw. It lets us know we have the power to change the reference column.
(1) So now let's take a look at how block and withBlock work using our new, lower level reference column operators. withBlock is implemented in terms of block, so we'll start with block.
-- simplified from the actual source
block p = withPos $ many1 (checkIndent >> p)
So, block resets the "reference column" to be whatever the current column is and then consumes at least 1 parses from p so long as each one is indented identically as this newly set "reference column". Now we can take a look at withBlock
withBlock f a p = withPos $ do
  r1 <- a
  r2 <- option [] (indented >> block p)
  return (f r1 r2)
So, it resets the "reference column" to the current column, parses a single a parse, tries to parse an indented block of ps, then combines the results using f. Your implementation is almost correct, except that you need to use withPos to choose the correct "reference column".
Then, once you have withBlock, withBlock' = withBlock (\_ bs -> bs).
(5) So, indented and friends are exactly the tools to doing this: they'll cause a parse to immediately fail if it's indented incorrectly with respect to the "reference position" chosen by withPos.
(4) Yes, don't worry about these guys until you learn how to use Applicative style parsing in base Parsec. It's often a much cleaner, faster, simpler way of specifying parses. Sometimes they're even more powerful, but if you understand Monads then they're almost always completely equivalent.
(6) And this is the crux. The tools mentioned so far can only do indentation failure if you can describe your intended indentation using withPos. Quickly, I don't think it's possible to specify withPos based on the success or failure of other parses... so you'll have to go another level deeper. Fortunately, the mechanism that makes IndentParsers work is obvious—it's just an inner State monad containing SourcePos. You can use lift :: MonadTrans t => m a -> t m a to manipulate this inner state and set the "reference column" however you like.
Cheers!

Reading the binary output of an external program in Common Lisp

I'm trying to run an external program in SBCL and capture its output.
The output is binary data (a png image), while SBCL insists on interpreting it as strings.
I tried a number of ways, like
(trivial-shell:shell-command "/path/to/png-generator" :input "some input")

(with-input-from-string (input "some input")
  (with-output-to-string (output)
    (run-program "/path/to/png-generator" () :input input :output output)))

(with-input-from-string (input "some input")
  (flexi-streams:with-output-to-sequence (output)
    (run-program "/path/to/png-generator" () :input input :output output)))
But I get errors like
Illegal :UTF-8 character starting at byte position 0.
It seems to me that SBCL is trying to interpret the binary data as a text and decode it. How do I change this behaviour ? I'm interested only in obtaining a vector of octets.
Edit: Since it is not clear from the text above, I'd like to add that at least in the case of flexi-streams, the element-type of the stream is flexi-streams:octet (which is an (unsigned-byte 8)).
I would expect at least in this case run-program to read the raw bytes without many issues. Instead I get a message like Don't know how to copy to stream of element-type (UNSIGNED-BYTE 8)
Edit: I got angry at not being able to do this very simple task and solved the problem.
Functionally, the ability to send a stream of type UNSIGNED-BYTE into run-program and have it work correctly is severely limited, for reasons I don't understand. I tried gray streams, flexi-streams, fd streams, and a few other mechanisms, like you.
However, perusing run-program's source (for the fifth or sixth time), I noticed that there's an option :STREAM you can pass to output. Given that, I wondered if read-byte would work... and it did. For more performant work, one could determine how to get the length of a non-file stream and run READ-SEQUENCE on it.
(let* (;; Get random bytes
       (proc-var (sb-ext:run-program "head" '("-c" "10" "/dev/urandom")
                                     :search t
                                     ;; let SBCL figure out the storage type.
                                     ;; This is what solved the problem.
                                     :output :stream))
       ;; Obtain the streams from the process object.
       (output (process-output proc-var))
       (err (process-error proc-var)))
  (values
   ;; return both stdout and stderr, just for polish.
   ;; do a byte read and turn it into a vector.
   (concatenate 'vector
                ;; A byte with value 0 is *not* value nil. Yay for Lisp!
                (loop for byte = (read-byte output nil)
                      while byte
                      collect byte))
   ;; repeat for stderr
   (concatenate 'vector
                (loop for byte = (read-byte err nil)
                      while byte
                      collect byte))))
If you're willing to use some external libraries, this can be done with babel-streams. This is a function I use to safely get content from a program. I use :latin-1 because it maps the first 256 bytes just to the characters. You could remove the octets-to-string and have the vector.
If you wanted stderr as well, you could use nested 'with-output-to-sequence' to get both.
(defun safe-shell (command &rest args)
  (octets-to-string
   (with-output-to-sequence (stream :external-format :latin-1)
     (let ((proc (sb-ext:run-program command args :search t :wait t :output stream)))
       (case (sb-ext:process-status proc)
         (:exited (unless (zerop (sb-ext:process-exit-code proc))
                    (error "Error in command")))
         (t (error "Unable to terminate process")))))
   :encoding :latin-1))
Paul Nathan already gave a pretty complete answer as to how to read I/O from a program as binary, so I'll just add why your code didn't work: because you explicitly asked SBCL to interpret the I/O as a string of UTF-8 characters, using with-{in,out}put-to-string.
Also, I'd like to point out that you don't need to go as far as run-program's source code to get to the solution. It's clearly documented in SBCL's manual.

How can I "URL decode" a string in Emacs Lisp?

I've got a string like "foo%20bar" and I want "foo bar" out of it.
I know there's got to be a built-in function to decode a URL-encoded string (query string) in Emacs Lisp, but for the life of me I can't find it today, either in my lisp/ folder or with Google.
What is it called?
url-unhex-string
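For example (url-unhex-string lives in url-util.el, which ships with Emacs):

(require 'url-util)
(url-unhex-string "foo%20bar")  ;=> "foo bar"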
In my case I needed to do this interactively. The previous answers gave me the right functions to call, then it was just a matter of wrapping it a little to make them interactive:
(defun func-region (start end func)
  "Run a function over the region between START and END in current buffer."
  (save-excursion
    (let ((text (delete-and-extract-region start end)))
      (insert (funcall func text)))))

(defun hex-region (start end)
  "Urlencode the region between START and END in current buffer."
  (interactive "r")
  (func-region start end #'url-hexify-string))

(defun unhex-region (start end)
  "De-urlencode the region between START and END in current buffer."
  (interactive "r")
  (func-region start end #'url-unhex-string))
Add salt, I mean bind to keys according to taste.
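For instance, a sketch of such bindings (the key choices here are arbitrary):

(global-set-key (kbd "C-c d") #'unhex-region)
(global-set-key (kbd "C-c e") #'hex-region)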
Emacs is shipped with a URL library that provides a bunch of URL parsing functions—as huaiyuan and Charlie Martin already pointed out. Here is a small example that'd give you an idea how to use it:
(let ((url "http://www.google.hu/search?q=elisp+decode+url&btnG=Google+keres%E9s&meta="))
  ;; Return list of arguments and values
  (url-parse-query-string
   ;; Decode hexas
   (url-unhex-string
    ;; Retrieve argument list
    (url-filename
     ;; Parse URL, return a struct
     (url-generic-parse-url url)))))
=> (("meta" "") ("btnG" "Google+keresés") ("/search?q" "elisp+decode+url"))
I think it is better to rely on it rather than on Org-mode, since parsing URLs is its main purpose.
org-link-unescape does the job for very simple cases ... w3m-url-decode-string is better, but it isn't built in and the version I have locally isn't working with Emacs 23.
You can grab urlenc from MELPA and use urlenc:decode-region for a region or urlenc:decode-insert to insert your text interactively.
I think you're making it a little too hard: split-string will probably do most of what you want. For fancier stuff, have a look at the functions in url-expand.el; unfortunately, many of them don't have doc-strings, so you may have to read code.
url-generic-parse-url looks like a potential winner.
