Read Loop Over String in Racket Scheme - parsing

I'm using Racket Scheme. I have a string defined and I'm attempting to parse it.
I initially have
(define expression (open-input-string "(expression here)"))
And now I'm attempting to iterate over all the characters using the Scheme read function. I'm new to Scheme, though, and I'm not quite sure how to properly loop over them.
Essentially I need to loop over all of
(read-char expression)
Thanks

You don't want plain read for something like this. It is built to read Scheme/Racket syntax, not arbitrary data. Instead, you probably want string->list, which splits a string into a list of characters.
(string->list "(expression here)")
; => '(#\( #\e #\x #\p #\r #\e #\s #\s #\i #\o #\n #\space #\h #\e #\r #\e #\))
Perhaps you don't want to read the whole string at once, though? There does exist a read-char function, as you include in your post. There are lots of ways to loop in Racket. You can use recursion, or you can use Racket's plethora of for loop forms.
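For example, a direct recursive version using read-char might look like this (a minimal sketch; it stops when read-char returns the end-of-file object):
(define expression (open-input-string "(expression here)"))
(let loop ()
  (define c (read-char expression))
  (unless (eof-object? c)
    (display c)
    (loop)))
; => (expression here)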
Still, if you have an input port and you just want to loop over all the characters, well, there's an easy way to do that, too! You can use the handy in-input-port-chars sequence with for loops like this:
(define expression (open-input-string "(expression here)"))
(for ([c (in-input-port-chars expression)])
  (display c))
; => (expression here)

Related

How to replace some characters of an input file before it gets lexed in flex?

How to replace all occurrences of some character or char-sequence with some other character or char-sequence, before flex lexes it. For example I want B\65R to match the identifier rule, as it is equivalent to BAR in my grammar. So, essentially I want to turn a sequence of \dd into its equivalent ASCII character and then lex it. (\65 -> A, \66 -> B, …).
I know I can first search the entire file for a sequence of \dd, replace it with the equivalent character, and then feed it to flex. But I wonder if there exists a better way. Something like writing a rule that matches \dd and then replacing it with the corresponding alternative in the input stream, so that I don't have to parse the entire file twice.
Several options...
One is to have flex read from a filter that substitutes \dd with chr(dd) (untested). You could run something along the lines of:
yyin = popen("perl -pe 's/\\\\(\\d\\d)/chr($1)/e'", "r");
yylex();
...

Starting a parser for the Scheme language

I am writing a basic parser for a Scheme interpreter and here are the definitions I have set up to define the various types of tokens:
# 1. Parens
Type:
PAREN
Subtype:
LEFT_PAREN
Value:
'('
# 2. Operators (<=, =, +, ...)
Type:
OPERATOR
Subtype:
EQUALS
Value:
'='
Arity:
2
# 3. Types (2.5, "Hello", #f, etc.)
Type:
DATA
Subtype:
NUMBER
Value:
2.4
# 4. Procedures, builtins, and such
Type:
KEYWORD
Subtype:
BUILTIN
Value:
"set"
Arity:
2
PROCEDURE:
... // probably need a new class for this
Does the above seem like it's a good starting place? Are there some obvious things I'm missing here, or does this give me a "good-enough" foundation?
Your approach makes distinctions which really don't exist in the syntax of the language, and also makes decisions far too early. For example consider this program:
(let ((x 1))
  (with-assignment-notes
    (set! x 2)
    (set! x 3)
    x))
When I run this:
> (let ((x 1))
    (with-assignment-notes
      (set! x 2)
      (set! x 3)
      x))
setting x to 2
setting x to 3
3
In order for this to work, with-assignment-notes has to somehow redefine what (set! ...) means in its body. Here's a hacky and probably incorrect (Racket) implementation of that:
(define-syntax with-assignment-notes
  (syntax-rules (set!)
    [(_ form ...)
     (let-syntax ([rewrite/maybe
                   (syntax-rules (set!)
                     [(_ (set! var val))
                      (let ([r val])
                        (printf "setting ~A to ~A~%" 'var r)
                        (set! var r))]
                     [(_ thing)
                      thing])])
       (rewrite/maybe form) ...)]))
So the critical features of any parser for a Lisp-family language are:
it should not make any decision about the semantics of the language that it can avoid making;
the structure it constructs must be available to the language itself as first-class objects;
(and optionally) the parser should be modifiable from the language itself.
As examples:
it is probably inevitable that the parser needs to make decisions about what is and is not a number and what sort of number it is;
it would be nice if it had default handling for strings, but this should ideally be controllable by the user;
it should make no decision at all about what, say (< x y) means but rather should return a structure representing it for interpretation by the language.
The reason for the last, optional, requirement is that Lisp-family languages are used by people who are interested in using them for implementing languages. Allowing the reader to be altered from within the language makes that hugely easier, since you don't have to start from scratch each time you want to make a language which is a bit like the one you started with but not completely.
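In Racket, for instance, the standard reader already meets the third point: reading (< x y) just yields list structure, with no interpretation of <:
> (read (open-input-string "(< x y)"))
'(< x y)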
Parsing Lisp
The usual approach to parsing Lisp-family languages is to have machinery which will turn a sequence of characters into a sequence of s-expressions consisting of objects which are defined by the language itself, notably symbols and conses (but also numbers, strings &c). Once you have this structure you then walk over it to interpret it as a program: either evaluating it on the fly or compiling it. Critically, you can also write programs which manipulate this structure itself: macros.
In 'traditional' Lisps such as CL this process is explicit: there is a 'reader' which turns a sequence of characters into a sequence of s-expressions, and macros explicitly manipulate the list structure of these s-expressions, after which the evaluator/compiler processes them. So in a traditional Lisp (< x y) would be parsed as (a cons of a symbol < and (a cons of a symbol x and (a cons of a symbol y and the empty list object)), or (< . (x . (y . ()))), and this structure gets handed to the macro expander and hence to the evaluator or compiler.
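For example, you can verify in Racket (the same holds in CL, modulo printing) that the quoted form really is that chain of conses:
> (equal? '(< x y) (cons '< (cons 'x (cons 'y '()))))
#t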
In Scheme it is a little more subtle: macros are specified (portably, anyway) in terms of rules which turn a bit of syntax into another bit of syntax, and it's not (I think) explicit whether such objects are made of conses & symbols or not. But the structure which is available to syntax rules needs to be as rich as something made of conses and symbols, because syntax rules get to poke around inside it. If you want to write something like the following macro:
(define-syntax with-silly-escape
  (syntax-rules ()
    [(_ (escape) form ...)
     (call/cc (λ (c)
                (define (escape) (c 'escaped))
                form ...))]
    [(_ (escape val ...) form ...)
     (call/cc (λ (c)
                (define (escape) (c val ...))
                form ...))]))
then you need to be able to look into the structure of what came from the reader, and that structure needs to be as rich as something made of lists and conses.
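For example, exercising the first clause of that macro:
> (with-silly-escape (out)
    (display "before ")
    (out)
    'never-reached)
before 'escaped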
A toy reader: reeder
Reeder is a little Lisp reader written in Common Lisp that I wrote a little while ago for reasons I forget (but perhaps to help me learn CL-PPCRE, which it uses). It is emphatically a toy, but it is also small enough and simple enough to understand: certainly it is much smaller and simpler than the standard CL reader, and it demonstrates one approach to solving this problem. It is driven by a table known as a reedtable which defines how parsing proceeds.
So, for instance:
> (with-input-from-string (in "(defun foo (x) x)")
    (reed :from in))
(defun foo (x) x)
Reeding
To read (reed) something using a reedtable:
look for the next interesting character, which is the next character not defined as whitespace in the table (reedtables have a configurable list of whitespace characters);
if that character is defined as a macro character in the table, call its function to read something;
otherwise call the table's token reader to read and interpret a token.
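To make the shape of that loop concrete, here is a small, self-contained Racket sketch of the same dispatch scheme (all names here are mine, not reeder's, and the token handling is drastically simplified):
(define whitespace-chars '(#\space #\tab #\newline))

;; Skip whitespace and return the next interesting character,
;; peeked but not consumed.
(define (skip-whitespace port)
  (define c (peek-char port))
  (cond [(eof-object? c) c]
        [(memv c whitespace-chars) (read-char port) (skip-whitespace port)]
        [else c]))

;; One macro character: ' reads the next thing and wraps it in quote.
(define macro-chars
  (hash #\' (lambda (port)
              (read-char port)              ; consume the '
              (list 'quote (reed-one port)))))

;; Fallback token reader: accumulate until whitespace, a macro
;; character, or eof; parse as a number if possible, else a symbol.
(define (read-token port)
  (let loop ([acc '()])
    (define c (peek-char port))
    (if (or (eof-object? c)
            (memv c whitespace-chars)
            (hash-has-key? macro-chars c))
        (let ([str (list->string (reverse acc))])
          (or (string->number str) (string->symbol str)))
        (loop (cons (read-char port) acc)))))

(define (reed-one port)
  (define c (skip-whitespace port))
  (cond [(eof-object? c) c]
        [(hash-has-key? macro-chars c) ((hash-ref macro-chars c) port)]
        [else (read-token port)]))

(reed-one (open-input-string "  42"))   ; => 42
(reed-one (open-input-string "'foo"))   ; => (list 'quote 'foo)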
Reeding tokens
The token reader lives in the reedtable and is responsible for accumulating and interpreting a token:
it accumulates a token in ways known to itself (but the default one does this by just trundling along the string handling single (\) and multiple (|) escapes defined in the reedtable until it gets to something that is whitespace in the table);
at this point it has a string and it asks the reedtable to turn this string into something, which it does by means of token parsers.
There is a small kludge in the second step: as the token reader accumulates a token it keeps track of whether it is 'denatured' which means that there were escaped characters in it. It hands this information to the token parsers, which allows them, for instance, to interpret |1|, which is denatured, differently to 1, which is not.
Token parsers are also defined in the reedtable: there is a define-token-parser form to define them. They have priorities, so that the highest priority one gets to be tried first and they get to say whether they should be tried for denatured tokens. Some token parser should always apply: it's an error if none do.
The default reedtable has token parsers which can parse integers and rational numbers, and a fallback one which parses a symbol. Here is an example of how you would replace this fallback parser so that instead of returning symbols it returns objects called 'cymbals' which might be the representation of symbols in some embedded language:
Firstly we want a copy of the reedtable, and we need to remove the symbol parser from that copy (having previously checked its name using reedtable-token-parser-names).
(defvar *cymbal-reedtable* (copy-reedtable nil))
(remove-token-parser 'symbol *cymbal-reedtable*)
Now here's an implementation of cymbals:
(defvar *namespace* (make-hash-table :test #'equal))

(defstruct cymbal
  name)

(defgeneric ensure-cymbal (thing))

(defmethod ensure-cymbal ((thing string))
  (or (gethash thing *namespace*)
      (setf (gethash thing *namespace*)
            (make-cymbal :name thing))))

(defmethod ensure-cymbal ((thing cymbal))
  thing)
And finally here is the cymbal token parser:
(define-token-parser (cymbal 0 :denatured t :reedtable *cymbal-reedtable*)
    ((:sequence
      :start-anchor
      (:register (:greedy-repetition 0 nil :everything))
      :end-anchor)
     name)
  (ensure-cymbal name))
An example of this. Before modifying the reedtable:
> (with-input-from-string (in "(x y . z)")
    (reed :from in :reedtable *cymbal-reedtable*))
(x y . z)
After:
> (with-input-from-string (in "(x y . z)")
    (reed :from in :reedtable *cymbal-reedtable*))
(#S(cymbal :name "x") #S(cymbal :name "y") . #S(cymbal :name "z"))
Macro characters
If something isn't the start of a token then it's a macro character. Macro characters have associated functions and these functions get called to read one object, however they choose to do that. The default reedtable has two-and-a-half macro characters:
" reads a string, using the reedtable's single & multiple escape characters;
( reads a list or a cons.
) is defined to raise an exception, as it can only occur if there are unbalanced parens.
The string reader is pretty straightforward (it has a lot in common with the token reader although it's not the same code).
The list/cons reader is mildly fiddly: most of the fiddliness is dealing with consing dots which it does by a slightly disgusting trick: it installs a secret token parser which will parse a consing dot as a special object if a dynamic variable is true, but otherwise will raise an exception. The cons reader then binds this variable appropriately to make sure that consing dots are parsed only where they are allowed. Obviously the list/cons reader invokes the whole reader recursively in many places.
And that's all the macro characters. So, for instance in the default setup, ' would read as a symbol (or a cymbal). But you can just install a macro character:
(defvar *qr-reedtable* (copy-reedtable nil))

(setf (reedtable-macro-character #\' *qr-reedtable*)
      (lambda (from quote table)
        (declare (ignore quote))
        (values `(quote ,(reed :from from :reedtable table))
                (inch from nil))))
And now 'x will read as (quote x) in *qr-reedtable*.
Similarly you could add a more complicated macro character on # to read objects depending on their next character in the way CL does.
An example of the quote reader. Before:
> (with-input-from-string (in "'(x y . z)")
    (reed :from in :reedtable *qr-reedtable*))
\'
The object it has returned is a symbol whose name is "'", and it didn't read beyond that of course. After:
> (with-input-from-string (in "'(x y . z)")
    (reed :from in :reedtable *qr-reedtable*))
'(x y . z)
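For comparison, Racket builds this mechanism in via readtables. A minimal sketch, where the $ macro character and the dollar wrapper are purely illustrative:
;; $ becomes a terminating macro character that reads the next
;; datum and wraps it. The reader procedure is called with 2
;; arguments by read and 6 by read-syntax, hence the rest argument.
(define (read-dollar ch port . _srcloc)
  (list 'dollar (read/recursive port)))
(define dollar-readtable
  (make-readtable (current-readtable)
                  #\$ 'terminating-macro read-dollar))
(parameterize ([current-readtable dollar-readtable])
  (read (open-input-string "(a $b c)")))
; => '(a (dollar b) c)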
Other notes
Everything works one character ahead, so all of the various functions get the stream being read, the first character they should be interested in, and the reedtable, and they return both their value and the next character. This avoids endlessly unreading characters, and probably tells you what grammar class the reader can handle natively (macro-character parsers, obviously, can do whatever they like so long as things are sane when they return).
It probably doesn't use anything which isn't moderately implementable in non-Lisp languages. Some caveats:
Macros will cause pain in the usual way, but the only one is define-token-parser. I think the solution to that is the usual expand-the-macro-by-hand-and-write-that-code, but you could probably help a bit by having an install-or-replace-token-parser function which dealt with the bookkeeping of keeping the list sorted etc.
You'll need a language with dynamic variables to implement something like the cons reeder.
It uses CL-PPCRE's s-expression representation of regexps. I'm sure other languages have something like this (Perl does) because no-one wants to write stringy regexps: they must have died out decades ago.
It's a toy: it may be interesting to read but it's not suitable for any serious use. I found at least one bug while writing this: there will be many more.

wxMaxima: how to use texput to tell tex1 how to handle strings?

tex1() seems to return all strings as follows:
tex1(hello);
{\it hello}
tex1("hello");
\mbox{ hello }
What variable must one use to change this handling via texput, e.g. if I would just like it to print strings literally? I'm using other Maxima commands (like printf and concat) to produce strings that are then passed to tex1, and occasionally the default handling causes issues.
I tried texput(""", ...) and texput("''", ...); the first wasn't accepted, and the second was accepted but did not change the output. I really have no clue for the non-quoted strings.
Let's be careful to distinguish symbols from strings. When you enter tex1(hello) then hello is a symbol, and when you enter tex1("hello") then "hello" is a string. Symbols are essentially names for items in a lookup table, which can store additional info (symbol properties) for each. Strings, on the other hand, are (from Maxima's point of view) just a sequence of characters.
Anyway, changing the output for all symbols or all strings is unfortunately not possible via texput, but a one-line Lisp function can accomplish it. Try this: for symbols,
:lisp (defun tex-stripdollar (sym) (maybe-invert-string-case (symbol-name (stripdollar sym))))
and for strings,
:lisp (defun tex-string (str) str)
These are going to change some existing outputs, so you'll want to try it and see if it works for you.

How to output text from lists of variable sizes using printf?

I'm trying to automate some output using printf but I'm struggling to find a way to pass to it the list of arguments expr_1, ..., expr_n in
printf (dest, string, expr_1, ..., expr_n)
I thought of using something like Javascript's spread operator but I'm not even sure I should need it.
For instance, say I have a list of strings to be output
a:["foo","bar","foobar"];
a string of appropriate format descriptors, say
s: "~a ~a ~a ~%";
and an output stream, say os. How can I invoke printf using these things in such a way that the result will be the same as writing
printf(os,s,a[1],a[2],a[3]);
Then I could generalize it to output lists of variable size.
Any suggestions?
Thanks.
EDIT:
I just learned about apply and, using the conditions I posed in my OP, the following seems to work wonderfully:
apply(printf,append([os,s],a));
Maxima printf implements most or maybe all of the formatting operators from Common Lisp FORMAT, which are quite extensive; see: http://www.lispworks.com/documentation/HyperSpec/Body/22_c.htm See also ? printf in Maxima to get an abbreviated list of formatting operators.
In particular for a list you can do something like:
printf (os, "my list: ~{~a~^, ~}~%", a);
to get the elements of a separated by ", ". Here ~{...~} tells printf to expect a list, ~a says how to format each element, ~^ means omit the inter-element text after the last element, and ", " is what goes between elements. Of course the separator could be anything. With the list a above, this prints: my list: foo, bar, foobar
There are many variations on that; if that's not what you're looking for, maybe I can help you find it.

What is the equivalent of Mathematica's ToExpression in Racket?

I am looking for something similar to ToExpression that is available in Mathematica. I just want to convert a string to an expression, and evaluate the expression. As a first pass, my strings will include only numbers and arithmetic operators, and not even parentheses.
If I need to write it, please point me in the direction of the appropriate pre-defined modules/definitions which I should use.
Maybe you can use this parser for infix expressions.
http://planet.racket-lang.org/package-source/soegaard/infix.plt/1/0/planet-docs/manual/index.html
Here is a small example (it takes a while for the library to install - it seems its old Schematics test suite takes forever to install these days - I need to switch to a builtin one).
#lang at-exp racket
(require (planet soegaard/infix)
         (planet soegaard/infix/parser))
(display (format "1+2*3 is ~a\n" #${1+2*3}))
(parse-expression #'here (open-input-string "1+2*3"))
The output will be:
1+2*3 is 7
.#<syntax:6:21 (#%infix (+ 1 (* 2 3)))>
The function parse-expression parses the expression in the string and
returns a syntax-object that resembles the output of ToExpression.
Does the section on dynamic evaluation apply to your question? You can parse strings into expressions by using a combination of read and open-input-string. The resulting expressions can be evaluated, with or without the help of a sandbox.
http://docs.racket-lang.org/guide/eval.html
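For example, a minimal sketch of that read-then-eval approach, without a sandbox (note the namespace argument, which eval needs outside the REPL):
(define ns (make-base-namespace))
(eval (read (open-input-string "(+ 1 (* 2 3))")) ns)
; => 7
This reads parenthesized Racket syntax; for the no-parentheses arithmetic strings you describe, you would still need an infix parser such as the package above.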
