How to handle left recursion in FParsec - parsing

My languages uses s-expressions with one additional feature - a dot operator for accessing elements from an array or struct.
Currently, my parser works on this code using the access operator -
; Parition a sequence into a pair of sequences.
; NOTE: currently not tail-recursive.
[defRec partition [pred seq]
(if (isDone seq)
(pair (list) (list))
(let (value (peek seq))
(nextSeq (next seq))
(nextResult (partition pred nextSeq))
(nextResultFirst (access :m:first nextResult))
(nextResultSecond (access :m:second nextResult))
(if (pred value)
(pair (cons value nextResultFirst) nextResultSecond)
(pair nextResultFirst (cons value nextResultSecond)))))]
However, I want to add alternate parsing using a dot operator like so -
; Parition a sequence into a pair of sequences.
; NOTE: currently not tail-recursive.
[defRec partition [pred seq]
(if (isDone seq)
(pair (list) (list))
(let (value (peek seq))
(nextSeq (next seq))
(nextResult (partition pred nextSeq))
(nextResultFirst nextResult.first)
(nextResultSecond nextResult.second)
(if (pred value)
(pair (cons value nextResultFirst) nextResultSecond)
(pair nextResultFirst (cons value nextResultSecond)))))]
They will both parse out to an equivalent AST. Left recursion matters here because an expression like (f x).y should parse out just like (access :m:y (f x))
However, I don't know how to make FParsec deal with this type of left-recursive parsing, or what alternatives I have to left-recursion.

Related

Could these be used as Racket definitions for cons, first, and rest?

I am trying to follow along this Racket tutorial:
https://cs.uwaterloo.ca/~plragde/flaneries/TYR/Pure_Functional_Programming.html
and it describes these alternatives to the actual implementations. I know what cons, first and rest do, and I understand how lambda functions work, but I cannot figure out how these successfully implement cons, first and rest.
(define (cons f r) (lambda (b) (if b f r)))
(define (first lst) (lst true))
(define (rest lst) (lst false))
Maybe I have misinterpretted what they are supposed to be. Here is what the tutorial says:
As another example of the use of closures, consider the following "alternative" implementation of the basic list primitives.
Could someone clarify?
#lang racket
(define (cons f r) (lambda (b) (if b f r)))
Here cons has been redefined to be a procedure which takes two arguments f and r, and returns a procedure which takes one argument b. Let's try it:
cons-func.rkt> (define a-cons (cons 1 2))
Now we have called the cons procedure with the values 1 and 2; cons has returned a procedure which we have named a-cons.
cons-func.rkt> a-cons
#<procedure:...ratch/cons-func.rkt:3:19>
Now, a-cons is a procedure that takes a single argument; if that argument is true, then the first of two values passed in the earlier call to cons is returned; if instead the argument to a-cons is false, then the second of the two earlier arguments is returned:
cons-func.rkt> (a-cons true)
1
cons-func.rkt> (a-cons false)
2
There are two values stored in the closure returned by our new cons, and both of them can be retrieved. This is effectively a cons cell. Let's add some sugar to make this nicer to use. Here first and rest just do what we did by hand a moment ago:
(define (first lst) (lst true))
(define (rest lst) (lst false))
Now we can access the two parts of our cons cell in the usual way:
cons-func.rkt> (first a-cons)
1
cons-func.rkt> (rest a-cons)
2
This new implementation of cons cells is not compatible with the old one. For example, we have not redefined car and cdr, and the normal implementation of those procedures will not work with our new cons cells:
cons-func.rkt> (car a-cons)
; car: contract violation
; expected: pair?
; given: #<procedure:...ratch/cons-func.rkt:3:19>
We can use our new cons cells to define list:
(define (list . xs)
(if (null? xs)
'()
(cons (car xs)
(apply list (cdr xs)))))
Here we are cheating a bit, using the dot syntax to capture an arbitrary number of arguments in a list (the regular kind). We use car and cdr to take this list apart (because they work with regular lists), and reassemble it with the new cons into a closure-based list.
cons-func.rkt> (define a-list (list 'a 'b 'c))
cons-func.rkt> a-list
#<procedure:...ratch/cons-func.rkt:3:19>
Here you can see that the list created by our new list procedure, named a-list, is itself a procedure. We can call our first and rest procedures on a-list:
cons-func.rkt> (first a-list)
'a
cons-func.rkt> (rest a-list)
#<procedure:...ratch/cons-func.rkt:3:19>
cons-func.rkt> (first (rest a-list))
'b
cons-func.rkt> (first (rest (rest a-list)))
'c
cons-func.rkt> (rest (rest (rest a-list)))
'()
So, it seems that our closure-based cons cells behave as regular cons cells, though the underlying implementation is not compatible with Rackets own cons and list accessor procedures, and we can use these closure-based cons cells to create closure-based lists. If someone had the time and inclination, they could probably implement a whole Scheme around these closure-based cons cells.
Defining a data structure is a process of 2 steps.
defining the axioms of the data structure, i.e. the abstract data structure
defining the implementation which must obey the axioms.
Whatever implementation you create is correct as time as the axioms are true in that implementation.
The axioms for the lisp constructor differ from implementation to implementation. In scheme they are so
(CAR (CONS x y)) = x
(CDR (CONS x y)) = y
Each implementation that creates the functions CONS, CAR, CDR that obey this is correct and can be used as implementation for cons cells contructor.
The things are not so direct however, as the axioms of CONS can interact with other axioms of the system. For example, some systems impose the axiom:
(EQ (CONS x y) (CONS X Y))
and in this case you need to take care in the implementation such that cons to check if there is already a (X . Y) pair allocated, and not duplicate the memory.
The other answers detail the implementation of cons with functions.
Explanation through the substitution method:
The results you want are
(first (cons a b))
—> a
and
(rest (cons a b))
—> b
These are all the requirements for the three primitives.
Now consider (cons x y): replace f and r in the definition of cons with x and y – that is,
(cons x y)
—> (lambda (b) (if b x y))
Now, (first (cons x y)) is
(first (lambda (b) (if b x y)))
—> ((lambda (b) (if b x y)) true)
—> (if true x y)
—> x
and (rest (cons x y)) is
(first (lambda (b) (if b x y)))
—> ((lambda (b) (if b x y)) false)
—> (if false x y)
—> y

order matters with return-from in SBCL

I'm on Day 3 of trying to learn Common Lisp (SBCL) by doing Advent of Code. I understand there is more than one type of return. I'm wondering if someone could explain to me why the following function will return nil (this makes sense)
(defun fn (n)
(cond ((zerop n)
(return-from fn nil))
(t
(write-line "Hello World")
(fn (- n 1)))))
but the following function will return "Hello World" (this does not make sense to me).
(defun fn (n)
(cond ((zerop n)
(return-from fn nil))
(t
(fn (- n 1))
(write-line "Hello World"))))
I found a great post covering some aspects of SBCL's return behaviour here, but to my understanding it doesn't seem to address this particular detail.
EDIT: a loop call is a more sensible way of writing this function, but it is not the way that I discovered this behaviour. My suspicion is that this behaviour arises because fn is called recursively.
(I started writing this before Sylwester's answer, which is mostly better I think.)
A critical difference between Lisp-family languages and many other languages is that Lisp-family languages are 'expression languages'. What this means technically is that languages like (say) C or Python there are two sorts of constructs:
expressions, which have values;
statements, which do not;
While in Lisp-family languages there is one sort of thing: expressions, which have values. Lisp-family languages are sometimes called 'expression languages' as a result of this.
This makes a huge difference if you want to write functions, which are things which return values (a function call is an expression in other words).
Conventional languages (Python as an example)
In a language which is not an expression language then if you're defining a function and find yourself in the middle of some construct which is a statement and you want to return a value, you have to use some special magic construct, often called return to do that. So in Python, where conditionals are statements, you might write:
def fib(n):
if n < 2:
return n
else:
return fib(n - 1) + fib(n - 2)
And in fact you have to use return because the body of a function definition in Python is a series of statements, so to return any kind of value at all you need to use return.
In fact, Python (and C, and Java &c &c) have a special form of a conditional which is an expression: in Python this looks like this:
def fib(n):
return n if n < 2 else (fib(n - 1) + fib(n - 2)
It looks different in C but it does the same thing.
But you still need this annoying return (OK, only one of them now) & that brings to light another feature of such languages: if some place in the syntax wants a statement you generally need to have a statement there, or if you can put an expression there its value just gets dropped. So you can try something like this:
def fib(n):
n if n < 2 else (fib(n - 1) + fib(n - 2)
And that's syntactically OK -- the expression gets turned into a statement -- but it fails at runtime because the function no longer returns a useful value. In Python you can get around this if you want people to hate you:
fib = lambda n: n if n < 2 else fib(n - 1) + fib(n - 2)
Python people will hate you if you do this, and it's also not useful, because Python's lambda only takes expressions so what you can write is crippled.
Lisp
Lisp has none of this: in Lisp everything is an expression and therefore everything has a value, you just need to know where it comes from. There is still return (in CL, anyway) but you need to use it much less often.
But, of course people often do want to write programs which look like 'do this, then do this, then do this', where most of the doing is being done for side-effect, so Lisps generally have some kind of sequencing construct, which lets you just have a bunch of expressions one after the other, all but (typically) one of which get evaluated for side-effect. In CL the most common sequencing construct is called progn (for historical reasons). (progn ...) is an expression made of other expressions, and its value is the value of last expression in its body.
progn is so useful in fact that a bunch of other constructs have 'implicit progns' in them. Two examples are function definitions (the body of defun is an implicit progn) and cond (the body of a cond-clause is an implicit `progn).
Your function
Here is your function (first version) with its various parts notated
(defun fn (n)
;; the body of fn is an implicit progn with one expression, so
;; do this and return its value
(cond
;; the value of cond is the value of the selected clause, or nil
((zerop n)
;; the body of this cond clause is an implicit progn with on
;; expression so do this and ... it never returns
(return-from fn nil))
(t
;; the body of this cond clause is an implicit progn with two expressions, so
;; do this for side-effect
(write-line "Hello World")
;; then do this and return its value
(fn (- n 1)))))
Here is the second version
(defun fn (n)
;; the body of fn is an implicit progn with one expression, so
;; do this and return its value
(cond
;; the value of cond is the value of the selected clause, or nil
((zerop n)
;; the body of this cond clause is an implicit progn with on
;; expression so do this and ... it never returns
(return-from fn nil))
(t
;; the body of this cond clause is an implicit progn with two expressions, so
;; do this for side-effect
(fn (- n 1))
;; then do this and return its value
(write-line "Hello World"))))
So you can see what is happening here: in the first version the value that gets returned is either nil or the value of the recursive call (also nil). In the second version the value that gets returned is either nil or whatever write-line returns. And it turns out that write-line returns the value of its argument, so that's what you get if you call it with an integer greater than zero.
Why have return-from at all in Lisp?
One thing that should be immediately clear from this whole expression-language thing is that you hardly ever need to explicitly return something in Lisp: you just have an expression that computes the value you want. But there are two good uses (which perhaps are really the same use) of explicit returns.
The first is that sometimes you are doing some big search for something in the form of a bunch of nested loops and at some point you just want to say 'OK, found it, here's the answer'. You can do that in one of two ways: you can carefully structure your loops so that once you find what you're after they all terminate nicely and the value gets passed back up, or you can just say 'here's the answer'. The latter thing is what return-from does: it just says 'I'm done now, unwind the stack carefully and return this':
(defun big-complicated-search (l m n)
(dotimes (i l)
(dotimes (j m)
(dotimes (k n)
(let ((it (something-involving i j k l m n)))
(when (interesting-p it)
(return-from big-complicated-search it)))))))
And return-from does this in the right way:
(defun big-complicated-file-search (file1 file2)
(with-open-file (f1 file1)
(with-open-file (f2 file2)
...
(when ...
(return-from big-complicated-search found)))))
When you call this, and when the thing is found, return-from will make sure that the two files you have opened are properly closed.
The second, which is really almost the same thing, is that sometimes you need to just give up, and return-from is a good way of doing this: it returns immediately, deals with clean-ups (see above) and is generally a nice way of saying 'OK, I give up now'. At first blush this seems like something you would do with some kind of exception-handling system, but in fact there are two critical differences:
in an exception-handling system (which CL has, of course), you need some kind of exception to raise so you might need to invent something;
exception-handling systems are dynamic not lexical: if you raise an exception then the thing that gets to handle it is hunted for up the stack dynamically: this means that you're at the mercy of anyone who stuck a handler in the way and it also typically rather slow.
Finally the exceptional-return-via-error-handling-mechanism is just, well, horrid.
Your code:
(defun fn (n)
(cond ((zerop n) (return-from fn nil))
(t (write-line "Hello World") (fn (- n 1)))
)
)
There is a bunch of things slightly wrong with above code:
(defun fn (n)
(cond ((zerop n) (return-from fn nil)) ; 1) the return from is not needed
(t (write-line "Hello World") (fn (- n 1))) ; 2) this line is not correctly
; indented
) ; 3) dangling parentheses Don't. Never.
; also: incorrect indentation
)
the first cond clause already returns a value, just write nil as the return value. Then the whole cond returns this value. It is very rare that you need return or return-from from a cond clause.
use an editor to indent your code. In GNU Emacs / SLIME the command control-meta-q will indent the expression. For help about the editor commands in the current mode see: control-h m for mode help.
indent correctly and don't use dangling parentheses. They are useless in Lisp. Learn to use the editor to correctly indent code - that's much more useful than to place wrongly indented parentheses on their own line. tab indents the current line.
It's more useful to format the code like this for a beginner:
(defun fn (n)
(cond ((zerop n)
(return-from fn nil))
(t
(write-line "Hello World")
(fn (- n 1)))))
Code then will look more like a prefix tree.
Also don't forget to disable inserting tabs in GNU Emacs Put this into your emacs init file: (setq-default indent-tabs-mode nil). You can evaluate Emacs Lisp expressions also on the fly with meta>-:.
Now according to 1. above code is usually written as:
(defun fn (n)
(cond ((zerop n)
nil)
(t
(write-line "Hello World")
(fn (- n 1)))))
When n is zero, the first clause is selected and its last value is returned. The other clauses are not looked at -> cond returns nil -> the function fn returns nil.
Usually I would write above recursive function like this:
(defun fn (n)
(unless (zerop n)
(write-line "Hello World")
(fn (- n 1))))
unless returns nil if (zerop n) is true. Another variant:
(defun fn (n)
(when (plusp n)
(write-line "Hello World")
(fn (- n 1))))
You CAN use return-from, but in case it was not clear: you don't need it most of the time.
Unlike C language family Lisp has the feature that everything is expressions. That means you "return" the result of an expression. eg.
(+ (if (< x 0)
(- x)
x)
3)
Here the result of the if is that it will be the absolute value of x. Thus if x is -5 or 5 the result of the expression is 8. You can write abs like this:
(defun my-abs (v)
(if (< v 0)
(- v)
v))
Notice I do not use return. THe result of the if is the last expression and that means the result of that is the result of my-abs.
Your two functions can be written like this:
(defun fn1 (n)
(cond
((zerop n) nil)
(t (write-line "Hello World") (fn1 (- n 1)))))
And
(defun fn2 (n)
(cond
((zerop n) nil)
(t (fn2 (- n 1)) (write-line "Hello World"))))
Needless to say (write-line "Hello World") returns its argument in addition to print the argument. Thus whenever it is the last expression it will be the result.
For every n above 0 it will do the recursions first and each end every one except the first will return "Hello World". If you call (fn2 0) the result is nil, the same as fn1.
EDIT
One might ask what is the purpose of return and return-from when there obviously is little use for it. If you want something else than the default result in a loop macro the common way to do it by finally clause.
(defun split-by (test list &key (return-form #'values))
"Split a list in two groups based on test"
(loop :for e :in list
:if (funcall test e)
:collect e :into alist
:else
:collect e :into blist
:finally (return (funcall return-form alist blist))))
(split-by #'oddp '(1 2 3 4) :return-form #'list)
; ==> ((1 3) (2 4))
Another way is if you are doing recursion and want to cancel everything when you know the result you can use return-from:
(defun find-tree-p (needle haystack &key (test #'eql))
"search the tree for element using :test as comparison"
(labels ((helper (tree)
(cond ((funcall test tree needle)
(return-from find-tree t))
((consp tree)
(helper (car tree))
(helper (cdr tree)))
(t nil))))
(helper haystack)))
(find-tree '(f g) '(a b c (d e (f g) q) 1 2 3) :test #'equal)
; ==> (f g) ; t
Now if you hadn't done return-from you would have had logic to check the returned value to see if you needed to continue or not. If you want to process elements and don't want to pass twice to check validity before computing the result, you can just start computing and use return-from as a call/cc. This function can be used to map over lists of lists and it stops at the shortest list so needs to become () when the first sublist is empty:
(defun cdrs (lists)
"return the cdrs if all elements are cons, () otherwise"
(loop :for list :in lists
:when (null list) :do (return-from cdrs '())
:collect (cdr list)))
(cdrs '((a) (b) (c))) ; ==> (nil nil nil)
(cdrs '((a) (b) ())) ; ==> ()

Associativity, Precedence and F# Pipe-forward

The F# pipe-forward can be expressed as:
let (|>) x f = f x
For example:
let SimpleFunction (a : typeA) (b : typeB) (c : typeC)=
printf "OK."
// No problem here.
SimpleFunction a b c
// Using pipe-forward
c |> SimpleFunction a b
// No problem here. Interpreted as the same as above.
However, according to the documentation, the pipe-forward operator is left-associative.
https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/symbol-and-operator-reference/
So, I expected the pipe-forward statement:
// Original Expression
c |> SimpleFunction a b
// to be equivalent to:
(c |> SimpleFunction) a b
// Which is equivalent to:
SimpleFunction c a b
// Error!! SimpleFunction takes typeA, then typeB, then typeC.
Why does the compiler not "interpret" the pipe-forward expression as the error expression? Do I have any confusion about the operator precedence/associativity?
Additional Sources:
http://theburningmonk.com/2011/09/fsharp-pipe-forward-and-pipe-backward/
What is associativity of operators and why is it important?
https://en.wikipedia.org/wiki/Operator_associativity
The associativitity of a binary operator only matters when you have two or more occurrences of the same operator. When you have different operators (here: |> and juxtaposition), what matters is their relative precedence.
Juxtaposition has a higher precedence than |>, therefore
c |> SimpleFunction a b
is parsed like
(c) |> (SimpleFunction a b)
so, by the definition of |>, it's equivalent to
(SimpleFunction a b) (c)
which would usually be written
SimpleFunction a b c
That last equivalence is due to juxtaposition being left-associative.
The fact that |> is left-associative means that an expression like
x |> f |> g
is parsed as
(x |> f) |> g
which is equivalent to
g (f x)
i.e. chains of |> express function composition — successive pipeline steps — and not passing more arguments to a function.
Functions are curried by default, you original expression is actually like this:
c |> (SimpleFunction a b) which then becomes (SimpleFunction a b) c. For this to work function application will have to have precedence.

Streams and the substitution model

I am wondering how the substitution model can be used to show certain things about infinite streams. For example, say you have a stream that puts n in the nth spot and so on inductively. I define it below:
(define all-ints
(lambda ((n <integer>))
(stream-cons n (all-ints (+ 1 n)))))
(define integers (all-ints 1))
It is pretty clear that this does what it is supposed to, but how would someone go about proving it? I decided to use induction. Specifically, induction on k where
(last (stream-to-list integers k))
provides the last value of the first k values of the stream provided, in this case integers. I define stream-to-list below:
(define stream-to-list
(lambda ((s <stream>) (n <integer>))
(cond ((or (zero? n) (stream-empty? s)) '())
(else (cons (stream-first s)
(stream-to-list (stream-rest s) (- n 1)))))))
What I'd like to prove, specifically, is the property that k = (last (stream-to-list integers k)) for all k > 1.
Getting the base case is fairly easy and I can do that, but how would I go about showing the "inductive case" as thoroughly as possible? Since computing the item in the k+1th spot requires that the previous k items also be computed, I don't know how this could be shown. Could someone give me some hints?
In particular, if someone could explain how, exactly, streams are interpreted using the substitution model, I'd really appreciate it. I know they have to be different from the other constructs a regular student would have learned before streams, because they delay computation and I feel like that means they can't be evaluated completely. In turn this would man, I think, the substitution model's apply eval apply etc pattern would not be followed.
stream-cons is a special form. It equalent to wrapping both arguments in lambdas, making them thunks. like this:
(stream-cons n (all-ints (+ 1 n))) ; ==>
(cons (lambda () n) (lambda () (all-ints (+ n 1))))
These procedures are made with the lexical scopes so here n is the initial value while when forcing the tail would call all-ints again in a new lexical scope giving a new n that is then captured in the the next stream-cons. The procedures steam-first and stream-rest are something like this:
(define (stream-first s)
(if (null? (car s))
'()
((car s))))
(define (stream-rest s)
(if (null? (cdr s))
'()
((cdr s))))
Now all of this are half truths. The fact is they are not functional since they mutates (memoize) the value so the same value is not computed twice, but this is not a problem for the substitution model since side effects are off limits anyway. To get a feel for how it's really done see the SICP wizards in action. Notice that the original streams only delayed the tail while modern stream libraries delay both head and tail.

Comparing two lists of symbols in lisp

Let's say I have two lisp lists that are the same but in different sequence: '(A B C) and '(C B A).
How can I check if they are the same (in the sense that the elements are the same)?
CL-USER> (equal '(a b c) '(c b a))
NIL
Like this:
(not (set-exclusive-or '(a b c) '(c b a)))
which returns T if the two sets are equal, NIL otherwise.
[Edit] If they are not truly sets then you could use this:
(not (set-exclusive-or
(remove-duplicates '(a b c))
(remove-duplicates '(c b a))))
If the lists are not sets and repeated items are important, one could use a function like this:
(defun same-elements-p (a b)
(loop (when (and (null a) (null b))
(return t))
(when (or (null a) (null b))
(return nil))
(setf b (remove (pop a) b :count 1))))
If both lists are empty, they are the same. We remove all items of one list from the other and see what happens. Note the :count 1 argument to REMOVE. It makes sure than only one item is removed.
We can define the functions perm-equal and perm-equalp which are similar to EQUAL and EQUALP except that if the arguments are lists, then their permutation doesn't matter. The list (1 1 2 3) is perm-equal to (2 1 3 1), but not to (2 3 1).
The implementation works by normalizing values into a canonical permutation by sorting. This brings up the ugly spectre of requiring an inequality comparison. However, we can hide that by providing a predefined one which works for numbers, symbols and strings. (Why doesn't the sort function do something like this, the way eql is defaulted as the :key parameter?)
(defun less (a b)
(if (realp a)
(< a b)
(string< a b)))
(defun lessp (a b)
(if (realp a)
(< a b)
(string-lessp a b)))
(defun perm-equal (a b &optional (pred #'less))
(if (or (atom a) (atom b))
(equal a b)
(let ((as (sort (copy-list a) pred))
(bs (sort (copy-list b) pred)))
(equal as bs))))
(defun perm-equalp (a b &optional (pred #'lessp))
(if (or (atom a) (atom b))
(equalp a b)
(let ((as (sort (copy-list a) pred))
(bs (sort (copy-list b) pred)))
(equalp as bs))))
Notes:
Doesn't handle improper lists: it just tries to sort them and it's game over.
Even though equalp compares vectors, perm-equalp doesn't extend its permutation-squashing logic over vectors.
realp is used to test for numbers because complex numbers satisfy numberp, yet cannot be compared with <.
The trivial answer for non-sets is to sort both lists. CL's default sort is destructive, so you'll need copies if you want to keep them afterwards.
(defun sorted (a-list predicate)
(sort (copy-list a-list) predicate))
(defun same-list-p (list-a list-b predicate)
(equalp (sorted list-a predicate) (sorted list-b predicate)))
It doesn't have the best performance, but is simple and functional.
This looks to me like an O(n) variant:
(defun equal-elementwise (a b &key (test #'eq))
(loop with hash = (make-hash-table :test test)
for i on a for j on b do
(let ((i (car i)) (j (car j)))
(unless (funcall test i j)
(setf (gethash i hash) (1+ (gethash i hash 0))
(gethash j hash) (1- (gethash j hash 0)))))
finally (return
(unless (or (cdr i) (cdr j))
(loop for value being the hash-value of hash do
(unless (zerop value) (return))
finally (return t))))))
However, this won't be efficient on short lists.

Resources