I’ve recently ventured into the awesome land of writing a Scheme interpreter, and I’ve run into a roadblock: closures. From what I understand, they encapsulate a local environment with a procedure, and that environment gets restored every time the closure is called (this may not be exactly right). What I can’t seem to find anywhere online is how a closure is formally defined, i.e., in an EBNF grammar. Most examples I’ve seen say that a closure is a procedure with zero arguments that has a lambda expression nested inside a let expression. Is this the only way to define a Scheme closure? More importantly, if there’s no formal way to define a closure, how do you actually interpret it? What happens if you translate all let expressions to lambdas? For example, if I declare a closure like this
(define (foo) (let ((y 0)) (λ (x) (…))))
Then assign it to a variable
(define bar (foo))
In what order is this evaluated? From what I’ve seen, when foo is declared, it stores a pointer to the parent environment, and declares its own environment. If I call (bar), should I substitute in the saved local environment immediately after?
I don't think it's helpful, today, to think of closures as some special magic thing: long ago in languages in the prehistory of Scheme they were, but in modern languages they are not a special thing at all: they just follow from the semantics of the language in an obvious way.
The two important things (both are quotes from R7RS, section 1.1) are these:
Scheme is a statically scoped programming language. Each use of a variable is associated with a lexically apparent binding of that variable.
and
All objects created in the course of a Scheme computation, including procedures and continuations, have unlimited extent.
What this means is that Scheme is a language with lexical scope and indefinite extent: any variable binding exists for as long as there is a possibility of reference. And, conveniently, you can always tell statically (i.e. by reading the code) what bindings a bit of code may refer to.
And the important thing here is that these rules are absurdly simple: there are no weird special cases. If a reference to a variable binding is visible in a bit of code, which you can tell by looking at the code, then it is visible. It's not visible only sometimes, or only during some interval, or only if the Moon is gibbous: it's visible.
But the implication of the rules is that procedures, somehow, need to remember all the bindings that they reference or may reference and which were in scope when they were created. Because scope is static it is always possible to determine which bindings are in scope (disclaimer: I'm not sure how this works formally for global bindings).
So then the very old-fashioned definition of a closure would be a procedure defined in a scope in which bindings to which it refers exist. This would be a closure:
(define x
  (let ((y 1))
    (λ (z)
      (set! y (+ y z))
      y)))
And this procedure would return a closure:
(define make-incrementor
  (λ (val)
    (λ ()
      (let ((v val))
        (set! val (+ val 1))
        v))))
But you can see that in both cases the behaviour of these things just follows immediately from the scope and extent rules of the language: there's no special 'this is a closure' rule.
In the first case the function which ends up as the value of x both refers to and mutates the binding of y as well as referring to the binding of z established when it was called.
In the second case, calling make-incrementor establishes a binding for val, which binding is then referred to and mutated by the function that it returns.
I'm never sure if it helps to understand things to turn all the lets into λs, but the second thing turns into
(define make-incrementor
  (λ (val)
    (λ ()
      ((λ (v)
         (set! val (+ val 1))
         v)
       val))))
And you can see now that the function returned by make-incrementor, when called, now immediately calls another function which binds v solely to its argument, which itself is the value of the binding established by make-incrementor: it's doing this simply to keep hold of the pre-increment value of that binding of course.
Again, the rules are simple: you can just look at the code and see what it does. There is no special 'closure' case.
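The same behaviour can be seen in any language with lexical scope and indefinite extent. As a rough sketch (Python rather than Scheme, and the name make_incrementor is just carried over from the example above), each call creates a fresh binding for val, and the returned function keeps referring to that one binding:

```python
def make_incrementor(val):
    """Return a function that yields val, val + 1, ... on successive calls."""
    def incrementor():
        nonlocal val      # refer to the binding made by this call of make_incrementor
        v = val           # keep hold of the pre-increment value
        val = val + 1     # mutate the captured binding
        return v
    return incrementor


a = make_incrementor(10)
b = make_incrementor(100)
print(a(), a(), b(), a())  # 10 11 100 12
```

Note that a and b do not interfere: they were created by different calls, so they refer to different bindings of val. Nothing in the code marks either of them as a closure.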
If you actually do want the formal semantics that gives rise to this, then 7.2 of R7RS has the formal semantics of the language.
A closure is a pair of a pointer to some code and a pointer to the environment the code should be evaluated in, which is the same as the environment the closure was created in.
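To make that concrete for someone writing an interpreter, here is a minimal sketch in Python of how such a pair might be represented and applied. Everything in it (Env, Closure, evaluate, apply_procedure, and the use of nested lists for expressions) is a hypothetical illustration, not how any particular Scheme is implemented. The key points are that evaluating a lambda saves the current environment, and that calling the closure creates a new frame whose parent is the saved environment, not the caller's.

```python
class Env:
    """One frame of bindings plus a pointer to the enclosing frame."""
    def __init__(self, bindings, parent=None):
        self.bindings = dict(bindings)
        self.parent = parent

    def find(self, name):
        """Return the innermost frame that binds name."""
        env = self
        while env is not None:
            if name in env.bindings:
                return env
            env = env.parent
        raise NameError(name)


class Closure:
    """Code (params and body) paired with the environment it was created in."""
    def __init__(self, params, body, env):
        self.params, self.body, self.env = params, body, env


def evaluate(expr, env):
    if isinstance(expr, str):              # variable reference
        return env.find(expr).bindings[expr]
    if not isinstance(expr, list):         # literal, e.g. a number
        return expr
    op = expr[0]
    if op == "lambda":                     # ["lambda", [params...], body]
        _, params, body = expr
        return Closure(params, body, env)  # remember the defining environment
    if op == "set!":                       # ["set!", name, value-expr]
        _, name, value_expr = expr
        value = evaluate(value_expr, env)
        env.find(name).bindings[name] = value
        return value
    if op == "begin":                      # ["begin", expr...]
        result = None
        for sub in expr[1:]:
            result = evaluate(sub, env)
        return result
    fn = evaluate(op, env)                 # ordinary application
    args = [evaluate(arg, env) for arg in expr[1:]]
    return apply_procedure(fn, args)


def apply_procedure(fn, args):
    if callable(fn):                       # primitive implemented in Python
        return fn(*args)
    # New frame for the parameters; its parent is the closure's saved environment.
    call_env = Env(zip(fn.params, args), parent=fn.env)
    return evaluate(fn.body, call_env)


global_env = Env({"+": lambda a, b: a + b})

# ((lambda (y) (lambda (z) (begin (set! y (+ y z)) y))) 1), i.e. the let written as a lambda
program = [["lambda", ["y"],
            ["lambda", ["z"], ["begin", ["set!", "y", ["+", "y", "z"]], "y"]]],
           1]
x = evaluate(program, global_env)
print(apply_procedure(x, [2]))  # 3
print(apply_procedure(x, [5]))  # 8
```

There is no separate "closure" syntax to parse in this sketch: a closure is simply what evaluating a lambda expression produces.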
The presence of closures in the language makes the environment look like a tree. Without closures the environment is like a stack. This is how the environment was in the first lisp systems. Stallman stated he chose dynamic environment in elisp because the static environment was hard to understand at the time (1986).
Closures are one of the most central concepts of computation, and they allow the derivation of many other concepts such as coroutines, fibers, continuations, threads, thunks for delayed evaluation, and so on.
This question is a continuation of the comments at Using Local Special Variables, regarding how best to avoid global variables. As I understand it, global variables are problematic mainly because they have the potential to interfere with referential transparency. Transparency is violated if an expression changes a global value using information outside its calling context (eg, a previous value of the global variable itself, or any other external values). In these cases evaluating the expression may have different results at different times, either in the value(s) returned or in side effects. (However, it seems not all global updates are problematic, since some updates may not depend on any external information--eg, resetting a global counter to 0). The normal global approach for a deeply embedded counter might look like:
* (defparameter *x* 0)
*X*
* (defun foo ()
    (incf *x*))
FOO
* (defun bar ()
    (foo))
BAR
* (bar)
1
* *x*
1
This would seem to violate referential transparency because (incf *x*) depends on the external (global) value of *x* to do its work. The following is an attempt to maintain both functionality and referential transparency by eliminating the global variable, but I'm not convinced that it really does:
* (let ((x 0))
    (defun inc-x () (incf x))
    (defun reset-x () (setf x 0))
    (defun get-x () x))
GET-X
* (defun bar ()
    (inc-x))
BAR
* (defun foo ()
    (bar))
FOO
* (get-x)
0
* (foo)
1
* (get-x)
1
The global variable is now gone, but it still seems like the expression (inc-x) has a (latent) side effect, and it will return different (but unused) values each time it is called. Does this confirm that using a closure on the variable in question does not solve the transparency problem?
global variables are problematic mainly because they have the potential to interfere with referential transparency
If one wants to create a global configuration value, a global variable in Common Lisp is just fine.
Often it's desirable to package a bunch of configuration state and then it may be better to put that into an object.
There is no general requirement for procedures to be referentially transparent.
It's useful to guide software design by software engineering principles, but often easy debugging and maintenance is more important than strict principles.
(let ((x 0))
  (defun inc-x () (incf x))
  (defun reset-x () (setf x 0))
  (defun get-x () x))
In practice, the above means that the code
is difficult to inspect
causes problems when the code is reloaded
prevents the file compiler from recognizing the top-level nature of the functions
creates a whole API just for managing a single variable
Referential transparency means that if you bind some variable x to an expression e, you can replace all occurrences of x by e without changing the outcome. For example:
(let ((e (* pi 2)))
(list (cos e) (sin e)))
The above could be written:
(list (cos (* pi 2))
(sin (* pi 2)))
The resulting value is equivalent to the first one for some useful definition of equivalence (here equalp, but you could choose another one). Contrast this with:
(let ((e (random 1.0)))
  (list e e))
Here, each call to random gives a different result (statistically), so the behaviour differs depending on whether you reuse the same result several times or generate a new one at each use.
Special variables are like additional arguments to functions: they can influence the outcome of a result simply by being bound to different values. Consider *default-pathname-defaults*, which is used to build pathnames.
In fact, for a given binding of that variable, each call to (merge-pathnames "foo") returns the same result. The result changes only if you use the same expression in a different dynamic context, which is no different from calling a function with different arguments.
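As a loose analogy only (Python has no special variables; this sketch uses contextvars to imitate dynamic binding, and default_dir, binding and merge_path are made-up names), the "hidden argument" view looks like this:

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Stand-in for a special variable such as *default-pathname-defaults*.
default_dir = ContextVar("default_dir", default="/home/user/")


@contextmanager
def binding(var, value):
    """Rebind var for the dynamic extent of the with-block, then restore it."""
    token = var.set(value)
    try:
        yield
    finally:
        var.reset(token)


def merge_path(name):
    # Reads the hidden "argument" from the current dynamic context.
    return default_dir.get() + name


print(merge_path("foo"))          # /home/user/foo
with binding(default_dir, "/tmp/"):
    print(merge_path("foo"))      # /tmp/foo
print(merge_path("foo"))          # /home/user/foo
```

Within one binding, merge_path("foo") always returns the same value, so nothing here is mutated behind the caller's back; the result only depends on an input that happens not to appear in the argument list.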
The main difficulty is that the special variable is hidden, i.e. you might not know that it influences some expressions, and that's why you need them documented and limited in number.
What breaks referential transparency is the presence of side-effects, whether you are using lexical or special variables. In both cases, a place is modified as part of the execution of the function, which means that you need to consider when and how often you call it.
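A small sketch of that point, in Python for brevity (make_counter and inc_x are illustrative names mirroring the inc-x closure from the question): the counter lives in a lexical variable that nothing else can reach, yet substituting the expression for the variable still changes the outcome.

```python
import math


def make_counter():
    x = 0                      # lexical variable, invisible from outside
    def inc_x():
        nonlocal x
        x += 1                 # side effect on the captured binding
        return x
    return inc_x


# Pure expression: binding it to a name or repeating it gives the same list.
e = math.cos(0.0)
assert [e, e] == [math.cos(0.0), math.cos(0.0)]

# Side-effecting expression: the two forms are no longer interchangeable.
inc_x = make_counter()
e = inc_x()
bound_once = [e, e]                  # [1, 1]
substituted = [inc_x(), inc_x()]     # [2, 3]
print(bound_once, substituted)
```

So hiding the variable in a closure limits who can touch it, but it does not make calls to inc_x referentially transparent.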
You could get better suggestions if you explained a bit more about how your code is organized. You said that you have many special variables due to prototyping, but in the refactoring you want to do it seems as if you want to keep the prototype code mostly untouched. Maybe there is a way to pack things in a nice modular way, but we can't help without knowing more about why you need many special variables, etc.
That code isn't referentially transparent. It is an improvement from special variables though.
The code you put would be a functional nonce if you dropped the reset-x.
My answer to your previous question had general guidelines about special variables. For your specific case, perhaps they are worth it? I could see the case for using special variables as a nonce, for example, where it is probably silly to pass them around.
Common Lisp has so many facilities for dealing with global information, so there is rarely a need for having lots of global variables. You could define an *env* alist to store your values in, or put them in a hash table, or put them into symbol plists, or package them in a closure to pass around, or do something else, or use CLOS.
Where is the side effect in the second example? The x inside the let isn't accessible from the outside.
Here's another closure example, with top-level functions, and a counter explicitly inside it.
(defun repeater (n)
  (let ((counter -1))
    (lambda ()
      (if (< counter n)
          (incf counter)
          (setf counter 0)))))
(defparameter *my-repeater* (repeater 3))
;; *MY-REPEATER*
(funcall *my-repeater*)
0
(funcall *my-repeater*)
1
https://lispcookbook.github.io/cl-cookbook/functions.html#closures
I would like to model the behaviour of generic datatypes in SMT v2.6. I am using Z3 as constraint solver. I modelled, based on the official example, a generic list as parameterised datatype in the following way:
(declare-datatypes (T) ((MyList nelem (cons (hd T) (tl MyList)))))
I would like the list to be generic with respect to the datatype. Later on, I would like to declare constants the following way:
(declare-const x (MyList Int))
(declare-const y (MyList Real))
However, now I would like to define functions on the generic datatype MyList (e.g., a length operation, empty operation, ...) so that they are re-usable for all T's. Do you have an idea how I could achieve this? I did try something like:
(declare-sort K)
(define-fun isEmpty ((in (MyList K))) Bool
(= in nelem)
)
but this gives me an error message; for this example to work Z3 would need to do some type-inference, I suppose.
It would be great if you could give me a hint.
SMT-Lib does not allow polymorphic user-defined functions. Section 4.1.5 of http://smtlib.cs.uiowa.edu/papers/smt-lib-reference-v2.6-r2017-07-18.pdf states:
Well-sortedness checks, required for commands that use sorts or terms,
are always done with respect to the current signature. It is an error
to declare or define a symbol that is already in the current
signature. This implies in particular that, contrary to theory
function symbols, user-defined function symbols cannot be overloaded.
Which is further expanded in Footnote-29:
The motivation for not overloading user-defined symbols is to simplify
their processing by a solver. This restriction is significant only for
users who want to extend the signature of the theory used by a script
with a new polymorphic function symbol—i.e., one whose rank would
contain parametric sorts if it was a theory symbol. For instance,
users who want to declare a “reverse” function on arbitrary lists,
must define a different reverse function symbol for each (concrete)
list sort used in the script. This restriction might be removed in
future versions.
So, as you suspected, you cannot define "polymorphic" functions at the user level. But as the footnote indicates, this restriction might be removed in the future, something that will most likely happen as SMT-solvers are more widely deployed. Exactly when that might happen, however, is anyone's guess.
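Until then, one workaround is to generate the per-sort definitions mechanically instead of writing them by hand. The following is only a sketch: it is a small Python helper (is_empty_definition and the isEmpty_Int naming scheme are made up for illustration) that prints one monomorphic define-fun per concrete element sort, using the standard (as nelem ...) qualification to pin down the sort of the constructor. It has not been checked against any particular solver version.

```python
def is_empty_definition(elem_sort):
    """Return SMT-LIB text defining isEmpty for (MyList elem_sort).

    Assumes elem_sort is a simple sort name such as Int or Real.
    """
    list_sort = f"(MyList {elem_sort})"
    return (
        f"(define-fun isEmpty_{elem_sort} ((in {list_sort})) Bool\n"
        f"  (= in (as nelem {list_sort})))"
    )


for sort in ["Int", "Real"]:
    print(is_empty_definition(sort))
```

Each generated function has a distinct name, so no user-defined symbol is overloaded, which is what the restriction quoted above requires.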
I am trying to understand how quoting works in Scheme. In particular, I would like to understand when the free variables of quoted terms are bound.
For instance, when I write
(define q 'a)
(define a 42)
(eval q)
it returns 42. Thus I deduce that binding time is at runtime. But in this case, why does this code fail
(let ((q 'a))
  (let ((a 42))
    (eval q)))
and returns
unbound variable: a
Can someone explain to me what the binding-time model of quoted terms is (is it comparable to MetaOCaml, for instance? I don't think so) and what the difference between define and let is?
Scheme has lexical scope discipline, not a dynamic binding discipline.
Your top-level define definitions behave as though creating a binding in a top-level lexical environment.
The second code snippet actually creates two lexical environments, one nested inside the other. So where (not "when") q is bound, a is still unbound. But the real question is, which environment is used by eval?
Your implementation behaves as though it uses the definitional environment, or a top level environment, but certainly not the current lexical environment, for evaluating the symbol 'a, which is the value of the q variable. The variable q has a clear binding lexical environment, created by its let form -- but where does a symbol 'a's binding reside? How are we to know?
Details should be in the documentation.
First off, a quoted symbol is no more a variable than a string containing the same characters as a variable name is a variable in a C-syntax language like JavaScript. They have nothing in common, since they live in different worlds.
eval does not know of the lexical variables surrounding the call, only global ones. It does know of lexical variables that are bound inside the structure being evaluated. E.g.
(eval '(let ((tmp (list q q)))
tmp))
q needs to be global, but tmp is a lexical variable.
In standard Scheme (R6RS, for example), eval takes a second argument with which you choose what libraries should be available. These bindings are still considered global.
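A rough analogy in Python, with the caveat that Python's eval has its own rules and is not Scheme's: when eval is handed an explicit environment, it resolves names in that environment and never sees the local variables of the function that called it. The helper lookup_in and the demo function are illustrative names only.

```python
def lookup_in(expr, environment):
    """Evaluate expr with only the given environment visible (plus builtins)."""
    return eval(expr, dict(environment))


def demo():
    a = 42                      # local binding, like (let ((a 42)) ...)
    try:
        return lookup_in("a", {})
    except NameError:
        return "unbound variable: a"


print(lookup_in("a", {"a": 42}))   # 42
print(demo())                      # unbound variable: a
```

The name a is looked up when eval runs, which is why defining it after quoting works at top level; what matters is which environment eval is given, not what was in scope where the quoted term was written.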
Variables are bound at runtime. Implementations are free to optimize and constant fold as long as this optimization does not break the report.
eval is a powerful procedure which should never be used unless it's the most sensible way to solve a problem. I've seen it twice in production code during my 17-year career, and I think that's one time too many.
I'm starting to doubt I really understand this topic.
Until now, I understood a continuation to be a call to a function with a closure (typically one returned by another function). But MLton seems to have a non-standard special structure for this (a structure I'm not sure I understand), and some other documents mention special optimizations of continuations (using jumps, as briefly mentioned on page 58, printed page 51), that is, something other than plain calls to functions with closures. Also, function closures are sometimes described as the basis for continuations without being described as continuations themselves, while at other times people assert the opposite: that function closures are a special case of continuations, not the other way around.
As an example, how do continuations differ from the following, and what would the same thing look like with continuations instead of a function with a closure?
datatype next = Next of (unit -> next)
fun f (i:int): next =
(print (Int.toString i);
Next (fn () => f (i + 1)))
val Next g = f 1
val Next g = g ()
val Next g = g ()
val Next g = g ()
…
I wonder about it, in the general computer‑science context, as much as specifically in the practical SML context.
Note: the question may look the same as “difference between closures and continuations”, but reading that one did not answer my question, and it does not use a practical case as a basis. It did, however, lead me to add another question: why are continuations said to be more abstract than closures if, in the end, continuations are made of closures, as the (to my eyes incomplete) answer in the above link suggests?
Is the difference really important or just a matter of style / syntax / vocabulary?
I feel a similar question arises with monads versus continuations, but that would be too much for a single question post (if, on the contrary, it can be answered simply along the way, feel free…).
Update
Still from MLton's world, a wording which seems to suggest continuations and function closures are the same (unless I'm not understanding correctly).
CommonArg (mlton.org), near the bottom of the page, says:
What I think the common argument optimization shows is that the
dominator analysis does slightly better than the reviewer puts it:
we find more than just constant continuations, we find common
continuations. And I think this is further justified by the fact
that I have observed common argument eliminate some env_X arguments
which would appear to correspond to determining that while the
closure being executed isn’t constant it is at least the same as
the closure being passed elsewhere.
It's talking about the same thing using both words, isn't it?
Similarly, and maybe more explicitly, at the bottom of this page: ReturnStatement (mlton.org).
There too, it seems to be the same. Is it?
There seems to be a terminological confusion. 'Continuation' is an abstract concept: it is the meaning of the context of an expression. A closure is a very particular way to realize values that represent functions (higher-order languages can be implemented without closures at all, for example, using substitution semantics).
A control operator can capture the current continuation and produce a particular representation of it (this is called reification). The particular representation of a captured continuation may indeed be a closure -- or it may not. For example, in OCaml, the continuations captured by the delimcc library are represented as values of an abstract data type (whose realization is quite different from closures). You might find the introduction of the following page useful.
Undelimited continuations are not functions
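To illustrate just one of the possible representations mentioned above, here is a small sketch in continuation-passing style, written in Python rather than SML (factorial_cps and the parameter name k are illustrative). The continuation, "what to do with the result", is an abstract notion; in this particular program it happens to be represented by a closure.

```python
def factorial_cps(n, k):
    """Compute n! and hand the result to the continuation k."""
    if n == 0:
        return k(1)
    # The rest of the computation after the recursive call is
    # "multiply by n, then continue with k". Here it is reified as a
    # closure that captures n and k.
    return factorial_cps(n - 1, lambda result: k(n * result))


print(factorial_cps(5, lambda r: r))       # 120
print(factorial_cps(3, lambda r: [r]))     # [6]
```

A compiler is free to represent the same continuations differently, for instance as return addresses and stack frames reached by jumps, which is why the two words are related but not synonyms.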
I started playing with Clojure today and stumbled upon the statement that one could change functions dynamically during runtime.
That sounds pretty cool so I wrote a little piece of code using this feature.
;; Forward declarations, so that each var can be referenced before it is defined.
(declare odd parity)

(defn ^:dynamic state [x]
  (odd x))

(defn even [x]
  (if (= x 0)
    (println "even")
    (binding [state odd] (parity x))))

(defn odd [x]
  (if (= x 0)
    (println "odd")
    (binding [state even] (parity x))))

(defn parity [x]
  (state (dec x)))
It works out fine, but since I am completely new to Clojure I don't know whether this is
a) clean functional code (since odd and even seem to have side effects?)
b) the way changing functions at runtime is supposed to be done
I would appreciate any kind of advice on that! :)
-Zakum
Use of dynamic bindings is mostly a question of taste, but there are a few considerations:
Dynamic bindings are pretty much a shortcut for explicitly passing values on the call stack. There are only a few situations where doing that is a totally obvious win; mostly things like passing "global" configuration settings/arguments "through" APIs that don't support them.
An API that relies on dynamic bindings is hard to wrap into something more explicit, while the other way around is much easier (and can usually be done semi-automatically).
Dynamic bindings do not play nice with lazy sequences or anything else that evaluates outside of the current call stack (like other threads).
All in all, I think the "cleaner" functional solution would be to pass state as an argument to parity, but arguments can be made either way.
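As a sketch of what passing state explicitly could look like (shown in Python to keep it short; the function names follow the question, and returning a string instead of printing is my own simplification):

```python
def even(x):
    # If we reach 0 while "in the even state", the original number was even.
    return "even" if x == 0 else parity(x, odd)


def odd(x):
    return "odd" if x == 0 else parity(x, even)


def parity(x, state):
    # state is an ordinary argument instead of a dynamically bound var.
    return state(x - 1)


print(even(3))   # odd
print(even(4))   # even
```

Nothing is rebound here, so the functions have no hidden dependency on the call stack and behave the same when called from another thread or from inside a lazy sequence. Like the original, this sketch only handles non-negative integers.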
While you can dynamically bind a symbol to different functions, I guess what you're really after is redefining a function.
Think of it this way: your code creates a symbol and two functions, and you dynamically bind the symbol to a different function:
                                  +---> func1
                                 /
symbol ---- [dynamic binding] ---<
                                 \
                                  +---> func2
The effect of your dynamic binding is limited to the scope of the binding invocation.
What we want to achieve is that, given a symbol and a function, provide a new implementation for the function so that all the code that refers to it will access the new implementation:
(defn func1 [...])
(var func1) ; ---> func1
(defn func1 [...])
(var func1) ; ---> func1*
and such a change permanently affects all the code that uses func1. This is a normal task when you're developing a piece of Clojure: you'll most likely have a REPL open on a running application, and you'll def and defn the same symbols over and over again, redefining all the moving parts of your application on the fly.
If you're using Emacs and SLIME/Swank, any time you hit C-c C-k on a modified Clojure source file, you're potentially redefining all the functions in a namespace without the need to restart the application.