I am trying to understand how quoting works in Scheme. In particular, I would like to understand when the free variables of quoted terms are bound.
For instance, when I write
(define q 'a)
(define a 42)
(eval q)
it returns 42. Thus I deduce that binding time is at runtime. But in this case, why does this code fail
(let ((q 'a))
  (let ((a 42))
    (eval q)))
and report the error
unbound variable: a
Can someone explain to me the binding-time model of quoted terms (is it comparable to MetaOCaml, for instance? I don't think so) and the difference between define and let?
Scheme has lexical scope discipline, not a dynamic binding discipline.
Your top-level define definitions behave as though they create bindings in a top-level lexical environment.
The second code snippet actually creates two lexical environments, one nested inside the other. So where (not "when") q is bound, a is still unbound. But the real question is, which environment is used by eval?
Your implementation behaves as though it uses the definitional environment, or a top-level environment, but certainly not the current lexical environment, for evaluating the symbol 'a, which is the value of the q variable. The variable q clearly has a binding lexical environment, created by its let form -- but where does the binding for the symbol 'a reside? How are we to know?
Details should be in the documentation.
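For example, R7RS makes the environment explicit: eval takes a second argument saying where names are looked up. A minimal sketch, assuming an implementation whose interaction environment is mutable (it usually is):

(import (scheme base) (scheme eval) (scheme repl) (scheme write))

(define env (interaction-environment))
(eval '(define a 42) env)           ; bind a in that environment, not in any let

(let ((q 'a))
  (let ((a 0))                      ; this lexical a is invisible to eval
    (display (eval q env))          ; prints 42: the symbol a is looked up in env
    (newline)))

Which environment a one-argument eval falls back on is exactly the implementation-defined detail that bit you.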
First off, a quoted symbol is a variable only in the sense that a string containing the same sequence of characters is a variable in a C-syntax language like JavaScript: they have nothing in common, since they live in different worlds.
eval does not know about the lexical variables of the code that calls it, only global ones. It does know about lexical variables that are created inside the structure being evaluated. E.g.
(eval '(let ((tmp (list q q)))
         tmp))
q needs to be global, but tmp is a lexical variable.
Standard Scheme, a.k.a. R6RS, takes a second argument where you can choose which libraries should be available. These are still considered global.
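For instance, with R7RS's (scheme eval) library you can build such an environment from specific libraries (a sketch; R6RS spells this slightly differently):

(import (scheme base) (scheme eval))

;; only the bindings of the named libraries are visible inside eval
(eval '(let ((tmp (list 1 2 3)))
         (map + tmp tmp))
      (environment '(scheme base)))
;; => (2 4 6)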
Variables are bound at runtime. Implementations are free to optimize and constant fold as long as this optimization does not break the report.
eval is a powerful procedure which should never be used unless it's the most sensible way to solve a problem. I've seen it twice in production code during my 17-year career, and I think that's one time too many.
I've recently ventured into the awesome land of writing a Scheme interpreter, and I've run into a roadblock: closures. From what I understand, they encapsulate a local environment with a procedure that gets restored every time the closure is called (this may not be exactly right). The issue that I can't seem to find anywhere online is how a closure is formally defined, i.e., in an EBNF grammar. Most examples I've seen say that a closure is a procedure with zero arguments that has a lambda expression nested inside a let expression. Is this the only way to define a Scheme closure? More importantly, if there's no way to formally define a closure, how do you actually interpret it? What happens if you translate all let expressions to lambdas? For example, if I declare a closure as such
(define (foo) (let ((y 0)) (λ (x) (…))))
Then assign it to a variable
(define bar (foo))
In what order is this evaluated? From what I’ve seen, when foo is declared, it stores a pointer to the parent environment, and declares its own environment. If I call (bar), should I substitute in the saved local environment immediately after?
I don't think it's helpful, today, to think of closures as some special magic thing: long ago in languages in the prehistory of Scheme they were, but in modern languages they are not a special thing at all: they just follow from the semantics of the language in an obvious way.
The two important things (these are both quotes from R7RS, both from section 1.1) are these:
Scheme is a statically scoped programming language. Each use of a variable is associated with a lexically apparent binding of that variable.
and
All objects created in the course of a Scheme computation, including procedures and continuations, have unlimited extent.
What this means is that Scheme is a language with lexical scope and indefinite extent: any variable binding exists for as long as there is any possibility of reference. And, conveniently, you can always tell statically (i.e. by reading the code) what bindings a bit of code may refer to.
And the important thing here is that these rules are absurdly simple: there are no weird special cases. If a reference to a variable binding is visible in a bit of code, which you can tell by looking at the code, then it is visible. It's not visible only sometimes, or only during some interval, or only if the Moon is gibbous: it's visible.
But the implication of the rules is that procedures, somehow, need to remember all the bindings that they reference or may reference and which were in scope when they were created. Because scope is static it is always possible to determine which bindings are in scope (disclaimer: I'm not sure how this works formally for global bindings).
So then the very old-fashioned definition of a closure would be a procedure defined in a scope in which bindings to which it refers exist. This would be a closure:
(define x
  (let ((y 1))
    (λ (z)
      (set! y (+ y z))
      y)))
And this procedure would return a closure:
(define make-incrementor
  (λ (val)
    (λ ()
      (let ((v val))
        (set! val (+ val 1))
        v))))
But you can see that in both cases the behaviour of these things just follows immediately from the scope and extent rules of the language: there's no special 'this is a closure' rule.
In the first case the function which ends up as the value of x both refers to and mutates the binding of y as well as referring to the binding of z established when it was called.
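For illustration, assuming the definition above has just been evaluated:

(x 3) ; => 4   y was 1 and is now 4
(x 3) ; => 7   the same binding of y persists between calls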
In the second case, calling make-incrementor establishes a binding for val, which binding is then referred to and mutated by the function that it returns.
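Again for illustration (inc-a and inc-b are just names I made up):

(define inc-a (make-incrementor 10))
(define inc-b (make-incrementor 100))

(inc-a) ; => 10
(inc-a) ; => 11   inc-a's val has been mutated
(inc-b) ; => 100  inc-b has its own, independent binding of val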
I'm never sure if it helps to understand things to turn all the lets into λs, but the second thing turns into
(define make-incrementor
  (λ (val)
    (λ ()
      ((λ (v)
         (set! val (+ val 1))
         v)
       val))))
And you can see now that the function returned by make-incrementor, when called, now immediately calls another function which binds v solely to its argument, which itself is the value of the binding established by make-incrementor: it's doing this simply to keep hold of the pre-increment value of that binding of course.
Again, the rules are simple: you can just look at the code and see what it does. There is no special 'closure' case.
If you actually do want the formal semantics that gives rise to this, then 7.2 of R7RS has the formal semantics of the language.
A closure is a pair of a pointer to some code and a pointer to the environment the code should be evaluated in, which is the same as the environment the closure was created in.
The presence of closures in the language makes the environment look like a tree. Without closures the environment is like a stack, which is how the environment worked in the first Lisp systems. Stallman stated that he chose a dynamic environment for Elisp because a static environment was hard to understand at the time (1986).
Closures are one of the most central concepts of computation, and they allow the derivation of many other concepts such as coroutines, fibers, continuations, threads, thunks for delayed evaluation, etc.
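As a tiny illustration of that last point, here is a delay-style thunk built from nothing but a closure (my-delay and my-force are made-up names, not the standard delay and force):

(define (my-delay thunk)            ; thunk: a zero-argument procedure
  (let ((computed? #f)
        (value #f))
    (lambda ()
      (unless computed?
        (set! value (thunk))
        (set! computed? #t))
      value)))

(define (my-force promise) (promise))

(define p (my-delay (lambda () (display "computing") (newline) 42)))
(my-force p) ; prints "computing", then returns 42
(my-force p) ; returns 42 without recomputing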
Conceptually I see a value as a single element. I can understand that at the lowest level of hardware the value returned is zero or one. I just see a "value" as returning a single unit. I see a procedure as a multiple unit. For example, a procedure (+ x x) to me seems like it should return "(", ")", "+" , "x". In this example, the value of lambda is the procedure.
What am I missing here?
Scheme is primarily a functional programming language. Functional languages deal with expressions¹ (as opposed to statements); expressions are at the core of functional languages pretty much like classes are at the core of object-oriented languages.
In Scheme, functions are expressed as lambda expressions. Since Scheme primarily deals with expressions, and since lambda expressions themselves are expressions, Scheme deals with functions just like any other expression. Therefore, functions are first-class citizens of the language.
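A quick illustration of that first-class status, using only standard procedures (compose is defined here just for the example):

(map (lambda (n) (* n n)) '(1 2 3 4))   ; => (1 4 9 16)

(define (compose f g)                   ; a function that returns a function
  (lambda (x) (f (g x))))

((compose car cdr) '(1 2 3))            ; => 2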
I don't think one should feel overly concerned about how exactly all of this translates under the hood in terms of bits and bytes. What plays as a strength in some languages (C/C++) can quickly turn against you here: imperative thinking in Scheme will only get you frustrated, and bounce you right back out to mainstream paradigms and languages.
What functional languages are really about is abstraction, metaprogramming (many Schemes feature powerful syntactic macros), and more abstraction. There is this well-known quote from Peter Deutsch: "Lisp ... made me aware that software could be close to executable mathematics." I think it sums it up very well.
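For a taste of those macros, here is a small syntax-rules sketch: a swap! form (a made-up name) that expands into a let plus two set!s.

(define-syntax swap!
  (syntax-rules ()
    ((_ a b)
     (let ((tmp a))
       (set! a b)
       (set! b tmp)))))

(define p 1)
(define r 2)
(swap! p r)
(list p r) ; => (2 1)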
¹ In Scheme (and other dialects of LISP), s-expressions are used to denote expressions. They give the language that distinctive parenthesized syntax.
When the program is compiled, the code for a procedure is generated. When you run the program, the code for a procedure is stored at a given address. A procedure value will in most implementations consist of a record/struct containing the name of the procedure, the address where the procedure is stored (when you call the procedure, the CPU jumps to this address) and, finally, in the case of a procedure created by a lambda expression, a table of the free values.
In Scheme everything is passed by value. That said, values that cannot be stored within a machine word are represented as pointers to data structures.
An internal procedure (primitive) may be treated specially so that it is just a simple value, while an evaluated lambda expression is a multi-value object. That object has an address, which is the "value" of the procedure. When a lambda form is evaluated, it turns into such a structure. Example:
;; a lambda form is evaluated into a closure and then
;; that object is the value of variable x
(define x (lambda y y)) ; ==> undefined
;; x is evaluated and turns out to be a closure. Scheme
;; evaluates the rest of the arguments before applying.
(x 1 2 3) ; ==> (1 2 3)
;; same, only that all the arguments also evaluate to the same closure.
(x x x x) ; ==> (#<closure> #<closure> #<closure>)
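To make the "table of the free values" concrete, here is a small sketch (adder and add5 are just illustrative names):

(define (adder n)          ; n is a free variable of the inner lambda
  (lambda (m) (+ n m)))

(define add5 (adder 5))    ; add5 = code address + a captured binding for n (= 5)
(add5 10)                  ; ==> 15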
I started playing with Clojure today and stumbled upon the statement that one could change functions dynamically during runtime.
That sounds pretty cool so I wrote a little piece of code using this feature.
(declare odd even parity) ; forward declarations so the mutually referring defns resolve

(defn ^:dynamic state [x]
  (odd x))

(defn even [x]
  (if (= x 0)
    (println "even")
    (binding [state odd] (parity x))))

(defn odd [x]
  (if (= x 0)
    (println "odd")
    (binding [state even] (parity x))))

(defn parity [x]
  (state (dec x)))
It works out fine, but since I am completely new to Clojure I don't know whether this is
a) clean functional code (since odd and even seem to have side effects?)
b) the way changing functions at runtime is supposed to be done
I would appreciate any kind of advice on that! :)
-Zakum
Use of dynamic bindings is mostly a question of taste, but there are a few considerations:
Dynamic bindings are pretty much a shortcut for explicitly passing values on the call stack. There are only a few situations where doing that is a totally obvious win; mostly things like passing "global" configuration settings/arguments "through" APIs that don't support them.
An API that relies on dynamic bindings is hard to wrap into something more explicit, while the other way around is much easier (and can usually be done semi-automatically).
Dynamic bindings do not play nice with lazy sequences or anything else that evaluates outside of the current call stack (like other threads).
All in all, I think the "cleaner" functional solution would be to pass state as an argument to parity, but arguments can be made either way.
While you are able to dynamically bind a symbol to different functions, I guess what you're after is really redefining a function.
Think of it this way: your code creates a symbol and two functions, and you dynamically bind the symbol to a different function:
                                   +---> func1
                                  /
symbol ---- [dynamic binding] ---<
                                  \
                                   +---> func2
The effect of your dynamic binding is limited to the scope of the binding invocation.
What we want to achieve is that, given a symbol and a function, provide a new implementation for the function so that all the code that refers to it will access the new implementation:
(defn func1 [...])
(var func1) ; ---> func1
(defn func1 [...])
(var func1) ; ---> func1*
and such a change permanently affects all the code that uses func1. This is a normal task when you're developing a piece of Clojure: you'll most likely have a REPL open on a running application, and you'll def and defn the same symbols over and over again, redefining all the moving parts of your application on the fly.
If you're using Emacs and SLIME/Swank, any time you hit C-c C-k on a modified Clojure source file, you're potentially redefining all the functions in a namespace without the need to restart the application.
I was reading this and now wonder: what is the evaluation order in F#?
Obviously ; makes effects happen in a sequential fashion. But what about things like function calls or applications, order of evaluation for operators, and the like.
I've glanced at the F# spec, but there is no mention of that. Thanks for any insight!
I found some emails where we fixed the implementation to have a rigid application order. The code
open System
let f a =
    Console.WriteLine "app1";
    fun b ->
        Console.WriteLine "app2";
        ()
(Console.WriteLine "f"; f) (Console.WriteLine "arg1") (Console.WriteLine "arg2")
will print "f", "arg1", "arg2", "app1", "app2". However this didn't make it into the spec. I'll file a spec bug.
(Some other portions of the spec are already more explicit, e.g.
6.9.6 Evaluating Method Applications
For elaborated applications of methods, the elaborated form of the expression will be either expr.M(args) or M(args).
The (optional) expr and args are evaluated in left-to-right order and the body of the member is evaluated in an environment with formal parameters that are mapped to corresponding argument values.
If expr evaluates to null then NullReferenceException is raised.
If the method is a virtual dispatch slot (that is, a method that is declared abstract) then the body of the member is chosen according to the dispatch maps of the value of expr.
That said, some experts believe that you will live a longer, happier life if you do not rely on evaluation order. :) )
(Possibly see also
http://blogs.msdn.com/ericlippert/archive/2009/11/19/always-write-a-spec-part-one.aspx
http://blogs.msdn.com/ericlippert/archive/2009/11/23/always-write-a-spec-part-two.aspx
for more on how easy it is to screw things up with evaluation order.)
Why is it that functions in F# and OCaml (and possibly other languages) are not by default recursive?
In other words, why did the language designers decide it was a good idea to explicitly make you type rec in a declaration like:
let rec foo ... = ...
and not give the function recursive capability by default? Why the need for an explicit rec construct?
The French and British descendants of the original ML made different choices and their choices have been inherited through the decades to the modern variants. So this is just legacy but it does affect idioms in these languages.
Functions are not recursive by default in the French CAML family of languages (including OCaml). This choice makes it easy to supersede function (and variable) definitions using let in those languages, because you can refer to the previous definition inside the body of a new definition. F# inherited this syntax from OCaml.
For example, superseding the function p when computing the Shannon entropy of a sequence in OCaml:
let shannon fold p =
  let p x = p x *. log(p x) /. log 2.0 in
  let p t x = t +. p x in
  -. fold p 0.0
Note how the argument p to the higher-order shannon function is superseded by another p in the first line of the body, and then by another p in the second line of the body.
Conversely, the British SML branch of the ML family of languages took the other choice, and SML's fun-bound functions are recursive by default. When most function definitions do not need access to previous bindings of their function name, this results in simpler code. However, superseded functions have to be given different names (f1, f2, etc.), which pollutes the scope and makes it possible to accidentally invoke the wrong "version" of a function. And there is now a discrepancy between implicitly-recursive fun-bound functions and non-recursive val-bound functions.
Haskell makes it possible to infer the dependencies between definitions by restricting them to be pure. This makes toy samples look simpler but comes at a grave cost elsewhere.
Note that the answers given by Ganesh and Eddie are red herrings. They explained why groups of functions cannot be placed inside a giant let rec ... and ... because it affects when type variables get generalized. This has nothing to do with rec being default in SML but not OCaml.
One crucial reason for the explicit use of rec is to do with Hindley-Milner type inference, which underlies all statically typed functional programming languages (albeit changed and extended in various ways).
If you have a definition let f x = x, you'd expect it to have type 'a -> 'a and to be applicable on different 'a types at different points. But equally, if you write let g x = (x + 1) + ..., you'd expect x to be treated as an int in the rest of the body of g.
The way that Hindley-Milner inference deals with this distinction is through an explicit generalisation step. At certain points when processing your program, the type system stops and says "ok, the types of these definitions will be generalised at this point, so that when someone uses them, any free type variables in their type will be freshly instantiated, and thus won't interfere with any other uses of this definition."
It turns out that the sensible place to do this generalisation is after checking a mutually recursive set of functions. Any earlier, and you'll generalise too much, leading to situations where types could actually collide. Any later, and you'll generalise too little, making definitions that can't be used with multiple type instantiations.
So, given that the type checker needs to know about which sets of definitions are mutually recursive, what can it do? One possibility is to simply do a dependency analysis on all the definitions in a scope, and reorder them into the smallest possible groups. Haskell actually does this, but in languages like F# (and OCaml and SML) which have unrestricted side-effects, this is a bad idea because it might reorder the side-effects too. So instead it asks the user to explicitly mark which definitions are mutually recursive, and thus by extension where generalisation should occur.
There are two key reasons this is a good idea:
First, if definitions were recursive by default, you couldn't refer to a previous binding of a value of the same name. This is often a useful idiom when you are doing something like extending an existing module.
Second, recursive values, and especially sets of mutually recursive values, are much harder to reason about than definitions that proceed in order, each new definition building on top of what has already been defined. It is nice when reading such code to have the guarantee that, except for definitions explicitly marked as recursive, new definitions can only refer to previous definitions.
Some guesses:
let is not only used to bind functions, but also other regular values. Most forms of values are not allowed to be recursive. Certain forms of recursive values are allowed (e.g. functions, lazy expressions, etc.), so it needs an explicit syntax to indicate this.
It might be easier to optimize non-recursive functions
The closure created when you create a recursive function needs to include an entry that points to the function itself (so the function can recursively call itself), which makes recursive closures more complicated than non-recursive closures. So it might be nice to be able to create simpler non-recursive closures when you don't need recursion
It allows you to define a function in terms of a previously-defined function or value of the same name; although I think this is bad practice
Extra safety? Makes sure that you are doing what you intended. e.g. If you don't intend it to be recursive but you accidentally used a name inside the function with the same name as the function itself, it will most likely complain (unless the name has been defined before)
The let construct is similar to the let construct in Lisp and Scheme, which is non-recursive. There is a separate letrec construct in Scheme for recursive lets (see the sketch after this list).
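For reference, the Scheme analogue of the rec distinction (a sketch; fact is just an illustrative name):

(letrec ((fact (lambda (n)
                 (if (zero? n) 1 (* n (fact (- n 1)))))))
  (fact 5))
; => 120

; With plain let, the fact inside the lambda refers to whatever fact means
; *outside* the form (here: nothing), so the same code with let is an error:
; (let ((fact (lambda (n)
;               (if (zero? n) 1 (* n (fact (- n 1)))))))
;   (fact 5))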
Given this:
let f x = ... and g y = ...;;
Compare:
let f a = f (g a)
With this:
let rec f a = f (g a)
The former redefines f to apply the previously defined f to the result of applying g to a. The latter redefines f to loop forever applying g to a, which is usually not what you want in ML variants.
That said, it's a language designer style thing. Just go with it.
A big part of it is that it gives the programmer more control over the complexity of their local scopes. The spectrum of let, let* and let rec offers an increasing level of both power and cost. let* and let rec are in essence nested versions of the simple let, so using either one is more expensive. This grading allows you to micromanage the optimization of your program, as you can choose which level of let you need for the task at hand. If you don't need recursion or the ability to refer to previous bindings, then you can fall back on a simple let to save a bit of performance.
It's similar to the graded equality predicates in Scheme. (i.e. eq?, eqv? and equal?)
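For concreteness, here is how the three Scheme let forms mentioned above differ (a small sketch):

(let ((a 1) (b 2))              ; bindings cannot see each other
  (+ a b))                      ; => 3

(let* ((a 1) (b (+ a 1)))       ; each binding sees the previous ones
  (+ a b))                      ; => 3

(letrec ((even? (lambda (n) (if (zero? n) #t (odd?  (- n 1)))))
         (odd?  (lambda (n) (if (zero? n) #f (even? (- n 1))))))
  (even? 10))                   ; => #t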