How can the symbols in a Pandoric Macro be compiled out?

I have read section 6.7 of LOL a few times now, and I still can't wrap my mind around the following.
Bindings that were previously closed to outside code are now wide open for us to tinker with, even if those bindings were compiled to something efficient and have long since had their accessor symbols forgotten.
If bound symbols are essentially compiled down to pointers in the closure's environment, how can you pass a symbol to the already-compiled function and have the function somehow compare that symbol?
I've been messing with the pantest example in CLISP, and I can see that I'm able to change both acc and this inside pantest. I can compile and disassemble pantest, but all the symbols show up in the environment. If I had a Lisp that compiled down to assembly, I might gain some more intuition, but the code is complicated enough that it would probably be too difficult to follow without explanation.

I'm not familiar with Let Over Lambda.
The book Lisp in Small Pieces explains how lexical binding can compile down to very efficient variable references. Since all known references to the variables are in a limited scope, you can use an array to store the bindings and reference them by numeric index, rather than using the symbol to look things up or consulting a property of the symbol for the value.
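As a rough sketch of that idea (this is purely illustrative, not how any particular compiler actually lays out its environments), imagine the compiler assigning the closed-over variable a slot in a vector; every reference then becomes an index operation, and the symbol naming the variable never appears in the compiled code:

;; Sketch: the closed-over environment as a plain vector, with the
;; variable compiled down to a fixed numeric index.
(defun make-counter ()
  (let ((env (vector 0)))          ; slot 0 plays the role of the old VALUE
    (lambda ()
      (incf (svref env 0)))))      ; reference by index, not by symbol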
A symbol passed into a function is just a symbol, a kind of data. Comparing it to other things in the function isn't the same as accessing information about the lexical bindings in a particular scope.
There's a didactic Lisp pseudo-OO technique that shows how you can change function behavior by passing in a symbol to access and modify the lexical state, but it's choosing from a fixed set of known things to compare, not an arbitrary lookup of lexical info based on a symbol:
(defun make-incrementor (initial-value)
  (let ((value initial-value))
    (lambda (action)
      (ecase action
        (:value
         value)
        (:increment
         (incf value))
        (:reset
         (setf value initial-value))))))
> (defvar *inc* (make-incrementor 10))
*INC*
> (funcall *inc* :increment)
11
> (funcall *inc* :increment)
12
> (funcall *inc* :increment)
13
> (funcall *inc* :reset)
10
This is manipulating the lexical binding of value without externally accessing it. All changes are mediated through code in the same lexical location.

(I will come back later and fill in some more information here.)
In short (and slightly oversimplified), the pandoriclet macro has added in some extra code to handle the case of trying to get or set each of the different variables it introduces. This extra code remembers the symbols of the variables after the code has been compiled, but since it is the only code which needs that information, everything else gets compiled down to very efficient pointer operations.
The functions which generate this extra code are pandoriclet-get and pandoriclet-set, both of which are tricky to read because they return code that depends on the symbols sym and val, which are bound in the pandoriclet macro itself.
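A minimal hand-written analogue of that expansion (simplified; the real pandoriclet expansion in the book differs in its details) might look like this. The CASE over literal symbols is the only place the symbol ACC survives compilation; the body of the ordinary closure uses acc as a plain lexical variable:

(let ((acc 0))
  (list
   (lambda (n) (incf acc n))            ; the ordinary compiled closure
   (lambda (sym)                        ; getter: compares the passed symbol
     (case sym
       (acc acc)
       (t (error "unknown pandoric symbol: ~a" sym))))
   (lambda (sym val)                    ; setter: same fixed comparison
     (case sym
       (acc (setf acc val))
       (t (error "unknown pandoric symbol: ~a" sym))))))

Calling the getter with the symbol acc returns the current value; any other symbol falls through to the error clause. That fixed comparison is why symbol access still works after everything else has been compiled to pointer operations.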


How are Scheme closures formally defined?

I've recently ventured into the awesome land of writing a Scheme interpreter, and I've run into a roadblock: closures. From what I understand, they encapsulate a local environment with a procedure, and that environment gets restored every time the closure is called (this may not be exactly right). The issue I can't seem to find addressed anywhere online is how a closure is formally defined, i.e., in an EBNF grammar. Most examples I've seen say that a closure is a procedure with zero arguments that has a lambda expression nested inside a let expression. Is this the only way to define a Scheme closure? More importantly, if there's no way to formally define a closure, how do you actually interpret it? What happens if you translate all let expressions to lambdas? For example, suppose I declare a closure as such
(define (foo) (let ((y 0)) (λ (x) (…))))
Then assign it to a variable
(define bar (foo))
In what order is this evaluated? From what I’ve seen, when foo is declared, it stores a pointer to the parent environment, and declares its own environment. If I call (bar), should I substitute in the saved local environment immediately after?
I don't think it's helpful, today, to think of closures as some special magic thing: long ago, in languages from the prehistory of Scheme, they were, but in modern languages they are not a special thing at all: they just follow from the semantics of the language in an obvious way.
The two important things (both quotes from R7RS, section 1.1) are these:
Scheme is a statically scoped programming language. Each use of a variable is associated with a lexically apparent binding of that variable.
and
All objects created in the course of a Scheme computation, including procedures and continuations, have unlimited extent.
What this means is that Scheme is a language with lexical scope and indefinite extent: any variable binding exists for as long as there is a possibility of reference. And, conveniently, you can always tell statically (i.e., by reading the code) which bindings a bit of code may refer to.
And the important thing here is that these rules are absurdly simple: there are no weird special cases. If a reference to a variable binding is visible in a bit of code, which you can tell by looking at the code, then it is visible. It's not visible only sometimes, or only during some interval, or only if the Moon is gibbous: it's visible.
But the implication of the rules is that procedures, somehow, need to remember all the bindings that they reference or may reference and which were in scope when they were created. Because scope is static it is always possible to determine which bindings are in scope (disclaimer: I'm not sure how this works formally for global bindings).
So then the very old-fashioned definition of a closure would be a procedure defined in a scope in which bindings to which it refers exist. This would be a closure:
(define x
  (let ((y 1))
    (λ (z)
      (set! y (+ y z))
      y)))
And this procedure would return a closure:
(define make-incrementor
  (λ (val)
    (λ ()
      (let ((v val))
        (set! val (+ val 1))
        v))))
But you can see that in both cases the behaviour of these things just follows immediately from the scope and extent rules of the language: there's no special 'this is a closure' rule.
In the first case the function which ends up as the value of x both refers to and mutates the binding of y as well as referring to the binding of z established when it was called.
In the second case, calling make-incrementor establishes a binding for val, which binding is then referred to and mutated by the function that it returns.
I'm never sure whether turning all the lets into λs helps in understanding things, but the second example turns into
(define make-incrementor
  (λ (val)
    (λ ()
      ((λ (v)
         (set! val (+ val 1))
         v)
       val))))
And you can see now that the function returned by make-incrementor, when called, immediately calls another function which binds v to its argument, which itself is the value of the binding established by make-incrementor: it does this simply to keep hold of the pre-increment value of that binding, of course.
Again, the rules are simple: you can just look at the code and see what it does. There is no special 'closure' case.
If you actually do want the formal semantics that gives rise to this, section 7.2 of R7RS has the formal semantics of the language.
A closure is a pair of a pointer to some code and a pointer to the environment the code should be evaluated in, which is the environment the closure was created in.
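As a toy illustration of that pairing (purely a sketch, with environments as association lists rather than whatever representation a real implementation uses), you can build the pair by hand:

(define (lookup name env)
  (cond ((assq name env) => cdr)
        (else (error "unbound" name))))

(define (make-adder y)
  (let ((env (list (cons 'y y))))      ; the saved environment
    (lambda (x)                        ; the code part
      (+ x (lookup 'y env)))))         ; consults the saved environment

((make-adder 10) 32)                   ; => 42

The environment is captured once, at creation time, and consulted at every call, which is exactly the behaviour the code-pointer/environment-pointer description predicts.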
The presence of closures in the language makes the environment look like a tree; without closures the environment is like a stack, which is how it was in the first Lisp systems. Stallman stated that he chose a dynamic environment for Elisp because a static environment was hard to understand at the time (1986).
Closures are one of the most central concepts of computation, and they allow the derivation of many other concepts: coroutines, fibers, continuations, threads, thunks for delayed evaluation, and so on.

wxMaxima: subindexed variables in functions work when written as "x_1" but not when written as "x[1]"

I'm having trouble defining a function in terms of variables with subindices. Using the makelist command I can create an unspecified function that depends upon the subindexed variables x[1] and x[2]. However, when I try to give an expression to that function, wxMaxima does not allow it.
On the other hand, if I write the subindexed variables as x_1 and x_2 instead of x[1] and x[2], things do work.
What is the reason for this behavior? Aren't the two subindexing methods equivalent in terms of functions?
Only symbols can be declared function arguments. In particular, subscripted expressions are not symbols and therefore can't be function arguments.
wxMaxima displays symbols which end in a number, e.g., x_1, the same way as subscripted expressions, e.g., x[1]. This is intended as a convenience, although it is confusing because it makes it difficult to distinguish the two.
You can see the internal form of an expression via ?print (note the question mark is part of the name). E.g., ?print(x_1); versus ?print(x[1]);.
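For example (the exact Lisp-level output varies by Maxima version, so take the printed forms below as indicative only):

?print(x_1)$   /* a plain Lisp symbol, printed something like $x_1         */
?print(x[1])$  /* a subscripted expression, something like (($x array) 1) */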

Symbol Creation in Z3 Java API

I am new to Z3, so excuse me if the question sounds too easy. I have two questions regarding constants in the Z3 Java API.
How does the creation of constants happen internally? To understand that, I started by tracking public BitVecExpr mkBVConst(String, int) down to public StringSymbol mkSymbol(String), which eventually calls Native.mkStringSymbol(var1.nCtx(), var2), which generates the variable var3 in the line long var3 = INTERNALmkStringSymbol(var0, var2);
Now, because INTERNALmkStringSymbol is native, I can't see its source. I am wondering how it operates. Does anyone know how it works, and where to view its source?
Another thing I am confused about is the scoping of constants using the API. In interactive Z3 it is maintained through matching push and pop commands, but through the API I am not sure how scoping is defined and managed.
Any insights or guidance are much appreciated!
Z3 is open source; you can view and download the source from https://github.com/z3prover/z3.git. Symbols in Z3 are defined in src/util/symbol.h. You will see that symbols are similar to LISP atoms: they persist through the lifetime of the DLL and are unique, so two symbols with the same name will be pointer-equal. The Java API calls into the C API, which is declared in src/api/z3_api.h. The directory src/api contains the API functions, including those that create symbols. When you create an expression constant, such as with mkBVConst, the resulting expression is also pointer-unique: if you create the same mkBVConst twice, the unmanaged pointers will be equal. (The Java objects are not the same, but their equality testing exploits all of this.)
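A short sketch against the Java API shows the effect of that uniqueness (assuming a reasonably recent Z3 build, where Context implements AutoCloseable):

import com.microsoft.z3.*;

public class SymbolDemo {
    public static void main(String[] args) {
        try (Context ctx = new Context()) {
            // Two constants with the same name and sort map to the same
            // unmanaged AST, so equals() reports them as equal.
            BitVecExpr a = ctx.mkBVConst("x", 32);
            BitVecExpr b = ctx.mkBVConst("x", 32);
            System.out.println(a == b);        // false: distinct Java wrappers
            System.out.println(a.equals(b));   // true: same underlying AST
        }
    }
}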
The Solver object has push and pop methods. You can add constraints to the solver object. The lifetime of constraints follows the push/pop nesting: a constraint is active until there is a pop that removes the scope where the constraint was added.
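Reusing ctx and a from the sketch above, scoped constraints might look like this:

Solver s = ctx.mkSolver();
s.add(ctx.mkEq(a, ctx.mkBV(1, 32)));   // active for the solver's lifetime
s.push();                               // open a scope
s.add(ctx.mkEq(a, ctx.mkBV(2, 32)));   // active until the matching pop
System.out.println(s.check());          // UNSATISFIABLE
s.pop();                                // drop the scoped constraint
System.out.println(s.check());          // SATISFIABLE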

Binding time of free variables of quoted terms in Scheme

I'm trying to understand how quoting works in Scheme. In particular, I would like to understand when the free variables of quoted terms are bound.
For instance, when I write
(define q 'a)
(define a 42)
(eval q)
it returns 42. Thus I deduce that binding happens at runtime. But in that case, why does this code fail
(let ((q 'a))
  (let ((a 42))
    (eval q)))
and returns
unbound variable: a
Can someone explain to me the binding-time model of quoted terms (is it comparable to MetaOCaml, for instance? I don't think so) and the difference between define and let?
Scheme has a lexical scoping discipline, not a dynamic binding discipline.
Your top-level define definitions behave as though they create bindings in a top-level lexical environment.
The second code snippet actually creates two lexical environments, one nested inside the other. So where (not "when") q is bound, a is still unbound. But the real question is, which environment is used by eval?
Your implementation behaves as though it uses the definitional environment, or a top-level environment, but certainly not the current lexical environment, for evaluating the symbol 'a, which is the value of the variable q. The variable q has a clear binding lexical environment, created by its let form; but where does the symbol 'a's binding reside? How are we to know?
Details should be in the documentation.
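For instance, in R7RS eval takes an explicit environment argument, which makes the point visible (a sketch; the one-argument eval in the question is an implementation extension):

(define a 1)
(let ((a 42))
  (eval 'a (interaction-environment)))  ; => 1: eval never sees the lexical a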
First off, a quoted symbol is no more a variable than a string with the same sequence of characters is a variable in a C-syntax language like JavaScript. They have nothing in common, since they live in different worlds.
eval does not know about the lexical variables of the code that calls it, only global ones. It does know about lexical variables that are bound inside the structure being evaluated. E.g.:
(eval '(let ((tmp (list q q)))
         tmp))
q needs to be global, but tmp is a lexical variable.
In standard Scheme (R6RS and later), eval takes a second argument with which you choose which libraries should be available. These bindings are still considered global.
Variables are bound at runtime. Implementations are free to optimize and constant fold as long as this optimization does not break the report.
eval is a powerful procedure which should never be used unless it's the most sensible way to solve a problem. I've seen it twice in production code during my 17-year career, and I think that's one time too many.

What's the difference between luaL_checknumber and lua_tonumber?

Clarification (sorry the question was not specific): They both try to convert the item on the stack to a lua_Number. lua_tonumber will also convert a string that represents a number. How does luaL_checknumber deal with something that's not a number?
There's also luaL_checklong and luaL_checkinteger. Are they the same as (long)luaL_checknumber and (int)luaL_checknumber, respectively?
The reference manual does answer this question. I'm citing the Lua 5.2 Reference Manual, but similar text is found in the 5.1 manual as well. The manual is, however, quite terse. It is rare for any single fact to be restated in more than one sentence. Furthermore, you often need to correlate facts stated in widely separated sections to understand the deeper implications of an API function.
This is not a defect, it is by design. This is the reference manual to the language, and as such its primary goal is to completely (and correctly) describe the language.
For more information about "how" and "why" the general advice is to also read Programming in Lua. The online copy is getting rather long in the tooth as it describes Lua 5.0. The current paper edition describes Lua 5.1, and a new edition describing Lua 5.2 is in process. That said, even the first edition is a good resource, as long as you also pay attention to what has changed in the language since version 5.0.
The reference manual has a fair amount to say about the luaL_check* family of functions.
Each API entry's documentation block is accompanied by a token that describes its use of the stack, and under what conditions (if any) it will throw an error. Those tokens are described at section 4.8:
Each function has an indicator like this: [-o, +p, x]

The first field, o, is how many elements the function pops from the stack. The second field, p, is how many elements the function pushes onto the stack. (Any function always pushes its results after popping its arguments.) A field in the form x|y means the function can push (or pop) x or y elements, depending on the situation; an interrogation mark '?' means that we cannot know how many elements the function pops/pushes by looking only at its arguments (e.g., they may depend on what is on the stack). The third field, x, tells whether the function may throw errors: '-' means the function never throws any error; 'e' means the function may throw errors; 'v' means the function may throw an error on purpose.
At the head of Chapter 5, which documents the auxiliary library as a whole (all functions in the official API whose names begin with luaL_ rather than just lua_), we find this:
Several functions in the auxiliary library are used to check C function arguments. Because the error message is formatted for arguments (e.g., "bad argument #1"), you should not use these functions for other stack values.

Functions called luaL_check* always throw an error if the check is not satisfied.
The function luaL_checknumber is documented with the token [-0,+0,v] which means that it does not disturb the stack (it pops nothing and pushes nothing) and that it might deliberately throw an error.
The other functions that have more specific numeric types differ primarily in function signature. All are described similarly to luaL_checkint(): "Checks whether the function argument arg is a number and returns this number cast to an int", with the type named in the cast varying as appropriate.
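In practice that looks like the following sketch (l_sqrt is a made-up example binding, not anything from the Lua distribution):

#include <math.h>
#include <lua.h>
#include <lauxlib.h>

/* If argument 1 is not a number (or a string convertible to one),
   luaL_checknumber throws the formatted "bad argument #1" error and
   the lines below never execute. */
static int l_sqrt(lua_State *L) {
    lua_Number x = luaL_checknumber(L, 1);
    lua_pushnumber(L, sqrt(x));
    return 1;
}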
The function lua_tonumber() is described with the token [-0,+0,-] meaning it has no effect on the stack and does not throw any errors. It is documented to return the numeric value from the specified stack index, or 0 if the stack index does not contain something sufficiently numeric. It is documented to use the more general function lua_tonumberx() which also provides a flag indicating whether it successfully converted a number or not.
It too has siblings named with more specific numeric types that do all the same conversions but cast their results.
Finally, one can also refer to the source code, with the understanding that the manual is describing the language as it is intended to be, while the source is a particular implementation of that language and might have bugs, or might reveal implementation details that are subject to change in future versions.
The source to luaL_checknumber() is in lauxlib.c. It can be seen to be implemented in terms of lua_tonumberx() and the internal function tagerror() which calls typerror() which is implemented with luaL_argerror() to actually throw the formatted error message.
They both try to convert the item on the stack to a lua_Number, and lua_tonumber will also convert a string that represents a number. luaL_checknumber throws a (Lua) error when it fails a conversion: it longjmps and never returns, from the point of view of the C function. lua_tonumber merely returns 0 (which can be a valid return value as well). So you could write this code, which should be faster than checking with lua_isnumber first:
double r = lua_tonumber(_L, idx);
if (r == 0 && !lua_isnumber(_L, idx))
{
    // Error handling code
}
return r;
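In Lua 5.2 and later the same check can be written with lua_tonumberx, which reports success through an out-parameter and avoids the second stack probe:

int isnum;
lua_Number r = lua_tonumberx(_L, idx, &isnum);
if (!isnum)
{
    // Error handling code
}
return r;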
