parametric functions in smtlib - z3

I understand that there is a way to declare parametric datatypes in SMTLIB. Is there a way to define a function that takes in such a type? For instance, the standard doc has:
(declare-datatypes ((Pair 2))
  ((par (X Y) ((pair (first X) (second Y))))))
Now how can I declare a function that takes in a parametric Pair type?

SMTLib does not allow for parametric function definitions. Note that internal functions, like +, -, etc., can be many-sorted/parametric: they work on Integers and Reals just fine, for instance. But user-defined functions aren't allowed to be many-sorted. That is, you can write a function that takes a (Pair Int Bool), but not one that takes (Pair a b) where a and b are type variables; i.e., a polymorphic one.
This is explained in Section 4.1.5 of https://smtlib.cs.uiowa.edu/papers/smt-lib-reference-v2.6-r2021-05-12.pdf:
Well-sortedness checks, required for commands that use sorts or terms, are always done with respect to the current signature. It is an error to declare or define a symbol that is already in the current signature. This implies in particular that, contrary to theory function symbols, user-defined function symbols cannot be overloaded. [29]
And later on in Footnote 29, it says:
The motivation for not overloading user-defined symbols is to simplify their processing by a solver. This restriction is significant only for users who want to extend the signature of the theory used by a script with a new polymorphic function symbol—i.e., one whose rank would contain parametric sorts if it was a theory symbol. For instance, users who want to declare a “reverse” function on arbitrary lists, must define a different reverse function symbol for each (concrete) list sort used in the script. This restriction might be removed in future versions.
TL;DR: No, you cannot define parametric functions in SMTLib. But this might change in the future as the logic gets richer. The current workaround is "monomorphisation": generating a new version of the function at each concrete type that you use. This can be done by hand, or it can be automated by a higher-level tool that generates the SMTLib for you.
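For instance, here is a minimal sketch of that workaround for the Pair type above; the swapIB/swapRR function names are made up for illustration:
; one monomorphic copy per concrete Pair instantiation used in the script
(declare-datatypes ((Pair 2))
  ((par (X Y) ((pair (first X) (second Y))))))
; a "swap" for (Pair Int Bool) ...
(define-fun swapIB ((p (Pair Int Bool))) (Pair Bool Int)
  (pair (second p) (first p)))
; ... and a separate one for (Pair Real Real)
(define-fun swapRR ((p (Pair Real Real))) (Pair Real Real)
  (pair (second p) (first p)))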

Related

wxMaxima: subindexed variables in functions work when written as "x_1" but not when written as "x[1]"

I'm having trouble defining a function in terms of variables with subindices. Using the makelist command I can create an unspecified function that depends upon the subindexed variables x[1] and x[2]. However, when I try to give an expression to that function, wxMaxima does not allow it:
On the other hand, if I write the subindexed variables as x_1 and x_2 instead of x[1] and x[2], things do work.
What is the reason for this behavior? Aren't the two subindexing methods equivalent in terms of functions?
Only symbols can be declared function arguments. In particular, subscripted expressions are not symbols and therefore can't be function arguments.
WxMaxima displays symbols which end in a number, e.g., x_1, the same as subscripted expressions, e.g., x[1]. This is intended as a convenience, although it is confusing because it makes it difficult to distinguish the two.
You can see the internal form of an expression via ?print (note the question mark is part of the name). E.g., ?print(x_1); versus ?print(x[1]);.
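For instance, in a hypothetical wxMaxima session (the names f and g are made up), a definition over a symbol should be accepted while the subscripted form should be rejected:
f(x_1) := x_1^2;    /* accepted: x_1 is a symbol */
g(x[1]) := x[1]^2;  /* rejected: x[1] is a subscripted expression, not a symbol */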

Modelling generic datatypes for Z3 and/or SMT (v2.6)

I would like to model the behaviour of generic datatypes in SMT v2.6. I am using Z3 as constraint solver. I modelled, based on the official example, a generic list as a parameterised datatype in the following way:
(declare-datatypes (T) ((MyList nelem (cons (hd T) (tl MyList)))))
I would like the list to be generic with respect to the datatype. Later on, I would like to declare constants the following way:
(declare-const x (MyList Int))
(declare-const y (MyList Real))
However, now I would like to define functions on the generic datatype MyList (e.g., a length operation, empty operation, ...) so that they are re-usable for all T's. Do you have an idea how I could achieve this? I did try something like:
(declare-sort K)
(define-fun isEmpty ((in (MyList K))) Bool
  (= in nelem))
but this gives me an error message; for this example to work Z3 would need to do some type-inference, I suppose.
Would be great if you could give me a hint.
SMT-Lib does not allow polymorphic user-defined functions. Section 4.1.5 of http://smtlib.cs.uiowa.edu/papers/smt-lib-reference-v2.6-r2017-07-18.pdf states:
Well-sortedness checks, required for commands that use sorts or terms, are always done with respect to the current signature. It is an error to declare or define a symbol that is already in the current signature. This implies in particular that, contrary to theory function symbols, user-defined function symbols cannot be overloaded.
Which is further expanded in Footnote-29:
The motivation for not overloading user-defined symbols is to simplify their processing by a solver. This restriction is significant only for users who want to extend the signature of the theory used by a script with a new polymorphic function symbol—i.e., one whose rank would contain parametric sorts if it was a theory symbol. For instance, users who want to declare a “reverse” function on arbitrary lists, must define a different reverse function symbol for each (concrete) list sort used in the script. This restriction might be removed in future versions.
So, as you suspected, you cannot define "polymorphic" functions at the user level. But as the footnote indicates, this restriction might be removed in the future, something that will most likely happen as SMT-solvers are more widely deployed. Exactly when that might happen, however, is anyone's guess.
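To illustrate, here is a minimal sketch of that monomorphisation for the MyList example, rewritten in SMT-Lib 2.6 datatype syntax; the isEmptyInt/isEmptyReal names are made up, and nelem needs an as-annotation because its sort is ambiguous on its own:
(declare-datatypes ((MyList 1))
  ((par (T) ((nelem) (cons (hd T) (tl (MyList T)))))))
(declare-const x (MyList Int))
(declare-const y (MyList Real))
; one isEmpty per concrete element sort used in the script
(define-fun isEmptyInt ((in (MyList Int))) Bool
  (= in (as nelem (MyList Int))))
(define-fun isEmptyReal ((in (MyList Real))) Bool
  (= in (as nelem (MyList Real))))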

How can procedures be values in MIT scheme?

Conceptually I see a value as a single element. I can understand that at the lowest level of hardware the value returned is zero or one. I just see a "value" as returning a single unit, whereas I see a procedure as multiple units. For example, a procedure (+ x x) to me seems like it should return "(", ")", "+", "x". In this example, the value of lambda is the procedure.
What am I missing here?
Scheme is primarily a functional programming language. Functional languages deal with expressions¹ (as opposed to statements); expressions are at the core of functional languages pretty much like classes are at the core of object-oriented languages.
In Scheme, functions are expressed as lambda expressions. Since Scheme primarily deals with expressions, and since lambda expressions themselves are expressions, Scheme deals with functions just like any other expression. Therefore, functions are first-class citizens of the language.
I don't think one should feel overly concerned about how exactly all of this translates under the hood in terms of bits and bytes. What plays as a strength in some languages (C/C++) can quickly turn against you here: imperative thinking in Scheme will only get you frustrated, and bounce you right back out to mainstream paradigms and languages.
What functional languages are really about is abstraction, metaprogramming (many Schemes feature powerful syntactic macros), and more abstraction. There is a well-known quote from Peter Deutsch: "Lisp ... made me aware that software could be close to executable mathematics." I think it sums it up very well.
¹ In Scheme (and other dialects of LISP), s-expressions are used to denote expressions. They give the language that distinctive parenthesized syntax.
When the program is compiled, the code for a procedure is generated. When you run the program, the code for a procedure is stored at a given address. A procedure value will in most implementations consist of a record/struct containing the name of the procedure, the address where the procedure is stored (when you call the procedure, the CPU jumps to this address), and, in the case of a procedure created by a lambda expression, a table of the free variables.
In Scheme everything is passed by value. That said, values that cannot be stored in a machine word are represented as pointers to data structures.
An internal procedure (primitive) is treated specially, such that it might be just a value, while an evaluated lambda expression is a multi-field object. That object has an address, which is the "value" of the procedure. When evaluating a lambda form it turns into such a structure. Example:
;; a lambda form is evaluated into a closure and then
;; that object is the value of variable x
(define x (lambda y y)) ; ==> undefined
;; x is evaluated and turns out to be a closure. Scheme
;; evaluates the rest of the arguments before applying.
(x 1 2 3) ; ==> (1 2 3)
;; the same, except that all the arguments also evaluate to that closure.
(x x x x) ; ==> (#<closure> #<closure> #<closure>)

Dart int and double being interned? Treated specially by identical()?

Dart has both:
an equality operator == and
a top-level function named identical().
By the choice of syntax, it feels natural to want to use Dart's == operator more frequently than identical(), and I like that. In fact, the section on equality of Idiomatic Dart states that "in practice, you will rarely need to use" identical().
In a recent answer to one of my questions concerning custom filters, it seems that Angular Dart favors use of identical() rather than == when trying to determine whether changes to a model have reached a steady state. (Which can make sense, I suppose, for large models for reasons of efficiency.)
This got me to thinking about identity of int's and so I wrote some tests of identical() over ints. While I expected that small ints might be "interned/cached" (e.g. similar to what is done by Java's Integer.valueOf()), to my surprise, I can't seem to generate two ints that are equal but not identical. I get similar results for double.
Are int and double values being interned/cached? Or maybe identical() is treating them specially? Coming from a Java background, I used to equate Dart's:
== to Java's equal() method and
identical() to Java's equality test ==.
But that now seems wrong. Anyone know what is going on?
Numbers are treated specially. If their bit-pattern is the same they must be identical (although it is still debated if this includes the different versions of NaNs).
The main reasons are expectations, leaking of internal details and efficiency.
Expectations: users expect numbers to be identical. It goes against common sense that x == y (for two integers) but not identical(x, y).
Leaking of internal details: the VM uses SMIs (SMall Integers) to represent integers in a specific range (31 bits on 32-bit machines, 63 on 64-bit machines). These are canonicalized and are always identical. Exposing this internal implementation detail would lead to inconsistent results depending on which platform you run.
Efficiency: the VM wants to unbox numbers wherever it can. For example, inside a method doubles are frequently moved into registers. However, keeping track of the original box can be cumbersome and difficult.
foo(x, y) {
  var result = x;
  while (y-- > 0) {
    result += x;
  }
  return result;
}
Suppose that the VM optimizes this function and moves result into a register (unboxing x in the process). This allows for a tight loop where result is then efficiently modified. The difficult case happens when y is 0. The loop wouldn't execute and foo would return x directly. In other words, the following would need to be true:
var x = 5.0;
identical(x, foo(x, 0)); // should be true.
If the VM unboxed the result variable in the method foo it would need to allocate a fresh box for the result and the identical call would therefore return false.
By modifying the definition of identical all these problems are avoided. It comes with a small cost to the identical check, though.
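A small illustration of what this definition buys you (my sketch; the printed results are what I would expect on the Dart VM, not quoted from the answer):
void main() {
  var a = 1.0;
  var b = 0.5 + 0.5;        // computed at runtime, same bit pattern as a
  print(a == b);            // true
  print(identical(a, b));   // also true: equal numbers are identical
}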
Seems like I posted too quickly. I just stumbled on Dart Issue 13084: Spec says identical(1.0, 1) is true, even if they have different types, which led me to the section on Object Identity of the Dart language spec. (I had previously searched the spec for equality but not object identity.)
Here is an excerpt:
The predefined Dart function identical() is defined such that identical(c1, c2) iff:
- c1 evaluates to either null or an instance of bool and c1 == c2, OR
- c1 and c2 are instances of int and c1 == c2, OR
- c1 and c2 are constant strings and c1 == c2, OR
- c1 and c2 are instances of double and one of the following holds: ...
and there are more clauses dealing with lists, maps and constant objects. See the language spec for the full details. Hence, identical() is much more than just a simple test for reference equality.
I can't remember the source for this, but somewhere on dartlang.org or the issue tracker it was said that num, int and double are indeed getting special treatment. One of those special treatments is that you can't subclass those types for performance reasons, but there may be more. What exactly this special treatment entails can probably only be answered by the developers, or maybe someone who knows the specification by heart, but one thing can be inferred:
The numeric types are Dart objects - they have methods you can call on their instances. But they also have qualities of primitive data types, as you can write int i = 3; with no new keyword anywhere. This is different from Java, where there are real primitive types and real objects wrapping them and exposing instance methods.
While the technical details certainly are more complex, if you think about Dart numerics as a blend of object and primitive, your comparison to Java still makes sense. In Java, new Integer(5).equals(new Integer(5)) evaluates to true, and so does 5 == 5.
I am aware this is not a very technically correct answer, but I hope it's still useful to make sense of the behaviour of dart numerics when coming from a Java background.

Why are functions in OCaml/F# not recursive by default?

Why is it that functions in F# and OCaml (and possibly other languages) are not by default recursive?
In other words, why did the language designers decide it was a good idea to explicitly make you type rec in a declaration like:
let rec foo ... = ...
and not give the function recursive capability by default? Why the need for an explicit rec construct?
The French and British descendants of the original ML made different choices and their choices have been inherited through the decades to the modern variants. So this is just legacy but it does affect idioms in these languages.
Functions are not recursive by default in the French CAML family of languages (including OCaml). This choice makes it easy to supersede function (and variable) definitions using let in those languages because you can refer to the previous definition inside the body of a new definition. F# inherited this syntax from OCaml.
For example, superseding the function p when computing the Shannon entropy of a sequence in OCaml:
let shannon fold p =
  let p x = p x *. log (p x) /. log 2.0 in
  let p t x = t +. p x in
  -. fold p 0.0
Note how the argument p to the higher-order shannon function is superseded by another p in the first line of the body, and then by yet another p in the second line of the body.
Conversely, the British SML branch of the ML family of languages took the other choice, and SML's fun-bound functions are recursive by default. When most function definitions do not need access to previous bindings of their function name, this results in simpler code. However, superseded functions must be given different names (f1, f2, etc.), which pollutes the scope and makes it possible to accidentally invoke the wrong "version" of a function. And there is now a discrepancy between implicitly-recursive fun-bound functions and non-recursive val-bound functions.
Haskell makes it possible to infer the dependencies between definitions by restricting them to be pure. This makes toy samples look simpler but comes at a grave cost elsewhere.
Note that the answers given by Ganesh and Eddie are red herrings. They explained why groups of functions cannot be placed inside a giant let rec ... and ... because it affects when type variables get generalized. This has nothing to do with rec being default in SML but not OCaml.
One crucial reason for the explicit use of rec is to do with Hindley-Milner type inference, which underlies all statically typed functional programming languages (albeit changed and extended in various ways).
If you have a definition let f x = x, you'd expect it to have type 'a -> 'a and to be applicable on different 'a types at different points. But equally, if you write let g x = (x + 1) + ..., you'd expect x to be treated as an int in the rest of the body of g.
The way that Hindley-Milner inference deals with this distinction is through an explicit generalisation step. At certain points when processing your program, the type system stops and says "ok, the types of these definitions will be generalised at this point, so that when someone uses them, any free type variables in their type will be freshly instantiated, and thus won't interfere with any other uses of this definition."
It turns out that the sensible place to do this generalisation is after checking a mutually recursive set of functions. Any earlier, and you'll generalise too much, leading to situations where types could actually collide. Any later, and you'll generalise too little, making definitions that can't be used with multiple type instantiations.
So, given that the type checker needs to know about which sets of definitions are mutually recursive, what can it do? One possibility is to simply do a dependency analysis on all the definitions in a scope, and reorder them into the smallest possible groups. Haskell actually does this, but in languages like F# (and OCaml and SML) which have unrestricted side-effects, this is a bad idea because it might reorder the side-effects too. So instead it asks the user to explicitly mark which definitions are mutually recursive, and thus by extension where generalisation should occur.
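A small OCaml illustration of that generalisation point (my example, not from the answer):
(* id is generalised right after its definition, so it can be
   instantiated at several types below *)
let id x = x
let _ = (id 1, id "one")    (* fine: id has type 'a -> 'a *)

(* within a single recursive group, f is not yet generalised, so
   using it at two different types is rejected *)
let rec f x = x
and g () = (f 1, f "one")   (* type error: int vs. string *)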
There are two key reasons this is a good idea:
First, if you enable recursive definitions then you can't refer to a previous binding of a value of the same name. This is often a useful idiom when you are doing something like extending an existing module.
Second, recursive values, and especially sets of mutually recursive values, are much harder to reason about than definitions that proceed in order, each new definition building on top of what has already been defined. It is nice when reading such code to have the guarantee that, except for definitions explicitly marked as recursive, new definitions can only refer to previous definitions.
Some guesses:
let is not only used to bind functions, but also other regular values. Most forms of values are not allowed to be recursive. Certain forms of recursive values are allowed (e.g. functions, lazy expressions, etc.), so it needs an explicit syntax to indicate this.
It might be easier to optimize non-recursive functions
The closure created when you create a recursive function needs to include an entry that points to the function itself (so the function can recursively call itself), which makes recursive closures more complicated than non-recursive closures. So it might be nice to be able to create simpler non-recursive closures when you don't need recursion
It allows you to define a function in terms of a previously-defined function or value of the same name; although I think this is bad practice
Extra safety? Makes sure that you are doing what you intended. e.g. If you don't intend it to be recursive but you accidentally used a name inside the function with the same name as the function itself, it will most likely complain (unless the name has been defined before)
The let construct is similar to the let construct in Lisp and Scheme, which is non-recursive. Scheme has a separate letrec construct for recursive lets (see the sketch below)
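A quick Scheme sketch of that distinction (assuming no outer binding of fact):
;; plain let: the lambda body cannot see the fact being bound
(let ((fact (lambda (n) (if (= n 0) 1 (* n (fact (- n 1)))))))
  (fact 5))    ; error: fact is unbound inside the lambda

;; letrec: the binding is visible inside its own definition
(letrec ((fact (lambda (n) (if (= n 0) 1 (* n (fact (- n 1)))))))
  (fact 5))    ; ==> 120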
Given this:
let f x = ... and g y = ...;;
Compare:
let f a = f (g a)
With this:
let rec f a = f (g a)
The former redefines f to apply the previously defined f to the result of applying g to a. The latter redefines f to loop forever applying g to a, which is usually not what you want in ML variants.
That said, it's a language designer style thing. Just go with it.
A big part of it is that it gives the programmer more control over the complexity of their local scopes. The spectrum of let, let* and let rec offers an increasing level of both power and cost. let* and let rec are in essence nested versions of the simple let, so using either one is more expensive. This grading allows you to micromanage the optimization of your program, as you can choose the level of let you need for the task at hand. If you don't need recursion or the ability to refer to previous bindings, you can fall back on a simple let to save a bit of performance.
It's similar to the graded equality predicates in Scheme. (i.e. eq?, eqv? and equal?)
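For reference, those graded Scheme predicates behave roughly like this:
(equal? (list 1 2) (list 1 2)) ; ==> #t, structurally equal
(eqv? (list 1 2) (list 1 2))   ; ==> #f, two distinct objects
(eqv? 'a 'a)                   ; ==> #t, the same symbol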
