For some reason I have to use the C++ API and the C API of Z3 together. In the C++ API, reference counting of Z3 objects is maintained automatically and I don't need to worry about making mistakes. However, I have to maintain reference counts manually for the Z3 objects I create through the C API, because the C++ API uses Z3_mk_context_rc to create the context. I have several questions about reference counting maintenance in Z3.
(1) If the reference count of a Z3_ast drops to 0, what is responsible for releasing the memory of that Z3_ast? And when?
(2) Consider the code below:
void rctry(context & c)
{
expr x = c.int_const("x");
expr y = c.int_const("y");
Z3_ast res = Z3_mk_eq(c,x,y);
#ifdef FAULT_CLAUSE
expr z = c.int_const("z");
expr u = c.int_const("u");
Z3_ast fe = Z3_mk_eq(c,z,u);
#endif
std::cout << Z3_ast_to_string(c,res) << std::endl;
}
void main()
{
config cfg;
cfg.set("MODEL", true);
cfg.set("PROOF", true);
context c(cfg);
rctry(c);
}
Although I didn't increase the reference count of the AST referenced by res, the program works fine. If FAULT_CLAUSE is defined, the program still runs, but it outputs (= z u) instead of (= x y). How can this be explained?
Thank you!
My golden rule for reference counting is: whenever my program receives a pointer to a Z3 object, I immediately increment the ref count and save the object somewhere safe (i.e., I now own 1 reference to that object). Only when I'm absolutely sure that I will not need the object any longer do I call Z3_dec_ref; from that point on, any access to that object triggers undefined behavior (not necessarily a segfault), because I don't own any references anymore - Z3 owns all the references and it can do whatever it wants with them.
Z3 objects are always deallocated when the ref count goes to zero; the deallocation happens within the call to Z3_dec_ref(). If Z3_dec_ref() is never called (as in the example given), then the object may remain in memory, so accessing that particular part of memory might still give "OK-looking" results, but that memory may also be overwritten by other procedures, in which case it contains garbage.
In the example program given, we would need to add inc/dec_ref calls as follows:
void rctry(context & c)
{
expr x = c.int_const("x");
expr y = c.int_const("y");
Z3_ast res = Z3_mk_eq(c,x,y);
Z3_inc_ref(c, res); // I own 1 ref to res!
#ifdef FAULT_CLAUSE
expr z = c.int_const("z");
expr u = c.int_const("u");
Z3_ast fe = Z3_mk_eq(c,z,u);
Z3_inc_ref(c, fe); // I own 1 ref to fe!
#endif
std::cout << Z3_ast_to_string(c, res) << std::endl;
#ifdef FAULT_CLAUSE
Z3_dec_ref(c, fe); // I give up my ref to fe.
#endif
Z3_dec_ref(c, res); // I give up my ref to res.
}
The explanation for the output (= z u) is that the second call to Z3_mk_eq re-uses the chunk of memory that previously held res: apparently only the library itself had a reference to it, so it was free to choose what to do with that memory. The consequence is that the call to Z3_ast_to_string reads from the right part of memory (the part that used to contain res), but the contents of that memory have changed in the meantime.
That was the long explanation for anybody who needs to manage ref counts in C. In the case of C++ there is also a much more convenient way: the ast/expr/etc. objects provide a constructor that takes C objects. Therefore, we can construct managed objects by simply wrapping them in constructor calls; in this particular example that could be done as follows:
void rctry(context & c)
{
expr x = c.int_const("x");
expr y = c.int_const("y");
expr res = expr(c, Z3_mk_eq(c, x, y)); // res is now a managed expr
#ifdef FAULT_CLAUSE
expr z = c.int_const("z");
expr u = c.int_const("u");
expr fe = expr(c, Z3_mk_eq(c,z,u)); // fe is now a managed expr
#endif
std::cout << Z3_ast_to_string(c, res) << std::endl;
}
Within the destructor of expr there is a call to Z3_dec_ref, so it will be called automatically at the end of the function, when res and fe go out of scope.
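As a side note, for this particular equality the C call can be avoided altogether, because z3++.h overloads the usual comparison and arithmetic operators on expr. A minimal sketch of the same function using only the C++ API:
void rctry(context & c)
{
    expr x = c.int_const("x");
    expr y = c.int_const("y");
    expr res = (x == y); // operator== builds a managed expr directly
    std::cout << res << std::endl;
}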
I gave VisualRust another try and, to see how far it has come, I wrote a few lines of code. As usual, the code caused me to write a question on Stack Overflow...
See first, read my question later:
fn make_counter( state : &mut u32 ) -> Box<Fn()->u32>
{
Box::new(move || {let ret = *state; *state = *state + 1; ret })
}
fn test_make_counter() {
let mut cnt : u32 = 0;
{
let counter = make_counter( & mut cnt );
let x1 = counter();
let x2 = counter();
println!("x1 = {} x2 = {}",x1,x2);
}
}
fn alt_make_counter ( init : u32 ) -> Box<Fn()->u32> {
let mut state = init;
Box::new(move || {let ret = state; state = state + 1; ret })
}
fn test_alt_make_counter() {
let counter = alt_make_counter( 0u32 );
let x1 = counter();
let x2 = counter();
println!("x1 = {} x2 = {}",x1,x2);
}
fn main() {
test_make_counter();
test_alt_make_counter();
}
The difference between make_counter() and alt_make_counter() is that in one case the state is a pointer to a mutable u32 passed to the function, and in the other case it is a mutable u32 defined inside the function. As the test_make_counter() function clearly shows, there is no way that the closure can outlive the variable cnt. Even if I removed the block inside test_make_counter(), they would still have identical lifetimes. With the block, the counter dies before cnt. And yet, Rust complains:
src\main.rs(4,2): error : captured variable state does not outlive the enclosing closure
src\main.rs(3,1): warning : note: captured variable is valid for the anonymous lifetime #1 defined on the block at 3:0
If you look at the alt_make_counter() function now, the lifetime of state should basically cause the same error message, right? If the code captures the state for the closure, it should not matter whether the pointer is passed in or the variable is bound inside the function, right? But obviously, these two cases are magically different.
Can anyone explain why they are different (bug, feature, deep insight, ...?), and whether there is a simple rule one can adopt to avoid wasting time on such issues now and then?
The difference is not in using a local variable vs. using a parameter. Parameters are perfectly ordinary locals. In fact, this version of alt_make_counter works¹:
fn alt_make_counter (mut state: u32) -> Box<FnMut() -> u32> {
Box::new(move || {let ret = state; state = state + 1; ret })
}
The problem is that the closure in make_counter closes over a &mut u32 instead of a u32. It doesn't have its own state; it uses an integer stored somewhere else as its scratch space, and thus it needs to worry about the lifetime of that location. The function signature needs to communicate that the closure can only work while it can still use the reference that was passed in. This can be expressed with a lifetime parameter:
fn make_counter<'a>(state: &'a mut u32) -> Box<FnMut() -> u32 + 'a> {
Box::new(move || {let ret = *state; *state = *state + 1; ret })
}
Note that 'a is also attached to the FnMut() -> u32 (though with a different syntax because it's a trait).
The simplest rule for avoiding such trouble is to not use references when they cause problems. There is no good reason for this closure to borrow its state, so don't do it. I don't know whether this applies to you, but I've seen a number of people who were under the impression that &mut is the primary or only way to mutate something. That is wrong. You can simply store the value itself (or the larger structure in which it is contained) in a local variable tagged as mut and mutate it directly. A mutable reference is only useful if the result of the mutation needs to be shared with some other code and you can't just pass the new value to that code.
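For instance, a counter that owns its state needs no lifetime annotations at all. A minimal sketch (not the question's code, just an illustration of storing the state by value and mutating it directly):
struct Counter {
    state: u32,
}

impl Counter {
    fn next(&mut self) -> u32 {
        let ret = self.state;
        self.state += 1;
        ret
    }
}

fn test_counter() {
    let mut counter = Counter { state: 0 };
    let x1 = counter.next();
    let x2 = counter.next();
    println!("x1 = {} x2 = {}", x1, x2);
}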
Of course, sometimes juggling references in complicated ways is necessary. Unfortunately, there doesn't seem to be a quick and easy way to learn to deal with those confidently. It's a big pedagogic challenge, but so far it appears everyone just struggled for a while and then progressively had fewer problems as they got more experienced. No, there is no single simple rule that solves all lifetime woes.
¹ The return type has to be FnMut in all cases. You just didn't get an error about that yet because your current error happens at an earlier stage in the compilation.
In the following code, does the "box 5i" get properly freed when exiting the "main" scope? The wording of the pointer guide seems to indicate that variables with box types act as if there were an automatic "free()" call when the variable goes out of scope. However, if you called "free()" on "a" in this code, it would only end up freeing the "box 8i" that is on the heap. What happens to the "box 5i" that "a" was originally pointing to?
fn foo(a: &mut Box<int>) {
*a = box 8i;
}
fn main() {
let mut a = box 5i;
println!("{}", a); // -> "5"
foo(&mut a);
println!("{}", a); // -> "8"
}
By default, overwriting a memory location will run the destructor of the old value. For Box<...> this involves running the destructor of the contents (which is nothing for an int) and freeing the allocation, so if a has type &mut Box<T>, *a = box value is equivalent to (in C):
T_destroy(**a);
free(*a);
*a = malloc(sizeof T);
**a = value;
In some sense, the answer to your question is yes, because the type system guarantees that *a = box ... can only work if a is the only reference to the old Box, but unlike most garbage collected/managed languages this is all determined statically, not dynamically (it is a direct consequence of ownership and linear/affine types).
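To make the drop of the old box visible, here is a small self-contained sketch (written in current Rust syntax rather than the old box 5i notation from the question) using a type that prints when its destructor runs:
struct Noisy(i32);

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("dropping Noisy({})", self.0);
    }
}

fn foo(a: &mut Box<Noisy>) {
    // Overwriting *a first drops the old Box (and its contents),
    // then stores the new one in its place.
    *a = Box::new(Noisy(8));
}

fn main() {
    let mut a = Box::new(Noisy(5));
    println!("{}", a.0); // -> 5
    foo(&mut a);         // prints "dropping Noisy(5)" during the assignment
    println!("{}", a.0); // -> 8
}                        // prints "dropping Noisy(8)" when a goes out of scope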
In short, I need to be able to traverse a Z3_ast tree and access the data associated with its nodes. I cannot seem to find any documentation/examples on how to do that. Any pointers would be helpful.
At length, I need to parse SMT-LIB 2 formulae into Z3, make some variable-to-constant substitutions, and then reproduce the formula in a data structure that is compatible with another, unrelated SMT solver (mistral, to be specific; I don't think details about mistral are important to this question, but funnily enough it does not have a command line interface where I can feed it text formulae - it just has a C API). I have figured out that to generate the formula in mistral's format, I would need to traverse the Z3_ast tree and reconstruct the formula in the desired format. I cannot seem to find any documentation/examples that demonstrate how to do this. Any pointers would be helpful.
Consider using the C++ auxiliary classes defined in z3++.h. The Z3 distribution also includes an example using these classes. Here is a small code fragment that traverses a Z3 expression.
If your formulas do not contain quantifiers, then you don't even need to handle the is_quantifier() and is_var() branches.
void visit(expr const & e) {
if (e.is_app()) {
unsigned num = e.num_args();
for (unsigned i = 0; i < num; i++) {
visit(e.arg(i));
}
// do something
// Example: print the visited expression
func_decl f = e.decl();
std::cout << "application of " << f.name() << ": " << e << "\n";
}
else if (e.is_quantifier()) {
visit(e.body());
// do something
}
else {
assert(e.is_var());
// do something
}
}
void tst_visit() {
std::cout << "visit example\n";
context c;
expr x = c.int_const("x");
expr y = c.int_const("y");
expr z = c.int_const("z");
expr f = x*x - y*y >= 0;
visit(f);
}
I am writing a compiler for mini-Pascal in OCaml. I would like my compiler to accept the following code, for instance:
program test;
var
a,b : boolean;
n : integer;
begin
...
end.
I have difficulties in dealing with the declaration of variables (the part following var). At the moment, the type of variables is defined like this in sib_syntax.ml:
type s_var =
{ s_var_name: string;
s_var_type: s_type;
s_var_uniqueId: s_uniqueId (* key *) }
Where s_var_uniqueId (instead of s_var_name) is the unique key of the variables. My first question is where and how I could implement the mechanism that generates a new id (by increasing the largest id by 1) every time I encounter a new variable. I wonder whether I should implement it in sib_parser.mly, which probably involves a static variable cur_id and a change to the binding part, but I don't know how to do that in a .mly file. Or should I implement the mechanism at the next stage, in interpreter.ml? In that case, the question is how to keep the .mly consistent with the type s_var: what s_var_uniqueId should I provide in the binding part?
Another question is about this part of statement in .mly:
id = IDENT COLONEQ e = expression
{ Sc_assign (Sle_var {s_var_name = id; s_var_type = St_void}, e) }
Here, I also need to provide the next level (interpreter.ml) with a variable of which I only know the s_var_name, so what should I do about its s_var_type and s_var_uniqueId here?
Could anyone help? Thank you very much!
The first question to ask yourself is whether you actually need an unique id. From my experience, they're almost never necessary or even useful. If what you're trying to do is making variables unique through alpha-equivalence, then this should happen after parsing is complete, and will probably involve some form of DeBruijn indices instead of unique identifiers.
Either way, a function which returns a new integer identifier every time it is called is:
let unique =
let last = ref 0 in
fun () -> incr last ; !last
let one = unique () (* 1 *)
let two = unique () (* 2 *)
So, you can simply assign { ... ; s_var_uniqueId = unique () } in your Menhir rules.
The more important problem you're trying to solve here is that of variable binding. Variable x is defined in one location and used in another, and you need to determine that it happens to be the same variable in both places. There are many ways of doing this, one of them being to delay the binding until the interpreter. I'm going to show you how to deal with this during parsing.
First, I'm going to define a context: it's a set of variables that allows you to easily retrieve a variable based on its name. You might want to create it with hash tables or maps, but to keep things simple I will be using List.assoc here.
type s_context = {
s_ctx_parent : s_context option ;
s_ctx_bindings : (string * (int * s_type)) list ;
s_ctx_size : int ;
}
let empty_context parent = {
s_ctx_parent = parent ;
s_ctx_bindings = [] ;
s_ctx_size = 0
}
let bind v_name v_type ctx =
try let _ = List.assoc v_name ctx.s_ctx_bindings in
failwith "Variable is already defined"
with Not_found ->
{ ctx with
s_ctx_bindings = (v_name, (ctx.s_ctx_size, v_type))
:: ctx.s_ctx_bindings ;
s_ctx_size = ctx.s_ctx_size + 1 }
let rec find v_name ctx =
try 0, List.assoc v_name ctx.s_ctx_bindings
with Not_found ->
match ctx.s_ctx_parent with
| Some parent -> let depth, found = find v_name parent in
depth + 1, found
| None -> failwith "Variable is not defined"
So, bind adds a new variable to the current context, and find looks for a variable in the current context and its parents, returning both the bound data and the depth at which it was found. You could have all global variables in one context, then all parameters of a function in another context that has the global context as its parent, then all local variables of a function (when you have them) in a third context that has the function's main context as its parent, and so on.
So, for instance, find 'x' ctx will return something like 0, (3, St_int) where 0 is the DeBruijn index of the variable, 3 is the position of the variable in the context identified by the DeBruijn index, and St_int is the type.
type s_var = {
s_var_deBruijn: int;
s_var_type: s_type;
s_var_pos: int
}
let find v_name ctx =
let deBruijn, (pos, typ) = find v_name ctx in
{ s_var_deBruijn = deBruijn ;
s_var_type = typ ;
s_var_pos = pos }
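For instance, assuming s_type has a constructor St_int as in the example above, binding a variable in a parent context and looking it up from a child context gives:
let globals = bind "x" St_int (empty_context None)
let inner   = bind "y" St_int (empty_context (Some globals))
let vx = find "x" inner
(* vx = { s_var_deBruijn = 1; s_var_type = St_int; s_var_pos = 0 } *)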
Of course, you need your functions to store their context, and make sure that the first argument is the variable at position 0 within the context:
type s_fun =
{ s_fun_name: string;
s_fun_type: s_type;
s_fun_params: s_context;
s_fun_body: s_block; }
let context_of_paramlist parent paramlist =
List.fold_left
(fun ctx (v_name,v_type) -> bind v_name v_type ctx)
(empty_context parent)
paramlist
Then, you can change your parser to take into account the context. The trick is that instead of returning an object representing part of your AST, most of your rules will return a function that takes a context as an argument and returns an AST node.
For instance:
int_expression:
(* Constant : ignore the context *)
| c = INT { fun _ -> Se_const (Sc_int c) }
(* Variable : look for the variable inside the context *)
| id = IDENT { fun ctx -> Se_var (find id ctx) }
(* Subexpressions : pass the context to both *)
| e1 = int_expression o = operator e2 = int_expression
{ fun ctx -> Se_binary (o, e1 ctx, e2 ctx) }
;
So, you simply propagate the context "down" recursively through the expressions. The only clever parts are those where new contexts are created (you don't have this syntax yet, so I'm just adding a placeholder):
| function_definition_expression (args, body)
{ fun ctx -> let ctx = context_of_paramlist (Some ctx) args in
{ s_fun_params = ctx ;
s_fun_body = body ctx } }
The same applies to the global context (the program rule itself does not return a function, but the block rule does, so a context is created from the globals and passed to it):
prog:
PROGRAM IDENT SEMICOLON
globals = variables
main = block
DOT
{ let ctx = context_of_paramlist None globals in
{ globals = ctx;
main = main ctx } }
All of this makes the implementation of your interpreter much easier due to the DeBruijn indices: you can have a "stack" which holds your values (of type value) defined as:
type stack = value array list
Then, reading and writing variable x is as simple as:
let read stack x =
(List.nth stack x.s_var_deBruijn).(x.s_var_pos)
let write stack x value =
(List.nth stack x.s_var_deBruijn).(x.s_var_pos) <- value
Also, since we made sure that function parameters are in the same order as their position in the function context, if you want to call function f and its arguments are stored in the array args, then constructing the stack is as simple as:
let inner_stack = args :: stack in
(* Evaluate f.s_fun_body with inner_stack here *)
But I'm sure you'll have a lot more questions to ask when you start working on your interpreter ;)
How to create a global id generator:
let unique =
let counter = ref (-1) in
fun () -> incr counter; !counter
Test:
# unique ();;
- : int = 0
# unique ();;
- : int = 1
Regarding your more general design question: it seems that your data representation does not faithfully represent the compiler phases. If you must return a type-aware data-type (with this field s_var_type) after the parsing phase, something is wrong. You have two choices:
devise a more precise data representation for the post-parsing AST, that would be different from the post-typing AST, and not have those s_var_type fields. Typing would then be a conversion from the untyped to the typed AST. This is a clean solution that I would recommend.
admit that you must break the data representation semantics because you don't have enough information at this stage, and try to be at peace with the idea of returning garbage such as St_void after the parsing phase, to reconstruct the correct information later. This is less typed (as you have an implicit assumption about your data which is not apparent in the type), more pragmatic, ugly but sometimes necessary. I don't think it's the right decision in this case, but you will encounter situations where it's better to be a bit less typed.
I think the specific choice of unique id handling depends on your position on this more general question, and on your concrete decisions about types. If you choose a finer-typed representation of the post-parsing AST, it's up to you whether to include unique ids or not (I would, because generating a unique ID is dead simple and doesn't need a separate pass, and I would rather slightly complicate the grammar productions than the typing phase). If you choose to hack the type field with a dummy value, it's also reasonable to do that for variable ids if you wish, putting 0 as a dummy value and defining the real one later; but still, I personally would do that in the parsing phase.
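To illustrate the first option, here is a minimal sketch (all type and constructor names are made up for illustration): the parser produces an untyped AST that only carries variable names, and the typing phase turns it into a typed AST in which every variable carries its resolved type and unique id.
type s_type = St_integer | St_boolean | St_void

(* Post-parsing AST: a variable is just a name. *)
type p_expr =
  | P_var of string
  | P_binop of string * p_expr * p_expr

(* Post-typing AST: every variable carries its type and unique id. *)
type t_expr =
  | T_var of string * s_type * int      (* name, type, unique id *)
  | T_binop of string * t_expr * t_expr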
Is there an equivalent of driver_output_term in the other direction, i.e. sending an Erlang term to the driver without converting it to an iolist first? If not, I presumably should convert my term using term_to_binary and parse it on the C side with ei; any good examples?
According to the docs, you can only send data that is in iodata() format.
If all you want to send to the driver is integers and strings, it might be more efficient (and a lot easier) to use your own term-to-iodata encoding, as in this tutorial from the Erlang documentation. It uses a function to convert the calls to a mapping that can be sent to the driver directly and therefore doesn't need to be encoded using term_to_binary().
encode({foo, X}) -> [1, X];
encode({bar, Y}) -> [2, Y].
This mapping is feasible if X and Y are assumed to be small integers.
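On the Erlang side, the encoded call is then sent to the port in the usual way. A small sketch, assuming the port was opened elsewhere with open_port/2 and without the binary option:
call_port(Port, Msg) ->
    Port ! {self(), {command, encode(Msg)}},
    receive
        {Port, {data, [Res]}} ->
            Res
    end.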
On the C side, the first byte of the input buffer is switched upon to call the appropriate function using the second byte as the argument:
static void example_drv_output(ErlDrvData handle, char *buff, int bufflen)
{
example_data* d = (example_data*)handle;
char fn = buff[0], arg = buff[1], res;
if (fn == 1) {
res = foo(arg);
} else if (fn == 2) {
res = bar(arg);
}
driver_output(d->port, &res, 1);
}