Is there a way to see what Z3 is doing under the hood? I would like to see the steps it takes, how long they take, how many steps there are, etc. I'm checking the equivalence of floating-point addition/multiplication hardware designs against Z3's built-in floating-point addition/multiplication. It is taking much longer than expected, and it would help to see exactly what it is doing in the process.
You can run it with higher verbosity:
z3 -v:10
This will print a lot of diagnostic info. But the output is unlikely to be readable/understandable unless you're really familiar with the source code. Of course, being an open source project, you can always study the code itself: https://github.com/Z3Prover/z3
If you're especially interested in e-matching and quantifiers, you can use the Axiom Profiler (GitHub source, research paper) to analyse a Z3 trace. The profiler shows instantiation graphs, tries to explain instantiations (which term triggered, which equalities were involved), and can even detect and explain matching loops.
For this, run z3 trace=true and open the generated trace file in the profiler. Note that the tool is a research prototype: I've noticed some bugs, it seems to work more reliably on Windows than on *nix, and it might not always support the latest Z3 versions.
https://rise4fun.com/Dafny/ZkKN
This assertion is not verified by Dafny 2.3.0 in MVS, but it is verified on rise4fun, albeit with a warning about triggers. It results in "Verification inconclusive".
Moreover, https://rise4fun.com/Dafny/Um6t does not print "hello" (it does not run) on rise4fun. Some error must be occurring, since no "assertion violation" is reported.
Any help, please?
Your program verifies when I add the -arith:2 flag, which adds symbolic synonyms for the arithmetic symbols and allows them to be used in triggers.
Edit:
A more general answer is that your problem uses nonlinear arithmetic, which is undecidable in general. There are some tips on how to handle it in the FAQ at https://github.com/dafny-lang/dafny/wiki/FAQ. I don't have much experience with Dafny and nonlinear arithmetic myself, however.
I don't know why your file worked before, but to investigate you could print the SMT encoding Dafny feeds to Z3 (see dafny output as SMT file) and compare the outputs of different Dafny versions. If there's no difference, maybe the difference lies between Z3 versions.
Maybe there's a way to encode your problem differently that behaves more stably across solver versions, assuming there's no bug in any of the tools.
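Once you have two such SMT dumps (for example from two Dafny versions), a plain text diff shows whether the encoding changed. Here is a stdlib-only sketch. The function name and the choice to skip `;` comment lines (so headers such as version strings don't show up as changes) are assumptions, not part of Dafny's tooling.

```python
import difflib
from pathlib import Path

def diff_smt_dumps(old_path, new_path, ignore_comments=True):
    """Return unified-diff lines between two SMT-LIB dumps; [] means identical."""
    def load(path):
        lines = Path(path).read_text().splitlines()
        if ignore_comments:
            # SMT-LIB line comments start with ';'.
            lines = [ln for ln in lines if not ln.lstrip().startswith(";")]
        return lines

    return list(difflib.unified_diff(
        load(old_path), load(new_path),
        fromfile=str(old_path), tofile=str(new_path), lineterm=""))
```

For example, `print("\n".join(diff_smt_dumps("old.smt2", "new.smt2")))` with hypothetical file names. An empty result points at a difference between Z3 versions rather than between encodings.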
I'm trying to debug a program that is using the Z3 API, and I'm wondering if there's a way, either from within the API or by giving Z3 a command, to print the current logical context, hopefully as if it had been read in an SMT-LIB file.
This question from 7 years ago seemed to indicate that there would be a way to do this, but I couldn't find it in the API docs.
Part of my motivation is that I'm trying to debug whether my program is slow because it's creating an SMT problem that's hard to solve, or whether the slowdown is elsewhere. Being able to view the current context as an SMT-LIB file, and run it in Z3 on the command line, would make this easier.
It's not quite clear what you mean by "logical context." If you mean all the assertions the user has given to the solver, then the command:
(get-assertions)
will return them as an S-expression-like list; see Section 4.2.4 of http://smtlib.cs.uiowa.edu/papers/smt-lib-reference-v2.6-r2017-07-18.pdf
But this doesn't sound useful for your purposes; after all it is going to return precisely everything you yourself have asserted.
If you're looking for a dump of all the learned lemmas, internal assertions the solver created, etc., I'm afraid there's no way to do that from SMT-LIB. You probably can't even do it using the programmatic API (though this needs to be checked). That would only be possible by modifying the source code of z3 itself (which is open source) and adding relevant debug traces. But that would require a lot of study of z3's internals and would be unlikely to help unless you're intimately familiar with the z3 code base.
I find that running z3 -v:10 can sometimes provide diagnostic info; if you see it repeatedly printing something, that's a good indication that something has gone wrong in that area. But again, what it prints and exactly what it means is guesswork unless you study the source code itself.
For a study on genetic programming, I would like to implement an evolutionary system based on LLVM and apply code mutations (possibly at the IR level).
I found llvm-mutate, which is quite useful for executing point mutations.
As far as I understand, the instructions are counted/numbered, and one can then, e.g., delete a numbered instruction.
However, new instructions can apparently only be introduced as copies of statements already available in the code.
Real mutation, however, would allow inserting any allowed IR instruction, regardless of whether it is used in the code being mutated.
In addition, it should be possible to insert calls to functions from linked libraries (not used in the current code, but potentially available because the library has been linked by clang).
Did I overlook this in llvm-mutate, or is it really not possible so far?
Are there any projects trying to implement (or that have already implemented) such mutations for LLVM?
LLVM has lots of code analysis tools, which should allow implementing the aforementioned approach. LLVM is huge, so I'm a bit disoriented. Any hints as to which tools could be helpful (e.g., for getting a list of available library functions)?
Thanks
Alex
Very interesting question. I have been intrigued by the possibility of doing binary-level genetic programming for a while. With respect to what you ask:
It is apparent from their documentation that LLVM-mutate can't do what you are asking. However, I think it is wise for it not to. My reasoning is that any machine-language genetic program would inevitably face the Halting Problem, that is, it would be impossible to know whether a randomly generated instruction would crash the whole computer (for example, by assigning a value to an OS-reserved pointer) or run forever and take all of your CPU cycles. Turing's theorem tells us that it is impossible to know in advance whether a given program will do that. Mind you, LLVM-mutate can still cause a perfectly harmless program to crash or run forever, but I think their approach makes this less likely by only using existing instructions.
However, such a thing as "impossibility" only deters scientists, not engineers :-)...
What I have been thinking is this: in nature, real mutations work a lot more like LLVM-mutate than like what we do in normal genetic programming. In other words, they simply swap letters from a very limited set (A, T, C, G), and every possible variation comes out of this. We could have a program or set of programs with an initial set of instructions, plus a set of "possible functions" either linked or defined in the program. Most of these functions would not actually be used, but they would be there to provide "raw DNA" for mutations, just like in our DNA. This set of functions would contain the complete (or semi-complete) set of possible functions for a problem space. Then we simply use basic operations like the ones in LLVM-mutate.
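The idea above can be sketched as a toy model. Here a program is a list of instruction strings (stand-ins for numbered IR instructions) plus a pool of unused "raw DNA" snippets, such as calls into linked but unused library functions. This is a hypothetical illustration of point mutations, not llvm-mutate's interface.

```python
import random

def delete(program, i):
    """Remove the instruction numbered i."""
    return program[:i] + program[i + 1:]

def insert(program, i, instr):
    """Insert instr before position i."""
    return program[:i] + [instr] + program[i:]

def swap(program, i, j):
    """Exchange the instructions at positions i and j."""
    out = list(program)
    out[i], out[j] = out[j], out[i]
    return out

def mutate(program, pool, rng):
    """Apply one random point mutation and return a new program.

    Insertions draw from the program's own instructions *and* from `pool`,
    so unused functions can enter the genome. Program and pool must not both be empty.
    """
    ops = ["insert"]
    if program:
        ops.append("delete")
    if len(program) > 1:
        ops.append("swap")
    op = rng.choice(ops)
    if op == "delete":
        return delete(program, rng.randrange(len(program)))
    if op == "swap":
        i, j = rng.sample(range(len(program)), 2)
        return swap(program, i, j)
    return insert(program, rng.randrange(len(program) + 1), rng.choice(program + pool))
```

Each operator returns a new list, so a parent survives unchanged next to its mutated children, which is convenient in a population-based loop.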
Some possible problems though:
Given the amount of possible variability, the only way to have acceptable execution times would be to have massive amounts of computing power. Possibly achievable in the Cloud or with GPUs.
You would still have to contend with Mr. Turing's Halting Problem. However, I think this could be resolved by running the solutions in a "sandbox" that doesn't take you down if the solution blows up: something like a single-use virtual machine or a Docker-like container, with a time limit (to get out of infinite loops). A solution that crashes or times out would get the worst possible fitness, so that the programs would tend to diverge away from those paths.
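The crash/timeout rule above can be sketched with a separate OS process standing in for the sandbox. This is much weaker isolation than a VM or container. The convention that a candidate prints its own numeric score to stdout is a hypothetical choice for the example.

```python
import subprocess
import sys

WORST_FITNESS = float("-inf")

def evaluate(candidate_source, timeout_s=2.0):
    """Run a candidate program in a child process and turn the outcome into a fitness."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", candidate_source],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Halting can't be decided in general, so cap the run time instead.
        return WORST_FITNESS
    if proc.returncode != 0:
        return WORST_FITNESS  # crashed or exited with an error
    try:
        return float(proc.stdout.strip())
    except ValueError:
        return WORST_FITNESS  # ran, but produced no usable score
```

`subprocess.run` kills the child when the timeout expires, so an infinite loop costs at most `timeout_s` seconds of wall time per candidate.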
As to why do this at all, I can see a number of interesting applications: self-healing programs, programs that self-optimize for a specific environment, program "vaccination" against vulnerabilities, mutating viruses, quality assurance, etc.
I think there's a potential open source project here. It would be insane, dangerous and a time-sucking vortex: just my kind of project. Count me in if someone does it.
flex is called the "fast" lexical analyzer, but I cannot find any document that explains why it is faster than lex. flex has a manual, but it focuses on usage rather than internals. Could any experts in this field help, please? Either an explanation of flex's performance improvements or a link to one is welcome.
This answer is from Vern Paxson, who has allowed it to be shared here.
Alas, this would take quite a bit of time to sketch in any sort of
useful detail, as there are a number of techniques that contribute to
its performance. I wrote a paper about it a loooong time ago (mid
80s!) but don't have a copy of it. Evidently you can buy it from:
http://www.ntis.gov/search/product.aspx?ABBR=DE85000703
Sorry not to be of more help ...
To add to Vern's statement, flex does a much better job of table compression, providing several different space/time tradeoffs, and its inner loop is also considerably faster than lex's.
According to a (Usenet?) paper by Van Jacobson in the 1980s, lex was largely written by an AT&T intern. VJ described how its inner loop could be reduced from several dozen instructions to about three.
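To make the table-compression point concrete: flex's default (compressed) scanners keep their transitions in arrays named `yy_base`, `yy_def`, `yy_nxt` and `yy_chk`. The sketch below shows only the row-displacement idea behind base/next/check. It omits the `yy_def` default-state chains and is not flex's actual code. Sparse rows of the transition table are slid over one another into one shared array, and a check array records which state owns each slot. A lookup is then two array reads and one comparison, which keeps the scanner's inner loop short.

```python
def compress(rows):
    """Pack sparse DFA rows (dicts: input class -> next state) into base/nxt/chk arrays."""
    base = [0] * len(rows)
    nxt, chk = [], []
    for state, row in enumerate(rows):
        offset = 0
        # Slide the row right until none of its entries lands on an occupied slot.
        while any(offset + c < len(chk) and chk[offset + c] != -1 for c in row):
            offset += 1
        base[state] = offset
        top = offset + max(row, default=-1) + 1
        if top > len(chk):
            nxt.extend([-1] * (top - len(nxt)))
            chk.extend([-1] * (top - len(chk)))
        for c, target in row.items():
            nxt[offset + c] = target
            chk[offset + c] = state  # records which state owns this slot
    return base, nxt, chk

def next_state(base, nxt, chk, state, c):
    i = base[state] + c
    if i < len(chk) and chk[i] == state:
        return nxt[i]
    return -1  # no transition on this input

def longest_match(base, nxt, chk, accepting, codes):
    """Inner scanning loop: length of the longest accepted prefix of `codes`."""
    state, last = 0, 0
    for pos, c in enumerate(codes):
        state = next_state(base, nxt, chk, state, c)
        if state == -1:
            break
        if state in accepting:
            last = pos + 1
    return last
```

As a usage example, an identifier DFA over the input classes letter = 0, digit = 1, other = 2 is `rows = [{0: 1}, {0: 1, 1: 1}]` with accepting state 1. The input "ab1 x", encoded as `[0, 0, 1, 2, 0]`, matches 3 characters.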
Vern Paxson wrote flex for what he described at the time as the fastest data acquisition applications in the world. I'm not sure whether I should go into more detail here.
I had the privilege of helping Vern with the 8-bit version, as I was working on compilers that had to scan Kanji and Katakana at the time.
I'm not so sure flex is much faster than the AT&T version of lex. The two programs were developed independently, and the authors of flex probably chose a slightly different name to avoid confusion with the official version. They might have intended to generate faster scanners, which is also suggested by a couple of options that trade space for time. They also justify making %option yylineno (and a few other features) optional by the speed of the generated scanner.
Whether the slight differences in speed for such scanners are still relevant is debatable. I couldn't find any official statement on the choice of name either, so I guess you'd have to ask the original authors Jef Poskanzer and/or Vern Paxson. If you find them and get an answer, then please let us know here. History of software is interesting and you can still get the answer first hand.
After I change the order of the assertions in an unsat query, it becomes sat.
The query structure is:
definitions1
assertions1
definitions2
bad_assertions
check-sat
I sort bad_assertions with Python's sorted function, and this turns the unsat query into a sat one.
Z3 versions 4.0, 4.1; Ubuntu 12.04
Unfortunately, the queries are quite large, which makes them difficult to debug, so I can provide any additional info if needed.
Here are the original unsat query, with the lines to be mixed marked, and a simple Python script that mixes the lines in the query.
I managed to reproduce the problem reported in your question. Both examples are satisfiable. The script that produces unsat is exposing a bug in the datatype theory. I fixed the bug, and the fix will be available in Z3 4.2. Since this is a soundness bug, we will release version 4.2 very soon. In the meantime, you can workaround the bug by using the option RELEVANCY=0 in the command line.
From your description it sounds like a bug.
sat/unsat should of course not depend on ordering.
If packaging up a repro is difficult, then once you are confident about what triggers the bug, one way to help us debug the problem is to use "open_log()" to dump a trace of all interactions with Z3.
You should use "open_log" before other calls to Z3.
We can then replay the log without your sources.