Using theory plugins with solvers - z3

Recent versions of Z3 have decoupled the notions of Z3_context and Z3_solver. The API mostly reflects these changes; for instance, push is deprecated on contexts and re-specified to take a solver as an extra argument.
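For concreteness, here is the solver-centric style in z3py (a minimal sketch; scopes now belong to a Solver object rather than to the context):

from z3 import *

ctx = Context()
s = Solver(ctx=ctx)      # push/pop live on the solver, not the context
x = Int('x', ctx)
s.add(x > 0)
s.push()                 # push a scope on the solver
s.add(x < 0)
print(s.check())         # unsat
s.pop()                  # backtrack
print(s.check())         # sat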
The interface for theories has not been updated, however. Theories are still created from contexts, and as far as I can tell, never explicitly attached to solvers.
One might think that a theory created from a context would always be attached to all solvers created from that context, but in our experience this is not the case. Instead, user-defined theories seem to be ignored entirely.
What is the exact status of combining Z3_solver objects with Z3_theory objects?

The theory plugins were introduced a long time ago (version 2.8), and Z3 has changed a lot since then.
They are considered deprecated in Z3 4.x. They can still be used with the old API, but they cannot be used with new features, in particular with Z3 solver objects (Z3_solver).
In the current Z3, we have many different solvers. The oldest one (implemented in the folder src/smt) is called smt::context. The theory plugins are actually extensions for this old solver.
We say smt::context is a general purpose solver as it supports many theories and quantifiers.
Actually, in Z3 4.3.1, it is the only general purpose solver available.
However, I believe it is based on an obsolete architecture that is not adequate for the new features we are planning for Z3. My plan is to replace it in the future with a solver based on the architecture described here.
Moreover, we don't really work on smt::context anymore. We are essentially just maintaining it and fixing bugs.
After we released the Z3 source code, I imagined the theory plugin support was not necessary anymore since users would be able to add their extensions inside the Z3 code base. However, this view is too simplistic since it prevents users from writing extensions in different programming languages.
So, the current plan is to have theory plugins for the new solver that will eventually be available in Z3. The goal is to have an API such as:
Z3_solver Z3_mk_mcsat_solver(Z3_context ctx, Z3_mcsat_ext ext);
This API would create a new solver object using the given extension ext.
In the meantime, we could also extend the API with a function such as:
Z3_solver Z3_mk_smt_solver(Z3_context ctx, Z3_theory t);
That would create a new solver object based on smt::context using the given theory plugin.
This solution is conceptually simple, but there is a lot of "plumbing" needed to make it happen.
We have to adjust the Z3_theory object, fix some limitations that prevent theory plugins from being used with features that create copies of smt::context (e.g., MBQI), etc. If someone is very interested in this interface, I would invest cycles on it (remark: we had only a handful of users for the theory plugins). I'm not super excited about it because the plan is to eventually replace smt::context.

Related

Where is the pydata BLAZE project heading?

I find the blaze ecosystem* amazing because it covers most of the data engineering use cases. There was definitely a lot of interest in these projects during 2015-2016, but of late they seem to have been neglected; I say this based on the commit activity on the GitHub repos.
So my questions to the community are:
- What happened in 2016 that resulted in lost interest?
- Are there other python based libraries that have replaced blaze?
*blaze ecosystem:
- Blaze: An interface to query data on different storage systems
- Dask: Parallel computing through task scheduling and blocked algorithms
- Datashape: A data description language
- DyND: A C++ library for dynamic, multidimensional arrays
- Odo: Data migration between different storage systems
references:
http://blaze.pydata.org/
I can give some part of the picture, although others were more involved.
Blaze was both an umbrella project for incubating data-engineering ideas into released oss packages, and a package itself focussing on symbolic manipulations of data-frames and translating these into various backend execution engines, particularly database services. Critically, Blaze wanted to be the (start of a) solution for a very broad range of problems! In particular, the translation layer became very large and hard to maintain and, by trying to cater to everyone, it limited the range of operations that the symbolic layer could offer.
In terms of an umbrella project, Blaze was a success. Many ideas that started in Blaze percolated into the ecosystem. Probably the most prominent single project to come out of Blaze is Dask, which, while originally planned as an execution layer for Blaze, implements an even larger API of data-frame operations, as well as other high-level collections and arbitrary graph manipulation. Even fully symbolic optimisations exist in Dask, though this is perhaps not as complete. Other Anaconda-stable projects such as numba and bokeh were influenced by the Blaze effort, but I'll not talk about them here.
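For a flavour of the Dask data-frame API mentioned above, here is a minimal sketch (the file pattern and column names are hypothetical):

import dask.dataframe as dd

# Build a lazy, pandas-like computation over many CSV files; the task
# graph only executes when .compute() is called.
df = dd.read_csv('events-*.csv')
result = df.groupby('user')['value'].mean()
print(result.compute())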
As far as datashape/dynd go, this is a somewhat crowded space with many other related projects (xnd, uarray, etc) and ideas that can be loosely thought of as "numpy 2" (i.e., more comprehensive, flexible representation of complex data layouts and their description). This has not really been adopted by the community yet; almost everything uses numpy's type system (with the notable exception of what Arrow does internally).
Finally, for data formats and Odo, I encourage you to consider Intake, which can be seen as a successor and offers much more functionality, such as data-source cataloging; it does this by limiting the scope of operation to the read side. The big web of interactions that is Odo was also a many-to-many problem that became hard to maintain; by keeping things simpler, Intake hopes to become the de-facto layer over data-loading libraries and the main way to describe the location, description and parametrisation of data. Odo is not dead, though, so if file conversion is exactly what you need, you can still use it.
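As a taste of Intake's read-side scope, a minimal sketch (the catalog file and source name are hypothetical):

import intake

# A catalog describes the location, format and parameters of data sources.
cat = intake.open_catalog('catalog.yml')
df = cat.my_source.read()   # load the described source, e.g. into pandas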
I was looking for a project similar to odo for loading csv data to various sources. An odo issue (https://github.com/blaze/odo/issues/614) recommended d6tstack, which appears to be currently maintained.
In practice, it is often just as easy to roll your own csv loader, in which case the tableschema project is very handy. It automates inferring datatypes from csv files.
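A minimal sketch of that, assuming the tableschema Python package and a hypothetical data.csv:

from tableschema import Table

table = Table('data.csv')
table.infer()                    # infer column names and datatypes
print(table.schema.descriptor)   # the inferred schema as a dict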

Procedural Attachment in Z3

I am using z3py. I have a predicate over two integers that needs to be evaluated using a custom algorithm. I have been trying to get it implemented, without much success. Apparently, what I need is a procedural attachment, which is now deprecated. Could anybody tell me how I might implement this in z3py? I understand that it involves the use of Tactics, but I am afraid I haven't managed to figure out how to use them. I wouldn't mind using the deprecated way either, as long as it works.
There is no procedural attachment tactic. All tactics are implemented inside of Z3; you can compose tactics from outside.
Previous versions of Z3 exposed a way to register a "user theory". This was deprecated because (1) the source of Z3 is now available, so users can compile their custom theories directly into Z3, and (2) the user-theory abstraction lacked proper support for model generation. You can of course try previous versions of Z3 that have the user theory extension, but it is not supported.
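To illustrate composing tactics from the outside in z3py (a minimal sketch; the particular tactics and constraints are just examples):

from z3 import *

x, y = Ints('x y')

# Compose tactics: simplify the goal first, then hand it to the
# general-purpose SMT tactic, and use the composition as a solver.
t = Then(Tactic('simplify'), Tactic('smt'))
s = t.solver()
s.add(x > 0, y == x + 1, y < 100)
print(s.check())   # sat
print(s.model())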

Z3/SMT: When should I prefer push/pop to reset?

I am using Z3 to solve the path conditions produced by a symbolic executor, which explores the state space in depth-first order, quite similarly to CUTE, DART or (possibly) SAGE. We are experimenting with different ways of using Z3. At one extreme, we send every query to Z3 and (reset) it right after. At the other, we (push) every additional branch constraint and, upon backtracking, (pop) (pop) the minimum necessary to correctly weaken the path condition. The problem is, no strategy seems to work better than the others in all circumstances. Pushing seems to offer the best advantage, but we met a few cases where resetting Z3 after every query is more than one order of magnitude faster than doing push/pop. Note that communication overhead is negligible: almost all the time is spent inside check-sat.
Does anyone have any experience to share, or some indication of the state kept internally by Z3 (lemmas, etc.) that could help clarify its behavior? And what about the behavior of other SMT solvers?
The next release (v4.3.2) will expose a feature that may be useful for you. In Z3, the default solver combines a non-incremental solver and an incremental one. When push/pop are used (or multiple checks are used without invoking reset), Z3 will use the incremental solver. In the next release, we can provide a timeout for the incremental solver. If the incremental solver can't solve the problem in the given timeout, Z3 will automatically switch to the non-incremental one. Perhaps, if you use this feature, you will be able to get the best of "both worlds". To get the source code for the next release candidate, you should use
git clone https://git01.codeplex.com/z3 -b rc
To compile it, we have to use:
cd z3
python scripts/mk_make.py
cd build
make
To set the timeout for the incremental solver, we have to provide the following command line option:
combined_solver.solver2_timeout=<time in milliseconds>
If you are using the programmatic APIs, you can use the new API:
Z3_global_param_set(Z3_string param_id, Z3_string param_value)
Note that the next release will have a new framework for setting parameters. It allows the user to set parameters for internal Z3 modules.
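For illustration, a minimal z3py sketch of both strategies from the question, with the combined-solver timeout set via the parameter named above (it only takes effect in builds that have this feature):

from z3 import *

set_param('combined_solver.solver2_timeout', 500)   # milliseconds

x = Int('x')

# Strategy 1: reset after every query.
s = Solver()
s.add(x > 0, x < 10)
print(s.check())   # sat
s.reset()

# Strategy 2: push/pop around each branch constraint.
s.add(x > 0)
s.push()
s.add(x < 0)
print(s.check())   # unsat
s.pop()            # weaken the path condition on backtrack
print(s.check())   # sat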

Is there any free OWL reasoner that can reason without loading all data into memory?

I use Jena and TDB to store RDF, and I want to do some inference on it. But the RDF data is big, and Jena's OWL reasoner has to load all the data into memory.
So I want to find a reasoner that can reason without loading all the data into memory. Is there any such reasoner?
Not really. DL reasoning is computationally difficult even at low scale. With lots of data, that's just not going to work with the existing approaches. Doing it over secondary storage is still an open research problem, afaik.
However, the various profiles of OWL exist to address this issue. They all have different computational complexities, which are all 'easier' than DL, making them much more amenable to reasoning at scale. In particular, QL is designed for query-time reasoning, which in my experience tends to have a very small memory footprint, and RL can be implemented with a standard rule reasoner.
So if you don't need to use DL, then I'd go with a tool that supports one of the profiles and you should get pretty good mileage out of that.
For reference, you might find this document about the computational complexities of the various OWL dialects interesting.
If you are prepared to take a subset of OWL, there are things you can do in a stream processing fashion without loading all your RDF data in memory and which will materialise all the inferred triples.
As an example, have a look at RIOT's infer command:
http://incubator.apache.org/jena/documentation/io/riot.html#inference
Source code here:
https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/tags/jena-arq-2.9.0-incubating/src/main/java/riotcmd/infer.java
https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/tags/jena-arq-2.9.0-incubating/src/main/java/org/openjena/riot/pipeline/inf/InferenceProcessorRDFS.java
https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/tags/jena-arq-2.9.0-incubating/src/main/java/org/openjena/riot/pipeline/inf/InferenceSetupRDFS.java
It is trivial to take RIOT's infer and run it in parallel with something like MapReduce; an example is here:
https://github.com/castagna/tdbloader4/blob/f5363fa49d16a04a362898c1a5084ade620ee81b/src/main/java/org/apache/jena/tdbloader4/InferDriver.java
Another different approach which uses MapReduce to apply the RDFS and OWL ter Horst rules and materialize all the derived statements is here:
http://www.few.vu.nl/~jui200/webpie.html
Perhaps, you can look at the parts of OWL you are interested in and check if you can do it in a streaming fashion. If so, you can take RIOT's infer and extend it, adding the parts of OWL you are interested in. That would be a nice contribution to Apache Jena (get back in touch on the jena-dev mailing list if you want to do that).
WebPIE is a clever and interesting project, but, as you can see, it is a bit more complex, and it's a research project (with all that this implies from a long-term support and maintenance point of view). However, if it is OWL ter Horst you want/need, WebPIE would do.
You could even put in the effort, fork WebPIE, and contribute it to an open source project, if others are interested in using it.
You might also be interested in Ymris (but this is currently sleeping... zzzzz):
https://svn.apache.org/repos/asf/incubator/jena/Import/Jena-SVN/Ymris/trunk/
You may want to try GRAKN.AI; they perform reasoning in real time on persisted data in distributed systems.

Z3 Context serialization/deserialization?

Is it possible to serialize/deserialize a Z3 context (from C#)?
If not, is this feature planned?
I think this feature is important for real world applications.
This is not directly supported in the current API. The next release will support multiple solvers, and we will provide commands for copying the assertions from one solver to another, and retrieving the assertions. With these commands, one can implement serialization by dumping the expressions in a file (in SMT 2.0 format). To deserialize, we just read the file back.
Note that this solution can already be implemented using the current API if you keep track of the assertions you have asserted into the logical context.
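In today's z3py (these helpers postdate the answer), the round trip can be sketched like this:

from z3 import *

x = Int('x')
s = Solver()
s.add(x > 0, x < 10)

# Serialize: dump the solver's assertions in SMT-LIB 2 format.
with open('state.smt2', 'w') as f:
    f.write(s.to_smt2())

# Deserialize: parse the file and re-assert into a fresh solver.
s2 = Solver()
s2.add(parse_smt2_file('state.smt2'))
print(s2.check())   # sat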
That being said, I've seen the following approach used in many projects that use Z3. They have their own representation for formulas. When they invoke Z3, they translate their representation into Z3's representation. In most cases the performance overhead is minimal. This approach gives them a lot of flexibility. Serialization is a good example. Some programming environments (e.g., Python) already provide built-in support for serialization.
