In my research I automatically generate SMT2, which I then pass to Z3. The generated code is basically one very large conjunction (and ...) of many different constraints.
Will I be losing (or gaining?) any significant performance by doing this, as opposed to generating many assertions?
You won't be losing or gaining. In almost all settings, Z3 splits any conjunction into multiple assertions and the time it takes to do so is negligible.
This question has also come up before: Which is better practice in SMT: to add multiple assertions or single and?
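To make the two encodings concrete, here is a minimal Python sketch of a generator that emits the same constraints either as one large conjunction or as separate assertions. The helper names and the sample constraints are hypothetical and only illustrate the shape of the output; per the answer above, Z3 should treat both forms essentially the same.

```python
def as_single_conjunction(constraints):
    """Emit one assertion wrapping every constraint in a top-level `and`."""
    if not constraints:
        return "(assert true)"
    if len(constraints) == 1:
        # `and` is meant to take at least two arguments, so skip the wrapper.
        return f"(assert {constraints[0]})"
    return "(assert (and " + " ".join(constraints) + "))"


def as_separate_assertions(constraints):
    """Emit one `(assert ...)` command per constraint, one per line."""
    return "\n".join(f"(assert {c})" for c in constraints)


# Hypothetical constraints, already rendered as SMT-LIB terms.
constraints = ["(> x 0)", "(< x 10)", "(= y (+ x 1))"]

single = as_single_conjunction(constraints)
separate = as_separate_assertions(constraints)

print(single)
print(separate)
```

Emitting separate assertions has one practical side benefit that is independent of performance: the generated file is easier to read and to bisect when a constraint turns out to be wrong.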
I have an SMT application (built on the Haskell SBV library) that solves a complex equation for a single variable s over the reals using Z3. Finding a solution takes about 30 seconds in my case.
Trying to speed things up, I added an additional constraint, s < 40000, since I have a rough estimate of the solution. I expected this constraint to shrink the search space and make the solver return the result faster. However, it only made things slower (the solver didn't even finish within 10 minutes).
The question is: can it be assumed that additional constraints always slow down (or always speed up) the solving process, or are there no general rules, so that it always depends on circumstances?
I'm worried that even my 30-second algorithm may contain some extra constraint that isn't actually needed and just slows the process down.
I don't think you can make any general assumptions about this. It may or may not impact solving time, assuming sat/unsat status doesn't change.
Equalities usually help (as they propagate freely), but for anything else, it's anybody's guess. Also, different solvers can exhibit differing behavior for the same addition.
One way to think about this is that the underlying DPLL(T) algorithm is essentially a very smart, glorified search algorithm. It keeps producing "learned lemmas" in the hope that it will find a contradiction with a previously known fact. The new "constraint" you add might cause it to generate a ton of correct but irrelevant lemmas that make it go off the deep end with no useful result. (Quantified formulae are usually like this: you can instantiate them in a million different ways, but unless you find the "correct" instantiation, all they do is pollute your search space.)
At least that's been my experience!
The Situation
So I have created some code in the form of modules that each represent a medical questionnaire (I'm calling them Catalogs). Each questionnaire has its own module, as they may differ slightly in their content and associated calculations, but they are essentially made up of simple questions that have boolean/numeric possible responses. Here is an example:
http://www.janssenmedicalinformation.ca/assets/pdf/HarveyBradshaw_English.pdf
These Catalog modules are included in an Entry class that collects responses matching the question names. Each questionnaire is transformed into a DEFINITION which is used in the Entry to do things like:
Validate inputs
Check completeness
Calculate scoring
Here are 2 examples for reference that illustrate the problem of duplication... much of the code is similar but not exactly the same.
https://gist.github.com/theworkerant/3a074d5d2a642ded1b96
The Problem
There is a lot of duplication here, but I'm not sure about the best strategy to remove it. A few things make this particular problem difficult and make me lean towards accepting some duplication rather than building a system that is too strict to work. The system needs to remain flexible enough to accommodate currently unknown medical questionnaires of a similar nature, so I need to be careful (this is the reason I've gone with a Module system so far).
Here are some examples:
Each Catalog can have slightly different scoring requirements and custom grouping of questions that represent one "score"
Potentially many Catalogs are included in an Entry class and can't step on each other
Some Catalogs incorporate things like "Current Weight" for calculations, breaking the 1-5 or 1-10 paradigm and not fitting very nicely into simple sum reductions.
One Catalog requires a week of previous entries in order to be valid, a sort of weird custom validation.
The Question:
What strategies might be employed here to reduce duplication overall? I'm not looking for tweaks that cut a few lines out of these specific examples. Implementation cost is a consideration.
Possibilities:
Put some of this into a database (sounds pretty good, but I think the cost of implementation could be high)
I fear there could be room for improvement in my metaprogramming here, perhaps there are better ways to accomplish this through some dynamic method creation or other voodoo.
Thanks!
If your system is basically crunching numbers i.e. given a set of large inputs, run a process on them, and then assert the outputs, which is the better framework for this?
By 'large inputs', I mean we need to enter data for several different, related entities.
Also, there are several outputs i.e. we don't just get one number at the end.
If you find yourself talking through different examples with people, JBehave is probably pretty good.
If you find yourself making lists of numbers and comparing inputs with outputs, FitNesse is probably better.
However, if you find yourself talking to other devs and nobody else, use plain old JUnit. The less abstraction you have, the quicker it will be to run and the easier it will be to maintain.
One way to solve optimisation problems is to use an SMT solver to ask whether a (bad) solution exists, then to progressively add tighter cost constraints until the proposition is no longer satisfiable. This approach is discussed in, for example, http://www.lsi.upc.edu/~oliveras/espai/papers/sat06.pdf and http://isi.uni-bremen.de/agra/doc/konf/08_isvlsi_optprob.pdf.
Is this approach efficient, though? i.e. will the solver re-use information from previous solutions when attempting to solve with additional constraints?
The solver can reuse lemmas learned when trying to solve previous queries. Just keep in mind that in Z3, whenever you execute a pop, all lemmas created since the corresponding push are forgotten. So, to get this reuse, you must avoid the push and pop commands and use "assumptions" if you need to retract assertions. In the following question, I describe how to use "assumptions" in Z3:
Soft/Hard constraints in Z3
Regarding efficiency, this approach is not the most efficient one for every problem domain. On the other hand, it can be implemented on top of most SMT solvers. Moreover, Pseudo-Boolean solvers (solvers for 0-1 integer problems) successfully use a similar approach for solving optimization problems.
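The tightening loop itself can be sketched in a few lines of Python. This is only an illustration of the control flow: the brute-force `find_model` function below is a hypothetical stand-in for a real solver call, it does not model lemma reuse at all, and the variables, constraints and cost function are made up for the example. With a real solver, the point from the answer above would be to add each new bound to the same solver instance without popping, so that learned lemmas are kept.

```python
from itertools import product


def find_model(constraints, names, domain):
    """Stand-in for a solver check: return a satisfying assignment or None."""
    for values in product(domain, repeat=len(names)):
        model = dict(zip(names, values))
        if all(constraint(model) for constraint in constraints):
            return model
    return None


def minimize(hard_constraints, cost, names, domain):
    """Repeatedly ask for a strictly cheaper model until none exists."""
    constraints = list(hard_constraints)
    best = None
    while True:
        model = find_model(constraints, names, domain)
        if model is None:
            # Unsatisfiable under the latest bound: the previous model is optimal.
            return best
        best = model
        bound = cost(model)
        # Tighten: the next model must cost strictly less than this one.
        constraints.append(lambda m, b=bound: cost(m) < b)


# Hypothetical problem: minimize 2x + 3y subject to x + y >= 7 and x >= 2.
names = ["x", "y"]
domain = range(0, 11)
hard = [lambda m: m["x"] + m["y"] >= 7, lambda m: m["x"] >= 2]


def cost(m):
    return 2 * m["x"] + 3 * m["y"]


best = minimize(hard, cost, names, domain)
print(best, cost(best))
```

Because every added bound is strictly tighter than the last and the domain is finite, the loop is guaranteed to terminate; the number of iterations depends on how good the first few models happen to be.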
I've asked a couple of questions around this subject recently, and I think I'm managing to narrow down what I need to do.
I am attempting to create some "metrics" (quotes because these should not be confused with metrics relating to the performance of the application; these are metrics that are generated based on application data) in a Rails app; essentially I would like to be able to use something similar to the following in my view:
#metric(#customer,'total_profit','01-01-2011','31-12-2011').result
This would give the total profit for the given customer for 2011.
I can, of course, create a metric model with a custom result method, but I am confused about the best way to go about creating the custom metrics (e.g. total_profit, total_revenue, etc.) in such a way that they are easily extensible so that custom metrics can be added on a per-user basis.
My initial thoughts were to attempt to store the formula for each custom metric in a structure with operand, operation and operation_type models, but this quickly got very messy and verbose, and was proving very hard to do in terms of adding each metric.
My thoughts now are that perhaps I could create a custom metrics helper method that would hold each of my metrics (thus I could just hard code each one, and pass variables to each method), but how extensible would this be? This option doesn't seem very rails-esque.
Can anyone suggest a better alternative for approaching this problem?
EDIT: The answer below is a good one in that it keeps things very simple, though I'm concerned it may be fraught with danger, as it uses eval (thus there is no prospect of ever running user-supplied code). Is there another option for doing this? My previous approach, where operands etc. were broken down into chunks, used a combination of constantize and instance_variable_get; is there a way these could be used to make the execution of a string safer?
This question was largely answered with some discussion here: Rails - Scalable calculation model.
For anyone who comes across this, the solution is essentially to ensure an operation always has two operands, but an operand can either be an attribute, or the result of a previous calculation (i.e. it can be a metric itself), and it is thus highly scalable. This avoids the need to eval anything, and thus avoids the potential security holes that this entails.
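A minimal sketch of that structure, written in Python rather than Ruby purely for illustration, might look like the following. The class names, the operator table and the sample attributes are all hypothetical; the only idea taken from the answer is that every operation has exactly two operands and that an operand is either an attribute or another metric.

```python
import operator
from dataclasses import dataclass
from typing import Union

# Whitelist of allowed operations; anything else raises KeyError instead of being evaluated.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


@dataclass(frozen=True)
class Attribute:
    """Operand that reads a named value from the record being measured."""
    name: str


@dataclass(frozen=True)
class Metric:
    """Operation with exactly two operands, each an Attribute or another Metric."""
    op: str
    left: "Operand"
    right: "Operand"


Operand = Union[Attribute, Metric]


def evaluate(node, record):
    """Recursively compute a metric against a dict of attribute values."""
    if isinstance(node, Attribute):
        return record[node.name]
    fn = OPS[node.op]
    return fn(evaluate(node.left, record), evaluate(node.right, record))


# Hypothetical metrics: total_profit reuses total_revenue as one of its operands.
total_revenue = Metric("*", Attribute("units_sold"), Attribute("unit_price"))
total_profit = Metric("-", total_revenue, Attribute("total_cost"))

record = {"units_sold": 10, "unit_price": 5, "total_cost": 30}
result = evaluate(total_profit, record)
print(result)
```

Since a metric is just data (an operator name plus two operands), it could be stored per user in ordinary database rows, and nothing user-supplied is ever executed as code.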