Does Lean enhance proof surveyability?

By proof surveyability I mean the property that a human user can "trace" all the details of a proof. Some things are not easily traceable. For instance, an SMT proof is based on solver-specific heuristics whose results are then translated into the prover. In such situations, it may be useful to have easy mechanisms (ones you need not be an expert to use) to inspect why a proof failed or to examine the internal structures of the proof procedure.
I was wondering whether Lean enhances this kind of proof surveyability compared to Coq or Isabelle. I get the impression that this may be the case from skimming A Metaprogramming Framework for Formal Verification.

If I understand proof-surveyability or -traceability correctly, then by definition, a fully detailed proof is "100% traceable", whereas just stating the result (e.g. a lemma) is "0% traceable".
In that case, I don't see why Lean would improve over Coq or Isabelle, or any other tool whose core purpose is to check correctness of a fully detailed proof. Such tools often provide means to increase automation, which is convenient, but arguably reduces traceability, depending on how the additional proof steps are represented. E.g. a Coq-like tactic can increase automation, but traceability can be "recovered" because the steps the tactic infers can be represented in the same way the explicitly provided steps are represented: as proof rule applications or deduction steps.
The latter part is difficult for SMT-inferred proof steps: SMT solvers can achieve a much higher degree of automation than proof checkers such as Coq, but at the expense of traceability, because their "reasoning" is much more technical and less human-like/deductive.
As a side remark: this difference between proof checkers and SMT solvers reminds me of the difference between classical and AI-based image recognition. The former is less automated/efficient, but easier to trace/explain.
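To make the point about tactics concrete, here is a minimal Lean 4 sketch (the lemma and the names are only illustrative): a tactic proof elaborates to an ordinary proof term that can be inspected afterwards, just like a proof written out by hand.

```lean
-- A fully explicit term-mode proof: every inference is visible in the term itself.
example (p q : Prop) (hp : p) (hq : q) : p ∧ q := And.intro hp hq

-- The same statement proved with tactics: more automated, but still traceable,
-- because the elaborated proof term can be printed and examined.
theorem and_example (p q : Prop) (hp : p) (hq : q) : p ∧ q := by
  constructor
  · exact hp
  · exact hq

#print and_example   -- shows the proof term the tactic script elaborated to
```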

Related

How to debug SMT scripts that have quantifiers?

Currently, I have a somewhat superficial understanding of how SMT solvers work (the basics of algorithms like E-matching, MBQI, and CVC4/5's inductive reasoning). However, it's very frustrating to debug by trial-and-error.
Is there any guidance on how to debug SMT scripts that make heavy use of quantifiers?
A badly written script often goes into an infinite loop, but I cannot tell whether that is my mistake or the solver is just taking a long time to respond.
SMT solvers tend to hide their internals from users, so it is quite hard to figure out why they get stuck. Is there any way to print the "solving context"?
Or maybe I am using SMT solvers the wrong way? Should I design my own verification algorithm and only employ SMT solvers for local decisions?
Any help is appreciated!
This is a very subjective question, and largely opinion-based. But a couple of general remarks:
Don't program directly in SMTLib. It is not meant for human consumption. Instead, use a higher-level API and script the solver from a language you are more familiar with. Bindings are available for any number of languages, including C/C++/Java/Python/OCaml/Haskell/Scala, etc. Just doing this will get rid of most of the mundane mistakes you make. (See the sketch after these remarks.)
Turn on the solver's verbose output. You might be able to notice patterns in the log output. Unfortunately this is very solver-specific and can be hard to decipher, but it can also indicate, for instance, that you are stuck in an e-matching loop in the presence of quantifiers.
If there is a custom algorithm for your verification problem (Hoare triples, separation logic, abstract interpretation, ...), apply those techniques first and delegate only local sub-lemmas to the SMT solver. Do not expect the SMT solver to do large proofs, or anything that requires actual induction, out of the box.
Try reducing complexity by adding over-constraints and seeing which ones help. Based on your findings you might be able to do a case split, for instance if the over-constraints enumerate a reasonably small search space.
Again, these are very general remarks, and whether they apply to your specific problem is anyone's guess. But I'd start with coding against a higher-level API if you aren't already doing so.
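As a minimal sketch of the first remark, here is what that looks like with Z3's Python bindings (the z3-solver package); the constraints, the timeout value, and the verbosity level are only illustrative:

```python
# A small Z3 example driven from Python instead of hand-written SMT-LIB.
from z3 import Int, Solver, ForAll, Implies, set_param, sat

set_param(verbose=10)          # solver-specific progress output, useful for spotting loops

x, y = Int('x'), Int('y')
s = Solver()
s.set(timeout=5000)            # milliseconds; separates "looping forever" from "just slow"
s.add(ForAll([x], Implies(x > 0, x + 1 > 1)))   # a trivial quantified assumption
s.add(y > 5)

print(s.sexpr())               # dump the current assertions as SMT-LIB text
result = s.check()
print(result)                  # sat / unsat / unknown
if result == sat:
    print(s.model())
```

The sexpr() dump is also one way to "print the solving context": it shows exactly what has been asserted so far.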

What is Function Point Analysis?

What does Function Point Analysis mean?
Is it used for cost estimation of software, or is there a proper definition of Function Point Analysis?
Can you please give me a short description of it?
While I agree with Leo's answer, I'll try a more practical description:
What it is
Function Point Analysis (FPA) is one of currently five ISO-approved standards for Functional Sizing (see ISO/IEC 14143). FPA is actually the widely used short name for the ISO/IEC 20926 standard titled "IFPUG Functional Size Measurement".
FPA is a means to rate (the term 'measure' is actually misleading) the amount of functional requirements of a piece of software. To achieve this rating, a technique is used that was known in earlier times as 'functional decomposition'. This concept is in fact very close to describing requirements with 'use cases', even though the detailed rules and notations are quite different.
In short, the functional requirements are decomposed into 'elementary functions', each of which is then rated with a point value. The total of the points for all elementary functions is used as an indication of the 'size' or amount of requirements. This is called the 'functional size', expressed in the unit of 'function points' (fp).
The natural representation of a functional decomposition is the functional tree.
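To make the tallying concrete, here is an illustrative unadjusted function-point count in Python. The weights below are the classic IFPUG complexity weights, but treat the component list and the resulting number as a made-up example, not a real FPA count:

```python
# Illustrative only: tally unadjusted function points from rated elementary functions.
WEIGHTS = {
    "EI":  {"low": 3, "avg": 4,  "high": 6},   # external inputs
    "EO":  {"low": 4, "avg": 5,  "high": 7},   # external outputs
    "EQ":  {"low": 3, "avg": 4,  "high": 6},   # external inquiries
    "ILF": {"low": 7, "avg": 10, "high": 15},  # internal logical files
    "EIF": {"low": 5, "avg": 7,  "high": 10},  # external interface files
}

# Elementary functions identified during decomposition: (type, complexity, count).
components = [("EI", "avg", 6), ("EO", "low", 4), ("EQ", "avg", 3), ("ILF", "high", 2)]

unadjusted_fp = sum(WEIGHTS[t][c] * n for t, c, n in components)
print(unadjusted_fp)   # 6*4 + 4*4 + 3*4 + 2*15 = 82
```

A real count would then apply the standard's adjustment rules; the point here is only the 'decompose, rate, total' mechanics.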
The FPA standard also has a set of rules for rating changes to existing applications, so it can be used to rate the functional requirements for the adaptation or extension of existing systems ('enhancements' or 'releases').
What it is not
FPA is not an effort estimation technique by itself. Obviously, the relation between the size of functional requirements and the implementation effort can be and often is rather loose. Function points can be used as (one) input to more complex estimation models (such as COCOMO), which have to take into account all other effort drivers.
FPA is not a 'software metric' - functional size is always related to the user requirements fulfilled by software. While you can count and measure lines of code or code complexity, functional size is the result of an analytical process.
When to use it
FPA can be helpful for estimating the effort for a software project at an early stage, when the requirements are known but the details of implementation have not yet been specified or evaluated. The functional requirements are reflected in the functional size; the non-functional requirements need to be input to an estimation model. You need to have/use a good, proven (and trusted) model, otherwise the functional size is useless for this purpose.
FPA can also help to rate the 'value' of an application in the sense of 'recovery costs'.
Finally, in the context of IT client/vendor relationships, FPA can be used as a basis for pricing: clients are invoiced based on an agreed 'price per fp' instead of an hourly rate.
When not to use it
By definition, FPA requires a basic understanding of the functional requirements. Thus, if you do not have or know the functional requirements, it will be difficult if not impossible to use FPA.
FPA is also not suited to rating the performance of individuals, as it is a rather holistic rating for an application and cannot be used to size only parts of it.
The authoritative answer, from IFPUG:
http://www.ifpug.org/about-ifpug/about-function-point-analysis/
Function Point Analysis (FPA) is a sizing measure of clear business
significance. First made public by Allan Albrecht of IBM in 1979, the
FPA technique quantifies the functions contained within software in
terms that are meaningful to the software users. The measure relates
directly to the business requirements that the software is intended to
address. It can therefore be readily applied across a wide range of
development environments and throughout the life of a development
project, from early requirements definition to full operational use.
Other business measures, such as the productivity of the development
process and the cost per unit to support the software, can also be
readily derived. The function point measure itself is derived in a
number of stages. Using a standardized set of basic criteria, each of
the business functions is assigned a numeric index according to its type and
complexity. These indices are totaled to give an initial measure of
size which is then normalized by incorporating a number of factors
relating to the software as a whole. The end result is a single number
called the Function Point index which measures the size and complexity
of the software product.
In summary, the function point technique provides an objective,
comparative measure that assists in the evaluation, planning,
management and control of software production.
P.S. The IFPUG definition is what the courts here in Brazil take as authoritative when there is any kind of dispute about function points (mostly because government contracts are usually defined in FPs).

Practical examples of NFA and epsilon NFA

What are real-world examples of NFAs and epsilon-NFAs, i.e. practical examples other than their use in designing compilers?
Any time anyone uses a regular expression, they're using a finite automaton. And if you don't know a lot about regexes, let me tell you they're incredibly common -- in many ecosystems, it's the first tool most people try to apply when faced with getting structured data out of strings. Understanding automata is one way to understand (and reason about) regexes, and a quite viable one at that if you're mathematically inclined.
Actually, today's regex engines have grown beyond these mathematical concepts and added features that permit doing more than an FA allows. Nevertheless, many regexes don't use these features, or use them in such a limited way that it's feasible to implement them with FAs.
Now, I only spoke of finite automata in general before. An NFA is a specific FA, just like a DFA is, and the two can be converted into one another (technically, any DFA already is an NFA). So while you can just substitute "finite automaton" with "NFA" in the above, be aware that it doesn't have to be an NFA under the hood.
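To make that concrete, here is a small sketch: a hand-written NFA for the regex (a|b)*abb, simulated by tracking the set of states the machine could currently be in (the same idea behind the subset construction). The state numbering and the example strings are just for illustration:

```python
import re

# NFA for (a|b)*abb: the table maps (state, symbol) to the set of successor states.
NFA = {
    (0, 'a'): {0, 1},
    (0, 'b'): {0},
    (1, 'b'): {2},
    (2, 'b'): {3},
}
START, ACCEPT = {0}, {3}

def matches(s):
    states = set(START)
    for ch in s:
        # follow every possible transition from every state we might be in
        states = set().union(*(NFA.get((q, ch), set()) for q in states))
    return bool(states & ACCEPT)

for s in ["abb", "aabb", "babb", "ab", "abba"]:
    print(s, matches(s), bool(re.fullmatch(r"(a|b)*abb", s)))   # both columns agree
```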
As explained by delnan, automata are often used in the form of regular expressions. However, they are used for more than just that: automata are often used to model hardware and software systems and to verify certain properties of them. You can find more information by looking at model checking. One really simplified motivating example can be found in the introduction of Introduction to Automata Theory, Languages, and Computation.
And let's not forget Markov chains, which are essentially based on finite automata as well. In combination with the hardware and software modelling that bellpeace mentioned, a very powerful tool.
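As a small illustration (the states and probabilities are made up), a Markov chain is essentially a finite automaton whose transitions carry probabilities instead of input symbols:

```python
import random

# A toy two-state weather model: each state maps to its successors and their weights.
transitions = {
    "sunny": (["sunny", "rainy"], [0.8, 0.2]),
    "rainy": (["sunny", "rainy"], [0.4, 0.6]),
}

state, walk = "sunny", []
for _ in range(10):
    nexts, weights = transitions[state]
    state = random.choices(nexts, weights)[0]   # pick the next state by its probability
    walk.append(state)
print(walk)
```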
If you are wondering why epsilon-NFAs are considered a variation of NFAs, I don't think there is a deep reason. They are interpreted in the same way, except that a step may no longer consume exactly one input symbol, though that is not really true of a plain NFA either.
A somewhat obscure but effective example would be the Aho-Corasick algorithm, which uses a finite automaton to search for multiple strings within a text.
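A compact sketch of that idea (a hand-rolled automaton rather than a library; the patterns and the text are only an example):

```python
from collections import deque

def build_automaton(patterns):
    # Trie of the patterns: goto[state][char] -> next state; fail[state] is the
    # fallback link; out[state] lists the patterns that end at this state.
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass to compute failure links (the automaton's fallback edges).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] += out[fail[nxt]]
    return goto, fail, out

def search(text, patterns):
    goto, fail, out = build_automaton(patterns)
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]              # follow failure links on a mismatch
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            yield i - len(pat) + 1, pat      # (start index, matched pattern)

print(list(search("ushers", ["he", "she", "his", "hers"])))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```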

How to test a Machine Learning or statistical NLP algorithm implementation pack?

I am working on testing several Machine Learning algorithm implementations, checking whether they work as efficiently as described in the papers and making sure they can offer real power to our statistical NLP (Natural Language Processing) platform.
Could you show me some methods for testing an algorithm implementation?
1) What aspects?
2) How?
3) Do I have to follow some basic steps?
4) Do I have to consider situations specific to different programming languages?
5) Do I have to understand the algorithm? I mean, does it help if I really know what the algorithm is and how it works?
Basically, we are using C or C++ to implement the algorithms and our working environment is Linux/Unix. Our testing methods only focus on black-box testing and testing the input/output of functions. I am eager to improve them but I don't have any better ideas right now...
Thanks a lot!
For many machine learning and statistical classification tasks, the standard metrics for measuring quality are precision and recall. Most published algorithms will make some kind of claim about these metrics, or you could implement them and run the tests yourself. This should provide a good indicative measure of the quality you can expect.
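For instance, a minimal sketch of computing precision and recall by hand (the labels and the "PERSON" class are made up for illustration):

```python
# Precision = TP / (TP + FP); Recall = TP / (TP + FN), for one class of interest.
def precision_recall(gold, predicted, positive="PERSON"):
    tp = sum(1 for g, p in zip(gold, predicted) if p == positive and g == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if p == positive and g != positive)
    fn = sum(1 for g, p in zip(gold, predicted) if p != positive and g == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

gold      = ["PERSON", "OTHER", "PERSON", "OTHER", "PERSON"]
predicted = ["PERSON", "PERSON", "OTHER", "OTHER", "PERSON"]
print(precision_recall(gold, predicted))   # (0.666..., 0.666...)
```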
When you talk about efficiency of an algorithm, this is usually some statement about the time or space performance of an algorithm in terms of the size or complexity of its input (often expressed in Big O notation). Most published algorithms will report an upper bound on the time and space characteristics of the algorithm. You can use that as a comparative indicator, although you need to know a little bit about computational complexity in order to make sure you're not fooling yourself. You could also possibly derive this information from manual inspection of program code, but it's probably not necessary, because this information is almost always published along with the algorithm.
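If you do want an empirical sanity check of the published complexity, a rough sketch is to time the implementation on growing inputs and watch how the runtime scales (train here is just a hypothetical stand-in for the function under test):

```python
import random
import time

def train(data):                 # placeholder for the real implementation under test
    return sorted(data)

for n in [10_000, 20_000, 40_000, 80_000]:
    data = [random.random() for _ in range(n)]
    start = time.perf_counter()
    train(data)
    print(n, round(time.perf_counter() - start, 4))   # as n doubles, how does time grow?
```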
Finally, understanding the algorithm is always a good idea. It makes it easier to know what you need to do as a user of that algorithm to ensure you're getting the best possible results (and indeed to know whether the results you are getting are sensible or not), and it will allow you to apply quality measures such as those I suggested in the first paragraph of this answer.

Intelligent code-completion? Is there AI to write code by learning?

I am asking this question because I know there are a lot of well-read CS types on here who can give a clear answer.
I am wondering whether such an AI exists (or is being researched/developed) that writes programs by generating and compiling code all on its own and then progresses by learning from former iterations. I am talking about working to make us, programmers, obsolete. I'm imagining something that learns what works and what doesn't in a programming language by trial and error.
I know this sounds pie-in-the-sky so I'm asking to find out what's been done, if anything.
Of course even a human programmer needs inputs and specifications, so such an experiment has to have carefully defined parameters. Like if the AI was going to explore different timing functions, that aspect has to be clearly defined.
But with a sophisticated learning AI I'd be curious to see what it might generate.
I know there are a lot of human qualities computers can't replicate, like our judgement, tastes and prejudices. But my imagination likes the idea of a program that spits out a web site after a day of thinking and lets me see what it came up with; even then I would often expect it to be garbage, but maybe once a day I could give it feedback and help it learn.
Another avenue of this thought is that it would be nice to give a high-level description like "menued website" or "image tools" and have it generate code with enough depth to be useful as a code-completion module for me to then code in the details. But I suppose that could be envisioned as a non-intelligent, static, hierarchical code-completion scheme.
How about it?
Such tools exist. They are the subject of a discipline called Genetic Programming. How you evaluate their success depends on the scope of their application.
They have been extremely successful (orders of magnitude more efficient than humans) at designing optimal programs for the management of industrial processes, automated medical diagnosis, or integrated circuit design. Those processes are well constrained, with an explicit and immutable success measure and a great amount of "universe knowledge", that is, a large set of rules about what is a valid, working program and what is not.
They have been totally useless for building mainstream programs that require user interaction, because the main thing a learning system needs is an explicit "fitness function", i.e. an evaluation of the quality of the current solution it has come up with.
Another domain that deals with "program learning" is Inductive Logic Programming, although it is used more for automated theorem proving or language/taxonomy learning.
Disclaimer: I am not a native English speaker nor an expert in the field; I am an amateur, so expect imprecisions and/or errors in what follows. So, in the spirit of Stack Overflow, don't be afraid to correct and improve my prose and/or my content. Note also that this is not a complete survey of automatic programming techniques (code generation (CG) from Model-Driven Architectures (MDAs) merits at least a passing mention).
I want to add more to what Varkhan answered (which is essentially correct).
The Genetic Programming (GP) approach to Automatic Programming conflates, with its fitness functions, two different problems ("self-compilation" is conceptually a no-brainer):
self-improvement/adaptation - of the synthesized program and, if so desired, of the synthesizer itself; and
program synthesis.
w.r.t. self-improvement/adaptation, refer to Jürgen Schmidhuber's Goedel machines: self-referential universal problem solvers making provably optimal self-improvements. (As a side note: his work on artificial curiosity is also interesting.) Also relevant for this discussion are Autonomic Systems.
w.r.t. program synthesis, I think it is possible to identify three main branches: stochastic (probabilistic, like the above-mentioned GP), inductive, and deductive.
GP is essentially stochastic because it explores the space of likely programs with heuristics such as crossover, random mutation, gene duplication, gene deletion, etc. (it then tests the programs with the fitness function and lets the fittest survive and reproduce).
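A toy sketch of that loop, evolving bitstrings rather than program trees (so a plain genetic algorithm rather than full GP, but the same crossover / mutation / fitness-selection cycle; all the constants are arbitrary):

```python
import random

TARGET_LEN, POP_SIZE, GENERATIONS = 32, 60, 200

def fitness(ind):
    return sum(ind)                        # toy goal: maximize the number of 1 bits

def crossover(a, b):
    cut = random.randrange(1, TARGET_LEN)  # single-point crossover
    return a[:cut] + b[cut:]

def mutate(ind, rate=0.02):
    return [bit ^ 1 if random.random() < rate else bit for bit in ind]

population = [[random.randint(0, 1) for _ in range(TARGET_LEN)] for _ in range(POP_SIZE)]
for gen in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    if fitness(population[0]) == TARGET_LEN:
        break
    survivors = population[: POP_SIZE // 2]        # selection: the fittest half reproduces
    children = [mutate(crossover(*random.sample(survivors, 2)))
                for _ in range(POP_SIZE - len(survivors))]
    population = survivors + children
print(gen, fitness(population[0]))
```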
Inductive program synthesis is usually known as Inductive Programming (IP), of which Inductive Logic Programming (ILP) is a sub-field. That is, in general the technique is not limited to logic program synthesis or to synthesizers written in a logic programming language (nor is either limited to automated theorem proving or language/taxonomy learning).
IP is often deterministic (but there are exceptions): it starts from an incomplete specification (such as example input/output pairs) and uses it to constrain the search space of likely programs satisfying that specification, and then either tests candidates (the generate-and-test approach) or directly synthesizes a program by detecting recurrences in the given examples, which are then generalized (the data-driven or analytical approach). The process as a whole is essentially statistical induction/inference, i.e. deciding what to include in the incomplete specification is akin to random sampling.
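Generate-and-test in miniature (the primitives and the input/output examples are made up purely for illustration): enumerate short compositions of primitive functions and keep the first one consistent with the examples.

```python
from itertools import product

primitives = {
    "double": lambda x: x * 2,
    "inc":    lambda x: x + 1,
    "square": lambda x: x * x,
    "negate": lambda x: -x,
}
examples = [(1, 4), (2, 6), (5, 12)]   # incomplete specification: f(x) should be 2 * (x + 1)

def synthesize(max_depth=3):
    for depth in range(1, max_depth + 1):
        for names in product(primitives, repeat=depth):    # generate a candidate pipeline...
            def candidate(x, names=names):
                for name in names:
                    x = primitives[name](x)
                return x
            if all(candidate(i) == o for i, o in examples): # ...and test it on the examples
                return names
    return None

print(synthesize())   # e.g. ('inc', 'double')
```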
Generate-and-test and data-driven/analytical§ approaches can be quite fast, so both are promising (even if only small synthesized programs have been demonstrated in public so far), but generate-and-test (like GP) is embarrassingly parallel, so notable improvements (scaling to realistic program sizes) can be expected. Note, however, that Incremental Inductive Programming (IIP)§, which is inherently sequential, has been demonstrated to be orders of magnitude more effective than non-incremental approaches.
§ These links are directly to PDF files: sorry, I am unable to find an abstract.
Programming by Demonstration (PbD) and Programming by Example (PbE) are end-user development techniques known to leverage inductive program synthesis practically.
Deductive program synthesis starts with a (presumed) complete (formal) specification (logic conditions) instead. One of the techniques leverages automated theorem provers: to synthesize a program, it constructs a proof of the existence of an object meeting the specification; then, via the Curry-Howard-de Bruijn isomorphism (the proofs-as-programs and formulae-as-types correspondences), it extracts a program from the proof. Other variants include the use of constraint solving and the deductive composition of subroutine libraries.
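A toy instance of the proofs-as-programs idea, in Lean 4 (illustrative only; real deductive synthesis works from far richer specifications): a constructive proof that every natural number has a strictly larger one carries its witness, so the "proof" can be run as a program.

```lean
def existsLarger (n : Nat) : { m : Nat // n < m } :=
  ⟨n + 1, Nat.lt_succ_self n⟩   -- the witness n + 1 together with a proof that it works

#eval (existsLarger 41).val     -- 42: running the program extracted from the proof
```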
In my opinion, inductive and deductive synthesis are in practice attacking the same problem from two somewhat different angles, because what constitutes a complete specification is debatable (besides, a complete specification today can become incomplete tomorrow; the world is not static).
When (if) these techniques (self-improvement/adaptation and program synthesis) mature, they promise to raise the amount of automation provided by declarative programming (whether such a setting should still be considered "programming" is sometimes debated): we will concentrate more on Domain Engineering and Requirements Analysis and Engineering than on manual software design and development, manual debugging, manual system performance tuning and so on (possibly with less accidental complexity compared to that introduced by current manual, non-self-improving/adapting techniques). This will also promote a level of agility yet to be demonstrated by current techniques.

Resources