I would like to add some basic convenience features to Dafny, such as the ability to define set union in Dafny (see this question). But the internals of Dafny don't seem to be well documented and I don't know where to begin.
How can I add such a feature?
This is an excellent question. I'm not sure why it was downvoted. Thanks for asking. I wish I had known better where to look for information on its internals when I started getting into Dafny.
Rustan has many tutorials/papers/examples on how to use Dafny. (In fact, I would say we suffer slightly from an embarrassment of riches here, because there are so many resources spread over nearly a decade that it can be hard to know where to start. But that's a story for another day.) Also, Dafny is a living project: things change, and so some documents are out of date. You should just prepare yourself for this, and always be willing to open a new file and try things out in modern Dafny.
All that said, there are comparatively few resources about the internals of Dafny. The best way to proceed is to make sure you have a thorough understanding of the theory behind Dafny, and then just read the code. (It's quite good!) Here are some concrete pointers.
The Dafny Reference Manual is essentially an annotated description of the input grammar to Dafny. It was last edited in earnest about two years ago now, so some things are out of date, but it is still invaluable as a mostly exhaustive list of Dafny features. (Please file github issues if you find specific things missing, and we'll try to fix them.) I recommend reading it cover-to-cover.
Check out Rustan's summer school course, which gives a theoretical presentation of Dafny and Boogie. Also check out this earlier summer school course on Spec#, which gets many of the same ideas across, but at a more leisurely pace.
Learn to program in Boogie.
Start with the (10-year-old, but still 90% accurate) manual This is Boogie 2. Understanding Boogie deeply will help you see, by comparison, what Dafny brings to the table.
Ask Dafny to translate some examples to Boogie for you (using the command-line option /print:foo.bpl), and read the resulting Boogie code.
Read the Boogie test suite to see more examples. Start with the textbook directory. Ignore directories with interesting names.
Also check out this paper on Boogie's more-sophisticated-than-you-might-expect type system. (It goes beyond Hindley-Milner polymorphism!)
Learn at least a little bit about Z3.
Focus especially on how it uses triggers to handle quantifiers. A good introduction to the Dafny-level view of triggers is the paper Trigger Selection Strategies to Stabilize Program Verifiers.
Ask Boogie to translate some (small) examples to Z3 for you (using the command-line option /proverLog:foo.smt2), and read the resulting Z3 code. This is very hard, but worth it once or twice for your own edification. It can also be occasionally helpful during debugging.
Dig into the Dafny test suite.
Read tests. There are a lot of tests in the test suite, and in many cases the test suite is the only place to see a real, live, working example of some feature. If there are features that are not represented in the test suite, please file a GitHub issue, and we'll try to take care of it.
Learn to run the tests, so that you can test whether your improvements to Dafny break existing programs. (The basic idea is to install the lit testing tool and point it at the Test directory.)
Read the code.
By and large, it's refreshingly good code. The high-level structure is underdocumented, but the low-level structure is usually documented and/or clear. Your job is thus to reconstruct the high-level structure. Start by understanding the different phases of Dafny -- parsing, "resolution"/type checking, translation to Boogie for verification, compilation to C# for execution.
Here's main() for the command-line tool. Trace through and find the phases and read the ones you're interested in.
Ask questions. Unfortunately, there's no great place for concrete, detailed questions about Dafny's internals. Stack Overflow isn't appropriate; neither is Github. Perhaps the best stop-gap is to file "request for documentation" issues on Github, and we'll see what we can do.
I'm hoping Rustan may chime in if there are things I missed.
Good luck, and in the words of Rustan: Program Safely!
This page (scroll down to "Dafny") also links to some more papers on Dafny that you might find interesting.
For a study on genetic programming, I would like to implement an evolutionary system on the basis of LLVM and apply code mutations (possibly at the IR level).
I found llvm-mutate, which is quite useful for executing point mutations.
As far as I understand, the instructions get counted/numbered; one can then, for example, delete a numbered instruction.
However, introducing new instructions only seems to be possible by reusing one of the statements already available in the code.
Real mutation, however, would allow inserting any of the allowed IR instructions, irrespective of whether they are already used in the code being mutated.
In addition, it should be possible to insert calls to functions from linked libraries (not used in the current code, but possibly available because the library has been linked in by clang).
Did I overlook this in llvm-mutate, or is it really not possible so far?
Are there any projects trying to implement (or that have already implemented) such mutations for LLVM?
LLVM has lots of code analysis tools that should allow the implementation of the aforementioned approach. LLVM is huge, so I'm a bit disoriented. Any hints as to which tools could be helpful (e.g. for getting a list of available library functions)?
Thanks
Alex
Very interesting question. I have been intrigued by the possibility of doing binary-level genetic programming for a while. With respect to what you ask:
It is apparent from their documentation that LLVM-mutate can't do what you are asking. However, I think it is wise for it not to. My reasoning is that any machine-language genetic program would inevitably face the "Halting Problem", i.e. it would be impossible to know if a randomly generated instruction would completely crash the whole computer (for example, by assigning a value to an OS-reserved pointer), or whether it might run forever and take all of your CPU cycles. Turing's theorem tells us that it is impossible to know in advance if a given program would do that. Mind you, LLVM-mutate can still cause a perfectly harmless program to crash or run forever, but I think their approach makes it less likely by only taking existing instructions.
However, such a thing as "impossibility" only deters scientists, not engineers :-)...
What I have been thinking is this: in nature, real mutations work a lot more like LLVM-mutate than like what we do in normal Genetic Programming. In other words, they simply swap letters out of a very limited set (A, T, C, G) and every possible variation comes out of this. We could have a program or set of programs with an initial set of instructions, plus a set of "possible functions" either linked or defined in the program. Most of these functions would not actually be used, but they would be there to provide "raw DNA" for mutations, just like in our DNA. This set of functions would have the complete (or semi-complete) set of possible functions for a problem space. Then, we simply use basic operations like the ones in LLVM-mutate.
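To make that concrete, here is a minimal, purely illustrative Python sketch of such a mutation operator. It treats IR instructions as opaque strings and draws insertions from a pool of "dormant" instructions and library calls; the names and the pool contents are invented for the example and are not llvm-mutate's actual interface.

```python
import random

# Hypothetical pool of "raw DNA": instructions and calls that are linked in
# but not (yet) used by the program being evolved.
INSTRUCTION_POOL = [
    "%t = add i32 %a, %b",
    "%t = mul i32 %a, %b",
    "%t = call i32 @some_linked_library_fn(i32 %a)",
]

def mutate(program, pool=INSTRUCTION_POOL):
    """Apply one random point mutation to a program represented as a list of
    instruction strings: insert from the pool, delete, or swap."""
    program = list(program)  # work on a copy
    op = random.choice(["insert", "delete", "swap"])
    if op == "insert":
        pos = random.randrange(len(program) + 1)
        program.insert(pos, random.choice(pool))
    elif op == "delete" and program:
        del program[random.randrange(len(program))]
    elif op == "swap" and len(program) >= 2:
        i, j = random.sample(range(len(program)), 2)
        program[i], program[j] = program[j], program[i]
    return program

if __name__ == "__main__":
    original = ["%x = add i32 %a, %b", "ret i32 %x"]
    print(mutate(original))
```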
Some possible problems though:
Given the amount of possible variability, the only way to have acceptable execution times would be to have massive amounts of computing power, possibly achievable in the Cloud or with GPUs.
You would still have to contend with Mr. Turing's Halting Problem. However, I think this could be resolved by running the solutions in a "sandbox" that doesn't take you down if the solution blows up: something like a single-use virtual machine or a Docker-like container, with a time limit (to get out of infinite loops). A solution that crashes or times out would get the worst possible fitness, so that the programs would tend to diverge away from those paths (see the sketch below).
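As a very rough illustration of that sandbox idea, here is a minimal Python sketch, assuming each candidate solution has been compiled to a standalone executable. The penalty value, the time limit, and the scoring function are placeholders for whatever a real setup would use, and a real sandbox would add a VM or container layer around the subprocess call.

```python
import subprocess

WORST_FITNESS = float("inf")  # placeholder "worst possible fitness" (minimization)

def evaluate(candidate_binary, time_limit=2.0):
    """Run one candidate in a subprocess with a time limit.
    Crashes and timeouts get the worst possible fitness so the
    population tends to diverge away from those paths."""
    try:
        result = subprocess.run(
            [candidate_binary],
            capture_output=True,
            timeout=time_limit,
        )
    except subprocess.TimeoutExpired:
        return WORST_FITNESS  # infinite loop, or simply too slow
    if result.returncode != 0:
        return WORST_FITNESS  # crashed
    # Otherwise score the output; the scoring function is problem-specific.
    return score_output(result.stdout)

def score_output(stdout_bytes):
    # Hypothetical placeholder: smaller is better. Replace with a real fitness measure.
    return len(stdout_bytes)
```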
As to why do this at all, I can see a number of interesting applications: self-healing programs, programs that self-optimize for a specific environment, program "vaccination" against vulnerabilities, mutating viruses, quality assurance, etc.
I think there's a potential open source project here. It would be insane, dangerous and a time-sucking vortex: just my kind of project. Count me in if someone is doing it.
I am curious whether there are any extensive overviews, preferably specifications or technical reports, about the GNU style and other commonly used styles for parsing command-line arguments.
As far as I know, there are many catches and it's not completely trivial to write a parsing library that would be as compliant as, for example, C++ boost::program_options, Python's argparse, GNU getopt and more.
On the other hand, there might be libraries that are too liberal in accepting certain options or too restrictive. So, if one wants to aim for a good compatibility / conformance with a de-facto standard (if such exists), is there a better way than simply reading a number of mature libraries' source code and/or test cases?
Posix provides guidelines for the syntax of utilities in Chapter 12 of XBD (the Base Definitions). It's certainly worth a read. As is noted there, backwards compatibility has meant that many standardized utilities do not conform to these guidelines, but nonetheless the standard recommends
... that all future utilities and applications use these guidelines to enhance user portability. The fact that some historical utilities could not be changed (to avoid breaking existing applications) should not deter this future goal.
You can also read the rationale for the syntax guidelines.
Posix provides a basic syntax but it's insufficient for utilities with a large number of arguments, and single-letter options are somewhat lacking in self-documentation. Some utilities -- test, find and tcpdump spring to mind -- essentially implement domain specific languages. Others -- ls and ps, for example -- have a bewildering pantheon of invocation options. To say nothing of compilers...
Over the years, a number of possible extension methods have been considered, and probably all of them are still in use in at least one common (possibly even standard) utility. Posix recommends the use of -W as an extension mechanism, but there are few uses of that. X Windows and TCL/Tk popularized the use of spelled-out multicharacter options, but those utilities expect long option names to still start with a single dash, which renders it impossible to condense non-argument options [Note 1]. Other utilities -- dd, make and awk, to name a few -- special-case arguments which have the form {id}={val}, with no hyphens at all. The GNU approach of using a double hyphen seems to have largely won, partly for this reason, but GNU-style option reordering is not universally appreciated.
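As a tiny illustration of the dd/make/awk convention, a parser in that style simply splits each argument at the first '=' and involves no hyphens at all. A rough Python sketch (not any particular utility's actual implementation):

```python
def parse_keyval_args(argv):
    """Split dd/make/awk-style arguments of the form key=value."""
    options, operands = {}, []
    for arg in argv:
        if "=" in arg:
            key, _, value = arg.partition("=")
            options[key] = value
        else:
            operands.append(arg)
    return options, operands

# e.g. parse_keyval_args(["if=/dev/zero", "of=out.img", "bs=1M"])
# -> ({'if': '/dev/zero', 'of': 'out.img', 'bs': '1M'}, [])
```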
A brief discussion of GNU style is found in the GNU style guide (see also the list of long options), and a slightly less brief discussion is in Eric Raymond's The Art of Unix Programming [Note 2].
Google code takes command-line options to a new level; the internal library has now been open-sourced as gflags so I suppose it is now not breaking confidentiality to observe how much of Google's server management tooling is done through command-line options. Google flags are scattered indiscriminately throughout the code, so that library functions can define their own options without the calling program ever being aware of them, making it possible to tailor the behaviour of key libraries independently of the application. (It's also possible to modify the value of a gflag on the fly at runtime, another interesting tool for service management.) From a syntactic viewpoint, gflags allows both single- and double-hyphen long option presentation, indiscriminately, and it doesn't allow coalesced single-character-option calls. [Note 3]
It's worth highlighting the observation in The Unix Programming Environment (Kernighan & Pike) that because the shell "must satisfy both the interactive and programming aspects of command execution, it is a strange language, shaped as much by history as by design." The requirements of these two aspects -- the desire for a concise interactive language and a precise programming language -- are not always compatible.
Syntax flexibility, while handy for the interactive user, can be disastrous for the script author. As an example, last night I typed -env=... instead of --env=... which resulted in my passing nv=... to the -e option rather than passing ... to the --env option, which I didn't notice until someone asked me why I was passing that odd string as an EOF indicator. On the other hand, my pet bugbear -- the fact that some prefer --long-option and others prefer --long_option and sometimes you find both styles in the same program (I'm looking at you, gcc) -- is equally annoying as an interactive user and as a scripter.
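If you want to see that gotcha reproduced, Python's standard-library getopt module (which mimics GNU getopt) behaves the same way; the option letters here simply mirror the anecdote:

```python
import getopt

# Suppose a tool accepts -e <value> and --env=<value>.
args = ["-env=production"]

opts, rest = getopt.gnu_getopt(args, "e:", ["env="])
print(opts)   # [('-e', 'nv=production')]  -- the intended --env option was never seen
print(rest)   # []
```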
Sadly, I don't know of any resource which would serve as an answer to this question, and I'm not sure that the above serves the need either. But perhaps we can improve it over time.
Notes:
Obviously a bad idea, since it would make impossible the pastime of constructing useful netstat invocations whose argument is a readable word.
The book and its author are commonly known as TAOUP and ESR, respectively.
It took me a while to get used to this, and very little time to revert to my old habits. So you can see where my biases lie.
I'm trying to understand where SBE complements or replaces traditional requirements documentation. The diagram "levels of requirements" shows three levels of traditional software requirements.
Which of the items below (from the diagram) does SBE replace and which ones does it complement:
Vision and Scope Document
Business Requirements
Use Case Document
User Requirements
Business Rules
Software Requirements Specification
System Requirements
Functional Requirements
Quality Attributes
External Interfaces
Constraints
My naive understanding of SBE would say that SBEs are just an alternative form of the Software Requirements Specification. Is this correct?
BDD and SBE are normally used by Agile teams, who don't focus as much on documentation as traditional software development teams do.
BDD is the art of using examples in conversation to illustrate behaviour. SBE then uses those examples as a way of specifying the behaviour that you decide to address (I always think of it as a subset of BDD, since talking through examples often ends up eliminating scope, discovering uncertainty or finding different options, none of which end up as specifications).
There are a couple of things that are hard to do with BDD. One of them is anything which isn't discrete in nature, or which needs to always be true throughout the lifetime of the system - non-functionals, quality attributes, constraints, etc. It's hard to talk through examples of these. These continuous aspects of requirements lend themselves better to monitoring, and that's discrete, so BDD can even be used to help manage these.
Since an initial vision is usually created to help the company make money, save money, or protect existing revenue (stopping customers going elsewhere, for instance), you can even come up with examples of how the project will do this. In fact, if you can't, the project is likely to fail anyway. So BDD / SBE can also be used to help complement an initial vision and scope.
Therefore, BDD / SBE can complement all of these documents, and in Agile teams, the documents themselves are usually replaced by conversations about the requirements and rules (illustrated by examples), story cards to represent placeholders for those conversations, and perhaps some lightweight capture of those conversations on a Wiki.
It is unlikely that any Agile team captures all of their examples up-front, as this leads to excessive investment in the requirements and tends to turn it into a traditional Waterfall /SDLC project instead.
This blog post I wrote about BDD in the Large may also be of interest.
flex is called the "fast" lexical analyzer, but I cannot find any document that explains why it is faster than lex. flex has a manual, but it focuses on usage instead of internals. Could any experts in this field give some help, please? Either an explanation of flex's performance improvements or a link to one is welcome.
This answer is from Vern Paxson, and he has allowed it to be shared here.
Alas, this would take quite a bit of time to sketch in any sort of useful detail, as there are a number of techniques that contribute to its performance. I wrote a paper about it a loooong time ago (mid 80s!) but don't have a copy of it. Evidently you can buy it from: http://www.ntis.gov/search/product.aspx?ABBR=DE85000703

Sorry not to be of more help ...
To add to Vern's statement, flex does a lot better job of table compression, providing several different space/time tradeoffs, and its inner loop is also considerably faster than lex's.
According to a (Usenet?) paper by Van Jacobson in the 1980s, lex was largely written by an AT&T intern. VJ described how its inner loop could be reduced from several dozen instructions to about three.
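To give a feel for what such an inner loop does, here is a toy table-driven DFA matcher in Python. It only illustrates the technique: flex's real loop is generated C operating on compressed tables, with buffering, accepting states, and backtracking on top.

```python
def run_dfa(delta, accepting, text, start=0):
    """Advance a DFA over `text` one character at a time.
    The hot path is a single table lookup per input character, which is
    why a tight, table-driven loop can be only a handful of instructions."""
    state = start
    for ch in text:
        state = delta[state].get(ch, -1)   # -1 means the dead state
        if state < 0:
            return False
    return state in accepting

# Toy DFA for the token 'ab+' (an 'a' followed by one or more 'b's).
delta = {
    0: {"a": 1},
    1: {"b": 2},
    2: {"b": 2},
}
print(run_dfa(delta, accepting={2}, text="abbb"))  # True
print(run_dfa(delta, accepting={2}, text="ba"))    # False
```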
Vern Paxson wrote flex for what he described at the time as the fastest data acquisition applications in the world. Not sure if I should go into more details here.
I had the privilege of helping Vern with the 8-bit version, as I was working on compilers that had to scan Kanji and Katakana at the time.
I'm not so sure flex is so much faster than the AT&T version of lex. Both programs were developed independently, and to avoid confusion with the official version, the authors of flex probably chose a slightly different name. They might have intended to generate faster scanners, which is also suggested by a couple of options that trade space for time. They also motivate making %option yylineno (and a few other features) optional with the speed of the generated scanner.
Whether the slight differences in speed for such scanners are still relevant is debatable. I couldn't find any official statement on the choice of name either, so I guess you'd have to ask the original authors, Jef Poskanzer and/or Vern Paxson. If you find them and get an answer, please let us know here. The history of software is interesting, and you can still get the answer first hand.