Dafny, F* and Lean are all from Microsoft and seem to be proof assistants. Besides syntactic differences, are there practical aspects that make them different from one another (say, ability to do automation, expressive power, etc.)? I am new to formal verification.
Edit: I am not asking which one is better; I am merely interested in a technical comparison of the features offered by these tools. I'm looking for something like this
Each tool has a unique design, and is built and influenced by different people with different goals and philosophies, but the authors are all friends and have sat within a few offices of each other for many years.
Rustan Leino designed Dafny as a successor to many of the systems he built before, including ESC/Java and Spec#.
Dafny is based on a Java- or C#-like imperative language with the ability to write Hoare-logic-style state invariants. This allows users of the language to verify properties of methods and objects that use mutable state, loops, arrays, and so on. Dafny's core theory is a custom program logic mostly designed by Rustan and a handful of collaborators. Dafny discharges the verification conditions it generates by compiling them to Boogie, an intermediate verification language, which in turn compiles them into queries that are passed to an SMT solver such as Z3 or CVC4.
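To give a rough feel for what a Hoare-style contract looks like, here is a minimal Python sketch in which the precondition, loop invariant and postcondition are written as runtime assertions. This is only an analogy: Dafny states these with `requires`, `invariant` and `ensures` clauses and proves them statically for all inputs, whereas the asserts below only check the inputs you actually run. The function name `sum_to` is made up for illustration.

```python
def sum_to(n: int) -> int:
    """Return 0 + 1 + ... + n, checking a Hoare-style contract at run time."""
    # Precondition (a Dafny `requires` clause would state this).
    assert n >= 0
    total = 0
    i = 0
    while i < n:
        # Loop invariant (a Dafny `invariant` clause): i stays in range and
        # total always equals the sum of the integers 0..i.
        assert 0 <= i <= n and total == i * (i + 1) // 2
        i += 1
        total += i
    # Postcondition (a Dafny `ensures` clause would state this).
    assert total == n * (n + 1) // 2
    return total
```

In a verifier the invariant is what lets the tool conclude the postcondition once the loop exits with `i == n`; here it merely documents that reasoning and checks it per run.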
Dafny's design goal is to feel very similar to the imperative object-oriented languages users are familiar with, with the added ability to verify your programs.
F* is based on a new type theory designed by Nikhil Swamy and collaborators. It began as an ML-like programming language with the addition of refinement types, which were discharged in the style of Dafny, but it has evolved substantially in the past few years due to numerous outside additions, as well as influences from Dafny, Lean, LiquidHaskell, and so on.
Like Dafny, F* translates its verification conditions into queries for an SMT solver, but it does not use an intermediate verification language like Boogie. F* has recently gained the ability to use tactics, heavily influenced by the Lean tactic language.
F*'s main innovation over tools like Dafny and other refinement type systems is the use of Dijkstra monads (DMs), a way to describe the "effect" of code that gives the effect designer control over the verification conditions generated. DMs allow users to reason at different levels: for example, code in the Pure effect cannot use state or throw exceptions, and the user is able to ignore effectful features they don't use.
Lean's design is heavily influenced by Coq and other intensional type theories, and is much more similar to them. The goal of Lean is to marry the best of automated and interactive theorem provers by bringing techniques from the automated (SMT) world to the type theory world. It has very powerful meta-programming abilities, and has been gaining more and more automation. Lean does not require an SMT solver; it reimplements many of the core procedures in a specialized way for Lean's type theory.
You can view F* and Lean as covering a similar space but emphasizing different ways of getting there.
I am happy to elaborate more if this doesn't clarify.
Source: core developer of Lean, developer of F*, and sometime user and contributor to Dafny, worked at MSR for ~7 months and personally know all of the tool authors.
I'm new to compiler design and have been watching a series of youtube videos by Ravindrababu Ravula.
I am creating my own language for fun and I'm parsing it to an Abstract Syntax Tree (AST). My understanding is that these trees can be portable given they follow the same structure as other languages.
How can I create an AST that will be portable?
Side notes:
My parser is currently written in JavaScript but I might move it to C#.
I've been looking at SpiderMonkey's specs for guidance. Is that a good approach?
Portability (however defined) is not likely to be your primary goal in building an AST. Few (if any) compiler frameworks provide a clear interface which allows the use of an external AST, and particular AST structures tend to be badly-documented and subject to change without notice. (Even if they are well-documented, the complexity of a typical AST implementation is challenging.)
An AST is closely tied to the syntactic details of a language, as well as to the particular parsing strategy being used. While it is useful to be able to repurpose ASTs for multiple tasks -- compiling, linting, pretty-printing, interactive editing, static analysis, etc. -- the conflicting demands of these different use cases tend to increase complexity. Particularly at the beginning stages of language development, you'll want to give yourself a lot of scope for rapid prototyping.
The most tempting reason for portable ASTs would be to use some other language as a target, thereby saving the cost of writing code-generation, etc. However, in practice it is usually easier to generate the textual representation of the other language from your own AST than to force your parser to use a foreign AST. Even better is to target a well-documented virtual machine (LLVM, .Net IL, JVM, etc.), which is often not much more work than generating, say, C code.
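As a minimal sketch of "generate the textual representation of the other language from your own AST", the Python below walks a tiny expression tree and emits C source text. The node classes (`Num`, `Var`, `BinOp`) and the helper names are invented for this illustration and do not correspond to any existing AST format.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str            # one of "+", "-", "*", "/"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]


def emit_c(node: Expr) -> str:
    """Render an expression tree as fully parenthesized C source text."""
    if isinstance(node, Num):
        return str(node.value)
    if isinstance(node, Var):
        return node.name
    if isinstance(node, BinOp):
        return f"({emit_c(node.left)} {node.op} {emit_c(node.right)})"
    raise TypeError(f"unknown node type: {type(node).__name__}")


def emit_c_function(name: str, params: List[str], body: Expr) -> str:
    """Wrap an expression in a C function taking and returning int."""
    args = ", ".join(f"int {p}" for p in params)
    return f"int {name}({args}) {{ return {emit_c(body)}; }}"
```

For example, `emit_c_function("f", ["x", "y"], BinOp("+", Var("x"), BinOp("*", Num(2), Var("y"))))` yields `int f(int x, int y) { return (x + (2 * y)); }`. Parenthesizing everything sidesteps having to reproduce the target language's precedence rules in the emitter.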
You might want to take a look at the LLVM Kaleidoscope tutorial (the second section covers ASTs, although implemented in C++). Also, you might find this question on a sister site interesting reading. And finally, if you are going to do your implementation in JavaScript, you should at least take a look at the jison parser generator, which takes a lot of the grunt-work out of maintaining a parser and scanner (and thus allows for easier experimentation).
I'm dusting off an old project of mine which calculates a number of simple metrics about large software projects. One of the metrics is the length of files/classes/methods. Currently my code "guesses" where class/method boundaries are based on a very crude algorithm (traverse the file, maintaining a "current depth" and adjusting it whenever you encounter unquoted brackets; when you return to the level a class or method began on, consider it exited). However, there are many problems with this procedure, and a "simple" way of detecting when your depth has changed is not always effective.
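The depth-tracking heuristic described above might look roughly like the Python sketch below. It is a guess at the approach, not the asker's code: the function name `brace_spans` is made up, it assumes a brace-delimited language, and it only skips braces inside quoted string or character literals. Comments, preprocessor directives and language-specific string syntaxes are not handled, which is exactly where this kind of heuristic goes wrong.

```python
def brace_spans(source: str):
    """Return (start_line, end_line, depth) for each {...} block.

    depth is 0 for top-level blocks. Braces inside '...' or "..." are
    ignored; comments are NOT handled.
    """
    spans = []
    stack = []      # line numbers of the currently open '{'
    quote = None    # the quote character we are inside, if any
    line = 1
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\n":
            line += 1
        if quote is not None:
            if ch == "\\":
                # Skip the escaped character, still counting newlines.
                if i + 1 < len(source) and source[i + 1] == "\n":
                    line += 1
                i += 1
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "{":
            stack.append(line)
        elif ch == "}" and stack:
            start = stack.pop()
            spans.append((start, line, len(stack)))
        i += 1
    return spans
```

The length of a block is then `end_line - start_line + 1`; deciding whether a given block is a class, a method or just an `if` body still needs knowledge of the language, which is what the rest of the question is about.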
To make this give accurate results, I need to use the canonical way (in each language) of detecting function definitions, class definitions and depth changes. This amounts to writing a simple parser to generate parse trees containing at least these elements for every language I want my project to be applicable to.
Obviously parsers have been written for all these languages before, so it seems like I shouldn't have to duplicate that effort (even though writing parsers is fun). Is there some open-source project which collects ready-to-use parser libraries for a bunch of source languages? Or should I just be using ANTLR to make my own from scratch? (Note: I'd be delighted to port the project to another language to make use of a great existing resource, so if you know of one, it doesn't matter what language it's written in.)
If you want language-accurate parsing, especially in the face of language complications such as macros and preprocessor conditionals, you need full language parsers. These are actually quite a lot of work to construct, and most languages don't lend themselves nicely to the various kinds of parser generators around. Nor are most authors of a language parser interested in other languages; they tend to choose some parser generator that isn't obviously a huge roadblock when they start, implement their parser for the specific purpose they intend, and move on.
Consequence: there are very few libraries of language definitions around that are defined using a single formalism or a shared foundation. The ANTLR crowd maintains one of the larger sets IMHO, although as far as I can tell most of those parsers are not quite production-capable. There's always Bison, which has been around long enough that you'd expect a library of language definitions to be collected somewhere, but I've never seen one.
I've spent the last 15 years defining foundation machinery for program analysis and transformation, and building another such library, called the DMS Software Reengineering Toolkit. It has production quality parsers for C, C++, C#, Java, COBOL (IBM Enterprise version), JCL, PHP, Python, etc. Your opinion may of course vary from mine but these are used daily with DMS to carry out mass change tasks on large bodies of code.
I don't know of any others where the set of language definitions is mature and built on a single foundation... it may be that IBM's compilers are such a set, but IBM doesn't offer the machinery or the language definitions.
If all you want to do is compute simple metrics, you might be able to live with just lexers and ad hoc nest-counting (as you've described). Even that's harder than it looks to make it work right in most cases (check out Python's, Perl's and PHP's crazy string syntaxes). When all is said and done, even C is a surprising amount of work just to define an accurate lexer: we have several thousand lines of sophisticated regular expressions to cover all the strange lexemes you find in Microsoft and/or GNU C.
Because DMS has consistently defined, mature parsers for many languages, it follows that DMS has consistently defined, mature lexers for the same languages. We actually build a Source Code Search Engine (SCSE) that provides fast search across large bodies of code in multiple languages; it works by lexing the languages it encounters and indexing those lexemes for fast lookup. The SCSE just so happens to compute the kind of metrics you are discussing, too, as it indexes the code base, pretty much the way you describe, except that it has these language-accurate lexers to use.
You might be interested in gcc-xml if you are parsing C++. Java CUP has grammars for the Java language.
I always hear about UML being used in Java projects but never in Ruby ones. Is this just a cultural difference or is there less of a need for modeling in Ruby development because it's part of a more 'agile' culture?
Obviously you can't generalize this to everybody, but programmers in languages like Ruby and Python tend to be less drawn to large design documents and UML because they view their language of choice as being concise and expressive enough that it isn't always necessary. There's a feeling of, "I could spend time and plot all this out in UML...or I could just write some Python that actually implements the design and expresses it in a language I like to read and lots of people can read." Java programs tend to feel "heavier" than their Ruby or Python counterparts — it's part of the design of the language.
Note that I'm not saying this is true of your project or even that it's true at all as a whole — this is just what I've observed about these programming cultures.
Call me crazy but UML isn't for me regardless of the application stack.
(Note, tongue sometimes placed in cheek.)
Probably one of the biggest cultural differences is that Java is often used in projects with large numbers of programmers, led by PHBs, where the high-level system design is done by people with the title "software architect". On these sorts of projects, the people in the "software architect" role will often generate a large amount of documentation (including UML relationship and state diagrams) during the initial planning phase of the project. These and other documentation artifacts are then expected to be implemented by the hordes of non-architect programmers.
Ruby on the other hand, is the new hotness and is therefore more often chosen by people who want to program in it. Since the "architect" is the implementer, there is less need for complex upfront documentation. The implementers jot a few notes on general design guidelines and then sit down to program rather than designing upfront for others to program.
This isn't to say that you won't find a few scattered UML diagrams here or there in projects built in Ruby or other snazzy languages -- such as when someone is trying to describe a complex concept -- but such things just aren't needed as much if you are doing the work yourself.
One of the obvious reasons is that well-designed Ruby programs rely heavily on Mixins, which AFAIK simply cannot be modeled in UML at all. I know that Schärli et al. developed an extension to UML that can represent Traits, which, given the close relationship between Traits and Mixins, could probably be adapted or just reused for representing Mixins, but then it's not UML anymore.
This is a comment to the answer about mixins. Mixins can actually be modelled in UML quite easily using many different methods. Typically one uses multiple inheritance, interfaces or stereotypes (or any combination of these). Choosing the method depends on the project and personal taste - let us not forget that the main reason for modeling is to conquer complexity, better understand reality and communicate more effectively so each model needs to fit a particular problem and audience. Models are, by definition, pragmatic and so must be the process of creating them.
Let us not forget that UML is extensible using profiles and stereotypes. Such extended UML is still valid UML.
In general, UML is more expressive and less restrictive than programming languages so if something can be written down in some programming language, it can also be done in UML.
Most of the posts that I read pertaining to these utilities usually suggest using some other method to obtain the same effect. For example, questions mentioning these tools usually have at least one answer containing some of the following:
Use the boost library (insert appropriate boost library here)
Don't create a DSL use (insert favorite scripting language here)
Antlr is better
Assuming the developer ...
... is comfortable with the C language
... does know at least one scripting language (e.g., Python, Perl, etc.)
... must write some parsing code in almost every project worked on
So my questions are:
What are appropriate situations which are well suited for these utilities?
Are there any (reasonable) situations where there is not a better alternative to a problem than yacc and lex (or derivatives)?
How often in actual parsing problems can one expect to run into any shortcomings in yacc and lex which are better addressed by more recent solutions?
For a developer which is not already familiar with these tools is it worth it for them to invest time in learning their syntax/idioms? How do these compare with other solutions?
The reasons why lex/yacc and derivatives seem so ubiquitous today are that they have been around for much longer than other tools, that they have far more coverage in the literature and that they traditionally came with Unix operating systems. It has very little to do with how they compare to other lexer and parser generator tools.
No matter which tool you pick, there is always going to be a significant learning curve. So once you have used a given tool a few times and become relatively comfortable in its use, you are unlikely to want to incur the extra effort of learning another tool. That's only natural.
Also, in the 1970s when lex/yacc were created, hardware limitations posed a serious challenge to parsing. The table-driven LR parsing method used by Yacc was the most suitable at the time because it could be implemented with a small memory footprint by using a relatively small general program logic and by keeping state in files on tape or disk. Code-driven parsing methods such as LL had a larger minimum memory footprint because the parser program's code itself represents the grammar, so it needs to fit entirely into RAM to execute, and it keeps state on the stack in RAM.
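To make "the parser program's code itself represents the grammar" concrete, here is a minimal hand-written recursive-descent (LL-style) evaluator in Python for integer arithmetic. The grammar, the class name `Parser` and the helper names are chosen for this sketch only. Each grammar rule becomes one method, and the parse state lives on the call stack rather than in a table.

```python
import re

# Grammar handled by this sketch:
#   expr   := term (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUMBER | "(" expr ")"
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value //= self.factor()   # integer division, for simplicity
        return value

    def factor(self):
        tok = self.take()
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

A yacc-style tool would instead take the three grammar rules as input and generate tables that a small generic driver loop interprets.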
When memory became more plentiful a lot more research went into different parsing methods such as LL and PEG and how to build tools using those methods. This means that many of the alternative tools that have been created after the lex/yacc family use different types of grammars. However, switching grammar types also incurs a significant learning curve. Once you are familiar with one type of grammar, for example LR or LALR grammars, you are less likely to want to switch to a tool that uses a different type of grammar, for example LL grammars.
Overall, the lex/yacc family of tools is generally more rudimentary than more recent arrivals, which often have sophisticated user interfaces to graphically visualise grammars and grammar conflicts, or even resolve conflicts through automatic refactoring.
So, if you have no prior experience with any parser tools and have to learn a new tool anyway, then you should probably look at other factors such as graphical visualisation of grammars and conflicts, auto-refactoring, availability of good documentation, and the languages in which the generated lexers/parsers can be output. Don't pick any tool simply because "this is what everybody else seems to be using".
Here are some reasons I could think of for using lex/yacc or flex/bison :
the developer is already familiar with lex/yacc or flex/bison
the developer is most familiar and comfortable with LR/LALR grammars
the developer has plenty of books covering lex/yacc but no books covering others
the developer has a prospective job offer coming up and has been told that lex/yacc skills would increase his chances to get hired
the developer could not get buy-in from project members/stake holders for the use of other tools
the environment has lex/yacc installed and for some reason it is not feasible to install other tools
Whether it's worth learning these tools or not will depend heavily (almost entirely) on how much parsing code you write, or how interested you are in writing more code on that general order. I've used them quite a bit, and find them extremely useful.
The tool you use doesn't really make as much difference as many would have you believe. For about 95% of the inputs I've had to deal with, there's little enough difference between one and another that the best choice is simply the one with which I'm most familiar and comfortable.
Of course, lex and yacc produce (and demand that you write your actions in) C (or C++). If you're not comfortable with those languages, a tool that uses and produces a language you prefer (e.g. Python or Java) will undoubtedly be a much better choice. I, for one, would not advise trying to use a tool like this with a language with which you're unfamiliar or uncomfortable. In particular, if you write code in an action that produces a compiler error, you'll probably get considerably less help from the compiler than usual in tracking down the problem, so you really need to be familiar enough with the language to recognize the problem with only a minimal hint about where the compiler noticed something being wrong.
In a previous project, I needed a way to be able to generate queries on arbitrary data in a way that was easy for a relatively non-technical person to be able to use. The data was CRM-type stuff (so First Name, Last Name, Email Address, etc) but it was meant to work against a number of different databases, all with different schemas.
So I developed a little DSL for specifying the queries (e.g. [FirstName]='Joe' AND [LastName]='Bloggs' would select everybody called "Joe Bloggs"). It had some more complicated options, for example there was the "optedout(medium)" syntax which would select all people who had opted-out of receiving messages on a particular medium (email, sms, etc). There was "ingroup(xyz)" which would select everybody in a particular group, etc.
Basically, it allowed us to specify queries like "ingroup('GroupA') and not ingroup('GroupB')" which would be translated to an SQL query like this:
SELECT
    *
FROM
    Users
WHERE
    Users.UserID IN (SELECT UserID FROM GroupMemberships WHERE GroupID=2) AND
    Users.UserID NOT IN (SELECT UserID FROM GroupMemberships WHERE GroupID=3)
(As you can see, the queries aren't as efficient as possible, but that's what you get with machine generation, I guess.)
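A translation along these lines could be sketched in Python as follows. This is not the original implementation (which used a parser generator). It handles only a flat subset of the DSL, namely `ingroup('...')` terms joined by `and`/`or` with an optional `not` in front of each term, and the `GROUP_IDS` lookup table is a hypothetical stand-in for however the real system resolved group names to IDs.

```python
import re

# Hypothetical name-to-ID mapping; a real system would look these up.
GROUP_IDS = {"GroupA": 2, "GroupB": 3}

TOKEN = re.compile(r"\s*(ingroup\('([^']*)'\)|and|or|not)", re.IGNORECASE)


def tokenize(query):
    tokens, pos = [], 0
    query = query.rstrip()
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if m is None:
            raise ValueError(f"cannot parse query at offset {pos}")
        if m.group(2) is not None:
            tokens.append(("group", m.group(2)))
        else:
            tokens.append((m.group(1).lower(), None))
        pos = m.end()
    return tokens


def membership(group_name, negated):
    # The ID comes from our own table, never from user text, so it is
    # safe to interpolate; an unknown group name raises KeyError.
    group_id = GROUP_IDS[group_name]
    op = "NOT IN" if negated else "IN"
    return (f"Users.UserID {op} "
            f"(SELECT UserID FROM GroupMemberships WHERE GroupID={group_id})")


def to_sql(query):
    parts = []
    expect_term = True   # True when the next token must start a term
    negated = False
    for kind, value in tokenize(query):
        if expect_term and kind == "not" and not negated:
            negated = True
        elif expect_term and kind == "group":
            parts.append(membership(value, negated))
            negated, expect_term = False, False
        elif not expect_term and kind in ("and", "or"):
            parts.append(kind.upper())
            expect_term = True
        else:
            raise ValueError(f"unexpected token {kind!r}")
    if expect_term:
        raise ValueError("query ends where a condition was expected")
    return "SELECT * FROM Users WHERE " + " ".join(parts)
```

Because SQL gives AND higher precedence than OR, just as this flat DSL subset is assumed to, the terms can be emitted in order without extra parentheses; supporting parenthesized sub-queries would need a recursive parser.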
I didn't use flex/bison for it, but I did use a parser generator (the name of which has escaped me at the moment...)
I think it's pretty good advice to eschew the creation of new languages just to support a Domain specific language. It's going to be a better use of your time to take an existing language and extend it with domain functionality.
If you are trying to create a new language for some other reason, perhaps for research into language design, then these tools are a bit outdated. Newer generators such as ANTLR, or even newer implementation languages like ML, make language design a much easier affair.
If there's a good reason to use these tools, it's probably because of their legacy. You might already have a skeleton of a language you need to enhance, which is already implemented in one of these tools. You might also benefit from the huge volumes of tutorial information written about these old tools, for which there is not so great a corpus written for newer and slicker ways of implementing languages.
We have a whole programming language implemented in my office. We use it for that. I think it's meant to be a quick and easy way to write interpreters for things. You could conceivably write almost any sort of text parser using them, but a lot of times it's either A) easier to write it yourself quick or B) you need more flexibility than they provide.
From day 1 of my programming career, I started with object-oriented programming. However, I'm interested in learning other paradigms (something which I've said here on SO a number of times is a good thing, but I haven't had the time to do). I think I'm not only ready, but have the time, so I'll be starting functional programming with F#.
However, I'm not sure how to structure much less design applications. I'm used to the one-class-per-file and class-noun/function-verb ideas in OO programming. How do you design and structure functional applications?
Read the SICP.
Also, there is a PDF Version available.
You might want to check out a recent blog entry of mine: How does functional programming affect the structure of your code?
At a high level, an OO design methodology is still quite useful for structuring an F# program, but you'll find this breaking down (more exceptions to the rule) as you get down to lower levels. At a physical level, "one class per file" will not work in all cases, as mutually recursive types need to be defined in the same file (type Class1 = ... and Class2 = ...), and a bit of your code may reside in "free" functions not bound to a particular class (this is what F# "module"s are good for). The file-ordering constraints in F# will also force you to think critically about the dependencies among types in your program. This is a double-edged sword: it may take more work/thought to untangle high-level dependencies, but it will yield programs that are organized in a way that always makes them approachable (as the most primitive entities always come first and you can always read a program from 'top to bottom' and have new things introduced one-by-one, rather than just start looking at a directory full of files of code and not know 'where to start').
How to Design Programs is all about this (at tiresome length, using Scheme instead of F#, but the principles carry over). Briefly, your code mirrors your datatypes; this idea goes back to old-fashioned "structured programming", only functional programming is more explicit about it, and with fancier datatypes.
Given that modern functional languages (i.e. not lisps) by default use early-bound polymorphic functions (efficiently), and that object-orientation is just a particular way of arranging to have polymorphic functions, it's not really very different, if you know how to design properly encapsulated classes.
Lisps use late-binding to achieve a similar effect. To be honest, there's not much difference, except that you don't explicitly declare the structure of types.
If you've programmed extensively with C++ template functions, then you probably have an idea already.
In any case, the answer is small "classes" and instead of modifying internal state, you have to return a new version with different state.
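As a small illustration of "return a new version with different state", here is a Python sketch using a frozen dataclass. The `Account` type and `deposit` function are invented for the example; in F# the same idea is typically expressed with a record and a copy-and-update expression.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Account:
    owner: str
    balance: int


def deposit(account: Account, amount: int) -> Account:
    """Return a new Account; the one passed in is left untouched."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    return replace(account, balance=account.balance + amount)
```

After `b = deposit(a, 5)`, `a` still holds its old balance and `b` holds the new one, so any code that kept a reference to `a` never sees it change.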
F# provides the conventional OO approaches for large-scale structured programming (e.g. interfaces) and does not attempt to provide the experimental approaches pioneered in languages like OCaml (e.g. functors).
Consequently, the large-scale structuring of F# programs is essentially the same as that of C# programs.
Functional programming is a different paradigm for sure. Perhaps the easiest way to wrap your head around it is to insist that the design be laid out using a flow chart. Each function is distinct, no inheritance, no polymorphism, distinct. The data is passed around from function to function to make deletions, updates, insertion, and create new data.
On structuring functional programs:
While OO languages structure the code with classes, functional languages structure it with modules. Objects contain state and methods, modules contain data types and functions. In both cases the structural units group data types together with related behavior. Both paradigms have tools for building and enforcing abstraction barriers.
I would highly recommend picking a functional programming language you are comfortable with (F#, OCaml, Haskell, or Scheme) and taking a long look at how its standard library is structured.
Compare, for example, the OCaml Stack module with System.Collections.Generic.Stack from .NET or a similar collection in Java.
It is all about pure functions and how to compose them to build larger abstractions. This is actually a hard problem for which a robust mathematical background is needed. Luckily, there are several patterns backed by deep formal and practical research. In Functional and Reactive Domain Modeling, Debasish Ghosh explores this topic further and puts together several practical scenarios applying pure functional patterns:
Functional and Reactive Domain Modeling teaches you how to think of the domain model in terms of pure functions and how to compose them to build larger abstractions. You will start with the basics of functional programming and gradually progress to the advanced concepts and patterns that you need to know to implement complex domain models. The book demonstrates how advanced FP patterns like algebraic data types, typeclass based design, and isolation of side-effects can make your model compose for readability and verifiability.