I'd like to implement a Lisp interpreter in a Lisp dialect mainly as a learning exercise. The one thing I'm thrown off by is just how many choices there are in this area. Primarily, I'm a bit more interested in learning about some of the Lisps that have been around a while (like Scheme or Common Lisp). I don't want to use Clojure to do this for the sheer fact that I've already used it. :-)
So is one of the flavors any better than the others at parsing? And do you think it's a good idea to, say, implement Scheme in Common Lisp (or vice versa)? Or will there be enough differences between the two to throw me off?
And if it makes any difference, I'd like something that's cross-platform. I have a Windows PC, a Mac, and a Linux box, and I could end up writing this on any of them.
There are some books about that:
SICP, chapters 4 and 5
PAIP, chapter 5
LiSP (Lisp in Small Pieces), the whole book explains implementing Lisp
Anatomy of Lisp, old classic about implementing Lisp
All of the above books are highly recommended, though Anatomy of Lisp is oldish, hard to get and hard to read.
Both Scheme and Common Lisp are fine for your task.
Implementing Common Lisp is a larger task, since the language is larger. Usually one implements Common Lisp better in Common Lisp, since there are Common Lisp libraries that can be used for new Common Lisp implementations. ;-)
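To give a rough feel for the scale involved, the eval/apply core at the heart of such an interpreter is tiny. Here is a minimal sketch, written in Python rather than Scheme or Common Lisp just to keep it neutral; the function names and the nested-list representation of s-expressions are my own choices, and reading/parsing is assumed to have happened already:

    # Minimal eval/apply sketch (Python used purely for illustration).
    # Assumes s-expressions are already read into nested Python lists,
    # e.g. ['+', 1, ['*', 2, 3]]; no reader, no macros, no error handling.
    import operator

    GLOBAL_ENV = {'+': operator.add, '*': operator.mul, '<': operator.lt}

    def evaluate(expr, env=GLOBAL_ENV):
        if isinstance(expr, str):                       # variable reference
            return env[expr]
        if not isinstance(expr, list):                  # self-evaluating literal
            return expr
        head = expr[0]
        if head == 'if':                                # (if test then else)
            _, test, then, alt = expr
            return evaluate(then if evaluate(test, env) else alt, env)
        if head == 'lambda':                            # (lambda (params) body)
            _, params, body = expr
            return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
        fn = evaluate(head, env)                        # application
        return fn(*(evaluate(arg, env) for arg in expr[1:]))

    print(evaluate(['+', 1, ['*', 2, 3]]))                    # 7
    print(evaluate([['lambda', ['x'], ['*', 'x', 'x']], 5]))  # 25

A real interpreter adds a reader, more special forms, definitions, macros, and error handling on top of this skeleton, which is where the books above come in.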
PLT Scheme is an excellent platform for experimenting with programming languages, especially Lispy languages. PLT has an extensible parser (usually called a reader in Scheme) that provides reader macros to manipulate the built-in syntax; or you can completely replace the reader with your own. If you'd rather use traditional lex/yacc style parsers and lexers, PLT comes with a parser-tools module that provides those, too. As a bonus, it has comprehensive documentation and a repository for third-party packages (two things that are missing from a lot of Schemes).
The reference implementation of Arc (arclanguage.org) is a fairly simple and readable example of building a language that compiles to Scheme. It uses PLT's reader mostly, with a couple of reader macros to change the bits of Scheme syntax that differ from Arc's. There's also a JavaScript implementation available from PLT's package repository (planet.plt-scheme.org) if you want to see how to implement a non-Lisp language.
I've looked high and low for examples of implementing a language using the RPython toolchain, but the only one I've been able to find so far is this one in which the author writes a simple BF interpreter. Because the grammar is so simple, he doesn't need to use a parser/lexer generator. Is there a front-end out there that supports developing a language in RPython?
Thanks!
I'm not aware of any general lexer or parser generator targeting RPython specifically. Some with Python output may work, but I wouldn't bet on it. However, there's a set of parsing tools in rlib.parsing. It seems quite usable. OTOH, there's a warning in the documentation: It's reportedly still in development, experimental, and only used for the Prolog interpreter so far.
Alternatively, you can write the frontend by hand. Lexers can be annoying and unnatural, granted (you may be able to rip out the utility modules for DFAs used by the Python implementation). But parsers are a piece of cake if you know the right algorithms. I'm a huge fan of "Top Down Operator Precedence parsers", a.k.a. "Pratt parsers", which are reasonably simple (recursive descent) but make all expression parsing issues (nesting, precedence, associativity, etc.) a breeze; a minimal sketch follows the list below. There's depressingly little information on them, but the few blog posts were sufficient for me:
One by Crockford (wouldn't recommend it though, it throws a whole lot of unrelated stuff into the parser and thus obscures it),
another one at effbot.org (uses Python),
and a third by a sadly even-less-famous guy who's developing a language himself, Robert Nystrom.
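To make the technique concrete, here is a minimal Pratt-style parser for arithmetic expressions, sketched in Python; the token set, binding-power table, and tuple AST are my own simplifications, not taken from any of the posts above:

    # A minimal Pratt (top-down operator precedence) parser sketch.
    # Binding powers are illustrative; higher numbers bind tighter.
    import re

    INFIX_BP = {'+': 10, '-': 10, '*': 20, '/': 20}

    def tokenize(src):
        return re.findall(r'\d+|[-+*/()]', src)

    def parse(tokens):
        expr, rest = parse_expr(tokens, 0)
        assert not rest, "trailing tokens: %r" % rest
        return expr

    def parse_expr(tokens, min_bp):
        head, tokens = tokens[0], tokens[1:]
        if head == '(':                          # parenthesised sub-expression
            left, tokens = parse_expr(tokens, 0)
            tokens = tokens[1:]                  # drop the closing ')'
        else:
            left = int(head)                     # number literal
        while tokens and tokens[0] in INFIX_BP and INFIX_BP[tokens[0]] > min_bp:
            op, tokens = tokens[0], tokens[1:]
            right, tokens = parse_expr(tokens, INFIX_BP[op])
            left = (op, left, right)             # build a tuple AST node
        return left, tokens

    print(parse(tokenize("1 + 2 * (3 - 4)")))    # ('+', 1, ('*', 2, ('-', 3, 4)))

Precedence and associativity fall out of the binding-power comparison alone, which is what makes the technique so pleasant once it clicks.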
Alex Gaynor has ported David Beazley's excellent PLY to RPython. Its documentation is quite good, and he even gave a talk about using it to implement an interpreter at PyCon US 2013.
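For reference, here is what the lex/yacc style looks like with the original CPython PLY; this is a sketch of plain PLY only, and the RPython port may differ in its details, so treat the specifics as illustrative rather than a recipe for the port:

    # Minimal PLY sketch (classic CPython PLY, not the RPython port).
    import ply.lex as lex
    import ply.yacc as yacc

    tokens = ('NUMBER', 'PLUS', 'TIMES')
    t_PLUS = r'\+'
    t_TIMES = r'\*'
    t_ignore = ' \t'

    def t_NUMBER(t):
        r'\d+'
        t.value = int(t.value)
        return t

    def t_error(t):
        raise SyntaxError("illegal character %r" % t.value[0])

    # Later entries have higher precedence, so TIMES binds tighter than PLUS.
    precedence = (('left', 'PLUS'), ('left', 'TIMES'))

    def p_expr_binop(p):
        '''expr : expr PLUS expr
                | expr TIMES expr'''
        p[0] = p[1] + p[3] if p[2] == '+' else p[1] * p[3]

    def p_expr_number(p):
        'expr : NUMBER'
        p[0] = p[1]

    def p_error(p):
        raise SyntaxError("syntax error at %r" % (p,))

    lexer = lex.lex()
    parser = yacc.yacc()
    print(parser.parse("2 + 3 * 4"))   # 14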
I'm currently developing a general-purpose agent-based programming language (its syntax will be somewhat inspired by Java, and we are also using objects in this language).
Since the beginning of the project we were torn between using ANTLR or Xtext. At that time we found out that Xtext implemented only a subset of ANTLR's features, so we decided to use ANTLR for our language, giving up the full-fledged Eclipse editor that Xtext provides for free (such a nice feature).
However, to the best of my knowledge, this summer the Xtext project took a big step forward. Quoting from the link:
What are the limitations of Xtext?
Sven: You can implement almost any kind of programming language or DSL with Xtext. There is one exception, that is if you need to use so called 'Semantic Predicates' which is a rather complicated thing I don't think is worth being explained here. Very few languages really need this concept. However the prominent example is C/C++. We want to look into that topic for the next release.
And that is also reinforced in the Xtext documentation:
What is Xtext? No matter if you want to create a small textual domain-specific language (DSL) or you want to implement a full-blown general purpose programming language. With Xtext you can create your very own languages in a snap. Also if you already have an existing language but it lacks decent tool support, you can use Xtext to create a sophisticated Eclipse-based development environment providing editing experience known from modern Java IDEs in a surprisingly short amount of time. We call Xtext a language development framework.
If Xtext has got rid of its past limitations, why is it still not possible to find a complex Xtext grammar for the best-known programming languages (Java, C#, etc.)?
On the ANTLR website you can find tons of such grammar examples, whereas for Xtext the only sample I was able to find is the one reported in the documentation. So maybe Xtext is still not mature enough to be used for implementing a general-purpose programming language? I'm a bit worried about this... I would not want to start rewriting the grammar in Xtext only to find out later that it was not suited for the job.
I think nobody implemented Java or C++ because it is a lot of work (even with Xtext) and the existing tools and compilers are excellent.
However, you could have a look at Xbase and Xtend, which is the expression language we ship with Xtext. It is built with Xtext and is quite a good proof of what you can build with Xtext. We have done that in about four person-months.
I did a couple of screencasts on Xtend:
http://blog.efftinge.de/2011/03/xtend-screencast-part-1-basics.html
http://blog.efftinge.de/2011/03/xtend-screencast-part-2-switch.html
http://blog.efftinge.de/2011/03/xtend-screencast-part-3-rich-strings-ie.html
Note that you can simply embed Xbase expressions into your language.
I can't speak for what Xtext is or does well.
I can speak to the problem of developing robust tools for processing real languages, based on our experience with the DMS Software Reengineering Toolkit, which you can think of as a language manipulation framework.
First, parsing of real languages usually involves something messy in lexing and/or parsing, due to the historical ways these languages have evolved. Java is pretty clean. C# has context-dependent keywords and a rudimentary preprocessor sort of like C's. C has a full blown preprocessor. C++ is famously "hard to parse" due to ambiguities in the grammar and shenanigans with template syntax. COBOL is fairly ugly, doesn't have any reference grammars, and comes in a variety of dialects. PHP will turn you to stone if you look at it because it is so poorly defined. (DMS has parsers for all of these, used in anger on real applications).
Yet you can parse all of these with most of the available parsing technologies if you try hard enough, usually by abusing the lexer or the parser to achieve your goals (how the GNU guys abused Bison to parse C++ by tangling lexical analysis with symbol table lookup is a nice ugly case in point). But it takes a lot of effort to get the language details right, and the reference manuals are only close approximations of the truth with respect to what the compilers really accept.
If Xtext has a decent parsing engine, one can likely do this with Xtext. A brief perusal of the Xtext site suggests that the lexers and parsers are fairly decent. I didn't see anything about "semantic predicates"; we have them in DMS and they are lifesavers in some of the really dark corners of parsing. Even using really good parsing technology (we use GLR parsers), it would be very hard to parse COBOL data declarations (extracting their nesting structure during the parse) without them.
You have an interesting problem in that your language isn't well defined yet. That will make your initial parsers somewhat messy, and you'll revise them a lot. Here's where strong parsing technology helps you: if you can revise your grammar easily you can focus on what you want your language to look like, rather than focusing on fighting the lexer and parser. The fact that you can change your language definition means in fact that if Xtext has some limitations, you can probably bend your language syntax to match without huge amounts of pain. ANTLR does have the proven ability to parse a language pretty much as you imagine it, modulo the usual amount of parser hacking.
What is never discussed is what else is needed to process a language for real. The first thing you need to be able to do is to construct ASTs, which ANTLR and YACC will help you do; I presume Xtext does also. You also need symbol tables, control and data flow analysis (both local and global), and machinery to transform your language into something else (presumably more executable). Doing just symbol tables you will find surprisingly hard; C++ has several hundred pages of "how to look up an identifier"; Java generics are a lot tougher to get right than you might expect. You might also want to prettyprint the AST back to source code, if you want to offer refactorings. (EDIT: Here both ANTLR and Xtext offer what amounts to text-template driven code generation).
Yet these are complex mechanisms that take as much time as building the parser, if not more. The reason DMS exists isn't because it can parse (we view this just as the ante in a poker game), but because all of this other stuff is very hard and we wanted to amortize the cost of doing it all (DMS has, we think, excellent support for all of these mechanisms, but YMMV).
On reading the Xtext overview, it sounds like they have some support for symbol tables but it is unclear what kind of assumption is behind it (e.g., for C++ you have to support multiple inheritance and namespaces).
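To be concrete about what the symbol-table skeleton looks like: it is a chain of nested scopes with lookup walking outward, and everything hard (multiple inheritance, namespaces, overloading, generics) is layered on top of that skeleton. The class below is a made-up minimal Python version, not anything DMS, ANTLR, or Xtext provides:

    # A minimal nested-scope symbol table sketch (illustrative names only).
    class Scope:
        def __init__(self, parent=None):
            self.parent = parent
            self.symbols = {}

        def define(self, name, info):
            if name in self.symbols:
                raise NameError("duplicate definition of %r in this scope" % name)
            self.symbols[name] = info

        def lookup(self, name):
            scope = self
            while scope is not None:
                if name in scope.symbols:
                    return scope.symbols[name]
                scope = scope.parent
            raise NameError("undefined symbol %r" % name)

    # Usage: an inner block sees its own locals plus everything outward.
    global_scope = Scope()
    global_scope.define("print", {"kind": "builtin"})
    block = Scope(parent=global_scope)
    block.define("x", {"kind": "local", "type": "int"})
    assert block.lookup("print")["kind"] == "builtin"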
If you are already started down the ANTLR road and have something running, I'd be tempted to stay the course; I doubt if Xtext will offer you a lot of additional help. If you really really want Xtext's editor, then you can probably switch at the price of restructuring what grammar you have (this is a pretty typical price to pay when changing parsing paradigms). Expect most of your work to appear after you get the parser right, in an ad hoc way. I doubt you will find Xtext or ANTLR much different here.
I guess the most simple answer to your question is: many general-purpose languages can be implemented using Xtext. But since there is no general answer to which parser capabilities a general-purpose language needs, there is no general answer to your question.
However, I've got a few pointers:
With Xtext 2.0 (released this summer), Xtext supports syntactic predicates. This is one of the most requested features for handling ambiguous syntax without enabling ANTLR's backtracking.
You might want to look at the brand-new languages Xbase and Xtend, which are (judging based on their capabilities) general-purpose and which are developed using Xtext. Sven has some nice screen casts in his blog: http://blog.efftinge.de/
Regarding your question why we don't see Xtext-grammars for Java, C++, etc.:
With Xtext, a language is more than just a grammar, so having a grammar that describes a language's syntax is a good starting point but usually not an artifact valuable enough for shipping. The reason is that with an Xtext grammar you also define the AST's structure (Abstract Syntax Tree, and in fact an Ecore model), including true cross references. Since this model is the main internal API of your language, people usually spend a lot of thought on designing it. Furthermore, to resolve cross references (aka linking) you need to implement scoping (as it is called in Xtext). Without a proper implementation of scoping you either cannot have true cross references in your model or you'll get many linking errors.
I guess my point is that creating a grammar + designing the AST model + implementing scoping is just a little more effort than taking a grammar from some language zoo and translating it into Xtext's syntax.
Got an idea: functions (in FP) could be composed in similar ways to components in OOP. For components in OOP we use interfaces; for functions we could use delegates. The goal is to achieve decomposition, modularity, and interchangeability. We could employ dependency injection to make it easier.
I tried to find something about the topic. No luck. Probably because there are no functional programs big enough to need this? While searching for enterprise-scale applications written in FP I found this list, Functional Programming in the Real World, and this paper.
I hope I just missed the killer applications for FP, which would be big enough to deserve decomposition.
Question: Could you show decent real-world FP application (preferably open source), which uses decomposition into modules?
Bonus chatter: What is the usual pattern used? What kind of functions are usually decomposed into separate modules? Are the implementations ever mocked for testing purposes?
Some time ago I was learning F# and wondering about the same topics, so I asked about quality open source projects to learn from.
The reason why you're not seeing anything similar to dependency injection in functional programming is that it's just "natural", because you "inject dependencies" just by passing or composing functions. Or as this article puts it, "Functional dependency injection == currying", but that's just one mechanism.
Mocking frameworks are not necessary. If you need to mock something, you just pass a "stub" function.
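A small sketch of what that looks like in practice (Python here, with made-up names like fetch_user and send_email): the "injection" is just parameter passing or partial application, and the "mock" is an ordinary stub function.

    from functools import partial

    def greet_user(fetch_user, send_email, user_id):
        """Business logic with its dependencies passed in as plain functions."""
        user = fetch_user(user_id)
        send_email(user["email"], "Hello, %s!" % user["name"])

    # Production wiring: partially apply the real dependencies (currying-style).
    # greet = partial(greet_user, db_fetch_user, smtp_send_email)

    # Test wiring: the "mocks" are just stub functions and a list recording calls.
    sent = []
    fake_fetch = lambda uid: {"name": "Ada", "email": "ada@example.com"}
    fake_send = lambda addr, msg: sent.append((addr, msg))

    greet_user(fake_fetch, fake_send, user_id=42)
    assert sent == [("ada@example.com", "Hello, Ada!")]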
See also this question about real-world Scala applications.
Either we're talking at cross-purposes (it's possible: I'm rather unfamiliar with OOP terminology) or you're missing a lot about functional programming. Modules and abstraction (i.e. interchangeability) were basically invented in the functional language CLU. The seminal papers on abstract types are James Morris's Protection in programming languages and Types are not sets. Later, most improvements in module systems and abstraction have come from the functional programming world, through ML-like languages.
The killer application for functional programming is often said to be symbolic manipulation. Most compilers for functional languages are written in the language itself, so you could look up the source of your favorite functional language implementation. But pretty much any nontrivial program (functional or not) is written in a modular way to some extent — maybe I'm missing something about what you mean by “decomposition”? The modularity will be more visible and use more advanced concepts in strongly typed languages with an advanced module system, such as Standard ML and Objective Caml.
I have been programming since 1999 for work and fun. I want to learn new things, and lately I've been focused on parsing, as a large part of my job is reading, integrating and analyzing data. I also have a large number of repetitive tasks that I think I could express in very simple domain-specific languages if the overhead was low enough. I have a few questions about the subject.
Most of my current parsing code doesn't define a formal grammar. I usually hack something together in my language of choice because that's easy, I know how to do it, and I can write that code very fast. It's also easy for other people I work with to maintain. What are the advantages and disadvantages of defining a grammar and generating a real parser (as one would do with ANTLR or YACC), compared with the hacks that most programmers use to write parsers?
What are the best parser generation tools for writing grammar-based parsers in C++, Perl and Ruby? I've looked at ANTLR and haven't found much about using ANTLRv3 with a C++ target, but otherwise that looks interesting. What are the other tools that are similar to ANTLR that I should be reading about?
What are the canonical books and articles that someone interested in learning more about parsing should read? A course in compilers unfortunately wasn't part of my education, so basic material is very welcome. I've heard great things about the Dragon Book, but what else is out there?
On 1., I would say the main advantage is maintainability -- making a little change to the language just means making a correspondingly-small change to the grammar, rather than minutely hacking through the various spots in the code that may have something to do with what you want changed... orders of magnitude better productivity and smaller risk of bugs.
On 2. and 3., I can't suggest much beyond what you already found (I mostly use Python and pyparsing, and could comment from experience on many Python-centered parse frameworks, but for C++ I mostly use good old yacc or bison anyway, and my old gnarled copy of the Dragon Book -- not the latest edition, actually -- is all I keep at my side for the purpose...).
Here's my take on your (very good) questions:
I think a parser benefits most from non-trivial situations where a grammar actually exists. You have to know about how parsers and grammars work to think of that technique, and not every developer does.
lex/yacc are older Unix tools that might be usable for you as a C++ developer. Maybe Bison as well.
ANTLR and its attendant book are very good. "Writing Compilers and Interpreters" has C++ examples which you might like.
The GoF Interpreter pattern is another technique for writing "little languages". Take a look at that.
Let's Build A Compiler is a step-by-step tutorial on how to write a simple compiler. The code is written in Delphi (Pascal), but it's basic enough to easily translate into most other languages.
I would have a serious look at monadic combinator-based parsing (which often also deals with lexical analysis) in Haskell. I found it quite an eye opener; it's amazing how easily you can build a parser from scratch using this method. It's so easy, in fact, that it's often faster to write your own parser than it is to try to use existing libraries.
The most famous example is probably Parsec which has a good user guide that explains how to use it. There is a list of ports of this library to other languages (including C++ and Ruby) listed on the Parsec page of the Haskell wiki, though I'm not familiar with them and so I can't say how close they are to using Parsec in Haskell.
If you want to learn how these work internally and how to write your own, I recommend starting with Chapter 8 ("Functional Parsers") from Graham Hutton's Programming in Haskell. Once you understand that chapter well (which will probably take several readings), you'll be set.
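If you want a feel for the idea without reading Haskell first, here is a very small combinator sketch in Python, loosely Parsec-flavoured; the combinator names (char, one_of, bind, alt, many) and the list-of-(result, rest) representation are my own choices, not any library's API:

    # A parser is a function: input string -> list of (result, remaining input).
    def char(c):
        return lambda s: [(c, s[1:])] if s[:1] == c else []

    def one_of(chars):
        return lambda s: [(s[0], s[1:])] if s[:1] and s[0] in chars else []

    def pure(x):
        return lambda s: [(x, s)]

    def bind(p, f):                       # sequence: run p, feed its result to f
        return lambda s: [pair for (x, rest) in p(s) for pair in f(x)(rest)]

    def alt(p, q):                        # choice: try p, fall back to q
        return lambda s: p(s) or q(s)

    def many(p):                          # zero or more repetitions of p
        def run(s):                       # note: p must consume input on success
            results, rest = [], s
            while True:
                r = p(rest)
                if not r:
                    return [(results, rest)]
                x, rest = r[0]
                results = results + [x]
        return run

    digit = one_of("0123456789")
    number = bind(digit, lambda d:
             bind(many(digit), lambda ds:
             pure(int(d + "".join(ds)))))

    print(number("123abc"))               # [(123, 'abc')]
    print(number("abc"))                  # []  (no parse)

The Haskell versions add types, better error reporting, and nicer notation, but the core idea of building big parsers by gluing together small ones is exactly this.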
In Perl, the Parse::RecDescent module is the first place to start. Add "tutorial" to the module name and Google should be able to find plenty of tutorials to get you started.
Defining a grammar using BNF, EBNF, or something similar is easier, and later on you will have a better time maintaining it. Also, you can find a lot of examples of grammar definitions. Last but not least, if you are going to talk about your grammar to someone else in the field, it is better if you are both speaking the same language (BNF, EBNF, etc.).
Writing your own parsing code is like reinventing the wheel and is prone to errors. It is also less maintainable. Of course, it can be more flexible, and for small projects it might also be a good choice, but using an existing parser generator that takes a grammar and spits out the code should cover most of your needs.
For C++ I would also suggest lex/yacc. For Ruby this looks like a decent choice: Coco/R(uby)
Funny timing: I spent lots of this morning wondering about state machines and parsers, and trying to figure out how I could learn more about them.
For 2, you might take a look at Ragel (it's good for C++ and Ruby).
Here's a tutorial on a self-contained (10 pages!), completely portable compiler-compiler which can be used to design and implement "low overhead" DSLs very quickly:
http://www.bayfronttechnologies.com/mc_tutorial.html
This site walks you through Val Schorre's 1964 paper on MetaII. Yes, 1964. And it is amazing. This is how I learned about compilers back in 1970.
What is the added value for learning F# when you are already familiar with LISP?
Static typing (with type inference)
Algebraic data types
Pattern matching
Extensible pattern matching with active patterns.
Currying (with a nice syntax)
Monadic programming, called 'workflows', provides a nice way to do asynchronous programming.
A lot of these are relatively recent developments in the programming language world. This is something you'll see in F# that you won't in Lisp, especially Common Lisp, because the F# standard is still under development. As a result, you'll find there is a quite a bit to learn. Of course things like ADTs, pattern matching, monads and currying can be built as a library in Lisp, but it's nicer to learn how to use them in a language where they are conveniently built-in.
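For readers who haven't met the first two items on that list, here is a rough analogue in Python 3.10+ (dataclasses standing in for an algebraic data type, and the match statement for pattern matching); unlike F#, none of this is checked statically, so the sketch only conveys the flavour, and the Tree/Leaf/Node names are invented for illustration:

    from dataclasses import dataclass
    from typing import Union

    @dataclass
    class Leaf:
        value: int

    @dataclass
    class Node:
        left: "Tree"
        right: "Tree"

    Tree = Union[Leaf, Node]          # a two-case "algebraic data type"

    def total(tree: Tree) -> int:
        match tree:                   # structural pattern matching on the cases
            case Leaf(value=v):
                return v
            case Node(left=l, right=r):
                return total(l) + total(r)

    print(total(Node(Leaf(1), Node(Leaf(2), Leaf(3)))))   # 6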
The biggest advantage of learning F# for real-world use is its integration with .NET.
Comparing Lisp directly to F# isn't really fair, because at the end of the day with enough time you could write the same app in either language.
However, you should learn F# for the same reasons that a C# or Java developer should learn it - because it allows functional programming on the .NET platform. I'm not 100% familiar with Lisp, but I assume it has some of the same problems as OCaml in that there isn't stellar library support. How do you do Database access in Lisp? What about high-performance graphics?
If you want to learn more about 'Why .NET', check out this SO question.
If you knew F# and Lisp, you'd find this a rather strange question to ask.
As others have pointed out, Lisp is dynamically typed. More importantly, the unique feature of Lisp is that it's homoiconic: Lisp code is a fundamental Lisp data type (a list). The macro system takes advantage of that by letting you write code which executes at compile-time and modifies other code.
F# has nothing like this - it's a statically typed language which borrows a lot of ideas from ML and Haskell, and it runs on .NET.
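To make "code is data" concrete for readers who haven't used Lisp: a macro is essentially a function that receives code as an ordinary data structure and returns rewritten code before anything runs. Here is a rough Python analogue using nested lists to stand in for Lisp lists; the when form and the expander name are invented for illustration, and a real Lisp does this inside the compiler with quasiquotation:

    # Expand a made-up (when test body) form into (if test body nil),
    # the way a macro would, by rewriting the code-as-data before evaluation.
    def expand_when(form):
        if isinstance(form, list) and form and form[0] == "when":
            _, test, body = form
            return ["if", expand_when(test), expand_when(body), "nil"]
        if isinstance(form, list):
            return [expand_when(f) for f in form]
        return form

    print(expand_when(["when", ["<", "x", 10], ["print", "x"]]))
    # ['if', ['<', 'x', 10], ['print', 'x'], 'nil']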
What you are asking is akin to "Why do I need to learn to use a spoon if I know how to use a fork?"
Given that LISP is dynamically typed and F# is statically typed, I find such comparisons strange.
If I were switching from Lisp to F#, it would be solely because I had a task on my hands that hugely benefitted from some .NET-only library.
But I don't, so I'm not.
Money. F# code is already more valuable than Lisp code and this gap will widen very rapidly as F# sees widespread adoption.
In other words, you have a much better chance of earning a stable income using F# than using Lisp.
Cheers,
Jon Harrop.
F# is a very different language compared to most Lisp dialects. So F# gives you a very different angle of programming - an angle that you won't learn from Lisp. Most Lisp dialects are best used for incremental, interactive development of symbolic software. At the same time most Lisp dialects are not Functional Programming Languages, but more like multi-paradigm languages - with different dialects placing different weight on supporting FPL features (free of side effects, immutable data structures, algebraic data types, ...). Thus most Lisp dialects either lack static typing or don't put much emphasis on it.
So, if you know some Lisp dialect, then learning F# can make a lot of sense. Just don't think that much of your Lisp knowledge applies to F#, since F# is a very different language. Just as an imperative programmer used to C or Java needs to unlearn some ideas when learning Lisp, one also needs to unlearn Lisp habits (no types, side effects, macros, ...) when using F#. F# is also driven by Microsoft and takes advantage of the .NET framework.
F# has the benefit that .NET development (in general) is very widely adopted, easily available, and more mass market.
If you want to code F#, you can get Visual Studio, which many developers will already have...as opposed to getting the LISP environment up and running.
Additionally, existing .NET developers are much more likely to look at F# than LISP, if that means anything to you.
(This is coming from a .NET developer who coded, and loved, LISP, while in college).
I'm not sure if you would? If you find F# interesting, that would be a reason. If your work requires it, that would be a reason. If you think it would make you more productive or bring you added value over your current knowledge, that would be a reason.
But if you don't find F# interesting, your work doesn't require it and you don't think it would make you more productive or bring you added value, then why would you?
If, on the other hand, the question is what F# gives you that Lisp doesn't, then type inference, pattern matching, and integration with the rest of the .NET framework should be considered.
I know this thread is old, but since I stumbled on it I just wanted to comment on my reasons. I am learning F# simply for professional opportunities, since .NET carries a lot of weight in a category of companies that dominate my field. The functional paradigm has been growing in use among more quantitatively and data-oriented companies, and I'd like to be one of the early comers to this trend.

Currently there doesn't exist a strong functional language that fully and safely integrates with the .NET library. I actually attempted to use some .NET from Lisp code, and it's really a pain because the FFI only supports C primitives, while .NET interoperability requires an 'interface' construct; even though I know how to do this in C, it's really a huge pain. It would be really, really good if Lisp went the extra mile in its next standard and required a C++ class (including virtual functions with vtables) and a C#-style interface type in its FFI. Maybe even throw in a Java-style interface type too. This would allow complete interoperability with the .NET library and make Lisp a strong contender as a large-scale language.

With that said, coming from a Lisp background made learning F# rather easy. And I like how F# has gone the extra mile to provide types that you commonly see in quantitative work. I believe F# was created with mathematical work in mind, and that in itself has value over Lisp.
One way to look at this (the original question) is to match up the language (and associated tools and platforms) to the immediate task. If the task requires an overwhelming percentage of .NET code, and it would require less shoe-horning in one language than another to meet the task head-on, then take the path of least resistance (F#). If you don't need .NET capabilities, and you're comfortable working with LISP and there's no arm-bending to move away from it, keep using it.
Not really much different from comparing a hammer with a wrench. Pick the tool that fits the job most effectively. Trying to pick a tool that's objectively "best" is nonsense. And in any case, in 20 years, all of the currently "hot" languages might be outdated anyway.