Finding Java Grammars - rascal

I'm making my way into Rascal - nice work!
I'm halfway through the Tutor and might be getting ahead of myself, but one of my interests is refactoring Java. I've watched Tijs van der Storm's Curry On 2016 talk where he presents an example of this (the "TrafoFields" module around minute 16 in the video).
I was eager to get into this kind of work, so I searched the documentation for the Java grammar and a similar example using it, to no avail. What's more, the library documentation lists only m3 and jdt under lang::java. I reloaded Tijs' video to find he uses lang::java::\syntax::Java15. I blindly tried importing that in the REPL and it worked (albeit with lots of warnings)! I opened the Rascal .jar file to find there is even more in this package.
So my questions in this context are:
Why is this not in the documentation? I would have expected the library documentation to be exhaustive. Couldn't you at least add "TrafoFields" to the recipes?!
Is there an alternative way of finding out about such modules besides the online documentation (and apart from searching the .jar file)?
What is this weird backslash in the module name before "syntax", ::\syntax?

All good questions; and also implied suggestions for improvement. In the meantime if you get stuck please ask more questions.
Short answer: we've prioritized documenting what is used most, then what is used in university courses, and then what we've received questions about. So thanks for asking.
In reverse order:
The backslash is a generic escape for using identifiers in Rascal that are also keywords of the language. So any variable, package, or module could be named "if", but you'd have to escape it as \if. This helps a lot when defining abstract syntax trees for programming languages, where you can imagine "if" being a great name for an AST node type. "syntax" happens to be a keyword in Rascal too, hence the \syntax in the module name.
The Rascal explorer view in Eclipse can be used to browse the library interactively. You can also crawl the library using util::FileSystem and get a first-class representation of everything in a library location like |std:///|. Agreed, these are poor man's solutions; a good feature would be a fully indexed, searchable library. We're halfway there (see the analysis::text:: Lucene pre-work we've done towards supporting this).
That is a rhetorical question, which I accept as a suggestion for improvement 😉😊 The answer is yes.

Related

is there a way to include another file in smtlib?

Similar to #include in C, for importing functions and axioms that are defined in another file. I wasn't able to find such functionality described in the SMTLIB documentation or in the online examples. Any hints?
SMTLib has no means of #include'ing or importing other files. This might look like a shortcoming, but it is quite rare for people to hand-write SMTLib files: it is almost always machine-generated from a higher-level language, and it is assumed that whoever generates the SMTLib can simply spit out one big file that includes everything you need.
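In that spirit, a generator can emulate an include simply by concatenating shared declaration files into the final script before handing it to the solver. A minimal Python sketch; the file names and the build_script helper are invented for illustration:

    # Hypothetical helper: read shared declaration files and emit one big
    # SMT-LIB script, which is the form solvers actually consume.
    from pathlib import Path

    def build_script(parts, out="query.smt2"):
        text = "\n".join(Path(p).read_text() for p in parts)
        Path(out).write_text(text + "\n(check-sat)\n")

    # "common-sorts.smt2" and "axioms.smt2" are made-up names for files
    # you would otherwise want to #include before the actual goal.
    build_script(["common-sorts.smt2", "axioms.smt2", "goal.smt2"])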
Having said that, I think this would indeed be a useful feature to have. The SMTLib standard is always evolving, and such features are usually discussed on their mailing list:
https://groups.google.com/forum/#!forum/smt-lib
Feel free to join the discussion and make a request!

Concise description of the Lua vm?

I've skimmed Programming in Lua and I've looked at the Lua Reference.
However, they both tell me that this function does this, but not how.
When reading SICP, I got this feeling of: "ah, here's the computational model underlying Scheme"; I'm trying to get the same sense concerning Lua, i.e. a concise description of its VM, a "how" rather than a "what".
Does anyone know of a good document (besides the C source) describing this?
You might want to read the No-Frills Intro to Lua 5(.1) VM Instructions (pick a link, click on the Docs tab, choose English -> Go).
I don't remember exactly where I've seen it, but I remember reading that Lua's authors specifically discourage end-users from getting into too much detail on the VM; I think they want it to be as much of an implementation detail as possible.
Besides the already mentioned A No-Frills Introduction to Lua 5.1 VM Instructions, you may be interested in this excellent post by Mike Pall on how to read the Lua source.
Also see related Lua-Users Wiki page.
See http://www.lua.org/source/5.1/lopcodes.h.html. The list starts at OP_MOVE.
The computational model underlying Lua is pretty much the same as the computational model underlying Scheme, except that the central data structure is not the cons cell; it's the mutable hash table. (At least until you get into metaprogramming with metatables.) Otherwise all the familiar stuff is there: nested first-class functions with mutable local variables (let-bound variables in Scheme), and so on.
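To make that concrete, here is a rough sketch in Python (chosen only for familiarity; the names are invented) of the two ingredients named above, a mutable table at the center plus nested first-class functions closing over mutable locals:

    # A table-as-central-structure plus a closure over a mutable local,
    # the ingredients Lua shares with Scheme (minus the cons cell).
    def make_counter():
        count = 0              # a mutable local, like a Lua local
        def bump():
            nonlocal count     # the nested function captures it directly
            count += 1
            return count
        return bump

    obj = {"name": "widget", "bump": make_counter()}  # the "table"
    print(obj["bump"](), obj["bump"]())               # 1 2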
It's not clear to me that you'd get much from a study of the VM. I did some hacking on the VM a while back and it's a lot like any other register-oriented VM, although maybe a bit cleaner. Only a handful of instructions are Lua-specific.
If you're curious about metatables, the semantics is described clearly, if somewhat verbosely, in Section 2.8 of the reference manual for Lua 5.1. If you look at the VM code in src/lvm.c you'll see almost exactly that logic implemented in C (e.g., the internal Arith function). The VM instructions are specialized for the common cases, but it's all terribly straightforward; nothing clever is involved.
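As a model of that fallback logic (not the real C code in lvm.c, and with Python standing in for Lua; the metatables registry and the vector example are invented for illustration):

    # Fast path: plain numbers take the specialized arithmetic route.
    # Slow path: consult an operand's metatable for an __add handler.
    metatables = {}  # maps id(value) -> {"__add": handler, ...}

    def arith_add(a, b):
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            return a + b                       # common case, no lookup
        for operand in (a, b):                 # first handler found wins
            handler = metatables.get(id(operand), {}).get("__add")
            if handler:
                return handler(a, b)
        raise TypeError("attempt to perform arithmetic on a non-number")

    vec = ("vec", 1, 2)
    metatables[id(vec)] = {"__add": lambda v, n: ("vec", v[1] + n, v[2] + n)}
    print(arith_add(vec, 10))  # ('vec', 11, 12)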
For years I've been wanting a more formal specification of Lua's computational model, but my tastes run more toward formal semantics...
I've found The Implementation of Lua 5.1 very useful for understanding what Lua is actually doing.
It explains the hashing techniques, garbage collection and some other bits and pieces.
Another great paper is The Implementation of Lua 5.0, which describes the design and motivation of various key systems in the VM. I found that reading it was a great way to parse and understand what I was seeing in the C code.
I am surprised you refer to the C source for the VM, as this is protected by lua.org and Tecgraf/PUC-Rio in Brazil, especially as the language is used for real business and commercial applications in a number of countries. The paper The Implementation of Lua contains as much detail about the VM as it is permitted to include, but the structure of the VM is proprietary. It is worth noting that versions 5.0 and 5.1 were commissioned by IBM in Europe for use on customer mainframes, and their register-based version has a VM which accepts the IBM-defined format of intermediate instructions.

F# parsing Abstract Syntax Trees

What is the best way to use F# to parse an AST to build an interpreter? There are plenty of F# examples for trivial syntax (basic arithmetic operations), but I can't seem to find anything for languages with a much larger range of features.
Discriminated unions look to be incredibly useful, but how would you go about constructing one with a large number of options? Is it better to define the types (say addition, subtraction, conditionals, control flow) elsewhere and just bring them together as predefined types in the union?
Or have I missed some far more effective means of writing interpreters? Is having an eval function for each type more effective, or perhaps using monads?
Thanks in advance
Discriminated unions look to be incredibly useful, but how would you go about constructing one with a large number of options? Is it better to define the types (say addition, subtraction, conditionals, control flow) elsewhere and just bring them together as predefined types in the union?
I am not sure what you are asking here; even with a large number of options, DUs are still simple to define. See e.g. this blog entry for a tiny language's DU structure (as well as a more general discussion about writing tree transforms). It's fine to have a DU with many more cases, and it is common in compilers/interpreters to use such a representation.
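For a feel of the shape, here is a sketch in Python 3.10+ (used only because it is compact here; dataclasses stand in for DU cases, pattern matching for the eval dispatch, and the case names are invented):

    # Each union case is one small class; eval dispatches on the case.
    from dataclasses import dataclass

    @dataclass
    class Num:
        value: float

    @dataclass
    class Add:
        left: "Expr"
        right: "Expr"

    @dataclass
    class If:
        cond: "Expr"
        then: "Expr"
        other: "Expr"

    Expr = Num | Add | If  # grows case by case, just like a DU

    def eval_expr(e: Expr) -> float:
        match e:                       # one eval, one arm per case
            case Num(v):
                return v
            case Add(l, r):
                return eval_expr(l) + eval_expr(r)
            case If(c, t, o):
                return eval_expr(t) if eval_expr(c) != 0 else eval_expr(o)

    print(eval_expr(Add(Num(1), If(Num(1), Num(2), Num(3)))))  # 3

Adding a case (say, subtraction or a loop construct) is just another small class and another match arm, which is why large unions stay manageable.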
As for parsing, I prefer monadic parser combinators; check out FParsec or see this old blog entry. After using such parser combinators, I can never go back to anything like lex/yacc/ANTLR - external DSLs seem so primitive in comparison.
(EDIT: The 'tiny arithmetic examples' you have found are probably pretty much representative of what larger solutions look like as well. The 'toy' examples usually show off the right architecture.)
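The combinator idea itself fits in a few lines. A toy sketch (in Python, and nothing like FParsec's actual API): a parser is a function from (text, position) to a (value, new position) pair, or None on failure, and combinators build bigger parsers from smaller ones:

    def digit(s, i):                     # the smallest parser
        if i < len(s) and s[i].isdigit():
            return s[i], i + 1
        return None

    def many1(p):                        # combinator: one-or-more
        def q(s, i):
            vals, r = [], p(s, i)
            while r is not None:
                v, i = r
                vals.append(v)
                r = p(s, i)
            return (vals, i) if vals else None
        return q

    def fmap(f, p):                      # combinator: transform a result
        def q(s, i):
            r = p(s, i)
            return (f(r[0]), r[1]) if r is not None else None
        return q

    number = fmap(lambda ds: int("".join(ds)), many1(digit))
    print(number("123+x", 0))  # (123, 3)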
You should get a copy of Robert Pickering's "Beginning F#".
Chapter 13, "Parsing Text", contains an example with FsLex and FsYacc, as suggested by Noldorin.
Other than that, chapter 12 of the same book explains how to build an actual, simple compiler for an arithmetic language the author proposes. Very enlightening. The most important part is what you are looking for: the AST parser.
Good luck.
I second Brian's suggestion to take a look at FParsec. If you're interested in doing things the old-school way with FsLex and FsYacc, one place to look for how to parse a non-trivial language is the F# source itself. See the source\fsharp\FSharp.Compiler directory in the distribution.
You may be interested in checking out the Lexing and Parsing section of the F# WikiBook. The F# PowerPack library contains the FsLex and FsYacc tools, which assist greatly with this. The WikiBook guide is a good way to get started.
Beyond that, you will need to think about how you actually want to execute the code from the AST form, a concern common to the design of both compilers and interpreters. This is generally considered the easier part, however, and there are lots of general resources on compilers/interpreters out there that should provide information on it.
I haven't built an interpreter myself. Hope the following helps :)
Here's a compiler course taught at Yale using ML, which you might find useful. The lecture notes are very concise (short) and informative. You can follow the first few lecture notes and the assignments. As you know F#, you won't have any problem reading ML programs.
Btw, the professor was a student of A. Appel, a creator of the SML/NJ implementation. So from these notes you also get the most natural way to write a compiler/interpreter in an ML-family language.
This is an excellent example of a complete Small Basic implementation with F# and FParsec. It even includes an IL compiler. The whole code is very accessible and is accompanied by a series of blog posts from the author at http://trelford.com/blog/

Standard format for concrete and abstract syntax trees

I have an idea for a hobby project which performs some code analysis and manipulation. This project will require both the concrete and abstract syntax trees of a given source file. Additionally, bi-directional references between the two trees would be helpful. I would like to avoid the work of transcribing a grammar to construct my own lexer and parser.
Is there a standard format for describing either concrete or abstract syntax trees?
Do any widely-used tool chains support outputting to these formats?
I don't have a particular target programming language in mind. Any popular one will do for a prototype, but I'd prefer one I know well: Python, C#, JavaScript, or C/C++.
I'd like the ability to run a source file through a tool or library and get back both trees. In an ideal world, it would be practical to run this tool on code as it is being edited by a user and be tolerant of errors. Again, I am simply trying to develop a prototype, so these requirements are pretty lax.
Thanks!
The research community decided that graph exchange was the right thing to do when moving information from one program analysis tool to another.
See http://www.gupro.de/GXL
More recently, the OMG has defined a standard for interchanging Abstract Syntax Trees.
See http://www.omg.org/spec/ASTM/1.0/Beta1/
This problem seems to get solved over and over again. There's half a dozen "tool bus" proposals made over the years that all solved it, yet none ever took over the industry. The problem is that a) it is easy to represent ASTs using any kind of nestable notation [parentheses like LISP, tags like XML, ...] so people roll their own solution easily, and b) for one tool to exchange an AST with another, they both have to agree essentially on what the AST nodes mean; but most ASTs are rather accidentally derived from the particular grammar/parsing technology used by each tool, and there's almost always disagreement about that between tools. So, I've seen very few tools that exchange ASTs meaningfully.
If you're doing a hobby thing, I'd stick with a lisp-like encoding of trees, where each node has the following format:
( nodetype child1 ... childn )
It's easy to generate, and easy to read.
I work on a professional tool to manipulate programs. If we have to print out the AST, we do the above. Mostly individual ASTs are far too complicated to look at in practice, so we hardly ever print out the entire AST, at best only a node and a few children deep. Our tool doesn't exchange ASTs with anybody (see above reasons :) but does just fine building the AST in memory, doing whizzy things with it for analysis or transformation reasons, and then either just deleting it (no need to send it anywhere) or regenerating the original language text from the tree. [The latter means you need anti-parsing or "prettyprinting" technology.]
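Generating that encoding is a one-line recursion. A minimal Python sketch (the tuple-based tree shape is just one arbitrary choice for illustration):

    # Dump a nested (nodetype, child, ...) tuple tree as an s-expression.
    def to_sexpr(node):
        if isinstance(node, tuple):
            head, *children = node
            return "(" + " ".join([head] + [to_sexpr(c) for c in children]) + ")"
        return str(node)

    ast = ("assign", ("var", "x"), ("add", ("num", "1"), ("num", "2")))
    print(to_sexpr(ast))  # (assign (var x) (add (num 1) (num 2)))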
In our project we defined the AST metamodel in UML and use ANTLR (Java) to populate the model. We also maintain the token information from ANTLR after parsing, but we have not yet tried to update the underlying text-file with modifications made on the model.
This has a hideous overhead (in infrastructure, such as Eclipse UML2/EMF), but our goal is to use high-level tools for Model-based/driven Development (MDD, MDA) anyway, so we decided to use it on each level.
I think one of our students once played with OpenArchitectureWare and managed to get changes from the Eclipse-based, generated editor back into the syntax tree (not related to the UML model above) automatically, but I don't know the details about this.
You might also want to look at ANTLR's tree grammars.
Specific standards might be expected here, but more general-purpose standards may also be appropriate. Ira Baxter already mentioned GXL; RDF may be added too, though it would require an appropriate ontology and is more oriented toward semantics than syntax. It may still be an option to investigate.
For specific standards, Ira Baxter already mentioned ASTM. Another one, although it rather targets a specific kind of programming language (logic languages), is the standard for semantic/conceptual graphs known as ISO/IEC 24707:2007.
Not a standard on its own, but a paper about the matter: Towards Portable Source Code Representations Using XML.
I don't know of any effectively used standard (in this area it's always home-made cooking everywhere); I'm just interested in this topic too.

Parsing, where can I learn about it

I've been given the job of 'translating' one language into another. The source is too flexible (complex) for a simple line-by-line approach with regexes. Where can I go to learn more about lexical analysis and parsers?
If you want to get "emotional" about the subject, pick up a copy of "The Dragon Book." It is usually the text in a compiler design course. It will definitely meet your need "learn more about lexical analysis and parsers" as well as a bunch of other fun stuff!
IMH(umble)O, save yourself an arm and/or leg and buy an older edition - it will fill your information desires.
Try ANTLR:
ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.
There's a book for it also.
Niklaus Wirth's book "Compiler Construction" (available as a free PDF)
http://www.google.com/search?q=wirth+compiler+construction
I've recently been working with PLY, which is an implementation of lex and yacc in Python. It's quite easy to get started with, and there are some simple examples in the documentation.
Parsing can quickly become a very technical topic and you'll find that you probably won't need to know all the details of the parsing algorithm if you're using a parser builder like PLY.
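To give a flavor of PLY's style: tokens are declared with t_-prefixed rules and grammar productions with p_-prefixed functions whose docstrings hold the BNF. A minimal, self-contained calculator sketch (the grammar itself is invented for illustration):

    import ply.lex as lex
    import ply.yacc as yacc

    tokens = ('NUMBER', 'PLUS', 'TIMES')

    t_PLUS = r'\+'
    t_TIMES = r'\*'
    t_ignore = ' \t'

    def t_NUMBER(t):
        r'\d+'
        t.value = int(t.value)
        return t

    def t_error(t):
        raise SyntaxError(f"illegal character {t.value[0]!r}")

    # Precedence is encoded structurally: expr -> term -> factor.
    def p_expr_plus(p):
        'expr : expr PLUS term'
        p[0] = p[1] + p[3]

    def p_expr_term(p):
        'expr : term'
        p[0] = p[1]

    def p_term_times(p):
        'term : term TIMES factor'
        p[0] = p[1] * p[3]

    def p_term_factor(p):
        'term : factor'
        p[0] = p[1]

    def p_factor_num(p):
        'factor : NUMBER'
        p[0] = p[1]

    def p_error(p):
        raise SyntaxError("syntax error")

    lexer = lex.lex()
    parser = yacc.yacc()
    print(parser.parse('2 + 3 * 4'))  # 14

Note that a real translator would build tree nodes in the p_ actions instead of computing values directly; the mechanics are the same.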
Lots of people have recommended books. For many these are much more useful in a structured environment with assignments and due dates and so forth. Even if not, having the material presented in a different way can help greatly.
(a) Have you considered going to a school with a decent CS curriculum?
(b) There are lots of online lectures, such as MIT's Open Courseware. Their EE/CS section has many courses that touch on parsing, though I can't see any on parsing per se. It's typically introduced in one of the first theory courses, as language classification and automata are at the heart of much of CS theory.
If you prefer Java-based tools, the Java Compiler Compiler, JavaCC, is a nice parser/scanner. It's config-file driven, and will generate Java code that you can include in your program. I haven't used it in a couple of years though, so I'm not sure how the current version is. You can find out more here: https://javacc.dev.java.net/
Lexing/parsing + type checking + code generation is a great CS exercise. I would recommend it to anyone wanting a solid basis, so I'm all for the Dragon Book.
I found this site helpful:
Lex and YACC primer/HOWTO
The first time I used lex/yacc was for a relatively simple project. This tutorial was all I really needed. When I approached more complex projects later, the familiarity I had from this tutorial and a simple project allowed me to build something fancier.
After taking (quite) a few compilers classes, I've used both The Dragon Book and C&T. I think C&T does a far better job of making compiler construction digestible. Not to take anything away from The Dragon Book, but I think C&T is a far more practical book.
Also, if you like writing in Java, I recommend using JFlex and BYACC/J for your lexing and parsing needs.
Yet another textbook to consider is Programming Language Pragmatics. I prefer it over the Dragon book, but YMMV.
If you're using Perl, yet another tool to consider is Parse::RecDescent.
If you just need to do this translation once and don't know anything about compiler technology, I would suggest that you get as far as you can with some fairly simplistic translations and then fix it up by hand. Yes, it is a lot of work. But it is less work than learning a complex subject and coding up the right solution for one job. That said, you should still learn the subject, but don't let not knowing it be a roadblock to finishing your current project.
Parsing Techniques - A Practical Guide
By Dick Grune and Ceriel J.H. Jacobs
This book (freely available as a PDF) gives an extensive overview of different parsing techniques/algorithms. If you really want to understand the different parsing algorithms, this IMO is a better reference than the Dragon Book (as Parsing Techniques focuses entirely on parsing, while the Dragon Book covers parsing only as one, although important, part of the compiler construction process).
flex and bison are the new lex and yacc though. The syntax for BNF is often derided for being a bit obtuse. Some have moved to ANTLR and Ragel for this reason.
If you're not doing much translation, you may want to pull off a one-off using multiline regexes with Perl or Ruby. Writing a compatible BNF grammar for an existing language is not a task to be taken lightly.
On the other hand, it is entirely possible to leverage any given language's .l and .y files if they are available as open source. Then, you could construct new code from an existing parse tree.
