I am trying to pretty print an AST generated from
createAstFromFile(|cwd:///Java/Hello.java|,true);
Have I just missed how to do this in the documentation?
If you mean unparsing an AST (getting the Java code back) you will have to write something yourself.
If, however, you mean printing the AST structure nicely indented, we have iprintln exactly for this purpose.
Also, for large ASTs the REPL might not like it that much; check out our (as of yet) undocumented fast print functions in util::FastPrint. The fiprintln function prints to the Rascal output window, which is a lot faster.
No, I believe the current release does not contain this feature. If you don't rewrite the AST, you can of course get the source back by reading the location, as in:
rascal>import IO;
ok
rascal>readFile(ast@\loc)
str: ...
That only works when the weather is right... The other solutions are:
to use string templates mapping the AST back to source (simplest; see the sketch after this list)
map ASTs to the Box language and call the format function (most powerful and configurable)
a hybrid of the above
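For the first option, here is a minimal sketch of the shape such an unparser takes. This is C++ rather than Rascal, and the tiny Expr type is made up for illustration; in Rascal you would write one string template per constructor, but the recursion is the same:

#include <memory>
#include <string>

// A made-up miniature expression AST: either a leaf identifier or a binary node.
struct Expr {
    std::string op;                  // "" for a leaf, otherwise e.g. "+" or "*"
    std::string name;                // identifier text for leaves
    std::shared_ptr<Expr> lhs, rhs;
};

// One "template" per node kind; nested nodes are unparsed recursively.
std::string unparse(const Expr &e) {
    if (e.op.empty())
        return e.name;
    return "(" + unparse(*e.lhs) + " " + e.op + " " + unparse(*e.rhs) + ")";
}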
I seem to recall there was a function that maps M3 ASTs back to JDT ASTs in Java and then calls the pretty-print function of JDT, but it looks like it was discontinued. In other words, here are some TODOs.
I am writing a clang tool, but I am quite new to it, and I came across a problem that I couldn't find covered in the docs (yet).
I am using the great Matchers API to find some nodes that I will later want to manipulate in the AST. The problem is that the clang tool will actually parse everything that belongs to the source file, including headers like iostream etc.
Since my manipulation will probably include some refactoring, I definitely do not want to touch each and every thing the parser finds.
Right now I am dealing with this by comparing the source files of the nodes I matched against the arguments in argv, but needless to say, this feels wrong, since it still parses through ALL the iostream code and just ignores it whilst doing so. I just can't believe there is not a way to tell the ClangTool something like:
"only match nodes which location's source file is something the user fed to this tool"
Thinking about it, this only makes sense if it is possible to create ASTs individually for each source file, but I need them to be aware of each other or share contextual knowledge, and I haven't figured out a way to do that either.
I feel like I am missing something very obvious here.
thanks in advance :)
There are several narrowing matchers that might help: isExpansionInMainFile and isExpansionInSystemHeader. For example, one could combine the latter with unless to limit matches to AST nodes that are not in system files.
There are several examples of using these in the Code Analysis and Refactoring with Clang Tools repository. For example, see the file lib/callsite_expander.h around line 34, where unless(isExpansionInSystemHeader()) is used to exclude call expressions that are in system headers. Another example is at line 27 of lib/function_signature_expander.h, where the same is used to exclude function declarations in system headers that would otherwise match.
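In case it helps, here is a minimal, hedged sketch of how those narrowing matchers are typically wired into a MatchFinder. The Printer callback and the binding names "call" and "fn" are made up for illustration, and the ClangTool/CommonOptionsParser setup is left out because it differs slightly between clang releases; the matcher and callback parts use the stock libTooling API:

#include "clang/AST/AST.h"
#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/ASTMatchers/ASTMatchers.h"
#include "llvm/Support/raw_ostream.h"

using namespace clang;
using namespace clang::ast_matchers;

// Fires only for nodes that survived the narrowing matchers.
class Printer : public MatchFinder::MatchCallback {
public:
  void run(const MatchFinder::MatchResult &Result) override {
    if (const auto *Call = Result.Nodes.getNodeAs<CallExpr>("call"))
      Call->dump();
    if (const auto *Fn = Result.Nodes.getNodeAs<FunctionDecl>("fn"))
      llvm::outs() << Fn->getNameAsString() << "\n";
  }
};

int main() {
  Printer Handler;
  MatchFinder Finder;

  // Exclude call expressions whose expansion location is in a system header.
  Finder.addMatcher(callExpr(unless(isExpansionInSystemHeader())).bind("call"),
                    &Handler);

  // Only match function declarations spelled in the main (user-supplied) file.
  Finder.addMatcher(functionDecl(isExpansionInMainFile()).bind("fn"), &Handler);

  // ... then create a ClangTool from your compilation database and run:
  // Tool.run(newFrontendActionFactory(&Finder).get());
  return 0;
}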
I was wondering, if there is a standard, canonical way in Haskell to write not only a parser for a specific file format, but also a writer.
In my case, I need to parse a data file for analysis. However, I also simulate data to be analyzed and save it in the same file format. I could now write a parser using Parsec or something equivalent and also write functions that perform the text output in the way that it is needed, but whenever I change my file format, I would have to change two functions in my code. Is there a better way to achieve this goal?
Thank you,
Dominik
The BNFC-meta package https://hackage.haskell.org/package/BNFC-meta-0.4.0.3
might be what you're looking for:
"Specifically, given a quasi-quoted LBNF grammar (as used by the BNF Converter) it generates (using Template Haskell) a LALR parser and pretty pretty printer for the language."
Update: I found this package, which also seems to fulfill the objective (not tested yet): http://hackage.haskell.org/package/syntax
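Neither of those packages is shown here, but as a hedged, language-agnostic illustration of the idea behind them (one description that drives both reading and writing), here is a small C++ sketch; the Record and Field names and the tab-separated format are invented for the example:

#include <sstream>
#include <string>
#include <vector>

struct Record { double time; double value; int flag; };

// One description of the file format: field name plus a getter/setter pair.
struct Field {
    const char *name;
    std::string (*get)(const Record &);
    void (*set)(Record &, const std::string &);
};

static const std::vector<Field> kFormat = {
    {"time",  [](const Record &r){ return std::to_string(r.time); },
              [](Record &r, const std::string &s){ r.time = std::stod(s); }},
    {"value", [](const Record &r){ return std::to_string(r.value); },
              [](Record &r, const std::string &s){ r.value = std::stod(s); }},
    {"flag",  [](const Record &r){ return std::to_string(r.flag); },
              [](Record &r, const std::string &s){ r.flag = std::stoi(s); }},
};

// Writer and parser both walk the same table, so a format change happens in one place.
std::string writeRecord(const Record &r) {
    std::ostringstream out;
    for (size_t i = 0; i < kFormat.size(); ++i)
        out << (i ? "\t" : "") << kFormat[i].get(r);
    return out.str();
}

Record parseRecord(const std::string &line) {
    Record r{};
    std::istringstream in(line);
    std::string tok;
    for (size_t i = 0; i < kFormat.size() && std::getline(in, tok, '\t'); ++i)
        kFormat[i].set(r, tok);
    return r;
}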
This is a rather technical question about the compilation process of
ABAP code.
I know that there are ABAP parser and scanner classes that actually
call C kernel functions to do the real work. Then there is code completion
functionality with a transaction that returns and prints the AST (abstract syntax tree) of a program as an ABAP list or XML.
Now my question is: would it be possible to 'skip' the ABAP source code and produce such an AST directly, by other means than writing an ABAP program in SE80 or so, and then hand that AST to some function that compiles and executes it as if it had been written in and parsed from ABAP code?
That is, can I skip the scanning and parsing of sources and directly give an AST to the compiler? If so, in what format? ABAP lists look more like a printing format, not like e.g. Lisp lists surrounded by parentheses.
Unfortunately, the ABAP Compiler does not use ASTs to generate the VM code.
The ABAP compiler works sequentially and translates statement by statement (i.e. everything that is between two ".") into one or more virtual machine opcodes.
If you are curious, you could take a look at transaction SYNT which shows the compiler output. You could also take a look at report RSLOAD00 which shows the ABAP VM code that has been generated for a program.
ASTs have only been built on top to allow for code completion or high-level analyses.
If you want to invoke the ABAP compiler, you will need to generate textual ABAP source code.
I encountered a problem while doing my student research project. I'm an electrical engineering student, but my project has somewhat to do with theoretical computer science: I need to parse a lot of Pascal source code files for type definitions and constants and visualize all occurrences. The type definitions are spread recursively over various files, i.e. there is a type a = byte in file x; in file y there is a record (struct) b that contains type a; and then there is even a type c in file z that is an array of type b.
My idea so far was to learn about compiler construction, since a compiler has to resolve all type definitions and break them down to the elemental types.
So, I've read about compiler construction in two books (one of which was even written by the Pascal inventor), but I'm lacking so many basics of theoretical computer science that it took me one week alone to work my way halfway through. What I've learned so far is that for achieving my goal, a lexer and parser should be sufficient. Since this software is only a really small part of the whole project, I can't spend so much time on it, so I started experimenting with flex and later with antlr.
My hope was that parsing for type definitions only would be such an easy task that I could manage it with just a scanner and let it do some of the parser's work. The Pascal files consist of 5 main parts, each one being optional: a header with comments, a const section, a type section, a var section and (in the fewest cases) a code section. Each section has a start identifier but no clear end identifier. So I started searching for the start of the type and const sections (TYPE, CONST), discarding everything else. In flex, this is fairly easy, because it allows "start conditions". They can be used as various states like "INITIAL", "TYPE-SECTION", "CONST-SECTION" and "COMMENT" with different rules for each state. I wanted to get back a string from the scanner with the syntax "<typename> = <typedefinition>". There was one thing that made this task difficult: some type definitions contain comments, like in this example: AuEingangsBool_t {PCMON} = MAX_AuEingangsFeld;. The scanner cannot extract such a type definition with a regular expression.
My next step was to do it properly with a scanner AND parser, so I searched for a parser generator and found antlr. Since I am writing the tool in C# anyway, I decided to use its scanner generator too, so that I do not have to communicate between different programs. Now I encountered the following problem: AFAIK, antlr does not support "start conditions" as flex does. That means I have to scan the whole file (okay, comments still get discarded) and I get a lot of unnecessary (and wrong) tokens. Because I don't use rules for the whole Pascal grammar, the scanner would identify most keywords of the Pascal syntax as user identifiers, and the parser would nag about all those series of tokens that do not fit the type and constant definitions.
Now, finally, my question(s): can any of you tell me which approach leads anywhere for my project? Is there a possibility to scan only parts of the source files with antlr? Or do I have to connect flex with antlr for that purpose? Can I tell antlr's parser to ignore every token that is not in the const or type section? Are those tools too powerful for my task, and should I write my own routines instead?
You'd be better off finding a compiler for Pascal and simply modifying it to report the information you want. Presumably there is such a compiler for your Pascal, and often the source code for such compilers is available.
Otherwise you essentially need to build a parser. Building a lexer and then hacking around with the resulting lexemes is essentially building a bad parser by ad hoc methods. ANTLR is a good way to go; you can define the lexemes (including means to pick up and ignore comments) pretty easily, especially for older dialects of Pascal. You'll need good BNF rules for the type information that you want, and you translate those rules to the parser generator. What you can do to minimize work is to cheat on the rules for the parts of the language you don't care about. For instance, you could write an accurate subgrammar for assignment statements; but since you don't care about them, you can instead write a sloppy subgrammar that treats an assignment statement as anything that begins with an identifier, is followed by arbitrary other tokens, and ends with a semicolon. This kind of grammar is called an "island grammar"; it is only accurate where it needs to be accurate.
I don't know about the recursive bit. Is there a reason you can't just process each file separately? The answer may depend on what information you want to know about each type declaration, and if you go deep enough, you may need a symbol table as well as an island parser. Parser generators offer you no help for this.
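To make the island idea concrete, here is a rough, hedged sketch in plain C++ (not the C#/ANTLR setup described in the question, and with invented helper names). Comments are stripped first, so a {PCMON} inside a declaration cannot break anything; then a tiny state machine collects "name = definition;" entries only inside TYPE/CONST sections and treats everything else as water. It deliberately ignores nested blocks and reused keywords, which is exactly the limitation the next answer points out:

#include <cctype>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Remove { ... } and (* ... *) comments so they cannot break the extraction.
std::string stripComments(const std::string &src) {
    std::string out;
    for (size_t i = 0; i < src.size();) {
        if (src[i] == '{') {
            while (i < src.size() && src[i] != '}') ++i;
            ++i;
        } else if (src[i] == '(' && i + 1 < src.size() && src[i + 1] == '*') {
            i += 2;
            while (i + 1 < src.size() && !(src[i] == '*' && src[i + 1] == ')')) ++i;
            i += 2;
        } else {
            out += src[i++];
        }
    }
    return out;
}

bool isSectionKeyword(const std::string &w) {
    return w == "TYPE" || w == "CONST" || w == "VAR" || w == "BEGIN";
}

// Tiny state machine: outside vs. inside a TYPE or CONST section.
void extractDeclarations(std::istream &in) {
    std::string word, section, decl;
    while (in >> word) {
        std::string upper;
        for (char c : word)
            upper += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        if (isSectionKeyword(upper)) {
            section = (upper == "TYPE" || upper == "CONST") ? upper : "";
            decl.clear();
            continue;
        }
        if (section.empty()) continue;              // the "water": ignore it
        decl += (decl.empty() ? "" : " ") + word;   // the "island": collect a decl
        if (!word.empty() && word.back() == ';') {  // one declaration ends at ';'
            std::cout << section << ": " << decl << "\n";
            decl.clear();
        }
    }
}

int main(int argc, char **argv) {
    if (argc < 2) { std::cerr << "usage: extract <file.pas>\n"; return 1; }
    std::ifstream f(argv[1]);
    std::stringstream buf;
    buf << f.rdbuf();
    std::istringstream in(stripComments(buf.str()));
    extractDeclarations(in);
}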
First, there can be type and const blocks within other blocks (procedures, in later Delphi versions also classes).
Moreover, I'm not entirely sure that you can actually simply scan for a const token and then start parsing. const is also used for other purposes in most common (Borland) Pascal dialects. Some keywords can be reused in a different context, and if you don't parse the global block structure and only look for const and type, you will erroneously start parsing in places where those keywords mean something else.
A basic problem, of course, is the comments. Scanners cut out comments as early as possible and don't consider them further. You probably have to set up the scanner so that comments are attached to adjacent tokens as a field (associate them with the preceding token, or save them up until a certain token follows).
As far as antlr vs flex goes, no clue. The only parser generator I have some minor experience with for parsing Pascal is Coco/R (a parser generator popular with Wirthians), but in general I (and many Pascalians) prefer handwritten parsers.
By concept/function/implementation, what are the differences between compilers and parsers?
A compiler is often made up of several components, one of which is a parser.
A common set of components in a compiler is (a toy sketch of the full pipeline follows the list):
Lexer - break the program up into words.
Parser - check that the syntax of the sentences is correct.
Semantic Analysis - check that the sentences make sense.
Optimizer - edit the sentences for brevity.
Code generator - output something with equivalent semantic meaning using another vocabulary.
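Here is a hedged toy sketch of those stages end to end for a tiny expression language. Everything in it (Tok, Node, the stack-machine opcodes) is invented for illustration, and the semantic analysis and optimizer stages are skipped because the toy language has nothing to check or optimize:

#include <cctype>
#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Lexer: break the program up into "words" (tokens).
struct Tok { char kind; long value; };            // kind: 'n' number, '+', '*', '$' end
std::vector<Tok> lex(const std::string &src) {
    std::vector<Tok> out;
    for (size_t i = 0; i < src.size();) {
        if (std::isspace(static_cast<unsigned char>(src[i]))) { ++i; continue; }
        if (std::isdigit(static_cast<unsigned char>(src[i]))) {
            long v = 0;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                v = v * 10 + (src[i++] - '0');
            out.push_back({'n', v});
        } else {
            out.push_back({src[i++], 0});
        }
    }
    out.push_back({'$', 0});
    return out;
}

// Parser: check that the "sentences" are well formed and build a tree (AST).
struct Node { char op; long value; std::unique_ptr<Node> lhs, rhs; };
using P = std::unique_ptr<Node>;
struct Parser {
    const std::vector<Tok> &t;
    size_t pos;
    P factor() {
        if (t[pos].kind != 'n') throw std::runtime_error("number expected");
        return P(new Node{'n', t[pos++].value, nullptr, nullptr});
    }
    P term() {                                     // term := factor ('*' factor)*
        P n = factor();
        while (t[pos].kind == '*') {
            ++pos;
            P l = std::move(n), r = factor();
            n = P(new Node{'*', 0, std::move(l), std::move(r)});
        }
        return n;
    }
    P expr() {                                     // expr := term ('+' term)*
        P n = term();
        while (t[pos].kind == '+') {
            ++pos;
            P l = std::move(n), r = term();
            n = P(new Node{'+', 0, std::move(l), std::move(r)});
        }
        return n;
    }
};

// Code generator: output "another vocabulary", here a little stack machine.
void emit(const Node &n, std::vector<std::string> &code) {
    if (n.op == 'n') { code.push_back("PUSH " + std::to_string(n.value)); return; }
    emit(*n.lhs, code);
    emit(*n.rhs, code);
    code.push_back(n.op == '+' ? "ADD" : "MUL");
}

int main() {
    std::vector<Tok> toks = lex("1 + 2 * 3");      // lexing
    Parser p{toks, 0};
    P tree = p.expr();                              // parsing -> AST
    std::vector<std::string> code;
    emit(*tree, code);                              // code generation
    for (const std::string &ins : code) std::cout << ins << "\n";
    // Prints: PUSH 1, PUSH 2, PUSH 3, MUL, ADD
}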
To add a little bit:
As mentioned elsewhere, Small C is a recursive descent compiler that generated code as it parsed. Basically syntactic analysis, semantic analysis, and code generation in one pass. As I recall, it also lexed in the parser.
A long time ago, I wrote a C compiler (actually several: the Introl-C family for microcontrollers) that used recursive descent and did syntax and semantic checking during the parse and produced a tree representation of the program from which code was generated.
Today, I'm working on a compiler that does source -> tokens -> AST -> IR -> code, pretty much as I described above.
A parser just reads a text into an internal, more abstract representation, often a tree or graph of some sort.
A compiler translates such an internal representation into another format. Most often this means converting source code into executable programs. But the target doesn't have to be machine code. It can be another programming language as well; the compiler would still be a compiler. Obviously a compiler needs a parser to actually read its input.
A compiler always has a parser inside. The parser just processes the language and returns a tree representation of it; the compiler then generates something from that tree, be it actual machine code or another language.
A parser is one element of a compiler.
Are you looking for the differences between an interpreter and a compiler?
A parser takes in raw data and parses it into a tree structure. This syntax tree is then passed on to a generator, which will turn it into whatever it is supposed to generate.
So, a parser is a part of a compiler.
In general, a parser is a part of a compiler, but a compiler is designed to convert the received script, generally into machine-readable code or sometimes into another language.
A compiler is a special type of computer program that translates a human readable text file into a form that the computer can more easily understand. At its most basic level, a computer can only understand two things, a 1 and a 0. At this level, a human will operate very slowly and find the information contained in the long string of 1s and 0s incomprehensible. A compiler is a computer program that bridges this gap.
A parser is a piece of software that evaluates the syntax of a script when it is executed on a web server. For scripting languages used on the web, the parser works like a compiler might work in other types of application development environments. Parsers are commonly used in script development because they can evaluate code when the script is executed and do not require that the code be compiled first.