Process of scanning, parsing and compiling ABAP sources?

Process of scanning, parsing and compiling ABAP sources? - parsing

This is a rather technical question about the compilation process of
ABAP code.
I know that there are ABAP parser and scanner classes that actually
call C kernel functions to do the real work. Then there is code completion
functionality with a transaction that returns and prints the AST (abstract source tree) of a program as ABAP list or XML.
Now my question is: would it be possible to 'skip' the ABAP source
code and directly produce such an AST by other means than writing and then executing an
ABAP program in SE80 or so, and give it to some function that compiles and
executes it as if it had been written in and parsed from ABAP code?
That is, can I skip scanning and parsing of sources and directly give
an AST to the compiler? If so, in what format? ABAP lists look more
a printing format, not like e.g. Lisp lists surrounded by parentheses.

Unfortunately, the ABAP Compiler does not use ASTs to generate the VM code.
The ABAP compiler works sequentially and translates statement per statement (i.e. everything that is between two ".") into one or more virtual machine opcodes.
If you are curious, you could take a look at transaction SYNT which shows the compiler output. You could also take a look at report RSLOAD00 which shows the ABAP VM code that has been generated for a program.
ASTs have only been built on top to allow for code completion or high-level analyses.
If you want to invoke the ABAP compiler, you will need to generate textual ABAP source code.

Related

Scanner and parser interaction

I am new to flex/bison. Reading books, it seems that in nearly all compiler implementations, the parser interacts with the scanner in a "coroutine" manner, that whenever the parser needs a token, it calls the scanner to get one, and left the scanner aside when it's busy on shift/reduce. A natural question is that why not let the scanner produces the token-stream (from the input byte-stream) as a whole, and then pass the entire token-stream to the parser, thus there is no explicit interaction betw. the two? Well, I can image that there are some drawbacks in this manner, and I can also see some benefits of doing so.
My question is, is there a sort of "comprehensive" discussion on that aspect, or is there any compiler implementation uses different scanner/parser interaction scheme other than "coroutine" manner?

In the traditional arrangement, the parser calls the scanner whenever it needs a token.
That's the same logic as used in the scanner (or many other programs) which call the I/O library every time they need more input. That's not usually described as a coroutine, and I'm not convinced it's an accurate description of the parser/scanner interaction either.
In coroutine control flow, two functions call each other in tandem. That's not usually the way I/O is handled. The fread() interface does maintain state for the next call (the file position, at least, and maybe a buffer) but it the calls are self contained.
In a sense, there is no difference between calling yylex() to get the next token and calling scanf() to get the next data value.
This is not always the most convenient architecture for a scanner. Sometimes, it would be convenient for the scanner to be able to feed tokens into the parser. A typical use case is when the scanner is generating tokens, for exanple through macro expansion, but sometimes it is just that the match of a single scanner pattern contains more than one token.
Many parser generators, including Bison, can generate callable parsers, usually called "push parsers". In this model, the scanner calls the parser with each succesive token. This is still not a coroutine model, really; it is just control-flow inversion. In the analogy with ordinary I/O, it's the equivalent of taking a data processor which called fgets() to read each input line and rewriting it as a process_line() function which is given a line of data to process (and thus does not interact with the I/O library). An early implementation of push parsing can be found in the Lemon parser generator.
Coroutine-like control flow could be useful for creating a parser whose eventual input stream must be handled asynchronously. But that doesn't really require coroutining between the parser and the scanner; rather, it requires coroutining between the scanner and the input stream. Again, coroutining is not really necessary and might be overkill: inverting control flow should suffice. Flex does not provide a "push scanner" interface, but other scanner generators do. I believe this feature is supported by Re2c, for example.

Pretty printing AST in Rascal

I am trying to pretty print an AST generated from
createAstFromFile(|cwd:///Java/Hello.java|,true);
Have I just missed how to do this in the documentation?

If you mean unparsing an AST (getting the Java code back) you will have to write something yourself.
If you however mean printing the AST structure nicely indented, we have iprintln exactly for this purpose.
Also, for large ASTs, the REPL might not like it that much, checkout our (as pf yet) undocumented Fast print functions in util::FastPrint. The fiprintln prints to the rascal output window, which is a lot faster.

No I believe the current release does not contains this feature. If you don't rewrite the AST, you can of course get the source by reading the location, as in:
rascal>import IO;
ok
rascal>readFile(ast#\loc)
str: ...
That only works when the weather is right.. The other solutions are:
to use string templates mapping the AST back to source (simplest)
map ASTs to the Box language and call the format function (most powerful and configurable)
a hybrid of the above
I seem to recall there is a function which maps back M3 ASTs back to JDT ASTs in Java and then calls the pretty print function of JDT, but it looks like it was discontinued. In other words, here are some TODOs.

Incremental Parsing from Handle in Haskell

I'm trying to interface Haskell with a command line program that has a read-eval-print loop. I'd like to put some text into an input handle, and then read from an output handle until I find a prompt (and then repeat). The reading should block until a prompt is found, but no longer. Instead of coding up my own little state machine that reads one character at a time until it constructs a prompt, it would be nice to use Parsec or Attoparsec. (One issue is that the prompt changes over time, so I can't just check for a constant string of characters.)
What is the best way to read the appropriate amount of data from the output handle and feed it to a parser? I'm confused because most of the handle-reading primatives require me to decide beforehand how much data I want to read. But it's the parser that should decide when to stop.

You seem to have two questions wrapped up in here. One is about incremental parsing, and one is about incremental reading.
Attoparsec supports incremental parsing directly. See the IResult type in Data.Attoparsec.Text. Parsec, alas, doesn't. You can run your parser on what you have, and if it gives an error, add more input and try again, but you really don't know if the error was an unrecoverable parse error, or just needing for more input.
In your case, usualy REPLs read one line at a time. Hence you can use hGetLine to read a line - pass it to Attoparsec, and if it parses evaluate it, and if not, get another line.
If you want to see all this in action, I do this kind of thing in Plush.Job.Output, but with three small differences: 1) I'm parsing byte streams, not strings. 2) I've set it up to pull as much as is available from the input and parse as many items as I can. 3) I'm reading directly from file descriptos. But the same structure should help you do it in your situation.

Partial parsing with flex/antlr

I encountered a problem while doing my student research project. I'm an electrical engineering student, but my project has somewhat to do with theoretical computer science: I need to parse a lot of pascal sourcecode-files for typedefinitions and constants and visualize all occurrences. The typedefinitions are spread recursively over various files, i.e. there is type a = byte in file x, in file y, there is a record (struct) b, that contains type a and then there is even a type c in file z that is an array of type b.
My idea so far was to learn about compiler construction, since the compiler has to resolve all typedefinitions and break them down to the elemental types.
So, I've read about compiler construction in two books (one of which is even written by the pascal inventor), but I'm lacking so many basics of theoretical computer science that it took me one week alone to work my way halfway through. What I've learned so far is that for achieving my goal, lexer and parser should be sufficient. Since this software is only a really smart part of the whole project, I can't spend so much time with it, so I started experimenting with flex and later with antlr.
My hope was, that parsing for typedefinitions only was such an easy task, that I could manage to do it with only using a scanner and let it do some parser's work: The pascal-files consist of 5 main-parts, each one being optional: A header with comments, a const-section, a type-section, a var-section and (in least cases) a code-section. Each section has a start-identifier but no clear end-identifier. So I started searching for the start of the type- and const-section (TYPE, CONST), discarding everything else. In flex, this is fairly easy, because it allows "start conditions". They can be used as various states like "INITIAL", "TYPE-SECTION", "CONST-SECTION" and "COMMENT" with different rules for each state. I wanted to get back a string from the scanner with following syntax " = ". There was one thing that made this task difficult: Some type contain comments like in this example: AuEingangsBool_t {PCMON} = MAX_AuEingangsFeld;. The scanner can not extract such type-definition with a regular expression.
My next step was to do it properly with scanner AND parser, so I searched for a parsergenerator and found antlr. Since I write the tool in C# anyway, I decided to use its scannergenerator, too, so that I do not have to communicate between different programs. Now I encountered following Problem: AFAIK, antlr does not support "start conditions" as flex do. That means, I have to scan the whole file (okay, comments still get discarded) and get a lot of unneccessary (and wrong) tokens. Because I don't use rules for the whole pascal grammar, the scanner would identify most keywords of the pascal syntax as user-identifiers and the parser would nag about all those series of tokens, that do not fit to type- and constant-defintions
Now, finally my question(s): Can anyone of you tell me, which approach leads anywhere for my project? Is there a possibility to scan only parts of the source-files with antlr? Or do I have to connect flex with antlr for that purpose? Can I tell antlr's parser to ignore every token that is not in the const- or type-section? Are those tools too powerful for my task and should I write own routines instead?

You'd be better off to find a compiler for Pascal, and simply modify to report the information you want. Presumably there is such a compiler for your Pascal, and often the source code for such compilers is available.
Otherwise you essentially need to build a parser. Building lexer, and then hacking around with the resulting lexemes, is essentially building a bad parser by ad hoc methods. ANTLR is a good way to go; you can define the lexemes (including means to pick up and ignore comments) pretty easily, especially for older dialects of Pascal. You'll need good BNF rules for the type information that you want, and translate those rules to the parser generator. What you can do to minimize work, is to cheat on rules for the parts of the language you don't care about. For instance, you could write an accurate subgrammar for assignment statements. Since you don't care about them, you can write a sloppy subgrammar that treats assignment statements as anything that begins with an identifier, is followed by arbitrary other tokens, and ends with semicolon. This kind of a grammar is called an "island grammar"; it is only accurate where it needs to be accurate.
I don't know about the recursive bit. Is there a reason you can't just process each file separately? The answer may depend on what information you want to know about each type declaration, and if you go deep enough, you may need a symbol table as well as an island parser. Parser generators offer you no help for this.

First, there can be type and const blocks within other blocks (procedures, in later Delphi versions also classes).
Moreover, I'm not entirely sure that you can actually simply scan for a const token, and then start parsing. Const is also used for other purposes in most common (Borland) Pascal dialects. Some keywords can be reused in a different context, and if you don't parse the global blockstructure, and only look for const and type in specific places you will erroneously start parsing there.
A base problem of course is the comments. Scanners cut out comments as early as possible, and don't regard them further. You probably have to setup the scanner so that comments are attached to the adjacent tokens as field (associate with token before or save them up till a certain token follows).
As far antlr vs flex, no clue. The only parsergenerator I have some minor experience in parsing Pascal with is Coco/R (a parsergenerator popular by Wirthians), but in general I (and many pascalians) prefer handcoded.

Difference between compilers and parsers?

By concept/function/implementation, what are the differences between compilers and parsers?

A compiler is often made up of several components, one of which is a parser.
A common set of components in a compiler is:
Lexer - break the program up into words.
Parser - check that the syntax of the sentences are correct.
Semantic Analysis - check that the sentences make sense.
Optimizer - edit the sentences for brevity.
Code generator - output something with equivalent semantic meaning using another vocabulary.
To add a little bit:
As mentioned elsewhere, small C is a recursive decent compiler that generated code as it parsed. Basically syntactical analysis, semantic analysis, and code generation in one pass. As I recall, it also lexed in the parser.
A long time ago, I wrote a C compiler (actually several: the Introl-C family for microcontrollers) that used recursive descent and did syntax and semantic checking during the parse and produced a tree representation of the program from which code was generated.
Today, I'm working on a compiler that does source -> tokens -> AST -> IR -> code, pretty much as I described above.

A parser just reads a text into an internal, more abstract representation, often a tree or graph of some sort.
A compiler translates such an internal representation into another format. Most often this means converting source code into executable programs. But the target doesn't have to be machine code. It can be another programming language as well; the compiler would still be a compiler. Obviously a compiler needs a parser to actually read its input.

Compiler always have a parser inside. Parser just process the language and return the tree representation of it, compiler generate something from that tree, actual machine codes or another language.

A parser is one element of a compiler.
Are you looking for the differences between an interpreter and a compiler?

A parser takes in raw-data and parses it into a tree structure. This syntax-tree is then passed on to generator, which will turn it into whatever it is supposed to generate.
So, a parser is a part of a compiler.

In general, parser is a part of the compiler, but compiler is designed to convert the received script generally into machine-readable code or sometimes into another language.

A compiler is a special type of computer program that translates a human readable text file into a form that the computer can more easily understand. At its most basic level, a computer can only understand two things, a 1 and a 0. At this level, a human will operate very slowly and find the information contained in the long string of 1s and 0s incomprehensible. A compiler is a computer program that bridges this gap.
A parser is a piece of software that evaluates the syntax of a script when it is executed on a web server. For scripting languages used on the web, the parser works like a compiler might work in other types of application development environments.Parsers are commonly used in script development because they can evaluate code when the script is executed and do not require that the code be compiled first.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart