Finding operation ASAP times in clang

I need to read the data-flow graph (DFG) of a program and find the ASAP scheduling values of its operations in clang. I've written a preprocessor that parses the main file and builds the AST.
From here, however, I have no idea how to proceed. Are there any built-in classes I can use? If so, how do I use them to get the ASAP values? If not, what do I have to do to compute them manually?
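For what it's worth, I'm not aware of a built-in clang class that computes ASAP values; the clang libraries stop at syntax and semantics. Computing the values manually is a short pass over the DFG. A minimal sketch, assuming the graph has already been extracted into a hypothetical Node/index representation (all names here are placeholders, not clang APIs):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical DFG node: 'preds' holds the indices of the operations
// this operation depends on.
struct Node {
    std::vector<int> preds;
};

// ASAP value of an operation = length of the longest dependency chain
// leading to it: 0 for operations with no predecessors, otherwise
// 1 + max(ASAP of predecessors). Assumes 'nodes' is in topological
// order (every predecessor index is smaller than its successor's).
std::vector<int> computeASAP(const std::vector<Node>& nodes) {
    std::vector<int> asap(nodes.size(), 0);
    for (std::size_t i = 0; i < nodes.size(); ++i)
        for (int p : nodes[i].preds)
            asap[i] = std::max(asap[i], asap[p] + 1);
    return asap;
}
```

If your nodes aren't already in topological order, sort them first, or compute the values recursively with memoization.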

Related

llvm-cov: statistics for uninstantiated functions

I'm starting to work with llvm-cov to produce coverage statistics for my project. llvm-cov reports several categories: line coverage, function coverage, and region coverage. However, all of them consider only instantiated functions; functions that are never instantiated are simply ignored. This makes it easy to get close to 100% coverage for files with a low percentage of instantiated functions, which is not what I want. Is it possible to make llvm-cov consider uninstantiated functions as well, or to produce separate coverage statistics for them?
At the moment, unfortunately not. This is a missing capability in llvm-cov.
The reason for this is that clang does not emit any code for uninstantiated templates, and the coverage-mapping generation logic depends on clang emitting code for a function. This is a weird limitation, since the compiler does have enough information to describe these templates.
Edit: Of course, another point to consider is that C++ translation units tend to contain an enormous number of unspecialized/uninstantiated templates, and if the compiler were to emit coverage mapping regions for each of these, compile time and binary size would likely regress massively.
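To make the limitation concrete, here is a minimal reproduction (the file name is arbitrary; the commands are the standard source-based coverage workflow):

```cpp
// cover.cpp -- build and report with:
//   clang++ -fprofile-instr-generate -fcoverage-mapping cover.cpp -o cover
//   ./cover
//   llvm-profdata merge default.profraw -o cover.profdata
//   llvm-cov report ./cover -instr-profile=cover.profdata

template <typename T>
T triple(T x) {    // never instantiated: clang emits no code for it,
    return 3 * x;  // so llvm-cov has no mapping regions to report
}

int add(int a, int b) {  // ordinary function: instrumented and counted
    return a + b;
}

int main() { return add(1, 2) == 3 ? 0 : 1; }
```

In the resulting report, add and main are counted normally, while triple simply does not appear in any line, function, or region statistic.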

Do OpenMP hints bypass the vectorisation legality check in llvm

I am currently looking into how "#pragma omp for simd" is actually recognised in llvm. To my knowledge, clang parses it and sets metadata in the IR to indicate this force-vectorisation hint, and later optimisation passes read that metadata and vectorise the marked loop. Therefore, the loop should be vectorised even if the compiler thinks it might not be safe to do so?
So my assumption is that such force-vectorisation hints should bypass both the vectorisation legality check and the cost-model check. However, in LoopVectorize.cpp, I can't see how this is done. All loops are sent to the legality check LVL.canVectorize() and, if that check fails, the pass returns false directly without ever reaching the vectorisation stage.
Is there anything wrong with my assumption about the use of force-vectorisation hints?
Thanks in advance,
T
While llvm has publicized a goal of going beyond gcc in its implementation of omp simd, I haven't seen much of that myself. I don't see any updates to their high-level docs in the last 2 years. While omp simd might be loosely described as "vectorise if you know how", I don't think anyone considers it as overriding proven dependencies. I wouldn't be surprised if it acts like auto-vectorisation with the cost model suspended and possible-but-unproven dependencies ignored.
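For reference, the hint mechanism the question describes can be observed directly. A minimal sketch (the file and function names are hypothetical; -fopenmp-simd enables just the simd part of OpenMP without the runtime):

```cpp
// loop.cpp -- compile with: clang++ -fopenmp-simd -O2 -S -emit-llvm loop.cpp
void scale(float* a, const float* b, int n) {
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        a[i] = 2.0f * b[i];
}
```

In the emitted IR, the loop's back-edge branch carries !llvm.loop metadata (including llvm.loop.vectorize.enable), which LoopVectorize reads through its LoopVectorizeHints helper; as the question observes, the legality check in LVL.canVectorize() still runs regardless of the hint.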

Why is using an AST faster than not using one?

I'm making an interpreter for my own language as a hobby project. Currently my interpreter just executes the code as it sees it. I've heard you should make the parser generate an AST from the source code. So I was wondering, how does an AST actually make things faster than just executing the code linearly, as the parser sees it?
Because otherwise you would have to do the parsing all the time. If you have a loop, for instance, you'd have to parse the commands in the loop body over and over again.
Also, I would argue that it's cleaner, since you break the problem down into two distinct tasks: deal with syntax, then deal with semantics.
It isn't specifically the "AST" that makes it faster.
It is using any data structure (AST, symbol tables, control-flow graph, triples, p-code, machine code) that caches the analysis of the source code to extract its intended meaning, and does as much precomputation of the answer ("optimization") as possible. In effect, anything that partially compiles the code should produce programs that run faster than an interpreter of the raw text.
An interesting tradeoff: if the amount of program executed before execution stops isn't very big, it may actually be cheaper to interpret the text than to do any compiler-style analysis.
Given the speed of machines these days, one can sloppily compile a pretty big program in 100 milliseconds, which is about as fast as a human can react. Various versions of Turbo Pascal back in the 80s and 90s were pretty famous for this.
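A minimal sketch of the point about loops, using a toy expression AST in C++ (all names hypothetical): the parse happens once, and each iteration is then just a tree walk.

```cpp
#include <memory>

// Toy AST for arithmetic expressions: parse once, evaluate many times.
struct Expr {
    virtual ~Expr() = default;
    virtual double eval() const = 0;
};

struct Num : Expr {
    double value;
    explicit Num(double v) : value(v) {}
    double eval() const override { return value; }
};

struct Add : Expr {
    std::unique_ptr<Expr> lhs, rhs;
    Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    double eval() const override { return lhs->eval() + rhs->eval(); }
};

double runLoop(const Expr& body, int iterations) {
    double last = 0;
    for (int i = 0; i < iterations; ++i)
        last = body.eval();  // no lexing or parsing here, just a tree walk
    return last;
}
```

A direct text interpreter would pay the full lexing and parsing cost on every one of those iterations.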

Serializing a TdwsProgram

As I understand it, DWScript does not compile scripts into an intermediary bytecode. However, I would like to be able to store a "compiled" script, so that I can send it through a stream or save it to a file.
I was wondering: Is there a way to serialize a TdwsProgram object?
I didn't manage to find any answer anywhere. I have looked over the code and it doesn't seem to be possible, but I thought I should ask the question anyway...
As far as I remember, it is neither implemented nor wanted by its current maintainer (since the execution AST is a tree of objects).
See the reference article "Why no bytecode format".
The easiest approach would be to stream the source code, then compile it again on the receiving side.
DWS compilation is very fast, faster than Delphi's, and Eric keeps improving it even as new features are added.

Generating intermediate code in a compiler. Is an AST or parse tree always necessary when dealing with conditionals?

I'm taking a compiler-design class where we have to implement our own compiler (using flex and bison). I have had experience in parsing (writing EBNF's and recursive-descent parsers), but this is my first time writing a compiler.
The language design is pretty open-ended (the professor has left it up to us). In class, the professor went over generating intermediate code. He said that it is not necessary for us to construct an Abstract Syntax Tree or a parse tree while parsing, and that we can generate the intermediate code as we go.
I found this confusing for two reasons:
What if you are calling a function before it is defined? How can you resolve the branch target? I guess you would have to make it a rule that functions must be defined before use, or perhaps declared before use (as C prototypes do)?
How would you deal with conditionals? If you have an if-else, or even just an if, how can you resolve the branch target of the if when the condition is false (if you're generating code as you go)?
I planned on generating an AST and then walking the tree after creating it, to resolve the addresses of functions and branch targets. Is this correct, or am I missing something?
The general solution to both of your issues is to keep a list of addresses that need to be "patched." You generate the code and leave holes for the missing addresses or offsets. At the end of the compilation unit, you go through the list of holes and fill them in.
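A minimal sketch of such a patch list, assuming a toy bytecode with absolute jump targets (all names hypothetical):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Backpatching sketch: emit jumps with placeholder targets, remember
// where the holes are, and fill them in once the targets are known.
struct CodeBuffer {
    std::vector<int32_t> code;

    // Emit a jump operand whose target is not known yet and return the
    // position of the hole so it can be patched later.
    std::size_t emitJumpToUnknown() {
        code.push_back(-1);  // placeholder target
        return code.size() - 1;
    }

    // Fill a previously emitted hole with the now-known target address.
    void patch(std::size_t hole, int32_t target) { code[hole] = target; }

    std::size_t here() const { return code.size(); }
};

// Immediate code generation for "if (cond) { then-block }" without an
// AST: the forward branch over the then-block is patched as soon as the
// end of the block is reached.
void genIf(CodeBuffer& cb) {
    // ... emit condition code, then a jump-if-false with a hole:
    std::size_t hole = cb.emitJumpToUnknown();
    // ... emit then-block code ...
    cb.patch(hole, static_cast<int32_t>(cb.here()));  // false case lands here
}
```

Forward function calls work the same way, except the holes are keyed by function name in a table and filled in at the end of the compilation unit.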
In FORTH, the "list" of patches is kept on the control stack and is unwound as each control structure terminates. See FORTH Dimensions.
Anecdote: an early compiler (I believe it was a Lisp compiler) generated a list of machine-code instructions in symbolic form, with forward references to the instruction list for each branch of a conditional. It then generated the binary code by walking the list backwards, so the code location of every forward branch was already known when the branch instruction had to be emitted.
The Crenshaw tutorial is a concrete example of not using an AST of any kind. It builds a working compiler (including conditionals, obviously) with immediate code generation targeting m68k assembly.
You can read through the document in an afternoon, and it is worth it.
