I don't know much about Clang APIs, so forgive the silly question.
If I am building a compiler for a language that has the same or similar features as C++, can I use the Clang APIs to handle code generation for me? For example, say I implemented a parser for the following code:
    def class Adder
        def func Add(x as int, y as int) as int
            return x + y
Now say I have parsed this code and have the AST for it. Can I use the Clang APIs to generate the code for this class for me?
I know I can use the LLVM APIs, and I have done so, but LLVM is relatively low level and it doesn't support OOP, so I would have to implement that support myself, which is not an easy task at all. So I am wondering whether there is any way to employ Clang to do that job for me. If yes, I would be grateful if you could provide me with examples and links.
Thanks
The short answer is no.
A longer answer is that Clang is just a compiler from C++ (and C and ObjC) to LLVM IR, going via the AST. Its external APIs all relate to compiling and analyzing C++.
Once you parse a language like your sample to an AST, what you need is precisely the LLVM APIs to construct LLVM IR. The LLVM tutorial is the perfect start.
Now, it's not that Clang is useless for you. Clang compiles C++ to LLVM IR. So it has code for handling OOP constructs and all other C++ constructs that are higher-level than LLVM IR. You can definitely learn from what it's doing by reading its code, but AFAIK none of this is really a public API.
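To give a rough idea of what "handling OOP constructs yourself" means, here is a minimal sketch that lowers the Adder.Add method from the question to textual LLVM IR by hand. The helper name lower_method and the Adder_Add naming scheme are assumptions made up for this illustration; they are not Clang or LLVM APIs, and a real compiler would build the IR through the LLVM APIs rather than strings. The ptr type assumes an LLVM version with opaque pointers.

```python
# Hypothetical helper: turns a method into a free function that takes the
# object as an explicit first argument, similar in spirit to C++'s `this`.
def lower_method(class_name, method_name, params, body_lines, ret_type="i32"):
    # The implicit object pointer becomes an ordinary first parameter.
    args = ["ptr %self"] + [f"{ty} %{name}" for name, ty in params]
    # Made-up mangling scheme: <Class>_<Method>.
    header = f"define {ret_type} @{class_name}_{method_name}({', '.join(args)}) {{"
    lines = [header, "entry:"] + [f"  {line}" for line in body_lines] + ["}"]
    return "\n".join(lines)


ir = lower_method(
    "Adder",
    "Add",
    params=[("x", "i32"), ("y", "i32")],
    body_lines=["%sum = add i32 %x, %y", "ret i32 %sum"],
)

# The class itself has no fields, so it lowers to an empty struct type.
print("%Adder = type {}")
print(ir)
```

Inheritance, virtual dispatch and constructors need considerably more machinery (vtables, layout rules), which is the part worth studying in Clang's source.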
This question is purely from a research point of view; right now I am not looking at any practical aspect of it.
We have decompilers that can take binary code and generate LLVM IR, such as
https://github.com/repzret/dagger or https://github.com/avast/retdec
and many others.
Do we have some code generator which can convert an LLVM IR to Clang AST?
Thank You in advance.
Found one dropped project -
https://www.phoronix.com/scan.php?page=news_item&px=MTE2OTg
Looking for more.
Going from the AST to the LLVM IR is a one way street.
Take a look at this picture.
A source file in a high-level programming language is converted into an AST: the Clang AST for C, C++ or Objective-C, or the compiler's own AST for a language such as Rust. This is a data structure that has knowledge of the source code constructs of the programming language itself. An AST is specific to a programming language. It is a description of the parsed source file, in the same way that the DOM tree is a description of an HTML document. This means that the AST contains information specific to that programming language. If the programming language is Rust, the Rust AST might, for example, contain functional coding constructs.
The LLVM IR however is sometimes described as a portable, high-level assembly language, because it has constructions that can map closely to system hardware.
A frontend module converts a high level programming language into LLVM IR. It does this by generating a language specific AST and then recursively traversing that AST and generating LLVM code constructs representing each node in the AST. Then we have LLVM IR code. Then the backend module converts the LLVM IR into an architecture specific assembly code.
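The recursive traversal described above can be sketched with a toy expression AST. The node classes (Num, Var, BinOp) and the emit function are invented for this illustration, and the output is only IR-like text, not something produced through the LLVM APIs.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str  # "+" or "*"
    left: object
    right: object


OPCODES = {"+": "add", "*": "mul"}


def emit(node, out, temps):
    """Return the operand holding node's value, appending instructions to out."""
    if isinstance(node, Num):
        return str(node.value)
    if isinstance(node, Var):
        return f"%{node.name}"
    # Visit the children first, then emit one instruction for this node.
    lhs = emit(node.left, out, temps)
    rhs = emit(node.right, out, temps)
    dest = f"%t{next(temps)}"
    out.append(f"{dest} = {OPCODES[node.op]} i32 {lhs}, {rhs}")
    return dest


# AST for the source expression: x + y * 2
tree = BinOp("+", Var("x"), BinOp("*", Var("y"), Num(2)))
code = []
result = emit(tree, code, count())
print("\n".join(code))
```

Each AST node turns into at most one instruction, and the tree shape disappears into a flat list, which is where the source-level structure starts to get lost.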
There are multiple frontend modules, one for each high-level language that you want to convert into LLVM IR. Once this conversion is complete, the generated LLVM IR keeps no structured record of the source-language constructs it came from. You could take C++ code and the same code written in Rust, and apart from incidental clues such as symbol names or debug metadata, the generated LLVM IR would not tell you which construct in which language produced it.
Once the LLVM IR has been generated, the high-level, language-specific information is gone. This includes the information needed to rebuild an AST, because an AST needs knowledge about coding constructs specific to that programming language.
Going from a high-level (more abstract) source code representation to a medium-level one, such as LLVM IR, and even to a lower-level one, such as assembly code, is relatively easy.
Going the other way, from a very low level machine specific code, to a more abstract source code of a high level programming language is much harder. This is because in high level programming languages you can solve the same problem many different ways, while the representation of code in assembly language is more limited, so you have no way of knowing which specific high level coding construct the low level code originally came from.
This is why, in principle, you cannot go from the LLVM IR back to the original AST. If someone did try to do such a thing, the result would not be the same representation as the original high-level source code, and it would not be very readable.
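A small sketch of why the mapping is many-to-one: LLVM IR has no integer negate instruction, so integer negation is expressed as a subtraction from zero. In the toy lowering below (the node classes and the lower function are made up for this illustration), the source forms -x and 0 - x end up as the same instruction, so nothing in the output says which one the programmer wrote.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str


@dataclass
class Num:
    value: int


@dataclass
class Neg:  # source form: -x
    operand: object


@dataclass
class Sub:  # source form: a - b
    left: object
    right: object


def operand(node):
    """Render a leaf node (Num or Var) as an IR-style operand."""
    return str(node.value) if isinstance(node, Num) else f"%{node.name}"


def lower(node):
    """Lower a one-level toy expression to a single IR-style instruction."""
    if isinstance(node, Neg):
        # Integer negation is written as 0 - x.
        return f"sub i32 0, {operand(node.operand)}"
    return f"sub i32 {operand(node.left)}, {operand(node.right)}"


from_neg = lower(Neg(Var("x")))
from_sub = lower(Sub(Num(0), Var("x")))
print(from_neg)
print(from_neg == from_sub)
```

A tool going in the reverse direction has to pick one of the possible source forms, and the same ambiguity applies to loops, classes, templates and so on.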
I have been reading about libadalang and I am very impressed by it. However, I was wondering whether this technique has already been used elsewhere, and whether other languages have a library for syntactically and semantically analyzing their code. Is this a unique approach?
C and C++: libclang. "The C Interface to Clang provides a relatively small API that exposes facilities for parsing source code into an abstract syntax tree (AST), loading already-parsed ASTs, traversing the AST, associating physical source locations with elements within the AST, and other facilities that support Clang-based development tools." (See LibTooling for a C++ API.)
Python: See the ast module in the Python Language Services section of the Python Library manual. (The other modules can be useful, as well.)
JavaScript: The ongoing ESTree effort is attempting to standardize parsing services over different JavaScript engines.
C# and Visual Basic: See the .NET Compiler Platform ("Roslyn").
I'm sure there are lots more; those ones just came off the top of my head.
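As a small illustration of the kind of analysis such libraries allow, here is a sketch using Python's standard ast module to list the functions defined in a piece of source code along with their parameter names. The sample source string is made up for the example.

```python
import ast

source = "def add(x, y):\n    return x + y\n"
tree = ast.parse(source)

# Collect (function name, parameter names) for every function definition.
functions = [
    (node.name, [arg.arg for arg in node.args.args])
    for node in ast.walk(tree)
    if isinstance(node, ast.FunctionDef)
]
print(functions)
```

The same pattern (parse, then walk or visit the tree) is what libclang, Roslyn and libadalang offer for their respective languages, each with its own API.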
For a practical and theoretical grounding, you should definitely (re)visit the classic textbook Structure and Interpretation of Computer Programs by Abelson & Sussman (1st edition 1985, 2nd edition 1996), which helped popularise the idea of metacircular interpretation -- that is, treating a computer program as a formal data structure which can be interpreted (or otherwise analysed) programmatically.
You can see "libadalang" as ASIS Mark II. AdaCore seems to be attempting to rethink ASIS in a way that will support both what ASIS already can do, and more lightweight operations, where you don't require the source to compile, to provide an analysis of it.
Hopefully the final API will be nicer than that of ASIS.
So no, it is not a unique approach. It has already been done for Ada. (But I'm not aware of similar libraries for other languages.)
I have been handed a good-sized chunk of C code that would be better packaged as an iOS framework, which our apps may choose to embed in their projects or, potentially, which we may distribute to third parties.
While following the instructions at https://github.com/jverkoey/iOS-Framework#first_parties, the question I am asking came to mind, especially because the idea of a rewrite in Objective-C is daunting, given the schedule and my current level of Objective-C expertise.
A minimal amount of Objective-C is fine, if necessary for language binding or some such thing.
The original code is written in C (it's designed to be portable) but makes extensive use of gcc macros and extensions.
Your advice would be highly appreciated!
Yes, the straightforward (though possibly non-trivial) approach is simply to write a wrapper around the C library. If you need to support Obj-C and C developers down the road, I'd move the very core functionality into a C-only library, and then write better libraries on top of it to expose the core feature set to both Obj-C and C (if this is even needed), so that there's less extra "wrapping" happening.
The original code is written in C (it's designed to be portable) but makes extensive use of gcc macros and extensions.
Clang has a pretty good attitude and approach to supporting the GCC extensions. I'd keep an eye out for anything very esoteric, but you should be OK.
I want to parse Verilog gate-level code and store the data in a data structure (e.g. a graph).
Then I want to do something on the gates in C/C++ and output a corresponding Verilog file.
(I would like to build one program whose input and output are both Verilog gate-level code.)
(input.v => myProgram => output.v)
Is there any library or open-source code that does this?
I found that it can be done with Flex and Bison, but I have no idea how to use Flex and Bison.
There was a similar question a few days ago about doing this in Ruby, in which I pointed to my Verilog parser gem. I'm not sure whether it is robust enough for you, though; I would love feedback, bug reports, and feature requests.
There are Perl Verilog parsers out there, but I have not used any of them directly and I avoid Perl; hopefully others can add info about other parsers.
I have used Verilog-Perl successfully to parse Verilog code. It is well-maintained: it even supports the recent SystemVerilog extensions.
Yosys (https://github.com/cliffordwolf/yosys) is a framework for Verilog synthesis written in C++. Yosys is still under construction, but if you only want to read and write gate-level netlists it can do what you need.
PS: A reference manual (that also covers the C++ APIs) is on the way. I've written ~100 pages already, but can't publish it before I've finished my BSc. thesis (another month or so).
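For a very restricted subset of gate-level Verilog, the input.v => myProgram => output.v flow from the question can also be sketched by hand. The sketch below only recognises primitive gate instantiations written on one line (and, or, nand, nor, xor, xnor, not, buf) and ignores everything else; the function names, the sample netlist and the AND-to-NAND rewrite are all made up for illustration. Real netlists (cell libraries, buses, named port connections) need a proper parser such as the tools mentioned above.

```python
import re

# Matches primitive instantiations such as: and g1 (n1, a, b);
GATE_RE = re.compile(
    r"^\s*(and|or|nand|nor|xor|xnor|not|buf)\s+(\w+)\s*\(([^)]*)\)\s*;", re.M
)


def parse_gates(text):
    """Return {instance name: (gate type, output net, [input nets])}."""
    gates = {}
    for kind, name, ports in GATE_RE.findall(text):
        nets = [p.strip() for p in ports.split(",")]
        # For these Verilog gate primitives the output terminal comes first.
        gates[name] = (kind, nets[0], nets[1:])
    return gates


def write_gates(gates):
    """Render the gate dictionary back into Verilog instantiation lines."""
    return "\n".join(
        f"  {kind} {name} ({', '.join([out] + ins)});"
        for name, (kind, out, ins) in gates.items()
    )


netlist = """
module top (a, b, c, y);
  input a, b, c;
  output y;
  wire n1;
  and g1 (n1, a, b);
  or  g2 (y, n1, c);
endmodule
"""

gates = parse_gates(netlist)
# Example transformation: turn every AND gate into a NAND gate.
gates = {n: ("nand" if k == "and" else k, o, i) for n, (k, o, i) in gates.items()}
print(write_gates(gates))
```

The dictionary acts as a simple graph: nets are the edges, and a gate's output net can be looked up among the inputs of other gates to follow connections.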
I have been trying to obtain ASTs from Clang, but I have not been successful so far. I found a year-old question here on Stack Overflow that mentions two other ways to obtain the AST using Clang:
./llvmc -cc1 -ast-dump file.c
./llvmc -cc1 -ast-print file.c
That question mentions doxygen and shows a representation of an AST, but I am mostly looking for a textual form such as XML so that further analysis can be performed.
Lastly, there was another question here on Stack Overflow about exactly this XML import, but it was discontinued for several reasons, also mentioned there.
My question, then, is: which version should I use, and how can I use it from the console to obtain AST-related information for a given piece of C code? I believe this should be a painless one-line command like those above, but as far as I have read, the documentation index does not mention anything about ASTs, and the only thing I found for llvmc was about writing an AST by hand, which is not really what I am looking for.
I tried all of the commands above, but they all fail on version 2.9, and I have already found out that LLVM changes a great deal between versions.
Thank you.
OP says "open to other suggestions as well".
He might consider our DMS Software Reengineering Toolkit with its C Front End.
While it would be pretty easy to exhibit a C AST produced by this, it is easier to show an ObjectiveC AST [already at SO] produced by DMS using the same C front end (ObjectiveC is a dialect of C).
See https://stackoverflow.com/a/10749970/120163 for that example. DMS can produce an XML equivalent of this, too.
We don't recommend exporting trees as text, because real trees for real code are simply enormous and, in our experience, are poorly manipulated as text or XML objects. One usually needs machinery beyond parsing. See my discussion about Life After Parsing.