I am trying to implement a clang tool that does syntactic analysis using the ASTMatcher API, and I am looking for a way to pass extra flags to clang that disable semantic checks. I know clang builds a giant AST which includes system headers. Is there any way to parse source code while disabling the semantic checks that give rise to unknown type errors? I just want to analyze the syntactic integrity of the source code of the given file. So far, I have tried to get around this problem by modifying the DSL to check whether the matching code is from the main file:
cxxRecordDecl(isExpansionInMainFile()).bind("class");
But this doesn't stop clang from looking into the header files.
Unfortunately, it is impossible to use plain syntactic analysis without sema. The problem is not specifically with clang, but with all of the parsers for C++ out there. Without simultaneous semantic analysis, any syntactic analysis is ambiguous. The issue is properly covered in this answer.
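A classic illustration of that ambiguity, as a small sketch (the namespaces and names below are made up for the example): the statement `a * b;` is a declaration when `a` names a type and an expression when `a` names a variable, so a parser cannot pick a parse tree without looking the name up.

```cpp
// The same token sequence "a * b;" with two different meanings.
// All names here are illustrative, not taken from any real code base.

namespace decl_case {
    using a = int;               // `a` names a type

    bool run() {
        a * b;                   // declaration: `b` is a pointer to int
        b = nullptr;
        return b == nullptr;
    }
}

namespace expr_case {
    struct Num { int v; };
    static int calls = 0;        // counts how often operator* runs

    Num operator*(Num x, Num y) { ++calls; return Num{x.v * y.v}; }

    int run() {
        Num a{6}, b{7};          // `a` names a variable
        a * b;                   // expression: calls operator*, result discarded
        return calls;
    }
}
```

Both functions contain exactly the same statement text; only the earlier declaration of `a` decides which grammar rule applies.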
Related
I'm trying to build a parser for Promela in llvm. I have the parser SPIN uses, which is built using yacc, including the input that goes to yacc. Is there a way to use the yacc parser to quickly and painlessly generate a clang/llvm parser? I will be using it to generate call graphs and perform static analysis.
What I need to know now is whether I can use the existing Promela compiler, which was built with yacc, to quickly build a parser (and later, IR generator) using the llvm framework.
Yes, you can re-use the existing YACC-grammar (and if you want even the existing AST) for your project. "Building a parser using the llvm framework" is a bit misleading though because LLVM won't have anything to do with parsing and the AST. LLVM won't enter into it until you generate the LLVM IR and then work with it.
So you either take the existing YACC grammar and the existing AST or you only take the grammar and replace the actions with ones that create your own AST that you've defined yourself. Either way that part won't involve LLVM.
Then you'd write a separate phase that walks the AST and generates LLVM IR using the LLVM API, on which you can then run all the transformations and analyses supported by LLVM.
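To make the "separate phase that walks the AST" concrete, here is a minimal sketch under some loud assumptions: the `Expr` node type, the helpers `num`/`bin`, and `emitFunction` are all hypothetical, and the code writes textual LLVM IR by hand instead of going through the LLVM C++ API (such as `llvm::IRBuilder`) that a real implementation would normally use. It only shows the shape of the walk: visit children first, then emit one instruction per operator node.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical AST node, standing in for whatever the yacc actions build.
struct Expr {
    enum Kind { Num, Add, Mul };
    Kind kind;
    int value;                          // used only when kind == Num
    std::unique_ptr<Expr> lhs, rhs;     // used only for Add / Mul
    Expr(Kind k, int v, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : kind(k), value(v), lhs(std::move(l)), rhs(std::move(r)) {}
};

std::unique_ptr<Expr> num(int v) {
    return std::unique_ptr<Expr>(new Expr(Expr::Num, v, nullptr, nullptr));
}

std::unique_ptr<Expr> bin(Expr::Kind k, std::unique_ptr<Expr> l,
                          std::unique_ptr<Expr> r) {
    return std::unique_ptr<Expr>(new Expr(k, 0, std::move(l), std::move(r)));
}

// Post-order walk: returns the operand (constant or %tN) holding the value.
struct Emitter {
    std::ostringstream body;
    int next = 0;

    std::string gen(const Expr& e) {
        if (e.kind == Expr::Num) return std::to_string(e.value);
        assert(e.lhs && e.rhs);
        std::string l = gen(*e.lhs);
        std::string r = gen(*e.rhs);
        std::string name = "%t" + std::to_string(next++);
        body << "  " << name << " = "
             << (e.kind == Expr::Add ? "add" : "mul")
             << " i32 " << l << ", " << r << "\n";
        return name;
    }
};

// Wraps the generated instructions in a function returning the result.
std::string emitFunction(const Expr& root) {
    Emitter em;
    std::string result = em.gen(root);
    return "define i32 @eval() {\nentry:\n" + em.body.str() +
           "  ret i32 " + result + "\n}\n";
}
```

For the tree `1 + 2 * 3`, built as `bin(Expr::Add, num(1), bin(Expr::Mul, num(2), num(3)))`, the walk emits a `mul` for the inner node first and then an `add` that uses its result.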
I'd like to derive exactly that subset of the sources of a Dart compiler (dart2js or dartdevc or other) or of a Dart analyzer that can 1. transform a string of Dart code (or better, a list of strings each representing a compilation unit) into a typed syntax tree, 2. be translated into JS, 3. be run in the browser. Is there a marked subset that fulfills these requirements, which is it, and how can I find it in general?
Accomplishing #1 is fairly simple using package:analyzer, which is the same static analyzer used to provide IDE hinting, autocomplete, etc. The Dart Team is currently working on unifying their compiler frontends behind one main API, but for now, analyzer should definitely take care of most of what you need.
Here's an example of getting a syntax tree and running analysis on it: https://github.com/thosakwe/analyzer_examples/blob/master/analyze_a_file/analyze_a_file.dart
As for #2, you'll likely have to fork the dart-lang/sdk repo and make your own adjustments to dart2js. It's not published as a standalone package. Otherwise, you can write your own compiler, which is probably not going to be fun.
I suppose you'd have to figure out how to get #2 up and running, but hypothetically, if you could compile a JavaScript source, you could just eval it after compilation.
To answer your final question, no, AFAIK, there is no subset of dart2js available that lets you create your own Dart-to-JavaScript compiler.
I am trying to utilize clang tooling library for the purpose of my future tool.
What I would like to do with this tool is:
1. parse all the source code (with includes) and detect any of my keywords in the comments (comments will be some kind of interface between the programmer and my tool, which will do various things with the rest of the source code according to commands placed in the comments).
2. according to commands from the source code, do some refactoring of it
The refactoring itself will be done using clang AST, like from example below:
http://eli.thegreenplace.net/2014/07/29/ast-matchers-and-clang-refactoring-tools
What I am currently looking for is how to parse the comments within the same run of the clang tooling procedures. I do not want to add a separate step just for parsing the source code, because that already has to be done in the tooling library.
Do you know how to get information about the comments in the source code I am parsing with the tooling library?
Try the option -Wdocumentation and associated options (such as -fparse-all-comments). If you use tools such as clang-check or clang-tidy, add these options to the compile commands database.
This link describes how bytecode can be generated from an AST tree. Basically, it shows how the parsing phase of compilation can be bypassed and the AST be picked up by the java compiler to produce bytecode.
This works well but I would like to be able to generate the AST using javac the way it is without changing its source code and without any framework. Is this possible and has there been anything done like this before?
Thanks in advance for your reply.
So it turns out you cannot compile a tree created by the user using the arbitrary implementations of com.sun.source.tree.*. What can be done though is to print the AST to a string and compile the string in memory using the Java 6 Compiler API.
Is there any way to use the llvm-clang parser in an incremental/online manner?
Say I'm writing an editor and I want to be able to parse the C++ code I have in front of me.
I don't want to write my own hacked up parser.
I'd like to use something full featured, like llvm-clang.
Is there an easy way to hijack the llvm-clang parser? (And is it fast enough to run it continuously in the background)?
Thanks!
I don't think clang can incrementally parse C++ files, but it's one of the project's goals: http://clang.llvm.org/features.html
I've written something similar for my final year project. It wasn't a C++ editor but a Visual Studio plugin whose main task was improving C++ IntelliSense (like Visual Assist X).
When I was writing this project I also thought about a C++ incremental parser, but I didn't find any suitable solution. To solve the C++ IntelliSense problem I used the normal C++ parser from GCC. However, it was too slow to parse the file after each code completion request (Ctrl+Space); just try including boost::spirit. To make the project work properly, I parsed files in the background, and after each code completion request I compared the current file with its previous version (via diff) to detect the changes made since the last parse. With those changes I updated the syntax tree, mostly by adding or removing variables.
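The change-detection step can be approximated very cheaply. The sketch below is an assumption about how one might do it, not the plugin's actual code: instead of a full diff it trims the common prefix and suffix of the two line lists and reports the single range in between. `ChangedRange` and `changedRange` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// One contiguous edit: `oldCount` lines starting at `first` in the old
// version were replaced by `newCount` lines in the new version.
struct ChangedRange {
    std::size_t first;
    std::size_t oldCount;
    std::size_t newCount;
};

ChangedRange changedRange(const std::vector<std::string>& oldLines,
                          const std::vector<std::string>& newLines) {
    const std::size_t limit = std::min(oldLines.size(), newLines.size());

    // Lines that are identical at the start of both versions.
    std::size_t prefix = 0;
    while (prefix < limit && oldLines[prefix] == newLines[prefix]) ++prefix;

    // Lines that are identical at the end, not overlapping the prefix.
    std::size_t suffix = 0;
    while (suffix < limit - prefix &&
           oldLines[oldLines.size() - 1 - suffix] ==
               newLines[newLines.size() - 1 - suffix]) {
        ++suffix;
    }

    return {prefix,
            oldLines.size() - prefix - suffix,
            newLines.size() - prefix - suffix};
}
```

Only the declarations inside the reported range need to be re-examined; several scattered edits collapse into one larger range, which is where a real diff would do better.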
Besides incremental parsing, there is another problem with projects like this. Mostly you'll be parsing C++ code that is being edited, so it's invalid code. Given the complex C++ grammar, the parser sometimes won't be able to recover from syntax errors, so it won't correctly detect some symbols in the code.
Another issue is the differences between C++ parsers and compilers. Let's say I'm working in Visual Studio and I have used some VC++ compiler-specific construction in my code. The Clang parser won't be able to parse it correctly.
For writing something similar to IntelliSense, I would advise you to write your own parser using the LALR parsing algorithm. You can save its state at each line, so you don't have to reparse the whole file when it has been edited, which is very fast!
Note that C++ can't be fully expressed in BNF, but I think you could get pretty far with some adjustments. It's of course a lot more work than using Clang's frontend, but you could still use Clang for analysing header files in coöperation with your own parser.
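The "save the state at each line" idea can be shown with something much smaller than an LALR parser. The sketch below is a stand-in, not a parser: the only state it tracks is whether a line ends inside a `/* ... */` comment, and it ignores string literals. `State`, `scanLine` and `LineStates` are hypothetical names. After an edit it rescans from the edited line and stops as soon as the new end-of-line state matches the cached one, because everything after that point is unaffected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class State { Code, BlockComment };

// Scans one line starting in state `s`; returns the state at end of line.
State scanLine(const std::string& line, State s) {
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (s == State::Code) {
            if (line.compare(i, 2, "//") == 0) break;   // rest is a comment
            if (line.compare(i, 2, "/*") == 0) { s = State::BlockComment; ++i; }
        } else if (line.compare(i, 2, "*/") == 0) {
            s = State::Code;
            ++i;
        }
    }
    return s;
}

struct LineStates {
    std::vector<std::string> lines;
    std::vector<State> after;   // cached state at the end of each line

    explicit LineStates(std::vector<std::string> src) : lines(std::move(src)) {
        State s = State::Code;
        for (const std::string& l : lines) {
            s = scanLine(l, s);
            after.push_back(s);
        }
    }

    // Replaces one line and returns how many lines had to be rescanned.
    std::size_t edit(std::size_t index, const std::string& text) {
        assert(index < lines.size());
        lines[index] = text;
        State s = index == 0 ? State::Code : after[index - 1];
        std::size_t scanned = 0;
        for (std::size_t i = index; i < lines.size(); ++i) {
            s = scanLine(lines[i], s);
            ++scanned;
            if (s == after[i]) break;   // converged: later cache is still valid
            after[i] = s;
        }
        return scanned;
    }
};
```

An edit that does not change the end-of-line state rescans exactly one line; opening a comment rescans only until the comment closes again. A parser with a richer saved state (for example an LALR stack per line) would follow the same stop-when-the-state-matches rule.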