Get clang/llvm parser from yacc parser

I'm trying to build a parser for Promela in LLVM. I have the parser SPIN uses, which is built using yacc, including the input that goes to yacc. Is there a way to use the yacc parser to quickly and painlessly generate a clang/LLVM parser? I will be using it to generate call graphs and perform static analysis.

What I need to know now is whether I can use the existing Promela compiler, which was built with yacc, to quickly build a parser (and later, an IR generator) using the LLVM framework.
Yes, you can reuse the existing YACC grammar (and, if you want, even the existing AST) for your project. "Building a parser using the LLVM framework" is a bit misleading, though, because LLVM has nothing to do with parsing or the AST; LLVM won't enter into it until you generate LLVM IR and then work with that.
So you either take the existing YACC grammar together with the existing AST, or you take only the grammar and replace the actions with ones that build an AST you've defined yourself. Either way, that part won't involve LLVM.
Then you'd write a separate phase that walks the AST and generates LLVM IR using the LLVM API, on which you can then run all the transformations and analyses LLVM supports.
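As a rough illustration, here is a minimal sketch of such a phase in C++. The AST types (NumExpr, AddExpr) are invented for illustration and are not the SPIN/Promela AST; only the IRBuilder calls are the real LLVM API:

    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/raw_ostream.h"
    #include <memory>

    // Hypothetical AST: just two node kinds, enough to show the walk.
    struct Expr { virtual ~Expr() = default; };
    struct NumExpr : Expr { int value; explicit NumExpr(int v) : value(v) {} };
    struct AddExpr : Expr {
      std::unique_ptr<Expr> lhs, rhs;
      AddExpr(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
          : lhs(std::move(l)), rhs(std::move(r)) {}
    };

    // The separate phase: recurse over the AST and emit IR via the builder.
    llvm::Value *codegen(const Expr &e, llvm::IRBuilder<> &b) {
      if (auto *n = dynamic_cast<const NumExpr *>(&e))
        return b.getInt32(n->value);
      auto &a = static_cast<const AddExpr &>(e);
      return b.CreateAdd(codegen(*a.lhs, b), codegen(*a.rhs, b), "addtmp");
    }

    int main() {
      llvm::LLVMContext ctx;
      llvm::Module mod("promela", ctx);
      llvm::IRBuilder<> builder(ctx);

      auto *fnType = llvm::FunctionType::get(builder.getInt32Ty(), false);
      auto *fn = llvm::Function::Create(fnType, llvm::Function::ExternalLinkage,
                                        "main", &mod);
      builder.SetInsertPoint(llvm::BasicBlock::Create(ctx, "entry", fn));

      // 1 + 2, built by hand here; a real front end gets this from the parser.
      AddExpr root(std::make_unique<NumExpr>(1), std::make_unique<NumExpr>(2));
      builder.CreateRet(codegen(root, builder));

      mod.print(llvm::outs(), nullptr); // dump the textual IR
    }

Once the IR exists in the module, everything downstream (call graphs, passes, static analysis) is plain LLVM and no longer cares where the AST came from.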

Related

Enable only syntactic parsing in clang

I am trying to implement a clang tool that does syntactic analysis using the ASTMatcher API, and I am trying to find out how to pass extra flags to clang to disable semantic checks. I know clang builds a giant AST which includes the system headers. Is there any way to parse source code while disabling the semantic checks that give rise to unknown-type errors? I just want to analyze the syntactic integrity of the source code of the given file. So far, I have tried to get around this problem by restricting the matcher to code from the main file:
cxxRecordDecl(isExpansionInMainFile()).bind("class");
But this doesn't stop clang from looking into the header files.
Unfortunately, it is impossible to run plain syntactic analysis without Sema. The problem is not specific to clang but applies to every C++ parser out there: without simultaneous semantic analysis, C++ syntax is ambiguous. For example, a * b; is either a multiplication expression or a declaration of b as a pointer to a, depending on whether a names a type. The issue is properly covered in this answer.
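For reference, here is a minimal sketch of the matcher from the question wired into a standalone tool. The exact clang::tooling entry points have shifted slightly between Clang releases, so treat this as a sketch rather than a drop-in tool; note that Sema still runs in full, and the matcher only filters what you see afterwards:

    #include "clang/ASTMatchers/ASTMatchFinder.h"
    #include "clang/ASTMatchers/ASTMatchers.h"
    #include "clang/Tooling/Tooling.h"
    #include "llvm/Support/raw_ostream.h"

    using namespace clang;
    using namespace clang::ast_matchers;

    // Callback invoked for every node bound to "class".
    struct ClassPrinter : MatchFinder::MatchCallback {
      void run(const MatchFinder::MatchResult &Result) override {
        if (const auto *RD = Result.Nodes.getNodeAs<CXXRecordDecl>("class"))
          llvm::outs() << RD->getQualifiedNameAsString() << "\n";
      }
    };

    int main() {
      ClassPrinter Printer;
      MatchFinder Finder;
      // The matcher from the question: only records expanded in the main file.
      Finder.addMatcher(cxxRecordDecl(isExpansionInMainFile()).bind("class"),
                        &Printer);
      // Parsing here still performs full semantic analysis; there is no
      // syntax-only switch that skips Sema.
      tooling::runToolOnCode(tooling::newFrontendActionFactory(&Finder)->create(),
                             "class Foo {}; class Bar {};");
    }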

Bison C++ GLR parser using variants

I am currently creating a parser using Bison that uses the variant feature heavily. Since my grammar is not LALR(1), I want to use the GLR option.
When I try to do so I get the following error:
error: '"glr.cc"' does not support '%define api.value.type variant'
%define api.value.type variant
^^^^^^^^^^^^^^
What am I doing wrong?
Note: The answer below was valid when written and for the subsequent four and a half years. However, Bison v3.8 (released September 7, 2021) includes a new experimental C++ GLR implementation which does support variant semantic types. If you have updated your Bison installation to version 3.8, you can test this skeleton by adding the directive %skeleton "glr2.cc". The Changelog indicates:
It will eventually replace "glr.cc". However we need user feedback on this skeleton. Please report your results and comments about it.
To me, this suggests that it should not yet be used in production code, but undoubtedly this warning will become invalid sometime in the next four years. In the meantime, use your own judgement or read the answer below from 2017.
You are trying to build a GLR parser using the C++ API with a semantic type which is not POD, and that is not supported by the current C++ Bison GLR implementation.
In particular, the variant type used by Bison's C++ API is not POD, and so it cannot be used in a GLR parser, as the error message states.
The only workaround I know of is to fall back to a C-style discriminated union: a POD %union for the semantic values, plus a tag field where you need one.
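A minimal sketch of that fallback, with invented token and type names: the semantic value becomes a POD union, typically holding raw pointers that you manage yourself instead of C++ objects:

    %skeleton "glr.cc"
    %union {                /* POD union instead of %define api.value.type variant */
      int         ival;
      const char *sval;     /* raw pointer instead of std::string */
      Node       *node;     /* heap-allocated AST node instead of a C++ variant */
    }
    %token <ival> NUMBER
    %token <sval> IDENT
    %type  <node> expr

The price is manual memory management for anything the union points to, which is exactly what the variant feature was designed to avoid.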
If you look at "examples/c++/glr/c++types.yy" in the Bison distribution, you will see that this can now be solved by using the latest version with the "glr2.cc" skeleton:
%require "3.8"
%skeleton "glr2.cc"
Note that LR(k), for some k, classifies grammars and the languages they generate; GLR is not such a classification at all. GLR is a parsing technique.

AST of whole program

I would like to do transformations on the ASTs of a C program, but I need access to all of the ASTs created for the program in order to make the right changes. Clang processes one translation unit at a time, so I do not have access to the ASTs of all the translation units at the same time. Do you have any suggestions for how I can access all the ASTs created for a program, do analysis on them, and modify them?
As a summary, I need to:
1. Have access to all the ASTs of the program at the same time.
2. Do analysis on the ASTs.
3. Modify the ASTs based on my analysis and create LLVM IR from the modified ASTs.
You can try running llvm-link on all of your generated .ll files (from clang with -S -emit-llvm) to create one large LLVM module.
You have access to everything at that point.
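A sketch of that pipeline, with hypothetical file names:

    # Compile each translation unit to textual LLVM IR.
    clang -S -emit-llvm a.c -o a.ll
    clang -S -emit-llvm b.c -o b.ll
    # Merge the modules so whole-program analyses can see everything at once.
    llvm-link -S a.ll b.ll -o whole_program.ll

Note that this gives you one merged IR module rather than merged Clang ASTs, so the analysis and modification would happen at the IR level, not on the ASTs themselves.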

Multiple language parser generator

Is there a parser generator that can take a single grammar and create a parser in both C# and JavaScript?
I've tried using ANTLR, but I have yet to get it working in Visual Studio (lackluster/outdated documentation and packages).
The end goal is to manage a single grammar (for a subset of SQL; specifically, select statements and a few new keywords specific to my problem domain) but generate two parsers (C#/JavaScript).
Any help is much appreciated.
Is there a parser generator that can take a single grammar and create a parser in both C# and JavaScript?
The only one I am aware of is ANTLR. Note that ANTLR will not generate both a JavaScript- and a C#-based parser in one go, though. You will have to change (at least) one option in the grammar and invoke org.antlr.Tool again to generate a parser for your other target language.
I've tried using ANTLR, but I have yet to get it into Visual Studio
Then don't use Visual Studio, but use your favorite text editor (and use org.antlr.Tool from the console), or ANTLRWorks.
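For example, in ANTLR v3 (the generation this question refers to, given org.antlr.Tool and ANTLRWorks), the target is selected by the language option inside the grammar, so retargeting is a one-line change. A minimal sketch, with an invented grammar name and rules:

    grammar MiniSelect;

    options {
      // Change to 'JavaScript' and re-run org.antlr.Tool for the JS parser.
      language = CSharp3;
    }

    select_stmt : 'select' ID (',' ID)* 'from' ID ;
    ID : ('a'..'z' | 'A'..'Z' | '_')+ ;
    WS : (' ' | '\t' | '\r' | '\n')+ ;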
There's canopy, which generates parsers in JavaScript, Ruby, Java, and Python from a PEG grammar.
My AGL parser builder is written in Kotlin common code, so it can be used on any Kotlin target (JVM, JavaScript, native code, etc.).
https://medium.com/@dr.david.h.akehurst/a-kotlin-multi-platform-parser-usable-from-a-jvm-or-javascript-59e870832a79
Unfortunately, Kotlin does not yet target .NET... but maybe it will come in the future.
docopt lets you describe your help message in a string following some common conventions, and that is how all commands, options, and arguments are defined.
Docopt has many official implementations: Python, Bash, C#, Rust, Ruby, C++, Go, R, Julia, Nim, Haskell, PHP, C, F#, CoffeeScript, Swift, Scala, D, Java, Clojure, Tcl, Lua.
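As a small illustration, a usage string along these lines (the command and option names are made up here) is itself the complete definition that every docopt implementation parses:

    Usage:
      sqlq select <columns>... --from=<table> [--where=<cond>]
      sqlq (-h | --help)

    Options:
      -h --help       Show this screen.
      --from=<table>  Table to select from.
      --where=<cond>  Optional filter expression.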

What's the relationship between Xtext and ANTLR?

I heard that Xtext ultimately uses ANTLR but their grammar specification files have somewhat different formats. So what's the relationship between the two?
Xtext relies on the ANTLR parser generator for parsing input files. On top of that, the framework provides lots of added value, such as strongly typed ASTs, abstractions for linking and static analysis, as well as IDE integration for Eclipse.
For that purpose, Xtext generates two ANTLR grammars: one for the production parsing, where the actual AST is produced, and a second that is used to compute the content proposals for the Eclipse editor.
The ANTLR grammar is generated from the Xtext grammar. You may find it in src-gen/org/example/dsl/parser/antlr/internal/InternalDsl.g.
