What am I doing?
I'm building a C code analyzer that runs some checks at compile time and, if they pass, compiles the code.
What do I want?
For my goal, I need to lex and parse the user's source code. Maybe in the future I'll write my own lexer and parser for this, but for now I just want a tool that does it for me and hands me the final AST, so I can walk it and do my analysis.
So: I want something like a library that takes C source code, parses it, and gives me an AST structure.
What have I tried?
I searched for tools to do this. It seems Clang has some facilities and libraries for it, but I couldn't find anything helpful.
Does anyone know a useful way to achieve this?
Thank you for helping me!
Related
I want to use myLib for the purposes of recalling strings or tables, as a separately updatable package to my main program. I have looked at some programs that use it but I'm too silly to understand what is going on, could someone provide an explanation for how to achieve this for someone who hasn't got a clue what is going on?
I am trying to parse Haskell source code and generate a decision tree to analyze different paths Haskell programs can take.
haskell-src-exts gives a reasonable representation, but does not have any type information associated with it.
Does GHC or some other tool provide that functionality?
There's no tool except GHC that is particularly adept at typechecking Haskell source at the moment. A haskell-type-exts package was in development to match haskell-src-exts, but it was never completed.
So you can use a reasonable wrapper around the GHC API, such as hint, and invoke it on the subexpressions you want to check using its type inference API.
This is a rather painful approach, but I can't think of much better. If you're only interested in working on Haskell-like code as an exercise, you could instead import the PureScript compiler as a library, and then you'll be able to get a fully type-annotated syntax tree in a more reasonable way.
Alternatively, you can try to navigate the thicket of the GHC API itself to get fully typechecked source...
If you choose to go that route, this answer may get you started.
I am looking for a Go library providing CFG parsing (preferably not in Chomsky Normal Form). Has anybody heard of anything, or should I write it? :)
Do you know about goyacc? It's not a library but a code generator. Still, it supports CFGs, and in my opinion it's a pretty standard way to handle such tasks.
I cannot help you specifically with CFGs, but the Go Dashboard is a good central list of Go libraries.
Looking over it for parsers, two look helpful at first glance:
go-parse, modeled after Haskell's Parsec, and
peg for Parsing Expression Grammars.
I have been trying to obtain ASTs from Clang, but I have not been successful so far. I found a one-year-old question here on Stack Overflow that mentions two other ways to obtain the AST using Clang:
./llvmc -cc1 -ast-dump file.c
./llvmc -cc1 -ast-print file.c
That question mentions doxygen and a representation that contains an AST, but I am mostly looking for a textual form such as XML so that further analysis can be performed.
Lastly, there was another question here on Stack Overflow about exactly this XML import, but that feature was discontinued for several reasons that are also mentioned there.
My question is thus: which version should I use, and how can I use it from the console to obtain AST-related information for a given piece of C code? I expect this to be a painless one-line command like those above, but as far as I have read, the documentation index does not mention anything about the AST, and the only thing I found for llvmc was about writing an AST by hand, which is not really what I am looking for.
I tried all of the commands above, but they all fail on version 2.9, and I have already found out that LLVM changes a great deal between versions.
Thank you.
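For readers landing here with a current toolchain, the one-line dump asked about above can be scripted. This is only a sketch under assumptions: it assumes a `clang` binary on the PATH that accepts `-Xclang -ast-dump` together with `-fsyntax-only`, and that the JSON variant `-ast-dump=json` is available (it needs a reasonably recent Clang, so it will not work on 2.9). The helper names `build_ast_dump_command` and `dump_ast` are made up for illustration.

```python
import shutil
import subprocess


def build_ast_dump_command(source_path, as_json=False, clang="clang"):
    """Build the argument list for dumping the AST of one C file.

    -fsyntax-only stops after parsing and semantic analysis, so no object
    file is produced; -Xclang forwards the dump flag to the compiler front end.
    """
    dump_flag = "-ast-dump=json" if as_json else "-ast-dump"
    return [clang, "-Xclang", dump_flag, "-fsyntax-only", source_path]


def dump_ast(source_path, as_json=False, clang="clang"):
    """Run clang and return the AST dump as text (assumes clang is installed)."""
    if shutil.which(clang) is None:
        raise FileNotFoundError(f"{clang!r} was not found on PATH")
    result = subprocess.run(
        build_ast_dump_command(source_path, as_json=as_json, clang=clang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if clang rejects the file
    )
    return result.stdout


if __name__ == "__main__":
    # Print the equivalent shell command for a hypothetical file.c.
    print(" ".join(build_ast_dump_command("file.c")))
```

With `as_json=True` the returned text can be fed to `json.loads`, which gives a machine-readable tree in place of the discontinued XML output.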
OP says "open to other suggestions as well".
He might consider our DMS Software Reengineering Toolkit with its C Front End.
While it would be pretty easy to exhibit a C AST produced by this, it is easier to show an Objective-C AST [already at SO] produced by DMS using the same C front end (Objective-C is a dialect of C).
See https://stackoverflow.com/a/10749970/120163. DMS can produce an XML equivalent of this, too.
We don't recommend exporting trees as text, because real trees for real code are simply enormous and are poorly manipulated as text or XML objects in our experience. One usually needs machinery beyond parsing. See my discussion about Life After Parsing.
I am currently working on a parser, and it seems that I have made a few mistakes during the follow set calculation. So I was wondering if someone knows a good tool to calculate first and follow sets, so I could skip or re-evaluate this error-prone part of the parser construction.
Take a look at http://hackingoff.com/compilers/predict-first-follow-set
It's an awesome tool for computing the first and follow sets of a grammar. You can also check your answer with this visualization tool:
http://smlweb.cpsc.ucalgary.ca/start.html
I found my mistake by comparing my first/follow sets with the ones generated by this web app.
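For an offline cross-check of hand-computed sets, the usual fixed-point computation is short enough to write out. This is a minimal sketch, not how either web tool works internally: the grammar encoding (a dict from nonterminal to a list of productions, with `[]` standing for an empty production), the `EPSILON` and `END` markers, and the function names are all assumptions made for the example. Any symbol that is not a key of the dict is treated as a terminal.

```python
EPSILON = "ε"  # marks "can derive the empty string" inside FIRST sets
END = "$"      # end-of-input marker placed in FOLLOW(start)


def first_of_sequence(symbols, first, grammar):
    """FIRST of a symbol string; includes EPSILON if every symbol can vanish."""
    result = set()
    for symbol in symbols:
        if symbol not in grammar:  # terminal: it starts the string, stop here
            result.add(symbol)
            return result
        result |= first[symbol] - {EPSILON}
        if EPSILON not in first[symbol]:
            return result
    result.add(EPSILON)  # reached only when all symbols are nullable (or none)
    return result


def compute_first(grammar):
    """Iterate until no FIRST set grows any more."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for production in productions:
                new = first_of_sequence(production, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    """Iterate until no FOLLOW set grows any more."""
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for production in productions:
                for i, symbol in enumerate(production):
                    if symbol not in grammar:
                        continue  # FOLLOW is only defined for nonterminals
                    rest = first_of_sequence(production[i + 1:], first, grammar)
                    addition = rest - {EPSILON}
                    if EPSILON in rest:  # the tail can vanish: inherit FOLLOW(nt)
                        addition |= follow[nt]
                    if not addition <= follow[symbol]:
                        follow[symbol] |= addition
                        changed = True
    return follow


# Classic expression grammar used as sample input.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, "E", FIRST)

if __name__ == "__main__":
    for nt in GRAMMAR:
        print(nt, "FIRST:", sorted(FIRST[nt]), "FOLLOW:", sorted(FOLLOW[nt]))
```

Printing both sets side by side for each nonterminal makes it easy to diff them against a hand calculation and spot the rule where the two diverge.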
Most parser generators that I've encountered don't have obvious means to dump this information, let alone dump it in a readable way. (I built one that does, for the reason you are suggesting, but it isn't available by itself and I doubt you want the rest of the baggage).
If your parser definition doesn't work, you mostly don't need to know these things to debug it. Staring at the rules, amazingly enough, helps; it also helps to build the two smallest grammar instances you can think of, one being something you expect to be accepted, and the other being a slight variant that should be rejected.
In spite of having a parser generator that will dump this information, I rarely resort to using it to debug grammars, and I've built 20-30 pretty big grammars with it.