Case Insensitive String Comparison of Boost::Spirit Token Text in Semantic Action - parsing

I've got a tokeniser and a parser. the parser has a special token type, KEYWORD, for keywords (there are ~50). In my parser I want to ensure that the tokens are what I'd expect, so I've got rules for each. Like so:
KW_A = tok.KEYWORDS[_pass = (_1 == "A")];
KW_B = tok.KEYWORDS[_pass = (_1 == "B")];
KW_C = tok.KEYWORDS[_pass = (_1 == "C")];
This works well enough, but it's not case insensitive (and the grammar I'm trying to handle is!). I'd like to use boost::iequals, but attempts to convert _1 to an std::string result in the following error:
error: no viable conversion from 'const _1_type' (aka 'const actor<argument<0> >') to 'std::string' (aka 'basic_string<char>')
How can I treat these keywords as strings and ensure they're the expected text irrespective of case?

A little learning went a long way. I added the following to my lexer:
struct normalise_keyword_impl
{
template <typename Value>
struct result
{
typedef void type;
};
template <typename Value>
void operator()(Value const& val) const
{
// This modifies the original input string.
typedef boost::iterator_range<std::string::iterator> iterpair_type;
iterpair_type const& ip = boost::get<iterpair_type>(val);
std::for_each(ip.begin(), ip.end(),
[](char& in)
{
in = std::toupper(in);
});
}
};
boost::phoenix::function<normalise_keyword_impl> normalise_keyword;
// The rest...
};
And then used phoenix to bind the action to the keyword token in my constructor, like so:
this->self =
KEYWORD [normalise_keyword(_val)]
// The rest...
;
Although this accomplishes what I was after, It modifies the original input sequence. Is there some modification I could make so that I could use const_iterator instead of iterator, and avoid modifying my input sequence?
I tried returning an std::string copied from ip.begin() to ip.end() and uppercased using boost::toupper(...), assigning that to _val. Although it compiled and ran, there were clearly some problems with what it was producing:
Enter a sequence to be tokenised: select a from b
Input is 'select a from b'.
result is SELECT
Token: 0: KEYWORD ('KEYWOR')
Token: 1: REGULAR_IDENTIFIER ('a')
result is FROM
Token: 0: KEYWORD ('KEYW')
Token: 1: REGULAR_IDENTIFIER ('b')
Very peculiar, it appears I have some more learning to do.
Final Solution
Okay, I ended up using this function:
struct normalise_keyword_impl
{
template <typename Value>
struct result
{
typedef std::string type;
};
template <typename Value>
std::string operator()(Value const& val) const
{
// Copy the token and update the attribute value.
typedef boost::iterator_range<std::string::const_iterator> iterpair_type;
iterpair_type const& ip = boost::get<iterpair_type>(val);
auto result = std::string(ip.begin(), ip.end());
result = boost::to_upper_copy(result);
return result;
}
};
And this semantic action:
KEYWORD [_val = normalise_keyword(_val)]
With (and this sorted things out), a modified token_type:
typedef std::string::const_iterator base_iterator;
typedef boost::spirit::lex::lexertl::token<base_iterator, boost::mpl::vector<std::string> > token_type;
typedef boost::spirit::lex::lexertl::actor_lexer<token_type> lexer_type;
typedef type_system::Tokens<lexer_type> tokens_type;
typedef tokens_type::iterator_type iterator_type;
typedef type_system::Grammar<iterator_type> grammar_type;
// Establish our lexer and our parser.
tokens_type lexer;
grammar_type parser(lexer);
// ...
The important addition being boost::mpl::vector<std::string> >. The result:
Enter a sequence to be tokenised: select a from b
Input is 'select a from b'.
Token: 0: KEYWORD ('SELECT')
Token: 1: REGULAR_IDENTIFIER ('a')
Token: 0: KEYWORD ('FROM')
Token: 1: REGULAR_IDENTIFIER ('b')
I have no idea why this has corrected the problem so if someone could chime in with their expertise, I'm a willing student.

Related

clang-8: getting typedef information from DeclRefExpr node in AST

I have below test code:
typedef void (*funcPtrType)()
funcPtrType FPT;
void myFunc(){
}
int main(){
FPT = myFunc;
FPT();
return 0;
}
And following is the part of AST dump of this code:
My question is, from which API can I get the 'void (*)()' information from DeclRefExpr node?
Already tried dynamic casting this node to VarDecl but from it I could not reach the information I mentioned.
Thanks in advance.
If you have a DeclRefExpr, that is an expression that refers to a declared entity. Call the getDecl method to get the associated ValueDecl, which is the declaration itself. On that object, call getType to get the QualType, which is the type, possibly including cv-qualifiers.
For example:
DeclRefExpr const *dre = ...; // wherever you got it
ValueDecl const *decl = dre->getDecl();
QualType type = decl->getType();
In this case, the type is a typedef. To inspect the underlying type, call getTypePtr to get the unqualified type, then getUnqualifiedDesugaredType to skip typedefs:
clang::Type const *underType = type.getTypePtr()->getUnqualifiedDesugaredType();
You can then call, for example, underType->isPointerType() to find out if it is a pointer type, etc. See the documentation for clang::Type for other ways to query it.
If you want to get a string representation of underType, use the static QualType::print method, something like this:
LangOptions lo;
PrintingPolicy pp(lo);
std::string s;
llvm::raw_string_ostream rso(s);
QualType::print(underType, Qualifiers(), rso, lo, llvm::Twine());
errs() << "type as string: \"" << rso.str() << "\"\n";
For your example, this will print:
type as string: "void (*)()"

How do I find the SourceLocation of the commas between function arguments using libtooling?

My main goal is trying to get macros (or even just the text) before function parameters. For example:
void Foo(_In_ void* p, _Out_ int* x, _Out_cap_(2) int* y);
I need to gracefully handle things like macros that declare parameters (by ignoring them).
#define Example _In_ int x
void Foo(Example);
I've looked at Preprocessor record objects and used Lexer::getSourceText to get the macro names In, Out, etc, but I don't see a clean way to map them back to the function parameters.
My current solution is to record all the macro expansions in the file and then compare their SourceLocation to the ParamVarDecl SourceLocation. This mostly works except I don't know how to skip over things after the parameter.
void Foo(_In_ void* p _Other_, _In_ int y);
Getting the SourceLocation of the comma would work, but I can't find that anywhere.
The title of the questions asks for libclang, but as you use Lexer::getSourceText I assume that it's libTooling. The rest of my answer is viable only in terms of libTooling.
Solution 1
Lexer works on the level of tokens. Comma is also a token, so you can take the end location of a parameter and fetch the next token using Lexer::findNextToken.
Here is a ParmVarDecl (for function parameters) and CallExpr (for function arguments) visit functions that show how to use it:
template <class T> void printNextTokenLocation(T *Node) {
auto NodeEndLocation = Node->getSourceRange().getEnd();
auto &SM = Context->getSourceManager();
auto &LO = Context->getLangOpts();
auto NextToken = Lexer::findNextToken(NodeEndLocation, SM, LO);
if (!NextToken) {
return;
}
auto NextTokenLocation = NextToken->getLocation();
llvm::errs() << NextTokenLocation.printToString(SM) << "\n";
}
bool VisitParmVarDecl(ParmVarDecl *Param) {
printNextTokenLocation(Param);
return true;
}
bool VisitCallExpr(CallExpr *Call) {
for (auto *Arg : Call->arguments()) {
printNextTokenLocation(Arg);
}
return true;
}
For the following code snippet:
#define FOO(x) int x
#define BAR float d
#define MINUS -
#define BLANK
void foo(int a, double b ,
FOO(c) , BAR) {}
int main() {
foo( 42 ,
36.6 , MINUS 10 , BLANK 0.0 );
return 0;
}
it produces the following output (six locations for commas and two for parentheses):
test.cpp:6:15
test.cpp:6:30
test.cpp:7:19
test.cpp:7:24
test.cpp:10:17
test.cpp:11:12
test.cpp:11:28
test.cpp:11:43
This is quite a low-level and error-prone approach though. However, you can change the way you solve the original problem.
Solution 2
Clang stores information about expanded macros in its source locations. You can find related methods in SourceManager (for example, isMacroArgExpansion or isMacroBodyExpansion). As the result, you can visit ParmVarDecl nodes and check their locations for macro expansions.
I would strongly advice moving in the second direction.
I hope this information will be helpful. Happy hacking with Clang!
UPD speaking of attributes, unfortunately, you won't have a lot of choices. Clang does ignore any unknown attribute and this behaviour is not tweakable. If you don't want to patch Clang itself and add your attributes to Attrs.td, then you're limited indeed to tokens and the first approach.

Building call graphs using Clang AST, link parameters to arguments

I am trying to build call graphs using Clang AST.
Is there a way to somehow link the parameters of a function to the arguments of an inner function call?
For example, given the following function:
void chainedIncrement(int *ptr) {
simplePointerIncr(ptr);
for (int i=0;i<3;i++) {
simplePointerIncr(ptr);
}
}
I looking for a way to be able to link ptr from chainedIncrement function to the argument of simplePointerIncr function. Doing this will allow building a call graph.
Maybe there is a way of getting the same id while calling getId() on parameters and arguments.
I tried to use the following AST matcher:
functionDecl(hasDescendant(callExpr(callee(functionDecl().bind("calleeFunc")),unless(isExpansionInSystemHeader())).bind("callExpr");)).bind("outerFunc")
It seems that arguments are of type Expr while function parameters are of type ParmVarDecl.
Assuming that the parameter is passed as-is, without modification to an inner function, is there a way to link them somehow?
Thanks
UPDATE: Added my solution
There is a matcher called forEachArgumentWithParam(). It allows to bind arguments to a callee function to its parameters.
Another matcher, equalsBoundNode() allows to bind the parameters of the outer function, to the arguments of the callee function.
auto calleeArgVarDecl = declRefExpr(to(varDecl().bind("callerArg")));
auto innerCallExpr = callExpr(
forEachArgumentWithParam(calleeArgVarDecl, parmVarDecl().bind("calleeParam")),
callee(functionDecl().bind("calleeFunc")),unless(isExpansionInSystemHeader())).bind("callExpr");
auto fullMatcher = functionDecl(forEachDescendant(innerCallExpr),forEachDescendant(parmVarDecl(equalsBoundNode("callerArg")).bind("outerFuncParam"))).bind("outerFunc");
Here is a simplified example:
int add2(int var) {
return var+2;
}
int caller(int var) {
add2(var);
for (int i=0; i<3; i++) {
add2(var);
}
return var;
}
int main(int argc, const char **argv) {
int ret = 0;
caller(ret);
return 0;
}
use Clang-query to show the matcher result:
clang-query> match callExpr(hasAnyArgument(hasAncestor(functionDecl(hasName("caller")))))
Match #1:
~/main.cpp:5:3: note: "root" binds here
add2(var);
^~~~~~~~~
Match #2:
~/main.cpp:7:5: note: "root" binds here
add2(var);
^~~~~~~~~
2 matches.
It matches the function calls that use the parameter of function caller
There is a matcher called forEachArgumentWithParam(). It allows to bind arguments to a callee function to its parameters.
Another matcher, equalsBoundNode() allows to bind the parameters of the outer function, to the arguments of the callee function.
auto calleeArgVarDecl = declRefExpr(to(varDecl().bind("callerArg")));
auto innerCallExpr = callExpr(
forEachArgumentWithParam(calleeArgVarDecl, parmVarDecl().bind("calleeParam")),
callee(functionDecl().bind("calleeFunc")),unless(isExpansionInSystemHeader())).bind("callExpr");
auto fullMatcher = functionDecl(forEachDescendant(innerCallExpr),forEachDescendant(parmVarDecl(equalsBoundNode("callerArg")).bind("outerFuncParam"))).bind("outerFunc");

Retrieve LHS/RHS value of operator

I'm looking to do something similar to this how to get integer variable name and its value from Expr* in clang using the RecursiveASTVisitor
The goal is to first retrieve all assignment operations then perform my own checks on them, to do taint analysis.
I've overridden the VisitBinaryOperator as such
bool VisitBinaryOperator (BinaryOperator *bOp) {
if ( !bOP->isAssignmentOp() ) {
return true;
}
Expr *LHSexpr = bOp->getLHS();
Expr *RHSexpr = bOp->getRHS();
LHSexpr->dump();
RHSexpr->dump();
}
This RecursiveASTVisitor is being run on Objective C codes, so I do not know what the LHS or RHS type will evaluate to (could even be a function on the RHS?)
Would it be possible to get the text representation of what is on the LHS/RHS out from clang in order to perform regex expression on them??
Sorry, I found something similar that works for this particular case.
Solution:
bool VisitBinaryOperator (BinaryOperator *bOp) {
if ( !bOP->isAssignmentOp() ) {
return true;
}
Expr *LHSexpr = bOp->getLHS();
Expr *RHSexpr = bOp->getRHS();
std::string LHS_string = convertExpressionToString(LHSexpr);
std::string RHS_string = convertExpressionToString(RHSexpr);
return true;
}
std::string convertExpressionToString(Expr *E) {
SourceManager &SM = Context->getSourceManager();
clang::LangOptions lopt;
SourceLocation startLoc = E->getLocStart();
SourceLocation _endLoc = E->getLocEnd();
SourceLocation endLoc = clang::Lexer::getLocForEndOfToken(_endLoc, 0, SM, lopt);
return std::string(SM.getCharacterData(startLoc), SM.getCharacterData(endLoc) - SM.getCharacterData(startLoc));
}
Only thing I'm not very sure about is why _endLoc is required to compute endLoc and how is the Lexer actually working.
EDIT:
Link to the post I found help Getting the source behind clang's AST

Evaluating Mathematical Expressions using Lua

In my previous question I was looking for a way of evaulating complex mathematical expressions in C, most of the suggestions required implementing some type of parser.
However one answer, suggested using Lua for evaluating the expression. I am interested in this approach but I don't know anything about Lua.
Can some one with experience in Lua shed some light?
Specifically what I'd like to know is
Which API if any does Lua provide that can evaluate mathematical expressions passed in as a string? If there is no API to do such a thing, may be some one can shed some light on the linked answer as it seemed like a good approach :)
Thanks
The type of expression I'd like to evaluate is given some user input such as
y = x^2 + 1/x - cos(x)
evaluate y for a range of values of x
It is straightforward to set up a Lua interpreter instance, and pass it expressions to be evaluated, getting back a function to call that evaluates the expression. You can even let the user have variables...
Here's the sample code I cooked up and edited into my other answer. It is probably better placed on a question tagged Lua in any case, so I'm adding it here as well. I compiled this and tried it for a few cases, but it certainly should not be trusted in production code without some attention to error handling and so forth. All the usual caveats apply here.
I compiled and tested this on Windows using Lua 5.1.4 from Lua for Windows. On other platforms, you'll have to find Lua from your usual source, or from www.lua.org.
Update: This sample uses simple and direct techniques to hide the full power and complexity of the Lua API behind as simple as possible an interface. It is probably useful as-is, but could be improved in a number of ways.
I would encourage readers to look into the much more production-ready ae library by lhf for code that takes advantage of the API to avoid some of the quick and dirty string manipulation I've used. His library also promotes the math library into the global name space so that the user can say sin(x) or 2 * pi without having to say math.sin and so forth.
Public interface to LE
Here is the file le.h:
/* Public API for the LE library.
*/
int le_init();
int le_loadexpr(char *expr, char **pmsg);
double le_eval(int cookie, char **pmsg);
void le_unref(int cookie);
void le_setvar(char *name, double value);
double le_getvar(char *name);
Sample code using LE
Here is the file t-le.c, demonstrating a simple use of this library. It takes its single command-line argument, loads it as an expression, and evaluates it with the global variable x changing from 0.0 to 1.0 in 11 steps:
#include <stdio.h>
#include "le.h"
int main(int argc, char **argv)
{
int cookie;
int i;
char *msg = NULL;
if (!le_init()) {
printf("can't init LE\n");
return 1;
}
if (argc<2) {
printf("Usage: t-le \"expression\"\n");
return 1;
}
cookie = le_loadexpr(argv[1], &msg);
if (msg) {
printf("can't load: %s\n", msg);
free(msg);
return 1;
}
printf(" x %s\n"
"------ --------\n", argv[1]);
for (i=0; i<11; ++i) {
double x = i/10.;
double y;
le_setvar("x",x);
y = le_eval(cookie, &msg);
if (msg) {
printf("can't eval: %s\n", msg);
free(msg);
return 1;
}
printf("%6.2f %.3f\n", x,y);
}
}
Here is some output from t-le:
E:...>t-le "math.sin(math.pi * x)"
x math.sin(math.pi * x)
------ --------
0.00 0.000
0.10 0.309
0.20 0.588
0.30 0.809
0.40 0.951
0.50 1.000
0.60 0.951
0.70 0.809
0.80 0.588
0.90 0.309
1.00 0.000
E:...>
Implementation of LE
Here is le.c, implementing the Lua Expression evaluator:
#include <lua.h>
#include <lauxlib.h>
#include <stdlib.h>
#include <string.h>
static lua_State *L = NULL;
/* Initialize the LE library by creating a Lua state.
*
* The new Lua interpreter state has the "usual" standard libraries
* open.
*/
int le_init()
{
L = luaL_newstate();
if (L)
luaL_openlibs(L);
return !!L;
}
/* Load an expression, returning a cookie that can be used later to
* select this expression for evaluation by le_eval(). Note that
* le_unref() must eventually be called to free the expression.
*
* The cookie is a lua_ref() reference to a function that evaluates the
* expression when called. Any variables in the expression are assumed
* to refer to the global environment, which is _G in the interpreter.
* A refinement might be to isolate the function envioronment from the
* globals.
*
* The implementation rewrites the expr as "return "..expr so that the
* anonymous function actually produced by lua_load() looks like:
*
* function() return expr end
*
*
* If there is an error and the pmsg parameter is non-NULL, the char *
* it points to is filled with an error message. The message is
* allocated by strdup() so the caller is responsible for freeing the
* storage.
*
* Returns a valid cookie or the constant LUA_NOREF (-2).
*/
int le_loadexpr(char *expr, char **pmsg)
{
int err;
char *buf;
if (!L) {
if (pmsg)
*pmsg = strdup("LE library not initialized");
return LUA_NOREF;
}
buf = malloc(strlen(expr)+8);
if (!buf) {
if (pmsg)
*pmsg = strdup("Insufficient memory");
return LUA_NOREF;
}
strcpy(buf, "return ");
strcat(buf, expr);
err = luaL_loadstring(L,buf);
free(buf);
if (err) {
if (pmsg)
*pmsg = strdup(lua_tostring(L,-1));
lua_pop(L,1);
return LUA_NOREF;
}
if (pmsg)
*pmsg = NULL;
return luaL_ref(L, LUA_REGISTRYINDEX);
}
/* Evaluate the loaded expression.
*
* If there is an error and the pmsg parameter is non-NULL, the char *
* it points to is filled with an error message. The message is
* allocated by strdup() so the caller is responsible for freeing the
* storage.
*
* Returns the result or 0 on error.
*/
double le_eval(int cookie, char **pmsg)
{
int err;
double ret;
if (!L) {
if (pmsg)
*pmsg = strdup("LE library not initialized");
return 0;
}
lua_rawgeti(L, LUA_REGISTRYINDEX, cookie);
err = lua_pcall(L,0,1,0);
if (err) {
if (pmsg)
*pmsg = strdup(lua_tostring(L,-1));
lua_pop(L,1);
return 0;
}
if (pmsg)
*pmsg = NULL;
ret = (double)lua_tonumber(L,-1);
lua_pop(L,1);
return ret;
}
/* Free the loaded expression.
*/
void le_unref(int cookie)
{
if (!L)
return;
luaL_unref(L, LUA_REGISTRYINDEX, cookie);
}
/* Set a variable for use in an expression.
*/
void le_setvar(char *name, double value)
{
if (!L)
return;
lua_pushnumber(L,value);
lua_setglobal(L,name);
}
/* Retrieve the current value of a variable.
*/
double le_getvar(char *name)
{
double ret;
if (!L)
return 0;
lua_getglobal(L,name);
ret = (double)lua_tonumber(L,-1);
lua_pop(L,1);
return ret;
}
Remarks
The above sample consists of 189 lines of code total, including a spattering of comments, blank lines, and the demonstration. Not bad for a quick function evaluator that knows how to evaluate reasonably arbitrary expressions of one variable, and has rich library of standard math functions at its beck and call.
You have a Turing-complete language underneath it all, and it would be an easy extension to allow the user to define complete functions as well as to evaluate simple expressions.
Since you're lazy, like most programmers, here's a link to a simple example that you can use to parse some arbitrary code using Lua. From there, it should be simple to create your expression parser.
This is for Lua users that are looking for a Lua equivalent of "eval".
The magic word used to be loadstring but it is now, since Lua 5.2, an upgraded version of load.
i=0
f = load("i = i + 1") -- f is a function
f() ; print(i) -- will produce 1
f() ; print(i) -- will produce 2
Another example, that delivers a value :
f=load('return 2+3')
print(f()) -- print 5
As a quick-and-dirty way to do, you can consider the following equivalent of eval(s), where s is a string to evaluate :
load(s)()
As always, eval mechanisms should be avoided when possible since they are expensive and produce a code difficult to read.
I personally use this mechanism with LuaTex/LuaLatex to make math operations in Latex.
The Lua documentation contains a section titled The Application Programming Interface which describes how to call Lua from your C program. The documentation for Lua is very good and you may even be able to find an example of what you want to do in there.
It's a big world in there, so whether you choose your own parsing solution or an embeddable interpreter like Lua, you're going to have some work to do!
function calc(operation)
return load("return " .. operation)()
end

Resources