Function call analysis using Clang

I am using clang to do some kind of source to source transformation. I would like to do the following:
I have some class of functions in C which are va_arg functions, e.g. printf(). There might be a number of calls to printf() in the source file. I want to parse the source code and find all these calls to printf(). Furthermore, I want to find the types of the arguments that are passed to printf(). So, if I have something like
int a, b, c;
printf("%d%d%d", a, b, c);
I want to be able to figure out that the particular call to printf is of type printf(char*, int, int, int). I don't particularly care about qualifiers.
Could someone tell me how I should go about doing this in clang? Any example doing anything similar to this would be welcome. If you could even tell me what all classes I should be looking at and in brief tell me the flow that I should follow, I would be very grateful.

You should write an ASTConsumer. The first thing to look at is the code in examples/PrintFunctionNames which is a very simple ASTConsumer.
One way to find all the calls to printf is through the RecursiveASTVisitor, looking for the CallExpr nodes. These nodes have getNumArgs() and getArg(n), which let you examine the arguments. You can call expr->getType() on those expressions to get their types.


How can I find all uses of a ValueDecl?

I'd like to take clang AST, analyze how a certain variable is used and do some
source-to-source transformation if a specific usage pattern is recognized.
Particularly, I'm looking for patterns like this:
void *h;
h = create_handler(...);
use_handler(h);
destroy_handler(h);
So far, I am able to detect ValueDecl corresponding to void *h. Next step
would be to find all uses of h and see if they are safe and if
create_handler/destroy_handler properly dominate/post-dominate one another.
Unfortunately, I have no idea how to iterate over h's uses, it seems that
there is no such interface in ValueDecl class.
I'd appreciate it if you could either suggest how I could find all uses of a
variable in AST, or point me to some clang-based tool dealing with a similar problem.
Thank you!
One can match declRefExprs referencing the variable (using AST matchers). After that, ParentMap could be used to traverse AST backward and find recursively AST nodes which use those declRefExprs. Keep in mind that typically ParentMap is constructed not for the whole AST but for a subtree only (passed as a parameter into the constructor).

Why are there two kinds of functions in Elixir?

I'm learning Elixir and wonder why it has two types of function definitions:
functions defined in a module with def, called using myfunction(param1, param2)
anonymous functions defined with fn, called using myfn.(param1, param2)
Only the second kind of function seems to be a first-class object and can be passed as a parameter to other functions. A function defined in a module needs to be wrapped in a fn. There's some syntactic sugar which looks like otherfunction(&myfunction(&1, &2)) in order to make that easy, but why is it necessary in the first place? Why can't we just do otherfunction(myfunction)? Is it only to allow calling module functions without parentheses like in Ruby? It seems to have inherited this characteristic from Erlang, which also has module functions and funs, so does it actually come from how the Erlang VM works internally?
Is there any benefit in having two types of functions and converting from one type to another in order to pass them to other functions? Is there a benefit in having two different notations to call functions?
Just to clarify the naming, they are both functions. One is a named function and the other is an anonymous one. But you are right, they work somewhat differently and I am going to illustrate why they work like that.
Let's start with the second, fn. fn is a closure, similar to a lambda in Ruby. We can create it as follows:
x = 1
fun = fn y -> x + y end
fun.(2) #=> 3
A function can have multiple clauses too:
x = 1
fun = fn
  y when y < 0 -> x - y
  y -> x + y
end
fun.(2) #=> 3
fun.(-2) #=> 3
Now, let's try something different. Let's try to define different clauses expecting a different number of arguments:
fn
  x, y -> x + y
  x -> x
end
** (SyntaxError) cannot mix clauses with different arities in function definition
Oh no! We get an error! We cannot mix clauses that expect a different number of arguments. A function always has a fixed arity.
Now, let's talk about the named functions:
def hello(x, y) do
  x + y
end
As expected, they have a name and they can also receive some arguments. However, they are not closures:
x = 1
def hello(y) do
  x + y
end
This code will fail to compile because every time you see a def, you get an empty variable scope. That is an important difference between them. I particularly like the fact that each named function starts with a clean slate and you don't get the variables of different scopes all mixed up together. You have a clear boundary.
We could retrieve the named hello function above as an anonymous function. You mentioned it yourself:
other_function(&hello(&1))
And then you asked, why I cannot simply pass it as hello as in other languages? That's because functions in Elixir are identified by name and arity. So a function that expects two arguments is a different function than one that expects three, even if they had the same name. So if we simply passed hello, we would have no idea which hello you actually meant. The one with two, three or four arguments? This is exactly the same reason why we can't create an anonymous function with clauses with different arities.
Since Elixir v0.10.1, we have a syntax to capture named functions:
&hello/1
That will capture the local named function hello with arity 1. Throughout the language and its documentation, it is very common to identify functions in this hello/1 syntax.
This is also why Elixir uses a dot for calling anonymous functions. Since you can't simply pass hello around as a function, instead you need to explicitly capture it, there is a natural distinction between named and anonymous functions and a distinct syntax for calling each makes everything a bit more explicit (Lispers would be familiar with this due to the Lisp 1 vs. Lisp 2 discussion).
Overall, those are the reasons why we have two functions and why they behave differently.
I don't know how useful this will be to anyone else, but the way I finally wrapped my head around the concept was to realize that elixir functions aren't Functions.
Everything in elixir is an expression. So
MyModule.my_function(foo)
is not a function but the expression returned by executing the code in my_function. There is actually only one way to get a "Function" that you can pass around as an argument and that is to use the anonymous function notation.
It is tempting to refer to the fn or & notation as a function pointer, but it is actually much more. It's a closure of the surrounding environment.
If you ask yourself:
Do I need an execution environment or a data value in this spot?
And if you need execution use fn, then most of the difficulties become much
clearer.
I may be wrong since nobody mentioned it, but I was also under the impression that the reason for this is also the ruby heritage of being able to call functions without brackets.
Arity is obviously involved, but let's put it aside for a while and use functions without arguments. In a language like JavaScript, where brackets are mandatory, it is easy to tell the difference between passing a function as an argument and calling the function. You call it only when you use the brackets.
my_function // argument
(function() {}) // argument
my_function() // function is called
(function() {})() // function is called
As you can see, naming it or not does not make a big difference. But Elixir and Ruby allow you to call functions without the brackets. This is a design choice which I personally like, but it has the side effect that you cannot use just the name without the brackets, because it could mean you want to call the function. This is what the & is for. If you leave arity apart for a second, prepending your function name with & means that you explicitly want to use this function as an argument, not what this function returns.
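To make the Ruby half of that comparison concrete, here is a minimal sketch. The method names `answer` and `apply` are made up for illustration. Because a bare name already means "call it", Ruby needs an explicit step, `method(:name)`, to get hold of the function itself, which plays a role loosely similar to & in Elixir.

```ruby
# Hypothetical example method; the name is illustrative only.
def answer
  42
end

# Hypothetical higher-order method that calls whatever it is given.
def apply(callable)
  callable.call
end

called    = answer           # bare name: the method runs and we get its result
reference = method(:answer)  # explicit request for the method itself

puts called.inspect          # 42
puts reference.class         # Method
puts apply(reference)        # 42
```

Writing `apply(answer)` instead would pass 42, not the method, and fail as soon as `apply` tried to call it.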
Now the anonymous function is a bit different in that it is mainly used as an argument. Again this is a design choice, but the rationale behind it is that it is mainly used by iterator-style functions which take functions as arguments. So obviously you don't need to use &, because they are already considered arguments by default. It is their purpose.
Now the last problem is that sometimes you have to call them in your code, because they are not always used with an iterator kind of function, or you might be coding an iterator yourself. As a bit of history: since Ruby is object oriented, the main way to do it was to use the call method on the object. That way, you could keep the non-mandatory brackets behaviour consistent.
my_lambda.call
my_lambda.call()
my_lambda_with_arguments.call :h2g2, 42
my_lambda_with_arguments.call(:h2g2, 42)
Now somebody came up with a shortcut which basically looks like a method with no name.
my_lambda.()
my_lambda_with_arguments.(:h2g2, 42)
Again, this is a design choice. Now Elixir is not object oriented and therefore cannot use the first form. I can't speak for José, but it looks like the second form was used in Elixir because it still looks like a function call with an extra character. It's close enough to a function call.
I did not think about all the pros and cons, but it looks like in both languages you could get away with just the brackets as long as you make brackets mandatory for anonymous functions. It seems like it is:
Mandatory brackets VS Slightly different notation
In both cases you make an exception because you make both behave differently. Since there is a difference, you might as well make it obvious and go for the different notation. The mandatory brackets would look natural in most cases but very confusing when things don't go as planned.
Here you go. Now this might not be the best explanation in the world, because I simplified most of the details. Also, most of these are design choices, and I tried to give a reason for them without judging them. I love Elixir, I love Ruby, I like the function calls without brackets, but like you, I find the consequences quite misleading once in a while.
And in elixir, it is just this extra dot, whereas in ruby you have blocks on top of this. Blocks are amazing and I am surprised how much you can do with just blocks, but they only work when you need just one anonymous function which is the last argument. Then since you should be able to deal with other scenarios, here comes the whole method/lambda/proc/block confusion.
Anyway... this is out of scope.
I've never understood why explanations of this are so complicated.
It's really just an exceptionally small distinction combined with the realities of Ruby-style "function execution without parens".
Compare:
def fun1(x, y) do
  x + y
end
To:
fun2 = fn
  x, y -> x + y
end
While both of these are just identifiers...
fun1 is an identifier that describes a named function defined with def.
fun2 is an identifier that describes a variable (that happens to contain a reference to a function).
Consider what that means when you see fun1 or fun2 in some other expression? When evaluating that expression, do you call the referenced function or do you just reference a value out of memory?
There's no good way to know at compile time. Ruby has the luxury of introspecting the variable namespace to find out if a variable binding has shadowed a function at some point in time. Elixir, being compiled, can't really do this. That's what the dot-notation does, it tells Elixir that it should contain a function reference and that it should be called.
And this is really hard. Imagine that there wasn't a dot notation. Consider this code:
val = 5
if :rand.uniform < 0.5 do
  val = fn -> 5 end
end
IO.puts val # Does this work?
IO.puts val.() # Or maybe this?
Given the above code, I think it's pretty clear why you have to give Elixir the hint. Imagine if every variable de-reference had to check for a function? Alternatively, imagine what heroics would be necessary to always infer that variable dereference was using a function?
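For the Ruby side of that comparison, here is a small sketch of the shadowing behaviour mentioned above. The method name `greeting` is made up for the demonstration. It shows only what is observable from the outside: the same bare name means a method call before a local variable of that name is assigned and a variable read afterwards.

```ruby
# Hypothetical method; the name is chosen only for the demonstration.
def greeting
  "from the method"
end

before = greeting            # no local variable yet: this is a method call
greeting = "from the local"  # from here on, a local variable shadows the method
after = greeting             # the bare name now reads the local variable
forced = greeting()          # parentheses still force the method call

puts before   # from the method
puts after    # from the local
puts forced   # from the method
```

Elixir's dot notation removes this ambiguity for anonymous functions by making the call explicit at the call site.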
There's an excellent blog post about this behavior: link
Two types of functions
If a module contains this:
fac(0) -> 1;
fac(N) when N > 0 -> N * fac(N-1).
You can’t just cut and paste this into the shell and get the same
result.
It’s because there is a bug in Erlang. Modules in Erlang are sequences
of FORMS. The Erlang shell evaluates a sequence of
EXPRESSIONS. In Erlang FORMS are not EXPRESSIONS.
double(X) -> 2*X. in an Erlang module is a FORM
Double = fun(X) -> 2*X end. in the shell is an EXPRESSION
The two are not the same. This bit of silliness has been Erlang
forever but we didn’t notice it and we learned to live with it.
Dot in calling fn
iex> f = fn(x) -> 2 * x end
#Function<erl_eval.6.17052888>
iex> f.(10)
20
In school I learned to call functions by writing f(10) not f.(10) -
this is “really” a function with a name like Shell.f(10) (it’s a
function defined in the shell) The shell part is implicit so it should
just be called f(10).
If you leave it like this expect to spend the next twenty years of
your life explaining why.
Elixir has optional parentheses for function calls, including calls to functions with 0 arity. Let's see an example of why that makes a separate calling syntax important:
defmodule Insanity do
  def dive(), do: fn() -> 1 end
end
Insanity.dive
# #Function<0.16121902/0 in Insanity.dive/0>
Insanity.dive()
# #Function<0.16121902/0 in Insanity.dive/0>
Insanity.dive.()
# 1
Insanity.dive().()
# 1
Without making a difference between 2 types of functions, we can't say what Insanity.dive means: getting a function itself, calling it, or also calling the resulting anonymous function.
fn -> syntax is for using anonymous functions. Doing var.() is just telling elixir that I want you to take that var with a func in it and run it instead of referring to the var as something just holding that function.
Elixir has this common pattern where, instead of having logic inside of a function to see how something should execute, we pattern match different functions based on what kind of input we have. I assume this is why we refer to things by arity in the function_name/1 sense.
It's kind of weird to get used to doing shorthand function definitions (func(&1), etc), but handy when you're trying to pipe or keep your code concise.
In Elixir we use def to simply define a function, like we do in other languages.
fn creates an anonymous function; refer to this for more clarification
Only the second kind of function seems to be a first-class object and can be passed as a parameter to other functions. A function defined in a module needs to be wrapped in a fn. There's some syntactic sugar which looks like otherfunction(&myfunction(&1, &2)) in order to make that easy, but why is it necessary in the first place? Why can't we just do otherfunction(myfunction)?
You can do otherfunction(&myfunction/2)
Since Elixir can execute functions without the brackets (like myfunction), writing otherfunction(myfunction) will try to execute myfunction/0.
So, you need to use the capture operator and specify the function, including arity, since you can have different functions with the same name. Thus, &myfunction/2.

How to compare Rails "executables" before and after refactor?

In C, I could generate an executable, do an extensive rename only refactor, then compare executables again to confirm that the executable did not change. This was very handy to ensure that the refactor did not break anything.
Has anyone done anything similar with Ruby, particularly a Rails app? Strategies and methods would be appreciated. Ideally, I could run a script that outputs a single file of some sort that is purely bytecode and is not changed by naming changes. I'm guessing JRuby or Rubinius would be helpful here.
I don't think this strategy will work for Ruby. Unlike C, where the compiler throws away the names, most of the things you name in Ruby carry that name with them. That includes classes, modules, constants, and instance variables.
Automated unit and integration tests are the way to go to support Ruby refactoring.
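A minimal sketch of what "carrying the name" means in practice. The `Invoice` class, its constant and its method are hypothetical and exist only for this illustration. Every name below can be read back from the running program, so a rename changes observable runtime state rather than just the source text.

```ruby
# Hypothetical class used only to show which names survive at runtime.
class Invoice
  TAX_RATE = 0.2

  def initialize(total)
    @total = total
  end

  def total_with_tax
    @total * (1 + TAX_RATE)
  end
end

invoice = Invoice.new(100)

# Each of these is plain reflection; nothing here depends on debug information.
runtime_names = {
  class_name: Invoice.name,
  constants:  Invoice.constants,
  methods:    Invoice.instance_methods(false),
  ivars:      invoice.instance_variables
}

puts runtime_names.inspect
```

Any code that relies on such reflection, as Rails does when it maps class names to table and file names, can behave differently after a rename, which is why tests are the safer check.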
Interesting question -- I like the definitive "yes" answer you can get from this regression strategy, at least for the specific case of rename refactoring.
I'm not expert enough to tell whether you can compile ruby (or at least a subset, without things like eval) but there seem to be some hints at:
http://www.hokstad.com/the-problem-with-compiling-ruby.html
http://rubini.us/2011/03/17/running-ruby-with-no-ruby/
Supposing that a complete compilation isn't possible, what about an abstract interpretation approach? You could parse the ruby into an AST, emit some kind of C code from the AST, and then compile the C code. The C code would not need to fully capture the behavior of the ruby code. It would only need to be compilable and to be distinct whenever the ruby was distinct. (Actually running it could result in gibberish, or perhaps an immediate memory violation error.)
As a simple example, suppose that ruby supported multiplication and C didn't. Then you could include a static mult function in your C code and translate from:
a = b + c*d
to
a = b + mult(c,d)
and the resulting compiled code would be invariant under name refactoring but would show discrepancies under other sorts of change. The mult function need not actually implement multiplication, you could have one of these instead:
static int mult( int a, int b ) { return a + b; } // pretty close
static int mult( int a, int b ) { return *(int *)0; } // not close at all, but still sufficient
and you'd still get the invariance you need as long as the C compiler isn't going to inline the definition. The same sort of translation, from an uncompilable ruby construct to a less functional but distinct C construct, should work for object manipulation and so forth, mapping class operations into C structure references. The key point is just that you want to keep the naming relationships intact while sacrificing actual behavior.
(I wonder whether you could do something with a single C struct that has members (all pointers to the same struct type) named after all the class and property names in the ruby code. Class and object operations would then correspond to nested dereference operations using this single structure. Just a notion.)
Even if you cannot formulate a precise mapping, an imprecise mapping that misses some minor distinctions might still be enough to increase confidence in the original name refactoring.
The quickest way to implement such a scheme might be to map from byte code to C (rather than from the Ruby AST to C). That would save a lot of parsing, but the mapping would be harder to understand and verify.
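As a much cruder variant of the same idea, the sketch below skips the C step entirely and works on tokens instead. It uses Ruby's standard-library Ripper lexer to replace every name with a placeholder numbered by first appearance, so two sources that differ only by a consistent rename produce the same string. The function name `rename_invariant_fingerprint` and the choice of token types are assumptions made for this illustration. It is one of the "imprecise mappings" described above, not a substitute for tests.

```ruby
require "ripper"

# Replace every name with a placeholder numbered by order of first appearance,
# leaving keywords, operators, literals and whitespace untouched.
def rename_invariant_fingerprint(source)
  # Token types treated as "names" in this sketch (an assumption; a real tool
  # would have to decide which identifiers are actually subject to renaming).
  name_tokens = %i[on_ident on_const on_ivar on_cvar on_gvar]
  placeholders = {}

  Ripper.lex(source).map do |token|
    type = token[1]
    text = token[2]
    if name_tokens.include?(type)
      placeholders[text] ||= "id#{placeholders.size}"
    else
      text
    end
  end.join
end

before  = "def total(price, tax)\n  price + price * tax\nend\n"
renamed = "def sum(amount, rate)\n  amount + amount * rate\nend\n"
changed = "def total(price, tax)\n  price - price * tax\nend\n"

puts rename_invariant_fingerprint(before) == rename_invariant_fingerprint(renamed)  # true
puts rename_invariant_fingerprint(before) == rename_invariant_fingerprint(changed)  # false
```

Because placeholders are assigned per distinct name, a botched rename that collapses two different names into one also changes the fingerprint. Names that appear inside strings, symbols or metaprogramming calls are not covered.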

Specifications for functions: -spec. Efficient usage

How could I use the -spec word in Erlang?
Please give me an idea of how to use this word efficiently. Does it serve documentation purposes only?
I tried to apply a constraint to a function in a module using a function type specification with -spec, but I failed: no restrictions were applied.
-spec attributes are indeed treated by the compiler and the runtime system as documentation. You cannot add any "executable features" to your code using them and the same applies for -type and -opaque attributes.
However they are useful as:
Documentation: they are used by EDoc to generate all the different forms of documentation for your code. -spec attributes are function signatures which, depending on how much effort you put into them, can make your code more understandable and maintainable. Suppose that your favorite data structure this month is dict(). Consider the following code:
my_function(SomeArg, SomeOtherArg, Dict) ->
...
dict:find(SomeKey, Dict)
...
The variable that is being used as a dict has been named as such. But let's say that you have the following snippet:
my_other_function(NamesDict, PlacesDict) ->
...
R1 = my_function(A, B, NamesDict),
...
R2 = my_function(C, D, PlacesDict),
...
Trying to keep up with this might soon lead to code that repeats this Dict suffix. Even more, you might not even want to remember in the context of my_other_function that the two arguments are dict(). So instead you might want to do this:
-spec my_other_function(dict(), dict()) -> atom().
my_other_function(Names, Places) ->
...
R1 = my_function(A, B, Names),
...
R2 = my_function(C, D, Places),
...
Now it is clear that these arguments should be dict() for the function to work, and hopefully everyone will be able to figure that out without going deep into the code. But suppose you are using this Names dict() in other places and it stores some particular information that is exposed with different APIs. Then it's a perfect candidate for a -type declaration:
-type names() :: dict().
-type places() :: dict().
-spec my_other_function(names(), places()) -> atom().
my_other_function(Names, Places) ->
...
R1 = my_function(A, B, Names),
...
R2 = my_function(C, D, Places),
...
If somebody else makes frequent use of this particular data structure you may want to export it too:
-module(my_module).
-export_type([names/0]).
-type names() :: dict().
Other modules can now refer to this particular data structure:
-module(my_other_module).
-record(my_state, {names :: my_module:names(),
...}).
Finally, if you would prefer other developers not to inspect this data structure in any way in their modules, you can declare it as -opaque. Again, this is a "friendly suggestion", as is all the rest of the stuff so far. Or is it...?
Discrepancy detection: If you take the time to use -specs and -types, you would very much like them to be kept up to date. It is common knowledge that nobody keeps documentation up to date if no one is watching! Luckily, Dialyzer is watching. Dialyzer can check that in all calls to my_function() the arguments are dict() (it can do this even without your -spec annotations, but it is much easier if they are there too) and scream bloody murder if you call it with something else. It can moreover keep track of these exported types and even report opacity violations. So it's not "just documentation".
Testcase generation: PropEr can use the -spec and -type definitions to automatically check your functions with random testcases. It is capable of generating random testcases even from declarations like this one:
-type int_tree() :: {node, integer(), int_tree(), int_tree()} | nil.
The brand new way to specify a set of callbacks for a behaviour is by using the familiar -spec syntax. The compiler, Dialyzer and possibly other tools can use this information to check a behaviour's implementation. See more in the OTP behaviours code and here
Read more here.
-specs for functions are specifications that help in several places:
They act as documentation of the function. Generating EDoc will pull the specs and make them available in the documentation.
They are a specification for the dialyzer. When the dialyzer runs, it uses the specs to determine whether the code is wrong in any way, or whether your spec is wrong, and in some cases they help the system work out exactly why the code is wrong.
They are a valuable tool in the specification of behaviours. There is a new -callback keyword which can be used to do this for behavioural APIs.
They are valuable for constructing a type skeleton of how the program fits together and where data comes from.
Together with the cousins -type and -opaque you can force certain types to be opaque to pieces of code. That means you are not allowed to see the internal representation on a static verification level. This can in turn help drive modularized code as you are not allowed to tightly couple code pieces.

When should we use FSharpFunc.Adapt?

Looking at the source in FSharp.Core and PowerPack, I see that a lot of higher-order functions that accept a function with two or more parameters use FSharpFunc.Adapt. For example:
let mapi f (arr: ResizeArray<_>) =
    let f = FSharpFunc<_,_,_>.Adapt(f)
    let len = length arr
    let res = new ResizeArray<_>(len)
    for i = 0 to len - 1 do
        res.Add(f.Invoke(i, arr.[i]))
    res
The documentation on FSharpFunc.Adapt is fairly thin. Is this a general best practice that we should be using any time we have a higher-order function with a similar signature? Only if the passed-in function is called multiple times? How much of an optimization is it? Should we be using Adapt everywhere we can, or only rarely?
Thanks for your time.
That's quite interesting! I don't have any official information (and I didn't see this documented anywhere), but here are some thoughts on how the Adapt function might work.
Functions like mapi take curried form of a function, which means that the type of the argument is compiled to something like FSharpFunc<int, FSharpFunc<T, R>>. However, many functions are actually compiled directly as functions of two arguments, so the actual value would typically be FSharpFunc<int, T, R> which inherits from FSharpFunc<int, FSharpFunc<T, R>>.
If you call this function (e.g. f 1 "a") the F# compiler generates something like this:
FSharpFunc<int, string>.InvokeFast<a>(f, 1, "a");
If you look at the InvokeFast function using Reflector, you'll see that it tests if the function is compiled as the optimized version (f :? FSharpFunc<int, T, R>). If yes, then it directly calls Invoke(1, "a") and if not then it needs to make two calls Invoke(1).Invoke("a").
This check is done each time you call a function passed as an argument (it is probably faster to do the check and then use the optimized call, because that's more common).
What the Adapt function does is that it converts any function to FSharpFunc<T1, T2, R> (if the function is not optimized, it creates a wrapper for it, but that's not the case most of the time). The calls to the adapted function will be faster, because they don't need to do the dynamic check every time (the check is done only once inside Adapt).
So, the summary is that Adapt could improve performance if a function passed as an argument takes more than one argument and you call it a large number of times. As with any optimization, I wouldn't use this blindly, but it is an interesting thing to be aware of when tuning performance!
(BTW: Thanks for a very interesting question, I didn't know the compiler does this :-))
