Partial SSA in LLVM - clang

I came across the concept of partial SSA in LLVM, where LLVM identifies two classes of variables: (1) top-level variables are those that cannot be referenced indirectly via a pointer, i.e., those whose address is never exposed via the address-of operator or returned via a dynamic memory allocation; (2) address-taken variables are those that have had their address exposed and can therefore be referenced indirectly via a pointer.
This definition is taken verbatim from this paper.
The paper explains further with an example, but I can't seem to wrap my head around it.
Is there an easier example for this or maybe any other resource I can look into?
Any help would be greatly appreciated.
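For concreteness, a minimal C++ sketch of the distinction; the IR in the comments is only the typical shape after LLVM's mem2reg pass, not exact clang output:

int f(int x) {
    int a = x + 1;  // top-level: its address is never taken, so after
                    // mem2reg it exists purely as an SSA value:
                    //     %a = add i32 %x, 1
    int b = 0;      // address-taken: &b escapes into p below, so LLVM
    int *p = &b;    // keeps b in memory (an alloca) and accesses it
    *p = x;         // through explicit load/store instead of SSA:
                    //     %b = alloca i32
                    //     store i32 %x, i32* %p
    int c = b;      //     %c = load i32, i32* %b
    return a + c;
}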

Is repr(C) a preprocessor directive?

I've seen some Rust codebases use the #[repr(C)] macro (is that what it's called?), but I couldn't find much information about it beyond the fact that it sets the type's layout in memory to the same layout as C's.
Here's what I would like to know: is this a preprocessor directive restricted to the compiler rather than part of the language itself (even though there aren't any other compiler front-ends for Rust), and why does Rust even have a memory layout different from C's? (It's just that I've never had to do this in another language.)
Here's a situation to demonstrate what I mean: if someone creates another compiler for Rust, are they required to implement this macro, or is it a compiler-specific thing?
#[repr(C)] is not a preprocessor directive, since Rust doesn't use a preprocessor [1]. It is an attribute. Rust doesn't have a complete specification, but the repr attribute is mentioned in the Rust reference, so it is absolutely a part of the language. Implementation-wise, attributes are parsed the same way all other Rust code is, and are stored in the same AST. Rust has no "attribute pass": attributes are an actual part of the language. If someone else were to implement a Rust compiler, they would need to implement #[repr(C)].
Furthermore, #[repr(C)] can't be implemented without some compiler magic. In the absence of a #[repr(...)], Rust compilers are free to arrange the fields of a struct/enum however they want to (and they do take advantage of this for optimization purposes!).
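To see what that freedom buys, here is a hedged C++ sketch of what declaration-order layout (which C, and hence #[repr(C)], mandates) costs, with sizes for a typical 64-bit ABI:

#include <cstdint>

// The fields must stay in declaration order, so the struct pays for
// alignment padding: 1 + 7 (pad) + 8 + 1 + 7 (trailing pad) = 24
// bytes on a typical 64-bit ABI. A Rust compiler using the default
// repr is free to reorder the fields to (u64, u8, u8) and use 16.
struct Padded {
    std::uint8_t  a;
    std::uint64_t b;
    std::uint8_t  c;
};

static_assert(sizeof(Padded) == 24, "typical x86-64 layout");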
Rust does have a good reason for using its own memory layout. If compilers aren't tied to how a struct is written in the source code, they can do optimisations like not storing struct fields that are never read from, reordering fields for better performance, enum tag pooling [2], and using the spare bits of NonZero*s in the struct to store data (the last one isn't happening yet, but might in the future). But the main reason is that Rust has things that just don't make sense in C. For instance, Rust has zero-sized types (like () and [i8; 0]), which can't exist in C, as well as trait vtables, enums with fields, and generic types, all of which cause problems when trying to translate them to C.
[1] Okay, you could use the C preprocessor with Rust if you really wanted to. Please don't.
[2] For example, enum Food { Apple, Pizza(Topping) } with enum Topping { Pineapple, Mushroom, Garlic } can be stored in just 1 byte, since there are only 4 possible Food values that can be created.
What is this?
It is not a macro, it is an attribute.
The book has a good chapter on what macros are, and it mentions that there are "attribute-like macros":
The term macro refers to a family of features in Rust: declarative macros with macro_rules! and three kinds of procedural macros:
Custom #[derive] macros that specify code added with the derive attribute used on structs and enums
Attribute-like macros that define custom attributes usable on any item
Function-like macros that look like function calls but operate on the tokens specified as their argument
Attribute-like macros are macros that you can use like attributes. For example:
#[route(GET, "/")]
fn index() {}
It does look like the repr attribute, doesn't it? 😃
So what is an attribute then?
Luckily Rust has great resources like rust-by-example which includes:
An attribute is metadata applied to some module, crate or item. This metadata can be used to/for:
conditional compilation of code
set crate name, version and type (binary or library)
disable lints (warnings)
enable compiler features (macros, glob imports, etc.)
link to a foreign library
mark functions as unit tests
mark functions that will be part of a benchmark
The Rust reference is also something you usually look at when you need to know something in more depth (see the chapter on attributes).
To the compiler authors out there:
If you were to write a Rust compiler and wanted to support things like the standard library or other crates, then you would absolutely need to implement these, because the libraries use them and depend on them.
Otherwise, I guess you could come up with a subset of Rust that your compiler supports. But then most people wouldn't use it.
Why does Rust not just use the C layout?
The Nomicon explains why Rust needs to be able to reorder the fields of structs, for example: to save space and be more efficient. It is related to, among other things, generics and monomorphization. With repr(C), the fields of a struct must be in the same order as in the definition.
The C representation is designed for dual purposes. One purpose is creating types that are interoperable with the C language. The second is creating types on which you can soundly perform operations that rely on data layout, such as reinterpreting values as a different type.
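As a hedged illustration of the first purpose, here is what the C++ side of an FFI boundary might look like; the Rust declaration in the comment and point_norm are hypothetical:

// This mirror is only sound if the Rust side declares the struct
// #[repr(C)]:
//
//     #[repr(C)]
//     pub struct Point { pub x: f64, pub y: f64 }
//
// With Rust's default repr, the field order and offsets are
// unspecified, so this C++ definition could silently disagree.
struct Point {
    double x;
    double y;
};

extern "C" double point_norm(const Point *p);  // implemented in Rust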

How can I find all uses of a ValueDecl?

I'd like to take a clang AST, analyze how a certain variable is used, and do some source-to-source transformation if a specific usage pattern is recognized.
Particularly, I'm looking for patterns like this:
void *h;
h = create_handler(...);
use_handler(h);
destroy_handler(h);
So far, I am able to detect the ValueDecl corresponding to void *h. The next step would be to find all uses of h and see whether they are safe and whether create_handler/destroy_handler properly dominate/post-dominate one another.
Unfortunately, I have no idea how to iterate over the uses of h; there seems to be no such interface in the ValueDecl class.
I'd appreciate it if you could either suggest how I could find all uses of a variable in the AST, or point me to some clang-based tool dealing with a similar problem.
Thank you!
One can match declRefExprs referencing the variable (using AST matchers). After that, ParentMap can be used to walk the AST upward and recursively find the nodes that use those declRefExprs. Keep in mind that a ParentMap is typically constructed not for the whole AST but only for a subtree (passed as a parameter into the constructor).
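A minimal LibTooling-flavoured sketch of the matcher half of that approach; the variable name "h" is hard-coded for illustration, and wiring the MatchFinder into a ClangTool is omitted:

#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/ASTMatchers/ASTMatchers.h"
#include "llvm/Support/raw_ostream.h"

using namespace clang;
using namespace clang::ast_matchers;

// Called once for every DeclRefExpr that refers to a VarDecl named "h".
struct UseCollector : public MatchFinder::MatchCallback {
  void run(const MatchFinder::MatchResult &Result) override {
    if (const auto *Use = Result.Nodes.getNodeAs<DeclRefExpr>("use")) {
      Use->getLocation().print(llvm::outs(), *Result.SourceManager);
      llvm::outs() << "\n";
    }
  }
};

void registerUseMatcher(MatchFinder &Finder, UseCollector &Collector) {
  Finder.addMatcher(
      declRefExpr(to(varDecl(hasName("h")))).bind("use"), &Collector);
}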

Symbol Creation in Z3 Java API

I am new to Z3, so excuse me if the question sounds too easy. I have two questions regarding constants in the Z3 Java API.
How does the creation of constants happen internally? To understand that, I started by tracking public BitVecExpr mkBVConst(String, int) down to public StringSymbol mkSymbol(String), which eventually calls Native.mkStringSymbol(var1.nCtx(), var2), which generates the variable var3 in the line long var3 = INTERNALmkStringSymbol(var0, var2);
Now, because INTERNALmkStringSymbol is native, I can't see its source. Does anyone know how it works, and where I can view its source?
Another thing I am confused about is the scoping of constants when using the API. In interactive Z3, it is maintained through matching push and pop commands, but through the API I am not sure how scoping is defined and managed.
Any insights or guidance are much appreciated!
Z3 is open source; you can view and download the source from https://github.com/z3prover/z3.git. Symbols in Z3 are defined in src/util/symbol.h. You will see that symbols are similar to LISP atoms: they persist through the lifetime of the DLL and are unique, so two symbols with the same name will be pointer-equal. The Java API calls into the C API, which is declared in src/api/z3_api.h. The directory src/api contains the API functions, including those that create symbols. When you create an expression constant, such as with mkBVConst, the resulting expression is also pointer-unique (if you create the same mkBVConst twice, the unmanaged pointers will be equal; the Java objects are not the same, but their equality test exploits this).
The Solver object has push and pop methods. You can add constraints to the solver object. The life-time of constraints follow the push/pop nesting: a constraint is active until there is a pop that removes the scope where the constraint was added.
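A sketch of both points, using the C++ API for brevity (the Java API mirrors it almost method for method: Context.mkSolver(), Solver.push(), Solver.pop()):

#include <iostream>
#include "z3++.h"

int main() {
    z3::context c;

    // Expressions are cached and unique: building the same constant
    // twice yields the same underlying AST.
    z3::expr x1 = c.bv_const("x", 32);
    z3::expr x2 = c.bv_const("x", 32);
    std::cout << z3::eq(x1, x2) << "\n";  // 1: same unmanaged pointer

    // Scoping is per solver, managed with push/pop, not per constant.
    z3::solver s(c);
    s.add(x1 > 0);                    // signed > on bit-vectors
    s.push();                         // open a scope
    s.add(x1 < 0);
    std::cout << s.check() << "\n";   // unsat: both constraints active
    s.pop();                          // drop the scope's constraints
    std::cout << s.check() << "\n";   // sat: only x1 > 0 remains
    return 0;
}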

Process memory space, how is a value returned from a function?

As execution of a function completes, and its local variables are removed from the stack, how is the return value stored in memory for the rest of the program to use?
How parameters are passed in, and values returned from, an executed function is known as the Calling Convention.
Ignoring runtime environments (Java and .NET, I'm looking at you) and scripted languages (any of them), and concentrating purely on native code on x86, there are a lot of them. You may have come across them if you've ever heard the terms cdecl or stdcall, amongst others.
Typically return values will be returned in registers. The cdecl convention, for example, returns data either in EAX (for integers and pointers) or ST0 (for floating-point values).
But the calling convention defines more than just the return format. It also defines how arguments are passed (on the stack or in registers, and left to right or right to left) and who is responsible for cleaning up the stack (i.e., the caller or the callee). cdecl, for example, is a convention where the caller must clean up the stack, whilst in stdcall the callee keeps the stack tidy.
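As a hedged sketch (exact instruction sequences vary by compiler and optimisation level), a cdecl call on 32-bit x86 typically compiles to something like this:

int add(int a, int b) { return a + b; }  // integer result left in EAX

int caller() {
    return add(1, 2);
    // Typical 32-bit x86 cdecl shape for the call above:
    //   push 2          ; arguments pushed right to left
    //   push 1
    //   call add        ; return value comes back in EAX
    //   add  esp, 8     ; cdecl: the *caller* pops the arguments
}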
Other conventions include fastcall, pascal and syscall, amongst others. Wikipedia has a good breakdown of them all, as do Microsoft's MSDN notes. You may also want to look at the SO question "stdcall and cdecl", which goes into cdecl and stdcall in detail.
I think the right answer is "it depends"; in general, this falls under 'calling conventions'.
You can find a very good overview here.
Note that this link is x86-only, so for other architectures the conventions can be completely different.

How to Assign a Variable Name in a #define (Boost related Mem Leak)?

I've run Memory Validator on an application we're developing, and I've found that a macro expression we've defined, #define O_set, is at the root of about 90% of the leaks.
Now, our macros are defined as follows:
#define O_SET_VALUE(ValueType, Value) boost::shared_ptr<ValueType>(new ValueType(Value))
...
#define O_set O_SET_VALUE
However, according to the Boost web site (at: http://www.boost.org/doc/libs/1_46_1/libs/smart_ptr/shared_ptr.htm):
A simple guideline that nearly eliminates the possibility of memory leaks is: always use a named smart pointer variable to hold the result of new. Every occurrence of the new keyword in the code should have the form: shared_ptr<T> p(new Y); It is, of course, acceptable to use another smart pointer in place of shared_ptr above; having T and Y be the same type, or passing arguments to Y's constructor, is also OK.
If you observe this guideline, it naturally follows that you will have no explicit deletes; try/catch constructs will be rare.
This leads me to believe that this is indeed the major cause of our memory leaks. Or am I being naive or completely out of my depth here?
The question is: is there a way to work around the mentioned issue with the above macro #defines?
Update:
I'm using them, for example, like this:
return O_set(int, 1);
_time_stamp(O_set(TO_DateTime, TO_DateTime())) (_time_stamp is a member of a certain class)
I'm working on Windows and used Memory Validator to track the memory leaks. According to it there are leaks, and as I said, the root of most of them (according to the stack traces) comes down to that macro #define.
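For context, the failure mode the Boost guideline targets is an exception thrown between the new and the shared_ptr constructor; a hedged sketch, where process and may_throw are hypothetical:

#include <boost/shared_ptr.hpp>

// The macros from the question, repeated so the sketch is
// self-contained.
#define O_SET_VALUE(ValueType, Value) boost::shared_ptr<ValueType>(new ValueType(Value))
#define O_set O_SET_VALUE

void process(boost::shared_ptr<int> v, int extra);  // hypothetical
int may_throw();                                    // hypothetical

void caller() {
    // Pre-C++17, the compiler may evaluate may_throw() between
    // `new int(1)` and the shared_ptr constructor; if it throws,
    // the int leaks. O_set(int, 1) expands to exactly this form.
    process(O_set(int, 1), may_throw());

    // The guideline's fix: a named smart pointer closes the window.
    boost::shared_ptr<int> v = O_set(int, 1);
    process(v, may_throw());
}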
Smart pointers are tricky. The first thing I would do is to check your code for any 'new' statement which isn't inside either macro.
Then you have to think about how the pointers are being used; if you pass a smart pointer by reference, the reference counter isn't increased, for example.
Another thing to check is all instances of '.get()', which is a big problem if you are working with a legacy code base or with other developers who don't understand the point of using smart pointers! (This is more to do with preventing random crashes than with memory leaks per se, but it is worth checking.)
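A hedged illustration of that .get() hazard; Handler is a hypothetical type:

#include <boost/shared_ptr.hpp>

struct Handler { /* ... */ };

void dangling_example() {
    boost::shared_ptr<Handler> p(new Handler);
    Handler *raw = p.get();  // raw observes the object but owns nothing
    p.reset();               // last owner gone: Handler destroyed here
    // raw now dangles; any further use of it is undefined behaviour
}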
Also, you might want to consider why you are using a macro for all smart pointer creation. Boost supplies different smart pointers for different purposes; there isn't a one-size-fits-all solution. Good old std::auto_ptr is fine for most uses, except storing in standard containers, but you knew that already.
The most obvious and overlooked question is: do you really need to 'new' something? C++ isn't Java; if you can avoid creating dynamic objects, you are better off doing so.
If you are lucky enough to be working on a *NIX platform (you don't say, sorry), then try the leak-checking tool in Valgrind. It's very useful. There are similar tools available for Windows, but often using your own software skills is best.
Good luck.
