I want to generate target-independent IR with LLVM.
clang -emit-llvm -S source.c -o source.ll
In source.ll:
target datalayout = "e-m:e-i64:..."
target triple = "x86_64-pc-linux-gnu"
...
LLVM IR is said to be target-independent, but properties of the target are specified in the actual IR file.
How can I create LLVM IR without these target properties?
Short answer: you cannot.
Long answer: target neutrality is a property of the input language, not of LLVM IR. While in theory it is possible to produce more or less target-neutral LLVM IR for some inputs, it is not possible for C/C++ inputs. I will mention only a few of the things that prevent us from having such IR:
1. Preprocessor. Target-specific #ifdef clauses obviously make the resulting IR target-specific.
2. Pointer sizes. Think about expressions like sizeof(void*). These are target-dependent compile-time constants (yes, there are ways to defer the calculation of these constants until later, but frontends are not prepared to deal with that, and it also hinders many optimizations).
3. Struct layout. This partly depends on 2 (think about struct { int foo; void* bar; }).
4. Various ABI-related things, such as the support steps needed for argument and result passing.
I will not go into target-specific things like vectors, builtins for target-specific instruction sets, etc.
I've seen some Rust codebases use the #[repr(C)] macro (is that what it's called?). I couldn't find much information about it, other than that it sets the type's memory layout to match C's.
Here's what I would like to know: is this a preprocessor directive specific to the compiler rather than part of the language itself (even though there aren't any other compiler front-ends for Rust), and why does Rust even have a memory layout different from C's? (I've never had to do this in another language.)
Here's a situation that demonstrates what I mean: if someone creates another compiler for Rust, are they required to implement this macro, or is it a compiler-specific thing?
#[repr(C)] is not a preprocessor directive, since Rust doesn't use a preprocessor [1]. It is an attribute. Rust doesn't have a complete specification, but the repr attribute is mentioned in the Rust reference, so it is absolutely a part of the language. Implementation-wise, attributes are parsed the same way all other Rust code is, and are stored in the same AST. Rust has no "attribute pass": attributes are an actual part of the language. If someone else were to implement a Rust compiler, they would need to implement #[repr(C)].
Furthermore, #[repr(C)] can't be implemented without some compiler magic. In the absence of a #[repr(...)], Rust compilers are free to arrange the fields of a struct/enum however they want to (and they do take advantage of this for optimization purposes!).
Rust does have a good reason for using its own memory layout. If compilers aren't tied to how a struct is written in the source code, they can do optimisations like not storing struct fields that are never read from, reordering fields for better performance, enum tag pooling [2], and using the spare bits of NonZero* types in the struct to store data (the last one isn't happening yet, but might in the future). But the main reason is that Rust has things that just don't make sense in C. For instance, Rust has zero-sized types (like () and [i8; 0]) which can't exist in C, trait vtables, enums with fields, and generic types, all of which cause problems when trying to translate them to C.
[1] Okay, you could use the C preprocessor with Rust if you really wanted to. Please don't.
[2] For example, enum Food { Apple, Pizza(Topping) } enum Topping { Pineapple, Mushroom, Garlic } can be stored in just 1 byte since there are only 4 possible Food values that can be created.
What is this?
It is not a macro; it is an attribute.
The book has a good chapter on what macros are and it mentions that there are "Attribute-like macros":
The term macro refers to a family of features in Rust: declarative macros with macro_rules! and three kinds of procedural macros:
Custom #[derive] macros that specify code added with the derive attribute used on structs and enums
Attribute-like macros that define custom attributes usable on any item
Function-like macros that look like function calls but operate on the tokens specified as their argument
Attribute-like macros are what you could use like attributes. For example:
#[route(GET, "/")]
fn index() {}
It does look like the repr attribute, doesn't it? 😃
So what is an attribute then?
Luckily Rust has great resources like rust-by-example which includes:
An attribute is metadata applied to some module, crate or item. This metadata can be used to/for:
conditional compilation of code
set crate name, version and type (binary or library)
disable lints (warnings)
enable compiler features (macros, glob imports, etc.)
link to a foreign library
mark functions as unit tests
mark functions that will be part of a benchmark
The Rust Reference is also worth consulting when you need to know something in more depth (see its chapter on attributes).
To the compiler authors out there:
If you were to write a Rust compiler and wanted to support things like the standard library or other crates, then you would 100% need to implement these, because the libraries use them and depend on them.
Otherwise, I guess you could come up with a subset of Rust that your compiler supports. But then most people wouldn't use it.
Why does Rust not just use the C layout?
The nomicon explains why Rust needs to be able to reorder the fields of structs, for example: to save space and be more efficient. This is related to, among other things, generics and monomorphization. With repr(C), the fields of a struct must be laid out in the same order as in the definition.
The C representation is designed for dual purposes. One purpose is for creating types that are interoperable with the C Language. The second purpose is to create types that you can soundly perform operations on that rely on data layout such as reinterpreting values as a different type.
In my objdump -t output, I see the following two lines:
00000000000004d2 l F .text.unlikely 00000000000000ec function-signature-goes-here [clone .cold.427]
and
00000000000018e0 g F .text 0000000000000690 function-signature-goes-here
I know l means local and g means global. I also know that .text is a section, or a type of section, in an object file, containing compiled program instructions. But what is .text.unlikely? Assuming it's a different section (or type-of-section) from .text - what's the difference?
In my GCC v5.4.0 manpage, I found the following switch:
-freorder-functions
which says:
Reorder functions in the object file in order to improve code
locality. This is implemented by using special subsections
".text.hot" for most frequently executed functions and
".text.unlikely" for unlikely executed functions. Reordering is done
by the linker so object file format must support named sections and
linker must place them in a reasonable way.
Also profile feedback must be available to make this option effective.
See -fprofile-arcs for details.
Enabled at levels -O2, -O3, -Os.
Looks like the compiler was run with optimization flags or that switch for this binary, and functions are organized in subsections to optimize spatial locality.
I tested clang by compiling a simple C file that includes a struct assignment. In the resulting LLVM code there is a call to llvm.memcpy.p0i8.p0i8.i64. Where does it come from? I don't see a definition, only its declaration as a function.
It is an LLVM intrinsic function. As per the language reference:
LLVM provides intrinsics for a few important standard C library
functions. These intrinsics allow source-language front-ends to pass
information about the alignment of the pointer arguments to the code
generator, providing opportunity for more efficient code generation.
The llvm.memcpy intrinsic specifically:
The ‘llvm.memcpy.*’ intrinsics copy a block of memory from the source
location to the destination location.
Note that, unlike the standard libc function, the llvm.memcpy.*
intrinsics do not return a value, takes extra isvolatile arguments and
the pointers can be in specified address spaces.
I'm dumping the AST of some headers like this:
clang -cc1 -ast-dump -fblocks header.h
However, any #defines in the header are not showing up in the dump. Is there a way to include them?
It's true, #defines are handled by the preprocessor, not the compiler. So you need a preprocessor parser stage. I know of two:
Boost Wave can preprocess the input for you, and/or give you hooks to trigger on macro definitions or uses.
The Clang tool pp-trace uses a Clang library that can do callbacks on many preprocessor events, including macro definitions.
I am parallelizing an existing FORTRAN application. I don't want to directly change parts of its code so I am using preprocessor directives to accomplish my goal. This way I am able to maintain the readability of the code and I won't induce errors in parts of the code that have already been tested. However, when I try to preprocess my source with the GNU C preprocessor I get the following error message (gcc version 4.7.2 (Debian 4.7.2-5)):
test.f:9:0: error: detected recursion whilst expanding macro "ARR"
This simple test program demonstrates my problem:
PROGRAM TEST
IMPLICIT NONE
INTEGER I,OFFSET,ARR(10)
#define ARR(I) ARR(OFFSET+I)
DO I=1,10
ARR(I)=I
END DO
#undef ARR(I)
END PROGRAM TEST
This is the commandline output:
testing$ gfortran -cpp -E test.f
# 1 "test.f"
# 1 "<command-line>"
# 1 "test.f"
PROGRAM TEST
[...]
test.f:9:0: error: detected recursion whilst expanding macro "ARR"
DO I=1,10
ARR(OFFSET+OFFSET+OFFSET+OFFSET+OFFSET+OFFSET+OFFSET+OFFSET+OFFSET+OFFSET+OFFSET+OFFSET+OFFSET+OFFSET+OFFSET+OFFSET+OFFSET+OFFSET+OFFSET+OFFSET+OFFSET+I)=I
END DO
[...]
END PROGRAM TEST
This site provides some information on the preprocessor I am using:
http://tigcc.ticalc.org/doc/cpp.html#SEC10
As it seems I am using a function-like macro with macro arguments.
Why is the preprocessor detecting a recursion? [EDIT] - Maybe because I use the same name for the macro and the identifier?
Why isn't the preprocessor capable of interpreting upper case directives (#DEFINE instead of #define)? - I am asking, because I haven't had this problem with the ifort preprocessor.
BTW: I am able to preprocess the original code either using the ifort preprocessor -fpp, or by changing the source in the following way:
PROGRAM TEST
IMPLICIT NONE
INTEGER I,OFFSET,ARR(10)
#define ARR_T(I) ARR(OFFSET+I)
DO I=1,10
ARR_T(I)=I
END DO
#undef ARR_T(I)
END PROGRAM TEST
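Why is the preprocessor detecting a recursion? [EDIT] - Maybe because I use the same name for the macro and the identifier?
The preprocessor is detecting recursion because your macro name and array name are the same. gfortran runs the C preprocessor in traditional mode, and in that mode a macro is not suppressed within its own expansion, so the ARR in the replacement text is itself expanded again, which the preprocessor reports as recursion.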
Why isn't the preprocessor capable of interpreting upper case directives (#DEFINE instead of #define)? - I am asking, because I haven't had this problem with the ifort preprocessor.
When using gfortran, you are using a C preprocessor, and its directive names are case-sensitive: #DEFINE is not a recognized preprocessor directive in C. No idea about ifort. I thought in ifort you had to prefix macros with !$MS or !$DEC.
Your change to the program to get it to work for ifort will also work for gfortran.