Force clang to generate intrinsic cos - clang

Compile cos.c
void func() {
double a = __builtin_cos(3.0);
}
using
clang -S -emit-llvm -c cos.c
I've got
define dso_local void @func() {
%1 = alloca double, align 8
%2 = call double @cos(double 3.000000e+00)
store double %2, double* %1, align 8
ret void
}
declare dso_local double @cos(double)
But I want to obtain the LLVM intrinsic @llvm.cos.f64 for cos instead of a call to @cos, i.e. the generated code should look like this:
...
%2 = call double @llvm.cos.f64(double 3.000000e+00)
...
}
declare double @llvm.cos.f64(double)
How can I force clang to do that? Maybe I should use another function instead of __builtin_cos?

With -ffast-math (which implies -fno-math-errno), clang -O3 will lower __builtin_cos to the @llvm.cos.f64 intrinsic:
double func(double in) {
double a = __builtin_cos(in);
return a;
}
clang -O3 -ffast-math -emit-llvm on Godbolt (with debug stuff removed)
define dso_local double @_Z4funcd(double) local_unnamed_addr #0 !dbg !7 {
%2 = tail call fast double @llvm.cos.f64(double %0), !dbg !15
ret double %2, !dbg !17
}

Related

Useless clang temporary in LLVM for `return 0` in simple C program [duplicate]

Here's a simple C file with an enum definition and a main function:
enum days {MON, TUE, WED, THU};
int main() {
enum days d;
d = WED;
return 0;
}
It compiles to the following LLVM IR:
define dso_local i32 @main() #0 {
%1 = alloca i32, align 4
%2 = alloca i32, align 4
store i32 0, i32* %1, align 4
store i32 2, i32* %2, align 4
ret i32 0
}
%2 is evidently the d variable, which gets 2 assigned to it. What does %1 correspond to if zero is returned directly?
This %1 register was generated by clang to handle multiple return statements in a function. Imagine you were writing a function to compute an integer's factorial. Instead of this
int factorial(int n){
int result;
if(n < 2)
result = 1;
else{
result = n * factorial(n-1);
}
return result;
}
You'd probably do this
int factorial(int n){
if(n < 2)
return 1;
return n * factorial(n-1);
}
Why? Because Clang will insert that result variable that holds the return value for you. Yay. That's the reason for that %1 variable. Look at the IR for a slightly modified version of your code.
Modified code,
enum days {MON, TUE, WED, THU};
int main() {
enum days d;
d = WED;
if(d) return 1;
return 0;
}
IR,
define dso_local i32 @main() #0 !dbg !15 {
%1 = alloca i32, align 4
%2 = alloca i32, align 4
store i32 0, i32* %1, align 4
store i32 2, i32* %2, align 4, !dbg !22
%3 = load i32, i32* %2, align 4, !dbg !23
%4 = icmp ne i32 %3, 0, !dbg !23
br i1 %4, label %5, label %6, !dbg !25
5: ; preds = %0
store i32 1, i32* %1, align 4, !dbg !26
br label %7, !dbg !26
6: ; preds = %0
store i32 0, i32* %1, align 4, !dbg !27
br label %7, !dbg !27
7: ; preds = %6, %5
%8 = load i32, i32* %1, align 4, !dbg !28
ret i32 %8, !dbg !28
}
Now you see %1 making itself useful huh? Most functions with a single return statement will have this variable stripped by one of llvm's passes.
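As a rough C-level picture of what the unoptimized IR above does, here is a sketch that writes the return slot by hand. The names `main_like` and `retval` are made up for illustration; Clang performs this lowering in IR, not in C, so treat this as an analogy rather than what the compiler literally emits.

```c
enum days { MON, TUE, WED, THU };

/* Hand-written analogue of the -O0 IR: every `return` becomes a store
   into one slot, followed by a jump to a single exit block. */
int main_like(void) {
    int retval = 0;        /* plays the role of %1 */
    enum days d = WED;     /* plays the role of %2 */

    if (d) {
        retval = 1;        /* was: return 1; */
        goto done;
    }
    retval = 0;            /* was: return 0; */
    goto done;

done:
    return retval;         /* the single `ret` at the end */
}
```

With only one `return` in the function, the slot is written once and never needed, which is why an optimization pass can drop it.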
Why does this matter — what's the actual problem?
I think the deeper answer you're looking for might be: LLVM's architecture is based around fairly simple frontends and many passes. The frontends have to generate correct code, but it doesn't have to be good code. They can do the simplest thing that works.
In this case, Clang generates a couple of instructions that turn out not to be used for anything. That's generally not a problem, because some part of LLVM will get rid of superfluous instructions. Clang trusts that to happen. Clang doesn't need to avoid emitting dead code; its implementation may focus on correctness, simplicity, testability, etc.
Because Clang is done with syntax analysis but LLVM hasn't even started with optimization.
The Clang front end has generated IR (Intermediate Representation) and not machine code. Those variables are SSA (Static Single Assignment) values; they haven't been bound to registers yet, and after optimization they never will be, because they are redundant.
That code is a somewhat literal representation of the source. It is what clang hands to LLVM for optimization. Basically, LLVM starts with that and optimizes from there. Indeed, for version 10 and x86_64, llc -O2 will eventually generate:
main: # @main
xor eax, eax
ret

Why doesn't Clang generate bitcode of some library functions in libstdc++?

When I compile this C++ file, which uses the list library, with the command clang++ list-simple-test.cpp -c -emit-llvm:
// list1.cpp
#include <list>
using namespace std;
int main(int argc, char **argv)
{
int x = 1;
list<int*> alist;
alist.push_back(&x);
return x;
}
I notice that some functions, like _ZNSt8__detail15_List_node_base7_M_hookEPS0_, are generated without a function body:
; Function Attrs: nounwind
declare dso_local void @_ZNSt8__detail15_List_node_base7_M_hookEPS0_(%"struct.std::__detail::_List_node_base"*, %"struct.std::__detail::_List_node_base"*) #5
While most of the other functions are generated with a complete body, for example, like the function _ZNSt7__cxx1110_List_baseIPiSaIS1_EE11_M_inc_sizeEm below:
; Function Attrs: noinline nounwind optnone uwtable
define linkonce_odr dso_local void @_ZNSt7__cxx1110_List_baseIPiSaIS1_EE11_M_inc_sizeEm(%"class.std::__cxx11::_List_base"*, i64) #1 comdat align 2 {
%3 = alloca %"class.std::__cxx11::_List_base"*, align 8
%4 = alloca i64, align 8
store %"class.std::__cxx11::_List_base"* %0, %"class.std::__cxx11::_List_base"** %3, align 8
store i64 %1, i64* %4, align 8
%5 = load %"class.std::__cxx11::_List_base"*, %"class.std::__cxx11::_List_base"** %3, align 8
%6 = load i64, i64* %4, align 8
%7 = getelementptr inbounds %"class.std::__cxx11::_List_base", %"class.std::__cxx11::_List_base"* %5, i32 0, i32 0
%8 = getelementptr inbounds %"struct.std::__cxx11::_List_base<int *, std::allocator<int *> >::_List_impl", %"struct.std::__cxx11::_List_base<int *, std::allocator<int *> >::_List_impl"* %7, i32 0, i32 0
%9 = getelementptr inbounds %"struct.std::__detail::_List_node_header", %"struct.std::__detail::_List_node_header"* %8, i32 0, i32 1
%10 = load i64, i64* %9, align 8
%11 = add i64 %10, %6
store i64 %11, i64* %9, align 8
ret void
}
I understand that those functions are from libstdc++.so. But why does Clang generate the body for some functions, but not for others?
Does anybody know how to make Clang generate the body of _ZNSt8__detail15_List_node_base7_M_hookEPS0_ as well?
Thank you very much for reading my question! I'm writing a static analysis tool, which needs to analyze the body of _ZNSt8__detail15_List_node_base7_M_hookEPS0_ to obtain more precise result.
Most probably those other functions are coming from C++ templates.
When you declare a templated function, you have to provide its implementation in the header file in most cases. This way their code ends up in your own translation unit, and you see this code in your IR.
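The same distinction can be sketched in plain C, which has no templates but does have header-style `static inline` functions. This is only an analogy to the libstdc++ case: `inc_size` and `total_size` are hypothetical names, and `strlen` stands in for a function such as `_M_hook` whose body lives in a separately compiled library.

```c
#include <stddef.h>

/* Only a declaration is visible here: the body lives in a separately
   compiled library (libc in this sketch), so the IR for this file can
   only contain a `declare` for it. */
size_t strlen(const char *s);

/* The body is visible in this translation unit, like a template or an
   inline member function defined in a header, so the IR can contain a
   full `define` for it. */
static inline size_t inc_size(size_t current, size_t n) {
    return current + n;
}

size_t total_size(const char *s) {
    return inc_size(strlen(s), 1);
}
```

Compiling this with `clang -S -emit-llvm` should show a body for `inc_size` and `total_size` but only a declaration for `strlen`.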
I temporarily found a workaround using the suggestion from @arrowd.
# generate list-simple-test.bc
clang++ list-simple-test.cpp -c -emit-llvm
# generate list.bc (list.cc is from the source code of libstdc++)
clang++ -c -emit-llvm list.cc
# combine list-simple-test.bc and list.bc
llvm-link list.bc list-simple-test.bc -o list-simple-final.bc
In the code above, list.cc can be downloaded from the gcc project.
The final bitcode file list-simple-final.bc will contain the definition of _ZNSt8__detail15_List_node_base7_M_hookEPS0_, which is provided by list.cc.

How to make LLVM's `opt` command optimize builtin functions?

Consider the following C program:
#include <stdlib.h>
int main() {
int * ptr = malloc(8);
*ptr = 14;
return 4;
}
Compiling with clang -S -emit-llvm -O1 emits the following:
...
; Function Attrs: norecurse nounwind readnone uwtable
define dso_local i32 @main() local_unnamed_addr #0 !dbg !7 {
call void @llvm.dbg.value(metadata i32* undef, metadata !13, metadata !DIExpression()), !dbg !15
ret i32 4, !dbg !16
}
...
The call to malloc is gone because it is a builtin function that clang knows about.
If we run clang -S -emit-llvm -O1 -fno-builtin instead we get the following:
...
; Function Attrs: nounwind uwtable
define dso_local i32 @main() local_unnamed_addr #0 !dbg !14 {
%1 = call noalias i8* @malloc(i64 8) #3, !dbg !22
%2 = bitcast i8* %1 to i32*, !dbg !22
call void @llvm.dbg.value(metadata i32* %2, metadata !20, metadata !DIExpression()), !dbg !23
store i32 14, i32* %2, align 4, !dbg !24, !tbaa !25
ret i32 4, !dbg !29
}
...
With -fno-builtin, clang may not assume anything about what malloc does and has to leave the call in.
How can I get from the second LLVM program to the first using LLVM's opt command? How do I tell opt to use the knowledge about builtin functions that clang apparently has?
In this specific example, the problem is that clang -fno-builtin will produce LLVM code that explicitly marks calls to builtin functions with nobuiltin, i.e. attributes #3 = { nobuiltin nounwind "no-builtins" }.
Generally, which builtin functions are available is determined by the -targetlibinfo pass. You have to be careful to declare and use builtin functions with exactly the correct parameter and return types, or LLVM will (correctly) not recognize them as builtins.
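A minimal sketch of the well-formed case, assuming malloc and free are declared through <stdlib.h> so their signatures match what LLVM's library info expects. The function name `store_and_return` is hypothetical. Whether the allocation is actually removed depends on the optimization level and on the calls not carrying the nobuiltin attribute.

```c
#include <stdlib.h>

/* The allocated memory is written once and never read back. When malloc
   and free are recognized as library functions, an optimizer is allowed
   to drop the allocation entirely and reduce this to `return 4`. */
int store_and_return(void) {
    int *ptr = malloc(sizeof *ptr);
    if (ptr != NULL) {
        *ptr = 14;   /* never read back */
        free(ptr);
    }
    return 4;
}
```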

Why does the clang compiler put these instructions at the start of every function which takes arguments?

I used clang to compile this code with -S -emit-llvm:
int sub2(int n) {
return n - 2;
}
And this is the code it outputted:
; Function Attrs: nounwind
define i32 @_Z4sub2i(i32) #0 {
%2 = alloca i32, align 4
store i32 %0, i32* %2, align 4
%3 = load i32, i32* %2, align 4
%4 = sub nsw i32 %3, 2
ret i32 %4
}
However, I could write the same function as:
define i32 @sub2(i32) #0 {
%2 = sub i32 %0, 2
ret i32 %2
}
Why does it add those instructions? I am not sure about it, but it seems to be copying the argument.
This is because you haven't run the mem2reg pass. The variables are considered to occupy space on the stack and are alloca'd.
If you try
opt --mem2reg filename.ll -S
you will see that you get something similar to what you expected.
mem2reg is also a part of O1, O2, and O3.
The mem2reg pass tries to convert "variables" into LLVM temporaries. It does this only for those variables whose address is not taken.
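To make the "address is not taken" condition concrete, here is a small sketch with two variants of the function. The helper `subtract_two` and the name `sub2_via_pointer` are invented for illustration.

```c
/* The address of `n` is never taken: its stack slot is only loaded from
   and stored to, so mem2reg can replace it with an SSA value. */
int sub2(int n) {
    return n - 2;
}

/* Hypothetical helper that receives a pointer to a local variable. */
static void subtract_two(int *p) {
    *p -= 2;
}

/* Here the address of `n` escapes into a call, so mem2reg on its own
   has to leave the alloca for `n` in place. */
int sub2_via_pointer(int n) {
    subtract_two(&n);
    return n;
}
```

Both functions compute the same result; the difference should only show up in the IR after running mem2reg, where the second one keeps its alloca unless other passes (such as inlining) remove the pointer use first.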

Compile with no optimization in clang

Short question: how to compile with clang with no code optimization? -O0 is not working.
Long question:
I'm learning code optimization and LLVM in particular. I'm writing small examples, compiling them and then running just one optimization at a time, to analyze what it changes. For example, to test Dead Code Elimination, I tried this:
int main() {
int a = 20 + 30;
int b = 25; /* Assignment to dead variable */
int c;
c = a << 2;
return c;
b = 24; /* Unreachable code */
return 0;
}
However, when I compile it with
clang -S -O0 -emit-llvm foo.c
The last two lines of my C code do not show up in the IR code (below). Also, the 20 + 30 is already being calculated to 50. So there's some optimization going on here, even though I'm using -O0.
; ModuleID = 'hello.c'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: nounwind uwtable
define i32 @main() #0 {
entry:
%retval = alloca i32, align 4
%a = alloca i32, align 4
%b = alloca i32, align 4
%c = alloca i32, align 4
store i32 0, i32* %retval
store i32 50, i32* %a, align 4
store i32 25, i32* %b, align 4
%0 = load i32* %a, align 4
%shl = shl i32 %0, 2
store i32 %shl, i32* %c, align 4
%1 = load i32* %c, align 4
ret i32 %1
}
attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.ident = !{!0}
!0 = metadata !{metadata !"clang version 3.4 (trunk 192936)"}
