I'm new to Zig and have been trying out a few snippets with "orelse".
I tried to create two optional unsigned integer variables, as in the following code, and use orelse with them.
const std = @import("std");

pub fn main() !void {
    var value1: ?u32 = 123;
    var value2: ?u32 = 222;
    std.debug.print("value1 orelse value2: {}\n", .{value1 orelse value2});
}
My expectation was that it would print "123", but instead I got a runtime error (or at least that's how I understand it):
zig run main.zig
broken LLVM module found: Instruction does not dominate all uses!
%7 = getelementptr inbounds %"struct:26:52", %"struct:26:52"* %1, i32 0, i32 0, !dbg !2021
%12 = getelementptr inbounds %"?u32", %"?u32"* %7, i32 0, i32 1, !dbg !2020
Instruction does not dominate all uses!
%7 = getelementptr inbounds %"struct:26:52", %"struct:26:52"* %1, i32 0, i32 0, !dbg !2021
%13 = getelementptr inbounds %"?u32", %"?u32"* %7, i32 0, i32 0, !dbg !2020
This is a bug in the Zig compiler.
thread 1856201 panic:
Unable to dump stack trace: debug info stripped
make: *** [run] Abort trap: 6
So is my expectation correct, or is this a bug? If it's a bug, where should I report it? (Sorry, I'm a newbie.)
I'm using Zig 0.9.1, running on macOS 12.5 (21G72).
Thanks for taking the time.
UPDATE:
Tried the suggestions:
v0.9.1: defined a 3rd variable, value3 = value1 orelse value2, and printed that instead.
v0.10: upgraded and I don't see this error anymore.
Related
Here's a simple C file with an enum definition and a main function:
enum days {MON, TUE, WED, THU};

int main() {
    enum days d;
    d = WED;
    return 0;
}
It transpiles to the following LLVM IR:
define dso_local i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  store i32 2, i32* %2, align 4
  ret i32 0
}
%2 is evidently the d variable, which gets 2 assigned to it. What does %1 correspond to if zero is returned directly?
This %1 register was generated by clang to handle multiple return statements in a function. Imagine you were writing a function to compute an integer's factorial. Instead of this
int factorial(int n){
    int result;
    if(n < 2)
        result = 1;
    else{
        result = n * factorial(n-1);
    }
    return result;
}
You'd probably do this
int factorial(int n){
    if(n < 2)
        return 1;
    return n * factorial(n-1);
}
Why? Because Clang will insert that result variable, which holds the return value, for you. Yay. That's the reason for the %1 variable. Look at the IR for a slightly modified version of your code.
Modified code:
enum days {MON, TUE, WED, THU};

int main() {
    enum days d;
    d = WED;
    if(d) return 1;
    return 0;
}
IR:
define dso_local i32 @main() #0 !dbg !15 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  store i32 2, i32* %2, align 4, !dbg !22
  %3 = load i32, i32* %2, align 4, !dbg !23
  %4 = icmp ne i32 %3, 0, !dbg !23
  br i1 %4, label %5, label %6, !dbg !25

5:                                      ; preds = %0
  store i32 1, i32* %1, align 4, !dbg !26
  br label %7, !dbg !26

6:                                      ; preds = %0
  store i32 0, i32* %1, align 4, !dbg !27
  br label %7, !dbg !27

7:                                      ; preds = %6, %5
  %8 = load i32, i32* %1, align 4, !dbg !28
  ret i32 %8, !dbg !28
}
Now you see %1 making itself useful, huh? In most functions with a single return statement, this variable will be stripped by one of LLVM's passes.
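If you want to watch that happen, one quick experiment is to take the original single-return version and compare its unoptimized IR with an optimized build of the same source (the file name here is just a placeholder, and exact output differs a bit between Clang versions):

clang -S -emit-llvm -O0 days.c -o days-O0.ll   # keeps the %1 return slot and the dead store of 0
clang -S -emit-llvm -O1 days.c -o days-O1.ll   # the allocas are promoted and main shrinks to little more than ret i32 0

At -O1 the stack slots get promoted to SSA values (the mem2reg/SROA family of passes), and anything that is never read is simply dropped.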
Why does this matter — what's the actual problem?
I think the deeper answer you're looking for might be: LLVM's architecture is based around fairly simple frontends and many passes. The frontends have to generate correct code, but it doesn't have to be good code. They can do the simplest thing that works.
In this case, Clang generates a couple of instructions that turn out not to be used for anything. That's generally not a problem, because some part of LLVM will get rid of superfluous instructions. Clang trusts that to happen. Clang doesn't need to avoid emitting dead code; its implementation may focus on correctness, simplicity, testability, etc.
Because Clang is done with syntax analysis but LLVM hasn't even started with optimization.
The Clang front end has generated IR (Intermediate Representation), not machine code. Those variables are in SSA (Static Single Assignment) form; they haven't been bound to machine registers yet, and after optimization the redundant ones never will be.
That code is a somewhat literal representation of the source. It is what clang hands to LLVM for optimization. Basically, LLVM starts with that and optimizes from there. Indeed, for version 10 and x86_64, llc -O2 will eventually generate:
main:                # @main
    xor eax, eax
    ret
When compiling this C++ file, which uses the list library, with the command clang++ list-simple-test.cpp -c -emit-llvm:
// list1.cpp
#include <list>
using namespace std;

int main(int argc, char **argv)
{
    int x = 1;
    list<int*> alist;
    alist.push_back(&x);
    return x;
}
I notice that some functions, like _ZNSt8__detail15_List_node_base7_M_hookEPS0_, are generated without a function body:
; Function Attrs: nounwind
declare dso_local void @_ZNSt8__detail15_List_node_base7_M_hookEPS0_(%"struct.std::__detail::_List_node_base"*, %"struct.std::__detail::_List_node_base"*) #5
While most of the other functions are generated with a complete body, for example the function _ZNSt7__cxx1110_List_baseIPiSaIS1_EE11_M_inc_sizeEm below:
; Function Attrs: noinline nounwind optnone uwtable
define linkonce_odr dso_local void @_ZNSt7__cxx1110_List_baseIPiSaIS1_EE11_M_inc_sizeEm(%"class.std::__cxx11::_List_base"*, i64) #1 comdat align 2 {
  %3 = alloca %"class.std::__cxx11::_List_base"*, align 8
  %4 = alloca i64, align 8
  store %"class.std::__cxx11::_List_base"* %0, %"class.std::__cxx11::_List_base"** %3, align 8
  store i64 %1, i64* %4, align 8
  %5 = load %"class.std::__cxx11::_List_base"*, %"class.std::__cxx11::_List_base"** %3, align 8
  %6 = load i64, i64* %4, align 8
  %7 = getelementptr inbounds %"class.std::__cxx11::_List_base", %"class.std::__cxx11::_List_base"* %5, i32 0, i32 0
  %8 = getelementptr inbounds %"struct.std::__cxx11::_List_base<int *, std::allocator<int *> >::_List_impl", %"struct.std::__cxx11::_List_base<int *, std::allocator<int *> >::_List_impl"* %7, i32 0, i32 0
  %9 = getelementptr inbounds %"struct.std::__detail::_List_node_header", %"struct.std::__detail::_List_node_header"* %8, i32 0, i32 1
  %10 = load i64, i64* %9, align 8
  %11 = add i64 %10, %6
  store i64 %11, i64* %9, align 8
  ret void
}
I understand that those functions are from libstdc++.so. But why does Clang generate the body for some functions but not for others?
Does anybody know how to make Clang generate the body of _ZNSt8__detail15_List_node_base7_M_hookEPS0_ as well?
Thank you very much for reading my question! I'm writing a static analysis tool, which needs to analyze the body of _ZNSt8__detail15_List_node_base7_M_hookEPS0_ to obtain more precise results.
Most probably, the functions that do have bodies are coming from C++ templates.
When you declare a templated function, you usually have to provide its implementation in the header file. That way its code ends up in your own translation unit, and you see that code in your IR.
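As a rough sketch of that difference (the function names below are made up for illustration, not taken from your code):

// twice() is a function template: its definition has to be visible at the
// point of use, so Clang instantiates twice<int> in this translation unit
// and the IR gets a full linkonce_odr body for it, much like the
// _M_inc_size instantiation above.
template <typename T>
T twice(T x) { return x + x; }

// defined_elsewhere() is an ordinary function: only the declaration is
// visible here and its body lives in some other object file or shared
// library, so the IR only gets a declare line for it, like _M_hook, whose
// body is compiled into libstdc++.so.
int defined_elsewhere(int x);

int main() {
    return twice(21) + defined_elsewhere(0);
}

Compiling this with clang++ -S -emit-llvm should show a define linkonce_odr for the twice<int> instantiation and only a declare for defined_elsewhere.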
I temporarily found a workaround using the suggestion from @arrowd.
// generate list-simple-test.bc
clang++ list-simple-test.cpp -c -emit-llvm
// generate list.bc (list.cc is from the source code of libstdc++)
clang++ -c -emit-llvm list.cc
// combine list-simple-test.bc and list.bc
llvm-link list.bc list-simple-test.bc -o list-simple-final.bc
In the code above, list.cc can be downloaded from the GCC project.
The final bitcode file list-simple-final.bc will contain the definition of _ZNSt8__detail15_List_node_base7_M_hookEPS0_, which is provided by list.cc.
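To double-check that the combined module really contains the body, you can disassemble it and search for the mangled name (assuming the usual LLVM tools are on your PATH):

llvm-dis list-simple-final.bc -o list-simple-final.ll
grep -A 2 "_ZNSt8__detail15_List_node_base7_M_hookEPS0_" list-simple-final.ll

The symbol should now show up in a define line with a body instead of a bare declare.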
Please consider the following program:
int main() {
    int test = 17;
    return test;
}
Compile to LLVM IR: clang++ -S -emit-llvm test.cpp
Looking at the IR, the function main is defined as so:
; Function Attrs: noinline norecurse nounwind optnone uwtable
define dso_local i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  store i32 17, i32* %2, align 4
  %3 = load i32, i32* %2, align 4
  ret i32 %3
}
We can see that %2 is the allocation of our test variable, which gets 17 stored into it, and %3 uses that variable as the function's return value (in keeping with the code as we wrote it). However, we also see that %1 defines another int-sized variable and initializes it to 0, even though it is never used. This extra variable is nowhere to be seen in the C++ source.
I should note that I see the same being generated when I compile using clang rather than clang++.
What is this extra variable?
I assume you are using an old version of Clang. In newer versions (I mean v7.0 and later), value names are printed by default. To print them explicitly, you can use -fno-discard-value-names. With this option you'll get the following IR:
define dso_local i32 @main() #0 {
entry:
  %retval = alloca i32, align 4
  %test = alloca i32, align 4
  store i32 0, i32* %retval, align 4
  store i32 17, i32* %test, align 4
  %0 = load i32, i32* %test, align 4
  ret i32 %0
}
Now it is quite clear where the store of 0 comes from: in unoptimized code, the compiler initializes retval to 0.
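A small experiment that makes this easier to see (the function name below is made up): compile a similar function that is not main and compare its unoptimized IR.

// A simple non-main function with a single return: at -O0 this gets only
// the alloca for the local variable, with no separate return slot and no
// store of 0, so the zeroed %retval really is the pre-initialized return
// value that main gets in unoptimized builds.
int not_main() {
    int test = 17;
    return test;
}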
I generate the IR by using 'clang -S -emit-llvm test.c'.
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int* a = 0;
    a = (int *)malloc(sizeof(int));
    printf("hello world\n");
    return 0;
}
and this is the IR:
define i32 @main(i32, i8**) #0 {
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  %5 = alloca i8**, align 8
  %6 = alloca i32*, align 8
  store i32 0, i32* %3, align 4
  store i32 %0, i32* %4, align 4
  store i8** %1, i8*** %5, align 8
  store i32* null, i32** %6, align 8
  %7 = call noalias i8* @malloc(i64 4) #3
  %8 = bitcast i8* %7 to i32*
  store i32* %8, i32** %6, align 8
  %9 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str, i32 0, i32 0))
  ret i32 0
}
How can I make the variable names remain unchanged, like a staying %a instead of a number like %3?
Actually, dropping the variable names is a feature and is activated with -discard-value-names. Clang in a release build does this on its own (a self-compiled Clang in debug mode does not).
You can circumvent it with
clang <your-command-line> -###
Then copy the output and drop -discard-value-names.
Newer Clang versions (since 7) expose the flag on the normal command line:
clang -fno-discard-value-names <your-command-line>
Source
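Applied to the test.c from the question, that would be something like:

clang -S -emit-llvm -fno-discard-value-names test.c -o test.ll

With the flag, the allocas keep their source-based names (for example %a for the local pointer) instead of being numbered.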
There is no such way. The variable names in LLVM IR are merely for debugging, and there is certainly no way to preserve them once the code is converted to full SSA form.
If you need to preserve source-code information, consider using debug info.
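For example (file names are placeholders), compiling with -g keeps a mapping from IR values back to source-level variables even though the values themselves stay numbered:

clang -g -S -emit-llvm test.c -o test.ll

In the resulting IR the source names survive as debug metadata: DILocalVariable entries referenced by the llvm.dbg.* intrinsics and !dbg locations, which is what debuggers and source-level tools consume.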
I'm trying to run a program from the LLVM bitcode generated by my compiler, but when I run the lli command it returns an error
lli-3.6: test2.ll:9:1: error: expected instruction opcode
When I use lli with the .ll generated by clang -S -emit-llvm, it works. There are many optimizations in that code, though. I tried to insert some of them manually, but it didn't work.
My problem is that I don't know whether the structure of my code is correct, or whether it is just missing something the interpreter needs to work properly. Originally, I tried to use the JIT in my code, but it was giving me more errors with the libraries and the documentation was not helpful.
My LLVM bitcode is the following:
%struct.test = type { i32, i32 }

define internal void @test_program() {
entry:
  %a = alloca i32
  store i32 5, i32* %a
  call void @printf(i32 3)
  %bar = alloca %struct.test
}

define internal void @f(i32 %x) {
entry:
  %b = alloca i32
  %mul = mul i32 6, 2
  %add = add i32 %mul, 3
  %add1 = add i32 10, %add
  store i32 %add1, i32* %b
  %tmp_eq = icmp eq i32* %b, i32 25
  br i1 %tmp_eq, label %cond_true, label %cond_false

cond_true:                              ; preds = %entry
  store i32 40, i32* %b

cond_false:                             ; preds = %entry
  store i32 50, i32* %b
}

declare void @printf()
The LLVM IR file you provided is malformed: it is missing terminator instructions on its basic blocks (except on %entry in @f). Every basic block has to end with a terminator such as ret or br, so @test_program needs a ret void before its closing brace, and cond_true and cond_false in @f each need to branch to a following block or return. It looks like you have a bug in your custom optimizations.