Gcovr - 6 possible branches in if-statement - gcov

I am using Gcovr to measure code coverage in C++.
In a simple if-statement, I am getting weird results.
Below I add a picture of the code coverage.
Can anyone help and explain why there are six possible branches?
When I separate the functions a() and b() into their own if-statements, I get two possible branches, which is OK.

GCC-gcov code coverage does not consider the branches in your source code, it considers the branches emitted by the compiler. Approximately, it works on an assembly level. If I look at the assembly for your initial func(), I see three branching points associated with that line (the jne and the two je instructions). Each conditional jump has two possible outcomes, taken or not taken, which gives six branches:
func():
push rbp
mov rbp, rsp
call a()
test eax, eax
jne .L6
call b()
test eax, eax
je .L7
.L6:
mov eax, 1
jmp .L8
.L7:
mov eax, 0
.L8:
test al, al
je .L9
mov eax, 1
jmp .L10
.L9:
mov eax, 0
.L10:
pop rbp
ret
– https://godbolt.org/z/6q691r3fj gcc 12.2 with -O0
The entire .L8 and .L9 blocks look pretty pointless to me, but it seems that the compiler is actually translating the conditional via an intermediate variable. Roughly, your code
if (a() || b()) {
return 1;
} else {
return 0;
}
seems to be translated as
int cond;
if (a() || b()) {
cond = 1;
} else {
cond = 0;
}
int retval;
if (cond) {
retval = 1;
} else {
retval = 0;
}
return retval;
This is just the kind of code that gets emitted when disabling all optimizations. It is not always a literal translation of the source code, sometimes compilers do stuff that appears really dumb. As discussed in the gcovr FAQ entry “Why does C++ have so many uncovered branches?”, it is not generally possible to get rid of all the compiler-generated branches. As a consequence, chasing a concrete branch coverage metric is not overly useful for GCC-gcov based coverage data.
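For anyone who wants to reproduce the report, the source under discussion can be reconstructed roughly as below. This is only a sketch: the question shows just the if-statement, so the bodies of a() and b(), the two flag variables, and the helper exercise_all_paths() are assumptions added for illustration.

```cpp
// Hypothetical stand-ins for the question's a() and b(). The flags let a
// caller drive every combination of results.
static int a_result = 0;
static int b_result = 0;

int a() { return a_result; }
int b() { return b_result; }

// The construct from the question: one condition in the source, but
// several compiler-emitted conditional jumps on the same line at -O0.
int func() {
    if (a() || b()) {
        return 1;
    } else {
        return 0;
    }
}

// Calls func() once for each combination of a() and b() results and
// returns how many of those calls returned 1.
int exercise_all_paths() {
    int hits = 0;
    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 2; ++j) {
            a_result = i;
            b_result = j;
            hits += func();
        }
    }
    return hits;
}
```

Building this with g++ --coverage -O0 and calling exercise_all_paths() from a test gives gcov something to count for every path through func().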


Using inline assembler in iOS aarch64 application

I tried to compile some inline assembler for 64-bit iOS application.
Here's an example:
int roundff(float _value) {
int res;
float temp;
asm("vcvtr.s32.f32 %[temp], %[value] \n vmov %[res], %[temp]" : [res] "=r" (res), [temp] "=w" (temp) : [value] "w" (_value));
return res;
}
and I get this error:
Unrecognized instruction mnemonic.
But this code compiles fine:
__asm__ volatile(
"add %[result], %[b], %[a];"
: [result] "=r" (result)
: [a] "r" (a), [b] "r" (b), [c] "r" (c)
);
Then I found that on aarch64 I have to use fcvt instead of vcvt, because
int a = (int)(10.123);
compiles into
fcvtzs w8, s8
but I don't know how to write it in inline assembler. Something like this
int roundff(float _value)
{
int res;
asm("fcvtzs %[res], %[value]" : [res] "=r" (res) : [value] "w" (_value));
return res;
}
also doesn't work and generates these errors:
Instruction 'fcvtz' can not set flags, but 's' suffix specified.
Invalid operand for instruction.
I also need rounding instead of truncation (fcvtns).
Any help? Where can I read more about ARM (32/64-bit) asm?
UPDATE
OK. This: float res = nearbyintf(v) compiles into the nice instruction frinti s0, s0. But why does my inline assembler not work on iOS using the clang compiler?
Here is how you do it:
-(int) roundff:(float)a {
int y;
__asm__("fcvtzs %w0, %s1\n\t" : "=r"(y) : "w"(a));
return y;
}
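The question also asked for rounding rather than truncation (fcvtns). The same operand modifiers should work for that instruction. Below is a sketch as a plain C++ function; the name round_nearest and the non-ARM64 fallback are assumptions added here so the snippet also builds on other targets.

```cpp
#include <cmath>

// Sketch: convert float to int, rounding to nearest with ties to even.
// round_nearest is a hypothetical name, not taken from the question.
int round_nearest(float value) {
#if defined(__aarch64__)
    int result;
    // %w0 selects the 32-bit W view of the general-purpose register,
    // %s1 selects the single-precision S view of the FP/SIMD register.
    __asm__("fcvtns %w0, %s1" : "=r"(result) : "w"(value));
    return result;
#else
    // Fallback for other targets: lrint honours the current rounding
    // mode, which is round-to-nearest-even unless the program changed it.
    return static_cast<int>(std::lrint(value));
#endif
}
```

Without the w and s modifiers the assembler sees 64-bit x and vector v register names, which is one plausible source of the "invalid operand" error in the question.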
Take care,
/A
You can get the rounding you want using standard math.h functions that inline to single ARM instructions. Better yet, the compiler knows what they do, so may be able to optimize better by e.g. proving that the integer can't be negative, if that's the case.
Check godbolt for the compiler output:
#include <math.h>
int truncate_f_to_int(float v)
{
int res = v; // standard C cast: truncate with fcvtzs on ARM64
// AMD64: inlines to cvtTss2si rax, xmm0 // Note the extra T for truncate
return res;
}
int round_f_away_from_zero(float v)
{
int res = roundf(v); // optimizes to fcvtas on ARM64
// AMD64: AND/OR with two constants before converting with truncation
return res;
}
//#define NOT_ON_GODBOLT
// godbolt has a broken setup and gets x86-64 inline asm for lrintf on ARM64
#if defined(NOT_ON_GODBOLT) || defined (__x86_64__) || defined(__i386__)
int round_f_to_even(float v)
{
int res = lrintf(v); // should inline to a convert using the current rounding mode
// AMD64: inlines to cvtss2si rax, xmm0
// nearbyintf(v); // ARM64: calls the math library
// rintf(v); // ARM64: calls the math library
return res;
}
#endif
godbolt has a buggy install of headers for non-x86 architectures: it still uses the x86 math headers, including inline asm.
Also, your roundff function with inline asm for fcvtzs compiled just fine on godbolt with gcc 4.8. Maybe you were trying to build for 32bit ARM? But like I said, use the library function that does what you want, then check to make sure it inlines to nice ASM.
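To see how the three library approaches differ on tie cases, here is a small C++ sketch. The function names are made up for this example, and which instruction each one compiles to depends on the compiler and target, so check the generated asm rather than trusting the comments.

```cpp
#include <cmath>

// Plain cast: truncates toward zero.
int truncate_toward_zero(float v) {
    return static_cast<int>(v);
}

// std::round: rounds to nearest, ties away from zero.
int round_half_away(float v) {
    return static_cast<int>(std::round(v));
}

// std::lrint: rounds using the current rounding mode, which is
// round-to-nearest-even unless the program has changed it.
int round_current_mode(float v) {
    return static_cast<int>(std::lrint(v));
}
```

With the default rounding mode, 2.5f becomes 2, 3 and 2 respectively, so only the last function matches the ties-to-even behaviour of fcvtns.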

Maintenance of reference counting in Z3

For some reason I have to use the C++ API and the C API of Z3 together. In the C++ API, the reference counts of Z3 objects are maintained for me and I needn't worry about making mistakes. However, I have to maintain reference counts manually for Z3 objects when I use the C API, because the C++ API uses Z3_mk_context_rc to create the context. I have several questions about reference counting in Z3.
(1) If the reference count of a Z3_ast drops to 0, what is responsible for releasing the memory of this Z3_ast? And when?
(2) The code below
void rctry(context & c)
{
expr x = c.int_const("x");
expr y = c.int_const("y");
Z3_ast res = Z3_mk_eq(c,x,y);
#ifdef FAULT_CLAUSE
expr z = c.int_const("z");
expr u = c.int_const("u");
Z3_ast fe = Z3_mk_eq(c,z,u);
#endif
std::cout << Z3_ast_to_string(c,res) << std::endl;
}
int main()
{
config cfg;
cfg.set("MODEL", true);
cfg.set("PROOF", true);
context c(cfg);
rctry(c);
}
Although I didn't increase the reference count for the AST referenced by res, the program works well. If FAULT_CLAUSE is defined, the program still works, but it outputs (= z u) instead of (= x y). How can this be explained?
Thank you!
My golden rule for reference counting is: Whenever my program receives a pointer to a Z3 object, I immediately increment the ref count and I save the object somewhere safe (i.e., I now own 1 reference to that object). Only when I'm absolutely sure that I will not need the object any longer will I call Z3_dec_ref; from that point on, any access to that object will trigger undefined behavior (not necessarily a segfault), because I don't own any references anymore - Z3 owns all the references and it can do whatever it wants to do with them.
Z3 objects are always deallocated when the ref count goes to zero; it's within the call to dec_ref() that the deallocation happens. If Z3_dec_ref() is never called (like in the example given), then the object may remain in memory, so accessing that particular part of the memory might still give "ok looking" results, but that part of the memory may also be overwritten by other procedures so that it contains garbage.
In the example program given, we would need to add inc/dec_ref calls as follows:
void rctry(context & c)
{
expr x = c.int_const("x");
expr y = c.int_const("y");
Z3_ast res = Z3_mk_eq(c,x,y);
Z3_inc_ref(c, res); // I own 1 ref to res!
#ifdef FAULT_CLAUSE
expr z = c.int_const("z");
expr u = c.int_const("u");
Z3_ast fe = Z3_mk_eq(c,z,u);
Z3_inc_ref(c, fe); // I own 1 ref to fe!
#endif
std::cout << Z3_ast_to_string(c, res) << std::endl;
#ifdef FAULT_CLAUSE
Z3_dec_ref(c, fe); // I give up my ref to fe.
#endif
Z3_dec_ref(c, res); // I give up my ref to res.
}
The explanation for the output (= z u) is that the second call to Z3_mk_eq
re-uses the chunk of memory that previously held res, because apparently
only the library itself had a reference to it, so it is free to choose what to
do with the memory. The consequence is that the call to Z3_ast_to_string
reads from the right part of the memory (that used to contain res), but the
contents of that part of the memory have changed in the meantime.
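The reuse effect can be imitated with a toy model. The sketch below is not Z3's allocator and uses no Z3 calls; ToyPool and its members are invented for illustration. It assumes only that a slot with a zero reference count may be handed out again.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model only. A slot whose reference count is zero counts as free
// and may be reused by the next make() call.
struct ToyPool {
    struct Slot { std::string text; int refs; };
    std::vector<Slot> slots;

    // Returns a handle (slot index), reusing the first slot nobody owns.
    int make(const std::string& text) {
        for (std::size_t i = 0; i < slots.size(); ++i) {
            if (slots[i].refs == 0) {
                slots[i].text = text;
                return static_cast<int>(i);
            }
        }
        slots.push_back(Slot{text, 0});
        return static_cast<int>(slots.size() - 1);
    }
    void inc_ref(int h) { ++slots[h].refs; }
    void dec_ref(int h) { --slots[h].refs; }
    std::string to_string(int h) const { return slots[h].text; }
};

// Mirrors the question's code: res is never owned, so its slot is reused.
std::string without_inc_ref() {
    ToyPool pool;
    int res = pool.make("(= x y)");
    pool.make("(= z u)");
    return pool.to_string(res);
}

// Mirrors the corrected code: each handle is owned while it is in use.
std::string with_inc_ref() {
    ToyPool pool;
    int res = pool.make("(= x y)");
    pool.inc_ref(res);
    int fe = pool.make("(= z u)");
    pool.inc_ref(fe);
    std::string out = pool.to_string(res);
    pool.dec_ref(fe);
    pool.dec_ref(res);
    return out;
}
```

In this model without_inc_ref() yields the second expression's text and with_inc_ref() yields the first, which is the same symptom as in the question.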
That was the long explanation for anybody who needs to manage ref counts in C. In
the case of C++ there is also a much more convenient way: the ast/expr/etc
objects contain a constructor that takes C objects. Therefore, we can construct
managed objects by simply wrapping them in constructor calls; in this
particular example that could be done as follows:
void rctry(context & c)
{
expr x = c.int_const("x");
expr y = c.int_const("y");
expr res = expr(c, Z3_mk_eq(c, x, y)); // res is now a managed expr
#ifdef FAULT_CLAUSE
expr z = c.int_const("z");
expr u = c.int_const("u");
expr fe = expr(c, Z3_mk_eq(c,z,u)); // fe is now a managed expr
#endif
std::cout << Z3_ast_to_string(c, res) << std::endl;
}
Within the destructor of expr there is a call to Z3_dec_ref, so that it
will be called automatically at the end of the function, when res and fe go
out of scope.
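What such a managed wrapper does can be sketched without Z3 at all. In the example below, ToyObject, toy_inc_ref, toy_dec_ref and Managed are hypothetical stand-ins; the real expr class has more to it, and this only illustrates the inc-on-construct, dec-on-destruct idea.

```cpp
// Hypothetical stand-ins for a C API with manual reference counting.
struct ToyObject { int refs; };
inline void toy_inc_ref(ToyObject* o) { ++o->refs; }
inline void toy_dec_ref(ToyObject* o) { --o->refs; }

// RAII wrapper: owns exactly one reference for as long as it lives.
class Managed {
public:
    explicit Managed(ToyObject* o) : obj_(o) { toy_inc_ref(obj_); }
    Managed(const Managed& other) : obj_(other.obj_) { toy_inc_ref(obj_); }
    Managed& operator=(const Managed& other) {
        toy_inc_ref(other.obj_);  // increment first so self-assignment is safe
        toy_dec_ref(obj_);
        obj_ = other.obj_;
        return *this;
    }
    ~Managed() { toy_dec_ref(obj_); }
    ToyObject* get() const { return obj_; }
private:
    ToyObject* obj_;
};

// Returns the count seen while two wrappers are alive. Both references
// are released automatically when the function returns.
int refs_inside_scope(ToyObject& o) {
    Managed first(&o);
    Managed second = first;
    return o.refs;
}
```

The caller never writes a dec_ref call, so there is no code path on which one can be forgotten.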

tail recursion and Boolean operators

I am currently learning F# on my own (via the try f# site).
I have the following (imho) tail-recursive function for existential quantification of a unary predicate (int->bool).
let rec exists bound predicate =
if (bound<0) then false
elif predicate(bound) then true
else exists (bound-1) predicate
Now this function can also be written as
let rec exists bound predicate = (bound+1>0) && (predicate(bound) || exists (bound-1) predicate)
However, the second implementation is not tail-recursive. The question is whether or not the compiler will optimize it to tail-recursive?
How is the situation for even simpler (ok, it is a bit silly) examples, say
let rec hasNoRoot f =
if (f 0 = 0) then false
else hasNoRoot (fun x -> f (x+1))
versus
let rec hasNoRoot f = (f 0 <> 0) && (hasNoRoot (fun x-> f(x+1)))
In the second example, in order to recognize the function (its description actually) as tail-recursive, the compiler only needs to "know" that in order to evaluate a conjunction, not necessarily both conjuncts have to be evaluated.
thanks for any advice
I compiled the second versions of your 'exists' and 'hasNoRoot' functions with VS2012 (F# 3.0) and optimizations on, then checked the IL produced by the compiler using .NET Reflector. The compiler does optimize the 'exists' function (in the IL below, the recursive call has become a br.s back to L_0001), but unfortunately does not optimize the 'hasNoRoot' function. It seems like a reasonable optimization though, so perhaps it will be added to the next version of the compiler.
For posterity, here's what the compiler generated:
.method public static bool exists(int32 bound, class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<int32, bool> predicate) cil managed
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationArgumentCountsAttribute::.ctor(int32[]) = { new int32[int32(0x2)] { int32(0x1), int32(0x1) } }
.maxstack 8
L_0000: nop
L_0001: ldarg.0
L_0002: ldc.i4.1
L_0003: add
L_0004: ldc.i4.0
L_0005: ble.s L_001c
L_0007: ldarg.1
L_0008: ldarg.0
L_0009: callvirt instance !1 [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<int32, bool>::Invoke(!0)
L_000e: brfalse.s L_0012
L_0010: ldc.i4.1
L_0011: ret
L_0012: ldarg.0
L_0013: ldc.i4.1
L_0014: sub
L_0015: ldarg.1
L_0016: starg.s predicate
L_0018: starg.s bound
L_001a: br.s L_0001
L_001c: ldc.i4.0
L_001d: ret
}
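The IL listing above ends in br.s L_0001 rather than a call to exists, i.e. it is a loop. The sketch below shows the same pair of shapes in C++ (used here only for illustration, not F#): the short-circuit form where the recursive call is the last thing evaluated, and the loop it is equivalent to. The names exists_recursive and exists_loop are made up for this example.

```cpp
#include <functional>

// Short-circuit form: the recursive call is the final operand of || and
// &&, so nothing remains to be done after it returns.
bool exists_recursive(int bound, const std::function<bool(int)>& predicate) {
    return bound >= 0 &&
           (predicate(bound) || exists_recursive(bound - 1, predicate));
}

// Loop form: what the recursion reduces to once the call is replaced by
// "update the argument and jump back to the start".
bool exists_loop(int bound, const std::function<bool(int)>& predicate) {
    while (bound >= 0) {
        if (predicate(bound)) {
            return true;
        }
        bound = bound - 1;
    }
    return false;
}
```

Note that C++ compilers are not required to perform this transformation, so only the loop form is guaranteed to run in constant stack space here.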

F# using accumulator, still getting stack overflow exception

In the following function, I've attempted to set up tail recursion via the usage of an accumulator. However, I'm getting stack overflow exceptions, which leads me to believe that the way I'm setting up my function isn't enabling tail recursion correctly.
//F# attempting to make a tail recursive call via accumulator
let rec calc acc startNum =
match startNum with
| d when d = 1 -> List.rev (d::acc)
| e when e%2 = 0 -> calc (e::acc) (e/2)
| _ -> calc (startNum::acc) (startNum * 3 + 1)
It is my understanding that using the acc would allow the compiler to see that there is no need to keep all the stack frames around for every recursive call, since it can stuff the result of each pass in acc and return from each frame. There is obviously something I don't understand about how to use the accumulator value correctly so the compiler does tail calls.
Stephen Swensen was correct in noting as a comment to the question that if you debug, VS has to disable the tail calls (else it wouldn't have the stack frames to follow the call stack). I knew that VS did this but just plain forgot.
After getting bitten by this one, I wonder if it is possible for the runtime or compiler to throw a better exception. Since the compiler knows both that you are debugging and that you wrote a recursive function, it seems to me that it might be possible for it to give you a hint such as
'Stack Overflow Exception: a recursive function does not
tail call by default when in debug mode'
It does appear that this is properly getting converted into a tail call when compiling with .NET Framework 4. Notice that in Reflector it translates your function into a while(true) as you'd expect the tail functionality in F# to do:
[CompilationArgumentCounts(new int[] { 1, 1 })]
public static FSharpList<int> calc(FSharpList<int> acc, int startNum)
{
while (true)
{
int num = startNum;
switch (num)
{
case 1:
{
int d = num;
return ListModule.Reverse<int>(FSharpList<int>.Cons(d, acc));
}
}
int e = num;
if ((e % 2) == 0)
{
int e = num;
startNum = e / 2;
acc = FSharpList<int>.Cons(e, acc);
}
else
{
startNum = (startNum * 3) + 1;
acc = FSharpList<int>.Cons(startNum, acc);
}
}
}
Your issue isn't stemming from the lack of a tail call (if you are using F# 2.0 I don't know what the results will be). How exactly are you using this function? (Input parameters.) Once I get a better idea of what the function does I can update my answer to hopefully solve it.
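For comparison, the loop that the decompiled code describes can be written out directly. This is a sketch in C++ rather than F#. It assumes start_num is at least 1 (smaller inputs never reach 1, as in the original), and it uses long long, an assumption made here to leave headroom for the 3n+1 step. Appending to a vector replaces the prepend-then-reverse of the F# list.

```cpp
#include <vector>

// Iterative form of calc: collects the sequence from start_num down to 1.
// Assumes start_num >= 1; other inputs would loop forever.
std::vector<long long> calc(long long start_num) {
    std::vector<long long> acc;
    while (start_num != 1) {
        acc.push_back(start_num);
        if (start_num % 2 == 0) {
            start_num = start_num / 2;
        } else {
            start_num = start_num * 3 + 1;
        }
    }
    acc.push_back(1);
    return acc;
}
```

Because there is no recursion left, the stack depth stays constant however long the sequence gets, with or without a debugger attached.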

Does using a lot of tail-recursion in Erlang slow it down?

I've been reading about Erlang lately and how tail-recursion is so heavily used, due to the difficulty of using iterative loops.
Doesn't this high use of recursion slow it down, what with all the function calls and the effect they have on the stack? Or does the tail recursion negate most of this?
The point is that Erlang optimizes tail calls (not only recursion). Optimizing tail calls is quite simple: if the return value is computed by a call to another function, then this other function is not just put on the function call stack on top of the calling function, but instead the stack frame of the current function is replaced by one of the called function. This means that tail calls don't add to the stack size.
So, no, using tail recursion doesn't slow Erlang down, nor does it pose a risk of stack overflow.
With tail call optimization in place, you can not only use simple tail recursion, but also mutual tail recursion of several functions (a tail-calls b, which tail-calls c, which tail-calls a ...). This can sometimes be a good model of computation.
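One way to picture "the frame is replaced" for mutual tail calls is a loop over a "which function runs next" variable. The sketch below is a model in C++, not how the Erlang VM is implemented; the three hypothetical functions a, b and c are represented by enum values, where a stops when n reaches 0, and c decrements n before tail-calling a again.

```cpp
#include <cstdint>

// Which of the three mutually tail-calling functions runs next.
enum class Next { A, B, C, Done };

// Runs the a -> b -> c -> a cycle until n reaches zero and returns how
// many tail calls were made. Only one "frame" (the loop's locals) exists
// at any time, so the depth does not grow with n.
std::uint64_t run_cycle(std::uint64_t n) {
    std::uint64_t calls = 0;
    Next next = Next::A;
    while (next != Next::Done) {
        switch (next) {
            case Next::A:
                if (n == 0) {
                    next = Next::Done;      // a returns its result
                } else {
                    ++calls;
                    next = Next::B;         // a tail-calls b
                }
                break;
            case Next::B:
                ++calls;
                next = Next::C;             // b tail-calls c
                break;
            case Next::C:
                ++calls;
                --n;
                next = Next::A;             // c tail-calls a with n - 1
                break;
            case Next::Done:
                break;
        }
    }
    return calls;
}
```

Calling run_cycle with a large n makes millions of "calls" without any growth in stack usage, which is the property tail call optimization provides.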
Tail recursion is generally implemented using tail calls.
This is basically a transformation of a recursive call into a simple loop.
C# example:
uint FactorialAccum(uint n, uint accum) {
if(n < 2) return accum;
return FactorialAccum(n - 1, n * accum);
};
uint Factorial(uint n) {
return FactorialAccum(n, 1);
};
to
uint FactorialAccum(uint n, uint accum) {
start:
if(n < 2) return accum;
accum *= n;
n -= 1;
goto start;
};
uint Factorial(uint n) {
return FactorialAccum(n, 1);
};
or even better:
uint Factorial(uint n) {
uint accum = 1;
start:
if(n < 2) return accum;
accum *= n;
n -= 1;
goto start;
};
The following C# is not real tail recursion, because the return value is modified after the recursive call returns; most compilers won't break this down into a loop:
int Power(int number, uint power) {
if(power == 0) return 1;
if(power == 1) return number;
return number * Power(number, --power);
}
to
int Power(int number, uint power) {
if(power == 0) return 1;
int result = number;
start:
if(power == 1) return result; // return the accumulated product, not number
result *= number;
power--;
goto start;
}
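Moving the multiplication into an accumulator argument turns Power into a genuine tail call, in the same way as FactorialAccum above. Below is a sketch in C++; the names power_accum, power and power_loop are made up for this example.

```cpp
// Accumulator form: the multiplication happens before the call, so the
// recursive call is the last operation and its result is returned as-is.
int power_accum(int number, unsigned power, int accum) {
    if (power == 0) {
        return accum;
    }
    return power_accum(number, power - 1, accum * number);
}

int power(int number, unsigned exponent) {
    return power_accum(number, exponent, 1);
}

// The loop that the accumulator form reduces to.
int power_loop(int number, unsigned exponent) {
    int accum = 1;
    while (exponent != 0) {
        accum *= number;
        --exponent;
    }
    return accum;
}
```

Starting the accumulator at 1 also removes the need for a separate power == 1 case.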
It should not affect performance in most cases. What you're looking for is not just tail calls, but tail call optimization (or tail call elimination). Tail call optimization is a compiler or runtime technique that figures out when a call to a function is the equivalent of 'popping the stack' to get back to the proper function instead of just returning. Generally, tail call optimization can only be done when the recursive call is the last operation in the function, so you have to be careful.
There is a problem pertaining to tail-recursion but it is not related to performance - Erlang tail-recursion optimisation also involves elimination of the stack trace for debugging.
For instance see Point 9.13 of the Erlang FAQ:
Why doesn't the stack backtrace show the right functions for this code:
-module(erl).
-export([a/0]).
a() -> b().
b() -> c().
c() -> 3 = 4. %% will cause badmatch
The stack backtrace only shows function c(), rather than a(), b() and c().
This is because of last-call-optimisation; the compiler knows it does not need
to generate a stack frame for a() or b() because the last thing it did was call another function, hence the stack frame does not appear in the stack backtrace.
This can be a bit of pain when you hit a crash (but it does kinda go with the territory of functional programming...)
A similar optimization that separates program text function calls from implementation function calls is 'inlining'. In modern/thoughtful languages function calls have little relation to machine level function calls.
