Getting calling conventions from DWARF info - calling-convention

I am trying to get information about calling conventions from DWARF info. More specifically, I want to find out which registers or stack locations are used to pass arguments to functions. My problem is that in some cases the DWARF dump gives me what looks like wrong information. The example I am using is the following C code:
int __attribute__ ((fastcall)) __attribute__ ((noinline)) mult (int x, int y) {
    return x*y;
}
I compile this example using the following command:
gcc -c -g -m32 test.c -o test.o
Now when I use the following command to get the dwarf dump:
dwarfdump test.o
I am getting the following information about this function:
< 2><0x00000042> DW_TAG_formal_parameter
DW_AT_name "x"
DW_AT_decl_file 0x00000001 /home/khaled/Repo_current/trunk/test.c
DW_AT_decl_line 0x00000001
DW_AT_type <0x0000005b>
DW_AT_location DW_OP_fbreg -12
< 2><0x0000004e> DW_TAG_formal_parameter
DW_AT_name "y"
DW_AT_decl_file 0x00000001 /home/khaled/Repo_current/trunk/test.c
DW_AT_decl_line 0x00000001
DW_AT_type <0x0000005b>
DW_AT_location DW_OP_fbreg -16
The DW_AT_location entries are offsets from the frame base, which implies that the arguments are in memory, but the fastcall calling convention forces them to be passed in registers. Looking at the disassembly of the object file, I can see that they are copied from registers to stack locations at the function's entry point. Is there a way to know from the DWARF dump (or by any other means) where the arguments are initially passed at the call?

This happens because you compile without optimization (gcc -c -g -m32 test.c -o test.o). Although mult is a fastcall function, GCC still generates code at the beginning of the function to save the argument registers into the stack frame, and the DWARF info describes those stack slots. Without that, gdb (or any other debugger) could not show the arguments and would report them as optimized out, which would make debugging impossible.
On x86_64, the compiler also passes the first few integer arguments in registers by default, even without the fastcall attribute, and at -O0 you can see those registers being copied to the stack as well (here x and y arrive in edi and esi):
// x86_64 assembly code
_mult:
Leh_func_begin1:
    pushq   %rbp
Ltmp0:
    movq    %rsp, %rbp
Ltmp1:
    movl    %edi, -4(%rbp)
    movl    %esi, -8(%rbp)
    movl    -4(%rbp), %eax
    movl    -8(%rbp), %ecx
    imull   %ecx, %eax
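Where the arguments are at the call itself is decided by the calling convention, not by the -O0 DWARF info. As a rough illustration, the C sketch below encodes a simplified version of the rules for the two conventions in this thread: i386 fastcall (first two integer arguments in ecx and edx, the rest on the stack) and x86_64 System V (first six integer arguments in rdi, rsi, rdx, rcx, r8, r9). The type and function names are hypothetical, and the sketch assumes every argument is an integer or pointer that fits in one register; floating-point, struct, 64-bit-on-i386 and variadic arguments follow other rules and are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the two conventions discussed above. */
typedef enum { CC_I386_FASTCALL, CC_X86_64_SYSV } call_conv;

/* Simplified model: assumes every argument is an integer or pointer that
   fits in one register. Floating-point, struct, 64-bit-on-i386 and variadic
   arguments follow other rules and are NOT modeled here.
   Returns the register holding the 0-based index-th argument at the call,
   or NULL if that argument is passed on the stack. */
const char *int_arg_register(call_conv cc, size_t index)
{
    static const char *const fastcall_regs[] = { "ecx", "edx" };
    static const char *const sysv_regs[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

    assert(cc == CC_I386_FASTCALL || cc == CC_X86_64_SYSV);
    if (cc == CC_I386_FASTCALL)
        return index < 2 ? fastcall_regs[index] : NULL;
    return index < 6 ? sysv_regs[index] : NULL;
}
```

For mult(x, y) this gives ecx and edx under fastcall, and rdi and rsi on x86_64, whose low halves edi and esi are exactly what the listing above spills to -4(%rbp) and -8(%rbp).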
If you turn on optimization (-O, -O2 or -O3, with or without -g) and disassemble, you will find that nothing is copied to the stack frame. If you then debug the optimized executable in gdb, stop at the beginning of the function and show the local variables, gdb may tell you that those arguments have been optimized out.
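With optimization enabled, the dwarfdump output for the 32-bit program looks like this; the location lists now name the registers ecx and edx instead of stack slots: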
0x00000083: TAG_formal_parameter [4]
AT_name( "x" )
AT_decl_file( "test.c" )
AT_decl_line( 1 )
AT_type( {0x0000005f} ( int ) )
AT_location( 0x00000000
0x00000000 - 0x00000003: ecx
0x00000003 - 0x00000018: ecx )
0x00000090: TAG_formal_parameter [4]
AT_name( "y" )
AT_decl_file( "test.c" )
AT_decl_line( 1 )
AT_type( {0x0000005f} ( int ) )
AT_location( 0x0000001e
0x00000000 - 0x00000003: edx
0x00000003 - 0x00000018: edx )
The generated assembly code is also much simpler and cleaner:
_mult:
    pushl   %ebp
    movl    %esp, %ebp
    movl    %ecx, %eax
    imull   %edx, %eax
    popl    %ebp
    ret     $12
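The two kinds of DW_AT_location seen in this thread are single-operation DWARF expressions: DW_OP_fbreg followed by a signed LEB128 offset (the -O0 case), and DW_OP_reg0 to DW_OP_reg31, where the register number is encoded in the opcode itself (the optimized case). The C sketch below decodes just those forms, assuming the i386 DWARF register numbering (0 = eax, 1 = ecx, 2 = edx, ...). The function names are hypothetical; a real tool would use a library such as libdwarf and would also have to walk location lists and evaluate DW_AT_frame_base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Opcode values from the DWARF specification. */
enum { DW_OP_reg0 = 0x50, DW_OP_reg31 = 0x6f, DW_OP_fbreg = 0x91 };

/* Assumed target: i386, whose psABI numbers DWARF registers 0-7 like this. */
static const char *const i386_dwarf_regs[] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Decodes a signed LEB128 number from p[0..len). Sets *used to the number
   of bytes consumed, or to 0 if the input is truncated or too long. */
static int64_t read_sleb128(const unsigned char *p, size_t len, size_t *used)
{
    uint64_t result = 0;
    unsigned shift = 0;
    size_t i = 0;
    unsigned char byte;

    do {
        if (i >= len || shift >= 64) {
            *used = 0;
            return 0;
        }
        byte = p[i++];
        result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);

    if (shift < 64 && (byte & 0x40))        /* sign bit of the last group */
        result |= ~(uint64_t)0 << shift;    /* sign-extend */
    *used = i;
    return (int64_t)result;
}

/* Describes a one-operation location expression (DW_OP_regN, or
   DW_OP_fbreg <sleb128>). Returns 0 on success, -1 if unsupported. */
int describe_location(const unsigned char *expr, size_t len,
                      char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    if (expr == NULL || len == 0)
        return -1;

    unsigned op = expr[0];
    if (op >= DW_OP_reg0 && op <= DW_OP_reg31) {
        unsigned regno = op - DW_OP_reg0;
        if (regno < sizeof i386_dwarf_regs / sizeof i386_dwarf_regs[0])
            snprintf(out, outlen, "in register %s", i386_dwarf_regs[regno]);
        else
            snprintf(out, outlen, "in DWARF register %u", regno);
        return 0;
    }
    if (op == DW_OP_fbreg) {
        size_t used;
        int64_t offset = read_sleb128(expr + 1, len - 1, &used);
        if (used == 0)
            return -1;
        snprintf(out, outlen, "in memory at frame base %+lld", (long long)offset);
        return 0;
    }
    return -1;
}
```

For example, DW_OP_fbreg -12 is encoded as the bytes 0x91 0x74 and decodes to "in memory at frame base -12", while the single byte 0x51 (DW_OP_reg1) decodes to "in register ecx".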

Related

Do the glibc implementation of pthread_spin_lock() and pthread_spin_unlock() function have memory fence instructions?

Do the glibc implementation of pthread_spin_lock() and pthread_spin_unlock() function have memory fence instructions? (I could not find any fence instructions.)
A similar question has answers here:
Does pthread_mutex_lock contains memory fence instruction?
Do the glibc implementation of pthread_spin_lock() and pthread_spin_unlock() function have memory fence instructions?
There is no single implementation; there is a separate implementation for each supported processor.
The x86_64 implementation does not use memory fences; it uses the lock prefix instead:
gdb -q /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) disas pthread_spin_lock
Dump of assembler code for function pthread_spin_lock:
0x00000000000108c0 <+0>: lock decl (%rdi)
0x00000000000108c3 <+3>: jne 0x108d0 <pthread_spin_lock+16>
0x00000000000108c5 <+5>: xor %eax,%eax
0x00000000000108c7 <+7>: retq
0x00000000000108c8 <+8>: nopl 0x0(%rax,%rax,1)
0x00000000000108d0 <+16>: pause
0x00000000000108d2 <+18>: cmpl $0x0,(%rdi)
0x00000000000108d5 <+21>: jg 0x108c0 <pthread_spin_lock>
0x00000000000108d7 <+23>: jmp 0x108d0 <pthread_spin_lock+16>
Since lock-prefixed instructions already act as a full memory barrier on x86_64 (and i386), no additional memory barriers are necessary.
The PowerPC implementation, by contrast, uses the lwarx and stwcx. (load-reserve/store-conditional) instructions, which are closer to a "memory fence", and the sparc64 implementation uses a full membar (memory barrier) instruction.
You can see the various implementations in the sysdeps/.../pthread_spin_lock.* files in the glibc sources.
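To make the x86_64 algorithm from the disassembly concrete, here is a C11 sketch that mirrors it: an atomic decrement that succeeds when the old value was 1, otherwise a pause-and-reload loop until the value is positive again, then a retry. This is not glibc's source; the type and function names are hypothetical, and the unlock (storing 1) is an assumption consistent with the encoding the lock loop implies. Using acquire/release atomics instead of explicit fences lets the compiler pick the instructions for each architecture; on x86_64 the decrement becomes a lock-prefixed instruction and the release store is typically a plain mov.

```c
#include <assert.h>
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()     /* the "pause" in the listing */
#else
#define CPU_RELAX() ((void)0)
#endif

/* Hypothetical lock type using the encoding suggested by the listing:
   1 = unlocked, 0 or negative = locked. */
typedef struct {
    atomic_int value;
} sketch_spinlock;

void sketch_spin_init(sketch_spinlock *l)
{
    atomic_init(&l->value, 1);
}

void sketch_spin_lock(sketch_spinlock *l)
{
    for (;;) {
        /* Atomic decrement, like "lock decl (%rdi)". We own the lock iff the
           old value was 1 (the new value is 0). Acquire ordering keeps the
           critical section from moving above this point. */
        if (atomic_fetch_sub_explicit(&l->value, 1, memory_order_acquire) == 1)
            return;
        /* Like the pause / cmpl $0x0 / jg loop: wait with plain loads until
           the value is positive again, then retry the decrement. */
        while (atomic_load_explicit(&l->value, memory_order_relaxed) <= 0)
            CPU_RELAX();
    }
}

void sketch_spin_unlock(sketch_spinlock *l)
{
    /* Unlocking a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&l->value, memory_order_relaxed) <= 0);
    /* Release ordering keeps the critical section from moving below this. */
    atomic_store_explicit(&l->value, 1, memory_order_release);
}
```

Spinning on relaxed loads rather than repeating the locked decrement keeps the cache line shared while waiting, which is the same reason the disassembled loop only uses cmpl inside the pause loop.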

How to solve LLDB error about N_SO in symbol with UID 1

When I launched lldb to debug an iOS application, I got an error I had never seen before.
error: Veriff(0x00000001018cc000) N_SO in symbol with UID 1 has invalid sibling in debug map, please file a bug and attach the binary listed in this error
Below is the context of the error.
(lldb) process connect connect://localhost:6666
error: Veriff(0x00000001018cc000) N_SO in symbol with UID 1 has invalid sibling in debug map, please file a bug and attach the binary listed in this error
Process 3270 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x0000000187a1f6b0 libxpc.dylib`_xpc_dictionary_apply_node_f + 108
libxpc.dylib`_xpc_dictionary_apply_node_f:
-> 0x187a1f6b0 <+108>: mov x1, x20
0x187a1f6b4 <+112>: blr x21
0x187a1f6b8 <+116>: tbz w0, #0x0, 0x187a1f6f8 ; <+180>
0x187a1f6bc <+120>: mov x0, x26
0x187a1f6c0 <+124>: cbnz x26, 0x187a1f6a0 ; <+92>
0x187a1f6c4 <+128>: add x22, x22, #0x1 ; =0x1
0x187a1f6c8 <+132>: cmp x22, x23
0x187a1f6cc <+136>: b.lo 0x187a1f698 ; <+84>
Target 0: (Test app) stopped.
Has anyone been able to solve this error?
Does this impact any debugging?
I've never seen that error triggered before. If you can make this binary available to us, please file a bug either with http://bugs.llvm.org or http://bugreporter.apple.com and include the error message and the binary.
The error means lldb can't map symbols from some .o file that was included in your binary back to the .o file they came from (which is where the debug information actually resides). As a result, debug information for that code will not be available.

How to find a variable's location in memory without source code?

Basically, I want to find the address/location of a variable in gdb.
I know that variables are normally stored at rbp, but I don't know how to locate them using gdb.
I want to find the address/location of a variable in gdb.
That is possible, but the approach is different depending on whether the variable is a global or a local.
I know that variables are normally stored at rbp
Local variables are stored at some offset from the frame pointer. %rbp is often used as the frame pointer in unoptimized binaries.
To find such a variable, you need to be able to read machine code; then you can find it. GDB cannot help you find it in code compiled without debug info, because it has no record of where the variable lives.
without source code
Source code has nothing to do with this: GDB never looks at the source code, except to display it to you.
On to a concrete example. Suppose you have the following source:
int foo(int *ip) { return *ip + 42; }
int main()
{
    int j = 1;
    return foo(&j);
}
Compiling this without debug info and without optimization, and then disassembling main, gives:
(gdb) disas main
Dump of assembler code for function main:
0x000000000000060d <+0>: push %rbp
0x000000000000060e <+1>: mov %rsp,%rbp
0x0000000000000611 <+4>: sub $0x10,%rsp
0x0000000000000615 <+8>: movl $0x1,-0x4(%rbp)
0x000000000000061c <+15>: lea -0x4(%rbp),%rax
0x0000000000000620 <+19>: mov %rax,%rdi
0x0000000000000623 <+22>: callq 0x5fa <foo>
0x0000000000000628 <+27>: leaveq
0x0000000000000629 <+28>: retq
End of assembler dump.
Here you can clearly see that j is stored at offset -4 from %rbp, i.e. at -0x4(%rbp).
You can set a breakpoint on foo and use GDB to examine its value like so:
(gdb) b foo
Breakpoint 1 at 0x5fe
(gdb) run
Breakpoint 1, 0x00005555555545fe in foo ()
(gdb) up
#1 0x0000555555554628 in main ()
(gdb) x/x $rbp-4
0x7fffffffdbcc: 0x00000001 // indeed, that is the expected value of j
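The same arithmetic can be done by hand or in a small helper. Reading back from the x/x output, 0x7fffffffdbcc is $rbp-4, so $rbp in main was 0x7fffffffdbd0. The C sketch below parses a simple AT&T operand such as -0x4(%rbp) taken from a disas listing and applies the displacement to a register value you read in gdb (for example with info registers or p $rbp). The function names are hypothetical, and only the disp(%reg) and (%reg) forms are handled; index/scale operands such as (%rdi,%rcx,4) are rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parses an AT&T-syntax memory operand of the form "disp(%reg)" or "(%reg)",
   such as "-0x4(%rbp)" from a disas listing. Operands with an index/scale,
   such as "(%rdi,%rcx,4)", are rejected. On success stores the displacement
   in *disp and the base register name (without '%') in reg, and returns 0.
   Returns -1 if the operand is not in a supported form. */
int parse_mem_operand(const char *op, long *disp, char *reg, size_t reglen)
{
    assert(op != NULL && disp != NULL && reg != NULL && reglen > 0);

    const char *open = strchr(op, '(');
    if (open == NULL || open[1] != '%')
        return -1;

    if (open == op) {
        *disp = 0;                          /* "(%reg)": no displacement */
    } else {
        char *end;
        *disp = strtol(op, &end, 0);        /* base 0 accepts "-0x4" and "8" */
        if (end != open)
            return -1;
    }

    const char *name = open + 2;            /* skip "(%" */
    const char *close = strchr(name, ')');
    if (close == NULL || close == name)
        return -1;
    size_t n = (size_t)(close - name);
    if (n >= reglen || memchr(name, ',', n) != NULL)
        return -1;
    memcpy(reg, name, n);
    reg[n] = '\0';
    return 0;
}

/* Address of the operand given the current value of its base register.
   Unsigned arithmetic wraps, so a negative displacement subtracts. */
uint64_t operand_address(uint64_t base_value, long disp)
{
    return base_value + (uint64_t)(int64_t)disp;
}
```

With "-0x4(%rbp)" and a base value of 0x7fffffffdbd0, operand_address yields 0x7fffffffdbcc, the address examined with x/x in the session above.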

AVX instructions generated when -xSSE4.1 specified

I have compiled a piece of code with the option -xSSE4.1 using the Intel compiler. When I looked at the generated assembly file, I see that AVX instructions such as 'vpmovzxbw' have been inserted. But, the executable still seems to run on machines that don't support the AVX instruction set. What explains this?
Here's the particular code snippet -
C -> src0_8x16b = _mm_cvtepu8_epi16 (src0_8x16b);
Assembly -> vpmovzxbw xmm4, QWORD PTR [rcx]
Binary -> 00066 c4 62 79 30 29
Here's another snippet where the assembly instruction uses 3 operands -
C -> src0_8x16b = _mm_sub_epi16 (src0_8x16b, src1_8x16b);
Assembly -> vpsubw xmm1, xmm13, xmm11
Binary -> 000bc c4 c1 11 f9 cb
For comparison, here's the disassembly generated by icc for the function 'foo' (the only difference between the function foo and the code snippet above is that the code snippet was written using intrinsics) -
Compiler commands used -
icc -S -xSSE4.1 -axavx -O3 foo.c
Function foo -
void foo(float *x, int n)
{
int i;
for(i=0; i<n; i++) x[i] *= 2.0;
}
Autodispatch code -
testl $-131072, __intel_cpu_indicator(%rip) #1.27
jne foo.R #1.27
testl $-1, __intel_cpu_indicator(%rip) #1.27
jne foo.A
Loop in foo.R (AVX variant) -
vmulps (%rdi,%rcx,4), %ymm0, %ymm1 #3.24
vmulps 32(%rdi,%rcx,4), %ymm0, %ymm2 #3.24
vmovups %ymm1, (%rdi,%rcx,4) #3.24
vmovups %ymm2, 32(%rdi,%rcx,4) #3.24
addq $16, %rcx #3.5
cmpq %rdx, %rcx #3.5
jb ..B2.12 # Prob 82% #3.5
Loop in foo.A (SSE variant) -
movaps (%rdi,%r8,4), %xmm1 #3.24
movaps 16(%rdi,%r8,4), %xmm2 #3.24
mulps %xmm0, %xmm1 #3.24
mulps %xmm0, %xmm2 #3.24
movaps %xmm1, (%rdi,%r8,4) #3.24
movaps %xmm2, 16(%rdi,%r8,4) #3.24
addq $8, %r8 #3.5
cmpq %rsi, %r8 #3.5
jb ..B3.12 # Prob 82% #3.5
I tried to replicate the results with two other compilers, viz. gcc and Microsoft Visual Studio's v100 compiler, and was unable to: both seem to generate the correct code. As a further step, I looked closely at the differences, if any, between the compiler arguments I had specified in each case. It turns out that with icc I had enabled the option to inherit project defaults when compiling this particular file, and the project settings included this option -
-xavx
As a result, when this file was compiled, the settings I had provided -
-xSSE4.1 -axavx
were overridden by the project setting. This was the cause of the behavior I described in my question.
I am sorry for this error, but I shall not delete this question since @Zboson's answer is exceptional.
PS - I had mentioned in one of my comments that I was able to run this code on an SSE4.2 machine. That was because the exe I had run on that machine really was SSE4.1-compliant: I had apparently used an exe generated by gcc. When I ran the icc-generated exe, it did crash with an illegal instruction error on the SSE4.2 machine.
The Intel compiler can generate a single executable with multiple levels of vectorization using the -ax flag. For example, to generate code that is compatible with AVX, SSE4.2 and SSE2, use -axAVX -axSSE4.2 -xSSE2.
Since you compiled with -axAVX -xSSE4.1, the Intel compiler generated an AVX branch and an SSE4.1 branch; at run time it determines which instruction set is available and chooses the corresponding branch.
Agner Fog has a good description of Intel's CPU dispatcher in his Optimizing C++ manual; see section "13.7 CPU dispatching in Intel compiler". Intel's CPU dispatcher is not ideal for several reasons, one of which is that it performs poorly on AMD processors, which Agner describes in detail. Personally, I would write my own dispatcher.
I compiled the following code using ICC 13.0 with the options -O3 -axavx -xsse2:
void foo(float *x, int n) {
    for(int i=0; i<n; i++) x[i] *= 2.0;
}
The start of the assembly is:
test DWORD PTR __intel_cpu_indicator[rip], -131072 #1.27
jne _Z3fooPfi.R #1.27
test DWORD PTR __intel_cpu_indicator[rip], -1 #1.27
jne _Z3fooPfi.A
In the _Z3fooPfi.R branch you find the main AVX loop:
..B2.12: # Preds ..B2.12 ..B2.11
vmulps ymm1, ymm0, YMMWORD PTR [rdi+rcx*4] #2.25
vmulps ymm2, ymm0, YMMWORD PTR [32+rdi+rcx*4] #2.25
vmovups YMMWORD PTR [rdi+rcx*4], ymm1 #2.25
vmovups YMMWORD PTR [32+rdi+rcx*4], ymm2 #2.25
add rcx, 16 #2.2
cmp rcx, rdx #2.2
jb ..B2.12 # Prob 82% #2.2
The _Z3fooPfi.A branch has the main SSE loop:
movaps xmm1, XMMWORD PTR [rdi+r8*4] #2.25
movaps xmm2, XMMWORD PTR [16+rdi+r8*4] #2.25
mulps xmm1, xmm0 #2.25
mulps xmm2, xmm0 #2.25
movaps XMMWORD PTR [rdi+r8*4], xmm1 #2.25
movaps XMMWORD PTR [16+rdi+r8*4], xmm2 #2.25
add r8, 8 #2.2
cmp r8, rsi #2.2
jb ..B3.12 # Prob 82% #2.2
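If you follow the suggestion to write your own dispatcher instead of relying on ICC's __intel_cpu_indicator checks, a minimal sketch with GCC or Clang could look like the following. Note that this swaps in a different technique from what ICC does: it uses the compiler builtins __builtin_cpu_init and __builtin_cpu_supports and the target("avx") function attribute, so only foo_avx may contain AVX instructions while the rest of the file is compiled for whatever baseline your flags select. The function names are hypothetical, and non-x86 or non-GCC/Clang builds simply fall back to the generic loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define SKETCH_X86_DISPATCH 1
#endif

typedef void (*foo_fn)(float *, int);

/* Baseline version: uses only what the file's own compiler flags allow. */
static void foo_generic(float *x, int n)
{
    for (int i = 0; i < n; i++)
        x[i] *= 2.0f;
}

#ifdef SKETCH_X86_DISPATCH
/* Same loop; the target attribute lets only this function use AVX. */
__attribute__((target("avx")))
static void foo_avx(float *x, int n)
{
    for (int i = 0; i < n; i++)
        x[i] *= 2.0f;
}
#endif

/* Chooses an implementation from what the running CPU reports. */
static foo_fn select_foo(void)
{
#ifdef SKETCH_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return foo_avx;
#endif
    return foo_generic;
}

/* Public entry point: selects once, then calls through a cached pointer.
   Two threads may both run select_foo on first use; that is harmless
   because both store the same pointer. */
void foo(float *x, int n)
{
    static _Atomic(foo_fn) impl;   /* zero-initialized: not chosen yet */
    assert(n <= 0 || x != NULL);

    foo_fn f = atomic_load_explicit(&impl, memory_order_relaxed);
    if (f == NULL) {
        f = select_foo();
        atomic_store_explicit(&impl, f, memory_order_relaxed);
    }
    f(x, n);
}
```

Keep in mind that foo_generic is only a safe fallback if the file itself is not compiled with an AVX flag, which is exactly what went wrong in the question when the project-wide -xavx overrode -xSSE4.1.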

NSString conversion to lowercase crashes

Xcode 4.6 (4H127) and Xcode 4.6.3 (4H1503): a simple lower/uppercase conversion of a string with an accented character crashes, depending on the Deployment Target setting. Code snippet:
NSString *lc1 = @"Bosnië-Herzegovina";
NSString *lc2 = [lc1 lowercaseString];
NSString *uc3 = [lc1 uppercaseString];
NSLog(@"\nlc1=%@\nlc2=%@\nuc3=%@ ", lc1, lc2, uc3);
The "ë" is simply typed as "opt-u e", and the source code file is regular UTF Unicode.
lc1 looks as expected in the debugger, but the lc2 and uc3 strings have "chinese" characters appended at the end when the Deployment Target is below 6.1. With 6.1 selected, the "chinese" characters are gone. All of that may simply be a Unicode display issue in the debugger, but with deployment target 5.0-5.1 the code snippet even crashes, as shown below, and that is my problem: the strings in my actual application do not come from source code but from an SQLite database. So, for now, can I only build my app for deployment target 6.0+? Am I missing something?
0x1c49a20: incl %eax
0x1c49a21: jmp 0x1c499fb ; CFUniCharMapCaseTo + 1275
0x1c49a23: movl 12(%ebp), %eax
0x1c49a26: movw $105, (%eax)
0x1c49a2b: movw $775, 2(%eax)
0x1c49a31: movl $2, %eax
0x1c49a36: jmp 0x1c49dac ; CFUniCharMapCaseTo + 2220
0x1c49a3b: movl 12(%ebp), %eax
0x1c49a3e: movw $105, (%eax)
0x1c49a43: movw $775, 2(%eax)
0x1c49a49: movw $771, 4(%eax)
0x1c49a4f: movl $3, %eax
0x1c49a54: jmp 0x1c49dac ; CFUniCharMapCaseTo + 2220
0x1c49a59: movl %eax, %edi
0x1c49a5b: movl 1264482(%edi), %eax
0x1c49a61: movl (%eax), %eax
0x1c49a63: movl %eax, (%esp)
0x1c49a66: movl $0, 8(%esp)
0x1c49a6e: movl $48, 4(%esp)
0x1c49a76: calll 0x1bd9980 ; CFAllocatorAllocate
0x1c49a7b: leal 16(%eax), %ecx
0x1c49a7e: movl %ecx, 1379418(%edi)
0x1c49a84: leal 32(%eax), %ecx
0x1c49a87: movl %ecx, 1379422(%edi)
0x1c49a8d: movl 1379410(%edi), %ecx
0x1c49a93: movl (%ecx), %ecx <-- EXC_BAD_ACCESS (code=1,..
0x1c49a95: movl (%ecx), %ecx
Edit:
I tried minimizing the project to show this problem, and... it disappeared. I have a bit of old-style C-code that uses things like malloc, free, freed, memmove, etc. If this bit is simply present, not even called, the problems described occur. My guess now is that some routines are loaded from a library it should not load from. Digging further.
This doesn't exactly answer your question, but since no one else has answered: it would appear there are no "upper" case associations for those foreign characters.
Could you run a regex, or some kind of string replacement, to replace all known special characters with a normalized (English) version? Then they would have an uppercase or lowercase conversion.
Of course, this may completely ruin the strings you were reading from the DB if they aren't spelled right.
Well, my hunch that there was a problem with loading from libraries, or with the order of loading, made me change the order of the included frameworks: under "Build Phases" I spotted "CoreText.framework" as one of the last entries. I moved it to the top spot, and now everything works for all Deployment Targets (5.0, 5.1, 6.0, 6.1).
I had actually looked at the load map, which you can generate by setting LD_GENERATE_MAP_FILE to yes, but to no avail.
Another pointer came from editing the "Scheme" and switching on "Log library loads" and "Log API Usage": the log shows that code is loaded from various libraries, one of them being CoreText.framework.
In the end, moving CoreText.framework to the top of the list made it all work.
You can still see the "chinese" characters in the debugger when using Deployment Target 5.0-6.0. With 6.1 even those are gone. I guess they fixed that now.
