Does DWARF/DWARF2 facilitate deriving size information of a function accepting a fixed-size array?

Assume a C module containing the following function definition:
void foo(int (*a)[6]){...}
Is it possible to extract the actual array size information '6' of parameter 'a' from the DWARF information (embedded in the resulting .o file) obtained when compiling the source file with gcc -g? I have applied 'readelf -wi' to the object file to obtain the DWARF information, but I cannot find anything from which to derive the fixed array size.

It works ok for me using the Fedora 20 system gcc. Whether it works on other versions of gcc, I don't know... I don't recall specific changes in this area, but then gcc does change quite a bit.
Anyway, I compiled a snippet like the above using gcc -g. Then I examined it with "readelf -wi". This dumps the DWARF information. I see:
 <1><57>: Abbrev Number: 4 (DW_TAG_array_type)
    <58>   DW_AT_type        : <0x6e>
    <5c>   DW_AT_sibling     : <0x67>
 <2><60>: Abbrev Number: 5 (DW_TAG_subrange_type)
    <61>   DW_AT_type        : <0x67>
    <65>   DW_AT_upper_bound : 5
... as the type of the parameter "a". The DW_TAG_subrange_type entry there gives the bounds: DW_AT_upper_bound is 5, so the array has 6 elements (indices 0 through 5).
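For reference, here is a minimal translation unit that reproduces this (the file name is just a placeholder), with the commands used as comments; the element count is DW_AT_upper_bound + 1:

/* dwarf_array.c - compile with:  gcc -g -c dwarf_array.c
 * inspect with:                  readelf -wi dwarf_array.o
 * The parameter's type chain is pointer -> array of int, and the array's
 * DW_TAG_subrange_type child carries DW_AT_upper_bound = 5, i.e. the
 * element count is upper_bound + 1 = 6. */
void foo(int (*a)[6])
{
    (void)a;   /* the body is irrelevant for the debug information */
}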

Related

Binary Size On Chip

I have my code compiled for a certain ARM processor and have the binary. Now I want to know the exact size in bytes (address range) it occupies in my flash memory.
Because I have a recovery mechanism in the last 1 kB of flash and don't want it to be overwritten, as it needs to stay there permanently.
readelf on the binary gives me the start addresses (mapped to the code & data segments), but I couldn't really map this to what I want.
Pre-initialize the flash memory with the value 0xAB, then load the binary. Read the flash until you encounter more than two consecutive 0xAB values; this gives the address range in flash occupied by the binary. (This assumes the binary itself does not contain more than two consecutive 0xAB bytes.)
If your compiler/linker is based on the GNU toolchain (gcc/ld):
1/ At Compile Time
In your linker script, shrink the flash region by 1K.
The linker will then report an error if your code does not fit into the remaining flash area.
Example :
MEMORY
{
FLASH (rx) : ORIGIN = 0x08001000, LENGTH = 128K-1K
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 16K
}
2/ At Run Time
You can define a symbol in your linker script that marks the end of your program (text segment) and use this symbol for a runtime test (see the C sketch after the fragment below).
Example :
.text :
{
. = ALIGN(4);
_etext = .; /* define a global symbol at the end of the code */
} >FLASH
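A minimal C sketch of such a runtime test, assuming the flash layout from the MEMORY example above (128K of flash starting at 0x08001000, so it ends at 0x08021000) and a reserved area of 1 kB; the addresses are placeholders for your actual part:

#include <stdint.h>

extern char _etext;                 /* end-of-code symbol defined in the linker script */

#define FLASH_END      0x08021000u  /* assumed: 0x08001000 + 128K              */
#define RESERVED_BYTES 1024u        /* the last 1 kB holds the recovery block  */

/* Returns non-zero if the code ends before the reserved recovery area. */
int code_fits_in_flash(void)
{
    uintptr_t code_end = (uintptr_t)&_etext;
    return code_end <= (FLASH_END - RESERVED_BYTES);
}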
3/ Manually
After compiling, use objcopy to convert your ELF file into the raw binary image that goes into your flash. Check your datasheet for the flash size and verify manually that the file size fits into your flash minus 1K.
Example:
objcopy -O binary myfile.elf myfile.bin

How to get this sqrt inline assembly working for iOS

I am trying to follow another SO post and implement sqrt14 within my iOS app:
double inline __declspec (naked) __fastcall sqrt14(double n)
{
_asm fld qword ptr [esp+4]
_asm fsqrt
_asm ret 8
}
I have modified this to the following in my code:
double inline __declspec (naked) sqrt14(double n)
{
__asm__("fld qword ptr [esp+4]");
__asm__("fsqrt");
__asm__("ret 8");
}
Above, I have removed the "__fastcall" keyword from the method definition since my understanding is that it is for x86 only. The above gives the following errors for each assembly line respectively:
Unexpected token in argument list
Invalid instruction
Invalid instruction
I have attempted to read through a few inline ASM guides and other posts on how to do this, but I am generally just unfamiliar with the language. I know MIPS quite well, but these commands/registers seem to be very different. For example, I don't understand why the original author never uses the passed in "n" value anywhere in the assembly code.
Any help getting this to work would be greatly appreciated! I am trying to do this because I am building an app where I need to calculate sqrt (ok, yes, I could do a lookup table, but for right now I care a lot about precision) on every pixel of a live-video feed. I am currently using the standard sqrt, and in addition to the rest of the computation, I'm running at around 8fps. Hoping to bump that up a frame or two with this change.
If it matters: I'm building the app to ideally be compatible with any current iOS device that can run iOS 7.1. Again, many thanks for any help.
The compiler is perfectly capable of generating the fsqrt instruction; you don't need inline asm for that. You might get some extra speed if you use -ffast-math.
For completeness' sake, here is the inline asm version:
__asm__ __volatile__ ("fsqrt" : "=t" (n) : "0" (n));
The fsqrt instruction has no explicit operands; it uses the top of the stack implicitly. The =t constraint tells the compiler to expect the output on the top of the FPU stack, and the 0 constraint instructs the compiler to place the input in the same place as output #0 (i.e. the top of the FPU stack again).
Note that fsqrt is of course x86-only, meaning it won't work, for example, on ARM CPUs.
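For ARM (and in general), a portable sketch is to just call sqrt and let the compiler lower it to the target's hardware instruction (vsqrt.f64 on ARM, sqrtsd/fsqrt on x86); the helper name here is illustrative:

#include <math.h>

/* With optimization enabled (and -ffast-math if you don't need errno/NaN
 * handling), the compiler replaces this call with the target's native
 * square-root instruction instead of a library call. */
static inline double fast_sqrt(double n)
{
    return sqrt(n);
}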

How to get the available memory on the device

I'm trying to find out how much free memory I have on the device. To do this I call the CUDA function cuMemGetInfo from Fortran code, but it returns negative values for the amount of free memory, so there's clearly something wrong.
Does anyone know how I can do that?
Thanks
EDIT:
Sorry, in fact my question was not very clear. I'm using OpenACC in Fortran and I call the CUDA function cuMemGetInfo through a small C++ wrapper. Finally I could fix the code; the problem was indeed the kind of the variables I was using. Switching to c_size_t fixed everything. This is the Fortran interface I'm using:
interface
subroutine get_dev_mem(total,free) bind(C,name="get_dev_mem")
use iso_c_binding
integer(kind=c_size_t)::total,free
end subroutine get_dev_mem
end interface
and this is the CUDA code:
#include <cuda.h>
#include <cuda_runtime.h>
extern "C" {
void get_dev_mem(size_t& total, size_t& free)
{
cuMemGetInfo(&free, &total);
}
}
There's one last question: I pushed an array to the GPU and checked its size using cuMemGetInfo, then I computed its size by counting the number of bytes, but I don't get the same answer. Why? In the first case it is 3052 MB large, in the latter 3051 MB. Could this difference of 1 MB be the size of the array descriptor? Here is the code that I used:
integer, parameter:: long = selected_int_kind(12)
integer(kind=c_size_t) :: total, free1,free2
real(8), dimension(:),allocatable::a
integer(kind=long)::N, eight, four
allocate(a(four*N))
!some OpenACC stuff in order to init the gpu
call get_dev_mem(total,free1)
!$acc data copy(a)
call get_dev_mem(total,free2)
print *,"size a in the gpu = ",(free1-free2)/1024/1024, " mb"
print *,"size a in theory = ", (eight*four*N)/1024/1024, " mb"
!$acc end data
deallocate(a)
Right, so, as commenters have suggested, we're not sure exactly what you're running, but filling in the missing details by guessing, here's a shot:
Most CUDA API calls return a status code (or error code if you will); this is true both in C/C++ and in Fortran, as we can see in the Portland Group's CUDA Fortran Manual:
Most of the runtime API routines are integer functions that return an error code; they return a value of zero if the call was successful, and a nonzero value if there was an error. To interpret the error codes, refer to “Error Handling,” on page 48.
This is the case for cudaMemGetInfo() specifically:
integer function cudaMemGetInfo( free, total )
integer(kind=cuda_count_kind) :: free, total
The two integers for free and total are of kind cuda_count_kind, which, if I am not mistaken, is effectively unsigned... anyway, I would guess that what you're getting is an error code. Have a look at the Error Handling section on page 48 of the manual.
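For comparison, here is a minimal C sketch of the same query through the runtime API, with the return code actually checked (this is the check the Fortran code above skips):

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    size_t free_bytes = 0, total_bytes = 0;
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);

    if (err != cudaSuccess) {   /* a non-zero return value is an error code */
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("free: %zu bytes, total: %zu bytes\n", free_bytes, total_bytes);
    return 0;
}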

compilers - Instruction Selection for type declarations in AST

I'm learning compilers and creating a code generator for a simple language that deals with two types: characters and integers.
After the user input has been scanned by the scanner and then parsed by the parser, I get an AST representation of the input. I have already written a code generator for an even simpler language that only processes expressions with integers, operators, and variables.
However with this new language I sometimes get a subtree for a type declaration, like this:
(IS TYPE (x) (INT))
which says x is of type INT.
Should there be a case in my code generator which deals with these type declarations? Or is this simply for the semantic analyzer to type check, so I should just assume the types have been checked and ignore this part of the tree and simply assign the value for x?
Both situations are possible. You need to describe more about your language to see whether you really need to handle that feature in your code generator, or can skip it as unnecessary and avoid extra work in this difficult and interesting topic of designing a programming language.
Is you "code generator" a program that recieves as an input code in a programming language (maybe small one) and outputs code in another programming language (maybe small one) ?
This tool is usually called a "translator".
Is you "code generator" a program that receive as an input a programming language and outputs assembler / bytecode like programming language ?
This tool is usually called a "compiler".
Note: "pile" is a synonym for "stack".
Usually an A.S.T. stores the type of an operation or function call. Example, in C:
...
int a = 3;
int b = 5;
float c = (float)(a * b);
...
The last line generates an A.S.T. similar to this (skipping the A.S.T. for the other lines):
                          [root]
                        (no type) =
                        /         \
               (int) c             (float) (cast operation)
                                           |
                                       (int) ()
                                           |
                                       (int) *
                                       /      \
                                (int) a        (int) b
Note that the "(float)" cast its like an operator or a function,
similar to your question.
Good Luck.
If this is a declaration
(IS TYPE (x) (INT))
then x should be laid out in memory. In the case of C, local (automatic) variables are allocated on the stack. To allocate the needed amount of stack you should know the sizes of all local variables, and the sizes come from their types.
If the variable is stored in a register, you should select a register of the needed size (think of x86 with AL, AX, EAX, RAX - the same register at different sizes), if your target has such registers.
Also, the type is needed when there is an ambiguous operation in the AST that can operate on different data sizes (e.g. char, short, int - i.e. 8-bit, 16-bit, 32-bit, etc.). For some instruction sets the size of the data is encoded into the instruction itself, so the code generator should remember the sizes of variables.
Or, if the type of the operation was not recorded in the AST, the ADD in
(ADD (x) (y))
may mean either an integer or a floating-point addition (ADD or FADD instructions), so the types of x and y are needed in the code generator to select the right variant (a minimal sketch follows).
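Here is a minimal sketch of that selection step (Node, Type and emit_add are hypothetical names, just to illustrate dispatching on the operand types recorded in the AST):

#include <stdio.h>

/* Hypothetical AST leaf: just enough state to show type-driven selection. */
typedef enum { TY_INT, TY_FLOAT } Type;
typedef struct { Type type; const char *name; } Node;

/* Emit code for (ADD (x) (y)); the operand types pick the opcode. */
static void emit_add(const Node *x, const Node *y)
{
    if (x->type == TY_FLOAT || y->type == TY_FLOAT)
        printf("FADD %s, %s\n", x->name, y->name);   /* floating-point variant */
    else
        printf("ADD  %s, %s\n", x->name, y->name);   /* integer variant        */
}

int main(void)
{
    Node a = { TY_INT,   "a" };
    Node b = { TY_FLOAT, "b" };
    emit_add(&a, &b);   /* prints: FADD a, b */
    return 0;
}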

BSS, Stack, Heap, Data, Code/Text - Where each of these start in memory?

Segments of memory - BSS, Stack, Heap, Data, Code/Text (Are there any more?).
Say I have 128 MB of RAM. Can someone tell me:
How much memory is allocated for each of these memory segments?
Where do they start? Please specify the address range or something like that for better clarity.
What factors influence which should start where?
That depends on the number of variables used. Since you did not specify the compiler, language, or even operating system, it is a difficult one to pin down! It all rests with the operating system, which is responsible for the memory management of applications. In short, there is no definite answer: at run time the program requests the operating system to allocate blocks of memory, and that allocation depends on how many variables there are, how big they are, and their scope and usage. For instance, take this simple C program, in a file called simpletest.c:
#include <stdio.h>
int main(int argc, char **argv){
int num = 42;
printf("The number is %d!\n", num);
return 0;
}
Supposing the environment was Unix/Linux based and was compiled like this:
gcc -o simpletest simpletest.c
If you issue objdump or nm on the binary image simpletest, you will see the sections of the executable, in this instance .text, .data and .bss. Make a note of the sizes of these sections, then add int var[100]; at file scope to the above code, recompile, and reissue objdump or nm: you will find that the .bss section has grown - why? Because we added an uninitialized global array of 100 ints (an initialized array would grow .data instead).
This simple exercise shows that the sections grow, and hence the binary gets bigger, and also that you cannot pre-determine how much memory will be allocated, as the runtime implementation varies from compiler to compiler and from operating system to operating system.
In short, the OS calls the shot on the memory management!
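To make the exercise above concrete, here is a sketch of the modified simpletest.c (the added variable names are just examples), with the comparison commands as comments:

#include <stdio.h>

int var[100];                   /* uninitialized global array -> grows .bss  */
int var_init[100] = { 1 };      /* initialized global array   -> grows .data */

int main(int argc, char **argv)
{
    int num = 42;               /* local variable: lives on the stack */
    printf("The number is %d!\n", num);
    return 0;
}

/* Rebuild and compare section sizes with the original binary, e.g.:
 *   gcc -o simpletest simpletest.c
 *   size simpletest          (or: objdump -h simpletest, or nm simpletest)
 * Each 100-element int array adds 400 bytes (assuming a 4-byte int). */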
You can get all this information by compiling your program
# gcc -o hello hello.c    # you might compile with -static for simplicity
and then running readelf:
# readelf -l hello
Elf file type is EXEC (Executable file)
Entry point 0x80480e0
There are 3 program headers, starting at offset 52
Program Headers:
  Type  Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD  0x000000 0x08048000 0x08048000 0x55dac 0x55dac R E 0x1000
  LOAD  0x055dc0 0x0809edc0 0x0809edc0 0x01df4 0x03240 RW  0x1000
  NOTE  0x000094 0x08048094 0x08048094 0x00020 0x00020 R   0x4
Section to Segment mapping:
Segment Sections...
00 .init .text .fini .rodata __libc_atexit __libc_subfreeres .note.ABI-tag
01 .data .eh_frame .got .bss
02 .note.ABI-tag
The output shows the overall structure of hello. The first program header corresponds to the process' code segment, which will be loaded from the file at offset 0x000000 into a memory region mapped into the process' address space at address 0x08048000. The code segment is 0x55dac bytes large and must be page-aligned (0x1000). This segment comprises the .text and .rodata ELF sections discussed earlier, plus additional sections generated during the linking procedure. As expected, it's flagged read-only and executable (R E), but not writable (W).
The second program header corresponds to the process' data segment. Loading this segment follows the same steps mentioned above. However, note that the segment size is 0x01df4 in the file and 0x03240 in memory. This is due to the .bss section, which is zero-initialized and therefore doesn't need to be present in the file. The data segment will also be page-aligned (0x1000) and contains the .data and .bss ELF sections. It is flagged readable and writable (RW). The third program header results from the linking procedure and is irrelevant for this discussion.
If you have a proc file system, you can check this, as long as you get "Hello World" to run long enough (hint: gdb), with the following command:
# cat /proc/`ps -C hello -o pid=`/maps
08048000-0809e000 r-xp 00000000 03:06 479202 .../hello
0809e000-080a1000 rw-p 00055000 03:06 479202 .../hello
080a1000-080a3000 rwxp 00000000 00:00 0
bffff000-c0000000 rwxp 00000000 00:00 0
The first mapped region is the process' code segment, the second and third build up the data segment (data + bss + heap), and the fourth, which has no correspondence in the ELF file, is the stack. Additional information about the running hello process can be obtained with GNU time, ps, and /proc/pid/stat.
example taken from:
http://www.lisha.ufsc.br/teaching/os/exercise/hello.html
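To see roughly where each region starts in your own process, a small sketch like the following prints the address of one representative object per region (the actual addresses vary per system, build, and run, especially with address space layout randomization):

#include <stdio.h>
#include <stdlib.h>

int bss_var;                     /* uninitialized global -> .bss                   */
int data_var = 1;                /* initialized global   -> .data                  */
const char ro_var[] = "hello";   /* read-only data       -> .rodata (code segment) */

int main(void)
{
    int stack_var = 0;                          /* local   -> stack */
    int *heap_var = malloc(sizeof *heap_var);   /* dynamic -> heap  */

    printf("text   (main)     : %p\n", (void *)main);   /* common, non-strict idiom */
    printf("rodata (ro_var)   : %p\n", (void *)ro_var);
    printf("data   (data_var) : %p\n", (void *)&data_var);
    printf("bss    (bss_var)  : %p\n", (void *)&bss_var);
    printf("heap   (heap_var) : %p\n", (void *)heap_var);
    printf("stack  (stack_var): %p\n", (void *)&stack_var);

    free(heap_var);
    return 0;
}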
How much memory is used depends on the global and local variables in the program.
