BSS, Stack, Heap, Data, Code/Text - Where does each of these start in memory?

Segments of memory - BSS, Stack, Heap, Data, Code/Text (Are there any more?).
Say I have 128 MB of RAM. Can someone tell me:
How much memory is allocated for each of these memory segments?
Where do they start? Please specify the address range or something like that for better clarity.
What factors influence which should start where?

That depends on how many variables the program uses, and since you did not specify the compiler, the language, or even the operating system, it is a difficult one to pin down. Ultimately it rests with the operating system, which is responsible for managing the memory of applications. In short, there is no definite answer to this question. Think of it this way: the compiler and linker decide how big each section of the executable is, and when the program is loaded, the operating system is asked to map memory for it. How much is needed depends on how many variables there are, how big they are, and their scope and usage. Take, for instance, this simple C program, in a file called simpletest.c:
#include <stdio.h>
int main(int argc, char **argv){
    int num = 42;
    printf("The number is %d!\n", num);
    return 0;
}
Supposing the environment was Unix/Linux based and was compiled like this:
gcc -o simpletest simpletest.c
If you run objdump or nm on the binary image simpletest, you will see the sections of the executable, in this instance 'bss' and 'text'. Make a note of the sizes of these sections, then add a global int var[100]; to the above code, recompile, and run objdump or nm again. You will find that a section has grown: bss if the array is left uninitialized, or data if you give it an initializer such as int var[100] = {1};. Why? Because we added a variable of an array type of int, with 100 elements.
This simple exercise shows that the sections grow, and hence the program's memory image gets bigger. It also shows that you cannot predetermine how much memory will be allocated, as the runtime implementation varies from compiler to compiler and from operating system to operating system.
In short, the OS calls the shots on memory management!

You can get all this information by compiling your program:
# gcc -o hello hello.c // you might compile with -static for simplicity
and then readelf:
# readelf -l hello
Elf file type is EXEC (Executable file)
Entry point 0x80480e0
There are 3 program headers, starting at offset 52
Program Headers:
  Type  Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD  0x000000 0x08048000 0x08048000 0x55dac 0x55dac R E 0x1000
  LOAD  0x055dc0 0x0809edc0 0x0809edc0 0x01df4 0x03240 RW  0x1000
  NOTE  0x000094 0x08048094 0x08048094 0x00020 0x00020 R   0x4
Section to Segment mapping:
  Segment Sections...
   00     .init .text .fini .rodata __libc_atexit __libc_subfreeres .note.ABI-tag
   01     .data .eh_frame .got .bss
   02     .note.ABI-tag
The output shows the overall structure of hello. The first program header corresponds to the process' code segment, which will be loaded from the file at offset 0x000000 into a memory region that will be mapped into the process' address space at address 0x08048000. The code segment will be 0x55dac bytes large and must be page-aligned (0x1000). This segment will comprise the .text and .rodata ELF sections, plus additional sections generated during the linking procedure. As expected, it's flagged readable (R) and executable (E), but not writable (W).
The second program header corresponds to the process' data segment. Loading this segment follows the same steps mentioned above. However, note that the segment size is 0x01df4 in the file and 0x03240 in memory. This is due to the .bss section, which is to be zeroed and therefore doesn't need to be present in the file. The data segment will also be page-aligned (0x1000) and will contain the .data and .bss ELF sections. It will be flagged readable and writable (RW). The third program header results from the linking procedure and is irrelevant for this discussion.
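The relationship between FileSiz, MemSiz and page alignment described above can be checked with a little arithmetic. What follows is only a minimal Python sketch, not the kernel's actual loader logic; the helper names (page_down, page_up, describe_load) are made up for illustration, and the numbers are copied from the readelf output above.

```python
PAGE = 0x1000  # alignment reported by readelf for both LOAD segments

def page_down(addr):
    """Round an address down to the start of its page."""
    return addr & ~(PAGE - 1)

def page_up(addr):
    """Round an address up to the next page boundary."""
    return (addr + PAGE - 1) & ~(PAGE - 1)

def describe_load(vaddr, filesz, memsz):
    """Summarise one LOAD program header (hypothetical helper)."""
    return {
        # bytes that exist only in memory and are zero-filled (mostly .bss)
        "zero_fill": memsz - filesz,
        # page-aligned range that has to be backed by the file
        "file_backed": (page_down(vaddr), page_up(vaddr + filesz)),
        # first address past the end of the segment in memory
        "mem_end": vaddr + memsz,
    }

# Values taken from the two LOAD lines of the readelf output
code = describe_load(0x08048000, 0x55dac, 0x55dac)
data = describe_load(0x0809edc0, 0x01df4, 0x03240)

print(hex(data["zero_fill"]))
print([hex(a) for a in code["file_backed"]])
print([hex(a) for a in data["file_backed"]])
```

With these inputs the zero-filled part comes to 0x144c bytes, and the two file-backed ranges work out to 08048000-0809e000 and 0809e000-080a1000, which are the first two regions in the /proc maps listing for the running process.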
If you have a proc file system, you can check this, as long as you get "Hello World" to run long enough (hint: gdb), with the following command:
# cat /proc/`ps -C hello -o pid=`/maps
08048000-0809e000 r-xp 00000000 03:06 479202 .../hello
0809e000-080a1000 rw-p 00055000 03:06 479202 .../hello
080a1000-080a3000 rwxp 00000000 00:00 0
bffff000-c0000000 rwxp 00000000 00:00 0
The first mapped region is the process' code segment, the second and third build up the data segment (data + bss + heap), and the fourth, which has no correspondence in the ELF file, is the stack. Additional information about the running hello process can be obtained with GNU time, ps, and /proc/pid/stat.
example taken from:
http://www.lisha.ufsc.br/teaching/os/exercise/hello.html
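To compare the regions more easily, the maps listing can be split into start, end, permissions and size. This is a small Python sketch that works on the four lines quoted above (including the elided .../hello path exactly as shown); parse_maps is a hypothetical helper name, and on a live Linux system you could feed it the contents of /proc/self/maps instead.

```python
MAPS = """\
08048000-0809e000 r-xp 00000000 03:06 479202 .../hello
0809e000-080a1000 rw-p 00055000 03:06 479202 .../hello
080a1000-080a3000 rwxp 00000000 00:00 0
bffff000-c0000000 rwxp 00000000 00:00 0
"""

def parse_maps(text):
    """Return a list of (start, end, perms, path) tuples, one per region."""
    regions = []
    for line in text.splitlines():
        fields = line.split(None, 5)  # the path, if present, is the sixth field
        if len(fields) < 5:
            continue  # not a maps line
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5] if len(fields) > 5 else ""
        regions.append((start, end, fields[1], path))
    return regions

regions = parse_maps(MAPS)
for start, end, perms, path in regions:
    label = path or "[anonymous]"
    print(f"{start:08x}-{end:08x} {perms} {end - start:#x} bytes {label}")
```

The anonymous regions (no path) are the ones that have no counterpart in the ELF file, such as the heap part of the data segment and the stack.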

The memory used depends on the program's global and local variables.

Related

Why are printed memory addresses in Rust a mix of both 40-bit and 48-bit addresses?

I'm trying to understand the way Rust deals with memory and I've a little program that prints some memory addresses:
fn main() {
    let a = &&&5;
    let x = 1;
    println!(" {:p}", &x);
    println!(" {:p} \n {:p} \n {:p} \n {:p}", &&&a, &&a, &a, a);
}
This prints the following (varies for different runs):
0x235d0ff61c
0x235d0ff710
0x235d0ff728
0x235d0ff610
0x7ff793f4c310
This is actually a mix of both 40-bit and 48-bit addresses. Why this mix? Also, can somebody please tell me why the addresses (2, 3, 4) do not fall in locations separated by 8 bytes (since std::mem::size_of_val(&a) gives 8)? I'm running Windows 10 on an AMD x64 processor (Phenom II X4) with 24 GB of RAM.
All the addresses do have the same size; Rust is just not printing leading zero digits.
The actual memory layout is an implementation detail of your OS, but the reason a prints a location in a different memory area than all the other variables is that a actually lives in your loaded binary, because it is a value that the compiler can already calculate. All the other variables are calculated at runtime and live on the stack.
See the compilation result on https://godbolt.org/z/kzSrDr:
.L__unnamed_4 contains the value 5; .L__unnamed_5, .L__unnamed_6 and .L__unnamed_1 are &5, &&5 and &&&5.
So .L__unnamed_1 is what sits at 0x7ff793f4c310 on your system, while 0x235d0ff??? is on your stack and is calculated in the red and blue areas of the code.
This is actually a mix of both 40-bit and 48-bit addresses. Why this mix?
It's not really a mix, Rust just doesn't display leading zeroes. It's really about where the OS maps the various components of the program (data, bss, heap and stack) in the address space.
Also, can somebody please tell me why the addresses (2, 3, 4) do not fall in locations separated by 8-bytes (since std::mem::size_of_val(&a) gives 8)?
Because println! is a macro which expands to a bunch of stuff in the stack frame, so your values are not defined next to one another in the frame in the final code (https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=5b812bf11e51461285f51f95dd79236b). Even if they were, there'd be no guarantee that the compiler wouldn't, for example, reuse now-dead memory to save on frame size.
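The leading-zero point can be made concrete by printing the same values at a fixed width. This is a short Python sketch (not Rust) using the addresses from the question; fixed_width is a hypothetical helper, and the 64-bit pointer width is an assumption based on the x64 system described.

```python
# Addresses copied from the question's output, in the order printed
addresses = [0x235d0ff61c, 0x235d0ff710, 0x235d0ff728, 0x235d0ff610, 0x7ff793f4c310]

def fixed_width(addr, bits=64):
    """Format addr as hex, zero-padded to the full pointer width."""
    digits = bits // 4  # four bits per hex digit
    return f"0x{addr:0{digits}x}"

for addr in addresses:
    # bit_length() counts the significant bits, ignoring leading zeros
    print(fixed_width(addr), f"({addr.bit_length()} significant bits)")
```

Padded this way, every address has the same number of digits; the stack addresses simply have more leading zeros than the address inside the loaded binary.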

Linux size command, why are bss and data sections not zero?

I came across the size command which gives the section size of the ELF file. While playing around with it, I created an output file for the simplest C++ program :
int main(){return 0;}
Clearly, I have not defined any initialized or uninitialized data, so why are my BSS and DATA sections 512 and 8 bytes in size?
I thought it might be because of int main(), so I tried creating an object file for the following C program:
void main(){}
I still don't get 0 for BSS and DATA sections.
Is it because a certain minimum sized memory is allocated to those section?
EDIT- I thought it might be because of linked libraries but my object is dynamically linked so probably it shouldn't be the issue
int main(){return 0;} puts data in .text only.
$ echo 'int main(){return 0;}' | gcc -xc - -c -o main.o && size main.o
   text    data     bss     dec     hex filename
     67       0       0      67      43 main.o
You're probably sizing a fully linked executable.
$ gcc main.o -o main && size main
   text    data     bss     dec     hex filename
   1415     544       8    1967     7af main
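To see how much the link step added, the two size outputs above can be compared column by column. This is a rough Python sketch; parse_size is a hypothetical helper and the sample text is copied from the two runs shown, so the figures will differ on another toolchain.

```python
SIZE_OUTPUT = """\
text data bss dec hex filename
67 0 0 67 43 main.o
1415 544 8 1967 7af main
"""

def parse_size(text):
    """Parse the default `size` output into {filename: {section: bytes}}."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] == "text":
            continue  # skip blank lines and the header row
        text_sz, data_sz, bss_sz, dec = (int(f) for f in fields[:4])
        # the dec and hex columns both hold text + data + bss
        if not (dec == int(fields[4], 16) == text_sz + data_sz + bss_sz):
            raise ValueError(f"inconsistent totals in: {line!r}")
        rows[fields[5]] = {"text": text_sz, "data": data_sz, "bss": bss_sz}
    return rows

sizes = parse_size(SIZE_OUTPUT)
# Bytes contributed by everything that was linked in around main()
added = {k: sizes["main"][k] - sizes["main.o"][k] for k in ("text", "data", "bss")}
print(added)
```

All of the data and bss bytes in the linked executable come from what the linker added, since main.o itself contributes none.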
In fact, if you are compiling with the libc attached to the binary, there are functions that are added before (and after) the main() function. They are there mostly to load dynamic libraries (even if you do not need them in your case) and to unload them properly once main() ends.
These functions have global variables that require storage: uninitialized (zero-initialized) global variables in the BSS segment and initialized global variables in the DATA segment.
This is why you will always see BSS and DATA in all binaries compiled with the libc. If you want to get rid of this, you should write your own assembly program, like this (asm.s):
.globl _start
_start:
    mov %eax, %ebx
And, then compile it without the libc:
$> gcc -nostdlib -o asm asm.s
This should remove the BSS and DATA footprint from the resulting ELF binary.

Binary Size On chip

I have my code compiled for a certain ARM processor and have the binary. Now I want to know the exact size in bytes (address range) it occupies in my flash memory.
This is because I have a recovery mechanism in the last 1 kB of flash and don't want it to be overwritten, as it needs to stay there permanently.
readelf on the binary gives me the start addresses (mapped to the code and data segments), but I couldn't really map this to what I want.
Pre-initialize the flash memory with the value 'ab', then load the binary. Read the flash memory until you encounter more than two consecutive 'ab' values. This should give the address range in flash memory occupied by the binary. (This assumes that your binary does not contain more than two 'ab' values in a row.)
If your compiler/linker is based on the GNU toolchain (gcc/ld):
1/ At Compile Time
In your linker script, adjust the section size to subtract 1K.
The linker will then report an error if your code does not fit into your flash area.
Example :
MEMORY
{
    FLASH (rx) : ORIGIN = 0x08001000, LENGTH = 128K-1K
    RAM (xrw)  : ORIGIN = 0x20000000, LENGTH = 16K
}
2/ At Run Time
You can set a symbol in your linker script to mark the end of your program (text segment). You can then use this symbol to perform a test at run time.
Example :
.text :
{
    . = ALIGN(4);
    _etext = .; /* define a global symbol at end of code */
} >FLASH
3/ Manually
After compiling, use objcopy to convert your ELF file into the binary image that goes into your flash. Check your datasheet to get your flash size, and manually check that the file size fits into your flash minus 1K.
Example :
objcopy -O binary myfile.elf myfile.bin
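The manual check in step 3 is easy to script. This is a minimal Python sketch under stated assumptions: the 128K flash size and the 1K reserved area come from the example linker script above, the function names are made up for illustration, and it assumes the raw image is written starting at the beginning of the usable flash region.

```python
import os

FLASH_LENGTH = 128 * 1024  # total flash size assumed from the example linker script
RESERVED = 1 * 1024        # recovery area kept at the end of flash

def usable_flash(flash_length=FLASH_LENGTH, reserved=RESERVED):
    """Number of bytes the application image may occupy."""
    return flash_length - reserved

def fits_in_flash(image_size, flash_length=FLASH_LENGTH, reserved=RESERVED):
    """True if an image of image_size bytes leaves the reserved area untouched."""
    return 0 <= image_size <= usable_flash(flash_length, reserved)

def check_image(path):
    """Check a raw image file such as the one produced by objcopy -O binary."""
    return fits_in_flash(os.path.getsize(path))

print(usable_flash())           # bytes available to the application
print(fits_in_flash(100_000))   # an image comfortably below the limit
```

For the file from the example above you would call check_image("myfile.bin"); adjust FLASH_LENGTH and RESERVED to the values in your datasheet.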

Using Non-zero indexed Memory in Quartus (Verilog)

I am writing a memory system for a basic 16-bit educational CPU and am running into issues with Quartus synthesis of my module. Specifically, I have broken the address space down into a few different parts, and one of them (which is a ROM) is not synthesizing properly. (NOTE: I am synthesizing for a DE2-115 Altera board, Quartus II 12.1, SystemVerilog code)
So, in order to make the memory-mapped VRAM (a memory module that is dual-ported to allow VGA output while the CPU writes colors) more usable, I included a small ROM in the address space that holds code (assembled functions) that allows you to write characters into memory, i.e. a print_char function. This ROM is located in memory at a specific address, so in order to simplify the SV, I implemented the ROM (and all the memory modules) like so:
module printROM
  (input rd_cond_code_t re_L,
   input wr_cond_code_t we_L,
   input logic [15:0] memAddr,
   input logic clock,
   input logic reset_L,
   inout wire [15:0] memBus,
   output mem_status_t memStatus);
   reg [15:0] rom[`VROM_ADDR_LO:`VROM_ADDR_HI];
   reg [15:0] dataOut;
   /* The ROM */
   always @(posedge clock) begin
      if (re_L == MEM_RD) begin
         dataOut <= rom[memAddr];
      end
   end
   /* Load the rom data */
   initial $readmemh("printROM.hex", rom);
   /* Drive the line */
   tridrive #(.WIDTH(16)) romOut(.data(dataOut),
                                 .bus(memBus),
                                 .en_L(re_L));
   /* Manage asserting completion of a read or a write */
   always_ff @(posedge clock, negedge reset_L) begin
      if (~reset_L) begin
         memStatus <= MEM_NOT_DONE;
      end
      else begin
         if ((we_L == MEM_WR) | (re_L == MEM_RD)) begin
            memStatus <= MEM_DONE;
         end
         else begin
            memStatus <= MEM_NOT_DONE;
         end
      end
   end
endmodule // printROM
Where VROM_ADDR_LO and VROM_ADDR_HI are two macros defining the addresses of this ROM, namely 'hEB00 and 'hEFFF. Thus, when an address in that range is read/written to by the CPU, this module is able to properly index into the rom memory. This technique works fine in VCS simulation.
However, when I go to synthesize this module, Quartus correctly implies a ROM but has issues initializing it. I get this error:
Error (113012): Address at line 11 exceeds the specified depth (1280) in the Memory Initialization File "psx18240.ram0_printROM_38938673.hdl.mif"
It looks like Quartus is converting the .hex file I give as the ROM code (ie printROM.hex) and using the CPU visible addresses (ie starting at 'hEB00) for the generated .mif file even though the size of rom is obviously too small. Does Quartus not support this syntax or am I doing something wrong?
It would appear that Quartus does not like what you're doing. If you only want a ROM with 1280 entries, it would probably be better to create one indexed 0:1279 and address it using that range (this is what your logic would have to synthesize into anyway).
reg [15:0] rom[0:`VROM_ADDR_HI-`VROM_ADDR_LO];
wire [15:0] romAddr; // index relative to the start of the ROM
assign romAddr = (memAddr - `VROM_ADDR_LO);
always @(posedge clock) begin
   if (re_L == MEM_RD) begin
      dataOut <= rom[romAddr];
   end
end
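The address arithmetic behind this suggestion can be checked outside the HDL. This is a small Python model rather than SystemVerilog; the two constants are the values the question gives for the VROM_ADDR_LO and VROM_ADDR_HI macros, and rom_index is a hypothetical name for what the romAddr assignment computes.

```python
VROM_ADDR_LO = 0xEB00  # first CPU-visible address of the ROM (from the question)
VROM_ADDR_HI = 0xEFFF  # last CPU-visible address of the ROM (from the question)

# Number of 16-bit words the ROM actually holds
ROM_DEPTH = VROM_ADDR_HI - VROM_ADDR_LO + 1

def rom_index(mem_addr):
    """Translate a CPU-visible address into a zero-based ROM index."""
    if not VROM_ADDR_LO <= mem_addr <= VROM_ADDR_HI:
        raise ValueError(f"address {mem_addr:#06x} is outside the ROM window")
    return mem_addr - VROM_ADDR_LO

print(ROM_DEPTH)
print(rom_index(VROM_ADDR_LO), rom_index(VROM_ADDR_HI))
```

The depth works out to 1280 words, the same figure that appears in the Quartus error message, so a zero-based array of that size covers the whole window.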
Go for the ROM megafunctions in the MegaWizard Plug-In Manager in Quartus. It is a GUI-based IP core generator. There you can parameterize all the options, including the hex file, but you have to follow the Intel hex file format. This is also available under File->New.
For HDL-based instantiation:
http://quartushelp.altera.com/12.1/mergedProjects/hdl/mega/mega_file_lpm_rom.htm
User guide:
http://www.altera.com/literature/ug/ug_ram_rom.pdf

Verify that a '*.map' file matches a Delphi application

For my program delphi-code-coverage-wizard, I need to verify that a (detailed) mapping file .map matches a Delphi application .exe
Of course, this verification should be implemented in Delphi.
Is there a way to check it? Maybe by verifying some information from the EXE?
I think a quite simple heuristic would be to check that the various sections in the PE file start and finish at the same place:
For example, here's the top of a map file.
 Start         Length     Name     Class
 0001:00401000 000A4938H  .text    CODE
 0002:004A6000 00000C9CH  .itext   ICODE
 0003:004A7000 000022B8H  .data    DATA
 0004:004AA000 000052ACH  .bss     BSS
 0005:00000000 0000003CH  .tls     TLS
I also looked at what dumpbin /headers had to say about these sections:
SECTION HEADER #1
   .text name
   A4938 virtual size
    1000 virtual address (00401000 to 004A5937)
   A4A00 size of raw data
     400 file pointer to raw data (00000400 to 000A4DFF)
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
60000020 flags
         Code
         Execute Read
SECTION HEADER #2
  .itext name
     C9C virtual size
   A6000 virtual address (004A6000 to 004A6C9B)
     E00 size of raw data
   A4E00 file pointer to raw data (000A4E00 to 000A5BFF)
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
60000020 flags
         Code
         Execute Read
...truncated
Look at the .text section. According to dumpbin it starts at 00401000 and finishes at 004A5937 which is a length of 000A4938, exactly as in the .map file. Naturally you'd read the PE file directly rather than running dumpbin, but this illustrates the point.
I'd expect a vanishingly small number of false positives with this approach.
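The comparison itself is simple enough to sketch. The following is Python rather than Delphi, purely to show the logic; the function names are hypothetical, the image base of 0x00400000 is an assumption inferred from dumpbin showing virtual address 1000 as 00401000, and the section values are typed in from the dumps above rather than read from a PE file.

```python
import re

# Matches segment lines such as "0001:00401000 000A4938H .text CODE"
MAP_LINE = re.compile(r"^\s*([0-9A-F]{4}):([0-9A-F]{8})\s+([0-9A-F]+)H\s+(\S+)\s+(\S+)")

def parse_map_segment(line):
    """Return name, start and length for one segment line, or None."""
    m = MAP_LINE.match(line)
    if m is None:
        return None
    _seg, start, length, name, _cls = m.groups()
    return {"name": name, "start": int(start, 16), "length": int(length, 16)}

def matches_pe_section(segment, image_base, virtual_address, virtual_size):
    """True if the map segment and the PE section header describe the same range."""
    return (segment["start"] == image_base + virtual_address
            and segment["length"] == virtual_size)

IMAGE_BASE = 0x00400000  # assumed; a real check would read it from the PE header

text = parse_map_segment("0001:00401000 000A4938H .text CODE")
itext = parse_map_segment("0002:004A6000 00000C9CH .itext ICODE")

print(matches_pe_section(text, IMAGE_BASE, 0x1000, 0xA4938))
print(matches_pe_section(itext, IMAGE_BASE, 0xA6000, 0xC9C))
```

Note that the .tls line in the map above starts at 00000000, so a real check would probably need to treat that segment separately.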
