In memory, how does the CPU know where a variable ends?
Let's say you have a string:
x = 'Hi';
And it's stored to memory.
When the CPU is executing the program, how does it know when the data at that memory address ends?
In binary, x would be stored as:
Byte 1: 01001000 ('H')
Byte 2: 01101001 ('i')
What tells the CPU that the data ends there?
I am aware of null terminators, but in assembly and lower (i.e. machine code), what declares the end of that variable?
I'm working on a simple AVR programmer for a university project, and I'm stuck on understanding how to map memory from a hex file to the actual flash memory.
For instance, Intel HEX gives us the start address of a data block, the number of bytes in it, and the data itself. The trouble is that AVR MCUs, in particular the ATmega16, often have one address for two bytes: high and low.
At first, I wrote a straightforward function that just reads all the data from the hex file and writes it sequentially, increasing the address by one for every two bytes passed. To my surprise, it works on simple blinky code. However, I am not sure whether this approach would work if someone needs a complex memory structure.
So the questions are:
Will this solution work on complex memory structures?
If not, how can I map an Intel HEX address to an actual flash address? The problem is that there are no high and low bytes in the Intel HEX format, only address = byte.
Intel HEX uses byte addresses. The AVR program counter refers to 16-bit word addresses. If you mean the word address to be the "actual address", then just halve the number that represents the start address of the line in the hex file.
What do you mean by "complex memory structures"? Memory locations need unique addresses, no matter how that address space is broken up. I am not familiar with program memory spaces that don't start at 0 and continue linearly, but even if there were such a scheme, a line in an Intel HEX file can specify the contents of any contiguous memory section starting at any address.
Edit:
Each line of an Intel HEX file can only contain up to 255 bytes. Typically, the data is split into 16- or 32-byte chunks. Each line contains the start address of the chunk (which is added to the base address, if one is in use). A chunk doesn't have to start where the previous chunk ends, and chunks can be out of order, too.
As for the complex memory structures you describe, most programs have them already. There is usually a vector table at the start, followed by a gap, followed by the crt and main program. Data to initialize global variables follows that. If there is a bootloader, it is placed in a special section at the end of memory.
I am fairly new to programming and am starting to learn the ins and outs of memory allocation. One question that recently occurred to me, and that I haven't yet found a clear answer to, is: do memory addresses themselves take up memory? For example, in a 32-bit system, the way I understand it is that each address is 4 bytes, and each address typically refers to an empty 'bucket' in memory that is capable of storing 1 byte of data. Does this mean that for each memory location in a 32-bit system, we are actually using 5 bytes of memory (4 for the address and 1 for the empty bucket)? I'm sure I am missing something here, but any clarification would be much appreciated. Thanks!
To reference a memory address you need to express that memory address somehow, and on a 32-bit system a memory reference does indeed take 4 bytes. So for any memory address your program actually references, somewhere else in memory there are 4 bytes holding that address.
But this does not cascade into a 5x multiplication, because a program does not need to reference every byte of memory. It only needs the address where something in memory starts, and then it can reach every byte of that 'something' using arithmetic.
To give an example: you have the string Justin Foss in memory. Say it is at address 0x10000000, and this address is stored in a variable. So the actual variable value is 0x10000000, pointing to the string Justin Foss. But at 0x10000000 you only have one byte, the J. At 0x10000001 there is the u, at 0x10000002 the s, and so on. Your application does not need a variable for each character; it only needs one variable (4 bytes) pointing to the beginning of the string. The same goes for objects (fields): you only store the address where the object starts, and the compiler knows how to do the arithmetic to find the field it needs by adding the necessary offset. In general, memory objects are quite large, and a few 4-byte variables in the program can reference quite a bit of memory.
(At the risk of oversimplification:) Memory is sequential. Address 123 refers to the one-hundred-twenty-third byte after the first (zeroth) byte in the system. There is no memory devoted to indicating that byte 123 is number 123. The byte that comes after it is 124.
I am trying to find some useful information on the malloc function.
When I call this function, it allocates memory dynamically and returns a pointer (i.e. the address) to the beginning of the allocated block.
The questions:
How is the returned address used to read/write the allocated memory block (using indirect addressing registers, or how)?
If it is not possible to allocate a block of memory, it returns NULL. What is NULL in terms of hardware?
In order to allocate memory on the heap, we need to know which parts of memory are occupied. Where is this information (about the occupied memory) stored (if, for example, we use a small RISC microcontroller)?
Q3 The usual way heaps are managed is through a linked list. In the simplest case, the malloc function retains a pointer to the first free-space block in the heap, and each free-space block has a header that points to the next free-space block in the heap. So the heap is in effect self-defining in terms of knowing what is not occupied (and, by inference, what is therefore occupied); this minimizes the amount of overhead RAM needed to manage the heap.
When new space is needed via a malloc call, a large enough free-space block is found by traversing the linked list. That free-space block is given to the malloc caller (with a small hidden header), and if the block is larger than the request, a smaller free-space block made from the residual space is inserted back into the linked list.
When a heap block is released by the application, its block is just formatted with the linked-list header, and added to the linked list, usually with some extra logic to combine consecutive free-space blocks into one larger free-space block.
Debugging versions of malloc usually do more, including retaining linked-lists of the allocated areas too, "guard zones" around the allocated heap areas to help detect memory overflows, etc. These take up extra heap space (making the heap effectively smaller in terms of usable space for the applications), but are extremely helpful when debugging.
Q2 A NULL pointer is effectively just a zero, which, if used, attempts to access memory starting at location 0 of RAM, which is almost always reserved memory of the OS. This is the cause of a significant number of memory-violation aborts, all caused by programmers' lack of error checking for NULL returns from functions that allocate memory.
Because accessing memory location 0 from a non-OS application is almost never what is wanted, most hardware aborts any attempt by non-OS software to access location 0. Even with page mapping, such that the application's memory space (including location 0) is never mapped to the real RAM location 0, since NULL is always zero, most CPUs will still abort attempts to access location 0 on the assumption that this is an access through a pointer that contains NULL.
Given your RISC processor, you will need to read its documentation to see how it handles attempts to access memory location 0.
Q1 There are many high-level language ways to use allocated memory, primarily through pointers, strings, and arrays.
In terms of assembly language and the hardware itself, the allocated heap block address just gets put into a register that is being used for memory indirection. You will need to see how that is handled in the RISC processor. However if you use C or C++ or such higher level language, then you don't need to worry about registers; the compiler handles all that.
Since you are using malloc, can we assume you are using C?
If so, you assign the result to a pointer variable, and then you can access the memory by dereferencing through that variable. You don't really need to know how this is implemented in assembly; that depends on the CPU you are using. malloc returns NULL if it fails, and since NULL is usually defined as 0, you can test for it. You don't need to care how malloc tracks the free memory; if you really need this information, you should look at the source of glibc's malloc, available on the net.
char *c = malloc(10);   /* allocate 10 bytes */
if (c == NULL) {
    /* handle error case */
} else {
    *c = 'a';           /* write 'a' to the first byte of the block */
}
My question is: are the name of the variable and the data itself both stored on the stack?
I would like to know how the name of the variable is linked to the memory address on the stack (the data), and what performs that linking.
Also, how does anything know how many bytes the variable's type occupies, and how does it decide to read exactly that number of bytes from the stack?
Does all data stored on the stack occupy the same space, regardless of its type?
And the same questions for the heap?
Generally, I believe the following to be true in most practical implementations:
No, the name and actual data are not both stored on the stack.
The compiler keeps track of where the variable is on the stack, and by the time the compiler is done, all references to the variable (i.e. to its name) have been substituted by a fixed offset from the stack pointer addressing the memory area where the data is stored.
No, they do not occupy the same space. A 4-byte variable takes up 4 bytes; a 1,000,000-byte variable takes up 1,000,000 bytes (though putting that much on the stack is usually not recommended).
The heap is a bit different... Maybe this page can answer your question a bit more: http://www.learncpp.com/cpp-tutorial/79-the-stack-and-the-heap
I've been getting into some assembly lately, and it's fun, as it challenges everything I have learned. I was wondering if I could ask a few questions.
When running an executable, does the entire executable get loaded into memory?
From a bit of fiddling, I've found that constants aren't really constants? Is it just a compiler thing?
const int i = 5;
_asm { mov i, 0 } // i is now 0 and compiles fine
So are all variables that are assigned a constant value embedded into the file as well?
Meaning:
int a = 1;
const int b = 2;
void something()
{
const int c = 3;
int d = 4;
}
Will I find all of these variables embedded in the file (in a hex editor or something)?
If the executable is loaded into memory, then "constants" are technically using memory? I've read people on the net saying that constants don't use memory; is this true?
Your executable's text (i.e. code) and data segments get mapped into the process's virtual address space when the executable starts up, but the bytes might not actually be copied from the disk until those memory locations are accessed. See http://en.wikipedia.org/wiki/Demand_paging
C-language constants actually exist in memory, because you have to be able to take their address. (That is, &i.) Constants are usually placed in the .rdata segment of your executable image.
A constant is going to take up memory somewhere: if you have the constant number 42 in your program, there must be somewhere in memory where the 42 is stored, even if that means it is stored as the operand of an immediate-mode instruction.
The OS loads the code and data segments in order to prepare them for execution.
If the executable has a resource segment, the application loads parts of it at demand.
It's true that const variables take memory space, but compilers are free to optimize for memory usage and code size and embed their values directly in the code (in case they don't detect any address references to those variables).
const char * strings (i.e. C string literals) are usually interned by compilers to save memory.