Why is allocated memory different from the size of the string? - memory

Please consider the following code:
char **str;
str = malloc(sizeof(char *) * 3); // Allocates enough memory for 3 char pointers
str[0] = malloc(sizeof(char) * 24);
str[1] = malloc(sizeof(char) * 25);
str[2] = malloc(sizeof(char) * 25);
When I use printf to print the memory addresses stored in each pointer:
printf("str[0] = '%p'\nstr[1] = '%p'\nstr[2] = '%p'\n", str[0], str[1], str[2]);
I get this output:
str[0] = '0x1254030'
str[1] = '0x1254050'
str[2] = '0x1254080'
I expected the number corresponding to the second address to be the sum of the number corresponding to the first one and 24, which is the size of the string str[0] in bytes (since a char has a size of 1 byte). That is, I expected the second address to be 0x1254048, considering that these numbers are expressed in base 16 (0123456789abcdef).
It seems to me that I spotted a pattern: starting at 24 characters in a string, for every 16 more characters contained in it, the memory used is 16 bytes larger. For example, a 45-character string uses 64 bytes of memory, a 77-character string uses 80 bytes, and a 150-character string uses 160 bytes.
I would like to understand why the memory allocated isn't equal to the size of the string. Why does it follow this pattern?

There are at least two reasons malloc() may return memory in the manner you've noted.
Efficiency and performance. By returning blocks of memory in only a small set of actual sizes, a request for memory is much more likely to find an already-existing block of memory that can be used, or wind up producing a block of memory that can be easily reused in the future. This makes the request return faster and has the side effect of limiting memory fragmentation.
Implications arising from memory alignment requirements. Section 7.22.3 (Memory management functions) of the C Standard states: "The order and contiguity of storage allocated by successive calls to the aligned_alloc, calloc, malloc, and realloc functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement ..." Note the italicized part. Since the malloc() implementation has no knowledge of what the memory is to be used for, the memory returned has to be suitably aligned for any possible use. This usually means 8- or 16-byte alignment, depending on the platform.
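Here is a minimal sketch (my own, not from the question) that prints the spacing between successive allocations so the size classes and alignment can be observed directly. The exact gaps are an implementation detail and will differ between allocators and platforms:
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

int main(void)
{
    for (size_t n = 8; n <= 64; n += 8) {
        char *a = malloc(n);
        char *b = malloc(n);
        // Assumes b lands above a, which is common but not guaranteed.
        printf("requested %2zu bytes: a=%p b=%p gap=%lu\n",
               n, (void *)a, (void *)b,
               (unsigned long)((uintptr_t)b - (uintptr_t)a));
        free(a);
        free(b);
    }
    return 0;
}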

Three reasons:
(1) The string "ABC" contains 4 characters, because every string in C has to have a terminating '\0'.
(2) Many processors have memory address alignment issues that make it most efficient when any block allocated by malloc() starts on an address that is a multiple of 4, or 8, or whatever the "natural" memory size is.
(3) The malloc() function itself requires some memory to store information about what has been allocated where, so that free() knows what to do.
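As a quick check of reason (1) above, a tiny sketch showing that the array backing a string literal is one byte longer than its length:
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char s[] = "ABC";
    // strlen counts the characters; sizeof includes the terminating '\0'.
    printf("strlen = %zu, sizeof = %zu\n", strlen(s), sizeof s); // prints 3, 4
    return 0;
}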

Related

Why are printed memory addresses in Rust a mix of both 40-bit and 48-bit addresses?

I'm trying to understand the way Rust deals with memory and I have a little program that prints some memory addresses:
fn main() {
    let a = &&&5;
    let x = 1;
    println!(" {:p}", &x);
    println!(" {:p} \n {:p} \n {:p} \n {:p}", &&&a, &&a, &a, a);
}
This prints the following (varies for different runs):
0x235d0ff61c
0x235d0ff710
0x235d0ff728
0x235d0ff610
0x7ff793f4c310
This is actually a mix of both 40-bit and 48-bit addresses. Why this mix? Also, can somebody please tell me why the addresses (2, 3, 4) do not fall in locations separated by 8 bytes (since std::mem::size_of_val(&a) gives 8)? I'm running Windows 10 on an AMD x64 processor (Phenom II X4) with 24GB RAM.
All the addresses do have the same size; Rust is just not printing leading zero digits.
The actual memory layout is an implementation detail of your OS, but the reason that a prints a location in a different memory area than all the other variables is that a actually lives in your loaded binary, because it is a value that can already be computed by the compiler. All the other values are computed at runtime and live on the stack.
See the compilation result on https://godbolt.org/z/kzSrDr:
.L__unnamed_4 contains the value 5; .L__unnamed_5, .L__unnamed_6 and .L__unnamed_1 are &5, &&5 and &&&5.
So .L__unnamed_1 is what on your system sits at 0x7ff793f4c310, while the 0x235d0ff??? values are on your stack, computed in the red and blue areas of the code.
This is actually a mix of both 40-bit and 48-bit addresses. Why this mix?
It's not really a mix, Rust just doesn't display leading zeroes. It's really about where the OS maps the various components of the program (data, bss, heap and stack) in the address space.
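The same effect is easy to reproduce in C. A minimal sketch (my own illustration; the printed values vary per run and per OS) showing that static data, the stack and the heap come from widely separated regions of the address space, which then print with different apparent widths once leading zeroes are dropped:
#include <stdio.h>
#include <stdlib.h>

static int in_data = 42;                      /* data segment */

int main(void)
{
    int on_stack = 1;                         /* stack */
    int *on_heap = malloc(sizeof *on_heap);   /* heap */

    printf("data:  %p\n", (void *)&in_data);
    printf("stack: %p\n", (void *)&on_stack);
    printf("heap:  %p\n", (void *)on_heap);

    free(on_heap);
    return 0;
}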
Also, can somebody please tell me why the addresses (2, 3, 4) do not fall in locations separated by 8-bytes (since std::mem::size_of_val(&a) gives 8)?
Because println! is a macro which expands to a bunch of stuff in the stack frame, your values are not defined next to one another in the final code (https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=5b812bf11e51461285f51f95dd79236b). Though even if they were, there'd be no guarantee the compiler wouldn't e.g. reuse now-dead memory to save on frame size.

Length and size of strings in Elixir / Erlang needs explanation

Can someone explain why s is a string with 4096 chars
iex(9)> s = String.duplicate("x", 4096)
... lots of "x"
iex(10)> String.length(s)
4096
but its memory size is only 6 words?
iex(11)> :erts_debug.size(s)
6 # WHAT?!
And why does s2, a much shorter string than s,
iex(13)> s2 = "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
"1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
iex(14)> String.length(s2)
50
have a size 3 words larger than s?
iex(15)> :erts_debug.size(s2)
9 # WHAT!?
And why do the sizes of these strings not match their lengths?
Thanks
The first clue to why it shows these values can be found in this question. Quoting the size/1 docs:
%% size(Term)
%% Returns the size of Term in actual heap words. Shared subterms are
%% counted once. Example: If A = [a,b], B =[A,A] then size(B) returns 8,
%% while flat_size(B) returns 12.
Second clue can be found in Erlang documentation about bitstrings implementation.
So in the first case the string is too big to fit on the process heap alone, so it uses a refc binary, which is stored outside the heap; on the heap there is only a small structure pointing to the binary.
In the second case the string is shorter than 64 bytes, so it uses a heap binary, which is just an array of bytes stored directly on the heap. That gives us 8 bytes per word (64-bit) * 9 words = 72 bytes, and when we check the documentation about the exact memory overhead in the VM, we see that Erlang uses 3..6 words per binary plus the data, where the data can be shared.
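A loose C analogy of the two representations (my own illustration only, not the actual BEAM layout): a small binary's bytes live inline in the term, while a large binary lives elsewhere and the term stores only a reference, so measuring the term itself stays small no matter how big the data is.
#include <stdio.h>

/* Hypothetical stand-ins for the two binary representations. */
struct heap_bin { size_t len; unsigned char data[64]; }; /* bytes inline   */
struct refc_bin { size_t len; unsigned char *data; };    /* reference only */

int main(void)
{
    printf("inline term:     %zu bytes\n", sizeof(struct heap_bin));
    printf("referenced term: %zu bytes\n", sizeof(struct refc_bin));
    return 0;
}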

Direct Mapped Cache of Blocks Example

So I have this question in my homework assignment that I have been struggling a bit with. I looked over my lecture content/notes and have been able to use them to answer the questions; however, I am not 100% sure that I did everything correctly. There are two parts (C and D) that I was not able to figure out even after consulting my notes and online sources. I am not looking for a solution for those two parts by any means, but it would be greatly appreciated if I could get at least a nudge in the right direction on how to go about solving them.
I know this is a rather large question; however, I hope someone could check my answers and tell me if my work and way of looking at this problem are correct. As always, thank you for any help :)
Alright, so now that we have the formalities out of the way,
--------------------------Here is the Question:--------------------------
Suppose a small direct-mapped cache with 32 blocks is constructed. Each cache block stores eight 32-bit words. The main memory, which is byte addressable, is 16,384 bytes in size. 32-bit words are stored word aligned in memory, i.e., at an address that is divisible by 4.
(a) How many 32-bit words can the memory store (in decimal)?
(b) How many address bits would be required to address each byte of memory?
(c) What is the range of memory addresses, in hex? That is, what are the addresses of the first and last bytes of
memory? I'll give you a hint: memory addresses are numbered starting at 0.
(d) What would be the address of the last word in memory?
(e) Using the cache mapping scheme discussed in the Chapter 5 lecture notes, how many and which address bits
would be used to form the block offset?
(f) How many and which memory address bits would be used to form the cache index?
(g) How many and which address bits would be used to form the tag field for each cache block?
(h) To which cache block (in decimal) would memory address 0x2A5C map?
(i) What would be the block offset (in decimal) for 0x2A5C?
(j) How many other main memory words would map to the same block as 0x2A5C?
(k) When the word at 0x2A5C is moved into a cache block, what are the memory addresses (in hex) of the other
words which will also be moved into this block? Express your answer as a range, e.g., [0x0000, 0x0200].
(l) The first word of a main memory block that is mapped to a cache block will always be at an address that is
divisible by __ (in decimal)?
(m) Including the V and tag bits of each cache block, what would be the total size of the cache (in bytes)?
(n) What would be the size allocated for the data bits (in bytes)?
----------------------My answers and work-----------------------------------
a) memory = 16384 bytes. 16384 bytes into bits = 131072 bits. 131072/32 = 4096 32-bit words
b) 2^14 (main memory) * 2^2 (4 bytes/word) = 2^16. take log(base2)(2^16) = 16 bits
c) couldn't figure this part out (would appreciate some input (NOT A SOLUTION) on how I can go about looking at this problem)
d) could not figure this part out either :(
e) 8 words in each cache line. 8 * 4 (2^2 bytes/word) = 32 bytes in each cache line. log(base2)(2^5) = 5 bits used for block offset.
f) # of blocks = 2^5 = 32 blocks. log(base2)(2^5) = 5 bits for cache index
g) tag = 16 - 5 - 5 - 2(word alignment) = 4 bits
h) 0x2A5C
0010   10100   10111   00
tag    index   offset  word-aligned bits
maps to cache block index = 10100 = 0x14
i) maps to block offset = 10111 = 0x17
j) 4 tag bits, 5 block offset = 2^9 other main memory words
k) it is a permutation of the block offsets, so it maps the memory addresses with the same tag and cache index bits, and block offsets of 0x00 0x01 0x02 0x04 0x08 0x10 0x11 0x12 0x14 0x18 0x1C 0x1E 0x1F
l) divisible by 4
m) 2(V+tag+data) = 2(1+4+2^3*2^5) = 522 bits = 65.25 bytes
n) data bits = 2^5 blocks * 2^3 words per block = 256 bits = 32 bytes
Part C:
If a memory has M bytes, and the memory is byte addressable, then the memory addresses range from 0 to M - 1.
For your question, this means that memory addresses range from 0 to 16383, or in hex 0x0 to 0x3FFF.
Part D:
Words are 4 bytes long. So given your answer to C, the last word is at:
(0x3FFF - 3) -> 0x3FFC.
You can see that this is correct because the lowest 2 bits of the address are 0, which must be true of any 4 byte aligned address.
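A small sketch of that arithmetic (parts (c) and (d) only), under the sizes given in the question:
#include <stdio.h>

int main(void)
{
    unsigned M = 16384;                 /* memory size in bytes */
    unsigned last_byte = M - 1;         /* part (c): 0x3FFF */
    unsigned last_word = last_byte - 3; /* part (d): 0x3FFC, 4-byte aligned */

    printf("addresses: [0x0, 0x%X], last word at 0x%X\n", last_byte, last_word);
    return 0;
}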

Buffer Overflow Not Overflowing Return Address

Below is the C code
#include <stdio.h>
#include <unistd.h>  /* for read() */

void read_input()
{
    char input[512];
    int c = 0;
    while (read(0, input + c++, 1) == 1);
}

int main()
{
    read_input();
    printf("Done !\n");
    return 0;
}
In the above code, there should be a buffer overflow of the array 'input'. The file we give it has over 600 characters in it, all 2's (e.g. 2222222...; btw, the ASCII code of '2' is 0x32). However, when executing the code with the file, no segmentation fault is thrown, meaning the return address was unchanged. Below is a screenshot of the memory of the input array in gdb; highlighted is the saved ebp (frame pointer), and it is clear that it was skipped when writing:
[screenshot omitted]
The writing of the characters continues past that point, which may be why no segmentation fault occurs. Please explain why this is happening, and how to make the overflow reach the return address.
This is tricky! Both input[] and c are on the stack, with c following the 512 bytes of input[]. Before you read the 513th byte, c = 0x00000201 (513). But since input[] is full, you read 0x32 (the character '2') onto the low byte of c, which afterwards is c = 0x00000232 (562): this is a little-endian machine, so the least significant byte comes first in memory (on a big-endian architecture it would have been c = 0x32000201, and it would almost surely have segfaulted).
So you actually jump 562 - 513 = 49 bytes ahead; then there is the ++, making it 50. In fact you have exactly 50 bytes not overwritten with 0x32 (again, 0x3232ab64 is little-endian; if you display memory as bytes instead of dwords you will see 0x64 0xab 0x32 0x32).
So you are writing into unassigned stack area. It doesn't segfault because it's within the process's legal address space (up to the imposed limit) and is not overwriting any vital information.
Nice example of how things can go horribly wrong without exploding! Is this a real life example or an assignment?
Ah yes... for the second question, try declaring c before input[], or making c static, in order not to overwrite it. A minimal sketch of that idea follows.
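This is the same loop as above with c made static (one of the two suggestions); c no longer sits in the overflow's path, so the writes can run on toward the saved registers:
#include <unistd.h>

void read_input(void)
{
    static int c = 0;  /* static: no longer on the stack right after input[] */
    char input[512];
    while (read(0, input + c++, 1) == 1);
}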

What's the best way to load 2 unaligned 64-bit values into an sse register with SSSE3?

There are 2 pointers to 2 unaligned 8-byte chunks to be loaded into an xmm register. If possible, using intrinsics. And if possible, without using an auxiliary register. Without pinsrd. (SSSE3, Core 2)
From the msvc specs, it looks like you can do the following:
__m128d xx; // an uninitialised xmm register
xx = _mm_loadh_pd(xx, ptra); // load the higher 64 bits from (unaligned) ptra
xx = _mm_loadl_pd(xx, ptrb); // load the lower 64 bits from (unaligned) ptrb
Loading from unaligned storage is (in my experience) very much slower than loading from aligned pointers, so you probably wouldn't want to do this type of operation too often if you really want higher performance.
Hope this helps.
Unaligned access is much slower than aligned access (at least pre-Nehalem); you may get better speed by loading the aligned 128-bit words that contain the desired unaligned 64-bit words, then shuffling them to make the result you want.
This assumes:
you have read access to the full 128-bit words
the 64-bit words are aligned on at least 32-bit boundaries
e.g. (not tested)
int aoff = (int)((uintptr_t)ptra & 15);
int boff = (int)((uintptr_t)ptrb & 15);
__m128 va = _mm_load_ps( (const float *)((const char *)ptra - aoff) );
__m128 vb = _mm_load_ps( (const float *)((const char *)ptrb - boff) );
switch ( (aoff << 4) | boff )
{
case 0: _mm_shuffle_ps(va,vb, ...
The number of cases depends on whether you can assume 64-bit alignment.
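For comparison, a minimal sketch of the plain unaligned route using only SSE2 (note it does use a second register, which the question wanted to avoid): two 64-bit loads combined with an unpack.
#include <emmintrin.h>  /* SSE2 */

/* Load two unaligned 64-bit chunks and pack them into one xmm register.
   _mm_loadl_epi64 does an unaligned 8-byte load into the low half;
   _mm_unpacklo_epi64 combines the two low halves. */
static __m128i load_2x64(const void *ptra, const void *ptrb)
{
    __m128i lo = _mm_loadl_epi64((const __m128i *)ptra);
    __m128i hi = _mm_loadl_epi64((const __m128i *)ptrb);
    return _mm_unpacklo_epi64(lo, hi); /* low = *ptra, high = *ptrb */
}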
