I am trying to see the IR of a very simple loop
for (int i = 0; i < 15; i++){
a[b[i]]++;
}
while compile using -O0 and diving into the .ll file, I can see instructions written step by step in the define i32 #main() function. However, while compiling using -O2 and looking into the .ll file, there is only ret i32 0 in the define i32 #main() function. And some call instruction presented in the .ll file compiled by -O0 are changed to tail call in the .ll file compiled by -O2.
Can anyone give a rather detailed explanation on how llvm does the -O2 compilation? Thanks.
T
We can use the Compiler Explorer at godbolt.org to look at your example. We'll use the following testbench code:
int test() {
int a[15] = {0};
int b[15] = {0};
for (int i = 0; i < 15; i++){
a[b[i]]++;
}
return 0;
}
Godbolt shows the x86 assembly, not the LLVM bytecode, but I've summarized it a bit to show what's going on. Here it is at -O0 -m32:
test():
# set up stack
.LBB0_1:
cmp dword ptr [ebp - 128], 15 # i < 15?
jge .LBB0_4 # no? then jump out of loop
mov eax, dword ptr [ebp - 128] # load i
mov eax, dword ptr [ebp + 4*eax - 124] # load b[i]
mov ecx, dword ptr [ebp + 4*eax - 64] # load a[b[i]]
add ecx, 1 # increment it
mov dword ptr [ebp + 4*eax - 64], ecx # store it back
mov eax, dword ptr [ebp - 128]
add eax, 1 # increment i
mov dword ptr [ebp - 128], eax
jmp .LBB0_1 # repeat
.LBB0_4:
# tear down stack
ret
This looks like we'd expect: the loop is clearly visible and it does all the steps we listed. If we compile at -O1 -m32 -march=i386, we see the loop is still there but it's much simpler:
test(): # #test()
# set up stack
.LBB0_1:
mov ecx, dword ptr [esp + 4*eax] # load b[i]
inc dword ptr [esp + 4*ecx + 60] # increment a[b[i]]
inc eax # increment i
cmp eax, 15 # compare == 15
jne .LBB0_1 # no? then loop
# tear down stack
ret
Clang now uses the inc instruction (useful), noticed it could use the eax register for the loop counter i (neat), and moved the condition check to the bottom of the loop (probably better). We can still recognize our original code, though. Now let's try with -O2 -m32 -march=i386:
test():
xor eax, eax # does nothing
ret
That's it? Yes.
clang has detected that the a array can never be used outside of the function. This means doing the incrementing will never affect any other part of the program - and also that nobody will miss it when it's gone.
Removing the increment leaves a for loop with an empty body and no side effects, which can also be removed. In turn, removing the loop leaves an (for all intents and purposes) empty function.
This empty function is likely what you were seeing in the LLVM bytecode (ret i32 0).
This is not a very scientific description, and the steps clang takes might be different, but I hope the example clears it up a bit. If you want, you can read up on the as-if rule. I also recommend playing around on https://godbolt.org/ for a bit: see what happens when you move a and b outside the function, for example.
Related
I'm somewhat new to assembly language and wanted to understand how it works on an older system. I understand that the large memory model uses far pointers while the small memory model uses near pointers, and that the return address in the large model is 4 bytes instead of two, so the first parameter changes from [bp+4] to [bp+6]. However, in the process of adapting a graphics library from a small to a large model, there are other subtle things that I don't seem to understand. Running this code with a large memory model from C is supposed to clear the screen, but instead it hangs the system (it was assembled with TASM):
; void gr256cls( int color , int page );
COLOR equ [bp+6]
GPAGE equ [bp+8]
.MODEL LARGE,C
.186
public C gr256cls
.code
gr256cls PROC
push bp
mov bp,sp
push di
pushf
jmp skip_1
.386
mov ax,0A800h
mov es,ax
mov ax,0E000h
mov fs,ax
CLD
mov al,es:[bp+6]
mov ah,al
mov bx,ax
shl eax,16
mov ax,bx
cmp word ptr GPAGE,0
je short cls0
cmp word ptr GPAGE,2
je short cls0
jmp short skip_0
cls0:
mov bh,0
mov bl,1
call grph_cls256
skip_0:
cmp word ptr GPAGE,1
je short cls1
cmp word ptr GPAGE,2
je short cls1
jmp short skip_1
cls1:
mov bh,8
mov bl,9
call grph_cls256
skip_1:
.186
popf
pop di
pop bp
ret
.386
grph_cls256:
mov fs:[0004h],bh
mov fs:[0006h],bl
mov cx,16384
mov di,0
rep stosd
add word ptr fs:[0004h],2
add word ptr fs:[0006h],2
mov cx,16384
mov di,0
rep stosd
add word ptr fs:[0004h],2
add word ptr fs:[0006h],2
mov cx,16384
mov di,0
rep stosd
add word ptr fs:[0004h],2
add word ptr fs:[0006h],2
mov cx,14848 ;=8192+6656
mov di,0
rep stosd
;; Freezes here.
ret
gr256cls ENDP
end
It hangs at the ret at the end of grph_256cls. In fact, even if I immediately ret from the beginning of the function it still hangs right after. Is there a comprehensive list of differences when coding assembly in the two modes, so I can more easily understand what's happening?
EDIT: To clarify, this is the original source. This is not generated output; it's intended to be assembled and linked into a library.
I changed grph_256cls to a procedure with PROC FAR and it now works without issue:
grph_cls256 PROC FAR
...
grph_cls256 ENDP
The issue had to do with how C expects functions to be called depending on the memory model. In the large memory model, all function calls are far. I hadn't labeled this assumption on the grph_256cls subroutine when trying to call it, so code that didn't push/pop the right values onto/off the stack was assembled instead.
I have two two-dimensional arrays with dynamic sizes (guess that's the proper wording). I copy the content of first one into the other using:
dest:=copy(src,0,4*x*y);
// src,dest:array of array of longint; x,y:longint;
// setlength(both arrays,x,y); //x and y are max 15 bit positive!
It works. However I'm unable to reproduce this in asm. I tried the following variations to no avail... Could someone enlighten me...
MOV ESI,src; MOV EDI,dest; MOV EBX,y; MOV EAX,x; MUL EBX;
PUSH DS; POP ES; MOV ECX,EAX; CLD; REP MOVSD;
Also tried with LEA (didn't expect that to work since it should fetch the pointer address not the array address), no workie, and tried with:
p1:=#src[0,0]; p2:=#dest[0,0]; //being no-type pointers
MOV ESI,p1; MOV EDI,p2... (the same asm)
Hints pls? Btw it's delphi 6. The error is, of course, access violation.
This is really a two-fold three-fold question.
What's the structure of a dynamic array.
Which instructions in asm will copy the array.
I'm throwing random assembler at the CPU, why doesn't it work?
Structure of a dynamic array
See: http://docwiki.embarcadero.com/RADStudio/Seattle/en/Internal_Data_Formats
To quote:
Dynamic Array Types
On the 32-bit platform, a dynamic-array variable occupies 4 bytes of memory (and 8 bytes on 64-bit) that contain a pointer to the dynamically allocated array. When the variable is empty (uninitialized) or holds a zero-length array, the pointer is nil and no dynamic memory is associated with the variable. For a nonempty array, the variable points to a dynamically allocated block of memory that contains the array in addition to a 32-bit (64-bit on Win64) length indicator and a 32-bit reference count. The table below shows the layout of a dynamic-array memory block.
Dynamic array memory layout (32-bit and 64-bit)
Offset 32-bit -8 -4 0
Offset 64-bit -12 -8 0
contents refcount count start of data
So the dynamic array variable is a pointer to the middle of the above structure.
How do I access this in asm
Let's assume the array holds records of type TMyRec
you'll need to run this code for every inner array in the outer array to do the deep copy. I leave this as an exercise for the reader. (you can do the other part in pascal).
type
TDynArr: array of TMyRec;
procedure SlowButBasicMove(const Source: TDynArr; var dest);
asm
//insert register pushes, see below.
mov esi,Source //esi = pointer to source data
mov edi,Dest //edi = pointer to dest
sub esi,8
mov ebx,[esi] //ebx = refcount (just in case)
mov ecx,[esi+4] //ecx = element count
mov edx,SizeOf(TMyRec) //anywhere from 1 to zillions
mul ecx,edx //==ecx=number of bytes in array.
//// now we can start moving
xor ebx,ebx //ebx =0
add eax,8 //eax = #data
#loop:
mov eax,[esi+ebx] //Get data from source
mov [edi+ebx],esi //copy it to dest
add ebx,4 //4 bytes at a time
cmp ebx,ecx //is ebx> number of bytes?
jle loop
//Done copying.
//insert register pops, see below
end;
That's the copy done, however in order for the system not to crash, you need to save and restore the non volatile registers (all but EAX, ECX, EDX), see: http://docwiki.embarcadero.com/RADStudio/Seattle/en/Program_Control
push ebx
push esi
push edi
--- insert code shown above
//restore non-volatile registers
pop edi
pop esi
pop ebx //note the restoring must happen in the reverse order of the push.
See the Jeff Dunteman's book assembly step by step if you're completely new to asm.
You will get access violations if:
you try to read from a wrong address.
you try to write to a wrong adress.
you read past the end of the array.
you write to memory you haven't claimed before using GetMem or whatever means.
if you write past the end of your buffer.
if you do not restore all non-volatile registers
Remember you're directly dealing with the CPU. Delphi will not assist you in any way.
Really fast code will use some form of SSE to move 16bytes per instruction in an unrolled loop, see the above mentioned fastcode for examples of optimized assembler.
Random assembler
In assembler you need to know exactly what you're what to do, how and what the CPU does.
Set a breakpoint and run your code. Press ctrl + alt + C and behold the CPU-debug window.
This will allow you to see the code Delphi generates.
You can single step through the code to see what the CPU does.
see: http://www.plantation-productions.com/Webster/index.html
For more reading.
Dynamic Arrays differ from Static Arrays, especially when it comes to multi-dimensionality.
Refer to this reference for internal formats.
The point is that an Array Of Array Of LongInt of dimensions X and Y (in this order!) is a pointer to an array of X pointers that point to an array of Y LongInts.
Since it seems, from your comments, that you have already allocated the space for all elements in Dest, I assume you want to do a Deep Copy.
Here a sample program, where the assembly as been made as simple as possible for the sake of clarity.
Program Test;
Var Src, Dest: Array Of Array Of LongInt;
X, Y, I, J: Integer;
Begin
X := 4;
Y := 2;
setLength(Src, X, Y);
setLength(Dest, X, Y);
For I := 0 To X-1 Do
For J := 0 To Y-1 Do
Src[I,J] := I*Y + J;
{$ASMMODE intel}
Asm
cld ;Be sure movsd increments the registers
mov esi, DWORD PTR [Src] ;Src pointer
mov edi, DWORD PTR [Dest] ;Dest pointer
mov ecx, DWORD PTR [X] ;Repeat for X times
;The number of elements in Src
#_copy:
push esi ;Save these for later
push edi
push ecx
mov ecx, DWORD PTR [Y] ;Repeat for Y times
;The number of element in a Src[i] array
mov edi, DWORD PTR [edi] ;Get the pointer to the Dest[i] array
mov esi, DWORD PTR [esi] ;Get the pointer to the Src[i] array
rep movsd ;Copy sub array
pop ecx ;Restore
pop edi
pop esi
add esi, 04h ;Go from Src[i] to Src[i+1]
add edi, 04h ;Go from Dest[i] to Dest[i+1]
loop #_copy
End;
For I := 0 To X-1 Do
Begin
WriteLn();
For J := 0 To Y-1 Do
Begin
Write(' ');
Write(Dest[I,J]);
End;
End;
End.
Note 1 This source code is intended to be compile with freepascal.
Donation of Spare Time(TM) for downloading and installing Delphi are welcome!
Note 2 This source code is for illustration purpose only, it is pretty obvious, it has already been stated above, but somehow not everybody got it.
If the OP wanted a fast way to copy the array, they should have stated so.
Note 3 I don't save the clobbered registers, this is bad practice, my bad; I forgot, as there are no subroutines, no optimizations and no reason for the compiler to pass data in the registers between the two fors.
This is left as an exercise to the reader.
reader of this question.
I'm not new with assembly. But I'm new with MASM. (in fact, I was using that hardcore clean tasm stuff for about 8 years without even a single minute of using a single macros, he-he).
Now, I've got to make a simple program. I already did it's main logic. But there is some trouble with output.
When I use
output <some-variable-name>
it makes the thing - it outputs characters.
But now I want to begin output not from the very beginning of some variable but from a specific address in memory. Now I do:
lea eax, <some-variable-name>
mov esi, eax
... manipulations with address in esi, like 'add esi, ebx' and so on...
output esi
But that won't work.
Compiler says 'error A2070: invalid instruction operands'.
I use Microsoft Macro Assembler version 6.11.
Thanks in advance.
Sorry for my broken English.
UPD: defenition of 'output' macros, taken from included 'io.h' file:
output MACRO string,xtra ;; display string
IFB <string>
.ERR <missing operand in OUTPUT>
EXITM
ENDIF
IFNB <xtra>
.ERR <extra operand(s) in OUTPUT>
EXITM
ENDIF
push eax ;; save EAX
lea eax,string ;; string address
push eax ;; string parameter on stack
call outproc ;; call outproc(string)
pop eax ;; restore EAX
ENDM
The soulution in this situation is use following:
lea eax, <some-variable-name>
mov esi, eax
... manipulations with pointer, like 'add esi, edx' and so on ...
push esi
call outproc
I"m learning to code in x86 assembly (32-bit at the moment) and I'm struggling to understand the memory model completely. Particularly confusing is the semantics for labels, and the LEA instruction, and the layout of the executable. I wrote this sample program so i could inspect it running in gdb.
; mem.s
SECTION .data
msg: db "labeled string\n"
db "unlabeled-string\n"
nls: db 10,10,10,10,10,10,10,10
SECTION .text
global _start
_start:
; inspect msg label, LEA instruction
mov eax, msg
mov eax, &msg
mov eax, [msg]
; lea eax, msg (invalid instruction)
lea eax, &msg
lea eax, [msg]
; populate array in BSS section
mov [arr], DWORD 1
mov [arr+4], DWORD 2
mov [arr+8], DWORD 3
mov [arr+12], DWORD 4
; trying to print the unlabeled string
mov eax, 4
mov ebx, 1
lea ecx, [msg+15]
int 80H
mov eax, 1 ; exit syscall
mov ebx, 0 ; return value
int 80H
SECTION .bss
arr: resw 16
I've assembled and linked with:
nasm -f elf -g -F stabs mem.s
ld -m elf_i386 -o mem mem.o
GDB session:
(gdb) disas *_start
Dump of assembler code for function _start:
0x08048080 <+0>: mov $0x80490e4,%eax
0x08048085 <+5>: mov 0x80490e4,%eax
0x0804808a <+10>: mov 0x80490e4,%eax
0x0804808f <+15>: lea 0x80490e4,%eax
0x08048095 <+21>: lea 0x80490e4,%eax
0x0804809b <+27>: movl $0x1,0x8049110
0x080480a5 <+37>: movl $0x2,0x8049114
0x080480af <+47>: movl $0x3,0x8049118
0x080480b9 <+57>: movl $0x4,0x804911c
0x080480c3 <+67>: mov $0x4,%eax
0x080480c8 <+72>: mov $0x1,%ebx
0x080480cd <+77>: lea 0x80490f3,%ecx
0x080480d3 <+83>: int $0x80
0x080480d5 <+85>: mov $0x1,%eax
0x080480da <+90>: mov $0x0,%ebx
0x080480df <+95>: int $0x80
inspecting "msg" value:
(gdb) b _start
Breakpoint 1 at 0x8048080
(gdb) run
Starting program: /home/jh/workspace/x86/mem_addr/mem
(gdb) p msg
# what does this value represent?
$1 = 1700946284
(gdb) p &msg
$2 = (<data variable, no debug info> *) 0x80490e4
# this is the address where "labeled string" is stored
(gdb) p *0x80490e4
# same value as above (eg: p msg)
$3 = 1700946284
(gdb) p *msg
Cannot access memory at address 0x6562616c
# NOTE: 0x6562616c is ASCII values of 'e','b','a','l'
# the first 4 bytes from msg: db "labeled string"... little-endian
(gdb) x msg
0x6562616c: Cannot access memory at address 0x6562616c
(gdb) x &msg
0x80490e4 <msg>: 0x6562616c
(gdb) x *msg
Cannot access memory at address 0x6562616c
Stepping through one instruction at a time:
(gdb) p $eax
$4 = 0
(gdb) stepi
0x08048085 in _start ()
(gdb) p $eax
$5 = 134516964
0x0804808a in _start ()
(gdb) p $eax
$6 = 1700946284
(gdb) stepi
0x0804808f in _start ()
(gdb) p $eax
$7 = 1700946284
(gdb) stepi
0x08048095 in _start ()
(gdb) p $eax
$8 = 134516964
The array was populated with the values 1,2,3,4 as expected:
# before program execution:
(gdb) x/16w &arr
0x8049104 <arr>: 0 0 0 0
0x8049114: 0 0 0 0
0x8049124: 0 0 0 0
0x8049134: 0 0 0 0
# after program execution
(gdb) x/16w &arr
0x8049104 <arr>: 1 2 3 4
0x8049114: 0 0 0 0
0x8049124: 0 0 0 0
0x8049134: 0 0 0 0
I don't understand why printing a label in gdb results in those two values. Also, how can I print the unlabeled string.
Thanks in advance
Its somewhat confusing because gdb doesn't understand the concept of labels, really -- its designed to debug a program written in higher-level language (C or C++, generally) and compiled by a compiler. So it tries to map what it sees in the binary to high-level language concepts -- variables and types -- based on its best guess as to what is going on (in the absence of debug info from the compiler that tells it what is going on).
what nasm does
To the assembler, a label is value that hasn't been set yet -- it actually gets its final value when the linker runs. Generally, labels are used to refer to addresses in sections of memory -- the actual address will get defined when the linker lays out the final executable image. The assembler generates relocation records so that uses of the label can be set properly by the linker.
So when the assembler sees
mov eax, msg
it knows that msg is a label corresponding to an address in the data segment, so it generates an instruction to load that address into eax. When it sees
mov eax, [msg]
it generates an instruction to load 32-bits (the size of register eax) from memory at address of msg. In both cases, there will be a relocation generated so that the linker can plug in the final address msg ends up with.
(aside -- I have no idea what & means to nasm -- it doesn't appear anywhere in the documentation I can see, so I'm suprised it doesn't give an error. But it looks like it treats it as an alias for [])
Now LEA is a funny instruction -- it has basically the same format as a move from memory, but instead of reading memory, it stores the address it would have read from into the destination register. So
lea eax, msg
makes no sense -- the source is the label (address) msg, which is a (link time) constant and is not in memory anywhere.
lea eax, [msg]
works, as the source is in memory, so it sticks the address of the source into eax. This is the same effect as mov eax, msg. Most commonly, you only see lea used with more complex addressing modes, so that you can leverage the x86 AGU to do useful work other than just computing addresses. Eg:
lea eax, [ebx+4*ecx+32]
which does a shift and two adds in the AGU and puts the result into eax rather than loading from that address.
what gdb does
In gdb, when you type p <expression> it tries to evaluate <expression> to the best of its understanding of what the C/C++ compiler means for that expression. So when you say
(gdb) p msg
it looks at msg and says "that looks like a variable, so lets get the current value of that variable and print that". Now it knows that compilers like to put global variables into the .data segment, and that they create symbols for those variables with the same name as the varible. Since it sees msg in the symbol table as a symbol in the .data segment, it assumes that is what is going on, and fetches the memory at that symbol and prints it. Now it has no idea what TYPE that variable is (no debug info), so it guesses that it is a 32-bit int and prints it as that.
So the output
$1 = 1700946284
is the first 4 bytes of msg, treated as an integer.
For p &msg it understands you want to take the address of the variable msg, so it give the address from the symbol directly. When printing addresses, gdb prints the type information it has about those addresses, thus the "data variable, no debug info" that comes out with it.
If you want, you can use a cast to specify the type of something to gdb, and it will use that type instead of what it has guessed:
(gdb) p (char)msg
$6 = 108 'l'
(gdb) p (char [10])msg
$7 = "labeled st"
(gdb) p (char *)&msg
$8 = 0x80490e4 "labeled string\\nunlabeled-string\\n\n\n\n\n\n\n\n\n" <Address 0x804910e out of bounds>
Note in the latter case here, there's no NUL terminator on the string, so it prints out the entire data segment...
To print the unlabelled string with sys_write, you need to figure out the address
and length of string, which you almost have. For completeness you should also check the return value:
mov ebx, 1 ; fd 1 (stdout)
lea ecx, [msg+15] ; address
mov edx, 17 ; length
write_more:
mov eax, 4 ; sys_write
int 80H ; write(1, &msg[15], 17)
test eax, eax ; check for error
js error ; error, eax = -ERRNO
add ecx, eax
sub edx, eax
jg write_more ; only part of the string was written
Chris Dodd sez...
(aside -- I have no idea what & means to nasm -- it doesn't appear anywhere in the documentation I can see, so I'm suprised it doesn't give an error. But it looks like it treats it as an alias for [])
Oh, oh! You've discovered the secret syntax! "&" was added to Nasm (as an alias for "[]") per user request, a long time ago. It was never documented. Never removed, either. I'd stick with "[]". Being "undocumented", it might just disappear. Note that the meaning is almost "opposite" from what it means to gdb!
Might try "-F dwarf" instead of "-F stabs". It's supposed to be the "native" debugging info format used by gdb. (never noticed much difference, myself)
Best,
Frank
http://www.nasm.us
I am using Delphi 2009 with Unicode strings.
I'm trying to Encode a very large file to convert it to Unicode:
var
Buffer: TBytes;
Value: string;
Value := Encoding.GetString(Buffer);
This works fine for a Buffer of 40 MB that gets doubled in size and returns Value as an 80 MB Unicode string.
When I try this with a 300 MB Buffer, it gives me an EOutOfMemory exception.
Well, that wasn't totally unexpected. But I decided to trace it through anyway.
It goes into the DynArraySetLength procedure in the System unit. In that procedure, it goes to the heap and calls ReallocMem. To my surprise, it successfully allocates 665,124,864 bytes!!!
But then towards the end of DynArraySetLength, it calls FillChar:
// Set the new memory to all zero bits
FillChar((PAnsiChar(p) + elSize * oldLength)^, elSize * (newLength - oldLength), 0);
You can see by the comment what that is supposed to do. There is not much to that routine, but that is the routine that causes the EOutOfMemory exception. Here is FillChar from the System unit:
procedure _FillChar(var Dest; count: Integer; Value: Char);
{$IFDEF PUREPASCAL}
var
I: Integer;
P: PAnsiChar;
begin
P := PAnsiChar(#Dest);
for I := count-1 downto 0 do
P[I] := Value;
end;
{$ELSE}
asm // Size = 153 Bytes
CMP EDX, 32
MOV CH, CL // Copy Value into both Bytes of CX
JL ##Small
MOV [EAX ], CX // Fill First 8 Bytes
MOV [EAX+2], CX
MOV [EAX+4], CX
MOV [EAX+6], CX
SUB EDX, 16
FLD QWORD PTR [EAX]
FST QWORD PTR [EAX+EDX] // Fill Last 16 Bytes
FST QWORD PTR [EAX+EDX+8]
MOV ECX, EAX
AND ECX, 7 // 8-Byte Align Writes
SUB ECX, 8
SUB EAX, ECX
ADD EDX, ECX
ADD EAX, EDX
NEG EDX
##Loop:
FST QWORD PTR [EAX+EDX] // Fill 16 Bytes per Loop
FST QWORD PTR [EAX+EDX+8]
ADD EDX, 16
JL ##Loop
FFREE ST(0)
FINCSTP
RET
NOP
NOP
NOP
##Small:
TEST EDX, EDX
JLE ##Done
MOV [EAX+EDX-1], CL // Fill Last Byte
AND EDX, -2 // No. of Words to Fill
NEG EDX
LEA EDX, [##SmallFill + 60 + EDX * 2]
JMP EDX
NOP // Align Jump Destinations
NOP
##SmallFill:
MOV [EAX+28], CX
MOV [EAX+26], CX
MOV [EAX+24], CX
MOV [EAX+22], CX
MOV [EAX+20], CX
MOV [EAX+18], CX
MOV [EAX+16], CX
MOV [EAX+14], CX
MOV [EAX+12], CX
MOV [EAX+10], CX
MOV [EAX+ 8], CX
MOV [EAX+ 6], CX
MOV [EAX+ 4], CX
MOV [EAX+ 2], CX
MOV [EAX ], CX
RET // DO NOT REMOVE - This is for Alignment
##Done:
end;
{$ENDIF}
So my memory was allocated, but it crashed trying to fill it with zeros. This doesn't make sense to me. As far as I'm concerned, the memory doesn't even need to be filled with zeros - and that is probably a time waster anyhow - since the Encoding statement is about to fill it anyway.
Can I somehow prevent Delphi from doing the memory fill?
Or is there some other way I can get Delphi to allocate this memory successfully for me?
My real goal is to do that Encoding statement for my very large file, so any solution that will allow this would be much appreciated.
Conclusion: See my comments on the answers.
This is a warning to be careful in debugging assembler code. Make sure you break on all the "RET" lines, since I missed the one in the middle of the FillChar routine and erroneously concluded that FillChar caused the problem. Thanks Mason, for pointing this out.
I will have to break the input into Chunks to handle the very large file.
FillChar isn't allocating any memory, so that's not your problem. Try tracing into it and placing breakpoints at the RET statements, and you'll see that the FillChar finishes. Whatever the problem is, it's probably in a later step.
Read a chunk from the file, encode and write to another file, repeat.
A wild guess: Could the problem be memory being overcommitted and when the FillChar actually accesses the memory it can't find a page to actually give you? I don't know if Windows will even overcommit memory, I do know that some OSes do--you don't find out about it until you actually try to make use of the memory.
If this is the case it could cause the blowup in FillChar.
Programs are great at looping. They loop tirelessly without complaining.
Allocating a huge amount of memory takes time. There will be many calls to the heap manager. Your OS won't even know if it has the amount of contiguous memory that you need ahead of time. Your OS says, yeah, I have 1 GB free. But as soon as you go to use it, your OS says, wait, you want all of it in one chunk? Let me make sure I have enough all in one place. If it doesn't you get the error.
If it does have the memory, well, there's still a lot of work for the heap manager in preparing the memory and marking it as used.
So, obviously, it makes some sense to allocate less memory and simply loop through it. This saves the computer from doing a lot of work that it will only have to undo when it's done. Why not have it do just a little bit of work in setting aside your memory, then just keep re-using it?
Stack memory is allocated much faster than heap memory. If you keep your memory usage small (under 1 MB, by default), the compiler may just use stack memory over heap memory, which will make your loops even faster. In addition, local variables that get allocated in the register are very fast.
There are factors such as hard drive cluster and cache sizes, CPU cache sizes, and things, that offer hints about the best chunk sizes. The key is to find a good number. I like to use 64 KB chunks.