I´m experiencing an weird issue that I have never seen before, in Delphi 2010 sometimes when using the routine CopyMemory (Which internally calls Move) I get an Invalid Float Point Operation exception, when such thing could happen when using Move??
I have a debug information in assembler, I have checked the source code of Move and the problem happens in FILD instruction, I found that FILD converts an integer value from memory to float point in a register and it could trigger that invalid operation, but why that happens? I´m stuck with this for 2 days now
Assembler Information:
; System.Move (Line=0 - Offset=1)
;
00404E0C cmp eax, edx
00404E0E jz System.Move
00404E10 cmp ecx, +$20
00404E13 jnbe System.Move
00404E15 sub ecx, +$08
00404E18 jnle System.Move
00404E1A jmp dword ptr [System.Move+ecx*4]
00404E21 fild qword ptr [ecx+eax]
00404E24 fild qword ptr [eax] ; <-- EXCEPTION
00404E26 cmp ecx, +$08
00404E29 jle System.Move
00404E2B fild qword ptr [eax+$08]
00404E2E cmp ecx, +$10
00404E31 jle System.Move
00404E33 fild qword ptr [eax+$10]
00404E36 fistp qword ptr [edx+$10]
00404E39 fistp qword ptr [edx+$08]
00404E3C fistp qword ptr [edx]
00404E3E fistp qword ptr [ecx+edx]
Registers:
EAX: 0E3A4694 EDI: 0000000D
EBX: 00001B5C ESI: 0ECF7928
ECX: 00000005 ESP: 0612FC1C
EDX: 0E3A2B38 EIP: 00404E24
What could cause that error?
I have seen this problem before. The problem was that before entering into the Move method the stack of the x87 registers contained some invalid floating point values instead of beging empty. This was due to an exception that occured earlier and left the x87 stack like that.
The Move command uses the x87 registers because they allow for fast movement of data without depending on SSE instructions but it assumes the stack is empty.
Finding the solution:
set a breakpoint on the start of the Move command and use the FPU debug window to validate that the FPU stack is indeed trashed.
From here: backtrace where in your application was the cause of this trashed FPU stack using the same window. This is the cause of your problem.
Seems similar to a problem I had before:
Memory corruption in System.Move due to changed 8087CW mode (png + stretchblt)
My fix was to disable SSE/MMX stuff in FastMove.pas, so it did not (mis)use the FPU anymore (and not vulnerable to FPU corruption)
Related
I'm somewhat new to assembly language and wanted to understand how it works on an older system. I understand that the large memory model uses far pointers while the small memory model uses near pointers, and that the return address in the large model is 4 bytes instead of two, so the first parameter changes from [bp+4] to [bp+6]. However, in the process of adapting a graphics library from a small to a large model, there are other subtle things that I don't seem to understand. Running this code with a large memory model from C is supposed to clear the screen, but instead it hangs the system (it was assembled with TASM):
; void gr256cls( int color , int page );
COLOR equ [bp+6]
GPAGE equ [bp+8]
.MODEL LARGE,C
.186
public C gr256cls
.code
gr256cls PROC
push bp
mov bp,sp
push di
pushf
jmp skip_1
.386
mov ax,0A800h
mov es,ax
mov ax,0E000h
mov fs,ax
CLD
mov al,es:[bp+6]
mov ah,al
mov bx,ax
shl eax,16
mov ax,bx
cmp word ptr GPAGE,0
je short cls0
cmp word ptr GPAGE,2
je short cls0
jmp short skip_0
cls0:
mov bh,0
mov bl,1
call grph_cls256
skip_0:
cmp word ptr GPAGE,1
je short cls1
cmp word ptr GPAGE,2
je short cls1
jmp short skip_1
cls1:
mov bh,8
mov bl,9
call grph_cls256
skip_1:
.186
popf
pop di
pop bp
ret
.386
grph_cls256:
mov fs:[0004h],bh
mov fs:[0006h],bl
mov cx,16384
mov di,0
rep stosd
add word ptr fs:[0004h],2
add word ptr fs:[0006h],2
mov cx,16384
mov di,0
rep stosd
add word ptr fs:[0004h],2
add word ptr fs:[0006h],2
mov cx,16384
mov di,0
rep stosd
add word ptr fs:[0004h],2
add word ptr fs:[0006h],2
mov cx,14848 ;=8192+6656
mov di,0
rep stosd
;; Freezes here.
ret
gr256cls ENDP
end
It hangs at the ret at the end of grph_256cls. In fact, even if I immediately ret from the beginning of the function it still hangs right after. Is there a comprehensive list of differences when coding assembly in the two modes, so I can more easily understand what's happening?
EDIT: To clarify, this is the original source. This is not generated output; it's intended to be assembled and linked into a library.
I changed grph_256cls to a procedure with PROC FAR and it now works without issue:
grph_cls256 PROC FAR
...
grph_cls256 ENDP
The issue had to do with how C expects functions to be called depending on the memory model. In the large memory model, all function calls are far. I hadn't labeled this assumption on the grph_256cls subroutine when trying to call it, so code that didn't push/pop the right values onto/off the stack was assembled instead.
How do you perform operations like change the value at an absolute word address?
Say you have some value at 5DAh and you want to count the number of zeros on that address, or move a value from one absolute address to another. How can one do that?
Short Answer: You Can't
You might have a trick question in front of you (no clue, just my guess).
The physical architecture of the 8086 chip did not have that instruction.
As for your two specific questions...
"...you want to count the number of zeros on that address..."
That's somewhat ambiguous, in fact so vague that I can't understand it.
"...move a value from one absolute address to another..."
Good question. We'll do this in 32 bit, no, 16 and then 32 bit.
16 bit example
Push Si ;source index register
Push Di ;destination index register
Push Ax ;We'll use this for the transfer
Lea Si, Where_The_Number_Is_Now ;You'll define this, somehow
Lea Di, Where_We_Want_It_To_Go ;You'll define this also, same thing
Mov Ax, Ds:[Si] ;The "Ds:" may or may not be needed, be safe
Mov Ds:[Di], Ax ;Probably do need "Ds:" for this instruction
Pop Ax ;Do pay attention to the reverse order
Pop Di ;...of popping the registers in exact
Pop Si ;...opposite of how they were pushed
; And you are done
32 bit example
Push Esi ;source index register
Push Edi ;destination index register
Push Eax ;We'll use this for the transfer
Lea Esi, Where_The_Number_Is_Now ;You'll define this, somehow
Lea Edi, Where_We_Want_It_To_Go ;You'll define this also, same thing
Mov Eax, Ds:[Esi] ;The "Ds:" may or may not be needed, be safe
Mov Ds:[Edi], Eax ;Probably do need "Ds:" for this instruction
Pop Eax ;Do pay attention to the reverse order
Pop Edi ;...of popping the registers in exact
Pop Esi ;...opposite of how they were pushed
; And you are done
To change a value at an absolute word address:
mov byte ptr [5dah], 0
...or...
mov word ptr [5dah], 0
To move a value from one absolute word address to another:
mov al, byte ptr [5dah]
mov byte ptr [1234h], al
...or...
mov ax, word ptr [5dah]
mov word ptr [1234h], ax
As for the other question, the one that asked how to count the number of zeros on that address, you were a little to vague.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
Improve this question
I'm beginner in assembly and memory related stuffs. When we have int64 value inside a reg, the mov operator how changes that? I mean it is not a string but integer. Maybe it is silly of me but I do not understand ! for example what this code does? assume that we have a int64 value at "esp".
push edx
push eax
mov eax,dword ptr [esp+$10]
mul eax,dword ptr [esp]
mov ecx,eax
mov eax,dword ptr [esp+$04]
mul eax,dword ptr [esp+$0C]
add ecx,eax
mov eax,dword ptr [esp]
mul eax,dword ptr [esp+$0C]
add edx,ecx
pop ecx
pop ecx
ret 8
int64 has 8 bytes but it addresses esp+$10 !!!
This is the code for _llmul, information that would have been quite useful to have seen in the question. No matter, _llmul is a helper function that performs 64 bit integer multiplication on a 32 bit machine.
The source code for the function looks like this:
// 64 bit integer helper routines
//
// These functions always return the 64-bit result in EAX:EDX
// ------------------------------------------------------------------------------
// 64-bit signed multiply
// ------------------------------------------------------------------------------
//
// Param 1(EAX:EDX), Param 2([ESP+8]:[ESP+4]) ; before reg pushing
//
procedure __llmul;
asm
push edx
push eax
// Param2 : [ESP+16]:[ESP+12] (hi:lo)
// Param1 : [ESP+4]:[ESP] (hi:lo)
mov eax, [esp+16]
mul dword ptr [esp]
mov ecx, eax
mov eax, [esp+4]
mul dword ptr [esp+12]
add ecx, eax
mov eax, [esp]
mul dword ptr [esp+12]
add edx, ecx
pop ecx
pop ecx
ret 8
end;
This makes it clear that, after the two PUSH operations, the operands are to found in [ESP+16]:[ESP+12] and [ESP+4]:[ESP].
At [ESP+8] you will find the function's return address.
The entire design of the function, relies on the fact that the MUL operation, when operating on 32 bit operands, returns a 64 bit result in EDX:EAX. So the function is essentially performing long multiplication on operands that are being treated as two 32 bit digits.
Let's denote the operands by H1:L1 and H2:L2 where H denotes the high 32 bits, and L denotes the low 32 bits. Then the first two multiplications are H1*L2 and H2*L1. The high part of the result, in EDX is ignored since it won't fit in a 64 bit result. The final multiplication is L1*L2 with the high part of that being combined with the low parts of the first two multiplications. And H1*H2 is not even attempted since clearly it won't fit. This function ignores overflow.
I am using Delphi 2009 with Unicode strings.
I'm trying to Encode a very large file to convert it to Unicode:
var
Buffer: TBytes;
Value: string;
Value := Encoding.GetString(Buffer);
This works fine for a Buffer of 40 MB that gets doubled in size and returns Value as an 80 MB Unicode string.
When I try this with a 300 MB Buffer, it gives me an EOutOfMemory exception.
Well, that wasn't totally unexpected. But I decided to trace it through anyway.
It goes into the DynArraySetLength procedure in the System unit. In that procedure, it goes to the heap and calls ReallocMem. To my surprise, it successfully allocates 665,124,864 bytes!!!
But then towards the end of DynArraySetLength, it calls FillChar:
// Set the new memory to all zero bits
FillChar((PAnsiChar(p) + elSize * oldLength)^, elSize * (newLength - oldLength), 0);
You can see by the comment what that is supposed to do. There is not much to that routine, but that is the routine that causes the EOutOfMemory exception. Here is FillChar from the System unit:
procedure _FillChar(var Dest; count: Integer; Value: Char);
{$IFDEF PUREPASCAL}
var
I: Integer;
P: PAnsiChar;
begin
P := PAnsiChar(#Dest);
for I := count-1 downto 0 do
P[I] := Value;
end;
{$ELSE}
asm // Size = 153 Bytes
CMP EDX, 32
MOV CH, CL // Copy Value into both Bytes of CX
JL ##Small
MOV [EAX ], CX // Fill First 8 Bytes
MOV [EAX+2], CX
MOV [EAX+4], CX
MOV [EAX+6], CX
SUB EDX, 16
FLD QWORD PTR [EAX]
FST QWORD PTR [EAX+EDX] // Fill Last 16 Bytes
FST QWORD PTR [EAX+EDX+8]
MOV ECX, EAX
AND ECX, 7 // 8-Byte Align Writes
SUB ECX, 8
SUB EAX, ECX
ADD EDX, ECX
ADD EAX, EDX
NEG EDX
##Loop:
FST QWORD PTR [EAX+EDX] // Fill 16 Bytes per Loop
FST QWORD PTR [EAX+EDX+8]
ADD EDX, 16
JL ##Loop
FFREE ST(0)
FINCSTP
RET
NOP
NOP
NOP
##Small:
TEST EDX, EDX
JLE ##Done
MOV [EAX+EDX-1], CL // Fill Last Byte
AND EDX, -2 // No. of Words to Fill
NEG EDX
LEA EDX, [##SmallFill + 60 + EDX * 2]
JMP EDX
NOP // Align Jump Destinations
NOP
##SmallFill:
MOV [EAX+28], CX
MOV [EAX+26], CX
MOV [EAX+24], CX
MOV [EAX+22], CX
MOV [EAX+20], CX
MOV [EAX+18], CX
MOV [EAX+16], CX
MOV [EAX+14], CX
MOV [EAX+12], CX
MOV [EAX+10], CX
MOV [EAX+ 8], CX
MOV [EAX+ 6], CX
MOV [EAX+ 4], CX
MOV [EAX+ 2], CX
MOV [EAX ], CX
RET // DO NOT REMOVE - This is for Alignment
##Done:
end;
{$ENDIF}
So my memory was allocated, but it crashed trying to fill it with zeros. This doesn't make sense to me. As far as I'm concerned, the memory doesn't even need to be filled with zeros - and that is probably a time waster anyhow - since the Encoding statement is about to fill it anyway.
Can I somehow prevent Delphi from doing the memory fill?
Or is there some other way I can get Delphi to allocate this memory successfully for me?
My real goal is to do that Encoding statement for my very large file, so any solution that will allow this would be much appreciated.
Conclusion: See my comments on the answers.
This is a warning to be careful in debugging assembler code. Make sure you break on all the "RET" lines, since I missed the one in the middle of the FillChar routine and erroneously concluded that FillChar caused the problem. Thanks Mason, for pointing this out.
I will have to break the input into Chunks to handle the very large file.
FillChar isn't allocating any memory, so that's not your problem. Try tracing into it and placing breakpoints at the RET statements, and you'll see that the FillChar finishes. Whatever the problem is, it's probably in a later step.
Read a chunk from the file, encode and write to another file, repeat.
A wild guess: Could the problem be memory being overcommitted and when the FillChar actually accesses the memory it can't find a page to actually give you? I don't know if Windows will even overcommit memory, I do know that some OSes do--you don't find out about it until you actually try to make use of the memory.
If this is the case it could cause the blowup in FillChar.
Programs are great at looping. They loop tirelessly without complaining.
Allocating a huge amount of memory takes time. There will be many calls to the heap manager. Your OS won't even know if it has the amount of contiguous memory that you need ahead of time. Your OS says, yeah, I have 1 GB free. But as soon as you go to use it, your OS says, wait, you want all of it in one chunk? Let me make sure I have enough all in one place. If it doesn't you get the error.
If it does have the memory, well, there's still a lot of work for the heap manager in preparing the memory and marking it as used.
So, obviously, it makes some sense to allocate less memory and simply loop through it. This saves the computer from doing a lot of work that it will only have to undo when it's done. Why not have it do just a little bit of work in setting aside your memory, then just keep re-using it?
Stack memory is allocated much faster than heap memory. If you keep your memory usage small (under 1 MB, by default), the compiler may just use stack memory over heap memory, which will make your loops even faster. In addition, local variables that get allocated in the register are very fast.
There are factors such as hard drive cluster and cache sizes, CPU cache sizes, and things, that offer hints about the best chunk sizes. The key is to find a good number. I like to use 64 KB chunks.
i wonder if i've found a compiler bug? i was removing some old code from my app and now i get stackoverflow at "begin" (see code & disassembly below).
procedure TfraNewRTMDisplay.ShowMeasurement;
var
iDummy, iDummy2, iDummy3:integer;
begin // STACK OVERFLOW BEFORE MY CODE STARTS
iDummy:=0;
iDummy2:=0;
case iDummy2 of
1:
case iDummy of
1:m_SelectedRTMMenuData.ChannelMeasSet.MeasKV.ClearMeasurementData;
2:m_SelectedRTMMenuData.ChannelMeasSet.MeasKV.ClearMeasurementData;
end;
end;
end;
NewRTMDisplay.pas.1601: begin
00983BF8 55 push ebp
00983BF9 8BEC mov ebp,esp
00983BFB B9D4E40400 mov ecx,$0004e4d4
00983C00 6A00 push $00 // stack overflow loop
00983C02 6A00 push $00 // stack overflow loop
00983C04 49 dec ecx // stack overflow loop
00983C05 75F9 jnz $00983c00 // stack overflow loop
00983C07 56 push esi
00983C08 57 push edi
00983C09 8945FC mov [ebp-$04],eax
00983C0C 8D856005FFFF lea eax,[ebp-$0000faa0]
00983C12 8B153C789A00 mov edx,[$009a783c]
00983C18 E88B35A8FF call #InitializeRecord
00983C1D 8D85D00AFEFF lea eax,[ebp-$0001f530]
00983C23 8B153C789A00 mov edx,[$009a783c]
00983C29 E87A35A8FF call #InitializeRecord
00983C2E 33C0 xor eax,eax
00983C30 55 push ebp
00983C31 68F73C9800 push $00983cf7
00983C36 64FF30 push dword ptr fs:[eax]
00983C39 648920 mov fs:[eax],esp
NewRTMDisplay.pas.1602: iDummy:=0;
00983C3C 33C0 xor eax,eax
00983C3E 8945F8 mov [ebp-$08],eax
NewRTMDisplay.pas.1603: iDummy2:=0;
00983C41 33C0 xor eax,eax
00983C43 8945F4 mov [ebp-$0c],eax
NewRTMDisplay.pas.1605: case iDummy2 of
00983C46 8B45F4 mov eax,[ebp-$0c]
00983C49 48 dec eax
00983C4A 7571 jnz $00983cbd
NewRTMDisplay.pas.1607: case iDummy of
00983C4C 8B45F8 mov eax,[ebp-$08]
00983C4F 48 dec eax
00983C50 7405 jz $00983c57
00983C52 48 dec eax
00983C53 7436 jz $00983c8b
00983C55 EB66 jmp $00983cbd
NewRTMDisplay.pas.1608: 1:m_SelectedRTMMenuData.ChannelMeasSet.MeasKV.ClearMeasurementData;
00983C57 8D951872EBFF lea edx,[ebp-$00148de8]
00983C5D 8B45FC mov eax,[ebp-$04]
00983C60 8B80F0020000 mov eax,[eax+$000002f0]
00983C66 E895DBE9FF call TRTMMenuData.ChannelMeasSet
00983C6B 8DB52072EBFF lea esi,[ebp-$00148de0]
00983C71 8DBD6005FFFF lea edi,[ebp-$0000faa0]
00983C77 B9A43E0000 mov ecx,$00003ea4
00983C7C F3A5 rep movsd
00983C7E 8D856005FFFF lea eax,[ebp-$0000faa0]
00983C84 E81B3F0200 call TDeviceMeas.ClearMeasurementData
00983C89 EB32 jmp $00983cbd
NewRTMDisplay.pas.1609: 2:m_SelectedRTMMenuData.ChannelMeasSet.MeasKV.ClearMeasurementData;
00983C8B 8D9560D9D8FF lea edx,[ebp-$002726a0]
00983C91 8B45FC mov eax,[ebp-$04]
00983C94 8B80F0020000 mov eax,[eax+$000002f0]
00983C9A E861DBE9FF call TRTMMenuData.ChannelMeasSet
00983C9F 8DB568D9D8FF lea esi,[ebp-$00272698]
00983CA5 8DBDD00AFEFF lea edi,[ebp-$0001f530]
00983CAB B9A43E0000 mov ecx,$00003ea4
00983CB0 F3A5 rep movsd
00983CB2 8D85D00AFEFF lea eax,[ebp-$0001f530]
00983CB8 E8E73E0200 call TDeviceMeas.ClearMeasurementData
NewRTMDisplay.pas.1612: end;
any ideas?
thank you!
mp
Something's creating a 320 KB buffer. Do you have any of those objects in that chain of calls have a statically allocated huge array inside them? Perhaps it's trying to fit one of the returned objects onto the stack.
The following code appears to be the problem:
m_SelectedRTMMenuData.ChannelMeasSet.MeasKV.ClearMeasurementData;
It looks like ChannelMeasSet is a huge data structure that the compiler is trying to copy to your stack for some reason. I'm not sure why the compiler would attempt to do that without seeing the rest of your object declaration. Furthermore, the compiler appears to have allocated two separate temporary storage areas on the stack, one for each line where you call this (even though only one will be used at any given time).
You have at least two possible solutions:
Make the stack bigger. There is probably a Delphi directive for this.
Rework your data structures so the compiler isn't inclined to copy huge temporary objects around.
Yuliy is right: The 320KB buffer is likely an array[20] of whatever is returned by ChannelMeasSet - based on the size of the rep movsd loop.
It looks like you've got some code that is returning a very large data structure by value, rather than just a reference to it. Apart from the stack overflow issue, that's likely to be very inefficient.
thank you for your suggestions! here is the call stack (from delphi 2009).
NewRTMDisplay.TfraNewRTMDisplay.ShowMeasurement
NewRTMDisplay.TfraNewRTMDisplay.DetectorSelectionChange($46552D0,drmOneAtATime)
NewRTMDisplay.TfraNewRTMDisplay.pumOnDetectorSelectionChange($46129E0)
Menus.TMenuItem.Click
Menus.TMenu.DispatchCommand(???)
Menus.TPopupList.WndProc((273, 386, 0, 0, 386, 0, 0, 0, 0, 0))
Menus.TPopupList.MainWndProc(???)
Classes.StdWndProc(15403740,273,386,0)
:7e418734 USER32.GetDC + 0x6d
:7e418816 ; C:\WINDOWS\system32\USER32.dll
:7e4189cd ; C:\WINDOWS\system32\USER32.dll
:7e418a10 USER32.DispatchMessageW + 0xf
Forms.TApplication.ProcessMessage(???)
:0051e31c TApplication.ProcessMessage + $F8
am working on more of your comments here right now. come back in a few minutes.
admittedly there are some moderately large structs around. i doubt they're anywhere near that big but i'll get back to you shortly.
stupid site; it auto refreshed, wiping out my answer. ChannelMeasSet is a record that surprised me that it occupies 1.2 MB!
m_SelectedRTMMenuData a small object
ChannelMeasSet a record
MeasKV a record with method ClearMeasurementData( );
strangely, the app was just now having a bunch of old code being stripped out. i have never seen a stack overflow like this one.
i'll post this comment so it doesn't get destroyed before i can finish!
thank you for your help!