I received a crash report from madExcept from a user. The exception was "Invalid floating point operation".
The odd part, though, is that the call stack ends at #FSafeDivide.
I googled it and found that this is a check for certain Pentium chips that didn't do division correctly. If the test fails, all divisions are done in software rather than hardware. I have the Pentium-Safe FDIV option turned on in my compiler settings.
Could this have caused the error? I also read somewhere else that EInvalidOp, which was the exception class, can be caused by a stack overflow or something.
Here's a snippet of the madExcept message if you want to read it.
exception class : EInvalidOp
exception message : Invalid floating point operation.
thread $1014 (TMyBossThread):
00403509 M5b3.exe System #FSafeDivide
008300c9 M5b3.exe MMyWorkerThread 317 TMyBossThread.Search
0073e87a M5b3.exe MMyManagerThread 186 TMyWorkerThread.Execute
008e8c17 M5b3.exe madExcept HookedTThreadExecute
0042c150 M5b3.exe Classes ThreadProc
00405354 M5b3.exe System ThreadWrapper
008e8af9 M5b3.exe madExcept CallThreadProcSafe
008e8b63 M5b3.exe madExcept ThreadExceptFrame
created by main thread ($864) at:
0073e828 M5b3.exe MMyManagerThread 171 TMyManagerThread.Create
First, unless you actually have people still running on early Pentium I chips, you should probably turn that compiler option off. It's to address a glitch in a few specific CPUs, and any chip sold since 1995 has not had the problem.
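For reference, the flaw that this option guards against is usually demonstrated with one specific division. The sketch below is in C rather than Delphi and uses the commonly cited operands 4195835 and 3145727; the function name is made up for illustration. On a correct FPU the residual is zero, while affected Pentiums were widely reported to return 256.

```c
/* Probe for the Pentium FDIV flaw using the commonly cited operands.
   fdiv_looks_ok is a hypothetical helper name, not part of any RTL. */
int fdiv_looks_ok(void)
{
    /* volatile forces each step to be computed at run time and stored
       as a double, so the compiler can neither fold nor fuse the steps. */
    volatile double x = 4195835.0;
    volatile double y = 3145727.0;
    volatile double quotient = x / y;
    volatile double product = quotient * y;
    double residual = x - product;

    /* Correct hardware gives a residual of 0; a value near 256 is the flaw. */
    return residual > -1.0 && residual < 1.0;
}
```

On any CPU made after the recall this returns 1, which is why the software fallback is no longer worth its cost.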
Having said that, if you've got an invalid floating point operation in a division, the problem's most likely in your code somewhere, especially since FSafeDivide is the routine that's supposed to produce the right results. Take a look at TMyBossThread.Search, line 317, and see what it's dividing there. Also look at line 316, since stack traces can sometimes point you to the line after the one you care about.
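When checking the operands on that line, it may help to remember which inputs raise which IEEE 754 exception. A finite non-zero value divided by zero signals divide-by-zero (EZeroDivide in Delphi), whereas 0/0 and INF/INF signal invalid operation (EInvalidOp). On the x87, invalid operation is also what an overflow or underflow of the FPU register stack raises, which is probably the "stack overflow" mentioned in the question. Below is a minimal sketch in C, with hypothetical names, of a pre-check that could be ported to the code around the division:

```c
#include <math.h>

/* Hypothetical classification of what a division would do. */
typedef enum { DIV_OK, DIV_BY_ZERO, DIV_INVALID } div_check;

div_check check_division(double num, double den)
{
    /* A NaN operand means the inputs are already bad. Quiet NaNs only
       propagate; reporting them as invalid is a choice made for this sketch. */
    if (isnan(num) || isnan(den))
        return DIV_INVALID;

    /* 0/0 and INF/INF have no meaningful result: invalid operation. */
    if ((num == 0.0 && den == 0.0) || (isinf(num) && isinf(den)))
        return DIV_INVALID;

    /* Finite non-zero divided by zero: divide-by-zero. */
    if (den == 0.0 && !isinf(num))
        return DIV_BY_ZERO;

    return DIV_OK;
}
```

Logging the result of such a check just before line 317 would show whether the operands themselves are bad or whether the FPU state was disturbed by something else.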
A few comments, before searching in the haystack:
"If it's not reproducible, it's not a bug but an anomaly". Don't waste time on What or Why but on How you can recreate it.
As Mason said, it's probably time to remove this compiler option. (D6 is almost 10 years old)
Do you know if it happens on a specific Windows version? For instance, Text-To-Speech working well on XP gives a "Floating point division by zero error" on Vista and up.
Supposing your code seems fine, what is called that would involve some floating point operations?
The last two refer to problems with the FPU registers being messed up:
See here for interop with .NET, and the Help on Set8087CW for OpenGL.
This (German) article describes a case where switching on the Pentium(tm)-safe divide ($U+) fixed a Data Execution Prevention error on a Windows 2003 Server system, which has DEP enabled:
http://entwickler-forum.de/archive/index.php/t-41207.html
Delphi 2009 still has this compiler flag, with default $U- (no Pentium(tm)-safe divide).
So even if we can forget about the hardware-related part (broken CPUs), it could still make a difference depending on operating system 'features' like DEP.
As I am receiving a "Floating point division by zero" exception from time to time when using TWebBrowser and TEmbeddedWB, I discovered that I need to mask division by zero exceptions with Set8087CW or SetMXCSR.
Q1: What would be the best approach to do this:
to mask such exceptions early in the application startup and never touch them again (the app is multithreaded)?
to use OnBeforeNavigate and OnDocumentComplete events to mask / unmask exceptions? (is there a chance that exception could occur after the document is loaded?)
Q2: What would be the best "command" to mask only "division by zero" and nothing else - if application is 32-bit is there a need to mask 64 bit exception too?
The application I am using it in has a TWebBrowser control available all the time for displaying email contents.
Also, if anyone can clarify: is this a particular bug in the TWebBrowser control from Microsoft, or just a difference between Delphi/C++ Builder and Microsoft tools? If I hosted TWebBrowser inside a Visual C++ application and a division by zero occurred, it wouldn't be translated into an exception, but what would happen then? How would Visual C++ handle it?
It is kind of strange that Microsoft didn't notice this problem for such a long time, and also strange that Embarcadero never noticed it either, because masking a floating point exception effectively also masks your own program's exceptions of that kind.
UPDATE
My final solution after some examination is:
SetExceptionMask(GetExceptionMask() << exZeroDivide);
The default state from GetExceptionMask() returns: TFPUExceptionMask() << exDenormalized << exUnderflow << exPrecision. So obviously, some exceptions are already masked - this just adds exZeroDivide to the masked exceptions.
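On Q2, the mask is just a set of bits, and the x87 control word and the SSE MXCSR register each carry their own copy of it. The sketch below is plain C showing the bit arithmetic only, not how the RTL does it. The function names are invented, and 0x1332 and 0x1900 are what I understand to be the Delphi defaults for the two registers, so treat those values as assumptions.

```c
#include <stdint.h>

#define X87_ZM   0x0004u  /* zero-divide mask bit in the x87 control word */
#define MXCSR_ZM 0x0200u  /* zero-divide mask bit in MXCSR */

/* Setting a mask bit disables the trap for that one exception only. */
uint16_t x87_mask_zero_divide(uint16_t cw)
{
    return (uint16_t)(cw | X87_ZM);
}

uint32_t mxcsr_mask_zero_divide(uint32_t csr)
{
    return csr | MXCSR_ZM;
}
```

Applied to the assumed defaults this gives 0x1336 and 0x1B00; all other mask bits are left untouched.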
As a result, every division by zero now yields +INF in floating point instead of an exception. I can live with that: for the production version of the code it will be masked to avoid errors, and for the debug version it will be unmasked to detect floating point division by zero.
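With the exception masked, a bad result has to be caught by inspecting the value instead. Below is a minimal sketch in C, assuming floating point exceptions are masked (the usual default for C runtimes); ratio_or is a hypothetical helper name:

```c
#include <math.h>

/* Divide, but hand back a caller-chosen fallback when the masked
   division produced +INF, -INF or NaN instead of a usable number. */
double ratio_or(double num, double den, double fallback)
{
    double r = num / den;  /* no trap when the exception is masked */
    return isfinite(r) ? r : fallback;
}
```

The same test can sit behind a debug-only assertion, so that the production build stays quiet while the debug build still points at the offending division.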
Assuming that you have no need for floating point exceptions to be unmasked in your application code, far and away the simplest thing to do is to mask exceptions at some point in your initialization code.
The best way to do this is like so:
SetExceptionMask(exAllArithmeticExceptions);
This will set the 8087 control word on 32 bit targets, and the MXCSR on 64 bit targets. You will find SetExceptionMask in the Math unit.
If you wish to have floating point exceptions unmasked in your code then it gets tricky. One strategy would be to run your floating point code in a dedicated thread that unmasks the exceptions. This certainly can work, but not if you rely on the RTL functions Set8087CW and SetMXCSR. Note that everything in the RTL that controls FP units routes through these functions. For example SetExceptionMask does.
The problem is that Set8087CW and SetMXCSR are not threadsafe. It seems hard to believe that Embarcadero could be so inept as to produce fundamental routines that operate on thread context and yet fail to be threadsafe. But that is what they have done.
It's surprisingly hard to undo the mess that they have left, and to do so involves quite a bit of code patching. The lack of thread safety is down to the (mis)use of the global variables Default8087CW and DefaultMXCSR. If two threads call Set8087CW or SetMXCSR at the same time then these global variables can have the effect of leaking the value from one thread to the other.
You could replace Set8087CW and SetMXCSR with versions that did not change global state, but it's sadly not that simple. The global state is used in various other places. This may seem immodest, but if you want to learn more about this matter, read my document attached to this QC report: http://qc.embarcadero.com/wc/qcmain.aspx?d=107411
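The leak can be replayed deterministically in a small model. This is a sketch in C with hypothetical names, not RTL code, and it assumes the setter works in two steps: store the new value in the shared global, then load the thread's FPU control word from that global.

```c
/* Model only: these names are stand-ins, not the real RTL routines. */
typedef struct { unsigned short cw; } fpu_state;  /* one per thread */

static unsigned short default_cw;                 /* shared by all threads */

/* Step 1 of the assumed setter: publish the new value in the global. */
static void setter_store(unsigned short new_cw) { default_cw = new_cw; }

/* Step 2 of the assumed setter: load the control word from the global. */
static void setter_load(fpu_state *self) { self->cw = default_cw; }

/* Replays the unlucky interleaving of two threads and returns the
   control word that thread A ends up with. */
unsigned short replay_race(unsigned short a_wants, unsigned short b_wants)
{
    fpu_state a = {0}, b = {0};
    setter_store(a_wants);  /* A: step 1 */
    setter_store(b_wants);  /* B: step 1 overwrites the global */
    setter_load(&a);        /* A: step 2 picks up B's value */
    setter_load(&b);        /* B: step 2 */
    return a.cw;
}
```

In this model replay_race(0x1332, 0x133F) returns 0x133F: thread A asked for 0x1332 but is left running with B's mask. A setter that loads the register from its own argument instead of from the global would not have this window.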
I sometimes see disassembled programs which have instructions like:
mov %eax, -4(%esp)
which stores eax to stack at esp-4, without changing esp.
I'd like to know whether in general, you could put data into the stack beyond the stack pointer, and have those data be preserved (not altered unless I do it specifically).
Also, does this depend on which OS I use?
It matters which OS you use, because different OSes have different ABIs. (See the x86 tag wiki if you don't know what that means).
There are two ways I can see that mov %eax, -4(%esp) could be sane:
In the Linux x32 ABI (long mode with 32bit pointers), where there's a 128B red zone like in the normal x86-64 ABI. Compilers frequently generate code using the address-size prefix when they can't prove that e.g. 4(%rdi) would be the same as 4(%edi) in every case (e.g. wraparound). Unfortunately gcc 5.3 still uses 32bit addressing for locals on the stack, which could only wrap if %rsp == 0 (since the ABI requires it to be 16B-aligned).
Anyway, void foo(void) { volatile int x = 10; } compiles to
movl $10, -4(%esp) / ret with gcc 5.3 -O3 -mx32 on the Godbolt Compiler Explorer.
In (kernel) code that runs with interrupts disabled. Since nothing asynchronous other than DMA can happen, nothing can clobber your stack memory. (Although x86 has NMIs: Non-maskable interrupts. Depending on the handler for NMIs, and whether they can be blocked at all, NMIs could clobber memory below the stack pointer, I think.)
In user-space, your signal handlers aren't the only thing that can asynchronously clobber memory below the stack pointer:
As Jester points out in comments on dwelch's answer, pages below the stack pointer can be discarded (asynchronously of course), so a process that temporarily uses a lot of stack isn't wasting all those pages forever. If %esp happens to be at a page boundary, -4(%esp) is in a different page. And instead of faulting in a newly-allocated page of stack memory, access to unmapped pages below the stack pointer turns into a segfault on Linux.
Unless you have a guarantee otherwise (e.g. the red zone), then you must assume that everything below %esp is scribbled over between every instruction. None of the standard 32bit ABIs have a red-zone, and the Windows 64bit ABI also lacks one. Asynchronous use of the stack (usually by signal handlers in Linux) is a whole-program thing, not something that the compiler could determine just from the current compilation unit (even in cases where the compiler could prove that -4(%esp) was in the same page as (%esp)).
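The "scribbled over between every instruction" rule can be illustrated with a toy model. This is a C simulation with invented names, not real stack manipulation: an array stands in for stack memory, and async_handler stands in for a signal or interrupt handler that uses the space just below the current stack pointer.

```c
#include <stdint.h>

enum { STACK_WORDS = 16 };

/* Toy downward-growing stack; sp is the index of the lowest live word. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    int sp;
} toy_stack;

/* Stand-in for a signal or interrupt handler: it pushes its own data
   just below the current sp, then pops it again before returning. */
static void async_handler(toy_stack *s)
{
    s->sp -= 1;
    s->mem[s->sp] = 0xDEADBEEFu;
    s->sp += 1;
}

/* Like mov %eax, -4(%esp): store below sp without moving sp. */
uint32_t store_below_sp(uint32_t value)
{
    toy_stack s = { {0}, STACK_WORDS - 4 };
    s.mem[s.sp - 1] = value;
    async_handler(&s);       /* may run between any two instructions */
    return s.mem[s.sp - 1];  /* the handler's data, not value */
}

/* Like sub $4, %esp followed by mov %eax, (%esp): reserve, then store. */
uint32_t store_after_reserving(uint32_t value)
{
    toy_stack s = { {0}, STACK_WORDS - 4 };
    s.sp -= 1;
    s.mem[s.sp] = value;
    async_handler(&s);
    return s.mem[s.sp];      /* still value */
}
```

In the model, store_below_sp(42) comes back as 0xDEADBEEF while store_after_reserving(42) comes back as 42. A red zone amounts to a promise that the handler starts pushing some fixed distance further down.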
Note that the Linux x32 ABI is a 64bit ABI for AMD64 aka x86-64, not i386 aka IA32 aka x86-32. It's much more like the usual AMD64 ABI, since it was designed after.
EDIT
Not sure what you mean by above and below, since some folks "see" addresses increasing up and others see them increasing down.
But it doesn't matter. If the stack was initialized at address X and is currently at Y, then the data between X and Y must be preserved (one end not inclusive). The memory on either side is fair game.
The compiler, not the operating system, makes this happen: it moves the stack pointer to cover whatever it needs for that function, and moves it back when done. Each nested function consumes more and more stack, and each return gives a little back.
Taking my first course in assembly language, I am frustrated with cryptic error messages during debugging... I acknowledge that the following information will not be enough to find the cause of the problem (given my limited understanding of the assembly language, ColdFire(MCF5307, M68K family)), but I will gladly take any advice.
...
jsr out_string
Address Error (format 0x04 vector 0x03 fault status 0x1 status reg 0x2700)
I found a similar question at http://forums.freescale.com/freescale/board/message?board.id=CFCOMM&thread.id=271, regarding ADDRESS ERROR in general.
The answer to the question states that the address error is because the code is "incorrectly" trying to execute on a non-aligned boundary (or accessing non-aligned memory).
So my questions will be:
What does it mean to "incorrectly" try to execute on a non-aligned boundary/memory? An example would help a lot.
What is a non-aligned boundary/memory?
How would you approach fixing this problem, assuming you have little debugging technique (e.g. using breakpoints and trace)?
First of all, it is possible that this isn't the instruction causing the error. Be sure to see if the previous or next instruction could have caused it. However, assuming that exception handlers and debuggers have improved:
An alignment exception is what occurs when, say, 32-bit (4-byte) data is retrieved from an address which is not a multiple of 4 bytes. For example, if variable x is 32 bits at address 2, as in
const1: dc.w someconstant
x: dc.l someotherconstant
Then the instruction
move.l x, %d0
would cause a data alignment fault on a processor that requires 4-byte alignment for 32-bit accesses. The 68000 (and 68010, IIRC) is less strict: word and long accesses fault only at odd addresses. The 68020 eliminated this restriction and performs the unaligned access, but at the cost of decreased performance. I'm not aware of the jsr (jump to subroutine) instruction requiring alignment, but it's not unreasonable and it's easy to arrange: before each function, insert the assembler's alignment directive:
.align long
func: ...
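In C terms, the check the hardware makes and the adjustment an alignment directive performs look roughly like this. It is a sketch with made-up helper names, and it assumes the alignment is a power of two:

```c
#include <stdint.h>

/* True if addr is a multiple of alignment (alignment must be a power of two). */
int is_aligned(uint32_t addr, uint32_t alignment)
{
    return (addr & (alignment - 1u)) == 0u;
}

/* Round addr up to the next multiple of alignment, which is what an
   assembler alignment directive does to the location counter. */
uint32_t align_up(uint32_t addr, uint32_t alignment)
{
    return (addr + alignment - 1u) & ~(alignment - 1u);
}
```

For the x in the example, is_aligned(2, 4) is false and align_up(2, 4) is 4, so an alignment directive before x would insert two bytes of padding.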
It has been a long time since I've used a 68K family processor, but I can give you some hints.
Trying to execute on an unaligned boundary means executing code at an odd address, for example if out_string were at an address with the low bit set.
The same holds true for a data access to memory of 2 or 4 byte data. I'm not sure if the Coldfire supports byte access to odd memory addresses, but the other 68K family members did.
The address error occurs on the instruction that causes the error in all cases.
Find out what instruction is there. If the pc matches (or is close) then it is an unaligned execution. If it is a memory access, e.g. move.w d0,(a0), then check to see what address is being read/written, in this case the one pointed at by a0.
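That triage can be written down as a small decision function. This is a sketch in C with hypothetical names, assuming the classic 68K rule that instruction fetches and word or long data accesses need even addresses; whether the ColdFire also faults on odd data addresses is something to confirm in the MCF5307 manual.

```c
#include <stdint.h>

typedef enum { ADDR_OK, ADDR_ODD_PC, ADDR_ODD_OPERAND } addr_fault;

/* pc: program counter reported with the fault.
   operand_addr: address the instruction reads or writes, e.g. the value of a0.
   operand_size: access size in bytes (1, 2 or 4). */
addr_fault triage_address_error(uint32_t pc, uint32_t operand_addr,
                                unsigned operand_size)
{
    if (pc & 1u)
        return ADDR_ODD_PC;       /* jumped or called to an odd address */
    if (operand_size >= 2 && (operand_addr & 1u))
        return ADDR_ODD_OPERAND;  /* word or long access at an odd address */
    return ADDR_OK;               /* look at the neighbouring instructions */
}
```

An ADDR_ODD_PC result after jsr out_string would point at the label itself, or at a corrupted return address on the stack.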
I just wanted to add that this is very good stuff to figure out. I program high end medical imaging devices in my day job, but occasionally I need to get down to this level. I have found and fixed more than one COTS OS problem by being able to track down just this sort of problem.