Delphi SampleProfiler: How is this code calling into ntdll.dll? - delphi

i profiled a portion of my application using the Delphi Sampling Profiler. Like most people, i see a majority of the time spent inside ntdll.dll.
Note: i turned on the options to ignore Application.Idle time, and calls from System.pas. So it
isn't inside ntdll because the
application is idle:
After multiple runs, multiple times, the majority of the time seems to be spent inside ntdll.dll, but the odd thing is who the caller is:
The caller is from the Virtual Treeview's:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
Note: The application is not inside ntdll.dll because the
application is idle, because the
caller isn't Application.Idle.
What confuses me is that it's this line itself (i.e. not something inside PrepareCell) is the caller into ntdll. Even more confusing is that:
not only is it not something inside PrepareCell()
it's not even the setup of PrepareCell (e.g. popping stack variables, setting up implicit exception frames, etc) that is the caller. Those things would show up in the profiler as a hotspot on the begin inside PrepareCell.
VirtualTrees.pas:
procedure TBaseVirtualTree.PrepareCell(var PaintInfo: TVTPaintInfo; WindowOrgX, MaxWidth: Integer);
begin
...
end;
So i'm trying to figure out how this line:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
is calling ntdll.dll.
The only other ways in are the three parameters:
PaintInfo
Window.Left
NodeBitmap.Width
Maybe one of those is a function, or a property getter, that would call into ntdll. So i put a breakpoint on the line, and look at the CPU window at runtime:
There is a line in there that might be the culprit:
call dword ptr [edx+$2c]
But when i follow that jump, it doesn't end up in ntdll.dll, but TBitmap.GetWidth:
Which, as you can see, doesn't call anywhere; and certainly not into ntdll.dll.
So how is the line:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
calling into ntdll.dll?
Note: i know full well it isn't really calling into ntdll.dll. So any valid answer will have to include the words "Sampling Profiler is misleading; that line is not calling into ntdll.dll." The answer will also have to either say that the majority of the time is not spent in ntdll.dll, or that the highlighted line is not the caller. Finally any answer will have to explain why Sampling Profiler is wrong, and how it can be fixed.
Update 2
What is ntdll.dll? Ntdll is Windows NT's native API set. The Win32 API is a wrapper around ntdll.dll that looks like the Windows API that existed in Windows 1/2/3/9x. In order to actually get into ntdll you have to call a function that uses ntdll directly or indirectly.
For example, when my Delphi application goes idle, it waits for a message by calling the user32.dll function:
WaitMessage;
When when you actually look at it is:
USER32.WaitMessage
mov eax,$00001226
mov edx,$7ffe0300
call dword ptr [edx]
ret
Calling the function specified at $7ffe0300 is the way Windows transitions into Ring0, calling the FunctionID specified in EAX. In this case, the System Function being called is 0x1226. On my operating system, Windows Vista, 0x1226 corresponds to the system function NtUserWaitMessage.
This is how to you get into ntdll.dll: you call it.
i was desperately trying to avoid a hand-waving non-answer when i worded the original question. By being very specific, carefully pointing out the reality of what i'm seeing, i was trying to prevent people from ignoring the facts, and trying to use a hand-waving argument.
Update Three
i converted the two parameters:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
into stack variables:
_profiler_WindowLeft := Window.Left;
_profiler_NodeBitmapWidth := NodeBitmap.Width;
PrepareCell(PaintInfo, _profiler_WindowLeft, _profiler_NodeBitmapWidth);
To confirm that the bottleneck is not is the call to
Windows.Left, or
Nodebitmap.Width
Profiler still indicates that the line
PrepareCell(PaintInfo, _profiler_WindowLeft, _profiler_NodeBitmapWidth);
itself is the bottleneck; not anything inside PrepareCell. This must mean that it's something inside the setup of the call to prepare cell, or at the start of PrepareCell:
VirtualTrees.pas.15746: PrepareCell(PaintInfo, _profiler_WindowLeft, _profiler_NodeBitmapWidth);
mov eax,[ebp-$54]
push eax
mov edx,esi
mov ecx,[ebp-$50]
mov eax,[ebp-$04]
call TBasevirtualTree.PrepareCell
Nothing in that calls into ntdll. Now the pre-amble in PrepareCell itself:
VirtualTrees.pas.15746: begin
push ebp
mov ebp,esp
add esp,-$44
push ebx
push esi
push edi
mov [ebp-$14],ecx
mov [ebp-$18],edx
mov [ebp-$1c],eax
lea esi,[ebp-$1c]
mov edi,[ebp-$18]
Nothing in there calls into ntdll.dll.
The questions still remain:
why is pushing of one variable onto the stack, and two others into registers the bottleneck?
why isn't anything inside PrepareCell itself the bottleneck?

Well, this problem was actually my main reason to make my own sampling profiler:
http://code.google.com/p/asmprofiler/wiki/AsmProfilerSamplingMode
Maybe not perfect, but you could give it a try. Let me know what you think about it.
Btw, I think it has to do with the fact that almost all calls ends into calls to the kernel (memory requests, paint events, etc). Only calculations do not need to call the kernel.
Most calls ends in waiting for kernel results:
ntdll.dll!KiFastSystemCallRet
You can see this in Process Explorer with thread stack view, or in Delphi, or using StackWalk64 API in my "Live view" of AsmProfiler:
http://code.google.com/p/asmprofiler/wiki/ProcessStackViewer

There are probably two things happening there.
The first is that SamplingProfiler identifies the caller by walking up the stack, until it encounters what looks like a valid call point into Delphi from Delphi code.
The thing is, some procedures may reserve a large amount of stack at once, without reinitializing it. This could result in a false positive. The only clue then would be that your false positive was recently invoked.
The second thing is the ntdll localization, that is known for certain, however, ntdll is your wait point in user-space, and as user197220, ntdll is where you'll end up waiting most of the time you're calling system stuff and waiting for the result.
In your case, unless you reduced the sampling rate, you're looking at 247ms of CPU work time, which could probably pass as idle if those 247 samples were collected over many seconds of real time. Since the false positive points to VirtualTree paint preparations, my bet would be that the ntdll time is actually paint time (driver or OS software).
You can try commenting out the code that actually does the painting to be sure.

Related

Fetch and execution cycle

Im trying to understand the instructions cycle of a cpu and how it executes instructions.
From my understanding, while a program is compiled into binary it is stored on the hard drive. When you run the program, the binary execution code is loaded into RAM and the binary resides in the .text area of the memory space for a process. However, im confused as to why the return address needs to stored on the stack. If we are executing 10 instructions, say the 3rd instruction is a function call, the kernal begins allocating memory for the function and also for the return address. Ive have read that the return address is the next instruction from the EIP register. Here, would the return address be the 4th instruction located in memory so that when the stack for the function is torn down again, we instructions will continue executing from the 4th? When the function executes, do the instructions jump from 3rd to 8th for example?
This is the source if it helps:
https://www.csee.umbc.edu/~chang/cs313.s02/stack.shtml
Next, main pushes the arguments for foo one at a time, last argument
first onto the stack. For example, if the function call is:
a = foo(12, 15, 18) ; The assembly language instructions might be:
push dword 18
push dword 15
push dword 12 Finally, main can issue the subroutine call instruction:
call foo
When the call instruction is executed, the contents of the EIP register is pushed onto the stack. Since the EIP
register is pointing to the next instruction in main, the effect is
that the return address is now at the top of the stack. After the call
instruction, the next execution cycle begins at the label named foo.
Figure 2 shows the contents of the stack after the call instruction.
The red line in Figure 2 and in subsequent figures indicates the top
of the stack prior to the instructions that initiated the function
call process. We will see that after the entire function call has
finished, the top of the stack will be restored to this position
Any help is appreciated, thankyou

What happens when ShowMessage is called?

I am working on the project where I communicate with another device over COM port.
For incoming data I am using VaComm1RXchar event, there I store the message into the array and increment msgIndex which represents number of messages.
Then I call the function where I work with this message.
Inside this function is this timeout cycle where I wait for this message:
while MsgIndex < 1 do
begin
stop := GetTickCount;
if (stop - start)> timeout then
begin
MessageBox(0, 'Timeout komunikace !', 'Komunikace', MB_OK);
exit(false);
end;
sleep(10);
end;
The strange thing for me is that, when have it like this above then it always end with timeout. But when I put on there before this while cycle a ShowMessage('Waiting') then It works correcly.
Does anyone know what can caused this and how can I solve it? Thanks in advance!
We can deduce that the VaComm1RXchar event is a synchronous event and that by blocking your program in a loop you are preventing the normal message processing that would allow that event to execute.
Showing a modal dialog box, on the other hand, passes message handling to that dialog so the message queue is properly serviced and your Rx events and handled normally.
You can be certain this is the case if this also works (please never write code like this - it's just to prove the point):
while MsgIndex < 1 do begin
stop := GetTickCount;
if (stop - start)> timeout then begin
MessageBox(0, 'Timeout komunikace !', 'Komunikace', MB_OK);
exit(false);
end;
Application.ProcessMessages; // service the message queue so that
sleep(10); // your Rx event can be handled
end;
If there is a lesson here it is that RS-232 communication really needs to be done on a background thread. Most all implementations of "some chars have been received" events lead to dreadful code for the very reason you are discovering. Your main thread needs to be free to process the message that characters have been received but, at the same time, you must have some parallel process that is waiting for those received characters to complete a cogent instruction. There does not exist a sensible solution in an event-driven program to both manage the user interface and the communication port at the same time on one thread.
A set of components like AsyncPro**, for example, wrap this functionality into data packets where synchronous events are used but where the components manage start and end string (or bytes) detection for you on a worker thread. This removes one level of polling from the main thread (ie: you always get an event when a complete data packet has arrived, not a partial one). Alternatively, you can move the communication work to a custom thread and manage this yourself.
In either case, this is only partly a solution, of course, since you still cannot stick to writing long procedural methods in synchronous event handlers that require waiting for com traffic. The second level of polling, which is managing a sequence of complete instructions, will still have you needing to pump the message queue if your single procedure needs to react to a sequence of more than one comport instruction. What you also need to think about is breaking up your long methods into shorter pieces, each in response to specific device messages.
Alternatively, for heavily procedural process automation, it is also often a good idea to move that work to a background thread as well. This way your worker threads can block on synchronization objects (or poll in busy loops for status updates) while they wait for events from hardware. One thread can manage low level comport traffic, parsing and relaying those commands or data packets, while a second thread can manage the higher level process which is handling the sequence of complete comport instructions that make up your larger process. The main thread should primarily only be responsible for marshalling these messages between the workers, not doing any of the waiting itself.
See also : I do not understand what Application.ProcessMessages in Delphi is doing
** VAComm may also support something like this, I don't know. The API and documentation are not publicly available from TMS for ASync32 so you'll need to consult your local documentation.
What happens is that a modal message loop executes when the dialog is showing. That this message loop changes behaviour for you indicates that your communications with the device require the presence of a message loop.
So the solution for you is to service the message queue. A call to Application.ProcessMessages in your loop will do that, but also creates other problems. Like making your UI become re-entrant.
Without knowing more about your program, I cannot offer more detailed advice as to how you should solve this problem.

SQLite callback causes recursive exceptions after a successful return

I am developing this Delhi 2009 app with SQLite database version 3.7.9, accessed by the Zeos components. Started off OK - I can create a DB if none exists, create tables and insert records - the 'normal stuff'. There is a little table that contains serial params for a connection to some old equipments and the serial comms handler needs to be informed of connection parameter changes, but the Zeos components do not support events on SQLite.
I looked at the SQLite API and there is a 'Data change notification callback', so I set about making this happen. The setup prototype is:
void *sqlite3_update_hook(
sqlite3*,
void(*)(void *,int ,char const *,char const *,sqlite3_int64),
void*
);
So, in Delphi, I imported the DLL call:
function sqlite3_update_hook(pDB:pointer;pCallback:pointer;aux:pointer): pointer; stdcall;
..
function sqlite3_update_hook; external 'sqlite3.dll' name 'sqlite3_update_hook';
..and declared an empty callback for testing:
function DBcallback(aux:pointer;CBtype:integer;CBdatabaseName:PChar;CBtableName:PChar;CBrowID:Int64):pointer; stdcall;
begin
end;
..and called the setup:
sqlite3_update_hook(SQLiteHandle,#DBcallback,dmOilmon);
SQLiteHandle is the sqlite3* pointer for the database as retrieved from the driver after connection. dmOilmon is the DataModule instance so that, when working correctly, I can call some method from the callback, (yes, I know callback is in an unknown thread - I can deal with that OK, I'll just be signaling semaphores).
Good news: when I ran the app, the DBcallback was called, aparrently successfully, upon the first insert. Bad news: some little time later, the app blew up with 'Too many exceptions', and usually a CPU window full of '??', (the occasional alternative was system death and a reboot - Vista Ultimate 64). Breaking in the callback, aux, CBtype and CBrowID were all as expected but the debugger tooltip showed the CBdatabaseName / CBtableName PChars to be pointing at a string of Chinese characters..
I tried to trace where the callback is called from - the call chain passes through in the Zeos driver code, for some reason. Breaking there and stepping through, I checked the stack pointer and it's the same before and after the callback.
So, I set up the 'empty' callback, the callback gets called at the expected point but with a coupe of dodgy-looking parameters, the callback returns to whence it came with the SP correct. I seem to have done everything right, but... :((
Has anyone seen this, (or fixed it even:)?
Can you suggest a mechanism whereby an aparrently successful callback can pesuade an app to later generate recursive exceptions?
Does anyone have any suggestions for further debugging?
Rgds,
Martin
You have the problems with correct translation of SQLite header to Delphi and with linking:
Not stdcall, but must be cdecl.
Your function DBcallback must be a procedure. Because void(*)(...) is a pointer to C function returning void.
Not PChar, but must be PAnsiChar. For non-Unicode Delphi's it may be not important, for Unicode Delphi's, it is only a correct translation.
You are using compile-time dynamic linking (external 'sqlite3.dll' ...) to sqlite3.dll. But ZeosLib uses run-time dynamic linking (LoadLibrary & GetProcAddress). That may lead to "DLL hell" issue, when your EXE loads one sqlite3.dll, and ZeosLib plans to load another sqlite3.dll.

how do the registers get saved when a process gets interrupted?

this has been bugging me all day. When a program sets itself up to call a function when it receives a certain interrupt, I know that the registers are pushed onto the stack when the program is interrupted, but what I can't figure out is: how do the registers get off the stack? I know that the compiler doesn't know if the function is an interrupt handler, and it can't know how many arguments the interrupt gave to the function. So how on earth does it get the registers off?
It depends on the compiler, the OS and the CPU.
For low level embedded stuff, where an ISR may be called directly in response to an interrupt, the compiler will typically have some extension to the language (usually C or C++) that flags a given routine as an ISR, and registers will be saved and restored at the beginning and end of such a routine. [1]
For common desktop/server OSs though there is normally a level of abstraction between interrupts and user code - interrupts are normally handled first by some kernel code before being passed to a user routine, in which case the kernel code takes care of saving and restoring registers, and there is nothing special about the user-supplied ISR.
[1] E.g. Keil 8051 C compiler:
void Some_ISR(void) interrupt 0 // this routine will get called in response to interrupt 0
{
// compiler generates preamble to save registers
// ISR code goes here
// compiler generates code to restore registers and
// do any other special end-of-ISR stuff
}

What is the right tool to detect VMT or heap corruption in Delphi?

I'm a member in a team that use Delphi 2007 for a larger application and we suspect heap corruption because sometimes there are strange bugs that have no other explanation.
I believe that the Rangechecking option for the compiler is only for arrays. I want a tool that give an exception or log when there is a write on a memory address that is not allocated by the application.
Regards
EDIT: The error is of type:
Error: Access violation at address 00404E78 in module 'BoatLogisticsAMCAttracsServer.exe'. Read of address FFFFFFDD
EDIT2: Thanks for all suggestions. Unfortunately I think that the solution is deeper than that. We use a patched version of Bold for Delphi as we own the source. Probably there are some errors introduced in the Bold framwork. Yes we have a log with callstacks that are handled by JCL and also trace messages. So a callstack with the exception can lock like this:
20091210 16:02:29 (2356) [EXCEPTION] Raised EBold: Failed to derive ServerSession.mayDropSession: Boolean
OCL expression: not active and not idle and timeout and (ApplicationKernel.allinstances->first.CurrentSession <> self)
Error: Access violation at address 00404E78 in module 'BoatLogisticsAMCAttracsServer.exe'. Read of address FFFFFFDD. At Location BoldSystem.TBoldMember.CalculateDerivedMemberWithExpression (BoldSystem.pas:4016)
Inner Exception Raised EBold: Failed to derive ServerSession.mayDropSession: Boolean
OCL expression: not active and not idle and timeout and (ApplicationKernel.allinstances->first.CurrentSession <> self)
Error: Access violation at address 00404E78 in module 'BoatLogisticsAMCAttracsServer.exe'. Read of address FFFFFFDD. At Location BoldSystem.TBoldMember.CalculateDerivedMemberWithExpression (BoldSystem.pas:4016)
Inner Exception Call Stack:
[00] System.TObject.InheritsFrom (sys\system.pas:9237)
Call Stack:
[00] BoldSystem.TBoldMember.CalculateDerivedMemberWithExpression (BoldSystem.pas:4016)
[01] BoldSystem.TBoldMember.DeriveMember (BoldSystem.pas:3846)
[02] BoldSystem.TBoldMemberDeriver.DoDeriveAndSubscribe (BoldSystem.pas:7491)
[03] BoldDeriver.TBoldAbstractDeriver.DeriveAndSubscribe (BoldDeriver.pas:180)
[04] BoldDeriver.TBoldAbstractDeriver.SetDeriverState (BoldDeriver.pas:262)
[05] BoldDeriver.TBoldAbstractDeriver.Derive (BoldDeriver.pas:117)
[06] BoldDeriver.TBoldAbstractDeriver.EnsureCurrent (BoldDeriver.pas:196)
[07] BoldSystem.TBoldMember.EnsureContentsCurrent (BoldSystem.pas:4245)
[08] BoldSystem.TBoldAttribute.EnsureNotNull (BoldSystem.pas:4813)
[09] BoldAttributes.TBABoolean.GetAsBoolean (BoldAttributes.pas:3069)
[10] BusinessClasses.TLogonSession._GetMayDropSession (code\BusinessClasses.pas:31854)
[11] DMAttracsTimers.TAttracsTimerDataModule.RemoveDanglingLogonSessions (code\DMAttracsTimers.pas:237)
[12] DMAttracsTimers.TAttracsTimerDataModule.UpdateServerTimeOnTimerTrig (code\DMAttracsTimers.pas:482)
[13] DMAttracsTimers.TAttracsTimerDataModule.TimerKernelWork (code\DMAttracsTimers.pas:551)
[14] DMAttracsTimers.TAttracsTimerDataModule.AttracsTimerTimer (code\DMAttracsTimers.pas:600)
[15] ExtCtrls.TTimer.Timer (ExtCtrls.pas:2281)
[16] Classes.StdWndProc (common\Classes.pas:11583)
The inner exception part is the callstack at the moment an exception is reraised.
EDIT3: The theory right now is that the Virtual Memory Table (VMT) is somehow broken. When this happen there is no indication of it. Only when a method is called an exception is raised (ALWAYS on address FFFFFFDD, -35 decimal) but then it is too late. You don't know the real cause for the error. Any hint of how to catch a bug like this is really appreciated!!! We have tried with SafeMM, but the problem is that the memory consumption is too high even when the 3 GB flag is used. So now I try to give a bounty to the SO community :)
EDIT4: One hint is that according the log there is often (or even always) another exception before this. It can be for example optimistic locking in the database. We have tried to raise exceptions by force but in test environment it just works fine.
EDIT5: Story continues... I did a search on the logs for the last 30 days now. The result:
"Read of address FFFFFFDB" 0
"Read of address FFFFFFDC" 24
"Read of address FFFFFFDD" 270
"Read of address FFFFFFDE" 22
"Read of address FFFFFFDF" 7
"Read of address FFFFFFE0" 20
"Read of address FFFFFFE1" 0
So the current theory is that an enum (there is a lots in Bold) overwrite a pointer. I got 5 hits with different address above. It could mean that the enum holds 5 values where the second one is most used. If there is an exception a rollback should occur for the database and Boldobjects should be destroyed. Maybe there is a chance that not everything is destroyed and a enum still can write to an address location. If this is true maybe it is possible to search the code by a regexpr for an enum with 5 values ?
EDIT6: To summarize, no there is no solution to the problem yet. I realize that I may mislead you a bit with the callstack. Yes there are a timer in that but there are other callstacks without a timer. Sorry for that. But there are 2 common factors.
An exception with Read of address FFFFFFxx.
Top of callstack is System.TObject.InheritsFrom (sys\system.pas:9237)
This convince me that VilleK best describe the problem.
I'm also convinced that the problem is somewhere in the Bold framework.
But the BIG question is, how can problems like this be solved ?
It is not enough to have an Assert like VilleK suggest as the damage has already happened and the callstack is gone at that moment. So to describe my view of what may cause the error:
Somewhere a pointer is assigned a bad value 1, but it can be also 0, 2, 3 etc.
An object is assigned to that pointer.
There is method call in the objects baseclass. This cause method TObject.InheritsForm to be called and an exception appear on address FFFFFFDD.
Those 3 events can be together in the code but they may also be used much later. I think this is true for the last method call.
EDIT7: We work closely with the the author of Bold Jan Norden and he recently found a bug in the OCL-evaluator in Bold framework. When this was fixed these kinds of exceptions decreased a lot but they still occasionally come. But it is a big relief that this is almost solved.
You write that you want there to be an exception if
there is a write on a memory address that is not allocated by the application
but that happens anyway, both the hardware and the OS make sure of that.
If you mean you want to check for invalid memory writes in your application's allocated address range, then there is only so much you can do. You should use FastMM4, and use it with its most verbose and paranoid settings in debug mode of your application. This will catch a lot of invalid writes, accesses to already released memory and such, but it can't catch everything. Consider a dangling pointer that points to another writeable memory location (like the middle of a large string or array of float values) - writing to it will succeed, and it will trash other data, but there's no way for the memory manager to catch such access.
I don't have a solution but there are some clues about that particular error message.
System.TObject.InheritsFrom subtracts the constant vmtParent from the Self-pointer (the class) to get pointer to the adress of the parent class.
In Delphi 2007 vmtParent is defined:
vmtParent = -36;
So the error $FFFFFFDD (-35) sounds like the class pointer is 1 in this case.
Here is a test case to reproduce it:
procedure TForm1.FormCreate(Sender: TObject);
var
I : integer;
O : tobject;
begin
I := 1;
O := #I;
O.InheritsFrom(TObject);
end;
I've tried it in Delphi 2010 and get 'Read of address FFFFFFD1' because the vmtParent is different between Delphi versions.
The problem is that this happens deep inside the Bold framework so you may have trouble guarding against it in your application code.
You can try this on your objects that are used in the DMAttracsTimers-code (which I assume is your application code):
Assert(Integer(Obj.ClassType)<>1,'Corrupt vmt');
It sounds like you have memory corruption of object instance data.
The VMT itself isn't getting corrupted, FWIW: the VMT is (normally) stored in the executable and the pages that map to it are read-only. Rather, as VilleK says, it looks like the first field of the instance data in your case got overwritten with a 32-bit integer with value 1. This is easy enough to verify: check the instance data of the object whose method call failed, and verify that the first dword is 00000001.
If it is indeed the VMT pointer in the instance data that is being corrupted, here's how I'd find the code that corrupts it:
Make sure there is an automated way to reproduce the issue that doesn't require user input. The issue may be only reproducible on a single machine without reboots between reproductions owing to how Windows may choose to lay out memory.
Reproduce the issue and note the address of the instance data whose memory is corrupted.
Rerun and check the second reproduction: make sure that the address of the instance data that was corrupted in the second run is the same as the address from the first run.
Now, step into a third run, put a 4-byte data breakpoint on the section of memory indicated by the previous two runs. The point is to break on every modification to this memory. At least one break should be the TObject.InitInstance call which fills in the VMT pointer; there may be others related to instance construction, such as in the memory allocator; and in the worst case, the relevant instance data may have been recycled memory from previous instances. To cut down on the amount of stepping needed, make the data breakpoint log the call stack, but not actually break. By checking the call stacks after the virtual call fails, you should be able to find the bad write.
mghie is right of course. (fastmm4 calls the flag fulldebugmode or something like that).
Note that that works usually with barriers just before and after an heap allocation that are regularly checked (on every heapmgr access?).
This has two consequences:
the place where fastmm detects the error might deviate from the spot where it happens
a total random write (not overflow of existing allocation) might not be detected.
So here are some other things to think about:
enable runtime checking
review all your compiler's warnings.
Try to compile with a different delphi version or FPC. Other compilers/rtls/heapmanagers have different layouts, and that could lead to the error being caught easier.
If that all yields nothing, try to simplify the application till it goes away. Then investigate the most recent commented/ifdefed parts.
The first thing I would do is add MadExcept to your application and get a stack traceback that prints out the exact calling tree, which will give you some idea what is going on here. Instead of a random exception and a binary/hex memory address, you need to see a calling tree, with the values of all parameters and local variables from the stack.
If I suspect memory corruption in a structure that is key to my application, I will often write extra code to make tracking this bug possible.
For example, in memory structures (class or record types) can be arranged to have a Magic1:Word at the beginning and a Magic2:Word at the end of each record in memory. An integrity check function can check the integrity of those structures by looking to see for each record Magic1 and Magic2 have not been changed from what they were set to in the constructor. The Destructor would change Magic1 and Magic2 to other values such as $FFFF.
I also would consider adding trace-logging to my application. Trace logging in delphi applications often starts with me declaring a TraceForm form, with a TMemo on there, and the TraceForm.Trace(msg:String) function starts out as "Memo1.Lines.Add(msg)". As my application matures, the trace logging facilities are the way I watch running applications for overall patterns in their behaviour, and misbehaviour. Then, when a "random" crash or memory corruption with "no explanation" happens, I have a trace log to go back through and see what has lead to this particular case.
Sometimes it is not memory corruption but simple basic errors (I forgot to check if X is assigned, then I go dereference it: X.DoSomething(...) that assumes X is assigned, but it isn't.
I Noticed that a timer is in the stack trace.
I have seen a lot of strange errors where the cause was the timer event is fired after the form i free'ed.
The reason is that a timer event cound be put on the message que, and noge get processed brfor the destruction of other components.
One way around that problem is disabling the timer as the first entry in the destroy of the form. After disabling the time call Application.processMessages, so any timer events is processed before destroying the components.
Another way is checking if the form is destroying in the timerevent. (csDestroying in componentstate).
Can you post the sourcecode of this procedure?
BoldSystem.TBoldMember.CalculateDerivedMemberWithExpression
(BoldSystem.pas:4016)
So we can see what's happening on line 4016.
And also the CPU view of this function?
(just set a breakpoint on line 4016 of this procedure and run. And copy+paste the CPU view contents if you hit the breakpoint). So we can see which CPU instruction is at address 00404E78.
Could there be a problem with re-entrant code?
Try putting some guard code around the TTimer event handler code:
procedure TAttracsTimerDataModule.AttracsTimerTimer(ASender: TObject);
begin
if FInTimer then
begin
// Let us know there is a problem or log it to a file, or something.
// Even throw an exception
OutputDebugString('Timer called re-entrantly!');
Exit; //======>
end;
FInTimer := True;
try
// method contents
finally
FInTimer := False;
end;
end;
N#
I think there is another possibility: the timer is fired to check if there are "Dangling Logon Sessions". Then, a call is done on a TLogonSession object to check if it may be dropped (_GetMayDropSession), right? But what if the object is destroyed already? Maybe due to thread safety issues or just a .Free call and not a FreeAndNil call (so a variable is still <> nil) etc etc. In the mean time, other objects are created so the memory gets reused. If you try to acces the variable some time later, you can/will get random errors...
An example:
procedure TForm11.Button1Click(Sender: TObject);
var
c: TComponent;
i: Integer;
p: pointer;
begin
//create
c := TComponent.Create(nil);
//get size and memory
i := c.InstanceSize;
p := Pointer(c);
//destroy component
c.Free;
//this call will succeed, object is gone, but memory still "valid"
c.InheritsFrom(TObject);
//overwrite memory
FillChar(p, i, 1);
//CRASH!
c.InheritsFrom(TObject);
end;
Access violation at address 004619D9 in module 'Project10.exe'. Read of address 01010101.
Isn't the problem that "_GetMayDropSession" is referencing a freed session variable?
I have seen this kind of errors before, in TMS where objects were freed and referenced in an onchange etc (only in some situations it gave errors, very difficult/impossible to reproduce, is fixed now by TMS :-) ). Also with RemObjects sessions I got something similar (due to bad programming bug by myself).
I would try to add a dummy variable to the session class and check for it's value:
public variable iMagicNumber: integer;
constructor create: iMagicNumber := 1234567;
destructor destroy: iMagicNumber := -1;
"other procedures": assert(iMagicNumber = 1234567)

Resources