How do I debug my program when it hangs? - delphi

I have an application which takes measurements every second (I am running it in demo mode & generating random data, so the problem is not to do with reading from devices attached to the serial port).
After 5 or 6 minutes it hangs.
I have added
try
// entire body of procedure/function goes here
except
on E: Exception do
begin
MessageDlg('Internal coding error in <function name>()',
mtError, [mbOK], 0);
end;
end;
to every single function (and Application.Run() in the project file), but I don't see any message dialogs. Any idea how I can test this?
Update: I woudl guess sit to be a resource problem, either RAM or MySql database - but other programs are running ok, and it's only 5 floats and timestamp that get saved with each measurement, so both seem unlikely after such a short time.
Solution: there were many great answers (thanks and +1 all round), but I finally got it (as suggested) by running in the IDE and using Run/Pause to see that it was an ever increasing loop.
Thanks again, everyone.

I'd try the following:
Attach and click Pause and see where it is, what threads are running, what threads are waiting (if all of them then you have a deadlock).
Refactor your main method into bunch of small ones (you may have already done that) and then replace small ones with dummy/hardcoded values. This may help but not necessarily identify faulty block.
Watch the system resource consumption (handles, threads, etc.) with PerfMon or something. See if you running out of memory and start using HDD.
If you using sockets, check if your reading timeout is set to infinity. If yes then change to some value and watch for timeouts.
In .NET it's possible to enabling handling of all exceptions, which means before the code handle it (like in catch statement), the IDE breaks at the point of the exception. Enable that in Delphi if possible and see if you are getting any.

Use the debugger to find out what your app is doing when it appears to hang.
Run your program under the debugger. Let it run until it hangs. When it hangs, switch to the debugger and select "Debug: Break All" (or equivalent) to make the debugger freeze all threads and take control of the process.
Now open the Threads view in the debugger and examine the stack traces of each thread in your program. Start with the main thread, obviously. Be sure to look back through the call stack for several calls to see if you recognize any of your code. If you find a stack trace that is in the middle of a loop, examine the local variables to see if your loop control variable has somehow slipped past the exit condition and put you into an infinite loop.
If your stack traces indicate that every thread is blocked waiting for some external event, you may have a thread deadlock - thread A taking lock A, then trying to take lock B, while thread B is holding lock B and trying to take lock A. If you're not using threads in your program, this is less likely but still keep an eye out for it.
If nothing odd jumps out at you after reviewing the stack traces of the threads, let the program run a few seconds more, then break into it again with the debugger and have another look around. See what's different in the stack traces.
This should help you narrow down at least which bodies of code are involved in your hang.

If you don't see an exception before adding the exception handlers, you won't produce any MessageDlgs to see.
If the program is hanging (rather than blowing up with an exception) you could have a looping problem, or you could have some blocking call that's never completing. Write log messages to a window or file (you can use OutputDebugStr) to isolate the problem to one section of your code — at least to the body of one function. You may see the problem right away. If not, you can use OutputDebugStr, breakpoints, and trace to learn what's happening in that section of code on a line-by-line basis.

First: try to debug it in Delphi IDE.
Second: if you can't do this (on customer PC), try the "Process Stack Viewer" of my (open source) sampling profiler:
http://code.google.com/p/asmprofiler/wiki/ProcessStackViewer
(you need some debug info: a .map or .jdbg file)
Then look at the stack of your threads (probably main/first thread). You can post the stack trace here then (if you can't find the problem).

If you want help with this sort of thing in an app that you cannot use the IDE for, then something like madExcept can help a lot. It has an automatic freeze checker for the main thread, and you can have it give you a stack dump to show what it was doing when it was frozen. The user can choose to kill or continue, and the app can tell madExcept it is busy and not to alert if appropriate (for doing long analysis or printing or something).

Related

Recurrent exception in Delphi TService on W10 and Server 2012r2

I am working on a Delphi application using the TService functionalities.
It is a large project that was started by someone else.
The application uses several separate threads for processing, communicating with clients, database access, etc. The application’s main job is to poll regularly (every 2-300ms) certain devices and, a couple times a day, execute specific actions.
Now somewhere there is an unhandled exception for which I cannot seem to find the cause:
According to the debugger, the faulting method is System.Classes.StdWndProc.
This also seems to be confirmed by analyzing the crash dump file with WinDbg.
After numerous tests and debugging, I noticed that the crash happened on my dev computer every day at almost the same time.
I looked at the windows log and found this event:
This, coupled with the fact that the stack trace from Delphi indicates that a message with ID=26 (WM_WININICHANGE) was processed, made me believe that there might be something wrong with my usage of FormatDateTime() or DateTimeToStr() when regional settings are reloaded.
I checked every call and made sure to be using the thread-safe overload with a local instance of TFormatSettings.
However today the service crashed again.
A few points that I think are worth mentioning:
The application is also installed on a Windows 2008 server and has
been running OK for over a month.
On 2012r2 I tried forcing DEP off, but it didn’t change anything.
The service’s OnExecute() method is not implemented. I create a base thread in TService. ServiceStart() which then in turn creates the main data module and all the other threads.
The service is not marked as interactive and is executed with the Local System account.
All data modules are created with AOwner=nil.
With a special parameter, the application can be started a normal windowed application with a main form (which is created only in this case). The exception does not seem to happen when running in GUI mode.
Almost all threads have a message pump and use PostThreadMessage() to exchange information. There are no window handle allocations anywhere.
I have checked the whole project and there are no timers or message dialogs or other graphical components anywhere.
I have activated range as well as overflow checking and found no issues.
I am a loss for what to do here. I have checked and re-checked the code several times without finding anything that could explain the error.
Looking online I found several reports that seem to be pertinent to my situation, but none that actually explains what is going on:
https://answers.microsoft.com/en-us/windows/forum/windows_10-other_settings/windows-10-group-policy-application-hang/72016ea4-ba89-4770-b1de-6ddf14b0a51f
https://www.experts-exchange.com/questions/20720591/Prevent-regional-settings-from-changing-inside-a-TService-class.html
https://forums.embarcadero.com/thread.jspa?messageID=832265
Before taking everything apart I would like to know if anyone experienced anything similar or has any tips.
Thanks
Edit: looking at the call stack and the first method executed I am thinking of the TApplication instance that is used in TService.
LPARAM is always 648680.
LPARAM is a pointer to a string (you can cast it to PChar).
You could try to catch WM_SETTINGS_CHANGE and log anytime it's processing.
I think you could use TApplicationEvents component. The component has OnSettingsChange event. The event provides a setting area name (section name) and a flag that points to changed parameter.
Have a look at doc-wiki.
There is a full list of possible parameters.
I did not test it, but maybe you can use Abort procedure in OnSettingsChange event handler to stop the message distribution. Of course it's not a solution, but it may work.

Delphi 5 application partially loaded in task manager, takes forever to actually display

I have an application written in Delphi 5, which runs fine on most (windows) computers.
However, occasionally the program begins to load (you can see it in task manager, uses about 2.5-3 MB of memory), but then stalls for a number of minutes, sometimes hours.
If you leave it long enough, the formshow event will eventually occur and the application window will pop up, but it seems like some other application or windows setting is preventing it from initially using all the memory it needs to run (approx. 35-40 MB).
Also, on some of my client's workstations, if they have MS Outlook running, they can close it and my application will pop up. Does anyone know what is going on here, and/or how to fix it?
Since nobody has given a better answer I'll take a stab at how to solve this:
There's something in your initialization that is locking it up somehow. Without seeing your code I do not know what it is so I'll only address how to go about finding it:
You need to log what you accomplish during startup. If you have any kind of screen showing I find the window title useful for this but it sounds like you don't--that means you need to write the log to a file. Let it get lost, kill the task and see where it got.
Note that this means you need to cleanly write your data despite an abnormal program termination. How to go about this:
A) Append, write your line, close.
B) Write your line, then flush the file handle.
C) Initially write your file to consist of a large number of blanks--ensure this is larger than the actual log will be. Write your line. In case of abnormal termination it will retain the original larger file size.
I would write a timestamp on every log item so you can see if it's just processing something too slowly.
If examining the log shows you where the problem is, fine. If, as usually happens, it's not enough you put a bunch more logging between the last item that did get logged and the next one that didn't--I've been known to log every line when hunting a cryptic problem that only happened on someone else's system.
If finding the line isn't enough to pinpoint the problem also dump the value of relevant variables.
Finally, if such intense scrutiny makes the bug go away start looking for an uninitialized variable. (While a memory stomp is also an option I doubt it's the culprit here.)

AV after successful close of applications

I am getting this AV message about 3 to 5 seconds after the applications close as expected:
Exception EAccessViolation in module rtl160.bpl at 00073225. Access violation at address 500A3225 in module 'rtl160.bpl'. Read of address 00000004.
These (20) applications are very similar in that they are IBX business applications. About half of them did not cause the AV to occur.
These applications were ported from Delphi-xe and they worked flawlessly for a long time. No changes were made to the projects in the port. Both 32 and 64 bit builds gave the same results.
Is this a bug in some library's finalization section freeing a resource or something?
I am using Delphi-XE2 Update 3.
Would appreciate the help.
Try using madExcept / EurekaLog etc. - they give you detailed stack trace on AV. This is not always a panacea, but can point you to the problem.
Access Violations are by their nature already very troublesome beasts since they deal with invalid pointers in memory. One that occurs a while after an application shuts down is even worse because that's when your app is in "cleanup" mode. You're could be dealing with something that went wrong much earlier in the application, but is only exposing itself at shutdown.
General Tips:
Try to always undo things in the reverse order you did them. E.g.
Create A, Create B ... Destroy B, Destroy A
Connect to Database, Open Dataset ... Close Dataset, Disconnect from Database
Even making sure you've done all the above before shutting down can help tremendously.
Any threads that are still running while your application is running can cause problems.
Preferably ensure all your child threads are properly terminated before final shutdown.
Refer back to Closing datasets above. Depending on what you're doing, some database components will create their own threads.
If you're using COM, try ensure ComObj is high up in the initialization sequence (I.e. place it as high as possible in your DPR).
Delphi finalizes units in the reverse order that they were initialized.
And you don't want ComObj to finalize before other things that are dependent on ComObj have also done so.
If you're using interface references, make sure you resolve circular reference issues.
Some of these problems can be tricky to find, but you can do the following:
Setup a source-code "sandbox" environment (you're going to chuck all your changes as soon as you've found the problem).
Figure out the simplest set of steps required to guarantee the error. (Start app and immediately shutdown would be ideal.)
Then you're going to comment-out delete wipe out chunks of code between tests and basically follow a divide and conquer approach to:
rip out code
test
if the problem persists, repeat. Else roll-back and rip out a different chunk of code.
eventually your code base will be small enough to pinpoint likely problems which can be tackled with targeted testing.
I've had this kind of access violation problem on occasion with old Delphi or C++Builder projects. Today I had it with C++Builder. At the time of the crash, by looking in the Debug -> Call Stack window, I can see that it's happening inside a call to fflush, called by __exit_streams and _exit.
I'm not sure what is causing it, since it's so deep in the Borland library code, but it seems to come and go at random when the code changes. And it seems to be more common with multi-form applications.
This time the error went away when I just added a new button on the main form. A button which is just there, has no event handlers and does not do anything. I think that any random change to the code, classes, variables etc rearranges the memory layout when you relink the application, and that either triggers or untriggers the error.
For now, I just leave the new button on the form, set it to "not visible" so that there's no visible change. As it seems to work, it's good enough solution for me at this time.

How do you cleanly abort a Delphi program?

I've got a program that's having some trouble during shutdown, raising exceptions that I can't trace back to their source. It appears to be timing-related and non-deterministic. This is occurring after all shared resources have been released, and since it's shutdown, memory leaks are not an issue, so that makes me wonder if there's any way to just tell the program to terminate immediately and silently after releasing the shared resources, instead of continuing with the shutdown sequence and giving an exception message box.
Does anyone know how to do that?
After looking at the Delphi Run Time Library source code, and at the Microsoft documentation; I can corroborate Mason and Paul-Jan comments.
The hierarchy of shutdown is as follows
Application.Terminate()
performs some unidentified housekeeping of application
calls Halt()
Halt()
calls ExitProc if set
alerts the user in case of runtime error
get rid of PackageLoad call contexts that might be pending
finalize all units
clear all exception handlers
call ExitprocessProc if set
and finally, call ExitProcess() from 'kernel32.dll'
ExitProcess()
unloads all DLLs
uses TerminateProcess() to kill the process
ExitProcess(0) ?
Halt(0) used to be the good old fashioned way of telling the program to end with immediate effect. There's probably a more Delphi-friendly way of doing that now, but I'm 95% sure halt(0) still works. :-)
In case HeartWare's suggestion of using ExitProcess() fails, it might be you are using some DLL's that do not respond well to the DLL_PROCESS_DETACH. In that case, try using a TerminateProcess( getCurrentProcess, 0 );
Once you resort to such measures, one might wonder if the "cleanly" part of the topic title still holds up to scrutiny.
The last time I had to hunt a problem like this was the shutdown was a causing an event (resize? It's been a while.) to fire on the dying window causing an attempt to redraw something that needed stuff that had already been disposed of.

Overlapped serial port and Blue Screen of Death

I created a class that handles serial port asynchronously. I use it to communicate with a modem. I have no idea why, but sometimes, when I close my application, I get the Blue Screen and my computer restarts. I logged my code step by step, but when the BSOD appeared, and my computer restarted, the file into which I was logging data contained only white spaces. Therefore I have no idea, what the reason of the BSOD could be.
I looked through my code carefully and I found several possible reasons of the problem (I was looking for all that could lead to accessing unallocated memory and causing AV exceptions).
When I rethought the idea of asynchronous operations, a few things came to my mind. Please verify whether these are right:
1) WaitCommEvent() takes a pointer to the overlapped structure. Therefore, if I call WaitCommEvent() inside a function and then leave the function, the overlapped structure cannot be a local variable, right? The event mask variable and event handle too, right?
2) ReadFile() and WriteFile() also take references or pointers to variables. Therefore all these variables have to be accessible until the overlapped read or write operations finish, right?
3) I call WaitCommEvent() only once and check for its result in a loop, in the mean time doing other things. Because I have no idea how to terminate asynchronous operations (is it possible?), when I destroy my class that keeps a handle to a serial port, I first close the handle, and then wait for the event in the overlapped structure that was used when calling the WaitCommEvent() function. I do this to be sure that the thread that waits asynchronously for a comm event does not access any fields of my class which is destroyed. Is it a good idea or is it stupid?
try
CloseHandle(FSerialPortHandle);
if Assigned(FWaitCommEvent) then
FWaitCommEvent.WaitFor(INFINITE);
finally
FSerialPortHandle := INVALID_HANDLE_VALUE;
FreeAndNil(FWaitCommEvent);
end;
Before I noticed all these, most of the variables mentioned in point one and two were local variables of the functions that called the three methods above. Could it be the reason of the BSOD or should I look for some other mistakes in my code?
When I corrected the code, the BSOD stopped occuring, but It might be a coincidence. How do you think?
Any ideas will be appreciated. Thanks in advance.
I read the CancelIo() function documentation and it states that this method cancells all I/O operations issued by the calling thread. Is it OK to wait for the FWaitCommEvent after calling CancelIo() if I know that WaitCommEvent() was issued by a different thread than the one that calls CancelIo()?
if Assigned(FWaitCommEvent) and CancelIo(FSerialPortHandle) then
begin
FWaitCommEvent.WaitFor(INFINITE);
FreeAndNil(FWaitCommEvent);
end;
I checked what happens in such case and the thread calling this piece of code didn't get deadlocked even though it did not issue WaitCommEvent(). I tested in on Windows 7 (if it matters). May I leave the code as is or is it dangerous? Maybe I misunderstood the documentation and this is the reason of my question. I apologize for asking so many questions, but I really need to be sure about that.
Thanks.
An application running as a standard user should never be able to cause a bug check (a.k.a. BSOD). (And an application running as an Administrator should have to go well out of its way to do so.) Either you ran into a driver bug or you have bad hardware.
By default, Windows is configured to save a minidump in %SystemRoot%\minidump whenever a bug check occurs. You may be able to determine more information about the crash by loading the minidump file in WinDbg, configuring WinDbg to use the Microsoft public symbol store, and running the !analyze -v command in WinDbg. At the very least, this should identify what driver is probably at fault (though I would guess it's your modem driver).
Yes, you do need to keep the TOverlapped structure available for the duration of the overlapped operation. You're going to call GetOverlappedResult at some point, and GetOverlappedResult says it should receive a pointer to a structure that was used when starting the overlapped operation. The event mask and handle can be stored in local variables if you want; you're going to have a copy of them in the TOverlapped structure anyway.
Yes, the buffers that ReadFile and WriteFile use must remain valid. They do not make their own local copies to use internally. The documentation for ReadFile even says so:
This buffer must remain valid for the duration of the read operation. The caller must not use this buffer until the read operation is completed.
If you weren't obeying that rule, then you were likely reading into unreserved stack space, which could easily cause all sorts of unexpected behavior.
To cancel an overlapped I/O operation, use CancelIo. It's essential that you not free the memory of your TOverlapped record until you're sure the associated operation has terminated. Likewise for the buffer you're reading or writing. CancelIo does not cancel the operation immediately, so your buffers might still be in use even after you call it.

Resources