Hunting down application errors coming from csrss.exe - delphi

I'm the maintainer of a legacy Delphi application. On machines running this program an Application Error appears sometimes with the caption referring to this Delphi app and a message like the following:
The instruction at "..." referenced memory at "...". The memory could not be "read".
Click on OK to terminate the program.
Task Manager says the process belonging to this message box is csrss.exe. What would be a systematic way to find the root cause of this error?
The problem is that this Delphi program is fairly complex and the error message appears relatively rarely, so I cannot simply step through the code and find the part which causes the error. Moreover, the app runs automatically, without user interaction, so I can't ask the user what she was doing when the message appeared. Application and system logs don't indicate any problem. The app does not stop working while the message box is present.
I hope someone has run into such an error message before and was able to solve the problem. Thank you for your help in advance.

csrss supports Windows consoles. I expect that your application targets the console subsystem.
If you cannot get your application to fail under the debugger then you need to add some diagnostics to it. I recommend using a tool like madExcept or EurekaLog to do this. Personally I use madExcept and cannot recommend it highly enough. From what I have heard, EurekaLog is also a fine product.
Integrate one of these tools with your application and the next time it faults it will produce a detailed diagnostics report. Most significantly you will get stack traces for each thread in your process. The stack trace for the faulting thread should hopefully lead you to the root cause of your program's bug.
The doubt I have is that if the fault is occurring in csrss then including diagnostics in your process may not bear fruit. It's plausible that your application already faulted, which in turn led to the error message in csrss, in which case diagnostics in your app will help. If not, then you may need to find a way to make the fault occur in your process.

In addition to David's recommendation I would recommend using ProcDump from Sysinternals to monitor the process and have it write a dump file when an unhandled exception occurs.
You can analyze the dump file offline with WinDbg and the like. While that might seem overwhelming at first, I strongly believe there's a lot to be gained by getting yourself up to speed with WinDbg.
Introduction
ProcDump is a command-line utility whose primary purpose is monitoring
an application for CPU spikes and generating crash dumps during a
spike that an administrator or developer can use to determine the
cause of the spike. ProcDump also includes hung window monitoring
(using the same definition of a window hang that Windows and Task
Manager use), unhandled exception monitoring and can generate dumps
based on the values of system performance counters.
Example
Launch a process and then monitor it for exceptions:
C:\>procdump -e 1 -f "" -x c:\dumps consume.exe

Related

EXC_BAD_ACCESS but NSZombies never triggered, how to debug?

We have some very random bugs happening and throwing EXC_BAD_ACCESS, or malloc_error_break, or abort. There is no consistency to the exceptions, and enabling NSZombies doesn't cause any zombies to be triggered ever.
In fact, running with zombies enabled causes the crashes never to occur. I believe there is a subtle memory error in this codebase, and after spending many hours cleaning up what could be minor issues, we still have not solved the issue.
It makes sense that a bad pointer may be overwriting a piece of memory, which is then later dereferenced and crashes the app. But what are other ways to isolate the underlying issue?
We have used all the diagnostic memory tools which will run with the device attached as well (this application uses a peripheral, so cannot be fully debugged in the simulator).
NSZombie is just a mechanism to poison space used by objects instead of freeing them, similar to Address Sanitizer's memory poisoning. With instrumentation such as NSZombie or the above-mentioned ASAN, your stack and heap allocations are laid out differently, so the same bug can show up differently or not at all. With undefined behavior like this, a crash is likely the best-case scenario.
EXC_BAD_ACCESS means you tried to access an invalid address, or tried to read or write a memory region you do not have permission for. The inconsistencies you're running into are likely consequences of a nasty stack or heap corruption, such as sometimes overwriting a variable that just holds data and sometimes overwriting a pointer being used by the program.
Data layout matters a lot for what happens, and heap layout is often randomized in non-debug builds, which adds even more room for inconsistent crashes. In addition, any change to program source code or build settings will likely change the data layout.
I would recommend:
Build in debug mode (-g compiler flag) and run with a debugger attached. When you get a crash, gdb or lldb (the latter being the default for Xcode tooling) will stop execution and let you inspect the program's state. From there, get a stack trace using bt; that may let you work out the deeper cause of the problem.
Use ASAN, which is supported within Xcode tooling. It's generally an excellent tool for dealing with memory issues. Beware that using it with shared libraries built without support for it may cause anomalies, but it usually tells you about them and generally tries to hold your hand as much as it can.
"printf debugging" can help: defining something like #define TRACE printf("%s %s:%d\n", __func__, __FILE__, __LINE__); and scattering TRACE across the likely problem points can actually be helpful.
In general, I would suggest using a debugger first, without NSZombie or anything, just do a bunch of runs to a point of a crash and get stack traces, register states etc. Having those samples across multiple crashes can help you narrow down the problem.
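The printf-debugging suggestion above can be sketched as follows. This is only an illustration: `parse_packet` is a hypothetical function standing in for whatever code path you suspect, and the macro writes to stderr and flushes so that the last trace line is not lost in a buffer if the process dies right afterwards.

```c
#include <stdio.h>

/* Trace macro: prints function, file and line to stderr, then flushes.
   The do/while(0) wrapper makes it behave like a single statement. */
#define TRACE() \
    do { \
        fprintf(stderr, "%s %s:%d\n", __func__, __FILE__, __LINE__); \
        fflush(stderr); \
    } while (0)

/* Hypothetical suspect function: returns the first byte of the buffer,
   or -1 if the arguments are invalid. Each branch leaves a trace line. */
int parse_packet(const unsigned char *buf, int len)
{
    TRACE();
    if (buf == NULL || len <= 0) {
        TRACE();
        return -1;
    }
    TRACE();
    return buf[0];
}
```

After a crash, the last line printed tells you the last trace point that was reached, which narrows down where to look next.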

How to handle third party functions or threads that have hung in Delphi?

I use a lot of components in my Delphi 7 Service application, Indy, Synapse, Zeolibs, etc.
My application is generally stable, I use Eurekalog 6 to capture exceptions, but in rare situations, some threads hang because a 3rd party function it calls has hung, e.g. Indy gets stuck when trying to send email.
In many cases, the application that hung is at my customer's site and I have no access to their computer, so it is not possible for me to do a live debug. My application requires high availability, so even if it hangs once a year, that is not acceptable to my users.
I am now looking for the best way to deal with such a situation, where debugging is not feasible but I still need the application to recover by itself. Is it possible for a thread to terminate if a function it calls hangs? Alternatively, I can also restart the entire service when that happens. How about a watchdog, and what is the best way to implement it? Thanks.
I think you are being rather defeatist. Find and fix the bugs. It might be tricky, but it's the right solution.
Killing threads whose behaviour you don't understand is never the solution. If you start killing threads you'll likely make things worse. That can lead to other runtime errors, deadlock and so on. Once you start killing threads you've lost control.
Now, it would be safe to kill the process (rather than a specific thread) and rely on a watchdog service to restart the process. But that's a really dire solution.
You should certainly use a tool like madExcept, EurekaLog etc. to debug unexpected exceptions. I see you are already using EurekaLog - that's good.
Deadlocks (it sounds like you have deadlock) can be more tricky to chase down. One good way to debug a deadlock is to get your client to produce a crash dump (e.g. from Process Explorer). Then debug it in WinDbg using map2dbg to produce symbolic stack traces. That will tell you which threads are blocking and that reveals the deadlock. And then fix the bugs.
For more details on this deadlock debugging technique see here: http://capnbry.net/blog/?p=18
I'm not familiar with EurekaLog since I use madExcept, but I would expect EurekaLog has a facility to allow generation of thread stack traces for a hung process. If so then that would most likely be the best approach for you.
Your question is rather too vague. If you don't know which of the various components you're using you wish to blame, then you have zero hope of fixing it. The most likely thing is you're doing something wrong, or that you don't understand how these components work. I very much doubt that it's purely a bug in the components themselves, but hey, either way it's all on you to find what's having a problem, and your job to fix it.
A deadlock that you've created, or a deep process corruption issue, may prevent madExcept from giving you any information, but it's worth trying.
To find out which one is freezing, if any at all, the madExcept suggestion is the best one yet. It will time out (after a configurable number of seconds) and raise an artificial exception for you, interrupting your hung process. This works for user code, and for places where the thread is blocked in a Win32 or kernel function. For example, it's possible that you've set up Indy for infinite timeouts, as that's the default these days in Indy 10, and that what you're experiencing is a timeout-related freeze, where network activity that you expected to complete, but which never will, is causing your program to "hang". The cure here is to change your timeouts.
However, until you figure out WHERE the problem is, I doubt you'll be able to fix it. And so, for that, again, Marcus is right, you should be looking into madExcept. I can't live without it.
Secondly, you should really be adding trace logic to your program, so you know where it's going and what it was doing just before it had a problem. If you really need help doing that, you could try CodeSite, from Raize. Personally I find that OutputDebugString combined with the free Microsoft DebugView utility (formerly from SysInternals) is more than enough to debug such problems on a client computer.
Any program with background threads that does not have trace logging is a badly designed program. Heck, any non-trivial single-threaded application that might ever fail or have problems needs trace logging.
Logging is always going to help, even when MadExcept or other exception tools don't. Trace-Logging is usually a roll-your-own solution, although CodeSite is also quite popular.

Delphi: How can I debug access violations when closing my application?

I'm using Delphi 6 and I've got an application which when being shut down produces access violation errors. We use EurekaLog so I'm getting stack traces for debugging, but the errors seem to occur randomly in a different unit each time, but always when something is being freed in the finalization section.
How can I go about debugging this to see what's causing the problem? I'm not sure how to start debugging things that happen when the application is being finalised.
[Edit:] Sorry if I was unclear, perhaps a better question would be: What's the best place to start debugging with breakpoints if I only want to walk through the finalisation sections? The errors seem to arise in third party components we use (the devexpress dx/cxgrid library) so I'd like to start debugging in my code at pretty much the last point before Delphi will start calling finalise routines in other units.
This isn't much to go on, but if I had to guess, based on past experience... are you using packages or COM libraries? If you've got a global variable that's an interface, or an object whose class is declared in a BPL, and you unload the DLL/BPL before cleaning up the object/interface, you'll get access violations because your code is trying to do a VMT lookup in address space that is no longer mapped into the application.
Check for that and make sure you clean up all such variables before finalization begins.
When the application is shutting down, do not free things in the finalization section.
1) When the application shuts down, Windows frees all the application memory. You don't have to do that.
2) When the application shuts down, the memory is released, and the infrastructure is unloaded. You can't call code to close or free objects, because that code may have been already unloaded. You can't access pointers to memory, because those pointers may have already been released.
3) When you try to free things in the finalization section while the application is shutting down, you may get failures that prevent your code from finalizing, thus preventing the application from shutting down, causing a hung application and memory loss. Which is what you were trying to prevent in the first place. Don't do it.
Ok, when you are running on Win95/98, or using external processes, you may in some circumstances have to free shared resources and notify those external processes that you are shutting down. Apart from that, it all happens automagically now.

Trouble reading memory

When I run my code through the debugger, after a series of steps it eventually gets lost and executes commands out of order. I'm not sure if the stack is overflowing or what.
This is the error I usually get:
MSP430: Trouble Reading Memory Block at 0xffe2e on Page 0 of Length 0x1d2: Invalid parameter(s)
Any suggestions on what it could be? I read briefly about possible issues with not handling some interrupts.
Also, I'm trying to fill my RAM with a specific value so that I can tell if the stack is overflowing, any suggestions on how to fill the entire RAM with, say a value of 0x1234?
Thanks!
What debugger and compiler are you using? I've found that msp430-gcc and msp430-gdb/gdbproxy can get very confused with GCC optimizations turned on. However, broken code is sometimes emitted without them turned on (it's a quality product, really).
The easiest way to fill memory is to modify your crt0.s startup file and link it yourself. Where the startup code sets memory to 0, you can change the pattern there.
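The idea behind filling RAM with a known value is often called stack painting, and the check itself is simple. The sketch below is plain C rather than startup assembly and operates on an ordinary array. On a real target you would run the fill before main over the stack region defined by your linker script, which is not shown here. The names `paint_region` and `untouched_words` are made up for illustration, and the scan assumes a stack that grows downwards, as it does on the MSP430.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0x1234u  /* the pattern mentioned in the question */

/* Fill a region with the pattern. On a real target this would be done
   in the startup code, before the region is used as a stack. */
void paint_region(uint16_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++)
        base[i] = STACK_FILL;
}

/* Count the words at the low end that still hold the pattern.
   With a downward-growing stack the lowest addresses are used last,
   so this is the remaining headroom. A result of 0 means the stack
   reached the end of the region and has probably overflowed. */
size_t untouched_words(const uint16_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == STACK_FILL)
        n++;
    return n;
}
```

Pausing the target and inspecting the region (or calling a check like `untouched_words` periodically) shows how close the stack has come to its limit. A stack value that happens to equal the pattern can make the headroom look slightly larger than it is.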
Which device are you using? On 16-bit devices, 0xffe2e is outside of the address space of the processor, likely an array index or similar which has gone negative.
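One detail in the error message fits the "gone negative" theory: the address 0xffe2e plus the length 0x1d2 is exactly 0x100000. If the device uses 20-bit addressing (an assumption, since the question does not name the device), then 0xffe2e is what you get when an offset of -0x1d2 wraps around. It could also be a coincidence of how the debugger chose its read window, so treat this as a hint only. The helper name below is made up for illustration.

```c
#include <stdint.h>

/* Reduce a signed value to a 20-bit address, the way address
   arithmetic wraps on a 20-bit address bus. */
uint32_t wrap_to_20_bits(int32_t value)
{
    return (uint32_t)value & 0xFFFFFu;
}
```

With this, `wrap_to_20_bits(-0x1D2)` evaluates to `0xFFE2E`, the address from the error message.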
I have seen this error as well when using Code Composer Studio and TI's USBFET programmer, although I have not been able to nail down a single, definite cause.
Assuming you are using CCS, here are some tips:
1) Catch ACCV (UNMI) and VMA (SYSNMI) interrupts and set a break point within the handlers. If one of these trips, examine the stack for clues as to what triggered the interrupt.
2) If you have any interrupt handlers which re-enable interrupts (GIE bit), make sure they are not being retriggered repeatedly.
3) I have seen this error (inexplicably) when stepping through optimized code; so it may help to turn off optimizations.
If you are using Code Composer Studio, as an alternative to initializing your RAM, you can set a breakpoint on stack overflow. Also, with a paused debug session, CCS gives you the option to fill a portion of memory with any value you choose via the "Memory" sub-window.

Application freezes on exit - how to debug?

We have an application made up of a host (exe) and a lot of modules (DLLs) containing GUI etc.
Sometimes, the application freezes on shutdown. Mostly it happens during testing through TestComplete. We are not able to reproduce the behavior during debugging.
How can we find out why the application freezes?
I would guess that it is related to threads, but I do not know for sure.
Are there any tools or techniques we should try out?
I think that good old logging would help you. Add some logging to every unit finalization, add such logging to destructors of global data (database connection, global configuration etc). Of course do not destroy logger object.
If your application is multithreaded then add some logging to working threads such as writing '[date] thread [name of class] working' and write it every few seconds (you can use some debug mode). Also add logging when thread discovers that it should terminate.
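The "write it every few seconds" part of the thread logging above can be sketched like this. Both helper names are hypothetical, the timestamp is written as raw seconds to keep the example short, and each thread is assumed to keep its own `last` value so that no locking is needed for the throttle itself.

```c
#include <stdio.h>
#include <time.h>

/* Returns 1 when at least interval_s seconds have passed since *last,
   and records the new time. Otherwise returns 0 and leaves *last alone. */
int heartbeat_due(time_t *last, time_t now, double interval_s)
{
    if (difftime(now, *last) < interval_s)
        return 0;
    *last = now;
    return 1;
}

/* Formats "[<seconds>] thread <name> working" into buf.
   Returns the number of characters the full line needs, as snprintf does. */
int format_heartbeat(char *buf, size_t size, const char *name, time_t now)
{
    return snprintf(buf, size, "[%lld] thread %s working",
                    (long long)now, name);
}
```

A worker loop would call `heartbeat_due` on every iteration and only write the formatted line when it returns 1. If a thread's heartbeat lines stop appearing in the log before the freeze, that thread is the first place to look.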
Also use some system utilities such as Process Monitor, Handle and Process Explorer (all by Sysinternals/Microsoft). Monitor disk reads/writes, handle count, memory usage and network connections. Maybe your application dumps some big structures to disk at exit? Maybe it allocated a lot of memory and must release it?
Rig it with EurekaLog or MadExcept, and that may show you where the exception is, or where the memory leak is that is causing the exception. Both of those are excellent tools with fully-featured trial versions. Try 'em, buy 'em. Good stuff.
If the debugger's presence keeps the problem from occurring, then wait for the problem to occur, and then attach the debugger to it. Pause execution and you can inspect each thread's call stack.
If you use lots of GUI components in DLLs and/or do plenty of multi-threading then you'd first have to discover which DLL or thread is causing the problem. Or maybe it's a combination of both. Basically, you should create log events for every DLL and thread that gets loaded/started. Try to get a situation where you have a minimum of DLLs and threads loaded to generate the freeze. Then you've localized the problem to one of those. Also, create simple test applications or use a unit-testing framework to test specific modules. For example, there is a Delphi version of NUnit available, which might help. (It's called DUnit.) Such a test framework is helpful for isolating the threads and DLLs so you can check each of them.
If this is happening under Windows 7 and there are threads running in a DLL, you must shut down/terminate all running DLL threads before closing the main form of the executable.
Good luck
