We have an application made up of a host (exe) and a lot of modules (DLLs) containing GUI code etc.
Sometimes, the application freezes on shutdown. Mostly it happens during testing through TestComplete. We are not able to reproduce the behavior during debugging.
How can we find out why the application freezes?
I would guess that it is related to threads, but I do not know for sure.
Are there any tools or techniques we should try out?
I think that good old logging would help you. Add some logging to every unit's finalization section, and add the same kind of logging to the destructors of global data (database connection, global configuration etc). Of course, do not destroy the logger object itself.
If your application is multithreaded, then add some logging to the worker threads, such as writing '[date] thread [name of class] working' every few seconds (you can enable this only in a debug mode). Also add logging for the moment a thread discovers that it should terminate.
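The heartbeat idea above can be sketched roughly as follows. This is only an illustration in Python rather than Delphi; the class name HeartbeatWorker, the logger name and the message wording are made up for the example, so adapt them to your own thread classes.

```python
import logging
import threading
import time

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(threadName)s %(message)s",
)
log = logging.getLogger("shutdown-trace")  # hypothetical logger name


class HeartbeatWorker(threading.Thread):
    """Worker that logs a heartbeat and logs when it notices a stop request."""

    def __init__(self, name, heartbeat_seconds=2.0):
        super().__init__(name=name)
        self.heartbeat_seconds = heartbeat_seconds
        self.stop_requested = threading.Event()
        self.heartbeats = 0

    def run(self):
        log.debug("thread %s started", type(self).__name__)
        # wait() returns True as soon as stop() is called, False on timeout.
        while not self.stop_requested.wait(self.heartbeat_seconds):
            self.heartbeats += 1
            log.debug("thread %s working", type(self).__name__)
        log.debug("thread %s saw the stop request, terminating", type(self).__name__)

    def stop(self):
        self.stop_requested.set()


if __name__ == "__main__":
    worker = HeartbeatWorker("mail-sender", heartbeat_seconds=0.2)
    worker.start()
    time.sleep(0.7)
    worker.stop()
    worker.join()
```

If the last line in the log for some thread is "working" and there is never a "saw the stop request" line, that thread is a good candidate for the one blocking the shutdown.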
Also use some system utilities such as Process Monitor, Handle and Process Explorer (all by Sysinternals/Microsoft). Monitor disk reads/writes, handle count, memory usage and network connections. Maybe your application dumps some big structures to disk at exit? Maybe it allocated a lot of memory and must release it?
Rig it with EurekaLog or MadExcept, and that may show you where the exception is, or where the memory leak is that is causing the exception. Both of those are excellent tools with fully-featured trial versions. Try 'em, buy 'em. Good stuff.
If the debugger's presence keeps the problem from occurring, then wait for the problem to occur, and then attach the debugger to it. Pause execution and you can inspect each thread's call stack.
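What you are looking for once the process is paused is one call stack per thread. As a rough illustration of what that report looks like, here is a sketch in Python (not Delphi) that makes a process print the stack of every thread by itself. sys._current_frames is a CPython facility; the names dump_thread_stacks and blocked_forever are made up for the example.

```python
import sys
import threading
import time
import traceback


def dump_thread_stacks():
    """Return a text report with the current call stack of every thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append(f"Thread {names.get(ident, '?')} (id {ident}):")
        lines.extend(entry.rstrip() for entry in traceback.format_stack(frame))
        lines.append("")
    return "\n".join(lines)


def blocked_forever(gate):
    gate.wait()  # stands in for a call that never returns


if __name__ == "__main__":
    gate = threading.Event()
    stuck = threading.Thread(target=blocked_forever, args=(gate,), name="stuck-worker")
    stuck.start()
    time.sleep(0.1)
    print(dump_thread_stacks())
    gate.set()  # let the demo thread finish
    stuck.join()
```

The thread that never makes progress shows up with the blocking call at the bottom of its stack, which is the same thing you would read off in the debugger.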
If you use lots of GUI components in DLLs and/or do plenty of multi-threading, then you first have to discover which DLL or thread is causing the problem. Or maybe it's a combination of both. Basically, you should create log events for every DLL and thread that gets loaded/started. Try to get to a situation where the freeze still occurs with a minimum of DLLs and threads loaded. Then you've localized the problem to one of those.
Also, create simple test applications or use a unit-testing framework to test specific modules. For example, there is an xUnit-style framework for Delphi, similar to NUnit, which might help. (It's called DUnit...) Such a test framework is helpful for isolating the threads and DLLs so you can check each of them.
If this is happening under Windows 7 and there are threads running in DLLs, you must shut down/terminate all running DLL threads before closing the main form of the executable.
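A rough sketch of "stop every thread first, then close", written in Python for illustration rather than Delphi. The helper shut_down_workers and the two demo workers are hypothetical names; the point is to signal all threads, wait with a deadline, and log the ones that did not stop instead of waiting forever.

```python
import threading
import time


def shut_down_workers(workers, stop_event, timeout_seconds=5.0):
    """Signal every worker to stop, wait for them, return names of any still running."""
    stop_event.set()
    deadline = time.monotonic() + timeout_seconds
    for worker in workers:
        remaining = max(0.0, deadline - time.monotonic())
        worker.join(remaining)
    return [w.name for w in workers if w.is_alive()]


def polite_worker(stop_event):
    while not stop_event.wait(0.05):
        pass  # do a small unit of work, then check the stop flag again


def stubborn_worker(release):
    release.wait()  # ignores the stop flag: stands in for a thread stuck in a DLL


if __name__ == "__main__":
    stop = threading.Event()
    release = threading.Event()
    workers = [
        threading.Thread(target=polite_worker, args=(stop,), name="polite"),
        threading.Thread(target=stubborn_worker, args=(release,), name="stubborn"),
    ]
    for w in workers:
        w.start()
    print("still running after shutdown request:", shut_down_workers(workers, stop, 0.5))
    release.set()  # only so that the demo itself can exit
```

The list of stragglers is exactly what you want in the log before the main form closes.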
Good luck
Related
There is a process, process1, which does dlopen (dynamic load/unload) of various libraries. Let's call these libraries plugins (plugin1, plugin2, plugin3 ...).
Now I can see a memory leak in process1 using valgrind (and the like). But I want to identify the exact plugin which contributed (most) to the leak. Is there an easy way to do this, other than a) running each plugin as a separate process
Normally when you run an application under Valgrind memcheck with leak detection enabled it will dump the leak stats when the application terminates.
There are two mechanisms you can use to make memcheck run a leak dump at other times during execution.
Use the Valgrind client communication macros to instrument your code
Use vgdb commands
I wrote an article covering this a few years ago.
As an example you might have something like this in your code
dlclose(myHandle);        /* unload the plugin */
VALGRIND_DO_LEAK_CHECK;   /* from <valgrind/memcheck.h>: run a leak check right now */
I'm the maintainer of a legacy Delphi application. On machines running this program an Application Error appears sometimes with the caption referring to this Delphi app and a message like the following:
The instruction at "..." referenced memory at "...". The memory could not be "read".
Click on OK to terminate the program.
Task Manager says the process belonging to this message box is csrss.exe. What would be a systematic way to find the root cause of this error?
The problem is, this Delphi program is fairly complex, and the error message appears relatively rarely, so I cannot simply step through the code and find the part which causes the error. Moreover, the app runs automatically, without user interaction, so I can't ask the user what she was doing when the message appeared. Application and system logs don't indicate any problem. The app does not stop working while the message box is present.
I hope someone has run into such an error message before and was able to solve the problem. Thank you for your help in advance.
csrss supports Windows consoles. I expect that your application targets the console subsystem.
If you cannot get your application to fail under the debugger then you need to add some diagnostics to it. I recommend using a tool like madExcept or EurekaLog to do this. Personally I use madExcept and cannot recommend it highly enough. From what I have heard, EurekaLog is also a fine product.
Integrate one of these tools with your application and the next time it faults it will produce a detailed diagnostics report. Most significantly you will get stack traces for each thread in your process. The stack trace for the faulting thread should hopefully lead you to the root cause of your program's bug.
The doubt I have is that if the fault is occurring in csrss then including diagnostics in your process may not yield fruit. It's plausible that your application already faulted, which in turn led to the error message in csrss. In that case diagnostics in your app will help. If not, then you may need to find a way to make the fault occur in your process.
In addition to David's recommendation, I would recommend using ProcDump from Sysinternals to monitor the process and have it write a dump file when an unhandled exception occurs.
You can analyze the dump file offline with WinDbg and the like. While that might seem overwhelming at first, I strongly believe there's a lot to be gained by getting yourself up to speed with WinDbg.
Introduction
ProcDump is a command-line utility whose primary purpose is monitoring
an application for CPU spikes and generating crash dumps during a
spike that an administrator or developer can use to determine the
cause of the spike. ProcDump also includes hung window monitoring
(using the same definition of a window hang that Windows and Task
Manager use), unhandled exception monitoring and can generate dumps
based on the values of system performance counters.
Example
Launch a process and then monitor it for exceptions:
C:\>procdump -e 1 -f "" -x c:\dumps consume.exe
I use a lot of components in my Delphi 7 service application: Indy, Synapse, ZeosLib, etc.
My application is generally stable and I use EurekaLog 6 to capture exceptions, but in rare situations some threads hang because a 3rd party function they call has hung, e.g. Indy gets stuck when trying to send email.
In many cases the application that hung is at a customer's site and I have no access to their computer, so it is not possible for me to do a live debug. My application requires high availability, so even if it hangs once a year, that is not acceptable to my users.
I am now looking for the best way to deal with such a situation, where debugging is not feasible but I still need the application to recover by itself. Is it possible for a thread to terminate if a function it calls hangs? Alternatively, I can also restart the entire service when that happens. How about a watchdog, and what is the best way to implement it? Thanks.
I think you are being rather defeatist. Find and fix the bugs. It might be tricky, but it's the right solution.
Killing threads whose behaviour you don't understand is never the solution. If you start killing threads you'll likely make things worse. That can lead to other runtime errors, deadlock and so on. Once you start killing threads you've lost control.
Now, it would be safe to kill the process (rather than a specific thread) and rely on a watchdog service to restart the process. But that's a really dire solution.
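For what such a watchdog could look like, here is a minimal sketch in Python (a real one for a Delphi service would be a separate service or process). Everything named here is an assumption for illustration: the supervised program is expected to touch a heartbeat file regularly, and supervise, heartbeat_age and the file name are made-up names.

```python
import os
import subprocess
import sys
import tempfile
import time


def heartbeat_age(path):
    """Seconds since the heartbeat file was last touched (infinite if missing)."""
    try:
        return time.time() - os.path.getmtime(path)
    except OSError:
        return float("inf")


def supervise(command, heartbeat_path, max_silence, max_restarts, poll=0.1):
    """Run command; kill and restart the whole process when its heartbeat goes stale.

    Returns the number of restarts performed before the process exited
    cleanly or the restart budget was used up.
    """
    restarts = 0
    while True:
        # Touch the file so a freshly started process gets a full grace period.
        with open(heartbeat_path, "w"):
            pass
        proc = subprocess.Popen(command)
        while proc.poll() is None and heartbeat_age(heartbeat_path) <= max_silence:
            time.sleep(poll)
        if proc.poll() is None:
            proc.kill()  # kill the process, never an individual thread
            proc.wait()
        elif proc.returncode == 0:
            return restarts  # clean exit, nothing to restart
        if restarts >= max_restarts:
            return restarts
        restarts += 1


if __name__ == "__main__":
    # Demo child that never touches the heartbeat file, i.e. looks hung.
    hung_child = [sys.executable, "-c", "import time; time.sleep(30)"]
    beat = os.path.join(tempfile.mkdtemp(), "heartbeat")
    print("restarts:", supervise(hung_child, beat, max_silence=0.5, max_restarts=1))
```

Note that this only hides the symptom; the dump and logging advice is still what finds the actual bug.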
You should certainly use a tool like madExcept, EurekaLog etc. to debug unexpected exceptions. I see you are already using EurekaLog - that's good.
Deadlocks (it sounds like you have deadlock) can be more tricky to chase down. One good way to debug a deadlock is to get your client to produce a crash dump (e.g. from Process Explorer). Then debug it in WinDbg using map2dbg to produce symbolic stack traces. That will tell you which threads are blocking and that reveals the deadlock. And then fix the bugs.
For more details on this deadlock debugging technique see here: http://capnbry.net/blog/?p=18
I'm not familiar with EurekaLog since I use madExcept, but I would expect EurekaLog has a facility to allow generation of thread stack traces for a hung process. If so then that would most likely be the best approach for you.
Your question is rather too vague. If you don't know which of the various components you're using you wish to blame, then you have zero hope of fixing it. The most likely thing is you're doing something wrong, or that you don't understand how these components work. I very much doubt that it's purely a bug in the components themselves, but hey, either way it's all on you to find what's having a problem, and your job to fix it.
A deadlock that you've created, or a deep process corruption issue, may prevent madExcept from giving you any information, but it's worth trying.
To find out which component is freezing, if any at all, the madExcept suggestion is the best one yet. It will time out (after a configurable number of seconds) and raise an artificial exception for you, interrupting your hung process. This works for user code, and for places where the thread is blocked in a Win32 or kernel function. For example, it's possible that you've set up Indy with infinite timeouts, as that's the default these days in Indy 10, and that what you're experiencing is a timeout-related freeze: network activity that you expected to complete, but which never will, is causing your program to "hang". The cure here is to change your timeouts.
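To make the timeout point concrete, here is a small sketch in Python using the standard socket module rather than Indy (so none of these names are Indy's API; read_with_timeout and start_silent_server are made up for the example). The server accepts the connection but never sends anything, which is the situation where an infinite timeout hangs the caller.

```python
import socket


def start_silent_server():
    """Listen on a free local port; connections succeed but nothing is ever sent."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    return server


def read_with_timeout(host, port, timeout_seconds):
    """Connect and read once; return None instead of blocking forever."""
    with socket.create_connection((host, port), timeout=timeout_seconds) as conn:
        try:
            return conn.recv(1024)
        except socket.timeout:
            return None  # the peer stayed silent for longer than the timeout


if __name__ == "__main__":
    server = start_silent_server()
    port = server.getsockname()[1]
    print("reply:", read_with_timeout("127.0.0.1", port, 0.5))
    server.close()
```

With a finite timeout the thread gets control back and can log the failure and retry, instead of sitting in the read call indefinitely.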
However, until you figure out WHERE the problem is, I doubt you'll be able to fix it. And so, for that, again, Marcus is right, you should be looking into madExcept. I can't live without it.
Secondly, you should really be adding trace logic to your program, so you know where it's going and what it was doing just before it had a problem. If you really need help doing that, you could try CodeSite, from Raize. Personally I find that OutputDebugString combined with the free Microsoft DebugView utility (formerly from Sysinternals) is more than enough to debug such problems on a client computer.
Any program with background threads that does not have trace logging is a badly designed program. Heck, any non-trivial single-threaded application that might ever fail or have problems needs trace logging.
Logging is always going to help, even when MadExcept or other exception tools don't. Trace-Logging is usually a roll-your-own solution, although CodeSite is also quite popular.
I'm using Delphi 6 and I've got an application which when being shut down produces access violation errors. We use EurekaLog so I'm getting stack traces for debugging, but the errors seem to occur randomly in a different unit each time, but always when something is being freed in the finalization section.
How can I go about debugging this to see what's causing the problem? I'm not sure how to start debugging things that happen when the application is being finalised.
[Edit:] Sorry if I was unclear, perhaps a better question would be: What's the best place to start debugging with breakpoints if I only want to walk through the finalisation sections? The errors seem to arise in third party components we use (the devexpress dx/cxgrid library) so I'd like to start debugging in my code at pretty much the last point before Delphi will start calling finalise routines in other units.
This isn't much to go on, but if I had to guess, based on past experience... are you using packages or COM libraries? If you've got a global variable that's an interface, or an object whose class is declared in a BPL, and you unload the DLL/BPL before cleaning up the object/interface, you'll get access violations because your code is trying to do a VMT lookup in address space that is no longer mapped into the application.
Check for that and make sure you clean up all such variables before finalization begins.
When the application is shutting down, do not free things in the finalization section.
1) When the application shuts down, Windows frees all the application memory. You don't have to do that.
2) When the application shuts down, the memory is released, and the infrastructure is unloaded. You can't call code to close or free objects, because that code may have been already unloaded. You can't access pointers to memory, because those pointers may have already been released.
3) When you try to free things in the finalization section while the application is shutting down, you may get failures that prevent your code from finalizing, thus preventing the application from shutting down, causing a hung application and memory loss. Which is what you were trying to prevent in the first place. Don't do it.
Ok, when you are running on Win95/98, or using external processes, you may in some circumstances have to free shared resources and notify those external processes that you are shutting down. Apart from that, it all happens automagically now.
Is there a way to access (read or free) memory chunks that are outside the memory allocated for the program without getting access violation exceptions?
Well what I actually would like to understand apart from this, is how a memory cleaner (system garbage collector) works. I've always wanted to write such a program. (The language isn't an issue)
Thanks in advance :)
No.
Any modern operating system will prevent one process from accessing memory that belongs to another process.
In fact, if you understood virtual memory, you'd understand that this is impossible. Each process has its own virtual address space.
The simple answer (unless I'm mistaken) is no. Generally it's not a good idea, for two reasons. First, it causes a trust problem between your program and other programs (not to mention that we humans won't trust your application either). Second, if you were able to access another application's memory and make a change without the application knowing about it, you would likely cause that application to crash (viruses do this too).
A garbage collector is called from a runtime. The runtime "owns" the memory space and allows other applications to "live" within that memory space. This is why the garbage collector can exist. You will have to create a runtime that the OS allocates memory to, have the runtime execute the application under its authority, and use the GC under its authority as well. You will need to provide some instrumentation or API that allows the application developer to "request" memory from your runtime (not the OS), and your runtime needs a way not only to respond to such a request but also to keep track of the memory space it's allocating to that application. You will probably need a framework (a set of DLLs) that makes these calls available to the application (the developer would use them to form the requests inside their application).
You have to be sure that your garbage collector does not remove memory other than the memory used by the application being collected, as you may have more than one application running within your runtime at the same time.
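A toy version of that idea, sketched in Python just to show the bookkeeping (ToyRuntime and all of its method names are invented for this example, and real collectors are far more involved). Applications request objects from the runtime, the runtime records the owner, and a mark-and-sweep pass for one application only ever frees that application's objects.

```python
class ToyRuntime:
    """Hands out objects to applications and tracks which application owns each."""

    def __init__(self):
        self._heap = {}    # handle -> (owner, handles this object refers to)
        self._roots = {}   # owner -> handles the application still holds directly
        self._next = 1

    def allocate(self, owner, refs=()):
        """The application 'requests memory' from the runtime, not from the OS."""
        handle = self._next
        self._next += 1
        self._heap[handle] = (owner, list(refs))
        self._roots.setdefault(owner, set()).add(handle)
        return handle

    def release_root(self, owner, handle):
        """The application drops its direct reference to the object."""
        self._roots[owner].discard(handle)

    def collect(self, owner):
        """Mark from the owner's roots, then sweep only objects owned by `owner`."""
        marked = set()
        stack = list(self._roots.get(owner, ()))
        while stack:
            handle = stack.pop()
            if handle in marked or handle not in self._heap:
                continue
            marked.add(handle)
            stack.extend(self._heap[handle][1])
        dead = [h for h, (o, _) in self._heap.items() if o == owner and h not in marked]
        for handle in dead:
            del self._heap[handle]
        return dead

    def live_handles(self):
        return sorted(self._heap)


if __name__ == "__main__":
    rt = ToyRuntime()
    a = rt.allocate("app1")
    b = rt.allocate("app1", refs=[a])
    rt.release_root("app1", a)   # still reachable through b
    print(rt.collect("app1"))    # nothing to free yet
    rt.release_root("app1", b)
    print(sorted(rt.collect("app1")))
```

Because the sweep filters on the owner, collecting for one application cannot free objects that belong to another one running in the same runtime.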
Hope this helps.
Actually the right answer is YES. There are some programs that do it (and if they exist, it means it is possible).
Maybe you need to write a kernel driver to accomplish this, but it is possible.
Oh, and I have another example: a debugger's attach command. There you have one program that interacts with another program's memory even though both were started as different processes.
Of course, messing with another program's memory if you don't know what you're doing will probably make it crash.