I am profiling a multi threaded application that uses pthread library, to get information about its most executed Basic Blocks (BBLs).
The problem I have encountered is that there are a lot of blocks belonging to __pthread_mutex_lock function, which are not only in the TOP 100 most executed BBLs, but in the TOP 10. This is really annoying because I am not interested in BBLs belonging to pthread library functions at all.
My mentor and I were speculating if busy wait had something to do with this, because in MPI there are ways to change locks to behave from busy waiting mode to blocking mode, so when profiling apps these kind of pthread functions don't appear that much, not even in the top 10. I have never worked with MPI so excuse me if what I say is not very accurate.
What I wanted to ask here is if anyone knows a way to do something like that to pthreads. Changing the code of the application is not an option, but if there is a compiling knob to compile the application, appart from -lpthread, that can change its behaviour not to get so many BBLs from its libraries, would be great.
Thank you for your time.
Related
This is a followon to this question about using the DX11VideoRenderer sample (a replacement for EVR that uses DirectX11 instead of EVR's DirectX9).
I've been trying to track down why it uses so much more CPU than the EVR. Task Manager shows me that most of that time is kernel mode.
Using profiling tools, I see that a LOT of time is being spent in numerous calls to NtDelayExecution (aka Sleep). How many calls? ~100,000 over the course of ~12 seconds. Ok, yeah, I'm sending a lot of frames in those 12 seconds, but that's still a lot of calls, every one of which requires a kernel mode transition.
The callstack shows the last call in "my" code is to IDXGISwapChain1::Present(0, 0). The actual call seems be Sleep(0) and comes from nvwgf2umx.dll (which is why this question is tagged NVidia: hopefully someone there can call up the code and see what the logic is behind such frequent calls).
I couldn't quite figure out why it would need to do /any/ Sleeping during Present. It's not like we wait for vertical retrace anymore, is it? But the other reason to use Sleep has to do with yielding to other threads. Which led me to a serious clue:
If I use D3D11_CREATE_DEVICE_PREVENT_INTERNAL_THREADING_OPTIMIZATIONS, the CPU utilization drops. Along with some other fixes, the DX11 version is now faster and uses less CPU time than the DX9 version (which is what I would hope/expect). Profiling shows that Sleep has dropped from >30% to <1%.
Unfortunately, this page tells me:
This flag is not recommended for general use.
Oh.
So, any ideas on how to get decent performance without using debug flags?
I use a lot of components in my Delphi 7 Service application, Indy, Synapse, Zeolibs, etc.
My application is generally stable, I use Eurekalog 6 to capture exceptions, but in rare situations, some threads hang because a 3rd party function it calls has hung, e.g. Indy gets stuck when trying to send email.
In many cases, the application that hung are my customer place, I've no access to their computer, so it is not possible for me to do a live debug. My application requires high availability so even if it hangs once a year, that is not acceptable to my users.
I am now looking for the best way to deal with such a situation where debugging is not feasible but I will still need the application recover by itself. Is it possible for a thread to terminate if a function it calls hangs? Alternatively, I can also restart the entire service when that happens. How about a Watchdog and what is the best way to implement it? Thanks.
I think you are being rather defeatist. Find and fix the bugs. It might be tricky, but it's the right solution.
Killing threads whose behviour you don't understand is never the solution. If you start killing threads you'll likely make things worse. That can lead to other runtime errors, deadlock and so on. Once you start killing threads you've lost control.
Now, it would be safe to kill the process (rather than a specific thread) and rely on a watchdog service to restart the process. But that's a really dire solution.
You should certainly use a tool like madExcept, EurekaLog etc. to debug unexpected exceptions. I see you are already using EurekaLog - that's good.
Deadlocks (it sounds like you have deadlock) can be more tricky to chase down. One good way to debug a deadlock is to get your client to produce a crash dump (e.g. from Process Explorer). Then debug it in WinDbg using map2dbg to produce symbolic stack traces. That will tell you which threads are blocking and that reveals the deadlock. And then fix the bugs.
For more details on this deadlock debugging technique see here: http://capnbry.net/blog/?p=18
I'm not familiar with EurekaLog since I use madExcept, but I would expect EurekaLog has a facility to allow generation of thread stack traces for a hung process. If so then that would most likely be the best approach for you.
Your question is rather too vague. If you don't know which of the various components you're using you wish to blame, then you have zero hope of fixing it. The most likely thing is you're doing something wrong, or that you don't understand how these components work. I very much doubt that it's purely a bug in the components themselves, but hey, either way it's all on you to find what's having a problem, and your job to fix it.
A deadlock that you've created, or a deep process corruption issue, that is happening, may prevent MadExcept from giving you any information, but it's worth trying.
To find out which one is freezing, if any at all, then the madexcept comment is the best suggestion yet. It will time-out (after a configurable # of seconds) and raise an artificial exception for you, interrupting your hung process. This works for user code, and for places where the thread is blocked in a Win32 or kernel function. For example, it's possible that you've set up Indy for infinite timeouts, as that's the default these days in Indy 10, and that what you're experiencing is a timeout related freeze, where network activity that you expected to complete but which never will complete, is causing your program to "hang". The cure here is to change your timeouts.
However, until you figure out WHERE the problem is, I doubt you'll be able to fix it. And so, for that, again, Marcus is right, you should be looking into madExcept. I can't live without it.
Secondly, you should really be adding trace logic to your program, so you know where it's going and what it was doing just before it had a problem. If you really need help doing that, you could try CodeSite, from Raize. Personally I find that OutputDebugString combined with the free Microsoft DebugView utility (formerly from SysInternals) tool is more than enough to debug such problems on a client computer.
Any program with background threads that does not have trace logging, is a badly designed program. Heck, any non-trivial single threaded application that might ever fail or have problems, needs trace logging.
Logging is always going to help, even when MadExcept or other exception tools don't. Trace-Logging is usually a roll-your-own solution, although CodeSite is also quite popular.
We have an application made up by a host (exe) and a lot of modules (dlls) containing gui etc.
Sometimes, the application freezes on shutdown. Mostly it happens during testing through TestComplete. We are not able to reproduce the behavior during debugging.
How can we find out why the application freezes?
I would guess that it is related to threads, but I do not know for sure.
Are there any tools or techniques we should try out?
I think that good old logging would help you. Add some logging to every unit finalization, add such logging to destructors of global data (database connection, global configuration etc). Of course do not destroy logger object.
If your application is multithreaded then add some logging to working threads such as writing '[date] thread [name of class] working' and write it every few seconds (you can use some debug mode). Also add logging when thread discovers that it should terminate.
Also use some system utilites such as ProcessMonitor, Handles, Process Explorer (all by Sysinternals/Microsoft). Monitor disk reads/writes, handle count, memory usage, network connections. Maybe your application dumps some big structures on disk at exit? Maybe it allocated a lot of memory and must release it?
Rig it with EurekaLog or MadExcept, and that may show you where the exception is, or where the memory leak is that is causing the exception. Both of those are excellent tools with fully-featured trial versions. Try 'em, buy 'em. Good stuff.
If the debugger's presence keeps the problem from occurring, then wait for the problem to occur, and then attach the debugger to it. Pause execution and you can inspect each thread's call stack.
If you use lots of GUI components in DLL's and/or do plenty of multi-threading then you'd first have to discover which DLL or thread is causing the problem. Or maybe it's a combination of both. Basically, you should create log events for every DLL and thread that gets loaded/started. Try to get a situation where you have a minimum of DLL's and threads loaded to generate the freeze. Then you've localized the problem to one of those.Also, create simple test-applications or use a unit-testing framework to test specific modules. For example, there is a Delphi version of NUnit available, which might help. (It's called DUnit...) Such a test framework is helpful to isolate the threads and DLL's to check each of them.
If your app is happenning under windows 7 and there are threads running in DLL, you must shutdown/terminate all running DLL threads before closing the main form of executable.
Good luck
I mostly work on C language for my work. I have faced many issues and spent lot time in debugging issues related to dynamically allocated memory corruption/overwriting. Like malloc(A) A bytes but use write more than A bytes. Towards that i was trying to read few things when i read about :-
1.) An approach wherein one allocates more memory than what is needed. And write some known value/pattern in that extra locations. Then during program execution that pattern should be untouched, else it indicated memory corruption/overwriting. But how does this approach work. Does it mean for every write to that pointer which is allocated using malloc() i should be doing a memory read of the additional sentinel pattern and read for its sanity? That would make my whole program very slow.
And to say that we can remove these checks from the release version of the code, is also not fruitful as memory related issues can happen more in 'real scenario'. So can we handle this?
2.) I heard that there is something called HEAP WALKER, which enables programs to detect memory related issues? How can one enable this.
thank you.
-AD.
If you're working under Linux or OSX, have a look at Valgrind (free, available on OSX via Macports). For Windows, we're using Rational PurifyPlus (needs a license).
You can also have a look at Dmalloc or even at Paul Nettle's memory manager which helps tracking memory allocation related bugs.
If you're on Mac OS X, there's an awesome library called libgmalloc. libgmalloc places each memory allocation on a separate page. Any memory access/write beyond the page will immediately trigger a bus error. Note however that running your program with libgmalloc will likely result in a significant slowdown.
Memory guards can catch some heap corruption. It is slower (especially deallocations) but it's just for debug purposes and your release build would not include this.
Heap walking is platform specific, but not necessarily too useful. The simplest check is simply to wrap your allocations and log them to a file with the LINE and FILE information for your debug mode, and most any leaks will be apparent very quickly when you exit the program and numbers don't tally up.
Search google for LINE and I am sure lots of results will show up.
This is a bit hypothetical and grossly simplified but...
Assume a program that will be calling functions written by third parties. These parties can be assumed to be non-hostile but can't be assumed to be "competent". Each function will take some arguments, have side effects and return a value. They have no state while they are not running.
The objective is to ensure they can't cause memory leaks by logging all mallocs (and the like) and then freeing everything after the function exits.
Is this possible? Is this practical?
p.s. The important part to me is ensuring that no allocations persist so ways to remove memory leaks without doing that are not useful to me.
You don't specify the operating system or environment, this answer assumes Linux, glibc, and C.
You can set __malloc_hook, __free_hook, and __realloc_hook to point to functions which will be called from malloc(), realloc(), and free() respectively. There is a __malloc_hook manpage showing the prototypes. You can add track allocations in these hooks, then return to let glibc handle the memory allocation/deallocation.
It sounds like you want to free any live allocations when the third-party function returns. There are ways to have gcc automatically insert calls at every function entrance and exit using -finstrument-functions, but I think that would be inelegant for what you are trying to do. Can you have your own code call a function in your memory-tracking library after calling one of these third-party functions? You could then check if there are any allocations which the third-party function did not already free.
First, you have to provide the entrypoints for malloc() and free() and friends. Because this code is compiled already (right?) you can't depend on #define to redirect.
Then you can implement these in the obvious way and log that they came from a certain module by linking those routines to those modules.
The fastest way involves no logging at all. If the amount of memory they use is bounded, why not pre-allocate all the "heap" they'll ever need and write an allocator out of that? Then when it's done, free the entire "heap" and you're done! You could extend this idea to multiple heaps if it's more complex that that.
If you really do need to "log" and not make your own allocator, here's some ideas. One, use a hash table with pointers and internal chaining. Another would be to allocate extra space in front of every block and put your own structure there containing, say, an index into your "log table," then keep a free-list of log table entries (as a stack so getting a free one or putting a free one back is O(1)). This takes more memory but should be fast.
Is it practical? I think it is, so long as the speed-hit is acceptable.
You could run the third party functions in a separate process and close the process when you are done using the library.
A better solution than attempting to log mallocs might be to sandbox the functions when you call them—give them access to a fixed segment of memory and then free that segment when the function is done running.
Unconfined, incompetent memory usage can be just as damaging as malicious code.
Can't you just force them to allocate all their memory on the stack? This way it would be garanteed to be freed after the function exits.
In the past I wrote a software library in C that had a memory management subsystem that contained the ability to log allocations and frees, and to manually match each allocation and free. This was of some use when attempting to find memory leaks, but it was difficult and time consuming to use. The number of logs was overwhelming, and it took an extensive amount of time to understand the logs.
That being said, if your third party library has extensive allocations, its more then likely impractical to track this via logging. If you're running in a Windows environment, I would suggest using a tool such as Purify[1] or BoundsChecker[2] that should be able to detect leaks in your third party libraries. The investment in the tool should pay for itself in time saved.
[1]: http://www-01.ibm.com/software/awdtools/purify/ Purify
[2]: http://www.compuware.com/products/devpartner/visualc.htm BoundsChecker
Since you're worried about memory leaks and talking about malloc/free, I assume you're in C. I'm also assuming based on your question that you do not have access to the source code of the third party library.
The only thing I can think of is to examine memory consumption of your app before & after the call, log error messages if they're different and convince the third party vendor to fix any leaks you find.
If you have money to spare, then consider using Purify to track issues. It works wonders, and does not require source code or recompilation. There are also other debugging malloc libraries available that are cheaper. Electric Fence is one name I recall. That said, the debugging hooks mentioned by Denton Gentry seem interesting too.
If you're too poor for Purify, try Valgrind. It it a lot better than it was 6 years ago and a lot easier to dive into than Purify.
Microsoft Windows provides (use SUA if you need a POSIX), quite possibly, the most advanced heap+(other api known to use the heap) infrastructure of any shipping OS today.
the __malloc() debug hooks and the associated CRT debug interfaces are nice for cases where you have the source code to the tests, however they can often miss allocations by standard libraries or other code which is linked. This is expected as they are the Visual Studio heap debugging infrastructure.
gflags is a very comprehensive and detailed set of debuging capabilities which has been included with Windows for many years. Having advanced functionality for source and binary only use cases (as it is the OS heap debugging infrastructure).
It can log full stack traces (repaginating symbolic information in a post-process operation), of all heap users, for all heap modifying entrypoint's, serially if needed. Also, it may modify the heap with pathalogical cases which may align the allocation of data such that the page protection offered by the VM system is optimally assigned (i.e. allocate your requested heap block at the end of a page, so even a singele byte overflow is detected at the time of the overflow.
umdh is a tool which can help assess the status at various checkpoints, however the data is continually accumulated during the execution of the target o it is not a simple checkpointing debug stop in the traditional context. Also, WARNING, Last I checked at least, the total size of the circular buffer which store's the stack information, for each request is somewhat small (64k entries (entries+stack)), so you may need to dump rapidly for heavy heap users. There are other ways to access this data but umdh is fairly simple.
NOTE there are 2 modes;
MODE 1, umdh {-p:Process-id|-pn:ProcessName} [-f:Filename] [-g]
MODE 2, umdh [-d] {File1} [File2] [-f:Filename]
I do not know what insanity gripped the developer who chose to alternate between -p:foo argument specifier's and naked ordering of argument's but it can get a little confusing.
The debugging sdk works with a number of other tools, memsnap is a tool which apparently focuses on memory leask and such, but I have not used it, your milage may vary.
Execute gflags with no arguments for the UI mode, +arg's and /args are different "modes" of use also.
On Linux I've successfully used mtrace(3) to log allocations and freeings. Its usage is as simple as
Modify your program to call mtrace() when you need to begin tracing (e.g. at the top of main()),
Set environment variable MALLOC_TRACE to the file path where the trace should be saved and run the program.
After that the output file will contain something like this (excerpt from the middle to show a failed allocation):
# /usr/lib/tls/libnvidia-tls.so.390.116:[0xf44b795c] + 0x99e5e20 0x49
# /opt/gcc-7/lib/libstdc++.so.6:(_ZdlPv+0x18)[0xf6a80f78] - 0x99beba0
# /usr/lib/tls/libnvidia-tls.so.390.116:[0xf44b795c] + 0x9a23ec0 0x10
# /opt/gcc-7/lib/libstdc++.so.6:(_ZdlPv+0x18)[0xf6a80f78] - 0x9a23ec0
# /opt/Xorg/lib/video-libs/libGL.so.1:[0xf668ee49] + 0x99c67c0 0x8
# /opt/Xorg/lib/video-libs/libGL.so.1:[0xf668f14f] - 0x99c67c0
# /opt/Xorg/lib/video-libs/libGL.so.1:[0xf668ee49] + (nil) 0x30000000
# /lib/libc.so.6:[0xf677f8eb] + 0x99c21f0 0x158
# /lib/libc.so.6:(_IO_file_doallocate+0x91)[0xf677ee61] + 0xbfb00480 0x400
# /lib/libc.so.6:(_IO_setb+0x59)[0xf678d7f9] - 0xbfb00480