Setting up a WH_CALLWNDPROC hook prevents WM_HELP propagation in deep hierarchies - delphi

When pressing the F1 key, the win32 API first sends the appropriate key message then sends a WM_HELPmessage to the control that has the focus.
As it does not process it, it gets sent up the parenting chain all the way to the form which reacts to the message.
In Delphi (XE7) this happens because of calls to CallWindowProc inside Vcl.Controls.TWinControl.DefaultHandler
While this works fine in pretty much all locations inside my applications, there is one place where WM_HELP never reaches the top form.
Trying to reproduce it, I came up with a test application that you may find here:
http://obones.free.fr/wm_help.zip
After having built the application and started it, place the focus inside the In SubLevel or Level 1 edits and press F1.
You will see that WM_HELP is caught by the form.
Now, if you do the same inside In SubLevel2 or Level 15 edits you will see that nothing is logged, the form never sees WM_HELP
Tracing in the VCL I found out that for those deep levels, the calls to CallWindowProc inside Vcl.Controls.TWinControl.DefaultHandler immediately returns on one of the controls in the hierarchy, thus preventing the form from ever receiving the message.
However, I couldn't figure out why the Win32 API code thinks it should not propagate the message anymore, except for one thing: If I remove the WH_CALLWNDPROC hook, then everything is back to normal.
You can see the effect of disabling it if you uncheck the Use hook checkbox.
Now, one will argue that I shouldn't have such deep hierarchies of components, and I agree. However, the structure in the center with two frames inside one another is directly inspired by what's in the application where I noticed the issue.
This means that it can be quite easy to trigger the problem without actually noticing it. Hopefully, in my case, I can remove a few panels and go back below the limit.
But did anyone encounter the situation before? If yes, were you able to solve it? Or is this a known behavior of the Win32 API?

This is caused by a "Windows kernel stack overflow" that happens if you send window messages recursively. On a 64 bit Windows the kernel stack overflow happens much faster than on a 32 bit Windows.
This bug also caused the VCL to not resize deeply nested controls correctly before it got fixed by changing the recursive AlignControls code to (my) iterative version (more about the stack overflow: http://news.jrsoftware.org/news/toolbar2000/msg07779.html)

Related

Erased components in Delphi program

Updates
2016-02-18: Added process information
I have a Delphi program compiled using XE4. It is being used by a few hundred customers. A couple of weeks ago one of these customers reported that some areas of the executable was being erased (images bellow) randomly during the day. This client has 35 sites using this exe and the problem occurs on no more than 10 of these sites.
Investigation
1 - My first suspicion was an infinite loop. The exe keeps responding while the components are erased, nothing changed on the code so radically from the time this problem did'n happen and the logs don't show any loop (this exe has logs everywhere).
2 - Misbehaving threads. I have a separate thread that syncs data between this exe and our server in the cloud. Again, logs don't show that the thread is running when the problem occur and again, nothing was changed here.
3 - Some other program (antivirus?) is affecting my exe. Couldn't investigate this hipotesis properly yet, but until now couldn't find any installed program that raised my attention.
My question is: What could be causing this issue? How can I investigate further? I know this may be a wide question but this is all information I could gather and I can't imagine many more places to look at.
Images
1 - In the image bellow the red-stroked area should be a TToolBar
2 - In this second image there are three areas, from the top to the bottom the first one should be a TToolBar, the second one should be the title of the child form and the third one should be a TwwDBGrid
3 - The third example shows on the top the erased area where should be a TEdit, just bellow it there's what should be a line on a TwwDBGrid and on the side we can see an erased scrollbar from the TwwDBGrid
4 - This last example shows 5 erased areas: The title of the application, the main TToolBar, The title of the Form, a TButton and two TwwDBGrid
5 - This is an interesting example beacause beyond the erased components there are 4 TSpeedButtons that are not erased but they are without the images they have originally (the first red stroked areas). The other 3 red stroked areas are, in order, 2 TEdits, a TwwDBGrd and a TButton
Process Information
I got a screenshot by the momment the problem occurs. scgolr is my software.
There is really not enough detailed information to give you a definite answer. However, I can answer with some directions on your question:
How can I investigate further?
Because of what you have stated:
The program is in use by a few hundred customers
One (only) customer experiences the problem
First occurance of the problem was some weeks ago
the first thing to do, is get in contact with the customer, and get the information you say that you have asked for but not got. The questions that need to be answered are:
What has changed in the customers environment at the time the problem
started with respect to hardware, network, server, OS, other software
running in the PCs?
Has anything changed in the way your customer is using your software?
What do the customer have to do to get rid of the problem, once it occurs? Close the program? Restart the PC? Or maybe just minimize - restore the erroneous window?
With the above I do not suggest that the fault is with this one customer and their equipment or their way of using the software. It may just be that the combination at the site which is different from all your other customers, triggers the problem to show up.
Some specific things to check in your software and at the site when problem occurs and if the problem goes away with a minimize - restore of the application (which would suggest a painting interrupted problem:
Do you call Application.ProcessMessages at any time?
Does the background thread access same data as the GUI? If yes, are the data protection properly in place (locking, synchronisation).
Does the background thread access any GUI components without Synchronize?
Finally I suggest that you visit the customer onsite. You get much better and faster answers in a direct discussion.
Edit after process information received.
There is nothing alarming concerning GDI or User objects. But it is alarming when you say in the comments that you call Application.ProcessMessages in many places, obviously to 'fix' a non-responsive UI. For example, what happens if the user double clicks a button, but does it slowly enough that Windows detects it as to separate clicks? First click may start your long lasting procedure within which you call A.P. The second click is read from the message que which starts the same procedure. Now the second call to the procedure runs (with its own calls to A.P.) and eventually ends and execution returns to the first call. Depending on what you do in this procedure, you may well be messing up handles and device contexts etc. A strong recommendation said with a friendly intent: Get rid of those calls to A.P.
the problem is with the security plugin (Warsaw - Gas Tecnologia) bank's website that your client is accessing , update it and it will be solved , the problem happens in Brazil
As #SebastianZ and #AlekseyK pointed out you may experiment exaustin of some GDI resource (handles?).
If the system coukd be accessed some tools like Process explerer or process hacker could give you some hints. This utility may help too GDIView
I don't know if this may apply to your case, but sometimes database data corruption can lead to strange effect in running programs (i remember 'Data Bombs' causing out of memory exceptions ...
So if something cause a GDI allocation loop, the graphics of your app cauld be affected in 'strange' ways

FMX Direct 2D Issue

Good afternoon all!
I'm experiencing a rather annoying issue with one of my current projects. I'm working with a hardware library (NVAPI Pascal header translation by Andreas Hausladen) in one of my current projects. This lib allows me to retrieve information from an NVIDIA GPU. I'm using it to retrieve temperatures, and with the help of Firemonkey's TAnimateFloat, i'm adjusting the angle on a custom-made dial to indicate the temperature.
As FMX defaults to Direct 2D on Windows, i can monitor the FPS with any of the various "gamer" tools out there (MSI Afterburner, FRAPS, etc).
The issue i'm having is that when i put the system into sleep mode (suspend to RAM/S3), and then start it up again, the interface on my application is blacked out (partially or completely), and nothing on the UI is visibly refreshing. I'm calling the initialization for the NVAPI library regularly and checking the result via a timer, but this doesn't fix the issue. I'm also running ProcessMessages and repaint on the parent dial and it's children controls (since i can't seem to find a repaint for the form or even an equivalent).
I tried various versions of the library, and each one presents the same issue. The next paragraph indicates that this was in fact NOT the issue, and that it's actually the renderer at fault.
I have one solution, but i want to know if there's something more... elegant, available. The solution i have involves adding FMX.Types.GlobalUseDirect2D := False; before Application.Initialize in my projects source. However, this forces FMX to use GDI+ rather than Direct2D. It works of course, but i'd like to keep D2D open as an option if i can. I can use FindCmdLineSwitch to toggle this on/off dependant on parameters, but this still requires me to restart the application to change from D2D to GDI+ or vice-versa.
What's weird about it is that the FPS counter (from FRAPS in my case) indicates that there's still activity happening in the UI (as the value changes as would be expected), but the UI itself isn't visibly refreshing.
Is this an issue related to Direct2D, or a bug with Firemonkey's implementation? More importantly, is there a better method to fixing it than disabling D2D? Lastly, also related, is it possible to "reinitialize" an application without terminating it first (so perhaps i can allow the user to switch between GDI+ and D2D without needing to restart the application)?
This is may be of the issues with FM prior to the update 4 hotfix - 26664/QC 104210
Fixes the issue of a FireMonkey HD form being unresponsive after user unlock - installing this might resolve the issue for you.
The update should be part of your registered user downloads from the EDN (direct link http://cc.embarcadero.com/item/28881).

AV after successful close of applications

I am getting this AV message about 3 to 5 seconds after the applications close as expected:
Exception EAccessViolation in module rtl160.bpl at 00073225. Access violation at address 500A3225 in module 'rtl160.bpl'. Read of address 00000004.
These (20) applications are very similar in that they are IBX business applications. About half of them did not cause the AV to occur.
These applications were ported from Delphi-xe and they worked flawlessly for a long time. No changes were made to the projects in the port. Both 32 and 64 bit builds gave the same results.
Is this a bug in some library's finalization section freeing a resource or something?
I am using Delphi-XE2 Update 3.
Would appreciate the help.
Try using madExcept / EurekaLog etc. - they give you detailed stack trace on AV. This is not always a panacea, but can point you to the problem.
Access Violations are by their nature already very troublesome beasts since they deal with invalid pointers in memory. One that occurs a while after an application shuts down is even worse because that's when your app is in "cleanup" mode. You're could be dealing with something that went wrong much earlier in the application, but is only exposing itself at shutdown.
General Tips:
Try to always undo things in the reverse order you did them. E.g.
Create A, Create B ... Destroy B, Destroy A
Connect to Database, Open Dataset ... Close Dataset, Disconnect from Database
Even making sure you've done all the above before shutting down can help tremendously.
Any threads that are still running while your application is running can cause problems.
Preferably ensure all your child threads are properly terminated before final shutdown.
Refer back to Closing datasets above. Depending on what you're doing, some database components will create their own threads.
If you're using COM, try ensure ComObj is high up in the initialization sequence (I.e. place it as high as possible in your DPR).
Delphi finalizes units in the reverse order that they were initialized.
And you don't want ComObj to finalize before other things that are dependent on ComObj have also done so.
If you're using interface references, make sure you resolve circular reference issues.
Some of these problems can be tricky to find, but you can do the following:
Setup a source-code "sandbox" environment (you're going to chuck all your changes as soon as you've found the problem).
Figure out the simplest set of steps required to guarantee the error. (Start app and immediately shutdown would be ideal.)
Then you're going to comment-out delete wipe out chunks of code between tests and basically follow a divide and conquer approach to:
rip out code
test
if the problem persists, repeat. Else roll-back and rip out a different chunk of code.
eventually your code base will be small enough to pinpoint likely problems which can be tackled with targeted testing.
I've had this kind of access violation problem on occasion with old Delphi or C++Builder projects. Today I had it with C++Builder. At the time of the crash, by looking in the Debug -> Call Stack window, I can see that it's happening inside a call to fflush, called by __exit_streams and _exit.
I'm not sure what is causing it, since it's so deep in the Borland library code, but it seems to come and go at random when the code changes. And it seems to be more common with multi-form applications.
This time the error went away when I just added a new button on the main form. A button which is just there, has no event handlers and does not do anything. I think that any random change to the code, classes, variables etc rearranges the memory layout when you relink the application, and that either triggers or untriggers the error.
For now, I just leave the new button on the form, set it to "not visible" so that there's no visible change. As it seems to work, it's good enough solution for me at this time.

How might I find out the source of long delays on resizing the main form?

I have a D2006 app that contains a page control and various grids, etc on the tabs. When I resize the main form (which ripples through and resizes just about everything on the form that is aligned to something), I experience long delays, like several seconds. The app freezes, the idle handler is not called and running threads appear to suspend also.
I have tries pausing execution in the IDE while this is happening in an attempt to break execution while it is in the troublesome code, but the IDE is not taking messages.
Obviously I'm not expecting anyone to point me at some errant piece of code, but I'm after debugging approaches that might help me. I have extensive execution timing code throughout the app, and the long delays don't show up in any of the data. For example, the execution time of the main form OnResize handler is minimal.
If you want to find out what's actually taking up your time, try a profiler. Sampling Profiler could answer your question pretty easily, especially if you're able to find the beginning and the end of the section of code that's causing trouble and insert OutputDebugString statements around it to narrow down the profiling.
OK. Problem solved. I noticed that the problem only occurred when I had command-line switches enabled to log some debug info. The debug info included some HTTP responses that were written to a debug log (a TMemo) on one of the tabs. When the HTTP response included a large block with no CR/LFs the TMemo wrapped it. Whenever I resized the main form, the TMemo resized and the control had to render the text again with the new word wrapping.
To demonstrate:
start a new Delphi project
drop a TMemo onto the form
align it to Client
compile and run
paste a large amount of text into the TMemo
resize the main form
I won't award myself the answer, as I hadn't really provided enough info for anybody else to solve it.
BTW #Mason - would SamplingProfiler have picked this one up - given that the execution is inside the VCL, and not in my code?
A brute-force approach that may give results.... Put a debug message to OutputDebugString() from every re-size event, sending the name of the control as the string to be displayed. This may show you which ones are being called "a lot".
You may have a situation where controls are bumping each other, setting off cascading re-size events. Like 3 siblings in the back seat of a compact car, once they start jostling for position, it can take a while for them to "settle down".
Don't make me turn this car arround....
The debug log (viewable in the IDE, or with an external ODS viewer), may show you which ones are causing the most trouble, if they appear multiple times for one "user-initiated re-size event".
Run your application in AQTime's performance profiler (included with XE, but you can get a time-limited version from their website).
Do some fanatic resizing for a while, and then stop the application.
After that, you'll see exactly which function was called many times, and where most time was spent.

How to Prevent ProcessMessages in Delphi

The Application.ProcessMessages command is well known and I use it in long processes to ensure my program will not tie up the computer.
But I have one fairly quick set of processing, where I am buffering a view into a file. During the buffering procedure, a few system messages may get sent off (e.g. redraw or scrollbar move or other events). I want to prevent these from getting handled by ProcessMessages until my buffering is complete.
Is there any way to either:
Prevent Application.ProcessMessages until my procedure is complete, or
Trap all messages generated during my procedure, and not release them until the end of the procedure.
Allowing the ProcessMessages to continue even if it sends messages you don't want should not be classed as problematic. With a bit of code refactoring, you could move the buffering method into a separate thread and go from there.
If you are attempting to copy the "visual contents" of a control into a file,
look at the WM_PRINT(xxx) message which allows child controls to paint themselves into bitmaps
try the LockWindowUpdate Win32 API method call which will turn off all painting messages to that control
override the WndProc/DefaultWndProc method on your control class or even the parent class if you need to and simply return "true" for each message sent
override specific control methods (such as "scroll bar moved", "OnPaint", "OnPaintBackground" etc) on the control class or even the parent and simply do nothing if your buffering is in progress
Overriding the WndProc or DefaultWndProc and simply returning true for each message essentially "turns off" ProcessMessages but it's not safe to do it this way because the control might need to process one or more messages to function correctly.
Turning off ProcessMessages is not possible (without rewriting the VCL code for message processing) because of the fact that it's part of how the VCL form's message loop has been constructed.
Trap all messages generated during my procedure, and not release them
until the end of the procedure.
There is a dirty hack you can do (only if you can not come up with a better way):
You can watch (trap) any messages by using Win32 Hooks.
Specifically, use SetWindowsHookEx with WH_CALLWNDPROC as the idHook value.
You can then record them in a list/queue and resend them when you want.
I learned way back in Windows 2 that windows messages will happen at times you don't expect them. Any part of a library can cause your app's message processing to happen. Rather than hold back the tide, make your code robust against the situation. This may be as simple as usinga a BeginUpdate/EndUpdate pair, or more complex (using a temporary and doing the final update at the end).
At a pedantic level, the way you "prevent" Application.ProcessMessages is to not call any code that
shows a modal dialog
calls SendMessage
runs its own local message loop
calls Application.ProcessMessages (which is a local message loop)
If you write a loop that does nothing but numerical calculations and file I/O, your UI will be frozen until you exit the loop because no messages are being processed.
If you want your UI to be responsive during some long-running operation of unknown arbitrary code (third party library) but you don't want certain kinds of actions to occur in your app during that time, that's a different problem - that's about preventing reentrancy. You want to prevent some parts of your code from being used while a particular activity is in progress. For example, modal dialogs prevent you from interacting with the app windows underneath the dialog by disabling all the app's top level windows except the modal dialog itself.

Resources