Delphi TWebBrowser get HTML source after AJAX load

I have the following function that gets the HTML document's source code after the DocumentComplete event fires.
function TBrowser.GetWebBrowserHTML(const WebBrowser: TWebBrowser): string;
var
  LStream: TStringStream;
  Stream : IStream;
  LPersistStreamInit : IPersistStreamInit;
begin
  try
    if not Assigned(WebBrowser.Document) then exit;
    LStream := TStringStream.Create('', TEncoding.UTF8);
    try
      LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
      Stream := TStreamAdapter.Create(LStream,soReference);
      LPersistStreamInit.Save(Stream,true);
      result := LStream.DataString;
    finally
      LStream.Free();
    end;
  except
  end;
end;
The problem: the source code is retrieved before AJAX calls are performed on the page.
The page finishes loading (as WebBrowser determines), but AJAX continues to modify the DOM and additional elements appear on the page.
What I need is the equivalent of Mozilla's "View Generated Source", or the HTML source that appears when inspecting the web page with Firebug, Chrome Inspector, or IE Developer Tools.
It seems that in C# there is a DocumentText property that does this, but I couldn't find any property or method to achieve this in Delphi.
Any ideas/hints/help please?

You can use the IHTMLDocument2 interface, which is implemented by the object that TWebBrowser.Document returns. The property is exposed as an IDispatch, but you can cast it to the interface, or to an (Ole)Variant, although you won't benefit from code completion then.
The document object also implements IHTMLDocument3, which supports the documentElement property pointing to the root element of the document. That element (as any other) has the property outerHTML, which gives you the element and all its contents as a string:
var
  d: OleVariant;
begin
  // Late-bound access: property names are resolved at run time
  d := WebBrowser1.Document;
  ShowMessage(d.documentElement.outerHTML);
end;
As far as I can see, this is the actual state of the document, including any changes that are made by Javascript.
It doesn't seem to include the doctype, but then again, if I find the doctype element through Webbrowser1.Document.All, then its outerHTML property doesn't return anything. Other parts of the document are also changed (tag names in capitals, for one), but that only confirms that this is a generated document structure based on the loaded DOM, rather than the original source of the document.
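For readers who prefer early binding, the same idea can be sketched with the typed interfaces from the MSHTML unit. This is only a minimal sketch: the unit name GeneratedHtml and the function name GetGeneratedHTML are made up for illustration, and it assumes the document object supports IHTMLDocument3.

```pascal
unit GeneratedHtml;

interface

uses
  SysUtils, SHDocVw, MSHTML;

// Hypothetical helper: returns the current DOM serialized as HTML,
// or an empty string if no HTML document is loaded.
function GetGeneratedHTML(const WebBrowser: TWebBrowser): string;

implementation

function GetGeneratedHTML(const WebBrowser: TWebBrowser): string;
var
  Doc3: IHTMLDocument3;
  Root: IHTMLElement;
begin
  Result := '';
  // Supports() performs the QueryInterface and copes with a nil Document
  if not Supports(WebBrowser.Document, IHTMLDocument3, Doc3) then
    Exit;
  Root := Doc3.documentElement;
  if Assigned(Root) then
    Result := Root.outerHTML;
end;

end.
```

Because the string reflects the DOM at the moment of the call, it would have to be called after the page's scripts have finished changing the document (for example from a button click or a timer), not directly in DocumentComplete.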

Related

How can I add a hyperlink to an open Word document in an OleContainer

I want to use MS Word as the editor for my HTML document.
I open another form with a list of files.
I want the user to select one of the files
and add it as a link in the open document (at the place the user selected).
I open the HTML document in Word, created in an OleContainer,
with:
with OleContainerFrame do
begin
OleContainer1.CreateObjectFromFile(FileToEditName{myfile.html}, False);
OleContainer1.AutoActivate := aaGetFocus;
OleContainer1.DoVerb(ovOpen);
OleContainer1.Run;
end;
How can I add this link, with something like:
AddHyperLink(SomeText,TheHyperLink)....
at the place the user selected?
Suppose there is a TEdit on your form which contains a URI (I used the BBC's site). Then the following code will add a hyperlink to it in the active Word document in your OLEContainer:
procedure TForm1.Button1Click(Sender: TObject);
begin
OleContainer1.OleObject.ActiveDocument.Hyperlinks.Add(
Anchor := OleContainer1.OleObject.Selection.Range,
Address := Edit1.Text, // contains e.g. http://www.bbc.co.uk
TextToDisplay := 'Link'
);
end;
The way this works is that OleContainer1.OleObject is a variant reference to Word.Application (see e.g. the Word2000.Pas unit that comes with Delphi) and once you have this reference you can call Word's automation methods using late (or early) binding.
Btw the unusual syntax of the arguments to OleContainer1.OleObject.ActiveDocument.Hyperlinks.Add is a special syntax that Delphi supports to enable named parameters to be used in latebound calls.
Update: You say in a comment that you have tried the code above but get the error "Method 'Selection' not supported by automation object". When I put together my test project, I didn't have an association set up between HTML and MS Word, so I wrote the code necessary to activate Word and load an HTML file into it. I did this in the FormCreate event:
procedure TForm1.FormCreate(Sender: TObject);
var
V : OleVariant;
AFileName : String;
begin
OleContainer1.CreateObject('Word.Application', False);
OleContainer1.Run;
V := OleContainer1.OleObject;
Caption := V.Name;
V.Visible := True;
AFileName := ExtractFilePath(Application.ExeName) + 'Hello.Html';
V.Documents.Add(AFileName);
end;
Note that this and Button1Click are the entire code of my project and it inserts the link as you asked. If you get a different result, I think it must be because of some detail of your set-up that we readers can't see.
Yes, that works.
I did not know we could use
(Anchor := ....
);
But now
Word removes the exact path and changes it to 'href="../../../../MzIAI/Images/2019-06/12/45545_5679.Pdf">'
and removes the full path.

Delphi, retrieve both visible text and hidden hyperlink when pasting into a delphi application

How can I do that? I've been looking all over the internet to find some clues but failed.
You can click on a link in the browser and copy it and then paste it into a word doc document for example.
I am using a tcxGrid with some fields and want to paste this link into a field. The field will show the text, but if you click on it, it will open the browser with this link.
I can handle the latter part, but I don't know how to extract the text and the link from the clipboard.
Does anyone know how to do it?
I've found an old article that describes how you can do it but the result is not good. I get Chinese text instead of HTML.. see below my test code:
function TForm2.clipBoardAsHTML: string;
var
CF_HTML: UINT;
CFSTR_INETURL: UINT;
URL: THandle;
HTML: THandle;
Ptr: PChar;
begin
CF_HTML := RegisterClipboardFormat('HTML Format');
CFSTR_INETURL := RegisterClipboardFormat('UniformResourceLocator');
result := '';
with Clipboard do
begin
Open;
try
HTML := GetAsHandle(CF_HTML);
if HTML <> 0 then
begin
Ptr := PChar(GlobalLock(HTML));
if Ptr <> nil then
try
Result := Ptr;
finally
GlobalUnlock(HTML);
end;
end;
finally
Close;
end;
end;
end;
Data looks like:
敖獲潩㩮⸱ര匊慴瑲呈䱍〺〰〰〰ㄲര䔊摮呈䱍〺〰〰㈰㐳ള匊慴
and much more.
So it looks like something is wrong with my code. :(
The recommended format CFSTR_INETURL does not exist on the clipboard when copying from Firefox or Excel, so I couldn't get any data using that format.
==================================
Latest test - Retrieve of format names.
procedure TForm2.Button2Click(Sender: TObject);
var
  i: Integer;
  szFmtBuf: array[0..350] of WideChar; // character buffer, not an array of pointers
  fn: string;
  fmt: Integer;
begin
  Memo1.Clear;
  for i := 0 to Clipboard.FormatCount - 1 do
  begin
    fmt := Clipboard.Formats[i];
    // Only registered formats (49152 = $C000 and up) have a retrievable name
    if fmt >= 49152 then
    begin
      szFmtBuf[0] := #0;
      // The last parameter is the buffer size in characters, not bytes
      GetClipboardFormatName(fmt, szFmtBuf, Length(szFmtBuf));
      fn := szFmtBuf;
      Memo1.Lines.Add(fmt.ToString + ' - ' + fn);
    end;
  end;
end;
Finally I made this code work :) but the main question, how to get the URL from the clipboard, is still unsolved. :(
If I loop through all the formats found, I only get garbage from them.
The formats from Firefox look like this:
49161 - DataObject
49451 - text/html
49348 - HTML Format
50225 - text/_moz_htmlcontext
50223 - text/_moz_htmlinfo
50222 - text/x-moz-url-priv
49171 - Ole Private Data
It really depends on which format(s) the copier decides to place on the clipboard. It may place multiple formats on the clipboard at a time.
A hyperlink with url and optional text may be represented using either:
the Shell CFSTR_INETURL format (registered name: 'UniformResourceLocator') containing the URL of the link, and the CF_(UNICODE)TEXT format containing the text of the link, if any.
the CF_HTML format (registered name: 'HTML Format') containing whole fragments of HTML, including <a> hyperlinks and optional display text.
The VCL's TClipboard class has HasFormat() and GetAsHandle() methods for accessing the data of formats other than CF_(UNICODE)TEXT (which can be retrieved using the TClipboard.AsText property).
You need to use the Win32 RegisterClipboardFormat() function at runtime to get the format IDs for CFSTR_INETURL and CF_HTML (using the name strings mentioned above) before you can then use those IDs with HasFormat() and GetAsHandle().
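A minimal sketch of those calls for the 'HTML Format' data is shown here. It relies on the fact that CF_HTML data is stored as UTF-8 bytes, so the locked memory is read through a PAnsiChar and decoded explicitly; reading it through a PChar in a Unicode version of Delphi pairs the bytes up into UTF-16 characters, which would explain garbage that looks like Chinese text. The unit name ClipboardHtml and the function name GetClipboardHtml are made up for illustration.

```pascal
unit ClipboardHtml;

interface

// Hypothetical helper: returns the raw 'HTML Format' clipboard text
// (header lines plus HTML), or '' if that format is not available.
function GetClipboardHtml: string;

implementation

uses
  Windows, SysUtils, Clipbrd;

function GetClipboardHtml: string;
var
  Fmt: UINT;
  Data: THandle;
  P: PAnsiChar;
  Len, MaxLen: Integer;
  Bytes: TBytes;
begin
  Result := '';
  Fmt := RegisterClipboardFormat('HTML Format');
  if Fmt = 0 then
    Exit;
  Clipboard.Open;
  try
    if not Clipboard.HasFormat(Fmt) then
      Exit;
    Data := Clipboard.GetAsHandle(Fmt);
    if Data = 0 then
      Exit;
    P := GlobalLock(Data);
    if P = nil then
      Exit;
    try
      // Copy the bytes up to the terminating null (or the end of the block)
      MaxLen := Integer(GlobalSize(Data));
      Len := 0;
      while (Len < MaxLen) and (P[Len] <> #0) do
        Inc(Len);
      SetLength(Bytes, Len);
      if Len > 0 then
        Move(P^, Bytes[0], Len);
    finally
      GlobalUnlock(Data);
    end;
    // Decode the UTF-8 bytes into a Delphi (UTF-16) string
    Result := TEncoding.UTF8.GetString(Bytes);
  finally
    Clipboard.Close;
  end;
end;

end.
```

The returned string still contains the CF_HTML header (Version, StartHTML, and so on) in front of the markup.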
You can also enumerate the formats that are currently available on the clipboard, using the TClipboard.FormatCount and TClipboard.Formats[] properties. For format IDs in the $C000..$FFFF range, use the Win32 GetClipboardFormatName() function to retrieve the names that were originally registered with RegisterClipboardFormat().
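Once the 'HTML Format' text has been decoded into a string, the copied selection and the URL of a link still have to be pulled out of it. The following is a rough sketch under stated assumptions: it assumes the producer wrapped the selection in the usual <!--StartFragment--> and <!--EndFragment--> comments (some producers write them slightly differently), and it finds only the first double-quoted href attribute with a plain text search, which is no substitute for a real HTML parser. The function names are made up for illustration.

```pascal
program CfHtmlFragmentDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Returns the text between the StartFragment and EndFragment comments,
// or '' if either marker is missing.
function ExtractHtmlFragment(const CfHtml: string): string;
const
  StartMarker = '<!--StartFragment-->';
  EndMarker = '<!--EndFragment-->';
var
  StartPos, EndPos: Integer;
begin
  Result := '';
  StartPos := Pos(StartMarker, CfHtml);
  EndPos := Pos(EndMarker, CfHtml);
  if (StartPos = 0) or (EndPos = 0) then
    Exit;
  Inc(StartPos, Length(StartMarker));
  if EndPos < StartPos then
    Exit;
  Result := Copy(CfHtml, StartPos, EndPos - StartPos);
end;

// Returns the value of the first href="..." attribute, or '' if none is found.
function ExtractFirstHref(const Html: string): string;
const
  Attr = 'href="';
var
  StartPos, I: Integer;
begin
  Result := '';
  // LowerCase only changes ASCII letters, so positions stay valid for Html
  StartPos := Pos(Attr, LowerCase(Html));
  if StartPos = 0 then
    Exit;
  Inc(StartPos, Length(Attr));
  I := StartPos;
  while (I <= Length(Html)) and (Html[I] <> '"') do
    Inc(I);
  if I > Length(Html) then
    Exit; // no closing quote
  Result := Copy(Html, StartPos, I - StartPos);
end;

var
  Fragment: string;
begin
  Fragment := ExtractHtmlFragment(
    '<html><body><!--StartFragment--><a href="http://www.bbc.co.uk">BBC</a>' +
    '<!--EndFragment--></body></html>');
  Writeln(Fragment);                   // <a href="http://www.bbc.co.uk">BBC</a>
  Writeln(ExtractFirstHref(Fragment)); // http://www.bbc.co.uk
end.
```

The visible text of the link can usually be taken from the plain-text format via TClipboard.AsText, as described above, rather than by stripping tags from the fragment.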

How can I detect if a button was pressed on a simple web page using EmbeddedWB

I am using EmbeddedWB in my application and I have a simple web page with a button:
<input name=mpi type=submit value=" Continue ">
I was trying with the following, but it was no good:
if E.outerHTML = '<input name=mpi type=submit value=" Continue ">' then
begin
if rLoginGate.IsConnectedToLoginGate then
begin
ToggleChatBtn;
end;
end;
What I want is for my application to detect when I press the button and run a simple command, such as showing a message box. Has anyone got any ideas?
thanks
A way to do this is using the HTML DOM Object model interfaces in MSHTML.PAS.
My earlier answer to "Detect when the active element in a TWebBrowser document changes" shows how to access this via the Document object of a TWebBrowser. TEmbeddedWB provides access to it via its Document object, too.
That answer and the comments on it show how to catch events related to a specific node in the document, and also specific event(s).
Of course, if the HTML is under your control, you can make things easier for yourself by giving the HTML node(s) you're interested in an ID or attribute that's easy to find via the DOM model.
The following shows how to modify the code example in my linked answer
to attach an OnClick handler to a specific element node:
procedure TForm1.btnLoadClick(Sender: TObject);
var
  V : OleVariant;
  Doc : IHTMLDocument3;
  Doc2 : IHTMLDocument2;
  E : IHTMLElement;
begin
  // First, navigate to About:Blank to ensure that the WebBrowser's document is not null
  WebBrowser1.Navigate('about:blank');
  // Pick up the Document's IHTMLDocument2 interface, which we need for writing to the Document
  Doc2 := WebBrowser1.Document as IHTMLDocument2;
  // Pick up the Document's IHTMLDocument3 interface, which we need for finding a DOM
  // Element by ID
  Doc2.QueryInterface(IHTMLDocument3, Doc);
  Assert(Doc <> Nil);
  // Load the WebBrowser with the HTML contents of a TMemo
  V := VarArrayCreate([0, 0], varVariant);
  V[0] := Memo1.Lines.Text;
  try
    Doc2.Write(PSafeArray(TVarData(v).VArray));
  finally
    Doc2.Close;
  end;
  // Find the ElementNode whose OnClick we want to handle
  V := Doc.getElementById('input1');
  E := IDispatch(V) as IHTMLElement;
  Assert(E <> Nil);
  // Create an EventObject as per the linked answer
  // (DocEvent is not declared here; presumably it is a field of the form, as in the linked answer)
  DocEvent := TEventObject.Create(Self.AnEvent, False) as IDispatch;
  // Finally, assign the input1 Node's OnClick handler
  E.onclick := DocEvent;
end;
PS: It's a while since I used TEmbeddedWB and it may turn out that there's a more direct way of doing this stuff, because it was subject to a lot of changes after I stopped using it (in the D5 era). Even so, you won't be wasting your time studying this stuff, because COM events are applicable to all sorts of things, not just the HTML DOM model.
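Since the TEventObject class from the linked answer is not reproduced here, the following is a rough sketch of what such an event object can look like. It is not the class from that answer: the unit name ClickSink, the class name TClickSink and its constructor are made up for illustration. The idea is that MSHTML calls Invoke with DISPID_VALUE on whatever IDispatch is assigned to an element's onclick property, so a tiny IDispatch implementation is enough to forward the click to a Delphi method.

```pascal
unit ClickSink;

interface

uses
  Windows, ActiveX, Classes;

type
  // Hypothetical helper: forwards a DOM event to a TNotifyEvent handler.
  TClickSink = class(TInterfacedObject, IDispatch)
  private
    FOnClick: TNotifyEvent;
  protected
    function GetTypeInfoCount(out Count: Integer): HResult; stdcall;
    function GetTypeInfo(Index, LocaleID: Integer; out TypeInfo): HResult; stdcall;
    function GetIDsOfNames(const IID: TGUID; Names: Pointer;
      NameCount, LocaleID: Integer; DispIDs: Pointer): HResult; stdcall;
    function Invoke(DispID: Integer; const IID: TGUID; LocaleID: Integer;
      Flags: Word; var Params; VarResult, ExcepInfo, ArgErr: Pointer): HResult; stdcall;
  public
    constructor Create(const OnClick: TNotifyEvent);
  end;

implementation

constructor TClickSink.Create(const OnClick: TNotifyEvent);
begin
  inherited Create;
  FOnClick := OnClick;
end;

function TClickSink.GetTypeInfoCount(out Count: Integer): HResult;
begin
  Count := 0; // no type information is offered
  Result := S_OK;
end;

function TClickSink.GetTypeInfo(Index, LocaleID: Integer; out TypeInfo): HResult;
begin
  Pointer(TypeInfo) := nil;
  Result := E_NOTIMPL;
end;

function TClickSink.GetIDsOfNames(const IID: TGUID; Names: Pointer;
  NameCount, LocaleID: Integer; DispIDs: Pointer): HResult;
begin
  Result := E_NOTIMPL;
end;

function TClickSink.Invoke(DispID: Integer; const IID: TGUID; LocaleID: Integer;
  Flags: Word; var Params; VarResult, ExcepInfo, ArgErr: Pointer): HResult;
begin
  if DispID = DISPID_VALUE then
  begin
    // The DOM event fired: pass it on to the Delphi handler, if any
    if Assigned(FOnClick) then
      FOnClick(Self);
    Result := S_OK;
  end
  else
    Result := DISP_E_MEMBERNOTFOUND;
end;

end.
```

With an IHTMLElement in E, the handler could then be attached with something like E.onclick := TClickSink.Create(ContinueClicked) as IDispatch; where ContinueClicked is an assumed TNotifyEvent method of the form that shows the message box.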

UI Automation - ElementFromHandle doesn't find element

I am trying to use UIAutomation to access/control Chrome browser (using that method based on what other people have used - eg to get the current URL).
For the purposes of the exercise, I'm trying to replicate this question - Retrieve current URL from C# windows forms application - in Delphi. I've imported the TLB OK. However, my call to ElementFromHandle never locates an element.
The signature of the ElementFromHandle method is:
function ElementFromHandle(hwnd: Pointer; out element: IUIAutomationElement): HResult; stdcall;
My test is simply:
procedure TForm3.Button1Click(Sender: TObject);
var
UIAuto: IUIAutomation;
element: IUIAutomationElement;
value: WideString;
h: PInteger;
begin
new(h);
h^ := $1094E;
SetForegroundWindow(h^);
ShowWindow(h^, SW_SHOW);
UIAuto := CoCUIAutomation.Create;
UIAuto.ElementFromHandle(h, element);
if Assigned(element) then
begin
element.Get_CurrentName(value);
showmessage('found -' + value);
end
else
showMessage('not found');
end;
Calls to SetForegroundWindow and ShowWindow are just there in case it needed focus (but I doubted that it would make a difference and doesn't). I can confirm that the Handle I'm passing in ($1094E) is "correct" in so much as Spy++ shows that value for the Chrome Tab I'm trying to access. The active tab in Chrome always reports that Handle.
Is my implementation correct above? Is there more to using UIAutomation than what I have implemented above? I have never explored it before.
Thanks
EDIT
I have found if I use ElementFromPoint and pass in a (hardcoded) value of where I know my Tab sits in terms of X,Y - it does work. ie:
UIAuto := CoCUIAutomation.Create;
p.x := 2916;
p.y := 129;
UIAuto.ElementFromPoint(p, element);
if Assigned(element) then
The above snippet, if placed in the above OnClick event, does return an element instance, and the one I'm expecting too (which is a bonus). So maybe I'm passing in an incorrect value for Hwnd in ElementFromHandle? That is, I'm using the "top" level handle of Chrome as found by MS Spy++.
This sits directly under (Desktop) in Spy++.
Your mistake is in the way that you pass the window handle to ElementFromHandle. You are meant to pass an HWND. Instead you pass the address of an HWND.
The function should really be:
function ElementFromHandle(hwnd: HWND;
out element: IUIAutomationElement): HResult; stdcall;
You should remove the call to New and instead do:
var
window: HWND;
....
window := HWND($1094E);
Then call the function like this:
if Succeeded(UIAuto.ElementFromHandle(window, element)) then
....
Perhaps your biggest fundamental problem is the complete absence of error checking. I think you need to adjust your mindset to realise that these API calls will not raise exceptions. They report failure through their return value. You must check every single API call for failure.
One common way to do that is to convert HRESULT values to exceptions in case of failure with calls to OleCheck. For example:
var
UIAuto: IUIAutomation;
element: IUIAutomationElement;
value: WideString;
window: HWND;
....
window := HWND($1094E);
SetForegroundWindow(window);
ShowWindow(window, SW_SHOW);
UIAuto := CoCUIAutomation.Create;
OleCheck(UIAuto.ElementFromHandle(window, element));
OleCheck(element.Get_CurrentName(value));
ShowMessage('found -' + value);

How can I get source code of page thru WebBrowser-Control (ActiveX InternetExplorer)?

How can I get source code of page thru WebBrowser Control (ActiveX InternetExplorer)?
I have an xml document "foo.xml".
var
Web: TWebBrowser;
begin
...
Web.Navigate('foo.xml');
// How can I get source code thru WebBrowser control<----
...
end;
In the DocumentCompleted event, look at the DocumentText property of the WebBrowser control. It should have the complete text of the loaded page.
(Web.Document as IHTMLDocument2).body.innerHTML;
This should return the HTML inside the body of the page (the as cast needs the MSHTML unit).
I thought this would be easy but it seems it might have been forgotten. You can easily do it with a TIdHTTP control though.
MyPage := IdHTTP1.Get('http://www.google.com');
I know it's not what you want but it might help.
Another method which works well is to use Synapse. Use the synapse call HttpGet to retrieve your initial resource (which gives you the source code) then manipulate as needed.
Another option would be to use the EmbeddedWB component which exposes MANY more properties and features of the web browser than the standard Delphi component does and still fits your requirement of doing it within the web browser.
To access the entire HTML of the page through your WebBrowser control use:
Web.Document.All[0].OuterHtml;
private void btnTest_Click(object sender, EventArgs e)
{
wbMain.Navigate("foo.xml");
wbMain.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(testing);
}
private void testing(Object sender, WebBrowserDocumentCompletedEventArgs e)
{
test = wbMain.DocumentText;
}
I know this is a little late but this works for me. wbMain is the WebBrowser Object.
WebBrowser1.Navigate() loads it into the RAD component window using the built in IE component in the Windows OS. What you do is respond to a callback (for the browser component, double-click the OnDownloadComplete event) and save it to file in that function. Snippets from working code:
procedure TMainForm.WB_SaveAs_HTML(WB : TWebBrowser; const FileName : string) ;
var
PersistStream: IPersistStreamInit;
Stream: IStream;
FileStream: TFileStream;
begin
if not Assigned(WB.Document) then
begin
Logg('Document not loaded!') ; //'Logg' adds a line to a log file.
Exit;
end;
PersistStream := WB.Document as IPersistStreamInit;
FileStream := TFileStream.Create(FileName, fmCreate) ;
try
Stream := TStreamAdapter.Create(FileStream, soReference) as IStream;
if Failed(PersistStream.Save(Stream, True)) then ShowMessage('SaveAs HTML fail!') ;
finally
FileStream.Free;
end;
end; (* WB_SaveAs_HTML *)
procedure TMainForm.WebBrowser1DownloadComplete(Sender: TObject);
begin
if (WebBrowser1.Document<>nil)AND NOT(WebBrowser1.busy) then begin
WB_SaveAs_HTML(WebBrowser1,'test.html');
//myStringList.loadFromFile('test.html'); //process it.
end;
end;
Note that some MIME ("file") types such as JSON give a 'Save As...' dialog in IE, which stops your reading and requires manual intervention.
