Creating a CDATA section using xsl:text element within xslt processor - orbeon

Assuming I have an XSL variable called $apps with XML content:
<APPLICATION><DATA1/><DATA2/><DATA3/></APPLICATION>
I am trying to generate a string from this XML, with XML special chars handled, using:
let $applicationsModified := <xsl:text disable-output-escaping="yes"><![CDATA[</xsl:text><xsl:copy-of select="$apps"/><xsl:text disable-output-escaping="yes">]]></xsl:text>
What I get is:
let $applicationsModified := <?javax.xml.transform.disable-output-escaping?></xsl:text><xsl:copy-of select="$apps"/><xsl:text disable-output-escaping="yes"><?javax.xml.transform.enable-output-escaping?>
What I want to get is:
<![CDATA[<APPLICATION><DATA1/><DATA2/><DATA3/></APPLICATION>]]>
Am I doing something wrong?

Related

XElement Parse error when trying to parse string

I am getting xml parse error while trying to parse a string (with CDATA within CDATA)
var cont = "<op><![CDATA[someData<p><![CDATA[someotherData]]></p></op>";
XElement.Parse(cont);
Error:
The 'op' start tag on line 1 position 2 does not match the end tag of 'p'. Line 1, position 52.
Can we have CDATA within CDATA ? If we can, then why am I getting the error.
Below code works fine (It does not contain CDATA within CDATA).
var cont = "<op><![CDATA[someData]]></op>";
XElement.Parse(cont);
1 <op>
2 <![CDATA[
3 someData
4 <p>
5 <![CDATA[someotherData]]>
6 </p>
7 </op>
When the XML parser encounters the ]]> in line 5, it terminates the first <![CDATA[ it met in line 2. As a result, you can never nest a CDATA section inside another CDATA section.
CDATA is not designed to hold XML elements, but to hold character data that might contain characters such as < and >. It lets us avoid escaping them as &lt; and &gt; respectively, and write and display them in a clean way.
So the content between <![CDATA[ and ]]> is treated as plain text, with no further processing, even if it looks as though there is a hierarchy. In other words, it is a plain string. Let's take your code as an example:
var cont = "<op><![CDATA[ <foo><bar></bar></foo> ]]></op>";
var xml=XElement.Parse(cont);
Here the FirstNode of xml will be plain text, <foo><bar></bar></foo>, and that text node has no child nodes of its own.
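The same behaviour can be reproduced outside .NET. The following is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree parser as a stand-in for XElement; the variable names are illustrative only. It shows that CDATA content comes back as one string and that a "nested" CDATA section is rejected.

```python
import xml.etree.ElementTree as ET

# CDATA content comes back as one plain string; no child elements are created.
root = ET.fromstring("<op><![CDATA[ <foo><bar></bar></foo> ]]></op>")
cdata_text = root.text
child_count = len(root)

# A "nested" CDATA section: the first ]]> closes the outer section,
# so the parser then meets a stray </p> end tag and rejects the document.
nested = "<op><![CDATA[someData<p><![CDATA[someotherData]]></p>]]></op>"
try:
    ET.fromstring(nested)
    nested_error = None
except ET.ParseError as exc:
    nested_error = str(exc)

print(repr(cdata_text), child_count, nested_error)
```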
Since the parser always treats the data between <![CDATA[ and ]]> as a plain string, there is no "standard" way to represent nested sections. Just encode the data and decode it afterwards. For example, we can URL-encode the data:
string xmlstr = @"<op><![CDATA[
<helloworld/>
someData%0A%3Cp%3E%0A%3C!%5BCDATA%5BsomeotherData%5D%5D%3E%0A%3C%2Fp%3E
]]></op>";
var xml = XElement.Parse(xmlstr);
var subxmlString = System.Web.HttpUtility.UrlDecode(xml.Value);
// wrap the decoded fragment in a root element so it parses as one document
var subxml = XElement.Parse($"<root>{subxmlString}</root>");
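For comparison, here is a rough Python sketch of the same encode-then-decode round trip, assuming urllib.parse.quote and unquote as the counterparts of URL encoding and HttpUtility.UrlDecode. The inner fragment and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, unquote

# Inner fragment that itself contains a CDATA section (illustrative data).
inner = "someData<p><![CDATA[someotherData]]></p>"

# URL-encoding removes every < > [ ] so no ]]> can end the outer CDATA early.
outer = f"<op><![CDATA[{quote(inner)}]]></op>"

# Parse the outer document, then decode the payload back to the original text.
decoded = unquote(ET.fromstring(outer).text)

# Wrap the decoded fragment in a root element so it parses as one document.
sub = ET.fromstring(f"<root>{decoded}</root>")
print(sub.text, sub.find("p").text)
```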

Can MSXML XPath select attributes? ( UPD: real issue was with default no-prefix namespace )

I want to try parsing Excel XML Spreadsheet file with MSXML and XPath.
https://technet.microsoft.com/en-us/magazine/2006.01.blogtales
https://msdn.microsoft.com/en-us/library/aa140066.aspx
It has a root element of <Workbook xmlns.... xmlns....> and a bunch of next-level nodes <Worksheet ss:Name="xxxx">.
<?xml version="1.0" encoding="UTF-8"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
....
<Worksheet ss:Name="Карточка">
....
</Worksheet>
<Worksheet ss:Name="Баланс">
...
...
...
</Worksheet>
</Workbook>
At a certain step I want to use XPath to get the very names of the worksheets.
NOTE: I do not want to get the names indirectly, that is, by selecting the Worksheet nodes first and then enumerating them and manually reading their ss:Name attribute nodes. That I can do, and it is not the topic here.
What I want is to utilize XPath flexibility: to directly fetch those ss:Name nodes without extra indirection layers.
procedure DoParseSheets( FileName: string );
var
rd: IXMLDocument;
ns: IDOMNodeList;
n: IDOMNode;
sel: IDOMNodeSelect;
ms: IXMLDOMDocument2;
ms1: IXMLDOMDocument;
i: integer;
s: string;
begin
rd := TXMLDocument.Create(nil);
rd.LoadFromFile( FileName );
if Supports(rd.DocumentElement.DOMNode,
IDOMNodeSelect, sel) then
begin
ms1 := (rd.DOMDocument as TMSDOMDocument).MSDocument;
if Supports( ms1, IXMLDOMDocument2, ms) then begin
ms.setProperty('SelectionNamespaces',
'xmlns="urn:schemas-microsoft-com:office:spreadsheet" '+
'xmlns:o="urn:schemas-microsoft-com:office:office" '+
'xmlns:x="urn:schemas-microsoft-com:office:excel" '+
'xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"');
ms.setProperty('SelectionLanguage', 'XPath');
end;
// ns := sel.selectNodes('/Workbook/Worksheet/@ss:Name/text()');
// ns := sel.selectNodes('/Workbook/Worksheet/@Name/text()');
ns := sel.selectNodes('/Workbook/Worksheet/@ss:Name');
// ns := sel.selectNodes('/Workbook/Worksheet/@Name');
// ns := sel.selectNodes('/Workbook/Worksheet');
for i := 0 to ns.length - 1 do
begin
n := ns.item[i];
s := n.nodeValue;
ShowMessage(s);
end;
end;
end;
When I use the dumbed-down '/Workbook/Worksheet' query, MSXML correctly returns the nodes. But as soon as I add the attribute to the query, MSXML returns an empty set.
Other XPath implementations like XMLPad Pro or http://www.freeformatter.com/xpath-tester.html correctly return the list of ss:Name attribute nodes. But MSXML does not.
What would be the XPath query text to help MSXML return the attribute nodes with given names ?
UPD. @koblik suggested a link to the MS.Net selector (not the MSXML one) and there are two examples there
https://msdn.microsoft.com/en-us/library/ms256086(v=vs.110).aspx
Example 1: book[@style] - All <book> elements with style attributes, of the current context.
Example 2: book/@style - The style attribute for all <book> elements of the current context.
That is the difference I described in the "NOTE" above: I don't need those books, I need the styles. I need attribute nodes, not element nodes!
And that Example 2 syntax is what MSXML seems to fail at.
UPD.2: One tester shows an interesting error message:
The default (no prefix) Namespace URI for XPath queries is always '' and it cannot be redefined to 'urn:schemas-microsoft-com:office:spreadsheet'
I wonder whether that claim about no default namespace in XPath is really part of the standard or just a limitation of the MSXML implementation.
If the default namespace is deleted, the results are as they should be (the two variants tried, Variant 1 and Variant 2, are not reproduced here).
UPD.3: Martin Honnen in comments explains that line: See w3.org/TR/xpath/#node-tests for XPath 1.0 (as supported by Microsoft MSXML), it clearly states "A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. This is the same way expansion is done for element type names in start and end-tags except that the default namespace declared with xmlns is not used: if the QName does not have a prefix, then the namespace URI is null". So in XPath 1.0 a path like "/Workbook/Worksheet" selects elements of that name in no namespace.
UPD.4: So the selection works with the '/ss:Workbook/ss:Worksheet/@ss:Name' XPath query, which returns the "ss:Name" attribute nodes directly. In the source XML document both the default (no-prefix) namespace and the "ss:" namespace are bound to the same URI. The XPath engine acknowledges this URI, but not the default namespace, which cannot be redefined in the MSXML XPath engine (it implements the 1.0 spec). So to make it work, the default namespace's URI should be mapped to an explicit prefix (either an existing one or a newly created one), and that substitute prefix is then used in the XPath selection string. Since namespaces are matched by URI and not by prefix, it does not matter whether the prefixes used in the document and in the query are the same.
ms.setProperty('SelectionLanguage', 'XPath');
ms.setProperty('SelectionNamespaces',
'xmlns:AnyPrefix="urn:schemas-microsoft-com:office:spreadsheet"');
and then
ns := sel.selectNodes(
'/AnyPrefix:Workbook/AnyPrefix:Worksheet/@AnyPrefix:Name' );
Thanks to Asbjørn and Martin Honnen for explaining those trivial after-the-fact but not obvious a priori relations.
The issue is that MSXML doesn't support default namespaces when using XPath. To overcome this, you must give the default namespace an explicit prefix, and use this:
ms.setProperty('SelectionNamespaces',
'xmlns:d="urn:schemas-microsoft-com:office:spreadsheet" '+
'xmlns:o="urn:schemas-microsoft-com:office:office" '+
'xmlns:x="urn:schemas-microsoft-com:office:excel" '+
'xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"');
Note how I added the d prefix to the default namespace. Then you can do the selection like this:
ns := sel.selectNodes('/d:Workbook/d:Worksheet/@ss:Name');
The reason this works is that when parsing the XML data, MSXML associates the namespace to each node. At this stage it does handle the default namespace, so the Workbook elements get associated with the urn:schemas-microsoft-com:office:spreadsheet namespace.
However, note that it does not store the namespace prefixes! Thus you can use your own prefixes for the namespaces when you set SelectionNamespaces.
Now, when doing the XPath selection, if the nodes have a namespace you have to specify namespaces for all elements in the XPath, like my example above. And then you use your own prefixes which you set with SelectionNamespaces.
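The "match by URI, not by prefix" rule is not specific to MSXML. As a rough illustration, here is a Python sketch that assumes the standard-library xml.etree.ElementTree in place of MSXML. ElementTree implements only a subset of XPath and cannot return attribute nodes, so the attribute is read with get() here; the prefix "any" is an arbitrary choice, not something from the document.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"

# Trimmed-down workbook: the default namespace and ss: share one URI.
doc = f'''<Workbook xmlns="{SS}" xmlns:ss="{SS}">
  <Worksheet ss:Name="Карточка"/>
  <Worksheet ss:Name="Баланс"/>
</Workbook>'''
root = ET.fromstring(doc)

# An unprefixed name in the path means "no namespace", so nothing matches.
unprefixed = root.findall("Worksheet")

# Any prefix works as long as it is mapped to the right URI.
prefixed = root.findall("any:Worksheet", {"any": SS})

# Attributes are keyed by {namespace-uri}local-name in ElementTree.
names = [ws.get(f"{{{SS}}}Name") for ws in prefixed]
print(len(unprefixed), names)
```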

Embed image in XML docs

The MSDN page about XML documentation shows that you can write simple things like:
/// <summary>Builds a new string whose characters are the results of applying the function <c>mapping</c>
/// to each of the characters of the input string and concatenating the resulting
/// strings.</summary>
/// <param name="mapping">The function to produce a string from each character of the input string.</param>
///<param name="str">The input string.</param>
///<returns>The concatenated string.</returns>
///<exception cref="System.ArgumentNullException">Thrown when the input string is null.</exception>
val collect : (char -> string) -> string -> string
But can you embed images in your XML documentation?
You can include <img ... /> and other HTML tags in the XML documentation. I just tried this and Visual Studio simply skips over the image (so you will not see it in the IntelliSense) but the fshtmldoc tool in F# Power Pack simply copies the HTML tags to the output HTML document including images.
/// <summary>Hi <img src="http://tomasp.net/img/fpman.jpg" /> there!</summary>
type IMultiKey =
// (...)
Gives me the following generated documentation:
I think the C# compiler does some additional validation of the XML tags, but I do not think this is done in the F# compiler. As an aside, I find writing the XML documentation annoyingly long-winded, so I was playing with using F# Formatting to write it in Markdown instead (but I do not have anything ready yet).

XML Parsing - node.text method removing trailing spaces

I have a big XML file which I'm parsing using JScript. I used the following code to load the XML:
var xmlDoc = Sys.OleObject("Msxml2.DOMDocument.6.0");
xmlDoc.async = false;
// Load xml data from a file
xmlDoc.load(this._studyDocPath);
Now if I use the following code
var node = this.xmlDoc.selectSingleNode(xPath);
var text = node.text;
the text variable holds the inner text of a particular tag. But if I have a tag like this
<Text>ABCD </Text>
then node.text returns only the value 'ABCD', i.e. it automatically trims the space. But I don't want any trailing spaces trimmed. I need the text as it is. How can I achieve that?
Looking forward to your response
Thanks in Advance
We can use node.firstChild.nodeValue instead, with a null check on node.firstChild.
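The null check matters because an empty element has no first child. Below is a small sketch of the pattern in Python, assuming xml.dom.minidom, which exposes the same W3C DOM property names (firstChild, nodeValue). It does not reproduce the MSXML trimming of .text, and raw_text is a hypothetical helper name.

```python
from xml.dom import minidom

def raw_text(node):
    """Return the first child's value untouched, or '' if there is no child."""
    child = node.firstChild
    return child.nodeValue if child is not None else ""

doc = minidom.parseString("<Root><Text>ABCD </Text><Empty/></Root>")
text_el = doc.getElementsByTagName("Text")[0]
empty_el = doc.getElementsByTagName("Empty")[0]

kept = raw_text(text_el)     # trailing space is preserved
missing = raw_text(empty_el)  # no child node, so the fallback is used
print(repr(kept), repr(missing))
```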

Finding elements with XPath in Delphi

I am trying to find an element in an XML document in Delphi. I have this code, but it always says 0 elements in the log:
function TForm1.KannaSidu: Boolean;
var
Doc: IXMLDOMDocument;
List: IXMLDomNodeList;
begin
try
Doc := CreateOleObject('Microsoft.XMLDOM') as IXMLDomDocument;
Doc.async:=False;
Doc.load(Filename);
except
LogTx('Error on page');
end;
List:=Doc.selectNodes('/html/head');
LogTx(IntToStr(List.length)+' elements');
Result:=False;
end;
So how do I make XPath work?
In the example code I find online for the selectNodes method, it is preceded by code that sets the document's SelectionNamespaces property via setProperty. Some even set SelectionLanguage, too.
Doc.setProperty('SelectionLanguage', 'XPath');
Doc.setProperty('SelectionNamespaces',
'xmlns:xsl=''http://www.w3.org/1999/XSL/Transform''');
Based on the element names you're searching for, I guess you're processing an HTML file. The basic HTML elements are in the http://www.w3.org/1999/xhtml namespace, so try this:
Doc.setProperty('SelectionNamespaces',
'xmlns:x=''http://www.w3.org/1999/xhtml''');
List := Doc.selectNodes('/x:html/x:head');
See also:
selectNodes does not give node list when xmlns is used on Microsoft's forum.
If you're just trying to load a plain html file as xml, it would probably have multiple reasons to fail and choke on things like:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
You have to test that it actually loads correctly before doing anything else:
if not Doc.load(filename) then
raise Exception.Create('XML Loading error:' + Trim(Doc.parseError.reason));
It will give you the specific reason for the failure like this one:
XML Loading error:End tag 'head' does not match the start tag 'link'.
IXMLDOMDocument.Load() does not raise an exception if something goes wrong with your file or with its content. Try the following to be sure there is nothing bad with it:
...
Doc.load(Filename);
if Doc.parseError.errorCode <> 0 then
ShowMessage('Error : ' + Doc.parseError.reason)
else
ShowMessage('No problem so far !');
...
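The same "check the load result before querying" idea, sketched in Python under the assumption that xml.etree.ElementTree stands in for MSXML. Unlike IXMLDOMDocument.load, this parser raises on malformed input, so the sketch catches the exception and returns the reason; parse_or_reason and the sample markup are illustrative only.

```python
import xml.etree.ElementTree as ET

def parse_or_reason(text):
    """Return (root, None) on success, or (None, reason) if not well-formed."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as exc:
        return None, str(exc)

# Well-formed markup parses and can then be queried.
good_root, good_reason = parse_or_reason("<html><head/></html>")

# Typical HTML: <link> is never closed, so </head> does not match.
bad_root, bad_reason = parse_or_reason('<html><head><link rel="x"></head></html>')

print(good_root.find("head") is not None, bad_reason)
```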
I suck at XPath but maybe if html is your root node you don't need to include it in your query string, so try the following :
List:=Doc.selectNodes('//html/head');
or
List:=Doc.selectNodes('//head');
