regular expression in XPATH - delphi

How can I use the XPath matches function to search for whole words in an XML tag?
The following code returns "unknown method matches":
XML_Doc:=CreateOleObject('Msxml2.DOMDocument.6.0') as IXMLDOMDocument3;
XML_DOC.selectNodes('/DATI/DATO[matches(TEST_TAG,"\bTest\b")]');
Example XML FILE
<DATI>
<DATO>
<TEST_TAG>Test</TEST_TAG>
</DATO>
<DATO>
<TEST_TAG>Test21</TEST_TAG>
</DATO>
<DATO>
<TEST_TAG>Abc</TEST_TAG>
</DATO>
</DATI>

matches is an XPath 2.0 function, and MSXML only supports XPath 1.0.
As far as I know there is no library supporting XPath 2.0 for Delphi (although I wrote an XPath 2 library for Free Pascal, which should not be too difficult to port).
You could use
/DATI/DATO[not(contains(TEST_TAG," "))]
to find words that do not contain a space, which is XPath 1.
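If whitespace is the only word separator that matters, a whole-word test can be approximated in XPath 1.0 by padding the normalized text with spaces and searching for the padded word. This is only a sketch under that assumption: unlike \b, it does not treat punctuation as a boundary, so "Test," would not match. It is wrapped in an XSLT 1.0 stylesheet here for testing; the select expression itself could also be passed to selectNodes.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:template match="/">
    <!-- Pad the normalized text with spaces, then look for ' Test ' so that
         'Test' only matches as a whole whitespace-delimited word -->
    <xsl:copy-of select="/DATI/DATO[contains(concat(' ', normalize-space(TEST_TAG), ' '), ' Test ')]"/>
  </xsl:template>
</xsl:stylesheet>
```

On the example file this selects the DATO containing Test but not the one containing Test21.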

Suppose that by "word" you mean a string that starts with a Latin letter and contains only Latin letters and decimal digits.
An XPath 1.0 expression can find exactly these:
//TEST_TAG
[contains('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',
substring(.,1,1)
)
and
not(
translate(.,
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
'')
)
]
XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/*">
<xsl:copy-of select=
"//TEST_TAG
[contains('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',
substring(.,1,1)
)
and
not(
translate(.,
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
'')
)
]
"/>
</xsl:template>
</xsl:stylesheet>
when applied on this XML document (the provided one, but with an illegal "word" added):
<DATI>
<DATO>
<TEST_TAG>Test</TEST_TAG>
</DATO>
<DATO>
<TEST_TAG>#$%Test21</TEST_TAG>
</DATO>
<DATO>
<TEST_TAG>Abc</TEST_TAG>
</DATO>
</DATI>
evaluates the above XPath expression and copies the selected elements to the output:
<TEST_TAG>Test</TEST_TAG>
<TEST_TAG>Abc</TEST_TAG>
Do note:
The currently-accepted answer incorrectly produces this:
<TEST_TAG>#$%Test21</TEST_TAG>
as an element whose string value is a "word".

Related

Issues while performing Arithmetic Operations on Strings using XSLT 2.0 or 3.0

I've been having issues while doing arithmetic operations on the following XML.
Source XML
<?xml version="1.0" encoding="UTF-8"?>
<Compensation>
<Salary>
<BasePay>$18600.12</BasePay>
<Bonus>$3500.99</Bonus>
<Gym>$670</Gym>
<Tax>$30,000</Tax>
</Salary>
<Salary>
<BasePay>$28600.12</BasePay>
<Bonus>$1500.99</Bonus>
<Gym/>
<Tax>$50,000</Tax>
</Salary>
</Compensation>
Current XSLT
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:this="urn:this-stylesheet"
exclude-result-prefixes="xs this"
version="2.0">
<xsl:output method="xml" indent="yes"/>
<xsl:function name = "this:translateCurrency">
<xsl:param name="stringValue"/>
<xsl:value-of select="format-number(xs:decimal(translate(xs:string($stringValue), '$,','')), '#.##')"/>
</xsl:function>
<xsl:template match="Compensation">
<Worker>
<xsl:for-each select="Salary">
<Comp>
<Amount>
<xsl:value-of select="this:translateCurrency(BasePay) - this:translateCurrency(Tax) "/>
</Amount>
<NoBonus>
<xsl:value-of select="this:translateCurrency(BasePay) + this:translateCurrency(Gym) "/>
</NoBonus>
</Comp>
</xsl:for-each>
</Worker>
</xsl:template>
</xsl:stylesheet>
The currency symbol and commas will always be present in amount-related XML elements such as <BasePay>, <Bonus>, <Gym> and <Tax>, which I translate and convert to decimal before adding or subtracting.
There are two issues:
1. Since my source XML has many amount-related fields, I have declared a function for translating and converting to decimal. However, I can't get my function to round to two decimal places. I expected the following line in my function to do that:
<xsl:value-of select="format-number(xs:decimal(translate(xs:string($stringValue), '$,','')), '#.##')"/>
2. Some of the amount fields may be empty, e.g. <Gym/> is empty in my source XML, and the current version of the XSLT fails with: Cannot convert string "" to xs:decimal: no digits in value.
I tried $stringValue!='' in the xsl:function and Gym!='' in the call (shown below), but to no avail.
Can anyone help me figure out what I should do to get my function to round to two decimal places and get past the "no digits in value" error?
<NoBonus>
<xsl:value-of select="this:translateCurrency(BasePay) + this:translateCurrency(Gym!='') "/>
</NoBonus>
Expected Result
<?xml version="1.0" encoding="UTF-8"?>
<Worker>
<Comp>
<Amount>-11399.88</Amount>
<NoBonus>19270.12</NoBonus>
</Comp>
<Comp>
<Amount>-21399.88</Amount>
<NoBonus>28600.12</NoBonus>
</Comp>
</Worker>
If you want to convert a string to a decimal value, don't use format-number on it: format-number returns a string, not a number. To convert your input values to xs:decimal you need e.g.
<xsl:function name="this:translateCurrency" as="xs:decimal">
<xsl:param name="input" as="xs:string"/>
<xsl:sequence
select="if ($input = '')
then 0
else xs:decimal(translate($input, '$,', ''))"/>
</xsl:function>
Then use those xs:decimal values in your arithmetic. Only where you need to output the final result of a computation in a certain format should you call format-number on that result, e.g. to ensure you get two decimals.
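Putting the two steps together might look like the sketch below. The function is the one from this answer; the picture string '0.00' is my own assumption for "always two decimals" (with '#.##' a result such as 700 would print without decimals).

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:this="urn:this-stylesheet"
    exclude-result-prefixes="xs this"
    version="2.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- Strip '$' and ',' and return a decimal; an empty value counts as 0 -->
  <xsl:function name="this:translateCurrency" as="xs:decimal">
    <xsl:param name="input" as="xs:string"/>
    <xsl:sequence
        select="if ($input = '')
                then 0
                else xs:decimal(translate($input, '$,', ''))"/>
  </xsl:function>

  <xsl:template match="Compensation">
    <Worker>
      <xsl:for-each select="Salary">
        <Comp>
          <!-- Do the arithmetic on decimals, format only the final result -->
          <Amount>
            <xsl:value-of select="format-number(
                this:translateCurrency(BasePay) - this:translateCurrency(Tax), '0.00')"/>
          </Amount>
          <NoBonus>
            <xsl:value-of select="format-number(
                this:translateCurrency(BasePay) + this:translateCurrency(Gym), '0.00')"/>
          </NoBonus>
        </Comp>
      </xsl:for-each>
    </Worker>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input this should give the Amount and NoBonus values listed in the expected result. Note that the function still assumes each element is present; a missing (rather than empty) element would need extra handling.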

XSLT: Using a key with a result tree fragment?

Following on from my earlier question, the p elements I want to apply the answer to are actually in a result tree fragment.
How do I make the key function:
<xsl:key name="kRByLevelAndParent" match="p"
use="generate-id(preceding-sibling::p
[not(@ilvl >= current()/@ilvl)][1])"/>
match against p elements in a result tree fragment?
In that answer the key is used via apply-templates:
<xsl:template match="/*">
<list>
<item>
<xsl:apply-templates select="key('kRByLevelAndParent', '')[1]" mode="start">
<xsl:with-param name="pParentLevel" select="$pStartLevel"/>
<xsl:with-param name="pSiblings" select="key('kRByLevelAndParent', '')"/>
</xsl:apply-templates>
</item>
</list>
</xsl:template>
I'd like to pass my result tree fragment as a parameter, and have the key match p elements in that.
Is this the right way to think about it?
There are no result tree fragments in XSLT 2.0 and later; you simply have temporary trees. Keys apply to every document, and the key function takes an optional third argument that gives the root node or subtree to search. So, assuming your temporary tree is in $var, you can use key('keyname', key-value-expression, $var) to find elements in $var.
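A minimal sketch of the three-argument form, assuming XSLT 2.0. The key name p-by-level and the contents of $temp are made up for illustration and are not taken from the earlier question:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <!-- Hypothetical key: index p elements by their ilvl attribute -->
  <xsl:key name="p-by-level" match="p" use="@ilvl"/>

  <xsl:template match="/">
    <!-- A temporary tree; without an "as" attribute the variable holds a document node -->
    <xsl:variable name="temp">
      <p ilvl="0">a</p>
      <p ilvl="1">b</p>
      <p ilvl="0">c</p>
    </xsl:variable>
    <result>
      <!-- The third argument restricts the key lookup to the tree rooted at $temp -->
      <xsl:copy-of select="key('p-by-level', '0', $temp)"/>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

Here the lookup returns the two p elements with ilvl="0" from the temporary tree, regardless of what the source document contains.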

How to break the input xml into token and read the token values

Below is the input to the xslt
<docValues>
01|1596056|CCCCCCCCCDD|028571|ABCCHAS|29150699|150800|FFSSSSFFFF|005| |N|N|002| | |0000020319|29150699|163000|29150699|153100|666666|20140627|400|RRRRR|400| |20150701
02|1596056|028571|29150699|0001|400| | |0001|THIS IS MY SERVICE,,| | | |0901.99| |0.5|
03|1596056|028571|29150699|0001|5103|29150699|29150699| |1.000|99.098| |
</docValues>
Below are the details of xml input
01 : First Line Number
02 : Second Line Number
03 : Third Line Number
The XSLT should read the input XML and output the XML below:
<SO>
<line01_2nd_token>1596056</line01_2nd_token>
<line01_4th_token>028571</line01_4th_token>
<line01_5th_token>ABCCHAS</line01_5th_token>
</SO>
<PARIS>
<line01_4th_token>028571</line01_4th_token>
<line02_5th_token>0001</line02_5th_token>
<line03_11th_token>99.098</line03_11th_token>
</PARIS>
<MY_SERVICE>
<line01_4th_token>028571</line01_4th_token>
<line03_5th_token>0001</line03_5th_token>
<line02_6th_token>400</line02_6th_token>
</MY_SERVICE>
To achieve the above output, the lines (01, 02, 03) in the input XML need to be broken into tokens with | as the delimiter, so that I can read the desired token value from each line.
So my question is: how do I break the input into tokens and read the token values?
Is there any way to achieve the desired output?
Please help me solve this problem.
Well, there is a function tokenize you could use twice, first with the second argument '\r?\n' to find the lines, then you can tokenize each line on '\|'.
Here is an example extracting some tokens, you should be able to complete that on your own:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:output indent="yes"/>
<xsl:template match="docValues">
<xsl:variable name="lines" select="tokenize(., '\r?\n')[normalize-space()]"/>
<xsl:variable name="line1-tokens" select="tokenize($lines[1], '\|')"/>
<SO>
<line01_2nd_token><xsl:value-of select="$line1-tokens[2]"/></line01_2nd_token>
</SO>
<PARIS>
<line01_4th_token><xsl:value-of select="$line1-tokens[4]"/></line01_4th_token>
</PARIS>
<MY_SERVICE>
<line01_4th_token><xsl:value-of select="$line1-tokens[4]"/></line01_4th_token>
</MY_SERVICE>
</xsl:template>
</xsl:stylesheet>
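Extending the same idea to lines 02 and 03 could look like the following sketch. It assumes the three lines always appear in the order 01, 02, 03, and it adds a hypothetical <result> wrapper so that the output has a single root element:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output indent="yes"/>
  <xsl:template match="docValues">
    <!-- Non-empty lines of the text content, in document order -->
    <xsl:variable name="lines" select="tokenize(., '\r?\n')[normalize-space()]"/>
    <!-- One token sequence per line; assumes line 01 comes first, then 02, then 03 -->
    <xsl:variable name="line1" select="tokenize($lines[1], '\|')"/>
    <xsl:variable name="line2" select="tokenize($lines[2], '\|')"/>
    <xsl:variable name="line3" select="tokenize($lines[3], '\|')"/>
    <!-- <result> is a hypothetical wrapper, not part of the requested output -->
    <result>
      <SO>
        <line01_2nd_token><xsl:value-of select="$line1[2]"/></line01_2nd_token>
        <line01_4th_token><xsl:value-of select="$line1[4]"/></line01_4th_token>
        <line01_5th_token><xsl:value-of select="$line1[5]"/></line01_5th_token>
      </SO>
      <PARIS>
        <line01_4th_token><xsl:value-of select="$line1[4]"/></line01_4th_token>
        <line02_5th_token><xsl:value-of select="$line2[5]"/></line02_5th_token>
        <line03_11th_token><xsl:value-of select="$line3[11]"/></line03_11th_token>
      </PARIS>
      <MY_SERVICE>
        <line01_4th_token><xsl:value-of select="$line1[4]"/></line01_4th_token>
        <line03_5th_token><xsl:value-of select="$line3[5]"/></line03_5th_token>
        <line02_6th_token><xsl:value-of select="$line2[6]"/></line02_6th_token>
      </MY_SERVICE>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

If the line order is not guaranteed, selecting each line by its first token (01, 02, 03) instead of by position would be more robust.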

For each in xslt

I need to split a comma-separated string in XSLT and output each item as its own element. There can be at most 15 words separated by commas.
For example
Input
<root>
<child>A,B,C,D</child>
</root>
Output
<root>
<List>A</List>
<List>B</List>
<List>C</List>
<List>D</List>
</root>
You can use fn:tokenize to accomplish this. It splits a string at each match of a delimiter pattern and returns the substrings between the delimiters.
fn:tokenize("abracadabra", "(ab)|(a)") returns ("", "r", "c", "d", "r", "")
For further reference: http://www.w3.org/TR/xpath-functions/#func-tokenize
Are you sure that you're using XSLT 2.0? If so, tokenize should work:
(You don't need the fn: prefix; the functions namespace is the default for XPath built-in functions.)
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:template match="/root">
<output>
<xsl:for-each select="tokenize(child, ',')">
<child><xsl:value-of select="."/></child>
</xsl:for-each>
</output>
</xsl:template>
</xsl:transform>
Example:
http://xsltransform.net/eiQZDbf
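To get the element names from the question rather than <output> and <child>, the same approach can be adapted as in this sketch. The normalize-space call is my own addition, on the assumption that stray spaces around the commas should be dropped:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output indent="yes"/>
  <xsl:template match="/root">
    <root>
      <!-- One <List> element per comma-separated item -->
      <xsl:for-each select="tokenize(child, ',')">
        <List><xsl:value-of select="normalize-space(.)"/></List>
      </xsl:for-each>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

This needs an XSLT 2.0 processor; in XSLT 1.0 tokenize is not available and a recursive template would be needed instead.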

How the "as" attribute of xsl:template affects the result of xsl:apply-templates

Given this source document:
<things>
<thing><duck>Eider</duck></thing>
<thing><duck>Mallard</duck></thing>
<thing><duck>Muscovy</duck></thing>
</things>
I require the following output
Fat Eider, Fat Mallard, Fat Muscovy
which I can indeed get with this XSL transform:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" >
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of separator=", ">
<xsl:apply-templates select="//duck"/>
</xsl:value-of>
</xsl:template>
<xsl:template match="duck" as="xs:string">
<xsl:value-of select="concat('Fat ', .)"/>
</xsl:template>
</xsl:stylesheet>
However, I have three questions:
Question 1. (specific)
If I remove as="xs:string" from the duck template, I get the following output:
Fat EiderFat MallardFat Muscovy
Why? My understanding is that in XSLT 2.0 the result of xsl:apply-templates is always a sequence, and that xsl:value-of inserts its separator between the items in the sequence. So why does the sequence seem to "collapse" when the template has no as attribute? Bonus points for pointing me towards appropriate pages of Michael Kay's excellent "XSLT 2.0 and XPath 2.0, 4th Edition" book.
Question 2. (vague!)
As a novice user of XSLT, it seems to me that there are probably many ways to solve this problem. Can you put forward a good solution that takes a different approach? How do you choose between approaches?
Question 3.
Debugging. Can you recommend how to dump out intermediate results that would indicate the difference between the presence and the absence of the as attribute to the template?
See http://www.w3.org/TR/xslt20/#value-of which says
The string value of the new text node may be defined either by using
the select attribute, or by the sequence constructor (see 5.7 Sequence
Constructors) that forms the content of the xsl:value-of element.
These are mutually exclusive, and one of them must be present. The way
in which the value is constructed is specified in 5.7.2 Constructing
Simple Content.
So we need to look at http://www.w3.org/TR/xslt20/#constructing-simple-content, which says "2. Adjacent text nodes in the sequence are merged into a single text node." That is what happens without as="xs:string": each xsl:value-of in the duck template creates a text node, so the sequence constructor inside the outer xsl:value-of yields adjacent text nodes, and these are merged into a single text node before the separator is applied. If you have as="xs:string", or use <xsl:sequence select="concat('Fat ', .)"/> instead, the sequence constructor yields a sequence of atomic string values, and the separator is inserted between them.
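As a sketch of the xsl:sequence variant mentioned above, here is the stylesheet from the question with only the duck template changed. The template has no as attribute, but it returns strings rather than text nodes:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:value-of separator=", ">
      <xsl:apply-templates select="//duck"/>
    </xsl:value-of>
  </xsl:template>

  <!-- xsl:sequence returns the xs:string itself instead of wrapping it in a text node,
       so there are no adjacent text nodes to merge -->
  <xsl:template match="duck">
    <xsl:sequence select="concat('Fat ', .)"/>
  </xsl:template>
</xsl:stylesheet>
```

With the source document from the question this should give Fat Eider, Fat Mallard, Fat Muscovy.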
