How to break the input XML into tokens and read the token values - xslt-2.0

Below is the input to the xslt
<docValues>
01|1596056|CCCCCCCCCDD|028571|ABCCHAS|29150699|150800|FFSSSSFFFF|005| |N|N|002| | |0000020319|29150699|163000|29150699|153100|666666|20140627|400|RRRRR|400| |20150701
02|1596056|028571|29150699|0001|400| | |0001|THIS IS MY SERVICE,,| | | |0901.99| |0.5|
03|1596056|028571|29150699|0001|5103|29150699|29150699| |1.000|99.098| |
</docValues>
Below are the details of xml input
01 : First Line Number
02 : Second Line Number
03 : Third Line Number
The XSLT should read the input XML and output the XML below
<SO>
<line01_2nd_token>1596056</line01_2nd_token>
<line01_4th_token>028571</line01_4th_token>
<line01_5th_token>ABCCHAS</line01_5th_token>
</SO>
<PARIS>
<line01_4th_token>028571</line01_4th_token>
<line02_5th_token>0001</line02_5th_token>
<line03_11th_token>99.098</line03_11th_token>
</PARIS>
<MY_SERVICE>
<line01_4th_token>028571</line01_4th_token>
<line03_5th_token>0001</line03_5th_token>
<line02_6th_token>400</line02_6th_token>
</MY_SERVICE>
To produce the output above, lines 01, 02 and 03 of the input XML need to be broken into tokens with | as the delimiter, so that I can read the desired token values from each line.
So my question is: how do I break the input XML into tokens and read the token values?
Is there any way to achieve the desired output?
Please help me solve this problem.

Well, there is a function tokenize you could use twice: first with the second argument '\r?\n' to split the content into lines, then with '\|' to split each line into tokens.
Here is an example extracting some tokens; you should be able to complete it on your own:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:output indent="yes"/>
<xsl:template match="docValues">
<xsl:variable name="lines" select="tokenize(., '\r?\n')[normalize-space()]"/>
<xsl:variable name="line1-tokens" select="tokenize($lines[1], '\|')"/>
<SO>
<line01_2nd_token><xsl:value-of select="$line1-tokens[2]"/></line01_2nd_token>
</SO>
<PARIS>
<line01_4th_token><xsl:value-of select="$line1-tokens[4]"/></line01_4th_token>
</PARIS>
<MY_SERVICE>
<line01_4th_token><xsl:value-of select="$line1-tokens[4]"/></line01_4th_token>
</MY_SERVICE>
</xsl:template>
</xsl:stylesheet>
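One possible completion of the example above, as a sketch only. It assumes the records always arrive in the order 01, 02, 03, so they can be picked by position; the variables line2-tokens and line3-tokens are names introduced here for illustration. The element names follow the output requested in the question.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">
  <xsl:output indent="yes"/>
  <xsl:template match="docValues">
    <!-- Split the text content into lines and drop blank ones -->
    <xsl:variable name="lines" select="tokenize(., '\r?\n')[normalize-space()]"/>
    <!-- Assumption: records 01, 02 and 03 appear in this order -->
    <xsl:variable name="line1-tokens" select="tokenize($lines[1], '\|')"/>
    <xsl:variable name="line2-tokens" select="tokenize($lines[2], '\|')"/>
    <xsl:variable name="line3-tokens" select="tokenize($lines[3], '\|')"/>
    <SO>
      <line01_2nd_token><xsl:value-of select="$line1-tokens[2]"/></line01_2nd_token>
      <line01_4th_token><xsl:value-of select="$line1-tokens[4]"/></line01_4th_token>
      <line01_5th_token><xsl:value-of select="$line1-tokens[5]"/></line01_5th_token>
    </SO>
    <PARIS>
      <line01_4th_token><xsl:value-of select="$line1-tokens[4]"/></line01_4th_token>
      <line02_5th_token><xsl:value-of select="$line2-tokens[5]"/></line02_5th_token>
      <line03_11th_token><xsl:value-of select="$line3-tokens[11]"/></line03_11th_token>
    </PARIS>
    <MY_SERVICE>
      <line01_4th_token><xsl:value-of select="$line1-tokens[4]"/></line01_4th_token>
      <line03_5th_token><xsl:value-of select="$line3-tokens[5]"/></line03_5th_token>
      <line02_6th_token><xsl:value-of select="$line2-tokens[6]"/></line02_6th_token>
    </MY_SERVICE>
  </xsl:template>
</xsl:stylesheet>
```

If the record order is not guaranteed, each line could instead be selected by its leading line number, for example with a predicate such as [starts-with(normalize-space(.), '02|')] on $lines.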

Related

Issues while performing Arithmetic Operations on Strings using XSLT 2.0 or 3.0

I've been having issues while doing arithmetic operations on the following XML
Source XML
<?xml version="1.0" encoding="UTF-8"?>
<Compensation>
<Salary>
<BasePay>$18600.12</BasePay>
<Bonus>$3500.99</Bonus>
<Gym>$670</Gym>
<Tax>$30,000</Tax>
</Salary>
<Salary>
<BasePay>$28600.12</BasePay>
<Bonus>$1500.99</Bonus>
<Gym/>
<Tax>$50,000</Tax>
</Salary>
</Compensation>
Current XSLT
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:this="urn:this-stylesheet"
exclude-result-prefixes="xs this"
version="2.0">
<xsl:output method="xml" indent="yes"/>
<xsl:function name = "this:translateCurrency">
<xsl:param name="stringValue"/>
<xsl:value-of select="format-number(xs:decimal(translate(xs:string($stringValue), '$,','')), '#.##')"/>
</xsl:function>
<xsl:template match="Compensation">
<Worker>
<xsl:for-each select="Salary">
<Comp>
<Amount>
<xsl:value-of select="this:translateCurrency(BasePay) - this:translateCurrency(Tax) "/>
</Amount>
<NoBonus>
<xsl:value-of select="this:translateCurrency(BasePay) + this:translateCurrency(Gym) "/>
</NoBonus>
</Comp>
</xsl:for-each>
</Worker>
</xsl:template>
</xsl:stylesheet>
The currency symbol and commas will always be present in amount-related XML elements such as <BasePay>, <Bonus>, <Gym> and <Tax>, which I translate and convert to decimal before adding or subtracting.
There are two issues:
1. Since my source XML has many amount-related fields, I have declared a function for translating and converting to decimal. However, I'm unable to get my function to round to two decimal places. I was expecting the following line of code in my function to round to two decimal places:
<xsl:value-of select="format-number(xs:decimal(translate(xs:string($stringValue), '$,','')), '#.##')"/>
2. Some of the amount fields may be empty, e.g. <Gym/> is empty in my source XML, and the current version of the XSLT returns: Cannot convert to string "" to xs:decimal no digits in value.
I tried $stringValue!='' in the xsl:function and Gym!='' in the call, but to no avail.
Can anyone help me figure out how to get my function to round to two decimal places and get past the no digits in value error?
<NoBonus>
<xsl:value-of select="this:translateCurrency(BasePay) + this:translateCurrency(Gym!='') "/>
</NoBonus>
Expected Result
<?xml version="1.0" encoding="UTF-8"?>
<Worker>
<Comp>
<Amount>-11399.88</Amount>
<NoBonus>19270.12</NoBonus>
</Comp>
<Comp>
<Amount>-21399.88</Amount>
<NoBonus>28600.12</NoBonus>
</Comp>
</Worker>
If you want to convert a string to a decimal value, don't use format-number on it: format-number returns a string, not a number. So to convert your input values into xs:decimal values you need e.g.
<xsl:function name="this:translateCurrency" as="xs:decimal">
<xsl:param name="input" as="xs:string"/>
<xsl:sequence
select="if ($input = '')
then 0
else xs:decimal(translate($input, '$,', ''))"/>
</xsl:function>
Then use those xs:decimal values in the arithmetic. Apply format-number only where you output the final result of a computation in a certain format, e.g. to ensure you get two decimals.
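Put together, that advice might look like the following sketch. The picture string '0.00' is an assumption made here so that two decimals are always shown; the question's '#.##' would drop trailing zeros. With the source XML from the question this should yield the amounts listed under Expected Result.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:this="urn:this-stylesheet"
  exclude-result-prefixes="xs this"
  version="2.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- Returns a number, not a formatted string; an empty value counts as 0 -->
  <xsl:function name="this:translateCurrency" as="xs:decimal">
    <xsl:param name="input" as="xs:string"/>
    <xsl:sequence
      select="if ($input = '')
              then 0
              else xs:decimal(translate($input, '$,', ''))"/>
  </xsl:function>

  <xsl:template match="Compensation">
    <Worker>
      <xsl:for-each select="Salary">
        <Comp>
          <Amount>
            <!-- Compute with decimals first, format only the final result -->
            <xsl:value-of select="format-number(
              this:translateCurrency(BasePay) - this:translateCurrency(Tax), '0.00')"/>
          </Amount>
          <NoBonus>
            <xsl:value-of select="format-number(
              this:translateCurrency(BasePay) + this:translateCurrency(Gym), '0.00')"/>
          </NoBonus>
        </Comp>
      </xsl:for-each>
    </Worker>
  </xsl:template>
</xsl:stylesheet>
```

Note that the parameter is declared as xs:string, so an element that is missing entirely (rather than empty) would raise a type error; passing string(Gym) instead of Gym is one way to cover that case.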

XSLT: Using a key with a result tree fragment?

Following on from my earlier question, the p elements I want to apply the answer to are actually in a result tree fragment.
How do I make the key function:
<xsl:key name="kRByLevelAndParent" match="p"
use="generate-id(preceding-sibling::p
[not(@ilvl >= current()/@ilvl)][1])"/>
match against p elements in a result tree fragment?
In that answer the key is used via apply-templates:
<xsl:template match="/*">
<list>
<item>
<xsl:apply-templates select="key('kRByLevelAndParent', '')[1]" mode="start">
<xsl:with-param name="pParentLevel" select="$pStartLevel"/>
<xsl:with-param name="pSiblings" select="key('kRByLevelAndParent', '')"/>
</xsl:apply-templates>
</item>
</list>
</xsl:template>
I'd like to pass my result tree fragment as a parameter, and have the key match p elements in that.
Is this the right way to think about it?
There are no result tree fragments in XSLT 2.0 and later; you simply have temporary trees. As for keys, they apply to each document, and the key function has an optional third argument to pass in the root node or subtree to search. So, assuming your temporary tree is in $var, you can use key('keyname', key-value-expression, $var) to find elements in $var.
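A minimal sketch of the three-argument form. The key kPByLevel, the variable $temp and the sample p elements are made up for illustration and are simpler than the key in the question. As far as I know, the tree passed as the third argument must be rooted at a document node, which is the case for a variable constructed without an as attribute.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <!-- Hypothetical key: index p elements by their ilvl attribute -->
  <xsl:key name="kPByLevel" match="p" use="@ilvl"/>

  <xsl:template match="/">
    <!-- Temporary tree: no "as" attribute, so $temp is a document node -->
    <xsl:variable name="temp">
      <doc>
        <p ilvl="0">a</p>
        <p ilvl="1">b</p>
        <p ilvl="0">c</p>
      </doc>
    </xsl:variable>
    <found>
      <!-- The third argument restricts the lookup to the tree in $temp -->
      <xsl:copy-of select="key('kPByLevel', '0', $temp)"/>
    </found>
  </xsl:template>
</xsl:stylesheet>
```

Inside templates applied to nodes of $temp, the two-argument key() call also searches $temp, because the key function then uses the document containing the context node.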

How the "as" attribute of xsl:template affects the result of xsl:apply-templates

Given this source document:
<things>
<thing><duck>Eider</duck></thing>
<thing><duck>Mallard</duck></thing>
<thing><duck>Muscovy</duck></thing>
</things>
I require the following output
Fat Eider, Fat Mallard, Fat Muscovy
which I can indeed get with this XSL transform:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" >
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of separator=", ">
<xsl:apply-templates select="//duck"/>
</xsl:value-of>
</xsl:template>
<xsl:template match="duck" as="xs:string">
<xsl:value-of select="concat('Fat ', .)"/>
</xsl:template>
</xsl:stylesheet>
However, I have three questions:
Question 1. (specific)
If I remove as="xs:string" from the duck template, I get the following output:
Fat EiderFat MallardFat Muscovy
Why? My understanding is that in XSLT 2.0 the result of xsl:apply-templates is always a sequence, and that xsl:value-of inserts its separator between the items in the sequence. So why does the sequence seem to "collapse" when the template has no as attribute? Bonus points for pointing me towards appropriate pages of Michael Kay's excellent "XSLT 2.0 and XPath 2.0, 4th Edition" book.
Question 2. (vague!)
As a novice user of XSLT, it seems to me that there are probably many ways to solve this problem. Can you put forward a good solution that takes a different approach? How do you choose between approaches?
Question 3.
Debugging. Can you recommend how to dump out intermediate results that would indicate the difference between the presence and the absence of the as attribute to the template?
See http://www.w3.org/TR/xslt20/#value-of which says
The string value of the new text node may be defined either by using
the select attribute, or by the sequence constructor (see 5.7 Sequence
Constructors) that forms the content of the xsl:value-of element.
These are mutually exclusive, and one of them must be present. The way
in which the value is constructed is specified in 5.7.2 Constructing
Simple Content.
So we need to look at http://www.w3.org/TR/xslt20/#constructing-simple-content, which says "2. Adjacent text nodes in the sequence are merged into a single text node.". That is what happens without as="xs:string": each duck template returns a text node, so the sequence constructor inside the xsl:value-of yields adjacent text nodes, which are merged into a single text node before the separator is applied. If you have as="xs:string" or use <xsl:sequence select="concat('Fat ', .)"/>, the sequence constructor yields a sequence of atomic string values, and the separator is inserted between them.
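As a sketch of the xsl:sequence variant mentioned above: the stylesheet from the question with the as attribute removed and xsl:value-of in the duck template replaced by xsl:sequence, so the template returns strings rather than text nodes.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:value-of separator=", ">
      <xsl:apply-templates select="//duck"/>
    </xsl:value-of>
  </xsl:template>

  <!-- xsl:sequence returns the string itself, not a text node,
       so there are no adjacent text nodes to merge -->
  <xsl:template match="duck">
    <xsl:sequence select="concat('Fat ', .)"/>
  </xsl:template>
</xsl:stylesheet>
```

A different approach that avoids the second template altogether is a single instruction such as <xsl:value-of select="//duck/concat('Fat ', .)" separator=", "/>, which builds the string sequence directly in XPath 2.0.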

regular expression in XPATH

How can I use the matches function of XPath to search for whole words in an XML tag?
The following code returns "unknown method matches":
XML_Doc:=CreateOleObject('Msxml2.DOMDocument.6.0') as IXMLDOMDocument3;
XML_DOC.selectNodes('/DATI/DATO[matches(TEST_TAG,"\bTest\b")]');
Example XML FILE
<DATI>
<DATO>
<TEST_TAG>Test</TEST_TAG>
</DATO>
<DATO>
<TEST_TAG>Test21</TEST_TAG>
</DATO>
<DATO>
<TEST_TAG>Abc</TEST_TAG>
</DATO>
</DATI>
matches is XPath 2 and Msxml only supports XPath 1.
As far as I know there is no library supporting XPath 2 for Delphi (although I wrote an XPath 2 library for Freepascal, which should not be too difficult to port).
You could use
/DATI/DATO[not(contains(TEST_TAG," "))]
to find words that do not contain a space, which is XPath 1.
Suppose that by "word" you mean:
a string that starts with a Latin letter and in which every character is either a Latin letter or a decimal digit.
Then one can use an XPath 1.0 expression to find exactly these:
//TEST_TAG
[contains('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',
substring(.,1,1)
)
and
not(
translate(.,
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
'')
)
]
XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/*">
<xsl:copy-of select=
"//TEST_TAG
[contains('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',
substring(.,1,1)
)
and
not(
translate(.,
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
'')
)
]
"/>
</xsl:template>
</xsl:stylesheet>
when applied on this XML document (the provided one, but with an illegal "word" added):
<DATI>
<DATO>
<TEST_TAG>Test</TEST_TAG>
</DATO>
<DATO>
<TEST_TAG>#$%Test21</TEST_TAG>
</DATO>
<DATO>
<TEST_TAG>Abc</TEST_TAG>
</DATO>
</DATI>
evaluates the above XPath expression and copies the selected elements to the output:
<TEST_TAG>Test</TEST_TAG>
<TEST_TAG>Abc</TEST_TAG>
Do note:
The currently-accepted answer incorrectly produces this:
<TEST_TAG>#$%Test21</TEST_TAG>
as an element whose string value is a "word".

Why does index-of() return multiple values when applied to a sequence of unique nodes?

I'm using XPath 2's index-of function to return the index of current() within a sorted sequence of nodes. Using SAXON, the nodes in the sorted sequence are unique, yet index-of returns a sequence of two values.
This does not happen all the time, just very occasionally, but not for any reason I can find. Can someone please explain what is going on?
I have worked up a minimal example based on data that routinely gives this odd behavior.
The source data is:
<data>
<student userID="1" userName="user1"/>
<session startedOn="01/16/2012 15:01:18">
</session>
<session startedOn="11/16/2011 13:31:33">
</session>
</data>
My xsl document puts the session nodes into a sorted sequence $orderd at the very top of the root template:
<xsl:template match="/">
<xsl:variable name="nodes" as="node()*" select="/data/session"></xsl:variable>
<xsl:variable name="orderd" as="node()*">
<xsl:for-each select="$nodes">
<xsl:sort select="xs:dateTime(xs:dateTime(concat(substring(normalize-space(@startedOn),7,4),'-',substring(normalize-space(@startedOn),1,2),'-',substring(normalize-space(@startedOn),4,2),'T',substring(normalize-space(@startedOn),12,8)))
)" order="ascending"/>
<xsl:sequence select="."/>
</xsl:for-each>
</xsl:variable>
Since the nodes were already ordered by @startedOn but in the opposite order, the sequence $orderd should be the same as the document-ordered sequence $nodes, except in reverse order.
When I create output using a for-each statement, I find that somehow the two nodes are seen as identical when tested using index-of.
The code below is used to output data (and comes immediately after the chunk above):
<output>
<xsl:for-each select="$nodes">
<xsl:sort select="position()" order="descending"></xsl:sort>
<xsl:variable name="index" select="index-of($orderd,current())" as="xs:integer*"></xsl:variable>
<xsl:variable name="pos" select="position()"></xsl:variable>
<session reverse-documentOrder="{$pos}" sortedOrder="{$index}"/>
</xsl:for-each>
</output>
As the output (shown below) indicates, the index-of function is returning the sequence (1,2), meaning that it sees both nodes as identical. I have checked the expression used to sort the values, and it produces distinct and well-formed date-Time strings.
<output>
<session reverse-documentOrder="1"
sortedOrder="1 2"/>
<session reverse-documentOrder="2"
sortedOrder="1 2"/>
</output>
Without relying on the generate-id() function, which is an XSLT function rather than an XPath function, one can write a simple index-of() function that operates on node identity:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:my="my:my">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vNum3" select="/*/*[3]"/>
<xsl:variable name="vSeq" select="/*/*[1], /*/*[3], /*/*[3]"/>
<xsl:template match="/">
<xsl:sequence select="my:index-of($vSeq, $vNum3)"/>
</xsl:template>
<xsl:function name="my:index-of" as="xs:integer*">
<xsl:param name="pSeq" as="node()*"/>
<xsl:param name="pNode" as="node()"/>
<xsl:for-each select="$pSeq">
<xsl:if test=". is $pNode">
<xsl:sequence select="position()"/>
</xsl:if>
</xsl:for-each>
</xsl:function>
</xsl:stylesheet>
when this transformation is applied on the following XML document:
<nums>
<num>01</num>
<num>02</num>
<num>03</num>
<num>04</num>
<num>05</num>
<num>06</num>
<num>07</num>
<num>08</num>
<num>09</num>
<num>10</num>
</nums>
the wanted, correct result is returned:
2 3
Explanation: Use of the is operator.
The documentation of index-of (http://www.w3.org/TR/xpath-functions/#func-index-of) says "The items in the sequence $seqParam are compared with $srchParam under the rules for the eq operator. Values of type xs:untypedAtomic are compared as if they were of type xs:string.". Since index-of compares atomic values, your session element nodes are atomized first. They are untyped, so they are compared as strings, and both session elements have the same whitespace-only string content. That is why both compare as equal.
I am not sure what to suggest as I am not sure what you want to achieve but I hope the above explains the result you get.
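If the goal is the position of each session in the sorted sequence, one option is to compare by node identity with the is operator instead of index-of. The following is only a sketch against the sample data from the question: the variable $this, the attribute name documentOrder, the use of xsl:perform-sort and the replace() call that rearranges MM/DD/YYYY into an xs:dateTime lexical form are choices made here, not taken from the original stylesheet.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">
  <xsl:output indent="yes"/>

  <xsl:template match="/">
    <xsl:variable name="nodes" as="element(session)*" select="/data/session"/>
    <!-- Same nodes, sorted by date; the "as" attribute keeps the original nodes -->
    <xsl:variable name="orderd" as="element(session)*">
      <xsl:perform-sort select="$nodes">
        <xsl:sort select="xs:dateTime(replace(normalize-space(@startedOn),
          '^(\d{2})/(\d{2})/(\d{4}) (.+)$', '$3-$1-$2T$4'))"/>
      </xsl:perform-sort>
    </xsl:variable>
    <output>
      <xsl:for-each select="$nodes">
        <xsl:variable name="this" select="."/>
        <!-- "is" tests node identity, so nodes with equal string values stay distinct -->
        <xsl:variable name="index" as="xs:integer*"
          select="for $i in 1 to count($orderd)
                  return if ($orderd[$i] is $this) then $i else ()"/>
        <session documentOrder="{position()}" sortedOrder="{$index}"/>
      </xsl:for-each>
    </output>
  </xsl:template>
</xsl:stylesheet>
```

With this comparison each session should receive exactly one index, because no atomization of the whitespace-only element content takes place.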

Resources