Improving the performance of XSL - xslt-2.0

I am using the XSLT 2.0 code below to find the ids of the text nodes that contain the indices I give as input. The code works correctly, but it takes a long time on huge files. Even for huge files, if the index values are small, the result comes back within a few ms. I am using the Saxon 9 HE Java processor (saxon9he) to execute the XSLT.
<xsl:variable name="insert-data" as="element(data)*">
<xsl:for-each-group
select="doc($insert-file)/insert-data/data"
group-by="xsd:integer(@index)">
<xsl:sort select="current-grouping-key()"/>
<data
index="{current-grouping-key()}"
text-id="{generate-id(
$main-root/descendant::text()[
sum((preceding::text(), .)/string-length(.)) ge current-grouping-key()
][1]
)}">
<xsl:copy-of select="current-group()/node()"/>
</data>
</xsl:for-each-group>
</xsl:variable>
In the above solution, if the index value is large, say 270962, the XSL takes 83427ms to execute. In huge files, if the index values are very large, say 4605415 and 4605431, it takes several minutes. It seems the computation of the variable "insert-data" is what takes the time, even though it is a global variable and is computed only once. Should the XSL be addressed, or the processor? How can I improve the performance of the XSL?

I'd guess the problem is the generation of text-id, i.e. the expression
generate-id(
$main-root/descendant::text()[
sum((preceding::text(), .)/string-length(.)) ge current-grouping-key()
][1]
)
You are potentially recalculating a lot of sums here: for every candidate text node, the predicate sums the lengths of all preceding text nodes again, so the work grows roughly quadratically with the position of the index. I think the easiest path would be to invert your approach: recurse across the text nodes in the document, aggregate the string length so far, and output data elements each time a new @index is reached. The following example illustrates the approach. Note that each unique @index and each text node is visited only once.
<xsl:variable name="insert-doc" select="doc($insert-file)"/>
<xsl:variable name="insert-data" as="element(data)*">
<xsl:call-template name="calculate-data"/>
</xsl:variable>
<xsl:key name="index" match="data" use="xsd:integer(@index)"/>
<xsl:template name="calculate-data">
<xsl:param name="text-nodes" select="$main-root//text()"/>
<xsl:param name="previous-lengths" select="0"/>
<xsl:param name="indexes" as="xsd:integer*">
<xsl:perform-sort
select="distinct-values(
$insert-doc/insert-data/data/@index/xsd:integer(.))">
<xsl:sort/>
</xsl:perform-sort>
</xsl:param>
<xsl:if test="$text-nodes">
<xsl:variable name="total-lengths"
select="$previous-lengths + string-length($text-nodes[1])"/>
<xsl:choose>
<xsl:when test="$total-lengths ge number($indexes[1])">
<data
index="{$indexes[1]}"
text-id="{generate-id($text-nodes[1])}">
<xsl:copy-of select="key('index', $indexes[1],
$insert-doc)"/>
</data>
<!-- Recursively move to the next index. -->
<xsl:call-template name="calculate-data">
<xsl:with-param
name="text-nodes"
select="$text-nodes"/>
<xsl:with-param
name="previous-lengths"
select="$previous-lengths"/>
<xsl:with-param
name="indexes"
select="subsequence($indexes, 2)"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<!-- Recursively move to the next text node. -->
<xsl:call-template name="calculate-data">
<xsl:with-param
name="text-nodes"
select="subsequence($text-nodes, 2)"/>
<xsl:with-param
name="previous-lengths"
select="$total-lengths"/>
<xsl:with-param
name="indexes"
select="$indexes"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:if>
</xsl:template>

Related

split xml file with multiple value using xslt

I am trying to split an XML file in which some elements repeat into separate XML files, each containing one of the repeated values.
Input File:
<Person>
<firstname>ABC</firstname>
<lastname>ABC</lastname>
<address>address1</address>
<address>address2</address>
<city>city</city>
<state>state</state>
<currency>currency1</currency>
<currency>currency2</currency>
</Person>
I need to split the above file into two files:
Output file-1
<Person>
<firstname>ABC</firstname>
<lastname>ABC</lastname>
<address>address1</address>
<city>city</city>
<state>state</state>
<currency>currency1</currency>
</Person>
Output file -2
<Person>
<firstname>ABC</firstname>
<lastname>ABC</lastname>
<address>address2</address>
<city>city</city>
<state>state</state>
<currency>currency2</currency>
</Person>
Here's a generic solution that outputs N files where N is the maximum number of same-name children elements, where file N contains the Nth instance of each element name if there are at least N, or the first one otherwise:
<xsl:template match="/*">
<xsl:variable name="this" select="."/>
<xsl:variable name="names" select="distinct-values(*/name())"/>
<xsl:for-each select="1 to max(
for $name in $names return count(*[name()=$name]))">
<xsl:variable name="n" select="."/>
<xsl:result-document href="file{.}">
<xsl:element name="{name($this)}">
<xsl:for-each-group select="$this/*" group-by="name()">
<xsl:copy-of select="(current-group()[$n], .)[1]"/>
</xsl:for-each-group>
</xsl:element>
</xsl:result-document>
</xsl:for-each>
</xsl:template>
This should do what you want with the example input you have shown, but whether it does the right thing with any other input is anyone's guess, because you have under-specified the requirements.

XSLT: How do I get the value of a variable by concatenating a string and another variable?

I am trying to get the value of a variable by concatenating a string and another variable. But the result is a string containing the variable name, not the variable's value. So the following code fails, since it is trying to compare a string against a number. Also, where the value should appear, there is only the name of the variable.
The aim is to build a srcset for images ranging from 300px to a maximum of 4200px, stopping before the width reaches the maxWidth value. So if an image has a maxWidth of 2000, the iteration would stop after outputting 1800.
This is the code I have so far:
<xsl:variable name="count" select="14"/>
<xsl:variable name="maxWidth" select="2200"/> <!-- this value will be dynamic depending on each image (taken from an attribute on the image) -->
<xsl:variable name="loopIndex1" select="300"/>
<xsl:variable name="loopIndex2" select="600"/>
<xsl:variable name="loopIndex3" select="900"/>
<xsl:variable name="loopIndex4" select="1200"/>
<xsl:variable name="loopIndex5" select="1500"/>
<xsl:variable name="loopIndex6" select="1800"/>
<xsl:variable name="loopIndex7" select="2100"/>
<xsl:variable name="loopIndex8" select="2400"/>
<xsl:variable name="loopIndex9" select="2700"/>
<xsl:variable name="loopIndex10" select="3000"/>
<xsl:variable name="loopIndex11" select="3300"/>
<xsl:variable name="loopIndex12" select="3600"/>
<xsl:variable name="loopIndex13" select="3900"/>
<xsl:variable name="loopIndex14" select="4200"/>
<xsl:attribute name="srcset">
<xsl:for-each select="1 to $count">
<xsl:variable name="index" select="position()"/>
<xsl:variable name="source">
<xsl:value-of select="concat('loopIndex', $index)"/>
</xsl:variable>
<xsl:if test="$source &lt; $maxWidth">
http://imagescalerserver.com/?url=http://test.com/1108932.jpg&amp;w=<xsl:value-of select="concat($source, ' ')" /> <xsl:value-of select="$source" />w,
</xsl:if>
</xsl:for-each>
</xsl:attribute>
If I remove the test just to get some output, the output would be:
srcset="
http://imagescalerserver.com/?url=http://test.com/1108932.jpg&w=loopIndex1 loopIndex1w,
http://imagescalerserver.com/?url=http://test.com/1108932.jpg&w=loopIndex2 loopIndex2w,
etc
"
The wanted result is:
srcset="
http://imagescalerserver.com/?url=http://test.com/1108932.jpg&w=300 300w,
http://imagescalerserver.com/?url=http://test.com/1108932.jpg&w=600 600w,
etc
"
I also need to omit the comma after the last item. That is, if http://imagescalerserver.com/?url=http://test.com/1108932.jpg&w=600 600w, were the last output, the comma at the end would not be there, like this:
http://imagescalerserver.com/?url=http://test.com/1108932.jpg&w=600 600w
Ideally I would like to avoid declaring the loopIndex variables and instead just increment the value by 300 for a total of 14 iterations, but since variables can't be changed this is the best I've managed. If there is a better way, I'd appreciate hearing about it.
Declare a single variable <xsl:variable name="loopIndex" select="300, 600, 900, ..., 4200"/> (you need to spell out the ... in your code) and then you can set
<xsl:variable name="source" select="$loopIndex[current()]"/>
inside of the for-each.
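If you would rather not spell out the widths at all, the sequence can be computed, and string-join() takes care of the comma after the last item. The following is only a minimal sketch under some assumptions: the variable names $base and $widths and the literal result element img are mine, not from your code, and $maxWidth is assumed to be a number (here the default of 2200).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed to be numeric; in the real stylesheet it would come from the image. -->
  <xsl:param name="maxWidth" select="2200"/>

  <!-- Hypothetical name for the fixed part of the URL. -->
  <xsl:variable name="base"
      select="'http://imagescalerserver.com/?url=http://test.com/1108932.jpg'"/>

  <!-- 300, 600, ..., 4200, keeping only the widths below $maxWidth. -->
  <xsl:variable name="widths"
      select="(for $i in 1 to 14 return $i * 300)[. lt $maxWidth]"/>

  <xsl:template match="/">
    <!-- string-join() puts the separator only between items, so no trailing comma. -->
    <img srcset="{string-join(
        for $w in $widths
        return concat($base, '&amp;w=', $w, ' ', $w, 'w'),
        ', ')}"/>
  </xsl:template>

</xsl:stylesheet>
```

With the default of 2200 the last entry is the one for 2100. The separator here is a comma plus a space rather than a line break; change the second argument of string-join() if you want each entry on its own line.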

XSLT template matching on //

The following XSLT code produces incorrect output. It should increment the values by 1, but it increments them by 2. Could anyone explain why?
the xml input is
<AAA>
<BBB>cc </BBB>
<BBB>ff </BBB>
<BBB>aa </BBB>
<BBB>fff </BBB>
<BBB>FFF </BBB>
<BBB>Aa </BBB>
<BBB>ccCCC </BBB>
</AAA>
and the xslt input code is
<xsl:template match="/">
<xsl:text>
BBB[</xsl:text>
<xsl:value-of select="position()"/>
<xsl:text>]: </xsl:text>
<xsl:value-of select="."/>
</xsl:template>
It produces the following (wrong) output, but it should produce [1], [2], [3], etc.
BBB[2]: cc
BBB[4]: ff
BBB[8]: aa
BBB[10]: fff
BBB[12]: FFF
BBB[14]: Aa
BBB[16]: ccCCC
Any idea?
I am pretty sure that if you only have <xsl:template match="/">, you won't even get the output you say you get.
Assuming you have
<xsl:template match="BBB">
<xsl:text>
BBB[</xsl:text>
<xsl:value-of select="position()"/>
<xsl:text>]: </xsl:text>
<xsl:value-of select="."/>
</xsl:template>
then the result depends on other factors like whether you have <xsl:strip-space elements="*"/> or whether you use
<xsl:template match="AAA">
<xsl:apply-templates select="*"/>
</xsl:template>
Your current result suggests you are not stripping white-space text nodes and that you either rely on the built-in templates or have <xsl:apply-templates/> or <xsl:apply-templates select="node()"/> in the template matching AAA. That way the current node list contains element nodes as well as the text nodes between them, resulting in the positions 2, 4, 6, ...
I would fix the code with
<xsl:template match="BBB">
<xsl:text>
BBB[</xsl:text>
<xsl:number/>
<xsl:text>]: </xsl:text>
<xsl:value-of select="."/>
</xsl:template>
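If you prefer to keep position(), the other route mentioned above is to make sure only the BBB elements end up in the current node list. A minimal sketch, assuming the input shown in the question and text output (the xsl:output setting is my assumption); it uses only XSLT 1.0 features:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Select only the BBB children, so the white-space text nodes
       between them never become part of the current node list. -->
  <xsl:template match="AAA">
    <xsl:apply-templates select="BBB"/>
  </xsl:template>

  <xsl:template match="BBB">
    <xsl:text>&#10;BBB[</xsl:text>
    <!-- position() now counts BBB elements only. -->
    <xsl:value-of select="position()"/>
    <xsl:text>]: </xsl:text>
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Adding <xsl:strip-space elements="*"/> instead would have a similar effect for this input, because the white-space-only text nodes between the elements are then removed before position() is evaluated.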

Why does index-of() return multiple values when applied to a sequence of unique nodes?

I'm using XPath 2.0's index-of function to return the index of current() within a sorted sequence of nodes. Using Saxon, the nodes in the sorted sequence are unique, yet index-of returns a sequence of two values.
This does not happen all the time, just very occasionally, and not for any reason I can find. Can someone please explain what is going on?
I have worked up a minimal example based on a sample of data that routinely gives this odd behavior.
The source data is:
<data>
<student userID="1" userName="user1"/>
<session startedOn="01/16/2012 15:01:18">
</session>
<session startedOn="11/16/2011 13:31:33">
</session>
</data>
My xsl document puts the session nodes into a sorted sequence $orderd at the very top of the root template:
<xsl:template match="/">
<xsl:variable name="nodes" as="node()*" select="/data/session"></xsl:variable>
<xsl:variable name="orderd" as="node()*">
<xsl:for-each select="$nodes">
<xsl:sort select="xs:dateTime(xs:dateTime(concat(substring(normalize-space(@startedOn),7,4),'-',substring(normalize-space(@startedOn),1,2),'-',substring(normalize-space(@startedOn),4,2),'T',substring(normalize-space(@startedOn),12,8)))
)" order="ascending"/>
<xsl:sequence select="."/>
</xsl:for-each>
</xsl:variable>
Since the nodes were already ordered by @startedOn but in the opposite order, the sequence $orderd should be the same as the document-ordered sequence $nodes, except in reverse order.
When I create output using a for-each statement, I find that somehow the two nodes are seen as identical when tested using index-of.
The code below is used to output data (and comes immediately after the chunk above):
<output>
<xsl:for-each select="$nodes">
<xsl:sort select="position()" order="descending"></xsl:sort>
<xsl:variable name="index" select="index-of($orderd,current())" as="xs:integer*"></xsl:variable>
<xsl:variable name="pos" select="position()"></xsl:variable>
<session reverse-documentOrder="{$pos}" sortedOrder="{$index}"/>
</xsl:for-each>
</output>
As the output (shown below) indicates, the index-of function is returning the sequence (1,2), meaning that it sees both nodes as identical. I have checked the expression used to sort the values, and it produces distinct and well-formed dateTime strings.
<output>
<session reverse-documentOrder="1"
sortedOrder="1 2"/>
<session reverse-documentOrder="2"
sortedOrder="1 2"/>
</output>
Without relying on the generate-id() function, which is an XSLT function and not an XPath function, one can write a simple index-of() function that operates on node identity:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:my="my:my">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vNum3" select="/*/*[3]"/>
<xsl:variable name="vSeq" select="/*/*[1], /*/*[3], /*/*[3]"/>
<xsl:template match="/">
<xsl:sequence select="my:index-of($vSeq, $vNum3)"/>
</xsl:template>
<xsl:function name="my:index-of" as="xs:integer*">
<xsl:param name="pSeq" as="node()*"/>
<xsl:param name="pNode" as="node()"/>
<xsl:for-each select="$pSeq">
<xsl:if test=". is $pNode">
<xsl:sequence select="position()"/>
</xsl:if>
</xsl:for-each>
</xsl:function>
</xsl:stylesheet>
When this transformation is applied to the following XML document:
<nums>
<num>01</num>
<num>02</num>
<num>03</num>
<num>04</num>
<num>05</num>
<num>06</num>
<num>07</num>
<num>08</num>
<num>09</num>
<num>10</num>
</nums>
the wanted, correct result is returned:
2 3
Explanation: Use of the is operator.
The documentation http://www.w3.org/TR/xpath-functions/#func-index-of of index-of says "The items in the sequence $seqParam are compared with $srchParam under the rules for the eq operator. Values of type xs:untypedAtomic are compared as if they were of type xs:string.". So you are trying to compare untyped element nodes and that means they are compared as strings and both session elements have the same white space only string contents. That way both are compared as equal.
I am not sure what to suggest as I am not sure what you want to achieve but I hope the above explains the result you get.
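If what you want is the position of the current node itself within the sorted sequence, comparing by node identity with the is operator avoids the atomization described above. The following is only a sketch against the sample data in the question; it keeps your variable names, swaps the for-each for xsl:perform-sort, drops the normalize-space() calls on the assumption that the attribute values have no stray white space, and replaces index-of with an inline for expression:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output indent="yes"/>

  <xsl:template match="/">
    <xsl:variable name="nodes" as="node()*" select="/data/session"/>

    <!-- Same sort key as in the question: MM/DD/YYYY hh:mm:ss rebuilt as xs:dateTime. -->
    <xsl:variable name="orderd" as="node()*">
      <xsl:perform-sort select="$nodes">
        <xsl:sort select="xs:dateTime(concat(
            substring(@startedOn, 7, 4), '-',
            substring(@startedOn, 1, 2), '-',
            substring(@startedOn, 4, 2), 'T',
            substring(@startedOn, 12, 8)))"/>
      </xsl:perform-sort>
    </xsl:variable>

    <output>
      <xsl:for-each select="$nodes">
        <xsl:sort select="position()" order="descending"/>
        <!-- "is" compares node identity, so only the current node itself matches. -->
        <session reverse-documentOrder="{position()}"
                 sortedOrder="{for $i in 1 to count($orderd)
                               return if ($orderd[$i] is current()) then $i else ()}"/>
      </xsl:for-each>
    </output>
  </xsl:template>

</xsl:stylesheet>
```

Because each node is identical only to itself, every session should get exactly one index here, regardless of whether the elements have the same string content.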

XSLT 2.0 - looping over nodeset variable, but need to process other elements in loop as well

I have a XML like this:
<?xml version="1.0" encoding="UTF-8"?>
<nodes>
<n c="value2"/>
<n>Has a relation to node with value2</n>
<n>Has a relation to node with value2</n>
<n c="value"/>
<n>Has a relation to node with value</n>
<n c="value1"/>
<n>Has a relation to node with value1</n>
</nodes>
I sort all elements that have attributes into a variable, then I iterate over this variable in a for-each loop. But at the end of each iteration, I need to print the value of those elements that come after the currently processed element (in the original XML) and have no attribute.
That means calling apply-templates on the <n> elements without an attribute, but the "select" attribute of apply-templates does not work, probably because I'm now looping over the variable.
Is there a solution for that?
Thanks
Here is the XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="nodes">
<xsl:variable name="sorted">
<xsl:for-each select="n[@c]">
<xsl:sort select="@c"></xsl:sort>
<xsl:copy-of select="."></xsl:copy-of>
</xsl:for-each>
</xsl:variable>
<xsl:for-each select="$sorted/n">
<xsl:value-of select="@c"></xsl:value-of>
<xsl:apply-templates select="/nodes/n[2]"></xsl:apply-templates>
</xsl:for-each>
</xsl:template>
<xsl:template match="n[not(@c)]">
<xsl:value-of select="."></xsl:value-of>
</xsl:template>
</xsl:stylesheet>
This is just an example; all of this is part of a bigger project :)
The desired output, which needs a more complicated XPath (right now even the simple one does not work), is:
Value
Has a relation to node with value
Value1
Has a relation to node with value1
Value2
Has a relation to node with value2
Has a relation to node with value2
Is it a bit clearer now?
Some thoughts: apply-templates without a select processes the child nodes of the current context node; in your input sample the n elements do not have any children at all. Furthermore, in your variable you do a copy-of, meaning you create new nodes that have no relation to the nodes in the input sample. So while I am not sure what you want to achieve, your construction with apply-templates inside the for-each does not make sense, given the input sample you have posted and the variable you use.
I suspect you could use the XSLT 2.0 for-each-group group-starting-with as in
<xsl:template match="nodes">
<xsl:for-each-group select="n" group-starting-with="n[@c]">
<xsl:sort select="@c"/>
<xsl:value-of select="@c"/>
<xsl:apply-templates select="current-group() except ."/>
</xsl:for-each-group>
</xsl:template>
If that does not help, then consider posting a small input sample with sample data and the corresponding output you want to create with XSLT 2.0; then we can make suggestions on how to achieve that.
[edit] Now that you have posted an output sample I post an enhanced version of my previous suggestion:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="text"/>
<xsl:template match="nodes">
<xsl:for-each-group select="n" group-starting-with="n[@c]">
<xsl:sort select="@c"/>
<xsl:value-of select="@c"/>
<xsl:text>
</xsl:text>
<xsl:apply-templates select="current-group() except ."/>
</xsl:for-each-group>
</xsl:template>
<xsl:template match="n[not(@c)]">
<xsl:value-of select="."/>
<xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
When I use Saxon 9.3 and run the stylesheet against your latest input sample the result is as follows:
value
Has a relation to node with value
value1
Has a relation to node with value1
value2
Has a relation to node with value2
Has a relation to node with value2
That is what you asked for, I think, so try that approach with your more complex real input.
