Any limitation with Saxon-EE XSLT v3 Streaming? - saxon

I want to apply different transformations to a big XML document using the Saxon XSLT 3 streaming capabilities. The problem I'm facing is that this transformation does not work:
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
exclude-result-prefixes="ano contextutil" xmlns:ano="java:StreamingGenericProcessor"
xmlns:contextutil="java:GenericAnonymizerContextUtil">
<xsl:mode streamable="yes"/>
<xsl:output method="xml"/>
<xsl:param name="context" as="class:java.lang.Object" xmlns:class="http://saxon.sf.net/java-type"/>
<xsl:template match="internal/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="email/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="address/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="birthday/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="country/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="external/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="name/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="phone/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="city/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="id/text()"><xsl:value-of select="ano:uuid($context, current(), 'ID')"/></xsl:template>
<xsl:template match="." >
<xsl:copy validation="preserve">
<xsl:apply-templates select="@*" />
<xsl:apply-templates select="node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
But with this one it does:
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
exclude-result-prefixes="ano contextutil" xmlns:ano="java:StreamingGenericProcessor"
xmlns:contextutil="java:GenericAnonymizerContextUtil">
<xsl:mode streamable="yes"/>
<xsl:output method="xml"/>
<xsl:param name="context" as="class:java.lang.Object" xmlns:class="http://saxon.sf.net/java-type"/>
<xsl:template match="email/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="address/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="birthday/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="country/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="external/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="name/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="phone/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="city/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="id/text()"><xsl:value-of select="ano:uuid($context, current(), 'ID')"/></xsl:template>
<xsl:template match="." >
<xsl:copy validation="preserve">
<xsl:apply-templates select="@*" />
<xsl:apply-templates select="node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
I tested plenty of different scenarios and concluded that it does not work if I have more than 9 "xsl:template" elements!
EDIT: "it does not work" means: I am applying a Java function to a specific element named "id". If I have more than 9 "xsl:template" elements, the output is not modified and my Java function is not called at all. I get no error message.
EDIT2: If I replace the call to the Java function with, for instance, "concat(current(), '_ID')", I see the same behaviour, so this is not specific to the Java function at all.
EDIT3:
Here is a sample input data:
<?xml version="1.0" encoding="UTF-8"?>
<table>
<row>
<id>10</id>
<email>fake@fake.com</email>
<address>dsffe</address>
<birthday>10/2018</birthday>
<country>FR</country>
<external>zz</external>
<internal>ww</internal>
<name>Jean</name>
<phone>000000</phone>
<city>Dfegd</city>
</row>
<row>
<id>9</id>
<email>fake@fake2.com</email>
<address>sdfzefzef</address>
<birthday>11/2012</birthday>
<country>GB</country>
<external>xx</external>
<internal>yy</internal>
<name>Jean-Claude</name>
<phone>000000</phone>
<city>dd</city>
</row>
</table>
This XSL always works:
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:mode streamable="yes"/>
<xsl:output method="xml"/>
<xsl:template match="email/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="address/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="birthday/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="country/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="external/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="name/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="phone/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="city/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="id/text()"><xsl:value-of select="concat(current(), '_ID')"/></xsl:template>
<xsl:template match="." >
<xsl:copy validation="preserve">
<xsl:apply-templates select="@*" />
<xsl:apply-templates select="node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
The problematic one (the same xsl with one more template):
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:mode streamable="yes"/>
<xsl:output method="xml"/>
<xsl:template match="email/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="address/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="birthday/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="country/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="external/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="internal/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="name/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="phone/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="city/text()"><xsl:value-of select="current()"/></xsl:template>
<xsl:template match="id/text()"><xsl:value-of select="concat(current(), '_ID')"/></xsl:template>
<xsl:template match="." >
<xsl:copy validation="preserve">
<xsl:apply-templates select="@*" />
<xsl:apply-templates select="node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
I run with the following command line:
java -cp Saxon-EE-9.8.0-14.jar net.sf.saxon.Transform -s:test.xml -xsl:concat_not_working.xsl
The working XSL correctly appends _ID to the value of the output id element, whereas the non-working XSL does not transform anything.
One more observation: if I run without the license (and therefore without streaming), both stylesheets work!
I'm using Saxon-EE 9.8.0-14 with a trial license: could this be an undocumented limitation of the trial license?

Your theory that the failure occurs with 10 or more rules turns out to be spot on. When there are 10 or more rules matching the same node-kind/node-name combination (in this case, all text nodes), Saxon-EE attempts to avoid a linear search of all the rules by looking for criteria that subsets of the rules share in common. In this case it is looking to see whether it can group the rules according to a precondition based on the parent of the text node.
At this stage there is a flaw in the logic; it carefully works out that each rule is in a group of 1 (no two parent conditions are the same), which should mean that it then abandons the optimization attempt. But it doesn't abandon it; it carries on. This shouldn't matter, because the optimization should work correctly even though it was pointless.
The reason the optimization isn't working correctly is that, on the streaming path for xsl:apply-templates, the context data for evaluating the rule preconditions isn't being initialized properly, leading the rule matcher to think that the preconditions aren't satisfied.
So you've hit a bug that, as you surmised, applies when you have a set of 10 or more template rules in a streaming mode when the rules all match nodes that have the same node-kind and node-name.
Running unlicensed bypasses the bug for two reasons: it deactivates the optimization of rule chains, and it deactivates streaming.
As a workaround, simply remove the /text() from each of your template rules.
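One possible reading of that workaround, as an untested sketch: match the elements themselves rather than their text nodes, so each rule is indexed under its own element name. Because an element rule has to re-create the element, the bodies use xsl:copy, and the built-in on-no-match="shallow-copy" behaviour stands in for the hand-written match="." rule. Both of those choices are assumptions of this sketch, not part of the original stylesheet.

```xml
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumption: shallow-copy replaces the explicit match="." identity rule -->
  <xsl:mode streamable="yes" on-no-match="shallow-copy"/>
  <xsl:output method="xml"/>

  <!-- Was match="id/text()": now matches the element and rebuilds it -->
  <xsl:template match="id">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:value-of select="concat(., '_ID')"/>
    </xsl:copy>
  </xsl:template>

  <!-- Was match="email/text()": same shape for the remaining elements
       (address, birthday, country, external, internal, name, phone, city) -->
  <xsl:template match="email">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:value-of select="."/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Each rule body reads the streamed element only once (the attribute copy does not consume the stream), which is what keeps the templates streamable.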
Logged as a bug here: https://saxonica.plan.io/issues/3901
Unless you indicate otherwise, I will submit a new test case based on your test data and stylesheet to the W3C test suite for XSLT 3.0.

Related

Error message 'adc' is an undeclared namespace

All, I am relatively new to XSLT but have already built some solutions. Currently I am trying to write an XSLT that moves a file to different folders depending on whether
<adc:Status>uploaded</adc:Status>
is "uploaded" or not.
Despite declaring the xmlns:adc namespace I get the error
Exception Occured during TransForm using XSLT 2.0 (SaxonHe):
System.Xml.XmlException: 'adc' is an undeclared namespace. Line 2,
position 4.
I spent hours researching the error message and trying to get to grips with namespaces, but I can't find the cause of the error.
This is the XSLT I wrote
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:adc="http://www.ifra.com/adconnexion/#v2"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs xsi adc">
<xsl:output method="xml" encoding="ISO-8859-1" indent="yes"/>
<xsl:template match="/">
<xsl:for-each select="/">
<xsl:variable name="Filename">
<xsl:value-of select="replace(document-uri(.), '.*/', '')"/>
</xsl:variable>
<xsl:choose>
<xsl:when test="/adc:adConnexion/adc:Requests/adc:AdOrder/adc:ProductionDetail/adc:Material/adc:Status='uploaded'">
<!-- Business case -->
<xsl:variable name="OutputFileName" select="concat('file:///C:\UPLOADED\',$Filename)"/>
<xsl:result-document href="{$OutputFileName}">
<xsl:copy-of select="current()"/>
</xsl:result-document>
</xsl:when>
<xsl:otherwise>
<!-- Non Business case -->
<xsl:variable name="OutputFileName" select="concat('file:///C:\ELSE\',$Filename)"/>
<xsl:result-document href="{$OutputFileName}">
<xsl:copy-of select="current()"/>
</xsl:result-document>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
The simplified input file is
<adc:adConnexion xmlns:adc="http://www.ifra.com/adconnexion/#v2">
<adc:Requests>
<adc:AdOrder messageClass="BusinessTransaction" messageID="FEA*" bookingID="1234" messageCode="AD-O">
<adc:ProductionDetail>
<adc:Material>
<adc:Status>uploaded</adc:Status>
</adc:Material>
</adc:AdOrder>
</adc:Requests>
</adc:adConnexion>
Any help is greatly appreciated!

XSLT set directory where result document ends up

The XSLT below creates result documents as desired, with one exception: each result document ends up in the directory from which the stylesheet was invoked. I want the result document to be written where the source was found (i.e. the file is overwritten with its transformed version).
How can I do that?
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0" xpath-default-namespace="http://www.w3.org/1999/xhtml">
<xsl:template match="/">
<xsl:for-each select="collection(iri-to-uri('file:///home/paul/Text/?select=*.xhtml'))">
<xsl:variable name="filename">
<xsl:value-of select="tokenize(document-uri(.), '/')[last()]"/>
</xsl:variable>
<xsl:result-document indent="yes" method="xml" href="{$filename}">
<xsl:apply-templates/>
</xsl:result-document>
</xsl:for-each>
</xsl:template>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<!-- transform templates removed -->
</xsl:stylesheet>
Try just using href="{document-uri(.)}" to use the full uri as the target rather than doing the tokenize to pull out the last segment.
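A minimal sketch of that suggestion applied to the stylesheet from the question; only the href attribute changes, and the unused filename variable is dropped. Whether a processor allows a file to be read and then overwritten within the same transformation is an assumption to check for your setup: the XSLT 2.0 specification defines reading from and writing to the same resource as an error condition (XTRE1500).

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0" xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <xsl:template match="/">
    <xsl:for-each select="collection(iri-to-uri('file:///home/paul/Text/?select=*.xhtml'))">
      <!-- Use the full source URI as the target, not just its last path segment -->
      <xsl:result-document indent="yes" method="xml" href="{document-uri(.)}">
        <xsl:apply-templates/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

  <!-- Identity rule, as in the question -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- transform templates removed -->
</xsl:stylesheet>
```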

How do I merge and concatenate the data from each row in two separate source files?

I have two source files which I need to combine on a row by row basis. I am happy reading the files into a variable and I am happy with the logic but the syntax has me stumped. For each row in file 1 I need to loop round each row in file 2 and output the two variables concatenated together:
File 1:
<rows>
<row>1</row>
<row>2</row>
<row>3</row>
<row>4</row>
</rows>
File 2:
<rows>
<row>a</row>
<row>b</row>
</rows>
Required output:
<rows>
<row>1/a</row>
<row>1/b</row>
<row>2/a</row>
<row>2/b</row>
<row>3/a</row>
<row>3/b</row>
<row>4/a</row>
<row>4/b</row>
</rows>
My (poor) attempt at getting the XSLT to work:
<rows>
<xsl:apply-templates select="document('file1.xml')/rows/row" />
</rows>
<xsl:template match="row">
<xsl:apply-templates select="document('file2.xml')/rows/row" />
</xsl:template>
<xsl:template match="row">
<row><xsl:value-of select="???" />/<xsl:value-of select="???" /></row>
</xsl:template>
(These files are simplified versions of what I actually have)
How do I make one template match one 'row' value and the other match another (both source files use the same structure). And how do I set those '???' values?
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vDoc2">
<rows>
<row>a</row>
<row>b</row>
</rows>
</xsl:variable>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/*">
<rows>
<xsl:apply-templates/>
</rows>
</xsl:template>
<xsl:template match="row">
<xsl:apply-templates select="$vDoc2/*/row" mode="doc2">
<xsl:with-param name="pValue" select="."/>
</xsl:apply-templates>
</xsl:template>
<xsl:template match="row" mode="doc2">
<xsl:param name="pValue" />
<row><xsl:sequence select="concat($pValue, '/', .)"/></row>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided first XML document:
<rows>
<row>1</row>
<row>2</row>
<row>3</row>
<row>4</row>
</rows>
the wanted, correct result is produced:
<rows>
<row>1/a</row>
<row>1/b</row>
<row>2/a</row>
<row>2/b</row>
<row>3/a</row>
<row>3/b</row>
<row>4/a</row>
<row>4/b</row>
</rows>
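The stylesheet above embeds the second document in a variable. As a hedged variant, the sketch below loads it with document() instead and uses nested xsl:for-each in place of the moded template. It assumes file1.xml is the source document and that file2.xml sits next to the stylesheet; the variable names vDoc2 and vLeft are illustrative.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Assumption: file2.xml is resolved relative to the stylesheet's location -->
  <xsl:variable name="vDoc2" select="document('file2.xml')"/>

  <xsl:template match="/rows">
    <rows>
      <xsl:for-each select="row">
        <!-- Remember the row from file 1 before the context changes -->
        <xsl:variable name="vLeft" select="."/>
        <xsl:for-each select="$vDoc2/rows/row">
          <row><xsl:value-of select="concat($vLeft, '/', .)"/></row>
        </xsl:for-each>
      </xsl:for-each>
    </rows>
  </xsl:template>
</xsl:stylesheet>
```

The outer row has to be captured in a variable because inside the inner xsl:for-each the context item is the row from the second file.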

xslt: keeping namespace declaration on root when root element is not known in advance

I have xml documents that follow a schema where most of the defined elements are allowed to be the root of a valid instance. I also have several xslt's v2.0 which translate it in various ways (put it into a normal form, a compact form, a different dialect, ...) These xslt's are all based on an identity transform with templates added to make the desired modification. The problem is that there is a proliferation of namespace attributes because there are some elements that come from outside the default namespace.
I have tried the recommended procedures for inserting the namespace on the root element, but I can't seem to get it right. The issues are:
1. the transformation may change the name, and sometimes the content, of the root element, so I still need the templates for each of the global elements, and since I don't know which one will be the root, I can't just insert namespace declarations where needed (I don't know where they will be needed for a particular document).
2. I thought about implementing this as multi-pass, or simply an independent xslt, since I want the same result for several different xslts. In this case, what I would need is an identity transform that takes all the namespaces and prefixes from all elements in the document, and inserts them into the root. This would, I hope, automatically remove the namespace attributes from the children? However, I tried the following
<?xml version="1.0" ?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>
<xsl:output method="xml" indent="yes"/>
<xsl:template name="start" match="/">
<xsl:copy>
<xsl:for-each select="*">
<xsl:copy>
<xsl:for-each select="descendant::*">
<xsl:call-template name="add-ns">
<xsl:with-param name="ns-namespace">
<xsl:value-of select="namespace-uri()"/>
</xsl:with-param>
<xsl:with-param name="ns-prefix">
<xsl:value-of
select=" prefix-from-QName( QName(namespace-uri(),name()))"/>
</xsl:with-param>
</xsl:call-template>
</xsl:for-each>
<xsl:apply-templates select="node() | @*"/>
</xsl:copy>
</xsl:for-each>
</xsl:copy>
</xsl:template>
<xsl:template name="add-ns">
<xsl:param name="ns-prefix" select="'x'"/>
<xsl:param name="ns-namespace" select="'someNamespace'"/>
<xsl:namespace name="{$ns-prefix}" select="$ns-namespace"/>
</xsl:template>
<xsl:template match="node()|@* ">
<xsl:copy>
<xsl:apply-templates select="node() | @*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
And this works for all prefixes that appear on elements, but it doesn't catch the prefixes of attributes. Here is a test document:
<RuleML xmlns="http://www.ruleml.org/0.91/xsd">
<Assert textiri="xy>z">
<Importation xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="abc"
textiri="urn:common-logic:demo1"
xlink:href="http://common-logic.org/x>cl/demos.xml"/>
<a:anything xmlns:a="http://anything.org"
xmlns:xlink="http://www.w3.org/1999/xlink"/>
</Assert>
</RuleML>
I want it to produce:
<RuleML xmlns="http://www.ruleml.org/0.91/xsd" xmlns:a="http://anything.org" xmlns:xlink="http://www.w3.org/1999/xlink" >
<Assert textiri="xy>z">
<Importation xml:id="abc"
textiri="urn:common-logic:demo1"
xlink:href="http://common-logic.org/x>cl/demos.xml"/>
<a:anything/>
</Assert>
</RuleML>
but instead I get
<RuleML xmlns="http://www.ruleml.org/0.91/xsd" xmlns:a="http://anything.org">
<Assert textiri="xy>z">
<Importation xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="abc"
textiri="urn:common-logic:demo1"
xlink:href="http://common-logic.org/x>cl/demos.xml"/>
<a:anything xmlns:xlink="http://www.w3.org/1999/xlink"/>
</Assert>
</RuleML>
Tara
Does the following do what you want?
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs">
<xsl:template match="@* | node()">
<xsl:copy copy-namespaces="no">
<xsl:apply-templates select="@* , node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/*">
<xsl:copy>
<xsl:copy-of select="descendant::*/namespace::*"/>
<xsl:apply-templates select="@* , node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
With Saxon 9.3 it seems to do the job on the sample you posted.
I am however not sure what you want to do if there are several elements in different default namespaces or several elements in different namespaces but using the same prefix. For instance with
<root xmlns="http://example.com/ns1">
<foo xmlns="http://example.com/ns2">
<pf:bar xmlns:pf="http://example.com/ns3">
<pf:foobar xmlns:pf="http://example.com/ns4"/>
</pf:bar>
</foo>
</root>
Saxon simply reports the error
Error at xsl:copy-of on line 15 of test2011061801Xsl2.xsl:
XTDE0430: Cannot create two namespace nodes with the same prefix mapped to different URIs
(prefix="", URI=http://example.com/ns2, URI=http://example.com/ns1)
in built-in template rule
[edit]
If you don't want an error to be reported you could try to implement a strategy to pull up namespace nodes as far up as possible but to avoid any collisions. That can be done with for-each-group, as in the following sample XSLT 2.0:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs">
<xsl:template match="@* | text() | processing-instruction() | comment()">
<xsl:copy/>
</xsl:template>
<xsl:template match="*">
<xsl:copy copy-namespaces="no">
<xsl:for-each-group select="descendant-or-self::*/namespace::*" group-by="local-name()">
<xsl:copy-of select="."/>
</xsl:for-each-group>
<xsl:apply-templates select="@* , node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
With the input being
<root xmlns="http://example.com/ns1">
<foo xmlns="http://example.com/ns2">
<pf:bar xmlns:pf="http://example.com/ns3">
<pf:foobar xmlns:pf="http://example.com/ns4"/>
</pf:bar>
</foo>
</root>
Saxon 9.3 outputs
<?xml version="1.0" encoding="UTF-8"?><root xmlns="http://example.com/ns1" xmlns:pf="http://example.com/ns3">
<foo xmlns="http://example.com/ns2">
<pf:bar>
<pf:foobar xmlns:pf="http://example.com/ns4"/>
</pf:bar>
</foo>
</root>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>
<xsl:output method="xml" indent="yes"/>
<xsl:template match="*:RuleML">
<xsl:copy>
<xsl:for-each select="descendant::node()">
<xsl:choose>
<xsl:when test="self::text()"/>
<xsl:otherwise>
<xsl:for-each select="namespace::node()">
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
<xsl:apply-templates select="(node() | @*) except namespace::node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="node() | @*">
<xsl:copy>
<xsl:apply-templates select="(node() | @*) except namespace::node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>

XSLT conditionally write to two different files

I need to extract log messages from an XML file and write them out to plain text files. The log messages come in two flavors, and I want to write them to separate files.
I have written a style sheet that does exactly what I need, except that it sometimes creates empty files because the XML file may not contain messages of one type or the other.
I am wondering 1) if what I am doing is the best method to do this, and 2) if there is a way to suppress empty files.
My sample may contain errors because it has been retyped. (the original is on a closed network)
Note: I am using XSLT 2.0 features.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="text" encoding="iso-8859-1" />
<xsl:param name="break" select="string('&#xA;')" />
<xsl:template match="/">
<xsl:result-document method="text" href="foo.txt">
<xsl:apply-templates select="Root/a/b/c[contains(., 'foo')]" />
</xsl:result-document>
<xsl:result-document method="text" href="bar.txt">
<xsl:apply-templates select="Root/a/b/c[not(contains(., 'foo'))]" />
</xsl:result-document>
</xsl:template>
<xsl:template match="*">
<xsl:value-of select="concat(normalize-space(.), $break)" />
</xsl:template>
</xsl:stylesheet>
You could use some XSLT 2.0 stylesheet like:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="break" select="string('&#xA;')" />
<xsl:template match="/">
<xsl:apply-templates select="Root/a/b/c"/>
</xsl:template>
<xsl:template match="/Root/a/b/c[contains(., 'foo')]">
<xsl:result-document method="text" href="foo.txt">
<xsl:next-match/>
</xsl:result-document>
</xsl:template>
<xsl:template match="/Root/a/b/c[not(contains(., 'foo'))]">
<xsl:result-document method="text" href="bar.txt">
<xsl:next-match/>
</xsl:result-document>
</xsl:template>
<xsl:template match="*">
<xsl:value-of select="concat(normalize-space(.), $break)" />
</xsl:template>
</xsl:stylesheet>
Note: Pattern matching and xsl:next-match.
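An alternative sketch for suppressing the empty files: select each group of messages once and open an xsl:result-document only when the group is non-empty. This keeps one xsl:result-document per file, which may matter because XSLT 2.0 treats two result documents with the same URI as an error (XTDE1490). The variable names foo and bar are illustrative; the paths and file names come from the question.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="iso-8859-1"/>
  <xsl:param name="break" select="string('&#xA;')"/>

  <xsl:template match="/">
    <!-- Split the messages into the two flavors up front -->
    <xsl:variable name="foo" select="Root/a/b/c[contains(., 'foo')]"/>
    <xsl:variable name="bar" select="Root/a/b/c[not(contains(., 'foo'))]"/>

    <!-- Write each file only when there is something to put in it -->
    <xsl:if test="exists($foo)">
      <xsl:result-document method="text" href="foo.txt">
        <xsl:apply-templates select="$foo"/>
      </xsl:result-document>
    </xsl:if>
    <xsl:if test="exists($bar)">
      <xsl:result-document method="text" href="bar.txt">
        <xsl:apply-templates select="$bar"/>
      </xsl:result-document>
    </xsl:if>
  </xsl:template>

  <xsl:template match="*">
    <xsl:value-of select="concat(normalize-space(.), $break)"/>
  </xsl:template>
</xsl:stylesheet>
```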
