Eliminate line-breaks with XSLT 2.0 analyze-string - xslt-2.0

I use the XSLT 2.0 element analyze-string in a stylesheet that transforms XML to HTML; specifically, I use it to convert string encoding for subscripts in chemical formulae to HTML subscripts. Therefore, the result is a string, to go in a p or td element, with embedded mark-up.
The transformation is supposed to produce output like H2O but in fact inserts a line-break in the HTML:
H
<sub>2</sub>O
and this break is (correctly) interpreted by the browser as a space:
H
2O
which is ugly.
Is there a way to remove the line-break? I've tried putting the whole analyze-string element on one line and that doesn't work.
The input would be something like
<OrdinaryStructralFormula>H$_2$O</OrdinaryStructuralFormula>
for a simple case and
<OrdinaryStructralFormula>C$_2$OH$_5$$^-</OrdinaryStructuralFormula>
for a more-complicated one. Note that the subscript pattern can match multiple times in the general case and can be either in the middle or at the end of the string. The pattern also has to match and eliminate any notation for charge: the $^- bit at the end of the second example.
The XSLT processor is Saxon 9.4 and the XSLT template follows.
<xsl:template name="formula">
<xsl:param name="formula"/>
<xsl:if test="$formula">
<xsl:variable name="f" select="translate($formula, '$', '')"/>
<xsl:analyze-string select="$f" regex="(_)(\d+)|(\^)\d*\+|(\^)\d*\-">
<xsl:matching-substring>
<xsl:if test="regex-group(1)='_'">
<sub><xsl:value-of select="regex-group(2)"/></sub>
</xsl:if>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:if>
</xsl:template>

I cannot reproduce the reported result.
This transformation (which is what you should have given us, but you only provided a template):
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:call-template name="formula">
<xsl:with-param name="formula" select="/*"/>
</xsl:call-template>
</xsl:template>
<xsl:template name="formula">
<xsl:param name="formula"/>
<xsl:if test="$formula">
<xsl:variable name="f" select="translate($formula, '$', '')"/>
<xsl:analyze-string select="$f" regex="(_)(\d+)|(\^)\d*\+|(\^)\d*\-">
<xsl:matching-substring>
<xsl:if test="regex-group(1)='_'">
<sub><xsl:value-of select="regex-group(2)"/></sub>
</xsl:if>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
when applied on the following XML document with Saxon 9.1.05:
<formula>H$_2$O</formula>
produces the wanted, correct result:
H<sub>2</sub>O
When the same transformation is applied on the second XML document:
<OrdinaryStructuralFormula>C$_2$OH$_5$$^-</OrdinaryStructuralFormula>
Again the wanted correct result is produced:
C<sub>2</sub>OH<sub>5</sub>
Do note: I ran the same transformations with two other XSLT 2.0 processors: XQSharp (XMLPrime) and AltovaXML (XML-SPY) and got exactly the same, correct results.

Related

Return a value type of element from a function in XSLT

This is my XML
<report>
<format-inputs>
<narrative-entity-ids>
<entity id="28495795" type-cdf-meaning="DIAGNOSES"/>
<entity id="28495741" type-cdf-meaning="DIAGNOSES"/>
<entity id="28495471" type-cdf-meaning="DIAGNOSES"/>
</narrative-entity-ids>
</format-inputs>
</report>
I am creating a function in commonFunction.xslt
<xsl:function name="cdocfx:createEntityIdList" >
<xsl:param name="formatInputsNodes"/>
<xsl:if test="fn:exists(n:report/n:format-inputs)"
<xsl:variable name="entityIdList" as="element()*">
<xsl:for-each select="$formatInputsNodes/n:narrative-entity-ids/n:entity">
<Item><xsl:value-of select="#id"/></Item>
</xsl:for-each>
</xsl:variable>
</xsl:if>
<xsl:copy-of select="$entityIdList"/>
</xsl:function>
I am calling this function in other xslt file where commonFunction.xslt was included
<xsl:variable name="entityIdList" select="cdocfx:createEntityIdList(n:report/n:format-inputs)"/>
</xsl:variable>
My question is variable entityIdList should be value type of element but it is having the document-node type how can i achieve this ??
Please provide minimal but complete samples of XML input, XSLT you have, output you want and the one you get together with any exact error messages you have encountered.
I am currently not sure I understand what you want to achieve, if you construct a variable of type element()* you seem to want to construct a sequence of element nodes. Any xsl:value-of however will only output the string values of the selected items in a text node so it is not clear why you first construct elements if you only want to output string values. If you construct nodes and want to output them use xsl:copy-of or xsl:sequence, not xsl:value-of.
To show two examples of writing a function that returns a sequence of elements (i.e. whose result is of type element()*) I have set up https://xsltfiddle.liberty-development.net/3NzcBtE which has two functions
<xsl:function name="mf:ex1">
<xsl:param name="input" as="element()*"/>
<xsl:for-each select="$input">
<item>{ #id }</item>
</xsl:for-each>
</xsl:function>
<xsl:function name="mf:ex2">
<xsl:param name="input" as="element()*"/>
<xsl:variable name="elements" as="element()*">
<xsl:for-each select="$input">
<item>{ #id }</item>
</xsl:for-each>
</xsl:variable>
<xsl:sequence select="$elements"/>
</xsl:function>
the first simply directly constructs some result elements in the function body, that way the result is a sequence of element nodes. The second function uses your approach of constructing a sequence of elements nodes in a variable, the proper way to return that variable value then from the function is to use xsl:sequence.
It is not clear at which position of the posted code you think are dealing with a document-node() node.
Note also that
<xsl:choose>
<xsl:when test="fn:exists($formatInputsNodes/n:narrative-entity-ids)">
<xsl:for-each select="$formatInputsNodes/n:narrative-entity-ids/n:entity">
<Item><xsl:value-of select="#id"/></Item>
</xsl:for-each>
</xsl:when>
</xsl:choose>
can be simplified to
<xsl:for-each select="$formatInputsNodes/n:narrative-entity-ids/n:entity">
<Item><xsl:value-of select="#id"/></Item>
</xsl:for-each>
As you have now presented an XML input that is at least well-formed and some XSLT snippets (that are unfortunately not well-formed and seem to use a namespace although the XML input shown doesn't use one) here is an attempt to morph that into a working sample
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:cdocfx="http://example.com/cdox-functions"
exclude-result-prefixes="#all"
version="3.0">
<xsl:function name="cdocfx:createEntityIdList" >
<xsl:param name="formatInputsNodes"/>
<xsl:variable name="entityIdList" as="element()*">
<xsl:for-each select="$formatInputsNodes/narrative-entity-ids/entity">
<Item><xsl:value-of select="#id"/></Item>
</xsl:for-each>
</xsl:variable>
<xsl:copy-of select="$entityIdList"/>
</xsl:function>
<xsl:variable name="entityIdList" select="cdocfx:createEntityIdList(report/format-inputs)"/>
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of select="$entityIdList instance of element()*, $entityIdList" separator=", "/>
</xsl:template>
</xsl:stylesheet>
https://xsltfiddle.liberty-development.net/pPqsHTW/1
Output there for the check $entityIdList instance of element()* is true so I am not sure why you say you have a document node.

using a single namespace declaration for literal result element and element constructor methods

Given the XML source
<Content>
</Content>
and transformation:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
office:version="1.0"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
<xsl:output indent="yes" encoding="UTF-8"/>
<xsl:template match="Content">
<xsl:element name="office:document">
<xsl:attribute name="office:version">1.2</xsl:attribute>
<xsl:attribute name="office:mimetype">application/vnd.oasis.opendocument.text</xsl:attribute>
<xsl:element name="office:body">
<xsl:element name="office:text">
<xsl:element name="text:p">Hello world.
</xsl:element>
<xsl:element name="text:p">Goodbye world.
</xsl:element>
</xsl:element>
</xsl:element>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
The result
<office:document xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
office:version="1.2"
office:mimetype="application/vnd.oasis.opendocument.text">
<office:body>
<office:text>
<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">Hello world.
</text:p>
<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">Goodbye world.
</text:p>
</office:text>
</office:body>
</office:document>
The namespace for paragraph elements is repeated. I want it applied to the root element to avoid this, as is the norm in odf files.
But if I add the namespaces to the root element, the XSL will include redundant namespace declarations, for the spreadsheet and root elements.
If I then remove the namespaces from the stylesheet element, I won't be able to add literal result elements in those namespaces.
I read in Kay's 4th edition reference p473 "Avoiding duplicate namespace declarations is entirely the job of the XSLT serializer.."
But I'm unable to leverage this insight to produce the required result.
For the included sample you get the included result as the elements generated with xsl:element don't have any namespaces in scope that you declare in the stylesheet element, they are only used to create an element in a certain namespace. It is not clear from that sample why you need xsl:element at all and can't simply use literal result elements.
If you really need to construct your root element with xsl:element, you can however construct a namespace node with <xsl:namespace name="text" select="'urn:oasis:names:tc:opendocument:xmlns:text:1.0'"/>. See https://xsltfiddle.liberty-development.net/jyH9rMg for an online example which transforms your input sample with
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
office:version="1.0"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
<xsl:output indent="yes" encoding="UTF-8"/>
<xsl:template match="Content">
<xsl:element name="office:document">
<xsl:namespace name="text" select="'urn:oasis:names:tc:opendocument:xmlns:text:1.0'"/>
<xsl:attribute name="office:version">1.2</xsl:attribute>
<xsl:attribute name="office:mimetype">application/vnd.oasis.opendocument.text</xsl:attribute>
<xsl:element name="office:body">
<xsl:element name="office:text">
<xsl:element name="text:p">Hello world.
</xsl:element>
<xsl:element name="text:p">Goodbye world.
</xsl:element>
</xsl:element>
</xsl:element>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
into
<office:document xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
office:version="1.2"
office:mimetype="application/vnd.oasis.opendocument.text">
<office:body>
<office:text>
<text:p>Hello world.
</text:p>
<text:p>Goodbye world.
</text:p>
</office:text>
</office:body>
</office:document>

How to apply regular expression with CDATA in XSLT 2.0

I'm using regular expressions with XSLT 2.0 to validate XMLs and and I don't know how to validate fields with elements CDATA like this XML:
<cac:Item>
<cbc:Description><![CDATA[ ]]></cbc:Description>
<cac:SellersItemIdentification>
<cbc:ID>8510</cbc:ID>
</cac:SellersItemIdentification>
</cac:Item>
Mi XSLT is:
<xsl:if test='not(matches(cac:Item/cbc:Description,"^.{1,250}$"))'>
This expression returns false because the element cbc:Description have 15 characters, but I just want to validate element inside CDATA. I tried with this template:
<xsl:template name="valueCDATA" match="text()">
<xsl:param name="node"/>
<xsl:value-of select="$node"/>
</xsl:template>
<xsl:variable name="getCDATA">
<xsl:call-template name="valueCDATA">
<xsl:with-param name="node" select="cac:Item/cbc:Description"/>
</xsl:call-template>
</xsl:variable>
But it doesn't work, what should I do?

XSLT 2 - pick item from tokenize()'d list by index

My environment is SAXON (last nights build) using XSLT 2.0. My real problem is that the XML document specification is sub-optimal, and in a way, my problem relates to fixing/working around that design issue.
I have a node type (<weaponmodesdata>) where all the direct children are |-separated string lists of 1-or-many elements (each child of the same <weaponmodesdata> will have the same length). I need to go over the various modes represented and "unspin" them out to separate item lists (in plain text), rather than having them all smooshed together.
Unfortunately right now I'm getting a really stubborn
XPTY0020: Required item type of the context item for the child axis is node(); supplied
value has item type xs:string
error on the lines where I pass the node that needs to be split up into my little template.
Currently I have
<xsl:template match="trait" mode="attack">
<xsl:for-each select="tokenize(weaponmodesdata/mode, '\|')">
<xsl:variable name="count" select="position()"/>
<xsl:value-of select="name"/><xsl:text> - </xsl:text>
<xsl:call-template name="split_weaponmode">
<xsl:with-param name="source" select="weaponmodesdata/damage"/>
<xsl:with-param name="item" select="$count"/>
</xsl:call-template>
<xsl:text> </xsl:text>
<xsl:call-template name="split_weaponmode">
<xsl:with-param name="source" select="weaponmodesdata/damtype"/>
<xsl:with-param name="item" select="$count"/>
</xsl:call-template>
<!-- more will go here eventually -->
<xsl:text>.
</xsl:text>
</xsl:for-each>
</xsl:template>
<xsl:template name="split_weaponmode">
<xsl:param name="source"/>
<xsl:param name="item"/>
<xsl:variable name="parts" select="tokenize($source, '\|')"/>
<xsl:for-each select="$parts">
<xsl:if test="position() = $item">
<xsl:value-of select="."/>
</xsl:if>
</xsl:for-each>
</xsl:template>
An example XML subtree relating to my issue:
<character>
<trait id="1">
<name>Spear</name>
<weaponmodesdata>
<mode>1H Thrust|2H Thrust|Thrown</mode>
<damage>thr+2|thr+3|thr+3</damage>
<damtype>imp|imp|imp</damtype>
</weaponmodesdata>
</trait>
<trait id="2">
<name>Broadsword</name>
<weaponmodesdata>
<mode>1H Thrust|1H Swing</mode>
<damage>thr+1|sw+2</damage>
<damtype>imp|cut</damtype>
</weaponmodesdata>
</trait>
</character>
Example desired output:
Spear - 1H Thrust; thr+2 imp.
Spear - 2H Thrust; thr+3 imp.
Spear - Thrown; thr+3 imp.
Broadsword - 1H Thrust; thr+1 imp.
Broadsword - 1H Swing; sw+2 cut.
One issue (that one causing the error message) with your code is that your for-each operates on a sequence of string value (i.e. inside the for-each body the context item is a string value), yet you have relative XPath expressions like weaponmodesdata/damage that require a context node to makes sense. So you would need to use a variable outside of the for-each to store your context node.
But I think you can simplify your code to
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="trait">
<xsl:variable name="this" select="."/>
<xsl:variable name="count" select="count(tokenize(weaponmodesdata/*[1], '\|'))"/>
<xsl:for-each-group select="weaponmodesdata/*/tokenize(., '\|')" group-by="position() mod $count">
<xsl:value-of select="$this/name"/>
<xsl:text> - </xsl:text>
<xsl:value-of select="current-group()"/>
<xsl:text>.
</xsl:text>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
If you want to stick with your approach of calling templates then make sure you store the context node of the template using e.g. <xsl:variable name="this" select="."/> so that you can access it inside of the for-each iterating over a string item.

Complex XSL Transformation

I am still a beginner with XSLT but I am having a difficult task in hand.
I have a non-xml file which needs to be transformed. The format of the file is a s follows:
type1
type1line1
type1line2
type1line3
type2
type2line1
type2line2
type3
type3line1
type3line2
types (type1, type2, ...) are specified using certain codes which don't have a specific order. Each type has multiple line underneath.
So, I need to transform this file but the problem is that for each type I have to do a different transformation for each of it's underlying lines.
Now, I can read the string line by line and determine that a new type has begun but I don't know how to set a flag (indicating the type) to use it in the underlying lines.
Here is what I have right now:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0">
<xsl:param name="testString" as="xs:string">
type1
line1
line2
type1
line1
</xsl:param>
<xsl:template match="/">
<xsl:call-template name="main">
<xsl:with-param name="testString" select="$testString"/>
</xsl:call-template>
</xsl:template>
<xsl:template name="main">
<xsl:param name="testString"/>
<xsl:variable name="iniFile" select="$testString"/>
<config>
<xsl:analyze-string select="$iniFile" regex="\n">
<xsl:non-matching-substring>
<item>
<xsl:choose>
<xsl:when test="starts-with(., 'type1')">
<!-- do a specific transformation-->
</xsl:when>
<xsl:when test="starts-with(., 'type2')">
<!-- do another transformation-->
</xsl:when>
</xsl:choose>
</item>
</xsl:non-matching-substring>
</xsl:analyze-string>
</config>
</xsl:template>
</xsl:stylesheet>
Any idea about how to solve the problem.
I think XSLT 2.1 will allow you to use its powerful stuff like for-each-group on sequences of atomic values like strings but with XSLT 2.0 you have such powerful features only for sequences of nodes so my first step when using XSLT 2.0 with plain string data I want to process/group is to create elements. So you could tokenize your data, wrap each token into some element and then use for-each-group group-starting-with to process each group starting with some pattern like '^type[0-9]+$'.
You haven't really told us what you want to with the data once you have identified a group so take the following as an example you could adapt:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs">
<xsl:output method="xml" indent="yes"/>
<xsl:param name="input" as="xs:string">type1
type1line1
type1line2
type1line3
type2
type2line1
type2line2
type3
type3line1
type3line2</xsl:param>
<xsl:template name="main">
<xsl:variable name="lines" as="element(item)*">
<xsl:for-each select="tokenize($input, '\n')">
<item><xsl:value-of select="."/></item>
</xsl:for-each>
</xsl:variable>
<xsl:for-each-group select="$lines" group-starting-with="item[matches(., '^type[0-9]+$')]">
<xsl:choose>
<xsl:when test=". = 'type1'">
<xsl:apply-templates select="current-group() except ." mode="m1"/>
</xsl:when>
<xsl:when test=". = 'type2'">
<xsl:apply-templates select="current-group() except ." mode="m2"/>
</xsl:when>
<xsl:when test=". = 'type3'">
<xsl:apply-templates select="current-group() except ." mode="m3"/>
</xsl:when>
</xsl:choose>
</xsl:for-each-group>
</xsl:template>
<xsl:template match="item" mode="m1">
<foo>
<xsl:value-of select="."/>
</foo>
</xsl:template>
<xsl:template match="item" mode="m2">
<bar>
<xsl:value-of select="."/>
</bar>
</xsl:template>
<xsl:template match="item" mode="m3">
<baz>
<xsl:value-of select="."/>
</baz>
</xsl:template>
</xsl:stylesheet>
When applied with Saxon 9 (command line options -it:main -xsl:sheet.xsl) the result is
<foo>type1line1</foo>
<foo>type1line2</foo>
<foo>type1line3</foo>
<bar>type2line1</bar>
<bar>type2line2</bar>
<baz>type3line1</baz>
<baz>type3line2</baz>

Resources