XSLT string manipulation - xslt-2.0

Could someone tell me the easiest way to fix the problem below? I currently have a file containing a variety of ways to define cross-references (basically links to other pages), and I want to convert two of them to a single format. The XML below is a simplified sample showing the source format:
<Paras>
<Para tag="CorrectTag">
<local xml:lang="en">Look at this section <XRef XRefType="(page xx)">(page 36)</XRef> for more information</local>
</Para>
<Para tag="InCorrectTag">
<local xml:lang="en">Look at some other section (page <XRef XRefType="xx">52</XRef>) for more information</local>
</Para>
</Paras>
What I want to achieve is the following:
<Paras>
<Para tag="CorrectTag">
<local xml:lang="en">Look at this section <XRef XRefType="(page xx)" XRefPage="36"/> for more information</local>
</Para>
<Para tag="InCorrectTag">
<local xml:lang="en">Look at some other section <XRef XRefType="(page xx)" XRefPage="52"/> for more information</local>
</Para>
</Paras>
I'm using the XSLT below to transform the XRef element:
<xsl:template match="XRef">
<xsl:copy>
<xsl:attribute name="XRefType">(page xx)</xsl:attribute>
<xsl:choose>
<xsl:when test="@XRefType='(page xx)'">
<xsl:attribute name="XRefPage" select="substring-before(substring-after(.,'(page '),')')"/>
</xsl:when>
<xsl:when test="@XRefType='xx'">
<xsl:attribute name="XRefPage" select="."/>
</xsl:when>
</xsl:choose>
</xsl:copy>
</xsl:template>
which already gives me this output:
<Paras>
<Para tag="CorrectTag">
<local xml:lang="en">Look at this section<XRef XRefType="(page xx)" XRefPage="36"/>for more information</local>
</Para>
<Para tag="InCorrectTag">
<local xml:lang="en">Look at some other section (page<XRef XRefType="(page xx)" XRefPage="52"/>) for more information</local>
</Para>
</Paras>
This already solves most of my problem, but I'm stuck on how to clean up the rest of the local element without removing any other content.
What I need is something like this: if the string "(page " is followed by an XRef element, remove it. If the string ")" is preceded by an XRef element, remove it. Otherwise, leave them alone.
Any advice on how to tackle this?
Thanks in advance!

You should be able to tackle that with templates, e.g.:
<xsl:template match="text()[ends-with(., '(page ')][following-sibling::node()[1][self::XRef]]">
<xsl:value-of select="replace(., '\(page $', '')"/>
</xsl:template>
<xsl:template match="text()[starts-with(., ')')][preceding-sibling::node()[1][self::XRef]]">
<xsl:value-of select="substring(., 2)"/>
</xsl:template>
Of course, you need to make sure that any templates for the parent elements of those text nodes do an apply-templates to process their child nodes.
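As a sketch of how the pieces might fit together, assuming XSLT 2.0 (e.g. Saxon) and the sample input above: an identity template keeps everything else (and keeps applying templates to children), the question's XRef template is used with the attribute tests written as @XRefType, and the two text() templates escape the "(" in the regex and write node() with parentheses. One limitation: a text node sitting between two such XRefs, e.g. ") and (page ", would match both text() templates with equal priority, which this sketch does not handle.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies every node not handled below and
       keeps applying templates to attributes and child nodes -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Normalize both XRef forms to an empty element with XRefPage -->
  <xsl:template match="XRef">
    <xsl:copy>
      <xsl:attribute name="XRefType">(page xx)</xsl:attribute>
      <xsl:choose>
        <!-- Correct form: content is "(page 36)", keep only the number -->
        <xsl:when test="@XRefType = '(page xx)'">
          <xsl:attribute name="XRefPage"
              select="substring-before(substring-after(., '(page '), ')')"/>
        </xsl:when>
        <!-- Incorrect form: content is just the number -->
        <xsl:when test="@XRefType = 'xx'">
          <xsl:attribute name="XRefPage" select="."/>
        </xsl:when>
      </xsl:choose>
    </xsl:copy>
  </xsl:template>

  <!-- Text ending in "(page " immediately before an XRef: drop that suffix -->
  <xsl:template match="text()[ends-with(., '(page ')][following-sibling::node()[1][self::XRef]]">
    <xsl:value-of select="replace(., '\(page $', '')"/>
  </xsl:template>

  <!-- Text starting with ")" immediately after an XRef: drop the first character -->
  <xsl:template match="text()[starts-with(., ')')][preceding-sibling::node()[1][self::XRef]]">
    <xsl:value-of select="substring(., 2)"/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the sample input, both Para elements should come out as in the desired output, with the surrounding text (including spaces) copied unchanged by the identity template.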

Related

Conditional Formatting XSL

I'm trying to make <sup> elements superscripted when I encounter them. I'm iterating over a large file, which I can include if required; its structure is basically <xml><article><body><p><em></em><sup></sup></p></body></article></xml>
I'm receiving:
Error reported by XML parser: The element type "fo:inline" must be terminated by
the matching end-tag "</fo:inline>"
when I try to use the following code to raise the superscripts:
<xsl:for-each select="*">
<fo:block>
<xsl:if test="name() = 'sup'">
<fo:inline vertical-align='super' baseline-shift='4pt'>
</xsl:if>
<xsl:apply-templates select="." mode="xhtml"/>
<xsl:if test="name() = 'sup'">
</fo:inline>
</xsl:if>
</fo:block>
</xsl:for-each>
How can I correct this so that vertical-align='super' is applied only to sup elements, and is there a better approach? I plan to do the same for em elements later.
The code I currently use, which outputs everything as plain text, is:
<xsl:for-each select="*">
<fo:block><xsl:apply-templates select="." mode="xhtml"/></fo:block>
</xsl:for-each>
If you want to transform <sup></sup> into <fo:inline vertical-align='super' baseline-shift='4pt'></fo:inline>, then the usual way in XSLT is to set up a template:
<xsl:template match="sup">
<fo:inline vertical-align='super' baseline-shift='4pt'>
<xsl:apply-templates/>
</fo:inline>
</xsl:template>
I am not sure whether you want to do that in general or only for a particular mode (in that case, add mode="mode-name" to the xsl:template and mode="#current" to the xsl:apply-templates).
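A sketch of the mode-based variant, assuming the question's mode name xhtml and source elements in no namespace. With these templates, the question's existing for-each that wraps each child in an fo:block and applies templates in mode xhtml can stay as it is. Rendering em as italic is an assumption about the intended style.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- sup: raise the content; mode="#current" keeps processing in mode xhtml -->
  <xsl:template match="sup" mode="xhtml">
    <fo:inline vertical-align="super" baseline-shift="4pt">
      <xsl:apply-templates mode="#current"/>
    </fo:inline>
  </xsl:template>

  <!-- em: italic is an assumed rendering choice -->
  <xsl:template match="em" mode="xhtml">
    <fo:inline font-style="italic">
      <xsl:apply-templates mode="#current"/>
    </fo:inline>
  </xsl:template>

  <!-- Elements without their own template here (p, body, ...) fall through
       to the built-in rule, which applies templates to their children in
       the same mode; text nodes are copied by the built-in text rule. -->
</xsl:stylesheet>
```

Because every start tag is produced inside one template together with its end tag, the "must be terminated by the matching end-tag" error cannot arise with this approach.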

pattern not matching though declared

I have the XML below:
<root>
<para>
<label>5.</label> In essence, the Court in <star.page>19</star.page>
</para>
<para>
<label><star.page>21</star.page> 13.</label> Frankly, I cannot see how
one can escape
</para>
</root>
and I'm using the XSLT below:
<xsl:template match="para">
<xsl:apply-templates select="./node()[1][self::star.page]|./label/node()[1][self::star.page]" mode="first"/>
</xsl:template>
<xsl:template match="star.page" mode="first">
<xsl:if test="preceding::star.page">
<xsl:processing-instruction name="pb">
<xsl:text>label='</xsl:text>
<xsl:value-of select="."/>
<xsl:text>'</xsl:text>
<xsl:text>?</xsl:text>
</xsl:processing-instruction>
<a name="{concat('pg_',.)}"/>
</xsl:if>
</xsl:template>
Here, when I try to run this code, the star.page in the first para is caught, but the second star.page, i.e. <para><label><star.page>21</star.page> 13.</label>..., is not. Please let me know where I am going wrong. I'm using [1] because I want to catch the first occurrence.
Thanks
I just tried your code on xmlplayground: both star.page elements reach the template, but the xsl:if clause prevents the first one from reaching the output.
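If the aim is simply to pick the first star.page inside each para wherever it sits (directly under para or nested in label), a descendant-based selection avoids depending on exact child positions. Note that node()[1] means the first child node of any kind, including text nodes and whitespace-only text nodes, so it only matches when star.page is literally the first child. A sketch under that assumption; the xsl:if is left out here since, as noted above, it filters out the first star.page in the document (put it back if that filtering is intended):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/root">
    <root>
      <xsl:apply-templates select="para"/>
    </root>
  </xsl:template>

  <!-- (.//star.page)[1]: the first star.page anywhere inside this para,
       in document order, whether it is a child of para or of label -->
  <xsl:template match="para">
    <xsl:apply-templates select="(.//star.page)[1]" mode="first"/>
  </xsl:template>

  <!-- Same output as the question's template, without the xsl:if filter -->
  <xsl:template match="star.page" mode="first">
    <xsl:processing-instruction name="pb">
      <xsl:text>label='</xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>'?</xsl:text>
    </xsl:processing-instruction>
    <a name="{concat('pg_', .)}"/>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input, this should emit a pb processing instruction and an a element for both 19 and 21.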

XSLT 2 - pick item from tokenize()'d list by index

My environment is Saxon (last night's build) using XSLT 2.0. My real problem is that the XML document specification is suboptimal, and in a way, my problem relates to fixing or working around that design issue.
I have a node type (<weaponmodesdata>) whose direct children are all |-separated string lists of one or more elements (every child of the same <weaponmodesdata> has the same length). I need to go over the various modes represented and "unspin" them into separate item lists (in plain text), rather than having them all smooshed together.
Unfortunately right now I'm getting a really stubborn
XPTY0020: Required item type of the context item for the child axis is node(); supplied
value has item type xs:string
error on the lines where I pass the node that needs to be split up into my little template.
Currently I have
<xsl:template match="trait" mode="attack">
<xsl:for-each select="tokenize(weaponmodesdata/mode, '\|')">
<xsl:variable name="count" select="position()"/>
<xsl:value-of select="name"/><xsl:text> - </xsl:text>
<xsl:call-template name="split_weaponmode">
<xsl:with-param name="source" select="weaponmodesdata/damage"/>
<xsl:with-param name="item" select="$count"/>
</xsl:call-template>
<xsl:text> </xsl:text>
<xsl:call-template name="split_weaponmode">
<xsl:with-param name="source" select="weaponmodesdata/damtype"/>
<xsl:with-param name="item" select="$count"/>
</xsl:call-template>
<!-- more will go here eventually -->
<xsl:text>.
</xsl:text>
</xsl:for-each>
</xsl:template>
<xsl:template name="split_weaponmode">
<xsl:param name="source"/>
<xsl:param name="item"/>
<xsl:variable name="parts" select="tokenize($source, '\|')"/>
<xsl:for-each select="$parts">
<xsl:if test="position() = $item">
<xsl:value-of select="."/>
</xsl:if>
</xsl:for-each>
</xsl:template>
An example XML subtree relating to my issue:
<character>
<trait id="1">
<name>Spear</name>
<weaponmodesdata>
<mode>1H Thrust|2H Thrust|Thrown</mode>
<damage>thr+2|thr+3|thr+3</damage>
<damtype>imp|imp|imp</damtype>
</weaponmodesdata>
</trait>
<trait id="2">
<name>Broadsword</name>
<weaponmodesdata>
<mode>1H Thrust|1H Swing</mode>
<damage>thr+1|sw+2</damage>
<damtype>imp|cut</damtype>
</weaponmodesdata>
</trait>
</character>
Example desired output:
Spear - 1H Thrust; thr+2 imp.
Spear - 2H Thrust; thr+3 imp.
Spear - Thrown; thr+3 imp.
Broadsword - 1H Thrust; thr+1 imp.
Broadsword - 1H Swing; sw+2 cut.
One issue with your code (the one causing the error message) is that your for-each operates on a sequence of string values (i.e. inside the for-each body the context item is a string), yet you have relative XPath expressions like weaponmodesdata/damage that require a context node to make sense. So you need a variable, set outside the for-each, to store your context node.
But I think you can simplify your code to:
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="trait">
<xsl:variable name="this" select="."/>
<xsl:variable name="count" select="count(tokenize(weaponmodesdata/*[1], '\|'))"/>
<xsl:for-each-group select="weaponmodesdata/*/tokenize(., '\|')" group-by="position() mod $count">
<xsl:value-of select="$this/name"/>
<xsl:text> - </xsl:text>
<xsl:value-of select="current-group()"/>
<xsl:text>.
</xsl:text>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
If you want to stick with your approach of calling templates, then make sure you store the context node of the template, e.g. with <xsl:variable name="this" select="."/>, so that you can access it inside the for-each that iterates over string items.
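For the title's question, picking an item from a tokenize()'d list by index, one possible sketch tokenizes each child once while trait is still the context node and then iterates over the integers 1 to n, indexing each sequence with a positional predicate. It also reproduces the "; " after the mode shown in the desired output. It assumes, as the question states, that every child of weaponmodesdata has the same number of items.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="trait">
    <!-- Evaluate all paths here, where trait is still the context node -->
    <xsl:variable name="name"    select="string(name)"/>
    <xsl:variable name="modes"   select="tokenize(weaponmodesdata/mode, '\|')"/>
    <xsl:variable name="damages" select="tokenize(weaponmodesdata/damage, '\|')"/>
    <xsl:variable name="types"   select="tokenize(weaponmodesdata/damtype, '\|')"/>

    <xsl:for-each select="1 to count($modes)">
      <!-- The context item is now an integer, so only variables are used -->
      <xsl:variable name="i" select="."/>
      <xsl:value-of select="concat($name, ' - ', $modes[$i], '; ',
                                   $damages[$i], ' ', $types[$i], '.&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With the sample character element, this should print the five lines of the desired output, e.g. "Spear - 1H Thrust; thr+2 imp.".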

Use a dynamic match in XSLT

I have an external document with a list of XPath expressions like this:
<EncrypRqField>
<EncrypFieldRqXPath01>xpath1</EncrypFieldRqXPath01>
<EncrypFieldRqXPath02>xpath2</EncrypFieldRqXPath02>
</EncrypRqField>
I use this document to obtain the XPath expressions of the nodes I want to modify.
The input XML is:
<Employees>
<Employee>
<id>1</id>
<firstname>xyz</firstname>
<lastname>abc</lastname>
<age>32</age>
<department>xyz</department>
</Employee>
</Employees>
I want to obtain something like this:
<Employees>
<Employee>
<id>XXX</id>
<firstname>xyz</firstname>
<lastname>abc</lastname>
<age>XXX</age>
<department>xyz</department>
</Employee>
</Employees>
The XXX values are the result of data encryption. I want to dynamically obtain the XPath expressions from the document and change the values of the nodes they select.
Thanks.
I'm not sure whether something like this is possible in XSLT 2.0. XSLT 3.0 adds an xsl:evaluate instruction for evaluating XPath expressions dynamically, but I don't know the details.
But I tried a workaround and it seems to work. Of course, it is not perfect and has many limitations in this form (e.g. you need to specify an absolute path, and you cannot use more complex XPath such as //, predicates [], etc.), so consider it just an idea. It could still work in simpler cases.
It is based on comparing two strings instead of evaluating a string as XPath.
Simplified XML with the XPaths to encrypt (I omit the numbers in the element names for simplicity):
<?xml version="1.0" encoding="UTF-8"?>
<EncrypRqField>
<EncrypFieldRqXPath>/Employees/Employee/id</EncrypFieldRqXPath>
<EncrypFieldRqXPath>/Employees/Employee/age</EncrypFieldRqXPath>
</EncrypRqField>
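And my transformation:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- The list of XPaths to encrypt; "xpaths.xml" is a placeholder file name -->
<xsl:variable name="xpaths" select="document('xpaths.xml')/EncrypRqField"/>
<xsl:template match="element()">
<xsl:variable name="pathToElement">
<xsl:call-template name="getPath">
<xsl:with-param name="element" select="." />
</xsl:call-template>
</xsl:variable>
<xsl:choose>
<xsl:when test="$xpaths/EncrypFieldRqXPath[text() = $pathToElement]">
<!-- If an element with exactly the same value as the constructed "XPath" exists, then "encrypt" the content of the element -->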
<xsl:copy>
<xsl:text>XXX</xsl:text>
</xsl:copy>
</xsl:when>
<xsl:otherwise>
<xsl:copy>
<xsl:apply-templates />
</xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<!-- This template will "construct" the XPath for element under investigation. -->
<!-- There might be an easier way (e.g. some built-in function), but that is beyond my skill. -->
<xsl:template name="getPath">
<xsl:param name="element" />
<xsl:choose>
<xsl:when test="$element/parent::node()">
<xsl:call-template name="getPath">
<xsl:with-param name="element" select="$element/parent::node()" />
</xsl:call-template>
<xsl:text>/</xsl:text>
<xsl:value-of select="$element/name()" />
</xsl:when>
<xsl:otherwise />
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
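The recursive getPath template can also be replaced by a single XPath 2.0 expression: ancestor-or-self::* yields the element and its ancestors in document order, so string-join over their names builds the same simple absolute path. A minimal sketch, assuming the XPath list lives in a file named xpaths.xml (a placeholder name, passed as a hypothetical parameter xpath-file) and that the input uses no namespace prefixes, since name() includes any prefix. The same limitations as above apply: only plain absolute paths are supported.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: location of the XPath list -->
  <xsl:param name="xpath-file" select="'xpaths.xml'"/>
  <xsl:variable name="xpaths"
      select="document($xpath-file)/EncrypRqField/EncrypFieldRqXPath"/>

  <!-- Identity template: copy everything else unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Elements whose path (e.g. /Employees/Employee/id) is listed
       get their content replaced; attributes are kept -->
  <xsl:template match="*[concat('/', string-join(ancestor-or-self::*/name(), '/')) = $xpaths]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:text>XXX</xsl:text>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The general comparison with $xpaths is true if the constructed path equals any of the listed strings, so no explicit loop over the list is needed.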

Eliminate line-breaks with XSLT 2.0 analyze-string

I use the XSLT 2.0 instruction xsl:analyze-string in a stylesheet that transforms XML to HTML; specifically, I use it to convert the string encoding of subscripts in chemical formulae to HTML subscripts. The result is therefore a string, to go in a p or td element, with embedded markup.
The transformation is supposed to produce output like H2O, but in fact it inserts a line break in the HTML:
H
<sub>2</sub>O
and this break is (correctly) interpreted by the browser as a space:
H
2O
which is ugly.
Is there a way to remove the line break? I've tried putting the whole analyze-string element on one line, and that doesn't work.
The input would be something like
<OrdinaryStructuralFormula>H$_2$O</OrdinaryStructuralFormula>
for a simple case and
<OrdinaryStructuralFormula>C$_2$OH$_5$$^-</OrdinaryStructuralFormula>
for a more complicated one. Note that the subscript pattern can match multiple times in the general case and can occur either in the middle or at the end of the string. The pattern also has to match and eliminate any charge notation: the $^- at the end of the second example.
The XSLT processor is Saxon 9.4 and the XSLT template follows.
<xsl:template name="formula">
<xsl:param name="formula"/>
<xsl:if test="$formula">
<xsl:variable name="f" select="translate($formula, '$', '')"/>
<xsl:analyze-string select="$f" regex="(_)(\d+)|(\^)\d*\+|(\^)\d*\-">
<xsl:matching-substring>
<xsl:if test="regex-group(1)='_'">
<sub><xsl:value-of select="regex-group(2)"/></sub>
</xsl:if>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:if>
</xsl:template>
I cannot reproduce the reported result.
This transformation (which is what you should have given us, but you only provided a template):
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:call-template name="formula">
<xsl:with-param name="formula" select="/*"/>
</xsl:call-template>
</xsl:template>
<xsl:template name="formula">
<xsl:param name="formula"/>
<xsl:if test="$formula">
<xsl:variable name="f" select="translate($formula, '$', '')"/>
<xsl:analyze-string select="$f" regex="(_)(\d+)|(\^)\d*\+|(\^)\d*\-">
<xsl:matching-substring>
<xsl:if test="regex-group(1)='_'">
<sub><xsl:value-of select="regex-group(2)"/></sub>
</xsl:if>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
when applied on the following XML document with Saxon 9.1.05:
<formula>H$_2$O</formula>
produces the wanted, correct result:
H<sub>2</sub>O
When the same transformation is applied on the second XML document:
<OrdinaryStructuralFormula>C$_2$OH$_5$$^-</OrdinaryStructuralFormula>
Again, the wanted, correct result is produced:
C<sub>2</sub>OH<sub>5</sub>
Do note: I ran the same transformations with two other XSLT 2.0 processors, XQSharp (XMLPrime) and AltovaXML (XML-SPY), and got exactly the same correct results.
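One cause that can be ruled out cheaply is serializer indentation. The sketch below declares method="html" with indent="no", so the serializer adds no whitespace of its own, and places the result in a p element as the question describes. The regex is a condensed form of the question's with the same matches (an underscore followed by digits, or a caret followed by optional digits and + or -); the element name OrdinaryStructuralFormula is taken from the question.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- indent="no": the serializer does not insert whitespace of its own -->
  <xsl:output method="html" indent="no"/>

  <xsl:template match="OrdinaryStructuralFormula">
    <p>
      <xsl:call-template name="formula">
        <xsl:with-param name="formula" select="string(.)"/>
      </xsl:call-template>
    </p>
  </xsl:template>

  <xsl:template name="formula">
    <xsl:param name="formula"/>
    <!-- Drop the $ delimiters before matching -->
    <xsl:analyze-string select="translate($formula, '$', '')"
                        regex="_(\d+)|\^\d*[+\-]">
      <xsl:matching-substring>
        <!-- Group 1 is non-empty only for a subscript; charges produce nothing -->
        <xsl:if test="regex-group(1)">
          <sub><xsl:value-of select="regex-group(1)"/></sub>
        </xsl:if>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

For the second sample input, this should produce <p>C<sub>2</sub>OH<sub>5</sub></p>. If a line break still appears, it is likely being introduced after the transformation, for example by a later processing or pretty-printing step.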
