Replace nbsp with another tag - xslt-2.0

In my XML file I have a p tag that contains &nbsp; in its text.
Now I want to replace this &nbsp; with another tag (for example, inside the p tag I want to insert an element called s, <s/>).
Is this possible? Please help.

First note that the tree on which XSLT operates never contains a character or entity reference; the XML parser resolves those, so the tree simply contains a Unicode character (here U+00A0, the non-breaking space). To match a Unicode character and replace it with an element you can use xsl:analyze-string:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* , node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p//text()">
<xsl:analyze-string select="." regex="&#160;">
<xsl:matching-substring>
<s/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
</xsl:stylesheet>
That way an input document like
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc [
<!ENTITY nbsp "&#160;">
]>
<doc>
<p>This is a paragraph with a non-breaking space &nbsp; before some sub text.</p>
</doc>
is transformed into the result document
<?xml version="1.0" encoding="UTF-8"?><doc>
<p>This is a paragraph with a non-breaking space <s/> before some sub text.</p>
</doc>

Related

How to filter nodes based on certain condition of the child node text

I have an XML file as shown below.
<COLLECTION>
<ChangedParts>
<Part>
<number>123456</number>
<DefaultUnit>each</DefaultUnit>
<FgOrComponent>FG</FgOrComponent>
<MasterPackUom/>
<CartonUom/>
</Part>
<Part>
<number>456789</number>
<DefaultUnit>each</DefaultUnit>
<FgOrComponent>COMPONENT</FgOrComponent>
<MasterPackUom/>
<CartonUom/>
</Part>
</ChangedParts>
</COLLECTION>
I am trying to use XSLT to transform the file. The file contains Part elements with FgOrComponent and some other elements as child nodes. FgOrComponent has either FG or COMPONENT as its value. I need to select only the Part elements whose FgOrComponent value is FG and modify some of their other child elements. The expected output is shown below.
<COLLECTION>
<ChangedParts>
<Part>
<name>123456</name>
<DefaultUnit>ea</DefaultUnit>
<FgOrComponent>FG</FgOrComponent>
<MasterPackUom>mp</MasterPackUom>
<CartonUom>ca</CartonUom>
</Part>
<Part>
<number>456789</number>
<DefaultUnit>each</DefaultUnit>
<FgOrComponent>COMPONENT</FgOrComponent>
<MasterPackUom/>
<CartonUom/>
</Part>
</ChangedParts>
</COLLECTION>
I am using the following XSLT file to do the transformation without any success. Any help would be appreciated.
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/*/*/Part[(FgOrComponent = 'FG')]/*">
<xsl:choose>
<xsl:when test="MasterPackUom/text() = ''">
<MasterPackUom>mp</MasterPackUom>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="."/>
</xsl:otherwise>
</xsl:choose>
<xsl:apply-templates/>
</xsl:template>
</xsl:stylesheet>
The test clause "MasterPackUom/text() = ''" is never reached.
If the element is empty then it doesn't have any text() node children, so MasterPackUom/text() selects nothing and the comparison with '' is false. Just check MasterPackUom = '' instead.
But as you have the identity transformation set up as a base transformation, simply write templates for the relevant changes, e.g.
<xsl:template match="Part[FgOrComponent = 'FG']/MasterPackUom[. = '']">
<xsl:copy>mp</xsl:copy>
</xsl:template>
instead of doing that odd xsl:choose.
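The same pattern extends to the other elements of an FG part. The following is only a sketch: the replacement values (name, ea, mp, ca) are read off the expected output in the question, and the conditions (DefaultUnit equal to each, empty MasterPackUom and CartonUom) are assumptions about when each change should apply.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <!-- identity transformation: copies everything not matched below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- rename number to name, for FG parts only -->
  <xsl:template match="Part[FgOrComponent = 'FG']/number">
    <name>
      <xsl:apply-templates select="@* | node()"/>
    </name>
  </xsl:template>

  <!-- assumed mapping: each becomes ea -->
  <xsl:template match="Part[FgOrComponent = 'FG']/DefaultUnit[. = 'each']">
    <xsl:copy>ea</xsl:copy>
  </xsl:template>

  <!-- fill the empty unit elements with the assumed defaults -->
  <xsl:template match="Part[FgOrComponent = 'FG']/MasterPackUom[. = '']">
    <xsl:copy>mp</xsl:copy>
  </xsl:template>

  <xsl:template match="Part[FgOrComponent = 'FG']/CartonUom[. = '']">
    <xsl:copy>ca</xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Parts whose FgOrComponent is COMPONENT match none of the specific templates, so the identity template copies them unchanged.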

XSLT merging two files with different namespaces

This is my master HTML file with predefined namespace:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>some title</title>
</head>
<body>
<p>some text</p>
</body>
</html>
And I have an additional XML file defined like this:
<?xml version="1.0" encoding="UTF-8"?>
<article dtd-version="1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
<front>
<element>front text</element>
</front>
<back>
<extra-list>
<element>element text</element>
</extra-list>
</back>
</article>
This is the wanted final output (head from the HTML file, extra-list from the XML file):
<?xml version="1.0" encoding="UTF-8"?>
<xml>
<head>
<title>some title</title>
</head>
<back>
<extra-list>
<element>element text</element>
</extra-list>
</back>
</xml>
I am trying to join these two files with this XSLT below:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xlink="http://www.w3.org/1999/xlink"
xpath-default-namespace="http://www.w3.org/1999/xhtml"
version="2.0">
<xsl:output method="xml" version="1.0" indent="yes"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="html">
<xml>
<xsl:apply-templates/>
</xml>
</xsl:template>
<xsl:template match="head">
<head>
<xsl:apply-templates/>
</head>
</xsl:template>
<xsl:template match="body">
<back>
<xsl:copy-of select="document('doc.xml')"/>
</back>
</xsl:template>
</xsl:transform>
I use xpath-default-namespace in XSLT so I don't have to address HTML's namespace all the time (the original master HTML is huge) and I would like to stay with this parameter if possible. Here I am having two issues:
1.) How is it possible to get rid of all xmlns declarations on output?
2.) It is only possible to copy the whole xml file with this command <xsl:copy-of select="document('doc.xml')"/>. If I try to copy only subelement <xsl:copy-of select="document('doc.xml')/article/back"/>, then I get no output, because the content is not in the same namespace. How would I be able to solve this?
UPDATE (COMPLETE XSLT SOLUTION):
Based on Martin's answer below, this is the fully working solution.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xpath-default-namespace="http://www.w3.org/1999/xhtml"
version="2.0">
<xsl:output method="xml" version="1.0" indent="yes"/>
<!-- copy all elements and ignore namespace -->
<xsl:template match="*">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@* | node()"/>
</xsl:element>
</xsl:template>
<!-- copy all attributes and ignore namespace -->
<xsl:template match="@*">
<xsl:attribute name="{local-name()}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>
<!-- copy all remaining nodes and ignore namespace -->
<xsl:template match="comment() | text() | processing-instruction()">
<xsl:copy/>
</xsl:template>
<xsl:template match="html">
<xml>
<xsl:apply-templates/>
</xml>
</xsl:template>
<xsl:template match="head">
<head>
<xsl:apply-templates/>
</head>
</xsl:template>
<xsl:template match="body">
<xsl:copy-of xpath-default-namespace="" copy-namespaces="no" select="document('doc.xml')/article/back"/>
</xsl:template>
</xsl:transform>
I also added two extra templates to copy attributes and some other nodes.
You can override xpath-default-namespace where needed, e.g. <xsl:copy-of xpath-default-namespace="" select="document('doc.xml')/article/back"/>.
As for namespaces, there are several issues. You run part of the input in the XHTML namespace through an identity transformation, which always preserves the namespace of the elements copied. You will need to change from the identity transformation to a transformation that strips the namespace from elements:
<xsl:template match="*">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@* | node()"/>
</xsl:element>
</xsl:template>
The literal result elements you create in the XSLT have the XLink namespace in scope because you declare it in the XSLT code but do not use it. Either remove the declaration or use exclude-result-prefixes="xlink" on the xsl:stylesheet or xsl:transform element.
The other input you access with document('doc.xml') also declares unused namespaces. The default copying preserves them, but as they are only in scope and not used, you can get rid of them with copy-namespaces="no": <xsl:copy-of xpath-default-namespace="" select="document('doc.xml')/article/back" copy-namespaces="no"/>. Alternatively, you would need to push those elements through the template that strips the namespace with xsl:element name="{local-name()}" as well.

Identify values that dont match in all Nodes and Attributes: XSLT2.0

I need to go over all the XML attributes and text nodes, check each one for the characters in a list, and output the characters that did not match.
I am able to check the text() nodes, but I am not able to perform the check on attributes.
<xsl:template match="@*|node()">
<xsl:variable name="getDelimitersToUseNodes" select="('$' ,'#' ,'*' ,'~')[not(contains(current(),.))]"/>
<xsl:variable name="getDelimitersToUseAttr" select="string-join(('$','#','*','~')[not(contains(@*/,.))],',')"/>
<xsl:variable name="getDelimitersToUse" select="concat(string-join($getDelimitersToUseNodes,','),',',string-join($getDelimitersToUseAttr,','))"/>
<!--xsl:variable name="delim" select="distinct-values($getDelimitersToUse,',')"/-->
<xsl:value-of select="$getDelimitersToUse"/>
</xsl:template>
My mocked up sample file is below
<?xml version="1.0"?>
<sample>
<test1 name="#theGoofy">My$#test</test1>
<test2 value="$##">description test2*</test2>
</sample>
You could process all those text and attribute nodes and make the same check as before. You haven't really said which output format you want; assuming text, you could use
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:param name="characters" as="xs:string*" select="'$' ,'#' ,'*' ,'~'"/>
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:apply-templates select="//text() | //@*"/>
</xsl:template>
<xsl:template match="text() | @*">
<xsl:value-of select="'Text', ., 'does not contain', $characters[not(contains(current(), .))], '&#10;'"/>
</xsl:template>
</xsl:stylesheet>
to get a result like
Text #theGoofy does not contain $ * ~
Text My$#test does not contain * ~
Text $## does not contain * ~
Text description test2* does not contain $ # ~
If you simply want to find the characters that are not contained in any text node or attribute node, then an approach like
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:param name="characters" as="xs:string*" select="'$' ,'#' ,'*' ,'~'"/>
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="nodes-to-inspect" as="node()*" select="//text() | //@*"/>
<xsl:template match="/">
<xsl:value-of select="for $c in $characters return $c[not($nodes-to-inspect[contains(., $c)])]"/>
</xsl:template>
</xsl:stylesheet>
should do.

XML and XSL with unwanted namespace when using saxon

I have used exclude-result-prefixes="ae" in the XSL stylesheet, but the namespace is still present in the converted XML file. I'm using the Saxon processor. Please find my MWE below:
My XML file is :
<?xml version="1.0" encoding="UTF-8"?>
<ArticleInfo Language="En" ContainsESM="No" OutputMedium="All">
<ArticleID>034</ArticleID>
<ArticleJID>BMCL</ArticleJID>
<ArticleDOI>10.1000/j.asdf.2015.02.034</ArticleDOI>
<ArticleTitle>Sample Article Title with &#x2015; unicode value</ArticleTitle>
<Para>Sample Paragraph text here</Para>
</ArticleInfo>
and My XSL file is :
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:ae="www.ams.org" exclude-result-prefixes="ae" version="3.0">
<xsl:output omit-xml-declaration="no" indent="yes" method="xml"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE article PUBLIC "-//AMS//DTD journal article//EN//XML" "art.dtd"&gt;
</xsl:text>
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:variable name="ElsDoi" select="/ArticleInfo/ArticleDOI"/>
<xsl:template match="ArticleInfo">
<ae:doi><xsl:value-of select="$ElsDoi"/></ae:doi>
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="Para">
<xsl:element name="ae:para">
<xsl:apply-templates select="@* | node()"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
The output XML file I am getting is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//AMS//DTD journal article//EN//XML" "art.dtd">
<ae:doi xmlns:ae="www.ams.org">10.1000/j.asdf.2015.02.034</ae:doi>
<ArticleID>034</ArticleID>
<ArticleJID>BMCL</ArticleJID>
<ArticleDOI>10.1000/j.asdf.2015.02.034</ArticleDOI>
<ArticleTitle>Sample Article Title with ― unicode value</ArticleTitle>
<ae:para xmlns:ae="www.ams.org">Sample Paragraph text here</ae:para>
The expected output XML file is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//AMS//DTD journal article//EN//XML" "art.dtd">
<ae:doi>10.1000/j.asdf.2015.02.034</ae:doi>
<ArticleID>034</ArticleID>
<ArticleJID>BMCL</ArticleJID>
<ArticleDOI>10.1000/j.asdf.2015.02.034</ArticleDOI>
<ArticleTitle>Sample Article Title with &#x2015; unicode value</ArticleTitle>
<ae:para>Sample Paragraph text here</ae:para>
Please note that the unwanted xmlns:ae="www.ams.org" is present in the output XML file, and that &#x2015; in the title is converted to the Unicode symbol. How do I avoid this?
With <xsl:element name="ae:para"> you are explicitly creating an element in the namespace bound to the prefix ae, so don't expect exclude-result-prefixes to exclude that namespace; it is only useful to avoid namespace declarations of unused namespaces. A namespace used in a node name can't be excluded with exclude-result-prefixes, as otherwise the result would not be namespace well-formed XML.
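The declaration cannot be removed, but it can be made to appear only once. The following is a sketch under one assumption not stated in the question: that the result may be wrapped in a single root element, here called article after the DOCTYPE. The xsl:namespace instruction puts the ae namespace on that root, so the elements below it should inherit the declaration instead of repeating it.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ae="www.ams.org" exclude-result-prefixes="ae" version="2.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity transformation for everything not matched below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- hypothetical wrapper element; it carries the ae declaration once -->
  <xsl:template match="ArticleInfo">
    <article>
      <xsl:namespace name="ae" select="'www.ams.org'"/>
      <ae:doi><xsl:value-of select="ArticleDOI"/></ae:doi>
      <xsl:apply-templates/>
    </article>
  </xsl:template>

  <!-- the element name uses the ae namespace, so it must stay declared -->
  <xsl:template match="Para">
    <ae:para>
      <xsl:apply-templates select="@* | node()"/>
    </ae:para>
  </xsl:template>
</xsl:stylesheet>
```

With this arrangement xmlns:ae="www.ams.org" should be serialized on the article element only, which also gives the output a single root element.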

XSLT conditionally write to two different files

I need to extract log messages from an XML file and write them out to plain text files. The log messages come in two flavors, and I want to write them to separate files.
I have written a style sheet that does exactly what I need, except that it sometimes creates empty files because the XML file may not contain messages of one type or the other.
I am wondering 1) if what I am doing is the best method to do this, and 2) if there is a way to suppress empty files.
My sample may contain errors because it has been retyped. (the original is on a closed network)
Note: I am using XSLT 2.0 features.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="text" encoding="iso-8859-1" />
<xsl:param name="break" select="string('&#10;')" />
<xsl:template match="/">
<xsl:result-document method="text" href="foo.txt">
<xsl:apply-templates select="Root/a/b/c[contains(., 'foo')]" />
</xsl:result-document>
<xsl:result-document method="text" href="bar.txt">
<xsl:apply-templates select="Root/a/b/c[not(contains(., 'foo'))]" />
</xsl:result-document>
</xsl:template>
<xsl:template match="*">
<xsl:value-of select="concat(normalize-space(.), $break)" />
</xsl:template>
</xsl:stylesheet>
You could use some XSLT 2.0 stylesheet like:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="break" select="string('&#10;')" />
<xsl:template match="/">
<xsl:apply-templates select="Root/a/b/c"/>
</xsl:template>
<xsl:template match="/Root/a/b/c[contains(., 'foo')]">
<xsl:result-document method="text" href="foo.txt">
<xsl:next-match/>
</xsl:result-document>
</xsl:template>
<xsl:template match="/Root/a/b/c[not(contains(., 'foo'))]">
<xsl:result-document method="text" href="bar.txt">
<xsl:next-match/>
</xsl:result-document>
</xsl:template>
<xsl:template match="*">
<xsl:value-of select="concat(normalize-space(.), $break)" />
</xsl:template>
</xsl:stylesheet>
Note: Pattern matching and xsl:next-match.
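An alternative way to suppress the empty files is to test for matching messages before opening each result document. This is a sketch rather than the answer's method; the variable names foo and other are made up for illustration. It opens each file at most once, which matters because XSLT 2.0 treats writing more than one result document to the same URI as an error, something per-element xsl:result-document calls may run into when several c elements match.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:param name="break" select="'&#10;'"/>

  <xsl:template match="/">
    <!-- select each flavor once so the test and the output use the same nodes -->
    <xsl:variable name="foo" select="Root/a/b/c[contains(., 'foo')]"/>
    <xsl:variable name="other" select="Root/a/b/c[not(contains(., 'foo'))]"/>

    <!-- create foo.txt only when at least one matching message exists -->
    <xsl:if test="exists($foo)">
      <xsl:result-document method="text" href="foo.txt">
        <xsl:apply-templates select="$foo"/>
      </xsl:result-document>
    </xsl:if>

    <!-- same check for the remaining messages -->
    <xsl:if test="exists($other)">
      <xsl:result-document method="text" href="bar.txt">
        <xsl:apply-templates select="$other"/>
      </xsl:result-document>
    </xsl:if>
  </xsl:template>

  <!-- one normalized message per line -->
  <xsl:template match="c">
    <xsl:value-of select="concat(normalize-space(.), $break)"/>
  </xsl:template>
</xsl:stylesheet>
```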

Resources