XSLT merging two files with different namespaces - xslt-2.0

This is my master HTML file with predefined namespace:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>some title</title>
</head>
<body>
<p>some text</p>
</body>
</html>
And I have an additional XML file defined like this:
<?xml version="1.0" encoding="UTF-8"?>
<article dtd-version="1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
<front>
<element>front text</element>
</front>
<back>
<extra-list>
<element>element text</element>
</extra-list>
</back>
</article>
This is wanted final output (head from html file, extra-list from xml file):
<?xml version="1.0" encoding="UTF-8"?>
<xml>
<head>
<title>some title</title>
</head>
<back>
<extra-list>
<element>element text</element>
</extra-list>
</back>
</xml>
I am trying to join these two files with this XSLT below:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xlink="http://www.w3.org/1999/xlink"
xpath-default-namespace="http://www.w3.org/1999/xhtml"
version="2.0">
<xsl:output method="xml" version="1.0" indent="yes"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="html">
<xml>
<xsl:apply-templates/>
</xml>
</xsl:template>
<xsl:template match="head">
<head>
<xsl:apply-templates/>
</head>
</xsl:template>
<xsl:template match="body">
<back>
<xsl:copy-of select="document('doc.xml')"/>
</back>
</xsl:template>
</xsl:transform>
I use xpath-default-namespace in the XSLT so I don't have to address the HTML namespace all the time (the original master HTML is huge), and I would like to keep this attribute if possible. I have two issues:
1.) How can I get rid of all xmlns declarations in the output?
2.) I can only copy the whole XML file with <xsl:copy-of select="document('doc.xml')"/>. If I try to copy only a subelement with <xsl:copy-of select="document('doc.xml')/article/back"/>, I get no output, because the content is not in the same namespace. How can I solve this?
UPDATE (COMPLETE XSLT SOLUTION):
Based on Martin's answer below, this is the fully working solution.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xpath-default-namespace="http://www.w3.org/1999/xhtml"
version="2.0">
<xsl:output method="xml" version="1.0" indent="yes"/>
<!-- copy all elements and ignore namespace -->
<xsl:template match="*">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@* | node()"/>
</xsl:element>
</xsl:template>
<!-- copy all attributes and ignore namespace -->
<xsl:template match="@*">
<xsl:attribute name="{local-name()}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>
<!-- copy all remaining nodes and ignore namespace -->
<xsl:template match="comment() | text() | processing-instruction()">
<xsl:copy/>
</xsl:template>
<xsl:template match="html">
<xml>
<xsl:apply-templates/>
</xml>
</xsl:template>
<xsl:template match="head">
<head>
<xsl:apply-templates/>
</head>
</xsl:template>
<xsl:template match="body">
<xsl:copy-of xpath-default-namespace="" copy-namespaces="no" select="document('doc.xml')/article/back"/>
</xsl:template>
</xsl:transform>
I also added two extra templates to copy attributes and some other nodes.

You can override xpath-default-namespace where needed, e.g. <xsl:copy-of xpath-default-namespace="" select="document('doc.xml')/article/back"/>.
As for the namespaces, there are several issues. You run part of the input, which is in the XHTML namespace, through an identity transformation, and that always preserves the namespace of the elements copied. You need to change from the identity transformation to a transformation that strips the namespace from elements:
<xsl:template match="*">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@* | node()"/>
</xsl:element>
</xsl:template>
The literal result elements you create in the XSLT have the XLink namespace in scope, because you declare it in the XSLT code but do not use it. Either remove the declaration or use exclude-result-prefixes="xlink" on the xsl:stylesheet or xsl:transform element.
The other input, which you access with document('doc.xml'), also declares unused namespaces. The default copying preserves them, but as they are only in scope and not used, you can get rid of them with copy-namespaces="no": <xsl:copy-of xpath-default-namespace="" select="document('doc.xml')/article/back" copy-namespaces="no"/>. Alternatively, push those elements through the namespace-stripping template that uses xsl:element name="{local-name()}".
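The second alternative (pushing the elements of doc.xml through the stripping template instead of copying them) can be sketched as follows. This is a minimal sketch, assuming the two input files from the question, with the XHTML master as the main input and doc.xml resolvable relative to the stylesheet; the arrangement of templates is only one possibility.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/1999/xhtml"
    version="2.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- Rebuild every element from its local name only, so the result
       element is in no namespace and carries no copied declarations. -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@* | node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Rebuild attributes the same way. -->
  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}" select="."/>
  </xsl:template>

  <!-- Root of the XHTML master becomes the wrapper element. -->
  <xsl:template match="html">
    <xml>
      <xsl:apply-templates/>
    </xml>
  </xsl:template>

  <!-- Replace body with the back element of doc.xml. The select is
       evaluated with no default namespace, and the selected elements
       are processed by the match="*" template above, not copied. -->
  <xsl:template match="body">
    <xsl:apply-templates xpath-default-namespace=""
        select="document('doc.xml')/article/back"/>
  </xsl:template>

</xsl:transform>
```

Because the elements are rebuilt with xsl:element rather than copied, copy-namespaces="no" is not needed in this variant.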

Related

Copy xml elements without namespace

I have an XML file that contains HTML elements. I want to copy them without the namespaces being copied.
<clonkDoc xmlns="https://clonkspot.org"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://clonkspot.org clonk.xsd" xml:lang="de">
<doc>
foo <br/> bar
</doc>
</clonkDoc>
and this XSL (truncated):
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0" xpath-default-namespace="https://clonkspot.org" exclude-result-prefixes="xs">
<xsl:output method="html" encoding="ISO-8859-1" doctype-public="-//W3C//DTD HTML 4.01//EN" doctype-system="http://www.w3.org/TR/html4/strict.dtd"/>
<xsl:template match="img|a|em|strong|br|code/i|code/b">
<xsl:copy copy-namespaces="no">
<!-- including every attribute -->
<xsl:for-each select="@*|node()">
<xsl:copy copy-namespaces="no"/>
</xsl:for-each>
</xsl:copy>
</xsl:template>
...
I get something like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head></head>
<body>
foo <br xmlns="https://clonkspot.org"></br> bar
</body>
</html>
I already set copy-namespaces="no" (see "XSLT to copy element without namespace"). I think the XSLT processor should see that the element I want to copy into an HTML4 file is an HTML element. What am I doing wrong?
Thanks!
In all versions of XSLT, xsl:copy makes a shallow copy of the context node. For an element node (or another node with a qualified name, such as an attribute node), that means a copy with the same name and namespace. The copy-namespaces="no" attribute introduced in XSLT 2 only avoids copying in-scope namespace declarations that exist but are not used by the element itself.
So in your case, as you want to strip the existing namespace from the elements, you need to transform them with a template that does that, e.g.
<xsl:template match="img|a|em|strong|br|code/i|code/b">
<xsl:element name="{local-name()}">...</xsl:element>
</xsl:template>
Thank you. Based on your solution, I solved it with:
<!-- copy img, a, em and br literally -->
<xsl:template match="img|a|em|strong|br|code/i|code/b">
<xsl:element name="{local-name()}">
<!-- including every attribute -->
<xsl:for-each select="@*">
<xsl:attribute name="{local-name()}"><xsl:value-of select="."/></xsl:attribute>
</xsl:for-each>
<xsl:for-each select="node()">
<xsl:apply-templates select="."/>
</xsl:for-each>
</xsl:element>
</xsl:template>
That should work recursively and carry the attributes along.

Replace nbsp with another tag

In my XML file I have a tag like this (within the p tags I have &nbsp;).
Now I want to replace this &nbsp; with another tag (as an example, within the p tag I want to insert another tag called s: <s/>).
Is this possible? Please help.
First note that the tree on which XSLT operates never contains a character or entity reference; it simply contains a Unicode character. To match a Unicode character and replace it with an element, you can use analyze-string:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* , node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p//text()">
<xsl:analyze-string select="." regex="&#160;">
<xsl:matching-substring>
<s/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
</xsl:stylesheet>
That way an input document like
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc [
<!ENTITY nbsp "&#160;">
]>
<doc>
<p>This is a paragraph with a non-breaking space &nbsp; before some sub text.</p>
</doc>
is transformed into the result document
<?xml version="1.0" encoding="UTF-8"?><doc>
<p>This is a paragraph with a non-breaking space <s/> before some sub text.</p>
</doc>

XML and XSL with unwanted namespace when using saxon

I have used exclude-result-prefixes="ae" in the XSL stylesheet, but the namespace is still present in the converted XML file. I'm using Saxon. Please find my MWE below:
My XML file is:
<?xml version="1.0" encoding="UTF-8"?>
<ArticleInfo Language="En" ContainsESM="No" OutputMedium="All">
<ArticleID>034</ArticleID>
<ArticleJID>BMCL</ArticleJID>
<ArticleDOI>10.1000/j.asdf.2015.02.034</ArticleDOI>
<ArticleTitle>Sample Article Title with &#x2015; unicode value</ArticleTitle>
<Para>Sample Paragraph text here</Para>
</ArticleInfo>
and my XSL file is:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:ae="www.ams.org" exclude-result-prefixes="ae" version="3.0">
<xsl:output omit-xml-declaration="no" indent="yes" method="xml"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE article PUBLIC "-//AMS//DTD journal article//EN//XML" "art.dtd"&gt;
</xsl:text>
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:variable name="ElsDoi" select="/ArticleInfo/ArticleDOI"/>
<xsl:template match="ArticleInfo">
<ae:doi><xsl:value-of select="$ElsDoi"/></ae:doi>
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="Para">
<xsl:element name="ae:para">
<xsl:apply-templates select="@* | node()"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
The output XML file I get is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//AMS//DTD journal article//EN//XML" "art.dtd">
<ae:doi xmlns:ae="www.ams.org">10.1000/j.asdf.2015.02.034</ae:doi>
<ArticleID>034</ArticleID>
<ArticleJID>BMCL</ArticleJID>
<ArticleDOI>10.1000/j.asdf.2015.02.034</ArticleDOI>
<ArticleTitle>Sample Article Title with ― unicode value</ArticleTitle>
<ae:para xmlns:ae="www.ams.org">Sample Paragraph text here</ae:para>
The output XML file I expect is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//AMS//DTD journal article//EN//XML" "art.dtd">
<ae:doi>10.1000/j.asdf.2015.02.034</ae:doi>
<ArticleID>034</ArticleID>
<ArticleJID>BMCL</ArticleJID>
<ArticleDOI>10.1000/j.asdf.2015.02.034</ArticleDOI>
<ArticleTitle>Sample Article Title with &#x2015; unicode value</ArticleTitle>
<ae:para>Sample Paragraph text here</ae:para>
Please note that the unwanted xmlns:ae="www.ams.org" is present in the output XML file, and also that &#x2015; in the title is converted to the Unicode character. How do I avoid this?
With <xsl:element name="ae:para"> you are explicitly creating an element in the namespace bound to the prefix ae, so don't expect exclude-result-prefixes to exclude that namespace; it is only useful for avoiding namespace declarations of unused namespaces. A namespace used in a node name can't be excluded with exclude-result-prefixes, as otherwise the result would not be namespace well-formed XML.
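The distinction between a used and an unused namespace can be illustrated with a small sketch. It assumes the ArticleInfo input from the question; the unused prefix and the ae:article wrapper element are hypothetical additions for illustration only, not part of the original stylesheet. The unused declaration is dropped by exclude-result-prefixes, while xmlns:ae has to be serialized because ae is used in element names. With a single wrapper element, the serializer should only need to write that declaration once, on the wrapper, instead of on each top-level element.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ae="www.ams.org"
    xmlns:unused="http://example.com/unused"
    exclude-result-prefixes="unused"
    version="2.0">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/ArticleInfo">
    <!-- Hypothetical wrapper: xmlns:ae is declared here, and the
         child elements inherit the declaration in the serialized output. -->
    <ae:article>
      <ae:doi><xsl:value-of select="ArticleDOI"/></ae:doi>
      <xsl:apply-templates select="Para"/>
    </ae:article>
  </xsl:template>

  <xsl:template match="Para">
    <!-- ae is used in the element name, so it cannot be excluded. -->
    <ae:para><xsl:apply-templates/></ae:para>
  </xsl:template>

</xsl:stylesheet>
```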

XSLT set directory where result document ends up

The XSLT below creates result-documents as desired, with one exception: the result document ends up in the directory where the stylesheet was invoked from. I want the result document to be where it was found (i.e. overwrite itself with the transform version).
How can I do that?
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0" xpath-default-namespace="http://www.w3.org/1999/xhtml">
<xsl:template match="/">
<xsl:for-each select="collection(iri-to-uri('file:///home/paul/Text/?select=*.xhtml'))">
<xsl:variable name="filename">
<xsl:value-of select="tokenize(document-uri(.), '/')[last()]"/>
</xsl:variable>
<xsl:result-document indent="yes" method="xml" href="{$filename}">
<xsl:apply-templates/>
</xsl:result-document>
</xsl:for-each>
</xsl:template>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<!-- transform templates removed -->
</xsl:stylesheet>
Try just using href="{document-uri(.)}" so that the full URI is the target, rather than tokenizing to pull out the last segment.
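Applied to the stylesheet from the question, the suggestion might look like the sketch below. It reuses the collection URI from the question unchanged. Whether a given processor allows a transformation to write to a file it has already read in the same run is an assumption that should be tested; if it does not, writing to a separate output directory is a fallback.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0" xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <xsl:template match="/">
    <xsl:for-each select="collection(iri-to-uri('file:///home/paul/Text/?select=*.xhtml'))">
      <!-- Use the full URI of the current document as the target,
           not just its last path segment. -->
      <xsl:result-document indent="yes" method="xml" href="{document-uri(.)}">
        <xsl:apply-templates/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

  <!-- Identity template; the real transform templates would go alongside it. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```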

XSLT conditionally write to two different files

I need to extract log messages from an XML file and write them out to plain text files. The log messages come in two flavors, and I want to write them to separate files.
I have written a stylesheet that does exactly what I need, except that it sometimes creates empty files, because the XML file may not contain messages of one type or the other.
I am wondering 1) whether what I am doing is the best method, and 2) whether there is a way to suppress empty files.
My sample may contain errors because it has been retyped (the original is on a closed network).
Note: I am using XSLT 2.0 features.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="text" encoding="iso-8859-1" />
<xsl:param name="break" select="string('&#10;')" />
<xsl:template match="/">
<xsl:result-document method="text" href="foo.txt">
<xsl:apply-templates select="Root/a/b/c[contains(., 'foo')]" />
</xsl:result-document>
<xsl:result-document method="text" href="bar.txt">
<xsl:apply-templates select="Root/a/b/c[not(contains(., 'foo'))]" />
</xsl:result-document>
</xsl:template>
<xsl:template match="*">
<xsl:value-of select="concat(normalize-space(.), $break)" />
</xsl:template>
</xsl:stylesheet>
You could use some XSLT 2.0 stylesheet like:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="break" select="string('&#10;')" />
<xsl:template match="/">
<xsl:apply-templates select="Root/a/b/c"/>
</xsl:template>
<xsl:template match="/Root/a/b/c[contains(., 'foo')]">
<xsl:result-document method="text" href="foo.txt">
<xsl:next-match/>
</xsl:result-document>
</xsl:template>
<xsl:template match="/Root/a/b/c[not(contains(., 'foo'))]">
<xsl:result-document method="text" href="bar.txt">
<xsl:next-match/>
</xsl:result-document>
</xsl:template>
<xsl:template match="*">
<xsl:value-of select="concat(normalize-space(.), $break)" />
</xsl:template>
</xsl:stylesheet>
Note the use of pattern matching and xsl:next-match.
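An alternative way to suppress empty files is to guard each xsl:result-document with a test. This is a sketch only, assuming the same Root/a/b/c structure as in the question; the variable names are made up for illustration. It creates each file at most once per transformation, which matters because, as far as the XSLT 2.0 rules go, producing two result documents with the same URI in one transformation is an error, something that can happen when xsl:result-document is instantiated once per matched element.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="break" select="'&#10;'"/>

  <xsl:template match="/">
    <!-- Select both groups of messages up front (hypothetical names). -->
    <xsl:variable name="foo-messages" select="Root/a/b/c[contains(., 'foo')]"/>
    <xsl:variable name="other-messages" select="Root/a/b/c[not(contains(., 'foo'))]"/>

    <!-- Only create foo.txt when there is something to write. -->
    <xsl:if test="exists($foo-messages)">
      <xsl:result-document method="text" href="foo.txt">
        <xsl:apply-templates select="$foo-messages"/>
      </xsl:result-document>
    </xsl:if>

    <!-- Same guard for bar.txt. -->
    <xsl:if test="exists($other-messages)">
      <xsl:result-document method="text" href="bar.txt">
        <xsl:apply-templates select="$other-messages"/>
      </xsl:result-document>
    </xsl:if>
  </xsl:template>

  <!-- One normalized line per message. -->
  <xsl:template match="c">
    <xsl:value-of select="concat(normalize-space(.), $break)"/>
  </xsl:template>

</xsl:stylesheet>
```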

Resources