XML and XSL with unwanted namespace when using saxon - xslt-2.0

I have used exclude-result-prefixes="ae" in the XSL stylesheet, but the namespace is still present in the converted XML file. I'm using the Saxon processor. Please find my MWE below:
My XML file is :
<?xml version="1.0" encoding="UTF-8"?>
<ArticleInfo Language="En" ContainsESM="No" OutputMedium="All">
<ArticleID>034</ArticleID>
<ArticleJID>BMCL</ArticleJID>
<ArticleDOI>10.1000/j.asdf.2015.02.034</ArticleDOI>
<ArticleTitle>Sample Article Title with &#x2015; unicode value</ArticleTitle>
<Para>Sample Paragraph text here</Para>
</ArticleInfo>
and My XSL file is :
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:ae="www.ams.org" exclude-result-prefixes="ae" version="3.0">
<xsl:output omit-xml-declaration="no" indent="yes" method="xml"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE article PUBLIC "-//AMS//DTD journal article//EN//XML" "art.dtd"&gt;
</xsl:text>
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:variable name="ElsDoi" select="/ArticleInfo/ArticleDOI"/>
<xsl:template match="ArticleInfo">
<ae:doi><xsl:value-of select="$ElsDoi"/></ae:doi>
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="Para">
<xsl:element name="ae:para">
<xsl:apply-templates select="@* | node()"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
The output XML file I'm getting is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//AMS//DTD journal article//EN//XML" "art.dtd">
<ae:doi xmlns:ae="www.ams.org">10.1000/j.asdf.2015.02.034</ae:doi>
<ArticleID>034</ArticleID>
<ArticleJID>BMCL</ArticleJID>
<ArticleDOI>10.1000/j.asdf.2015.02.034</ArticleDOI>
<ArticleTitle>Sample Article Title with ― unicode value</ArticleTitle>
<ae:para xmlns:ae="www.ams.org">Sample Paragraph text here</ae:para>
The output XML file I expect is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//AMS//DTD journal article//EN//XML" "art.dtd">
<ae:doi>10.1000/j.asdf.2015.02.034</ae:doi>
<ArticleID>034</ArticleID>
<ArticleJID>BMCL</ArticleJID>
<ArticleDOI>10.1000/j.asdf.2015.02.034</ArticleDOI>
<ArticleTitle>Sample Article Title with &#x2015; unicode value</ArticleTitle>
<ae:para>Sample Paragraph text here</ae:para>
Please note that the unwanted xmlns:ae="www.ams.org" is present in the output XML file, and that &#x2015; in the title is converted to the Unicode character. How do I avoid this?

With <xsl:element name="ae:para"> you are explicitly creating an element in the namespace bound to the prefix ae, so don't expect exclude-result-prefixes to exclude that namespace; the same applies to the literal result element <ae:doi>. The attribute is only useful for avoiding namespace declarations of namespaces that are in scope but unused. A namespace used in a node name can't be excluded with exclude-result-prefixes, as otherwise the result would not be namespace well-formed XML.
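The well-formedness point can be checked outside XSLT. Below is a minimal Python sketch, assuming only the standard library's xml.etree.ElementTree; the helper name is_namespace_well_formed is made up for illustration. It feeds a namespace-aware parser one element from the expected output (prefix ae, no declaration) and the same element from the output Saxon actually produced.

```python
import xml.etree.ElementTree as ET

def is_namespace_well_formed(xml_text):
    """Return True if a namespace-aware parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # An undeclared prefix is reported as a parse error ("unbound prefix").
        return False

# Element as it appears in the *expected* output: prefix without declaration.
expected = '<ae:doi>10.1000/j.asdf.2015.02.034</ae:doi>'
# Element as it appears in the *actual* output: prefix declared in place.
actual = '<ae:doi xmlns:ae="www.ams.org">10.1000/j.asdf.2015.02.034</ae:doi>'

print(is_namespace_well_formed(expected))
print(is_namespace_well_formed(actual))
```

The first call reports a failure and the second succeeds, which is why a conforming serializer has to emit the declaration somewhere in the output.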

Related

XSLT merging two files with different namespaces

This is my master HTML file with predefined namespace:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>some title</title>
</head>
<body>
<p>some text</p>
</body>
</html>
And I have an additional XML file defined like this:
<?xml version="1.0" encoding="UTF-8"?>
<article dtd-version="1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
<front>
<element>front text</element>
</front>
<back>
<extra-list>
<element>element text</element>
</extra-list>
</back>
</article>
This is the wanted final output (head from the HTML file, extra-list from the XML file):
<?xml version="1.0" encoding="UTF-8"?>
<xml>
<head>
<title>some title</title>
</head>
<back>
<extra-list>
<element>element text</element>
</extra-list>
</back>
</xml>
I am trying to join these two files with this XSLT below:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xlink="http://www.w3.org/1999/xlink"
xpath-default-namespace="http://www.w3.org/1999/xhtml"
version="2.0">
<xsl:output method="xml" version="1.0" indent="yes"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="html">
<xml>
<xsl:apply-templates/>
</xml>
</xsl:template>
<xsl:template match="head">
<head>
<xsl:apply-templates/>
</head>
</xsl:template>
<xsl:template match="body">
<back>
<xsl:copy-of select="document('doc.xml')"/>
</back>
</xsl:template>
</xsl:transform>
I use xpath-default-namespace in XSLT so I don't have to address HTML's namespace all the time (the original master HTML is huge) and I would like to stay with this parameter if possible. Here I am having two issues:
1.) How is it possible to get rid of all xmlns declarations on output?
2.) It is only possible to copy the whole xml file with this command <xsl:copy-of select="document('doc.xml')"/>. If I try to copy only subelement <xsl:copy-of select="document('doc.xml')/article/back"/>, then I get no output, because the content is not in the same namespace. How would I be able to solve this?
UPDATE (COMPLETE XSLT SOLUTION):
Based on Martin's answer below, this is a fully working solution.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xpath-default-namespace="http://www.w3.org/1999/xhtml"
version="2.0">
<xsl:output method="xml" version="1.0" indent="yes"/>
<!-- copy all elements and ignore namespace -->
<xsl:template match="*">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@* | node()"/>
</xsl:element>
</xsl:template>
<!-- copy all attributes and ignore namespace -->
<xsl:template match="@*">
<xsl:attribute name="{local-name()}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>
<!-- copy all remaining nodes and ignore namespace -->
<xsl:template match="comment() | text() | processing-instruction()">
<xsl:copy/>
</xsl:template>
<xsl:template match="html">
<xml>
<xsl:apply-templates/>
</xml>
</xsl:template>
<xsl:template match="head">
<head>
<xsl:apply-templates/>
</head>
</xsl:template>
<xsl:template match="body">
<xsl:copy-of xpath-default-namespace="" copy-namespaces="no" select="document('doc.xml')/article/back"/>
</xsl:template>
</xsl:transform>
I also added two extra templates to copy attributes and some other nodes.
You can override xpath-default-namespace where needed, e.g. <xsl:copy-of xpath-default-namespace="" select="document('doc.xml')/article/back"/>.
As for namespaces, there are several issues. You run the part of the input that is in the XHTML namespace through an identity transformation, which always preserves the namespace of the elements copied. You will need to change from the identity transformation to a transformation that strips the namespace from elements:
<xsl:template match="*">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@* | node()"/>
</xsl:element>
</xsl:template>
The literal result elements you create in the XSLT have the XLink namespace in scope because you declare it in the stylesheet but never use it. Either remove the declaration or use exclude-result-prefixes="xlink" on the xsl:stylesheet or xsl:transform element.
The other input, which you access with document('doc.xml'), also declares unused namespaces. The default copying preserves them, but as they are only in scope and not used you can get rid of them with copy-namespaces="no": <xsl:copy-of xpath-default-namespace="" select="document('doc.xml')/article/back" copy-namespaces="no"/>. Alternatively, push those elements through the namespace-stripping template with xsl:element name="{local-name()}" as well.
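To illustrate what "stripping the namespace" means at the tree level, here is a rough Python sketch using the standard library's xml.etree.ElementTree. It is not how an XSLT processor does it, only an analogy to the xsl:element name="{local-name()}" template; the function name strip_namespaces is invented for this example.

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root):
    """Rename every element and attribute to its local name, in place."""
    for el in root.iter():
        # ElementTree stores qualified names as "{namespace-uri}local".
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
        for name in list(el.attrib):
            if name.startswith("{"):
                el.attrib[name.split("}", 1)[1]] = el.attrib.pop(name)
    return root

source = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>some title</title></head>'
    '</html>'
)
root = strip_namespaces(ET.fromstring(source))
print(ET.tostring(root, encoding="unicode"))
```

After the renaming no element name refers to the XHTML namespace any more, so the serializer has no reason to write an xmlns declaration.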

XSLT on XML Get parent Value from a mixed Node

I have an XML document with some mixed-content nodes, and I want to get just the value of the parent and not of the child.
My XML
<?xml version="1.0" encoding="UTF-8"?>
<Records>
<DET>
<detnumber>100126</detnumber>
<EmployeeNo>100126</EmployeeNo>
<action>CHANGE</action>
<first_name> NewHire-4th
<previous>NewHire</previous>
</first_name>
<last_name>Test-Changed 4th
<previous>Test-Changed 3rd</previous>
</last_name>
<birth_name>
NewHire-Changed 4th
<previous>NewHire-Changed 3rd</previous>
</birth_name>
<formal_name>
NewHire-4th Test-Changed 4th
<previous>NewHire Test-Changed 3rd</previous>
</formal_name>
<salutation>
MISS
<previous>MRS</previous>
</salutation>
<email_address>
testHire4@gmail.com
<previous>testHire2@gmail.com</previous>
</email_address>
</DET>
</Records>
I am using XSLT 2.0 and mostly xsl:copy-of in my stylesheet, but that copies the whole node including its children. I need to restrict the copy to the parent's own content.
<xsl:copy-of select="first_name"/>
<xsl:copy-of select="last_name"/>
<xsl:copy-of select="birth_name"/>
<xsl:copy-of select="formal_name"/>
<xsl:copy-of select="salutation"/>
Below is my preferred output
<?xml version="1.0" encoding="UTF-8"?>
<Records>
<DET>
<detnumber>100126</detnumber>
<EmployeeNo>100126</EmployeeNo>
<action>CHANGE</action>
<first_name> NewHire-4th</first_name>
<last_name>Test-Changed 4th</last_name>
<birth_name>NewHire-Changed 4th</birth_name>
<formal_name>NewHire-4th Test-Changed 4th</formal_name>
<salutation>MISS</salutation>
<email_address>testHire4@gmail.com</email_address>
</DET>
</Records>
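Try this code, which copies everything through the identity template and drops the previous elements with an empty template: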
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="previous"/>
</xsl:stylesheet>
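The effect of the empty template (every previous element disappears, the parent's own text stays) can be mimicked on a small tree. This is only a Python sketch with the standard library's xml.etree.ElementTree, not XSLT; the function name drop_elements is an assumption for illustration, and the sample is a cut-down version of the input above.

```python
import xml.etree.ElementTree as ET

def drop_elements(root, name):
    """Remove every descendant element called `name`, keeping surrounding text."""
    for parent in list(root.iter()):
        prev = None
        for child in list(parent):
            if child.tag == name:
                # Text following the removed child (its tail) is kept by
                # attaching it to the previous sibling or to the parent.
                tail = child.tail or ""
                if prev is None:
                    parent.text = (parent.text or "") + tail
                else:
                    prev.tail = (prev.tail or "") + tail
                parent.remove(child)
            else:
                prev = child
    return root

sample = (
    "<DET><last_name>Test-Changed 4th\n"
    "<previous>Test-Changed 3rd</previous>\n"
    "</last_name></DET>"
)
det = drop_elements(ET.fromstring(sample), "previous")
print(repr(det.find("last_name").text))
```

As with the identity transform, the whitespace around the removed child remains in the parent's text; trimming it is a separate step (normalize-space() in XSLT, str.strip() here).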

Replace nbsp with another tag

In my XML file I have a tag like this (within the p tags I have &nbsp;).
Now I want to replace this &nbsp; with another tag (for example, within the p tag I want to insert another tag called s: <s/>).
Is this possible? Please help.
First note that the tree on which XSLT operates never contains a character or entity reference; it simply contains a Unicode character. To match a Unicode character and replace it with an element you can use xsl:analyze-string:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* , node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p//text()">
<xsl:analyze-string select="." regex="&#160;">
<xsl:matching-substring>
<s/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
</xsl:stylesheet>
That way an input document like
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc [
<!ENTITY nbsp "&#160;">
]>
<doc>
<p>This is a paragraph with a non-breaking space &nbsp; before some sub text.</p>
</doc>
is transformed into the result document
<?xml version="1.0" encoding="UTF-8"?><doc>
<p>This is a paragraph with a non-breaking space <s/> before some sub text.</p>
</doc>
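The opening remark of this answer, that the tree holds characters rather than references, is easy to verify with any XML parser. A small Python sketch, assuming only the standard library's xml.etree.ElementTree and a made-up sample string:

```python
import xml.etree.ElementTree as ET

# Two numeric character references: U+00A0 (no-break space) and U+2015.
doc = ET.fromstring("<doc><p>space&#160;here, bar &#x2015; there</p></doc>")
text = doc.find("p").text

print("\u00a0" in text)   # the no-break space is present as a character
print("\u2015" in text)   # so is the horizontal bar
print("&#" in text)       # no trace of the reference syntax remains
```

Because the references are gone after parsing, a stylesheet cannot tell whether the source contained &#x2015; or the literal character. Whether a character is written back as a reference is decided at serialization time (for instance, when the output encoding cannot represent it).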

XSLT set directory where result document ends up

The XSLT below creates result documents as desired, with one exception: each result document ends up in the directory the stylesheet was invoked from. I want each result document to end up where its source was found (i.e. overwrite the source with the transformed version).
How can I do that?
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0" xpath-default-namespace="http://www.w3.org/1999/xhtml">
<xsl:template match="/">
<xsl:for-each select="collection(iri-to-uri('file:///home/paul/Text/?select=*.xhtml'))">
<xsl:variable name="filename">
<xsl:value-of select="tokenize(document-uri(.), '/')[last()]"/>
</xsl:variable>
<xsl:result-document indent="yes" method="xml" href="{$filename}">
<xsl:apply-templates/>
</xsl:result-document>
</xsl:for-each>
</xsl:template>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<!-- transform templates removed -->
</xsl:stylesheet>
Try just using href="{document-uri(.)}" so that the full URI is the target, rather than tokenizing it to pull out the last segment.
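The reason this helps is that a relative href on xsl:result-document is resolved against the base output URI, while an absolute URI is used as it is. The Python sketch below uses urllib.parse.urljoin purely as a stand-in for that resolution step; the base output URI file:///home/paul/work/ and the file name chapter1.xhtml are hypothetical values chosen for illustration.

```python
from urllib.parse import urljoin

# Hypothetical base output URI: the directory the transformation was run from.
base_output_uri = "file:///home/paul/work/"
# Hypothetical document URI of one file returned by collection().
document_uri = "file:///home/paul/Text/chapter1.xhtml"

# What the original stylesheet does: keep only the last path segment.
last_segment = document_uri.split("/")[-1]

print(urljoin(base_output_uri, last_segment))   # relative: lands under the base
print(urljoin(base_output_uri, document_uri))   # absolute: stays where it was
```

With only the last segment, the target ends up next to the place the transformation was started; with the full document URI, the target is the source file's own location.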

XSLT conditionally write to two different files

I need to extract log messages from an XML file and write them out to plain text files. The log messages come in two flavors, and I want to write them to separate files.
I have written a stylesheet that does exactly what I need, except that it sometimes creates empty files because the XML file may not contain messages of one type or the other.
I am wondering 1) whether what I am doing is the best way to do this, and 2) whether there is a way to suppress empty files.
My sample may contain errors because it has been retyped. (the original is on a closed network)
Note: I am using XSLT 2.0 features.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="text" encoding="iso-8859-1" />
<xsl:param name="break" select="string('&#10;')" />
<xsl:template match="/">
<xsl:result-document method="text" href="foo.txt">
<xsl:apply-templates select="Root/a/b/c[contains(., 'foo')]" />
</xsl:result-document>
<xsl:result-document method="text" href="bar.txt">
<xsl:apply-templates select="Root/a/b/c[not(contains(., 'foo'))]" />
</xsl:result-document>
</xsl:template>
<xsl:template match="*">
<xsl:value-of select="concat(normalize-space(.), $break)" />
</xsl:template>
</xsl:stylesheet>
You could use some XSLT 2.0 stylesheet like:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="break" select="string('&#10;')" />
<xsl:template match="/">
<xsl:apply-templates select="Root/a/b/c"/>
</xsl:template>
<xsl:template match="/Root/a/b/c[contains(., 'foo')]">
<xsl:result-document method="text" href="foo.txt">
<xsl:next-match/>
</xsl:result-document>
</xsl:template>
<xsl:template match="/Root/a/b/c[not(contains(., 'foo'))]">
<xsl:result-document method="text" href="bar.txt">
<xsl:next-match/>
</xsl:result-document>
</xsl:template>
<xsl:template match="*">
<xsl:value-of select="concat(normalize-space(.), $break)" />
</xsl:template>
</xsl:stylesheet>
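Note: this relies on pattern matching and xsl:next-match. No xsl:result-document is instantiated for a message type that never matches, so no empty file is created. As written, each matching c element opens its own xsl:result-document, so this works only when at most one c matches each pattern; XSLT 2.0 treats writing two result documents to the same URI as an error.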
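The underlying idea, creating a file only when there is something to put in it, can be sketched independently of XSLT. The Python below is an illustration under assumptions, not a translation of the stylesheet: the function name write_logs is invented, the messages are passed in as plain strings, and whitespace collapsing with split/join is only a rough equivalent of normalize-space().

```python
import os
import tempfile

def write_logs(messages, out_dir):
    """Write foo.txt / bar.txt into out_dir, skipping a file that would be empty."""
    groups = {
        "foo.txt": [m for m in messages if "foo" in m],
        "bar.txt": [m for m in messages if "foo" not in m],
    }
    written = []
    for name, lines in groups.items():
        if not lines:
            continue  # nothing of this flavor: do not create the file at all
        with open(os.path.join(out_dir, name), "w", encoding="iso-8859-1") as fh:
            # Collapse runs of whitespace and end each message with a newline.
            fh.write("".join(" ".join(line.split()) + "\n" for line in lines))
        written.append(name)
    return written

with tempfile.TemporaryDirectory() as tmp:
    print(write_logs(["a foo message", "another   foo"], tmp))
```

Here all messages of one flavor are collected first and the file is opened once per flavor, which also avoids writing to the same target more than once.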
