Getting errors in Saxon-HE 9.9.1 when processing DITA: I/O error on DTD

Using Saxon 9.9.1.3J, I am getting an I/O error every time I try to transform a DITA file that has a DTD:
I/O error reported by XML parser processing file:/test.dita: /learningAssessment.dtd (No such file or directory)
This happens even if I force -dtd:off on the command line. Commenting out the DTD in the DITA file does allow it to process.
Interestingly, when I run the same DITA file in oXygen using Saxon-HE 9.8.0.12, it does process correctly. Any idea what might be causing this to behave differently?
Sample DITA file:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE learningAssessment PUBLIC "-//OASIS//DTD DITA Learning Assessment//EN" "learningAssessment.dtd">
<learningAssessment id="id">
<title>Title</title>
<learningAssessmentbody>
<lcInteraction>
<lcSingleSelect id="lcSingleSelect_agy_fxz_ljb">
<lcQuestion>Question</lcQuestion>
<lcAnswerOptionGroup id="lcAnswerOptionGroup_bgy_fxz_ljb">
<lcAnswerOption>
<lcAnswerContent>A</lcAnswerContent>
</lcAnswerOption>
<lcAnswerOption>
<lcAnswerContent>B</lcAnswerContent>
<lcCorrectResponse/>
</lcAnswerOption>
</lcAnswerOptionGroup>
</lcSingleSelect>
</lcInteraction>
</learningAssessmentbody>
</learningAssessment>
And here's a shell of an XSL that demonstrates the error:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
<xsl:output />
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
</xsl:stylesheet>

You can resolve the problem by the following steps:
Download DITA-OT and expand it into any folder you like. In my case it is located at D:\DITA-OT\dita-ot-3.3.4.
Set the CLASSPATH environment variable to contain saxon9he.jar and xml-resolver-1.2.jar from DITA-OT/lib.
Invoke Saxon by specifying the class name net.sf.saxon.Transform and the -catalog: parameter pointing to [DITA-OT]/catalog-dita.xml.
Hope this helps!

My guess is that you have somehow contrived to give the document a base URI of "file:/test.dita: ", including the final space. You haven't shown how you are running the transformation, so we can't tell where this base URI comes from.
The option -dtd:off is a little misleading. It doesn't switch off DTD processing, only DTD-based validation, which is just one aspect of DTD processing. An XSLT processor always needs to ask the XML parser to read the DTD in order to expand any entity references.
(Well, theoretically it could delay reading any external DTD until it finds the first entity reference; but sadly, I don't know of any XML parser that does that.)
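The difference between validating against a DTD and merely reading it can be seen outside Saxon too. The sketch below uses Python's standard-library parser (expat, which is non-validating) rather than Saxon or the parser Saxon delegates to, so it only illustrates the principle. The document and the entity name are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset declares <r> as EMPTY
# (which the content below violates) and defines one entity.
doc = """<!DOCTYPE r [
  <!ELEMENT r EMPTY>
  <!ENTITY product "Saxon-HE">
]>
<r>Processed with &product;</r>"""

# expat does not validate, so the EMPTY content model is never checked,
# but the entity declaration is still read and applied.
root = ET.fromstring(doc)
print(root.text)
```

No validation error is raised, yet the entity reference is expanded, which is why the DTD still has to be read even when validation is off.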

I misunderstood how DTDs work. I assumed the public ones were loaded from an HTTP URL, but they need to be local files. Loading the catalog for DITA OT resolved the issue.
transform -s:test.dita -xsl:test.xsl -o:test.html -catalog:/org.oasis-open.dita.v1_2/plugins/org.oasis-open.dita.v1_2/catalog.xml
The -catalog option points to this file on my local filesystem, which comes from DITA-OT.
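For readers wondering what the catalog actually does, here is a minimal sketch of the lookup, assuming a catalog that only contains plain public and system entries. It is not the resolver Saxon uses, it ignores catalog features such as prefer, nextCatalog and delegation, and the uri value is a placeholder rather than a real DITA-OT path.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"

# Hypothetical catalog content; the uri is a placeholder.
catalog_xml = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DITA Learning Assessment//EN"
          uri="dtd/learningAssessment.dtd"/>
</catalog>"""

def resolve(catalog_root, public_id, system_id):
    """Return the catalog's URI for a DOCTYPE, or fall back to the system id."""
    for entry in catalog_root.iter(CATALOG_NS + "public"):
        if entry.get("publicId") == public_id:
            return entry.get("uri")
    for entry in catalog_root.iter(CATALOG_NS + "system"):
        if entry.get("systemId") == system_id:
            return entry.get("uri")
    # No match: the parser is left with the system id from the document.
    return system_id

catalog = ET.fromstring(catalog_xml)
mapped = resolve(catalog, "-//OASIS//DTD DITA Learning Assessment//EN",
                 "learningAssessment.dtd")
unmapped = resolve(catalog, "-//EXAMPLE//DTD Unknown//EN", "unknown.dtd")
print(mapped)
print(unmapped)
```

The fallback branch corresponds to the original failure: with no catalog match, the relative system identifier is all the parser has, and it is resolved relative to the document itself.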

Related

Why is an external document resolved in Altova XMLSpy but not in SaxonHE10?

I want to load an external XML-document into a variable, which works in Altova XMLSpy but not in SaxonHE10.
In Altova XMLSpy the following XSLT 2.0 line returns true.
<xsl:copy-of select="fn:doc-available('https://www.xrepository.de/api/version_codeliste/urn:de:bund:destatis:bevoelkerungsstatistik:schluessel:staat_2019-02-01/genericode')"/>
In my local SaxonHE10 installation it returns false.
Are there any commandline parameters I can use to change this behavior?
Addition 18.10.2021 13:12:
The comment section is too small, so I am editing my question:
That is the XSLT:
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:html="http://www.w3.org/1999/xhtml"
xmlns="http://www.w3.org/1999/xhtml"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:xdf3="urn:xoev-de:fim:standard:xdatenfelder_3.0.0"
xmlns:gc="http://docs.oasis-open.org/codelist/ns/genericode/1.0/"
exclude-result-prefixes="html"
>
<xsl:template match="/">
<xsl:message>
<xsl:copy-of select="fn:doc-available('https://www.xrepository.de/api/version_codeliste/urn:de:bund:destatis:bevoelkerungsstatistik:schluessel:staat_2019-02-01/genericode')"/>
</xsl:message>
<xsl:message>
<xsl:copy-of select="fn:document('https://www.xrepository.de/api/version_codeliste/urn:de:bund:destatis:bevoelkerungsstatistik:schluessel:staat_2019-02-01/genericode')"/>
</xsl:message>
</xsl:template>
</xsl:stylesheet>
That is the command call:
"C:\Program Files\Saxonica\SaxonHE10.2N\bin\Transform.exe" -s:test.xml -xsl:test.xsl
This is the result:
false
Error FODC0002 while evaluating xsl:message at line 20 of file:/C:/Users/Volker/Dropbox/FIM/Tools/QS%20Datenfelder/test.xsl: Document has been marked not available: https://www.xrepository.de/api/version_codeliste/urn:de:bund:destatis:bevoelkerungsstatistik:schluessel:staat_2019-02-01/genericode
<?xml version="1.0" encoding="UTF-8"?>
Interestingly the doc and doc-available calls to the given URL work in the Saxon .NET based XSLT fiddle app I run.
As far as I can tell, the only IKVM setting I have there in the web.config (which might work just as well in an app.config for non-web use of .NET and Saxon) is the TLS setting, e.g.
<configuration>
<appSettings>
<add key="ikvm:https.protocols" value="TLSv1,TLSv1.1,TLSv1.2" />
</appSettings>
</configuration>
So it would be worth a try for any .NET users of Saxon for whom the URI fails to add the above setting to their app.config or web.config.
I now tried some simple C# code using Saxon HE 10.6 .NET with e.g.
Processor processor = new Processor();
string xpathExpression = "doc-available('https://www.xrepository.de/api/version_codeliste/urn:de:bund:destatis:bevoelkerungsstatistik:schluessel:staat_2019-02-01/genericode')";
Console.WriteLine(processor.NewXPathCompiler().EvaluateSingle(xpathExpression, null).GetStringValue());
Console.ReadLine();
Without any change to IKVM settings or app.config, this outputs true. It turns out that the .NET Framework version in use is decisive, or at least part of the reason: the first test was done with 4.8, while using 4.5 gives false and using 4.6 gives true.
Even without using Saxon or IKVM, a .NET framework 4.5 application doing
string url = "https://www.xrepository.de/api/version_codeliste/urn:de:bund:destatis:bevoelkerungsstatistik:schluessel:staat_2019-02-01/genericode";
var request = WebRequest.Create(url);
{
using (var response = (HttpWebResponse)request.GetResponse())
{
Console.WriteLine("Status: {0}", response.StatusCode);
response.Close();
}
}
Console.ReadLine();
fails to establish a connection as it gives some error about not being able to establish a protected SSL/TLS channel. Using .NET framework 4.8 no such problem arises. Still don't know how to get compiled and installed .exe files like Transform.exe or Query.exe from Saxon to switch/look for the latest/highest installed .NET framework instead of (I presume) the lowest they were compiled for and tested for to run with. https://learn.microsoft.com/en-us/dotnet/framework/network-programming/tls might give some clues.

ANT script : xmlcatalog not reading local dtd

I have an XML file named TIBCOUniversalInstaller_TRA_5.10.0.silent, shown below. I want to replace values in it from a "replace" target in an Ant script, using the xmltask task.
XML File is below:
<?xml version="1.0"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>---Universal Installer Silent Installation Properties---</comment>
<!--accept the license agreement-->
<entry key="acceptLicense">true</entry>
<entry key="installationRoot">/opt/tibco</entry>
<entry key="environmentName">TRA</entry>
</properties>
My server cannot reach java.sun.com at parse time, so I downloaded properties.dtd to my local machine, and I am using the xmlcatalog task to make the Ant script read the local copy. Below is my Ant script:
<xmlcatalog id="dtd">
<dtd publicId="SYSTEM" location="/home/tibco/BW-AUTOMATION-PROJECT/Environments/properties.dtd"/>
</xmlcatalog>
<xmltask source="${TRASoftwareFolder}/TIBCOUniversalInstaller_TRA_5.10.0.silent" dest="${TRASoftwareFolder}/TIBCOUniversalInstaller_TRA_5.10.0.silent">
<xmlcatalog refid="dtd">
</xmlcatalog>
<replace path="/:properties/:entry/:[@key='installationRoot']/text()"
withText="/home/tibco"/>
</xmltask>
But at parse time it still goes to http://java.sun.com/dtd/properties.dtd every time, and I get a "Connection Refused" error.
When I enabled debug output I saw the message below, which I believe points to the problem: the parser always goes to the website instead of the local DTD file.
DEBUG LOGS:
"No matching catalog entry found, parser will use: 'http://java.sun.com/dtd/properties.dtd'"
I believe this is because I gave "SYSTEM" as the value of the publicId attribute on the dtd element.
Can you please advise what the correct value of the publicId attribute should be for this DTD, so that it matches the catalog at parse time?
If there is another way of reading or replacing values in this XML file, please advise.
Thanks
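One thing that can be checked independently of Ant is which identifiers the DOCTYPE declaration above actually carries. The sketch below uses Python's expat bindings (not Ant or xmltask) to report them; in that declaration SYSTEM is a keyword, not a public identifier, so a catalog entry keyed on the literal string "SYSTEM" has nothing to match. The helper name doctype_ids is made up for this example.

```python
import xml.parsers.expat

def doctype_ids(xml_text):
    """Return (public_id, system_id) from the document's DOCTYPE declaration."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["ids"] = (public_id, system_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    # expat does not fetch the external DTD by default, so no network access occurs.
    parser.Parse(xml_text, True)
    return found.get("ids")

properties_doc = (
    '<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">'
    "<properties/>"
)
ids = doctype_ids(properties_doc)
print(ids)
```

The public identifier comes back as None; only the system identifier (the URL) is available for a catalog to match against.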

Debug <?xml-stylesheet type="text/xsl" href="#test"?> in oXygen

I am writing test files that test functionality of an XSLT library. For this, I embed tiny XSLTs in the XML file itself so that I don't need a separate XML and XSLT file for each test. This looks somewhat like this:
<?xml-stylesheet type="text/xsl" href="#test"?>
<someXml xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<test feature="lib:someFeature(...)">
<xsl:stylesheet version="2.0" xml:id="test">
<xsl:import href="../testlib.xsl"/>
<xsl:template match="*[lib:assertRef(@label, lib:someFeature())]" mode="assert"/>
</xsl:stylesheet>
</test>
<someContent label="assert: #someId"/>
<someMoreContent xml:id="someId"/>
</someXml>
Is there a way in oXygen to debug this? Does oXygen have a way to run transformations based on the <?xml-stylesheet?> rules at all? Usually, this is not much of a problem as the referenced stylesheet can be run explicitly, but when the stylesheet is embedded, it's something different.
As confirmed by oXygen developer @RaduCoravu, this is not possible at the moment.
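For context, this is roughly what a tool has to do to honour such a processing instruction: read the href pseudo-attribute and, when it is a bare fragment like #test, find the element carrying that xml:id. The sketch below is a simplified illustration in Python using a cut-down version of the test file; it has nothing to do with oXygen's internals, and the function name embedded_stylesheet is made up.

```python
import re
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml-stylesheet type="text/xsl" href="#test"?>'
    '<someXml xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    '<test><xsl:stylesheet version="2.0" xml:id="test"/></test>'
    '</someXml>'
)

def embedded_stylesheet(document):
    """Return the element that the xml-stylesheet PI's '#id' href points to, if any."""
    for node in document.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            match = re.search(r'href="#([^"]+)"', node.data)
            if match is None:
                continue
            wanted = match.group(1)
            # Look for the element whose xml:id equals the fragment.
            for elem in document.getElementsByTagName("*"):
                if elem.getAttribute("xml:id") == wanted:
                    return elem
    return None

stylesheet = embedded_stylesheet(doc)
print(stylesheet.tagName)
```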

Ant xmlproperty task fails due to validation error

I want to extract an application version from a DITA map file. The ditamap file is valid and looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map id="user-manual">
<title><ph keyref="product"/> User Manual</title>
<topicmeta>
<prodinfo>
<prodname><keyword keyref="product"/></prodname>
<vrmlist>
<vrm version="4" release="3" modification="0"/>
</vrmlist>
</prodinfo>
</topicmeta>
<!--
[...]
-->
</map>
The information I want to get is in the <vrm> element.
"Easy peasy," I think to myself. So I use Ant's <xmlproperty> task to just load this XML file.
<project default="test">
<!-- notice @validate -->
<xmlproperty file="path/to/user-manual.ditamap" validate="false"/>
<target name="test">
<echo>${map.topicmeta.prodinfo.vrmlist.vrm(version)}</echo>
</target>
</project>
I don't want it to validate because Ant isn't going to find map.dtd.
Loading the file returns an error:
java.io.FileNotFoundException: /home/user/user-manual/map.dtd (No such file or directory)
If I remove the <!DOCTYPE> declaration or add a nested <xmlcatalog> with the path to the DTD, the file loads and I can use the properties from it.
I tested this with Ant 1.7.1 and 1.9.4. Is this a bug with Ant, or am I misunderstanding how Ant loads XML properties and the purpose of the validate attribute?
How can I make Ant obey my will?
I recommend not using <xmlproperty> for this. Please have a look at the docs:
For example, with semantic attribute processing enabled, this XML
property file:
<root>
<properties>
<foo location="bar"/>
<quux>${root.properties.foo}</quux>
</properties>
</root>
is roughly equivalent to the following fragments in a build.xml file:
<property name="root.properties.foo" location="bar"/>
<property name="root.properties.quux" value="${root.properties.foo}"/>
So the name of each property you set is generated from its path to the root element, which means the names depend on the structure of your DITA map. But many DITA elements may appear at different positions in a map. If you move your metadata to another parent element, the property name changes and your build fails. This is probably not what you want.
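To make the fragility concrete, here is a small Python sketch that loosely imitates path-based property naming. It is not Ant's implementation of <xmlproperty>; the flatten helper and the <topicref> wrapper in the second document are invented for the illustration.

```python
import xml.etree.ElementTree as ET

def flatten(elem, prefix=""):
    """Yield (name, value) pairs named after each node's path from the root."""
    path = f"{prefix}.{elem.tag}" if prefix else elem.tag
    for attr, value in elem.attrib.items():
        yield f"{path}({attr})", value
    if elem.text and elem.text.strip():
        yield path, elem.text.strip()
    for child in elem:
        yield from flatten(child, path)

before = ET.fromstring(
    '<map><topicmeta><prodinfo><vrmlist>'
    '<vrm version="4"/></vrmlist></prodinfo></topicmeta></map>'
)
# Same metadata, moved under a hypothetical <topicref> wrapper.
after = ET.fromstring(
    '<map><topicref><topicmeta><prodinfo><vrmlist>'
    '<vrm version="4"/></vrmlist></prodinfo></topicmeta></topicref></map>'
)
props_before = dict(flatten(before))
props_after = dict(flatten(after))
print(props_before)
print(props_after)
```

The value is unchanged, but the property name is different after the move, so a build script that refers to the old name would no longer find it.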
I'd recommend grabbing those values via XSLT and then setting the properties. That way you could, for example, say "give me the first occurrence of that element" with a simple (//foo)[1] XPath selector. Beyond that, you have the power of XSLT and XPath to slice values, format dates, and so on before setting a property.
Update
You can use the oops consultancy Ant xmltask for that. It is very easy to set a property using <copy>:
<copy path="//critdates/created/@date"
property="document.date"
append="false"/>
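The same "find it wherever it is" lookup can be sketched outside Ant. The Python snippet below is only an illustration of the idea, not a replacement for xmltask or XSLT; it reads the <vrm> element from a trimmed copy of the map in the question, with the DOCTYPE left out to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

ditamap = """<map id="user-manual">
  <topicmeta>
    <prodinfo>
      <vrmlist>
        <vrm version="4" release="3" modification="0"/>
      </vrmlist>
    </prodinfo>
  </topicmeta>
</map>"""

root = ET.fromstring(ditamap)
# ".//vrm" returns the first <vrm> at any depth, so the lookup does not
# depend on where the metadata sits in the map.
vrm = root.find(".//vrm")
version = ".".join(vrm.get(k) for k in ("version", "release", "modification"))
print(version)
```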

How can I get xslt to indent xml (from Ant)?

From what I understand having looked around for an answer to this the following should work:
<xslt basedir="..." destdir="..." style="xslt-stylesheet.xsd" extension=".xml"/>
Where xslt-stylesheet.xsd contains the following:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
Unfortunately while most formatting is applied (spaces are stripped, newlines entered, etc.), indentation is not and every element is along the left side in the file. Is this an issue with the xslt processor Ant uses, or am I doing something wrong? (Using Ant 1.8.2).
It might help to set some processor-specific output options, though you should note that these may vary depending on the XSLT processor that you're using.
For example, if you're using Xalan, it defines an indent-amount property, which seems to default to 0.
To override this property at runtime, declare the xalan namespace in your stylesheet and set the processor-specific attribute indent-amount on your output element, as follows:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xalan">
<xsl:output method="xml"
encoding="UTF-8"
indent="yes"
xalan:indent-amount="2"/>
This example is from the Xalan usage patterns documentation at http://xml.apache.org/xalan-j/usagepatterns.html
If you do happen to be using Xalan, the documentation also says you can change all of the output preferences globally by changing the file org/apache/xml/serializer/output_xml.properties in the serializer jar.
In the interest of completeness, the complete set of Xalan-specific xml output properties defined in that file (Xalan 2.7.1) are:
{http://xml.apache.org/xalan}indent-amount=0
{http://xml.apache.org/xalan}content-handler=org.apache.xml.serializer.ToXMLStream
{http://xml.apache.org/xalan}entities=org/apache/xml/serializer/XMLEntities
If you're not using Xalan, you might have some luck looking for processor-specific output properties in the documentation for your XSLT processor.
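The symptom in the question (line breaks but no indentation) is what an indent amount of 0 looks like. As an analogy only, Python's minidom pretty-printer takes a similar indent setting; the sketch below is not Xalan, it just shows the two settings side by side.

```python
from xml.dom import minidom

doc = minidom.parseString("<a><b><c/></b></a>")

# indent="" behaves like an indent amount of 0: line breaks, no leading spaces.
flat = doc.documentElement.toprettyxml(indent="", newl="\n")
# indent="  " behaves like an indent amount of 2.
nested = doc.documentElement.toprettyxml(indent="  ", newl="\n")

print(flat)
print(nested)
```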
Different XSLT processors implement indent="yes" in different ways. Some indent properly, while others only start each element on a new line. It seems that your XSLT processor is in the latter group.
Why is this so?
The reason is that the W3C XSLT Specification allows significant leeway in what indentation could be produced:
"If the indent attribute has the value yes, then the xml output
method may output whitespace in addition to the whitespace in the
result tree (possibly based on whitespace stripped from either the
source document or the stylesheet) in order to indent the result
nicely; if the indent attribute has the value no, it should not
output any additional whitespace. The default value is no. The xml
output method should use an algorithm to output additional whitespace
that ensures that the result if whitespace were to be stripped from
the output using the process described in [3.4 Whitespace Stripping]
with the set of whitespace-preserving elements consisting of just
xsl:text would be the same when additional whitespace is output as
when additional whitespace is not output.
NOTE:It is usually not safe to use indent="yes" with document types that include element types with mixed content."
Possible solutions:
Start using another XSLT processor. For example, Saxon indents quite well.
Remove the <xsl:strip-space elements="*"/> directive. If there are whitespace-only text nodes in the source XML, they would be copied to the output and this may result in a better-looking indented output.
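The note in the specification about mixed content is easy to demonstrate. The sketch below uses Python's minidom pretty-printer, which indents unconditionally (some XSLT serializers are more careful), so treat it as an illustration of the risk rather than of any particular XSLT processor. The sample element names are made up.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

source = "<p>Hello <b>world</b>!</p>"

def text_of(xml_text):
    """Concatenate all character data in the document."""
    return "".join(ET.fromstring(xml_text).itertext())

indented = minidom.parseString(source).documentElement.toprettyxml(indent="  ")

original_text = text_of(source)
indented_text = text_of(indented)
print(repr(original_text))
print(repr(indented_text))
```

In mixed content the added whitespace becomes part of the text, so the indented document no longer carries the same character data as the original.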
I don't know whether Ant is the problem, but concerning your XSLT:
When you use copy-of on an element, your XSLT processor does not indent. If you change your XSLT like this, your XSLT processor may manage to indent:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
This XSLT walks the whole XML tree and indents each element it creates.
EDIT after comment :
You can see the following question to change your XSLT processor, maybe it will solve your problem : How to execute XSLT 2.0 with ant?
You can try adding the {http://xml.apache.org/xslt}indent-amount output property in ant, something like this:
<target name="applyXsl">
<xslt in="${inputFile}" out="${outputFile}" extension=".html" style="${xslFile}" force="true">
<outputproperty name="indent" value="yes"/>
<outputproperty name="{http://xml.apache.org/xslt}indent-amount" value="4"/>
</xslt>
</target>

Resources