"Error attempting to parse XML file" when parsing using XInclude - xml-parsing

I am trying to create a combined xml document using XInclude to be unmarshalled via JAXB.
Here is my unmarshalling code:
@Override
public T readFromReader(final Reader reader) throws Exception {
    final Unmarshaller unmarshaller = createUnmarshaller();
    final SAXParserFactory spf = SAXParserFactory.newInstance();
    spf.setXIncludeAware(true);
    spf.setNamespaceAware(true);
    //spf.setValidating(true);
    final XMLReader xr = spf.newSAXParser().getXMLReader();
    final SAXSource source = new SAXSource(xr, new InputSource(reader));
    try {
        final T object = (T) unmarshaller.unmarshal(source);
        postReadSetup(object);
        return object;
    } catch (final Exception e) {
        throw new RuntimeException("Cannot parse XML: Additional information is attached. Please ensure your XML is valid.", e);
    }
}
Here is my main xml file:
<?xml version="1.0" encoding="UTF-8" ?>
<tag1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xi="http://www.w3.org/2001/XInclude"
xsi:schemaLocation="path-to-schema/schema.xsd">
<xi:include href="path-to-xml-files/included.xml"></xi:include>
</tag1>
And included.xml:
<?xml version="1.0" encoding="UTF-8"?>
<tag2> Some text </tag2>
In order to actually unmarshal it, I create a new FileReader with the path to my xml file (path-to-xml-files/main.xml - the path is correct because it can clearly find the main file). When I run it, however, there is something wrong with the included file. I am getting an UnmarshalException with a linked SAXParseException with this error message: Error attempting to parse XML file (href='path-to-xml-files/included.xml').
When I manually merge the content of included.xml into main.xml, it runs with no problems.
I can't tell if it's a JAXB issue or an XInclude issue, though I strongly suspect the latter.
What am I missing?

I fought with this exact same problem for three hours and finally I found this:
xerces.apache.org/xerces2-j/features.html
In short, you need to add the following line:
spf.setFeature("http://apache.org/xml/features/xinclude/fixup-base-uris", false);

I had the exact same issue.
Actually, the href attribute expects a URI, which can be:
Either an HTTP address (which means your included XML must be hosted somewhere)
Or a file on your local machine. In that case, you need to prefix it with "file:..." and provide the absolute path.
With your example:
<?xml version="1.0" encoding="UTF-8" ?>
<tag1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xi="http://www.w3.org/2001/XInclude"
xsi:schemaLocation="path-to-schema/schema.xsd">
<xi:include href="file:absolute-path-to-xml-files/included.xml"/>
</tag1>
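A possibly related point, offered as a sketch rather than a confirmed diagnosis of the question above: an InputSource built only from a Reader carries no system ID, so the parser has no document location to resolve a relative href against. Giving the InputSource the main file's URI lets a relative href such as "included.xml" resolve next to main.xml. The class name, the helper names, and the temp-file layout below are hypothetical; only the JDK's built-in JAXP parser is assumed.

```java
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class XIncludeBaseUriSketch {

    // Hypothetical helper: writes a main.xml that includes included.xml via a relative href.
    static Path writeSampleFiles(Path dir) throws Exception {
        Files.write(dir.resolve("included.xml"),
                "<tag2> Some text </tag2>".getBytes(StandardCharsets.UTF_8));
        Path main = dir.resolve("main.xml");
        Files.write(main,
                ("<tag1 xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
                        + "<xi:include href=\"included.xml\"/>"
                        + "</tag1>").getBytes(StandardCharsets.UTF_8));
        return main;
    }

    // Parses mainFile from a Reader, but tells the parser where the document lives.
    static List<String> parseWithSystemId(Path mainFile) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.setXIncludeAware(true);
        XMLReader xr = spf.newSAXParser().getXMLReader();

        // Collect element names so the effect of the include is visible.
        List<String> names = new ArrayList<>();
        xr.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(localName);
            }
        });

        try (Reader reader = Files.newBufferedReader(mainFile, StandardCharsets.UTF_8)) {
            InputSource in = new InputSource(reader);
            // The system ID is the base URI used to resolve the relative href.
            in.setSystemId(mainFile.toUri().toString());
            xr.parse(in);
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Path main = writeSampleFiles(Files.createTempDirectory("xinclude-sketch"));
        System.out.println(parseWithSystemId(main));
    }
}
```

For JAXB, the same InputSource (with its system ID set) could be wrapped in the SAXSource that is handed to the Unmarshaller.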

Related

SAXParseException when using restassured

I am trying to verify an XML response with rest-assured like this:
.then().body("some.xml.path", is("abc"));
However, what I get is a SAXParseException:
DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.]
Response starts like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE cXML SYSTEM "http://xml.cXML.org/schemas/cXML/1.2.021/cXML.dtd">
<cXML ...
Why am I getting this exception? What should I change?
I am using version 3.2.0 of rest-assured.
A similar question has been answered elsewhere. In short, that answer suggests using disableLoadingOfExternalDtd() so that RestAssured ignores the Document Type Definition referenced by your XML.
Normally, the external DTD would describe the structural layout of the element defined as cXML.
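To illustrate where the message comes from, here is a minimal sketch that uses the JDK's own JAXP parser rather than rest-assured. It assumes the underlying parser honours the two Xerces feature URIs shown; the class name and the body of the sample document are hypothetical. With the disallow-doctype-decl feature on, any DOCTYPE is rejected; with it off and external DTD loading disabled, the document parses without fetching the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DoctypeFeatureSketch {

    static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Hypothetical sample: same prolog as in the question, made-up element content.
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE cXML SYSTEM \"http://xml.cXML.org/schemas/cXML/1.2.021/cXML.dtd\">\n"
            + "<cXML><some>abc</some></cXML>";

    // Parses the sample and returns the root element name.
    // Throws a SAXParseException when disallowDoctype is true.
    static String rootName(boolean disallowDoctype) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature(DISALLOW_DOCTYPE, disallowDoctype);
        // Never fetch the external DTD; the DOCTYPE line is read but not followed.
        dbf.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(XML)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootName(false));
        try {
            rootName(true);
        } catch (org.xml.sax.SAXParseException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

How rest-assured exposes these switches depends on its version, so its configuration documentation is the place to check for the exact option.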

VTD-xml ignore well formed file

I want to parse an XML file (this is a piece of it):
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "./resource/init/dblp.dtd">
<dblp>
<www mdate="2002-01-03" key="www/fr/ardentsoftware">
<title>Ardent Software</title>
<url>http://www.ardentsoftware.fr</url>
</www>
.
.
.
.
</dblp>
with vtd-xml, but I got this exception:
com.ximpleware.extended.EntityExceptionHuge: Errors in Entity: Illegal entity char
which means that my file contains "entities". How can I make vtd-xml skip validating the file so that it parses correctly?
VTDGenHuge vg = new VTDGenHuge();
XMLMemMappedBuffer xb = new XMLMemMappedBuffer();
try {
    xb.readFile("./resource/init/dblp.xml");
    vg.setDoc(xb);
    vg.parse(false);
    VTDNavHuge vnh = vg.getNav();
} catch (Exception e) {
    e.printStackTrace();
}
Thanks
The VTDGenHuge parser throws this error for the simple reason that your XML file contains invalid entity references. Correct the error and it should work fine.

Validating XML with an in-memory DTD in C using libxml2

I need to validate XML using DTD stored in memory, i.e. something like the following:
static const char *dtd_str = "<!ELEMENT ...>";
xmlDtdPtr dtd;
dtd = xmlParseMemoryDtd(dtd_str);
The XML_PARSE_DTDVALID parser option validates a document against a DTD embedded in the XML:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE some_tag[
<!ELEMENT some_tag ...>
...
]>
<some_tag>...</some_tag>
So one workaround is to modify the in-memory XML and insert the DTD inline. Things become more complicated with
a parser used in "push mode". In push mode we have to detect the end of the XML
declaration (<?xml ...?>) or the start of the root element, then put our inline
DTD between them.
Could you suggest a better solution?
EDIT
A workaround is to validate the parsed XML a posteriori, as Daniel(_DV) suggested below.
Example: main.c, response.xml.
But I was searching for a way to "embed" a DTD and validate the XML "on-the-fly" while libxml2 parses it chunk-by-chunk.
The following approach doesn't work for me:
xmlCtxtUseOptions(ctxt, XML_PARSE_NOENT | XML_PARSE_NOWARNING | XML_PARSE_DTDVALID);
ctxt->sax->internalSubset = ngx_http_file_chunks_sax_internal_subset;
ctxt->sax->externalSubset = NULL;
$ ./parsexml
validity error : Validation failed: no DTD found !
<response>
^
Document is not valid
xmlValidateDtd lets you do DTD validation a posteriori on an already parsed XML document
to make sure it validates against the DTD. This will not use the internal subset...
http://xmlsoft.org/html/libxml-valid.html#xmlValidateDtd
See xmllint.c code in libxml2 for a full example of how to use it,
Daniel

Error in loading XSL files and DTD files in XSLT transformation

I am trying to create HTML files using XSLT from an XML file and XSL files. The main XSL file includes other XSL files located in the same folder by using <xsl:include href="temp.xsl"/>.
The XSL files are located in "D:/XSL_Folder/".
I am running Main.java, which is located in D:/Workspace/Webapp_Project/.
When I try to create the HTML files by passing "D:/XSL_Folder/root.xsl" and "D:/XML_Folder/data.xml" to Main.java as arguments, I get the following error while creating the Templates:
Templates lTemplates = TransformerFactory.newInstance().newTemplates(new StreamSource(new FileInputStream(lFileXSL)));
ERROR: 'D:\Workspace\Webapp_Project\temp.xsl (The system cannot find the file specified)'
FATAL ERROR: 'Could not compile stylesheet'
12:20:07 ERROR f.s.t.v.v2.dao.impl.DocUnitDaoImpl - Error while creating a new XslTransformerGetter. The path to the XSL may be wrong.
javax.xml.transform.TransformerConfigurationException: Could not compile stylesheet
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:885) ~[na:1.7.0_13]
The error report shows that the parser is looking for the included XSL file in the project path (D:\Workspace\Webapp_Project), not in the folder where root.xsl is located (D:/XSL_Folder/).
Can anyone explain why the parser searches for the XSL file in the project folder instead of the folder where root.xsl is located, and how to fix this problem?
Code I'm using to create the HTML file from the XSL and XML files:
public static void simpleTransform(InputStream lXmlFileStream, File lXSLFile,
        StreamResult lHtmlResult, Map<String, String> lArguments) {
    TransformerFactory tFactory = TransformerFactory.newInstance();
    try {
        Transformer transformer =
                tFactory.newTransformer(new StreamSource(lXSLFile));
        for (Entry<String, String> lEntrie : lArguments.entrySet()) {
            transformer.setParameter(lEntrie.getKey(), lEntrie.getValue());
        }
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.transform(new StreamSource(lXmlFileStream), lHtmlResult);
    }
    catch (Exception e) {
        e.printStackTrace();
    }
}
You have tagged the question "saxon", and you have said you are using XSLT 2.0, but the error messages show that you are using Xalan. If you specifically want to use Saxon then the best way is to avoid using the JAXP classpath search and instantiate Saxon directly - in place of TransformerFactory.newInstance(), use new net.sf.saxon.TransformerFactory().
Supplying a File as the argument to StreamSource ought to be OK; but I would like to see how the File lXSLFile object is created. My suspicion would be that you have done something like new File ("root.xsl") and it has been resolved relative to the current directory.
You may try to use <xsl:include href="resolve-uri('temp.xsl')"/> instead of <xsl:include href="temp.xsl"/> to avoid this problem.
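One more thing worth checking, sketched here under stated assumptions: the failing line in the question builds its StreamSource from a FileInputStream, and a bare stream carries no system ID, so the processor has no stylesheet location to resolve a relative xsl:include against. A StreamSource created from a File records the file's URI as its system ID. The class name, the stylesheet contents, and the temp-directory layout below are hypothetical and only mirror the root.xsl/temp.xsl setup from the question.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XslIncludeBaseSketch {

    static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    // StreamSource(File) sets the system ID, so href="temp.xsl" is resolved
    // relative to the folder that contains rootXsl.
    static Templates compile(File rootXsl) throws Exception {
        return TransformerFactory.newInstance().newTemplates(new StreamSource(rootXsl));
    }

    // Writes a hypothetical root.xsl and temp.xsl into dir, compiles root.xsl,
    // and transforms a tiny document to show that the include was found.
    static String run(Path dir) throws Exception {
        Files.write(dir.resolve("temp.xsl"),
                ("<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"" + XSL_NS + "\">"
                        + "<xsl:template match=\"/\"><xsl:text>included ok</xsl:text></xsl:template>"
                        + "</xsl:stylesheet>").getBytes(StandardCharsets.UTF_8));
        Path root = dir.resolve("root.xsl");
        Files.write(root,
                ("<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"" + XSL_NS + "\">"
                        + "<xsl:include href=\"temp.xsl\"/>"
                        + "<xsl:output method=\"text\"/>"
                        + "</xsl:stylesheet>").getBytes(StandardCharsets.UTF_8));

        StringWriter out = new StringWriter();
        compile(root.toFile()).newTransformer().transform(
                new StreamSource(new StringReader("<data/>")),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(Files.createTempDirectory("xsl-include-sketch")));
    }
}
```

If the stylesheet has to be read from a stream, the two-argument constructor new StreamSource(inputStream, systemId) or a call to setSystemId should serve the same purpose.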

SimpleXML Cyrillic Encoding

This is the kind of XML file I am using:
<?xml version="1.0" encoding="UTF-8"?>
<ProductCatalog>
<ProductType>Дънни платки</ProductType>
<ProductType>Дънни платки 2</ProductType>
</ProductCatalog>
And when I run the PHP file with the following code:
$pFile = new SimpleXMLElement('test.xml', null, true);
foreach ($pFile->ProductType as $pChild)
{
var_dump($pChild);
}
I get the following results:
object(SimpleXMLElement)#5 (1) { [0]=> string(40) "Дънна платка наÑтолна"
I have tried different encodings in the XML file but it's not working well with Cyrillic symbols.
What happens if you switch the character encoding to UTF-8 in the browser?
I mean, it looks like an output issue.
