Validating XML with an in-memory DTD in C using libxml2 - xml-parsing

I need to validate XML using DTD stored in memory, i.e. something like the following:
static const char *dtd_str = "<!ELEMENT ...>";
xmlDtdPtr dtd;
dtd = xmlParseMemoryDtd(dtd_str);
XML_PARSE_DTDVALID parser option allows to validate DTD embedded into XML:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE some_tag[
<!ELEMENT some_tag ...>
...
]>
<some_tag>...</some_tag>
So a workaround is to modify in-memory XML. Things become more complicated with
a parser used in "push mode". In push mode we have to detect whether the XML
declaration (<?xml ...?>), or start of the root element, then put our inline
DTD between them.
Could you suggest better solution?
EDIT
A workaround is to validate parsed XML posteriori as Daniel(_DV) suggested below.
Example: main.c, response.xml.
But I was searching for way to "embed" a DTD and validate XML "on-the-fly" while libxml2 parses XML chunk-by-chunk.
The following aproach doesn't work for me:
xmlCtxtUseOptions(ctxt, XML_PARSE_NOENT | XML_PARSE_NOWARNING | XML_PARSE_DTDVALID);
ctxt->sax->internalSubset = ngx_http_file_chunks_sax_internal_subset;
ctxt->sax->externalSubset = NULL;
$ ./parsexml
validity error : Validation failed: no DTD found !
<response>
^
Document is not valid

xmlValidateDtd allows to do DTD validation a posteriori of an already parsed XML document
to make sure it validates against the DTD. This will not use the internal subset...
http://xmlsoft.org/html/libxml-valid.html#xmlValidateDtd
See xmllint.c code in libxml2 for a full example of how to use it,
Daniel

Related

How to get over this SXXP0003 parser error?

I have an XML document which prolog looks like this :
<?xml version="1.0" encoding="utf-8" standalone="no"?>
...
This XML document is valid against the external DTD with the exact same prolog :
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE root [
...
]>
When I transform using Saxon (latest release):
$:/opt/tomcat/webapps/ROOT/$ java net.sf.saxon.Transform -s:pandora.xml -xsl:pandora.xsl -o:pandora.html
Error on line 1 column 53 of pandora.dtd:
SXXP0003 Error reported by XML parser: No more pseudo attributes are allowed.: No more
pseudo attributes are allowed.
org.xml.sax.SAXParseException; systemId: file:/opt/tomcat/webapps/ROOT/fred/pandora/dtd/pandora.dtd; lineNumber: 1; columnNumber: 53; No more pseudo attributes are allowed.
I am newbie and my research about this has only led to listing the pseudo-attributes in the order they actually are. If anybody have a clue there.
Edit
I have made other transformations using the same process with other projects without any problem. The only difference is in this problematic application, I make use of another namespace exsl to use a function not provided with version 1.0 (node-set). Everything else is similar.
For an external subset of the DTD the specification defines the format in https://www.w3.org/TR/xml/#NT-extSubset as
extSubset ::= TextDecl? extSubsetDecl
extSubsetDecl ::= ( markupdecl | conditionalSect | DeclSep)*
, for the "Text Declaration" in https://www.w3.org/TR/xml/#sec-TextDecl as TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>' so a standalone "pseudo" attribute is indeed not allowed there.
So make sure that your external DTD file does not repeat <!DOCTYPE root, it is just meant to contain declaration of markup, e.g. elements, attributes.
The error message you get comes anyway just from the XML parser and is not transformation/XSLT related.

SAXParseException when using restassured

I am trying to verify a XML response with rest-assured like this:
.then().body("some.xml.path", is("abc"));
However, what I get is a SAXParseException:
DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.]
Response starts like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE cXML SYSTEM "http://xml.cXML.org/schemas/cXML/1.2.021/cXML.dtd">
<cXML ...
Why am I getting this exception? What should I change?
I am using version 3.2.0 of rest-assured.
A similar question has been answered here. In short, the answer describes to use disableLoadingOfExternalDtd() to have RestAssured ignore the Document Type Definition in your XML.
Normally, the DTD would describe (using the external definition) the structural layout of the element defined as cXML.

Deserialize XML with UTF-16 encoding in ServiceStack.Text

I am trying to use ServiceStack.Text to deserialize some XML.
Code:
var buildEvent = dto.EventXml.FromXml<TfsEventBuildComplete>();
The opening xml line is:
<?xml version="1.0" encoding="UTF-16"?>
ServiceStack fails with the following error:
The encoding in the declaration 'utf-16' does not match the encoding of the document 'utf-8'.
I can see from the source of the Xml Serializer that ServiceStack uses UTF-8.
I am wondering whether ServiceStack.Text can deserialize UTF-16 and if so how? And if not, why not?
I have managed to hack my way around the issue. I'm not proud of it but....
var buildEvent = dto.EventXml.Replace("utf-16", "utf-8").FromXml<TfsEventBuildComplete>();

What does "Error parsing XML: not well-formed" mean?

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
android:orientation=”vertical”
android:layout_width=”fill_parent”
android:layout_height=”fill_parent” >
I get these two errors
error: Error parsing XML: not well-formed (invalid token)
&
Open quote is expected for attribute "android:orientation" associated with an element type "LinearLayout".
Did you copy and paste that from word? Your quotes look a little funky. Sometimes word will use a different character than the expected " for double quotes. Make sure those are all consistent. Otherwise, the syntax is invalid.
Looks like you have "smart quotes" ( not simple " double quotes) around some attributes in your LinearLayout element.
There are many references that explain the differences between valid and well formed XML documents. A good starting point can be found here. There is also an online XML Validator that you can use to test XML documents.
The validator shows that you have two issues:
Some of your attribute values use an invalid quote character: ” vs. ", and
you need to close the LinearLayout tag with /> instead of just >.

Is it possible to generate plain-old XML using Haml?

I've been working on a piece of software where I need to generate a custom XML file to send back to a client application. The current solutions on Ruby/Rails world for generating XML files are slow, at best. Using builder or event Nokogiri, while have a nice syntax and are maintainable solutions, they consume too much time and processing.
I definetly could go to ERB, which provides a good speed at the expense of building the whole XML by hand.
HAML is a great tool, have a nice and straight-forward syntax and is fairly fast. But I'm struggling to build pure XML files using it. Which makes me wonder, is it possible at all?
Does any one have some pointers to some code or docs showing how to do this, build a full, valid XML from HAML?
Doing XML in HAML is easy, just start off your template with:
!!! XML
which produces
<?xml version='1.0' encoding='utf-8' ?>
Then as #beanish said earlier, you "make up your own tags":
%test
%test2 hello
%item{:name => "blah"}
to get
<test>
<test2>hello</test2>
<item name='blah'></item>
</test>
More:
http://haml.info/docs/yardoc/file.REFERENCE.html#doctype_
%test
%test2 hello
%item{:name => "blah"}
run it through haml
haml hamltest.haml test.xml
open the file in a browser
<test>
<test2>hello</test2>
<item name='blah'></item>
</test>
The HAML reference talks about html tags and gives some examples.
HAML reference
This demonstrates some things that could use useful for xml documents:
!!! XML
%root{'xmlns:foo' => 'http://myns'}
-# Note: :dashed-attr is invalid syntax
%dashed-tag{'dashed-attr' => 'value'} Text
%underscore_tag Text
- ['apple', 'orange', 'pear'].each do |fruit|
- haml_tag(fruit, "Yummy #{fruit.capitalize}!", 'fruit-code' => fruit.upcase)
%foo:nstag{'foo:nsattr' => 'value'}
Output:
<?xml version='1.0' encoding='utf-8' ?>
<root xmlns:foo='http://myns'>
<dashed-tag dashed-attr='value'>Text</dashed-tag>
<underscore_tag>Text</underscore_tag>
<apple fruit-code='APPLE'>Yummy Apple!</apple>
<orange fruit-code='ORANGE'>Yummy Orange!</orange>
<pear fruit-code='PEAR'>Yummy Pear!</pear>
<foo:nstag foo:nsattr='value'></foo:nstag>
</root>
Look at the Haml::Helpers link on the haml reference for more methods like haml_tag.
If you want to use double-quotes for attributes,
See: https://stackoverflow.com/a/967065/498594
Or outside of rails use:
>> Haml::Engine.new("%tag{:name => 'value'}", :attr_wrapper => '"').to_html
=> "<tag name=\"value\"></tag>\n"
Haml can produce XML just as easily as HTML (I've used it for FBML and XHTML). What problems are you having?
I've not used HAML, but if you can't make it work another option is Builder.
what about creating the xml header, e.g. <?xml version="1.0" encoding="UTF-8"?>
?
It should be possible. After all you can create plain old XML with Notepad.

Resources