Error compiling w3c schemas with xmerl - erlang

I was trying to get XForms going on my Ubuntu desktop. There does not
appear to be much activity on XForms at the moment and I was trying to
get Backplanejs running. It did not work, and upon examining the JavaScript
I found it relied on Microsoft libraries and ActiveX.
Rather than learn JavaScript I decided to continue my Erlang education and
struggled with xmerl instead. I created a directory for schemas with an
index file. The contents of this directory are:
tony#blessing:~/workspace/myXformProject$ ls schemas
SchemaList.txt XForms-Schema.xsd xhtml-lat1.ent xml-events.xsd
SchemaList.txt~ xhtml1-strict.dtd xhtml-special.ent
These schemas were downloaded from the W3C. However, they would
not compile, yielding the error wfc_PEs_In_Internal_Subset. I would have
expected these well-established W3C schemas to compile with xmerl.
What am I doing wrong?
Tony Wallace
6> B.
[{"http://www.w3.org/1999/xhtml",
"schemas/xhtml1-strict.dtd"},
{"http://www.w3.org/2001/xml-events",
"schemas/xml-events.xsd"},
{"http://www.w3.org/2002/xforms",
"schemas/XForms-Schema.xsd"}]
9> {ok,S1} = xmerl_xsd:process_schemas(B).
3450- fatal: {error,{wfc_PEs_In_Internal_Subset}}
** exception exit: {fatal,{{error,{wfc_PEs_In_Internal_Subset}},
{file,"schemas/xhtml1-strict.dtd"},
{line,628},
{col,89}}}
in function xmerl_scan:fatal/2
in call from xmerl_scan:scan_entity/2
in call from xmerl_scan:scan_markup_decl/2
in call from xmerl_scan:scan_ext_subset/2
in call from xmerl_scan:scan_document/2
in call from xmerl_scan:file/2
in call from xmerl_xsd:process_schemas/2
The 3450 refers to the code line in xmerl_scan:
scan_entity_value("%" ++ _T,S=#xmerl_scanner{environment=prolog},_,_,_,_,_) ->
?fatal({error,{wfc_PEs_In_Internal_Subset}},S);
The error appears to be associated with line 628 of xhtml1-strict.dtd.
The column of 89 looks suspect, as line 628 is not that wide:
621 <!--
622 param is used to supply a named property value.
623 In XML it would seem natural to follow RDF and support an
624 abbreviated syntax where the param elements are replaced
625 by attribute value pairs on the object start tag.
626 -->
627 <!ELEMENT param EMPTY>
628 <!ATTLIST param
629 id ID #IMPLIED
630 name CDATA #IMPLIED
631 value CDATA #IMPLIED
632 valuetype (data|ref|object) "data"
633 type %ContentType; #IMPLIED
634 >
635
If you got this far down the post, many thanks!
Tony

You seem to be invoking xmerl_xsd:process_schemas on a collection of schema documents some of which are XSD schema documents and one of which is not an XSD schema document at all, but a document type definition file (xhtml1-strict.dtd). The process_schemas function expects XSD schema documents, which are XML document instances, but DTD files are not XML document instances. You will need to acquire an XSD schema for XHTML, not the DTD, if you want to do what you appear to want to do. Unfortunately, the XHTML WG's XSD schema documents are not the easiest things in the world to use; good luck.
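The distinction between a DTD and an XML document instance can be checked directly. The following is a minimal sketch in Java, using the JDK's built-in JAXP parser rather than xmerl; the class name DtdIsNotXml, the helper isWellFormedXml, and the two sample strings are made up for illustration. It only shows that DTD declarations on their own do not parse as an XML document, while an XSD schema document does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdIsNotXml {
    /** Returns true if the text parses as a well-formed XML document (hypothetical helper). */
    public static boolean isWellFormedXml(String text) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // the text is not a well-formed XML document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative samples only, not the real W3C files.
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                   + "<xs:element name=\"param\"/></xs:schema>";
        String dtd = "<!ELEMENT param EMPTY>\n<!ATTLIST param id ID #IMPLIED>";
        System.out.println("XSD parses as XML: " + isWellFormedXml(xsd));
        System.out.println("DTD parses as XML: " + isWellFormedXml(dtd));
    }
}
```

A check like this could be run over each file in the schema list before handing it to a schema processor, to spot entries that are not XSD documents at all.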
If you want to work with XForms, you might find it easier to get XSLTForms or Orbeon or BetterForms or EMC Formula working than you did to get backplane.js to work.

Related

Validate XML against schematron using SAXON EE edition

I am evaluating SAXON EE edition to validate XML against xsd and schematron.
Can someone help me with the following questions:
When validating an XML document against an XSD, can we also get the XPath of the error node along with the error in plain text? Currently I get only the error.
Can we validate XML against Schematron using Saxon EE? Any code sample would be a great help.
Thanks.
1. While validating xml document against xsd, can we also get xpath of that error node.
Yes, the error information includes an XPath reference to the invalid node (in most cases: there are some cases such as duplicate IDs where there isn't one specific node in error).
If you generate an XML validity report using SchemaValidator.SetValidityReporting() then the resulting report will include the path information. Here's an example:
<?xml version="1.0" encoding="UTF-8"?>
<validation-report xmlns="http://saxon.sf.net/ns/validation"
system-id="file:/Users/mike/repo2/samples/data/books-invalid.xml">
<error line="3"
column="17"
path="/Q{}BOOKLIST[1]/Q{}BOOKS[1]/@x"
xsd-part="1"
constraint="cvc-complex-type.3">Attribute @x is not allowed on element &lt;BOOKS&gt;</error>
<error line="10"
column="17"
path="/Q{}BOOKLIST[1]/Q{}BOOKS[1]/Q{}ITEM[1]/Q{}PRICE[1]"
xsd-part="2"
constraint="cvc-datatype-valid.1">The content "$0.2" of element &lt;PRICE&gt; does not match the required simple type. Cannot convert string to decimal: $0.2</error>
<error line="21"
column="20"
path="/Q{}BOOKLIST[1]/Q{}BOOKS[1]/Q{}ITEM[2]/Q{}PUB-DATE[1]"
xsd-part="2"
constraint="cvc-datatype-valid.1">The content "2002-02-31" of element &lt;PUB-DATE&gt; does not match the required simple type. Invalid date "2002-02-31" (Non-existent date)</error>
<error line="42"
column="22"
path="/Q{}BOOKLIST[1]/Q{}BOOKS[1]/Q{}ITEM[3]/Q{}REPUTATION[1]"
xsd-part="1"
constraint="cvc-complex-type.2.4">In content of element &lt;ITEM&gt;: The content model does not allow element &lt;REPUTATION&gt; to appear immediately after element &lt;WEIGHT&gt;. No further elements are allowed at this point. </error>
<meta-data>
<validator name="SAXON-EE" version="9.8.0.9"/>
<results errors="4" warnings="0"/>
<schema file="books.xsd" xsd-version="1.1"/>
<run at="2018-03-07T15:22:04.847Z"/>
</meta-data>
</validation-report>
You can also get the information if you supply an IInvalidityHandler as a callback to the SchemaValidator, though this requires a bit more digging. Saxon calls your IInvalidityHandler supplying a StaticError object (which is a bit of a misnomer). The StaticError object doesn't have the path information directly available, but it contains a reference to an XPathException object which can be cast to a ValidationException, and ValidationException has a method getPath() which returns this information if available.
2. Can we validate xml against schematron.
Saxon doesn't include a schematron validator per se, though many of the third-party tools that do schematron validation make use of Saxon "under the hood". I'm not up-to-date with the situation on .NET - but essentially there are two kinds of Schematron processor: those that generate XSLT code from the schematron schema (which typically use Saxon both to generate the XSLT and to execute it), and "native" processors. Searching for "schematron on .NET" gives you quite a number of projects, but I have no idea of their current status or quality.

Woodstox parser works fine in test run in Eclipse, but fails from command line

One of my JUnit tests uses (behind the scenes) the Woodstox parser.
When I run the test from within Eclipse, the test succeeds as expected.
But running the same test on the command line, using
mvn clean test -Dtest=com.example.MyClassTest#someParserTest
causes the test to fail with the following exception messages:
Error on line 114 column 21
SXXP0003: Error reported by XML parser: Invalid UTF-8 middle byte 0x3f (at char #4174, byte #3999)
...
at com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:314)
at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:205)
at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:55)
at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:961)
at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4580)
at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3657)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1063)
at com.ctc.wstx.sax.WstxSAXParser.fireEvents(WstxSAXParser.java:524)
at com.ctc.wstx.sax.WstxSAXParser.parse(WstxSAXParser.java:452)
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:440)
at net.sf.saxon.event.Sender.send(Sender.java:171)
at net.sf.saxon.jaxp.IdentityTransformer.transform(IdentityTransformer.java:363)
I took a look at the to-be-parsed InputStream. The InputStreams are identical in both cases.
Also, there is no "line 114 column 21" in the InputStream. Line 114 ends on column 11.
How can I investigate what causes the different behavior?
It turned out that a library I used made wrong assumptions about the environment's default character encoding (also called platform's default charset).
In the Eclipse environment, calling Charset.defaultCharset() returned UTF-8, while in the command line environment it returned CP1252.
Many standard and third-party Java APIs behave differently depending on the platform's default charset, among them:
String.getBytes()
ByteArrayOutputStream.toString()
XMLOutputFactory.createXMLStreamWriter(OutputStream stream)
IOUtils.toString(InputStream input)
To resolve my issue, I had to update that library to explicitly use the correct character set:
String.getBytes(StandardCharsets.UTF_8)
ByteArrayOutputStream.toString( StandardCharsets.UTF_8.name() )
XMLOutputFactory.createXMLStreamWriter( OutputStream stream, StandardCharsets.UTF_8.name() )
IOUtils.toString(InputStream input, StandardCharsets.UTF_8)
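To see why the platform default matters, here is a small sketch of the mismatch. The class name CharsetDemo and the helper recode are invented for illustration, and ISO-8859-1 stands in for CP1252 only because every JVM is guaranteed to ship it (the two agree on the character used here). It encodes a string with one charset and decodes the bytes with another, which is roughly what happens when one component relies on the default charset and another assumes UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    /** Encodes text with one charset and decodes the resulting bytes with another (hypothetical helper). */
    public static String recode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // contains one non-ASCII character

        // This value is what differed between Eclipse and the command line.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Same charset on both sides: the text survives the round trip.
        String same = recode(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("UTF-8 -> UTF-8 intact: " + same.equals(text));

        // Assumption: ISO-8859-1 stands in for CP1252. Decoding its bytes as
        // UTF-8 does not reproduce the original non-ASCII character.
        String mixed = recode(text, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println("ISO-8859-1 -> UTF-8 intact: " + mixed.equals(text));
    }
}
```

Plain ASCII text survives either way, which is why this kind of bug tends to surface only on inputs that contain accented or other non-ASCII characters.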

How to correctly use MIBs for SNMP?

I'm currently trying to write a bash-monitoring-script for a Fujitsu Primergy RX300 S6, running with XenServer 6.5.0.
After downloading the MIB files from the Fujitsu page, I get several errors when running the following line: snmpget -Ov -v 2c -c PUBLICKEY SERVER.IP SNMPv2-MIB::sysUpTime.0
I get the correct result, but along with it there are multiple errors like
Unlinked OID in VMWARE-TRAPS-MIB: vmware ::= { enterprises 6876 }
/usr/share/snmp/mibs/VMWARE-TRAPS-MIB.mib
Textual convention doesn't map to real type (DisplayString): At line 26 in usr/share/snmp/mibs/log3v1.mib
: (is a reserved word): At line 27 in /usr/share/snmp/mibs/log3v1.mib
: (is a reserved word): At line 28 in /usr/share/snmp/mibs/log3v1.mib
Unlinked OID in FSC-LOG3-MIB: sni ::= { enterprises 231 }
Undefined identifier: enterprises near line 13 of
[...]
I do understand that it says some definitions (from foreign MIBs) are missing, but how do I get the correct ones?
Check the IMPORTS section of the MIB files you are trying to use. These are basically your external dependencies. Try downloading those MIB modules from the vendor's website; a standard MIB file such as RFC1155-SMI or RFC1213 is easy to find via Google.
Here is an example:
IMPORTS
enterprises, OBJECT-TYPE
FROM RFC1155-SMI
DisplayString
FROM RFC1158-MIB;
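If there are many MIB files, listing their dependencies by hand gets tedious. The sketch below is one rough way to pull the module names out of an IMPORTS clause; the class name MibImports and the method importedModules are invented for illustration, and it uses plain regular expressions rather than a real MIB parser, so it assumes the clause contains no comments with a semicolon or the word FROM in them.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MibImports {
    // The IMPORTS clause runs from the keyword up to the first semicolon.
    private static final Pattern IMPORTS =
            Pattern.compile("\\bIMPORTS\\b(.*?);", Pattern.DOTALL);
    // Inside the clause, each "FROM <module>" names one dependency.
    private static final Pattern FROM =
            Pattern.compile("\\bFROM\\s+([A-Za-z][A-Za-z0-9-]*)");

    /** Returns the module names a MIB imports from, in order of appearance (hypothetical helper). */
    public static Set<String> importedModules(String mibText) {
        Set<String> modules = new LinkedHashSet<>();
        Matcher clause = IMPORTS.matcher(mibText);
        if (clause.find()) {
            Matcher from = FROM.matcher(clause.group(1));
            while (from.find()) {
                modules.add(from.group(1));
            }
        }
        return modules;
    }

    public static void main(String[] args) {
        // The IMPORTS example from the answer above.
        String mib = "IMPORTS\n"
                   + "  enterprises, OBJECT-TYPE\n"
                   + "    FROM RFC1155-SMI\n"
                   + "  DisplayString\n"
                   + "    FROM RFC1158-MIB;";
        System.out.println(importedModules(mib)); // [RFC1155-SMI, RFC1158-MIB]
    }
}
```

Each name it reports is a module that has to be present in the MIB search path (here /usr/share/snmp/mibs) for the importing file to load without "Unlinked OID" or "Undefined identifier" messages.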

NSXMLParser fails with NSXMLParserErrorDomain error 111; error 111 isn't defined?

A few people seem to have run into NSXMLParser error 111 before, but it's not defined in the constants. This answer seems to have mistaken 111 with 11: NSXMLParserErrorDomain 111
As far as I can tell, I have no illegal characters in my final xml:
<?xml version="1.0" encoding="utf-16"?><wsse:BinarySecurityToken wsu:Id="uuid:383b6148-1c27-45ab-963b-30e14af8154e" ValueType="http://schemas.xmlsoap.org/ws/2009/11/swt-token-profile-1.0" EncodingType="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-soap-message-security-1.0#Base64Binary" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd">aHR0cCUzYSUyZiUyZnNjaGVtYXMueG1sc29hcC5vcmclMmZ3cyUyZjIwMDUlMmYwNSUyZmlkZW50aXR5JTJmY2xhaW1zJTJmbmFtZWlkZW50aWZpZXI9V3dEM2ozRzBobjE0MWFndkNWJTJmWERadmgwJTJiQ0xHV1hBblRLTmM4Qjc3N1UlM2QmaHR0cCUzYSUyZiUyZnNjaGVtYXMubWljcm9zb2Z0LmNvbSUyZmFjY2Vzc2NvbnRyb2xzZXJ2aWNlJTJmMjAxMCUyZjA3JTJmY2xhaW1zJTJmaWRlbnRpdHlwcm92aWRlcj11cmklM2FXaW5kb3dzTGl2ZUlEJkF1ZGllbmNlPWh0dHBzJTNhJTJmJTJma21haW4ta2RzLWV1czItMC5jbG91ZGFwcC5uZXQlMmYmRXhwaXJlc09uPTEzOTQ3NjExODEmSXNzdWVyPWh0dHBzJTNhJTJmJTJmdG9sZWRvLmFjY2Vzc2NvbnRyb2wud2luZG93cy5uZXQlMmYmSE1BQ1NIQTI1Nj1iVTg4cWs2OFc3bmFxOEZFam1EVUFWSlQySzZ5cCUyYkxmdGR4SlFlWDhsYXclM2Q=</wsse:BinarySecurityToken>
I've also tried changing the encoding to utf-8, but it made no difference. What causes a parser to fail with error 111? Is the parser not set up correctly, or is the XML killing it?
In NSXMLParserError docs, it says:
The following error codes are defined by NSXMLParser. For error codes not listed here, see the <libxml/xmlerror.h> header file.
The number 111 isn't mentioned in this list, so we go to /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/libxml2/libxml/xmlerror.h, and find the value:
XML_ERR_USER_STOP, /* 111 */
There isn't a lot of documentation on XML_ERR_USER_STOP in libxml2, but from reading the changeset, it looks like it's a fast-fail when the parser sees an unexpected EOF.
Referred DebugCN and Internet.
Turns out I was simply passing in an entirely wrong string. The XML chunk came from a larger JSON structure, which I then processed down to extract only the XML part; yet when I initialized the parser, I used the wrong string to create the NSData. So make sure you aren't mixing variables up. I'm still not sure why error 111 isn't defined in the documentation, though.

F# integer file directive

I've been using fslex and fsyacc, and the F# source files (.fs) they generate from the lexer (.fsl) and parser (.fsp) rules refer to the original .fsl (and sometimes to the same .fs source file) all over the place with statements such as this (numbers are line numbers):
lex.fs
1 # 1 "/[PROJECT-PATH-HERE]/lex.fsp"
...
16 # 16 "/[PROJECT-PATH-HERE]/lex.fs"
17 // This is the type of tokens accepted by the parser
18 type token =
19 | EOF
...
Also, the .fs files generated from pars.fsp do the same kind of thing, but additionally refer to the F# signature file (.fsi) generated alongside them. What does any of this do/mean?
The annotations you see in the generated code are F# Compiler Directives (specifically, the 'line' directive).
The 'line' directive makes it so that when the F# compiler needs to emit a warning/error message for some part of the generated code, it has a way to determine which part of the original file corresponds to that part of the generated code. In other words, the F# compiler can generate a warning/error message referencing the original code which is the basis of the generated code causing the error.

Resources