GraphDB Workbench fails to load .owl ontology file from Protégé

I created an ontology with Protégé and saved it as an .owl file.
When I try to load it into GraphDB (8.9.0) through Workbench Import -> RDF -> Upload RDF files (the supported RDF formats are .ttl, .rdf, .rj, .n3, .nt, .nq, .trig, .trix, .brf, .owl and .jsonld, as well as their .gz versions and .zip archives), I get this error:
RDF Parse Error: Content is not allowed in prolog. [line 1, column 1].
The full log:
01:00:37.877 [import-task-Accounting-1] ERROR o.e.r.rio.helpers.ParseErrorLogger - [Rio fatal] Content is not allowed in prolog. (1, 1)
01:00:37.879 [import-task-Accounting-1] ERROR c.o.f.impex.FileImportRunnableTask - RDF Parse Error
org.eclipse.rdf4j.rio.RDFParseException: Content is not allowed in prolog. [line 1, column 1]
at org.eclipse.rdf4j.rio.helpers.RDFParserHelper.reportFatalError(RDFParserHelper.java:442)
at org.eclipse.rdf4j.rio.helpers.AbstractRDFParser.reportFatalError(AbstractRDFParser.java:783)
at org.eclipse.rdf4j.rio.rdfxml.RDFXMLParser.reportFatalError(RDFXMLParser.java:1176)
at org.eclipse.rdf4j.rio.rdfxml.RDFXMLParser.fatalError(RDFXMLParser.java:1315)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:180)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:994)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at org.eclipse.rdf4j.rio.rdfxml.RDFXMLParser.parse(RDFXMLParser.java:265)
at org.eclipse.rdf4j.rio.rdfxml.RDFXMLParser.parse(RDFXMLParser.java:207)
at org.eclipse.rdf4j.repository.util.RDFLoader.loadInputStreamOrReader(RDFLoader.java:286)
at org.eclipse.rdf4j.repository.util.RDFLoader.load(RDFLoader.java:197)
at org.eclipse.rdf4j.repository.base.AbstractRepositoryConnection.add(AbstractRepositoryConnection.java:329)
at com.ontotext.trree.monitorRepository.MonitorRepositoryConnection.add(MonitorRepositoryConnection.java:177)
at com.ontotext.trree.parallel.ParallelRDFLoader.add(ParallelRDFLoader.java:125)
at com.ontotext.trree.parallel.ParallelRDFLoader.add(ParallelRDFLoader.java:54)
at com.ontotext.forest.impex.ParallelAwareImporter.lambda$add$5(ParallelAwareImporter.java:107)
at com.ontotext.forest.impex.ParallelAwareImporter.wrapInBeginCommit(ParallelAwareImporter.java:128)
at com.ontotext.forest.impex.ParallelAwareImporter.add(ParallelAwareImporter.java:89)
at com.ontotext.forest.impex.FileImportRunnableTask.load(FileImportRunnableTask.java:36)
at com.ontotext.forest.impex.ImportRunnableTask.run(ImportRunnableTask.java:82)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
... 29 common frames omitted
Any help/advice/hint will be much appreciated.

The issue is that GraphDB assumes .owl files are written in RDF/XML and therefore tries to parse them as such. However, as the OWL 2 Conformance document notes (section 2.2 Tool Conformance):
A conformant OWL 2 tool may also accept ontology documents using other serializations, for example Turtle ...
In fact, an OWL ontology can be written in five different standard syntaxes: Functional-Style, RDF/XML, OWL/XML, Turtle, and Manchester. Protégé saved this ontology in Turtle, so GraphDB fails when it tries to parse the .owl file as XML.
Workaround: just change the file extension from .owl to .ttl and GraphDB will happily load it.
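If you would rather keep the .owl extension, you can also convert the file to RDF/XML first. A minimal sketch using rdflib (my addition, not part of the original answer; the file names are placeholders):

from rdflib import Graph

g = Graph()
g.parse("ontology.owl", format="turtle")  # parse as Turtle despite the .owl name
g.serialize(destination="ontology-xml.owl", format="xml")  # write RDF/XML

The rewritten file then matches the syntax GraphDB expects for the .owl extension.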

My guess is that the file starts with a BOM character, U+FEFF: note the error at line 1, column 1. This is a zero-width space sometimes used to mark a file as being in some Unicode encoding (UTF-8, UTF-16LE, UTF-16BE). Can you share the first line of the file?
You can try removing the BOM at the beginning of the file, or re-saving the file in Notepad++ with Encoding > Encode in UTF-8 selected (that variant writes no BOM).
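To strip a UTF-8 BOM programmatically, here is a minimal sketch (my addition, not from the thread; the file name is a placeholder):

path = "ontology.owl"
with open(path, "rb") as f:
    data = f.read()
if data.startswith(b"\xef\xbb\xbf"):  # UTF-8 BOM: EF BB BF
    with open(path, "wb") as f:
        f.write(data[3:])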

Related

JDOM2 Outputter inserting 4 errant bytes at start of XML file

I'm using Java 8 and JDOM 2.0.6 (Mac Yosemite + Eclipse) to generate an XML file.
The prolog of the file comes out with these bytes preceding <?xml:
C2 A8 C3 8C
I'm using XMLOutputter.output() to write out the Document. When I direct the output to the console, it comes out correctly. When I direct it to a file, the errant bytes are inserted.
Relevant code:
private Document outputDoc = new Document();
outputDoc.setRootElement(new Element("GraphicalAlgorithm_" + challengeID, DFG2D_NAMESPACE));
outputDoc.getRootElement().addContent(..my Element...);
XMLOutputter xmlOutputter = new XMLOutputter(Format.getPrettyFormat());
// TEST ONLY: writes to console
xmlOutputter.output(outputDoc, System.out);
xmlOutputter.output(outputDoc, fileStream);
I'm stumped on this one.
I stepped into this minefield by copying and pasting the "file output" method I had previously been using for serialization (.ser) output.
The errant 4 bytes are the "magic ID" that Java serialization stamps into a FileOutputStream that has previously been attached to an ObjectOutputStream (the specialized outputter used for serialization calls such as writeObject(obj)). One might assume the serializer only marks the file once you actually invoke writeObject(obj), but no: the header is written as soon as the ObjectOutputStream is constructed. The fix is to give the XMLOutputter a fresh FileOutputStream that has never been wrapped by an ObjectOutputStream.
A complete analysis/write-up and repair can be found here:
https://github.com/hunterhacker/jdom/issues/193
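To confirm what is actually sitting at the start of such a file, a quick hex dump of the first few bytes helps. A minimal sketch (my addition; the file name is a placeholder):

with open("output.xml", "rb") as f:  # the file whose prolog looks wrong
    head = f.read(16)
print(" ".join(f"{b:02X}" for b in head))

An XML file without junk should begin with 3C 3F ("<?"), possibly preceded by a UTF-8 BOM (EF BB BF).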

Problem SamplingRateCalculatorList (00000283DDC3C0E0): All classes are empty! OTB + QGis

I use OTB (Orfeo ToolBox) in QGis for classification. When I use the ImageTrainClassifier tool in a batch process, I have a problem with some images. Instead of returning a model in an xml/txt file, it returns several files with these extensions: .xml_rates_1, .xml_samples_1.dbf, .xml_samples_1.prj, .xml_samples_1.shp, .xml_samples_1.shx, .xml_stats_1 (I get the same files with txt instead of xml if I use the txt file format as output).
During the execution of the algorithm, I get only one warning message:
(WARNING): file ..\Modules\Learning\Sampling\src\otbSamplingRateCalculatorList.cxx, line 99, SamplingRateCalculatorList (00000283DDC3C0E0): All classes are empty !
And after that:
(FATAL) TrainImagesClassifier: No samples found in the inputs!
The problem is that I next want to use ImageClassifier, which takes the model from ImageTrainClassifier as input, and I don't have that model.
Thanks for your help

[Errno -101] NetCDF: HDF error when opening netcdf file

I have this error when opening my netcdf file.
The code was working before.
How do I fix this?
Traceback (most recent call last):
  File "", line 1, in
  ...
  File "file.py", line 71, in gather_vgt
    return xr.open_dataset(filename)
  File "/.../lib/python3.6/site-packages/xarray/backends/api.py", line 286, in open_dataset
    autoclose=autoclose)
  File "/.../lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 275, in open
    ds = opener()
  File "/.../lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 199, in _open_netcdf4_group
    ds = nc4.Dataset(filename, mode=mode, **kwargs)
  File "netCDF4/_netCDF4.pyx", line 2015, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4/_netCDF4.pyx", line 1636, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: b'file.nc'
When I try to open the same netcdf file with h5py I get this error:
OSError: Unable to open file (file locking disabled on this file system (use HDF5_USE_FILE_LOCKING environment variable to override), errno = 38, error message = '...')
You must be in this situation:
your HDF5 library has been updated to 1.10.1 (netCDF uses HDF5 under the hood), and
your file system does not support the file locking that the HDF5 library uses.
In order to read your HDF5 or netCDF files, you need to set this environment variable:
HDF5_USE_FILE_LOCKING=FALSE
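In Python, the variable is typically set before the HDF5-backed libraries are imported; a minimal sketch (my addition; the file name is taken from the traceback above):

import os
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"  # must happen before the import below

import xarray as xr  # imported only after the variable is set
ds = xr.open_dataset("file.nc")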
For reference, this was introduced in HDF5 version 1.10.1:

Added a mechanism for disabling the SWMR file locking scheme. The file locking calls used in HDF5 1.10.0 (including patch1) will fail when the underlying file system does not support file locking or where locks have been disabled. To disable all file locking operations, an environment variable named HDF5_USE_FILE_LOCKING can be set to the five-character string 'FALSE'. This does not fundamentally change HDF5 library operation (aside from initial file open/create, SWMR is lock-free), but users will have to be more careful about opening files to avoid problematic access patterns (i.e.: multiple writers) that the file locking was designed to prevent.

Additionally, the error message that is emitted when file lock operations set errno to ENOSYS (typical when file locking has been disabled) has been updated to describe the problem and potential resolution better.

(DER, 2016/10/26, HDFFV-9918)
In my case, the solution suggested by Florian did not work. I found another solution, which suggests that the order in which h5py and netCDF4 are imported matters (see here).
And, indeed, the following works for me:
from netCDF4 import Dataset
import h5py
Switching the order results in the error described by the OP.

Using Jena to convert an owl file to N-Triples from terminal returns an empty file

I have generated an .owl file using this generator: http://swat.cse.lehigh.edu/projects/lubm/
I want to transform the file into N-Triples, and have done it before using
$ riot -out N-TRIPLE ~/lubm20/*.owl > lubm20.nt
For some reason I now get an empty file (lubm20.nt),
and when I use
$ rdfcat -out N-TRIPLE ~/lubm20/*.owl > lubm20.nt
I get this error:
Exception in thread "main" org.apache.jena.riot.RiotException: <file:///root/lubm20/classes\University0_0.owl> Code: 4/UNWISE_CHARACTER in PATH: The character matches no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
at org.apache.jena.riot.system.IRIResolver.exceptions(IRIResolver.java:371)
at org.apache.jena.riot.system.IRIResolver.resolve(IRIResolver.java:328)
at org.apache.jena.riot.system.IRIResolver$IRIResolverSync.resolve(IRIResolver.java:489)
at org.apache.jena.riot.system.IRIResolver.resolveIRI(IRIResolver.java:254)
at org.apache.jena.riot.system.IRIResolver.resolveString(IRIResolver.java:233)
at org.apache.jena.riot.SysRIOT.chooseBaseIRI(SysRIOT.java:109)
at org.apache.jena.riot.adapters.AdapterFileManager.readModelWorker(AdapterFileManager.java:286)
at org.apache.jena.util.FileManager.readModel(FileManager.java:341)
at jena.rdfcat.readInput(rdfcat.java:328)
at jena.rdfcat$ReadAction.run(rdfcat.java:473)
at jena.rdfcat.go(rdfcat.java:231)
at jena.rdfcat.main(rdfcat.java:206)
The generator produces a well-known Semantic Web benchmark dataset, so how can it contain UNWISE_CHARACTERs?
Edit: to answer the question asked,
I used this line to generate the *.owl files:
java edu.lehigh.swat.bench.uba.Generator -onto http://swat.cse.lehigh.edu/onto/univ-bench.owl univ 20
then moved the *.owl files to the lubm20 folder.
I used rdf2rdf instead of Jena:
java -jar rdf2rdf-1.0.1-2.3.1.jar /lubmData/lubm100/*.owl lubm100.nt
It worked like a charm.
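Incidentally, the UNWISE_CHARACTER in the rdfcat error is most likely the backslash in <file:///root/lubm20/classes\University0_0.owl>: a Windows path separator leaked into the file: IRI, and backslashes are not legal IRI characters. As another alternative, here is a minimal sketch using rdflib (my suggestion, not part of the original answer; paths are placeholders, and it assumes the generator's .owl output is RDF/XML):

import glob
from rdflib import Graph

g = Graph()
for path in sorted(glob.glob("lubm20/*.owl")):
    g.parse(path, format="xml")  # LUBM's .owl files are RDF/XML
g.serialize(destination="lubm20.nt", format="nt")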

Identifying source of parser errors in Apache Fuseki

I am getting the following error in trying to load a large RDF/XML document into Fuseki:
> Code: 4/UNWISE_CHARACTER in PATH: The character matches no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
How do I find out what line contains the offending error?
I have tried turning up the output in log4j.properties, and I also tried validating the RDF/XML file using the Jena command-line rdfxml tool (as well as utf8 and riot); it validates with no errors reported. But I'm new to this toolset.
(version?)
Check the ""-strings in your RDF/XML data for undesiravle URIs - especially spaces in URIs.
Best to validate before loading : try riot YourFile and send stderr and stdout to a file. The errors should be approximately in the position of the parser output (N-triples) at the time.
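To locate the offending line yourself, one option is to scan the rdf:about / rdf:resource attribute values for characters riot commonly flags, such as spaces and backslashes. A hypothetical helper (my addition, not from the original answer):

import re
import sys

BAD_IRI = re.compile(r'(?:rdf:about|rdf:resource)="([^"]*[ \\<>{}|^`][^"]*)"')

with open(sys.argv[1], encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        for match in BAD_IRI.finditer(line):
            print(f"line {lineno}: {match.group(1)}")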
