ERROR [stderr] (default task-1) org.xml.sax.SAXParseException; systemId: - xml-parsing

I'm trying to parse an XML file using the code below:
if(f.exists())
{
doc = dBuilder.parse("/mySystem/Config/data/Settings/print-settings.xml");
}
but I get the following error while parsing the XML, and I'm not sure why:
More pseudo attributes are expected.
org.xml.sax.SAXParseException; systemId:
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)

In my XML, the encoding attribute was missing from the XML declaration. Adding it resolved the error:
<?xml version="1.0" encoding="utf-8"?>
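A quick way to check a declaration before deploying the file is to parse a small document in isolation. The sketch below (class and method names are hypothetical) parses in-memory XML with the JDK's built-in DocumentBuilder and returns the root element name, or null if the parser rejects the document. Note the straight ASCII quotes: typographic quotes such as “1.0”, which easily sneak in when a declaration is copied from a web page, are themselves rejected by the parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class XmlDeclarationCheck {
    /**
     * Parses the given XML text and returns the root element name,
     * or null if the parser rejects the document as not well-formed.
     */
    static String rootElementName(String xml) {
        try {
            DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing "[Fatal Error]" to stderr.
            dBuilder.setErrorHandler(new DefaultHandler());
            Document doc = dBuilder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getNodeName();
        } catch (SAXException e) {
            return null; // not well-formed, e.g. a broken XML declaration
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String straight = "<?xml version=\"1.0\" encoding=\"utf-8\"?><settings/>";
        String curly = "<?xml version=\u201C1.0\u201D encoding=\u201Cutf-8\u201D?><settings/>";
        System.out.println(rootElementName(straight)); // settings
        System.out.println(rootElementName(curly));    // null
    }
}
```

The same check can be pointed at the real file's first lines to confirm the declaration is accepted before calling dBuilder.parse() on the full path.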

Related

Issue with registering Java TwelveMonkeys ImageIO plugins for a deep-learning app

I am trying to register the following service providers for a servlet and am getting an exception. The code is:
static {
IIORegistry registry = IIORegistry.getDefaultInstance();
registry.registerServiceProvider(new com.twelvemonkeys.servlet.image.IIOProviderContextListener());
registry.registerServiceProvider(new com.twelvemonkeys.imageio.plugins.jpeg.JPEGImageReaderSpi());
registry.registerServiceProvider(new com.twelvemonkeys.imageio.plugins.jpeg.JPEGImageWriterSpi());
}
I am getting the following exception. Oddly, I am only using the reader, not the writer.
I am using version 3.6 of TwelveMonkeys.
Thanks for any hints!
Exception in thread "main" java.lang.NoSuchMethodError: com.twelvemonkeys.imageio.util.IIOUtil.lookupProviderByName(Ljavax/imageio/spi/ServiceRegistry;Ljava/lang/String;Ljava/lang/Class;)Ljava/lang/Object;
at com.twelvemonkeys.imageio.plugins.jpeg.JPEGImageWriterSpi.onRegistration(JPEGImageWriterSpi.java:82)
at javax.imageio.spi.SubRegistry.registerServiceProvider(Unknown Source)
at javax.imageio.spi.ServiceRegistry.registerServiceProvider(Unknown Source)
at javax.imageio.spi.IIORegistry$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at javax.imageio.spi.IIORegistry.registerInstalledProviders(Unknown Source)
at javax.imageio.spi.IIORegistry.registerStandardSpis(Unknown Source)
at javax.imageio.spi.IIORegistry.<init>(Unknown Source)
at javax.imageio.spi.IIORegistry.getDefaultInstance(Unknown Source)
at deeplearningtest.test.<clinit>(test.java:32)
OK, I solved the problem. I went to https://github.com/haraldk/TwelveMonkeys#manual-dependency-example and re-downloaded all the JARs listed there, paying close attention to the versions to make sure 3.6 was selected, since the version is not shown as part of the JAR name (which I liked). After restarting Eclipse I got past the problem.
Many thanks haraldK!
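A NoSuchMethodError at runtime generally means a class was compiled against a different version of another class than the one actually loaded, which fits the fix above (JARs from mismatched versions). When versions are not visible in the JAR file names, a small diagnostic like the sketch below can show which JAR each class was loaded from. The class names in main are taken from the stack trace; which artifact contains each one, and whether the TwelveMonkeys manifests declare an Implementation-Version, are assumptions, so the code falls back to printing only the location.

```java
import java.security.CodeSource;

class ClasspathVersionCheck {
    /**
     * Reports where a class was loaded from and the Implementation-Version of its
     * package (only available if the JAR manifest declares one).
     */
    static String describe(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class<?> c = Class.forName(className, false, ClasspathVersionCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            String location = (src == null || src.getLocation() == null)
                    ? "<JDK/bootstrap>" : src.getLocation().toString();
            Package pkg = c.getPackage();
            String version = (pkg == null || pkg.getImplementationVersion() == null)
                    ? "<no version in manifest>" : pkg.getImplementationVersion();
            return className + " -> " + location + " (" + version + ")";
        } catch (ClassNotFoundException e) {
            return className + " -> NOT ON CLASSPATH";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "com.twelvemonkeys.imageio.util.IIOUtil",                    // assumed: imageio-core
            "com.twelvemonkeys.imageio.plugins.jpeg.JPEGImageWriterSpi", // assumed: imageio-jpeg
        };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

If the printed locations point at JARs from different releases, replacing them with one consistent set is the fix described above.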

How to deal with CoderException: cannot encode a null String in Scio

I just started using Scio and Dataflow. When I ran my code on a single input file it worked fine, but when I added more files to the input I got the following exception:
java.lang.RuntimeException: org.apache.beam.sdk.coders.CoderException: cannot encode a null String
at org.apache.beam.runners.dataflow.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:280)
at org.apache.beam.runners.core.SimpleDoFnRunner.outputWindowedValue(SimpleDoFnRunner.java:309)
at org.apache.beam.runners.core.SimpleDoFnRunner.access$700(SimpleDoFnRunner.java:77)
at org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:621)
at org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:609)
at com.spotify.scio.util.Functions$$anon$3.processElement(Functions.scala:158)
Caused by: org.apache.beam.sdk.coders.CoderException: cannot encode a null String
at org.apache.beam.sdk.coders.StringUtf8Coder.getEncodedElementByteSize(StringUtf8Coder.java:136)
at org.apache.beam.sdk.coders.StringUtf8Coder.getEncodedElementByteSize(StringUtf8Coder.java:37)
at org.apache.beam.sdk.coders.Coder.registerByteSizeObserver(Coder.java:291)
at com.spotify.scio.coders.RecordCoder.registerByteSizeObserver(Coder.scala:279)
at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:564)
at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:480)
at org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$ElementByteSizeObservableCoder.registerByteSizeObserver(IntrinsicMapTaskExecutorFactory.java:399)
at org.apache.beam.runners.dataflow.worker.util.common.worker.OutputObjectAndByteCounter.update(OutputObjectAndByteCounter.java:125)
at org.apache.beam.runners.dataflow.worker.DataflowOutputCounter.update(DataflowOutputCounter.java:64)
at org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:43)
at org.apache.beam.runners.dataflow.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:272)
at org.apache.beam.runners.core.SimpleDoFnRunner.outputWindowedValue(SimpleDoFnRunner.java:309)
at org.apache.beam.runners.core.SimpleDoFnRunner.access$700(SimpleDoFnRunner.java:77)
at org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:621)
at org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:609)
at com.spotify.scio.util.Functions$$anon$3.processElement(Functions.scala:158)
at com.spotify.scio.util.Functions$$anon$3$DoFnInvoker.invokeProcessElement(Unknown Source)
at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:275)
at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:240)
at org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:325)
at org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:44)
at org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:49)
at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:201)
at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
at org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:76)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:394)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:363)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:291)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I suspect one of my input files contains malformed data. But how can I bypass the bad data? There is a similar question for the Java Beam SDK: com.google.cloud.dataflow.sdk.coders.CoderException: cannot encode a null String
So I tried this:
val scText = sc.textFile(input)
scText.setCoder(NullableCoder.of(StringUtf8Coder.of()))
It didn't help. Can anyone help with this? Thanks.
The Scio team provided a solution to this problem: add --nullableCoders=true to the command-line arguments.
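Conceptually, a plain UTF-8 string coder has no byte sequence that means "null", so a nullable coder wraps it and writes a marker before each value. The plain-Java sketch below illustrates that idea (one marker byte, then a length-prefixed UTF-8 payload); it is not Beam's NullableCoder or Scio's implementation, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class NullableStringCodec {
    private static final int NULL_MARKER = 0;
    private static final int PRESENT_MARKER = 1;

    /** Encodes null as a single marker byte, otherwise marker + 4-byte length + UTF-8 bytes. */
    static byte[] encode(String value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (value == null) {
                out.writeByte(NULL_MARKER);
            } else {
                byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
                out.writeByte(PRESENT_MARKER);
                out.writeInt(utf8.length);
                out.write(utf8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    /** Reverses encode(): returns null when the null marker is found. */
    static String decode(byte[] encoded) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            int marker = in.readUnsignedByte();
            if (marker == NULL_MARKER) {
                return null;
            }
            if (marker != PRESENT_MARKER) {
                throw new IllegalArgumentException("unknown marker byte: " + marker);
            }
            byte[] utf8 = new byte[in.readInt()];
            in.readFully(utf8);
            return new String(utf8, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The cost of this design is one extra byte per element, which is presumably why nullable handling is opt-in rather than the default.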

Story Not Found error while running a test using JBehave and JUnit

I have configured my JBehave test project, which has a .story file. I tried using configuration settings I found on the internet, but when I run the tests I get an error. The stack trace is shown below:
org.jbehave.core.io.StoryResourceNotFound: Story path 'D:\AutoRegression8.8\NewProject\src\BusinessCase1.story' not found by class loader sun.misc.Launcher$AppClassLoader@631d75b9
at org.jbehave.core.io.LoadFromClasspath.resourceAsStream(LoadFromClasspath.java:80)
at org.jbehave.core.io.LoadFromClasspath.loadResourceAsText(LoadFromClasspath.java:65)
at org.jbehave.core.io.LoadFromClasspath.loadStoryAsText(LoadFromClasspath.java:74)
at org.jbehave.core.embedder.PerformableTree.storyOfPath(PerformableTree.java:261)
at org.jbehave.core.embedder.StoryManager.storyOfPath(StoryManager.java:61)
at org.jbehave.core.embedder.StoryManager.storiesOf(StoryManager.java:92)
at org.jbehave.core.embedder.StoryManager.runStoriesAsPaths(StoryManager.java:86)
at org.jbehave.core.embedder.Embedder.runStoriesAsPaths(Embedder.java:213)
at org.jbehave.core.junit.JUnitStories.run(JUnitStories.java:20)
at TestRunner.run(TestRunner.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
However, my .story file is exactly at the location where the code looks for it. To find the .story file, I use the code below:
@Override
protected List<String> storyPaths() {
    /*
     * return new StoryFinder().findPaths(
     *     CodeLocations.codeLocationFromClass(this.getClass()), "**.story",
     *     "");
     */
    String placetoSearch = System.getProperty("user.dir") + "\\src\\BusinessCase1.story";
    /* return new StoryFinder().findPaths(CodeLocations.codeLocationFromClass(this.getClass()), placetoSearch, ""); */
    return Arrays.asList(placetoSearch);
}
Any help or reference in this regard would be appreciated.
There's a difference between looking for a file and looking for a resource.
JBehave uses the classloader you set it up with to look for the story as a resource. A resource is normally part of the packages you're running. That means it needs a filename relative to the root of your classes, rather than an absolute path.
(If you were using myClass.getResource() rather than myClassLoader.getResource() it would be relative to your class.)
Resource names use Unix-style forward slashes, and names passed to a class loader are relative to the classpath root without a leading slash. Try "BusinessCase1.story" as the story path.
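To see the difference between a file lookup and the two kinds of resource lookup, the sketch below looks up this class's own .class file, which is guaranteed to be on the classpath. The class and method names are hypothetical. In the JBehave setup, storyPaths() would then return a classpath-relative name such as "BusinessCase1.story", assuming the IDE copies .story files from src to the output folder.

```java
import java.io.File;

class ResourceLookup {
    /** ClassLoader lookup: names are relative to the classpath root, with no leading "/". */
    static boolean loaderFinds(String resourceName) {
        return ResourceLookup.class.getClassLoader().getResource(resourceName) != null;
    }

    /** Class lookup: a leading "/" means the classpath root, otherwise relative to the class's package. */
    static boolean classFinds(String resourceName) {
        return ResourceLookup.class.getResource(resourceName) != null;
    }

    /** Plain filesystem check, which is what an absolute path like D:\...\BusinessCase1.story needs. */
    static boolean fileExists(String path) {
        return new File(path).isFile();
    }

    public static void main(String[] args) {
        // This class's own .class file is a resource that is certainly on the classpath.
        String own = ResourceLookup.class.getName().replace('.', '/') + ".class";
        System.out.println(loaderFinds(own));       // true
        System.out.println(loaderFinds("/" + own)); // false: ClassLoader names have no leading slash
        System.out.println(classFinds("/" + own));  // true: Class.getResource strips the slash
    }
}
```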

Having trouble with the GPC rendering plugin for a Grails 3 app

I am trying to migrate an old Grails app to Grails 3; I am using Grails 3.0.10. In my old app I used the rendering plugin to generate PDFs, and I have a bunch of PDFs built this way that I would like to keep intact, so I'm trying to get the rendering plugin installed in my Grails 3 app. As suggested, I have added the following line to the dependencies in my build.gradle:
compile "org.grails.plugins:rendering:2.0.0-SNAPSHOT"
This seems to pull the plugin JAR correctly, and run-app works. However, when I try to render a GSP as a PDF from my controller, I get a NullPointerException thrown by code inside the rendering plugin.
Here's my code to generate a PDF from a controller method:
renderPdf(template: "/pdfs/test", model: [name : 'Amarish'], filename: 'Hello-There.pdf')
Since that did not work, I also tried a different approach: injecting pdfRenderingService into the controller and calling it directly:
ByteArrayOutputStream bytes = pdfRenderingService.render(template: "/pdfs/test", model: [name: 'Amarish'])
response.setContentLength(bytes.size())
response.setContentType('application/pdf')
response.outputStream.write(bytes.toByteArray())
I am including the stack trace below. Can you please let me know how I could correct this issue?
ERROR org.grails.web.errors.GrailsExceptionResolver - NullPointerException occurred when processing request: [GET] /test/testPDF
Stacktrace follows:
java.lang.reflect.InvocationTargetException: null
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_79]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
Caused by: java.lang.NullPointerException: null
at java.beans.Introspector.getPublicDeclaredMethods(Introspector.java:1281) ~[na:1.7.0_79]
at java.beans.Introspector.getTargetMethodInfo(Introspector.java:1141) ~[na:1.7.0_79]
at java.beans.Introspector.getBeanInfo(Introspector.java:416) ~[na:1.7.0_79]
at java.beans.Introspector.getBeanInfo(Introspector.java:163) ~[na:1.7.0_79]
at grails.plugins.rendering.document.RenderEnvironment.init(RenderEnvironment.groovy:31) ~[rendering-2.0.0-SNAPSHOT.jar:na]
at grails.plugins.rendering.document.RenderEnvironment.with(RenderEnvironment.groovy:68) ~[rendering-2.0.0-SNAPSHOT.jar:na]
at grails.plugins.rendering.document.RenderEnvironment.with(RenderEnvironment.groovy:60) ~[rendering-2.0.0-SNAPSHOT.jar:na]
at grails.plugins.rendering.document.XhtmlDocumentService.generateXhtml(XhtmlDocumentService.groovy:65) ~[rendering-2.0.0-SNAPSHOT.jar:na]
at grails.plugins.rendering.document.XhtmlDocumentService.createDocument(XhtmlDocumentService.groovy:35) ~[rendering-2.0.0-SNAPSHOT.jar:na]
at grails.plugins.rendering.RenderingService.render(RenderingService.groovy:36) ~[rendering-2.0.0-SNAPSHOT.jar:na]
at grails.plugins.rendering.RenderingService.render(RenderingService.groovy:35) ~[rendering-2.0.0-SNAPSHOT.jar:na]
at grails.plugins.rendering.RenderingService.render(RenderingService.groovy:65) ~[rendering-2.0.0-SNAPSHOT.jar:na]
at grails.plugins.rendering.RenderingTrait$Trait$Helper.renderPdf(RenderingTrait.groovy:47) ~[rendering-2.0.0-SNAPSHOT.jar:na]
at com.svp.controller.TestController$_closure1.doCall(TestController.groovy:14) ~[main/:na]
... 3 common frames omitted
What you need in your build.gradle dependencies is
runtime "org.springframework:spring-test:4.2.1.RELEASE"
and your code should work fine. Good luck!
You can also add the latest version from here

Disabling XML validation in WebHarvest

I have a mobile application already published in Apple's App Store.
This SPI client app uses a REST API on the server side to retrieve real-time bus arrival information for a specific bus stop.
The app had been working like a charm for 6 months.
The REST API uses WebHarvest to scrape the data from a website (for instance: http://www.metlink.org.nz/stop/4912/departures).
A few days ago, the HTML page scraped by my server-side code changed: it now includes the following line:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
Since then, my app has stopped working.
I know I can strip the line above using a regular expression, but I would like to know whether there is a way to tell WebHarvest to disable XML validation. That way I wouldn't need to go through every configuration I have and add a regex step to strip the line before each XPath expression.
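For reference, the regular-expression workaround mentioned above amounts to one replacement applied to the fetched HTML before the XPath step. A minimal Java sketch, with a hypothetical class name, assuming the DOCTYPE has no internal [ ... ] subset (true for the line shown):

```java
import java.util.regex.Pattern;

class DoctypeStripper {
    // A <!DOCTYPE ...> declaration without an internal subset, plus any trailing whitespace.
    private static final Pattern DOCTYPE =
            Pattern.compile("<!DOCTYPE[^>\\[]*>\\s*", Pattern.CASE_INSENSITIVE);

    /** Removes the first DOCTYPE declaration; input without one is returned unchanged. */
    static String stripDoctype(String html) {
        return DOCTYPE.matcher(html).replaceFirst("");
    }

    public static void main(String[] args) {
        String page = "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" "
                + "\"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body/></html>";
        System.out.println(stripDoctype(page)); // <html><body/></html>
    }
}
```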
Here is my configuration file:
<config charset="UTF-8">
    <var-def name="pageContentStr">
        <html-to-xml>
            <http url="http://www.metlink.org.nz/stop/${stationID.toString()}/departures" />
        </html-to-xml>
    </var-def>
    <var-def name="serverTime">
        <xpath expression="/html/body/ul/li/span/text()">
            <var name="pageContentStr" />
        </xpath>
    </var-def>
    <var-def name="busRTI">
        <xpath expression="//tbody/tr[@data-code]/concat(td[1]/a[starts-with(@href,'timetables/')]/span/text(),'::',td[1]/a[starts-with(@href,'timetables/bus/')]/span/attribute::style,'::',td[2]/span/text(),'::',td[3]/span/text())">
            <var name="pageContentStr" />
        </xpath>
    </var-def>
</config>
The config file above works fine when I run it inside the WebHarvest GUI (weird). However, I get an error when running it inside my REST API. Here is the error:
exception
org.springframework.web.util.NestedServletException: Request processing failed; nested exception is org.webharvest.exception.ScraperXPathException: Error parsing XPath expression (XPath = [/html/body/ul/li/span/text()])!
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:948)
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:827)
javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:812)
javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
root cause
org.webharvest.exception.ScraperXPathException: Error parsing XPath expression (XPath = [/html/body/ul/li/span/text()])!
org.webharvest.runtime.processors.XPathProcessor.execute(XPathProcessor.java:70)
org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:115)
org.webharvest.runtime.processors.BodyProcessor.execute(BodyProcessor.java:25)
org.webharvest.runtime.processors.VarDefProcessor.execute(VarDefProcessor.java:59)
org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:115)
org.webharvest.runtime.Scraper.execute(Scraper.java:166)
org.webharvest.runtime.Scraper.execute(Scraper.java:179)
com.didibaba.services.adapters.metlink.MetLinkAdapterImpl.scrapeBusesForStation(MetLinkAdapterImpl.java:147)
com.didibaba.services.adapters.metlink.MetLinkAdapterImpl.getStationBuses(MetLinkAdapterImpl.java:118)
com.didibaba.services.BusStationServiceImpl.getBusStationInfoByName(BusStationServiceImpl.java:80)
com.didibaba.web.controllers.BusStationController.getBusStationInfo(BusStationController.java:36)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:219)
org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:132)
org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:104)
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandleMethod(RequestMappingHandlerAdapter.java:745)
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:686)
org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80)
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:925)
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:856)
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:936)
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:827)
javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:812)
javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
root cause
net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseExceptionpublicId: -//W3C//DTD HTML 4.0 Transitional//EN; systemId: http://www.w3.org/TR/REC-html40/loose.dtd; lineNumber: 31; columnNumber: 3; The declaration for the entity "HTML.Version" must end with '>'.
net.sf.saxon.event.Sender.sendSAXSource(Sender.java:420)
net.sf.saxon.event.Sender.send(Sender.java:169)
net.sf.saxon.Configuration.buildDocument(Configuration.java:3346)
net.sf.saxon.Configuration.buildDocument(Configuration.java:3288)
net.sf.saxon.query.StaticQueryContext.buildDocument(StaticQueryContext.java:327)
org.webharvest.utils.XmlUtil.evaluateXPath(XmlUtil.java:77)
org.webharvest.runtime.processors.XPathProcessor.execute(XPathProcessor.java:68)
org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:115)
org.webharvest.runtime.processors.BodyProcessor.execute(BodyProcessor.java:25)
org.webharvest.runtime.processors.VarDefProcessor.execute(VarDefProcessor.java:59)
org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:115)
org.webharvest.runtime.Scraper.execute(Scraper.java:166)
org.webharvest.runtime.Scraper.execute(Scraper.java:179)
com.didibaba.services.adapters.metlink.MetLinkAdapterImpl.scrapeBusesForStation(MetLinkAdapterImpl.java:147)
com.didibaba.services.adapters.metlink.MetLinkAdapterImpl.getStationBuses(MetLinkAdapterImpl.java:118)
com.didibaba.services.BusStationServiceImpl.getBusStationInfoByName(BusStationServiceImpl.java:80)
com.didibaba.web.controllers.BusStationController.getBusStationInfo(BusStationController.java:36)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:219)
org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:132)
org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:104)
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandleMethod(RequestMappingHandlerAdapter.java:745)
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:686)
org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80)
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:925)
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:856)
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:936)
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:827)
javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:812)
javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
root cause
org.xml.sax.SAXParseExceptionpublicId: -//W3C//DTD HTML 4.0 Transitional//EN; systemId: http://www.w3.org/TR/REC-html40/loose.dtd; lineNumber: 31; columnNumber: 3; The declaration for the entity "HTML.Version" must end with '>'.
com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1388)
com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.scanEntityDecl(XMLDTDScannerImpl.java:1562)
com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.scanDecls(XMLDTDScannerImpl.java:1964)
com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.scanDTDExternalSubset(XMLDTDScannerImpl.java:297)
com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1162)
com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1049)
com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:962)
com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:607)
com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:489)
com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:835)
com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:568)
net.sf.saxon.event.Sender.sendSAXSource(Sender.java:396)
net.sf.saxon.event.Sender.send(Sender.java:169)
net.sf.saxon.Configuration.buildDocument(Configuration.java:3346)
net.sf.saxon.Configuration.buildDocument(Configuration.java:3288)
net.sf.saxon.query.StaticQueryContext.buildDocument(StaticQueryContext.java:327)
org.webharvest.utils.XmlUtil.evaluateXPath(XmlUtil.java:77)
org.webharvest.runtime.processors.XPathProcessor.execute(XPathProcessor.java:68)
org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:115)
org.webharvest.runtime.processors.BodyProcessor.execute(BodyProcessor.java:25)
org.webharvest.runtime.processors.VarDefProcessor.execute(VarDefProcessor.java:59)
org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:115)
org.webharvest.runtime.Scraper.execute(Scraper.java:166)
org.webharvest.runtime.Scraper.execute(Scraper.java:179)
com.didibaba.services.adapters.metlink.MetLinkAdapterImpl.scrapeBusesForStation(MetLinkAdapterImpl.java:147)
com.didibaba.services.adapters.metlink.MetLinkAdapterImpl.getStationBuses(MetLinkAdapterImpl.java:118)
com.didibaba.services.BusStationServiceImpl.getBusStationInfoByName(BusStationServiceImpl.java:80)
com.didibaba.web.controllers.BusStationController.getBusStationInfo(BusStationController.java:36)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:219)
org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:132)
org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:104)
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandleMethod(RequestMappingHandlerAdapter.java:745)
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:686)
org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80)
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:925)
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:856)
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:936)
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:827)
javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:812)
javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
Thanks in advance.
You could try adding omithtmlenvelope="true" to your html-to-xml call:
<var-def name="pageContentStr">
<html-to-xml omithtmlenvelope="true">
<http url="http://www.metlink.org.nz/stop/${stationID.toString()}/departures" />
</html-to-xml>
</var-def>
Unfortunately, as you said, I cannot reproduce the error you are getting, so I cannot test the result.
I had a similar issue with the XPath evaluator throwing an org.xml.sax.SAXParseException:
White spaces are required between publicId and systemId.
If you can change the original XML, this problem is already solved here.
WebHarvest uses HtmlCleaner under the hood. I use the complete Web-Harvest project, so I was able to stop html-to-xml from adding the DOCTYPE tag.
I use HtmlCleaner version 2.6.1 and modified org.webharvest.runtime.processors.HtmlToXmlProcessor to support this newer version:
HtmlCleaner cleaner = new HtmlCleaner();
CleanerProperties cleanerProperties = cleaner.getProperties();
Since HtmlCleaner supports an omitDoctypeDeclaration property that omits the DOCTYPE entirely, I set it (in the future this could be controlled by an extra attribute in the scraper XML):
cleanerProperties.setOmitDoctypeDeclaration(true);
Hope it helps, and thanks to the creator of WebHarvest; it is a great and very reliable tool!
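Outside of WebHarvest, the same failure can also be avoided at the parser level. The root cause in the trace above is not validation as such: even a non-validating parser reads the external DTD named in the DOCTYPE, and loose.dtd is an SGML DTD rather than valid XML DTD syntax, hence "The declaration for the entity "HTML.Version" must end with '>'". The sketch below uses plain JAXP with the JDK's built-in Xerces, which supports the Apache feature that turns off loading of external DTDs, and then evaluates the question's first XPath expression. It does not show a WebHarvest option; the class name and sample markup are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NoExternalDtdXPath {
    /** Parses XHTML without fetching its external DTD, then evaluates an XPath 1.0 expression. */
    static String evaluate(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(false); // already the default, shown for clarity
            // Xerces-specific feature: skip the external DTD subset (loose.dtd) entirely.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String page = "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" "
                + "\"http://www.w3.org/TR/REC-html40/loose.dtd\">"
                + "<html><body><ul><li><span>10:42</span></li></ul></body></html>";
        System.out.println(evaluate(page, "/html/body/ul/li/span/text()")); // 10:42
    }
}
```

Because the DTD is never fetched, this also avoids a network request to w3.org on every parse.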
