Woodstox parser works fine in a test run in Eclipse, but fails from the command line (Maven 3)

One of my JUnit tests uses (behind the scenes) the Woodstox parser.
When I run the test from within Eclipse, the test succeeds as expected.
But running the same test from the command line, using
mvn clean test -Dtest=com.example.MyClassTest#someParserTest
results in the test failing with the following exception messages:
Error on line 114 column 21
SXXP0003: Error reported by XML parser: Invalid UTF-8 middle byte 0x3f (at char #4174, byte #3999)
...
at com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:314)
at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:205)
at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:55)
at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:961)
at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4580)
at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3657)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1063)
at com.ctc.wstx.sax.WstxSAXParser.fireEvents(WstxSAXParser.java:524)
at com.ctc.wstx.sax.WstxSAXParser.parse(WstxSAXParser.java:452)
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:440)
at net.sf.saxon.event.Sender.send(Sender.java:171)
at net.sf.saxon.jaxp.IdentityTransformer.transform(IdentityTransformer.java:363)
I took a look at the InputStream to be parsed. The InputStreams are identical in both cases.
Also, there is no "line 114 column 21" in the InputStream: line 114 ends at column 11.
How can I investigate what causes the different behavior?

It turned out that a library I used made wrong assumptions about the environment's default character encoding (also called the platform default charset).
In the Eclipse environment, calling Charset.defaultCharset() returned UTF-8, while in the command-line environment it returned CP1252.
That mismatch also fits the reported byte: 0x3f is the ASCII '?', and when text written with a single-byte charset such as CP1252 is parsed as UTF-8, a high byte can be misread as the start of a multi-byte sequence, so the ordinary byte that follows (here a '?') is rejected as an invalid continuation byte.
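To confirm which default is in effect, it helps to print the charset from the same JVM that runs the tests; here is a minimal sketch (class name mine; note that since JDK 18, JEP 400 makes UTF-8 the default on all platforms):

import java.nio.charset.Charset;

public class CharsetCheck {
    public static void main(String[] args) {
        // Typically prints UTF-8 under Eclipse but windows-1252 (CP1252)
        // on a Windows command line, on JDKs before 18.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}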
Many standard and third-party Java APIs behave differently depending on the platform's default charset, among them:
String.getBytes()
ByteArrayOutputStream.toString()
XMLOutputFactory.createXMLStreamWriter(OutputStream stream)
IOUtils.toString(InputStream input)
To resolve my issue, I had to update that library to explicitly use the correct character set:
String.getBytes(StandardCharsets.UTF_8)
ByteArrayOutputStream.toString(StandardCharsets.UTF_8.name())
XMLOutputFactory.createXMLStreamWriter(OutputStream stream, StandardCharsets.UTF_8.name())
IOUtils.toString(InputStream input, StandardCharsets.UTF_8)
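To illustrate the failure mode and the fix side by side, here is a minimal, self-contained sketch (class name and sample string are mine, not taken from the library in question):

import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {
    public static void main(String[] args) throws Exception {
        String xml = "<name>Émile</name>"; // non-ASCII content

        // Broken: the writer silently uses the platform default charset.
        // Under CP1252 the 'É' becomes a single byte that is not valid
        // UTF-8, even if the XML declaration still claims encoding="UTF-8".
        ByteArrayOutputStream brokenOut = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(brokenOut)) {
            w.write(xml);
        }

        // Fixed: name the charset explicitly at every byte/char boundary.
        ByteArrayOutputStream fixedOut = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(fixedOut, StandardCharsets.UTF_8)) {
            w.write(xml);
        }
        System.out.println(fixedOut.toString(StandardCharsets.UTF_8.name()));
    }
}

The same principle applies to each of the calls listed above: any API that converts between bytes and characters should be given the charset explicitly rather than being left to pick up the platform default.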

Related

Neo4j to Gephi import: Failed to invoke procedure Invalid UTF-8

I tried to import my data from Neo4j into Gephi, but it doesn't work.
I get the following result in Neo4j:
Failed to invoke procedure apoc.gephi.add: Caused by: com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0xfb at [Source: (apoc.export.util.CountingInputStream); line: 1, column: 136]
As previously mentioned, it looks like Neo4j is not exporting using UTF-8, so I would check how Neo4j generates the output.
Another possibility is that something went slightly wrong while Neo4j was writing the output.
I ran into this very same problem in the past when managing content in a file concurrently.
A thread crashed and did not close the file cleanly. When reviewing the file afterwards, everything looked pretty normal, but some characters had been introduced that are not UTF-8. A tool like Atom can help you spot them.
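If you prefer to locate the offending bytes programmatically rather than eyeballing the file, a strict UTF-8 decode stops at the first invalid sequence; here is a minimal Java sketch (the file name export.json is hypothetical):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {
    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get("export.json")); // hypothetical path
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(bytes.length); // UTF-8 never yields more chars than bytes
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CoderResult result = decoder.decode(in, out, true);
        if (result.isError()) {
            // On error, the input buffer is positioned at the start of
            // the first invalid byte sequence.
            System.out.println("Invalid UTF-8 at byte offset " + in.position());
        } else {
            System.out.println("File is valid UTF-8");
        }
    }
}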

Read a list from stream using Yap-Prolog

I want to run a python3 process from my YAP Prolog script and read its output formatted as a list of integers, e.g. [1,2,3,4,5,6].
This is what I do:
process_create(path(python3),
               ['my_script.py', MyParam],
               [stdout(pipe(Out))]),
read(Out, OutputList),
close(Out).
However, it fails at the read/2 predicate with the error:
PL_unify_term: PL_int64 not supported
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe
I am sure that I can run the process correctly, because with [stdout(std)] given to process_create the program outputs [1,2,3,4,5,6] as expected.
The weird thing is that when I change the process to output some constant term (such as constant_term), it still gives the same PL_int64 error. Appending a dot to the process's output ([1,2,3,4,5,6].) doesn't solve the error. Using read_term/3 gives the same error. read_string/3 is undefined in YAP-Prolog.
How can I solve this problem?
After asking on the yap-users mailing list, I got the solution:
recompiling YAP Prolog 6.2.2 with the libGMP option made it work. The same problem may also occur in 32-bit YAP.

gcov file checksums do not match when processing gcda

After updating to Xcode 5.1, I got an error like this:
gcov: Unknown command line argument '-v'. Try: '/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/gcov -help'
Then I found an answer on this page: XCode 5.1 Unit Test Coverage Analysis Fails On Files Using Blocks
However, I then got another error like this:
Processing *****.gcda
File checksums do not match: 1280071245 != 5 in ().
Invalid .gcno File!
geninfo: ERROR: GCOV failed for ****.gcda!
For me the cause was that the *.gcno files were stale and did not match the *.gcda files. In fact, most of them had gone missing.
I fixed this by doing the following:
I ensured that both 'Instrument Program Flow' and 'Generate Test Coverage Files' were set in both the app's target and the test target.
I then added the following to the AppDelegate:
- (void)applicationWillTerminate:(UIApplication *)application
{
    // Ensure all coverage data is flushed before shutdown.
    extern void __gcov_flush(void);
    __gcov_flush();
}
This is for application-hosted iOS tests. Depending on your requirements you could make this a little less 'invasive' (e.g., mix in the coverage flushing so that it's only in your test target), but this should be a good starting point to get it working.

F# integer file directive

I've been using fslex and fsyacc, and the F# source files (.fs) they generate from the lexer (.fsl) and parser (.fsp) rules refer to the original .fsl (and sometimes to the same .fs source file) all over the place with statements such as this (numbers are line numbers):
lex.fs
1 # 1 "/[PROJECT-PATH-HERE]/lex.fsp"
...
16 # 16 "/[PROJECT-PATH-HERE]/lex.fs"
17 // This is the type of tokens accepted by the parser
18 type token =
19 | EOF
...
Also, the .fs files generated by pars.fsp do the same kind of thing, but additionally refer to the F# signature file (.fsi) generated alongside them. What does any of this do/mean?
The annotations you see in the generated code are F# Compiler Directives (specifically, the 'line' directive).
The 'line' directive gives the F# compiler a way, when it needs to emit a warning or error message for some part of the generated code, to determine which part of the original file corresponds to that part of the generated code. In other words, the compiler can report the warning or error against the original code from which the offending generated code was produced.

Got SIGSEGV while calling require("lsqlite3") with Lua 5.1.5

I built Lua 5.1.5 and lsqlite3-0.8.1. All of them run well on my Red Hat Linux machine.
I then ported them to my MIPS development board. Lua and other modules (such as luafilesystem, md5, cgilua and wsapi) run well, but lsqlite3 does not work.
When I execute require("lsqlite3") in the Lua command line, it prints the error messages below:
lua
Lua 5.1.5 Copyright (C) 1994-2012 Lua.org, PUC-Rio
require("lsqlite3")
do_page_fault() #2: sending SIGSEGV to lua for invalid read access from
00000000 (epc == 00000000, ra == 2ac36144)
Segmentation fault
Can anyone give me any help to fix it? Thanks!
I made some progress in solving this problem: I rebuilt Lua with the gcc option '-Wl,-E' and then rebuilt lsqlite3. I executed require("lsqlite3") in the Lua command line, and it didn't print any message. I continued running some other database operation commands and found that they all executed successfully. As it seemed the problem had been solved, I should have been very happy about it.
But then another, stranger problem arose.
If I put the statement require("lsqlite3") into a file, and then execute the file this way:
lua file
it still printed error messages like this:
do_page_fault() #2: sending SIGSEGV to lua for invalid read access from
2ada054c (epc == 2ada054c, ra == 2abdceac)
If I put more database operation statements into a file and then run the file with lua, Lua gives correct results for query operations and inserts values into tables correctly, but it always prints the error message shown above.
If I run the statements from the file one by one in the Lua command-line interface, it never prints this error message.
It seems to print the error message when executing the 'require' function. But if I put require("lfs") into a file and run that file with lua, it never prints an error message.
I am confused about what the difference is between Lua command-line execution and running a Lua script.
There are three places in lsqlite3.c where sqlite_int64 is used (never long long directly). When you build sqlite3, some type will be used for 64-bit integers; lsqlite3 will use the same type by including sqlite3.h for the definition of that type.
