AVRO schema evolution adding optional column with default fails deserialization - avro

While reading through the Avro documentation, for example [1], I understood that schema evolution is supported, and that if I add a column with a specified default, it should be backward compatible (and even forward compatible when I remove it again). Sounds great, so I added a column defined as:
{
"name": "newColumn",
"type": ["null","string"],
"default": null,
"doc": "something wrong"
}
and tried to consume a topic that has had this schema from the beginning. It fails with the message:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:424)
at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
at tech.allegro.schema.json2avro.converter.JsonAvroConverter.convertToJson(JsonAvroConverter.java:83)
To give a little more information: the Avro schema defines one top-level type with two fields, a string describing the type of the message and a union of N types. All N-1 unmodified types can be read, but the one updated with the optional, default-bearing column cannot. I'm not sure whether this design is strictly speaking correct, but that's not the point (feel free to criticise it and recommend a better approach!). I'm after schema evolution, which does not seem to be working.
Am I doing something wrong?
[1] https://docs.oracle.com/database/nosql-12.1.3.4/GettingStartedGuide/schemaevolution.html#changeschema-rules
EDIT:
If we alter the type definition to:
"type": "string",
"default": ""
it still does not work, and the generated error is:
Caused by: org.apache.avro.AvroRuntimeException: Malformed data. Length is negative: -1
at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:336)
at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:422)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:414)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:181)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
at tech.allegro.schema.json2avro.converter.JsonAvroConverter.convertToJson(JsonAvroConverter.java:83)
The code that leads to these failures is:
BinaryDecoder binaryDecoder = DecoderFactory.get().binaryDecoder(avro, (BinaryDecoder)null);
GenericRecord record = (GenericRecord)(new GenericDatumReader(schema)).read((Object)null, binaryDecoder);

There is usually some misunderstanding about schema evolution and how it works. Evolving a schema does not mean that you no longer need the "writer" schema to read the Avro data. For this purpose you should use the following GenericDatumReader constructor:
public GenericDatumReader(Schema writer, Schema reader)
As you can see, both the writer schema (the schema used to serialize the Avro data) and the reader schema (your "evolved" schema) must be present. Several libraries/tools (Hive, Spark) abstract this away, but that is only possible because the file itself contains the writer schema (it is not schema-less).
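A minimal sketch of the reading side with that constructor (writerSchemaJson, readerSchemaJson and avro are placeholders for the original schema, the evolved schema and the serialized bytes; they are not from the question):
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

// Parse both schemas: the one the producer wrote with and the evolved one.
Schema writerSchema = new Schema.Parser().parse(writerSchemaJson);
Schema readerSchema = new Schema.Parser().parse(readerSchemaJson);

BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(avro, null);
GenericDatumReader<GenericRecord> datumReader =
        new GenericDatumReader<>(writerSchema, readerSchema);
GenericRecord record = datumReader.read(null, decoder);
// For messages written before the change, newColumn now carries its
// declared default (null) instead of tripping up the resolving decoder.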

Related

neo4j admin import Error in import Requested index -1, but length is 1000000

I have a set of CSVs that I have been able to use with LOAD CSV to create a database. This set is the small version (1 GB) of a much larger data set (120 GB) I intend to load into neo4j using admin import. I am trying to run the admin import on the smaller dataset first, since I have already successfully created a graph with that data. I assume that if I can get the admin import to run for the small version, it will hopefully run without problems for the large dataset. I've read through the admin import instructions and I've set up header files. The import loads the nodes just fine but ends up failing on the relationship files. Can anyone help me understand what is happening here so that I can figure out how to fix it? I've tried just removing the file and its associated nodes, but this only results in the same error being thrown from the next file in the relationships list.
IMPORT FAILED in 9s 121ms.
Data statistics is not available.
Peak memory usage: 1.015GiB
Error in input data
Caused by:ERROR in input
data source: BufferedCharSeeker[source:/var/lib/neo4j/import/rel_cchg_dimcchg.csv, position:3861455, line:77614]
in field: :START_ID(cchg-ID):1
for header: [:START_ID(cchg-ID), :END_ID(dim_cchg-ID), :TYPE]
raw field value: 106715432018-09-010.01.00.0
original error: Requested index -1, but length is 1000000
org.neo4j.internal.batchimport.input.InputException: ERROR in input
data source: BufferedCharSeeker[source:/var/lib/neo4j/import/rel_cchg_dimcchg.csv, position:3861455, line:77614]
in field: :START_ID(cchg-ID):1
for header: [:START_ID(cchg-ID), :END_ID(dim_cchg-ID), :TYPE]
raw field value: 106715432018-09-010.01.00.0
original error: Requested index -1, but length is 1000000
at org.neo4j.internal.batchimport.input.csv.CsvInputParser.next(CsvInputParser.java:234)
at org.neo4j.internal.batchimport.input.csv.LazyCsvInputChunk.next(LazyCsvInputChunk.java:98)
at org.neo4j.internal.batchimport.input.csv.CsvInputChunkProxy.next(CsvInputChunkProxy.java:75)
at org.neo4j.internal.batchimport.ExhaustingEntityImporterRunnable.run(ExhaustingEntityImporterRunnable.java:57)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
at org.neo4j.internal.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:110)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Requested index -1, but length is 1000000
at org.neo4j.internal.batchimport.cache.OffHeapRegularNumberArray.addressOf(OffHeapRegularNumberArray.java:42)
at org.neo4j.internal.batchimport.cache.OffHeapLongArray.get(OffHeapLongArray.java:43)
at org.neo4j.internal.batchimport.cache.DynamicLongArray.get(DynamicLongArray.java:46)
at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.dataValue(EncodingIdMapper.java:767)
at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.findFromEIdRange(EncodingIdMapper.java:802)
at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.binarySearch(EncodingIdMapper.java:750)
at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.binarySearch(EncodingIdMapper.java:305)
at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.get(EncodingIdMapper.java:205)
at org.neo4j.internal.batchimport.RelationshipImporter.nodeId(RelationshipImporter.java:134)
at org.neo4j.internal.batchimport.RelationshipImporter.startId(RelationshipImporter.java:109)
at org.neo4j.internal.batchimport.input.InputEntityVisitor$Delegate.startId(InputEntityVisitor.java:228)
at org.neo4j.internal.batchimport.input.csv.CsvInputParser.next(CsvInputParser.java:117)
... 9 more
The error is actually quite explicit: have a look at line 77614 in rel_cchg_dimcchg.csv. It's usually caused by an incorrect endpoint ID. For example, if the END_ID is supposed to be a number but it's something like 4171;4172;4173;4174;4175;4176, this will raise the InputException error.
One would assume that --skip-bad-relationships would ignore these issues, but it doesn't. So the only remedy is to ensure that all START_ID/END_ID values are correct (i.e. the right data type and format).
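For reference, with the header shown in the error ([:START_ID(cchg-ID), :END_ID(dim_cchg-ID), :TYPE]), every data row should contain exactly three cleanly separated values, roughly like this (the IDs and relationship type below are invented for illustration):
:START_ID(cchg-ID),:END_ID(dim_cchg-ID),:TYPE
10671543,20481,CCHG_TO_DIM_CCHG
The raw field value reported above (106715432018-09-010.01.00.0) does not look like a single node ID, which fits that explanation: several source columns appear to have run together on that line.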

liquibase Element 'databaseChangeLog' used but not declared

We use Liquibase 3.6.1 to apply changes to a MySQL 5.x database.
On Windows (10) everything works, but on Linux we get the error:
liquibase.exception.ChangeLogParseException: Error parsing line 2 column 19 of /home/myapp/conf/db/liquibase/changelog.xml: Element 'databaseChangeLog' used but not declared.
at liquibase.parser.core.xml.XMLChangeLogSAXParser.parseToNode
Caused by: oracle.xml.parser.v2.XMLParseException: Element 'databaseChangeLog' used but not declared.
at oracle.xml.parser.v2.XMLError.flushErrors(XMLError.java:143)
at oracle.xml.parser.v2.NonValidatingParser.parseDocument(NonValidatingParser.java:269)
at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:149)
at liquibase.parser.core.xml.XMLChangeLogSAXParser.parseToNode(XMLChangeLogSAXParser.java)
... 11 common frames omitted
Caused by: org.xml.sax.SAXParseException: <Line 2, Column 19>: XML-0149: (Error) Element 'databaseChangeLog' used but not declared.
at oracle.xml.parser.v2.XMLError.flushErrorHandler(XMLError.java:169)
at oracle.xml.parser.v2.XMLError.flushErrors(XMLError.java:137)
... 14 common frames omitted
The problem is probably connected with the XML parser, but I don't understand what is really happening or what I can do about it.
Unfortunately, I also don't have access to the Linux machine.
It could be an encoding problem with your changelog files.
Ensure that they are saved as UTF-8, to be safe.
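For what it's worth, a changelog header that states the encoding and the schema explicitly usually looks like this (a minimal sketch; match the XSD version to your Liquibase release):
<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.6.xsd">
    <!-- changeSets go here -->
</databaseChangeLog>
Re-saving the file as UTF-8 (without a BOM) and confirming the first lines match this shape is a quick way to rule encoding out.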

Element type "field" must be followed by either attribute specifications, ">" or "/>"

I am trying to create a collection with my solr_config. I get the following error:
{
"failure": {
"10.47.24.19:5285_solr": "org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at https://10.47.24.19:5285/solr: Error CREATEing SolrCore 'localWebCollection_shard1_replica2': Unable to create core [localWebCollection_shard1_replica2] Caused by: Element type \"field\" must be followed by either attribute specifications, \">\" or \"/>\".",
"10.44.121.52:6560_solr": "org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at https://10.44.121.52:6560/solr: Error CREATEing SolrCore 'localWebCollection_shard1_replica1': Unable to create core [localWebCollection_shard1_replica1] Caused by: Element type \"field\" must be followed by either attribute specifications, \">\" or \"/>\"."
}
}
I checked all the fields in my schema.xml; they are all closed with "/>". Any ideas on how to fix this error?
Any help or ideas would be highly appreciated.
It turned out that the space between two of the attributes in one of the fields had been removed! When you see this error, make sure that in schema.xml all the attributes of every field are properly separated, and that attribute values such as "body", "false", "true", etc. are properly quoted.
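As an illustration (the field name and attributes below are invented), a single missing space between two attributes reproduces exactly this message:
<!-- broken: no space between stored="false" and multiValued="true" -->
<field name="body" type="text_general" indexed="true" stored="false"multiValued="true"/>
<!-- fixed -->
<field name="body" type="text_general" indexed="true" stored="false" multiValued="true"/>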

Error compiling w3c schemas with xmerl

I was trying to get XForms going on my Ubuntu desktop. There does not appear to be much activity on XForms at the moment, and I was trying to get Backplanejs running. It did not work, and upon examining the JavaScript I found it relied on Microsoft libraries and ActiveX.
Rather than learn JavaScript, I decided to continue my Erlang education and struggled with xmerl instead. I created a directory for schemas with an index file. The contents of this directory are:
tony#blessing:~/workspace/myXformProject$ ls schemas
SchemaList.txt XForms-Schema.xsd xhtml-lat1.ent xml-events.xsd
SchemaList.txt~ xhtml1-strict.dtd xhtml-special.ent
These schemas have been downloaded from the W3C. However, they would not compile, yielding the error wfc_PEs_In_Internal_Subset. I would have expected these well-established W3C schemas to compile with xmerl.
What am I doing wrong?
Tony Wallace
6> B.
[{"http://www.w3.org/1999/xhtml",
"schemas/xhtml1-strict.dtd"},
{"http://www.w3.org/2001/xml-events",
"schemas/xml-events.xsd"},
{"http://www.w3.org/2002/xforms",
"schemas/XForms-Schema.xsd"}]
9> {ok,S1} = xmerl_xsd:process_schemas(B).
3450- fatal: {error,{wfc_PEs_In_Internal_Subset}}
** exception exit: {fatal,{{error,{wfc_PEs_In_Internal_Subset}},
{file,"schemas/xhtml1-strict.dtd"},
{line,628},
{col,89}}}
in function xmerl_scan:fatal/2
in call from xmerl_scan:scan_entity/2
in call from xmerl_scan:scan_markup_decl/2
in call from xmerl_scan:scan_ext_subset/2
in call from xmerl_scan:scan_document/2
in call from xmerl_scan:file/2
in call from xmerl_xsd:process_schemas/2
The 3450 refers to the code line in xmerl_scan:
scan_entity_value("%" ++ _T,S=#xmerl_scanner{environment=prolog},_,_,_,_,_) ->
?fatal({error,{wfc_PEs_In_Internal_Subset}},S);
The error appears to be associated with line 628 of xhtml1-strict.dtd. The column of 89 seems suspect, as line 628 is not that wide:
621 <!--
622 param is used to supply a named property value.
623 In XML it would seem natural to follow RDF and support an
624 abbreviated syntax where the param elements are replaced
625 by attribute value pairs on the object start tag.
626 -->
627 <!ELEMENT param EMPTY>
628 <!ATTLIST param
629 id ID #IMPLIED
630 name CDATA #IMPLIED
631 value CDATA #IMPLIED
632 valuetype (data|ref|object) "data"
633 type %ContentType; #IMPLIED
634 >
635
If you got this far down the post, many thanks!
Tony
You seem to be invoking xmerl_xsd:process_schemas on a collection of schema documents, some of which are XSD schema documents and one of which is not an XSD schema document at all, but a document type definition (DTD) file (xhtml1-strict.dtd). The process_schemas function expects XSD schema documents, which are XML document instances, but DTD files are not XML document instances. You will need to acquire an XSD schema for XHTML, not the DTD, if you want to do what you appear to want to do. Unfortunately, the XHTML WG's XSD schema documents are not the easiest things in the world to use; good luck.
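Concretely, the list bound to B above pairs the XHTML namespace with the DTD file; a list that xmerl_xsd:process_schemas can handle would contain only .xsd entries, for example (this merely removes the offending entry, it does not add an XHTML XSD):
%% only XSD documents -- the xhtml1-strict.dtd entry has been removed
Schemas = [{"http://www.w3.org/2001/xml-events", "schemas/xml-events.xsd"},
           {"http://www.w3.org/2002/xforms",     "schemas/XForms-Schema.xsd"}],
{ok, State} = xmerl_xsd:process_schemas(Schemas).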
If you want to work with XForms, you might find it easier to get XSLTForms or Orbeon or BetterForms or EMC Formula working than you did to get backplane.js to work.

Xerces jar is a DOM parser or SAX parser

I would like to know about the Xerces.jar implementation: is Xerces.jar a DOM parser or a SAX parser?
When I try to read a huge XML file I get the following error message. Please help.
java.lang.StackOverflowError
at org.apache.xerces.dom.ParentNode.readObject(Unknown Source)
at sun.reflect.GeneratedMethodAccessor569.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:618)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1098)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1849)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1342)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1948)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1872)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1342)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1948)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1872)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1342)
Both.
It's a dessert topping AND a floor wax :)
http://xerces.apache.org/xerces2-j/faq-sax.html
http://xerces.apache.org/xerces2-j/faq-dom.html
PS:
Please post the part of the stack trace where the exception actually occurred. You seem to have left it out :)
PPS:
Also look here:
http://xerces.apache.org/xerces-j/schema.html
Due to the way in which the parser constructs content models for elements with complex content, specifying large values for the minOccurs or maxOccurs attributes may cause the parser to throw a StackOverflowError. Large values for minOccurs should be avoided, and unbounded should be used instead of a large value for maxOccurs.
Consider turning schema checking OFF, or changing minOccurs/maxOccurs:
http://xerces.apache.org/xerces2-j/features.html
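If you go the schema-checking-off route, a sketch along these lines applies (the feature URIs are the standard SAX/Xerces ones; the class name and file name are placeholders):
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class NoSchemaCheckParse {
    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // Turn validation and Xerces schema checking off so the parser never
        // builds the huge content models that can overflow the stack.
        factory.setFeature("http://xml.org/sax/features/validation", false);
        factory.setFeature("http://apache.org/xml/features/validation/schema", false);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // Register a ContentHandler here to consume events, then stream the
        // huge document through SAX instead of building a DOM in memory.
        reader.parse(new InputSource("huge.xml"));
    }
}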
