Unable to serialize Avro schema when we have an array with null objects - avro

I have list of objects, out of which one object is null. When I am trying to serialize the below data, I am getting
null of string of array of union of
[object1, object2, object3, null, object4]
Schema Definition
{
"name": "myList",
"type": [
"null",
{
"type": "array",
"items": "string"
}
],
"default": null
}
How do we allow or ignore the null in the list/array and avoid the error ? TIA

Maybe you need to also add null as a possible value in the array?
{
"name": "myList",
"type": [
"null",
{
"type": "array",
"items": [
"null",
"string"
]
}
],
"default": null
}

Related

Validate multiple duplicate objects in JSON-SCHEMA

I'd like to validate a json object with the json-schema, but that json object can duplicate its values ​​as many times as the user wants.
The keys of that object can be repeated as many times as the user wishes at the times the user create his json.
example 1: (collection with object)
{
"info":
[
{
"name": "aaron",
"email": "aaron.com"
}
]
}
JSON-SCHEMA of Example 1
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
}
},
"required": [
"name",
"email"
]
}
example 2: (collection with 2 object)
{
"info":
[
{
"name": "aaron",
"email": "aaron.com"
},
{
"name": "misa",
"email": "misa.com"
}
]
}
JSON SCHEMA of example 2
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"info": {
"type": "array",
"items": [
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
}
},
"required": [
"name",
"email"
]
},
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
}
},
"required": [
"name",
"email"
]
}
]
}
},
"required": [
"info"
]
}
In short, what I am looking for is a dynamic json schema that no matter how many times the collection grows, it can use only 1 and not generate several.
As you're using draft-04, I'll quote from the draft-04 specification.
This means you want items to have an object value as opposed to an array of objects.
The value of "items" MUST be either an object or an array. If it is
an object, this object MUST be a valid JSON Schema. If it is an
array, items of this array MUST be objects, and each of these objects
MUST be a valid JSON Schema.
Draft-04 specificiation https://datatracker.ietf.org/doc/html/draft-fge-json-schema-validation-00#section-5.3.1
In JSON Schema 2020-12, items may ONLY be an object value, and you must use a different keyword for tuple like validation.
This worked for me-
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"info": {
"type": "array",
"items": [
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
}
},
"required": [
"name",
"email"
]
}
]
"additionalItems":{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
}
},
"required": [
"name",
"email"
]
}
}
},
"required": [
"info"
]
}

Avro Tools Failure Expected start-union. Got VALUE_STRING

I've defined the below avro schema (car_sales_customer.avsc),
{
"type" : "record",
"name" : "topLevelRecord",
"fields" : [ {
"name": "cust_date",
"type": "string"
},
{
"name": "customer",
"type": {
"type": "array",
"items": {
"name": "customer",
"type": "record",
"fields": [
{
"name": "address",
"type": "string"
},
{
"name": "driverlience",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "name",
"type": "string"
},
{
"name": "phone",
"type": "string"
}
]
}
}
}]
}
and my input json payload (car_sales_customer.json) is as follows,
{"cust_date":"2017-04-28","customer":[{"address":"SanFrancisco,CA","driverlience":"K123499989","name":"JoyceRidgely","phone":"16504378889"}]}
I'm trying to use avro-tools and convert the above json to avro using the avro schema,
java -jar ./avro-tools-1.9.2.jar fromjson --schema-file ./car_sales_customer.avsc ./car_sales_customer.json > ./car_sales_customer.avro
I get the below error when I execute the above statement,
Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_STRING
at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:514)
at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:433)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:283)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:298)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:89)
at org.apache.avro.tool.Main.run(Main.java:66)
at org.apache.avro.tool.Main.main(Main.java:55)
Is there a solution to overcome the error?

How to extract a nested nullable Avro Schema

The complete schema is the following:
{
"type": "record",
"name": "envelope",
"fields": [
{
"name": "before",
"type": [
"null",
{
"type": "record",
"name": "row",
"fields": [
{
"name": "username",
"type": "string"
},
{
"name": "timestamp",
"type": "long"
}
]
}
]
},
{
"name": "after",
"type": [
"null",
"row"
]
}
]
}
I wanted to programmatically extract the following sub-schema:
{
"type": "record",
"name": "row",
"fields": [
{
"name": "username",
"type": "string"
},
{
"name": "timestamp",
"type": "long"
}
]
}
As you see, field "before" is nullable. I can extract it's schema by doing:
schema.getField("before").schema()
But the schema is not a record as it contains NULL at the beginning(UNION type) and I can't go inside to fetch schema of "row".
["null",{"type":"record","name":"row","fields":[{"name":"username","type":"string"},{"name":"tweet","type":"string"},{"name":"timestamp","type":"long"}]}]
I want to fetch the sub-schema because I want to create GenericRecord out of it. Basically I want to create two GenericRecords "before" and "after" and add them to the main GenericRecord created from full schema.
Any help will be highly appreciated.
Good news, if you have a union schema, you can go inside to fetch the list of possible options:
Schema unionSchema = schema.getField("before").schema();
List<Schema> unionSchemaContains = unionSchema.getTypes();
At that point, you can look inside the list to find the one that corresponds to the Type.RECORD.

Handle Null Records in AVRO schema

I have a problem where my record json can be null. How to handle null records in avro schema? The documentation given is for null attributes I want to get for null records.
the same applies to record types too. you can union the type with null.
{
"name": "SpokenLanguage",
"type": [
"null",
{
"type": "record",
"name": "Language",
"fields": [
{
"name": "IsoCode",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "isPrimary",
"type": "boolean",
"default": false
},
{
"name": "description",
"type": [
"null",
"string"
],
"default": null
}
]
}
],
"default": null
}

Avro schema issue when record missing a field

I am using the NiFi (v1.2) processor ConvertJSONToAvro. I am not able to parse a record that only contains 1 of 2 elements in a "record" type. This element is also allowed to be missing entirely from the data. Is my Avro schema incorrect?
Schema snippet:
"name": "personname",
"type": [
"null":,
{
"type": "record",
"name": "firstandorlast",
"fields": [
{
"name": "first",
"type": [
"null",
"string"
]
},
{
"name": "last",
"type": [
"null",
"string"
]
}
]
}
]
If "personname" contains both "first" and "last" it works, but if it only contains one of the elements, it fails with the error: Cannot convert field personname: cannot resolve union:
{ "last":"Smith" }
not in
"type": [ "null":,
{
"type": "record",
"name": "firstandorlast",
"fields": [
{
"name": "first",
"type": [
"null",
"string"
]
},
{
"name": "last",
"type": [
"null",
"string"
]
}
]
}
]
You are missing the default value
https://avro.apache.org/docs/1.8.1/spec.html#schema_record
Your schema should looks like
"name": "personname",
"type": [
"null":,
{
"type": "record",
"name": "firstandorlast",
"fields": [
{
"name": "first",
"type": [
"null",
"string"
],
"default": "null"
},
{
"name": "last",
"type": [
"null",
"string"
],
"default": "null"
}
]
}
]

Resources