How can I write an Avro schema for an array of arrays?

For example, I've tried the schema below, but it isn't working. I need a schema with one field that holds an array of arrays, and I couldn't get it to work.
{
"name": "SelfHealingStarter",
"namespace": "SCP.Kafka.AvroSchemas",
"doc": "Message with all the necessary information to run Self Healing process.",
"type": "record",
"fields": [
{
"name": "FiveMinutesAgoMeasurement",
"type": "record",
"doc": "Field with all five minutes ago measurement.",
"fields": [
{
"name": "equipments",
"doc": "List with all equipments measurement.",
"type": {
"type": "array",
"items": {
"type": {
"type": "array",
"items": "string"
},
"default": []
}
},
"default": []
}
]
}
]
}

IDL
protocol Example {
record Foo {
array<array<string>> data = [];
}
}
AVSC generated by: java -jar ~/workspace/avro-tools-1.8.2.jar idl2schemata example.idl
{
"type" : "record",
"name" : "Foo",
"fields" : [ {
"name" : "data",
"type" : {
"type" : "array",
"items" : {
"type" : "array",
"items" : "string"
}
},
"default" : [ ]
} ]
}
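The generated schema also shows what was wrong in the original attempt: the value of "items" must itself be a schema, so the extra {"type": ..., "default": []} wrapper around the inner array is not valid (defaults belong on the record field, not inside items), and a nested record must likewise appear as the value of its field's "type". As a quick sanity check, here is a sketch using Python's fastavro (my choice of library, not part of the original answer; any Avro implementation should behave the same):

from io import BytesIO
from fastavro import parse_schema, reader, writer

# The corrected schema from idl2schemata: "items" is itself a schema,
# and the default stays on the record field.
schema = {
    "type": "record",
    "name": "Foo",
    "fields": [
        {
            "name": "data",
            "type": {"type": "array", "items": {"type": "array", "items": "string"}},
            "default": [],
        }
    ],
}

parsed = parse_schema(schema)  # raises SchemaParseException if invalid

# Round-trip a record holding an array of arrays of strings.
buf = BytesIO()
writer(buf, parsed, [{"data": [["a", "b"], ["c"]]}])
buf.seek(0)
print(next(reader(buf)))  # {'data': [['a', 'b'], ['c']]}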

Related

Validate multiple duplicate objects in JSON-SCHEMA

I'd like to validate a JSON object with JSON Schema, but the object's entries can be repeated as many times as the user wants when they create their JSON.
example 1: (collection with one object)
{
"info":
[
{
"name": "aaron",
"email": "aaron.com"
}
]
}
JSON Schema of example 1
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
}
},
"required": [
"name",
"email"
]
}
example 2: (collection with two objects)
{
"info":
[
{
"name": "aaron",
"email": "aaron.com"
},
{
"name": "misa",
"email": "misa.com"
}
]
}
JSON Schema of example 2
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"info": {
"type": "array",
"items": [
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
}
},
"required": [
"name",
"email"
]
},
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
}
},
"required": [
"name",
"email"
]
}
]
}
},
"required": [
"info"
]
}
In short, what I am looking for is a single JSON schema that keeps validating no matter how many items the collection grows to, instead of one that needs a new subschema per item.
As you're using draft-04, I'll quote from the draft-04 specification:
The value of "items" MUST be either an object or an array. If it is
an object, this object MUST be a valid JSON Schema. If it is an
array, items of this array MUST be objects, and each of these objects
MUST be a valid JSON Schema.
Draft-04 specification: https://datatracker.ietf.org/doc/html/draft-fge-json-schema-validation-00#section-5.3.1
This means you want "items" to have an object value (one schema applied to every element), as opposed to an array of schemas (one per position).
In JSON Schema 2020-12, "items" may ONLY be an object value, and you must use a different keyword (prefixItems) for tuple-style validation.
This worked for me:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"info": {
"type": "array",
"items": [
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
}
},
"required": [
"name",
"email"
]
}
],
"additionalItems": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
}
},
"required": [
"name",
"email"
]
}
}
},
"required": [
"info"
]
}
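Per the quoted rule, giving "items" a single object value makes the one subschema apply to every element, so the schema keeps working however many objects the collection holds. A minimal check, sketched with Python's jsonschema package (my choice of library, not from the original answer):

from jsonschema import Draft4Validator

# One object-valued "items" subschema validates every element of "info".
schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "info": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"},
                },
                "required": ["name", "email"],
            },
        }
    },
    "required": ["info"],
}

Draft4Validator.check_schema(schema)  # the schema itself is valid draft-04

doc = {
    "info": [
        {"name": "aaron", "email": "aaron.com"},
        {"name": "misa", "email": "misa.com"},
    ]
}
assert Draft4Validator(schema).is_valid(doc)  # holds for any number of items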

Avro Nested array exception

I am trying to write an Avro schema for a nested array.
The topmost array, Stores, is the issue; the inner array, Business, is correct.
{"name": "Stores",
"type": {
"type": "array",
"items": {
"name": "Hours",
"type": "record",
"fields": [
{
"name": "Week",
"type": "string"
},
{"name": "Business",
"type":"array",
"items": {"name":"Business_record","type":"record","fields":[
{"name": "Day", "type":"string"},
{"name": "StartTime", "type": "string"},
{"name": "EndTime", "type": "string"}
]}
}
]
}
}
And the exception I'm getting is:
[ {
"level" : "fatal",
"message" : "illegal Avro schema",
"exceptionClass" : "org.apache.avro.SchemaParseException",
"exceptionMessage" : "No type: {\"name\":\"Stores\",\"type\":{\"type\":\"array\",\"items\":{\"name\":\"Hours\",\"type\":\"record\",\"fields\":[{\"name\":\"Week\",\"type\":\"string\"},{\"name\":\"Business\",\"type\":\"array\",\"items\":{\"name\":\"Business_record\",\"type\":\"record\",\"fields\":[{\"name\":\"Day\",\"type\":\"string\"},{\"name\":\"StartTime\",\"type\":\"string\"},{\"name\":\"EndTime\",\"type\":\"string\"}]}}]}}}",
"info" : "other messages follow (if any)"
} ]
I think it has something to do with the [] or {} around the outer array field, but I'm not able to figure it out.
Any help is appreciated.
I found the mistake I was making:
once I added the "type" wrapper object for the nested array, it worked for me.
{
"name": "Stores",
"type": "array",
"items": {
"name": "Hours",
"type": "record",
"fields": [
{
"name": "Week",
"type": "string"
},
{
"name": "Business",
"type": {
"type": "array",
"items": {
"name": "Business_record",
"type": "record",
"fields": [
{
"name": "Day",
"type": "string"
},
{
"name": "StartTime",
"type": "string"
},
{
"name": "EndTime",
"type": "string"
}
]
}
}
}
]
}
}
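To confirm the fix, the field can be fed through any Avro schema parser; below is a minimal sketch with Python's fastavro, wrapping the field in a hypothetical StoreList record (the wrapper is my addition for the check, not part of the answer):

from fastavro import parse_schema

schema = {
    "type": "record",
    "name": "StoreList",  # hypothetical wrapper so the field parses alone
    "fields": [
        {
            "name": "Stores",
            # as a record field, the array schema must be the value of "type"
            "type": {
                "type": "array",
                "items": {
                    "name": "Hours",
                    "type": "record",
                    "fields": [
                        {"name": "Week", "type": "string"},
                        {
                            "name": "Business",
                            "type": {  # the {"type": ...} wrapper object is the fix
                                "type": "array",
                                "items": {
                                    "name": "Business_record",
                                    "type": "record",
                                    "fields": [
                                        {"name": "Day", "type": "string"},
                                        {"name": "StartTime", "type": "string"},
                                        {"name": "EndTime", "type": "string"},
                                    ],
                                },
                            },
                        },
                    ],
                },
            },
        }
    ],
}

parse_schema(schema)  # raises SchemaParseException if the nesting is wrong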

Avro Tools Failure Expected start-union. Got VALUE_STRING

I've defined the Avro schema below (car_sales_customer.avsc),
{
"type" : "record",
"name" : "topLevelRecord",
"fields" : [ {
"name": "cust_date",
"type": "string"
},
{
"name": "customer",
"type": {
"type": "array",
"items": {
"name": "customer",
"type": "record",
"fields": [
{
"name": "address",
"type": "string"
},
{
"name": "driverlience",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "name",
"type": "string"
},
{
"name": "phone",
"type": "string"
}
]
}
}
}]
}
and my input JSON payload (car_sales_customer.json) is as follows,
{"cust_date":"2017-04-28","customer":[{"address":"SanFrancisco,CA","driverlience":"K123499989","name":"JoyceRidgely","phone":"16504378889"}]}
I'm trying to use avro-tools to convert the above JSON to Avro using the schema,
java -jar ./avro-tools-1.9.2.jar fromjson --schema-file ./car_sales_customer.avsc ./car_sales_customer.json > ./car_sales_customer.avro
I get the error below when I execute the above command,
Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_STRING
at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:514)
at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:433)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:283)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:298)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:89)
at org.apache.avro.tool.Main.run(Main.java:66)
at org.apache.avro.tool.Main.main(Main.java:55)
Is there a solution to overcome the error?
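The likely cause is Avro's JSON encoding, which avro-tools fromjson expects: a non-null value of a union must be wrapped in a single-key JSON object naming the chosen branch. Since driverlience is declared as ["null", "string"], the payload would need to read:
{"cust_date":"2017-04-28","customer":[{"address":"SanFrancisco,CA","driverlience":{"string":"K123499989"},"name":"JoyceRidgely","phone":"16504378889"}]}
With the union value wrapped this way, the same avro-tools command should succeed.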

Nifi RecordReader & RecordWriter serialization error. IllegalTypeConversionException; Cannot convert value of class; because the type is not supported

I'm trying to convert JSON to JSON using JoltTransformRecord in Apache NiFi. When I try the transform in https://jolt-demo.appspot.com/, I get the correct result, so that part is fine.
But when I try the transform with JoltTransformRecord, it throws an exception: "Cannot convert value of class [Ljava.lang.Object; because the type is not supported". I don't understand why I'm getting this error. I checked my input and output schemas and didn't find anything wrong; they look correct.
Below are my example input and output JSON, my Jolt specification, and my input and output schemas. I'm using JsonTreeReader and JsonRecordSetWriter.
How can I solve this problem?
Example input JSON for JoltTransformRecord (in this example there is only one JSON object inside the array, but in reality there are many):
[ {
"uuid" : "MFMS1-MC5",
"componentId" : "path1",
"Samples" : {
"PathFeedrate" : [ {
"dataItemId" : "pf",
"timestamp" : "2019-03-01T21:48:27.940558Z",
"sequence" : "68104",
"value" : "425.5333",
"name" : "Fact",
"subType" : "ACTUAL"
}, {
"dataItemId" : "pf",
"timestamp" : "2019-03-01T21:48:30.244219Z",
"sequence" : "68117",
"value" : "0",
"name" : "Fact",
"subType" : "ACTUAL"
} ]
},
"Events" : {
"SequenceNumber" : [ {
"dataItemId" : "seq",
"timestamp" : "2019-03-01T21:48:27.940558Z",
"sequence" : "68105",
"value" : "0",
"name" : "sequenceNum"
} ],
"Unit" : [ {
"dataItemId" : "unit",
"timestamp" : "2019-03-01T21:48:27.940558Z",
"sequence" : "68106",
"value" : "13",
"name" : "unitNum"
} ]
}
}]
Sample output JSON I want:
{
"DataItems" : [ {
"uuid" : "MFMS1-MC5",
"componentId" : "path1",
"eventType" : "Samples",
"type" : "PathFeedrate",
"dataItemId" : "pf",
"timestamp" : "2019-03-01T21:48:27.940558Z",
"sequence" : "68104",
"value" : "425.5333",
"name" : "Fact",
"subType" : "ACTUAL"
}, {
"uuid" : "MFMS1-MC5",
"componentId" : "path1",
"eventType" : "Samples",
"type" : "PathFeedrate",
"dataItemId" : "pf",
"timestamp" : "2019-03-01T21:48:30.244219Z",
"sequence" : "68117",
"value" : "0",
"name" : "Fact",
"subType" : "ACTUAL"
}, {
"uuid" : "MFMS1-MC5",
"componentId" : "path1",
"eventType" : "Events",
"type" : "SequenceNumber",
"dataItemId" : "seq",
"timestamp" : "2019-03-01T21:48:27.940558Z",
"sequence" : "68105",
"value" : "0",
"name" : "sequenceNum"
}, {
"uuid" : "MFMS1-MC5",
"componentId" : "path1",
"eventType" : "Events",
"type" : "Unit",
"dataItemId" : "unit",
"timestamp" : "2019-03-01T21:48:27.940558Z",
"sequence" : "68106",
"value" : "13",
"name" : "unitNum"
} ]
}
My Jolt specification:
[
{
"operation": "shift",
"spec": {
"Samples": {
"*": {
"*": {
"#(3,uuid)": "Items.&2[#2].uuid",
"#(3,componentId)": "Items.&2[#2].componentId",
"$2": "Items.&2[#2].eventType",
"$1": "Items.&2[#2].type",
"*": "Items.&2[#2].&"
}
}
},
"Events": {
"*": {
"*": {
"#(3,uuid)": "Items.&2[#2].uuid",
"#(3,componentId)": "Items.&2[#2].componentId",
"$2": "Items.&2[#2].eventType",
"$1": "Items.&2[#2].type",
"*": "Items.&2[#2].&"
}
}
},
"Condition": {
"*": {
"*": {
"#(3,uuid)": "Items.&2[#2].uuid",
"#(3,componentId)": "Items.&2[#2].componentId",
"$2": "Items.&2[#2].eventType",
"$1": "Items.&2[#2].value",
"*": "Items.&2[#2].&"
}
}
}
}
},
{
"operation": "shift",
"spec": {
"Items": {
"*": {
"*": "DataItems[]"
}
}
}
}
]
This specification works correctly; I have verified it in the Jolt transform demo.
I'm using JsonTreeReader to read the JSON in JoltTransformRecord, and this is my input schema:
{
"name": "Items",
"namespace": "Items",
"type": "record",
"fields": [
{
"name": "uuid",
"type": "string"
},
{
"name": "componentId",
"type": "string"
},
{
"name": "Samples",
"type": ["null", {
"type": "map",
"values": {
"type": "array",
"items": {
"name": "SamplesDataItem",
"type": "record",
"fields": [
{
"name": "dataItemId",
"type": "string"
},
{
"name": "timestamp",
"type": "string"
},
{
"name": "sequence",
"type": "string"
},
{
"name": "value",
"type": "string"
},
{
"name": "name",
"type": ["null", "string"]
},
{
"name": "subType",
"type": ["null", "string"]
},
{
"name": "sampleRate",
"type": ["null", "string"]
},
{
"name": "statistic",
"type": ["null", "string"]
},
{
"name": "duration",
"type": ["null", "string"]
},
{
"name": "sampleCount",
"type": ["null", "string"]
},
{
"name": "compositionId",
"type": ["null", "string"]
},
{
"name": "resetTriggered",
"type": ["null", "string"]
}
]
}
}
}]
},
{
"name": "Events",
"type": ["null", {
"type": "map",
"values": {
"type": "array",
"items": {
"name": "EventsDataItem",
"type": "record",
"fields": [
{
"name": "dataItemId",
"type": "string"
},
{
"name": "timestamp",
"type": "string"
},
{
"name": "sequence",
"type": "string"
},
{
"name": "value",
"type": "string"
},
{
"name": "name",
"type": ["null", "string"]
},
{
"name": "subType",
"type": ["null", "string"]
},
{
"name": "compositionId",
"type": ["null", "string"]
},
{
"name": "resetTriggered",
"type": ["null", "string"]
}
]
}
}
}]
},
{
"name": "Condition",
"type": ["null", {
"type": "map",
"values": {
"type": "array",
"items": {
"name": "ConditionDataItem",
"type": "record",
"fields": [
{
"name": "dataItemId",
"type": "string"
},
{
"name": "timestamp",
"type": "string"
},
{
"name": "type",
"type": "string"
},
{
"name": "sequence",
"type": "string"
},
{
"name": "name",
"type": ["null", "string"]
},
{
"name": "subType",
"type": ["null", "string"]
},
{
"name": "nativeCode",
"type": ["null", "string"]
},
{
"name": "nativeSeverity",
"type": ["null", "string"]
},
{
"name": "qualifier",
"type": ["null", "string"]
},
{
"name": "statistic",
"type": ["null", "string"]
},
{
"name": "compositionId",
"type": ["null", "string"]
}
]
}
}
}]
}
]
}
I'm using JsonRecordSetWriter to write the converted result from JoltTransformRecord, and this is my output schema:
{
"name": "Items",
"type": "record",
"namespace": "Items",
"fields": [
{
"name": "DataItems",
"type": {
"type": "array",
"items": {
"name": "DataItems",
"type": "record",
"fields": [
{
"name": "uuid",
"type": "string"
},
{
"name": "componentId",
"type": "string"
},
{
"name": "eventType",
"type": "string"
},
{
"name": "type",
"type": "string"
},
{
"name": "dataItemId",
"type": "string"
},
{
"name": "timestamp",
"type": "string"
},
{
"name": "value",
"type": "string"
},
{
"name": "name",
"type": ["null", "string"],
"default": null
},
{
"name": "subType",
"type": ["null", "string"],
"default": null
}
]
}
}
}
]
}
This is indeed a bug in the record handling utilities; I have written NIFI-6105 to cover the fix. Good catch!
As a workaround, since you have JSON as input and output, you can use JoltTransformJson instead of JoltTransformRecord. Alternatively, if you know the keys in the map (e.g., PathFeedrate), you can change the schema to treat it as a record rather than a map; that might get you around the bug.

How to define a schema which have a union in an array in avro?

I want to define my array element as a union. Is it possible? If so, please share a sample schema.
I think this is what you are looking for:
Avro Schema:
{
"type": "record",
"namespace": "example.avro",
"name": "array_union",
"fields": [
{
"name": "body",
"type": {
"name": "body",
"type": "record",
"fields": [
{
"name": "abc",
"type": [
"null",
{
"type": "array",
"name": "abc_name_0",
"items": {
"name": "_name_0",
"type": "record",
"fields": [
{
"name": "app_id",
"type": [
"null",
"string"
]
},
{
"name": "app_name",
"type": [
"null",
"string"
]
},
{
"name": "app_key",
"type": [
"null",
"string"
]
}
]
}
}
]
}
]
}
}
]
}
Valid JSON that the schema can accept:
{
"body": {
"abc": {
"array": [
{
"app_id": {
"string": "abc"
},
"app_name": {
"string": "bcd"
},
"app_key": {
"string": "cde"
}
}
]
}
}
}
Or,
{
"body": {
"abc": null
}
}
You could add the piece of code below as a record field's type:
{
"name": "some_type",
"type": {
"type": "array",
"items": {
"name": "SomeType",
"type": "record",
"fields": [
{
"name": "name",
"type": ["null", "string"]
},
{
"name": "text",
"type": "string"
},
{
"name": "type",
"type": {
"name": "NamedTextType",
"type": "enum",
"symbols": [ "named_text", "common_named_text" ]
}
}
]
}
}
}
Hope this helps!
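As a sanity check, here is a minimal sketch with Python's fastavro (the Wrapper record is my own addition so the field definition can be parsed on its own):

from fastavro import parse_schema
from fastavro.validation import validate

schema = {
    "type": "record",
    "name": "Wrapper",  # hypothetical wrapper, not part of the answer
    "fields": [
        {
            "name": "some_type",
            "type": {
                "type": "array",
                "items": {
                    "name": "SomeType",
                    "type": "record",
                    "fields": [
                        {"name": "name", "type": ["null", "string"]},
                        {"name": "text", "type": "string"},
                        {
                            "name": "type",
                            "type": {
                                "name": "NamedTextType",
                                "type": "enum",
                                "symbols": ["named_text", "common_named_text"],
                            },
                        },
                    ],
                },
            },
        }
    ],
}

parsed = parse_schema(schema)
record = {"some_type": [{"name": None, "text": "hi", "type": "named_text"}]}
assert validate(record, parsed)  # the union field "name" accepts null or a string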
