Avro Schema and Arrays - avro

In C#, I can define these two POCOs to model generations of a family:
public class Family
{
    public List<Person> FamilyMembers { get; set; }
}
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public List<Person> Children { get; set; }
}
I am attempting to define an Avro schema to serialize FamilyMembers to. Is it possible in Avro to define a recursive array (not sure if that's the proper term), rather than having to spell out each generation in the schema as below?
{
"type": "record",
"name": "family",
"namespace": "com.family.my",
"fields": [
{
"name":"familymember",
"type":{
"type": "array",
"items":{
"name":"person",
"type":"record",
"fields":[
{"name":"firstname", "type":"string"},
{"name":"lastname", "type":"string"},
{"name":"children",
"type":{
"type": "array",
"items":{
"name":"children",
"type":"record",
"fields":[
{"name":"firstname", "type":"string"},
{"name":"lastname", "type":"string"},
{"name":"grandchildren",
"type":{
"type": "array",
"items":{
"name":"greatgrandchildren",
"type":"record",
"fields":[
{"name":"firstname", "type":"string"},
{"name":"lastname", "type":"string"}
]
}
}}
]
}
}}
]
}
}
}
]
}

Yes! After you define person, you may use it as the item type in the children array. For example:
{
  "type": "record",
  "name": "family",
  "namespace": "com.family.my",
  "fields": [{
    "name": "familymember",
    "type": {
      "type": "array",
      "items": {
        "name": "person",
        "type": "record",
        "fields": [
          {"name": "firstname", "type": "string"},
          {"name": "lastname", "type": "string"},
          {"name": "children", "type": {"type": "array", "items": "person"}}
        ]
      }
    }
  }]
}
This sort of recursive schema is demonstrated by the LongList from the Avro spec:
{
"type": "record",
"name": "LongList",
"aliases": ["LinkedLongs"],
"fields" : [
{"name": "value", "type": "long"},
{"name": "next", "type": ["null", "LongList"]}
]
}
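To see why the by-name reference is enough, here is a minimal Python sketch (standard library only) that checks nested data against the recursive family schema. The `matches` function is a hypothetical helper written for this illustration, not part of any Avro library; it only understands "string", arrays, and records, and it ignores namespaces.

```python
import json

SCHEMA = json.loads("""
{
  "type": "record",
  "name": "family",
  "namespace": "com.family.my",
  "fields": [{
    "name": "familymember",
    "type": {
      "type": "array",
      "items": {
        "name": "person",
        "type": "record",
        "fields": [
          {"name": "firstname", "type": "string"},
          {"name": "lastname", "type": "string"},
          {"name": "children", "type": {"type": "array", "items": "person"}}
        ]
      }
    }
  }]
}
""")


def matches(schema, datum, named=None):
    """Return True if datum fits schema; records are remembered by name."""
    named = {} if named is None else named
    if isinstance(schema, str):
        if schema == "string":
            return isinstance(datum, str)
        # Any other bare string is treated as a reference to a named record.
        return matches(named[schema], datum, named)
    if schema["type"] == "array":
        return isinstance(datum, list) and all(
            matches(schema["items"], item, named) for item in datum)
    if schema["type"] == "record":
        # Register the record before visiting its fields, so that a field
        # can refer back to the record that contains it.
        named[schema["name"]] = schema
        return isinstance(datum, dict) and all(
            f["name"] in datum and matches(f["type"], datum[f["name"]], named)
            for f in schema["fields"])
    raise ValueError(f"unsupported schema: {schema!r}")


# Three generations, although the schema describes "person" only once.
family = {"familymember": [
    {"firstname": "Ann", "lastname": "Lee", "children": [
        {"firstname": "Bo", "lastname": "Lee", "children": [
            {"firstname": "Cy", "lastname": "Lee", "children": []}]}]}]}

print(matches(SCHEMA, family))
```

The depth of the data is not bounded by the schema: each "children" array simply points back at the single person definition.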

Related

How to include fields of one AVRO schema in another without nesting?

I would like to create an AVRO schema that uses fields from another schema without creating a nested record. Take the following example:
Schema A:
{
"type": "record",
"namespace": "example",
"name": "SchemaA",
"fields": [
{
"name": "foo",
"type": "string"
},
{
"name": "bar",
"type": "string"
}
]
}
I want to create Schema B which uses the fields from Schema A and adds an additional field, i.e. the result should be:
{
"type": "record",
"namespace": "example",
"name": "SchemaB",
"fields": [
{
"name": "foo",
"type": "string"
},
{
"name": "bar",
"type": "string"
},
{
"name": "abc",
"type": "string"
}
]
}
Is it possible to create this Schema B by referencing the fields of Schema A?
Note: I do not want to create a nested field like:
{
"type": "record",
"namespace": "example",
"name": "SchemaB",
"fields": [
{
"name": "SchemaA",
"type": "example.SchemaA"
},
{
"name": "abc",
"type": "string"
}
]
}
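As far as I know, the Avro schema JSON itself has no include or inheritance mechanism for fields, so one workaround is to generate Schema B from Schema A before handing it to Avro. A minimal Python sketch of that idea follows; `extend_record` is a hypothetical helper, and the approach assumes you control how the schema files are produced.

```python
import copy
import json

SCHEMA_A = {
    "type": "record",
    "namespace": "example",
    "name": "SchemaA",
    "fields": [
        {"name": "foo", "type": "string"},
        {"name": "bar", "type": "string"},
    ],
}


def extend_record(base, name, extra_fields):
    """Return a new record schema: base's fields followed by extra_fields."""
    derived = copy.deepcopy(base)  # never mutate the base schema
    derived["name"] = name
    derived["fields"].extend(copy.deepcopy(extra_fields))
    return derived


SCHEMA_B = extend_record(SCHEMA_A, "SchemaB",
                         [{"name": "abc", "type": "string"}])
print(json.dumps(SCHEMA_B, indent=2))
```

The result is a flat record with foo, bar, and abc, and Schema A stays the single place where the shared fields are maintained.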

Avro Nested array exception

I am trying to write an Avro schema for a nested array.
I believe the topmost array, Stores, is the problem, whereas the inner array, Business, is correct.
{
  "name": "Stores",
  "type": {
    "type": "array",
    "items": {
      "name": "Hours",
      "type": "record",
      "fields": [
        {
          "name": "Week",
          "type": "string"
        },
        {
          "name": "Business",
          "type": "array",
          "items": {
            "name": "Business_record",
            "type": "record",
            "fields": [
              {"name": "Day", "type": "string"},
              {"name": "StartTime", "type": "string"},
              {"name": "EndTime", "type": "string"}
            ]
          }
        }
      ]
    }
  }
}
And the exception I'm getting is:
[ {
"level" : "fatal",
"message" : "illegal Avro schema",
"exceptionClass" : "org.apache.avro.SchemaParseException",
"exceptionMessage" : "No type: {\"name\":\"Stores\",\"type\":{\"type\":\"array\",\"items\":{\"name\":\"Hours\",\"type\":\"record\",\"fields\":[{\"name\":\"Week\",\"type\":\"string\"},{\"name\":\"Business\",\"type\":\"array\",\"items\":{\"name\":\"Business_record\",\"type\":\"record\",\"fields\":[{\"name\":\"Day\",\"type\":\"string\"},{\"name\":\"StartTime\",\"type\":\"string\"},{\"name\":\"EndTime\",\"type\":\"string\"}]}}]}}}",
"info" : "other messages follow (if any)"
} ]
I think it has something to do with [] or {} for the outer array fields, but I'm not able to figure it out.
Any help is appreciated.
I found the mistake I was making:
once I added the "type": wrapper for the nested array, it worked for me.
{
"name": "Stores",
"type": "array",
"items": {
"name": "Hours",
"type": "record",
"fields": [
{
"name": "Week",
"type": "string"
},
{
"name": "Business",
"type": {
"type": "array",
"items": {
"name": "Business_record",
"type": "record",
"fields": [
{
"name": "Day",
"type": "string"
},
{
"name": "StartTime",
"type": "string"
},
{
"name": "EndTime",
"type": "string"
}
]
}
}
}
]
}
}
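This kind of mistake can be spotted mechanically. Below is a rough Python sketch of a lint that walks a schema and reports record fields whose "type" is a bare complex-type keyword, with "items" or "fields" sitting next to it instead of inside a nested type object. The function `misplaced_complex_types` and its path format are made up for this example and are not part of Avro.

```python
import copy

COMPLEX = {"record", "array", "map", "enum", "fixed"}


def misplaced_complex_types(schema, path="$"):
    """Yield paths of fields whose "type" is a bare complex-type keyword."""
    if isinstance(schema, list):  # union: check every branch
        for i, branch in enumerate(schema):
            yield from misplaced_complex_types(branch, f"{path}[{i}]")
    elif isinstance(schema, dict):
        for field in schema.get("fields", []):
            field_path = f"{path}.{field['name']}"
            ftype = field["type"]
            if isinstance(ftype, str) and ftype in COMPLEX:
                yield field_path  # should be {"type": {"type": ...}}
            else:
                yield from misplaced_complex_types(ftype, field_path)
        for key in ("items", "values"):  # array items, map values
            if key in schema:
                yield from misplaced_complex_types(schema[key], f"{path}.{key}")
        if isinstance(schema.get("type"), (dict, list)):
            yield from misplaced_complex_types(schema["type"], path)


# The schema from the question, written as a Python dict.
QUESTION_SCHEMA = {
    "name": "Stores",
    "type": {"type": "array", "items": {
        "name": "Hours", "type": "record", "fields": [
            {"name": "Week", "type": "string"},
            {"name": "Business", "type": "array", "items": {
                "name": "Business_record", "type": "record", "fields": [
                    {"name": "Day", "type": "string"},
                    {"name": "StartTime", "type": "string"},
                    {"name": "EndTime", "type": "string"}]}}]}},
}

# Apply the fix: wrap the nested array in its own "type" object.
FIXED_SCHEMA = copy.deepcopy(QUESTION_SCHEMA)
business = FIXED_SCHEMA["type"]["items"]["fields"][1]
business["type"] = {"type": "array", "items": business.pop("items")}

print(list(misplaced_complex_types(QUESTION_SCHEMA)))
print(list(misplaced_complex_types(FIXED_SCHEMA)))
```

On the question's schema the lint flags the Business field, and after wrapping the array it reports nothing.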

Avro schema getting undefined type name when using Record type

So I'm trying to parse an object with this Avro schema.
The object looks like:
myInfo: {size: 'XL'}
But it behaves as if the record type doesn't actually exist, and I'm getting an "undefined type name: data.platform_data.test_service.result.record at Function.Type.forSchema" error for it.
The schema looks like:
"avro": {
"metadata": {
"loadType": "full",
"version": "0.1"
},
"schema": {
"name": "data.platform_data.test_service.result",
"type": "record",
"fields": [
{
"name": "myInfo",
"type": "record",
"fields": [{
"name": "size",
"type": {"name":"size", "type": "string"}
}]
}
]
}
}
I should mention I'm also using avsc for this. Does anybody have any ideas? I've tried pretty much all combinations, but AFAIK the only way of parsing an object like this is with a record.
Playing around with the schema, I found that "type": "record" at the field level is the problem. I moved it into a nested type definition, and it worked. The description here seems a little confusing.
Change
Before:
{
"name": "myInfo",
"type": "record",
"fields": [{
"name": "size",
"type": {"name":"size", "type": "string"}
}]
}
After:
{
"name": "myInfo",
"type": {
"type": "record",
"name": "myInfo",
"fields": [
{
"name": "size",
"type": {"name":"size", "type": "string"}
}
]
}
}
Updated schema which is working:
{
"name": "data.platform_data.test_service.result",
"type": "record",
"fields": [
{
"name": "myInfo",
"type": {
"type": "record",
"name": "myInfo",
"fields": [
{
"name": "size",
"type": {"name":"size", "type": "string"}
}
]
}
}
]
}
To make a record attribute nullable, the process is the same as for any other attribute: use a union with "null" (as shown in the schema below):
{
"name": "data.platform_data.test_service.result",
"type": "record",
"fields": [
{
"name": "myInfo",
"type": [
"null",
{
"type": "record",
"name": "myInfo",
"fields": [
{
"name": "size",
"type": {
"name": "size",
"type": "string"
}
}
]
}
]
}
]
}
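If you generate schemas from code, the union wrapping can be done by a small helper. The Python sketch below is only an illustration: `nullable` and `nullable_field` are hypothetical names, and adding "default": null is my own choice here rather than something the schema above requires. With "null" as the first branch of the union, a null default is valid.

```python
import json


def nullable(schema):
    """Wrap schema in a ["null", ...] union unless it already allows null."""
    if isinstance(schema, list):  # already a union
        return schema if "null" in schema else ["null", *schema]
    return ["null", schema]


def nullable_field(name, schema):
    """Build a field whose type is nullable and whose default is null."""
    return {"name": name, "type": nullable(schema), "default": None}


MY_INFO = {
    "type": "record",
    "name": "myInfo",
    "fields": [{"name": "size", "type": "string"}],
}

RESULT = {
    "name": "data.platform_data.test_service.result",
    "type": "record",
    "fields": [nullable_field("myInfo", MY_INFO)],
}
print(json.dumps(RESULT, indent=2))
```

The record definition still lives inside the field's "type", only now as the second branch of the union.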

Apache Avro UnresolvedUnionException: Not in union ["null",{"type":"int","logicalType":"date"}]: 2001-01-01

Despite examples collected here and there, I haven't been able to produce a correct Avro 1.9.1 schema for my (lomboked) class, getting the title's error message at serialization time of my LocalDate field.
Can someone please explain what I'm missing?
@Data
public class Person {
    private Long id;
    private String firstname;
    private LocalDate birth;
    private Integer votes = 0;
}
This is the schema:
{
"type": "record",
"name": "Person",
"namespace": "com.example.demo",
"fields": [
{
"name": "id",
"type": "long"
},
{
"name": "firstname",
"type": "string"
},
{
"name": "birth",
"type": [ "null", { "type": "int", "logicalType": "date" }]
},
{
"name": "votes",
"type": "int"
}]
}
The error, meaning java.time.LocalDate is not found in the union's "index named" map, is this:
org.apache.avro.UnresolvedUnionException: Not in union ["null",{"type":"int","logicalType":"date"}]: 2001-01-01
The keys of that "index named" map are "null" and "int", which seems logical.
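For context, the date logical type annotates an Avro int that counts days since the Unix epoch (1970-01-01). So the value that finally lands in the "int" branch of the union is a plain integer, and something has to convert the date object into it first. The Python sketch below only illustrates that mapping; the function names are hypothetical. In the Java library this job belongs to a logical-type conversion registered with the data model, and whether a missing conversion is the cause here is my assumption.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def date_to_avro_int(d):
    """Encode a date as the number of days since 1970-01-01."""
    return (d - EPOCH).days


def avro_int_to_date(days):
    """Decode a days-since-epoch integer back into a date."""
    return EPOCH + timedelta(days=days)


days = date_to_avro_int(date(2001, 1, 1))
print(days)                    # 11323
print(avro_int_to_date(days))  # 2001-01-01
```

Without such a conversion, a serializer that sees a date object finds neither a null nor an int, which matches the "Not in union" message.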

How to re-use logical type in Apache Avro

I can define a UUID type in Avro with a logical type like this:
{
"type":"record",
"name":"Metadata",
"namespace":"com.example",
"doc":"Event metadata",
"fields":[
{
"name":"messageId",
"type": {
"type": "string",
"logicalType": "uuid"
}
}
]
}
But I'd like to reuse UUID in some way like this:
{
"type":"record",
"name":"Metadata",
"namespace":"com.example",
"doc":"Event metadata",
"fields":[
{
"name":"messageId",
"type": "com.example.UUID"
}
]
}
{
"namespace" : "com.example",
"name" : "UUID",
"type": {
"type": "string",
"logicalType": "uuid"
}
}
But this schema is not valid. How can I re-use logical type in Avro?
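One thing to keep in mind: in Avro only records, enums, and fixed types can carry a name, so an annotated string cannot be declared once and then referenced as com.example.UUID. A possible workaround is to keep the definition in one place in code and stamp it into every field when the schema is generated. The Python sketch below assumes the schema is built programmatically; `uuid_field` and the correlationId field are hypothetical and only there for illustration.

```python
import copy
import json

# Single source of truth for the UUID logical type.
UUID_TYPE = {"type": "string", "logicalType": "uuid"}


def uuid_field(name):
    """Build a field whose type is a fresh copy of the shared UUID type."""
    return {"name": name, "type": copy.deepcopy(UUID_TYPE)}


METADATA = {
    "type": "record",
    "name": "Metadata",
    "namespace": "com.example",
    "doc": "Event metadata",
    "fields": [
        uuid_field("messageId"),
        uuid_field("correlationId"),  # hypothetical second UUID field
    ],
}
print(json.dumps(METADATA, indent=2))
```

The generated JSON repeats the inline definition in each field, but the source only states it once.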
