AvroSerializer: Avro schema for orderbook snapshots

I have a Kafka cluster running and I want to store L2 orderbook snapshots into a topic. Each snapshot contains dictionaries of {key: value} pairs whose keys are of type float, as in the following example:
{
    'exchange': 'ex1',
    'symbol': 'sym1',
    'book': {
        'bid': {
            100.0: 20.0,
            101.0: 21.3,
            102.0: 34.6,
            ...,
        },
        'ask': {
            100.0: 20.0,
            101.0: 21.3,
            102.0: 34.6,
            ...,
        }
    },
    'timestamp': 1642524222.1160505
}
My schema proposal below is not working and I'm pretty sure it is because the keys in the 'bid' and 'ask' dictionaries are not of type string.
{
    "namespace": "confluent.io.examples.serialization.avro",
    "name": "L2_Book",
    "type": "record",
    "fields": [
        {"name": "exchange", "type": "string"},
        {"name": "symbol", "type": "string"},
        {"name": "book", "type": "record", "fields": {
            "name": "bid", "type": "record", "fields": {
                {"name": "price", "type": "float"},
                {"name": "volume", "type": "float"}
            },
            "name": "ask", "type": "record", "fields": {
                {"name": "price", "type": "float"},
                {"name": "volume", "type": "float"}
            }
        },
        {"name": "timestamp", "type": "float"}
    ]
}
KafkaError{code=_VALUE_SERIALIZATION,val=-161,str="no value and no default for bids"}
What would be a proper avro-schema here?

First, you have a typo. fields needs to be an array in the schema definition.
However, your bid (and ask) objects are not records; they are a map<float, float>. In other words, they do not have literal price and volume keys.
Avro has Map types, but the keys are "assumed to be strings".
You are welcome to try
{"name": "bid", "type": "map", "values": "float"}
Otherwise, you need to reformat your data payloads, for example as a list of objects
'bid': [
    {'price': 100.0, 'volume': 20.0},
    ...,
],
Along with
{"name": "bid", "type": "array", "items": {
"type": "record",
"name": "BidItem",
"fields": [
{"name": "price", "type": "float"},
{"name": "volume", "type": "float"}
]
}}
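If you go the list-of-objects route, the reshaping is a one-liner in Python; a minimal sketch, where the helper name book_side_to_records is my own invention rather than anything from the question:

def book_side_to_records(side: dict) -> list:
    # Turn a {price: volume} dict into a list of {'price', 'volume'} records.
    return [{"price": float(price), "volume": float(volume)}
            for price, volume in side.items()]

book = {"bid": {100.0: 20.0, 101.0: 21.3}, "ask": {102.0: 34.6}}
book = {side: book_side_to_records(levels) for side, levels in book.items()}
print(book["bid"])  # [{'price': 100.0, 'volume': 20.0}, {'price': 101.0, 'volume': 21.3}]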

I have finally figured out two working resolutions. In both cases I need to convert the original data.
The main lessons for me have been:
Avro maps need keys of type string
Avro complex types (e.g. maps and records) need to be defined properly, with the complex type nested inside the field's "type":
{"name": "bid", "type": {
    "type": "array", "items": {
        ...
Special thanks to OneCricketeer for pointing me in the right direction! :-)
1) bids and asks as a map with the key being of type string
data example
{
    'exchange': 'ex1',
    'symbol': 'sym1',
    'book': {
        'bid': {
            "100.0": 20.0,
            "101.0": 21.3,
            "102.0": 34.6,
            ...,
        },
        'ask': {
            "100.0": 20.0,
            "101.0": 21.3,
            "102.0": 34.6,
            ...,
        }
    },
    'timestamp': 1642524222.1160505
}
schema
{
    "namespace": "confluent.io.examples.serialization.avro",
    "name": "L2_Book",
    "type": "record",
    "fields": [
        {"name": "exchange", "type": "string"},
        {"name": "symbol", "type": "string"},
        {"name": "book", "type": {
            "name": "book",
            "type": "record",
            "fields": [
                {"name": "bid", "type": {
                    "type": "map", "values": "float"
                }},
                {"name": "ask", "type": {
                    "type": "map", "values": "float"
                }}
            ]
        }},
        {"name": "timestamp", "type": "float"}
    ]
}
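Because Avro map keys must be strings, the snapshot has to be converted before serialization; a minimal Python sketch of that conversion (the helper name is hypothetical, not part of the original code):

def stringify_price_keys(side: dict) -> dict:
    # Avro map keys are strings, so render each float price as its string form.
    return {str(price): float(volume) for price, volume in side.items()}

book = {"bid": {100.0: 20.0, 101.0: 21.3}, "ask": {102.0: 34.6}}
book = {side: stringify_price_keys(levels) for side, levels in book.items()}
print(book)  # {'bid': {'100.0': 20.0, '101.0': 21.3}, 'ask': {'102.0': 34.6}}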
2) bids and asks as an array of records
data example
{
    'exchange': 'ex1',
    'symbol': 'sym1',
    'book': {
        'bid': [
            {"price": 100.0, "volume": 20.0},
            {"price": 101.0, "volume": 21.3},
            {"price": 102.0, "volume": 34.6},
            ...,
        ],
        'ask': [
            {"price": 100.0, "volume": 20.0},
            {"price": 101.0, "volume": 21.3},
            {"price": 102.0, "volume": 34.6},
            ...,
        ]
    },
    'timestamp': 1642524222.1160505
}
schema
{
    "namespace": "confluent.io.examples.serialization.avro",
    "name": "L2_Book",
    "type": "record",
    "fields": [
        {"name": "exchange", "type": "string"},
        {"name": "symbol", "type": "string"},
        {"name": "book", "type": {
            "name": "book",
            "type": "record",
            "fields": [
                {"name": "bid", "type": {
                    "type": "array", "items": {
                        "name": "bid",
                        "type": "record",
                        "fields": [
                            {"name": "price", "type": "float"},
                            {"name": "volume", "type": "float"}
                        ]
                    }
                }},
                {"name": "ask", "type": {
                    "type": "array", "items": {
                        "name": "ask",
                        "type": "record",
                        "fields": [
                            {"name": "price", "type": "float"},
                            {"name": "volume", "type": "float"}
                        ]
                    }
                }}
            ]
        }},
        {"name": "timestamp", "type": "float"}
    ]
}
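For completeness, here is a minimal sketch of wiring one of these schemas into confluent-kafka's AvroSerializer, which the KafkaError in the question suggests is in use. The Schema Registry URL, topic name, and schema file path are placeholders:

from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

# Either of the two working schemas above, saved locally (placeholder path).
with open("l2_book.avsc") as f:
    schema_str = f.read()

schema_registry_client = SchemaRegistryClient({"url": "http://localhost:8081"})  # placeholder
avro_serializer = AvroSerializer(schema_registry_client, schema_str)

snapshot = {
    "exchange": "ex1",
    "symbol": "sym1",
    "book": {
        "bid": [{"price": 100.0, "volume": 20.0}, {"price": 101.0, "volume": 21.3}],
        "ask": [{"price": 102.0, "volume": 34.6}],
    },
    "timestamp": 1642524222.1160505,
}

# The SerializationContext tells the serializer which topic/field it is encoding.
value_bytes = avro_serializer(snapshot, SerializationContext("l2-book", MessageField.VALUE))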

Related

Avro Nested array exception

I am trying to generate an Avro schema for a nested array.
The top-most array Stores is the issue; the inner array Business is correct.
{"name": "Stores",
"type": {
"type": "array",
"items": {
"name": "Hours",
"type": "record",
"fields": [
{
"name": "Week",
"type": "string"
},
{"name": "Business",
"type":"array",
"items": {"name":"Business_record","type":"record","fields":[
{"name": "Day", "type":"string"},
{"name": "StartTime", "type": "string"},
{"name": "EndTime", "type": "string"}
]}
}
]
}
}
And the exception I'm getting is:
[ {
"level" : "fatal",
"message" : "illegal Avro schema",
"exceptionClass" : "org.apache.avro.SchemaParseException",
"exceptionMessage" : "No type: {\"name\":\"Stores\",\"type\":{\"type\":\"array\",\"items\":{\"name\":\"Hours\",\"type\":\"record\",\"fields\":[{\"name\":\"Week\",\"type\":\"string\"},{\"name\":\"Business\",\"type\":\"array\",\"items\":{\"name\":\"Business_record\",\"type\":\"record\",\"fields\":[{\"name\":\"Day\",\"type\":\"string\"},{\"name\":\"StartTime\",\"type\":\"string\"},{\"name\":\"EndTime\",\"type\":\"string\"}]}}]}}}",
"info" : "other messages follow (if any)"
} ]
I think it is something to do with the [] or {} around the outer array's fields, but I'm not able to figure it out.
Any help is appreciated.
I found the mistake I was making:
when I added the nested "type" object for the inner array, it worked for me.
{
    "name": "Stores",
    "type": "array",
    "items": {
        "name": "Hours",
        "type": "record",
        "fields": [
            {
                "name": "Week",
                "type": "string"
            },
            {
                "name": "Business",
                "type": {
                    "type": "array",
                    "items": {
                        "name": "Business_record",
                        "type": "record",
                        "fields": [
                            {"name": "Day", "type": "string"},
                            {"name": "StartTime", "type": "string"},
                            {"name": "EndTime", "type": "string"}
                        ]
                    }
                }
            }
        ]
    }
}
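A quick way to sanity-check a schema like this outside NiFi or avro-tools is to run it through a parser; a minimal sketch assuming Python's fastavro and the corrected schema saved to a placeholder stores.avsc file:

import json
import fastavro  # assumed library, used here only as a schema parser

# The corrected schema above, saved to a local file (placeholder path).
with open("stores.avsc") as f:
    schema = json.load(f)

# parse_schema raises if the schema is malformed, so reaching the print
# means the definition is structurally valid.
parsed = fastavro.parse_schema(schema)
print("schema parsed OK")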

Avro schema getting undefined type name when using Record type

So I'm trying to parse an object with this Avro schema.
The object is like:
myInfo: {size: 'XL'}
But it's behaving like the record type doesn't actually exist, and I'm getting undefined type name: data.platform_data.test_service.result.record at Function.Type.forSchema for it.
The schema looks like:
"avro": {
"metadata": {
"loadType": "full",
"version": "0.1"
},
"schema": {
"name": "data.platform_data.test_service.result",
"type": "record",
"fields": [
{
"name": "myInfo",
"type": "record",
"fields": [{
"name": "size",
"type": {"name":"size", "type": "string"}
}]
}
]
}
}
I should mention I'm also using avsc for this. Anybody have any ideas? I've tried pretty much all combinations, but AFAIK the only way of parsing out an object like this is with record.
Playing around with the schema, I found that "type": "record" directly on the field is the problem. I moved it into a nested definition and it worked. It seems the description here is a little bit confusing.
Change
Before:
{
    "name": "myInfo",
    "type": "record",
    "fields": [{
        "name": "size",
        "type": {"name": "size", "type": "string"}
    }]
}
After:
{
    "name": "myInfo",
    "type": {
        "type": "record",
        "name": "myInfo",
        "fields": [
            {
                "name": "size",
                "type": {"name": "size", "type": "string"}
            }
        ]
    }
}
Updated schema which is working:
{
    "name": "data.platform_data.test_service.result",
    "type": "record",
    "fields": [
        {
            "name": "myInfo",
            "type": {
                "type": "record",
                "name": "myInfo",
                "fields": [
                    {
                        "name": "size",
                        "type": {"name": "size", "type": "string"}
                    }
                ]
            }
        }
    ]
}
To make a record attribute nullable, the process is the same as for any other attribute: you need to union it with "null" (as shown in the schema below):
{
    "name": "data.platform_data.test_service.result",
    "type": "record",
    "fields": [
        {
            "name": "myInfo",
            "type": [
                "null",
                {
                    "type": "record",
                    "name": "myInfo",
                    "fields": [
                        {
                            "name": "size",
                            "type": {"name": "size", "type": "string"}
                        }
                    ]
                }
            ]
        }
    ]
}
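The question itself uses avsc in Node, but the nullable-record shape can be checked with any Avro library; a minimal sketch assuming Python's fastavro, with the inner size type simplified to a plain "string":

import io
import fastavro  # assumed library; the original poster is using avsc in Node

schema = fastavro.parse_schema({
    "name": "data.platform_data.test_service.result",
    "type": "record",
    "fields": [
        {"name": "myInfo", "type": ["null", {
            "type": "record", "name": "myInfo",
            "fields": [{"name": "size", "type": "string"}],  # simplified inner type
        }]},
    ],
})

for record in ({"myInfo": {"size": "XL"}}, {"myInfo": None}):
    buf = io.BytesIO()
    fastavro.schemaless_writer(buf, schema, record)  # both union branches serialize
    print(record, "->", len(buf.getvalue()), "bytes")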

Creating an avro schema for an array with multiple record types?

I am creating an Avro schema for a JSON payload that appears to have an array of multiple object types. I'm not sure exactly how to represent this in the schema. The key in question is content:
{
    "id": "channel-id",
    "name": "My Channel with a New Title",
    "description": "Herpy me derpy merpus herpsum ner berp berps derp ter tee",
    "privacyLevel": "<private|org>",
    "planId": "some-plan-id",
    "owner": "a-user-handle",
    "curators": [
        "user-handle-1",
        "user-handle-2"
    ],
    "members": 5,
    "content": [
        {
            "id": "docker",
            "slug": "docker",
            "index": 1,
            "type": "path"
        },
        {
            "id": "such-linkage",
            "slug": "such-linkage",
            "index": 2,
            "type": "external-link",
            "details": {
                "url": "http://some-dank-link.com",
                "title": "My Dank Link",
                "contentType": "External Link",
                "level": "Beginner",
                "duration": "PT34293H33M9S"
            }
        },
        {
            "id": "21f1e812-b10a-40df-8b52-3a1d05fc215c",
            "slug": "windows-azure-storage-in-depth",
            "index": 3,
            "type": "course"
        },
        {
            "id": "7c346c05-6416-42dd-80b2-d5e758de7926",
            "slug": "7c346c05-6416-42dd-80b2-d5e758de7926",
            "index": 4,
            "type": "project"
        }
    ],
    "imageUrls": ["https://url/to/an/image", "https://url/to/another/image"],
    "analyticsEnabled": true,
    "orgDiscoverable": false,
    "createdDate": "2015-12-31T01:23:45+00:00",
    "archiveDate": "2015-12-31T01:23:45+00:00",
    "messagePublishedAt": "2015-12-31T01:23:45+00:00"
}
If you are asking whether it is possible to create an array with different kinds of records, it is: Avro supports this through unions. It would look like:
{
    "name": "myRecord",
    "type": "record",
    "fields": [
        {
            "name": "myArrayWithMultiplesTypes",
            "type": {
                "type": "array",
                "items": [
                    {
                        "name": "typeOne",
                        "type": "record",
                        "fields": [
                            {"name": "name", "type": "string"}
                        ]
                    },
                    {
                        "name": "typeTwo",
                        "type": "record",
                        "fields": [
                            {"name": "id", "type": "int"}
                        ]
                    }
                ]
            }
        }
    ]
}
If you already have the records defined previously, then it could look like this:
{
    "name": "mulitplePossibleTypes",
    "type": [
        "null",
        {
            "type": "array",
            "items": [
                "com.xyz.kola.cloud.events.itemmanager.Part",
                "com.xyz.kola.cloud.events.itemmanager.Document",
                "com.xyz.kola.cloud.events.itemmanager.DigitalModel",
                "com.xyz.kola.cloud.events.itemmanager.Interface"
            ]
        }
    ]
},
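To see the inline-union version from the first schema above in action, here is a minimal sketch assuming Python's fastavro; the element values are made up for illustration:

import io
import fastavro  # assumed library, used only to exercise the union-typed array

schema = fastavro.parse_schema({
    "name": "myRecord",
    "type": "record",
    "fields": [
        {"name": "myArrayWithMultiplesTypes", "type": {
            "type": "array",
            "items": [
                {"name": "typeOne", "type": "record",
                 "fields": [{"name": "name", "type": "string"}]},
                {"name": "typeTwo", "type": "record",
                 "fields": [{"name": "id", "type": "int"}]},
            ],
        }},
    ],
})

# Each array element is matched against one branch of the union at write time.
record = {"myArrayWithMultiplesTypes": [{"name": "docker"}, {"id": 42}]}
buf = io.BytesIO()
fastavro.schemaless_writer(buf, schema, record)
print(len(buf.getvalue()), "bytes written")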

Avro schema issue when record missing a field

I am using the NiFi (v1.2) processor ConvertJSONToAvro. I am not able to parse a record that contains only one of the two elements in a "record" type. This element is also allowed to be missing entirely from the data. Is my Avro schema incorrect?
Schema snippet:
"name": "personname",
"type": [
"null":,
{
"type": "record",
"name": "firstandorlast",
"fields": [
{
"name": "first",
"type": [
"null",
"string"
]
},
{
"name": "last",
"type": [
"null",
"string"
]
}
]
}
]
If "personname" contains both "first" and "last" it works, but if it only contains one of the elements, it fails with the error: Cannot convert field personname: cannot resolve union:
{ "last":"Smith" }
not in
"type": [ "null":,
{
"type": "record",
"name": "firstandorlast",
"fields": [
{
"name": "first",
"type": [
"null",
"string"
]
},
{
"name": "last",
"type": [
"null",
"string"
]
}
]
}
]
You are missing the default value.
https://avro.apache.org/docs/1.8.1/spec.html#schema_record
Your schema should look like:
"name": "personname",
"type": [
"null":,
{
"type": "record",
"name": "firstandorlast",
"fields": [
{
"name": "first",
"type": [
"null",
"string"
],
"default": "null"
},
{
"name": "last",
"type": [
"null",
"string"
],
"default": "null"
}
]
}
]
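The effect of the defaults can be reproduced outside NiFi as well; a minimal sketch assuming Python's fastavro, with the personname field wrapped in a hypothetical Person record so the schema is complete:

import io
import fastavro  # assumed library; NiFi's ConvertJSONToAvro is not involved here

schema = fastavro.parse_schema({
    "name": "Person", "type": "record", "fields": [  # hypothetical wrapper record
        {"name": "personname", "type": ["null", {
            "type": "record", "name": "firstandorlast", "fields": [
                {"name": "first", "type": ["null", "string"], "default": None},
                {"name": "last", "type": ["null", "string"], "default": None},
            ],
        }]},
    ],
})

buf = io.BytesIO()
# Only "last" is present; "first" falls back to its null default instead of failing.
fastavro.schemaless_writer(buf, schema, {"personname": {"last": "Smith"}})
print(len(buf.getvalue()), "bytes written")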

Avro-Tools JSON to Avro Schema fails: org.apache.avro.SchemaParseException: Undefined name:

I am trying to compile two Avro schemas into Java classes using the avro-tools-1.7.4.jar compile schema command.
I have two JSON schemas which look like this:
{
    "name": "TestAvro",
    "type": "record",
    "namespace": "com.avro.test",
    "fields": [
        {"name": "first", "type": "string"},
        {"name": "last", "type": "string"},
        {"name": "amount", "type": "double"}
    ]
}
{
    "name": "TestArrayAvro",
    "type": "record",
    "namespace": "com.avro.test",
    "fields": [
        {"name": "date", "type": "string"},
        {"name": "records", "type":
            {"type": "array", "items": "com.avro.test.TestAvro"}}
    ]
}
When I run compile schema on these two files, the first one works fine and generates the Java. The second one fails every time: it does not like the array items when I try to use the first schema as the type. This is the error I get:
Exception in thread "main" org.apache.avro.SchemaParseException: Undefined name: "com.test.avro.TestAvro"
at org.apache.avro.Schema.parse(Schema.java:1052)
Both files are located in the same directory.
Use the below avsc file:
[{
    "name": "TestAvro",
    "type": "record",
    "namespace": "com.avro.test",
    "fields": [
        {"name": "first", "type": "string"},
        {"name": "last", "type": "string"},
        {"name": "amount", "type": "double"}
    ]
},
{
    "name": "TestArrayAvro",
    "type": "record",
    "namespace": "com.avro.test",
    "fields": [
        {"name": "date", "type": "string"},
        {"name": "records", "type": {
            "type": "array",
            "items": "com.avro.test.TestAvro"
        }}
    ]
}]
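Putting both records in one file works because TestAvro is already defined by the time its full name is referenced; a minimal sketch of the same resolution order assuming Python's fastavro (avro-tools itself is not needed for this check):

import fastavro  # assumed library, used only to show the named reference resolving

combined = [
    {"name": "TestAvro", "type": "record", "namespace": "com.avro.test",
     "fields": [{"name": "first", "type": "string"},
                {"name": "last", "type": "string"},
                {"name": "amount", "type": "double"}]},
    {"name": "TestArrayAvro", "type": "record", "namespace": "com.avro.test",
     "fields": [{"name": "date", "type": "string"},
                {"name": "records",
                 "type": {"type": "array", "items": "com.avro.test.TestAvro"}}]},
]

# Parsed as a union of the two records: TestAvro is registered first, so the
# "com.avro.test.TestAvro" reference inside TestArrayAvro resolves.
parsed = fastavro.parse_schema(combined)
print("combined schema parsed OK")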
