Compatibility of Avro contract with enum

I have an existing avro schema
{
  "name": "myenum",
  "type": {
    "type": "enum",
    "name": "Suit",
    "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]
  },
  "default": null
}
I want to make null the default, but updating the contract to the following results in a backward-compatibility error. What can be done to solve this issue?
{
  "name": "myenum",
  "type": [ null, {
    "type": "enum",
    "name": "Suit",
    "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]
  }],
  "default": null
}

There is a problem with the null that has been added:
{
  "name": "myenum",
  "type": [ null, { <- problem
    "type": "enum",
    "name": "Suit",
    "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]
  }],
  "default": null
}
It should be within quotes: null => "null". The updated definition will then look like this:
{
  "name": "myenum",
  "type": [ "null", { <- change
    "type": "enum",
    "name": "Suit",
    "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]
  }],
  "default": null
}
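A quick way to sanity-check the fixed contract is to wrap the field in a minimal record and parse it. Note that a union field's default must match the first branch of the union, which is why "null" comes first. A sketch, assuming fastavro is installed (the wrapper record name is hypothetical):

import fastavro

schema = {
    "type": "record",
    "name": "Example",  # hypothetical enclosing record
    "fields": [{
        "name": "myenum",
        "type": ["null", {
            "type": "enum",
            "name": "Suit",
            "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]
        }],
        "default": None  # JSON null; valid because "null" is the first branch
    }]
}

fastavro.parse_schema(schema)  # raises SchemaParseException if invalid
print("schema parses")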

Related

Correct format for decimal and timestamp-micros

I am trying to find the correct Avro schema format for defining the decimal and timestamp-micros logical types. Is NumericField1/DateFieldA or NumericField2/DateFieldB the correct format for the logical types?
{
  "type": "record",
  "name": "DateAndNumber",
  "namespace": "org.sample",
  "fields": [
    {
      "name": "NumericField1",
      "type": [
        "null",
        "bytes"
      ],
      "default": null,
      "logicalType": "decimal",
      "precision": 8,
      "scale": 2
    },
    {
      "name": "NumericField2",
      "type": [
        "null",
        {
          "type": "bytes",
          "logicalType": "decimal",
          "precision": 4,
          "scale": 2
        }
      ],
      "default": null
    },
    {
      "name": "DateFieldA",
      "type": "long",
      "logicalType": "timestamp-micros",
      "default": 0
    },
    {
      "name": "DateFieldB",
      "type": {
        "type": "long",
        "logicalType": "timestamp-micros"
      },
      "default": 0
    }
  ]
}
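For reference, one way to probe this yourself is to parse both variants and inspect what the parser keeps. In Avro, a logical type annotates the type itself; attributes placed on the field are treated as opaque metadata. A sketch with fastavro, with the field names copied from the question:

import fastavro

field_level = {  # NumericField1 / DateFieldA style: attributes on the field
    "type": "record", "name": "V1", "fields": [{
        "name": "NumericField1", "type": ["null", "bytes"], "default": None,
        "logicalType": "decimal", "precision": 8, "scale": 2
    }]
}

nested = {  # NumericField2 / DateFieldB style: attributes inside the type
    "type": "record", "name": "V2", "fields": [{
        "name": "NumericField2",
        "type": ["null", {"type": "bytes", "logicalType": "decimal",
                          "precision": 4, "scale": 2}],
        "default": None
    }]
}

# Both parse, but only the nested form attaches the logical type to the
# bytes type itself, which is what readers and writers act on.
for s in (field_level, nested):
    print(fastavro.parse_schema(s)["fields"][0])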

Avro Schema: don't know which "default" value to use

One of my fields in my Avro Schema is:
{
  "name": "recipients",
  "type": {
    "type": "array",
    "items": {
      "name": "recordings",
      "type": "record",
      "fields": [
        {
          "name": "returnAddress",
          "type": ["null", "string"],
          "default": null
        },
        {
          "name": "something",
          "type": ["null", "string"],
          "default": null
        },
        {
          "name": "phone",
          "type": ["null", "string"],
          "default": null
        },
        {
          "name": "pfp",
          "type": {
            "name": "pfp",
            "type": "record",
            "fields": []
          },
          "default": {"a": 1}
        },
        {
          "name": "example1",
          "type": "int",
          "default": -1
        },
        {
          "name": "example2",
          "type": "int",
          "default": -1
        }
      ]
    },
    "default": [1]
  }
}
However, I get an error message:
Field recipients type:ARRAY pos:4 not set and has no default value
Do I need a default value for the recipients field? And if so, what would the default value be for the type listed under recipients? I have tried "default": null, "default": {"a": 1}, and "default": [1], and all returned errors.
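For reference, an array-typed field's default must itself be a JSON array, and the empty array [] is the usual choice; the default also belongs on the field, not inside the "type" object. A minimal sketch using fastavro, with the record trimmed down and a hypothetical enclosing record name:

import fastavro

schema = {
    "type": "record",
    "name": "Message",  # hypothetical enclosing record
    "fields": [{
        "name": "recipients",
        "type": {
            "type": "array",
            "items": {
                "name": "recordings",
                "type": "record",
                "fields": [
                    {"name": "phone", "type": ["null", "string"], "default": None}
                ]
            }
        },
        "default": []  # on the field itself; an empty array is a valid default
    }]
}

fastavro.parse_schema(schema)
print("schema parses")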

Avro: multiple enums in the same type (avro.SchemaParseException: Can't redefine)

I have the following Avro schema, but when I try to parse it I get the following error:
Exception in thread "main" org.apache.avro.SchemaParseException: Can't redefine: ...
{
  "name": "card_1_nr",
  "type": "string"
}, {
  "name": "card_1_type",
  "type": {
    "name": "card_type",
    "type": "enum",
    "symbols": ["diamonds", "clubs", "hearts", "spades"],
    "default": "diamonds"
  }
}, {
  "name": "card_2_nr",
  "type": "string"
}, {
  "name": "card_2_type",
  "type": {
    "name": "card_type",
    "type": "enum",
    "symbols": ["diamonds", "clubs", "hearts", "spades"],
    "default": "diamonds"
  }
}
You just need to reuse the enum type by name:
{
  "name": "card_1_nr",
  "type": "string"
}, {
  "name": "card_1_type",
  "type": {
    "name": "card_type",
    "type": "enum",
    "symbols": ["diamonds", "clubs", "hearts", "spades"],
    "default": "diamonds"
  }
}, {
  "name": "card_2_nr",
  "type": "string"
}, {
  "name": "card_2_type",
  "type": "card_type"
}
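A quick parse check of the fix, wrapping the fields in a minimal record (a sketch using fastavro; the enclosing record name is hypothetical):

import fastavro

schema = {
    "type": "record",
    "name": "Hand",  # hypothetical enclosing record
    "fields": [{
        "name": "card_1_type",
        "type": {
            "name": "card_type",
            "type": "enum",
            "symbols": ["diamonds", "clubs", "hearts", "spades"],
            "default": "diamonds"
        }
    }, {
        "name": "card_2_type",
        "type": "card_type"  # reference the already-defined enum by name
    }]
}

fastavro.parse_schema(schema)  # no "Can't redefine" error
print("schema parses")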

confluent_kafka.error.ValueSerializationError: KafkaError{code=_VALUE_SERIALIZATION,val=-161 : ValueError

I am new to Python and am trying to use 'confluent_kafka' to produce Avro messages.
I am using 'confluent_kafka.schema_registry.avro.AvroSerializer' for this
(referred: https://github.com/confluentinc/confluent-kafka-python/blob/master/examples/avro_producer.py).
It works for a simple Avro schema with dict input (JSON converted to dict), but for the sample schema below I am getting an error.
Schema:
{
  "type": "record",
  "name": "Envelope",
  "namespace": "CoreOLTPEvents.dbo.Event",
  "fields": [{
    "name": "before",
    "type": ["null", {
      "type": "record",
      "name": "Value",
      "fields": [{
        "name": "EventId",
        "type": "long"
      }, {
        "name": "CameraId",
        "type": ["null", "long"],
        "default": null
      }],
      "connect.name": "CoreOLTPEvents.dbo.Event.Value"
    }],
    "default": null
  }, {
    "name": "after",
    "type": ["null", "Value"],
    "default": null
  }, {
    "name": "source",
    "type": {
      "type": "record",
      "name": "Source",
      "namespace": "io.debezium.connector.sqlserver",
      "fields": [{
        "name": "version",
        "type": "string"
      }, {
        "name": "connector",
        "type": "string"
      }],
      "connect.name": "io.debezium.connector.sqlserver.Source"
    }
  }, {
    "name": "op",
    "type": "string"
  }],
  "connect.name": "CoreOLTPEvents.dbo.Event.Envelope"
}
Input JSON:
{
  "after": null,
  "before": {
    "CoreOLTPEvents.dbo.Event.Value": {
      "EventId": 1111111111,
      "CameraId": 222222222
    }
  },
  "source": {
    "version": "InitialLoad",
    "connector": "sqlserver"
  },
  "op": "C"
}
Error:
ValueError: {'CoreOLTPEvents.dbo.Event.Value': {'EventId': 1111111111, 'CameraId': 222222222}} (type <class 'dict'>) do not match ['null', {'connect.name': 'CoreOLTPEvents.dbo.Event.Value', 'type': 'record', 'name': 'CoreOLTPEvents.dbo.Event.Value', 'fields': [{'name': 'EventId', 'type': 'long'}, {'default': None, 'name': 'CameraId', 'type': ['null', 'long']}]}] on field before
The 'before' field type is a union (['null', record]); if I change it to only the record (removing the union) then it works fine. But I need to adjust my input in such a way that it works for the given schema.
(Note: I am reading the JSON input using 'json.load(json_file)', so it gives dict output.)
Any help would be much appreciated.
Update:
Actual large schema:
{
  "type": "record",
  "name": "Envelope",
  "namespace": "CoreOLTPEvents.dbo.Event",
  "fields": [{
    "name": "before",
    "type": ["null", {
      "type": "record",
      "name": "Value",
      "fields": [{
        "name": "EventId",
        "type": "long"
      }, {
        "name": "CameraId",
        "type": ["null", "long"],
        "default": null
      }, {
        "name": "SiteId",
        "type": ["null", "long"],
        "default": null
      }, {
        "name": "VehicleId",
        "type": ["null", "long"],
        "default": null
      }, {
        "name": "EventReviewStatusID",
        "type": "int"
      }, {
        "name": "EventTypeId",
        "type": ["null", "int"],
        "default": null
      }, {
        "name": "EventDateTime",
        "type": ["null", {
          "type": "string",
          "connect.name": "net.smartdrive.converters.SmartdriveEventDateFieldConverter"
        }],
        "default": null
      }, {
        "name": "FTPUploadDateTime",
        "type": {
          "type": "long",
          "connect.version": 1,
          "connect.name": "io.debezium.time.Timestamp"
        }
      }, {
        "name": "CAMFileName",
        "type": "string"
      }, {
        "name": "KeypadEntryCode",
        "type": ["null", "string"],
        "default": null
      }, {
        "name": "IsActive",
        "type": {
          "type": "boolean",
          "connect.default": true
        },
        "default": true
      }, {
        "name": "Flagged",
        "type": "boolean"
      }, {
        "name": "EventTitle",
        "type": ["null", "string"],
        "default": null
      }, {
        "name": "CreatedBy",
        "type": "long"
      }, {
        "name": "CreatedDate",
        "type": {
          "type": "long",
          "connect.version": 1,
          "connect.name": "io.debezium.time.Timestamp"
        }
      }, {
        "name": "ModifiedBy",
        "type": "long"
      }, {
        "name": "ModifiedDate",
        "type": {
          "type": "long",
          "connect.version": 1,
          "connect.name": "io.debezium.time.Timestamp"
        }
      }, {
        "name": "ReReviewAnalysis",
        "type": ["null", "string"],
        "default": null
      }, {
        "name": "LegacyEventId",
        "type": ["null", "long"],
        "default": null
      }, {
        "name": "TripId",
        "type": ["null", "long"],
        "default": null
      }, {
        "name": "FileVersion",
        "type": ["null", "string"],
        "default": null
      }, {
        "name": "EventNumber",
        "type": ["null", "string"],
        "default": null
      }, {
        "name": "Latitude",
        "type": ["null", {
          "type": "bytes",
          "scale": 10,
          "precision": 13,
          "connect.version": 1,
          "connect.parameters": {
            "scale": "10",
            "connect.decimal.precision": "13"
          },
          "connect.name": "org.apache.kafka.connect.data.Decimal",
          "logicalType": "decimal"
        }],
        "default": null
      }, {
        "name": "Longitude",
        "type": ["null", {
          "type": "bytes",
          "scale": 10,
          "precision": 13,
          "connect.version": 1,
          "connect.parameters": {
            "scale": "10",
            "connect.decimal.precision": "13"
          },
          "connect.name": "org.apache.kafka.connect.data.Decimal",
          "logicalType": "decimal"
        }],
        "default": null
      }, {
        "name": "GeoAddressId",
        "type": ["null", "long"],
        "default": null
      }, {
        "name": "ReviewedEventId",
        "type": ["null", "long"],
        "default": null
      }, {
        "name": "VideoStatus",
        "type": {
          "type": "int",
          "connect.default": 0
        },
        "default": 0
      }, {
        "name": "PredictionImportance",
        "type": ["null", {
          "type": "bytes",
          "scale": 10,
          "precision": 15,
          "connect.version": 1,
          "connect.parameters": {
            "scale": "10",
            "connect.decimal.precision": "15"
          },
          "connect.name": "org.apache.kafka.connect.data.Decimal",
          "logicalType": "decimal"
        }],
        "default": null
      }, {
        "name": "FlaggedBy",
        "type": ["null", "long"],
        "default": null
      }, {
        "name": "FlaggedDate",
        "type": ["null", {
          "type": "long",
          "connect.version": 1,
          "connect.name": "io.debezium.time.Timestamp"
        }],
        "default": null
      }, {
        "name": "TriggerTypeId",
        "type": ["null", "int"],
        "default": null
      }, {
        "name": "VideoDeleteDate",
        "type": ["null", {
          "type": "long",
          "connect.version": 1,
          "connect.name": "io.debezium.time.Timestamp"
        }],
        "default": null
      }, {
        "name": "MetadataDeleteDate",
        "type": ["null", {
          "type": "long",
          "connect.version": 1,
          "connect.name": "io.debezium.time.Timestamp"
        }],
        "default": null
      }, {
        "name": "RetentionStatus",
        "type": {
          "type": "int",
          "connect.default": 0,
          "connect.type": "int16"
        },
        "default": 0
      }, {
        "name": "PartnerTriggerId",
        "type": ["null", "int"],
        "default": null
      }, {
        "name": "CoachingStateId",
        "type": {
          "type": "int",
          "connect.default": 0,
          "connect.type": "int16"
        },
        "default": 0
      }, {
        "name": "EventKudoHistoryId",
        "type": ["null", "int"],
        "default": null
      }],
      "connect.name": "CoreOLTPEvents.dbo.Event.Value"
    }],
    "default": null
  }, {
    "name": "after",
    "type": ["null", "Value"],
    "default": null
  }, {
    "name": "source",
    "type": {
      "type": "record",
      "name": "Source",
      "namespace": "io.debezium.connector.sqlserver",
      "fields": [{
        "name": "version",
        "type": "string"
      }, {
        "name": "connector",
        "type": "string"
      }, {
        "name": "name",
        "type": "string"
      }, {
        "name": "ts_ms",
        "type": "long"
      }, {
        "name": "snapshot",
        "type": [{
          "type": "string",
          "connect.version": 1,
          "connect.parameters": {
            "allowed": "true,last,false"
          },
          "connect.default": "false",
          "connect.name": "io.debezium.data.Enum"
        }, "null"],
        "default": "false"
      }, {
        "name": "db",
        "type": "string"
      }, {
        "name": "schema",
        "type": "string"
      }, {
        "name": "table",
        "type": "string"
      }, {
        "name": "change_lsn",
        "type": ["null", "string"],
        "default": null
      }, {
        "name": "commit_lsn",
        "type": ["null", "string"],
        "default": null
      }, {
        "name": "event_serial_no",
        "type": ["null", "long"],
        "default": null
      }],
      "connect.name": "io.debezium.connector.sqlserver.Source"
    }
  }, {
    "name": "op",
    "type": "string"
  }, {
    "name": "ts_ms",
    "type": ["null", "long"],
    "default": null
  }, {
    "name": "transaction",
    "type": ["null", {
      "type": "record",
      "name": "ConnectDefault",
      "namespace": "io.confluent.connect.avro",
      "fields": [{
        "name": "id",
        "type": "string"
      }, {
        "name": "total_order",
        "type": "long"
      }, {
        "name": "data_collection_order",
        "type": "long"
      }]
    }],
    "default": null
  }],
  "connect.name": "CoreOLTPEvents.dbo.Event.Envelope"
}
Input for large schema:
{
  "before": null,
  "after": {
    "EventId": 1234566,
    "CameraId": 2233,
    "SiteId": 111,
    "VehicleId": 45587,
    "EventReviewStatusID": 10,
    "EventTypeId": 123,
    "EventDateTime": "2015-01-02T01:30:29Z",
    "FTPUploadDateTime": 1420193330590,
    "CAMFileName": "XYZ",
    "KeypadEntryCode": "0",
    "IsActive": false,
    "Flagged": false,
    "EventTitle": null,
    "CreatedBy": 1,
    "CreatedDate": 1420191120730,
    "ModifiedBy": 1,
    "ModifiedDate": 1577871185680,
    "ReReviewAnalysis": null,
    "LegacyEventId": null,
    "TripId": 3382,
    "FileVersion": "2.2",
    "EventNumber": "AAAA-BBBB",
    "Latitude": "UU9elrA=",
    "Longitude": "/ueZUeFw",
    "GeoAddressId": null,
    "ReviewedEventId": 129411077,
    "VideoStatus": 4,
    "PredictionImportance": 0.1402457539,
    "FlaggedBy": null,
    "FlaggedDate": null,
    "TriggerTypeId": 322,
    "VideoDeleteDate": 1422783120000,
    "MetadataDeleteDate": 1577871120000,
    "RetentionStatus": 15,
    "PartnerTriggerId": null,
    "CoachingStateId": 0,
    "EventKudoHistoryId": null
  },
  "source": {
    "version": "Final",
    "connector": "sqlserver",
    "name": "CoreOLTP",
    "ts_ms": 1615813992548,
    "snapshot": "false",
    "db": "CoreOLTP",
    "schema": "dbo",
    "table": "xyz",
    "change_lsn": null,
    "commit_lsn": null,
    "event_serial_no": null
  },
  "op": "C",
  "ts_ms": 1615813992548,
  "transaction": null
}
Error:
confluent_kafka.error.ValueSerializationError: KafkaError{code=_VALUE_SERIALIZATION,val=-161,str="{'EventId': 129411077, 'CameraId': 46237, 'SiteId': 2148, 'VehicleId': 45587, 'EventReviewStatusID': 10, 'EventTypeId': 247, 'EventDateTime': '2015-01-02T01:30:29Z', 'FTPUploadDateTime': 1420191120590, 'CAMFileName': 'JD2BC02120150102013029ER.SDE', 'KeypadEntryCode': '0', 'IsActive': False, 'Flagged': False, 'EventTitle': None, 'CreatedBy': 1, 'CreatedDate': 1420191120730, 'ModifiedBy': 1, 'ModifiedDate': 1577871185680, 'ReReviewAnalysis': None, 'LegacyEventId': None, 'TripId': 3382, 'FileVersion': '2.2', 'EventNumber': 'WSHX-8QQ2', 'Latitude': 'UU9elrA=', 'Longitude': '/ueZUeFw', 'GeoAddressId': None, 'ReviewedEventId': 129411077, 'VideoStatus': 4, 'PredictionImportance': 0.1402457539, 'FlaggedBy': None, 'FlaggedDate': None, 'TriggerTypeId': 322, 'VideoDeleteDate': 1422783120000, 'MetadataDeleteDate': 1577871120000, 'RetentionStatus': 15, 'PartnerTriggerId': None, 'CoachingStateId': 0, 'EventKudoHistoryId': None} (type <class 'dict'>) do not match ['null', 'CoreOLTPEvents.dbo.Event.Value'] on field after"}
You just need to change your input so that the before field doesn't have the namespace. So it needs to look like this:
{
  "after": null,
  "before": {
    "EventId": 1111111111,
    "CameraId": 222222222
  },
  "source": {
    "version": "InitialLoad",
    "connector": "sqlserver"
  },
  "op": "C"
}
The original input you had looked like it was trying to be JSON-encoded Avro, because the before field had the CoreOLTPEvents.dbo.Event.Value namespace. However, I'm guessing it must have been hand-crafted, because CameraId should then have been specified as {"long": 222222222} rather than just 222222222.
If you actually do have Avro-encoded JSON (from the result of some other process or something), then you could use something like fastavro.json_reader to read in that file, and it will create the correct in-memory representation (which doesn't include the type information for union fields).
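A sketch of that approach (the file names here are placeholders):

import json
from fastavro import json_reader, parse_schema

with open("envelope.avsc") as f:    # placeholder: the schema above
    schema = parse_schema(json.load(f))

with open("event.json") as f:       # placeholder: the Avro-encoded JSON
    for record in json_reader(f, schema):
        # union branches like {"CoreOLTPEvents.dbo.Event.Value": {...}}
        # are unwrapped into plain dicts here
        print(record["before"])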
UPDATE:
To figure out what the problem is with the full schema and full data, I first loaded the two objects using json.load and then ran fastavro.validate(record, schema). The output from that is a stack trace that ends with this:
fastavro._validate_common.ValidationError: [
"CoreOLTPEvents.dbo.Event.Envelope.after is <{'EventId': 1234566, 'CameraId': 2233, 'SiteId': 111, 'VehicleId': 45587, 'EventReviewStatusID': 10, 'EventTypeId': 123, 'EventDateTime': '2015-01-02T01:30:29Z', 'FTPUploadDateTime': 1420193330590, 'CAMFileName': 'XYZ', 'KeypadEntryCode': '0', 'IsActive': False, 'Flagged': False, 'EventTitle': None, 'CreatedBy': 1, 'CreatedDate': 1420191120730, 'ModifiedBy': 1, 'ModifiedDate': 1577871185680, 'ReReviewAnalysis': None, 'LegacyEventId': None, 'TripId': 3382, 'FileVersion': '2.2', 'EventNumber': 'AAAA-BBBB', 'Latitude': 'UU9elrA=', 'Longitude': '/ueZUeFw', 'GeoAddressId': None, 'ReviewedEventId': 129411077, 'VideoStatus': 4, 'PredictionImportance': 0.1402457539, 'FlaggedBy': None, 'FlaggedDate': None, 'TriggerTypeId': 322, 'VideoDeleteDate': 1422783120000, 'MetadataDeleteDate': 1577871120000, 'RetentionStatus': 15, 'PartnerTriggerId': None, 'CoachingStateId': 0, 'EventKudoHistoryId': None}> of type <class 'dict'> expected null",
"CoreOLTPEvents.dbo.Event.Value.Latitude is <UU9elrA=> of type <class 'str'> expected null",
"CoreOLTPEvents.dbo.Event.Value.Latitude is <UU9elrA=> of type <class 'str'> expected {'scale': 10, 'precision': 13, 'connect.version': 1, 'connect.parameters': {'scale': '10', 'connect.decimal.precision': '13'}, 'connect.name': 'org.apache.kafka.connect.data.Decimal', 'logicalType': 'decimal', 'type': 'bytes'}"
]
So that is trying to tell us that there are three potential problems. The first is that the value in after doesn't match null, but we can ignore that one because we don't want after to match null.
The latter two are the actual problem. They say that the value of Latitude is the string UU9elrA=, which matches neither null nor bytes. The string here looks base64-encoded, so maybe you have some code that decodes it to bytes; if so, the actual problem may be something else, but either way you should be able to use fastavro.validate to figure out what it is.
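That validation workflow, as a sketch (file names are placeholders):

import json
from fastavro import parse_schema
from fastavro.validation import validate

with open("envelope.avsc") as f:   # placeholder: the full schema above
    schema = parse_schema(json.load(f))
with open("input.json") as f:      # placeholder: the input for the large schema
    record = json.load(f)

# Raises ValidationError listing every union branch the value failed to match.
validate(record, schema)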

Avro schema cannot deserialize auto-registered Avro schema from connector

We are trying to consume a topic that has data emitted by a connector. We are using a handwritten schema that matches the data in the topic.
{
  "type": "record",
  "name": "Event",
  "namespace": "com.example.avro",
  "fields": [
    {
      "name": "id",
      "type": "string"
    },
    {
      "name": "type",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "entity_id",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "emitted_at",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "data",
      "type": ["null", "string"],
      "default": null
    }
  ]
}
Unfortunately, it cannot deserialize this because of the schema the connector auto-registered:
{
  "type": "record",
  "name": "Value",
  "namespace": "postgres.public.events",
  "fields": [
    {
      "name": "id",
      "type": "string"
    },
    {
      "name": "type",
      "type": [
        "null",
        "string"
      ],
      "default": null
    },
    {
      "name": "entity_id",
      "type": [
        "null",
        "string"
      ],
      "default": null
    },
    {
      "name": "emitted_at",
      "type": [
        "null",
        {
          "type": "string",
          "connect.version": 1,
          "connect.name": "io.debezium.time.ZonedTimestamp"
        }
      ],
      "default": null
    },
    {
      "name": "data",
      "type": [
        "null",
        {
          "type": "string",
          "connect.version": 1,
          "connect.name": "io.debezium.data.Json"
        }
      ],
      "default": null
    }
  ],
  "connect.name": "postgres.public.events.Value"
}
We are getting the following error:
Caused by: org.apache.kafka.common.errors.SerializationException: Could not find class postgres.public.events.Value specified in writer's schema whilst finding reader's schema for a SpecificRecord.
How do we resolve this issue?
You can either download the schema from the registry instead of defining your own (there are Maven plugins to do this), or change the namespace+name of your own schema so that the generated class matches.
Adding an alias might work as well, but I've not had much experience/luck with that, personally.
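For illustration, here is what the alias route looks like on the reader schema: the alias lists the full name of the writer's type, so schema resolution can match the two named types. Whether a given serde honors aliases varies, as noted above; this is a sketch, not a guaranteed fix.

# Hypothetical reader schema with an alias for the connector's type name.
reader_schema = {
    "type": "record",
    "name": "Event",
    "namespace": "com.example.avro",
    "aliases": ["postgres.public.events.Value"],  # full name of the writer's type
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "type", "type": ["null", "string"], "default": None},
        {"name": "entity_id", "type": ["null", "string"], "default": None},
        {"name": "emitted_at", "type": ["null", "string"], "default": None},
        {"name": "data", "type": ["null", "string"], "default": None},
    ],
}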
