I am new bee to python and trying to use 'confluent_kafka' for avro message produce.
Using 'confluent_kafka.schema_registry.avro.AvroSerializer' for the same
(referred : https://github.com/confluentinc/confluent-kafka-python/blob/master/examples/avro_producer.py)
It works for simple avro schema with dict(json converted to dict) input, but for below sample schema I am getting error :
Schema :
{
"type": "record",
"name": "Envelope",
"namespace": "CoreOLTPEvents.dbo.Event",
"fields": [{
"name": "before",
"type": ["null", {
"type": "record",
"name": "Value",
"fields": [{
"name": "EventId",
"type": "long"
}, {
"name": "CameraId",
"type": ["null", "long"],
"default": null
}],
"connect.name": "CoreOLTPEvents.dbo.Event.Value"
}],
"default": null
}, {
"name": "after",
"type": ["null", "Value"],
"default": null
}, {
"name": "source",
"type": {
"type": "record",
"name": "Source",
"namespace": "io.debezium.connector.sqlserver",
"fields": [{
"name": "version",
"type": "string"
}, {
"name": "connector",
"type": "string"
}],
"connect.name": "io.debezium.connector.sqlserver.Source"
}
}, {
"name": "op",
"type": "string"
}],
"connect.name": "CoreOLTPEvents.dbo.Event.Envelope"
}
Input Json :
{
"after": null,
"before": {
"CoreOLTPEvents.dbo.Event.Value" : {
"EventId": 1111111111,
"CameraId": 222222222
}
},
"source": {
"version": "InitialLoad",
"connector": "sqlserver"
},
"op": "C"
}
Error :
ValueError: {'CoreOLTPEvents.dbo.Event.Value': {'EventId': 1111111111, 'CameraId': 222222222}} (type <class 'dict'>) do not match ['null', {'connect.name': 'CoreOLTPEvents.dbo.Event.Value', 'type': 'record', 'name': 'CoreOLTPEvents.dbo.Event.Value', 'fields': [{'name': 'EventId', 'type': 'long'}, {'default': None, 'name': 'CameraId', 'type': ['null', 'long']}]}] on field before
'before' field type is union (['null',record]), if I change it to only record (remove union) then it works fine.
But I need to adjust my input such a way that it works for given schema.
(Note : I am reading json input using 'json.load(json_file)' so it gives dict output)
Any help would be much appreciated.
Update :
Actual large schema :
{
"type": "record",
"name": "Envelope",
"namespace": "CoreOLTPEvents.dbo.Event",
"fields": [{
"name": "before",
"type": ["null", {
"type": "record",
"name": "Value",
"fields": [{
"name": "EventId",
"type": "long"
}, {
"name": "CameraId",
"type": ["null", "long"],
"default": null
}, {
"name": "SiteId",
"type": ["null", "long"],
"default": null
}, {
"name": "VehicleId",
"type": ["null", "long"],
"default": null
}, {
"name": "EventReviewStatusID",
"type": "int"
}, {
"name": "EventTypeId",
"type": ["null", "int"],
"default": null
}, {
"name": "EventDateTime",
"type": ["null", {
"type": "string",
"connect.name": "net.smartdrive.converters.SmartdriveEventDateFieldConverter"
}],
"default": null
}, {
"name": "FTPUploadDateTime",
"type": {
"type": "long",
"connect.version": 1,
"connect.name": "io.debezium.time.Timestamp"
}
}, {
"name": "CAMFileName",
"type": "string"
}, {
"name": "KeypadEntryCode",
"type": ["null", "string"],
"default": null
}, {
"name": "IsActive",
"type": {
"type": "boolean",
"connect.default": true
},
"default": true
}, {
"name": "Flagged",
"type": "boolean"
}, {
"name": "EventTitle",
"type": ["null", "string"],
"default": null
}, {
"name": "CreatedBy",
"type": "long"
}, {
"name": "CreatedDate",
"type": {
"type": "long",
"connect.version": 1,
"connect.name": "io.debezium.time.Timestamp"
}
}, {
"name": "ModifiedBy",
"type": "long"
}, {
"name": "ModifiedDate",
"type": {
"type": "long",
"connect.version": 1,
"connect.name": "io.debezium.time.Timestamp"
}
}, {
"name": "ReReviewAnalysis",
"type": ["null", "string"],
"default": null
}, {
"name": "LegacyEventId",
"type": ["null", "long"],
"default": null
}, {
"name": "TripId",
"type": ["null", "long"],
"default": null
}, {
"name": "FileVersion",
"type": ["null", "string"],
"default": null
}, {
"name": "EventNumber",
"type": ["null", "string"],
"default": null
}, {
"name": "Latitude",
"type": ["null", {
"type": "bytes",
"scale": 10,
"precision": 13,
"connect.version": 1,
"connect.parameters": {
"scale": "10",
"connect.decimal.precision": "13"
},
"connect.name": "org.apache.kafka.connect.data.Decimal",
"logicalType": "decimal"
}],
"default": null
}, {
"name": "Longitude",
"type": ["null", {
"type": "bytes",
"scale": 10,
"precision": 13,
"connect.version": 1,
"connect.parameters": {
"scale": "10",
"connect.decimal.precision": "13"
},
"connect.name": "org.apache.kafka.connect.data.Decimal",
"logicalType": "decimal"
}],
"default": null
}, {
"name": "GeoAddressId",
"type": ["null", "long"],
"default": null
}, {
"name": "ReviewedEventId",
"type": ["null", "long"],
"default": null
}, {
"name": "VideoStatus",
"type": {
"type": "int",
"connect.default": 0
},
"default": 0
}, {
"name": "PredictionImportance",
"type": ["null", {
"type": "bytes",
"scale": 10,
"precision": 15,
"connect.version": 1,
"connect.parameters": {
"scale": "10",
"connect.decimal.precision": "15"
},
"connect.name": "org.apache.kafka.connect.data.Decimal",
"logicalType": "decimal"
}],
"default": null
}, {
"name": "FlaggedBy",
"type": ["null", "long"],
"default": null
}, {
"name": "FlaggedDate",
"type": ["null", {
"type": "long",
"connect.version": 1,
"connect.name": "io.debezium.time.Timestamp"
}],
"default": null
}, {
"name": "TriggerTypeId",
"type": ["null", "int"],
"default": null
}, {
"name": "VideoDeleteDate",
"type": ["null", {
"type": "long",
"connect.version": 1,
"connect.name": "io.debezium.time.Timestamp"
}],
"default": null
}, {
"name": "MetadataDeleteDate",
"type": ["null", {
"type": "long",
"connect.version": 1,
"connect.name": "io.debezium.time.Timestamp"
}],
"default": null
}, {
"name": "RetentionStatus",
"type": {
"type": "int",
"connect.default": 0,
"connect.type": "int16"
},
"default": 0
}, {
"name": "PartnerTriggerId",
"type": ["null", "int"],
"default": null
}, {
"name": "CoachingStateId",
"type": {
"type": "int",
"connect.default": 0,
"connect.type": "int16"
},
"default": 0
}, {
"name": "EventKudoHistoryId",
"type": ["null", "int"],
"default": null
}],
"connect.name": "CoreOLTPEvents.dbo.Event.Value"
}],
"default": null
}, {
"name": "after",
"type": ["null", "Value"],
"default": null
}, {
"name": "source",
"type": {
"type": "record",
"name": "Source",
"namespace": "io.debezium.connector.sqlserver",
"fields": [{
"name": "version",
"type": "string"
}, {
"name": "connector",
"type": "string"
}, {
"name": "name",
"type": "string"
}, {
"name": "ts_ms",
"type": "long"
}, {
"name": "snapshot",
"type": [{
"type": "string",
"connect.version": 1,
"connect.parameters": {
"allowed": "true,last,false"
},
"connect.default": "false",
"connect.name": "io.debezium.data.Enum"
}, "null"],
"default": "false"
}, {
"name": "db",
"type": "string"
}, {
"name": "schema",
"type": "string"
}, {
"name": "table",
"type": "string"
}, {
"name": "change_lsn",
"type": ["null", "string"],
"default": null
}, {
"name": "commit_lsn",
"type": ["null", "string"],
"default": null
}, {
"name": "event_serial_no",
"type": ["null", "long"],
"default": null
}],
"connect.name": "io.debezium.connector.sqlserver.Source"
}
}, {
"name": "op",
"type": "string"
}, {
"name": "ts_ms",
"type": ["null", "long"],
"default": null
}, {
"name": "transaction",
"type": ["null", {
"type": "record",
"name": "ConnectDefault",
"namespace": "io.confluent.connect.avro",
"fields": [{
"name": "id",
"type": "string"
}, {
"name": "total_order",
"type": "long"
}, {
"name": "data_collection_order",
"type": "long"
}]
}],
"default": null
}],
"connect.name": "CoreOLTPEvents.dbo.Event.Envelope"
}
Input for large schema :
{
"before": null,
"after": {
"EventId": 1234566,
"CameraId": 2233,
"SiteId": 111,
"VehicleId": 45587,
"EventReviewStatusID": 10,
"EventTypeId": 123,
"EventDateTime": "2015-01-02T01:30:29Z",
"FTPUploadDateTime": 1420193330590,
"CAMFileName": "XYZ",
"KeypadEntryCode": "0",
"IsActive": false,
"Flagged": false,
"EventTitle": null,
"CreatedBy": 1,
"CreatedDate": 1420191120730,
"ModifiedBy": 1,
"ModifiedDate": 1577871185680,
"ReReviewAnalysis": null,
"LegacyEventId": null,
"TripId": 3382,
"FileVersion": "2.2",
"EventNumber": "AAAA-BBBB",
"Latitude": "UU9elrA=",
"Longitude": "/ueZUeFw",
"GeoAddressId": null,
"ReviewedEventId": 129411077,
"VideoStatus": 4,
"PredictionImportance": 0.1402457539,
"FlaggedBy": null,
"FlaggedDate": null,
"TriggerTypeId": 322,
"VideoDeleteDate": 1422783120000,
"MetadataDeleteDate": 1577871120000,
"RetentionStatus": 15,
"PartnerTriggerId": null,
"CoachingStateId": 0,
"EventKudoHistoryId": null
},
"source": {
"version": "Final",
"connector": "sqlserver",
"name": "CoreOLTP",
"ts_ms": 1615813992548,
"snapshot": "false",
"db": "CoreOLTP",
"schema": "dbo",
"table": "xyz",
"change_lsn": null,
"commit_lsn": null,
"event_serial_no": null
},
"op": "C",
"ts_ms": 1615813992548,
"transaction": null
}
Error :
confluent_kafka.error.ValueSerializationError: KafkaError{code=_VALUE_SERIALIZATION,val=-161,str="{'EventId': 129411077, 'CameraId': 46237, 'SiteId': 2148, 'VehicleId': 45587, 'EventReviewStatusID': 10, 'EventTypeId': 247, 'EventDateTime': '2015-01-02T01:30:29Z', 'FTPUploadDateTime': 1420191120590, 'CAMFileName': 'JD2BC02120150102013029ER.SDE', 'KeypadEntryCode': '0', 'IsActive': False, 'Flagged': False, 'EventTitle': None, 'CreatedBy': 1, 'CreatedDate': 1420191120730, 'ModifiedBy': 1, 'ModifiedDate': 1577871185680, 'ReReviewAnalysis': None, 'LegacyEventId': None, 'TripId': 3382, 'FileVersion': '2.2', 'EventNumber': 'WSHX-8QQ2', 'Latitude': 'UU9elrA=', 'Longitude': '/ueZUeFw', 'GeoAddressId': None, 'ReviewedEventId': 129411077, 'VideoStatus': 4, 'PredictionImportance': 0.1402457539, 'FlaggedBy': None, 'FlaggedDate': None, 'TriggerTypeId': 322, 'VideoDeleteDate': 1422783120000, 'MetadataDeleteDate': 1577871120000, 'RetentionStatus': 15, 'PartnerTriggerId': None, 'CoachingStateId': 0, 'EventKudoHistoryId': None} (type <class 'dict'>) do not match ['null', 'CoreOLTPEvents.dbo.Event.Value'] on field after"}
You just need to change your input so that the before field doesn't have the namespace. So it needs to look like this:
{
"after": null,
"before": {
"EventId": 1111111111,
"CameraId": 222222222
},
"source": {
"version": "InitialLoad",
"connector": "sqlserver"
},
"op": "C"
}
The original input you had looked like it was trying to be JSON encoded avro because the field before had the CoreOLTPEvents.dbo.Event.Value namespace. However, I'm guessing it must have been hand crafted because CameraId should have been specified as {"long": 222222222} rather than just 222222222.
If you do actually have Avro encoded JSON (from the result of some other process or something) then you you could use something like fastavro.json_reader to read in that file and it will create the correct memory representation (that doesn't include the type information for union fields).
UPDATE:
To figure out what the problem is with the full schema and full data, I first loaded the two objects using json.load and then used fastavro.validate(record, schema) The output from that is a stacktrace that ends with this:
fastavro._validate_common.ValidationError: [
"CoreOLTPEvents.dbo.Event.Envelope.after is <{'EventId': 1234566, 'CameraId': 2233, 'SiteId': 111, 'VehicleId': 45587, 'EventReviewStatusID': 10, 'EventTypeId': 123, 'EventDateTime': '2015-01-02T01:30:29Z', 'FTPUploadDateTime': 1420193330590, 'CAMFileName': 'XYZ', 'KeypadEntryCode': '0', 'IsActive': False, 'Flagged': False, 'EventTitle': None, 'CreatedBy': 1, 'CreatedDate': 1420191120730, 'ModifiedBy': 1, 'ModifiedDate': 1577871185680, 'ReReviewAnalysis': None, 'LegacyEventId': None, 'TripId': 3382, 'FileVersion': '2.2', 'EventNumber': 'AAAA-BBBB', 'Latitude': 'UU9elrA=', 'Longitude': '/ueZUeFw', 'GeoAddressId': None, 'ReviewedEventId': 129411077, 'VideoStatus': 4, 'PredictionImportance': 0.1402457539, 'FlaggedBy': None, 'FlaggedDate': None, 'TriggerTypeId': 322, 'VideoDeleteDate': 1422783120000, 'MetadataDeleteDate': 1577871120000, 'RetentionStatus': 15, 'PartnerTriggerId': None, 'CoachingStateId': 0, 'EventKudoHistoryId': None}> of type <class 'dict'> expected null",
"CoreOLTPEvents.dbo.Event.Value.Latitude is <UU9elrA=> of type <class 'str'> expected null",
"CoreOLTPEvents.dbo.Event.Value.Latitude is <UU9elrA=> of type <class 'str'> expected {'scale': 10, 'precision': 13, 'connect.version': 1, 'connect.parameters': {'scale': '10', 'connect.decimal.precision': '13'}, 'connect.name': 'org.apache.kafka.connect.data.Decimal', 'logicalType': 'decimal', 'type': 'bytes'}"
]
So that is trying to tell us that there is 3 potential problems. The first is that the value in after doesn't match null, but we can ignore that because we don't want after to match null.
The later two problems are the actual problem. It says that the value of Latitude is the string UU9elrA=, but that doesn't match either null or bytes. The string here looks base64 encoded, so maybe you have some code that decodes that to bytes and if so then maybe the actual problem is something else, but if so then I think you should be able to use fastavro.validate to figure out what the problem is.