Exception in thread "main" org.apache.avro.SchemaParseException: Can't redefine: test.record4 - avro

I am trying to compile an test.avsc avro schema file into java code.
Exception in thread "main" org.apache.avro.SchemaParseException: Can't redefine: test.record4
The idea is record4 should be the same type in name4 and name5
My test.avsc is:
{
"type":"record",
"namespace":"test",
"name":"record1",
"fields":[
{
"name": "orbject1",
"type":{
"type": "array",
"items":{
"type":"record", "name":"record2", "fields": [
{"name": "name1",
"type":{
"type": "array",
"items":{
"type":"record", "name":"record3", "fields": [
{"type": {
"type":"record", "name":"record4", "fields": [
{"name":"name2", "type":"long"}
]
},
"name":"name4"
},
{"type": {
"type":"record","name":"test.record4"
},
"name":"name5"
}
]
}
}
}
]
}
}
}
]
}

The test.record4 error is fixed if you rearrange the schema to look like this:
{
"type":"record",
"namespace":"test",
"name":"record1",
"fields":[
{
"name": "orbject1",
"type":{
"type": "array",
"items":{
"type":"record", "name":"record2", "fields": [
{"name": "name1",
"type":{
"type": "array",
"items":{
"type":"record", "name":"record3", "fields": [
{"type": {
"type":"record", "name":"record4", "fields": [
{"name":"name2", "type":"long"}
]
},
"name":"name4"
},
{"type": "test.record4", "name":"name5" }
]
}
}
}
]
}
}
}
]
}

Related

Data creation Error creating a kafka message to producer - Expected start-union. Got VALUE_STRING [duplicate]

Unable to Error creating a kafka message to producer - Expected start-union. Got VALUE_STRING
{
"namespace": "de.morris.audit",
"type": "record",
"name": "AuditDataChangemorris",
"fields": [
{"name": "employeeID", "type": "string"},
{"name": "employeeNumber", "type": ["null", "string"], "default": null},
{"name": "serialNumbers", "type": [ "null", {"type": "array", "items": "string"}]},
{"name": "correlationId", "type": "string"},
{"name": "timestamp", "type": "long", "logicalType": "timestamp-millis"},
{"name": "employmentscreening","type":{"type": "enum", "name": "employmentscreening", "symbols": ["NO","YES"]}},
{"name": "vouchercodes","type": ["null",
{
"type": "array",
"items": {
"name": "Vouchercodes",
"type": "record",
"fields": [
{"name": "voucherName","type": ["null","string"], "default": null},
{"name": "authocode","type": ["null","string"], "default": null}
]
}
}], "default": null}
]
}
when i was trying to create a sample data in json format based on the above avsc for kafka consumer i am getting the below error upon testing
{
"employeeID": "qtete46524",
"employeeNumber": {
"string": "custnumber9813"
},
"serialNumbers": {
"type": "array",
"items": ["363536623","5846373733"]
},
"correlationId": "corr-656532443",
"timestamp": 1476538955719,
"employmentscreening": "NO",
"vouchercodes": [
{
"voucherName": "skygo",
"authocode": "A238472ASD"
}
]
}
getting the below error when i got when i ran the dataflow job in gcp
Error message from worker: java.lang.RuntimeException: java.io.IOException: Insert failed: [{"errors":[{"debugInfo":"","location":"serialnumbers","message":"Array specified for non-repeated field: serialnumbers.","reason":"invalid"}],"index":0}]**
how to create correct sample data based on the above schema ?
Read the spec
The value of a union is encoded in JSON as follows:
if its type is null, then it is encoded as a JSON null;
otherwise it is encoded as a JSON object with one name/value pair whose name is the type’s name and whose value is the recursively encoded value
So, here's the data it expects.
{
"employeeID": "qtete46524",
"employeeNumber": {
"string": "custnumber9813"
},
"serialNumbers": {"array": [
"serialNumbers3521"
]},
"correlationId": "corr-656532443",
"timestamp": 1476538955719,
"employmentscreening": "NO",
"vouchercodes": {"array": [
{
"voucherName": {"string": "skygo"},
"authocode": {"string": "A238472ASD"}
}
]}
}
With this schema
{
"namespace": "de.morris.audit",
"type": "record",
"name": "AuditDataChangemorris",
"fields": [
{
"name": "employeeID",
"type": "string"
},
{
"name": "employeeNumber",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "serialNumbers",
"type": [
"null",
{
"type": "array",
"items": "string"
}
]
},
{
"name": "correlationId",
"type": "string"
},
{
"name": "timestamp",
"type": {
"type": "long",
"logicalType": "timestamp-millis"
}
},
{
"name": "employmentscreening",
"type": {
"type": "enum",
"name": "employmentscreening",
"symbols": [
"NO",
"YES"
]
}
},
{
"name": "vouchercodes",
"type": [
"null",
{
"type": "array",
"items": {
"name": "Vouchercodes",
"type": "record",
"fields": [
{
"name": "voucherName",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "authocode",
"type": [
"null",
"string"
],
"default": null
}
]
}
}
],
"default": null
}
]
}
Here's an example of producing and consuming to Kafka
$ jq -rc < /tmp/data.json | kafka-avro-console-producer --topic foobar --property value.schema="$(jq -rc < /tmp/data.avsc)" --bootstrap-server localhost:9092 --sync
$ kafka-avro-console-consumer --topic foobar --from-beginning --bootstrap-server localhost:9092 | jq
{
"employeeID": "qtete46524",
"employeeNumber": {
"string": "custnumber9813"
},
"serialNumbers": {
"array": [
"serialNumbers3521"
]
},
"correlationId": "corr-656532443",
"timestamp": 1476538955719,
"employmentscreening": "NO",
"vouchercodes": {
"array": [
{
"voucherName": {
"string": "skygo"
},
"authocode": {
"string": "A238472ASD"
}
}
]
}
}
^CProcessed a total of 1 messages

Avro Nested array exception

I am trying to generate avro schema for nested array .
The top most array stores is the issue, however inner array Business is correct.
{"name": "Stores",
"type": {
"type": "array",
"items": {
"name": "Hours",
"type": "record",
"fields": [
{
"name": "Week",
"type": "string"
},
{"name": "Business",
"type":"array",
"items": {"name":"Business_record","type":"record","fields":[
{"name": "Day", "type":"string"},
{"name": "StartTime", "type": "string"},
{"name": "EndTime", "type": "string"}
]}
}
]
}
}
And the exception im getting is :
[ {
"level" : "fatal",
"message" : "illegal Avro schema",
"exceptionClass" : "org.apache.avro.SchemaParseException",
"exceptionMessage" : "No type: {\"name\":\"Stores\",\"type\":{\"type\":\"array\",\"items\":{\"name\":\"Hours\",\"type\":\"record\",\"fields\":[{\"name\":\"Week\",\"type\":\"string\"},{\"name\":\"Business\",\"type\":\"array\",\"items\":{\"name\":\"Business_record\",\"type\":\"record\",\"fields\":[{\"name\":\"Day\",\"type\":\"string\"},{\"name\":\"StartTime\",\"type\":\"string\"},{\"name\":\"EndTime\",\"type\":\"string\"}]}}]}}}",
"info" : "other messages follow (if any)"
} ]
I think something to do with [] Or{} for the outer array fields but I'm not able to figure it out.
Any help is appreciated.
I found the mistake i was doing:
when added the "type": for the nested array it worked for me.
{
"name": "Stores",
"type": "array",
"items": {
"name": "Hours",
"type": "record",
"fields": [
{
"name": "Week",
"type": "string"
},
{
"name": "Business",
"type": {
"type": "array",
"items": {
"name": "Business_record",
"type": "record",
"fields": [
{
"name": "Day",
"type": "string"
},
{
"name": "StartTime",
"type": "string"
},
{
"name": "EndTime",
"type": "string"
}
]
}
}
}
]
}
}

Defining an Avro Schema

I have some avro data like this which is printed in terminal.
{"cust_status_id":0, "cust_status_description":{"string":" Approved"}}
The avro schema which I have created is like
{
"namespace": "com.thp.report.model",
"type": "record",
"name": "PraStatusMaster",
"fields": [
{
"name": "cust_status_id",
"type": "int"
},
{
"name": "cust_status_description",
"type": "string",
"avro.java.string": "String"
}
]
}
Is the schema correct??
Correct schema for your json is the following one:
{
"name": "PraStatusMaster",
"type": "record",
"namespace": "com.thp.report.model",
"fields": [
{
"name": "cust_status_id",
"type": "int"
},
{
"name": "cust_status_description",
"type": {
"name": "cust_status_description",
"type": "record",
"fields": [
{
"name": "string",
"type": "string"
}
]
}
}
]
}

Creating an avro schema for an array with multiple record types?

I am creating an avro schema for a JSON payload that appear to have an array of multiple objects. I'm not sure exactly how to represent this in the schema. The key in question is content:
{
"id": "channel-id",
"name": "My Channel with a New Title",
"description": "Herpy me derpy merpus herpsum ner berp berps derp ter tee",
"privacyLevel": "<private|org>",
"planId": "some-plan-id",
"owner": "a-user-handle",
"curators": [
"user-handle-1",
"user-handle-2"
],
"members": 5,
"content": [
{
"id": "docker",
"slug": "docker",
"index": 1,
"type": "path"
},
{
"id": "such-linkage",
"slug": "such-linkage",
"index": 2,
"type": "external-link",
"details": {
"url": "http://some-dank-link.com",
"title": "My Dank Link",
"contentType": "External Link",
"level": "Beginner",
"duration": "PT34293H33M9S"
}
},
{
"id": "21f1e812-b10a-40df-8b52-3a1d05fc215c",
"slug": "windows-azure-storage-in-depth",
"index": 3,
"type": "course"
},
{
"id": "7c346c05-6416-42dd-80b2-d5e758de7926",
"slug": "7c346c05-6416-42dd-80b2-d5e758de7926",
"index": 4,
"type": "project"
}
],
"imageUrls": ["https://url/to/an/image", "https://url/to/another/image"],
"analyticsEnabled": true,
"orgDiscoverable": false,
"createdDate": "2015-12-31T01:23:45+00:00",
"archiveDate": "2015-12-31T01:23:45+00:00",
"messagePublishedAt": "2015-12-31T01:23:45+00:00"
}
If you are asking if it is possible create an array with different kind of records, it is. Avro support this through union. it would looks like .
{
"name": "myRecord",
"type":"record",
"fields":[
{
"name":"myArrayWithMultiplesTypes",
"type":{
"type": "array",
"items":[
{
"name":"typeOne",
"type":"record",
"fields":[
{"name":"name", "type":"string"}
]
},
{
"name":"typeTwo",
"type":"record",
"fields":[
{"name":"id", "type":"int"}
]
}
]
}
}
]
}
If you already have the records defined previously, then it could look like this:
{
"name": "mulitplePossibleTypes",
"type": [
"null",
{
"type": "array",
"items": [
"com.xyz.kola.cloud.events.itemmanager.Part",
"com.xyz.kola.cloud.events.itemmanager.Document",
"com.xyz.kola.cloud.events.itemmanager.DigitalModel",
"com.xyz.kola.cloud.events.itemmanager.Interface"
]
}
]
},

How to define a schema which have a union in an array in avro?

I want to define my array element as a union. Is it possible? If so please share a sample schema.
I think this is what you are looking for:
Avro Schema:
{
"type": "record",
"namespace": "example.avro",
"name": "array_union",
"fields": [
{
"name": "body",
"type": {
"name": "body",
"type": "record",
"fields": [
{
"name": "abc",
"type": [
"null",
{
"type": "array",
"name": "abc_name_0",
"items": {
"name": "_name_0",
"type": "record",
"fields": [
{
"name": "app_id",
"type": [
"null",
"string"
]
},
{
"name": "app_name",
"type": [
"null",
"string"
]
},
{
"name": "app_key",
"type": [
"null",
"string"
]
}
]
}
}
]
}
]
}
}
]
}
Valid Json that schema can accept:
{
"body": {
"abc": {
"array": [
{
"app_id": {
"string": "abc"
},
"app_name": {
"string": "bcd"
},
"app_key": {
"string": "cde"
}
}
]
}
}
}
Or,
{
"body": {
"abc": null
}
}
You could add this below piece of code as a record field type
{
"name": "some_type",
"type": {
"type": "array",
"items": {
"name": "SomeType",
"type": "record",
"fields": [
{
"name": "name",
"type": ["null", "string"]
},
{
"name": "text",
"type": "string"
},
{
"name": "type",
"type": {
"name": "NamedTextType",
"type": "enum",
"symbols": [ "named_text", "common_named_text" ]
}
}
]
}
}
}
Hope this helps!

Resources