Handle Null Records in AVRO schema - avro

I have a problem where my record json can be null. How to handle null records in avro schema? The documentation given is for null attributes I want to get for null records.

the same applies to record types too. you can union the type with null.
{
"name": "SpokenLanguage",
"type": [
"null",
{
"type": "record",
"name": "Language",
"fields": [
{
"name": "IsoCode",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "isPrimary",
"type": "boolean",
"default": false
},
{
"name": "description",
"type": [
"null",
"string"
],
"default": null
}
]
}
],
"default": null
}

Related

Avro Schema: don't know which "defualt" value to use

One of my fields in my Avro Schema is:
{
"name": "recipients",
"type": {
"type": "array",
"items": {
"name": "recordings",
"type": "record",
"fields": [
{
"name": "returnAddress",
"type": ["null","string"],
"default": null
},
{
"name": "something",
"type": ["null","string"],
"default": null
},
{
"name": "phone",
"type": ["null","string"],
"default": null
},
{
"name": "pfp",
"type": {
"name": "pfp",
"type": "record",
"fields": []
},
"default": {"a": 1}
},
{
"name": "example1",
"type": "int",
"default": -1
},
{
"name": "example2",
"type": "int",
"default": -1
}
]
},
"default": [1]
}
}
However I get an error message :
Field recipients type:ARRAY pos:4 not set and has no default value
Do I need a default value for the recipients field? and if so what would the default value be for the type listed under recipients. I have tried "default":null, "default": {"a":1}, and "default": [1] and all returned errors.

Avro schema cannot deserialize autoregistered avro schema by connector

We are trying to consume a topic that has data emitted by a connector. We are using a handwritten schema that matches the data in the topic.
{
"type": "record",
"name": "Event",
"namespace": "com.example.avro",
"fields": [
{
"name": "id",
"type": "string"
},
{
"name": "type",
"type": ["null", "string"],
"default": null
},
{
"name": "entity_id",
"type": ["null", "string"],
"default": null
},
{
"name": "emitted_at",
"type": ["null", "string"],
"default": null
},
{
"name": "data",
"type": ["null", "string"],
"default": null
}
]
}
Unfortunately it cannot deserialize this because of the auto-registered schema by the connector.
{
"type": "record",
"name": "Value",
"namespace": "postgres.public.events",
"fields": [
{
"name": "id",
"type": "string"
},
{
"name": "type",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "entity_id",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "emitted_at",
"type": [
"null",
{
"type": "string",
"connect.version": 1,
"connect.name": "io.debezium.time.ZonedTimestamp"
}
],
"default": null
},
{
"name": "data",
"type": [
"null",
{
"type": "string",
"connect.version": 1,
"connect.name": "io.debezium.data.Json"
}
],
"default": null
}
],
"connect.name": "postgres.public.events.Value"
}
We are getting the following error:
Caused by: org.apache.kafka.common.errors.SerializationException: Could not find class postgres.public.events.Value specified in writer's schema whilst finding reader's schema for a SpecificRecord.
How do we resolve this issue?
You can either download the schema from the registry instead of defining your own (there's maven plugins to do this), or change the namespace+name of your own schema such that the generated class will match.
Adding an alias might work as well, but I've not had much experience/luck with that, personally.

No Type Error when trying to transform JSON to AVRO

I'm trying to convert a JSON payload to Avro to publish to a Kafka topic. However, when I do the Dataweave transformation I'm getting a "No Type" error. I'm not sure what's causing the error. I originally thought this might be due to the transformation not knowing what the MIME type on the inbound payload. So, I've made sure that it's set to application/json but that didn't make any difference.
Avro Schema
{
"compatibility" : "forward",
"name": "ContentManagerCoupons",
"type": "record",
"namespace": "com.rentpath",
"fields": [
{
"name": "clientID",
"type": "string"
},
{
"name": "outputHistoryId",
"type": "string"
},
{
"name": "categoryCoupons",
"type": {
"type": "array",
"items": {
"name": "categoryCoupons_record",
"type": "record",
"fields": [
{
"name": "applyBy",
"type": [
"string",
"int",
"null"
]
},
{
"name": "applyPeriod",
"type": [
"string",
"null"
]
},
{
"name": "cashValue",
"type": [
"int",
"null"
]
},
{
"name": "couponCategory",
"type": "string"
},
{
"name": "cashOffDesc",
"type": [
"string",
"null"
]
},
{
"name": "endDate",
"type": [
"string",
"null"
]
},
{
"name": "feeType",
"type": [
"string",
"null"
]
},
{
"name": "freeWeeks",
"type": [
"string",
"null"
]
},
{
"name": "generatedText",
"type": "string"
},
{
"name": "leaseby",
"type": [
"string",
"null"
]
},
{
"name": "leaseTerm",
"type": [
"int",
"null"
]
},
{
"name": "offerText",
"type": [
"string",
"null"
]
},
{
"name": "startDate",
"type": "string"
},
{
"name": "unitType",
"type": [
"string",
"null"
]
}
]
}
}
}
]
}
JSON Message
{
"outputHistoryId": "55324456",
"clientID": "112345",
"categoryCoupons": [
{
"unitType": null,
"startDate": "07/21/2020",
"offerText": "This would be the special offer message.",
"leaseTerm": null,
"leaseby": null,
"generatedText": "This would be the special offer message..",
"freeWeeks": null,
"feeType": null,
"endDate": "10/01/2020",
"couponCategory": "Special Offer",
"cashValue": null,
"cashOffDesc": null,
"applyPeriod": null,
"applyBy": null
}
]
}
Datawave
%dw 2.2
output application/avro schemaUrl="http://schema-registry.domain.com:8081/subjects/Coupon-value/versions/1"
---
payload
Error Message
"org.apache.avro.SchemaParseException - No type: {"subject":"ContentManager.Coupon-value","version":1,"id":342,"schema":"{"type":"record","name":"ContentManagerCoupons","namespace":"com.rentpath","fields":[{"name":"clientID","type":"string"},{"name":"outputHistoryId","type":"string"},{"name":"categoryCoupons","type":{"type":"array","items":{"type":"record","name":"categoryCoupons_record","fields":[{"name":"applyBy","type":["string","int","null"]},{"name":"applyPeriod","type":["string","null"]},{"name":"cashValue","type":["int","null"]},{"name":"couponCategory","type":"string"},{"name":"cashOffDesc","type":["string","null"]},{"name":"endDate","type":["string","null"]},{"name":"feeType","type":["string","null"]},{"name":"freeWeeks","type":["string","null"]},{"name":"generatedText","type":"string"},{"name":"leaseby","type":["string","null"]},{"name":"leaseTerm","type":["int","null"]},{"name":"offerText","type":["string","null"]},{"name":"startDate","type":"string"},{"name":"unitType","type":["string","null"]}]}}}],"compatibility":"forward"}"}
org.apache.avro.SchemaParseException: No type: {"subject":"ContentManager.Coupon-value","version":1,"id":342,"schema":"{"type":"record","name":"ContentManagerCoupons","namespace":"com.rentpath","fields":[{"name":"clientID","type":"string"},{"name":"outputHistoryId","type":"string"},{"name":"categoryCoupons","type":{"type":"array","items":{"type":"record","name":"categoryCoupons_record","fields":[{"name":"applyBy","type":["string","int","null"]},{"name":"applyPeriod","type":["string","null"]},{"name":"cashValue","type":["int","null"]},{"name":"couponCategory","type":"string"},{"name":"cashOffDesc","type":["string","null"]},{"name":"endDate","type":["string","null"]},{"name":"feeType","type":["string","null"]},{"name":"freeWeeks","type":["string","null"]},{"name":"generatedText","type":"string"},{"name":"leaseby","type":["string","null"]},{"name":"leaseTerm","type":["int","null"]},{"name":"offerText","type":["string","null"]},{"name":"startDate","type":"string"},{"name":"unitType","type":["string","null"]}]}}}],"compatibility":"forward"}"}
at org.apache.avro.Schema.getRequiredText(Schema.java:1753)
at org.apache.avro.Schema.parse(Schema.java:1604)
at org.apache.avro.Schema$Parser.parse(Schema.java:1394)
at org.apache.avro.Schema$Parser.parse(Schema.java:1365)
at org.mule.weave.v2.module.avro.AvroWriter.doWriteValue(AvroWriter.scala:195)
at org.mule.weave.v2.module.writer.Writer.writeValue(Writer.scala:41)
at org.mule.weave.v2.module.writer.Writer.writeValue$(Writer.scala:39)
at org.mule.weave.v2.module.avro.AvroWriter.writeValue(AvroWriter.scala:44)
at org.mule.weave.v2.module.writer.DeferredWriter.doWriteValue(DeferredWriter.scala:73)
at org.mule.weave.v2.module.writer.Writer.writeValue(Writer.scala:41)
at org.mule.weave.v2.module.writer.Writer.writeValue$(Writer.scala:39)
at org.mule.weave.v2.module.writer.DeferredWriter.writeValue(DeferredWriter.scala:16)
at org.mule.weave.v2.module.writer.WriterHelper$.writeValue(Writer.scala:120)
at org.mule.weave.v2.module.writer.WriterHelper$.writeAndGetResult(Writer.scala:98)
at org.mule.weave.v2.interpreted.InterpretedMappingExecutableWeave.write(InterpreterMappingCompilerPhase.scala:236)
at org.mule.weave.v2.el.WeaveExpressionLanguageSession.evaluateWithTimeout(WeaveExpressionLanguageSession.scala:243)
at org.mule.weave.v2.el.WeaveExpressionLanguageSession.evaluate(WeaveExpressionLanguageSession.scala:108)
at org.mule.runtime.core.internal.el.dataweave.DataWeaveExpressionLanguageAdaptor$1.evaluate(DataWeaveExpressionLanguageAdaptor.java:308)
at org.mule.runtime.core.internal.el.DefaultExpressionManagerSession.evaluate(DefaultExpressionManagerSession.java:105)
at com.mulesoft.mule.runtime.core.internal.processor.SetPayloadTransformationTarget.process(SetPayloadTransformationTarget.java:32)
at com.mulesoft.mule.runtime.core.internal.processor.TransformMessageProcessor.lambda$0(TransformMessageProcessor.java:92)
at java.util.Optional.ifPresent(Optional.java:159)
at com.mulesoft.mule.runtime.core.internal.processor.TransformMessageProcessor.process(TransformMessageProcessor.java:92)
at org.mule.runtime.core.api.util.func.CheckedFunction.apply(CheckedFunction.java:25)
at org.mule.runtime.core.api.rx.Exceptions.lambda$checkedFunction$2(Exceptions.java:84)
at org.mule.runtime.core.internal.util.rx.Operators.lambda$nullSafeMap$0(Operators.java:47)
at reactor.core.publisher.FluxHandleFuseable$HandleFuseableSubscriber.onNext(FluxHandleFuseable.java:165)
at org.mule.runtime.core.privileged.processor.chain.AbstractMessageProcessorChain$2.onNext(AbstractMessageProcessorChain.java:425)
at org.mule.runtime.core.privileged.processor.chain.AbstractMessageProcessorChain$2.onNext(AbstractMessageProcessorChain.java:420)
at reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onNext(FluxHide.java:127)
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.onNext(FluxPeekFuseable.java:204)
at reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber.onNext(FluxOnAssembly.java:345)
at reactor.core.publisher.FluxSubscribeOnValue$ScheduledScalar.run(FluxSubscribeOnValue.java:178)
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:50)
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:27)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.mule.service.scheduler.internal.AbstractRunnableFutureDecorator.doRun(AbstractRunnableFutureDecorator.java:111)
at org.mule.service.scheduler.internal.RunnableFutureDecorator.run(RunnableFutureDecorator.java:54)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748), while writing Avro at payload.
It seems to work for me. Maybe there is a problem trying to access the schema. Because I don't have access to that URL I replaced it with a local file:
output application/avro schemaUrl="classpath://schema.json"
Apparently the answer is pretty simple. I just needed to add schema to the end of my URL. This removes the extraneous items like version and id that come down without it.
New Dataweave
%dw 2.2
output application/avro schemaUrl="http://schema-registry.domain.com:8081/subjects/Coupon-value/versions/1/schema"
---
payload

Avro Tools Failure Expected start-union. Got VALUE_STRING

I've defined the below avro schema (car_sales_customer.avsc),
{
"type" : "record",
"name" : "topLevelRecord",
"fields" : [ {
"name": "cust_date",
"type": "string"
},
{
"name": "customer",
"type": {
"type": "array",
"items": {
"name": "customer",
"type": "record",
"fields": [
{
"name": "address",
"type": "string"
},
{
"name": "driverlience",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "name",
"type": "string"
},
{
"name": "phone",
"type": "string"
}
]
}
}
}]
}
and my input json payload (car_sales_customer.json) is as follows,
{"cust_date":"2017-04-28","customer":[{"address":"SanFrancisco,CA","driverlience":"K123499989","name":"JoyceRidgely","phone":"16504378889"}]}
I'm trying to use avro-tools and convert the above json to avro using the avro schema,
java -jar ./avro-tools-1.9.2.jar fromjson --schema-file ./car_sales_customer.avsc ./car_sales_customer.json > ./car_sales_customer.avro
I get the below error when I execute the above statement,
Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_STRING
at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:514)
at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:433)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:283)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:298)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:89)
at org.apache.avro.tool.Main.run(Main.java:66)
at org.apache.avro.tool.Main.main(Main.java:55)
Is there a solution to overcome the error?

Avro schema issue when record missing a field

I am using the NiFi (v1.2) processor ConvertJSONToAvro. I am not able to parse a record that only contains 1 of 2 elements in a "record" type. This element is also allowed to be missing entirely from the data. Is my Avro schema incorrect?
Schema snippet:
"name": "personname",
"type": [
"null":,
{
"type": "record",
"name": "firstandorlast",
"fields": [
{
"name": "first",
"type": [
"null",
"string"
]
},
{
"name": "last",
"type": [
"null",
"string"
]
}
]
}
]
If "personname" contains both "first" and "last" it works, but if it only contains one of the elements, it fails with the error: Cannot convert field personname: cannot resolve union:
{ "last":"Smith" }
not in
"type": [ "null":,
{
"type": "record",
"name": "firstandorlast",
"fields": [
{
"name": "first",
"type": [
"null",
"string"
]
},
{
"name": "last",
"type": [
"null",
"string"
]
}
]
}
]
You are missing the default value
https://avro.apache.org/docs/1.8.1/spec.html#schema_record
Your schema should looks like
"name": "personname",
"type": [
"null":,
{
"type": "record",
"name": "firstandorlast",
"fields": [
{
"name": "first",
"type": [
"null",
"string"
],
"default": "null"
},
{
"name": "last",
"type": [
"null",
"string"
],
"default": "null"
}
]
}
]

Resources