Avro Tools failure: Expected start-union. Got VALUE_STRING

I've defined the Avro schema below (car_sales_customer.avsc):
{
  "type": "record",
  "name": "topLevelRecord",
  "fields": [
    {
      "name": "cust_date",
      "type": "string"
    },
    {
      "name": "customer",
      "type": {
        "type": "array",
        "items": {
          "name": "customer",
          "type": "record",
          "fields": [
            {
              "name": "address",
              "type": "string"
            },
            {
              "name": "driverlience",
              "type": ["null", "string"],
              "default": null
            },
            {
              "name": "name",
              "type": "string"
            },
            {
              "name": "phone",
              "type": "string"
            }
          ]
        }
      }
    }
  ]
}
My input JSON payload (car_sales_customer.json) is as follows:
{"cust_date":"2017-04-28","customer":[{"address":"SanFrancisco,CA","driverlience":"K123499989","name":"JoyceRidgely","phone":"16504378889"}]}
I'm trying to use avro-tools to convert the above JSON to Avro using the schema:
java -jar ./avro-tools-1.9.2.jar fromjson --schema-file ./car_sales_customer.avsc ./car_sales_customer.json > ./car_sales_customer.avro
I get the below error when I execute the above command:
Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_STRING
at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:514)
at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:433)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:283)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:298)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:89)
at org.apache.avro.tool.Main.run(Main.java:66)
at org.apache.avro.tool.Main.main(Main.java:55)
Is there a solution to overcome the error?
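The related answers below explain the cause: for a union type such as ["null", "string"], Avro's JSON encoding requires the value to be wrapped in an object keyed by the branch type, so a bare string triggers "Expected start-union". A sketch of the corrected payload (only the driverlience value changes; everything else is as above):
{"cust_date":"2017-04-28","customer":[{"address":"SanFrancisco,CA","driverlience":{"string":"K123499989"},"name":"JoyceRidgely","phone":"16504378889"}]}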

Related


Error creating a kafka message to producer - Expected start-union. Got VALUE_STRING
{
  "namespace": "de.morris.audit",
  "type": "record",
  "name": "AuditDataChangemorris",
  "fields": [
    {"name": "employeeID", "type": "string"},
    {"name": "employeeNumber", "type": ["null", "string"], "default": null},
    {"name": "serialNumbers", "type": ["null", {"type": "array", "items": "string"}]},
    {"name": "correlationId", "type": "string"},
    {"name": "timestamp", "type": "long", "logicalType": "timestamp-millis"},
    {"name": "employmentscreening", "type": {"type": "enum", "name": "employmentscreening", "symbols": ["NO", "YES"]}},
    {"name": "vouchercodes", "type": ["null",
      {
        "type": "array",
        "items": {
          "name": "Vouchercodes",
          "type": "record",
          "fields": [
            {"name": "voucherName", "type": ["null", "string"], "default": null},
            {"name": "authocode", "type": ["null", "string"], "default": null}
          ]
        }
      }], "default": null}
  ]
}
When I tried to create sample data in JSON format based on the above .avsc for a Kafka consumer, I got an error upon testing. This is the sample data:
{
  "employeeID": "qtete46524",
  "employeeNumber": {
    "string": "custnumber9813"
  },
  "serialNumbers": {
    "type": "array",
    "items": ["363536623", "5846373733"]
  },
  "correlationId": "corr-656532443",
  "timestamp": 1476538955719,
  "employmentscreening": "NO",
  "vouchercodes": [
    {
      "voucherName": "skygo",
      "authocode": "A238472ASD"
    }
  ]
}
I got the below error when I ran the Dataflow job in GCP:
Error message from worker: java.lang.RuntimeException: java.io.IOException: Insert failed: [{"errors":[{"debugInfo":"","location":"serialnumbers","message":"Array specified for non-repeated field: serialnumbers.","reason":"invalid"}],"index":0}]
How do I create correct sample data based on the above schema?
Read the spec:
The value of a union is encoded in JSON as follows:
if its type is null, then it is encoded as a JSON null;
otherwise it is encoded as a JSON object with one name/value pair whose name is the type’s name and whose value is the recursively encoded value
So, here's the data it expects.
{
  "employeeID": "qtete46524",
  "employeeNumber": {
    "string": "custnumber9813"
  },
  "serialNumbers": {
    "array": [
      "serialNumbers3521"
    ]
  },
  "correlationId": "corr-656532443",
  "timestamp": 1476538955719,
  "employmentscreening": "NO",
  "vouchercodes": {
    "array": [
      {
        "voucherName": {"string": "skygo"},
        "authocode": {"string": "A238472ASD"}
      }
    ]
  }
}
With this schema:
{
  "namespace": "de.morris.audit",
  "type": "record",
  "name": "AuditDataChangemorris",
  "fields": [
    {
      "name": "employeeID",
      "type": "string"
    },
    {
      "name": "employeeNumber",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "serialNumbers",
      "type": ["null", {"type": "array", "items": "string"}]
    },
    {
      "name": "correlationId",
      "type": "string"
    },
    {
      "name": "timestamp",
      "type": {"type": "long", "logicalType": "timestamp-millis"}
    },
    {
      "name": "employmentscreening",
      "type": {"type": "enum", "name": "employmentscreening", "symbols": ["NO", "YES"]}
    },
    {
      "name": "vouchercodes",
      "type": ["null", {
        "type": "array",
        "items": {
          "name": "Vouchercodes",
          "type": "record",
          "fields": [
            {"name": "voucherName", "type": ["null", "string"], "default": null},
            {"name": "authocode", "type": ["null", "string"], "default": null}
          ]
        }
      }],
      "default": null
    }
  ]
}
Here's an example of producing to and consuming from Kafka:
$ jq -rc < /tmp/data.json | kafka-avro-console-producer --topic foobar --property value.schema="$(jq -rc < /tmp/data.avsc)" --bootstrap-server localhost:9092 --sync
$ kafka-avro-console-consumer --topic foobar --from-beginning --bootstrap-server localhost:9092 | jq
{
  "employeeID": "qtete46524",
  "employeeNumber": {
    "string": "custnumber9813"
  },
  "serialNumbers": {
    "array": [
      "serialNumbers3521"
    ]
  },
  "correlationId": "corr-656532443",
  "timestamp": 1476538955719,
  "employmentscreening": "NO",
  "vouchercodes": {
    "array": [
      {
        "voucherName": {
          "string": "skygo"
        },
        "authocode": {
          "string": "A238472ASD"
        }
      }
    ]
  }
}
^CProcessed a total of 1 messages
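The same encoding can also be sanity-checked offline with avro-tools, as in the first question above (a sketch, assuming the schema and data from this answer are saved as /tmp/data.avsc and /tmp/data.json):
$ java -jar avro-tools-1.9.2.jar fromjson --schema-file /tmp/data.avsc /tmp/data.json > /tmp/data.avro
If the data matches the schema's JSON encoding, this produces an Avro container file; otherwise it fails with the same AvroTypeException.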

Avro schema cannot deserialize autoregistered avro schema by connector

We are trying to consume a topic that has data emitted by a connector. We are using a handwritten schema that matches the data in the topic:
{
  "type": "record",
  "name": "Event",
  "namespace": "com.example.avro",
  "fields": [
    {
      "name": "id",
      "type": "string"
    },
    {
      "name": "type",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "entity_id",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "emitted_at",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "data",
      "type": ["null", "string"],
      "default": null
    }
  ]
}
Unfortunately, it cannot deserialize this because of the schema auto-registered by the connector:
{
  "type": "record",
  "name": "Value",
  "namespace": "postgres.public.events",
  "fields": [
    {
      "name": "id",
      "type": "string"
    },
    {
      "name": "type",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "entity_id",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "emitted_at",
      "type": ["null", {
        "type": "string",
        "connect.version": 1,
        "connect.name": "io.debezium.time.ZonedTimestamp"
      }],
      "default": null
    },
    {
      "name": "data",
      "type": ["null", {
        "type": "string",
        "connect.version": 1,
        "connect.name": "io.debezium.data.Json"
      }],
      "default": null
    }
  ],
  "connect.name": "postgres.public.events.Value"
}
We are getting the following error:
Caused by: org.apache.kafka.common.errors.SerializationException: Could not find class postgres.public.events.Value specified in writer's schema whilst finding reader's schema for a SpecificRecord.
How do we resolve this issue?
You can either download the schema from the registry instead of defining your own (there are Maven plugins to do this), or change the namespace and name of your own schema so that the generated class will match.
Adding an alias might work as well, but I've not had much experience/luck with that, personally.
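For the second option, here is a minimal sketch of the renamed handwritten schema (same fields as the Event schema above, with only name and namespace changed), so that the class generated for the reader matches the writer's postgres.public.events.Value:
{
  "type": "record",
  "name": "Value",
  "namespace": "postgres.public.events",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "type", "type": ["null", "string"], "default": null},
    {"name": "entity_id", "type": ["null", "string"], "default": null},
    {"name": "emitted_at", "type": ["null", "string"], "default": null},
    {"name": "data", "type": ["null", "string"], "default": null}
  ]
}
The connect.version/connect.name annotations from the writer's schema shouldn't need to be reproduced; the fully qualified record name is what the deserializer uses to locate the reader class.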

Avro Nested array exception

I am trying to generate an Avro schema for a nested array.
The topmost array, Stores, is the issue; however, the inner array, Business, is correct.
{"name": "Stores",
"type": {
"type": "array",
"items": {
"name": "Hours",
"type": "record",
"fields": [
{
"name": "Week",
"type": "string"
},
{"name": "Business",
"type":"array",
"items": {"name":"Business_record","type":"record","fields":[
{"name": "Day", "type":"string"},
{"name": "StartTime", "type": "string"},
{"name": "EndTime", "type": "string"}
]}
}
]
}
}
And the exception I'm getting is:
[ {
  "level": "fatal",
  "message": "illegal Avro schema",
  "exceptionClass": "org.apache.avro.SchemaParseException",
  "exceptionMessage": "No type: {\"name\":\"Stores\",\"type\":{\"type\":\"array\",\"items\":{\"name\":\"Hours\",\"type\":\"record\",\"fields\":[{\"name\":\"Week\",\"type\":\"string\"},{\"name\":\"Business\",\"type\":\"array\",\"items\":{\"name\":\"Business_record\",\"type\":\"record\",\"fields\":[{\"name\":\"Day\",\"type\":\"string\"},{\"name\":\"StartTime\",\"type\":\"string\"},{\"name\":\"EndTime\",\"type\":\"string\"}]}}]}}}",
  "info": "other messages follow (if any)"
} ]
I think it's something to do with the [] or {} around the outer array fields, but I'm not able to figure it out.
Any help is appreciated.
I found the mistake I was making: when I added a wrapping "type" object for the nested array, it worked. A field's "type" must itself be a complete schema, so an array-typed field has to be written as "type": {"type": "array", "items": ...} rather than putting "type": "array" and "items" directly on the field.
{
  "name": "Stores",
  "type": "array",
  "items": {
    "name": "Hours",
    "type": "record",
    "fields": [
      {
        "name": "Week",
        "type": "string"
      },
      {
        "name": "Business",
        "type": {
          "type": "array",
          "items": {
            "name": "Business_record",
            "type": "record",
            "fields": [
              {
                "name": "Day",
                "type": "string"
              },
              {
                "name": "StartTime",
                "type": "string"
              },
              {
                "name": "EndTime",
                "type": "string"
              }
            ]
          }
        }
      }
    ]
  }
}

No Type Error when trying to transform JSON to AVRO

I'm trying to convert a JSON payload to Avro to publish to a Kafka topic. However, when I run the DataWeave transformation I get a "No Type" error, and I'm not sure what's causing it. I originally thought it might be because the transformation didn't know the MIME type of the inbound payload, so I made sure it's set to application/json, but that didn't make any difference.
Avro Schema
{
  "compatibility": "forward",
  "name": "ContentManagerCoupons",
  "type": "record",
  "namespace": "com.rentpath",
  "fields": [
    {
      "name": "clientID",
      "type": "string"
    },
    {
      "name": "outputHistoryId",
      "type": "string"
    },
    {
      "name": "categoryCoupons",
      "type": {
        "type": "array",
        "items": {
          "name": "categoryCoupons_record",
          "type": "record",
          "fields": [
            {"name": "applyBy", "type": ["string", "int", "null"]},
            {"name": "applyPeriod", "type": ["string", "null"]},
            {"name": "cashValue", "type": ["int", "null"]},
            {"name": "couponCategory", "type": "string"},
            {"name": "cashOffDesc", "type": ["string", "null"]},
            {"name": "endDate", "type": ["string", "null"]},
            {"name": "feeType", "type": ["string", "null"]},
            {"name": "freeWeeks", "type": ["string", "null"]},
            {"name": "generatedText", "type": "string"},
            {"name": "leaseby", "type": ["string", "null"]},
            {"name": "leaseTerm", "type": ["int", "null"]},
            {"name": "offerText", "type": ["string", "null"]},
            {"name": "startDate", "type": "string"},
            {"name": "unitType", "type": ["string", "null"]}
          ]
        }
      }
    }
  ]
}
JSON Message
{
  "outputHistoryId": "55324456",
  "clientID": "112345",
  "categoryCoupons": [
    {
      "unitType": null,
      "startDate": "07/21/2020",
      "offerText": "This would be the special offer message.",
      "leaseTerm": null,
      "leaseby": null,
      "generatedText": "This would be the special offer message..",
      "freeWeeks": null,
      "feeType": null,
      "endDate": "10/01/2020",
      "couponCategory": "Special Offer",
      "cashValue": null,
      "cashOffDesc": null,
      "applyPeriod": null,
      "applyBy": null
    }
  ]
}
DataWeave
%dw 2.2
output application/avro schemaUrl="http://schema-registry.domain.com:8081/subjects/Coupon-value/versions/1"
---
payload
Error Message
"org.apache.avro.SchemaParseException - No type: {"subject":"ContentManager.Coupon-value","version":1,"id":342,"schema":"{"type":"record","name":"ContentManagerCoupons","namespace":"com.rentpath","fields":[{"name":"clientID","type":"string"},{"name":"outputHistoryId","type":"string"},{"name":"categoryCoupons","type":{"type":"array","items":{"type":"record","name":"categoryCoupons_record","fields":[{"name":"applyBy","type":["string","int","null"]},{"name":"applyPeriod","type":["string","null"]},{"name":"cashValue","type":["int","null"]},{"name":"couponCategory","type":"string"},{"name":"cashOffDesc","type":["string","null"]},{"name":"endDate","type":["string","null"]},{"name":"feeType","type":["string","null"]},{"name":"freeWeeks","type":["string","null"]},{"name":"generatedText","type":"string"},{"name":"leaseby","type":["string","null"]},{"name":"leaseTerm","type":["int","null"]},{"name":"offerText","type":["string","null"]},{"name":"startDate","type":"string"},{"name":"unitType","type":["string","null"]}]}}}],"compatibility":"forward"}"}
org.apache.avro.SchemaParseException: No type: {"subject":"ContentManager.Coupon-value","version":1,"id":342,"schema":"{"type":"record","name":"ContentManagerCoupons","namespace":"com.rentpath","fields":[{"name":"clientID","type":"string"},{"name":"outputHistoryId","type":"string"},{"name":"categoryCoupons","type":{"type":"array","items":{"type":"record","name":"categoryCoupons_record","fields":[{"name":"applyBy","type":["string","int","null"]},{"name":"applyPeriod","type":["string","null"]},{"name":"cashValue","type":["int","null"]},{"name":"couponCategory","type":"string"},{"name":"cashOffDesc","type":["string","null"]},{"name":"endDate","type":["string","null"]},{"name":"feeType","type":["string","null"]},{"name":"freeWeeks","type":["string","null"]},{"name":"generatedText","type":"string"},{"name":"leaseby","type":["string","null"]},{"name":"leaseTerm","type":["int","null"]},{"name":"offerText","type":["string","null"]},{"name":"startDate","type":"string"},{"name":"unitType","type":["string","null"]}]}}}],"compatibility":"forward"}"}
at org.apache.avro.Schema.getRequiredText(Schema.java:1753)
at org.apache.avro.Schema.parse(Schema.java:1604)
at org.apache.avro.Schema$Parser.parse(Schema.java:1394)
at org.apache.avro.Schema$Parser.parse(Schema.java:1365)
at org.mule.weave.v2.module.avro.AvroWriter.doWriteValue(AvroWriter.scala:195)
at org.mule.weave.v2.module.writer.Writer.writeValue(Writer.scala:41)
at org.mule.weave.v2.module.writer.Writer.writeValue$(Writer.scala:39)
at org.mule.weave.v2.module.avro.AvroWriter.writeValue(AvroWriter.scala:44)
at org.mule.weave.v2.module.writer.DeferredWriter.doWriteValue(DeferredWriter.scala:73)
at org.mule.weave.v2.module.writer.Writer.writeValue(Writer.scala:41)
at org.mule.weave.v2.module.writer.Writer.writeValue$(Writer.scala:39)
at org.mule.weave.v2.module.writer.DeferredWriter.writeValue(DeferredWriter.scala:16)
at org.mule.weave.v2.module.writer.WriterHelper$.writeValue(Writer.scala:120)
at org.mule.weave.v2.module.writer.WriterHelper$.writeAndGetResult(Writer.scala:98)
at org.mule.weave.v2.interpreted.InterpretedMappingExecutableWeave.write(InterpreterMappingCompilerPhase.scala:236)
at org.mule.weave.v2.el.WeaveExpressionLanguageSession.evaluateWithTimeout(WeaveExpressionLanguageSession.scala:243)
at org.mule.weave.v2.el.WeaveExpressionLanguageSession.evaluate(WeaveExpressionLanguageSession.scala:108)
at org.mule.runtime.core.internal.el.dataweave.DataWeaveExpressionLanguageAdaptor$1.evaluate(DataWeaveExpressionLanguageAdaptor.java:308)
at org.mule.runtime.core.internal.el.DefaultExpressionManagerSession.evaluate(DefaultExpressionManagerSession.java:105)
at com.mulesoft.mule.runtime.core.internal.processor.SetPayloadTransformationTarget.process(SetPayloadTransformationTarget.java:32)
at com.mulesoft.mule.runtime.core.internal.processor.TransformMessageProcessor.lambda$0(TransformMessageProcessor.java:92)
at java.util.Optional.ifPresent(Optional.java:159)
at com.mulesoft.mule.runtime.core.internal.processor.TransformMessageProcessor.process(TransformMessageProcessor.java:92)
at org.mule.runtime.core.api.util.func.CheckedFunction.apply(CheckedFunction.java:25)
at org.mule.runtime.core.api.rx.Exceptions.lambda$checkedFunction$2(Exceptions.java:84)
at org.mule.runtime.core.internal.util.rx.Operators.lambda$nullSafeMap$0(Operators.java:47)
at reactor.core.publisher.FluxHandleFuseable$HandleFuseableSubscriber.onNext(FluxHandleFuseable.java:165)
at org.mule.runtime.core.privileged.processor.chain.AbstractMessageProcessorChain$2.onNext(AbstractMessageProcessorChain.java:425)
at org.mule.runtime.core.privileged.processor.chain.AbstractMessageProcessorChain$2.onNext(AbstractMessageProcessorChain.java:420)
at reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onNext(FluxHide.java:127)
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.onNext(FluxPeekFuseable.java:204)
at reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber.onNext(FluxOnAssembly.java:345)
at reactor.core.publisher.FluxSubscribeOnValue$ScheduledScalar.run(FluxSubscribeOnValue.java:178)
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:50)
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:27)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.mule.service.scheduler.internal.AbstractRunnableFutureDecorator.doRun(AbstractRunnableFutureDecorator.java:111)
at org.mule.service.scheduler.internal.RunnableFutureDecorator.run(RunnableFutureDecorator.java:54)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748), while writing Avro at payload.
It seems to work for me. Maybe there is a problem accessing the schema. Because I don't have access to that URL, I replaced it with a local file:
output application/avro schemaUrl="classpath://schema.json"
Apparently the answer is pretty simple. I just needed to append /schema to the end of my URL. This strips the extraneous items, like version and id, that come down without it.
New DataWeave
%dw 2.2
output application/avro schemaUrl="http://schema-registry.domain.com:8081/subjects/Coupon-value/versions/1/schema"
---
payload
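You can see the difference by querying the two endpoints directly (a sketch, using the same registry host and subject as above; outputs truncated): the versioned endpoint returns a JSON envelope with subject/version/id keys, which is what Avro's parser rejects with "No type", while the /schema endpoint returns the bare schema.
$ curl http://schema-registry.domain.com:8081/subjects/Coupon-value/versions/1
{"subject":"ContentManager.Coupon-value","version":1,"id":342,"schema":"{\"type\":\"record\",\"name\":\"ContentManagerCoupons\", ... }"}
$ curl http://schema-registry.domain.com:8081/subjects/Coupon-value/versions/1/schema
{"type":"record","name":"ContentManagerCoupons","namespace":"com.rentpath", ... }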

Defining an Avro Schema

I have some Avro data like this, printed in the terminal:
{"cust_status_id":0, "cust_status_description":{"string":" Approved"}}
The Avro schema I have created is:
{
  "namespace": "com.thp.report.model",
  "type": "record",
  "name": "PraStatusMaster",
  "fields": [
    {
      "name": "cust_status_id",
      "type": "int"
    },
    {
      "name": "cust_status_description",
      "type": "string",
      "avro.java.string": "String"
    }
  ]
}
Is the schema correct?
The correct schema for your JSON is the following:
{
  "name": "PraStatusMaster",
  "type": "record",
  "namespace": "com.thp.report.model",
  "fields": [
    {
      "name": "cust_status_id",
      "type": "int"
    },
    {
      "name": "cust_status_description",
      "type": {
        "name": "cust_status_description",
        "type": "record",
        "fields": [
          {
            "name": "string",
            "type": "string"
          }
        ]
      }
    }
  ]
}
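Note that {"string": " Approved"} is also exactly how Avro's JSON encoding renders a ["null", "string"] union (see the spec excerpt quoted in an earlier answer), so if this output came from a nullable string field, an alternative schema for the same data would be the following sketch (assuming the field really is nullable):
{
  "namespace": "com.thp.report.model",
  "type": "record",
  "name": "PraStatusMaster",
  "fields": [
    {"name": "cust_status_id", "type": "int"},
    {"name": "cust_status_description", "type": ["null", "string"], "default": null}
  ]
}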
