Error creating a Kafka message to producer - Expected start-union. Got VALUE_STRING

I am unable to produce a Kafka message; serialization fails with: Expected start-union. Got VALUE_STRING. This is my Avro schema (.avsc):
{
"namespace": "de.morris.audit",
"type": "record",
"name": "AuditDataChangemorris",
"fields": [
{"name": "employeeID", "type": "string"},
{"name": "employeeNumber", "type": ["null", "string"], "default": null},
{"name": "serialNumbers", "type": [ "null", {"type": "array", "items": "string"}]},
{"name": "correlationId", "type": "string"},
{"name": "timestamp", "type": "long", "logicalType": "timestamp-millis"},
{"name": "employmentscreening","type":{"type": "enum", "name": "employmentscreening", "symbols": ["NO","YES"]}},
{"name": "vouchercodes","type": ["null",
{
"type": "array",
"items": {
"name": "Vouchercodes",
"type": "record",
"fields": [
{"name": "voucherName","type": ["null","string"], "default": null},
{"name": "authocode","type": ["null","string"], "default": null}
]
}
}], "default": null}
]
}
When I tried to create sample data in JSON format based on the above .avsc for the Kafka consumer, I got an error while testing with the data below:
{
"employeeID": "qtete46524",
"employeeNumber": {
"string": "custnumber9813"
},
"serialNumbers": {
"type": "array",
"items": ["363536623","5846373733"]
},
"correlationId": "corr-656532443",
"timestamp": 1476538955719,
"employmentscreening": "NO",
"vouchercodes": [
{
"voucherName": "skygo",
"authocode": "A238472ASD"
}
]
}
This is the error I got when I ran the Dataflow job in GCP:
Error message from worker: java.lang.RuntimeException: java.io.IOException: Insert failed: [{"errors":[{"debugInfo":"","location":"serialnumbers","message":"Array specified for non-repeated field: serialnumbers.","reason":"invalid"}],"index":0}]
How do I create correct sample data based on the above schema?

Read the spec
The value of a union is encoded in JSON as follows:
if its type is null, then it is encoded as a JSON null;
otherwise it is encoded as a JSON object with one name/value pair whose name is the type’s name and whose value is the recursively encoded value
So, here's the data it expects.
{
"employeeID": "qtete46524",
"employeeNumber": {
"string": "custnumber9813"
},
"serialNumbers": {"array": [
"serialNumbers3521"
]},
"correlationId": "corr-656532443",
"timestamp": 1476538955719,
"employmentscreening": "NO",
"vouchercodes": {"array": [
{
"voucherName": {"string": "skygo"},
"authocode": {"string": "A238472ASD"}
}
]}
}
With this schema
{
"namespace": "de.morris.audit",
"type": "record",
"name": "AuditDataChangemorris",
"fields": [
{
"name": "employeeID",
"type": "string"
},
{
"name": "employeeNumber",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "serialNumbers",
"type": [
"null",
{
"type": "array",
"items": "string"
}
]
},
{
"name": "correlationId",
"type": "string"
},
{
"name": "timestamp",
"type": {
"type": "long",
"logicalType": "timestamp-millis"
}
},
{
"name": "employmentscreening",
"type": {
"type": "enum",
"name": "employmentscreening",
"symbols": [
"NO",
"YES"
]
}
},
{
"name": "vouchercodes",
"type": [
"null",
{
"type": "array",
"items": {
"name": "Vouchercodes",
"type": "record",
"fields": [
{
"name": "voucherName",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "authocode",
"type": [
"null",
"string"
],
"default": null
}
]
}
}
],
"default": null
}
]
}
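Before wiring this into Kafka, you can check the encoding locally. Here is a minimal sketch of my own using fastavro (not part of the original answer; it assumes the schema is saved as /tmp/data.avsc, the same path the console-producer example below uses). It serializes a record written with plain Python values and prints the union-wrapped JSON.
import io
import json
from fastavro import json_writer, parse_schema

# Assumes the schema shown above has been saved to /tmp/data.avsc.
with open("/tmp/data.avsc") as f:
    schema = parse_schema(json.load(f))

# Plain Python values; no union wrapping is needed on this side.
record = {
    "employeeID": "qtete46524",
    "employeeNumber": "custnumber9813",
    "serialNumbers": ["serialNumbers3521"],
    "correlationId": "corr-656532443",
    "timestamp": 1476538955719,
    "employmentscreening": "NO",
    "vouchercodes": [{"voucherName": "skygo", "authocode": "A238472ASD"}],
}

buf = io.StringIO()
json_writer(buf, schema, [record])
# Unions come out wrapped, e.g. "employeeNumber": {"string": "custnumber9813"},
# which is the shape kafka-avro-console-producer expects on stdin.
print(buf.getvalue())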
Here's an example of producing to and consuming from Kafka:
$ jq -rc < /tmp/data.json | kafka-avro-console-producer --topic foobar --property value.schema="$(jq -rc < /tmp/data.avsc)" --bootstrap-server localhost:9092 --sync
$ kafka-avro-console-consumer --topic foobar --from-beginning --bootstrap-server localhost:9092 | jq
{
"employeeID": "qtete46524",
"employeeNumber": {
"string": "custnumber9813"
},
"serialNumbers": {
"array": [
"serialNumbers3521"
]
},
"correlationId": "corr-656532443",
"timestamp": 1476538955719,
"employmentscreening": "NO",
"vouchercodes": {
"array": [
{
"voucherName": {
"string": "skygo"
},
"authocode": {
"string": "A238472ASD"
}
}
]
}
}
^CProcessed a total of 1 messages

Related

Avro Schema: don't know which "default" value to use

One of my fields in my Avro Schema is:
{
"name": "recipients",
"type": {
"type": "array",
"items": {
"name": "recordings",
"type": "record",
"fields": [
{
"name": "returnAddress",
"type": ["null","string"],
"default": null
},
{
"name": "something",
"type": ["null","string"],
"default": null
},
{
"name": "phone",
"type": ["null","string"],
"default": null
},
{
"name": "pfp",
"type": {
"name": "pfp",
"type": "record",
"fields": []
},
"default": {"a": 1}
},
{
"name": "example1",
"type": "int",
"default": -1
},
{
"name": "example2",
"type": "int",
"default": -1
}
]
},
"default": [1]
}
}
However, I get an error message:
Field recipients type:ARRAY pos:4 not set and has no default value
Do I need a default value for the recipients field? And if so, what would the default value be for the type listed under recipients? I have tried "default": null, "default": {"a": 1}, and "default": [1], and all returned errors.
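Since the question is left open here, a hedged sketch of my own: per the Avro spec, the default for an array-typed field is a JSON array and the default for a record-typed field is a JSON object, so the empty forms parse. Checked below with fastavro; "Message" is a hypothetical wrapper record and the field list is trimmed from the snippet above.
from fastavro import parse_schema

# Trimmed illustration of the field above: [] as the array default and
# {} as the default for the empty "pfp" record.
schema = parse_schema({
    "type": "record",
    "name": "Message",
    "fields": [{
        "name": "recipients",
        "type": {"type": "array", "items": {
            "name": "recordings",
            "type": "record",
            "fields": [
                {"name": "returnAddress", "type": ["null", "string"], "default": None},
                {"name": "pfp",
                 "type": {"name": "pfp", "type": "record", "fields": []},
                 "default": {}},
                {"name": "example1", "type": "int", "default": -1},
            ],
        }},
        "default": [],
    }],
})
print(schema["name"])  # Message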

Avro Nested array exception

I am trying to generate an Avro schema for a nested array.
The topmost array, Stores, is the issue; the inner array, Business, is correct.
{"name": "Stores",
"type": {
"type": "array",
"items": {
"name": "Hours",
"type": "record",
"fields": [
{
"name": "Week",
"type": "string"
},
{"name": "Business",
"type":"array",
"items": {"name":"Business_record","type":"record","fields":[
{"name": "Day", "type":"string"},
{"name": "StartTime", "type": "string"},
{"name": "EndTime", "type": "string"}
]}
}
]
}
}
And the exception I'm getting is:
[ {
"level" : "fatal",
"message" : "illegal Avro schema",
"exceptionClass" : "org.apache.avro.SchemaParseException",
"exceptionMessage" : "No type: {\"name\":\"Stores\",\"type\":{\"type\":\"array\",\"items\":{\"name\":\"Hours\",\"type\":\"record\",\"fields\":[{\"name\":\"Week\",\"type\":\"string\"},{\"name\":\"Business\",\"type\":\"array\",\"items\":{\"name\":\"Business_record\",\"type\":\"record\",\"fields\":[{\"name\":\"Day\",\"type\":\"string\"},{\"name\":\"StartTime\",\"type\":\"string\"},{\"name\":\"EndTime\",\"type\":\"string\"}]}}]}}}",
"info" : "other messages follow (if any)"
} ]
I think it has something to do with [] or {} for the outer array field, but I'm not able to figure it out.
Any help is appreciated.
I found the mistake I was making:
adding a nested "type" object around the inner array made it work.
{
"name": "Stores",
"type": "array",
"items": {
"name": "Hours",
"type": "record",
"fields": [
{
"name": "Week",
"type": "string"
},
{
"name": "Business",
"type": {
"type": "array",
"items": {
"name": "Business_record",
"type": "record",
"fields": [
{
"name": "Day",
"type": "string"
},
{
"name": "StartTime",
"type": "string"
},
{
"name": "EndTime",
"type": "string"
}
]
}
}
}
]
}
}
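As a quick check of that rule, here is a small sketch of my own with fastavro (not anything from the original post): the array schema for the Business field sits inside its own "type" object, and the schema parses.
from fastavro import parse_schema

# The inner array is wrapped in its own "type" object, mirroring the
# corrected snippet above; putting "items" directly on the field is what
# the parser rejects with SchemaParseException.
schema = parse_schema({
    "type": "record",
    "name": "Hours",
    "fields": [
        {"name": "Week", "type": "string"},
        {"name": "Business", "type": {
            "type": "array",
            "items": {
                "name": "Business_record",
                "type": "record",
                "fields": [
                    {"name": "Day", "type": "string"},
                    {"name": "StartTime", "type": "string"},
                    {"name": "EndTime", "type": "string"},
                ],
            },
        }},
    ],
})
print(schema["name"])  # Hours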

No Type error when trying to transform JSON to Avro

I'm trying to convert a JSON payload to Avro to publish to a Kafka topic. However, when I do the DataWeave transformation I get a "No Type" error. I'm not sure what's causing it. I originally thought this might be because the transformation didn't know the MIME type of the inbound payload, so I made sure it's set to application/json, but that didn't make any difference.
Avro Schema
{
"compatibility" : "forward",
"name": "ContentManagerCoupons",
"type": "record",
"namespace": "com.rentpath",
"fields": [
{
"name": "clientID",
"type": "string"
},
{
"name": "outputHistoryId",
"type": "string"
},
{
"name": "categoryCoupons",
"type": {
"type": "array",
"items": {
"name": "categoryCoupons_record",
"type": "record",
"fields": [
{
"name": "applyBy",
"type": [
"string",
"int",
"null"
]
},
{
"name": "applyPeriod",
"type": [
"string",
"null"
]
},
{
"name": "cashValue",
"type": [
"int",
"null"
]
},
{
"name": "couponCategory",
"type": "string"
},
{
"name": "cashOffDesc",
"type": [
"string",
"null"
]
},
{
"name": "endDate",
"type": [
"string",
"null"
]
},
{
"name": "feeType",
"type": [
"string",
"null"
]
},
{
"name": "freeWeeks",
"type": [
"string",
"null"
]
},
{
"name": "generatedText",
"type": "string"
},
{
"name": "leaseby",
"type": [
"string",
"null"
]
},
{
"name": "leaseTerm",
"type": [
"int",
"null"
]
},
{
"name": "offerText",
"type": [
"string",
"null"
]
},
{
"name": "startDate",
"type": "string"
},
{
"name": "unitType",
"type": [
"string",
"null"
]
}
]
}
}
}
]
}
JSON Message
{
"outputHistoryId": "55324456",
"clientID": "112345",
"categoryCoupons": [
{
"unitType": null,
"startDate": "07/21/2020",
"offerText": "This would be the special offer message.",
"leaseTerm": null,
"leaseby": null,
"generatedText": "This would be the special offer message..",
"freeWeeks": null,
"feeType": null,
"endDate": "10/01/2020",
"couponCategory": "Special Offer",
"cashValue": null,
"cashOffDesc": null,
"applyPeriod": null,
"applyBy": null
}
]
}
DataWeave
%dw 2.2
output application/avro schemaUrl="http://schema-registry.domain.com:8081/subjects/Coupon-value/versions/1"
---
payload
Error Message
"org.apache.avro.SchemaParseException - No type: {"subject":"ContentManager.Coupon-value","version":1,"id":342,"schema":"{"type":"record","name":"ContentManagerCoupons","namespace":"com.rentpath","fields":[{"name":"clientID","type":"string"},{"name":"outputHistoryId","type":"string"},{"name":"categoryCoupons","type":{"type":"array","items":{"type":"record","name":"categoryCoupons_record","fields":[{"name":"applyBy","type":["string","int","null"]},{"name":"applyPeriod","type":["string","null"]},{"name":"cashValue","type":["int","null"]},{"name":"couponCategory","type":"string"},{"name":"cashOffDesc","type":["string","null"]},{"name":"endDate","type":["string","null"]},{"name":"feeType","type":["string","null"]},{"name":"freeWeeks","type":["string","null"]},{"name":"generatedText","type":"string"},{"name":"leaseby","type":["string","null"]},{"name":"leaseTerm","type":["int","null"]},{"name":"offerText","type":["string","null"]},{"name":"startDate","type":"string"},{"name":"unitType","type":["string","null"]}]}}}],"compatibility":"forward"}"}
org.apache.avro.SchemaParseException: No type: {"subject":"ContentManager.Coupon-value","version":1,"id":342,"schema":"{"type":"record","name":"ContentManagerCoupons","namespace":"com.rentpath","fields":[{"name":"clientID","type":"string"},{"name":"outputHistoryId","type":"string"},{"name":"categoryCoupons","type":{"type":"array","items":{"type":"record","name":"categoryCoupons_record","fields":[{"name":"applyBy","type":["string","int","null"]},{"name":"applyPeriod","type":["string","null"]},{"name":"cashValue","type":["int","null"]},{"name":"couponCategory","type":"string"},{"name":"cashOffDesc","type":["string","null"]},{"name":"endDate","type":["string","null"]},{"name":"feeType","type":["string","null"]},{"name":"freeWeeks","type":["string","null"]},{"name":"generatedText","type":"string"},{"name":"leaseby","type":["string","null"]},{"name":"leaseTerm","type":["int","null"]},{"name":"offerText","type":["string","null"]},{"name":"startDate","type":"string"},{"name":"unitType","type":["string","null"]}]}}}],"compatibility":"forward"}"}
at org.apache.avro.Schema.getRequiredText(Schema.java:1753)
at org.apache.avro.Schema.parse(Schema.java:1604)
at org.apache.avro.Schema$Parser.parse(Schema.java:1394)
at org.apache.avro.Schema$Parser.parse(Schema.java:1365)
at org.mule.weave.v2.module.avro.AvroWriter.doWriteValue(AvroWriter.scala:195)
at org.mule.weave.v2.module.writer.Writer.writeValue(Writer.scala:41)
at org.mule.weave.v2.module.writer.Writer.writeValue$(Writer.scala:39)
at org.mule.weave.v2.module.avro.AvroWriter.writeValue(AvroWriter.scala:44)
at org.mule.weave.v2.module.writer.DeferredWriter.doWriteValue(DeferredWriter.scala:73)
at org.mule.weave.v2.module.writer.Writer.writeValue(Writer.scala:41)
at org.mule.weave.v2.module.writer.Writer.writeValue$(Writer.scala:39)
at org.mule.weave.v2.module.writer.DeferredWriter.writeValue(DeferredWriter.scala:16)
at org.mule.weave.v2.module.writer.WriterHelper$.writeValue(Writer.scala:120)
at org.mule.weave.v2.module.writer.WriterHelper$.writeAndGetResult(Writer.scala:98)
at org.mule.weave.v2.interpreted.InterpretedMappingExecutableWeave.write(InterpreterMappingCompilerPhase.scala:236)
at org.mule.weave.v2.el.WeaveExpressionLanguageSession.evaluateWithTimeout(WeaveExpressionLanguageSession.scala:243)
at org.mule.weave.v2.el.WeaveExpressionLanguageSession.evaluate(WeaveExpressionLanguageSession.scala:108)
at org.mule.runtime.core.internal.el.dataweave.DataWeaveExpressionLanguageAdaptor$1.evaluate(DataWeaveExpressionLanguageAdaptor.java:308)
at org.mule.runtime.core.internal.el.DefaultExpressionManagerSession.evaluate(DefaultExpressionManagerSession.java:105)
at com.mulesoft.mule.runtime.core.internal.processor.SetPayloadTransformationTarget.process(SetPayloadTransformationTarget.java:32)
at com.mulesoft.mule.runtime.core.internal.processor.TransformMessageProcessor.lambda$0(TransformMessageProcessor.java:92)
at java.util.Optional.ifPresent(Optional.java:159)
at com.mulesoft.mule.runtime.core.internal.processor.TransformMessageProcessor.process(TransformMessageProcessor.java:92)
at org.mule.runtime.core.api.util.func.CheckedFunction.apply(CheckedFunction.java:25)
at org.mule.runtime.core.api.rx.Exceptions.lambda$checkedFunction$2(Exceptions.java:84)
at org.mule.runtime.core.internal.util.rx.Operators.lambda$nullSafeMap$0(Operators.java:47)
at reactor.core.publisher.FluxHandleFuseable$HandleFuseableSubscriber.onNext(FluxHandleFuseable.java:165)
at org.mule.runtime.core.privileged.processor.chain.AbstractMessageProcessorChain$2.onNext(AbstractMessageProcessorChain.java:425)
at org.mule.runtime.core.privileged.processor.chain.AbstractMessageProcessorChain$2.onNext(AbstractMessageProcessorChain.java:420)
at reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onNext(FluxHide.java:127)
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.onNext(FluxPeekFuseable.java:204)
at reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber.onNext(FluxOnAssembly.java:345)
at reactor.core.publisher.FluxSubscribeOnValue$ScheduledScalar.run(FluxSubscribeOnValue.java:178)
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:50)
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:27)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.mule.service.scheduler.internal.AbstractRunnableFutureDecorator.doRun(AbstractRunnableFutureDecorator.java:111)
at org.mule.service.scheduler.internal.RunnableFutureDecorator.run(RunnableFutureDecorator.java:54)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748), while writing Avro at payload.
It seems to work for me. Maybe there is a problem trying to access the schema. Since I don't have access to that URL, I replaced it with a local file:
output application/avro schemaUrl="classpath://schema.json"
Apparently the answer is pretty simple. I just needed to add /schema to the end of my URL. This removes the extraneous items, like version and id, that come down without it.
New Dataweave
%dw 2.2
output application/avro schemaUrl="http://schema-registry.domain.com:8081/subjects/Coupon-value/versions/1/schema"
---
payload
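For what it's worth, the difference between the two URLs is easy to see by querying the registry directly. A small sketch of my own (the host and subject are taken from the question, so substitute your own):
import json
from urllib.request import urlopen

# Host and subject taken from the question; adjust to your registry.
base = "http://schema-registry.domain.com:8081/subjects/Coupon-value/versions/1"

# Without /schema: a wrapper object with subject, version, id, and the
# schema itself as an escaped string - the "No type" input.
wrapper = json.load(urlopen(base))
print(sorted(wrapper.keys()))          # ['id', 'schema', 'subject', 'version']

# With /schema: the bare Avro schema, which is what the Avro parser expects.
schema = json.load(urlopen(base + "/schema"))
print(schema["type"], schema["name"])  # record ContentManagerCoupons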

Creating an avro schema for an array with multiple record types?

I am creating an Avro schema for a JSON payload that appears to have an array containing multiple types of objects. I'm not sure exactly how to represent this in the schema. The key in question is content:
{
"id": "channel-id",
"name": "My Channel with a New Title",
"description": "Herpy me derpy merpus herpsum ner berp berps derp ter tee",
"privacyLevel": "<private|org>",
"planId": "some-plan-id",
"owner": "a-user-handle",
"curators": [
"user-handle-1",
"user-handle-2"
],
"members": 5,
"content": [
{
"id": "docker",
"slug": "docker",
"index": 1,
"type": "path"
},
{
"id": "such-linkage",
"slug": "such-linkage",
"index": 2,
"type": "external-link",
"details": {
"url": "http://some-dank-link.com",
"title": "My Dank Link",
"contentType": "External Link",
"level": "Beginner",
"duration": "PT34293H33M9S"
}
},
{
"id": "21f1e812-b10a-40df-8b52-3a1d05fc215c",
"slug": "windows-azure-storage-in-depth",
"index": 3,
"type": "course"
},
{
"id": "7c346c05-6416-42dd-80b2-d5e758de7926",
"slug": "7c346c05-6416-42dd-80b2-d5e758de7926",
"index": 4,
"type": "project"
}
],
"imageUrls": ["https://url/to/an/image", "https://url/to/another/image"],
"analyticsEnabled": true,
"orgDiscoverable": false,
"createdDate": "2015-12-31T01:23:45+00:00",
"archiveDate": "2015-12-31T01:23:45+00:00",
"messagePublishedAt": "2015-12-31T01:23:45+00:00"
}
If you are asking whether it is possible to create an array with different kinds of records, it is. Avro supports this through unions. It would look like this:
{
"name": "myRecord",
"type":"record",
"fields":[
{
"name":"myArrayWithMultiplesTypes",
"type":{
"type": "array",
"items":[
{
"name":"typeOne",
"type":"record",
"fields":[
{"name":"name", "type":"string"}
]
},
{
"name":"typeTwo",
"type":"record",
"fields":[
{"name":"id", "type":"int"}
]
}
]
}
}
]
}
If you already have the records defined previously, then it could look like this:
{
"name": "mulitplePossibleTypes",
"type": [
"null",
{
"type": "array",
"items": [
"com.xyz.kola.cloud.events.itemmanager.Part",
"com.xyz.kola.cloud.events.itemmanager.Document",
"com.xyz.kola.cloud.events.itemmanager.DigitalModel",
"com.xyz.kola.cloud.events.itemmanager.Interface"
]
}
]
},
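To sanity-check that an array really can mix those record shapes, here is a small sketch of my own with fastavro, validating one array that holds an element of each type against the first schema above:
from fastavro import parse_schema
from fastavro.validation import validate

schema = parse_schema({
    "name": "myRecord",
    "type": "record",
    "fields": [{
        "name": "myArrayWithMultiplesTypes",
        "type": {"type": "array", "items": [
            {"name": "typeOne", "type": "record",
             "fields": [{"name": "name", "type": "string"}]},
            {"name": "typeTwo", "type": "record",
             "fields": [{"name": "id", "type": "int"}]},
        ]},
    }],
})

# One element matches typeOne, the other typeTwo; validate() raises if an
# element fits no branch of the union.
record = {"myArrayWithMultiplesTypes": [{"name": "docker"}, {"id": 42}]}
print(validate(record, schema))  # True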

Avro schema issue when record missing a field

I am using the NiFi (v1.2) processor ConvertJSONToAvro. I am not able to parse a record that only contains 1 of 2 elements in a "record" type. This element is also allowed to be missing entirely from the data. Is my Avro schema incorrect?
Schema snippet:
"name": "personname",
"type": [
"null",
{
"type": "record",
"name": "firstandorlast",
"fields": [
{
"name": "first",
"type": [
"null",
"string"
]
},
{
"name": "last",
"type": [
"null",
"string"
]
}
]
}
]
If "personname" contains both "first" and "last" it works, but if it only contains one of the elements, it fails with the error: Cannot convert field personname: cannot resolve union:
{ "last":"Smith" }
not in
"type": [ "null",
{
"type": "record",
"name": "firstandorlast",
"fields": [
{
"name": "first",
"type": [
"null",
"string"
]
},
{
"name": "last",
"type": [
"null",
"string"
]
}
]
}
]
You are missing the default value.
https://avro.apache.org/docs/1.8.1/spec.html#schema_record
Your schema should look like:
"name": "personname",
"type": [
"null",
{
"type": "record",
"name": "firstandorlast",
"fields": [
{
"name": "first",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "last",
"type": [
"null",
"string"
],
"default": null
}
]
}
]
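Outside NiFi, the effect of those defaults is easy to reproduce. A minimal sketch of my own with fastavro (ConvertJSONToAvro itself uses a different library), where a record carrying only "last" validates once both fields default to null:
from fastavro import parse_schema
from fastavro.validation import validate

personname = parse_schema({
    "type": "record",
    "name": "firstandorlast",
    "fields": [
        # In the schema JSON the default is the literal null (None in Python),
        # matching the first branch of the ["null", "string"] union.
        {"name": "first", "type": ["null", "string"], "default": None},
        {"name": "last", "type": ["null", "string"], "default": None},
    ],
})

# Only "last" is present; the default fills in "first", so this validates.
print(validate({"last": "Smith"}, personname))  # True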
