How to re-use a logical type in Apache Avro

I can define a UUID type in Avro with a logical type like this:
{
"type":"record",
"name":"Metadata",
"namespace":"com.example",
"doc":"Event metadata",
"fields":[
{
"name":"messageId",
"type": {
"type": "string",
"logicalType": "uuid"
}
}
]
}
But I'd like to re-use UUID somehow, like this:
{
"type":"record",
"name":"Metadata",
"namespace":"com.example",
"doc":"Event metadata",
"fields":[
{
"name":"messageId",
"type": "com.example.UUID"
}
]
}
{
"namespace" : "com.example",
"name" : "UUID",
"type": {
"type": "string",
"logicalType": "uuid"
}
}
But this schema is not valid. How can I re-use a logical type in Avro?
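One workaround, since an .avsc file can only name records, enums, and fixed types (a bare string with a logicalType cannot stand alone as a named type), is to build the annotated string schema once programmatically and reuse it when assembling record schemas. A minimal Java sketch, assuming Avro 1.9+ where LogicalTypes.uuid() is available:

import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class SharedUuidType {
    public static void main(String[] args) {
        // Build the uuid-annotated string schema once...
        Schema uuidType = LogicalTypes.uuid().addToSchema(Schema.create(Schema.Type.STRING));

        // ...and reuse it in as many record schemas as needed.
        Schema metadata = SchemaBuilder.record("Metadata").namespace("com.example")
            .doc("Event metadata")
            .fields()
            .name("messageId").type(uuidType).noDefault()
            .endRecord();

        System.out.println(metadata.toString(true));
    }
}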

Related

How can I write an Avro schema for an array of arrays?

For example, I've tried the schema below, but it isn't working. I need to create a schema that has, in one field, an array of arrays, and I couldn't get it to work.
{
"name": "SelfHealingStarter",
"namespace": "SCP.Kafka.AvroSchemas",
"doc": "Message with all the necessary information to run Self Healing process.",
"type": "record",
"fields": [
{
"name": "FiveMinutesAgoMeasurement",
"type": "record",
"doc": "Field with all five minutes ago measurement.",
"fields": [
{
"name": "equipments",
"doc": "List with all equipments measurement.",
"type": {
"type": "array",
"items": {
"type": {
"type": "array",
"items": "string"
},
"default": []
}
},
"default": []
}
]
}
]
}
IDL
protocol Example {
record Foo {
array<array<string>> data = [];
}
}
AVSC generated by java -jar ~/workspace/avro-tools-1.8.2.jar idl2schemata example.idl:
{
"type" : "record",
"name" : "Foo",
"fields" : [ {
"name" : "data",
"type" : {
"type" : "array",
"items" : {
"type" : "array",
"items" : "string"
}
},
"default" : [ ]
} ]
}
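To double-check that the generated schema accepts nested arrays, here is a small sketch using the generic Java API (the file name Foo.avsc follows idl2schemata's one-file-per-named-type output, but treat it as an assumption):

import java.io.File;
import java.util.Arrays;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class NestedArrayCheck {
    public static void main(String[] args) throws Exception {
        // Parse the schema emitted by idl2schemata (file name assumed).
        Schema schema = new Schema.Parser().parse(new File("Foo.avsc"));

        // Populate the array-of-arrays field and validate the datum structurally.
        GenericRecord foo = new GenericData.Record(schema);
        foo.put("data", Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c")));
        System.out.println(GenericData.get().validate(schema, foo));
    }
}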

Avro Tools failure: "Expected start-union. Got VALUE_STRING"

I've defined the Avro schema below (car_sales_customer.avsc):
{
"type" : "record",
"name" : "topLevelRecord",
"fields" : [ {
"name": "cust_date",
"type": "string"
},
{
"name": "customer",
"type": {
"type": "array",
"items": {
"name": "customer",
"type": "record",
"fields": [
{
"name": "address",
"type": "string"
},
{
"name": "driverlience",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "name",
"type": "string"
},
{
"name": "phone",
"type": "string"
}
]
}
}
}]
}
and my input JSON payload (car_sales_customer.json) is as follows:
{"cust_date":"2017-04-28","customer":[{"address":"SanFrancisco,CA","driverlience":"K123499989","name":"JoyceRidgely","phone":"16504378889"}]}
I'm trying to use avro-tools to convert the above JSON to Avro using the schema:
java -jar ./avro-tools-1.9.2.jar fromjson --schema-file ./car_sales_customer.avsc ./car_sales_customer.json > ./car_sales_customer.avro
I get the error below when I execute the above command:
Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_STRING
at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:514)
at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:433)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:283)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:298)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:89)
at org.apache.avro.tool.Main.run(Main.java:66)
at org.apache.avro.tool.Main.main(Main.java:55)
Is there a way to overcome the error?
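For what it's worth, Avro's JSON encoding represents a non-null union value as a single-key object naming the chosen branch, so the nullable driverlience field must be wrapped as {"string": ...}. A payload the schema above should then accept (a sketch of the fix):
{"cust_date":"2017-04-28","customer":[{"address":"SanFrancisco,CA","driverlience":{"string":"K123499989"},"name":"JoyceRidgely","phone":"16504378889"}]}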

NiFi RecordReader & RecordWriter serialization error: IllegalTypeConversionException: Cannot convert value of class, because the type is not supported

I'm trying to convert JSON to JSON using JoltTransformRecord in Apache NiFi. When I try the transform at https://jolt-demo.appspot.com/, I get the correct result, so that part is fine.
But when I try to transform the JSON using JoltTransformRecord, it throws an exception: "Cannot convert value of class [Ljava.lang.Object; because the type is not supported". I don't understand why I'm getting this error. I checked my input and output schemas and didn't find anything wrong; they look correct.
My example input and output JSON, my Jolt specification, and my input and output schemas are given below. I'm using JsonTreeReader and JsonRecordSetWriter.
How can I solve this problem?
Example input JSON for JoltTransformRecord (in this example there is only one JSON object inside the array, but in reality there are many):
[ {
"uuid" : "MFMS1-MC5",
"componentId" : "path1",
"Samples" : {
"PathFeedrate" : [ {
"dataItemId" : "pf",
"timestamp" : "2019-03-01T21:48:27.940558Z",
"sequence" : "68104",
"value" : "425.5333",
"name" : "Fact",
"subType" : "ACTUAL"
}, {
"dataItemId" : "pf",
"timestamp" : "2019-03-01T21:48:30.244219Z",
"sequence" : "68117",
"value" : "0",
"name" : "Fact",
"subType" : "ACTUAL"
} ]
},
"Events" : {
"SequenceNumber" : [ {
"dataItemId" : "seq",
"timestamp" : "2019-03-01T21:48:27.940558Z",
"sequence" : "68105",
"value" : "0",
"name" : "sequenceNum"
} ],
"Unit" : [ {
"dataItemId" : "unit",
"timestamp" : "2019-03-01T21:48:27.940558Z",
"sequence" : "68106",
"value" : "13",
"name" : "unitNum"
} ]
}
}]
Sample output JSON I want:
{
"DataItems" : [ {
"uuid" : "MFMS1-MC5",
"componentId" : "path1",
"eventType" : "Samples",
"type" : "PathFeedrate",
"dataItemId" : "pf",
"timestamp" : "2019-03-01T21:48:27.940558Z",
"sequence" : "68104",
"value" : "425.5333",
"name" : "Fact",
"subType" : "ACTUAL"
}, {
"uuid" : "MFMS1-MC5",
"componentId" : "path1",
"eventType" : "Samples",
"type" : "PathFeedrate",
"dataItemId" : "pf",
"timestamp" : "2019-03-01T21:48:30.244219Z",
"sequence" : "68117",
"value" : "0",
"name" : "Fact",
"subType" : "ACTUAL"
}, {
"uuid" : "MFMS1-MC5",
"componentId" : "path1",
"eventType" : "Events",
"type" : "SequenceNumber",
"dataItemId" : "seq",
"timestamp" : "2019-03-01T21:48:27.940558Z",
"sequence" : "68105",
"value" : "0",
"name" : "sequenceNum"
}, {
"uuid" : "MFMS1-MC5",
"componentId" : "path1",
"eventType" : "Events",
"type" : "Unit",
"dataItemId" : "unit",
"timestamp" : "2019-03-01T21:48:27.940558Z",
"sequence" : "68106",
"value" : "13",
"name" : "unitNum"
} ]
}
My Jolt specification:
[
{
"operation": "shift",
"spec": {
"Samples": {
"*": {
"*": {
"#(3,uuid)": "Items.&2[#2].uuid",
"#(3,componentId)": "Items.&2[#2].componentId",
"$2": "Items.&2[#2].eventType",
"$1": "Items.&2[#2].type",
"*": "Items.&2[#2].&"
}
}
},
"Events": {
"*": {
"*": {
"#(3,uuid)": "Items.&2[#2].uuid",
"#(3,componentId)": "Items.&2[#2].componentId",
"$2": "Items.&2[#2].eventType",
"$1": "Items.&2[#2].type",
"*": "Items.&2[#2].&"
}
}
},
"Condition": {
"*": {
"*": {
"#(3,uuid)": "Items.&2[#2].uuid",
"#(3,componentId)": "Items.&2[#2].componentId",
"$2": "Items.&2[#2].eventType",
"$1": "Items.&2[#2].value",
"*": "Items.&2[#2].&"
}
}
}
}
},
{
"operation": "shift",
"spec": {
"Items": {
"*": {
"*": "DataItems[]"
}
}
}
}
]
This specification works correctly; I have verified it in the Jolt transform demo.
I'm using JsonTreeReader to read the JSON in JoltTransformRecord. This is my input schema:
{
"name": "Items",
"namespace": "Items",
"type": "record",
"fields": [
{
"name": "uuid",
"type": "string"
},
{
"name": "componentId",
"type": "string"
},
{
"name": "Samples",
"type": ["null", {
"type": "map",
"values": {
"type": "array",
"items": {
"name": "SamplesDataItem",
"type": "record",
"fields": [
{
"name": "dataItemId",
"type": "string"
},
{
"name": "timestamp",
"type": "string"
},
{
"name": "sequence",
"type": "string"
},
{
"name": "value",
"type": "string"
},
{
"name": "name",
"type": ["null", "string"]
},
{
"name": "subType",
"type": ["null", "string"]
},
{
"name": "sampleRate",
"type": ["null", "string"]
},
{
"name": "statistic",
"type": ["null", "string"]
},
{
"name": "duration",
"type": ["null", "string"]
},
{
"name": "sampleCount",
"type": ["null", "string"]
},
{
"name": "compositionId",
"type": ["null", "string"]
},
{
"name": "resetTriggered",
"type": ["null", "string"]
}
]
}
}
}]
},
{
"name": "Events",
"type": ["null", {
"type": "map",
"values": {
"type": "array",
"items": {
"name": "EventsDataItem",
"type": "record",
"fields": [
{
"name": "dataItemId",
"type": "string"
},
{
"name": "timestamp",
"type": "string"
},
{
"name": "sequence",
"type": "string"
},
{
"name": "value",
"type": "string"
},
{
"name": "name",
"type": ["null", "string"]
},
{
"name": "subType",
"type": ["null", "string"]
},
{
"name": "compositionId",
"type": ["null", "string"]
},
{
"name": "resetTriggered",
"type": ["null", "string"]
}
]
}
}
}]
},
{
"name": "Condition",
"type": ["null", {
"type": "map",
"values": {
"type": "array",
"items": {
"name": "ConditionDataItem",
"type": "record",
"fields": [
{
"name": "dataItemId",
"type": "string"
},
{
"name": "timestamp",
"type": "string"
},
{
"name": "type",
"type": "string"
},
{
"name": "sequence",
"type": "string"
},
{
"name": "name",
"type": ["null", "string"]
},
{
"name": "subType",
"type": ["null", "string"]
},
{
"name": "nativeCode",
"type": ["null", "string"]
},
{
"name": "nativeSeverity",
"type": ["null", "string"]
},
{
"name": "qualifier",
"type": ["null", "string"]
},
{
"name": "statistic",
"type": ["null", "string"]
},
{
"name": "compositionId",
"type": ["null", "string"]
}
]
}
}
}]
}
]
}
I'm using JsonRecordSetWriter to write the converted result from JoltTransformRecord. This is my output schema:
{
"name": "Items",
"type": "record",
"namespace": "Items",
"fields": [
{
"name": "DataItems",
"type": {
"type": "array",
"items": {
"name": "DataItems",
"type": "record",
"fields": [
{
"name": "uuid",
"type": "string"
},
{
"name": "componentId",
"type": "string"
},
{
"name": "eventType",
"type": "string"
},
{
"name": "type",
"type": "string"
},
{
"name": "dataItemId",
"type": "string"
},
{
"name": "timestamp",
"type": "string"
},
{
"name": "value",
"type": "string"
},
{
"name": "name",
"type": ["null", "string"],
"default": null
},
{
"name": "subType",
"type": ["null", "string"],
"default": null
}
]
}
}
}
]
}
This is indeed a bug in the record handling utilities; I have written NIFI-6105 to cover the fix. Good catch!
As a workaround, since you have JSON as input and output, you can use JoltTransformJson instead of JoltTransformRecord. Alternatively, if you know the keys in the map (e.g. PathFeedrate), you can change the schema to treat it as a record rather than a map; that might get you around the bug. A sketch of this follows.
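For illustration, the record-instead-of-map workaround could look like this for the Samples field, assuming PathFeedrate is the only key you need and reusing the SamplesDataItem record already named in the input schema (a sketch, not a verified NiFi schema):
{
  "name": "Samples",
  "type": ["null", {
    "name": "SamplesRecord",
    "type": "record",
    "fields": [
      {
        "name": "PathFeedrate",
        "type": ["null", {
          "type": "array",
          "items": "SamplesDataItem"
        }]
      }
    ]
  }]
}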

Can we refer to only one property of another schema?

I have a REST service that responds at:
http://server/path/AddressResource and
http://server/path/AddressResource/someAnotherPath
I have definitions like the one below:
"definitions": {
"address": {
"type": "object",
"properties": {
"street_address": { "type": "string" },
"city": { "type": "string" },
"state": { "type": "string" }
},
"required": ["street_address", "city", "state"]
}
}
That is the response for the first path; for the second path I just want to return the "city" property of the address.
Can I create a schema that refers to address and uses just one of its properties?
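One approach worth trying: since $ref values are JSON Pointers, they can point inside a definition, not only at its root. A sketch (whether your tooling accepts deep pointers like this is an assumption to verify):
{
  "type": "object",
  "properties": {
    "city": { "$ref": "#/definitions/address/properties/city" }
  },
  "required": ["city"]
}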

How can I describe complex json model in swagger

I'm trying to use Swagger to describe a web API I'm building.
The problem is that I can't understand how to describe a complex JSON object.
For example, how do I describe objects like this one:
{
name: "Jhon",
address: [
{
type: "home",
line1: "1st street"
},
{
type: "office",
line1: "2nd street"
}
]
}
Okay, so based on the comments above, you want the following schema:
{
"definitions": {
"user": {
"type": "object",
"required": [ "name" ],
"properties": {
"name": {
"type": "string"
},
"address": {
"type": "array",
"items": {
"$ref": "#/definitions/address"
}
}
}
},
"address": {
"type": "object",
"properties": {
"type": {
"type": "string",
"enum": [ "home", "office" ]
},
"line1": {
"type": "string"
}
}
}
}
}
I've made a few assumptions to make the sample a bit more complicated, to help in the future.
For the "user" object, I've declared that the "name" field is mandatory. If, for example, you also need the address to be mandatory, you can change the definition to "required": [ "name", "address" ].
We basically use a subset of json-schema to describe the models. Of course not everyone knows it, but it's fairly simple to learn and use.
For the address type you can see I also set the limit to two options - either home or office. You can add anything to that list, or remove the "enum" entirely to remove that constraint.
When the "type" of a property is "array", you need to accompany it with "items" which declares the internal type of the array. In this case, I referenced another definition, but that definition could have been inline as well. It's normally easier to maintain that way, especially if you need the "address" definition alone or within other models.
As requested, the inline version:
{
"definitions": {
"user": {
"type": "object",
"required": [
"name"
],
"properties": {
"name": {
"type": "string"
},
"address": {
"type": "array",
"items": {
"type": "object",
"properties": {
"type": {
"type": "string",
"enum": [
"home",
"office"
]
},
"line1": {
"type": "string"
}
}
}
}
}
}
}
}
