avro - schema for logicalType

I am trying to learn Avro and have a question about schemas.
Some documents say
{
    "name": "userid",
    "type": "string",
    "logicalType": "uuid"
},
And some say
{
    "name": "userid",
    "type": {
        "type": "string",
        "logicalType": "uuid"
    }
},
Which one is right? Or are they the same?
Thank you!

I ran variants of your schemas with the avro-tools "random" command (aliased as avro below; think something like alias avro='java -jar avro-tools.jar'). It tries to generate a random value for a given schema.
A standalone schema that uses the nested syntax to specify the logicalType is rejected:
avro random --schema '{ "name": "userid", "type" : { "type": "string", "logicalType" : "uuid" } }' -
[...] No type: {"name":"userid","type":{"type":"string","logicalType":"uuid"}}
However, it works when putting the logicalType next to type:
avro random --schema ' { "type" : "string", "logicalType" : "uuid" }' -
[...] Objavro.schema{"type":"string","logicalType":"uuid"}avro.codecdeflate [... binary data]
Now, when we use it in a record, we get a warning when putting logicalType next to type:
avro random --schema '{ "type": "record", "fields": [ { "type" : "string", "logicalType" : "uuid", "name": "f"} ] , "name": "rec"}' -
[...] WARN avro.Schema: Ignored the rec.f.logicalType property ("uuid"). It should probably be nested inside the "type" for the field.
Objavro.schema{"type":"record","name":"rec","fields":[{"name":"f","type":"string","logicalType":"uuid"}]}avro.codecdeflate [... binary data]
The nested syntax is accepted without a warning:
avro random --schema '{ "type": "record", "fields": [ { "type" : { "type": "string", "logicalType" : "uuid" } , "name": "f"} ] , "name": "rec"}' -
Objavro.schema{"type":"record","name":"rec","fields":[{"name":"f","type":{"type":"string","logicalType":"uuid"}}]}avro.codecdeflate [... binary data]
Further, if we look at logicalTypes inside arrays, the non-nested version works:
avro random --count 1 --schema ' { "type": "array", "items": { "type" : "string", "logicalType" : "uuid" , "name": "f"} , "name": "farr" } ' -
[... random bits]
While the nested version fails:
avro random --count 1 --schema ' { "type": "array", "items": {"type": { "type" : "string", "logicalType" : "uuid" , "name": "f"} } , "name": "farr" } ' -
[...] No type: {"type":{"type":"string","logicalType":"uuid","name":"f"}}
It appears that when a logicalType belongs to the type of a field in a record, you need to use the nested syntax.
Otherwise, you need to use the non-nested syntax.
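To double-check this from code, here is a minimal sketch (assuming Avro's Java library, org.apache.avro:avro, is on the classpath; the class name is made up) that parses the record form with the nested syntax and inspects the field's logical type:

    import org.apache.avro.LogicalType;
    import org.apache.avro.Schema;

    public class LogicalTypeCheck {
        public static void main(String[] args) {
            // Record field using the nested syntax: logicalType lives inside the "type" object.
            String nested = "{\"type\":\"record\",\"name\":\"rec\",\"fields\":["
                    + "{\"name\":\"f\",\"type\":{\"type\":\"string\",\"logicalType\":\"uuid\"}}]}";

            Schema record = new Schema.Parser().parse(nested);
            Schema fieldSchema = record.getField("f").schema();

            System.out.println(fieldSchema.getProp("logicalType")); // "uuid"
            LogicalType lt = fieldSchema.getLogicalType();          // non-null on Avro versions that register "uuid"
            System.out.println(lt == null ? "none" : lt.getName());
        }
    }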

Related

How can I write an avro schema for an array of arrays?

For example, I've tried the schema below, but it isn't working. I have to create a schema that has, in one field, an array of arrays, and I couldn't get it to work.
{
    "name": "SelfHealingStarter",
    "namespace": "SCP.Kafka.AvroSchemas",
    "doc": "Message with all the necessary information to run Self Healing process.",
    "type": "record",
    "fields": [
        {
            "name": "FiveMinutesAgoMeasurement",
            "type": "record",
            "doc": "Field with all five minutes ago measurement.",
            "fields": [
                {
                    "name": "equipments",
                    "doc": "List with all equipments measurement.",
                    "type": {
                        "type": "array",
                        "items": {
                            "type": {
                                "type": "array",
                                "items": "string"
                            },
                            "default": []
                        }
                    },
                    "default": []
                }
            ]
        }
    ]
}
IDL
protocol Example {
    record Foo {
        array<array<string>> data = [];
    }
}
AVSC from java -jar ~/workspace/avro-tools-1.8.2.jar idl2schemata example.idl
{
    "type": "record",
    "name": "Foo",
    "fields": [
        {
            "name": "data",
            "type": {
                "type": "array",
                "items": {
                    "type": "array",
                    "items": "string"
                }
            },
            "default": []
        }
    ]
}
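Once the schema is fixed, building a value for it from Java is straightforward with the generic API. A minimal sketch (file names, class name, and values are made up; plain Java lists map to Avro arrays):

    import java.io.File;
    import java.util.Arrays;
    import java.util.List;

    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;

    public class ArrayOfArrays {
        public static void main(String[] args) throws Exception {
            // Foo.avsc is the AVSC generated from the IDL above.
            Schema schema = new Schema.Parser().parse(new File("Foo.avsc"));

            GenericRecord foo = new GenericData.Record(schema);
            List<List<String>> data = Arrays.asList(
                    Arrays.asList("a", "b"),
                    Arrays.asList("c"));
            foo.put("data", data); // array<array<string>>

            try (DataFileWriter<GenericRecord> writer =
                    new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
                writer.create(schema, new File("foo.avro"));
                writer.append(foo);
            }
        }
    }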

Avro Tools Failure: Expected start-union. Got VALUE_STRING

I've defined the below Avro schema (car_sales_customer.avsc):
{
    "type": "record",
    "name": "topLevelRecord",
    "fields": [
        {
            "name": "cust_date",
            "type": "string"
        },
        {
            "name": "customer",
            "type": {
                "type": "array",
                "items": {
                    "name": "customer",
                    "type": "record",
                    "fields": [
                        {
                            "name": "address",
                            "type": "string"
                        },
                        {
                            "name": "driverlience",
                            "type": [
                                "null",
                                "string"
                            ],
                            "default": null
                        },
                        {
                            "name": "name",
                            "type": "string"
                        },
                        {
                            "name": "phone",
                            "type": "string"
                        }
                    ]
                }
            }
        }
    ]
}
and my input JSON payload (car_sales_customer.json) is as follows:
{"cust_date":"2017-04-28","customer":[{"address":"SanFrancisco,CA","driverlience":"K123499989","name":"JoyceRidgely","phone":"16504378889"}]}
I'm trying to use avro-tools to convert the above JSON to Avro using the schema:
java -jar ./avro-tools-1.9.2.jar fromjson --schema-file ./car_sales_customer.avsc ./car_sales_customer.json > ./car_sales_customer.avro
I get the below error when I execute the above statement:
Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_STRING
    at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:514)
    at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:433)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:283)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
    at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:298)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:89)
    at org.apache.avro.tool.Main.run(Main.java:66)
    at org.apache.avro.tool.Main.main(Main.java:55)
Is there a solution to overcome the error?
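A likely cause, assuming avro-tools' fromjson uses Avro's standard JSON encoding: a non-null value for a union such as ["null", "string"] must be written as a JSON object keyed by the branch type, not as a bare value. Under that assumption, the driverlience field would need to be wrapped, e.g.:

    {"cust_date":"2017-04-28","customer":[{"address":"SanFrancisco,CA","driverlience":{"string":"K123499989"},"name":"JoyceRidgely","phone":"16504378889"}]}

With the bare string, the JSON decoder reports exactly this "Expected start-union. Got VALUE_STRING" error.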

How to extract a nested nullable Avro Schema

The complete schema is the following:
{
    "type": "record",
    "name": "envelope",
    "fields": [
        {
            "name": "before",
            "type": [
                "null",
                {
                    "type": "record",
                    "name": "row",
                    "fields": [
                        {
                            "name": "username",
                            "type": "string"
                        },
                        {
                            "name": "timestamp",
                            "type": "long"
                        }
                    ]
                }
            ]
        },
        {
            "name": "after",
            "type": [
                "null",
                "row"
            ]
        }
    ]
}
I wanted to programmatically extract the following sub-schema:
{
    "type": "record",
    "name": "row",
    "fields": [
        {
            "name": "username",
            "type": "string"
        },
        {
            "name": "timestamp",
            "type": "long"
        }
    ]
}
As you see, the field "before" is nullable. I can extract its schema by doing:
schema.getField("before").schema()
But that schema is not a record: it is a UNION with "null" as its first branch, and I can't go inside it to fetch the schema of "row".
["null",{"type":"record","name":"row","fields":[{"name":"username","type":"string"},{"name":"tweet","type":"string"},{"name":"timestamp","type":"long"}]}]
I want to fetch the sub-schema because I want to create a GenericRecord out of it. Basically, I want to create two GenericRecords, "before" and "after", and add them to the main GenericRecord created from the full schema.
Any help will be highly appreciated.
Good news: if you have a union schema, you can go inside and fetch the list of possible branches:
Schema unionSchema = schema.getField("before").schema();
List<Schema> unionSchemaContains = unionSchema.getTypes();
At that point, you can look through the list to find the branch whose type is Schema.Type.RECORD.
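Continuing that sketch (assuming the variable schema holds the full envelope schema; the field values are made up):

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;

    // Pick the record branch out of the union ["null", row].
    Schema rowSchema = unionSchemaContains.stream()
            .filter(s -> s.getType() == Schema.Type.RECORD)
            .findFirst()
            .orElseThrow(() -> new IllegalStateException("no record branch in union"));

    // Build a GenericRecord from the sub-schema and attach it to the envelope.
    GenericRecord before = new GenericData.Record(rowSchema);
    before.put("username", "alice");
    before.put("timestamp", 1234567890L);

    GenericRecord envelope = new GenericData.Record(schema);
    envelope.put("before", before);
    envelope.put("after", null);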

Loading of Apache Avro plugin for Tranquility fails with Exception

For the Kafka Avro producer I run:
./kafka-avro-console-producer --broker-list localhost:9092 --topic pageviews --property value.schema='{"type":"record","name":"mypageviews","fields":[{"name":"time","type":"string"},{"name":"url","type":"string"},{"name":"user","type":"string"},{"name":"latencyMs","type":"int"}]}'
{"time": "2018-05-31T14:23:11Z", "url": "/foo/bar", "user": "alice", "latencyMs": 32}
For Tranquility Kafka I run:
bin/tranquility kafka -configFile ../druid-0.12.1/conf-quickstart/tranquility/avro.json
Here is the parser part of the relevant avro.json:
"parser" : {
"type" : "avro_stream",
"avroBytesDecoder" : {
"type" : "schema_inline",
"schema" : {
"namespace": "SKY",
"name": "mypageviews",
"type": "record",
"fields": [
{ "name": "time", "type": "string" },
{ "name": "url", "type": "string" },
{ "name": "user", "type": "string" },
{ "name": "latencyMs", "type": "int" }
]
}
},
"parseSpec" : {
"timestampSpec" : {
"column" : "time",
"format" : "auto"
},
"dimensionsSpec" : {
"dimensions" : ["url", "user"],
"dimensionExclusions" : [
"timestamp",
"value"
]
},
"format" : "avro"
}
}
Here is the error I get:
ERROR c.m.tranquility.kafka.KafkaConsumer - Exception: java.lang.AbstractMethodError: io.druid.data.input.AvroStreamInputRowParser.parse(Ljava/lang/Object;)Lio/druid/data/input/InputRow;

Apache NiFi not recognizing decimal type in ConvertJsonToAvro processor

I have a ConvertJsonToAvro processor in NiFi 1.4 and am having difficulty getting the proper decimal datatype within the Avro. The data is transformed into bytes using logical Avro data types in the ExecuteSQL processor, converted from Avro to JSON using the ConvertAvroToJSON processor, and then converted back with the ConvertJsonToAvro processor before being put into HDFS with PutParquet.
My schema is:
{
    "type": "record",
    "name": "schema",
    "fields": [
        {
            "name": "entryDate",
            "type": [
                "null",
                {
                    "type": "long",
                    "logicalType": "timestamp-micros"
                }
            ],
            "default": null
        },
        {
            "name": "points",
            "type": [
                "null",
                {
                    "type": "bytes",
                    "logicalType": "decimal",
                    "precision": 18,
                    "scale": 6
                }
            ],
            "default": null
        }
    ]
}
My JSON:
{
    "entryDate": 2018-01-26T13:48:22.087,
    "points": 6.000000
}
I get an error for the Avro saying:
Cannot convert field points: Cannot resolve union: {"bytes": "+|Ð"} not in ["null", {"type":"bytes","logicalType":"decimal","precision":18,"scale":6}]
Is there some type of workaround for this?
Currently you cannot mix the null type with logical types, due to a bug in Avro. Check this still-unresolved issue:
https://issues.apache.org/jira/browse/AVRO-1891
Also, the default value cannot be null. This should work for you:
{
    "type": "record",
    "name": "schema",
    "fields": [
        {
            "name": "entryDate",
            "type": {
                "type": "long",
                "logicalType": "timestamp-micros"
            },
            "default": 0
        },
        {
            "name": "points",
            "type": {
                "type": "bytes",
                "logicalType": "decimal",
                "precision": 18,
                "scale": 6
            },
            "default": ""
        }
    ]
}
For anyone interested, I was able to set the decimal with a default value of null (for cases when the field is null or missing), currently using NiFi 1.14.0:
{
    "name": "value",
    "type": [
        "null",
        {
            "type": "bytes",
            "logicalType": "decimal",
            "precision": 8,
            "scale": 4
        }
    ],
    "default": null
}
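For reference, here is a sketch of how such decimal bytes are produced on the Java side, using Avro's built-in Conversions.DecimalConversion (the precision and scale follow the schema above; the class name and value are made up):

    import java.math.BigDecimal;
    import java.nio.ByteBuffer;

    import org.apache.avro.Conversions;
    import org.apache.avro.LogicalTypes;
    import org.apache.avro.Schema;

    public class DecimalBytes {
        public static void main(String[] args) {
            // A bytes schema carrying the decimal(8,4) logical type, as in the schema above.
            Schema bytesSchema = LogicalTypes.decimal(8, 4)
                    .addToSchema(Schema.create(Schema.Type.BYTES));

            // The BigDecimal's scale must match the schema's scale (4 here).
            BigDecimal value = new BigDecimal("6.0000");
            ByteBuffer encoded = new Conversions.DecimalConversion()
                    .toBytes(value, bytesSchema, bytesSchema.getLogicalType());

            System.out.println("encoded " + encoded.remaining() + " bytes");
        }
    }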
