Apache NiFi not recognizing decimal type in ConvertJSONToAvro processor - avro

I have a ConvertJSONToAvro processor in NiFi 1.4 and am having difficulty getting the proper decimal datatype in the Avro output. The data is read by an ExecuteSQL processor, which writes decimals as bytes using Avro logical types. It is then converted from Avro to JSON with a ConvertAvroToJSON processor, and back to Avro with ConvertJSONToAvro, so that it can be written to HDFS with PutParquet.
My schema is :
{
"type" : "record",
"name" : "schema",
"fields" : [ {
"name" : "entryDate",
"type" : [ "null", {
"type" : "long",
"logicalType" : "timestamp-micros"
} ],
"default" : null
}, {
"name" : "points",
"type" : [ "null", {
"type" : "bytes",
"logicalType" : "decimal",
"precision" : 18,
"scale" : 6
} ],
"default" : null
}]
}
My JSON:
{
"entryDate" : 2018-01-26T13:48:22.087,
"points" : 6.000000
}
The Avro conversion fails with this error:
Cannot convert field points: Cannot resolve union : {"bytes": "+|Ð" not in ["null", {"type":"bytes","logicalType":"decimal","precision":18,"scale":6}]"
Is there a workaround for this?

Currently you cannot mix the null type and logical types in a union because of a bug in Avro. See this still-unresolved issue:
https://issues.apache.org/jira/browse/AVRO-1891
Also, the default value cannot be null. This should work for you:
{
"type" : "record",
"name" : "schema",
"fields" : [ {
"name" : "entryDate",
"type" : {
"type" : "long",
"logicalType" : "timestamp-micros"
},
"default" : 0
}, {
"name" : "points",
"type" : {
"type" : "bytes",
"logicalType" : "decimal",
"precision" : 18,
"scale" : 6
},
"default" : ""
}]
}

For anyone interested: on NiFi 1.14.0 I was able to declare the decimal in a union with null and use null as the default value (for cases where the field is null or missing):
{
"name": "value",
"type": [
"null",
{
"type": "bytes",
"logicalType": "decimal",
"precision": 8,
"scale": 4
}
],
"default": null
}
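As background for the schemas above: the Avro specification stores a decimal as the two's-complement, big-endian bytes of its unscaled integer value, so 6.000000 at scale 6 is stored as the integer 6000000. Below is a minimal Python sketch of that encoding using only the standard library. The helper names encode_decimal and decode_decimal are made up for illustration and are not part of NiFi or of any Avro library.

```python
from decimal import Decimal


def encode_decimal(value: Decimal, scale: int) -> bytes:
    """Sketch: lay out a Decimal as Avro's decimal logical type does (hypothetical helper)."""
    # Round to the schema's scale, then shift the point to get the unscaled integer.
    quantum = Decimal(1).scaleb(-scale)
    unscaled = int(value.quantize(quantum).scaleb(scale))
    # Reserve enough bytes for the magnitude plus a sign bit.
    length = (unscaled.bit_length() + 8) // 8
    return unscaled.to_bytes(length, byteorder="big", signed=True)


def decode_decimal(raw: bytes, scale: int) -> Decimal:
    """Sketch: reverse of encode_decimal (hypothetical helper)."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


encoded = encode_decimal(Decimal("6.000000"), 6)
print(encoded.hex())               # 5b8d80
print(decode_decimal(encoded, 6))  # 6.000000
```

This is why a plain JSON number such as 6.000000 cannot be matched directly against a bytes branch: the converter has to know about the logical type to produce these bytes.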

Related

Apache Avro Union type

I'm using the Avro 1.11.0 library to write data into Avro files using Python 3.7. I have a question about Avro's union type. Below are two schemas.
{
"name" : "name",
"type" : ["null", "string"],
"columnName" : "name"
}
{
"name" : "name",
"type" : ["string", "null"],
"columnName" : "name"
}
The first schema declares the union as "type" : ["null", "string"] and the second declares it as "type" : ["string", "null"].
Is there any difference between these two schemas?
The only difference is that the specification states that if you want to use a default value, it should correspond to the first type in the union.
For example, these would be valid:
{
"name" : "name",
"type" : ["null", "string"],
"columnName" : "name",
"default": null
}
{
"name" : "name",
"type" : ["string", "null"],
"columnName" : "name",
"default": "foo"
}
But these would not:
{
"name" : "name",
"type" : ["null", "string"],
"columnName" : "name",
"default": "foo"
}
{
"name" : "name",
"type" : ["string", "null"],
"columnName" : "name",
"default": null
}
Since a union that includes null tends to mean something like an optional field, most people would put null as the first option in the union so that they can set the default value to null.
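The first-branch rule can be sketched as a small check in Python. This is an illustration only, not how the Avro library validates schemas: the function name default_matches_first_branch is hypothetical, and it handles only primitive first branches.

```python
import json


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# JSON value check for each Avro primitive type name (sketch only).
_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": _is_number,
    "double": _is_number,
    "string": lambda v: isinstance(v, str),
    "bytes": lambda v: isinstance(v, str),
}


def default_matches_first_branch(field: dict) -> bool:
    """Hypothetical helper: does the field's default fit the first union branch?"""
    if "default" not in field:
        return True  # nothing to check
    field_type = field["type"]
    first = field_type[0] if isinstance(field_type, list) else field_type
    if not isinstance(first, str) or first not in _CHECKS:
        raise ValueError("this sketch only handles primitive first branches")
    return _CHECKS[first](field["default"])


ok = json.loads('{"name": "name", "type": ["null", "string"], "default": null}')
bad = json.loads('{"name": "name", "type": ["null", "string"], "default": "foo"}')
print(default_matches_first_branch(ok))   # True
print(default_matches_first_branch(bad))  # False
```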

avro - schema for logicalType

I am trying to learn Avro and have a question about schemas.
Some documents say
{
"name": "userid",
"type" : "string",
"logicalType" : "uuid"
},
And some say
{
"name": "userid",
"type" : {
"type" : "string",
"logicalType" : "uuid"
}
},
Which one is right? Or are they same?
Thank you!
I ran variants of your schemas through the avro-tools "random" command (aliased as avro below), which tries to generate a random value for a schema.
A standalone schema that uses the nested syntax to specify logicalType is rejected:
avro random --schema '{ "name": "userid", "type" : { "type": "string", "logicalType" : "uuid" } }' -
[...] No type: {"name":"userid","type":{"type":"string","logicalType":"uuid"}}
However, it works when putting the logicalType next to type:
avro random --schema ' { "type" : "string", "logicalType" : "uuid" }' -
[...] [binary Avro output whose header embeds {"type":"string","logicalType":"uuid"}]
Now, when we use it in a record, we get a warning when putting logicalType next to type:
avro random --schema '{ "type": "record", "fields": [ { "type" : "string", "logicalType" : "uuid", "name": "f"} ] , "name": "rec"}' -
[...] WARN avro.Schema: Ignored the rec.f.logicalType property ("uuid"). It should probably be nested inside the "type" for the field.
[binary Avro output whose header embeds {"type":"record","name":"rec","fields":[{"name":"f","type":"string","logicalType":"uuid"}]}]
The nested syntax is accepted without a warning:
avro random --schema '{ "type": "record", "fields": [ { "type" : { "type": "string", "logicalType" : "uuid" } , "name": "f"} ] , "name": "rec"}' -
[binary Avro output whose header embeds {"type":"record","name":"rec","fields":[{"name":"f","type":{"type":"string","logicalType":"uuid"}}]}]
Further, if we look at logical types inside arrays, the non-nested syntax works:
avro random --count 1 --schema ' { "type": "array", "items": { "type" : "string", "logicalType" : "uuid" , "name": "f"} , "name": "farr" } ' -
[... random bits]
While the nested version fails:
avro random --count 1 --schema ' { "type": "array", "items": {"type": { "type" : "string", "logicalType" : "uuid" , "name": "f"} } , "name": "farr" } ' -
[...] No type: {"type":{"type":"string","logicalType":"uuid","name":"f"}}
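It appears that when a logical type is the type of a record field, you need the nested syntax; everywhere else (a standalone schema, array items) you need the non-nested syntax.
Both cases follow one rule: logicalType is an attribute of a schema, so it sits beside the "type" of the schema it annotates. A record field is not itself a schema (its "type" attribute holds one), whereas array "items" is a schema directly.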
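To check a schema for the misplacement that triggers the warning above, a walk over the parsed JSON is enough. The following Python sketch uses only the standard library; the function name misplaced_logical_types is hypothetical and this is not how avro-tools itself performs the check.

```python
import json


def misplaced_logical_types(schema, path=""):
    """Hypothetical helper: list record fields that carry logicalType at field level."""
    found = []
    if isinstance(schema, list):  # a union: inspect every branch
        for branch in schema:
            found += misplaced_logical_types(branch, path)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            name = schema.get("name", "?")
            for field in schema.get("fields", []):
                field_path = f"{path}{name}.{field['name']}"
                # On a field, logicalType belongs inside "type", not beside it.
                if "logicalType" in field:
                    found.append(field_path)
                found += misplaced_logical_types(field["type"], field_path + ".")
        elif kind == "array":
            found += misplaced_logical_types(schema["items"], path)
        elif kind == "map":
            found += misplaced_logical_types(schema["values"], path)
        elif isinstance(kind, (dict, list)):
            found += misplaced_logical_types(kind, path)
    return found


flat = json.loads('{"type": "record", "name": "rec", "fields": '
                  '[{"name": "f", "type": "string", "logicalType": "uuid"}]}')
nested = json.loads('{"type": "record", "name": "rec", "fields": '
                    '[{"name": "f", "type": {"type": "string", "logicalType": "uuid"}}]}')
print(misplaced_logical_types(flat))    # ['rec.f']
print(misplaced_logical_types(nested))  # []
```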

Avro schema for record type with empty object

I am trying to create an Avro schema for the JSON below:
{
"id": "TEST",
"status": "status",
"timestamp": "2019-01-01T00:00:22-03:00",
"comment": "add comments or replace it with adSummary data",
"error": {
"code": "ER1212132",
"msg": "error message"
}
}
The error object is optional; it could be empty:
"error" :{}
Below is the Avro schema without a default value:
{
"type" : "record",
"name" : "Order",
"fields" : [ {
"name" : "id",
"type" : "string"
}, {
"name" : "status",
"type" : "string"
}, {
"name" : "timestamp",
"type" : "string"
}, {
"name" : "comment",
"type" : ["null","string"],
"default": null
}, {
"name" : "error",
"type" : {
"type" : "record",
"name" : "error",
"fields" : [ {
"name" : "code",
"type" : "string"
}, {
"name" : "msg",
"type" : "string"
} ]
}
} ]
}
How can I add a default value of {} for the error field?
{
"type" : "record",
"name" : "Order",
"fields" : [ {
"name" : "id",
"type" : "string"
}, {
"name" : "status",
"type" : "string"
}, {
"name" : "timestamp",
"type" : "string"
}, {
"name" : "comment",
"type" : ["null","string"],
"default": null
}, {
"name" : "error",
"type" : [{"type": "record", "fields":[{"name": "code", "type":"string"}, {"name": "msg", "type":"string"}]}, {"type": "record", "fields":[]}]
} ]
}

How to develop combined column and line charts with multiple components in Highcharts

I am struggling to develop a combined line and column chart with multiple components in Highcharts.
Each component has different x-axis values. Please help me out.
Thanks in advance.
My exact question is as follows.
first component:componentA
var y1-axis: [
{
"name" : "3D",
"type" : "line",
"data" : ["0","0","0"]
},
{
"name" : "3C",
"type" : "line",
"data" : ["0","0","0"]
},
{
"name" : "3B-NDT",
"type" : "line",
"data" : ["0","0","0"]
},
{
"name" : "3B",
"type" : "line",
"data" : ["0","0","0"]
}]
var x1-axis : [
{
"name" : "jobDescription",
"type" : "",
"data" : ["7799","1046","1112"]
}
]
Second component: componentB
var y2-axis: [
{
"name" : "3D",
"type" : "column",
"data" : ["0","7.98","0"]
},
{
"name" : "3C",
"type" : "column",
"data" : ["0","1.82","6.64"]
},
{
"name" : "3B-NDT",
"type" : "column",
"data" : ["0","48.12","37.87"]
}]
var x2-axis : [
{
"name" : "jobDescription1",
"type" : "",
"data" : ["7801", "1111", "1147"]
}
]
How can I design a chart that shows both components in a single chart, like the one in the attached Highcharts image?
In general, something like this is possible by using xAxis.left and xAxis.width, as in this example: http://jsfiddle.net/45v24/4/
However, it looks like the tooltip has a bug, reported here.

Question populating nested records in Avro using a GenericRecord

Suppose I’ve got the following schema:
{
"name" : "Profile",
"type" : "record",
"fields" : [
{ "name" : "firstName", "type" : "string" },
{ "name" : "address" , "type" : {
"type" : "record",
"name" : "AddressUSRecord",
"fields" : [
{ "name" : "address1" , "type" : "string" },
{ "name" : "address2" , "type" : "string" },
{ "name" : "city" , "type" : "string" },
{ "name" : "state" , "type" : "string" },
{ "name" : "zip" , "type" : "int" },
{ "name" : "zip4", "type": "int" }
]
}
}
]
}
I’m using a GenericRecord to represent each Profile that gets created. To add a firstName, it’s easy to do the following:
Schema sch = Schema.parse(schemaFile);
DataFileWriter<GenericRecord> fw = new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>()).create(sch, new File(outFile));
GenericRecord r = new GenericData.Record(sch);
r.put("firstName", "John");
fw.append(r);
But how would I set the city, for example? How do I represent the key as a string that the r.put method can understand?
Thanks
For the schema above:
GenericRecord t = new GenericData.Record(sch.getField("address").schema());
t.put("city","beijing");
r.put("address",t);
