Loading of Apache Avro plugin for Tranquility fails with Exception

For the Kafka Avro producer I run:
./kafka-avro-console-producer --broker-list localhost:9092 --topic pageviews --property value.schema='{"type":"record","name":"mypageviews","fields":[{"name":"time","type":"string"},{"name":"url","type":"string"},{"name":"user","type":"string"},{"name":"latencyMs","type":"int"}]}'
{"time": "2018-05-31T14:23:11Z", "url": "/foo/bar", "user": "alice", "latencyMs": 32}
For Tranquility Kafka I run:
bin/tranquility kafka -configFile ../druid-0.12.1/conf-quickstart/tranquility/avro.json
Here is the parser part of the relevant avro.json:
"parser" : {
"type" : "avro_stream",
"avroBytesDecoder" : {
"type" : "schema_inline",
"schema" : {
"namespace": "SKY",
"name": "mypageviews",
"type": "record",
"fields": [
{ "name": "time", "type": "string" },
{ "name": "url", "type": "string" },
{ "name": "user", "type": "string" },
{ "name": "latencyMs", "type": "int" }
]
}
},
"parseSpec" : {
"timestampSpec" : {
"column" : "time",
"format" : "auto"
},
"dimensionsSpec" : {
"dimensions" : ["url", "user"],
"dimensionExclusions" : [
"timestamp",
"value"
]
},
"format" : "avro"
}
}
Here is the error I get:
ERROR c.m.tranquility.kafka.KafkaConsumer - Exception: java.lang.AbstractMethodError: io.druid.data.input.AvroStreamInputRowParser.parse(Ljava/lang/Object;)Lio/druid/data/input/InputRow;

Related

Error creating a kafka message to producer - Expected start-union. Got VALUE_STRING [duplicate]

Error creating a kafka message to producer - Expected start-union. Got VALUE_STRING
{
"namespace": "de.morris.audit",
"type": "record",
"name": "AuditDataChangemorris",
"fields": [
{"name": "employeeID", "type": "string"},
{"name": "employeeNumber", "type": ["null", "string"], "default": null},
{"name": "serialNumbers", "type": [ "null", {"type": "array", "items": "string"}]},
{"name": "correlationId", "type": "string"},
{"name": "timestamp", "type": "long", "logicalType": "timestamp-millis"},
{"name": "employmentscreening","type":{"type": "enum", "name": "employmentscreening", "symbols": ["NO","YES"]}},
{"name": "vouchercodes","type": ["null",
{
"type": "array",
"items": {
"name": "Vouchercodes",
"type": "record",
"fields": [
{"name": "voucherName","type": ["null","string"], "default": null},
{"name": "authocode","type": ["null","string"], "default": null}
]
}
}], "default": null}
]
}
When I try to create sample data in JSON format based on the above .avsc for the Kafka consumer, I get the error below upon testing.
{
"employeeID": "qtete46524",
"employeeNumber": {
"string": "custnumber9813"
},
"serialNumbers": {
"type": "array",
"items": ["363536623","5846373733"]
},
"correlationId": "corr-656532443",
"timestamp": 1476538955719,
"employmentscreening": "NO",
"vouchercodes": [
{
"voucherName": "skygo",
"authocode": "A238472ASD"
}
]
}
This is the error I get when I run the Dataflow job in GCP:
Error message from worker: java.lang.RuntimeException: java.io.IOException: Insert failed: [{"errors":[{"debugInfo":"","location":"serialnumbers","message":"Array specified for non-repeated field: serialnumbers.","reason":"invalid"}],"index":0}]
How do I create correct sample data based on the above schema?
Read the spec
The value of a union is encoded in JSON as follows:
if its type is null, then it is encoded as a JSON null;
otherwise it is encoded as a JSON object with one name/value pair whose name is the type’s name and whose value is the recursively encoded value
So, here's the data it expects.
{
"employeeID": "qtete46524",
"employeeNumber": {
"string": "custnumber9813"
},
"serialNumbers": {"array": [
"serialNumbers3521"
]},
"correlationId": "corr-656532443",
"timestamp": 1476538955719,
"employmentscreening": "NO",
"vouchercodes": {"array": [
{
"voucherName": {"string": "skygo"},
"authocode": {"string": "A238472ASD"}
}
]}
}
With this schema
{
"namespace": "de.morris.audit",
"type": "record",
"name": "AuditDataChangemorris",
"fields": [
{
"name": "employeeID",
"type": "string"
},
{
"name": "employeeNumber",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "serialNumbers",
"type": [
"null",
{
"type": "array",
"items": "string"
}
]
},
{
"name": "correlationId",
"type": "string"
},
{
"name": "timestamp",
"type": {
"type": "long",
"logicalType": "timestamp-millis"
}
},
{
"name": "employmentscreening",
"type": {
"type": "enum",
"name": "employmentscreening",
"symbols": [
"NO",
"YES"
]
}
},
{
"name": "vouchercodes",
"type": [
"null",
{
"type": "array",
"items": {
"name": "Vouchercodes",
"type": "record",
"fields": [
{
"name": "voucherName",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "authocode",
"type": [
"null",
"string"
],
"default": null
}
]
}
}
],
"default": null
}
]
}
Here's an example of producing to and consuming from Kafka:
$ jq -rc < /tmp/data.json | kafka-avro-console-producer --topic foobar --property value.schema="$(jq -rc < /tmp/data.avsc)" --bootstrap-server localhost:9092 --sync
$ kafka-avro-console-consumer --topic foobar --from-beginning --bootstrap-server localhost:9092 | jq
{
"employeeID": "qtete46524",
"employeeNumber": {
"string": "custnumber9813"
},
"serialNumbers": {
"array": [
"serialNumbers3521"
]
},
"correlationId": "corr-656532443",
"timestamp": 1476538955719,
"employmentscreening": "NO",
"vouchercodes": {
"array": [
{
"voucherName": {
"string": "skygo"
},
"authocode": {
"string": "A238472ASD"
}
}
]
}
}
^CProcessed a total of 1 messages

How can I write an avro schema for an array of arrays?

I have to create a schema that has an array of arrays in one field, and I couldn't get it to work. For example, I've tried this one, but it isn't working:
{
"name": "SelfHealingStarter",
"namespace": "SCP.Kafka.AvroSchemas",
"doc": "Message with all the necessary information to run Self Healing process.",
"type": "record",
"fields": [
{
"name": "FiveMinutesAgoMeasurement",
"type": "record",
"doc": "Field with all five minutes ago measurement.",
"fields": [
{
"name": "equipments",
"doc": "List with all equipments measurement.",
"type": {
"type": "array",
"items": {
"type": {
"type": "array",
"items": "string"
},
"default": []
}
},
"default": []
}
]
}
]
}
IDL
protocol Example {
record Foo {
array<array<string>> data = [];
}
}
AVSC generated by running java -jar ~/workspace/avro-tools-1.8.2.jar idl2schemata example.idl:
{
"type" : "record",
"name" : "Foo",
"fields" : [ {
"name" : "data",
"type" : {
"type" : "array",
"items" : {
"type" : "array",
"items" : "string"
}
},
"default" : [ ]
} ]
}
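For reference, since the data field is not a union, a matching record in the Avro JSON encoding is just nested JSON arrays. A hypothetical sample (not from the original question) that could be fed to kafka-avro-console-producer as shown earlier:
{"data": [["a", "b"], ["c", "d"]]}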

Avro Tools failure: Expected start-union. Got VALUE_STRING

I've defined the below avro schema (car_sales_customer.avsc),
{
"type" : "record",
"name" : "topLevelRecord",
"fields" : [ {
"name": "cust_date",
"type": "string"
},
{
"name": "customer",
"type": {
"type": "array",
"items": {
"name": "customer",
"type": "record",
"fields": [
{
"name": "address",
"type": "string"
},
{
"name": "driverlience",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "name",
"type": "string"
},
{
"name": "phone",
"type": "string"
}
]
}
}
}]
}
and my input json payload (car_sales_customer.json) is as follows,
{"cust_date":"2017-04-28","customer":[{"address":"SanFrancisco,CA","driverlience":"K123499989","name":"JoyceRidgely","phone":"16504378889"}]}
I'm trying to use avro-tools and convert the above json to avro using the avro schema,
java -jar ./avro-tools-1.9.2.jar fromjson --schema-file ./car_sales_customer.avsc ./car_sales_customer.json > ./car_sales_customer.avro
I get the below error when I execute the above statement,
Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_STRING
at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:514)
at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:433)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:283)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:298)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:89)
at org.apache.avro.tool.Main.run(Main.java:66)
at org.apache.avro.tool.Main.main(Main.java:55)
Is there a solution to overcome the error?
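The union-encoding rule quoted in the earlier answer applies here too: driverlience is declared as ["null", "string"], so the JSON decoder used by avro-tools fromjson expects the non-null value wrapped in an object keyed by its type name. A payload along these lines should convert (a sketch based on that rule, not re-run against the original files):
{"cust_date":"2017-04-28","customer":[{"address":"SanFrancisco,CA","driverlience":{"string":"K123499989"},"name":"JoyceRidgely","phone":"16504378889"}]}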

NiFi RecordReader & RecordWriter serialization error: IllegalTypeConversionException: Cannot convert value of class because the type is not supported

I'm trying to transform JSON to JSON using JoltTransformRecord in Apache NiFi. When I try the transform at https://jolt-demo.appspot.com/, I get the correct result, so the specification itself seems fine.
But when I run the same transform with JoltTransformRecord, it throws an exception: "Cannot convert value of class [Ljava.lang.Object; because the type is not supported". I don't understand why I'm getting this error. I checked my input and output schemas and didn't find anything wrong; they look correct.
Below are my input and output JSON examples, the Jolt specification, and the input and output schemas. For this I'm using JsonTreeReader and JsonRecordSetWriter.
How can I solve this problem?
Example input JSON for JoltTransformRecord (in this example there is only one JSON object inside the array, but in reality there are many):
[ {
"uuid" : "MFMS1-MC5",
"componentId" : "path1",
"Samples" : {
"PathFeedrate" : [ {
"dataItemId" : "pf",
"timestamp" : "2019-03-01T21:48:27.940558Z",
"sequence" : "68104",
"value" : "425.5333",
"name" : "Fact",
"subType" : "ACTUAL"
}, {
"dataItemId" : "pf",
"timestamp" : "2019-03-01T21:48:30.244219Z",
"sequence" : "68117",
"value" : "0",
"name" : "Fact",
"subType" : "ACTUAL"
} ]
},
"Events" : {
"SequenceNumber" : [ {
"dataItemId" : "seq",
"timestamp" : "2019-03-01T21:48:27.940558Z",
"sequence" : "68105",
"value" : "0",
"name" : "sequenceNum"
} ],
"Unit" : [ {
"dataItemId" : "unit",
"timestamp" : "2019-03-01T21:48:27.940558Z",
"sequence" : "68106",
"value" : "13",
"name" : "unitNum"
} ]
}
}]
Sample output JSON I want:
{
"DataItems" : [ {
"uuid" : "MFMS1-MC5",
"componentId" : "path1",
"eventType" : "Samples",
"type" : "PathFeedrate",
"dataItemId" : "pf",
"timestamp" : "2019-03-01T21:48:27.940558Z",
"sequence" : "68104",
"value" : "425.5333",
"name" : "Fact",
"subType" : "ACTUAL"
}, {
"uuid" : "MFMS1-MC5",
"componentId" : "path1",
"eventType" : "Samples",
"type" : "PathFeedrate",
"dataItemId" : "pf",
"timestamp" : "2019-03-01T21:48:30.244219Z",
"sequence" : "68117",
"value" : "0",
"name" : "Fact",
"subType" : "ACTUAL"
}, {
"uuid" : "MFMS1-MC5",
"componentId" : "path1",
"eventType" : "Events",
"type" : "SequenceNumber",
"dataItemId" : "seq",
"timestamp" : "2019-03-01T21:48:27.940558Z",
"sequence" : "68105",
"value" : "0",
"name" : "sequenceNum"
}, {
"uuid" : "MFMS1-MC5",
"componentId" : "path1",
"eventType" : "Events",
"type" : "Unit",
"dataItemId" : "unit",
"timestamp" : "2019-03-01T21:48:27.940558Z",
"sequence" : "68106",
"value" : "13",
"name" : "unitNum"
} ]
}
My Jolt specification:
[
{
"operation": "shift",
"spec": {
"Samples": {
"*": {
"*": {
"#(3,uuid)": "Items.&2[#2].uuid",
"#(3,componentId)": "Items.&2[#2].componentId",
"$2": "Items.&2[#2].eventType",
"$1": "Items.&2[#2].type",
"*": "Items.&2[#2].&"
}
}
},
"Events": {
"*": {
"*": {
"#(3,uuid)": "Items.&2[#2].uuid",
"#(3,componentId)": "Items.&2[#2].componentId",
"$2": "Items.&2[#2].eventType",
"$1": "Items.&2[#2].type",
"*": "Items.&2[#2].&"
}
}
},
"Condition": {
"*": {
"*": {
"#(3,uuid)": "Items.&2[#2].uuid",
"#(3,componentId)": "Items.&2[#2].componentId",
"$2": "Items.&2[#2].eventType",
"$1": "Items.&2[#2].value",
"*": "Items.&2[#2].&"
}
}
}
}
},
{
"operation": "shift",
"spec": {
"Items": {
"*": {
"*": "DataItems[]"
}
}
}
}
]
This specification works correctly; I have verified it in the Jolt transform demo.
I'm using JsonTreeReader to read the JSON in JoltTransformRecord, and this is my input schema:
{
"name": "Items",
"namespace": "Items",
"type": "record",
"fields": [
{
"name": "uuid",
"type": "string"
},
{
"name": "componentId",
"type": "string"
},
{
"name": "Samples",
"type": ["null", {
"type": "map",
"values": {
"type": "array",
"items": {
"name": "SamplesDataItem",
"type": "record",
"fields": [
{
"name": "dataItemId",
"type": "string"
},
{
"name": "timestamp",
"type": "string"
},
{
"name": "sequence",
"type": "string"
},
{
"name": "value",
"type": "string"
},
{
"name": "name",
"type": ["null", "string"]
},
{
"name": "subType",
"type": ["null", "string"]
},
{
"name": "sampleRate",
"type": ["null", "string"]
},
{
"name": "statistic",
"type": ["null", "string"]
},
{
"name": "duration",
"type": ["null", "string"]
},
{
"name": "sampleCount",
"type": ["null", "string"]
},
{
"name": "compositionId",
"type": ["null", "string"]
},
{
"name": "resetTriggered",
"type": ["null", "string"]
}
]
}
}
}]
},
{
"name": "Events",
"type": ["null", {
"type": "map",
"values": {
"type": "array",
"items": {
"name": "EventsDataItem",
"type": "record",
"fields": [
{
"name": "dataItemId",
"type": "string"
},
{
"name": "timestamp",
"type": "string"
},
{
"name": "sequence",
"type": "string"
},
{
"name": "value",
"type": "string"
},
{
"name": "name",
"type": ["null", "string"]
},
{
"name": "subType",
"type": ["null", "string"]
},
{
"name": "compositionId",
"type": ["null", "string"]
},
{
"name": "resetTriggered",
"type": ["null", "string"]
}
]
}
}
}]
},
{
"name": "Condition",
"type": ["null", {
"type": "map",
"values": {
"type": "array",
"items": {
"name": "ConditionDataItem",
"type": "record",
"fields": [
{
"name": "dataItemId",
"type": "string"
},
{
"name": "timestamp",
"type": "string"
},
{
"name": "type",
"type": "string"
},
{
"name": "sequence",
"type": "string"
},
{
"name": "name",
"type": ["null", "string"]
},
{
"name": "subType",
"type": ["null", "string"]
},
{
"name": "nativeCode",
"type": ["null", "string"]
},
{
"name": "nativeSeverity",
"type": ["null", "string"]
},
{
"name": "qualifier",
"type": ["null", "string"]
},
{
"name": "statistic",
"type": ["null", "string"]
},
{
"name": "compositionId",
"type": ["null", "string"]
}
]
}
}
}]
}
]
}
I'm using JsonRecordSetWriter to write the converted result from JoltTransformRecord, and this is my output schema:
{
"name": "Items",
"type": "record",
"namespace": "Items",
"fields": [
{
"name": "DataItems",
"type": {
"type": "array",
"items": {
"name": "DataItems",
"type": "record",
"fields": [
{
"name": "uuid",
"type": "string"
},
{
"name": "componentId",
"type": "string"
},
{
"name": "eventType",
"type": "string"
},
{
"name": "type",
"type": "string"
},
{
"name": "dataItemId",
"type": "string"
},
{
"name": "timestamp",
"type": "string"
},
{
"name": "value",
"type": "string"
},
{
"name": "name",
"type": ["null", "string"],
"default": null
},
{
"name": "subType",
"type": ["null", "string"],
"default": null
}
]
}
}
}
]
}
This is indeed a bug in the record handling utilities; I have written NIFI-6105 to cover the fix. Good catch!
As a workaround, since you have JSON as input and output, you can use JoltTransformJson instead of JoltTransformRecord. Alternatively, if you know the keys in the map (e.g. PathFeedrate), you can change the schema to treat it as a record rather than a map, which might get you around the bug.
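For illustration, the second workaround would replace the map type under Samples with a record that names the known keys explicitly, roughly like the sketch below (assuming PathFeedrate is the only key you need; "SamplesDataItem" stands for the full item record definition from the original schema, which would need to be inlined at that spot):
{
  "name": "Samples",
  "type": ["null", {
    "name": "SamplesRecord",
    "type": "record",
    "fields": [
      {
        "name": "PathFeedrate",
        "type": ["null", {
          "type": "array",
          "items": "SamplesDataItem"
        }],
        "default": null
      }
    ]
  }],
  "default": null
}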

Jenkins extended choice parameter plugin retrieve value from Groovy script

I am using the Extended Choice Parameter Jenkins plugin. I used the Groovy script provided on the plugin web page, made some changes to it, and currently have:
import org.boon.Boon;
def jsonEditorOptions = Boon.fromJson(/{
disable_edit_json: true,
disable_properties: true,
no_additional_properties: true,
disable_collapse: true,
disable_array_add: true,
disable_array_delete: true,
disable_array_reorder: true,
theme: "bootstrap2",
iconlib:"fontawesome4",
"schema":{
"title": "Environments",
"type": "array",
"format":"tabs",
"items": {
"title": "Environment",
"headerTemplate": "{{self.name}}",
"type": "object",
"properties": {
"name" : {
"title": "Environment",
"type": "string",
"readOnly": "true",
"propertyOrder" : 1
},
"Properties": {
"type": "array",
"format": "table",
"propertyOrder" : 2,
"items": {
"type": "object",
"properties": {
"URL" : {
"type": "string",
"readOnly": "true",
"propertyOrder" : 1
},
"User" : {
"type": "string",
"readOnly": "true",
"propertyOrder" : 2
},
"Password" : {
"type": "string",
"propertyOrder" : 3
}
}
}
}
}
}
},
startval: [
{
"name": "DEV",
"Properties": [
{
"URL": "host:port1",
"User": "user1",
"Password": ""
}
]
},
{
"name": "TST",
"Properties": [
{
"URL": "host:port2",
"User": "user2",
"Password": ""
}
]
}
]
}/);
return jsonEditorOptions;
I would like to know:
1) When building, the user makes a selection, say DEV. The values associated with DEV would then be:
- URL: host:port1
- User: user1
- Password: 123456
How can these values (DEV, host:port1, user1, 123456) be obtained and passed as parameters in the Build steps?
2) Is there a way to mask the value of the field Password when it is being entered by the user?
