Failure serializing an optional date type field to Avro, regardless of null or non-null value - avro

We are using Avro 1.8.2 to serialize data with an optional date type field, to be published to a topic.
record aRecord {
  /** Variable: lastUpdate
   * lastUpdate indicates the latest date and time the reference asset was updated
   */
  union {null, date} lastUpdate = null;

  /** Variable: businessDate
   * businessDate indicates the business date of the reference asset price
   */
  union {null, date} businessDate = null;
}
We ran into the following exception while using the avro-tools-generated Java class to serialize the data:
Error serializing avro message
Caused by: org.apache.avro.AvroRuntimeException: Unknown datum type org.joda.time.LocalDate: 2021-09-17
at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:772)
at org.apache.avro.specific.SpecificData.getSchemaName(SpecificData.java:302)
at org.apache.avro.generic.GenericData.resolveUnion
Please note that this happens regardless of whether the value is null or non-null (as shown, the non-null value 2021-09-17 also caused the exception).
We did the following investigation and experiments but could not figure out why:
Making the date field mandatory resolves the issue.
This is because DATE_CONVERSION is added to the corresponding field in the Java class generated by the avro tool (a sketch of this wiring is shown after this list).
If the field is defined as optional with a default value of null, DATE_CONVERSION is not added to the generated Java file.
Using Avro 1.9.1 resolved the issue; unfortunately we must use Avro 1.8.2.
We also tried a few other versions of kafka-avro-serializer and the Spring Boot Kafka framework. Nothing worked for us.
Other projects that depend on Avro 1.8.2 seem to be able to handle this. We checked everywhere we considered relevant, and all the code is the same, except that somehow those projects have DATE_CONVERSION in place in the Java file generated by the avro tool (although the fields are defined in the .avdl file in exactly the same way).
Debugging into GenericData.java, we found that if DATE_CONVERSION is in place for the optional date field, getSchemaName is not called at all.
getSchemaName basically checks the type of the object: whether it's an Int, Record, String, etc.
The date is a Joda logical type; its underlying type is int, as far as we understand.
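For reference, when the avro tool does emit the conversion wiring, the generated class ends up with something roughly like the following (an illustrative sketch of the Avro 1.8 Joda-based pattern, not the exact generated source):

// Sketch of the conversion wiring Avro 1.8's code generator emits when it
// recognizes the date logical type; when this is missing, GenericData falls
// back to getSchemaName and fails on org.joda.time.LocalDate.
private static final org.apache.avro.data.TimeConversions.DateConversion DATE_CONVERSION =
    new org.apache.avro.data.TimeConversions.DateConversion();

private static final org.apache.avro.Conversion<?>[] conversions =
    new org.apache.avro.Conversion<?>[] {
        DATE_CONVERSION,  // lastUpdate
        DATE_CONVERSION   // businessDate
    };

@Override
public org.apache.avro.Conversion<?> getConversion(int field) {
    return conversions[field];
}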
So our questions are:
How do we make the avro tool enable DATE_CONVERSION for an optional "date" type field using Avro 1.8.2?
If DATE_CONVERSION is not the key to resolving the issue, what is the best practice for serializing a date type field using Avro 1.8.2, given that the field can be null (the default) or non-null?
Thanks.

SpecificData specificData = SpecificData.get();
specificData.addLogicalTypeConversion(new DateConversion());
DatumWriter<MessageClass> dw = new SpecificDatumWriter<MessageClass>(message.getSchema(), specificData);
DataFileWriter<MessageClass> dfw = new DataFileWriter<MessageClass>(dw);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
dfw.create(message.getSchema(), outputStream);
dfw.append(message);
dfw.close();
ProducerRecord<String, byte[]> record = new ProducerRecord<>(topic, key, outputStream.toByteArray());
return kafkaProducer.send(record, new Callback());
The above code fixed the issue. MessageClass is the Java class generated by the avro tool.
The SpecificDatumWriter is constructed with a SpecificData instance on which a DateConversion has been registered; that DATE_CONVERSION is exactly what the optional date field needs during serialization.
Note that this solution is only needed as a workaround for Avro 1.8.
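For completeness, a minimal sketch of the matching read path, assuming the same generated MessageClass and the Avro 1.8 Joda-based conversion (the payload variable is an assumption for the consumed byte[]; this is not code from the original post):

// Register the same DateConversion on the reader side before decoding the
// data-file bytes produced above. org.apache.avro.data.TimeConversions.DateConversion
// is the Joda-based conversion shipped with Avro 1.8.
SpecificData readerData = new SpecificData();
readerData.addLogicalTypeConversion(new TimeConversions.DateConversion());
DatumReader<MessageClass> reader = new SpecificDatumReader<>(
        MessageClass.getClassSchema(), MessageClass.getClassSchema(), readerData);
try (DataFileStream<MessageClass> stream =
             new DataFileStream<>(new ByteArrayInputStream(payload), reader)) {
    while (stream.hasNext()) {
        MessageClass msg = stream.next();
        // msg.getLastUpdate() comes back as an org.joda.time.LocalDate or null
    }
}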

Related

Conversion failed because the DateTime data value overflowed the type specified for the DateTime value part in the consumer's buffer

I have a stored procedure in Sybase ASE with date parameters. I create an OLE DB connection and pass the date parameters to the OLE DB command, mapping the parameter to the OleDbType.DBTimeStamp type; the datetime parameter type in the stored procedure is smalldatetime.
Here is the sample code.
OleDbConnection con = new OleDbConnection(connectionstring);
con.Open();
OleDbCommand cmd = new OleDbCommand("dbo.job_xb_new", con);
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add("@signoff", OleDbType.DBTimeStamp);
cmd.Parameters["@signoff"].Value = DateTime.Now;
cmd.ExecuteNonQuery(); // -----------> ERROR HERE
While executing the stored procedure I am receiving the error:
"Conversion failed because the DateTime data value overflowed the type specified for the DateTime value part in the consumer's buffer"
Please help!!!
With only the information given, there may be a solution to try.
Change the datatype of your input value to the stored proc to a char/varchar:
create procedure dbo.myProc
    @inDate varchar(20)
AS
BEGIN
    ..
END
Perform internal conversion from that with CONVERT before passing to your query.
SET @inDate = CONVERT(datetime,@inDate,[style parameter number])
For troubleshooting, just comment out everything in the procedure and SELECT @inDate first to determine what the data coming in from the OLE DB app looks like. You may be in for a surprise there...

How can I write FlowFile attributes to Avro metadata inside the FlowFile's content?

I am creating FlowFiles that are manipulated and split downstream after being emitted by an ExecuteSql processor. I have populated the FlowFiles' attributes with data that I want to put into the Avro metadata contained within each FlowFile's content.
How can I do this?
I've tried using an UpdateRecord processor configured with an AvroReader and AvroRecordSetWriter and a property with a key of /canary that should be writing a FlowFile attribute to that key somewhere in the Avro document. It does not appear anywhere in the output, though.
It would be acceptable to move the records in the Avro data to a subkey and have a metadata section be a part of the record data. I would prefer not to do this, though, because it does not seem like the correct solution and because it sounds much more complex than simply modifying the Avro metadata.
The record-aware processors (and the Readers/Writers) are not metadata-aware, meaning they cannot currently (as of NiFi 1.5.0) act on metadata in any way (inspect, create, delete, etc.), so UpdateRecord won't work for metadata per se. With your /canary property key, it will try to insert a field into your Avro record at the top level, named canary, and should have the value you specify. However I believe your output schema needs to have the canary field added at the top level, or it may be ignored (I'm not positive of this, you can check the output schema to see if it is added automatically).
There is currently no NiFi processor that can update Avro metadata explicitly (MergeContent does some of this when merging various Avro files together, but you can't choose to set a value, for example). However, I have an unpolished Groovy script you could use in ExecuteScript to add metadata to Avro files in NiFi 1.5.0+. In ExecuteScript you would set the language to Groovy and use the following as the Script Body, then add user-defined (aka "dynamic") properties to ExecuteScript, where the key will be the metadata key and the evaluated value (the properties support Expression Language) will be the metadata value:
@Grab('org.apache.avro:avro:1.8.1')
import org.apache.avro.*
import org.apache.avro.file.*
import org.apache.avro.generic.*

def flowFile = session.get()
if(!flowFile) return
try {
    // Save off dynamic property values for metadata key/values later
    def metadata = [:]
    context.properties.findAll {e -> e.key.dynamic}.each {k,v -> metadata.put(k.name, context.getProperty(k).evaluateAttributeExpressions(flowFile).value.bytes)}
    flowFile = session.write(flowFile, {inStream, outStream ->
        DataFileStream<GenericRecord> reader = new DataFileStream<>(inStream, new GenericDatumReader<GenericRecord>())
        DataFileWriter<GenericRecord> writer = new DataFileWriter<>(new GenericDatumWriter<GenericRecord>())
        def schema = reader.schema
        def inputCodec = reader.getMetaString(DataFileConstants.CODEC) ?: DataFileConstants.NULL_CODEC
        // Forward the existing metadata to the output
        reader.metaKeys.each { key ->
            if (!DataFileWriter.isReservedMeta(key)) {
                byte[] metadatum = reader.getMeta(key)
                writer.setMeta(key, metadatum)
            }
        }
        // For each dynamic property, set the key/value pair as Avro metadata
        metadata.each {k,v -> writer.setMeta(k,v)}
        writer.setCodec(CodecFactory.fromString(inputCodec))
        writer.create(schema, outStream)
        writer.appendAllFrom(reader, false)
    } as StreamCallback)
    session.transfer(flowFile, REL_SUCCESS)
} catch(e) {
    log.error('Error adding Avro metadata, penalizing flow file and routing to failure', e)
    flowFile = session.penalize(flowFile)
    session.transfer(flowFile, REL_FAILURE)
}
Note that this script can work with versions of NiFi previous to 1.5.0, but the @Grab at the top is not supported until 1.5.0, so you'd have to download Avro and its dependencies into a flat folder, and point to that in the Module Directory property of ExecuteScript.

Dataflow output parameterized type to avro file

I have a pipeline that successfully outputs an Avro file as follows:
@DefaultCoder(AvroCoder.class)
class MyOutput_T_S {
    T foo;
    S bar;
    Boolean baz;
    public MyOutput_T_S() {}
}

@DefaultCoder(AvroCoder.class)
class T {
    String id;
    public T() {}
}

@DefaultCoder(AvroCoder.class)
class S {
    String id;
    public S() {}
}
...
PCollection<MyOutput_T_S> output = input.apply(myTransform);
output.apply(AvroIO.Write.to("/out").withSchema(MyOutput_T_S.class));
How can I reproduce this exact behavior, except with a parameterized output MyOutput<T, S> (where T and S are both Avro-codable using reflection)?
The main issue is that Avro reflection doesn't work for parameterized types. So based on these responses:
Setting Custom Coders & Handling Parameterized types
Using Avrocoder for Custom Types with Generics
1) I think I need to write a custom CoderFactory, but I am having difficulty figuring out exactly how this works (I'm having trouble finding examples). Oddly enough, a completely naive coder factory appears to let me run the pipeline and inspect proper output using DataflowAssert:
cr.registerCoder(MyOutput.class, new CoderFactory() {
    @Override
    public Coder<?> create(List<? extends Coder<?>> componentCoders) {
        Schema schema = new Schema.Parser().parse("{\"type\":\"record\","
            + "\"name\":\"MyOutput\","
            + "\"namespace\":\"mypackage\","
            + "\"fields\":[]}");
        return AvroCoder.of(MyOutput.class, schema);
    }

    @Override
    public List<Object> getInstanceComponents(Object value) {
        MyOutput<Object, Object> myOutput = (MyOutput<Object, Object>) value;
        List components = new ArrayList();
        return components;
    }
});
While I can successfully assert against the output now, I expect this will not cut it for writing to a file. I haven't figured out how I'm supposed to use the provided componentCoders to generate the correct schema, and if I try to just shove the schema of T or S into fields I get:
java.lang.IllegalArgumentException: Unable to get field id from class null
2) Assuming I figure out how to encode MyOutput, what do I pass to AvroIO.Write.withSchema? If I pass either MyOutput.class or the schema, I get type mismatch errors.
I think there are two questions (correct me if I am wrong):
How do I enable the coder registry to provide coders for various parameterizations of MyOutput<T, S>?
How do I write values of MyOutput<T, S> to a file using AvroIO.Write?
The first question is to be solved by registering a CoderFactory as in the linked question you found.
Your naive coder is probably allowing you to run the pipeline without issues because serialization is being optimized away. Certainly an Avro schema with no fields will result in those fields being dropped in a serialization+deserialization round trip.
But assuming you fill in the schema with the fields, your approach to CoderFactory#create looks right. I don't know the exact cause of the message java.lang.IllegalArgumentException: Unable to get field id from class null, but the call to AvroCoder.of(MyOutput.class, schema) should work, for an appropriately assembled schema. If there is an issue with this, more details (such as the rest of the stack trace) would be helpful.
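For illustration, here is a rough sketch of what "appropriately assembled" could look like inside create, assuming both component coders are AvroCoders and that MyOutput's fields are foo, bar, and baz as in the question (a sketch under those assumptions, not code from the answer):

@Override
public Coder<?> create(List<? extends Coder<?>> componentCoders) {
    // Assumption: the component coders for T and S are AvroCoders, so their
    // schemas can be pulled out and embedded in the record schema for MyOutput.
    Schema fooSchema = ((AvroCoder<?>) componentCoders.get(0)).getSchema();
    Schema barSchema = ((AvroCoder<?>) componentCoders.get(1)).getSchema();
    Schema schema = SchemaBuilder.record("MyOutput").namespace("mypackage")
        .fields()
        .name("foo").type(fooSchema).noDefault()
        .name("bar").type(barSchema).noDefault()
        .name("baz").type().booleanType().noDefault()
        .endRecord();
    return AvroCoder.of(MyOutput.class, schema);
}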
However, your override of CoderFactory#getInstanceComponents should return a list of values, one per type parameter of MyOutput. Like so:
@Override
public List<Object> getInstanceComponents(Object value) {
    MyOutput<Object, Object> myOutput = (MyOutput<Object, Object>) value;
    return ImmutableList.of(myOutput.foo, myOutput.bar);
}
The second question can be answered using some of the same support code as the first, but otherwise is independent. AvroIO.Write.withSchema always explicitly uses the provided schema. It does use AvroCoder under the hood, but this is actually an implementation detail. Providing a compatible schema is all that is necessary - such a schema will have to be composed for each value of T and S for which you want to output MyOutput<T, S>.

Persisting a date and nested objects with JSON

I'm trying to save a nested people collection, which is a JSON array, and Grails complains about requiring a Set.
Another problem I encountered is that the date field is rejected as null, even though it already contains a value.
What do I need to do before adding the params to my object, or do I have to change how my JSON is built? I'm trying to save a JSON post like this:
// relationship of Test
//static hasMany = [people: Person, samples: Sample]
def jsonParams= JSON.parse(request.JSON.toString())
def testInstance= new Test(jsonParams)
//Error requiring a Set
[Failed to convert property value of type 'org.codehaus.groovy.grails.web.json.JSONArray' to required type 'java.util.Set' for property 'people'; nested exception is java.lang.IllegalStateException: Cannot convert value of type [java.lang.String] to required type [com.Person] for property 'people[0]': no matching editors or conversion strategy found]]
//error saying its null
Field error in object 'com.Test' on field 'samples[2].dateTime': rejected value [null]; codes [com.Sample]
//...
"samples[0].dateTime_hour":"0",
"samples[0].dateTime_minute":"0",
"samples[0].dateTime_day":"1",
"samples[0].dateTime_month":"0",
"samples[0].dateTime_year":"-1899",
"samples[0]":{
"dateTime_minute":"0",
"dateTime_day":"1",
"dateTime_year":"-1899",
"dateTime_hour":"0",
"dateTime_month":"0"
},
"people":[
"1137",
"1141"
], //...
First off, this line is unnecessary:
def jsonParams= JSON.parse(request.JSON.toString())
The request.JSON can be directly passed to the Test constructor:
def testInstance = new Test(request.JSON)
I'm not sure what your Person class looks like, but I'm assuming those numbers (1137, 1141) are ids. If that is the case, then your json should work - there's a chance that passing the request.JSON directly could help. I tested your JSON locally and it has no problem associating the hasMany collection. I also used:
// JSON numbers rather than strings
"people": [1137, 1141]
// using Person map with the id
"people: [{
"id": 1137
}, {
"id": 1141
}]
Both of these worked as well and are worth trying.
Concerning the null dateTime, I would rework your JSON. I would send the dateTime in a single field, instead of splitting the value into hour/minute/day/etc. The default formats are yyyy-MM-dd HH:mm:ss.S and yyyy-MM-dd'T'hh:mm:ss'Z', but these can be defined by the grails.databinding.dateFormats config setting (Config.groovy). There are other ways to do the binding as well (the @BindingFormat annotation), but it's going to be easiest to just send the date in a way that Grails can handle without additional configuration.
If you are dead set on splitting the dateTime into pieces, then you could use the @BindUsing annotation:
class Sample {
    @BindUsing({obj, source ->
        def hour = source['dateTime_hour']
        def minute = source['dateTime_minute']
        ...
        // set obj.dateTime based on these pieces
    })
    Date dateTime
}
An additional comment on your JSON, you seem to have samples[0] defined twice and are using 2 syntaxes for your internal collections (JSON arrays and indexed keys). I personally would stick with a single syntax to clean it up:
"samples": [
{"dateTime": "1988-01-01..."}
{"dateTime": "2015-10-21..."}
],"people": [
{"id": "1137"},
{"id": "1141"}
],

How to write an MR unit test that reads Avro schema-based records and emits Text-based key values

I have a similar requirement as well: I want to read Avro records with a given schema and emit Text-datatype key values, and I need to write the MRUnit test cases for this. I wrote the following code, but it gives me the following exception:
org.apache.avro.AvroTypeException: Found string, expecting xyz.abc.Myschema
at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:231)
at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
.....
.....
The following is my code in the setup function:
MyMapper myMapper = new MyMapper();
mapDriver = new MapDriver<AvroKey<Myschema>, NullWritable, Text, Text>();
mapDriver.setMapper(myMapper);
Configuration configuration = mapDriver.getConfiguration();
//Copy over the default io.serializations. If you don't do this then you will
//not be able to deserialize the inputs to the mapper
String[] strings = mapDriver.getConfiguration().getStrings("io.serializations");
String[] newStrings = new String[strings.length +1];
System.arraycopy( strings, 0, newStrings, 0, strings.length );
newStrings[newStrings.length-1] = AvroSerialization.class.getName();
//Now you have to configure AvroSerialization by specifying the key
//writer schema and the value writer schema.
configuration.setStrings("io.serializations", newStrings);
Text x = new Text();
Configuration conf = mapDriver.getConfiguration();
AvroSerialization.addToConfiguration(conf);
AvroSerialization.setKeyWriterSchema(conf, Schema.create(Schema.Type.STRING));
AvroSerialization.setKeyReaderSchema(conf, new Myschema().getSchema());
Job job = new Job(conf);
job.setMapperClass(MyMapper.class);
job.setInputFormatClass(AvroKeyInputFormat.class);
AvroJob.setInputKeySchema(job, new Myschema().getSchema());
job.setOutputKeyClass((new Text()).getClass());
I need to read Avro-based records with the Myschema schema and emit key/value pairs with Text datatypes.
Following is my mapper class:
public class MyMapper extends Mapper<AvroKey<Myschema>, NullWritable, Text, Text>...
protected void map(AvroKey<Myschema> key, NullWritable value, Context context)...
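For reference, once the driver is configured as above, the test body would exercise the mapper roughly like this (a sketch with placeholder values; the exact builder calls and expected outputs depend on Myschema and the mapper logic):

// Build a test record and run the MRUnit driver against the mapper.
Myschema datum = Myschema.newBuilder()
        // ...set Myschema's required fields here...
        .build();
mapDriver
        .withInput(new AvroKey<Myschema>(datum), NullWritable.get())
        .withOutput(new Text("expectedKey"), new Text("expectedValue"))
        .runTest();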
Can someone check if there is any configuration parameter that I'm missing and help me out?
