Using NiFi 1.7.1 (which uses Java Avro 1.8.1), I'm trying to define a schema in the AvroSchemaRegistry that has the fields name and app.name at the top level. According to the Avro docs[1], I assumed I could just define the fullname as usual, "name": "app.name", but I hit the error Illegal character in: app.name. It's true that the name portion of a fullname does not allow dots, but according to the docs: "If the name specified contains a dot, then it is assumed to be a fullname..."
I then tried using the namespace field. Using the following schema:
{
  "type": "record",
  "name": "nameRecord",
  "fields": [
    {
      "type": [ "string", "null" ],
      "name": "name"
    },
    {
      "type": [ "string", "null" ],
      "namespace": "app",
      "name": "name"
    }
  ]
}
I hit this error: Duplicate field name in record nameRecord: name type:UNION pos:1 and name type:UNION pos:0
Ultimately, I'd like to be able to define a schema for record like this (in JSON):
{
  "name": "Joe",
  "app.name": "NiFi"
}
[1] https://avro.apache.org/docs/1.8.1/spec.html#names
According to the docs, namespaces are only supported for record, enum, and fixed types, and other fields must adhere to the "regular" naming conventions for which a period (.) is not a valid character.
However, as of NiFi 1.5.0 (via NIFI-4612), you can specify the schema in an AvroSchemaRegistry and set "Validate Field Names" to false. This should allow you to bypass the restriction and have a field named app.name.
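To see why the original attempt fails, here is a small sketch of the spec's naming rule (this is an illustration of the rule, not NiFi's actual validator): a field name must be a single name segment, while only named types (record, enum, fixed) get dot-separated fullnames.

```python
import re

# Per the Avro 1.8.1 spec, a single name segment starts with [A-Za-z_]
# and contains only [A-Za-z0-9_]; a dot is legal only as a namespace
# separator in a fullname.
AVRO_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def is_valid_field_name(name: str) -> bool:
    # Record *field* names are plain names: no namespace, so no dots.
    return AVRO_NAME.match(name) is not None

def is_valid_fullname(name: str) -> bool:
    # Fullnames (record/enum/fixed only) are dot-separated valid segments.
    return all(AVRO_NAME.match(part) for part in name.split("."))

print(is_valid_field_name("name"))      # True
print(is_valid_field_name("app.name"))  # False -> "Illegal character in: app.name"
print(is_valid_fullname("app.name"))    # True, but only for named types
```

This is exactly why disabling "Validate Field Names" is the escape hatch: the name app.name is fine as a fullname but can never pass the field-name check.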
I need a little help in removing an ENUM and replacing it with String in an AVRO schema.
I have an avro schema file which has something like this among other entries:
{
  "name": "anonymizedLanguage",
  "type": [ "null", "com.publicevents.common.LanguageCode" ],
  "default": null
}
The LanguageCode is also an avsc file with entries as below:
{
  "name": "LanguageCode",
  "type": "enum",
  "namespace": "com.publicevents.common",
  "symbols": [ "EN", "NL", "FR", "ES" ]
}
I want to remove the language enum and replace it with a string holding the language code. How would I go about doing that?
You can only do "type": ["null", "string"]. You cannot make it "have" anything specific to a language within the schema, that's what an enum is for. Once it is a plain string, that would be app-specific validation logic to enforce it have specific values.
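As a hypothetical sketch of that app-specific validation: once the field becomes ["null", "string"], the former enum symbols can be enforced in application code, for example like this (the function and record shape below are illustrative, not part of any Avro API):

```python
# Former enum symbols from the LanguageCode avsc, now enforced by the app.
ALLOWED_LANGUAGE_CODES = {"EN", "NL", "FR", "ES"}

def validate_language(record: dict) -> None:
    """Reject records whose anonymizedLanguage string is not a known code."""
    code = record.get("anonymizedLanguage")
    if code is not None and code not in ALLOWED_LANGUAGE_CODES:
        raise ValueError(f"unexpected language code: {code!r}")

validate_language({"anonymizedLanguage": "EN"})   # ok
validate_language({"anonymizedLanguage": None})   # ok, the field is nullable
```

The trade-off is that what the schema used to guarantee at (de)serialization time now has to be checked explicitly wherever the data enters your system.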
In the Confluent documentation they write that deletion and addition of optional Avro fields preserves full Avro compatibility. I need to update an Avro schema by deleting optional fields and adding new optional fields, but the Confluent Schema Registry responds with error 409, saying the new schema is not compatible with the old one.
I'm deleting the following field (in avsc syntax):
{
  "name" : "eligibility",
  "type" : [
    {
      "type" : "array",
      "items" : "Scope"
    },
    "null"
  ]
}
and adding these fields:
{
  "name" : "partyDataExt",
  "type" : [
    {
      "type" : "record",
      "name" : "PartyDataExt",
      "fields" : [
        {
          "name" : "dayOfDeath",
          "type" : [
            { "type" : "int", "logicalType" : "date" },
            "null"
          ]
        },
        {
          "name" : "identified",
          "type" : [ "boolean", "null" ]
        }
      ]
    },
    "null"
  ]
}
and
{
  "name" : "identificationDocument",
  "type" : [ "null", "Document" ]
}
Question: What exactly is meant by an optional AVRO field? Is it the union {null, MyType}, or the presence of the default parameter, or both, or something else?
In the case of the deleted "eligibility" field, it helps if the field has "default":null. This helps also for the added "identificationDocument" field, but not for the "partyDataExt" field.
When I switch the "null" and "Document" elements in the definition of "identificationDocument", adding the default parameter doesn't help either. It seems that "null" must be the first element in the "type" array.
First of all, you'll need to understand Apache Avro.
default fields in the reader schema are for schema evolution:
default: A default value for this field, only used when reading
instances that lack the field for schema evolution purposes. The
presence of a default value does not make the field optional at
encoding time. [...] Avro encodes a field even if its value is equal
to its default.
Also, "null" goes first in a union:
Note that when a default value is specified for a record field whose
type is a union, the type of the default value must match the first
element of the union. Thus, for unions containing "null", the "null"
is usually listed first, since the default value of such unions is
typically null.
There is no such thing as "optional" fields in the Apache Avro documentation, but Confluent refers to fields having a default, which could be as simple as
"fields": [
  {
    "name": "field",
    "type": "string",
    "default": "default"
  }
]
You can also use unions with "null" (first), but you don't have to.
That's it for Avro: you can read data with a reader schema that has extra fields not present in the writer's schema (their defaults are used). Fields that are not in the reader's schema are silently ignored, which is what Confluent refers to as "deleted fields".
As for Confluent Avro, it has different compatibility rules (and a different serialisation format) than Apache Avro, but these are documented in the "Compatibility Types" page you cited.
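Putting the two rules together (a default must be present, and for a union the default must match the first branch), here is a rough sketch of a check for what Confluent treats as an optional field; the helper name is mine, not part of any registry API:

```python
def is_confluent_optional(field: dict) -> bool:
    """Hypothetical check: the field has a default, and if its type is a
    union containing "null", then "null" comes first so that
    "default": null is a legal default value."""
    if "default" not in field:
        return False
    t = field["type"]
    if isinstance(t, list) and "null" in t:
        # The default must match the *first* branch of the union.
        return t[0] == "null" and field["default"] is None
    return True

# The deleted "eligibility" field, rewritten so removal is compatible:
eligibility = {
    "name": "eligibility",
    "type": ["null", {"type": "array", "items": "Scope"}],
    "default": None,
}
print(is_confluent_optional(eligibility))  # True

# The original form: the array branch comes first, so "default": null
# does not match the first union branch and the field is not optional.
bad = {
    "name": "eligibility",
    "type": [{"type": "array", "items": "Scope"}, "null"],
    "default": None,
}
print(is_confluent_optional(bad))  # False
```

This matches the observed behaviour: flipping the union to put "null" first and adding "default": null is what makes the 409 go away.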
I have incoming Avro records that roughly follow the format below. I am able to read them and convert them in existing NiFi flows. However, a recent change requires me to read from these files and parse the nested record, employers in this example. I read the Apache NiFi blog post, Record-Oriented Data with NiFi
but was unable to figure out how to get the AvroRecordReader to parse nested records.
{
  "name": "recordFormatName",
  "namespace": "nifi.examples",
  "type": "record",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "firstName", "type": "string" },
    { "name": "lastName", "type": "string" },
    { "name": "email", "type": "string" },
    { "name": "gender", "type": "string" },
    { "name": "employers",
      "type": {
        "type": "record",
        "name": "EmployerRecord",
        "fields": [
          { "name": "company", "type": "string" },
          { "name": "guid", "type": "string" },
          { "name": "streetaddress", "type": "string" },
          { "name": "city", "type": "string" }
        ]
      }
    }
  ]
}
What I hope to achieve is a flow to read the employers records for each recordFormatName record and use the PutDatabaseRecord processor to keep track of the employers values seen. The current plan is to insert the records to a MySQL database. As suggested in an answer below, I plan on using PartitionRecord to sort the records based on a value in the employers subrecord. I do not need the top level details for this particular flow.
I have tried to parse with the AvroRecordReader but cannot figure out how to specify the nested records. Is this something that can be accomplished with the AvroRecordReader alone or does preprocessing, say a JOLT Transform need to happen first?
EDIT: Added further details about database after receiving a response.
What is your target DB and what does your target table look like? PutDatabaseRecord may not be able to handle nested records unless your DB, driver, and target table support them.
Alternatively you may need to use UpdateRecord to flatten the "employers" object into fields at the top level of the record. This is a manual process (until NIFI-4398 is implemented), but you only have 4 fields. After flattening the records, you could use PartitionRecord to get all records with a specific value for, say, employers.company. The outgoing flow files from PartitionRecord would technically constitute the distinct values for the partition field(s). I'm not sure what you're doing with the distinct values, but if you can elaborate I'd be happy to help.
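As a rough sketch of what that manual flattening would produce (the prefixed field names like employers_company are an assumption for illustration, not something UpdateRecord mandates):

```python
def flatten_employers(record: dict) -> dict:
    """Copy top-level fields and hoist each employers subfield to the
    top level under a hypothetical 'employers_' prefix."""
    flat = {k: v for k, v in record.items() if k != "employers"}
    for key, value in record.get("employers", {}).items():
        flat[f"employers_{key}"] = value
    return flat

record = {
    "id": 1, "firstName": "Ada", "lastName": "L",
    "email": "a@example.com", "gender": "F",
    "employers": {"company": "Acme", "guid": "g-1",
                  "streetaddress": "1 Main St", "city": "Springfield"},
}
print(flatten_employers(record)["employers_company"])  # Acme
```

After this shape of transformation, every field is a flat column, so both PartitionRecord (on employers_company) and PutDatabaseRecord can work with it directly.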
I've got my Kafka Streams processing configuration for AUTO_REGISTER_SCHEMAS set to true.
I noticed in this auto generated schema it creates the following 2 types
{
  "name": "id",
  "type": {
    "type": "string",
    "avro.java.string": "String"
  }
},
Could someone please explain why it creates 2 types and what exactly "avro.java.string": "String" is?
Thanks
By default, Avro's Java implementation uses CharSequence for the String representation. The following syntax lets you override that default and use java.lang.String as the String type for instances of fields declared like this:
"type": {
  "type": "string",
  "avro.java.string": "String"
}
I am trying to access a URL similar to http://example.com/service1?q1=a&q2=b. However, q1 will sometimes not have any value associated with it but is still required to access the service (http://example.com/service1?q1=&q2=b). How do I achieve this through the Swagger UI JSON? I've tried the allowEmptyValue option, but it doesn't seem to work.
Please find below the sample JSON where I tried the allowEmptyValue option:
{
  "path": "/service1.do",
  "operations": [{
    "method": "GET",
    "type": "void",
    "parameters": [{
      "name": "q1",
      "in": "query",
      "required": false,
      "type": "string",
      "paramType": "query",
      "allowEmptyValue": true
    }, {
      "name": "q2",
      "in": "query",
      "required": true,
      "type": "string",
      "paramType": "query"
    }],
    "responseMessages": [{
      "code": 200,
      "responseModel": "/successResponseModel"
    }]
  }]
}
When an empty value is passed to q1, swagger frames the URL as http://example.com/service1?q2=b. Is there any way to include q1 with an empty value in the URL (http://example.com/service1?q1=&q2=b)?
Any help would be greatly appreciated.
It looks like your problem is a known issue of swagger-ui that hasn't been fixed yet.
As a workaround, you may do one of the following.
Option 1: Specify a default value.
This option has nothing to do with swagger-ui. In your ws implementation, you have to add a default value (in your case "") to use when 'q1' is not supplied. Any REST framework has this option.
From the ws-implementation perspective, this should be there in your ws anyway, unless you have another service to be triggered when 'q1' is not supplied (which might not be a good design in most cases). And you can use this as a permanent solution, not a temporary one.
Option 2: Use enums (not a clean solution)
As explained in this, you can specify your query parameter 'q1' as follows in your swagger definition.
{
  "in": "query",
  "name": "q1",
  "type": "boolean",
  "required": false,
  "enum" : [true],
  "allowEmptyValue" : true
}
(1) "required" must be false.
(2) "allowEmptyValue" must be true.
(3) "enum" must have exactly one non-empty value.
(4) "type" must be "boolean". (or "string" with a special enum, say "INCLUDE")
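For reference, the wire format both workarounds are trying to produce is simply an empty-but-present parameter. Python's standard urlencode, for instance, keeps empty values by default, which is the behaviour swagger-ui lacks here:

```python
from urllib.parse import urlencode

# An empty string value still emits "q1=" in the query string,
# which is the URL shape the question is after.
params = {"q1": "", "q2": "b"}
url = "http://example.com/service1?" + urlencode(params)
print(url)  # http://example.com/service1?q1=&q2=b
```

Any client built this way will call the service correctly even while the swagger-ui rendering of the spec cannot.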
I managed to solve this by setting:
"required": true,
"allowEmptyValue": true
In the Swagger-UI, a checkbox will then be displayed where you can send the empty value. That worked for me. If it was checked and an empty query parameter was passed, the URL would look something like this: https://example.com?