AVRO schema optional fields are not compatible

The Confluent documentation states that deleting and adding optional AVRO fields preserves full AVRO compatibility. I need to update an AVRO schema by deleting optional fields and adding new optional fields, but the Confluent Schema Registry responds with error 409, saying that the new schema is not compatible with the old schema.
I'm deleting the following field (in avsc syntax):
{
"name" : "eligibility",
"type" : [ {
"type" : "array",
"items" : "Scope"
}, "null" ]
}
and adding these fields:
{
"name" : "partyDataExt",
"type" : [ {
"type" : "record",
"name" : "PartyDataExt",
"fields" : [ {
"name" : "dayOfDeath",
"type" : [ {
"type" : "int",
"logicalType" : "date"
}, "null" ]
}, {
"name" : "identified",
"type" : [ "boolean", "null" ]
} ]
}, "null" ]
}
and
{
"name" : "identificationDocument",
"type" : [ "null", "Document" ]
}
Question: What exactly is meant by an optional AVRO field? Is it the union {null, MyType}, or the presence of the default parameter, or both, or something else?
In the case of the deleted "eligibility" field, it helps if the field has "default":null. This helps also for the added "identificationDocument" field, but not for the "partyDataExt" field.
When I switch the "null" and "Document" elements in the definition of "identificationDocument", adding the default parameter doesn't help either. It seems that "null" must be the first element in the "type" array.

First of all, you need to understand how Apache Avro treats defaults.
A field's default in the reader schema exists for schema evolution:
default: A default value for this field, only used when reading
instances that lack the field for schema evolution purposes. The
presence of a default value does not make the field optional at
encoding time. [...] Avro encodes a field even if its value is equal
to its default.
Also, "null" goes first in a union:
Note that when a default value is specified for a record field whose
type is a union, the type of the default value must match the first
element of the union. Thus, for unions containing "null", the "null"
is usually listed first, since the default value of such unions is
typically null.
The Apache Avro documentation has no notion of “optional” fields; Confluent uses the term for fields that have a default, which could be as simple as
"fields": [
{
"name": "field",
"type": "string",
"default": "default"
}
]
You can also use a union with "null" listed first, but you don't have to.
That's it for Avro: you can read data with a reader schema that has extra fields not present in the writer's schema, as long as those fields have defaults. Fields that are not in the reader schema are silently ignored, which is what Confluent refers to as “deleted fields”.
As for Confluent Avro, they have different compatibility rules (and a different serialisation format) than Apache Avro, but these are documented in “Compatibility Types” you cited.
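The first-branch rule quoted above accounts for the behaviour described in the question. Here is a minimal Python sketch (standard library only) that applies the rule to the question's fields. The helper names and the field name identificationDocumentSwapped are hypothetical. The check is deliberately simplified: it does not resolve named types such as Document, and it is not how the Schema Registry itself validates schemas.

```python
import json

def first_branch_type(field_type):
    """Return the type name of the first union branch (or of a non-union type)."""
    branch = field_type[0] if isinstance(field_type, list) else field_type
    return branch["type"] if isinstance(branch, dict) else branch

def default_matches_first_branch(field):
    """Simplified check: does the default fit the first branch of the type?"""
    if "default" not in field:
        return False  # no default at all
    checks = {
        "null": lambda d: d is None,
        "boolean": lambda d: isinstance(d, bool),
        "string": lambda d: isinstance(d, str),
        "record": lambda d: isinstance(d, dict),
        "array": lambda d: isinstance(d, list),
    }
    check = checks.get(first_branch_type(field["type"]))
    # Unknown or named types (e.g. "Document") are not resolved in this sketch.
    return bool(check and check(field["default"]))

# Fields from the question, each with "default": null added.
# The nested fields of PartyDataExt are left out for brevity.
fields = json.loads("""
[
  {"name": "identificationDocument", "type": ["null", "Document"], "default": null},
  {"name": "identificationDocumentSwapped", "type": ["Document", "null"], "default": null},
  {"name": "partyDataExt",
   "type": [{"type": "record", "name": "PartyDataExt", "fields": []}, "null"],
   "default": null}
]
""")

results = {f["name"]: default_matches_first_branch(f) for f in fields}
for name, ok in results.items():
    print(name, "ok" if ok else "default does not match first branch")
```

Under this rule, a null default only fits partyDataExt once "null" is listed first in its union, which matches what was observed for identificationDocument.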

Related

Replace a field of enum value in AVRO schema to string

I need a little help in removing an ENUM and replacing it with String in an AVRO schema.
I have an avro schema file which has something like this among other entries:
{
"name": "anonymizedLanguage",
"type": [
"null",
"com.publicevents.common.LanguageCode"
],
"default": null
}
The LanguageCode is also an avsc file with entries as below:
{
"name": "LanguageCode",
"type": "enum",
"namespace": "com.publicevents.common",
"symbols": [
"EN",
"NL",
"FR",
"ES"
]
}
I want to remove the language enum and replace it with a string holding the language code. How would I go about doing that?
You can only change the type to "type": ["null", "string"]. You cannot restrict the string to specific language codes within the schema; that is what an enum is for. Once the field is a plain string, enforcing specific values becomes app-specific validation logic.
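As a rough illustration of that app-side validation, here is a Python sketch. The function name and the choice to raise ValueError are assumptions; the allowed codes are copied from the LanguageCode enum in the question.

```python
# Symbols copied from the original LanguageCode enum.
ALLOWED_LANGUAGE_CODES = frozenset({"EN", "NL", "FR", "ES"})

# The field as it would look after the change, written as a Python dict.
anonymized_language_field = {
    "name": "anonymizedLanguage",
    "type": ["null", "string"],
    "default": None,
}

def validate_language(value):
    """Accept None (the "null" branch) or one of the known language codes."""
    if value is None:
        return None
    if value not in ALLOWED_LANGUAGE_CODES:
        raise ValueError(f"unsupported language code: {value!r}")
    return value
```

A producer would call validate_language before serializing a record, since the schema itself no longer rejects unknown codes.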

Can I delete an element from my AVRO schema

Can I delete an element from my AVRO schema? Specifically, can I remove the enum field below? The reason is that I want to add a list type instead, which can take multiple values from the same enum.
"fields": [{
"name": "etype",
"type":
{
"type": "enum",
"name": "EFilter",
"symbols" : ["ONE", "TWO", "THREE", "FOUR"]
},
"doc": "event types"
},
Are you using the Schema Registry?
If so, you could try removing the field and posting the new schema against the latest version of the schema:
https://docs.confluent.io/current/schema-registry/develop/api.html#heading2-4
Removing a field is considered a backwards compatible change.
One option is to just add your new list field, then populate the enum with some dummy value during serialization, and ignore it during deserialization.
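Before registering anything, the change can be tried as a dry run against the registry's compatibility endpoint. The Python sketch below builds such a request using only the standard library. The record name Event, the field name etypes, the subject events-value and the registry URL are all hypothetical, and the empty-array default is an assumption made so that old records can still be read.

```python
import json
import urllib.parse
import urllib.request

# New schema: the single-valued "etype" field is replaced by an array field.
# "Event" and "etypes" are hypothetical names.
new_schema = {
    "type": "record",
    "name": "Event",
    "fields": [{
        "name": "etypes",
        "type": {
            "type": "array",
            "items": {
                "type": "enum",
                "name": "EFilter",
                "symbols": ["ONE", "TWO", "THREE", "FOUR"],
            },
        },
        "default": [],
        "doc": "event types",
    }],
}

def build_compatibility_request(registry_url, subject, schema):
    """Build (but do not send) a compatibility check against the latest version."""
    url = "{}/compatibility/subjects/{}/versions/latest".format(
        registry_url.rstrip("/"), urllib.parse.quote(subject, safe=""))
    # The registry expects the schema as a JSON string inside a JSON envelope.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"})

request = build_compatibility_request(
    "http://localhost:8081", "events-value", new_schema)

# Sending the request requires a running registry, for example:
#   with urllib.request.urlopen(request) as resp:
#       print(json.load(resp))
```

The response should indicate whether the new schema is compatible with the latest registered version under the subject's configured compatibility level.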

Defining Apache Avro Schema fullname in Apache NiFi

Using NiFi 1.7.1 (which uses Java Avro 1.8.1) and in the AvroSchemaRegistry, I'm trying to define a schema which has the fields name and app.name at the top level. According to the Avro docs[1] I would assume that I could just define the fullname like normal "name": "app.name" but I hit the error Illegal character in: app.name. It's true that the name portion of the fullname does not allow dots but according to the docs: "If the name specified contains a dot, then it is assumed to be a fullname..."
I then tried using the namespace field. Using the following schema:
{
"type": "record",
"name": "nameRecord",
"fields": [
{
"type": [
"string",
"null"
],
"name": "name"
},
{
"type": [
"string",
"null"
],
"namespace": "app",
"name": "name"
}
]
}
I hit this error: Duplicate field name in record nameRecord: name type:UNION pos:1 and name type:UNION pos:0
Ultimately, I'd like to be able to define a schema for record like this (in JSON):
{
"name": "Joe",
"app.name": "NiFi"
}
[1] https://avro.apache.org/docs/1.8.1/spec.html#names
According to the docs, namespaces are only supported for record, enum, and fixed types, and other fields must adhere to the "regular" naming conventions for which a period (.) is not a valid character.
However as of NiFi 1.5.0 (via NIFI-4612), you could specify the schema in an AvroSchemaRegistry, and set "Validate Field Names" to false. This should allow you to bypass the restriction of having a field's name be app.name.
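The naming rule behind the "Illegal character" error can be sketched in Python. The pattern follows the name rule in the Avro specification (start with a letter or underscore, then letters, digits or underscores); the function name is hypothetical.

```python
import re

# Avro names: [A-Za-z_] followed by any number of [A-Za-z0-9_].
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_valid_field_name(name):
    """Return True if the whole string is a legal Avro field name."""
    return AVRO_NAME.fullmatch(name) is not None

for candidate in ("name", "app.name", "app_name"):
    print(candidate, is_valid_field_name(candidate))
```

The dot in app.name fails this check, which is why strict validation rejects it as a field name.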

Avro Schema - what is "avro.java.string": "String"

I've got my Kafka Streams processing configuration for AUTO_REGISTER_SCHEMAS set to true.
I noticed in this auto generated schema it creates the following 2 types
{
"name": "id",
"type": {
"type": "string",
"avro.java.string": "String"
}
},
Could someone please explain why it creates 2 types and what exactly "avro.java.string": "String" is.
Thanks
By default, Avro uses CharSequence as the Java representation of strings. The following syntax overrides that default and uses java.lang.String for fields declared like this:
"type": {
"type": "string",
"avro.java.string": "String"
}

How to pass empty value to a query param in swagger?

I am trying to access an url similar to http://example.com/service1?q1=a&q2=b. However q1 will not have any values associated with it sometimes but is required to access the service (http://example.com/service1?q1=&q2=b). How do I achieve this through swagger ui JSON. I've tried using allowEmptyValue option but it doesn't seem to work.
Please find below the sample JSON I tried using allowEmptyValue option,
{
"path": "/service1.do",
"operations": [{
"method": "GET",
"type": "void",
"parameters": [{
"name": "q1",
"in" : "query",
"required": false,
"type": "string",
"paramType": "query",
"allowEmptyValue": true
},{
"name": "q2",
"in" : "query",
"required": true,
"type": "string",
"paramType": "query"
}
],
"responseMessages": [{
"code": 200,
"responseModel": "/successResponseModel"
}]
}]
}
When an empty value is passed to q1, swagger frames the URL as http://example.com/service1?q2=b. Is there any way to have q1 included in the URL with an empty value (http://example.com/service1?q1=&q2=b)?
Any help would be greatly appreciated.
It looks like your problem is a known issue in swagger-ui that hasn't been fixed yet.
As a workaround, you can do one of the following.
Option 1: Specify a default value.
This option has nothing to do with swagger-ui. In your web service implementation, add a default value (in your case "") to use when 'q1' is not supplied. Any REST framework has this option.
From the implementation's perspective, this default should be there anyway, unless a different service is meant to be triggered when 'q1' is absent (which is not a good design in most cases). You can use this as a permanent solution, not just a temporary one.
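A framework-independent Python sketch of Option 1, using only the standard library: the hypothetical helper read_q1 treats a missing q1 the same as an empty one. Real REST frameworks offer their own default-value mechanisms, which are not shown here.

```python
from urllib.parse import parse_qs, urlencode

def read_q1(query_string, default=""):
    """Return q1 from a query string, falling back to a default when absent."""
    # keep_blank_values=True preserves parameters sent as "q1=".
    params = parse_qs(query_string, keep_blank_values=True)
    return params.get("q1", [default])[0]

# The URL the question wants swagger-ui to produce:
wanted_query = urlencode({"q1": "", "q2": "b"})

# With a server-side default, both forms yield the same value for q1.
with_empty_q1 = read_q1(wanted_query)
without_q1 = read_q1("q2=b")
```

With this in place, the service no longer depends on whether the client includes the empty parameter.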
Option 2: using enums (not a consistent solution)
You can specify your query parameter 'q1' as follows in your swagger definition.
{
"in": "query",
"name": "q1",
"type": "boolean",
"required": false,
"enum" : [true],
"allowEmptyValue" : true
}
(1) "required" must be false.
(2) "allowEmptyValue" must be true.
(3) "enum" must have exactly one non-empty value.
(4) "type" must be "boolean". (or "string" with a special enum, say "INCLUDE")
I managed to solve this by setting:
"required": true,
"allowEmptyValue": true
In the Swagger-UI a checkbox will then be displayed where you can send the empty value. That worked for me. If that was checked and an empty query parameter was passed, the URL would look something like this: https://example.com?
