How does schema evolution in JDBC Kafka Connector work? - avro

I've configured my Confluent Schema Registry's compatibility to BACKWARD_TRANSITIVE. I'm using the Confluent JDBC connector to pull incremental changes from a MySQL DB.
Suppose my initial schema v1 looks like this:
{"connect.name": "customers",
"type": "record",
"name": "user",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": "int"},
{"name": "favorite_color", "type": "string", "default": null}
]
}
I perform the following actions on the customers table in my database and schedule the JDBC connector to pull data from the database after each action (I also pushed some DML after each action to capture events into the Kafka topic):
1. I drop the favorite_color (optional) column from the customers table.
2. I add a new (optional) column address to the customers table.
When action 1 is performed I observe no change in my Avro schema. However, when action 2 is performed, I notice that the schema version bumps up to v2 with the changes below:
{"connect.name": "customers",
"type": "record",
"name": "user",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": "int"},
{"name": "address", "type": "string", "default": null}
]
}
The favorite_color column got dropped from the schema and the address column got introduced.
I noticed similar behaviour for non-optional columns as well.
Why didn't the Schema Registry update the schema when action 1 was performed?
Why did it update when action 2 was performed?
Could you please walk me through which kinds of schema changes get updated eagerly and which get updated lazily?
The behaviour is the same irrespective of the compatibility type.
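For anyone reproducing this, the registered versions can be inspected directly through the Schema Registry REST API. A rough sketch, assuming the registry runs on its default port and the connector uses the default <topic>-value subject naming (substitute your own host and topic):

# list the versions registered for the subject
GET http://localhost:8081/subjects/customers-value/versions

# fetch the schema registered as a particular version, e.g. v2
GET http://localhost:8081/subjects/customers-value/versions/2

# check whether a subject-level compatibility override is set
GET http://localhost:8081/config/customers-value

Comparing the version list before and after each action shows exactly when the connector registered a new schema version.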

how to evolve schema in confluent schema registry

I'm working with Avro/Kafka and Confluent's Schema Registry for Avro.
I made some basic schemas and subjects with basic types using avsc files and avdl.
I am looking at Confluent's API documentation to try to evolve a schema to version 2, particularly this part:
https://docs.confluent.io/current/schema-registry/using.html#register-a-new-version-of-a-schema-under-the-subject-kafka-key
But when I try to POST to this endpoint I get a 422 Conflict.
I'm using BACKWARD compatibility and am adding just one field compared to the previous version:
{
  "type": "record",
  "name": "Address",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "street", "type": "string"}
  ]
}
And the new version:
{
  "type": "record",
  "name": "Address",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "street", "type": "string"},
    {"name": "number", "type": "int"}
  ]
}
Can anyone tell me how to evolve a schema?
Solved it by setting
access.control.allow.methods=GET,POST,OPTIONS,PUT,DELETE
I had previously set CORS to allow *, but it seems that only allows GET (?), so after changing this property I was able to change/evolve the schema.
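For reference, the register call from the linked docs wraps the Avro schema as an escaped JSON string under a "schema" key. A rough sketch, with the host and the Address-value subject name assumed rather than taken from the question; note that under BACKWARD compatibility a newly added field generally needs a default, so the sketch makes number nullable with a default:

POST http://localhost:8081/subjects/Address-value/versions
Content-Type: application/vnd.schemaregistry.v1+json

{
  "schema": "{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"number\",\"type\":[\"null\",\"int\"],\"default\":null}]}"
}

A successful response returns the id the registry assigned to the new schema.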

Snipcart add item via JS API

I'm building a very small e-commerce website for selling customizable jewels, so I have a graphical configurator that lets you design the jewel and then add it to the cart. The product should have a custom field in JSON format that contains the item configuration. I see that Snipcart has data-item-custom{x} fields, but they seem to be populated only with dropdowns... that's not suitable for me.
Do you think I can handle this situation with Snipcart? Can I simply update the content of the HTML data-item- fields via JS? Or add the item to the cart via JS?
addToCart({
  name: 'Bracelet 1',
  customField1: 'JSON HERE'
})
There's a JavaScript API available for Snipcart.
It does allow you to add products dynamically; however, the syntax for custom fields is slightly different. The example from the docs for Snipcart.api.items.add shows how to use custom fields (unused fields removed for brevity):
Snipcart.api.items.add({
  "id": "SMARTPHONE",
  "name": "Smartphone",
  "url": "/",
  "price": "399.00",
  "customFields": [{
    "name": "Memory size",
    "options": "16GB|32GB[+50.00]",
    "value": "32GB"
  }]
});
So instead of the flattened version with customFieldX, you can pass an array to customFields. The dropdown format is only used if you pass an options string. For your use case this would become:
Snipcart.api.items.add({
  "id": "SMARTPHONE",
  "name": "Smartphone",
  "url": "/",
  "price": "399.00",
  "customFields": [{
    "name": "configuration",
    "value": "{\"option1\":\"value1\"}" //...
  }]
});
However, custom fields are shown to the customer, and showing them raw JSON data would not be ideal. To pass hidden data you can instead use metadata, which already expects a JSON object:
Snipcart.api.items.add({
  "id": "SMARTPHONE",
  "name": "Smartphone",
  "url": "/",
  "price": "399.00",
  "metadata": {
    "configuration": "configuration data"
  }
});
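Tying that back to the bracelet from the question, a minimal sketch (the id, price, and configuration shape are illustrative assumptions, not values from Snipcart's docs):

Snipcart.api.items.add({
  "id": "BRACELET-1",   // hypothetical product id
  "name": "Bracelet 1",
  "url": "/",
  "price": "59.00",     // hypothetical price
  "metadata": {
    "configuration": { "stone": "ruby", "chain": "silver" }  // hypothetical configurator output
  }
});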

Defining Apache Avro Schema fullname in Apache NiFi

Using NiFi 1.7.1 (which uses Java Avro 1.8.1), I'm trying to define a schema in the AvroSchemaRegistry which has the fields name and app.name at the top level. According to the Avro docs[1] I assumed that I could just define the fullname as usual, "name": "app.name", but I hit the error Illegal character in: app.name. It's true that the name portion of a fullname does not allow dots, but according to the docs: "If the name specified contains a dot, then it is assumed to be a fullname..."
I then tried using the namespace field. Using the following schema:
{
  "type": "record",
  "name": "nameRecord",
  "fields": [
    {
      "name": "name",
      "type": ["string", "null"]
    },
    {
      "name": "name",
      "namespace": "app",
      "type": ["string", "null"]
    }
  ]
}
I hit this error: Duplicate field name in record nameRecord: name type:UNION pos:1 and name type:UNION pos:0
Ultimately, I'd like to be able to define a schema for record like this (in JSON):
{
"name": "Joe",
"app.name": "NiFi"
}
[1] https://avro.apache.org/docs/1.8.1/spec.html#names
According to the docs, namespaces are only supported for record, enum, and fixed types, and other fields must adhere to the "regular" naming conventions for which a period (.) is not a valid character.
However, as of NiFi 1.5.0 (via NIFI-4612), you can specify the schema in an AvroSchemaRegistry and set "Validate Field Names" to false. This should allow you to bypass the restriction and have a field named app.name.
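A minimal sketch of what that schema could look like with "Validate Field Names" set to false (the dotted field name comes from the question; whether processors downstream of the registry accept it depends on the rest of the flow):

{
  "type": "record",
  "name": "nameRecord",
  "fields": [
    { "name": "name", "type": ["null", "string"] },
    { "name": "app.name", "type": ["null", "string"] }
  ]
}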

MS graph api schema extension filter bug with outlook resources (messages, events, contacts)

When trying to filter for a custom created schema extension:
https://graph.microsoft.com/v1.0/me/events?$filter=(<schemaId>/<key> eq '<value>')
the error message we get is:
"message": "Could not find a property named 'e2_<ourTenantID>_<schemaId>' on type 'Microsoft.OutlookServices.Event'"
The problem is that before performing the search, the API prepends the tenant ID to the schema ID and thus fails to recognize the property. It seems that the Graph API performs the search using its own internal schema ID.
Interestingly, when searching for a non-existent schema, the tenant ID is not added.
The problem persists when filtering messages, events, and contacts.
Our schema extension creation JSON:
{
  "description": "Extension to help avoid duplicates",
  "targetTypes": [
    "Contact",
    "Message",
    "Event"
  ],
  "properties": [
    {
      "name": "UniqueId",
      "type": "String"
    }
  ],
  "status": "InDevelopment",
  "owner": "<appID>",
  "id": "<name>",
  "#odata.type": "#microsoft.graph.ComplexExtensionValue"
}
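For context, this is roughly how the extension value is written to an event before the filter above is attempted (placeholders as in the question; the body shape follows the extension definition):

PATCH https://graph.microsoft.com/v1.0/me/events/<eventId>
Content-Type: application/json

{
  "<schemaId>": {
    "UniqueId": "<value>"
  }
}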

How does one parse nested Avro records correctly in NiFi?

I have incoming Avro records that roughly follow the format below. I am able to read them and convert them in existing NiFi flows. However, a recent change requires me to read from these files and parse the nested record, employers in this example. I read the Apache NiFi blog post, Record-Oriented Data with NiFi, but was unable to figure out how to get the AvroRecordReader to parse nested records.
{
  "name": "recordFormatName",
  "namespace": "nifi.examples",
  "type": "record",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "firstName", "type": "string" },
    { "name": "lastName", "type": "string" },
    { "name": "email", "type": "string" },
    { "name": "gender", "type": "string" },
    { "name": "employers",
      "type": {
        "type": "record",
        "name": "employers",
        "fields": [
          {"name": "company", "type": "string"},
          {"name": "guid", "type": "string"},
          {"name": "streetaddress", "type": "string"},
          {"name": "city", "type": "string"}
        ]
      }
    }
  ]
}
What I hope to achieve is a flow to read the employers records for each recordFormatName record and use the PutDatabaseRecord processor to keep track of the employers values seen. The current plan is to insert the records to a MySQL database. As suggested in an answer below, I plan on using PartitionRecord to sort the records based on a value in the employers subrecord. I do not need the top level details for this particular flow.
I have tried to parse with the AvroRecordReader but cannot figure out how to specify the nested records. Is this something that can be accomplished with the AvroRecordReader alone, or does preprocessing, say a JOLT transform, need to happen first?
EDIT: Added further details about database after receiving a response.
What is your target DB and what does your target table look like? PutDatabaseRecord may not be able to handle nested records unless your DB, driver, and target table support them.
Alternatively you may need to use UpdateRecord to flatten the "employers" object into fields at the top level of the record. This is a manual process (until NIFI-4398 is implemented), but you only have 4 fields. After flattening the records, you could use PartitionRecord to get all records with a specific value for, say, employers.company. The outgoing flow files from PartitionRecord would technically constitute the distinct values for the partition field(s). I'm not sure what you're doing with the distinct values, but if you can elaborate I'd be happy to help.
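A minimal sketch of the UpdateRecord + PartitionRecord configuration suggested above (the flattened field names are illustrative choices, the record writer's schema is assumed to declare them, and the RecordPaths follow the schema in the question):

UpdateRecord, with Replacement Value Strategy = Record Path Value, and one dynamic property per flattened field:
/employer_company => /employers/company
/employer_guid => /employers/guid
/employer_streetaddress => /employers/streetaddress
/employer_city => /employers/city

PartitionRecord, with one dynamic property naming the partition field:
company => /employer_company

Each flow file coming out of PartitionRecord then holds only records sharing one company value and can be sent on to PutDatabaseRecord.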
