The complete schema is the following:
{
"type": "record",
"name": "envelope",
"fields": [
{
"name": "before",
"type": [
"null",
{
"type": "record",
"name": "row",
"fields": [
{
"name": "username",
"type": "string"
},
{
"name": "timestamp",
"type": "long"
}
]
}
]
},
{
"name": "after",
"type": [
"null",
"row"
]
}
]
}
I wanted to programmatically extract the following sub-schema:
{
"type": "record",
"name": "row",
"fields": [
{
"name": "username",
"type": "string"
},
{
"name": "timestamp",
"type": "long"
}
]
}
As you can see, the field "before" is nullable. I can extract its schema by doing:
schema.getField("before").schema()
But that schema is not a record: it is a UNION with "null" as its first branch, so I can't go inside it to fetch the schema of "row":
["null",{"type":"record","name":"row","fields":[{"name":"username","type":"string"},{"name":"tweet","type":"string"},{"name":"timestamp","type":"long"}]}]
I want to fetch the sub-schema so that I can create a GenericRecord from it. Basically, I want to create two GenericRecords, "before" and "after", and add them to the main GenericRecord created from the full schema.
Any help will be highly appreciated.
Good news: if you have a union schema, you can go inside it and fetch the list of its possible branches:
Schema unionSchema = schema.getField("before").schema();
List<Schema> unionSchemaContains = unionSchema.getTypes();
At that point, you can look through the list to find the branch whose getType() is Schema.Type.RECORD.
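Putting that together with the goal from the question (building the "before" and "after" GenericRecords and adding them to the envelope) might look like the sketch below. It assumes the Apache Avro Java library; EnvelopeExample, recordBranch and buildEnvelope are hypothetical names, and the sample values are made up. Because "after" is declared as ["null","row"], it refers to the same named "row" record, so one extracted schema serves both fields.

```java
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class EnvelopeExample {

    // The envelope schema from the question, inlined as a string.
    static final String ENVELOPE_JSON =
        "{\"type\":\"record\",\"name\":\"envelope\",\"fields\":["
      + "{\"name\":\"before\",\"type\":[\"null\",{\"type\":\"record\",\"name\":\"row\",\"fields\":["
      + "{\"name\":\"username\",\"type\":\"string\"},{\"name\":\"timestamp\",\"type\":\"long\"}]}]},"
      + "{\"name\":\"after\",\"type\":[\"null\",\"row\"]}]}";

    // Returns the schema itself if it is a record, otherwise the first RECORD branch of a union.
    static Schema recordBranch(Schema schema) {
        if (schema.getType() == Schema.Type.RECORD) {
            return schema;
        }
        if (schema.getType() != Schema.Type.UNION) {
            throw new IllegalArgumentException("Expected RECORD or UNION, got " + schema.getType());
        }
        List<Schema> branches = schema.getTypes();
        for (Schema branch : branches) {
            if (branch.getType() == Schema.Type.RECORD) {
                return branch;
            }
        }
        throw new IllegalArgumentException("Union has no RECORD branch: " + schema);
    }

    // Builds an envelope whose "before" and "after" fields both hold "row" records.
    static GenericRecord buildEnvelope(String user, long ts) {
        Schema envelope = new Schema.Parser().parse(ENVELOPE_JSON);
        Schema row = recordBranch(envelope.getField("before").schema());

        GenericRecord before = new GenericData.Record(row);
        before.put("username", user);
        before.put("timestamp", ts);

        GenericRecord after = new GenericData.Record(row);
        after.put("username", user);
        after.put("timestamp", ts + 1);

        GenericRecord outer = new GenericData.Record(envelope);
        outer.put("before", before);
        outer.put("after", after);
        return outer;
    }

    public static void main(String[] args) {
        System.out.println(buildEnvelope("alice", 1000L));
    }
}
```

recordBranch returns the first RECORD branch it finds; a union containing more than one record type would need an extra check on getName().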
Related
I'd like to validate a JSON object with JSON Schema, but the entries of that object can be repeated as many times as the user wishes: when users create their JSON, the "info" array may hold any number of objects with the same keys.
Example 1 (collection with one object):
{
"info":
[
{
"name": "aaron",
"email": "aaron.com"
}
]
}
JSON Schema of example 1:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
}
},
"required": [
"name",
"email"
]
}
Example 2 (collection with two objects):
{
"info":
[
{
"name": "aaron",
"email": "aaron.com"
},
{
"name": "misa",
"email": "misa.com"
}
]
}
JSON Schema of example 2:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"info": {
"type": "array",
"items": [
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
}
},
"required": [
"name",
"email"
]
},
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
}
},
"required": [
"name",
"email"
]
}
]
}
},
"required": [
"info"
]
}
In short, what I'm looking for is a dynamic JSON Schema that, no matter how many items the collection grows to, uses a single item definition instead of generating one per item.
As you're using draft-04, I'll quote from the draft-04 specification:
The value of "items" MUST be either an object or an array. If it is an object, this object MUST be a valid JSON Schema. If it is an array, items of this array MUST be objects, and each of these objects MUST be a valid JSON Schema.
Draft-04 specification: https://datatracker.ietf.org/doc/html/draft-fge-json-schema-validation-00#section-5.3.1
This means you want "items" to have an object value as opposed to an array of objects. A single object schema applies to every element of the array, however many there are, while an array of schemas validates elements by position.
In JSON Schema 2020-12, "items" may ONLY be a single schema, and you must use a different keyword ("prefixItems") for tuple-like validation.
This worked for me:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"info": {
"type": "array",
"items": [
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
}
},
"required": [
"name",
"email"
]
}
],
"additionalItems": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
}
},
"required": [
"name",
"email"
]
}
}
},
"required": [
"info"
]
}
Question 1
I'm wondering whether the schema below is a valid Avro schema. Note that the first object in the fields array is missing a name.
{
"name": "AgentRecommendationList",
"type": "record",
"fields": [
{
"type": {
"type": "array",
"items": {
"name": "friend",
"type": "record",
"fields": [
{
"name": "Name",
"type": "string"
},
{
"name": "phoneNumber",
"type": "string"
},
{
"name": "email",
"type": "string"
}
]
}
}
}
]
}
It is designed to target data like the following:
[
{
"Name": "1",
"phoneNumber": "2",
"email": "3"
},
{
"Name": "1",
"phoneNumber": "2",
"email": "3"
},
{
"Name": "1",
"phoneNumber": "2",
"email": "3"
}
]
Based on the reading below, it seems that an array without a name like this is not permitted:
Avro Schema failure
There is no way to define an Avro schema with an array without a field name.
https://avro.apache.org/docs/current/spec.html#schema_complex
name: a JSON string providing the name of the field (required), and
I suspect that the following is the correct one:
{
"name": "AgentRecommendationList",
"type": "record",
"fields": [
{
"name": "friends",
"type": {
"type": "array",
"items": {
"name": "friend",
"type": "record",
"fields": [
{
"name": "Name",
"type": "string"
},
{
"name": "phoneNumber",
"type": "string"
},
{
"name": "email",
"type": "string"
}
]
}
}
}
]
}
And the data should look like the following for the Avro conversion to succeed:
{
"friends": [
{
"Name": "1",
"phoneNumber": "2",
"email": "3"
},
{
"Name": "1",
"phoneNumber": "2",
"email": "3"
},
{
"Name": "1",
"phoneNumber": "2",
"email": "3"
}
]
}
Question 2
Is the schema below valid? It targets the array without a name from the first example.
{
"name": "AgentRecommendationList",
"type": "array",
"items": {
"name": "friend",
"type": "record",
"fields": [
{
"name": "Name",
"type": "string"
},
{
"name": "phoneNumber",
"type": "string"
},
{
"name": "email",
"type": "string"
}
]
}
}
I'd appreciate it if anyone could confirm my understanding. Thanks!
For question 1...
Everything you have written is right. The first schema, as you mentioned, is not valid because each field within a record needs to have a name. The corrected schema is valid and the corrected data is right for the updated schema.
For question 2...
The schema in question two is valid, but the AgentRecommendationList name will get ignored. Arrays don't have names. This might sound strange after looking at the examples in question one, but in those the name is part of the field specification, not the array.
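Both answers can be checked by handing each schema to a parser. The sketch below assumes the Apache Avro Java library; FriendSchemas and parses are hypothetical names. The exact exception type and message may vary between Avro versions, so the sketch catches the common base class AvroRuntimeException.

```java
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;

public class FriendSchemas {

    // Element type shared by all three schemas below.
    static final String FRIEND =
        "{\"name\":\"friend\",\"type\":\"record\",\"fields\":["
      + "{\"name\":\"Name\",\"type\":\"string\"},"
      + "{\"name\":\"phoneNumber\",\"type\":\"string\"},"
      + "{\"name\":\"email\",\"type\":\"string\"}]}";

    // Question 1, original: the only field has no "name".
    static final String UNNAMED_FIELD =
        "{\"name\":\"AgentRecommendationList\",\"type\":\"record\",\"fields\":["
      + "{\"type\":{\"type\":\"array\",\"items\":" + FRIEND + "}}]}";

    // Question 1, corrected: the field is called "friends".
    static final String NAMED_FIELD =
        "{\"name\":\"AgentRecommendationList\",\"type\":\"record\",\"fields\":["
      + "{\"name\":\"friends\",\"type\":{\"type\":\"array\",\"items\":" + FRIEND + "}}]}";

    // Question 2: a top-level array; its "name" attribute plays no role.
    static final String TOP_LEVEL_ARRAY =
        "{\"name\":\"AgentRecommendationList\",\"type\":\"array\",\"items\":" + FRIEND + "}";

    // Returns true if the text parses as an Avro schema.
    static boolean parses(String json) {
        try {
            new Schema.Parser().parse(json);
            return true;
        } catch (AvroRuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("unnamed field parses: " + parses(UNNAMED_FIELD));
        System.out.println("named field parses:   " + parses(NAMED_FIELD));

        Schema array = new Schema.Parser().parse(TOP_LEVEL_ARRAY);
        System.out.println("top-level type:       " + array.getType());
        System.out.println("element type:         " + array.getElementType().getFullName());
    }
}
```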
I would like to create an AVRO schema that uses fields from another schema without creating a nested record. Take the following example:
Schema A:
{
"type": "record",
"namespace": "example",
"name": "SchemaA",
"fields": [
{
"name": "foo",
"type": "string"
},
{
"name": "bar",
"type": "string"
}
]
}
I want to create Schema B which uses the fields from Schema A and adds an additional field, i.e. the result should be:
{
"type": "record",
"namespace": "example",
"name": "SchemaB",
"fields": [
{
"name": "foo",
"type": "string"
},
{
"name": "bar",
"type": "string"
},
{
"name": "abc",
"type": "string"
}
]
}
Is it possible to create this Schema B by referencing the fields of Schema A?
Note: I do not want to create a nested field like:
{
"type": "record",
"namespace": "example",
"name": "SchemaB",
"fields": [
{
"name": "SchemaA",
"type": "example.SchemaA"
},
{
"name": "abc",
"type": "string"
}
]
}
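As far as I know, the Avro JSON schema format has no construct for splicing one record's fields into another: a named type can only be referenced as a whole, which gives the nested form shown above. If the schemas are built in Java, one workaround is to assemble SchemaB programmatically from SchemaA's fields. This is a sketch assuming the Apache Avro Java library, version 1.9 or later (for Schema.Field.defaultVal()); ExtendSchema and extend are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;

public class ExtendSchema {

    static final String SCHEMA_A =
        "{\"type\":\"record\",\"namespace\":\"example\",\"name\":\"SchemaA\",\"fields\":["
      + "{\"name\":\"foo\",\"type\":\"string\"},{\"name\":\"bar\",\"type\":\"string\"}]}";

    // Builds a record named `name` in base's namespace: copies of base's fields, then `extra`.
    static Schema extend(Schema base, String name, List<Schema.Field> extra) {
        List<Schema.Field> fields = new ArrayList<>();
        for (Schema.Field f : base.getFields()) {
            // A Field object can belong to only one record, so each one is copied.
            // Aliases, sort order and custom properties are not copied in this sketch.
            fields.add(new Schema.Field(f.name(), f.schema(), f.doc(), f.defaultVal()));
        }
        fields.addAll(extra);
        return Schema.createRecord(name, base.getDoc(), base.getNamespace(), false, fields);
    }

    public static void main(String[] args) {
        Schema a = new Schema.Parser().parse(SCHEMA_A);
        List<Schema.Field> extra = new ArrayList<>();
        extra.add(new Schema.Field("abc", Schema.create(Schema.Type.STRING), null, (Object) null));
        Schema b = extend(a, "SchemaB", extra);
        System.out.println(b.toString(true));
    }
}
```

The result is an independent copy, so later changes to SchemaA are not picked up unless SchemaB is rebuilt.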
So I'm trying to parse an object with this Avro schema.
The object looks like:
myInfo: {size: 'XL'}
But it behaves as if the record type doesn't exist, and I'm getting undefined type name: data.platform_data.test_service.result.record at Function.Type.forSchema.
The schema looks like:
"avro": {
"metadata": {
"loadType": "full",
"version": "0.1"
},
"schema": {
"name": "data.platform_data.test_service.result",
"type": "record",
"fields": [
{
"name": "myInfo",
"type": "record",
"fields": [{
"name": "size",
"type": {"name":"size", "type": "string"}
}]
}
]
}
}
I should mention that I'm also using avsc for this. Does anybody have any ideas? I've tried pretty much every combination, but as far as I know the only way to parse an object like this is with a record.
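Playing around with the schema, I found that "type": "record" on the field itself is the problem: a field's "type" must be a type, so avsc looked up "record" as a type name, which is what the undefined type name error reports. I moved the record into a nested definition and it worked. The description here seems a little bit confusing.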
Change
Before:
{
"name": "myInfo",
"type": "record",
"fields": [{
"name": "size",
"type": {"name":"size", "type": "string"}
}]
}
After:
{
"name": "myInfo",
"type": {
"type": "record",
"name": "myInfo",
"fields": [
{
"name": "size",
"type": {"name":"size", "type": "string"}
}
]
}
}
Updated schema which is working:
{
"name": "data.platform_data.test_service.result",
"type": "record",
"fields": [
{
"name": "myInfo",
"type": {
"type": "record",
"name": "myInfo",
"fields": [
{
"name": "size",
"type": {"name":"size", "type": "string"}
}
]
}
}
]
}
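The same mistake can be reproduced outside avsc. The sketch below swaps in the Apache Avro Java library (an assumption; the question itself uses avsc) and parses the Before and After forms. With the Java parser, the Before form is expected to be rejected because "record" is looked up as an undefined type name, while the After form parses into a record whose myInfo field is itself a RECORD. NestedRecordField and tryParse are hypothetical names, and the exception message varies by Avro version.

```java
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;

public class NestedRecordField {

    // Broken form: "type": "record" sits on the field, so "record" is treated as a type name.
    static final String BEFORE =
        "{\"name\":\"data.platform_data.test_service.result\",\"type\":\"record\",\"fields\":["
      + "{\"name\":\"myInfo\",\"type\":\"record\",\"fields\":["
      + "{\"name\":\"size\",\"type\":{\"name\":\"size\",\"type\":\"string\"}}]}]}";

    // Working form: the field's "type" is itself a record schema.
    static final String AFTER =
        "{\"name\":\"data.platform_data.test_service.result\",\"type\":\"record\",\"fields\":["
      + "{\"name\":\"myInfo\",\"type\":{\"type\":\"record\",\"name\":\"myInfo\",\"fields\":["
      + "{\"name\":\"size\",\"type\":{\"name\":\"size\",\"type\":\"string\"}}]}}]}";

    // Reports whether the schema text parses, and the full name or error message.
    static String tryParse(String json) {
        try {
            return "parsed: " + new Schema.Parser().parse(json).getFullName();
        } catch (AvroRuntimeException e) {
            return "rejected: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse(BEFORE));
        System.out.println(tryParse(AFTER));
    }
}
```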
To make a record field nullable, the process is the same as for any other field: you need a union with "null" (as shown in the schema below):
{
"name": "data.platform_data.test_service.result",
"type": "record",
"fields": [
{
"name": "myInfo",
"type": [
"null",
{
"type": "record",
"name": "myInfo",
"fields": [
{
"name": "size",
"type": {
"name": "size",
"type": "string"
}
}
]
}
]
}
]
}
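With the union in place, myInfo may hold either null or a myInfo record. A sketch with the Apache Avro Java library (an assumption; the question itself uses avsc) that builds both kinds of value and checks them with GenericData.validate; NullableRecordField and result are hypothetical names.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class NullableRecordField {

    static final String NULLABLE =
        "{\"name\":\"data.platform_data.test_service.result\",\"type\":\"record\",\"fields\":["
      + "{\"name\":\"myInfo\",\"type\":[\"null\",{\"type\":\"record\",\"name\":\"myInfo\",\"fields\":["
      + "{\"name\":\"size\",\"type\":{\"name\":\"size\",\"type\":\"string\"}}]}]}]}";

    static final Schema RESULT = new Schema.Parser().parse(NULLABLE);

    // Builds a top-level record; pass null for size to leave myInfo unset (null).
    static GenericRecord result(String size) {
        GenericRecord outer = new GenericData.Record(RESULT);
        if (size != null) {
            // Index 1 of the union ["null", myInfo] is the record branch.
            Schema infoSchema = RESULT.getField("myInfo").schema().getTypes().get(1);
            GenericRecord info = new GenericData.Record(infoSchema);
            info.put("size", size);
            outer.put("myInfo", info);
        }
        return outer;
    }

    public static void main(String[] args) {
        GenericData data = GenericData.get();
        System.out.println("with record: " + data.validate(RESULT, result("XL")));
        System.out.println("with null:   " + data.validate(RESULT, result(null)));
    }
}
```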
I tried different things with the following web UI:
https://schema-registry-ui.landoop.com
I couldn't get the following schema into the registry:
{
"namespace": "test.avro",
"type": "record",
"name": "test",
"fields": [
{
"name": "field1",
"type": "string"
},
{
"name": "field2",
"type": "record",
"fields":[
{"name": "field1", "type": "string" },
{"name": "field2", "type": "string"},
{"name": "intField", "type": "int"}
]
}
]
}
Also, is there a way to refer to another schema from inside the current one to create a compound/nested schema?
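Have a look at the example at
https://github.com/Landoop/schema-registry-ui/issues/43
You need to define the schema as an array (an Avro union): the first element is the nested record, and the second element is the main Avro record, which refers to the nested one by name.
Also note that field2 in the schema above puts "type": "record" directly on the field; the record definition has to be nested inside the field's "type" instead.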
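A sketch of that union approach with the Apache Avro Java library (an assumption; the linked issue concerns the Landoop web UI). The nested record's name Nested is hypothetical, as are the class and method names; its fields are the ones from the question's field2.

```java
import org.apache.avro.Schema;

public class UnionOfNamedTypes {

    // First element defines test.avro.Nested; the second (main) record refers to it by name.
    static final String UNION_JSON =
        "[{\"type\":\"record\",\"namespace\":\"test.avro\",\"name\":\"Nested\",\"fields\":["
      + "{\"name\":\"field1\",\"type\":\"string\"},"
      + "{\"name\":\"field2\",\"type\":\"string\"},"
      + "{\"name\":\"intField\",\"type\":\"int\"}]},"
      + "{\"type\":\"record\",\"namespace\":\"test.avro\",\"name\":\"test\",\"fields\":["
      + "{\"name\":\"field1\",\"type\":\"string\"},"
      + "{\"name\":\"field2\",\"type\":\"test.avro.Nested\"}]}]";

    // Parses the union and returns its second branch, the main record.
    static Schema mainRecord() {
        Schema union = new Schema.Parser().parse(UNION_JSON);
        return union.getTypes().get(1);
    }

    public static void main(String[] args) {
        Schema test = mainRecord();
        System.out.println(test.getFullName());
        System.out.println(test.getField("field2").schema().getFullName());
    }
}
```

In Java there is also an alternative to the union: calling parse on the same Schema.Parser instance first for the nested record and then for the main record works, because the parser remembers the names it has already defined.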