What is the AsyncAPI YAML equivalent representation of a Map in an Avro schema?

Trying to map the data types supported in AsyncAPI to the ones available in Avro schemas.
Avro has a Map data type, and I am trying to find a way to represent it in AsyncAPI YAML. Can anyone please advise?

There is no direct equivalent in AsyncAPI to an Avro map, but you can encode a map as an object by using the additionalProperties validation keyword:
type: object
additionalProperties:
  type: integer
might correspond to an Avro map<int>.
For complex value types, define a schema representing the complex type and use a $ref instead of a type inside additionalProperties.
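For instance, a sketch of how an Avro map with record values might be represented (the Address schema name here is illustrative, not from the question):

# Avro: {"type": "map", "values": "Address"}
# AsyncAPI / JSON Schema equivalent:
type: object
additionalProperties:
  $ref: '#/components/schemas/Address'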

Related

Can GeoJSON Feature Properties contain nested objects?

According to RFC7946 Section 3.2,
A Feature object has a member with the name "properties". The
value of the properties member is an object (any JSON object or a
JSON null value).
For years, I've been under the impression that feature properties should be one level deep. Based on the RFC, does this mean that it's valid to store a deeply nested JSON object within the feature properties?
They can indeed be arbitrary JSON, as demonstrated in the RFC:
https://www.rfc-editor.org/rfc/rfc7946#section-1.5
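For instance, a Feature along these lines (the nested details object is an illustrative addition to the RFC's own example) is valid GeoJSON:

{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [125.6, 10.1]},
  "properties": {
    "name": "Dinagat Islands",
    "details": {"population": 128000, "islands": ["Dinagat"]}
  }
}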

Partial deserialization with Apache Avro

Is it possible to deserialize a subset of fields from a large object serialized using Apache Avro without deserializing all the fields? I'm using GenericDatumReader and the GenericRecord contains all the fields.
I'm pretty sure you can't do it using GenericDatumReader, but my question is whether it is possible given the binary format of Avro.
Conceptually, binary serialization of Avro data is in-order and depth-first. As you traverse the data, record fields are serialized one after the other, lists are serialized from the top to the bottom, etc.
Within one object, there are no markers to separate fields, no tags to identify specific fields, and no index into the binary data to help quickly scan to specific fields.
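To make that concrete (a hand-worked sketch, not tool output): a record {id: long = 1, name: string = "ab"} encodes to just four bytes, with nothing to seek by:

02        id: zig-zag varint encoding of 1
04 61 62  name: zig-zag varint length 2, then the UTF-8 bytes of "ab"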
Depending on your schema, you could write custom code to skip some kinds of data ... for example, if a field is a LIST of FIXED bytes, you could read the size of the list and just jump over the data to the next field. This is pretty specific and wouldn't work for most Avro types though (notably integers are variable length when encoded).
Even in that unlikely case, I don't believe there are any helpers in the Java SDK that would be useful.
In brief, Avro isn't designed to do that, and you're probably not going to find a satisfactory way to do a projection on your Schema without deserializing the entire object. If you have a collection, column-oriented persistence like Parquet is probably the right thing to do!
It is possible if the fields you want to read occur first in the record. We do this in some cases where we want to read only the header fields of an object, not the full data which follows.
You can create a "subset" schema containing just those first fields, and pass this to GenericDatumReader. Avro will deserialise those fields, and anything which comes after will be ignored, because the schema doesn't "know" about it.
But this won't work for the general case where you want to pick out fields from within the middle of a record.
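A minimal sketch of that approach, assuming a full writer schema of {id: long, name: string, payload: bytes} and a raw binary-encoded record (the schema and field names are illustrative):

import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

public class HeaderReader {
    // Subset schema: just the leading fields of the full record.
    private static final Schema HEADER_SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"long\"},"
        + "{\"name\":\"name\",\"type\":\"string\"}]}");

    // Reads only id and name; the trailing payload bytes are never decoded.
    public static GenericRecord readHeader(byte[] encoded) throws IOException {
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(encoded, null);
        return new GenericDatumReader<GenericRecord>(HEADER_SCHEMA).read(null, decoder);
    }
}

This only works because the bytes for id and name come first; a decoder given the subset schema simply stops before payload.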

AvroData replaces nulls with schema default values

I'm using io.confluent.connect.avro.AvroData.fromConnectData to convert messages before serialization.
AvroData uses struct.get(field) to get values, which in turn replaces nulls with schema default values.
As I understand from the Avro docs, default values should be used for schema compatibility, when the reader expects a field that is missing from the writer schema (not from a particular message).
So my question is: is it correct to replace nulls with the schema default value? Or should I convert messages some other way?
The misunderstanding is that the default value is not used to replace null values; it is used to populate your field's value in case your data does not include the field at all. This is primarily used for schema evolution purposes. What you are trying to do (replace null values coming as part of your data with another value) is not possible through Avro schemas; you will need to deal with it in your program.
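To illustrate the evolution case the defaults are actually for (field names are illustrative): a reader fills in the default only when the writer's schema never had the field, not when the writer wrote an explicit null.

Writer (old) schema, with no email field:
{"type": "record", "name": "User", "fields": [
  {"name": "name", "type": "string"}
]}

Reader (new) schema; email is absent from the data, so the default applies:
{"type": "record", "name": "User", "fields": [
  {"name": "name", "type": "string"},
  {"name": "email", "type": ["null", "string"], "default": null}
]}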

Schema object without a type attribute in Swagger 2.0

Does a Schema object in Swagger/OpenAPI 2.0 have to have the type attribute or not?
On the one hand, according to the JSON Schema Draft 4 spec, not specifying the type attribute is OK and means that the instance can be of any type (an object, an array or a primitive).
On the other hand, I've seen a lot of Swagger schemas which contain Schema objects without the type attribute, but with the properties attribute, which makes it clear that the schema author wants the instance to be a proper object (and doesn't want to accept arrays or primitive as valid values).
Are all those schemas incorrect? Or is type: object implied by the presence of properties? There's nothing in either the Swagger or the JSON Schema spec that says that is the case. In fact, I've seen comments that explicitly say that's NOT the case.
Like in JSON Schema, OpenAPI schema objects do not require a type, and you are correct in that no type means any type.
"Type-specific" keywords such as properties, items, minLength, etc. do not enforce a type on the schema. It works the other way around – when an instance is validated against a schema, these keywords only apply when the instance is of the corresponding type, otherwise they are ignored. Here's the relevant part of the JSON Schema Validation spec:
4.1. Keywords and instance primitive types
Some validation keywords only apply to one or more primitive types. When the primitive type of the instance cannot be validated by a given keyword, validation for this keyword and instance SHOULD succeed.
For example, consider this schema:
Something:
  properties:
    id:
      type: integer
  required: [id]
  minLength: 8
It's a valid schema, even though it combines object-specific keywords properties and required and string-specific keyword minLength. This schema means:
If the instance is an object, it must have an integer property named id. For example, {"id": 4} and {"id": -1, "foo": "bar"} are valid, but {} and {"id": "ACB123"} are not.
If the instance is a string, it must contain at least 8 characters. "Hello, world!" is valid, but "" and "abc" are not.
Any instances of other types are valid - true, false, -1.234, [], [1, 2, 3], [1, "foo", true], etc. In OpenAPI 3.x, untyped schemas also allow null values.
If there are tools that infer the type from other keywords (for example, treat schemas with no type but with properties as always being objects), then these tools are not exactly following the OpenAPI Specification and JSON Schema.
Bottom line: If a schema must always be an object, add type: object explicitly. Otherwise you might get unexpected results.
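For instance, pinning the earlier example to objects only:

Something:
  type: object
  properties:
    id:
      type: integer
  required: [id]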

Does Neo4j support complex property types?

I have some complex property types such as polygon in my RDBMS, and I want to migrate them to Neo4j. But reading the official documentation, I find that the property type should be bool/byte/short/int/long/float/double/char/string. I am wondering, does Neo4j support complex property types?
It depends on how "complex" the data needs to be. In addition to the scalar types that you listed, a Neo4j property can also contain an array of one of those types.
So, for instance, you could store the coordinates of an N-vertex polygon in an array of size 2*N, where each X/Y pair is stored as consecutive numbers. No "complex" type is really needed.
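A quick sketch of that encoding in Cypher (the label and property names are made up):

CREATE (r:Region {
  name: 'triangle',
  // vertices stored as consecutive x,y pairs: (0,0), (4,0), (2,3)
  polygon: [0.0, 0.0, 4.0, 0.0, 2.0, 3.0]
})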
