System for data validation and class generation (Avro vs JSON Schema vs OpenAPI)

We want a system that lets us define data schemas that we can use to validate our data and to generate code in specific languages. We found JSON Schema, which lets us do something like this:
File "message.json.schema"
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
"title": "Message",
"properties": {
"name": {
"type" : "string"
},
"type": {
"$ref": "type/message_type.schema.json"
},
"message_id":{
"$ref": "type/uuid.schema.json"
}
},
"required": ["name", "message_id"]
}
File "message_type.json.schema"
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
"title": "MessageType",
"enum": ["Message", "Query"]
}
File "uuid_type.json.schema"
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
"title": "UUID",
"type": "string",
"pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
}
File "query.json.schema"
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
"title": "Query",
"allOf" : [ {"$ref": "type/message.schema.json" }],
"required": ["type"]
}
Please ignore anything that doesn't quite make sense; the point is that we really like this system because it lets us define types, refer to types defined in other files, and even use them for something like type inheritance.
We then want to use these files for code generation and validation. In Python we use a library called python_jsonschema_objects, which can parse these files and the files they reference recursively, and then lets us very simply create a Python object with all the validation included.
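For illustration, a minimal sketch of how that looks (assuming the referenced schema files resolve from the working directory; exactly how relative $refs are resolved can vary by library version):

import json
import python_jsonschema_objects as pjs

# Load the top-level schema; the builder follows $refs into the
# referenced schema files recursively.
with open("message.schema.json") as f:
    schema = json.load(f)

builder = pjs.ObjectBuilder(schema)
ns = builder.build_classes()  # one generated class per schema title

# Invalid values raise a ValidationError, e.g. a message_id that
# does not match the UUID pattern.
msg = ns.Message(
    name="hello",
    message_id="123e4567-e89b-12d3-a456-426614174000",
)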
But we also want to use them for Java/Kotlin, and the library we found there, jsonschema2pojo, doesn't seem able to follow references into linked files; it expects everything to be in the same file.
This leads us to suspect that, unfortunately, JSON Schema is not that well supported or widely used.
So our question is whether a system like Avro or OpenAPI would be better supported and more widely used, and would be the better choice for this kind of task.
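For comparison, here is a rough sketch of what our Message type might look like as an Avro schema (our own hand translation, with an illustrative namespace): Avro defines nested named types inline and has a uuid logical type, but no direct equivalent of allOf-style inheritance:

{
  "type": "record",
  "name": "Message",
  "namespace": "example",
  "fields": [
    { "name": "name", "type": "string" },
    {
      "name": "type",
      "type": ["null", {
        "type": "enum",
        "name": "MessageType",
        "symbols": ["Message", "Query"]
      }],
      "default": null
    },
    { "name": "message_id", "type": { "type": "string", "logicalType": "uuid" } }
  ]
}

The union with null mirrors "type" not being listed as required in the JSON Schema version.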

Related

MongoDB Stitch Document insertion not passing through Schema check

In Stitch -> Rules, I opened MyCollection, clicked on the Schema tab, and typed the following:
{
  "bsonType": "object",
  "required": [
    "roadmapTitle"
  ],
  "properties": {
    "roadmapTitle": {
      "bsonType": "string"
    },
    "creatorId": {
      "bsonType": "objectId"
    }
  }
}
Saved & deployed. However, if I insert a document like MyCollection.insertOne({ nothingHere: "hehe" }), the log shows OK and the object is stored; no error is raised. Interestingly, though, running the Validate function next to "edit schema" in the Schema tab shows that all of those documents failed schema validation. Did I miss a step in enforcing schema validation on document insertion?

How does one parse nested Avro records correctly in NiFi?

I have incoming Avro records that roughly follow the format below. I am able to read them and convert them in existing NiFi flows. However, a recent change requires me to read from these files and parse the nested record, employers in this example. I read the Apache NiFi blog post Record-Oriented Data with NiFi, but was unable to figure out how to get the AvroRecordReader to parse nested records.
{
  "name": "recordFormatName",
  "namespace": "nifi.examples",
  "type": "record",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "firstName", "type": "string" },
    { "name": "lastName", "type": "string" },
    { "name": "email", "type": "string" },
    { "name": "gender", "type": "string" },
    { "name": "employers",
      "type": {
        "type": "record",
        "name": "employersRecord",
        "fields": [
          { "name": "company", "type": "string" },
          { "name": "guid", "type": "string" },
          { "name": "streetaddress", "type": "string" },
          { "name": "city", "type": "string" }
        ]
      }
    }
  ]
}
What I hope to achieve is a flow to read the employers records for each recordFormatName record and use the PutDatabaseRecord processor to keep track of the employers values seen. The current plan is to insert the records to a MySQL database. As suggested in an answer below, I plan on using PartitionRecord to sort the records based on a value in the employers subrecord. I do not need the top level details for this particular flow.
I have tried to parse with the AvroRecordReader but cannot figure out how to specify the nested records. Is this something that can be accomplished with the AvroRecordReader alone, or does preprocessing, say a JOLT transform, need to happen first?
EDIT: Added further details about database after receiving a response.
What is your target DB and what does your target table look like? PutDatabaseRecord may not be able to handle nested records unless your DB, driver, and target table support them.
Alternatively you may need to use UpdateRecord to flatten the "employers" object into fields at the top level of the record. This is a manual process (until NIFI-4398 is implemented), but you only have 4 fields. After flattening the records, you could use PartitionRecord to get all records with a specific value for, say, employers.company. The outgoing flow files from PartitionRecord would technically constitute the distinct values for the partition field(s). I'm not sure what you're doing with the distinct values, but if you can elaborate I'd be happy to help.
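A rough sketch of that UpdateRecord configuration (the top-level target field names here are illustrative, and the Record Writer's schema must declare them):

Replacement Value Strategy: Record Path Value
/company = /employers/company
/guid = /employers/guid
/streetaddress = /employers/streetaddress
/city = /employers/city

Each dynamic property maps a new top-level field to the RecordPath of the nested value it should be copied from.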

Google Cloud Endpoints REST Discovery Document missing format

I've upgraded to Cloud Endpoints 2.0 which no longer supports RPC. Therefore, I generated a new discovery document and used the service generator with the REST discovery doc as input in order to generate the client library for my iOS app.
Using the new REST discovery doc I am getting the following error when trying to generate the library:
~/workspace/google-api-objectivec-client-for-rest/Source/Tools/ServiceGenerator/build/Release/ServiceGenerator discovery/servUsApi-v1-rest.discovery --outputDir GTLAPI --gtlrFrameworkName GoogleAPIClientForREST
ERROR: Failure, exception: Looking at parameter 'creditKickbackKash:creditAmount', found a type/format pair of 'number/(null)', and don't know how to map that to Objective-C
I was able to fix this manually by adding, in numerous places in the discovery doc, the "format": "double" key and value for all double parameters. Notice that creditAmount below is missing a format, like all the other doubles.
The generated discovery doc looks like this:
"creditKickbackKash": {
"httpMethod": "PUT",
"id": "servUsApi.admin.creditKickbackKash",
"parameterOrder": [
"userId",
"creditAmount"
],
"parameters": {
"userId": {
"format": "int64",
"location": "path",
"required": true,
"type": "string"
},
"creditAmount": {
"location": "path",
"required": true,
"type": "number"
}
},
"path": "creditKickbackKash/{userId}/{creditAmount}",
"response": {
"$ref": "ResultDTO"
},
"scopes": [
"https://www.googleapis.com/auth/userinfo.email"
]
}
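With that manual fix applied, the creditAmount parameter would read (the added line being the "format" key):

"creditAmount": {
  "format": "double",
  "location": "path",
  "required": true,
  "type": "number"
}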
Is anyone else having this issue? How can I get the discovery document generation to properly format the document including double number types?
I had the same problem. I rolled back from 1.9.50 to 1.9.48 and the problem is gone.

Dart Services API: how to access the /fixes through the API service?

I am looking at:
https://github.com/dart-lang/dart-services
And am trying to adapt the example given (https://dartpad.dartlang.org/2a7fd9328e0a567ee79b) to pull back information from '/api/dartservices/v1/fixes' rather than '/api/dartservices/v1/analyze'.
Apologies if I am missing something obvious here, but changing the path in the example to:
"https://dart-services.appspot.com/api/dartservices/v1/fixes";
returns an error. Does anyone know how I can get the information from '/api/dartservices/v1/fixes' rather than '/api/dartservices/v1/analyze'? Or does anyone have an example of this working?
Thanks.
Sending a POST request to https://dart-services.appspot.com/api/dartservices/v1/fixes with the data the DartPad example sends yields the error message "Missing parameter: 'offset'".
Looking at the discovery doc for the service https://dart-services.appspot.com/api/discovery/v1/apis/dartservices/v1/rest I see both analyze and fixes operations take a SourceRequest:
"SourceRequest": {
"id": "SourceRequest",
"type": "object",
"properties": {
"source": {
"type": "string",
"description": "The Dart source.",
"required": true
},
"offset": {
"type": "integer",
"description": "An optional offset into the source code.",
"format": "int32"
},
"strongMode": {
"type": "boolean",
"description": "An optional signal whether the source should be processed in strong mode"
}
}
offset is not marked as required, so maybe there is a bug in the implementation of fixes with respect to that parameter.
To make the DartPad example work, change:
Map m = {'source': textArea.value};
to
Map m = {'source': textArea.value, 'offset': 0};

Nested query parameters in Swagger 2.0

I'm documenting a Rails app with Swagger 2.0 and using Swagger-UI as the human-readable documentation/sandbox solution.
I have a resource where clients can store arbitrary metadata to query later. According to the Rails convention, the query would be submitted like so:
/posts?metadata[thing1]=abc&metadata[thing2]=def
which Rails translates to params of:
{ "metadata" => { "thing1" => "abc", "thing2" => "def" } }
which can easily be used to generate the appropriate WHERE clause for the database.
Is there any support for something like this in Swagger? I want to ultimately have Swagger-UI give some way to modify the generated request to add on arbitrary params under the metadata namespace.
This doesn't appear to be supported yet (over 2 years after you asked the question), but there's an ongoing discussion and an open ticket about adding support for this on the OpenAPI GitHub repo. They refer to this type of nesting as deepObject.
There's another open issue where an implementation was attempted here. Using the most recent stable swagger-ui release, however, I have observed it working as I expect:
"parameters": [
{
"name": "page[number]",
"in": "query",
"type": "integer",
"default": 1,
"required": false
},
{
"name": "page[size]",
"in": "query",
"type": "integer",
"default": 25,
"required": false
}
This presents the expected dialog box and works with "Try it out" against a working server.
I don't believe there is a good way to specify arbitrary keys or a selection of values (e.g. an enum), so you may have to add a parameter for every nesting option.
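Applied to the metadata example from the question, that workaround would mean declaring each expected key as its own parameter (the key names below just mirror the question; truly arbitrary keys still can't be expressed):

"parameters": [
  {
    "name": "metadata[thing1]",
    "in": "query",
    "type": "string",
    "required": false
  },
  {
    "name": "metadata[thing2]",
    "in": "query",
    "type": "string",
    "required": false
  }
]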
