MongoDB Stitch Document insertion not passing through Schema check

In Stitch -> Rules, I opened MyCollection, clicked the Schema tab, and entered the following:
{
  "bsonType": "object",
  "required": [
    "roadmapTitle"
  ],
  "properties": {
    "roadmapTitle": {
      "bsonType": "string"
    },
    "creatorId": {
      "bsonType": "objectId"
    }
  }
}
Saved & Deployed. However, if I insert a document like MyCollection.insertOne({nothing here: "hehe"}), the log shows success: the object is stored and no error is raised. Interestingly, running the validate function next to "edit schema" in the Schema field shows that all of those documents failed schema validation. Did I miss a step in enforcing schema validation on document insertion?
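As a hedged illustration of the document shape that schema expects, here is a minimal pymongo-style Python sketch (not the Stitch SDK); my_collection, the ObjectId value, and the sample title are hypothetical.

from bson import ObjectId  # ships with pymongo

# A document that satisfies the schema: roadmapTitle is the required string,
# creatorId is an optional objectId.
conforming_doc = {
    "roadmapTitle": "My first roadmap",
    "creatorId": ObjectId("5c8f1a2b9d3e4f0012345678"),
}

# A document like the one in the question: the required roadmapTitle is missing.
failing_doc = {"nothing here": "hehe"}

# my_collection.insert_one(conforming_doc)  # should pass validation
# my_collection.insert_one(failing_doc)     # should be rejected once rules are enforced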

Related

System for data validation and class generation (Avro vs JSON Schema vs OpenAPI)

We want to have a system that allows us to define data schemas that we can use to validate our data and to generate code in specific languages. We found JSON Schema, which lets us do something like this:
File "message.json.schema"
{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "title": "Message",
  "properties": {
    "name": {
      "type": "string"
    },
    "type": {
      "$ref": "type/message_type.schema.json"
    },
    "message_id": {
      "$ref": "type/uuid.schema.json"
    }
  },
  "required": ["name", "message_id"]
}
File "message_type.json.schema"
{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "title": "MessageType",
  "enum": ["Message", "Query"]
}
File "uuid_type.json.schema"
{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "title": "UUID",
  "type": "string",
  "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
}
File "query.json.schema"
{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "title": "Query",
  "allOf": [ { "$ref": "type/message.schema.json" } ],
  "required": ["type"]
}
Please ignore anything that doesn't quite make sense; the point is, we really like this system because it allows us to define types, refer to types defined in other files, and even use them for something like type inheritance.
We then want to use these files for code generation and validation. In Python we use a library called python_jsonschema_objects, which can parse these files (and the files they reference, recursively), and we can then very simply create a Python object with all the validation included.
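As a rough sketch of that workflow (not our exact setup), this is how python_jsonschema_objects is typically used, with the cross-file $refs inlined for brevity and made-up field values:

import python_jsonschema_objects as pjs

# Simplified Message schema: the message_id $ref is inlined as a string + UUID pattern.
message_schema = {
    "title": "Message",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "message_id": {
            "type": "string",
            "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
        },
    },
    "required": ["name", "message_id"],
}

ns = pjs.ObjectBuilder(message_schema).build_classes()
msg = ns.Message(name="hello", message_id="123e4567-e89b-12d3-a456-426614174000")
# Assigning a message_id that violates the UUID pattern raises a validation error.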
But we also want to use them for Java/Kotlin, and the library we found, jsonschema2pojo, doesn't seem able to parse linked files; it expects everything to be in the same file.
This leads us to think that, for some reason, JSON Schema is unfortunately not that well supported or widely used.
So our question is whether a system like Avro or OpenAPI would be better supported and more widely used, and would be a better choice for this type of task.

MS Graph API schema extension filter bug with Outlook resources (messages, events, contacts)

When trying to filter on a custom-created schema extension:
https://graph.microsoft.com/v1.0/me/events?$filter=(<schemaId>/<key> eq '<value>')
the error message we get is:
"message": "Could not find a property named 'e2_<ourTenantID>_<schemaId>' on type 'Microsoft.OutlookServices.Event'"
The problem is that the API prepends the tenant ID to the schema ID before performing the search, and so fails to recognize the property. It seems that the Graph API performs the search using its own internal schema ID.
Interestingly, when searching for a non-existent schema, the tenant ID is not added.
The problem occurs when filtering messages, events, and contacts alike.
Our schema extension creation JSON:
{
  "description": "Extension to help avoid duplicates",
  "targetTypes": [
    "Contact",
    "Message",
    "Event"
  ],
  "properties": [
    {
      "name": "UniqueId",
      "type": "String"
    }
  ],
  "status": "InDevelopment",
  "owner": "<appID>",
  "id": "<name>",
  "#odata.type": "#microsoft.graph.ComplexExtensionValue"
}
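For reference, a minimal sketch of issuing the filter above from Python with the requests library; the access token, schema ID, and value are the same placeholders elided in the question.

import requests

token = "<accessToken>"    # placeholder, obtained via the usual OAuth flow
schema_id = "<schemaId>"   # the id returned when the schema extension was created
url = "https://graph.microsoft.com/v1.0/me/events"
params = {"$filter": f"{schema_id}/UniqueId eq '<value>'"}  # UniqueId is the extension property
headers = {"Authorization": f"Bearer {token}"}

resp = requests.get(url, params=params, headers=headers)
print(resp.status_code, resp.json())  # currently returns the "Could not find a property" error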

How does one parse nested Avro records correctly in NiFi?

I have incoming Avro records that roughly follow the format below. I am able to read them and convert them in existing NiFi flows. However, a recent change requires me to read these files and parse the nested record, employers in this example. I read the Apache NiFi blog post Record-Oriented Data with NiFi, but was unable to figure out how to get the AvroRecordReader to parse nested records.
{
  "name": "recordFormatName",
  "namespace": "nifi.examples",
  "type": "record",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "firstName", "type": "string" },
    { "name": "lastName", "type": "string" },
    { "name": "email", "type": "string" },
    { "name": "gender", "type": "string" },
    { "name": "employers",
      "type": {
        "type": "record",
        "name": "employers",
        "fields": [
          { "name": "company", "type": "string" },
          { "name": "guid", "type": "string" },
          { "name": "streetaddress", "type": "string" },
          { "name": "city", "type": "string" }
        ]
      }
    }
  ]
}
What I hope to achieve is a flow that reads the employers record for each recordFormatName record and uses the PutDatabaseRecord processor to keep track of the employers values seen. The current plan is to insert the records into a MySQL database. As suggested in an answer below, I plan on using PartitionRecord to sort the records based on a value in the employers subrecord. I do not need the top-level details for this particular flow.
I have tried to parse with the AvroRecordReader but cannot figure out how to specify the nested records. Is this something that can be accomplished with the AvroRecordReader alone, or does preprocessing, say a JOLT transform, need to happen first?
EDIT: Added further details about the database after receiving a response.
What is your target DB and what does your target table look like? PutDatabaseRecord may not be able to handle nested records unless your DB, driver, and target table support them.
Alternatively you may need to use UpdateRecord to flatten the "employers" object into fields at the top level of the record. This is a manual process (until NIFI-4398 is implemented), but you only have 4 fields. After flattening the records, you could use PartitionRecord to get all records with a specific value for, say, employers.company. The outgoing flow files from PartitionRecord would technically constitute the distinct values for the partition field(s). I'm not sure what you're doing with the distinct values, but if you can elaborate I'd be happy to help.
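To illustrate the flattening described above, here is a hedged plain-Python sketch (outside NiFi) of lifting the four employers fields to the top level of a record; the sample values are made up.

# One record shaped like the schema in the question.
record = {
    "id": 1,
    "firstName": "Ann",
    "lastName": "Lee",
    "email": "ann@example.com",
    "gender": "F",
    "employers": {
        "company": "Acme",
        "guid": "abc-123",
        "streetaddress": "1 Main St",
        "city": "Springfield",
    },
}

# Lift the nested fields to the top level, mirroring what UpdateRecord would do
# with record-path mappings such as /company <- /employers/company.
flattened = {k: v for k, v in record.items() if k != "employers"}
flattened.update(record["employers"])

# PartitionRecord (or a plain groupby) can then group on the now top-level "company".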

Avro Schema - what is "avro.java.string": "String"

I've got my Kafka Streams processing configuration for AUTO_REGISTER_SCHEMAS set to true.
I noticed that in this auto-generated schema it creates the following two types:
{
  "name": "id",
  "type": {
    "type": "string",
    "avro.java.string": "String"
  }
},
Could someone please explain why it creates two types and what exactly "avro.java.string": "String" is?
Thanks
By default, Avro uses CharSequence for the string representation. The following syntax lets you override that default and use java.lang.String as the string type for instances of fields declared like this:
"type": {
"type": "string",
"avro.java.string": "String"
}

Join two nodes in Firebase

I'm working on an app that is supposed to show data from two nodes (Firebase). The Firebase DB is structured as:
{
  "College": {
    "4F2EAB65": {
      "id": "4F2EAB65",
      "name": "SomeCollege"
    },
    "A3C2ED31": {
      "id": "A3C2ED31",
      "name": "OtherCollege"
    },
    "F967B5A0": {
      "id": "F967B5A0",
      "name": "CoolCollege"
    }
  },
  "Student": {
    "3E20545B": {
      "college-ID": "4F2EAB65",
      "id": "3E20545B",
      "name": "A"
    },
    "6FDEE194": {
      "college-ID": "F967B5A0",
      "id": "6FDEE194",
      "name": "B"
    }
  }
}
I want to fetch student details containing "id", "name", "college-ID", and "college-Name" (I need to look up "college-Name" by "college-ID").
I've achieved this using a for loop on the front end. Is there any way to achieve this on the Firebase server, i.e. something like a SQL join?
Thanks.
There is no support for server-side joins in the Firebase Realtime Database. Client-side joins are quite normal.
The alternative is to duplicate the data upon writing, so that you don't have to read from two locations.
What's best for your application is a matter of personal preference, your comfort level with the code involved vs data duplication, and the use-cases of your app.
Client-side joins are likely not as slow as you may think. See http://stackoverflow.com/questions/35931526/speed-up-fetching-posts-for-my-social-network-app-by-using-query-instead-of-obse/35932786#35932786
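For what it's worth, here is a minimal sketch of that client-side join in Python with the Firebase Admin SDK, assuming the app is already initialized with a databaseURL; error handling is omitted.

from firebase_admin import db

colleges = db.reference("College").get() or {}
students = db.reference("Student").get() or {}

joined = []
for student in students.values():
    college = colleges.get(student["college-ID"], {})
    joined.append({
        "id": student["id"],
        "name": student["name"],
        "college-ID": student["college-ID"],
        "college-Name": college.get("name"),
    })
# joined now has one entry per student with the college name resolved client-side.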
