Avro schema: adding a new field with default value - straight default value or a union with null?

So I have an avro record like so (call it v1):
record MyRecord {
    array<string> keywords;
}
I'd like to add a field caseSensitive with a default value of false (call it v2). The first approach I have is:
record MyRecord {
    array<string> keywords;
    boolean caseSensitive = false;
}
According to the schema evolution rules, this is both backward and forward compatible: a reader using the new schema v2 can read a record encoded with the old writer schema v1, filling in the missing field from its default value, and a reader using the old schema v1 can read a record encoded with the new writer schema v2, because it simply ignores the newly added field.
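For illustration, the v2-reader-reads-v1-data case looks roughly like this with the Java Avro API from Scala (a sketch; v1Schema, v2Schema and v1Bytes stand for the parsed schemas and a v1-encoded payload):

import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory

// Resolve v1-written bytes against the v2 reader schema.
val reader = new GenericDatumReader[GenericRecord](v1Schema, v2Schema)
val decoder = DecoderFactory.get().binaryDecoder(v1Bytes, null)
val record = reader.read(null, decoder)
record.get("caseSensitive") // false, filled in from the v2 default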
Another way is to add the field as a union of null and boolean with a default value of null, like so:
record MyRecord {
    array<string> keywords;
    union { null, boolean } caseSensitive = null;
}
This is also backward and forward compatible. I can see that one might prefer the second approach when there is no clear default value for a field (such as name, address, etc.), but given that my use case has a clear default value, I'm leaning toward the first solution. My question is: are there any other concerns I'm missing here?

There is a potential issue with writers in the first case: apparently writers do not apply default values. So a writer still publishing "old data" (records with only the keywords field) will blow up against the first schema, because a concrete boolean value is required at write time. The same writer using the second schema will succeed, and the caseSensitive field will be set to null in the resulting message.
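To see that failure mode concretely, here is a sketch against the Java Avro API from Scala (the NullPointerException message is approximate and version-dependent):

import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.EncoderFactory

val v2NonNull = new Schema.Parser().parse(
  """{"type":"record","name":"MyRecord","fields":[
     {"name":"keywords","type":{"type":"array","items":"string"}},
     {"name":"caseSensitive","type":"boolean","default":false}]}""")

val record = new GenericData.Record(v2NonNull)
record.put("keywords", java.util.Arrays.asList("foo"))
// caseSensitive deliberately left unset, as an "old" writer would

val encoder = EncoderFactory.get().binaryEncoder(new ByteArrayOutputStream(), null)
// Throws something like: NullPointerException: null of boolean in field caseSensitive
new GenericDatumWriter[GenericRecord](v2NonNull).write(record, encoder)
// With union { null, boolean } and default null, the same write succeeds.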

Related

How to derive record types in F#?

I'm inserting data into Azure CosmosDB via FSharp.CosmosDb. Here is the record type that I write to the DB:
[<CLIMutable>]
type DbType =
    { id: Guid
      Question: string
      Answer: int }
The persistence layer works fine but I face an inelegant redundancy. The record I'm inserting originates from the Data Transfer Object (DTO) with the following shape:
type DataType =
    { QuestionId: Guid
      Question: string
      Answer: int }
CosmosDb accepts only records with a lowercase id. Is there any way to derive DbType from DataType, or do I have to define DbType from scratch?
Is there anything à la the copy-and-update record expression record2 = { record1 with id = record1.QuestionId }, but at the type level?
There's no type-level way of deriving one record type from another the way you describe; however, you can get reasonably close with anonymous records, added in F# 4.6.
type DataType =
    { QuestionId: Guid
      Question: string
      Answer: int }

let example =
    { QuestionId = Guid.NewGuid()
      Question = "The meaning of life etc."
      Answer = 42 }

let extended =
    {| example with id = example.QuestionId |}
This gives you a value of an anonymous record type with the added field, and it may be well suited to your scenario; however, it's unwieldy to write code against such a type once it leaves the scope of the function you create it in.
If all you care about is how this single field is named, serialization libraries usually have ways of providing aliases for field names (like Newtonsoft.Json's JsonProperty attribute). Note that this might be obscured from you by the CosmosDb library you're using, which I'm not familiar with.
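For example, with Newtonsoft.Json directly it would look like this (a sketch; whether the attribute is honoured depends on the serializer your CosmosDb client uses underneath):

open System
open Newtonsoft.Json

type DataType =
    { [<JsonProperty("id")>]
      QuestionId: Guid // serialized as "id"; no separate DbType needed
      Question: string
      Answer: int }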
Another more involved approach is to use generic envelope types so that the records you persist have a uniform data store specific header across your application:
type Envelope<'record> =
    { id: string
      // <other fields as needed>
      payload: 'record }
In that case the envelope contains the fields your datastore expects in order to fulfill the contract (plus any application-specific metadata you find useful: timestamps, event types, versions, whatnot), and it spares you the effort of defining a data-store-specific version of each type you want to persist.
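Usage might then look like this (illustrative; how you derive the id value is up to you):

let toEnvelope (record: DataType) : Envelope<DataType> =
    { id = record.QuestionId.ToString()
      payload = record }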
Note that it is still a good idea in general to decouple the internal domain types from the representation you use for storage for maintainability reasons.

Update Nova resource with an empty string as the default for a field

I have an existing data model that has a unique indexed string field that defaults to an empty string.
I added an $attributes property to the model to default it to an empty string when creating a new object. That works just fine.
However, when updating the object, if the field remains empty the update fails, because the field comes back as null and the DB column is not nullable. I am not sure of the impact of making that column nullable(); there is too much code to dig through, so I can't change it.
I am thinking I could observe for an event and change the value from null to an empty string there, but I would rather take care of this in Nova.
Is there any way to tell Nova that an empty field should be saved as an empty string?
Short version: no.
Long version:
This cannot be set in Nova. An empty string will be passed as null in the HTTP request; this behaviour is part of Laravel's core. The only way to make sure the value isn't passed as null from Nova is to put form validation on it and make it a required field.
If you don't mind changing code on the Laravel app level, you can disable this behaviour by removing the ConvertEmptyStringsToNull middleware in your app's Kernel.
/app/Http/Kernel.php
protected $middleware = [
    // ...
    \Illuminate\Foundation\Http\Middleware\ConvertEmptyStringsToNull::class, // remove this line
];
However, if the DB field isn't nullable, the field has to have a value anyway, right? I'd suggest putting a mutator on the model to check for null values:
public function setMyFieldAttribute($val)
{
    $this->attributes['my_field'] = $val ?? '';
}
Or, if you want to be really certain it always has a value, add an observer for the saving event (as you already mentioned) and make sure the field has a value before the data is actually saved.
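Such an observer could be as small as this (a sketch; the model and column names are placeholders):

class MyModelObserver
{
    public function saving(MyModel $model): void
    {
        // Coerce null back to an empty string just before the row is written.
        $model->my_field = $model->my_field ?? '';
    }
}

// Registered e.g. in AppServiceProvider::boot():
MyModel::observe(MyModelObserver::class);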

Saving record in RavenDb with F# adding extra Id column

When I save a new F# Record, I'm getting an extra column called Id# in the RavenDb document, and it shows up when I load or view the object in code; it's even being converted to JSON through my F# API.
Here is my F# record type:
type Campaign = { mutable Id : string; name : string; description : string }
I'm not doing anything very exciting to save it:
let save c : Campaign =
    use session = store.OpenSession()
    session.Store(c)
    session.SaveChanges()
    c
Saving a new instance of a record creates a document with the Id of campaigns/289. Here is the full value of the document in RavenDb:
{
    "Id#": "campaigns/289",
    "name": "Recreating Id bug",
    "description": "Hello StackOverflow!"
}
Now, when I used this same database (and document) in C#, I didn't get the extra Id# value. This is what a record looks like when I saved it in C#:
{
    "Description": "Hello StackOverflow!",
    "Name": "Look this worked fine"
}
(Aside - "name" vs "Name" means I have 2 name columns in my document. I understand that problem, at least).
So my question is: How do I get rid of the extra Id# property being created when I save an F# record in RavenDb?
As noted by Fyodor, this is caused by how F# generates a backing field when you create a record type. The default contract resolver for RavenDB serializes that backing field instead of the public property.
You can change the default contract resolver in RavenDB. It will look something like this if you want to use Newtonsoft's Json.NET:
store.Conventions.JsonContractResolver <- new CamelCasePropertyNamesContractResolver()
There is an explanation for why this works here (see the section titled: "The explanation"). Briefly, the Newtonsoft library uses the public properties of the type instead of the private backing fields.
Also, instead of making the Id field mutable, I recommend putting the [<CLIMutable>] attribute on the type itself, like so:
[<CLIMutable>]
type Campaign = { Id : string; name : string; description : string }
This lets libraries mutate the values while still preventing mutation in your own code.
This is a combination of... well, you can't quite call them "bugs", so let's say "non-straightforward features" in both the F# compiler and RavenDb.
The F# compiler generates a public backing field for the Id record field. The field is named Id# (the standard pattern for all F# backing fields), and it's public because the record field is mutable. For immutable record fields, the backing fields are internal. Why the compiler needs to generate a public backing field for mutable record fields, I don't know.
Now, RavenDb, when generating the schema, apparently looks at both properties and fields. This is a bit non-standard. The usual practice is to consider only properties. But alas, Raven picks up the public field named Id#, and makes it part of the schema.
You can combat this problem in two ways:
First, you could make the Id field immutable. I'm not sure whether that would work for you or RavenDb. Perhaps not, since the Id is probably generated on insert.
Second, you could declare your Campaign not as an F# record, but as a true class:
type Campaign(id: string, name: string, description: string) =
    member val Id = id with get, set
    member val name = name
    member val description = description
This way, all backing fields stay internal and no confusion will arise. The drawback is that you have to write every field twice: first as constructor argument, then as class member.

Encode GenericData.Record field separately as encoded key

I am trying to use Avro to encode key / value pairs, but can't figure out how to encode just a single field in a schema / GenericData.Record in order to make the key.
Take this simple schema:
{"name":"TestRecord", "type":"record", "fields":[
{"name":"id", "type":"long"},
{"name":"name", "type":"string"},
{"name":"desc", "default":null, "type":["null","string"]}
]}
I am encoding records like this:
val testRecordSchema = schemaParser.parse(testRecordSchemaString)
val writer = new GenericDatumWriter[GenericRecord](testRecordSchema)
val baos = new ByteArrayOutputStream()
val encoder = EncoderFactory.get().binaryEncoder(baos, null)
val record = new org.apache.avro.generic.GenericData.Record(testRecordSchema)
record.put("id", 1L)
record.put("name", "test")
writer.write(record, encoder)
encoder.flush()
But now say I want to encode just the id field separately, to use as the key. And I want to select the field by name, because sometimes I want to use the name field as the key instead of the id.
I tried multiple permutations of GenericDatumWriter. GenericDatumWriter has a method called writeField that looks promising, but it is protected. Otherwise it looks like you have to write complete records.
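For reference, a subclass can technically reach that protected method; an untested sketch, assuming the writeField(Object, Schema.Field, Encoder, Object) overload of recent Avro versions:

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumWriter, GenericRecord}
import org.apache.avro.io.Encoder

// Leans on Avro internals; the state argument is unused by the generic writer.
class SingleFieldWriter(schema: Schema) extends GenericDatumWriter[GenericRecord](schema) {
  def writeOneField(record: GenericRecord, fieldName: String, out: Encoder): Unit =
    writeField(record, schema.getField(fieldName), out, null)
}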
I could wrap my field in a new record type defined in a new schema, for example:
{"name":"TestRecordKey", "type":"record", "fields":[
{"name":"id", "type":"long"}
]}
I'm 100% sure I can make that work, but then I have to create and manage a new record type for every key field. That's not minor, and it really seems like there should be a simpler way to do this.
As it turns out, it wasn't that difficult to create a new record-type schema with only one field (the field I want to use as the key), just like the example above:
{"name":"TestRecordKey", "type":"record", "fields":[
{"name":"id", "type":"long"}
]}
I do it on the fly, as I initialize my Schema.Parser with the payload schemas: I just create the key schema from each payload schema programmatically.
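The derivation can be a few lines (a sketch assuming a recent Avro version, where Schema.Field exposes defaultVal(); keyFieldName is whichever field should serve as the key):

import org.apache.avro.Schema
import org.apache.avro.Schema.Field

def keySchemaFor(payload: Schema, keyFieldName: String): Schema = {
  val source = payload.getField(keyFieldName)
  // Field instances cannot be attached to two schemas, so copy the field.
  val copy = new Field(source.name(), source.schema(), source.doc(), source.defaultVal())
  val keySchema = Schema.createRecord(payload.getName + "Key", null, payload.getNamespace, false)
  keySchema.setFields(java.util.Collections.singletonList(copy))
  keySchema
}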
I was hoping for a less long-hand solution, but this works. I'll still upvote and accept any cleaner solution.

Grails Spring Batch - pattern for CRUD from record format (how to implement Delete)

I'm looking at using Spring Batch within Grails using the Grails Spring Batch plugin.
If my input file contains a number of fixed-length records referring to an entity, where part of each record indicates whether it is a new item, an existing item that should be updated, or an existing item that should be deleted, what is the best pattern for fitting this into Spring Batch?
So if my possible records look like this:
// create new record of type AA, data is 12345
AAN12345
// update record of type AA, data is 12345 (assume that the data is the key and I can find the existing AA item using this key)
AAU12345
// delete record of type AA using 12345 as the key
AAD12345
I'm happy with a LineMapper that takes a line from a FlatFileItemReader and creates a new item and passes it to a writer for saving.
The LineMapper could look like:
class AaLineMapper implements LineMapper<AaItem> {
    @Override
    AaItem mapLine(String line, int lineNumber) throws Exception {
        switch (line[2]) { // third character holds the action flag: N, U or D
            case 'N':
                AaItem item = new AaItem()
                // set fields here based on line
                return item
            case 'U':
                // possibly this?
                AaItem existing = AaItem.findByKey(someValueWithinLine)
                // set fields here based on line
                return existing
            case 'D':
                // not sure on this one; deleting and returning null doesn't seem to work.
                // I thought the writer should delete the object?
                break
        }
    }
}
However, for update, am I to assume that the best way is to use Item.findByKey(12345) within the LineMapper and then modify the Item and call save() within the writer?
How do I implement a delete? If I return null from my LineMapper, the application seems to stop. I thought the writer should be deleting the object, not the mapper? Or do I just use findByKey(12345) and then pass the item to the writer with a delete flag set?
Apologies for the basic question; this is day 1 of using the framework. I'm interested in understanding best practices, please.
You are close, but not quite there. What you really need your line mapper to produce is an instance of a class that contains not only the domain instance to act on, but also a property indicating which action is to be taken (presumably by an item processor or a classifying item writer, depending on your requirements).
So something like this might work:
class MyActionContainerClass {
    Object target
    String actionType // U, D, N
}
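A writer can then branch on that flag. For illustration only (a sketch assuming GORM-style save/delete on the target and the List-based ItemWriter signature of Spring Batch 4 and earlier; a ClassifierCompositeItemWriter is the heavier-duty way to do the same routing):

import org.springframework.batch.item.ItemWriter

class MyActionItemWriter implements ItemWriter<MyActionContainerClass> {
    @Override
    void write(List<? extends MyActionContainerClass> items) {
        items.each { item ->
            switch (item.actionType) {
                case 'N':
                case 'U':
                    item.target.save()   // GORM insert-or-update
                    break
                case 'D':
                    item.target.delete()
                    break
            }
        }
    }
}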
