Encode GenericData.Record field separately as encoded key - avro

I am trying to use Avro to encode key / value pairs, but can't figure out how to encode just a single field in a schema / GenericData.Record in order to make the key.
Take this simple schema:
{"name":"TestRecord", "type":"record", "fields":[
{"name":"id", "type":"long"},
{"name":"name", "type":"string"},
{"name":"desc", "default":null, "type":["null","string"]}
]}
I am encoding records like this:
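import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.EncoderFactory

val schemaParser = new Schema.Parser()
val testRecordSchema = schemaParser.parse(testRecordSchemaString)
val writer = new GenericDatumWriter[GenericRecord](testRecordSchema)
val baos = new ByteArrayOutputStream()
val encoder = EncoderFactory.get().binaryEncoder(baos, null)
val record = new GenericData.Record(testRecordSchema)
record.put("id", 1L)
record.put("name", "test")
writer.write(record, encoder)
encoder.flush()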
But now say I wanted to encode separately just the id field, to use as the key, and I want to do it by name because sometimes I want to use the name field as the key instead of the id.
I tried multiple permutations of GenericDatumWriter. GenericDatumWriter has a method called writeField that looks promising, but it is protected. Otherwise it looks like you have to write complete records.
I could wrap my field in a new record type defined in a new schema, for example:
{"name":"TestRecordKey", "type":"record", "fields":[
{"name":"id", "type":"long"}
]}
I'm 100% sure I can make that work, but then I have to create a new record type and manage it for every key field. That's not minor, and it really seems like there should be a simpler way to do this.

As it turns out, it wasn't that difficult just to create a new record-type schema with only one field -- the field I want to use as the key, like the example I have above:
{"name":"TestRecordKey", "type":"record", "fields":[
{"name":"id", "type":"long"}
]}
I do it on the fly, as I initialize my Schema.Parser with the payload schemas -- I just create the key schema based on the payload schema programmatically.
Was hoping for a less long-hand solution, but this works. I'll still upvote and accept any solution that is cleaner.
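Why the one-field wrapper works: per the Avro specification, a record's binary encoding is just its fields' encodings concatenated in schema order, with no header, field names or length prefix. So a TestRecordKey holding only id produces exactly the same bytes as encoding the id value against the plain "long" schema. That suggests a shortcut in the Java/Scala API that avoids the wrapper schema: build the writer from the field's own schema, e.g. `new GenericDatumWriter[AnyRef](testRecordSchema.getField("id").schema())`, and call `writer.write(record.get("id"), encoder)` (untested here). The pure-Python sketch below is not the Avro library; its helper names are hypothetical and it covers only long, string, null and null-unions, but it implements enough of the spec to check that equivalence and to pick the key field by name:

```python
# Minimal sketch of Avro binary encoding (per the Avro spec), NOT the Avro library.
# Supports only "long", "string", "null", unions of "null" with one other type, and records.

def encode_long(n: int) -> bytes:
    """Zig-zag encode a 64-bit signed int, then write it as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """A string is its UTF-8 byte length (as a long) followed by the bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def field_value(field: dict, record: dict):
    """The value set in the record, else the field's default (KeyError if neither)."""
    name = field["name"]
    return record[name] if name in record else field["default"]


def encode_value(schema, value) -> bytes:
    if schema == "long":
        return encode_long(value)
    if schema == "string":
        return encode_string(value)
    if schema == "null":
        return b""
    if isinstance(schema, list):
        # Union: branch index (as a long), then the value. Only null-vs-other unions handled.
        for index, branch in enumerate(schema):
            if (branch == "null") == (value is None):
                return encode_long(index) + encode_value(branch, value)
        raise ValueError("no matching union branch")
    if isinstance(schema, dict) and schema.get("type") == "record":
        # A record is just its fields' encodings, concatenated in schema order.
        return b"".join(encode_value(f["type"], field_value(f, value)) for f in schema["fields"])
    raise ValueError(f"unsupported schema: {schema!r}")


def encode_field(record_schema: dict, record: dict, field_name: str) -> bytes:
    """Encode one field, selected by name, against that field's own schema."""
    field = next(f for f in record_schema["fields"] if f["name"] == field_name)
    return encode_value(field["type"], field_value(field, record))


TEST_RECORD = {"name": "TestRecord", "type": "record", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "desc", "default": None, "type": ["null", "string"]},
]}
KEY_RECORD = {"name": "TestRecordKey", "type": "record", "fields": [{"name": "id", "type": "long"}]}

record = {"id": 1, "name": "test"}
id_key = encode_field(TEST_RECORD, record, "id")      # b'\x02'
name_key = encode_field(TEST_RECORD, record, "name")  # b'\x08test'
assert id_key == encode_value(KEY_RECORD, {"id": 1})  # wrapper record == bare field
```

Because the key is chosen by a field-name string, switching between id and name needs no extra schema at all in this approach.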

Related

Avro schema: adding a new field with default value - straight default value or a union with null?

So I have an Avro record like so (call it v1):
record MyRecord {
array<string> keywords;
}
I'd like to add a field caseSensitive with a default value of false (call it v2). The first approach I have is:
record MyRecord {
array<string> keywords;
boolean caseSensitive = false;
}
According to the schema evolution rules, this is both backward and forward compatible: a reader with the new schema v2 reading a record that was encoded with the old writer schema v1 will fill this field with the default value, and a reader with the older schema v1 will be able to read a record encoded with the new writer schema v2, because it will just ignore the newly added field.
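That reasoning is Avro's schema-resolution rule for records: when the reader's schema has a field the writer's lacks, the reader uses that field's default (and fails if there is none); when the writer's schema has a field the reader's lacks, the value is skipped. The IDL line `boolean caseSensitive = false;` corresponds to a JSON schema field with `"default": false`. A minimal Python sketch of the rule, working on already-decoded dicts (`resolve_record` is hypothetical, not the Avro library, and it omits the type checks and promotions real resolution performs):

```python
# Minimal sketch of Avro's record schema-resolution rule on already-decoded dicts.
# NOT the Avro library; type matching and promotion are deliberately left out.

_NO_DEFAULT = object()


def resolve_record(writer_schema: dict, reader_schema: dict, written: dict) -> dict:
    """Return the record as a reader holding reader_schema would see it."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        default = field.get("default", _NO_DEFAULT)
        if name in writer_fields:
            resolved[name] = written[name]   # known to both: use the written value
        elif default is not _NO_DEFAULT:
            resolved[name] = default         # only the reader knows it: use its default
        else:
            raise ValueError(f"{name!r} is missing from the data and has no default")
    # Fields that only the writer knows about are never copied, i.e. ignored.
    return resolved


KEYWORDS = {"name": "keywords", "type": {"type": "array", "items": "string"}}
V1 = {"type": "record", "name": "MyRecord", "fields": [KEYWORDS]}
V2 = {"type": "record", "name": "MyRecord", "fields": [
    KEYWORDS,
    {"name": "caseSensitive", "type": "boolean", "default": False},
]}

old_data = {"keywords": ["avro"]}
new_data = {"keywords": ["avro"], "caseSensitive": True}
print(resolve_record(V1, V2, old_data))  # {'keywords': ['avro'], 'caseSensitive': False}
print(resolve_record(V2, V1, new_data))  # {'keywords': ['avro']}
```

Note that the default is consulted only here, on the reading side.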
Another way to add this field is by adding a union type of null and boolean with a default value of null, like so:
record MyRecord {
array<string> keywords;
union{null, boolean} caseSensitive = null;
}
This is also backward and forward compatible. I can see that one would sometimes want to use the second approach if there is no clear default value for a field (such as name, address, etc.). But given my use case, which has a clear default value, I'm thinking of going with the first solution. My question is: are there any other concerns that I'm missing here?
There will be a potential issue with writers in the first case--apparently writers do not use default values. So a writer writing "old data" (missing the new field--so writer is publishing a record with the "keywords" field only) will blow up against the first schema. Same writer using second schema will be successful, and the "caseSensitive" field will be set to null in the resulting message.
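The answer's point concerns the writer: schema defaults are applied during resolution on the reading side, so a writer serializing a record in which caseSensitive was never set has nothing to fall back on. (In the Java API, GenericRecordBuilder does fill unset fields from their defaults, whereas a bare GenericData.Record leaves them null.) The sketch below hand-codes the bytes per the Avro spec to show the two outcomes; `encode_case_sensitive` is hypothetical and not part of any Avro library:

```python
# Minimal sketch of the WRITER side (bytes hand-coded per the Avro spec, NOT the
# Avro library). The writer encodes whatever the record holds for caseSensitive
# and never looks at the schema's "default".

BOOLEAN = "boolean"                      # first approach: boolean caseSensitive = false
NULLABLE_BOOLEAN = ["null", "boolean"]   # second approach: union{null, boolean} = null


def encode_case_sensitive(field_type, value) -> bytes:
    if field_type == BOOLEAN:
        if value is None:
            # "Old" writer code never set the field; a plain boolean cannot hold null.
            raise TypeError("caseSensitive is unset and boolean has no null value")
        return b"\x01" if value else b"\x00"  # a boolean is a single byte
    if field_type == NULLABLE_BOOLEAN:
        if value is None:
            return b"\x00"                    # union branch 0 (null), no payload
        # union branch 1 (zig-zag varint of 1 is 0x02), then the boolean byte
        return b"\x02" + (b"\x01" if value else b"\x00")
    raise ValueError(f"unsupported field type: {field_type!r}")


unset = None  # what a record built without setting caseSensitive holds
print(encode_case_sensitive(NULLABLE_BOOLEAN, unset))  # b'\x00'
try:
    encode_case_sensitive(BOOLEAN, unset)
except TypeError as err:
    print("write failed:", err)
```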

Saving record in RavenDb with F# adding extra Id column

When I save a new F# Record, I'm getting an extra column called Id# in the RavenDb document, and it shows up when I load or view the object in code; it's even being converted to JSON through my F# API.
Here is my F# record type:
type Campaign = { mutable Id : string; name : string; description : string }
I'm not doing anything very exciting to save it:
let save c : Campaign =
    use session = store.OpenSession()
    session.Store(c)
    session.SaveChanges()
    c
Saving a new instance of a record creates a document with the Id of campaigns/289. Here is the full value of the document in RavenDb:
{
"Id#": "campaigns/289",
"name": "Recreating Id bug",
"description": "Hello StackOverflow!"
}
Now, when I used this same database (and document) in C#, I didn't get the extra Id# value. This is what a record looks like when I saved it in C#:
{
"Description": "Hello StackOverflow!",
"Name": "Look this worked fine"
}
(Aside - "name" vs "Name" means I have 2 name columns in my document. I understand that problem, at least).
So my question is: How do I get rid of the extra Id# property being created when I save an F# record in RavenDb?
As noted by Fyodor, this is caused by how F# generates a backing field when you create a record type. The default contract resolver for RavenDB serializes that backing field instead of the public property.
You can change the default contract resolver in RavenDB. If you want to use the one from Newtonsoft Json.NET, it will look something like this:
store.Conventions.JsonContractResolver <- CamelCasePropertyNamesContractResolver()
There is an explanation for why this works here (see the section titled: "The explanation"). Briefly, the Newtonsoft library uses the public properties of the type instead of the private backing fields.
I also recommend that, instead of making Id mutable, you put the [<CLIMutable>] attribute on the type itself, like this:
[<CLIMutable>]
type Campaign = { Id : string; name : string; description : string }
This makes the compiler add a parameterless constructor and property setters, so libraries can populate the values while your own F# code still cannot mutate them.
This is a combination of... well, you can't quite call them "bugs", so let's say "non-straightforward features" in both F# compiler and RavenDb.
The F# compiler generates a public backing field for the Id record field. This field is named Id# (a standard pattern for all F# backing fields), and it's public, because the record field is mutable. For immutable record fields, backing fields will be internal. Why it needs to generate a public backing field for mutable record fields, I don't know.
Now, RavenDb, when generating the schema, apparently looks at both properties and fields. This is a bit non-standard. The usual practice is to consider only properties. But alas, Raven picks up the public field named Id#, and makes it part of the schema.
You can combat this problem in two ways:
First, you could make the Id field immutable. I'm not sure whether that would work for you or RavenDb. Perhaps not, since the Id is probably generated on insert.
Second, you could declare your Campaign not as an F# record, but as a true class:
type Campaign( id: string, name: string, description: string ) =
    member val Id = id with get, set
    member val name = name
    member val description = description
This way, all backing fields stay internal and no confusion will arise. The drawback is that you have to write every field twice: first as constructor argument, then as class member.

Lua nested tables, table.insert function

I started learning Lua and now I'm trying to deal with nested tables.
Basically, I want to create a kind of local "database" using JSON interaction with Lua (I found out that was the best way to store my values)...
What I'm supposed to do is scan all the members of a chat group (I'm using an unofficial Telegram API) and store some values inside a table. I was able to retrieve all the data needed, so here's the structure declared in the main function:
local dbs = load_data("./data/database.json")
dbs[tostring(msg.to.id)] = {
    gr_name = {},
    timestamp = "",
    user = { --user changes into user ids
        us_name = {},
        us_nickname = {},
        us_role = ""
    },
}
where msg.to.id contains a valid number. This is what I tried:
dbs[tostring(id)]['users'][tostring(v.peer_id)]['us_nickname'] = v.username
This one works, but this one:
dbs[tostring(id)]['users'][tostring(v.peer_id)] = table.insert(us_name,v.print_name)
(id is a correct number and matches the first field, as do all the values passed, like v.peer_id and v.print_name, so those are not the problem)
gives the error "table expected"... I'm pretty sure I have no idea how to insert an element into a table like mine.
Could one of you be so kind as to help me? I hope I explained my issue clearly enough.
Thanks in advance to everyone :)
To add a new user name to an existing user, you probably want to insert it into the sub-table like this:
table.insert(dbs[tostring(id)]['users'][tostring(v.peer_id)].us_name, v.print_name)
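Two things make that line work: table.insert modifies the table passed as its first argument and returns nothing, so assigning its result back would replace the user entry with nil; and that first argument must be the real us_name sub-table reached through the nested keys, not an undefined global us_name, which is nil and is what triggers "table expected". The answer also assumes the per-user sub-table already exists. The same pattern, sketched in Python with dicts and lists standing in for Lua tables and setdefault creating any missing level (the function name add_print_name is hypothetical; like the answer, it uses the key users rather than the user key from the declared structure):

```python
# Python analogy of the Lua fix: dicts and lists stand in for Lua tables (NOT Lua code).

def add_print_name(dbs: dict, chat_id, peer_id, print_name: str) -> None:
    """Append print_name to one user's us_name list, creating missing levels."""
    group = dbs.setdefault(str(chat_id), {"gr_name": [], "timestamp": "", "users": {}})
    user = group["users"].setdefault(
        str(peer_id), {"us_name": [], "us_nickname": [], "us_role": ""}
    )
    # Modify the existing list in place. Like Lua's table.insert, list.append
    # returns nothing, so its result must not be assigned back over the entry.
    user["us_name"].append(print_name)


dbs = {}
add_print_name(dbs, 1234, 42, "Alice")
add_print_name(dbs, 1234, 42, "Alice B.")
print(dbs["1234"]["users"]["42"]["us_name"])  # ['Alice', 'Alice B.']
```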

How can I create friendly URLs with MongoDB/Node.js?

For example, suppose that in designing a blog application I want something like
domain.com/post/729
Instead of
domain.com/post/4f89dca9f40090d974000001
Ruby has the following
https://github.com/hakanensari/mongoid-slug
Is there an equivalent in Node.js?
The id in MongoDB is actually an ObjectID, usually written as a 24-character hexadecimal string. To look a post up by the id string taken from the URL, you can convert that string back into an ObjectID with the following code (note that it expects the full 24-character hex string, so it does not turn a short number such as 729 into an id):
article_collection.db.json_serializer.ObjectID.createFromHexString(id)
where article_collection is your collection object
There are a few ways:
1- Assuming you are trying to provide a unique id for each blog post:
Why not overwrite the '_id' field of your documents in the blogs collection?
Sample document would be :
{ "_id" : 122, "content" : { "title" : ..... } }
You will have to look out for a method to generate an autoincrement id though, which is pretty easy.
This type of primary key is, however, not recommended.
http://www.mongodb.org/display/DOCS/How+to+Make+an+Auto+Incrementing+Field
2- Let the _id field remain as it is, and additionally store a key 'blogid' which is an integer. You will have to run ensureIndex on the 'blogid' field, though, to make access by blogid fast. The storage overhead would be minor, as you will only be storing one more key name and integer in each document.
Sample document would be :
{ "_id" : xxxxxxxxxx, "blogid" : 122, "content" : { "title" : ..... } }
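Both options need a way to hand out the next integer without two posts ever getting the same one. The usual MongoDB pattern keeps a separate counters collection and increments a counter document atomically; with PyMongo that is a `find_one_and_update({"_id": "blogid"}, {"$inc": {"seq": 1}}, upsert=True, return_document=ReturnDocument.AFTER)` call (shown for orientation, untested here). The self-contained sketch below uses a hypothetical in-memory CounterStore in place of the database, with a lock standing in for the atomicity of that single update:

```python
import threading


class CounterStore:
    """In-memory stand-in for a MongoDB 'counters' collection (hypothetical class).

    In MongoDB the increment would be one atomic $inc on a counter document;
    here a lock provides the same all-or-nothing read-increment-return step.
    """

    def __init__(self):
        self._seq = {}
        self._lock = threading.Lock()

    def next_sequence(self, name: str) -> int:
        with self._lock:
            self._seq[name] = self._seq.get(name, 0) + 1
            return self._seq[name]


counters = CounterStore()
post = {"blogid": counters.next_sequence("blogid"), "content": {"title": "Hello"}}
print(f"domain.com/post/{post['blogid']}")  # domain.com/post/1
```

Whichever store is used, the unique index on blogid remains the real guarantee against duplicates.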
There are a bunch of different projects on GitHub, like https://github.com/dodo/node-slug and https://github.com/stipsan/String.Slugify.js, but they focus on making valid URLs out of strings (usually the post subject or article title). I haven't seen any that take a random number and somehow produce a shorter random (?) and unique number.
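What those slug libraries do can be approximated in a few lines: fold accented characters to ASCII, lowercase, and collapse everything that is not a letter or digit into single hyphens. A rough Python analogue (not the code of node-slug or String.Slugify.js; `slugify` here is a hypothetical helper). Since titles are not guaranteed unique, a slug is usually paired with an id or token rather than used alone:

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a post title into a lowercase, hyphen-separated, URL-safe slug."""
    # Decompose accented characters (e.g. "é" -> "e" + accent) and drop non-ASCII.
    ascii_text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumerics into one hyphen, then trim the ends.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


print(slugify("Héllo, World!"))  # hello-world
```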
Personally, I just have a token field on my post object that contains a unique value that is shorter than the DB id (and a tiny bit more secure than exposing the id directly). If you are using Mongoose, the token can be generated automatically in a pre('save') hook on your Mongoose model's schema.
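The token approach boils down to generating a short random value and retrying if it is already taken. A minimal Python sketch, assuming an in-memory set as a stand-in for a unique index on the token field (`new_post_token` and its `nbytes` parameter are hypothetical; in a Mongoose app the equivalent logic would live in the pre('save') hook):

```python
import secrets


def new_post_token(existing: set, nbytes: int = 6) -> str:
    """Return a short random URL-safe token that is not already in `existing`.

    `existing` stands in for a unique index on the token field; against a real
    database you would insert and retry on a duplicate-key error instead.
    """
    while True:
        token = secrets.token_urlsafe(nbytes)  # 6 random bytes -> 8 URL-safe characters
        if token not in existing:
            existing.add(token)
            return token


tokens = set()
print("domain.com/post/" + new_post_token(tokens))  # e.g. an 8-character token
```

Eight characters is a third of the 24 hex characters of an ObjectID, and unlike a sequential number the next value cannot be guessed.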

Persisting string array in single column

Our main domain object has multiple string[] properties (various configuration options), and we are thinking of an elegant way to persist the data. GORM creates a joined table for each array, so we end up with about a dozen joined tables.
I wonder if it would be possible to serialize each array into a single column of the main table (delimited somehow) and parse it back into an array on load?
Do you have suggestions on how to do this? I'm thinking of either a Hibernate UserType or a Grails property editor. I spent some time with UserTypes, but without luck.
thanks
pk
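A delimited column only round-trips safely if elements that contain the delimiter are escaped; otherwise ["a,b"] and ["a", "b"] are stored as the same text. A minimal Python sketch of one escaping scheme (the helper names are hypothetical; a UserType or property editor would need to perform the same two conversions). It maps an empty array to None, i.e. a SQL NULL, so that [] and [""] stay distinguishable:

```python
def join_escaped(values, sep=","):
    """Serialize a list of strings into one column value (None for an empty list).

    Backslashes are doubled and each separator inside a value gets a backslash
    in front. `sep` must be a single character other than a backslash.
    """
    if not values:
        return None
    return sep.join(v.replace("\\", "\\\\").replace(sep, "\\" + sep) for v in values)


def split_escaped(text, sep=","):
    """Inverse of join_escaped: split on unescaped separators, then unescape."""
    if text is None:
        return []
    values, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            current.append(text[i + 1])  # escaped character is taken literally
            i += 2
            continue
        if ch == sep:
            values.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    values.append("".join(current))
    return values


options = ["fast", "a,b", "C:\\temp", ""]
column = join_escaped(options)
assert split_escaped(column) == options
```

The JSON approach below avoids writing this by hand, at the cost of a slightly larger column.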
You could put the parameters into a map / array, then store them in a DB field (a String property such as paramJson) as JSON:
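import grails.converters.JSON

def someDomainInstance = new SomeDomain()
def paramMap = [name:'John', age:24]
// serialize the map to a JSON string and store it in the String column
someDomainInstance.paramJson = (paramMap as JSON).toString()
someDomainInstance.save()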
Then you can easily convert this string back to a map / array when you interrogate the DB:
def paramMapFromDB = JSON.parse(someDomainInstance.paramJson)
assertEquals 24, paramMapFromDB.age
Something like that, I haven't tested the syntax, but that's the general idea.
