Configure Schema Registry compatibility in upsert-like strategy - Avro

I have a Schema Registry with FULL_TRANSITIVE compatibility so that I can read old data with a new schema and vice versa, and this is completely fine for my purposes.
The problem is that I need to have, somewhere (possibly in the Schema Registry, as the latest version of the schema), the fully merged schema for my subject.
Let's have an example (everything is Avro):
I write data {"foo": "foo", "bar": "bar"} with a corresponding schema that has foo and bar as optional string fields, so we have schema version 1 for that subject.
Then I write {"foo": "foo", "bar": "bar", "baz": "baz"}, where baz is an optional string field (in addition to foo and bar), so the schema evolves to include baz and we have schema version 2.
Now I write {"foo": "foo", "bar": "bar", "id": "id"}, where id is an optional string field (in addition to foo and bar, but this time without baz, which is the problem here). The schema evolves to include id and we have schema version 3.
The issue is that in this last step, schema version 3 has only foo, bar and id as fields, and even though I can read and write perfectly thanks to the FULL_TRANSITIVE compatibility, I don't have anywhere the full schema with all of foo, bar, baz and id as optional string fields.
Is there a way to achieve this in the Schema Registry, i.e. keep FULL_TRANSITIVE compatibility but have the complete schema (with all the possible fields) as the latest version, given the constraint that I can't register a schema containing fields missing from the payload, only the fields actually included in it?
I was thinking about fetching the latest schema before writing each record, in order to build the complete record, but that is probably not good for performance.
Thank you
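To make that last idea concrete, below is a rough Ruby sketch of "fetch the latest schema, merge in any new fields, register the union", using the standard Schema Registry REST endpoints (GET /subjects/{subject}/versions/latest and POST /subjects/{subject}/versions). The registry URL, subject name, record name and fields are placeholders, not anything from the original setup. Because every merged field stays an optional string with a null default, the merged schema should still pass the FULL_TRANSITIVE check; the extra round trip per write is exactly the performance concern above, so in practice the latest schema would be cached and only re-fetched when a new field appears.

# Rough sketch: merge the latest registered schema with the fields of the
# outgoing record's schema, then register the union as the new version.
# The URL, subject and field names below are placeholders.
require "net/http"
require "json"

REGISTRY = "http://localhost:8081"
SUBJECT  = "my-topic-value"

def latest_schema(subject)
  uri = URI("#{REGISTRY}/subjects/#{subject}/versions/latest")
  JSON.parse(JSON.parse(Net::HTTP.get(uri)).fetch("schema"))
rescue StandardError
  nil # no version registered yet
end

def merge_fields(old_schema, new_schema)
  return new_schema if old_schema.nil?
  known = old_schema["fields"].map { |f| f["name"] }
  extra = new_schema["fields"].reject { |f| known.include?(f["name"]) }
  old_schema.merge("fields" => old_schema["fields"] + extra)
end

def register(subject, schema)
  uri = URI("#{REGISTRY}/subjects/#{subject}/versions")
  req = Net::HTTP::Post.new(uri, { "Content-Type" => "application/vnd.schemaregistry.v1+json" })
  req.body = { "schema" => JSON.generate(schema) }.to_json
  Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(req) }
end

# Schema for the third write in the example: foo, bar and id, but no baz.
record_schema = {
  "type" => "record", "name" => "Payload",
  "fields" => [
    { "name" => "foo", "type" => ["null", "string"], "default" => nil },
    { "name" => "bar", "type" => ["null", "string"], "default" => nil },
    { "name" => "id",  "type" => ["null", "string"], "default" => nil }
  ]
}

# The registered version now contains foo, bar, baz and id.
register(SUBJECT, merge_fields(latest_schema(SUBJECT), record_schema))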

Related

Avro schema update optional field to required

I have an Avro schema. It has a couple of fields, among which is a field that is presently optional, i.e. its type is ["null","string"].
Now I want to make it non-nullable, i.e. it should no longer have the null type.
Yet on making the change I get a 409 error.
How can I make this change without failing the Avro schema backward-compatibility check, i.e. without making a breaking change?

AvroData replaces nulls with schema default values

I'm using io.confluent.connect.avro.AvroData.fromConnectData to convert messages before serialization.
AvroData uses struct.get(field) to get values, which in turn replaces nulls with the schema default values.
As I understand from the Avro docs, default values should be used for schema compatibility when the reader expects a field that is missing from the writer's schema (not from a particular message).
So my question is: is it correct to replace nulls with the schema default value, or should I convert messages another way?
The misunderstanding is that the default value is not used to replace null values; it is used to populate a field's value when your data does not include that field at all. This is primarily used for schema evolution purposes. What you are trying to do (replace null values coming as part of your data with another value) is not possible through Avro schemas; you will need to deal with it in your program.
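A tiny plain-Ruby sketch of that distinction (this only mimics the reader-side resolution rule, it is not the Avro library): the default is applied when the field is absent from the datum, while an explicit null stays null.

# Conceptual sketch only, not the Avro library: a reader-schema default
# fills in a field that is absent from the datum; an explicit nil is kept.
READER_DEFAULTS = { "baz" => "fallback" }

def resolve(datum)
  READER_DEFAULTS.each_with_object(datum.dup) do |(name, default), out|
    out[name] = default unless out.key?(name) # only when the field is missing
  end
end

resolve({ "baz" => nil }) # => {"baz"=>nil}        explicit null is preserved
resolve({})               # => {"baz"=>"fallback"} missing field gets the default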

Implementing a unique surrogate key in Advantage Database Server

I've recently taken over support of a system which uses Advantage Database Server as its back end. For some background, I have years of database experience but have never used ADS until now, so my question is purely about how to implement a standard pattern in this specific DBMS.
There's a stored procedure which has been previously developed which manages an ID column in this manner:
#ID = (SELECT ISNULL(MAX(ID), 0) FROM ExampleTable);
#ID = #ID + 1;
INSERT INTO ExampleTable (ID, OtherStuff)
VALUES (#ID, 'Things');
--Do some other stuff.
UPDATE ExampleTable
SET AnotherColumn = 'FOO'
WHERE ID = #ID;
My problem is that I now need to run this stored procedure multiple times in parallel. As you can imagine, when I do this, the same ID value is getting grabbed multiple times.
What I need is a way to consistently create a unique value which I can be sure will be unique even if I run the stored procedure multiple times at the same moment. In SQL Server I could create an IDENTITY column called ID, and then do the following:
INSERT INTO ExampleTable (OtherStuff)
VALUES ('Things');
SET #ID = SCOPE_IDENTITY();
ADS has autoinc which seems similar, but I can't find anything conclusively telling me how to return the value of the newly created value in a way that I can be 100% sure will be correct under concurrent usage. The ADS Developer's Guide actually warns me against using autoinc, and the online help files offer functions which seem to retrieve the last generated autoinc ID (which isn't what I want - I want the one created by the previous statement, not the last one created across all sessions). The help files also list these functions with a caveat that they might not work correctly in situations involving concurrency.
How can I implement this in ADS? Should I use autoinc, some other built-in method that I'm unaware of, or do I genuinely need to do as the developer's guide suggests, and generate my unique identifiers before trying to insert into the table in the first place? If I should use autoinc, how can I obtain the value that has just been inserted into the table?
You can use LASTAUTOINC(STATEMENT) with autoinc columns.
From the documentation (under Advantage SQL->Supported SQL Grammar->Supported Scalar Functions->Miscellaneous):
LASTAUTOINC(CONNECTION|STATEMENT)
Returns the last used autoinc value from an insert or append. Specifying CONNECTION will return the last used value for the entire connection. Specifying STATEMENT returns the last used value for only the current SQL statement. If no autoinc value has been updated yet, a NULL value is returned.
Note: Triggers that operate on tables with autoinc fields may affect the last autoinc value.
Note: SQL script triggers run on their own SQL statement. Therefore, calling LASTAUTOINC(STATEMENT) inside a SQL script trigger would return the lastautoinc value used by the trigger's SQL statement, not the original SQL statement which caused the trigger to fire. To obtain the last original SQL statement's lastautoinc value, use LASTAUTOINC(CONNECTION) instead.
Example: SELECT LASTAUTOINC(STATEMENT) FROM System.Iota
Another option is to use GUIDs.
(I wasn't sure but you may have already been alluding to this when you say "or do I genuinely need to do as the developer's guide suggests, and generate my unique identifiers before trying to insert into the table in the first place." - apologies if so, but still this info might be useful for others :) )
The use of GUIDs as a surrogate key allows either the application or the database to create a unique identifier, with a guarantee of no clashes.
Advantage 12 has built-in support for a GUID datatype:
GUID and 64-bit Integer Field Types
Advantage server and clients now support GUID and Long Integer (64-bit) data types in all table formats. The 64-bit integer type can be used to store integer values between -9,223,372,036,854,775,807 and 9,223,372,036,854,775,807 with no loss of precision. The GUID (Global Unique Identifier) field type is a 16-byte data structure. A new scalar function NewID() is available in the expression engine and SQL engine to generate new GUID. See ADT Field Types and Specifications and DBF Field Types and Specifications for more information.
http://scn.sap.com/docs/DOC-68484
For earlier versions, you could store the GUIDs as a char(36). (Think about your performance requirements here of course.) You will then need to do some conversion back and forth in your application layer between GUIDs and strings. If you're using some intermediary data access layer, e.g. NHibernate or Entity Framework, you should be able to at least localise the conversions to one place.
If some part of your logic is in a stored procedure, you should be able to use the newid() or newidstring() function, depending on the type of the backing column:
INSERT INTO ExampleTable (ID, OtherStuff)
VALUES (newid(), 'Things');

Should I use a rails integer id for an imported public data table that uses fixed length strings for id?

I need to use a public government data table, imported via CSV, as a table in my Rails application (using PostgreSQL). The table in question uses a fixed-length 12-digit numeric string (often with at least one leading zero) as its primary key. I feel as if I have three choices here:
Add a Rails-generated integer primary key upon import
Ask Rails to interpret the string as an integer and use that as my primary key.
Force Rails to use the string as the primary key (and then, subsequently, as the foreign key in other associated tables as well)
I'm worried about choice 1 because I will likely need to re-import this government data wholesale at least yearly as it gets updated, in order to keep my database current. It seems like it would be really complicated to ensure that the Rails primary keys stay with the correct records after a re-import if records have been added and deleted.
Choice 2 seems like the way to go (it would solve the re-import problem), but I'm not clear how to go about it. Is it as simple as telling Rails to import that column as an integer?
Choice 3 seems doable, but from posts I've read elsewhere, it's not a very "railsy" way to go about it.
I'd welcome any advice or out and out solutions on this.
Update
I ended up using choice 1, and it was the right choice. That's because I am new to Rails and had thought I'd be doing a direct, bulk import (on the back end, directly into PostgreSQL), which would have left me with the problem I described above. However, I looked through Railscast 369, which explains how to build CSV import capability into your application as an end-user function. It's remarkably easy, and that's what I did. Because the application does a row-by-row import this way, the appropriate checks can be built in at that level, as in the sketch below.
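For anyone curious, here is a stripped-down sketch of that row-by-row import (the model and column names are made up, not from the Railscast, and it assumes a reasonably modern Rails): the Rails-generated integer id stays purely internal, while the 12-digit string from the CSV lives in its own indexed column and is what re-imports match on.

# Sketch of a row-by-row CSV import; GovernmentRecord and its columns
# (source_id, name, value) are hypothetical stand-ins.
require "csv"

class GovernmentRecordImporter
  def import(path)
    CSV.foreach(path, headers: true) do |row|
      # Match on the public 12-digit identifier, never on the Rails id,
      # so a re-import updates existing rows instead of duplicating them.
      record = GovernmentRecord.find_or_initialize_by(source_id: row["id"])
      record.update!(name: row["name"], value: row["value"])
    end
  end
end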

Change type of field in Mongoid without losing data

How can I run a migration to change the type of a field in Mongoid/MongoDB without losing any data?
In my case I'm trying to convert from a BigDecimal (stored as string) to an Integer to store some money. I need to convert the string decimal representation to cents for the integer. I don't want to lose the existing data.
I'm assuming the steps might be something like:
create new Integer field with a new name, say amount2
deploy to production and run a migration (or rake task) that converts each amount to the right value for amount2
(this whole time the existing code is still using amount and there is no downtime from the users' perspective)
take the site down for maintenance, run the migration one more time to capture any amount fields that could have changed in the last few minutes
delete amount and rename amount2 to amount
deploy new code which expects amount to be an integer
bring site back up
It looks like Mongoid offers a rename method: http://mongoid.org/docs/persistence/atomic.html#rename
But I'm a little confused about how this is used. If you have a field named amount2 (and you've already deleted amount), do you just run Transaction.rename :amount2, :amount? Then I imagine this immediately breaks the underlying representation, so you have to restart your app server after that? What happens if you run it while amount still exists? Does it get overwritten, fail, or try to convert on its own?
Thanks!
Ok I made it through. I think there is a faster way using the mongo console with something like this:
MongoDB: How to change the type of a field?
But I couldn't get the conversion working, so opted for this slower method in the rails console with more downtime. If anyone has a faster solution please post it.
create new Integer field with a new name, say amount2
convert each amount to the right value for amount2 in a console or rake task
Mongoid.identity_map_enabled = false
Transaction.all.each_with_index do |t, i|
  puts i if i % 1000 == 0
  t.amount2 = t.amount.to_money
  break if !t.save
end
Note that .all.each works fine (you don't need to use .find_each or .find_in_batches as with regular ActiveRecord on MySQL) because of MongoDB cursors. It won't fill up memory as long as the identity map is off.
take the site down for maintenance, run the migration one more time to capture any amount fields that could have changed in the last few minutes (something like Transaction.where(:updated_at.gt => 1.hour.ago).each_with_index ...)
comment out field :amount, type: BigDecimal in your model (you don't want Mongoid to know about this field anymore) and push this code
now run another script to rename your column (it overwrites any old BigDecimal string values in the process). You might need to comment out any validations you have on the model which expect the old field.
Mongoid.identity_map_enabled = false
Transaction.all.each_with_index do |t, i|
  puts i if i % 1000 == 0
  t.rename :amount2, :amount
end
This is atomic and doesn't require a save on the model.
update your model to reflect the new field type: field :amount, type: Integer
deploy and bring the site back up
As mentioned I think there is a better way, so if anyone has some tips please share. Thanks!
