What is the recommended Avro type namespace/name naming scheme with respect to schema evolution?

What is the recommended naming scheme for Avro types, so that schema evolution works with backward and forward compatibility and with schema imports? How do you name your types? How many Schema.Parser instances do you use? One per schema, one global, or some other scheme?

Namespaces and type names don't need a special naming scheme to address compatibility.
If you need to rename something, that's what aliases are for.
From what I've seen, using a parser more than once per schema causes some issues with the state maintained by the parser.

So technically you have two options, each with its own benefits and drawbacks:
A) include a version identifier in the namespace or type name
B) do NOT include a version identifier in the namespace or type name
Explanation: If you want to use schema evolution, you do not need to include a version number, as both Confluent Schema Registry and single-object encoding do use namespaces, and use some sort of hash/modified CRC as a schema fingerprint. When deserializing bytes, you have to know the writer schema, and you can then evolve it into the reader schema. These two do not need to have the same full name, as schema resolution does not take the namespace into account; it compares unqualified names only, and aliases cover renamed types (https://avro.apache.org/docs/current/spec.html#Schema+Resolution). On the other hand, a single Schema.Parser cannot parse more than one schema with the same Name, which is the fully qualified type name of the schema, i.e. namespace.name. So it depends on your use case which one you want to use; both can be used.
ad A) If you do include a version identifier, you will be able to parse both (or all) versions using the same Schema.Parser, which means that, for example, these schemas can be processed together by avro-maven-plugin (sorry, I do not remember whether I tested it with a single configuration only or with multiple configurations as well; you have to check it yourself). Another benefit is that you can reference the same type in different versions if needed. The drawback is that after each version upgrade the namespace and/or type name changes, and you have to change the imports in your project. Schema resolution between writer and reader schema should work, and hopefully it will.
ad B) If you do not include a version identifier, only one version can be compiled by avro-maven-plugin into Java files, and you cannot have one global Schema.Parser instance in the project. Why would you want just one global instance? It would be helpful if you don't follow the bad and frequent advice to use a top-level union to define multiple types in one avsc file. Maybe that is needed with the Confluent registry, but if you don't use it, you definitely don't have to use a top-level union. One can use schema imports instead, where Schema.Parser needs to process all imports first and then finally the actual type. If you use these imports, you have to use one Schema.Parser instance for each group of a type plus its imports. It's a bit of a declarative hassle, but it relieves you from having a top-level union, which has issues of its own and is incorrect in principle. If your project doesn't need multiple versions of the same schema accessible at the same time, this is probably better than variant A), as you don't have to change imports. Using imports also opens up the possibility of composing schemas. As all versions have the same namespace, you can pass an arbitrary version to Schema.Parser. So if there is some a-->b association between types, one can take v2 of b and use it with v3 of a. Not sure if that is a typical use case, but it's possible.
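The name-collision constraint behind options A and B can be illustrated without the Avro library. What follows is only a minimal Python sketch, assuming the full-name rule from the Avro specification (namespace plus name, unless the name already contains a dot). ToyParser is a hypothetical stand-in for the behaviour described above, not the real Schema.Parser API, and the com.example schemas are made up for illustration.

```python
import json


def full_name(schema):
    """Return the Avro full name of a named schema (namespace.name)."""
    name = schema["name"]
    if "." in name:  # a dotted name is already a full name
        return name
    namespace = schema.get("namespace")
    return f"{namespace}.{name}" if namespace else name


class ToyParser:
    """Hypothetical stand-in for a parser that remembers every full name it has seen."""

    def __init__(self):
        self.known = {}

    def parse(self, text):
        schema = json.loads(text)
        key = full_name(schema)
        if key in self.known:
            raise ValueError(f"Can't redefine: {key}")
        self.known[key] = schema
        return schema


# Option A: the version is part of the namespace, so both versions can coexist.
A_V1 = '{"type": "record", "namespace": "com.example.v1", "name": "User", "fields": []}'
A_V2 = '{"type": "record", "namespace": "com.example.v2", "name": "User", "fields": []}'
# Option B: no version in the name, so both versions share one full name.
B_V1 = '{"type": "record", "namespace": "com.example", "name": "User", "fields": []}'
B_V2 = ('{"type": "record", "namespace": "com.example", "name": "User", "fields": '
        '[{"name": "email", "type": ["null", "string"], "default": null}]}')


def try_parse_all(texts):
    """Parse all texts with one shared ToyParser and report success or the error."""
    parser = ToyParser()
    try:
        for text in texts:
            parser.parse(text)
        return "ok"
    except ValueError as err:
        return str(err)
```

With one shared parser, the option A pair parses cleanly while the option B pair is rejected on the second schema; with option B, each version would need its own parser instance.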


Consuming contents of declare_directory

I have rule A implemented with a macro that uses declare_directory to produce a set of files:
output = ctx.actions.declare_directory("selected")
Names of those files are not known in advance. The implementation returns the directory created by declare_directory with the following:
return DefaultInfo(
    files = depset([output]),
)
Rule A is included in "srcs" attribute of rule B. Rule B is also implemented with a macro. Unfortunately the list of files passed to B implementation through "srcs" attribute only contains the "selected" directory created by rule A instead of files residing in that directory.
I know that Args class supports expansion of directories so I could pass names of all files in "selected" directory to a single action. What I need, however, is a separate action for every individual file for parallelism and caching. What is the best way to achieve that?
This is one of the intended use cases of directory outputs (called TreeArtifacts in the implementation), and it's implemented using ActionTemplate.
However, this is not exposed to Starlark, and currently has only a couple of usages, in the Android rules (AndroidBinary.java) and the C++ rules (CcCompilationHelper.java). The Android rules and C++ rules are going to be migrated over to Starlark, so this functionality might eventually be made available in Starlark, but I'm not sure of any concrete timelines. It would probably be good to file a feature request on GitHub.

IBM Integration Bus and xsd:anyType

I'm working with IIB v9 mxsd message definitions. I'd like to define one of the XML elements to be of type xsd:anyType. However, in the list of types I can choose from, only anySimpleType and anyURI are available (besides all the other types like string, integer, etc.).
How can I get around this limitation?
The XMLNSC parser supports the entire XML Schema specification, including xs:any and xs:anyType. In IIBv9 you should create a Library and import your xsds into it. Link your Application to the Library and the XMLNSC parser will find and use the model. You do not need to specify the name of the Library in the node properties; the XSD model will be automatically available to the entire application.
You do not need to use a message set at all in IIBv9 and later versions.
The mxsd file format is used only by the MRM (not DFDL) parser.
You shouldn't use an MXSD to model your XML data; use a normal XSD.
MXSD is for modelling data for the DFDL parser, but you should use the XMLNSC parser for XML messages and define them in XSDs, in which you can use anyType.
As far as I know DFDL doesn't support anyType.

Cache XMLProvider generated model(s)

Using XMLProvider from the FSharp.Data package like:
type internal MyProvider = XmlProvider<Sample = @"C:\test.xml">
The test.xml file contains a total of 151,838 lines, from which 15 types are generated.
Working in the same project as the type declaration MyProvider is a pain, as it seems the XmlProvider is triggered every time I hit CTRL+SPACE (Edit.CompleteWord) and therefore regenerates all the models, which can take up to 10 seconds.
Is there any known workaround, or a setting to cache the generated models from XmlProvider?
I'm afraid F# Data does not currently have any caching mechanism for the inferred schema. It sounds like something that should not be too hard to add - if anyone is interested in contributing, please open an issue on GitHub to start the discussion!
My recommendation for the time being would be to try to simplify the sample XML, so that it is shorter and contains just a few representative records of all the different kinds.
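Shortening the sample by hand is tedious for a file of this size, so a small script can help. The following is only a sketch in Python (not part of F# Data), assuming that keeping the first few siblings of each tag under every parent is representative enough. The function names and the default of two records per tag are illustrative choices. Note that records dropped this way may have carried optional elements or attributes, so the inferred types should be checked afterwards.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def trim_sample(root, keep=2):
    """Keep at most `keep` children per tag under every element (in place)."""
    seen = Counter()
    for child in list(root):  # iterate over a copy, since we remove while looping
        seen[child.tag] += 1
        if seen[child.tag] > keep:
            root.remove(child)
        else:
            trim_sample(child, keep)  # recurse only into children that are kept
    return root


def write_trimmed(src_path, dst_path, keep=2):
    """Read src_path, trim it, and write the shorter sample to dst_path."""
    tree = ET.parse(src_path)
    trim_sample(tree.getroot(), keep)
    tree.write(dst_path, encoding="utf-8", xml_declaration=True)
```

Calling write_trimmed with the original file and a new output path produces a smaller file that the Sample argument of XmlProvider can then point at.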

How to define and use precompiled variable in delphi directives

I want to define a precompiled string variable and use it in an {$include} directive in Delphi, for example:
{$define FILE_NAME "lockfile"}
{$include FILE_NAME'.txt.1'}
{$include FILE_NAME'.txt.2'}
For security reasons (this is part of our licensing system), we don't want to use normal strings and file reading functions. Is there any capability for this purpose in Delphi?
The $INCLUDE directive does not support indirection on the file name. So, the following code:
const someconst = 'foo';
{$INCLUDE someconst}
leads to the following error:
F1026 File not found: 'someconst.pas'
If you must use an include file, you must apply the indirection by some other means. One way could be to use the fact that the compiler will search for the included file by looking on the search path. So, if you place each client specific include file in a different directory, then you can add the client specific directory to the search path as part of your build process.
FWIW, I find it hard to believe that this will make your program more immune to hacking. I think that a more likely outcome is that your program will be just as susceptible to hacking, but that it will become much more difficult and error prone for you to build and distribute the program.
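The search-path approach described above could be driven from a build script. The following is a minimal Python sketch, assuming the command-line compiler dcc32 with its -I (include directories) option, a hypothetical layout with one directory per client under clients\, and a source file that always includes the same fixed file name. The project name, client name, and directory names are made up for illustration.

```python
from pathlib import PureWindowsPath


def build_command(project, client, clients_root="clients"):
    """Build a dcc32 command line whose include path points at one client's directory."""
    # PureWindowsPath keeps the backslash separators regardless of the host OS.
    include_dir = PureWindowsPath(clients_root) / client
    return ["dcc32", f"-I{include_dir}", project]
```

For example, build_command("app.dpr", "acme") yields a command list that can be passed to subprocess.run on the build machine, so the same include directive picks up the file from clients\acme.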
Your requirement may be better satisfied by the proper use of a VCS. You need a branch for every customer, in which the customer-specific files contain customer-specific data. This avoids littering your code with complex directives to manage each customer: the file names stay the same, and only their content differs in each branch. Adding a new customer just requires creating a new branch and updating the files there.
Then you just need to check out each branch and compile it to get the final executable(s) with the customer-specific data built in.

Xtext: referring to objects from other languages; namespaces and aliases for importURI?

I'm developing an Xtext-based language which should refer to objects defined in a vendor-specific file format.
E.g. this file format defines messages, and my language shall define rules that work with these messages. Of course I want to use Xtext features, e.g. to autocomplete/validate message names, attributes etc.
Not sure if that is a good idea, but I came up with the following:
Use one Xtext project to describe the file format.
Add a dependency on this project to my DSL project and import the file format grammar into my grammar.
Import the description files via importURI.
FileFormat grammar:
grammar com.example.xtext.fileformat.FileFormat
generate fileformat "http://xtext.example.com/fileformat/FileFormat"
DSL grammar:
grammar com.example.xtext.dsl.DSL
import "http://xtext.example.com/fileformat/FileFormat" as ff
rules += Rule*;
Rule: ImportFileRule | SampleRule;
ImportFileRule: "IMPORT" importURI=STRING "AS" name=ID ";";
SampleRule: "FORWARD" msg=[ff::Message] ";";
First of all: This works fine.
Now, different imported files may define messages with colliding names, and I want to use fully qualified names for messages anyway.
The prefix for the message names should be defined in my DSL, e.g. as the name of the ImportFileRule.
So I would like to use something like:
IMPORT "first-incredibly-long-filename-with-version-and-stuff.ff" AS first;
IMPORT "second-incredibly-long-filename-with-version-and-stuff.ff" AS second;
FORWARD first.msg_1; // references to msg_1 in first file
FORWARD second.msg_1; // references to msg_1 in second file
Unfortunately I don't see an easy way to achieve this with Xtext.
At the moment I'm using an ID for the namespace qualifier and custom ProposalProvider/Validator classes, which is ugly in detail and bypasses the EMF index, becoming slow with files of 1000 messages and 50000 attributes...
Would there be a right way to do it?
Was it a good idea to use xtext to parse the definition files in the first place?
I have two ideas for what to check.
Xtext has a specific scope provider called ImportedNamespaceAwareLocalScopeProvider. By using an overridden version of this, you could specify other headers to consider.
Check the implementation of the Xtext grammar language itself, as it supports such a feature with EPackage imports. I am not exactly sure how it operates, but it should work this way.
Finally, I ended up using the SimpleNamesFragment, the ImportURIScopingFragment and a custom ScopeProvider derived from AbstractDeclarativeScopeProvider.
That way, I had to implement ScopeProvider methods for quite a few rules, but I was much more flexible in using my "namespace prefix".
E.g. it is simple to implement syntaxes like
FORWARD FROM first: msg_01, msg_02;
