IBM Integration Bus and xsd:anyType

I'm working with IIB v9 mxsd message definitions. I'd like to define one of the XML elements to be of type xsd:anyType. However, in the list of types I can choose from, only anySimpleType and anyURI are available (besides all the other types like string, integer, etc.).
How can I get around this limitation?

The XMLNSC parser supports the entire XML Schema specification, including xs:any and xs:anyType. In IIB v9 you should create a Library and import your XSDs into it. Link your Application to the Library and the XMLNSC parser will find and use the model. You do not need to specify the name of the Library in the node properties; the XSD model will automatically be available to the entire application.
You do not need to use a message set at all in IIB v9 and later versions.
The mxsd file format is used only by the MRM (not DFDL) parser.

You shouldn't use an MXSD to model your XML data; use a normal XSD.
MXSD is for modelling data for the DFDL parser, but you should use the XMLNSC parser for XML messages and define them in XSDs, in which you can use anyType.
As far as I know, DFDL doesn't support anyType.
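For illustration, a minimal plain XSD along those lines (the element and namespace names here are made up) could look like this:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/messages"
           elementFormDefault="qualified">
  <xs:element name="Envelope">
    <xs:complexType>
      <xs:sequence>
        <!-- payload accepts any XML content -->
        <xs:element name="payload" type="xs:anyType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Put a file like this into the Library linked from your Application, and the XMLNSC parser will find the model as described above.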

Related

What is the recommended Avro type namespace/name naming scheme with respect to schema evolution?

What is the recommended naming scheme for Avro types, so that schema evolution works with backward and forward compatibility and schema imports? How do you name your types? How many Schema.Parser instances do you use: one per schema, one global, or some other scheme?
The namespace / type names don't need any special naming scheme to address compatibility.
If you need to rename something, that's what aliases are for.
From what I've seen, using a parser more than once per schema causes issues with state maintained by the parser.
So technically you have two options, each with its own benefits and drawbacks:
A) include a version identifier in the namespace or type name
B) do NOT include a version identifier in the namespace or type name
Explanation: if you want schema evolution, you do not need to include a version number, because both the Confluent Schema Registry and Avro single-object encoding identify the writer schema by an ID or fingerprint (some sort of hash/CRC) rather than by name. When deserializing bytes you have to know the writer schema, and you can then evolve it into the reader schema. These two schemas need not have the same name, as schema resolution does not use the namespace or type name (https://avro.apache.org/docs/current/spec.html#Schema+Resolution). On the other hand, a Schema.Parser cannot parse more than one schema with the same full name, i.e. the fully qualified namespace.name. So it depends on your use case which one you want; both can be used.
Regarding A): if you include a version identifier, you will be able to parse both (or all) versions using the same Schema.Parser, which means, for example, that those schemas can be processed together by the avro-maven-plugin (sorry, I don't remember whether I tested that with a single configuration only or with multiple configurations as well; you will have to check yourself). Another benefit is that you can reference the same type in different versions if needed. The drawback is that after each version upgrade the namespace and/or type name changes, so you have to update the imports in your project. Schema resolution between writer and reader schema should work, and hopefully it will.
Regarding B): if you do not include a version identifier, only one version can be compiled by the avro-maven-plugin into Java files, and you cannot have one global Schema.Parser instance in the project. Why would you want just one global instance? It would be helpful if you don't follow the bad but frequent advice to use a top-level union to define multiple types in one avsc file. Maybe that is needed with the Confluent registry, but if you don't use that, you definitely don't have to use a top-level union. You can use schema imports instead, where a Schema.Parser has to process all the imports first and the actual type last; if you use these imports, you have to use one Schema.Parser instance per group of a type plus its imports (see the sketch below). It is a bit of a declarational hassle, but it relieves you of the top-level union, which has issues of its own and is incorrect in principle. If your project doesn't need multiple versions of the same schema accessible at the same time, this is probably better than variant A), as you don't have to change imports. Imports also open up the possibility of composing schemas: since all versions share the same namespace, you can pass an arbitrary version to the Schema.Parser. So if there is an a --> b association between types, you can take v2 of b and use it with v3 of a. I'm not sure that is a typical use case, but it's possible.
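To make the one-parser-per-import-group point concrete, here is a minimal sketch in Java; the file names, type names and directory layout are made up for illustration:

import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;

public class SchemaLoader {
    public static Schema loadCustomer() throws IOException {
        // One parser per group: the parser accumulates named types as it goes,
        // so parse the imported type first...
        Schema.Parser parser = new Schema.Parser();
        parser.parse(new File("schemas/Address.avsc"));      // defines com.example.Address
        // ...then the schema that references com.example.Address by full name.
        return parser.parse(new File("schemas/Customer.avsc"));
    }
}

Note that feeding the same fully qualified name to one parser twice (e.g. two versions of com.example.Address under variant B) fails with a "Can't redefine" error, which is exactly the trade-off discussed above.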

Serializing Drake objects with Python

Is there any way to serialize and de-serialize objects (such as
pydrake.trajectories.PiecewisePolynomial, Expression ...) using pickle
or some other way?
It does not complain when I serialize it, but when trying to load from file
it complains:
TypeError: pybind11_object.__new__(pydrake.trajectories.PiecewisePolynomial) is not safe, use object.__new__()
Is there a list of classes you would like to serialize / pickle?
I can create an issue for you, or you can create one if you have a list already in mind.
More background:
Pickling for pybind11 (which is what pydrake uses) has to be defined manually:
https://pybind11.readthedocs.io/en/stable/advanced/classes.html#pickling-support
At present, we don't have a roadmap in Drake to serialize everything, so support is added on a per-class basis.
For example, for pickling RigidTransform: issue link and PR link
A simpler pickling example for CameraInfo: PR link
(FTR, if an object is easily recoverable from its construction arguments, it should be trivial to define pickling.)

Different coders for the same class in a Dataflow job

I'm trying to use different coders for the same class in two different scenarios:
Reading from JSON input files - using data = TextIO.Read.from(options.getInput()).withCoder(new Coder1())
Elsewhere in the job I want the class to be persisted using SerializableCoder - using data.setCoder(SerializableCoder.of(MyClass.class))
It works locally, but fails when run in the cloud with
Caused by: java.io.StreamCorruptedException: invalid stream header: 7B227365.
Is it a supported scenario? The reason for doing this in the first place is to avoid reading/writing the JSON format, and on the other hand to make reading from the input files more efficient (UTF-8 parsing is part of the JSON reader, so it can read from an InputStream directly).
Clarifications:
Coder1 is my coder.
The other coder is a SerializableCoder.of(MyClass.class)
How does the system choose which coder to use? The two formats are binary-incompatible, and it looks like, due to some optimization, the second coder is used for a data format that can only be read by the first coder.
Yes, using two different coders like that should work. (With the caveat that the coder in #2 will only be used if the system chooses to persist 'data' instead of optimizing it into the surrounding computations.)
Are you using your own Coders or ones provided by the Dataflow SDK? A quick caveat on TextIO: because it uses newlines to encode element boundaries, you'll get into trouble if you use a coder that produces encoded values containing something that can be mistaken for a newline. You really should only use textual encodings within TextIO. We're hoping to make that clearer in the future.
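For reference, a text-safe coder along those lines might look like the following sketch, written against the Dataflow SDK 1.x CustomCoder API; MyClass and its single-line toJson()/fromJson() helpers are assumptions:

import com.google.cloud.dataflow.sdk.coders.CustomCoder;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class Coder1 extends CustomCoder<MyClass> {
    @Override
    public void encode(MyClass value, OutputStream outStream, Context context)
            throws IOException {
        String json = value.toJson();
        // TextIO frames elements with newlines, so the encoded form must not
        // contain one -- this is what "textual encodings only" means here.
        if (json.indexOf('\n') >= 0) {
            throw new IOException("encoded value must not contain newlines");
        }
        outStream.write(json.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public MyClass decode(InputStream inStream, Context context) throws IOException {
        // Reading to end-of-stream assumes the whole-stream context used by
        // TextIO; a nested encoding would need explicit length framing.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int b = inStream.read(); b != -1; b = inStream.read()) {
            bytes.write(b);
        }
        return MyClass.fromJson(new String(bytes.toByteArray(), StandardCharsets.UTF_8));
    }
}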

Add new values to XML dynamically

I have an XML file in my app's resources folder. I am trying to update that file with new dictionaries dynamically; in other words, I am trying to edit an existing XML file to add new keys and values to it.
First of all, can we edit a static XML file and add a new dictionary with keys and values to it? What is the best way to do this?
In general, you can read an XML file into a document object (choose your language), use its methods to modify it (add your new dictionary), and (re-)write it back out to either the original XML file or a new one.
That's straightforward ... just roll up the ol' sleeves and code it up.
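For example, a minimal sketch in Java with the standard DOM API (the file name and element names are placeholders):

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlUpdater {
    public static void main(String[] args) throws Exception {
        File file = new File("data.xml");
        // Read the existing file into a document object.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(file);

        // Add a new <dict><key>...</key><value>...</value></dict> entry.
        Element dict = doc.createElement("dict");
        Element key = doc.createElement("key");
        key.setTextContent("newKey");
        Element value = doc.createElement("value");
        value.setTextContent("newValue");
        dict.appendChild(key);
        dict.appendChild(value);
        doc.getDocumentElement().appendChild(dict);

        // Write it back out -- note that formatting may change, which is
        // exactly the diffing problem discussed below.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.transform(new DOMSource(doc), new StreamResult(file));
    }
}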
The real problem comes in with formatting in the XML file before and after said additions.
If you are going to 'unix diff' the XML file before and after, then order is important. Some standard XML processors preserve order better than others.
If the order changes behind the scenes, and is gratuitously propagated into your output file, you lose standard diffing advantages, such as GUI diff tools and SCM diffs (svn, cvs, etc.).
For example, browse to:
Order of XML attributes after DOM processing
They discuss that DOM loses order where SAX does not.
You can also write a custom XML differ (there may be one off the shelf; for example, check out http://diffxml.sourceforge.net/) that compares two XML documents tag by tag, attribute by attribute, etc.
Perhaps some standard XML-related tool such as XSLT will allow you to keep the formatting constant without changing tag or attribute order. You'd have to research that.
BTW, a related problem is the config (.ini) file problem: many common processors flippantly announce that the write order may not agree with the read order.

Xtext: referring to objects from other languages; namespaces and aliases for importURI?

I'm developing an Xtext-based language which should refer to objects defined in a vendor-specific file format.
E.g. this file format defines messages; my language shall define Rules that work with these messages. Of course I want to use Xtext features, e.g. to autocomplete/validate message names, attributes etc.
Not sure if that is a good idea, but I came up with the following:
Use one Xtext project to describe the file format
Add a dependency for this project to my DSL project, and import the file format grammar into my grammar
Import the description files via importURI
FileFormat grammar:
grammar com.example.xtext.fileformat.FileFormat;
generate fileformat "http://xtext.example.com/fileformat/FileFormat"
[...]
DSL grammar:
grammar com.example.xtext.dsl.DSL;
import "http://xtext.example.com/fileformat/FileFormat" AS ff;
Model:
rules += Rule*;
Rule: ImportFileRule | SampleRule;
ImportFileRule: "IMPORT" importURI=STRING "AS" name=ID ";";
SampleRule: "FORWARD" msg=[ff::Message] ";";
First of all: This works fine.
Now, different imported files may define messages with colliding names,
and I want to use fully qualified names for messages anyway.
The prefix for the message names should be defined in my DSL, e.g. the name of the ImportFileRule.
So I would like to use something like:
IMPORT "first-incredibly-long-filename-with-version-and-stuff.ff" AS first;
IMPORT "second-incredibly-long-filename-with-version-and-stuff.ff" AS second;
FORWARD first.msg_1; // refers to msg_1 in the first file
FORWARD second.msg_1; // refers to msg_1 in the second file
Unfortunately I don't see an easy way to achieve this with Xtext.
At the moment I'm using an ID for the namespace qualifier and custom ProposalProvider/Validator classes,
which is ugly in its details and bypasses the EMF index, becoming slow with files of 1000 messages and 50000 attributes...
What would be the right way to do this?
Was it a good idea to use Xtext to parse the definition files in the first place?
I have two ideas about what to check.
Xtext has a namespace-aware scope provider called ImportedNamespaceAwareLocalScopeProvider. By using an overridden version of it, you could specify additional namespaces to consider.
Check the implementation of the Xtext grammar itself, as it supports such a feature with EPackage imports. I am not exactly sure how it operates, but it should work this way.
Finally, I ended up using the SimpleNamesFragment, the ImportURIScopingFragment and a custom ScopeProvider derived from AbstractDeclarativeScopeProvider.
That way I had to implement ScopeProvider methods for quite a few rules, but I was much more flexible in using my "namespace prefix".
E.g. it is simple to implement syntaxes like
FORWARD FROM first: msg_01, msg_02;
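For reference, a minimal sketch in Java of such a declarative scope provider, written against the grammar in the question; the EMF classes (Model, Rule, ImportFileRule, Message) are assumed to be the ones generated from the two grammars, and that Message has a name attribute is an assumption:

import org.eclipse.emf.ecore.EObject;
import org.eclipse.emf.ecore.EReference;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.xtext.EcoreUtil2;
import org.eclipse.xtext.naming.QualifiedName;
import org.eclipse.xtext.scoping.IScope;
import org.eclipse.xtext.scoping.Scopes;
import org.eclipse.xtext.scoping.impl.AbstractDeclarativeScopeProvider;

public class DSLScopeProvider extends AbstractDeclarativeScopeProvider {

    // Invoked by naming convention for the msg=[ff::Message] cross-reference
    // of SampleRule; Model is the root rule of the DSL grammar above.
    IScope scope_SampleRule_msg(Model model, EReference ref) {
        IScope result = IScope.NULLSCOPE;
        for (Rule rule : model.getRules()) {
            if (rule instanceof ImportFileRule) {
                ImportFileRule imp = (ImportFileRule) rule;
                // Load the imported .ff file relative to the current resource.
                Resource imported =
                        EcoreUtil2.getResource(model.eResource(), imp.getImportURI());
                EObject root = imported.getContents().get(0);
                // Expose every Message as "<alias>.<name>", chaining the scopes
                // so that all imports remain visible.
                result = Scopes.scopeFor(
                        EcoreUtil2.getAllContentsOfType(root, Message.class),
                        m -> QualifiedName.create(imp.getName(), m.getName()),
                        result);
            }
        }
        return result;
    }
}

With the scope built this way, linking, validation and content assist all follow the alias-qualified names. For the dotted FORWARD first.msg_1; form, the cross-reference also needs a dotted-name syntax, e.g. msg=[ff::Message|QualifiedName] with a rule like QualifiedName: ID ('.' ID)*;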
