Avro code generation / dynamic typing

This might be a silly question, but can anyone explain what is meant by dynamic typing and code generation in the context of Avro? I am pretty new to Avro and would really appreciate it if someone could help me understand this in detail.
Also, Avro has a data type named fixed; what would be a practical scenario for using this data type?

Code generation refers to the classes you generate from your schema file using the Avro tools; serializing/deserializing without code generation means your program parses the schema at runtime and works with generic records instead of generated classes, and vice versa. This is what Avro means by dynamic typing: the schema alone is enough to read and write data, with no generated code required. You can read more at https://avro.apache.org/docs/1.7.7/gettingstartedjava.html#Serializing+and+deserializing+without+code+generation
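For illustration, here is a minimal sketch of writing data without code generation using Avro's Java API. The schema file name (user.avsc) and its contents are assumptions for this example; the md5 field also shows a practical use of fixed, namely a hash whose size is known and constant, so Avro can store it without any per-value length overhead:

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;

public class GenericUserWriter {
    public static void main(String[] args) throws Exception {
        // Hypothetical schema file for this example, e.g.:
        // {"type": "record", "name": "User", "fields": [
        //   {"name": "name", "type": "string"},
        //   {"name": "md5", "type": {"type": "fixed", "name": "MD5", "size": 16}}]}
        Schema schema = new Schema.Parser().parse(new File("user.avsc"));

        // No generated User class: the record is built dynamically from the schema.
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "Alice");
        user.put("md5", new GenericData.Fixed(schema.getField("md5").schema(), new byte[16]));

        DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
        try (DataFileWriter<GenericRecord> fileWriter = new DataFileWriter<>(datumWriter)) {
            fileWriter.create(schema, new File("users.avro"));
            fileWriter.append(user);
        }
    }
}

With code generation you would instead run the avro-tools compiler on the same schema and work with a generated User class rather than GenericRecord.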

Related

Getting "chunk write below min size" when trying to write to Google Cloud using gcsfs

I have a script which uses gcsfs to write data to Google Cloud. Most of the time it works, but fairly regularly I get the following error:
ValueError: Non-final chunk write below min size.
This error seems to come from GCSFile._upload_chunk.
I can't really find anything in the docs that explains what might be going wrong here. I read this thread, which suggests it might be related to how the data is committed (should I disable autocommit?), but I'm not sure it's entirely relevant. I read through the source of that function, but that didn't help much either. Would appreciate any guidance!
My code looks like this:
with gcs.open(file_path, mode='w') as f:
    f.write('\n'.join(output_data))
output_data here is a list of strings. gcs is an instance of gcsfs.GCSFileSystem.
This issue apparently no longer happens in v0.7.0. Anyone facing it should upgrade.

IBM Integration Bus and xsd:anyType

I'm working with IIB v9 mxsd message definitions. I'd like to define one of the XML elements to be of type xsd:anyType. However, in the list of types I can choose from, only anySimpleType and anyURI are available (besides all the other types like string, integer, etc.).
How can I get around this limitation?
The XMLNSC parser supports the entire XML Schema specification, including xs:any and xs:anyType. In IIB v9 you should create a Library and import your XSDs into it. Link your Application to the Library and the XMLNSC parser will find and use the model. You do not need to specify the name of the Library in the node properties; the XSD model will be automatically available to the entire application.
You do not need to use a message set at all in IIB v9 and later versions.
The mxsd file format is used only by the MRM (not DFDL) parser.
You shouldn't use an MXSD to model your XML data; use a normal XSD.
MXSD is for modelling data for the DFDL parser, but you should use the XMLNSC parser for XML messages and define them in XSDs, in which you can use anyType (see the fragment below).
As far as I know DFDL doesn't support anyType.
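For illustration, a minimal XSD fragment (the element names here are hypothetical) declaring an element of type xs:anyType; once the XSD is imported into a Library, the XMLNSC parser can use it:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- "Envelope" and "Payload" are hypothetical names for this example;
       an element typed xs:anyType may carry any well-formed XML content. -->
  <xs:element name="Envelope">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Payload" type="xs:anyType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>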

Cache XMLProvider generated model(s)

Using XmlProvider from the FSharp.Data package like:
type internal MyProvider = XmlProvider<Sample = @"C:\test.xml">
The test.xml file contains a total of 151,838 lines, which make up 15 types.
Working in the same project as the type declaration MyProvider is a pain, as the XmlProvider seems to be triggered every time I hit CTRL+SPACE (Edit.CompleteWord), and therefore regenerates all the models, which can take up to 10 seconds.
Is there any known workaround or setting to cache the generated models from XmlProvider?
I'm afraid F# Data does not currently have any caching mechanism for the inferred schema. It sounds like something that should not be too hard to add - if anyone is interested in contributing, please open an issue on GitHub to start the discussion!
My recommendation for the time being would be to try to simplify the sample XML, so that it is shorter and contains just a few representative records of all the different kinds.

Different coders for the same class in dataflow job

I'm trying to use different coders for the same class for two different scenarios:
Reading from JSON input files - using data = TextIO.Read.from(options.getInput()).withCoder(new Coder1())
Elsewhere in the job, where I want the class to be persisted using SerializableCoder - using data.setCoder(SerializableCoder.of(MyClass.class))
It works locally, but fails when run in the cloud with
Caused by: java.io.StreamCorruptedException: invalid stream header: 7B227365.
Is this a supported scenario? The reason for doing this in the first place is to avoid reading/writing the JSON format when the data is persisted internally, while on the other hand making reads from the input files more efficient (UTF-8 parsing is part of the JSON reader, so it can read from an InputStream directly).
Clarifications:
Coder1 is my coder.
The other coder is a SerializableCoder.of(MyClass.class)
How does the system choose which coder to use? The two formats are binary-incompatible, and it looks like, due to some optimization, the second coder is being used for data that can only be read by the first coder.
Yes, using two different coders like that should work (with the caveat that the coder in #2 will only be used if the system chooses to persist 'data' instead of optimizing it into the surrounding computations).
Are you using your own coders or ones provided by the Dataflow SDK? A quick caveat on TextIO: because it uses newlines to delimit element boundaries, you'll get into trouble if you use a coder that produces encoded values containing anything that can be mistaken for a newline. You really should only use textual encodings within TextIO. We're hoping to make that clearer in the future.
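As a rough sketch of the two-coder setup under the pre-Beam Dataflow SDK 1.x API (JsonCoder, MyClass, and MyOptions are hypothetical stand-ins for the asker's Coder1, element type, and pipeline options; MyClass must implement Serializable for SerializableCoder to apply):

import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.coders.SerializableCoder;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.values.PCollection;

public class TwoCoderPipeline {
    public static void buildPipeline(Pipeline pipeline, MyOptions options) {
        // Read each line as JSON with a textual coder; TextIO delimits
        // elements with newlines, so the coder must never emit one.
        PCollection<MyClass> data = pipeline.apply(
            TextIO.Read.from(options.getInput()).withCoder(new JsonCoder()));

        // From here on, the runner may persist 'data' between fused stages
        // using Java serialization rather than the JSON encoding.
        data.setCoder(SerializableCoder.of(MyClass.class));
    }
}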

C++ DOM parsing problem

Hi everybody,
I'm new to the Xerces C++ DOM parser. Can anyone tell me:
Can we write our own getElementsByTagName, getNodeValue, etc. functions?
How do I write these functions and use them in my code?
Can anybody explain the process of DOM parsing to me?
Xerces already provides methods like getElementsByTagName, so you can use them directly in your application.
Have a look at the Xerces programming guide; there you can find out how to parse an XML file and how to get at elements and attributes.
