How to convert a field type in Fluentd

In Logstash, the type of a field can be changed using the mutate filter.
How can this be done in Fluentd?

Use record_transformer.
This filter allows you to transform records with arbitrary Ruby expressions.
It is included in Fluentd's core, so no installation is required.
For details, refer to the following documentation:
http://docs.fluentd.org/v0.12/articles/filter_record_transformer
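For example, a minimal sketch assuming incoming records are tagged app.** and carry a string field response_time that should become an integer (tag and field name are just placeholders):

# Convert the response_time field from string to integer with an inline Ruby expression.
<filter app.**>
  @type record_transformer
  enable_ruby true
  <record>
    response_time ${record["response_time"].to_i}
  </record>
</filter>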

Related

Dropwizard configuration with list of strings in environment variable with default fallback?

Dropwizard takes .yml configurations, which of course allow lists. It also has the ${FOO:-bar} syntax, which allows reading environment variables with a default fallback.
Is there a way to read a list of strings from an environment variable and have Dropwizard parse it automatically, or do I have to do it manually? What is the syntax for a default fallback list, if this exists? I could not find this in their documentation.
This can be achieved by using a JSON-style array as the default parameter:
supported_locales: ${SUPPORTED_LOCALES:-[en,da_DK,pt_BR,es]}
This creates a config parameter called supported_locales, which reads the SUPPORTED_LOCALES environment variable (a JSON-style array of strings) and defaults to the array provided after the :- characters, in this case [en,da_DK,pt_BR,es].
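To override the default at runtime, you would set the environment variable to an array value before starting the service (the jar and config names here are hypothetical):

export SUPPORTED_LOCALES='[en,en_GB,fr_FR]'
java -jar app.jar server config.yml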

Serializing Drake objects with Python

Is there any way to serialize and de-serialize objects (such as pydrake.trajectories.PiecewisePolynomial, Expression, ...) using pickle or some other way?
It does not complain when I serialize it, but when trying to load from file it complains:
TypeError: pybind11_object.__new__(pydrake.trajectories.PiecewisePolynomial) is not safe, use object.__new__()
Is there a list of classes you would like to serialize / pickle?
I can create an issue for you, or you can create one if you have a list already in mind.
More background:
Pickling for pybind11 (which is what pydrake uses) has to be defined manually:
https://pybind11.readthedocs.io/en/stable/advanced/classes.html#pickling-support
At present we don't have a roadmap in Drake to serialize everything, so pickling is handled on a per-class basis.
For example, for pickling RigidTransform: issue link and PR link
A simpler pickling example for CameraInfo: PR link
(FTR, if an object is easily recoverable from its construction arguments, it should be trivial to define pickling.)
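Until a given class gains pickle support, one workaround is to pickle the plain-Python construction arguments and rebuild the object on load. A minimal sketch (the use of FirstOrderHold and the data below are only an illustration, not Drake's serialization API):

import pickle
import numpy as np
from pydrake.trajectories import PiecewisePolynomial

# Hypothetical trajectory built from plain-Python construction arguments.
breaks = [0.0, 1.0, 2.0]
samples = np.array([[0.0, 1.0, 4.0]])  # one row per degree of freedom, one column per break
traj = PiecewisePolynomial.FirstOrderHold(breaks, samples)

# Persist only the construction arguments, not the pybind11 object...
with open("traj.pkl", "wb") as f:
    pickle.dump({"breaks": breaks, "samples": samples}, f)

# ...and rebuild the Drake object when loading.
with open("traj.pkl", "rb") as f:
    data = pickle.load(f)
restored = PiecewisePolynomial.FirstOrderHold(data["breaks"], data["samples"])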

Apache Beam and Avro: Create a dataflow pipeline without schema

I am building a dataflow pipeline with Apache Beam. Below is the pseudocode:
PCollection<GenericRecord> rows = pipeline.apply("Read Json from PubSub", <some reader>)
.apply("Convert Json to pojo", ParDo.of(new JsonToPojo()))
.apply("Convert pojo to GenericRecord", ParDo.of(new PojoToGenericRecord()))
.setCoder(AvroCoder.of(GenericRecord.class, schema));
I am trying to get rid of setting the coder in the pipeline, as the schema won't be known at pipeline creation time (it will be present in the message).
I commented out the line that sets the coder and got an exception saying that the default coder is not configured. I then used the one-argument version of the of method and got the following exception:
Not a Specific class: interface org.apache.avro.generic.GenericRecord
at org.apache.avro.specific.SpecificData.createSchema(SpecificData.java:285)
at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:594)
at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218)
at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215)
at avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
at avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
at avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
at avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
... 9 more
Is there any way for us to supply the coder at runtime, without knowing the schema beforehand?
This is possible. I recommend the following approach:
Do not use an intermediate collection of type GenericRecord. Keep it as a collection of your POJOs.
Write some transform that extracts the schema of your data and makes it available as a PCollectionView<however you want to represent the schema>.
When writing to BigQuery, write your PCollection<YourPojo> via write().to(DynamicDestinations), and when writing to Avro, use FileIO.write() or writeDynamic() in combination with AvroIO.sinkViaGenericRecords(). Both of these can take a dynamically computed schema from a side input (that you computed above).
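A rough sketch of the first two steps (JsonToPojo is from the question; MyPojo, topic and deriveSchemaJson are hypothetical, and a streaming pipeline would additionally need suitable windowing before the singleton view):

PCollection<MyPojo> pojos = pipeline
    .apply("Read Json from PubSub", PubsubIO.readStrings().fromTopic(topic))
    .apply("Convert Json to pojo", ParDo.of(new JsonToPojo()));

// Derive the schema from the data itself and expose it as a side input.
// Passing it around as its JSON string keeps the side input easily serializable.
PCollectionView<String> schemaView = pojos
    .apply("Derive schema", ParDo.of(new DoFn<MyPojo, String>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        c.output(deriveSchemaJson(c.element()));  // hypothetical helper
      }
    }))
    .apply("Take one", Sample.any(1))
    .apply("As view", View.asSingleton());

// schemaView can then be consumed as a side input by the write step,
// e.g. FileIO.writeDynamic() with AvroIO.sinkViaGenericRecords(), as described above.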

IBM Integration Bus and xsd:anyType

I'm working with IIB v9 mxsd message definitions. I'd like to define one of the XML elements to be of type xsd:anyType. However, in the list of types I can choose from, only anySimpleType and anyURI are available (besides all the other types like string, integer, etc.).
How can I get around this limitation?
The XMLNSC parser supports the entire XML Schema specification, including xs:any and xs:anyType. In IIBv9 you should create a Library and import your xsds into it. Link your Application to the Library and the XMLNSC parser will find and use the model. You do not need to specify the name of the Library in the node properties; the XSD model will be automatically available to the entire application.
You do not need to use a message set at all in IIBv9 and later versions.
The mxsd file format is used only by the MRM (not DFDL) parser.
You shouldn't use an MXSD to model your XML data, use a normal XSD.
MXSD is for modelling data for the DFDL parser, but you should use the XMLNSC parser for XML messages and define them in XSDs, in which you can use anyType.
As far as I know DFDL doesn't support anyType.
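For illustration, here is a fragment of an ordinary XSD (element names are made up) that the XMLNSC parser can use, with the payload element typed as xsd:anyType:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- The payload element accepts any well-formed XML content -->
  <xsd:element name="envelope">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="payload" type="xsd:anyType"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>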

Different coders for the same class in dataflow job

I'm trying to use different coders for the same class for two different scenarios:
Reading from JSON input files - using data = TextIO.Read.from(options.getInput()).withCoder(new Coder1())
Elsewhere in the job I want the class to be persisted using SerializableCoder: data.setCoder(SerializableCoder.of(MyClass.class))
It works locally, but fails when run in the cloud with
Caused by: java.io.StreamCorruptedException: invalid stream header: 7B227365.
Is this a supported scenario? The reason for doing this in the first place is to avoid reading/writing JSON inside the job, while on the other hand making reading from the input files more efficient (UTF-8 parsing is part of the JSON reader, so it can read from an InputStream directly).
Clarifications:
Coder1 is my coder.
The other coder is a SerializableCoder.of(MyClass.class)
How does the system choose which coder to use? The two formats are binary incompatible, and it looks like, due to some optimization, the second coder is being used for data that can only be read by the first coder.
Yes, using two different coders like that should work. (With the caveat that the coder in #2 will only be used if the system chooses to persist 'data' instead of optimizing it away into the surrounding computations.)
Are you using your own Coders or ones provided by the Dataflow SDK? Quick caveat on TextIO -- because it uses newlines to encode element boundaries, you'll get into trouble if you use a coder that produces encoded values containing something that can be mistaken for a newline. You really should only use textual encodings within TextIO. We're hoping to make that clearer in the future.
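A sketch of the pattern that caveat implies, in the same Dataflow SDK style as the question (ParseJsonFn is a hypothetical DoFn that parses a JSON line into MyClass):

// Read the files as plain text; the default StringUtf8Coder is safe with newline-delimited input.
PCollection<String> lines = pipeline.apply(TextIO.Read.from(options.getInput()));

// Parse the JSON in a ParDo and attach SerializableCoder to the POJO collection;
// that coder is only exercised if the runner decides to persist 'data'.
PCollection<MyClass> data = lines
    .apply(ParDo.of(new ParseJsonFn()))
    .setCoder(SerializableCoder.of(MyClass.class));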

Resources