Applying two transforms to a message on the send port - mapping

I have an urgent need to send a canonical message (M1) out of an orchestration and need to map the canonical message to another message (M2). The resulting message (M2) has to be wrapped in another request message (M3) before sending it to a web service.
I can't perform the initial transform in the orchestration as I can only deal with the canonical schema internally.
What's the best way to achieve this two-stage transform outside of the orchestration?
Thanks in advance!

You could make a pipeline component that applies each map sequentially. Then configure the port to use a pipeline with this component.
private Stream ApplyMap(Stream originalStream, Type mapType)
{
    // Pull the compiled XSLT and its argument list from the BizTalk map's metadata.
    var transform = TransformMetaData.For(mapType).Transform;
    var argList = TransformMetaData.For(mapType).ArgumentList;

    XmlReader input = XmlReader.Create(originalStream);

    // VirtualStream (Microsoft.BizTalk.Streaming) overflows to disk past a threshold.
    Stream outputStream = new VirtualStream();
    using (var outputWriter = XmlWriter.Create(outputStream))
    {
        transform.Transform(new XPathDocument(input), argList, outputWriter, null);
    }

    // Rewind so the next map (or the messaging engine) reads from the start.
    outputStream.Flush();
    outputStream.Position = 0;
    return outputStream;
}
Then in the pipeline component's Execute method:
// Assembly-qualified names of the two maps.
Type mapType1 = Type.GetType("YourMapNamespace.Map1, YourAssemblyName,...");
Type mapType2 = Type.GetType("YourMapNamespace.Map2, YourAssemblyName,...");

Stream originalStream = inmsg.BodyPart.GetOriginalDataStream();

// Apply the maps in sequence: canonical (M1) -> M2 -> wrapped request (M3).
Stream mappedStream = ApplyMap(ApplyMap(originalStream, mapType1), mapType2);

inmsg.BodyPart.Data = mappedStream;
context.ResourceTracker.AddResource(mappedStream);
Note that this example loads the message into memory (the XPathDocument holds the whole document), so it could be a problem for large messages. I'll try to find a better example that uses streaming (or, worst case, you can use VirtualStream to avoid keeping everything in memory).

If you can use the ESB Toolkit, the ideal approach would be to use an itinerary (Richard Seroter has a good article on that approach here). If that's not an option, here's an approach I've used in the past:
http://blogs.msdn.com/b/chrisromp/archive/2008/08/06/stacking-maps-in-biztalk-server.aspx

Related

Saxon - s9api - setParameter as node and access in transformation

We are trying to add parameters to a transformation at runtime. So far the only way we have managed to do this is to set each parameter as a single atomic value, not as a node. We don't yet know how to create a node to pass to setParameter.
Current setParameter call:
setParameter(new QName("TEST"), new XdmAtomicValue(24))
Expected parameter value (a node rather than a single atomic value):
<TempNode> <local>Value1</local> </TempNode>
We have searched and tried to create an XdmNode and an XdmItem.
If you want to create an XdmNode by parsing XML, the best way to do it is:
DocumentBuilder db = processor.newDocumentBuilder();
XdmNode node = db.build(new StreamSource(
new StringReader("<doc><elem/></doc>")));
You could also pass a string containing lexical XML as the parameter value, and then convert it to a tree by calling the XPath parse-xml() function.
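For instance, a minimal sketch of that string-parameter route (the stylesheet name transform.xsl and the parameter name config are placeholders, not taken from your code):
XsltCompiler compiler = processor.newXsltCompiler();
XsltTransformer transformer = compiler
    .compile(new StreamSource(new File("transform.xsl")))
    .load();
// Pass the XML as an ordinary string parameter...
transformer.setParameter(new QName("config"),
    new XdmAtomicValue("<TempNode><local>Value1</local></TempNode>"));
// ...and inside transform.xsl declare <xsl:param name="config"/> and read it with
// parse-xml($config)/TempNode/local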
If you want to construct the XdmNode programmatically, there are a number of options:
DocumentBuilder.newBuildingStreamWriter() gives you an instance of BuildingStreamWriter, which extends XMLStreamWriter; you can create the document by writing events to it using methods such as writeStartElement, writeCharacters, and writeEndElement, and at the end call getDocumentNode() on the BuildingStreamWriter, which gives you an XdmNode (a short sketch appears after this list). This has the advantage that XMLStreamWriter is a standard API, though it's not actually a very nice one, because the documentation isn't very good and as a result implementations vary in their behaviour.
Another event-based API is Saxon's Push class; this differs from most push-based event APIs in that rather than having a flat sequence of methods like:
builder.startElement('x');
builder.characters('abc');
builder.endElement();
you have a nested sequence:
Element x = Document.elem('x');
x.text('abc');
x.close();
As mentioned by Martin, there is the "sapling" API: Saplings.doc().withChild(elem(...).withChild(elem(...))) etc. This API is rather radically different from anything you might be familiar with (though it's influenced by the LINQ API for tree construction on .NET), but once you've got used to it, it reads very well. The Sapling API constructs a very lightweight tree in memory (hence the name), and converts it to a fully-fledged XDM tree with a final call to SaplingDocument.toXdmNode().
If you're familiar with DOM, JDOM2, or XOM, you can construct a tree using any of those libraries and then convert it for use by Saxon. That's a bit convoluted and only really intended for applications that are already using a third-party tree model heavily (or for users who love these APIs and prefer them to anything else).
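To make the first of those options concrete, here is a short sketch using BuildingStreamWriter to build the <TempNode> document from the question (an outline based on the documented s9api methods rather than tested code):
DocumentBuilder db = processor.newDocumentBuilder();
BuildingStreamWriter writer = db.newBuildingStreamWriter();
writer.writeStartDocument();
writer.writeStartElement("TempNode");
writer.writeStartElement("local");
writer.writeCharacters("Value1");
writer.writeEndElement(); // </local>
writer.writeEndElement(); // </TempNode>
writer.writeEndDocument();
XdmNode node = writer.getDocumentNode();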
In the Saxon Java s9api, you can construct temporary trees as SaplingNode/SaplingElement/SaplingDocument; see https://www.saxonica.com/html/documentation12/javadoc/net/sf/saxon/sapling/SaplingDocument.html and https://www.saxonica.com/html/documentation12/javadoc/net/sf/saxon/sapling/SaplingElement.html.
To give you a simple example constructing from a Map, as you seem to want to do:
Processor processor = new Processor();
Map<String, String> xsltParameters = new HashMap<>();
xsltParameters.put("foo", "value 1");
xsltParameters.put("bar", "value 2");
SaplingElement saplingElement = new SaplingElement("Test");
for (Map.Entry<String, String> param : xsltParameters.entrySet())
{
saplingElement = saplingElement.withChild(new SaplingElement(param.getKey()).withText(param.getValue()));
}
XdmNode paramNode = saplingElement.toXdmNode(processor);
System.out.println(paramNode);
outputs e.g. <Test><bar>value 2</bar><foo>value 1</foo></Test>.
So the key is to understand that withChild() returns a new SaplingElement.
The code can be compacted using streams e.g.
XdmNode paramNode2 = Saplings.elem("root").withChild(
xsltParameters
.entrySet()
.stream()
.map(p -> Saplings.elem(p.getKey()).withText(p.getValue()))
.collect(Collectors.toList())
.toArray(SaplingElement[]::new))
.toXdmNode(processor);
System.out.println(paramNode2);
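Either way, once you have the XdmNode you can pass it where the atomic value used to go; assuming an XsltTransformer named transformer, as in your current setParameter call, that would be something like:
transformer.setParameter(new QName("TEST"), paramNode);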

How can I push data written by a Serilog File Sink on the disk to another Serilog sink?

We are trying to load test our Logstash/Elasticsearch infrastructure. Since the actual logs are generated by software that drives hardware, we are unable to simulate it at scale.
I am wondering if we can store the logs using the file sink and later write a program that reads the log files and sends the data through the actual sink. Since we are trying different setups, it would be great if we can swap in different sinks for testing, say the HTTP sink and the Elasticsearch sink.
I thought of reading the JSON file one line at a time and then invoking the Write method on the Logger. However, I am not sure how to get the properties array from the JSON. It would also be great to hear if there are better alternatives in the Serilog world for my needs.
Example parsing
var events = File.ReadAllLines(@"C:\20210520.json")
    .Select(line => JsonConvert.DeserializeObject<dynamic>(line));
int count = 0;
foreach (var o in events)
{
    DateTime timeStamp = o.Timestamp;
    LogEventLevel level = o.Level;
    string messageTemplate = o.MessageTemplate;
    string exception = o.Exception;
    var properties = (o.Properties as JObject);
    List<object> parameters = new List<object>();
    foreach (var property in properties)
    {
        // Only pass values that appear as placeholders in the message template.
        if (messageTemplate.Contains(property.Key))
            parameters.Add(property.Value.ToString());
    }
    logInstance.Write(level, messageTemplate, parameters.ToArray());
    count++;
}
Example JSON event written to the file
{"Timestamp":"2021-05-20T13:15:49.5565372+10:00","Level":"Information","MessageTemplate":"Text dialog with {Title} and {Message} displayed, user selected {Selected}","Properties":{"Title":"Unload Device from Test","Message":"Please unload the tested device from test jig","Selected":"Methods.Option","SinkRepository":null,"SourceRepository":null,"TX":"TX2937-002 ","Host":"Host1","Session":"Host1-2021.05.20 13.12.44","Seq":87321,"ThreadId":3}}
UPDATE
Though this works for simple events, it has limitations:
it is not able to handle context properties (there is a workaround using ForContext),
it forces all the properties to be of type string, and
destructuring (@property) is not handled properly.
If you can change the JSON format to Serilog.Formatting.Compact's CLEF format, then you can use Serilog.Formatting.Compact.Reader for this.
In the source app:
// dotnet add package Serilog.Formatting.Compact
Log.Logger = new LoggerConfiguration()
.WriteTo.File(new CompactJsonFormatter(), "./logs/myapp.clef")
.CreateLogger();
In the load tester:
// dotnet add package Serilog.Formatting.Compact.Reader
using (var target = new LoggerConfiguration()
.MinimumLevel.Verbose()
.WriteTo.Console()
.CreateLogger())
{
using (var file = File.OpenText("./logs/myapp.clef"))
{
var reader = new LogEventReader(file);
while (reader.TryRead(out var evt))
target.Write(evt);
}
}
Be aware though that load testing results won't be accurate for many sinks if you use repeated timestamps. You should consider re-mapping the events you read in to use current timestamps.
E.g. once you've loaded up evt:
var current = new LogEvent(DateTimeOffset.Now,
evt.Level,
evt.Exception,
evt.MessageTemplate,
evt.Properties);
target.Write(current);

SideInputs corrupt the data in DataFlow's Pipeline

I have a Dataflow pipeline (SDK 2.1.0, Apache Beam 2.2.0) which simply reads RDF (in N-Triples, so it's just text files) from GCS, transforms it somehow and writes it back to GCS, but in a different bucket. In this pipeline I employ side inputs which are three single files (one file per side input) and use them in a ParDo.
To work with RDF in Java I use Apache Jena, so each file is read into an instance of the Model class. Since Dataflow doesn't have a Coder for it, I developed one myself (RDFModelCoder, see below). It works fine in a number of other pipelines I created.
The problem with this particular pipeline is that when I add the side inputs, the execution fails with an exception indicating a corruption of the data, i.e. some garbage is added. Once I remove the side inputs, the pipeline finishes execution successfully.
The exception (it's thrown from RDFModelCoder, see below):
Caused by: org.apache.jena.atlas.RuntimeIOException: java.nio.charset.MalformedInputException: Input length = 1
at org.apache.jena.atlas.io.IO.exception(IO.java:233)
at org.apache.jena.atlas.io.CharStreamBuffered$SourceReader.fill(CharStreamBuffered.java:77)
at org.apache.jena.atlas.io.CharStreamBuffered.fillArray(CharStreamBuffered.java:154)
at org.apache.jena.atlas.io.CharStreamBuffered.advance(CharStreamBuffered.java:137)
at org.apache.jena.atlas.io.PeekReader.advanceAndSet(PeekReader.java:235)
at org.apache.jena.atlas.io.PeekReader.init(PeekReader.java:229)
at org.apache.jena.atlas.io.PeekReader.peekChar(PeekReader.java:151)
at org.apache.jena.atlas.io.PeekReader.makeUTF8(PeekReader.java:92)
at org.apache.jena.riot.tokens.TokenizerFactory.makeTokenizerUTF8(TokenizerFactory.java:48)
at org.apache.jena.riot.lang.RiotParsers.createParser(RiotParsers.java:57)
at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:198)
at org.apache.jena.riot.RDFParser.read(RDFParser.java:298)
at org.apache.jena.riot.RDFParser.parseNotUri(RDFParser.java:288)
at org.apache.jena.riot.RDFParser.parse(RDFParser.java:237)
at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:417)
at org.apache.jena.riot.RDFDataMgr.parseFromInputStream(RDFDataMgr.java:870)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:268)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:254)
at org.apache.jena.riot.adapters.RDFReaderRIOT.read(RDFReaderRIOT.java:69)
at org.apache.jena.rdf.model.impl.ModelCom.read(ModelCom.java:305)
And here you can see the garbage (at the end):
<http://example.com/typeofrepresentative/08> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#NamedIndividual> . ������** �����I��.�������������u�������
The pipeline:
val one = p.apply(TextIO.read().from(config.getString("source.one")))
.apply(Combine.globally(SingleValue()))
.apply(ParDo.of(ConvertToRDFModel(RDFLanguages.NTRIPLES)))
val two = p.apply(TextIO.read().from(config.getString("source.two")))
.apply(Combine.globally(SingleValue()))
.apply(ParDo.of(ConvertToRDFModel(RDFLanguages.NTRIPLES)))
val three = p.apply(TextIO.read().from(config.getString("source.three")))
.apply(Combine.globally(SingleValue()))
.apply(ParDo.of(ConvertToRDFModel(RDFLanguages.NTRIPLES)))
val sideInput = PCollectionList.of(one).and(two).and(three)
.apply(Flatten.pCollections())
.apply(View.asList())
p.apply(RDFIO.Read
.from(options.getSource())
.withSuffix(RDFLanguages.strLangNTriples))
.apply(ParDo.of(SparqlConstructETL(config, sideInput))
.withSideInputs(sideInput))
.apply(RDFIO.Write
.to(options.getDestination())
.withSuffix(RDFLanguages.NTRIPLES))
And just to provide the whole picture, here are the implementations of the SingleValue and ConvertToRDFModel ParDos:
class SingleValue : SerializableFunction<Iterable<String>, String> {
override fun apply(input: Iterable<String>?): String {
if (input != null) {
return input.joinToString(separator = " ")
}
return ""
}
}
class ConvertToRDFModel(outputLang: Lang) : DoFn<String, Model>() {
private val lang: String = outputLang.name
@ProcessElement
fun processElement(c: ProcessContext?) {
if (c != null) {
val model = ModelFactory.createDefaultModel()
model.read(StringReader(c.element()), null, lang)
c.output(model)
}
}
}
The implementation of RDFModelCoder:
class RDFModelCoder(private val decodeLang: String = RDFLanguages.strLangNTriples,
private val encodeLang: String = RDFLanguages.strLangNTriples)
: AtomicCoder<Model>() {
private val LOG = LoggerFactory.getLogger(RDFModelCoder::class.java)
override fun decode(inStream: InputStream): Model {
val bytes = StreamUtils.getBytes(inStream)
val model = ModelFactory.createDefaultModel()
model.read(ByteArrayInputStream(bytes), null, decodeLang) // the exception is thrown from here
return model
}
override fun encode(value: Model, outStream: OutputStream?) {
value.write(outStream, encodeLang, null)
}
}
I checked the side input files multiple times, they're fine, they have UTF-8 encoding.
Most likely the error is in the implementation of RDFModelCoder. When implementing encode/decode one has to remember that the provided InputStream and OutputStream are not exclusively owned by the current instance being encoded/decoded. E.g. there might be more data in the InputStream after the encoded form of your current Model. When using StreamUtils.getBytes(inStream) you are grabbing both data of the current encoded Model and anything else that was in the stream.
Generally, when writing a new Coder it's a good idea to only combine existing Coders rather than hand-parse the stream: that is less error-prone. I would suggest converting the model to/from byte[] and using ByteArrayCoder.of() to encode/decode it.
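As a rough illustration of that suggestion (a Java sketch, since Beam's coder API is the same from Kotlin; the Jena read/write calls mirror the ones already in your coder): serialize the Model to a byte[] and let ByteArrayCoder handle the framing, so decode never consumes bytes that belong to the next element in the stream.
public class RDFModelCoder extends AtomicCoder<Model> {
    private static final ByteArrayCoder BYTES = ByteArrayCoder.of();

    @Override
    public void encode(Model value, OutputStream outStream) throws IOException {
        // Serialize the whole model into a private buffer first.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        value.write(buffer, RDFLanguages.strLangNTriples, null);
        // ByteArrayCoder writes a length prefix followed by the bytes.
        BYTES.encode(buffer.toByteArray(), outStream);
    }

    @Override
    public Model decode(InputStream inStream) throws IOException {
        // Reads exactly the prefixed number of bytes, leaving the rest of the stream alone.
        byte[] bytes = BYTES.decode(inStream);
        Model model = ModelFactory.createDefaultModel();
        model.read(new ByteArrayInputStream(bytes), null, RDFLanguages.strLangNTriples);
        return model;
    }
}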
Apache Jena provides the Elephas IO modules, which have Hadoop IO support; since Beam supports Hadoop InputFormat IO, you should be able to use that to read in your NTriples file.
This will likely be far more efficient since the NTriples support in Elephas is able to parallelise the IO and avoid caching the entire model into memory (in fact it won't use Model at all):
Configuration myHadoopConfiguration = new Configuration(false);
// Set Hadoop InputFormat, key and value class in configuration
myHadoopConfiguration.setClass("mapreduce.job.inputformat.class",
NTriplesInputFormat.class, InputFormat.class);
myHadoopConfiguration.setClass("key.class", LongWritable.class, Object.class);
myHadoopConfiguration.setClass("value.class", TripleWritable.class, Object.class);
// Set any other Hadoop config you might need
// Read data only with Hadoop configuration.
p.apply("read",
HadoopInputFormatIO.<LongWritable, TripleWritable>read()
.withConfiguration(myHadoopConfiguration);
Of course this may require you to refactor your overall pipeline somewhat.

Caching streams in Functional Reactive Programming

I have an application which is written entirely using the FRP paradigm and I think I am having performance issues due to the way that I am creating the streams. It is written in Haxe but the problem is not language specific.
For example, I have this function, which returns a stream that resolves every time the config file is updated for the given section:
function getConfigSection(section:String) : Stream<Map<String, String>> {
return configFileUpdated()
.then(filterForSectionChanged(section))
.then(readFile)
.then(parseYaml);
}
In the reactive programming library I am using, promhx, each step of the chain should remember its last resolved value, but I think every time I call this function I am recreating the stream and reprocessing each step. This is a problem with the way I am using the library rather than with the library itself.
Since this function is called everywhere, parsing the YAML every time it is needed is killing performance; it accounts for over 50% of the CPU time according to profiling.
As a fix, I have done something like the following, using a Map stored as an instance variable that caches the streams:
function getConfigSection(section:String) : Stream<Map<String, String>> {
var cachedStream = this._streamCache.get(section);
if (cachedStream != null) {
return cachedStream;
}
var stream = configFileUpdated()
.filter(sectionFilter(section))
.then(readFile)
.then(parseYaml);
this._streamCache.set(section, stream);
return stream;
}
This might be a good solution to the problem but it doesn't feel right to me. I am wondering if anyone can think of a cleaner solution that maybe uses a more functional approach (closures etc.) or even an extension I can add to the stream like a cache function.
Another way I could do it is to create the streams beforehand and store them in fields that can be accessed by consumers. I don't like this approach because I don't want to make a field for every config section; I like being able to call a function with a specific section and get a stream back.
I'd love any ideas that could give me a fresh perspective!
Well, I think one answer is to just abstract away the caching like so:
class Test {
static function main() {
var sideeffects = 0;
var cached = memoize(function (x) return x + sideeffects++);
cached(1);
trace(sideeffects);//1
cached(1);
trace(sideeffects);//1
cached(3);
trace(sideeffects);//2
cached(3);
trace(sideeffects);//2
}
@:generic static function memoize<In, Out>(f:In->Out):In->Out {
var m = new Map<In, Out>();
return
function (input:In)
return switch m[input] {
case null: m[input] = f(input);
case output: output;
}
}
}
You may be able to find a more "functional" implementation for memoize down the road. But the important thing is that it is a separate thing now and you can use it at will.
You may choose to memoize(parseYaml) so that toggling two states in the file actually becomes very cheap after both have been parsed once. You can also tweak memoize to manage the cache size according to whatever strategy proves the most valuable.

Receive messages only from a specific DDS topic instance?

I'm using OpenDDS v3.6, and trying to send a message to a specific DDS peer, one of many. In the IDL, the message structure looks like the following:
module Test
{
#pragma DCPS_DATA_TYPE "Test::MyMessage"
#pragma DCPS_DATA_KEY "Test::MyMessage dest_id"
struct MyMessage {
short dest_id;
string txt;
};
};
My understanding is that because the data key is unique, this is a new instance of the topic being written to, and any further messages written with the same data key go to this specific instance of the topic. My send code is as follows:
DDS::ReturnCode_t ret;
Test::MyMessage msg;
// populate msg
msg.dest_id = n;
DDS::InstanceHandle_t handle;
handle = msg_writer->register_instance(msg);
ret = msg_writer->write(msg, handle);
So now I need to figure out how to get the receiving peer to read only from this topic instance and not receive all the other messages being sent to other peers. I started with the following, but I am not sure how to properly select a specific topic instance.
DDS::InstanceHandle_t instance;
status = msg_dr->take_next_instance(spec, si, 1, DDS::ANY_SAMPLE_STATE,
DDS::ANY_VIEW_STATE, DDS::ANY_INSTANCE_STATE);
Any help much appreciated.
The easiest way to achieve what you are looking for is by using a ContentFilteredTopic. This class is a specialization of the TopicDescription class and allows you to specify an expression (like a SQL WHERE clause) that selects the samples you are interested in.
Suppose you want your DataReader to only receive samples with dest_id equal to 42, then the corresponding code for creating the ContentFilteredTopic would look something like
DDS::ContentFilteredTopic_var cft =
participant->create_contentfilteredtopic("MyTopic-Filtered",
topic,
"dest_id = 42",
StringSeq());
From there on, you create your DataReader using cft as the parameter for the TopicDescription. The resulting reader will look like a regular DataReader, except that it only receives the desired samples and nothing else. Since the field dest_id happens to be the field that identifies the instance, the end result is that you will only have one instance in your DataReader.
You can check out the DDS specification (section 7.1.2.3.3) or OpenDDS Developer's Guide (section 5.2) for more details.
