Capture Serilog logging from subprocess

We may want a process to be able to capture Serilog output from a subprocess in real time, so it can be included in the master process's own Serilog log.
We would want to capture the full structured log objects, not just some rendered string.
How would I do that? I haven't done much research at this stage - just browsed the available Serilog sinks. I found a few that might be useful, e.g. Serilog.Sinks.Network. That could be used by the subprocess to send log events, but how would the master process capture them and merge them into its own stream of log events?
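Not an answer from the thread, but as an illustration of the "capture and merge" part: one minimal sketch (all names hypothetical) is to have the child write compact JSON (CLEF) to stdout, then have the parent parse those lines back into LogEvent objects with the Serilog.Formatting.Compact.Reader package and re-emit them through its own logger:

```csharp
// Sketch only: assumes the child logs CLEF to stdout, e.g. via
// .WriteTo.Console(new CompactJsonFormatter()) from Serilog.Formatting.Compact.
using System.Diagnostics;
using Serilog;
using Serilog.Events;
using Serilog.Formatting.Compact.Reader; // LogEventReader

var psi = new ProcessStartInfo("ChildApp") // hypothetical child executable
{
    RedirectStandardOutput = true,
    UseShellExecute = false
};

using var child = Process.Start(psi)!;
using var reader = new LogEventReader(child.StandardOutput);

// TryRead parses one CLEF line at a time back into a structured LogEvent.
while (reader.TryRead(out LogEvent evt))
{
    // Re-emit into the parent's own pipeline; properties, level and exception
    // details survive because the event was never flattened to a rendered string.
    Log.Logger.Write(evt);
}
```

A TCP/UDP route via Serilog.Sinks.Network would replace the stdout pipe, but the master side would still need some listener that turns whatever arrives back into LogEvent instances before re-emitting them.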

Related

Switch between WriteTo and AuditTo at runtime

AuditTo is a Serilog feature that ensures a synchronous write to the sink, with an exception thrown if the write or flush fails. As the name implies, it is ideal for making sure audit data actually gets stored. So far I have found the File, Seq and RabbitMQ sinks supporting AuditTo writes; I couldn't find it for the SQLite sink that I'm interested in ... :(
On the other side we have WriteTo, which batches the log entries and writes them asynchronously. There are no exceptions; it's kind of fire and forget. No one cares whether log entries are dropped because of a connection failure or the target system's failure or unavailability.
I would like to send the audit logs via AuditTo, but also be able to switch the logging configuration to WriteTo at runtime, while the app might still be writing logs.
I saw that Serilog offers dynamic switching of the logging level via LoggingLevelSwitch.
Any suggestions, ideas, or solutions for such requirements?
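Not an answer from the thread, just one possible shape for the "switch at runtime" idea, by analogy with LoggingLevelSwitch: register a single wrapper sink via AuditTo, and have it either let failures propagate (audit behaviour) or swallow them (WriteTo-like fire-and-forget) depending on a flag you can flip while the app runs. All names below are made up; batching and the choice of inner sink are left out.

```csharp
using Serilog.Core;
using Serilog.Events;

// Hypothetical runtime switch, analogous in spirit to LoggingLevelSwitch.
public sealed class AuditModeSwitch
{
    public volatile bool AuditMode = true; // flip at runtime
}

// Hypothetical wrapper sink: in audit mode, exceptions from the inner sink
// propagate to the caller (AuditTo semantics); otherwise they are swallowed
// (WriteTo-like fire-and-forget semantics).
public sealed class SwitchableSink : ILogEventSink
{
    private readonly ILogEventSink _inner;
    private readonly AuditModeSwitch _switch;

    public SwitchableSink(ILogEventSink inner, AuditModeSwitch modeSwitch)
    {
        _inner = inner;
        _switch = modeSwitch;
    }

    public void Emit(LogEvent logEvent)
    {
        try
        {
            _inner.Emit(logEvent);
        }
        catch when (!_switch.AuditMode)
        {
            // Fire-and-forget mode: drop the event silently.
        }
    }
}
```

It would be registered once, e.g. via AuditTo.Sink(new SwitchableSink(innerSink, modeSwitch)), so the pipeline stays synchronous and the flag only decides whether failures surface; reproducing WriteTo's actual background batching would need more work than this sketch shows.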

Can you configure a default or backup sink?

In a .NET Core app, if one configures MSSQL as a sink via appsettings, how can you configure a "backup" sink such as a file? So if Serilog can't write to a given sink, it tries to write to the second one.
Is SelfLog what I need to look at?
Serilog doesn't have a feature like that as of this writing, but it should be possible to implement by creating your own sink that wraps the primary sink(s) and handles errors by falling back to the "backup" sink(s).
You might want to look at the code of Serilog.Sinks.Async for inspiration, as it shows you a way of wrapping multiple sinks into one which could be a starting point.
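To make the wrapping idea concrete, here is a rough sketch (the names are made up, not a Serilog API) of a fallback sink that tries the primary sink and only writes to the backup when the primary throws:

```csharp
using Serilog.Core;
using Serilog.Events;

// Hypothetical fallback wrapper: not part of Serilog, just the shape of the
// "wrap the primary sink and handle errors" approach described above.
public sealed class FallbackSink : ILogEventSink
{
    private readonly ILogEventSink _primary;
    private readonly ILogEventSink _backup;

    public FallbackSink(ILogEventSink primary, ILogEventSink backup)
    {
        _primary = primary;
        _backup = backup;
    }

    public void Emit(LogEvent logEvent)
    {
        try
        {
            _primary.Emit(logEvent); // e.g. the database sink
        }
        catch
        {
            _backup.Emit(logEvent);  // e.g. a rolling file sink
        }
    }
}
```

One caveat: many WriteTo-style sinks batch in the background, so Emit may return before the write actually fails; for the fallback to trigger reliably the primary would probably need a synchronous/audit-style configuration, and SelfLog remains useful for seeing failures you can't catch here.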

How to read from pubsub source in parallel using dataflow

I am very new to Dataflow; I am looking to build a pipeline that will use Pub/Sub as the source.
I have worked on a streaming pipeline with Flink as the streaming engine and Kafka as the source; there we can set the parallelism in Flink for reading messages from Kafka, so that message processing happens in parallel instead of sequentially.
I am wondering whether the same is possible with Pub/Sub -> Dataflow, or whether it will only read messages sequentially.
Take a look at the PubSubToBigQuery pipeline. This uses Pub/Sub as a source and will read data in parallel: by default, multiple threads will each read a message off Pub/Sub and hand it off to downstream transforms for processing.
Note that the PubSubToBQ pipeline can also be run as a template pipeline, which works well for many users. Just launch the pipeline from the Template UI and set the appropriate parameters to point to your Pub/Sub and BQ locations. Some users prefer to use it that way, but this depends on where you want to store your data.

Does the Serilog.Sinks.Console sink get any benefit from being wrapped in the Serilog.Sinks.Async sink?

I'm using Serilog inside my ASP.NET Core app for logging, and I need to write log events to the console pretty frequently (300-500 events per second). I run my app inside a Docker container and process the console logs using orchestrator tools.
So my question: should I use the Async wrapper for my Console sink, and will I get any benefit from that?
I read the documentation (https://github.com/serilog/serilog-sinks-async), but can't tell whether it applies to the Console sink or not.
The Async sink takes the already-captured LogEvent items and shifts them from multiple foreground threads to a single background processor via a ConcurrentQueue producer/consumer collection. In general that's a good thing for stable throughput, especially at that rate of events.
Also, if you're sending to more than one sink, shifting the work to a background thread that gets scheduled as necessary and stays focused on that workload (i.e., the code paths propagating to the sinks stay in cache) can be good, provided you have enough cores available and/or the sinks block even momentarily.
Having said that, basing anything on this information alone would be premature optimization.
Whether a Console sink can ingest efficiently without blocking when you don't put an Async in front of it always depends a lot on the environment - for example, hosting environments typically synthesize a stdout that buffers efficiently. When that works well, adding an Async in front of the Console sink merely prolongs object lifetimes without much benefit versus letting each thread submit to the Console sink directly.
So, it depends - in my experience, feeding everything to Async and doing all the processing there (e.g. writing to a buffered file, emitting every 0.5s, perhaps to a sidecar process that forwards to your log store) can work well. The bottom line is that a good load-generator rig is a very useful thing for any high-throughput app. Once you have one, you can experiment - I've seen 30% throughput gains from reorganizing the exact same output and how it's scheduled (admittedly I also switched to Serilog during that transition - you're unlikely to see anything of that order).
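For reference, the wrapping under discussion is a one-liner; a minimal sketch following the Serilog.Sinks.Async README linked in the question:

```csharp
using Serilog;

// Console writes now happen on a single background worker fed by a queue,
// instead of on each request thread.
Log.Logger = new LoggerConfiguration()
    .WriteTo.Async(a => a.Console())
    .CreateLogger();

Log.Information("Handled {RequestPath} in {Elapsed} ms", "/orders", 42);

// Flush the background queue on shutdown so buffered events are not lost.
Log.CloseAndFlush();
```

Whether this beats letting each thread write to the (typically buffered) container stdout directly is exactly what the load-generator experiment above would tell you.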

Any easier way to flush aggregator to GCS at the end of google dataflow pipeline

I am using Aggregators to log some runtime stats of a Dataflow job, and I want to flush them to either GCS or BQ when the pipeline completes (or when each transform completes).
Currently I am doing it by, in addition to using Aggregators, also creating a side output (using a TupleTag) at the same time and flushing that side-output PCollection.
However, I am wondering whether there might be any other handy way to flush the aggregators themselves directly?
Your method of using a side output PCollection should produce semantically equivalent results to using an Aggregator. (For example, both Aggregators and side outputs will not include duplicate values when a bundle fails and has to be retried.) The main difference is that partial results for Aggregators are available during pipeline execution in the monitoring UI and programmatically.
Within Java, you can use PipelineResult.getAggregatorValues(). If you get the PipelineResult from the (non-blocking) DataflowPipelineRunner, that will let you query aggregators as the job runs. If you use the BlockingDataflowPipelineRunner, Pipeline.run() blocks and you won't get the PipelineResult until after the job completes.
There's also commandline support: gcloud alpha dataflow metrics tail JOB_ID
