Serilog File/RollingFile sink and buffering - serilog

It looks like the Serilog File/RollingFile sink flushes the stream after each logger call.
Isn't that a fundamental performance hit? NLog, for example, has an AsyncWrapper for queuing log events and writing them in batches on a background thread.
What are the options if I want to minimize latency when using the file sink?

Rebuilding the code yourself is the only option for this currently.
I've added https://github.com/serilog/serilog/issues/650 in the hope that it will be included in the upcoming Serilog v2.
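In the meantime, the "rebuild it yourself" route essentially means putting a queue between the logging call and the file write, much like NLog's AsyncWrapper. A rough sketch of that idea (the BackgroundQueueSink name, the queue bound, and the wiring are illustrative, not part of Serilog):

    using System;
    using System.Collections.Concurrent;
    using System.Threading.Tasks;
    using Serilog;
    using Serilog.Core;
    using Serilog.Events;

    // Log calls only enqueue the event; a single worker task drains the queue
    // and performs the actual (flushing) file write off the calling thread.
    class BackgroundQueueSink : ILogEventSink, IDisposable
    {
        readonly ILogEventSink _inner;
        readonly BlockingCollection<LogEvent> _queue = new BlockingCollection<LogEvent>(10000);
        readonly Task _worker;

        public BackgroundQueueSink(ILogEventSink inner)
        {
            _inner = inner;
            _worker = Task.Run(() =>
            {
                foreach (var evt in _queue.GetConsumingEnumerable())
                    _inner.Emit(evt);
            });
        }

        // Returns quickly unless the bounded queue is full.
        public void Emit(LogEvent logEvent) => _queue.Add(logEvent);

        public void Dispose()
        {
            _queue.CompleteAdding();   // let the worker flush what is left
            _worker.Wait();
            (_inner as IDisposable)?.Dispose();
        }
    }

    // Usage: Serilog.Core.Logger implements ILogEventSink, so a file logger can be wrapped.
    // var fileLogger = new LoggerConfiguration().WriteTo.File("log.txt").CreateLogger();
    // Log.Logger = new LoggerConfiguration()
    //     .WriteTo.Sink(new BackgroundQueueSink(fileLogger))
    //     .CreateLogger();

The trade-off is the usual one: lower latency on the logging call in exchange for possible loss of queued events if the process dies before the queue drains.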

Related

Best practice to detect broken log sinks

I am trying to replace the in-house logging solution of one of my customers. Pretty much everything is straightforward, but I need to implement one sink that sends the logs to a custom log window that I cannot change (for now). It communicates using named pipes. The pipe may be broken or busy, so the current solution blocks on every log call, which I want to improve.
The question is what the best practice is when using Serilog: what is the best way to tell Serilog that the sink is currently broken, so that it does not slow down the system? Is throwing an exception enough?
Serilog itself doesn't know (or care) whether a sink is broken, so I'm not sure I understand your goal.
Writing to a Serilog logger is meant to be a safe operation by design, so any exceptions that happen in your sink are automatically caught by Serilog to make sure the app doesn't crash. Serilog writes these exceptions to the SelfLog, which developers can use to troubleshoot sink issues. See an example here.
Therefore, if your goal is to give developers a way to see when the sink has experienced problems, the recommendation is to write error messages to the SelfLog and throw your own exceptions from within your sink.
If you can detect from within your sink, without blocking, that the named pipe is not available, then just write to the SelfLog and return/short-circuit without trying to write to the pipe. It's really up to you to implement whatever resilience policy you need inside your sink.
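To make that concrete, here is a minimal sketch of the SelfLog-plus-short-circuit idea; the injected send delegate and the 30-second back-off are illustrative choices, not Serilog APIs:

    using System;
    using Serilog.Core;
    using Serilog.Debugging;
    using Serilog.Events;

    // A sink that reports pipe failures to SelfLog and then skips writes for a
    // while, instead of blocking every log call on a broken pipe.
    class PipeSink : ILogEventSink
    {
        readonly Action<string> _send;            // your named-pipe write, injected
        DateTime _retryAfter = DateTime.MinValue;

        public PipeSink(Action<string> send) => _send = send;

        public void Emit(LogEvent logEvent)
        {
            if (DateTime.UtcNow < _retryAfter)
                return;                           // pipe recently failed: short-circuit

            try
            {
                _send(logEvent.RenderMessage());
            }
            catch (Exception ex)
            {
                SelfLog.WriteLine("PipeSink failed: {0}", ex);
                _retryAfter = DateTime.UtcNow.AddSeconds(30);   // simple back-off
            }
        }
    }

During development, SelfLog.Enable(Console.Error) (or a file TextWriter) makes those failure messages visible.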
If your goal is to improve the blocking calls, consider making your sink asynchronous, sending the messages on a separate thread so the app's main thread is not blocked.
Given that you're implementing your own custom sink, an easy way to do that is to turn it into a Periodic Batching sink and leverage the infrastructure it provides. Alternatively, you can use the Serilog.Sinks.Async wrapper sink.
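Continuing the PipeSink sketch above, the Serilog.Sinks.Async wiring is a one-liner; WriteToNamedPipe stands in for your own pipe code:

    // Pipe writes now happen on a background worker, not on the logging thread.
    Log.Logger = new LoggerConfiguration()
        .WriteTo.Async(a => a.Sink(new PipeSink(WriteToNamedPipe)))
        .CreateLogger();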

Does Log4j2 AsyncLogger follow insertion order?

Suppose I have a single thread that keeps writing logs using the Log4j2 AsyncLogger; will the logs in the files follow the order of the calls? How many threads does it use to consume the log events?
A message-ordering bug was fixed in version 2.10.0, so from that version on the messages should appear in order.
According to this answer, there is only one thread that writes to the file.

How can I save additional messages that would normally be excluded by Loglevel in case of errors

I have a basic Serilog usage scenario: logging messages from a web application. In production I set the log level to Information.
Now my question: is it possible to write the last ~100 debug/trace messages to the log after an error occurs, so that I have a short history of detailed messages leading up to the error? This would keep my log clean and give me enough information to track down errors.
I built such a mechanism years ago for another application and logging framework, but I'm curious whether that's already possible with Serilog.
If not, where in the pipeline would be the right place to implement such logic?
This is not something that Serilog has out-of-the-box, but it would be possible to implement by writing a custom sink that wraps all other sinks and caches the most recent ~100 Debug messages and forwards them to the sinks when an Error message occurs.
You might want to look at the code of Serilog.Sinks.Async for inspiration, as it shows you a way of wrapping multiple sinks into one.
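A rough sketch of that wrapping sink, assuming a single wrapped sink and a fixed buffer of 100 events (the class name and the thresholds are illustrative):

    using System.Collections.Generic;
    using Serilog.Core;
    using Serilog.Events;

    // Buffers sub-Information events in memory and replays them to the wrapped
    // sink when an Error (or worse) arrives, so the detailed history only shows
    // up in the log around actual problems.
    class RecentHistorySink : ILogEventSink
    {
        const int Capacity = 100;
        readonly ILogEventSink _inner;
        readonly Queue<LogEvent> _recent = new Queue<LogEvent>();
        readonly object _sync = new object();

        public RecentHistorySink(ILogEventSink inner) => _inner = inner;

        public void Emit(LogEvent logEvent)
        {
            lock (_sync)
            {
                if (logEvent.Level >= LogEventLevel.Error)
                {
                    while (_recent.Count > 0)
                        _inner.Emit(_recent.Dequeue());   // flush the buffered history first
                    _inner.Emit(logEvent);
                }
                else if (logEvent.Level >= LogEventLevel.Information)
                {
                    _inner.Emit(logEvent);                // normal events pass straight through
                }
                else
                {
                    _recent.Enqueue(logEvent);            // Debug/Verbose: buffer only
                    if (_recent.Count > Capacity)
                        _recent.Dequeue();
                }
            }
        }
    }

Note that the logger itself has to be configured at Debug or Verbose so these events reach the sink at all; the wrapper, rather than the minimum level, then decides what actually ends up in the file.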

what actually manages watermarks in beam?

Beam's big power comes from its advanced windowing capabilities, but they're also a bit confusing.
Having seen some oddities in local tests (I use RabbitMQ for an input Source), where messages were not always getting acked and fixed windows were not always closing, I started digging around StackOverflow and the Beam code base.
It seems there are Source-specific concerns with when exactly watermarks are set:
RabbitMQ watermark does not advance: Apache Beam : RabbitMqIO watermark doesn't advance
PubSub watermark does not advance for low volumes: https://issues.apache.org/jira/browse/BEAM-7322
SQS IO does not advance the watermark over a period of time of no new incoming messages - https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/sqs/SqsIO.java#L44
(and others). Further, there seem to be independent notions of Checkpoints (CheckpointMarks) as opposed to Watermarks.
So I suppose this is a multi-part question:
What code is responsible for moving the watermark? It seems to be some combination of the Source and the Runner, but I can't seem to actually find it to understand it better (or tweak it for our use cases). This is a particular issue for me, as in periods of low volume the watermark never advances and messages are not acked.
I don't see much documentation around what a Checkpoint/CheckpointMark is conceptually (the non-code Beam documentation doesn't discuss it). How does a CheckpointMark interact with a Watermark, if at all?
Each PCollection has its own watermark. The watermark indicates how complete that particular PCollection is. The source is responsible for the watermark of the PCollection that it produces. The propagation of watermarks to downstream PCollections is automatic with no additional approximation; it can be roughly understood as "the minimum of input PCollections and buffered state". So in your case, it is RabbitMqIO to look at for watermark problems. I am not familiar with this particular IO connector, but a bug report or email to the user list would be good if you have not already done this.
A checkpoint is a source-specific piece of data that allows it to resume reading without missing messages, as long as the checkpoint is durably persisted by the runner. Message ACKs tend to happen in checkpoint finalization, since the runner calls this method when it is known that the message never needs to be re-read.
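To make the relationship concrete, here is a sketch of where the watermark and the ACKs live in a custom unbounded source. The method names (finalizeCheckpoint, getWatermark, getCheckpointMark) come from Beam's UnboundedSource API; the delivery-tag bookkeeping is an assumption modelled loosely on a RabbitMQ-style source:

    import java.io.IOException;
    import java.io.Serializable;
    import java.util.List;
    import org.apache.beam.sdk.io.UnboundedSource;

    // The checkpoint mark carries whatever the source needs to resume safely.
    // ACKs belong in finalizeCheckpoint(): the runner calls it only once the
    // checkpoint is durably persisted, i.e. the messages never need re-reading.
    class MyCheckpointMark implements UnboundedSource.CheckpointMark, Serializable {
      private final List<Long> deliveryTagsToAck;

      MyCheckpointMark(List<Long> deliveryTagsToAck) {
        this.deliveryTagsToAck = deliveryTagsToAck;
      }

      @Override
      public void finalizeCheckpoint() throws IOException {
        // e.g. channel.basicAck(tag, false) for each tag in deliveryTagsToAck
      }
    }

    // The watermark, by contrast, lives on the reader. Inside the matching
    // UnboundedSource.UnboundedReader subclass you would have something like:
    //
    //   @Override
    //   public Instant getWatermark() {
    //     // The source decides its own watermark; if this never advances in
    //     // quiet periods, downstream fixed windows never close.
    //     return lastObservedTimestamp;
    //   }
    //
    //   @Override
    //   public UnboundedSource.CheckpointMark getCheckpointMark() {
    //     return new MyCheckpointMark(tagsReadSinceLastCheckpoint);
    //   }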

Log4j2 zipping performance

So I have Log4j2 set up to log hourly. I know this happens on a background thread. Unfortunately, our product logs a lot and zips over 500 MB hourly. We notice a small hiccup each hour when the rollover happens, so it looks like the Log4j2 background thread takes too much CPU. It is a small hiccup, but unfortunately it already causes internal errors in our product.
Is there any way to improve the performance of the Log4j2 zipping? Could we, for example, tell the background thread that it is only allowed to use a certain percentage of the CPU?
edit:
I've seen CPU usage going through the roof at the moment it happens, so it is definitely a CPU usage thing.
As far as I know there is no built-in way to avoid the high CPU usage on rollover, so you have to implement that yourself.
Log4j2 is designed to be easily extended. I cannot give you a full solution, but you should have a look at the org.apache.logging.log4j.core.appender.rolling.action.GzCompressAction class, which performs the compression (in case you are using gz). I think it should be possible to implement your own action class and add a delay to the compression as you like.
The zip format allows a “compressionLevel” configuration attribute. This is something you can experiment with to see if it makes a difference. See the DefaultRolloverStrategy parameters in https://logging.apache.org/log4j/2.x/manual/appenders.html#RollingFileAppender
Another thing to try is simply zipping smaller files. You can tell Log4j2 to roll over based on size. You'll have more, smaller log files, but you can post-process them offline where it doesn't disrupt the application.
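As a rough illustration of both suggestions combined, assuming an hourly RollingFile appender that zips on rollover (the file names and sizes here are made up; the attributes are the documented triggering-policy and DefaultRolloverStrategy ones):

    <RollingFile name="App" fileName="logs/app.log"
                 filePattern="logs/app-%d{yyyy-MM-dd-HH}-%i.log.zip">
      <PatternLayout pattern="%d %p %c - %m%n"/>
      <Policies>
        <!-- still roll hourly, but also roll when a file reaches 100 MB, so
             each compression pass works on a smaller file -->
        <TimeBasedTriggeringPolicy/>
        <SizeBasedTriggeringPolicy size="100 MB"/>
      </Policies>
      <!-- compressionLevel 1 = best speed: less CPU per rollover, larger archives -->
      <DefaultRolloverStrategy max="50" compressionLevel="1"/>
    </RollingFile>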

Resources