I am a little confused by the explanation on the log4j site; could somebody please shed some light on this?
Does the graph represent the number of application threads currently used by the application to run its business logic, or the number of threads that will be used by the logger or disruptor in the case of all-async loggers?
In other words, does the graph represent the threads that are generating logs, or the threads consuming logs on behalf of log4j2?
Please share some official references. Also, is there any way to configure the number of threads that the logger will use?
I don't want my application to be slowed down by logger threads, so is there any way to configure log4j to use less CPU?
The number of threads in the graph represents the number of application threads that do logging concurrently.
With Async Loggers there is only one background thread that writes to the log files.
What the graph shows is that the data structure used to hand off log events from the many producer (application) threads to the single consumer thread is very efficient even under high contention, resulting in much better logging throughput.
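As an illustration of that handoff pattern: log4j2's Async Loggers actually use the LMAX Disruptor ring buffer, which is far more sophisticated, but this minimal Java sketch substitutes an `ArrayBlockingQueue` just to show the shape of many producer (application) threads feeding one background consumer. All names here are made up for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class HandoffSketch {
    // Many producer threads put events into a bounded queue; one consumer
    // thread drains it. Returns the number of events the consumer saw.
    public static int runHandoff(int producers, int eventsPerProducer) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        int total = producers * eventsPerProducer;
        final int[] seen = {0};

        // Single background consumer, like the one appender thread in log4j2.
        Thread consumer = new Thread(() -> {
            try {
                while (seen[0] < total) {
                    queue.take();      // in log4j2 this would append to the file
                    seen[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Application threads logging concurrently (the graph's x-axis).
        Thread[] apps = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            apps[p] = new Thread(() -> {
                for (int i = 0; i < eventsPerProducer; i++) {
                    try {
                        queue.put("event");   // producer side of the handoff
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            apps[p].start();
        }
        for (Thread t : apps) t.join();
        consumer.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runHandoff(4, 1000) + " events consumed");
    }
}
```

The benchmark in the graph is essentially measuring how well this handoff scales as the producer count grows; the Disruptor's lock-free ring buffer degrades far less under contention than a simple lock-based queue would.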
Related
I'm using Serilog inside my ASP.NET Core app for logging, and I need to write log events to the console fairly frequently (300-500 events per second). I run my app inside a Docker container and process the console logs using orchestrator tools.
So my question: should I use the Async wrapper for my Console sink, and will I get any benefit from it?
I read the documentation (https://github.com/serilog/serilog-sinks-async), but I can't tell whether it applies to the Console sink or not.
The Async Sink takes the already-captured LogEvent items and shifts them from multiple foreground threads to a single background processor using a ConcurrentQueue producer/consumer collection. In general that's a good thing for stable throughput, especially at that event rate.
Also, if you are sending to more than one sink, shifting the work to a background thread that gets scheduled to focus on that workload (i.e., keeping the paths that propagate to sinks in cache) can help, provided you have enough cores available and/or the sinks block even momentarily.
Having said that, basing anything on this information alone is premature optimization.
Whether a Console sink can ingest efficiently without blocking when you don't put an Async wrapper in front always depends a lot on the environment; for example, hosting environments typically synthesize a stdout that buffers efficiently. When that works well, adding an Async wrapper in front of the Console sink merely prolongs object lifetimes without much benefit versus letting each thread submit to the Console sink directly.
So, it depends. In my experience, feeding everything to Async and doing all processing there (e.g., writing to a buffered file and emitting every 0.5 s, perhaps to a sidecar process that forwards to your log store) can work well. The bottom line is that a good load-generator rig is a very useful thing for any high-throughput app. Once you have one, you can experiment: I've seen 30% throughput gains from reorganizing the exact same output and how it's scheduled (admittedly I also switched to Serilog during that transition; you're unlikely to see anything of that order).
I am creating a server (a messaging service) and am running some preliminary benchmarks. So far, the fastest ways to process the data are to do it directly in the user's process or to use worker pools. I have tested spawning a new process per task, and that is unbelievably slow.
The test just connects 10k users and has each one send 15 kB of data a couple of times simultaneously (or at least trying to), with the server processing the data (total length, headers, and payload).
The issue I have with worker pools is that they are only fast when you have enough workers to offset the number of connections. For example, with 500k or 1 million users, you would need more workers to process all the concurrent data coming in. And in my testing, having 1,000 workers made it unusable.
So my questions are the following: when does it make sense to use pools of workers? Is there a tipping point where I would have to use workers to process the data in order to free up the user process? How many workers is too many; is 500,000 too many?
And if workers are the way to go (for those massively concurrent distributed servers), I am guessing you can dynamically create/delete them as needed?
Any literature is also appreciated!
Thanks for your answer!
Maybe worker pools are not the best tool for your problem. If I were you, I would try Jay Nelson's epocxy, which gives you a very basic backpressure mechanism while still letting you parallelize your tasks. From that library, I would check either the concurrency fount or the concurrency-control tools.
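To illustrate the general idea behind backpressure (epocxy itself is Erlang; this hypothetical Java sketch only shows the concept of capping in-flight work with a semaphore so producers wait instead of overwhelming a fixed pool of workers):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

public class BackpressureSketch {
    // Cap on concurrently queued/running messages; beyond it, submitters block.
    private final Semaphore slots;
    private final ExecutorService workers;

    public BackpressureSketch(int maxInFlight, int workerThreads) {
        this.slots = new Semaphore(maxInFlight);
        this.workers = Executors.newFixedThreadPool(workerThreads);
    }

    // Blocks the calling (connection) thread when maxInFlight tasks are
    // already pending: this is the backpressure that protects the workers.
    public Future<Integer> submit(byte[] payload) throws InterruptedException {
        slots.acquire();
        return workers.submit(() -> {
            try {
                return payload.length;   // stand-in for real parsing work
            } finally {
                slots.release();         // free the slot when the task is done
            }
        });
    }

    public void shutdown() {
        workers.shutdown();
    }
}
```

The point is that you size the pool for your cores, not for your connection count, and let the semaphore make overloaded producers wait, which avoids the "1,000 workers makes it unusable" situation.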
I have been reading the documentation trying to understand when it makes sense to increase the async-thread pool size via the +A N switch.
I am perfectly prepared to benchmark, but I was wondering if there is a rule of thumb for when one ought to suspect that growing the pool size from 0 to N (or N to N+M) would be helpful.
Thanks
The BEAM runs Erlang code in special threads it calls schedulers. By default it starts a scheduler for every core in your processor. This can be controlled at start-up time, for instance if you don't want to run Erlang on all cores but "reserve" some for other things. Normally, a file I/O operation runs in a scheduler, and as file I/O operations are relatively slow they block that scheduler while they are running, which can affect the real-time properties. Normally you don't do that much file I/O, so it is not a problem.
The asynchronous thread pool consists of OS threads which are used for I/O operations. Normally the pool is empty, but if you use the +A option at start-up then the BEAM will create extra threads for this pool. These threads are then used only for file I/O operations, which means that the scheduler threads no longer block waiting for file I/O and the real-time properties improve. Of course this has a cost, as OS threads aren't free. The thread types don't mix, so scheduler threads are just scheduler threads and async threads are just async threads.
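For reference, both pools are sized on the command line when the VM starts; the values below are only examples, not recommendations:

```
# start the BEAM with 4 scheduler threads and 64 async I/O threads
erl +S 4 +A 64
```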
If you are writing linked-in drivers for ports, these can also use the async thread pool, but you have to detect yourself whether it has been started.
How many you need is very much up to your application. By default, none are started. Like #demeshchuk, I have also heard that Riak likes to have a large async thread pool, as it opens many files. My only advice is to try it and measure, as with all optimisation.
By default, the number of scheduler threads in a running Erlang VM is equal to the number of logical processor cores (if you are using SMP, of course).
From my experience, increasing the +A parameter may give some performance improvement when you have many simultaneous file I/O operations. I doubt that increasing +A will improve overall process performance, since the BEAM scheduler is extremely fast and well optimized.
Speaking of exact numbers, that totally depends on your application, I think. Say, in the case of Riak, where the maximum number of open files is more or less predictable, you can set +A to that maximum, or several times less if it's far too big (by default it's 64, BTW). If your application works with, say, millions of files and serves them to web clients, that's another story; most likely you will want to run some benchmarks with your own code in your own environment.
Finally, I don't believe I've ever seen +A set to more than a hundred. That doesn't mean you can't set it higher, but there's likely no point in it.
I am curious as to how to proceed with this issue; I currently have a DataSnap server set up with a TDSAuthenticationManager class managing authentication.
If an authentication attempt fails, is it safe for me to write directly to a TMemo on a form (or something similar) for logging purposes? What's the best way to go about this?
Do I need threading?
Cheers for reading,
Adrian
Yes, you need synchronization, since Datasnap events run in the context of different threads, and as you may know, the UI programming is limited to the main thread.
So, if you want to display something in the UI, you have to take care of how to do it.
On the other hand, if you want to log to a file, you don't need to synchronize with the main thread, but you do have to be careful, since two different threads may try to log at the same time.
The options I would evaluate are:
Protect access to the log file with a critical section, avoiding multi-threaded access with a lock: only one thread can access the file at a time, and all other interested threads have to wait.
Create a new logging class with a global instance that takes log requests by simply adding the log message to a thread-safe in-memory queue, and runs its own thread that writes the messages to a file whenever the queue is non-empty.
Since servers tend to run as services in production environments, I would choose the latter.
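A minimal sketch of that second option (the question is Delphi/DataSnap, so this Java version and all its names are only illustrative; the idea carries over directly: a thread-safe queue drained by a single dedicated writer thread):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class AsyncLogger {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> sink;       // stands in for the log file
    private final Thread writer;
    private volatile boolean running = true;

    public AsyncLogger(List<String> sink) {
        this.sink = sink;
        // Single writer thread: server threads never touch the file directly,
        // so no critical section is needed around file access.
        writer = new Thread(() -> {
            try {
                while (running || !queue.isEmpty()) {
                    String msg = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (msg != null) sink.add(msg);  // real code would write/flush a file
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
    }

    // Safe to call from any server worker thread; never blocks on I/O.
    public void log(String message) {
        queue.add(message);
    }

    // Drains remaining messages, then stops the writer thread.
    public void close() throws InterruptedException {
        running = false;
        writer.join();
    }
}
```

The worker threads only pay the cost of an enqueue, which is why this tends to behave better under load than having every thread contend on a file lock.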
I have a controller action that aggregates data from multiple sources: a web service, a database, file lookups, etc., and passes the results to the view. In order to render the page, all tasks must have completed. Currently they are performed sequentially, but as they are independent, I am thinking of running them in parallel, as this could improve performance.
So what would be the best approach to achieve this? For each task, start a new thread and block the main thread until all tasks are finished? Should I use a thread from the thread pool or spawn a new thread manually? Using threads from the thread pool would limit my web server's capacity to serve new requests, so this might not be a good idea. Spawning new threads manually could be expensive, so at the end of the day, would there be a net performance gain from parallelizing these tasks, or should I just leave them sequential?
If it's between spawning your own threads or using the thread pool threads, I'd say use the ones from the thread pool. You can always adjust your server settings to allow for more threads in the pool if you find that you are running out of threads.
The only way to answer your final question would be to actually test it out, as we don't know how complicated the separate aggregation tasks are. If you want to give the illusion of a responsive UI, you could always display the loading page and kick off the aggregation with AJAX. Even non-threaded, this may placate your users sufficiently.
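To illustrate the fan-out/join shape being discussed (the question is ASP.NET MVC, where the framework's own task facilities would be the idiomatic choice; this Java sketch with a small dedicated pool just shows starting the independent lookups together and blocking until all of them complete, and the data-source names are placeholders):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelAggregate {
    // Runs the independent lookups in parallel and blocks until all complete,
    // mirroring "start every task, then wait before rendering the view".
    public static List<String> fetchAll(ExecutorService pool) throws Exception {
        List<Callable<String>> tasks = List.of(
            () -> "web-service-data",    // stand-ins for the real sources
            () -> "database-data",
            () -> "file-lookup-data");
        List<Future<String>> futures = pool.invokeAll(tasks);  // join point
        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) results.add(f.get()); // rethrows task errors
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            System.out.println(fetchAll(pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

The wall-clock win is roughly the difference between the sum of the lookup latencies and the slowest single lookup, which is why this only pays off when the tasks genuinely overlap on I/O rather than compete for the same resource.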