Erlang: error_logger and overload protection

Say I have an Erlang system that sometimes misbehaves and produces a lot of warnings, which are logged to a file. The system should still function properly as long as the logging doesn't block disk I/O (i.e. the logs are "more or less expected"). (This is a real-world scenario, not something I made up.)
The error_logger that comes with Erlang doesn't have overload protection, so if the volume of logs is really big, the logging will block disk I/O and possibly cause the system to malfunction.
My question is: why doesn't error_logger have overload protection by default? Is it because this feature is not actually needed if you design your architecture right? Or is it because this is some kind of advanced feature, and if you need it you should use another library, like lager?

I would say that overload protection is a feature, and that is a matter of implementation (and/or configuration).
Overload protection is perhaps a nice feature to have in a logging toolkit, useful for some people, even if probably most people won't need it, but error_logger is really a logging interface, one designed to support arbitrary implementations (each with disparate configuration depending on their features) that the developer/integrator/user can choose to plug in, depending on their requirements.
This may not be exhaustive, but things that immediately come to mind which change logging requirements are:
application (some applications obviously log more/less/differently than others)
host environment (capabilities are different for traditional or SSD storage for example)
use case of the application (an end user or developer might deploy with configurations that mean more or less logging)
local infrastructure and standards (some organisations might use only local logs, but others may use syslog everywhere, religiously)
external or 3rd party environmental factors (such as another network service which talks to the application/node, causing logging)
It's really important to decouple the logging interface from implementation, because:
a developer might change their mind about the logging implementation part way through a project, and the decoupling means this is easy
a system administrator might need to modify or override the developer's default because of host, use case, local infrastructure and other environmental factors, some of which the developer may not be able to anticipate
Thus, because it's an interface, error_logger doesn't and shouldn't have overload protection; it's outside error_logger's remit.
I freely admit that it may be possible to make plausible arguments for including some implementation in an interface, and I can imagine arguments for including features like overload protection in error_logger despite what I've said, but it's a slippery slope. I would choose purity and simplicity instead; I think it's worth keeping error_logger lean and mean rather than allowing additional bulk to creep into it that will affect the performance of every logging implementation everywhere. Take that path and before you know it the limitation won't be disk I/O blocking, it'll be error_logger itself because it has become bloated, and there won't be anything to be done about it other than to invent a new error logger to use instead.
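To make "overload protection as a matter of implementation" concrete, here is a minimal sketch of a log handler that sheds load rather than blocking its callers. It is written in Python purely for illustration and is not how error_logger or lager work; the class name SheddingHandler, the max_pending parameter and the drain method are all hypothetical.

```python
import logging
import queue


class SheddingHandler(logging.Handler):
    """Hypothetical handler: buffers records in a bounded queue and drops
    new records when the queue is full instead of blocking the caller."""

    def __init__(self, target, max_pending=1000):
        super().__init__()
        self.target = target                          # the real, possibly slow, handler
        self.pending = queue.Queue(maxsize=max_pending)
        self.dropped = 0                              # records shed under overload

    def emit(self, record):
        try:
            self.pending.put_nowait(record)           # never blocks the logging caller
        except queue.Full:
            self.dropped += 1                         # overload: shed instead of stalling

    def drain(self):
        """Forward buffered records to the target; returns how many were written.
        Intended to be called from a separate writer thread or a timer."""
        written = 0
        while True:
            try:
                record = self.pending.get_nowait()
            except queue.Empty:
                return written
            self.target.handle(record)
            written += 1
```

The point of the sketch is that the shedding policy lives entirely in the plugged-in handler, so the logging interface that callers use does not change.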

Related

WebAssembly: Reconstructing the stack from scratch

By transforming .wasm source files, or by interacting with a suitable debugger from JavaScript, it should be possible to serialize the full Wasm execution state (mainly the stack, call frames, local variables etc.).
I wonder if it is possible to reconstruct it using this serialized representation and continue running the program where it was stopped on another machine.
Could current browser runtimes support this?

Not sure what transformation or debugger you have in mind, but your premise that it is possible to serialise JavaScript execution state is false. It would in fact be extremely difficult to implement such a mechanism in browser engines. No production JS engine I'm aware of can even serialise its heap in the general case (even though some, like V8, have a very limited snapshot mechanism for the start-up heap), let alone the call stack and live function state, which may be in one of many optimisation modes, arbitrarily intermixed with C or assembly stack frames from the runtime or the embedder, and which are generally super tricky to capture.
The mechanism you have in mind would require general serialisation on top of first-class undelimited continuations. TC39, the JavaScript committee, discarded the idea of adding full-blown continuations to the language many years ago because it was deemed too hard and too expensive to implement in most engines (which is why ES6 instead introduced generators as a much more limited mechanism). Edit: Generic serialisation wasn't even ever considered, since it would actually break encapsulation via closures or proxies, and thus all existing security patterns of the language.

What elements are needed to implement a remote, event driven system? - overview needed

I am trying to design an event driven system where the elements of the system communicate by generating events that are responded to by other components of the system. It is intended that the components be independent of each other - or as largely independent as I can make them. The system will initially be implemented on Windows 7, and is being written in Delphi. The generated events will be generated by the Delphi code. I understand how to implement a system of the type described on a single machine.
I wish to design the system so that it can readily be deployed on different machine architectures in particular with different components running on a distributed architecture, which may well be different to Windows 7. There is no requirement for the system ever to communicate with any systems external to itself.
I have tried investigating the architecture I need to consider and have looked at the questions mentioned below. These seem to point towards utilising named pipes as a mechanism for inter-hardware communications. As a result of these investigations I have sketched out the following to describe my system - the first part of the diagram is the system as I am developing it; the second part what I have deduced I would need for possible future implementations.
This leads to the following points:
Can you pass events via named pipes?
Is this an appropriate and sensible structure to tackle this problem?
Are there better alternatives?
What have I forgotten (at this level of granularity)?
How is event driven programming implemented?
How do I send a string from one instance of my Delphi program to another?
EDIT:
I had not given sufficient consideration to the points arising from the response by "I give crap answers". My initial responses to his points are:
Synchronous v Asynchronous - mostly asynchronous
Events will always be in a FIFO queue.
Connection loss - is not terribly important - I can afford to deal with this non-rigorously.
Unbounded queues are a perfectly good way of dealing with events passed (if they can be) - there is no expectation of large volume of event generation.

For maximum deployment flexibility (operating-system independent), I recommend taking a look at popular open source message brokers which run on the Java platform. Using standard protocols, they integrate well with Delphi and other programming languages, can be used with web applications, and have a large installed user base and active community.
They are quite easy to install and configure in a few minutes, and free / commercial clients for Delphi are available.
Some examples are:
Apache ActiveMQ
OpenMQ
JBoss HornetQ
I also recommend the book "Enterprise Integration Patterns" by Gregor Hohpe and Bobby Woolf (part of the Martin Fowler Signature Series) as an overview and introduction, with many simple recipes to handle specific problems.
Note that I am a developer of commercial Delphi clients for enterprise messaging systems, such as xmlBlaster, RabbitMQ, Amazon Simple Queue Service and the three brokers mentioned above.

I can only answer your point 4 here: you have not yet decided whether an event is synchronous or asynchronous. In the async case, you have to decide what to do when messages arrive. Do you have a queue? How big is the queue? Can one grab arbitrary elements in the queue, or is it strictly FIFO? What happens if a message is lost (somebody axes the network cable)?
In the sync variant, the advantage is that you get delivery guarantees, but then what do you do when connections are suddenly lost?
Connection loss is going to be a problem. The more machines you have, the greater the chance that it will occur. Decide how you will handle that.
Another source of trouble may be what you do if you have one large event and several small ones. Is the order of transfer FIFO or smallest-first? Can events be reordered? What are the assumptions here?
As an aside, I hack Erlang a lot. In Erlang all the event handling is already solved, but it also means a specific model is chosen for you (async, unbounded queues, no guaranteed delivery, but detection of connection loss).
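To illustrate the "async, unbounded queue, strictly FIFO" combination of choices, here is a minimal single-process sketch in Python. It is only an illustration of the queueing decisions, not of Erlang's actual implementation and not of network transport; the Mailbox class and its method names are hypothetical.

```python
from collections import deque


class Mailbox:
    """Hypothetical per-component mailbox: asynchronous, unbounded, FIFO."""

    def __init__(self):
        self._events = deque()       # unbounded: post() never blocks or fails
        self._handlers = {}          # event name -> list of callbacks

    def subscribe(self, name, callback):
        self._handlers.setdefault(name, []).append(callback)

    def post(self, name, payload=None):
        """Asynchronous send: enqueue and return without running any handler."""
        self._events.append((name, payload))

    def run_pending(self):
        """Deliver queued events in arrival order; returns how many were dequeued."""
        delivered = 0
        while self._events:
            name, payload = self._events.popleft()
            for callback in self._handlers.get(name, []):
                callback(payload)
            delivered += 1
        return delivered
```

Because delivery is strictly in arrival order, a large event posted first is handled before smaller ones posted later; a smallest-first policy would need a different data structure, such as a priority queue.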

I suggest looking at RabbitMQ, http://www.rabbitmq.com/. It has both a server and clients. You just need some wrapper code in Delphi and you are ready to build your business logic.
Cheers

This is probably just an application for a message queue.
http://msdn.microsoft.com/en-us/library/ms632590(v=vs.85).aspx

Should I make and implement a network protocol by hand or use a middleware (if so which)?

I have some data that I need to share between multiple services on multiple machines. Stuffing the data into a database or shuffling it over HTTP won't work in this situation, and ideally the different pieces of software will need to communicate with each other directly (or through one central coordinator that can send and receive).
Is it recommended to create and implement a network protocol or use some tool to do the communication?
If I did go the route of creating a protocol myself, it wouldn't have to be very complex: under 10 different message types, but it would have to be re-implemented in a few different languages for this project, and support Unicode. I have read plenty about handling sockets (and done some), but I don't have much knowledge of handling a protocol I create myself. Are there any good resources on this?
There are also things like ICE and RPC that look interesting. The limit of my experience is using ICE and XMLRPC for a few days each. Is this the better route to go? If so, what tools are out there?
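For a sense of scale, a hand-rolled protocol with a handful of message types often comes down to a small framing layer like the sketch below. It is written in Python as an illustration only, and every detail is an assumption rather than a recommendation: a 1-byte message type, a 4-byte big-endian length, and a UTF-8 encoded JSON body to cover the Unicode requirement. The function names encode and decode are hypothetical.

```python
import json
import struct

# Assumed wire format: 1-byte message type, 4-byte big-endian body length.
HEADER = struct.Struct("!BI")


def encode(msg_type, body):
    """Frame one message: header, then the body as UTF-8 encoded JSON."""
    payload = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return HEADER.pack(msg_type, len(payload)) + payload


def decode(buffer):
    """Return (msg_type, body, remaining_bytes), or None if the frame is incomplete."""
    if len(buffer) < HEADER.size:
        return None                                   # header not fully received yet
    msg_type, length = HEADER.unpack_from(buffer)
    end = HEADER.size + length
    if len(buffer) < end:
        return None                                   # body not fully received yet
    body = json.loads(buffer[HEADER.size:end].decode("utf-8"))
    return msg_type, body, buffer[end:]
```

A receiver would append incoming socket data to a buffer and call decode in a loop until it returns None. Each additional language in the project needs its own equivalent of these two functions, which is the re-implementation cost mentioned above.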

Recently I've been using Google Protocol Buffers for encoding and shipping data between different machines running software written in different languages. It is quite easy to do, and takes away a lot of the hassle of designing a custom protocol.

Without knowing what technologies and platforms you are dealing with, it's difficult to give you a very specific answer - so I'll try to give you some general feedback.
If the system(s) you are wishing to connect span more than a single platform and/or technology you are probably better using an existing transport mechanism and protocol to maximize the chance your base platform will already have a library (or multiple) to interact over it. Also, integrating security and other features in a stack with known behaviors is more likely to be documented (with examples floating around). RPC (and ICE, though I've less familiarity with it) has some useful capabilities, but it also requires a lot of control over the environment and security can be convoluted (particularly if you are passing objects between different languages).
With regard to avoiding polling, this is a performance-related issue; there are design patterns which can help you to handle such things if you understand how you need the system to work (e.g. the observer pattern, a kind of don't-call-us-we'll-call-you approach). The network environment you are playing in will dictate which options are actually viable (e.g. a LAN will have different considerations from something which runs over a WAN or the internet). Factors like firewall tunneling, VPN traversal, etc. should play a part in your final selected technology profile.
The only other major consideration (that I can think of just now... ;-)) would be to consider the type of data you need to pass about. Is it just text, or do you need to stream binary objects? Would an encoding format (like XML or JSON or bJSON) do the trick? You mention "less than ten message types" as part of the question, but is that the only information which would ever need to be communicated by the system?
Either way, unless the overhead of existing protocols is unacceptable, you're better off leveraging established work 99% of the time. Creativity is great, but commercial projects usually benefit from well-known behaviors, even if they are not the coolest or slickest (kind of the "as long as it works..." approach).
hth!

Is FastCGI still a right answer?

FastCGI is old but it still seems like it must be the right answer in some cases.
It seems like the preferred deployment of Perl/Catalyst web applications is with FastCGI.
FastCGI was popular with Rails but seems to no longer be. (Why?)
The Java world doesn't seem to have anything to do with FastCGI. Is something like Tomcat way better than Apache+FastCGI?
Is choosing FastCGI still a good idea or just a lingering technology?
Ted

Since it depends a lot on your setup and requirements, I'll leave the "Is X still a right answer?" part up to you. However, by looking at different architectures, you can come up with a list of questions to ask to determine whether it is still a right answer given specific circumstances.
Concerns of frequent interest
The questions you'll want to ask are usually related to security and flexibility. For security, you'll want to follow the principle of least privilege. For flexibility, you'll want to know if you can run multiple frameworks, multiple versions of the framework and how easily you can delegate work to other tasks.
Other concerns
For a simple web front-end to a database-backed application, not all of these questions are important. You also need to keep in mind that some of the recommendations have nothing to do with what's outlined here. Many web frameworks will recommend whatever architecture is easiest to set up with their framework. They do this because it helps get new users trying out the framework with minimal fuss and without flooding the mailing list. Also, the Java community tends to stick to a common denominator rather than take full advantage of the platform at hand, so they'll often recommend an all-Java solution.
Popular architectures
Single process architectures
From a pure performance point of view, a single process (probably threaded) with an embedded framework probably gives the best performance, as it removes most of the communication overhead between whatever receives the request and whatever produces a response.
Security: a single process must have all of the permissions required to perform every single task it is handed. In simple applications, this might not be a problem. However, it's possible you might serve multiple services from the same process, in which case each one runs with the permissions of all the others.
Flexibility: you probably can't run multiple versions of the same framework (e.g. code for different parts of your website requiring different versions of Java, Rails, Python, etc.). Moreover, changing your setup to serve some work on different machines becomes painful (less difficult when split up on virtual hosts).
Sub-process based architectures
Under the CGI model, you have to pay the price of spawning a new process for each request. Even on UNIX machines where spawning a process is considered cheap, 600 requests a second will kill your server if you spawn a process for each.
Security: to spawn child processes under different user accounts, your gateway probably runs under quite high privileges.
Flexibility: additional flexibility for the multiple frameworks, multiple versions, multiple languages approach, but you're still stuck on the same machine.
Distributed architectures
The FastCGI/SCGI approach tried to solve the CGI process management problem in a clean way. Just keep the process alive. Have the gateway talk to that process to serve the request.
Security: because the gateway doesn't spawn the processes that serve requests, the gateway can run with far fewer privileges enabled. Actually, if it only serves as a gateway and doesn't do any work itself, it can run with hardly any privileges at all.
Flexibility: you get even better flexibility than the CGI model because you can forward the request to any machine on the network.
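The "keep the process alive and have the gateway talk to it" idea can be sketched as follows. This is a toy illustration in Python, not the FastCGI or SCGI wire protocol: it assumes a made-up line-based exchange (one request path per line, one response line back), uses a thread and socket.socketpair() as a stand-in for a separate application process, and the names worker and Gateway are hypothetical.

```python
import socket


def worker(conn):
    """Stand-in for the long-lived application process: it serves many
    requests over one connection and keeps its state between them."""
    served = 0
    with conn, conn.makefile("r", encoding="utf-8") as requests:
        for line in requests:                         # one request path per line
            served += 1
            reply = f"response {served} for {line.strip()}\n"
            conn.sendall(reply.encode("utf-8"))


class Gateway:
    """Forwards each request over an already-open socket instead of
    spawning a new process per request as plain CGI would."""

    def __init__(self, conn):
        self._conn = conn
        self._replies = conn.makefile("r", encoding="utf-8")

    def handle(self, path):
        self._conn.sendall((path + "\n").encode("utf-8"))
        return self._replies.readline().strip()

    def close(self):
        self._replies.close()
        self._conn.close()                            # worker sees EOF and exits
```

In this sketch the gateway only needs permission to open a socket, and because the two sides share nothing but that socket, the worker could equally run under another user account or on another machine.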
Conclusion
I like FastCGI because it gives me high flexibility at a price (i.e. the request being forwarded through a socket) I can afford to pay. It's not my full-time job to administer systems. I don't develop all the apps I host. This means I look for the easiest solution for hosting whatever I try to host. FastCGI is popular enough to be supported by major web servers and popular web frameworks. Adding another app usually just boils down to installing it and mapping the desired URL to the application over FastCGI.

For distributed applications, which to use, ASIO vs. MPI?

I am a bit confused about this. If you're building a distributed application, which in some cases may perform parallel operations (although not necessarily mathematical), should you use ASIO or something like MPI? I take it MPI is a higher level than ASIO, but it's not clear where in the stack one would begin.

I know nothing about ASIO but from a quick Google it looks to me to be a lot lower level than MPI. For me the whole point of MPI is so that I can program against a higher level of abstraction from the messaging than, it seems, ASIO provides. Where you begin depends on your needs. For mine, parallelising scientific codes for high-performance, the obvious answer is MPI. I'm not sure I'd use it, or at least not sure it would be my default choice, if I were writing more general-purpose distributed, as opposed to parallel, applications. Well, actually, it probably would be my default choice to avoid learning another approach (most of which are less portable and less long-lived than MPI) but I'll admit it might not be the best choice if starting from an equal footing.

As far as I know, MPI is currently incapable of handling the situation where new distributed nodes want to join an already started group. Problems may also occur if one of the nodes goes offline.
MPI does not expose any of the network-related machinery underneath. Thus if you ever need something at a lower level, you're in trouble. If, on the other hand, you do not anticipate such a need, then you'll save yourself a lot of time by using MPI.
