I know both error_logger and error_logger_tty_h are swappable handlers of the gen_event manager named error_logger.
From their source code, I know that error_logger report messages end up in erlang:display/1, while error_logger_tty_h ends up calling io:format(user, String, Args).
What I find confusing is: what is the difference in purpose between error_logger and error_logger_tty_h?
This is mostly documented in http://erlang.org/doc/man/error_logger.html, but basically, the error_logger module implements the API for starting and interacting with the gen_event server or "event manager" which is also named error_logger. This process will receive the events and pass them to the registered handlers. The API module includes functions like error_msg(...) for sending the actual event messages in the correct format expected by the server.
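For example, sending events through the API module looks like this (a quick illustration using the standard error_logger API; not code from the original answer):

```erlang
%% Both calls send an event to the error_logger event manager, which
%% forwards it to whatever handlers are currently installed.
%% Pid and Reason are placeholders bound elsewhere.
error_logger:error_msg("process ~p exited with ~p~n", [Pid, Reason]),
error_logger:info_report([{module, my_app}, {status, started}]).
```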
However - the error_logger module also implements gen_event callback functions so that it can be registered as a handler for the server. This code could (and perhaps should) have been placed in a separate module, but the advantage is that it is trivially known that the callback code is already loaded when the error logger starts.
The error logger is started by the Erlang boot script, before any application - including the kernel app - has been started. At that time, very little is known about the capabilities of the system, but basic error logging still needs to work. The callbacks in the error_logger module implement primitive logging that prints any errors during this early phase straight to the stdout of the Beam runtime process - which might just be a console or /dev/null. It also buffers up to a fixed number of messages to be passed on to a better handler later on.
When the kernel application starts, it will read the kernel app environment settings and swap the error_logger handler to the kind you really want for the system, such as error_logger_tty_h or error_logger_file_h. It will also get any buffered messages from the old logger when it takes over, so it can handle them properly.
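On OTP releases where error_logger is still the active event manager (pre-OTP 21), you can observe this handler swap from a shell; a short illustrative session:

```erlang
%% List the handlers attached to the error_logger event manager.
gen_event:which_handlers(error_logger).
%% After the kernel app has started with default settings, this
%% typically includes error_logger_tty_h alongside error_logger itself.

%% Swap the TTY handler out and back in:
error_logger:tty(false),
error_logger:tty(true).
```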
That's not quite the end of the story, though, because the error_logger module will still be registered as a handler for the event manager. In this new role it doesn't do any printing or buffering, but it is responsible for forwarding any events to the Erlang node corresponding to the group leader process of the process that dispatched the event. This ensures that error logger events always make their way "home" to be logged on the node of their group leader process.
I am having an existential question about how I design my erlang applications:
I usually create an application, which starts a supervisor and some workers.
Aside from the supervision tree, I have modules with functions (duh).
I also have a web API that calls functions from the application's modules.
When I stop my application (application:stop(foo).), the webserver can still call foo's functions.
I find it "not idiomatic" to not be able to have a proper circuit-breaker for the foo application.
Does it mean that every public function from foo should spawn a process under its supervisor?
Thanks,
Bastien
Not necessarily, for two reasons:
The foo application will have two kinds of functions: those that require the worker processes to be running, and those that don't (most likely pure functions). If the application is stopped, obviously the former will fail when called, while the latter will still work. As per Erlang's "let it crash" philosophy, this is just another error condition that the web server needs to handle (or not handle). If the pure functions still work, there is no reason to prohibit the web server from calling them: it means that a greater portion of the system is functional.
In an Erlang node, stopping an application is not something you'd normally do. An Erlang application declares dependencies, that is, applications that need to be running for it to function correctly. You'll notice that if you try to start an application before its dependencies, it will refuse to start. While it's possible to stop applications manually, this means that the state of the node is no longer in accordance with the assumptions of the application model. When building a "release" consisting of a set of Erlang applications, normally they would all be started as permanent applications, meaning that if any one application crashes, the entire Erlang node would exit, in order not to violate this assumption.
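To make the restart-type point concrete, here is a sketch of starting an application as permanent (foo is a placeholder application name; this is illustrative, not from the answer):

```erlang
%% If a permanent application terminates, the whole node shuts down,
%% so the "app stopped but node still serving requests" state from
%% the question cannot silently arise.
ok = application:start(foo, permanent).

%% Recent OTP versions can also start all dependencies first:
{ok, _Started} = application:ensure_all_started(foo, permanent).
```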
Is there any particular reason why there is no start_monitor as an equivalent of spawn_monitor?
Is this simply not needed since gen_servers are usually started by supervisors?
I would like to get a notification when my temporary workers crash. What is the recommended way to do this in an OTP application?
First idea was to have a gen_server which would monitor workers started by a dynamic supervisor.
More Info:
As far as I know, supervisors provide controlled start, shutdown, and restart in case of a crash (to get back to a well-defined state).
In addition to that, I would like to run a function when a worker process crashes.
For example, I have C nodes which connect to the Erlang node. Since a C node can't monitor processes (AFAIK) and is also limited in other ways in how it can interact with Erlang, I have a "proxy" process for each connecting C node in order to keep the C node as simple as possible.
The C nodes do rpc calls to Erlang using ei_rpc_to and process messages from the connected Erlang node. Messages are either results of rpc calls or "out-of-band" data/info for the C node.
The Erlang "proxy" process monitors its C node using monitor_node to detect if it vanished, but I also need a mechanism for informing the C node that its proxy process crashed. One way of detecting this would be when it does the next rpc call, since it would obviously fail, but since I already have the "out-of-band" message processing in place, I wanted to use that.
Another use case is having clients which do REST requests to the Erlang cluster. This in turn starts workers that perform some tasks (which may take a long time). After a while the external client may want to get the status of the task. The worker can, for example, update the status in a Mnesia table, but if it crashes, who will update the table with the failure status?
I know there are many ways of achieving this, but I would like to know what is the Erlang way of doing this.
2nd Edit:
After reading the docs I saw that in a gen_server, terminate will get called (if it is defined with a matching clause). Would this be a viable alternative to a separate monitoring process? It looks a bit messy, since terminate does not get called when receiving 'EXIT' from other processes, so I would also need to trap exits.
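The "first idea" from the question - a gen_server that monitors workers started elsewhere - could be sketched roughly like this (module and function names are made up; workers are registered with the watcher explicitly):

```erlang
-module(worker_watcher).
-behaviour(gen_server).
-export([start_link/0, watch/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Ask the watcher to monitor a worker pid.
watch(Pid) ->
    gen_server:cast(?MODULE, {watch, Pid}).

init([]) ->
    {ok, #{}}.

handle_call(_Req, _From, State) ->
    {reply, ok, State}.

handle_cast({watch, Pid}, State) ->
    %% A monitor fires exactly once, even if the worker is already
    %% dead, so there is no race between registering and crashing.
    Ref = erlang:monitor(process, Pid),
    {noreply, State#{Ref => Pid}}.

handle_info({'DOWN', Ref, process, Pid, Reason}, State) ->
    %% Here you would notify the C node over its out-of-band channel,
    %% or write the failure status to the Mnesia table.
    error_logger:error_msg("worker ~p died: ~p~n", [Pid, Reason]),
    {noreply, maps:remove(Ref, State)}.
```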
I am looking to run a service that will be consuming messages that are placed into an SQS queue. What is the best way to structure the consumer application?
One thought would be to create a bunch of threads or processes that run this:
import socket

# q is a boto SQS Queue; process, log_exception, TransientError,
# VISIBILITY_TIMEOUT and MAX_WAIT_TIME_SECONDS are defined elsewhere.
def run(q, delete_on_error=False):
    while True:
        try:
            m = q.read(VISIBILITY_TIMEOUT, wait_time_seconds=MAX_WAIT_TIME_SECONDS)
            if m is not None:
                try:
                    process(m.id, m.get_body())
                except TransientError:
                    continue  # leave the message for a retry
                except Exception as ex:
                    log_exception(ex)
                    if not delete_on_error:
                        continue
                q.delete_message(m)
        except StopIteration:
            break
        except socket.gaierror:
            continue  # transient DNS failure; retry the read
Am I missing anything else important? What other exceptions do I have to guard against in the queue read and delete calls? How do others run these consumers?
I did find this project, but it seems stalled and has some issues.
I am leaning toward separate processes rather than threads to avoid the GIL. Is there some container process that can be used to launch and monitor these separate running processes?
There are a few things:
The SQS API allows you to receive more than one message with a single API call (up to 10 messages, or up to 256k worth of messages, whichever limit is hit first). Taking advantage of this feature allows you to reduce costs, since you are charged per API call. It looks like you're using the boto library - have a look at get_messages.
In your code right now, if processing a message fails due to a transient error, the message won't be able to be processed again until the visibility timeout expires. You might want to consider returning the message to the queue straight away. You can do this by calling change_visibility with 0 on that message. The message will then be available for processing straight away. (It might seem that if you do this then the visibility timeout will be permanently changed on that message - this is actually not the case. The AWS docs state that "the visibility timeout for the message the next time it is received reverts to the original timeout value". See the docs for more information.)
If you're after an example of a robust SQS message consumer, you might want to check out NServiceBus.AmazonSQS (of which I am the author). (C# - sorry, I couldn't find any python examples.)
So we have the generic part and the specific part, thanks to Erlang/OTP behaviours, which greatly simplifies our code and gives us a common skeleton that everybody understands.
I am an Erlang newbie and trying to get my head around concepts of OTP.
But amongst
gen_server
gen_fsm
gen_event
supervisor
which behaviour is suitable for what kind of application?
The reason why I am asking this question is that, after going through some verbose theory, I am still unsure which is more suited for an XYZ application.
Is there any criterion or rule of thumb, or is it just the programmer's instinct that matters in this case?
Any help?
As a remark, this is a very general question that doesn't fit the usage of this community well.
Supervisors are there to control the way your application starts and evolves over time.
A supervisor starts processes in the right order with a verification that everything is going fine, and manages the restart strategy in case of failure, process pools, the supervision tree... The advantages are that it helps you follow the "let it crash" rule in your application, and it concentrates in a few lines of code the complex task of coordinating all your processes.
I use gen_event mainly for logging purposes. You can use this behaviour to connect different handlers on demand, depending on your target. In one of my applications, I have by default a graphical logger that lets me check the behaviour at a glance, and a textual file handler that allows deeper and post-mortem analysis.
gen_server is the basic brick of all applications. It lets you keep state and react to calls, casts, and even plain messages. It manages code changes and provides hooks for start and stop. Most of my modules are gen_servers; you can see each one as an actor that is responsible for executing a single task.
gen_fsm brings to gen_server the notion of a current state with specific behaviour per state, and of transitions from state to state. It can be used, for example, in a game to control the evolution of the game (configuration, waiting for players, play, change level...).
When you create an application, you will identify the different tasks you have to do and assign each of them to either a gen_server or a gen_fsm. You will add gen_event to allow a publish/subscribe mechanism. Then you will group all processes according to their dependencies and error recovery. This gives you the supervision tree and restart strategy.
I use the following rules.
If you have some long-lived entity that serves multiple clients in a request-response manner, use gen_server.
If this long-lived entity has to support a more complex protocol than just request/response, it is a good idea to use gen_fsm. I like to use gen_fsm for network protocol implementations, for example.
If you need an abstract event bus, where any process can publish messages and any client may want to receive them, you need gen_event.
If you have some processes which must be restarted together (for example, if one of them fails), you should put those processes under a supervisor, which ties them together.
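The last rule can be sketched as a supervisor using the one_for_all strategy (module and child names here are illustrative, not from the answers above):

```erlang
-module(pair_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_all: if any child dies, all children are restarted,
    %% so the two workers always live and die together.
    SupFlags = #{strategy => one_for_all,
                 intensity => 5,
                 period => 10},
    Children = [#{id => worker_a, start => {worker_a, start_link, []}},
                #{id => worker_b, start => {worker_b, start_link, []}}],
    {ok, {SupFlags, Children}}.
```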
I'm still kind of new to the erlang/otp world, so I guess this is a pretty basic question. Nevertheless I'd like to know what's the correct way of doing the following.
Currently, I have an application with a top supervisor. The latter will supervise workers that call gen_tcp:accept (sleeping on it) and then spawn a process for each accepted connection. Note: for this question, it is irrelevant where the listen() is done.
My question is about the correct way of making these workers (the ones that sleep on gen_tcp:accept) respect the otp design principles, in such a way that they can handle system messages (to handle shutdown, trace, etc), according to what I've read here: http://www.erlang.org/doc/design_principles/spec_proc.html
So,
Is it possible to use one of the available behaviors like gen_fsm or gen_server for this? My guess would be no, because of the blocking call to gen_tcp:accept/1. Is it still possible to do it by specifying an accept timeout? If so, where should I put the accept() call?
Or should I code it from scratch (i.e: not using an existant behavior) like the examples in the above link? In this case, I thought about a main loop that calls gen_tcp:accept/2 instead of gen_tcp:accept/1 (i.e: specifying a timeout), and immediately afterwards code a receive block, so I can process the system messages. Is this correct/acceptable?
Thanks in advance :)
As Erlang is event driven, it is awkward to deal with code that blocks as accept/{1,2} does.
Personally, I would have a supervisor which has a gen_server for the listener, and another supervisor for the accept workers.
Hand-roll an accept worker that times out (gen_tcp:accept/2), effectively polling (the awkward part), rather than receiving a message for status.
This way, if a worker dies, it gets restarted by the supervisor above it.
If the listener dies, it restarts, but not before restarting the worker tree and supervisor that depended on that listener.
Of course, if the top supervisor dies, it gets restarted.
However, if you call supervisor:terminate_child/2 on that subtree, you can effectively disable the listener and all acceptors for that socket. Later, supervisor:restart_child/2 can restart the whole listener + acceptor worker pool.
If you want an app to manage this for you, cowboy implements the above. Although http oriented, it easily supports a custom handler for whatever protocol to be used instead.
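A minimal sketch of the timeout-polling acceptor described above, written as a gen_server (the listen socket is assumed to be passed in by the supervisor; names are illustrative):

```erlang
-module(acceptor).
-behaviour(gen_server).
-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link(ListenSocket) ->
    gen_server:start_link(?MODULE, ListenSocket, []).

init(ListenSocket) ->
    %% Return a 0 timeout so we drop into handle_info(timeout, ...)
    %% immediately, without blocking init/1.
    {ok, ListenSocket, 0}.

handle_call(_Req, _From, LSock) ->
    {reply, ok, LSock, 0}.

handle_cast(_Msg, LSock) ->
    {noreply, LSock, 0}.

handle_info(timeout, LSock) ->
    %% Block in accept for at most 1000 ms; between accept attempts
    %% the gen_server loop runs, so system messages are handled.
    case gen_tcp:accept(LSock, 1000) of
        {ok, Socket} ->
            Pid = spawn(fun() -> handle_connection(Socket) end),
            ok = gen_tcp:controlling_process(Socket, Pid);
        {error, timeout} ->
            ok
    end,
    {noreply, LSock, 0}.

handle_connection(Socket) ->
    %% Placeholder for per-connection logic.
    gen_tcp:close(Socket).
```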
I've actually found the answer in another question: Non-blocking TCP server using OTP principles and here http://20bits.com/article/erlang-a-generalized-tcp-server
EDIT: The specific answer that was helpful to me was: https://stackoverflow.com/a/6513913/727142
You can make it as a gen_server similar to this one: https://github.com/alinpopa/qerl/blob/master/src/qerl_conn_listener.erl.
As you can see, this process does the tcp accept and also processes other messages (e.g. stop(Pid) -> gen_server:cast(Pid, {close}).).
HTH,
Alin