I have two nodes that are connected to each other, where one of them is the server. The server would like to know if the client dies. I did something like this:
link(Client).
That was in the server process. When I did that, I received an exception error, noconnection, and then the server died when the client died. I would just like to know when the client dies; I do not want the server to die. How do I handle the death message?
If you have two Erlang nodes and want to take some action in case one node goes down (or the network connection is lost), you probably want the erlang:monitor_node/2,3 functions:
(n1@myhost)1> erlang:monitor_node('n2@myhost', true).
true
Then, if the 'n2@myhost' node goes down, your process will receive a message:
(n1@myhost)2> flush().
Shell got {nodedown,n2@myhost}
(Note: I did that from the Erlang shell, which is why I can call flush/0 to see what is in the shell process's mailbox.)
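Outside the shell, a plain process can subscribe the same way and react in a receive loop. A minimal sketch (the node name and the logging reaction are just illustrative):

watch_node() ->
    true = erlang:monitor_node('n2@myhost', true),
    receive
        {nodedown, Node} ->
            %% react here without crashing; logging is only an example
            io:format("node ~p went down~n", [Node])
    end.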
If you are interested in a certain process on the second node, you can use erlang:monitor/2:
(n1@myhost)3> Ref = erlang:monitor(process, {some_registered_name, 'n2@myhost'}).
#Ref<0.0.0.117>
From now on you will receive a message if some_registered_name goes down, and you can take action.
You may also be interested in how to write distributed applications.
To have unidirectional supervision, you should use monitors. Then your server will receive a message if the client dies.
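A minimal sketch of that server-side handling, assuming Client is the client's pid and the io:format line stands in for whatever reaction you actually want:

MonRef = erlang:monitor(process, Client),
receive
    {'DOWN', MonRef, process, Client, Reason} ->
        %% the client is gone; the server itself keeps running
        io:format("client ~p died: ~p~n", [Client, Reason])
end.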
I have a GenServer on a remote node with both implementation and client functions in the module. Can I use the GenServer client functions remotely somehow?
Using GenServer.call({RemoteProcessName, :"app@remoteNode"}, :get) works as I expect it to, but is cumbersome.
If I want to clean this up, am I right in thinking that I'd have to write the client functions on the calling (client) node?
You can use the :rpc.call/{4,5} functions.
:rpc.call(:"app@remoteNode", MyModule, :some_func, [arg1, arg2])
For a large number of calls, it's better to use gen_server:call/2-3.
If you want to use rpc:call/4-5, you should know that there is just one process, named rex, on each node handling all such requests. So while it is running one Mod:Func(Arg1, Arg2, ArgN), it cannot respond to other requests at that time.
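For example, if the remote gen_server is registered locally on its node, you can address it with a {Name, Node} tuple, so each call is handled by that server process itself rather than funnelled through rex (the name, node, and request below are made up):

%% hypothetical registered name, node, and request; 5000 ms timeout
Result = gen_server:call({some_registered_name, 'app@remoteNode'}, get_state, 5000).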
TL;DR
Yes
Discussion
There are PIDs, messages, monitors and links. Nothing more, nothing less. That is your universe. (Unless you get into some rather esoteric aspects of the runtime implementation -- but at the abstraction level represented by EVM languages the previously stated elements (should) constitute your universe.)
Within an Erlang environment (whether local or distributed in a mesh) any PID can send a message addressed to any other PID (no middle-man required), as well as establish monitors and so on.
gen_server:cast sends a gen_server packaged message (so it will arrive in the form handle_cast/2 will be called on). gen_server:call/2 establishes a monitor and a timeout for receiving a labeled reply. Simply doing PID ! SomeMessage does essentially the same thing as gen_server:cast (sends a message) without any of the gen_server machinery behind it (messier to abstract as an interface).
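As a tiny illustration of the difference (the messages are made up): a cast arrives via handle_cast/2, while a bare send arrives via handle_info/2 with no gen_server wrapping at all.

gen_server:cast(Pid, {set_value, 42}),   %% handled by handle_cast/2
Pid ! {set_value, 42}                    %% handled by handle_info/2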
That's all there is to it.
With this in mind, of course you can use gen_server:call/2 across nodes, as long as they are connected into a cluster/mesh via disterl. Two disconnected nodes would have to communicate a different way (network sockets) and wouldn't have any knowledge of each other's internal mapping of PIDs, but as long as disterl is being used they all translate PIDs amongst themselves quite readily. Named processes are where things get a little tricky, but that is the purpose of the global module and utilities such as gproc (though dependence on such facilities beyond a certain point is usually an indication of an architectural problem).
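For the named-process case, a small sketch with the global module (module and names are made up): register the server globally when it starts, then call it from any connected node without knowing its pid.

%% on whichever node runs the server:
{ok, _Pid} = gen_server:start_link({global, my_service}, my_service, [], []),
%% from any other node in the same disterl mesh:
Reply = gen_server:call({global, my_service}, some_request)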
Of course, just because PIDs from any node can communicate with PIDs from another node doesn't always mean they should. The physical topology of the network (bandwidth, latency, jitter) comes into play when you start sending high-frequency or large messages (lots of gen_server:calls), and you always have to think about partition tolerance -- but for off-loading heavy sorts of work (rare) or physically partitioning sub-systems within a very large system (more common), directly sending messages is a very simple way to take a program coded for a single node and distribute it across a cluster.
(With all that in mind, it is somewhat rare to see the rpc module used.)
I want to have two independent Erlang nodes that can communicate with each other:
so node a@myhost will be able to send messages to b@myhost.
Are there any ways to restrict node a@myhost, so that only functions from a secure_module can be called on b@myhost?
It should be something like:
a@myhost> rpc:call(b@myhost,secure_module,do,[A,B,C]) returns {ok,Result}
and all other calls
a@myhost> rpc:call(b@myhost,Module,Func,Args) returns {error, Reason}
One option would be to use the ZeroMQ library to establish communication between the nodes, but would it be better if this could be done using standard Erlang functions/modules?
In this case distributed Erlang is not what you want. Connecting node A to node B makes a single cluster -- one huge, trusted computing environment. You don't want to trust part of this, so you don't want a single cluster.
Instead write a specific network service. Use the network itself as your abstraction layer. The most straightforward way to do this is to establish a stream connection (just boring old gen_tcp, or gen_sctp or use ssl, or whatever) from A to B.
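For instance, the two ends might be set up roughly like this (the port, address, and socket options are assumptions; {packet, 4} gives simple length-prefixed framing):

%% on B, the serving side:
{ok, LSock} = gen_tcp:listen(5555, [binary, {packet, 4}, {active, true}]),
{ok, BSock} = gen_tcp:accept(LSock),
%% on A, the calling side:
{ok, ASock} = gen_tcp:connect("b.example.com", 5555, [binary, {packet, 4}, {active, true}])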
The socket handling process on A receives messages from whatever parts of node A need to call B -- you write this exactly as you would if they were directly connected. Use a normal Erlang messaging style: Message = {name_of_request, Data} or similar. The connecting process on A simply does gen_tcp:send(Socket, term_to_binary(Message)).
The socket handling process on B shuttles received network messages between the socket and your servicing processes by simply receiving {tcp, Socket, Bin} -> Servicer ! binary_to_term(Bin).
Results of computation go back the other direction through the exact same process using the term_to_binary/binary_to_term translation again.
Your service processes should be receiving well-defined messages and disregarding whatever doesn't make sense (usually just logging the nonsense). So in this way you are not doing a direct RPC (which is unsafe in an untrusted environment); you are only responding to valid semantics defined in your (little tiny) messaging protocol. The way the socket handling processes are written is what can abstract this for you and make it feel just as though you are dealing with a trusted environment within distributed Erlang, but actually you have two independent clusters which are limited in what they can request of each other by the definition of your protocol.
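Put together, a minimal sketch of both socket-handling processes (function names and the servicing process are illustrative; with {active, true} the incoming data arrives as {tcp, Socket, Bin} messages):

%% on A: forward a request term to B over the socket
send_request(Socket, Data) ->
    ok = gen_tcp:send(Socket, term_to_binary({name_of_request, Data})).

%% on B: shuttle decoded terms from the socket to a servicing process
socket_loop(Socket, Servicer) ->
    receive
        {tcp, Socket, Bin} ->
            %% binary_to_term/2 with [safe] is worth considering for untrusted peers
            Servicer ! binary_to_term(Bin),
            socket_loop(Socket, Servicer);
        {tcp_closed, Socket} ->
            ok
    end.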
1) Is there a way to automatically detect when a node fails, from another node?
2) Is there a way to automatically re-start the node which just crashed?
Regarding my second question, I have googled around and cannot seem to find any mention of creating nodes from code/at runtime.
I understand you can do this with processes (creating processes at runtime is trivial, and if you want to know when they crash you can start them from a supervisor, etc.), but I can't find anything relating to node detection/creation.
I need this for a client who wishes to design a smaller version of Amazon EDS, but I cannot imagine Amazon manually restarting nodes if they go down!
You can make use of net_kernel:monitor_nodes(true, [{node_type, visible}]) to monitor all visible nodes from inside your Erlang application. From the man page:
The calling process subscribes or unsubscribes to node status change messages. A nodeup message is delivered to all subscribing processes when a new node is connected, and a nodedown message is delivered when a node is disconnected.
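A sketch of doing that from a gen_server (the log lines are placeholders; note that with an options list the messages carry an extra info list):

init([]) ->
    ok = net_kernel:monitor_nodes(true, [{node_type, visible}]),
    {ok, #{}}.

handle_info({nodeup, Node, _Info}, State) ->
    error_logger:info_msg("node ~p came up~n", [Node]),
    {noreply, State};
handle_info({nodedown, Node, _Info}, State) ->
    error_logger:info_msg("node ~p went down~n", [Node]),
    {noreply, State}.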
I don't see any straightforward method (from inside the process that receives the nodedown message) by which you can start a node on a remote machine. You will probably need to write a small module which does this for you automatically.
I have created a generic behaviour that encapsulates TCP/IP functionality. All the user of the behaviour has to do is implement the callbacks that handle parsed "commands" that come from whatever is on the other side of the socket.
My generic behaviour creates a port-listener process that accepts connections on a port via gen_tcp:accept. When someone connects to the port, the port-listener asks a supervisor to spin up a new port-listener while it goes on to handle the socket communication with whatever client just connected. Because each of these port-listeners / socket-handlers is dynamically created and identical, I am using a simple_one_for_one supervisor to create them. Standard stuff.
Here is my question. If the port-listening process dies, the entire behaviour is non-functional since there will be nothing listening on the port. Because the port-listener is created by a simple_one_for_one supervisor, the supervisor cannot restart a new port-listener.
So, do I create a keep_alive process that monitors the "latest" port listener and asks the supervisor to start another one should it die? Or is there some other best practice for this type of case?
Also, is there a way to see/examine the processes being created by this behaviour? It is not an application, so appmon doesn't work here.
Thanks
You could probably get by with only one listener process, since you can always transfer socket ownership to another process by means of
gen_tcp:controlling_process(Socket, Pid)
And then your listener will be free to continue accepting connections.
Then you would not be forced to use a simple_one_for_one supervisor at the top level, but could use one_for_one instead (or whatever you think fits better). The top-level supervisor would then spawn the listener process and an acceptor supervisor with the simple_one_for_one strategy. The listener will then surely be restarted if it goes down for some reason (if you want it to be).
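A rough sketch of that arrangement (names are made up; error handling omitted): the single listener keeps accepting, asks the simple_one_for_one acceptor supervisor for a child per connection, and hands the socket over with controlling_process/2.

accept_loop(ListenSocket) ->
    {ok, Socket} = gen_tcp:accept(ListenSocket),
    %% acceptor_sup is assumed to be a simple_one_for_one supervisor
    {ok, Pid} = supervisor:start_child(acceptor_sup, [Socket]),
    ok = gen_tcp:controlling_process(Socket, Pid),
    accept_loop(ListenSocket).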
You may also consult the cowboy project to see what approach its authors use.
I connect to a running node with the -remsh flag, and I run my usual Common Test sanity tests, but none of the error_logger:info_msg messages appear in the shell. Why?
The SASL default event handler will only write events to the console/tty of the local node. When connecting via "-remsh", you're starting a second node and communicating via message passing to the first. The output from the "nodes()" BIF can confirm this.

Calls to the error_logger functions will send events to the local 'error_logger' registered process, which is a gen_event server. You can manipulate it using error_logger:tty/1 and error_logger:logfile/1; see the reference docs in the "Basic" Application Group, then the "kernel" application, then the "error_logger" module.

You can also add your own event handler to the 'error_logger' server, which can then do anything you want with the event. I'd guess that error_logger:logfile/1 might be sufficient for your purposes, though.
-Scott
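If that last suggestion fits, a minimal sketch would be to run something like this on the node whose events you want to capture (the filename is made up):

ok = error_logger:logfile({open, "/tmp/ct_sanity.log"}),
%% ... run the tests, then close and read the file on that node
ok = error_logger:logfile(close).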