Detecting failure at runtime? - Erlang

1) Is there a way to automatically detect when a node fails, from another node?
2) Is there a way to automatically re-start the node which just crashed?
Regarding my second question, I have googled around and I cannot seem to find any mention of creating nodes from code/at runtime.
I understand you can do this with processes: creating processes at runtime is trivial, and if you want to know when they crash you can create them from a supervisor, etc. But I can't find anything relating to node detection/creation.
I need this for a client who wishes to design a smaller version of Amazon EDS, but I cannot imagine Amazon manually restarting nodes when they go down!

You can make use of net_kernel:monitor_nodes(true, [{node_type, visible}]) to monitor all visible nodes from inside your Erlang application. From the man page:
The calling process subscribes or unsubscribes to node status change messages. A nodeup message is delivered to all subscribing processes when a new node is connected, and a nodedown message is delivered when a node is disconnected.
I don't see any straightforward way (from inside the process that receives the nodedown message) to start a node on a remote machine. You will probably need to write a small module that does this for you automatically.
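A minimal sketch of the detection side, assuming the options form of monitor_nodes/2 (which delivers three-tuple messages); restart_node/1 is a hypothetical placeholder, since how you bring a node back (ssh to the host, slave:start/3, a provisioning API) depends entirely on your deployment:

-module(node_watcher).
-export([start/0]).

%% Subscribe to node status changes; with options given, messages
%% arrive as {nodeup, Node, Info} / {nodedown, Node, Info}.
start() ->
    spawn(fun() ->
        ok = net_kernel:monitor_nodes(true, [{node_type, visible}]),
        loop()
    end).

loop() ->
    receive
        {nodeup, Node, _Info} ->
            io:format("~p is up~n", [Node]),
            loop();
        {nodedown, Node, _Info} ->
            io:format("~p went down~n", [Node]),
            restart_node(Node),   %% hypothetical helper, see below
            loop()
    end.

%% Hypothetical: e.g. run an ssh command via os:cmd/1, or use
%% slave:start/3 if the host can be reached over rsh/ssh.
restart_node(_Node) ->
    ok.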

Related

How do I remove persistent users from a node disabled by a monitor?

In a BIG-IP LTM system I have HTTP monitors set up for a pool so that system owners can remove a file from a node in the pool to take that node out of rotation. But monitors mark a node as disabled, not offline, so cookie-based persistence will still send existing users to the node that should be down. What's the best way to use monitors to either offline a node instead of disabling it, or force users to a new node despite persistence?
Disabling a pool member still allows active connections/persistent connections to function. And depending on how you have persistence defined, that can end up being a LONG time.
Forced Offline still allows active connections to complete their transactions but would move previously persisted traffic to other nodes.
When doing maintenance I would force the node offline and then give sessions 5 to 10 minutes to complete before taking the node down in infrastructure. There's no good way to drop a node with active connections; beyond that it depends on what the client is actually doing.
Here's a great response on F5's Community by one of their MVPs to help explain connections:
Pool Member disabled/forced offline behavior # DevCentral
Let me know if you need more details or if you're seeing different behavior. Also, you can do all of this with REST API so you don't need to use the GUI. Doing a quick Node Offline/Online is super easy and quick.
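For instance, a rough sketch of the REST route using Erlang's built-in httpc client (keeping with the language of the rest of this page); the host, credentials, and node name are placeholders, and the session/state values ("user-disabled" plus "user-down" for Forced Offline) should be checked against your BIG-IP version's iControl REST documentation:

%% Hedged sketch: force a pool node offline through iControl REST.
force_offline(Host, User, Pass, Node) ->
    application:ensure_all_started(inets),
    application:ensure_all_started(ssl),
    Url  = "https://" ++ Host ++ "/mgmt/tm/ltm/node/~Common~" ++ Node,
    Auth = "Basic " ++ base64:encode_to_string(User ++ ":" ++ Pass),
    Body = "{\"session\":\"user-disabled\",\"state\":\"user-down\"}",
    httpc:request(patch, {Url, [{"authorization", Auth}],
                          "application/json", Body}, [], []).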

How do I handle the death of another node in Erlang?

I have two nodes that are connected to each other, where one of them is the server. The server would like to know if the client dies. I did something like this:
link(Client).
in the server process. When the client dies I receive an exception error, noconnection, and then the server dies too. I would just like to know that the client died; I do not want the server to die. How do I handle the death message?
If you have two Erlang nodes and want to take some action when one node goes down (or the network connection is lost), you probably want to use the erlang:monitor_node/2,3 functions:
(n1@myhost)1> erlang:monitor_node('n2@myhost', true).
true
Then if the 'n2@myhost' node goes down, your process will receive this message:
(n1@myhost)2> flush().
Shell got {nodedown,n2@myhost}
(Note: I did that from the Erlang shell, which is why I can call flush/0 to see what is in the shell process's mailbox.)
If you are interested in a certain process on the second node, you may use erlang:monitor/2:
(n1@myhost)3> Ref = erlang:monitor(process, {some_registered_name, 'n2@myhost'}).
#Ref<0.0.0.117>
From now on you will receive a message if some_registered_name goes down, and you can take action.
You may also be interested in how to write distributed applications.
To have unidirectional supervision, you should use monitors. Then your server will receive a message if the client dies.
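A minimal sketch of that monitor-based pattern, assuming Client is a pid or a {RegisteredName, Node} tuple:

%% Watch the client without linking: a client crash delivers a
%% 'DOWN' message instead of taking the server down with it.
watch_client(Client) ->
    Ref = erlang:monitor(process, Client),
    receive
        {'DOWN', Ref, process, _Who, Reason} ->
            io:format("client died: ~p~n", [Reason]),
            ok
    end.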

Is this the right way of building an Erlang network server for multi-client apps?

I'm building a small network server for a multi-player board game using Erlang.
This network server uses a local instance of Mnesia DB to store a session for each connected client app. Inside each client's record (session) stored in this local Mnesia, I store the client's PID and NODE (the node where a client is logged in).
I plan to deploy this network server on at least 2 connected servers (Node A & B).
So in order to allow a Client A who is logged in on Node A to search (query Mnesia) for a Client B who is logged in on Node B, I replicate the Mnesia session table from Node A to Node B and vice versa.
After Client A queries the PID and NODE of the Client B, then Client A and B can communicate with each other directly.
Is this the right way of establishing connection between two client apps that are logged-in on two different Erlang nodes?
Creating a system where two or more nodes are perfectly in sync is by definition impossible. In practice however, you might get close enough that it works for your particular problem.
You don't say the exact reason behind running on two nodes, so I'm going to assume it is for scalability. With many nodes, your system will also be more available and fault-tolerant if you get it right. However, the problem could be simplified if you know you only ever will run in a single node, and need the other node as a hot-slave to take over if the master is unavailable.
To establish a connection between two processes on two different nodes, you need some global addressing (user id 123 is pid <123,456,0>). If you also care that only one process for User A runs at a time, you also need a lock, or must allow only unique registrations in the addressing. If you also want to grow, you need a way to add more nodes, either while your system is running or when it is stopped.
Now, there are already some solutions out there that help solve your problem, with different trade-offs:
gproc in global mode allows registering a process under a given key (which gives you addressing and locking; see the sketch after this list). This is distributed to the entire cluster with no single point of failure; however, the leader election (at least when I last looked at it) works only for nodes that were available when the system started. Adding new nodes requires an experimental version of gen_leader, or stopping the system. Within your own code, if you know two players are only ever going to talk to each other, you could start them on the same node.
riak_core allows you to build on top of the well-tested and proven architecture used in Riak KV and Riak Search. It maps keys into buckets in a fashion that allows you to add new nodes and have the keys redistributed. You can plug into this mechanism and move your processes. This approach does not let you decide where to start your processes, so if there is much communication between them, it will go across the network.
Using mnesia with distributed transactions allows you to guarantee that every node has the data before the transaction is committed. This would give you distribution of the addressing and locking, but you would have to do everything else on top of this (like releasing the lock). Note: I have never used distributed transactions in production, so I cannot tell you how reliable they are. Also, being distributed, expect latency. Note 2: You should check exactly how you would add more nodes and have the tables replicated, for example whether that is possible without stopping mnesia.
Zookeeper/Doozer/roll your own provides a centralized, highly available database in which you can store the addressing. In this case you would need to handle unregistering yourself. Adding nodes while the system is running is easy from the addressing point of view, but you need some way for your application to learn about the new nodes and start spawning processes there.
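As referenced above, a hedged sketch of the gproc option; it assumes gproc is running in global mode (gproc_dist enabled), and the {user, UserId} key shape is just an illustration:

%% Register the calling process under a global name...
register_user(UserId) ->
    true = gproc:reg({n, g, {user, UserId}}).  %% n = name, g = global scope

%% ...and look it up from any node in the cluster.
find_user(UserId) ->
    gproc:where({n, g, {user, UserId}}).       %% pid() | undefined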
Also, it is not necessary to store the node, as the pid contains enough information to send the messages directly to the correct node.
As a cool trick which you may already be aware of, pids may be serialized (as may all data within the VM) to a binary. Use term_to_binary/1 and binary_to_term/1 to convert between the actual pid inside the VM and a binary which you may store in whatever accepts binary data without mangling it in some stupid way.
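For instance, a minimal round trip (the encoded pid carries its node, so the message is routed to the correct node after decoding):

%% Serialize a pid, recover it, and send to it.
store_and_send(Msg) ->
    Bin = term_to_binary(self()),  %% a binary, safe to store externally
    Pid = binary_to_term(Bin),     %% the same pid, node info included
    Pid ! Msg.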

Erlang: Why don't I see error_logger:info_msg output when connected by remsh?

I connect to a running node with the -remsh flag, and I run my usual Common Test sanity tests, but none of the error_logger:info_msg messages appear in the shell. Why?
The SASL default event handler will only write events to the console/tty of the local node. When connecting via "-remsh", you're starting a second node and communicating via message passing to the first. The output from the nodes() BIF can confirm this.
Calls to the error_logger functions will send events to the local 'error_logger' registered process, which is a gen_event server. You can manipulate it using error_logger:tty/1 and error_logger:logfile/1; see the reference docs in the "Basic" application group, then the "kernel" application, then the "error_logger" module.
You can also add your own event handler to the 'error_logger' server, which can then do anything you want with the event. I'd guess that error_logger:logfile/1 might be sufficient for your purposes, though.
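For example, a quick hedged sketch; 'app@myhost' is a placeholder for the original node, and rpc:call/4 just makes explicit which node the calls run on:

%% Ask the first node to also write events to a file you can tail:
rpc:call('app@myhost', error_logger, logfile, [{open, "/tmp/app.log"}]).
%% Or toggle tty printing on that node:
rpc:call('app@myhost', error_logger, tty, [true]).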
-Scott

TService won’t process messages

I have created a Windows service that uses the Windows messaging system. When I test the app from the debugger the messages go through nicely, but when I install it my messag …
Services don't generally receive window messages. They don't necessarily have window handles at all. Even if they do, they run in a separate desktop. Programs cannot send messages from one desktop to another, so a service can only receive messages from another service, or from a program started by a service.
Before Windows Vista, you could have configured your service to interact with the desktop. That makes the service run on the same desktop as a logged-in user, so a program running as that user could send messages to your service's windows. Windows Vista isolates services, though; they can't interact with any user's desktop anymore.
There are many other ways to communicate with services. They include named pipes, mailslots, memory-mapped files, semaphores, events, and sockets.
With a socket, for instance, your service could listen on an open port, and programs that need to communicate with it could connect to that port. This could open the door to remote administration, but you can also restrict the service to listen only for local connections.
All the above is trying to tell you that you're taking the wrong approach. But there's also the matter of the problem at hand. Your program behaves one way in the debugger and another way outside it. How are you debugging the service in the first place, if it's not installed? What user account is your service running as? Your debugger? What debugging techniques have you tried that don't involve the debugger (e.g. writeln to a log file to track your program's actions)?
What do you mean when you say it "uses" Windows Messaging System? Are you consuming or sending Windows Messages?
If you send a Windows message, you need to ensure you are doing it correctly. I'd suggest writing a message loop to ensure your messages are being dispatched properly. I'd also suggest reading up on message loops and how they work.
What is a Message Loop
while(GetMessage(&Msg, NULL, 0, 0) > 0)
{
TranslateMessage(&Msg);
DispatchMessage(&Msg);
}
The message loop calls GetMessage(), which looks in your message queue. If the message queue is empty, your program basically stops and waits for one (it blocks).
When an event occurs causing a message to be added to the queue (for example the system registers a mouse click), GetMessage() returns a positive value indicating there is a message to be processed and that it has filled in the members of the MSG structure we passed it. It returns 0 if it hits WM_QUIT, and a negative value if an error occurred.
We take the message (in the Msg variable) and pass it to TranslateMessage(); this does a bit of additional processing, translating virtual key messages into character messages. This step is actually optional, but certain things won't work if it's not there.
Once that's done we pass the message to DispatchMessage(). DispatchMessage() takes the message, checks which window it is for, and then looks up the window procedure for that window. It then calls that procedure, passing as parameters the handle of the window, the message, and wParam and lParam.
In your window procedure you check the message and its parameters, and do whatever you want with them! If you aren't handling the specific message, you almost always call DefWindowProc(), which performs the default actions for you (which often means it does nothing).
Once you have finished processing the message, your window procedure returns, DispatchMessage() returns, and we go back to the beginning of the loop.
Thank you all for the answers. The issue was the operating system (Vista); I tested with my Windows 2000 and everything works. Thanks for the light, Rob.
