I've implemented a gen_server and supervisor: test_server and test_sup. I want to test them from the shell/CLI. I've written their start_link functions such that their names are registered locally.
I've found that I can spawn the test_server from the command line just fine, but a spawned test_sup does not allow me to interact with the server at all.
For example, I can spawn a test_server by executing:
1> spawn(test_server, start_link, []).
<0.39.0>
2> registered().
[...,test_server,...]
I can interact with the server, and everything appears fine.
However, if I try to do the same thing with test_sup, no new names/Pids are registered in my "CLI process" (using registered/0). My test_server appears to have been spawned, but I cannot interact with it (see Lukas Larsson's comment about SASL to see why this is true).
I'd assume I coded an error in my supervisor, but this method of starting my supervisor works perfectly fine:
1> {ok, Pid}= test_sup:start_link([]).
{ok, <0.39.0>}
2> unlink(Pid).
true
3> registered().
[...,test_server,test_sup,...]
Why is it that I can spawn a gen_server but not a supervisor?
Update
The code I'm using can be found in this post. I'm using echo_server and echo_sup, two very simple modules.
Given that code, this works:
spawn(echo_server, start_link, []).
and this does not:
spawn(echo_sup, start_link, []).
Whenever trying to figure these things out it is usually very helpful to switch on SASL.
application:start(sasl).
That way you will hopefully find out why your supervisor is terminating.
This explanation was given by Bernard Duggan on the Erlang questions mailing list:
Linked processes don't automatically die when a process they are linked to exits with code 'normal'. That's why [echo_server] doesn't exit when the spawning process exits. So why does the supervisor die? The internals of the supervisor module are in fact themselves implemented as a gen_server, but with process_flag(trap_exit, true) set. The result of this is that when the parent process dies, terminate() gets called (which doesn't happen when trap_exit is disabled) and the supervisor shuts down. It makes sense in the context of a supervisor, since a supervisor is spawned by its parent in a supervision tree - if it didn't die whenever its parent shutdown, whatever the reason, you'd have dangling "branches" of the tree.
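To see that mechanism in isolation, here is a minimal sketch. The module name trap_demo and its run/0 function are made up for illustration and are not part of the original code. A child that traps exits receives an {'EXIT', Parent, normal} message when its linked parent finishes. Per the explanation above, a supervisor reacts to that message from its parent by shutting down, whereas a process that does not trap exits simply ignores the normal exit signal.

```erlang
-module(trap_demo).
-export([run/0]).

%% Returns the exit reason seen by a child that traps exits
%% when its linked parent finishes normally.
run() ->
    Caller = self(),
    spawn(fun() ->
        Parent = self(),
        spawn_link(fun() ->
            process_flag(trap_exit, true),
            %% Tell the parent we are trapping exits before it finishes.
            Parent ! ready,
            receive
                {'EXIT', Parent, Reason} ->
                    Caller ! {child_saw_exit, Reason}
            end
        end),
        receive ready -> ok end
        %% The parent returns here and exits with reason 'normal'.
    end),
    receive
        {child_saw_exit, Reason} -> Reason
    after 1000 ->
        timeout
    end.
```

Calling trap_demo:run() from the shell should return normal, which is the same signal a spawned supervisor sees once the process that called start_link has finished.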
Related
I created two Erlang nodes on the same Windows machine, in two cmd windows: 'unclient@MYPC' and 'unserveur@MYPC'. The server code is very simple:
-module(serveur).
-export([start/0,recever/0,inverse/1]).
%%%%
start() ->
    process_flag(trap_exit,true),
    Pid=spawn_link(serveur,recever,[]),
    register(ownServer, Pid).
%%%%
recever() ->
    receive
        {From, X} ->From ! {ownServer,1/X}
    end.
%%%%
inverse(X) ->
    ownServer!{self(), X},
    receive
        {'EXIT',_, _} ->start(),
                        sorry;
        {ownServer, Reply} ->Reply
    end.
At the server node's cmd window I start this module:
c(serveur).
serveur:start().
and I tested the server from the server node's cmd window with:
apply(serveur,inverse,[2]).
and I received 0.5. I also tried causing an error by passing an atom in place of a number:
apply(serveur,inverse,[a]).
The shell shows the error and then 'sorry', and the server goes back to working correctly: it restarts its child automatically, because it is a system process and traps the exit of its child.
Now, at the client node, I used rpc:call to test the connection, and all is fine. For example, I try:
rpc:call('unserveur@MYPC' ,serveur,inverse,[2]).
and at the client node's cmd window I receive: 0.5
Now I send an atom to the server to cause an error:
rpc:call('unserveur@MYPC' ,serveur,inverse,[a]).
At the client node's cmd window:
I waited for the response from the server, which should be 'sorry', but I didn't receive anything, and the client prompt no longer appears:
unclient@MYPC 1>
I can type, but the shell no longer executes my instructions and shows no prompt.
At the server node:
I see an error, and then the server prompt returns normally:
unserveur@MYPC 5>
I tried this at the server node prompt:
apply(serveur, inverse, [2]).
and I got an error, so I restarted the server manually by calling start() at the server node's cmd window, and after that the server worked normally again.
I called self() in the server node's shell before and after the client call and the pid is the same, which makes sense because the main server process is a system process. So my conclusion is that it didn't execute the code after receive {'EXIT',...}.
Why does that happen? I can't understand this bug, so can anyone please explain what is happening?
This happens because the EXIT message is sent to any process that is linked to the server process, which is not necessarily the same process that sent the message.
It works when you run it in the shell, since the shell process gets linked to the server process, and the shell process is also the one that calls the inverse function. However, when you make an RPC call, the inverse function is called from a different process.
Try making the RPC call from the client, and then running flush(). in the shell of the server node, and you should see the EXIT message.
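One way to avoid depending on which process happens to be linked is to let the caller monitor the server for the duration of the request. The following is only a sketch under assumptions: the module name serveur_mon and the registered name own_server are hypothetical, and the whereis/monitor sequence has a small race that a real implementation would want to close. Because the 'DOWN' message goes to the process that set up the monitor, it reaches the caller whether that caller is the shell or the short-lived process that executes an rpc:call.

```erlang
-module(serveur_mon).
-export([start/0, loop/0, inverse/1]).

%% Start the server without linking, so it is not tied to the
%% lifetime of whichever process happened to start it.
start() ->
    Pid = spawn(?MODULE, loop, []),
    register(own_server, Pid),
    Pid.

loop() ->
    receive
        {From, Ref, X} ->
            From ! {Ref, 1 / X},
            loop()
    end.

inverse(X) ->
    %% Restart lazily if a previous request crashed the server.
    case whereis(own_server) of
        undefined -> start();
        _Pid -> ok
    end,
    %% The 'DOWN' message is delivered to the calling process itself.
    Ref = erlang:monitor(process, own_server),
    own_server ! {self(), Ref, X},
    receive
        {Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            Reply;
        {'DOWN', Ref, process, _Server, _Reason} ->
            sorry
    end.
```

With this shape, serveur_mon:inverse(a) should return sorry both when called locally and when called through rpc:call from another node.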
I have made an OTP-compliant application with a gen_server and a supervisor. I also have a script to start them.
My script contains something like this:
erl -pa module_name/ebin -name abc@hostname -setcookie test -s module_sup start_link()
This does not start the supervisor. But when I call module_sup:start_link() inside the shell, it works.
Also, when I do
erl -pa module_name/ebin -name abc@hostname -setcookie test -s module_srv start_link()
i.e. the server alone without the supervisor, the server gets started.
So what am I doing wrong here? Are we not allowed to start a supervisor in such a way?
Any help would be highly appreciated.
Thanx,
Wilson
supervisor:start_link/2 creates a link to its calling process. When that calling process exits, the supervisor is taken down with it.
erl -s module_sup start_link does start the supervisor, but it is killed because your start function runs inside its own process, which dies once the function returns.
You can observe similar behavior with spawn(module_sup, start_link, []): the supervisor starts and gets killed immediately. When you start the supervisor manually, the calling process is the shell; when the shell exits, it will kill the supervisor.
Generally, the top-level supervisor is meant to be started by an application.
This is very similar to How do I start applications by command line as a daemon? In short, you can't use -s to start a supervisor unless you use unlink/1, which is a total kludge. Your time is better spent learning how to package your code as an application. I'd recommend doing this with rebar.
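For reference, an application callback module can be very small. This is a sketch rather than the poster's actual code: module_app is a hypothetical name, and module_sup:start_link/0 is assumed to be the supervisor start function mentioned in the question. The point is that start/2 is called from a process that stays alive for the lifetime of the application, so the linked supervisor is not taken down right after starting.

```erlang
-module(module_app).
-behaviour(application).
-export([start/2, stop/1]).

%% Called when the application is started. It must return
%% {ok, Pid} of the top-level supervisor.
start(_StartType, _StartArgs) ->
    module_sup:start_link().

%% Called after the application has been stopped.
stop(_State) ->
    ok.
```

Assuming an application resource file that names this module in its mod entry is on the code path, the application can then be started with application:start/1 instead of -s module_sup start_link.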
It is important to notice that a process only dies if the linked process is terminating with a reason other than 'normal', which means that a process that simply finishes its execution does not kill the processes linked to it. (source http://www.erlang.org/doc/reference_manual/processes.html#id204170)
I think that is an important aspect of Erlang that should not be misinterpreted.
The following source code shows this:
1> spawn(
1> fun() ->
1> io:format("outer ~p~n", [self()]),
1> spawn_link(
1> fun () ->
1> io:format("inner ~p~n", [self()]),
1> receive
1> Msg -> io:format("received ~p~n", [Msg])
1> end
1> end)
1> end).
outer <0.37.0>
<0.37.0>
inner <0.38.0>
2> is_process_alive(pid(0,37,0)).
false
3> pid(0,38,0) ! test.
received test
test
4>
You can see that the caller <0.37.0> is not running, but the process <0.38.0> is still there, waiting for a message.
Anyway, the supervisor will not terminate when the caller terminates since the supervisor traps exit signals. Of course, unless it is programmed to do so. But I examined the source code and couldn't find this, but alas, my analysis may have been too superficial.
Have you had any luck with that? I will try to run some tests and see if I can figure out what is happening.
I have a one_for_one supervisor that handles similar and totally independent children.
When one child has a problem and crashes repeatedly, it triggers:
=SUPERVISOR REPORT==== 30-Mar-2011::13:10:42 ===
Supervisor: {local,gateway_sup}
Context: shutdown
Reason: reached_max_restart_intensity
Offender: [{pid,<0.76.0>}, ...
The supervisor shuts itself down and also terminates all the innocent children, which would otherwise just continue to run fine.
How can I build a supervision tree out of standard Erlang supervisors that only stops restarting the one offending child and leaves the others alone?
I was thinking about having an extra supervisor with just one single child, but this seems too heavyweight to me.
Any other ways to handle this?
I think the best solution would be to have two layers of supervision.
One supervisor which starts a supervisor + process pair for each gen_server you want running. This supervisor is configured with one_for_one strategy and temporary children.
Each supervisor running under this supervisor would have correctly configured MaxR and MaxT values, that will trigger a crash of that supervisor once the child misbehaves.
When the lower level supervisor crashes, the top level supervisor "just doesn't care".
A supervisor started with one child has a total heap size of 233 words (heap sizes are reported in words, not bytes), so memory consumption should not be an issue.
The supervision tree should look like:
                 supervisor_top
                       |
                       |
          +------------+-----------+----- ...
          |                        |
    supervisor_1              supervisor_2
  restart temporary         restart temporary
          |                        |
    gen_server_1              gen_server_2
  restart transient         restart transient
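A rough sketch of that tree follows. It rests on several assumptions: the module name two_layer_sup is made up, the MaxR/MaxT values are arbitrary, and a plain spawned process stands in for the gen_server so that the example is self-contained. One callback module serves both levels by switching on the init argument.

```erlang
-module(two_layer_sup).
-behaviour(supervisor).
-export([start_link/1, start_pair/1, start_worker/1, init/1]).

%% Top-level supervisor: one temporary child supervisor per name.
start_link(Names) ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, {top, Names}).

%% Per-worker supervisor; not registered, so many can coexist.
start_pair(Name) ->
    supervisor:start_link(?MODULE, {pair, Name}).

%% Stand-in worker (hypothetical); a real system would call the
%% gen_server's start_link here instead.
start_worker(Name) ->
    {ok, spawn_link(fun() -> worker_loop(Name) end)}.

worker_loop(Name) ->
    receive
        crash -> exit({crashed, Name});
        _Other -> worker_loop(Name)
    end.

init({top, Names}) ->
    %% temporary: a pair supervisor that gives up is never restarted,
    %% so its siblings are left alone.
    Children = [{{pair, N}, {?MODULE, start_pair, [N]},
                 temporary, infinity, supervisor, [?MODULE]}
                || N <- Names],
    {ok, {{one_for_one, 1, 5}, Children}};
init({pair, Name}) ->
    %% At most 3 restarts in 10 seconds (arbitrary values), after
    %% which only this pair supervisor shuts down.
    Worker = {Name, {?MODULE, start_worker, [Name]},
              transient, 5000, worker, [?MODULE]},
    {ok, {{one_for_one, 3, 10}, [Worker]}}.
```

For example, two_layer_sup:start_link([a, b]) starts two independent supervisor/worker pairs. If the worker for a keeps crashing, only its own supervisor should give up, while b keeps running.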
This has been my current routine
sudo nohup erl -sname foo -pa ./ebin -run foo_supervisor shell -noshell -noinput &
where the shell function looks something like this
shell() ->
    {ok, Pid} = supervisor:start_link({local,?MODULE}, ?MODULE, _Arg = []),
    unlink(Pid).
If I don't unlink in shell/0, it immediately stops for some reason. Is there a way I can just start my application like I normally would, i.e. application:start(foo)? Also, what if I want to start sasl too? And where could I learn more about making a self-contained package using rebar?
Preface. About your unlink
In this other SO thread @filippo explains why you need the unlink when testing supervisors from the shell.
First. What you need is an Erlang application.
Reading from the doc:
In OTP, application denotes a component implementing some specific functionality, that can be started and stopped as a unit, and which can be re-used in other systems as well.
Details on how to implement an Erlang application are available here. The three main things you will need to do are:
Have a proper directory structure for your application
Write an application callback module implementing the Erlang application behaviour. That's where you will start your root supervisor
Provide an application resource file. This is where you tell the system - among other things - where to find your application callback module (look at the mod parameter).
Second. Starting SASL.
In the above application resource file, you can specify a list of applications you want to start before your application. You will add something like:
...
{applications, [kernel, stdlib, sasl]},
...
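This declares that SASL must be started before your application. Note that application:start/1 itself does not start the listed applications for you: either start them first, boot from a release, or (on OTP versions that have it) call application:ensure_all_started/1.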
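Put together, a complete resource file might look roughly like the sketch below. The names foo and foo_supervisor come from the question; foo_app, the description and the version are placeholders assumed for illustration. The file would live as ebin/foo.app.

```erlang
%% foo.app: application resource file (a sketch; adjust names to your code).
{application, foo,
 [{description, "Example application"},
  {vsn, "0.1.0"},
  %% All modules that belong to this application.
  {modules, [foo_app, foo_supervisor]},
  %% Names registered by processes of this application.
  {registered, [foo_supervisor]},
  %% Applications that must be running before foo is started.
  {applications, [kernel, stdlib, sasl]},
  %% Callback module (hypothetical name) and its start argument.
  {mod, {foo_app, []}}]}.
```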
Third. Rebar.
There's an introduction to Rebar here, which explains how to use Rebar to help with the above steps, how to pack your brand new application into an Erlang release, and how to start it.
I need to run a complex Erlang module function from the Unix shell:
rpc:call('node@example.com', mnesia, dirty_first, [mytable])
How can I do it?
UPD:
I made test.escript
chmod +x test.escript
#!/usr/lib64/erlang/bin/escript
%%! -name 'test@example.com'
main(_Args) ->
    R = rpc:call('node@example.com', mnesia, dirty_first, [mytable]),
    io:format("~p~n",[R]).
and received {badrpc, nodedown}
but when I run
erl -name test@example.com
1> rpc:call('node@example.com', mnesia, dirty_first, [mytable]).
{my, data}.
it works. How do I make the escript work properly?
I think escript might be something worth looking into.
Edit:
Some examples.
First for all examples: Start the remote node somewhere, somehow.
dannib@duval:~:> erl -sname bar
(bar@duval)1> erlang:get_cookie().
'KNKKCFPYMJUPIOLYPOAA'
Escript
1: Create a file named hello.escript with content
#!/usr/bin/env escript
%%! -sname foo@duval -setcookie KNKKCFPYMJUPIOLYPOAA
main(_String) ->
    Node = 'bar@duval',
    Mod = 'erlang',
    Fun = 'node',
    Args = [],
    R = rpc:call(Node, Mod, Fun, Args),
    io:format("Hello there ~p~n",[R]).
Notice that the %%! -sname foo@duval line gives the escript node a name (instead of leaving it as nonode@nohost), and -setcookie KNKKCFPYMJUPIOLYPOAA gives it the same cookie as the target node, which solves the problem of getting {badrpc,nodedown}. The same holds for the following examples (erl_call and -eval), where both the node name and the cookie are set.
2: Set the execution bit and run
$ chmod +x hello.escript
$ ./hello.escript
Hello there bar@duval
Erl_call
1: run
$ erl_call -c 'KNKKCFPYMJUPIOLYPOAA' -a 'erlang node' -n bar@duval
bar@duval
Eval
1: run
$ erl -sname foo -setcookie 'KNKKCFPYMJUPIOLYPOAA' \
-eval 'io:format("Hello there ~p~n",[rpc:call(bar@duval,erlang, node, [])])'
... Eshell V5.7.4 (abort with ^G)
(foo@duval)1> Hello there bar@duval
This creates a shell which might not be what you want in this case.
I might mention that if both nodes are on the same host and using the same cookie default value, the cookie value for foo and bar don't have to be explicitly set like in the examples.
After doing these examples and reading your question again, I think what I GIVE TERRIBLE ADVICE said will be your best choice: erl_call. I fell for the word "complex" in the question title, where imho escripts allow much more "complex" setups in an easy-to-read manner. The variable _String in the escript example holds the arguments to the script, which lets you both take input from the shell and perform complex Erlang operations in the Erlang VM. But erl_call might be more straightforward if you already have the logic in some other application and just need to make this simple call to an Erlang node.
The erl_call application is exactly what you need:
erl_call makes it possible to start and/or communicate with a distributed Erlang node. It is built upon the erl_interface library as an example application. Its purpose is to use an Unix shell script to interact with a distributed Erlang node. It performs all communication with the Erlang rex server, using the standard Erlang RPC facility. It does not require any special software to be run at the Erlang target node.
The main use is to either start a distributed Erlang node or to make an ordinary function call. However, it is also possible to pipe an Erlang module to erl_call and have it compiled, or to pipe a sequence of Erlang expressions to be evaluated (similar to the Erlang shell).
See the examples for more details
You can use -eval flag of erl:
$ erl -eval 'io:format("Hello, World!~n")'
You can parse complex arguments with escript:
#!/usr/bin/env escript
main(String) ->
    {Node, Mod, Fun, Args} = parse_args(String),
    R = rpc:call(Node, Mod, Fun, Args),
    io:format("~p~n",[R]).
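The parse_args function is left undefined above. One possible way to fill it in is sketched here, under some assumptions: the first three command-line arguments are the node, module and function names, every remaining argument is a single Erlang term written as a string, and the local node name caller is arbitrary. The cookie is not set here, so both nodes are assumed to share the default one.

```erlang
#!/usr/bin/env escript
%%! -sname caller

%% Usage (hypothetical): ./call.escript bar@duval erlang node
main([NodeStr, ModStr, FunStr | ArgStrs]) ->
    Args = [parse_term(S) || S <- ArgStrs],
    R = rpc:call(list_to_atom(NodeStr),
                 list_to_atom(ModStr),
                 list_to_atom(FunStr),
                 Args),
    io:format("~p~n", [R]);
main(_) ->
    io:format("usage: call.escript Node Module Function [Arg ...]~n"),
    halt(1).

%% Turn a string such as "{a, 1}" or "mytable" into an Erlang term.
parse_term(Str) ->
    {ok, Tokens, _End} = erl_scan:string(Str ++ "."),
    {ok, Term} = erl_parse:parse_term(Tokens),
    Term.
```

Arguments containing spaces or braces need quoting in the Unix shell, for example './call.escript node@example.com mnesia dirty_first mytable' works as is, while a tuple argument would be passed as '{a, 1}'.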
If your problem is how to set the Erlang node in network mode (i.e. turn the node into a distributed node), you might want to do something like
EPMD = code:root_dir() ++ "/bin/epmd &",
os:cmd(EPMD),
net_kernel:start([Sname, shortnames])
where Sname is your wanted node name. Only after this can you start communicating to another node with e.g. rpc.