Erlang OTP child workers

I'm trying to get an OTP supervisor to start child workers which will (eventually) connect to remote servers. I used Rebar to create a template test application and I'm trying to get the supervisor to fire off function 'hi' in module 'foo'. It compiles OK and runs:
Eshell V5.8.5 (abort with ^G)
1> test_app:start(1,1).
{ok,<0.34.0>}
but when I try to start the worker it goes pear shaped with this error:
2> test_sup:start_foo().
{error,{badarg,{foo,{foo,start_link,[]},
permanent,5000,worker,
[foo]}}}
The problem seems similar, but not the same, to this question: Erlang - Starting a child from the supervisor module
Any ideas?
test_app.erl
-module(test_app).
-behaviour(application).
-export([start/2, stop/1]).
start(_StartType, _StartArgs) ->
test_sup:start_link().
stop(_State) ->
ok.
test_sup.erl:
-module(test_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1, start_foo/0]).
-define(CHILD(I, Type), {I, {I, start_link, []}, permanent, 5000, Type, [I]}).
start_link() ->
supervisor:start_link({local, ?MODULE}, ?MODULE, []).
init([]) ->
{ok, { {one_for_one, 5, 10}, []} }.
start_foo()->
supervisor:check_childspecs(?CHILD(foo, worker)),
supervisor:start_child(?MODULE, ?CHILD(foo, permanent)).
foo.erl:
-module(foo).
-export([hi/0]).
hi()->
io:format("worker ~n").

You check the child spec with ?CHILD(foo, worker), but you try to start the child with ?CHILD(foo, permanent). The second argument of the CHILD macro is the child type, which must be either worker or supervisor, so the first macro call is correct. permanent is a restart type, and the macro already sets the restart type to permanent, so the second call produces an invalid child spec and you get a badarg error.
Note: badarg errors do not come only from built-in functions; library functions quite commonly generate them as well, and it is not always obvious which argument is at fault.
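The difference between the two macro calls can be checked without starting anything, because supervisor:check_childspecs/1 validates a list of child specs. A minimal sketch, assuming a throwaway module name childspec_demo that is not part of the question's code:

```erlang
-module(childspec_demo).
-export([good/0, bad/0]).

%% Same macro as in test_sup: the second argument is the child type.
-define(CHILD(I, Type), {I, {I, start_link, []}, permanent, 5000, Type, [I]}).

%% worker is a valid child type, so the spec passes the syntax check.
good() ->
    supervisor:check_childspecs([?CHILD(foo, worker)]).

%% permanent is a restart type, not a child type, so the check rejects it.
bad() ->
    supervisor:check_childspecs([?CHILD(foo, permanent)]).
```

good() should return ok and bad() an {error, Reason} tuple. The check only looks at the shape of the spec; it does not verify that foo:start_link/0 actually exists.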

I think Robert's answer is incomplete: after replacing permanent with worker, supervisor:check_childspecs(?CHILD(foo, worker)) still returns an error, and at first I didn't know why.
[edit]
The badarg problem comes from ... a bad argument :o)
check_childspecs expects a list of child specs, so the correct syntax is supervisor:check_childspecs([?CHILD(foo, worker)]), and then it works fine. The code below has been updated accordingly.
[end of edit]
You will still get an error, though, because the supervisor will try to call the function foo:start_link, which does not exist in the foo module.
The following code prints an error, but seems to work properly.
-module(foo).
-export([hi/0,start_link/0,loop/0]).
start_link() ->
{ok,spawn_link(?MODULE,loop,[])}.
hi()->
io:format("worker ~n").
loop() ->
receive
_ -> ok
end.
-module(test_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1, start_foo/0]).
-define(CHILD(I, Type), {I, {I, start_link, []}, permanent, 5000, Type, [I]}).
start_link() ->
supervisor:start_link({local, ?MODULE}, ?MODULE, []).
init([]) ->
{ok, { {one_for_one, 5, 10}, []} }.
start_foo()->
io:format("~p~n",[supervisor:check_childspecs([?CHILD(foo, worker)])]),
supervisor:start_child(?MODULE, ?CHILD(foo, worker)).
[edit]
Answering David's comment:
In my code loop/0 does not loop at all. In the receive block the process waits for any message, and as soon as it receives one it returns ok and the process dies. So as long as the worker process receives no message it stays alive, which is handy when you are testing supervisors :o).
By contrast, the hi/0 function simply prints 'worker' on the console and finishes. Because the restart strategy of the supervisor is one_for_one, the maximum restart count is 5 and the child process is permanent, the supervisor will try to start the hi process 5 times, printing 'worker' five times on the console, and then it will give up and terminate itself with the error message ** exception error: shutdown
Generally you should choose permanent for processes that never end (the main server of an application, for example). For processes that normally die as soon as they have done their job, you should use temporary. I have never used transient, but I read that it should be used for processes that must complete a task before dying.
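To make the temporary case concrete, here is a minimal sketch of a supervisor that runs a one-shot job as a temporary child. The module name oneshot_demo and the functions run_hi/0 and hi_start_link/0 are made up for this illustration. For comparison, a transient child is restarted only when it terminates abnormally.

```erlang
-module(oneshot_demo).
-behaviour(supervisor).
-export([start_link/0, run_hi/0]).
-export([init/1, hi_start_link/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    {ok, {{one_for_one, 5, 10}, []}}.

%% Start a short-lived child. temporary means the supervisor never restarts
%% it and forgets its spec once the process has exited.
run_hi() ->
    Spec = {hi, {?MODULE, hi_start_link, []},
            temporary, brutal_kill, worker, [?MODULE]},
    supervisor:start_child(?MODULE, Spec).

%% The start function must return {ok, Pid} for a process that is linked
%% to the supervisor, hence spawn_link.
hi_start_link() ->
    {ok, spawn_link(fun() -> io:format("worker~n") end)}.
```

Because the child is temporary, the supervisor neither restarts it nor shuts itself down when the job finishes, and run_hi/0 can be called again once the previous job has exited.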

Related

Erlang server connecting with ports to send and receive a Json file to a Java application

I have tried to implement a server in Erlang for my Java application.
My server seems to be working, but it is still full of bugs and dead ends.
I need to receive a JSON file, parsed by the Java application into a map, and send it back to all clients, including the one that uploaded the file.
Meanwhile, I need to keep track of who made the request and which part of the message was sent, so that if there is a problem the client can be restarted from that point rather than from the beginning. If the client leaves the application, however, it should start over.
My three pieces of code will be below:
The app.erl
-module(erlServer_app).
-behaviour(application).
%% Application callbacks
-export([start/2, stop/1]).
%%%===================================================================
%%% Application callbacks
%%%===================================================================
start(_StartType, _StartArgs) ->
erlServer_sup:start_link().
stop(_State) ->
ok.
The supervisor.erl:
-module(erlServer_sup).
-behaviour(supervisor).
%% API
-export([start_link/0]).
%% Supervisor callbacks
-export([init/1, start_socket/0, terminate_socket/0, empty_listeners/0]).
-define(SERVER, ?MODULE).
%%--------------------------------------------------------------------
%% @doc
%% Starts the supervisor
%%
%% @end
%%--------------------------------------------------------------------
start_link() ->
supervisor:start_link({local, ?SERVER}, ?MODULE, []).
%%%===================================================================
%%% Supervisor callbacks
%%%===================================================================
init([]) -> % restart strategy 'one_for_one': if one goes down only that one is restarted
io:format("starting...~n"),
spawn_link(fun() -> empty_listeners() end),
{ok,
{{one_for_one, 5, 30}, % The flag - 5 restart within 30 seconds
[{erlServer_server, {erlServer_server, init, []}, permanent, 1000, worker, [erlServer_server]}]}}.
%%%===================================================================
%%% Internal functions
%%%===================================================================
start_socket() ->
supervisor:start_child(?MODULE, []).
terminate_socket() ->
supervisor:delete_child(?MODULE, []).
empty_listeners() ->
[start_socket() || _ <- lists:seq(1,20)],
ok.
The server.erl (I have a lot of debugging io:format calls):
-module(erlServer_server).
%% API
-export([init/0, start_server/0]).
%% Defining the port used.
-define(PORT, 8080).
%%%===================================================================
%%% API
%%%===================================================================
init() ->
start_server().
%%%===================================================================
%%% Server callbacks
%%%===================================================================
start_server() ->
io:format("Server started.~n"),
Pid = spawn_link(fun() ->
{ok, ServerSocket} = gen_tcp:listen(?PORT, [binary, {packet, 0},
{reuseaddr, true}, {active, true}]),
io:format("Baba~p", [ServerSocket]),
server_loop(ServerSocket) end),
{ok, Pid}.
server_loop(ServerSocket) ->
io:format("Oba~p", [ServerSocket]),
{ok, Socket} = gen_tcp:accept(ServerSocket),
Pid1 = spawn(fun() -> client() end),
inet:setopts(Socket, [{packet, 0}, binary,
{nodelay, true}, {active, true}]),
gen_tcp:controlling_process(Socket, Pid1),
server_loop(ServerSocket).
%%%===================================================================
%%% Internal functions
%%%===================================================================
client() ->
io:format("Starting client. Enter \'quit\' to exit.~n"),
Client = self(),
{ok, Sock} = gen_tcp:connect("localhost", ?PORT, [{active, false}, {packet, 2}]),
display_prompt(Client),
client_loop(Client, Sock).
%%%===================================================================
%%% Client callbacks
%%%===================================================================
send(Sock, Packet) ->
{ok, Sock, Packet} = gen_tcp:send(Sock, Packet),
io:format("Sent ~n").
recv(Packet) ->
{recv, ok, Packet} = gen_tcp:recv(Packet),
io:format("Received ~n").
display_prompt(Client) ->
spawn(fun () ->
Packet = io:get_line("> "),
Client ! {entered, Packet}
end),
Client ! {requested},
ok.
client_loop(Client, Sock) ->
receive
{entered, "quit\n"} ->
gen_tcp:close(Sock);
{entered, Packet} ->
% When a packet is entered we receive it,
recv(Packet),
display_prompt(Client),
client_loop(Client, Sock);
{requested, Packet} ->
% When a packet is requested we send it,
send(Sock, Packet),
display_prompt(Client),
client_loop(Client, Sock);
{error, timeout} ->
io:format("Send timeout, closing!~n", []),
Client ! {self(),{error_sending, timeout}},
gen_tcp:close(Sock);
{error, OtherSendError} ->
io:format("Some other error on socket (~p), closing", [OtherSendError]),
Client ! {self(),{error_sending, OtherSendError}},
gen_tcp:close(Sock)
end.
This is the first server I have written, and I probably got lost somewhere in the middle. When I run it, it seems to start but then hangs. Can someone help me? My localhost never loads anything; it just keeps loading forever.
How can my Java app receive data on the same port?
I MUST use Erlang and I MUST use ports to connect to the java application.
Thanks for helping me!
Let's rework this just a little...
First up: Naming
We don't use camelCase in Erlang. It is confusing because capitalized variable names and lower-case (or single-quoted) atoms mean different things. Also, module names must be the same as file names which causes problems on case-insensitive filesystems.
Also, we really want a better name than "server". Server can mean a lot of things in a system like this, and while the system overall may be a service written in Erlang, it doesn't necessarily mean that we can call everything inside a "server" without getting super ambiguous! That's confusing. I'm going to namespace your project as "ES" for now. So you'll have es_app and es_sup and so on. This will come in handy later when we want to start defining new modules, maybe some of them called "server" without having to write "server_server" all over the place.
Second: Input Data
Generally speaking, we would like to pass arguments to functions instead of burying literals (or worse, macro rewrites) inside of code. If we are going to have magic numbers and constants let's do our best to put them into a configuration file so that we can access them in a programmatic way, or even better, let's use them in the initial startup calls as arguments to subordinate processes so that we can rework the behavior of the system (once written) only by messing around with the startup calling functions in the main application module.
-module(es).
-behaviour(application).
-export([listen/1, ignore/0]).
-export([start/0, start/1]).
-export([start/2, stop/1]).
listen(PortNum) ->
es_client_man:listen(PortNum).
ignore() ->
es_client_man:ignore().
start() ->
ok = application:ensure_started(sasl),
ok = application:start(?MODULE),
io:format("Starting es...").
start(Port) ->
ok = start(),
ok = es_client_man:listen(Port),
io:format("Startup complete, listening on ~w~n", [Port]).
start(normal, _Args) ->
es_sup:start_link().
stop(_State) ->
ok.
I added a start/1 function above as well as a start/0, a listen/1 and an ignore/0 that you'll see again later in es_client_man. These are mostly convenience wrappers around things you could call more explicitly, but probably won't want to type all the time.
This application module kicks things off by having an application master start the project for us (by calling application:start/1) and then the next line calls es_client_man to tell it to start listening. In early development I find this approach much more useful than burying autostarts to every component all over the place, and later on it gives us a very simple way to write an external interface that can turn various components on and off.
Ah, also... we're going to start this as a for-real Erlang application, so we'll need an app file in ebin/ (or if you're using erlang.mk or something similar an app.src file in src/):
ebin/es.app looks like this:
{application,es,
[{description,"An Erlang Server example project"},
{vsn,"0.1.0"},
{applications,[stdlib,kernel,sasl]},
{modules,[es,
es_sup,
es_clients,
es_client_sup,
es_client,
es_client_man]},
{mod,{es,[]}}]}.
The list under modules reflects the layout of the supervision tree, actually, as you will see below.
The start/2 function above now asserts that we will only start in normal mode (which may or may not be appropriate), and disregards startup args (which also may or may not be appropriate).
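The earlier point about keeping constants in configuration rather than in macros could look roughly like this. It is only a sketch: the module name es_config_demo, the environment key port and the fallback value 5555 are all assumptions made for the example, not part of the project above.

```erlang
-module(es_config_demo).
-export([port/0]).

%% Look up the listen port in the application environment of es,
%% falling back to 5555 when nothing has been configured.
port() ->
    application:get_env(es, port, 5555).
```

With something like this in place, es:start(es_config_demo:port()) would take its port from an {env, [{port, ...}]} entry in the app file or from a system config file instead of from a literal in the code.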
Third: The Supervision Tree
-module(es_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1]).
start_link() ->
supervisor:start_link({local, ?MODULE}, ?MODULE, []).
init([]) ->
RestartStrategy = {one_for_one, 1, 60},
Clients = {es_clients,
{es_clients, start_link, []},
permanent,
5000,
supervisor,
[es_clients]},
Children = [Clients],
{ok, {RestartStrategy, Children}}.
And then...
-module(es_clients).
-behavior(supervisor).
-export([start_link/0]).
-export([init/1]).
start_link() ->
supervisor:start_link({local, ?MODULE}, ?MODULE, none).
init(none) ->
RestartStrategy = {rest_for_one, 1, 60},
ClientSup = {es_client_sup,
{es_client_sup, start_link, []},
permanent,
5000,
supervisor,
[es_client_sup]},
ClientMan = {es_client_man,
{es_client_man, start_link, []},
permanent,
5000,
worker,
[es_client_man]},
Children = [ClientSup, ClientMan],
{ok, {RestartStrategy, Children}}.
Whoa! What is going on here?!? Well, the es_sup is a supervisor, not a one-off thingy that just spawns other one-off thingies. (Misunderstanding supervisors is part of your core problem.)
Supervisors are boring. Supervisors should be boring. All they really tell you as a reader of the code is what the structure of the supervision tree is. What they do for us in terms of OTP structure is extremely important, but they don't require that we write any procedural code, just declare what children each supervisor should have. What we are implementing here is called a service -> worker structure. So we have the top-level supervisor for your overall application called es_sup. Below that we have (for now) a single service component called es_clients.
The es_clients process is also a supervisor. The reason for this is to define a clear way for the client connection parts to not affect whatever ongoing state may exist in the rest of your system later. Just accepting connections from clients is useless -- surely there is some state that matters elsewhere, like long-running connections to some Java node or whatever. That would be a separate service component maybe called es_java_nodes and that part of the program would begin with its own, separate supervisor. That's why it is called a "supervision tree" instead of a "supervision list".
So back to clients... We will have clients connecting. That is why we call them "clients", because from the perspective of this Erlang system the things connecting are clients, and the processes that accept those connections abstract the clients so we can treat each connection handler as a client itself -- because that is exactly what it represents. If we connect to upstream services later, we would want to call those whatever they are abstracting so that our semantics within the system is sane.
You can then think in terms of "an es_client sent a message to an es_java_node to query [thingy]" instead of trying to keep things straight like "a server_server asked a java_server_client to server_server the service_server" (which is literally how stupid things get if you don't keep your naming principles straight in terms of inner-system perspective).
Blah blah blah...
So, here is the es_client_sup:
-module(es_client_sup).
-behaviour(supervisor).
-export([start_acceptor/1]).
-export([start_link/0]).
-export([init/1]).
start_acceptor(ListenSocket) ->
supervisor:start_child(?MODULE, [ListenSocket]).
start_link() ->
supervisor:start_link({local, ?MODULE}, ?MODULE, none).
init(none) ->
RestartStrategy = {simple_one_for_one, 1, 60},
Client = {es_client,
{es_client, start_link, []},
temporary,
brutal_kill,
worker,
[es_client]},
{ok, {RestartStrategy, [Client]}}.
Are you picking up a pattern? I wasn't kidding when I said that "supervisors should be boring..." :-) Note that here we are actually passing an argument in and we have defined an interface function. That is so we have a logicalish place to call if we need a socket acceptor to start.
Fourth: The Client Service Itself
Let's look at the client manager:
-module(es_client_man).
-behavior(gen_server).
-export([listen/1, ignore/0]).
-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
code_change/3, terminate/2]).
-record(s, {port_num = none :: none | inet:port_number(),
listener = none :: none | gen_tcp:socket()}).
listen(PortNum) ->
gen_server:call(?MODULE, {listen, PortNum}).
ignore() ->
gen_server:cast(?MODULE, ignore).
start_link() ->
gen_server:start_link({local, ?MODULE}, ?MODULE, none, []).
init(none) ->
ok = io:format("Starting.~n"),
State = #s{},
{ok, State}.
handle_call({listen, PortNum}, _, State) ->
{Response, NewState} = do_listen(PortNum, State),
{reply, Response, NewState};
handle_call(Unexpected, From, State) ->
ok = io:format("~p Unexpected call from ~tp: ~tp~n", [self(), From, Unexpected]),
{noreply, State}.
handle_cast(ignore, State) ->
NewState = do_ignore(State),
{noreply, NewState};
handle_cast(Unexpected, State) ->
ok = io:format("~p Unexpected cast: ~tp~n", [self(), Unexpected]),
{noreply, State}.
handle_info(Unexpected, State) ->
ok = io:format("~p Unexpected info: ~tp~n", [self(), Unexpected]),
{noreply, State}.
code_change(_, State, _) ->
{ok, State}.
terminate(_, _) ->
ok.
do_listen(PortNum, State = #s{port_num = none}) ->
SocketOptions =
[{active, once},
{mode, binary},
{keepalive, true},
{reuseaddr, true}],
{ok, Listener} = gen_tcp:listen(PortNum, SocketOptions),
{ok, _} = es_client:start(Listener),
{ok, State#s{port_num = PortNum, listener = Listener}};
do_listen(_, State = #s{port_num = PortNum}) ->
ok = io:format("~p Already listening on ~p~n", [self(), PortNum]),
{{error, {listening, PortNum}}, State}.
do_ignore(State = #s{listener = none}) ->
State;
do_ignore(State = #s{listener = Listener}) ->
ok = gen_tcp:close(Listener),
%% Reset port_num as well, so that a later listen/1 call is accepted again.
State#s{port_num = none, listener = none}.
Hmmm, what is all that about? The basic idea here is that we have a service supervisor over the whole concept of clients (es_clients, as discussed above), and then we have the simple_one_for_one to handle whatever clients happen to be alive just now (es_client_sup), and here we have the management interface to the subsystem. All this manager does is keep track of what port we are listening on, and owns the socket that we are listening to if one is open at the moment. Note that this can be easily rewritten to allow any arbitrary number of ports to be listened to simultaneously, or track all living clients, or whatever. There really is no limit to what you might want to do.
So how do we start clients that can accept connections? By telling them to spawn and listen to the listen socket that we pass in as an argument. Go take another look at es_client_sup above. The argument list in its child spec, {es_client, start_link, []}, is empty. Because es_client_sup is a simple_one_for_one supervisor, the list we pass to supervisor:start_child/2 is appended to that argument list before the start function is called. In this case we pass in the listen socket, so es_client:start_link/1 is called with the arguments [ListenSocket].
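The way a simple_one_for_one supervisor combines the two argument lists is easier to see when the list in the child spec is not empty. A small sketch, assuming a made-up module sofo_demo and a made-up atom from_spec as the argument stored in the spec:

```erlang
-module(sofo_demo).
-behaviour(supervisor).
-export([start_link/0, start_worker/1]).
-export([init/1, worker_start_link/2]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, none).

init(none) ->
    %% The spec supplies the first argument; start_child/2 supplies the rest.
    Spec = {worker, {?MODULE, worker_start_link, [from_spec]},
            temporary, brutal_kill, worker, [?MODULE]},
    {ok, {{simple_one_for_one, 1, 60}, [Spec]}}.

start_worker(Extra) ->
    supervisor:start_child(?MODULE, [Extra]).

%% The supervisor calls this as worker_start_link(from_spec, Extra).
worker_start_link(FromSpec, Extra) ->
    {ok, spawn_link(fun() -> worker_loop([FromSpec, Extra]) end)}.

%% Report the arguments this worker was started with when asked.
worker_loop(Args) ->
    receive
        {args, From} ->
            From ! {self(), Args},
            worker_loop(Args);
        stop ->
            ok
    end.
```

After sofo_demo:start_worker(hello), sending {args, self()} to the new worker should bring back [from_spec, hello], that is, the spec's list followed by the list given to start_child/2.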
Whenever a client listener accepts a connection, its next step will be to spawn its successor, handing it the original ListenSocket argument. Ah, the miracle of life.
-module(es_client).
-export([start/1]).
-export([start_link/1, init/2]).
-export([system_continue/3, system_terminate/4,
system_get_state/1, system_replace_state/2]).
-record(s, {socket = none :: none | gen_tcp:socket()}).
start(ListenSocket) ->
es_client_sup:start_acceptor(ListenSocket).
start_link(ListenSocket) ->
proc_lib:start_link(?MODULE, init, [self(), ListenSocket]).
init(Parent, ListenSocket) ->
ok = io:format("~p Listening.~n", [self()]),
Debug = sys:debug_options([]),
ok = proc_lib:init_ack(Parent, {ok, self()}),
listen(Parent, Debug, ListenSocket).
listen(Parent, Debug, ListenSocket) ->
case gen_tcp:accept(ListenSocket) of
{ok, Socket} ->
{ok, _} = start(ListenSocket),
{ok, Peer} = inet:peername(Socket),
ok = io:format("~p Connection accepted from: ~p~n", [self(), Peer]),
State = #s{socket = Socket},
loop(Parent, Debug, State);
{error, closed} ->
ok = io:format("~p Retiring: Listen socket closed.~n", [self()]),
exit(normal)
end.
loop(Parent, Debug, State = #s{socket = Socket}) ->
ok = inet:setopts(Socket, [{active, once}]),
receive
{tcp, Socket, <<"bye\r\n">>} ->
ok = io:format("~p Client saying goodbye. Bye!~n", [self()]),
ok = gen_tcp:send(Socket, "Bye!\r\n"),
ok = gen_tcp:shutdown(Socket, read_write),
exit(normal);
{tcp, Socket, Message} ->
ok = io:format("~p received: ~tp~n", [self(), Message]),
ok = gen_tcp:send(Socket, ["You sent: ", Message]),
loop(Parent, Debug, State);
{tcp_closed, Socket} ->
ok = io:format("~p Socket closed, retiring.~n", [self()]),
exit(normal);
{system, From, Request} ->
sys:handle_system_msg(Request, From, Parent, ?MODULE, Debug, State);
Unexpected ->
ok = io:format("~p Unexpected message: ~tp~n", [self(), Unexpected]),
loop(Parent, Debug, State)
end.
system_continue(Parent, Debug, State) ->
loop(Parent, Debug, State).
system_terminate(Reason, _Parent, _Debug, _State) ->
exit(Reason).
system_get_state(Misc) -> {ok, Misc}.
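system_replace_state(StateFun, Misc) ->
%% Return the updated state both as the reply and as the state to continue with.
NewMisc = StateFun(Misc),
{ok, NewMisc, NewMisc}.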
Note that above we have written a Pure Erlang process that integrates with OTP the way a gen_server would, but has a bit more straightforward loop that handles only the socket. This means we do not have the gen_server call/cast mechanics (and may need to implement those ourselves, but usually asynch-only is a better approach for socket handling). This is started through the proc_lib module which is designed specifically to bootstrap OTP-compliant processes of any arbitrary type.
If you are going to use supervisors then you really want to go all the way and use OTP properly.
So what we have above right now is a very basic Telnet echo service. Instead of writing a magical client process inside the server module to tie your brain in knots (Erlangers don't like their brains tied in knots), you can start this, tell it to listen to some port, and telnet to it yourself and see the results.
I added some scripts to automate launching things, but basically they hinge on the code and make modules. Your project is laid out like
es/
Emakefile
ebin/es.app
src/*.erl
The contents of the Emakefile will make things easier for us. In this case it is just one line:
{"src/*", [debug_info, {i, "include/"}, {outdir, "ebin/"}]}.
In the main es/ directory if we enter an erl shell we can now do...
1> code:add_patha("ebin").
true
2> make:all().
up_to_date
3> es:start().
And you'll see a bunch of SASL start reports scrolling up the screen.
From there let's do es:listen(5555):
4> es:listen(5555).
<0.94.0> Listening.
ok
Cool! So it seems like things are working. Let's try to telnet to ourselves:
ceverett@changa:~/vcs/es$ telnet localhost 5555
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Hello es thingy
You sent: Hello es thingy
Yay! It works!
You sent: Yay! It works!
bye
Bye!
Connection closed by foreign host.
What does that look like on the other side?
<0.96.0> Listening.
<0.94.0> Connection accepted from: {{127,0,0,1},60775}
<0.94.0> received: <<"Hello es thingy\r\n">>
<0.94.0> received: <<"Yay! It works!\r\n">>
<0.94.0> Client saying goodbye. Bye!
Ah, here we see the "Listening." message from the next listener <0.96.0> that was spawned by the first one <0.94.0>.
How about concurrent connections?
<0.97.0> Listening.
<0.96.0> Connection accepted from: {{127,0,0,1},60779}
<0.98.0> Listening.
<0.97.0> Connection accepted from: {{127,0,0,1},60780}
<0.97.0> received: <<"Screen 1\r\n">>
<0.96.0> received: <<"Screen 2\r\n">>
<0.97.0> received: <<"I wonder if I could make screen 1 talk to screen 2...\r\n">>
<0.96.0> received: <<"Time to go\r\n">>
<0.96.0> Client saying goodbye. Bye!
<0.97.0> Client saying goodbye. Bye!
Oh, neato. A concurrent server!
From here you can tool around and make this basic structure change to do pretty much anything you might imagine.
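Any program that can open a TCP socket can stand in for telnet here, and that includes a Java client. As a rough illustration in Erlang, a one-shot probe might look like the following; the module name es_probe and the 5 second receive timeout are choices made for this sketch only.

```erlang
-module(es_probe).
-export([echo/3]).

%% Connect, send one line, wait for one reply, and close again.
%% Returns {ok, Binary} or {error, Reason} from gen_tcp:recv/3.
echo(Host, Port, Line) ->
    {ok, Sock} = gen_tcp:connect(Host, Port, [binary, {active, false}]),
    ok = gen_tcp:send(Sock, Line),
    Reply = gen_tcp:recv(Sock, 0, 5000),
    ok = gen_tcp:close(Sock),
    Reply.
```

With the service listening on 5555, es_probe:echo("localhost", 5555, <<"Hello\r\n">>) should come back with the echoed line, assuming the reply arrives in a single TCP segment.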
DO NOTE that there is a lot missing from this code. I have stripped it of edoc notations and typespecs (used by Dialyzer, a critically important tool in a large project) -- which is a BAD thing to do for a production system.
For an example of a production-style project that is small enough to wrap your head around (only 3 modules + full docs), refer to zuuid. It was written specifically to serve as a code example, though it happens to be a full-featured UUID generator.
Forgive the hugeness of this answer to your (much shorter) question. This comes up from time to time and I wanted to write a full example of a cut-down network socket service to which I can refer people in the future, and just happened to get the hankering to do so when I read your question. :-) Hopefully the SO nazis will forgive this grievous infraction.

Erlang: Why can't I link two gen_servers?

I have two gen_server modules.
First serv.erl
-module(serv).
-behaviour(gen_server).
-export([init/1,
handle_call/3,
handle_cast/2,
handle_info/2,
code_change/3,
terminate/2,
start_link/0
]).
start_link() ->
gen_server:start_link(?MODULE, [], []).
init([]) ->
process_flag(trap_exit, true),
spawn_link(user, start_link,[]),
{ok, []}.
handle_call(_E, _From, State) ->
{noreply, State}.
handle_cast(_Message, State) ->
{noreply, State}.
terminate(_Reason, _State) ->
ok.
handle_info(Message, State) ->
{noreply, State}.
code_change(_OldVersion, State, _Extra) ->
{ok, State}.
And user.erl (which is completely the same except init/1):
init([]) ->
{ok, []}.
I thought that the servers would last forever, and that if the first server died the other one would get an {'EXIT', Pid, Reason} message.
But if you start the modules with serv:start_link(), the user module exits immediately after starting, with the message {'EXIT',Pid,normal}.
Why does user die?
spawn and spawn_link are the two basic Erlang functions for starting a new process. Both create a process which then calls the function given in the arguments to spawn/spawn_link, with the arguments given there. When that function ends, the process automatically terminates with the exit reason normal. The difference between the two is that spawn_link also creates a link between the two processes.
The gen_server:start_link function does much more than just create the process: it initialises the behaviour and then runs a behaviour top loop which provides all the behaviour functionality. Amongst other things, the callback function init is called to initialise the behaviour; it returns {ok,State} to tell the behaviour that everything has been initialised successfully and to hand over the local state that is to be passed into all the callbacks. The callback functions of a gen_server are not meant to be called directly, only by the behaviour.
So when you explicitly spawn a process that just runs the init function, it will terminate as soon as the init function ends. This is what is happening here.
The {'EXIT',Pid,Reason} message comes from the processes being linked and from the receiving process trapping exits. When a process dies, an exit signal is sent from the dying process to all the processes to which it is linked. When this signal arrives at a process that is trapping exits, it is converted to a normal message and put in that process's message queue. This is what you are seeing here. Note that all this is done automatically because of the link and the trapping of exits.
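The conversion of an exit signal into a message can be reproduced in a few lines. This is only a sketch, and the module name trap_demo is made up for it:

```erlang
-module(trap_demo).
-export([run/0]).

%% Trap exits, link to a process that finishes at once, and return the
%% exit reason that arrives as an ordinary message.
run() ->
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> ok end),
    Result = receive
                 {'EXIT', Pid, Reason} -> Reason
             after 1000 ->
                 timeout
             end,
    %% Restore the caller's previous trap_exit setting.
    process_flag(trap_exit, Old),
    Result.
```

Calling trap_demo:run() returns normal, because the spawned function simply ends.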
I hope that helps. Sorry for being a bit over-didactic here.
When you use the spawn_link function, you are starting a new process which calls user:start_link. That process starts and links to a user gen_server process, and then exits, since the call to user:start_link returned. The user process is linked to that process, so it gets an exit signal. Since the user process isn't trapping exits, it also exits.
You should just run user:start_link in your serv:init function, as suggested in the comments.
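A minimal sketch of that suggestion follows. To stay self-contained it uses a single made-up module, link_demo, in both roles (started with parent it plays serv, started with child it plays user); in the question these would be two separate modules. Note also that a module called user clashes with the user module that ships with Kernel, so a different name is advisable in any case.

```erlang
-module(link_demo).
-behaviour(gen_server).
-export([start_link/0, child/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link(?MODULE, parent, []).

%% Ask the parent server for the pid of the child it started.
child(Parent) ->
    gen_server:call(Parent, child).

init(parent) ->
    process_flag(trap_exit, true),
    %% Called directly from init/1, so the link is between the two
    %% gen_servers and not with a short-lived intermediate process.
    {ok, Child} = gen_server:start_link(?MODULE, child, []),
    {ok, Child};
init(child) ->
    {ok, none}.

handle_call(child, _From, State) ->
    {reply, State, State};
handle_call(_Request, _From, State) ->
    {reply, ignored, State}.

handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Info, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
```

After {ok, P} = link_demo:start_link(), the pid returned by link_demo:child(P) stays alive and shows up in the links of P.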

How to determine if a worker in a supervision tree is starting for the very first time or has been restarted

I have a simple supervisor configuration:
-module(my_supervisor).
-behaviour(supervisor).
-export([start_link/0, init/1]).
init(_Args) ->
{ok, { {one_for_one, 5, 10},
[
{my_worker, {my_worker, start_link, []}, permanent, 5000, worker, [my_worker]}
]
}
}.
And an even simpler worker:
-module(my_worker).
-export([start_link/0]).
start_link() ->
%??? is this the first time the supervisor is starting me or have I crashed and been restarted???
So is it even possible to determine whether this is the first time the start_link function is called by the supervisor or the worker process has crashed sometime in the past and is now being restarted?
1. To determine whether this is the first time the start_link function is called by the supervisor, you can use the ChildId parameter and pass the ChildId in from outside, as follows. For details, please read the Erlang supervisor documentation [1].
start_child(ChildId, Mod, Args) ->
{ok, _} = supervisor:start_child(?SERVER,
{ChildId, {Mod, start_link, Args},
transient, ?MAX_WAIT, worker, [Mod]}),
ok.
2. To determine whether the worker has been restarted, please read the documentation for monitor and link. When a crash happens, your process will receive messages. If you read the supervisor's source code, you will find that the supervisor itself uses links and monitors to watch for crashes.
3. Trap exits in the worker and handle the 'EXIT' message:
init([]) ->
process_flag(trap_exit, true),
...
terminate(_Reason, _State) ->
% may be crash here by check reason above.
ok.
handle_info({'EXIT', Self, Reason}, State = #state{self = Self}) ->
error_logger:info_report([crash_now]),
{stop,Reason,State};
[1]: http://www.erlang.org/doc/man/supervisor.html
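Neither snippet above records anything across restarts. One possible way to do so, which is not taken from the answer, is to keep a start counter somewhere that outlives the worker, for example in an ETS table owned by the supervisor. The sketch below assumes this approach; the module restart_probe and the message {first_start, From} are invented for the example.

```erlang
-module(restart_probe).
-behaviour(supervisor).
-export([start_link/0, starts/0]).
-export([init/1, worker_start_link/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% init/1 runs in the supervisor process, so the supervisor owns the
    %% table and it survives worker crashes.
    _ = ets:new(?MODULE, [named_table, public]),
    Spec = {worker, {?MODULE, worker_start_link, []},
            permanent, 5000, worker, [?MODULE]},
    {ok, {{one_for_one, 5, 10}, [Spec]}}.

%% Number of times the worker has been started so far.
starts() ->
    ets:lookup_element(?MODULE, starts, 2).

worker_start_link() ->
    %% Increment the counter, creating it at 0 on first use.
    N = ets:update_counter(?MODULE, starts, 1, {starts, 0}),
    {ok, spawn_link(fun() -> worker(N) end)}.

%% N =:= 1 means this is the first start; anything larger is a restart.
worker(N) ->
    receive
        {first_start, From} ->
            From ! {self(), N =:= 1},
            worker(N)
    end.
```

The counter is lost if the supervisor itself terminates, so this only distinguishes restarts within one lifetime of the supervisor.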

Added supervisor(s) for a gen_server, shutdown immediately?

EDIT: Below.
Why is my supervised gen_server shutting down so quickly?
I'll give these organizational names to make it more clear the chain of command that I want in my application: First I'm starting with the "assembly_line_worker" then later I'll add the "marketing_specialist" to my supervision tree...
ceo_supervisor.erl
-module(ceo_supervisor).
-behaviour(supervisor).
-export([start_link/1]).
-export([init/1]).
start_link(State) ->
supervisor:start_link({local,?MODULE}, ?MODULE, [State]).
init([Args]) ->
RestartStrategy = {one_for_one, 10, 60},
ChildSpec= {assembly_line_worker_supervisor,
{assembly_line_worker_supervisor, start_link, [Args]},
permanent, infinity, supervisor, [assembly_line_worker_supervisor]},
{ok, {RestartStrategy, [ChildSpec]}}.
assembly_line_worker_supervisor.erl
-module(assembly_line_worker_supervisor).
-behaviour(supervisor).
-export([start_link/1]).
-export([init/1]). %% Internal
start_link(State) ->
supervisor:start_link({local, ?MODULE}, ?MODULE, [State]).
init([Args]) ->
RestartStrategy = {one_for_one, 10, 60},
ChildSpec = {assembly_line_worker, {assembly_line_worker, start_link, [Args]}, permanent,
infinity, worker, [assembly_line_worker]},
{ok, {RestartStrategy, [ChildSpec]}}.
assembly_line_worker.erl
-module(assembly_line_worker).
...
init([State]) ->
process_flag(trap_exit, true),
{ok, State}.
start_link(State) ->
gen_server:start_link({global, ?MODULE}, ?MODULE, [State], []).
handle_cast(..., State) ->
io:format("We're getting this message.~n",[]),
{noreply, State};
...
What's happening is that the assembly line worker does a few bits of work, like receiving a couple of messages that are sent just after the ceo_supervisor:start_link(#innovative_ideas{}) command is called, then it shuts down. Have any idea why? I know that the gen_server is receiving a few messages because it io:format's them to the console.
Thanks!
EDIT: I'm hosting this on Windows via erlsrv.exe and I found that when I start up my program via a function like so:
start() ->
ceo_supervisor:start_link(#innovative_ideas{}),
assembly_line_worker:ask_for_more_pay(), %% Prints out "I want more $$$" as expected,
ok.
...this function exiting immediately causes my supervisors / gen_servers to shut down. I would expect this because all of this is linked via supervision to the original calling process, so when that exits so should the children.
So I guess a better question would be, how can I allow my supervisors to keep running after going through all of the start up configuration? Is there an option other than wrapping all of this in an application? (Which doesn't sound too bad...)
Thanks for the probing questions! I learned more about supervisors that way.
To get more information about what is happening start sasl before you start your supervisor: application:start(sasl).
Another way to debug this would be to start the worker from your Erlang shell and send it the sequence of messages that crashed the server. Btw: are you sure that you need 2 levels of supervisors?
Some immediate comments:
In ceo_supervisor:init/1 your supervisor child spec should declare transient instead of permanent.
Run erl -boot start_sasl so you have the error log when something goes bad and you can get the crash report of a crasher.
If you run this in the shell and you make any mistake, then your tree will be forcibly killed. This is because you linked to the shell and the shell crashes upon errors. So you are dragging down your tree. Try something like:
Pid = spawn(fun() -> my_app:start() end).
so you have it split off. You can kill the app by sending an exit message to Pid.
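One caveat with the spawn approach: the supervisor is linked to whichever process called start_link, so that process has to stay alive or the tree goes down with it. Here is a minimal sketch of a holder process that starts the tree and then just waits. The module name tree_holder and the stop message are made up for illustration; StartFun is assumed to return {ok, Pid} the way supervisor:start_link does.

```erlang
-module(tree_holder).
-export([start/1, stop/1]).

%% Spawn an unlinked holder process. It runs StartFun (which should call
%% something like ceo_supervisor:start_link/1) and then blocks, so the
%% parent of the supervision tree stays alive after startup is done.
start(StartFun) when is_function(StartFun, 0) ->
    spawn(fun() ->
                  {ok, _TopPid} = StartFun(),
                  %% Wait here until someone asks us to stop.
                  receive
                      stop -> ok
                  end
          end).

%% Ask the holder to exit; whatever it linked to is told that its
%% parent has gone away.
stop(Holder) ->
    Holder ! stop,
    ok.
```

From the shell this would be used as Holder = tree_holder:start(fun() -> ceo_supervisor:start_link(#innovative_ideas{}) end). and later tree_holder:stop(Holder). For anything long-lived, wrapping the tree in a proper application is still the more usual route.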

Handling timeouts in OTP

I've got an application defined
{application, ps_barcode,
[{description, "barcode image generator based on ps-barcode"},
{vsn, "1.0"},
{modules, [ps_barcode_app, ps_barcode_supervisor, barcode_data, wand, ps_bc]},
{registered, [ps_bc, wand, ps_barcode_supervisor]},
{applications, [kernel, stdlib]},
{mod, {ps_barcode_app, []}},
{start_phases, []}]}.
with the supervisor init looking like
init([]) ->
{ok, {{one_for_one, 3, 10},
[{tag1,
{wand, start, []},
permanent,
brutal_kill,
worker,
[wand]},
{tag2,
{ps_bc, start, []},
permanent,
10000,
worker,
[ps_bc]}]}}.
It's a barcode generator that uses a C component to do some of the image processing. The system errors and restarts correctly if asked to process nonexistent files, or to do it with insufficient permissions, but there's one particular error that results in a timeout from the wand module
GPL Ghostscript 9.04: Unrecoverable error, exit code 1
GPL Ghostscript 9.04: Unrecoverable error, exit code 1
wand.c barcode_to_png 65 Postscript delegate failed `/tmp/tmp.1337.95765.926102': No such file or directory # error/ps.c/ReadPSImage/827
** exception exit: {timeout,{gen_server,call,
[wand,{process_barcode,"/tmp/tmp.1337.95765.926102"}]}}
in function gen_server:call/2 (gen_server.erl, line 180)
in call from ps_bc:generate/3 (ps_bc.erl, line 19)
(the ImageMagick error is inaccurate there; the file exists, but it's a Postscript file with errors that therefore can't be interpreted as normal; I assume that's what generates the Ghostscript error and causes the program to hang, but I'm not sure why it fails to return at all).
The problem I've got is: even though this timeout returns an error, the wand process seems to have hung in the background (I'm concluding this because any further call to wand returns another timeout error, including wand:stop for some reason). I'm not sure how much code to post, so I'm keeping it to the wand module itself. Let me know if I need to post other pieces.
-module(wand).
-behaviour(gen_server).
-export([start/0, stop/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
terminate/2, code_change/3]).
-export([process/1]).
process(Filename) -> gen_server:call(?MODULE, {process_barcode, Filename}).
handle_call({process_barcode, Filename}, _From, State) ->
State ! {self(), {command, Filename}},
receive
{State, {data, Data}} ->
{reply, decode(Data), State}
end;
handle_call({'EXIT', _Port, Reason}, _From, _State) ->
exit({port_terminated, Reason}).
decode([0]) -> {ok, 0};
decode([1]) -> {error, could_not_read};
decode([2]) -> {error, could_not_write}.
%%%%%%%%%%%%%%%%%%%% generic actions
start() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).
stop() -> gen_server:call(?MODULE, stop).
%%%%%%%%%%%%%%%%%%%% gen_server handlers
init([]) -> {ok, open_port({spawn, filename:absname("wand")}, [{packet, 2}])}.
handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Info, State) -> {noreply, State}.
terminate(_Reason, Port) -> Port ! {self(), close}, ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
EDIT: I forgot to mention this, and it may be relevant: the hang only seems to happen when I run the application through application:load/application:start. If I test this component on its own by doing
c(wand).
wand:start().
wand:process("/tmp/tmp.malformed-file.ps").
It still errors, but the process dies for real. That is, I can do
wand:start().
wand:process("/tmp/tmp.existing-well-formed-file.ps").
and get the expected response. When it's started through the supervisor, it hangs instead and exhibits the behavior I described earlier.
Not an answer, but here is what I would do in this case: use gen_server:cast, handle the timeouts inside the gen_server, and once all the work is done send the result back to the requester. Note that this change affects the requester side too.
But I may be wrong about all of this.
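A rough sketch of that cast-and-reply idea, in case it helps. Everything here is hypothetical (the module name async_sketch, the {work, ...} message shape, and the placeholder "processing"); in the real module the cast handler would talk to the port instead.

```erlang
-module(async_sketch).
-behaviour(gen_server).
-export([start_link/0, request/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Requester side: send the work as a cast, then wait for a reply
%% tagged with a unique reference, giving up after TimeoutMs.
request(Input, TimeoutMs) ->
    Ref = make_ref(),
    gen_server:cast(?MODULE, {work, self(), Ref, Input}),
    receive
        {Ref, Result} -> {ok, Result}
    after TimeoutMs ->
        {error, timeout}
    end.

init([]) ->
    {ok, nostate}.

handle_call(_Request, _From, State) ->
    {reply, {error, unsupported}, State}.

%% Placeholder work: a real server would do its processing here and
%% send the result back whenever it is ready.
handle_cast({work, From, Ref, Input}, State) ->
    From ! {Ref, {processed, Input}},
    {noreply, State};
handle_cast(_Other, State) ->
    {noreply, State}.

handle_info(_Info, State) ->
    {noreply, State}.
```

After async_sketch:start_link(), a call like async_sketch:request("some-file", 1000) waits at most one second. One thing to keep in mind: if the server answers after the requester has given up, the late reply stays in the requester's mailbox, so a long-lived requester may want to flush it.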
It seems that using receive..after instead of a plain receive when dealing with the external C program forces a kill. I'm not sure why the other measures don't work though...
...
receive
{State, {data, Data}} ->
{reply, decode(Data), State}
after 3000 ->
exit(wand_timeout)
end;
...
Also, at this point you have to hope that no legitimate operation takes longer than 3000 ms. It's not a problem in this particular case, but it might be if I added more outputs to the C program.
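One way to avoid hard-coding the 3000 is to pull the wait into a small helper that takes the timeout as an argument. This is only a sketch; the module name port_wait and the function await_data/2 are made up, and the caller decides what to do on {error, timeout} (for example exit(wand_timeout) as above).

```erlang
-module(port_wait).
-export([await_data/2]).

%% Wait for a {Port, {data, Data}} message for at most TimeoutMs
%% milliseconds. Returns {ok, Data} or {error, timeout}.
await_data(Port, TimeoutMs) ->
    receive
        {Port, {data, Data}} ->
            {ok, Data}
    after TimeoutMs ->
        {error, timeout}
    end.
```

Keep in mind that gen_server:call/2 has its own default timeout of 5000 ms, so if the wait inside handle_call is allowed to run longer than that, the caller would need gen_server:call/3 with a larger timeout as well.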

Resources