rebar3, supervisor, behaviour(application) - erlang

I have the following straightforward UDP Server:
ONLY accept Binary <<0:32>>,
otherwise it crash
-module(server).
-export([listen/1]).
listen(Port) ->
spawn(fun()->run_it(Port) end).
run_it(Port) ->
{ok, Skt} = gen_udp:open(Port, [binary]),
loop(Skt).
loop(Skt) ->
receive
{udp, Skt, _, _, Bin} ->
case Bin of
<<0:32>> ->
io:fwrite("~p~n", [{"Good Format: ", Bin}]),
loop(Skt)
end
end.
Now, my peer UDP Client is going to send a malformed data intentionally.
I can write a case clause to match any message and simply ignore any
malformed message.
However, it will not help me if there is eventual bugs.
I have read somewhere: "Don't program defensively, let it crash, then fix it".
=ERROR REPORT==== 19-Jul-2020::21:15:29.872000 ===
Error in process <0.93.0> with exit value:
{{case_clause,<<0>>},[{server,loop,1,[{file,"server.erl"},{line,16}]}]}
Cool, it crash, But I want my server to restart automatically now :-)
I have read that a process called "supervisor" can monitor my
server and restarts it when it detects that it died.
So, I have used "rebar3" because it helped me a lot when I compile
several files with just 1 single line 'rebar3 compile'.
It creates automatically a /src/ with 3 files, but only 2 are
interesting me for now:
server_app.erl
server_sup.erl
In addition, I have read the documentations but I'm still far from understanding.
Can anyone advise please, or transform my 19 lines of code server.erl
to server_app.erl and supervised by a server_sup.erl?
N.B: I'm not looking for a gen_server, I see it a lot but am I
obligated to transform this to a gen_server also, or only
application+supervisor is ok for my requirement?
Thanks in advance,
Best Regards,

Supervisors are part of the OTP supervision tree, which handles all the restart policies and such. Although it's possible not to use gen_servers for its modules, I'd avise against it: gen_servers provide a handy abstraction for the most common server operations (and they handle name registering, sys messages and other goodies out of the box).
Although the 'let it crash' lemma is common in Erlang, it does not mean that you don't have to foresee issues your code may face or that you only care about the happy case. I'd hate to see any sytem crashing due to someone's malformed/malicious nc -u.
Keep in mind that if the supervisor's restart limit is reached, it will die too, eventually reaching the application's top supervisor, which crashes the VM upon death.
Let's get to the code (very little editing over what rebar3 new app generates):
The application requires no editions:
-module(server_app).
-behaviour(application).
-export([start/2, stop/1]).
start(_StartType, _StartArgs) ->
server_sup:start_link().
stop(_State) ->
ok.
The supervisor has a little more configuration, but that's it:
-module(server_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1]).
start_link() ->
supervisor:start_link(?MODULE, []).
init([]) ->
SupFlags = #{strategy => one_for_all,
intensity => 0,
period => 1},
ChildSpecs = [#{id => my_server,
start => {server, start_link, [12345]},
type => worker
}],
{ok, {SupFlags, ChildSpecs}}.
And the server.erl needs some modifications in the start function:
-module(server).
-export([start_link/1]).
start_link(Port) ->
{ok, spawn_link(fun()->run_it(Port) end)}.
run_it(Port) ->
{ok, Skt} = gen_udp:open(Port, [binary]),
loop(Skt).
loop(Skt) ->
receive
{udp, Skt, _, _, Bin} ->
case Bin of
<<0:32>> ->
io:fwrite("~p~n", [{"Good Format: ", Bin}]),
loop(Skt)
end
The server.erl as gen_server would be something like:
-module(server).
-export([start_link/1]).
-behaviour(gen_server).
-export([
init/1,
handle_cast/2,
handle_call/3,
handle_info/2
]).
start_link(Port) ->
gen_server:start_link(?MODULE, Port, []).
init(Port) ->
gen_udp:open(Port, [binary]).
handle_call(_Call, _From, State) ->
{reply, ok, State}.
handle_cast(_Msg, State) ->
{noreply, State}.
handle_info({udp, Skt, _, _, Bin}, Skt) ->
case Bin of
<<0:32>> -> io:fwrite("~p~n", [{"Good Format: ", Bin}])
end,
{noreply, Skt};
handle_info(_Msg, State) -> % Messages that are not {udp, Skt, _, _, _} are discarded
{noreply, State}.

Related

Erlang server connecting with ports to send and receive a Json file to a Java application

I have tried to implement a server with Erlang to my Java application.
Seems that my server is working, but still full of bugs and dead points.
I need to receive a JSON file parsed by the Java application into a map and send it back to all clients, including the one that uploaded the file.
Meanwhile, I need to keep track who made the request and which part of the message was sent, in case of any problems the client should be restarted from this point, not from the beginning. Unless the client leaves the application, then it should restart.
My three pieces of code will be below:
The app.erl
-module(erlServer_app).
-behaviour(application).
%% Application callbacks
-export([start/2, stop/1]).
%%%===================================================================
%%% Application callbacks
%%%===================================================================
start(_StartType, _StartArgs) ->
erlServer_sup:start_link().
stop(_State) ->
ok.
The supervisor.erl:
-module(erlServer_sup).
-behaviour(supervisor).
%% API
-export([start_link/0]).
%% Supervisor callbacks
-export([init/1, start_socket/0, terminate_socket/0, empty_listeners/0]).
-define(SERVER, ?MODULE).
%%--------------------------------------------------------------------
%% #doc
%% Starts the supervisor
%%
%% #end
%%--------------------------------------------------------------------
start_link() ->
supervisor:start_link({local, ?SERVER}, ?MODULE, []).
%%%===================================================================
%%% Supervisor callbacks
%%%===================================================================
init([]) -> % restart strategy 'one_for_one': if one goes down only that one is restarted
io:format("starting...~n"),
spawn_link(fun() -> empty_listeners() end),
{ok,
{{one_for_one, 5, 30}, % The flag - 5 restart within 30 seconds
[{erlServer_server, {erlServer_server, init, []}, permanent, 1000, worker, [erlServer_server]}]}}.
%%%===================================================================
%%% Internal functions
%%%===================================================================
start_socket() ->
supervisor:start_child(?MODULE, []).
terminate_socket() ->
supervisor:delete_child(?MODULE, []).
empty_listeners() ->
[start_socket() || _ <- lists:seq(1,20)],
ok.
The server.erl:(I have a lot of debugging io:format.)
-module(erlServer_server).
%% API
-export([init/0, start_server/0]).
%% Defining the port used.
-define(PORT, 8080).
%%%===================================================================
%%% API
%%%===================================================================
init() ->
start_server().
%%%===================================================================
%%% Server callbacks
%%%===================================================================
start_server() ->
io:format("Server started.~n"),
Pid = spawn_link(fun() ->
{ok, ServerSocket} = gen_tcp:listen(?PORT, [binary, {packet, 0},
{reuseaddr, true}, {active, true}]),
io:format("Baba~p", [ServerSocket]),
server_loop(ServerSocket) end),
{ok, Pid}.
server_loop(ServerSocket) ->
io:format("Oba~p", [ServerSocket]),
{ok, Socket} = gen_tcp:accept(ServerSocket),
Pid1 = spawn(fun() -> client() end),
inet:setopts(Socket, [{packet, 0}, binary,
{nodelay, true}, {active, true}]),
gen_tcp:controlling_process(Socket, Pid1),
server_loop(ServerSocket).
%%%===================================================================
%%% Internal functions
%%%===================================================================
client() ->
io:format("Starting client. Enter \'quit\' to exit.~n"),
Client = self(),
{ok, Sock} = gen_tcp:connect("localhost", ?PORT, [{active, false}, {packet, 2}]),
display_prompt(Client),
client_loop(Client, Sock).
%%%===================================================================
%%% Client callbacks
%%%===================================================================
send(Sock, Packet) ->
{ok, Sock, Packet} = gen_tcp:send(Sock, Packet),
io:format("Sent ~n").
recv(Packet) ->
{recv, ok, Packet} = gen_tcp:recv(Packet),
io:format("Received ~n").
display_prompt(Client) ->
spawn(fun () ->
Packet = io:get_line("> "),
Client ! {entered, Packet}
end),
Client ! {requested},
ok.
client_loop(Client, Sock) ->
receive
{entered, "quit\n"} ->
gen_tcp:close(Sock);
{entered, Packet} ->
% When a packet is entered we receive it,
recv(Packet),
display_prompt(Client),
client_loop(Client, Sock);
{requested, Packet} ->
% When a packet is requested we send it,
send(Sock, Packet),
display_prompt(Client),
client_loop(Client, Sock);
{error, timeout} ->
io:format("Send timeout, closing!~n", []),
Client ! {self(),{error_sending, timeout}},
gen_tcp:close(Sock);
{error, OtherSendError} ->
io:format("Some other error on socket (~p), closing", [OtherSendError]),
Client ! {self(),{error_sending, OtherSendError}},
gen_tcp:close(Sock)
end.
This is the first server I'm doing and I got lost probably in the middle. When I run it seems to be working, but hanging. Can someone help me? My localhost never loads anything it keeps loading forever.
How can my java app receive it from the same port?
I MUST use Erlang and I MUST use ports to connect to the java application.
Thanks for helping me!
Let's rework this just a little...
First up: Naming
We don't use camelCase in Erlang. It is confusing because capitalized variable names and lower-case (or single-quoted) atoms mean different things. Also, module names must be the same as file names which causes problems on case-insensitive filesystems.
Also, we really want a better name than "server". Server can mean a lot of things in a system like this, and while the system overall may be a service written in Erlang, it doesn't necessarily mean that we can call everything inside a "server" without getting super ambiguous! That's confusing. I'm going to namespace your project as "ES" for now. So you'll have es_app and es_sup and so on. This will come in handy later when we want to start defining new modules, maybe some of them called "server" without having to write "server_server" all over the place.
Second: Input Data
Generally speaking, we would like to pass arguments to functions instead of burying literals (or worse, macro rewrites) inside of code. If we are going to have magic numbers and constants let's do our best to put them into a configuration file so that we can access them in a programmatic way, or even better, let's use them in the initial startup calls as arguments to subordinate processes so that we can rework the behavior of the system (once written) only by messing around with the startup calling functions in the main application module.
-module(es).
-behaviour(application).
-export([listen/1, ignore/0]).
-export([start/0, start/1]).
-export([start/2, stop/1]).
listen(PortNum) ->
es_client_man:listen(PortNum).
ignore() ->
es_client_man:ignore().
start() ->
ok = application:ensure_started(sasl),
ok = application:start(?MODULE),
io:format("Starting es...").
start(Port) ->
ok = start(),
ok = es_client_man:listen(Port),
io:format("Startup complete, listening on ~w~n", [Port]).
start(normal, _Args) ->
es_sup:start_link().
stop(_State) ->
ok.
I added a start/1 function above as well as a start/0, a listen/1 and an ignore/0 that you'll see again later in es_client_man. These are mostly convenience wrappers around things you could call more explicitly, but probably won't want to type all the time.
This application module kicks things off by having an application master start the project for us (by calling application:start/1) and then the next line calls erl_server_server to tell it to start listening. In early development I find this approach much more useful than burying autostarts to every component all over the place, and later on it gives us a very simple way to write an external interface that can turn various components on and off.
Ah, also... we're going to start this as a for-real Erlang application, so we'll need an app file in ebin/ (or if you're using erlang.mk or something similar an app.src file in src/):
ebin/es.app looks like this:
{application,es,
[{description,"An Erlang Server example project"},
{vsn,"0.1.0"},
{applications,[stdlib,kernel,sasl]},
{modules,[es,
es_sup,
es_clients,
es_client_sup,
es_client,
es_client_man]},
{mod,{es,[]}}]}.
The list under modules reflects the layout of the supervision tree, actually, as you will see below.
The start/2 function above now asserts that we will only start in normal mode (which may or may not be appropriate), and disregards startup args (which also may or may not be appropriate).
Third: The Supervision Tree
-module(es_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1]).
start_link() ->
supervisor:start_link({local, ?MODULE}, ?MODULE, []).
init([]) ->
RestartStrategy = {one_for_one, 1, 60},
Clients = {es_clients,
{es_clients, start_link, []},
permanent,
5000,
supervisor,
[es_clients]},
Children = [Clients],
{ok, {RestartStrategy, Children}}.
And then...
-module(es_clients).
-behavior(supervisor).
-export([start_link/0]).
-export([init/1]).
start_link() ->
supervisor:start_link({local, ?MODULE}, ?MODULE, none).
init(none) ->
RestartStrategy = {rest_for_one, 1, 60},
ClientSup = {es_client_sup,
{es_client_sup, start_link, []},
permanent,
5000,
supervisor,
[es_client_sup]},
ClientMan = {es_client_man,
{es_client_man, start_link, []},
permanent,
5000,
worker,
[es_client_man]},
Children = [ClientSup, ClientMan],
{ok, {RestartStrategy, Children}}.
Whoa! What is going on here?!? Well, the es_sup is a supervisor, not a one-off thingy that just spawns other one-off thingies. (Misunderstanding supervisors is part of your core problem.)
Supervisors are boring. Supervisors should be boring. All they really do as a reader of code is what the structure of the supervision tree is inside. What they do for us in terms of OTP structure is extremely important, but they don't require that we write any procedural code, just declare what children it should have. What we are implementing here is called a service -> worker structure. So we have the top-level supervisor for your overall application called es_sup. Below that we have (for now) a single service component called es_clients.
The es_clients process is also a supervisor. The reason for this is to define a clear way for the client connection parts to not affect whatever ongoing state may exist in the rest of your system later. Just accepting connections from clients is useless -- surely there is some state that matters elsewhere, like long-running connections to some Java node or whatever. That would be a separate service component maybe called es_java_nodes and that part of the program would begin with its own, separate supervisor. That's why it is called a "supervision tree" instead of a "supervision list".
So back to clients... We will have clients connecting. That is why we call them "clients", because from the perspective of this Erlang system the things connecting are clients, and the processes that accept those connections abstract the clients so we can treat each connection handler as a client itself -- because that is exactly what it represents. If we connect to upstream services later, we would want to call those whatever they are abstracting so that our semantics within the system is sane.
You can then think in terms of "an es_client sent a message to an es_java_node to query [thingy]" instead of trying to keep things straight like "a server_server asked a java_server_client to server_server the service_server" (which is literally how stupid things get if you don't keep your naming principles straight in terms of inner-system perspective).
Blah blah blah...
So, here is the es_client_sup:
-module(es_client_sup).
-behaviour(supervisor).
-export([start_acceptor/1]).
-export([start_link/0]).
-export([init/1]).
start_acceptor(ListenSocket) ->
supervisor:start_child(?MODULE, [ListenSocket]).
start_link() ->
supervisor:start_link({local, ?MODULE}, ?MODULE, none).
init(none) ->
RestartStrategy = {simple_one_for_one, 1, 60},
Client = {es_client,
{es_client, start_link, []},
temporary,
brutal_kill,
worker,
[es_client]},
{ok, {RestartStrategy, [Client]}}.
Are you picking up a pattern? I wasn't kidding when I said that "supervisors should be boring..." :-) Note that here we are actually passing an argument in and we have defined an interface function. That is so we have a logicalish place to call if we need a socket acceptor to start.
Fourth: The Client Service Itself
Let's look at the client manager:
-module(es_client_man).
-behavior(gen_server).
-export([listen/1, ignore/0]).
-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
code_change/3, terminate/2]).
-record(s, {port_num = none :: none | inet:port_number(),
listener = none :: none | gen_tcp:socket()}).
listen(PortNum) ->
gen_server:call(?MODULE, {listen, PortNum}).
ignore() ->
gen_server:cast(?MODULE, ignore).
start_link() ->
gen_server:start_link({local, ?MODULE}, ?MODULE, none, []).
init(none) ->
ok = io:format("Starting.~n"),
State = #s{},
{ok, State}.
handle_call({listen, PortNum}, _, State) ->
{Response, NewState} = do_listen(PortNum, State),
{reply, Response, NewState};
handle_call(Unexpected, From, State) ->
ok = io:format("~p Unexpected call from ~tp: ~tp~n", [self(), From, Unexpected]),
{noreply, State}.
handle_cast(ignore, State) ->
NewState = do_ignore(State),
{noreply, NewState};
handle_cast(Unexpected, State) ->
ok = io:format("~p Unexpected cast: ~tp~n", [self(), Unexpected]),
{noreply, State}.
handle_info(Unexpected, State) ->
ok = io:format("~p Unexpected info: ~tp~n", [self(), Unexpected]),
{noreply, State}.
code_change(_, State, _) ->
{ok, State}.
terminate(_, _) ->
ok.
do_listen(PortNum, State = #s{port_num = none}) ->
SocketOptions =
[{active, once},
{mode, binary},
{keepalive, true},
{reuseaddr, true}],
{ok, Listener} = gen_tcp:listen(PortNum, SocketOptions),
{ok, _} = es_client:start(Listener),
{ok, State#s{port_num = PortNum, listener = Listener}};
do_listen(_, State = #s{port_num = PortNum}) ->
ok = io:format("~p Already listening on ~p~n", [self(), PortNum]),
{{error, {listening, PortNum}}, State}.
do_ignore(State = #s{listener = none}) ->
State;
do_ignore(State = #s{listener = Listener}) ->
ok = gen_tcp:close(Listener),
State#s{listener = none}.
Hmmm, what is all that about? The basic idea here is that we have a service supervisor over the whole concept of clients (es_clients, as discussed above), and then we have the simple_one_for_one to handle whatever clients happen to be alive just now (es_client_sup), and here we have the management interface to the subsystem. All this manager does is keep track of what port we are listening on, and owns the socket that we are listening to if one is open at the moment. Note that this can be easily rewritten to allow any arbitrary number of ports to be listened to simultaneously, or track all living clients, or whatever. There really is no limit to what you might want to do.
So how do we start clients that can accept connections? By telling them to spawn and listen to the listen socket that we pass in as an argument. Go take another look at es_client_sup above. We are passing in an empty list as its first argument. What will happen when we call its start_link function is that whatever else we pass in as a list will get added to the list of arguments overall. In this case we will pass in the listen socket and so it will be started with the argument [ListenSocket].
Whenever a client listener accepts a connection, its next step will be to spawn its successor, handing it the original ListenSocket argument. Ah, the miracle of life.
-module(es_client).
-export([start/1]).
-export([start_link/1, init/2]).
-export([system_continue/3, system_terminate/4,
system_get_state/1, system_replace_state/2]).
-record(s, {socket = none :: none | gen_tcp:socket()}).
start(ListenSocket) ->
es_client_sup:start_acceptor(ListenSocket).
start_link(ListenSocket) ->
proc_lib:start_link(?MODULE, init, [self(), ListenSocket]).
init(Parent, ListenSocket) ->
ok = io:format("~p Listening.~n", [self()]),
Debug = sys:debug_options([]),
ok = proc_lib:init_ack(Parent, {ok, self()}),
listen(Parent, Debug, ListenSocket).
listen(Parent, Debug, ListenSocket) ->
case gen_tcp:accept(ListenSocket) of
{ok, Socket} ->
{ok, _} = start(ListenSocket),
{ok, Peer} = inet:peername(Socket),
ok = io:format("~p Connection accepted from: ~p~n", [self(), Peer]),
State = #s{socket = Socket},
loop(Parent, Debug, State);
{error, closed} ->
ok = io:format("~p Retiring: Listen socket closed.~n", [self()]),
exit(normal)
end.
loop(Parent, Debug, State = #s{socket = Socket}) ->
ok = inet:setopts(Socket, [{active, once}]),
receive
{tcp, Socket, <<"bye\r\n">>} ->
ok = io:format("~p Client saying goodbye. Bye!~n", [self()]),
ok = gen_tcp:send(Socket, "Bye!\r\n"),
ok = gen_tcp:shutdown(Socket, read_write),
exit(normal);
{tcp, Socket, Message} ->
ok = io:format("~p received: ~tp~n", [self(), Message]),
ok = gen_tcp:send(Socket, ["You sent: ", Message]),
loop(Parent, Debug, State);
{tcp_closed, Socket} ->
ok = io:format("~p Socket closed, retiring.~n", [self()]),
exit(normal);
{system, From, Request} ->
sys:handle_system_msg(Request, From, Parent, ?MODULE, Debug, State);
Unexpected ->
ok = io:format("~p Unexpected message: ~tp", [self(), Unexpected]),
loop(Parent, Debug, State)
end.
system_continue(Parent, Debug, State) ->
loop(Parent, Debug, State).
system_terminate(Reason, _Parent, _Debug, _State) ->
exit(Reason).
system_get_state(Misc) -> {ok, Misc}.
system_replace_state(StateFun, Misc) ->
{ok, StateFun(Misc), Misc}.
Note that above we have written a Pure Erlang process that integrates with OTP the way a gen_server would, but has a bit more straightforward loop that handles only the socket. This means we do not have the gen_server call/cast mechanics (and may need to implement those ourselves, but usually asynch-only is a better approach for socket handling). This is started through the proc_lib module which is designed specifically to bootstrap OTP-compliant processes of any arbitrary type.
If you are going to use supervisors then you really want to go all the way and use OTP properly.
So what we have above right now is a very basic Telnet echo service. Instead of writing a magical client process inside the server module to tie your brain in knots (Erlangers don't like their brains tied in knots), you can start this, tell it to listen to some port, and telnet to it yourself and see the results.
I added some scripts to automate launching things, but basically the hinge on the code and make modules. Your project is laid out like
es/
Emakefile
ebin/es.app
src/*.erl
The contents of the Emakefile will make things easier for us. In this case it is just one line:
enter code here{"src/*", [debug_info, {i, "include/"}, {outdir, "ebin/"}]}.
In the main es/ directory if we enter an erl shell we can now do...
1> code:add_patha("ebin").
true
2> make:all().
up_to_date
3> es:start().
And you'll see a bunch of SASL start reports scrolling up the screen.
From there let's do es:listen(5555):
4> es:listen(5555).
<0.94.0> Listening.
ok
Cool! So it seems like things are working. Let's try to telnet to ourselves:
ceverett#changa:~/vcs/es$ telnet localhost 5555
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Hello es thingy
You sent: Hello es thingy
Yay! It works!
You sent: Yay! It works!
bye
Bye!
Connection closed by foreign host.
What does that look like on the other side?
<0.96.0> Listening.
<0.94.0> Connection accepted from: {{127,0,0,1},60775}
<0.94.0> received: <<"Hello es thingy\r\n">>
<0.94.0> received: <<"Yay! It works!\r\n">>
<0.94.0> Client saying goodbye. Bye!
Ah, here we see the "Listening." message from the next listener <0.96.0> that was spawned by the first one <0.94.0>.
How about concurrent connections?
<0.97.0> Listening.
<0.96.0> Connection accepted from: {{127,0,0,1},60779}
<0.98.0> Listening.
<0.97.0> Connection accepted from: {{127,0,0,1},60780}
<0.97.0> received: <<"Screen 1\r\n">>
<0.96.0> received: <<"Screen 2\r\n">>
<0.97.0> received: <<"I wonder if I could make screen 1 talk to screen 2...\r\n">>
<0.96.0> received: <<"Time to go\r\n">>
<0.96.0> Client saying goodbye. Bye!
<0.97.0> Client saying goodbye. Bye!
Oh, neato. A concurrent server!
From here you can tool around and make this basic structure change to do pretty much anything you might imagine.
DO NOTE that there is a lot missing from this code. I have stripped it of edoc notations and typespecs (used by Dialyzer, a critically important tool in a large project) -- which is a BAD thing to do for a production system.
For an example of a production-style project that is small enough to wrap your head around (only 3 modules + full docs), refer to zuuid. It was written specifically to serve as a code example, though it happens to be a full-featured UUID generator.
Forgive the hugeness of this answer to your (much shorter) question. This comes up from time to time and I wanted to write a full example of a cut-down network socket service to which I can refer people in the future, and just happened to get the hankering to do so when I read your question. :-) Hopefully the SO nazis will forgive this grievous infraction.

erlang otp child workers

I'm trying to get an OTP supervisor to start child workers which will (eventually) connect to remote servers. I used Rebar to create a template test application and I'm trying to get the supervisor to fire off function 'hi' in module 'foo'. it compiles OK and runs:
Eshell V5.8.5 (abort with ^G)
1> test_app:start(1,1).
{ok,<0.34.0>}
but when I try to start the worker it goes pear shaped with this error:
2> test_sup:start_foo().
{error,{badarg,{foo,{foo,start_link,[]},
permanent,5000,worker,
[foo]}}}
The problem seems similar, but not the same, to this question: Erlang - Starting a child from the supervisor module
Any ideas?
test_app.erl
-module(test_app).
-behaviour(application).net
-export([start/2, stop/1]).
start(_StartType, _StartArgs) ->
test_sup:start_link().
stop(_State) ->
ok.
Test_sup.erl:
-module(test_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1, start_foo/0]).
-define(CHILD(I, Type), {I, {I, start_link, []}, permanent, 5000, Type, [I]}).
start_link() ->
supervisor:start_link({local, ?MODULE}, ?MODULE, []).
init([]) ->
{ok, { {one_for_one, 5, 10}, []} }.
start_foo()->
supervisor:check_childspecs(?CHILD(foo, worker)),
supervisor:start_child(?MODULE, ?CHILD(foo, permanent)).
foo.erl:
-module(foo).
-export([hi/0]).
hi()->
io:format("worker ~n").
You check the childspec using the macro call ?CHILD(foo, worker) while you try to start the child with the macro using the macro call ?CHILD(foo, permanent). The second argument of the CHILD macro is the process type which should be either worker or supervisor. So the first macro call is correct. The value permanent is a value for the restart type, which you have already set to permanent, so the second call is wrong and you get a badarg error.
Note: It is quite common that library functions generate badarg errors as well, not just from built-in functions. It is not always obvious why it is a badarg.
I think that Robert answer is incomplete, after replacing permanent by worker you still have an error returned by supervisor:check_childspecs(?CHILD(foo, worker)),, I don't know why.
[edit]
The problem of bard arg comes from ... badarg :o)
check_childspecs extepect a list of child_specs, the correct syntax is supervisor:check_childspecs([?CHILD(foo, worker)]), and then it works fine. the following code is updated.
[end of edit]
But you will get also an error because the supervisor will try to launch the function foo:start_link that does not exist in the foo module.
the following code print an error, but seems to work properly.
-module(foo).
-export([hi/0,start_link/0,loop/0]).
start_link() ->
{ok,spawn_link(?MODULE,loop,[])}.
hi()->
io:format("worker ~n").
loop() ->
receive
_ -> ok
end.
-module(test_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1, start_foo/0]).
-define(CHILD(I, Type), {I, {I, start_link, []}, permanent, 5000, Type, [I]}).
start_link() ->
supervisor:start_link({local, ?MODULE}, ?MODULE, []).
init([]) ->
{ok, { {one_for_one, 5, 10}, []} }.
start_foo()->
io:format("~p~n",[supervisor:check_childspecs([?CHILD(foo, worker)])]),
supervisor:start_child(?MODULE, ?CHILD(foo, worker)).
[edit]
answering to David comment
in my code the loop/0 does not loop at all, on the receive block, the process waits for any message, and as soon as it receives one, the process dies returning the value ok. So as long as the worker process does not receive any message, it keeps living, which is nice when you make some test with supervisors :o).
On the opposite, the hi/0 function simply prints 'worker' on the console and finishes. As the restart strategy of the supervisor is one_for_one, the max restart is 5 and the child process is permanent, the supervisor will try to start the hi process 5 times, printing five time 'worker' on the console, and then it will give up and terminate itself with an error message ** exception error: shutdown
Generally you should choose permanent for never ending processes (main server of an application for example). For process that normally die as soon as they have done their job, you should use temporary. I never used transient but I read that it should be used for process that must complete a task before dying.

Added supervisor(s) for a gen_server, shutdown immediately?

EDIT: Below.
Why is my supervised gen_server shutting down so quickly?
I'll give these organizational names to make it more clear the chain of command that I want in my application: First I'm starting with the "assembly_line_worker" then later I'll add the "marketing_specialist" to my supervision tree...
ceo_supervisor.erl
-module(ceo_supervisor).
-behaviour(supervisor).
-export([start_link/1]).
-export([init/1]).
start_link(State) ->
supervisor:start_link({local,?MODULE}, ?MODULE, [State]).
init([Args]) ->
RestartStrategy = {one_for_one, 10, 60},
ChildSpec= {assembly_line_worker_supervisor,
{assembly_line_worker_supervisor, start_link, [Args]},
permanent, infinity, supervisor, [assembly_line_worker_supervisor]},
{ok, {RestartStrategy, [ChildSpec]}}.
assembly_line_worker_supervisor.erl
-module(assembly_line_worker_supervisor).
-behaviour(supervisor).
-export([start_link/1]).
-export([init/1]). %% Internal
start_link(State) ->
supervisor:start_link({local, ?MODULE}, ?MODULE, [State]).
init([Args]) ->
RestartStrategy = {one_for_one, 10, 60},
ChildSpec = {assembly_line_worker, {assembly_line_worker, start_link, [Args]}, permanent,
infinity, worker, [assembly_line_worker]},
{ok, {RestartStrategy, [ChildSpec]}}.
assembly_line_worker.erl
-module(assembly_line_worker).
...
init([State]) ->
process_flag(trap_exit, true),
{ok, State}.
start_link(State) ->
gen_server:start_link({global, ?MODULE}, ?MODULE, [State], []).
handle_cast(...,State} ->
io:format("We're getting this message.~n",[]),
{noreply, State};
...
What's happening is that the assembly line worker does a few bits of work, like receiving a couple of messages that are sent just after the ceo_supervisor:start_link(#innovative_ideas{}) command is called, then it shuts down. Have any idea why? I know that the gen_server is receiving a few messages because it io:format's them to the console.
Thanks!
EDIT: I'm hosting this on Windows via erlsrv.exe and I found that when I start up my program via a function like so:
start() ->
ceo_supervisor:start_link(#innovative_ideas{}),
assembly_line_worker:ask_for_more_pay(), %% Prints out "I want more $$$" as expected,
ok.
...this function exiting immediately causes my supervisors / gen_servers to shut down. I would expect this because all of this is linked via supervision to the original calling process, so when that exits so should the children.
So I guess a better question would be, how can I allow my supervisors to keep running after going through all of the start up configuration? Is there an option other than wrapping all of this in an application? (Which doesn't sound too bad...)
Thanks for the probing questions! I learned more about supervisors that way.
batman
To get more information about what is happening start sasl before you start your supervisor: application:start(sasl).
Another way to debug this would be to start the worker from your erlang shell send the sequence of message that crashed the server. Btw: are you sure that you need 2 levels of supervisors?
Some immediate comments:
In ceo_supervisor:init/1 your supervisor child spec should declare transient instead of permanent.
Run erl -boot start_sasl so you have the error log when something goes bad and you can get the crash report of a crasher.
If you run this in the shell and you make any mistake, then your tree will be forcibly killed. This is because you linked to the shell and the shell crashes upon errors. So you are dragging down your tree. Try something like:
Pid = spawn(fun() -> my_app:start() end).
so you have it split off. You can kill the app by sending an exit message to Pid.

Handling timeouts in OTP

I've got an application defined
{application, ps_barcode,
[{description, "barcode image generator based on ps-barcode"},
{vsn, "1.0"},
{modules, [ps_barcode_app, ps_barcode_supervisor, barcode_data, wand, ps_bc]},
{registered, [ps_bc, wand, ps_barcode_supervisor]},
{applications, [kernel, stdlib]},
{mod, {ps_barcode_app, []}},
{start_phases, []}]}.
with the supervisor init looking like
init([]) ->
{ok, {{one_for_one, 3, 10},
[{tag1,
{wand, start, []},
permanent,
brutal_kill,
worker,
[wand]},
{tag2,
{ps_bc, start, []},
permanent,
10000,
worker,
[ps_bc]}]}}.
It's a barcode generator that uses a C component to do some of the image processing. The system errors and restarts correctly if asked to process nonexistent files, or to do it with insufficient permissions, but there's one particular error that results in a timeout from the wand module
GPL Ghostscript 9.04: Unrecoverable error, exit code 1
GPL Ghostscript 9.04: Unrecoverable error, exit code 1
wand.c barcode_to_png 65 Postscript delegate failed `/tmp/tmp.1337.95765.926102': No such file or directory # error/ps.c/ReadPSImage/827
** exception exit: {timeout,{gen_server,call,
[wand,{process_barcode,"/tmp/tmp.1337.95765.926102"}]}}
in function gen_server:call/2 (gen_server.erl, line 180)
in call from ps_bc:generate/3 (ps_bc.erl, line 19)
(the Imagemagick error is inaccurate there; the file exists, but it's a Postscript file with errors that therefore can't be interpreted as normal; I assume that's what generates the Ghostscript error and causes the program to hang, but I'm not sure why it fails to return at all).
The problem I've got is: even though this timeout returns an error, the wand process seems to have hanged in the background (I'm concluding this since any further call to wand returns another timeout error, including wand:stop for some reason). I'm not sure how much code to post, so I'm keeping it minimally to the wand module itself. Let me know if I need to post other pieces.
-module(wand).
-behaviour(gen_server).
-export([start/0, stop/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
terminate/2, code_change/3]).
-export([process/1]).
process(Filename) -> gen_server:call(?MODULE, {process_barcode, Filename}).
handle_call({process_barcode, Filename}, _From, State) ->
State ! {self(), {command, Filename}},
receive
{State, {data, Data}} ->
{reply, decode(Data), State}
end;
handle_call({'EXIT', _Port, Reason}, _From, _State) ->
exit({port_terminated, Reason}).
decode([0]) -> {ok, 0};
decode([1]) -> {error, could_not_read};
decode([2]) -> {error, could_not_write}.
%%%%%%%%%%%%%%%%%%%% generic actions
start() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).
stop() -> gen_server:call(?MODULE, stop).
%%%%%%%%%%%%%%%%%%%% gen_server handlers
init([]) -> {ok, open_port({spawn, filename:absname("wand")}, [{packet, 2}])}.
handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Info, State) -> {noreply, State}.
terminate(_Reason, Port) -> Port ! {self(), close}, ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
EDIT: Forgot to mention and it may be relevant; the hang only seems to happen when I run the application through application:load/application:start. If I test this component on its own by doing
c(wand).
wand:start().
wand:process("/tmp/tmp.malformed-file.ps").
It still errors, but the process dies for real. That is, I can do
wand:start().
wand:process("/tmp/tmp.existing-well-formed-file.ps").
and get the expected response. When it's started through the supervisor, it hangs instead and exhibits the behavior I described earlier.
Not an answer, but what I will do in such case. I will use gen_server:cast and will handle timeouts in gen_server and after all work is done I will send to requester response with result. So this changes affects requester side too.
But I'm maybe wrong in all ways.
It seems that using receive..after instead of a plain receive when dealing with the external C program forces a kill. I'm not sure why the other measures don't work though...
...
receive
{State, {data, Data}} ->
{reply, decode(Data), State}
after 3000 ->
exit(wand_timeout)
end;
...
Also, at this point you have to hope that no legitimate operation takes longer than 3000. It's not a problem in this particular case, but it might be if I added more outputs to the C program.

Achieving code swapping in Erlang's gen_server

I am looking to make use of Erlang's hot code swapping feature on a gen_server, so that I don't have to restart it. How should I do that? When I searched, all I could find was one article which mentioned that I need to make use of gen_server:code_change callback.
However, I could not really find any documentation/examples on how to use this. Any help or links to resources greatly appreciated!
As I already mentioned the normal way of upgrading is creating the proper .appup and .relup files, and let release_handler do what needs to be done. However you can manually execute the steps involved, as described here. Sorry for the long answer.
The following dummy gen_server implements a counter. The old version ("0") simply stores an integer as state, while the new version ("1") stores {tschak, Int} as state. As I said, this is a dummy example.
z.erl (old):
-module(z).
-version("0").
-export([start_link/0, boing/0]).
-behavior(gen_server).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2, code_change/3]).
start_link() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], [{debug, [trace]}]).
boing() -> gen_server:call(?MODULE, boom).
init([]) -> {ok, 0}.
handle_call(boom, _From, Num) -> {reply, Num, Num+1};
handle_call(_Call, _From, State) -> {noreply, State}.
handle_cast(_Cast, State) -> {noreply, State}.
handle_info(_Info, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
z.erl (new):
-module(z).
-version("1").
-export([start_link/0, boing/0]).
-behavior(gen_server).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2, code_change/3]).
start_link() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], [{debug, [trace]}]).
boing() -> gen_server:call(?MODULE, boom).
init([]) -> {ok, {tschak, 0}}.
handle_call(boom, _From, {tschak, Num}) -> {reply, Num, {tschak, Num+1}};
handle_call(_Call, _From, State) -> {noreply, State}.
handle_cast(_Cast, State) -> {noreply, State}.
handle_info(_Info, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change("0", Num, _Extra) -> {ok, {tschak, Num}}.
Start the shell, and compile the old code. Notice the gen_server is started with debug trace.
1> c(z).
{ok,z}
2> z:start_link().
{ok,<0.38.0>}
3> z:boing().
*DBG* z got call boom from <0.31.0>
*DBG* z sent 0 to <0.31.0>, new state 1
0
4> z:boing().
*DBG* z got call boom from <0.31.0>
*DBG* z sent 1 to <0.31.0>, new state 2
1
Works as expected: returns the Int, and new state is Int+1.
Now replace z.erl with the new one, and execute the following steps.
5> compile:file(z).
{ok,z}
6> sys:suspend(z).
ok
7> code:purge(z).
false
8> code:load_file(z).
{module,z}
9> sys:change_code(z,z,"0",[]).
ok
10> sys:resume(z).
ok
What you just did: 5: compiled the new code. 6: suspended the server. 7: purged older code (just in case). 8: loaded the new code. 9: invoked code change in process 'z' for module 'z' from version "0" with [] passed as "Extra" to code_change. 10: resumed the server.
Now if you run some more tests, you can see, that the server works with the new state format:
11> z:boing().
*DBG* z got call boom from <0.31.0>
*DBG* z sent 2 to <0.31.0>, new state {tschak,3}
2
12> z:boing().
*DBG* z got call boom from <0.31.0>
*DBG* z sent 3 to <0.31.0>, new state {tschak,4}
3
You don't need to use that callback in the gen_server behaviour. It is there if you change the internal representation of the state across a code upgrade.
You only need to load the new module and the gen_server running the old version will upgrade, since it calls the new module. It is just that you dont have a chance to change the representation if that is necessary.
The simplest way to do it is replace the .beam file and run l(my_server_module). in the shell. This bypasses the code_change function, and therefore requires that the representation of state hasn't changed.
As mentioned already, the proper way to do it is to create a new release with appup and relup scripts. This new release is then installed with release_handler.
If you want to do it the right way, which is highly recommended, then you need to read up on the use of OTP Supervisors and Applications.
You could do worse than to read the OTP Design Principles User's Guide here :
http://www.erlang.org/doc/design_principles/users_guide.html
If you're on rebar3, some of this manual processing has been automated away (ie. appup and relup generation), you can find more info here: http://lrascao.github.io/automatic-release-upgrades-in-erlang/

Resources