I've got some C code that I'm executing as an external process using Erlang's Port capability. I want the process that starts the C code, via open_port, to detect if the C code crashes. The documentation isn't completely clear to me, but as I understand it, a two-way link is established between the Erlang process and the external code. If one dies, the other is notified.
Here is a slightly modified version of the code in the "Erlang Interoperability Tutorial Guide" (http://www.erlang.org/doc/tutorial/c_port.html):
init(ExtPrg) ->
register(cport, self()),
process_flag(trap_exit, true),
Port = open_port({spawn, ExtPrg}, [{packet, 2}]),
PInfo = erlang:port_info(Port),
io:format("Port: ~p PInfo: ~p~n", [Port, PInfo]),
RVal = link(Port),
io:format("link? ~p~n", [RVal]),
loop(Port).
loop(Port) ->
receive
{call, Caller, Msg} ->
Port ! {self(), {command, encode(Msg)}},
receive
{Port, {data, Data}} ->
Caller ! {cport, decode(Data)}
end,
loop(Port);
stop ->
Port ! {self(), close},
receive
{Port, closed} ->
exit(normal)
end;
{'EXIT', Port, Reason} ->
exit(port_terminated)
end.
The init call correctly executes the C code, and as you can see sets trap_exit, but the EXTT message is not received when I kill the C code using kill -HUP from the Unix shell. I've tried with and without the link call (the Erlang documentation does not use it). The printing code I've added generates:
Eshell V5.9.1 (abort with ^G)
1> cport:start("./cport 22").
Port: #Port<0.630> PInfo: [{name,"./cport 22"},
{links,[<0.38.0>]},
{id,630},
{connected,<0.38.0>},
{input,0},
{output,0}]
<0.38.0>
link? true
It appears that a link is registered but I'm not catching the trap. What am I missing?
Try the extra option exit_status when calling open_port().
I did a similar Erlang program for supervising game servers.
When the external game server crashed I wanted to restart it from my central Erlang monitoring system.
This was the code that worked for me:
erlang:open_port({spawn_executable, "<Path to my game server start script>"},
[ exit_status ]),
When the external process is killed you will get a message of type
{Port,{exit_status,Status}}
Related
Source file:
-module(biu_server).
-export([start_server/0]).
start_server() ->
{ok, Listen} = gen_tcp:listen(1332,
[binary, {packet, 4},{reuseaddr, true}, {active, true}]),
spawn(fun() -> par_connect(Listen) end).
par_connect(Listen) ->
{ok,Socket} = gen_tcp:accept(Listen),
spawn(fun() -> par_connect(Listen) end),
loop(Socket).
loop(Socket) ->
receive
{tcp, Socket, Request} ->
gen_tcp:send(Socket, term_to_binary(Request)),
loop(Socket);
{tcp_closed, Socket} ->
io:format("Server socket closed~n")
end.
Inside shell:
1> c(biu_server).
{ok,biu_server}
2> biu_server:start_server().
<0.40.0>
3> q().
ok
4> {error_logger,{{2016,1,9},{18,40,28}},"Error in process ~p with exit value:~n~p~n",[<0.40.0>,{{badmatch,{error,closed}},[{biu_server,par_connect,1,[{file,"biu_server.erl"},{line,11}]}]}]}
I want to write a echo server, but when I quit erlang shell, the error_logger is warning badmatch, but the client process is already closed.
Why does my server close fail? What happens?
There are a few wonky things in there. The easiest way to talk through them is probably to just show a different version:
-module(biu_server).
-export([start_server/0,start/0]).
start() ->
spawn(fun() -> start_server() end).
start_server() ->
Options = [list, {reuseaddr, true}],
{ok, Listen} = gen_tcp:listen(1337, Options),
par_connect(Listen).
par_connect(Listen) ->
{ok, Socket} = gen_tcp:accept(Listen),
% The line below is a whole different universe of confusion for now.
% Get sequential stuff figured out first.
% spawn(fun() -> par_connect(Listen) end),
loop(Socket).
loop(Socket) ->
ok = inet:setopts(Socket, [{active, once}]),
receive
{tcp, Socket, Request} ->
ok = io:format("Received TCP message: ~tp~n", [Request]),
Response = io_lib:format("You said: ~tp\r\n", [Request]),
gen_tcp:send(Socket, Response),
loop(Socket);
{tcp_closed, Socket} ->
io:format("Server socket closed~n");
quit ->
ok = io:format("Fine! I QUIT!~n");
Unexpected ->
ok = io:format("Unexpected message: ~tp~n", [Unexpected]),
loop(Socket)
end.
Note that above I commented out the call to spawn a new process to handle the connection. One of the problems the original version had was that the remaining listener had no way to be terminated because it was always blocking on TCP accept -- forever.
There are also some socket control issues there -- the easiest way around that is to have each listener spawn the next listener and move on to handle its newly acquired connection. Another way is to do what you've done, but there are some details that need to be managed to make it work smoothly. (Example (don't worry, still easy): https://github.com/zxq9/erlmud/blob/master/erlmud-0.1/tcplistener.erl).
I also added two new clauses to the receive in your main loop: one that lets us tell the process to kill itself from within Erlang, and another that handles unexpected messages from within the system. The "Unexpected" clause is important for two reasons: it tells us what is going on (you shouldn't normally be receiving mysterious messages, but it can happen and its not always your fault), and it prevents the process mailbox from stacking up with unmatched-but-unmanageable messages.
Just sticking with the version above, here is what happens during my first session...
In the Erlang shell:
1> c(biu_server).
{ok,biu_server}
2> {Pid, Ref} = spawn_monitor(biu_server, start_server, []).
{<0.40.0>,#Ref<0.0.2.89>}
Received TCP message: "foo\r\n"
Received TCP message: "bar\r\n"
Received TCP message: "Yay! It works!\r\n"
Server socket closed
3> flush().
Shell got {'DOWN',#Ref<0.0.2.89>,process,<0.40.0>,normal}
ok
4> f().
ok
And in a telnet session:
ceverett#changa:~/Code/erlang$ telnet localhost 1337
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
foo
You said: "foo\r\n"
bar
You said: "bar\r\n"
Yay! It works!
You said: "Yay! It works!\r\n"
^]
telnet> quit
Connection closed.
As we can see, the closed connection from the client side terminated as expected. The shell's process was monitoring the TCP server process, so when I flush() at the end we see the 'DOWN' monitor message as expected -- with a normal exit.
Now let's do a similar session, but we'll use the Erlang-side quit message:
5> {Pid, Ref} = spawn_monitor(biu_server, start_server, []).
{<0.43.0>,#Ref<0.0.2.103>}
Received TCP message: "Still works.\r\n"
6> Pid ! "Launch the missiles!".
Unexpected message: "Launch the missiles!"
"Launch the missiles!"
7> Pid ! quit.
Fine! I QUIT!
quit
8> flush().
Shell got {'DOWN',#Ref<0.0.2.103>,process,<0.43.0>,normal}
ok
And on the telnet side:
ceverett#changa:~/Code/erlang$ telnet localhost 1337
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Still works.
You said: "Still works.\r\n"
Connection closed by foreign host.
You notice the "\r\n" in there. That is coming from the client's original message -- all telnet messages are terminated with "\r\n" (so this is what you split the stream on usually -- which is a topic we aren't addressing yet; this code actually is pretending that TCP works like UDP datagrams, but that's not true...). The "\r\n" that are in the server's return message are being interpreted correctly by the telnet client and breaking to the next line.
Writing telnet (or datagram) games is a very pleasant way to explore the Erlang networking modules, by the way.
i solve this question,but i don't know why...
http://erlang.2086793.n4.nabble.com/parallel-tcp-server-closed-once-spawned-td2099538.html
code:
-module(biu_server).
-export([start_server/0,start/0]).
start() ->
spawn(fun() -> start_server() end).
start_server() ->
{ok, Listen} = gen_tcp:listen(1332,
[binary, {packet, 4},{reuseaddr, true}, {active, true}]),
par_connect(Listen).
par_connect(Listen) ->
{ok,Socket} = gen_tcp:accept(Listen),
spawn(fun() -> par_connect(Listen) end),
loop(Socket).
loop(Socket) ->
receive
{tcp, Socket, Request} ->
gen_tcp:send(Socket, term_to_binary(Request)),
loop(Socket);
{tcp_closed, Socket} ->
io:format("Server socket closed~n")
end.
When you call q() from shell, it calls init:stop/0. What it does based on documentation is:
All applications are taken down smoothly, all code is unloaded, and all ports are closed before the system terminates.
When you call start_server/0, it spawns a process and that process opens a port for TCP connection. Because you start your server from inside the shell, so it is linked to the shell, so when the shell gets an exit signal, it sends it again to all the linked processes, then the following error report will be produced by the error_logger module.
1> biu:start_server().
<0.35.0>
2> exit(normal).
=ERROR REPORT==== 9-Jan-2016::16:40:29 ===
Error in process <0.35.0> with exit value: {{badmatch,{error,closed}},[{biu,par_connect,1,[{file,"biu.erl"},{line,12}]}]}
** exception exit: normal
3>
With knowing this, when you call q() before closing the port, all the modules are unloaded and ports are closed one by one, so the unwanted behaviour will result.
When I register a process to an atom, I can send a message via the atom instead of the Pid interchangeably, which is convenient. However, pattern matching seems to treat Pid and atom as different entities, which is expected but inconvenient. In my example, the {Pid, Response} pattern does not match since Pid in this scope is an atom but the message sent as response contains the actual Pid.
Is there a preferred way to handle this?
The Program:
-module(ctemplate).
-compile(export_all).
start(AnAtom, Fun) ->
Pid = spawn(Fun),
register(AnAtom, Pid).
rpc(Pid, Request) ->
Pid ! {self(), Request},
receive
{Pid, Response} ->
Response;
Any ->
io:format("Finish (wrong):~p~n",[{Pid, Any}])
end.
loop(X) ->
receive
{Sender, Any} ->
io:format("Received: ~p~n",[{Sender, Any}]),
Sender ! {self(), "Thanks for contacting us"},
loop(X)
end.
The Shell:
Eshell V5.10.2 (abort with ^G)
1> c(ctemplate).
{ok,ctemplate}
2> ctemplate:start(foo, fun() -> ctemplate:loop([]) end).
true
3> ctemplate:rpc(foo, ["bar"]).
Received: {<0.32.0>,["bar"]}
Finish (wrong):{foo,{<0.40.0>,"Thanks for contacting us"}}
ok
4> whereis(foo).
<0.40.0>
Use references instead. The example you are suggesting is actually one of the reasons why refs are better for synchronous messages. Another reason is that sometimes you cannot guarantee that the received message is the one you are actually expecting.
So, your code will look like something
rpc(PidOrName, Request) ->
Ref = make_ref(),
PidOrName ! {{self(), Ref}, Request},
receive
{{Pid, Ref}, Response} ->
Response;
Any ->
io:format("Finish (wrong):~p~n",[{PidOrName, Any}])
end.
loop(X) ->
receive
{{Pid, Ref}, Any} ->
io:format("Received: ~p~n",[{Sender, Any}]),
Sender ! {{self(), Ref}, "Thanks for contacting us"},
end,
loop(X).
A couple notes about your and my code:
Note how I moved the last loop/1 call to the end of the function out of the receive block. Erlang does compile-time tail call optimizations, so your code should be fine, but it's a better idea to make tail calls explicitly – helps you to avoid mistakes.
You are probably trying to re-invent gen_server. The only two major differences between gen_server:call/2 and my code above are timeouts (gen_server has them) and that the reference is created by monitoring the remote process. This way, if the process dies before the timeout is thrown, we receive an immediate message. It's slower in many cases, but sometimes proves itself useful.
Overall, try to use OTP and read its code. It's good and gives you better ideas of how Erlang application should work.
rpc(Pid, Request) ->
Pid ! {self(), Request},
receive
{whereis(Pid), Response} ->
Response;
Any ->
io:format("Finish (wrong):~p~n",[{Pid, Any}])
end.
you can this function whereis():
whereis(RegName) -> pid() | port() | undefined
Types:
RegName = atom()
Returns the pid or port identifier with the registered name RegName. Returns undefined if the name is not registered.
For example:
whereis(db).
<0.43.0>
I am trying to write a first program in Erlang that effects message communication between a client and server. In theory the server exits when it receives no message from the client, but every time I edit the client code and run the server again, it executes the old code. I have to ^G>q>erl>[re-enter command] to get it to see the new code.
-module(srvEsOne).
%%
%% export functions
%%
-export([start/0]).
%%function definition
start()->
io:format("Server: Starting at pid: ~p \n",[self()]),
case lists:member(serverEsOne, registered()) of
true ->
unregister(serverEsOne); %if the token is present, remove it
false ->
ok
end,
register(serverEsOne,self()),
Pid = spawn(esOne, start,[self()]),
loop(false, false,Pid).
%
loop(Prec, Nrec,Pd)->
io:format("Server: I am waiting to hear from: ~p \n",[Pd]),
case Prec of
true ->
case Nrec of
true ->
io:format("Server: I reply to ~p \n",[Pd]),
Pd ! {reply, self()},
io:format("Server: I quit \n",[]),
ok;
false ->
receiveLoop(Prec,Nrec,Pd)
end;
false ->
receiveLoop(Prec,Nrec,Pd)
end.
receiveLoop(Prec,Nrec,Pid) ->
receive
{onPid, Pid}->
io:format("Server: I received a message to my pid from ~p \n",[Pid]),
loop(true, Nrec,Pid);
{onName,Pid}->
io:format("Server: I received a message to name from ~p \n",[Pid]),
loop(Prec,true,Pid)
after
5000->
io:format("Server: I received no messages, i quit\n",[]),
ok
end.
And the client code reads
-module(esOne).
-export([start/1, func/1]).
start(Par) ->
io:format("Client: I am ~p, i was spawned by the server: ~p \n",[self(),Par]),
spawn(esOne, func, [self()]),
io:format("Client: Now I will try to send a message to: ~p \n",[Par]),
Par ! {self(), hotbelgo},
serverEsOne ! {self(), hotbelgo},
ok.
func(Parent)->
io:format("Child: I am ~p, i was spawned from ~p \n",[self(),Parent]).
The server is failing to receive a message from the client, but I can't sensibly begin to debug that until I can try changes to the code in a more straightforward manner.
When you make modification to a module you need to compile it.
If you do it in an erlang shell using the command c(module) or c(module,[options]), the new compiled version of the module is automatically loaded in that shell. It will be used by all the new process you launch.
For the one that are alive and already use it is is more complex to explain and I think it is not what you are asking for.
If you have several erlang shells running, only the one where you compile the module loaded it. That means that in the other shell, if the module were previously loaded, basically if you already use the module in those shell, and even if the corresponding processes are terminated, the new version is ignored.
Same thing if you use the command erlc to compile.
In all these cases, you need to explicitly load the module with the command l(module) in the shell.
Your server loop contain only local function calls. Running code is changed only if there is remote (or external) function call. So you have to export your loop function first:
-export([loop/3]).
and then you have to change all loop/3 calls in function receiveLoop/3 to
?MODULE:loop(...)
Alternatively you can do same thing with receiveLoop/3 instead. Best practice for serious applications is doing hot code swapping by demand so you change loop/3 to remote/external only after receiving some special message.
EDIT: Below.
Why is my supervised gen_server shutting down so quickly?
I'll give these organizational names to make it more clear the chain of command that I want in my application: First I'm starting with the "assembly_line_worker" then later I'll add the "marketing_specialist" to my supervision tree...
ceo_supervisor.erl
-module(ceo_supervisor).
-behaviour(supervisor).
-export([start_link/1]).
-export([init/1]).
start_link(State) ->
supervisor:start_link({local,?MODULE}, ?MODULE, [State]).
init([Args]) ->
RestartStrategy = {one_for_one, 10, 60},
ChildSpec= {assembly_line_worker_supervisor,
{assembly_line_worker_supervisor, start_link, [Args]},
permanent, infinity, supervisor, [assembly_line_worker_supervisor]},
{ok, {RestartStrategy, [ChildSpec]}}.
assembly_line_worker_supervisor.erl
-module(assembly_line_worker_supervisor).
-behaviour(supervisor).
-export([start_link/1]).
-export([init/1]). %% Internal
start_link(State) ->
supervisor:start_link({local, ?MODULE}, ?MODULE, [State]).
init([Args]) ->
RestartStrategy = {one_for_one, 10, 60},
ChildSpec = {assembly_line_worker, {assembly_line_worker, start_link, [Args]}, permanent,
infinity, worker, [assembly_line_worker]},
{ok, {RestartStrategy, [ChildSpec]}}.
assembly_line_worker.erl
-module(assembly_line_worker).
...
init([State]) ->
process_flag(trap_exit, true),
{ok, State}.
start_link(State) ->
gen_server:start_link({global, ?MODULE}, ?MODULE, [State], []).
handle_cast(...,State} ->
io:format("We're getting this message.~n",[]),
{noreply, State};
...
What's happening is that the assembly line worker does a few bits of work, like receiving a couple of messages that are sent just after the ceo_supervisor:start_link(#innovative_ideas{}) command is called, then it shuts down. Have any idea why? I know that the gen_server is receiving a few messages because it io:format's them to the console.
Thanks!
EDIT: I'm hosting this on Windows via erlsrv.exe and I found that when I start up my program via a function like so:
start() ->
ceo_supervisor:start_link(#innovative_ideas{}),
assembly_line_worker:ask_for_more_pay(), %% Prints out "I want more $$$" as expected,
ok.
...this function exiting immediately causes my supervisors / gen_servers to shut down. I would expect this because all of this is linked via supervision to the original calling process, so when that exits so should the children.
So I guess a better question would be, how can I allow my supervisors to keep running after going through all of the start up configuration? Is there an option other than wrapping all of this in an application? (Which doesn't sound too bad...)
Thanks for the probing questions! I learned more about supervisors that way.
batman
To get more information about what is happening start sasl before you start your supervisor: application:start(sasl).
Another way to debug this would be to start the worker from your erlang shell send the sequence of message that crashed the server. Btw: are you sure that you need 2 levels of supervisors?
Some immediate comments:
In ceo_supervisor:init/1 your supervisor child spec should declare transient instead of permanent.
Run erl -boot start_sasl so you have the error log when something goes bad and you can get the crash report of a crasher.
If you run this in the shell and you make any mistake, then your tree will be forcibly killed. This is because you linked to the shell and the shell crashes upon errors. So you are dragging down your tree. Try something like:
Pid = spawn(fun() -> my_app:start() end).
so you have it split off. You can kill the app by sending an exit message to Pid.
Can anyone explain why I get the following error when I run: erl -noshell -s simple_server and then telnet 127.0.0.1 1084? The code itself is below the error message.
=ERROR REPORT==== 13-Aug-2011::23:12:05 === Error in process <0.30.0>
with exit value:
{{badmatch,{error,closed}},[{simple_server,wait_accept,1}]}
-module(simple_server).
-compile(export_all).
start() ->
{ok, ListenSoc} = gen_tcp:listen(1084, [binary, {active, false}]),
wait_accept(ListenSoc).
wait_accept(ListenSoc) ->
{ok, Socket} = gen_tcp:accept(ListenSoc),
spawn(?MODULE, wait_accept, [ListenSoc]),
send_resp(Socket).
send_resp(Socket) ->
gen_tcp:send(Socket, "Response from simple server...\n"),
ok = gen_tcp:close(Socket).
This thing:
{{badmatch,{error,closed}},
[{simple_server,wait_accept,1}]}
should be read as: "We are in simple_server:wait_accept/1" and we got a badmatch error (see http://www.erlang.org/doc/reference_manual/errors.html#id81191). This means that our match expression
{ok, Socket} = gen_tcp:accept(ListenSock),
returned {error, closed} (as it is the only match expression in that function it is simple). Why it returned that is a bit murky to me. The best bet is that the process calling the start/0 function has terminated and then the listen socket has been closed because of that termination (this happens automatically). Note that an error in the erlang shell will restart it and as such close the listen socket.
When a socket is created it's linked to the current process, which known as socket controlling process. When controlling process is terminated all linked sockets are just closed. In your case ListenSoc is closed when send_resp() is over (so controlling process of the socket is terminated) and gen_tcp:accept(ListenSoc) in newly created process returns {error, closed}.
Easiest fix is to add call to gen_tcp:contorolling_process/2 in wait_accept like this:
...
Pid = spawn(?MODULE, wait_accept, [ListenSoc]),
ok = gen_tcp:controlling_process(ListenSoc, Pid),
...
But in real projects it's better to loop in listen socket controlling process. You can get an idea from Nicolas Buduroi answer.