I'm testing the code in Getting Started with Erlang User's Guide Concurrent Programming Section.
In tut17.erl, I started a node with erl -sname ping, and another node with erl -sname pong as the guide described.
-module(tut17).
-export([start_ping/1, start_pong/0, ping/2, pong/0]).

ping(0, Pong_Node) ->
    {pong, Pong_Node} ! finished,
    io:format("ping finished~n", []);
ping(N, Pong_Node) ->
    {pong, Pong_Node} ! {ping, self()},
    receive
        pong ->
            io:format("Ping received pong~n", [])
    end,
    ping(N - 1, Pong_Node).

pong() ->
    receive
        finished ->
            io:format("Pong finished~n", []);
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong()
    end.

start_pong() ->
    register(pong, spawn(tut17, pong, [])).

start_ping(Pong_Node) ->
    spawn(tut17, ping, [3, Pong_Node]).
From the ping and pong nodes, I could invoke start_ping and start_pong to check that everything works fine.
(ping@smcho)1> tut17:start_ping(pong@smcho).
<0.40.0>
Ping received pong
Ping received pong
Ping received pong
ping finished
(pong@smcho)2> tut17:start_pong().
true
Pong received ping
Pong received ping
Pong received ping
Pong finished
I'm trying to run the same code from the command line. For a simple hello world example:
-module(helloworld).
-export([start/0]).

start() ->
    io:fwrite("Hello, world!\n").
I use the following command line:
erlc helloworld.erl
erl -noshell -s helloworld start -s init stop
So I just tried the following, but ended up with a crash.
From the ping node: erl -noshell -sname ping -s tut17 start_ping pong@smcho -s init stop
From the pong node: erl -noshell -sname pong -s tut17 start_pong -s init stop
However, I got this error report on the ping node, while the pong node exits without printing anything.
=ERROR REPORT==== 6-Mar-2015::20:29:24 ===
Error in process <0.35.0> on node 'ping@smcho' with exit value:
{badarg,[{tut17,ping,2,[{file,"tut17.erl"},{line,9}]}]}
Compared to the REPL approach, with the command line each process does not wait for its counterpart to respond, but stops after some time.
What might be wrong?
Arguments from the command line are received as a list of atoms when using the -s switch, and a list of strings when using the -run switch. With this in mind, let's think through what happens...
This command is issued from the shell:
erl -noshell -sname ping \
-s tut17 start_ping pong@smcho \
-s init stop
So start_ping/1 is being called with the argument [pong@smcho]. It is then calling ping/2 as ping(3, [pong@smcho]), which in its first line tries to do {pong, [pong@smcho]} ! {ping, self()}. Because a list is not a valid target for a message... your world explodes.
To run this both from the Erlang shell and the system shell comfortably you could add a clause to start_ping/1:
start_ping([Pong_Node]) ->
    spawn(tut17, ping, [3, Pong_Node]);
start_ping(Pong_Node) ->
    start_ping([Pong_Node]).
There are some changes needed.
start_ping
From zxq9's hint, I modified the start_ping function.
start_ping([Pong_Node]) ->
    io:format("Ping started~n", []),
    spawn(tut17, ping, [3, Pong_Node]);
start_ping(Pong_Node) ->
    start_ping([Pong_Node]).
Invoke init:stop() when the process is finished.
I'm not sure this is absolutely necessary, but it seems to be working with this modification.
ping(0, Pong_Node) ->
    {pong, Pong_Node} ! finished,
    io:format("ping finished~n", []),
    init:stop();
Then I could remove -s init stop from the shell command: erl -noshell -sname ping -s tut17 start_ping pong@smcho.
Execution order
The pong node should be started before the ping node.
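Relying on start order alone is fragile. A hedged sketch of a helper (my own, not from the tutorial) that makes the ping node wait until the pong node answers net_adm:ping/1 before starting:

```erlang
-module(wait_pong).
-export([wait_for_node/2]).

%% Poll the remote node until it answers, or give up after Retries attempts.
wait_for_node(_Node, 0) ->
    error(pong_node_unreachable);
wait_for_node(Node, Retries) ->
    case net_adm:ping(Node) of
        pong -> ok;                           % node is up and connected
        pang ->
            timer:sleep(500),                 % wait a bit, then retry
            wait_for_node(Node, Retries - 1)
    end.
```

start_ping/1 could call wait_for_node(Pong_Node, 10) before spawning ping/2, so the two nodes can be started in either order.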
The updated code
This is the revised code:
-module(tut17).
-export([start_ping/1, start_pong/0, ping/2, pong/0]).

ping(0, Pong_Node) ->
    {pong, Pong_Node} ! finished,
    io:format("ping finished~n", []),
    init:stop();
ping(N, Pong_Node) ->
    {pong, Pong_Node} ! {ping, self()},
    receive
        pong ->
            io:format("Ping received pong~n", [])
    end,
    ping(N - 1, Pong_Node).

pong() ->
    receive
        finished ->
            io:format("Pong finished~n", []),
            init:stop();
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong()
    end.

start_pong() ->
    io:format("Pong started~n", []),
    register(pong, spawn(tut17, pong, [])).

start_ping([Pong_Node]) ->
    io:format("Ping started~n", []),
    spawn(tut17, ping, [3, Pong_Node]);
start_ping(Pong_Node) ->
    start_ping([Pong_Node]).
The shell commands:
ping: erl -noshell -sname ping -s tut17 start_ping pong@smcho
pong: erl -noshell -sname pong -s tut17 start_pong
Related
Working through Joe's book, I got stuck on Chapter 12, exercise 1. The exercise asks you to write a function start(AnAtom, Fun) that registers AnAtom as the process spawned with spawn(Fun). I decided to try something seemingly easier: I took the chapter's finished 'area_server' module and modified its start/0 function like this:
start() ->
    Pid = spawn(ex1, loop, []),
    io:format("Spawned ~p~n", [Pid]),
    register(area, Pid).
So in place of a process executing an arbitrary Fun, I am registering 'loop', a function in the area_server module that does all the work:
loop() ->
    receive
        {From, {rectangle, Width, Ht}} ->
            io:format("Computing for rectangle...~n"),
            From ! {self(), Width*Ht},
            loop();
        {From, {square, Side}} ->
            io:format("Computing for square...~n"),
            From ! {self(), Side*Side},
            loop();
        {From, Other} ->
            io:format("lolwut?~n"),
            From ! {self(), {error, Other}},
            loop()
    end.
It seems to be working just fine:
1> c("ex1.erl").
{ok,ex1}
2> ex1:start().
Spawned <0.68.0>
true
3>
3> area ! {self(), hi}.
lolwut?
{<0.61.0>,hi}
4> flush().
Shell got {<0.68.0>,{error,hi}}
ok
5> area ! {self(), {square, 7}}.
Computing for square...
{<0.61.0>,{square,7}}
6> flush().
Shell got {<0.68.0>,49}
ok
Things went bad when I tried to test that multiple processes can talk to the registered "server" (CTRL-G, s, c 2).
I'm in a new shell, running alongside the first - but the moment I send a message from this new shell to my registered 'area' process, something nasty happens. When querying process_info(whereis(area)), the process moves from this state:
{current_function,{ex1,loop,0}},
{initial_call,{ex1,loop,0}},
to this one:
{current_function,{io,execute_request,2}},
{initial_call,{ex1,loop,0}},
while the message queue starts to grow and messages stop getting processed. Hanging in module io, huh! Is something blocked on the io operations? Apparently the process has moved from my ex1:loop/0 into io:execute_request/2 (whatever that is)... are my silly prints causing the problem?
Your processes are doing what you expect, with the exception of handling who has control over STDOUT at what moment. And yes, this can cause weird-seeming behaviors in the shell.
So let's try something like this without any IO commands that are implied to go to STDOUT and see what happens. Below is a shell session where I define a loop that accumulates messages until I ask it to send me the messages it has accumulated. We can see from this example (which does not get hung up on who is allowed to talk to the single output resource) that the processes behave as expected.
One thing to take note of is that you do not need multiple shells to talk to or from multiple processes.
Note the return value of flush/0 in the shell -- it is a special shell command that dumps the shell's mailbox to STDOUT.
Eshell V9.0 (abort with ^G)
1> Loop =
1>   fun L(History) ->
1>     receive
1>       halt ->
1>         exit(normal);
1>       {Sender, history} ->
1>         Sender ! History,
1>         L([]);
1>       Message ->
1>         NewHistory = [Message | History],
1>         L(NewHistory)
1>     end
1>   end.
#Fun<erl_eval.30.87737649>
2> {Pid1, Ref1} = spawn_monitor(fun() -> Loop([]) end).
{<0.64.0>,#Ref<0.1663562856.2369257474.102541>}
3> {Pid2, Ref2} = spawn_monitor(fun() -> Loop([]) end).
{<0.66.0>,#Ref<0.1663562856.2369257474.102546>}
4> Pid1 ! "blah".
"blah"
5> Pid1 ! "blee".
"blee"
6> Pid1 ! {self(), history}.
{<0.61.0>,history}
7> flush().
Shell got ["blee","blah"]
ok
8> Pid1 ! "Message from shell 1".
"Message from shell 1"
9> Pid2 ! "Message from shell 1".
"Message from shell 1"
10>
User switch command
--> s
--> j
1 {shell,start,[init]}
2* {shell,start,[]}
--> c 2
Eshell V9.0 (abort with ^G)
1> Shell1_Pid1 = pid(0,64,0).
<0.64.0>
2> Shell1_Pid2 = pid(0,66,0).
<0.66.0>
3> Shell1_Pid1 ! "Message from shell 2".
"Message from shell 2"
4> Shell1_Pid2 ! "Another message from shell 2".
"Another message from shell 2"
5> Shell1_Pid1 ! {self(), history}.
{<0.77.0>,history}
6> flush().
Shell got ["Message from shell 2","Message from shell 1"]
ok
7>
User switch command
--> c 1
11> Pid2 ! {self(), history}.
{<0.61.0>,history}
12> flush().
Shell got ["Another message from shell 2","Message from shell 1"]
ok
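As a side note, reconstructing pids with pid/3 from another shell is brittle (the numbers change every run). A hedged sketch of the same accumulator loop, but reachable by a registered global name instead (the module named_loop and the name argument are my own invention):

```erlang
-module(named_loop).
-export([start/1]).

%% Spawn the history-accumulating loop and register it globally under Name,
%% so any shell or connected node can reach it without knowing its pid.
start(Name) ->
    Pid = spawn(fun() -> loop([]) end),
    yes = global:register_name(Name, Pid),
    Pid.

loop(History) ->
    receive
        halt ->
            exit(normal);
        {Sender, history} ->
            Sender ! History,
            loop([]);
        Message ->
            loop([Message | History])
    end.
```

From any shell, global:whereis_name(Name) ! "blah" then works with no pid arithmetic.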
I followed the book "Programming Erlang" by Joe Armstrong to try to set up communication between two Mac computers with Erlang (Chapter 14):
% file: kvs.erl
-module(kvs).
-export([start/0, store/2, lookup/1]).

start() -> register(kvs, spawn(fun() -> loop() end)).

store(Key, Value) -> rpc({store, Key, Value}).

lookup(Key) -> rpc({lookup, Key}).

rpc(Q) ->
    kvs ! {self(), Q},
    receive
        {kvs, Reply} ->
            Reply
    end.

loop() ->
    receive
        {From, {store, Key, Value}} ->
            put(Key, {ok, Value}),
            From ! {kvs, true},
            loop();
        {From, {lookup, Key}} ->
            From ! {kvs, get(Key)},
            loop()
    end.
Set up Mac 1 (Mac Pro) and run an Erlang server:
$ sudo hostname this.is.macpro.com
$ hostname
this.is.macpro.com
$ ipconfig getifaddr en2
aaa.bbb.ccc.209
$ erl -name server -setcookie abcxyz
(server@this.is.macpro.com)> c("kvs.erl").
{ok,kvs}
(server@this.is.macpro.com)> kvs:start().
true
(server@this.is.macpro.com)> kvs:store(hello, world).
true
(server@this.is.macpro.com)> kvs:lookup(hello).
{ok,world}
I tried using both the IP and the hostname to make an RPC from another Mac, but I get {badrpc, nodedown}.
Set up Mac 2 (MacBook Pro) and try to call Mac 1:
$ sudo hostname this.is.macbookpro.com
$ hostname
this.is.macbookpro.com
$ ipconfig getifaddr en2
aaa.bbb.ccc.211 # different IP
$ erl -name client -setcookie abcxyz
% try using the hostname of Mac 1 but failed
(client@this.is.macbookpro.com)> rpc:call('server@this.is.macpro.com', kvs, lookup, [hello]).
{badrpc, nodedown}
% try using the IP address of Mac 1 but failed
(client@this.is.macbookpro.com)> rpc:call('server@aaa.bbb.ccc.209', kvs, lookup, [hello]).
{badrpc, nodedown}
How do I set up my Mac computers and make them available for RPC with Erlang?
When using -name, you should provide the full name. The syntax you are using is for -sname. Try this:
erl -name server@this.is.macpro.com -setcookie "abcxyz"
erl -name client@this.is.macbookpro.com -setcookie "abcxyz"
You can also specify an IP address after the @ in both cases.
Then from one node, connect to the other node:
net_kernel:connect_node('client@this.is.macbookpro.com').
This should return true. If it returns false, then you are not connected. You can verify with nodes().
(joe@teves-MacBook-Pro.local)3> net_kernel:connect_node('steve@Steves-MacBook-Pro.local').
true
(joe@teves-MacBook-Pro.local)4> nodes().
['steve@Steves-MacBook-Pro.local']
If this does not fix it, then you can check epmd on both systems to ensure they are registered.
epmd -names
I have two Erlang nodes: node01 is 'vm01@192.168.146.128', node02 is 'vm02@192.168.146.128'. I want to start a process on node01 by using spawn(Node, Mod, Fun, Args) from node02, but I always get an unusable pid.
Node connection is ok:
(vm02@192.168.146.128)14> net_adm:ping('vm01@192.168.146.128').
pong
Module is in the path of node01 and node02:
(vm01@192.168.146.128)7> m(remote_process).
Module: remote_process
MD5: 99784aa56b4feb2f5feed49314940e50
Compiled: No compile time info available
Object file: /src/remote_process.beam
Compiler options: []
Exports:
init/1
module_info/0
module_info/1
start/0
ok
(vm02@192.168.146.128)20> m(remote_process).
Module: remote_process
MD5: 99784aa56b4feb2f5feed49314940e50
Compiled: No compile time info available
Object file: /src/remote_process.beam
Compiler options: []
Exports:
init/1
module_info/0
module_info/1
start/0
ok
However, the spawn is not successful:
(vm02@192.168.146.128)21> spawn('vm01@192.168.146.128', remote_process, start, []).
I'm on node 'vm01@192.168.146.128'
<9981.89.0>
My pid is <9981.90.0>
(vm01@192.168.146.128)8> whereis(remote_process).
undefined
The process is able to run on local node:
(vm02@192.168.146.128)18> remote_process:start().
I'm on node 'vm02@192.168.146.128'
My pid is <0.108.0>
{ok,<0.108.0>}
(vm02@192.168.146.128)24> whereis(remote_process).
<0.115.0>
But it fails on remote node. Can anyone give me some idea?
Here is the source code remote_process.erl:
-module(remote_process).
-behaviour(supervisor).
-export([start/0, init/1]).
start() ->
    {ok, Pid} = supervisor:start_link({global, ?MODULE}, ?MODULE, []),
    {ok, Pid}.

init([]) ->
    io:format("I'm on node ~p~n", [node()]),
    io:format("My pid is ~p~n", [self()]),
    {ok, {{one_for_one, 1, 5}, []}}.
You are using a global registration for your process, which is necessary for your purpose. The function to retrieve it is global:whereis_name(remote_process).
Edit: It works if
the 2 nodes are connected (check with nodes())
the process is registered with the global module
the process is still alive
If any of these conditions is not satisfied, you will get undefined.
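The "still alive" condition can be observed on a single node. A minimal sketch (the name demo is illustrative): register a process globally, check whereis_name while it is alive, then watch the name disappear once the process dies. Note that global removes the name asynchronously, hence the short sleep.

```erlang
%% Globally register a throwaway process, then watch the name vanish
%% when the process exits.
Pid = spawn(fun() -> receive stop -> ok end end),
yes = global:register_name(demo, Pid),
Pid = global:whereis_name(demo),            % alive and registered
Ref = monitor(process, Pid),
Pid ! stop,
receive {'DOWN', Ref, process, Pid, normal} -> ok end,
timer:sleep(200),                           % let global clean up the name
undefined = global:whereis_name(demo).
```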
Edit 2: Start node 1 with werl -sname p1 and type in the shell:
(p1@W7FRR00423L)1> c(remote_process).
{ok,remote_process}
(p1@W7FRR00423L)2> remote_process:start().
I'm on node p1@W7FRR00423L
My pid is <0.69.0>
{ok,<0.69.0>}
(p1@W7FRR00423L)3> global:whereis_name(remote_process).
<0.69.0>
(p1@W7FRR00423L)4>
Then start a second node with werl -sname p2 and type in the shell (it is OK to connect the second node later; the global registration is "updated" when necessary):
(p2@W7FRR00423L)1> net_kernel:connect_node(p1@W7FRR00423L).
true
(p2@W7FRR00423L)2> nodes().
[p1@W7FRR00423L]
(p2@W7FRR00423L)3> global:whereis_name(remote_process).
<7080.69.0>
(p2@W7FRR00423L)4>
Edit 3:
In your test you are spawning a process P1 on the remote node which executes the function remote_process:start/0.
This function calls supervisor:start_link/3, which basically spawns a new supervisor process P2 and links itself to it. After this, P1 has nothing left to do, so it dies, causing the linked process P2 to die too, and you get an undefined reply to the global:whereis_name call.
In my test, I start the process from the shell of the remote node; the shell does not die after I evaluate remote_process:start/0, so the supervisor process does not die and global:whereis_name finds the requested pid.
If you want the supervisor to survive the call, you need an intermediate process that is spawned without a link, so it will not die with its parent. Here is a small example based on your code:
-module(remote_process).
-behaviour(supervisor).
-export([start/0, init/1, local_spawn/0, remote_start/1]).

remote_start(Node) ->
    spawn(Node, ?MODULE, local_spawn, []).

local_spawn() ->
    % spawn without a link so start_wait_stop will survive
    % the death of the local_spawn process
    spawn(fun start_wait_stop/0).

start_wait_stop() ->
    start(),
    receive
        stop -> ok
    end.

start() ->
    io:format("start (~p)~n", [self()]),
    {ok, Pid} = supervisor:start_link({global, ?MODULE}, ?MODULE, []),
    {ok, Pid}.

init([]) ->
    io:format("I'm on node ~p~n", [node()]),
    io:format("My pid is ~p~n", [self()]),
    {ok, {{one_for_one, 1, 5}, []}}.
In the shell of node 1 you get:
(p1@W7FRR00423L)1> net_kernel:connect_node(p2@W7FRR00423L).
true
(p1@W7FRR00423L)2> c(remote_process).
{ok,remote_process}
(p1@W7FRR00423L)3> global:whereis_name(remote_process).
undefined
(p1@W7FRR00423L)4> remote_process:remote_start(p2@W7FRR00423L).
<7080.68.0>
start (<7080.69.0>)
I'm on node p2@W7FRR00423L
My pid is <7080.70.0>
(p1@W7FRR00423L)5> global:whereis_name(remote_process).
<7080.70.0>
(p1@W7FRR00423L)6> global:whereis_name(remote_process).
undefined
and in node 2
(p2@W7FRR00423L)1> global:registered_names(). % before step 4
[]
(p2@W7FRR00423L)2> global:registered_names(). % after step 4
[remote_process]
(p2@W7FRR00423L)3> rp(processes()).
[<0.0.0>,<0.1.0>,<0.4.0>,<0.30.0>,<0.31.0>,<0.33.0>,
<0.34.0>,<0.35.0>,<0.36.0>,<0.37.0>,<0.38.0>,<0.39.0>,
<0.40.0>,<0.41.0>,<0.42.0>,<0.43.0>,<0.44.0>,<0.45.0>,
<0.46.0>,<0.47.0>,<0.48.0>,<0.49.0>,<0.50.0>,<0.51.0>,
<0.52.0>,<0.53.0>,<0.54.0>,<0.55.0>,<0.56.0>,<0.57.0>,
<0.58.0>,<0.62.0>,<0.64.0>,<0.69.0>,<0.70.0>]
ok
(p2@W7FRR00423L)4> pid(0,69,0) ! stop. % between steps 5 and 6
stop
(p2@W7FRR00423L)5> global:registered_names().
[]
I wrote a supervisor (shown below).
It only has one child process, which I get from locations:start_link/0. I expect it to start up a supervisor and register itself globally. That way, I can get to it by using global:whereis_name/1.
When I start the supervisor through the shell it works as expected:
$ erl
1> locator_sup:start_link().
registering global supervisor
starting it....
supervisor <0.34.0>
{ok,<0.34.0>}
Then I can get to it by its global name, locator_sup:
2> global:whereis_name( locator_sup ).
<0.34.0>
But I want to start the system using a startup script, so I tried starting the system like so:
$ erl -s locator_sup start_link
registering global supervisor
starting it....
supervisor <0.32.0>
It seems that the init function for the supervisor is being called, but when I try to find the supervisor by its global name, I get undefined:
1> global:whereis_name( locator_sup ).
undefined
So my question is, why does the supervisor process only get registered when I use start_link from the shell?
The supervisor module:
-module(locator_sup).
-behaviour(supervisor).
%% API
-export([start_link/0]).
%% Supervisor callbacks
-export([init/1]).
%% ===================================================================
%% API functions
%% ===================================================================
start_link() ->
    io:format("registering global supervisor\n"),
    {ok, E} = supervisor:start_link({global, ?MODULE}, ?MODULE, []),
    io:format("supervisor ~p\n", [E]),
    {ok, E}.

%% ===================================================================
%% Supervisor callbacks
%% ===================================================================

% only going to start the gen_server that keeps track of locations
init(_) ->
    io:format("starting it....\n"),
    {ok, {{one_for_one, 1, 60},
          [{locations, {locations, start_link, []},
            permanent, brutal_kill, worker, [locations]}]}}.
One reason may be that you are starting your node not in distributed mode.
First of all, add these params to see what happens during startup: erl -boot start_sasl.
Second, add a node name (it will automatically enable distributed mode): ... -sname my_node
So the startup command will look like:
erl -boot start_sasl -sname my_node -s locator_sup start_link
I have an Erlang service that is triggered by erl_call. erl_call makes a long call like gen_server:call(?SERVER, do, infinity) to wait for the result. If the Erlang service goes down, erl_call returns. But if erl_call is interrupted (with CTRL-C), the Erlang service does not receive any message.
I checked with appmon and pman. The process that erl_call started does not die after erl_call disconnects, so linking/monitoring that process does not work. How do I detect that erl_call has already disconnected?
In your handle_call function there is a second argument, From :: {pid(), Tag}.
You can call monitor(process, FromPid) in handle_call before processing the request, so you'll receive a DOWN message when your erl_call node disconnects. But note that you won't be able to process the DOWN message before the current handle_call completes, unless you spawn a separate process or use a delayed reply with gen_server:reply().
For example, here's our handle_call clause:
handle_call(test, {From, _} = X, State) ->
    erlang:monitor(process, From),
    spawn(fun() ->
              timer:sleep(10000),
              io:format("### DONE~n", []),
              gen_server:reply(X, ok)
          end),
    {noreply, State}.
Next we catch DOWN:
handle_info(Info, State) ->
    io:format("### Process died ~p~n", [Info]),
    {noreply, State}.
Next I spawn erl_call from the command line:
erl_call -c 123 -n test -a 'gen_server call [a, test, infinity]'
and hit Ctrl-C
In the gen_server console, after 10 seconds, I see:
### DONE
### Process died {'DOWN',#Ref<0.0.0.41>,process,<0.44.0>,normal}