In Programming Erlang by Joe Armstrong, Chapter 12, "Making a Set of Processes That All Die Together", the following code is given:
% (Some variables are renamed and comments added for extra clarity)
start(WorkerFuns) ->
    spawn(fun() ->
        % Parent process
        [spawn_link(WorkerFun) || WorkerFun <- WorkerFuns],
        receive
        after infinity -> true
        end
    end).
The resulting processes are linked as such:
       +- parent -+
      /     |      \
     /      |       \
worker1  worker2 .. workerN
If a worker crashes, then the parent crashes, and then the remaining workers crash as well. However, if all of the workers exit normally, then the parent process lives forever, albeit in a suspended state.
While Erlang processes are supposed to be cheap, if start/1 is called many times in a long-running service, one process—the parent—appears to be "leaked" every time all workers exit normally.
Is this ever a problem in practice? And is the extra code that properly accounts for all workers exiting normally (see below) worth it?
start(WorkerFuns) ->
    spawn(fun() ->
        % Parent process
        process_flag(trap_exit, true),
        [spawn_link(WorkerFun) || WorkerFun <- WorkerFuns],
        parent_loop(length(WorkerFuns))
    end).

parent_loop(0) ->
    % All workers exited normally
    true;
parent_loop(RemainingWorkers) ->
    receive
        {'EXIT', _WorkerPid, normal} ->
            parent_loop(RemainingWorkers - 1);
        {'EXIT', _WorkerPid, CrashReason} ->
            exit(CrashReason)
    end.
Your analysis is correct. The code as given does not account for normal termination of the workers and will leave a dangling process. The space leak will be about 2 KB per invocation, so in a large system you're not likely to notice it unless you call start/1 a thousand times or more, but for a system expected to run "forever" you should definitely add the extra code.
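If you want to see the dangling parent for yourself, a minimal sketch along these lines may help. The module name leak_demo and both function names are made up for illustration; the body of leaky_start/1 is the first version from the question.

```erlang
-module(leak_demo).
-export([leaky_start/1, parent_alive_after_workers/0]).

%% The first version from the question: the parent waits forever.
leaky_start(WorkerFuns) ->
    spawn(fun() ->
        [spawn_link(WorkerFun) || WorkerFun <- WorkerFuns],
        receive
        after infinity -> true
        end
    end).

%% Start two workers that exit normally right away, give them time
%% to finish, then check whether the parent is still around.
parent_alive_after_workers() ->
    Parent = leaky_start([fun() -> ok end, fun() -> ok end]),
    timer:sleep(100),
    Alive = is_process_alive(Parent),
    exit(Parent, kill),   % clean up the leaked parent (demo only)
    Alive.
```

Calling leak_demo:parent_alive_after_workers() should return true, which is the leak: the workers are long gone but the parent is still suspended in its receive.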
I am writing a program that solves the producer-consumer problem using Erlang processes: one process is responsible for handling the buffer that I produce to and consume from, and there are many producer and many consumer processes. To simplify, I assume a producer/consumer does not know that its operation has failed (that it was impossible to produce or consume because of buffer constraints), but the server is prepared to handle this.
My code is:
Server code
server(Buffer, Capacity, CountPid) ->
    receive
        %% PRODUCER
        {Pid, produce, InputList} ->
            NumberProduce = lists:flatlength(InputList),
            case canProduce(Buffer, NumberProduce, Capacity) of
                true ->
                    NewBuffer = append(InputList, Buffer),
                    CountPid ! lists:flatlength(InputList),
                    Pid ! ok,
                    server(NewBuffer, Capacity, CountPid);
                false ->
                    Pid ! tryagain,
                    server(Buffer, Capacity, CountPid)
            end;
        %% CONSUMER
        {Pid, consume, Number} ->
            case canConsume(Buffer, Number) of
                true ->
                    Data = lists:sublist(Buffer, Number),
                    NewBuffer = lists:subtract(Buffer, Data),
                    Pid ! {ok, Data},
                    server(NewBuffer, Capacity, CountPid);
                false ->
                    Pid ! tryagain,
                    server(Buffer, Capacity, CountPid)
            end
    end.
Producer and consumer
producer(ServerPid) ->
    X = rand:uniform(9),
    ToProduce = [rand:uniform(500) || _ <- lists:seq(1, X)],
    ServerPid ! {self(), produce, ToProduce},
    producer(ServerPid).

consumer(ServerPid) ->
    X = rand:uniform(9),
    ServerPid ! {self(), consume, X},
    consumer(ServerPid).
Starting and auxiliary functions (I enclose as I don't know where exactly my problem is)
spawnProducers(Number, ServerPid) ->
    case Number of
        0 -> io:format("Spawned producers");
        N ->
            spawn(zad2, producer, [ServerPid]),
            spawnProducers(N - 1, ServerPid)
    end.

spawnConsumers(Number, ServerPid) ->
    case Number of
        0 -> io:format("Spawned producers");
        N ->
            spawn(zad2, consumer, [ServerPid]),
            spawnProducers(N - 1, ServerPid)
    end.
start(ProdsNumber, ConsNumber) ->
    CountPid = spawn(zad2, count, [0, 0]),
    ServerPid = spawn(zad2, server, [[], 20, CountPid]),
    spawnProducers(ProdsNumber, ServerPid),
    spawnConsumers(ConsNumber, ServerPid).

canProduce(Buffer, Number, Capacity) ->
    lists:flatlength(Buffer) + Number =< Capacity.

canConsume(Buffer, Number) ->
    lists:flatlength(Buffer) >= Number.

append([H|T], Tail) ->
    [H|append(T, Tail)];
append([], Tail) ->
    Tail.
I am trying to count the number of produced elements with the following process; the server sends it a message whenever elements are produced.
count(N, ThousandsCounter) ->
    receive
        X ->
            if
                N >= 1000 ->
                    io:format("Yeah! We have produced ~p elements!~n", [ThousandsCounter]),
                    count(0, ThousandsCounter + 1000);
                true -> count(N + X, ThousandsCounter)
            end
    end.
I expect this program to work properly, which means: it produces elements, the number of produced elements grows linearly with time (f(t) = kt for some constant k), and the more processes I have, the faster production is.
ACTUAL QUESTION
I launch program:
erl
c(zad2)
zad2:start(5,5)
How the program behaves:
The longer production lasts, the fewer elements are produced per unit of time (e.g. 10000 in the first second, 5000 in the next, 1000 in the 10th second, etc.).
The more processes I have, the slower production is: with start(10,10) I need to wait about a second for the first thousand, whereas with start(2,2) 20000 appears almost immediately.
start(100,100) made me restart my computer (I work on Ubuntu), as the whole CPU was used and there was no memory available for me to open a terminal and terminate the Erlang VM.
Why does my program not behave like I expect? Am I doing something wrong with Erlang programming or is this the matter of OS or anything else?
The producer/1 and consumer/1 functions as written above don't ever wait for anything - they just loop and loop, bombarding the server with messages. The server's message queue is filling up very quickly, and the Erlang VM will try to grow as much as it can, stealing all your memory, and the looping processes will steal all available CPU time on all cores.
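One way to stop the flooding is to make each producer wait for the server's reply before it sends the next request. The following is only a sketch under that assumption: the module name throttle_sketch, the helper produce_once/1, the 5 second timeout and the 10 ms back-off are all invented for illustration. The message shapes ({Pid, produce, List}, ok, tryagain) are the ones the server in the question already uses.

```erlang
-module(throttle_sketch).
-export([producer/1, produce_once/1]).

%% Send one batch, then block until the server answers (or 5 s pass).
%% Because the producer waits here, at most one request per producer
%% can sit in the server's mailbox at any time.
produce_once(ServerPid) ->
    Count = rand:uniform(9),
    ToProduce = [rand:uniform(500) || _ <- lists:seq(1, Count)],
    ServerPid ! {self(), produce, ToProduce},
    receive
        ok -> ok;
        tryagain -> tryagain
    after 5000 ->
        timeout
    end.

%% Loop forever, backing off briefly when the buffer was full
%% or the server did not answer in time.
producer(ServerPid) ->
    case produce_once(ServerPid) of
        ok ->
            producer(ServerPid);
        _ ->
            timer:sleep(10),
            producer(ServerPid)
    end.
```

The consumer can be throttled the same way by waiting for {ok, Data} or tryagain. With this change the server's queue length is bounded by the number of client processes instead of growing without limit.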
I wanted to send a message to a process after a delay, and discovered erlang:send_after/4.
When looking at the docs it looked like this is exactly what I wanted:
erlang:send_after(Time, Dest, Msg, Options) -> TimerRef
Starts a timer. When the timer expires, the message Msg is sent to the
process identified by Dest.
However, it doesn't seem to work when the destination is running on another node - it tells me one of the arguments is bad.
1> P = spawn('node@host', module, function, [Arg]).
<10585.83.0>
2> erlang:send_after(1000, P, {123}).
** exception error: bad argument
in function erlang:send_after/3
called as erlang:send_after(1000,<10585.83.0>,{123})
Doing the same thing with timer:send_after/3 appears to work fine:
1> P = spawn('node@host', module, function, [Arg]).
<10101.10.0>
2> timer:send_after(1000, P, {123}).
{ok,{-576458842589535,#Ref<0.1843049418.1937244161.31646>}}
And, the docs for timer:send_after/3 state almost the same thing as the erlang version:
send_after(Time, Pid, Message) -> {ok, TRef} | {error, Reason}
Evaluates Pid ! Message after Time milliseconds.
So the question is, why do these two functions, which on the face of it do the same thing, behave differently? Is erlang:send_after broken, or mis-advertised? Or maybe timer:send_after isn't doing what I think it is?
TL;DR
Your assumption is correct: these are intended to do the same thing, but are implemented differently.
Discussion
Things in the timer module such as timer:send_after/2,3 work through the gen_server that defines that as a service. Like any other service, this one can get overloaded if you assign a really huge number of tasks (timers to track) to it.
erlang:send_after/3,4, on the other hand, is a BIF implemented directly within the runtime and therefore has access to system primitives like the hardware timer. If you have a ton of timers this is definitely the way to go. In most programs you won't notice the difference, though.
There is actually a note about this in the Erlang Efficiency Guide:
3.1 Timer Module
Creating timers using erlang:send_after/3 and erlang:start_timer/3, is much more efficient than using the timers provided by the timer module in STDLIB. The timer module uses a separate process to manage the timers. That process can easily become overloaded if many processes create and cancel timers frequently (especially when using the SMP emulator).
The functions in the timer module that do not manage timers (such as timer:tc/3 or timer:sleep/1), do not call the timer-server process and are therefore harmless.
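For reference, here is a small sketch of the BIF timers from the quoted note used on the local node, including cancellation. The module name bif_timer_sketch, the function demo/0 and the concrete delays are made up for illustration; erlang:send_after/3 and erlang:cancel_timer/1 are the real BIFs.

```erlang
-module(bif_timer_sketch).
-export([demo/0]).

demo() ->
    %% This timer fires: `ping` is delivered to this process after ~50 ms.
    _Ref1 = erlang:send_after(50, self(), ping),
    %% This timer is cancelled long before it would fire.
    Ref2 = erlang:send_after(60000, self(), never),
    %% cancel_timer/1 returns the remaining time in milliseconds,
    %% or false if the timer had already expired or been cancelled.
    Remaining = erlang:cancel_timer(Ref2),
    Got = receive
              ping -> ping
          after 1000 ->
              timeout
          end,
    {Got, is_integer(Remaining)}.
```

Both timers target self(), which is a local process, so the same-node restriction discussed in this thread does not come into play here.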
A workaround
A workaround to gain the efficiency of the BIF without the same-node restriction is to have a process of your own that does nothing but wait for a message to forward to another node:
-module(foo_forward).
-export([start/0, send_after/3, cancel/1]).

% Obviously this is an example only. You would want to write this to
% be compliant with proc_lib, write a proper init/N and integrate with
% OTP. Note that this snippet is missing the OTP service functions
% (system_continue/3 and friends) that sys expects.

start() ->
    Parent = self(),
    Pid = spawn(fun() -> loop(Parent, [], none) end),
    % Registered locally so the timer can address the forwarder by name.
    true = register(?MODULE, Pid),
    Pid.

% The timer targets the local forwarder, which relays to Dest.
send_after(Time, Dest, Message) ->
    erlang:send_after(Time, ?MODULE, {forward, Dest, Message}).

cancel(TimerRef) ->
    erlang:cancel_timer(TimerRef).

loop(Parent, Debug, State) ->
    receive
        {forward, Dest, Message} ->
            Dest ! Message,
            loop(Parent, Debug, State);
        {system, From, Request} ->
            sys:handle_system_msg(Request, From, Parent, ?MODULE, Debug, State);
        Unexpected ->
            ok = error_logger:warning_msg("Received message: ~tp~n", [Unexpected]),
            loop(Parent, Debug, State)
    end.
The above example is a bit shallow, but hopefully it expresses the point. It should be possible to get the efficiency of the BIF erlang:send_after/3,4 but still manage to send messages across nodes, as well as keep the freedom to cancel a message using erlang:cancel_timer/1.
But why?
The puzzle (and bug) is why erlang:send_after/3,4 does not want to work across nodes. The example you provided above looks a bit odd as the first assignment to P was the Pid <10101.10.0>, but the crashed call was reported as <10585.83.0> -- clearly not the same.
For the moment I do not know why erlang:send_after/3,4 doesn't work, but I can say with confidence that the mechanism of operation between the two is not the same. I'll look into it, but I imagine that the BIF version is actually doing some funny business within the runtime to gain efficiency and as a result signalling the target process by directly updating its mailbox instead of actually sending an Erlang message on the higher Erlang-to-Erlang level.
Maybe it is good that we have both, but this should definitely be clearly marked in the docs, and it evidently is not (I just checked).
There is some difference in timeout order if you have many timers. The example below shows that erlang:send_after does not guarantee order, but timer:send_after does.
1> A = lists:seq(1,10).
[1,2,3,4,5,6,7,8,9,10]
2> [erlang:send_after(100, self(), X) || X <- A].
...
3> flush().
Shell got 2
Shell got 3
Shell got 4
Shell got 5
Shell got 6
Shell got 7
Shell got 8
Shell got 9
Shell got 10
Shell got 1
ok
4> [timer:send_after(100, self(), X) || X <- A].
...
5> flush().
Shell got 1
Shell got 2
Shell got 3
Shell got 4
Shell got 5
Shell got 6
Shell got 7
Shell got 8
Shell got 9
Shell got 10
ok
Can you give some examples of a process being restarted by an Erlang supervisor? If a process dies, it will be restarted. But how does a process die?
Thanks.
Take what happens in the Erlang shell as an example. Consider this sequence:
1> self().
<0.32.0>
2> A = 1.
1
3> self().
<0.32.0>
4> A = 2.
** exception error: no match of right hand side value 2
5> self().
<0.37.0>
1> The first command asks the shell to print its own Pid: <0.32.0>.
2> The next command binds the variable A to 1; it works because A was unbound.
3> A new request to the shell shows that its Pid has not changed.
4> Trying to match A with the integer 2 fails and raises an exception. In the background, the shell process dies and a supervisor restarts it immediately.
5> This can be verified with a new request for the shell Pid, which is now <0.37.0>.
6> When the shell died it lost all of its state, and it is restarted from scratch. During initialization, however, it can connect to other processes that were in charge of keeping the session history and all the bound variables. This can be verified by asking for the value of A:
6> A.
1
7> or by asking for the history:
7> h().
1: self()
-> <0.32.0>
2: A = 1
-> 1
3: self()
-> <0.32.0>
4: A = 2
-> {'EXIT',{{badmatch,2},[{erl_eval,expr,3,[]}]}}
5: self()
-> <0.37.0>
6: A
Depending on the environment (hardware failure, loss of communication, bad parameters, a bug...) an Erlang process may die with an error reason. If it is managed in a supervision tree (or by your own monitoring) it can be restarted from scratch. It is the application's responsibility to give every process the means to recover the appropriate state.
An Erlang process may also die with the reason "normal", for example when a user closes a session (in the shell you type q().). In this case a supervisor will not restart it if the child is declared transient (a permanent child is always restarted, a temporary one never).
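To make the restart visible outside the shell, here is a minimal sketch of a supervisor with one transient worker that is made to crash on purpose. Everything named here (the module restart_sketch, the registered name sketch_worker, the crash message and the boom exit reason) is an assumption chosen for the example, not something from a real application.

```erlang
-module(restart_sketch).
-behaviour(supervisor).
-export([demo/0, start_worker/0, init/1]).

%% Worker: waits for a `crash` message and then exits abnormally.
start_worker() ->
    Pid = spawn_link(fun() -> receive crash -> exit(boom) end end),
    true = register(sketch_worker, Pid),
    {ok, Pid}.

%% Supervisor callback: one transient child, at most 3 restarts in 5 s.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 3, period => 5},
    Child = #{id => sketch_worker,
              start => {?MODULE, start_worker, []},
              restart => transient},
    {ok, {SupFlags, [Child]}}.

demo() ->
    {ok, Sup} = supervisor:start_link(?MODULE, []),
    Old = whereis(sketch_worker),
    Old ! crash,                      % abnormal exit, so it is restarted
    timer:sleep(200),                 % give the supervisor time to restart it
    New = whereis(sketch_worker),
    unlink(Sup),                      % stop the supervisor without
    exit(Sup, shutdown),              % taking the caller down with it
    {is_pid(New), New =/= Old}.
```

restart_sketch:demo() should return {true, true}: a worker is registered again after the crash, and it has a different Pid from the one that died. A crash report for the boom exit is logged along the way, which is expected.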
You will find a lot of valuable information on the web:
design principle
erlang.org supervisor
learn you some erlang : run time errors
learn you some erlang : errors and processes
learn you some erlang : supervisors
new to Erlang and just having a bit of trouble getting my head around the new paradigm!
OK, so I have this internal function within an OTP gen_server:
my_func() ->
    Result = ibrowse:send_req(?ROOTPAGE, [{"User-Agent", ?USERAGENT}], get),
    case Result of
        {ok, "200", _, Xml} -> %<<do some stuff that won't interest you>>
            ,ok;
        {error, {conn_failed, {error, nxdomain}}} -> <<what the heck do I do here?>>
    end.
If I leave out the case for handling the connection failed then I get an exit signal propagated to the supervisor and it gets shut down along with the server.
What I want to happen (at least I think this is what I want to happen) is that on a connection failure I'd like to pause and then retry send_req say 10 times and at that point the supervisor can fail.
If I do something ugly like this...
{error,{conn_failed,{error,nxdomain}}} -> stop()
it shuts down the server process and yes, I get to use my (try 10 times within 10 seconds) restart strategy until it fails, which is also the desired result however the return value from the server to the supervisor is 'ok' when I would really like to return {error,error_but_please_dont_fall_over_mr_supervisor}.
I strongly suspect in this scenario that I'm supposed to handle all the business stuff like retrying failed connections within 'my_func' rather than trying to get the process to stop and then having the supervisor restart it in order to try it again.
Question: what is the 'Erlang way' in this scenario ?
I'm new to erlang too.. but how about something like this?
The code is long just because of the comments. My solution (I hope I've understood correctly your question) will receive the maximum number of attempts and then do a tail-recursive call, that will stop by pattern-matching the max number of attempts with the next one. Uses timer:sleep() to pause to simplify things.
%% @doc Instead of having my_func/0, you have
%% my_func/1, so we can "inject" the max number of
%% attempts. This one will call your tail-recursive
%% one
my_func(MaxAttempts) ->
    my_func(MaxAttempts, 0).

%% @doc This one will match when the maximum number
%% of attempts have been reached, terminates the
%% tail recursion.
my_func(MaxAttempts, MaxAttempts) ->
    {error, too_many_retries};

%% @doc Here's where we do the work, by having
%% an accumulator that is incremented with each
%% failed attempt.
my_func(MaxAttempts, Counter) ->
    io:format("Attempt #~B~n", [Counter]),
    % Simulating the error here.
    Result = {error, {conn_failed, {error, nxdomain}}},
    case Result of
        {ok, "200", _, Xml} -> ok;
        {error, {conn_failed, {error, nxdomain}}} ->
            % Wait, then tail-recursive call.
            timer:sleep(1000),
            my_func(MaxAttempts, Counter + 1)
    end.
EDIT: If this code runs in a supervised process, I think it's better to use a simple_one_for_one supervisor, where you can dynamically add whatever workers you need. This avoids delaying initialization due to timeouts (in a one_for_one supervisor the workers are started in order, and sleeping at that point will stop the other processes from initializing).
EDIT2: Added an example shell execution:
1> c(my_func).
my_func.erl:26: Warning: variable 'Xml' is unused
{ok,my_func}
2> my_func:my_func(5).
Attempt #0
Attempt #1
Attempt #2
Attempt #3
Attempt #4
{error,too_many_retries}
With 1s delays between each printed message.
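The same idea can be written so that the operation to retry is passed in as a fun instead of being simulated inside the function. This is only a sketch: the module name retry_sketch, the function retry/3 and the convention that any {error, _} result means "try again" are assumptions made for the example.

```erlang
-module(retry_sketch).
-export([retry/3]).

%% retry(Fun, AttemptsLeft, DelayMs)
%% Calls Fun() up to AttemptsLeft times, sleeping DelayMs between
%% failed attempts. Any {error, _} result counts as a failure.
retry(_Fun, 0, _DelayMs) ->
    {error, too_many_retries};
retry(Fun, AttemptsLeft, DelayMs) when AttemptsLeft > 0 ->
    case Fun() of
        {error, _Reason} ->
            timer:sleep(DelayMs),
            retry(Fun, AttemptsLeft - 1, DelayMs);
        Result ->
            Result
    end.
```

In the question's setting, the call would look something like retry_sketch:retry(fun() -> ibrowse:send_req(?ROOTPAGE, [{"User-Agent", ?USERAGENT}], get) end, 10, 1000), after which the caller can decide whether {error, too_many_retries} should crash the process and let the supervisor take over.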
I've been learning how to use ets, but one thing that has bothered me is that, occasionally*, ets:match throws a bad argument… And, from then on, all subsequent calls (even calls which previously worked) also throw a bad argument:
> ets:match(Tid, { [$r | '$1'] }, 1).
% this match works...
% Then, at some point, this comes up:
** exception error: bad argument
in function ets:match/3
called as ets:match(24589,{[114|'$1']},1)
% And from then on, matches stop working:
> ets:match(Tid, { [$r | '$1'] }, 1).
** exception error: bad argument
in function ets:match/3
called as ets:match(24589,{[114|'$1']},1)
Is there any way to "reset" the ets system so that I can query it (ie, from the shell) again?
*: I haven't been able to reproduce the problem… But it happens fairly often while I'm trying to do "other things".
Although I'm not 100% sure, this thread seems to answer your question. It appears that you're observing this behaviour in the shell. If so, two facts are interacting in a confusing way:
An ets table is deleted as soon as its owning process dies.
The Erlang shell dies whenever it receives an exception and is silently restarted.
So, when you get the first exception, the current shell process dies causing the ets table to be deleted, and then a new shell process is started for you. Now, when you try another ets:match, it fails because the table no longer exists.
Dale already told you what happens. You can confirm that by calling self() in the shell every now and then.
As a quick workaround you can spawn another process to create a public table for you. Then that table won't die along with your shell.
1> self().
<0.32.0> % shell's Pid
2> spawn(fun() -> ets:new(my_table, [named_table, public]), receive X -> ok end end).
<0.35.0> % the spawned process's Pid
3> ets:insert(my_table, {a, b}).
true
Now make an exception and check that the table indeed survived.
4> 1/0.
** exception error: bad argument in an arithmetic expression
in operator '/'/2
called as 1 / 0
5> self().
<0.38.0> % shell's reborn, with a different Pid
6> ets:insert(my_table, {c, d}).
true
7> ets:tab2list(my_table).
[{c,d},{a,b}] % table did survive the shell restart
To delete the table, just send something to your spawned process:
8> pid(0,35,0) ! bye_bye.
bye_bye
9> ets:info(my_table).
undefined