Can anyone explain this Erlang crash dump?

I got this error report when running my Erlang application.
Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 18446744071692551144 bytes of memory (of type "heap").
It's a simple program run on an ordinary PC. How is it possible to get such numbers? By the way, it is trying to allocate roughly 10^10 GB. The program basically only runs tail recursion and a fairly small number of processes.

If you get this error while running your application, it means that one of your processes kept growing (for example through recursion that accumulates data) until it requested more memory than the OS would grant the VM, and the VM crashed with this allocation error. Note also that the requested size is suspiciously close to 2^64 (18446744073709551616), which suggests a negative size of about -2 GB that wrapped around when interpreted as an unsigned 64-bit integer.

Previously, when I ran into similar dumps, the cause was a huge mailbox in a single process that had piled up millions of messages.
You could check it with this snippet of code:
top() ->
    %% Collect {Pid, QueueLength} for every live process; skip processes
    %% that die between erlang:processes/0 and erlang:process_info/2.
    Procs = lists:foldl(fun(Pid, Acc) ->
                            case erlang:process_info(Pid, message_queue_len) of
                                {_K, V} -> [{Pid, V} | Acc];
                                _ -> Acc
                            end
                        end, [], erlang:processes()),
    %% Sorted ascending by queue length, so the worst offenders are last.
    lists:keysort(2, Procs).
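For example, from the shell (assuming the snippet lives in a module named debug_tools, a name picked purely for illustration), the worst offenders show up at the end of the sorted list:

%% Show the five processes with the longest message queues.
lists:sublist(lists:reverse(debug_tools:top()), 5).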

Related

How do actor systems prevent memory overflow from queues while also preventing threads from blocking on queue writes?

Actors send messages to one another. If the queues are limited, what happens on write/send attempts to full queues: blocking or dropping? If they are not limited, a memory crash is possible. How much of this is configurable?
Default mailboxes in Akka are not bounded, so they will not prevent a memory crash. You can, however, configure actors to use different mailboxes; among them are both mailboxes that discard messages (passing them to dead letters) when the maximum size is reached and mailboxes that block (I would not recommend using those). You can find all mailbox implementations that come with Akka in the docs here: https://doc.akka.io/docs/akka/current/typed/mailboxes.html#mailbox-implementations
You can easily test the behavior of the Erlang VM in this situation. In the shell:

F = fun F() -> receive done -> ok end end,        % P only waits; its mailbox just grows
P = spawn(F),
G = fun G(Pid, Size, Wait) ->                     % flood Pid with a Size-element list every Wait ms
        Pid ! lists:seq(1, Size),
        receive done -> ok after Wait -> G(Pid, Size, Wait) end
    end,
H = fun(Pid, Size, Wait) -> T = fun() -> G(Pid, Size, Wait) end, spawn(T) end,
D = fun D() ->                                    % print time and process memory every 10 s
        io:format("~p~n~p~n", [erlang:time(), erlang:memory(processes_used)]),
        receive done -> ok after 10000 -> D() end
    end,
P1 = spawn(D).
P2 = H(P, 100000, 5).
You will see that you eventually get a memory allocation exception; the VM writes a crash dump and exits.
I didn't check how to modify the limits, but if you run the experiment you will see that it takes a very large number of messages, with the mailbox consuming tens of gigabytes of memory.
If you ever reach this situation, I don't think the first reaction should be to increase the limit; look first at:
unread messages,
process bottlenecks,
the application architecture,
whether Erlang is suited to your problem,
...
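One more option worth knowing (an aside, not something checked in the trial above): since Erlang/OTP 19 a process can set a maximum heap size, so the VM kills just that process instead of crashing outright. A minimal sketch, where worker_loop/0 stands for your real worker function:

%% With the default on_heap message_queue_data, queued messages count
%% toward the heap, so this also catches a runaway mailbox. The size is
%% in machine words; 12500000 words is ~100 MB on a 64-bit VM.
Pid = spawn_opt(fun() -> worker_loop() end,
                [{max_heap_size, #{size => 12500000,
                                   kill => true,
                                   error_logger => true}}]).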
Actor mailboxes in Erlang have no built-in limit; they are bounded only by the memory available to the VM, and if the VM runs out of memory it crashes. To monitor and manage memory allocation and CPU load you can use the os_mon application that ships with Erlang/OTP.
You can test this in the Erlang shell:

F = fun() ->
        timer:sleep(60000),
        {message_queue_len, InboxLen} =
            erlang:process_info(self(), message_queue_len),
        io:format("Len ===> ~p", [InboxLen])
    end.
PID = erlang:spawn(F).
[PID ! "hi" || _ <- lists:seq(1, 50000)].
If you increase the number of messages far enough, you can exhaust the VM's memory.
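While the spawned process is still sleeping, you can also inspect its mailbox from the same shell, using the PID variable bound above:

%% How many messages are currently queued in PID's mailbox?
erlang:process_info(PID, message_queue_len).
%% -> {message_queue_len, 50000}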
Default mailboxes in Akka are not bounded. But if you want to limit the maximum number of messages held for an actor, you can build an Akka stream inside the actor and pick an OverflowStrategy on demand.
For example:
val source: Source[Message, SourceQueueWithComplete[Message]] =
  Source.queue[Message](bufferSize = 8192,
                        overflowStrategy = OverflowStrategy.dropNew)

Erlang producers and consumers - strange behaviour of program

I am writing a program that solves the producers-consumers problem using Erlang multiprocessing, with one process responsible for handling the buffer to/from which I produce/consume, plus many producer and many consumer processes. To simplify, I assume a producer/consumer does not know that its operation has failed (that it is impossible to produce or consume because of buffer constraints), but the server is prepared to handle this.
My code is:
Server code
server(Buffer, Capacity, CountPid) ->
    receive
        %% PRODUCER
        {Pid, produce, InputList} ->
            NumberProduce = lists:flatlength(InputList),
            case canProduce(Buffer, NumberProduce, Capacity) of
                true ->
                    NewBuffer = append(InputList, Buffer),
                    CountPid ! lists:flatlength(InputList),
                    Pid ! ok,
                    server(NewBuffer, Capacity, CountPid);
                false ->
                    Pid ! tryagain,
                    server(Buffer, Capacity, CountPid)
            end;
        %% CONSUMER
        {Pid, consume, Number} ->
            case canConsume(Buffer, Number) of
                true ->
                    Data = lists:sublist(Buffer, Number),
                    NewBuffer = lists:subtract(Buffer, Data),
                    Pid ! {ok, Data},
                    server(NewBuffer, Capacity, CountPid);
                false ->
                    Pid ! tryagain,
                    server(Buffer, Capacity, CountPid)
            end
    end.
Producer and consumer
producer(ServerPid) ->
    X = rand:uniform(9),
    ToProduce = [rand:uniform(500) || _ <- lists:seq(1, X)],
    ServerPid ! {self(), produce, ToProduce},
    producer(ServerPid).

consumer(ServerPid) ->
    X = rand:uniform(9),
    ServerPid ! {self(), consume, X},
    consumer(ServerPid).
Starting and auxiliary functions (included because I don't know where exactly my problem is):
spawnProducers(Number, ServerPid) ->
    case Number of
        0 -> io:format("Spawned producers");
        N ->
            spawn(zad2, producer, [ServerPid]),
            spawnProducers(N - 1, ServerPid)
    end.

%% NB: this clause prints "Spawned producers" and recurses into
%% spawnProducers/2; both look like copy-paste slips for the consumer
%% variants, though they are incidental to the performance question.
spawnConsumers(Number, ServerPid) ->
    case Number of
        0 -> io:format("Spawned producers");
        N ->
            spawn(zad2, consumer, [ServerPid]),
            spawnProducers(N - 1, ServerPid)
    end.

start(ProdsNumber, ConsNumber) ->
    CountPid = spawn(zad2, count, [0, 0]),
    ServerPid = spawn(zad2, server, [[], 20, CountPid]),
    spawnProducers(ProdsNumber, ServerPid),
    spawnConsumers(ConsNumber, ServerPid).

canProduce(Buffer, Number, Capacity) ->
    lists:flatlength(Buffer) + Number =< Capacity.

canConsume(Buffer, Number) ->
    lists:flatlength(Buffer) >= Number.

append([H|T], Tail) ->
    [H|append(T, Tail)];
append([], Tail) ->
    Tail.
I am trying to count the number of produced elements using the following process; the server sends it a message whenever elements are produced.
count(N, ThousandsCounter) ->
    receive
        X ->
            if
                N >= 1000 ->
                    io:format("Yeah! We have produced ~p elements!~n", [ThousandsCounter]),
                    count(0, ThousandsCounter + 1000);
                true -> count(N + X, ThousandsCounter)
            end
    end.
I expect this program to work properly, meaning: it produces elements, the number of produced elements grows linearly with time (f(t) = kt for some constant k), and the more processes I have, the faster production is.
ACTUAL QUESTION
I launch the program:
erl
c(zad2).
zad2:start(5,5).
How the program behaves:
The longer production lasts, the fewer elements are produced per unit of time (e.g. 10000 in the first second, 5000 in the next, 1000 in the 10th second, etc.).
The more processes I have, the slower production is: with start(10,10) I need to wait about a second for the first thousand, whereas with start(2,2) 20000 appears almost immediately.
start(100,100) made me restart my computer (I work on Ubuntu), as the whole CPU was in use and there was no memory left even to open a terminal and kill the Erlang VM.
Why does my program not behave as I expect? Am I doing something wrong with my Erlang programming, or is this a matter of the OS or something else?
The producer/1 and consumer/1 functions as written above never wait for anything; they just loop, bombarding the server with messages. The server's message queue fills up very quickly, the Erlang VM grows as much as it can to hold it, stealing all your memory, and the looping processes eat all available CPU time on all cores.
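A minimal sketch of one way to fix this, building on the question's own code: make each client block on the server's reply before looping, so it can never outrun the server. The 10 ms backoff value is an arbitrary choice:

producer(ServerPid) ->
    X = rand:uniform(9),
    ToProduce = [rand:uniform(500) || _ <- lists:seq(1, X)],
    ServerPid ! {self(), produce, ToProduce},
    receive
        ok -> ok;
        tryagain -> timer:sleep(10)   % back off instead of hammering the server
    end,
    producer(ServerPid).

consumer(ServerPid) ->
    X = rand:uniform(9),
    ServerPid ! {self(), consume, X},
    receive
        {ok, _Data} -> ok;
        tryagain -> timer:sleep(10)
    end,
    consumer(ServerPid).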

Are erlang:send_after/3 and timer:send_after/3 intended to behave differently?

I wanted to send a message to a process after a delay, and discovered erlang:send_after/4.
When I looked at the docs, it seemed to be exactly what I wanted:
erlang:send_after(Time, Dest, Msg, Options) -> TimerRef
Starts a timer. When the timer expires, the message Msg is sent to the
process identified by Dest.
However, it doesn't seem to work when the destination is running on another node; it tells me one of the arguments is bad.
1> P = spawn('node#host', module, function, [Arg]).
<10585.83.0>
2> erlang:send_after(1000, P, {123}).
** exception error: bad argument
in function erlang:send_after/3
called as erlang:send_after(1000,<10585.83.0>,{123})
Doing the same thing with timer:send_after/3 appears to work fine:
1> P = spawn('node#host', module, function, [Arg]).
<10101.10.0>
2> timer:send_after(1000, P, {123}).
{ok,{-576458842589535,#Ref<0.1843049418.1937244161.31646>}}
And, the docs for timer:send_after/3 state almost the same thing as the erlang version:
send_after(Time, Pid, Message) -> {ok, TRef} | {error, Reason}
Evaluates Pid ! Message after Time milliseconds.
So the question is, why do these two functions, which on the face of it do the same thing, behave differently? Is erlang:send_after broken, or mis-advertised? Or maybe timer:send_after isn't doing what I think it is?
TL;DR
Your assumption is correct: these are intended to do the same thing, but are implemented differently.
Discussion
Things in the timer module, such as timer:send_after/2,3, work through the gen_server that implements the timer service. Like any other service, this one can get overloaded if you hand it a really huge number of tasks (timers to track).
erlang:send_after/3,4, on the other hand, is a BIF implemented directly within the runtime, and therefore has access to system primitives like the hardware timer. If you have a ton of timers, this is definitely the way to go. In most programs you won't notice the difference, though.
There is actually a note about this in the Erlang Efficiency Guide:
3.1 Timer Module
Creating timers using erlang:send_after/3 and erlang:start_timer/3 is much more efficient than using the timers provided by the timer module in STDLIB. The timer module uses a separate process to manage the timers. That process can easily become overloaded if many processes create and cancel timers frequently (especially when using the SMP emulator).
The functions in the timer module that do not manage timers (such as timer:tc/3 or timer:sleep/1), do not call the timer-server process and are therefore harmless.
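To make the contrast concrete, here is how each kind of timer is created and cancelled (standard library calls, shown as a quick sketch):

%% BIF version: raw timer ref; cancel with erlang:cancel_timer/1, which
%% returns the remaining time in ms, or false if the timer already fired.
Ref = erlang:send_after(5000, self(), tick),
Remaining = erlang:cancel_timer(Ref),

%% timer-module version: goes through the timer server process;
%% cancel with timer:cancel/1.
{ok, TRef} = timer:send_after(5000, self(), tick),
{ok, cancel} = timer:cancel(TRef).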
A workaround
A workaround that gains the efficiency of the BIF without the same-node restriction is to have a process of your own that does nothing but wait for messages to forward to another node:
-module(foo_forward).
-export([start/0, send_after/4]).

% Obviously this is an example only. You would want to write this to
% be compliant with proc_lib, write a proper init/N and integrate with
% OTP. Note that this snippet is missing the OTP service functions.

start() ->
    Parent = self(),    % capture the caller; self() inside the fun would be the new process
    spawn(fun() -> loop(Parent, [], none) end).

% Ask Forwarder (a pid returned by start/0) to deliver Message to Dest
% after Time ms. The returned ref still works with erlang:cancel_timer/1.
send_after(Forwarder, Time, Dest, Message) ->
    erlang:send_after(Time, Forwarder, {forward, Dest, Message}).

loop(Parent, Debug, State) ->
    receive
        {forward, Dest, Message} ->
            Dest ! Message,
            loop(Parent, Debug, State);
        {system, From, Request} ->
            sys:handle_system_msg(Request, From, Parent, ?MODULE, Debug, State);
        Unexpected ->
            logger:warning("Received message: ~tp", [Unexpected]),
            loop(Parent, Debug, State)
    end.
The above example is a bit shallow, but hopefully it expresses the point. It should be possible to get the efficiency of the BIF erlang:send_after/3,4, still send messages across nodes, and keep the freedom to cancel a message using erlang:cancel_timer/1.
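Usage would then look something like this (node, module, and message are placeholders):

Forwarder = foo_forward:start(),
RemotePid = spawn('node@host', some_mod, some_fun, []),
Ref = foo_forward:send_after(Forwarder, 1000, RemotePid, {123}),
%% The timer lives locally, so it can still be cancelled before it fires:
erlang:cancel_timer(Ref).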
But why?
The puzzle (and bug) is why erlang:send_after/3,4 does not want to work across nodes. (Note that the two shell listings above are separate sessions, which is why P is <10585.83.0> in one and <10101.10.0> in the other; that difference on its own is expected.)
For the moment I do not know why erlang:send_after/3,4 doesn't work across nodes, but I can say with confidence that the mechanism of operation between the two is not the same. I'll look into it, but I imagine the BIF version is doing some funny business within the runtime to gain efficiency, and as a result signals the target process by directly updating its mailbox instead of sending a message at the higher Erlang-to-Erlang distribution level.
Maybe it is good that we have both, but this should definitely be clearly marked in the docs, and it evidently is not (I just checked).
There is also a difference in timeout ordering if you have many timers.
The example below shows that erlang:send_after/3 does not guarantee delivery order, but timer:send_after/3 does.
1> A = lists:seq(1,10).
[1,2,3,4,5,6,7,8,9,10]
2> [erlang:send_after(100, self(), X) || X <- A].
...
3> flush().
Shell got 2
Shell got 3
Shell got 4
Shell got 5
Shell got 6
Shell got 7
Shell got 8
Shell got 9
Shell got 10
Shell got 1
ok
4> [timer:send_after(100, self(), X) || X <- A].
...
5> flush().
Shell got 1
Shell got 2
Shell got 3
Shell got 4
Shell got 5
Shell got 6
Shell got 7
Shell got 8
Shell got 9
Shell got 10
ok

How to kill an infinite loop process in Erlang

If I create a module with this code below
start_nonstop() ->
    spawn(fun() ->
              Pid = spawn(?MODULE, nonstop, [0]),
              timer:sleep(1000),
              exit(Pid, kill)
          end).

nonstop(N) ->
    io:format("number: ~B~n", [N + 1]),
    nonstop(N + 1).
and call start_nonstop() from the Erlang shell, I see an endless series of
number: 1
number: 2
...
which means that the nonstop(N) process was not killed as expected by calling exit(Pid, kill)...
What am I doing wrong? Obviously this code is a mock-up, but I think there is always a chance that some logic bug in a process might result in infinite-loop behaviour similar to this one.
I supposed this could be handled by Erlang, but if not, how can an Erlang application protect itself against this kind of situation?
Which patterns of "infinite loops" can Erlang break? For example, if I put a sleep in the middle of the nonstop(N) function, Erlang can break the infinite loop, but if I put an erlang:yield() there, it still cannot break out of it...
In this case the infinite process is local to the one trying to kill it. But what if the infinite process were in a different (e.g., remote) Erlang VM? Could it be killed then?
I am a newbie, and I am evaluating Erlang before I put too much effort into learning and using it for serious applications.
Thanks
In this code, you spawn two processes.
In start_nonstop(), you spawn a process; call it Process1. Inside Process1, you spawn another process; call it Process2.
The work of Process2 is:
nonstop(N) ->
    io:format("number: ~B~n", [N + 1]),
    nonstop(N + 1).
It just calls io:format("number: ~B~n", [N + 1]) until Process1 kills it.
In my environment, Process2 does get killed, but the variable N has become very large by then, judging from the output:
number: 51321
number: 51322
number: 51323
number: 51324
number: 51325
number: 51326
number: 51327
number: 51328
number: 51329
number: 51330
number: 51331
number: 51332
7>
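To convince yourself that the kill really took effect, you can check liveness from the shell. A small sketch; the module name mymodule is a placeholder, and nonstop/1 must be exported:

%% Spawn the looping process directly, kill it, then check liveness.
Pid = spawn(mymodule, nonstop, [0]),
timer:sleep(1000),
exit(Pid, kill),
is_process_alive(Pid).
%% -> false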

Erlang and Redis: read performance

I suddenly ran into performance problems when trying to read 1M records from a Redis sorted set. I used ZSCAN with a cursor and a batch size of 5K.
The code was executed using Erlang R14 on the same machine that hosts Redis. Receiving one 5K-element batch takes nearly 1 second. Unfortunately, I failed to compile Erlang R16 on this machine, but I don't think that matters.
For comparison, Node.js code with node_redis (hiredis parser) reads 1M in 2 seconds. Same results for Python and PHP.
Maybe I am doing something wrong?
Thanks in advance.
Here is my Erlang code:
-module(redis_bench).
-export([run/0]).

-define(COUNT, 5000).

run() ->
    {_, Conn} = connect_to_redis(),
    read_from_redis(Conn).

connect_to_redis() ->
    eredis:start_link("host", 6379, 0, "pass").

%% NB: eredis returns the cursor as a binary, so this clause would need
%% to match <<"0">> rather than the integer 0 to ever terminate.
read_from_redis(_Conn, 0) ->
    ok;
read_from_redis(Conn, Cursor) ->
    {ok, [Cursor1|_]} = eredis:q(Conn, ["ZSCAN", "if:push:sset:test", Cursor, "COUNT", ?COUNT]),
    io:format("Batch~n"),
    read_from_redis(Conn, Cursor1).

read_from_redis(Conn) ->
    {ok, [Cursor|_]} = eredis:q(Conn, ["ZSCAN", "if:push:sset:test", 0, "COUNT", ?COUNT]),
    read_from_redis(Conn, Cursor).
9 times out of 10, slowness like this is the result of a badly written driver rather than of the system itself. In this case, the ability to pipeline requests to Redis is going to be important. A client like redo can do pipelining and may be faster.
Also, beware of measuring only one process/thread. If you want fast concurrent access, it is often balanced against fast sequential access.
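For what it's worth, eredis itself also supports pipelining via eredis:qp/2, which ships a list of commands in a single round trip; a minimal sketch with made-up keys:

{ok, Conn} = eredis:start_link(),    % defaults to 127.0.0.1:6379
%% One round trip for three commands; one {ok, Value} reply per command.
[{ok, <<"OK">>}, {ok, <<"OK">>}, {ok, <<"v1">>}] =
    eredis:qp(Conn, [["SET", "k1", "v1"],
                     ["SET", "k2", "v2"],
                     ["GET", "k1"]]).

Note that ZSCAN itself cannot be pipelined this way, because each call needs the cursor returned by the previous reply; pipelining mainly helps with independent commands.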
Switching to redis-erl decreased the read time for 1M keys to 16 seconds. Not fast, but acceptable.
Here is the new code:
-module(redis_bench2).
-export([run/0]).

-define(COUNT, 200000).

run() ->
    io:format("Start~n"),
    redis:connect([{ip, "host"}, {port, 6379}, {db, 0}, {pass, "pass"}]),
    read_from_redis().

read_from_redis(<<"0">>) ->
    ok;
read_from_redis(Cursor) ->
    [{ok, Cursor1}|_] = redis:q(["ZSCAN", "if:push:sset:test", Cursor, "COUNT", ?COUNT]),
    io:format("Batch~n"),
    read_from_redis(Cursor1).

read_from_redis() ->
    [{ok, Cursor}|_] = redis:q(["ZSCAN", "if:push:sset:test", 0, "COUNT", ?COUNT]),
    read_from_redis(Cursor).
