Erlang and Redis: read performance - erlang

I ran into performance problems when trying to read 1M records from a Redis sorted set. I used ZSCAN with a cursor and a batch size of 5K.
The code was executed using Erlang R14 on the same machine that hosts Redis. Receiving one batch of 5K elements takes nearly 1 second. Unfortunately, I failed to compile Erlang R16 on this machine, but I don't think that matters.
For comparison, Node.js code with node_redis (hiredis parser) reads 1M records in 2 seconds. Python and PHP give the same results.
Am I doing something wrong?
Thanks in advance.
Here is my Erlang code:
-module(redis_bench).
-export([run/0]).
-define(COUNT, 5000).
run() ->
    {ok, Conn} = connect_to_redis(),
    read_from_redis(Conn).
connect_to_redis() ->
    eredis:start_link("host", 6379, 0, "pass").
%% eredis returns replies as binaries, so the scan is complete when the
%% cursor comes back as <<"0">>, not as the integer 0.
read_from_redis(_Conn, <<"0">>) ->
    ok;
read_from_redis(Conn, Cursor) ->
    {ok, [Cursor1|_]} = eredis:q(Conn, ["ZSCAN", "if:push:sset:test", Cursor, "COUNT", ?COUNT]),
    io:format("Batch~n"),
    read_from_redis(Conn, Cursor1).
read_from_redis(Conn) ->
    {ok, [Cursor|_]} = eredis:q(Conn, ["ZSCAN", "if:push:sset:test", 0, "COUNT", ?COUNT]),
    read_from_redis(Conn, Cursor).

9 out of 10 times, slowness like this is the result of a badly written driver rather than of the system. In this case, the ability to pipeline requests to Redis is going to be important. A client like redo can do pipelining and may be faster.
Also, beware of measuring only one process/thread. Fast concurrent access is often traded off against fast sequential access.

Switching to redis-erl decreased the read time for 1M keys to 16 seconds. Not fast, but acceptable.
Here is the new code:
-module(redis_bench2).
-export([run/0]).
-define(COUNT, 200000).
run() ->
    io:format("Start~n"),
    redis:connect([{ip, "host"}, {port, 6379}, {db, 0}, {pass, "pass"}]),
    read_from_redis().
read_from_redis(<<"0">>) ->
    ok;
read_from_redis(Cursor) ->
    [{ok, Cursor1}|_] = redis:q(["ZSCAN", "if:push:sset:test", Cursor, "COUNT", ?COUNT]),
    io:format("Batch~n"),
    read_from_redis(Cursor1).
read_from_redis() ->
    [{ok, Cursor}|_] = redis:q(["ZSCAN", "if:push:sset:test", 0, "COUNT", ?COUNT]),
    read_from_redis(Cursor).

Related

How do Actor Systems function to prevent memory overflow from queues but also prevent threads blocking on writing on the queues?

Actors send messages to one another. If the queues are limited, then what happens on write/send attempts to full queues? Blocking or dropping? If they are not limited, a memory crash is possible. How much is configurable?
Default mailboxes in Akka are not bounded, so they will not prevent a memory crash. You can, however, configure actors to use different mailboxes. Among those there are mailboxes that discard messages (pass them to dead letters) when the max size is reached and mailboxes that block (I would not recommend using those). You can find all mailbox implementations that come with Akka in the docs here: https://doc.akka.io/docs/akka/current/typed/mailboxes.html#mailbox-implementations
You can easily test the behavior of the Erlang VM in this situation. In the shell:
F = fun F() -> receive done -> ok end end,
P = spawn(F),
G = fun G(Pid,Size,Wait) -> Pid ! lists:seq(1,Size), receive done -> ok after Wait -> G(Pid,Size,Wait) end end,
H = fun(Pid,Size,Wait) -> T = fun() -> G(Pid,Size,Wait) end, spawn(T) end,
D = fun D() -> io:format("~p~n~p~n",[erlang:time(),erlang:memory(processes_used)]), receive done -> ok after 10000 -> D() end end,
P1 = spawn(D).
P2 = H(P,100000,5).
You will see that you get a memory allocation exception; the VM writes a core dump and crashes.
I didn't check how to modify the limits. If you run the experiment, you will see that the mailbox has to reach a very large number of messages, using tens of gigabytes of memory.
If you ever reach this situation, I don't think the first reaction should be to increase the limit. You should first look at:
unread messages,
process bottlenecks,
the application architecture,
whether Erlang is suited to your problem,
...
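To look for unread messages in practice, one option is to list the processes with the longest mailboxes. The following is a minimal sketch; the module name mailbox_check and the function longest_queues/1 are made up for this illustration.

```erlang
-module(mailbox_check).
-export([longest_queues/1]).

%% Return up to N {Pid, QueueLen} pairs, longest mailbox first.
%% process_info/2 returns undefined for a process that has already
%% exited; the generator pattern below silently skips those.
longest_queues(N) ->
    Lens = [{Pid, Len}
            || Pid <- erlang:processes(),
               {message_queue_len, Len} <-
                   [erlang:process_info(Pid, message_queue_len)]],
    Sorted = lists:reverse(lists:keysort(2, Lens)),
    lists:sublist(Sorted, N).
```

Calling mailbox_check:longest_queues(5) from the shell while the experiment above is running should put the flooded process at the top of the list.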
Actor mailboxes in Erlang have no size limit; they are limited only by the memory available to the VM, and if that memory is exhausted the VM crashes. To monitor memory allocation and CPU load you can use os_mon in Erlang.
You can test this in the Erlang shell:
F = fun() -> timer:sleep(60000),
{message_queue_len, InboxLen} = erlang:process_info(self(), message_queue_len),
io:format("Len ===> ~p", [InboxLen])
end.
PID = erlang:spawn(F).
[PID ! "hi" || _ <- lists:seq(1, 50000)].
If you increase the number of messages, you can exhaust the memory.
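Since the mailbox itself is unbounded, any limit has to be enforced by the application. One possible sketch is a sender-side check that drops the message when the receiver's mailbox is already long. The module bounded_send and the function send_if_room/3 are hypothetical names, and the check only works for processes on the local node.

```erlang
-module(bounded_send).
-export([send_if_room/3]).

%% Send Msg to a local Pid only if its mailbox holds fewer than MaxLen
%% messages. The length check and the send are not atomic, so with
%% several concurrent senders the limit is only approximate.
send_if_room(Pid, Msg, MaxLen) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, Len} when Len < MaxLen ->
            Pid ! Msg,
            sent;
        {message_queue_len, _Len} ->
            dropped;
        undefined ->
            %% the receiver is no longer alive
            noproc
    end.
```

This trades lost messages for bounded memory, which is the "dropping" choice from the question; a blocking variant would need the receiver to acknowledge messages instead.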
Default mailboxes in Akka are not bounded. But if you want to limit the max messages in mailboxes, you could build an Akka stream in the actor, then OverflowStrategy can be used on demand.
For example:
val source: Source[Message, SourceQueueWithComplete[Message]] =
Source.queue[Message](bufferSize = 8192,
overflowStrategy = OverflowStrategy.dropNew)

Erlang producers and consumers - strange behaviour of program

I am writing a program that solves producers-consumers problem using Erlang multiprocessing with one process responsible for handling buffer to which I produce/consume and many producers and many consumers processes. To simplify I assume producer/consumer does not know that his operation has failed (that it is impossible to produce or consume because of buffer constraints), but the server is prepared to do this.
My code is:
Server code
server(Buffer, Capacity, CountPid) ->
receive
%% PRODUCER
{Pid, produce, InputList} ->
NumberProduce = lists:flatlength(InputList),
case canProduce(Buffer, NumberProduce, Capacity) of
true ->
NewBuffer = append(InputList, Buffer),
CountPid ! lists:flatlength(InputList),
Pid ! ok,
server(NewBuffer,Capacity, CountPid);
false ->
Pid ! tryagain,
server(Buffer, Capacity, CountPid)
end;
%% CONSUMER
{Pid, consume, Number} ->
case canConsume(Buffer, Number) of
true ->
Data = lists:sublist(Buffer, Number),
NewBuffer = lists:subtract(Buffer, Data),
Pid ! {ok, Data},
server(NewBuffer, Capacity,CountPid);
false ->
Pid ! tryagain,
server(Buffer, Capacity, CountPid)
end
end.
Producer and consumer
producer(ServerPid) ->
X = rand:uniform(9),
ToProduce = [rand:uniform(500) || _ <- lists:seq(1, X)],
ServerPid ! {self(),produce,ToProduce},
producer(ServerPid).
consumer(ServerPid) ->
X = rand:uniform(9),
ServerPid ! {self(),consume,X},
consumer(ServerPid).
Starting and auxiliary functions (I enclose as I don't know where exactly my problem is)
spawnProducers(Number, ServerPid) ->
case Number of
0 -> io:format("Spawned producers");
N ->
spawn(zad2,producer,[ServerPid]),
spawnProducers(N - 1,ServerPid)
end.
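spawnConsumers(Number, ServerPid) ->
case Number of
0 -> io:format("Spawned consumers");
N ->
spawn(zad2,consumer,[ServerPid]),
%% recurse into spawnConsumers; calling spawnProducers here
%% would start one consumer and N - 1 extra producers
spawnConsumers(N - 1,ServerPid)
end.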
start(ProdsNumber, ConsNumber) ->
CountPid = spawn(zad2, count, [0,0]),
ServerPid = spawn(zad2,server,[[],20, CountPid]),
spawnProducers(ProdsNumber, ServerPid),
spawnConsumers(ConsNumber, ServerPid).
canProduce(Buffer, Number, Capacity) ->
lists:flatlength(Buffer) + Number =< Capacity.
canConsume(Buffer, Number) ->
lists:flatlength(Buffer) >= Number.
append([H|T], Tail) ->
[H|append(T, Tail)];
append([], Tail) ->
Tail.
I am trying to count the number of elements with the following process; the server sends a message to it whenever elements are produced.
count(N, ThousandsCounter) ->
receive
X ->
if
N >= 1000 ->
io:format("Yeah! We have produced ~p elements!~n", [ThousandsCounter]),
count(0, ThousandsCounter + 1000);
true -> count(N + X, ThousandsCounter)
end
end.
I expect this program to work properly, which means: it produces elements, the number of produced elements grows linearly with time (f(t) = kt, with k constant), and the more processes I have, the faster production is.
ACTUAL QUESTION
I launch program:
erl
c(zad2)
zad2:start(5,5)
How the program behaves:
The longer production lasts, the fewer elements are produced per unit of time (e.g. 10000 in the first second, 5000 in the next, 1000 in the 10th second, etc.).
The more processes I have, the slower production is: with start(10,10) I need to wait about a second for the first thousand, whereas with start(2,2) 20000 appears almost immediately.
start(100,100) made me restart my computer (I work on Ubuntu), as the whole CPU was used and there was no memory available for me to open a terminal and terminate the Erlang VM.
Why does my program not behave as I expect? Am I doing something wrong in my Erlang code, or is this a matter of the OS or something else?
The producer/1 and consumer/1 functions as written above don't ever wait for anything - they just loop and loop, bombarding the server with messages. The server's message queue is filling up very quickly, and the Erlang VM will try to grow as much as it can, stealing all your memory, and the looping processes will steal all available CPU time on all cores.
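One way to act on this, as a sketch rather than a drop-in fix, is to have each producer wait for the server's reply before sending the next request and to back off on tryagain. The module name backpressure, the simplified server/2, the size query and the 10 ms pause are assumptions made for this illustration; they are not part of the original zad2 module.

```erlang
-module(backpressure).
-export([demo/0, server/2, producer/2]).

%% Simplified buffer server: accepts items only while there is room.
server(Buffer, Capacity) ->
    receive
        {Pid, produce, Items}
          when length(Buffer) + length(Items) =< Capacity ->
            Pid ! ok,
            server(Items ++ Buffer, Capacity);
        {Pid, produce, _Items} ->
            Pid ! tryagain,
            server(Buffer, Capacity);
        {Pid, size} ->
            Pid ! {size, length(Buffer)},
            server(Buffer, Capacity)
    end.

%% Producer that sends one request and then blocks until the server
%% answers, so each producer has at most one message queued at a time.
producer(_ServerPid, 0) ->
    done;
producer(ServerPid, Rounds) ->
    ServerPid ! {self(), produce, [rand:uniform(500)]},
    receive
        ok -> ok;
        tryagain -> timer:sleep(10)   % back off instead of spinning
    end,
    producer(ServerPid, Rounds - 1).

%% Five producers, ten attempts each, no consumers: the buffer fills
%% up to its capacity and every later attempt is refused.
demo() ->
    Server = spawn(?MODULE, server, [[], 20]),
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), producer(Server, 10)} end)
            || _ <- lists:seq(1, 5)],
    [receive {P, done} -> ok end || P <- Pids],
    Server ! {self(), size},
    receive
        {size, N} ->
            exit(Server, kill),
            N
    end.
```

With this shape the server's mailbox never holds more than one pending request per producer, so memory use stays flat no matter how many rounds are run. The same idea applies to the consumer loop.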

Erlang change VM process initial size. Tune Erlang VM

First I have to mention that I run on CentOS 7 tuned to support 1 million connections. I tested with a simple C server and client and connected 512000 clients. I could have connected more, but I did not have enough RAM to spawn more Linux client machines, since from one machine I can open 65536 connections; 8 machines * 64000 connections each = 512000.
I made a simple Erlang server to which I want to connect 1 million or half a million clients, using the same C client. The problem I'm having now is memory related. For each successful gen_tcp:accept call I spawn a process. Around 50000 open connections cost me 3.7 GB of RAM on the server, whereas with the C server I could keep 512000 connections open using 1.9 GB of RAM. It is true that in the C server I did not create a process after accept to handle the connection, I just called accept again in a while loop, but even so, people on the web have done this with Erlang using less memory (ejabberd, Riak).
I presume that the flags I pass to the Erlang VM should do the trick. From what I read in the documentation and on the web, this is what I have: erl +K true +Q 64200 +P 134217727 -env ERL_MAX_PORTS 40960000 -env ERTS_MAX_PORTS 40960000 +a 16 +hms 1024 +hmbs 1024
This is the server code, I open 1 listener that monitors port 5001 by calling start(1, 5001).
start(Num,LPort) ->
case gen_tcp:listen(LPort,[{reuseaddr, true},{backlog,9000000000}]) of
{ok, ListenSock} ->
start_servers(Num,ListenSock),
{ok, Port} = inet:port(ListenSock),
Port;
{error,Reason} ->
{error,Reason}
end.
start_servers(0,_) ->
ok;
start_servers(Num,LS) ->
spawn(?MODULE,server,[LS,0]),
start_servers(Num-1,LS).
server(LS, Nr) ->
io:format("before accept ~w~n",[Nr]),
case gen_tcp:accept(LS) of
{ok,S} ->
io:format("after accept ~w~n",[Nr]),
spawn(ex,server,[LS,Nr+1]),
proc_lib:hibernate(?MODULE, loop, [S]);
Other ->
io:format("accept returned ~w - goodbye!~n",[Other]),
ok
end.
loop(S) ->
ok = inet:setopts(S,[{active,once}]),
receive
{tcp,S, _Data} ->
Answer = 1, % Not implemented in this example
gen_tcp:send(S,Answer),
proc_lib:hibernate(?MODULE, loop, [S]);
{tcp_closed,S} ->
io:format("Socket ~w closed [~w]~n",[S,self()]),
ok
end.
Given this configuration, my beam consumed about 2.5 GB of memory just on start, without even having your module loaded.
However, if you reduce the maximum number of processes to a reasonable value, like +P 60000 for a test with 50 000 connections, memory consumption drops sharply.
With a limit of 60 000 processes, the VM used only 527 MB of virtual memory on start.
I tried to reproduce your test, but unfortunately I was only able to launch 30 000 netcat processes on my system before running out of memory (because of the client jobs). However, I only observed VM memory consumption increase to 570 MB.
So my suggestion is that your numbers come from high startup memory consumption and not from the large number of open connections. Even then, you should pay attention to how the stats change as the number of open connections increases, not to the absolute values.
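To look at the change rather than the absolute value, a rough sketch is to compare erlang:memory(processes) before and after spawning a batch of idle processes. The module proc_mem and the function bytes_per_process/1 are names invented for this example, and the result is only an estimate: it depends on the VM flags in use (for instance, the initial heap size set with +hms) and it ignores socket buffers.

```erlang
-module(proc_mem).
-export([bytes_per_process/1]).

%% Spawn N idle processes and report the average growth of
%% erlang:memory(processes) per spawned process, in bytes.
bytes_per_process(N) when N > 0 ->
    Before = erlang:memory(processes),
    Pids = [spawn(fun() -> receive stop -> ok end end)
            || _ <- lists:seq(1, N)],
    After = erlang:memory(processes),
    %% let the helper processes terminate again
    [Pid ! stop || Pid <- Pids],
    (After - Before) div N.
```

Running proc_mem:bytes_per_process(10000) once with the original flags and once without +hms should show how much of the footprint comes from the per-process heap setting.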
I finally used the following configuration for my benchmark:
erl +K true +Q 64200 +P 60000 -env ERL_MAX_PORTS 40960000 -env ERTS_MAX_PORTS 40960000 +a 16 +hms 1024 +hmbs 1024
So I've launched clients with the command
for i in `seq 1 50000`; do nc 127.0.0.1 5001 & done
Apart from the tuning you have already done, you can adjust the TCP buffers as well. By default they take the OS default values, but you can pass {recbuf, Size} and {sndbuf, Size} to gen_tcp:listen. This may reduce the memory footprint significantly.
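As a sketch of what that could look like (the module name buf_opts, the function listen_small_buffers/1 and the 4096-byte sizes are arbitrary choices for the example, not recommended values):

```erlang
-module(buf_opts).
-export([listen_small_buffers/1]).

%% Open a listening socket with explicit kernel buffer sizes.
%% The OS is free to round or clamp the requested values, so the
%% sizes actually in effect are read back with inet:getopts/2.
listen_small_buffers(Port) ->
    Opts = [binary, {reuseaddr, true}, {active, false},
            {recbuf, 4096}, {sndbuf, 4096}],
    case gen_tcp:listen(Port, Opts) of
        {ok, ListenSock} ->
            {ok, Effective} = inet:getopts(ListenSock, [recbuf, sndbuf]),
            {ok, ListenSock, Effective};
        {error, Reason} ->
            {error, Reason}
    end.
```

Passing 0 as the port lets the OS pick a free one, which is convenient for a quick check in the shell. Whether smaller buffers are acceptable depends on the message sizes the server has to handle.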

Spawn many processes erlang

I want to measure the performance of my database by measuring the time taken to do something as the number of processes increases. The intention is to plot a graph of performance vs. number of processes afterwards. Does anyone have an idea how? I am a beginner in Erlang, please help.
Assuming your database is mnesia, this should not be hard. One way would be to have a write function and a read function. However, note that there are several activity access contexts in mnesia. To test write times, you should NOT use the transaction context, because it returns to the calling process immediately, even before a disc write has occurred. For disc writes, it is important to look at the context called sync_transaction. Here is an example:
write(Record)->
Fun = fun(R)-> mnesia:write(R) end,
mnesia:activity(sync_transaction,Fun,[Record],mnesia_frag).
The function above will return only when all active replicas of the mnesia table have committed the record to the disc file. Hence, to test the speed as the number of processes increases, you need a record generator, a process spawner, the write function and finally a timing mechanism. For timing, there are the built-in functions timer:tc/1, timer:tc/2 and timer:tc/3, which return the time (in microseconds) it took to execute a given function to completion. To cut a long story short, this is how I would do it:
-module(stress_test).
-compile(export_all).
-define(LIMIT,10000).
-record(book,{
isbn,
title,
price,
version}).
%% ensure this table is {type,bag}
-record(write_time,{
isbn,
num_of_processes,
write_time
}).
%% Assuming table (book) already exists
%% Assuming mnesia running already
start()->
ensure_gproc(),
tv:start(),
spawn_many(?LIMIT).
spawn_many(0)-> ok;
spawn_many(N)->
spawn(?MODULE,process,[]),
spawn_many(N - 1).
process()->
gproc:reg({n, l,guid()},ignored),
timer:apply_interval(timer:seconds(2),?MODULE,write,[]),
receive
<<"stop">> -> exit(normal)
end.
total_processes()->
proplists:get_value(size,ets:info(gproc)) div 3.
ensure_gproc()->
case lists:keymember(gproc,1,application:which_applications()) of
true -> ok;
false -> application:start(gproc)
end.
guid()->
random:seed(now()),
MD5 = erlang:md5(term_to_binary([random:uniform(152629977),{node(), now(), make_ref()}])),
MD5List = lists:nthtail(3, binary_to_list(MD5)),
F = fun(N) -> f("~2.16.0B", [N]) end,
L = [F(N) || N <- MD5List],
lists:flatten(L).
generate_record()->
#book{isbn = guid(),title = guid(),price = guid()}.
write()->
Record = generate_record(),
Fun = fun(R)-> ok = mnesia:write(R),ok end,
%% Here is now the actual write we measure
{Time,ok} = timer:tc(mnesia,activity,[sync_transaction,Fun,[Record],mnesia_frag]),
%% Then we save that time and the number of processes
%% at that instant
NoteTime = #write_time{
isbn = Record#book.isbn,
num_of_processes = total_processes(),
write_time = Time
},
mnesia:activity(transaction,Fun,[NoteTime],mnesia_frag).
Now there are dependencies here, especially gproc: download it and build it into your Erlang lib path. To run this, just call stress_test:start(). The table write_time will help you draw a graph of the number of processes against the time taken to write. As the number of processes increases from 0 to the upper limit (?LIMIT), we note the time taken to write a given record at the given instant, and we also note the number of processes at that time.
UPDATE
f(S)-> f(S,[]).
f(S,Args) -> lists:flatten(io_lib:format(S, Args)).
That is the missing function. Apologies. Remember to study the table write_time using the application tv; it opens a window in which you can examine the mnesia tables. Use this table to see increasing write times (decreasing performance) as the number of processes increases over time. One element I have left out is recording the actual time of the write using time(), which may be an important parameter. You may add it to the definition of the write_time table.
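If you only want the shape of the measurement without mnesia and gproc, the same idea can be sketched with plain processes: run the operation once in each of N concurrent processes, time it with timer:tc/1, and record the average for each N. The module scale_bench and its functions are hypothetical names for this sketch; Fun stands in for whatever database call you want to measure.

```erlang
-module(scale_bench).
-export([run/2, avg_micros/2]).

%% Run Fun once in each of N concurrent processes and return the
%% average time per call in microseconds.
avg_micros(Fun, N) when N > 0 ->
    Parent = self(),
    Pids = [spawn(fun() ->
                      {Micros, _Result} = timer:tc(Fun),
                      Parent ! {self(), Micros}
                  end)
            || _ <- lists:seq(1, N)],
    %% collect exactly one timing from every worker
    Times = [receive {Pid, T} -> T end || Pid <- Pids],
    lists:sum(Times) / N.

%% Produce [{NumProcesses, AvgMicros}] rows, ready for plotting.
run(Fun, ProcessCounts) ->
    [{N, avg_micros(Fun, N)} || N <- ProcessCounts].
```

For example, scale_bench:run(fun() -> stress_test:write() end, [1, 10, 100, 1000]) would give one row per process count, assuming the write/0 function above is exported and mnesia is running.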
Also look at http://wiki.basho.com/Benchmarking.html
You might also look at Tsung: http://tsung.erlang-projects.org/

Rate-limited event handler in erlang/OTP

I have a data source that produces points at a potentially high rate, and I'd like to perform a possibly time-consuming operation on each point; but I would also like the system to degrade gracefully when it becomes overloaded, by dropping excess data points.
As far as I can tell, using a gen_event will never skip events. Conceptually, what I would like the gen_event to do is to drop all but the latest pending events before running the handlers again.
Is there a way to do this with standard OTP ? or is there a good reason why I should not handle things that way ?
So far the best I have is using a gen_server and relying on the timeout to trigger the expensive events:
-behaviour(gen_server).
init(_Args) ->
    {ok, Pid} = gen_event:start_link(),
    {ok, {Pid, none}}.
handle_call({add, H, A}, _From, {Pid, Data}) ->
    {reply, gen_event:add_handler(Pid, H, A), {Pid, Data}}.
handle_cast(Data, {Pid, _OldData}) ->
    %% the timeout is the third element of the return tuple, not part of the state
    {noreply, {Pid, Data}, 0}. % set timeout to 0
handle_info(timeout, {Pid, Data}) ->
    gen_event:sync_notify(Pid, Data),
    {noreply, {Pid, Data}}.
Is this approach correct ? (esp. with respect to supervision ? )
I can't comment on supervision, but I would implement this as a queue with expiring items.
I've implemented something that you can use below.
I made it a gen_server; when you create it you give it a maximum age for old items.
Its interface is that you can send it items to be processed and you can request items that have not been dequeued. It records the time at which it receives every item. Every time it receives an item to be processed, it checks all the items in the queue, dequeueing and discarding those that are older than the maximum age. (If you want the maximum age to be always respected, you can filter the queue before you return queued items.)
Your data source will cast data ({process_this, Anything}) to the work queue and your (potentially slow) consumer processes will call (gimme) to get data.
-module(work_queue).
-behavior(gen_server).
-export([init/1, handle_cast/2, handle_call/3]).
init(DiscardAfter) ->
{ok, {DiscardAfter, queue:new()}}.
handle_cast({process_this, Data}, {DiscardAfter, Queue0}) ->
Instant = now(),
Queue1 = queue:filter(fun({Stamp, _}) -> not too_old(Stamp, Instant, DiscardAfter) end, Queue0),
Queue2 = queue:in({Instant, Data}, Queue1),
{noreply, {DiscardAfter, Queue2}}.
handle_call(gimme, From, State = {DiscardAfter, Queue0}) ->
case queue:is_empty(Queue0) of
true ->
{reply, no_data, State};
false ->
{{value, {_Stamp, Data}}, Queue1} = queue:out(Queue0),
{reply, {data, Data}, {DiscardAfter, Queue1}}
end.
delta({Mega1, Unit1, Micro1}, {Mega2, Unit2, Micro2}) ->
((Mega2 - Mega1) * 1000000 + Unit2 - Unit1) * 1000000 + Micro2 - Micro1.
too_old(Stamp, Instant, DiscardAfter) ->
delta(Stamp, Instant) > DiscardAfter.
Little demo at the REPL:
c(work_queue).
{ok, PidSrv} = gen_server:start(work_queue, 10 * 1000000, []).
gen_server:cast(PidSrv, {process_this, <<"going_to_go_stale">>}),
timer:sleep(11 * 1000),
gen_server:cast(PidSrv, {process_this, <<"going to push out previous">>}),
{gen_server:call(PidSrv, gimme), gen_server:call(PidSrv, gimme)}.
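If what you want is literally "drop all but the latest pending point", a plain process can also do this by draining its own mailbox with a zero timeout before running the expensive step. This is a minimal sketch outside gen_event and gen_server; the module name latest_only, the {point, P} message shape and the Handler fun are assumptions for the illustration.

```erlang
-module(latest_only).
-export([start/1, loop/1, drain/1]).

%% Keep only the newest {point, P} currently waiting in the mailbox.
%% `after 0` makes the receive return at once when nothing matches.
drain(Latest) ->
    receive
        {point, Newer} -> drain(Newer)
    after 0 ->
        Latest
    end.

%% Wait for a point, discard any backlog behind it, then run the
%% (possibly slow) Handler once on the most recent value.
loop(Handler) ->
    receive
        {point, P} ->
            Handler(drain(P)),
            loop(Handler);
        stop ->
            ok
    end.

start(Handler) ->
    spawn(?MODULE, loop, [Handler]).
```

While Handler is busy, new points pile up in the mailbox; on the next iteration all but the last one are skipped. The process would still need to be wrapped (for example with proc_lib) before it fits under a supervisor.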
Is there a way to do this with standard OTP ?
No.
is there a good reason why I should not handle things that way ?
No, timing out early can increase the performance of the entire system. Read about how here.
Is this approach correct ? (esp. with respect to supervision ? )
No idea, you haven't provided the supervision code.
As a bit of extra information to your first question:
If you can use 3rd party libraries outside of OTP, there are a few out there that can add preemptive timeouts, which is what you are describing.
There are two that I am familiar with: the first is dispcount, and the second is chick (I'm the author of chick; I'll try not to advertise the project here).
Dispcount works really well for single resources that only have a limited number of jobs that can run at the same time, and it does no queuing. You can read about it here (warning: lots of really interesting information!).
Dispcount didn't work for me because I would have had to spawn 4000+ pools of processes to handle the number of different queues inside my app. I wrote chick because I needed a way to dynamically increase and decrease my queue length, as well as being able to queue up requests and deny others, without having to spawn 4000+ pools of processes.
If I were you I would try out dispcount first (as most solutions do not need chick), and then, if you need something more dynamic than a pool that can respond to a certain number of requests, try out chick.
