I ran into a problem while using lager.
In the lager source code, in the file lager_backend_throttle.erl:
handle_event({log, _Message}, State) ->
    {message_queue_len, Len} = erlang:process_info(self(), message_queue_len),
    case {Len > State#state.hwm, State#state.async} of
        {true, true} ->
            %% need to flip to sync mode
            lager_config:set(async, false),
            {ok, State#state{async=false}};
        {false, false} ->
            %% need to flip to async mode
            lager_config:set(async, true),
            {ok, State#state{async=true}};
        _ ->
            %% nothing needs to change
            {ok, State}
    end;
%% (other handle_event/2 clauses follow in the source)
When message_queue_len is greater than the threshold (hwm), it flips to sync mode.
When message_queue_len drops back to or below the threshold, it flips to async mode.
I would think that when there are too many messages, it should switch to async mode to process them more quickly. Why is lager designed this way?
My guess is that the message queue has a length limit, and if there are too many messages the process may crash. So does lager throttle the rate of sending messages by changing the sending mode?
I found the answer on GitHub:
Prior to lager 2.0, the gen_event at the core of lager operated purely in synchronous mode. Asynchronous mode is faster, but has no protection against message queue overload. In lager 2.0, the gen_event takes a hybrid approach: it polls its own mailbox size and toggles the messaging between synchronous and asynchronous depending on mailbox size.
{async_threshold, 20},
{async_threshold_window, 5}
This will use async messaging until the mailbox exceeds 20 messages, at which point synchronous messaging will be used, switching back to asynchronous when the size drops to 20 - 5 = 15.
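The threshold/window rule above is a form of hysteresis. The sketch below is a hypothetical pure function (the module throttle_sketch and next_mode/4 are invented for illustration and are not part of lager) that encodes only that rule, assuming the semantics of the quoted README: switch to sync once the mailbox exceeds Threshold, and switch back to async only once it has fallen to Threshold - Window.

```erlang
%% Hypothetical illustration of threshold/window hysteresis.
%% NOT lager's source; it only encodes the rule described in the README.
-module(throttle_sketch).
-export([next_mode/4]).

%% next_mode(QueueLen, Threshold, Window, CurrentMode) -> async | sync
next_mode(Len, Threshold, _Window, async) when Len > Threshold ->
    sync;                                  % too far behind: apply backpressure
next_mode(Len, Threshold, Window, sync) when Len =< Threshold - Window ->
    async;                                 % caught up enough: go fast again
next_mode(_Len, _Threshold, _Window, Mode) ->
    Mode.                                  % inside the band: keep current mode
```

With the defaults, a mailbox of 21 flips async to sync, and the mode only returns to async at 15 or fewer. The band between 15 and 20 keeps the mode from flapping on every message. As for why sync mode helps: as I understand lager's design, in sync mode the logging call does not return until the event has been handled, so busy callers are slowed to the rate the handler can sustain instead of piling more messages into its mailbox.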
If you wish to disable this behaviour, simply set it to 'undefined'. It defaults to a low number to prevent the mailbox growing rapidly beyond the limit and causing problems. In general, lager should process messages as fast as they come in, so getting 20 behind should be relatively exceptional anyway.
If you want to limit the number of messages per second allowed from error_logger, which is a good idea if you want to weather a flood of messages when lots of related processes crash, you can set a limit:
{error_logger_hwm, 50}
It is probably best to keep this number small.
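A per-second cap like error_logger_hwm can be pictured as a counter that resets every second. This is a hypothetical illustration of that idea (hwm_sketch, new/0 and check/3 are invented names), not how lager actually implements it; the caller is assumed to pass the current second, for example from erlang:monotonic_time(second), and Hwm is assumed to be positive.

```erlang
%% Hypothetical per-second rate cap; NOT lager's implementation.
-module(hwm_sketch).
-export([new/0, check/3]).

%% State is {Second, MessagesAllowedInThatSecond}.
new() ->
    {undefined, 0}.

%% check(NowSecond, Hwm, State) -> {allow | drop, NewState}
check(Now, Hwm, {Now, Count}) when Count >= Hwm ->
    {drop, {Now, Count}};                  % cap reached for this second
check(Now, _Hwm, {Now, Count}) ->
    {allow, {Now, Count + 1}};             % same second, still under the cap
check(Now, _Hwm, {_OtherSecond, _Count}) ->
    {allow, {Now, 1}}.                     % a new second: reset the counter
```

Dropping (rather than queueing) the excess is what lets a logger weather a crash storm: the work per second stays bounded no matter how many related processes die at once.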
I want to know the state of an Erlang process while it is executing a receive ... after expression:
receive
    X ->
        ok
after 1000 ->
    ok
end
1. Is the process state running or waiting?
2. Does this process use CPU scheduler time?
3. If I have 120000 Erlang processes like this, each running code like this:
receive
    X ->
        ok
after 1000 ->
    ok
end
will this code be a bottleneck?
The process is just moving along with whatever comes after the receive expression.
For example, let's say we inline a request/response:
ask_foo(SomePID) ->
    Ref = make_ref(),
    SomePID ! {self(), Ref, why},
    receive
        {Ref, Answer} ->
            io:format("The answer: ~tp~n", [Answer])
    after
        1000 ->
            io:format("~p is too slow. Moving on...~n", [SomePID])
    end,
    io:format("I'll print this in any case, and then exit.").
receive blocks until it either receives a message that matches one of its receive clauses, or the timeout occurs -- whichever happens first. Then it continues on doing whatever else is in its code. Very often there is a single receive loop, but it is not uncommon to use a series of inline receive clauses for things that should block, like waiting on a fixed sequence of inputs from a user or something similar.
The "process's state" is not changing in terms of its state data at all. It is blocking, which means it is suspended until a message or a timeout occurs (erlang:process_info(Pid, status) reports such a process as waiting). But, unlike polling systems, this does not carry an overhead penalty, because the VM is managing the scheduling: the process doesn't have to wake itself up, it can safely block on receive.
You asked if this will be a bottleneck: No. No other processes are blocking, only this one. All other processes are executing on their own schedule, and they have nothing to do with this one. So when blocking on a receive you are only holding up the rest of the things this particular process is supposed to do. Whether or not that is a bottleneck becomes, therefore, an architectural question.
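To check the answers to questions 1 and 3 empirically, a sketch like the following (receive_status_sketch and its functions are hypothetical names) spawns N processes that each block in the same receive ... after 1000 expression as in the question, then asks the VM for each one's status with erlang:process_info/2 and counts how many are reported as waiting.

```erlang
%% Hypothetical demo: how does the VM report processes blocked in receive?
-module(receive_status_sketch).
-export([run/1]).

%% Spawn N waiters, give them time to reach receive, count those 'waiting'.
run(N) ->
    Pids = [spawn(fun waiter/0) || _ <- lists:seq(1, N)],
    timer:sleep(100),
    Statuses = [status(P) || P <- Pids],
    lists:foreach(fun(P) -> P ! wake_up end, Pids),   % let them finish early
    length([S || S <- Statuses, S =:= waiting]).

%% The same receive ... after 1000 shape as in the question.
waiter() ->
    receive
        _Any -> ok
    after 1000 ->
        ok
    end.

status(Pid) ->
    case erlang:process_info(Pid, status) of
        {status, S} -> S;
        undefined -> dead                             % already exited
    end.
```

Trying run(120000) should also work with default settings, since the default process limit is 262144 (raised with the +P emulator flag), though each blocked process still costs some memory while it waits.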
E.g.
make_orphan() ->
    P = spawn(...),
    ok.
Is there a way for P to receive a message some time after make_orphan returns? Or is P destined to haunt the system (using up precious resources) for all eternity, unless it exits on its own?
A straightforward way to "receive a message some time after make_orphan returns" is with a monitor.
make_orphan() ->
    Parent = self(),
    P = spawn(fun() -> monitor(process, Parent), ... end),
    ok.
P will then get a {'DOWN', Ref, process, Parent, Reason} message when Parent dies. Even if Parent has already exited before monitor/2 is called, P still gets the message, with the reason noproc.
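A runnable sketch of the monitor approach (orphan_sketch, demo/0 and child/2 are hypothetical names): the parent spawns a child and returns immediately, like make_orphan/0, and the child monitors the parent and reports the reason from its 'DOWN' message to an observer. Depending on timing, the reason is normal (the parent exited after monitor/2 was called) or noproc (it had already exited).

```erlang
%% Hypothetical demo of a child noticing that its parent has gone away.
-module(orphan_sketch).
-export([demo/0]).

%% Returns the Reason the child saw in its 'DOWN' message.
demo() ->
    Observer = self(),
    Parent = spawn(fun() ->
                       Me = self(),
                       spawn(fun() -> child(Me, Observer) end),
                       ok          % returns at once, like make_orphan/0
                   end),
    receive
        {child_saw_parent_down, Parent, Reason} -> Reason
    after 2000 ->
        timeout
    end.

child(Parent, Observer) ->
    Ref = monitor(process, Parent),
    receive
        {'DOWN', Ref, process, Parent, Reason} ->
            Observer ! {child_saw_parent_down, Parent, Reason}
    end.
```

In a real system the child would clean up and exit at this point instead of reporting back, which is what keeps it from lingering as an orphan.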
Communicate P to some process somewhere, register P in some way (register, global, gproc, pg2, some homebrew solution, etc.), have someone monitor it, etc. So sure, several ways. But a fundamental principle of an OTP program is that every process belongs to a supervision tree somewhere, so this becomes less of a problem.
Unless you are modeling a system that falls way outside the assumptions of OTP (like peer supervision among cellular automata) then you don't ever want to create the opportunity for orphans to exist. Orphan processes are the Erlang equivalent to memory leaks -- and that is never a good thing.
For some background information on some of the implications of writing OTP processes versus raw Erlang stuff where you're much more likely to leak processes, read the documentation for proc_lib and the "Sys and Proc_Lib" chapter of the OTP Design Principles docs.
I'm still in the learning phase of Erlang, so I might be wrong, but this is how I understood a process's message queue.
A process could be in its main receive loop, receiving certain types of messages, while later it could be put in a waiting loop to deal with a different kind of message. If the process received messages intended for the first loop while in the second loop, it would just leave them in the queue, ignore them for the time being, and only process the messages it can match in the loop it is currently in. When it enters the first receive loop again, it starts from the beginning of the queue and again processes the messages it can match.
Now my question is: if this is how Erlang works and I understood it correctly, what happens when a malicious process sends a bunch of messages that the process will never handle? Will the queue eventually overflow, crashing the process, or how should I deal with this? I'll type out an example to illustrate what I mean.
If a malicious program got hold of the Pid and did Pid ! {maliciousdata, LotsOfData} repeatedly, would those messages be filtered out, since they can never be processed, or would they just stack up in the queue?
startproc(InitValues) -> firstloop(InitValues).

firstloop(Values) ->
    receive
        retrieveinformation ->
            WaitingList = askforinformation(),
            retrieveloop(WaitingList, Values);
        dostuff ->
            NewValues = doingstuff(),
            firstloop(NewValues);
        sendmeyourdata ->
            sendingdata(Values),
            firstloop(Values)
    end.

retrieveloop([], Values) -> firstloop(Values);
retrieveloop(WaitingList, Values) ->
    receive
        {hereismyinformation, Id, MyInfo} ->
            NewValues = dosomethingwithinfo(Id, MyInfo),
            %% assumes WaitingList holds tuples keyed by Id
            retrieveloop(lists:keydelete(Id, 1, WaitingList), NewValues)
    end.
There is not a hard limit on message counts, and there is not a fixed amount of memory you are limited to, but you can certainly run out of memory if you have billions of messages (or a few super huge ones, maybe).
Long before you OOM because of a huge mailbox, you will notice either selective receives taking a long time (not that "selective receive" is a good pattern to follow much of the time...), or you will innocently peek into a process's mailbox and realize you've opened Pandora's box in your terminal.
This is usually treated as a throttling and monitoring issue in the Erlang world. If you aren't able to keep up and your problem is parallelizable then you need more workers. If you are maxing out your hardware then you need more efficiency. If you are still maxing out your hardware, can't get any more, and you're still overwhelmed then you need to decide how to implement pushback or load shedding.
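One simple form of load shedding is for a server to look at its own mailbox length before doing any work. The sketch below assumes a hypothetical request/reply protocol ({From, Ref, Request} in, {Ref, Reply} out) and an arbitrary MaxQueue limit; shed_sketch and its functions are invented names, and this illustrates the idea rather than replacing a proper overload-protection library.

```erlang
%% Hypothetical load-shedding server; names and protocol are illustrative.
-module(shed_sketch).
-export([start/1, call/2]).

start(MaxQueue) ->
    spawn(fun() -> loop(MaxQueue) end).

%% Synchronous request with a timeout so callers are never stuck forever.
call(Server, Request) ->
    Ref = make_ref(),
    Server ! {self(), Ref, Request},
    receive
        {Ref, Reply} -> Reply
    after 5000 ->
        {error, timeout}
    end.

loop(MaxQueue) ->
    receive
        {From, Ref, Request} when is_pid(From) ->
            {message_queue_len, Len} =
                erlang:process_info(self(), message_queue_len),
            Reply = case Len > MaxQueue of
                        true -> {error, overloaded};    % shed: too far behind
                        false -> {ok, handle(Request)}  % normal path
                    end,
            From ! {Ref, Reply},
            loop(MaxQueue);
        _Unexpected ->
            loop(MaxQueue)                              % drop junk
    end.

%% Placeholder for the real work; just echoes the request here.
handle(Request) ->
    Request.
```

Answering {error, overloaded} is cheap compared with doing the work, so the mailbox drains quickly, and the caller learns it should back off instead of waiting on a queue that keeps growing.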
Unfortunately there is no "message queue overflow": the queue will keep growing until the VM crashes with a memory allocation error.
The solution is to drop any invalid messages in the main loop: because of the blocking nature of your process, you should never receive a {hereismyinformation, _, _} message (or any other reply to askforinformation()) while in the main loop.
startproc(InitValues) -> firstloop(InitValues).

firstloop(Values) ->
    receive
        retrieveinformation ->
            WaitingList = askforinformation(),
            retrieveloop(WaitingList, Values); % I assume you meant to pass Values along
        dostuff ->
            NewValues = doingstuff(),
            firstloop(NewValues);
        sendmeyourdata ->
            sendingdata(Values),
            firstloop(Values);
        _ ->
            %% you can't get {hereismyinformation, _, _} here, so drop any invalid message
            firstloop(Values)
    end.

retrieveloop([], Values) -> firstloop(Values);
retrieveloop(WaitingList, Values) ->
    receive
        {hereismyinformation, Id, MyInfo} ->
            NewValues = dosomethingwithinfo(Id, MyInfo),
            retrieveloop(lists:keydelete(Id, 1, WaitingList), NewValues)
    end.
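To see the difference the catch-all clause makes, the sketch below (mailbox_sketch is a hypothetical module, and its {ping, From} protocol is invented for illustration) floods two loops with N junk messages: one that only matches {ping, From}, and one that also has a catch-all clause. It returns both mailbox lengths.

```erlang
%% Hypothetical demo: unmatched messages pile up unless a catch-all drops them.
-module(mailbox_sketch).
-export([demo/1]).

%% Flood both loops with N junk messages; return {StrictLen, LenientLen}.
demo(N) ->
    Strict = spawn(fun strict_loop/0),
    Lenient = spawn(fun lenient_loop/0),
    [P ! {maliciousdata, I} || P <- [Strict, Lenient], I <- lists:seq(1, N)],
    timer:sleep(100),                      % let the lenient loop drain
    Result = {queue_len(Strict), queue_len(Lenient)},
    exit(Strict, kill),
    exit(Lenient, kill),
    Result.

queue_len(Pid) ->
    {message_queue_len, Len} = erlang:process_info(Pid, message_queue_len),
    Len.

%% Only understands {ping, From}; anything else stays queued forever.
strict_loop() ->
    receive
        {ping, From} -> From ! pong, strict_loop()
    end.

%% Same protocol, but silently drops anything it does not understand.
lenient_loop() ->
    receive
        {ping, From} -> From ! pong, lenient_loop();
        _Unexpected -> lenient_loop()
    end.
```

With 1000 junk messages the strict loop is left holding all 1000, while the lenient loop's mailbox is empty again almost immediately.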
Unexpected messages are not really the problem, since they are easy to avoid; the real problem is a process queue that grows faster than it is processed. For that specific problem there is a nice framework for production systems called jobs.
I use ODBC to query a table from a database:
getTable(Ref, SearchKey) ->
    Q = "SELECT * FROM TestDescription WHERE NProduct = " ++ SearchKey,
    case odbc:sql_query(Ref, Q) of
        {selected, _ColNames, Data} ->
            %io:format("GetTable Query ok ~n"),
            {ok, Data};
        {error, _Reason} ->
            %io:format("Gettable Query error ~p ~n", [_Reason]),
            {stop, odbc_query_failed};
        _ ->
            io:format("Error Logic in getTable function ~n"),
            {error, unexpected_result}
    end.
This function returns a tuple that includes all the DB data. I then send it to another process:
OtherProcessPid ! {ok, Data}
It works fine with a small number of rows, but how about a very large number, greater than a million, say? Can erlang still work with it?
The question isn't "Can Erlang handle very large messages?" (it can), it is rather "are you ready to deal with the consequences of very large messages?"
All messages are copied (with the exception of some larger binaries): this means you have to prepare for some slowdowns if you're sending a lot of large messages, for memory use that is a lot less stable than with small messages, etc.
In the case of distributed Erlang, a very large message that needs to be 'uploaded' to a remote node might block the heartbeats used to know whether a VM is alive or not, if the delays are too short or the messages too large for how often you send them.
In any case, the solution is to measure what you can or can't deal with. There is no hardcoded limit that I know of regarding message size. Know that smaller messages are usually preferable as a general rule of thumb, though.
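Two practical follow-ups, both sketches under stated assumptions. To measure, erlang:external_size/1 returns the maximum byte size of a term in the external (distribution) format, which is a reasonable first estimate of what a message will cost. To keep each individual message small, the rows can be sent in batches: the module below (chunk_sketch and its functions are hypothetical names) sends a list in chunks of at most ChunkSize rows, followed by a done marker, all tagged with a reference the receiver already knows. Messages between two processes arrive in the order they were sent, so the receiver can simply append chunks until it sees done.

```erlang
%% Hypothetical helper: stream a large list of rows as bounded-size messages.
-module(chunk_sketch).
-export([send_rows/4, collect_rows/1]).

%% Send Rows to Pid in chunks of at most ChunkSize rows, then a done marker.
send_rows(Pid, Ref, Rows, ChunkSize) when is_integer(ChunkSize), ChunkSize > 0 ->
    send_chunks(Pid, Ref, Rows, ChunkSize).

send_chunks(Pid, Ref, [], _ChunkSize) ->
    Pid ! {Ref, done},
    ok;
send_chunks(Pid, Ref, Rows, ChunkSize) ->
    {Chunk, Rest} = take(ChunkSize, Rows, []),
    Pid ! {Ref, rows, Chunk},
    send_chunks(Pid, Ref, Rest, ChunkSize).

%% Split off up to N elements (lists:split/2 would fail on a short tail).
take(0, Rest, Acc) -> {lists:reverse(Acc), Rest};
take(_N, [], Acc) -> {lists:reverse(Acc), []};
take(N, [H | T], Acc) -> take(N - 1, T, [H | Acc]).

%% Receiver side: reassemble the chunks tagged with Ref, in arrival order.
collect_rows(Ref) ->
    collect_rows(Ref, []).

collect_rows(Ref, Acc) ->
    receive
        {Ref, rows, Chunk} -> collect_rows(Ref, [Chunk | Acc]);
        {Ref, done} -> lists:append(lists:reverse(Acc))
    after 5000 ->
        {error, timeout}
    end.
```

Chunking does not reduce the total amount of data copied; it bounds the size of each copy, so no single send monopolizes memory or a distribution link. If the receiver can work on rows incrementally instead of reassembling them, that bound also applies to its memory use.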
I've just discovered the MailboxProcessor in F# and its usage as a "state machine"... but I can't find much on the recommended usage of them.
For example, say I'm making a simple game with 100 on-screen enemies: should I use a MailboxProcessor to change enemy position and health, giving me 200 active MailboxProcessors?
Is there any clever thread management going on under the hood? Should I try to limit the number of active MailboxProcessors I have, or can I keep banging them out willy-nilly?
A MailboxProcessor for enemy simulation might look like this:
MailboxProcessor.Start(fun inbox ->
    async {
        while true do
            let! message = inbox.Receive()
            processMessage(message)
    })
It does not consume a thread while it waits for a message to arrive (the let! message = ... line). However, once a message arrives it will consume a thread (from the thread pool). If you have 100 mailbox processors that all receive a message simultaneously, they will all wake up and try to consume a thread. Since message processing here is CPU-bound, hundreds of mailbox processors will all wake up and start spawning (thread pool) threads. This does not perform well.
One situation mailbox processors excel in is when there are a lot of concurrent clients all sending messages to one processor (imagine several parallel web crawlers all downloading pages and sinking results into a queue). The on-screen enemies case appears different: it is many entities responding to a single source of messages (player movement/time ticks).
Another example where thousands of MailboxProcessors are a great solution is an I/O-bound MailboxProcessor:
MailboxProcessor.Start(fun inbox ->
    async {
        while true do
            let! message = inbox.Receive()
            match message with
            | _ -> // (the message pattern was elided in the original)
                do! AsyncWrite("something")
                let! response = AsyncResponse()
                ...
    })
Here, after receiving a message, the agent very quickly yields its thread but still needs to maintain state across asynchronous operations. This scales very well in practice: you can run thousands and thousands of such agents, and it is a great way to write a web server.
As per http://blogs.msdn.com/b/dsyme/archive/2010/02/15/async-and-parallel-design-patterns-in-f-part-3-agents.aspx you can bang them out willy-nilly. Try it! They use the ThreadPool. I have not tried this for a real-time GUI game app, but I would not be surprised if this is 'good enough'.
say I'm making a simple game with 100 on-screen enemies should I use a MailboxProcessor to change enemy position and health; giving me 200 active MailboxProcessor?
I don't see any reason to try to use MailboxProcessor for that. A serial loop is likely to be much simpler and faster.
Is there any clever thread management going on under the hood?
Yes, lots. But it is designed for asynchronous concurrent programming (particularly scalable IO), and your program isn't really doing that.
should I try and limit the amount of active MailboxProcessor I have or can I keep banging them out willy-nilly?
You can bang them out willy-nilly but they are far from optimized and performance is much worse than serial code.