MailboxProcessor usage guidelines? - f#

I've just discovered the MailboxProcessor in F# and its usage as a "state machine" ... but I can't find much on the recommended usage of them.
For example... say I'm making a simple game with 100 on-screen enemies. Should I use a MailboxProcessor to change each enemy's position and health, giving me 200 active MailboxProcessors?
Is there any clever thread management going on under the hood? Should I try to limit the number of active MailboxProcessors I have, or can I keep banging them out willy-nilly?
Thanks in advance,
JD.

A MailboxProcessor for enemy simulation might look like this:
MailboxProcessor.Start(fun inbox ->
    async {
        while true do
            let! message = inbox.Receive()
            processMessage(message)
    })
It does not consume a thread while it waits for a message to arrive (the let! message = inbox.Receive() line). However, once a message arrives it will consume a thread (from the thread pool). If you have 100 mailbox processors that all receive a message simultaneously, they will all attempt to wake up and consume a thread. Since message processing here is CPU bound, hundreds of mailbox processors will all wake up and start grabbing (thread pool) threads. This is not great for performance.
One situation mailbox processors excel in is when many concurrent clients all send messages to one processor (imagine several parallel web crawlers all downloading pages and sinking results to a single queue). The on-screen enemies case looks like the opposite: many entities responding to a single source of messages (player movement and time ticks).
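To make that "many producers, one agent" shape concrete, here is a minimal F# sketch along the lines of the crawler example; the CrawlMessage type and the fetchPage function are invented for the illustration and not taken from any library:

type CrawlMessage =
    | PageDownloaded of url: string * body: string
    | Stop

// One agent is the sink for results posted by many concurrent workers.
let sink =
    MailboxProcessor.Start(fun inbox ->
        let rec loop () = async {
            let! message = inbox.Receive()
            match message with
            | PageDownloaded (url, body) ->
                printfn "Got %d characters from %s" body.Length url
                return! loop ()
            | Stop -> return () }
        loop ())

// Many producers run in parallel and all post into the single agent.
let crawl (fetchPage: string -> Async<string>) urls =
    urls
    |> List.map (fun url -> async {
        let! body = fetchPage url          // hypothetical download function
        sink.Post(PageDownloaded(url, body)) })
    |> Async.Parallel
    |> Async.Ignore

Contention is concentrated on the one mailbox rather than spread over one agent per client, which is the shape the agent model handles well.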
Another example where thousands of MailboxProcessors are a great solution is the I/O-bound MailboxProcessor:
MailboxProcessor.Start(fun inbox ->
    async {
        while true do
            let! message = inbox.Receive()
            match message with
            | _ ->
                do! AsyncWrite("something")
                let! response = AsyncResponse()
                ...
    })
Here, after receiving a message, the agent yields its thread very quickly but still needs to maintain state across the asynchronous operations. This scales very well in practice: you can run thousands and thousands of such agents, and this is a great way to write a web server.

As per
http://blogs.msdn.com/b/dsyme/archive/2010/02/15/async-and-parallel-design-patterns-in-f-part-3-agents.aspx
you can bang them out willy-nilly. Try it! They use the ThreadPool. I have not tried this for a real-time GUI game app, but I would not be surprised if this is 'good enough'.
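A quick way to try this yourself, in the spirit of that post (the agent count and the trivial message handling below are just an example, not taken from the article):

// Start a large number of agents; each one waits on its own mailbox.
let agents : MailboxProcessor<string> list =
    [ for _ in 1 .. 10000 ->
        MailboxProcessor.Start(fun inbox ->
            async {
                while true do
                    let! msg = inbox.Receive()
                    ignore msg   // pretend to do a small amount of work
            }) ]

// Post one message to every agent; they all wake up on the thread pool.
agents |> List.iteri (fun i agent -> agent.Post(sprintf "hello %d" i))

Idle agents cost a little memory rather than a thread each, so creating them in bulk like this is cheap; the cost only shows up when many of them wake to do CPU-bound work at once.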

say I'm making a simple game with 100 on-screen enemies. Should I use a MailboxProcessor to change each enemy's position and health, giving me 200 active MailboxProcessors?
I don't see any reason to try to use MailboxProcessor for that. A serial loop is likely to be much simpler and faster.
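For comparison, a plain serial update in F# might look like the following sketch; the Enemy type and the movement/clamping rules are invented purely for illustration:

// Hypothetical per-enemy state, just for the example.
type Enemy = { mutable X: float; mutable Y: float; mutable Health: int }

// One serial pass per frame: no agents, no message passing, no thread hops.
let updateEnemies (dt: float) (enemies: Enemy[]) =
    for e in enemies do
        e.X <- e.X + 10.0 * dt                  // move right at a fixed speed
        if e.Health < 0 then e.Health <- 0      // clamp health; removal omitted

With 100 enemies this is a tight, cache-friendly loop that finishes in microseconds per frame, which is hard to beat with 200 agents waking up on the thread pool.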
Is there any clever thread management going on under the hood?
Yes, lots. But it is designed for asynchronous concurrent programming (particularly scalable I/O), and your program isn't really doing that.
Should I try to limit the number of active MailboxProcessors I have, or can I keep banging them out willy-nilly?
You can bang them out willy-nilly, but they are far from optimized for this workload, and performance will be much worse than serial code.


Related

Could you overflow the message queue of an Erlang process?

I'm still in the learning phase of Erlang, so I might be wrong, but this is how I understood a process's message queue.
A process could be in its main receive loop, receiving certain types of messages, and later it could be put into a waiting loop to deal with a different kind of message in that second loop. If the process receives messages intended for the first loop while it is in the second loop, it just leaves them in the queue, ignores them for the time being, and only processes the messages it can match against in the loop it is currently in. When it enters the first receive loop again, it starts from the beginning of the queue and again processes the messages it can match against.
Now my question would be: if this is how Erlang works and I understood it correctly, what happens when a malicious process sends a bunch of messages that the receiving process will never handle? Will the queue eventually overflow, resulting in a crash of the process, or how should I deal with this? I'll type out an example to illustrate what I mean.
If a malicious program got hold of the Pid and did Pid ! {malicioudata, LotsOfData} repeatedly, would those messages be filtered out, since they can never possibly be matched, or would they just stack up in the queue?
startproc() -> firstloop(InitValues).

firstloop(Values) ->
    receive
        retrieveinformation ->
            WaitingList=askforinformation(),
            retrieveloop(WaitingList);
        dostuff ->
            NewValues=doingstuff(),
            firstloop(NewValues);
        sendmeyourdata ->
            sendingdata(Values),
            firstloop(Values)
    end.

retrieveloop([],Values) -> firstloop(Values).
retrieveloop(WaitingList,Values) ->
    receive
        {hereismyinformation,Id,MyInfo} ->
            NewValues=dosomethingwithinfo(Id,MyInfo),
            retrieveloop(lists:remove(Id,1,WaitingList),NewValues)
    end.
There is not a hard limit on message counts, and there is not a fixed amount of memory you are limited to, but you can certainly run out of memory if you have billions of messages (or a few super huge ones, maybe).
Long before you OOM because of a huge mailbox, you will notice either selective receives taking a long time (not that "selective receive" is a good pattern to follow much of the time...), or you will innocently peek into a process's mail queue and realize you've opened Pandora's box in your terminal.
This is usually treated as a throttling and monitoring issue in the Erlang world. If you aren't able to keep up and your problem is parallelizable then you need more workers. If you are maxing out your hardware then you need more efficiency. If you are still maxing out your hardware, can't get any more, and you're still overwhelmed then you need to decide how to implement pushback or load shedding.
Unfortunately there is no "message queue overflow"; the queue will keep growing until the VM crashes due to a memory allocation error.
The solution is to drop any invalid messages in the main loop, because you are not supposed to receive any {hereismyinformation, _, _} messages there (nor the ones you wait for in askforinformation()) due to the blocking nature of your process:
startproc() -> firstloop(InitValues).

firstloop(Values) ->
    receive
        retrieveinformation ->
            WaitingList=askforinformation(),
            retrieveloop(WaitingList, Values); % I assume you meant retrieveloop/2 here
        dostuff ->
            NewValues=doingstuff(),
            firstloop(NewValues);
        sendmeyourdata ->
            sendingdata(Values),
            firstloop(Values);
        _ ->
            firstloop(Values) % you can't get {hereismyinformation, _, _} here, so we can drop any invalid message
    end.

retrieveloop([],Values) -> firstloop(Values).
retrieveloop(WaitingList,Values) ->
    receive
        {hereismyinformation,Id,MyInfo} ->
            NewValues=dosomethingwithinfo(Id,MyInfo),
            retrieveloop(lists:remove(Id,1,WaitingList),NewValues)
    end.
The real problem is not unexpected messages, which are easily avoided, but a process queue that grows faster than it can be processed. For that specific problem there is a nice jobs framework for production systems.

Why is lager designed this way?

I ran into a problem when using lager.
In the lager source code, in the lager_backend_throttle.erl file:
handle_event({log, _Message},State) ->
    {message_queue_len, Len} = erlang:process_info(self(), message_queue_len),
    case {Len > State#state.hwm, State#state.async} of
        {true, true} ->
            %% need to flip to sync mode
            lager_config:set(async, false),
            {ok, State#state{async=false}};
        {false, false} ->
            %% need to flip to async mode
            lager_config:set(async, true),
            {ok, State#state{async=true}};
        _ ->
            %% nothing needs to change
            {ok, State}
    end;
When message_queue_len is greater than the threshold, it flips to sync mode.
When message_queue_len is less than the threshold, it flips to async mode.
I would have thought that when there are too many messages, it should change the mode to async so that the messages get processed more quickly. Why is lager designed this way?
My guess at the reason is that the message queue has a limited length, and if there are too many messages the process may crash, so lager throttles down the speed of sending messages by changing the sending mode?
I found the answer on github.
Prior to lager 2.0, the gen_event at the core of lager operated purely in synchronous mode. Asynchronous mode is faster, but has no protection against message queue overload. In lager 2.0, the gen_event takes a hybrid approach: it polls its own mailbox size and toggles the messaging between synchronous and asynchronous depending on the mailbox size.
{async_threshold, 20},
{async_threshold_window, 5}
This will use async messaging until the mailbox exceeds 20 messages, at which point synchronous messaging will be used; it switches back to asynchronous when the size drops to 20 - 5 = 15.
If you wish to disable this behaviour, simply set it to 'undefined'. It defaults to a low number to prevent the mailbox growing rapidly beyond the limit and causing problems. In general, lager should process messages as fast as they come in, so getting 20 behind should be relatively exceptional anyway.
If you want to limit the number of messages per second allowed from error_logger, which is a good idea if you want to weather a flood of messages when lots of related processes crash, you can set a limit:
{error_logger_hwm, 50}
It is probably best to keep this number small.

Questions about the nextTuple method in the Spout of Storm stream processing

I am developing some data analysis algorithms on top of Storm and have some questions about the internal design of Storm. I want to simulate sensor data generation and processing in Storm, so I use a spout to push sensor data to the succeeding bolts at a constant time interval by sleeping inside the spout's nextTuple method. But from the experimental results, it appeared that the spout didn't push data at the specified rate. In the experiment, there was no bottleneck bolt in the system.
Then I checked some material about the ack and nextTuple methods of Storm. Now my doubt is: is the nextTuple method called only when the previous tuples have been fully processed and acked in the ack method?
If this is true, does it mean that I cannot set a fixed time interval at which to emit data?
Thanks a lot!
My experience has been that you should not expect Storm to make any real-time guarantees, including in your case the rate of tuple processing. You can certainly write a spout that only emits tuples on some time schedule, but Storm can't really guarantee that it will always call on the spout as often as you would like.
Note that nextTuple should be called whenever there is room available for more pending tuples in the topology. If the topology has free capacity, I would expect Storm to try to fill it up if it can with whatever it can get.
I had a similar use case, and the way I accomplished it was by using a tick tuple:
Config tickConfig = new Config();
tickConfig.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 15);
...
...
builder.setBolt("storage_bolt", new S3Bolt(), 4).fieldsGrouping("shuffle_bolt", new Fields("hash")).addConfigurations(tickConfig);
Then in my storage_bolt (note it's written in Python, but you will get the idea) I check whether the message is a tick tuple, and if it is, I execute my code:
def process(self, tup):
    if tup.stream == '__tick':
        # Your logic that needs to be executed every 15 seconds,
        # or whatever you specified in tickConfig.
        # NOTE: the maximum time is 600 s.
        storm.ack(tup)
        return

A "Temporal Pass" Module

I am trying to create a general module that collects data at an irregular interval. Data arrives at the left end as soon as it becomes available; this may be something like 100 times a second.
On the right end I want to be able to "plug in" n listeners, each with its own regular interval. For the purpose of simplification, let's say all with an interval of once per second.
Every listener registers a callback function that may or may not be asynchronous.
My problem is that if the callback function is synchronous, my "temporal pass" may hang. What is the best way to approach this? Should I spawn a process whose sole purpose is to pass along the data, and pay the price if the callback hangs?
            +-------------+    Data Out 1
  =======>  |Temporal Pass|  ==========>
  Data In   +-------------+ \\    Data Out 2
                             ++=======>
                              \\    Data Out n
                               ++=======>
Spawn a new process for each message; otherwise the pass-through process will wait until the synchronous calls are done. This is exactly the sort of problem the process model is meant to solve, and I do not see any other way to do it.
Spawning processes is not expensive, but not entirely free either. You may get a small performance boost by only spawning new processes for the synchronous callbacks. That will require some way of flagging each callback as either synchronous or asynchronous.
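The same idea in F# terms (to stay with the language this thread started from): a dispatcher agent hands each callback off to its own thread-pool job, so a slow synchronous callback cannot stall the pass-through loop. The Sample and Listener types below are invented for the illustration:

// Hypothetical data item and listener callback types.
type Sample = { Timestamp: System.DateTime; Value: float }
type Listener = Sample -> unit

let temporalPass (listeners: Listener list) =
    MailboxProcessor.Start(fun inbox ->
        let rec loop () = async {
            let! sample = inbox.Receive()
            for listener in listeners do
                // Fire-and-forget: a blocking callback only ties up its own
                // thread-pool job, never the dispatch loop itself.
                Async.Start(async { listener sample })
            return! loop () }
        loop ())

Whether you start a job for every callback (as above) or only for the ones flagged as synchronous is exactly the trade-off described in the answer.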

Is multiple-producer, single-consumer possible in a lockfree setting?

I have a bunch of threads that are doing lots of communication with each other.
I would prefer this be lock free.
For each thread, I want to have a mailbox where other threads can send it messages (but only the owner can remove messages). This is a multiple-producer single-consumer situation. Is it possible for me to do this in a lock-free / high-performance manner? (This is in the inner loop of a gigantic simulation.)
A lock-free multiple-producer single-consumer (MPSC) queue is one of the easiest lock-free algorithms to implement.
The most basic implementation requires a simple lock-free singly-linked list (SList) with only push() and flush(). The functions are available in the Windows API as InterlockedFlushSList() and InterlockedPushEntrySList() but these are very easy to roll on your own.
Multiple Producer push() items onto the SList using a CAS (interlocked compare-and-swap).
The Single Consumer does a flush() which swaps the head of the SList with a NULL using an XCHG (interlocked exchange). The Consumer then has a list of items in the reverse-order.
To process the items in order, you must simply reverse the list returned from flush() before processing it. If you do not care about order, you can simply walk the list immediately to process it.
Two notes if you roll your own functions:
1) If you are on a system with weak memory ordering (e.g. PowerPC), you need to put a "release" memory barrier at the beginning of the push() function and an "acquire" memory barrier at the end of the flush() function.
2) You can make the functions considerably simpler and more optimized, because the ABA issue with SLists occurs in the pop() function. You cannot have ABA issues with an SList if you use only push() and flush(). This means you can implement it as a single pointer, very similar to the non-lock-free code, and there is no need for an ABA-prevention sequence counter.
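As a rough sketch of that push()/flush() scheme, here it is in F# (to match the language this thread started with), using the .NET Interlocked primitives; the type and member names are my own, not a library API:

open System.Threading

[<AllowNullLiteral>]
type Node<'T>(value: 'T) =
    member val Value = value
    member val Next : Node<'T> = null with get, set

type MpscStack<'T>() =
    let mutable head : Node<'T> = null

    // Producers: CAS the new node onto the head until the swap succeeds.
    member _.Push(value: 'T) =
        let node = Node(value)
        let mutable swapped = false
        while not swapped do
            let old = head
            node.Next <- old
            swapped <- obj.ReferenceEquals(Interlocked.CompareExchange(&head, node, old), old)

    // Single consumer: atomically detach the whole list, then reverse it
    // so that items come back in arrival order.
    member _.Flush() : 'T list =
        let mutable node = Interlocked.Exchange(&head, null)
        let mutable items = []
        while not (isNull node) do
            items <- node.Value :: items
            node <- node.Next
        items

On .NET the Interlocked calls already act as full memory barriers, so the weak-ordering caveat above is taken care of; in C you would use the InterlockedPushEntrySList/InterlockedFlushSList pair or roll the same two operations with your compiler's atomics.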
Sure, if you have an atomic CompareAndSwap instruction:
for (i = 0; ; i = (i + 1) % MAILBOX_SIZE)
{
    if ((mailbox[i].owned == false) &&
        (CompareAndSwap(&mailbox[i].owned, true, false) == false))
        break;
}
mailbox[i].message = message;
mailbox[i].ready = true;
After reading a message, the consuming thread just sets mailbox[i].ready = false; mailbox[i].owned = false; (in that order).
Here's a paper from the University of Rochester illustrating a non-blocking concurrent queue. The algorithm described in the paper shows one technique for making a lockless queue.
You may want to look at Intel Threading Building Blocks; I recall attending a lecture by an Intel developer that mentioned something along those lines.
