Erlang/OTP How to notify parent process that child processes are idle and no messages in their mailbox - erlang

I would like to design a process hierarchy where there is a a parent process P which acts like a gatekeeper and delegates the work(messages/events from its client processes) to it's children processes C1,C2..Cn which collaborate with each other and may send the result back to P. The children processes cannot talk to any process outside, only P.
The challenge is that though P may have multiple messages from its clients, it should accept only one message, delegate the work to C1..Cn and ONLY accept the next message from its clients
when all its children processes are done(or idle) and there are no more messages circulating between C1 to Cn.
P finishes accepting messages from C1..Cn so that it can return the result to its clients
Constraints:
Idle for me is when they are waiting with a receive (blocking) or simply exited.
C1 to Cn are finite state machines. Some or all of them may send messages back to C. Or there may be no messages to be sent back to C. Even if no messages are sent back to C, C has to figure out that all of them are done with no messages between them.
If any of C1 to Cn have been pre-empted, then it is considered busy(this may be obvious but I thought I'll put it here for completion) and C will not receive the next message
Is there an OTP pattern or library which will do this for me (before I hack something?). I know that process_info can let me know if the mailbox of a process are empty and I could keep on checking the children's mailboxes from P but it would be unnecessary polling from P.
EDIT GENERAL: I am trying to implement a reactive variant of Flow Based Programming on the Erlang platform. This has the notion of 'hierarchical processes' or composites which themselves may contain composite processes until we reach some boxes of actual code...I am going to research(looking at monitor,process_info,process_flag) but I wanted to respond to your excellent answers
EDIT RECURSIVE PARENTS: Each of C1 and Cn can themselves be parent/composite processes. If I just spawn processes and let them exit immediately, I'll have to create the chain of Composites everytime as C1..Cn may themselves be composites (which spawn composites..and so on). Finally, when we reach a leaf box(which is not a composite node), they are supposed to be finite state machines.. so I'm not sure of spawning and making them exit quickly if the are FSMs.
EDIT TKOWAL: Since I am trying to create a generic parent/composite process, it does not know 'when' the task ends. All it does is relay the messages it receives from its children to it's siblings with the 'constraint' that it will not accept the next message from its client/siblings until its children are 'done'. The children C1..Cn may send not just one but many messages. I understand from your proposal, that wait_for_task_finish will stop blocking the moment it gets the first message. But more messages may be emitted too by P's children. P should wait for all messages. Also, having a task_end symbol will not work for the same reason(i.e. multiple messages possible from the children)

Given how inexpensive it is to start up Erlang processes, your gatekeeper could start new children for each incoming task, and then wait for them all to exit normally once they complete their work.
But in general, it sounds like you're looking for a process pool. There are a few of these already available, such as poolboy and sidejob. Pools can be harder to get right than you think, so I advise using an existing proven pool implementation before attempting to write your own.

After edits, this became entirely different question, so I am posting new answer.
If you are trying to write Flow Based Programming, then you are probably solving wrong problem. FBP is effective, because almost everything is asynchronous and you start processing next request immediately after you finished with previous one.
So, the answer is - don't wait for children to finish:
In FBP, there is no time dependencies between the components. So if I
have a chunk of data, it should be able to flow from one end of the
diagram to the other regardless of how any other pieces of data are
being handled. In order to program an FBP system, you have to minimize
your dependencies.
source
When creating parent and children, you know all the connections between blocks, so just configure children to send processed data directly to next block. For example: P1 has children C1 and C2. You send message to P1, it delegates it to C1, packet flows couple of times between C1 and C2 and after that, C1 or C2 sends it directly to P2.
Blocks should be stateless. They output should not depend on previous requests, so even if C1 and C2 are processing data from two different requests to P1 - it is OK. There could be situations, where P1 gets data packet D1 and then D2, but will output answers in different order R2 and then R1. It is also OK. You can use Erlang reference to tag messages and then check, which response is from which request.

I don't think, there is ready library for that, but it is really easy to hack, unless I missed something. Your P process should look like this:
ready_for_next_task() ->
receive
{task, Task, CallerPid} ->
send_task_to_workers(Task)
end,
wait_for_task_finish(CallerPid).
wait_for_task_finish(CallerPid) ->
receive
{task_end, Response} ->
CallerPid ! Response
end,
ready_for_next_task().
In wait_for_task_finish/1 you have only one clause for receive, so it will not accept next task, until current one is finished. If you are waiting for multiple responses from workers, you can simply add second clause to receive with some partial response and call wait_for_task_finish/1 recursively.
It is always better to have some indicator, that the processing ended, because you don't have guarantees on message delivery time. This means, that you could check, that all processes currently are waiting for message and think, that they ended processing, but actually, they did not started yet or one of them send message to other and you caught them before the second one had it in message box.
If the processes C1..Cn have only parts of actual work and don't know about the progress, than the gatekeeper P should know how many parts there were, receive all of them one by one and then call ready_for_next_task/1.

Related

How to guarantee message queue/order in Erlang?

If one server receives multiple requests from one process by using pid ! Msg, but the process time for each request is different, then how to guarantee the sender receives the reply in order?
From the Erlang FAQ:
10.8 Is the order of message reception guaranteed?
Yes, but only within one process.
If there is a live process and you send it message A and then message B, it's guaranteed that if message B arrived, message A arrived before it.
On the other hand, imagine processes P, Q and R. P sends message A to Q, and then message B to R. There is no guarantee that A arrives before B. (Distributed Erlang would have a pretty tough time if this was required!)
That is, if the server processes the requests in the order they arrive, and sends the responses in the order the requests were processed, then the sender will receive the responses in order.
the Erlang receive clause can do pattern matching. So what you can do is create a reference for each message that you want to receive and then pattern match on that reference.
Check out this gist if you look at line 26 you will see that the receive clause is waiting for a message with a specific pid. In this case the messages will arrive in an arbitrary order but by virtue of this receive, they will be put into order.

Passing variables through exit signals [Erlang]

I'm wondering if it's possible to send variables from a dying process to it's calling process. I have a process A that spawned another process B through spawn_link. B is about to die by calling exit(killed). I can catch this in A through {'EXIT', From, killed}, but I'd like to pass some variables from B to A before it dies. I can do this by sending a message from B to A right before it dies, but I'm wondering if this is a 'bad' thing to do. Because technically I'd be sending two messages from B to A. Right now, what I have looks like this:
B sends a message with values to A
A receives values and re-enters receive loop
B calls exit(killed)
A receives EXIT message and spawns another linked process
The idea is that B should always exist and when it gets killed, it should be 'resurrected' immediately. What seems like a better alternative in my opinion is to have something like exit(killed, [Variables]) and to catch it with {'EXIT', From, killed, [Variables]}. Is this possible? And if so, are there any reasons for not doing it? Having A store values for B when B hasn't even died yet seems like a bad move. I'd have to start implementing atomic actions to prevent problems with two linked processes dying at the same time. It also forces me to keep the variables in my receive loop.
What I mean is, if I could send values directly with the EXIT call, my loop would look like this:
loop() ->
receive ->
{'EXIT', From, killed, Variables} -> % spawn new linked process with variables
end.
But if I first need to receive a message, get into the loop again to then receive the exit message, I would get;
loop(Vars) ->
receive ->
{values, Variables} -> loop(Variables);
{'EXIT', From, killed} -> % spawn new linked process with variables
end.
This means I keep the list of variables long after I don't need them anymore and I need to enter my loop twice for what could be considered one action.
To answer your question directly: the exit reason can be any term, which means it can also be a tuple like exit({killed, Values}), so instead of receiving {'EXIT', From, killed, Values} you would received {'EXIT', From, {killed, Values}}.
But!
The way you are doing it now is not wrong. Its not particularly ugly, either. Sending a message (especially an asynchronous one) isn't some major operation to be minimized as much as possible, and neither is spawning/killing processes. If your way works for you, fine.
But! (again!)
Why are you doing this in the first place? Consider what it is about state that you need to be shuttling between two processes, one of which you are terminating just then? Should this value be a permanent entity held by the spawning process? Should it die with the worker? Should it be a quantity maintained by a third process and asked for as part of the worker's startup (a more general phrasing of what Łukasz Ptaszyński was getting at)?
I don't know the answers to those questions, because I don't know your program, but they are the things I would think about if I was finding it necessary to do this sort of work. In particular, if there is some base value that process A must seed process B with for it to work, and the next version of the base value is dependent on something process B does, then process B should be returning it as a part of its processing, not as a part of its shutdown.
This seems like a minor semantic difference, but its important to think about. You may find that you shouldn't be terminating B at all, or that you really need A to manage a directory for several concurrent B's and they should seed themselves as they move along, or whatever. You might even find that this means A should be spawning B as a synchronous, monitored operation, not an asynchronous linked one, and the whole herd of processes should be spawned as a complex of multiple managed A-B pairs! I don't know the answers in your case, but these are the things that come to mind on reading what you are doing.
I think you can try this method:
main()->
ParentPid = self(),
From = spawn_link(?MODULE, child, [ParentPid]),
receive
{'EXIT', From, Reason} ->
Reason
end.
child(ParentPid) ->
Value = 2*2,
exit(ParentPid, {killed, Value}).
Please read this link about erlang:exit/2

What guarantees does erlang's "monitor" give?

While reading the ERTS user's guide, I found this section:
The only signal ordering guarantee given is the following. If an entity sends multiple signals to the same destination
entity, the order will be preserved. That is, if A sends a signal S1 to B, and later sends the signal S2 to B, S1 is
guaranteed not to arrive after S2.
I've also happened across this while doing further research googling:
Erlang Reference Manual, 13.5:
Message sending is asynchronous and safe, the message is guaranteed to eventually reach the recipient, provided that the recipient exists.
That seems very vague and I'd like to know what guarantees I can rely on in the following scenario:
A,B are processes on two different nodes.
Assume A does not crash and B was a valid node at some point.
A and B monitor each other.
A sends messages M1,M2,M3 to B
In the above scenario, is it possible that B receives M1,M3 (M2 is dropped),
without any sort of 'DOWN'/'EXIT'/heartbeat timeout being received at A?
There are no other guarantees other than the ordering guarantee. Note that by default you don't even know who the sender is, unless the sender encodes this in the message.
Your example could happen:
A sends M1 and M2
B receives M1
The node on which B resides gets disconnected
The node on which B resides comes up again
A sends M3 to B
B receives M3
M2 can be lost on the network link in this scenario. It is highly unlikely this happens, but it can happen. The usual trick is to have some kind of notion of such errors. Either by having a timeout trigger, or by monitoring the node or Pid which is the recipient of the message.
Updated scenario:
In the updated scenario, provided I read it correctly, then A would get a 'DOWN' style message at some point, and likewise, it would get a message telling you that the node is up again, if you monitor the node.
Though often, such things are better modeled using an idempotent protocol if at all possible.
Reading through the erlang mailing-list and the academic faq, it seems like there are a few guarantees provided by the ERTS implementation, however I was not able to determine whether or not they are guaranteed at a language/specification level, too.
If you assume TCP is "reliable", then the current implementation guarantees that
given A,B are processes on different nodes (&hosts) and A monitors B, A sends to B, assuming A doesn't crash, any message delivery failures* between the two nodes or host/node/process failures on B will lead to A getting a 'DOWN' message (or 'EXIT' in the case of links). [ See 1 and 2 ]
*From what I have read on the mailing-list thread , this property is almost entirely based on the fact that TCP is used, so "message delivery failure" means any situation where TCP decides that a failure has occurred/the connection needs to be closed.
The academic faq talks about this like it's also a language/specification level guarantee, however I couldn't yet find anything to back that up.

Is it possible implement Pregel in Erlang without supersteps?

Let's say we implement Pregel with Erlang. Why do we actually need supersteps? Isn't it better to just send messages from one supervisor to processes that represent nodes? They could just apply the calculation function to themselves, send messages to each other and then send a 'done' message to the supervisor.
What is the whole purpose of supersteps in concurrent Erlang implementation of Pregel?
The SuperStep concept as espoused by the Pregel model could be viewed as sort of a Barrier for parallel-y executing entities. At the end of each superstep, each worker, flushes it state to the persistent store.
The algorithm is check-pointed at the end of each SuperStep so that in case of failure, when a new node has to take over the function of a failed peer, it has a point to start from. Pregel guarantees that since the data of the node has been flushed to disk before the SuperStep started, it can reliably start from exactly that point.
It also in a way signifies "progress" of the algorithm. A pregel algorithm/job can be provided with a "max number of supersteps" after which the algorithm should terminate.
What you specified in your question (about superisors sending worker a calculation function and waiting for a "done") can definitely be implemented (although I dont think the current supervisor packaged with OTP can do stuff like that out of the box) but I guess the concept of a SuperStep is just a requirement of a Pregel model. If on the other hand, you were implementing something like a parallel mapper (like what Joe implements in his book) you wont need supersteps/

In what order does an Erlang process consume messages?

Are messages processed in a first-come-first-serve basis or are they sorted by timestamp or something like that?
Order of messages is preserved between a process and another one. Reading from the FAQ:
10.9 Is the order of message reception guaranteed?
Yes, but only within one process.
If there is a live process and you send it message A and then message
B, it's guaranteed that if message B arrived, message A arrived before
it.
On the other hand, imagine processes P, Q and R. P sends message A to
Q, and then message B to R. There is no guarantee that A arrives
before B. (Distributed Erlang would have a pretty tough time if this
was required!)
#knutin is right regarding how you can consume messages within a process. As an addition, note that you might use two subsequent receive statements to ensure that a certain message is consumed after another one:
receive
first ->
do_first()
end,
receive
second ->
do_second()
end
The receive statement is blocking. This will ensure that you never do_second() before you do_first(). The difference from #knutin's second solution is that, in that case, if something not important arrives just before an important one, you queue the important bit.
The mailbox is always kept in the order the messages arrived.
However, the order the messages are consumed is determined by your code.
If you have a plain process with a generic receive clause that receives anything, the order you get messages are the same as the order they arrived in.
loop() ->
receive
Any ->
do_something(Any),
loop()
end.
However, if you have a selective receive with match clauses, it will search the mailbox for messages of this specific type and consume the first matching message, effectively skipping non-matching messages. In the following example, if there are messages tagged as important in the queue, they will be processed before any other message. Note: Matching like this will search all messages in the queue, which is a problem for many messages. There has been some developments in this area, but I'm not up to speed.
loop() ->
receive
{important, Stuff} ->
do_something_important(Stuff),
loop();
Any ->
do_something(Any)
loop()
end.
to further define the answer, I would like to point the fact that, as stated above, messages that don't pattern match are skipped, but in reality they are simply put aside and then reintroduced in order (so first that any other message arrived after the not matching messages) for next receive pattern matching.
This problem really shows its worst when you have, for example, a gen_server behaviour module because in this case, having always the same pattern matching call scheme, messages not in scope are going to flood message queue unless you define a (ugly and error prone, IMHO) match-all pattern like:
receive
... -> ...;
... -> ...;
MatchAllPatterns -> ok.
end

Resources