What guarantees does Erlang's "monitor" give?

While reading the ERTS user's guide, I found this section:
The only signal ordering guarantee given is the following. If an entity sends multiple signals to the same destination entity, the order will be preserved. That is, if A sends a signal S1 to B, and later sends the signal S2 to B, S1 is guaranteed not to arrive after S2.
I also came across this while researching further:
Erlang Reference Manual, 13.5:
Message sending is asynchronous and safe, the message is guaranteed to eventually reach the recipient, provided that the recipient exists.
That seems very vague and I'd like to know what guarantees I can rely on in the following scenario:
A and B are processes on two different nodes.
Assume A does not crash and B was a valid node at some point.
A and B monitor each other.
A sends messages M1, M2, M3 to B.
In the above scenario, is it possible that B receives M1 and M3 (M2 is dropped) without any sort of 'DOWN'/'EXIT'/heartbeat timeout being received at A?

There are no guarantees other than the ordering guarantee. Note that by default you don't even know who the sender is, unless the sender encodes this in the message.
Your example could happen:
A sends M1 and M2
B receives M1
The node on which B resides gets disconnected
The node on which B resides comes up again
A sends M3 to B
B receives M3
M2 can be lost on the network link in this scenario. It is highly unlikely, but it can happen. The usual trick is to have some way of detecting such errors, either with a timeout or by monitoring the node or pid of the recipient of the message.
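A minimal sketch of the "monitor plus timeout" trick, assuming B cooperates by acknowledging each message with a hypothetical {ack, Ref} reply (the module, function names, and ack format are illustrative, not an OTP API). erlang:monitor/2 and erlang:demonitor/2 are standard BIFs; if the connection to B's node is lost, the 'DOWN' reason is noconnection.

```erlang
-module(reliable_send).
-export([send_and_wait/3, echo_loop/0]).

%% Sends Msg to Pid and waits for {ack, Ref}, a 'DOWN' message, or a timeout.
send_and_wait(Pid, Msg, Timeout) when is_pid(Pid) ->
    Ref = erlang:monitor(process, Pid),
    Pid ! {self(), Ref, Msg},
    receive
        {ack, Ref} ->
            %% Delivered and acknowledged; drop the monitor and any stray 'DOWN'.
            erlang:demonitor(Ref, [flush]),
            ok;
        {'DOWN', Ref, process, Pid, Reason} ->
            %% Reason is noconnection if the link to B's node was lost,
            %% noproc if B did not exist when the monitor was set up.
            {error, {down, Reason}}
    after Timeout ->
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.

%% A cooperating receiver (B) that acknowledges every message.
echo_loop() ->
    receive
        {From, Ref, _Msg} ->
            From ! {ack, Ref},
            echo_loop();
        stop ->
            ok
    end.
```

Note that {error, timeout} does not tell A whether B got the message, only that no acknowledgement arrived in time.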
Updated scenario:
If I read the updated scenario correctly, A would get a 'DOWN'-style message at some point and, if it monitors the node, it would likewise get a message telling it that the node is up again.
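One way to get both the "node down" and "node up again" notifications is net_kernel:monitor_nodes/1, which delivers {nodedown, Node} and {nodeup, Node} messages to the subscribing process. The sketch below (module and function names are hypothetical) only keeps track of the set of currently connected nodes; what A should do on nodeup, such as resending unacknowledged messages, is left out.

```erlang
-module(node_watch).
-export([start/0, handle/2]).

%% Spawns a watcher that subscribes to nodeup/nodedown messages.
start() ->
    spawn(fun() ->
                  ok = net_kernel:monitor_nodes(true),
                  loop(sets:new())
          end).

loop(Up) ->
    receive
        stop -> ok;
        Event -> loop(handle(Event, Up))
    end.

%% Pure state update: the set of nodes currently believed to be up.
handle({nodeup, Node}, Up) -> sets:add_element(Node, Up);
handle({nodedown, Node}, Up) -> sets:del_element(Node, Up);
handle(_Other, Up) -> Up.
```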
Often, though, such things are better modeled with an idempotent protocol, if at all possible.
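As a rough illustration of an idempotent protocol, assuming each message carries a unique ID chosen by the sender (a hypothetical convention, not something Erlang provides), the receiver can ignore duplicates. The sender may then safely resend after a timeout or 'DOWN' without anything being applied twice.

```erlang
-module(idempotent).
-export([apply_msg/2]).

%% State is {SeenIds, Total}. A message {Id, Amount} is applied at most
%% once, so a resent copy of an already-applied message is a no-op.
apply_msg({Id, Amount}, {Seen, Total} = State) ->
    case sets:is_element(Id, Seen) of
        true -> State;
        false -> {sets:add_element(Id, Seen), Total + Amount}
    end.
```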

Reading through the Erlang mailing list and the Academic FAQ, it seems that there are a few guarantees provided by the ERTS implementation; however, I was not able to determine whether they are also guaranteed at the language/specification level.
If you assume TCP is "reliable", then the current implementation guarantees the following:
given that A and B are processes on different nodes (and hosts), A monitors B, A sends to B, and A doesn't crash, any message delivery failure* between the two nodes, or any host/node/process failure on B, will lead to A getting a 'DOWN' message (or an 'EXIT' message in the case of links). [See 1 and 2]
*From what I have read on the mailing-list thread, this property is almost entirely based on the fact that TCP is used, so "message delivery failure" means any situation where TCP decides that a failure has occurred or that the connection needs to be closed.
The Academic FAQ talks about this as if it were also a language/specification-level guarantee; however, I haven't yet found anything to back that up.
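For the links case mentioned above, here is a sketch of how a process can turn B's exit signal into an ordinary {'EXIT', Pid, Reason} message by trapping exits. The watch/2 helper is hypothetical, while process_flag/2, link/1 and unlink/1 are standard BIFs. It only illustrates the notification mechanism; it does not demonstrate any delivery guarantee.

```erlang
-module(link_watch).
-export([watch/2]).

%% Links to Pid while trapping exits and waits up to Timeout ms for its
%% exit signal, which arrives as a normal message because of trap_exit.
watch(Pid, Timeout) ->
    OldFlag = process_flag(trap_exit, true),
    true = link(Pid),
    Result =
        receive
            {'EXIT', Pid, Reason} ->
                {exited, Reason}
        after Timeout ->
                unlink(Pid),
                %% Drop an 'EXIT' message that may have arrived just before unlink/1.
                receive {'EXIT', Pid, _} -> ok after 0 -> ok end,
                still_alive
        end,
    process_flag(trap_exit, OldFlag),
    Result.
```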

Related

How to guarantee message queue/order in Erlang?

If a server receives multiple requests from one process via Pid ! Msg, but the processing time for each request is different, how can I guarantee that the sender receives the replies in order?
From the Erlang FAQ:
10.8 Is the order of message reception guaranteed?
Yes, but only within one process.
If there is a live process and you send it message A and then message B, it's guaranteed that if message B arrived, message A arrived before it.
On the other hand, imagine processes P, Q and R. P sends message A to Q, and then message B to R. There is no guarantee that A arrives before B. (Distributed Erlang would have a pretty tough time if this was required!)
That is, if the server processes the requests in the order they arrive, and sends the responses in the order the requests were processed, then the sender will receive the responses in order.
The Erlang receive clause can do pattern matching, so what you can do is create a reference for each message that you want to receive and then pattern match on that reference.
Check out this gist: if you look at line 26, you will see that the receive clause waits for a message with a specific pid. In this case the messages arrive in an arbitrary order, but by virtue of this receive they are put back into order.
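Since the gist itself is not reproduced here, this is a sketch of the reference-tagging technique, assuming a server that replies with {Ref, Reply} (the protocol, module and server/0 are hypothetical). The server handles requests concurrently, so replies arrive in arbitrary order, but the selective receive on each reference returns them in request order.

```erlang
-module(ordered_replies).
-export([call_all/2, server/0]).

%% Sends every request tagged with a fresh reference, then collects the
%% replies in request order by selectively receiving on each reference.
call_all(Server, Requests) ->
    Refs = [begin
                Ref = make_ref(),
                Server ! {self(), Ref, Req},
                Ref
            end || Req <- Requests],
    [receive {Ref, Reply} -> Reply end || Ref <- Refs].

%% Hypothetical server: each request is handled in its own process, so
%% a slow request does not delay faster ones and replies can overtake.
server() ->
    receive
        {From, Ref, {sleep_then_echo, Ms, Value}} ->
            spawn(fun() -> timer:sleep(Ms), From ! {Ref, Value} end),
            server();
        stop ->
            ok
    end.
```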

Erlang/OTP How to notify parent process that child processes are idle and no messages in their mailbox

I would like to design a process hierarchy where there is a parent process P which acts like a gatekeeper and delegates the work (messages/events from its client processes) to its child processes C1, C2..Cn, which collaborate with each other and may send the result back to P. The child processes cannot talk to any process outside, only to P.
The challenge is that, though P may have multiple messages from its clients, it should accept only one message, delegate the work to C1..Cn, and ONLY accept the next message from its clients
when all its child processes are done (or idle) and there are no more messages circulating between C1 and Cn, and
P has finished accepting messages from C1..Cn so that it can return the result to its clients.
Constraints:
Idle, for me, is when they are waiting in a receive (blocking) or have simply exited.
C1 to Cn are finite state machines. Some or all of them may send messages back to P, or there may be no messages to be sent back to P at all. Even if no messages are sent back to P, P has to figure out that all of them are done and that no messages are in flight between them.
If any of C1 to Cn has been preempted, then it is considered busy (this may be obvious, but I thought I'd put it here for completeness) and P will not receive the next message.
Is there an OTP pattern or library which will do this for me (before I hack something together)? I know that process_info can tell me whether the mailbox of a process is empty, and I could keep checking the children's mailboxes from P, but that would be unnecessary polling from P.
EDIT GENERAL: I am trying to implement a reactive variant of Flow-Based Programming on the Erlang platform. This has the notion of 'hierarchical processes' or composites, which may themselves contain composite processes until we reach some boxes of actual code... I am going to research this (looking at monitor, process_info, process_flag), but I wanted to respond to your excellent answers.
EDIT RECURSIVE PARENTS: Each of C1..Cn can itself be a parent/composite process. If I just spawn processes and let them exit immediately, I'll have to create the chain of composites every time, as C1..Cn may themselves be composites (which spawn composites, and so on). Finally, when we reach a leaf box (which is not a composite node), the processes are supposed to be finite state machines, so I'm not sure about spawning them and making them exit quickly if they are FSMs.
EDIT TKOWAL: Since I am trying to create a generic parent/composite process, it does not know 'when' the task ends. All it does is relay the messages it receives from its children to its siblings, with the 'constraint' that it will not accept the next message from its clients/siblings until its children are 'done'. The children C1..Cn may send not just one but many messages. I understand from your proposal that wait_for_task_finish will stop blocking the moment it gets the first message, but P's children may emit more messages, and P should wait for all of them. Having a task_end symbol will not work either, for the same reason (i.e. multiple messages are possible from the children).
Given how inexpensive it is to start up Erlang processes, your gatekeeper could start new children for each incoming task, and then wait for them all to exit normally once they complete their work.
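A sketch of the spawn-per-task idea, assuming each work item can be handled by an independent function (run_task/2 is a hypothetical helper). spawn_monitor/1 gives one monitor per child, and P waits for a 'DOWN' message from every child before it would accept the next task.

```erlang
-module(per_task).
-export([run_task/2]).

%% Spawns one monitored child per work item and blocks until every child
%% has exited. Returns ok if all exited normally, otherwise {error, Reason}
%% with the last abnormal exit reason seen.
run_task(Fun, Items) ->
    Refs = [begin
                {_Pid, Ref} = spawn_monitor(fun() -> Fun(Item) end),
                Ref
            end || Item <- Items],
    wait_all(Refs, ok).

wait_all([], Result) ->
    Result;
wait_all([Ref | Rest], Result) ->
    receive
        {'DOWN', Ref, process, _Pid, normal} -> wait_all(Rest, Result);
        {'DOWN', Ref, process, _Pid, Reason} -> wait_all(Rest, {error, Reason})
    end.
```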
But in general, it sounds like you're looking for a process pool. There are a few of these already available, such as poolboy and sidejob. Pools can be harder to get right than you think, so I advise using an existing proven pool implementation before attempting to write your own.
After the edits, this became an entirely different question, so I am posting a new answer.
If you are trying to implement Flow-Based Programming, then you are probably solving the wrong problem. FBP is effective because almost everything is asynchronous and you start processing the next request immediately after you have finished with the previous one.
So the answer is: don't wait for the children to finish.
In FBP, there is no time dependencies between the components. So if I have a chunk of data, it should be able to flow from one end of the diagram to the other regardless of how any other pieces of data are being handled. In order to program an FBP system, you have to minimize your dependencies.
source
When creating the parent and children, you know all the connections between blocks, so just configure the children to send processed data directly to the next block. For example: P1 has children C1 and C2. You send a message to P1, it delegates it to C1, the packet flows a couple of times between C1 and C2, and after that C1 or C2 sends it directly to P2.
Blocks should be stateless. Their output should not depend on previous requests, so even if C1 and C2 are processing data from two different requests to P1, it is OK. There could be situations where P1 gets data packet D1 and then D2, but outputs the answers in a different order, R2 and then R1. That is also OK. You can use Erlang references to tag messages and then check which response belongs to which request.
I don't think there is a ready-made library for that, but it is really easy to hack together, unless I missed something. Your P process should look like this:
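```erlang
-module(gatekeeper).
-export([ready_for_next_task/0]).

%% Accept exactly one task, hand it to the workers, then wait for it to end.
ready_for_next_task() ->
    receive
        {task, Task, CallerPid} ->
            send_task_to_workers(Task),
            wait_for_task_finish(CallerPid)
    end.

%% Only {task_end, _} is matched, so new {task, ...} messages stay queued.
wait_for_task_finish(CallerPid) ->
    receive
        {task_end, Response} ->
            CallerPid ! Response
    end,
    ready_for_next_task().

%% Placeholder: hand Task to C1..Cn (not shown in the answer).
send_task_to_workers(_Task) ->
    ok.
```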
In wait_for_task_finish/1 there is only one receive clause, so P will not accept the next task until the current one is finished. If you are waiting for multiple responses from workers, you can simply add a second clause to the receive that matches a partial response and calls wait_for_task_finish/1 recursively.
It is always better to have some explicit indicator that the processing has ended, because you have no guarantees about message delivery time. You could check that all processes are currently waiting for a message and conclude that they have finished processing, when actually they have not started yet, or one of them has sent a message to another and you caught them before the message reached the second one's mailbox.
If the processes C1..Cn each hold only part of the actual work and don't know about the overall progress, then the gatekeeper P should know how many parts there are, receive all of them one by one, and then call ready_for_next_task/0.
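A sketch of that counting approach, assuming each child tags its partial result as {part, Ref, Result} with a reference that P created for the current task (the message format and module are hypothetical). The after clause is an added assumption so that P does not block forever if a part never arrives.

```erlang
-module(gatekeeper_parts).
-export([wait_for_parts/3]).

%% Collects exactly N {part, Ref, Result} messages for the current task.
%% Parts tagged with other references stay in the mailbox.
wait_for_parts(0, _Ref, Acc) ->
    {done, lists:reverse(Acc)};
wait_for_parts(N, Ref, Acc) when N > 0 ->
    receive
        {part, Ref, Result} ->
            wait_for_parts(N - 1, Ref, [Result | Acc])
    after 5000 ->
        %% Assumed policy: report how many parts are still missing.
        {timeout, N, lists:reverse(Acc)}
    end.
```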

Missing master heartbeat does not cause node to react in a CANopen system

I have a strange finding about the heartbeat protocol in CANopen. Maybe somebody else has seen something like this, and maybe it is supposed to work like this... Anyway, here's what it's about:
In CANopen there are two timeout-based life-guarding mechanisms: the first is node guarding, which I will not mention further, since it's considered old news.
The other one is called heartbeat. It is pretty simple: any participant on the network sends a regular message stating its node ID and its state. The interval between these messages is defined by object 0x1017sub0 and is called the heartbeat-producer-time. If it is set to zero, no heartbeat is sent.
Any other participant can then define a number of nodes it wants to find on the network plus the maximum time there may be between two consecutive heartbeat-messages. This information is stored in object 0x1016sub1..n as 32-bit entries for as many nodes as this particular node wants to listen to.
The entries consist of the node ID (bits 22 to 16) and the aforementioned maximum time that may elapse between heartbeats, called the heartbeat-consumer-time (in bits 15..0). Again, if the entry is zero, it is ignored.
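The bit layout described above can be checked with Erlang's bit syntax. This sketch treats bits 31..23 as unused and the consumer time as milliseconds; the module and function names are illustrative. Decoding the entry 0x000107D0 from the question gives node ID 1 and 2000 ms.

```erlang
-module(hb_entry).
-export([decode/1, encode/2]).

%% Splits a 32-bit 0x1016 sub-index value into {NodeId, TimeMs}:
%% node ID in bits 22..16, consumer time in bits 15..0.
decode(Entry) when is_integer(Entry), Entry >= 0, Entry =< 16#FFFFFFFF ->
    <<_Unused:9, NodeId:7, TimeMs:16>> = <<Entry:32>>,
    {NodeId, TimeMs}.

%% Builds an entry from a node ID (0..127) and a time in ms (0..65535).
encode(NodeId, TimeMs)
  when NodeId >= 0, NodeId =< 127, TimeMs >= 0, TimeMs =< 16#FFFF ->
    (NodeId bsl 16) bor TimeMs.
```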
As you may have gathered, there is no distinction between network-master (node ID 1) and slaves (node IDs 2 to 127).
So far the theory, now for my problem:
I configure one of the slave nodes in my network as a heartbeat consumer for the master, so there's an entry in object 0x1016sub1 that looks like this: 0x000107D0. This means that a heartbeat message from the master (node ID 0x01) is expected at least every two seconds (0x07D0 = 2000 ms).
I have observed that this works in two examples. If I send a master-heartbeat for a time and then stop, the node either returns to pre-operational mode or sends an appropriate emergency-message.
If I don't send any master-heartbeat-messages, I would expect that after I start the node (send it into operational mode) it takes at most two seconds for the node to either return to pre-operational mode or send an appropriate emergency-message or perhaps even both. But in the two examples I tried, nothing happened. If I never send any heartbeat, the node never expects one and just keeps on running.
The two examples are very different from each other, although I am not sure whether they perhaps use the same CANopen stack library.
Is there an explanation?
If you read the CANopen User Manual, section 1.3.1.6, page 39, you will notice that the heartbeat consumer is only activated upon receiving the first heartbeat from the producer. I would therefore assume that, since in your example the first heartbeat is never sent, the consumer is never activated.
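A simplified model of the behaviour described in that answer: the consumer stays inactive until the first heartbeat arrives and only then starts enforcing the timeout. This is an illustrative sketch with hypothetical names, not code from any CANopen stack; heartbeat_event stands for whatever the node does on expiry (an emergency message and/or an NMT state change).

```erlang
-module(hb_consumer).
-export([new/1, on_heartbeat/2, check/2]).

%% State: {inactive, TimeoutMs} | {active, TimeoutMs, LastSeenMs}.
new(TimeoutMs) ->
    {inactive, TimeoutMs}.

%% The first heartbeat activates the consumer; later ones refresh it.
on_heartbeat(NowMs, {inactive, TimeoutMs}) -> {active, TimeoutMs, NowMs};
on_heartbeat(NowMs, {active, TimeoutMs, _Last}) -> {active, TimeoutMs, NowMs}.

%% An inactive consumer never times out, which matches the observation
%% that a node which never saw a master heartbeat keeps running.
check(_NowMs, {inactive, _TimeoutMs}) ->
    ok;
check(NowMs, {active, TimeoutMs, Last}) when NowMs - Last > TimeoutMs ->
    heartbeat_event;
check(_NowMs, {active, _TimeoutMs, _Last}) ->
    ok.
```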

Network Traffic in AMQP QPID

QPID AMQP
I have a question regarding network traffic. Suppose I have a publisher on Machine A. The Qpid broker is running on Machine B. We have two subscribers, Machine C and Machine D (they both subscribe to the same topics). Now imagine a topology where
A-->B-->X-->C
        |
        D
(Publisher A is connected to B, and subscribers C and D are connected to the broker through an intermediate node X.)
A message published by A that matches the topics for C and D will be received by both. What I want to know is: will the edge B->X carry the message twice (once for B->X->C and a second time for B->X->D)? Or is the AMQP/Qpid framework intelligent enough to send the message once from B to X and then send copies to each individual subscriber (hence less network traffic on B->X)?
What I thought was that, since X knows nothing, if we have private subscription queues for each subscriber (or even a shared queue with browsing/copying of messages instead of consuming), the message will travel twice through B->X.
This question is not specific to Qpid. I would like to know the solutions for other broker-based (RabbitMQ) and brokerless messaging frameworks (ZeroMQ, LBM/UMS). I read in an article that ZeroMQ tries to provide a smarter solution (http://www.250bpm.com/pubsub#toc4), but it seems complicated: how would intermediate hops know when to send multiple copies or not? (I am not a networking expert, so I might be missing something obvious; any help would be really appreciated.)
I'm assuming X is another Qpid broker, connected to B through the 'federation' feature. That being the case, the message will not be transported twice from B to X.
There are different ways you can configure this, depending on other requirements for the scenario.
The first is to statically link X to B: you create a queue on B for X to subscribe to, bind that queue to the exchange in question such that messages for both C and D will match, then use qpid-route to create a bridge from that queue to the exchange on X. C and D now connect and bind to that exchange on X and will receive the messages published by A as expected. In this case the messages will always flow from B to X regardless of whether C or D are active. If you add another consumer, E, then you may need to statically add a binding to the bridged queue on B.
The second option is to use dynamic routing. This will automatically handle the propagation of binding information from X to B such that messages will only flow from B to X if they are needed by an active binding on X.
RabbitMQ will also only propagate a message across an intermediate link such as this once (and it will only get sent at all if some downstream consumer will actually end up seeing the message).

erlang : ordering of trace messages originating from a single process

This is a simple question to which I cannot find a clear answer:
Can one assume that the trace messages belonging to a single process are sent in the order in which the corresponding events occur?
(The icing on the cake would of course be the source where this is specified :) )
Thank you.
Messages from a process A to a process B are guaranteed to always be ordered. It would be right to assume the trace events will also be ordered.
This guarantee doesn't hold when many processes message another one: if A and C both message B and A fires before C, there is no guarantee that A's message will be there first. Similarly, if A messages both B and C, there is no guarantee that C won't get its message before B does.
This could cause confusion if IO is being done while tracing: IO goes through a specific process (the group leader) that acts as a server, so interleaving trace output with output about what is happening right now might give funny results.
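To observe (not prove) the single-process ordering, here is a sketch using erlang:trace/3 with the send flag. The calling process acts as the tracer and receives {trace, Pid, send, Msg, To} messages. The observe/1 helper and the {tick, I} payloads are hypothetical.

```erlang
-module(trace_order).
-export([observe/1]).

%% Traces the 'send' events of a fresh process that sends N numbered
%% messages, and returns the payloads in the order in which the trace
%% messages arrived at the tracer (the calling process).
observe(N) ->
    Dummy = spawn(fun() -> receive stop -> ok end end),
    Target = spawn(fun() ->
                           receive go -> ok end,
                           [Dummy ! {tick, I} || I <- lists:seq(1, N)],
                           ok
                   end),
    %% Tracing is enabled before Target starts sending.
    1 = erlang:trace(Target, true, [send]),
    Target ! go,
    Seen = [receive {trace, Target, send, Msg, Dummy} -> Msg end
            || _ <- lists:seq(1, N)],
    Dummy ! stop,
    Seen.
```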
