Message sending in Erlang under the hood

Message sending in Erlang is asynchronous: a send expression such as PidB ! msg, evaluated by a process PidA, immediately yields the result msg without blocking PidA. Its side effect is, naturally, that msg is sent to PidB.
Since this mode of message passing provides no delivery guarantees, a sender that needs to know whether a message was actually delivered must ask the recipient to confirm it. Of course, such confirmation is not always required.
This holds in both the local and the distributed case: in the latter, the sender cannot simply assume that the remote node is always available; in the former, where processes live on the same Erlang node, a process may still send a message to a non-existent process.
I am curious how the side-effect portion of !, i.e., the actual message sending, works at the VM level when the sender and recipient processes live on the same node. In particular, I would like to know whether the send operation completes before returning. By completes, I mean that for the specific case of local processes, the sender: (i) acquires a lock on the message queue of the recipient, (ii) writes the message directly into its queue, (iii) releases the lock and, (iv) finally returns.
I came across this post which I did not fully understand, although it seems to indicate that this could be the case.
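For illustration, the fire-and-forget semantics can be observed in the shell; this is just a minimal sketch, where the spawned process terminates immediately so the send targets a dead pid:

```erlang
%% The send expression returns its right-hand side at once,
%% even when the recipient no longer exists.
Pid = spawn(fun() -> ok end),    % this process terminates right away
timer:sleep(10),                 % give it time to die
false = is_process_alive(Pid),
msg = Pid ! msg.                 % still returns msg; the send is silently dropped
```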

Erik Stenman's The Beam Book, which explains many implementation details of the Erlang VM, answers your question in great detail in its "Lock Free Message Passing" section. The full answer is too long to copy here, but the short answer to your question is that yes, the sending process completely copies its message to a memory area accessible to the receiver. If you consult the book you'll find that it's more complicated than steps i-iv you describe in your question due to issues such as different send flags, whether locks are already taken by other processes, multiple memory areas, and the state of the receiving process.

Related

How to handle errors during asynchronous response delivery from SMSR back to SMDP/Operator?

In many scenarios the response with the result of the operation execution is delivered asynchronously to the operation initiator (SMDP or Operator). For example step (13) in 3.3.1 of SGP.02 v4.2:
(13) The SM-SR SHALL return the response to the “ES3.EnableProfile” function to SM-DP, indicating that the Profile has been enabled
It is not clear how the SM-SR should act if the call that carries the result of the operation fails. Should the SM-SR keep retrying that call, or is it acceptable to try just once and give up? Does this depend on the type of error that occurred during the call?
I'm concerned about the cases where the result is sent, and may even have been processed by the initiator, but the confirmation of that was not properly delivered back to the SM-SR. For the SM-SR to be required to retry, the initiator must be prepared to receive the same operation result again and handle it accordingly, that is, ignore it and just acknowledge.
But I can't see anything in SGP.02 v4.2 that specifies the behaviour of the SM-SR and SM-DP in this case. Any pointers to documentation specifying this are much appreciated.
In general it is not clear how the rollback to a known valid state should happen in this situation. Who is responsible for it (the SM-SR or the SM-DP, in this example of profile enabling)?
I'm not aware of any part of the specification defining this: neither SGP.02, SGP.01, nor the test specification SGP.11. There are operational requirements in the SAS certification for a continuous service, but this is not technically defined.
I have experience implementing the specification. Our approach was a message queue with Kafka and a retry policy. The specification says SHALL, which means: try very hard. Any implementation that drops the message after a single attempt is not very quality-oriented. The common sense in distributed (microservice-based) systems is that failures happen and have to be handled, so this assumption was adopted even though it is not expressed in the SGP specification.
An operation like reporting the status of a profile should be idempotent: receiving the same message twice should not be harmful. The MessageID and RelatesTo fields are also useful here. I assume the requests and responses are recorded in your system for auditing anyway.
In case you are sitting at the other end, facing a badly implemented SM-SR, and no status message arrives, the SM-DP can later call ES3.GetEIS to get the current status.
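The retry-with-idempotency approach described above could be sketched roughly like this (the function names, the delay list, and the shape of the delivery callback are all hypothetical, not taken from the specification):

```erlang
%% Bounded retry with backoff. The initiator must treat duplicate
%% deliveries of the same result as idempotent: ignore and acknowledge.
deliver_with_retry(_Deliver, []) ->
    {error, gave_up};                  % retries exhausted: persist and escalate
deliver_with_retry(Deliver, [Delay | Rest]) ->
    case Deliver() of
        ok ->
            ok;                        % initiator acknowledged the result
        {error, _Reason} ->
            timer:sleep(Delay),        % back off before the next attempt
            deliver_with_retry(Deliver, Rest)
    end.
```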
I have already contacted the authors directly. At the end of the document the email is mentioned:
It is our intention to provide a quality product for your use. If you
find any errors or omissions, please contact us with your comments.
You may notify us at prd#gsma.com

How do Erlang/Akka etc. send messages under the hood? Why doesn't it lead to deadlock?

Message sending is a useful abstraction, but it seems to be a bit misleading because it is not like letters sent through a post box that are literally moving through the system.
Similarly in Kafka they talk about messages but really it's just reading/writing to a distributed, append-only log.
In Erlang/Akka you actually copy the data rather than 'send it' so how does this work?
I was imagining something like Alice sends a message to Bob by
acquiring a lock to Alice's queue (i.e. mailbox)
write the message to the queue
release the lock
do something else
Given that you can send a message to anyone, how does this not result in a massive deadlock, with processes all waiting to message Alice? It seems like it might be useful to have multiple intermediate mailboxes for popular actors, so you can write to one of those and then go do something else faster.
The receiver is not locking its mailbox while it is waiting for a message; only when it checks it, briefly. If there is no matching message, it releases the lock and goes to sleep, then gets woken up when new messages arrive. Likewise, senders only need to acquire the lock while inserting the message. There is never any deadlock situation at this level.
Processes may still get deadlocked because of logical errors where both are expecting a message from the other at the same time, but that's a different matter, and the message passing style makes it less likely to end up in that situation, because there is no lock management to screw up on the user level.
As you mention, yes, it is useful to have intermediate mailboxes to reduce contention (a sender can add to the incoming side of the mailbox while a receiver is holding a lock to scan through the messages arrived so far), and that optimization is handled for you under the hood by the Erlang VM.
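As a toy demonstration of this (not the VM internals), many processes can message a single sleeping receiver without any of them blocking; each sender's ! returns immediately, so there is no lock cycle to deadlock on:

```erlang
%% One receiver, a thousand concurrent senders. No sender ever waits
%% on another sender; each just appends its message and moves on.
Receiver = spawn(fun Loop() ->
    receive
        {msg, N} -> io:format("got ~p~n", [N]), Loop()
    end
end),
[spawn(fun() -> Receiver ! {msg, N} end) || N <- lists:seq(1, 1000)],
ok.
```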

Erlang dead letter queue

Let's say my Erlang application receives an important message from the outside (through an exposed API endpoint, for example). Due to a bug in the application or an incorrectly formatted message the process handling the message crashes.
What happens to the message? How can I influence what happens to the message? And what happens to the other messages waiting in the process mailbox? Do I have to introduce a hierarchy of processes just to make sure that no messages are lost?
Is there something like Akka's dead letter queue in Erlang? Let's say I want to handle the message later - either by fixing the message or fixing the bug in the application itself, and then rerunning the message processing.
I am surprised how little information about this topic is available.
There is no information because there is no dead-letter queue. If your application crashed while processing the message, the message had already been received; why would it go into a dead-letter queue (if one existed)?
Such a queue would be a major scalability issue with little benefit: you would collect arbitrary messages that could not be handled, completely out of context.
If you need to make sure a message is processed, you usually use a mechanism that gives you a reply once the message has been processed, like a gen_server call.
And if your messages are so important that losing one would be a catastrophe, you should probably persist them in an external DB, because otherwise, if your computer crashes, what would happen to all the messages in transit?
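To make the gen_server suggestion concrete: gen_server:call/2,3 blocks the caller until the server replies (or a timeout fires, 5 seconds by default), so a returned value proves the message was both received and processed. A minimal sketch, where the request shape and do_work/1 are illustrative:

```erlang
%% Server side: reply only after the work is actually done.
handle_call({process, Payload}, _From, State) ->
    Result = do_work(Payload),            % hypothetical processing function
    {reply, {ok, Result}, State}.

%% Caller side: this line returns only once the server has replied,
%% or exits the caller with a timeout error.
%% {ok, Result} = gen_server:call(ServerPid, {process, Payload}).
```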

What happens in Erlang if return receipt never arrives?

I just happened to read Joe Armstrong's thesis and don't have much prior knowledge of Erlang. I wonder what happens if a delivery receipt for some message never arrives. What does the sending actor do? Does it send the message another time? This could confuse the recipient actor when it receives the same message a second time. It has to be able to tell that its receipt was not received and that the second message is therefore void.
That kind of problem always kept me away from solutions where message delivery is not transactional. I think I know the answer: the sending actor tells its supervising actor that something must be wrong when it doesn't obtain a receipt within a reasonable time, causing the supervisor to take some action (like restarting the involved actors). Is this correct? I see no other solution that doesn't result in theoretically possible infinite message sends.
Thanks for any answer,
Oliver
In Erlang, the sender of a message usually forgets it immediately after sending and continues its job. If an application needs an acknowledgement of message reception, you have to build your own protocol (or use an existing one). There are many good reasons for that.
One is that most of the time this handshake is not necessary. The highest risk of a message being ignored is that the receiving process no longer exists, or died in the meantime, and in that case there is very little interesting the sender could do anyway.
Also, the handshake is a blocking action, so there is a performance impact, and a risk of deadlock.
The acknowledgement should itself be a message, but this one should not be acknowledged, otherwise you create a never-ending loop of messages. Only the application can know what to do (for example, using a send with or without acknowledgement), and it is really easy to write this kind of function (or use a behaviour that implements it). For example:
    send_with_ack(To, Mess, Timeout, Ack) ->
        Ref = make_ref(),
        To ! {Mess, self(), Ref},
        receive
            {Ack, Ref} -> Ack
        after Timeout ->
            {error, timeout}
        end.

    receiving_process() ->
        ...
        receive
            {Pattern_matching_Mess, From, Ref} ->
                do_something(),
                From ! {msg_ack, Ref},  %% the Ack term for this kind of message is known by the receiver
                do_somethingelse();
            Mess1 ->
                do_otherthing()
        end,
        ...
With a little work, it is even possible to delegate the supervision of message delivery to a new process (a non-blocking check) and, using linked processes, force a crash of the sender if the timeout is reached.
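That delegation idea could be sketched as follows; send_with_watcher is a hypothetical name, and the design choice here is that the receiver replies to the watcher rather than to the original sender:

```erlang
%% Spawn a linked watcher that waits for the ack on the sender's behalf.
%% The sender continues immediately; if no ack arrives within Timeout,
%% the watcher exits abnormally and the link crashes the sender too.
send_with_watcher(To, Mess, Timeout, Ack) ->
    Ref = make_ref(),
    Watcher = spawn_link(fun() ->
        receive
            {Ack, Ref} -> ok                  % delivery confirmed, exit normally
        after Timeout ->
            exit({ack_timeout, To, Mess})     % propagates over the link
        end
    end),
    To ! {Mess, Watcher, Ref},                % receiver sends its ack to the watcher
    Ref.
```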

Race condition between trap_exit EXIT msg and common msg

Hi, the question is as follows:
Assume we have processes A and B which are linked, and process A's trap_exit flag is set to true. Let process B send a message to A and then exit:
    PidA ! 'msg',
    exit(reason).
What I want to know is whether we can be sure that process A will receive 'msg' first, and only afterwards {'EXIT', Pid, reason}. Can we predict the ordering of the messages? I couldn't find any proof in the documentation. I guess it will work that way, but I need some proof; I don't want a race condition here.
As to not leave this question hanging. This is the discussion in erlang-questions mailing list:
http://thread.gmane.org/gmane.comp.lang.erlang.general/66788
Long story short: all messages are signals (or all signals are messages), exits are seen as messages from the process, guaranteed to arrive in the same order they were sent.
Sounds like a code smell to me. Why do you need to rely on trap_exit? Have you thought of alternatives, e.g. proper monitoring?
I've got the O'Reilly Erlang programming book here, and in Chapter 4, in the section Message Passing, it says:
Messages are stored in the mailbox in the order in which they are delivered. If two messages are sent from one process to another, the messages are guaranteed to be received in the same order in which they are sent. This guarantee is not extended to messages sent from different processes, however, and in this case the ordering is VM-dependent.
However, in your case, I'm not sure the exit message actually comes from process B. It might originate somewhere in the bowels of the VM. If I wanted to be sure, I would actually have process A trigger the exit of process B when it receives your notification message instead.
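The signal-ordering guarantee discussed in this thread can be exercised directly in the shell. Since both the ordinary message and the exit signal originate from the same process B, they arrive at A in the order they were sent:

```erlang
%% A traps exits and is linked to B; B sends a message, then exits.
process_flag(trap_exit, true),
Self = self(),
B = spawn_link(fun() ->
        Self ! msg,       % ordinary message first...
        exit(reason)      % ...then the exit signal, from the same process
    end),
receive First -> First end,    % msg arrives first
receive Second -> Second end.  % then {'EXIT', B, reason}
```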
