Race condition between trap_exit EXIT msg and common msg - erlang

Hi, the question is as follows:
Assume we have two linked processes, A and B. Process A has its trap_exit flag set to true. Process B sends a message to A and then exits:
PidA ! 'msg',
exit(reason).
What I want to know is whether we can be sure that process A will receive 'msg' first and only afterwards {'EXIT', Pid, reason}. Can we predict the ordering of the messages? I can't find any proof in the documentation. I guess it works that way, but I need some evidence, since I don't want a race condition here.

So as not to leave this question hanging, this is the discussion on the erlang-questions mailing list:
http://thread.gmane.org/gmane.comp.lang.erlang.general/66788
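Long story short: all messages are signals (or all signals are messages), and an exit is seen as a message from the exiting process itself. Signals sent from one process to another are guaranteed to arrive in the order they were sent, so A gets 'msg' before {'EXIT', Pid, reason}.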
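The ordering claim can be checked with a small experiment. This is only a sketch: the module name exit_order and the helper process_a/0 are made up for illustration, and process A runs in a fresh process so that trap_exit is not switched on in the caller.

```erlang
-module(exit_order).
-export([run/0]).

%% Runs process A in a fresh process and reports what it saw.
run() ->
    Caller = self(),
    Tag = make_ref(),
    spawn(fun() -> Caller ! {Tag, process_a()} end),
    receive {Tag, Result} -> Result after 1000 -> timeout end.

%% Process A: traps exits, links to B, then takes the first two
%% messages from its mailbox in arrival order.
process_a() ->
    process_flag(trap_exit, true),
    A = self(),
    B = spawn_link(fun() -> A ! msg, exit(reason) end),
    First = receive M1 -> M1 end,
    Second = receive M2 -> M2 end,
    {First, Second =:= {'EXIT', B, reason}}.
```

If the ordering guarantee holds as described, exit_order:run() returns {msg, true} every time.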

Sounds like a code smell to me. Why do you need to rely on trap_exit? Have you thought of alternatives, e.g. proper monitoring?

I've got the O'Reilly Erlang programming book here, and in Chapter 4, in the section Message Passing, it says:
Messages are stored in the mailbox in the order in which they are delivered. If two messages are sent from one process to another, the messages are guaranteed to be received in the same order in which they are sent. This guarantee is not extended to messages sent from different processes, however, and in this case the ordering is VM-dependent.
However, in your case, I'm not sure the exit message actually comes from process B. It might originate somewhere in the bowels of the VM. If I wanted to be sure, I would actually have process A trigger the exit of process B when it receives your notification message instead.
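The more defensive variant suggested above, where A triggers B's exit only after the notification has arrived, might look like the following sketch. The module name, the stop message and the {notification, Pid} tuple are assumptions made for the example, not part of the original question.

```erlang
-module(ordered_exit).
-export([run/0]).

%% Runs process A in a fresh process so the caller does not trap exits.
run() ->
    Caller = self(),
    Tag = make_ref(),
    spawn(fun() -> Caller ! {Tag, process_a()} end),
    receive {Tag, Result} -> Result after 1000 -> timeout end.

process_a() ->
    process_flag(trap_exit, true),
    A = self(),
    B = spawn_link(fun() ->
            A ! {notification, self()},
            %% B stays alive until A explicitly tells it to stop.
            receive stop -> exit(reason) end
        end),
    %% A handles the notification first, then lets B exit.
    receive {notification, B} -> B ! stop end,
    receive {'EXIT', B, Reason} -> {notified_then_exited, Reason} end.
```

Here the order no longer depends on any delivery guarantee, because B cannot exit before A has processed the notification.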

Related

How do Erlang/Akka etc. send messages under the hood? Why doesn't it lead to deadlock?

Message sending is a useful abstraction, but it seems to be a bit misleading because it is not like letters sent through a post box that are literally moving through the system.
Similarly in Kafka they talk about messages but really it's just reading/writing to a distributed, append-only log.
In Erlang/Akka you actually copy the data rather than 'send it' so how does this work?
I was imagining something like this: Alice sends a message to Bob by
1. acquiring a lock on Bob's queue (i.e. mailbox)
2. writing the message to the queue
3. releasing the lock
4. doing something else
Given that you can send a message to anyone, how does this not result in a massive deadlock with processes all waiting to message Alice? It seems like it might be useful to have multiple intermediate mailboxes for popular actors so you can write to that and then go do something else faster.
The receiver does not lock its mailbox while it is waiting for a message; it only does so briefly when it checks it. If there is no matching message, it releases the lock and goes to sleep, then gets woken up when new messages arrive. Likewise, senders only need to acquire the lock while inserting the message. There is never any deadlock situation on this level.
Processes may still get deadlocked because of logical errors where both are expecting a message from the other at the same time, but that's a different matter, and the message passing style makes it less likely to end up in that situation, because there is no lock management to screw up on the user level.
As you mention, yes, it is useful to have intermediate mailboxes to reduce contention (a sender can add to the incoming side of the mailbox while a receiver is holding a lock to scan through the messages arrived so far), and that optimization is handled for you under the hood by the Erlang VM.
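The locking itself is internal to the VM and not visible from Erlang code, but the observable behaviour can be illustrated. In this sketch (module and function names are made up), the receiver sleeps in a selective receive that matches none of the queued integers, while the sender keeps inserting messages without ever waiting for it.

```erlang
-module(mailbox_demo).
-export([run/1]).

%% Sends 1..N to a receiver that is not yet consuming them, then
%% releases it and checks that everything arrived in order.
run(N) ->
    Caller = self(),
    Receiver = spawn(fun() ->
        %% Selective receive: the integers stay queued until 'go' arrives.
        receive go -> ok end,
        Caller ! {drained, drain(N, [])}
    end),
    [Receiver ! I || I <- lists:seq(1, N)],
    Receiver ! go,
    receive
        {drained, L} -> L =:= lists:seq(1, N)
    after 5000 ->
        timeout
    end.

%% Takes K integers from the mailbox in arrival order.
drain(0, Acc) -> lists:reverse(Acc);
drain(K, Acc) ->
    receive M when is_integer(M) -> drain(K - 1, [M | Acc]) end.
```

mailbox_demo:run(1000) should return true: none of the sends blocks on the sleeping receiver, and the messages are still there, in order, once it wakes up.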

Message sending in Erlang under the hood

Message sending in Erlang is asynchronous, meaning that a send expression such as PidB ! msg evaluated by a process PidA immediately yields the result msg without blocking the latter. Naturally, its side effect is that of sending msg to PidB.
Since this mode of message passing does not provide any message delivery guarantees, the sender must itself ascertain whether a message has been actually delivered by asking the recipient to confirm accordingly. After all, confirming whether a message has been delivered might not always be required.
This holds true in both the local and distributed cases: in the latter scenario, the sender cannot simply assume that the remote node is always available; in the local scenario, where processes live on the same Erlang node, a process may send a message to a non-existent process.
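The local case can be seen in a few lines. This is a minimal sketch with a made-up module name; a monitor is used only to be sure the target has really terminated before the send.

```erlang
-module(dead_send).
-export([run/0]).

run() ->
    %% Spawn a process that terminates immediately and wait for it to die.
    {Pid, MRef} = spawn_monitor(fun() -> ok end),
    receive {'DOWN', MRef, process, Pid, _} -> ok end,
    false = is_process_alive(Pid),
    %% The send still succeeds and evaluates to the message itself;
    %% the message is silently dropped.
    Pid ! hello.
```

dead_send:run() returns hello, with no error raised and no indication to the sender that nobody received the message.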
I am curious as to how the side-effect portion of !, i.e., message sending, works at the VM level when the sender and recipient processes live on the same node. In particular, I would like to know whether the sending operation completes before returning. By "completes" I mean that, for the specific case of local processes, the sender: (i) acquires a lock on the message queue of the recipient, (ii) writes the message directly into its queue, (iii) releases the lock and (iv) finally returns.
I came across this post which I did not fully understand, although it seems to indicate that this could be the case.
Erik Stenman's The Beam Book, which explains many implementation details of the Erlang VM, answers your question in great detail in its "Lock Free Message Passing" section. The full answer is too long to copy here, but the short answer to your question is that yes, the sending process completely copies its message to a memory area accessible to the receiver. If you consult the book you'll find that it's more complicated than steps i-iv you describe in your question due to issues such as different send flags, whether locks are already taken by other processes, multiple memory areas, and the state of the receiving process.

Erlang dead letter queue

Let's say my Erlang application receives an important message from the outside (through an exposed API endpoint, for example). Due to a bug in the application or an incorrectly formatted message the process handling the message crashes.
What happens to the message? How can I influence what happens to the message? And what happens to the other messages waiting in the process mailbox? Do I have to introduce a hierarchy of processes just to make sure that no messages are lost?
Is there something like Akka's dead letter queue in Erlang? Let's say I want to handle the message later - either by fixing the message or fixing the bug in the application itself, and then rerunning the message processing.
I am surprised how little information about this topic is available.
There is no information because there is no dead letter queue. If your application crashes while processing a message, the message has already been received, so why would it go to a dead letter queue (if one existed)?
Such a queue would be a major scalability issue with not much use (you would get arbitrary messages which couldn't be sent and would be totally out of context).
If you need to make sure a message is processed, you usually use a mechanism that gives you a reply once the message has been processed, like a gen_server call.
And if your messages are so important that losing them would be a catastrophe, you should probably persist them in an external DB, because otherwise what would happen to all the messages in transit if your computer crashes?
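As a rough illustration of the gen_server call approach: the caller learns whether its message was processed, and can decide itself what to do with a message that failed. The module name, the {process, Msg} request and the doubling logic are all invented for the example, and the server is started without a link so that its crash does not take the caller down.

```erlang
-module(important_server).
-behaviour(gen_server).
-export([start/0, process/2]).
-export([init/1, handle_call/3, handle_cast/2]).

start() -> gen_server:start(?MODULE, [], []).

%% Returns {ok, Reply} if the server handled Msg, or {error, Reason}
%% if the call failed (server crashed, was already dead, or timed out).
process(Server, Msg) ->
    try gen_server:call(Server, {process, Msg}, 5000) of
        Reply -> {ok, Reply}
    catch
        exit:Reason -> {error, Reason}
    end.

init([]) -> {ok, nostate}.

handle_call({process, N}, _From, State) when is_integer(N) ->
    {reply, N * 2, State};
handle_call({process, _Bad}, _From, _State) ->
    %% Simulates the bug or malformed message from the question.
    erlang:error(bad_message).

handle_cast(_Msg, State) -> {noreply, State}.
```

When process/2 returns {error, Reason}, the caller still holds the original message and can log it, persist it or retry it later, which is about as close to a dead letter queue as this approach gets.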

What happens in Erlang if return receipt never arrives?

I just happened to read the thesis of Joe Armstrong and don't have much prior knowledge of Erlang. I wonder what happens if a delivery receipt for some message never arrives. What does the sending actor do? Does it send the message again? That could confuse the recipient actor when it receives the same message a second time. It has to be able to tell that its receipt was not received and that the second message is therefore void.
These kinds of problems have always kept me away from solutions where message delivery is not transactional. I think I know the answer: when the sending actor doesn't obtain a receipt in a reasonable time, it tells its supervising actor that something must be wrong, causing the supervisor to take some action (like restarting the involved actors). Is this correct? I see no other solution that doesn't result in a theoretically infinite number of message sends.
Thanks for any answer,
Oliver
In Erlang, the sender of a message usually forgets it immediately after sending and continues with its job. If an application needs an acknowledgement that a message was received, you have to build your own protocol (or use an existing one). There are many good reasons for that.
One is that most of the time this handshake is not necessary. The most likely reason for a message to be ignored is that the receiving process does not exist anymore, or died in the meantime, and in that case there is very little of interest the sender can do.
Also, the handshake is a blocking action, so there is a performance impact and a risk of deadlock.
The acknowledgement should also be a message, but one that is not itself acknowledged, otherwise you create a never-ending loop of messages. Only the application knows what to do (for example, whether to send with or without acknowledgement), and it is really easy to write this kind of function (or use a behaviour that implements it). For example:
%% Send Mess to To and wait up to Timeout ms for {Ack, Ref} in return.
send_with_ack(To, Mess, Timeout, Ack) ->
    Ref = make_ref(),
    To ! {Mess, self(), Ref},
    receive
        {Ack, Ref} -> Ack
    after Timeout ->
        {error, timeout}
    end.
%% Skeleton of the receiving side; the elided parts depend on the application.
receiving_process() ->
    ...
    receive
        {Pattern_matching_Mess, From, Ref} ->
            do_something(),
            From ! {Ack, Ref}, %% Ack for this kind of message is known by the receiver
            do_somethingelse();
        Mess1 -> do_otherthing()
    end,
    ...
With a little work, it is even possible to delegate the monitoring of message delivery to a new process, which makes the check non-blocking, and, by using a linked process, to force a crash of the sender if the timeout is reached.
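The delegation described above might look roughly like this. It is a sketch under assumptions: the module name, the fixed ack atom and the acker/0 receiver are invented for illustration, and the receiver is expected to reply to the pid carried in the message (here the watcher, not the original sender).

```erlang
-module(ack_watch).
-export([send_async_with_ack/3, acker/0]).

%% Sends Mess to To and returns at once. A linked watcher waits for the
%% acknowledgement and exits abnormally on timeout, which takes the
%% sender down with it (unless the sender traps exits).
send_async_with_ack(To, Mess, Timeout) ->
    Ref = make_ref(),
    Watcher = spawn_link(fun() ->
        receive
            {ack, Ref} -> ok   %% normal exit, the sender is unaffected
        after Timeout ->
            exit({ack_timeout, Mess})
        end
    end),
    To ! {Mess, Watcher, Ref},
    Ref.

%% Hypothetical receiver that acknowledges every message it gets.
acker() ->
    receive
        {_Mess, From, Ref} ->
            From ! {ack, Ref},
            acker()
    end.
```

A sender that would rather handle the failure than crash can set trap_exit and react to the {'EXIT', Watcher, {ack_timeout, Mess}} message instead.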

spawn_monitor() and 'DOWN' messages

Is it (theoretically) possible for a process that has been spawn_monitor()'ed to exit (normally or with an error) without a 'DOWN' message being sent to the parent process? I have a very strange process leak, and it seems like some of the processes do not send a 'DOWN' message. I am using the Erlang package that comes with Ubuntu 9.10. Maybe it is a known bug?
You'll need to show some code. Monitoring is pretty core to the way erlang works.
It's hard to tell what your actual problem is since you're not describing what you're seeing, so I'll have to guess.
You're either not trying to receive the down message or the process isn't exiting.
If you have processes leaking, it sounds like they're not actually exiting.
You very well may be trying to build your own supervisor module. I'd strongly suggest using OTP's supervisor if you want sane process tree shutdown and/or restart.
Maybe you demonitored the process at some point?
Reading from the doc for erlang:demonitor/1:
Once erlang:demonitor(MonitorRef) has returned it is guaranteed that no {'DOWN', MonitorRef, _, _, _} message due to the monitor will be placed in the callers message queue in the future. A {'DOWN', MonitorRef, _, _, _} message might have been placed in the callers message queue prior to the call, though. Therefore, in most cases, it is advisable to remove such a 'DOWN' message from the message queue after monitoring has been stopped. erlang:demonitor(MonitorRef, [flush]) can be used instead of erlang:demonitor(MonitorRef) if this cleanup is wanted.
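For comparison with the leaking code, here is a minimal sketch of both cases (module and function names are made up). The first function receives the 'DOWN' message for a monitored process; the second cancels the monitor with the flush option, after which no 'DOWN' message for that reference should remain in the mailbox.

```erlang
-module(monitor_demo).
-export([wait_for_exit/0, cancel/0]).

%% The monitored process exits with reason boom; the parent gets 'DOWN'.
wait_for_exit() ->
    {Pid, Ref} = spawn_monitor(fun() -> exit(boom) end),
    receive
        {'DOWN', Ref, process, Pid, Reason} -> Reason
    after 1000 ->
        no_down_message
    end.

%% Demonitor with flush: any 'DOWN' already queued for Ref is removed.
cancel() ->
    {_Pid, Ref} = spawn_monitor(fun() -> ok end),
    true = erlang:demonitor(Ref, [flush]),
    receive
        {'DOWN', Ref, _, _, _} -> leftover
    after 0 ->
        clean
    end.
```

wait_for_exit() returns boom and cancel() returns clean. If a parent never sees 'DOWN' for a process that has really exited, a demonitor call somewhere on that reference, or a receive clause that never matches the message, is a more likely cause than a VM bug.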

Resources