Erlang dead letter queue - erlang

Let's say my Erlang application receives an important message from the outside (through an exposed API endpoint, for example). Due to a bug in the application or an incorrectly formatted message, the process handling the message crashes.
What happens to the message? How can I influence what happens to the message? And what happens to the other messages waiting in the process mailbox? Do I have to introduce a hierarchy of processes just to make sure that no messages are lost?
Is there something like Akka's dead letter queue in Erlang? Let's say I want to handle the message later - either by fixing the message or fixing the bug in the application itself, and then rerunning the message processing.
I am surprised how little information about this topic is available.

There is no information because there is no dead letter queue. If your application crashed while processing your message, the message had already been received, so why would it go to a dead letter queue (if one existed)?
Such a queue would be a major scalability issue with not much use (you would get arbitrary messages that couldn't be delivered and would be totally out of context).
If you need to make sure a message is processed, you usually use a mechanism that returns a reply once the message has been processed, such as a gen_server call.
And if your messages are so important that losing them would be a catastrophe, you should probably persist them in an external DB; otherwise, what would happen to all the messages in transit if your computer crashed?
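As a minimal sketch of the gen_server call approach mentioned above: the caller only treats a message as handled once it gets a reply, and it still holds the message if the worker crashes. The module name important_worker, the process/2 wrapper, and the "only integers are valid" rule are hypothetical choices for illustration, not part of the original answer.

```erlang
-module(important_worker).
-behaviour(gen_server).

-export([start/0, process/2]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Start an unlinked worker so that its crash does not kill the caller.
start() ->
    gen_server:start(?MODULE, [], []).

%% Returns {ok, Reply} if the worker answered, or {error, Reason} if it
%% crashed or timed out. In the error case the caller still has Msg and
%% can persist it, fix it, or retry later.
process(Pid, Msg) ->
    try gen_server:call(Pid, {process, Msg}, 5000) of
        Reply -> {ok, Reply}
    catch
        exit:Reason -> {error, Reason}
    end.

init([]) ->
    {ok, no_state}.

%% Hypothetical rule: only integers are valid. Anything else matches no
%% clause, so the worker crashes with function_clause.
handle_call({process, N}, _From, State) when is_integer(N) ->
    {reply, N * 2, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Calling important_worker:process(Pid, "bad") makes the worker crash, and the caller sees {error, Reason} instead of silently losing the message. Whether to store the failed message in a database or retry it is left to the application.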

Related

How do Erlang/Akka etc. send messages under the hood? Why doesn't it lead to deadlock?

Message sending is a useful abstraction, but it seems to be a bit misleading because it is not like letters sent through a post box that are literally moving through the system.
Similarly in Kafka they talk about messages but really it's just reading/writing to a distributed, append-only log.
In Erlang/Akka you actually copy the data rather than 'send it' so how does this work?
I was imagining that Alice sends a message to Bob by:
acquiring a lock on Bob's queue (i.e. mailbox)
writing the message to the queue
releasing the lock
doing something else
Given that you can send a message to anyone, how does this not result in a massive deadlock, with processes all waiting to message the same popular recipient? It seems like it might be useful to have multiple intermediate mailboxes for popular actors, so you can write to one of those and then go do something else faster.
The receiver does not hold a lock on its mailbox while it is waiting for a message; it only takes the lock briefly when it checks the mailbox. If there is no matching message, it releases the lock and goes to sleep, then gets woken up when new messages arrive. Likewise, senders only need to acquire the lock while inserting the message. There is never any deadlock situation at this level.
Processes may still get deadlocked because of logical errors where both are expecting a message from the other at the same time, but that's a different matter, and the message passing style makes it less likely to end up in that situation, because there is no lock management to screw up at the user level.
As you mention, yes, it is useful to have intermediate mailboxes to reduce contention (a sender can add to the incoming side of the mailbox while a receiver is holding a lock to scan through the messages that have arrived so far), and that optimization is handled for you under the hood by the Erlang VM.
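To illustrate the point that many senders can target one mailbox without deadlocking, here is a small sketch. The module name fan_in and the {hello, I} message shape are made up for this example; it only demonstrates the observable behaviour, not the VM internals.

```erlang
-module(fan_in).
-export([run/1]).

%% Spawn N senders that each send one message to the calling process,
%% then count the messages as they arrive.
run(N) ->
    Collector = self(),
    lists:foreach(
      fun(I) -> spawn(fun() -> Collector ! {hello, I} end) end,
      lists:seq(1, N)),
    collect(N, 0).

%% Receive exactly the expected number of messages, giving up after
%% five seconds without a new one.
collect(0, Count) ->
    Count;
collect(Left, Count) ->
    receive
        {hello, _} -> collect(Left - 1, Count + 1)
    after 5000 ->
        {error, {timeout, Count}}
    end.
```

Each sender returns from ! right away and terminates; none of them waits for the collector to read its message.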

Message sending in Erlang under the hood

Message sending in Erlang is asynchronous, meaning that a send expression such as PidB ! msg evaluated by a process PidA immediately yields the result msg without blocking the latter. Naturally, its side effect is that of sending msg to PidB.
Since this mode of message passing does not provide any message delivery guarantees, the sender must itself ascertain whether a message has been actually delivered by asking the recipient to confirm accordingly. After all, confirming whether a message has been delivered might not always be required.
This holds true in both the local and distributed cases: in the latter scenario, the sender cannot simply assume that the remote node is always available; in the local scenario, where processes live on the same Erlang node, a process may send a message to a non-existent process.
I am curious as to how the side effect portion of !, i.e., message sending, works at the VM level when the sender and recipient processes live on the same node. In particular, I would like to know whether the sending operation completes before returning. By "completes", I mean that for the specific case of local processes, the sender: (i) acquires a lock on the message queue of the recipient, (ii) writes the message directly into its queue, (iii) releases the lock and, (iv) finally returns.
I came across this post which I did not fully understand, although it seems to indicate that this could be the case.
Erik Stenman's The Beam Book, which explains many implementation details of the Erlang VM, answers your question in great detail in its "Lock Free Message Passing" section. The full answer is too long to copy here, but the short answer to your question is that yes, the sending process completely copies its message to a memory area accessible to the receiver. If you consult the book you'll find that it's more complicated than steps i-iv you describe in your question due to issues such as different send flags, whether locks are already taken by other processes, multiple memory areas, and the state of the receiving process.
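The question's remark that a local send to a non-existent process gives no delivery guarantee can be observed directly. The following is a small sketch (the module name send_demo is made up) that waits for a process to terminate and then sends to its pid anyway.

```erlang
-module(send_demo).
-export([run/0]).

run() ->
    %% Start a process that exits immediately and wait until it is gone.
    {Pid, MRef} = spawn_monitor(fun() -> ok end),
    receive
        {'DOWN', MRef, process, Pid, _Reason} -> ok
    end,
    %% Sending to the dead pid does not fail: the expression still
    %% evaluates to the message, and the message is silently dropped.
    Result = (Pid ! hello),
    {is_process_alive(Pid), Result}.
```

The send succeeds from the sender's point of view even though nobody will ever receive the message, which is why any confirmation has to be built on top of !.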

Erlang message processing transaction

When is the "transaction" of a process trying to fetch a message from its message queue considered to be committed or rolled back? In other words, at what point of execution is the message removed permanently from the message queue?
When it is read by a receive call.
If a message is in the message queue and is read by the process calling receive, that is just memory manipulation inside that process. No other process can contend for the data, so there is nothing transactional about it as such, and no need for locking, rolling back, etc.
The language you use makes me worry you think there are more guarantees than there are. It's important to remember that at the fundamental message send and receive level (without any extra layer on top that OTP might provide or you might write yourself) you are sending messages without any guarantee they will be delivered, or that the process you are sending to even exists.
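A short sketch of the point about removal: a message leaves the queue when a receive clause matches it, and messages that do not match stay where they are. The module name mailbox_demo and the message shapes are made up for illustration.

```erlang
-module(mailbox_demo).
-export([run/0]).

run() ->
    %% Put two messages into our own mailbox.
    self() ! {other, 1},
    self() ! {wanted, 2},
    {message_queue_len, Before} = process_info(self(), message_queue_len),
    %% Selective receive: only the matching message is removed.
    Value = receive {wanted, V} -> V end,
    {message_queue_len, After} = process_info(self(), message_queue_len),
    %% Clean up the message that was skipped over.
    receive {other, _} -> ok end,
    {Value, Before - After}.
```

The queue shrinks by exactly one, and the skipped {other, 1} message is still there for a later receive. If the process crashes after the receive has matched, the message is gone along with the rest of the process state.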

Unordered socket read & close notification using IOCP

Most server frameworks and examples that use sockets and I/O completion ports deliver notifications in a way whose purpose I couldn't completely figure out.
When read completions are processed, the packets are usually reordered to work around thread scheduling issues that cause them to be handled out of order, even though IOCP provides a FIFO queue.
The problem arises when a socket is closed, either gracefully or because of an error. In both situations, and again because of the OS thread scheduler, I saw that the close notification may be delivered to the application (e.g. an HTTP server using the framework) "before" the notification for data that was read earlier.
I think that the close notification should be queued in such a way that the application receives it after the previous reads.
Is there an intended purpose behind what most of the code I saw does, or may the behavior I propose be correct depending on the situation?
What you suggest makes sense, and I would imagine that any code that handles graceful close (a read returning 0 bytes) would do so by processing it after any preceding successful read. Errors coming out of GetQueuedCompletionStatus(), such as connection reset errors, are harder to integrate into the receive flow because they occur out of band as far as the received data is concerned. Your question is a bit vague, and the answer depends very much on the code you're using and how you (or the people who wrote that code) want to handle these things. There is no single correct way, IMHO.

What happens in Erlang if return receipt never arrives?

I just happened to read Joe Armstrong's thesis and don't have much prior knowledge of Erlang. I wonder what happens if a delivery receipt for some message never arrives. What does the sending actor do? Does it send the message again? This could confuse the recipient actor when it receives the same message a second time. It has to be able to tell that its receipt was not received and that the second message is therefore void.
That kind of problem has always kept me away from solutions where message delivery is not transactional. I think I know the answer: when the sending actor doesn't obtain a receipt in a reasonable time, it tells its supervising actor that something must be wrong, causing the supervisor to take some action (like restarting the involved actors). Is this correct? I see no other solution that doesn't result in theoretically possible infinite message sends.
Thanks for any answer,
Oliver
In Erlang, the sender of a message usually forgets it immediately after sending it and continues its job. If an application needs an acknowledgement that the message was received, you have to build your own protocol (or use an existing one). There are many good reasons for that.
One is that most of the time this handshake is not necessary. The most likely reason for a message to be ignored is that the receiving process does not exist anymore, or died in the meantime, and in that case the sender has very little chance of doing anything useful about it.
Also, the handshake is a blocking action, so there is a performance impact and a risk of deadlock.
The acknowledgement should also be a message, but one that is not itself acknowledged; otherwise you create a never-ending loop of messages. Only the application can know what to do (for example, whether to use a send with acknowledgement or not), and it is really easy to write this kind of function (or use a behaviour that implements it). For example:
send_with_ack(To, Mess, Timeout, Ack) ->
    Ref = make_ref(),                 %% unique tag for this request
    To ! {Mess, self(), Ref},
    receive
        {Ack, Ref} -> Ack             %% only an ack carrying our Ref matches
    after Timeout ->
        {error, timeout}
    end.

receiving_process() ->
    ...
    receive
        {Pattern_matching_Mess, From, Ref} ->
            do_something(),
            From ! {Ack, Ref}, %% Ack for this kind of message is known by the receiver
            do_somethingelse();
        Mess1 -> do_otherthing()
    end,
    ...
With a little work, it is even possible to delegate the monitoring of message delivery to a new process, so that the check does not block, and, by linking the processes, to force a crash of the sender if the timeout is reached.
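The acknowledgement pattern above can be sketched as a small self-contained module. The module name ack_demo, the fixed ack atom, and the receiver loop are assumptions made for this example rather than part of the original answer.

```erlang
-module(ack_demo).
-export([start_receiver/0, send_with_ack/3]).

%% Start a receiver that acknowledges every tagged message it gets.
start_receiver() ->
    spawn(fun loop/0).

loop() ->
    receive
        {_Msg, From, Ref} ->
            %% The acknowledgement is itself never acknowledged.
            From ! {ack, Ref},
            loop()
    end.

%% Send Mess and wait up to Timeout milliseconds for the matching ack.
send_with_ack(To, Mess, Timeout) ->
    Ref = make_ref(),
    To ! {Mess, self(), Ref},
    receive
        {ack, Ref} -> ok
    after Timeout ->
        {error, timeout}
    end.
```

With a live receiver, ack_demo:send_with_ack(Pid, hello, 1000) returns ok; if the receiver is dead, the call returns {error, timeout} after the timeout. Note that an acknowledgement arriving after the timeout stays in the sender's mailbox, and deciding whether to resend (and how the receiver should recognize duplicates) remains an application-level choice.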

Resources