How do Erlang/Akka etc. send messages under the hood? Why doesn't it lead to deadlock?

Message sending is a useful abstraction, but it seems to be a bit misleading because it is not like letters sent through a post box that are literally moving through the system.
Similarly in Kafka they talk about messages but really it's just reading/writing to a distributed, append-only log.
In Erlang/Akka you actually copy the data rather than 'send' it, so how does this work?
I was imagining something like: Bob sends a message to Alice by
acquiring a lock on Alice's queue (i.e. her mailbox),
writing the message to the queue,
releasing the lock,
then doing something else.
Given that you can send a message to anyone, how does this not result in a massive deadlock, with processes all waiting to message Alice? It seems like it might be useful to have multiple intermediate mailboxes for popular actors, so a sender could write to one of those and get back to other work sooner.

The receiver does not lock its mailbox while it is waiting for a message; it only locks it briefly when it checks it. If there is no matching message, it releases the lock and goes to sleep, then gets woken up when new messages arrive. Likewise, senders only need to acquire the lock while inserting a message. There is never any deadlock situation on this level.
Processes may still get deadlocked because of logical errors where both are expecting a message from the other at the same time, but that's a different matter, and the message-passing style makes it less likely to end up in that situation, because there is no lock management to screw up on the user level.
As you mention, yes, it is useful to have intermediate mailboxes to reduce contention (a sender can add to the incoming side of the mailbox while a receiver is holding a lock to scan through the messages arrived so far), and that optimization is handled for you under the hood by the Erlang VM.
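As a small, hedged illustration of why senders cannot deadlock at this level (the module and function names below are made up for the example): a thousand processes can message one busy process and every single send returns immediately, so no sender ever waits on another sender.

-module(mailbox_demo).
-export([demo/0]).

%% A deliberately "popular" receiver: it serves its mailbox one message at a time.
alice() ->
    receive
        {From, N} ->
            From ! {self(), N * 2},   % reply, then go back to the mailbox
            alice()
    end.

demo() ->
    Alice = spawn(fun alice/0),
    %% 1000 concurrent senders: each '!' completes immediately, no matter how
    %% busy Alice is or how full her mailbox gets.
    [spawn(fun() -> Alice ! {self(), N} end) || N <- lists:seq(1, 1000)],
    ok.

The only thing that can block here is Alice's own receive when her mailbox happens to be empty; the senders never block on her.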

Related

Message sending in Erlang under the hood

Message sending in Erlang is asynchronous, meaning that a send expression such as PidB ! msg evaluated by a process PidA immediately yields the result msg without blocking PidA. Naturally, its side effect is that msg is sent to PidB.
Since this mode of message passing provides no delivery guarantees, a sender that needs to know whether a message was actually delivered must ask the recipient to confirm it; after all, such confirmation is not always required.
This holds true in both the local and the distributed case: in the distributed case, the sender cannot simply assume that the remote node is always available; in the local case, where both processes live on the same Erlang node, a process may still send a message to a process that no longer exists.
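As a tiny illustration of these semantics (plain shell code, nothing beyond the standard library is assumed): the send expression evaluates to the message itself, and sending to a process that has already terminated neither blocks nor raises an error.

    Pid = spawn(fun() -> ok end),   % this process terminates almost immediately
    timer:sleep(100),               % give it time to exit
    msg = Pid ! msg.                % '!' still returns msg: no error, no blocking, no delivery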
I am curious as to how the side-effect portion of !, i.e., message sending, works at the VM level when the sender and recipient processes live on the same node. In particular, I would like to know whether the sending operation completes before returning. By "completes", I mean that for the specific case of local processes, the sender: (i) acquires a lock on the recipient's message queue, (ii) writes the message directly into that queue, (iii) releases the lock, and (iv) finally returns.
I came across this post which I did not fully understand, although it seems to indicate that this could be the case.
Erik Stenman's The Beam Book, which explains many implementation details of the Erlang VM, answers your question in great detail in its "Lock Free Message Passing" section. The full answer is too long to copy here, but the short answer to your question is that yes, the sending process completely copies its message to a memory area accessible to the receiver. If you consult the book you'll find that it's more complicated than steps i-iv you describe in your question due to issues such as different send flags, whether locks are already taken by other processes, multiple memory areas, and the state of the receiving process.

Is there any latency in SQS while creating it using AWS API and sending messages immediately after creating it

I want to create an SQS queue from code whenever it is required to send messages, and delete it after all messages are consumed.
I just wanted to know if there is some delay required between creating an SQS queue using Java code and then sending messages to it.
You'll have to try it and make observations. SQS is a distributed system, so there is a possibility that a queue might not be immediately usable, though I did not find a direct documentation reference for this.
Note the following:
If you delete a queue, you must wait at least 60 seconds before creating a queue with the same name.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_CreateQueue.html
This means your names will always need to be different, but it also implies something about the internals of SQS -- deleting a queue is not an instantaneous process. The same might be true of creation, though that is not necessarily the case.
Also, there is no way to know with absolute certainty that a queue is truly empty. A long poll that returns no messages is a strong indication that there are no messages remaining, as long as there are also no messages in flight (consumed but not yet deleted; these will return to visibility if the consumer explicitly resets their visibility, or mishandles an exception and never deletes them before the visibility timeout expires).
However, GetQueueAttributes does not provide a fail-safe way of assuring a queue is truly empty, because many of the counter attributes are only the approximate number of messages (visible, in flight, etc.). Again, this is related to the distributed architecture of SQS. Certain rare, internal failures could potentially cause messages to be stranded internally, only to appear later. The significance of this depends on the importance of the messages and the life cycle of the queue, and the risk of any such issue seems -- to me -- increased when a queue does not have an indefinite lifetime (i.e. when the plan is to delete the queue once it is "empty"). This is not to imply that SQS is unreliable, only to make the point that any and all systems do eventually behave unexpectedly, however rare or unlikely.
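If you do go down the create-use-delete route, the "probably empty" check described above can be sketched as follows; sqs_long_poll/2 and sqs_in_flight_count/1 are hypothetical wrappers around whatever SQS client you use (they are not real API calls), and the result is a strong hint rather than a guarantee:

%% True only when a long poll returns nothing AND the approximate in-flight
%% counter is zero; the counters are approximate by design, so treat this as
%% "very probably empty", not "provably empty".
probably_empty(QueueUrl) ->
    Messages = sqs_long_poll(QueueUrl, 20),       % hypothetical: 20-second long poll
    InFlight = sqs_in_flight_count(QueueUrl),     % hypothetical: ApproximateNumberOfMessagesNotVisible
    Messages =:= [] andalso InFlight =:= 0.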

Erlang message processing transaction

When is the "transaction" of a process trying to fetch a message from its message queue considered to be committed or rolled back? In other words, at what point of execution is the message removed permanently from the message queue?
When it is read by a receive call.
If a message is in the message queue and is read by the process calling receive, it is just memory manipulation: no other process can contend for that data, so there is no transactional nature to it as such; there is no need for locking or rolling back.
The language you use makes me worry that you think there are more guarantees than there actually are. It's important to remember that at the fundamental message send and receive level (without any extra layer on top that OTP might provide or that you might write yourself), you are sending messages without any guarantee that they will be delivered, or even that the process you are sending to exists.
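You can see the "just memory manipulation" behaviour from the Erlang shell (this assumes a fresh shell process whose mailbox is otherwise empty):

    self() ! hello,
    {messages, [hello]} = process_info(self(), messages),  % the message sits in the mailbox
    receive hello -> ok end,                               % receive removes it...
    {messages, []} = process_info(self(), messages).       % ...permanently; there is nothing to roll back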

Erlang dead letter queue

Let's say my Erlang application receives an important message from the outside (through an exposed API endpoint, for example). Due to a bug in the application or an incorrectly formatted message, the process handling the message crashes.
What happens to the message? How can I influence what happens to the message? And what happens to the other messages waiting in the process mailbox? Do I have to introduce a hierarchy of processes just to make sure that no messages are lost?
Is there something like Akka's dead letter queue in Erlang? Let's say I want to handle the message later - either by fixing the message or fixing the bug in the application itself, and then rerunning the message processing.
I am surprised how little information about this topic is available.
There is no information because there is no dead letter queue: if your application crashed while processing your message, the message had already been received, so why would it go to a dead letter queue (if one existed)?
Such a queue would be a major scalability issue with not much use (you would accumulate arbitrary messages that could not be delivered and that would be totally out of context).
If you need to make sure a message is processed, you usually use a mechanism that gives you a reply back once the message has been processed, such as a gen_server call.
And if your messages are so important that losing one would be a catastrophe, you should probably persist them in an external DB, because otherwise, if your machine crashes, what would happen to all the messages in transit?
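A minimal sketch of the "get a reply back" approach (the module name my_worker and the helpers persist/1 and process/2 are illustrative, not an existing API): gen_server:call/3 only returns once handle_call/3 has run, so the caller knows the message has been handled, and gets an exit or a timeout if the server crashed instead.

%% inside the gen_server callback module registered as my_worker
handle_call({important, Msg}, _From, State) ->
    ok = persist(Msg),                  % hypothetical: write to an external DB first
    NewState = process(Msg, State),     % hypothetical: the actual work
    {reply, ok, NewState}.

%% caller side: blocks until the reply, or exits after 5 seconds
ok = gen_server:call(my_worker, {important, Msg}, 5000).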

What happens in Erlang if return receipt never arrives?

I just happened to read Joe Armstrong's thesis and don't have much prior knowledge of Erlang. I wonder what happens if a delivery receipt for some message never arrives. What does the sending actor do? Does it send the message another time? That could confuse the recipient actor when it receives the same message a second time; it has to be able to tell that its receipt was not received and that the second message is therefore void.
That kind of problem has always kept me away from solutions where message delivery is not transactional. I think I know the answer: the sending actor tells its supervising actor that something must be wrong when it does not obtain a receipt within a reasonable time, causing the supervisor to take some action (like restarting the involved actors). Is this correct? I see no other solution that doesn't result in theoretically possible infinite message sends.
In Erlang, the sender of a message usually forgets about it immediately after sending it and continues its job. If an application needs an acknowledgement that a message was received, you have to build your own protocol (or use an existing one). There are many good reasons for that.
One is that most of the time this handshake is not necessary. The biggest risk of a message being ignored is that the receiving process no longer exists, or died in the meantime, and in that case there is very little the sender can usefully do.
Also, the handshake is a blocking action, so there is a performance impact and a risk of deadlock.
The acknowledgement is itself a message, but one that should not be acknowledged in turn, otherwise you create a never-ending loop of messages. Only the application can know what to do (for example, whether to use a send with acknowledgement or not), and it is really easy to write this kind of function (or use a behaviour that implements it). For example:
send_with_ack(To, Mess, Timeout, Ack) ->
    Ref = make_ref(),                 % unique reference to pair the ack with this send
    To ! {Mess, self(), Ref},
    receive
        {Ack, Ref} -> Ack
    after Timeout ->
        {error, timeout}
    end.

receiving_process() ->
    ...
    receive
        {Pattern_matching_Mess, From, Ref} ->
            do_something(),
            From ! {Ack, Ref},        %% the Ack for this kind of message is known by the receiver
            do_somethingelse();
        Mess1 ->
            do_otherthing()
    end,
    ...
With a little work it is even possible to delegate the monitoring of message delivery to a new process (a non-blocking check) and, using a linked process, force a crash of the sender if the timeout is reached.
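A minimal sketch of that last idea, assuming the receiver follows the protocol above and replies {ack, Ref} to the From pid it was given (the atom ack and the function name are illustrative):

send_with_linked_watchdog(To, Mess, Timeout) ->
    %% The watchdog does the send itself, so the receiver's ack comes back to it.
    spawn_link(fun() ->
        Ref = make_ref(),
        To ! {Mess, self(), Ref},
        receive
            {ack, Ref} -> ok             % ack arrived: the watchdog exits normally
        after Timeout ->
            exit({no_ack, To, Mess})     % no ack: abnormal exit propagates over the link
        end
    end).

The caller is never blocked; if no ack arrives before Timeout, the watchdog's abnormal exit travels over the link and crashes the sender as well (unless the sender traps exits).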

Resources