How to enhance the flow to handle duplicate requests instead of moving them to the BO queue? - messagebroker

How can I enhance the flow to handle duplicate requests instead of moving them to the BO queue? We receive multiple cases on a daily basis where the customer sends multiple requests within seconds, and the message moves to the BO (backout) queue in case of a duplicate request.

The fact that the message is being backed out indicates to me that your flow is already somehow detecting the duplicate and must be throwing an exception somewhere.
If you can identify this particular exception, then instead of throwing it back all the way to the input node and initiating a rollback, you can handle it in a branch of the flow wired up to the catch terminal, for example by logging the message ID to a duplicates log or similar.
The tricky part will be ensuring that you can identify the exception precisely enough that you are not applying the wrong failure processing to other types of failure.
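
In IIB this handling lives in the flow wiring and the compute nodes rather than in standalone code, but the pattern itself is generic. Here is a minimal Java sketch of the idea, assuming the duplicate is detected via a unique-key violation on an insert; the class and method names are hypothetical, not IIB API:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLIntegrityConstraintViolationException;
    import java.util.logging.Logger;

    // Handle duplicates locally, rethrow everything else so normal
    // failure processing (rollback / backout) still applies.
    public class DuplicateAwareHandler {
        private static final Logger LOG = Logger.getLogger("duplicates");

        private final Connection db;

        public DuplicateAwareHandler(Connection db) {
            this.db = db;
        }

        public void handle(String messageId, String payload) throws Exception {
            try {
                // A unique constraint on message_id makes the insert itself
                // the duplicate check.
                try (PreparedStatement ps = db.prepareStatement(
                        "INSERT INTO processed_messages (message_id, payload) VALUES (?, ?)")) {
                    ps.setString(1, messageId);
                    ps.setString(2, payload);
                    ps.executeUpdate();
                }
                process(payload);
            } catch (SQLIntegrityConstraintViolationException dup) {
                // Duplicate: log and swallow instead of rolling back to the
                // input node, which is what sends the message to the BO queue.
                LOG.info("Ignoring duplicate message " + messageId);
            }
            // Any other exception propagates and still triggers the normal
            // failure path.
        }

        private void process(String payload) {
            // real business logic goes here
        }
    }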

Related

How to handle errors during asynchronous response delivery from SMSR back to SMDP/Operator?

In many scenarios, the response with the result of the operation execution is delivered asynchronously to the operation initiator (the SM-DP or the Operator). For example, step (13) in section 3.3.1 of SGP.02 v4.2:
(13) The SM-SR SHALL return the response to the “ES3.EnableProfile” function to SM-DP, indicating that the Profile has been enabled
It is not clear how the SM-SR should act if the call that contains the result of the operation fails. Should the SM-SR keep retrying the call indefinitely, or is it OK to try just once and give up after that? Does this depend on the type of error that occurred during the call?
I'm concerned about the cases where the result is sent, and may have been processed by the initiator, but the confirmation of that was not properly delivered back to the SM-SR. For the SM-SR to be required to retry, the initiator must be ready to receive the same operation result again and process it accordingly, that is, ignore it and just acknowledge.
But I can't see anything in SGP.02 v4.2 that specifies what the behaviour of the SM-SR and SM-DP should be in this case. Any pointers to documentation specifying this are much appreciated.
In general it is not clear how the rollback to a valid known state should happen in this situation. Who is responsible for that (the SM-SR or the SM-DP, in this example of profile enabling)?
I'm not aware of any part of the specification defining this, neither SGP.02, SGP.01, nor the test specification SGP.11. There are operational requirements in the SAS certification for a continuous service, but this is not technically defined.
I have experience implementing the specification. Our approach was a message queue with Kafka and a retry policy. The specification says SHALL, which means: try very hard. Any implementation dropping the message after a single try is not very quality-oriented. The common wisdom in distributed (microservice-based) systems is that failures happen and have to be handled, so this assumption was taken even though it is not spelled out in the SGP specification.
The profile status example should be idempotent: sending the same message twice should not be harmful. The MessageID and RelatesTo headers are also useful here. I assume the requests/responses are recorded in your system for auditing anyway.
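As a sketch of that idempotency: a receiver can key the results it has already applied on the RelatesTo value and simply re-acknowledge duplicates. This is a minimal in-memory Java illustration with hypothetical names, not anything prescribed by SGP.02:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Idempotent receiver for asynchronously delivered operation results.
    public class OperationResultReceiver {
        // In production this would be a persistent store, not an in-memory map.
        private final Map<String, String> processedResults = new ConcurrentHashMap<>();

        /** Safe to call repeatedly with the same RelatesTo value. */
        public String onResult(String relatesTo, String resultPayload) {
            String previous = processedResults.putIfAbsent(relatesTo, resultPayload);
            if (previous != null) {
                // Duplicate delivery caused by the sender's retry policy:
                // acknowledge again without reprocessing.
                return "ACK " + relatesTo;
            }
            applyResult(resultPayload); // first delivery: actually process it
            return "ACK " + relatesTo;
        }

        private void applyResult(String resultPayload) {
            // update local state for the enabled profile, auditing, etc.
        }
    }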
In case you are sitting at the other end, facing a badly implemented SM-SR, and no status message arrives, the SM-DP can call ES3.GetEIS later to get the current status.
I have already contacted the authors directly. At the end of the document the email is mentioned:
It is our intention to provide a quality product for your use. If you
find any errors or omissions, please contact us with your comments.
You may notify us at prd#gsma.com

Esper 8.2 statement stops matching events

In my application I have about 100 continuous Esper filter queries with events being sent in. At some point, for an unknown reason, some of the statements stop matching events and never match any further event, without ever throwing an exception (nothing is logged with the default log4j setup). This is not reproducible in a small example, and I realize that it's difficult to pinpoint a problem like this, but I'm writing this in the hope that it is a known and/or fixed issue.
I would suggest reviewing your application code to make sure it is still sending events, and the listener/subscriber code to make sure it is still processing output events, i.e. its exception handling and logging. Or perhaps an OOM occurs and the logging never happens, so you may want to check heap memory use. Also look at the console output to see whether the JVM has encountered an issue.
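
If the listener body can throw, one defensive option is to catch and log inside the listener itself, so a single bad output event cannot silently break your output handling. A minimal sketch, assuming the Esper 8 runtime API:

    import com.espertech.esper.common.client.EventBean;
    import com.espertech.esper.runtime.client.EPRuntime;
    import com.espertech.esper.runtime.client.EPStatement;
    import com.espertech.esper.runtime.client.UpdateListener;

    // Defensive listener: exceptions thrown out of update() may only show up
    // in logging you haven't configured, which looks exactly like a statement
    // that "stopped matching".
    public class LoggingListener implements UpdateListener {
        @Override
        public void update(EventBean[] newEvents, EventBean[] oldEvents,
                           EPStatement statement, EPRuntime runtime) {
            try {
                if (newEvents != null) {
                    for (EventBean event : newEvents) {
                        // real output handling goes here
                        System.out.println(statement.getName() + ": " + event.getUnderlying());
                    }
                }
            } catch (RuntimeException e) {
                // Log instead of letting the exception escape the listener.
                e.printStackTrace();
            }
        }
    }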

Reactor - Overflow Strategy IGNORE vs ERROR

What is the significance of the two overflow strategies below? The exceptions are different, but in both cases, when subscribers cannot keep up, they are notified with an error call.
In which case should we choose one over the other?
FluxSink.OverflowStrategy.ERROR
FluxSink.OverflowStrategy.IGNORE
I found this in the documentation, but it does not provide much more information:
IGNORE to Completely ignore downstream backpressure requests. This may yield IllegalStateException when queues get full downstream.
ERROR to signal an IllegalStateException when the downstream can’t keep up.
"Downstream can't keep up" sounds a lot like "queues get full downstream", so to me both look the same.
IGNORE will ignore requests from the downstream and simply push items at it; any overflow only surfaces later, if and when an internal queue further downstream fills up.
ERROR will check the outstanding demand first and fail immediately with an IllegalStateException when there is not enough.
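
A small demo makes the difference visible. With a subscriber that requests only one element, ERROR fails fast on the second emission, while IGNORE delivers everything regardless of demand (this sketch assumes Reactor 3's Flux.create and BaseSubscriber):

    import org.reactivestreams.Subscription;
    import reactor.core.publisher.BaseSubscriber;
    import reactor.core.publisher.Flux;
    import reactor.core.publisher.FluxSink;

    public class OverflowDemo {
        static void run(FluxSink.OverflowStrategy strategy) {
            Flux<Integer> flux = Flux.create(sink -> {
                for (int i = 0; i < 5; i++) {
                    sink.next(i); // emit more than the subscriber requested
                }
                sink.complete();
            }, strategy);

            flux.subscribe(new BaseSubscriber<Integer>() {
                @Override
                protected void hookOnSubscribe(Subscription subscription) {
                    request(1); // deliberately request only one element
                }

                @Override
                protected void hookOnNext(Integer value) {
                    System.out.println(strategy + " -> onNext(" + value + ")");
                }

                @Override
                protected void hookOnError(Throwable t) {
                    System.out.println(strategy + " -> onError(" + t + ")");
                }
            });
        }

        public static void main(String[] args) {
            // ERROR: one onNext, then an onError carrying an
            // IllegalStateException subtype, because demand ran out.
            run(FluxSink.OverflowStrategy.ERROR);
            // IGNORE: all five onNext calls arrive even though only one was
            // requested; the demand check is skipped entirely, and any failure
            // is deferred to whatever downstream queue eventually fills up.
            run(FluxSink.OverflowStrategy.IGNORE);
        }
    }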

How to handle split-brain?

I have read in Orleans FAQ when split-brain could happen but I don't understand what bad can happen and how to handle it properly.
The FAQ says something vague like:
You just need to consider the rare possibility of having two instances of an actor while writing your application.
But how exactly should I account for this, and what can happen if I don't?
Orleans Paper (http://research.microsoft.com/pubs/210931/Orleans-MSR-TR-2014-41.pdf) says this:
application can rely on external persistent storage to provide stronger data consistency
But I don't understand what this means.
Suppose a split-brain happened. Now I have two instances of one grain. When I send a few messages, they could be received by these two (or could there be even more?) different instances. Suppose each instance had the same state prior to receiving these messages. Now, after processing the messages, they have different states.
How should they persist their states? There could be a conflict.
When the extra instances are destroyed and only one remains, what will happen to the states of the destroyed instances? Will it be as if the messages they processed were never processed? Then client state and server state could become desynchronized, IIUC.
I see this (split-brain) as a big problem, and I don't understand why it receives so little attention.
Orleans leverages the consistency guarantees of the storage provider. When you call this.WriteStateAsync() from a grain, the storage provider ensures that the grain has seen all previous writes. If it has not, an exception is thrown. You can catch that exception and either call DeactivateOnIdle() and rethrow, or call ReadStateAsync() and retry. So if you have two grains during a split-brain scenario, whichever one calls WriteStateAsync() first prevents the other from writing state without first having read the most up-to-date state.
Update: Starting in Orleans v1.5.0, a grain which allows an InconsistentStateException to be thrown back to the caller will automatically be deactivated when the currently executing calls complete. A grain can catch and handle the exception to avoid automatic deactivation.
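
Orleans itself is .NET, but the mechanism behind WriteStateAsync() is an ETag-style conditional write, which can be sketched in any language. Here is a minimal, entirely hypothetical Java illustration of why only one of the two split-brain instances can win a write:

    import java.util.concurrent.atomic.AtomicReference;

    // ETag-style conditional write: a writer must prove it has seen the
    // latest version before it is allowed to overwrite the state.
    public class EtagStore<T> {
        public static class InconsistentStateException extends RuntimeException {
            public InconsistentStateException(String msg) { super(msg); }
        }

        private static final class Versioned<S> {
            final long etag;
            final S state;
            Versioned(long etag, S state) { this.etag = etag; this.state = state; }
        }

        private final AtomicReference<Versioned<T>> current =
                new AtomicReference<>(new Versioned<>(0L, null));

        public long readEtag() { return current.get().etag; }
        public T readState() { return current.get().state; }

        /** Succeeds only if the caller has seen the latest write. */
        public long writeState(long expectedEtag, T newState) {
            Versioned<T> seen = current.get();
            if (seen.etag != expectedEtag
                    || !current.compareAndSet(seen, new Versioned<>(seen.etag + 1, newState))) {
                // The other instance wrote first: the losing writer must
                // re-read (or deactivate) before it may write again.
                throw new InconsistentStateException("stale etag " + expectedEtag);
            }
            return seen.etag + 1;
        }
    }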

Erlang dead letter queue

Let's say my Erlang application receives an important message from the outside (through an exposed API endpoint, for example). Due to a bug in the application or an incorrectly formatted message, the process handling the message crashes.
What happens to the message? How can I influence what happens to the message? And what happens to the other messages waiting in the process mailbox? Do I have to introduce a hierarchy of processes just to make sure that no messages are lost?
Is there something like Akka's dead letter queue in Erlang? Let's say I want to handle the message later - either by fixing the message or fixing the bug in the application itself, and then rerunning the message processing.
I am surprised how little information about this topic is available.
There is no information because there is no dead letter queue. If your application crashed while processing your message, the message had already been received, so why would it go into a dead letter queue (if one existed)?
Such a queue would be a major scalability issue with not much use: you would get arbitrary messages which couldn't be delivered and which would be totally out of context.
If you need to make sure a message is processed, you usually use a mechanism that gets a reply back once the message has been processed, like a gen_server call.
And if your messages are so important that losing one would be a catastrophe, you should probably persist them in an external DB, because otherwise, if your computer crashes, what would happen to all the messages in transit?
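
As a rough illustration of that persist-first approach, here is a sketch in Java (rather than Erlang) with an entirely hypothetical store interface: the endpoint acknowledges only after the message is durable, and a recovery pass replays anything a crash left pending.

    import java.util.List;

    // Hypothetical durable store (database table, write-ahead log, ...).
    interface MessageStore {
        void savePending(String id, String payload); // assumed idempotent per id
        void markDone(String id);
        List<String> pendingIds();                   // survives a crash
        String payloadOf(String id);
    }

    public class DurableHandler {
        private final MessageStore store;

        public DurableHandler(MessageStore store) { this.store = store; }

        /** API endpoint: persist first, acknowledge only after the write. */
        public void onMessage(String id, String payload) {
            store.savePending(id, payload);
            try {
                process(payload);
                store.markDone(id);
            } catch (RuntimeException crash) {
                // The message stays pending; recover() retries it later, once
                // the bug or the malformed message has been fixed.
            }
        }

        /** Run at startup: replay anything a crash left behind. */
        public void recover() {
            for (String id : store.pendingIds()) {
                onMessage(id, store.payloadOf(id));
            }
        }

        private void process(String payload) { /* business logic */ }
    }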
