Why won't SQS throw an error when maximum in-flight messages is reached for a FIFO queue? - amazon-sqs

The approximate maximum number of in-flight messages for an SQS Standard queue is 120,000. When this limit is reached, the OverLimit error is returned.
But no error is returned for FIFO queues in that case (the limit there being 20,000 in-flight messages).
Why is that so?

I don't think there's going to be an objective answer here, other than "it was an architectural decision."
The in-flight limit is something you should essentially never encounter -- it's only applicable to messages that have been delivered to consumers, not deleted, and not past visibility timeout.
The OverLimit error is only applicable to receiving messages -- not sending them. You can still send messages to either type of queue when it's in this state, you just can't receive them.
Presumably, FIFO treats this as an ordinary "no messages available" situation so that the consumer can continue long polling as normal rather than seeing an exception, which would increase the workload on the FIFO queue -- which has a 300 transactions-per-second limit that does not apply to non-FIFO queues. The 300 trx/sec limit covers any combination of send, receive, and/or delete, with each transaction batching up to 10 messages, and appears to be related to the overhead required for coordinating exactly-once, in-order delivery. You would not want consumers seeing exceptions to increase the workload (and reduce the throughput) on the FIFO queue by continuously retrying, when something has already gone awry (as evidenced by the 20,000 in-flight messages).
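To see why a retry storm would hurt, it helps to put numbers on the throughput budget implied above. A rough back-of-the-envelope calculation, using only the figures already quoted (300 transactions/sec, 10 messages per batched call):

```python
# Back-of-the-envelope FIFO throughput budget, using the limits quoted above.
TRANSACTIONS_PER_SECOND = 300   # FIFO limit: send + receive + delete combined
MESSAGES_PER_BATCH = 10         # max messages per batched API call

max_msgs_per_second = TRANSACTIONS_PER_SECOND * MESSAGES_PER_BATCH
print(max_msgs_per_second)      # 3000

# Each message normally costs at least a send, a receive, and a delete,
# so a fully batched pipeline sustains roughly a third of that:
sustained = max_msgs_per_second // 3
print(sustained)                # 1000
```

Every failed ReceiveMessage retry still consumes one of those 300 transactions, so a consumer hammering the queue with retries directly eats into the send/delete budget.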

Related

Is it sufficient to set ROS publisher buffer to 1 and subscriber buffer to 1000 and still not lose any messages

I am trying to understand subscriber and publisher buffers. If I set the subscriber buffer to 1000 and the publisher buffer to 1, are there any chances that I lose messages? Could anyone please explain this to me?
Yes, in theory you may lose messages with these settings, in practice it depends.
Theory: spinner threads
On both sides, publisher as well as subscriber, there are so-called spinner threads responsible for handling the callbacks (message sending on the publisher side, message evaluation on the subscriber side). These spinner threads work in parallel to the main thread. If messages arrive from the main thread faster than the spinner thread can process them, up to the number of messages given by the queue size will be buffered before the oldest ones start being thrown away. Therefore, if you publish at a very high rate, the publisher-side spinner thread might drop older messages, while if your callback function on the subscriber side takes too long to execute, your subscriber queue will start dropping messages. To improve this, one can use multi-threaded spinners: increase the number of spinner threads and activate concurrency in order to process the callback queue more quickly. Read more about it here.
Practice: Choosing the queue size
The publisher queue size you should set depends on the rate at which you publish and whether you publish in bursts. If you publish in bursts or at higher frequencies (e.g. > 10 Hz), a publisher queue size of 1 won't be sufficient. On the subscriber side it is harder to give recommendations, as it also depends on how long the callback takes to process the information.
It is actually also possible to set the value 0 for the queues, which results in an arbitrarily large queue. This can be problematic, as the required memory can grow indefinitely -- at least until your computer freezes. Furthermore, a large queue size is often disadvantageous anyway: if you set a large queue and the callback takes long to execute, you might be working on very outdated data while the queue gets longer and longer.
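The drop-oldest behavior described above is easy to demonstrate without ROS at all. This is a plain-Python stand-in (not rospy; the class and names are illustrative only) for a bounded message queue of size N that silently discards the oldest entry when full:

```python
from collections import deque

# Minimal stand-in for a ROS-style bounded message queue: once the buffer
# holds queue_size items, appending a new message silently drops the
# oldest one (deque with maxlen discards from the opposite end).
class BoundedQueue:
    def __init__(self, queue_size):
        self.buf = deque(maxlen=queue_size)

    def publish(self, msg):
        self.buf.append(msg)  # drops the oldest item once full

    def drain(self):
        msgs = list(self.buf)
        self.buf.clear()
        return msgs

q = BoundedQueue(queue_size=3)
for i in range(5):       # publish faster than anyone consumes
    q.publish(i)
print(q.drain())         # [2, 3, 4] -- messages 0 and 1 were lost
```

With queue_size=1 and a bursty publisher, every message but the most recent is lost before the spinner thread gets to it, which is exactly the failure mode the question asks about.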
Alternative communication patterns
If you want to guarantee that information is actually being processed (e.g. real-time or safety-relevant information), ROS topics are probably the wrong choice. Depending on what precisely you need, the other two communication methods, services or actions, might be an alternative. But for things like large streams of safety-relevant real-time data, there are no perfect communication mechanisms in ROS1.

Is there any latency in SQS while creating it using AWS API and sending messages immediately after creating it

I want to create SQS using code whenever it is required to send messages and delete it after all messages are consumed.
I just wanted to know if there is some delay required between creating an SQS using Java code and then sending messages to it.
Thanks.
Virendra Agarwal
You'll have to try it and make observations. SQS is a distributed system, so there is a possibility that a queue might not be immediately usable, though I did not find a direct documentation reference for this.
Note the following:
If you delete a queue, you must wait at least 60 seconds before creating a queue with the same name.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_CreateQueue.html
This means your names will always need to be different, but it also implies something about the internals of SQS -- deleting a queue is not an instantaneous process. The same might be true of creation, though that is not necessarily the case.
Also, there is no way to know with absolute certainty that a queue is truly empty. A long poll that returns no messages is a strong indication that there are no messages remaining, as long as there are also no messages in flight (consumed but not deleted -- these return to visibility when the visibility timeout expires if the consumer never deletes them, for example because it mishandled an exception and did not explicitly reset their visibility).
However, GetQueueAttributes does not provide a fail-safe way of assuring a queue is truly empty, because many of the counter attributes are the approximate number of messages (visible, in-flight, etc.). Again, this is related to the distributed architecture of SQS. Certain rare, internal failures could potentially cause messages to be stranded internally, only to appear later. The significance of this depends on the importance of the messages and the life cycle of the queue, and the risk of any such issue seems -- to me -- increased when a queue does not have an indefinite lifetime (i.e. when the plan for a queue is to delete it when it is "empty"). This is not to imply that SQS is unreliable, only to make the point that any and all systems do eventually behave unexpectedly, however rare or unlikely.
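As a sketch of how you might act on those approximate counters anyway: the helper below (a hypothetical function, not part of any SDK) inspects the Attributes dict that a GetQueueAttributes call returns and reports whether the queue merely *appears* empty. The attribute names are the real SQS counter attributes; the boto3 call in the comment is the assumed way you would obtain the dict.

```python
# Hypothetical helper: decide whether a queue *appears* empty from a
# GetQueueAttributes response. As noted above, these counters are
# approximate, so "appears empty" is a hint, never a guarantee.
COUNTER_ATTRS = (
    "ApproximateNumberOfMessages",            # visible
    "ApproximateNumberOfMessagesNotVisible",  # in flight
    "ApproximateNumberOfMessagesDelayed",     # delayed
)

def queue_appears_empty(attributes):
    """attributes: the 'Attributes' dict from a GetQueueAttributes response."""
    return all(int(attributes.get(name, "0")) == 0 for name in COUNTER_ATTRS)

# With boto3 (client setup assumed, not shown) this would be called as:
#   resp = sqs.get_queue_attributes(QueueUrl=url, AttributeNames=["All"])
#   if queue_appears_empty(resp["Attributes"]): ...
print(queue_appears_empty({"ApproximateNumberOfMessages": "0",
                           "ApproximateNumberOfMessagesNotVisible": "0"}))  # True
print(queue_appears_empty({"ApproximateNumberOfMessages": "7"}))            # False
```

Even when this returns True, a message could still be stranded or in flight, so treat it as a signal to stop polling, not as proof the queue can be safely deleted.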

SQS - How far out of order might messages be delivered?

I have a use case for SQS where I'll be sending messages about specific objects within a system. Each object will have a message at most every 20 seconds, and there are hundreds of thousands (potentially millions) of objects, which means I'll be handling tens of thousands (potentially hundreds of thousands) of messages per second. The volume of messages precludes using FIFO queues.
Most of the time, I don't care about in-order messaging. If messages for two different objects get delivered in a different order than they were emitted, that's fine. What could potentially be a problem is if two messages relating to the same object were delivered out of order.
Given that each object would only have events every 20 seconds, and 20 seconds is an eternity in computing time, it strikes me that it would be very unlikely for two messages sent 20 seconds apart (with potentially millions of messages between them) to be delivered out of order. That said, I haven't been able to find any hard data about out-of-order delivery with SQS. I know it's a thing that can happen, but I haven't seen any measured data about it.
I'm wondering if there is any kind of measured data on the probability that a message gets delivered X amount of time out of order, or X messages out of order.
SQS makes no guarantee about how far out of order a message can appear for a non-FIFO queue.
The most-related measurement I've seen to what you're looking for is this experiment that measured processing times for a message to become available for polling after it has been submitted to the queue. They also have a link to source code if you want to replicate the experiment and gather your own metrics.
If you absolutely must have them in the original order, you have a few options. They're not necessarily good options, but they are options.
Determine a way to horizontally partition your object IDs into n buckets, and use n different FIFO queues. (Probably the best option.)
Add your own sequence numbers to the messages.
Partition your messages into queues based on the current time. Drain each queue in order. (For example, you might publish to a single queue for only 4 seconds, and rotate sequentially through a group of 15 queues.)
Use a database and store the message timestamps in a way that allows you to get the oldest message.
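Option 1 hinges on a stable mapping from object ID to bucket that every producer computes identically. A minimal sketch under stated assumptions (the queue URLs are placeholders, and the send_message call in the comment shows the intended boto3 usage):

```python
import hashlib

# Sketch of option 1: route each object ID to one of n FIFO queues so that
# all messages for a given object land, in order, on the same queue.
# The queue URLs below are placeholders, not real endpoints.
N_BUCKETS = 4
QUEUE_URLS = [f"https://sqs.example/my-stream-{i}.fifo" for i in range(N_BUCKETS)]

def bucket_for(object_id):
    # Use hashlib rather than hash(): Python's hash() is randomized per
    # process, so different producers would disagree on the bucket.
    digest = hashlib.sha256(object_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % N_BUCKETS

# Within each queue, using the object ID as the FIFO MessageGroupId keeps
# per-object ordering while letting different objects interleave:
#   sqs.send_message(QueueUrl=QUEUE_URLS[bucket_for(oid)],
#                    MessageGroupId=oid, MessageBody=body,
#                    MessageDeduplicationId=unique_id)
```

Since each FIFO queue has its own throughput limit, n is chosen so that per-queue message volume stays under that limit; the trade-off is n queues to manage and drain.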

ActiveMQ starts dropping messages from queue after memory limit is exceeded

In our project I need to push messages to ActiveMQ and keep them persistent. When I send a new message and the memory limit is exceeded, the oldest message in the queue should be dropped/removed from the queue or replaced with the new one.
I do not want to clear the whole queue; the queue works as a fail-safe message backlog for our product, so I need to keep the last x messages in the queue.
I have tried searching Google with no luck so far.
Here is my policy settings.xml
<destinationPolicy>
  <policyMap>
    <policyEntries>
      <policyEntry queue=">" producerFlowControl="false" memoryLimit="5mb">
        <messageEvictionStrategy>
          <oldestMessageEvictionStrategy/>
        </messageEvictionStrategy>
        <pendingMessageLimitStrategy>
          <constantPendingMessageLimitStrategy limit="100"/>
        </pendingMessageLimitStrategy>
      </policyEntry>
    </policyEntries>
  </policyMap>
</destinationPolicy>
The eviction policy object only applies to Topics; you cannot use it on Queues, as the service contract of a Queue is that it stores all messages until they are either consumed or their lifetime expires via a set TTL value. The broker can store Queue messages to disk and thereby remove them from memory, but for Topics the contract is looser, and the eviction policies allow messages that are in memory waiting to be dispatched to a Topic consumer to be dropped.
You can only control the lifetime of messages in the Queue via a TTL value.
You cannot remove persistent messages from disk unless you delete or consume them. You can enable producerFlowControl to throttle the producer so that it accepts new messages only after old messages have been consumed from the queue, or, as Tim suggested, set a TTL on the messages.

Amazon SQS End of Queue Detection

I was wondering if there is a best practice for detecting the end of an SQS queue. I am spawning a bunch of generic workers to consume data from a queue, and I want to notify them that they can stop processing once they detect no more messages in the queue. Does SQS provide this type of feature?
By looking at the right_aws ruby gem source code for SQS I found that there is the ApproximateNumberOfMessages attribute on a queue. Which you can request using a standard API call.
You can find more information including examples here:
http://docs.amazonwebservices.com/AWSSimpleQueueService/latest/APIReference/Query_QueryGetQueueAttributes.html
For more information on how to do this using the right_aws gem in ruby look at:
https://github.com/rightscale/right_aws/blob/master/lib/sqs/right_sqs_gen2_interface.rb#L187
https://github.com/rightscale/right_aws/blob/master/lib/sqs/right_sqs_gen2_interface.rb#L389
Do you mean "is there a way for the producer to notify consumers that it has finished sending messages?" If so, then no, there isn't. If a consumer calls "ReceiveMessage" and gets nothing back, or "ApproximateNumberOfMessages" returns zero, that's not a guarantee that no more messages will be sent, or even that there are no messages in flight. And the producer can't send any kind of "end of stream" message, because only one consumer would receive it, and it might arrive out of order. Even if you used a separate notification mechanism such as an SNS topic to notify all consumers, there's no guarantee that the SNS notification won't arrive before all the messages have been delivered.
But if you just want your pool of workers to back off when there are no messages left in the queue, then consider setting the "ReceiveMessageWaitTimeSeconds" property on your queue to its maximum value of 20 seconds. When there are no more messages to process, a ReceiveMessage call will block for up to 20s to see if a message arrives instead of returning immediately.
You could have whatever's managing your thread pool query ApproximateNumberOfMessages to regularly scale your thread pool up/down if you're concerned about releasing resources. If you do, then beware that the number you get back is approximate, and you should always assume there may be one or more messages left on the queue even if ApproximateNumberOfMessages returns zero.
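Putting the two suggestions together, one common pattern is a worker loop that long-polls and only shuts down after several consecutive empty receives, to compensate for the counters being approximate. A hedged sketch: the receive function is injected here so the loop logic stands alone; with boto3 it would wrap sqs.receive_message(QueueUrl=url, WaitTimeSeconds=20) and return the Messages list (names other than the SQS API calls are illustrative).

```python
# Sketch of a worker loop that backs off after several consecutive empty
# long polls. `receive` returns a (possibly empty) list of messages;
# in real use it would wrap:
#   sqs.receive_message(QueueUrl=url, WaitTimeSeconds=20).get("Messages", [])
def drain_until_idle(receive, handle, max_empty_polls=3):
    empty = 0
    processed = 0
    while empty < max_empty_polls:
        messages = receive()      # blocks up to 20 s per poll in real use
        if not messages:
            empty += 1            # "empty" is only approximate; require
            continue              # several in a row before giving up
        empty = 0
        for msg in messages:
            handle(msg)           # real code would also delete the message
            processed += 1
    return processed

# Toy run against a fake queue that empties out:
batches = [["a", "b"], [], ["c"], [], [], []]
it = iter(batches)
print(drain_until_idle(lambda: next(it, []), handle=lambda m: None))  # 3
```

With WaitTimeSeconds=20, three empty polls mean the worker idled for up to a minute before exiting, which is usually a reasonable trade-off between releasing resources promptly and not abandoning stragglers.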
