Is there any latency in SQS while creating it using AWS API and sending messages immediately after creating it - amazon-sqs

I want to create SQS using code whenever it is required to send messages and delete it after all messages are consumed.
I just wanted to know if there is some delay required between creating an SQS using Java code and then sending messages to it.
Thanks.
Virendra Agarwal

You'll have to try it and make observations. SQS is a dostributed system, so there is a possibility that a queue might not immediately be usable, though I did not find a direct documentation reference for this.
Note the following:
If you delete a queue, you must wait at least 60 seconds before creating a queue with the same name.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_CreateQueue.html
This means your names will always need to be different, but it also implies something about the internals of SQS -- deleting a queue is not an instantaneous process. The same might be true of creation, though that is not necessarily the case.
Also, there is no way to know with absolute certainty that a queue is truly empty. A long poll that returns no messages is a strong indication that there are no messages remaining, as long as there are also no messages in flight (consumed but not deleted -- these will return to visibility if the consumer resets their visibility or improperly handles an exception and does not explicitly reset their visibility before the visibility timeout expires).
However, GetQueueAttributes does not provide a fail-safe way of assuring a queue is truly empty, because many of the counter attributes are the approximate number of messages (visible, in-flight, etc.). Again, this is related to the distributed architecture of SQS. Certain rare, internal failures could potentially cause messages to be stranded internally, only to appear later. The significance of this depends on the importance of the messages and the life cycle of the queue, and the risks of any such an issue seem -- to me -- increased when a queue does not have an indefinite lifetime (i.e. when the plan for a queue is to delete it when it is "empty"). This is not to imply that SQS is unreliable, only to make the point that any and all systems do eventually behave unexpectedly, however rare or unlikely.

Related

Is it sufficient to set ROS publisher buffer to 1 and Subscriber buffer to 1000 and still not loose any messages

I am trying to understand subscriber and publisher buffers. If I set subsrciber buffer to 1000 and publisher buffer to 1, are there any chances that I loose messages ? Could anyone please explain me the same?
Yes, in theory you may lose messages with these settings, in practice it depends.
Theory: spinner threads
On both sides, publisher as well as subscriber, there are so called spinner threads responsible for handling the callbacks (for message sending on the publisher side and message evaluation on the subscriber-side). These spinner threads are working in parallel to the main thread. If messages are arriving faster from the main thread than they are being processed by the spinner thread, the number of messages given by the queue size will be buffered up before beginning to throw away the oldest ones. Therefore if you publish at a very high rate the publisher-sided spinner thread might drop older messages, while if your callback function on the subscriber side takes too long to execute your subscriber queue will start dropping messages. To improve this one can use multi-threaded spinners where one increases the number of spinner threads and activate concurrency in order to process the callback queue more quickly. Read more about it here.
Practice: Choosing the queue size
The queue size of the publisher queue you should set depends on which rate you publish and if you publish in bursts. If you publish in bursts or at higher frequencies (e.g. > 10 Hz) a publisher queue size of 1 won't be sufficient. On the subscriber side it is harder to give recommendations as it also depends on how long the callback takes to process the information.
It is actually also possible to set the value 0 for the queues which results in an arbitrarily large queue but this might be problematic as the required memory might grow indefinitely, well at least until your computer freezes. Furthermore having a large queue size might often be disadvantageous: If you set a large queue and the callback takes long to execute you might be working on very outdated data while the queue gets longer and longer.
Alternative communication patterns
If you want to guarantee that information is actually being processed (e.g. real-time or safety-relevant information) ROS topics are probably the wrong choice. Depending on what precisely you need the other two communication methods services or actions might be an alternative. But for things like large information streams of safety-relevant real-time data there are no perfect communication mechanisms in ROS1.

How do Erlang/Akka etc. send messages under the hood? Why doesn't it lead to deadlock?

Message sending is a useful abstraction, but it seems to be a bit misleading because it is not like letters sent through a post box that are literally moving through the system.
Similarly in Kafka they talk about messages but really it's just reading/writing to a distributed, append-only log.
In Erlang/Akka you actually copy the data rather than 'send it' so how does this work?
I was imagining something like Alice sends a message to Bob by
acquiring a lock to Alice's queue (i.e. mailbox)
write the message to the queue
release the lock
do something else
Given that you can send a message to anyone how does this not result in a massive deadlock with processes all waiting to message Alice. It seems like it might be useful to have multiple intermediate mailboxes for popular actors so you can write to that and then go do something else faster.
The receiver is not locking its mailbox when it is waiting for a message; only when it checks it, briefly. If there is no matching message, it releases the lock and goes to sleep, then gets woken up when new messages arrive. Likewise, senders also only need to aquire the lock while inserting the message. There is never any deadlock situation on this level.
Processes may still get deadlocked because of logical errors where both are expecting a message from the other at the same time, but that's a different matter, and the message passing style makes it less likely to end up in that situation, because there is no lock management to screw up on the user level.
As you mention, yes, it is useful to have intermediate mailboxes to reduce contention (a sender can add to the incoming side of the mailbox while a receiver is holding a lock to scan through the messages arrived so far), and that optimization is handled for you under the hood by the Erlang VM.

Speed up the proces of requesting messages from SQS

We need to process a big number of messages stored in SQS (the messages originate from Amazon store and SQS is the only place we can save them to) and save the result to our database. The problem is, SQS can only return 10 messages at a time. Considering we can have up to 300000 messages in SQS, even if requesting and processing a 10 messages takes little time, the whole process takes forever with the main culprit being actually requesting and receiving the messages from SQS.
We're looking for a way to speed this up. The intended result would be dumping the results to our database. The process would probably run a few times per day (the number of messages would likely be less per run in that scenario).
Like Michael-sqlbot wrote, parallel requests were the solution. By rewriting our code to use async and making 10 requests at the same time, we managed to reduce the execution time to something much reasonable.
I guess it's because I rarely use multithreading directly in my job, that I haven't thought of using it to solve this problem.

Is it guaranteed that mnesia event listeners will get each state of a record, if it changes fast?

Let's say I have some record like {my_table, Id, Value}.
I constantly overwrite the value so that it holds consecutive integers like 1, 2, 3, 4, 5 etc.
In a distributed environment, is it guaranteed that my event listeners will receive all of the values? (I don't care about ordering)
I haven't verified this by reading that part of the source yet, but it appears that sending a message out is part of the update process, so messages should always come out, even on very fast changes. (The alternative would be for Mnesia to either queue messages or queue changes and run them in batches. I'm almost positive this is not what happens -- it would be too hard to predict the variability of advantageous moments to start batching jobs or queueing messages. Sending messages is generally much cheaper than making a change in the db.)
Since Erlang guarantees delivery of messages to a live destination this is as close to a promise that every Mnesia change will eventually be seen as you're likely to get. The order of messages couldn't be guaranteed on the receiving end (as it appears you expect), and of course a network failure could make a set of messages get missed (rendering the destination something other than live from the perspective of the sender).

Amazon SQS End of Queue Detection

I was wondering if there was a best practice for notifying the end of an sqs queue. I am spawning a bunch of generic workers to consume data from a queue and I want to notify them that they can stop processing once they detect no more messages in the queue. Does sqs provide this type of feature?
By looking at the right_aws ruby gem source code for SQS I found that there is the ApproximateNumberOfMessages attribute on a queue. Which you can request using a standard API call.
You can find more information including examples here:
http://docs.amazonwebservices.com/AWSSimpleQueueService/latest/APIReference/Query_QueryGetQueueAttributes.html
For more information on how to do this using the right_aws gem in ruby look at:
https://github.com/rightscale/right_aws/blob/master/lib/sqs/right_sqs_gen2_interface.rb#L187
https://github.com/rightscale/right_aws/blob/master/lib/sqs/right_sqs_gen2_interface.rb#L389
Do you mean "is there a way for the producer to notify consumers that it has finished sending messages?" . If so, then no there isn't. If a consumer calls "ReceiveMessage" and gets nothing back, or "ApproximateNumberOfMessages" returns zero, that's not a guarantee that no more messages will be sent or even that there are no messages in flight. And the producer can't send any kind of "end of stream" message because only one consumer will receive it, and it might arrive out of order. Even if you used a separate notification mechanism such as an SNS topic to notify all consumers, there's no guarantee that the SNS notification won't arrive before all the messages have been delivered.
But if you just want your pool of workers to back off when there are no messages left in the queue, then consider setting the "ReceiveMessageWaitTimeSeconds" property on your queue to its maximum value of 20 seconds. When there are no more messages to process, a ReceiveMessage call will block for up to 20s to see if a message arrives instead of returning immediately.
You could have whatever's managing your thread pool query ApproximateNumberOfMessages to regularly scale/up down your thread pool if you're concerned about releasing resources. If you do, then beware that the number you get back is Approximate, and you should always assume there may be one or more messages left on the queue even if ApproximateNumberOfMessages returns zero.

Resources