SQS - How far out of order might messages be delivered? - amazon-sqs

I have a use case for SQS where I'll be sending messages about specific objects within a system. Each object will have a message at most every 20 seconds, and there are hundreds of thousands (potentially millions) of objects, which means I'll be handling tens of thousands (potentially hundreds of thousands) of messages per second. The volume of messages precludes using FIFO queues.
Most of the time, I don't care about in-order messaging. If messages for two different objects get delivered in a different order than they were emitted, that's fine. What could potentially be a problem is if two messages relating to the same object were delivered out of order.
Given that each object would only have events every 20 seconds, and 20 seconds is an eternity in computing time, it strikes me that it would be very unlikely for two messages sent 20 seconds apart (with potentially millions of messages between them) to be delivered out of order. That said, I haven't been able to find any hard data about out-of-order delivery with SQS. I know it's a thing that can happen, but I haven't seen any measured data about it.
I'm wondering if there is any kind of measured data on the probability that a message gets delivered X amount of time out of order, or X messages out of order.

SQS makes no guarantee about how far out of order a message can appear for a non-FIFO queue.
The closest measurement I've seen to what you're looking for is this experiment, which measured how long it takes for a message to become available for polling after it has been submitted to the queue. They also link to the source code if you want to replicate the experiment and gather your own metrics.
If you absolutely must have them in the original order, you have a few options. They're not necessarily good options, but they are options.
Determine a way to horizontally partition your object IDs into n buckets, and use n different FIFO queues. (Probably the best option.)
Add your own sequence numbers to the messages.
Partition your messages into queues based on the current time. Drain each queue in order. (For example, you might publish to a single queue for only 4 seconds, and rotate sequentially through a group of 15 queues.)
Use a database and store the message timestamps in a way that allows you to get the oldest message.
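As a rough illustration of the first two options, here is a minimal sketch in plain Python. It assumes that each message carries an object ID and a producer-assigned sequence number, and that only the newest state of an object matters to the consumer. The names bucket_for and StaleMessageFilter are made up for this example and are not part of any AWS SDK.

```python
import hashlib


def bucket_for(object_id: str, n_buckets: int) -> int:
    """Map an object ID to a stable bucket, e.g. one of n FIFO queues."""
    # A cryptographic digest is used only because it is stable across
    # processes and machines, unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


class StaleMessageFilter:
    """Remembers the highest sequence number seen per object (hypothetical helper)."""

    def __init__(self) -> None:
        self._latest = {}  # object_id -> highest sequence number accepted

    def accept(self, object_id: str, seq: int) -> bool:
        """Return True if the message is newer than anything seen for this object."""
        if seq <= self._latest.get(object_id, -1):
            return False  # late arrival or duplicate: the consumer should skip it
        self._latest[object_id] = seq
        return True
```

With this, a consumer would call accept(object_id, seq) before applying a message and skip it on False. If every intermediate message has to be applied in order, dropping stale ones is not enough and you would need to buffer and reorder instead.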

Related

Is it sufficient to set the ROS publisher buffer to 1 and the subscriber buffer to 1000 and still not lose any messages?

I am trying to understand subscriber and publisher buffers. If I set the subscriber buffer to 1000 and the publisher buffer to 1, is there any chance that I lose messages? Could anyone please explain?
Yes, in theory you may lose messages with these settings; in practice it depends.
Theory: spinner threads
On both sides, publisher as well as subscriber, there are so-called spinner threads responsible for handling the callbacks (message sending on the publisher side and message evaluation on the subscriber side). These spinner threads work in parallel to the main thread. If messages arrive from the main thread faster than the spinner thread can process them, up to queue-size messages are buffered before the oldest ones start being thrown away. Therefore, if you publish at a very high rate, the publisher-side spinner thread might drop older messages, while if your callback function on the subscriber side takes too long to execute, your subscriber queue will start dropping messages. To improve this, one can use multi-threaded spinners, which increase the number of spinner threads and enable concurrency in order to process the callback queue more quickly. Read more about it here.
Practice: Choosing the queue size
The publisher queue size you should set depends on the rate at which you publish and whether you publish in bursts. If you publish in bursts or at higher frequencies (e.g. > 10 Hz), a publisher queue size of 1 won't be sufficient. On the subscriber side it is harder to give recommendations, as it also depends on how long the callback takes to process the information.
It is actually also possible to set the value 0 for the queues which results in an arbitrarily large queue but this might be problematic as the required memory might grow indefinitely, well at least until your computer freezes. Furthermore having a large queue size might often be disadvantageous: If you set a large queue and the callback takes long to execute you might be working on very outdated data while the queue gets longer and longer.
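To make the dropping behaviour concrete, here is a small simulation in plain Python. It is not ROS code and uses no ROS API: the function deliver is made up for this example, and it simply assumes a bounded queue that discards its oldest entry when full and a consumer that only gets to run after every burst_size arrivals.

```python
from collections import deque


def deliver(messages, queue_size, burst_size):
    """Simulate a bounded queue that is drained once per burst of arrivals."""
    # A full deque silently discards its oldest item on append.
    # queue_size == 0 is treated as unbounded, mirroring the convention above.
    queue = deque(maxlen=queue_size or None)
    handled = []
    for i, msg in enumerate(messages, start=1):
        queue.append(msg)
        if i % burst_size == 0:  # the consumer only runs after each burst
            handled.extend(queue)
            queue.clear()
    handled.extend(queue)  # drain whatever is left at the end
    return handled


print(deliver(range(6), queue_size=1, burst_size=3))  # [2, 5]
print(deliver(range(6), queue_size=3, burst_size=3))  # [0, 1, 2, 3, 4, 5]
```

With a queue size of 1, only the last message of each burst survives; once the queue is at least as large as the burst, nothing is lost.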
Alternative communication patterns
If you want to guarantee that information is actually being processed (e.g. real-time or safety-relevant information) ROS topics are probably the wrong choice. Depending on what precisely you need the other two communication methods services or actions might be an alternative. But for things like large information streams of safety-relevant real-time data there are no perfect communication mechanisms in ROS1.

Speed up the process of requesting messages from SQS

We need to process a large number of messages stored in SQS (the messages originate from an Amazon store, and SQS is the only place we can save them to) and save the result to our database. The problem is that SQS can only return 10 messages at a time. Considering we can have up to 300000 messages in SQS, even if requesting and processing 10 messages takes little time, the whole process takes forever, with the main culprit being the requesting and receiving of messages from SQS.
We're looking for a way to speed this up. The intended result would be dumping the results to our database. The process would probably run a few times per day (the number of messages would likely be less per run in that scenario).
As Michael-sqlbot wrote, parallel requests were the solution. By rewriting our code to use async and making 10 requests at the same time, we managed to reduce the execution time to something much more reasonable.
I guess it's because I rarely use multithreading directly in my job, that I haven't thought of using it to solve this problem.
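For anyone wondering what "10 requests at the same time" can look like, here is a minimal sketch using Python's asyncio. It is not our actual code: receive_batch is a hypothetical stand-in that simulates one SQS receive call (at most 10 messages, with some network latency) against an in-memory list, so the example runs without AWS credentials.

```python
import asyncio


async def receive_batch(pending: list, max_messages: int = 10) -> list:
    """Hypothetical stand-in for one SQS receive call (at most 10 messages)."""
    await asyncio.sleep(0.01)  # simulated network latency
    batch = pending[:max_messages]
    del pending[:max_messages]
    return batch


async def drain(pending: list, concurrency: int = 10) -> list:
    """Issue `concurrency` receive calls at a time until the queue is empty."""
    received = []
    while pending:
        # All calls in a round wait on the network concurrently,
        # so a round costs roughly one round trip instead of ten.
        batches = await asyncio.gather(
            *(receive_batch(pending) for _ in range(concurrency))
        )
        for batch in batches:
            received.extend(batch)
    return received


messages = asyncio.run(drain(list(range(250))))
print(len(messages))  # 250
```

In a real client, receive_batch would wrap the SDK's receive call and the loop would stop after a round of empty responses rather than by inspecting a local list.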

Parse large and multiple fetches for statistics

In my app I want to retrieve a large amount of data from Parse to build a view of statistics. However, in future, as data builds up, this may be a huge amount.
For example, 10,000 results. Even if I fetched in batches of 1000 at a time, this would result in 10 fetches. This could rapidly send me over Parse's limit of 30 requests per second, especially when several other chunks of data may need to be collected at the same time for other stats.
Any recommendations/tips/advice for this scenario?
You will also run into limits with the skip and limit query variables. Heavy lifting on a mobile device could also present issues for you.
If you can, you should pre-aggregate these statistics, perhaps once per day, so that you can simply request the details directly.
Alternatively, create a cloud code function to do some processing for you and return the results. Again, you may well run into limits here, so a cloud job may meet your needs better. In that case you may need to create a request object which is processed by the job, and then either poll for completion or send out push notifications on completion.

Is it guaranteed that mnesia event listeners will get each state of a record, if it changes fast?

Let's say I have some record like {my_table, Id, Value}.
I constantly overwrite the value so that it holds consecutive integers like 1, 2, 3, 4, 5 etc.
In a distributed environment, is it guaranteed that my event listeners will receive all of the values? (I don't care about ordering)
I haven't verified this by reading that part of the source yet, but it appears that sending a message out is part of the update process, so messages should always come out, even on very fast changes. (The alternative would be for Mnesia to either queue messages or queue changes and run them in batches. I'm almost positive this is not what happens -- it would be too hard to predict the variability of advantageous moments to start batching jobs or queueing messages. Sending messages is generally much cheaper than making a change in the db.)
Since Erlang guarantees delivery of messages to a live destination this is as close to a promise that every Mnesia change will eventually be seen as you're likely to get. The order of messages couldn't be guaranteed on the receiving end (as it appears you expect), and of course a network failure could make a set of messages get missed (rendering the destination something other than live from the perspective of the sender).

Use it for JSON data transfer

I am trying to use RabbitMQ for a distributed system that would work something like:
a producer puts in a queue a JSON-formatted list of order ids
several consumers pull from that queue, run the business logic on those order ids, and put the result (also JSON-formatted) into another queue
from the second queue, another consumer will take the data and pass it back to the caller
I am still very new to RabbitMQ and I am wondering if this model is the right approach, given that the data should come back as fast as possible (sometimes in a matter of seconds, max 5), so there are real-time requirements.
Also, how large can a message passed to a queue be? The JSON that the producer will get back will be fairly large, depending on what the consumer does.
Thanks for any ideas!
See page 47 in this presentation (InfoQ) for a great comparison between different messaging formats.
There's nothing wrong with the design you suggested.
The slight wrinkle is that enforcing "real time requirements" isn't straightforward. For instance, it's not currently possible to expire messages within a queue, so this would need to be handled by the clients when consuming messages.
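A minimal sketch of such a client-side check, in plain Python with no RabbitMQ client involved. The function names wrap and unwrap_if_fresh, the sent_at field, and the 5-second limit (taken from the "max 5" in the question) are all assumptions made for illustration; the sketch also assumes producer and consumer clocks are reasonably in sync.

```python
import json
import time

MAX_AGE_SECONDS = 5.0  # assumed deadline, from the "max 5" seconds in the question


def wrap(order_ids, now=None):
    """Producer side: stamp the message body with the time it was sent."""
    sent_at = time.time() if now is None else now
    return json.dumps({"sent_at": sent_at, "order_ids": order_ids})


def unwrap_if_fresh(body, now=None):
    """Consumer side: return the order ids, or None if the message is too old."""
    now = time.time() if now is None else now
    message = json.loads(body)
    if now - message["sent_at"] > MAX_AGE_SECONDS:
        return None  # expired: the consumer discards it instead of processing it
    return message["order_ids"]
```

The optional now argument is only there so the check can be exercised without waiting for real time to pass.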
The total size of messages in RabbitMQ <=1.8.1 was bounded by the amount of available RAM. As of 2.0.0, it's bounded by the amount of available disk space (i.e. rabbit will page messages to disk if it's running low on memory). Individual message sizes are recorded as 32-bit integers (IIRC), so individual messages cannot be larger than ~4GB; if this is a problem, consider saving the JSONs to network storage and passing some ID to them in the messages. Other than this, there aren't any constraints.
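The "save the JSON elsewhere and pass an ID" idea can be sketched as follows. This is plain Python, not RabbitMQ code: blob_store is a dictionary standing in for whatever network storage you use, and to_message and from_message are hypothetical helper names.

```python
import json
import uuid

# Hypothetical stand-in for network storage (shared file system, object store, ...).
blob_store = {}


def to_message(payload: dict) -> str:
    """Store the large payload and return a small message that references it."""
    blob_id = str(uuid.uuid4())
    blob_store[blob_id] = json.dumps(payload)
    return json.dumps({"blob_id": blob_id})


def from_message(message: str) -> dict:
    """Resolve the reference carried in the message back into the payload."""
    blob_id = json.loads(message)["blob_id"]
    return json.loads(blob_store[blob_id])
```

The message that travels through the queue stays a few dozen bytes regardless of how large the result is; the trade-off is an extra storage round trip and the need to clean up stored blobs once they have been consumed.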
