Provider configuration set number of messages to prefectch - amazon-sqs

I am new to spring boot jms and i saw something like provider configuration - > set Number of messages to prefetch function, i could not find much about it and i don't have much clear idea as to what does it mean?
I have my concurrency set to 5-100. So does it mean, everytime a new consumer is spawned, this will get the number of messages, lets say if i set it 5, will it get 5 messages from queue at once?
Without using spring jms we use to make a receive call which can fetch upto 10 message, over here we have a jms listener, are these two same?
Can anyone clarify its real purpose in terms of multiple consumers, what it will do?

Yes; the prefetch applies to each consumer.
Using multiple consumers allows you to process messages is parallel to improve performance, but you can only do that if the order in which you process messages is not important.

Related

Background Tasks in Spring (AMQP)

I need to handle a time-consuming and error-prone task (e.g., invoking a SOAP endpoint that will trigger the delivery of an SMS) whenever a given endpoint of my REST API is invoked, but I'd prefer not to make my users wait for that before sending a response back. Spring AMQP is already part of my stack, so I though about leveraging it to establish a "work queue" and have a number of worker processes consuming from the queue and taking care of the "work units". I have, however, the following requirements:
A work unit is guaranteed to be delivered, and delivered to exactly one worker.
Shall a work unit fail to be completed for any reason it must get placed back in the queue so that another worker can pick it up later.
Work units survive server reboots and crashes. This is mandatory because I won't be using a DB of any kind to store them.
I know RabbitMQ and Spring AMQP can be configured in such a way that ensures these three requirements, but I've only ever used it to achieve RPC so I don't know much about anything other than that. Is there any example I might follow? What are some of the pitfalls to watch out for?
While creating queues, rabbitmq gives you two options; transient or durable. Durable messages will be available until you acknowledge them. And messages won't expire if you do not give queue a ttl. For starters you can enable rabbitmq management plugin and play around a little.
But if you really want to guarantee the safety of your messages against hard resets or hardware problems, i guess you need to use a rabbitmq cluster.
Rabbitmq Clustering and you can find high availability subject on the right side of the page.
This guy explaines how to cluster
By the way i like beanstalkd too. You can make it write messages to disk and they will be safe except disk failures.

Deploying an SQS Consumer

I am looking to run a service that will be consuming messages that are placed into an SQS queue. What is the best way to structure the consumer application?
One thought would be to create a bunch of threads or processes that run this:
def run(q, delete_on_error=False):
while True:
try:
m = q.read(VISIBILITY_TIMEOUT, wait_time_seconds=MAX_WAIT_TIME_SECONDS)
if m is not None:
try:
process(m.id, m.get_body())
except TransientError:
continue
except Exception as ex:
log_exception(ex)
if not delete_on_error:
continue
q.delete_message(m)
except StopIteration:
break
except socket.gaierror:
continue
Am I missing anything else important? What other exceptions do I have to guard against in the queue read and delete calls? How do others run these consumers?
I did find this project, but it seems stalled and has some issues.
I am leaning toward separate processes rather than threads to avoid the the GIL. Is there some container process that can be used to launch and monitor these separate running processes?
There are a few things:
The SQS API allows you to receive more than one message with a single API call (up to 10 messages, or up to 256k worth of messages, whichever limit is hit first). Taking advantage of this feature allows you to reduce costs, since you are charged per API call. It looks like you're using the boto library - have a look at get_messages.
In your code right now, if processing a message fails due to a transient error, the message won't be able to be processed again until the visibility timeout expires. You might want to consider returning the message to the queue straight away. You can do this by calling change_visibility with 0 on that message. The message will then be available for processing straight away. (It might seem that if you do this then the visibility timeout will be permanently changed on that message - this is actually not the case. The AWS docs state that "the visibility timeout for the message the next time it is received reverts to the original timeout value". See the docs for more information.)
If you're after an example of a robust SQS message consumer, you might want to check out NServiceBus.AmazonSQS (of which I am the author). (C# - sorry, I couldn't find any python examples.)

MSMQ max count message notification

We are in process of implementing msmq for the quick storage of the messages and process them in disconnected mode. Typical usage of any message broker.
One of the administration requirement is to send the automatic notification to administrator/developers if the queue messages (unprocessed) count reaches 1000.
Can it be done out of the box? If yes then how?
If no then do I need to write some windows service (or any sort of scheduler) to check the count every x-seconds?
Any suggestions or past experience is welcome..
The only (partially) built-in solution would be to set up the MSMQ Queue performance counter which gives you this information for private queues on the server.
There are a number of other solutions, including a SCOM management pack, and some third party solutions like evtools, or you could roll you own using System.Messaging.
Hope this is of help.
There's commercial solution for this - QueueMonitor.
Disclaimer: I'm the author of that software.
Edit
Few tips for this scenario:
set message's UseDeadLetterQueue to true - this way if there's any issue with delivering messages at least they won't be lost but moved to system's dead letter queue.
set message's Recoverable property to true - it does reduce performance, but for this kind of long running scenario there's too much risk that some restart or failure would loose messages which are only stored in memory.
if messages are no longer valid after some period, you can use TimeToReachQueue to automatically delete them.

how to retrieve nth item in a queue with amazon sqs and ruby

Iam sending messages to the queue and using amazon sqs queuing system in a rails application. But since the queue follows FIFO process, it will get the next items in the same fashion. Suppose if I have 100 items in a queue, how can I retrieve the 35th item from the queue and process it. As far as I know, there is no such method that amazon sqs provides for doing it. So is there any other method/workaround where I can achieve the this functionality.
There is no method to do that; SQS does not guarantee order of items in the queue due to its geographically redundant nature; it can't even guarantee FIFO. If you absolutely must process things in order, and need the ability to 'look ahead' in the queue, SQS may not be your best choice. Perhaps a custom made queue in something like DynamoDB may be work better.
SQS is designed to guarantee at-least-once delivery and does not take into account the order of messages. So the simple answer to your question on whether you can do that, is no.
A work around would depend on your use-case:
To split work among different processes handling queue messages and making sure they don't both process the same item - Different queues is one approach, or prefixing every message with an identifier denoting which process is supposed to work on it. For example, if I have 4 daemons's running, I could prefix every message in the queue with the ID of the process which should work on it - 1,2,3 or 4. Every process would only process messages with the number corresponding to it's ID.
Order of arrival is critical - In this case, you're better off not using SQS because it wasn't to be used this way. CloudAMQP is a cloud based service that is based off RabbitMQ which is a true FIFO queue and would suit this case better than SQS.

What is a good practice to achieve the "Exactly-once delivery" behavior with Amazon SQS?

According to the documentation:
Q: How many times will I receive each message?
Amazon SQS is
engineered to provide “at least once” delivery of all messages in its
queues. Although most of the time each message will be delivered to
your application exactly once, you should design your system so that
processing a message more than once does not create any errors or
inconsistencies.
Is there any good practice to achieve the exactly-once delivery?
I was thinking about using the DynamoDB “Conditional Writes” as distributed locking mechanism but... any better idea?
Some reference to this topic:
At-least-once delivery (Service Behavior)
Exactly-once delivery (Service Behavior)
FIFO queues are now available and provide ordered, exactly once out of the box.
https://aws.amazon.com/sqs/faqs/#fifo-queues
Check your region for availability.
The best solution really depends on exactly how critical it is that you not perform the action suggested in the message more than once. For some actions such as deleting a file or resizing an image it doesn't really matter if it happens twice, so it is fine to do nothing. When it is more critical to not do the work a second time I use an identifier for each message (generated by the sender) and the receiver tracks dups by marking the ids as seen in memchachd. Fine for many things, but probably not if life or money depends on it, especially if there a multiple consumers.
Conditional writes sound like a clever solution, but it has me wondering if perhaps AWS isn't such a great solution for your problem if you need a bullet proof exactly-once solution.
Another alternative for distributed locking is Redis cluster, which can also be provisioned with AWS ElasticCache. Redis supports transactions which guarantee that concurrent calls will get executed in sequence.
One of the advantages of using cache is that you can set expiration timeouts, so if your message processing fails the lock will get timed release.
In this blog post the usage of a low-latency control database like Amazon DynamoDB is also recommended:
https://aws.amazon.com/blogs/compute/new-for-aws-lambda-sqs-fifo-as-an-event-source/
Amazon SQS FIFO queues ensure that the order of processing follows the
message order within a message group. However, it does not guarantee
only once delivery when used as a Lambda trigger. If only once
delivery is important in your serverless application, it’s recommended
to make your function idempotent. You could achieve this by tracking a
unique attribute of the message using a scalable, low-latency control
database like Amazon DynamoDB.
In short - we can put item or update item in dynamodb table with condition expretion attribute_not_exists(for put) or if_not_exists(for update), please check example here
https://stackoverflow.com/a/55110463/9783262
If we get an exception during put/update operations, we have to return from a lambda without further processing, if not get it then process the message (https://aws.amazon.com/premiumsupport/knowledge-center/lambda-function-idempotent/)
The following resources were helpful for me too:
https://ably.com/blog/sqs-fifo-queues-message-ordering-and-exactly-once-processing-guaranteed
https://aws.amazon.com/blogs/aws/introducing-amazon-sns-fifo-first-in-first-out-pub-sub-messaging/
https://youtu.be/8zysQqxgj0I

Resources