I have a Grails application using the Grails RabbitMQ plugin to handle messages asynchronously. The queue is durable, the messages are persistent, and there are 20 concurrent consumers. Acknowledgement is turned on and is set to issue an ack or nack based on whether the consumer returns normally or throws an exception. These consumers usually handle messages fine, but when the queue fills up very quickly (5,000 or so messages at once), some of the messages get lost.
The consumer logs when a message is received from Rabbit, and that log event never occurs for the lost messages, so the consumer is not receiving them at all. Furthermore, no exceptions appear in the logs.
I have tried increasing the consumers' prefetch value from 1 to 5, but that did not solve the problem. I have checked the RabbitMQ UI: there are no messages stuck in the queue and no unacknowledged messages.
Related
We have a setup where we use spring-amqp transacted channels to push our messages to RabbitMQ. During testing we found that messages were not even getting published from spring-amqp to RabbitMQ;
we suspect a failure in metricsCollector.basicPublish(this) in com.rabbitmq.client.impl.ChannelN (no exception is thrown),
because we can see that RabbitUtils.commitIfNecessary(channel) in org.springframework.amqp.rabbit.core.RabbitTemplate is not getting called when there is an issue executing metricsCollector.basicPublish(this) in the same code flow.
We have taken TCP dumps and could see that the messages were written to the stream/socket to RabbitMQ, but since the commit didn't happen, due to a probable AMQP API failure, the messages were not delivered to the corresponding queues.
Jar versions used in the setup:
spring-amqp-2.2.1.RELEASE.jar
spring-rabbit-2.2.1.RELEASE.jar
amqp-client-5.7.3.jar
metrics-core-3.0.2.jar
Is anyone facing a similar issue? Can someone please help?
---edit 1
(Setup): We are using the same connection factory for flows running with a parent transaction and for flows not running with a parent transaction.
On further analysis, we found that isChannelLocallyTransacted sometimes shows inconsistent behavior: ConnectionFactoryUtils.isChannelTransactional(channel, getConnectionFactory()) sometimes holds a reference to a transacted channel (it returns true, so the expression isChannelLocallyTransacted evaluates to false), due to which tx.commit never happens, and the message gets lost before being committed to RabbitMQ.
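For reference, a minimal, assumed configuration of a locally transacted RabbitTemplate (separate from any externally transacted flow) might look like the following sketch; the bean names and connection details are illustrative, not our actual setup:

```java
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RabbitConfig {

    @Bean
    public CachingConnectionFactory rabbitConnectionFactory() {
        // Illustrative host; real connection details would come from configuration.
        return new CachingConnectionFactory("localhost");
    }

    @Bean
    public RabbitTemplate transactedRabbitTemplate(CachingConnectionFactory cf) {
        RabbitTemplate template = new RabbitTemplate(cf);
        // Mark the channel as locally transacted so the template issues
        // txSelect/txCommit around each send.
        template.setChannelTransacted(true);
        return template;
    }
}
```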
I tried using parallel requests, but due to retention by AWS it does not allow polling the same queue again unless previously polled messages are deleted.
However, I was able to achieve this with a FIFO queue, but not with a standard queue.
Thanks in Advance!
:)
When you say "it does not allow to poll back the same queue unless previously polled messages are deleted", I assume you're talking about the inflight messages per queue limit, which is pretty high at 120,000:
For most standard queues (depending on queue traffic and message backlog), there can be a maximum of approximately 120,000 inflight messages (received from a queue by a consumer, but not yet deleted from the queue). If you reach this limit, Amazon SQS returns the OverLimit error message. To avoid reaching the limit, you should delete messages from the queue after they're processed. You can also increase the number of queues you use to process your messages. To request a limit increase, file a support request.
The expected use case of SQS is to have workers that receive a message, do some work, then delete the message. If you're not following this pattern, I'd strongly recommend reevaluating whether SQS is the right tool for what you're trying to do.
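As an illustration, that receive / process / delete pattern looks roughly like this with the AWS SDK for Java v2; the queue URL and processing logic are placeholders:

```java
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.*;

public class Worker {
    public static void main(String[] args) {
        SqsClient sqs = SqsClient.create();
        // Placeholder queue URL.
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue";

        // Receive a batch of up to 10 messages, using long polling.
        ReceiveMessageResponse response = sqs.receiveMessage(ReceiveMessageRequest.builder()
                .queueUrl(queueUrl)
                .maxNumberOfMessages(10)
                .waitTimeSeconds(20)
                .build());

        for (Message message : response.messages()) {
            process(message.body()); // do the actual work

            // Delete only after successful processing, so the message does not
            // stay inflight or get redelivered after the visibility timeout.
            sqs.deleteMessage(DeleteMessageRequest.builder()
                    .queueUrl(queueUrl)
                    .receiptHandle(message.receiptHandle())
                    .build());
        }
    }

    private static void process(String body) {
        System.out.println(body);
    }
}
```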
However, if you really have a valid use case for having more than 120K messages inflight at once, you'll need to describe your use case to AWS and get their approval to increase that limit.
I am a Ruby on Rails developer. I am using RabbitMQ in my project to process data as soon as it arrives in the queue. I am using the bunny gem, a RabbitMQ client that provides an interface for interacting with RabbitMQ.
My issue is that whenever an exception occurs or the server stops unexpectedly while processing data from the queue, the message is lost.
I want to know how people deal with lost messages from a RabbitMQ queue. Is there any way to get those messages back for processing?
There is no way to get the messages back once they're lost. Maybe you could try to track down some entries in RMQ's database cache, but that's just a wild guess/long shot and I don't think it will help.
What you do need to do for the future is:
In case you are using a single server: make the queues and messages durable, and explicitly acknowledge messages on the consumer side (so switch off the auto-ACK flag) only once they're processed (see the sketch below).
In case you are using a cluster of RMQ nodes (which is of course recommended, exactly to avoid these situations): set up queue mirroring.
Take a look at RMQ persistence and high availability.
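For illustration, here is a minimal sketch of the single-server setup (durable queue, persistent messages, manual acknowledgement), written with the Java amqp-client rather than Bunny; the queue name and handler are hypothetical:

```java
import com.rabbitmq.client.*;

public class DurableSetup {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        try (Connection connection = factory.newConnection()) {
            Channel channel = connection.createChannel();

            // Durable queue: survives a broker restart.
            channel.queueDeclare("jobs", true, false, false, null);

            // Persistent message: written to disk by the broker.
            channel.basicPublish("", "jobs",
                    MessageProperties.PERSISTENT_TEXT_PLAIN,
                    "some work".getBytes("UTF-8"));

            // Manual acknowledgement: auto-ack is off, so an unacked message is
            // redelivered if the consumer dies or throws before basicAck.
            channel.basicConsume("jobs", false, (consumerTag, delivery) -> {
                long tag = delivery.getEnvelope().getDeliveryTag();
                try {
                    process(new String(delivery.getBody(), "UTF-8"));
                } catch (Exception e) {
                    channel.basicNack(tag, false, true); // requeue for another attempt
                    return;
                }
                channel.basicAck(tag, false); // acknowledge only after successful processing
            }, consumerTag -> { });

            Thread.sleep(60_000); // keep the connection open while consuming
        }
    }

    private static void process(String body) { /* do the work */ }
}
```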
We are using Spring AMQP to listen to RabbitMQ for messages. I want to be able to report metrics once we have finished processing a batch of messages, i.e. when we have exhausted all the messages in the queue. I'm not sure how to do that in Spring AMQP. Browsing the Spring documentation, it mentions adding an advice chain to SimpleRabbitListenerContainerFactory, but that is mainly for a RetryInterceptor. Is there any way to do this kind of reporting?
There is nothing in the framework to notify the listener that there are no new messages available.
You could examine the queue using RabbitAdmin to see a message count, but that would be expensive to do on every message delivery.
Some ideas:
You could schedule a task to run after some period when no messages are received (and cancel/reschedule each time a new message arrives).
You could have the sending system add a marker to the "last" message so the receiver knows the batch is complete.
Instead of using the message listener container, use RabbitTemplate.receive() (or receiveAndConvert()), which by default returns null when there are no messages in the queue. Call it in a loop until there are no messages. When that happens, issue your report, then go into a polling loop (with a sleep) to poll for the next "batch".
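A minimal sketch of that last approach, assuming an injected RabbitTemplate and a hypothetical queue name and report() method:

```java
import org.springframework.amqp.core.Message;
import org.springframework.amqp.rabbit.core.RabbitTemplate;

public class BatchPoller {

    private final RabbitTemplate rabbitTemplate;

    public BatchPoller(RabbitTemplate rabbitTemplate) {
        this.rabbitTemplate = rabbitTemplate;
    }

    public void pollForever() throws InterruptedException {
        boolean processedAny = false;
        while (true) {
            // receive() returns null when the queue is empty.
            Message message = rabbitTemplate.receive("batch.queue");
            if (message != null) {
                process(message);
                processedAny = true;
            } else {
                if (processedAny) {
                    report();           // batch exhausted - emit metrics once
                    processedAny = false;
                }
                Thread.sleep(5_000);    // back off before polling for the next batch
            }
        }
    }

    private void process(Message message) { /* handle one message */ }

    private void report() { /* emit batch metrics */ }
}
```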
I'm using an Amazon SQS queue to send notifications to an external system.
If the HTTP request fails when calling SQS's SendMessage, I don't know whether the message has been queued or not. My default policy would be to retry posting the message to the queue, but that risks posting the message twice, which might not be acceptable depending on the use case.
Is there a way to have SQS refuse the message if there is a duplicate of the message body (or of some kind of message metadata, such as a unique ID we could provide), so that we could retry until the message is accepted and be confident there won't be a duplicate if the first request had already been queued but the response had been lost?
No, there's no such mechanism in SQS. Going further, it is also possible that a message will be delivered twice or more (at-least-once delivery semantics). So even if such a mechanism existed, you wouldn't be able to guarantee that the message isn't delivered multiple times.
See: http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/DistributedQueues.html
For exactly-once deliveries, you need some form of transactions (and HTTP isn't a transactional protocol) both on the sending and receiving end.
AFAIK, right now SQS does support what was asked!
Please see the "What's new" post entitled Amazon SQS Introduces FIFO Queues with Exactly-Once Processing and Lower Prices for Standard Queues
According to the SQS FAQ:
FIFO queues provide exactly-once processing, which means that each message is delivered once and remains available until a consumer processes it and deletes it. Duplicates are not introduced into the queue.
There's also an AWS Blog post with a bit more insight on the subject:
These queues are designed to guarantee that messages are processed exactly once, in the order that they are sent, and without duplicates.
......
Exactly-once processing applies to both single-consumer and multiple-consumer scenarios. If you use FIFO queues in a multiple-consumer environment, you can configure your queue to make messages visible to other consumers only after the current message has been deleted or the visibility timeout expires. In this scenario, at most one consumer will actively process messages; the other consumers will be waiting until the first consumer finishes or fails.
Duplicate messages can sometimes occur when a networking issue outside of SQS prevents the message sender from learning the status of an action and causes the sender to retry the call. FIFO queues use multiple strategies to detect and eliminate duplicate messages. In addition to content-based deduplication, you can include a MessageDeduplicationId when you call SendMessage for a FIFO queue. The ID can be up to 128 characters long, and, if present, takes higher precedence than content-based deduplication.
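As an illustration, sending to a FIFO queue with an explicit deduplication ID looks roughly like this with the AWS SDK for Java v2; the queue URL, group ID, and deduplication ID are placeholders:

```java
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

public class FifoSender {
    public static void main(String[] args) {
        SqsClient sqs = SqsClient.create();

        sqs.sendMessage(SendMessageRequest.builder()
                // Placeholder FIFO queue URL (note the .fifo suffix).
                .queueUrl("https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo")
                .messageGroupId("orders")               // required for FIFO queues
                .messageDeduplicationId("order-12345")  // retries with the same ID are dropped
                .messageBody("{\"orderId\": 12345}")
                .build());

        // Retrying the same call (e.g. after a lost HTTP response) with the same
        // MessageDeduplicationId within the deduplication interval (5 minutes)
        // will not enqueue a second copy of the message.
    }
}
```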