I read in the Pub/Sub docs that if both oldest_unacked_message_age and num_undelivered_messages are growing in tandem, it indicates that the subscribers are not keeping up with the message volume. Can someone explain or elaborate on this?
A subscriber is an application with a subscription to one or more topics, from which it receives messages. After a message is sent to a subscriber, the subscriber must acknowledge it.
If Pub/Sub attempts to deliver a message but the subscriber can't acknowledge it within the acknowledgement deadline (because of a bug in your code or for some other reason), Pub/Sub automatically tries to resend the message; by default, it retries immediately. Pub/Sub keeps resending messages that aren't acknowledged. If there is an inadequate number of subscribers to handle a high volume of messages, acknowledging them can take too long, the messages are redelivered, and the subscribers see duplicates. That is what "the subscribers not keeping up with the message volume" indicates.
We can prevent the above situation by:
Add more subscriber threads or processes.
Add more subscriber machines or containers.
Look for signs of bugs in your code that prevent it from successfully acknowledging messages or processing them in a timely fashion.
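For instance, with the google-cloud-pubsub Python client, a single subscriber process can be given more concurrency by handing the streaming pull a bigger thread pool. This is only a minimal sketch; the project and subscription names are placeholders:

```python
from concurrent import futures
from google.cloud import pubsub_v1

# Placeholder names for illustration only.
PROJECT_ID = "my-project"
SUBSCRIPTION_ID = "my-subscription"

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    # Do the real work here, then ack so Pub/Sub stops redelivering the message.
    print(f"Received {message.data!r}")
    message.ack()

# More worker threads means more messages processed in parallel by this one client.
executor = futures.ThreadPoolExecutor(max_workers=16)
scheduler = pubsub_v1.subscriber.scheduler.ThreadScheduler(executor)

streaming_pull_future = subscriber.subscribe(
    subscription_path, callback=callback, scheduler=scheduler
)

with subscriber:
    try:
        streaming_pull_future.result()  # block and keep receiving
    except KeyboardInterrupt:
        streaming_pull_future.cancel()
        streaming_pull_future.result()
```

Running several such processes (or containers) against the same subscription spreads the load further, since Pub/Sub distributes messages across all attached subscriber clients.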
For more information, you can follow link1 and link2.
These two metrics measure two different properties of a subscription's backlog. Let's examine how they tend to grow, looking first at oldest_unacked_message_age, which gives the age of the oldest message that has not been acknowledged by subscribers. This can grow for several reasons, including:
It is a message that the subscriber cannot handle and therefore keeps getting nacked or has its ack deadline expire, which results in redelivery. If this is the case, you will typically see oldest_unacked_message_age grow in tandem with passing time. In other words, for every minute that passes, the value of oldest_unacked_message_age increases by a minute. If only a small number of messages are being rejected, then num_undelivered_messages will reflect the number of messages that are being rejected and will likely be much smaller than the total number of messages published. A dead letter topic can help with such messages (see the sketch after these two cases).
The subscriber is not able to keep up with the load of published messages. If there is not enough subscriber capacity to keep up with load, then a backlog of messages to be delivered builds up. As this backlog grows, the age of the oldest message in the backlog likely grows as well. Therefore, in this case, oldest_unacked_message_age and num_undelivered_messages both increase (or at least, don't decrease) over time. In this case, oldest_unacked_message_age may not grow in lockstep with time; it's possible that you are able to consume some older messages, but just not able to keep up fully, so the oldest_unacked_message_age may be growing more slowly or may remain steady at a non-zero value.
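For the first case, a dead letter topic can be attached to the subscription so that messages that keep failing are moved aside after a set number of delivery attempts. A minimal sketch with the google-cloud-pubsub Python client, using placeholder project, subscription, and topic names:

```python
from google.cloud import pubsub_v1

# Placeholder names for illustration only.
project_id = "my-project"
subscription_id = "my-subscription"
dead_letter_topic_id = "my-dead-letter-topic"

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, subscription_id)
dead_letter_topic_path = publisher.topic_path(project_id, dead_letter_topic_id)

subscription = pubsub_v1.types.Subscription(
    name=subscription_path,
    dead_letter_policy=pubsub_v1.types.DeadLetterPolicy(
        dead_letter_topic=dead_letter_topic_path,
        max_delivery_attempts=5,  # after 5 failed deliveries, forward to the dead letter topic
    ),
)

with subscriber:
    subscriber.update_subscription(
        request={
            "subscription": subscription,
            "update_mask": {"paths": ["dead_letter_policy"]},  # only change this field
        }
    )
```

Note that the Pub/Sub service account also needs permission to publish to the dead letter topic for forwarding to work.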
The second case is the one to which you are referring. Subscribers may not be able to keep up for several reasons and the solution may vary depending on the reason:
Downstream dependencies are too slow: If you are, for example, writing to a database from your subscriber based on messages received, and that is very slow, you may need to tune the behavior of the database to speed up the processing of messages.
You don't have enough subscriber capacity: You may need to spin up more subscriber clients or increase the resources (RAM, network, CPU, or threads) on the subscriber instances you already have. Increasing the number of subscriber clients often helps, though it may be more cost-effective to tune the instances you already have. Autoscalers like the one in GCE allow you to automatically alter the number of instances based on the number of unacknowledged Pub/Sub messages.
Your flow control limits are set too tightly: If your instances are not exceeding any of their resources and downstream dependencies are not the limiting factor, but processing is still too slow to keep up with the backlog, look at tuning the flow control settings with higher values. The flow control settings limit the number of messages that can be outstanding to the subscriber client at a time, which limits the ability to work through a backlog. You may want to increase the limits in this case in order to saturate the capacity of your subscriber clients.
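As an illustration of the last point, with the Python client the flow control limits are passed to subscribe(); raising them lets each client hold more outstanding messages. A minimal sketch with placeholder names:

```python
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
# Placeholder project/subscription names.
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

# Allow up to 2000 outstanding messages or 500 MiB per client at a time;
# tune these upward if processing capacity is going unused.
flow_control = pubsub_v1.types.FlowControl(
    max_messages=2000,
    max_bytes=500 * 1024 * 1024,
)

def callback(message):
    # Process, then ack.
    message.ack()

streaming_pull_future = subscriber.subscribe(
    subscription_path, callback=callback, flow_control=flow_control
)
```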
If your backlog only grows temporarily due to a brief spike in publish load, you may not need to make any changes at all. Absorbing these temporary spikes is exactly what a Pub/Sub system is designed to do. However, if the latency of processing messages is too high for your application or the backlog grows indefinitely, you may need to take some of the above steps.
I am utilising spring cloud aws messaging (2.0.1.RELEASE) in Java to consume from an SQS queue. If it's relevant, we use default settings, Java 10, and Spring Cloud Finchley.SR2.
We recently had an issue where a message could not be processed due to an application bug, leading to an exception and no confirmation (deletion) of the message. The message is later retried (this is desirable), presumably after the visibility timeout has elapsed (again, default values are in use); we have not customised the settings here.
We didn't spot the error above for a few days, meaning the message receive count was very high and the message had conceptually been on the queue for a while (several days by now). We considered creating a CloudWatch SQS alarm to alert us to a similar situation in future. The only suitable metric appeared to be ApproximateAgeOfOldestMessage.
Sadly, when observing this metric, the max age doesn't go much above 5 minutes (despite my knowing the message was several days old). If a message keeps getting older on each receive, assuming no acknowledgment comes and the message isn't deleted but instead becomes available again after the visibility timeout has elapsed, shouldn't this graph be much, much higher?
I don't know if this is something specific to the way that spring cloud aws messaging consumes the message or whether it's a general SQS quirk, but my expectation was that if a message was put on the queue 5 days ago, and a consumer had not successfully consumed it, then the max age would be 5 days.
Is it in fact the case that, if a message is received by a consumer but not ultimately deleted, the max age is actually the length of time between consume calls?
Can anyone confirm whether my expectation is incorrect, i.e. this is indeed how SQS is expected to behave (it doesn't consider the age to be the duration since the message was first put on the queue, but instead considers it to be the time between receive calls)?
Based on a similar question on AWS forums, this is apparently a bug with regular SQS queues where only a single message is affected.
In order to have a useful alarm for this issue, I would suggest setting up a dead-letter-queue (where messages get automatically delivered after a configurable number of consume-without-deletes), and alarm on the size of the dead-letter-queue (ApproximateNumberOfMessagesVisible).
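If it helps, here's a rough boto3 sketch of what that setup could look like: a redrive policy on the main queue plus a CloudWatch alarm on the DLQ depth. The queue names and the SNS topic ARN are hypothetical:

```python
import json
import boto3

sqs = boto3.client("sqs")
cloudwatch = boto3.client("cloudwatch")

# Hypothetical queue names.
main_queue_url = sqs.get_queue_url(QueueName="my-queue")["QueueUrl"]
dlq_url = sqs.create_queue(QueueName="my-queue-dlq")["QueueUrl"]
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# After 5 failed receives, SQS moves the message to the dead-letter queue.
sqs.set_queue_attributes(
    QueueUrl=main_queue_url,
    Attributes={
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": "5"}
        )
    },
)

# Alarm as soon as anything shows up in the dead-letter queue.
cloudwatch.put_metric_alarm(
    AlarmName="my-queue-dlq-not-empty",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "my-queue-dlq"}],
    Statistic="Maximum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:alerts"],  # hypothetical SNS topic
)
```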
I think this might have to do with the poison pill handling by this metric. After 3+ tries, the message won't be included in the metric. From the AWS docs:
After a message is received three times (or more) and not processed, the message is moved to the back of the queue and the ApproximateAgeOfOldestMessage metric points at the second-oldest message that hasn't been received more than three times. This action occurs even if the queue has a redrive policy.
I tried using parallel requests, but due to retention by AWS it does not allow polling the same queue again unless previously polled messages are deleted.
However, I did achieve this using a FIFO queue, but not a standard queue.
Thanks in Advance!
:)
When you say "it does not allow to poll back the same queue unless previously polled messages are deleted", I assume you're talking about the inflight messages per queue limit, which is pretty high at 120,000:
For most standard queues (depending on queue traffic and message backlog), there can be a maximum of approximately 120,000 inflight messages (received from a queue by a consumer, but not yet deleted from the queue). If you reach this limit, Amazon SQS returns the OverLimit error message. To avoid reaching the limit, you should delete messages from the queue after they're processed. You can also increase the number of queues you use to process your messages. To request a limit increase, file a support request.
The expected use case of SQS is to have workers that receive a message, do some work, then delete the message. If you're not following this pattern, I'd strongly recommend reevaluating whether SQS is the right tool for what you're trying to do.
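In boto3 terms, that pattern is roughly the following (the queue name and the processing function are hypothetical):

```python
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="my-queue")["QueueUrl"]  # hypothetical queue

def process(body: str) -> None:
    print("processing", body)  # stand-in for the real work

while True:
    # Long-poll for up to 10 messages at a time.
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        process(msg["Body"])
        # Delete promptly so the message doesn't sit inflight until the
        # visibility timeout expires and get redelivered.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```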
However, if you really have a valid use case for having more than 120K messages inflight at once, you'll need to describe your use case to AWS and get their approval to increase that limit.
I've thought about this a lot but can't come up with a solution I'm happy with.
Basically, this is the problem: log 100k+ chats (some slower, some faster) into Cassandra, i.e. save the userId, channelId, timestamp, and the message.
Cassandra already supports horizontal scaling out of the box, I have no issue here.
Now, my software that reads these chats does it over TCP (IRC). Something like 300 messages/sec is usual for the top 1k channels, and from my experiments a single IRC connection can't handle that.
What I now want to build is multiple instances (with Docker/Kubernetes) of the logger and share the load between them. So ideally, if I have maybe 4 workers and 1k chats (for example), they would each join at least 250 channels. I say at least because I want optional redundancy, so I can have 2 loggers in the same chat to make sure no messages get lost.
There is no issue with duplicates, because all messages have a unique ID.
Now, how would I best and dynamically share the currently joined channels between the workers? I want to avoid having a master or controlling point. It should also be easy to add more workers that then reduce the load on the other workers.
Are there any good articles about this kind of behaviour? Maybe good concepts or protocols already defined? Like I said, I want to avoid another central control point, so no RabbitMQ, Redis, or whatever.
Edit: I've looked into something like the Raft Consensus Algorithm, but I don't think it makes sense, since I don't want my clients to agree on a shared state but rather divide the state between them "equally".
I think in this case looking for a description of an existing algorithm might not be very useful: the problem is not complicated and generic enough to be worth a publication.
As described, the problem could be solved by using Cassandra itself as a mediator to share chat channel assignment information among the workers.
So (the trivial part) channels would have IDs and assigned worker ID(s), plus, in the optional case of redundancy, the required number of workers (2 or whatever number of workers you want to process this chat). A worker, before assigning itself to a channel, would check whether there are already enough assignees. If so, it would continue to the next channel. If not, it would assign itself to the channel. This is one of the options (alternatively you can have workers holding the channel IDs, but since redundancy is rare this way seems simpler). Workers would have a limit on the number of channels they can process and would not try to exceed it by assigning themselves more channels.
Now we only have to deal with the case of assigning too many workers to the same channel, exceeding the requirements and exhausting worker capacity by having them all monitor the same channels. Otherwise, if they all start at once, channels might have more assigned workers than needed. Even though this is unlikely to create a real problem in the described case (just a bit more redundancy than requested), you can handle it by prioritising workers. Much like the employment of school teachers in British Columbia, Canada, which is done on a seniority basis (the most senior gets the job first), except that here it would be done voluntarily by the workers themselves, not by the school administration. What this means is that each worker would have to check all its assigned channels and, should there be more workers than needed at a given time, would check whether it has the smallest priority among all the assignees. If it does, it would resign: remove itself and stop processing the channel.
That requires assigning distinct priorities to the workers, which could easily be achieved when spawning them by simply setting each to the next sequential number (the oldest has the highest priority, or vice versa if you are concerned about old, potentially dying workers taking up all the load and would prefer new ones to take on more while still fresh). More elaborately, this could also be done by using Cassandra lightweight transactions as described in one of the answers here (the one by AlonL). With just a few (you mentioned ~4) workers, either way should work, and the concerns about scaling mentioned in the other answers there aren't a big deal for a few integer priorities. Also, instead of sequential number assignment, requiring the workers to self-assign a random 32-bit integer priority on initialization has virtually no chance of collision, so a loop "until no collisions" should exit on the very first iteration (which makes the second iteration a very rarely exercised code path requiring an explicit test).
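A minimal sketch of the self-assignment check with the Python cassandra-driver; the keyspace, table, and column names here are made up for illustration:

```python
from cassandra.cluster import Cluster

# Hypothetical keyspace and table: one row per (channel, worker) assignment.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("chatlog")

session.execute("""
    CREATE TABLE IF NOT EXISTS channel_assignment (
        channel_id text,
        worker_id  int,
        PRIMARY KEY (channel_id, worker_id)
    )
""")

def try_claim(channel_id: str, worker_id: int, wanted_workers: int) -> bool:
    """Assign this worker to the channel unless enough workers already hold it."""
    assignees = list(session.execute(
        "SELECT worker_id FROM channel_assignment WHERE channel_id = %s",
        (channel_id,),
    ))
    if len(assignees) >= wanted_workers:
        return False
    # Lightweight transaction so the same row isn't written twice. Two different
    # workers racing can still briefly over-assign the channel, which the
    # priority-based resignation described above then resolves.
    result = session.execute(
        "INSERT INTO channel_assignment (channel_id, worker_id) "
        "VALUES (%s, %s) IF NOT EXISTS",
        (channel_id, worker_id),
    )
    return result.was_applied
```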
The trick is basically to limit the amount of data requiring synchronisation and to put the load of regulation onto the workers themselves. There is no need for consensus algorithms, as there is not much complexity and we are not dealing with a huge number of potentially fraudulent workers trying to get assignments ahead of more senior peers.
The only issue I should mention is that there could be implicit worker rotation if channels go offline, which makes a worker stop processing them. You will get a different worker assignment the next time the channel goes online.
We are in the process of implementing MSMQ for quick storage of messages and processing them in disconnected mode. Typical usage of any message broker.
One of the administration requirements is to send an automatic notification to administrators/developers if the count of (unprocessed) queue messages reaches 1000.
Can it be done out of the box? If yes then how?
If not, do I need to write some Windows service (or any sort of scheduler) to check the count every x seconds?
Any suggestions or past experience are welcome.
The only (partially) built-in solution would be to set up the MSMQ Queue performance counter which gives you this information for private queues on the server.
There are a number of other solutions, including a SCOM management pack and some third-party solutions like evtools, or you could roll your own using System.Messaging.
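If you do roll your own, even a small scheduled script polling the MSMQ performance counter can cover the notification requirement. A rough sketch in Python via typeperf; I'm not certain of the exact counter instance name on your machine (check the "MSMQ Queue" object in Performance Monitor first), and the alerting part is only a placeholder:

```python
import csv
import io
import subprocess
import time

# Counter path is a guess; the instance name is usually "<machine>\private$\<queue>".
COUNTER = r"\MSMQ Queue(myserver\private$\myqueue)\Messages in Queue"
THRESHOLD = 1000

def unprocessed_count() -> float:
    # typeperf -sc 1 samples the counter once and writes CSV to stdout.
    out = subprocess.run(
        ["typeperf", COUNTER, "-sc", "1"],
        capture_output=True, text=True, check=True,
    ).stdout
    data_rows = [r for r in csv.reader(io.StringIO(out)) if len(r) >= 2 and r[1]]
    return float(data_rows[-1][1])  # last sample, counter value column

while True:
    if unprocessed_count() >= THRESHOLD:
        print("ALERT: unprocessed message count over threshold")  # swap in email/notification
    time.sleep(60)
```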
Hope this is of help.
There's a commercial solution for this: QueueMonitor.
Disclaimer: I'm the author of that software.
Edit
A few tips for this scenario:
Set the message's UseDeadLetterQueue to true. This way, if there's any issue with delivering messages, at least they won't be lost but will be moved to the system's dead-letter queue.
Set the message's Recoverable property to true. It does reduce performance, but for this kind of long-running scenario there's too much risk that some restart or failure would lose messages which are only stored in memory.
If messages are no longer valid after some period, you can use TimeToReachQueue to automatically delete them.
We have an app that uses Cloudant as a remote server. However, from previous experience Cloudant is not completely compatible with TouchDB's continuous replications. So our alternative for now is to manually trigger one-shot replications at a fixed frequency. Nevertheless, we would like to know if that approach is going to cost us more money than continuous replications, since continuous replications use longpoll and don't need to query the server often. In other words, do one-shot pull replications with Cloudant as the target cost us a GET request?
Thank you,
Paul
I think the issue you refer to is [1].
Cloudant's replication is 100% compatible with CouchDB. In this instance, TouchDB's logs indicate the iOS network stack passed on incomplete JSON to TouchDB. It's not clear who was to blame in this case for the replication failure.
[1] https://github.com/couchbaselabs/TouchDB-iOS/issues/241
For the cost question, a one-shot pull replication will result in a GET to the _changes feed each time it happens, plus the other requests required to replicate. This _changes request will be counted as a light HTTP request against your Cloudant account.
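To make the call-counting concrete, here's a heavily simplified sketch (it is not the full replication protocol, and the account/database names are placeholders) of the HTTP requests a single one-shot pull cycle boils down to:

```python
import requests

# Placeholder Cloudant database URL and checkpoint.
source = "https://myaccount.cloudant.com/mydb"
last_seq = "0"  # sequence recorded by the previous one-shot replication

# One GET to the _changes feed per one-shot replication...
changes = requests.get(
    f"{source}/_changes", params={"since": last_seq, "style": "all_docs"}
).json()

# ...plus the GETs needed to fetch the changed documents themselves.
for row in changes["results"]:
    doc = requests.get(f"{source}/{row['id']}", params={"revs": "true"}).json()
    # store doc locally, then record changes["last_seq"] as the new checkpoint

print(f"{len(changes['results'])} changed docs, {1 + len(changes['results'])} GETs")
```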
However, whether this works out as more or fewer requests overall depends on the number of changes coming down from the remote server. It's also important to remember that the number of _changes calls is very small relative to the number of other calls involved (e.g., getting the content of the changes themselves, particularly if there are many attachments).
While this question is specific to TouchDB, and I mention specific behaviours of that codebase, this answer deals with the requests involved in replication between any two systems speaking the CouchDB replication protocol[2].
[2] http://www.dataprotocols.org/en/latest/couchdb_replication.html
Let's take a contrived example: 1 update per 10-second window to the source database for the replication, where a TouchDB database is the target. Let's take a 5-minute poll vs. a continuous replication. For simplicity of call-counting, let's also take attachments out of the picture. We'll also assume the device has a constant network connection.
For the continuous case, every 10s TouchDB will receive an update in the _changes feed. This causes the longpoll connection to close. TouchDB then runs through the changes, requesting the updates from the source database; one or more GET requests on the remote server. While this is happening, TouchDB has to open up another longpoll request to _changes. So in a five-minute period, you'd end up with perhaps 30 calls to _changes, plus all the calls to get documents and record checkpoints.
Compare this with a one-shot replication every five minutes. You'd receive notification of the 30 updates in one _changes feed call. TouchDB implements an optimisation[3] whereby it will call _all_docs to get updated documents for 1- revs, so you might end up with a single call to get all 30 documents (not possible in the continuous case as you've received a single change). Then you've the checkpoint documents to record. At best fewer than 5 HTTP calls, at most about a third of the continuous case as you've avoided extra _changes requests.
[3] https://github.com/couchbaselabs/TouchDB-iOS/wiki/Replication-Algorithm#performance
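Counting the calls in that contrived example, under the assumptions above and ignoring checkpoint writes, looks roughly like this:

```python
# Back-of-the-envelope call counting for the contrived example:
# 1 update per 10 s, a 5 minute window, attachments ignored.
window_s = 5 * 60
updates = window_s // 10              # 30 changes in the window

# Continuous: each change closes the longpoll, so roughly one _changes call
# plus one document fetch per change.
continuous_calls = updates + updates  # ~60, before checkpoint writes

# One-shot every 5 minutes: one _changes call for the whole window and, with
# the _all_docs optimisation, possibly a single call for all 30 documents.
one_shot_calls = 1 + 1                # ~2, before checkpoint writes

print(continuous_calls, one_shot_calls)
```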
It comes down to the frequency of updates you expect to the source database. One-shot replication is likely to provide a smoother price curve, as you're in better control of the number of requests you make.
A further question is how often connections will drop because of the network disconnects which happen regularly with mobile devices. TouchDB's continuous replications will fire back up each time the user comes online (if added via the _replicator database). This is a further source of unpredictable costs.
However, the benefits from more immediate visibility of changes may certainly be worth the uncertainty.