My question is very similar to this one: Why is there a delay in Spring AMQP Message dispatching from a filled Queue?
I can see a delay between two invocations of my message listener even when the queue is full of messages. I log the time at the very beginning of the method, the time at the end, and how long the method took (in milliseconds):
Start - End - Time
2016-09-21T10:08:55.263; - 2016-09-21T10:08:55.278; - 15;
2016-09-21T10:08:55.356; - 2016-09-21T10:08:55.356; - 0;
2016-09-21T10:08:55.388; - 2016-09-21T10:08:55.388; - 0;
2016-09-21T10:08:55.466; - 2016-09-21T10:08:55.466; - 0;
Processing a message takes about 10 ms on average, but I can see delays greater than 50 ms (sometimes greater than 100 ms) between invocations.
If I change the PrefetchCount parameter of the SimpleMessageListenerContainer (for example, to 200), performance increases considerably, and the logs show that the delay has disappeared:
Start - End - Time
2016-09-21T10:26:27.336; - 2016-09-21T10:26:27.336; - 0;
2016-09-21T10:26:27.336; - 2016-09-21T10:26:27.351; - 15;
2016-09-21T10:26:27.351; - 2016-09-21T10:26:27.351; - 0;
2016-09-21T10:26:27.351; - 2016-09-21T10:26:27.351; - 0;
My questions are:
What causes this delay? Is it really network delay? How can I prove this?
The documentation says of “prefetchCount”: “The higher this is the faster the messages can be delivered, but the higher the risk of non-sequential processing.” What does that really mean? If I need to process the messages in sequential order, can I set “prefetchCount” to a value greater than 1?
My configuration is like this:
public MessageListenerAdapter broadcastMessageListenerAdapter() {
    return new MessageListenerAdapter(myHandlerBroadcast(), "onMessage");
}

public SimpleMessageListenerContainer myBroadcastMessageListenerContainer() {
    SimpleMessageListenerContainer container = new SimpleMessageListenerContainer(myConnectionFactory());
    container.setQueueNames(PX_BROADCAST_QUEUE + environment.getProperty("my.user"));
    container.setPrefetchCount(500); // ¿1?
    return container;
}

public MyHandlerBroadcast myHandlerBroadcast() {
    return new MyHandlerBroadcast();
}
Yes, it is network latency. You can "prove" it by using a network monitor. By default prefetchCount is 1, which means the broker allows only one unacknowledged message at the consumer. The next message is sent only after that one has been acked.
Increasing the prefetch count dramatically increases performance but can cause out-of-order delivery.
What does that really mean?
Consider a prefetch count of 10. You process 5 messages and then have a failure (#6) and the message is rejected and requeued (if so configured).
Message #6 will be redelivered when you have consumed all your prefetched messages - hence it will be out of order.
If you never requeue messages then there is no problem with message delivery order.
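The reordering described above can be reproduced with a small toy model. This is only a sketch: the `consume` function and its parameters are made up for illustration, the broker is modelled as a plain deque, and prefetch is treated as fixed batches rather than the sliding window a real broker uses.

```python
from collections import deque

def consume(messages, prefetch, fail_once):
    """Toy model: return the order in which messages are successfully processed."""
    broker = deque(messages)      # messages still held by the broker
    failed_already = set()        # messages that have been rejected once
    processed = []
    while broker:
        # The broker hands the consumer up to `prefetch` unacked messages.
        batch = [broker.popleft() for _ in range(min(prefetch, len(broker)))]
        for msg in batch:
            if msg in fail_once and msg not in failed_already:
                failed_already.add(msg)
                broker.appendleft(msg)   # rejected and requeued at the broker
            else:
                processed.append(msg)    # processed and acked
    return processed

# Message 6 fails once. With a prefetch of 10 it is redelivered last.
print(consume(range(1, 11), prefetch=10, fail_once={6}))
# With a prefetch of 1 the order is preserved.
print(consume(range(1, 11), prefetch=1, fail_once={6}))
```

With a prefetch of 1 the requeued message is the very next one delivered, which is why ordering survives in that case.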
I am using the Serverless Framework to consume messages from SQS. Some of the messages sent to the queue do not get consumed. They go straight to the in-flight SQS status and from there to my dead letter queue. When I look at my log of the consumer, I can see that it consumed and successfully processed 9/10 messages. One is always not consumed and ends up in the dead letter queue. I am setting reservedConcurrency to 1 so that only one consumer can run at a time. The function consumer timeout is set to 30 seconds. This is the consumer code:
module.exports.mySQSConsumer = async (event, context) => {
  context.callbackWaitsForEmptyEventLoop = false;
  console.log(event.Records);
  await new Promise((res, rej) => {
    setTimeout(() => {
      res(); // resolve after 100 ms to simulate work
    }, 100);
  });
  return true;
};
The consumer function configuration follows:
handler: handler.mySQSConsumer
timeout: 30 # seconds
reservedConcurrency: 1
events:
  - sqs:
      arn: arn:aws:sqs:us-east-1:xyz:my-test-queue
      batchSize: 1
      enabled: true
If I remove the await, it processes all messages. If I increase the setTimeout delay to 200 ms, even more messages go straight to the in-flight status and from there to the dead letter queue. This code is very simple. Any ideas why it's skipping some messages? The messages that don't get consumed don't even show up in the log from the first console.log() statement. They seem to be ignored entirely.
I figured out the problem. The SQS queue Lambda function event triggering works differently than I thought. The messages get pushed into the Lambda function, not pulled by it. I think this could be engineered better by AWS, but it's what it is.
The issue was the Default Visibility Timeout set to 30 seconds together with Reserved Concurrency set to 1. When the SQS queue fills up quickly with thousands of records, AWS starts pushing the messages to the Lambda function faster than the single function instance can process them. AWS "assumes" that it can simply spin up more instances of the Lambda to keep up with the backlog. However, the concurrency limit doesn't let it spin up more instances, so the Lambda function is throttled. As a result, the function starts returning failure to the AWS backend for some messages, which hides the failed messages for 30 seconds (the default setting) and puts them back into the queue after this period for reprocessing. Since there are so many records for the single instance to process, the Lambda function is still busy 30 seconds later and can't process those messages. So the situation repeats itself and the messages go back to invisibility for another 30 seconds. This repeats a total of 3 times. After the third attempt, the messages go to the dead letter queue (we configured our SQS queue that way).
To resolve this issue, we increased the Default Visibility Timeout to 5 minutes. That's enough time for the Lambda function to process through most of the messages in the queue while the failed ones wait in invisibility. After 5 minutes, they get pushed back into the queue and since the Lambda function is no longer busy, it will process most of them. Some of them have to go to invisibility twice before being successfully processed.
So the remedy to this problem is either increasing the Default Visibility Timeout like we did or increasing the number of failures necessary before a message goes to the dead letter queue.
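As a rough back-of-the-envelope check of the reasoning above, the sketch below compares how long a repeatedly throttled message keeps cycling before it is dead-lettered with how long one instance needs to drain a backlog. The function names and the backlog figures (2000 messages at 100 ms each) are assumptions for illustration, and real retry timing will not be this regular.

```python
def retry_window_s(visibility_timeout_s, max_receive_count):
    """Approximate time a message keeps being retried before the dead letter queue."""
    return visibility_timeout_s * max_receive_count

def drain_time_s(backlog, ms_per_message, concurrency=1):
    """Approximate time needed to work through the backlog."""
    return backlog * ms_per_message / 1000 / concurrency

backlog, ms_per_message = 2000, 100          # hypothetical load
needed = drain_time_s(backlog, ms_per_message)

for visibility in (30, 300):                 # 30 s default vs. 5 minutes
    window = retry_window_s(visibility, max_receive_count=3)
    verdict = "enough" if window >= needed else "too short"
    print(f"visibility {visibility}s: window {window}s, need about {needed:.0f}s: {verdict}")
```

Either raising the visibility timeout or raising the receive count widens the window, which matches the two remedies named above.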
I hope this helps someone.
I am still confused about the NumberOfConcurrentThreads parameter of CreateIoCompletionPort(). I have read and re-read the MSDN docs, but the quote
This value limits the number of runnable threads associated with the
completion port.
still puzzles me.
Let's assume that I specify this value as 4. In this case, does this mean that:
1) a thread can call GetQueuedCompletionStatus() (at which point I can allow a further 3 threads to make this call), then as soon as that call returns (i.e. we have a completion packet) I can then have 4 threads again call this function,
2) a thread can call GetQueuedCompletionStatus() (at which point I can allow a further 3 threads to make this call), then as soon as that call returns (i.e. we have a completion packet) I then go on to process that packet. Only when I have finished processing the packet do I then call GetQueuedCompletionStatus(), at which point I can then have 4 threads again call this function.
See my confusion? It's the use of the phrase 'runnable threads'.
I think it might be the latter, because the link above also quotes
If your transaction required a lengthy computation, a larger
concurrency value will allow more threads to run. Each completion
packet may take longer to finish, but more completion packets will be
processed at the same time.
This will ultimately affect how we design servers. Consider a server that receives data from clients, then echoes that data to logging servers. Here is what our thread routine could look like:
DWORD WINAPI ServerWorkerThread(HANDLE hCompletionPort)
{
    DWORD BytesTransferred;
    CPerHandleData* PerHandleData = nullptr;
    CPerOperationData* PerIoData = nullptr;
    while (TRUE) {
        // The cast assumes CPerOperationData begins with an OVERLAPPED.
        if (GetQueuedCompletionStatus(hCompletionPort, &BytesTransferred,
                                      (PULONG_PTR)&PerHandleData,
                                      (LPOVERLAPPED*)&PerIoData, INFINITE)) {
            // OK, we have 'BytesTransferred' of data in 'PerIoData', process it:
            // send the data onto our logging servers, then loop back around
        }
    }
    return 0;
}
Now assume I have a four core machine; if I leave NumberOfConcurrentThreads as zero within my call to CreateIoCompletionPort() I will have four threads running ServerWorkerThread(). Fine.
My concern is that the send() call may take a long time due to network traffic. Hence, I could be receiving a load of data from clients that cannot be dequeued because all four threads are taking a long time sending the data on?!
Have I missed the point here?
Update 07.03.2018 (This has now been resolved: see this comment.)
I have 8 threads running on my machine, each one runs the ServerWorkerThread():
DWORD WINAPI ServerWorkerThread(HANDLE hCompletionPort)
{
    DWORD BytesTransferred;
    CPerHandleData* PerHandleData = nullptr;
    CPerOperationData* PerIoData = nullptr;
    while (TRUE) {
        if (GetQueuedCompletionStatus(hCompletionPort, &BytesTransferred,
                                      (PULONG_PTR)&PerHandleData,
                                      (LPOVERLAPPED*)&PerIoData, INFINITE)) {
            switch (PerIoData->Operation) {
            case CPerOperationData::ACCEPT_COMPLETED:
                // This case is fired when a new connection is made
                while (1) {}
            }
        }
    }
    return 0;
}
I only have one outstanding AcceptEx() call; when that gets filled by a new connection I post another one. I don't wait for data to be received in AcceptEx().
I create my completion port as follows:
CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 4)
Now, because I only allow 4 threads in the completion port, I thought that because I keep the threads busy (i.e. they do not enter a wait state), when I try and make a fifth connection, the completion packet would not be dequeued hence would hang! However this is not the case; I can make 5 or even 6 connections to my server! This shows that I can still dequeue packets even though my maximum allowed number of threads (4) are already running? This is why I am confused!
The completion port is really a KQUEUE object in the kernel. NumberOfConcurrentThreads corresponds to its MaximumCount field:
Maximum number of concurrent threads the queue can satisfy waits for.
From the I/O Completion Ports documentation:
When the total number of runnable threads associated with the
completion port reaches the concurrency value, the system blocks the
execution of any subsequent threads associated with that completion
port until the number of runnable threads drops below the concurrency value.
This is poorly worded and not quite accurate. When a thread calls KeRemoveQueue (GetQueuedCompletionStatus calls it internally), the system returns a packet to the thread only if Queue->CurrentCount < Queue->MaximumCount, even if there are packets in the queue. The system does not block any threads, of course. On the other side, look at KiInsertQueue: even if some threads are waiting for packets, one is activated only if Queue->CurrentCount < Queue->MaximumCount.
Also look at how and when Queue->CurrentCount changes. Look at KiActivateWaiterQueue (this function is called when the current thread is about to enter a wait state) and KiUnlinkThread. In general, when a thread begins to wait on any object (or on another queue), the system calls KiActivateWaiterQueue. It decrements CurrentCount and possibly returns a packet to a waiting thread (if there are packets in the queue, Queue->CurrentCount < Queue->MaximumCount now holds, and threads are waiting for packets). Conversely, when the thread stops waiting, KiUnlinkThread is called, which increments CurrentCount.
Both of your variants are wrong. Any number of threads can call GetQueuedCompletionStatus(), and the system of course does not block the execution of any subsequent threads. For example, say you have a queue with MaximumCount = 4. You can queue 10 packets and call GetQueuedCompletionStatus() from 7 threads concurrently, but only 4 of them get packets. The others will wait (even though 6 packets are still in the queue). If one of the threads that removed a packet begins to wait, the system simply wakes another thread waiting on the queue and returns a packet to it. And if a thread that previously removed a packet from this queue (Thread->Queue == Queue, so it is an active thread) calls KeRemoveQueue again, Queue->CurrentCount is first decremented (Queue->CurrentCount -= 1).
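To make the counting rule concrete, here is a toy single-threaded model in Python. It is only a sketch of the bookkeeping described above, not the kernel code: `ToyQueue` and its method names are invented, threads are just string labels, and waiters are woken in FIFO order for simplicity.

```python
from collections import deque

class ToyQueue:
    """Toy model of the CurrentCount / MaximumCount rule."""

    def __init__(self, maximum_count):
        self.maximum_count = maximum_count
        self.current_count = 0      # threads currently counted as active
        self.packets = deque()      # queued completion packets
        self.waiters = deque()      # threads parked inside remove()
        self.active = set()         # threads that hold a packet and are running
        self.delivered = {}         # thread -> list of packets it received

    def _hand_out(self):
        # Packets are handed out only while current_count < maximum_count.
        while self.packets and self.waiters and self.current_count < self.maximum_count:
            thread = self.waiters.popleft()
            self.delivered.setdefault(thread, []).append(self.packets.popleft())
            self.active.add(thread)
            self.current_count += 1

    def insert(self, packet):
        self.packets.append(packet)
        self._hand_out()

    def remove(self, thread):
        # An already active thread gives up its slot before asking again.
        if thread in self.active:
            self.active.remove(thread)
            self.current_count -= 1
        self.waiters.append(thread)
        self._hand_out()

    def block(self, thread):
        # An active thread enters a wait state on something else.
        if thread in self.active:
            self.current_count -= 1
            self._hand_out()

    def unblock(self, thread):
        # The thread stops waiting and counts as active again.
        if thread in self.active:
            self.current_count += 1

q = ToyQueue(maximum_count=4)
for packet in range(10):
    q.insert(packet)
for name in ["t1", "t2", "t3", "t4", "t5", "t6", "t7"]:
    q.remove(name)
print(q.current_count, len(q.waiters), len(q.packets))   # 4 active, 3 waiting, 6 queued
```

In this model, calling `q.block("t1")` lets a waiting thread pick up a packet, and a later `q.unblock("t1")` pushes the active count above the maximum for a while, so the limit is a throttle on handing out packets rather than a hard cap on running threads.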
I am having issues trying to get the status of how many things are left to process using SignalR. I have a starting number and every time an item completes I have it increment a counter. However, the user isn't being notified in the manner the code would suggest.
I'm not entirely sure how to word this, but here goes. I'm queuing up a series of passengers to process and then processing them. If my understanding is correct, the processing starts immediately after the first thread is queued. After everyone is queued, every second there is a SignalR call to inform the user of where we are in the process. However, the SignalR call isn't working as expected.
Next, code:
StatusInfo.SendStatus("Retrieving Passenger Details");
foreach (var passenger in manifestResponse.Manifest.PassengerList)
{
    //Spin up all the threads.
    //StatusInfo.SendStatus(TotalPassengers - PassengerThreads, 0, TotalPassengers, StartTime);
    ThreadPool.QueueUserWorkItem(new WaitCallback(GetSinglePassengerDetails), passenger);
    if (TotalPassengers % 5 == 0)
    {
        StatusInfo.SendStatus(TotalPassengers - PassengerThreads, 0, TotalPassengers, StartTime);
    }
}
//Wait for them to be done.
do
{
    StatusInfo.SendStatus(TotalPassengers - PassengerThreads, 0, TotalPassengers, StartTime);
    Thread.Sleep(1000); // report once per second
} while (PassengerThreads > 0);
So what is happening is that I send the threads to the pool to run, but the send-status loop does not actually send anything back while they run. When I open the console in the browser, there's a 20 second gap between showing "Retrieving Passenger Details" and the first X of Y status. Is there something I'm doing wrong here? Maybe using the wrong threading model? Thanks.
I've implemented a Web role that writes to a queue. This is working fine. Then I developed a Worker role to read from the queue. When I run it in debug mode on my local machine, it reads the messages from the queue fine, but when I deploy the Worker role it doesn't seem to read the queue, as the messages eventually end up in the dead letter queue. Does anyone know what could be causing this behavior? Below are some bits that might be key to figuring this out:
queueClient = QueueClient.Create(queueName, ReceiveMode.PeekLock);
var queueDescription = new QueueDescription(QueueName)
{
    RequiresSession = false,
    DefaultMessageTimeToLive = TimeSpan.FromMinutes(2),
    EnableDeadLetteringOnMessageExpiration = true,
    MaxDeliveryCount = 20
};
Increase QueueDescription.DefaultMessageTimeToLive to about 10 minutes.
This property dictates how long a message may live in the queue before it is processed (that is, before Complete() is called on it). If it remains in the queue for more than 2 minutes, it is automatically moved to the dead letter queue (because you set EnableDeadLetteringOnMessageExpiration to true).
TTL is useful in these messaging scenarios:
If a message has not been processed N minutes after it arrived, it might no longer be useful to process it.
If processing of the message was attempted many times and never completed (the receiver never called msg.Complete()), it might need special handling.
So, to be safe, set DefaultMessageTimeToLive to a somewhat higher value.
Hope it helps!
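The two ways a message can end up dead-lettered with the settings above can be written down as a small decision function. This is a sketch under stated assumptions: `dead_letter_reason` and the labels it returns are hypothetical names for illustration, not Service Bus API values, and the rule is simplified to the two properties shown in the question.

```python
def dead_letter_reason(age_s, ttl_s, failed_deliveries, max_delivery_count,
                       dead_letter_on_expiration=True):
    """Return why a message would leave the main queue, or None if it stays."""
    if failed_deliveries >= max_delivery_count:
        return "max-delivery-count"
    if age_s > ttl_s:
        # Expired messages are dead-lettered only if the queue is configured that way.
        return "ttl-expired" if dead_letter_on_expiration else "dropped"
    return None

# A message that has waited 150 s with no delivery attempts at all:
print(dead_letter_reason(150, ttl_s=120, failed_deliveries=0, max_delivery_count=20))
print(dead_letter_reason(150, ttl_s=600, failed_deliveries=0, max_delivery_count=20))
```

The first call shows that a slow or absent consumer is enough to dead-letter a message under a 2 minute TTL, even if it was never delivered.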
I'm not sure why, but when I use the observable created via concat I always get all the values that are pushed from my list (this works as intended), whereas with the plain subscribe some values never reach the subscribers of the observable (only under certain conditions).
These are the two cases I am using. Could anyone explain why, in certain cases, not all values are received when subscribing to the second version? Are they not equivalent? The intent here is to rewind the stream. What could explain why Case 2 fails while Case 1 does not?
Replay here is just a list of the ongoing stream.
Case 1.
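let observable =
    Observable.Create(fun (o:IObserver<'a>) ->
        let next b =
            // on the first live event: push the replayed values, then the event, then complete
            for v in replay do
                o.OnNext(v)
            o.OnNext(b)
            o.OnCompleted()
        someOtherObs.Subscribe(next, o.OnError, o.OnCompleted))
let toReturn = observable.Concat(someOtherObs).Publish().RefCount()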
Case 2.
let toReturn =
    Observable.Create(fun (o:IObserver<'a>) ->
        for v in replay do
            o.OnNext(v)
        someOtherObs.Subscribe(o))
Caveat! I don't use F# regularly enough to be 100% comfortable with the syntax, but I think I see what's going on.
That said, both of these cases look odd to me and it greatly depends on how someOtherObs is implemented, and where (in terms of threads) things are running.
Case 1 Analysis
You apply concat to a source stream which appears to work like this:
It subscribes to someOtherObs, and in response to the first event (a) it pushes the elements of replay to the observer.
Then it sends event (a) to the observer.
Then it completes. At this point the stream is finished and no further events are sent.
In the event that someOtherObs is empty or just has a single error, this will be propagated to the observer instead.
Now, when this stream completes, someOtherObs is concatenated onto it. What happens next is a little unpredictable. If someOtherObs is cold, the first event is sent a second time. If someOtherObs is hot, the first event is not resent, but there is a potential race condition around which of the remaining events goes next, and that depends on how someOtherObs is implemented. You could easily miss events if it's hot.
Case 2 Analysis
You replay all the replay events, and then send all the events of someOtherObs - but again there's a race condition if someOtherObs is hot because you only subscribe after pushing replay, and so might miss some events.
In either case, it seems messy to me.
This looks like an attempt to merge a state of the world (sotw) with a live stream. In this case, you need to subscribe to the live stream first and cache any events while you acquire and push the sotw events. Once the sotw is pushed, you push the cached events, being careful to de-dupe events that may have been read in the sotw, until you are caught up with live, at which point you can just pass live events through.
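The difference between the two orderings can be seen in a toy single-threaded model. This is a sketch only: `HotSource` and both functions are invented for illustration, there is no real concurrency, and the de-duping is as naive as comparing values.

```python
class HotSource:
    """Minimal hot source: pushes only to whoever is subscribed at emit time."""

    def __init__(self):
        self.observers = []

    def subscribe(self, on_next):
        self.observers.append(on_next)

    def emit(self, value):
        for on_next in list(self.observers):
            on_next(value)

def replay_then_subscribe(replay, source, emit_during_replay):
    received = list(replay)          # push the replayed values first
    emit_during_replay()             # a live event arrives before we subscribe
    source.subscribe(received.append)
    return received

def subscribe_then_replay(replay, source, emit_during_replay):
    received, buffered = [], []
    caught_up = False

    def on_live(value):
        # Cache live events until the replay has been pushed.
        (received if caught_up else buffered).append(value)

    source.subscribe(on_live)        # subscribe to live first
    received.extend(replay)
    emit_during_replay()
    seen = set(replay)
    received.extend(v for v in buffered if v not in seen)  # flush the cache, de-duped
    caught_up = True
    return received

late_source = HotSource()
late = replay_then_subscribe([1, 2, 3], late_source, lambda: late_source.emit(4))
late_source.emit(5)

early_source = HotSource()
early = subscribe_then_replay([1, 2, 3], early_source, lambda: early_source.emit(4))
early_source.emit(5)

print(late)    # the event emitted before subscribing is missing
print(early)   # nothing is lost
```

The only change between the two functions is the position of the subscribe call relative to the replay, which is the whole point of subscribing to live first.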
You can often get away with naive implementations that flush the live cache in an OnNext handler of the live stream subscription, effectively blocking the source while you flush - but you run the risk of applying too much back pressure to the live source if you have a large history and/or a fast moving live stream.
Some considerations for you to think on that will hopefully set you on the right path.
For reference, here is an extremely naïve and simplistic C# implementation I knocked up that compiles in LINQPad with rx-main nuget package. Production ready implementations I have done in the past can get quite complex:
void Main()
{
    // asynchronously produce a list from 1 to 10
    Func<Task<List<int>>> sotw =
        () => Task<List<int>>.Run(() => Enumerable.Range(1, 10).ToList());
    // a stream of 5 to 15 (Range takes a start and a count)
    var live = Observable.Range(5, 11);
    // outputs 1 to 15
    live.MergeSotwWithLive(sotw).Dump();
}
// Define other methods and classes here
public static class ObservableExtensions
{
    public static IObservable<TSource> MergeSotwWithLive<TSource>(
        this IObservable<TSource> live,
        Func<Task<List<TSource>>> sotwFactory)
    {
        return Observable.Create<TSource>(async o =>
        {
            // Naïve indefinite caching, no error checking anywhere
            var liveReplay = new ReplaySubject<TSource>();
            live.Subscribe(liveReplay);

            // No error checking, no timeout, no cancellation support
            var sotw = await sotwFactory();
            foreach(var evt in sotw)
            {
                o.OnNext(evt);
            }

            // note naive disposal
            // and extremely naive de-duping (it really needs to compare
            // on some unique id)
            // we are only supporting disposal once the sotw is sent
            return liveReplay.Where(evt => !sotw.Any(s => s.Equals(evt)))
                             .Subscribe(o);
        });
    }
}