How can I implement throttling by the message value using MassTransit? (backend is SNS/SQS but flexible) - amazon-sqs

I'm interested in using MassTransit as the event bus to help me bust a cache, but I'm not sure how to properly throttle the service.
The situation
I have a .Net service that has a refreshCache(itemId) API which recomputes the cache for itemId. I want to call this whenever code in my organization modifies any data related to itemId.
However, due to legacy code, I may have 10 events for a given itemId emitted within the same second. Since the refreshCache(itemId) call is expensive, I'd prefer to only call it once every second or so per itemId.
For instance, imagine that I have 10 events emitted for item1 and then 1 event emitted for item2. I'd like refreshCache to be called twice, once with item1 and once with item2.
Trouble with MassTransit
I could send event messages that essentially are just itemId over SNS/SQS, and the .Net service could use a MassTransit consumer to listen to that SQS queue and call refreshCache for each message. Ideally, I can also throttle either in SNS/SQS or MassTransit.
I've read these docs: https://masstransit-project.com/advanced/middleware/rate-limiter.html and have tried to find the middleware in the code but wasn't able to locate it.
They seem to suggest that the rate-limiting just delays the delivery of messages, which means that my refreshCache would get called 10 times with item1 before getting called with item2. Instead, I'd prefer it get called once per item, ideally both immediately.
Similarly, it seems as if SNS and SQS can either rate-limit in-order delivery or throttle based on the queue but not based on the contents of that queue. It would not be feasible for me to have separate queues per itemId, as there will be 100,000+ distinct itemIds.
The Ask
Is what I'm trying to do possible in MassTransit? If not, is it possible via SQS? I'm also able to be creative with using RabbitMQ or adding in Lambdas, but would prefer to keep it simple.

Related

Azure Durable Function getting slower and slower over time

My Azure Durable Function(Runtime V3) getting an average of 3M events per day. When it runs for two or three weeks it is getting slower and slower. When I remove two table storages(History & Instances) used by Durable Function Framework, it is getting better and works as expected. I hosted my function app in the consumption plan. And also inside my function app, I'm used Durabel Entities as well. In my code, I'm using sub orchestrators as well for the Fan-Out mechanism.
Is this problem possible when it comes to heavy workload? Do I need to clear those table storages from time to time or do I need to Delete the state of completed entities inside my Durable Entity Function?
Someone, please help me
Yes, you should perform periodic clean-ups yourself by calling the PurgeInstanceHistoryAsync method. See a similar post on how to do this: https://stackoverflow.com/a/60894392
Also review any loops or Monitor patterns that you may have in your code.
Any looping logic, (like foreach, for or while loops) will replay from the initial startup state. Whilst the Durable Function replay architecture is very efficient at doing this, the code we write may not be optimised for repetitive iterations.
Durable Monitor Pattern is almost an Anti-Pattern. The concept is OK but it is easily misinterpreted and is open to abuse. It is designed for a low-frequency loop that polls an endpoint either for a set number of iterations or up until a finite time, or of course when the state of the endpoint being monitoried has changed. That state change will be the trigger to perform the rest of the operation.
It is NOT an example of how to use general or high frequency looping structures in Durable functions
It is NOT and example of how to implement a traditional HTTP endpoint reporting monitor in an infinite loop (while(true)) style, perhaps to record changes into a data store over time.
If your durable function logic has an iterator that may involve many iterations, consider migrating the iteration step to a sub-orchestration that uses the Eternal Orchestration pattern

Jobs pushing to queue, but not processing

I am using AWS SQS. I am getting 2 issues.
Sometime, messages are present in the queue but I am not able to read that.
When I fetch, I am getting blank array, same like not any messages found in queue.
When I am deleting a message from queue then it gives me like
sqs.delete_message({queue_url: queue_url, receipt_handle: receipt_handle})
=> Aws::EmptyStructure
When I check in SQS (In AWS), message still present even I refresh page more then 10 times.
Can you help me why this happens ?
1. You may need to implement Long Polling.
SQS is a distributed system. By default, when you read from a queue, AWS returns you the response only from a small subset of its servers. That's why you receive empty array some times. This is known as Short Polling.
When you implement Long Polling, AWS waits until it gets the response from all it's servers.
When you're calling ReceiveMessage API, set the parameter WaitTimeSeconds > 0.
2. Visibility Timeout may be too short.
The Visibility Timeout controls how long a message currently being read by one poller is invisible to other pollers. If the visibility timeout is too short, then other pollers may start reading the message before your first poller has processed and deleted it.
Since SQS supports multiple pollers reading the same message. From the docs -
The ReceiptHandle is associated with a specific instance of receiving a message. If you receive a message more than once, the ReceiptHandle is different each time you receive a message. When you use the DeleteMessage action, you must provide the most recently received ReceiptHandle for the message (otherwise, the request succeeds, but the message might not be deleted).

Is there any latency in SQS while creating it using AWS API and sending messages immediately after creating it

I want to create SQS using code whenever it is required to send messages and delete it after all messages are consumed.
I just wanted to know if there is some delay required between creating an SQS using Java code and then sending messages to it.
Thanks.
Virendra Agarwal
You'll have to try it and make observations. SQS is a dostributed system, so there is a possibility that a queue might not immediately be usable, though I did not find a direct documentation reference for this.
Note the following:
If you delete a queue, you must wait at least 60 seconds before creating a queue with the same name.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_CreateQueue.html
This means your names will always need to be different, but it also implies something about the internals of SQS -- deleting a queue is not an instantaneous process. The same might be true of creation, though that is not necessarily the case.
Also, there is no way to know with absolute certainty that a queue is truly empty. A long poll that returns no messages is a strong indication that there are no messages remaining, as long as there are also no messages in flight (consumed but not deleted -- these will return to visibility if the consumer resets their visibility or improperly handles an exception and does not explicitly reset their visibility before the visibility timeout expires).
However, GetQueueAttributes does not provide a fail-safe way of assuring a queue is truly empty, because many of the counter attributes are the approximate number of messages (visible, in-flight, etc.). Again, this is related to the distributed architecture of SQS. Certain rare, internal failures could potentially cause messages to be stranded internally, only to appear later. The significance of this depends on the importance of the messages and the life cycle of the queue, and the risks of any such an issue seem -- to me -- increased when a queue does not have an indefinite lifetime (i.e. when the plan for a queue is to delete it when it is "empty"). This is not to imply that SQS is unreliable, only to make the point that any and all systems do eventually behave unexpectedly, however rare or unlikely.

Speed up the proces of requesting messages from SQS

We need to process a big number of messages stored in SQS (the messages originate from Amazon store and SQS is the only place we can save them to) and save the result to our database. The problem is, SQS can only return 10 messages at a time. Considering we can have up to 300000 messages in SQS, even if requesting and processing a 10 messages takes little time, the whole process takes forever with the main culprit being actually requesting and receiving the messages from SQS.
We're looking for a way to speed this up. The intended result would be dumping the results to our database. The process would probably run a few times per day (the number of messages would likely be less per run in that scenario).
Like Michael-sqlbot wrote, parallel requests were the solution. By rewriting our code to use async and making 10 requests at the same time, we managed to reduce the execution time to something much reasonable.
I guess it's because I rarely use multithreading directly in my job, that I haven't thought of using it to solve this problem.

Should I convert my action method to async action method?

I have a web site where user can upload a PDF and convert it to WORD doc.
It works nice but sometimes (5-6 times per hour) the users have to wait more than usual for the conversion to take place....
I use ASP.NET MVC and the flow is:
- USER uploads file -> get the stream and convert it to word -> save word file as a temp file -> return the user the url
I am not sure if I have to convert this flow to asynchronous? Basically, my flow is sequential now BUT I have about 3-5 requests per second and CPU is dual core and 4 GB Ram.
And as I know maxConcurrentRequestsPerCPU is 5000; also The default value of Threads Per Processor Limit is 25; so these default settings should be more than fine, right?
Then why still my web app has "waitings" some times? Are there any IIS settings I need to modify from default to anything else or I should just go and make my sync method for conversion to be async?
Ps: The conversion itself is taking between 1 seconds to 40-50 seconds depending on the pdf file size.
UPDATE: Basically what it's not very clear for me is: if a user uploads a file and the conversion is long shouldn't only current request "suffer" because of this? Because the next request is independent, make another CPU call and different thread so should be no wait here, isn't it?
There are a couple of things that must be defined clearly here. Async(hronous) method and flow are not the same thing at least as far as I can understand.
An asynchronous method (using Task, usually also leveraging the async/await keywords) will work in the following way:
The execution starts on thread t1 until it reaches an await
The (potentially) long operation will not take place on thread t1 - sometimes not even on an app thread at all, leveraging IOCP (I/O completion ports).
Thread t1 is free and released back to the thread pool and is ready to service other requests if needed
When the (potentially) long operation returns a thread is taken from the thread pool (could even be the same t1 or, most probably, another one) and the rest of the code execution resumes from the last await encountered
The rest of the code executes
There's a couple of things to note here:
a. The client is blocked during the whole process. The eventual switching of threads and so on happens only on the server
b. This approach is mainly designed to alleviate an unwanted condition called 'thread starvation'. It is not meant to speed up the total client waiting time and it usually doesn't speed up the process.
As far as I understand an asynchronous flow would mean, at least in this case, that after the user's request of converting the document, the client (i.e. the client's browser) would quickly receive a response in which (s)he is informed that this potentially long process has started on the server, the user should be patient and this current response page might provide progress feedback.
In your case I recommend the second approach because the first approach would not help at all.
Of course this will not be easy. You need to emulate a queue, you need to have a processing agent and an eviction policy (most probably enforce by the same agent if you don't want a second agent).
This would work along the following lines:
a. The end user submits a file, the web server receives it
b. The web server places it in the queue and receives a job number
c. The web server returns the user a response with the job number (let's say an HTML page with a polling mechanism that would periodically receive progress from the server)
d. The agent would start processing the document when it gets the chance (i.e. finishes other work) and update its status in a common place for the web server to pick this information
e. The web server would receive calls from the HTML response asking for the status of the job and would find out that the job is complete and offer a download link or start downloading it directly.
This can be refined in some ways:
instead of the client polling the server, websockets or long polling (for example SignalR covers both) could be used
many processing agents could be used instead of one if the hardware configuration makes sense
The queue can be implemented with a simple RDBMS, Remus Rușanu has a nice article about this.

Resources