Should I convert my action method to async action method? - asp.net-mvc

I have a web site where user can upload a PDF and convert it to WORD doc.
It works nice but sometimes (5-6 times per hour) the users have to wait more than usual for the conversion to take place....
I use ASP.NET MVC and the flow is:
- USER uploads file -> get the stream and convert it to word -> save word file as a temp file -> return the user the url
I am not sure if I have to convert this flow to asynchronous? Basically, my flow is sequential now BUT I have about 3-5 requests per second and CPU is dual core and 4 GB Ram.
And as I know maxConcurrentRequestsPerCPU is 5000; also The default value of Threads Per Processor Limit is 25; so these default settings should be more than fine, right?
Then why still my web app has "waitings" some times? Are there any IIS settings I need to modify from default to anything else or I should just go and make my sync method for conversion to be async?
Ps: The conversion itself is taking between 1 seconds to 40-50 seconds depending on the pdf file size.
UPDATE: Basically what it's not very clear for me is: if a user uploads a file and the conversion is long shouldn't only current request "suffer" because of this? Because the next request is independent, make another CPU call and different thread so should be no wait here, isn't it?

There are a couple of things that must be defined clearly here. Async(hronous) method and flow are not the same thing at least as far as I can understand.
An asynchronous method (using Task, usually also leveraging the async/await keywords) will work in the following way:
The execution starts on thread t1 until it reaches an await
The (potentially) long operation will not take place on thread t1 - sometimes not even on an app thread at all, leveraging IOCP (I/O completion ports).
Thread t1 is free and released back to the thread pool and is ready to service other requests if needed
When the (potentially) long operation returns a thread is taken from the thread pool (could even be the same t1 or, most probably, another one) and the rest of the code execution resumes from the last await encountered
The rest of the code executes
There's a couple of things to note here:
a. The client is blocked during the whole process. The eventual switching of threads and so on happens only on the server
b. This approach is mainly designed to alleviate an unwanted condition called 'thread starvation'. It is not meant to speed up the total client waiting time and it usually doesn't speed up the process.
As far as I understand an asynchronous flow would mean, at least in this case, that after the user's request of converting the document, the client (i.e. the client's browser) would quickly receive a response in which (s)he is informed that this potentially long process has started on the server, the user should be patient and this current response page might provide progress feedback.
In your case I recommend the second approach because the first approach would not help at all.
Of course this will not be easy. You need to emulate a queue, you need to have a processing agent and an eviction policy (most probably enforce by the same agent if you don't want a second agent).
This would work along the following lines:
a. The end user submits a file, the web server receives it
b. The web server places it in the queue and receives a job number
c. The web server returns the user a response with the job number (let's say an HTML page with a polling mechanism that would periodically receive progress from the server)
d. The agent would start processing the document when it gets the chance (i.e. finishes other work) and update its status in a common place for the web server to pick this information
e. The web server would receive calls from the HTML response asking for the status of the job and would find out that the job is complete and offer a download link or start downloading it directly.
This can be refined in some ways:
instead of the client polling the server, websockets or long polling (for example SignalR covers both) could be used
many processing agents could be used instead of one if the hardware configuration makes sense
The queue can be implemented with a simple RDBMS, Remus Rușanu has a nice article about this.

Related

Speed up the proces of requesting messages from SQS

We need to process a big number of messages stored in SQS (the messages originate from Amazon store and SQS is the only place we can save them to) and save the result to our database. The problem is, SQS can only return 10 messages at a time. Considering we can have up to 300000 messages in SQS, even if requesting and processing a 10 messages takes little time, the whole process takes forever with the main culprit being actually requesting and receiving the messages from SQS.
We're looking for a way to speed this up. The intended result would be dumping the results to our database. The process would probably run a few times per day (the number of messages would likely be less per run in that scenario).
Like Michael-sqlbot wrote, parallel requests were the solution. By rewriting our code to use async and making 10 requests at the same time, we managed to reduce the execution time to something much reasonable.
I guess it's because I rarely use multithreading directly in my job, that I haven't thought of using it to solve this problem.

How can I retrieve data onscreen from an external API without tying up a request?

I have a Rails 3 application that lets a user perform a search against a 3rd party database via an API. It could potentially bring back quite a bit of XML data. Also, the API could be busy serving requests for other users and have a nontrivial delay in its response time.
I only run 2 webservers so I can't afford to have a delayed request obviously. I use Sidekiq to process long-running jobs, but in all the cases I've needed that I haven't had to return a value to the screen.
I also use Pusher to communicate back to the user when a background job is finished. I am checking it out, but I don't know if it can be used for the kind of data I want to push to the screen. Right now it just pops up dialog boxes with messages that I send it.
I have thought of some pretty kooky stuff, like running the request via Sidekiq, sending the results to a session object or file, then using Pusher to kick off some kind of event to grab the data and populate the screen with it. Seems kind of Rube Goldberg-ish.
I appreciate any help or insight anyone can offer into the problem!
I had a similar situation not long ago and the way I've fixed was using memcache and threads.
I've also thought about using Sidekiq, but Sidekiq is ideal if you don't expect to use the data right away, so memcache and threads worked pretty well and gave us a good amount of control.
Instead of calling the API directly I would assign the API request to a thread and this thread once done would write to memcache, in my case this can happen incrementally, with the same API being able to return more data from the same endpoint until is complete.
From the UI I would have a basic ajax pooling mechanism that would hit a controller and check memcache for data and for the status to see if it was complete or not, this would sign the UI that it need to keep pooling for more data.

C# 5 .NET MVC long async task, progress report and cancel globally

I use ASP.Net MVC 5 and I have a long running action which have to poll webservices, process data and store them in database.
For that I want to use TPL library to start the task async.
But I wonder how to do 3 things :
I want to report progress of this task. For this I think about SignalR
I want to be able to left the page where I start this task from and be able to report the progression across the website (from a panel on the left but this is ok)
And I want to be able to cancel this task globally (from my panel on the left)
I know quite a few about all of technologies involved. But I'm not sure about the best way to achieve this.
Is someone can help me about the best solution ?
The fact that you want to run long running work while the user can navigate away from the page that initiates the work means that you need to run this work "in the background". It cannot be performed as part of a regular HTTP request because the user might cancel his request at any time by navigating away or closing the browser. In fact this seems to be a key scenario for you.
Background work in ASP.NET is dangerous. You can certainly pull it off but it is not easy to get right. Also, worker processes can exit for many reasons (app pool recycle, deployment, machine reboot, machine failure, Stack Overflow or OOM exception on an unrelated thread). So make sure your long-running work tolerates being aborted mid-way. You can reduce the likelyhood that this happens but never exclude the possibility.
You can make your code safe in the face of arbitrary termination by wrapping all work in a transaction. This of course only works if you don't cause non-transacted side-effects like web-service calls that change state. It is not possible to give a general answer here because achieving safety in the presence of arbitrary termination depends highly on the concrete work to be done.
Here's a possible architecture that I have used in the past:
When a job comes in you write all necessary input data to a database table and report success to the client.
You need a way to start a worker to work on that job. You could start a task immediately for that. You also need a periodic check that looks for unstarted work in case the app exits after having added the work item but before starting a task for it. Have the Windows task scheduler call a secret URL in your app once per minute that does this.
When you start working on a job you mark that job as running so that it is not accidentally picked up a second time. Work on that job, write the results and mark it as done. All in a single transaction. When your process happens to exit mid-way the database will reset all data involved.
Write job progress to a separate table row on a separate connection and separate transaction. The browser can poll the server for progress information. You could also use SignalR but I don't have experience with that and I expect it would be hard to get it to resume progress reporting in the presence of arbitrary termination.
Cancellation would be done by setting a cancel flag in the progress information row. The app needs to poll that flag.
Maybe you can make use of message queueing for job processing but I'm always wary to use it. To process a message in a transacted way you need MSDTC which is unsupported with many high-availability solutions for SQL Server.
You might think that this architecture is not very sophisticated. It makes use of polling for lots of things. Polling is a primitive technique but it works quite well. It is reliable and well-understood. It has a simple concurrency model.
If you can assume that your application never exits at inopportune times the architecture would be much simpler. But this cannot be assumed. You cannot assume that there will be no deployments during work hours and that there will be no bugs leading to crashes.
Even if using http worker is a bad thing to run long task I have made a small example of how to manage it with SignalR :
Inside this example you can :
Start a task
See task progression
Cancel task
It's based on :
twitter bootstrap
knockoutjs
signalR
C# 5.0 async/await with CancelToken and IProgress
You can find the source of this example here :
https://github.com/dragouf/SignalR.Progress

Return async while having a thread still runnning (MVC .Net)

I got the following problem, I want to execute a block of code that might take a while.
It would be a bad user experience if the user has to wait for it to finish. So I though of using a thread.
[HttpPost]
public ActionResult methode(Model model){
Task.Factory.StartNew(() =>
{
// Do a block of code that takes a while
});
return Json(new
{
succes = GetValue()
});
}
When debugging you can clearly see that the thread is being executed and that the return code is reached.
Problem I got here is that the actual return takes place when the thread is done. ( so I am not gaining any speed here.)
Why is that? And how do I make it work?
Thanks in advance!
The way you have done it above won't work because as you have observed, you have await the task and then return, so effectively it's a synchronous operation from the user's point of view and they are stuck waiting for the server to respond with the actual results of the work.
There are a couple of ways to do this though, it depends on what the nature of the task is. If this is something that generates data that the user might want to come back to later on and download e.g a report that can take a while to generate:
Using an asynchronous controller, kick off the task and keep a record on the server side via some unique identifier, and then return immediately with that identifier.
A client-side script is then triggered to poll every N seconds via an Ajax call, to see if the work is complete on the server side
However if the app pool gets recycled in IIS during the running of this thread, the work is lost and you haven't really got a way to control this
Better yet kick off the long running task in a separate process and then the client can poll every few seconds(or use SignalR from the server side to push) for when the job is done.
A typical way to achieve this is to push a record of the work to be done into a Queue and then return some kind of identifier to the client.
Your separate process can just be running all the time and monitoring the queue for new work to do, it can then update a database or cache with the results of the work.
Meanwhile the client is polling (or SignalR is pushing whereby your signalr hub checks every few seconds and pushes the job status back to the client) to see if that particular "job" is completed.
This way you are not at the mercy of the web server process going down and trashing all your threads, plus you gain much better control over cases where you need to defer jobs, spread the load over multiple servers, etc.
If on the other hand this is data that just takes a while to calculate but is only going to be available to the user on-screen, and you don't allow the user to leave the current page whilst waiting for the results, then just use a properly asynchronous Ajax call from the client-side!

Asynchronous MVC Controllers

I'm learning about the AsyncController in ASP.NET MVC and using it with the TPL, but I'm struggling to see its need, I can understand when you would want to run an Action asynchronously to do something like send out an email, but in reality would you ever use it to return a view from an action?
For example if the Action gets some data from a database, which is set to work async, then return a View, if the data fails to retrieve in time will the View not just return with no data in the model?
Would you ever use it to return a view from an action?
The main advantage of asynchrony in ASP.NET is scalability. While the asynchronous work executes, you're not consuming any threads. This means your application will consume less memory and it may be also faster.
If the data fails to retrieve in time will the View not just return with no data in the model?
That depends on you and how exactly will you handle that failure.
Async controllers are used primarily to give up the current thread pool thread to allow other incoming connections to process work while you are waiting for a long running process to complete.
This has nothing to do with pass a view back. The process will still "block" from the end users perspective, but on the server the resources the server needs to respond to incoming requests will not be consumed.
By default, there are 250 thread pool threads per cpu core in an IIS worker process to respond to incoming connections (this can be tuned, but in general you should know what you're doing). If you have to people waiting for long requests to complete, then nobody else will be able to connect to your server until one of them finishes. Async controllers fix that problem.
You can also offload CPU bound work to a dedicated thread when using async controllers, where that was more difficult in synchronous controllers. And, it allows you to perform tasks in parallel. For instance, suppose you have to go out to 10 web sites and retrieve data. Most of the time is spent waiting for those web requests to return, and they can be done in parallel if you are doing things async.

Resources