Design for long-running ASP.NET MVC web request - asp.net-mvc

I'm aware of the model that involves a scheduled task running in the background which runs jobs registered by a web request, but how about this for an idea that keeps everything within ASP.NET...
User uploads a CSV file with, perhaps, several thousand rows. The rows are persisted to the database. I think this would take maybe a minute or so, which would be an acceptable wait.
The request returns to the browser, and then an automatic Ajax request goes back to the server and requests, say, ten rows at a time and processes them. (Each row requires a number of web service requests.)
The Ajax call returns, the display is updated, and then another automatic Ajax request goes back for more rows. This repeats until all rows are completed.
If the user leaves the web page, they could return later and restart the job.
Any thoughts?
Cheers, Ian.

If I understand you correctly, you don't actually need any "interaction" between background jobs and the long-running request; you just want to launch background jobs from incoming requests? That's not such a good idea. Take a look at the Quartz.NET project: it is a scheduler you can embed in an ASP.NET application, and it will handle this work for you without needing requests to drive it. Of course, if the app pool shuts down, your scheduler goes down with it, but you can't guarantee that won't happen with your long-running-request solution either, since it depends on a browser waiting on the other side.
Also take a look at this interesting article from Phil Haack on this topic, which includes his own small scheduler library specifically for ASP.NET:
http://haacked.com/archive/2011/10/16/the-dangers-of-implementing-recurring-background-tasks-in-asp-net.aspx
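For reference, a minimal sketch of starting an embedded Quartz.NET scheduler (2.x-style API) from Application_Start might look like this; RowProcessingJob is a hypothetical IJob that would pick up the next batch of unprocessed CSV rows and make the web service calls for each one:

using Quartz;
using Quartz.Impl;

public static class JobScheduler
{
    public static void Start()
    {
        // Create and start the default scheduler inside the ASP.NET process.
        IScheduler scheduler = StdSchedulerFactory.GetDefaultScheduler();
        scheduler.Start();

        // RowProcessingJob (hypothetical) processes the next batch of rows on each run.
        IJobDetail job = JobBuilder.Create<RowProcessingJob>().Build();

        // Fire every 30 seconds until the app pool shuts down.
        ITrigger trigger = TriggerBuilder.Create()
            .StartNow()
            .WithSimpleSchedule(x => x.WithIntervalInSeconds(30).RepeatForever())
            .Build();

        scheduler.ScheduleJob(job, trigger);
    }
}

Call JobScheduler.Start() once from Application_Start in Global.asax; as noted above, an app pool recycle will still take the scheduler down with it.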

A server-side program (or, ideally, a service) could still be quick and dirty and would be more reliable. You could still do step 1 as you have proposed: upload the file and insert the data (don't forget to increase the maxRequestLength limit, and possibly the executionTimeout, in web.config). Then have a program running on the server that checks for new records and processes them.
If the user needs status, you could store an entry in the database for each file and update that record when the import is complete.
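As a rough sketch of the web.config change mentioned above (the values are only examples; maxRequestLength is in KB and executionTimeout is in seconds):

<system.web>
  <!-- Allow uploads up to ~50 MB and give the upload request 5 minutes to complete. -->
  <httpRuntime maxRequestLength="51200" executionTimeout="300" />
</system.web>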

Maybe I'm reading the question and interpreting it in a weird way, but why couldn't you read the file into a database and store in a table the last line of the file that you've processed? You could then track your progress via the db and just send small JSON objects telling the user how far along you are. That way, if their connection drops you can keep processing their request, and if they return later you can notify them of how far along the job is. Also, if multiple clients are connecting you can use the db to queue and throttle (by serializing) the workload. And if the user connects mid-job with another file, their new request will simply be queued up after their current job.
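As a rough sketch of the polling endpoint this suggests (ImportJob, ProcessedRows, TotalRows and AppDbContext are hypothetical names, and the data access is only illustrative):

using System.Web.Mvc;

public class ImportStatusController : Controller
{
    private readonly AppDbContext _db = new AppDbContext(); // hypothetical EF context

    // Returns a small JSON object that the client polls to update its progress display.
    public ActionResult Progress(int importJobId)
    {
        var job = _db.ImportJobs.Find(importJobId);
        return Json(new
        {
            processed = job.ProcessedRows,
            total = job.TotalRows,
            complete = job.ProcessedRows >= job.TotalRows
        }, JsonRequestBehavior.AllowGet);
    }
}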

Related

Rails ActiveRecord/Postgres single query timeout?

I have a logging query (a simple INSERT) that happens on every single request.
For this query only (the one that happens on every page load), I want to set the limit to 500 ms so that if the database is locked/slow/down it won't affect the site by making pages hang while they wait to connect/write.
Is there a way I can specify a timeout on a per-query basis, so that I can abort the LoggedRequest.create! if it's taking too long?
I don't want to set it in my config because I have many other queries that shouldn't have timeouts that low.
I'm using Postgres 11.7
I also don't know how I feel about setting a timeout for the entire session, because I don't want that connection to be shared from the pool with other queries that shouldn't have that timeout.
Rails 6 introduces event-based triggers for notifications, logging, etc. that come in very handy, provided you are using Rails 6 or can afford to migrate to it. Here's a useful post that demonstrates creating event-based triggers for notifications/logging: https://pramodbshinde.wordpress.com/2020/03/20/custom-events-tracking-with-activesupportnotifications-and-audited/
If, for some reason, you cannot use Rails 6, perhaps this article might help you find some answers: https://evilmartians.com/chronicles/the-silence-of-the-ruby-exceptions-a-rails-postgresql-database-transaction-thriller
If I were you, I would also consider using AJAX with a fire-and-forget API request to the server for logging/anything that is not critical to the normal functioning of the application.

Rails - ensuring only one simultaneous controller hit per user

tl;dr - Is there a way to ensure that a given Rails controller action stops executing when another simultaneous request from the same user comes in?
In my Rails/Angular app, I make requests to the Foursquare API from the client-side. Because they need to be authenticated, and my authentication information should remain secure, I pass these requests through a Rails controller in my own app.
For a more in-depth description of the architecture of this, check out this semi-related question.
My concern, as elaborated there, is that each request to this internal controller takes up server time (and on Heroku, ties up a dyno). I've tried to make the action as fast as possible, but I'd still like to reduce the amount of time the server is tied up.
The amount the server is tied up is exacerbated by the real-time nature of the search I'm doing. The request is sent out to my server as a user types, not on enter or anything, because I wanted to allow for auto-suggestion.
I'm debouncing the user input (0.4 seconds), so the request isn't made until the user briefly stops typing. But if a user pauses a few times while typing, and a request goes out each time, this can quickly cause multiple dynos to get used.
More concretely, assuming a roughly ~1.3s API response time from Foursquare, imagine this scenario: A user types "ameri", then waits 0.4 seconds, then types "can", then waits 0.4 seconds, then types " beauty", completing their query. This would send three separate requests, all of which would need to be handled by different dynos, because none of the requests have a chance to return before the next comes in.
This would either cost me a ton of money (if I have a bunch of users, that means a large number of dynos to protect against concurrency timeouts) or cause really annoying waits on the user front.
So my thought would be that it would be awesome if I could essentially do a retroactive debounce on the server side, by terminating any running requests to Foursquare coming from that user before sending a new request out. That would mean that in the above concrete example, while 3 requests started, only the last request would come back, because the first two would be dropped midstream when a new one came in.
I was thinking of storing a variable in the session for each user that would be true while a request was executing. Then the next request wouldn't go out if one was already in flight. But that's actually sort of the opposite of what I want, because I want the original request to get canceled when the new one comes in. I just don't know how to access that request from within the later one.
This feels complicated, so I'm guessing it may be impossible (particularly as each controller action is responded to by a new controller instance), but does anyone know a way to cancel controller actions if the same action is hit by the same user again while the first request is getting resolved?
Thanks!

Rails application design: Queueing, Resque, Background Services, and Redis

I am designing a Rails app that takes in requests, uses data within the request to call a 3rd party web service, processes the reply, and then sends a response to the original requestor and also issues a PUT request to yet another service.
I am trying to wrap my head around how to design this Rails app as it's different from the canonical Rails structure.
The objects are Lists and Tasks. Each List has many Tasks, and each Task belongs to a List.
The request I would get is something like:
http://myrailsapp.heroku.com/v1/lists?id=1&from=2012-02-12&to=2012-02-14&priority=high
In this example I am requesting tasks from 2/12/2012 to 2/14/2012 with a high priority in List #1
I would then issue a 3rd party web service call like this:
http://thirdpartywebservice.com/v1/lists?id=4128&from=2012-02-12&to=2012-02-14&priority=high
As you can see, some processing was done on the data (the id was changed in this case).
The results are then sent back to the requestor and to another web service via PUT.
My question is, how do I set up the Rails app to handle these types of behaviors? How does the controller structure change? This looks like a good use case for queues; how do I distribute multiple concurrent requests among queues?
For one thing I don't need data persistence (the data can be discarded after the response is sent out), and the data structure design is also simplified. (I don't think I need Ruby objects; simple dictionaries or hashes representing these would be lighter weight and quicker to implement.)
Edit
So I broke down the workflow of the app into these components:
Parse incoming request
Construct the 3rd party web service request
Send 3rd party request
Enqueue a worker to process the expected response
Process the response once it arrives
Send the parsed result back as a response
Which of the standard Rails controllers handles each of these steps? What models are needed besides Lists and Tasks?
You should still use a database, because passing data to Resque is messy. Rather, you should store it in the database and then pass the id to the workers, which fetch the data, commit any new data, or delete the record. It's really up to you, but this method is cleaner. You can also use a push service like Faye to let the user know when the processing is complete.
If you expect to have many concurrent requests, I would recommend Sidekiq as it's less of a memory hog. Having 4-5 Resque workers can already suck up about 512 MB. The controller structure should not change. Please comment on anything you need clarified and I'll be happy to update my answer.
EDIT
You would want to use a separate database store, such as Postgres. Not sure if it's important what models you need, but essentially this is what should be happening.
In your controller, create a Request object which contains the query params you want to query this 3rd party service with. Then enqueue a job to be handled by Sidekiq/Resque (let's call it ThirdPartyRequest) and pass in the id of the Request object you just created as an argument. Then render a view showing the Request object. Request#response is still empty at this point because it hasn't been processed yet, so let the user know it's still processing.
A worker then handles your ThirdPartyRequest job. ThirdPartyRequest should fetch the Request object and obtain the query params needed to contact the third party service. It does that and gets a response back. Update the Request object with this response, then save it.
# Resque-style worker; a Sidekiq worker would instead `include Sidekiq::Worker`
# and define an instance-level `perform`.
class ThirdPartyRequest
  @queue = :third_party_requests # queue this worker pulls jobs from

  def self.perform(request_id)
    request = Request.find(request_id)
    # contact third party service
    request.response = ...
    request.save
  end
end
The user can continually refresh the page to check on his/her Request object. Once it gets updated with the response, they will know it's completed. If you want the page to refresh automatically, look into faye/juggernaut/private_pub or a SaaS solution like Pusher.

HTTP requests in transactions?

I have a model which sends an HTTP request to an external web service on creation, in order to find out some information to add before it is saved.
Currently I'm doing this in a before_create callback. I recently learned that before/after callbacks happen within database transactions.
Am I opening myself up to any issues such as limiting DB throughput by doing this? Would it be better to commit the record before sending the http request and then update the record when it returns?
As long as you keep a transaction open, all the locks it acquired remain active. If you have a call to an external source that may stall you for a long period of time, be sure not to hold any unrelated locks in the same transaction.
In other words: don't put anything else into the same transaction.
If you don't mind the new row being visible before you look up the additional information, you might just commit and later update the row.
Or you could fetch the information from the external web service before you even start the transaction. That would be the cleanest/fastest solution for the database.
See the PostgreSQL documentation on lock types and on how to view locks.

How to send many emails via ASP.NET without delaying response

Following a specific action the user takes on my website, a number of messages must be sent to different emails. Is it possible to have a separate thread or worker take care of sending multiple emails so as to avoid having the response from the server take a while to return if there are a lot of emails to send?
I would like to avoid using system processes, scheduled tasks, or email queues.
You can definitely spawn off a background thread in your controller to handle the emails asynchronously.
I know you want to avoid queues, but another thing I have done in the past is write a Windows service that pulls email from a DB queue and processes it at certain intervals. This way you can separate the two applications if there is a lot of email to be sent.
This can be done in many different ways, depending on how large your application is and what kind of reliability you want. Any of these ways should help you achieve what you want (in ascending order based on complexity):
If you're using IIS SMTP Server or another mail server that supports a pickup directory option, you can go with that. With this option, instead of sending the emails directly, they are saved first in the pickup directory. Your call will immediately return after the email is saved in the pickup directory, so the user won't have to wait until the email is sent. On the other hand, the server will try to send the email as soon as it's saved in the pickup directory so it's almost immediate (just without blocking the call).
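A minimal sketch of that option using System.Net.Mail (the pickup path is just an example; it can also be configured under <system.net>/<mailSettings> in web.config):

using System.Net.Mail;

public static class PickupDirectoryMailer
{
    public static void Send(MailMessage message)
    {
        var client = new SmtpClient
        {
            // Write the message to the pickup directory instead of sending it over the network;
            // the local SMTP server picks it up and delivers it in the background.
            DeliveryMethod = SmtpDeliveryMethod.SpecifiedPickupDirectory,
            PickupDirectoryLocation = @"C:\inetpub\mailroot\Pickup" // example path
        };
        client.Send(message);
    }
}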
You can use a background thread like described in other answers. You'll need to be careful with this option as the thread can end unexpectedly before it finishes its job. You'll need to add some code to make sure this works reliably (personally, I'd prefer not to use this option).
Using a messaging queue server like MSMQ. This is more work and you probably should only look into this if you have a large scale application or have good reasons not to use the first option with the pickup directory.
There are a few ways you could do this.
You could store enough details about the message in the database and write a Windows service to loop through them and send the emails. When the user submits the form, it just inserts the required data about the message and trusts that the service will pick it up. That's almost an email queue, which you said you didn't want, but you're going to end up in a queue situation with almost any solution.
Another option would be to drop in NServiceBus. Use that for these kinds of tasks.
I typically compile the message body and store that in a table in the db along with the from and to addresses, a subject, and a timestamp indicating when the email was sent. Then I have a background task check the table periodically and pull any that haven't been sent. This task attempts to send each email and updates the timestamp accordingly. One advantage of storing the compiled message body up front is that the background task doesn't have to do any processing of context-specific data, and therefore can be pretty darn simple.
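A rough sketch of that background task, assuming a hypothetical EmailQueue table with From/To/Subject/Body columns and a nullable SentAt timestamp (AppDbContext is a placeholder for your data access):

using System;
using System.Linq;
using System.Net.Mail;

public static class EmailQueueProcessor
{
    // Called periodically (e.g. by a Windows service timer) to send any unsent emails.
    public static void ProcessPending()
    {
        using (var db = new AppDbContext())    // hypothetical EF context
        using (var client = new SmtpClient())  // configured via <mailSettings> in config
        {
            var pending = db.EmailQueue.Where(e => e.SentAt == null).ToList();
            foreach (var email in pending)
            {
                client.Send(new MailMessage(email.From, email.To, email.Subject, email.Body));
                email.SentAt = DateTime.UtcNow; // stamp it so it isn't picked up again
            }
            db.SaveChanges();
        }
    }
}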
Whenever an operation like this is contingent upon an event, there is always the possibility that something will go wrong.
In ASP.NET you can spawn multiple threads and have those threads do the work. Make sure you tell the thread it's a background thread, otherwise ASP.NET might wait for the thread to finish before rendering your page:
myThread.IsBackground = true;
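Putting that together, a minimal fire-and-forget sketch might look like this (EmailDispatcher is a hypothetical helper; note that a background thread's work is lost if the app pool recycles before it finishes):

using System.Collections.Generic;
using System.Net.Mail;
using System.Threading;

public static class EmailDispatcher
{
    public static void SendInBackground(IEnumerable<MailMessage> messages)
    {
        var worker = new Thread(() =>
        {
            using (var client = new SmtpClient()) // configured via <mailSettings> in config
            {
                foreach (var message in messages)
                    client.Send(message);
            }
        });
        worker.IsBackground = true; // don't keep the app alive waiting on this thread
        worker.Start();
    }
}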
I know you said you didn't want to use system processes or scheduled tasks, but a Windows service would be a viable approach to this as well. The approach would be to use MSMQ, or to save the actions needing to be done in a database table. Then have a Windows service check every minute or so and perform those actions.
This way, if something fails (e.g. the email server is down), those emails/actions can still be done later.
They will also be recorded for auditing (which is very nice to have).
This method allows your web site to function as a website while offloading these tasks to another service. The last thing you need is for multiple ASP.NET worker processes to be tied up waiting for emails to send; let something else handle that.
