I have a controller action that aggregates data from multiple sources: a web service, a database, file lookups, etc., and passes the results to the view. In order to render the page, all tasks must have completed. Currently they are performed sequentially, but since they are independent, I am thinking of running them in parallel, as this could improve performance.
So what would be the best approach to achieve this? Start a new thread for each task and block the main thread until all tasks are finished? Should I use a thread from the thread pool or spawn a new thread manually? Using threads from the thread pool would limit my web server's capability to serve new requests, so this might not be a good idea. Spawning new threads manually could be expensive, so at the end of the day, would there be a net gain in performance from parallelizing these tasks, or should I just leave them running sequentially?
If it's between spawning your own threads or using the thread pool threads, I'd say use the ones from the thread pool. You can always adjust your server settings to allow for more threads in the pool if you find that you are running out of threads.
The only way to answer your final question would be to actually test it out, as we don't know how complicated the separate aggregation tasks are. If you want to give the illusion of a responsive UI, you could always display the loading page and kick off the aggregation with AJAX. Even non-threaded, this may placate your users sufficiently.
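The general shape of this, in any stack, is "submit the independent tasks to a pool, then block until all complete". A minimal Python sketch of that pattern (the `fetch_*` functions are hypothetical stand-ins for the web service, database, and file lookups):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the independent data sources.
def fetch_from_service():
    return {"service": "ok"}

def fetch_from_database():
    return {"db": "ok"}

def fetch_from_files():
    return {"files": "ok"}

def aggregate():
    tasks = [fetch_from_service, fetch_from_database, fetch_from_files]
    # Run the independent tasks in parallel and block until all finish.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(task) for task in tasks]
        results = {}
        for future in futures:
            results.update(future.result())  # .result() blocks until done
    return results
```

Whether this beats the sequential version depends on how I/O-bound the individual tasks are, which is why measuring is the only real answer.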
I have a server I am creating (a messaging service) and I am doing some preliminary tests to benchmark it. So far, the fastest way to process the data is to do it directly in the user's process and to use worker pools. I have tested spawning, and that is unbelievably slow.
The test just connects 10k users and has each one send 15 kB of data a couple of times at the same time (or trying to, at least), with the server processing the data (total length, headers, and payload).
The issue I have with worker pools is that they are only fast when you have enough workers to offset the number of connections. For example, if you have 500k or 1 million users, you would need more workers to process all the concurrent data coming in. And in my testing, having 1000 workers made it unusable.
So my question is the following: when does it make sense to use pools of workers? Will there be a tipping point where I have to use workers to process the data in order to free up the user process? How many workers is too many? Is 500,000 too many?
And, if workers are the way to go (for those massive concurrent distributed servers), I am guessing you can dynamically create/delete them as needed?
Any literature is also appreciated!
Thanks for your answer!
Maybe worker pools are not the best tool for your problem. If I were you I would try using Jay Nelson's epocxy, which gives you a very basic backpressure mechanism while still letting you parallelize your tasks. From that library I would check either concurrency fount or concurrency control tools.
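Leaving epocxy's specific API aside, the backpressure idea itself is simple: put a bound on the work in flight, so producers slow down (block) instead of flooding the workers. A minimal Python stand-in for that mechanism:

```python
import queue
import threading

jobs = queue.Queue(maxsize=4)  # the bound is the backpressure limit
done = []

def worker():
    while True:
        item = jobs.get()
        if item is None:       # sentinel: shut the worker down
            break
        done.append(item * 2)  # stand-in for real message processing
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

for i in range(10):
    # put() blocks when the queue is full, so callers are throttled
    # instead of overwhelming the worker (basic backpressure).
    jobs.put(i)

jobs.put(None)
t.join()
```

The bound is what keeps a flood of connections from making the system unusable, which is the failure mode described in the question.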
I have an ASP.NET MVC application which gathers data from multiple Databases.
The databases hold information for various sites and for every new site we have a new Database. The database for each site is connected at two points, from the site and then from HQ.
A web application updates the data every minute from the site, and the data is served to the HQ (via another web application) every minute. Sometimes the application's response is very slow, and from what I have investigated, it may be because the connection pool fills up swiftly.
I want to ask what is the best approach to such application, where I can get the best performance out of it. Any guidance is welcome.
How to improve your web application's performance with regard to the database really depends on your architecture. But there are some general rules you should always follow:
Check for thread starvation: On the Web server, the .NET Framework maintains a pool of threads that are used to service ASP.NET requests. When a request arrives, a thread from the pool is dispatched to process that request. If the request is processed synchronously, the thread that processes the request is blocked while the request is being processed, and that thread cannot service another request.
This might not be a problem, because the thread pool can be made large enough to accommodate many blocked threads. However, the number of threads in the thread pool is limited. In large applications that process multiple simultaneous long-running requests, all available threads might be blocked. This condition is known as thread starvation. When this condition is reached, the Web server queues requests. If the request queue becomes full, the Web server rejects requests with an HTTP 503 status (Server Too Busy).
For thread starvation, the best approach is to use asynchronous methods. Refer here for more information.
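The asynchronous-methods fix works because the thread is released while I/O is pending instead of sitting blocked. The same shape in a Python asyncio sketch (the sleeps stand in for database or web-service calls):

```python
import asyncio

async def fetch_orders():
    await asyncio.sleep(0.01)  # stand-in for a slow I/O call
    return ["order1"]

async def fetch_customers():
    await asyncio.sleep(0.01)  # stand-in for another slow I/O call
    return ["alice"]

async def handle_request():
    # While the awaits are pending, the thread is free to serve other
    # requests instead of being blocked (avoiding thread starvation).
    orders, customers = await asyncio.gather(fetch_orders(), fetch_customers())
    return {"orders": orders, "customers": customers}

result = asyncio.run(handle_request())
```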
Use a using block for your data context, so that it is disposed immediately once you are finished with it.
Transferring huge amounts of data: check your code. You may be retrieving more data than you actually need. For example, you might transfer a whole object when you only need one of its properties; in that case, use a projection (refer here for an example).
You may also use lazy loading or eager loading, depending on your scenario. But please note that neither is a magic tool for every scenario. In some cases lazy loading improves performance, and in others eager loading makes things faster. It depends on a deep understanding of these two techniques, as well as your particular issue, your code, and your design.
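The projection point is language-neutral: ask the database for the one column you need rather than whole rows. A small sketch with an illustrative (hypothetical) table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, bio TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice', 'a very long bio...')")

# Without projection: every column travels over the wire.
full_rows = conn.execute("SELECT * FROM users").fetchall()

# With projection: only the property you actually need.
names = [row[0] for row in conn.execute("SELECT name FROM users")]
```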
Filter your data on the server side rather than the client side. Filtering data on the server side helps keep your server load and network traffic as low as possible. It also makes your application more responsive and better performing. Use the IQueryable interface for server-side filtering (check here for more information).
A side benefit of server-side filtering is better security.
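The difference between the two filtering styles can be sketched like this (a hypothetical table; composing an IQueryable in .NET achieves what the WHERE clause does here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "open"), (2, "closed"), (3, "open")])

# Client-side filtering: all rows cross the network, most get discarded.
all_rows = conn.execute("SELECT id, status FROM orders").fetchall()
open_client = [r for r in all_rows if r[1] == "open"]

# Server-side filtering: the database returns only the matching rows.
open_server = conn.execute(
    "SELECT id, status FROM orders WHERE status = ?", ("open",)).fetchall()
```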
Check your architecture for bottlenecks. A controller that gets called too often, a method that handles lots of objects with lots of data, or a table in the database that receives requests continuously are all bottleneck candidates.
Use caching for the most requested data, when applicable. But again, use caching wisely and based on your situation: caching the wrong things can make your server very slow.
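Caching the most requested data can be sketched with a memoizing cache; `expensive_lookup` here is a hypothetical stand-in for a costly database read:

```python
from functools import lru_cache

calls = []  # tracks how often the slow path actually runs

@lru_cache(maxsize=128)
def expensive_lookup(key):
    calls.append(key)   # only reached on a cache miss
    return key.upper()  # stand-in for an expensive database read

expensive_lookup("site1")
expensive_lookup("site1")  # served from cache: no second slow call
expensive_lookup("site2")
```

The flip side the answer warns about: caching data that is rarely re-requested, or that changes every minute, only adds memory pressure and staleness for no hit-rate benefit.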
If you think your speed issue lies entirely in the database, the best approach is to use SQL profiling tools to find out where the critical points are. A redesign of your tables might be the answer. Try to separate read and write tables as much as possible; separation can be done by creating appropriate views. Also check this checklist for monitoring your database.
I am using a background job system (Sidekiq) in my app to manage heavy jobs that should not block the UI.
I would like to transmit data from the background job to the main thread when the job is finished, e.g. the status of the job or the data produced by the job.
At the moment I use Redis as middleware between the main thread and the background jobs. It stores the data, status, etc. of the background jobs so the main thread can read what is happening behind the scenes.
My question is: is this a good practice for managing data between scheduled jobs and the main thread (using Redis or a key-value cache)? Are there other approaches? Which is best, and why?
Redis pub/sub is the thing you are looking for.
You just subscribe the main thread to a channel using the SUBSCRIBE command, and the worker announces the job status on that channel using the PUBLISH command.
As you already have Redis inside your environment, you don't need anything else to start.
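The flow is roughly the following; an in-memory queue stands in for the Redis connection so the sketch is self-contained (with a real client such as redis-py you would use its pub/sub subscribe and publish calls instead):

```python
import queue
import threading

channel = queue.Queue()  # in-memory stand-in for a Redis channel

def background_job():
    # ... heavy work happens here ...
    channel.put({"job": 42, "status": "finished"})  # the PUBLISH step

worker = threading.Thread(target=background_job)
worker.start()

# Main thread: the SUBSCRIBE side, blocking until a status arrives.
message = channel.get(timeout=5)
worker.join()
```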
Here are two other options that I have used in the past:
Unix sockets. This was extremely fiddly, creating and closing connections was a nuisance, but it does work. Also dealing with cleaning up sockets and interacting with the file system is a bit involved. Would not recommend.
Standard RDBMS. This is very easy to implement, and made sense for my use case, since the heavy job was associated with a specific model, so the status of the process could be stored in columns on that table. It also means that you only have one store to worry about in terms of consistency.
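The RDBMS option can be sketched like this (a hypothetical jobs table; the worker updates a status column that the main application then reads):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO jobs (id, status) VALUES (1, 'queued')")

# Background worker marks the job finished when its work is done.
conn.execute("UPDATE jobs SET status = 'finished' WHERE id = 1")
conn.commit()

# Main thread polls the same row to learn the outcome.
status = conn.execute("SELECT status FROM jobs WHERE id = 1").fetchone()[0]
```

Unlike pub/sub, the main thread has to poll, but the status survives restarts and lives next to the model it belongs to.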
I have used memcached as well, which does the same thing as Redis; here's a discussion comparing their features, if you're interested. I found this to work well.
If Redis is working for you then I would stick with it. As far as I can see it is a reasonable solution to this problem. The only things that might cause issues are generating unique keys (probably not that hard), and also making sure that unused cache entries are cleaned up.
If I have a function that can be executed asynchronously, with no dependencies and no other functions requiring its results directly, should I use spawn? In my scenario I want to proceed to consume a message queue, so spawning would relieve my blocking loop. But in other situations, if I distribute function calls as much as possible, will that negatively affect my application?
Overall, what are the pros and cons of using spawn?
Unlike operating system processes or threads, Erlang processes are very lightweight. There is minimal overhead in starting, stopping, and scheduling new processes. You should be able to spawn as many of them as you need (the default maximum per VM is in the hundreds of thousands). The actor model Erlang implements allows you to think about what is actually happening in parallel and to write your programs to express that directly. Don't complicate your logic with work queues if you can avoid it.
Spawn a process whenever it makes logical sense, and optimize only when you have to.
The first thing that comes to mind is the size of the parameters. They will be copied from your current process to the new one, and if the parameters are huge this may be inefficient.
Another problem that may arise is bloating the VM with so many processes that your system becomes unresponsive. You can overcome this by using a pool of worker processes, or a special monitor process that allows only a limited number of such processes to run at once.
so spawning would relieve my blocking loop
If you are in a situation where a loop will receive many messages requiring independent actions, don't hesitate to spawn a new process for each message; this way you will take advantage of your computer's multicore capabilities (if any). As kjw0188 says, Erlang processes are very lightweight, and if the system hits the limit of processes alive in parallel (assuming you are writing reasonable code), it is more likely that the application is overloading the capacity of the node.
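Erlang processes are far lighter than OS threads, but the "spawn one handler per message" shape described above can be sketched in Python, with threads standing in for Erlang processes:

```python
import threading

results = []
lock = threading.Lock()

def handle(message):
    # Each message gets its own handler, so the receive loop never
    # blocks on processing (the effect spawn gives you in Erlang).
    with lock:
        results.append(message.upper())  # stand-in for real work

handlers = [threading.Thread(target=handle, args=(m,))
            for m in ["ping", "pong"]]
for h in handlers:
    h.start()
for h in handlers:
    h.join()
```

The caveat from the other answers still applies: in Python the per-thread cost is high enough that a pool becomes necessary far sooner than it would in Erlang.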
I am curious how to proceed with this issue; I currently have a DataSnap server set up with a TDSAuthenticationManager class managing the authentication.
If an authentication fails, is it safe for me to write directly onto a form TMemo or something similar for logging purposes? What's the best way to observe this?
Do I need threading?
Cheers for reading,
Adrian
Yes, you need synchronization, since DataSnap events run in the context of different threads and, as you may know, UI programming is limited to the main thread.
So, if you want to display something in the UI, you have to take care of how to do it.
On the other hand, if you want to log to a file, you don't need synchronization, but you have to be careful, since it is possible for two different threads to try to log at the same time.
The options I would evaluate are:
Protect the access to the log file using a Critical Section, thus avoiding the multi-thread access with a lock. Only one thread can access the file at a time and all other interested threads have to wait.
Create a new logging class with a global instance that accepts log requests by simply adding the message to a (thread-safe) in-memory queue, and runs its own thread that writes the messages to a file whenever the queue is non-empty.
Since servers tend to run as services in production environments, I would choose the latter.
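The second option (a queue plus a dedicated writer thread) can be sketched as follows; a list stands in for the log file on disk so the sketch is self-contained:

```python
import queue
import threading

log_queue = queue.Queue()  # thread-safe: any thread may enqueue
written = []               # stand-in for the log file on disk

def writer():
    while True:
        msg = log_queue.get()
        if msg is None:         # sentinel: shut the logger down
            break
        written.append(msg)     # only this thread touches the "file"

logger = threading.Thread(target=writer)
logger.start()

# Any event thread can log without blocking on file I/O
# and without needing a critical section around the file.
log_queue.put("authentication failed for user X")
log_queue.put(None)
logger.join()
```

Because a single thread owns the file, no lock around the writes is needed, which is what makes this preferable to the critical-section option for a busy service.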