I'm studying OS synchronization problems. I understand semaphores and their use in reader-writer and producer-consumer situations. I'm not getting the concept of monitors, though. Can someone help me understand them?
Super simple high level answer:
A semaphore counts how many are using a resource (or a pool of resources) and stops them when a limit is reached. (For example, with a semaphore of 3, the first 3 would be able to use the resource and any additional ones would be locked out until a resource is released -- only 3 can have a lock at once.)
A monitor allows only a single lock, held by one process at a time. While something is using it, nothing else can.
A semaphore that counts to 1 (a binary semaphore) behaves much like a monitor's lock.
Because a semaphore is designed to do more, using one that only counts to 1 is not as efficient as it could be. (That is, a monitor implementation can be more efficient than a semaphore that counts to 1, because the monitor has fewer requirements to satisfy.)
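Here is a quick sketch in C++ (C++20 for std::counting_semaphore) to make the contrast concrete; the names are just for illustration, not from any particular textbook. The semaphore admits up to 3 holders, matching the example above; the monitor is a class whose methods all go through one lock, with a condition variable so threads can wait inside it for the state to change.

    #include <condition_variable>
    #include <mutex>
    #include <semaphore>   // C++20

    // Semaphore: up to 3 threads may hold a slot at once.
    std::counting_semaphore<3> pool{3};

    void use_pooled_resource() {
        pool.acquire();            // blocks once 3 holders are already inside
        // ... use one of the 3 resources ...
        pool.release();
    }

    // Monitor: one lock protecting shared state, plus a condition variable
    // so threads can wait inside the monitor for that state to change.
    class Counter {
        std::mutex m;                  // the monitor's single lock
        std::condition_variable cv;
        int value = 0;
    public:
        void increment() {
            std::lock_guard<std::mutex> lock(m);   // only one thread inside at a time
            ++value;
            cv.notify_one();
        }
        void wait_until_positive() {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [this] { return value > 0; });   // lock is released while waiting
        }
    };

With the semaphore, a 4th thread blocks in acquire() until one of the first 3 calls release(); with the monitor, every call goes through the same mutex, so only one thread is ever inside it at a time.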
I have 2 tasks that need to be performed:
One is synchronized with refresh rate via the Present call; does fancy graphics.
The other does a bunch of computations on a virtually infinite workload; does not need to be synchronous with the first task; really does not like being interrupted (encourages coarser workload granularity).
Is there a way to optimally use the GPU in this situation with DirectX?
Perhaps the solution would:
issue Dispatch (or Draw) calls in a way that allows them to run/finish asynchronously.
signal the current shader to stop.
use hardware or driver scheduling.
Right now my solution is to try to predict how long it would take to run the shaders, which is unreliable, unless I add a bunch of downtime...
Trying to avoid the th**ad word as it means a different thing on GPUs
Create two separate D3D11 devices. Use one for the rendering, and another one (driven from another CPU thread with lower priority) for the computations.
Rework your low-priority computations so that each Dispatch() takes a couple of milliseconds of GPU time to complete. Don't submit many compute calls at once: use 2 queries or a single fence so that you never have more than 2 compute calls pending. Dispatch 2 calls initially; when the first completes, dispatch the third, and so on.
While rendering 3D on your main thread, lock an std::mutex and release it once you have rendered the scene, before Present. On the background thread, lock that mutex while submitting more compute work, but keep it unlocked while waiting for a query or fence.
You're still going to have some interference between these two tasks, but it might be good enough for your use case.
Ideally, consider using timestamp queries to measure the GPU time spent on your background tasks. Then adjust the size of a single task dynamically based on these numbers; this should let you achieve a good task granularity regardless of the GPU's performance. Don't forget to apply a rolling average over the last 5-10 completed tasks before using the number for these adjustments.
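If it helps, here is a rough C++ sketch of that scheme under D3D11. Error handling is omitted, and the globals, the WORKGROUPS constant, and the assumption that the compute shader and its resources are already bound are all placeholders of mine, not a tested implementation:

    #include <windows.h>
    #include <d3d11.h>
    #include <atomic>
    #include <mutex>

    std::mutex            g_submitMutex;     // shared between the render and compute threads
    ID3D11Device*         g_computeDevice;   // second device, used only for the compute work
    ID3D11DeviceContext*  g_computeContext;
    const UINT            WORKGROUPS = 64;   // placeholder: sized so one Dispatch takes a couple of ms

    // Background thread: keep at most 2 dispatches in flight, tracked with 2 event queries.
    void backgroundComputeLoop(const std::atomic<bool>& keepRunning) {
        D3D11_QUERY_DESC qd = {};
        qd.Query = D3D11_QUERY_EVENT;
        ID3D11Query* query[2] = {};
        bool pending[2] = { false, false };
        g_computeDevice->CreateQuery(&qd, &query[0]);
        g_computeDevice->CreateQuery(&qd, &query[1]);

        for (int slot = 0; keepRunning; slot = (slot + 1) % 2) {
            if (pending[slot]) {
                // Wait outside the mutex until the oldest dispatch has finished on the GPU.
                while (g_computeContext->GetData(query[slot], nullptr, 0, 0) != S_OK)
                    Sleep(1);
                pending[slot] = false;
            }
            // Hold the mutex only while submitting, so the render thread stalls as little as possible.
            std::lock_guard<std::mutex> lock(g_submitMutex);
            g_computeContext->Dispatch(WORKGROUPS, 1, 1);   // assumes the CS and resources are already bound
            g_computeContext->End(query[slot]);             // signalled once this Dispatch completes
            pending[slot] = true;
        }
        query[0]->Release();
        query[1]->Release();
    }

    // Render thread: hold the same mutex while drawing the scene, but not across Present.
    void renderFrame(IDXGISwapChain* swapChain) {
        {
            std::lock_guard<std::mutex> lock(g_submitMutex);
            // ... record/draw the scene with the rendering device ...
        }
        swapChain->Present(1, 0);
    }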
Pretty simple question that I haven't found answered anywhere in the documentation or tutorials on GCD: what happens if I'm submitting work to queues faster than it's being processed and removed? I'm aware that GCD queues have no size limit; would work just pile up until the program runs out of memory? Is there any way to properly handle this situation?
What happens if I'm submitting work to queues faster than it's being processed and removed?
It depends.
If dispatching tasks to a single/shared serial queue, they will just be added to the queue and it will process them in a FIFO manner. No problem. Memory is your only constraint.
If dispatching tasks to a concurrent queue, though, you end up with "thread explosion", and you will quickly exhaust the limited number of worker threads available for that quality-of-service (QoS). This can result in unpredictable behavior should the OS need a worker thread of the same QoS for something else. Thus, you must be very careful to avoid this thread explosion.
See the discussion of thread explosion in WWDC 2015's Building Responsive and Efficient Apps with GCD, and again in WWDC 2016's Concurrent Programming With GCD in Swift 3.
Is there any way to properly handle this situation?
It is hard to answer that in the abstract. Different situations call for different solutions.
In the case of thread explosion, the solution is to constrain the degree of concurrency using concurrentPerform (limiting the concurrency to the number of cores on your device). Or we use operation queues and their maxConcurrentOperationCount to limit the degree of concurrency to something reasonable. There are other patterns, too, but the idea is to constrain concurrency to something suitable for the device in question.
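To illustrate constraining concurrency, here is a small sketch using libdispatch's C API from C++ (dispatch_apply_f is the primitive that Swift's concurrentPerform wraps); the work function and iteration count are made up for the example:

    #include <dispatch/dispatch.h>
    #include <cstdio>

    // Hypothetical per-item work; the signature must match dispatch_apply_f's callback.
    static void process_item(void* /*context*/, size_t index) {
        std::printf("processing item %zu\n", index);
    }

    int main() {
        dispatch_queue_t queue = dispatch_get_global_queue(QOS_CLASS_UTILITY, 0);
        // All 10,000 iterations run, but GCD decides how many worker threads to use,
        // so concurrency stays bounded and there is no thread explosion.
        dispatch_apply_f(10000, queue, nullptr, process_item);
        return 0;
    }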
But if you're just dispatching a large number of tasks to a serial queue, there's not much you can do (other than looking for parallelism opportunities, to make efficient use of all of the CPU's cores). But that's OK, as that is the whole purpose of a queue: to let it perform tasks in the order they were submitted, even if it can't keep up. It wouldn't be a "queue" if it didn't follow this FIFO sort of pattern.
Now, if you are dealing with real-time data that cannot be processed quickly enough, you have a different problem. In that case, you might want to decouple the capture of the input from the processing and decide how you want to handle the backlog. If you can't keep up with real-time processing of a video, for example, you have a choice: either start dropping frames or process the data asynchronously/later. You just have to decide what is right for your use case. We cannot answer this question in the abstract.
From the response to this question, I know that the maximum number of threads spawned cannot exceed 66. But is there a way to limit the thread count to a value that a user has defined?
From my experience and work with GCD under various circumstances, I believe this is not possible.
That said, it is very important to understand that with GCD you create queues, not threads. Whenever your code creates a queue, the GCD subsystem in turn checks OS conditions and looks for available resources; new threads are then created under the hood based on those conditions, in an order and with resources that are not controlled by you. This is clearly explained in the official documentation:
When it comes to adding concurrency to an application, dispatch queues provide several advantages over threads. The most direct advantage is the simplicity of the work-queue programming model. With threads, you have to write code both for the work you want to perform and for the creation and management of the threads themselves. Dispatch queues let you focus on the work you actually want to perform without having to worry about the thread creation and management. Instead, the system handles all of the thread creation and management for you. The advantage is that the system is able to manage threads much more efficiently than any single application ever could. The system can scale the number of threads dynamically based on the available resources and current system conditions. In addition, the system is usually able to start running your task more quickly than you could if you created the thread yourself.
Source: Dispatch Queues
There is no way to control resource consumption with GCD, for example by setting some kind of threshold. GCD is a high-level abstraction over low-level things such as threads, and it manages them for you.
The only way you can influence how many resources a particular task within your application should take is by setting its QoS (Quality of Service) class (formerly known simply as priority, now extended into a richer concept). In short, you classify the tasks within your application by importance, which helps GCD and your application be more resource- and battery-efficient. Its use is highly encouraged in complex applications with heavy concurrency.
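For reference, assigning a QoS class to a queue looks roughly like this through libdispatch's C API (the queue label is an invented example); note that this only describes how important the work is, it does not cap the number of threads:

    #include <dispatch/dispatch.h>

    dispatch_queue_t make_background_queue() {
        // Serial queue whose work runs at "utility" QoS (background-ish, energy-efficient).
        dispatch_queue_attr_t attr = dispatch_queue_attr_make_with_qos_class(
            DISPATCH_QUEUE_SERIAL, QOS_CLASS_UTILITY, 0 /* relative priority */);
        return dispatch_queue_create("com.example.background-work", attr);
    }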
Even so, this kind of regulation from the developer's end has its limits, and ultimately it does not achieve the goal of controlling thread creation:
Apps and operations compete to use finite resources—CPU, memory, network interfaces, and so on. In order to remain responsive and efficient, the system needs to prioritize tasks and make intelligent decisions about when to execute them.

Work that directly impacts the user, such as UI updates, is extremely important and takes precedence over other work that may be occurring in the background. This higher priority work often uses more energy, as it may require substantial and immediate access to system resources.

As a developer, you can help the system prioritize more effectively by categorizing your app's work, based on importance. Even if you've implemented other efficiency measures, such as deferring work until an optimal time, the system still needs to perform some level of prioritization. Therefore, it is still important to categorize the work your app performs.
Source: Prioritize Work with Quality of Service Classes
To conclude, if you are deliberate in your intent to control threads, don't use GCD. Use low-level programming techniques and manage them yourself. If you use GCD, then you agree to leave this kind of responsibility to GCD.
pthread_mutex_trylock detects deadlocks and doesn't block, so why would you even "need" pthread_mutex_lock?
Perhaps when you deliberately want the thread to block? But in that case it may result in a deadlock?
pthread_mutex_trylock does not detect deadlocks.
You can use it to avoid deadlocks but you have to do that by wrapping your own code around it, effectively multiple calls to pthread_mutex_trylock in a loop with a time-out, after which your thread releases all its resources.
In any case, you can avoid deadlocks even with pthread_mutex_lock if you just follow the simple rule that all threads allocate resources in the same order.
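For example (the mutex and function names here are invented, purely for illustration): as long as every thread that needs both locks takes them in the same order, the hold-and-wait cycle that causes a deadlock cannot form.

    #include <pthread.h>

    pthread_mutex_t mutexA = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_t mutexB = PTHREAD_MUTEX_INITIALIZER;

    void thread_one_work(void) {
        pthread_mutex_lock(&mutexA);   /* always A first ... */
        pthread_mutex_lock(&mutexB);   /* ... then B */
        /* ... use both resources ... */
        pthread_mutex_unlock(&mutexB);
        pthread_mutex_unlock(&mutexA);
    }

    void thread_two_work(void) {
        /* needs the same pair, and still locks A before B, never B before A */
        pthread_mutex_lock(&mutexA);
        pthread_mutex_lock(&mutexB);
        /* ... */
        pthread_mutex_unlock(&mutexB);
        pthread_mutex_unlock(&mutexA);
    }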
You use pthread_mutex_lock if you just want to efficiently wait until the resource is available, without having to spin on the mutex, something which is often very inefficient. Properly designed multi-threaded applications have no need for the pthread_mutex_trylock variant.
Locks should only be held for the absolute minimum time to do the work and, if that's too long, you can generally redesign things so the lock time is less (such as by using the mutex to only copy data to a thread's local data areas, and having the long-running bit work on that after the mutex is released).
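A sketch of that "copy under the lock, work after releasing it" pattern (SharedState and the processing step are placeholders of mine):

    #include <pthread.h>
    #include <vector>

    struct SharedState {                 // hypothetical shared data
        std::vector<int> samples;
    };

    pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
    SharedState     g_shared;

    void worker_iteration() {
        // Hold the mutex only long enough to copy the data out.
        pthread_mutex_lock(&g_lock);
        std::vector<int> local = g_shared.samples;
        pthread_mutex_unlock(&g_lock);

        // The long-running part works on the local copy, with the mutex already released.
        long long total = 0;
        for (int v : local) total += v;
        (void)total;   // placeholder for whatever the real processing produces
    }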
The pseudo-code:

    while (pthread_mutex_trylock(&mtx) != 0)
        pthread_yield();

will keep your thread running while it waits for the lock to become available, especially since there is no pthread_yield() in POSIX threads (though it's sometimes provided as a non-portable extension).
That means that, at worst, the code segment above won't even be able to portably yield the CPU, and will therefore chew up the rest of its quantum every time through the scheduler cycle.
And, at best, it will still wake the thread once per scheduler cycle just to see whether the mutex can be obtained.
Whereas:

    pthread_mutex_lock(&mtx);

will most likely suspend your thread entirely until the lock becomes available, since the thread is moved to a wait queue until the current lock holder releases the mutex.
That's probably the major reason why you should prefer pthread_mutex_lock to pthread_mutex_trylock.
Perhaps when you deliberately want the thread to block?
Yup, exactly in this case. But you can mimic pthread_mutex_lock() behavior with something like this:

    while (pthread_mutex_trylock(&mtx) != 0)
        pthread_yield();   /* non-portable; sched_yield() is the standard alternative */
ASP.NET 2 has 12 threads by default.
Now ASP.NET 4 has 5000. Do we still need async controllers?
Do we still need async controllers?
Yes. Async controllers are useful in situations where you have lengthy operations such as network calls and you don't want to monopolize worker threads for them. The fact that there are 5000 worker threads by default doesn't mean you have to waste them. Just because you are a millionaire, does that mean you should give away your money? No.
Obviously if you don't use async controllers correctly they will do more harm than good.
MVC 4/Dev11 makes async controllers more appealing than previous versions did. Add to that Web API, which makes it easy to create web services.
Start of Levi's comments, so they won't be missed (under Darin Dimitrov's excellent answer):
Expanding on Darin's answer a bit - asynchronous I/O operations (which is what AsyncController is intended for) operate using IOCP, not ThreadPool threads. This is important, as each ThreadPool thread has an associated 1 MB stack (plus other overhead), so if you're using 5000 ThreadPool threads, you're automatically losing 5 GB of memory just due to overhead! IOCP continuations have nowhere near as much overhead, so it's possible to juggle greater numbers of them at any given time. ThreadPool threads are pooled and removed when no longer needed - so you're only taking the hit for threads which are currently active. But if you're doing a lot of concurrent CPU-bound work with ThreadPool, you're very rapidly going to start hitting memory issues. This is precisely one of the reasons the C# / VB teams released the Async CTP a few months ago - to try to solve this issue.
Async for web services often makes sense - see Should my database calls be Asynchronous Part II
For database applications, using async operations to reduce the number of blocked threads on the web server is almost always a complete waste of time. A small web server can easily handle far more simultaneous blocking requests than your database back-end can process concurrently. Instead, make sure your service calls are cheap at the database, and limit the number of concurrently executing requests to a number that you have tested to work correctly and to maximize overall transaction throughput.
See Should my database calls be Asynchronous?
4 or 5000, it doesn't matter; it is just a setting. You can set it to 1 billion if you want to, and that won't make your application more scalable. In the end your machine only has 4 cores (or 8, or 2, but not 5000). Always keep in mind that you can only ever have as many threads running at the same time as you have cores. Every thread you have in excess of your number of cores is just overhead: it creates more context switches, consumes CPU, and occupies more memory.
IO (database access, web services, file access...) does not take up any CPU. If you do it synchronously, it will block a thread for the length of the operation. If you have a lengthy operation (5 seconds) and a load of 1,000 requests per second, you will be permanently blocking 5,000 threads. So you are already starving the thread pool (with a setting of 5,000). But what is worse is that you will be thrashing your machine with context switches. If you do it asynchronously, no thread is blocked, no resource is taken, and there is no limit on the number of concurrent IO operations you can have in flight.
Adding more threads in the thread pool is a quick and dirty hack when you can't afford rewriting your application using asynchronous IO. It is not a clean solution.