Do fibers have priorities?

Fibers are defined as lightweight threads, and threads have priorities because they are preemptively scheduled. However, since fibers are scheduled cooperatively, do they too have priorities?

No, fibers intrinsically have no priorities: in cooperative multitasking, the context-switch target is always determined by the piece of code handing off control (e.g. by calling Fiber.yield(), or however it is spelled in your implementation).
That doesn't stop you from implementing a scheduler at the application level to decide which fiber to switch to next, though, and such a scheduler can then base that decision on priorities again, as in the sketch below.
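As a minimal sketch of such an application-level scheduler (C++20 coroutines standing in for fibers; Fiber, PriorityScheduler, and worker are illustrative names, not a real fiber library), every co_await is a cooperative yield point, and the scheduler always resumes the highest-priority runnable fiber:

```cpp
#include <coroutine>
#include <cstdio>
#include <exception>
#include <queue>

// A toy "fiber": a resumable coroutine that suspends at each co_await.
struct Fiber {
    struct promise_type {
        Fiber get_return_object() {
            return Fiber{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };
    std::coroutine_handle<promise_type> handle;
};

// Application-level scheduler: at every yield it picks the runnable
// fiber with the highest priority, i.e. the "decide based on a
// priority" layer described above.
class PriorityScheduler {
    struct Entry {
        int priority;
        std::coroutine_handle<> handle;
        bool operator<(const Entry& o) const { return priority < o.priority; }
    };
    std::priority_queue<Entry> ready;
public:
    void spawn(Fiber f, int priority) { ready.push({priority, f.handle}); }
    void run() {
        while (!ready.empty()) {
            Entry e = ready.top();
            ready.pop();
            e.handle.resume();                 // run until the next yield
            if (e.handle.done()) e.handle.destroy();
            else ready.push(e);                // re-queue at same priority
        }
    }
};

Fiber worker(const char* name, int steps) {
    for (int i = 0; i < steps; ++i) {
        std::printf("%s: step %d\n", name, i);
        co_await std::suspend_always{};        // cooperative yield point
    }
}

int main() {
    PriorityScheduler sched;
    sched.spawn(worker("high-priority", 3), 10);
    sched.spawn(worker("low-priority", 3), 1);
    sched.run();   // the high-priority fiber runs to completion first
}
```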

Related

Why is there only limited usage of thread pools in TensorFlow-Federated?

TFF's threading libraries start a new thread from ThreadRun by default, and the only usage (as of TFF 0.42.0) of the optional ThreadPool parameter is in the implementation of a single executor. Why is this the case?
After conferring with some people who were close to the implementation, the understanding we came to was:
The issue with fully general usage of thread pools in TFF is that, used incorrectly, we may be courting deadlock. We need FIFO scheduling in the thread pool itself, and FIFO-compatible usage in the runtime (if you need the result of a computation, you need to know it will be started before you start).
When implementing the first usages of thread pools in the TF executor, we reasoned ourselves into believing the following statement is true: at the leaf executors (that is, as long as an executor doesn't have any children), this FIFO-compatible programming is guaranteed by the stateful executor interface. That is, if you need a value, you know it has already been created (otherwise the executor wouldn't be able to resolve it), so as long as the thread pool is FIFO, the value will be ready before you execute. Either the creating function already pushed a function onto this FIFO queue, or it created the value directly, so you can push yourself onto the FIFO queue, no sweat.
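A hedged sketch of that invariant (a hypothetical single-worker FIFO pool in C++, not TFF's actual ThreadPool class): as long as every value's producer is enqueued before any consumer of that value, FIFO execution guarantees the producer has finished before the consumer starts; reversing the order would block the worker forever.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>

// Minimal single-worker pool: jobs run one at a time, strictly FIFO.
class FifoPool {
    std::queue<std::function<void()>> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
    std::thread worker_;
public:
    FifoPool() : worker_([this] {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock lk(m_);
                cv_.wait(lk, [this] { return done_ || !q_.empty(); });
                if (q_.empty()) return;
                job = std::move(q_.front());
                q_.pop();
            }
            job();   // strictly in submission order
        }
    }) {}
    ~FifoPool() {
        { std::lock_guard lk(m_); done_ = true; }
        cv_.notify_all();
        worker_.join();
    }
    template <class F>
    auto submit(F f) -> std::future<decltype(f())> {
        auto task =
            std::make_shared<std::packaged_task<decltype(f())()>>(std::move(f));
        auto fut = task->get_future();
        { std::lock_guard lk(m_); q_.push([task] { (*task)(); }); }
        cv_.notify_one();
        return fut;
    }
};

int main() {
    FifoPool pool;
    auto a = pool.submit([] { return 21; });
    // Safe: the value's producer was enqueued first, so under FIFO
    // execution it has finished before this consumer starts.
    auto b = pool.submit([&a] { return a.get() * 2; });
    // Had the consumer been enqueued *before* its producer, it would
    // block the only worker while waiting, and the producer could
    // never run: that is exactly the deadlock described above.
    return b.get() == 42 ? 0 : 1;
}
```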
Due to the difficulty, we haven't really tried to reason too hard about how or whether we might be able to make similar statements about executors which have children (and these children may be pushing work onto the queue; AFAIK we don't currently make any guarantees about how we do this, but I could imagine reasoning about a similar invariant step by step 'up the stack'). Thus we have only considered it safe so far to inject thread pool usage at leaf executors. The fact that we don't have this in the XLAExecutor yet is simply due to lack of use.

Multitasking techniques

I have 2 tasks that need to be performed:
One is synchronized with refresh rate via the Present call; does fancy graphics.
The other does a bunch of computations on a virtually infinite workload; does not need to be synchronous with the first task; really does not like being interrupted (encourages coarser workload granularity).
Is there a way to optimally use the GPU in this situation with DirectX?
Perhaps the solution would:
issue Dispatch (or Draw) calls in a way that allows them to run/finish asynchronously.
signal the current shader to stop.
use hardware or driver scheduling.
Right now my solution is to try to predict how long the shaders will take to run, which is unreliable unless I add a bunch of downtime...
Trying to avoid the th**ad word as it means a different thing on GPUs
Create two separate D3D11 devices. Use one for the rendering, and another one (driven from another CPU thread with lower priority) for the computations.
Rework your low-priority computations so that each Dispatch() takes a couple of milliseconds of GPU time to complete. Don't submit many compute calls at once: use two queries or a single fence so that no more than two compute calls are ever pending. Dispatch two calls initially; when the first completes, dispatch the third, and so on.
While rendering the 3D scene on your main thread, lock an std::mutex, and release it once you've rendered the scene, before Present. On the background thread, lock that mutex when submitting more compute tasks, but keep it unlocked while waiting for a query or fence.
You're still going to have some interference between these two tasks, but it might be good enough for your use case.
Ideally, use timestamp queries to measure the GPU time spent on your background tasks, then adjust the size of each task dynamically based on those numbers; this should achieve the right task granularity regardless of the GPU's performance. Don't forget to apply a rolling average over the last 5-10 completed tasks before using the number for these adjustments.
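A rough sketch of the pacing and mutex parts of this scheme (D3D11 on the second device's thread; BindNextWorkload is a hypothetical stand-in for binding your compute shader, UAVs, and constants, and the dispatch size is purely illustrative):

```cpp
#include <windows.h>
#include <d3d11.h>
#include <wrl/client.h>
#include <mutex>

using Microsoft::WRL::ComPtr;

std::mutex g_submitMutex;  // shared with the rendering thread

// Stub: a real app would bind the shader, UAVs, and constants for the
// next slice of the background workload here. (Hypothetical helper.)
static void BindNextWorkload(ID3D11DeviceContext*) {}

// Background thread, driving the *second* D3D11 device: keep at most
// two dispatches in flight, using event queries as completion fences.
void ComputeLoop(ID3D11Device* device, ID3D11DeviceContext* ctx)
{
    const D3D11_QUERY_DESC qd = { D3D11_QUERY_EVENT, 0 };
    ComPtr<ID3D11Query> fence[2];
    device->CreateQuery(&qd, &fence[0]);
    device->CreateQuery(&qd, &fence[1]);

    // Prime the pipe with two small dispatches (~2 ms of GPU time each).
    for (int i = 0; i < 2; ++i) {
        std::lock_guard<std::mutex> lock(g_submitMutex);
        BindNextWorkload(ctx);
        ctx->Dispatch(64, 1, 1);       // illustrative group count
        ctx->End(fence[i].Get());      // completion marker
        ctx->Flush();
    }

    // Runs until the app stops it (sketch).
    for (int slot = 0; ; slot = (slot + 1) % 2) {
        // Wait for the oldest dispatch WITHOUT holding the mutex, so
        // the render thread is free to submit its frame meanwhile.
        while (ctx->GetData(fence[slot].Get(), nullptr, 0, 0) == S_FALSE)
            Sleep(1);
        std::lock_guard<std::mutex> lock(g_submitMutex);
        BindNextWorkload(ctx);
        ctx->Dispatch(64, 1, 1);
        ctx->End(fence[slot].Get());
        ctx->Flush();
    }
}
```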

Grand Central Dispatch: What happens when queues get overloaded?

Pretty simple question that I haven't found addressed anywhere in the documentation or tutorials on GCD: what happens if I'm submitting work to queues faster than it's being processed and removed? I'm aware that GCD queues have no size limit; would work just pile up until the program runs out of memory? Is there any way to handle this situation properly?
What happens if I'm submitting work to queues faster than it's being processed and removed?
It depends.
If you are dispatching tasks to a single/shared serial queue, they will just be added to the queue, and it will process them in FIFO order. No problem: memory is your only constraint.
If you are dispatching tasks to a concurrent queue, though, you end up with “thread explosion”, and you will quickly exhaust the limited number of worker threads available for that quality of service (QoS). This can result in unpredictable behaviors should the OS need to avail itself of a thread of the same QoS. Thus, you must be very careful to avoid this thread explosion (a sketch of the anti-pattern follows the links below).
See the discussion of thread explosion in WWDC 2015's Building Responsive and Efficient Apps with GCD, and again in WWDC 2016's Concurrent Programming With GCD in Swift 3.
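For illustration, the anti-pattern looks roughly like this hedged sketch, using libdispatch's C API (assuming Apple's toolchain, where blocks are available in C++); each blocked task can cause GCD to spin up another worker thread until the pool for that QoS is exhausted:

```cpp
#include <dispatch/dispatch.h>
#include <unistd.h>

int main() {
    dispatch_queue_t q = dispatch_get_global_queue(QOS_CLASS_UTILITY, 0);
    dispatch_group_t g = dispatch_group_create();
    // Anti-pattern: lots of concurrent tasks that block. GCD keeps
    // creating worker threads to compensate, until the limited pool
    // for this QoS is exhausted ("thread explosion").
    for (int i = 0; i < 1000; ++i)
        dispatch_group_async(g, q, ^{ usleep(100000); /* blocking work */ });
    dispatch_group_wait(g, DISPATCH_TIME_FOREVER);
}
```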
Is there any way to properly handle this situation?
It is hard to answer that in the abstract. Different situations call for different solutions.
In the case of thread explosion, the solution is to constrain the degree of concurrency, either with concurrentPerform (which limits the concurrency to the number of cores on your device) or with operation queues and their maxConcurrentOperationCount (which limits the degree of concurrency to something reasonable). There are other patterns, too, but the idea is to constrain concurrency to something suitable for the device in question.
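For example, here is a minimal sketch using dispatch_apply_f, the C-level counterpart of Swift's DispatchQueue.concurrentPerform(iterations:execute:) (DISPATCH_APPLY_AUTO requires macOS 10.13/iOS 11 or later; the doubling "work" is a placeholder):

```cpp
#include <dispatch/dispatch.h>
#include <cstdio>
#include <vector>

// Each invocation handles one index; libdispatch decides how many run
// in parallel (roughly the core count), so there is no thread explosion.
static void processChunk(void* ctx, size_t i) {
    auto& data = *static_cast<std::vector<int>*>(ctx);
    data[i] *= 2;   // stand-in for real work on chunk i
}

int main() {
    std::vector<int> data(1000, 21);
    dispatch_apply_f(data.size(), DISPATCH_APPLY_AUTO, &data, processChunk);
    std::printf("data[0] = %d\n", data[0]);   // 42
}
```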
But if you're just dispatching a large number of tasks to a serial queue, there's not much you can do (other than looking for parallelism opportunities to make efficient use of all the CPU's cores). That's OK, though, as the whole purpose of a queue is to perform tasks in the order they were submitted, even if it can't keep up. It wouldn't be a “queue” if it didn't follow this FIFO pattern.
If you're dealing with real-time data that cannot be processed quickly enough, you have a different problem. In that case, you might want to decouple the capture of the input from its processing and decide how you want to handle the backlog. For example, if you can't keep up with real-time processing of a video, you have a choice: either start dropping frames, or process the data asynchronously/later. You just have to decide what is right for your use case; we cannot answer that in the abstract.
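As one hedged sketch of the drop-frames strategy (libdispatch C API, again assuming Apple's toolchain for blocks in C++; the queue label, slot count, and usleep stand-in are illustrative), a counting semaphore caps the number of in-flight work items, and the producer drops input when no slot is free:

```cpp
#include <dispatch/dispatch.h>
#include <cstdio>
#include <unistd.h>

int main() {
    // At most 4 frames in flight; a producer that cannot take a slot
    // immediately drops its frame instead of letting the queue grow.
    dispatch_semaphore_t slots = dispatch_semaphore_create(4);
    dispatch_queue_t worker =
        dispatch_queue_create("com.example.processing", DISPATCH_QUEUE_SERIAL);

    for (int frame = 0; frame < 100; ++frame) {
        if (dispatch_semaphore_wait(slots, DISPATCH_TIME_NOW) != 0) {
            std::printf("dropped frame %d\n", frame);   // queue saturated
            continue;
        }
        dispatch_async(worker, ^{
            usleep(20000);                     // stand-in for slow processing
            dispatch_semaphore_signal(slots);  // free the slot when done
        });
    }
    dispatch_sync(worker, ^{});   // demo only: drain before exiting
}
```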

Is there a way to limit the number of threads spawned by GCD in my application?

I know from the response to this question that the maximum number of threads spawned cannot exceed 66. But is there a way to limit the thread count to a user-defined value?
From my experience and work with GCD under various circumstances, I believe this is not possible.
That said, it is very important to understand that with GCD you create queues, not threads. Whenever your code creates a queue, the GCD subsystem in turn checks OS conditions and looks for available resources; new threads are then created under the hood based on those conditions, in an order and with resources that you do not control. This is clearly explained in the official documentation:
When it comes to adding concurrency to an application, dispatch queues provide several advantages over threads. The most direct advantage is the simplicity of the work-queue programming model. With threads, you have to write code both for the work you want to perform and for the creation and management of the threads themselves. Dispatch queues let you focus on the work you actually want to perform without having to worry about the thread creation and management. Instead, the system handles all of the thread creation and management for you. The advantage is that the system is able to manage threads much more efficiently than any single application ever could. The system can scale the number of threads dynamically based on the available resources and current system conditions. In addition, the system is usually able to start running your task more quickly than you could if you created the thread yourself.
Source: Dispatch Queues
There is no way to control resource consumption with GCD, such as by setting some kind of threshold. GCD is a high-level abstraction over low-level things, such as threads, and it manages them for you.
The only way you can influence how many resources a particular task within your application takes is by setting its QoS (quality of service) class (formerly known simply as priority, since extended into a more complex concept). In brief, you can classify tasks within your application based on their importance, helping GCD and your application be more resource- and battery-efficient. Its use is highly encouraged in complex applications with extensive concurrency.
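For instance, a minimal sketch of tagging work with a QoS class via libdispatch's C API (assuming Apple's toolchain for blocks in C++; the queue label and the "work" are placeholders):

```cpp
#include <dispatch/dispatch.h>
#include <cstdio>

int main() {
    // You can't cap GCD's thread count, but you can tag work with a
    // QoS class so the system schedules it at an appropriate priority.
    dispatch_queue_attr_t attr = dispatch_queue_attr_make_with_qos_class(
        DISPATCH_QUEUE_SERIAL, QOS_CLASS_UTILITY, 0);
    dispatch_queue_t background =
        dispatch_queue_create("com.example.background-work", attr);

    dispatch_async(background, ^{
        std::printf("long-running, energy-aware work runs here\n");
    });
    dispatch_sync(background, ^{});   // demo only: wait before exiting
}
```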
Even so, this kind of regulation from the developer's end has its limits and ultimately does not amount to controlling thread creation:
Apps and operations compete to use finite resources—CPU, memory, network interfaces, and so on. In order to remain responsive and efficient, the system needs to prioritize tasks and make intelligent decisions about when to execute them.
Work that directly impacts the user, such as UI updates, is extremely important and takes precedence over other work that may be occurring in the background. This higher priority work often uses more energy, as it may require substantial and immediate access to system resources.
As a developer, you can help the system prioritize more effectively by categorizing your app's work, based on importance. Even if you've implemented other efficiency measures, such as deferring work until an optimal time, the system still needs to perform some level of prioritization. Therefore, it is still important to categorize the work your app performs.
Source: Prioritize Work with Quality of Service Classes
To conclude: if you deliberately intend to control threads, don't use GCD; use low-level programming techniques and manage them yourself. If you use GCD, you agree to leave this kind of responsibility to GCD.

Cost of switching between many EAGLContexts?

I’m working on some code that has a grid view (~20 child views on screen at once). Each child view draws its content in GL, and has its own drawing thread and EAGLContext.
The advantage of this is that each view is relatively insulated from other GL usage in the app, though with 20 such views on screen, we have to glFlush+setCurrentContext: 20 times per frame. My gut tells me this is not the most efficient use of GL.
My questions:
What's the cost of switching contexts?
Does having to glFlush for each context actually slow it down, or does glFlush only stall the current context?
• Does having to glFlush for each context actually slow it down, or does glFlush only stall the current context?
Contexts have their own individual command streams.
All of this stuff eventually has to be serialized for drawing on a single GPU, so flushing the command stream for 20 concurrent contexts is going to put some pressure on whatever part of the driver does that.
Luckily, GL does not guarantee any sort of synchronization between different contexts, so GL itself is not going to spend a whole lot of effort making sure commands from different contexts are executed in a particular order relative to one another. However, if one command stream waited on a fence sync object associated with another context, that would introduce some interesting GL-related overhead.
• What's the cost of switching contexts?
Why are you switching contexts?
You said that each view has its own thread and context, so I am having trouble understanding why you would ever change the context current to a thread.
The cost of switching contexts is very hardware dependent. Newer hardware generations tend to have more efficient context-switching support, but it's generally a pretty heavyweight operation in any case.
The cost of a glFlush is neither very small nor very large: not something you want to do more often than needed, but not very harmful when used in moderation. I would be much more worried about the context switch than the flush. As Andon mentioned in his response, a glFlush will not be enough if you need synchronization between the contexts/threads; that requires either a glFinish or some kind of fence, as in the sketch below.
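A minimal sketch of such a fence (assuming OpenGL ES 3 and two contexts in the same sharegroup; the function names are illustrative):

```cpp
#include <OpenGLES/ES3/gl.h>   // assuming iOS with an OpenGL ES 3 context

// Producer thread: after issuing commands that touch the shared
// resource, drop a fence behind them and flush so it reaches the GPU.
GLsync submitWorkWithFence() {
    // ... draw/compute commands on this context ...
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush();   // without this, the fence may never be submitted
    return fence;
}

// Consumer thread (a different context in the same sharegroup):
// block briefly until the producer's commands have completed.
bool waitForProducer(GLsync fence) {
    GLenum r = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT,
                                16ull * 1000 * 1000);   // 16 ms, in ns
    glDeleteSync(fence);
    return r == GL_ALREADY_SIGNALED || r == GL_CONDITION_SATISFIED;
}
```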
With your setup, you'll also pay the price of thread switches on the CPU.
I think your gut feeling is absolutely right. The way I understand your use case, it would probably be much more efficient to render your sub-views in sequence, with a single rendering thread and context. It might make the state management a little more cumbersome, but you should be able to handle that fairly cleanly.
To make this answer more self-contained, with credit to Andon: you don't have to make calls to set the current context, since the current context is maintained per thread. But at the GPU level, there will still be context switches as soon as work from multiple contexts is submitted. A sketch of the single-context alternative follows.
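To illustrate the suggested single-context approach, a hedged sketch (ChildView and drawContent are hypothetical placeholders for your per-view framebuffer state and drawing code):

```cpp
#include <OpenGLES/ES3/gl.h>
#include <vector>

// Hypothetical per-view state: each child view owns its own
// framebuffer object; drawContent() stands in for that view's
// actual GL drawing commands.
struct ChildView {
    GLuint framebuffer = 0;
    GLsizei width = 0, height = 0;
};
static void drawContent(const ChildView&) { /* per-view drawing */ }

// One thread, one context: render all ~20 sub-views back to back,
// so there are no per-view context switches at all.
void renderFrame(std::vector<ChildView>& views) {
    for (const ChildView& view : views) {
        glBindFramebuffer(GL_FRAMEBUFFER, view.framebuffer);
        glViewport(0, 0, view.width, view.height);
        drawContent(view);
    }
    // A single flush (or presentRenderbuffer: per view) at the end of
    // the frame replaces the 20 glFlush + setCurrentContext: pairs.
    glFlush();
}
```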
