DispatchQueue - QoS Confusion - iOS

Please consider the following statement:
DispatchQueue.global(qos: .userInitiated).asyncAfter(deadline: .now() + .milliseconds(500), qos: .utility, flags: .noQoS) {
    print("What is my QOS?")
}
Notice how many of the parameters refer to the quality of service. How can a mere mortal possibly sort out the permutations?

Generally you shouldn't try to sort out all those permutations. In most cases, messing around too much with QoS is a recipe for trouble. But there are fairly simple rules.
Queues have priorities, and they can assign that priority to blocks that request to inherit it.
This particular block is explicitly requesting a lower priority, but then says "ignore my QoS request." As a rule, don't do that. The only reason I know of for doing that is if you're interacting with some legacy API that doesn't understand QoS. (I've never encountered this myself, and it's hard to imagine it coming up in user-level code.)
A more interesting question IMO (and one that comes up much more often in real code) is this one:
DispatchQueue.global(qos: .utility).async(qos: .userInitiated) {}
What is the priority of this block? The answer is .userInitiated, and the block will "loan" its priority to the queue until it finishes executing. So for some period of time, this entire queue will become .userInitiated. This is to prevent priority inversion (where a high-priority task blocks waiting for a low-priority task).
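You can observe the boost from within the block; here is a minimal sketch (my own illustration, not from the original answer), where qos_class_self() reports the effective QoS class of the current thread:

import Dispatch

let queue = DispatchQueue.global(qos: .utility)

// The block requests .userInitiated; until it finishes, work on this
// queue runs at the boosted QoS rather than the queue's own .utility.
queue.async(qos: .userInitiated) {
    // Prints the effective QoS class of the executing thread.
    print(DispatchQoS.QoSClass(rawValue: qos_class_self()) ?? .unspecified)
}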
This is all discussed in depth in Concurrent Programming With GCD in Swift 3, which is a must-watch for anyone interested in non-trivial GCD.

Related

How to implement a lock and unlock sequence in a metal shader?

How should I implement a lock/unlock sequence with compare-and-swap using a Metal compute shader?
I’ve tested this sample code but it does not seem to work. For some reason, the threads are not detecting that the lock was released.
Here is a brief explanation of the code below:
The depthFlag is an array of atomic_bools. In this simple example, I try to take the lock by comparing the contents of depthFlag[1]. I then go ahead and do my operation, and once the operation is done, I do an unlock.
As stated above, only one thread is able to do the locking/work/unlocking, but the rest of the threads get stuck in the while loop. They NEVER leave. I expect another thread to detect the unlock and go through the sequence.
What am I doing wrong? My knowledge of CAS is limited, so I appreciate any tips.
kernel void testFunction(device float *depthBuffer [[buffer(4)]],
                         device atomic_bool *depthFlag [[buffer(5)]],
                         uint index [[thread_position_in_grid]]) {
    // lock
    bool expected = false;
    while (!atomic_compare_exchange_weak_explicit(&depthFlag[1], &expected, true,
                                                  memory_order_relaxed, memory_order_relaxed)) {
        // wait; a failed compare-exchange writes the current value into
        // `expected`, so reset it before retrying
        expected = false;
    }
    // Do my operation here
    // unlock
    atomic_store_explicit(&depthFlag[1], false, memory_order_relaxed);
    // barrier
}
You essentially can't use the locking programming model for GPU concurrency. For one, the relaxed memory order model (the only one available) is not suitable for this; for another, you can't guarantee that other threads will make progress between your atomic operations. Your code must always be able to make progress, regardless of what the other threads are doing.
My recommendation is that you use something like the following model instead:
1. Read the atomic value to check whether another thread has already completed the operation in question.
2. If no other thread has done it yet, perform the operation (but don't cause any side effects, i.e. don't write to device memory).
3. Perform an atomic operation to indicate that your thread has completed the operation, while checking whether another thread got there first (e.g. compare-and-swap a boolean; incrementing a counter also works).
4. If another thread got there first, don't perform the side effects.
5. If your thread "won" and no other thread registered completion, perform your operation's side effects, e.g. do whatever you need to do to write out the result.
This works well if there isn't much contention, and if the result does not vary depending on which thread performs the operation.
The occasional piece of discarded work should not matter. If there is significant contention, use threadgroups; within a threadgroup, the threads can coordinate which thread will perform which operation. You may still end up with wasted computation from contention between groups. If that is a problem, you may need to change your approach more fundamentally.
If the results of the operation are not deterministic, and the threads all need to proceed using the same result, you will need to change your approach. For example, split your kernels up so any computation which depends on the result of the operation in question runs in a sequentially queued kernel.
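To make the shape of that model concrete, here is a CPU-side sketch in Swift using the swift-atomics package, where ManagedAtomic stands in for the device atomic_bool (in an actual kernel you would use atomic_compare_exchange_weak_explicit; all names here are illustrative, not from the answer):

import Atomics  // swift-atomics package

// "Winner" pattern: many threads may attempt the operation, but only the
// thread whose compare-exchange succeeds publishes the side effects.
let completed = ManagedAtomic<Bool>(false)

func performOnce(operation: () -> Void, sideEffects: () -> Void) {
    // 1. Check whether another thread has already completed the operation.
    if completed.load(ordering: .relaxed) { return }

    // 2. Perform the operation, without any side effects.
    operation()

    // 3. Atomically claim completion; `won` is true only for the first thread.
    let (won, _) = completed.compareExchange(expected: false,
                                             desired: true,
                                             ordering: .relaxed)

    // 4./5. Only the winning thread writes out the result.
    if won {
        sideEffects()
    }
}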

dispatch_queue_t slows down after some time

I have a question about custom DispatchQueue.
I created a queue and I use it as a queue for captureOutput: method. Here's a code snippet:
//At the file header
private let videoQueue = DispatchQueue(label: "videoQueue")
//setting queue to AVCaptureVideoDataOutput
dataOutput.setSampleBufferDelegate(self, queue: videoQueue)
After I get a frame, I run an expensive operation on it, and I do that for every frame.
When I launch the app, the expensive operation takes about 17 ms to compute, and thanks to that I get around 46-47 fps. So far so good.
But after some time (around 10-15 seconds), the operation starts taking more and more time, and within 1-2 minutes I end up with 35-36 fps; instead of 17 ms, it takes around 20-25 ms.
Unfortunately, I can't provide the code of the expensive operation because there's a lot of it, but at least Xcode reports that I don't have any memory leaks.
I know that a manually created DispatchQueue doesn't really work on its own, because all the tasks I put there eventually end up in the iOS default thread pool (I'm talking about BACKGROUND, UTILITY, USER_INTERACTIVE, etc.). And to me it looks like videoQueue loses priority after some period of time.
If my guess is right - is there any way to influence that? The performance of my DispatchQueue is crucial, and I want to give it the highest priority all the time.
If I'm not right, I would very much appreciate it if someone could point me in a direction to investigate. Thanks in advance!
First, I would suspect other parts of your code, and probably some kind of memory accumulation. I would expect you're either reprocessing some portion of the same data over and over again, or you're doing a lot of memory allocation/deallocation (which can lead to memory fragmentation). Lots of memory issues don't show up as "leaks" (because they're not leaks). The way to explore this is with Instruments.
That said, you probably don't want to run this at the default QoS. You shouldn't think of QoS as "priority." It's more complicated than that (which is why it's called "quality of service," not "priority"). You should assign work to a queue whose QoS matches how the work impacts the user. In this case, it looks like you are updating the UI in real time. That matches the .userInteractive QoS:
private let videoQueue = DispatchQueue(label: "videoQueue", qos: .userInteractive)
This may improve things, but I suspect other problems in your code that Instruments will help you reveal.

Block Operation - Completion Block returning random results

My block operation's completion handler is displaying random results, and I'm not sure why. I've read this, and all the lessons say it is similar to dispatch groups in GCD.
Please find my code below:
import Foundation

let sentence = "I love my car"
let wordOperation = BlockOperation()
var wordArray = [String]()

for word in sentence.split(separator: " ") {
    wordOperation.addExecutionBlock {
        print(word)
        wordArray.append(String(word))
    }
}

wordOperation.completionBlock = {
    print(wordArray)
    print("Completion Block")
}

wordOperation.start()
I was expecting my output to be ["I", "love", "my", "car"] (it should display all these words, either in sequence or in random order).
But when I run it, my output is either ["my"] or ["love"] or ["I", "car"] - it prints randomly, without all the expected values.
Not sure why this is happening. Please advise.
The problem is that those separate execution blocks may run concurrently with respect to each other, on separate threads. This is true if you start the operation directly, as you have, and even if you add the operation to an operation queue with a maxConcurrentOperationCount of 1. As the documentation says about addExecutionBlock:
The specified block should not make any assumptions about its execution environment.
On top of this, Swift arrays are not thread-safe. So in the absence of synchronization, concurrent interaction with a non-thread-safe object may result in unexpected behavior, such as what you’ve shared with us.
If you turn on TSAN, the Thread Sanitizer (found in “Product” » “Scheme” » “Edit Scheme...”, or press ⌘+<, and then choose “Run” » “Diagnostics” » “Thread Sanitizer”), it will warn you about the data race.
So, bottom line, the problem isn’t addExecutionBlock, per se, but rather the attempt to mutate the array from multiple threads at the same time. If you used a concurrent queue in conjunction with a dispatch group, you could experience similar problems (though, like many race conditions, they can be hard to manifest).
Theoretically, one could add synchronization code to your code snippet and that would fix the problem. But then again, it would be silly to try to initiate a bunch of concurrent updates, only to then employ synchronization within that to prevent concurrent updates. It would work, but would be inefficient. You only employ that pattern when the work on the background threads is substantial in comparison to the amount of time spent synchronizing updates to some shared resource. But that’s not the case here.
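For illustration, a minimal sketch of such synchronization applied to the loop above (the serial syncQueue is my addition):

// A private serial queue funnels every mutation of wordArray through
// one thread at a time, eliminating the data race.
let syncQueue = DispatchQueue(label: "wordArray.sync")

for word in sentence.split(separator: " ") {
    wordOperation.addExecutionBlock {
        syncQueue.sync {
            wordArray.append(String(word))
        }
    }
}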

NSOperation hierarchy, units of work

So I was wondering what the best way is to break long tasks out into NSOperations. If I have 3 long-running tasks, is it better to have one NSOperation subclass that basically does something like this:
Single NSOperation subclass
- (void)main {
    // do long running task 1
    // do long running task 2
    // do long running task 3
    // call back the delegate
}
Or is it better to have each task be a subclass of NSOperation, and then manage each task from my ViewController as a single unit of work? Thanks in advance.
It depends on whether the operation queue is serial (i.e. max concurrent operations = 1) or parallel, and on the nature of the work. If the queue is serial, then it really doesn't matter. If the queue is parallel, then it depends on a bunch of factors:
is the work safe to do concurrently
does the work contend on a shared resource (such as network or disk IO, or a lock) that would remove the concurrency
is each unit of work sufficiently large to be worth the overhead of dispatching separately
(edit)
Also, if you don't need the advanced features of NSOperationQueue (operation dependencies and priorities, KVO, etc...), consider using dispatch queues instead. They're significantly lighter weight.
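For example, here is a sketch of the second approach, one operation per task (written in Swift for brevity; the task names and the dependency are hypothetical):

import Foundation

let queue = OperationQueue()

let task1 = BlockOperation { /* do long running task 1 */ }
let task2 = BlockOperation { /* do long running task 2 */ }
let task3 = BlockOperation { /* do long running task 3 */ }

// If task 2 consumes task 1's output, state that explicitly; otherwise
// the queue is free to run the operations concurrently.
task2.addDependency(task1)

queue.addOperations([task1, task2, task3], waitUntilFinished: false)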

How to programmatically control and balance a number of threads iOS app is executing?

How can I control and balance the number of threads my app is executing? How can I limit their number to avoid blocking the app because the thread limit is reached?
Here on SO I saw the following possible answer: "Main concurrent queue (dispatch_get_global_queue) manages the number of threads automatically", which I don't like, for the following reason:
Consider the following pattern (in my real app there are both simpler and more complex examples):
dispatch_queue_t defaultBackgroundQueue() {
    return dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
}

dispatch_queue_t databaseQueue() {
    return dispatch_queue_create("Database private queue", 0);
}

dispatch_async(defaultBackgroundQueue(), ^{
    [AFNetworkingAsynchronousRequestWithCompletionHandler:^(data) {
        dispatch_async(databaseQueue(), ^{
            // data is about 100-200 elements to parse
            for (el in data) {
            }
            // maybe more AFNetworking requests and/or processing in other queues, or:
            dispatch_async(dispatch_get_main_queue(), ^{
                // At last! We can do something on UI.
            });
        });
    }];
});
This design very often leads to situations where:
the app is locked up because the thread limit is reached (something like > 64)
the slower, and thus narrower, queues can be overwhelmed with a large number of pending jobs.
The second point can also produce a cancellation problem: if we have 100 jobs already waiting for execution in a serial queue, we can't cancel them all at once.
The obvious and dumb solution would be to replace the sensitive dispatch_async calls with dispatch_sync, but that is definitely a solution I don't like.
What is the recommended approach for this kind of situation?
I hope an answer smarter than just "Use NSOperationQueue - it can limit the number of concurrent operations" exists (similar topic: Number of threads with NSOperationQueueDefaultMaxConcurrentOperationCount).
UPDATE 1: The only decent pattern I see is to replace all dispatch_asyncs of blocks to concurrent queues with those blocks wrapped in NSOperations, running on NSOperationQueue-based concurrent queues with a max operations limit set (in my case, maybe also setting a max operations limit on the NSOperationQueue-based queue that AFNetworking runs all its operations on).
You are starting too many network requests. AFAIK it's not documented anywhere, but you can run up to 6 simultaneous network connections (which is a sensible number considering RFC 2616 8.1.4, paragraph 6). Beyond that you get locking, and GCD compensates by creating more threads, which, by the way, have a stack space of 512 KB each, with pages allocated on demand. So yes, use NSOperation for this. I use it to queue network requests, increase the priority when the same object is requested again, and pause and serialize to disk if the user leaves. I also monitor the speed of the network requests in bytes/time and adjust the number of concurrent operations accordingly.
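A minimal sketch of capping concurrency that way (in Swift; the limit of 4 and the requestURLs placeholder are mine, not from the answer):

import Foundation

let requestURLs: [URL] = []  // placeholder for the real request list

let networkQueue = OperationQueue()
networkQueue.maxConcurrentOperationCount = 4  // tune against measured throughput

for url in requestURLs {
    networkQueue.addOperation {
        // Placeholder: perform the request for `url` synchronously here,
        // or use an asynchronous Operation subclass for lifecycle control.
        print("fetching \(url)")
    }
}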
While I don't see from your example where exactly you're creating "too many" background threads, I'll just try to answer the question of how to control the exact number of threads per queue. Apple's documentation says:
Concurrent queues (also known as a type of global dispatch queue) execute one or more tasks concurrently, but tasks are still started in the order in which they were added to the queue. The currently executing tasks run on distinct threads that are managed by the dispatch queue. The exact number of tasks executing at any given point is variable and depends on system conditions.
While you can now (since iOS 5) create concurrent queues manually, there is no way to control how many jobs such a queue will run concurrently. The OS will balance the load automatically. If, for whatever reason, you don't want that, you could for example create a set of n serial queues manually and dispatch new jobs to one of the n queues at random:
NSArray *queues = @[dispatch_queue_create("com.myapp.queue1", 0),
                    dispatch_queue_create("com.myapp.queue2", 0),
                    dispatch_queue_create("com.myapp.queue3", 0)];

NSUInteger randQueue = arc4random() % [queues count];
dispatch_async([queues objectAtIndex:randQueue], ^{
    NSLog(@"Do something");
});

randQueue = arc4random() % [queues count];
dispatch_async([queues objectAtIndex:randQueue], ^{
    NSLog(@"Do something else");
});
I'm by no means endorsing this design - I think concurrent queues are pretty good at balancing system resources. But since you asked, I think this is a feasible approach.
