How does a DispatchQueue work? (specifically multithreading) - ios

I don't understand the workings of a DispatchQueue and wanted to learn more about how they implement the foundational queueing theory requirements. I tried to inspect a queue using:
dump(DispatchQueue.global())
And this gave this output:
- <OS_dispatch_queue_global: com.apple.root.default-qos[0x10c041f00] = { xref = -2147483648, ref = -2147483648, sref = 1, target = [0x0], width = 0xfff, state = 0x0060000000000000, in-barrier}> #0
- super: OS_dispatch_queue
- super: OS_dispatch_object
- super: OS_object
- super: NSObject
I got that the label is com.apple.root.default-qos, and this is specified in the Apple docs and the class is the packaged OS_dispatch_queue_global. I understand qos is queryable on the queue itself and that makes sense as well. Width I think just means the allocated memory size.
What I don't understand are the relevances of xref, ref and sref, I think they are internal ids for the queues but I am not sure. I think they are related to fundamental queueing concepts (multithreading came to mind) but would be great to hone into this in more detail.
Is the autoreleaseFrequency hidden from this debug description? Also, what does in-barrier = 0 mean? I tried creating a custom queue and this was replaced by in-flight = 0.. so confused about that as well.
Any ideas on how these undocumented variables relate to queueing theory? I think these are undocumented internals of the API, so any educated and justified explanations would be fine!
Thanks.

Why ask this?
This is a fairly broad question about the internals of grand-central-dispatch. I had difficulty understanding the dumped output because the original WWDC '10 videos and slides for GCD are no longer public. I also didn't know about the open-source libdispatch repo (thanks Rob). That needn't be a problem, but there are no related QAs on SO explaining the topic in detail.
Why GCD?
According to the WWDC '10 GCD transcripts (Thanks Rob), the main idea behind the API was to simplify the boilerplate associated with using the #selector API for multithreading.
Benefits of GCD
Apple released a new block-based API instead of going with function pointers, to also enable type-safe code that wouldn't crash if the block had the wrong type signature. Using typedefs also made code cleaner when used in function parameters, local variables and #property declarations. Queues allow you to capture code and some state as a chunk of data that get managed, enqueued and executed automatically behind the scenes.
The same session mentions how GCD manages low-level threads under the hood. It enqueues blocks to execute on threads when they need to be executed and then releases those threads (PThreads to be precise) when they are no longer referenced. GCD manages threads automatically and doesn't expose this API - when a DispatchWorkItem is dequeued GCD creates a thread for this block to execute on.
Drawbacks of performSelector
performSelector:onThread:withObject:waitUntilDone: has numerous drawbacks that suggest poor design for the modern challenges of concurrency, waiting, synchronisation. leads to pyramids of doom when switching threads in a func. Furthermore, the NSObject.performSelector family of threading methods are inflexible and limited:
No options to optimise for concurrent, initially inactive, or synchronisation on a particular thread. Unlike GCD.
Only selectors can be dispatched on to new threads (awful).
Lots of threads for a given function leads to messy code (pyramids of doom).
No support for queueing without a limited (at the time when GCD was announced in iOS 4) NSOperation API. NSOperations are a high-level, verbose API that became more powerful after incorporating elements of dispatch (low-level API that became GCD) in iOS 4.
Lots of bugs related to unhandled invalid Selector errors (type safety).
DispatchQueue internals
I believe the xref, ref and sref are internal registers that manage reference counts for automatic reference counting. GCD calls dispatch_retain and dispatch_release in most cases when needed, so we don't need to worry about releasing a queue after all its blocks have been executed. However, there were cases when a developer could call retain and release manually when trying to ensure the queue is retained even when not directly in use. These registers allow libDispatch to crash when release is called on a queue with a positive reference count, for better error handling.
When calling a block with DispatchQueue.global().async or similar, I believe this increments the reference count of that queue (xref and ref).
The variables in the question are not documented explicitly, but from what I can tell:
xref counts the number of external references to a general DispatchQueue.
ref counts the total number of references to a general DispatchQueue.
sref counts the number of references to API serial/concurrent/runloop queues, sources and mach channels (these need to be tracked differently as they are represented using different types).
in-barrier looks like an internal state flag (DispatchWorkItemFlag) to track whether new work items submitted to a concurrent queue should be scheduled or not. Only once the barrier work item finishes, the queue returns to scheduling work items that were submitted after the barrier. in-flight means that there is no barrier in force currently.
state is also not documented explicitly but I presume points to memory where the block can access variables from the scope where the block was scheduled.

Related

Avoid Data Race condition in swift

I am getting race conditions in my code when I run TSan tool. As same code has been accessed from different queues and threads at the same time that's why I can not use Serial queues or barrier as Queue will block only single queues accessing the shared resource not the other queues.
I used objc_sync_enter(object) | objc_sync_exit(object) and locks NSLock() or NSRecursiveLock() to protect shared resource but these are also not working.
While when I use #synchronized() keyword in Objective C to protect shared resource, it's working fine as expected and I am not getting race conditions in particular block of code.
So, what is an alternative to protect data in Swift as we can not use #synchronized() keyword in Swift language.
PFA screenshot for reference -
I don't understand "I can not use Serial queues or barrier as Queue will block only single queues accessing the shared resource not the other queues." Using a queue is the standard solution to this problem.
class MultiAccess {
private var _property: String = ""
private let queue = DispatchQueue(label: "MultiAccess")
var property: String {
get {
var result: String!
queue.sync {
result = self._property
}
return result
}
set {
queue.async {
self._property = newValue
}
}
}
}
With this construction, access to property is atomic and thread-safe without the caller having to do anything special. Note that this intentionally uses a single queue for the class, not a queue per-property. As a rule, you want a relatively small number of queues doing a lot of work, not a large number of queues doing a little work. If you find that you're accessing a mutable object from lots of different threads, you need to rethink your system (probably reducing the number of threads). There's no simple pattern that will make that work efficiently and safely without you having to think about your specific use case carefully.
But this construction is useful for problems where system frameworks may call you back on random threads with minimal contention. It is simple to implement and fairly easy to use correctly. If you have a more complex problem, you will have to design a solution for that problem.
Edit: I haven't thought about this answer in a long time, but Brennan's comments brought it back to my attention. Because of the bug I had in the original code, my original answer was ok, but if you fixed the bug it was bad. (If you want to see my original code that used a barrier, look in the edit history, I don't want to put it here because people will copy it.) I've changed it to use a standard serial queue rather than a concurrent queue.
Don't generate concurrent queues without careful thought about how threads will be generated. If you are going to have many simultaneous accesses, you're going to create a lot of threads, which is bad. If you're not going to have many simultaneous accesses, then you don't need a concurrent queue. GCD talks make promises about managing threads that it doesn't actually live up to. You definitely can get thread explosion (as Brennan mentions.)

How would #synchronized work if called on separate threads? [duplicate]

I just created a singleton method, and I would like to know what the function #synchronized() does, as I use it frequently, but do not know the meaning.
It declares a critical section around the code block. In multithreaded code, #synchronized guarantees that only one thread can be executing that code in the block at any given time.
If you aren't aware of what it does, then your application probably isn't multithreaded, and you probably don't need to use it (especially if the singleton itself isn't thread-safe).
Edit: Adding some more information that wasn't in the original answer from 2011.
The #synchronized directive prevents multiple threads from entering any region of code that is protected by a #synchronized directive referring to the same object. The object passed to the #synchronized directive is the object that is used as the "lock." Two threads can be in the same protected region of code if a different object is used as the lock, and you can also guard two completely different regions of code using the same object as the lock.
Also, if you happen to pass nil as the lock object, no lock will be taken at all.
From the Apple documentation here and here:
The #synchronized directive is a
convenient way to create mutex locks
on the fly in Objective-C code. The
#synchronized directive does what any
other mutex lock would do—it prevents
different threads from acquiring the
same lock at the same time.
The documentation provides a wealth of information on this subject. It's worth taking the time to read through it, especially given that you've been using it without knowing what it's doing.
The #synchronized directive is a convenient way to create mutex locks on the fly in Objective-C code.
The #synchronized directive does what any other mutex lock would do—it prevents different threads from acquiring the same lock at the same time.
Syntax:
#synchronized(key)
{
// thread-safe code
}
Example:
-(void)AppendExisting:(NSString*)val
{
#synchronized (oldValue) {
[oldValue stringByAppendingFormat:#"-%#",val];
}
}
Now the above code is perfectly thread safe..Now Multiple threads can change the value.
The above is just an obscure example...
#synchronized block automatically handles locking and unlocking for you. #synchronize
you have an implicit lock associated with the object you are using to synchronize. Here is very informative discussion on this topic please follow How does #synchronized lock/unlock in Objective-C?
Excellent answer here:
Help understanding class method returning singleton
with further explanation of the process of creating a singleton.
#synchronized is thread safe mechanism. Piece of code written inside this function becomes the part of critical section, to which only one thread can execute at a time.
#synchronize applies the lock implicitly whereas NSLock applies it explicitly.
It only assures the thread safety, not guarantees that. What I mean is you hire an expert driver for you car, still it doesn't guarantees car wont meet an accident. However probability remains the slightest.

2017 / Swift 3.1 - GCD vs NSOperation

I am diving a bit deeper into concurrency and have been reading extensively about GCD and NSOperation. However, a lot of posts like the canonic answer on SO are several years old.
It seemed to me that NSOperation main advantages used to be, at the cost of some performance:
"the way to go" generally for more than a simple dispatch as the highest level abstraction (built atop of GCD)
to make task manipulation (cancellation, etc.) a lot easier
to easily set up dependencies between tasks
Given GCD's DispatchWorkItem & block cancellation / DispatchGroup / qos in particular, is there really an incentive (cost-performance wise) to use NSOperation anymore for concurrency apart from cases where you need to be able to cancel a task when it began executing or query the task state ?
Apple seems to put a lot more emphasis on GCD, at least in their WWDC (granted it's more recent than NSOperation).
I see them each still having their own purpose. I just recently rewatched the 2015 WWDC talk about this (Advanced NSOperations), and I see two main points here.
Run Time & User Interaction
From the talk:
NSOperations run for a little bit longer than you would expect a block to run, so blocks usually take a few nanoseconds, maybe at most a millisecond, to execute.
NSOperations, on the other hand, can be much longer, for anywhere from a couple of milliseconds to even several minutes
The example they talk about is in the WWDC app, where there exists an NSOperation that has a dependency on having a logged in user. The dependency NSOperation presents a login view controller and waits for the user to authenticate. Once finished, that NSOperation finishes and the NSOperationQueue resumes it's work. I don't think you'd want to use GCD for this scenario.
Subclassing
Since NSOperations are just classes, you can subclass them to get more reusability out of them. This isn't possible with GCD.
Example: (Using the WWDC login scenario from above)
You have many NSOperations in your code base that are associated with a user interaction that requires them to be authenticated. (Liking a video, in this example.) You could extend NSOperation to create an AuthenticatedOperation, then have all those NSOperations extend this new class.
First off, NSOperationQueue let you enqueue operations, that is, some sort of asynchronous operations with a start method, a cancel method and a few observable properties, while with a dispatch queue one can submit a block or a closure or a function to a dispatch queue, which will be then executed.
An "Operation" is semantically fundamentally different than a block (or closure, function). An operation has an underlying asynchronous task, while a block (closure or functions) is just that.
What comes close to an NSOperation, though, is an asynchronous function, e.g.:
func asyncTask(param: Param, completion: (T?, Error?) ->())
Now with Futures we can define the same asynchronous function like:
func asyncTask(param: Param) -> Future<T>
which makes such asynchronous functions quite handy.
Since futures have combinator functions like map and flatMap and so on, we can quite easily "emulate" the "dependency" feature of NSOperation, just in a more powerful, more concise and more comprehensible way.
We can also implement some sort of NSOperationQueue with a few lines of code based solely on GCD, say a "TaskQueue" and with basically the same features, like "maxConcurrentTasks" and can use it to enqueue task functions (not operations), in just a more powerful, more concise and a more comprehensible way as well. ;)
In order to get a cancelable operation, you need to create a subclass of NSOperation - while you can create a async function "ad-hod" - inline.
Also, since cancellation is an independent concept, we can assume, that there exists some library whose implementation is solely based on GCD which solves this problem in the, uhm, the usual way ;) It may look like this:
self.cancellationRequest = CancellationRequest()
self.asyncTask(param: param, cancellationToken: cr.token).map { result in
...
}
and later:
override func viewWillDisappear(_ animated: animated) {
super.viewWillDisappear(animated)
self.cancellationRequest.cancel()
}
So, IMHO there's really no reason to use clunky NSOperation and NSOperationQueue, and there's no reason any more for subclassing NSOperation, which is quite elaborate and surprising difficult, unless you don't care about data races.

How to dispatch_after in the current queue?

Now that dispatch_get_current_queue is deprecated in iOS 6, how do I use dispatch_after to execute something in the current queue?
The various links in the comments don't say "it's better not to do it." They say you can't do it. You must either pass the queue you want or dispatch to a known queue. Dispatch queues don't have the concept of "current." Blocks often feed from one queue to another (called "targeting"). By the time you're actually running, the "current" queue is not really meaningful, and relying on it can (and historically did) lead to dead-lock. dispatch_get_current_queue() was never meant for dispatching; it was a debugging method. That's why it was removed (since people treated it as if it meant something meaningful).
If you need that kind of higher-level book-keeping, use an NSOperationQueue which tracks its original queue (and has a simpler queuing model that makes "original queue" much more meaningful).
There are several approaches used in UIKit that are appropriate:
Pass the call-back dispatch_queue as a parameter (this is probably the most common approach in new APIs). See [NSURLConnection setDelegateQueue:] or addObserverForName:object:queue:usingBlock: for examples. Notice that NSURLConnection expects an NSOperationQueue, not a dispatch_queue. Higher-level APIs and all that.
Call back on whatever queue you're on and leave it up to the receiver to deal with it. This is how callbacks have traditionally worked.
Demand that there be a runloop on the calling thread, and schedule your callbacks on the calling runloop. This is how NSURLConnection historically worked before queues.
Always make your callbacks on one of the well-known queues (particularly the main queue) unless told otherwise. I don't know of anywhere that this is done in UIKit, but I've seen it commonly in app code, and is a very easy approach most of the time.
Create a queue manually and dispatch both your calling code and your dispatch_after code onto that. That way you can guarantee that both pieces of code are run from the same queue.
Having to do this is likely because the need of a hack. You can hack around this with another hack:
id block = ^foo() {
[self doSomething];
usleep(delay_in_us);
[self doSomehingOther];
}
Instead of usleep() you might consider to loop in a run loop.
I would not recommend this "approach" though. The better way is to have some method which takes a queue as parameter and a block as parameter, where the block is then executed on the specified queue.
And, by the way, there are ways during a block executes to check whether it runs on a particular queue - respectively on any of its parent queue, provided you have a reference to that queue beforehand: use functions dispatch_queue_set_specific, and dispatch_get_specific.

#synchronized block versus GCD dispatch_async()

Essentially, I have a set of data in an NSDictionary, but for convenience I'm setting up some NSArrays with the data sorted and filtered in a few different ways. The data will be coming in via different threads (blocks), and I want to make sure there is only one block at a time modifying my data store.
I went through the trouble of setting up a dispatch queue this afternoon, and then randomly stumbled onto a post about #synchronized that made it seem like pretty much exactly what I want to be doing.
So what I have right now is...
// a property on my object
#property (assign) dispatch_queue_t matchSortingQueue;
// in my object init
_sortingQueue = dispatch_queue_create("com.asdf.matchSortingQueue", NULL);
// then later...
- (void)sortArrayIntoLocalStore:(NSArray*)matches
{
dispatch_async(_sortingQueue, ^{
// do stuff...
});
}
And my question is, could I just replace all of this with the following?
- (void)sortArrayIntoLocalStore:(NSArray*)matches
{
#synchronized (self) {
// do stuff...
};
}
...And what's the difference between the two anyway? What should I be considering?
Although the functional difference might not matter much to you, it's what you'd expect: if you #synchronize then the thread you're on is blocked until it can get exclusive execution. If you dispatch to a serial dispatch queue asynchronously then the calling thread can get on with other things and whatever it is you're actually doing will always occur on the same, known queue.
So they're equivalent for ensuring that a third resource is used from only one queue at a time.
Dispatching could be a better idea if, say, you had a resource that is accessed by the user interface from the main queue and you wanted to mutate it. Then your user interface code doesn't need explicitly to #synchronize, hiding the complexity of your threading scheme within the object quite naturally. Dispatching will also be a better idea if you've got a central actor that can trigger several of these changes on other different actors; that'll allow them to operate concurrently.
Synchronising is more compact and a lot easier to step debug. If what you're doing tends to be two or three lines and you'd need to dispatch it synchronously anyway then it feels like going to the effort of creating a queue isn't worth it — especially when you consider the implicit costs of creating a block and moving it over onto the heap.
In the second case you would block the calling thread until "do stuff" was done. Using queues and dispatch_async you will not block the calling thread. This would be particularly important if you call sortArrayIntoLocalStore from the UI thread.

Resources