I'm getting into NSBlockOperation and I have some questions.
Notably, the documentation for addExecutionBlock says:
Discussion
The specified block should not make any assumptions about
its execution environment.
Calling this method while the receiver is executing or has already
finished causes an NSInvalidArgumentException exception to be thrown.
What kind of situation will throw NSInvalidArgumentException? What does "while the receiver is executing" really mean? What can cause this?
You can't use addExecutionBlock: to add an execution block while the operation is running or has already completed. That's all it means.
A block operation object can have zero or more execution blocks associated with it. When the block operation is started, all of its associated execution blocks are submitted for concurrent execution. The warning is that you can't add more execution blocks to the operation after this point.
You can create more block operation objects and add execution blocks to those. Each block operation is started separately from others, so the rule about adding more execution blocks is evaluated separately.
Typically, you would create a block operation, add whatever execution blocks to it that you want, and then queue the operation onto an operation queue. Once the operation has been queued, it might start at any time (subject to readiness, which is subject to dependencies). So, it's best to not attempt to add execution blocks once it's been queued.
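The workflow described above (create the operation, attach all execution blocks, then queue it) can be sketched in Swift roughly as follows. This is only an illustration: `BlockLog` and the block names are hypothetical, and the lock is there because execution blocks of one operation may run concurrently.

```swift
import Foundation

// Hypothetical helper: a lock-protected log, since execution blocks may run concurrently.
final class BlockLog: @unchecked Sendable {
    private let lock = NSLock()
    private var names: [String] = []

    func add(_ name: String) {
        lock.lock()
        defer { lock.unlock() }
        names.append(name)
    }

    var sorted: [String] {
        lock.lock()
        defer { lock.unlock() }
        return names.sorted()
    }
}

let blockLog = BlockLog()

// 1. Create the operation and attach every execution block up front.
let operation = BlockOperation { blockLog.add("first") }
operation.addExecutionBlock { blockLog.add("second") }
operation.addExecutionBlock { blockLog.add("third") }

// 2. Only then hand it to a queue. From here on it may start at any moment,
//    so no further addExecutionBlock calls are made on this operation.
let operationQueue = OperationQueue()
operationQueue.addOperation(operation)
operationQueue.waitUntilAllOperationsAreFinished()

let ranBlocks = blockLog.sorted
print(ranBlocks)
```

If more work turns up after queuing, the safe option is a second BlockOperation rather than another addExecutionBlock call on the first one.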
Related
I tried running the following code and it raises the following error every time:
DispatchQueue.main.sync { }
Thread 1: EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
I found this post on stackoverflow that says to never run synchronous code on the main queue:
DispatchQueue crashing with main.sync in Swift
I had assumed that because the sync { } method is there that means there is some context that it can be used. Is there absolutely no use for executing synchronous code on the main queue?
I had assumed that because the sync { } method is there that means there is some context that it can be used.
Yes, it's there to be used when appropriate, but that doesn't mean it should be applied to the main queue.
Is there absolutely no use for executing synchronous code on the main
queue?
The sync command blocks and waits for its operation to be performed and completed on the specified queue. That queue can certainly be the main queue. But the queue that blocks cannot be the main queue! You must never say sync when you are on the main queue, as you will then be blocking the main queue which is illegal; and you must really never say DispatchQueue.main.sync when you are on the main queue, as you will be blocking the main queue forever (thereby causing the heat death of the universe).
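If some code genuinely needs a result from the main queue and may be called from either the main thread or a background thread, one common workaround is a small guard like the one below. This is a sketch, not an endorsed pattern: `onMainSync` is a hypothetical helper name, and it checks the main thread (via Thread.isMainThread) rather than the main queue itself.

```swift
import Foundation

/// Hypothetical helper: runs `work` on the main thread and returns its result.
/// When the caller is already on the main thread it runs the closure inline,
/// which avoids the DispatchQueue.main.sync-from-main deadlock/crash.
func onMainSync<T>(_ work: () -> T) -> T {
    if Thread.isMainThread {
        return work()
    }
    return DispatchQueue.main.sync(execute: work)
}

// Called here from the main thread of a command-line program,
// so the closure simply runs inline.
let answer = onMainSync { 21 * 2 }
print(answer)
```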
Really the best thing to do is adopt async/await and never mention DispatchQueue again. All these concerns vanish in a puff of smoke and your code becomes safe and easy to reason about, automatically.
sync should not be used on the main queue because you are likely to block everything. The method is there for "custom" queues (for example, a queue that you created, where YOU control when it is blocked and unblocked). The main queue is a special case: since it is not managed by you, blocking it may cause unexpected behavior (which usually translates into a crash).
Most answers on Stack Overflow imply that the sync vs. async distinction is much the same as the serial vs. concurrent queue distinction, like the link in the first comment by @Roope.
I have started to think that
Serial and concurrent are related to DispatchQueue, and sync/ async for how an operation will get executed on a thread.
Am I right?
Like if we've got DQ.main.sync then task/operation closure will get executed in a synchronous manner on this serial (main) queue.
And, if I do DQ.main.async then the task will get executed asynchronously on some other background queue, and on reaching completion will return control to the main thread.
And, since main is a serial queue, it won't let any other task/operation get into execution state/ start getting executed until the current closure task has finished its execution.
Then,
DQ.global().sync would execute a task synchronously on the thread on which its task/operation has been assigned i.e., it will block that thread from doing any other task/operation by blocking any context switching on that particular thread.
And, since, global is a concurrent queue it will keep on putting the tasks present in it to the execution state irrespective of previous task/operation's execution state.
DQ.global().async would allow context switching on the thread on which the operation closure has been put for execution
Is this the correct interpretations of the above dispatchQueues and sync vs async?
You are asking the right questions, but I think you got a bit confused (mostly because posts about this topic on the internet are not very clear).
Concurrent / Serial
Let's look at how you can create a new dispatch Queue:
let serialQueue = DispatchQueue(label: label)
If you don't specify any other additional parameter, this queue will behave as a serial queue:
This means that every block dispatched on this queue (sync or async it doesn't matter) will be executed alone, without the possibility for other blocks to be executed, on that same queue, simultaneously.
This doesn't mean that anything else is stopped; it just means that if something else is dispatched on that same queue, it will wait for the first block to finish before starting its execution. Other threads and queues will still run on their own.
You can, however, create a concurrent queue, which will not constrain these blocks of code in this manner. Instead, if several blocks of code happen to be dispatched on that same queue at the same time, it will execute them at the same time (on different threads).
let concurrentQueue = DispatchQueue(label: label,
                                    qos: .background,
                                    attributes: .concurrent,
                                    autoreleaseFrequency: .inherit,
                                    target: .global())
So, you just need to pass the attribute concurrent to the queue, and it won't be serial anymore.
(I won't be talking about the other parameters since they are not in focus of this particular question and, I think, you can read about them in the other SO post linked in the comment or, if it's not enough, you can ask another question)
If you want to understand more about concurrent queues (aka: skip if you don't care about concurrent queues)
You could ask: When do I even need a concurrent queue?
Well, just for example, let's think of a use-case where you want to synchronize READS on a shared resource: since the reads can be done simultaneously without issues, you could use a concurrent queue for that.
But what if you want to write on that shared resource?
Well, in this case a write needs to act as a "barrier": during the execution of that write, no other write and no reads can operate on that resource simultaneously.
To obtain this kind of behavior, the swift code would look something like this
concurrentQueue.async(flags: .barrier, execute: { /*your barriered block*/ })
So, in other words, you can make a concurrent queue work temporarily as a serial queue in case you need.
Once again, the concurrent / serial distinction is only valid for blocks dispatched to that same queue, it has nothing to do with other concurrent or serial work that can be done on another thread/queue.
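To make the reader/writer idea above concrete, here is a minimal sketch: reads go through a plain sync on a concurrent queue, writes go through a barrier. `SafeCounter` and the queue label are made-up names for illustration only.

```swift
import Foundation

// Hypothetical example type: a counter guarded by a concurrent queue.
final class SafeCounter: @unchecked Sendable {
    private var value = 0
    private let queue = DispatchQueue(label: "example.counter", attributes: .concurrent)

    // Reads may overlap with other reads on the concurrent queue.
    var current: Int {
        queue.sync { value }
    }

    // Writes use a barrier: the queue behaves like a serial queue while it runs.
    func increment() {
        queue.async(flags: .barrier) { self.value += 1 }
    }
}

let counter = SafeCounter()

// Hammer the counter from many threads at once.
DispatchQueue.concurrentPerform(iterations: 100) { _ in
    counter.increment()
}

// This read is queued behind all 100 barrier writes submitted above.
let total = counter.current
print(total)
```

The caller never touches the queue directly, which keeps the barrier rule in one place.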
SYNC / ASYNC
This is totally another issue, with virtually no connection to the previous one.
These two ways of dispatching a block of code are relative to the current thread/queue you are on at the time of the dispatch call. The dispatch call blocks (in the case of sync) or doesn't block (async) the execution of that thread/queue while the code you dispatched runs on the other queue.
So let's say I'm executing a method and in that method I dispatch async something on some other queue (I'm using main queue but it could be any queue):
func someMethod() {
    var aString = "1"
    DispatchQueue.main.async {
        aString = "2"
    }
    print(aString)
}
What happens is that this block of code is dispatched on another queue and could be executed serially or concurrently on that queue, but that has no correlation to what is happening on the current queue (which is the one on which someMethod is called).
What happens on the current queue is that the code will continue executing and won't wait for that block to be completed before printing that variable.
This means that, very likely, you will see it print 1 and not 2. (More precisely you can't know what will happen first)
If instead you dispatched it sync, then you would ALWAYS print 2 instead of 1, because the current queue would wait for that block of code to be completed before continuing its execution.
So this will print 2:
func someMethod() {
    var aString = "1"
    DispatchQueue.main.sync {
        aString = "2"
    }
    print(aString)
}
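The two snippets above can be tried out in a command-line program with a private serial queue standing in for the main queue. In this sketch `StringBox`, `worker`, and the two semaphores are illustrative names; the `gate` semaphore only exists to hold the async block back so the outcome is deterministic.

```swift
import Foundation

// Hypothetical mutable box shared between the caller and the dispatched blocks.
final class StringBox: @unchecked Sendable { var value = "1" }

let box = StringBox()
let worker = DispatchQueue(label: "example.worker")   // stand-in for "some other queue"
let gate = DispatchSemaphore(value: 0)
let finished = DispatchSemaphore(value: 0)

// async: the call returns immediately; the block runs later on `worker`.
worker.async {
    gate.wait()            // held back until the caller has looked at the value
    box.value = "2"
    finished.signal()
}
let seenAfterAsync = box.value   // the caller did not wait for the block

gate.signal()
finished.wait()

// sync: the caller is blocked until the block has completed.
worker.sync { box.value = "3" }
let seenAfterSync = box.value

print(seenAfterAsync, seenAfterSync)
```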
But does it mean that the queue on which someMethod is called is actually stopped?
Well, it depends on the current queue:
If it's serial, then yes. All the blocks previously dispatched to that queue, or that will be dispatched to that queue, will have to wait for that block to be completed.
If it's concurrent, then no. All concurrent blocks will continue their execution; only this specific block will be blocked, waiting for this dispatch call to finish its work. Of course, if we are in the barriered case, then it's like for serial queues.
What happens when the currentQueue and the queue on which we dispatch are the same?
Assuming we are on serial queues (which I think will be most of your use-cases)
In case we dispatch sync, then deadlock. Nothing will ever execute on that queue anymore. That's the worst that could happen.
In case we dispatch async, then the code will be executed after all the code already dispatched on that queue (including but not limited to the code executing right now in someMethod).
So be extra careful when you use the sync method, and be sure you are not on that same queue you are dispatching into.
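A small sketch of both same-queue cases on a serial queue follows. The async case shows the inner block being queued behind the currently running one; for the sync case, `dispatchPrecondition` is used to fail loudly instead of deadlocking silently. `EventLog`, `serial`, and `readEvents` are hypothetical names.

```swift
import Foundation

// Hypothetical log, only ever touched from blocks running on `serial`.
final class EventLog: @unchecked Sendable { var events: [String] = [] }

let serial = DispatchQueue(label: "example.serial")
let eventLog = EventLog()

// Hypothetical accessor: traps if called from a block running on `serial`,
// which is exactly the situation where `serial.sync` would deadlock.
func readEvents() -> [String] {
    dispatchPrecondition(condition: .notOnQueue(serial))
    return serial.sync { eventLog.events }
}

serial.sync {
    // async onto the queue we are already running on:
    // the inner block is queued behind the block that is executing now.
    serial.async { eventLog.events.append("inner") }
    eventLog.events.append("outer")
}

// Called from outside the queue, so the precondition holds.
let events = readEvents()
print(events)
```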
I hope this helps you understand it better.
I have started to think that Serial and concurrent are related to DispatchQueue, and sync/async for how an operation will get executed on a thread.
In short:
Whether the destination queue is serial or concurrent dictates how that destination queue will behave (namely, can that queue run this closure at the same time as other things that were dispatched to that same queue or not);
Whereas sync vs async dictates how the current thread from which you are dispatching will behave (namely, should the calling thread wait for the dispatched code to finish or not).
So, serial/concurrent affects the destination queue to which you are dispatching, whereas sync/async affects the current thread from which you are dispatching.
You go on to say:
Like if we've got DQ.main.sync then task/operation closure will get executed in a synchronous manner on this serial (main) queue.
I might rephrase this to say “if we've got DQ.main.sync then the current thread will wait for the main queue to perform this closure.”
FWIW, we don’t use DQ.main.sync very often, because 9 times out of 10, we’re just doing this to dispatch some UI update, and there’s generally no need to wait. It’s minor, but we almost always use DQ.main.async. Where we do use sync is when we’re trying to provide thread-safe interaction with some resource. In that scenario, sync can be very useful. But it is rarely required in conjunction with main, where it only introduces inefficiencies.
And, if I do DQ.main.async then the task will get executed asynchronously on some other background queue, and on reaching completion will return control to the main thread.
No.
When you do DQ.main.async, you’re specifying that the closure will run asynchronously on the main queue (the queue to which you dispatched) and that your current thread (presumably a background thread) doesn’t need to wait for it, but will immediately carry on.
For example, consider a sample network request, whose responses are processed on a background serial queue of the URLSession:
let task = URLSession.shared.dataTask(with: url) { data, _, error in
    // parse the response
    DispatchQueue.main.async {
        // update the UI
    }
    // do something else
}
task.resume()
So, the parsing happens on this URLSession background thread, it dispatches a UI update to the main thread, and then carries on doing something else on this background thread. The whole purpose of sync vs async is whether the “do something else” has to wait for the “update the UI” to finish or not. In this case, there’s no point to block the current background thread while the main is processing the UI update, so we use async.
Then, DQ.global().sync would execute a task synchronously on the thread on which its task/operation has been assigned i.e., ...
Yes DQ.global().sync says “run this closure on a background queue, but block the current thread until that closure is done.”
Needless to say, in practice, we would never do DQ.global().sync. There’s no point in blocking the current thread waiting for something to run on a global queue. The whole point in dispatching closures to the global queues is so you don’t block the current thread. If you’re considering DQ.global().sync, you might as well just run it on the current thread because you’re blocking it anyway. (In fact, GCD knows that DQ.global().sync doesn’t achieve anything and, as an optimization, will generally run it on the current thread anyway.)
Now if you were going to use async or using some custom queue for some reason, then that might make sense. But there’s generally no point in ever doing DQ.global().sync.
... it will block that thread from doing any other task/operation by blocking any context switching on that particular thread.
No.
The sync doesn’t affect “that thread” (the worker thread of the global queue). The sync affects the current thread from which you dispatched this block of code. Will this current thread wait for the global queue to perform the dispatched code (sync) or not (async)?
And, since, global is a concurrent queue it will keep on putting the tasks present in it to the execution state irrespective of previous task/operation's execution state.
Yes. Again, I might rephrase this: “And, since global is a concurrent queue, this closure will be scheduled to run immediately, regardless of what might already be running on this queue.”
The technical distinction is that when you dispatch something to a concurrent queue, while it generally starts immediately, sometimes it doesn’t. Perhaps all of the cores on your CPU are tied up running something else. Or perhaps you’ve dispatched many blocks and you’ve temporarily exhausted GCD’s very limited number of “worker threads”. Bottom line, while it generally will start immediately, there could always be resource constraints that prevent it from doing so.
But this is a detail: Conceptually, when you dispatch to a global queue, yes, it generally will start running immediately, even if you might have a few other closures that you have dispatched to that queue which haven’t finished yet.
DQ.global().async would allow context switching on the thread on which the operation closure has been put for execution.
I might avoid the phrase “context switching”, as that has a very specific meaning which is probably beyond the scope of this question. If you’re really interested, you can see WWDC 2017 video Modernizing Grand Central Dispatch Usage.
The way I’d describe DQ.global().async is that it simply “allows the current thread to proceed, unblocked, while the global queue performs the dispatched closure.” This is an extremely common technique, often called from the main queue to dispatch some computationally intensive code to some global queue, but not wait for it to finish, leaving the main thread free to process UI events, resulting in more responsive user interface.
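As a rough illustration of that pattern, the sketch below pushes a computation onto a global queue and hands the result back through a completion handler. `sumOfSquares` and `ResultBox` are hypothetical names. In an app the completion would typically hop back with DispatchQueue.main.async; the semaphore here is only so a command-line program can wait for the result.

```swift
import Foundation

// Hypothetical worker function: computes on a global queue, never blocks the caller.
func sumOfSquares(upTo n: Int, completion: @escaping @Sendable (Int) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        let total = (1...n).reduce(0) { $0 + $1 * $1 }
        completion(total)
    }
}

// Hypothetical box for carrying the result back to the waiting caller.
final class ResultBox: @unchecked Sendable { var value = 0 }

let resultBox = ResultBox()
let done = DispatchSemaphore(value: 0)

sumOfSquares(upTo: 10) { total in
    resultBox.value = total
    done.signal()
}
// The call above returned immediately; the current thread was free to do other work.

done.wait()   // command-line only: an app would not block the main thread like this
let computed = resultBox.value
print(computed)
```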
I need some clarification on how dispatch queues are related to reentrancy and deadlocks.
Reading this blog post Thread Safety Basics on iOS/OS X, I encountered this sentence:
All dispatch queues are non-reentrant, meaning you will deadlock if
you attempt to dispatch_sync on the current queue.
So, what is the relationship between reentrancy and deadlock? Why, if a dispatch_queue is non-reentrant, does a deadlock arise when you are using dispatch_sync call?
In my understanding, you can have a deadlock using dispatch_sync only if the thread you are running on is the same thread that the block is dispatched to.
A simple example is the following. If I run this code on the main thread, dispatch_get_main_queue() targets the main thread as well, so I end up in a deadlock.
dispatch_sync(dispatch_get_main_queue(), ^{
    NSLog(@"Deadlock!!!");
});
Any clarifications?
All dispatch queues are non-reentrant, meaning you will deadlock if
you attempt to dispatch_sync on the current queue.
So, what is the relationship between reentrancy and deadlock? Why, if
a dispatch_queue is non-reentrant, does a deadlock arise when you are
using dispatch_sync call?
Without having read that article, I imagine that statement was in reference to serial queues, because it's otherwise false.
Now, let's consider a simplified conceptual view of how dispatch queues work (in some made-up pseudo-language). We also assume a serial queue, and don't consider target queues.
Dispatch Queue
When you create a dispatch queue, basically you get a FIFO queue, a simple data structure where you can push objects on the end, and take objects off the front.
You also get some complex mechanisms to manage thread pools and do synchronization, but most of that is for performance. Let's simply assume that you also get a thread that just runs an infinite loop, processing messages from the queue.
void processQueue(queue) {
    for (;;) {
        waitUntilQueueIsNotEmptyInAThreadSafeManner(queue);
        block = removeFirstObject(queue);
        block();
    }
}
dispatch_async
Taking the same simplistic view of dispatch_async yields something like this...
void dispatch_async(queue, block) {
appendToEndInAThreadSafeManner(queue, block);
}
All it is really doing is taking the block, and adding it to the queue. This is why it returns immediately, it just adds the block onto the end of the data structure. At some point, that other thread will pull this block off the queue, and execute it.
Note that this is where the FIFO guarantee comes into play. The thread pulling blocks off the queue and executing them always takes them in the order that they were placed on the queue. It then waits until that block has fully executed before getting the next block off the queue.
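The FIFO behavior of a serial queue is easy to observe with the real API. A minimal Swift sketch, where `Order` and the queue label are made-up names:

```swift
import Foundation

// Hypothetical recorder, only ever touched from blocks running on `fifo`.
final class Order: @unchecked Sendable { var values: [Int] = [] }

let fifo = DispatchQueue(label: "example.fifo")   // serial by default
let order = Order()

// Each async call returns immediately after appending its block to the queue.
for i in 1...5 {
    fifo.async { order.values.append(i) }
}

// The sync block is queued last, so it runs after all five blocks above.
let observed = fifo.sync { order.values }
print(observed)
```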
dispatch_sync
Now, another simplistic view of dispatch_sync. In this case, the API guarantees that it will wait until the block has run to completion before it returns. In particular, calling this function does not violate the FIFO guarantee.
void dispatch_sync(queue, block) {
bool done = false;
dispatch_async(queue, { block(); done = true; });
while (!done) { }
}
Now, this is actually done with semaphores, so there are no CPU loops or boolean flags, and it doesn't use a separate block, but we are trying to keep it simple. You should get the idea.
The block is placed on the queue, and then the function waits until it knows for sure that "the other thread" has run the block to completion.
Reentrancy
Now, we can get a reentrant call in a number of different ways. Let's consider the most obvious.
block1 = {
dispatch_sync(queue, block2);
}
dispatch_sync(queue, block1);
This will place block1 on the queue, and wait for it to run. Eventually the thread processing the queue will pop block1 off, and start executing it. When block1 executes, it will put block2 on the queue, and then wait for it to finish executing.
This is one meaning of reentrancy: when you re-enter a call to dispatch_sync from another call to dispatch_sync
Deadlock from reentering dispatch_sync
However, block1 is now running inside the queue's for loop. That code is executing block1, and will not process anything more from the queue until block1 completes.
Block1, though, has placed block2 on the queue, and is waiting for it to complete. Block2 has indeed been placed on the queue, but it will never be executed. Block1 is "waiting" for block2 to complete, but block2 is sitting on a queue, and the code that pulls it off the queue and executes it will not run until block1 completes.
Deadlock from NOT reentering dispatch_sync
Now, what if we change the code to this...
block1 = {
dispatch_sync(queue, block2);
}
dispatch_async(queue, block1);
We are not technically reentering dispatch_sync. However, we still have the same scenario, it's just that the thread that kicked off block1 is not waiting for it to finish.
We are still running block1, waiting for block2 to finish, but the thread that will run block2 must finish with block1 first. This will never happen because the code to process block1 is waiting for block2 to be taken off the queue and executed.
Thus reentrancy for dispatch queues is not technically reentering the same function, but reentering the same queue processing.
Deadlocks from NOT reentering the queue at all
In its simplest (and most common) case, let's assume [self foo] gets called on the main thread, as is common for UI callbacks.
-(void) foo {
    dispatch_sync(dispatch_get_main_queue(), ^{
        // Never gets here
    });
}
This doesn't "reenter" the dispatch queue API, but it has the same effect. We are running on the main thread. The main thread is where the blocks are taken off the main queue and processed. The main thread is currently executing foo and a block is placed on the main-queue, and foo then waits for that block to be executed. However, it can only be taken off the queue and executed after the main thread gets done with its current work.
This will never happen because the main thread will not progress until foo completes, but it will never complete until that block it is waiting for runs... which will not happen.
In my understanding, you can have a deadlock using dispatch_sync only
if the thread you are running on is the same thread that the block is
dispatched to.
As the aforementioned example illustrates, that's not the case.
Furthermore, there are other scenarios that are similar, but not so obvious, especially when the sync access is hidden in layers of method calls.
Avoiding deadlocks
The only sure way to avoid deadlocks is to never call dispatch_sync (that's not exactly true, but it's close enough). This is especially true if you expose your queue to users.
If you use a self-contained queue, and control its use and target queues, you can maintain some control when using dispatch_sync.
There are, indeed, some valid uses of dispatch_sync on a serial queue, but most are probably unwise, and should only be done when you know for certain that you will not be 'sync' accessing the same or another resource (the latter is known as deadly embrace).
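For a self-contained queue, one technique sometimes used to keep sync calls from deadlocking on re-entry is to tag the queue with a queue-specific key and run the work inline when already on it. The sketch below shows the idea in Swift; `Store` and `syncSafely` are hypothetical names, and this is not how GCD itself behaves.

```swift
import Foundation

// Hypothetical type that owns its queue and never exposes it.
final class Store: @unchecked Sendable {
    private let queue = DispatchQueue(label: "example.store")
    private let key = DispatchSpecificKey<Bool>()
    private var items: [String] = []

    init() {
        // Tag the queue so blocks running on it can recognize it.
        queue.setSpecific(key: key, value: true)
    }

    // Runs `work` on the queue, or inline if we are already on the queue.
    private func syncSafely<T>(_ work: () -> T) -> T {
        if DispatchQueue.getSpecific(key: key) == true {
            return work()
        }
        return queue.sync(execute: work)
    }

    func add(_ item: String) {
        syncSafely { items.append(item) }
    }

    func addPair(_ a: String, _ b: String) {
        syncSafely { () -> Void in
            add(a)   // nested call: already on the queue, so it runs inline
            add(b)
        }
    }

    var count: Int {
        syncSafely { items.count }
    }
}

let store = Store()
store.addPair("a", "b")   // a plain nested queue.sync here would deadlock
let storedCount = store.count
print(storedCount)
```

This only protects against re-entering the same queue. It does nothing for two queues that wait on each other.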
EDIT
Jody, Thanks a lot for your answer. I really understood all of your
stuff. I would like to put more points...but right now I cannot. 😢 Do
you have any good tips in order to learn this under the hood stuff? –
Lorenzo B.
Unfortunately, the only books on GCD that I've seen are not very advanced. They go over the easy surface level stuff on how to use it for simple general use cases (which I guess is what a mass market book is supposed to do).
However, GCD is open source. Here is the webpage for it, which includes links to their svn and git repositories. However, the webpage looks old (2010) and I'm not sure how recent the code is. The most recent commit to the git repository is dated Aug 9, 2012.
I'm sure there are more recent updates; but not sure where they would be.
In any event, I doubt the conceptual frameworks of the code has changed much over the years.
Also, the general idea of dispatch queues is not new, and has been around in many forms for a very long time.
Many moons ago, I spent my days (and nights) writing kernel code (worked on what we believe to have been the very first symmetric multiprocessing implementation of SVR4), and then when I finally breached the kernel, I spent most of my time writing SVR4 STREAMS drivers (wrapped by user space libraries). Eventually, I made it fully into user space, and built some of the very first HFT systems (though it wasn't called that back then).
The dispatch queue concept was prevalent in every bit of that. Its emergence as a generally available user space library is only a somewhat recent development.
Edit #2
Jody, thanks for your edit. So, to recap a serial dispatch queue is
not reentrant since it could produce an invalid state (a deadlock).
On the contrary, a reentrant function will not produce it. Am I right?
– Lorenzo B.
I guess you could say that, because it does not support reentrant calls.
However, I think I would prefer to say that the deadlock is the result of preventing invalid state. If anything else occurred, then either the state would be compromised, or the definition of the queue would be violated.
Core Data's performBlockAndWait
Consider -[NSManagedObjectContext performBlockAndWait]. It's non-asynchronous, and it is reentrant. It has some pixie dust sprinkled around the queue access so that the second block runs immediately, when called from "the queue." Thus, it has the traits I described above.
[moc performBlock:^{
    [moc performBlockAndWait:^{
        // This block runs immediately, and to completion before returning
        // However, `dispatch_async`/`dispatch_sync` would deadlock
    }];
}];
The above code does not "produce a deadlock" from reentrancy (but the API can't avoid deadlocks entirely).
However, depending on who you talk to, doing this can produce invalid (or unpredictable/unexpected) state. In this simple example, it's clear what's happening, but in more complicated parts it can be more insidious.
At the very least, you must be very careful about what you do inside a performBlockAndWait.
Now, in practice, this is only a real issue for main-queue MOCs, because the main run loop is running on the main queue, so performBlockAndWait recognizes that and immediately executes the block. However, most apps have a MOC attached to the main queue, and respond to user save events on the main queue.
If you want to watch how dispatch queues interact with the main run loop, you can install a CFRunLoopObserver on the main run loop, and watch how it processes the various input sources in the main run loop.
If you've never done that, it's an interesting and educational experiment (though you can't assume what you observe will always be that way).
Anyway, I generally try to avoid both dispatch_sync and performBlockAndWait.
I call dispatch_async(dispatch_get_main_queue(), ...) from several background threads. However, it appears that occasionally the code in the dispatch block is not executed. Could this be because I dispatch asynchronously and the thread exits before the main queue can execute the code?
Have you tried putting an NSLog in the beginning of your code snippet to be absolutely sure that it's not executing? Sometimes an if statement with faulty logic will pre-terminate your code. (From my past experience ;])
The moment the dispatch_async() call returns, it's not important whether or not the thread that invoked it subsequently exits or not - the "request is in the system" so to speak! Something else is happening in those "occasional" cases. Does your program have a run loop or call dispatch_main() at the end of its main function? Not clear whether this is a Cocoa/iOS/POSIX application you're describing.
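The point that the submitting thread may exit right after the dispatch can be checked with a small experiment. In this Swift sketch a private serial queue (`target`, a made-up name) stands in for the main queue, so that it runs in a plain command-line program without a run loop.

```swift
import Foundation

let target = DispatchQueue(label: "example.target")   // stand-in for the main queue
let blockRan = DispatchSemaphore(value: 0)

// A short-lived thread that enqueues a block and then exits immediately.
let submitter = Thread(block: {
    target.async {
        blockRan.signal()
    }
    // The thread's entry point returns here; the block is already in the queue.
})
submitter.start()

// Wait (with a timeout) for the block to run, regardless of the thread's fate.
let outcome = blockRan.wait(timeout: .now() + 5)
print(outcome == .success ? "block ran" : "block did not run")
```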
That is, if we queue the same thing several times there will be no concurrency.
The one we queued first will be executed first.
I mean there is only one main thread right?
I have found a nice answer here:
NSOperationQueue and concurrent vs non-concurrent
So to make all added operations serial you can always set:
[[NSOperationQueue mainQueue] setMaxConcurrentOperationCount:1];
And the answer is... YES and NO
when you create a new NSOperation to add to your queue, you can use
- (void)setQueuePriority:(NSOperationQueuePriority)priority
according to the documentation, the queue will use this priority, and other factors as inter dependency to decide what operation will be executed next.
As long as your operations have the same priority and no inter-operation dependencies, they should be executed in the same order you added them, maybe with other, system related operations, inserted between them.
From documentation:
The NSOperationQueue class regulates the execution of a set of NSOperation objects. After being added to a queue, an operation remains in that queue until it is explicitly canceled or finishes executing its task. Operations within the queue (but not yet executing) are themselves organized according to priority levels and inter-operation object dependencies and are executed accordingly. An application may create multiple operation queues and submit operations to any of them.
Inter-operation dependencies provide an absolute execution order for operations, even if those operations are located in different operation queues. An operation object is not considered ready to execute until all of its dependent operations have finished executing. For operations that are ready to execute, the operation queue always executes the one with the highest priority relative to the other ready operations. For details on how to set priority levels and dependencies, see NSOperation Class Reference.
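When a specific order matters, an explicit dependency is more reliable than relying on submission order or priority. A short Swift sketch, using a private queue rather than the main queue so that it can run in a command-line program (`OrderLog` and the operation names are hypothetical):

```swift
import Foundation

// Hypothetical lock-protected log of the order in which operations ran.
final class OrderLog: @unchecked Sendable {
    private let lock = NSLock()
    private var entries: [String] = []

    func add(_ entry: String) {
        lock.lock()
        defer { lock.unlock() }
        entries.append(entry)
    }

    var all: [String] {
        lock.lock()
        defer { lock.unlock() }
        return entries
    }
}

let orderLog = OrderLog()

let serialQueue = OperationQueue()
serialQueue.maxConcurrentOperationCount = 1   // one operation at a time

let first = BlockOperation { orderLog.add("first") }
let second = BlockOperation { orderLog.add("second") }

// `second` is not ready to execute until `first` has finished.
second.addDependency(first)

// Deliberately submitted in the "wrong" order; the dependency still wins.
serialQueue.addOperations([second, first], waitUntilFinished: true)

let executionOrder = orderLog.all
print(executionOrder)
```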
About threads:
Although you typically execute operations by adding them to an operation queue, doing so is not required. It is also possible to execute an operation object manually by calling its start method, but doing so does not guarantee that the operation runs concurrently with the rest of your code. The isConcurrent method of the NSOperation class tells you whether an operation runs synchronously or asynchronously with respect to the thread in which its start method was called. By default, this method returns NO, which means the operation runs synchronously in the calling thread.
When you submit a nonconcurrent operation to an operation queue, the queue itself creates a thread on which to run your operation. Thus, adding a nonconcurrent operation to an operation queue still results in the asynchronous execution of your operation object code.
So, if I understand correctly, there will be no concurrency.