Apple doc's GCD Producer-Consumer solution wrong? - ios

In the Migrating Away from Threads section of Apple's Concurrency Programming Guide, there is
Changing Producer-Consumer Implementations, which claims that the typical multistep pthread mutex + condition variable implementation can be simplified using GCD.
With dispatch queues, you can simplify the producer and consumer
implementations into a single call:
dispatch_async(queue, ^{
// Process a work item.
});
When your producer has work to be done, all it has to do is add that work to a queue and let the queue process the item.
The Producer-Consumer problem is also known as the Bounded-Buffer problem, yet the above makes no mention of a buffer, its bound or the consumer, let alone blocking the producer & consumer in order to avoid over/under runs.
How can this be a valid solution?

In the solution decribed at that Apple document:
There is no buffer because no buffer is needed;
System load is the bound;
Consumers are tasks.
Say you have multiple producers and consumers, producers place data in a shared buffer and consumers read data from that shared buffer. A semaphore or monitor is used to synchronise access to the shared buffer, and the buffer size is fixed so as to limit the amount of data that are being produced according to the rate they’re being consumed, hence throttling the producer.
Under Grand Central Dispatch, consumers are tasks dispatched to a queue. Since tasks are Objective-C blocks, a producer doesn’t need a buffer to tell a consumer about the data it should process: Objective-C blocks automatically capture objects they reference.
For example:
// Producer implementation
while (…) {
id dataProducedByTheProducer;
// Produce data and place it in dataProducedByTheProducer
dataProducedByTheProducer = …;
// Dispatch a new consumer task
dispatch_async(queue, ^{
// This task, which is an Objective-C block, is a consumer.
//
// Do something with dataProducedByTheProducer, which is
// the data that would otherwise be placed in the shared
// buffer of a traditional, semaphore-based producer-consumer
// implementation.
//
// Note that an Objective-C block automatically keeps a
// strong reference to any Objective-C object referenced
// inside of it, and the block releases said object when
// the block itself is released.
NSString *s = [dataProducedByTheProducer …];
});
}
The producer may place as many consumer tasks as data it can produce. However, this doesn’t mean that GCD will fire the consumer tasks at the same rate. GCD uses operating system information to control the amount of tasks that are executed according to the current system load. The producer itself isn’t throttled, and in most cases it doesn’t have to be because of GCD’s intrinsic load balancing.
If there’s actual need to throttle the producer, one solution is to have a master that would dispatch n producer tasks and have each consumer notify the master (via a task that’s dispatched after the consumer has finished its job) that it has ended, in which case the master would dispatch another producer task. Alternatively, the consumer itself could dispatch a producer task upon completion.
Specifically answering the items you’ve addressed:
The Producer-Consumer problem is also known as the Bounded-Buffer problem, yet the above makes no mention of a buffer
A shared buffer isn’t needed because consumers are Objective-C blocks, which automatically capture data that they reference.
its bound
GCD bounds the number of dispatched tasks according to the current system load.
or the consumer
Consumers are the tasks dispatched to GCD queues.
let alone blocking the producer & consumer in order to avoid over/under runs
There’s no need for blocking since there’s no shared buffer. As each consumer is an Objective-C block capturing the produced data via the Objective-C block context capturing mechanism, there’s a one-to-one relation between consumer and data.

Related

How does a DispatchQueue work? (specifically multithreading)

I don't understand the workings of a DispatchQueue and wanted to learn more about how they implement the foundational queueing theory requirements. I tried to inspect a queue using:
dump(DispatchQueue.global())
And this gave this output:
- <OS_dispatch_queue_global: com.apple.root.default-qos[0x10c041f00] = { xref = -2147483648, ref = -2147483648, sref = 1, target = [0x0], width = 0xfff, state = 0x0060000000000000, in-barrier}> #0
- super: OS_dispatch_queue
- super: OS_dispatch_object
- super: OS_object
- super: NSObject
I got that the label is com.apple.root.default-qos, and this is specified in the Apple docs and the class is the packaged OS_dispatch_queue_global. I understand qos is queryable on the queue itself and that makes sense as well. Width I think just means the allocated memory size.
What I don't understand are the relevances of xref, ref and sref, I think they are internal ids for the queues but I am not sure. I think they are related to fundamental queueing concepts (multithreading came to mind) but would be great to hone into this in more detail.
Is the autoreleaseFrequency hidden from this debug description? Also, what does in-barrier = 0 mean? I tried creating a custom queue and this was replaced by in-flight = 0.. so confused about that as well.
Any ideas on how these undocumented variables relate to queueing theory? I think these are undocumented internals of the API, so any educated and justified explanations would be fine!
Thanks.
Why ask this?
This is a fairly broad question about the internals of grand-central-dispatch. I had difficulty understanding the dumped output because the original WWDC '10 videos and slides for GCD are no longer public. I also didn't know about the open-source libdispatch repo (thanks Rob). That needn't be a problem, but there are no related QAs on SO explaining the topic in detail.
Why GCD?
According to the WWDC '10 GCD transcripts (Thanks Rob), the main idea behind the API was to simplify the boilerplate associated with using the #selector API for multithreading.
Benefits of GCD
Apple released a new block-based API instead of going with function pointers, to also enable type-safe code that wouldn't crash if the block had the wrong type signature. Using typedefs also made code cleaner when used in function parameters, local variables and #property declarations. Queues allow you to capture code and some state as a chunk of data that get managed, enqueued and executed automatically behind the scenes.
The same session mentions how GCD manages low-level threads under the hood. It enqueues blocks to execute on threads when they need to be executed and then releases those threads (PThreads to be precise) when they are no longer referenced. GCD manages threads automatically and doesn't expose this API - when a DispatchWorkItem is dequeued GCD creates a thread for this block to execute on.
Drawbacks of performSelector
performSelector:onThread:withObject:waitUntilDone: has numerous drawbacks that suggest poor design for the modern challenges of concurrency, waiting, synchronisation. leads to pyramids of doom when switching threads in a func. Furthermore, the NSObject.performSelector family of threading methods are inflexible and limited:
No options to optimise for concurrent, initially inactive, or synchronisation on a particular thread. Unlike GCD.
Only selectors can be dispatched on to new threads (awful).
Lots of threads for a given function leads to messy code (pyramids of doom).
No support for queueing without a limited (at the time when GCD was announced in iOS 4) NSOperation API. NSOperations are a high-level, verbose API that became more powerful after incorporating elements of dispatch (low-level API that became GCD) in iOS 4.
Lots of bugs related to unhandled invalid Selector errors (type safety).
DispatchQueue internals
I believe the xref, ref and sref are internal registers that manage reference counts for automatic reference counting. GCD calls dispatch_retain and dispatch_release in most cases when needed, so we don't need to worry about releasing a queue after all its blocks have been executed. However, there were cases when a developer could call retain and release manually when trying to ensure the queue is retained even when not directly in use. These registers allow libDispatch to crash when release is called on a queue with a positive reference count, for better error handling.
When calling a block with DispatchQueue.global().async or similar, I believe this increments the reference count of that queue (xref and ref).
The variables in the question are not documented explicitly, but from what I can tell:
xref counts the number of external references to a general DispatchQueue.
ref counts the total number of references to a general DispatchQueue.
sref counts the number of references to API serial/concurrent/runloop queues, sources and mach channels (these need to be tracked differently as they are represented using different types).
in-barrier looks like an internal state flag (DispatchWorkItemFlag) to track whether new work items submitted to a concurrent queue should be scheduled or not. Only once the barrier work item finishes, the queue returns to scheduling work items that were submitted after the barrier. in-flight means that there is no barrier in force currently.
state is also not documented explicitly but I presume points to memory where the block can access variables from the scope where the block was scheduled.

synchronized block within dispatch_async

I have seen code that dispatch async to a main queue or private dispatch queue (serial) and then in the dispatch code block is #synchronized. Under what circumstance do you want to do that? Isn't a serial queue already providing the synchronization needed?
Can the synchronized block be replaced with another GCD dispatch?
#synchronized() ensures that contained code (for a given token as the argument to #synchronized) is only run on one thread at a time.
Blocks submitted to a serial queue are executed one at a time, ie. a given block is not executed until all blocks submitted before it have finished executing. As long as a shared resource is only being accessed from code running on a serial queue, there's no need to synchronize/lock access to it. However, just because a given queue is serial, doesn't mean that other queues/threads (even serial queues!) aren't running simultaneously, and accessing the same shared resource.
Using #synchronized() is one way to prevent these multiple threads/queues from accessing the resource at the same time. Note that all code that access the shared resource needs to be wrapped with #synchronized().
Yes, you can use another GCD dispatch instead of a synchronized block. The "GCD way" of doing this would be to serialize all access to the shared resource using a serial queue. So, any time access to the shared resource needs to be made, you dispatch that code (using either dispatch_sync() or dispatch_async() depending on use case) to the serial queue associated with the resource. Of course, this means that the resource-access-queue must be visible/accessible to all parts of the program that access the shared resource. (You essentially have the same problem with #synchronized() in that its lock token must be accessible wherever it needs to be used, but it's a little easier since it can just be a string constant.)
The queue yes, it is synchronized, but if you access any 'external' object within, they are not synchronized.
If there are many threads, you know that each one will have its turn on the object. A tipical case is when you perform a CoreData import asynchronously, you have to #synchronized the context or the store coordinator.
Thread-safe is about making mutable shared state either immutable, or unshared. In this case, synchronize and serial queues are ways to temporarily unshare (prevent concurrent access), and both are valid.
However, consider the case where you have disjoint sets of related info inside the same object. For example, a bill with 1) parts of an address (city, street, etc), and 2) price, taxes, discount. Both need to be protected to avoid inconsistent state (object A sets a new street, while object B reads the old city and the new street), but both are unrelated. In this case, you should use the lower level of granularity to avoid blocks between unrelated code.
So a rule would be: don't use synchronize on unrelated sets of variables inside the same object, because it would cause unneeded blocks between them. You can use a queue + synchronize, or a queue per set, but not synchronized on both sets.
The key is that synchronize refers to the one and only intrinsic lock of the object, and that token can only be held by once. It accomplishes the same goal as if you routed all related code through one queue, except that you can have multiple queues (and thus, lower granularity), but only one intrinsic lock.
Back to your example. Assuming that the object is documented as “state X is protected by synchronize”, the use of synchronize inside the queue is useful to block related methods that could access that same state. So maybe queue and synchronized are protecting different things, or the serial queue is there to perform a different task.
Another reason to prefer a queue is to write a more sophisticated pattern like a read-write lock. Example:
NSMutableDictionary *_dic = [NSMutableDictionary new];
dispatch_queue_t _queue = dispatch_queue_create("com.concurrent.queue", DISPATCH_QUEUE_CONCURRENT);
- (id) objectForKey:(id)key
{
__block obj;
dispatch_sync(_queue, ^{
obj = [_dic objectForKey: key];
});
return obj;
}
- (void) setObject:(id)obj forKey:(id)key
{
// exclusive access while writing
dispatch_barrier_async(_queue, ^{
[_dic setObject:obj forKey:key];
});
}

How to use GCD for lightweight transactional locking of resources?

I'm trying to use GCD as a replacement for dozens of atomic properties. I remember at WWDC they were talking about that GCD could be used for efficient transactional locking mechanisms.
In my OpenGL ES runloop method I put all drawing code in a block executed by dispatch_sync on a custom created serial queue. The runloop is called by a CADisplayLink which is to my knowledge happening on the main thread.
There are ivars and properties which are used both for drawing but also for controlling what will be drawn. The problem is that there must be some locking in place to prevent concurrency problems, and a way of transactionally querying and modifying the state of the OpenGL ES scene from the main thread between two drawn frames.
I can modify a group of properties in a transactional way with GCD by executing a block on that serial queue.
But it seems I can't read values into the main thread, using GCD, while blocking the queue that executes the drawing code. dispatch_synch doesn't have a return value, but I want to get access to presentation values exactly between the drawing of two frames both for reading and writing.
Is it this barrier thing they were talking about? How does that work?
This is what the async writer / sync reader model was designed to accomplish. Let's say you have an ivar (and for purpose of discussion let's assume that you've gone a wee bit further and encapsulated all your ivars into a single structure, just for simplicity's sake:
struct {
int x, y;
char *n;
dispatch_queue_t _internalQueue;
} myIvars;
Let's further assume (for brevity) that you've initialized the ivars in a dispatch_once() and created the _internalQueue as a serial queue with dispatch_queue_create() earlier in the code.
Now, to write a value:
dispatch_async(myIvars._internalQueue, ^{ myIvars.x = 10; });
dispatch_async(myIvars._internalQueue, ^{ myIvars.n = "Hi there"; });
And to read one:
__block int val; __block char *v;
dispatch_sync(myIvars._internalQueue, ^{ val = myIvars.x; });
dispatch_sync(myIvars._internalQueue, ^{ v = myIvars.n; })
Using the internal queue makes sure everything is appropriately serialized and that writes can happen asynchronously but reads wait for all pending writes to complete before giving you back the value. A lot of "GCD aware" data structures (or routines that have internal data structures) incorporate serial queues as implementation details for just this purpose.
dispatch_sync allows you to specify a second argument as completion block where you can get the values from your serial queue and use them on your main thread.So it would look something like
dispatch_sync(serialQueue,^{
//execute a block
dispatch_async(get_dispatch_main_queue,^{
//use your calculations here
});
});
And serial queues handle the concurrency part themselves. So if another piece is trying to access the same code at the same time it will be handled by the queue itself.Hope this was of little help.

is there a way that the synchronized keyword doesn't block the main thread

Imagine you want to do many thing in the background of an iOS application but you code it properly so that you create threads (for example using GCD) do execute this background activity.
Now what if you need at some point to write update a variable but this update can occur or any of the threads you created.
You obviously want to protect that variable and you can use the keyword #synchronized to create the locks for you but here is the catch (extract from the Apple documentation)
The #synchronized() directive locks a section of code for use by a
single thread. Other threads are blocked until the thread exits the
protected code—that is, when execution continues past the last
statement in the #synchronized() block.
So that means if you synchronized an object and two threads are writing it at the same time, even the main thread will block until both threads are done writing their data.
An example of code that will showcase all this:
// Create the background queue
dispatch_queue_t queue = dispatch_queue_create("synchronized_example", NULL);
// Start working in new thread
dispatch_async(queue, ^
{
// Synchronized that shared resource
#synchronized(sharedResource_)
{
// Write things on that resource
// If more that one thread access this piece of code:
// all threads (even main thread) will block until task is completed.
[self writeComplexDataOnLocalFile];
}
});
// won’t actually go away until queue is empty
dispatch_release(queue);
So the question is fairly simple: How to overcome this ? How can we securely add a locks on all the threads EXCEPT the main thread which, we know, doesn't need to be blocked in that case ?
EDIT FOR CLARIFICATION
As you some of you commented, it does seem logical (and this was clearly what I thought at first when using synchronized) that only two the threads that are trying to acquire the lock should block until they are both done.
However, tested in a real situation, this doesn't seem to be the case and the main thread seems to also suffer from the lock.
I use this mechanism to log things in separate threads so that the UI is not blocked. But when I do intense logging, the UI (main thread) is clearly highly impacted (scrolling is not as smooth).
So two options here: Either the background tasks are too heavy that even the main thread gets impacted (which I doubt), or the synchronized also blocks the main thread while performing the lock operations (which I'm starting reconsidering).
I'll dig a little further using the Time Profiler.
I believe you are misunderstanding the following sentence that you quote from the Apple documentation:
Other threads are blocked until the thread exits the protected code...
This does not mean that all threads are blocked, it just means all threads that are trying to synchronise on the same object (the _sharedResource in your example) are blocked.
The following quote is taken from Apple's Thread Programming Guide, which makes it clear that only threads that synchronise on the same object are blocked.
The object passed to the #synchronized directive is a unique identifier used to distinguish the protected block. If you execute the preceding method in two different threads, passing a different object for the anObj parameter on each thread, each would take its lock and continue processing without being blocked by the other. If you pass the same object in both cases, however, one of the threads would acquire the lock first and the other would block until the first thread completed the critical section.
Update: If your background threads are impacting the performance of your interface then you might want to consider putting some sleeps into the background threads. This should allow the main thread some time to update the UI.
I realise you are using GCD but, for example, NSThread has a couple of methods that will suspend the thread, e.g. -sleepForTimeInterval:. In GCD you can probably just call sleep().
Alternatively, you might also want to look at changing the thread priority to a lower priority. Again, NSThread has the setThreadPriority: for this purpose. In GCD, I believe you would just use a low priority queue for the dispatched blocks.
I'm not sure if I understood you correctly, #synchronize doesn't block all threads but only the ones that want to execute the code inside of the block. So the solution probably is; Don't execute the code on the main thread.
If you simply want to avoid having the main thread acquire the lock, you can do this (and wreck havoc):
dispatch_async(queue, ^
{
if(![NSThread isMainThread])
{
// Synchronized that shared resource
#synchronized(sharedResource_)
{
// Write things on that resource
// If more that one thread access this piece of code:
// all threads (even main thread) will block until task is completed.
[self writeComplexDataOnLocalFile];
}
}
else
[self writeComplexDataOnLocalFile];
});

A simple method for a "master" thread to monitor "slave" threads

As my first real foray into using pthreads, I'm looking to adapt an already written app of mine to use threads.
The paradigm I have in mind is basically to have one "master" thread which iterates through a list of data items to be processed, launching a new thread for each, with MAX_THREADS threads running at any given time (until the number of remaining tasks is less than this), each of which perform the same task on a single data element within a list.
The master thread needs to be aware of whenever any thread has completed its task and returned (or pthread_exit()'ed), immediately launching a new thread to perform the next task in the list.
What I'm wondering about is what are people's preferred methods for working with such a design? Data considerations aside, what would be the simplest set of pthreads functions to use to accomplish this? Obviously, pthread_join() is out as a means for "checking up" on threads.
Early experiments have been using a struct, passed as the final argument to pthread_create(), which contains an element called "running" which the thread sets to true on startup and resets just before returning. The master thread simply checks the current value of this struct element for each thread in a loop.
Here are the data the program uses for thread management:
typedef struct thread_args_struct
{
char *data; /* the data item the thread will be working on */
int index; /* thread's index in the array of threads */
int thread_id; /* thread's actual integer id */
int running; /* boolean status */
int retval; /* value to pass back from thread on return */
} thread_args_t;
/*
* array of threads (only used for thread creation here, not referenced
* otherwise)
*/
pthread_t thread[MAX_THREADS];
/*
* array of argument structs
*
* a pointer to the thread's argument struct will be passed to it on creation,
* and the thread will place its return value in the appropriate struct element
* before returning/exiting
*/
thread_args_t thread_args[MAX_THREADS];
Does this seem like a sound design? Is there a better, more standardized method for monitoring threads' running/exited status, a more "pthreads-y" way? I'm looking to use the simplest, clearest, cleanest mechanism possible which won't lead to any unexpected complications.
Thanks for any feedback.
There isn't so much a "pthreads-y" way as a (generic) multi-threading way. There is nothing wrong with what you have but it is more complicated and inefficient than it need be.
A more standard design is to use a thread pool. The master thread spawns a bunch of worker threads that read a queue. The master puts work in the queue and all the workers take a shot at processing the work in the queue. This eliminates the need to constantly start and terminate threads (though more sophisticated pools can have some mechanism to increase/decrease the pool size based on the work load). If the threads have to return data or status information they can use an output queue (maybe just a pointer to the actual data) that the master can read.
This still leaves the issue of how to get rid of the threads when you are done processing. Again, it is a master-worker relationship so it is advised that the master tell the slaves to shut themselves down. This amounts to using some program switch (such as you currently have), employing a condition variable somewhere, sending a signal, or cancelling the thread. There are a lot questions (and good answers) on this topic here.

Resources