NSMutableDictionary - EXC_BAD_ACCESS - simultaneous read/write - iOS

I was hoping for some help with my app.
I have a setup where multiple threads access a shared NSMutableDictionary owned by a singleton class. The threads access the dictionary in response to downloading JSON and processing it. The singleton class basically prevents duplication of downloaded objects, which each have a unique id number.
i.e.
//NSURLConnection calls:
[[Singleton sharedInstance] processJSON:data];
@interface Singleton
+ (Singleton *)sharedInstance;
@property (nonatomic, strong) NSMutableDictionary *store;
@end
@implementation
- (void)processJSON:(NSData *)data {
    ...
    someCustomClass *potentialEntry = [someCustomClass parse:data];
    ...
    if (![self entryExists:potentialEntry.stringId])
        [self addEntry:potentialEntry];
    ...
}
- (BOOL)entryExists:(NSString *)objectId {
    if (self.store[objectId])
        return YES;
    else
        return NO;
}
- (void)addEntry:(someCustomClass *)object {
    self.store[object.stringId] = object;
}
There can be as many as 5-10 threads calling processJSON at once.
Not immediately, but after a few minutes of running (quicker on the iPhone than on the simulator), I get the dreaded EXC_BAD_ACCESS.
I don't profess to know how NSMutableDictionary works, but I would guess that there's some kind of hash table in the background which needs to be updated when assigning objects and read when accessing objects. Therefore, if threads were to simultaneously read and write to a dictionary, this error could occur - maybe because an object has moved in memory?
I'm hoping that someone with more knowledge on the subject could enlighten me!
As for solutions, I was thinking of the singleton class having an NSOperationQueue with a maximum concurrent operation count of 1 and then using operationWithBlock: whenever I want to access the NSDictionary. The only problem is that it makes calling processJSON an asynchronous function and I can't return the created object straight away; I'd have to use a block and that would be a bit messier. Is there any way of using @synchronized? Would that work well?

I'd draw your attention to the Synchronization section of the iOS rendition of the Threading Programming Guide that Hot Licks pointed you to. One of those locking mechanisms, or the use of a dedicated serial queue, can help you achieve thread safety.
Your intuition regarding the serial operation queue is promising, though frequently people will use a serial dispatch queue for this (e.g., so you can call dispatch_sync from any queue to your dictionary's serial queue), achieving both a controlled mechanism for interacting with it as well as synchronous operations. Or, even better, you can use a custom concurrent queue (not a global queue), and perform reads via dispatch_sync and perform writes via dispatch_barrier_async, achieving an efficient reader/writer scheme (as discussed in WWDC 2011 - Mastering GCD or WWDC 2012 - Asynchronous Design Patterns).
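As a rough sketch of that reader/writer scheme applied to the asker's dictionary (the isolationQueue name and method shapes are assumptions for illustration, not the only way to structure it):
// One-time setup, e.g. in the singleton's init:
// _isolationQueue = dispatch_queue_create("com.example.store.isolation", DISPATCH_QUEUE_CONCURRENT);
- (BOOL)entryExists:(NSString *)objectId {
    __block BOOL exists = NO;
    dispatch_sync(self.isolationQueue, ^{ // reads may run concurrently with each other
        exists = (self.store[objectId] != nil);
    });
    return exists;
}
- (void)addEntry:(someCustomClass *)object {
    dispatch_barrier_async(self.isolationQueue, ^{ // a barrier runs exclusively, after pending reads
        self.store[object.stringId] = object;
    });
}
Note that a separate entryExists: check followed by addEntry: still leaves a race between the test and the write; if the check guards the insert, combine both into a single barrier block.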
The Eliminating Lock-Based Code section of the Concurrency Programming Guide outlines some of the rationale for using a serial queue for synchronization versus the traditional locking techniques.
The Grand Central Dispatch (GCD) Reference and the dispatch queue discussion in the Concurrency Programming Guide should provide quite a bit of information.

The simplest solution is to just put all of the code that accesses the dict in a @synchronized block.
Serial operation queues are great, but that sounds like overkill to me for this, as you aren't guarding a whole ecosystem of data, just one structure.
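For instance, a minimal sketch of that applied to the asker's singleton, assuming the check and the insert are combined so both happen under the same lock:
- (void)addEntryIfAbsent:(someCustomClass *)object {
    @synchronized (self) {
        // Test and write under one lock, so no other thread can
        // add the same id between the check and the assignment.
        if (!self.store[object.stringId]) {
            self.store[object.stringId] = object;
        }
    }
}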


How does a DispatchQueue work? (specifically multithreading)

I don't understand the workings of a DispatchQueue and wanted to learn more about how they implement the foundational queueing theory requirements. I tried to inspect a queue using:
dump(DispatchQueue.global())
And this gave this output:
- <OS_dispatch_queue_global: com.apple.root.default-qos[0x10c041f00] = { xref = -2147483648, ref = -2147483648, sref = 1, target = [0x0], width = 0xfff, state = 0x0060000000000000, in-barrier}> #0
- super: OS_dispatch_queue
- super: OS_dispatch_object
- super: OS_object
- super: NSObject
I got that the label is com.apple.root.default-qos, and this is specified in the Apple docs and the class is the packaged OS_dispatch_queue_global. I understand qos is queryable on the queue itself and that makes sense as well. Width I think just means the allocated memory size.
What I don't understand are the relevances of xref, ref and sref; I think they are internal ids for the queues, but I am not sure. I think they are related to fundamental queueing concepts (multithreading came to mind), but it would be great to hone in on this in more detail.
Is the autoreleaseFrequency hidden from this debug description? Also, what does in-barrier = 0 mean? I tried creating a custom queue and this was replaced by in-flight = 0, so I'm confused about that as well.
Any ideas on how these undocumented variables relate to queueing theory? I think these are undocumented internals of the API, so any educated and justified explanations would be fine!
Thanks.
Why ask this?
This is a fairly broad question about the internals of grand-central-dispatch. I had difficulty understanding the dumped output because the original WWDC '10 videos and slides for GCD are no longer public. I also didn't know about the open-source libdispatch repo (thanks Rob). That needn't be a problem, but there are no related QAs on SO explaining the topic in detail.
Why GCD?
According to the WWDC '10 GCD transcripts (Thanks Rob), the main idea behind the API was to simplify the boilerplate associated with using the #selector API for multithreading.
Benefits of GCD
Apple released a new block-based API instead of going with function pointers, partly to enable type-safe code that wouldn't crash if the block had the wrong type signature. Using typedefs also made code cleaner when used in function parameters, local variables and @property declarations. Queues allow you to capture code and some state as a chunk of data that gets managed, enqueued and executed automatically behind the scenes.
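For example, the typedef point looks roughly like this in Objective-C (the names here are illustrative, not from the session):
// Without the typedef, the block type would be spelled out at every use site.
typedef void (^CompletionHandler)(NSData *data, NSError *error);
@property (nonatomic, copy) CompletionHandler completion;
- (void)fetchWithCompletion:(CompletionHandler)handler;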
The same session mentions how GCD manages low-level threads under the hood. It enqueues blocks to execute on threads when they need to be executed and then releases those threads (PThreads to be precise) when they are no longer referenced. GCD manages threads automatically and doesn't expose this API - when a DispatchWorkItem is dequeued GCD creates a thread for this block to execute on.
Drawbacks of performSelector
performSelector:onThread:withObject:waitUntilDone: has numerous drawbacks that suggest poor design for the modern challenges of concurrency, waiting and synchronisation, and it leads to pyramids of doom when switching threads within a function. Furthermore, the NSObject.performSelector family of threading methods is inflexible and limited (see the sketch after this list):
No options to optimise for concurrent execution, initial inactivity, or synchronisation on a particular thread, unlike GCD.
Only selectors can be dispatched onto new threads (awful).
Lots of threads for a given function leads to messy code (pyramids of doom).
No support for queueing beyond the NSOperation API, which was limited at the time GCD was announced (iOS 4). NSOperation is a high-level, verbose API that became more powerful after incorporating elements of dispatch (the low-level API that became GCD) in iOS 4.
Lots of bugs related to unhandled invalid selector errors (no type safety).
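To make the contrast concrete, a hedged Objective-C sketch (workerThread, process: and data are hypothetical names):
// Old style: only a selector and a single object argument can cross threads,
// and the target thread must be running a run loop to service the call.
[self performSelector:@selector(process:)
             onThread:workerThread
           withObject:data
        waitUntilDone:NO];
// GCD style: an arbitrary block, capturing whatever state it needs,
// type-checked at compile time.
dispatch_async(dispatch_get_global_queue(QOS_CLASS_DEFAULT, 0), ^{
    [self process:data];
});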
DispatchQueue internals
I believe xref, ref and sref are internal reference counts used for automatic reference counting. GCD calls dispatch_retain and dispatch_release in most cases when needed, so we don't need to worry about releasing a queue after all its blocks have been executed. However, there were cases when a developer could call retain and release manually when trying to ensure the queue was retained even when not directly in use. These counts allow libdispatch to crash with a useful error when a queue is over-released, for better error handling.
When submitting a block with DispatchQueue.global().async or similar, I believe this increments the reference count of that queue (xref and ref).
The variables in the question are not documented explicitly, but from what I can tell:
xref counts the number of external references to a general DispatchQueue.
ref counts the total number of references to a general DispatchQueue.
sref counts the number of references to API serial/concurrent/runloop queues, sources and mach channels (these need to be tracked differently as they are represented using different types).
in-barrier looks like an internal state flag (cf. DispatchWorkItemFlags) tracking whether new work items submitted to a concurrent queue should be scheduled or not. Only once the barrier work item finishes does the queue return to scheduling work items that were submitted after the barrier. in-flight means that there is no barrier in force currently.
state is also not documented explicitly but I presume points to memory where the block can access variables from the scope where the block was scheduled.

-allKeys on background thread results in error: __NSDictionaryM was mutated while being enumerated

I've come across an interesting issue using mutable dictionaries on background threads.
Currently, I am downloading data in chunks on one thread, adding it to a data set, and processing it on another background thread. The overall design works for the most part aside from one issue: On occasion, a function call to an inner dictionary within the main data set causes the following crash:
*** Collection <__NSDictionaryM: 0x13000a190> was mutated while being enumerated.
I know this is a fairly common crash to have, but the strange part is that it's not crashing in a loop on this collection. Instead, the exception breakpoint in Xcode is stopping on the following line:
NSArray *tempKeys = [temp allKeys];
This leads me to believe that one thread is adding items to this collection while the NSMutableDictionary's internal function call to -allKeys is enumerating over the keys in order to return the array on another thread.
My question is: Is this what's happening? If so, what would be the best way to avoid this?
Here's the gist of what I'm doing:
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^(void) {
    for (NSString *key in [[queue allKeys] reverseObjectEnumerator]) { // To prevent crashes
        NEXActivityMap *temp = queue[key];
        NSArray *tempKeys = [temp allKeys]; // <= CRASHES HERE
        if (tempKeys.count > 0) {
            // Do other stuff
        }
    }
});
You can use @synchronized, and it will work. But this is mixing up two different ideas:
Threads have been around for many years. A new thread opens a new control flow. Code on different threads runs potentially concurrently, causing conflicts like the one you had. To prevent these conflicts, you have to use locks, which is what @synchronized does.
GCD is the more modern concept. GCD runs "on top of threads": it uses threads, but this is transparent to you. You do not have to care about this. Code running in different queues runs potentially concurrently, causing conflicts. To prevent these conflicts, you have to use one queue for shared resources.
You are already using GCD, which is a good idea:
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^(void) {
The same code with threads would look like this:
[[NSThread mainThread] performSelector:…];
So, using GCD, you should use GCD to prevent the conflicts. What you are doing is using GCD incorrectly and then "repairing" that with locks.
Simply put all accesses to the shared resource (in your case the mutable dictionary referred to by temp) into one serial queue.
Create a queue at the beginning for the accesses. This is a one-time setup.
You can use one of the existing queues as you do in your code, but you have to use a serial one! This potentially leads to long queues with waiting tasks (in your example, blocks). Different tasks in a serial queue are executed one after another, even if there are CPU cores idle. So it is not a good idea to put too many tasks into one queue. Create a queue for every shared resource or "subsystem":
dispatch_queue_t tempQueue;
tempQueue = dispatch_queue_create("tempQueue", NULL);
When code wants to access the mutable dictionary, put that access on the queue. It looks like this:
dispatch_sync(tempQueue, // or dispatch_async, if that is possible
    ^{
        [temp setObject:… forKey:…]; // Or whatever you want to do.
    });
You have to put every piece of code that accesses the shared resource on the queue, just as you would have to wrap every access in locks when using threads.
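Applied to the crashing loop from the question, a read would then look something like this (a sketch reusing the answer's tempQueue and the question's variable names):
__block NSArray *keysSnapshot;
dispatch_sync(tempQueue, ^{
    keysSnapshot = [temp allKeys]; // no mutation can interleave with this copy
});
for (NSString *key in keysSnapshot) {
    // Do other stuff with a stable snapshot of the keys.
}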
From Apple's documentation, "Thread Safety Summary":
Mutable objects are generally not thread-safe. To use mutable objects in a threaded application, the application must synchronize access to them using locks. (For more information, see Atomic Operations.) In general, the collection classes (for example, NSMutableArray, NSMutableDictionary) are not thread-safe when mutations are concerned. That is, if one or more threads are changing the same array, problems can occur. You must lock around spots where reads and writes occur to assure thread safety.
In your case, the following scenario happens: from one thread, you add elements to the dictionary. On another thread, you call the allKeys method. While this method copies all keys into an array, the other thread adds a new key. This causes the exception.
To avoid that, you have several options.
Because you are using dispatch queues, the preferred way is to put all code that accesses the same mutable dictionary instance onto a private serial dispatch queue.
The second option is passing an immutable dictionary copy to the other thread. In that case, no matter what happens to the original dictionary on the first thread, the data will stay consistent. Note that you will probably need a deep copy, since you use a dictionary/array hierarchy (see the sketch after these options).
Alternatively, you can wrap every point where you access the collections with locks. Using @synchronized also implicitly creates a recursive lock for you.
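Returning to the copy option above, a minimal sketch (note that copyItems:YES copies only one level deep and requires the values to adopt NSCopying; a nested dictionary/array hierarchy needs a true deep copy, e.g. via archiving and unarchiving):
NSDictionary *snapshot = [[NSDictionary alloc] initWithDictionary:queue copyItems:YES];
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
    // Read-only work on the copy; the original dictionary can mutate freely.
    for (NSString *key in snapshot) {
        // ...
    }
});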
How about wrapping where you get the keys AND where you set the keys with @synchronized?
Example:
- (void)myMethod:(id)anObj
{
    @synchronized (anObj)
    {
        // Everything between the braces is protected by the @synchronized directive.
    }
}

NSMutableArray Thread Concurrency with GCD

I have an NSMutableArray in a "sharedStore"-pattern singleton.
Publicly, it's accessible only through methods that cast it as an NSArray. Within the class, it's
@property (nonatomic, copy) NSMutableArray *myItems;
This array never gets manipulated outside the singleton, but ViewControllers send the singleton messages to manipulate this array. Some of these messages empty the array, some re-populate it, etc.
Having ended up in a situation where the array was empty in one method call and not yet empty in the next, I've started implementing some concurrency behaviour.
Here's what I'm doing so far:
In the .m file of the singleton, I have a
@property (nonatomic, strong) dispatch_queue_t arrayAccessQueue;
In my singleton's initializer it gets created as a serial queue. And then, every method that has anything to do with mutating this array does so from within a dispatch_sync call, for example:
dispatch_sync(self.arrayAccessQueue, ^{
    [_myItems removeAllObjects];
});
This has made things better and has made my app behave more smoothly. However, I have no way of quantifying that beyond it having fixed that one odd behaviour described above. I also kind of feel like I'm in the dark as to any problems that may be lurking beneath the surface.
This pattern makes sense to me, but should I be using something else, like @synchronized or NSLock or NSOperationQueue? Will this come back to bite me?
Using dispatch_sync is fine as long as you wrap all array reads and writes and you ensure it is a serial queue.
But you could improve things by allowing concurrent reads. To do this, use dispatch_sync around all array reads and dispatch_barrier_sync around all array writes, and set the queue up to be concurrent.
Doing this ensures that only a single write can happen at a time, reads are blocked until the write is done, and a write waits until all current reads are done.
Using a concurrent GCD queue and providing some sort of accessor to your array, you can synchronize reading and writing by using dispatch_sync while reading and dispatch_barrier_async while writing.
// Assumes syncQueue was created as a concurrent queue, e.g. with
// dispatch_queue_create("syncQueue", DISPATCH_QUEUE_CONCURRENT).
- (id)methodToRead {
    __block id obj = nil;
    dispatch_sync(syncQueue, ^{
        obj = <#read_Something#>;
    });
    return obj;
}
- (void)methodsForWriting:(id)obj {
    dispatch_barrier_async(syncQueue, ^{
        // write passing obj to something
    });
}
This guarantees that, while a write is in progress, reads are locked out.
Using GCD is the right choice. The only "gotcha" is that you need to do ALL operations on that queue: add, remove, insert, etc.
I will also mention that you need to ensure you do not use a concurrent queue: you should be using a serial queue, which is the default anyway.
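One gap worth closing in the question's code is the public accessor that exposes the array as an NSArray, which also has to go through the queue. A sketch, assuming the serial arrayAccessQueue from the question:
- (NSArray *)items {
    __block NSArray *snapshot;
    dispatch_sync(self.arrayAccessQueue, ^{
        snapshot = [_myItems copy]; // hand callers an immutable snapshot
    });
    return snapshot;
}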

@synchronized block versus GCD dispatch_async()

Essentially, I have a set of data in an NSDictionary, but for convenience I'm setting up some NSArrays with the data sorted and filtered in a few different ways. The data will be coming in via different threads (blocks), and I want to make sure there is only one block at a time modifying my data store.
I went through the trouble of setting up a dispatch queue this afternoon, and then randomly stumbled onto a post about @synchronized that made it seem like pretty much exactly what I want to be doing.
So what I have right now is...
// a property on my object
@property (assign) dispatch_queue_t matchSortingQueue;
// in my object init
_matchSortingQueue = dispatch_queue_create("com.asdf.matchSortingQueue", NULL);
// then later...
- (void)sortArrayIntoLocalStore:(NSArray *)matches
{
    dispatch_async(_matchSortingQueue, ^{
        // do stuff...
    });
}
And my question is, could I just replace all of this with the following?
- (void)sortArrayIntoLocalStore:(NSArray *)matches
{
    @synchronized (self) {
        // do stuff...
    }
}
...And what's the difference between the two anyway? What should I be considering?
Although the functional difference might not matter much to you, it's what you'd expect: if you use @synchronized then the thread you're on is blocked until it can get exclusive execution. If you dispatch to a serial dispatch queue asynchronously then the calling thread can get on with other things, and whatever it is you're actually doing will always occur on the same, known queue.
So they're equivalent for ensuring that a third resource is used from only one queue at a time.
Dispatching could be a better idea if, say, you had a resource that is accessed by the user interface from the main queue and you wanted to mutate it. Then your user interface code doesn't need an explicit @synchronized, hiding the complexity of your threading scheme within the object quite naturally. Dispatching will also be a better idea if you've got a central actor that can trigger several of these changes on other different actors; that'll allow them to operate concurrently.
Synchronising is more compact and a lot easier to step debug. If what you're doing tends to be two or three lines and you'd need to dispatch it synchronously anyway then it feels like going to the effort of creating a queue isn't worth it — especially when you consider the implicit costs of creating a block and moving it over onto the heap.
In the second case you would block the calling thread until "do stuff" was done. Using queues and dispatch_async you will not block the calling thread. This would be particularly important if you call sortArrayIntoLocalStore from the UI thread.

How to use GCD for lightweight transactional locking of resources?

I'm trying to use GCD as a replacement for dozens of atomic properties. I remember at WWDC they were talking about how GCD could be used for efficient transactional locking mechanisms.
In my OpenGL ES runloop method I put all drawing code in a block executed by dispatch_sync on a custom created serial queue. The runloop is called by a CADisplayLink which is to my knowledge happening on the main thread.
There are ivars and properties which are used both for drawing but also for controlling what will be drawn. The problem is that there must be some locking in place to prevent concurrency problems, and a way of transactionally querying and modifying the state of the OpenGL ES scene from the main thread between two drawn frames.
I can modify a group of properties in a transactional way with GCD by executing a block on that serial queue.
But it seems I can't read values back onto the main thread, using GCD, while blocking the queue that executes the drawing code. dispatch_sync doesn't have a return value, but I want access to presentation values exactly between the drawing of two frames, both for reading and writing.
Is it this barrier thing they were talking about? How does that work?
This is what the async writer / sync reader model was designed to accomplish. Let's say you have an ivar (and for the purpose of discussion let's assume that you've gone a wee bit further and encapsulated all your ivars into a single structure, just for simplicity's sake):
struct {
    int x, y;
    char *n;
    dispatch_queue_t _internalQueue;
} myIvars;
Let's further assume (for brevity) that you've initialized the ivars in a dispatch_once() and created the _internalQueue as a serial queue with dispatch_queue_create() earlier in the code.
Now, to write a value:
dispatch_async(myIvars._internalQueue, ^{ myIvars.x = 10; });
dispatch_async(myIvars._internalQueue, ^{ myIvars.n = "Hi there"; });
And to read one:
__block int val;
__block char *v;
dispatch_sync(myIvars._internalQueue, ^{ val = myIvars.x; });
dispatch_sync(myIvars._internalQueue, ^{ v = myIvars.n; });
Using the internal queue makes sure everything is appropriately serialized and that writes can happen asynchronously but reads wait for all pending writes to complete before giving you back the value. A lot of "GCD aware" data structures (or routines that have internal data structures) incorporate serial queues as implementation details for just this purpose.
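And for the transactional part of the question, several values can be read as one consistent snapshot by grouping the reads into a single dispatch_sync (same struct and queue as above):
__block int sx, sy;
dispatch_sync(myIvars._internalQueue, ^{
    // Both reads observe the same state: the serial queue orders this block
    // after any pending writes, and nothing can interleave inside it.
    sx = myIvars.x;
    sy = myIvars.y;
});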
dispatch_sync runs the block you pass to it synchronously on the serial queue; from inside that block you can read the values you need and then hop back to the main queue to use them. So it would look something like:
dispatch_sync(serialQueue, ^{
    // execute a block
    dispatch_async(dispatch_get_main_queue(), ^{
        // use your calculations here
    });
});
And serial queues handle the concurrency part themselves. So if another piece of code is trying to access the same resource at the same time, it will be handled by the queue itself. Hope this was of some help.
