Avoid Data Race condition in swift - ios

I am getting race conditions in my code when I run TSan tool. As same code has been accessed from different queues and threads at the same time that's why I can not use Serial queues or barrier as Queue will block only single queues accessing the shared resource not the other queues.
I used objc_sync_enter(object) | objc_sync_exit(object) and locks NSLock() or NSRecursiveLock() to protect shared resource but these are also not working.
While when I use #synchronized() keyword in Objective C to protect shared resource, it's working fine as expected and I am not getting race conditions in particular block of code.
So, what is an alternative to protect data in Swift as we can not use #synchronized() keyword in Swift language.
PFA screenshot for reference -

I don't understand "I can not use Serial queues or barrier as Queue will block only single queues accessing the shared resource not the other queues." Using a queue is the standard solution to this problem.
class MultiAccess {
private var _property: String = ""
private let queue = DispatchQueue(label: "MultiAccess")
var property: String {
get {
var result: String!
queue.sync {
result = self._property
}
return result
}
set {
queue.async {
self._property = newValue
}
}
}
}
With this construction, access to property is atomic and thread-safe without the caller having to do anything special. Note that this intentionally uses a single queue for the class, not a queue per-property. As a rule, you want a relatively small number of queues doing a lot of work, not a large number of queues doing a little work. If you find that you're accessing a mutable object from lots of different threads, you need to rethink your system (probably reducing the number of threads). There's no simple pattern that will make that work efficiently and safely without you having to think about your specific use case carefully.
But this construction is useful for problems where system frameworks may call you back on random threads with minimal contention. It is simple to implement and fairly easy to use correctly. If you have a more complex problem, you will have to design a solution for that problem.
Edit: I haven't thought about this answer in a long time, but Brennan's comments brought it back to my attention. Because of the bug I had in the original code, my original answer was ok, but if you fixed the bug it was bad. (If you want to see my original code that used a barrier, look in the edit history, I don't want to put it here because people will copy it.) I've changed it to use a standard serial queue rather than a concurrent queue.
Don't generate concurrent queues without careful thought about how threads will be generated. If you are going to have many simultaneous accesses, you're going to create a lot of threads, which is bad. If you're not going to have many simultaneous accesses, then you don't need a concurrent queue. GCD talks make promises about managing threads that it doesn't actually live up to. You definitely can get thread explosion (as Brennan mentions.)

Related

How does a DispatchQueue work? (specifically multithreading)

I don't understand the workings of a DispatchQueue and wanted to learn more about how they implement the foundational queueing theory requirements. I tried to inspect a queue using:
dump(DispatchQueue.global())
And this gave this output:
- <OS_dispatch_queue_global: com.apple.root.default-qos[0x10c041f00] = { xref = -2147483648, ref = -2147483648, sref = 1, target = [0x0], width = 0xfff, state = 0x0060000000000000, in-barrier}> #0
- super: OS_dispatch_queue
- super: OS_dispatch_object
- super: OS_object
- super: NSObject
I got that the label is com.apple.root.default-qos, and this is specified in the Apple docs and the class is the packaged OS_dispatch_queue_global. I understand qos is queryable on the queue itself and that makes sense as well. Width I think just means the allocated memory size.
What I don't understand are the relevances of xref, ref and sref, I think they are internal ids for the queues but I am not sure. I think they are related to fundamental queueing concepts (multithreading came to mind) but would be great to hone into this in more detail.
Is the autoreleaseFrequency hidden from this debug description? Also, what does in-barrier = 0 mean? I tried creating a custom queue and this was replaced by in-flight = 0.. so confused about that as well.
Any ideas on how these undocumented variables relate to queueing theory? I think these are undocumented internals of the API, so any educated and justified explanations would be fine!
Thanks.
Why ask this?
This is a fairly broad question about the internals of grand-central-dispatch. I had difficulty understanding the dumped output because the original WWDC '10 videos and slides for GCD are no longer public. I also didn't know about the open-source libdispatch repo (thanks Rob). That needn't be a problem, but there are no related QAs on SO explaining the topic in detail.
Why GCD?
According to the WWDC '10 GCD transcripts (Thanks Rob), the main idea behind the API was to simplify the boilerplate associated with using the #selector API for multithreading.
Benefits of GCD
Apple released a new block-based API instead of going with function pointers, to also enable type-safe code that wouldn't crash if the block had the wrong type signature. Using typedefs also made code cleaner when used in function parameters, local variables and #property declarations. Queues allow you to capture code and some state as a chunk of data that get managed, enqueued and executed automatically behind the scenes.
The same session mentions how GCD manages low-level threads under the hood. It enqueues blocks to execute on threads when they need to be executed and then releases those threads (PThreads to be precise) when they are no longer referenced. GCD manages threads automatically and doesn't expose this API - when a DispatchWorkItem is dequeued GCD creates a thread for this block to execute on.
Drawbacks of performSelector
performSelector:onThread:withObject:waitUntilDone: has numerous drawbacks that suggest poor design for the modern challenges of concurrency, waiting, synchronisation. leads to pyramids of doom when switching threads in a func. Furthermore, the NSObject.performSelector family of threading methods are inflexible and limited:
No options to optimise for concurrent, initially inactive, or synchronisation on a particular thread. Unlike GCD.
Only selectors can be dispatched on to new threads (awful).
Lots of threads for a given function leads to messy code (pyramids of doom).
No support for queueing without a limited (at the time when GCD was announced in iOS 4) NSOperation API. NSOperations are a high-level, verbose API that became more powerful after incorporating elements of dispatch (low-level API that became GCD) in iOS 4.
Lots of bugs related to unhandled invalid Selector errors (type safety).
DispatchQueue internals
I believe the xref, ref and sref are internal registers that manage reference counts for automatic reference counting. GCD calls dispatch_retain and dispatch_release in most cases when needed, so we don't need to worry about releasing a queue after all its blocks have been executed. However, there were cases when a developer could call retain and release manually when trying to ensure the queue is retained even when not directly in use. These registers allow libDispatch to crash when release is called on a queue with a positive reference count, for better error handling.
When calling a block with DispatchQueue.global().async or similar, I believe this increments the reference count of that queue (xref and ref).
The variables in the question are not documented explicitly, but from what I can tell:
xref counts the number of external references to a general DispatchQueue.
ref counts the total number of references to a general DispatchQueue.
sref counts the number of references to API serial/concurrent/runloop queues, sources and mach channels (these need to be tracked differently as they are represented using different types).
in-barrier looks like an internal state flag (DispatchWorkItemFlag) to track whether new work items submitted to a concurrent queue should be scheduled or not. Only once the barrier work item finishes, the queue returns to scheduling work items that were submitted after the barrier. in-flight means that there is no barrier in force currently.
state is also not documented explicitly but I presume points to memory where the block can access variables from the scope where the block was scheduled.

Is there something bad will happend when construct multi level dispatchQueue

Snippet 1
let workerQueue = DispatchQueue(label: "foo")
workderQueue.async {
#code
DispatchQueue.main.async {
#code
workerQueue.async {
#code
}
}
}
Snippet 2
let workerQueue = DispatchQueue(label: "foo")
DispatchQueue.main.async {
#code
workerQueue.async {
#code
DispatchQueue.main.async {
#code
}
}
}
Is it ok to write code like snippet 1 or snippet 2 ? Will the main thread be blocked?
No, there’s nothing inherently “bad” with these patterns.
That having been said, the typical pattern is:
workerQueue.async {
// do something computationally intensive
DispatchQueue.main.async {
// update UI and/or model
}
}
And this is motivated by “I have something sufficiently intensive that I don’t want it running on the main queue (and adversely affecting the UX), but when I’m done, I need to update my UI and/or the model.”
It’s pretty rare that you also have a further level of nesting that says “and when I’m done updating the UI, I need to dispatch something else back to my same worker queue”, as shown in your first example. If you have a practical example of that, perhaps you can share it with us, as there might be more elegant ways of refactoring that to avoid a “tower” of nested dispatches, which can start to render the code a little hard to follow.
In your second example, where you’re dispatching to the main thread first is also a bit atypical. (Yes, we occasionally have to do that, but it’s less common.) I guess you’re presuming you’re already on some other thread (though not always). If that’s the case, where the code is expecting you to be on a particular thread, you might want to make that assumption explicit, e.g.:
func foo() {
dispatchPrecondition(condition: .onQueue(workerQueue))
// do some stuff that should be on the `workerQueue`
DispatchQueue.main.async {
// update UI, etc.
...
}
}
Bottom line, there’s absolutely no problem with an arbitrary level of nesting, especially if doing it with async ... doing this with sync can invite deadlock problems if you’re not careful. But from a practical “can future programmers read this code and clearly deduce what the resulting behavior is going to be”, you will often want to constrain this so it’s not too confusing.
As a practical example I might be doing something on some dedicate functional queue (e.g. a database access queue or an image processing queue), and I might need to use another synchronization queue to ensure thread-safe access, and I might need to do UI updates on the main queue. But I don’t generally have this tower of three levels of queues, but, for example, I encapsulate these various levels to avoid confusion (e.g. I have a reader-writer generic that encapsulates the details of that concurrent queue I use for thread-safe access), and my code avoids complicated intermingling different types of dispatch queues in any given method. At any particular level of abstraction I’m trying to avoid too many different queue types at one time.
You only need to execute code on desired thread, if you are on some another thread.
For example, Since by default app runs on main thread. But if a task runs on Background thread and modifies UI element within itself, then it should be enclosed in main thread.
some_background/other_thread_task {
DispatchQueue.main.async {
//UI Update
self.myTableView.reloadData()
}
}
Both your snippets are the same structure. That structure is totally normal and is exactly how to switch threads.

How would #synchronized work if called on separate threads? [duplicate]

I just created a singleton method, and I would like to know what the function #synchronized() does, as I use it frequently, but do not know the meaning.
It declares a critical section around the code block. In multithreaded code, #synchronized guarantees that only one thread can be executing that code in the block at any given time.
If you aren't aware of what it does, then your application probably isn't multithreaded, and you probably don't need to use it (especially if the singleton itself isn't thread-safe).
Edit: Adding some more information that wasn't in the original answer from 2011.
The #synchronized directive prevents multiple threads from entering any region of code that is protected by a #synchronized directive referring to the same object. The object passed to the #synchronized directive is the object that is used as the "lock." Two threads can be in the same protected region of code if a different object is used as the lock, and you can also guard two completely different regions of code using the same object as the lock.
Also, if you happen to pass nil as the lock object, no lock will be taken at all.
From the Apple documentation here and here:
The #synchronized directive is a
convenient way to create mutex locks
on the fly in Objective-C code. The
#synchronized directive does what any
other mutex lock would do—it prevents
different threads from acquiring the
same lock at the same time.
The documentation provides a wealth of information on this subject. It's worth taking the time to read through it, especially given that you've been using it without knowing what it's doing.
The #synchronized directive is a convenient way to create mutex locks on the fly in Objective-C code.
The #synchronized directive does what any other mutex lock would do—it prevents different threads from acquiring the same lock at the same time.
Syntax:
#synchronized(key)
{
// thread-safe code
}
Example:
-(void)AppendExisting:(NSString*)val
{
#synchronized (oldValue) {
[oldValue stringByAppendingFormat:#"-%#",val];
}
}
Now the above code is perfectly thread safe..Now Multiple threads can change the value.
The above is just an obscure example...
#synchronized block automatically handles locking and unlocking for you. #synchronize
you have an implicit lock associated with the object you are using to synchronize. Here is very informative discussion on this topic please follow How does #synchronized lock/unlock in Objective-C?
Excellent answer here:
Help understanding class method returning singleton
with further explanation of the process of creating a singleton.
#synchronized is thread safe mechanism. Piece of code written inside this function becomes the part of critical section, to which only one thread can execute at a time.
#synchronize applies the lock implicitly whereas NSLock applies it explicitly.
It only assures the thread safety, not guarantees that. What I mean is you hire an expert driver for you car, still it doesn't guarantees car wont meet an accident. However probability remains the slightest.

-allKeys on background thread results in error: __NSDictionaryM was mutated while being enumerated

I've come across an interesting issue using mutable dictionaries on background threads.
Currently, I am downloading data in chunks on one thread, adding it to a data set, and processing it on another background thread. The overall design works for the most part aside from one issue: On occasion, a function call to an inner dictionary within the main data set causes the following crash:
*** Collection <__NSDictionaryM: 0x13000a190> was mutated while being enumerated.
I know this is a fairly common crash to have, but the strange part is that it's not crashing in a loop on this collection. Instead, the exception breakpoint in Xcode is stopping on the following line:
NSArray *tempKeys = [temp allKeys];
This leads me to believe that one thread is adding items to this collection while the NSMutableDictionary's internal function call to -allKeys is enumerating over the keys in order to return the array on another thread.
My question is: Is this what's happening? If so, what would be the best way to avoid this?
Here's the gist of what I'm doing:
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^(void) {
for (NSString *key in [[queue allKeys] reverseObjectEnumerator]) { //To prevent crashes
NEXActivityMap *temp = queue[key];
NSArray *tempKeys = [temp allKeys]; //<= CRASHES HERE
if (tempKeys.count > 0) {
//Do other stuff
}
}
});
You can use #synchronize. And it will work. But this is mixing up two different ideas:
Threads have been around for many years. A new thread opens a new control flow. Code in different threads are running potentially concurrently causing conflicts as you had. To prevent this conflicts you have to use locks like #synchronized do.
GCD is the more modern concept. GCD runs "on top of threads" that means, it uses threads, but this is transparent for you. You do not have to care about this. Code running in different queues are running potentially concurrently causing conflicts. To prevent this conflicts you have to use one queue for shared resources.
You are already using GCD, what is a good idea:
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^(void) {
The same code with threads would look like this:
[[NSThread mainThread] performSelector:…];
So, using GCD, you should use GCD to prevent the conflicts. What you are doing is to use GCD wrongly and then "repair" that with locks.
Simply put all accesses to the shared resource (in your case the mutable dictionary referred by temp) into on serial queue.
Create a queue at the beginning for the accesses. This is a one-timer.
You can use one of the existing queues as you do in your code, but you have to use a serial one! But this potentially leads to long queues with waiting tasks (in your example blocks). Different tasks in a serial queue are executed one after each other, even there are cpu cores idle. So it is no good idea to put too many tasks into one queue. Create a queue for any shared resource or "subsystem":
dispatch_queue_t tempQueue;
tempQueue = dispatch_queue_create("tempQueue", NULL);
When code wants to access the mutable dictionary, put it in a queue:
It looks like this:
dispatch_sync( tempQueue, // or async, if it is possible
^{
[tempQueue setObject:… forKey:…]; // Or what you want to do.
}
You have to put every code accessing the shared resource in the queue as you have to put every code accessing the shared resource inn locks when using threads.
From Apple documentation "Thread safety summary":
Mutable objects are generally not thread-safe. To use mutable objects
in a threaded application, the application must synchronize access to
them using locks. (For more information, see Atomic Operations). In
general, the collection classes (for example, NSMutableArray,
NSMutableDictionary) are not thread-safe when mutations are concerned.
That is, if one or more threads are changing the same array, problems
can occur. You must lock around spots where reads and writes occur to
assure thread safety.
In your case, following scenario happens. From one thread, you add elements into dictionary. In another thread, you accessing allKeys method. While this methods copies all keys into array, other methods adds new key. This causes exception.
To avoid that, you have several options.
Because you are using dispatch queues, preferred way is to put all code, that access same mutable dictionary instance, into private serial dispatch queue.
Second option is passing immutable dictionary copy to other thread. In this case, no matter what happen in first thread with original dictionary, data still will be consistent. Note that you will probably need deep copy, cause you use dictionary/arrays hierarchy.
Alternatively you can wrap all points, where you access collections, with locks. Using #synchronized also implicitly create recursive lock for you.
How about wrapping where you get the keys AND where you set the keys, with #synchronize?
Example:
- (void)myMethod:(id)anObj
{
#synchronized(anObj)
{
// Everything between the braces is protected by the #synchronized directive.
}
}

#synchronized block versus GCD dispatch_async()

Essentially, I have a set of data in an NSDictionary, but for convenience I'm setting up some NSArrays with the data sorted and filtered in a few different ways. The data will be coming in via different threads (blocks), and I want to make sure there is only one block at a time modifying my data store.
I went through the trouble of setting up a dispatch queue this afternoon, and then randomly stumbled onto a post about #synchronized that made it seem like pretty much exactly what I want to be doing.
So what I have right now is...
// a property on my object
#property (assign) dispatch_queue_t matchSortingQueue;
// in my object init
_sortingQueue = dispatch_queue_create("com.asdf.matchSortingQueue", NULL);
// then later...
- (void)sortArrayIntoLocalStore:(NSArray*)matches
{
dispatch_async(_sortingQueue, ^{
// do stuff...
});
}
And my question is, could I just replace all of this with the following?
- (void)sortArrayIntoLocalStore:(NSArray*)matches
{
#synchronized (self) {
// do stuff...
};
}
...And what's the difference between the two anyway? What should I be considering?
Although the functional difference might not matter much to you, it's what you'd expect: if you #synchronize then the thread you're on is blocked until it can get exclusive execution. If you dispatch to a serial dispatch queue asynchronously then the calling thread can get on with other things and whatever it is you're actually doing will always occur on the same, known queue.
So they're equivalent for ensuring that a third resource is used from only one queue at a time.
Dispatching could be a better idea if, say, you had a resource that is accessed by the user interface from the main queue and you wanted to mutate it. Then your user interface code doesn't need explicitly to #synchronize, hiding the complexity of your threading scheme within the object quite naturally. Dispatching will also be a better idea if you've got a central actor that can trigger several of these changes on other different actors; that'll allow them to operate concurrently.
Synchronising is more compact and a lot easier to step debug. If what you're doing tends to be two or three lines and you'd need to dispatch it synchronously anyway then it feels like going to the effort of creating a queue isn't worth it — especially when you consider the implicit costs of creating a block and moving it over onto the heap.
In the second case you would block the calling thread until "do stuff" was done. Using queues and dispatch_async you will not block the calling thread. This would be particularly important if you call sortArrayIntoLocalStore from the UI thread.

Resources