We have recently revised our threading mechanism in favour of using dispatch_async's in most places (after doing a lot of reading about NSOperation vs dispatch_async)*. Then our code started crashing with EXC_BAD_ACCESS in various parts of the code, always on the dispatch_async(queue,...) part, with no clear pattern. The crash usually happened after 20 minutes to 2 hours of running.
Our dispatch_async blocks were used to notify listeners and looked as follows:
NSMutableSet *_listeners; // Initialised elsewhere and filled with interested listeners
void (^block)(id listener); // Block to execute
@synchronized(_listeners) {
    for (id listener in _listeners) {
        // We used different queues for different listeners, but only one type of queue is shown here for brevity
        dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
        dispatch_async(queue, ^{ // CRASHING LINE
            block(listener);
        });
    }
}
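For readers more familiar with Swift, here is a rough sketch of the same notify-the-listeners pattern. It is only an illustration: `ListenerRegistry` and `Collector` are hypothetical names, listeners are plain strings, and unlike the code above it copies the set under a lock and dispatches outside it. It is not presented as a fix for the crash.

```swift
import Foundation

/// Hypothetical registry; listeners are plain strings to keep the sketch small.
final class ListenerRegistry: @unchecked Sendable {
    private var listeners = Set<String>()
    private let lock = NSLock()

    func add(_ listener: String) {
        lock.lock()
        defer { lock.unlock() }
        listeners.insert(listener)
    }

    /// Copies the set under the lock, then dispatches one block per listener.
    /// The returned group lets a caller wait until every block has run.
    func notifyAll(_ block: @escaping @Sendable (String) -> Void) -> DispatchGroup {
        lock.lock()
        let snapshot = listeners
        lock.unlock()

        let group = DispatchGroup()
        let queue = DispatchQueue.global(qos: .default)
        for listener in snapshot {
            queue.async(group: group) { block(listener) }
        }
        return group
    }
}

/// Thread-safe collector, used here only to observe which listeners were notified.
final class Collector: @unchecked Sendable {
    private var items: [String] = []
    private let lock = NSLock()

    func append(_ item: String) {
        lock.lock()
        items.append(item)
        lock.unlock()
    }

    var sorted: [String] {
        lock.lock()
        defer { lock.unlock() }
        return items.sorted()
    }
}

let registry = ListenerRegistry()
registry.add("a")
registry.add("b")

let collector = Collector()
registry.notifyAll { collector.append($0) }.wait()
print(collector.sorted)
```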
The common symptoms were:
Happens on iOS10, never happens on iOS8
Happens during debugging, but was never reported in production
(This is a self-answered question)
*We liked the simplicity of dispatch_async, didn't need the blocking / dependency features of NSOperationQueue's and we will be moving to C++ soon so wanted to stay low level.
After days of debugging, ensuring our thread objects were strongly retained, and trying various weak-strong combinations and thorough profiling using Instruments, we have come to the conclusion that this is an Apple bug (as also reported here) that only appears on recent iOS versions (iOS10 for us, but I reckon it will be present from the version when libBacktraceRecording.dylib started appearing).
The following symptoms might indicate this bug:
Not reproducible on iOS8.x
Only happens when in debugging mode
EXC_BAD_ACCESS in random parts of the code, without any pattern
Hope this is useful for others!
First off, I'd like to clarify that I'm not talking about concurrency here. I fully understand that having multiple threads modify the UI at the same time is bad, can give race conditions, deadlocks, bugs etc, but that's separate to my question.
I'd like to know why MacOS/iOS forces the main thread (ID 0, first thread, whatever) to be the thread on which the GUI must be used/updated/created on.
see here, related:
on OSX/iOS the GUI must always be updated from the main thread, end of story.
I understand that you only ever want a single thread doing the actual updating of the GUI, but why does that thread have to be ID 0?
(this is background info, TLDR below)
In my case, I'm making a rust app that uses a couple of threads to do things:
engine - does processing and calculations
ui - self explanatory
program/main - monitors other threads and generally synchronizes things
I'm currently doing something semi-unsafe and creating the UI on its own thread, which works since I'm on Windows, but the API is explicitly marked as BAD to use, and it's not cross-compatible with MacOS/iOS for obvious reasons (and I want it to be as compatible as possible).
With the UI/engine threads (there may be more in the future), they are semi-unstable and could crash/exit early, outside of my control (external code). This has happened before, and so I want to have a graceful shutdown if anything goes wrong, hence the 'main' thread monitoring (among other things it does).
I am aware that I could just make Thread 0 the UI thread and move the program to another thread, but the app will immediately quit when the main thread quits, which means that if the UI crashes the whole thing just aborts (and I don't want this). Essentially, I need my main function on the main thread, since I know it won't suddenly exit and abort the whole app abruptly.
TL;DR
Overall, I'd like to know three things
Why does MacOS/iOS enforce the GUI being on Thread 0 (ignoring thread-safety outlined above)?
Are there any ways to bypass this (use a different thread for GUI), or will I simply need to sacrifice those platforms (and possibly others I'm unaware of)?
Would it be possible to do something like have the UI run as a separate process, and have it share some memory/communicate with the main process, using safe, simple rust?
p.s. I'm aware of this question, it's relevant but doesn't really answer my questions.
Why does MacOS/iOS enforce the GUI being on Thread 0.
Because it's been that way for over 30 years now (since NeXTSTEP), and changing it would break just about every program out there, since almost every Cocoa app assumes this, and relies on it regularly, not just for the main thread, but also the main runloop, the main dispatch queue, and now the main actor. External UI events (which come from other processes like the window manager) are delivered on thread 0. NSDistributedNotifications are delivered on thread 0. Signal handling, the list goes on. Yes, it is certainly possible for Darwin (which underlies Cocoa) to be rewritten to allow this. That's not going to happen. I'm not sure what other answer you want.
Would it be possible to do something like have the UI run as a separate process, and have it share some memory/communicate with the main process, using safe, simple rust?
Absolutely. See XPC, which is explicitly for this purpose (communicating, not sharing memory; don't share memory, that's a mess). See sys-xpc for the Rust interface.
Why is it the responsibility of the programmer to call UI related methods on the main thread with:
DispatchQueue.main.async {}
Theoretically, couldn’t this be left up to the compiler or some other agent to determine?
The actual answer is developer inertia and grandfathering.
The Cocoa UI API is huge—nay, gigantic. It has also been in continuous development since the 1990's.
Back when I was a youth and there were no multi-core, 64-bit, anything, 99.999% of all applications ran on the main thread. Period. (The original Mac OS, pre-OS X, didn't even have threads.)
Later, a few specialized tasks could be run on background threads, but largely apps still ran on the main thread.
Fast forward to today, where it's trivial to dispatch thousands of tasks for background execution and CPUs can run 30 or more concurrent threads, and it's easy to say "hey, why doesn't the compiler/API/OS handle this main-thread thing for me?" But what's nigh on impossible is re-engineering four decades of Cocoa code and apps to make that work.
There are, I'm going to say, hundreds of millions of lines of code that all assume UI calls are executing serially on the main thread. As others have pointed out, there is no clever switch or pre-processor that's going to undo all of those assumptions, fix all of those potential deadlocks, etc.
(Heck, if the compiler could figure this kind of stuff out we wouldn't even have to write multi-threaded code; you'd just let the compiler slice up your code so it runs concurrently.)
Finally, such a change just isn't worth the effort. I do Cocoa development full time, and I have to deal with the "update control from a background thread" problem, at most, once a week or so. There's no development cost-benefit analysis that's going to dedicate a million man-hours to solving a problem that already has a straightforward solution.
Now if you were developing a new, modern, UI API from scratch, you'd simply make the entire UI framework thread safe and the whole question goes away. And maybe Apple has a brand new, redesigned-from-the-ground-up, UI framework in a lab somewhere that does that. But that's the only way I see something like this happening.
You would be substituting one kind of frustration for another.
Suppose that all UI-related methods that require invocation on the main thread did so by:
using DispatchQueue.main.async: You would be hiding asynchronous behaviour, with no obvious way to "follow up" on the result. Code like this would now fail:
label.text = "new value"
assert(label.text == "new value")
You would have thought that the property text just harmlessly assigned some value. In fact, it enqueued a work item to asynchronously execute on the main thread. In doing so, you've broken the expectation that your system has reached its desired state by the time you've completed that line.
using DispatchQueue.main.sync: You would be hiding a potential for deadlock. Synchronous code on the main queue can be very dangerous, because it's easy to unintentionally block (on the main thread) yourself waiting for such work, causing deadlock.
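The first case can be sketched without UIKit. In this rough illustration, `AsyncLabel` is a hypothetical class (not a real UIKit type) whose setter hops to a queue asynchronously, and a private serial queue stands in for the main queue so that the sketch runs as a plain script.

```swift
import Dispatch

/// Hypothetical label whose setter silently hops to a queue asynchronously.
final class AsyncLabel: @unchecked Sendable {
    private let queue: DispatchQueue
    private var storage = "old value"

    init(queue: DispatchQueue) {
        self.queue = queue
    }

    var text: String {
        get { storage }
        set { queue.async { self.storage = newValue } }
    }
}

// Assumption: a serial queue stands in for the main queue.
let uiQueue = DispatchQueue(label: "example.ui")
let label = AsyncLabel(queue: uiQueue)

var readImmediately = ""
var readAfterDrain = ""

// Simulate code that is already running on the UI queue.
uiQueue.sync {
    label.text = "new value"
    // The enqueued write cannot run until this block returns.
    readImmediately = label.text
}

// This block is queued behind the pending write, so it sees the new value.
uiQueue.sync { readAfterDrain = label.text }

print(readImmediately) // old value
print(readAfterDrain)  // new value
```

The read immediately after the assignment still sees the old value, which is exactly the broken expectation described above.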
I think one way this could have been achieved is by having a hidden thread dedicated to UI. All UI-related APIs would switch to that thread to do their work. Though I don't know how expensive that would be (each switch to that thread is probably no faster than waiting on a lock), and I could imagine there's lots of "fun" ways that'll get you to write deadlocking code.
Only on rare instances would the UI call anything in the main thread, except for user login timeouts for security. Most UI related methods for any particular window are called within the thread that was started when the window was initialized.
I would rather manage my UI calls instead of the compiler because as a developer, I want control and do not want to rely on third party 'black boxes'.
check https://developer.apple.com/documentation/code_diagnostics/main_thread_checker
and UPDATE UI FROM MAIN THREAD ONLY!!!
If I accidentally write UI-updating code on a background thread after fetching data from a network request, will my application crash?
From apple docs:
Updating UI on a thread other than the main thread is a common mistake that can result in missed UI updates, visual defects, data corruptions, and crashes. source
So it can crash, but it can also not crash. It might update the UI or it might not. It might crash immediately when the code is called; it might crash in the next runloop, or it might crash minutes later. It might cause weird UI that makes you say WTF. In short, its behavior is undefined, which can make it a really hard bug to track down and fix.
If you are asking what exact behavior to expect when doing this, the answer is: in a debug environment you should expect the Main Thread Checker to catch it and cause a crash with a good crash report. In a production build you can expect some crashes that look like this: Application crashes very rarely with UI update on secondary thread, but they might look different. If you have a small user base you might not see any crashes, but still have a very buggy app.
Straight Answer : Your Application Won't crash.
UI Update must be done in Main thread
Apple Documentation:
DispatchQueue manages the execution of work items. Each work item
submitted to a queue is processed on a pool of threads managed by the
system.
So, use
DispatchQueue.main.async {
//your UI code
}
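A common convenience built on top of this is a small helper that runs the work directly when the caller is already on the main thread and only dispatches otherwise. This is a minimal sketch; `runOnMain` is a hypothetical helper name, not an Apple API.

```swift
import Foundation

/// Hypothetical helper: runs `work` immediately when already on the main thread,
/// otherwise enqueues it asynchronously on the main queue.
func runOnMain(_ work: @escaping () -> Void) {
    if Thread.isMainThread {
        work()
    } else {
        DispatchQueue.main.async(execute: work)
    }
}

// Top-level script code runs on the main thread, so this executes synchronously.
var status = "loading"
runOnMain { status = "loaded" }
print(status)
```

Note that when called from a background thread the work is still asynchronous, so the caller cannot assume it has finished when `runOnMain` returns.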
Why do you want to update the UI on a background thread? It's bad practice.
You'll get something like this:
Ever since iOS 11 was released, I have been experiencing a sporadic but frequent crash with the following signature:
Cannot remove an observer <CBPeripheral 0x1c010ef10> for the key path "delegate" from <CBPeripheral 0x1c010ef10> because it is not registered as an observer.
This happens in the context of a scan for Bluetooth devices, a later connection to one of them and a final cleanup of the whole process. All these tasks are performed in a non-main dispatch queue to soften the pressure on the main thread (for smoother UI experience). This very code has been running without incident ever since iOS 9 days and only now that iOS 11 came out, started to crash.
The only references I've found in the net so far regarding this behaviour are this and this post for the Estimote SDK. These references suggest that something might be going on with parallel instances of the CBCentralManager in different dispatch queues, however, nothing regarding special care on the matter is stated in the official Programming Guide. Also, a response from an Apple Staff member to another CoreBluetooth issue stating:
iOS 11 is in general going to be less forgiving for apps which don't hold a proper reference to CB objects...
That doesn't sound very encouraging. I tried profiling the app and looking for potential leaks using XCode and its companion tools, but this didn't shed much light on it either.
Has anybody else experienced similar issues? Any suggestions on how to work around it? Ideas on where to dig next?
After a period of testing, the solution in our particular case was to move all the Bluetooth-stack-related work to the mainQueue, meaning that all the related callbacks now run in main-thread territory.
This solution requires extra caution with the work performed in those callbacks (UI runs here too), but since most CoreBluetooth actions are asynchronous by default, this has proven feasible. This workaround has been confirmed in iOS 11 and so far no issues have been reported in iOS 12 as well.
The takeaway here is: Handle ONLY the absolutely necessary bits in mainQueue, and then transfer the rest of the load elsewhere if necessary.
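That takeaway can be sketched with plain dispatch queues. This is only an illustration under stated assumptions: no CoreBluetooth API is used, `CallbackHandler` and the payloads are hypothetical, and a serial queue stands in for the main queue on which the callbacks would arrive.

```swift
import Dispatch

/// Hypothetical handler: the callback does the bare minimum and hands the
/// heavier processing to a private serial work queue.
final class CallbackHandler: @unchecked Sendable {
    private let workQueue = DispatchQueue(label: "example.work")
    private var totals: [Int] = [] // only touched on workQueue

    /// Meant to be called on the callback queue; it only captures the payload and hands off.
    func didReceive(_ payload: [UInt8], done: DispatchGroup) {
        done.enter()
        workQueue.async {
            // Stand-in for the expensive part of handling a payload.
            self.totals.append(payload.reduce(0) { $0 + Int($1) })
            done.leave()
        }
    }

    func snapshot() -> [Int] {
        workQueue.sync { totals }
    }
}

// Assumption: this serial queue plays the role of the main queue.
let callbackQueue = DispatchQueue(label: "example.callbacks")
let handler = CallbackHandler()
let done = DispatchGroup()

for payload in [[1, 2, 3], [10, 20]] as [[UInt8]] {
    done.enter()
    callbackQueue.async {
        handler.didReceive(payload, done: done)
        done.leave()
    }
}

done.wait()
let totals = handler.snapshot()
print(totals)
```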
Update 2: I found a workaround which is to synchronize MOC deallocating and saving. Please see the updated project.
https://github.com/shuningzhou/MOCDeadLock.git
Note: I made it fail more aggressively. Don't run it on a real device!
Update: A sample project to demonstrate this issue.
https://github.com/shuningzhou/MOCDeadLock.git
XCode 6.2: Unable to reproduce.
XCode 6.3: Reproducible.
XCode 6.4 beta: Reproducible.
========================== The Issue ===============================
Our app randomly gets stuck on OSSpinLockLockSlow after upgrading to XCode 6.3. In our project, we use NSOperation and NSOperationQueue to fetch data from our server and Core Data for data persistence.
This issue never happened before! You can see from the stack trace that no calls are made by our code. I am not sure where to start debugging this. Could someone provide some guidance?
Thank you in advance!
Please see the stack trace
Edit:
We are using AFNetworking and our NSOperations are subclasses of AFHTTPRequestOperation. We added some custom properties and overrode the method -(void)start:
- (void)start;
{
//unrelated code...
NSString *completionQueueID = [NSString uuid];
const char *cString = [completionQueueID cStringUsingEncoding:NSASCIIStringEncoding];
self.completionQueue = dispatch_queue_create(cString, DISPATCH_QUEUE_SERIAL);
//unrelated code....
[super start];
}
For Core Data, we are following the thread-confinement pattern. We have a separate managed object context for each thread, and the contexts share a static persistent store coordinator.
Edit 2:
More info: I found that this issue happens when the system exits multiple threads at the same time. We store the Managed Object Context in the thread dictionary, and they get released when the threads exit.
[[[NSThread currentThread] threadDictionary] setObject:dataManager forKey:@"IHDataManager"];
CPU usage is around 20%.
I have been experiencing precisely this issue. As per your stack trace, I have a bunch of threads stalled with _OSSpinLockLockSlow.
It appears to be a livelock situation with the spinlocks chained up together, including some networking threads and Core Data. But as Rob pointed out, symptoms of livelock should include high CPU usage (the spinlocks are all endlessly spinning). In my case (and in yours) this is not so: CPU usage is low (simulator 'percent used' 20%, simulator overall in Activity Monitor 0.6%), so maybe it's a deadlock ;-)
Like you, I am using a thread-confinement pattern, separate managed object context per thread, single persistent store.
Following your observation that the hang always seems to follow deallocing of a bunch of threads, I checked that behaviour and can confirm that is the case.
This got me wondering why I had so many threads active. It turned out I was using gcd with a concurrent background queue:
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND,0),^{
modelClass = [WNManagedObject classForID:mongoID];
dispatch_async(dispatch_get_main_queue(),^{
...
});
});
This snippet is part of some networking/JSON parsing code. 'classForID' was causing slight UI jitters on the main thread, so I backgrounded it.
In effect the concurrent background queue was spitting out a whole bunch of short-lived threads. This was completely unnecessary. Refactoring as a single serial queue fixed the thread excesses, which got rid of the spinlock issue. Finally I realised I didn't need to get the class at all, so this code has since been exorcised.
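A rough Swift sketch of that refactoring: many short tasks are funnelled through one serial queue instead of the global concurrent queue. `ConcurrencyProbe` and the queue label are hypothetical and exist only to show that the tasks never overlap.

```swift
import Foundation

/// Hypothetical probe that records how many tasks run at the same time.
final class ConcurrencyProbe: @unchecked Sendable {
    private let lock = NSLock()
    private var running = 0
    private(set) var peak = 0

    func enter() {
        lock.lock()
        defer { lock.unlock() }
        running += 1
        peak = max(peak, running)
    }

    func leave() {
        lock.lock()
        defer { lock.unlock() }
        running -= 1
    }
}

// Queues created without attributes are serial: one task at a time.
let lookupQueue = DispatchQueue(label: "example.model-lookup")
let probe = ConcurrencyProbe()
let group = DispatchGroup()

for _ in 0..<50 {
    lookupQueue.async(group: group) {
        probe.enter()
        Thread.sleep(forTimeInterval: 0.001) // stand-in for a short lookup
        probe.leave()
    }
}

group.wait()
print(probe.peak)
```

Because the queue is serial, the peak never exceeds one task at a time, so the system has no reason to spin up a burst of short-lived worker threads for this work.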
Problem fixed, but no explanation as to why this should suddenly become an issue with 8.3
I suspect that the same issue is touched on in this question (although Cocoalumberjack gets the blame there):
syscall_thread_switch iOS 8.3 race - CocoaLumberjack bug? how to debug this?
..and in this Cocoalumberjack bug report
https://github.com/CocoaLumberjack/CocoaLumberjack/issues/494
I am also using CocoaLumberjack but it does not feature in any of the problem threads, so I think that is a red herring. The underlying cause seems to be excess thread creation.
I have seen the issue in the simulator and on devices when tethered to XCode, but I have not experienced it when running independently of XCode. It is new to me in iOS 8.3 / XCode 6.3.1
Not really an answer, more of a diary of my own workaround for this weird issue, but maybe you'll find it useful.
If the question is still relevant: this is a bug in iOS. See the OpenRadar crash report.
Also you may find this blog post useful: blog post
I think you should replace OSSpinLocks with something else to fix this in your app.
We encountered this bug in our Unity3d game. We haven't fixed it in our app yet because we do not have access to most of the native iOS code (we write our game in C# and use a lot of third-party native plugins). So I cannot recommend anything concrete about replacing OSSpinLock. Sorry for my English.
Update
Many Apple frameworks and libraries use OSSpinLock internally, so you don't need to use it explicitly to run into this issue.
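If the spinlock is in your own code, one possible replacement is a regular blocking lock. The sketch below uses NSLock purely as an example choice (other primitives exist), with a hypothetical `Counter` class; it cannot help where the spinlock lives inside a framework.

```swift
import Foundation

/// Hypothetical counter guarded by NSLock instead of a spinlock.
final class Counter: @unchecked Sendable {
    private let lock = NSLock()
    private var value = 0

    func increment() {
        lock.lock()
        value += 1
        lock.unlock()
    }

    var current: Int {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

let counter = Counter()

// Hammer the counter from several threads at once.
DispatchQueue.concurrentPerform(iterations: 1000) { _ in
    counter.increment()
}

print(counter.current)
```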