I have encountered a data race in my app using Xcode's Thread Sanitizer and I have a question on how to address it.
I have a var defined as:
var myDict = [Double : [Date:[String:Any]]]()
I have a thread setup where I call a setup() function:
let queue = DispatchQueue(label: "my-queue", qos: .utility)
queue.async {
self.setup {
}
}
My setup() function essentially loops through tons of data and populates myDict. This can take a while, which is why we need to do it asynchronously.
On the main thread, my UI accesses myDict to display its data. In a cellForRow: method:
if !myDict.keys.contains(someObject) {
//Do something
}
And that is where I get my data race alert and the subsequent crash.
Exception NSException * "-[_NSCoreDataTaggedObjectID objectForKey:]:
unrecognized selector sent to instance
0x8000000000000000" 0x0000000283df6a60
Please kindly help me understand how to access a variable in a thread safe manner in Swift. I feel like I'm possibly half way there with setting, but I'm confused on how to approach getting on the main thread.
One way to access it asynchronously:
typealias Dict = [Double : [Date:[String:Any]]]
var myDict = Dict()
func getMyDict(f: #escaping (Dict) -> ()) {
queue.async {
DispatchQueue.main.async {
f(myDict)
}
}
}
getMyDict { dict in
assert(Thread.isMainThread)
}
Making the assumption, that queue possibly schedules long lasting closures.
How it works?
You can only access myDict from within queue. In the above function, myDict will be accessed on this queue, and a copy of it gets imported to the main queue. While you are showing the copy of myDict in a UI, you can simultaneously mutate the original myDict. "Copy on write" semantics on Dictionary ensures that copies are cheap.
You can call getMyDict from any thread and it will always call the closure on the main thread (in this implementation).
Caveat:
getMyDict is an async function. Which shouldn't be a caveat at all nowadays, but I just want to emphasise this ;)
Alternatives:
Swift Combine. Make myDict a published Value from some Publisher which implements your logic.
later, you may also consider to use async & await when it is available.
Preface: This will be a pretty long non-answer. I don't actually know what's wrong with your code, but I can share the things I do know that can help you troubleshoot it, and learn some interesting things along the way.
Understanding the error
Exception NSException * "-[_NSCoreDataTaggedObjectID objectForKey:]: unrecognized selector sent to instance 0x8000000000000000"
An Objective C exception was thrown (and not caught).
The exception happened when attempting to invoke -[_NSCoreDataTaggedObjectID objectForKey:]. This is a conventional way to refer to an Objective C method in writing. In this case, it's:
An instance method (hence the -, rather than a + that would be used for class methods)
On the class _NSCoreDataTaggedObjectID (more on this later)
On the method named objectForKey:
The object receiving this method invocation is the one with address 0x8000000000000000.
This is a pretty weird address. Something is up.
Another hint is the strange class name of _NSCoreDataTaggedObjectID. There's a few observations we can make about it:
The prefixed _NS suggests that it's an internal implementation detail of CoreData.
We google the name to find class dumps of the CoreData framework, which show us that:
_NSCoreDataTaggedObjectID subclasses _NSScalarObjectID
Which subclasses _NSCoreManagedObjectID
Which subclasses NSManagedObjectID
NSManagedObjectID is a public API, which has its own first-party documentation.
It has the word "tagged" in its name, which has a special meaning in the Objective C world.
Some back story
Objective C used message passing as its sole mechanism for method dispatch (unlike Swift which usually prefers static and v-table dispatch, depending on the context). Every method call you wrote was essentially syntactic sugar overtop of objc_msgSend (and its variants), passing to it the receiver object, the selector (the "name" of the method being invoked) and the arguments. This was a special function that would do the job of checking the class of the receiver object, and looking through that classes' hierarchy until it found a method implementation for the desired selector.
This was great, because it allows you to do a lot of cool runtime dynamic behaviour. For example, menu bar items on a macOS app would just define the method name they invoke. Clicking on them would "send that message" to the responder chain, which would invoke that method on the first object that had an implementation for it (the lingo is "the first object that answers to that message").
This works really well, but has several trade-offs. One of them was that everything had to be an object. And by object, we mean a heap-allocated memory region, whose first several words of memory stored meta-data for the object. This meta-data would contain a pointer to the class of the object, which was necessary for doing the method-loopup process in objc_msgSend as I just described.
The issue is, that for small objects, (particularly NSNumber values, small strings, empty arrays, etc.) the overhead of these several words of object meta-data might be several times bigger than the actual object data you're interested in. E.g. even though NSNumber(value: true /* or false */) stores a single bit of "useful" data, on 64 bit systems there would be 128 bits of object overhead. Add to that all the malloc/free and retain/release overhead associated with dealing with large numbers of tiny object, and you got a real performance issue.
"Tagged pointers" were a solution to this problem. The idea is that for small enough values of particular privileged classes, we won't allocate heap memory for their objects. Instead, we'll store their objects' data directly in their pointer representation. Of course, we would need a way to know if a given pointer is a real pointer (that points to a real heap-allocated object), or a "fake pointer" that encodes data inline.
The key realization that malloc only ever returns memory aligned to 16-byte boundaries. This means that 4 bits of every memory address were always 0 (if they weren't, then it wouldn't have been 16-byte aligned). These "unused" 4 bits could be employed to discriminate real pointers from tagged pointers. Exactly which bits are used and how differs between process architectures and runtime versions, but the general idea is the same.
If a pointer value had 0000 for those 4 bits then the system would know it's a real object pointer that points to a real heap-allocated object. All other possible values of those 4-bit values could be used to signal what kind of data is stored in the remaining bits. The Objective C runtime is actually opensource, so you can actually see the tagged pointer classes and their tags:
{
// 60-bit payloads
OBJC_TAG_NSAtom = 0,
OBJC_TAG_1 = 1,
OBJC_TAG_NSString = 2,
OBJC_TAG_NSNumber = 3,
OBJC_TAG_NSIndexPath = 4,
OBJC_TAG_NSManagedObjectID = 5,
OBJC_TAG_NSDate = 6,
// 60-bit reserved
OBJC_TAG_RESERVED_7 = 7,
// 52-bit payloads
OBJC_TAG_Photos_1 = 8,
OBJC_TAG_Photos_2 = 9,
OBJC_TAG_Photos_3 = 10,
OBJC_TAG_Photos_4 = 11,
OBJC_TAG_XPC_1 = 12,
OBJC_TAG_XPC_2 = 13,
OBJC_TAG_XPC_3 = 14,
OBJC_TAG_XPC_4 = 15,
OBJC_TAG_NSColor = 16,
OBJC_TAG_UIColor = 17,
OBJC_TAG_CGColor = 18,
OBJC_TAG_NSIndexSet = 19,
OBJC_TAG_NSMethodSignature = 20,
OBJC_TAG_UTTypeRecord = 21,
// When using the split tagged pointer representation
// (OBJC_SPLIT_TAGGED_POINTERS), this is the first tag where
// the tag and payload are unobfuscated. All tags from here to
// OBJC_TAG_Last52BitPayload are unobfuscated. The shared cache
// builder is able to construct these as long as the low bit is
// not set (i.e. even-numbered tags).
OBJC_TAG_FirstUnobfuscatedSplitTag = 136, // 128 + 8, first ext tag with high bit set
OBJC_TAG_Constant_CFString = 136,
OBJC_TAG_First60BitPayload = 0,
OBJC_TAG_Last60BitPayload = 6,
OBJC_TAG_First52BitPayload = 8,
OBJC_TAG_Last52BitPayload = 263,
OBJC_TAG_RESERVED_264 = 264
You can see, strings, index paths, dates, and other similar "small and numerous" classes all have reserved pointer tag values. For each of these "normal classes" (NSString, NSDate, NSNumber, etc.), there's a special internal subclass which implements all the same public API, but using a tagged pointer instead of a regular object.
As you can see, there's a value for OBJC_TAG_NSManagedObjectID. It turns out, that NSManagedObjectID objects were numerous and small enough that they would benefit greatly for this tagged-pointer representation. After all, the value of NSManagedObjectID might be a single integer, much like NSNumber, which would be wasteful to heap-allocate.
If you'd like to learn more about tagged pointers, I'd recommend Mike Ash's writings, such as https://www.mikeash.com/pyblog/friday-qa-2012-07-27-lets-build-tagged-pointers.html
There was also a recent WWDC talk on the subject: WWDC 2020 - Advancements in the Objective-C runtime
The strange address
So in the previous section we found out that _NSCoreDataTaggedObjectID is the tagged-pointer subclass of NSManagedObjectID. Now we can notice something else that's strange, the pointer value we saw had a lot of zeros: 0x8000000000000000. So what we're dealing with is probably some kind of uninitialized-state of an object.
Conclusion
The call stack can shed further light on where this happens exactly, but what we know is that somewhere in your program, the objectForKey: method is being invoked on an uninitialized value of NSManagedObjectID.
You're probably accessing a value too-early, before it's properly initialized.
To work around this you can take one of several approaches:
A future ideal world, use would just use the structured concurrency of Swift 5.5 (once that's available on enough devices) and async/await to push the work on the background and await the result.
Use a completion handler to invoke your value-consuming code only after the value is ready. This is most immediately-easy, but will blow up your code base with completion handler boilerplate and bugs.
Use a concurrency abstraction library, like Combine, RxSwift, or PromiseKit. This will be a bit more work to set up, but usually leads to much clearer/safer code than throwing completion handlers in everywhere.
The basic pattern to achieve thread safety is to never mutate/access the same property from multiple threads at the same time. The simplest solution is to just never let any background queue interact with your property directly. So, create a local variable that the background queue will use, and then dispatch the updating of the property to the main queue.
Personally, I wouldn't have setup interact with myDict at all, but rather return the result via the completion handler, e.g.
// properties
var myDict = ...
private let queue = DispatchQueue(label: "my-queue", qos: .utility) // some background queue on which we'll recalculate what will eventually be used to update `myProperty`
// method doesn't reference `myDict` at all, but uses local var only
func setup(completion: #escaping (Foo) -> Void) {
queue.async {
var results = ... // some local variable that we'll use as we're building up our results
// do time-consuming population of `results` here;
// do not touch `myDict` here, though;
// when all done, dispatch update of `myDict` back to the main queue
DispatchQueue.main.async { // dispatch update of property back to the main queue
completion(results)
}
}
}
Then the routine that calls setup can update the property (and trigger necessary UI update, too).
setup { results in
self.myDict = results
// also trigger UI update here, too
}
(Note, your closure parameter type (Foo in my example) would be whatever type myDict is. Maybe a typealias as advised elsewhere, or better, use custom types rather than dictionary within dictionary within dictionary. Use whatever type you’d prefer.)
By the way, your question’s title and preamble talks about TSAN and thread safety, but you then share a “unrecognized selector” exception, which is a completely different issue. So, you may well have two completely separate issues going on. A TSAN data race error would have produced a very different message. (Something like the error I show here.) Now, if setup is mutating myDict from a background thread, that undoubtedly will lead to thread-safety problems, but your reported exception suggests there might also be some other problem, too...
Background
I have a class Data that stores multiple input parameters and a single output value.
The output value is recalculated whenever one of the input parameters is mutated.
The calculation takes a non-trivial amount of time so it is performed asynchronously.
If one of the input parameters changes during recalculation, the current calculation is cancelled, and a new one is begun.
The cancellation logic is implemented via a serialized queue of calculation operations and a key (reference instance) (Data.key). Data.key is set to a new reference instance every time a new recalculation is added to the queue. Also only a single recalculation can occur at a time — due to the queue. Any executing recalculation constantly checks if it was the most recently initiated calculation by holding a reference to both the key that what was created with it when it was initiated and the currently existing key. If they are different, then a new recalculation has been queued since it began, and it will terminate.
This will trigger the next recalculation in the queue to begin, repeating the process.
The basis for my question
The reassignment of Data.key is done on the main thread.
The current calculation constantly checks to see if its key is the same as the current one. This means another thread is constantly accessing Data.key.
Question(s)
Is it safe for me to leave Data.key vulnerable to being read/written to at the same time?
Is it even possible for a property to be read and written to simultaneously?
Yes Data.Key vulnerable to being read/written to at the same time.
Here is example were i'm write key from main thread and read from MySerialQueue.
If you run that code, sometimes it would crash.
Crash happens because of dereference of pointer that point to memory released during writing by main queue.
Xcode have feature called ThreadSanitizer, it would help to catch such problems.
Discussion About Race condition
func experiment() {
var key = MyClass()
var key2 = MyClass()
class MyClass {}
func writer() {
for _ in 0..<1000000 {
key = MyClass()
}
}
func reader() {
for _ in 0..<1000000 {
if key === key2 {}
}
}
DispatchQueue.init(label: "MySerialQueue").async {
print("reader begin")
reader()
print("reader end")
}
DispatchQueue.main.async {
print("writer begin")
writer()
print("writer end")
}
}
Q:
Is it safe for me to leave Data.key vulnerable to being read/written to at the same time?
A:
No
Q:
Is it even possible for a property to be read and written to simultaneously?
A:
Yes, create a separate queue for the Data.Key that only through which you access it. As long as any operation (get/set) is restricted within this queue you can read or write from anywhere with thread safety.
I'm currently facing a problem with some Swift source files when a crash occurs. Indeed, on Crashlytics I have a weird info about the line and the reason of the crash. It tells me the source has crashed at the line 0 and it gives me a SIGTRAP error. I read that this error occurs when a Thread hits a BreakPoint. But the problem is that this error occurs while I'm not debugging (application test from TestFlight).
Here is an example when Crashlytics tells me there's a SIGTRAP Error at line 0 :
// Method that crashs
private func extractSubDataFrom(writeBuffer: inout Data, chunkSize: Int) -> Data? {
guard chunkSize > 0 else { // Prevent from having a 0 division
return nil
}
// Get nb of chunks to write (then the number of bytes from it)
let nbOfChunksToWrite: Int = Int(floor(Double(writeBuffer.count) / Double(chunkSize)))
let dataCountToWrite = max(0, nbOfChunksToWrite * chunkSize)
guard dataCountToWrite > 0 else {
return nil // Not enough data to write for now
}
// Extract data
let subData = writeBuffer.extractSubDataWith(range: 0..<dataCountToWrite)
return subData
}
Another Swift file to explain what happens at the line "writeBuffer.extractSubDataWith(range: 0..
public extension Data {
//MARK: - Public
public mutating func extractSubDataWith(range: Range) -> Data? {
guard range.lowerBound >= 0 && range.upperBound <= self.count else {
return nil
}
// Get a copy of data and remove them from self
let subData = self.subdata(in: range)
self.removeSubrange(range)
return subData
}
}
Could you tell me what I'm doing wrong ? Or what can occurs this weird SIGTRAP error ?
Thank you
Crashing with a line of zero is indeed weird. But, common in Swift code.
The Swift compiler can do code generation on your behalf. This can happen quite a bit with generic functions, but may also happen for other reasons. When the compiler generates code, it also produces debug information for the code it generates. This debug information typically references the file that caused the code to be generated. But, the compiler tags it all with a line of 0 to distinguish it from code that was actually written by the developer.
These generic functions also do not have to be written by you - I've seen this happen with standard library functions too.
(Aside: I believe that the DWARF standard can, in fact, describe this situation more precisely. But, unfortunately Apple doesn't seem to use it in that way.)
Apple verified this line zero behavior via a Radar I filed about it a number of years ago. You can also poke around in your app's own debug data (via, for example dwarfdump) if you want to confirm.
One reason you might want to try to do this, is if you really don't trust that Crashlytics is labelling the lines correctly. There's a lot of stuff between their UI and the raw crash data. It is conceivable something's gone wrong. The only way you can confirm this is to grab the crashing address + binary, and do the lookup yourself. If dwarfdump tells you this happened at line zero, then that confirms this is just an artifact of compile-time code generation.
However, I would tend to believe there's nothing wrong with the Crashlytics UI. I just wanted to point it out as a possibility.
As for SIGTRAP - there's nothing weird about that at all. This is just an indication that the code being run has decided to terminate the process. This is different, for example, from a SIGBUS, where the OS does the terminating. This could be caused by Swift integer and/or range bounds checking. Your code does have some of that kind of thing in both places. And, since that would be so performance-critical - would be a prime candidate for inline code generation.
Update
It now seems like, at least in some situations, the compiler also now uses a file name of <compiler-generated>. I'm sure they did this to make this case clearer. So, it could be that with more recent versions of Swift, you'll instead see <compiler-generated>:0. This might not help tracking down a crash, but will least make things more obvious.
Is it possible to typecast an object like so (let the code speak for itself):
protocol Parent {
...
}
class Child<LiterallyAnyValue, SameAsThePrevious>: Parent {
...
}
And then when using it:
func foobar(parent: Parent) {
if parent is Child { //ONE
print(parent as! Child) //TWO
}
}
At the signed points xcode wants me to supply the two types of "Child" within <> like Child<Int, String>...
The problem is that those types could be anything... LITERALLY
(And I've tried Child<Any, Any> but that doesn't work in this case)
Is there a workaround or a solution to this?
-------- Clarification --------
I am working on an iOS 7 project so I can't really use any modern library :)
That is including PromiseKit and Alamofire and the app has to make tons of http requests. The use promises in requests has grown on me, so I created my own Promise class.
At first I made it so that the Promise class would not be a generic and it would accept Any? as the value of the resolution procedure.
After that I wanted to improve my little Promise class with type clarification so the class Promise became "class Promise<T>"
In my implementation the then method created a PromiseSubscriber which then would be stored in a Promise property called subscribers.
The PromiseSubscriber is the protocol here that has two subset classes, one being PromiseHandler (this is called when the promise is resolved), and the other the PromiseCatcher (this is called when the promise is rejected)
Both PromiseSubscriber subset classes have a property called promise and one called handler.
These classes are also generics so that you know what kind of Promise they store and what is the return type of the handler.
In my resolution process I have to check if the PromiseSubscriber is a (let's say) PromiseHandler and if it is then call the handler that returns something and then resolve the subscribed promise with that value.
And here is the problem. I can't check if the subscriber is a catcher or a handler...
I hope it's clear enough now. Maybe this is not the right approach, I honestly don't know I am just trying to create something that is fun and easy to use (code completion without checking the type).
If it's still not clear and you are willing to help me, I'll send over the classes!
It's a little difficult to understand what you're really trying to do here (please tell me it's something other than JSON parsing; I'm so tired of JSON parsing and it's the only thing people ever ask about), but the short answer is almost certainly no. Some part of that is probably a misuse of types, and some part of that is a current limitation in Swift.
To focus on the limitation in Swift part, Swift lacks higher-kinded types. It is not possible to talk about Array. This is not a type in Swift. You can only work with Array<Int> or Array<String> or even Array<T>, but only in cases where T can be determined at compile time. There are several ways to work through this, but it really depends on what your underlying problem is.
To the misuse of types side, you generally should not have if x is ... in Swift. In the vast majority of cases this should be solved with a protocol. Whatever you were going to do in the if, make it part of the Parent protocol and give it a default empty implementation. Then override that implementation in Child. For example:
protocol Parent {
func doSpecialThing()
}
extension Parent {
func doSpecialThing() {} // nothing by default
}
class Child<LiterallyAnyValue, SameAsThePrevious>: Parent {}
extension Child {
func doSpecialThing() {
print(self)
}
}
func foobar(parent: Parent) {
parent.doSpecialThing()
}
Thanks for the clarification; Promise is a great thing to play with. Your mistake is here:
In my resolution process I have to check if the PromiseSubscriber is a (let's say) PromiseHandler and if it is then call the handler that returns something and then resolve the subscribed promise with that value.
Your resolution process should not need to know if it's a handler or a catcher. If it does, then your PromiseSubscriber protocol is incorrectly defined. The piece it sounds like you're missing is a Result. Most Promise types are built on top of Result, which is an enum bundling either success or failure. In your scheme, handlers would process successful Results and ignore failing results. Catchers would process failing results and ignore successful Results. The promise resolution shouldn't care, though. It should just send the Result to all subscribers and let them do what they do.
You can build this without a Result type by using a protocol as described above.
protocol PromiseSubscriber {
associatedType Wrapped // <=== It's possible you've also missed this piece
func handleSuccess(value: Wrapped)
func handleFailure(failure: Error)
}
extension PromiseSubscriber {
func handleSuccess(value: Wrapped) {} // By default do nothing
func handleFailure(failure: Error) {}
}
class PromiseHandler<Wrapped> {
func handleSuccess(value: Wrapped) { ... do your thing ... }
}
class PromiseCatcher {
func handleFailure(failure: Error) { ... do your thing ... }
}
I recommend studying PinkyPromise. It's a nice, simple Promise library (unlike PromiseKit which adds a lot of stuff that can make it harder to understand). I probably wouldn't use a protocol here; the associatedtype makes things a bit harder and I don't think you get much out of it. I'd use Result.
Use a generic type in your foobar function, the one below requires the parent parameter to conform to the Parent protocol and T will represent the class of the object passed.
func foobar<T: Parent>(parent: T) {
print(parent)
}
I have almost 4 years of experience with Objective C and a newbie in swift. Am trying to understand the concept of swift from the perspective of Objective C. So if I am wrong please guide me through :)
In objective c, we have blocks (chunck of code which can be executed later asynchronously) which made absolutely perfect sense. But in swift now we can pass a function as a parameter to another function, which can be executed later, and then we have closure as well.
As per Apple "functions are special cases of clauses."
As per O'Reilly "when a function is passed around as a value, it carries along its internal references to external variables. That is what makes a function a closure."
So I tried a little bit to understand the same :)
Here is my closure
override func viewDidLoad() {
super.viewDidLoad()
// Do any additional setup after loading the view, typically from a ni
let tempNumber : Int = 5
let numbers = [1,2,3,4,5]
print (numbers.map({ $0 - tempNumber}))
}
The variable tempNumber is declared even before the closure was declared, yet closure has the access to the variable. Now rather then a map, I tried using a custom class passed closure as a parameter and tried executing the same code :) Though now closure is getting executed in differrent scope, it still has the access to tempNumber.
I concluded : closures have an access to the variables and methods which are declared in the same scope as closure it self, though it gets execcuted in differrent scope.
Now rather then passing closure as paramter, tried passing function as a parameter,
class test {
func testFunctionAsParameter(testMethod : (Int) -> Int){
let seconds = 4.0
let delay = seconds * Double(NSEC_PER_SEC) // nanoseconds per seconds
let dispatchTime = dispatch_time(DISPATCH_TIME_NOW, Int64(delay))
dispatch_after(dispatchTime, dispatch_get_main_queue(), {
self.callLater(testMethod)
})
}
func callLater(testMethod : (Int) -> Int) -> Int {
return testMethod(100)
}
}
In a differrent class I created an instance of Test and used it as follow
/* in differrent class */
override func viewDidLoad() {
super.viewDidLoad()
// Do any additional setup after loading the view, typically from a ni
let tempAge : Int = 5
func test2(val : Int) -> Int {
return val - tempAge;
}
let testObj = test();
print(testObj.testFunctionAsParameter(test2))
}
Declared a class called test, which has a method called testFunctionAsParameter which in turn calls another method called callLater and finally this method executes the passed function :)
Now all these circus, just to ensure that passed method gets executed in differrent scope :)
When I executed the above code :) I was shocked to see that even though function passed as a parameter finally gets executed in different scope, still has the access to the variables testNumber that was declared at the same scope as the method declaration :)
I concluded : O'Reilly's statement "when a function is passed around as a value, it carries along its internal references to external variables." was bang on :)
Now my doubt is apple says functions are special cases of clauses. I thought special case must be something to do with scope :) But to my surprise code shows that both closure and function have access to variables in external scope !!!!
Other then, syntax differrence how closure is differrent from function passed as argument ??? Now there must be some differrence internally, otherwise Apple wouldn't have spent so much time in designing it :)
If not scope?? then what else is different in closure and function ?? O'Reilly states "when a function is passed around as a value, it carries along its internal references to external variables. That is what makes a function a closure." so what is it trying to point out ? that closure wont carry references to external variables ? Now they can't be wrong either, are they?
I am going mad with two conflicting statements from Apple and O'Reilly :( Please help, am I understanding something wrong ?? Please help me understand the difference.
In swift there really isn't any difference between functions and closures. A closure is an anonymous function (A function with no name.) That's about it, other than the syntax differences you noted.
In Objective-C functions and blocks/closures ARE different.
Based on your post, it looks like you have a pretty full understanding on the topic and you may just be getting lost in the semantics. It basically boils down to:
1. a closure is a closure
2. a function is a closure with a name
Here's a post with more details. But again, it's mostly a discussion of semantics. The "answer" is very simple.
What is the difference between functions and closures?