Background
I have a class Data that stores multiple input parameters and a single output value.
The output value is recalculated whenever one of the input parameters is mutated.
The calculation takes a non-trivial amount of time so it is performed asynchronously.
If one of the input parameters changes during recalculation, the current calculation is cancelled, and a new one is begun.
The cancellation logic is implemented via a serialized queue of calculation operations and a key (reference instance) (Data.key). Data.key is set to a new reference instance every time a new recalculation is added to the queue. Also only a single recalculation can occur at a time — due to the queue. Any executing recalculation constantly checks if it was the most recently initiated calculation by holding a reference to both the key that what was created with it when it was initiated and the currently existing key. If they are different, then a new recalculation has been queued since it began, and it will terminate.
This will trigger the next recalculation in the queue to begin, repeating the process.
The basis for my question
The reassignment of Data.key is done on the main thread.
The current calculation constantly checks to see if its key is the same as the current one. This means another thread is constantly accessing Data.key.
Question(s)
Is it safe for me to leave Data.key vulnerable to being read/written to at the same time?
Is it even possible for a property to be read and written to simultaneously?
Yes Data.Key vulnerable to being read/written to at the same time.
Here is example were i'm write key from main thread and read from MySerialQueue.
If you run that code, sometimes it would crash.
Crash happens because of dereference of pointer that point to memory released during writing by main queue.
Xcode have feature called ThreadSanitizer, it would help to catch such problems.
Discussion About Race condition
func experiment() {
var key = MyClass()
var key2 = MyClass()
class MyClass {}
func writer() {
for _ in 0..<1000000 {
key = MyClass()
}
}
func reader() {
for _ in 0..<1000000 {
if key === key2 {}
}
}
DispatchQueue.init(label: "MySerialQueue").async {
print("reader begin")
reader()
print("reader end")
}
DispatchQueue.main.async {
print("writer begin")
writer()
print("writer end")
}
}
Q:
Is it safe for me to leave Data.key vulnerable to being read/written to at the same time?
A:
No
Q:
Is it even possible for a property to be read and written to simultaneously?
A:
Yes, create a separate queue for the Data.Key that only through which you access it. As long as any operation (get/set) is restricted within this queue you can read or write from anywhere with thread safety.
Related
In summary:
The cascade effect nature of the Cold Stream, from Inactive to Active, Is in itself an "alien" execution (alien to the reactive design) that MUST BE EXECUTED WITHIN THE SYNCRHONIZED REGION, and this is unavoidable, going against Item 79 of Effective Java.
Effective Java: Item 79:
"..., to avoid deadlock and data corruption, never call an alien
method from within a synchronized region. More generally, keep the
amount of work that you do from within synchronized to a minimum."
never call an alien
method from within a synchronized region
An add(Consumer<T> observer) AND remove(Consumer<T> observer) WILL BE concurrent (because of switchMaps that react to asynchronous changes in values/states), BUT according to Item 79, it should not be possible for a subscribe(Publisher p); method to even exist.
Since a subscribe(publisher) MUST WORK as a callback function that reacts to additions and removals of observers...
private final Object lock = new Object();
private volatile BooleanConsumer suscriptor;
public void subscribe(Publisher p) {
syncrhonized (lock) {
suscriptor = isActive -> {
if (isActive) p.add(this);
else p.remove(this);
}
}
}
public void add(Consumer<T> observer) {
syncrhonized (lock) {
observers.add(observer);
if (observer.size() > 0) suscriptor.accept(true);
}
}
I would argue that using a volatile mediator is better than holding on to the Publisher directly, but holding on to the publisher makes no difference at all, because by altering its state (when adding ourselves to the publisher) we are triggering the functions (other possible subscriptions to publishers) within it!!!, There really is no difference.
Doing it via misdirection is the proper answer, and doing so is the main idea behind the separation of concerns principle!!
Instead, what Item 79 is asking, is that each time an observer is added, we manually synchronize FROM THE OUT/ALIEN-SIDE, and deliberately check whether a subscription must be performed.
synchronized (alienLock) {
observable.add(observer);
if (observable.getObserverSize() > 0) {
publishier.add(observable);
}
}
and each time an observer is removed:
synchronized (alienLock) {
observable.remove(observer);
if (observable.getObserverSize() == 0) {
publishier.remove(observable);
}
}
Imagine those lines repeated EACH and EVERY TIME a node forks or joins on to a new one (in the reactive graph), it would be an insane amount of boilerplate defeating the entire purpose.
Reading carefully the item, you can see that the rule is there to prevent a "something" done wrong by the user that hangs the thread preventing accesses.
And this answer will be part me trying to justify why this is possible but also a non-issue in this case.
Binary Events.
In this case an event that involves only 2 states, isActive == true || false;
This means that if the consumption gets "hanged" on true, the only other option that may be waiting is "false", BUT even worst...
IF one of the two becomes deadlocked, it means the entire system is deadlocked anyways... in reality the issue is outside the design, not inside.
What I mean is that out of the 2 possible options: true or false. the time it takes for either of them to execute is meaningless since the ONLY OTHER OPTION IS STILL REQUIRED TO WAIT regardless.
Enclosed functionality of the lock.
Even if subscribe(Publisher p) methods can be concatenated, the only thing the user has access to, IS NOT THE lock per se, but the method.
So even If we are executing "alien methods" with functions, inside our synchronized bodies, THOSE FUNCTIONS ARE STILL OURS, and we know what they have inside them and how they work and what exactly they will perform.
In this case the only uncertainty in the system is not what the functions will do, but HOW MANY CONTATENATIONS THE SYSTEM HAS.
What's wrong in my code:
Finally, the only thing (I see) wrong, is that observers and subscriptions MOST DEFINITEY WORK IN SEPARATE LOCKS, because observers MUST NOT, under any circumstance should allow themselves to get locked while a subscription domino effect is taking place.
I believe that's all...
I have encountered a data race in my app using Xcode's Thread Sanitizer and I have a question on how to address it.
I have a var defined as:
var myDict = [Double : [Date:[String:Any]]]()
I have a thread setup where I call a setup() function:
let queue = DispatchQueue(label: "my-queue", qos: .utility)
queue.async {
self.setup {
}
}
My setup() function essentially loops through tons of data and populates myDict. This can take a while, which is why we need to do it asynchronously.
On the main thread, my UI accesses myDict to display its data. In a cellForRow: method:
if !myDict.keys.contains(someObject) {
//Do something
}
And that is where I get my data race alert and the subsequent crash.
Exception NSException * "-[_NSCoreDataTaggedObjectID objectForKey:]:
unrecognized selector sent to instance
0x8000000000000000" 0x0000000283df6a60
Please kindly help me understand how to access a variable in a thread safe manner in Swift. I feel like I'm possibly half way there with setting, but I'm confused on how to approach getting on the main thread.
One way to access it asynchronously:
typealias Dict = [Double : [Date:[String:Any]]]
var myDict = Dict()
func getMyDict(f: #escaping (Dict) -> ()) {
queue.async {
DispatchQueue.main.async {
f(myDict)
}
}
}
getMyDict { dict in
assert(Thread.isMainThread)
}
Making the assumption, that queue possibly schedules long lasting closures.
How it works?
You can only access myDict from within queue. In the above function, myDict will be accessed on this queue, and a copy of it gets imported to the main queue. While you are showing the copy of myDict in a UI, you can simultaneously mutate the original myDict. "Copy on write" semantics on Dictionary ensures that copies are cheap.
You can call getMyDict from any thread and it will always call the closure on the main thread (in this implementation).
Caveat:
getMyDict is an async function. Which shouldn't be a caveat at all nowadays, but I just want to emphasise this ;)
Alternatives:
Swift Combine. Make myDict a published Value from some Publisher which implements your logic.
later, you may also consider to use async & await when it is available.
Preface: This will be a pretty long non-answer. I don't actually know what's wrong with your code, but I can share the things I do know that can help you troubleshoot it, and learn some interesting things along the way.
Understanding the error
Exception NSException * "-[_NSCoreDataTaggedObjectID objectForKey:]: unrecognized selector sent to instance 0x8000000000000000"
An Objective C exception was thrown (and not caught).
The exception happened when attempting to invoke -[_NSCoreDataTaggedObjectID objectForKey:]. This is a conventional way to refer to an Objective C method in writing. In this case, it's:
An instance method (hence the -, rather than a + that would be used for class methods)
On the class _NSCoreDataTaggedObjectID (more on this later)
On the method named objectForKey:
The object receiving this method invocation is the one with address 0x8000000000000000.
This is a pretty weird address. Something is up.
Another hint is the strange class name of _NSCoreDataTaggedObjectID. There's a few observations we can make about it:
The prefixed _NS suggests that it's an internal implementation detail of CoreData.
We google the name to find class dumps of the CoreData framework, which show us that:
_NSCoreDataTaggedObjectID subclasses _NSScalarObjectID
Which subclasses _NSCoreManagedObjectID
Which subclasses NSManagedObjectID
NSManagedObjectID is a public API, which has its own first-party documentation.
It has the word "tagged" in its name, which has a special meaning in the Objective C world.
Some back story
Objective C used message passing as its sole mechanism for method dispatch (unlike Swift which usually prefers static and v-table dispatch, depending on the context). Every method call you wrote was essentially syntactic sugar overtop of objc_msgSend (and its variants), passing to it the receiver object, the selector (the "name" of the method being invoked) and the arguments. This was a special function that would do the job of checking the class of the receiver object, and looking through that classes' hierarchy until it found a method implementation for the desired selector.
This was great, because it allows you to do a lot of cool runtime dynamic behaviour. For example, menu bar items on a macOS app would just define the method name they invoke. Clicking on them would "send that message" to the responder chain, which would invoke that method on the first object that had an implementation for it (the lingo is "the first object that answers to that message").
This works really well, but has several trade-offs. One of them was that everything had to be an object. And by object, we mean a heap-allocated memory region, whose first several words of memory stored meta-data for the object. This meta-data would contain a pointer to the class of the object, which was necessary for doing the method-loopup process in objc_msgSend as I just described.
The issue is, that for small objects, (particularly NSNumber values, small strings, empty arrays, etc.) the overhead of these several words of object meta-data might be several times bigger than the actual object data you're interested in. E.g. even though NSNumber(value: true /* or false */) stores a single bit of "useful" data, on 64 bit systems there would be 128 bits of object overhead. Add to that all the malloc/free and retain/release overhead associated with dealing with large numbers of tiny object, and you got a real performance issue.
"Tagged pointers" were a solution to this problem. The idea is that for small enough values of particular privileged classes, we won't allocate heap memory for their objects. Instead, we'll store their objects' data directly in their pointer representation. Of course, we would need a way to know if a given pointer is a real pointer (that points to a real heap-allocated object), or a "fake pointer" that encodes data inline.
The key realization that malloc only ever returns memory aligned to 16-byte boundaries. This means that 4 bits of every memory address were always 0 (if they weren't, then it wouldn't have been 16-byte aligned). These "unused" 4 bits could be employed to discriminate real pointers from tagged pointers. Exactly which bits are used and how differs between process architectures and runtime versions, but the general idea is the same.
If a pointer value had 0000 for those 4 bits then the system would know it's a real object pointer that points to a real heap-allocated object. All other possible values of those 4-bit values could be used to signal what kind of data is stored in the remaining bits. The Objective C runtime is actually opensource, so you can actually see the tagged pointer classes and their tags:
{
// 60-bit payloads
OBJC_TAG_NSAtom = 0,
OBJC_TAG_1 = 1,
OBJC_TAG_NSString = 2,
OBJC_TAG_NSNumber = 3,
OBJC_TAG_NSIndexPath = 4,
OBJC_TAG_NSManagedObjectID = 5,
OBJC_TAG_NSDate = 6,
// 60-bit reserved
OBJC_TAG_RESERVED_7 = 7,
// 52-bit payloads
OBJC_TAG_Photos_1 = 8,
OBJC_TAG_Photos_2 = 9,
OBJC_TAG_Photos_3 = 10,
OBJC_TAG_Photos_4 = 11,
OBJC_TAG_XPC_1 = 12,
OBJC_TAG_XPC_2 = 13,
OBJC_TAG_XPC_3 = 14,
OBJC_TAG_XPC_4 = 15,
OBJC_TAG_NSColor = 16,
OBJC_TAG_UIColor = 17,
OBJC_TAG_CGColor = 18,
OBJC_TAG_NSIndexSet = 19,
OBJC_TAG_NSMethodSignature = 20,
OBJC_TAG_UTTypeRecord = 21,
// When using the split tagged pointer representation
// (OBJC_SPLIT_TAGGED_POINTERS), this is the first tag where
// the tag and payload are unobfuscated. All tags from here to
// OBJC_TAG_Last52BitPayload are unobfuscated. The shared cache
// builder is able to construct these as long as the low bit is
// not set (i.e. even-numbered tags).
OBJC_TAG_FirstUnobfuscatedSplitTag = 136, // 128 + 8, first ext tag with high bit set
OBJC_TAG_Constant_CFString = 136,
OBJC_TAG_First60BitPayload = 0,
OBJC_TAG_Last60BitPayload = 6,
OBJC_TAG_First52BitPayload = 8,
OBJC_TAG_Last52BitPayload = 263,
OBJC_TAG_RESERVED_264 = 264
You can see, strings, index paths, dates, and other similar "small and numerous" classes all have reserved pointer tag values. For each of these "normal classes" (NSString, NSDate, NSNumber, etc.), there's a special internal subclass which implements all the same public API, but using a tagged pointer instead of a regular object.
As you can see, there's a value for OBJC_TAG_NSManagedObjectID. It turns out, that NSManagedObjectID objects were numerous and small enough that they would benefit greatly for this tagged-pointer representation. After all, the value of NSManagedObjectID might be a single integer, much like NSNumber, which would be wasteful to heap-allocate.
If you'd like to learn more about tagged pointers, I'd recommend Mike Ash's writings, such as https://www.mikeash.com/pyblog/friday-qa-2012-07-27-lets-build-tagged-pointers.html
There was also a recent WWDC talk on the subject: WWDC 2020 - Advancements in the Objective-C runtime
The strange address
So in the previous section we found out that _NSCoreDataTaggedObjectID is the tagged-pointer subclass of NSManagedObjectID. Now we can notice something else that's strange, the pointer value we saw had a lot of zeros: 0x8000000000000000. So what we're dealing with is probably some kind of uninitialized-state of an object.
Conclusion
The call stack can shed further light on where this happens exactly, but what we know is that somewhere in your program, the objectForKey: method is being invoked on an uninitialized value of NSManagedObjectID.
You're probably accessing a value too-early, before it's properly initialized.
To work around this you can take one of several approaches:
A future ideal world, use would just use the structured concurrency of Swift 5.5 (once that's available on enough devices) and async/await to push the work on the background and await the result.
Use a completion handler to invoke your value-consuming code only after the value is ready. This is most immediately-easy, but will blow up your code base with completion handler boilerplate and bugs.
Use a concurrency abstraction library, like Combine, RxSwift, or PromiseKit. This will be a bit more work to set up, but usually leads to much clearer/safer code than throwing completion handlers in everywhere.
The basic pattern to achieve thread safety is to never mutate/access the same property from multiple threads at the same time. The simplest solution is to just never let any background queue interact with your property directly. So, create a local variable that the background queue will use, and then dispatch the updating of the property to the main queue.
Personally, I wouldn't have setup interact with myDict at all, but rather return the result via the completion handler, e.g.
// properties
var myDict = ...
private let queue = DispatchQueue(label: "my-queue", qos: .utility) // some background queue on which we'll recalculate what will eventually be used to update `myProperty`
// method doesn't reference `myDict` at all, but uses local var only
func setup(completion: #escaping (Foo) -> Void) {
queue.async {
var results = ... // some local variable that we'll use as we're building up our results
// do time-consuming population of `results` here;
// do not touch `myDict` here, though;
// when all done, dispatch update of `myDict` back to the main queue
DispatchQueue.main.async { // dispatch update of property back to the main queue
completion(results)
}
}
}
Then the routine that calls setup can update the property (and trigger necessary UI update, too).
setup { results in
self.myDict = results
// also trigger UI update here, too
}
(Note, your closure parameter type (Foo in my example) would be whatever type myDict is. Maybe a typealias as advised elsewhere, or better, use custom types rather than dictionary within dictionary within dictionary. Use whatever type you’d prefer.)
By the way, your question’s title and preamble talks about TSAN and thread safety, but you then share a “unrecognized selector” exception, which is a completely different issue. So, you may well have two completely separate issues going on. A TSAN data race error would have produced a very different message. (Something like the error I show here.) Now, if setup is mutating myDict from a background thread, that undoubtedly will lead to thread-safety problems, but your reported exception suggests there might also be some other problem, too...
I have a singleton that manages an array. This singleton can be accessed from multiple threads, so it has its own internal DispatchQueue to manage read/write access across threads. For simplicity we'll say it's a serial queue.
There comes a time where the singleton will be reading from the array and updating the UI. How do I handle this?
Which thread my internal dispatch queue is not known, right? It's just an implementation detail I'm to not worry about? In most cases this seems fine, but in this one specific function I need to be sure it uses the main thread.
Is it okay to do something along the lines of:
myDispatchQueue.sync { // Synchronize with internal queue to ensure no writes/reads happen at the same time
DispatchQueue.main.async { // Ensure that it's executed on the main thread
for item in internalArray {
// Pretend internalArray is an array of strings
someLabel.text = item
}
}
}
So my questions are:
Is that okay? It seems weird/wrong to be nesting dispatch queues. Is there a better way? Maybe something like myDispatchQueue.sync(forceMainThread: true) { ... }?
If I DID NOT use DispatchQueue.main.async { ... }, and I called the function from the main thread, could I be sure that my internal dispatch queue will execute it on the same (main) thread as what called it? Or is that also an "implementation detail" where it could be, but it could also be called on a background thread?
Basically I'm confused that threads seem like an implementation detail you're not supposed to worry about with queues, but what happens on the odd chance when you DO need to worry?
Simple example code:
class LabelUpdater {
static let shared = LabelUpdater()
var strings: [String] = []
private let dispatchQueue: dispatchQueue
private init {
dispatchQueue = DispatchQueue(label: "com.sample.me.LabelUpdaterQueue")
super.init()
}
func add(string: String) {
dispatchQueue.sync {
strings.append(string)
}
}
// Assume for sake of example that `labels` is always same array length as `strings`
func updateLabels(_ labels: [UILabel]) {
// Execute in the queue so that no read/write can occur at the same time.
dispatchQueue.sync {
// How do I know this will be on the main thread? Can I ensure it?
for (index, label) in labels.enumerated() {
label.text = strings[index]
}
}
}
}
Yes, you can nest a dispatch to one queue inside a dispatch to another queue. We frequently do so.
But be very careful. Just wrapping an asynchronous dispatch to the main queue with a dispatch from your synchronizing queue is insufficient. Your first example is not thread safe. That array that you are accessing from the main thread might be mutating from your synchronization queue:
This is a race condition because you potentially have multiple threads (your synchronization queue’s thread and the main thread) interacting with the same collection. Rather than having your dispatched block to the main queue just interact objects directly, you should make a copy of of it, and that’s what you reference inside the dispatch to the main queue.
For example, you might want to do the following:
func process(completion: #escaping (String) -> Void) {
syncQueue.sync {
let result = ... // note, this runs on thread associated with `syncQueue` ...
DispatchQueue.main.async {
completion(result) // ... but this runs on the main thread
}
}
}
That ensures that the main queue is not interacting with any internal properties of this class, but rather just the result that was created in this closure passed to syncQueue.
Note, all of this is unrelated to it being a singleton. But since you brought up the topic, I’d advise against singletons for model data. It’s fine for sinks, stateless controllers, and the like, but not generally advised for model data.
I’d definitely discourage the practice of initiating UI controls updates directly from the singleton. I’d be inclined to provide these methods completion handler closures, and let the caller take care of the resulting UI updates. Sure, if you want to dispatch the closure to the main queue (as a convenience, common in many third party API), that’s fine. But the singleton shouldn’t be reaching in and update UI controls itself.
I’m assuming you did all of this just for illustrative purposes, but I added this word of caution to future readers who might not appreciate these concerns.
Try using OperationQueues(Operations) as they do have states:
isReady: It’s prepared to start
isExecuting: The task is currently running
isFinished: Once the process is completed
isCancelled: The task canceled
Operation Queues benefits:
Determining Execution Order
observe their states
Canceling Operations
Operations can be paused, resumed, and cancelled. Once you dispatch a
task using Grand Central Dispatch, you no longer have control or
insight into the execution of that task. The NSOperation API is more
flexible in that respect, giving the developer control over the
operation’s life cycle
https://developer.apple.com/documentation/foundation/operationqueue
https://medium.com/#aliakhtar_16369/concurrency-in-swift-operations-and-operation-queue-part-3-a108fbe27d61
I often come across the key terms "thread safe" and wonder what it means. For example, in Firebase or Realm, some objects are considered "Thread Safe". What exactly does it mean for something to be thread safe?
Thread Unsafe
- If any object is allowed to modify by more than one thread at the same time.
Thread Safe
- If any object is not allowed to modify by more than one thread at the same time.
Generally, immutable objects are thread-safe.
An object is said to be thread safe if more than one thread can call methods or access the object's member data without any issues; an "issue" broadly being defined as being a departure from the behaviour when only accessed from only one thread.
For example an object that contains the code i = i + 1 for a regular integer i would not be thread safe, since two threads might encounter that statement and one thread might read the original value of i, increment it, then write back that single incremented value; all at the same time as another thread. In that way, i would be incremented only once, where it ought to have been incremented twice.
After searching for the answer, I got the following from this website:
Thread safe code can be safely called from multiple threads or concurrent tasks without causing any problems (data corruption, crashing, etc). Code that is not thread safe must only be run in one context at a time. An example of thread safe code is let a = ["thread-safe"]. This array is read-only and you can use it from multiple threads at the same time without issue. On the other hand, an array declared with var a = ["thread-unsafe"] is mutable and can be modified. That means it’s not thread-safe since several threads can access and modify the array at the same time with unpredictable results. Variables and data structures that are mutable and not inherently thread-safe should only be accessed from one thread at a time.
iOS Thread safe
[Atomicity, Visibility, Ordering]
[General lock, mutex, semaphore]
Thread safe means that your program works as expected. It is about multithreading envirompment, where we have a problem with shared resource with problems of Data races and Race Condition[About].
Apple provides us by Synchronization Tools:
Atomicity
Atomic Operations - lock free mechanism which is based on hardware instructions - for example Compare-And-Swap(CAS)[More]...
Objective-C OSAtomic, atomic property attribute[About]
[Swift Atomic Operations]
Visibility
Volatile Variable - read value from memory(no cache)
Objective-C volatile
Ordering
Memory Barriers - guarantees up-to date data[About]
Objective-C OSMemoryBarrier
Find problem in your code
Thread Sanitizer - uses self.recordAndCheckWrite(var) inside to figure out when(timestamp) and who(thread) access to variable
Synchronisation
Locks - thread can get a lock and nobody else access to the resource. NSLock.
Semaphore consists of Threads queue, Counter value and has wait() and signal() api. Semaphore allows a number of threads(Counter value) work with resource at a given moment. DispatchSemaphore, POSIX Semaphore - semaphore_t. App Group allows share POSIX semaphores
Mutex - mutual exclusion, mutually exclusive - is a type of Semaphore(allows several threads) where Thread can acquire it and is able to work with block as a single encroacher, all others thread will be blocked until release. The main different with lock is that mutex also works between processes(not only threads). Also it includes memory barrier.
var lock = os_unfair_lock_s()
os_unfair_lock_lock(&lock)
//critical section
os_unfair_lock_unlock(&lock)
NSLock -POSIX Mutex Lock - pthread_mutex_t, Objective-C #synchronized.
let lock = NSLock()
lock.lock()
//critical section
lock.unlock()
Recursive lock - Lock Reentrance - thread can acquire a lock several times. NSRecursiveLock
Spin lock - waiting thread checks if it can get a lock repeatedly based on polling mechanism. It is useful for small operation. In this case thread is not blocked and expensive operations like context switch is not nedded
[GCD]
Common approach is using custom serial queue with async call - where all access to memory will be done one by one:
serial read and write access
private let queue = DispatchQueue(label: "com.company")
self.queue.async {
//read and write access to shared resource
}
concurrent read and serial write access. when write is oocured - all previous read access finished -> write -> all other reads
private let queue = DispatchQueue(label: "com.company", attributes: .concurrent)
//read
self.queue.sync {
//read
}
//write
self.queue.sync(flags: .barrier) {
//write
}
Operations
[Actors]
actor MyData {
var sharedVariable = "Hello"
}
//using
Task {
await self.myData.sharedVariable = "World"
}
Multi threading:
[Concurrency vs Parallelism]
[Sync vs Async]
[Mutable vs Immutable] [let vs var]
[Swift thread safe Singleton]
[Swift Mutable/Immutable collection]
pthread - POSIX[About] thread
NSThead
To give a simple example. If something is shared across multiple threads without any issues like crash, it is thread-safe. For example, if you have a constant (let value = ["Facebook"]) and it is shared across multiple threads, it is thread safe because it is read-only and cannot be modified. Whereas, if you have a variable (var value = ["Facebook"]), it may cause potential crash or data loss when shared with multiple threads because it's data can be modified.
I would like to have to possibility to make thread (consumer) express interest in when another thread (producer) makes something. But not all the time.
Basically I want to make a one-shot consumer. Ideally the producer through would go merrily about its business until one (or many) consumers signal that they want something, in which case the producer would push some data into a variable and signal that it has done so. The consumer will wait until the variable has become filled.
It must also be so that the one-shot consumer can decide that it has waited too long and abandon the wait (a la pthread_cond_timedwait)
I've been reading many articles and SO questions about different ways to synchronize threads. Currently I'm leaning towards a condition variable approach.
I would like to know if this is a good way to go about it (being a novice at thread programming I probably have quite a few bugs in there), or if it perhaps would be better to (ab)use semaphores for this situation? Or something else entirely? Just an atomic assign to a pointer variable if available? I currently don't see how these would work safely, probably because I'm trying to stay on the safe side, this application is supposed to run for months, without locking up. Can I do without the mutexes in the producer? i.e.: just signal a condition variable?
My current code looks like this:
consumer {
pthread_mutex_lock(m);
pred = true; /* signal interest */
while (pred) {
/* wait a bit and hopefully get an answer before timing out */
pthread_cond_timedwait(c, m, t);
/* it is possible that the producer never produces anything, in which
case the pred will stay true, we must "designal" interest here,
unfortunately the also means that a spurious wake could make us miss
a good answer, no? How to combat this? */
pred = false;
}
/* if we got here that means either an answer is available or we timed out */
//... (do things with answer if not timed out, otherwise assign default answer)
pthread_mutex_unlock(m);
}
/* this thread is always producing, but it doesn't always have listeners */
producer {
pthread_mutex_lock(m);
/* if we have a listener */
if (pred) {
buffer = "work!";
pred = false;
pthread_cond_signal(c);
}
pthread_mutex_unlock(m);
}
NOTE: I'm on a modern linux and can make use of platform-specific functionality if necessary
NOTE2: I used the seemingly global variables m, c, and t. But these would be different for every consumer.
High-level recap
I want a thread to be able to register for an event, wait for it for a specified time and then carry on. Ideally it should be possible for more than one thread to register at the same time and all threads should get the same events (all events that came in the timespan).
What you want is something similar to a std::future in c++ (doc). A consumer requests a task to be performed by a producer using a specific function. That function creates a struct called future (or promise), holding a mutex, a condition variable associated with the task as well as a void pointer for the result, and returns it to the caller. It also put that struct, the task id, and the parameters (if any) in a work queue handled by the producer.
struct future_s {
pthread_mutex_t m;
pthread_cond_t c;
int flag;
void *result;
};
// basic task outline
struct task_s {
struct future_s result;
int taskid;
};
// specific "mytask" task
struct mytask_s {
struct future_s result;
int taskid;
int p1;
float p2;
};
future_s *do_mytask(int p1, float p2){
// allocate task data
struct mytask_s * t = alloc_task(sizeof(struct mytask_s));
t->p1 = p1;
t->p2 = p2;
t->taskid = MYTASK_ID;
task_queue_add(t);
return (struct future_s *)t;
}
Then the producer pull the task out of the queue, process it, and once terminated, put the result in the future and trigger the variable.
The consumer may wait for the future or do something else.
For a cancellable futures, include a flag in the struct to indicate that the task is cancelled. The future is then either:
delivered, the consumer is the owner and must deallocate it
cancelled, the producer remains the owner and disposes of it.
The producer must therefore check that the future has not been cancelled before triggering the condition variable.
For a "shared" future, the flag turns into a number of subscribers. If the number is above zero, the order must be delivered. The consumer owning the result is left to be decided between all consumers (First come first served? Is the result passed along to all consumers?).
Any access to the future struct must be mutexed (which works well with the condition variable).
Regarding the queues, they may be implemented using a linked list or an array (for versions with limited capacity). Since the functions creating the futures may be called concurrently, they have to be protected with a lock, which is usually implemented with a mutex.