NSCondition, what if no lock when call signal? - ios

From this Apple's document about NSCondition, the usage of NSCondition should be:
Thead 1:
[cocoaCondition lock];
while (timeToDoWork <= 0)
[cocoaCondition wait];
timeToDoWork--;
// Do real work here.
[cocoaCondition unlock];
Thread 2:
[cocoaCondition lock];
timeToDoWork++;
[cocoaCondition signal];
[cocoaCondition unlock];
And in the document of method signal in NSConditon:
You use this method to wake up one thread that is waiting on the condition. You may call this method multiple times to wake up multiple threads. If no threads are waiting on the condition, this method does nothing. To avoid race conditions, you should invoke this method only while the receiver is locked.
My question is:
I don't want the Thread 2 been blocked in any situation, so I removed the lock and unlock call in Thread 2. That is, Thread 2 can put as many work as it wish, Thread 1 will do the work one by one, if no more work, it wait (blocked). This is also a producer-consumer pattern, but the producer never been blocked.
But the way is not correct according to Apple's document So what things could possibly go wrong in this pattern? Thanks.

Failing to lock is a serious problem when multiple threads are accessing shared data. In the example from Apple's code, if Thread 2 doesn't lock the condition object then it can be incrementing timeToDoWork at the same time that Thread 1 is decrementing it. That can result in the results from one of those operations being lost. For example:
Thread 1 reads the current value of timeToDoWork, gets 1
Thread 2 reads the current value of timeToDoWork, gets 1
Thread 2 computes the incremented value (timeToDoWork + 1), gets 2
Thread 1 computes the decremented value (timeToDoWork - 1), gets 0
Thread 2 writes the new value of timeToDoWork, stores 2
Thread 1 writes the new value of timeToDoWork, stores 0
timeToDoWork started at 1, was incremented and decremented, so it should end at 1, but it actually ends at 0. By rearranging the steps, it could end up at 2, instead. Presumably, the value of timeToDoWork represents something real and important. Getting it wrong would probably screw up the program.
If your two threads are doing something as simple as incrementing and decrementing a number, they can do it without locks by using the atomic operation functions, such as OSAtomicIncrement32Barrier() and OSAtomicDecrement32Barrier(). However, if the shared data is any more complicated than that (and it probably is in any non-trivial case), then they really need to use synchronization mechanisms such as condition locks.

Related

How to implement a lock and unlock sequence in a metal shader?

How should I implement a lock/unlock sequence with Compare and Swap using a Metal compute shader.
I’ve tested this sample code but it does not seem to work. For some reason, the threads are not detecting that the lock was released.
Here is a brief explanation of the code below:
The depthFlag is an array of atomic_bools. In this simple example, I simply try to do a lock by comparing the content of depthFlag[1]. I then go ahead and do my operation and once the operation is done, I do an unlock.
As stated above, only one thread is able to do the locking/work/unlocking but the rest of the threads get stuck in the while loop. They NEVER leave. I expect another thread to detect the unlock and go through the sequence.
What am I doing wrong? My knowledge on CAS is limited, so I appreciate any tips.
kernel void testFunction(device float *depthBuffer[[buffer(4)]], device atomic_bool *depthFlag [[buffer(5)]], uint index[[thread_position_in_grid]]){
//lock
bool expected=false;
while(!atomic_compare_exchange_weak_explicit(&depthFlag[1],&expected,true,memory_order_relaxed,memory_order_relaxed)){
//wait
expected=false;
}
//Do my operation here
//unlock
atomic_store_explicit(&depthFlag[1], false, memory_order_relaxed);
//barrier
}
You essentially can't use the locking programming model for GPU concurrency. For one, the relaxed memory order model (the only one available) is not suitable for this; for another, you can't guarantee that other threads will make progress between your atomic operations. Your code must always be able to make progress, regardless of what the other threads are doing.
My recommendation is that you use something like the following model instead:
Read atomic value to check if another thread has already completed the operation in question.
If no other thread has done it yet, perform the operation. (But don't cause any side effects, i.e. don't write to device memory.)
Perform an atomic operation to indicate your thread has completed the operation while checking whether another thread got there first. (e.g. compare-and-swap a boolean, but increasing a counter also works)
If another thread got there first, don't perform side effects.
If your thread "won" and no other thread registered completion, perform your operation's side effects, e.g. do whatever you need to do to write out the result etc.
This works well if there's not much competition, and if the result does not vary depending on which thread performs the operation.
The occasional discarded work should not matter. If there is significant competition, use thread groups; within a thread group, the threads can coordinate which thread will perform which operation. You may still end up with wasted computation from competition between groups. If this is a problem, you may need to change your approach more fundamentally.
If the results of the operation are not deterministic, and the threads all need to proceed using the same result, you will need to change your approach. For example, split your kernels up so any computation which depends on the result of the operation in question runs in a sequentially queued kernel.

Thread Sanitizer in xcode giving wrong error

func doSomething() -> Int {
var sum = 0
let increaseWork = DispatchWorkItem {
sum = sum + 100 //point 1
}
DispatchQueue.global().async(execute:increaseWork)
increaseWork.wait()
return sum //point 2
}
Thread Sanitizer is saying that there is race condition between point 1 and point 2. But I don't think there is any race condition as increaseWork.wait() is blocking call and it will not pass until the closure is executed.
There could be a race condition between those 2 points. Imagine what happens if you execute the doSomething() function on 2 separate threads like this:
The first thread executes the increaseWork() closure and finishes it. It's at the line with wait right now
The second thread starts executing and hits the async execute instruction
The first thread hits the return instruction as the wait can continue
At the same time, the closure from the second is scheduled and executed
At this point, you can't tell for sure what is executed first: the sum = sum + 100 from the second thread or the return sum from the first one.
The idea is that sum is a shared resource which is not synchronised, so, in theory, that could be a race condition. Even if you took care that such thing does not happen, the Thread Sanitizer detects a possible race condition as it does not know whether you start a single thread or you execute doSomething() function from 100 different threads at the same time.
UPDATE:
As I missed the fact that the sum variable is local, the above explanation does not answer the current question. The scenario described would never take place in the given piece of code.
But, even though the sum is a local variable, due to the fact it is used and retained inside a closure, it will be allocated on heap, rather than on stack. That's because the Swift compiler cannot determine whether the closure will be finished its execution before the doSomething() function return or not.
Why? Because the closure is passed to a constructor which behaves like an #escaping parameter. It implies that there is no guarantee when the closure will be executed, thus all the variables retained by the closure must be allocated on heap for safety. Not knowing when the closure will be executed, the Thread Sanitizer cannot determine that the return sum statement will be indeed executed after the closure finishes.
So even though here we can be sure that no race condition can happen, the Thread Sanitizer raises an alarm, as it cannot determine it cannot happen.

Atomic property and usage

I have read many stackoverflow answers for this like Is an atomic property thread safe?, When to use #atomic? or Atomic properties vs thread-safe in Objective-C but I have question for this:
Please correct me If I am wrong, This is like that I am using a count variable which I have declared with Atomic property and currently its value is 5
which is accessed by two thread , First thread that is increasing count value by 2 and 2nd thread decreasing the count value by 1 ,According to my understanding this go sequentially like when first thread increased its value which is now 5 + 2 = 7; after then only 2nd thread can access count variable and only decrease its value by 1 and that is 7 - 1 = 6?
First thread that is increasing count value by 2
That is not an atomic operation, and atomic in no way helps you. This is (at least) three separate atomic operations:
read value
increase value
write value
This is the classic multi-writer race condition. Another thread might read between "read value" and "write value." In your example the final result could be 4, such that the increase operation is completely lost (A reads 5, B reads 5, A +2, A writes 7, B -1, B writes 4).
The problem that atomic is meant to solve is that "read value" and "write value" aren't even atomic operations in many platform-specific situations. The above may actually be 5 operations such as:
read lower word
read upper word
increase value
write lower word
write upper word
Without atomic, another thread might read between "write lower word" and "write upper word" and get a garbage value that was never written (half of one value and half of another). By using atomic, you ensure that readers will always receive a "legitimate" value (one that was written at some point). But that's not much of a promise.
But as is noted in the questions you provide, if you need to make reads and writes atomic, you almost certainly need more than that, because you also want to make "increase the value" atomic. So in practice atomic is rarely useful. If you don't need it, it's slow. If you do need it, it's probably insufficient.
Atomic based property where two threads are accessing same object to increase or decrease a value , You can understand this like both have accessed the same/whole value which is 5 but First thread trying to update this value then this is locked no other thread can update this at the same time however 2nd thread is also having same value which is 5 and can update only when Count object updation get exited by first thread.
Ok, threads might not execute sequentially, the first thread created might run after the second one. If the threads executes in the order you describe, the mentioned behavior is fine. But I think that your have a misconception about thread safe.
I encourage you to read more about Concurrency Programming Guide and Thread safe.

The time given to every thread whether the same or has a minimum time?

For example, in thread 1 there is executing something and it uses a global variable, but another thread may change this value
thread 1
a = 1;
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
NSLog(#"a = %d", a);
});
thread 2
a = 2;
there are two questions,
if thread 1 executes first and can I assume system will always print a = 1? or system can change to thread 2 halfway and then change to thread 1 and get a = 2?
if I don't put NSLog in dispatch_asyc(), whether this cause different result?
You can't tell or guarantee exactly. You can assign the threads (or queues) priorities but you still don't know exactly what will happen.
Yes, logging will make a difference to the runtime, again you don't know if it might make a difference to the thread management.
So, if you need something to be protected from access by multiple threads then you need to protect it by adding some synchronisation. How you choose to do that depends on what it is and each case needs to be considered separately.

Mutex are needed to protect the Condition Variables

As it is said that Mutex are needed to protect the Condition Variables.
Is the reference here to the actual condition variable declared as pthread_cond_t
OR
A normal shared variable count whose values decide the signaling and wait.
?
is the reference here to the actual condition variable declared as pthread_cond_t or a normal shared variable count whose values decide the signaling and wait?
The reference is to both.
The mutex makes it so, that the shared variable (count in your question) can be checked, and if the value of that variable doesn't meet the desired condition, the wait that is performed inside pthread_cond_wait() will occur atomically with respect to that check.
The problem being solved with the mutex is that you have two separate operations that need to be atomic:
check the condition of count
wait inside of pthread_cond_wait() if the condition isn't met yet.
A pthread_cond_signal() doesn't 'persist' - if there are no threads waiting on the pthread_cond_t object, a signal does nothing. So if there wasn't a mutex making the two operations listed above atomic with respect to one another, you could find yourself in the following situation:
Thread A wants to do something once count is non-zero
Thread B will signal when it increments count (which will set count to something other than zero)
thread "A" checks count and finds that it's zero
before "A" gets to call pthread_cond_wait(), thread "B" comes along and increments count to 1 and calls pthread_cond_signal(). That call actually does nothing of consequence since "A" isn't waiting on the pthread_cond_t object yet.
"A" calls pthread_cond_wait(), but since condition variable signals aren't remembered, it will block at this point and wait for the signal that has already come and gone.
The mutex (as long as all threads are following the rules) makes it so that item #2 cannot occur between items 1 and 3. The only way that thread "B" will get a chance to increment count is either before A looks at count or after "A" is already waiting for the signal.
A condition variable must always be associated with a mutex, to avoid the race condition where a thread prepares to wait on a condition variable and another thread signals the condition just before the first thread actually waits on it.
More info here
Some Sample:
Thread 1 (Waits for the condition)
pthread_mutex_lock(cond_mutex);
while(i<5)
{
pthread_cond_wait(cond, cond_mutex);
}
pthread_mutex_unlock(cond_mutex);
Thread 2 (Signals the condition)
pthread_mutex_lock(cond_mutex);
i++;
if(i>=5)
{
pthread_cond_signal(cond);
}
pthread_mutex_unlock(cond_mutex);
As you can see in the same above, the mutex protects the variable 'i' which is the cause of the condition. When we see that the condition is not met, we go into a condition wait, which implicitly releases the mutex and thereby allowing the thread doing the signalling to acquire the mutex and work on 'i' and avoid race condition.
Now, as per your question, if the signalling thread signals first, it should have acquired the mutex before doing so, else the first thread might simply check the condition and see that it is not being met and might go for condition wait and since the second thread has already signalled it, no one will signal it there after and the first thread will keep waiting forever.So, in this sense, the mutex is for both the condition & the conditional variable.
Per the pthreads docs the reason that the mutex was not separated is that there is a significant performance improvement by combining them and they expect that because of common race conditions if you don't use a mutex, it's almost always going to be done anyway.
https://linux.die.net/man/3/pthread_cond_wait​
Features of Mutexes and Condition Variables
It had been suggested that the mutex acquisition and release be
decoupled from condition wait. This was rejected because it is the
combined nature of the operation that, in fact, facilitates realtime
implementations. Those implementations can atomically move a
high-priority thread between the condition variable and the mutex in a
manner that is transparent to the caller. This can prevent extra
context switches and provide more deterministic acquisition of a mutex
when the waiting thread is signaled. Thus, fairness and priority
issues can be dealt with directly by the scheduling discipline.
Furthermore, the current condition wait operation matches existing
practice.
I thought that a better use-case might help better explain conditional variables and their associated mutex.
I use posix conditional variables to implement what is called a Barrier Sync. Basically, I use it in an app where I have 15 (data plane) threads that all do the same thing, and I want them all to wait until all data planes have completed their initialization. Once they have all finished their (internal) data plane initialization, then they can start processing data.
Here is the code. Notice I copied the algorithm from Boost since I couldnt use templates in this particular application:
void LinuxPlatformManager::barrierSync()
{
// Algorithm taken from boost::barrier
// In the class constructor, the variables are initialized as follows:
// barrierGeneration_ = 0;
// barrierCounter_ = numCores_; // numCores_ is 15
// barrierThreshold_ = numCores_;
// Locking the mutex here synchronizes all condVar logic manipulation
// from this point until the point where either pthread_cond_wait() or
// pthread_cond_broadcast() is called below
pthread_mutex_lock(&barrierMutex_);
int gen = barrierGeneration_;
if(--barrierCounter_ == 0)
{
// The last thread to call barrierSync() enters here,
// meaning they have all called barrierSync()
barrierGeneration_++;
barrierCounter_ = barrierThreshold_;
// broadcast is the same as signal, but it signals ALL waiting threads
pthread_cond_broadcast(&barrierCond_);
}
while(gen == barrierGeneration_)
{
// All but the last thread to call this method enter here
// This call is blocking, not on the mutex, but on the condVar
// this call actually releases the mutex
pthread_cond_wait(&barrierCond_, &barrierMutex_);
}
pthread_mutex_unlock(&barrierMutex_);
}
Notice that every thread that enters the barrierSync() method locks the mutex, which makes everything between the mutex lock and the call to either pthread_cond_wait() or pthread_mutex_unlock() atomic. Also notice that the mutex is released/unlocked in pthread_cond_wait() as mentioned here. In this link it also mentions that the behavior is undefined if you call pthread_cond_wait() without having first locked the mutex.
If pthread_cond_wait() did not release the mutex lock, then all threads would block on the call to pthread_mutex_lock() at the beginning of the barrierSync() method, and it wouldnt be possible to decrease the barrierCounter_ variables (nor manipulate related vars) atomically (nor in a thread safe manner) to know how many threads have called barrierSync()
So to summarize all of this, the mutex associated with the Conditional Variable is not used to protect the Conditional Variable itself, but rather it is used to make the logic associated with the condition (barrierCounter_, etc) atomic and thread-safe. When the threads block waiting for the condition to become true, they are actually blocking on the Conditional Variable, not on the associated mutex. And a call to pthread_cond_broadcast/signal() will unblock them.
Here is another resource related to pthread_cond_broadcast() and pthread_cond_signal() for an additional reference.

Resources