Atomic property and usage - iOS

I have read many Stack Overflow answers about this, like Is an atomic property thread safe?, When to use @atomic?, or Atomic properties vs thread-safe in Objective-C, but I still have a question:
Please correct me if I am wrong. Say I have a count variable declared as an atomic property, and its current value is 5.
It is accessed by two threads: the first thread increases the count by 2 and the second thread decreases it by 1. My understanding is that this happens sequentially: the first thread increases the value to 5 + 2 = 7, and only after that can the second thread access the count variable and decrease it by 1, giving 7 - 1 = 6. Is that right?

The first thread increases the count value by 2
That is not an atomic operation, and atomic in no way helps you. This is (at least) three separate atomic operations:
read value
increase value
write value
This is the classic multi-writer race condition. Another thread might read between "read value" and "write value." In your example the final result could be 4, such that the increase operation is completely lost (A reads 5, B reads 5, A +2, A writes 7, B -1, B writes 4).
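To make the lost update concrete, here is a minimal C++ sketch of the same situation (the function names and the use of std::atomic<int> are illustrative, not from the question; the same reasoning applies to an Objective-C atomic property):

#include <atomic>
#include <thread>

std::atomic<int> count{5};

// Thread A: a read-modify-write done as three separate steps.
void addTwo() {
    int v = count.load();   // atomic read
    v += 2;                 // plain arithmetic on a local copy
    count.store(v);         // atomic write
}

// Thread B: same pattern, decrementing.
void subtractOne() {
    int v = count.load();
    v -= 1;
    count.store(v);
}

int main() {
    std::thread a(addTwo), b(subtractOne);
    a.join();
    b.join();
    // count may now be 6 (both updates applied), 7 (the -1 was lost),
    // or 4 (the +2 was lost), depending on the interleaving.
}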
The problem that atomic is meant to solve is that "read value" and "write value" aren't even atomic operations in many platform-specific situations. The above may actually be 5 operations such as:
read lower word
read upper word
increase value
write lower word
write upper word
Without atomic, another thread might read between "write lower word" and "write upper word" and get a garbage value that was never written (half of one value and half of another). By using atomic, you ensure that readers will always receive a "legitimate" value (one that was written at some point). But that's not much of a promise.
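As an illustration of the guarantee that atomic does give, here is a hedged C++ sketch: with std::atomic, a reader can never observe a half-written ("torn") value, even on platforms where a 64-bit store takes two word-sized writes. The two bit patterns are arbitrary, chosen so that a torn read would be obvious:

#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

constexpr std::uint64_t kA = 0x00000000FFFFFFFFull;
constexpr std::uint64_t kB = 0xFFFFFFFF00000000ull;

std::atomic<std::uint64_t> value{kA};

void writer() {
    for (int i = 0; i < 1000000; ++i)
        value.store(i % 2 ? kA : kB);   // only ever writes kA or kB
}

void reader() {
    for (int i = 0; i < 1000000; ++i) {
        std::uint64_t v = value.load();
        // Always a value that was actually written, never a torn mix
        // (e.g. 0 or 0xFFFFFFFFFFFFFFFF). With a plain uint64_t this
        // would be a data race, and a 32-bit platform could tear it.
        assert(v == kA || v == kB);
    }
}

int main() {
    std::thread w(writer), r(reader);
    w.join();
    r.join();
}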
But as is noted in the questions you provide, if you need to make reads and writes atomic, you almost certainly need more than that, because you also want to make "increase the value" atomic. So in practice atomic is rarely useful. If you don't need it, it's slow. If you do need it, it's probably insufficient.
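If the +2/-1 example really does need to be correct under concurrency, the fix is an atomic read-modify-write, not an atomic property. A hedged C++ sketch using std::atomic's fetch_add/fetch_sub (any equivalent RMW primitive on your platform would do):

#include <atomic>
#include <thread>

std::atomic<int> count{5};

int main() {
    // fetch_add/fetch_sub perform the read, the arithmetic, and the
    // write as one indivisible step, so no update can be lost.
    std::thread a([] { count.fetch_add(2); });
    std::thread b([] { count.fetch_sub(1); });
    a.join();
    b.join();
    // count is now always 6, regardless of which thread ran first.
}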

With an atomic property where two threads access the same object to increase or decrease a value, you can think of it like this: both threads may have read the same whole value, 5. While the first thread is updating the value, the property is locked and no other thread can update it at the same time. However, the second thread still holds the stale value 5 and can only write once the first thread's update has finished.

OK, but threads might not execute sequentially; the first thread created might run after the second one. If the threads execute in the order you describe, the behavior you mention is fine. But I think you have a misconception about thread safety.
I encourage you to read Apple's Concurrency Programming Guide and more about thread safety.

Related

When can I not use atomic properties? [duplicate]

I know there are answers on atomic vs. non-atomic, but they mostly seem to be fairly old (2011 and earlier), so I'm hoping for updated advice. My understanding is that non-atomic properties are faster but not thread safe. Does this mean that any property that might be accessed from multiple threads at the same time should always be atomic? Are there conditions that make it OK to be non-atomic? What other concerns are there in determining whether to make a property atomic or nonatomic?
Declaring a property atomic makes the compiler generate additional code that prevents concurrent access to the property. This additional code locks a semaphore, gets or sets the property, and then unlocks the semaphore. Compared to setting or getting a primitive value or a pointer, locking and unlocking a semaphore is expensive (although it is usually negligible if you consider the overall flow of your app).
Since most of your classes under iOS, especially the ones related to UI, will be used in a single-threaded environment, it is safe to drop atomic (i.e. write nonatomic, because properties are atomic by default). Even though the operation is relatively inexpensive, you do not want to pay for things that you do not need.
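Conceptually, the synthesized atomic accessors behave like the following C++ sketch. This is only the shape of the guarantee, not the actual runtime implementation (the Objective-C runtime uses its own locking, not a std::mutex): each individual get or set is locked, and nothing more.

#include <mutex>
#include <string>

class Person {
    mutable std::mutex lock_;      // stands in for the runtime's per-property lock
    std::string name_;
public:
    std::string name() const {     // "atomic" getter
        std::lock_guard<std::mutex> g(lock_);
        return name_;              // always a complete, never-torn value
    }
    void setName(std::string n) {  // "atomic" setter
        std::lock_guard<std::mutex> g(lock_);
        name_ = std::move(n);
    }
};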
Declaring a property as atomic does not necessarily make it thread safe.
Atomic is the default and involves some extra overhead compared to nonatomic. If thread A is halfway through the getter for that property and thread B changes the value in the setter, using atomic will ensure that a viable, whole value is returned from the getter. If you use nonatomic no such guarantee is made, the extra code is not generated, and thus nonatomic is faster.
However, this does not guarantee thread safety. If thread A calls the getter while threads B and C are updating the property with different values, thread A could get either value, and there is no guarantee which one it will get.
To specifically answer your question: many scenarios allow for nonatomic properties, if not most, although the extra overhead of using atomic is probably negligible. Simply put: are your properties read or set on different threads? If not, you most likely don't need to declare them atomic, and the extra overhead might not even be noticeable if you do. It's just that simply declaring them atomic does not guarantee thread safety.
In most situations it is unimportant whether a property is atomic or not in a multithreaded environment.
What?
In most situations it is unimportant whether a property is atomic or not in a multithreaded environment.
The reason for this is that making a property "thread-safe" by turning atomicity on does not make your code thread-safe. To get that, you need more work, and that work typically implicitly ensures that the property is not accessed in parallel anyway.
Let's take an example: you have a class Person with the properties firstName and lastName, both of type NSString *. You simply want to join the two names, separated by a space, to get a full name.
NSString *fullName = [NSString stringWithFormat:@"%@ %@", person.firstName, person.lastName];
You know that other threads can update the properties while you do that. Making the properties atomic doesn't help at all. It does not help if the person's last name is changed after reading the first name but before reading the last name. It does not help if the names are changed after calculating the full name, because the result can become invalid in the very next moment.
You have to serialize operations, not property accesses. But atomicity only serializes accesses. So it does not help whenever an operation consists of more than one access, which is 99.999999373% of all cases.
Forget about property atomicity. It is meaningless.
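To make "serialize operations, not property accesses" concrete, here is a hedged C++ sketch; in Objective-C you would typically get the same effect by wrapping the whole operation in @synchronized or dispatching it onto one GCD serial queue:

#include <mutex>
#include <string>

struct Person {
    std::mutex lock;       // one lock for the whole logical record
    std::string firstName;
    std::string lastName;
};

// Both reads happen inside one critical section, so no writer can
// change the last name between reading the first and the last name.
std::string fullName(Person& p) {
    std::lock_guard<std::mutex> g(p.lock);
    return p.firstName + " " + p.lastName;
}

// A writer takes the same lock for the whole update, so the pair of
// names is always changed together.
void rename(Person& p, std::string first, std::string last) {
    std::lock_guard<std::mutex> g(p.lock);
    p.firstName = std::move(first);
    p.lastName = std::move(last);
}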
There are no other concerns. Yes, any property that can be accessed on multiple threads should be atomic or you can end up with unexpected results.

Which is thread safe: atomic or nonatomic?

After reading this answer I am very confused.
Some say atomic is thread safe and some say nonatomic is thread safe.
What is the exact answer?
Thread unsafety of operations is caused by the fact that an operation can be divided into several suboperations, for example:
a = a + 1
can be subdivided into operations
load value of a
add 1 to the loaded value
assign the calculated value to a.
The word "Atomic" comes from "atom" which comes from greek "atomos", which means "that which can't be split". For an operation it means that it is always performed as a whole, it is never performed one suboperation at a time. That's why it is thread safe.
TL;DR Atomic = thread safe.
Big warning: having atomic properties does not mean that a whole function or class is thread safe. Atomic properties mean only that the individual operations on those properties are thread safe.
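For illustration, here is what making a = a + 1 indivisible looks like in C++ (a sketch: fetch_add(1) is the idiomatic one-liner, and the explicit compare-and-swap loop spells out what "performed as a whole" means):

#include <atomic>

std::atomic<int> a{0};

void incrementAtomically() {
    // One call: read, add, and write as a single indivisible step.
    a.fetch_add(1);
}

void incrementWithCAS() {
    // Equivalent spelled out: retry if another thread changed `a`
    // between our read and our write, so no increment is ever lost.
    int expected = a.load();
    while (!a.compare_exchange_weak(expected, expected + 1)) {
        // on failure, `expected` is reloaded with the current value
    }
}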
Obviously, nonatomic certainly isn’t thread safe. More interestingly, atomic is closer, but it is insufficient to achieve “thread safety”, as well. To quote Apple’s Programming with Objective-C: Encapsulating Data:
Note: Property atomicity is not synonymous with an object’s thread safety.
It goes on to provide an example:
Consider an XYZPerson object in which both a person’s first and last names are changed using atomic accessors from one thread. If another thread accesses both names at the same time, the atomic getter methods will return complete strings (without crashing), but there’s no guarantee that those values will be the right names relative to each other. If the first name is accessed before the change, but the last name is accessed after the change, you’ll end up with an inconsistent, mismatched pair of names.
This example is quite simple, but the problem of thread safety becomes much more complex when considered across a network of related objects. Thread safety is covered in more detail in Concurrency Programming Guide.
Also see bbum’s Objective-C: Atomic, properties, threading and/or custom setter/getter.
The reason this is so confusing is that, in fact, the atomic keyword does ensure that your access to that immediate reference is thread safe. Unfortunately, when dealing with objects, that’s rarely sufficient. First, you have no assurances that the property’s own internal properties are thread safe. Second, it doesn’t synchronize your app’s access to the object’s individual properties (such as Apple’s example above). So, atomic is almost always insufficient to achieve thread safety, so you generally have to employ some higher-level degree of synchronization. And if you provide that higher-level synchronization, adding atomicity to that mix is redundant and inefficient.
So, with objects, atomic rarely has any utility. It can be useful, though, when dealing with primitive C data types (e.g. integers, booleans, floats). For example, you might have some boolean that is updated on some other thread to indicate whether that thread's asynchronous task is completed. This is a perfect use case for atomic.
Otherwise, we generally reach for higher-level synchronization mechanisms for thread safety, such as GCD serial queues or the reader-writer pattern (or, less commonly nowadays, locks, the @synchronized directive, etc.).
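As a sketch of the reader-writer pattern mentioned above, shown here with C++17's std::shared_mutex (on Apple platforms the same pattern is often built from a concurrent GCD queue with barrier writes):

#include <shared_mutex>
#include <string>

class SafeString {
    mutable std::shared_mutex lock_;
    std::string value_;
public:
    std::string get() const {
        std::shared_lock<std::shared_mutex> g(lock_);   // shared: readers may overlap
        return value_;
    }
    void set(std::string v) {
        std::unique_lock<std::shared_mutex> g(lock_);   // exclusive: writers run alone
        value_ = std::move(v);
    }
};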
As you can read at developer.apple.com, you should use atomic functions for thread safety.
You can read more about atomic functions here: Atomic man page
In short:
Atomic ~ not splittable ~ an operation other threads cannot interleave
As is mentioned in several answers to the posted question, atomic is thread safe. This means that a getter/setter running on one thread must finish before any other thread can run the getter/setter.

Why is an object's mutability relevant to its thread safety?

I am working with Core Image on iOS. As you know, CIContext and CIImage are immutable, so they can be shared across threads. The problem is that I am not sure why an object's mutability is so closely tied to its thread safety.
I can guess the rough reason is to prevent multiple threads from doing something to a certain object at the same time. But can anybody provide some theoretical grounding or a specific answer to it?
You are correct, mutable objects are not thread safe because multiple threads can write to that data at the same time. This is in contrast to reading data, an operation multiple threads can do simultaneously without causing problems. Immutable types can only be read.
However, when multiple threads are writing to the same data, those operations may interfere. For example, an NSMutableArray that two threads are editing could easily be corrupted: one thread is editing at one location, suddenly changing the memory the other thread was updating. For simple data types, iOS can use what is called an atomic operation, which means the edit operation must fully finish before anything else can happen; this has an efficiency advantage over locks. Just Google this stuff if you want to know more.
Well, actually, you are right.
If you have immutable objects, all you can do is read data from them. And every thread will get the same data, since the object cannot be changed.
When you have a mutable object, the problem is that (in general) read and write operations are not atomic. They are not performed instantaneously; they take time, so they can overlap each other. Even on a single-core processor, the system can switch between threads at arbitrary moments in time. Possible problems this might cause:
1) Two threads write to the same object simultaneously. In this case the object might become corrupted (for instance, half of the data comes from the first thread and the other half from the second; the result is unpredictable).
2) One thread writes the data while another reads it. One problem is that the reader might get already-outdated data. Another is that it might read corrupted data (if the writing thread hasn't finished writing yet).
Take the example with an array of values. Let's say I'm doing some computation and I find the count of the array to be 4
var array = [1,2,3,4]
let count = array.count
Now maybe I want to loop through all the elements in the array, so I set up a loop that runs over indices i < 4, or something along those lines. So far so good.
This array is mutable though, so on a separate thread I could easily remove an element. So perhaps I'm on thread 1, I get the count to be 4, and I start looping through the array. Now we switch to thread 2 and I remove an element, so this same array now has only 3 values in it. If execution returns to thread 1 and I still assume I have 4 values while looping through the array, my program is going to crash when it tries to access the 4th element.
This is why immutability is desirable: it can guarantee some level of consistency across threads.
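One common way to get that consistency without making everything immutable is to take a snapshot: copy the shared data under a lock and iterate over the copy. A hedged C++ sketch (a std::vector standing in for the array):

#include <mutex>
#include <vector>

std::mutex lock;
std::vector<int> shared = {1, 2, 3, 4};

int sumSnapshot() {
    std::vector<int> copy;
    {
        std::lock_guard<std::mutex> g(lock);
        copy = shared;             // consistent snapshot of the data
    }
    int sum = 0;
    for (int v : copy) sum += v;   // safe even if `shared` shrinks meanwhile
    return sum;
}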

Is a read on an atomic variable guaranteed to acquire the current value of it in C++11?

It is known that the modifications to a single atomic variable form a total order. Suppose we have an atomic read operation on some atomic variable v at wall-clock time T. Then, is this read guaranteed to acquire the current value of v, i.e. the one written by the last write in the modification order of v at time T? To put it another way: if an atomic write is done before an atomic read in wall-clock time, and there are no other writes in between, is the read guaranteed to return the value just written?
My accepted answer is the 6th comment made by Cubbi to his answer.
Wall-clock time is irrelevant. However, what you're describing sounds like the write-read coherence guarantee:
§1.10 [intro.multithread]/20
If a side effect X on an atomic object M happens before a value computation B of M, then the evaluation B shall take its value from X or from a side effect Y that follows X in the modification order of M.
(translating the standardese, "value computation" is a read, and "side effect" is a write)
In particular, if your relaxed write and your relaxed read are in different statements of the same function, they are connected by a sequenced-before relationship, therefore they are connected by a happens-before relationship, therefore the guarantee holds.
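A minimal C++ sketch of that same-thread case; both operations are relaxed, yet the read is still guaranteed to see the write because it is sequenced after it:

#include <atomic>

std::atomic<int> m{0};

int sameThread() {
    m.store(1, std::memory_order_relaxed);     // side effect X on m
    return m.load(std::memory_order_relaxed);  // value computation B of m
    // B is sequenced after X in this thread, so it happens after X;
    // write-read coherence forces it to return 1 (or a later value in
    // m's modification order -- and here there is nothing later).
}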
Depends on the memory order which you can specify for the load() operation.
By default, it is std::memory_order_seq_cst and the answer is yes, it guarantees the current value stored by another thread (if stored at all, i.e. it must use std::memory_order_release memory order at least, otherwise the store visibility is not guaranteed).
But if you specify std::memory_order_relaxed for the load operation, the documentation says: "Relaxed ordering: there are no synchronization or ordering constraints; only atomicity is required of this operation." I.e., the program could end up not reading from memory at all.
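For completeness, a small sketch of the release/acquire pairing this answer alludes to; the writer's release store plus the reader's acquire load is what makes earlier writes visible, which a relaxed load would not guarantee:

#include <atomic>

std::atomic<bool> ready{false};
int payload = 0;

void producer() {
    payload = 42;                                   // plain write
    ready.store(true, std::memory_order_release);   // publish it
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) { /* spin */ }
    // The acquire load that returned true synchronizes with the
    // release store, so payload is guaranteed to be 42 here.
}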
Is a read on an atomic variable guaranteed to acquire the current value of it
No
Even though each atomic variable has a single modification order (which is observed by all threads), that does not mean that all threads observe modifications at the same time scale.
Consider this code:
std::atomic<int> g{0};
// thread 1
g.store(42);
// thread 2
int a = g.load();
// do stuff with a
int b = g.load();
A possible outcome is the following timeline:
thread 1: 42 is stored at time T1
thread 2: the first load returns 0 at time T2
thread 2: the store from thread 1 becomes visible at time T3
thread 2: the second load returns 42 at time T4.
This outcome is possible even though the first load at time T2 occurs after the store at T1 (in clock time).
The standard says:
Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.
It does not require a store to become visible right away and it even allows room for a store to remain invisible (e.g. on systems without cache-coherency).
In that case, an atomic read-modify-write (RMW) is required to access the last value.
Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.
Needless to say, RMWs are more expensive to execute (they may lock the bus or cache line), and that is why a regular atomic load is allowed to return an older (cached) value.
If a regular load was required to return the last value, performance would be horrible while there would be hardly any benefit.
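A hedged sketch of the consequence: because a read-modify-write must read the latest value in the modification order, an RMW that adds zero can serve as a "fresh" read, at the cost of also being a write:

#include <atomic>

std::atomic<int> g{0};

int freshest() {
    // A plain g.load() may legally return an older value. fetch_add(0)
    // is an RMW, so it must read the last value in g's modification
    // order -- but it is also a (more expensive) write.
    return g.fetch_add(0);
}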

The memory consistency model, CUDA 4.0, and global memory?

Update: The while() condition below gets optimized out by the compiler, so both threads just skip the condition and enter the C.S. even with -O0 flag. Does anyone know why the compiler is doing this? By the way, declaring the global variables volatile causes the program to hang for some odd reason...
I read the CUDA programming guide but I'm still a bit unclear on how CUDA handles memory consistency with respect to global memory. (This is different from the memory hierarchy) Basically, I am running tests trying to break sequential consistency. The algorithm I am using is Peterson's algorithm for mutual exclusion between two threads inside the kernel function:
flag[threadIdx.x] = 1; // both these are global
turn = 1-threadIdx.x;
while(flag[1-threadIdx.x] == 1 && turn == (1- threadIdx.x));
shared_global_variable_x++;
flag[threadIdx.x] = 0;
This is fairly straightforward. Each thread asks for the critical section by setting its flag to one and by being nice and giving the turn to the other thread. At the evaluation of the while(), if the other thread did not set its flag, the requesting thread can enter the critical section safely. Now a subtle problem with this approach arises if the compiler re-orders the writes so that the write to turn executes before the write to flag. If this happens, both threads will end up in the C.S. at the same time. This is fairly easy to demonstrate with normal Pthreads, since most processors don't implement sequential consistency. But what about GPUs?
Both of these threads will be in the same warp. And they will execute their statements in lock-step mode. But when they reach the turn variable they are writing to the same variable so the intra-warp execution becomes serialized (doesn't matter what the order is). Now at this point, does the thread that wins proceed onto the while condition, or does it wait for the other thread to finish its write, so that both can then evaluate the while() at the same time? The paths again will diverge at the while(), because only one of them will win while the other waits.
After running the code, I am getting it to consistently break SC. The value I read is ALWAYS 1, which means that both threads somehow are entering the C.S. every single time. How is this possible (GPUs execute instructions in order)? (Note: I have compiled it with -O0, so no compiler optimization, and hence no use of volatile).
Edit: since you have only two threads and 1-threadIdx.x works, then you must be using thread IDs 0 and 1. Threads 0 and 1 will always be part of the same warp on all current NVIDIA GPUs. Warps execute instructions SIMD fashion, with a thread execution mask for divergent conditions. Your while loop is a divergent condition.
When turn and flags are not volatile, the compiler probably reorders the instructions and you see the behavior of both threads entering the C.S.
When turn and flags are volatile, you see a hang. The reason is that one of the threads will succeed at writing turn, so turn will be either 0 or 1. Let's assume turn==0: If the hardware chooses to execute thread 0's part of the divergent branch, then all is OK. But if it chooses to execute thread 1's part of the divergent branch, then it will spin on the while loop and thread 0 will never get its turn, hence the hang.
You can probably avoid the hang by ensuring that your two threads are in different warps, but I think that the warps must be concurrently resident on the SM so that instructions can issue from both and progress can be made. (Might work with concurrent warps on different SMs, since this is global memory; but that might require __threadfence() and not just __threadfence_block().)
In general this is a great example of why code like this is unsafe on GPUs and should not be used. I realize though that this is just an investigative experiment. In general CUDA GPUs do not—as you mention most processors do not—implement sequential consistency.
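For contrast, here is a hedged CPU-side sketch of the same algorithm in C++: with std::atomic and the default sequentially consistent ordering, the store-store reordering the question worries about is forbidden and Peterson's algorithm works (the loop bounds are arbitrary):

#include <atomic>
#include <thread>

std::atomic<int> flag[2] = {{0}, {0}};
std::atomic<int> turn{0};
int shared_counter = 0;

void enter(int id) {
    flag[id].store(1);    // seq_cst: cannot be reordered past...
    turn.store(1 - id);   // ...this store, nor past the loads below
    while (flag[1 - id].load() == 1 && turn.load() == 1 - id) { /* spin */ }
}

void leave(int id) { flag[id].store(0); }

void worker(int id) {
    for (int i = 0; i < 100000; ++i) {
        enter(id);
        ++shared_counter;   // critical section: mutual exclusion holds
        leave(id);
    }
}

int main() {
    std::thread t0(worker, 0), t1(worker, 1);
    t0.join();
    t1.join();
    // shared_counter == 200000 on every run.
}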
Original Answer
The variables turn and flag need to be volatile; otherwise the load of flag will not be repeated, and the condition turn == 1-threadIdx.x will not be re-evaluated but instead will be taken as true.
There should be a __threadfence_block() between the store to flag and store to turn to get the right ordering.
There should be a __threadfence_block() before the shared variable increment (which should also be declared volatile). You may also want a __syncthreads() or at least __threadfence_block() after the increment to ensure it is visible to other threads.
I have a hunch that even after making these fixes you may still run into trouble, though. Let us know how it goes.
BTW, you have a syntax error in this line, so it's clear this isn't exactly your real code:
while(flag[1-threadIdx.x] == 1 and turn==[1- threadIdx.x]);
In the absence of extra memory barriers such as __threadfence(), sequential consistency of global memory is enforced only within a given thread.
