How to add numbers using multiple threads?

I'm trying to add the numbers in a range eg. 1 to 50000000. But using a for-loop or reduce(_:_:) is taking too long to calculate the result.
func add(low: Int, high: Int) -> Int {
    return (low...high).reduce(0, +)
}
Is there any way to do it using multiple threads?

Adding a series of integers does not amount to enough work to justify multiple threads. While this admittedly took 28 seconds in a debug build on my computer, in an optimized release build the single-threaded approach took milliseconds.
So, when testing performance, make sure to use an optimized “Release” build in your scheme settings (and/or manually change the optimization settings in your target’s build settings).
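For a quick sanity check of such timings, a rough sketch like the following works (CFAbsoluteTimeGetCurrent is used the same way in the static-dispatch question later on this page; for rigorous numbers, prefer XCTest's measurement APIs, covered in the last question):
import Foundation

let start = CFAbsoluteTimeGetCurrent()
let result = add(low: 1, high: 50_000_000)
print("sum = \(result); took \(CFAbsoluteTimeGetCurrent() - start) sec")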
But, let us set this aside for a second and assume that you really were doing a calculation that was complex enough to justify running it on multiple threads. In that case, the simplest approach would be to just dispatch the calculation to another thread, and perhaps dispatch the results back to the main thread:
func add(low: Int, high: Int, completion: @escaping (Int) -> Void) {
    DispatchQueue.global().async {
        let result = (low...high).reduce(0, +)
        DispatchQueue.main.async {
            completion(result)
        }
    }
}
And you'd use it like so:
add(low: 0, high: 50_000_000) { result in
    // use `result` here
    self.label.text = "\(result)"
}
// but not here, because the above runs asynchronously
That will ensure that the main thread is not blocked while the calculation is being done. Again, in this example, adding 50 million integers on a release build may not even require this, but the general idea is to make sure that anything that takes more than a few milliseconds is moved off the main thread.
Now, if the computation were significantly more complicated, one might use concurrentPerform, which is like a for loop, but where the iterations run in parallel. You might think you could just dispatch each calculation to a concurrent queue using async, but that can easily exhaust the limited pool of worker threads (called “thread explosion”, which can lead to deadlocks). So we reach for concurrentPerform to perform calculations in parallel while constraining the number of concurrent threads to the capabilities of the device in question (namely, how many cores the CPU has).
Let’s consider this simple attempt to calculate the sum in parallel. This is inefficient, but we’ll refine it later:
func add(low: Int, high: Int, completion: @escaping (Int) -> Void) {
    DispatchQueue.global().async {
        let lock = NSLock()
        var sum = 0

        // the `concurrentPerform` below is equivalent to
        //
        //     for iteration in 0 ... (high - low) { ... }
        //
        // but the iterations run in parallel
        DispatchQueue.concurrentPerform(iterations: high - low + 1) { iteration in
            // do some calculation in parallel
            let value = iteration + low

            // synchronize the update of the shared resource
            lock.synchronized {
                sum += value
            }
        }

        // call the completion handler with the result
        DispatchQueue.main.async {
            completion(sum)
        }
    }
}
Note, because we have multiple threads adding values, we must synchronize the interaction with sum to ensure thread safety. In this case, I'm using NSLock and the following convenience method (because introducing a GCD serial queue and/or a reader-writer pattern in these massively parallel scenarios is even slower):
extension NSLocking {
    func synchronized<T>(block: () throws -> T) rethrows -> T {
        lock()
        defer { unlock() }
        return try block()
    }
}
I wanted to show the simple use of concurrentPerform above, but you are going to find that it is much slower than the single-threaded implementation. That is because each iteration does not do enough work, and we perform 50 million synchronizations. So we might, instead, “stride”, adding a million values per iteration:
func add(low: Int, high: Int, completion: @escaping (Int) -> Void) {
    DispatchQueue.global().async {
        let stride = 1_000_000
        let iterations = (high - low) / stride + 1
        let lock = NSLock()
        var sum = 0

        DispatchQueue.concurrentPerform(iterations: iterations) { iteration in
            let start = iteration * stride + low
            let end = min(start + stride - 1, high)
            let subtotal = (start...end).reduce(0, +)
            lock.synchronized {
                sum += subtotal
            }
        }

        DispatchQueue.main.async {
            completion(sum)
        }
    }
}
So, each iteration adds up to a million values in a local subtotal, and only when that calculation is done does it synchronize the update of sum. This increases the work per thread and dramatically reduces the number of synchronizations. Frankly, adding a million integers is still nowhere near enough to justify the multithreading overhead, but it illustrates the idea.
If you want to see an example where concurrentPerform might be useful, consider calculating the Mandelbrot set, where the calculation for each pixel might be computationally intense. And we again stride (e.g. each iteration calculates a row of pixels), which (a) ensures that each thread is doing enough work to justify the multithreading overhead, and (b) avoids memory contention issues (a.k.a. “cache sloshing”).
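A rough sketch of that row-striding pattern, where the dimensions and the mandelbrotValue(x:y:) function are hypothetical stand-ins for real rendering code, and synchronized is the NSLocking extension above:
import Foundation

let width = 1_920, height = 1_080                       // hypothetical dimensions
var rows = [[UInt8]](repeating: [], count: height)
let lock = NSLock()

DispatchQueue.concurrentPerform(iterations: height) { row in
    // each iteration computes a whole row into a local buffer …
    var rowPixels = [UInt8](repeating: 0, count: width)
    for column in 0 ..< width {
        rowPixels[column] = mandelbrotValue(x: column, y: row)  // hypothetical
    }
    // … and synchronizes only once per row to store the result
    lock.synchronized { rows[row] = rowPixels }
}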

If you want a function that just returns the sum of all integers in the range from low to high, you can do it even faster with some simple maths.
You can treat the numbers as an arithmetic sequence starting at low and ending at high, with a common difference of 1 and (high - low + 1) elements.
The sum is then simply:
sum = (high * (high + 1) - low * (low - 1)) / 2
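As a minimal sketch, here is the original add(low:high:) rewritten with that closed form (note that high * (high + 1) can overflow Int for very large bounds):
func add(low: Int, high: Int) -> Int {
    // sum(low...high) = sum(1...high) - sum(1...(low - 1))
    return (high * (high + 1) - low * (low - 1)) / 2
}

add(low: 1, high: 50_000_000)  // 1250000025000000, computed in constant time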

Related

How to use gcd barrier in iOS?

I want to use a GCD barrier to implement a thread-safe store object, but it does not work correctly: the setter sometimes runs earlier than the getters. What's wrong with it?
https://gist.github.com/Terriermon/02c446d1238ad6ec1edb08b607b1bf05
class MutiReadSingleWriteObject<T> {
    let queue = DispatchQueue(label: "com.readwrite.concurrency", attributes: .concurrent)
    var _object: T?

    var object: T? {
        @available(*, unavailable)
        get {
            fatalError("You cannot read from this object.")
        }
        set {
            queue.async(flags: .barrier) {
                self._object = newValue
            }
        }
    }

    func getObject(_ closure: @escaping (T?) -> Void) {
        queue.async {
            closure(self._object)
        }
    }
}
func testMutiReadSingleWriteObject() {
    let store = MutiReadSingleWriteObject<Int>()
    let queue = DispatchQueue(label: "com.come.concurrency", attributes: .concurrent)

    for i in 0...100 {
        queue.async {
            store.getObject { obj in
                print("\(i) -- \(String(describing: obj))")
            }
        }
    }

    print("pre --- ")
    store.object = 1
    print("after ---")

    store.getObject { obj in
        print("finish result -- \(String(describing: obj))")
    }
}
Whenever you create a DispatchQueue, whether serial or concurrent, it gets its own thread(s) from GCD's pool that it uses to schedule and run work items. This means that whenever you instantiate a MutiReadSingleWriteObject<T> object, its queue will have dedicated threading for synchronizing your setter and getObject method.
However: this also means that in your testMutiReadSingleWriteObject method, the queue that you use to execute the 100 getObject calls in a loop has its own thread too. This means that the method has 3 separate threads to coordinate between:
The thread that testMutiReadSingleWriteObject is called on (likely the main thread),
The thread that store.queue maintains, and
The thread that queue maintains
These threads run their work in parallel, and this means that an async dispatch call like
queue.async {
    store.getObject { ... }
}
will enqueue a work item to run on queue's thread at some point, and keep executing code on the current thread.
This means that by the time you get to running store.object = 1, you are guaranteed to have scheduled 100 work items on queue, but crucially, how and when those work items actually start executing is up to the queue, the CPU scheduler, and other environmental factors. While somewhat rare, this does mean that there's a chance that none of those tasks has gotten to run before the assignment of store.object = 1, which means that by the time they do run, they'll see a value of 1 stored in the object.
In terms of ordering, you might see a combination of:
100 getObject calls, then store.object = 1
N getObject calls, then store.object = 1, then (100 - N) getObject calls
store.object = 1, then 100 getObject calls
Case (2) can actually prove the behavior you're looking to confirm: all of the calls before store.object = 1 should return nil, and all of the ones after should return 1. If a getObject call after the setter returned nil, you'd know you have a problem. But this timing is pretty much impossible to control.
In terms of how to address the timing issue here: for this method to be meaningful, you'll need to drop a thread so you can properly coordinate all of your calls to store, so that all accesses to it happen from the same thread.
This can be done by either:
Dropping queue, and just accessing store on the thread that the method was called on (see the sketch below). This does mean that you cannot call store.getObject asynchronously
Make all calls through queue, whether sync or async. This gives you the opportunity to better control exactly how the store methods are called
Either way, both of these approaches can have different semantics, so it's up to you to decide what you want this method to be testing. Do you want to be guaranteed that all 100 calls will go through before store.object = 1 is reached? If so, you can get rid of queue entirely, because you don't actually want those getters to be called asynchronously. Or, do you want to try to cause the getters and the setter to overlap in some way? Then stick with queue, but it'll be more difficult to ensure you get meaningful results, because you aren't guaranteed to have stable ordering with the concurrent calls.
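For illustration, a minimal sketch of the first approach: drop queue and issue every call from the test's own thread. Because the setter uses a barrier on store.queue, the 100 reads enqueued before it are guaranteed to complete first:
func testMutiReadSingleWriteObject() {
    let store = MutiReadSingleWriteObject<Int>()

    // all calls below are enqueued on store.queue from this one thread,
    // so their relative order is deterministic
    for i in 0...100 {
        store.getObject { obj in
            print("\(i) -- \(String(describing: obj))")  // all should print nil
        }
    }

    store.object = 1  // barrier: runs only after the 100 reads above

    store.getObject { obj in
        print("finish result -- \(String(describing: obj))")  // should print 1
    }
}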

Static Dispatch with Final performance comparison

According to Apple's article Increasing Performance by Reducing Dynamic Dispatch, dynamic dispatch is costly performance-wise:
dynamic dispatch increases language expressivity at the cost of a constant amount of runtime overhead for each indirect usage. In performance sensitive code such overhead is often undesirable
Also,
three ways to improve performance by eliminating such dynamism: final, private, and Whole Module Optimization.
From what I understood, marking a class final guarantees to the compiler that the class will never be subclassed, so its methods can be dispatched statically, which increases performance.
So, to sum it up, final should increase the performance for that class.
So, I performed a basic test to verify this:
import Foundation

func calculateTimeElapsed(label: String, codeToRun: () -> Void) {
    let startTime = CFAbsoluteTimeGetCurrent()
    codeToRun()
    let timeElapsed = CFAbsoluteTimeGetCurrent() - startTime
    print("Time elapsed for \(label): \(timeElapsed) sec")
}

class Foo {
    var number: Int = 1
    func incrementNumber() {
        number += 1
    }
}

final class Bar {
    var number: Int = 1
    func incrementNumber() {
        number += 1
    }
}

let fooObject = Foo()
calculateTimeElapsed(label: "Static Dispatch for Foo") {
    for _ in 0...100000 {
        fooObject.incrementNumber()
    }
}

let barObject = Bar()
calculateTimeElapsed(label: "Static Dispatch for Bar") {
    for _ in 0...100000 {
        barObject.incrementNumber()
    }
}
/// 100000
// Time elapsed for Static Dispatch for Foo: 7.20068895816803 sec
// Time elapsed for Static Dispatch for Bar: 7.22502601146698 sec

/// 200000
// Time elapsed for Static Dispatch for Foo: 13.975957989692688 sec
// Time elapsed for Static Dispatch for Bar: 14.329360961914062 sec

/// 500000
// Time elapsed for Static Dispatch for Foo: 36.355777978897095 sec
// Time elapsed for Static Dispatch for Bar: 36.50222206115723 sec

/// 700000
// Time elapsed for Static Dispatch for Foo: 51.68453896045685 sec
// Time elapsed for Static Dispatch for Bar: 51.46391808986664 sec
Two classes, one final (Bar) and one not (Foo)
Performed the same operation over the same input on objects of both classes
Measured the time required to execute both operations
However, to my surprise, the non-final class takes less time in 3 of the 4 cases. It may be that the result would swing in favor of the final class as the number of iterations increases, but I need to be sure that this is the right concept to measure.
I know my example might be wrong, but if it is, can anyone please give me another example demonstrating that final increases performance?

How to handle Race Condition Read/Write Problem in Swift?

I have a concurrent queue with a dispatch barrier, from the raywenderlich.com example:
private let concurrentPhotoQueue = DispatchQueue(label: "com.raywenderlich.GooglyPuff.photoQueue", attributes: .concurrent)
The write operation is done in:
func addPhoto(_ photo: Photo) {
    concurrentPhotoQueue.async(flags: .barrier) { [weak self] in
        // 1
        guard let self = self else {
            return
        }
        // 2
        self.unsafePhotos.append(photo)
        // 3
        DispatchQueue.main.async { [weak self] in
            self?.postContentAddedNotification()
        }
    }
}
while the read operation is done in:
var photos: [Photo] {
    var photosCopy: [Photo]!
    // 1
    concurrentPhotoQueue.sync {
        // 2
        photosCopy = self.unsafePhotos
    }
    return photosCopy
}
This resolves the race condition. But why is only the write operation done with a barrier, while the read uses sync? Why not read with a barrier and write with sync? With a sync write it would wait until it completes, like a lock, and a barrier read would still be only a read operation. For example, flipping the pattern:
set(10, forKey: "Number")
print(object(forKey: "Number"))
set(20, forKey: "Number")
print(object(forKey: "Number"))
public func set(_ value: Any?, forKey key: String) {
    concurrentQueue.sync {
        self.dictionary[key] = value
    }
}

public func object(forKey key: String) -> Any? {
    // returns after concurrentQueue has finished the operation
    // because concurrentQueue is run synchronously
    var result: Any?
    concurrentQueue.async(flags: .barrier) {
        result = self.dictionary[key]
    }
    return result
}
With this flipped behavior I am getting nil both times; with the barrier on the write it correctly gives 10 and 20.
You ask:
Why is Read not done with barrier ...?
In this reader-writer pattern, you don’t use barrier with “read” operations because reads are allowed to happen concurrently with respect to other “reads”, without impacting thread-safety. It’s the whole motivating idea behind reader-writer pattern, to allow concurrent reads.
So, you could use barrier with “reads” (it would still be thread-safe), but it would unnecessarily negatively impact performance if multiple “read” requests happened to be called at the same time. If two “read” operations can happen concurrently with respect to each other, why not let them? Don’t use barriers (reducing performance) unless you absolutely need to.
Bottom line, only “writes” need to happen with barrier (ensuring that they’re not done concurrently with respect to any “reads” or “writes”). But no barrier is needed (or desired) for “reads”.
[Why not] ... write with sync?
You could “write” with sync, but, again, why would you? It would only degrade performance. Let’s imagine that you had some reads that were not yet done and you dispatched a “write” with a barrier. The dispatch queue will ensure for us that a “write” dispatched with a barrier won’t happen concurrently with respect to any other “reads” or “writes”, so why should the code that dispatched that “write” sit there and wait for the “write” to finish?
Using sync for writes would only negatively impact performance, and offers no benefit. The question is not “why not write with sync?” but rather “why would you want to write with sync?” And the answer to that latter question is, you don’t want to wait unnecessarily. Sure, you have to wait for “reads”, but not “writes”.
You mention:
With the flip behavior, I am getting nil ...
Yep, so let's consider your hypothetical “read” operation with async:
public func object(forKey key: String) -> Any? {
    var result: Any?
    concurrentQueue.async {
        result = self.dictionary[key]
    }
    return result
}
This effectively says: “set up a variable called result, dispatch a task to retrieve it asynchronously, but don't wait for the read to finish before returning whatever result currently contains (i.e., nil).”
You can see why reads must happen synchronously, because you obviously can’t return a value before you update the variable!
So, reworking your latter example, you read synchronously without a barrier, but write asynchronously with a barrier:
public func object(forKey key: String) -> Any? {
    return concurrentQueue.sync {
        self.dictionary[key]
    }
}

public func set(_ value: Any?, forKey key: String) {
    concurrentQueue.async(flags: .barrier) {
        self.dictionary[key] = value
    }
}
Note, because the sync method in the “read” operation returns whatever its closure returns, you can simplify the code quite a bit, as shown above.
Or, personally, rather than object(forKey:) and set(_:forKey:), I’d just write my own subscript operator:
public subscript(key: String) -> Any? {
    get {
        concurrentQueue.sync {
            dictionary[key]
        }
    }
    set {
        concurrentQueue.async(flags: .barrier) {
            self.dictionary[key] = newValue
        }
    }
}
Then you can do things like:
store["Number"] = 10
print(store["Number"])
store["Number"] = 20
print(store["Number"])
Note, if you find this reader-writer pattern too complicated, you could just use a serial queue (which is like using a barrier for both “reads” and “writes”). You'd still probably do sync “reads” and async “writes”. That works, too, but in environments with high “read” contention it's just a tad less efficient than the reader-writer pattern above.
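A minimal sketch of that serial-queue variant (the queue label is hypothetical, and dictionary is the same backing store as above; no barriers are needed because a serial queue never runs two blocks at once):
private let serialQueue = DispatchQueue(label: "com.example.store")  // serial by default

public subscript(key: String) -> Any? {
    get {
        serialQueue.sync { dictionary[key] }                    // synchronous “read”
    }
    set {
        serialQueue.async { self.dictionary[key] = newValue }   // asynchronous “write”
    }
}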

Sync calls from Swift to C based thread-unsafe library

My Swift code needs to call some C functions that are not thread safe. All calls need to be:
1) synchronous (sequential invocation of function, only after previous call returned),
2) on the same thread.
I've tried to create a queue and then access C from within a function:
let queue = DispatchQueue(label: "com.example.app.thread-1", qos: .userInitiated)

func calc(...) -> Double {
    var result: Double!
    queue.sync {
        result = c_func(...)
    }
    return result
}
This has improved the behaviour, yet I still get crashes: sometimes, not as often as before, and mostly while debugging from Xcode.
Any ideas about better handling?
Edit
Based on the comments below, can somebody give a general example of how to use a thread class to ensure sequential execution on the same thread?
Edit 2
A good example of the problem can be seen when using this wrapper around C library:
https://github.com/PerfectlySoft/Perfect-PostgreSQL
It works fine when accessed from a single queue. But will start producing weird errors if several dispatch queues are involved.
So I am envisaging an approach with a single executor thread which, when called, would block the caller, perform the calculation, unblock the caller, and return the result. Repeat for each consecutive caller.
Something like this:
thread 1 | |
---------> | | ---->
thread 2 | executor | ---->
---------> | thread |
thread 3 | -----------> |
---------> | | ---->
...
If you really need to ensure that all API calls must come from a single thread, you can do so by using the Thread class plus some synchronization primitives.
For instance, a somewhat straightforward implementation of such idea is provided by the SingleThreadExecutor class below:
class SingleThreadExecutor {
    private var thread: Thread!
    private let threadAvailability = DispatchSemaphore(value: 1)

    private var nextBlock: (() -> Void)?
    private let nextBlockPending = DispatchSemaphore(value: 0)
    private let nextBlockDone = DispatchSemaphore(value: 0)

    init(label: String) {
        thread = Thread(block: self.run)
        thread.name = label
        thread.start()
    }

    func sync(block: @escaping () -> Void) {
        threadAvailability.wait()

        nextBlock = block
        nextBlockPending.signal()
        nextBlockDone.wait()
        nextBlock = nil

        threadAvailability.signal()
    }

    private func run() {
        while true {
            nextBlockPending.wait()
            nextBlock!()
            nextBlockDone.signal()
        }
    }
}
A simple test to ensure the specified block is really being called by a single thread:
let executor = SingleThreadExecutor(label: "single thread test")

for i in 0..<10 {
    DispatchQueue.global().async {
        executor.sync { print("\(i) # \(Thread.current.name!)") }
    }
}

Thread.sleep(forTimeInterval: 5) /* Wait for calls to finish. */
0 # single thread test
1 # single thread test
2 # single thread test
3 # single thread test
4 # single thread test
5 # single thread test
6 # single thread test
7 # single thread test
8 # single thread test
9 # single thread test
Finally, replace DispatchQueue with SingleThreadExecutor in your code and let's hope this fixes your — very exotic! — issue ;)
let singleThreadExecutor = SingleThreadExecutor(label: "com.example.app.thread-1")

func calc(...) -> Double {
    var result: Double!
    singleThreadExecutor.sync {
        result = c_func(...)
    }
    return result
}
An interesting outcome... I benchmarked the performance of the solution by Paulo Mattos that I accepted against my own earlier experiments, where I used a much less elegant, lower-level run loop and object-reference approach to achieve the same pattern.
Playground for closure based approach:
https://gist.github.com/deze333/23d11123f02e65c456d16ffe5621e2ee
Playground for run loop & reference passing approach:
https://gist.github.com/deze333/82c0ee3e82fd250097449b1b200b7958
Using closures:
Invocations processed : 1000
Invocations duration, sec: 4.95894199609756
Cost per invocation, sec : 0.00495894199609756
Using run loop and passing object reference:
Invocations processed : 1000
Invocations duration, sec: 1.62595099210739
Cost per invocation, sec : 0.00162432666544195
Passing closures is about 3x slower, due to them being allocated on the heap, versus passing an object reference. This really confirms the performance problem of closures outlined in the excellent article Mutexes and closure capture in Swift.
The lesson: don't overuse closures when maximum performance is needed, which is often the case in mobile development.
Closures look so beautiful, though!
EDIT:
Things are much better in Swift 4 with whole-module optimisation. Closures are fast!

How to make a performance test fail if it's too slow?

I'd like my test to fail if it runs slower than 0.5 seconds but the average time is merely printed in the console and I cannot find a way to access it. Is there a way to access this data?
Code
// Measures the time it takes to parse the participant codes
// from the first 100 events in our test data.
func testParticipantCodeParsingPerformance() {
    var increment = 0
    self.measure {
        increment = 0
        while increment < 100 {
            Parser.parseParticipantCode(self.fields[increment], hostCodes: MasterCalendarArray.getHostCodeArray()[increment])
            increment += 1
        }
    }
    print("Events measured: \(increment)")
}
Test Data
[Tests.ParserTest testParticipantCodeParsingPerformance]' measured [Time, seconds] average: 0.203, relative standard deviation: 19.951%, values: [0.186405, 0.182292, 0.179966, 0.177797, 0.175820, 0.205763, 0.315636, 0.223014, 0.200362, 0.178165]
You need to set a baseline for your performance test. Head to the Report Navigator:
and select your recent test run. You'll see a list of all your tests, but the performance ones will have times associated with them. Click the time to bring up the Performance Result popover:
The "Baseline" value is what you're looking for--set it to 0.5s and that will inform Xcode that this test should complete in half a second. If your test is more than 10% slower than the baseline, it'll fail!
The only way to do something similar to what you describe is to set a time limit graphically, as @andyvn22 recommends.
But, if you want to do it completely in code, the only thing you can do is extend XCTestCase with a new method that measures the execution time of a closure and returns it to be used in an assertion. Here is an example of what you could do:
extension XCTestCase {
    /// Executes the block and returns the execution time in milliseconds.
    public func timeBlock(closure: () -> Void) -> Int {
        var info = mach_timebase_info(numer: 0, denom: 0)
        mach_timebase_info(&info)

        let begin = mach_absolute_time()
        closure()
        let diff = Double(mach_absolute_time() - begin) * Double(info.numer) / Double(1_000_000 * info.denom)

        return Int(diff)
    }
}
And use it with:
func testExample() {
    // passes only if the block takes less than 500 ms
    XCTAssertTrue(500 > self.timeBlock {
        doSomethingLong()
    })
}
