Massive parallel computation ios

I have a method that performs a mathematical operation repeatedly (possibly millions of times) with different data. What is the best way to do this in iOS (it will run on iPad devices)? I understand that performSelectorOnBackgroundThread is deprecated... ? I also need to aggregate all the results in an NSArray. The best way seems to be: post a notification to the Notification Center and add the method as an observer. Is this correct? The array will need to be declared as atomic, I believe... Plus I will need to show a progress bar as the operations complete... How many threads can I start in parallel? I don't think starting 1,000,000 threads is such a good idea on an iDevice.
Thanks in advance...

Look into Grand Central Dispatch, it's the preferred way to do multi-threading on iOS (and Mac).
A simple example of using GCD would look like:
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_async(queue, ^{
    // do long running task here
});
This will execute a block asynchronously, off the main thread. GCD has numerous other ways of dispatching tasks; one, taken directly from the Wikipedia article on Grand Central Dispatch, is:
dispatch_apply(count, dispatch_get_global_queue(0, 0), ^(size_t i){
results[i] = do_work(data, i);
});
total = summarize(results, count);
This particular code sample is probably exactly what you're looking for, assuming this "large task" of yours is embarrassingly parallel.
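To connect that to the rest of the question (results in an NSArray, a progress bar), here is a minimal sketch rather than a definitive implementation. The names ComputeAll and do_work and the stride of 1024 are assumptions for illustration, not anything from the question. Each dispatch_apply iteration handles a slice of the input so that a million tiny blocks are not scheduled one by one, and every worker writes only to its own slots in a plain C buffer, so no locking is needed. Call it from a background queue: dispatch_apply blocks its caller, and the progress block is delivered on the main queue.

```objc
#import <Foundation/Foundation.h>
#include <stdatomic.h>

// Hypothetical stand-in for the asker's mathematical operation.
static double do_work(const double *data, size_t i) {
    return data[i] * data[i];
}

// Computes do_work for every index in parallel and returns the results in order.
// `progress` may be nil; when set, it is called on the main queue with 0...1.
static NSArray<NSNumber *> *ComputeAll(const double *data, size_t count,
                                       void (^progress)(double fraction)) {
    double *results = calloc(count, sizeof(double));
    atomic_size_t doneStorage = 0;
    atomic_size_t *done = &doneStorage;   // safe: dispatch_apply returns before this scope ends
    const size_t stride = 1024;           // slice size, an assumption to be tuned
    size_t slices = (count + stride - 1) / stride;

    dispatch_apply(slices, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^(size_t s) {
        size_t start = s * stride;
        size_t end = MIN(start + stride, count);
        for (size_t i = start; i < end; i++) {
            results[i] = do_work(data, i);   // each index is written by exactly one worker
        }
        size_t finished = atomic_fetch_add(done, end - start) + (end - start);
        if (progress) {
            dispatch_async(dispatch_get_main_queue(), ^{
                progress((double)finished / (double)count);
            });
        }
    });

    // Box the results once, on the calling thread, after all workers are done.
    NSMutableArray<NSNumber *> *boxed = [NSMutableArray arrayWithCapacity:count];
    for (size_t i = 0; i < count; i++) {
        [boxed addObject:@(results[i])];
    }
    free(results);
    return boxed;
}
```

The progress block would typically just set a UIProgressView's progress property; no notification center or atomic property is involved.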

While you could use dispatch_apply() and spin off all of the runs simultaneously, that'll end up being slower.
You'll want to be able to throttle the # of runs in flight simultaneously with the # of simultaneous computations being something that you'll need to tune.
I've often used a dispatch_semaphore_t to allow for easy tuning of the # of in-flight computations.
Details of doing so are in an answer here: https://stackoverflow.com/a/4535110/25646
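As a rough sketch of that semaphore idea (my own reading of it, not the code from the linked answer): the semaphore starts with as many "slots" as you want computations in flight, the submitting loop waits for a free slot before dispatching, and each block gives its slot back when it finishes. RunThrottled and its parameters are hypothetical names. The function blocks until everything is done, so it should not be called on the main thread.

```objc
#import <Foundation/Foundation.h>

// Runs work(0) ... work(count - 1) on the global queue with at most
// `maxInFlight` invocations executing at any one time.
static void RunThrottled(size_t count, long maxInFlight, void (^work)(size_t index)) {
    dispatch_semaphore_t slots = dispatch_semaphore_create(maxInFlight);
    dispatch_group_t group = dispatch_group_create();
    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    for (size_t i = 0; i < count; i++) {
        // Block the submitting thread until one of the slots is free.
        dispatch_semaphore_wait(slots, DISPATCH_TIME_FOREVER);
        dispatch_group_async(group, queue, ^{
            work(i);
            dispatch_semaphore_signal(slots);   // hand the slot back
        });
    }
    // Wait for the last few in-flight blocks before returning.
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
}
```

Tuning then comes down to changing one number, for example starting near the core count and measuring.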

Related

iOS GCD custom concurrent queue execution sequence

I have a question regarding this issue.
According to Apple's documents
Concurrent
Concurrent queues (also known as a type of global dispatch queue) execute one or more tasks concurrently, but tasks are still started in the order in which they were added to the queue. The currently executing tasks run on distinct threads that are managed by the dispatch queue. The exact number of tasks executing at any given point is variable and depends on system conditions.
In iOS 5 and later, you can create concurrent dispatch queues yourself by specifying DISPATCH_QUEUE_CONCURRENT as the queue type. In addition, there are four predefined global concurrent queues for your application to use. For more information on how to get the global concurrent queues, see Getting the Global Concurrent Dispatch Queues.
So I ran a test using this sample code:
dispatch_queue_t concurrentQueue;
concurrentQueue = dispatch_queue_create("com.gcd.concurrentQueue",
                                        DISPATCH_QUEUE_CONCURRENT);
dispatch_async(concurrentQueue, ^{
    NSLog(@"First job ");
});
dispatch_async(concurrentQueue, ^{
    NSLog(@"Second job");
});
dispatch_async(concurrentQueue, ^{
    NSLog(@"Third job ");
});
But the results do not appear in the order in which the jobs were added. Here are the results:
2015-06-03 18:36:38.114 GooglyPuff[58461:1110680] First job
2015-06-03 18:36:38.114 GooglyPuff[58461:1110682] Third job
2015-06-03 18:36:38.114 GooglyPuff[58461:1110679] Second job
So my question is: shouldn't it be First, Second, Third?
Any advice is welcome, and thanks for your help.

"Concurrent" means they run at the same time and no assumptions should be made about where in their progress any of them will be at any given moment and which will finish first. That is the whole meaning and implication of concurrency: between one line of code and the next in one concurrent operation - even during one line of code - anything else from any other concurrent operation might be happening.
So, in answer to your particular question, these tasks may have started in a known order, but that happened very quickly, and after that point their progress is interleaved unpredictably. And your NSLog calls are part of that progress; they do not, and cannot, tell you when the tasks started!

The documentation is correct - they will indeed start in the order you added them to the queue. Once in the queue, they will be started one after the other, but on concurrent threads. The order they will finish is dependent on how long the task will take to execute. Here's a thought experiment, imagine your code was like this instead:
dispatch_async(concurrentQueue, ^{
JobThatTakes_3_SecToExecute(); // Job 1 (3 seconds to execute)
});
dispatch_async(concurrentQueue, ^{
JobThatTakes_2_SecToExecute(); // Job 2 (2 seconds to execute)
});
dispatch_async(concurrentQueue, ^{
JobThatTakes_1_SecToExecute(); // Job 3 (1 second to execute)
});
The overhead in and out of the queue should be very small compared to these job lengths, so you would expect them to finish up in about the time that their task takes to execute. In this case they'd finish roughly 1 second apart starting with Job 3, then 2, then 1. The total time the queue would take to complete will be about the length of Job 1, since it takes the longest to execute. This is lovely, since the total time is set primarily by the longest job, not the sum of the jobs. However, you don't have any say in what order they finish, since that's dictated by the task duration.
Change dispatch_async to dispatch_sync in this example and the queue will take about 6 seconds to complete. They'll come out in this order: Job 1, 2, then 3. This will guarantee that your results come out in the order you wanted, but it will take much longer.
So back to the significance of what the docs mean by "tasks are still started in the order in which they were added to the queue" for concurrent queues. This will be noticeable if your job is resource constrained. Say you're putting a big pile of long duration tasks in a concurrent queue on a 2 CPU machine. It is unlikely you'll be able to run a dozen CPU-pegging tasks concurrently here; some will have to wait while others run. The order that you put them into the queue will decide who gets to run next as resources free up. In your example, the tasks are of super short duration and involve console locking (as Rob mentioned), so queue / locking overhead can mess with your expectations.
Another (probably more important) reason the order of execution in concurrent queues matters is when barriers are used. You may need to run some sort of task after every N other tasks, which is where a barrier comes in handy. The fixed order of execution ensures that the barrier executes after the N tasks have completed concurrently, provided you put the barrier in the queue in the right spot.
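A small sketch of the barrier point, with hypothetical names (CompletedBeforeBarrier, the queue labels): n tasks are added to a custom concurrent queue, then a barrier is added after them. Barriers only have this effect on queues you create yourself with DISPATCH_QUEUE_CONCURRENT, not on the global queues. The barrier block should always observe that all n earlier tasks have finished, whatever order they finished in.

```objc
#import <Foundation/Foundation.h>

// Enqueues n concurrent tasks followed by a barrier, and returns how many
// of the tasks had completed by the time the barrier block ran.
static NSUInteger CompletedBeforeBarrier(NSUInteger n) {
    dispatch_queue_t queue = dispatch_queue_create("com.example.barrier-demo",
                                                   DISPATCH_QUEUE_CONCURRENT);
    // Serial queue used only to protect the shared counter.
    dispatch_queue_t counterQueue = dispatch_queue_create("com.example.counter",
                                                          DISPATCH_QUEUE_SERIAL);
    __block NSUInteger completed = 0;
    __block NSUInteger seenByBarrier = 0;

    for (NSUInteger i = 0; i < n; i++) {
        dispatch_async(queue, ^{
            // ... the real concurrent work would go here ...
            dispatch_sync(counterQueue, ^{ completed++; });
        });
    }

    // Waits for every block enqueued above, runs alone, then returns.
    dispatch_barrier_sync(queue, ^{
        seenByBarrier = completed;
    });
    return seenByBarrier;
}
```

In an app you would more likely use dispatch_barrier_async so the submitting thread is not blocked; the synchronous variant just keeps the sketch easy to check.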

Concurrent file enumeration

I have to perform a complex operation on a large number of files. Fortunately, enumeration order is not important and the jobs can be done in parallel without locking.
Does the platform provide a way to do this? For lack of a better API, I was thinking of:
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
    NSArray *paths = [[NSFileManager defaultManager] subpathsAtPath:folder];
    [paths enumerateObjectsWithOptions:NSEnumerationConcurrent
                            usingBlock:^(NSString *path, NSUInteger idx, BOOL *stop) {
        // Complex operation
    }];
});
Is there a better way?

Your current code puts one block on the global queue. So, that single block will run on a background thread and do all of the iteration and processing.
You want to do something a bit different to have your processing tasks run concurrently. You should really do the iteration on the main thread and add a block to the global queue on each iteration of the loop.
Better, create an NSOperation subclass. Put your logic there. Create an instance of the operation in the loop and add them to an operation queue. This is a higher level API and offers you options in adding dependencies, tailoring the maximum concurrency, checking the number of operations still to be completed, etc ...
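A minimal sketch of the operation-queue route, assuming a plain block per file is enough; I've swapped the NSOperation subclass for addOperationWithBlock: to keep it short, and ProcessFiles and its parameters are made-up names. The point is that maxConcurrentOperationCount gives you the throttle directly.

```objc
#import <Foundation/Foundation.h>

// Runs `operation` once per path, with at most `maxConcurrent` running at a time.
static void ProcessFiles(NSArray<NSString *> *paths, NSInteger maxConcurrent,
                         void (^operation)(NSString *path)) {
    NSOperationQueue *queue = [[NSOperationQueue alloc] init];
    queue.maxConcurrentOperationCount = maxConcurrent;

    for (NSString *path in paths) {
        [queue addOperationWithBlock:^{
            operation(path);
        }];
    }
    // Blocks the caller until the queue drains; do not call this on the main thread.
    [queue waitUntilAllOperationsAreFinished];
}
```

If you need cancellation or dependencies per file, that is where a real NSOperation subclass starts to pay off.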

Here's an approach you can consider. If you have (or may have) tens of thousands of files, instead of enumerating with enumerateObjectsWithOptions:usingBlock: you may want to enumerate the array manually in batches (let's say 100 elements each). When the current batch completes execution (you can use dispatch groups to check that) you start the next batch. With this approach you can avoid adding tens of thousands of blocks to the queue.
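Roughly, the batching could look like the sketch below. ProcessInBatches and its parameters are hypothetical names, and the batch size is something you would tune. Only one batch worth of blocks is ever queued, and the dispatch group is what tells you the batch has finished.

```objc
#import <Foundation/Foundation.h>

// Processes `paths` in consecutive batches; a new batch is only queued
// once every block of the previous batch has finished.
static void ProcessInBatches(NSArray<NSString *> *paths, NSUInteger batchSize,
                             void (^operation)(NSString *path)) {
    NSCParameterAssert(batchSize > 0);
    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    for (NSUInteger start = 0; start < paths.count; start += batchSize) {
        NSUInteger length = MIN(batchSize, paths.count - start);
        NSArray<NSString *> *batch = [paths subarrayWithRange:NSMakeRange(start, length)];

        dispatch_group_t group = dispatch_group_create();
        for (NSString *path in batch) {
            dispatch_group_async(group, queue, ^{
                operation(path);
            });
        }
        // Wait for the whole batch before queuing the next one.
        dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    }
}
```

Because dispatch_group_wait blocks, this is meant to be called from a background queue.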
BTW I've deleted my previous answer, because it was wrong.

Best practice for writing to resource from two different processes in objective-c

I have a general objective-c pattern/practice question relative to a problem I'm trying to solve with my app. I could not find a similar objective-c focused question/answer here, yet.
My app holds a mutable array of objects which I call "Records". The app gathers records and puts them into that array in one of two ways:
It reads data from a SQLite database available locally within the App's sand box. The read is usually very fast.
It requests data asynchronously from a web service, waits for it to finish then parses the data. The read can be fast, but often it is not.
Sometimes the app reads from the database (1) and requests data from the web service (2) at essentially the same time. It is often the case that (1) will finish before (2) finishes and adding Records to the mutable array does not cause a conflict.
I am worried that at some point my SQLite read process will take a bit longer than expected and it will try to add objects to the mutable array at the exact same time the async request finishes and does the same; or vice-versa. These are edge cases that seem difficult to test for but that surely would make my app crash or at the very least cause issues with my array of records.
I should also point out that the Records are to be merged into the mutable array. For example: if (1) runs first and returns 10 records, then shortly after (2) finishes and returns 5 records, my mutable array will contain all 15 records. I'm combining the data rather than overwriting it.
What I want to know is:
Is it safe for me to add objects to the same mutable array instance when the processes, either (1) or (2) finish?
Is there a good pattern/practice to implement for this sort of processing in objective-c?
Does this involve locking access to the mutable array so when (1) is adding objects to it (2) can't add any objects until (1) is done with it?
I appreciate any info you could share.
[EDIT #1]
For posterity, I found this URL to be a great help in understanding how to use NSOperations and an NSOperationQueue. It is a bit out of date, but works nonetheless:
http://www.raywenderlich.com/19788/how-to-use-nsoperations-and-nsoperationqueues
It doesn't talk specifically about the problem I'm trying to solve, but the example it uses is practical and easy to understand.
[EDIT #2]
I've decided to go with the approach suggested by danh, where I'll read locally and, as needed, hit my web service after the local read has finished (which should be fast anyway). That said, I'm going to try to avoid synchronization issues altogether. Why? Because Apple says so, here:
http://developer.apple.com/library/IOS/#documentation/Cocoa/Conceptual/Multithreading/ThreadSafety/ThreadSafety.html#//apple_ref/doc/uid/10000057i-CH8-SW8
Avoid Synchronization Altogether
For any new projects you work on, and even for existing projects, designing your code and data structures to avoid the need for synchronization is the best possible solution. Although locks and other synchronization tools are useful, they do impact the performance of any application. And if the overall design causes high contention among specific resources, your threads could be waiting even longer.
The best way to implement concurrency is to reduce the interactions and inter-dependencies between your concurrent tasks. If each task operates on its own private data set, it does not need to protect that data using locks. Even in situations where two tasks do share a common data set, you can look at ways of partitioning that set or providing each task with its own copy. Of course, copying data sets has its costs too, so you have to weigh those costs against the costs of synchronization before making your decision.

Is it safe for me to add objects to the same mutable array instance when the processes, either (1) or (2) finish?
Absolutely not. NSMutableArray, like the rest of the mutable collection classes, is not thread-safe. You can use it in conjunction with some kind of lock when you add and remove objects, but that's definitely a lot slower than just making two arrays (one for each operation) and merging them when they both finish.
Is there a good pattern/practice to implement for this sort of processing in objective-c?
Unfortunately, no. The most you can come up with is tripping a Boolean, or incrementing an integer to a certain number in a common callback. To see what I mean, here's a little pseudo-code:
- (void)someAsyncOpDidFinish:(NSSomeOperation*)op {
    finishedOperations++;
    if (finishedOperations == 2) {
        finishedOperations = 0;
        // Both are finished, process
    }
}
Does this involve locking access to the mutable array so when (1) is adding objects to it (2) can't add any objects until (1) is done with it?
Yes, see above.

You should either lock around your array modifications, or schedule your modifications in the main thread. The SQL fetch is probably running in the main thread, so in your remote fetch code you could do something like:
dispatch_async(dispatch_get_main_queue(), ^{
    [myArray addObject: newThing];
});
If you are adding a bunch of objects this will be slow since it is putting a new task on the scheduler for each record. You can bunch the records in a separate array in the thread and add the temp array using addObjectsFromArray: if that is the case.

Personally, I'd be inclined to have a concurrent NSOperationQueue and add the two retrieval operations to it: one for the database operation, one for the network operation. I would then have a dedicated serial queue for adding the records to the NSMutableArray, which each of the two concurrent retrieval operations would use to add records to the mutable array. That way you have one queue for adding records, fed by the two retrieval operations running on the other, concurrent queue. If you need to know when the two concurrent retrieval operations are done, I'd add a third operation to that concurrent queue and set its dependencies to be the two retrieval operations, so that it fires automatically when both are done.
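Something along these lines, as a sketch only: MergeRecords, fetchLocal and fetchRemote are placeholders for your own SQLite and web service code, and the queue label is made up. The two fetches run concurrently on the operation queue, every mutation of the array goes through one serial dispatch queue, and a third operation depends on both fetches.

```objc
#import <Foundation/Foundation.h>

// Runs both fetches concurrently and merges their records into one array.
// fetchLocal / fetchRemote are hypothetical stand-ins for the real lookups.
static NSArray *MergeRecords(NSArray *(^fetchLocal)(void), NSArray *(^fetchRemote)(void)) {
    NSMutableArray *records = [NSMutableArray array];
    // All writes to `records` are funnelled through this serial queue.
    dispatch_queue_t mergeQueue = dispatch_queue_create("com.example.records.merge",
                                                        DISPATCH_QUEUE_SERIAL);
    NSOperationQueue *fetchQueue = [[NSOperationQueue alloc] init];

    NSBlockOperation *local = [NSBlockOperation blockOperationWithBlock:^{
        NSArray *found = fetchLocal();
        dispatch_sync(mergeQueue, ^{ [records addObjectsFromArray:found]; });
    }];
    NSBlockOperation *remote = [NSBlockOperation blockOperationWithBlock:^{
        NSArray *found = fetchRemote();
        dispatch_sync(mergeQueue, ^{ [records addObjectsFromArray:found]; });
    }];
    NSBlockOperation *done = [NSBlockOperation blockOperationWithBlock:^{
        // Both retrievals have finished here; an app would hop to the
        // main queue at this point and refresh the UI.
    }];
    [done addDependency:local];
    [done addDependency:remote];

    // Waiting keeps the sketch self-contained; an app would pass NO and
    // react inside `done` instead of blocking the caller.
    [fetchQueue addOperations:@[local, remote, done] waitUntilFinished:YES];
    return [records copy];
}
```

Note that the order of the merged records depends on which fetch finishes first, so sort afterwards if order matters.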

In addition to the good suggestions above, consider not launching the GET and the sql concurrently.
[self doTheLocalLookupThen:^{
// update the array and ui
[self doTheServerGetThen:^{
// update the array and ui
}];
}];
- (void)doTheLocalLookupThen:(void (^)(void))completion {
if ([self skipTheLocalLookup]) return completion();
// do the local lookup, invoke completion
}
- (void)doTheServerGetThen:(void (^)(void))completion {
if ([self skipTheServerGet]) return completion();
// do the server get, invoke completion
}

NSOperation hierarchy, units of work

So I was wondering what the best way to break out long tasks into NSOperations. If I have 3 long running tasks, is it better to have one NSOperation subclass that basically does something like
Single NSOperation subclass
- (void)main {
// do long running task 1
// do long running task 2
// do long running task 3
// call back the delegate
}
Or is it better to have each task be a subclass of NSOperation, and then manage each task from my ViewController as a single unit of work? Thanks in advance.

It depends whether the operation queue is serial (i.e. max concurrent operations 1) or parallel, and what the nature of the work is. If the queue is serial, then it really doesn't matter. If the queue is parallel, then it depends on a bunch of factors:
is the work safe to do concurrently
does the work contend on a shared resource (such as network or disk IO, or a lock) that would remove the concurrency
is each unit of work sufficiently large to be worth the overhead of dispatching separately
(edit)
Also, if you don't need the advanced features of NSOperationQueue (operation dependencies and priorities, KVO, etc...), consider using dispatch queues instead. They're significantly lighter weight.

How to programmatically control and balance a number of threads iOS app is executing?

How can I control and balance the number of threads my app is executing, and how can I limit their number so the app doesn't block because the thread limit is reached?
Here on SO I saw the following possible answer: "Main concurrent queue (dispatch_get_global_queue) manages the number of threads automatically" which I don't like for the following reason:
Consider the following pattern (in my real app there are both more simple and more complex examples):
dispatch_queue_t defaultBackgroundQueue() {
    return dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
}
dispatch_queue_t databaseQueue() {
    return dispatch_queue_create("Database private queue", 0);
}
dispatch_async(defaultBackgroundQueue(), ^{
    [AFNetworkingAsynchronousRequestWithCompletionHandler:^(data){
        dispatch_async(databaseQueue(), ^{
            // data is about 100-200 elements to parse
            for (el in data) {
            }
            // maybe more AFNetworking requests and/or processing in other queues or
            dispatch_async(dispatch_get_main_queue(), ^{
                // At last! We can do something on UI.
            });
        });
    }];
});
This design very often leads to situations where:
The app locks up because the thread limit is reached (something like > 64).
The slower, and thus narrower, queues are overwhelmed by a large number of pending jobs.
The second point can also produce a cancellation problem: if we have 100 jobs already waiting for execution in a serial queue, we can't cancel them all at once.
The obvious and dumb solution would be to replace the sensitive dispatch_async calls with dispatch_sync, but that is definitely one I don't like.
What is the recommended approach for this kind of situation?
I hope a smarter answer than just "Use NSOperationQueue - it can limit the number of concurrent operations" exists (similar topic: Number of threads with NSOperationQueueDefaultMaxConcurrentOperationCount).
UPDATE 1: The only decent pattern I see is to replace every dispatch_async of a block to a concurrent queue with running that block wrapped in an NSOperation on an NSOperationQueue with a max operations limit set (in my case, maybe also setting a max operations limit on the NSOperationQueue that AFNetworking runs all its operations in).

You are starting too many network requests. AFAIK it's not documented anywhere, but you can run up to 6 simultaneous network connections (which is a sensible number considering RFC 2616 8.1.4, paragraph 6). After that you get locking, and GCD compensates by creating more threads, which, by the way, have a stack space of 512KB each with pages allocated on demand. So yes, use NSOperation for this. I use it to queue network requests, increase the priority when the same object is requested again, and pause and serialize to disk if the user leaves. I also monitor the speed of the network requests in bytes/time and change the number of concurrent operations accordingly.

While I don't see from your example where exactly you're creating "too many" background threads, I'll just try to answer the question of how to control the exact number of threads per queue. Apple's documentation says:
Concurrent queues (also known as a type of global dispatch queue) execute one or more tasks concurrently, but tasks are still started in the order in which they were added to the queue. The currently executing tasks run on distinct threads that are managed by the dispatch queue. The exact number of tasks executing at any given point is variable and depends on system conditions.
While you can now (since iOS 5) create concurrent queues manually, there is no way to control how many jobs will be run concurrently by such a queue. The OS will balance the load automatically. If, for whatever reason, you don't want that, you could for example create a set of n serial queues manually and dispatch new jobs to one of your n queues at random:
NSArray *queues = @[dispatch_queue_create("com.myapp.queue1", 0),
                    dispatch_queue_create("com.myapp.queue2", 0),
                    dispatch_queue_create("com.myapp.queue3", 0)];
NSUInteger randQueue = arc4random() % [queues count];
dispatch_async([queues objectAtIndex:randQueue], ^{
    NSLog(@"Do something");
});
randQueue = arc4random() % [queues count];
dispatch_async([queues objectAtIndex:randQueue], ^{
    NSLog(@"Do something else");
});
I'm by no means endorsing this design - I think concurrent queues are pretty good at balancing system resources. But since you asked, I think this is a feasible approach.
