Use of serialized target queues for Concurrent queues in iOS - ios

I was going through this excellent blog post
(http://www.humancode.us/2014/08/14/target-queues.html)
of target threads in iOS and I could not help but wonder why do we need such a mechanism. In the example, we are specifying a serialised target queue for a custom concurrent queue. Can we not achieve the same by executing the blocks in the original concurrent queue in a serialised queue instead?
Whats the point of having a serialised target queue for a concurrent queue????

If I got You right, you're asking why would someone start serial task on a concurrent queue.
You would need that kind of behaviour in case, if most tasks with some resource can be performed concurrently (aka, simultaneously), but some tasks are, by nature, unsafe to be performed concurrently with others.
The most common example is readers/writers problem. Here you are accessing, for example, some resource of a file system. It's ok to read it even from different threads - every reader will get exactly what it needs. But here comes necessity to update contents of that file. Modifying it while someone reads it leads to unpredicted results - reader is not guaranteed to get the right, expected, info (partially from old version, partially from new). Even worse - there can be two writers (if file contents changes by application user and from some central storage via net) - result will be some crazy mix of two versions (actually, it can be now even corrupted)
Here comes necessity for each writer to wait till all other tasks performed (no one reads, no one writes), and for each reader to wait until no writing tasks take place (no one writes, no matter how many readers)
Wikipedia has nice article on this one. I haven't run into any other practical situations, where you would need this, but I believe there're more of them.
Hope it answers your question

Related

How resilient is modern Rails to the antipattern "thread + fork"?

I think this is a popular antipattern that happens either standalone, for example activeJob local task with async, or coming from controllers, because then the strategy of the server must be taken into account.
My question is, what cautions should one take in the code when forking inside a thread (think inside of a ActiveJob task) and then even threading it?
The main worries I have seen online are:
Needs to lose and reopen the database connections after the fork. It seems that nowadays activeRecord takes care of it, doesn't it?
Access to the common Logger could be complicated. Somehow it seems to work.
Concurrent was expected to be problematic too but current versions are patched to detect that a fork has happened and threads are dead. Still it seems that one needs to make sure of doing, at the end of the forked process, a fine shutdown of any Rails::Concurrent pool that could have active or pending jobs. I think that it is enough
ActiveJob::Base.queue_adapter.shutdown
but perhaps it could miss some tasks that have not started or tasks under other Concurrent queue. In fact I think it already happens if one uses Concurrent::Future in a controller managed by the puma webserver. Generically I try to insert
Concurrent::global_io_executor.shutdown
Concurrent::global_io_executor.wait_for_termination
Extra problems I have found are resource-related: the postgres server is not ready to manage so many connections by default. Perhaps it could be sensible to reduce the size of the connection pool before the fork. And the inotify watcher gem also exhausts resource, when launched in development. Production is fine in both cases.
TL;DR; - I'm against doing it but many of us do it anyway and ignore the fact that it's unsafe... things break too rarely.
It is a simple fact that calling fork in a multi-threaded process may cause the new child to crash / deadlock / spin and may also cause other (harder to isolate) bugs.
This has nothing to do with Ruby, this is related to the locking mechanisms that safeguard critical sections and core process functionality such opening/closing files, allocating memory and any user created mutex / spinlock, etc'.
Why is it risky?
When calling fork the new process inherits all the state of the previous process but only the thread that called fork (all other threads do not exist in the new process).
This means that if any of the other threads was inside a critical section (i.e., allocating memory, opening a file, etc'), that critical section would remain locked for the lifetime of the new process, possibly causing deadlocks or unexpected errors.
Why do we ignore it?
In practical terms, the risk of something seriously breaking is often very low and most developers hadn't both encounter the issue and recognized its cause. Open files can be manually (if not automatically) closed, which leaves us mostly with the question of critical sections.
We can often reset our own critical sections which leaves mostly the system's critical sections...
The system's core critical sections that can be effected by fork are not that many. The main one is the memory allocator which can hardly ever break. Often the malloc implementation has multiple "arenas", each with its own critical section and it would be a long-shot to hit the system's underlying page allocation (i.e., mmap).
So is it safe?
No. Things still break, they just break rarely and when they do it isn't always obvious. Also, a parent process can sometime catch some of these errors and retry / recuperate and there are other ways to handle the risks.
Should I do it?
I wouldn't recommend to do it, but it depends. If you can handle an error, sure, go ahead. If not, that's a no.
Anyway, it's usually much better to use IPC to forward a message to a background process so that process perform any required fork / task.
The pattern can occur naturally when a Rails controller is combined with a webserver. The situation is different depending if the webserver is threaded, forked or evented, but the final conclusion is the same; that it is safe.
Fork + fork and thread + fork should not present problems of multiple access to the database or multiple execution of the same code, as only the current thread is active in the children.
Event + fork could be a source of troubles if the event machine is still active in the forked thread. Fortunately most designs generate a separate thread for the control of the event loop.

Multi-thread daata access issue, #synchronized & serial queue

As you may have experienced, access none-thread safe variables is a big headache. For iOS one simple solution is to use keyword #synchronized, which will add NSLock to insure the data can be accessed by unique one thread, the disadvantage is as below:
Lock too many will reduce app performance greatly, especially when invoked by main thread.
Dead lock will occur when logic becomes complex.
Based on the above considerations, we prefer to use serial queue to handle, each thread safe critical operation will append to the end of the queue, it is a great solution, but the problem is that all access interfaces should by designed in asyn style, see the following one.
-(id)objectForKey:(NSString *)key;
The people who invoke this class aren't reluctant to design in this way. Anyone who has experience on this field please share and discuss together.
The final solution is using NSUserDefault to store small data, for large cache data put them in file maintained by ourselves.
Per Apple doc the advantage of NSUserDefault is thread safe and will do synchronize work periodically.

NSOperation with dependency on another operation on a different queue does not start

I have dependency graph of operations and I use multiple queues to organize various streams of operations.
E.g. peopleQueue, sitesQueue, sessionQueue
sessionQueue: loginOp, fetchUpdatedAccountOp
peopleQueue: mostFrequentlyManagedClientsOp, remainingClientsOp
sitesQueue: mostFrequentlyAccessedSitesOp, remainingSitesOp
dependencies:
*all* -> loginOp
remainingClientsOp -> mostFrequentlyManagedClientsOp
remainingSitesOp -> mostFrequentlyAccessedSitesOp
The current setup works: after login completes, all the other operations kick off
mostFrequently* is a subset fetch that allows for quick app response, a subsequent op fetches much more data (sometimes in pages) in the background.
Recently I thought I'd add an operation that depended on all the leaf operations.
This latest operation would act as a sentinel to tell me when the graph traversal had completed (firing it would cause on NSNotification post or something). So:
sentinelOp -> remainingClientsOp, remainingSitesOp, fetchUpdatedAccountOp
What I discovered, however, is that even though all its dependencies completed, the sentinel operation never started/fired.
The sentinel, at the time was queued, on the sessionQueue (no particular reason).
After playing around in the debugger, I discovered that I could only get it to fire if the sentinel depended on only operations that were on the same queue.
I finally got the sentinel to run by introducing a 4th queue for just that operation.
The sentinel depends on the other 3 leaf operations in their respective queues and then gets called when they all complete.
I can go with this working model but it really bothers me.
The Apple docs for both mac and iOS suggest that inter-queue dependency should work.
I will need to extend the graph a bit further so it troubles me that using an existing queue for inter-queue dependencies prevents the operation from executing.
Clearly, inter-queue dependencies work to some extent because I got loginOp to be the root dependency for other operations regardless of their queues in the first place.
What am I doing wrong by placing the sentinel operation on one of the existing 3 queues?
I resolved this issue by using only 1 queue. I still can't understand what was wrong with the original implementation but I learned a couple of things that removed the need for multiple queues.
First, it's somewhat straightforward to observe the queues pending operations count using KVO. This is how I was able to do away with the sentinel (see Reference).
Second, I was maintaining several queues to logically separate out related operations. With one queue, I achieved pretty much the same results by composing the operations generation method into helper methods, 1 for each logical unit and then enqueuing all the operations returned by the helper.
I am not sure if there are performance implications to going from 3 queues to 1. As far as I can tell, so long as the operations are concurrent and the queue has no restrictions on current execution, it shouldn't matter whether the operations are distributed among multiple queues or all on the same queue.

Using GCD for offline persistent queue

Right now I have some older code I wrote years ago that allows an iOS app to queue up jobs (sending messages or submitting data to a back-end server, etc...) when the user is offline. When the user comes back online the tasks are run. If the app goes into the background or is terminated the queue is serialized and then loaded back when the app is launched again. I've subclassed NSOperationQueue and my jobs are subclasses of NSOperation. This gives me the flexibility of having a data structure provided for me that I can subclass directly (the operation queue) and by subclassing NSOperation I can easily requeue if my task fails (server is down, etc...).
I will very likely leave this as it is, because if it's not broke don't fix it, right? Also these are very lightweight operations and I don't expect in the current app I'm working on for there to be very many tasks queued at any given time. However I know there is some extra overhead with using NSOperation rather than using GCD directly.
I don't believe I could subclass a dispatch queue the way I can an NSOperationQueue, so there would be extra code overheard for me to maintain my own data structure and load this into & out of a dispatch queue each time the app is sent to the background, right? Also not sure how I'd handle requeueing the job if it fails. Right now if I get a HTTP 500 response from the server, for example, in my operation code I send a notification with a deep copy of the failed NSOperation object. My custom operation queue picks this notification up and adds the task to itself. Not sure how of if I'd be able to do something similar with GCD. I would also need an easy way to cancel all operations or suspend the queue when network connectivity is lost then reactivate when network access is regained.
Just hoping to get some thoughts, opinions and ideas from others who might have done something similar or are more familiar with GCD than I am.
Also worth noting I know there's some new background task support coming in iOS 7 but it will likely be a while before that will be my deployment target. I am also not sure yet if it would exactly do what I need, so at the moment just looking at the possibility of GCD.
Thanks.
If NSOperation vs submitting blocks to GCD ever shows up as measurable overhead, the problem isn't that you're using NSOperation, it's that your operations are far too granular. I would expect this overhead to be effectively unmeasurable in any real-world situation. (Sure, you could contrive a test harness to measure the overhead, but only by making operations that did effectively nothing.)
Use the highest level of abstraction that gets the job done. Move down only when hard data tells you that you should.

iOS - Concurrent access to memory resources

My app downloads several resources from server, data and data descriptors. These downloads, triggered by user actions, can be performed simultaneously, let's say, up to 50 downloads at a time. All these asynchronous tasks end up creating objects in memory, (e.g. appending leaves to data structures, such as adding keys to mutable dictionaries or objects to arrays). My question is: can this cause stability issues? For instance, if several simultaneous tasks try to add keys to the same dictionary, am I supposed to handle the situation, placing some kind of locks? If I implement a for cycle which looks for graphical elements in an array, is it possible that other running tasks might change the array content 'during' the cycle? Any reference or major, general orientation about this multitasking, multithreading issues other than official documentation?
Depends how you are dealing with the downloads - if you are using NSURLConnection it handles the separate threading / concurrency for you and your code is reentrant thus you don't have to worry about simultaneous action.
If you are creating your own threads you potentially have issues.
EDIT:
Your code runs in a main thread (the main run loop), lets say you have an NSURLConnection that is also running then it will run in a separate thread. However your delegate code that deals with events that happen while the connection is in progress runs in your run loop, not in the other thread. This means your code can only ever execute one thing at a time. A connection succeeded method would not get called at the same time as any of your other code. If you had a for loop running then it would block your main thread until it has finished looping, in the meanwhile if the connection finished while the for loop is still running then your delegate code will not execute until after the loop has finished.
You may want to look into Grand Central Dispatch's (GCD) and barrier blocks. Barrier blocks will allow you to do what y oh want in the background and "lock" resources.
Check out the Apple documentation and Mike Ash's blog post here on GCD.
The basic gist is that you use a concurrent queue that you create to perform the reads and use a barrier block to block all access to that resource for writing. good stuff.
Good luck
Tim

Resources