A simple method for a "master" thread to monitor "slave" threads - pthreads

As my first real foray into using pthreads, I'm looking to adapt an already written app of mine to use threads.
The paradigm I have in mind is basically to have one "master" thread which iterates through a list of data items to be processed, launching a new thread for each, with at most MAX_THREADS threads running at any given time (until the number of remaining tasks drops below that), each of which performs the same task on a single data element within a list.
The master thread needs to be aware of whenever any thread has completed its task and returned (or pthread_exit()'ed), so it can immediately launch a new thread to perform the next task in the list.
What I'm wondering is: what are people's preferred methods for working with such a design? Data considerations aside, what would be the simplest set of pthreads functions to use to accomplish this? Obviously, pthread_join() is out as a means for "checking up" on threads.
Early experiments have been using a struct, passed as the final argument to pthread_create(), which contains an element called "running" which the thread sets to true on startup and resets just before returning. The master thread simply checks the current value of this struct element for each thread in a loop.
Here is the data the program uses for thread management:
typedef struct thread_args_struct
{
    char *data;     /* the data item the thread will be working on */
    int index;      /* thread's index in the array of threads */
    int thread_id;  /* thread's actual integer id */
    int running;    /* boolean status */
    int retval;     /* value to pass back from thread on return */
} thread_args_t;
/*
* array of threads (only used for thread creation here, not referenced
* otherwise)
*/
pthread_t thread[MAX_THREADS];
/*
* array of argument structs
*
* a pointer to the thread's argument struct will be passed to it on creation,
* and the thread will place its return value in the appropriate struct element
* before returning/exiting
*/
thread_args_t thread_args[MAX_THREADS];
Does this seem like a sound design? Is there a better, more standardized method for monitoring threads' running/exited status, a more "pthreads-y" way? I'm looking to use the simplest, clearest, cleanest mechanism possible which won't lead to any unexpected complications.
Thanks for any feedback.

There isn't so much a "pthreads-y" way as a (generic) multi-threading way. There is nothing wrong with what you have but it is more complicated and inefficient than it need be.
A more standard design is to use a thread pool. The master thread spawns a bunch of worker threads that read a queue. The master puts work in the queue and all the workers take a shot at processing the work in the queue. This eliminates the need to constantly start and terminate threads (though more sophisticated pools can have some mechanism to increase/decrease the pool size based on the work load). If the threads have to return data or status information they can use an output queue (maybe just a pointer to the actual data) that the master can read.
This still leaves the issue of how to get rid of the threads when you are done processing. Again, it is a master-worker relationship, so it is advised that the master tell the slaves to shut themselves down. This amounts to using some program switch (such as you currently have), employing a condition variable somewhere, sending a signal, or cancelling the thread. There are a lot of questions (and good answers) on this topic here.
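A minimal sketch of such a pool in C, assuming a fixed-size ring buffer as the work queue (the names worker, submit, and shutting_down are illustrative, not from the question's code):

#include <pthread.h>
#include <stdio.h>

#define POOL_SIZE 4
#define QUEUE_CAP 64

/* fixed-size ring buffer guarded by one mutex and two condition variables */
static char *work_queue[QUEUE_CAP];
static int q_head, q_tail, q_count;
static int shutting_down;              /* the "program switch" */
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_nonempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t q_nonfull = PTHREAD_COND_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (q_count == 0 && !shutting_down)
            pthread_cond_wait(&q_nonempty, &q_lock);
        if (q_count == 0) {            /* queue drained and told to quit */
            pthread_mutex_unlock(&q_lock);
            return NULL;
        }
        char *item = work_queue[q_head];
        q_head = (q_head + 1) % QUEUE_CAP;
        q_count--;
        pthread_cond_signal(&q_nonfull);
        pthread_mutex_unlock(&q_lock);
        printf("processing %s\n", item);   /* do the real work here */
    }
}

static void submit(char *item)
{
    pthread_mutex_lock(&q_lock);
    while (q_count == QUEUE_CAP)
        pthread_cond_wait(&q_nonfull, &q_lock);
    work_queue[q_tail] = item;
    q_tail = (q_tail + 1) % QUEUE_CAP;
    q_count++;
    pthread_cond_signal(&q_nonempty);
    pthread_mutex_unlock(&q_lock);
}

int main(void)
{
    pthread_t pool[POOL_SIZE];
    char *items[] = { "alpha", "beta", "gamma", "delta" };

    for (int i = 0; i < POOL_SIZE; i++)
        pthread_create(&pool[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        submit(items[i]);

    /* master tells the workers to drain the queue and exit */
    pthread_mutex_lock(&q_lock);
    shutting_down = 1;
    pthread_cond_broadcast(&q_nonempty);
    pthread_mutex_unlock(&q_lock);

    for (int i = 0; i < POOL_SIZE; i++)
        pthread_join(pool[i], NULL);
    return 0;
}

Note how the shutdown switch is folded into the same mutex/condition pair as the queue itself, so workers finish any remaining work before exiting and the master only joins them once at the end.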

Related

iOS application does not render due to infinite while loop

Hello, I am implementing a primitive echo server for iOS, written purely in C. The issue arises when I enter an infinite while loop in order to accept incoming connections. Where is the best place to put the accept loop, and is an infinite while loop (in my case, all I want) the best implementation?
Here is the flow:
1. Set the root controller inside the application function in the app delegate.
2. Call the start-server function inside the controller. (This function never returns.)
while(true)
{
    @autoreleasepool
    {
        int exchangeSocket = accept(socket, NULL, NULL);
        if(recv(exchangeSocket, buffer, sizeof(buffer), 0) == -1)
        {
            NSLog(@"%@", @"Error");
        }
        else
        {
            //do something with data received
        }
    }
}
There are at least two concerns with your approach. First, you should not run an infinite loop on the main thread, since that thread is already used for the main run loop. Blocking the run loop with your own infinite loop (no matter where) will disrupt event processing; this is why your application does not render. The reference specifies:
Lengthy tasks (or potentially lengthy tasks) should always be performed on a background thread. Any tasks involving network access, file access, or large amounts of data processing should all be performed asynchronously using GCD or operation objects.
However, iOS does allow you to create threads using POSIX interfaces, e.g. it offers pthread_create and friends. You should be able to run your code in such a thread without blocking your app.
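For instance, a sketch of moving the accept loop onto a POSIX thread (server_fd and handle_client are stand-ins for your socket setup and processing code, not names from the question):

#include <pthread.h>
#include <sys/socket.h>

/* assumed to exist elsewhere in your code -- these are stand-ins */
extern int server_fd;                  /* the listening socket */
extern void handle_client(int fd);     /* recv() and process one connection */

static void *accept_loop(void *arg)
{
    (void)arg;
    for (;;) {
        int exchangeSocket = accept(server_fd, NULL, NULL);
        if (exchangeSocket >= 0)
            handle_client(exchangeSocket);
    }
    return NULL;                       /* never reached */
}

/* call this from the controller instead of looping on the main thread */
void start_server_in_background(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, accept_loop, NULL);
    pthread_detach(tid);               /* main run loop keeps handling UI events */
}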
Second (but perhaps of less interest to you), using the POSIX networking APIs is somewhat discouraged in favor of other interfaces, because those APIs do not automatically activate the cellular radio. What's closest to your liking may be CFSocket (also a C interface).

@synchronized block versus GCD dispatch_async()

Essentially, I have a set of data in an NSDictionary, but for convenience I'm setting up some NSArrays with the data sorted and filtered in a few different ways. The data will be coming in via different threads (blocks), and I want to make sure there is only one block at a time modifying my data store.
I went through the trouble of setting up a dispatch queue this afternoon, and then randomly stumbled onto a post about @synchronized that made it seem like pretty much exactly what I want to be doing.
So what I have right now is...
// a property on my object
@property (assign) dispatch_queue_t matchSortingQueue;
// in my object init
_sortingQueue = dispatch_queue_create("com.asdf.matchSortingQueue", NULL);
// then later...
- (void)sortArrayIntoLocalStore:(NSArray*)matches
{
    dispatch_async(_sortingQueue, ^{
        // do stuff...
    });
}
And my question is, could I just replace all of this with the following?
- (void)sortArrayIntoLocalStore:(NSArray*)matches
{
    @synchronized (self) {
        // do stuff...
    }
}
...And what's the difference between the two anyway? What should I be considering?
Although the functional difference might not matter much to you, it's what you'd expect: if you @synchronize then the thread you're on is blocked until it can get exclusive execution. If you dispatch to a serial dispatch queue asynchronously then the calling thread can get on with other things and whatever it is you're actually doing will always occur on the same, known queue.
So they're equivalent for ensuring that a third resource is used from only one queue at a time.
Dispatching could be a better idea if, say, you had a resource that is accessed by the user interface from the main queue and you wanted to mutate it. Then your user interface code doesn't need explicitly to @synchronize, hiding the complexity of your threading scheme within the object quite naturally. Dispatching will also be a better idea if you've got a central actor that can trigger several of these changes on other different actors; that'll allow them to operate concurrently.
Synchronising is more compact and a lot easier to step debug. If what you're doing tends to be two or three lines and you'd need to dispatch it synchronously anyway then it feels like going to the effort of creating a queue isn't worth it — especially when you consider the implicit costs of creating a block and moving it over onto the heap.
In the second case you would block the calling thread until "do stuff" was done. Using queues and dispatch_async you will not block the calling thread. This would be particularly important if you call sortArrayIntoLocalStore from the UI thread.
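To make the caller-blocking difference concrete, here is a small sketch using the GCD C API (the queue label and printed strings are made up for illustration):

#include <dispatch/dispatch.h>
#include <stdio.h>

int main(void)
{
    /* a NULL attribute makes this a serial queue */
    dispatch_queue_t q = dispatch_queue_create("com.example.sorting", NULL);

    /* dispatch_async returns immediately; the block runs later on q */
    dispatch_async(q, ^{ puts("async block: sorting in the background"); });
    puts("caller: not blocked by the async block");

    /* dispatch_sync does not return until its block has run on q --
       the same caller-blocking behavior @synchronized gives you */
    dispatch_sync(q, ^{ puts("sync block: the caller is waiting on this"); });
    puts("caller: sync block finished");
    return 0;
}

From the caller's point of view, the @synchronized version behaves like the dispatch_sync call: it does not return until the protected work has finished.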

How to use GCD for lightweight transactional locking of resources?

I'm trying to use GCD as a replacement for dozens of atomic properties. I remember at WWDC they were talking about how GCD can be used for efficient transactional locking mechanisms.
In my OpenGL ES runloop method I put all drawing code in a block executed by dispatch_sync on a custom-created serial queue. The runloop is driven by a CADisplayLink, which to my knowledge fires on the main thread.
There are ivars and properties which are used both for drawing but also for controlling what will be drawn. The problem is that there must be some locking in place to prevent concurrency problems, and a way of transactionally querying and modifying the state of the OpenGL ES scene from the main thread between two drawn frames.
I can modify a group of properties in a transactional way with GCD by executing a block on that serial queue.
But it seems I can't read values back on the main thread, using GCD, while blocking the queue that executes the drawing code. dispatch_sync doesn't have a return value, but I want access to the presentation values exactly between the drawing of two frames, both for reading and writing.
Is it this barrier thing they were talking about? How does that work?
This is what the async writer / sync reader model was designed to accomplish. Let's say you have an ivar (and for purposes of discussion let's assume that you've gone a wee bit further and encapsulated all your ivars into a single structure, just for simplicity's sake):
struct {
    int x, y;
    char *n;
    dispatch_queue_t _internalQueue;
} myIvars;
Let's further assume (for brevity) that you've initialized the ivars in a dispatch_once() and created the _internalQueue as a serial queue with dispatch_queue_create() earlier in the code.
Now, to write a value:
dispatch_async(myIvars._internalQueue, ^{ myIvars.x = 10; });
dispatch_async(myIvars._internalQueue, ^{ myIvars.n = "Hi there"; });
And to read one:
__block int val; __block char *v;
dispatch_sync(myIvars._internalQueue, ^{ val = myIvars.x; });
dispatch_sync(myIvars._internalQueue, ^{ v = myIvars.n; });
Using the internal queue makes sure everything is appropriately serialized and that writes can happen asynchronously but reads wait for all pending writes to complete before giving you back the value. A lot of "GCD aware" data structures (or routines that have internal data structures) incorporate serial queues as implementation details for just this purpose.
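Because dispatch_sync runs the whole block before returning, you can also snapshot several values in one transaction, which is what the question asks for between two frames. A sketch reusing the myIvars structure above (the snapshot_ names are made up):

__block int snapshot_x;
__block char *snapshot_n;
dispatch_sync(myIvars._internalQueue, ^{
    snapshot_x = myIvars.x;   /* both reads happen inside one  */
    snapshot_n = myIvars.n;   /* critical section on the queue */
});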
dispatch_sync doesn't return a value directly, but inside the block you hand to the serial queue you can dispatch_async back to the main queue and use the computed values on your main thread. It would look something like:
dispatch_sync(serialQueue, ^{
    // execute a block on the serial queue
    dispatch_async(dispatch_get_main_queue(), ^{
        // use your calculations here
    });
});
And serial queues handle the concurrency part themselves, so if another piece of code tries to access the same resource at the same time, the access is serialized by the queue itself. Hope this was of some help.

Is there a way that the synchronized keyword doesn't block the main thread

Imagine you want to do many things in the background of an iOS application, and you code it properly so that you create threads (for example using GCD) to execute this background activity.
Now what if you need, at some point, to update a variable, but this update can occur on any of the threads you created?
You obviously want to protect that variable, and you can use the @synchronized keyword to create the locks for you. But here is the catch (an extract from the Apple documentation):
The @synchronized() directive locks a section of code for use by a single thread. Other threads are blocked until the thread exits the protected code—that is, when execution continues past the last statement in the @synchronized() block.
So that means that if you synchronize on an object and two threads write to it at the same time, even the main thread will block until both threads are done writing their data.
An example of code that will showcase all this:
// Create the background queue
dispatch_queue_t queue = dispatch_queue_create("synchronized_example", NULL);
// Start working in a new thread
dispatch_async(queue, ^
{
    // Synchronize on that shared resource
    @synchronized(sharedResource_)
    {
        // Write things on that resource
        // If more than one thread accesses this piece of code:
        // all threads (even main thread) will block until task is completed.
        [self writeComplexDataOnLocalFile];
    }
});
// won't actually go away until queue is empty
dispatch_release(queue);
So the question is fairly simple: how do we overcome this? How can we safely put locks on all the threads EXCEPT the main thread, which, we know, doesn't need to be blocked in this case?
EDIT FOR CLARIFICATION
As some of you commented, it does seem logical (and this is clearly what I thought at first when using @synchronized) that only the two threads that are trying to acquire the lock should block until they are both done.
However, tested in a real situation, this doesn't seem to be the case and the main thread seems to also suffer from the lock.
I use this mechanism to log things in separate threads so that the UI is not blocked. But when I do intense logging, the UI (main thread) is clearly highly impacted (scrolling is not as smooth).
So two options here: either the background tasks are so heavy that even the main thread gets impacted (which I doubt), or @synchronized also blocks the main thread while performing the lock operations (which I'm starting to reconsider).
I'll dig a little further using the Time Profiler.
I believe you are misunderstanding the following sentence that you quote from the Apple documentation:
Other threads are blocked until the thread exits the protected code...
This does not mean that all threads are blocked; it just means that all threads that are trying to synchronize on the same object (the sharedResource_ in your example) are blocked.
The following quote is taken from Apple's Thread Programming Guide, which makes it clear that only threads that synchronise on the same object are blocked.
The object passed to the @synchronized directive is a unique identifier used to distinguish the protected block. If you execute the preceding method in two different threads, passing a different object for the anObj parameter on each thread, each would take its lock and continue processing without being blocked by the other. If you pass the same object in both cases, however, one of the threads would acquire the lock first and the other would block until the first thread completed the critical section.
Update: If your background threads are impacting the performance of your interface then you might want to consider putting some sleeps into the background threads. This should allow the main thread some time to update the UI.
I realise you are using GCD but, for example, NSThread has a couple of methods that will suspend the thread, e.g. +sleepForTimeInterval:. In GCD you can probably just call sleep().
Alternatively, you might also want to look at changing the thread priority to a lower priority. Again, NSThread has the setThreadPriority: for this purpose. In GCD, I believe you would just use a low priority queue for the dispatched blocks.
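In GCD terms, that just means dispatching the background blocks to a low-priority global queue. A sketch (the log_in_background name is made up for illustration):

#include <dispatch/dispatch.h>

/* run a block at low priority so the scheduler favors the main thread */
void log_in_background(void (^work)(void))
{
    dispatch_queue_t low =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_LOW, 0);
    dispatch_async(low, work);
}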
I'm not sure if I understood you correctly; @synchronized doesn't block all threads, only the ones that want to execute the code inside the block. So the solution probably is: don't execute the code on the main thread.
If you simply want to avoid having the main thread acquire the lock, you can do this (and wreak havoc):
dispatch_async(queue, ^
{
    if(![NSThread isMainThread])
    {
        // Synchronize on that shared resource
        @synchronized(sharedResource_)
        {
            // Write things on that resource
            // If more than one thread accesses this piece of code:
            // all threads (even main thread) will block until task is completed.
            [self writeComplexDataOnLocalFile];
        }
    }
    else
    {
        [self writeComplexDataOnLocalFile];
    }
});

pthread_cond_signal wakes more than one thread on a multiprocessor system

This is an excerpt from the pthread_cond_wait man page:
Some implementations, particularly on a multi-processor, may sometimes cause multiple threads to wake up when the condition variable is signaled simultaneously on different processors.
In general, whenever a condition wait returns, the thread has to re-evaluate the predicate associated with the condition wait to determine whether it can safely proceed, should wait again, or should declare a timeout.
My question: what is the meaning of "predicate" here? Does it mean that I need to create one more variable apart from the condition variable provided to pthread_cond_wait, or does it refer to the same variable that has been provided to pthread_cond_wait?
Yes, you need an additional variable like int done_flag; to use like this:
pthread_mutex_lock(&mutex);
while (!done_flag) pthread_cond_wait(&cond, &mutex);
/* do something that needs the lock held */
pthread_mutex_unlock(&mutex);
/* do some other stuff that doesn't need the lock held */
Of course it often might not be a flag but rather a count, or some other type of variable, with a more complicated condition to test.
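For completeness, the signaling side sets the predicate under the same mutex before signaling, which is why waiters that wake spuriously (or in a group) can safely re-check it. A sketch matching the fragment above:

/* producer side: make the predicate true, then signal, under the same mutex */
pthread_mutex_lock(&mutex);
done_flag = 1;
pthread_cond_signal(&cond);   /* any thread that wakes re-checks done_flag */
pthread_mutex_unlock(&mutex);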
This might be useful. You can use pthread_kill to wake a particular thread.
sigset_t _fSigMask; // global sigmask
We do this before creating our threads. Threads inherit their mask from the thread that creates them. We use SIGUSR1 to signal our threads. Other signals are available.
sigemptyset(&_fSigMask);
sigaddset(&_fSigMask, SIGUSR1);
sigaddset(&_fSigMask, SIGSEGV);
/* block the signals before spawning so the new threads inherit the mask
   and sigwait() can receive them */
pthread_sigmask(SIG_BLOCK, &_fSigMask, NULL);
Then, to sleep a thread:
int nSig;
sigwait(&_fSigMask, &nSig);
Then, to wake the thread YourThread:
pthread_kill(YourThread, SIGUSR1);
By the way, during our testing, sleeping and waking our threads this way was about 40x faster than using condition variables.
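Putting those fragments together into a self-contained sketch (SIGUSR1 and the _fSigMask name follow the answer; sleepy_thread and the sleep() calls are illustrative):

#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static sigset_t _fSigMask;             /* global sigmask, as above */

static void *sleepy_thread(void *arg)
{
    (void)arg;
    for (;;) {
        int nSig;
        sigwait(&_fSigMask, &nSig);    /* sleeps until a masked signal arrives */
        printf("woken by signal %d\n", nSig);
    }
    return NULL;
}

int main(void)
{
    sigemptyset(&_fSigMask);
    sigaddset(&_fSigMask, SIGUSR1);
    /* block before creating the thread so it inherits the mask */
    pthread_sigmask(SIG_BLOCK, &_fSigMask, NULL);

    pthread_t tid;
    pthread_create(&tid, NULL, sleepy_thread, NULL);

    sleep(1);                          /* let the thread reach sigwait() */
    pthread_kill(tid, SIGUSR1);        /* wake that particular thread */
    sleep(1);                          /* let it print before exiting */
    return 0;
}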
