Thread creation using pthread_create with SCHED_RR scheduling fails - pthreads

I am trying to write some code to create a pthread with SCHED_RR:
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setinheritsched (&attr, PTHREAD_EXPLICIT_SCHED);
pthread_attr_setschedpolicy(&attr, SCHED_RR);
struct sched_param params;
params.sched_priority = 10;
pthread_attr_setschedparam(&attr, &params);
pthread_create(&m_thread, &attr, &startThread, NULL);
pthread_attr_destroy(&attr);
But the thread doesn't run. Do I need to set more parameters?

A thread can only set the scheduling to SCHED_OTHER without the CAP_SYS_NICE capability. From sched(7):
In Linux kernels before 2.6.12, only privileged (CAP_SYS_NICE)
threads can set a nonzero static priority (i.e., set a real-time
scheduling policy). The only change that an unprivileged thread can
make is to set the SCHED_OTHER policy, and this can be done only if
the effective user ID of the caller matches the real or effective
user ID of the target thread (i.e., the thread specified by pid)
whose policy is being changed.
That means that when you request round-robin scheduling (SCHED_RR), the request in all likelihood fails. pthread_attr_setschedpolicy() only records the policy in the attribute object; it is pthread_create() that then fails, typically with EPERM, unless you have granted this capability to the program or are running it as root, which normally holds CAP_SYS_NICE.
You can set the capability using the setcap program:
$ sudo setcap cap_sys_nice=+ep ./a.out
(assuming a.out is your program name).
You'd have figured this out if you did error checking. You should check the return value of all the pthread functions (and generally all the library functions) for failure.
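As a rough sketch of what that error checking could look like for the code in the question: the pthread functions return an error number directly (they do not set errno), so each result can be passed to strerror(). The names create_rr_thread and start_thread are made up for this illustration and are not from the original code.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical thread body, used only for this illustration. */
static void *start_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/* Creates and joins a SCHED_RR thread at the given priority.
 * Returns 0 on success, or the error number of the first call that failed. */
int create_rr_thread(int priority)
{
    pthread_attr_t attr;
    pthread_t thread;
    struct sched_param params = { .sched_priority = priority };
    int err;

    err = pthread_attr_init(&attr);
    if (err != 0) {
        fprintf(stderr, "pthread_attr_init: %s\n", strerror(err));
        return err;
    }

    err = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (err == 0)
        err = pthread_attr_setschedpolicy(&attr, SCHED_RR);
    if (err == 0)
        err = pthread_attr_setschedparam(&attr, &params);
    if (err != 0) {
        fprintf(stderr, "setting attributes: %s\n", strerror(err));
        pthread_attr_destroy(&attr);
        return err;
    }

    /* Without CAP_SYS_NICE, this is where EPERM is typically reported. */
    err = pthread_create(&thread, &attr, start_thread, NULL);
    pthread_attr_destroy(&attr);
    if (err != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(err));
        return err;
    }

    return pthread_join(thread, NULL);
}
```

Calling create_rr_thread(10) from an unprivileged process should then print the reason for the failure instead of silently not starting the thread.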
Since you haven't posted the full code, another possible issue is that you haven't joined the thread you create: the main thread could return from main() before m_thread ever runs, which exits the whole process. So, you might want to join:
pthread_join(m_thread, NULL);
or you could exit main thread without joining if main thread is no longer needed using pthread_exit(NULL); in main().

Related

Why is it a good idea to hold a pthread mutex when signaling or broadcasting a condition?

I read somewhere that we should lock the mutex before calling pthread_cond_signal and unlock the mutex after calling it:
The pthread_cond_signal() routine is
used to signal (or wake up) another
thread which is waiting on the
condition variable. It should be
called after mutex is locked, and must
unlock mutex in order for
pthread_cond_wait() routine to
complete.
My question is: isn't it OK to call pthread_cond_signal or pthread_cond_broadcast methods without locking the mutex?
If you do not lock the mutex in the codepath that changes the condition and signals, you can lose wakeups. Consider this pair of processes:
Process A:
pthread_mutex_lock(&mutex);
while (condition == FALSE)
pthread_cond_wait(&cond, &mutex);
pthread_mutex_unlock(&mutex);
Process B (incorrect):
condition = TRUE;
pthread_cond_signal(&cond);
Then consider this possible interleaving of instructions, where condition starts out as FALSE:
Process A                               Process B

pthread_mutex_lock(&mutex);
while (condition == FALSE)
                                        condition = TRUE;
                                        pthread_cond_signal(&cond);
pthread_cond_wait(&cond, &mutex);
The condition is now TRUE, but Process A is stuck waiting on the condition variable - it missed the wakeup signal. If we alter Process B to lock the mutex:
Process B (correct):
pthread_mutex_lock(&mutex);
condition = TRUE;
pthread_cond_signal(&cond);
pthread_mutex_unlock(&mutex);
...then the above cannot occur; the wakeup will never be missed.
(Note that you can actually move the pthread_cond_signal() itself after the pthread_mutex_unlock(), but this can result in less optimal scheduling of threads, and you've necessarily locked the mutex already in this code path due to changing the condition itself).
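To illustrate that note, here is a minimal sketch in which the predicate is changed under the mutex but the signal is sent after unlocking. The waiter still cannot miss the wakeup, because it tests the predicate while holding the mutex and only releases it atomically inside pthread_cond_wait(). The function name signal_after_unlock and the seen variable are assumptions made for this example.

```c
#include <pthread.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int condition = 0;

/* Waiter: same shape as Process A above. */
static void *waiter(void *arg)
{
    int *seen = arg;

    pthread_mutex_lock(&mutex);
    while (condition == 0)
        pthread_cond_wait(&cond, &mutex);
    *seen = condition;          /* record what the waiter observed */
    pthread_mutex_unlock(&mutex);
    return NULL;
}

/* Returns the value the waiter observed, or -1 if no thread was created. */
int signal_after_unlock(void)
{
    pthread_t tid;
    int seen = 0;

    if (pthread_create(&tid, NULL, waiter, &seen) != 0)
        return -1;

    pthread_mutex_lock(&mutex);
    condition = 1;              /* the predicate changes under the lock */
    pthread_mutex_unlock(&mutex);
    pthread_cond_signal(&cond); /* the signal is sent after unlocking */

    pthread_join(tid, NULL);
    return seen;
}
```

This only shows that the variant is not incorrect; whether it schedules better or worse than signaling with the lock held depends on the implementation.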
According to this manual :
The pthread_cond_broadcast() or
pthread_cond_signal() functions
may be called by a thread whether or not it currently owns the mutex that
threads calling pthread_cond_wait()
or pthread_cond_timedwait() have
associated with the condition variable
during their waits; however, if
predictable scheduling behavior is
required, then that mutex shall be
locked by the thread calling
pthread_cond_broadcast() or
pthread_cond_signal().
The meaning of the predictable scheduling behavior statement was explained by Dave Butenhof (author of Programming with POSIX Threads) on comp.programming.threads and is available here.
caf, in your sample code, Process B modifies condition without locking the mutex first. If Process B simply locked the mutex during that modification, and then still unlocked the mutex before calling pthread_cond_signal, there would be no problem --- am I right about that?
I believe intuitively that caf's position is correct: calling pthread_cond_signal without owning the mutex lock is a Bad Idea. But caf's example is not actually evidence in support of this position; it's simply evidence in support of the much weaker (practically self-evident) position that it is a Bad Idea to modify shared state protected by a mutex unless you have locked that mutex first.
Can anyone provide some sample code in which calling pthread_cond_signal followed by pthread_mutex_unlock yields correct behavior, but calling pthread_mutex_unlock followed by pthread_cond_signal yields incorrect behavior?

Handing off a piece of work to a thread and waiting for it to accept

My application works as follows:
the worker-threads initialize and begin waiting in pthread_cond_wait()
the main thread connects to DB and starts handing over one row at a time to the proper worker
Because of the DB-driver internals, the next row can not be read until the current one is extracted, so the main thread has to wait for the worker to "accept" the row.
I achieve this by calling pthread_cond_wait() inside the main thread -- waiting for a pthread_cond_signal() from the worker. This works cleanly -- on both Linux and FreeBSD -- but usually takes much longer on Linux. Whereas I consistently process the entire 1.6M rows in about 27 seconds on FreeBSD, on Linux it usually takes over 2 minutes. Except sometimes the Linux box shows the same time...
The code is compiled from the same source and the program talks to the same DB-server. If anything, the Linux box is located on the same LAN as the DB, whereas the FreeBSD machine connects via VPN (so it should be a bit slower). But it is the wide inconsistency of the Linux results that bothers me, and I suspect the thread-coordination...
Here is what I have now:
MAIN THREAD                               WORKER
--------------------------------------------------------------------------
get new row
figure out, which worker it belongs to    lock my mutex
lock the worker's mutex                   go into pthread_cond_wait
signal the worker                         extract the row's data
unlock the worker's mutex                 signal the main thread
go into pthread_cond_wait                 unlock the mutex
go on back to getting the next row        go on to process the row's data
Is there a better way? Thanks!
If reading the next row must be serial anyway, why are you delegating this to the worker? As the main thread has to wait anyway, have the main thread do the extraction and have the hand-off occur as soon as the row has been sufficiently extracted that the master can proceed to the next row.
Other than that, you will need to provide code, as your description is incomplete, as would be any question of this nature submitted without code.
It looks like your problem is that you are calling pthread_cond_wait() without the mutex locked in the main thread. This means that there's a race-condition: if the worker thread wakes up, extracts the data and signals the condition before the parent executes pthread_cond_wait(), the wakeup will be lost.
What you should have is some shared state paired with the condition variable, like this:
Main Thread:
get_new_row();
worker = decide_worker();
pthread_mutex_lock(&mutex);
/* Signal worker that data is available */
flag[worker] = 1;
pthread_cond_signal(&cond);
/* Wait for worker to extract it */
while (flag[worker] == 1)
pthread_cond_wait(&cond, &mutex);
pthread_mutex_unlock(&mutex);
Worker Thread:
pthread_mutex_lock(&mutex);
/* Wait for data to be available */
while (flag[worker] == 0)
pthread_cond_wait(&cond, &mutex);
extract_row_data();
/* Signal main thread that extraction is complete */
flag[worker] = 0;
pthread_cond_signal(&cond);
pthread_mutex_unlock(&mutex);
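For reference, here is a self-contained sketch of the same hand-off with a single worker. The struct slot, the done flag used for shutdown, and the integer standing in for a DB row are all assumptions added for this illustration; the real program would extract actual row data and route rows to several workers.

```c
#include <pthread.h>

/* Hypothetical shared slot: one mutex, one condition variable, one flag. */
struct slot {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int full;   /* 1 while a row is waiting to be extracted */
    int done;   /* 1 once the main thread has no more rows */
    int row;    /* stand-in for the DB row */
    long sum;   /* what the worker has processed so far */
};

static void *worker(void *arg)
{
    struct slot *s = arg;

    for (;;) {
        int row;

        pthread_mutex_lock(&s->mutex);
        while (!s->full && !s->done)
            pthread_cond_wait(&s->cond, &s->mutex);
        if (!s->full) {                 /* done, nothing left to extract */
            pthread_mutex_unlock(&s->mutex);
            return NULL;
        }
        row = s->row;                   /* extract the row's data */
        s->full = 0;                    /* accept: main may fetch the next row */
        pthread_cond_signal(&s->cond);
        pthread_mutex_unlock(&s->mutex);

        s->sum += row;                  /* process the row outside the lock */
    }
}

/* Hands rows 1..nrows to the worker; returns the worker's sum, or -1. */
long hand_off_rows(int nrows)
{
    struct slot s = { .full = 0, .done = 0, .row = 0, .sum = 0 };
    pthread_t tid;

    pthread_mutex_init(&s.mutex, NULL);
    pthread_cond_init(&s.cond, NULL);
    if (pthread_create(&tid, NULL, worker, &s) != 0)
        return -1;

    for (int i = 1; i <= nrows; i++) {
        pthread_mutex_lock(&s.mutex);
        s.row = i;                      /* "get new row" */
        s.full = 1;
        pthread_cond_signal(&s.cond);
        while (s.full)                  /* wait for the worker to accept it */
            pthread_cond_wait(&s.cond, &s.mutex);
        pthread_mutex_unlock(&s.mutex);
    }

    pthread_mutex_lock(&s.mutex);
    s.done = 1;                         /* tell the worker to finish */
    pthread_cond_signal(&s.cond);
    pthread_mutex_unlock(&s.mutex);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.mutex);
    return s.sum;
}
```

Because both sides re-test their flag in a while loop with the mutex held, neither a signal sent "too early" nor a spurious wakeup can leave a thread stuck.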

pthread_create and EAGAIN

I got an EAGAIN when trying to spawn a thread using pthread_create. However, from what I've checked, the threads seem to have been terminated properly.
What determines the OS to give EAGAIN when trying to create a thread using pthread_create? Would it be possible that unclosed sockets/file handles play a part in causing this EAGAIN (i.e they share the same resource space)?
And lastly, is there any tool to check resource usage, or any functions that can be used to see how many pthread objects are active at the time?
Okay, found the answer. Even if pthread_exit or pthread_cancel is called, another thread in the process (typically the one that created it) still needs to call pthread_join to release the thread ID, which will then become recyclable.
Putting a pthread_join(tid, NULL) in the end did the trick.
edit (was not waitpid, but rather pthread_join)
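A minimal sketch of that fix, assuming a trivial thread body: each thread is joined before the next one is created, so the resources of a finished thread are released instead of piling up. The names create_and_join and noop are invented for this example.

```c
#include <pthread.h>

/* Hypothetical thread body that finishes immediately. */
static void *noop(void *arg)
{
    return arg;
}

/* Creates `count` threads one after another, joining each before creating
 * the next. Returns how many were both created and joined. */
int create_and_join(int count)
{
    int ok = 0;

    for (int i = 0; i < count; i++) {
        pthread_t tid;

        if (pthread_create(&tid, NULL, noop, NULL) != 0)
            break;                      /* e.g. EAGAIN */
        if (pthread_join(tid, NULL) != 0)
            break;
        ok++;                           /* thread resources reclaimed */
    }
    return ok;
}
```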
As a practical matter EAGAIN is almost always related to running out of memory for the process. Often this has to do with the stack size allocated for the thread which you can adjust with pthread_attr_setstacksize(). But there are process limits to how many threads you can run. You can query the hard and soft limits with getrlimit() using RLIMIT_NPROC as the first parameter.
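A short sketch of both calls mentioned here, under the assumption of a Linux or BSD system where RLIMIT_NPROC is available. The helper names and the 256 KiB stack size used in the usage note are illustrative only.

```c
#include <pthread.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical thread body that finishes immediately. */
static void *noop(void *arg)
{
    return arg;
}

/* Prints the soft and hard RLIMIT_NPROC limits. Returns 0 on success. */
int print_nproc_limits(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    printf("RLIMIT_NPROC soft=%llu hard=%llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);
    return 0;
}

/* Creates and joins one thread with the requested stack size.
 * Returns 0 on success, or the error number of the call that failed. */
int create_with_stack(size_t stack_bytes)
{
    pthread_attr_t attr;
    pthread_t tid;
    int err;

    err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    err = pthread_attr_setstacksize(&attr, stack_bytes);
    if (err == 0)
        err = pthread_create(&tid, &attr, noop, NULL);
    pthread_attr_destroy(&attr);
    if (err != 0)
        return err;

    return pthread_join(tid, NULL);
}
```

For example, create_with_stack(256 * 1024) asks for a much smaller stack than the usual default, which lowers the memory each thread reserves.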
There are quite a few questions here dedicated to keeping track of threads, their number, whether they are dead or alive, etc. Simply put, the easiest way to keep track of them is to do it yourself through some mechanism you code, which can be as simple as incrementing and decrementing a global counter (protected by a mutex) or something more elaborate.
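The simple counter variant could look roughly like this. All names here (live_threads, counted_body, run_counted_threads, MAX_THREADS) are hypothetical; the point is only that every read and write of the counter happens with the mutex held.

```c
#include <pthread.h>

#define MAX_THREADS 64

static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static int live_threads = 0;    /* threads currently inside counted_body */

static void *counted_body(void *arg)
{
    (void)arg;

    pthread_mutex_lock(&count_lock);
    live_threads++;
    pthread_mutex_unlock(&count_lock);

    /* ... the thread's real work would go here ... */

    pthread_mutex_lock(&count_lock);
    live_threads--;
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

/* Returns the number of threads currently running counted_body. */
int live_thread_count(void)
{
    int n;

    pthread_mutex_lock(&count_lock);
    n = live_threads;
    pthread_mutex_unlock(&count_lock);
    return n;
}

/* Starts up to n threads (capped at MAX_THREADS), joins them all,
 * and returns how many were started. */
int run_counted_threads(int n)
{
    pthread_t tids[MAX_THREADS];
    int started = 0;

    if (n > MAX_THREADS)
        n = MAX_THREADS;
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[started], NULL, counted_body, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started;
}
```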
Open sockets or other file descriptors shouldn't cause pthread_create() to fail. If you had reached the maximum number of descriptors, you would already have failed before creating the new thread; and for the new thread to open more of them, it would already have had to be created successfully, so it could not have failed with EAGAIN.
In my observation, if the parent thread calls pthread_join() while the child threads are trying to terminate by calling pthread_exit() or pthread_cancel(), the system is not able to release the thread properly. In that case, calling pthread_detach() immediately after a successful pthread_create() solved the problem. A snippet:
err = pthread_create(&(receiveThread), NULL, &receiver, temp);
if (err != 0)
{
MyPrintf("\nCan't create thread Reason : %s\n ",(err==EAGAIN)?"EAGAIN":(err==EINVAL)?"EINVAL":(err==EPERM)?"EPERM":"UNKNOWN");
free(temp);
}
else
{
threadnumber++;
MyPrintf("Count: %d Thread ID: %lu\n",threadnumber,(unsigned long)receiveThread);
pthread_detach(receiveThread);
}
Another potential cause: I was getting this problem (EAGAIN on pthread_create) because I had forgotten to call pthread_attr_init on the pthread_attr_t I was trying to initialize my thread with.

Waiting for a pthread_cancel to conclude

I'm using pthreads that don't allocate any local variables. For reasons I won't go into here, I need a pthread_cancel() option, and the threads I'm writing should be able to support it (no resources to clean up, OK to stop execution at any point). At the moment, I have a problem because pthread_cancel returns before the pthread is actually finished running, causing problems for shared resources I want to touch only after thread cancellation.
Is there any way I can know when my pthread has well and truly concluded? Is there perhaps a function for this I haven't found, or a parameter I'm not familiar with?
Would
pthread_cancel(thread_handle);
pthread_join(thread_handle, NULL);
do the trick, or is that not guaranteed (since thread_handle may already be invalid)?
I'm pretty new to pthreads, so best practices welcome (beyond "don't use pthread_cancel()," which I've already learned :P ).
The example in the pthread_cancel manual page on kernel.org does exactly this, so it's safe.
s = pthread_cancel(thr);
if (s != 0)
handle_error_en(s, "pthread_cancel");
/* Join with thread to see what its exit status was */
s = pthread_join(thr, &res);
if (s != 0)
handle_error_en(s, "pthread_join");
Until you call pthread_join on a joinable thread, its tid remains valid. If the thread is joinable (which it must be for pthread_cancel to be safe), then the thread_handle must still be valid.
If the thread was detached, it wouldn't even be safe to call pthread_cancel. What if the thread terminated just as you called it?
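Putting the two calls together, a small sketch might look like the following. The thread body spin and the helper cancel_and_wait are assumptions for this example; the body simply loops on sleep(), which is a cancellation point. After pthread_join() returns, the thread has fully terminated, and the result PTHREAD_CANCELED confirms that it ended because of the cancellation request.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical worker: loops forever; sleep() is a cancellation point. */
static void *spin(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);
}

/* Returns 1 if the thread was cancelled and has fully terminated,
 * 0 if it ended some other way, -1 on error. */
int cancel_and_wait(void)
{
    pthread_t tid;
    void *res;

    if (pthread_create(&tid, NULL, spin, NULL) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    /* Blocks until the thread is really gone; tid stays valid until here. */
    if (pthread_join(tid, &res) != 0)
        return -1;
    return res == PTHREAD_CANCELED;
}
```

Only after cancel_and_wait() returns is it safe to touch resources the thread might still have been using.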

pthreads : pthread_cond_signal() from within critical section

I have the following piece of code in thread A, which blocks using pthread_cond_wait()
pthread_mutex_lock(&my_lock);
if ( false == testCondition )
pthread_cond_wait(&my_wait,&my_lock);
pthread_mutex_unlock(&my_lock);
I have the following piece of code in thread B, which signals thread A
pthread_mutex_lock(&my_lock);
testCondition = true;
pthread_cond_signal(&my_wait);
pthread_mutex_unlock(&my_lock);
Provided there are no other threads, would it make any difference if pthread_cond_signal(&my_wait) is moved out of the critical section block as shown below ?
pthread_mutex_lock(&my_lock);
testCondition = true;
pthread_mutex_unlock(&my_lock);
pthread_cond_signal(&my_wait);
My recommendation is typically to keep the pthread_cond_signal() call inside the locked region, but probably not for the reasons you think.
In most cases, it doesn't really matter whether you call pthread_cond_signal() with the lock held or not. Ben is right that some schedulers may force a context switch when the lock is released if there is another thread waiting, so your thread may get switched away before it can call pthread_cond_signal(). On the other hand, some schedulers will run the waiting thread as soon as you call pthread_cond_signal(), so if you call it with the lock held, the waiting thread will wake up and then go right back to sleep (because it's now blocked on the mutex) until the signaling thread unlocks it. The exact behavior is highly implementation-specific and may change between operating system versions, so it isn't anything you can rely on.
But, all of this looks past what should be your primary concern, which is the readability and correctness of your code. You're not likely to see any real-world performance benefit from this kind of micro-optimization (remember the first rule of optimization: profile first, optimize second). However, it's easier to think about the control flow if you know that the set of waiting threads can't change between the point where you set the condition and send the signal. Otherwise, you have to think about things like "what if thread A sets testCondition=TRUE and releases the lock, and then thread B runs and sees that testCondition is true, so it skips the pthread_cond_wait() and goes on to reset testCondition to FALSE, and then finally thread A runs and calls pthread_cond_signal(), which wakes up thread C because thread B wasn't actually waiting, but testCondition isn't true anymore". This is confusing and can lead to hard-to-diagnose race conditions in your code. For that reason, I think it's better to signal with the lock held; that way, you know that setting the condition and sending the signal are atomic with respect to each other.
On a related note, the way you are calling pthread_cond_wait() is incorrect. It's possible (although rare) for pthread_cond_wait() to return without the condition variable actually being signaled, and there are other cases (for example, the race I described above) where a signal could end up awakening a thread even though the condition isn't true. In order to be safe, you need to put the pthread_cond_wait() call inside a while() loop that tests the condition, so that you call back into pthread_cond_wait() if the condition isn't satisfied after you reacquire the lock. In your example it would look like this:
pthread_mutex_lock(&my_lock);
while ( false == testCondition ) {
pthread_cond_wait(&my_wait,&my_lock);
}
pthread_mutex_unlock(&my_lock);
(I also corrected what was probably a typo in your original example, which is the use of my_mutex for the pthread_cond_wait() call instead of my_lock.)
The thread waiting on the condition variable should keep the mutex locked, and the other thread should always signal with the mutex locked. This way, you know the other thread is waiting on the condition when you send the signal. Otherwise, it's possible the waiting thread won't see the condition being signaled and will block indefinitely waiting on it.
Condition variables are typically used like this:
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int go = 0;
void *threadproc(void *data) {
printf("Sending go signal\n");
pthread_mutex_lock(&lock);
go = 1;
pthread_cond_signal(&cond);
pthread_mutex_unlock(&lock);
return NULL;
}
int main(int argc, char *argv[]) {
pthread_t thread;
pthread_mutex_lock(&lock);
printf("Waiting for signal to go\n");
pthread_create(&thread, NULL, &threadproc, NULL);
while(!go) {
pthread_cond_wait(&cond, &lock);
}
printf("We're allowed to go now!\n");
pthread_mutex_unlock(&lock);
pthread_join(thread, NULL);
return 0;
}
Signaling without holding the mutex is also valid:
void *threadproc(void *data) {
printf("Sending go signal\n");
go = 1;
pthread_cond_signal(&cond);
return NULL;
}
However, consider what's happening in main
while(!go) {
/* Suppose a long delay happens here, during which the signal is sent */
pthread_cond_wait(&cond, &lock);
}
If the delay described by that comment happens, pthread_cond_wait will be left waiting—possibly forever. This is why you want to signal with the mutex locked.
Both are correct. However, for responsiveness, note that most schedulers hand control to another thread when a lock is released. If you don't signal before unlocking, your waiting thread A is not in the ready list and thus will not be scheduled until B is scheduled again and calls pthread_cond_signal().
The Open Group Base Specifications Issue 7 IEEE Std 1003.1, 2013 Edition (which as far as I can tell is the official pthread specification) says this on the matter:
The pthread_cond_broadcast() or pthread_cond_signal() functions may be
called by a thread whether or not it currently owns the mutex that
threads calling pthread_cond_wait() or pthread_cond_timedwait() have
associated with the condition variable during their waits; however, if
predictable scheduling behavior is required, then that mutex shall be
locked by the thread calling pthread_cond_broadcast() or
pthread_cond_signal().
To add my personal experience, I was working on an application that had code where the condition variable was destroyed (and the memory containing it freed) by the thread that was woken up. We found that on a multi-core device (an iPad Air 2) the pthread_cond_signal() could actually crash sometimes if it was outside the mutex lock, as the waiter woke up and destroyed the condition variable before the pthread_cond_signal() had completed. This was quite unexpected.
So I would definitely veer towards the 'signal inside the lock' version, it appears to be safer.
Here is a nice write-up about condition variables: Techniques for Improving the Scalability of Applications Using POSIX Thread Condition Variables (look under the 'Avoiding the Mutex Contention' section, point 7).
It says that the second version may have some performance benefits, because it makes it possible for the thread calling pthread_cond_wait to wait less frequently.

Resources