strange behavior of pthread_cond_destroy() hanging - pthreads

I know that pthread_cancel() is tricky. I am asking this question to understand a bug in a piece of software that uses pthread_cancel().
I simplified the problem to the following code:
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
static pthread_mutex_t notify_mutex;
static pthread_cond_t notify;
static void *_watcher_thread(void *arg)
{
(void) pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
(void) pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
printf("watcher: thread started\n");
while (1) {
if (pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL) != 0) {
perror("failed to disable watcher thread cancel: ");
}
pthread_mutex_lock(&notify_mutex);
pthread_cond_wait(&notify, &notify_mutex);
pthread_mutex_unlock(&notify_mutex);
(void) pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
}
return NULL;
}
static void *_timer_thread(void *args)
{
(void) pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
(void) pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
printf("timer: thread started\n");
while (1) {
if (pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL) != 0) {
perror("failed to disable timer thread cancel: ");
}
pthread_mutex_lock(&notify_mutex); /* XXX: not a cancellation point */
pthread_cond_signal(&notify);
pthread_mutex_unlock(&notify_mutex);
(void) pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
}
return NULL;
}
int main(void)
{
pthread_t watcher_tid, timer_tid;
pthread_attr_t attr;
long i = 0;
while (1) {
pthread_cond_init(&notify, NULL);
pthread_mutex_init(&notify_mutex, NULL);
pthread_attr_init(&attr);
if (pthread_create(&watcher_tid, &attr,
&_watcher_thread, NULL)) {
perror("failed to create watcher thread: ");
}
if (pthread_create(&timer_tid, &attr,
&_timer_thread, NULL)) {
perror("failed to create timer thread: ");
}
sleep(1);
printf("main: to cancel watcher thread\n");
pthread_cancel(watcher_tid);
pthread_join(watcher_tid, NULL);
printf("main: watcher thread canceled\n");
printf("main: to cancel timer thread\n");
pthread_cancel(timer_tid);
pthread_join(timer_tid, NULL);
printf("main: timer thread canceled\n");
pthread_cond_destroy(&notify);
pthread_mutex_destroy(&notify_mutex);
pthread_attr_destroy(&attr);
i ++;
printf("iteration: %ld\n", i);
}
return 0;
}
Basically there are three threads: watcher, timer, and main. The timer thread periodically wakes up the watcher thread to do some work, and finally the main thread terminates the other threads and exits. I added loops to the test program above to reproduce the problem.
When compiled and run on Linux (Debian testing, 4.9.0-3-amd64 #1 SMP, glibc-2.24), the program hangs after some iterations:
...
main: to cancel timer thread
main: timer thread canceled
iteration: 4
timer: thread started
watcher: thread started
main: to cancel watcher thread
main: watcher thread canceled
main: to cancel timer thread
main: timer thread canceled
iteration: 5
timer: thread started
watcher: thread started
main: to cancel watcher thread
main: watcher thread canceled
main: to cancel timer thread
main: timer thread canceled
gdb shows the stack trace of the hanging program:
(gdb) attach 29247
Attaching to process 29247
Reading symbols from /home/hjcao/temp/test/pthread/hang1...done.
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...(no debugging symbols found)...done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...(no debugging symbols found)...done.
0x00007f796070bf2b in __lll_lock_wait_private () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x00007f796070bf2b in __lll_lock_wait_private () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f7960708eb5 in pthread_cond_destroy@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x0000561b1f194f01 in main () at hang1.c:78
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7f7960b12700 (LWP 29247) "hang1" 0x00007f796070bf2b in __lll_lock_wait_private () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb)
=======================================================
My question is: I do not understand why the main thread hangs in pthread_cond_destroy().
The original program (named hang0) does not have the pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL) and pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL) calls in the while loops of the watcher/timer threads. It also hangs in the main thread, which is understandable: asynchronously canceling the watcher/timer thread may cancel it in the middle of pthread_cond_wait()/pthread_cond_signal() and leave the internal state of the condition variable notify corrupted. I added the pthread_setcancelstate() calls to prevent the watcher/timer thread from being canceled while it manipulates the condition variable, but the new program (named hang1) still hangs.
Could somebody please help me explain this?

I think this thread may help:
pthread conditions and process termination
(the answer by Gusev Petr helped me fix my issue)
I had the same issue of a condition variable hanging in pthread_cond_destroy().
It happens mostly because the condition variable has no way to find out whether a thread that was waiting on it is still running or is dead (normally because of pthread_cancel()). One possible workaround is to forcibly reset the value in the variable to 0, as explained in the linked answer.

Related

C Application with Pthread crashes

I have a problem with the pthread library in a C application for Linux.
In my application a thread is started over and over again.
But I always wait until the thread has finished before starting it again.
At some point the thread no longer starts and I get an out-of-memory error.
The solution I found is to do a pthread_join after the thread has finished.
Can anyone tell me why the Thread doesn't end correctly?
Here is example code that causes the same problem.
If pthread_join isn't called, the process stops after about 380 calls of the thread:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <pthread.h>
#include <unistd.h>
#include <time.h> /* for time() */
volatile uint8_t check_p1 = 0;
uint32_t stack_start;
void *thread1(void *ch)
{
static int counter = 0;
int i;
int s[100000];
char stack_end;
srand(time(NULL) + counter);
for (i = 0; i < (sizeof (s)/sizeof(int)); i++) //do something
{
s[i] = rand();
}
counter++;
printf("Thread %i finished. Stacksize: %u\n", counter, ((uint32_t) (stack_start)-(uint32_t) (&stack_end)));
check_p1 = 1; // Mark Thread as finished
return 0;
}
int main(int argc, char *argv[])
{
pthread_t p1;
int counter = 0;
stack_start = (uint32_t)&counter; // save the Address of counter
while (1)
{
counter++;
check_p1 = 0;
printf("Start Thread %i\n", counter);
pthread_create(&p1, NULL, thread1, 0);
while (!check_p1) // wait until thread has finished
{
usleep(100);
}
usleep(1000); // wait a little bit to be really sure that the thread is finished
//pthread_join(p1,0); // crash without pthread_join
}
return 0;
}
"The solution I found is to do a pthread_join after the thread has finished."
That is the correct solution. You must do that, or you leak thread resources.
"Can anyone tell me why the Thread doesn't end correctly?"
It does end correctly, but you must join it in order for the thread library to know: "yes, he is really done with this thread; no need to hold resources any longer".
This is exactly the same reason you must use wait (or waitpid, etc.) in this loop:
while (1) {
int status;
pid_t p = fork();
if (p == 0) exit(0); // child
// parent
wait(&status); // without this wait, you will run out of OS resources.
}

What's the difference between pthread_join and pthread_mutex_lock?

The following code is taken from this site and shows how to use mutexes. It uses both pthread_join and pthread_mutex_lock:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
void *functionC(void *arg);
pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
int counter = 0;
int main(void)
{
    int rc1, rc2;
    pthread_t thread1, thread2;
    /* Create independent threads each of which will execute functionC */
    if( (rc1=pthread_create( &thread1, NULL, &functionC, NULL)) )
    {
        printf("Thread creation failed: %d\n", rc1);
    }
    if( (rc2=pthread_create( &thread2, NULL, &functionC, NULL)) )
    {
        printf("Thread creation failed: %d\n", rc2);
    }
    /* Wait till threads are complete before main continues. Unless we */
    /* wait we run the risk of executing an exit which will terminate */
    /* the process and all threads before the threads have completed. */
    pthread_join( thread1, NULL);
    pthread_join( thread2, NULL);
    exit(EXIT_SUCCESS);
}
void *functionC(void *arg)
{
    (void) arg; /* unused */
    pthread_mutex_lock( &mutex1 );
    counter++;
    printf("Counter value: %d\n",counter);
    pthread_mutex_unlock( &mutex1 );
    return NULL; /* a thread start routine must return a void pointer */
}
I ran the code as given above as it is and it produced following result:
Counter value: 1
Counter value: 2
But in a second run I removed "pthread_mutex_lock( &mutex1 );" and "pthread_mutex_unlock( &mutex1 );". I compiled and ran the code, and it produced the same result again.
What confuses me is why the mutex lock is used in the code above when the same thing can be done without it (using pthread_join). If pthread_join prevents another thread from running until the first one has finished, then I think it would already prevent the other thread from accessing the counter value. What is the purpose of pthread_mutex_lock?
The join prevents the starting thread from running (and thus terminating the process) until thread1 and thread2 finish. It doesn't provide any synchronization between thread1 and thread2. The mutex prevents thread1 from reading the counter while thread2 is modifying it, or vice versa.
Without the mutex, the most obvious thing that could go wrong is that thread1 and thread2 run in perfect synch. They each read zero from the counter, each add one to it, and each output "Counter value: 1".

pthread_kill ends calling program

I am working on Ubuntu 12.04.2 LTS. I have a strange problem with pthread_kill(). The following program ends after writing only "Create thread 0!" to standard output. The program ends with exit status 138.
If I uncomment "usleep(1000);" everything executes properly. Why would this happen?
#include <nslib.h>
void *testthread(void *arg);
int main() {
pthread_t tid[10];
int i;
for(i = 0; i < 10; ++i) {
printf("Create thread %d!\n", i);
Pthread_create(&tid[i], testthread, NULL);
//usleep(1000);
Pthread_kill(tid[i], SIGUSR1);
printf("Joining thread %d!\n", i);
Pthread_join(tid[i]);
printf("Joined %d!", i);
}
return 0;
}
void sighandlertest(int sig) {
printf("print\n");
pthread_exit(NULL);
//return NULL;
}
void* testthread(void *arg) {
struct sigaction saction;
memset(&saction, 0, sizeof(struct sigaction));
saction.sa_handler = &sighandlertest;
if(sigaction(SIGUSR1, &saction, NULL) != 0 ) {
fprintf(stderr, "Sigaction failed!\n");
}
printf("Starting while...\n");
while(true) {
}
return 0;
}
If the main thread does not sleep a bit before raising SIGUSR1, the signal handler in the newly created thread has most probably not been set up yet, so the default action for the signal applies, which is to terminate the process.
Using sleep() calls to synchronise threads is not recommended, as it is not guaranteed to be reliable. Use another mechanism here; a condition variable/mutex pair would be suitable.
Declare a global state variable int signalhandlersetup = 0 and protect access to it with a mutex. Create the thread and make the main thread wait using pthread_cond_wait(). Let the created thread set up the signal handler for SIGUSR1, set signalhandlersetup = 1, and then signal the condition the main thread is waiting on using pthread_cond_signal(). Finally, let the main thread call pthread_kill() as in your posting.

pthread_cleanup_pop with argument 0

According to the man page for pthread_cleanup_pop, the cleanup handler is called if the argument to this function is non-zero; otherwise the handler installed by the matching pthread_cleanup_push is simply removed.
I am using Ubuntu (kernel 3.2.0-32-generic-pae).
However, I am seeing the cleanup handler being called even though the argument is 0.
The thread routine:
void *func_a (void *arg)
{
pthread_t thr_e;
void *status;
pthread_t tid = pthread_self();
printf("[%2d] D: In thread D [%s]\n", my_time(), (char *)thread_name(tid));
pthread_cleanup_push(cleanup, NULL);
pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
//sleep(1);
pthread_create(&thr_e, &attr, func_e, NULL);
printf("[%2d] D: Created thread E [%s]\n", my_time(), (char *)thread_name(thr_e));
sleep(20);
printf("[%2d] D: Thread exiting...\n", my_time());
pthread_cleanup_pop(0);
return (void *)55;
}
Cleanup routine:
void
cleanup (void *arg)
{
printf("[%2d] Calling cleanup...\n", my_time());
}
Main thread routine:
int main()
{
......
printf("[%2d] Main: Created thread C [%s]\n", my_time(),
(char *)thread_name(thr_c));
//sleep(20);
printf("[%2d] Main: Cancelling Thread D\n", my_time());
error1 = pthread_cancel(thr_d);
//sleep(1);
printf("[%2d] Main: Calcel status %d, %s, %d\n",
my_time(), error1, (char *)strerror(errno), (int)thr_d);
printf("[%2d] Main; Exiting...\n", my_time());
}
Output is as below:
[ 0] Main: Calcel status 0, Success, -1218630848
[ 0] Main; Exiting...
[ 0] Calling cleanup...
So why is cleanup() being called here?
Please let me know what is happening.
Since "D: Thread exiting" is not printed, one can assume that the main thread cancels D between the push and the pop (during sleep in the example code). This causes the cleanup, which hasn't yet been popped, to execute.
As per the man page:
When a thread is canceled, all of the stacked clean-up handlers are popped and executed in the reverse of the order in which they were pushed onto the stack.

main() thread versus one created by pthread_create()

I've been creating programs exemplifying concurrency bugs using POSIX threads.
My overall question is: what is the difference between the main() thread and one created by pthread_create()? My original understanding was that they are pretty much the same, but I'm getting different results from the two programs below.
To expand before showing the code I've written, what I am wondering is whether there is a difference between the following:
int main() {
...
pthread_create(&t1, NULL, worker, NULL);
pthread_create(&t2, NULL, worker, NULL);
...
}
and
int main() {
...
pthread_create(&t1, NULL, worker, NULL);
worker(NULL);
...
}
To expand using a full example program: I've made two versions of the same program. They both use the same function worker():
void *worker(void *arg) {
printf("Entered worker function\n");
int myid;
int data = 999;
pthread_mutex_lock(&gidLock);
myid = gid;
gid++;
printf("myid == %d\n", myid);
pthread_mutex_unlock(&gidLock);
if (myid == 0) {
printf("Sleeping since myid == 0\n");
sleep(1);
result = data;
printf("Result updated\n");
}
return NULL;
}
gid and result are globals initialized to 0.
What's the difference between the following main() functions?
int main_1() {
pthread_t t1, t2;
int tmp;
/* initialize globals */
gid = 0;
result = 0;
pthread_create(&t1, NULL, worker, NULL);
pthread_create(&t2, NULL, worker, NULL);
pthread_join(t2, NULL);
printf("Parent thread exited worker function\n");
tmp = result;
printf("%d\n", tmp);
pthread_exit((void *) 0);
}
and
int main_2() {
pthread_t t1;
int tmp;
/* initialize globals */
gid = 0;
result = 0;
pthread_create(&t1, NULL, worker, NULL);
worker(NULL);
printf("Parent thread exited worker function\n");
tmp = result;
printf("%d\n", tmp);
pthread_exit((void *) 0);
}
Sample output for main_1()
Entered worker function
myid == 0
Sleeping since myid == 0
Entered worker function
myid == 1
Parent thread exited worker function
0
Result updated
Sample output for main_2()
Entered worker function
myid == 0
Sleeping since myid == 0
Entered worker function
myid == 1
/* program waits here */
Result updated
Parent thread exited worker function
999
Edit: The program intentionally has a concurrency bug (an atomicity violation). Delays were added by calling sleep() to try to force the buggy interleaving to occur. The program is intended for testing software that automatically detects concurrency bugs.
I would think that main_1() and main_2() are essentially the same program and should result in the same interleaving when run on the same system (or largely the same interleaving; it is indeterminate, but running the same program on the same system tends to explore only a small section of the potential schedules and rarely deviates [1]).
The "desired" output is that from main_1().
I'm not sure why the thread with myid == 1 stalls and does not return in main_2(). If I had to guess
Thanks for reading this far, and if anyone needs more information I'd be happy to oblige. Here are links to the full source code.
main_1(): https://gist.github.com/2942372
main_2(): https://gist.github.com/2942375
I've been compiling with gcc -pthread -lpthread
Thanks again.
[1] S. Park, S. Lu, Y. Zhou. "CTrigger: Exposing Atomicity Violation Bugs from Their Hiding Places"
