My mutex seems to be unlocked.
My code looks like this (not actual code) (using pthread):
thread
{
int id=...;
//locked aditional mutex _m2
mutex_lock(&_m);
varx=valuex;//irelevant
print("th%d signaling listener",id);
cond_signal(&_c);
print("th%d signaled listener",id);
mutex_unlock(&_m);
//unlocked additional mutex _m2
}
listener
{
tc=0
mutex_lock(&_m);
while(tc<threadcount)
{
cond_wait(&_c,&_m);
print("working");
tc++
work;
}
mutex_unlock(&_m);
}
Normal (predicted )out put:
th0 signaling listener;
working;
th0 signaled listener;
th1 signaling listener;
working;
th1 signaled listener;
My output:
0 signaling listener;
working;
0 signaled listener;
1 signaling listener;
1 signaled listener;
..so the thread skipped (listener does not execute nor locks _m) to printing output
I've profiled it with helgrind (full) and i have no errors but my app stops at listener because according to him he is waiting for all to finish.
Notes:
listener is joinable.
Additional mutex _m2 does not help.
Thread is detached. I have about 800 detached threads to avoid stack problems, max 50 simultaneous using semaphore to limit thread count.
Code worked for 3-4 threads
pthread_cond_signal() does not unlock any mutex. It's not supposed to (no mutex is passed to it). If at least one thread is waiting on the condition variable signalled, that thread will be scheduled when it can re-acquire the mutex that it passed to pthread_cond_wait().
In your case your listener appears to be waiting on a different condition (_s) than the other thread signals (_c).
If you fix that problem, you also have the problem that you don't seem to have any shared state between the thread that waits on the condition variable and the thread(s) that signal it. It looks like your tc counter should actually be a shared variable, protected by the _m mutex. Your threads would then do:
pthread_mutex_lock(&_m);
tc++;
if (tc >= threadcount)
pthread_cond_signal(&_c);
pthread_mutex_unlock(&_m);
and the listener would do:
pthread_mutex_lock(&_m);
while (tc < threadcount)
pthread_mutex_wait(&_c, &_m);
pthread_mutex_unlock(&_m);
The listener would then only continue once all of the threads have hit the signalling code, which seems to be what you've after.
Alternately you could use pthread_barrier_wait(), which appears to be what you're implementing.
Related
Could anyone please suggest how to optimize the application code below using pthread_mutex_lock?
Let me describe the situation:
I have 2 threads sharing a global shared memory variable. The variable shmPtr->status is protected with mutex lock in both functions. Although there is a sleep(1/2) inside the "for loop" in the task1 function, I can't access the shmPtr->status in task2 when required and have to wait until the "for loop" is finished in the task1 function. It takes around 50 seconds for the shmPtr->status to be available for the task2 function.
I am wondering why the task1 function is not releasing the mutex lock despite the sleep(1/2) line. I don't want to wait with processing the shmPtr->status in the task2 function. Please advice.
thr_id1 = pthread_create ( &p_thread1, NULL, (void *)execution_task1, NULL );
thr_id2 = pthread_create ( &p_thread2, NULL, (void *)execution_task2, NULL );
void execution_task1()
{
for(int i = 0;i < 100;i++)
{
//100 lines of application code running here
pthread_mutex_lock(&lock);
shmPtr->status = 1; //shared memory variable
pthread_mutex_unlock(&lock);
sleep(1/2);
}
}
void execution_task2()
{
//100 lines of application code running here
pthread_mutex_lock(&lock);
shmPtr->status = 0; //shared memory variable
pthread_mutex_unlock(&lock);
sleep(1/2);
}
Regards,
NK
I am wondering why the task1 function is not releasing the mutex lock even with an sleep(1/2).
There is no reason to think that the thread running execution_task1() in your example fails to release the mutex, though you would know for sure if you appropriately tested the return value of its pthread_mutex_unlock() call. Rather, the potential problem is that it reacquires the mutex before any other thread contending for it has an opportunity to acquire it.
It seems plausible that a call to sleep(1/2) is ineffective at preventing that. 1/2 is an integer division, evaluating to 0, so you are performing a sleep(0). That probably does not cause the calling thread to suspend at all, and it may not even cause the thread to yield the CPU.
More generally, sleep() is never a good solution for any thread-synchronization problem.
If you are running on a multi-core system, however, and maybe even if you aren't, then freezing out other threads by such a mechanism seems unlikely if the function really does execute a hundred lines of code between releasing the mutex and trying to lock it again. If that's what you think you see then look harder.
If you really do need to force a thread to allow others a chance to acquire the mutex, then you could perhaps set up a fair queueing system as described in Implementing a FIFO mutex in pthreads. For a case such as you dsecribe, however, with one long-running thread needing occasionally to yield to other, quicker tasks, you could consider introducing a condition variable on which that long-running thread can suspend, and an atomic counter by which it can determine whether it should do so:
#include <stdatomic.h>
// ...
pthread_cond_t yield_cv = PTHREAD_COND_INITIALIZER;
_Atomic unsigned int waiting_thread_count = ATOMIC_VAR_INIT(0);
void execution_task1() {
for (int i = 0; i < 100; i++) {
// ...
pthread_mutex_lock(&lock);
if (waiting_thread_count > 0) {
pthread_cond_wait(&yield_cv, &lock);
// spurrious wakeup ok
}
// ... critical section ...
pthread_mutex_unlock(&lock);
}
}
void execution_task2() {
// ...
waiting_thread_count += 1; // Atomic get & increment; safe w/o mutex
pthread_mutex_lock(&lock);
waiting_thread_count -= 1;
pthread_cond_signal(&yield_cv); // no problem if no-one is presently waiting
// ... critical section ...
pthread_mutex_unlock(&lock);
}
Using an atomic counter relieves the program from having to protect that counter with its own mutex, which could just shift the problem instead of solving it. That allows threads to use the counter to signal upcoming attempts to acquire the mutex. This intent is then visible to the other thread, so that it can suspend on the CV to allow the other to acquire the mutex.
The short-running threads then acknowledge acquiring the mutex by decrementing the counter. They must do so before releasing the mutex, else the long-running thread might cycle around, acquire the mutex, and read the counter before it is decremented, thus erroneously blocking on the CV when no additional signal can be expected.
Although CV's can be subject to spurrious wakeup, that does not present a serious problem for this approach. If the long-running thread wakes spurriously from it wait on the CV, then the worst that happens is that it performs one more iteration of its main loop, then waits again.
I am still confused about the NumberOfConcurrentThreads parameter within CreateIoCompletionPort(). I have read and re-read the MSDN dox, but the quote
This value limits the number of runnable threads associated with the
completion port.
still puzzles me.
Question
Let's assume that I specify this value as 4. In this case, does this mean that:
1) a thread can call GetQueuedCompletionStatus() (at which point I can allow a further 3 threads to make this call), then as soon as that call returns (i.e. we have a completion packet) I can then have 4 threads again call this function,
or
2) a thread can call GetQueuedCompletionStatus() (at which point I can allow a further 3 threads to make this call), then as soon as that call returns (i.e. we have a completion packet) I then go on to process that packet. Only when I have finished processing the packet do I then call GetQueuedCompletionStatus(), at which point I can then have 4 threads again call this function.
See my confusion? Its the use of the phrase 'runnable threads'.
I think it might be the latter, because the link above also quotes
If your transaction required a lengthy computation, a larger
concurrency value will allow more threads to run. Each completion
packet may take longer to finish, but more completion packets will be
processed at the same time.
This will ultimately affect how we design servers. Consider a server that receives data from clients, then echoes that data to logging servers. Here is what our thread routine could look like:
DWORD WINAPI ServerWorkerThread(HANDLE hCompletionPort)
{
DWORD BytesTransferred;
CPerHandleData* PerHandleData = nullptr;
CPerOperationData* PerIoData = nullptr;
while (TRUE)
{
if (GetQueuedCompletionStatus(hCompletionPort, &BytesTransferred,
(PULONG_PTR)&PerHandleData, (LPOVERLAPPED*)&PerIoData, INFINITE))
{
// OK, we have 'BytesTransferred' of data in 'PerIoData', process it:
// send the data onto our logging servers, then loop back around
send(...);
}
}
return 0;
}
Now assume I have a four core machine; if I leave NumberOfConcurrentThreads as zero within my call to CreateIoCompletionPort() I will have four threads running ServerWorkerThread(). Fine.
My concern is that the send() call may take a long time due to network traffic. Hence, I could be receiving a load of data from clients that cannot be dequeued because all four threads are taking a long time sending the data on?!
Have I missed the point here?
Update 07.03.2018 (This has now been resolved: see this comment.)
I have 8 threads running on my machine, each one runs the ServerWorkerThread():
DWORD WINAPI ServerWorkerThread(HANDLE hCompletionPort)
{
DWORD BytesTransferred;
CPerHandleData* PerHandleData = nullptr;
CPerOperationData* PerIoData = nullptr;
while (TRUE)
{
if (GetQueuedCompletionStatus(hCompletionPort, &BytesTransferred,
(PULONG_PTR)&PerHandleData, (LPOVERLAPPED*)&PerIoData, INFINITE))
{
switch (PerIoData->Operation)
{
case CPerOperationData::ACCEPT_COMPLETED:
{
// This case is fired when a new connection is made
while (1) {}
}
}
}
I only have one outstanding AcceptEx() call; when that gets filled by a new connection I post another one. I don't wait for data to be received in AcceptEx().
I create my completion port as follows:
CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 4)
Now, because I only allow 4 threads in the completion port, I thought that because I keep the threads busy (i.e. they do not enter a wait state), when I try and make a fifth connection, the completion packet would not be dequeued hence would hang! However this is not the case; I can make 5 or even 6 connections to my server! This shows that I can still dequeue packets even though my maximum allowed number of threads (4) are already running? This is why I am confused!
the completion port - is really KQUEUE object. the NumberOfConcurrentThreads is corresponded to MaximumCount
Maximum number of concurrent threads the queue can satisfy waits for.
from I/O Completion Ports
When the total number of runnable threads associated with the
completion port reaches the concurrency value, the system blocks the
execution of any subsequent threads associated with that completion
port until the number of runnable threads drops below the concurrency
value.
it's bad and not exactly said. when thread call KeRemoveQueue ( GetQueuedCompletionStatus internal call it) system return packet to thread only if Queue->CurrentCount < Queue->MaximumCount even if exist packets in queue. system not blocks any threads of course. from another side look for KiInsertQueue - even if some threads wait on packets - it activated only in case Queue->CurrentCount < Queue->MaximumCount.
also look how and when Queue->CurrentCount is changed. look for KiActivateWaiterQueue (This function is called when the current thread is about to enter a wait state) and KiUnlinkThread. in general - when thread begin wait for any object (or another queue) system call KiActivateWaiterQueue - it decrement CurrentCount and possible (if exist packets in queue and became Queue->CurrentCount < Queue->MaximumCount and threads waited for packets) return packet to wait thread. from another side, when thread stop wait - KiUnlinkThread is called. it increment CurrentCount.
your both variant is wrong. any count of threads can call GetQueuedCompletionStatus(). and system of course not blocks the execution of any subsequent threads. for example - you have queue with MaximumCount = 4. you can queue 10 packets to queue. and call GetQueuedCompletionStatus() from 7 threads in concurrent. but only 4 from it got packets. another will be wait (despite yet 6 packets in queue). if some of threads, which remove packet from queue begin wait - system just unwait and return packet to another thread wait on queue. or if thread (which already previous remove packet from this queue (Thread->Queue == Queue) - so active thread) again call KeRemoveQueue will be Queue->CurrentCount -= 1;
I read somewhere that we should lock the mutex before calling pthread_cond_signal and unlock the mutex after calling it:
The pthread_cond_signal() routine is
used to signal (or wake up) another
thread which is waiting on the
condition variable. It should be
called after mutex is locked, and must
unlock mutex in order for
pthread_cond_wait() routine to
complete.
My question is: isn't it OK to call pthread_cond_signal or pthread_cond_broadcast methods without locking the mutex?
If you do not lock the mutex in the codepath that changes the condition and signals, you can lose wakeups. Consider this pair of processes:
Process A:
pthread_mutex_lock(&mutex);
while (condition == FALSE)
pthread_cond_wait(&cond, &mutex);
pthread_mutex_unlock(&mutex);
Process B (incorrect):
condition = TRUE;
pthread_cond_signal(&cond);
Then consider this possible interleaving of instructions, where condition starts out as FALSE:
Process A Process B
pthread_mutex_lock(&mutex);
while (condition == FALSE)
condition = TRUE;
pthread_cond_signal(&cond);
pthread_cond_wait(&cond, &mutex);
The condition is now TRUE, but Process A is stuck waiting on the condition variable - it missed the wakeup signal. If we alter Process B to lock the mutex:
Process B (correct):
pthread_mutex_lock(&mutex);
condition = TRUE;
pthread_cond_signal(&cond);
pthread_mutex_unlock(&mutex);
...then the above cannot occur; the wakeup will never be missed.
(Note that you can actually move the pthread_cond_signal() itself after the pthread_mutex_unlock(), but this can result in less optimal scheduling of threads, and you've necessarily locked the mutex already in this code path due to changing the condition itself).
According to this manual :
The pthread_cond_broadcast() or
pthread_cond_signal() functions
may be called by a thread whether or not it currently owns the mutex that
threads calling pthread_cond_wait()
or pthread_cond_timedwait() have
associated with the condition variable
during their waits; however, if
predictable scheduling behavior is
required, then that mutex shall be
locked by the thread calling
pthread_cond_broadcast() or
pthread_cond_signal().
The meaning of the predictable scheduling behavior statement was explained by Dave Butenhof (author of Programming with POSIX Threads) on comp.programming.threads and is available here.
caf, in your sample code, Process B modifies condition without locking the mutex first. If Process B simply locked the mutex during that modification, and then still unlocked the mutex before calling pthread_cond_signal, there would be no problem --- am I right about that?
I believe intuitively that caf's position is correct: calling pthread_cond_signal without owning the mutex lock is a Bad Idea. But caf's example is not actually evidence in support of this position; it's simply evidence in support of the much weaker (practically self-evident) position that it is a Bad Idea to modify shared state protected by a mutex unless you have locked that mutex first.
Can anyone provide some sample code in which calling pthread_cond_signal followed by pthread_mutex_unlock yields correct behavior, but calling pthread_mutex_unlock followed by pthread_cond_signal yields incorrect behavior?
For the following code:
f1()
{
pthread_mutex_lock(&mutex); //LINE1 (thread3 and thread4)
pthread_cond_wait(&cond, &mutex); //LINE2 (thread1 and thread2)
pthread_mutex_unlock(&mutex);
}
f2()
{
pthread_mutex_lock(&mutex);
pthread_cond_signal(&cond); //LINE3 (thread5)
pthread_mutex_unlock(&mutex);
}
Assume thread1 and thread2 are waiting at LINE2, thread3 and thread4 are blocked at LINE1. When thread5 executes LINE3, which threads will run first? thread1 or thread2? thread3 or thread4?
When thread5 signals the condition, either thread1 or thread2, or both, will be released from waiting, and will wait until the mutex can be locked... which won't be until after thread5 unlocks it.
When thread5 then unlocks the mutex, one of the threads waiting to lock the mutex will be able to do so. My reading of POSIX reveals only that the order in which threads waiting to lock will proceed is "under defined", though higher priority threads may be expected to run first. How things are scheduled is largely system dependent.
If you need threads to run in a particular order, then you need to arrange that for yourself.
Before pthread wait we lock using a mutex so that some other code might not try to change the condition variable. wait then unlocks the mutex and waits for the signal.
Say, in some other thread i had locked the same mutex and after that, i had used 'signal'. and then unlock thread.
when signal is done, the waiting thread wakes up and aquires the mutex again.
Thread1 Thread2
{ {
lock(mutex); lock(mutex);
wait(mutex); signal(mutex);
unlock(mutex); unlock(mutex);
} }
Say the three thread one statements are enclosed in a while(1) loop. Then assume that thread2 locks the mutex, signals it, and unlocks the mutex. and then doesn't end, but goes to sleep.
So will the value of the condition variable be changed permanently? If three statements of thread one are running in infinite lop, will it never wait and just find that the signal has been given? When wait call returns, does it set the value of the condition variable back to initial value?
If yes, can I use create,destroy or initialize methods on the variables to set the value back? If yes, how? What exactly do these functions do?
Thanks,
pthread_cond_signal() will always wake at least one thread that is currently waiting on that condition variable in pthread_cond_wait(). If the same thread or a different thread then calls pthread_cond_wait() again, it will block and wait for another signal.
This means that pthread condition variables must always be paired with some kind of shared data, protected by the mutex that is held when calling pthread_cond_wait(). Before calling pthread_cond_wait(), the thread must check the shared data to see if the condition it wants to wait for has occurred - if not, it shouldn't wait.
The simplest example of such shared data might be a global flag. In your example:
int flag = 0;
Thread 1 {
pthread_mutex_lock(&mutex);
while (!flag)
pthread_cond_wait(&cond, &mutex);
pthread_mutex_unlock(&mutex);
}
Thread 2 {
pthread_mutex_lock(&mutex);
flag = 1;
pthread_mutex_signal(&cond);
pthread_mutex_unlock(&mutex);
}
You can see here that when the condition is "reset" is entirely under your control - for example you could have Thread 1 set flag = 0; before it calls pthread_mutex_unlock().
The shared state is often more complex than a simple flag - for example you might have a producer thread call pthread_mutex_wait() while there is no room in a shared buffer.