how to manage EINTR errno in sem_timedwait - pthreads

Can you help me to understand why it is recommended to use:
while ((s = sem_timedwait(&sem, &ts)) == -1 && errno == EINTR)
continue; // Restart when interrupted by handler
(EINTR: The call was interrupted by a signal handler)
Instead of simply:
s = sem_timedwait(&sem, &ts);
In witch cases I have to manage EINTR ?

The loop will cause the system call to be restarted if a signal is caught during the execution of the system call, so it will not go on to the next statement unless the system call has either succeeded or failed (with some other error). Otherwise the thread will continue execution with the next statement when the system call is interrupted by a signal handler.
For example, if you want to be able to abort this sem_timedwait() by sending a particular signal to the thread, then you would not want to unconditionally restart the system call. Instead you may want to mark that the operation was aborted and clean up. If you have multiple signal handlers, the signal handler can set a flag which can be checked when EINTR is encountered in order to determine whether to restart the system call.
This only matters if the thread catches any signals using a signal handler, and the sigaction() SA_RESTART flag was not used to automatically restart any interrupted system call. However, if you are not using any signal handlers and did not intend for your code to be affected by a signal handler, it is still good practice to use the loop so that it will continue to work as you intended even if your code is later used in the same program as other code which uses signal handlers for unrelated purposes.

Related

Why is it a good idea to hold a pthread mutex when signaling or broadcasting a condition? [duplicate]

I read somewhere that we should lock the mutex before calling pthread_cond_signal and unlock the mutex after calling it:
The pthread_cond_signal() routine is
used to signal (or wake up) another
thread which is waiting on the
condition variable. It should be
called after mutex is locked, and must
unlock mutex in order for
pthread_cond_wait() routine to
complete.
My question is: isn't it OK to call pthread_cond_signal or pthread_cond_broadcast methods without locking the mutex?
If you do not lock the mutex in the codepath that changes the condition and signals, you can lose wakeups. Consider this pair of processes:
Process A:
pthread_mutex_lock(&mutex);
while (condition == FALSE)
pthread_cond_wait(&cond, &mutex);
pthread_mutex_unlock(&mutex);
Process B (incorrect):
condition = TRUE;
pthread_cond_signal(&cond);
Then consider this possible interleaving of instructions, where condition starts out as FALSE:
Process A Process B
pthread_mutex_lock(&mutex);
while (condition == FALSE)
condition = TRUE;
pthread_cond_signal(&cond);
pthread_cond_wait(&cond, &mutex);
The condition is now TRUE, but Process A is stuck waiting on the condition variable - it missed the wakeup signal. If we alter Process B to lock the mutex:
Process B (correct):
pthread_mutex_lock(&mutex);
condition = TRUE;
pthread_cond_signal(&cond);
pthread_mutex_unlock(&mutex);
...then the above cannot occur; the wakeup will never be missed.
(Note that you can actually move the pthread_cond_signal() itself after the pthread_mutex_unlock(), but this can result in less optimal scheduling of threads, and you've necessarily locked the mutex already in this code path due to changing the condition itself).
According to this manual :
The pthread_cond_broadcast() or
pthread_cond_signal() functions
may be called by a thread whether or not it currently owns the mutex that
threads calling pthread_cond_wait()
or pthread_cond_timedwait() have
associated with the condition variable
during their waits; however, if
predictable scheduling behavior is
required, then that mutex shall be
locked by the thread calling
pthread_cond_broadcast() or
pthread_cond_signal().
The meaning of the predictable scheduling behavior statement was explained by Dave Butenhof (author of Programming with POSIX Threads) on comp.programming.threads and is available here.
caf, in your sample code, Process B modifies condition without locking the mutex first. If Process B simply locked the mutex during that modification, and then still unlocked the mutex before calling pthread_cond_signal, there would be no problem --- am I right about that?
I believe intuitively that caf's position is correct: calling pthread_cond_signal without owning the mutex lock is a Bad Idea. But caf's example is not actually evidence in support of this position; it's simply evidence in support of the much weaker (practically self-evident) position that it is a Bad Idea to modify shared state protected by a mutex unless you have locked that mutex first.
Can anyone provide some sample code in which calling pthread_cond_signal followed by pthread_mutex_unlock yields correct behavior, but calling pthread_mutex_unlock followed by pthread_cond_signal yields incorrect behavior?

ReactiveCocoa 4: How to send error to an observer without interrupting the signal

let (signal, sink) = Signal<[CLBeacon], BeaconManagerError>.pipe()
When I call this because the user disabled the Bluetooth:
sendError(self.sink, error)
the Signal is interrupted and I don't receive more next nor interrupted events after enabling the Bluetooth back again. The Signal is broken.
How can I send error types to the observer without interrupting / breaking the Signal? I can't find in the RAC 4 documentation. Thanks!
By design, an error causes the signal to finish. The documentation says:
failures should only be used to represent “abnormal” termination. If
it is important to let operators (or consumers) finish their work, a
Next event describing the result might be more appropriate.
If you want to turn errors into Next events, you can use flatMapError operator as described here or use retry if you want to allow only several occurances of the error.
If you want different semantics than Next* (Error|Completed) I recommend encoding that in the type. You can use a Signal that can't fail, but which values can be either success or failure, by using Result:
Signal<Result<[CLBeacon], BeaconManagerError>, NoError>
That signal will emit no errors, but its Next events will be Result.Success<[CLBeacon]> or Result.Failure<BeaconManagerError>, **and the signal won't terminate when receiving a Result.Failure.

pthread_cancel when using mutexes an conditional variables

Hello I have an question about cancelling a thread that uses mutexes and conditional variables. The thread has cancel type deferred. When I use only the functions pthread_mutex_lock/unlock and pthread_cond_wait, and a cancel request arrives, the thread's cancelation point is only pthread_cond_wait. Will it lock the mutex or not? I am not sure, if thread always leaves the mutex unlock. Or are the pthread_mutex_lock/unlock functions also cancellation points? Thank you.
I doubt that I can phrase this better than the documentation:
A condition wait (whether timed or not) is a cancellation point. When
the cancelability type of a thread is set to PTHREAD_CANCEL_DEFERRED,
a side effect of acting upon a cancellation request while in a
condition wait is that the mutex is (in effect) re-acquired before
calling the first cancellation cleanup handler. The effect is as if
the thread were unblocked, allowed to execute up to the point of
returning from the call to pthread_cond_timedwait() or
pthread_cond_wait(), but at that point notices the cancellation
request and instead of returning to the caller of
pthread_cond_timedwait() or pthread_cond_wait(), starts the thread
cancellation activities, which includes calling cancellation cleanup
handlers.
Also just be sure you are aware that other functions are cancellation points as well.
If you have multiple threads waiting on pthread_cond_wait and you call pthread_cancel before sending out some of kind of broadcast using pthread_cond_broadcast or pthread_cond_signal, then it may lead to a deadlock where the pthread_cond_wait is waiting on thread to cancel and thread is waiting on the pthread_cond_wait to come out.
Per "The Linux Programming Interface" book pages 667-678
the mutex will be locked. So you can use
pthread_mutex_lock(&mutex);
pthread_cleanup_push((void (*)(void *))pthread_mutex_unlock, &mutex);
while (var == 0)) {
pthread_cond_wait(&cond, &mutex);
}
pthread_cleanup_pop(1);

Difference between pthread_exit(PTHREAD_CANCELED) and pthread_cancel(pthread_self())

When pthread_exit(PTHREAD_CANCELED) is called I have expected behavior (stack unwinding, destructors calls) but the call to pthread_cancel(pthread_self()) just terminated the thread.
Why pthread_exit(PTHREAD_CANCELED) and pthread_cancel(pthread_self()) differ significantly and the thread memory is not released in the later case?
The background is as follows:
The calls are made from a signal handler and reasoning behind this strange approach is to cancel a thread waiting for the external library semop() to complete (looping around on EINTR I suppose)
I have noticed that calling pthread_cancel from other thread does not work (as if semop was not a cancellation point) but signalling the thread and then calling pthread_exit works but calls the destructor within a signal handler.
pthread_cancel could postpone the action to the next cancellation point.
In terms of thread specific clean-up behaviour there should be no difference between cancelling a thread via pthread_cancel() and exiting a thread via pthread_exit().
POSIX says:
[...] When the cancellation is acted on, the cancellation clean-up handlers for thread shall be called. When the last cancellation clean-up handler returns, the thread-specific data destructor functions shall be called for thread. When the last destructor function returns, thread shall be terminated.
From Linux's man pthread_cancel:
When a cancellation requested is acted on, the following steps occur for thread (in this order):
Cancellation clean-up handlers are popped (in the reverse of the order in which they were pushed) and called. (See pthread_cleanup_push(3).)
Thread-specific data destructors are called, in an unspecified order. (See pthread_key_create(3).)
The thread is terminated. (See pthread_exit(3).)
Referring the strategy to introduce a cancellation point by signalling a thread, I have my doubts this were the cleanest way.
As many system calls return on receiving a signal while setting errno to EINTR, it would be easy to catch this case and simply let the thread end itself cleanly under this condition via pthread_exit().
Some pseudo code:
while (some condition)
{
if (-1 == semop(...))
{ /* getting here on error or signal reception */
if (EINTR == errno)
{ /* getting here on signal reception */
pthread_exit(...);
}
}
}
Turned out that there is no difference.
However some interesting side effects took place.
Operations on std::iostream especially cerr/cout include cancellation points. When the underlying operation is canceled the stream is marked as not good. So you will get no output from any other thread if only one has discovered cancellation on an attempt to print.
So play with pthread_setcancelstate() and pthread_testcancel() or just call cerr.clear() when needed.
Applies to C++ streams only, stderr,stdin seems not be affected.
First of all, there are two things associated to thread which will tell what to do when you call pthread_cancel().
1. pthread_setcancelstate
2. pthread_setcanceltype
first function will tell whether that particular thread can be cancelled or not, and the second function tells when and how that thread should be cancelled, for example, should that thread be terminated as soon as you send cancellation request or it need to wait till that thread reaches some milestone before getting terminated.
when you call pthread_cancel(), thread wont be terminated directly, above two actions will be performed, i.e., checking whether that thread can be cancelled or not, and if yes, when to cancel.
if you disable cancel state, then pthread_cancel() can't terminate that thread, but the cancellation request will stay in a queue waiting for that thread to become cancellable, i.e., at some point of time if you are enabling cancel state, then your cancel request will start working on terminating that thread
whereas if you use pthread_exit(), then the thread will be terminated irrespective to the cancel state and cancel type of that particular thread.
*this is one of the differences between pthread_exit() and pthread_cancel(), there can be few more.

Handing off a piece of work to a thread and waiting for it to accept

My application works as follows:
the worker-threads initialize and begin waiting in pthread_cond_wait()
the main thread connects to DB and starts handing over one row at a time to the proper worker
Because of the DB-driver internals, the next row can not be read until the current one is extracted, so the main thread has to wait for the worker to "accept" the row.
I achieve this by calling pthread_cond_wait() inside the main thread -- waiting for a pthread_signal() from the worker. This works cleanly -- on both Linux and FreeBSD -- but usually takes much longer on Linux. Whereas I consistently process the entire 1.6M rows in about 27 seconds on FreeBSD, on Linux it usually takes over 2 minutes. Except sometimes the Linux box shows the same time...
The code is compiled from the same source and the program talks to the same DB-server. If anything, the Linux box is located on the same LAN as the DB, whereas the FreeBSD machine connects via VPN (so it should be a bit slower). But it is the wide inconsistency of the Linux results that bothers me, and I suspect the thread-coordination...
Here is what I have now:
MAIN THREAD WORKER
--------------------------------------------------------------------------
get new row
figure out, which worker it belongs to lock my mutex
lock the worker's mutex go into pthread_cond_wait
signal the worker extract the row's data
unlock the worker's mutex signal the main thread
go into pthread_cond_wait unlock the mutex
go on back to getting the next row go on to process the row's data
Is there a better way? Thanks!
If reading the next row must be serial anyway, why are you delegating this to the worker? As the main thread has to wait anyway, have the main thread do the extraction and have the hand-off occur as soon as the row has been sufficiently extracted that the master can proceed to the next row.
Other than that, you will need to provide code, as your description is incomplete, as would be any question of this nature submitted without code.
It looks like your problem is that you are calling pthread_cond_wait() without the mutex locked in the main thread. This means that there's a race-condition: if the worker thread wakes up, extracts the data and signals the condition before the parent executes pthread_cond_wait(), the wakeup will be lost.
What you should have is some shared state paired with the condition variable, like this:
Main Thread:
get_new_row();
worker = decide_worker();
pthread_mutex_lock(&mutex);
/* Signal worker that data is available */
flag[worker] = 1;
pthread_cond_signal(&cond);
/* Wait for worker to extract it */
while (flag[worker] == 1)
pthread_cond_wait(&cond, &mutex):
pthread_mutex_unlock(&mutex);
Worker Thread:
pthread_mutex_lock(&mutex);
/* Wait for data to be available */
while (flag[worker] == 0)
pthread_cond_wait(&cond, &mutex):
extract_row_data();
/* Signal main thread that extraction is complete */
flag[worker] = 0;
pthread_cond_signal(&cond);
pthread_mutex_unlock(&mutex);

Resources