Why does POSIX standardize semaphores as system calls but leave mutexes and condition variables to Pthreads (user level)?

I came up with this strange question, which has been haunting me: why does POSIX standardize support for semaphores at the syscall level but leave condition variables and mutexes to the Pthreads library?
What is the division of responsibility here? Why is the semaphore not standardized in the Pthreads package? Why is the semaphore, rather than the mutex or condition variable, the synchronization primitive that POSIX standardizes as a syscall?
I don't know. My guess is that performance is the concern for not implementing the mutex as a syscall. (Atomic hardware instructions are unprivileged, so implementing a mutex at user level is possible. Even though Linux provides futex, it is really an optimization that turns a spin lock into a two-phase lock, falling back to a sleep lock.) And is the reason for the semaphore that it can be manipulated by different processes, whereas a mutex can only be unlocked by the thread that holds it? A semaphore's V operation lets a process waiting on it be unblocked. So the semaphore is kept by the kernel, and a semaphore's id, like a file descriptor, is a capability handed out by the kernel, which makes it a syscall rather than a purely user-level facility.
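(For illustration of that two-phase idea, here is a minimal Linux-only sketch, simplified along the lines of Drepper's "Futexes Are Tricky": spin briefly in user space with atomics, and only call into the kernel when contended.)

    #include <linux/futex.h>
    #include <stdatomic.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static atomic_int lock_word;    /* 0 = free, 1 = locked, 2 = locked + waiters */

    static void futex_lock(void)
    {
        /* Phase 1: user-space fast path -- no syscall when uncontended. */
        for (int spin = 0; spin < 100; spin++) {
            int expected = 0;
            if (atomic_compare_exchange_weak(&lock_word, &expected, 1))
                return;
        }
        /* Phase 2: mark the lock contended and sleep in the kernel. */
        while (atomic_exchange(&lock_word, 2) != 0)
            syscall(SYS_futex, &lock_word, FUTEX_WAIT, 2, NULL, NULL, 0);
    }

    static void futex_unlock(void)
    {
        if (atomic_exchange(&lock_word, 0) == 2)   /* were there waiters? */
            syscall(SYS_futex, &lock_word, FUTEX_WAKE, 1, NULL, NULL, 0);
    }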
But what about the condition variable? Is there any reason to specify it in Pthreads rather than at the syscall level? Is it because it is stateless and originates from the monitor, a purely stateless programming construct, so it can be implemented on top of a mutex?
Thanks!

Short answer: semaphores and pthreads have separate histories.
Yes: semaphores can be used between processes, whereas pthreads objects are (generally) confined to the current process, or to processes which share memory.
From a performance perspective: a quick poke (on my x86_64) tells me that sem_wait() and sem_post() use straightforward lock cmpxchg instructions, making a syscall only to suspend or wake up a thread. That is essentially the same as a pthread_mutex_t -- when the semaphore is used as a mutex.
Obviously a semaphore can do things that a mutex and a condition variable do not do, and you can use unnamed semaphores within a process -- sem_init() with pshared=0.
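For instance, a minimal sketch (error handling omitted) of an unnamed semaphore doing duty as a mutex inside one process:

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    static sem_t sem;            /* binary semaphore used as a mutex */
    static long counter;

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            sem_wait(&sem);      /* P: fast path is an atomic decrement */
            counter++;
            sem_post(&sem);      /* V: syscall only if a sleeper must be woken */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        sem_init(&sem, 0, 1);    /* pshared=0: within this process only */
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);   /* expect 200000 */
        sem_destroy(&sem);
        return 0;
    }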
I guess the pthread developers decided that specifying a pthread_sema_t would be unnecessary duplication. Sadly, it does leave room for the suspicion that the (more general) semaphore might have performance issues even when used only within a process :-( Or, indeed, some doubt about whether semaphores and pthreads objects always play nicely together :-(

Related

Why is there only limited usage of thread pools in TensorFlow-Federated?

TFF's threading libraries start a new thread from ThreadRun by default, and the only usage (as of TFF 0.42.0) of the optional ThreadPool parameter is in the implementation of a single executor. Why is this the case?
After conferring with some people who were close to the implementation, the understanding we came to was:
The issue with fully general usage of thread pools in TFF is that, used incorrectly, we may be courting deadlock. We need FIFO scheduling in the thread pool itself, and FIFO-compatible usage in the runtime (if you need the result of a computation, you need to know it will be started before you start).
When implementing the first usages of thread pools in the TF executor, we reasoned ourselves into believing the following statement is true: at the leaf executors (that is, as long as an executor doesn't have any children), this FIFO-compatible programming is guaranteed by the stateful executor interface. That is, if you need a value, you know it has already been created (otherwise the executor wouldn't be able to resolve it), so as long as the thread pool is FIFO, it will be ready before you execute. Either the creating function already pushed a function onto this FIFO queue, or it created the value directly, so you can push yourself onto the FIFO queue, no sweat.
Due to the difficulty, we haven't really tried to reason hard about how / whether we might be able to make similar statements about executors which have children (and these children may be pushing work onto the queue; AFAIK we don't currently make any guarantees about how we do this, but I could imagine reasoning about a similar invariant step by step 'up the stack'). Thus we have only considered it safe so far to inject thread pool usage at leaf executors. The fact that we don't have this in the XLAExecutor yet is simply due to lack of use.
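To make the FIFO-compatibility argument concrete, here is a hypothetical sketch (plain C with pthreads, not TFF's actual code) of the leaf-executor invariant: with a strict FIFO queue and run-to-completion workers, a task enqueued after the task that produces its input always finds the input ready.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct task { void (*fn)(void *); void *arg; struct task *next; };

    static struct task *head, *tail;                 /* strict FIFO queue */
    static pthread_mutex_t qmtx = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  qcv  = PTHREAD_COND_INITIALIZER;
    static int shutting_down;

    static void enqueue(void (*fn)(void *), void *arg)
    {
        struct task *t = malloc(sizeof *t);
        t->fn = fn; t->arg = arg; t->next = NULL;
        pthread_mutex_lock(&qmtx);
        if (tail) tail->next = t; else head = t;     /* append at the tail */
        tail = t;
        pthread_cond_signal(&qcv);
        pthread_mutex_unlock(&qmtx);
    }

    static void *worker(void *unused)
    {
        (void)unused;
        for (;;) {
            pthread_mutex_lock(&qmtx);
            while (!head && !shutting_down)
                pthread_cond_wait(&qcv, &qmtx);
            if (!head) { pthread_mutex_unlock(&qmtx); return NULL; }
            struct task *t = head;                   /* pop from the head: FIFO */
            head = t->next; if (!head) tail = NULL;
            pthread_mutex_unlock(&qmtx);
            t->fn(t->arg);                           /* run to completion */
            free(t);
        }
    }

    static int value, value_ready;                   /* a trivial "future" */

    static void produce(void *a) { (void)a; value = 42; value_ready = 1; }
    static void consume(void *a) { (void)a; printf("ready=%d value=%d\n", value_ready, value); }

    int main(void)
    {
        pthread_t w;
        pthread_create(&w, NULL, worker, NULL);
        enqueue(produce, NULL);   /* the creating function is pushed first... */
        enqueue(consume, NULL);   /* ...so FIFO order guarantees its result exists */
        pthread_mutex_lock(&qmtx);
        shutting_down = 1;
        pthread_cond_signal(&qcv);
        pthread_mutex_unlock(&qmtx);
        pthread_join(w, NULL);
        return 0;
    }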

Does pthread mutex guarantee starvation freedom?

Background
I often stumble upon cases where the order of lock acquisitions must match the real-time order of lock attempts. Those cases are usually about semaphore-like locks.
Theory
From what I read in "The Art of Multiprocessor Programming", deadlock freedom plus a first-come-first-served guarantee is sufficient to make a lock starvation-free. Deadlock freedom seems to be on the users, since they have to remember to unlock properly. I have looked at the possible types of mutexes on the pthreads manual page, but it doesn't seem to mention any ordering on lock acquisitions.
Question
Does a pthread mutex guarantee starvation freedom? Are there implementations that do (I'm mainly concerned with the Linux family and macOS)? Are semaphores guaranteed the same properties as mutexes?
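(For context: POSIX leaves the choice of which blocked thread acquires the mutex to the scheduling policy, so a plain pthread_mutex_t makes no FCFS promise. If you need first-come-first-served ordering, a classic do-it-yourself approach is a ticket lock; a minimal sketch on top of a standard mutex and condition variable:)

    #include <pthread.h>

    /* Ticket lock: tickets are handed out in arrival order, so acquisition
       is first-come-first-served even though the inner mutex promises no
       ordering. Combined with deadlock freedom, this gives starvation freedom. */
    struct ticket_lock {
        pthread_mutex_t mtx;
        pthread_cond_t  cv;
        unsigned long   next_ticket;   /* next ticket to hand out */
        unsigned long   now_serving;   /* ticket allowed to proceed */
    };

    #define TICKET_LOCK_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

    void ticket_lock(struct ticket_lock *l)
    {
        pthread_mutex_lock(&l->mtx);
        unsigned long my = l->next_ticket++;        /* take a number */
        while (my != l->now_serving)
            pthread_cond_wait(&l->cv, &l->mtx);     /* sleep until our turn */
        pthread_mutex_unlock(&l->mtx);
    }

    void ticket_unlock(struct ticket_lock *l)
    {
        pthread_mutex_lock(&l->mtx);
        l->now_serving++;
        pthread_cond_broadcast(&l->cv);             /* wake everyone; only the next ticket proceeds */
        pthread_mutex_unlock(&l->mtx);
    }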

Synchronizing forked processes with pthread_mutex in C

Is it possible to use a mutex from pthread.h to synchronize processes created with fork() from unistd.h? AFAIK, on Linux both are ultimately implemented with the clone() system call.
I am asking in the context of a shared memory segment (from ipc.h, shm.h) holding critical data, which should be protected against concurrent writes from different processes. Semaphores can be placed in that memory and then used from different processes. Why couldn't mutexes be used instead of semaphores?
Why am I asking?
First of all, I was told that it won't work, without receiving any explanation. I was not able to find any answer on the Internet, so I decided to ask here.
Second, a forked process is safer than a thread created with pthread_create: if a forked process crashes, the rest of the program keeps running, whereas if a thread crashes, the whole program exits.
Third, mutexes seem more human-friendly to manage than semaphores.
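For reference, POSIX does provide for this case: a mutex placed in a shared memory segment and initialized with the PTHREAD_PROCESS_SHARED attribute can synchronize forked processes. A minimal sketch (error handling omitted):

    #include <pthread.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* Place the mutex itself in memory shared across fork(). */
        pthread_mutex_t *mtx = mmap(NULL, sizeof *mtx,
                                    PROT_READ | PROT_WRITE,
                                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);

        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(mtx, &attr);
        pthread_mutexattr_destroy(&attr);

        if (fork() == 0) {                 /* child */
            pthread_mutex_lock(mtx);
            printf("child in critical section\n");
            pthread_mutex_unlock(mtx);
            _exit(0);
        }
        pthread_mutex_lock(mtx);           /* parent */
        printf("parent in critical section\n");
        pthread_mutex_unlock(mtx);
        wait(NULL);
        pthread_mutex_destroy(mtx);
        munmap(mtx, sizeof *mtx);
        return 0;
    }

Regarding the crash-safety point above: if a process can die while holding the lock, pthread_mutexattr_setrobust() with PTHREAD_MUTEX_ROBUST lets survivors recover the mutex (pthread_mutex_lock() returns EOWNERDEAD) instead of blocking forever.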

How to convert synchronous blocking shared memory model code to asynchronous coroutines running on thread pool?

While there are lots of solutions matching my question partially, I'd like to know if a complete match exists. It's hard to find a complete solution because the partial ones occupy the search results. What I'm after is a runtime framework and (optionally) a transformation applied to source-language code when the language doesn't support coroutines.
There are libraries like lthread with an lthread_cond_wait() API, but every lthread is bound to a single pthread. I'd like lightweight threads to be able to run on several pthreads, picked arbitrarily by a thread pool. Neither single-threaded schedulers nor global-lock schedulers match. I think we can do better.
lthread is also not an option because it neither performs source-code transformation nor avoids the need for one the way protothreads do.
Several green-threading runtimes (Erlang, Limbo) don't match because they are limited to the CSP (communicating sequential processes) model, whereas I'd also like shared-memory synchronization primitives: mutexes, condition variables, rwlocks.
The transformation involves:
Transforming stack contexts into objects on the heap
Transforming mutex calls into deactivating and reactivating jobs on the thread pool, via publish-subscribe
Transforming condition variables into publish-subscribe relationships as well (see the sketch below)
It would be nice to have Ada-style rendezvous
I failed to produce a straightforward runtime implementation, due to potential deadlocks in the publish-subscribe mechanism when no global lock or single scheduler thread is used, but I still think this is possible.
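For concreteness, here is a hypothetical sketch (in C, with an assumed thread_pool_submit() API) of what the condition-variable-to-publish-subscribe transformation might look like: a blocking wait becomes a parked continuation that the signaling side re-enqueues on the pool.

    #include <pthread.h>

    /* Hypothetical thread-pool API, assumed to exist elsewhere. */
    void thread_pool_submit(void (*fn)(void *), void *arg);

    /* A parked job: heap-allocated state plus a resume callback replaces
       the stack frame that a blocking cond_wait would have pinned. */
    struct continuation {
        void (*resume)(void *state);
        void *state;
        struct continuation *next;
    };

    struct cond_channel {
        pthread_mutex_t mtx;            /* protects the subscriber list only */
        struct continuation *subscribers;
    };

    /* "wait": park a continuation instead of blocking a worker thread. */
    void cond_subscribe(struct cond_channel *c, struct continuation *k)
    {
        pthread_mutex_lock(&c->mtx);
        k->next = c->subscribers;
        c->subscribers = k;
        pthread_mutex_unlock(&c->mtx);
    }

    /* "signal": re-activate one parked job by pushing it back on the pool. */
    void cond_publish(struct cond_channel *c)
    {
        pthread_mutex_lock(&c->mtx);
        struct continuation *k = c->subscribers;
        if (k)
            c->subscribers = k->next;
        pthread_mutex_unlock(&c->mtx);
        if (k)
            thread_pool_submit(k->resume, k->state);
    }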
Disclaimer: lthread author.
You can launch several pthreads and run an lthread scheduler in each one (this is done automagically by calling lthread_run() in the pthread function). This way each pthread will run a bunch of lthreads.
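A sketch of that pattern, assuming the lthread_create()/lthread_run() API from the project's README (treat the exact signatures as assumptions and check lthread.h):

    #include <pthread.h>
    #include <lthread.h>

    /* Body of a lightweight thread; scheduled by the lthread scheduler
       running on whichever pthread created it. */
    void coro(void *arg)
    {
        (void)arg;
        /* ... lthread work, lthread_cond_wait(), etc. ... */
    }

    /* Each pthread runs its own lthread scheduler over its own lthreads. */
    void *scheduler_thread(void *arg)
    {
        (void)arg;
        lthread_t *lt;
        lthread_create(&lt, coro, NULL);
        lthread_run();                 /* returns when this scheduler's lthreads finish */
        return NULL;
    }

    int main(void)
    {
        pthread_t p[4];
        for (int i = 0; i < 4; i++)
            pthread_create(&p[i], NULL, scheduler_thread, NULL);
        for (int i = 0; i < 4; i++)
            pthread_join(p[i], NULL);
        return 0;
    }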

Why is `pthread_mutex_lock` needed when `pthread_mutex_trylock` is there?

pthread_mutex_trylock detects deadlocks and doesn't block, so why would you even "need" pthread_mutex_lock?
Perhaps when you deliberately want the thread to block? But in that case, might it not result in a deadlock?
pthread_mutex_trylock does not detect deadlocks.
You can use it to avoid deadlocks but you have to do that by wrapping your own code around it, effectively multiple calls to pthread_mutex_trylock in a loop with a time-out, after which your thread releases all its resources.
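That wrapping might look like this sketch (lock_with_timeout() is a made-up helper; the deadline and backoff values are arbitrary, and POSIX's pthread_mutex_timedlock() is the standardized alternative):

    #include <errno.h>
    #include <pthread.h>
    #include <time.h>

    /* Sketch: try for up to ~1 second, backing off 1 ms between attempts.
       On ETIMEDOUT the caller is expected to release its other resources,
       breaking any potential deadlock cycle, and retry later. */
    int lock_with_timeout(pthread_mutex_t *mtx)
    {
        struct timespec backoff = { 0, 1000000 };    /* 1 ms */
        for (int attempt = 0; attempt < 1000; attempt++) {
            if (pthread_mutex_trylock(mtx) == 0)
                return 0;                            /* lock acquired */
            nanosleep(&backoff, NULL);
        }
        return ETIMEDOUT;
    }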
In any case, you can avoid deadlocks even with pthread_mutex_lock if you just follow the simple rule that all threads allocate resources in the same order.
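That rule in code form (a minimal sketch): every thread that needs both locks takes them in the same fixed order, so a circular wait can never form.

    #include <pthread.h>

    pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

    /* Every thread acquires lock_a before lock_b. Since no thread ever
       holds lock_b while waiting for lock_a, a wait cycle cannot form. */
    void use_both_resources(void)
    {
        pthread_mutex_lock(&lock_a);
        pthread_mutex_lock(&lock_b);
        /* ... work with both resources ... */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
    }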
You use pthread_mutex_lock if you just want to wait efficiently until the resource is available, without having to spin on the mutex, which is often very inefficient. Properly designed multi-threaded applications have no need for the pthread_mutex_trylock variant.
Locks should only be held for the absolute minimum time needed to do the work, and if that's too long, you can generally redesign things so the lock is held for less time (such as by using the mutex only to copy data to a thread's local data area, and having the long-running part work on that copy after the mutex is released).
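The copy-then-release pattern just described, as a sketch (long_running_work() is a stand-in for whatever the thread actually does):

    #include <pthread.h>
    #include <string.h>

    struct data { int payload[1024]; };

    pthread_mutex_t data_mtx = PTHREAD_MUTEX_INITIALIZER;
    struct data shared;                        /* protected by data_mtx */

    void long_running_work(struct data *d);    /* hypothetical long job */

    void *worker(void *arg)
    {
        (void)arg;
        struct data local;

        pthread_mutex_lock(&data_mtx);
        memcpy(&local, &shared, sizeof local); /* hold the lock only for the copy */
        pthread_mutex_unlock(&data_mtx);

        long_running_work(&local);             /* lock-free work on the private copy */
        return NULL;
    }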
The pseudo-code:

    while (pthread_mutex_trylock(&mtx) != 0)   /* non-zero return: lock not acquired */
        yield();                               /* some yield primitive; see below */
will continue to run your thread, busy-waiting for the lock to become available, especially since there is no pthread_yield() in POSIX threads (it's sometimes provided as a non-portable extension; the closest portable call is sched_yield()).
That means that, at worst, the code segment above won't even be able to portably yield the CPU, and will therefore chew up the rest of its quantum on every pass through the scheduler cycle.
And at best, it will still activate the thread once per scheduler cycle just to see if the mutex can be obtained.
Whereas:
pthread_mutex_lock
will most likely pause your thread completely until the lock becomes available, since the thread is moved to a wait queue until the current lock holder releases the mutex.
That's probably the major reason why you should prefer pthread_mutex_lock to pthread_mutex_trylock.
Perhaps when you deliberately want the thread to block?
Yup, exactly in this case. But you can mimic pthread_mutex_lock() behavior with something like this:

    while (pthread_mutex_trylock(&mtx) != 0)   /* 0 means the lock was acquired */
        sched_yield();   /* sched_yield() is portable POSIX; pthread_yield() is a non-standard extension */
