Does pthread mutex guarantee starvation freedom?

Background
I often stumble upon cases where the order of lock acquisitions must match the real-time order of lock attempts. Those cases usually involve semaphore-like locks.
Theory
From what I read in "The Art of Multiprocessor Programming", deadlock freedom and first-come-first-served guarantees are sufficient to make a lock starvation-free. Deadlock freedom seems to be on the user, since they have to remember to unlock properly. I have looked at the mutex types listed on the pthreads manual page, but it doesn't seem to mention any ordering of lock acquisitions.
Question
Does a pthread mutex guarantee starvation freedom? Are there implementations that do (I'm mainly concerned with the Linux family and macOS)? Are semaphores guaranteed the same properties as mutexes?
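For what it's worth, even when the mutex itself promises no ordering, a first-come-first-served lock can be layered on top of a pthread mutex and condition variable. A minimal "ticket lock" sketch (the type and function names are mine, not any standard API):

    #include <pthread.h>

    /* Threads enter the critical section in the exact order they arrived. */
    typedef struct {
        pthread_mutex_t m;
        pthread_cond_t  c;
        unsigned long next_ticket;   /* next ticket to hand out */
        unsigned long now_serving;   /* ticket currently admitted */
    } ticket_lock_t;

    #define TICKET_LOCK_INIT \
        { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

    void ticket_lock(ticket_lock_t *t)
    {
        pthread_mutex_lock(&t->m);
        unsigned long my = t->next_ticket++;   /* take a ticket */
        while (my != t->now_serving)           /* wait for my turn */
            pthread_cond_wait(&t->c, &t->m);
        pthread_mutex_unlock(&t->m);
    }

    void ticket_unlock(ticket_lock_t *t)
    {
        pthread_mutex_lock(&t->m);
        t->now_serving++;                      /* admit the next ticket */
        pthread_cond_broadcast(&t->c);
        pthread_mutex_unlock(&t->m);
    }

Whether the underlying pthread mutex is fair does not matter here; the ticket counters enforce the FIFO ordering.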

Related

Grand Central Dispatch: What happens when queues get overloaded?

Pretty simple question that I haven't found answered anywhere in the documentation or tutorials on GCD: what happens if I'm submitting work to queues faster than it's being processed and removed? I'm aware that GCD queues have no size limit; would work just pile up until the program runs out of memory? Is there any way to properly handle this situation?
What happens if I'm submitting work to queues faster than it's being processed and removed?
It depends.
If dispatching tasks to a single/shared serial queue, they will just be added to the queue and it will process them in a FIFO manner. No problem. Memory is your only constraint.
If dispatching tasks to a concurrent queue, though, you end up with “thread explosion”, and you will quickly exhaust the limited number of worker threads available for that quality-of-service (QoS). This can result in unpredictable behaviors should the OS need to avail itself of a queue of the same QoS. Thus, you must be very careful to avoid this thread explosion.
See the discussion of thread explosion in WWDC 2015's Building Responsive and Efficient Apps with GCD, and again in WWDC 2016's Concurrent Programming With GCD in Swift 3.
Is there any way to properly handle this situation?
It is hard to answer that in the abstract. Different situations call for different solutions.
In the case of thread explosion, the solution is to constrain the degree of concurrency using concurrentPerform (limiting the concurrency to the number of cores on your device). Or use operation queues and their maxConcurrentOperationCount to limit the degree of concurrency to something reasonable. There are other patterns, too, but the idea is to constrain concurrency to something suitable for the device in question.
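Swift's concurrentPerform is a wrapper around libdispatch's C-level dispatch_apply, so the same pattern is available from C. A minimal sketch (the per-item work is a placeholder):

    #include <dispatch/dispatch.h>
    #include <stdio.h>

    int main(void)
    {
        /* dispatch_apply blocks the caller until all iterations finish and
           lets GCD bound the parallelism to what the machine supports,
           instead of enqueueing 10000 independent blocks. */
        dispatch_queue_t q = dispatch_get_global_queue(QOS_CLASS_DEFAULT, 0);
        dispatch_apply(10000, q, ^(size_t i) {
            (void)i;   /* placeholder per-item work */
        });
        puts("all iterations done");
        return 0;
    }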
But if you're just dispatching a large number of tasks to a serial queue, there's not much you can do (other than looking for parallelism opportunities, to make efficient use of all of the CPU's cores). But that's OK, as that is the whole purpose of a queue: to let it perform tasks in the order they were submitted, even if the queue can't keep up. It wouldn't be a “queue” if it didn't follow this FIFO sort of pattern.
Now, if you are dealing with real-time data that cannot be processed quickly enough, you have a different problem. In that case, you might want to decouple the capture of the input from the processing and decide how you want to handle it. If you can't keep up with real-time processing of video, for example, you have a choice: either start dropping frames or process the data asynchronously/later. You just have to decide what is right for your use case. We cannot answer this question in the abstract.
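One common way to implement the frame-dropping choice is a dispatch semaphore used as a non-blocking gate. A sketch, assuming hypothetical frame_t and process_frame() placeholders (not a real API):

    #include <dispatch/dispatch.h>

    typedef struct frame frame_t;
    void process_frame(frame_t *f);        /* hypothetical heavy work */

    static dispatch_semaphore_t busy;      /* created once with value 1 */
    static dispatch_queue_t processing;    /* serial queue for heavy work */

    void setup(void)
    {
        busy = dispatch_semaphore_create(1);
        processing = dispatch_queue_create("example.frames",
                                           DISPATCH_QUEUE_SERIAL);
    }

    /* Capture callback: if the previous frame is still being processed,
       discard the new one instead of letting work pile up. */
    void on_frame(frame_t *f)
    {
        if (dispatch_semaphore_wait(busy, DISPATCH_TIME_NOW) != 0)
            return;                        /* still busy: drop this frame */
        dispatch_async(processing, ^{
            process_frame(f);
            dispatch_semaphore_signal(busy);
        });
    }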

Why does POSIX standardize the semaphore as a system call but leave the mutex and condition variable to pthreads (user level)?

I came up with this strange question which has been haunting me: why does POSIX standardize support for the semaphore as a syscall but leave the condition variable and mutex to the pthread library?
What's the division of responsibility here? Why is the semaphore not standardized in the pthread package? Why is the synchronization syscall that POSIX standardizes the semaphore, and not the mutex or condition variable?
I don't know. My guess is that performance is the concern for not implementing the mutex as a syscall. (Atomic hardware instructions are unprivileged, so implementing a mutex at user level is possible. Even though Linux provides futex, it is really an optimization of a spin lock into a two-phase lock that falls back to sleeping.) And the reason for the semaphore is that a semaphore can be manipulated by different processes, whereas a mutex can only be unlocked by the process that holds it? A semaphore's V operation unblocks processes waiting on it. So the semaphore is kept by the kernel, and a semaphore's id is like a file descriptor: a capability handed out by the kernel, which makes it a syscall rather than a purely user-level package.
But what about the condition variable? Is there any reason to specify it in pthreads but not at the syscall level? Because it is stateless and originates from the monitor, which is a purely stateless programming construct, so it can be implemented using a mutex?
Thanks!
Short answer: semaphores and pthreads have separate histories.
Yes: semaphores can be used between processes where pthreads stuff is (generally) all within the current process, or between processes which share memory.
From a performance perspective: a quick poke (on my x86_64) tells me that sem_wait() and sem_post() use straightforward lock cmpxchg instructions, doing a syscall only to suspend or wake a thread. That is essentially the same as a pthread_mutex_t, when the semaphore is used as a mutex.
Obviously a semaphore can do things that a mutex and a condition variable do not do, and you can use unnamed semaphores within a process -- sem_init() with pshared=0.
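For instance, a minimal sketch of an unnamed semaphore doing duty as a mutex within one process (note that macOS does not implement unnamed POSIX semaphores, so this is Linux-oriented):

    #include <semaphore.h>
    #include <stdio.h>

    int main(void)
    {
        sem_t s;
        sem_init(&s, 0 /* pshared=0: this process only */, 1 /* binary */);
        sem_wait(&s);                  /* "lock" */
        puts("in critical section");
        sem_post(&s);                  /* "unlock" */
        sem_destroy(&s);
        return 0;
    }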
I guess the pthread developers decided that specifying a pthread_sema_t would be unnecessary duplication. Sadly, it does leave room for doubt that the (more general) semaphore might have performance issues even when used only within a process :-( Or, indeed, some doubt that semaphore and pthread stuff always play nicely together :-(

Synchronizing forked processes with pthread_mutex in C

Is it possible to use a mutex from pthread.h to synchronize processes created with fork() from unistd.h? AFAIK, both are in the end created using the clone() system call.
I am asking in the context of a shared memory segment (from ipc.h, shm.h) holding critical data, which should be protected against concurrent writes from different processes. Semaphores can be placed in that memory and later used from different processes. Why couldn't mutexes be used instead of semaphores?
Why am I asking?
First of all, I was told that it won't work, without receiving any explanation why. I was not able to find an answer on the Internet, so I decided to ask here.
Second, a forked process is safer than a thread created with pthread_create: if a forked process crashes, the rest of the program continues to work, whereas if a thread crashes, the whole program exits.
Third, mutexes seem to be more human-friendly than semaphores to manage.
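To make it concrete, here is roughly the setup I have in mind, sketched with the pthread_mutexattr_setpshared() attribute and an anonymous shared mapping instead of SysV shm (error checks omitted; whether this is actually safe across fork() is exactly my question):

    #include <pthread.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* Place the mutex itself in memory both processes can see. */
        pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        pthread_mutexattr_t a;
        pthread_mutexattr_init(&a);
        pthread_mutexattr_setpshared(&a, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(m, &a);

        if (fork() == 0) {             /* child */
            pthread_mutex_lock(m);
            /* ... touch shared data ... */
            pthread_mutex_unlock(m);
            _exit(0);
        }
        pthread_mutex_lock(m);         /* parent */
        /* ... touch shared data ... */
        pthread_mutex_unlock(m);
        wait(NULL);
        return 0;
    }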

Is there a way to limit the number of threads spawned by GCD in my application?

I know from the response to this question that the max number of threads spawned cannot exceed 66. But is there a way to limit the thread count to a value which a user has defined?
From my experience and work with GCD under various circumstances, I believe this is not possible.
That said, it is very important to understand that by using GCD, you spawn queues, not threads. Whenever your code creates a queue, the GCD subsystem in turn checks OS conditions and looks for available resources. New threads are then created under the hood based on those conditions, in an order and with resources allocated that are not controlled by you. This is clearly explained in the official documentation:
When it comes to adding concurrency to an application, dispatch queues provide several advantages over threads. The most direct advantage is the simplicity of the work-queue programming model. With threads, you have to write code both for the work you want to perform and for the creation and management of the threads themselves. Dispatch queues let you focus on the work you actually want to perform without having to worry about the thread creation and management. Instead, the system handles all of the thread creation and management for you. The advantage is that the system is able to manage threads much more efficiently than any single application ever could. The system can scale the number of threads dynamically based on the available resources and current system conditions. In addition, the system is usually able to start running your task more quickly than you could if you created the thread yourself.
Source: Dispatch Queues
There is no way to control resource consumption with GCD, such as by setting some kind of threshold. GCD is a high-level abstraction over low-level things, such as threads, and it manages them for you.
The only way you can possibly influence how many resources a particular task within your application should take is by setting its QoS (quality of service) class (formerly known simply as priority, since extended into a more complex concept). In brief, you can classify tasks within your application based on their importance, helping GCD and your application be more resource- and battery-efficient. Its use is highly encouraged in complex applications with heavy concurrency.
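At the C level this is just an attribute supplied at queue creation. A minimal sketch (the queue label is a made-up example):

    #include <dispatch/dispatch.h>
    #include <stdio.h>

    int main(void)
    {
        /* Create a serial queue whose work is classified at utility QoS.
           GCD still owns the threads; the attribute only declares how
           important the work is. */
        dispatch_queue_attr_t attr = dispatch_queue_attr_make_with_qos_class(
            DISPATCH_QUEUE_SERIAL, QOS_CLASS_UTILITY, 0);
        dispatch_queue_t q = dispatch_queue_create("com.example.work", attr);
        dispatch_sync(q, ^{ puts("running at utility QoS"); });
        return 0;
    }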
Even so, this kind of regulation from the developer's end has its limits, and it ultimately does not amount to controlling thread creation:
Apps and operations compete to use finite resources—CPU, memory, network interfaces, and so on. In order to remain responsive and efficient, the system needs to prioritize tasks and make intelligent decisions about when to execute them.
Work that directly impacts the user, such as UI updates, is extremely important and takes precedence over other work that may be occurring in the background. This higher priority work often uses more energy, as it may require substantial and immediate access to system resources.
As a developer, you can help the system prioritize more effectively by categorizing your app’s work, based on importance. Even if you’ve implemented other efficiency measures, such as deferring work until an optimal time, the system still needs to perform some level of prioritization. Therefore, it is still important to categorize the work your app performs.
Source: Prioritize Work with Quality of Service Classes
To conclude: if you deliberately intend to control threads, don't use GCD. Use low-level programming techniques and manage them yourself. If you use GCD, then you agree to leave this kind of responsibility to GCD.

Why is `pthread_mutex_lock` needed when `pthread_mutex_trylock` is there?

pthread_mutex_trylock detects deadlocks and doesn't block, so why would you even "need" pthread_mutex_lock?
Perhaps when you deliberately want the thread to block? But in that case, couldn't it result in a deadlock?
pthread_mutex_trylock does not detect deadlocks.
You can use it to avoid deadlocks, but you have to do that by wrapping your own code around it: effectively, multiple calls to pthread_mutex_trylock in a loop with a time-out, after which your thread releases all its resources.
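A sketch of such a wrapper (my own, not a standard API): take the first lock, then try the second for a bounded time; on failure, release everything so the caller can back off and retry.

    #include <pthread.h>
    #include <time.h>

    int lock_both(pthread_mutex_t *a, pthread_mutex_t *b)
    {
        pthread_mutex_lock(a);
        for (int tries = 0; tries < 100; tries++) {
            if (pthread_mutex_trylock(b) == 0)
                return 0;                          /* got both locks */
            struct timespec ts = { 0, 1000000 };   /* wait 1 ms, retry */
            nanosleep(&ts, NULL);
        }
        pthread_mutex_unlock(a);                   /* time-out: back off */
        return -1;
    }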
In any case, you can avoid deadlocks even with pthread_mutex_lock if you just follow the simple rule that all threads acquire their locks in the same order.
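For example, with two hypothetical mutexes A and B:

    #include <pthread.h>

    static pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;

    void use_both(void)
    {
        /* Every thread that needs both locks takes A before B; with one
           global order, no cycle of waiters can ever form. */
        pthread_mutex_lock(&A);
        pthread_mutex_lock(&B);
        /* ... use both resources ... */
        pthread_mutex_unlock(&B);
        pthread_mutex_unlock(&A);
    }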
You use pthread_mutex_lock if you just want to wait efficiently until the resource is available, without having to spin on the mutex, which is often very inefficient. Properly designed multi-threaded applications have no need for the pthread_mutex_trylock variant.
Locks should only be held for the absolute minimum time needed to do the work and, if that's too long, you can generally redesign things so the lock is held for less time (such as by using the mutex only to copy data to a thread's local data area, and having the long-running part work on that copy after the mutex is released).
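A sketch of that "copy, then work" pattern (state_t and do_long_work() are placeholders of my own):

    #include <pthread.h>

    typedef struct { int data[64]; } state_t;

    static state_t shared_state;
    static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

    void do_long_work(const state_t *s);   /* hypothetical slow function */

    void worker(void)
    {
        state_t local;
        pthread_mutex_lock(&mtx);
        local = shared_state;              /* short critical section */
        pthread_mutex_unlock(&mtx);
        do_long_work(&local);              /* slow part, lock released */
    }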
The pseudo-code, written here as compilable C (POSIX has no pthread_yield(), though it's sometimes provided as a non-portable extension, so the standard sched_yield() stands in):

    while (pthread_mutex_trylock(&mtx) != 0)   /* spin until acquired */
        sched_yield();

will keep running your thread the whole time it waits for the lock to become available. That means, at worst, the loop chews up the rest of its quantum every time through the scheduler cycle. And at best, it still activates the thread once per scheduler cycle just to see if the mutex can be obtained.
Whereas:
pthread_mutex_lock
will most likely pause your thread entirely until the lock is made available, since the thread is moved to a wait queue until the current lock holder releases the mutex.
That's probably the major reason why you should prefer pthread_mutex_lock to pthread_mutex_trylock.
Perhaps when you deliberately want the thread to block?
Yup, exactly in this case. But you can mimic pthread_mutex_lock() behaviour with something like this (again using the standard sched_yield(), since pthread_yield() is non-portable):

    while (pthread_mutex_trylock(&mtx) != 0)
        sched_yield();
