Why is `pthread_mutex_lock` needed when `pthread_mutex_trylock` is there? - pthreads

pthread_mutex_trylock detects deadlocks, doesn't block, then why would you even "need" pthread_mutex_lock?
Perhaps when you deliberately want the thread to block? But in that case it may result in a deadlock?

pthread_mutex_trylock does not detect deadlocks.
You can use it to avoid deadlocks but you have to do that by wrapping your own code around it, effectively multiple calls to pthread_mutex_trylock in a loop with a time-out, after which your thread releases all its resources.
In any case, you can avoid deadlocks even with pthread_mutex_lock if you just follow the simple rule that all threads allocate resources in the same order.
You use pthread_mutex_lock if you just want to efficiently wait until the resource is available, without having to spin on the mutex, something which is often very inefficient. Properly designed multi-threaded applications have no need for the pthread_mutex_trylock variant.
Locks should only be held for the absolute minimum time to do the work and, if that's too long, you can generally redesign things so the lock time is less (such as by using the mutex to only copy data to a thread's local data areas, and having the long-running bit work on that after the mutex is released).
The pseudo-code:
while not pthread_mutex_trylock:
yield
will continue to run your thread, waiting for the lock to be available, especially since there is no pthread_yield() in POSIX threads (though it's sometimes provided as a non-portable extension).
That means, at worst, the code segment above won't even be able to portably yield the CPU, therefore chewing up the rest of it's quantum every time through the scheduler cycle.
And at best, it will still activate the thread once per scheduler cycle just to see if the mutex can be obtained.
Whereas:
pthread_mutex_lock
will most likely totally pause your thread until the lock is made available, since it will move it to a waiting queue until the current lock holder releases the mutex.
That's probably the major reason why you should prefer pthread_mutex_lock to pthread_mutex_trylock.

Perhaps when you deliberately want the thread to block?
Yup, exactly in this case. But you can mimic pthread_mutex_lock() behavior with something like that
while(pthread_mutex_trylock(&mtx))
pthread_yield()

Related

Why is there only limited usage of thread pools in TensorFlow-Federated?

TFF's threading libraries start a new thread from ThreadRun by default, and the only usage (as of TFF 0.42.0) of the optional ThreadPool parameter is in the implementation of a single executor. Why is this the case?
After conferring with some people who were close to the implementation, the understanding we came to was:
The issue with totally general usage of thread pools in TFF is generally that if used incorrectly, we may be courting deadlock. We need FIFO scheduling in the thread pool itself, and FIFO-compatible usage in the runtime (if you need the result of a computation, you need to know it will be started before you start).
When implementing the first usages of thread pools in the TF executor, we reasoned ourselves to believing the following statement is true: at the leaf executors (that is, so long as an executor doesnt have any children), this FIFO-compatible programming is guaranteed by the stateful executor interface. That is, if you need a value, you know it has already been created (otherwise the executor wouldn't be able to resolve it), so as long as the thread pool is FIFO, it will be ready before you execute. Either the creating function already pushed a function onto this FIFO queue, or just created the value directly, so you can push yourself onto the FIFO queue no sweat.
Due to difficulty, we haven't really tried to reason too hard about how / whether we might be able to make similar statements about executors which have children (and these children may be pushing work onto the queue; AFAIK we dont really currently make any guarantees about how we do this, but i could imagine reasoning about a similar invariant step-by-step 'up the stack'). Thus we have only considered it safe so far to inject thread pool usage at leaf executors. The fact that we don't have this in the XLAExecutor yet is simply due to lack of use.

Grand Central Dispatch: What happens when queues get overloaded?

Pretty simple question that I haven't found anywhere in documentation or tutorials on GCD: What happens if I'm submitting work to queues faster than it's being processed and removed? I'm aware that GCD queues have no size limit, would work just pile up until the program runs out of memory? Is there any way to properly handle this situation?
What happens if I'm submitting work to queues faster than it's being processed and removed?
It depends.
If dispatching tasks to a single/shared serial queue, they will just be added to the queue and it will process them in a FIFO manner. No problem. Memory is your only constraint.
If dispatching tasks to a concurrent queue, though, you end up with “thread explosion”, and you will quickly exhaust the limited number of worker threads available for that quality-of-service (QoS). This can result in unpredictable behaviors should the OS need to avail itself of a queue of the same QoS. Thus, you must be very careful to avoid this thread explosion.
See a discussion on thread explosion WWDC 2015 Building Responsive and Efficient Apps with GCD and again in WWDC 2016 Concurrent Programming With GCD in Swift 3.
Is there any way to properly handle this situation?
It is hard to answer that in the abstract. Different situations call for different solutions.
In the case of thread explosion, the solution is to constrain the degree of concurrency using concurrentPerform (limiting the concurrency to the number of cores on your device). Or we use operation queues and their maxConcurrentOperationCount to limit the degree of concurrency to something reasonable. There are other patterns, too, but the idea is to constrain concurrency to something suitable for the device in question.
But if you're just dispatching a large number of tasks to a serial queue, there's not much you can do (other than looking for parallelism opportunities, to make efficient use of all of CPU’s cores). But that's OK, as that is the whole purpose of a queue, to let it perform tasks in the order they were submitted, even if the queue can't keep up. It wouldn’t be a “queue” if it didn’t follow this FIFO sort of pattern.
Now if dealing with real-time data that cannot be processed quickly enough, you have a different problem. In that case, you might want to decouple the capture of the input from the processing and decide how to you want to handle it. E.g. if you can't keep up with real-time processing of a video, for example, you have a choice. Either you start dropping frames or process the data asynchronously/later. You just have to decide what is right for your use case. We cannot answer this question in the abstract.

Why POSIX standardize semaphore as system call but leave mutex and condition variable to Pthread (user level)

I came up with this strange question which haunted me. Why POSIX standardize support for semaphore as syscall but leave condition variable and mutex to pthread library?
What's the division of responsibility here? Why semaphore is not standardized in Pthread package? Why the syscall for synchronization that POSIX standardize is semaphore but not mutex, condition variable?
Don't know. Guess performance is the concern for not implementing mutex as syscall. (Atomic hardware instructions are unprivileged so implementing them at user level is possible. Even though Linux provide futex, it is actually trying to optimize spin lock into two phase lock, towards sleep lock). And the reason for semaphore is that semaphore can be manipulated by different process, compared to the fact that mutex can only be unlocked by the process that hold it? Semaphore's V operation allows process waiting for it unblocked. So semaphore is kept by kernel, and semaphore's id is like the file descriptor, a capability given out by kernel, which makes it a syscall but not purely user level package.
But what about condition variable? Any reason to specify it in Pthread but not syscall level? Because it is stateless and originates from monitor, which is purely stateless programming construct, so it can be implemented using mutex?
Thanks!
Short answer: semaphores and pthreads have separate histories.
Yes: semaphores can be used between processes where pthreads stuff is (generally) all within the current process, or between processes which share memory.
From a performance perspective: a quick poke (on my x86_64) tells me that sem_wait() and sem_post() use straightforward lock cmpxchg instructions, doing syscall only to suspend/wake-up a thread. That is essentially the same as a pthread_mutex_t -- when the semaphore is used as a mutex.
Obviously a semaphore can do things that a mutex and a condition variable do not do, and you can use unnamed semaphores within a process -- sem_init() with pshared=0.
I guess the pthread developers decided that specifying a pthread_sema_t would be unnecessary duplication. Sadly, it does leave room for doubt that the (more general) semaphore might have performance issues even when used only within a process :-( Or, indeed, some doubt that semaphore and pthread stuff always play nicely together :-(

GCD vs #synchronized vs NSLock

Can someone give a rundown of the benefits and drawbacks of these 3 systems in how they relate to thread safety?
From watching more recent WWDC videos, I get the feeling that Apple is pushing the usage of GCD to create performant reader-writer that are thread safe.
What's the idea/backing behind this? Is it the time to access a lock having to enter the kernel that leads to this GCD push, and shying away from #synchronized and NSLock?
Are #synchronized and NSLock being pushed out of what would be considered best practice, or is there still a place for them?
There are many many details that could be discussed at great length in regards to this. But, at the core:
These always require a lock to be taken somewhere or somehow:
#synchronized(...) { ... }
[lock lock];
Locks are very expensive for the reasons you mention; they necessarily consume kernel resources. (The #synchronized() case actually may avoid kernel locks these days, but it is a hash based exclusion mechanism and that, in itself, is expensive).
And these do not always require a lock (but sometimes maybe do):
dispatch_sync(...concurrent q...., ^{ ... });
dispatch_async(...queue of any kind...., ^{ ... });
There is a fast path through the dispatch functions that are effectively lockless (though they will use test-and-set atomic primitives that can cause performance issues under load).
The end result is that a synchronous dispatch to a concurrent queue can effectively be treated as "execute this on this thread right now". A synchronous dispatch to a serial queue can do the atomic test-and-set to test if the queue is processing, mark it as busy, and, if it wasn't busy, execute the block on the calling thread immediately.
Asynchronous dispatches can be similarly as fast, though asynchronous dispatch requires copying the block (which can be very cheap, but something to think about).
In general, GCD can do anything a lock can do, can do it at least -- if not more -- efficiently, and you can use the GCD APIs to go way beyond just simple locking (using a semaphore as a computation throttle, for example).
BTW: If your tasks are relatively coarse grained, have a look at NSOperationQueue and NSOperation.

Delphi - What happens with un-freed (but terminated) thread when application exits?

I have multithreaded application and I've got a little problem when application ends: I can correctly terminate the thread by calling TThread.Terminate method in Form1.OnDestroy event handler, but the termination does take some time and so I can't free the memory (by TThread.Free method).
Unfortunately for some other reason I must have TThread.FreeOnTerminate property set to false, so the thread object isn't destroyed automatically after thread termination.
My question is probably a little silly and I should have known it a long time ago, but is this ok and the thread will be destroyed automatically (since the application just ends), or is it a problem and the memory would be "lost"? Thanks a lot for explanation.
You should wait for the thread to terminate before you begin the process off shutting down the rest of your application, otherwise shared resources may be freed under the threads feet, possibly leading to a string of access violations. After you have waited for thread termination, then you can free it. In fact, that's what the TThread destructor does for you.
If there are no shared resources, then sure, let it die by itself. Even if the thread terminates after the main thread, all that is required is that all your threads exit for the program to terminate. Any memory associated with the thread's object will just get cleaned up and given back to the OS with everything else.
BUT, be careful! If your thread is taking a while to exit, it can lead to a zombie process sitting there churning away without a GUI. That is why it is very important to check the Terminated flag very often in the thread loop, and exit the thread.
N#
Your question is not silly or simple - read the MSDN article. All in all, if you want to be on the safe side you are better to wait a background thread to terminate before exiting an application.
The thread will eventually terminate and Windows will clean up any memory left over. However, you might as well just wait for the thread to terminate, because that is exactly what Windows will do anyway. Your application may appear to have shut down because all windows may have been closed/hidden, but the application process won't terminate until all threads have finished...
When a process terminates the OS will reclaim all allocated memory and will close all open handles. You don't need to worry about MEMORY*) that leaks in the very special event of shutting down the application. The OS will also close all your open handles**), at least theoretically. All those taken into account, it might be safe for you to simply terminate your thread (using TerminateThread(MyThread.Handle)) from your forms destructor, before killing other shared resources. Ask yourself those questions:
What's the thread doing? Is it safe to terminate it at any time? Example: If the thread is doing any writing to disk, it's unsafe to just kill it, because you might live files on disk in an inconsistant state.
Are you using any resources that aren't automatically freed by Windows? Can't think of an good example here...
If you're on the safe side with both, you can use TerminateThread and not wait for the thread to naturally terminate. A safer approach might a combined approach, maybe you should give the thread a chance to naturally terminate and, if it didn't terminate withing 5 seconds, force-terminate it.
*) I'm talking about memory you can prove only leaks on process termination, like threads you kill without giving them a chance to properly shut down, or global singleton classes you don't free. All other unaccounted memory needs to be tracked down and fixed, because it's an bug.
**) Unfortunately the Windows OS is not bug-free. Example: Anyone that worked with serial devices on the Windows platform knows how easy it is to get the serial-device in a "locked" state, requiring an restart to get it working again. Technically that's also an Handle, end-processing the application that locked it should unlock it.
why you dont increment a variable when creating the thread, and on the destroy event wait until thread finish, decrement the variable, and on applicationterminate just do Application.processmessages ?
why your thread isn't freeonterminate=true ? all shared resources can be handled into a critical section.
best regards,

Resources