Proper threadpool using pthreads

I am trying to write a custom threadpool suited to my purpose using pthreads, and I am new to pthreads. I read these tutorials online (POSIX Threads Programming and Linux Tutorial Posix Threads) and they were quite helpful, but I still have some (maybe silly) doubts regarding mutexes and condition variables:
1. What is the scope of a mutex? Will a global mutex lock all the global variables so that only one thread can access them at a time? If I have two global mutexes, would they lock the same set of variables? What about a mutex that is declared inside a class or a function; what happens when I lock/unlock it?
2. If I only plan to read a global variable, and never modify it, do I still need a mutex lock?
3. If I understand correctly, a condition variable is used to wake up other threads that are sleeping (blocked in pthread_cond_wait()) on some condition. The wake-up call to the sleeping threads is issued by pthread_cond_signal() or pthread_cond_broadcast() from some other thread. How is the flow of control supposed to work so that all, or just one, of the threads wakes up to do the work and then waits until the next piece of work is available? I am particularly interested in a scenario with 4 threads.
4. Is there a way to set the affinity of a thread to a particular processor core before it is created (so that it starts execution on the desired core and no core migration occurs after creation)?
I am sorry if the questions look silly, but as I said, I am new to this. Any help, comments, code or pointers to good resources are appreciated. Thanks in advance.

That's a lot of questions. A few answers.
(1a) The scope of a mutex is whatever you program it to be. In that sense it is no different from any other kind of variable.
(1b) A global mutex will protect whatever variables you program it to protect. I think from your other questions you might have a fundamental misunderstanding here. There is nothing magical about mutexes. You can't just declare one and say "OK, protect these variables"; you have to incorporate the mutex into your code. So if you have two functions that use variable X, and one does a mutex lock/unlock around any changes to the variable while the other function completely ignores that a mutex even exists, you really aren't protecting anything. The best example I can think of is advisory file locks: one program can use them, but if another doesn't, then that file isn't locked.
(1c) As a rule, don't have multiple mutexes locking the same data. It is an invitation to problems. Again, the use of mutexes depends on programmed cooperation. If function A is protecting data B with mutex C while function D is protecting data B with mutex E, then data B isn't protected at all. Function A can hold the lock on mutex C, but since function D pays no attention to it, it will just overwrite data B anyway.
(1d) Basic scoping rules apply.
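To make (1b) and (1c) concrete, here is a minimal sketch (not from the question; the names are illustrative) showing that a mutex only protects a variable because every piece of code touching that variable agrees to take the same mutex first:

#include <pthread.h>

static int counter = 0;                                         /* shared data      */
static pthread_mutex_t counter_mtx = PTHREAD_MUTEX_INITIALIZER; /* protects counter */

void increment_safely(void) {
    pthread_mutex_lock(&counter_mtx);    /* follows the convention */
    counter++;
    pthread_mutex_unlock(&counter_mtx);
}

void increment_unsafely(void) {
    counter++;   /* ignores the mutex, so counter is not actually protected */
}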
(2) No. If the variable isn't going to change in any way that would make it inconsistent among threads then you don't need to lock it.
(3) There are a number of answers on SO that go into considerable detail on this. Search around a bit.
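As a starting point, here is a minimal sketch (not from the original answer) of the usual condition-variable pattern for a pool of workers, with 4 workers as in the question. The work queue is reduced to a simple counter of pending jobs to keep it short; all names are illustrative.

#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  work_available = PTHREAD_COND_INITIALIZER;
static int pending_jobs = 0;
static int shutting_down = 0;

static void *worker(void *arg) {
    long id = (long)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        /* Always wait in a loop: spurious wakeups are allowed, and another
           worker may have taken the job between the signal and the wakeup. */
        while (pending_jobs == 0 && !shutting_down)
            pthread_cond_wait(&work_available, &lock);
        if (shutting_down && pending_jobs == 0) {
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        pending_jobs--;
        pthread_mutex_unlock(&lock);
        printf("worker %ld: doing one job\n", id);  /* do the real work unlocked */
    }
}

int main(void) {
    pthread_t tid[NUM_WORKERS];
    for (long i = 0; i < NUM_WORKERS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);

    for (int i = 0; i < 10; i++) {             /* producer: queue 10 jobs */
        pthread_mutex_lock(&lock);
        pending_jobs++;
        pthread_cond_signal(&work_available);  /* wake one worker (broadcast wakes all) */
        pthread_mutex_unlock(&lock);
    }

    pthread_mutex_lock(&lock);                 /* shut down: wake everyone */
    shutting_down = 1;
    pthread_cond_broadcast(&work_available);
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}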
(4) Not that I am aware.
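For what it's worth, glibc on Linux does offer a non-portable way to do this before the thread starts, via pthread_attr_setaffinity_np(). A minimal sketch, assuming Linux and _GNU_SOURCE (this is a GNU extension, not portable POSIX):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

static void *work(void *arg) { return arg; }

int spawn_pinned_thread(pthread_t *tid) {
    cpu_set_t cpus;
    pthread_attr_t attr;

    CPU_ZERO(&cpus);
    CPU_SET(0, &cpus);                                    /* pin to core 0 */
    pthread_attr_init(&attr);
    pthread_attr_setaffinity_np(&attr, sizeof cpus, &cpus);
    int rc = pthread_create(tid, &attr, work, NULL);      /* thread starts on core 0 */
    pthread_attr_destroy(&attr);
    return rc;
}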

Is the objc_msgSend function thread-safe?
I checked the source code and found that it is written in assembly, but no lock is taken at the beginning of the function. Won't this cause problems when it is called from multiple threads?
The details of this answer depend on the specifics of what you mean by "thread-safe", but for most definitions of "thread-safe" (e.g., calling this code from multiple threads will leave your program in a consistent state without unintended interactions), then the answer is yes, objc_msgSend is thread-safe.
The specifics of how this is done are complex, but some context:
A function is inherently thread-safe if it does not modify state shared across threads in a way that could lead to unintended consequences
For example,
func add(_ a: Int, _ b: Int) -> Int {
a + b
}
is inherently thread-safe — you can't call it in a way that would cause it to mutate shared state somehow
A function is thread-safe if it does modify state shared across threads, but in a way that is coordinated with other threads reading that state
objc_msgSend falls into the latter camp: there are cases in which it does need to modify shared state (entries in the method cache), but it does so in a coordinated way.
Some helpful references, written by Mike Ash (who, last I knew, was still working on the Obj-C runtime at Apple):
Friday Q&A 2017-06-30: Dissecting objc_msgSend on ARM64
Friday Q&A 2015-05-29: Concurrent Memory Deallocation in the Objective-C Runtime
objc_msgSend's New Prototype
In "Dissecting objc_msgSend on ARM64", Mike goes through the specifics of how objc_msgSend works, line-by-line. While the code has changed a bit since 2017, the specifics are all largely the same:
1. It gets the class of the target object passed in
2. It finds the method cache for that class
3. It uses the selector called on the target object to look up a method implementation in that cache (if the method is not found in the cache, it falls back to a slower lookup and possibly inserts the result into the cache)
4. It calls the implementation of that method directly
Of these operations, (1), (2), and (4) are most obviously thread-safe: (1) and (4) are inherently thread-safe (and don't rely on other threads at all), and (2) performs an atomic read that is safe across threads.
(3) is the trickier operation, since the method cache is shared for all instances of a class, and if you're calling a method on multiple instances of the same class at the same time, they might be digging through the method cache simultaneously. "Concurrent Memory Deallocation in the Objective-C Runtime" goes into a bit more detail on how this is done, but the general gist is that there are plenty of cool synchronization tricks that you can use without needing to lock across threads. The full explanation is out of scope (and the article goes into a lot of detail anyway), but here is a comment on "Dissecting objc_msgSend on ARM64" from Blaine Garst (one of the original authors of many of the features you might recognize as part of the Objective-C runtime):
Mike, you missed explaining the greatest secret of the messenger, one that Matt and Dave and I cooked up on the black couches at Seaport: how it is that multiple clients can read a mutable hash table without synchronization. It stumps people to this day. The answer is that the messenger uses a specialized garbage collector! I had put locks in when we first tried to do multi-threaded ObjC and it was miserable and then I read a paper by David Black where the kernel would reset the PC back to the start of a critical range for a mutex, and, well, we use the absence of a PC in critical ranges to deduce safety of reclaiming stale caches.
So, this operation is coordinated across threads too.
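To give a feel for this kind of lock-free coordination (this is an analogy only, not the actual runtime code), here is a highly simplified C11 sketch: readers take one atomic snapshot of a cache pointer, while a single writer builds a new table off to the side and publishes it with one store. Reclaiming the old table safely is exactly the hard part the quoted comment alludes to, and is omitted here.

#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

typedef struct cache {
    size_t count;
    int    entries[16];
} cache_t;

static _Atomic(cache_t *) g_cache;   /* readers load this pointer atomically */

/* Reader: a single acquire load gives a consistent snapshot; no lock needed. */
int cache_lookup(size_t i) {
    cache_t *c = atomic_load_explicit(&g_cache, memory_order_acquire);
    return (c && i < c->count) ? c->entries[i] : -1;
}

/* Writer (assumes one writer at a time for simplicity): copy, modify, then
   publish the fresh table with a single release store. */
void cache_insert(int value) {
    cache_t *old = atomic_load_explicit(&g_cache, memory_order_acquire);
    cache_t *fresh = malloc(sizeof *fresh);
    if (old) *fresh = *old; else memset(fresh, 0, sizeof *fresh);
    if (fresh->count < 16) fresh->entries[fresh->count++] = value;
    atomic_store_explicit(&g_cache, fresh, memory_order_release);
    /* "old" is deliberately leaked in this sketch; freeing it safely is the
       interesting problem the linked articles describe. */
}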
All of the work that objc_msgSend does is safe across thread boundaries, so much so that you can call methods on the same exact object across any number of threads at the exact same time, and it will work just fine. (Whether the implementation of the method is thread-safe is something else entirely.)
And in the end, it has to be. objc_msgSend is the backbone of pretty much all of Objective-C. If it wasn't thread-safe, it would be impossible to write multi-threaded Objective-C code. But luckily it is, and you can rely on it working at scale.

Multi-thread data access issue, @synchronized & serial queue

As you may have experienced, accessing non-thread-safe variables is a big headache. On iOS one simple solution is to use the @synchronized keyword, which wraps the access in a lock so that only one thread can touch the data at a time. The disadvantages are:
Locking too much will greatly reduce app performance, especially when invoked from the main thread.
Deadlock can occur when the logic becomes complex.
Based on the above considerations, we prefer to use a serial queue: each thread-safe critical operation is appended to the end of the queue. It is a great solution, but the problem is that all access interfaces then have to be designed in an asynchronous style, such as the following one.
-(id)objectForKey:(NSString *)key;
People who use this class are reluctant to design it this way. Anyone who has experience in this area, please share and discuss.
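(For illustration, here is a minimal sketch of the serial-queue approach in plain C with libdispatch; it is not from the question, and the queue label and variable names are made up. Using dispatch_sync for reads keeps a synchronous -objectForKey:-style interface while still funneling every access through one serial queue. Compile with clang, which provides the blocks extension on Apple platforms.)

#include <dispatch/dispatch.h>
#include <stdio.h>

static dispatch_queue_t cache_queue;
static int shared_value;

static void cache_init(void) {
    cache_queue = dispatch_queue_create("com.example.cache", DISPATCH_QUEUE_SERIAL);
}

static void cache_set(int v) {
    /* Writes can be asynchronous; the serial queue orders them. */
    dispatch_async(cache_queue, ^{ shared_value = v; });
}

static int cache_get(void) {
    /* A synchronous read preserves the familiar getter interface. */
    __block int v;
    dispatch_sync(cache_queue, ^{ v = shared_value; });
    return v;
}

int main(void) {
    cache_init();
    cache_set(42);
    printf("%d\n", cache_get());
    return 0;
}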
Our final solution is to use NSUserDefaults to store small pieces of data; large cached data goes into files that we maintain ourselves.
Per the Apple docs, the advantage of NSUserDefaults is that it is thread-safe and synchronizes its data periodically.

What exactly does a pthread mutex lock out?

I'm assuming this has been asked on here, but I can't find this particular question. Does it just lock the part of the code in between the lock and unlock, or does it lock global variables? Like for this code
pthread_mutex_lock(&mtx);
bitmap[index] = 1;
pthread_mutex_unlock(&mtx);
does the mutex just lock that line of code? Is there a way to lock specific variables without locking the whole section of code that uses them?
No, it locks the actual mutex variable.
Any piece of code that attempts to lock that mutex while it's locked will block until it's unlocked.
If that is the only piece of code that locks the mutex then, yes, you can say it just protects that line. But that's not necessarily the case.
A mutex is used to serialise access to a resource. Whether that resource is considered a line of code or (more likely in this case) the bitmap array is down to where the mutex is locked and unlocked.
Chances are you have a few different areas where the bitmap array is read or modified and you should probably ensure they're all protected by the mutex.
No there is no way to just lock a variable.
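For example, a minimal sketch (names made up) where every read and write of the bitmap goes through the same mutex, so the mutex effectively serializes access to the array rather than to one particular line of code:

#include <pthread.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static char bitmap[1024];

void set_bit(int index) {
    pthread_mutex_lock(&mtx);
    bitmap[index] = 1;
    pthread_mutex_unlock(&mtx);
}

int test_bit(int index) {
    int value;
    pthread_mutex_lock(&mtx);     /* same mutex as set_bit, so the two can't interleave */
    value = bitmap[index];
    pthread_mutex_unlock(&mtx);
    return value;
}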
A mutex is just an abstraction. Whenever you want a variable to be left alone while you are working on it, declare a mutex for that variable and hold the lock for as long as you need.
There is no direct relation between the mutex and the variable you want to lock; it's up to the programmer. Mutexes are most commonly used in multi-threaded environments.
Whenever a variable (a resource; in programming, resources are manipulated through variables) is shared across concurrently running processes (to the kernel, threads of the same process are a group of processes sharing the same address space and some resources), and the programmer wants the variable to be accessed by exactly one process (or thread) at a time, he has to put the blocks of code that access the variable, in all of those processes (or threads), between a pthread_mutex_lock/pthread_mutex_unlock pair. Then, whenever the variable is being accessed in any process (or thread), the mutex is held, so any other process that wants to access the variable has to wait until the mutex is unlocked. That achieves the programmer's goal.

Check if pthread thread is blocking

Here's the situation: I have a thread running that is partially controlled by code that I don't own. I started the thread, so I have its thread id, but then I passed it off to some other code. I need to be able to tell, from another thread that I do control, whether that other code has currently caused the thread to block. Is there a way to do this in pthreads? I think I'm looking for something equivalent to the getState() method in Java's Thread class (http://download.oracle.com/javase/6/docs/api/java/lang/Thread.html#getState() ).
Edit: It's OK if the solution is platform dependent. I've already found a solution for Linux using the /proc file system.
You could write wrappers for some of the pthreads functions, which would simply update some state information before/after calling the original functions. That would allow you to keep track of which threads are running, when they're acquiring or holding mutexes (and which ones), when they're waiting on which condition variables, and so on.
Of course, this only tells you when they're blocked on pthreads synchronization objects -- it won't tell you when they're blocking on something else.
Before you hand the thread off to some other code, set a flag protected by a mutex. When the thread returns from the code you don't control, clear the flag protected by the mutex. You can then check, from wherever you need to, whether the thread is in the code you don't control.
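A minimal sketch of that flag approach (code_you_dont_control is a made-up placeholder for the third-party call):

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t state_mtx = PTHREAD_MUTEX_INITIALIZER;
static bool in_uncontrolled_code = false;

extern void code_you_dont_control(void);   /* hypothetical third-party entry point */

void *thread_main(void *arg) {
    (void)arg;

    pthread_mutex_lock(&state_mtx);
    in_uncontrolled_code = true;           /* set flag before handing off   */
    pthread_mutex_unlock(&state_mtx);

    code_you_dont_control();               /* may or may not block in here  */

    pthread_mutex_lock(&state_mtx);
    in_uncontrolled_code = false;          /* clear flag once it returns    */
    pthread_mutex_unlock(&state_mtx);
    return NULL;
}

/* Called from the thread you do control. */
bool is_in_uncontrolled_code(void) {
    pthread_mutex_lock(&state_mtx);
    bool v = in_uncontrolled_code;
    pthread_mutex_unlock(&state_mtx);
    return v;
}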
From outside the code, there is no distinction between blocked and not-blocked. If you literally checked the state of the thread, you would get nonsensical results.
For example, consider two library implementations.
A: We do all the work in the calling thread.
B: We dispatch a worker thread to do the work. The calling thread blocks until the worker is done.
In both cases A and B the code you don't control is equally making forward progress. Your 'getstate' idea would provide different results. So it's not what you want.

Clone a lua state

Recently I have run into some difficulties developing with C++ and Lua. My situation is: for certain reasons, there can be thousands of Lua states in my C++ program, but these states should all be identical just after initialization. Of course, I can call luaL_openlibs() and luaL_loadfile() for each state, but that is pretty heavy (in fact, it takes a rather long time even to initialize just one state). So I am wondering about the following scheme: what about keeping one separate Lua state (the only one that has to be initialized) and then cloning it to produce the other states? Is that possible?
When I started with Lua, like you I once wrote a program with thousands of states, had the same problem and thoughts, until I realized I was doing it totally wrong :)
Lua has coroutines and threads, you need to use these features to do what you need. They can be a bit tricky at first but you should be able to understand them in a few days, it'll be well worth your time.
Take a look at the following Lua API call; I think it is exactly what you need.
lua_State *lua_newthread (lua_State *L);
This creates a new thread, pushes it on the stack, and returns a pointer to a lua_State that represents this new thread. The new thread returned by this function shares with the original thread its global environment, but has an independent execution stack.
There is no explicit function to close or to destroy a thread. Threads are subject to garbage collection, like any Lua object.
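A minimal sketch of this approach (illustrative only): one master state pays the initialization cost once, and each lua_newthread shares its globals while running on its own stack.

#include <stdio.h>
#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>

int main(void) {
    lua_State *L = luaL_newstate();
    luaL_openlibs(L);                  /* pay the initialization cost once   */

    lua_State *co = lua_newthread(L);  /* shares globals with L              */
    luaL_dostring(co, "x = 42");       /* runs with its own execution stack  */

    lua_getglobal(L, "x");             /* the global is visible from L too   */
    printf("x = %d\n", (int)lua_tointeger(L, -1));

    lua_close(L);
    return 0;
}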
Unfortunately, no.
You could try Pluto to serialize the whole state. It does work pretty well, but in most cases it costs roughly the same time as normal initialization.
I think it will be hard to do exactly what you're requesting here, given that a plain copy of the state would contain internal references as well as, potentially, pointers to external data. One would need to reconstruct those internal references in order not to end up with multiple states pointing into the clone source.
You could serialize out the state after one starts up and then load that into subsequent states. If initialization is really expensive, this might be worth it.
I think the closest thing to doing what you want that would be relatively easy would be to put the states in different processes by initializing one state and then forking, however your operating system supports it:
http://en.wikipedia.org/wiki/Fork_(operating_system)
If you want something available from within Lua, you could try something like this:
How do you construct a read-write pipe with lua?
