Can I use pthread mutexes in the destructor function for thread-specific data? - pthreads

I'm allocating my pthread thread-specific data from a fixed-size global pool that's controlled by a mutex. (The code in question is not permitted to allocate memory dynamically; all the memory it's allowed to use is provided by the caller as a single buffer. pthreads might allocate memory, I couldn't say, but this doesn't mean that my code is allowed to.)
This is easy to handle when creating the data, because the function can check the result of pthread_getspecific: if it returns NULL, the global pool's mutex can be taken there and then, the pool entry acquired, and the value set using pthread_setspecific.
When the thread is destroyed, the destructor function (as per pthread_key_create) is called, but the pthreads manual is a bit vague about any restrictions that might be in place.
(I can't impose any requirements on the thread code, such as needing it to call a destructor manually before it exits. So, I could leave the data allocated, and maybe treat the pool as some kind of cache, reusing entries on an LRU basis once it becomes full -- and this is probably the approach I'd take on Windows when using the native API -- but it would be neatest to have the per-thread data correctly freed when each thread is destroyed.)
Can I just take the mutex in the destructor? There's no problem with thread destruction being delayed a bit, should some other thread have the mutex taken at that point. But is this guaranteed to work? My worry is that the thread may "no longer exist" at that point. I use quotes, because of course it certainly exists if it's still running code! -- but will it exist enough to permit a mutex to be acquired? Is this documented anywhere?

The pthread_key_create() rationale seems to justify doing whatever you want from a destructor, provided you keep signal handlers from calling pthread_exit():
There is no notion of a destructor-safe function. If an application does not call pthread_exit() from a signal handler, or if it blocks any signal whose handler may call pthread_exit() while calling async-unsafe functions, all functions may be safely called from destructors.
Do note, however, that this section is informative, not normative.
The thread's existence or non-existence will most likely not affect the mutex in the least, unless the mutex is error-checking. Even then, the kernel is still scheduling whatever thread your destructor is being run on, so there should definitely be enough thread to go around.

Related

Delphi - Why is TObject.InitInstance public?

I'm somewhat new to Delphi, and this question is just me being curious. (I also just tried using it by accident only to discover I'm not supposed to.)
If you look at the documentation for TObject.InitInstance it tells you not to use it unless you're overriding NewInstance. The method is also public. Why not make it protected if the user is never supposed to call it?
Since I was around when this whole Delphi thing got started back around mid-1992, there are likely several answers to this question. If you look at the original declaration for TObject in Delphi 1, there weren't any protected/private members on TObject. That was because very early on in the development of Delphi and in concert with the introduction of exceptions to the language, exceptions were allocated from a different heap than other objects. This was the genesis of the NewInstance/InitInstance/CleanupInstance/FreeInstance functions. Overriding these functions on your class types you can literally control where an object is allocated.
In recent years I've used this functionality to create a cache of object instances that are literally "recycled". By intercepting NewInstance and FreeInstance, I created a system where instances are not returned to the heap upon de-allocation, rather they are placed on a lock-free/low-lock linked list. This makes allocating/freeing instances of a particular type much faster and eliminates a lot of excursions into the memory manager.
By having InitInstance public (the opposite of which is CleanupInstance), this would allow those methods to be called from other utility functions. In the above case I mentioned, InitInstance could be called on an existing block of memory without having to be called only from NewInstance. Suppose NewInstance calls a general purpose function that manages the aforementioned cache. The "scope" of the class instance is lost so the only way to call InitInstance is of it were public.
One of these days, we'll likely ship the code that does what I described above... for now it's part of an internal "research" project.
Oh, as an aside and also a bit of a history lesson... Prior to the Delphi 1 release, the design of how Exception instances were allocated/freed was returned to using the same heap as all the other objects. Because of an overall collective misstep it was assumed that we needed to allocate all Exception object instances to "protect" the Out of memory case. We reasoned that if we try and raise an exception because the memory manager was "out of memory", how in the blazes would we allocate the exception instance!? We already know there is no memory at that point! So we decided that a separate heap was necessary for all exceptions... until either Chuck Jazdzewski or Anders Heijlsberg (I forget exactly which one), figured out a simple, rather clever solution... Just pre-allocate the out of memory exception on startup! We still needed to control whether or not the exception should ever actually be freed (Exception instances are automatically freed once handled), so the whole NewInstance/FreeInstance mechanism remained.
Well never say never. In the VCL too much stuff is private and not virtual as it is, so I kinda like the fact that this stuff is public.
It isn't really necessary for normal use, but in specific cases, you might use it to allocate objects in bulk. NewInstance reserves a bit of memory for the object and then calls InitInstance to initialize it. You could write a piece of code that allocates memory for a great number of objects in one go, and then calls InitInstance for different parts of that large block to initialize different blocks in it. Such an implementation could be the base for a flyweight pattern implementation.
Normally you wouln't need such a thing at all, but it's nice that you can if you really want/need to.
How it works?
The fun thing is: a constructor in Delphi is just some method. The Create method itself doesn't do anything special. If you look at it, it is just a method as any other. It's even empty in TObject!
You can even call it on an instance (call MyObject.Create instead of TMyObject.Create), and it won't return a new object at all. The key is in the constructor keyword. That tells the compiler, that before executing the TAnyClass.Create method, it should also construct an actual object instance.
That construction means basically calling NewInstance. NewInstance allocates a piece of memory for the data of the object. After that, it calls InitInstance to do some special initialization of that memory, starting with clearing it (filling with zeroes).
Allocating memory is a relatively expensive task. A memory manager (compiled into your application) needs to find a free piece of memory and assign it to your object. If it doesn't have enough memory available, it needs to make a request to Windows to give it some more. If you have thousands or even millions of objects to create, then this can be inefficient.
In those rare cases, you could decide to allocate the memory for all those objects in one go. In that case you won't call the constructor at all, because you don't want to call NewInstance (because it would allocate extra memory). Instead, you can call InitInstance yourself to initialize pieces of your big chunk of memory.
Anyway, this is just a hypotheses of the reason. Maybe there isn't a reason at all. I've seen so many irrationally applied visibility levels in the VCL. Maybe they just didn't think about it at all. ;)
It gives developers a way to create object not using NewInstance (memory from stack/memory pool)

pthread mutex: get state

I was looking through some code that provides a C/C++ wrapper for a pthread mutex. The code keeps a shadow variable for signaled/not signaled condition. The code also ignores return values from functions like pthread_mutex_lock and pthread_mutex_trylock, so the shadow variable may not accurately reflect the state of the mutex (ignoring the minor race condition).
Does pthread provide a way to query a mutex for its state? A quick read of the pthread API does not appear to offer one. I also don't see anything interesting that operates on pthread_mutexattr_t.
Or should one use trylock, rely upon EBUSY, and give up ownership if acquired?
Thanks in advance.
There is no such function because there would be no point. If you queried the state of a mutex without trying to acquire it, as pthread_mutex_trylock() does, then the result you get could be invalidated immediately by another thread changing that mutex's state.

Wrapper class for thread-safe objects

I have recently played around with one demo opensource project for the basic functionality of the INDY10 TCP/IP server and stumbled upon the problem of internal multitasking implementation of INDY and its interaction with VCL components. Since there are many different topics in SO on the subject, I decided to make a simple client-server application and test some of the solutions and approaches suggested, at least the ones that I understood correctly. Below I would like to summarize and review an approach that was previously suggested on SO, and if possible listen to your expert opinion on the subject.
Problem: Encapsulation the VCL for thread-safe usage inside an indy10-based client/server application.
Description of the Development Env.:
Delphi Version: Delphi® XE2 Version 16.0
INDY Version 10.5.8.0
O.S. Windows 7 (32Bit)
As mentioned in the article ([ Is the VCL Thread-safe?]) (sorry I do not have enough reputation to post the link) special care should be taken when one wishes to use any kind of VCL components inside a multithreaded (multitasking) application. VCL is not thread safe, but can be used in a thread safe way!
The how and the why usually depend on the application at hand but one can attempt to generalize a bit and suggest some kind of general approach to this problem. First of all, as in the case of INDY10, one does not need to be explicitly parallelizing his code, i.e. create and execute multiple threads, in order to expose VCL to deadlocks and data inter dependencies.
In every sclient-server application, the server has to be able to handle multiple requests simultaneously, so naturally, INDY10 internally implements this functionality. This would mean that the INDY10 set of classes are responsible to manage the program's thread creation, execution and destruction procedures internally.
The most obvious place where our code is exposed to the inner workings of INDY10 and hence possible thread conflicts, is the IdTCPServerExecute (TIdTCPServer onExecute event) method.
Naturally, INDY10 provides classes (wrappers) that ensure thread-safe program flow, but since I did not manage to get enough explanation on their application and usage, I prefer a custom made approach.
Below I summarize a method ( the suggested technique is based on a previous comment I found in SO How to use TIdThreadSafe class from Indy10 ) that attempts (and presumably succeeds) in dealing with this problem:
The question I tackle below is: How to make a specific class "MyClass" ThreadSafe?
The main idea is to create kind of a wrapper class that encapsulates "MyClass" and queues the threads that try to access it in First-In-First-Out principle. The underlying objects that are used for synchronization are [Windows's Critical Section Objects.].
In the context of a client-server application, "MyClass" will contain all thread unsafe functionality of our server, so we will try to ensure that those procedures and functions are not executed by more than one working thread simultaneously. This naturally means loss of parallelism of our code, but since the approach is simple and seems to be , in some cases this maybe a useful approach.
Wrapper class Implementation:
constructor TThreadSafeObject<T>.Create(originalObject: T);
begin
tsObject := originalObject; // pass it already instantiated instance of MyClass
tsCriticalSection:= TCriticalSection.Create; // Critical section Object
end;
destructor TThreadSafeObject<T>.Destroy();
begin
FreeAndNil(tsObject);
FreeAndNil(tsCriticalSection);
inherited Destroy;
end;
function TThreadSafeObject<T>.Lock(): T;
begin
tsCriticalSection.Enter;
result:=tsObject;
end;
procedure TThreadSafeObject<T>.Unlock();
begin
tsCriticalSection.Leave;
end;
procedure TThreadSafeObject<T>.FreeOwnership();
begin
FreeAndNil(tsObject);
FreeAndNil(tsCriticalSection);
end;
MyClass Definition:
MyClass = class
public
procedure drawRandomBitmap(abitmap: TBitmap); //Draw Random Lines on TCanvas
function decToBin(i: LongInt): String; //convert decimal number to Bin.
procedure addLineToMemo(aLine: String; MemoFld: TMemo); // output message to TMemo
function randomColor(): TColor;
end;
Usage:
Since threads execute in order and wait for the thread which has the current ownership of the critical section to finish (tsCriticalSection.Enter; and tsCriticalSection.Leave;) it is logical that if you want to manage that ownership relay, you need one unique instance TThreadSafeObject (you can consider using the singleton pattern). so include:
tsMyclass:= TThreadSafeObject<MyClass>.Create(MyClass.Create);
in Form.Create and
tsMyclass.Destroy;
in Form.Close; Here tsMyclass is a global variable of type MyClass.
Usage:
Regarding the usage of MyClass try the following:
with tsMyclass.Lock do
try
addLineToMemo('MemoLine1', Memo1);
addLineToMemo('MemoLine2', Memo1);
addLineToMemo('MemoLine3', Memo1);
finally
// release ownership
tsMyclass.unlock;
end;
, where Memo1 is an instance of a TMemo component on the form.
With this, we are supposed to ensure that anything that happens when tsMyClass is locked
will be executed by only one thread at a time. An obvious drawback of this approach, however, is that since I have only one instance of tsMyclass, even if one thread is trying to draw for e.g. on the Canvas, while another is writing on the Memo, the first thread will have to wait for the second to finish and only then it will be able to carry out its job.
My questions here are:
Is the above suggested method correct? Am I still free of race
conditions or do I have some "loopholes" in the code, from where
data conflicts could occur?
How can one, in general, test for thread
unsafety of his/her applicaiton?
I would like to stress that the above approach is in no way my own doing. It is basically a summary of the solution found in 2. Nevertheless, I have decided to post again in an attempt to get some kind of closure on the topic or a kind of proof of validity for the suggested solution. Besides, repetition is mother of all knowledge, as they say.
With this, we are supposed to ensure that anything that happens when
tsMyClass is locked will be executed by only one thread at a time. An
obvious drawback of this approach, however, is that since I have only
one instance of tsMyclass, even if one thread is trying to draw for
e.g. on the Canvas, while another is writing on the Memo, the first
thread will have to wait for the second to finish and only then it
will be able to carry out its job.
I see one big problem here: the VCL (forms, drawing, etc...) lives on the main thread. Even if you block concurrent thread access, the updates need to be done in the context of the main thread. This is the part where you need to use Synhronize(), the big difference with a lock (Criticalsection) is that synchronized code is ran in the context of the main thread. The end result is basically the same, your threaded code is serialized and you lose the advantage of using threads in the first place.
Locking on the whole object can be much too coarse.
Imagine cases where some properties or methods are independent of others. If the lock works on a "global" level, many operations will be blocked needlessly.
From Reduce lock granularity – Concurrency optimization
So, how can we reduce lock granularity? With a short answer, by asking
for locks as less as possible. The basic idea is to use separate locks
to guard multiple independent state variables of a class, instead of
having only one lock in class scope.
First things first: You don't need to implement a LOCK for each of your objects, Delphi's done that for you with the TMonitor class:
TMonitor.Enter(WhateverObject);
try
// Your code goes here.
finally TMonitor.Leave(WhateverObject);
end;
just make sure you free the WhateverObject when your application shuts down, or else you'll run into a bug that I've opened on QC: http://qc.embarcadero.com/wc/qcmain.aspx?d=111795
Secondly, making an application multi-threading is a bit more involved. You can't just wrapp each call between Enter/Leave calls: your "locking" needs to take into account what the object does and what the access pattern is. Wrapping calls within Enter/Leave simply make sure that only one thread runs that method at any time, but race conditions are much more complex, and might arise from successive calls to your locked methods. Even those each method is locked, and only one thread ever called those methods at any given time, the state of the locked object might change between as a consequence of other thread's activity.
This kind of code would be just fine in a single-threaded application, but locking at method level is not enough when switching to multi-threaded:
if List.IndexOf(Something) = -1 then
List.Add(Something);

Do I need to wrap accesses to Int64's with a critical section?

I have code that logs execution times of routines by accessing QueryPerformanceCounter. Roughly:
var
FStart, FStop : Int64 ;
...
QueryPerformanceCounter (FStart) ;
... <code to be measured>
QueryPerformanceCounter (FStop) ;
<calculate FStop - FStart, update minimum and maximum execution times, etc>
Some of this logging code is inside threads, but on the other hand, there is a display UI that accesses the derived results. I figure the possibility exists of the VCL thread accessing the same variables that the logging code is also accessing. The VCL will only ever read the data (and a mangled read would not be too serious) but the logging code will read and write the data, sometimes from another thread.
I assume QueryPerformanceCounter itself is thread-safe.
The code has run happily without any sign of a problem, but I'm wondering if I need to wrap my accesses to the Int64 counters in a critical section?
I'm also wondering what the speed penalty of the critical section access is?
Any time you access multi-byte non-atomic data across thread when both reads and writes are involved, you need to serialize the access. Whether you use a critical section, mutex, semaphore, SRW lock, etc is up to you.

Difference between the WaitFor function for TMutex delphi and the equivalent in win32 API

The documentation of delphi says that the WaitFor function for TMutex and others sychronization objects wait until a handle of object is signaled.But this function also guarantee the ownership of the object for the caller?
Yes, the calling thread of a TMutex owns the mutex; the class is just a wrapper for the OS mutex object. See for yourself by inspecting SyncObjs.pas.
The same is not true for other synchronization objects, such as TCriticalSection. Any thread my call the Release method on such an object, not just the thread that called Acquire.
TMutex.Acquire is a wrapper around THandleObjects.WaitFor, which will call WaitForSingleObject OR CoWaitForMultipleHandles depending on the UseCOMWait contructor argument.
This may be very important, if you use STA COM objects in your application (you may do so without knowing, dbGO/ADO is COM, for instance) and you don't want to deadlock.
It's still a dangerous idea to enter a long/infinite wait in the main thread, 'cause the only method which correctly handles calls made via TThread.Synchronize is TThread.WaitFor and you may stall (or deadlock) your worker threads if you use the SyncObjs objects or WinAPI wait functions.
In commercial projects, I use a custom wait method, built upon the ideas from both THandleObjects.WaitFor AND TThread.WaitFor with optional alertable waiting (good for asynchronous IO but irreplaceable for the possibility to abort long waits).
Edit: further clarification regarding COM/OLE:
COM/OLE model (e.g. ADO) can use different threading models: STA (single-threaded) and MTA (multi or free-threaded).
By definition, the main GUI thread is initialized as STA, which means, the COM objects can use window messages for their asynchronous messaging (particulary when invoked from other threads, to safely synchronize). AFAIK, they may also use APC procedures.
There is a good reason for the CoWaitForMultipleHandles function to exist - see its' use in SyncObjs.pas THandleObject.WaitFor - depending on the threading model, it can process internal COM messages, while blocking on the wait handle.

Resources