Slow memory release (refcounted structure) - is my workaround a good way? - Delphi

In my program I can load a Catalog: ICatalog.
A Catalog here contains a lot of refcounted structures (ICollections of IItems, IElements, IRules, etc.).
When I want to change to another catalog,
I load a new Catalog,
but the automatic release of the previous ICatalog instance takes time, freezing my application for 2 seconds or more.
My question is:
I want to defer the release of the old (and no longer used) ICatalog instance to another thread.
I've not tested it yet, but I intend to create a new thread with:
ErazerThread.OldCatalog := Catalog;  // old catalog refcount jumps to 2
Catalog := LoadNewCatalog(...);      // old catalog refcount = 1
ErazerThread.Execute;                // just sets OldCatalog to nil
This way, I expect the release to occur in the thread, so my application no longer freezes.
Is it safe (and good practice)?
Do you have examples of existing code already performing release with a similar method?

I would let such a thread block on some threadsafe queue(*), and push the interfaces to release into that queue as IUnknowns.
Note however that if the releasing touches a lock that your memory manager uses (like a global heap manager lock), then this is futile, since your main thread will block on the first heap manager access.
With a heap manager that uses per-thread pools, allocating many items in one thread and releasing them in a different thread might also frustrate its algorithms for coalescing and reusing (small) blocks.
I still think the way you describe is generally sound when implemented properly, but the above is a theoretical perspective showing that there might be a link from the second thread back to the main thread via the heap manager.
(*) The simplest way is to add it to a TThreadList and use a TEvent to signal that an element was added.
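For illustration, here is the shape of that pattern sketched in C with pthreads (my sketch, not Delphi: the Delphi version would use a TThreadList plus a TEvent exactly as described, and item_release stands in for dropping the final interface reference):

#include <pthread.h>
#include <stdlib.h>

extern void item_release(void *item); // placeholder: drops the final reference

typedef struct node { void *item; struct node *next; } node_t;

static node_t *head = NULL;
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;

// Called from the main thread: hand the doomed reference to the worker.
void defer_release(void *item)
{
    node_t *n = malloc(sizeof *n);
    n->item = item;
    pthread_mutex_lock(&mu);
    n->next = head;
    head = n;
    pthread_cond_signal(&cv); // plays the role of the TEvent
    pthread_mutex_unlock(&mu);
}

// Worker thread: blocks until something is queued, then releases it.
void *releaser_thread(void *unused)
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&mu);
        while (head == NULL)
            pthread_cond_wait(&cv, &mu);
        node_t *n = head;
        head = n->next;
        pthread_mutex_unlock(&mu);
        item_release(n->item); // the slow teardown happens off the main thread
        free(n);
    }
    return NULL;
}

Note that the heap caveat above still applies: free(n) and the releases themselves will contend with the main thread if the allocator uses a single global lock.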

That looks OK, but don't call the thread's Execute method directly; that will run the thread object's code in the current thread instead of the one that the thread object creates. Call Start or Resume instead.

Related

Does the POSIX thread API offer a way to test if the calling thread already holds a lock?

Short question: Does the POSIX thread API offer me a way to determine if the calling thread already holds a particular lock?
Long question:
Suppose I want to protect a data structure with a lock. Acquiring and releasing the lock need to happen in different functions. Calls between the functions involved are rather complex (I am adding multithreading to a 16-year-old code base). For example:
do_dangerous_stuff() does things for which the current thread needs to hold the write lock. Therefore, I acquire the lock at the beginning of the function and release it at the end, as the caller does not necessarily hold the lock.
Another function, do_extra_dangerous_stuff(), calls do_dangerous_stuff(). However, before the call it already does things which also require a write lock, and the data is not consistent until the call to do_dangerous_stuff() returns (so releasing and immediately re-acquiring the lock might break things).
In reality it is more complicated than that. There may be a bunch of functions calling do_dangerous_stuff(), and requiring each of them to obtain the lock before calling do_dangerous_stuff() may be impractical. Sometimes the lock is acquired in one function and released in another.
With a read lock, I could just acquire it multiple times from the same thread, as long as I make sure I release the same number of lock instances that I have acquired. For a write lock, this is not an option (attempting to do so will result in a deadlock). An easy solution would be: test if the current thread already holds the lock and acquire it if not, and conversely, test if the current thread still holds the lock and release it if it does. However, that requires me to test if the current thread already holds the lock—is there a way to do that?
Looking at the man page for pthread_rwlock_wrlock(), I see it says:
If successful, the pthread_rwlock_wrlock() function shall return zero; otherwise, an error number shall be returned to indicate the error.
[…]
The pthread_rwlock_wrlock() function may fail if:
EDEADLK The current thread already owns the read-write lock for writing or reading.
As I read it, EDEADLK is never used to indicate chains involving multiple threads waiting for each other’s resources (and from my observations, such deadlocks indeed seem to result in a freeze rather than EDEADLK). It seems to indicate exclusively that the thread is requesting a resource already held by the current thread, which is the condition I want to test for.
If I have misunderstood the documentation, please let me know. Otherwise the solution would be to simply call pthread_rwlock_wrlock(). One of the following should happen:
It blocks because another thread holds the resource. When we get to run again, we will hold the lock. Business as usual.
It returns zero (success) because we have just acquired the lock (which we didn’t hold before). Business as usual.
It returns EDEADLK because we are already holding the lock. No need to reacquire, but we might want to consider this when we release the lock—that depends on the code in question; see below.
It returns some other error, indicating something has truly gone wrong. Same as with every other lock operation.
It may make sense to keep track of the number of times we have acquired the lock and got EDEADLK. Borrowing from Gil Hamilton’s answer below, a lock depth would work for us:
Reset the lock depth to 0 when we have acquired a lock.
Increase the lock depth by 1 each time we get EDEADLK.
Match each attempt to acquire the lock with a corresponding release: if the lock depth is 0, release the lock; else decrease the lock depth.
This should be thread-safe without further synchronization, as the lock depth is effectively protected by the lock it refers to (we touch it only while holding the lock).
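A minimal sketch of that bookkeeping (my own illustration, untested; note that the man page only says pthread_rwlock_wrlock() "may fail" with EDEADLK, so this relies on the implementation actually detecting the self-deadlock):

#include <errno.h>
#include <pthread.h>

static pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
static int lock_depth; // per-lock; only touched while the write lock is held

int wrlock_recursive(void)
{
    int err = pthread_rwlock_wrlock(&rwlock);
    if (err == 0) {
        lock_depth = 0;   // freshly acquired
        return 0;
    }
    if (err == EDEADLK) { // we already hold it: one level deeper
        ++lock_depth;
        return 0;
    }
    return err;           // some other, genuine error
}

void wrunlock_recursive(void)
{
    if (lock_depth == 0)
        pthread_rwlock_unlock(&rwlock); // last matching release
    else
        --lock_depth;
}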
Caveat: if the current thread already holds the lock for reading (and no others do), this will also report it as being “already locked”. Further tests will be needed to determine if the currently held lock is indeed a write lock. If multiple threads, among them the current one, hold the read lock, I do not know if attempting to obtain a write lock will return EDEADLK or freeze the thread. This part needs some more work…
AFAIK, there's no easy way to accomplish what you're trying to do.
In Linux, you can use the "recursive" mutex attribute to achieve your purpose (as shown here for example: https://stackoverflow.com/a/7963765/1076479), but this is not POSIX-portable.
The only really portable solution is to roll your own equivalent. You can do that by creating a data structure containing a lock along with your own thread index (or equivalent) and an ownership/recursion count.
CAVEAT: Pseudo-code off the top of my head
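For concreteness, assume a structure along these lines (my addition; the original answer leaves it implicit), with my_thread_index being a small per-thread integer, e.g. assigned at thread start and kept in thread-local storage:

#include <assert.h>
#include <errno.h>
#include <pthread.h>

struct {
    pthread_mutex_t mutex;
    int owner;      // thread index of the current owner, -1 when un-owned
    int lock_depth; // recursion count
} mystruct = { PTHREAD_MUTEX_INITIALIZER, -1, 0 };

int err; // result of the trylock below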
Recursive lock:
// First try to acquire the lock without blocking...
if ((err = pthread_mutex_trylock(&mystruct.mutex)) == 0) {
    // Acquire succeeded: the mutex was free, so I have just become the owner.
    assert(mystruct.lock_depth == 0);
    mystruct.owner = my_thread_index;
    ++mystruct.lock_depth;
} else if (mystruct.owner == my_thread_index) {
    assert(err == EBUSY);
    // I already owned the lock. Now one level deeper
    ++mystruct.lock_depth;
} else {
    // I don't own the lock: block waiting for it.
    pthread_mutex_lock(&mystruct.mutex);
    assert(mystruct.lock_depth == 0);
    mystruct.owner = my_thread_index; // record ownership here too, otherwise
    mystruct.lock_depth = 1;          // the unlock bookkeeping below breaks
}
On the way out it's simpler: because you know you own the lock, you only need to determine whether it's time to actually release it (i.e. whether this is the last unlock). Recursive unlock:
if (--mystruct.lock_depth == 0) {
    assert(mystruct.owner == my_thread_index);
    // Last level of recursion unwound
    mystruct.owner = -1; // Mark it un-owned
    pthread_mutex_unlock(&mystruct.mutex);
}
I would want to add some additional checks and assertions and significant testing before trusting this too.

Is a block in Objective-C always guaranteed to capture a variable?

Are there any conditions in Objective-C (Objective-C++) where the compiler can detect that a variable capture in a block is never used and thus decide to not capture the variable in the first place?
For example, assume you have an NSArray that contains a large number of items which might take a long time to deallocate. You need to access the NSArray on the main thread, but once you're done with it, you're willing to deallocate it on a background queue. The background block only needs to capture the array and then immediately deallocate. It doesn't actually have to do anything with it. Can the compiler detect this and, "erroneously", skip the block capture altogether?
Example:
// On the main thread...
NSArray *outgoingRecords = self.records;
self.records = incomingRecords;
dispatch_async(background_queue, ^{
    (void)outgoingRecords;
    // After this do-nothing block exits, outgoingRecords
    // should be deallocated on this background_queue.
});
Am I guaranteed that outgoingRecords will always be captured in that block and that it will always be deallocated on the background_queue?
Edit #1
I'll add a bit more context to better illustrate my issue:
I have an Objective-C++ class that contains a very large std::vector of immutable records. This could easily be 1+ million records. They are basic structs in a vector and accessed on the main thread to populate a table view. On a background thread, a different set of database records might be read into a separate vector, which could also be quite large.
Once the background read has occurred, I jump over to the main thread to swap Objective-C objects and repopulate the table.
At that point, I don't care at all about the contents of the older vector or its parent Objective-C class. There are no fancy destructors or object graph to tear down, but deallocating hundreds of megabytes, maybe even gigabytes, of memory is not instantaneous. So I'm willing to punt it off to a background_queue and have the memory deallocation occur there. In my tests, that appears to work fine and gives me a little bit more time on the main thread to do other stuff before 16 ms elapse.
I'm trying to understand if I can get away with simply capturing the object in an "empty" block or if I should do some sort of no-op operation (like call count) so that the compiler cannot optimize it away somehow.
Edit #2
(I originally tried to keep the question as simple as possible, but it seems like it's more nuanced than that. Based on Ken's answer below, I'll add another scenario.)
Here's another scenario that doesn't use dispatch_queues but still uses blocks, which is the part I'm really interested in.
id<MTLCommandBuffer> commandBuffer = ...
// A custom class that manages an MTLTexture that is backed by an IOSurface.
__block MyTextureWrapper *wrapper = ...
// Issue some Metal calls that use the texture inside the wrapper.
// Wait for the buffer to complete, then release the wrapper.
[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> cb) {
    wrapper = nil;
}];
In this scenario, the order of execution is guaranteed by Metal. Unlike the example above, in this scenario performance is not the issue. Rather, the IOSurface that is backing the MTLTexture is being recycled into a CVPixelBufferPool. The IOSurface is being shared between processes and, from what I can tell, MTLTexture does not appear to increase the useCount on the surface. My wrapper class does. When my wrapper class is deallocated, the useCount is decremented and the bufferPool is then free to recycle the IOSurface.
This is all working as expected, but I end up with silly code like the above just out of uncertainty about whether I need to "use" the wrapper instance in the block to ensure it's captured. If the wrapper is deallocated before the completion handler runs, the IOSurface will be recycled and the texture will get overwritten.
Edit to address question edits:
From the Clang Language Specification for Blocks:
Local automatic (stack) variables referenced within the compound statement of a Block are imported and captured by the Block as const copies. The capture (binding) is performed at the time of the Block literal expression evaluation.
The compiler is not required to capture a variable if it can prove that no references to the variable will actually be evaluated. Programmers can force a variable to be captured by referencing it in a statement at the beginning of the Block, like so:
(void) foo;
This matters when capturing the variable has side-effects, as it can in Objective-C or C++.
(Emphasis added.)
Note that using this technique guarantees that the referenced object lives at least as long as the block, but does not guarantee it will be released with the block, nor by which thread.
There's no guarantee that the block submitted to the background queue will be the last code to hold a strong reference to the array (even ignoring the question of whether the block captures the variable).
First, the block may in fact run before the context which submitted it returns and releases its strong reference. That is, the code which called dispatch_async() could be swapped off the CPU and the block could run first.
But even if the block runs somewhat later than that, a reference to the array may be in an autorelease pool somewhere and not released for some time. Or there may be a strong reference someplace else that will eventually be cleared, but not under your explicit control.

How to make a function atomic in Swift?

I'm currently writing an iOS app in Swift, and I encountered the following problem: I have an object A. The problem is that while there is only one thread for the app (I didn't create separate threads), object A gets modified when
1) a certain NSTimer() triggers
2) a certain observeValueForKeyPath() triggers
3) a certain callback from Parse triggers.
From what I know, all the above three cases work kind of like a software interrupt. So as the code runs, if NSTimer()/observeValueForKeyPath()/the callback from Parse happens, the current code gets interrupted and jumps to the corresponding code. This is not a race condition (since there is just one thread), and I don't think something like this https://gist.github.com/Kaelten/7914a8128eca45f081b3 can solve this problem.
There is a specific function B called in all three cases to modify object A, so I'm thinking if I can make this function B atomic, then this problem is solved. Is there a way to do this?
You are making some incorrect assumptions. None of the things you mention interrupts the processor. Cases 1 and 2 both operate synchronously: the timer won't fire, and observeValueForKeyPath won't be called, until your code finishes and your app services the event loop.
Atomic properties or other synchronization techniques are only meaningful for concurrent (multi-threaded) code. If memory serves, atomic is only a property attribute; it does not apply to other methods/functions.
I believe Parse uses completion blocks that are run on a background thread, in which case your #3 *is* using separate threads, even though you didn't realize that you were doing so. This is the only case in which you need to be worried about synchronization. In that case the simplest thing is to simply bracket your completion block code inside a call to dispatch_async(dispatch_get_main_queue()), which makes all the code in the dispatch_async closure run on the main thread, avoiding concurrency issues entirely.
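A sketch of that bracketing against GCD's C API (the Swift call has the same shape; updateObjectA is a placeholder for your function B):

#include <dispatch/dispatch.h>

extern void updateObjectA(void); // placeholder for your function B

// Inside the Parse completion handler, which may arrive on a background thread:
dispatch_async(dispatch_get_main_queue(), ^{
    // Everything that touches object A now runs on the main thread,
    // serialized with the NSTimer and KVO callbacks.
    updateObjectA();
});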

Is there a way that the @synchronized directive doesn't block the main thread?

Imagine you want to do many things in the background of an iOS application, and you code it properly so that you create threads (for example using GCD) to execute this background activity.
Now what if at some point you need to update a variable, but this update can occur on any of the threads you created?
You obviously want to protect that variable, and you can use the @synchronized directive to create the locks for you, but here is the catch (extract from the Apple documentation):
The @synchronized() directive locks a section of code for use by a single thread. Other threads are blocked until the thread exits the protected code—that is, when execution continues past the last statement in the @synchronized() block.
So that means that if you synchronize on an object and two threads are writing to it at the same time, even the main thread will block until both threads are done writing their data.
An example of code that will showcase all this:
// Create the background queue
dispatch_queue_t queue = dispatch_queue_create("synchronized_example", NULL);
// Start working in a new thread
dispatch_async(queue, ^
{
    // Synchronize on that shared resource
    @synchronized(sharedResource_)
    {
        // Write things on that resource.
        // If more than one thread accesses this piece of code:
        // all threads (even the main thread) will block until the task is completed.
        [self writeComplexDataOnLocalFile];
    }
});
// won't actually go away until queue is empty
dispatch_release(queue);
So the question is fairly simple: how to overcome this? How can we securely add locks on all the threads EXCEPT the main thread, which, we know, doesn't need to be blocked in that case?
EDIT FOR CLARIFICATION
As some of you commented, it does seem logical (and this was clearly what I thought at first when using @synchronized) that only the threads that are trying to acquire the lock should block until they are both done.
However, tested in a real situation, this doesn't seem to be the case and the main thread seems to also suffer from the lock.
I use this mechanism to log things in separate threads so that the UI is not blocked. But when I do intense logging, the UI (main thread) is clearly highly impacted (scrolling is not as smooth).
So two options here: either the background tasks are so heavy that even the main thread gets impacted (which I doubt), or @synchronized also blocks the main thread while performing the lock operations (which I'm starting to reconsider).
I'll dig a little further using the Time Profiler.
I believe you are misunderstanding the following sentence that you quote from the Apple documentation:
Other threads are blocked until the thread exits the protected code...
This does not mean that all threads are blocked; it just means that all threads that are trying to synchronise on the same object (the sharedResource_ in your example) are blocked.
The following quote is taken from Apple's Thread Programming Guide, which makes it clear that only threads that synchronise on the same object are blocked.
The object passed to the @synchronized directive is a unique identifier used to distinguish the protected block. If you execute the preceding method in two different threads, passing a different object for the anObj parameter on each thread, each would take its lock and continue processing without being blocked by the other. If you pass the same object in both cases, however, one of the threads would acquire the lock first and the other would block until the first thread completed the critical section.
Update: If your background threads are impacting the performance of your interface then you might want to consider putting some sleeps into the background threads. This should allow the main thread some time to update the UI.
I realise you are using GCD but, for example, NSThread has a couple of methods that will suspend the thread, e.g. -sleepForTimeInterval:. In GCD you can probably just call sleep().
Alternatively, you might also want to look at changing the thread priority to a lower priority. Again, NSThread has the setThreadPriority: for this purpose. In GCD, I believe you would just use a low priority queue for the dispatched blocks.
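For example, a sketch of that low-priority variant in GCD's C API (do_logging_work is a placeholder for the locked logging code shown above):

#include <dispatch/dispatch.h>

extern void do_logging_work(void); // placeholder: take the lock and write, as before

// Push the heavy logging onto a low-priority global queue so that it
// competes less with the main thread for CPU time:
dispatch_queue_t low = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_LOW, 0);
dispatch_async(low, ^{
    do_logging_work();
});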
I'm not sure if I understood you correctly: @synchronized doesn't block all threads, but only the ones that want to execute the code inside the block. So the solution probably is: don't execute the code on the main thread.
If you simply want to avoid having the main thread acquire the lock, you can do this (and wreak havoc):
dispatch_async(queue, ^
{
    if (![NSThread isMainThread])
    {
        // Synchronize on that shared resource
        @synchronized(sharedResource_)
        {
            // Only threads contending for sharedResource_ block here;
            // the main thread never reaches this lock at all.
            [self writeComplexDataOnLocalFile];
        }
    }
    else
        [self writeComplexDataOnLocalFile]; // unsynchronized: hence the havoc
});

Does pthread_detach free up the stack allocated to the child thread after the child thread has exited?

I have used pthread_detach in order to free up the stack allocated to the child thread, but this does not seem to be working; I guess it does not free up the memory...
I don't want to use pthread_join. I know join assures me of freeing up the stack for the child, but I don't want the parent to hang until the child thread terminates; I want the parent to do some other work in the meantime. So I have used detach, as it will not block the parent thread.
Please help me; I have been stuck on this.
this is not working
Yes it is. You are likely mis-interpreting your observations.
I want my parent to do some other work in the mean time
That's usually the reason to create threads in the first place, and you can do that:
pthread_create(...);
do_some_work(); // both current and new threads work in parallel
pthread_join(...); // wait for both threads to finish
report_results();
I don't want to use pthread_join. I know join assures me of freeing up the stack for child
The above statement is false: it assures no such thing. A common implementation will cache the now-available child stack for reuse (in case you create another thread shortly).
YES - according to http://cursuri.cs.pub.ro/~apc/2003/resources/pthreads/uguide/users-16.htm it frees the memory either when the thread ends or immediately, if the thread has already ended...
As you don't provide any clue as to how you determine that the memory is not freed up, I can only assume that the method you use to determine it is not sufficient...
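For reference, the usual shape of this (a minimal sketch; the child's work is elided):

#include <pthread.h>

static void *child(void *arg)
{
    (void)arg;
    // ... child's work ...
    return NULL; // a detached thread's stack is reclaimed by the system
                 // sometime after it terminates
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, child, NULL);
    pthread_detach(t);  // or create it with a PTHREAD_CREATE_DETACHED attribute
    // ... parent keeps doing other work; it never blocks on the child ...
    pthread_exit(NULL); // let remaining threads finish before the process exits
}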
