Since Apple's APIs are not open source, and the documentation doesn't say, when writing in Swift we have no way to know whether a returned object is an autoreleased Objective-C object. Hence, it becomes unclear when we should use autoreleasepool.
https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/MemoryMgmt/Articles/mmAutoreleasePools.html#//apple_ref/doc/uid/20000047-1041876
If you write a loop that creates many temporary objects. You may use an autorelease pool block inside the loop to dispose of those objects before the next iteration. Using an autorelease pool block in the loop helps to reduce the maximum memory footprint of the application.
Without autoreleasepool:

for ... {
    FileManager.default.copyItem
    CGImageSourceCreateWithURL
    CGImageSourceCopyPropertiesAtIndex
    CGImageSourceCreateThumbnailAtIndex
    CGImageDestinationCreateWithURL
    CGImageDestinationFinalize
}
With autoreleasepool:

for ... {
    autoreleasepool {
        FileManager.default.copyItem
        CGImageSourceCreateWithURL
        CGImageSourceCopyPropertiesAtIndex
        CGImageSourceCreateThumbnailAtIndex
        CGImageDestinationCreateWithURL
        CGImageDestinationFinalize
    }
}
I tried running an intensive loop over the above two pieces of code to compare them. I found no significant difference in their memory usage patterns, based on Xcode's memory report.
I was wondering: what are some good guidelines, or a good thought process, for deciding whether we should apply autoreleasepool throughout our code?
I have this concern because I recently saw that autoreleasepool is required in code that involves FileHandle.read - https://stackoverflow.com/a/42935601/72437
Using FileManager to copy an item doesn't have a huge memory payload, and the Image I/O APIs you're using are designed to save a lot of memory during the I/O process. In addition, Apple's image APIs cache data for the same file.
That's why your code shows no significant difference: you never created any real memory payload.
You could validate the usage of autoreleasepool another way, and I can assure you it will show a tremendous difference.
Use a for loop (10,000 iterations) to generate random strings (the longer the better), and transform each string into UTF-8 data in each iteration. Then compare the memory growth of the with- and without-autoreleasepool cases.
Try it out.
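A minimal Swift sketch of that experiment (the string generation here is illustrative filler, not from the original suggestion):

import Foundation

// Run this once as-is and once with the autoreleasepool block removed,
// then compare the two memory graphs in Xcode's memory report.
for _ in 0..<10_000 {
    autoreleasepool {
        // A long throwaway string; repeating a UUID is just convenient filler.
        let string = String(repeating: UUID().uuidString, count: 1_000)
        // The conversion bridges through NSString/NSData, which can leave
        // autoreleased temporaries for the pool to drain each iteration.
        _ = string.data(using: .utf8)
    }
}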
I have code that updates the values of a protobuf map periodically. This code is simplified for clarity.
void my_periodically_called_function() {
    my_protobuf_map->Clear();
    MyObject obj;
    obj.set_value(data);
    (*my_protobuf_map)["my_key"] = obj;
}
What happens is that the program's memory keeps growing with every iteration. After digging through protobuf's map.h, it seems that after clearing the map and re-adding elements, operator[] will just allocate more data in the arena (without freeing any older data), which is obviously undesirable.
What is the most protobuf-friendly way to resolve this? I want a good way to be able to delete specific memory from the arena.
An easy way to fix the problem would be to remove the Clear(), but I'd like to keep it to avoid weird bugs with old state persisting.
Thanks in advance.
The way the protobuf C++ library implements arena allocation, there is no way to free an individual piece of memory. Instead, all of it is freed at once by freeing the whole arena.
The main point of an arena allocator is to improve speed by making allocation a constant-time operation (it just increments a pointer).
In your case, it sounds like you'll either want to periodically free the arena and reconstruct the message, or else use the heap allocator, which handles freeing individual allocations.
Are there any conditions in Objective-C (Objective-C++) where the compiler can detect that a variable captured in a block is never used, and thus decide not to capture the variable in the first place?
For example, assume you have an NSArray that contains a large number of items which might take a long time to deallocate. You need to access the NSArray on the main thread, but once you're done with it, you're willing to deallocate it on a background queue. The background block only needs to capture the array and then immediately deallocate. It doesn't actually have to do anything with it. Can the compiler detect this and, "erroneously", skip the block capture altogether?
Example:
// On the main thread...
NSArray *outgoingRecords = self.records;
self.records = incomingRecords;

dispatch_async(background_queue, ^{
    (void)outgoingRecords;
    // After this do-nothing block exits, then outgoingRecords
    // should be deallocated on this background_queue.
});
Am I guaranteed that outgoingRecords will always be captured in that block and that it will always be deallocated on the background_queue?
Edit #1
I'll add a bit more context to better illustrate my issue:
I have an Objective-C++ class that contains a very large std::vector of immutable records. This could easily be 1+ million records. They are basic structs in a vector and accessed on the main thread to populate a table view. On a background thread, a different set of database records might be read into a separate vector, which could also be quite large.
Once the background read has occurred, I jump over to the main thread to swap Objective-C objects and repopulate the table.
At that point, I don't care at all about the contents of the older vector or its parent Objective-C class. There are no fancy destructors or object graphs to tear down, but deallocating hundreds of megabytes, maybe even gigabytes, of memory is not instantaneous. So I'm willing to punt it off to a background_queue and have the memory deallocation occur there. In my tests, that appears to work fine and gives me a little bit more time on the main thread to do other stuff before 16 ms elapse.
I'm trying to understand whether I can get away with simply capturing the object in an "empty" block, or whether I should do some sort of no-op operation (like calling count) so that the compiler cannot somehow optimize it away.
Edit #2
(I originally tried to keep the question as simple as possible, but it seems it's more nuanced than that. Based on Ken's answer below, I'll add another scenario.)
Here's another scenario that doesn't use dispatch_queues but still uses blocks, which is the part I'm really interested in.
id<MTLCommandBuffer> commandBuffer = ...

// A custom class that manages an MTLTexture that is backed by an IOSurface.
__block MyTextureWrapper *wrapper = ...

// Issue some Metal calls that use the texture inside the wrapper.

// Wait for the buffer to complete, then release the wrapper.
[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> cb) {
    wrapper = nil;
}];
In this scenario, the order of execution is guaranteed by Metal. Unlike the example above, performance is not the issue here. Rather, the IOSurface that is backing the MTLTexture is being recycled into a CVPixelBufferPool. The IOSurface is shared between processes and, from what I can tell, MTLTexture does not appear to increase the useCount on the surface, whereas my wrapper class does. When my wrapper class is deallocated, the useCount is decremented and the buffer pool is then free to recycle the IOSurface.
This is all working as expected, but I end up with silly code like the above purely out of uncertainty over whether I need to "use" the wrapper instance in the block to ensure it's captured. If the wrapper were deallocated before the completion handler runs, the IOSurface would be recycled and the texture would get overwritten.
Edit to address question edits:
From the Clang Language Specification for Blocks:
Local automatic (stack) variables referenced within the compound statement of a Block are imported and captured by the Block as const copies. The capture (binding) is performed at the time of the Block literal expression evaluation.
The compiler is not required to capture a variable if it can prove that no references to the variable will actually be evaluated. Programmers can force a variable to be captured by referencing it in a statement at the beginning of the Block, like so:
(void) foo;
This matters when capturing the variable has side-effects, as it can in Objective-C or C++.
(Emphasis added.)
Note that using this technique guarantees that the referenced object lives at least as long as the block, but does not guarantee it will be released with the block, nor by which thread.
There's no guarantee that the block submitted to the background queue will be the last code to hold a strong reference to the array (even ignoring the question of whether the block captures the variable).
First, the block may in fact run before the context which submitted it returns and releases its strong reference. That is, the code which called dispatch_async() could be swapped off the CPU and the block could run first.
But even if the block runs somewhat later than that, a reference to the array may sit in an autorelease pool somewhere and not be released for some time. Or there may be a strong reference someplace else that will eventually be cleared, but not under your explicit control.
I have an application that does a lot of background reading from a Realm. While that is happening, another background thread (i.e. not the main thread) may be writing to the same Realm, so I am using an autoreleasepool on the background threads to ensure the thread's reference to the Realm is reclaimed quickly. See the excerpt below.
autoreleasepool {
    do {
        let backgroundRealm = try Realm(configuration: self.configuration)
        // ... Do lots of reading

        backgroundRealm.beginWrite()
        // ... Do lots of writing here
        try backgroundRealm.commitWrite()

        // Is this good practice or not?
        backgroundRealm.invalidate()
    }
    catch {
        // ...
    }
}
From reading the documentation on using a Realm across threads and on inWriteTransaction, it is not clear whether a call to backgroundRealm.invalidate() after the commitWrite() and/or before leaving the autoreleasepool would help keep file sizes down and improve performance. Does this happen implicitly when the Realm is reclaimed behind the scenes? Would the call to invalidate() only waste CPU cycles and provide no additional benefit?
Would a call to backgroundRealm.invalidate() help keep file sizes down and improve performance?
No. invalidate() has no impact on the file size. If you want to keep the file size down, you would need to use writeCopyToURL(_:encryptionKey:error:) to write a compacted copy. But there is no convenience method for an in-place compact, which would require invalidating all accessors across threads.
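For illustration, writing a compacted copy looks roughly like this (a sketch: in current RealmSwift the API above is spelled writeCopy(toFile:encryptionKey:), and the destination path here is hypothetical):

import Foundation
import RealmSwift

func writeCompactedCopy(of configuration: Realm.Configuration) throws {
    let realm = try Realm(configuration: configuration)
    // Hypothetical destination for the compacted file.
    let compactedURL = URL(fileURLWithPath: NSTemporaryDirectory())
        .appendingPathComponent("compacted.realm")
    // Writes a compacted copy; the live file itself is not shrunk in place.
    try realm.writeCopy(toFile: compactedURL)
}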
Does this implicitly happen when the realm is reclaimed behind the scenes?
It wouldn't be necessary. A Realm is deallocated when there isn't any accessor left keeping a hold of it anymore. So there is nothing left to be invalidated.
Would the call to invalidate() only waste CPU cycles and provide no additional benefits?
As long as you don't leak accessors from your autoreleasepool, you should be fine. If you do leak objects, calling invalidate() might help you locate them later at runtime. But take care: when you access an invalidated object, it will fail.
In short: is it fast/cheap? Does it make sense to store a value from NSUserDefaults in memory for faster access?
Longer: say I have a significant number of values to be stored in and read from NSUserDefaults, with the need to access (read) those values frequently.
In the snippet below, I initialize a private stored property and keep it synced with the corresponding NSUserDefaults value, so when I need to read it, I read the property.
If reading from the defaults directly is in fact fast, I'd remove the private property, obviously. But I'm unsure of that. Is it fast?
private var _loggedIn = NSUserDefaults.standardUserDefaults().boolForKey("loggedIn")

public var loggedIn: Bool {
    get {
        return _loggedIn
    }
    set {
        _loggedIn = newValue
        NSUserDefaults.standardUserDefaults().setBool(newValue, forKey: "loggedIn")
        NSUserDefaults.standardUserDefaults().synchronize()
    }
}
Clarification for future readers: the question is about reading, not writing/synchronizing, which (as pointed out in the answers) is neither fast nor cheap.
.synchronize() is called in the setter for a valid reason: in my specific case it is important to have the value synced right away, so I sacrifice performance for logic integrity. In general, you should consider whether you need to call it at all, or let the system pick an appropriate time for writing.
...In fact, now that I look at it, I see that keeping the stored property as it is in the snippet will preserve logic integrity (as long as access from other places happens via the getter, and not directly from the user defaults). So I can avoid synchronizing here as well.
Reading is cheap. There is generous caching in place and everything happens in RAM. Mutating is relatively cheap too, but the system will still have to store the contents to non-volatile memory (a .plist file on the flash) at regular intervals.
Explicit synchronising isn't cheap. It costs time and energy.
So for reads it is fine, but with a lot of writes I would still collect them in a separate container and serialise only as needed.
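A minimal sketch of that separate-container idea (the BatchedDefaults class and its method names are illustrative, not a standard API; written against the modern UserDefaults spelling):

import Foundation

// Illustrative container: cheap in-memory writes, explicit serialisation.
final class BatchedDefaults {
    private var pending: [String: Any] = [:]
    private let defaults = UserDefaults.standard

    func set(_ value: Any, forKey key: String) {
        pending[key] = value // stays in RAM until flush()
    }

    func value(forKey key: String) -> Any? {
        pending[key] ?? defaults.object(forKey: key)
    }

    func flush() {
        // Serialise to the backing store only when needed.
        for (key, value) in pending { defaults.set(value, forKey: key) }
        pending.removeAll()
    }
}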
It's unlikely to have a significant performance impact, but you can profile it yourself using Instruments to confirm that the impact is negligible.
I made some performance tests with Instruments, as @mipadi suggested, this past year, and my conclusion was that there is no substantial difference.
As I pointed out in a comment above, it's very important to identify which of those NSUserDefaults writes need to happen straight away. Use the synchronize method just in those particular cases; otherwise, leave iOS to handle that work and you'll get better performance.
It's all fine unless you use NSUserDefaults as a database. synchronize() will write the complete plist file, so if you store megabytes of data and then synchronize a lot, performance and/or battery life will suffer.
But check out this question as well: How often are NSUserDefaults synchronised?
An interesting detail is that user defaults are written when your application terminates. Someone might experiment with what happens if your program crashes right after changing NSUserDefaults: does that count as "termination"?
I have a sessions property, a mutable set. I need to iterate over the collection, but at the same time I could change the collection in another method:
- (Session*) sessionWithID: (NSString*) sessionID
{
    for (Session *candidate in _sessions) {
        /* do something */
    }
    return nil;
}

- (void) doSomethingElse
{
    [_sessions removeObject:…];
}
This isn’t thread-safe. A bullet-proof version would use @synchronized or a dispatch queue to serialize access to _sessions. But how reasonable is it to simply copy the set before iterating over it?
- (Session*) sessionWithID: (NSString*) sessionID
{
    for (Session *candidate in [_sessions copy]) {
        /* do something */
    }
    return nil;
}
I don’t care about the performance difference much.
But how reasonable is to simply copy the set before iterating over it?
As presented, it is not guaranteed to be thread-safe. You would need to guarantee that _sessions is not mutated during the -copy. After that, iterating over the immutable copy is safe while mutation of _sessions occurs on a secondary thread or in your implementation.
In many cases with Cocoa collections, you will find it preferable to use immutable ivars and copy on set, by declaring the property as copy with type NSSet. This way you copy on write/set and avoid the copy on read. This has the potential to reduce copies, depending on how your program actually executes. Generally, this alone is not enough, and you will need some higher level of synchronization.
Also remember that the Sessions in the set may not themselves be thread-safe. Even once your collection accesses are properly guarded, you may need to protect access to those objects.
Your code does not look thread-safe to me, because the collection might be mutated from another thread while it is being copied.
You would have to protect [_sessions copy] and [_sessions removeObject:…] from executing simultaneously.
After creating the copy, you can iterate over it without a lock (assuming that the collection elements themselves are not modified from another thread).
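For comparison, here is that copy-before-iterate pattern with the copy itself guarded, sketched in Swift (the SessionStore and Session types are illustrative): a serial queue serializes the snapshot and the mutations, and iteration happens on the snapshot without holding the lock.

import Foundation

struct Session: Hashable {
    let id: String
}

final class SessionStore {
    private var sessions: Set<Session> = []
    private let queue = DispatchQueue(label: "sessions.access") // serial

    func session(withID sessionID: String) -> Session? {
        // Take the snapshot under the queue; Set is a value type,
        // so returning it here is the equivalent of [_sessions copy].
        let snapshot = queue.sync { sessions }
        // Iterate the snapshot without holding the lock.
        for candidate in snapshot where candidate.id == sessionID {
            return candidate
        }
        return nil
    }

    func remove(_ session: Session) {
        queue.sync { _ = sessions.remove(session) }
    }
}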
In one of my projects I have a background simulation that a GLView is drawn from. In order to do the drawing on a background thread, I need to copy the simulation's current frame data and then draw based on that copy, so that the simulation can continue in its own thread without distorting the drawing data.
I consider copying information to be used asynchronously perfectly valid, especially on devices that have multiple cores. @synchronized causes the separate threads to stop (if they are accessing the same information) and can thereby cause more of a performance loss than the copy procedure.