Large POD struct copy to stack crashes in a worker thread - iOS

iOS project in a mix of Obj-C++ and C++ proper. I have a POD struct about 1 MB in size, and there is a global instance of it. If I create a local instance of the same struct in a function that is invoked on a worker thread, the copy operation crashes in debug builds on the simulator. Release builds don't crash.
This smells like running out of stack space.
The worker thread in question is not created manually; it is an NSOperationQueue worker.
The questions are twofold:
Why does automatic stack growth fail?
How does one increase the stack size on NSOperationQueue threads?
The repro goes:
struct S
{
char s[1024*1024];
};
S gs;
-(void)f
{
S ls;
ls = gs; //Crash!
}
Okay, I see the point about reworking the NSOperationQueue threads to get a bigger stack. That said, the compiler (Clang) definitely has some kind of workaround for this: the same code doesn't crash in release builds. I'll check the disassembly just in case, but how come the workaround isn't engaging in this case? Also, I checked the same code on Android and saw the same crash.
PS: the motivating case is not from my code; it's from a third-party algorithm library. The original library doesn't really need a deep copy: the local variable is never written to, only read from, so a reference (even a const one) would do just as well. My guess is that the author wanted a reference but messed it up. In the release build, the compiler optimizes it to a reference, hence no crash. In the debug build, it does not.
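A minimal sketch of the fix the PS guesses at, assuming the library only ever reads the struct: binding a const reference to the global instead of copying it means no 1 MiB local is ever placed on the worker thread's stack, in debug and release builds alike. The function `sumFirstBytes` is a hypothetical stand-in for the library's read-only code.

```cpp
#include <cassert>
#include <cstddef>

// The ~1 MiB POD struct from the question.
struct S {
    char s[1024 * 1024];
};

// Global instance: static storage, zero-initialized, not on any thread's stack.
S gs;

// Hypothetical read-only routine standing in for the library code.
int sumFirstBytes(std::size_t n) {
    // A const reference binds to the global: no 1 MiB stack slot is reserved
    // and nothing is copied, unlike `S ls; ls = gs;`.
    const S &ls = gs;
    assert(n <= sizeof(ls.s));
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += ls.s[i];
    }
    return total;
}
```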

I don't know how up to date this documentation is, but according to Apple, non-main threads top out at a 512 KiB stack size unless configured otherwise during thread creation.
I struggle to see a good reason to store such a data structure on the stack, particularly with (Obj-)C++ where you can easily wrap it in something like a std::unique_ptr which manages heap allocations and deallocations automatically. (Or indeed any other RAII based abstraction, or even storing it as an ivar in an ARC-enabled Objective-C class if you're so inclined.)
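For example, a sketch of the std::unique_ptr approach, assuming a copy really is needed: `std::make_unique<S>(gs)` copy-constructs the struct directly in heap memory, so the worker thread's stack only holds the smart pointer. The helper names `copyToHeap` and `readCopied` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

struct S {
    char s[1024 * 1024];
};

S gs;

// Copy-constructs the struct directly in heap storage; only the
// unique_ptr itself lives on the calling thread's stack.
std::unique_ptr<S> copyToHeap(const S &source) {
    return std::make_unique<S>(source);
}

// Hypothetical worker body: the heap copy is freed automatically
// when `ls` goes out of scope, with no manual delete.
char readCopied(std::size_t index) {
    assert(index < sizeof(S::s));
    auto ls = copyToHeap(gs);
    return ls->s[index];
}
```

Because the copy is independent of `gs`, writes through the returned pointer never touch the global.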
One downside to opting into very large stack sizes is that this memory likely stays resident but unused until the thread terminates, particularly on iOS where this memory won't even be swapped to disk. This is fine if you're explicitly starting up a thread and shut it down once you're done with your giant-stack-requiring-algorithm. But if you're running a one-off job on a pooled thread, you've now effectively leaked 1MB of memory. Maybe it's the embedded developer in me (or that I remember when iPhones only had 128MB RAM) but I'd prefer not to write code like that. (Or can someone come up with evidence that the low-memory warning mechanism purges unused stack space?)

Under the hood, Cocoa's threading is built on POSIX threads, whose stack size follows these rules:
A default stack size, if none is specified explicitly (e.g. on macOS you can find this value by running the ulimit -s command, which on my machine reports 8192 KiB; on iOS it is very likely a few times less)
An arbitrary stack size, if one is specified when the thread is created
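The default in the second case can be inspected with the portable POSIX calls below: a freshly initialized `pthread_attr_t` reports the stack size a new thread would get if none is requested. This is a sketch; on Darwin the value is expected to match the 512 KiB figure for secondary threads, while on Linux with glibc it is typically derived from `ulimit -s`. `defaultThreadStackSize` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

// Returns the stack size a new pthread gets when none is requested,
// or 0 if the attribute object could not be initialized or queried.
std::size_t defaultThreadStackSize() {
    pthread_attr_t attrs;
    if (pthread_attr_init(&attrs) != 0) {
        return 0;
    }
    std::size_t size = 0;
    if (pthread_attr_getstacksize(&attrs, &size) != 0) {
        size = 0;
    }
    pthread_attr_destroy(&attrs);
    return size;
}
```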
Answering your first question:
why does automatic stack growth fail?
It "fails" because a thread's stack is not allowed to grow beyond the size allocated for that thread. A more interesting question in this case is why it doesn't fail in the release build, and frankly I don't have a definitive answer. Most likely it's the optimizer: since the local copy is never modified (in the repro it isn't used at all), the compiler is allowed to elide the copy or discard the dead local entirely, so the 1 MiB of stack is never touched.
For the second question:
how does one increase stack size on NSOperationQueue threads?
The main thread of an application always has the default system stack size, which can only be altered on macOS (or a jailbroken iOS device) with ulimit (the size is given in KiB):
# Sets the default stack size to 32 MiB
% ulimit -s 32768
All other threads (to my knowledge) on both iOS and macOS get an explicitly specified stack size of 512 KiB. For them, you have to somehow forward the desired stack size to the pthread_create(3) function, something like this:
#import <Foundation/Foundation.h>
#import <pthread.h>
struct S {
    char s[1024 * 1024];
};
void *func(void *context) {
    // 16 MiB of stack variables, far beyond the 512 KiB default
    S s[16];
    NSLog(@"Working thread is finished");
    auto *result = new int{};
    return result;
}
int main(int argc, const char * argv[]) {
    @autoreleasepool {
        pthread_attr_t attrs;
        int s = pthread_attr_init(&attrs);
        if (s) {
            NSLog(@"pthread_attr_init failed with error code: %d", s);
            return 1;
        }
        // Request a 32 MiB stack for the new thread
        s = pthread_attr_setstacksize(&attrs, 1024 * 1024 * 32);
        pthread_t thread;
        if (!s) {
            s = pthread_create(&thread, &attrs, &func, nullptr);
        }
        pthread_attr_destroy(&attrs);
        void *result = nullptr;
        if (!s) {
            s = pthread_join(thread, &result);
        }
        if (s) {
            NSLog(@"Error code: %d", s);
        } else {
            NSLog(@"Main is finished with result: %d", *(int *)result);
            delete (int *)result;
        }
    }
    return 0;
}
Unfortunately, neither queue API (GCD nor NSOperationQueue) exposes the thread-allocation part of its thread pool, and NSThread doesn't let you supply your own pthread for the underlying execution either. If you want to keep relying on those APIs, you will have to work around this "artificially".
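Outside of Cocoa, the same idea can be packaged as a small reusable C++ helper: run a callable on a dedicated pthread with an explicit stack size and join it before returning, which is essentially what the NSOperation subclass below does inside its main. `runWithStackSize` is a hypothetical name, and this is a sketch rather than a drop-in replacement for an operation queue.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <pthread.h>

namespace {

// Trampoline with the signature pthread_create expects; it runs the
// std::function passed in through `context`.
void *runTask(void *context) {
    auto *task = static_cast<std::function<void()> *>(context);
    (*task)();
    return nullptr;
}

} // namespace

// Hypothetical helper: runs `task` on a fresh pthread whose stack is
// `stackBytes` large and waits for it. Returns 0 on success or the first
// pthread error code encountered.
int runWithStackSize(std::size_t stackBytes, std::function<void()> task) {
    pthread_attr_t attrs;
    int status = pthread_attr_init(&attrs);
    if (status != 0) {
        return status;
    }
    status = pthread_attr_setstacksize(&attrs, stackBytes);
    pthread_t thread;
    if (status == 0) {
        status = pthread_create(&thread, &attrs, &runTask, &task);
    }
    pthread_attr_destroy(&attrs);
    if (status != 0) {
        return status;
    }
    // Joining keeps `task` alive for as long as the thread uses it.
    return pthread_join(thread, nullptr);
}
```

Because the caller blocks in `pthread_join`, handing the new thread the address of the local `task` is safe; a stack size below the platform minimum makes `pthread_attr_setstacksize` fail and that error code is returned.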
Sample NSOperation subclass with arbitrary stack size thread
The interface of such a class can look something like this (provided the thread's stack size is constant and is not supposed to be an injected dependency):
// TDWOpeartion.h
#import <Foundation/Foundation.h>
NS_ASSUME_NONNULL_BEGIN
typedef void(^TDWOperationBlock)(void);
__attribute__((__objc_direct_members__))
@interface TDWOperation: NSOperation
@property (copy, nonatomic, readonly) TDWOperationBlock executionBlock;
- (instancetype)initWithExecutionBlock:(TDWOperationBlock)block NS_DESIGNATED_INITIALIZER;
@end
NS_ASSUME_NONNULL_END
The implementation file:
// TDWOpeartion.mm
#import "TDWOpeartion.h"
#import <pthread.h>
#define EXECUTE_WITH_ERROR(codeVar, execution) if((codeVar = execution)) {\
NSLog(@"Failed to execute " #execution " with error code: %d", codeVar);\
return;\
}
// Forward declaration: the thread entry point is used in -main but defined at the end of the file
void *tdw_p_runExecutionBlock(void *args);
NS_ASSUME_NONNULL_BEGIN
__attribute__((__objc_direct_members__))
@interface TDWOperation ()
@property (assign, getter=tdw_p_isThreadStarted) BOOL tdw_p_threadStarted;
@property (assign, nonatomic) pthread_t tdw_p_underlyingThread;
@property (strong, nonatomic, readonly) dispatch_queue_t tdw_p_productsSyncQueue;
@end
NS_ASSUME_NONNULL_END
@implementation TDWOperation
@synthesize tdw_p_threadStarted = _tdw_p_threadStarted;
#pragma mark Lifecycle
- (instancetype)initWithExecutionBlock:(TDWOperationBlock)block {
if (self = [super init]) {
_executionBlock = block;
_tdw_p_threadStarted = NO;
_tdw_p_productsSyncQueue = dispatch_queue_create("the.dreams.wind.property_access.TDWOperation.isThreadStarted",
DISPATCH_QUEUE_CONCURRENT);
}
return self;
}
- (instancetype)init {
return [self initWithExecutionBlock:^{}];
}
#pragma mark NSOperation
- (void)main {
// Don't start a thread at all if the operation was cancelled before it began
if (self.cancelled) {
return;
}
pthread_attr_t attrs;
int statusCode;
EXECUTE_WITH_ERROR(statusCode, pthread_attr_init(&attrs))
// Allocates 32 MiB stack size
EXECUTE_WITH_ERROR(statusCode, pthread_attr_setstacksize(&attrs, 1024 * 1024 * 32))
pthread_t thread;
EXECUTE_WITH_ERROR(statusCode, pthread_create(&thread, &attrs, &tdw_p_runExecutionBlock, (__bridge_retained void *)self))
EXECUTE_WITH_ERROR(statusCode, pthread_attr_destroy(&attrs))
// Once the thread exists it is always joined, so main doesn't return before the block finishes
self.tdw_p_threadStarted = YES;
void* result = nullptr;
EXECUTE_WITH_ERROR(statusCode, pthread_join(thread, &result))
self.tdw_p_threadStarted = NO;
// Guard against a thread that returned no result instead of dereferencing nullptr
if (result) {
NSLog(@"Main is finished with result: %d", *(int *)result);
delete (int *)result;
}
}
#pragma mark Properties
- (void)setExecutionBlock:(TDWOperationBlock)executionBlock {
if (self.tdw_p_isThreadStarted) {
[NSException raise:NSInternalInconsistencyException
format:@"Cannot change execution block when execution is already started"];
}
_executionBlock = executionBlock;
}
- (BOOL)tdw_p_isThreadStarted {
__block BOOL result;
dispatch_sync(_tdw_p_productsSyncQueue, ^{
result = _tdw_p_threadStarted;
});
return result;
}
- (void)setTdw_p_threadStarted:(BOOL)threadStarted {
dispatch_barrier_async(_tdw_p_productsSyncQueue, ^{
self->_tdw_p_threadStarted = threadStarted;
});
}
#pragma mark Private
void *tdw_p_runExecutionBlock(void *args) {
TDWOperation *self = (__bridge_transfer TDWOperation *)args;
if (self.executionBlock) {
self.executionBlock();
}
int *result = new int{};
return result;
}
@end
And now you can use it just like a regular NSOperation instance:
#import "TDWOpeartion.h"
#include <type_traits>
struct S {
    unsigned char s[1024 * 1024];
};
int main(int argc, const char * argv[]) {
    @autoreleasepool {
        NSOperationQueue *queue = [NSOperationQueue new];
        [queue addOperations:@[
            [[TDWOperation alloc] initWithExecutionBlock:^{
                using elemType = std::remove_all_extents_t<decltype(S::s)>;
                // 16 MiB on the stack: fits in the operation's 32 MiB thread stack
                S arr[16];
                auto numOfElems = sizeof(S::s) / sizeof(elemType);
                for (decltype(numOfElems) i = 0; i < numOfElems; ++i) {
                    // Iterate by reference: `auto val` would copy each 1 MiB element onto the stack
                    for (auto &val : arr) {
                        val.s[i] = i % sizeof(elemType);
                    }
                }
                NSLog(@"Sixteen MiB were initialized");
            }]
        ] waitUntilFinished:YES];
    }
    return 0;
}

Related

What are the Malloc memory leaks shown in Instruments in Xcode, and how can I fix them?

I recently started working on optimizing the memory usage in my Swift application. When I started using the Leaks Instrument, I got over 100 "Malloc" leaks with no descriptions. I've looked around, but cannot find an explanation.
I'm running iOS 12.0 and Xcode 10.2
I went as far as commenting out all of the functions that were being called in my viewDidLoad, and I'm still getting around 50 Malloc leaks.
I researched what causes memory leaks, and there's nothing in my code to suggest a leak, but I'm fairly new to memory management.
It's important for my app to not have leaks, so any help would be appreciated!
Let's say you have some very simple code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int mem(long int mem) {
char *buffer = malloc(sizeof(char) * mem);
// buffer is never freed: this is the deliberate leak Instruments should report
if(buffer == NULL) {
return 1;
} else {
return 0;
}
}
int main(int argc, const char * argv[]) {
long int mem_to_alloc = 2;
for(int i=0; i<30; i++, mem_to_alloc *= 2) {
printf("Allocation: %d Allocating: %ld\n", i, mem_to_alloc);
if( mem(mem_to_alloc) == 1 )
break;
sleep(1);
}
return 0;
}
First of all, make sure the proper build configuration is set in the scheme.
Once in Instruments, choose the Leaks template.
After you have run your code, you should see the suspicious allocations, with their call stacks (Stack Trace) on the right.
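Once the stack trace points at `mem`, the fix is to release the buffer before the function returns. As a sketch in C++ (assuming the surrounding program can be compiled as C++), a std::unique_ptr that owns the buffer releases it automatically when `mem` returns, while keeping the original return-value contract:

```cpp
#include <cassert>
#include <memory>
#include <new>

// Same contract as mem() above: returns 0 on success and 1 if the allocation
// fails. The buffer is owned by a std::unique_ptr, so it is released when the
// function returns and there is no leak left for Instruments to report.
int mem(long int size) {
    assert(size > 0);
    std::unique_ptr<char[]> buffer(new (std::nothrow) char[size]);
    if (buffer == nullptr) {
        return 1;
    }
    return 0;
}
```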

iOS - How to measure thread wakeups?

I have an app that's crashing due to too many "thread wakeups". For example:
45004 wakeups over the last 220 seconds (205 wakeups per second average), exceeding limit of 150 wakeups per second over 300 seconds
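For scale, the figures in that log work out as follows: 45004 wakeups over 220 seconds is about 204.6 per second, which the report shows as 205, well above the limit of 150 per second. A trivial sketch of the arithmetic; the function names are illustrative, and how the system actually windows and rounds the count is an assumption here.

```cpp
#include <cassert>
#include <cmath>

// Average wakeups per second over an observation window.
double wakeupsPerSecond(long long wakeups, double seconds) {
    assert(seconds > 0);
    return static_cast<double>(wakeups) / seconds;
}

// The rate rounded to a whole number (assuming ordinary rounding).
long roundedWakeupsPerSecond(long long wakeups, double seconds) {
    return std::lround(wakeupsPerSecond(wakeups, seconds));
}

// True when the average rate over the window is above the per-second limit.
bool exceedsLimit(long long wakeups, double seconds, double limitPerSecond) {
    return wakeupsPerSecond(wakeups, seconds) > limitPerSecond;
}
```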
This is difficult to debug because I know of no direct way to measure thread wakeups. The closest I've found is an Instruments template called System Trace that will show you number of blocked thread events. Presumably, this is closely related since a blocked thread means that that thread will sleep and then wake up when it becomes unblocked.
The weird thing about this is that the number of blocked threads is in the 10,000's range per second when the app is running normally and doesn't crash. My assumption is that a blocked, sleeping thread only counts towards your "wakeups" limit in certain circumstances - e.g. I would expect that a thread that is locked due to a mutex lock counts, whereas the OS simply transitioning to other threads in normal operation doesn't.
It would be amazing to me if Instruments had a Thread Wakeups template. The only documentation I can find is here - https://developer.apple.com/library/content/technotes/tn2151/_index.html:
The exception subtype WAKEUPS indicates that threads in the process are being woken up too many times per second, which forces the CPU to wake up very often and consumes battery life.
Typically, this is caused by thread-to-thread communication (generally using performSelector:onThread: or dispatch_async) that is unwittingly happening far more often than it should be. Because the sort of communication that triggers this exception is happening so frequently, there will usually be multiple background threads with very similar Backtraces - indicating where the communication is originating.
Here’s some Objective-C code based on Ivan’s answer you can copy + paste somewhere into your project (e.g. your applicationDidFinishLaunching: method) to log the number of wakeups per second (works on Mac and iOS):
#include <mach/task.h>
#include <mach/mach.h>
...
__block NSUInteger lastWakeups = 0;
[NSTimer scheduledTimerWithTimeInterval:1.0 repeats:YES block:^(NSTimer * _Nonnull timer) {
struct task_power_info info = {0};
mach_msg_type_number_t count = TASK_POWER_INFO_COUNT;
kern_return_t ret = task_info(current_task(), TASK_POWER_INFO, (task_info_t)&info, &count);
if (ret == KERN_SUCCESS) {
NSUInteger wakeups = info.task_interrupt_wakeups + info.task_timer_wakeups_bin_1 + info.task_timer_wakeups_bin_2;
NSLog(@"WAKEUPS: %lu per second", (unsigned long)(wakeups - lastWakeups));
lastWakeups = wakeups;
} else {
NSLog(@"Error: unable to get CPU wakeups (%d)", ret);
}
}];
Please take a look at https://developer.apple.com/forums/thread/124180. It describes code for getting the wakeup count from within your app, not only in Instruments. This may help:
#include <mach/task.h>
#include <mach/mach.h>
BOOL GetSystemWakeup(NSInteger *interrupt_wakeup, NSInteger *timer_wakeup) {
struct task_power_info info = {0};
mach_msg_type_number_t count = TASK_POWER_INFO_COUNT;
kern_return_t ret = task_info(current_task(), TASK_POWER_INFO, (task_info_t)&info, &count);
if (ret == KERN_SUCCESS) {
if (interrupt_wakeup) {
*interrupt_wakeup = info.task_interrupt_wakeups;
}
if (timer_wakeup) {
*timer_wakeup = info.task_timer_wakeups_bin_1 + info.task_timer_wakeups_bin_2;
}
return true;
}
else {
if (interrupt_wakeup) {
*interrupt_wakeup = 0;
}
if (timer_wakeup) {
*timer_wakeup = 0;
}
return false;
}
}
There you can also find some reasons why wakeups occur too often.
The "Energy Efficiency Guide for Mac Apps" at the end mentions a command-line utility called timerfires that can be used to see what is causing wakeups.
However, the utility seems to be outdated on macOS 12 Monterey, as I was getting errors like the following when I first tried to run it:
probe description fbt::thread_dispatch:entry does not match any probes
I had to copy the utility and edit it to remove all the DTrace methods that are no longer available to get the tool working.
Once that was done, the tool will show each timer invocation, which is very helpful to track down timers in order to reduce wakeups.

XCTestCase disable Snapshot accessibility hierarchy

I'm using Xcode's XCTestCase for automated UI testing in order to measure the performance of my application. I currently have a UITableView with 25,000 elements in it, and when I run tests that are supposed to swipe through this list, they take forever and crash before finishing. The app target's CPU usage is at 100% at this point.
The last output in the console is:
Snapshot accessibility hierarchy for
When I limit the list to a few hundred elements (not acceptable), the automated test is at least able to scroll the list, but with around a 3-4 second wait between each scroll.
Test scenario:
let app = XCUIApplication();
app.buttons["Long list"].tap();
let table = app.tables.element;
table.swipeUp();
table.swipeUp();
So is there any way of speeding up the testing? Perhaps by disabling the accessibility hierarchy (I'm not using accessibility labels for the tests anyway).
Maybe you can use the private XCEventGenerator API, for example - (double)pressAtPoint:(struct CGPoint)arg1 forDuration:(double)arg2 liftAtPoint:(struct CGPoint)arg3 velocity:(double)arg4 orientation:(long long)arg5 name:(id)arg6 handler:(CDUnknownBlockType)arg7, by declaring its interface yourself:
#ifndef XCEventGenerator_h
#define XCEventGenerator_h
typedef void (^CDUnknownBlockType)(void);
@interface XCEventGenerator : NSObject
+ (id)sharedGenerator;
// iOS 10.3 specific
- (double)forcePressAtPoint:(struct CGPoint)arg1 orientation:(long long)arg2 handler:(CDUnknownBlockType)arg3;
- (double)pressAtPoint:(struct CGPoint)arg1 forDuration:(double)arg2 orientation:(long long)arg3 handler:(CDUnknownBlockType)arg4;
- (double)pressAtPoint:(struct CGPoint)arg1 forDuration:(double)arg2 liftAtPoint:(struct CGPoint)arg3 velocity:(double)arg4 orientation:(long long)arg5 name:(id)arg6 handler:(CDUnknownBlockType)arg7;
@end
#endif /* XCEventGenerator_h */
- (void)testExample {
XCUIApplication* app = [[XCUIApplication alloc] init];
XCUICoordinate* start_coord = [app coordinateWithNormalizedOffset:CGVectorMake(0.5, 0.3)];
XCUICoordinate* end_coord = [app coordinateWithNormalizedOffset:CGVectorMake(0.5, 0.7)];
NSLog(@"Start sleeping");
[NSThread sleepForTimeInterval:4];
NSLog(@"end sleeping");
for(int i = 0; i < 100; i++)
{
[[XCEventGenerator sharedGenerator] pressAtPoint:start_coord.screenPoint
forDuration:0
liftAtPoint:end_coord.screenPoint
velocity:1000
orientation:0
name:@"drag"
handler:^{}];
[NSThread sleepForTimeInterval:1];
}
}

Doing simple malloc/free within dispatch_async causes memory leak on iOS9

My code started leaking memory after I updated my iPad to iOS 9; the same code worked fine on iOS 8 and iOS 7.
I have an anonymous thread created by the following code:
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0), ^{
[self threadWork];
});
And the thread does a pair of malloc/free call like this:
- (void)threadWork {
// Create a serial queue.
dispatch_queue_t mySerialQueue = dispatch_queue_create("myQueue", NULL);
while (1) {
// Do a simple malloc.
int *foo = (int *)malloc(1024);
// Do free in serial queue.
dispatch_async(mySerialQueue, ^{
free(foo);
});
[NSThread sleepForTimeInterval:1.0 / 60.0];
}
}
This routine keeps increasing memory usage and eventually crashes the device on iOS 9. The same problem also happens with new/delete in Objective-C++.
I found some other ways to do this without a memory leak:
Use the main queue or a global queue instead of the serial queue.
Create a concurrent queue instead of the serial queue.
Use [NSThread detachNewThreadSelector:toTarget:withObject:] to create the thread instead of GCD.
I don't understand why this simple routine causes this problem.
I've searched this on google but found nothing.
How can I do this with keeping serial queue and GCD anonymous thread?
Update:
I added NSLog calls to my code to find out when malloc and free are called. The result shows that both are called immediately and in pairs. I also tried slowing the thread down to once per second, but the problem is still there.
The test code of thread:
- (void)threadWork {
uint64_t mallocCount = 0;
__block uint64_t freeCount = 0;
dispatch_queue_t mySerialQueue = dispatch_queue_create("MyQueue", NULL);
while (1) {
void *test = malloc(1024);
NSLog(@"malloc %llu", ++mallocCount);
dispatch_async(mySerialQueue, ^{
free(test);
NSLog(@"free %llu", ++freeCount);
});
[NSThread sleepForTimeInterval:1.0];
}
}
The console result:
...
2015-10-23 09:51:33.876 OS9MemoryTest[759:153135] malloc 220
2015-10-23 09:51:33.876 OS9MemoryTest[759:153133] free 220
2015-10-23 09:51:34.877 OS9MemoryTest[759:153135] malloc 221
2015-10-23 09:51:34.878 OS9MemoryTest[759:153133] free 221
2015-10-23 09:51:35.883 OS9MemoryTest[759:153135] malloc 222
2015-10-23 09:51:35.883 OS9MemoryTest[759:153133] free 222
I think I've found a better way to avoid the leak than using dispatch_sync.
The key seems to be the Quality of Service (QoS) class of the serial queue.
Freeing memory in a queue whose QoS class is QOS_CLASS_UNSPECIFIED causes this problem.
In my question, I free memory in a serial queue that was created by the following call:
dispatch_queue_t mySerialQueue = dispatch_queue_create("MyQueue", NULL);
Its QoS class is QOS_CLASS_UNSPECIFIED, which causes this problem.
If I create the serial queue with a dispatch_queue_attr_t object whose QoS class is anything other than QOS_CLASS_UNSPECIFIED, the code runs perfectly without leaking:
- (void)threadWork {
// Create a serial queue with QoS class.
dispatch_queue_attr_t attr = dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL, QOS_CLASS_DEFAULT, 0);
dispatch_queue_t mySerialQueue = dispatch_queue_create("myQueue", attr);
while (1) {
// Do a simple malloc.
int *foo = (int *)malloc(1024);
// Do free in serial queue.
dispatch_async(mySerialQueue, ^{
free(foo);
});
[NSThread sleepForTimeInterval:1.0 / 60.0];
}
}
I still don't understand why this problem happens on iOS 9, but setting the QoS class seems to make things work.
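For comparison, the allocate-on-one-thread, free-on-a-serial-queue pattern itself has no inherent leak, which matches the paired malloc/free counts in the update. The sketch below is a portable C++ analog, not GCD and not a reproduction of the iOS 9 behavior: `SerialQueue` and `mallocFreeRounds` are hypothetical names, a single worker thread stands in for the serial queue, and counting the frees confirms every block is released once the queue drains.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <cstdlib>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Minimal serial queue: one worker thread runs submitted jobs in FIFO order.
class SerialQueue {
public:
    SerialQueue() : worker_([this] { run(); }) {}
    ~SerialQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();  // returns only after all queued jobs have run
    }
    void async(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // shutting down and nothing left
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();
        }
    }
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool done_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};

// Producer allocates, serial queue frees, like threadWork in the question.
std::uint64_t mallocFreeRounds(std::uint64_t rounds) {
    std::uint64_t freed = 0;  // only modified on the queue's worker thread
    {
        SerialQueue queue;
        for (std::uint64_t i = 0; i < rounds; ++i) {
            void *block = std::malloc(1024);
            queue.async([block, &freed] {
                std::free(block);
                ++freed;
            });
        }
    }  // the destructor joins the worker, so every free has run by now
    return freed;
}
```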

does NSThread create autoreleasepool automatically now?

I have test code like this
- (void)viewDidLoad
{
[super viewDidLoad];
NSThread *thread = [[NSThread alloc] initWithTarget:self selector:@selector(test) object:nil];
[thread start];
}
-(void)test
{
MyClass *my = [[[MyClass alloc] init] autorelease];
NSLog(@"%@",[my description]);
}
I did not create any autorelease pool for my own thread, but when the thread exits, the object "my" is still deallocated. Why?
Even if I change my test code as below:
- (void)viewDidLoad
{
[super viewDidLoad];
NSThread *thread = [[NSThread alloc] initWithTarget:self selector:@selector(test) object:nil];
[thread start];
}
-(void)test
{
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
MyClass *my = [[[MyClass alloc] init] autorelease];
NSLog(@"%@",[my description]);
}
Here I create my own autorelease pool but don't drain it when the thread exits, and the object "my" is still deallocated anyway. Why?
I'm using Xcode 5 without ARC.
It's not documented, but the answer appears to be Yes, on OS X 10.9+ and iOS 7+.
The Objective-C runtime is open-source so you can read the source to see what's going on. The latest version of the runtime (646, which shipped with OS X 10.10 and iOS 8) does indeed add a pool if you perform an autorelease without a pool on the current thread. In NSObject.mm:
static __attribute__((noinline))
id *autoreleaseNoPage(id obj)
{
// No pool in place.
assert(!hotPage());
if (obj != POOL_SENTINEL && DebugMissingPools) {
// We are pushing an object with no pool in place,
// and no-pool debugging was requested by environment.
_objc_inform("MISSING POOLS: Object %p of class %s "
"autoreleased with no pool in place - "
"just leaking - break on "
"objc_autoreleaseNoPool() to debug",
(void*)obj, object_getClassName(obj));
objc_autoreleaseNoPool(obj);
return nil;
}
// Install the first page.
AutoreleasePoolPage *page = new AutoreleasePoolPage(nil);
setHotPage(page);
// Push an autorelease pool boundary if it wasn't already requested.
if (obj != POOL_SENTINEL) {
page->add(POOL_SENTINEL);
}
// Push the requested object.
return page->add(obj);
}
This function is called when you push the first pool (in which case the thing pushed is POOL_SENTINEL) or when you autorelease with no pool in place. When the first pool is pushed, it sets up the autorelease stack. But as you can see from the code, as long as DebugMissingPools is off (it is controlled by the OBJC_DEBUG_MISSING_POOLS environment variable and is off by default), autoreleasing with no pool also sets up the autorelease stack and then pushes a pool (a POOL_SENTINEL).
Similarly, (it's a little hard to follow without looking at the other code, but this is the relevant part) when the thread is destroyed (and the Thread-Local Storage is destroyed), it releases everything in the autorelease stack (that's what the pop(0); does) so it doesn't rely on the user to pop the last pool:
static void tls_dealloc(void *p)
{
// reinstate TLS value while we work
setHotPage((AutoreleasePoolPage *)p);
pop(0);
setHotPage(nil);
}
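The two behaviors described above, creating the pool lazily on the first autorelease and releasing everything when the thread's TLS is torn down, can be mimicked in portable C++ with a function-local `thread_local` object. This is only an analogy under those assumptions: the real runtime uses pages of object pointers and POOL_SENTINEL boundaries, and `AutoreleaseStack`, `currentStack` and `autoreleaseObject` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counts destroyed objects so the effect is observable from another thread.
std::atomic<int> destroyedCount{0};

struct Object {
    ~Object() { ++destroyedCount; }
};

// Hypothetical per-thread "autorelease stack".
class AutoreleaseStack {
public:
    // Runs when the owning thread exits, like tls_dealloc calling pop(0):
    // everything still pending is released without an explicit drain.
    ~AutoreleaseStack() {
        for (Object *obj : pending_) {
            delete obj;
        }
    }
    void add(Object *obj) { pending_.push_back(obj); }

private:
    std::vector<Object *> pending_;
};

// A function-local thread_local is constructed on first use in each thread,
// mirroring how the runtime installs the first page lazily.
AutoreleaseStack &currentStack() {
    thread_local AutoreleaseStack stack;
    return stack;
}

// "Autorelease" with no pool set up beforehand: the stack appears on demand.
Object *autoreleaseObject(Object *obj) {
    currentStack().add(obj);
    return obj;
}
```

Objects autoreleased on a `std::thread` this way are destroyed as that thread exits, before `join()` returns in the thread that waits for it.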
The previous version of the runtime (551.1, which came with OS X 10.9 and iOS 7), also did this, as you can see from its NSObject.mm:
static __attribute__((noinline))
id *autoreleaseSlow(id obj)
{
AutoreleasePoolPage *page;
page = hotPage();
// The code below assumes some cases are handled by autoreleaseFast()
assert(!page || page->full());
if (!page) {
// No pool. Silently push one.
assert(obj != POOL_SENTINEL);
if (DebugMissingPools) {
_objc_inform("MISSING POOLS: Object %p of class %s "
"autoreleased with no pool in place - "
"just leaking - break on "
"objc_autoreleaseNoPool() to debug",
(void*)obj, object_getClassName(obj));
objc_autoreleaseNoPool(obj);
return nil;
}
push();
page = hotPage();
}
do {
if (page->child) page = page->child;
else page = new AutoreleasePoolPage(page);
} while (page->full());
setHotPage(page);
return page->add(obj);
}
But the version before that (532.2, which came with OS X 10.8 and iOS 6), does not:
static __attribute__((noinline))
id *autoreleaseSlow(id obj)
{
AutoreleasePoolPage *page;
page = hotPage();
// The code below assumes some cases are handled by autoreleaseFast()
assert(!page || page->full());
if (!page) {
assert(obj != POOL_SENTINEL);
_objc_inform("Object %p of class %s autoreleased "
"with no pool in place - just leaking - "
"break on objc_autoreleaseNoPool() to debug",
obj, object_getClassName(obj));
objc_autoreleaseNoPool(obj);
return NULL;
}
do {
if (page->child) page = page->child;
else page = new AutoreleasePoolPage(page);
} while (page->full());
setHotPage(page);
return page->add(obj);
}
Note that the above works for any pthreads, not just NSThreads.
So basically, if you are running on OS X 10.9+ or iOS 7+, autoreleasing on a thread without a pool should not lead to a leak. This is not documented and is an internal implementation detail, so be careful relying on this as Apple could change it in a future OS. However, I don't see any reason why they would remove this feature as it is simple and only has benefits and no downsides, unless they completely re-write the way autorelease pools work or something.
Apple documentation says (4th paragraph):
You create an NSAutoreleasePool object with the usual alloc and init messages and dispose of it with drain (or release—to understand the difference, see Garbage Collection). Since you cannot retain an autorelease pool (or autorelease it—see retain and autorelease), draining a pool ultimately has the effect of deallocating it. You should always drain an autorelease pool in the same context (invocation of a method or function, or body of a loop) that it was created. See Using Autorelease Pool Blocks for more details.
