iOS: How to avoid autoreleased copies when manipulating a large NSString instance?

I have a scenario in an iOS application where manipulating a very large NSString instance (an HTTP response, upwards of 11MB) results in multiple large intermediaries being in memory at once, since the SDK methods I am calling return new autoreleased instances. What is the best approach to take here?
For example, assuming that largeString is an autoreleased NSString instance:
NSArray *partsOfLargeString = [largeString componentsSeparatedByString:separator];
for (NSString *part in partsOfLargeString) {
NSString *trimmedPart = [part stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSData *data = [trimmedPart dataUsingEncoding:NSUTF8StringEncoding];
}
It would be great if there were non-autoreleased equivalents to componentsSeparatedByString or stringByTrimmingCharactersInSet, but I'm not looking to implement these myself.
To my knowledge, there isn't a way to "force" release an object that has already been added to an autorelease pool. I know that I can create and use my own autorelease pool here, but I'd like to be extremely granular and having autorelease pools around individual statements definitely isn't a very scalable approach.
Any suggestions are much appreciated.

As Bill said, I’d first try to have an autorelease pool for each loop iteration, e.g.:
for (NSString *part in partsOfLargeString) {
NSAutoreleasePool *pool = [NSAutoreleasePool new];
NSString *trimmedPart = [part stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSData *data = [trimmedPart dataUsingEncoding:NSUTF8StringEncoding];
…
[pool drain];
}
or, if you’re using a recent enough compiler:
for (NSString *part in partsOfLargeString) {
@autoreleasepool {
NSString *trimmedPart = [part stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSData *data = [trimmedPart dataUsingEncoding:NSUTF8StringEncoding];
…
}
}
If that’s still not acceptable and you do need to release objects in a more granular fashion, you could use something like:
static inline __attribute__((ns_returns_retained))
id BICreateDrainedPoolObject(id (^expression)(void)) {
NSAutoreleasePool *pool = [NSAutoreleasePool new];
id object = expression();
[object retain];
[pool drain];
return object;
}
#define BIOBJ(expression) BICreateDrainedPoolObject(^{return (expression);})
which evaluates the expression, retains its result, releases any ancillary autoreleased objects and returns the result; and then:
for (NSString *part in partsOfLargeString) {
NSAutoreleasePool *pool = [NSAutoreleasePool new];
NSString *trimmedPart = BIOBJ([part stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]);
NSData *data = BIOBJ([trimmedPart dataUsingEncoding:NSUTF8StringEncoding]);
[trimmedPart release];
// do something with data
[data release];
…
[pool drain];
}
Note that, since the function returns a retained object, you’re responsible for releasing it. You’ll have control over when to do that.
Feel free to choose better names for the function and macro. There might be some corner cases that should be handled but it should work for your particular example. Suggestions are welcome!

First, you shouldn't need to parse responses from an HTTP server in this fashion. Parsing HTTP responses (including parsing HTML) is a solved problem and attempting to parse it using raw string manipulation will lead to fragile code that can easily be crashed with seemingly innocuous server side changes.
Autorelease pools are pretty cheap. You could surround the body of the for loop with @autoreleasepool { ... that code ... } and it'll probably both fix your high-water issue and have negligible performance impact compared to the raw string manipulation.
Beyond that, your summary is correct -- if there isn't a non-autoreleasing variant in the 'kit, then you'd have to re-invent the wheel. With that said, it is fairly typical that the lack of a non-autoreleasing variant is not an oversight on the part of the designer. Instead, it is likely because there are better tools available for achieving the sort of high-volume solution that would also require finer grained memory management.

Related

Objective-C memory leak when returning NSString

I want to be sure that my code is not leaking, since this small snippet is called thousand times in my app. I run the app through Instruments and the initWithBytes seems to be problematic. Is anything wrong in this code?
First [reader readString] is called.
case FirstCase:
{
NSString *string = [reader readString];
[self setPropertyByName:propertyName value:string];
break;
}
...
readString returns an autoreleased string.
- (NSString*) readString
{
...
NSString *string = [[[NSString alloc] initWithBytes:cursor length:stringLength encoding:NSUTF8StringEncoding] autorelease];
return string;
}
Is the code OK? Any other better approach to avoid autorelease?
I cannot change my code to ARC. Plain old non-ARC memory management.
What you posted is OK. The only rule at this point is that methods whose names contain "create" or "alloc" return an object that needs to be explicitly released. In your case that is the string allocated in the readString method.
Since the object is being returned, it has to stay alive until the caller is done with it, which the autorelease pool takes care of: the pool is normally drained at the end of the run loop cycle. That means, for instance, that if this method is called in a for loop, the objects will not be deallocated before the loop has exited.
If you want or need to avoid that, I suggest you follow the same naming pattern with "create" or "alloc" and return an object that is not autoreleased:
case FirstCase:
{
NSString *string = [reader createReadString];
[self setPropertyByName:propertyName value:string];
[string release];
break;
}
...
- (NSString*) createReadString
{
...
NSString *string = [[NSString alloc] initWithBytes:cursor length:stringLength encoding:NSUTF8StringEncoding];
return string;
}

Why is the memory allocated from componentsSeparatedByString never being deallocated

I have an iOS app which does a lot of calculation and is using standard ARC for memory management. After I run it for a few minutes it crashes due to being out of memory. I checked with Instruments and most of the memory is being eaten up by allocations from a call to NSString's componentsSeparatedByString.
I tried running it in an autorelease pool but that didn't help much. Since there are no references to that string outside of my function, I'm confused why the memory isn't being automatically deallocated. I also have another function which is having the same problem with componentsSeparatedByString.
Here is the code:
- (void) processWorkWithExtraData:(NSData *) extraData
{
@autoreleasepool {
NSString *string = [[NSString alloc] initWithData:extraData encoding:NSUTF8StringEncoding];
NSArray *dataArray = [string componentsSeparatedByString:@","]; // eats up memory like crazy!!!
NSMutableArray *objectArray = [[NSMutableArray alloc] init];
for (int i=0;i<[dataArray count];i += 1)
{
TestObject *p = [[TestObject alloc] initWithFloat:[[dataArray objectAtIndex:i] floatValue]];
[objectArray addObject:p];
}
[self processArray: objectArray]; // just performs math computations on the floats in the objects
}
}
If anyone knows why the memory would not be freed here, please let me know.
Figured out the problem, I thought I was using ARC but I wasn't (:
Good thing is this fixes my memory issues.
Bad thing is that it's much slower (50-70% slower).
I guess that's the price one has to pay for the magic that is ARC.

Peculiar issue with wiping the underlying buffer of NSString

I am using a technique outlined in the book Hacking and Securing iOS Applications (relevant section here) to wipe the underlying buffer of a NSString as shown below.
NSString *s = [NSString stringWithFormat:@"Hello"];
unsigned char *text = (unsigned char*)CFStringGetCStringPtr((CFStringRef)s, CFStringGetSystemEncoding());
if (text != NULL)
{
memset(text, 0, [s length]);
}
This works, unless the string is a certain value.
// The following crash with EXC_BAD_ACCESS on memset
NSString *s = [NSString stringWithFormat:@"No"];
NSString *s = [NSString stringWithFormat:@"Yes"];
// These work fine though
NSString *s = [NSString stringWithFormat:@"Hello"];
NSString *s = [NSString stringWithFormat:@"Do"];
NSString *s = [@"N" stringByAppendingString:@"o"];
It looks like certain strings are not created on the heap; instead they are optimized to point into a read-only string table, even though the code appears to create them at runtime.
Indeed, constant strings are not created on the heap and live in read-only memory. This includes a few that look like runtime strings but are turned into compile-time constants; your examples are such statements.
With this statement fragment there is no reason not to make it a compile-time constant.
[NSString stringWithFormat:@"No"]
It is equivalent to:
@"No"
A suggestion: file a bug report requesting a secure string class. I have. Several have been filed, and I have been told that if enough are filed (whatever that amount might be), Apple will implement it.
It is possible to subclass NSString. It is not easy, but you will have complete control of the actual buffer, and it should not be subject to possible failure due to Apple changing an implementation detail. I have done that successfully.
NSString *s = [NSString stringWithFormat:@"No"]; will be optimised to NSString *s = @"No"; because there are no substitutions in the format. Assigning a literal will give you a pointer to the read-only text segment of the loaded binary.
NSString *s = [@"N" stringByAppendingString:@"o"]; will create a new string on the heap and return a reference to it. The heap is read-write even though the datatype NSString is read-only.
When you get the CString pointer to the underlying data it's pointing either into read-only data in the first case, or read-write data in the second. The memset will fail on the readonly memory, but succeed on the read-write.

multiple assignments of objects to variable under ARC

There's a point of memory management I'm not 100% clear on. Suppose there is the following code:
{
NSString *string = [[NSString alloc] init];
string = [[NSString alloc] init];
}
Does this cause a memory leak of the first allocation? If not why not?
Under ARC, this does not leak memory. This is because any time a strong object pointer is changed, the compiler automatically sends a release to the old object. Local variables, like NSString *string, are strong by default.
So your code above gets compiled to something more like this:
{
NSString *string = [[NSString alloc] init];
// Oh, we're changing what `string` points to. Gotta release the old value.
[string release];
string = [[NSString alloc] init];
}
Conceptually, BJ is correct, but the generated code is slightly different. It goes something like this:
NSString *string = [[NSString alloc] init];
// Oh, we're changing what `string` points to. Gotta release the old value.
NSString *tmpString = string;
string = [[NSString alloc] init];
[tmpString release];
[string release]; // string goes out of scope at this point in your code
This order of operation is usually not that critical (and if you care too much about it, you are probably coding incorrectly). But understanding it explains why the objects are destroyed exactly when they are.
No it does not cause a leak. ARC will release the first string before it sets the second string. This is the truly amazing power of ARC!

Sensitive data: NSString VS NSMutableString (iPhone)

I have some sensitive data I want to clear directly after use. Currently, the sensitive data is in the form of NSString. NSString is in my understanding immutable, meaning that I can't really clear the data. NSMutableString seems more appropriate, though, as it is mutable and has methods like replaceCharactersInRange and deleteCharactersInRange. I have no knowledge of the implementation details so I wonder if NSMutableString would serve my purpose?
I would be afraid NSMutableString would try to optimize and leave the string in memory. If you want more control try allocating your own memory then create an NSString with it. If you do that you can overwrite the memory before you release it.
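char *block = malloc(200);
// fill block with the sensitive bytes, then wrap it without copying
NSString *string = [[NSString alloc] initWithBytesNoCopy:block length:200 encoding:NSUTF8StringEncoding freeWhenDone:NO];
// use string
memset(block, 0, 200); // overwrite block with 0
[string release];
free(block);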
You need to overwrite the memory behind the C pointer with zeros using a memset-style function. However, a plain memset call can be optimized out by the compiler; see What is the correct way to clear sensitive data from memory in iOS?
So the code could be something like this:
NSString *string = @"hi";
unsigned char *stringChars = (unsigned char *)CFStringGetCStringPtr((CFStringRef)string, CFStringGetSystemEncoding());
safeMemset(stringChars, 0, [string length]);
But be careful clearing the underlying c pointer of an NSString. On a device for example, if the string contains the word "password", the underlying c pointer just reuses or points to the same address as used by the system and you will crash by trying to wipe this area of memory.
To be safe you may want to use a char array, not the char pointer, to store your sensitive strings and wipe them after without ever putting it into an NSString object.
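To make the char-array suggestion concrete, here is a minimal sketch in plain C, which also compiles inside an Objective-C file. The names secure_wipe and is_all_zero are hypothetical helpers, not SDK functions. Writing through a volatile pointer is one common way to discourage the compiler from eliding the stores; treat it as an assumption about your toolchain rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero `len` bytes at `buf` through a volatile pointer,
   so each store is treated as an observable side effect. */
static void secure_wipe(void *buf, size_t len) {
    assert(buf != NULL || len == 0);
    volatile unsigned char *p = (volatile unsigned char *)buf;
    while (len--) {
        *p++ = 0;
    }
}

/* Hypothetical helper: returns 1 if every byte in buf[0..len) is zero, else 0. */
static int is_all_zero(const unsigned char *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

A caller would keep the secret in something like unsigned char secret[32], use it, and then call secure_wipe(secret, sizeof secret) before the array goes out of scope, without ever wrapping the bytes in an NSString.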
If an attacker can read the contents of memory, you are beyond hosed.
-release the string and be done with it. There's no way to know if you've deleted any possible copies of the string in various caches (such as if you draw it to screen, etc).
You probably have much more significant security issues to worry about.
As of iOS 9, the inner pointer of an NSString obtained from the snippet below has become read-only, and trying to set the bytes generates a bad access.
unsigned char *stringChars = (unsigned char *)CFStringGetCStringPtr((CFStringRef)string, CFStringGetSystemEncoding());
It is possible with NSMutableString but then if you have another NSString source, say from a textfield, that source will still be in memory and you're still out of luck.
If you are creating a new NSString, the best way is to implement your own string class with an underlying byte array. Provide a method that creates NSString copies using the underlying byte array as the inner pointer:
-(NSString *)string
{
return [[NSString alloc] initWithBytesNoCopy:_buff length:_length encoding:NSUTF8StringEncoding freeWhenDone:NO];
}
// Will prematurely wipe data and all its copies when called
- (void)clear
{
// Volatile keyword disables compiler's optimization
volatile unsigned char *t = (unsigned char *)_buff;
int len = _length;
while (len--) {
*t++ = 0;
}
}
// In case you forget to clear, it will be cleared on dealloc
- (void)dealloc
{
[self clear];
free(_buff);
}

Resources