Sensitive data: NSString VS NSMutableString (iPhone) - ios

I have some sensitive data I want to clear directly after use. Currently, the sensitive data is in the form of NSString. NSString is in my understanding immutable, meaning that I can't really clear the data. NSMutableString seems more appropriate, though, as it is mutable and has methods like replaceCharactersInRange and deleteCharactersInRange. I have no knowledge of the implementation details so I wonder if NSMutableString would serve my purpose?

I would be afraid NSMutableString would try to optimize and leave the string in memory. If you want more control try allocating your own memory then create an NSString with it. If you do that you can overwrite the memory before you release it.
char *block = malloc(200);
// copy the sensitive bytes into block here
NSString *string = [[NSString alloc] initWithBytesNoCopy:block length:200 encoding:NSUTF8StringEncoding freeWhenDone:NO];
// use string
memset(block, 0, 200); // overwrite block with 0
[string release];
free(block);

You need to wipe the C pointer with zeros using memset. However, a memset call can be optimized out by the compiler; see "What is the correct way to clear sensitive data from memory in iOS?"
So the code could be something like this:
NSString *string = @"hi";
unsigned char *stringChars = (unsigned char *)CFStringGetCStringPtr((CFStringRef)string, CFStringGetSystemEncoding());
safeMemset(stringChars, 0, [string length]);
But be careful when clearing the underlying C pointer of an NSString. On a device, for example, if the string contains the word "password", the underlying C pointer may simply point to the same address the system already uses for that string, and you will crash when you try to wipe that area of memory.
To be safe you may want to use a char array, not the char pointer, to store your sensitive strings and wipe them after without ever putting it into an NSString object.
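A minimal sketch of the char-array approach, written in plain C (which also compiles as Objective-C). The names wipe_bytes and use_and_wipe are hypothetical and are not taken from the linked question; the helper simply writes zeros through a volatile pointer so the compiler has to perform each store instead of dropping the wipe as dead code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero `len` bytes at `buf` through a volatile
   pointer, so every store has to be performed. */
void wipe_bytes(void *buf, size_t len) {
    volatile unsigned char *p = (volatile unsigned char *)buf;
    while (len--) {
        *p++ = 0;
    }
}

/* Keeps the secret in a plain char array (never in an NSString),
   uses it, then wipes it. Returns the number of non-zero bytes
   left in the array after the wipe. */
size_t use_and_wipe(const char *secret) {
    char local[64];
    size_t leftover = 0;

    /* strncpy pads the rest of the array with zeros */
    strncpy(local, secret, sizeof local - 1);
    local[sizeof local - 1] = '\0';

    /* ... use `local` here ... */

    wipe_bytes(local, sizeof local);

    for (size_t i = 0; i < sizeof local; i++) {
        if (local[i] != 0) {
            leftover++;
        }
    }
    return leftover;
}
```

This only covers the buffer you own. Any copy made by the frameworks (for example when the text is drawn or passed through a text field) is outside its reach.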

If an attacker can read the contents of memory, you are beyond hosed.
-release the string and be done with it. There's no way to know if you've deleted any possible copies of the string in various caches (such as if you draw it to screen, etc).
You probably have much more significant security issues to worry about.

As of iOS 9, the inner pointer of an NSString obtained with the snippet below is read-only, and trying to set its bytes causes a bad access.
unsigned char *stringChars = (unsigned char *)CFStringGetCStringPtr((CFStringRef)string, CFStringGetSystemEncoding());
It is possible with NSMutableString, but if the data came from another NSString source, say a text field, that source will still be in memory and you're still out of luck.
If you are creating a new NSString, the best way is to implement your own string class with an underlying byte array. Provide a method that creates NSString copies using the underlying byte array as the inner pointer:
-(NSString *)string
{
return [[NSString alloc] initWithBytesNoCopy:_buff length:_length encoding:NSUTF8StringEncoding freeWhenDone:NO];
}
// Will prematurely wipe data and all its copies when called
- (void)clear
{
// Volatile keyword disables compiler's optimization
volatile unsigned char *t = (unsigned char *)_buff;
int len = _length;
while (len--) {
*t++ = 0;
}
}
// In case you forget to clear, it will be cleared on dealloc
- (void)dealloc
{
[self clear];
free(_buff);
}

Related

What exactly does @"" do behind the scenes?

When giving an NSString a value using = @"", what exactly is that shorthand for? I noticed that, all other things being equal in an application, global NSStrings made with @"" don't need retain to be used outside of the methods that gave them that value, whereas NSStrings given a value in any other way do.
I have found this question utterly unsearchable, so I thank you for your time.
A related answer: Difference between NSString literals
In short, when writing @"" in the global scope of your program, it does the same as writing "": the literal string you specified will be embedded in your binary at compile time, and thus cannot be deallocated at runtime. This is why you don't need release or retain.
The @ just tells the compiler to construct an NSString object from the C string literal. Mind you that this construction is very cheap and probably heavily optimized. You can follow the link and see this example:
NSString *str = @"My String";
NSLog(@"%@ (%p)", str, str);
NSString *str2 = [[NSString alloc] initWithString:@"My String"];
NSLog(@"%@ (%p)", str2, str2);
Producing this output:
2011-11-07 07:11:26.172 Craplet[5433:707] My String (0x100002268)
2011-11-07 07:11:26.174 Craplet[5433:707] My String (0x100002268)
Note the same memory addresses
EDIT:
I ran some tests myself; see this code:
static NSString *str0 = @"My String";
int main(int argc, const char * argv[]) {
NSLog(@"%@ (%p)", str0, str0);
NSString *str = @"My String";
NSLog(@"%@ (%p)", str, str);
NSString *str2 = [[NSString alloc] initWithString:@"My String"];
NSLog(@"%@ (%p)", str2, str2);
return 0;
}
Will produce this output:
2015-12-13 21:20:00.771 Test[6064:1195176] My String (0x100001030)
2015-12-13 21:20:00.772 Test[6064:1195176] My String (0x100001030)
2015-12-13 21:20:00.772 Test[6064:1195176] My String (0x100001030)
Also, when using the debugger you can see that the objects created from literals are in fact __NSCFConstantString objects.
I think the related concept is called class clusters.
Objective-C is a superset of C, which means you should be able to write C code in there and have it work. As a result, the compiler needs a way to distinguish between a C string (an old-school series of bytes) and an Objective-C NSString (Unicode support, etc.). This is done using the @ symbol, such as @"Hello".
@"Hello" can't be released because it's a literal string: it was written into the program when it was built, rather than being assigned at runtime.
Strings created with the literal syntax, i.e. with @, reside as C strings in the data segment of the executable. They are allocated only once, at program launch, and are never released until the program quits.
As an exercise you can try this:
NSString *str1 = @"Hello";
NSString *str2 = @"Hello";
NSLog(@"Memory Address of str1 : %p", str1);
NSLog(@"Memory Address of str2 : %p", str2);
Both log statements will print the same address, which means literals are constant strings with the same lifetime as the program.

Need assistance regarding NSString

In the NSString Class Reference, what does this mean?
Distributed objects:
Over distributed-object connections, mutable string objects are passed by-reference and immutable string objects are passed by-copy.
Also, an NSString can't be changed, so what is happening when I change str in this code?
NSString *str = @"";
for (int i=0; i<1000; i++) {
str = [str stringByAppendingFormat:@"%d", i];
}
Will I get a memory leak?
What your code is doing:
NSString *str = @""; // pointer str points at memory address 123 for example
for (int i=0; i<1000; i++) {
// now you don't change the value to which the pointer str points
// instead you create a new string located at address, let's say, 900 and let the pointer str know to point at address 900 instead of 123
str = [str stringByAppendingFormat:@"%d", i]; // this method creates a new string and returns a pointer to the new string!
// you can't do this because str is immutable
// [str appendString:@"mmmm"];
}
Mutable means you can change the NSString. For example with appendString.
Pass by copy means that you get a copy of the NSString and can do whatever you want with it; it does not change the original NSString.
- (void)magic:(NSString *)string
{
string = @"LOL";
NSLog(@"%@", string);
}
// somewhere in your code
NSString *s = @"Hello";
NSLog(@"%@", s); // prints Hello
[self magic:s]; // prints LOL
NSLog(@"%@", s); // prints Hello, not LOL
But imagine you get a mutable NSString.
- (void)magic2:(NSMutableString *)string
{
[string appendString:@".COM"];
}
// somewhere in your code
NSString *s = @"Hello";
NSMutableString *m = [s mutableCopy];
NSLog(@"%@", m); // prints Hello
[self magic2:m];
NSLog(@"%@", m); // prints Hello.COM
Because you pass a reference you can actually change the "value" of your string object since you are working with the original version and not a duplicate.
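The same distinction can be sketched in plain C, which also compiles as Objective-C. This is only an analogy under stated assumptions: reassign and append_suffix are hypothetical names standing in for magic: and magic2:, and a char buffer stands in for NSMutableString.

```c
#include <string.h>

/* Like magic: above. Only the local copy of the pointer is changed,
   so the caller's pointer still refers to the original text. */
void reassign(const char *s) {
    s = "LOL";
    (void)s; /* the new value is never visible outside this function */
}

/* Like magic2: above. The bytes behind the pointer are changed,
   so the caller sees the modification. `cap` is the total size of
   the caller's buffer, including room for the terminating NUL. */
void append_suffix(char *s, size_t cap) {
    size_t used = strlen(s);
    if (used < cap) {
        strncat(s, ".COM", cap - used - 1);
    }
}
```

Called with a pointer to "Hello", reassign leaves the caller's string untouched, while append_suffix on a 16-byte buffer holding "Hello" leaves it holding "Hello.COM".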
NOTE
String literals live as long as your app lives. In your example it means that your NSString *str = @""; never gets deallocated. So after the for loop has finished there are two string objects living in memory: the @"" literal, which you can no longer access since you have no pointer to it but which is still there, and your new string str, which holds 0123...999. But this is not a memory leak.
No, you will not get a memory leak with your code. You are not retaining those objects in the loop; they are created with a convenience method, so you don't own them, and they will be released on the next cycle of the autorelease pool. It doesn't matter whether you are using ARC or not: objects created with convenience methods and not retained are released once they go out of their context.
It will not leak memory, but it will allocate more memory, because [str stringByAppendingFormat:@"%d", i]; makes a new immutable copy every time the loop runs.
A memory leak occurs when you leave data unreferenced, or orphaned. This loop does not orphan the previous copy of the string on each iteration; all the copies of the NSString are cleared when the operation completes, or in viewDidUnload.
You will not get a memory leak in the example code because Automatic Reference Counting will detect the assignment to str and (automatically) release the old str.
But it would be much better coding style (and almost certainly better performance) to do this:
NSMutableString* mstr = [NSMutableString new];
for(int i = 0; i < 1000; ++i){
[mstr appendFormat:@"%d",i];
}
NSString* str = mstr;
...
As to the first question, I think it means that a change made to a mutable string by a remote process will be reflected in the originating process's object.
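For comparison, here is a rough C analogue of the "append into one mutable buffer" idea from the NSMutableString suggestion above. It is a sketch, not a Foundation equivalent: append_numbers is a hypothetical name, and a caller-supplied char array stands in for the mutable string.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes the decimal digits of 0..count-1 back to back into `out`,
   reusing the same buffer instead of building a new string per step.
   Returns the number of characters written, or 0 if `out` (of size
   `cap`) is too small to hold the result. */
size_t append_numbers(char *out, size_t cap, int count) {
    size_t used = 0;
    if (cap == 0) {
        return 0;
    }
    out[0] = '\0';
    for (int i = 0; i < count; i++) {
        int n = snprintf(out + used, cap - used, "%d", i);
        if (n < 0 || (size_t)n >= cap - used) {
            return 0; /* the next number would not fit */
        }
        used += (size_t)n;
    }
    return used;
}
```

With count set to 1000 the result is 2890 characters long (10 one-digit, 90 two-digit and 900 three-digit numbers), so a 4096-byte buffer is enough.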

Peculiar issue with wiping the underlying buffer of NSString

I am using a technique outlined in the book Hacking and Securing iOS Applications (relevant section here) to wipe the underlying buffer of a NSString as shown below.
NSString *s = [NSString stringWithFormat:@"Hello"];
unsigned char *text = (unsigned char*)CFStringGetCStringPtr((CFStringRef)s, CFStringGetSystemEncoding());
if (text != NULL)
{
memset(text, 0, [s length]);
}
This works, unless the string is a certain value.
// The following crash with EXC_BAD_ACCESS on memset
NSString *s = [NSString stringWithFormat:@"No"];
NSString *s = [NSString stringWithFormat:@"Yes"];
// These work fine though
NSString *s = [NSString stringWithFormat:@"Hello"];
NSString *s = [NSString stringWithFormat:@"Do"];
NSString *s = [@"N" stringByAppendingString:@"o"];
It looks like certain strings are not created on the heap; instead they are optimized to point into a read-only string table, even though the code appears to create them on the heap.
Indeed, constant strings are not created on the heap and live in read-only memory. This includes a few that look like runtime strings but are made compile-time constants; your examples are such statements.
With this statement fragment there is no reason not to make it a compile-time constant:
[NSString stringWithFormat:@"No"]
It is equivalent to:
@"No"
Suggestion: file a bug report requesting a secure string class. I have. Several have been filed, and I have been told that if enough are filed (whatever that amount might be) it will be implemented.
It is possible to subclass NSString. It is not easy, but you will have complete control of the actual buffer, and it should not be subject to failure due to Apple changing an implementation detail. I have done that successfully.
NSString *s = [NSString stringWithFormat:@"No"]; will be optimised to NSString *s = @"No"; because there are no substitutions in the format. Assigning a literal will give you a pointer to the read-only text segment of the loaded binary.
NSString *s = [@"N" stringByAppendingString:@"o"]; will create a new string on the heap and return a reference to it. The heap is read-write even though the datatype NSString is read-only.
When you get the CString pointer to the underlying data, it points into read-only data in the first case and read-write data in the second. The memset will fail on the read-only memory, but succeed on the read-write memory.
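The read-only versus read-write split has a direct counterpart in plain C, sketched below with hypothetical helper names (writable_copy, zero_copy). A string literal must not be written to, whereas a heap copy of the same bytes can be overwritten safely. This illustrates the memory layout only; it says nothing about how NSString chooses its storage.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated, writable copy of `text` (including the
   terminating NUL), or NULL if the allocation fails. The caller
   owns the result and must free() it. */
char *writable_copy(const char *text) {
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, text, len);
    }
    return copy;
}

/* Overwrites `len` bytes of a heap copy with zeros and reports
   whether every byte now reads back as zero (1) or not (0).
   Never call this on a string literal: writing there is undefined
   behaviour and typically crashes. */
int zero_copy(char *copy, size_t len) {
    memset(copy, 0, len);
    for (size_t i = 0; i < len; i++) {
        if (copy[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

Wiping the copy leaves the original literal untouched, which is also why clearing your own buffer never removes the constant that was compiled into the binary.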

iOS: How to avoid autoreleased copies when manipulating large NSString instance?

I have a scenario in an iOS application where manipulating a very large NSString instance (an HTTP response, upwards of 11MB) results in multiple large intermediaries being in memory at once, since the SDK methods I am calling return new autoreleased instances. What is the best approach to take here?
For example, assuming that largeString is an autoreleased NSString instance:
NSArray *partsOfLargeString = [largeString componentsSeparatedByString:separator];
for (NSString *part in partsOfLargeString) {
NSString *trimmedPart = [part stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSData *data = [trimmedPart dataUsingEncoding:NSUTF8StringEncoding];
}
It would be great if there were non-autoreleased equivalents to componentsSeparatedByString or stringByTrimmingCharactersInSet, but I'm not looking to implement these myself.
To my knowledge, there isn't a way to "force" release an object that has already been added to an autorelease pool. I know that I can create and use my own autorelease pool here, but I'd like to be extremely granular and having autorelease pools around individual statements definitely isn't a very scalable approach.
Any suggestions are much appreciated.
As Bill said, I’d first try to have an autorelease pool for each loop iteration, e.g.:
for (NSString *part in partsOfLargeString) {
NSAutoreleasePool *pool = [NSAutoreleasePool new];
NSString *trimmedPart = [part stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSData *data = [trimmedPart dataUsingEncoding:NSUTF8StringEncoding];
…
[pool drain];
}
or, if you’re using a recent enough compiler:
for (NSString *part in partsOfLargeString) {
@autoreleasepool {
NSString *trimmedPart = [part stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSData *data = [trimmedPart dataUsingEncoding:NSUTF8StringEncoding];
…
}
}
If that’s still not acceptable and you do need to release objects in a more granular fashion, you could use something like:
static inline __attribute__((ns_returns_retained))
id BICreateDrainedPoolObject(id (^expression)(void)) {
NSAutoreleasePool *pool = [NSAutoreleasePool new];
id object = expression();
[object retain];
[pool drain];
return object;
}
#define BIOBJ(expression) BICreateDrainedPoolObject(^{return (expression);})
which evaluates the expression, retains its result, releases any ancillary autoreleased objects and returns the result; and then:
for (NSString *part in partsOfLargeString) {
NSAutoreleasePool *pool = [NSAutoreleasePool new];
NSString *trimmedPart = BIOBJ([part stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]);
NSData *data = BIOBJ([trimmedPart dataUsingEncoding:NSUTF8StringEncoding]);
[trimmedPart release];
// do something with data
[data release];
…
[pool drain];
}
Note that, since the function returns a retained object, you’re responsible for releasing it. You’ll have control over when to do that.
Feel free to choose better names for the function and macro. There might be some corner cases that should be handled but it should work for your particular example. Suggestions are welcome!
First, you shouldn't need to parse responses from an HTTP server in this fashion. Parsing HTTP responses (including parsing HTML) is a solved problem and attempting to parse it using raw string manipulation will lead to fragile code that can easily be crashed with seemingly innocuous server side changes.
Autorelease pools are pretty cheap. You could surround the body [inside] of the for with @autoreleasepool {... that code ...} and it'll probably both fix your high-water issue and have negligible performance impact [compared to raw string manipulation].
Beyond that, your summary is correct -- if there isn't a non-autoreleasing variant in the 'kit, then you'd have to re-invent the wheel. With that said, it is fairly typical that the lack of a non-autoreleasing variant is not an oversight on the part of the designer. Instead, it is likely because there are better tools available for achieving the sort of high-volume solution that would also require finer grained memory management.

Converting NSString into uint8_t

I am working on the data encryption sample code provided by Apple in the "Certificate, Key and Trust Programming guide". The sample code for encrypting/decrypting data operates on a uint8_t. However, a real-world application would be doing this on an NSString object. I have been trying to convert an NSString object to uint8_t, but every time I try I get a compiler warning. Solutions given for almost the same problem in various forums don't seem to work for me.
Here is an example of turning any string value into a uint8_t*. The easiest way is to just cast the bytes of NSData as a uint8_t*. The other option is to allocate memory and copy the bytes, but you will still need to track the length somehow.
NSData *someData = [@"SOME STRING VALUE" dataUsingEncoding:NSUTF8StringEncoding];
const void *bytes = [someData bytes];
NSUInteger length = [someData length];
//Easy way
uint8_t *crypto_data = (uint8_t*)bytes;
//Optional way
//If you plan on using crypto_data as a class variable
// you will need to do a memcpy since the NSData someData
// will get autoreleased
crypto_data = malloc(length);
memcpy(crypto_data, bytes, length);
//work with crypto_data
//free crypto_data most likely in dealloc
free(crypto_data);
NSString *stringToEncrypt = @"SOME STRING VALUE";
uint8_t *cString = (uint8_t *)stringToEncrypt.UTF8String;
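One way to "track the length somehow" when you copy the bytes is to keep the pointer and its length together. The sketch below is plain C with hypothetical names (ByteBuffer, byte_buffer_init, byte_buffer_free); it assumes you already have the UTF-8 bytes as a NUL-terminated C string, for example from UTF8String.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: an owned copy of the bytes plus its length. */
typedef struct {
    uint8_t *bytes;
    size_t   length;
} ByteBuffer;

/* Copies the bytes of a NUL-terminated UTF-8 string (without the
   terminator) into `buf`. Returns 0 on success, -1 if the
   allocation fails. */
int byte_buffer_init(ByteBuffer *buf, const char *utf8) {
    size_t len = strlen(utf8);
    buf->bytes = malloc(len > 0 ? len : 1);
    if (buf->bytes == NULL) {
        buf->length = 0;
        return -1;
    }
    memcpy(buf->bytes, utf8, len);
    buf->length = len;
    return 0;
}

/* Releases the copy and resets the fields. */
void byte_buffer_free(ByteBuffer *buf) {
    free(buf->bytes);
    buf->bytes = NULL;
    buf->length = 0;
}
```

Because the struct owns its copy, it stays valid after the NSData or NSString it came from has been released; the trade-off is that you are responsible for calling byte_buffer_free.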
