Find Character String In Binary Data - ios

I have a binary file I've loaded using an NSData object. Is there a way to locate a sequence of characters, 'abcd' for example, within that binary data and return the offset without converting the entire file to a string? Seems like it should be a simple answer, but I'm not sure how to do it. Any ideas?
I'm doing this on iOS 3 so I don't have -rangeOfData:options:range: available.
I'm going to award this one to Sixteen Otto for suggesting strstr. I went and found the source code for the C function strstr and rewrote it to work on a fixed length Byte array--which incidentally is different from a char array as it is not null terminated. Here is the code I ended up with:
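- (const Byte *)offsetOfBytes:(const Byte *)bytes inBuffer:(const Byte *)buffer ofLength:(NSUInteger)len
{
    // bytes is the pattern and is assumed to be NUL-terminated;
    // buffer is the fixed-length data being searched and may contain zero bytes
    size_t patternLen = strlen((const char *)bytes);
    if (patternLen == 0)
        return buffer;
    if (patternLen > len)
        return NULL;
    for (NSUInteger i = 0; i + patternLen <= len; ++i)
    {
        // compare the whole pattern against the buffer starting at i
        if (memcmp(buffer + i, bytes, patternLen) == 0)
            return buffer + i;
    }
    return NULL;
}
This returns a pointer to the first occurrence of bytes, the thing I'm looking for, in buffer, the byte array that should contain bytes. It returns NULL if there is no match.
I call it like this:
// data is the NSData object; tag is a NUL-terminated C string such as "abcd"
const Byte *bytes = [data bytes];
const Byte *index = [self offsetOfBytes:(const Byte *)tag inBuffer:bytes ofLength:[data length]];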

Convert your substring to an NSData object, and search for those bytes in the larger NSData using rangeOfData:options:range:. Make sure that the string encodings match!
On iPhone, where that isn't available, you may have to do this yourself. The C function strstr() will give you a pointer to the first occurrence of a pattern within the buffer (as long as neither contains null bytes!), but not the index. Here's a function that should do the job (but no promises, since I haven't tried actually running it...):
- (NSUInteger)indexOfData:(NSData*)needle inData:(NSData*)haystack
{
    // use byte pointers; a void* cannot be indexed
    const uint8_t* needleBytes = [needle bytes];
    const uint8_t* haystackBytes = [haystack bytes];
    NSUInteger needleLength = [needle length];
    NSUInteger haystackLength = [haystack length];
    // an empty or too-long needle cannot match; checking first also keeps
    // the unsigned subtraction below from wrapping around
    if (needleLength == 0 || needleLength > haystackLength)
    {
        return NSNotFound;
    }
    // walk the length of the buffer, looking for a byte that matches the start
    // of the pattern; we can skip (|needle|-1) bytes at the end, since we can't
    // have a match that's shorter than needle itself
    for (NSUInteger i=0; i <= haystackLength-needleLength; i++)
    {
        // walk needle's bytes while they still match the bytes of haystack
        // starting at i; if we walk off the end of needle, we found a match
        NSUInteger j=0;
        while (j < needleLength && needleBytes[j] == haystackBytes[i+j])
        {
            j++;
        }
        if (j == needleLength)
        {
            return i;
        }
    }
    return NSNotFound;
}
This runs in something like O(nm), where n is the buffer length, and m is the size of the substring. It's written to work with NSData for two reasons: 1) that's what you seem to have in hand, and 2) those objects already encapsulate both the actual bytes, and the length of the buffer.
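For anyone who prefers to stay at the C level, here is a minimal sketch of the same O(nm) search on raw pointers. The name find_bytes and the use of (size_t)-1 as the "not found" value are my own choices for illustration, not part of any Apple API. Unlike strstr(), it takes explicit lengths, so zero bytes in either buffer are fine.

```c
#include <stddef.h>
#include <string.h>

/* Returns the index of the first occurrence of needle in haystack,
   or (size_t)-1 if it does not occur. Both buffers may contain zero bytes. */
size_t find_bytes(const unsigned char *haystack, size_t haystack_len,
                  const unsigned char *needle, size_t needle_len)
{
    if (needle_len == 0)
        return 0;                       /* empty pattern matches at the start */
    if (needle_len > haystack_len)
        return (size_t)-1;              /* also avoids unsigned underflow below */

    for (size_t i = 0; i <= haystack_len - needle_len; i++) {
        /* compare the whole pattern against the buffer starting at i */
        if (memcmp(haystack + i, needle, needle_len) == 0)
            return i;
    }
    return (size_t)-1;
}
```

From Objective-C you would pass [data bytes] and [data length] for the haystack. Note that this sketch treats an empty needle as matching at index 0, whereas the method above returns NSNotFound for it; pick whichever suits your caller.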

If you're using Snow Leopard, a convenient way is the new -rangeOfData:options:range: method in NSData that returns the range of the first occurrence of a piece of data. Otherwise, you can access the NSData's contents yourself using its -bytes method to perform your own search.

I had the same problem.
I solved it the other way round, compared to the other suggestions.
First, I convert the data (assume your NSData is stored in the variable rawFile) to a string:
NSString *ascii = [[NSString alloc] initWithData:rawFile encoding:NSASCIIStringEncoding];
Now you can easily search for strings like 'abcd', or whatever you want, by passing the ascii string to an NSScanner. This may not be very efficient, but it works until -rangeOfData:options:range: becomes available on iPhone as well.

Related

NSData to NSArray of NSNumbers?

I'm coming from Swift to Objective-C and a problem I have run into is that NSData doesn't seem to have methods to enumerate over UInt8 values the way Swift's Data does. Other answers with similar titles exist but they all deal with property lists, whereas I just have a bucket of ASCII bytes. Specifically, the code I want to replicate in Obj-C is:
//Find the next newline in ASCII data given a byte offset
let subset = myData.advanced(by: offset)
var lineEnd: Data.Index = myData.endIndex
subset.enumerateBytes({ (memory, idx, stop) in
let newline: UInt8 = 10
for idx in memory.indices {
let charByte = memory[idx]
if charByte == newline {
lineEnd = idx
stop = true; return;
}
}
})
Ideally I want a way to convert an NSData to an array of NSNumbers from which I can extract the intValues. Of course, if there is a better method, let me know. A goal is to keep the Obj-C code as similar to Swift as possible, as I will be maintaining the two codebases simultaneously.
The only good way to access an NSData's bytes is to call its -bytes method, which gets you a C pointer to its internal storage.
NSData *data = ...;
const uint8_t *bytes = data.bytes;
NSUInteger length = data.length;
for (NSUInteger i = 0; i < length; i++) {
uint8_t byte = bytes[i];
// do something with byte
}
The closest equivalent to advanced(by:) would be subdataWithRange:. The equivalent to enumerateBytes is enumerateByteRangesUsingBlock:. It would yield something like:
NSData *subset = [data subdataWithRange:NSMakeRange(offset, data.length - offset)];
__block NSUInteger lineEnd = data.length;
Byte newline = 10;
[subset enumerateByteRangesUsingBlock:^(const void * _Nonnull bytes, NSRange byteRange, BOOL * _Nonnull stop) {
for (NSInteger index = 0; index < byteRange.length; index++) {
Byte charByte = ((Byte *)bytes)[index];
if (charByte == newline) {
lineEnd = index + byteRange.location;
*stop = true;
return;
}
}
}];
Note, I made a few changes from your Swift example:
If the data is not contiguous, your Swift example returns the index within the current block, which doesn't look correct to me. I suspect you want the location within subset, so this Objective-C example reports the offset within subset, not within the current block. I'm wagering that you've never noticed this because it's pretty rare for an NSData's storage to be non-contiguous.
It makes no observable performance difference, but I pulled the definition of newline out of the enumeration block. Why define it repeatedly?
If you're really searching for a character in the NSData, I'd suggest avoiding creating the subset altogether. Just use rangeOfData:options:range:. This will find whatever you're looking for.
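Since -bytes hands you a plain C pointer, the newline search can also be sketched with memchr from the C standard library, which skips the subset object entirely. The function name next_newline is made up for this example. It returns the index relative to the start of the whole buffer and falls back to len when there is no newline, mirroring the lineEnd default above.

```c
#include <stddef.h>
#include <string.h>

/* Returns the index (relative to the start of data) of the first '\n'
   at or after offset, or len if there is none. */
size_t next_newline(const unsigned char *data, size_t len, size_t offset)
{
    if (offset >= len)
        return len;                     /* nothing left to search */
    const unsigned char *hit = memchr(data + offset, '\n', len - offset);
    return hit ? (size_t)(hit - data) : len;
}
```

In Objective-C you would call it as next_newline(data.bytes, data.length, offset). Keep in mind that the result is an index into data, not into a subset, so it is not directly interchangeable with the lineEnd computed by the block-based version.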

Process unicode string in C and Objective C

I wrote a C function that reads the characters of a user-input string. Because the string is user input, it can contain any Unicode characters. An Objective-C method receives the user-input NSString, converts it to NSData, and passes that data to the C function for processing. The C function searches for these symbol characters: *, [, ], _; it doesn't care about any other characters. Every time it finds one of the symbols, it processes it and then calls an Objective-C method, passing the location of the symbol.
C code:
typedef void (* callback)(void *context, size_t location);
void process(const uint8_t *data, size_t length, callback cb, void *context)
{
size_t i = 0;
while (i < length)
{
if (data[i] == '*' || data[i] == '[' || data[i] == ']' || data[i] == '_')
{
int valid = 0;
//do something, set valid = 1
if (valid)
cb(context, i);
}
i++;
}
}
Objective C code:
//a C function declared in .m file
void mycallback(void *context, size_t location)
{
[(__bridge id)context processSymbolAtLocation:location];
}
- (void)processSymbolAtLocation:(NSInteger)location
{
NSString *result = [self.string substringWithRange:NSMakeRange(location, 1)];
NSLog(@"%@", result);
}
- (void)processUserInput:(NSString*)string
{
self.string = string;
//convert string to data
NSData *data = [string dataUsingEncoding:NSUTF8StringEncoding];
//pass data to C function
process(data.bytes, data.length, mycallback, (__bridge void *)(self));
}
The code works fine if the input string contains only English characters. If it contains composed character sequences, multibyte characters, or other Unicode characters, the result string in the processSymbolAtLocation: method is not the expected symbol.
How do I convert the NSString object to NSData correctly? How do I get the correct location?
Thanks!
Your problem is that you start off with a UTF-16 encoded NSString and produce a sequence of UTF-8 encoded bytes. The number of code units required to represent a string in UTF-16 may not be equal to that number required to represent it in UTF-8, so the offsets in your two forms may not match - as you have found out.
Why are you using C to scan the string for matches in the first place? You might want to look at NSString's rangeOfCharacterFromSet:options:range: method which you can use to find the next occurrence of character from your set.
If you need to use C, then convert your string into a sequence of UTF-16 code units (for example with getCharacters:range:) and use uint16_t on the C side.
HTH
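If you would rather keep scanning UTF-8 bytes, another option is to translate each byte offset back into a UTF-16 index before calling substringWithRange:. The following is only a sketch under two assumptions: the input is well-formed UTF-8 (which is what dataUsingEncoding:NSUTF8StringEncoding should give you), and the offset falls on a character boundary. The function name is invented for this example. The scan itself is safe because *, [, ] and _ are ASCII, and ASCII byte values never appear inside a multibyte UTF-8 sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts a byte offset into UTF-8 data into the matching UTF-16
   code-unit index (the unit NSString ranges are measured in).
   Assumes well-formed UTF-8 and an offset on a character boundary. */
size_t utf16_index_for_utf8_offset(const uint8_t *utf8, size_t byte_offset)
{
    size_t units = 0;
    for (size_t i = 0; i < byte_offset; i++) {
        uint8_t b = utf8[i];
        if ((b & 0xC0) == 0x80)
            continue;                   /* continuation byte: counted with its lead byte */
        units += (b >= 0xF0) ? 2 : 1;   /* 4-byte sequences become a surrogate pair */
    }
    return units;
}
```

Calling this from the start of the buffer for every match makes the scan quadratic in the worst case; for long inputs you could keep a running count inside process() instead.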

Obfuscating a number(in a string) Objective C

I'm using the following code to obfuscate a passcode for a test app of mine.
- (NSString *)obfuscate:(NSString *)string withKey:(NSString *)key
{
// Create data object from the string
NSData *data = [string dataUsingEncoding:NSUTF8StringEncoding];
// Get pointer to data to obfuscate
char *dataPtr = (char *) [data bytes];
// Get pointer to key data
char *keyData = (char *) [[key dataUsingEncoding:NSUTF8StringEncoding] bytes];
// Points to each char in sequence in the key
char *keyPtr = keyData;
int keyIndex = 0;
// For each character in data, xor with current value in key
for (int x = 0; x < [data length]; x++)
{
// Replace current character in data with
// current character xor'd with current key value.
// Bump each pointer to the next character
*dataPtr = *dataPtr++ ^ *keyPtr++;
// If at end of key data, reset count and
// set key pointer back to start of key value
if (++keyIndex == [key length])
keyIndex = 0, keyPtr = keyData;
}
return [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
}
This works like a charm with all strings, but I've run into a bit of a problem comparing the following results:
NSLog(@"%@", [self obfuscate:@"0000" withKey:@"maki"]); //Returns 0]<W
NSLog(@"%@", [self obfuscate:@"0809" withKey:@"maki"]); //Returns 0]<W
As you can see, the two strings with numbers in them, while different, return the same result! What's gone wrong in the code I've attached to produce the same result for these two numbers?
Another example:
NSLog(@"%@", [self obfuscate:@"8000" withKey:@"maki"]); //Returns 8U4_
NSLog(@"%@", [self obfuscate:@"8290" withKey:@"maki"]); //Returns 8U4_ as well
I may be misunderstanding the concept of obfuscation, but I was under the impression that each unique string returns a unique obfuscated string!
Please help me fix this bug/glitch
Source of Code: http://iosdevelopertips.com/cocoa/obfuscation-encryption-of-string-nsstring.html
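The problem is this statement in your loop:
*dataPtr = *dataPtr++ ^ *keyPtr++;
It increments dataPtr and also uses dataPtr to pick the destination of the store, with nothing sequencing the two, which is undefined behavior in C. Judging by your output, the compiler stores the result through the already incremented pointer, so each byte is overwritten before it is read: the first character is left untouched, every later byte is derived only from the first character and the key ('0' ^ 'm' is ']', ']' ^ 'a' is '<', '<' ^ 'k' is 'W'), and the last store lands one byte past the end of the buffer. Do the XOR and the increments as separate steps:
*dataPtr ^= *keyPtr;
dataPtr++, keyPtr++;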
But you have some bigger issues.
The call to bytes returns a const, read-only pointer to the bytes in the NSData object. You should NOT be modifying that data; work on a mutable copy (mutableCopy and mutableBytes) instead.
The result of your XOR on the character data could, in theory, result in a byte stream that is no longer a valid UTF-8 encoded string.
The obfuscation algorithm that you have selected is based on XORing the data and the "key" values together. Generally, this is not very strong. Moreover, since XOR is symmetric, the results are very prone to producing duplicates.
Although your implementation is currently broken, fixing it would not be of much help in preventing the algorithm from producing identical results for different data: it is relatively straightforward to construct key/data pairs that produce the same obfuscated string - for example,
[self obfuscate:@"0123" withKey:@"vwxy"]
[self obfuscate:@"pqrs" withKey:@"6789"]
will produce identical results "FFJJ", even though both the strings and the keys look sufficiently different.
If you would like to "obfuscate" your strings in a cryptographically strong way, use a salted secure hash algorithm: it will produce very different results for even slightly different strings.
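To make the collision concrete, here is a small C sketch of a repeating-key XOR that writes into a separate output buffer instead of modifying its input. The function name xor_bytes is made up for this example and it is not the code from the linked article. It assumes key_len is nonzero and that out has room for len bytes.

```c
#include <stddef.h>

/* XORs len bytes of in with key (repeating the key as needed) into out.
   out must have room for len bytes; key_len must be nonzero. */
void xor_bytes(const unsigned char *in, size_t len,
               const unsigned char *key, size_t key_len,
               unsigned char *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = in[i] ^ key[i % key_len];
}
```

Running "0123" with key "vwxy" and "pqrs" with key "6789" through this gives the same four bytes, FFJJ, as described above. With one fixed key, though, two different inputs of the same length never collide, because applying the same key again restores the original: "0000" with key "maki" comes out as ]Q[Y, and XORing that with "maki" once more gives back "0000".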

how to read chinese from pdf in ios correctly

Here is what I have done, but the text comes out garbled. Thanks in advance.
1. Use CGPDFStringCopyTextString to get the text from the PDF.
2. Encode the NSString as a char*:
NSStringEncoding enc = CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGB_18030_2000);
const char *char_content = [self.currentData cStringUsingEncoding:enc];
Below is how I get the currentData:
void arrayCallback(CGPDFScannerRef inScanner, void *userInfo)
{
BIDViewController *pp = (__bridge BIDViewController*)userInfo;
CGPDFArrayRef array;
bool success = CGPDFScannerPopArray(inScanner, &array);
for(size_t n = 0; n < CGPDFArrayGetCount(array); n += 1)
{
if(n >= CGPDFArrayGetCount(array))
continue;
CGPDFStringRef string;
success = CGPDFArrayGetString(array, n, &string);
if(success)
{
// CGPDFStringCopyTextString returns a +1 reference; hand ownership to ARC
NSString *data = CFBridgingRelease(CGPDFStringCopyTextString(string));
[pp.currentData appendFormat:@"%@", data];
}
}
}
- (IBAction)press:(id)sender {
table = CGPDFOperatorTableCreate();
CGPDFOperatorTableSetCallback(table, "TJ", arrayCallback);
CGPDFOperatorTableSetCallback(table, "Tj", stringCallback);
self.currentData = [NSMutableString string];
CGPDFContentStreamRef contentStream = CGPDFContentStreamCreateWithPage(pagerf);
CGPDFScannerRef scanner = CGPDFScannerCreate(contentStream, table, (__bridge void *)(self));
bool ret = CGPDFScannerScan(scanner);
}
According to the Mac Developer Library:
CGPDFStringCopyTextString returns a CFString object that represents a PDF string as a text string. The PDF string is given as a CGPDFString, which is a series of bytes (unsigned integer values in the range 0 to 255); thus, this function already decodes the bytes according to some character encoding.
No encoding is passed to it explicitly, so it has to assume one, most likely PDFDocEncoding or the UTF-16BE Unicode character encoding scheme, which are the two encodings that may be used to represent text strings in a PDF document outside the document's content streams; cf. section 7.9.2.2 Text String Type and Table D.1, Annex D in the PDF specification.
Now you have not told us from where you received your CGPDFString. I assume, though, that you received it from inside one of the document’s content streams. Text strings there, on the other hand, can be encoded with any imaginable encoding. The encoding used is given by the embedded data of the font the string is to be displayed with.
For more information on this you may want to read CGPDFScannerPopString returning strange result and have a look at PDFKitten.
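As a rough illustration of the two text-string encodings mentioned above: in the PDF 1.7 specification, a text string encoded as UTF-16BE starts with the byte order mark 0xFE 0xFF, and anything else is taken to be PDFDocEncoding. The helper below is a hypothetical sketch of that check and nothing more. It says nothing about strings inside content streams, whose encoding depends on the font.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if the bytes start with the UTF-16BE byte order mark (0xFE 0xFF),
   which is how a PDF 1.7 text string signals UTF-16BE; otherwise such a
   string is PDFDocEncoding. Content-stream strings are NOT covered. */
bool pdf_text_string_is_utf16be(const unsigned char *bytes, size_t len)
{
    return len >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF;
}
```

With Core Graphics you could feed it the raw bytes from CGPDFStringGetBytePtr and CGPDFStringGetLength to see what you actually received before any decoding is applied.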

unichar* to NSString, get the length

I am trying to create an NSString object from a const unichar buffer where I don't know the length of the buffer.
I want to use the NSString stringWithCharacters: length: method to create the string (this seems to work), but please can you help me find out the length?
I have:
const unichar *c_emAdd = [... returns successfully from a C++ function...]
NSString *emAdd = [NSString stringWithCharacters:c_emAdd length:unicharLen];
Can anyone help me find out how to check what unicharLen is? I don't get this length passed back to me by the call to the C++ function, so I presume I'd need to iterate until I find a terminating character? Anyone have a code snippet to help? Thanks!
Is your char buffer null terminated?
Is it 16-bit unicode?
NSString *emAdd = [NSString stringWithFormat:@"%S", c_emAdd];
Your unichars should be null-terminated, so when you reach two null bytes (a unichar equal to 0x0000) in the buffer, you know the length.
NSUInteger unistrlen(const unichar *chars)
{
    NSUInteger length = 0;
    if (NULL == chars) return length;
    // count 16-bit units until the 0x0000 terminator
    while (0 != chars[length])
        length++;
    return length;
}
//...
//Inside Some method or function
unichar chars[] = { 0x005A, 0x0065, 0x0062, 0x0072, 0x0061, 0x0000 };
NSString *string = [NSString stringWithCharacters:chars length:unistrlen(chars)];
NSLog(@"%@", string);
Or, even simpler, use a format string with the %S specifier.
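Both approaches rely on the buffer really being zero-terminated. If you know an upper bound on its size, a bounded variant is a little safer. This is a sketch in plain C, using uint16_t in place of unichar; the name utf16_strnlen and the max_units parameter are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts 16-bit code units before the terminating 0x0000, but never
   reads more than max_units units. Returns max_units if no terminator
   was found within that bound. */
size_t utf16_strnlen(const uint16_t *chars, size_t max_units)
{
    size_t length = 0;
    if (chars == NULL)
        return 0;
    while (length < max_units && chars[length] != 0)
        length++;
    return length;
}
```

The result can be passed straight to stringWithCharacters:length:. If it equals max_units, the terminator was not found and the string may have been cut short.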
