MD5 with ASCII Char

I have a string
wDevCopyright = [NSString stringWithFormat:@"Copyright: %c 1995 by WIRELESS.dev, Corp Communications Inc., All rights reserved.",0xa9];
and to munge it I call
-(NSString *)getMD5:(NSString *)source
{
    const char *src = [source UTF8String];
    unsigned char result[CC_MD5_DIGEST_LENGTH];
    CC_MD5(src, (CC_LONG)strlen(src), result);
    return [NSString stringWithFormat:
        @"%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x",
        result[0], result[1], result[2], result[3],
        result[4], result[5], result[6], result[7],
        result[8], result[9], result[10], result[11],
        result[12], result[13], result[14], result[15]
    ];
}
Because of the 0xa9, [source UTF8String] does not create the byte sequence I expect, so the resulting digest is not comparable with the one produced on other platforms.
I tried to encode the char with NSASCIIStringEncoding but it broke the code.
How do I call CC_MD5 with a string that has ASCII characters and get the same hash as in Java?
Update to code request:
Java
private static char[] kTestASCII = {
    169
};
System.out.println("\n\n>>>>> msg## " + (char)0xa9 + " " + (char)169
    + "\n md5 " + md5(new String(kTestASCII), false)); // unicode = false
Result >>>>> msg## \251 \251
md5 a252c2c85a9e7756d5ba5da9949d57ed
ObjC
char kTestASCII [] = {
169
};
NSString *testString = [NSString stringWithCString:kTestASCII encoding:NSUTF8StringEncoding];
NSLog(#">>>> objC msg## int %d char %c md5: %#", 0xa9, 169, [self getMD5:testString]);
Result >>>> objC msg## int 169 char © md5: 9b759040321a408a5c7768b4511287a6
As stated earlier, without the 0xa9 the hashes in Java and ObjC are the same. I am trying to get the hash for 0xa9 to match between Java and ObjC.
Java MD5 code
private static char[] kTestASCII = {
169
};
md5(new String(kTestASCII), false);
/**
 * Compute the MD5 hash for the given String.
 * @param value the string to add to the digest
 * @param unicode true if the string is Unicode, false for ASCII strings
 */
public synchronized final String md5(String value, boolean unicode)
{
MD5();
MD5.update(value, unicode);
return WUtilities.toHex(MD5.finish());
}
public synchronized void update(String s, boolean unicode)
{
if (unicode)
{
char[] c = new char[s.length()];
s.getChars(0, c.length, c, 0);
update(c);
}
else
{
byte[] b = new byte[s.length()];
s.getBytes(0, b.length, b, 0);
update(b);
}
}
public synchronized void update(byte[] b)
{
update(b, 0, b.length);
}
//--------------------------------------------------------------------------------
/**
* Add a byte sub-array to the digest.
*/
public synchronized void update(byte[] b, int offset, int length)
{
for (int n = offset; n < offset + length; n++)
update(b[n]);
}
/**
* Add a byte to the digest.
*/
public synchronized void update(byte b)
{
int index = (int)((count >>> 3) & 0x03f);
count += 8;
buffer[index] = b;
if (index >= 63)
transform();
}
I believe my issue is with using NSData with an encoding, as opposed to a raw C char[] like the Java byte[]. So what is the best way to roll my own bytes into a byte[] in ObjC?

The character you are having problems with, ©, is the Unicode COPYRIGHT SIGN (U+00A9). The correct UTF-8 encoding of this character is the byte sequence 0xc2 0xa9.
You are attempting, however, to convert from the single-byte sequence 0xa9, which is not a valid UTF-8 encoding of any character. See table 3-7 of http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf#G7404 . Since this is not a valid UTF-8 byte sequence, stringWithCString converts your input to the Unicode REPLACEMENT CHARACTER (U+FFFD). When this character is then encoded back into UTF-8, it yields the byte sequence 0xef 0xbf 0xbd. The MD5 of this sequence is 9b759040321a408a5c7768b4511287a6, as reported by your Objective-C example.
Your Java example yields an MD5 of a252c2c85a9e7756d5ba5da9949d57ed, which simple experimentation shows is the MD5 of the single byte 0xa9, which I have already noted is not a valid UTF-8 representation of the desired character.
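Both digests are easy to verify with a few lines of CommonCrypto (a sketch; the expected hex values are the ones quoted above):
#import <CommonCrypto/CommonDigest.h>

uint8_t replacement[] = { 0xEF, 0xBF, 0xBD }; // UTF-8 for U+FFFD; hashes to 9b759040321a408a5c7768b4511287a6
uint8_t raw[] = { 0xA9 };                     // the bare byte; hashes to a252c2c85a9e7756d5ba5da9949d57ed
unsigned char digest[CC_MD5_DIGEST_LENGTH];
CC_MD5(replacement, sizeof(replacement), digest); // hex-format digest as in getMD5: above
CC_MD5(raw, sizeof(raw), digest);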
I think we need to see the implementation of the Java md5() method you are using. I suspect it is simply dropping the high bytes of every Unicode character to convert to a byte sequence for passing to the MessageDigest class. This does not match your Objective-C implementation where you are using a UTF-8 encoding.
Note: even if you fix your Objective-C implementation to match the encoding of your Java md5() method, your test will need some adjustment because you cannot use stringWithCString with the NSUTF8StringEncoding encoding to convert the byte sequence 0xa9 to an NSString.
UPDATE
Having now seen the Java implementation using the deprecated getBytes method, my recommendation is to change the Java implementation, if at all possible, to use a proper UTF-8 encoding.
I suspect, however, that your requirements are to match the current Java implementation, even if it is wrong. Therefore, I suggest you duplicate the bad behavior of Java's deprecated getBytes() method by using NSString getCharacters:range: to retrieve an array of unichars, then manually create an array of bytes by taking the low byte of each unichar.
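A minimal sketch of that low-byte extraction (s here stands for the input NSString; this assumes matching the deprecated Java behavior is really what is wanted):
NSUInteger len = [s length];
unichar *chars = malloc(len * sizeof(unichar));
[s getCharacters:chars range:NSMakeRange(0, len)];
NSMutableData *bytes = [NSMutableData dataWithCapacity:len];
for (NSUInteger i = 0; i < len; i++) {
    unsigned char lowByte = (unsigned char)chars[i]; // drop the high byte, as Java's deprecated getBytes() does
    [bytes appendBytes:&lowByte length:1];
}
free(chars);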

stringWithCString requires a null-terminated C string. I don't think that kTestASCII[] is necessarily null-terminated in your Objective-C code. Perhaps that is the cause of the difference.
Try:
char kTestASCII [] = {
169,
0
};

Thanks to GBegan's explanation, here is my solution:
NSMutableData *bytes = [NSMutableData dataWithCapacity:[s length]];
for (NSUInteger i = 0; i < [s length]; i++) {
    unichar ch = [s characterAtIndex:i];
    unsigned char lowByte = (unsigned char)ch; // keep only the low byte of each UTF-16 unit
    [bytes appendBytes:&lowByte length:1];
}
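The CC_MD5 call can then be pointed at this buffer instead of a UTF-8 C string (a sketch reusing the hex formatting from getMD5: above):
unsigned char result[CC_MD5_DIGEST_LENGTH];
CC_MD5(bytes.bytes, (CC_LONG)bytes.length, result);
// hex-format result[0]..result[15] with %02x exactly as in getMD5: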

Related

Binary hash representation to HEX/Ascii in Objective-c

I would like to log a binary hash representation in the console, using a hex or ASCII representation. The algorithm is MD5, so the function is CC_MD5.
I get the binary hash representation via a Theos tweak, which is working well.
EDIT: this tweak intercepts the CC_MD5 call. The call is implemented in the method described below. When CC_MD5 is called, replaced_CC_MD5 intercepts the call.
The app under test is a simple one I made myself; it uses this method to calculate the MD5 hash:
- (NSString *) md5:(NSString *) input
{
const char *cStr = [input UTF8String];
unsigned char digest[16];
CC_MD5( cStr, (CC_LONG)strlen(cStr), digest ); // This is the md5 call
NSMutableString *output = [NSMutableString stringWithCapacity:CC_MD5_DIGEST_LENGTH * 2];
for(int i = 0; i < CC_MD5_DIGEST_LENGTH; i++)
[output appendFormat:@"%02x", digest[i]];
return output;
}
The hashing is OK, and the app returns the correct hash for the input:
input = prova
MD5 Digest = 189bbbb00c5f1fb7fba9ad9285f193d1
The function in my Theos tweak where I manipulate the CC_MD5 function is
EDIT: where data would be cStr, len would be strlen(cStr) and md would be digest.
static unsigned char * replaced_CC_MD5(const void *data, CC_LONG len, unsigned char *md) {
    CC_LONG dataLength = len;
    NSLog(@"==== START CC_MD5 HOOK ====");
    // hex of digest
    NSData *dataDigest = [NSData dataWithBytes:(const void *)md length:(NSUInteger)CC_MD5_DIGEST_LENGTH];
    NSLog(@"%@", dataDigest);
    // hex of string
    NSData *dataString = [NSData dataWithBytes:(const void *)data length:(NSUInteger)dataLength];
    NSLog(@"%@", dataString);
    NSLog(@"==== END CC_MD5 HOOK ====");
    return original_CC_MD5(data, len, md);
}
The log of dataString is OK: 70726f76 61, which is the hex representation of prova.
The log of dataDigest is e9aa0800 01000000 b8c00800 01000000 which is, if I understood correctly, supposed to be the binary hash representation.
How can I convert this representation to get the MD5 hash digest?
In replaced_CC_MD5 you are displaying md before the call to original_CC_MD5, which is what sets its value. What you are seeing is therefore random data (or whatever was last stored in md).
Move the call to original_CC_MD5 to before the display statement and you should see the value you expect. (You'll of course need to save the result of the call in a local variable so you can return it in the return statement.)
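A sketch of the reordered hook (assuming the same original_CC_MD5 function pointer the tweak already uses):
static unsigned char * replaced_CC_MD5(const void *data, CC_LONG len, unsigned char *md) {
    // Run the real implementation first so md is filled in.
    unsigned char *result = original_CC_MD5(data, len, md);
    NSLog(@"==== START CC_MD5 HOOK ====");
    NSLog(@"%@", [NSData dataWithBytes:md length:CC_MD5_DIGEST_LENGTH]);  // the digest bytes
    NSLog(@"%@", [NSData dataWithBytes:data length:(NSUInteger)len]);     // the input bytes
    NSLog(@"==== END CC_MD5 HOOK ====");
    return result;
}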

How to turn 4 bytes into a float in objective-c from NSData

Here is an example of turning 4 bytes into a 32-bit integer in Objective-C. The readInt function grabs 4 bytes via the read function and converts them into a single 32-bit int. Does anyone know how I would convert 4 bytes to a float? I believe the data is big-endian. Basically I need a readFloat function. I can never grasp these bitwise operations.
EDIT:
I forgot to mention that the original data comes from Java's DataOutputStream class. The writeFloat function, according to the Java docs, does the following:
Converts the float argument to an int using the floatToIntBits method
in class Float, and then writes that int value to the underlying
output stream as a 4-byte quantity, high byte first.
This is Objective-C trying to extract the data written by Java.
- (int32_t)read{
int8_t v;
[data getBytes:&v range:NSMakeRange(length,1)];
length++;
return ((int32_t)v & 0x0ff);
}
- (int32_t)readInt {
int32_t ch1 = [self read];
int32_t ch2 = [self read];
int32_t ch3 = [self read];
int32_t ch4 = [self read];
if ((ch1 | ch2 | ch3 | ch4) < 0){
@throw [NSException exceptionWithName:@"Exception" reason:@"EOFException" userInfo:nil];
}
return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + (ch4 << 0));
}
OSByteOrder.h contains functions for reading, writing, and converting integer data.
You can use OSSwapBigToHostInt32() to convert a big-endian integer to the native representation, then copy the bits into a float:
NSData* data = [NSData dataWithContentsOfFile:@"/tmp/java/test.dat"];
int32_t bytes;
[data getBytes:&bytes length:sizeof(bytes)];
bytes = OSSwapBigToHostInt32(bytes);
float number;
memcpy(&number, &bytes, sizeof(bytes));
NSLog(@"Float %f", number);
[data getBytes:&myFloat range:NSMakeRange(locationWhereFloatStarts, sizeof(float))] ought to do the trick.
Given that the data comes from DataOutputStream's writeFloat() method, then that is documented to use Float.floatToIntBits() to create the integer representation. intBitsToFloat() further documents how to interpret that representation.
I'm not sure if it's the same thing, but the xdr API seems like it might handle that representation. The credits on the man page refer to Sun Microsystems standards/specifications, so it seems likely it's related to Java.
So, it may work to do something like:
// At top of file:
#include <rpc/types.h>
#include <rpc/xdr.h>
// In some function or method:
XDR xdr;
xdrmem_create(&xdr, (char*)data.bytes + offset, data.length - offset, XDR_DECODE);
float f;
if (!xdr_float(&xdr, &f))
/* handle error */;
xdr_destroy(&xdr);
If the data consists of a whole stream in eXternal Data Representation, then you would create one XDR stream for the whole task of extracting items from it, and use many xdr_...() calls between creating and destroying it to extract all of the items.
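For instance, a sketch of pulling several items out of one stream, reusing the includes above (the count-then-floats layout is purely an assumption for illustration):
XDR xdr;
xdrmem_create(&xdr, (char*)data.bytes, (u_int)data.length, XDR_DECODE);
int count;
if (xdr_int(&xdr, &count)) {
    for (int i = 0; i < count; i++) {
        float f;
        if (!xdr_float(&xdr, &f))
            break; /* ran out of data or malformed stream */
        NSLog(@"value %d: %f", i, f);
    }
}
xdr_destroy(&xdr);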

C++ - Removing invalid characters when a user paste in a grid

Here's my situation. I have an issue where I need to filter invalid characters that a user may paste from Word or Excel documents.
Here is what I'm doing.
First I'm trying to convert any Unicode characters to ASCII:
extern "C" COMMON_STRING_FUNCTIONS long ConvertUnicodeToAscii(wchar_t * pwcUnicodeString, char* &pszAsciiString)
{
int nBufLen = WideCharToMultiByte(CP_ACP, 0, pwcUnicodeString, -1, NULL, 0, NULL, NULL)+1;
pszAsciiString = new char[nBufLen];
WideCharToMultiByte(CP_ACP, 0, pwcUnicodeString, -1, pszAsciiString, nBufLen, NULL, NULL);
return nBufLen;
}
Next I'm filtering out any character that does not have a value between 31 and 127:
String __fastcall TMainForm::filterInput(String l_sConversion)
{
// Used to store every character that was stripped out.
String filterChars = "";
// Not Used. We never received the whitelist
String l_SWhiteList = "";
// Our String without the invalid characters.
AnsiString l_stempString;
// convert the string into an array of chars
wchar_t* outputChars = l_sConversion.w_str();
char * pszOutputString = NULL;
//convert any unicode characters to ASCII
ConvertUnicodeToAscii(outputChars, pszOutputString);
l_stempString = (AnsiString)pszOutputString;
//We're going backwards since we are removing characters which changes the length and position.
for (int i = l_stempString.Length(); i > 0; i--)
{
char l_sCurrentChar = l_stempString[i];
//If we don't have a valid character, filter it out of the string.
if (((unsigned int)l_sCurrentChar < 31) ||((unsigned int)l_sCurrentChar > 127))
{
String l_sSecondHalf = "";
String l_sFirstHalf = "";
l_sSecondHalf = l_stempString.SubString(i + 1, l_stempString.Length() - i);
l_sFirstHalf = l_stempString.SubString(0, i - 1);
l_stempString = l_sFirstHalf + l_sSecondHalf;
filterChars += "\'" + ((String)(unsigned int)(l_sCurrentChar)) + "\' ";
}
}
if (filterChars.Length() > 0)
{
LogInformation(__LINE__, __FUNC__, Utilities::LOG_CATEGORY_GENERAL, "The Following ASCII Values were filtered from the string: " + filterChars);
}
// Delete the char* to avoid memory leaks.
delete [] pszOutputString;
return l_stempString;
}
Now this seems to work, except when you try to copy and paste bullets from a Word document.
o Bullet1:
 subbullet1.
You will get something like this:
oBullet1?subbullet1.
My filter function is called on an onchange event.
The bullets are replaced with the value o and a question mark.
What am I doing wrong, and is there a better way of trying to do this?
I'm using C++Builder XE5, so please no Visual C++ solutions.
When you perform the conversion to ASCII (which is not actually converting to ASCII, btw), Unicode characters that are not supported by the target codepage are lost: either dropped, replaced with ?, or replaced with a close approximation, so they are not available to your scanning loop. You should not do the conversion at all; scan the source Unicode data as-is instead.
Try something more like this:
#include <System.Character.hpp>
String __fastcall TMainForm::filterInput(String l_sConversion)
{
// Used to store every character sequence that was stripped out.
String filterChars;
// Not Used. We never received the whitelist
String l_SWhiteList;
// Our String without the invalid sequences.
String l_stempString;
int numChars;
for (int i = 1; i <= l_sConversion.Length(); i += numChars)
{
UCS4Char ch = TCharacter::ConvertToUtf32(l_sConversion, i, numChars);
String seq = l_sConversion.SubString(i, numChars);
//If we don't have a valid codepoint, filter it out of the string.
if ((ch <= 31) || (ch >= 127))
filterChars += (_D("\'") + seq + _D("\' "));
else
l_stempString += seq;
}
if (!filterChars.IsEmpty())
{
LogInformation(__LINE__, __FUNC__, Utilities::LOG_CATEGORY_GENERAL, _D("The Following Values were filtered from the string: ") + filterChars);
}
return l_stempString;
}

Process unicode string in C and Objective C

I wrote a C function to read characters in a user-input string. Because the string is user input, it can contain any Unicode characters. An Objective-C method receives the user-input NSString, converts it to NSData, and passes that data to the C function for processing. The C function searches for these symbol characters: *, [, ], _; it doesn't care about any other characters. Every time it finds one of the symbols, it processes it and then calls an Objective-C method, passing the location of the symbol.
C code:
typedef void (* callback)(void *context, size_t location);
void process(const uint8_t *data, size_t length, callback cb, void *context)
{
size_t i = 0;
while (i < length)
{
if (data[i] == '*' || data[i] == '[' || data[i] == ']' || data[i] == '_')
{
int valid = 0;
//do something, set valid = 1
if (valid)
cb(context, i);
}
i++;
}
}
Objective C code:
//a C function declared in .m file
void mycallback(void *context, size_t location)
{
[(__bridge id)context processSymbolAtLocation:location];
}
- (void)processSymbolAtLocation:(NSInteger)location
{
NSString *result = [self.string substringWithRange:NSMakeRange(location, 1)];
NSLog(#"%#", result);
}
- (void)processUserInput:(NSString*)string
{
self.string = string;
//convert string to data
NSData *data = [string dataUsingEncoding:NSUTF8StringEncoding];
//pass data to C function
process(data.bytes, data.length, mycallback, (__bridge void *)(self));
}
The code works fine if the input string contains only English characters. If it contains composed character sequences, multibyte characters, or other Unicode characters, the result string in processSymbolAtLocation: is not the expected symbol.
How to convert the NSString object to NSData correctly? How to get the correct location?
Thanks!
Your problem is that you start off with a UTF-16 encoded NSString and produce a sequence of UTF-8 encoded bytes. The number of code units required to represent a string in UTF-16 may not equal the number required to represent it in UTF-8, so the offsets in your two forms may not match, as you have found out.
Why are you using C to scan the string for matches in the first place? You might want to look at NSString's rangeOfCharacterFromSet:options:range: method, which you can use to find the next occurrence of a character from your set.
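For example, a sketch of that approach, reusing the question's processSymbolAtLocation: (the character set is built from the symbols the question names):
NSCharacterSet *symbols = [NSCharacterSet characterSetWithCharactersInString:@"*[]_"];
NSRange searchRange = NSMakeRange(0, string.length);
while (searchRange.length > 0) {
    NSRange found = [string rangeOfCharacterFromSet:symbols options:0 range:searchRange];
    if (found.location == NSNotFound)
        break;
    [self processSymbolAtLocation:found.location]; // a UTF-16 offset, valid for substringWithRange:
    NSUInteger next = NSMaxRange(found);
    searchRange = NSMakeRange(next, string.length - next);
}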
If you need to use C then convert your string into a sequence of UTF-16 words and use uint16_t on the C side.
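A sketch of that conversion (the C side then reports offsets that are valid NSString indices):
NSUInteger len = string.length;
uint16_t *units = malloc(len * sizeof(uint16_t));
[string getCharacters:(unichar *)units range:NSMakeRange(0, len)];
// scan units[0..len-1] in C; each element is one UTF-16 code unit,
// so any location you report maps directly back to an NSString index
free(units);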
HTH

how to read chinese from pdf in ios correctly

Here is what I have done, but the extracted text comes out garbled. Thanks in advance.
1. Use CGPDFStringCopyTextString to get the text from the PDF.
2. Encode the NSString to a char*:
NSStringEncoding enc = CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGB_18030_2000);
const char *char_content = [self.currentData cStringUsingEncoding:enc];
Below is how I get the currentData:
void arrayCallback(CGPDFScannerRef inScanner, void *userInfo)
{
BIDViewController *pp = (__bridge BIDViewController*)userInfo;
CGPDFArrayRef array;
bool success = CGPDFScannerPopArray(inScanner, &array);
for(size_t n = 0; n < CGPDFArrayGetCount(array); n += 1)
{
if(n >= CGPDFArrayGetCount(array))
continue;
CGPDFStringRef string;
success = CGPDFArrayGetString(array, n, &string);
if(success)
{
NSString *data = (__bridge_transfer NSString *)CGPDFStringCopyTextString(string);
[pp.currentData appendFormat:@"%@", data];
}
}
}
- (IBAction)press:(id)sender {
table = CGPDFOperatorTableCreate();
CGPDFOperatorTableSetCallback(table, "TJ", arrayCallback);
CGPDFOperatorTableSetCallback(table, "Tj", stringCallback);
self.currentData = [NSMutableString string];
CGPDFContentStreamRef contentStream = CGPDFContentStreamCreateWithPage(pagerf);
CGPDFScannerRef scanner = CGPDFScannerCreate(contentStream, table, (__bridge void *)(self));
bool ret = CGPDFScannerScan(scanner);
}
According to the Mac Developer Library:
CGPDFStringCopyTextString returns a CFString object that represents a PDF string as a text string. The PDF string is given as a CGPDFString which is a series of bytes—unsigned integer values in the range 0 to 255; thus, this method already decodes the bytes according to some character encoding.
No encoding is given explicitly, so it assumes one, most likely PDFDocEncoding or the UTF-16BE Unicode character encoding scheme, which are the two encodings that may be used to represent text strings in a PDF document outside the document's content streams, cf. section 7.9.2.2 Text String Type and Table D.1, Annex D in the PDF specification.
Now you have not told us from where you received your CGPDFString. I assume, though, that you received it from inside one of the document’s content streams. Text strings there, on the other hand, can be encoded with any imaginable encoding. The encoding used is given by the embedded data of the font the string is to be displayed with.
For more information on this you may want to read CGPDFScannerPopString returning strange result and have a look at PDFKitten.
