C++ - Removing invalid characters when a user pastes into a grid - c++builder

Here's my situation. I need to filter out invalid characters that a user may paste from Word or Excel documents.
Here is what I'm doing.
First, I'm trying to convert any Unicode characters to ASCII:
extern "C" COMMON_STRING_FUNCTIONS long ConvertUnicodeToAscii(wchar_t * pwcUnicodeString, char* &pszAsciiString)
{
    int nBufLen = WideCharToMultiByte(CP_ACP, 0, pwcUnicodeString, -1, NULL, 0, NULL, NULL)+1;
    pszAsciiString = new char[nBufLen];
    WideCharToMultiByte(CP_ACP, 0, pwcUnicodeString, -1, pszAsciiString, nBufLen, NULL, NULL);
    return nBufLen;
}
Next, I'm filtering out any character whose value is not between 31 and 127:
String __fastcall TMainForm::filterInput(String l_sConversion)
{
    // Used to store every character that was stripped out.
    String filterChars = "";
    // Not Used. We never received the whitelist
    String l_SWhiteList = "";
    // Our String without the invalid characters.
    AnsiString l_stempString;
    // convert the string into an array of chars
    wchar_t* outputChars = l_sConversion.w_str();
    char * pszOutputString = NULL;
    //convert any unicode characters to ASCII
    ConvertUnicodeToAscii(outputChars, pszOutputString);
    l_stempString = (AnsiString)pszOutputString;
    //We're going backwards since we are removing characters which changes the length and position.
    for (int i = l_stempString.Length(); i > 0; i--)
    {
        char l_sCurrentChar = l_stempString[i];
        //If we don't have a valid character, filter it out of the string.
        if (((unsigned int)l_sCurrentChar < 31) ||((unsigned int)l_sCurrentChar > 127))
        {
            String l_sSecondHalf = "";
            String l_sFirstHalf = "";
            l_sSecondHalf = l_stempString.SubString(i + 1, l_stempString.Length() - i);
            l_sFirstHalf = l_stempString.SubString(0, i - 1);
            l_stempString = l_sFirstHalf + l_sSecondHalf;
            filterChars += "\'" + ((String)(unsigned int)(l_sCurrentChar)) + "\' ";
        }
    }
    if (filterChars.Length() > 0)
    {
        LogInformation(__LINE__, __FUNC__, Utilities::LOG_CATEGORY_GENERAL, "The Following ASCII Values were filtered from the string: " + filterChars);
    }
    // Delete the char* to avoid memory leaks.
    delete [] pszOutputString;
    return l_stempString;
}
Now this seems to work, except when you try to copy and paste bullets from a Word document.
o Bullet1:
 subbullet1.
You will get something like this
oBullet1?subbullet1.
My filter function is called on an OnChange event.
The bullets are replaced with the letter o and a question mark.
What am I doing wrong, and is there a better way of doing this?
I'm using C++Builder XE5, so please no Visual C++ solutions.

When you perform the conversion to ASCII (which is not actually converting to ASCII, btw: CP_ACP is the system's default ANSI codepage), Unicode characters that are not supported by the target codepage are lost. They are either dropped, replaced with ?, or replaced with a close approximation, so they are no longer available to your scanning loop. Do not do the conversion at all; scan the source Unicode data as-is instead.
Try something more like this:
#include <System.Character.hpp>
String __fastcall TMainForm::filterInput(String l_sConversion)
{
    // Used to store every character sequence that was stripped out.
    String filterChars;
    // Not Used. We never received the whitelist
    String l_SWhiteList;
    // Our String without the invalid sequences.
    String l_stempString;
    int numChars;
    for (int i = 1; i <= l_sConversion.Length(); i += numChars)
    {
        UCS4Char ch = TCharacter::ConvertToUtf32(l_sConversion, i, numChars);
        String seq = l_sConversion.SubString(i, numChars);
        //If we don't have a valid codepoint, filter it out of the string.
        if ((ch <= 31) || (ch >= 127))
            filterChars += (_D("\'") + seq + _D("\' "));
        else
            l_stempString += seq;
    }
    if (!filterChars.IsEmpty())
    {
        LogInformation(__LINE__, __FUNC__, Utilities::LOG_CATEGORY_GENERAL, _D("The Following Values were filtered from the string: ") + filterChars);
    }
    return l_stempString;
}
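For readers without the VCL at hand, the same scan can be sketched in standard C++. This is only an illustration under a few assumptions: the text is available as a std::u16string (UTF-16, like a C++Builder String), and keepPrintableAscii is a made-up helper name, not a C++Builder API. It keeps code points 32..126, the same range as the answer above, and treats a surrogate pair as one code point.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a VCL function): keeps only code points in the
// printable ASCII range 32..126 and drops everything else.
std::u16string keepPrintableAscii(const std::u16string& in)
{
    std::u16string out;
    for (std::size_t i = 0; i < in.size(); ) {
        char32_t cp = in[i];
        std::size_t units = 1;
        // A high surrogate followed by a low surrogate encodes one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            char16_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                units = 2;
            }
        }
        if (cp >= 32 && cp <= 126) {
            out.append(in, i, units);  // copy the code unit(s) unchanged
        }
        i += units;
    }
    assert(out.size() <= in.size());  // filtering can only shrink the text
    return out;
}
```

Because no codepage conversion happens first, a Word bullet such as U+F0A7 is still visible to the loop and is dropped cleanly instead of turning into a question mark.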

Related

Objective-C how to convert a keystroke to ASCII character code?

I need to find a way to convert an arbitrary character typed by a user into an ASCII representation to be sent to a network service. My current approach is to create a lookup dictionary and send the corresponding code. After creating this dictionary, I see that it is hard to maintain and determine if it is complete:
__asciiKeycodes[@"F1"] = @(112);
__asciiKeycodes[@"F2"] = @(113);
__asciiKeycodes[@"F3"] = @(114);
//...
__asciiKeycodes[@"a"] = @(97);
__asciiKeycodes[@"b"] = @(98);
__asciiKeycodes[@"c"] = @(99);
Is there a better way to get ASCII character code from an arbitrary key typed by a user (using standard 104 keyboard)?
Objective-C has the base C primitive data types, so there is a little trick you can use. Store the keystroke in a char, then cast it to an int. In C, converting a char to an int yields the character's numeric code, which for ASCII characters is the ASCII value. Here's a quick example:
char character = 'a';
NSLog(@"a = %d", (int)character);
Console output: a = 97
To go the other way around, cast an int to a char:
int asciiValue = 97;
NSLog(@"97 = %c", (char)asciiValue);
Console output: 97 = a
Alternatively, you can do the conversion directly when initializing your int or char and store it in a variable.
char asciiToCharOf97 = (char)97; //Stores 'a' in asciiToCharOf97
int charToAsciiOfA = (int)'a'; //Stores 97 in charToAsciiOfA
This seems to work for most keyboard keys; I'm not sure about function keys and the return key.
NSString* input = @"abcdefghijklkmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890!@#$%^&*()_+[]\{}|;':\"\\,./<>?~ ";
for(int i = 0; i<input.length; i ++)
{
    NSLog(@"Found (at %i): %i",i , [input characterAtIndex:i]);
}
Use stringWithFormat call and pass the int values.

Is it possible to parse a string of fixed length in yacc/lex?

I have a file format something like this
...
{string_length} {binary_string}
...
example:
...
10 abcdefghij
...
Is it possible to parse this using lex/yacc? There is no null terminator for the string, so I'm at a loss as to how to tokenize it.
I'm currently using PLY's lex and yacc for this.
You can't do it with a regular expression, but you can certainly extract the lexeme. You're not specific about how the length is terminated; here, I'm assuming that it is terminated by a single space character. I'm also assuming that yylval has some appropriate struct type:
[[:digit:]]+" " { unsigned long len = atol(yytext);
                  yylval.str = malloc(len);
                  yylval.len = len;
                  for (char *p = yylval.str; len; --len, ++p) {
                    int ch = input();
                    if (ch == EOF) { /* handle the lexical error */ }
                    *p = ch;
                  }
                  return BINARY_STRING;
                }
There are other solutions (a start condition and a state variable for the count, for example), but I think the above is the simplest.
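The underlying idea (read the count, then consume exactly that many bytes, with no terminator) can also be shown outside a lexer. The following is a hand-written sketch in C++, not lex or PLY code; readLengthPrefixed is a hypothetical name, and it makes the same assumption as above that a single space separates the length from the data.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads one "{length} {bytes}" record. Returns false on malformed or
// truncated input. The length is trusted, so a huge value allocates a
// huge buffer; real code would cap it.
bool readLengthPrefixed(std::istream& in, std::string& out)
{
    std::size_t len = 0;
    if (!(in >> len)) return false;     // decimal length
    if (in.get() != ' ') return false;  // single-space separator (assumed)
    out.assign(len, '\0');
    in.read(&out[0], static_cast<std::streamsize>(len));  // raw bytes, may contain anything
    return static_cast<std::size_t>(in.gcount()) == len;
}

// Usage sketch with the sample record from the question.
void lengthPrefixExample()
{
    std::istringstream in("10 abcdefghij");
    std::string value;
    assert(readLengthPrefixed(in, value) && value == "abcdefghij");
}
```

The payload is never inspected, so it may contain spaces, newlines, or NUL bytes, which is exactly why a regular expression alone cannot delimit it.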

Process unicode string in C and Objective C

I wrote a C function that reads the characters in a user-input string. Because the string is user input, it can contain any Unicode characters. An Objective-C method receives the user-input NSString, converts it to NSData, and passes the data to the C function for processing. The C function searches for these symbol characters: *, [, ], _; it doesn't care about any other characters. Every time it finds one of the symbols, it processes it and then calls an Objective-C method, passing the location of the symbol.
C code:
typedef void (* callback)(void *context, size_t location);
void process(const uint8_t *data, size_t length, callback cb, void *context)
{
    size_t i = 0;
    while (i < length)
    {
        if (data[i] == '*' || data[i] == '[' || data[i] == ']' || data[i] == '_')
        {
            int valid = 0;
            //do something, set valid = 1
            if (valid)
                cb(context, i);
        }
        i++;
    }
}
Objective C code:
//a C function declared in .m file
void mycallback(void *context, size_t location)
{
    [(__bridge id)context processSymbolAtLocation:location];
}
- (void)processSymbolAtLocation:(NSInteger)location
{
    NSString *result = [self.string substringWithRange:NSMakeRange(location, 1)];
    NSLog(@"%@", result);
}
- (void)processUserInput:(NSString*)string
{
    self.string = string;
    //convert string to data
    NSData *data = [string dataUsingEncoding:NSUTF8StringEncoding];
    //pass data to C function
    process(data.bytes, data.length, mycallback, (__bridge void *)(self));
}
The code works fine if the input string contains only English characters. If it contains composed character sequences, multibyte characters or other unicode characters, the result string in processSymbolAtLocation method is not the expected symbol.
How to convert the NSString object to NSData correctly? How to get the correct location?
Thanks!
Your problem is that you start off with a UTF-16 encoded NSString and produce a sequence of UTF-8 encoded bytes. The number of code units required to represent a string in UTF-16 may not be equal to that number required to represent it in UTF-8, so the offsets in your two forms may not match - as you have found out.
Why are you using C to scan the string for matches in the first place? You might want to look at NSString's rangeOfCharacterFromSet:options:range: method which you can use to find the next occurrence of character from your set.
If you need to use C then convert your string into a sequence of UTF-16 words and use uint16_t on the C side.
HTH
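To make the offset mismatch concrete, here is a small sketch in C++ (chosen only for illustration; the function names are made up). It scans the same text once as UTF-16 code units and once as UTF-8 bytes and reports where the symbols are found.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

static bool isSymbol(unsigned value)
{
    return value == '*' || value == '[' || value == ']' || value == '_';
}

// Offsets counted in UTF-16 code units: these line up with NSString indexes.
std::vector<std::size_t> findSymbolsUtf16(const std::u16string& text)
{
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < text.size(); ++i)
        if (isSymbol(text[i])) hits.push_back(i);
    return hits;
}

// Offsets counted in UTF-8 bytes: these drift after any non-ASCII character.
std::vector<std::size_t> findSymbolsUtf8(const std::string& bytes)
{
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < bytes.size(); ++i)
        if (isSymbol(static_cast<unsigned char>(bytes[i]))) hits.push_back(i);
    return hits;
}

// For the text "é*": one UTF-16 unit precedes '*', but two UTF-8 bytes do.
void offsetExample()
{
    assert(findSymbolsUtf16(u"\u00E9*")[0] == 1);
    assert(findSymbolsUtf8("\xC3\xA9*")[0] == 2);
}
```

Scanning 16-bit units is safe for these four symbols because surrogate code units (0xD800 to 0xDFFF) can never compare equal to an ASCII character.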

Parse String representation of byte array in JavaME

I'm sending a byte array over a REST service. It is received as a String. Here is an extract of it, with start and end tags.
[0,0,0,0,32,122,26,65,0,0,0,0,96,123,26,65,0,0,0,0,192,123,20,65,0,0,0,0,0,125,20,65,71,73,70,56,57,97,244,1,244,1,247,0,0,51,85,51,51,85,102,51,85,153,51,85,204,51,85,255,51,128,0,51,128,51,51,128,102,51,128,153,51,128,204,51,128,255,51,170,0,51,170,51,51,170,102,51,170,153,51,170,204,51,170,255,51,213,0,51,213,51,51,213,102,51,213,153,51,213,204,51,213,255,51,255,0,51,255,51,51,255,102,51,255,153,51,255,204,51]
Now, before anyone suggests sending it as a Base64-encoded String: that would require BlackBerry to actually have a working Base64 decoder. But alas, it fails for files over 64k, and I've tried all sorts.
Anyway, this is what I've tried:
str = str.replace('[', ' ');
str = str.replace(']', ' ');
String[] tokens = split(str,",");
byte[] decoded = new byte[tokens.length];
for(int i = 0; i < tokens.length; i++)
{
    decoded[i] = (byte)Integer.parseInt(tokens[i]);
}
But it fails. Here, split is like the Java implementation found here.
Logically it should work, but it doesn't. This is for Java ME / BlackBerry. No Java answers please (unless they work on Java ME).
Two problems: one minor, and one that is a pain.
Minor: whitespace (as mentioned by Nikita).
Major: casting to byte. Since Java only has a signed byte type (-128 to 127), values of 128 and higher become negative when cast from int to byte.
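str = str.replace('[', ' ');
str = str.replace(']', ' ');
String[] tokens = split(str, ","); // on Java SE: str.split(",")
byte[] decoded = new byte[tokens.length];
for (int i = 0; i < tokens.length; i++) {
    // trim() removes the spaces left over from the bracket replacement
    decoded[i] = (byte) (Integer.parseInt(tokens[i].trim()) & 0xFF);
}
// Mask with 0xff to read each byte back as an unsigned value (0..255).
// An indexed loop is used because Java ME has no for-each loop.
for (int i = 0; i < decoded.length; i++) {
    int tmp = decoded[i] & 0xff;
    System.out.print("byte:" + tmp);
}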
(btw:implementing base64 encoder/decoder isn't especially hard - might be "overkill" for your project though)
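The signed-byte effect is easy to check in isolation. Below is a rough sketch in C++ rather than Java ME; parseByteList, asJavaByte, and unsignedView are hypothetical names, and asJavaByte simply models what a Java byte holding the same bits would report. The parser assumes every number is a non-negative decimal in 0..255 and does not validate anything else.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Parses text such as "[0, 32,204]" into raw byte values (0..255).
// Brackets, commas and whitespace are skipped, so no trim() is needed.
std::vector<unsigned char> parseByteList(const std::string& text)
{
    std::vector<unsigned char> bytes;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] < '0' || text[i] > '9') { ++i; continue; }
        int value = 0;
        while (i < text.size() && text[i] >= '0' && text[i] <= '9') {
            // Stop accumulating once out of byte range to avoid overflow.
            if (value <= 255) value = value * 10 + (text[i] - '0');
            ++i;
        }
        bytes.push_back(static_cast<unsigned char>(value & 0xFF));
    }
    return bytes;
}

// What a Java byte with the same bits would report: -128..127.
int asJavaByte(unsigned char b) { return b >= 128 ? b - 256 : b; }

// The masking step from the answer: recover 0..255 from the signed view.
int unsignedView(int javaByte) { return javaByte & 0xFF; }

void byteListExample()
{
    std::vector<unsigned char> b = parseByteList("[0, 32,204,51 ]");
    assert(b.size() == 4);
    assert(asJavaByte(b[2]) == -52);     // 204 shows up as -52 in a Java byte
    assert(unsignedView(-52) == 204);    // & 0xFF restores the original value
}
```

The bits stored for 204 and -52 are identical, so nothing is lost by the cast; only the printed interpretation differs until the value is masked.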
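Remove the brackets rather than replacing them with spaces. A char literal cannot be empty (and Java ME's String.replace only swaps one char for another), so, assuming the string always starts with '[' and ends with ']', take the substring between them:
str = str.substring(1, str.length() - 1);
With spaces in place of the brackets, you end up with the following array:
[" 0", "0", "0", ..., "204", "51 "]
The first element, " 0", cannot be parsed by Integer.parseInt.
I recommend using a Base64-encoded string to send the byte array.
There's a post with a link to a Base64 library for J2ME.
That way you can convert the byte array to a string and later convert the string back to a byte array.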

MD5 with ASCII Char

I have a string
wDevCopyright = [NSString stringWithFormat:@"Copyright: %c 1995 by WIRELESS.dev, Corp Communications Inc., All rights reserved.",0xa9];
and to munge it I call
-(NSString *)getMD5:(NSString *)source
{
    const char *src = [source UTF8String];
    unsigned char result[CC_MD5_DIGEST_LENGTH];
    CC_MD5(src, strlen(src), result);
    return [NSString stringWithFormat:
        @"%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x",
        result[0], result[1], result[2], result[3],
        result[4], result[5], result[6], result[7],
        result[8], result[9], result[10], result[11],
        result[12], result[13], result[14], result[15]
    ]; //ret;
}
Because of the 0xa9, const char *src = [source UTF8String] does not produce the bytes I expect, so the resulting hash is not comparable with other platforms.
I tried to encode the string with NSASCIIStringEncoding, but that broke the code.
How do I call CC_MD5 with a string that contains a character such as 0xa9 and get the same hash as in Java?
Update to code request:
Java
private static char[] kTestASCII = {
    169
};
System.out.println("\n\n>>>>> msg## " + (char)0xa9 + " " + (char)169 + "\n md5 " + md5(new String(kTestASCII), false)); //unicode = false
Result >>>>> msg## \251 \251
md5 a252c2c85a9e7756d5ba5da9949d57ed
ObjC
char kTestASCII [] = {
    169
};
NSString *testString = [NSString stringWithCString:kTestASCII encoding:NSUTF8StringEncoding];
NSLog(@">>>> objC msg## int %d char %c md5: %@", 0xa9, 169, [self getMD5:testString]);
Result >>>> objC msg## int 169 char © md5: 9b759040321a408a5c7768b4511287a6
** As stated earlier - without the 0xa9 the hashes in Java and ObjC are the same. I am trying to get the hash for 0xa9 the same in Java and ObjC
Java MD5 code
private static char[] kTestASCII = {
169
};
md5(new String(kTestASCII), false);
/**
 * Compute the MD5 hash for the given String.
 * @param s the string to add to the digest
 * @param unicode true if the string is unicode, false for ascii strings
 */
public synchronized final String md5(String value, boolean unicode)
{
    MD5();
    MD5.update(value, unicode);
    return WUtilities.toHex(MD5.finish());
}
public synchronized void update(String s, boolean unicode)
{
    if (unicode)
    {
        char[] c = new char[s.length()];
        s.getChars(0, c.length, c, 0);
        update(c);
    }
    else
    {
        byte[] b = new byte[s.length()];
        s.getBytes(0, b.length, b, 0);
        update(b);
    }
}
public synchronized void update(byte[] b)
{
    update(b, 0, b.length);
}
//--------------------------------------------------------------------------------
/**
 * Add a byte sub-array to the digest.
 */
public synchronized void update(byte[] b, int offset, int length)
{
    for (int n = offset; n < offset + length; n++)
        update(b[n]);
}
/**
 * Add a byte to the digest.
 */
public synchronized void update(byte b)
{
    int index = (int)((count >>> 3) & 0x03f);
    count += 8;
    buffer[index] = b;
    if (index >= 63)
        transform();
}
I believe that my issue is with using NSData withEncoding as opposed to a C char[] or the Java byte[]. So what is the best way to roll my own bytes into a byte[] in objC?
The character you are having problems with, ©, is the Unicode COPYRIGHT SIGN (U+00A9). The correct UTF-8 encoding of this character is the byte sequence 0xc2 0xa9.
You are attempting, however, to convert from the single-byte sequence 0xa9, which is not a valid UTF-8 encoding of any character. See table 3-7 of http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf#G7404 . Since this is not a valid UTF-8 byte sequence, stringWithCString is converting your input to the Unicode REPLACEMENT CHARACTER (U+FFFD). When this character is then encoded back into UTF-8, it yields the byte sequence 0xef 0xbf 0xbd. The MD5 of this sequence is 9b759040321a408a5c7768b4511287a6, as reported by your Objective-C example.
Your Java example yields an MD5 of a252c2c85a9e7756d5ba5da9949d57ed, which simple experimentation shows is the MD5 of the byte sequence 0xa9, which I have already noted is not a valid UTF-8 representation of the desired character.
I think we need to see the implementation of the Java md5() method you are using. I suspect it is simply dropping the high bytes of every Unicode character to convert to a byte sequence for passing to the MessageDigest class. This does not match your Objective-C implementation where you are using a UTF-8 encoding.
Note: even if you fix your Objective-C implementation to match the encoding of your Java md5() method, your test will need some adjustment because you cannot use stringWithCString with the NSUTF8StringEncoding encoding to convert the byte sequence 0xa9 to an NSString.
UPDATE
Having now seen the Java implementation using the deprecated getBytes method, my recommendation is to change the Java implementation, if at all possible, to use a proper UTF-8 encoding.
I suspect, however, that your requirements are to match the current Java implementation, even if it is wrong. Therefore, I suggest you duplicate the bad behavior of Java's deprecated getBytes() method by using NSString getCharacters:range: to retrieve an array of unichars, then manually create an array of bytes by taking the low byte of each unichar.
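The two byte sequences being hashed can be compared directly. This is a sketch in C++ for illustration only; encodeUtf8 and lowBytes are hypothetical helpers, with lowBytes imitating the "take the low byte of each UTF-16 unit" behaviour described above. Neither function computes MD5; they only show which bytes would be fed to it.

```cpp
#include <cassert>
#include <string>

// Encodes one code point as UTF-8. No validation of surrogates or of
// values above U+10FFFF is attempted in this sketch.
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Keeps only the low 8 bits of each UTF-16 code unit and discards the rest.
std::string lowBytes(const std::u16string& s)
{
    std::string out;
    for (char16_t unit : s) out += static_cast<char>(unit & 0xFF);
    return out;
}

void copyrightSignExample()
{
    assert(encodeUtf8(0x00A9) == "\xC2\xA9");      // proper UTF-8: two bytes
    assert(lowBytes(u"\u00A9") == "\xA9");         // low-byte truncation: one byte
    assert(encodeUtf8(0xFFFD) == "\xEF\xBF\xBD");  // the replacement character
}
```

Three different inputs to the hash function give three different digests, so both sides have to agree on one of these byte sequences before the MD5 values can match.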
stringWithCString requires a null terminated C-String. I don't think that kTestASCII[] is necessarily null terminated in your Objective-C code. Perhaps that is the cause of the difference.
Try:
char kTestASCII [] = {
    169,
    0
};
Thanks to GBegan's explanation, here is my solution:
NSMutableData *bytes = [NSMutableData dataWithCapacity:[s length]];
for (NSUInteger i = 0; i < [s length]; i++) {
    // Keep only the low byte of each UTF-16 code unit, like Java's deprecated getBytes().
    unsigned char lowByte = (unsigned char)([s characterAtIndex:i] & 0xFF);
    [bytes appendBytes:&lowByte length:1];
}

Resources