Using Flex/Bison to parse underscore-delimited values

I would like to parse some input, mainly numbers, that can be delimited with underscores (_) for readability.
For example:
1_0001_000 -> 10001000
000_000_111 -> 000000111
How would I set up my flex/yacc to do so?

Here's a potential flex answer (in C):
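%{
#include <stdlib.h>   /* malloc */
%}
DIGIT [0-9]
%%
{DIGIT}+("_"{DIGIT}+)* {
    /* Count the underscores so we know how much to allocate. */
    int numUnderscores = 0;
    for (int i = 0; i < yyleng; i++)
        if (yytext[i] == '_')
            numUnderscores++;
    /* +1 for the terminating '\0'. */
    int stringLength = yyleng - numUnderscores + 1;
    char *string = malloc(stringLength);
    if (string == NULL)
        YY_FATAL_ERROR("out of memory");
    /* Copy every character except the underscores. */
    int pos = 0;
    for (int i = 0; i < yyleng; i++)
        if (yytext[i] != '_')
            string[pos++] = yytext[i];
    string[pos] = '\0';
    /* yylex() returns an int token code, so hand the string over via yylval
       (assumes a Bison %union with a char *str member and a NUMBER token). */
    yylval.str = string;
    return NUMBER;
}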
If you know the maximum size of the number, you could use a statically sized array instead of dynamically allocating space for the string.
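For instance, the copying step can write into a caller-supplied fixed-size buffer instead of a malloc'd one. This is a minimal sketch: the function name `strip_underscores` and the `MAX_NUMBER_LEN` limit are hypothetical, and inside the flex action you would call it with `yytext` and `yyleng`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical upper bound on the digits in one literal (not from the original). */
#define MAX_NUMBER_LEN 64

/* Copies the digits of text[0..len) into out, skipping '_' separators.
   out must hold MAX_NUMBER_LEN + 1 chars. Returns the number of digits
   copied, or -1 if the literal would not fit in the fixed buffer. */
static int strip_underscores(const char *text, size_t len,
                             char out[MAX_NUMBER_LEN + 1])
{
    size_t pos = 0;
    for (size_t i = 0; i < len; i++) {
        if (text[i] == '_')
            continue;                 /* drop the separator */
        if (pos == MAX_NUMBER_LEN)
            return -1;                /* too long for the buffer */
        out[pos++] = text[i];
    }
    out[pos] = '\0';                  /* always NUL-terminate */
    return (int)pos;
}
```

In the action this would look like `char buf[MAX_NUMBER_LEN + 1]; strip_underscores(yytext, yyleng, buf);`, with the buffer then copied or converted before the next token overwrites it.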
Flex on its own isn't the most efficient tool for this problem. If it is part of a larger problem (such as a language grammar), keep using flex; otherwise, there are more efficient ways of handling it.
If you just need the numeric value, try this:
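DIGIT [0-9]
%%
{DIGIT}+("_"{DIGIT}+)* {
    /* Accumulate the digits, skipping the underscores. */
    int number = 0;
    for (int i = 0; i < yyleng; i++)
        if (yytext[i] != '_')
            number = (number * 10) + (yytext[i] - '0');  /* may overflow */
    /* Assumes a Bison %union with an int num member and a NUMBER token. */
    yylval.num = number;
    return NUMBER;
}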
Just be sure to check for overflow!
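One way to do that check is to test, before each multiply-and-add, whether the result would exceed the type's maximum. A sketch using `unsigned long` and `ULONG_MAX` from `<limits.h>`; the function name `parse_underscored` and its true/false convention are illustrative, not part of flex:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Converts an underscore-separated digit string such as "1_000_000"
   to an unsigned long. Returns false if the value does not fit.
   Assumes the lexer pattern already guarantees that text contains
   only digits and underscores. */
static bool parse_underscored(const char *text, size_t len,
                              unsigned long *result)
{
    unsigned long value = 0;
    for (size_t i = 0; i < len; i++) {
        if (text[i] == '_')
            continue;
        unsigned long digit = (unsigned long)(text[i] - '0');
        /* value * 10 + digit > ULONG_MAX  <=>  value > (ULONG_MAX - digit) / 10 */
        if (value > (ULONG_MAX - digit) / 10)
            return false;
        value = value * 10 + digit;
    }
    *result = value;
    return true;
}
```

In the flex action it would be called as `parse_underscored(yytext, yyleng, &value)`, reporting an error when it returns false.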


CS50 Speller: not recognising any misspelled words

I'm currently working on the CS50 Speller problem. I have managed to compile my code and have finished a prototype of the full program, but it does not work: it doesn't recognise any misspelled words. I am going through my functions one at a time and printing their output to see what's going on inside.
// Loads dictionary into memory, returning true if successful else false
bool load(const char *dictionary)
{
    char word[LENGTH + 1];
    int counter = 0;
    FILE *dicptr = fopen(dictionary, "r");
    if (dicptr == NULL)
    {
        printf("Could not open file\n");
        return false;   // not 1: in a bool function, 1 means true (success)
    }
    while (fscanf(dicptr, "%s", word) != EOF)
    {
        printf("%s", word);
        node *n = malloc(sizeof(node));
        if (n == NULL)
        {
            unload();
            printf("Memory Error\n");
            return false;
        }
        strcpy(n->word, word);
        int h = hash(n->word);
        n->next = table[h];
        table[h] = n;
        amount++;
    }
    fclose(dicptr);
    return true;
}
From what I can see this works fine, which makes me wonder whether the issue is in my check function:
bool check(const char *word)
{
    int n = strlen(word);
    char copy[n + 1];
    copy[n] = '\0';
    for(int i = 0; i < n; i++)
    {
        copy[i] = tolower(word[i]);
        printf("%c", copy[i]);
    }
    printf("\n");
    node *cursor = table[hash(copy)];
    while(cursor != NULL)
    {
        if(strcasecmp(cursor->word, word))
        {
            return true;
        }
        cursor = cursor->next;
    }
    return false;
}
If someone with a keener eye can spot the issue I'd be very grateful, as I'm stumped. The first function loads the words from a dictionary into a hash table of linked lists. The second function is supposed to check the words of a text file to see whether they match any of the words in the table. If not, they should be counted as misspelled.
The problem is this line: if(strcasecmp(cursor->word, word)). From man strcasecmp:
Return Value
The strcasecmp() and strncasecmp() functions return an integer less than, equal to, or greater than zero if s1 (or the first n bytes thereof) is found, respectively, to be less than, to match, or be greater than s2.
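If the words match, it returns 0, which evaluates to false. So check() returns true as soon as the bucket contains any word that differs from the one being checked, and a misspelled word is almost never reported. Test for a match explicitly: if (strcasecmp(cursor->word, word) == 0).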
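To see the difference, here is a small self-contained sketch of the bucket lookup with the corrected comparison. The `node` layout follows the question, but the 26-bucket table, the first-letter `hash()` and the `insert()` helper are hypothetical stand-ins for the real CS50 versions.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

#define LENGTH 45
#define N 26           /* hypothetical: one bucket per letter */

typedef struct node {
    char word[LENGTH + 1];
    struct node *next;
} node;

static node *table[N];

/* Hypothetical hash: bucket by first letter, ignoring case. */
static unsigned int hash(const char *word)
{
    int c = tolower((unsigned char)word[0]);
    return (c >= 'a' && c <= 'z') ? (unsigned int)(c - 'a') : 0;
}

/* Hypothetical helper: prepend a word to its bucket, as load() does. */
static bool insert(const char *word)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
        return false;
    strncpy(n->word, word, LENGTH);
    n->word[LENGTH] = '\0';
    unsigned int h = hash(word);
    n->next = table[h];
    table[h] = n;
    return true;
}

bool check(const char *word)
{
    for (node *cursor = table[hash(word)]; cursor != NULL; cursor = cursor->next)
    {
        /* strcasecmp returns 0 on a match, so compare against 0. */
        if (strcasecmp(cursor->word, word) == 0)
            return true;
    }
    return false;
}
```

With the original `if (strcasecmp(...))`, `check("cta")` would return true here, because "cat" and "car" share its bucket and both differ from it.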

C++ - Removing invalid characters when a user pastes into a grid

Here's my situation. I need to filter out invalid characters that a user may paste from Word or Excel documents.
Here is what I'm doing.
First, I convert any Unicode characters to ASCII:
extern "C" COMMON_STRING_FUNCTIONS long ConvertUnicodeToAscii(wchar_t * pwcUnicodeString, char* &pszAsciiString)
{
    int nBufLen = WideCharToMultiByte(CP_ACP, 0, pwcUnicodeString, -1, NULL, 0, NULL, NULL)+1;
    pszAsciiString = new char[nBufLen];
    WideCharToMultiByte(CP_ACP, 0, pwcUnicodeString, -1, pszAsciiString, nBufLen, NULL, NULL);
    return nBufLen;
}
Next, I filter out any character whose value is not between 31 and 127:
String __fastcall TMainForm::filterInput(String l_sConversion)
{
    // Used to store every character that was stripped out.
    String filterChars = "";
    // Not Used. We never received the whitelist
    String l_SWhiteList = "";
    // Our String without the invalid characters.
    AnsiString l_stempString;
    // convert the string into an array of chars
    wchar_t* outputChars = l_sConversion.w_str();
    char * pszOutputString = NULL;
    //convert any unicode characters to ASCII
    ConvertUnicodeToAscii(outputChars, pszOutputString);
    l_stempString = (AnsiString)pszOutputString;
    //We're going backwards since we are removing characters which changes the length and position.
    for (int i = l_stempString.Length(); i > 0; i--)
    {
        char l_sCurrentChar = l_stempString[i];
        //If we don't have a valid character, filter it out of the string.
        if (((unsigned int)l_sCurrentChar < 31) ||((unsigned int)l_sCurrentChar > 127))
        {
            String l_sSecondHalf = "";
            String l_sFirstHalf = "";
            l_sSecondHalf = l_stempString.SubString(i + 1, l_stempString.Length() - i);
            l_sFirstHalf = l_stempString.SubString(0, i - 1);
            l_stempString = l_sFirstHalf + l_sSecondHalf;
            filterChars += "\'" + ((String)(unsigned int)(l_sCurrentChar)) + "\' ";
        }
    }
    if (filterChars.Length() > 0)
    {
        LogInformation(__LINE__, __FUNC__, Utilities::LOG_CATEGORY_GENERAL, "The Following ASCII Values were filtered from the string: " + filterChars);
    }
    // Delete the char* to avoid memory leaks.
    delete [] pszOutputString;
    return l_stempString;
}
This seems to work, except when I copy and paste bullets from a Word document:
o Bullet1:
 subbullet1.
I get something like this:
oBullet1?subbullet1.
My filter function is called from an OnChange event.
The bullets are replaced with the letter o and a question mark.
What am I doing wrong, and is there a better way to do this?
I'm using C++Builder XE5, so please no Visual C++ solutions.
When you perform the conversion to ASCII (which is not actually converting to ASCII, by the way, but to the system's ANSI code page, CP_ACP), Unicode characters that are not supported by the target code page are lost: they are either dropped, replaced with ?, or replaced with a close approximation, so they never reach your scanning loop. Don't do the conversion at all; scan the source Unicode data as-is instead.
Try something more like this:
#include <System.Character.hpp>
String __fastcall TMainForm::filterInput(String l_sConversion)
{
    // Used to store every character sequence that was stripped out.
    String filterChars;
    // Not Used. We never received the whitelist
    String l_SWhiteList;
    // Our String without the invalid sequences.
    String l_stempString;
    int numChars;
    for (int i = 1; i <= l_sConversion.Length(); i += numChars)
    {
        UCS4Char ch = TCharacter::ConvertToUtf32(l_sConversion, i, numChars);
        String seq = l_sConversion.SubString(i, numChars);
        //If we don't have a valid codepoint, filter it out of the string.
        if ((ch <= 31) || (ch >= 127))
            filterChars += (_D("\'") + seq + _D("\' "));
        else
            l_stempString += seq;
    }
    if (!filterChars.IsEmpty())
    {
        LogInformation(__LINE__, __FUNC__, Utilities::LOG_CATEGORY_GENERAL, _D("The Following Values were filtered from the string: ") + filterChars);
    }
    return l_stempString;
}
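The essential idea is to inspect whole code points in the original UTF-16 text, combining surrogate pairs the way `TCharacter::ConvertToUtf32` is used above, rather than round-tripping through the ANSI code page. A plain C sketch of the same loop over a `uint16_t` buffer; the function names and the "keep only 32..126" rule are assumptions chosen to mirror the answer's condition, not C++Builder APIs:

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes the code point starting at in[i] of a UTF-16 buffer of n units
   and stores how many units it occupies in *units. An unpaired surrogate
   is returned as-is (one unit) so the caller can reject it. */
static uint32_t next_code_point(const uint16_t *in, size_t n, size_t i,
                                size_t *units)
{
    uint16_t hi = in[i];
    if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < n) {
        uint16_t lo = in[i + 1];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            *units = 2;
            return 0x10000u + (((uint32_t)(hi - 0xD800) << 10)
                               | (uint32_t)(lo - 0xDC00));
        }
    }
    *units = 1;
    return hi;
}

/* Copies only printable ASCII code points (32..126) from in to out,
   which must hold at least n units. Returns the number of units written;
   *removed receives the number of code points dropped. */
static size_t filter_printable(const uint16_t *in, size_t n,
                               uint16_t *out, size_t *removed)
{
    size_t written = 0;
    *removed = 0;
    for (size_t i = 0; i < n; ) {
        size_t units;
        uint32_t cp = next_code_point(in, n, i, &units);
        if (cp <= 31 || cp >= 127)
            (*removed)++;              /* drop the whole sequence */
        else
            out[written++] = in[i];    /* printable ASCII is a single unit */
        i += units;
    }
    return written;
}
```

Because a surrogate pair is examined as one code point, a character outside the Basic Multilingual Plane is removed as a unit instead of leaving half of it behind, and nothing is silently turned into `?` before the check runs.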

Groovy string match 90% (ignore letter casing)

I need to write a Groovy function to check whether two given strings match at least 90%. I'd like to know if anyone knows of an existing utility method that I could use in a Grails project. I haven't written the method yet, but ideally this is how it would work:
def doStringsMatch(String str1, String str2) {
    if (str1 and str2 match at least 90% or
        str1 appears in str2 somewhere or
        str2 appears in str1 somewhere)
        return true
    else
        return false
}
Thanks
This is a Groovy (Java-style) implementation based on the Levenshtein distance. similarity() returns a value between 0 and 1 indicating how similar the two strings are: 0 means they are completely different and 1 means they are identical. The comparison is case-insensitive.
private double similarity(String s1, String s2) {
    if (s1.length() < s2.length()) { // s1 should always be bigger
        String swap = s1; s1 = s2; s2 = swap;
    }
    int bigLen = s1.length();
    if (bigLen == 0) { return 1.0; /* both strings are zero length */ }
    return (bigLen - computeEditDistance(s1, s2)) / (double) bigLen;
}
private int computeEditDistance(String s1, String s2) {
    s1 = s1.toLowerCase();
    s2 = s2.toLowerCase();
    int[] costs = new int[s2.length() + 1];
    for (int i = 0; i <= s1.length(); i++) {
        int lastValue = i;
        for (int j = 0; j <= s2.length(); j++) {
            if (i == 0)
                costs[j] = j;
            else {
                if (j > 0) {
                    int newValue = costs[j - 1];
                    if (s1.charAt(i - 1) != s2.charAt(j - 1))
                        newValue = Math.min(Math.min(newValue, lastValue),
                                costs[j]) + 1;
                    costs[j - 1] = lastValue;
                    lastValue = newValue;
                }
            }
        }
        if (i > 0)
            costs[s2.length()] = lastValue;
    }
    return costs[s2.length()];
}
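To get the question's doStringsMatch behaviour, compare the similarity against 0.9 and add the two "appears in" checks. A C port of the same single-row Levenshtein approach, for illustration only; the function names, the 256-character `MAX_LEN` limit and the ASCII-only lowercasing via `tolower` are assumptions of this sketch:

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LEN 256   /* hypothetical: inputs must not be longer than this */

/* Lowercases src into dst (ASCII only). */
static void to_lower_copy(char *dst, const char *src)
{
    while ((*dst++ = (char)tolower((unsigned char)*src++)) != '\0')
        ;
}

/* Levenshtein distance keeping one row of costs, like computeEditDistance. */
static size_t edit_distance(const char *s1, const char *s2)
{
    size_t n1 = strlen(s1), n2 = strlen(s2);
    size_t costs[MAX_LEN + 1];
    for (size_t j = 0; j <= n2; j++)
        costs[j] = j;                     /* row 0: j insertions */
    for (size_t i = 1; i <= n1; i++) {
        size_t diag = costs[0];           /* previous row, column j - 1 */
        costs[0] = i;
        for (size_t j = 1; j <= n2; j++) {
            size_t up = costs[j];         /* previous row, column j */
            size_t best = diag;           /* characters match: no cost */
            if (s1[i - 1] != s2[j - 1]) {
                best = diag < up ? diag : up;
                if (costs[j - 1] < best)  /* current row, column j - 1 */
                    best = costs[j - 1];
                best += 1;
            }
            costs[j] = best;
            diag = up;
        }
    }
    return costs[n2];
}

/* 1.0 = identical (ignoring case), 0.0 = completely different. */
static double similarity(const char *a, const char *b)
{
    char la[MAX_LEN + 1], lb[MAX_LEN + 1];
    to_lower_copy(la, a);
    to_lower_copy(lb, b);
    size_t big = strlen(la) > strlen(lb) ? strlen(la) : strlen(lb);
    if (big == 0)
        return 1.0;                       /* both strings empty */
    return (double)(big - edit_distance(la, lb)) / (double)big;
}

/* At least 90% similar, or one string contains the other (ignoring case). */
static bool strings_match(const char *a, const char *b)
{
    char la[MAX_LEN + 1], lb[MAX_LEN + 1];
    to_lower_copy(la, a);
    to_lower_copy(lb, b);
    return similarity(a, b) >= 0.9
        || strstr(la, lb) != NULL || strstr(lb, la) != NULL;
}
```

As with the Groovy version, the score is normalised by the longer string's length, so one differing character in a 20-character string scores 0.95.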

Is it possible to parse a string of fixed length in yacc/lex?

I have a file format something like this:
...
{string_length} {binary_string}
...
example:
...
10 abcdefghij
...
Is it possible to parse this using lex/yacc? There is no null terminator for the string, so I'm at a loss as to how to tokenize it.
I'm currently using PLY's lexer and yacc for this.
You can't do it with a regular expression, but you can certainly extract the lexeme. You're not specific about how the length is terminated; here, I'm assuming that it is terminated by a single space character. I'm also assuming that yylval has some appropriate struct type:
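[[:digit:]]+" " {
    unsigned long len = atol(yytext);    /* atol stops at the trailing space */
    yylval.str = malloc(len);            /* not NUL-terminated; the length is in yylval.len */
    if (yylval.str == NULL && len > 0)
        YY_FATAL_ERROR("out of memory");
    yylval.len = len;
    /* Read exactly len more bytes directly, bypassing the patterns. */
    for (char *p = yylval.str; len; --len, ++p) {
        int ch = input();
        if (ch == EOF) { /* handle the lexical error */ }
        *p = ch;
    }
    return BINARY_STRING;
}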
There are other solutions (a start condition and a state variable for the count, for example), but I think the above is the simplest.
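The same idea (read the decimal count, skip the single space, then take exactly that many bytes whatever they contain) can be tried outside the scanner on an in-memory buffer. The `counted_string` struct and `read_counted` function below are hypothetical and stand in for the lexer action and flex's `input()`:

```c
#include <ctype.h>
#include <stddef.h>

/* A view into the input buffer; the bytes are not NUL-terminated. */
struct counted_string {
    const char *data;
    size_t len;
};

/* Parses "{length} {bytes}" starting at buf[*pos], where buf holds n bytes.
   On success fills *out, advances *pos past the bytes and returns 1.
   Returns 0 on malformed or truncated input, leaving *pos unchanged.
   (Overflow of a huge length is not checked in this sketch.) */
static int read_counted(const char *buf, size_t n, size_t *pos,
                        struct counted_string *out)
{
    size_t p = *pos, len = 0;
    if (p >= n || !isdigit((unsigned char)buf[p]))
        return 0;                          /* need at least one digit */
    while (p < n && isdigit((unsigned char)buf[p]))
        len = len * 10 + (size_t)(buf[p++] - '0');
    if (p >= n || buf[p] != ' ')
        return 0;                          /* length ends with one space */
    p++;
    if (len > n - p)
        return 0;                          /* fewer bytes than promised */
    out->data = buf + p;
    out->len = len;
    *pos = p + len;
    return 1;
}
```

Because the payload is taken by count rather than by pattern, it may contain spaces, digits or even NUL bytes without confusing the tokenizer.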

Will this Unicode encryption fail?

I don't need any serious security; I just need to stop 11-year-olds with plist editors from easily editing their number of coins in my game.
I created a function that takes a string and, for each character, raises its Unicode value by 220 plus 14 times the character's position in the string.
Obviously this will fail (I think) if the string were something like a million characters long, because eventually you run out of Unicode characters, but for all intents and purposes it will only be used on strings of 20 characters or fewer.
Are there any Unicode characters in this range that will not be stored in a plist, or will be ignored by Apple's underlying code when I save the plist, so that when I retrieve and decrypt it the character is gone and I can't decrypt it?
+(NSString*)encryptString:(NSString*)theString {
    NSMutableString *encryptedFinal = [[NSMutableString alloc] init];
    for (int i = 0; i < theString.length; i++) {
        unichar uniCharacter = [theString characterAtIndex:i];
        // Shift each UTF-16 unit by 220 + 14 * (its position)
        uniCharacter += 220 + (14 * i);
        [encryptedFinal appendFormat:@"%C", uniCharacter];
    }
    return encryptedFinal;
}
+(NSString*)decryptString:(NSString*)theString {
    NSMutableString *decryptedFinal = [[NSMutableString alloc] init];
    for (int i = 0; i < theString.length; i++) {
        unichar uniCharacter = [theString characterAtIndex:i];
        // Undo the shift: subtract the same offset that encryptString: added
        uniCharacter -= 220 + (14 * i);
        [decryptedFinal appendFormat:@"%C", uniCharacter];
    }
    return decryptedFinal;
}
It works for strings of 20 characters or fewer if you are encrypting one of the first 26+26+10+30 characters in the Unicode index at any position along the 20-character line. It probably works beyond that; I just didn't test it any higher.
This is the code I wrote to test it; all the resulting characters were stored in an NSString and were still valid when counted afterwards.
int i = 0;
NSMutableString *encryptedFinal = [[NSMutableString alloc] init];
NSString *theString = @"a";
int j = 26+26+10+30;//letters + capital letters + numbers + 30 extra things like ?><.\]!#$
int f = 0;
int z = 0;
while (f < j) {
    while (i < 220+220+(14*20)) {
        unichar uniCharacter = [theString characterAtIndex:0];
        uniCharacter += f;
        uniCharacter += 220+(14*i);
        [encryptedFinal appendFormat:@"%C", uniCharacter];
        i++;
    }
    z += i;
    f++;
    i = 0;
}
NSLog(@"%@", encryptedFinal);
NSLog(@"%i == %lu?", z, (unsigned long)encryptedFinal.length);
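The shift scheme from the question (add 220 + 14 × position to each UTF-16 unit, subtract the same amount to decrypt) can be checked in plain C, with `uint16_t` standing in for `unichar` since both are 16-bit and wrap modulo 65536. The function names are illustrative. One range worth guarding against is U+D800 to U+DFFF: those are UTF-16 surrogate units, and a lone one is not a valid character on its own. Whether a particular plist serializer drops or alters them is not something this sketch tests.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset for position i, as in the question: 220 + 14 * i (wraps at 16 bits). */
static uint16_t offset_for(size_t i)
{
    return (uint16_t)(220 + 14 * i);
}

static void encrypt_units(const uint16_t *in, uint16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint16_t)(in[i] + offset_for(i));
}

/* Decryption must subtract the same offset that encryption added. */
static void decrypt_units(const uint16_t *in, uint16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint16_t)(in[i] - offset_for(i));
}

/* Returns 1 if any encrypted unit lands in the surrogate range
   U+D800..U+DFFF, where a single unit is not a valid character. */
static int hits_surrogates(const uint16_t *enc, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (enc[i] >= 0xD800 && enc[i] <= 0xDFFF)
            return 1;
    return 0;
}
```

For ASCII input and strings of at most 20 units the largest offset is 220 + 14 × 19 = 486, so the encrypted values stay far below U+D800.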
There are two things that you can do:
1. Save the number of coins using NSData rather than NSNumber, then use NSData+AES to encrypt it. You can even encrypt your entire .plist file to ensure that no other fields are changed.
2. Security through obscurity. Just save the number of coins under an important-sounding field name, e.g. Security Token Number. You can also create a bogus number-of-coins field whose value is ignored. Or save the same value in both fields and flag the user for cheating if the two values don't match.
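The second suggestion (an obvious-looking coins field plus the real value under an innocuous name, with a mismatch treated as tampering) might look like this in plain C. The struct, its field names and both functions are hypothetical, and this is obfuscation, not security.

```c
#include <stdbool.h>

/* Hypothetical save-file layout: "coins" is the field a player would
   edit; "security_token_number" holds the same value under a name
   that doesn't look like a coin count. */
struct save_data {
    long coins;
    long security_token_number;
};

static void store_coins(struct save_data *s, long coins)
{
    s->coins = coins;                    /* same value in both fields */
    s->security_token_number = coins;
}

/* Returns the trusted coin count and sets *cheated if the two fields
   disagree, i.e. one of them was edited outside the game. */
static long load_coins(const struct save_data *s, bool *cheated)
{
    *cheated = (s->coins != s->security_token_number);
    return s->security_token_number;     /* trust the less obvious field */
}
```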
