How format strings get their arguments from the stack

I'm playing around with format strings and am finding it difficult to understand how the arguments are retrieved from the stack. This is my example code:
#include <stdio.h>
int main(int argc, char **argv)
{
int i = 123321;
char a[] = "AAAAA";
printf(argv[1]);
printf("\n");
return 0;
}
Now, I've been running the program in a debugger, experimenting with inputs such as "AAA %x %x" and looking at the memory segment between the stack pointer and the base pointer, trying to figure out which format specifiers got replaced with which words.
This led to some confusion. I ran the program with "AAAAA %x %x %x %x %x %x %x %p %p %x %x %x %x %x", as can be seen in the image below.
Afterwards, I examined the memory starting at the stack pointer, going towards and beyond the base pointer. As can be seen by the 0x4141414141 and 0x1e1b9ca70c100 values that are output, which are the array a[]="AAAAA" and the value i=123321 (=0x1e1b9), I was able to get my local variables output to me using the format string. However, if I replace the two %p with two %x, then I only get part of the array (since it's larger than one word) and 0x001e1b9 (the value of i) is not output at all.
Why won't these 4-byte values be popped off the stack? I have to use %p; otherwise it just seems to skip these words on the stack and pops off different stuff. Does this have something to do with how the local variables are aligned on the stack?
Thanks,
Brick

Antlr should place some marker if there is no value provided

I have such a line to parse:
*A VE 8507492 8065969 1234 00922 00945 %
All of those values are optional. This is my Grammar for it:
a_ve: '*A VE' INT* INT* INT* INT* INT* PROZ;
where INT is:
INT: [0-9]+ ; and: SPACE: [\r\n ]+ -> skip;
Since all of those values are optional, I can end up with such a line:
*A VE 8507492 8065969 1234 00945 %
where the value 00922 is not there. But in my abstract parse tree there is nothing for the value 00922. There should be something so I can recognize that there was no value. The spaces between all values (8507492 8065969) should be ignored. What should I change in my grammar to achieve that?
From a syntactic standpoint, what makes the value 00922 stand out in a row of integers? Right, nothing. The parser (which is only matching syntax to defined rules) cannot know that this number has a special meaning for you. It's a semantic problem you have to solve after parsing. The entire rule should just be:
a_ve: '*A VE' INT* PROZ;
Then in your parse tree (note: there is no such thing as an abstract parse tree) you can check the incoming numbers and act on them. What you consider missing or not is then up to you.

How to read a char in Vala?

I'm programming in Vala language and I've a problem: I don't know how to read a char (although I know how to read a string and an integer).
string name = stdin.read_line();
int maximum = int.parse(stdin.read_line());
char option = ????;
One option is using scanf(), but if I use it I have problems during execution time.
If you just want to read a single character from a GLib.FileStream, you can use GLib.FileStream.getc, which "reads the next character from stream and returns it as an unsigned char cast to an int, or EOF on end of file or error." Actually, GLib.FileStream.read_line is implemented using getc. So, the "????" in your question would be something like (char) stdin.getc().
If you are trying to interact with a user, note that input is typically buffered by the terminal until the end of a line. If you want to get a character immediately after the user presses a key you'll need to use something like ncurses (for which Vala bindings are distributed with valac).

rule exclusion in flex

I am trying to write a flex file which recognizes (-! comment !-) as one token called comment. The following is my file:
%{
#include <stdio.h>
void showToken(char* name);
void error();
void enter();
int lineNum=1;
%}
%option yylineno
%option noyywrap
whitespace ([\t ])
enter ([\n])
startcomment (\(\-\!)
endcomment (\!\-\))
comment (^\!\-\))
%%
{startcomment}{comment}*{endcomment} showToken("COMMENT");
{enter} enter();
{whitespace}
. error();
%%
void showToken(char* name){
printf("%d %s %s\n", lineNum, name, yytext);
}
void enter(){
lineNum++;
}
void error(){
printf("%d error %s \n",lineNum,yytext);
}
But it fails for a simple (-! comment !-) input: the file does recognize (-! and !-) but fails to recognize my comment rule. I tried replacing it with comment (^{endcomment}), but that did not work. Any suggestions?
You seem to think that ^ means the following pattern should not match, but outside a character class it anchors the match to the start of a line. Only as the first character inside a character class does ^ negate, and then it negates that set of characters (as in [^*]), not an arbitrary pattern.
As for an alternative: your problem is similar to matching a C comment, /* comment */. The following expression matches a C comment:
"/*"([^*]|"*"+[^/*])*"*"+"/"
Alternatively, and perhaps more intuitively, you can use a sub-automaton (a start condition):
%x comment
%%
"/*" { BEGIN(comment); }
<comment>(.|"\n") { /* Skip */ }
<comment>"*/" { BEGIN(INITIAL); }
%%
I'll leave it as an exercise to apply this to your comment style. Having the three-character sequence !-) as the comment terminator makes the first solution a bit more complicated.
Note that in general the second solution is preferred because it does not need a big buffer. The first solution builds a token containing the complete comment (which can be large), whereas the second solution never needs to buffer more than two characters at a time.
The easiest way to maintain line numbers is to use %option yylineno, as flex will then keep track of them in the variable int yylineno. Alternatively, you can count the number of newlines in yytext. In the second solution you can split the second rule, make a separate case for "\n", and count line numbers there.

parsing bibtex with bison

I am a novice. I want to parse bibtex file using flex/bison. A sample
bibtex is:
#Book{a1,
author="amook",
Title="ASR",
Publisher="oxf",
Year="2010",
Add="UK",
Edition="1",
}
#Article{a2,
Author="Rudra Banerjee",
Title={FeNiMo},
Publisher={P{\"R}B},
Issue="12",
Page="36690",
Year="2011",
Add="UK",
Edition="1",
}
and for parsing this I have written the following code:
%{
#include <stdio.h>
#include <stdlib.h>
%}
%{
char yylval;
int YEAR,i;
//char array_author[1000];
%}
%x author
%x title
%x pub
%x year
%%
#                               printf("\nNEWENTRY\n");
[a-zA-Z][a-zA-Z0-9]*            {printf("%s",yytext);
                                        BEGIN(INITIAL);}
author=                         {BEGIN(author);}
<author>\"[a-zA-Z\/.]+\"        {printf("%s",yytext);
                                        BEGIN(INITIAL);}
year=                           {BEGIN(year);}
<year>\"[0-9]+\"                {printf("%s",yytext);
                                        BEGIN(INITIAL);}
title=                          {BEGIN(title);}
<title>\"[a-zA-Z\/.]+\"         {printf("%s",yytext);
                                        BEGIN(INITIAL);}
publisher=                      {BEGIN(pub);}
<pub>\"[a-zA-Z\/.]+\"           {printf("%s",yytext);
                                        BEGIN(INITIAL);}
[a-zA-Z0-9\/.-]+=        printf("ENTRY TYPE ");
\"                      printf("QUOTE ");
\{                      printf("LCB ");
\}                      printf(" RCB");
;                       printf("SEMICOLON ");
\n                      printf("\n");
%%
int main(){
  yylex();
//char array_author[1000];
//printf("%d%s",&i,array_author[i]);
i++;
return 0;
}
The problem is that I want to separate key and val in different
variables and store it in some place (may be array).
Can I have some insight?
If I'd seen this question a year ago, I would have commented at the time so the question could be improved. The code supplied is not a parser; it is only a set of regular expressions for flex. Scanning an input file for tokens using regular expressions is just one part of building a parser. No grammar or structure for the bibtex file has been defined for bison.
Separating the key and val, if that is all that is required, could be done much more easily with tools like awk and sed than with flex. One thing I'd point out is that the vals always follow an equals sign, which makes them easy to identify without any special syntactic jiggery-pokery.
As we have no information about why the bibtex file needs to be parsed or what the ultimate goal of the exercise is, it's hard to see what the best approach would be.
Edit: This question is a duplicate, as the OP asked it again and it was answered: parse bibtex with flex+bison: revisited

Convert TCHAR array to char array

How to convert to TCHAR[] to char[] ?
Honestly, I don't know how to do it with arrays directly, but for pointers the C runtime provides functions such as wctomb (which converts a single wide character) and wcstombs (which converts a whole string); Microsoft also ships bounds-checked variants, wctomb_s and wcstombs_s. Since an array decays to a pointer when it is passed to a function, you can convert into a char array like this:
// ... your includes
#include <stdlib.h>
// ... your defines
#define MAX_LEN 100
// ... your codes
// No TCHAR array has been defined so far, so create one.
// This assumes a Unicode build, where TCHAR is wchar_t; _T comes from <tchar.h>.
TCHAR c_wText[MAX_LEN] = _T("Hello world!");
// Destination buffer for wcstombs
char c_szText[MAX_LEN];
// The third argument is the maximum number of bytes to write to the destination
wcstombs(c_szText, c_wText, MAX_LEN);
// ... and you're free to use your char array, c_szText
PS: This may not be the best solution, but at least it works.
TCHAR is a Microsoft-specific typedef for either char or wchar_t (a wide character).
Conversion to char depends on which of these it actually is. If TCHAR is actually a char, no conversion is needed and you can simply copy the data; if it is a wchar_t, you'll need a routine to convert between character sets. See the function WideCharToMultiByte().
Why not just use wcstombs_s ?
Here is the code to show how simple it is.
#define MAX_LENGTH 500
...
TCHAR szWideString[MAX_LENGTH];
char szString[MAX_LENGTH];
size_t nNumCharConverted;
wcstombs_s(&nNumCharConverted, szString, MAX_LENGTH,
szWideString, MAX_LENGTH - 1);
It depends on the character set. In an ANSI build, TCHAR is simply char, so no conversion or cast is needed. In a Unicode build, TCHAR is wchar_t, so you have to convert from wchar_t to char, for example with WideCharToMultiByte.
