Using yyparse() to make a two pass assembler? - parsing

I'm writing an assembler for a custom micro controller I'm working on. I've got the assembler to a point where it will assemble instructions down to binary.
However, I'm now having problems with getting labels to work. Currently, when my assembler encounters a new label, it stores the name of the label and the memory location its referring to. When an instruction references a label, the assembler looks up the label and replaces the label with the appropriate value.
This is fine and dandy, but what if the label is defined after the instruction referencing it? Because of this, I need to have my parser run over the code twice.
Here's what I currently have for my main function:
303 int main(int argc, char* argv[])
304 {
305
306 if(argc < 1 || strcmp(argv[1],"-h")==0 || 0==strcmp(argv[1],"--help"))
307 {
308 //printf("%s\n", usage);
309 return 1;
310 }
311 // redirect stdin to the file pointer
312 int stdin = dup(0);
313 close(0);
314
315 // pass 1 on the file
316 int fp = open(argv[1], O_RDONLY, "r");
317 dup2(fp, 0);
318
319 yyparse();
320
321 lseek(fp, SEEK_SET, 0);
322
323 // pass 2 on the file
324 if(secondPassNeeded)
325 {
326 fp = open(argv[1], O_RDONLY, "r");
327 dup2(fp, 0);
328 yyparse();
329 }
330 close(fp);
331
332 // restore stdin
333 dup2(0, stdin);
334
335 for(int i = 0; i < labels.size(); i++)
336 {
337 printf("Label: %s, Loc: %d\n", labels[i].name.c_str(), labels[i].memoryLoc);
338 }
339 return 0;
340 }
I'm using this inside a flex/bison configuration.

If that is all you need, you don't need a full two-pass assembler. If the label is not defined when you reference it, you simply output a stand-in address (say 0x0000) and have a data structure that lists all of the places with forward references and what symbol they refered to. At the end of the file (or block if you have local symbols), you simply go through that list and patch the addresses.

Related

Bison won't return correct tokens [duplicate]

flex code:
1 %option noyywrap nodefault yylineno case-insensitive
2 %{
3 #include "stdio.h"
4 #include "tp.tab.h"
5 %}
6
7 %%
8 "{" {return '{';}
9 "}" {return '}';}
10 ";" {return ';';}
11 "create" {return CREATE;}
12 "cmd" {return CMD;}
13 "int" {yylval.intval = 20;return INT;}
14 [a-zA-Z]+ {yylval.strval = yytext;printf("id:%s\n" , yylval.strval);return ID;}
15 [ \t\n]
16 <<EOF>> {return 0;}
17 . {printf("mistery char\n");}
18
bison code:
1 %{
2 #include "stdlib.h"
3 #include "stdio.h"
4 #include "stdarg.h"
5 void yyerror(char *s, ...);
6 #define YYDEBUG 1
7 int yydebug = 1;
8 %}
9
10 %union{
11 char *strval;
12 int intval;
13 }
14
15 %token <strval> ID
16 %token <intval> INT
17 %token CREATE
18 %token CMD
19
20 %type <strval> col_definition
21 %type <intval> create_type
22 %start stmt_list
23
24 %%
25 stmt_list:stmt ';'
26 | stmt_list stmt ';'
27 ;
28
29 stmt:create_cmd_stmt {/*printf("create cmd\n");*/}
30 ;
31
32 create_cmd_stmt:CREATE CMD ID'{'create_col_list'}' {printf("%s\n" , $3);}
33 ;
34 create_col_list:col_definition
35 | create_col_list col_definition
36 ;
37
38 col_definition:create_type ID ';' {printf("%d , %s\n" , $1, $2);}
39 ;
40
41 create_type:INT {$$ = $1;}
42 ;
43
44 %%
45 extern FILE *yyin;
46
47 void
48 yyerror(char *s, ...)
49 {
50 extern yylineno;
51 va_list ap;
52 va_start(ap, s);
53 fprintf(stderr, "%d: error: ", yylineno);
54 vfprintf(stderr, s, ap);
55 fprintf(stderr, "\n");
56 }
57
58 int main(int argc , char *argv[])
59 {
60 yyin = fopen(argv[1] , "r");
61 if(!yyin){
62 printf("open file %s failed\n" ,argv[1]);
63 return -1;
64 }
65
66 if(!yyparse()){
67 printf("parse work!\n");
68 }else{
69 printf("parse failed!\n");
70 }
71
72 fclose(yyin);
73 return 0;
74 }
75
test input file:
create cmd keeplive
{
int a;
int b;
};
test output:
root#VM-Ubuntu203001:~/test/tpp# ./a.out t1.tp
id:keeplive
id:a
20 , a;
id:b
20 , b;
keeplive
{
int a;
int b;
}
parse work!
I have two questions:
1) Why does the action at line 38 print the token ';'? For instance, "20 , a;" and "20 , b;"
2) Why does the action at line 32 print "keeplive
{
int a;
int b;
}" instead of simply "keeplive"?
Short answer:
yylval.strval = yytext;
You can't use yytext like that. The string it points to is private to the lexer and will change as soon as the flex action finishes. You need to do something like:
yylval.strval = strdup(yytext);
and then you need to make sure you free the memory afterwards.
Longer answer:
yytext is actually a pointer into the buffer containing the input. In order to make yytext work as though it were a NUL-terminated string, the flex framework overwrites the character following the token with a NUL before it does the action, and then replaces the original character when the action terminates. So strdup will work fine inside the action, but outside the action (in your bison code), you now have a pointer to the part of the buffer starting with the token. And it gets worse later, since flex will read the next part of the source into the same buffer, and now your pointer is to random garbage. There are several possible scenarios, depending on flex options, but none of them are pretty.
So the golden rule: yytext is only valid until the end of the action. If you want to keep it, copy it, and then make sure you free the storage for the copy when you no longer need it.
In almost all the lexers I've written, the ID token actually finds the identifier in a symbol table (or puts it there) and returns a pointer into the symbol table, which simplifies memory management. But you still have essentially the same memory management issue with, for example, character string literals.

Why is $1 matching the whole grouping and not its token? [duplicate]

flex code:
1 %option noyywrap nodefault yylineno case-insensitive
2 %{
3 #include "stdio.h"
4 #include "tp.tab.h"
5 %}
6
7 %%
8 "{" {return '{';}
9 "}" {return '}';}
10 ";" {return ';';}
11 "create" {return CREATE;}
12 "cmd" {return CMD;}
13 "int" {yylval.intval = 20;return INT;}
14 [a-zA-Z]+ {yylval.strval = yytext;printf("id:%s\n" , yylval.strval);return ID;}
15 [ \t\n]
16 <<EOF>> {return 0;}
17 . {printf("mistery char\n");}
18
bison code:
1 %{
2 #include "stdlib.h"
3 #include "stdio.h"
4 #include "stdarg.h"
5 void yyerror(char *s, ...);
6 #define YYDEBUG 1
7 int yydebug = 1;
8 %}
9
10 %union{
11 char *strval;
12 int intval;
13 }
14
15 %token <strval> ID
16 %token <intval> INT
17 %token CREATE
18 %token CMD
19
20 %type <strval> col_definition
21 %type <intval> create_type
22 %start stmt_list
23
24 %%
25 stmt_list:stmt ';'
26 | stmt_list stmt ';'
27 ;
28
29 stmt:create_cmd_stmt {/*printf("create cmd\n");*/}
30 ;
31
32 create_cmd_stmt:CREATE CMD ID'{'create_col_list'}' {printf("%s\n" , $3);}
33 ;
34 create_col_list:col_definition
35 | create_col_list col_definition
36 ;
37
38 col_definition:create_type ID ';' {printf("%d , %s\n" , $1, $2);}
39 ;
40
41 create_type:INT {$$ = $1;}
42 ;
43
44 %%
45 extern FILE *yyin;
46
47 void
48 yyerror(char *s, ...)
49 {
50 extern yylineno;
51 va_list ap;
52 va_start(ap, s);
53 fprintf(stderr, "%d: error: ", yylineno);
54 vfprintf(stderr, s, ap);
55 fprintf(stderr, "\n");
56 }
57
58 int main(int argc , char *argv[])
59 {
60 yyin = fopen(argv[1] , "r");
61 if(!yyin){
62 printf("open file %s failed\n" ,argv[1]);
63 return -1;
64 }
65
66 if(!yyparse()){
67 printf("parse work!\n");
68 }else{
69 printf("parse failed!\n");
70 }
71
72 fclose(yyin);
73 return 0;
74 }
75
test input file:
create cmd keeplive
{
int a;
int b;
};
test output:
root#VM-Ubuntu203001:~/test/tpp# ./a.out t1.tp
id:keeplive
id:a
20 , a;
id:b
20 , b;
keeplive
{
int a;
int b;
}
parse work!
I have two questions:
1) Why does the action at line 38 print the token ';'? For instance, "20 , a;" and "20 , b;"
2) Why does the action at line 32 print "keeplive
{
int a;
int b;
}" instead of simply "keeplive"?
Short answer:
yylval.strval = yytext;
You can't use yytext like that. The string it points to is private to the lexer and will change as soon as the flex action finishes. You need to do something like:
yylval.strval = strdup(yytext);
and then you need to make sure you free the memory afterwards.
Longer answer:
yytext is actually a pointer into the buffer containing the input. In order to make yytext work as though it were a NUL-terminated string, the flex framework overwrites the character following the token with a NUL before it does the action, and then replaces the original character when the action terminates. So strdup will work fine inside the action, but outside the action (in your bison code), you now have a pointer to the part of the buffer starting with the token. And it gets worse later, since flex will read the next part of the source into the same buffer, and now your pointer is to random garbage. There are several possible scenarios, depending on flex options, but none of them are pretty.
So the golden rule: yytext is only valid until the end of the action. If you want to keep it, copy it, and then make sure you free the storage for the copy when you no longer need it.
In almost all the lexers I've written, the ID token actually finds the identifier in a symbol table (or puts it there) and returns a pointer into the symbol table, which simplifies memory management. But you still have essentially the same memory management issue with, for example, character string literals.

SIGBUS crash on Solaris 8

Compiled with g++ 4.7.4 on Solaris 8. 32 bit application. Stack trace is
Core was generated by `./z3'.
Program terminated with signal 10, Bus error.
\#0 0x012656ec in vector<unsigned long long, false, unsigned int>::push_back (this=0x2336ef4 <g_prime_generator>, elem=#0xffbff1f0: 2) at ../src/util/vector.h:284
284 new (m_data + reinterpret_cast<SZ *>(m_data)[SIZE_IDX]) T(elem);
(gdb) bt
\#0 0x012656ec in vector<unsigned long long, false, unsigned int>::push_back (this=0x2336ef4 <g_prime_generator>, elem=#0xffbff1f0: 2) at ../src/util/vector.h:284
\#1 0x00ae66d4 in prime_generator::prime_generator (this=0x2336ef4 <g_prime_generator>) at ../src/util/prime_generator.cpp:24
\#2 0x00ae714c in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at ../src/util/prime_generator.cpp:99
\#3 0x00ae71c4 in _GLOBAL__sub_I_prime_generator.cpp(void) () at ../src/util/prime_generator.cpp:130
\#4 0x00b16a68 in __do_global_ctors_aux ()
\#5 0x00b16aa0 in _init ()
\#6 0x00640b10 in _start ()
(gdb) list
279
280 void push_back(T const & elem) {
281 if (m_data == 0 || reinterpret_cast<SZ *>(m_data)[SIZE_IDX] == reinterpret_cast<SZ *>(m_data)[CAPACITY_IDX]) {
282 expand_vector();
283 }
284 new (m_data + reinterpret_cast\<Z *>(m_data)[SIZE_IDX]) T(elem);
285 reinterpret_cast<SZ *>(m_data)[SIZE_IDX]++;
286 }
287
288 void insert(T const & elem) {
(gdb) ptype SZ
type = unsigned int
(gdb) ptype m_data
type = unsigned long long *
SIGBUS on Solaris is usually indicative of a misaligned access, but I am not sure if it is due to the casting going on an endianess issue
The SPARC data alignment requirements is most likely at issue.
The m_data field in the vector class is off by two fields that are used
to store the size and capacity of a vector.
You can debug this by displaying (printing or using the debugger) the pointer m_data and it's alignment.
One option is to supply a separate vector implementation
where the size and capacity fields are stored
in fields directly in the vector for porting this library utility.
Z3 interacts with memory alignment a few other places (but not overly many).
The main other potential places are in the watch lists (sat_solver and smt_context), and region memory allocators (region.h) and possibly in hash tables.

Golang append memory allocation VS. STL push_back memory allocation

I compared the Go append function and the STL vector.push_back and found that different memory allocation strategy which confused me. The code is as follow:
// CPP STL code
void getAlloc() {
vector<double> arr;
int s = 9999999;
int precap = arr.capacity();
for (int i=0; i<s; i++) {
if (precap < i) {
arr.push_back(rand() % 12580 * 1.0);
precap = arr.capacity();
printf("%d %p\n", precap, &arr[0]);
} else {
arr.push_back(rand() % 12580 * 1.0);
}
}
printf("\n");
return;
}
// Golang code
func getAlloc() {
arr := []float64{}
size := 9999999
pre := cap(arr)
for i:=0; i<size; i++ {
if pre < i {
arr = append(arr, rand.NormFloat64())
pre = cap(arr)
log.Printf("%d %p\n", pre, &arr)
} else {
arr = append(arr, rand.NormFloat64())
}
}
return;
}
But the memory address is invarient to the increment of size expanding, this really confused me.
By the way, the memory allocation strategy is different in this two implemetation (STL VS. Go), I mean the expanding size. Is there any advantage or disadvantage? Here is the simplified output of code above[size and first element address]:
Golang CPP STL
2 0xc0800386c0 2 004B19C0
4 0xc0800386c0 4 004AE9B8
8 0xc0800386c0 6 004B29E0
16 0xc0800386c0 9 004B2A18
32 0xc0800386c0 13 004B2A68
64 0xc0800386c0 19 004B2AD8
128 0xc0800386c0 28 004B29E0
256 0xc0800386c0 42 004B2AC8
512 0xc0800386c0 63 004B2C20
1024 0xc0800386c0 94 004B2E20
1280 0xc0800386c0 141 004B3118
1600 0xc0800386c0 211 004B29E0
2000 0xc0800386c0 316 004B3080
2500 0xc0800386c0 474 004B3A68
3125 0xc0800386c0 711 004B5FD0
3906 0xc0800386c0 1066 004B7610
4882 0xc0800386c0 1599 004B9768
6102 0xc0800386c0 2398 004BC968
7627 0xc0800386c0 3597 004C1460
9533 0xc0800386c0 5395 004B5FD0
11916 0xc0800386c0 8092 004C0870
14895 0xc0800386c0 12138 004D0558
18618 0xc0800386c0 18207 004E80B0
23272 0xc0800386c0 27310 0050B9B0
29090 0xc0800386c0 40965 004B5FD0
36362 0xc0800386c0 61447 00590048
45452 0xc0800386c0 92170 003B0020
56815 0xc0800386c0 138255 00690020
71018 0xc0800386c0 207382 007A0020
....
UPDATE:
See comments for Golang memory allocation strategy.
For STL, the strategy depends on the implementation. See this post for further information.
Your Go and C++ code fragments are not equivalent. In the C++ function, you are printing the address of the first element in the vector, while in the Go example you are printing the address of the slice itself.
Like a C++ std::vector, a Go slice is a small data type that holds a pointer to an underlying array that holds the data. That data structure has the same address throughout the function. If you want the address of the first element in the slice, you can use the same syntax as in C++: &arr[0].
You're getting the pointer to the slice header, not the actual backing array. You can think of the slice header as a struct like
type SliceHeader struct {
len,cap int
backingArray unsafe.Pointer
}
When you append and the backing array is reallocated, the pointer backingArray will likely be changed (not necessarily, but probably). However, the location of the struct holding the length, cap, and pointer to the backing array doesn't change -- it's still on the stack right where you declared it. Try printing &arr[0] instead of &arr and you should see behavior closer to what you expect.
This is pretty much the same behavior as std::vector, incidentally. Think of a slice as closer to a vector than a magic dynamic array.

why is code not executing on return from Future in Dart program

Could someone please explain to me why in the following code (using r25630 Windows), the value of iInsertTot at line 241 is null, or more to the point, why is line 234 ("return iInsertTot;") not executed and therefore at line 241, iInsertTot is null. The value of iInsertTot at lines 231/232 is an integer. While I can and probably should code this differently, I thought that I would try and see if it worked, because my understanding of Futures and Chaining was that it would work. I have used “return” in a similar way before and it worked, but I was returning null in those cases (eg. line 201 below).
/// The problem lines are :
233 fUpdateTotalsTable().then((_) {
234 return iInsertTot;
235 });
While running in the debugger, it appears that line 234 “return iInsertTot;” is never actually executed. Running from command line has the same result.
The method being called on line 233 (fUpdateTotalsTable) is something I am just in the process of adding, and it consists basically of sync code at this stage. However, the debugger appears to go through it correctly.
I have included the method “fUpdateTotalsTable()” (line 1076) just in case that is causing a problem.
Lines 236 to 245 have just been added, however just in case that code is invalid I have commented those lines out and run with the same problem occurring.
218 /*
219 * Process Inserts
220 */
221 }).then((_) {
222 sCheckpoint = "fProcessMainInserts";
223 ogPrintLine.fPrintForce ("Processing database ......");
224 int iMaxInserts = int.parse(lsInput[I_MAX_INSERTS]);
225 print ("");
226 return fProcessMainInserts(iMaxInserts, oStopwatch);
227 /*
228 * Update the 'totals' table with the value of Inserts
229 */
230 }).then((int iReturnVal) {
231 int iInsertTot = iReturnVal;
232 sCheckpoint = "fUpdateTotalsTable (insert value)";
233 fUpdateTotalsTable().then((_) {
234 return iInsertTot;
235 });
236 /*
237 * Display totals for inserts
238 */
239 }).then((int iInsertTot) {
240 ogTotals.fPrintTotals(
241 "${iInsertTot} rows inserted - Inserts completed",
242 iInsertTot, oStopwatch.elapsedMilliseconds);
243
244 return null;
245 /*
192 /*
193 * Clear main table if selected
194 */
195 }).then((tReturnVal) {
196 if (tReturnVal)
197 ogPrintLine.fPrintForce("Random Keys Cleared");
198 sCheckpoint = "Clear Table ${S_TABLE_NAME}";
199 bool tClearTable = (lsInput[I_CLEAR_YN] == "y");
200 if (!tFirstInstance)
201 return null;
202 return fClearTable(tClearTable, S_TABLE_NAME);
203
204 /*
205 * Update control row to increment count of instances started
206 */
207 }).then((_) {
1073 /*
1074 * Update totals table with values from inserts and updates
1075 */
1076 async.Future<bool> fUpdateTotalsTable() {
1077 async.Completer<bool> oCompleter = new async.Completer<bool>();
1078
1079 String sCcyValue = ogCcy.fCcyIntToString(ogTotals.iTotAmt);
1080
1081 print ("\n********* Total = ${sCcyValue} \n");
1082
1083 oCompleter.complete(true);
1084 return oCompleter.future;
1085 }
Your function L230-235 does not return anything and that's why your iInsertTot is null L239. To make it work you have to add a return at line 233.
231 int iInsertTot = iReturnVal;
232 sCheckpoint = "fUpdateTotalsTable (insert value)";
233 return fUpdateTotalsTable().then((_) {
234 return iInsertTot;
235 });

Resources