Floating point getting truncated in bison grammar - parsing

I finally got around to learning the basics of lex and bison. The problem I had was that I was calculating how much money I was going to give to my co-worker for picking up a burrito, and didn't like doing it manually.
For example, a $7.75 burrito + 20% tip can be figured out using 7.75*(1 + 20/100.0). However, I'd rather have the computer just take $7.75 + 20% and do it for me.
So I made this: https://github.com/tlehman/tipcalc
The lexing rules are
%%
\$ return TOKDOLLAR;
\% return TOKPERCENT;
[0-9]+(\.[0-9]+)* yylval=atof(yytext); return NUMBER;
[ \t]+ /* eat whitespace */
[\+\-] return TOKOP;
%%
And the parsing rules are
%%
start:
    dollars TOKOP percentage
    {
        double dollars = $1;
        double percentage = ($3)/(100.0);
        double total = dollars + dollars*percentage;
        printf("debug: dollars = %f\n", dollars);
        printf("debug: percent = %f\n", percentage);
        printf("%.2f", total);
    }
dollars:
    TOKDOLLAR NUMBER
    {
        $$ = (double)$2;
    }
percentage:
    NUMBER TOKPERCENT
    {
        $$ = (double)$1;
    }
%%
The only problem is that dollars is getting handled incorrectly. When I run
$ echo '$7.75 + 20%' | ./tipcalc
I get this output:
debug: dollars = 7.000000
debug: percent = 0.200000
8.40
The dollars value is getting rounded somewhere. I think the rounding is happening after lexing, since percentage seems to work with all the values I threw at it. I can't figure out where it is happening. Any ideas?

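By default, the semantic values passed around by a Bison-generated parser (yylval and the $$ / $n values) have type int. So unless you explicitly tell Bison they are doubles, they will be integers. This includes yylval, so the truncation already happens in the lexer, here: yylval=atof(yytext); The (double) casts in the grammar actions come too late, because the fractional part is already gone by then. Percentage only appeared to work because 20 has no fractional part to lose. To fix it, declare the value type as double, for example with #define YYSTYPE double in the prologue of the grammar (with the same definition visible to the lexer), or with %define api.value.type {double} in recent Bison versions.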
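To make the arithmetic concrete, here is a small Python model of what the generated parser computes. It is not lex/bison code: the function name tip_total and its int_semantic_values flag are made up for this illustration, and int() stands in for the implicit double-to-int conversion that happens when atof()'s result is stored in an int-typed yylval.

```python
def tip_total(dollars_text: str, percent_text: str, int_semantic_values: bool) -> str:
    """Model the grammar's arithmetic, optionally with int-typed semantic values."""
    dollars = float(dollars_text)   # what atof(yytext) returns
    percent = float(percent_text)
    if int_semantic_values:
        # Storing a double in an int drops the fractional part (truncation toward zero).
        dollars = int(dollars)
        percent = int(percent)
    total = dollars + dollars * (percent / 100.0)
    return f"{total:.2f}"


if __name__ == "__main__":
    print(tip_total("7.75", "20", int_semantic_values=True))   # the reported behaviour
    print(tip_total("7.75", "20", int_semantic_values=False))  # with double semantic values
```

With int values the model reproduces the 8.40 from the question. A fractional percentage such as 12.5% would be truncated in the same way, which is why testing with whole-number percentages hid the problem.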

Related

Implement heredocs with trim indent using PEG.js

I'm working on a language similar to Ruby called gaiman, and I'm using PEG.js to generate the parser.
Do you know if there is a way to implement heredocs with proper indentation?
xxx = <<<END
hello
world
END
the output should be:
"hello
world"
I need this because this code doesn't look very nice:
def foo(arg) {
  if arg == "here" then
    return <<<END
xxx
xxx
END
  end
end
this is a function where the user wants to return:
"xxx
xxx"
I would prefer the code to look like this:
def foo(arg) {
  if arg == "here" then
    return <<<END
    xxx
    xxx
    END
  end
end
If I trim all the lines, the user will not be able to use a string with leading spaces when they want to. Does anyone know if PEG.js allows this?
I don't have any code for heredocs yet; I just want to be sure that what I want is possible.
EDIT:
So I've tried to implement heredocs and the problem is that PEG doesn't allow back-references.
heredoc = "<<<" marker:[\w]+ "\n" text:[\s\S]+ marker {
return text.join('');
}
It says that the marker is not defined. As for trimming, I think I can use the location() function.
I don't think that's a reasonable expectation for a parser generator; few if any would be equal to the challenge.
For a start, recognising the here-string syntax is inherently context-sensitive, since the end-delimiter must be a precise copy of the delimiter provided after the <<< token. So you would need a custom lexical analyser, and that means that you need a parser generator which allows you to use a custom lexical analyser. (So a parser generator which assumes you want a scannerless parser might not be the optimal choice.)
Recognising the end of the here-string token shouldn't be too difficult, although you can't do it with a single regular expression. My approach would be to use a custom scanning function which breaks the here-string into a series of lines, concatenating them as it goes until it reaches a line containing only the end-delimiter.
Once you've recognised the text of the literal, all you need to normalise the spaces in the way you want is the column number at which the <<< starts. With that, you can trim each line in the string literal. So you only need a lexical scanner which accurately reports token position. Trimming wouldn't normally be done inside the generated lexical scanner; rather, it would be the associated semantic action. (Equally, it could be a semantic action in the grammar. But it's always going to be code that you write.)
When you trim the literal, you'll need to deal with the cases in which it is impossible, because the user has not respected the indentation requirement. And you'll need to do something with tab characters; getting those right probably means that you'll want a lexical scanner which computes visible column positions rather than character offsets.
I don't know whether peg.js meets those requirements, since I don't use it. (I did look at the documentation, and failed to see any indication as to how you might incorporate a custom scanner function. But that doesn't mean there isn't a way to do it.) I hope that the discussion above at least lets you check the detailed documentation for the parser generator you want to use, and otherwise find a different parser generator which will work for you in this use case.
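The scanning and trimming steps described above can be sketched in a few lines of Python, independent of any parser generator. This is only an illustration under stated assumptions: the function name scan_heredoc is made up, the indentation to remove is taken to be the column of the <<< token, columns are plain character offsets (tabs are not expanded), and a line that is not indented far enough is treated as an error.

```python
def scan_heredoc(source: str, start: int) -> tuple[str, int]:
    """Scan a heredoc whose '<<<' token begins at index `start`.

    Returns (text, end), where `end` is the index just past the end marker.
    """
    if not source.startswith("<<<", start):
        raise ValueError("expected '<<<' at the given offset")
    line_start = source.rfind("\n", 0, start) + 1
    column = start - line_start  # character offset of '<<<' within its line
    header_end = source.find("\n", start)
    if header_end == -1:
        raise ValueError("heredoc header is not followed by a newline")
    marker = source[start + 3:header_end].strip()
    if not marker:
        raise ValueError("missing heredoc marker")

    body = []
    pos = header_end + 1
    while pos <= len(source):
        line_end = source.find("\n", pos)
        if line_end == -1:
            line_end = len(source)
        line = source[pos:line_end]
        if line.strip() == marker:  # a line containing only the end marker
            return "\n".join(body), line_end
        indent = line[:column]
        if indent.strip(" "):  # something other than spaces inside the indent
            raise ValueError(f"line is indented less than column {column}: {line!r}")
        body.append(line[column:])
        pos = line_end + 1
    raise ValueError(f"unterminated heredoc, expected {marker!r}")


if __name__ == "__main__":
    src = "x = <<<END\n    hello\n      world\n    END\n"
    text, end = scan_heredoc(src, src.index("<<<"))
    print(repr(text))
```

Leading spaces beyond the reference column are kept, so the user can still write lines that start with spaces. Using the indentation of the enclosing statement instead of the column of <<< would only change how column is computed.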
Here is an implementation of heredocs in Peggy, the successor to PEG.js, which is no longer maintained. This code was based on a GitHub issue.
heredoc = "<<<" begin:marker "\n" text:($any_char+ "\n")+ _ end:marker (
    &{ return begin === end; }
  / '' { error(`Expected matched marker "${begin}", but marker "${end}" was found`); }
) {
  const loc = location();
  const min = loc.start.column - 1;
  const re = new RegExp(`\\s{${min}}`);
  return text.map(line => {
    return line[0].replace(re, '');
  }).join('\n');
}
any_char = (!"\n" .)
marker_char = (!" " !"\n" .)
marker "Marker" = $marker_char+
_ "whitespace"
  = [ \t\n\r]* { return []; }
EDIT: The grammar above didn't work when other code followed the heredoc. Here is a better grammar:
{ let heredoc_begin = null; }
heredoc = "<<<" beginMarker "\n" text:content endMarker {
  const loc = location();
  const min = loc.start.column - 1;
  const re = new RegExp(`^\\s{${min}}`, 'mg');
  return {
    type: 'Literal',
    value: text.replace(re, '')
  };
}
__ = (!"\n" !" " .)
marker 'Marker' = $__+
beginMarker = m:marker { heredoc_begin = m; }
endMarker = "\n" " "* end:marker &{ return heredoc_begin === end; }
content = $(!endMarker .)*

How to achieve capturing groups in flex lex?

I wanted to match for a string which starts with a '#', then matches everything until it matches the character that follows '#'. This can be achieved using capturing groups like this: #(.)[^(?1)]*(?1)(EDIT this regex is also erroneous). This matches #$foo$, does not match #%bar&, matches first 6 characters of #"foo"bar.
But since flex lex does not support capturing groups, what is the workaround here?
As you say, (f)lex does not support capturing groups, and it certainly doesn't support backreferences.
So there is no simple workaround, but there are workarounds. Here are a few possibilities:
You can read the input one character at a time using the input() function, until you find the matching character (but you have to create your own buffer to store the characters, because characters read by input() are not added to the current token). This is not the most efficient because reading one character at a time is a bit clunky, but it's the only interface that (f)lex offers. (The following snippet assumes you have some kind of expandable stringBuilder; if you are using C++, this would just be replaced with a std::string.)
#.   { StringBuilder sb = string_builder_new();
       int delim = yytext[1];
       for (;;) {
         int next = input();
         if (next == delim) break;
         if (next == EOF) { /* Signal error */ break; }
         string_builder_addchar(sb, next);   /* append to our own buffer */
       }
       yylval = string_builder_release(sb);
       return DELIMITED_STRING;
     }
Even less efficiently, but perhaps more conveniently, you can get (f)lex to accumulate the characters in yytext using yymore(), matching one character at a time in a start condition:
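%x DELIMITED
%%
  int delim;
#.                 { delim = yytext[1]; BEGIN(DELIMITED); }
<DELIMITED>.|\n    { /* yymore() appends to yytext, so the newest character is the last one */
                     if (yytext[yyleng - 1] == delim) {
                       yylval = strndup(yytext, yyleng - 1);  /* drop the closing delimiter */
                       BEGIN(INITIAL);
                       return DELIMITED_STRING;
                     }
                     yymore();
                   }
<DELIMITED><<EOF>> { /* Signal unterminated string error */ }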
The most efficient solution (in (f)lex) is to just write one rule for each possible delimiter. While that's a lot of rules, they could be easily generated with a small script in whatever scripting language you prefer. And, actually, there are not that many rules, particularly if you don't allow alphabetic and non-printing characters to be delimiters. This has the additional advantage that if you want Perl-like parenthetic delimiters (#(Hello) instead of #(Hello(), you can just modify the individual pattern to suit (as I've done below). [Note 1] Since all the actions are the same, it might be easier to use a macro for the action, making it easier to modify.
  /* Ordinary punctuation */
#:[^:]*:    { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
#![^!]*!    { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
#\.[^.]*\.  { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
  /* Matched pairs */
#<[^>]*>    { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
#\[[^]]*]   { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
  /* Trap errors */
#           { /* Report unmatched or invalid delimiter error */ }
If I were writing a script to generate these rules, I would use hexadecimal escapes for all the delimiter characters rather than trying to figure out which ones needed escapes.
Notes:
Perl requires nested balanced parentheses in constructs like that. But you can't do that with regular expressions; if you wanted to reproduce Perl behaviour, you'd need to use some variation on one of the other suggestions. I'll try to revisit this answer later to address that feature.
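As a language-neutral illustration of what all three workarounds compute (remember the character after #, then scan forward to its match), here is a hand-written scanner sketched in Python rather than flex. The name scan_delimited and the PAIRS table are made up for this example, and like the flex rules above it does not handle nested brackets.

```python
# Opening delimiters whose closing character differs (an assumption for this sketch).
PAIRS = {"<": ">", "[": "]", "(": ")", "{": "}"}


def scan_delimited(text: str, pos: int = 0):
    """Match '#', a delimiter, and everything up to the matching delimiter.

    Returns (contents, end) with `end` just past the closing delimiter,
    or None if there is no match at `pos`.
    """
    if pos + 1 >= len(text) or text[pos] != "#":
        return None
    opener = text[pos + 1]
    closer = PAIRS.get(opener, opener)  # same character unless it is a bracket
    end = text.find(closer, pos + 2)
    if end == -1:
        return None  # unterminated: a real lexer would report an error here
    return text[pos + 2:end], end + 1


if __name__ == "__main__":
    for sample in ("#$foo$", "#%bar&", '#"foo"bar', "#<abc>"):
        print(sample, scan_delimited(sample))
```

On the examples from the question this matches #$foo$, rejects #%bar&, and matches the first 6 characters of #"foo"bar.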

using a string in a math equation in Dart

I store various formulas in Postgres and I want to use those formulas in my code. It would look something like this:
var amount = 100;
var formula = '5/105'; // normally something I would fetch from Postgres
var total = amount * formula; // should return 4.76
Is there a way to evaluate the string in this manner?
As far as I'm aware, there isn't a formula solver package developed for Dart yet. (If one exists or gets created after this post, we can edit it into the answer.)
EDIT: Mattia in the comments points out the math_expressions package, which looks pretty robust and easy to use.
There is a way to execute arbitrary Dart code as a string, but it has several problems. A] It's very roundabout and convoluted; B] it becomes a massive security issue; and C] it only works if the Dart is compiled in JIT mode (so in Flutter this means it will only work in debug builds, not release builds).
So the answer is that, unfortunately, you will have to implement it yourself. The good news is that, for simple 4-function arithmetic, this is pretty straightforward, and you can follow a tutorial on writing a calculator app like this one to see how it's done.
Of course, if all your formulas only contain two terms with an operator between them like in your example snippet, it becomes even easier. You can do the whole thing in just a few lines of code:
void main() {
  final amount = 100;
  final formula = '5/105';
  final pattern = RegExp(r'(\d+)([\/+*-])(\d+)');
  final match = pattern.firstMatch(formula);
  if (match == null) {
    // With null safety, firstMatch returns null when the formula doesn't fit.
    throw FormatException('Unsupported formula', formula);
  }
  final value = process(num.parse(match[1]!), match[2]!, num.parse(match[3]!));
  final total = amount * value;
  print(total); // Prints: 4.761904761904762
}

num process(num a, String operator, num b) {
  switch (operator) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    case '/': return a / b;
  }
  throw ArgumentError(operator);
}
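For formulas with more than two terms, the "implement it yourself" route usually means a small recursive-descent evaluator. Here is a rough sketch of the idea, written in Python rather than Dart to keep it short; the function names tokenize and evaluate are made up for this example, and it only covers +, -, *, /, parentheses and unary minus. The structure should carry over to Dart more or less directly.

```python
import re

# Numbers such as 5, 105, 5. or .5, plus the four operators and parentheses.
_TOKEN = re.compile(r"\d+\.?\d*|\.\d+|[-+*/()]")


def tokenize(formula: str) -> list:
    tokens, pos = [], 0
    while pos < len(formula):
        if formula[pos].isspace():
            pos += 1
            continue
        match = _TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"unexpected character {formula[pos]!r}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def evaluate(formula: str) -> float:
    tokens = tokenize(formula)
    index = 0

    def peek():
        return tokens[index] if index < len(tokens) else None

    def advance():
        nonlocal index
        token = peek()
        index += 1
        return token

    def factor():  # number, parenthesised expression, or unary minus
        token = advance()
        if token is None:
            raise ValueError("unexpected end of formula")
        if token == "(":
            value = expression()
            if advance() != ")":
                raise ValueError("expected ')'")
            return value
        if token == "-":
            return -factor()
        if token in {"+", "*", "/", ")"}:
            raise ValueError(f"unexpected token {token!r}")
        return float(token)

    def term():  # factor (('*' | '/') factor)*
        value = factor()
        while peek() in ("*", "/"):
            op, rhs = advance(), factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expression():  # term (('+' | '-') term)*
        value = term()
        while peek() in ("+", "-"):
            op, rhs = advance(), term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expression()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result


if __name__ == "__main__":
    print(round(100 * evaluate("5/105"), 2))
```

Splitting the grammar into expression, term and factor is what gives * and / higher precedence than + and - without any explicit precedence table.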
There are a few packages that can be used to accomplish this:
pub.dev/packages/function_tree
pub.dev/packages/math_expressions
pub.dev/packages/expressions
I used function_tree as follows:
double amount = 100.55;
String formula = '5/105*.5'; // From Postgres
final tax = amount * formula.interpret();
I haven't tried it, but using math_expressions it should look like this:
double amount = 100.55;
String formula = '5/105*.5'; // From Postgres
Parser p = Parser();
// Context is used to evaluate variables, can be empty in this case.
ContextModel cm = ContextModel();
Expression exp = p.parse(formula) * p.parse(amount.toString());
// or..
//Expression exp = p.parse(formula) * Number(amount);
double result = exp.evaluate(EvaluationType.REAL, cm);
// Result: 2.394047619047619
print('Result: ${result}');
Thanks to fkleon for the math_expressions help.

How to make exact decimal computations?

I need to make decimal computations but sometimes the result is not exact.
0.009 + 0.001; // => 0.009999999999999998
How can I work around that?
You can use the decimal package. It lets you compute with decimal numbers without losing precision, unlike operations on double.
Decimal.parse('0.2') + Decimal.parse('0.1'); // => 0.3
Decimal.parse('0.2') returns a new Decimal object that can be handled like num (by the way Decimal is not a num because num cannot be used as superclass or implemented).
To make your code shorter you can define a shortcut for Decimal.parse :
final d = Decimal.parse;
d('0.2') + d('0.1'); // => 0.3
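The same idea is not specific to Dart. As a point of comparison only (this is Python's standard decimal module, not the Dart decimal package), the sketch below builds the values from strings so that they never pass through a binary double; the helper name add_exact is made up for this example.

```python
from decimal import Decimal


def add_exact(a: str, b: str) -> Decimal:
    """Add two decimal literals given as strings, without binary rounding."""
    return Decimal(a) + Decimal(b)


if __name__ == "__main__":
    print(0.1 + 0.2 == 0.3)          # binary doubles cannot represent these exactly
    print(add_exact("0.2", "0.1"))   # decimal arithmetic on the written digits
    print(add_exact("0.009", "0.001"))
```

The important detail in both languages is parsing from the string form: constructing a decimal from an already-rounded double would carry the rounding error along.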

What standard produced hex-encoded characters with an extra "25" at the front?

I'm trying to integrate with ybp.com, a vendor of proprietary software for managing book ordering workflows in large libraries. It keeps feeding me URLs that contain characters encoded with an extra "25" in them. Like this book title:
VOLATILE KNOWING%253a PARENTS%252c TEACHERS%252c AND THE CENSORED STORY OF ACCOUNTABILITY IN AMERICA%2527S PUBLIC SCHOOLS.
The encoded characters in this sample are as follows:
%253a = %3A = a colon
%252c = %2C = a comma
%2527 = %27 = an apostrophe (non-curly)
I need to convert these encodings to a format my internal apps can recognize, and the extra 25 is throwing things off kilter. The final two digits of the hex encoded characters appear to be identical to standard URL encodings, so a brute force method would be to replace "%25" with "%". But I'm leery of doing that because it would be sure to haunt me later when an actual %25 shows up for some reason.
So, what standard is this? Is there an official algorithm for converting values like this to other encodings?
%25 is actually a % character. My guess is that the external website is URLEncoding their output twice accidentally.
If that's the case, it is safe to replace %25 with % (or just URLDecode twice).
The ASCII code 37 (25 in hexadecimal) is %, so the URL encoding of % is %25.
It looks like your data got URL encoded twice: , -> %2C -> %252C
Replacing every %25 with % should not cause any problems: a literal % in the original text would have been encoded twice as well and would arrive as %2525, which the replacement correctly turns back into %25.
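To check the double-encoding theory against the sample title, here is a short Python sketch using the standard urllib.parse functions quote and unquote. The helper name decode_twice is made up for this example, and it assumes the vendor always encodes exactly twice.

```python
from urllib.parse import quote, unquote


def decode_twice(value: str) -> str:
    """Undo two rounds of percent-encoding."""
    return unquote(unquote(value))


if __name__ == "__main__":
    # Encoding a comma twice produces the pattern seen in the vendor's URLs.
    once = quote(",")
    twice = quote(once)
    print(once, twice)

    title = "VOLATILE KNOWING%253a PARENTS%252c TEACHERS%252c AND AMERICA%2527S PUBLIC SCHOOLS"
    print(decode_twice(title))
```

Decoding exactly twice, rather than looping until nothing changes, keeps a legitimate percent sign intact: a literal % arrives as %2525 and comes out as % after two rounds.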
Create a counter that increments one by one for next two characters, and if you found modulus, you go back, assign the previous counter the '%' char and proceed again. Something like this.
char *str, *newstr; // Fill up with some memory before proceeding below..
....
int k = 0, j = 0;
short modulus = 0;
char first = 0, second = 0;
short proceed = 0;
for(k=0,j=0; k<some_size; j++,k++) {
if(str[k] == '%') {
++k; first = str[k];
++k; second = str[k];
proceed = 1;
} else if(modulus == 1) {
modulus = 0;
--j; first = str[k];
++k; second = str[k];
newstr[j] = '%';
proceed = 1;
} else proceed = 0; // Do not do decoding..
if(proceed == 1) {
if(first == '2' && second == '5') {
newstr[j] = '%';
modulus = 1;
......

Resources