I would like to write a function which can parse the multiplication of 2 algebraic expressions in GF(2), i.e. any variable in the expression can only take on 2 possible values, 0 or 1, so a^2 = a (0^2 = 0, 1^2 = 1).
As an example, if we expand (a+b)*(a+c) in GF(2), we should get
(a + b)*(a + c) = a^2 + a*b + a*c + b*c = a + a*b + a*c + b*c.
However, I am not sure how to start on parsing the 2 algebraic expressions given as strings. Any suggestion/help is appreciated. Thanks!
I would recommend taking a look at OMeta, by Alex Warth, and/or PetitParser, by Lukas Renggli. Both are excellent frameworks for writing parsers. The first one is for JS, the second for Smalltalk.
Here are a few initial lines of code showing how to write your parser in PetitParser. Every fragment is a method of your own subclass of PPCompositeParser.
constant
    ^ $0 asParser / $1 asParser

variable
    ^ #letter asParser

timesOp
    ^ #blank asParser star , $* asParser , #blank asParser star

sumOp
    ^ #blank asParser star , $+ asParser , #blank asParser star

element
    ^ self constant / self variable

term
    ^ self element , (self timesOp , self element) star
etc.
I'm not saying this is trivial. I'm only saying that this is where I would start. Note also that once you have your grammar in place you might want to subclass it so you can generate more appropriate productions, etc.
Writing parsers for big complicated languages can be hard. But writing parsers for algebraic expressions (GF(2) or otherwise) is pretty easy.
See my SO answer on how to write such parsers easily: Is there an alternative for flex/bison that is usable on 8-bit embedded systems?
The GF(2) bit is about semantic interpretation of what such a formula means. It doesn't matter at all for parsing, which is purely about syntax.
Where meaning comes into play is when you want to interpret the formula.
At some point, you may want to evaluate the expression using values for the variables. To do that, you have to capture the formula as a data structure (usually called an (abstract) syntax tree), and then walk that tree to compute the desired result. That link also discusses how to do that.
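To make that concrete, here is a minimal sketch of mine (not from the linked answer) of the whole pipeline for the GF(2) question: tokenize, build a syntax tree by recursive descent, then walk the tree with values for the variables. All names are illustrative.

import re

def tokenize(src):
    # numbers 0/1, single-letter variables, operators, parentheses
    return re.findall(r"[01]|[a-z]|[+*()]", src)

def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def expr():                 # expr := term ('+' term)*
        nonlocal pos
        node = term()
        while peek() == '+':
            pos += 1
            node = ('+', node, term())
        return node
    def term():                 # term := atom ('*' atom)*
        nonlocal pos
        node = atom()
        while peek() == '*':
            pos += 1
            node = ('*', node, atom())
        return node
    def atom():                 # atom := constant | variable | '(' expr ')'
        nonlocal pos
        tok = peek()
        pos += 1
        if tok == '(':
            node = expr()
            pos += 1            # skip the closing ')'
            return node
        return tok              # a variable or constant leaf
    return expr()

def evaluate(node, env):
    # walk the AST; arithmetic is mod 2, so a^2 = a holds automatically
    if isinstance(node, str):
        return int(node) if node in '01' else env[node]
    op, lhs, rhs = node
    l, r = evaluate(lhs, env), evaluate(rhs, env)
    return (l + r) % 2 if op == '+' else (l * r) % 2

tree = parse(tokenize("(a+b)*(a+c)"))
print(evaluate(tree, {'a': 1, 'b': 0, 'c': 0}))   # prints 1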
If you want to manipulate the formula symbolically, you're in an entirely different ball game. Parsing is still easy, but formula manipulation is not, and you'll want to use tools that are designed to do such symbolic manipulation; they generally define their own parsing machinery (and make it easy to use) to ensure that the captured parse can be manipulated. And of course, you'll have to define what the rules of your symbolic manipulation are.
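That said, for the narrow GF(2) polynomial case in the question, the rules are simple enough that a hand-rolled representation is feasible. A sketch (my own choice of representation, not a tool this answer refers to): a polynomial is a set of monomials, each monomial a frozenset of variable names.

# Using a set of variables per monomial makes a*a = a automatic, and
# XOR between polynomials makes x + x = 0 automatic.
def add(p, q):
    return p ^ q                      # x + x = 0 in GF(2), so XOR the sets

def mul(p, q):
    result = set()
    for m1 in p:
        for m2 in q:
            m = m1 | m2               # union of variables: a*a = a
            result ^= {m}             # duplicate monomials cancel out
    return result

def poly(*monomials):
    return {frozenset(m) for m in monomials}

# (a + b) * (a + c) = a + a*b + a*c + b*c
p = mul(poly("a", "b"), poly("a", "c"))
print(sorted("*".join(sorted(m)) for m in p))  # ['a', 'a*b', 'a*c', 'b*c']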
You can see an example of how to write something pretty close to your needs at Symbolic Algebra with a program transformation system. (This is a tool that my company builds.)
Related
I am getting the user to input a function, e.g. y = 2x^2 + 3, as a string. What I am looking to do is to enter that string into TChart and for TChart to graph the function.
As far as I know, TChart/TeeChart will only accept X values that are assigned values, e.g. -10 to 10 for X, so the X value would need to be calculated each time - this isn't an issue.
The issue is getting each part of the inputted function and substituting the X-values into each part. The workaround I have found is to get the user to enter the degree for each part of the function, e.g. 2 for X^2, 3 for X^3, etc., but is there a cleaner way of doing this?
If I could convert the inputted string into a Mathematical formula which TeeChart would accept, that would be the ideal outcome.
Saying that you can't use external units effectively makes your question unanswerable in the SO format, because the topic is far broader (and deeper) than can comfortably be dealt with in SO's Q&A format. So the following is at best an outline:
If you want to, or have to, write a DIY expression evaluator, one way to do it is to proceed as follows:
Write yourself a class that takes a string as input and snips it up into a series of symbols, aka "tokens", which represent the component parts of the expression, e.g. numbers, operators, parentheses, names of functions, names of variables, etc.; these tokens might themselves be records or class instances and need to include a mechanism for storing values associated with particular symbols (e.g. the tokens that represent numbers in the input). This step is called "tokenisation" or "lexing". Store the resulting symbols in a list or similar structure. This class needs to implement a mechanism to retrieve the next symbol from the list (usually, this method is called something like "NextToken") and indicate whether there are any symbols left. This class also needs a mechanism to "put back" a symbol (or, equivalently, "peek" at the symbol following the current one).
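A minimal sketch of such a tokeniser, in Python rather than Delphi for brevity (the class and method names here are illustrative, not from the article):

import re

# Step 1: snip the input into (kind, value) tokens and let the caller
# walk them with next_token / peek.
TOKEN_RE = re.compile(r"\s*(?:(\d+\.?\d*)|([A-Za-z_]\w*)|([-+*/^()]))")

class Tokenizer:
    def __init__(self, text):
        self.tokens = []
        for num, name, op in TOKEN_RE.findall(text):
            if num:
                self.tokens.append(("number", float(num)))
            elif name:
                self.tokens.append(("name", name))
            else:
                self.tokens.append(("op", op))
        self.pos = 0

    def next_token(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def peek(self):              # look at the next token without consuming it
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("end", None)

t = Tokenizer("2*x^2 + 3")
print([t.next_token() for _ in range(5)])
# [('number', 2.0), ('op', '*'), ('name', 'x'), ('op', '^'), ('number', 2.0)]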
Then, write yourself a software machine which takes the tokenised symbols and "evaluates" the list of symbols to produce the (mathematical) result you're after. This step is an order of magnitude or two more difficult than the tokenisation step. There are numerous ways to do it. As I said in a comment earlier, a recursive descent parser is probably the most tractable approach if you've never done anything like this before. There are countless examples in textbooks, but here's a link to an article about a Delphi implementation that should be understandable as an intro:
http://www8.umoncton.ca/umcm-deslierres_michel/Calcs/ParsingMathExpr-1.html
That article begins by noting that there are numerous pre-existing Delphi expression evaluators but makes the point that they are not necessarily the best place to start for someone wanting to learn how to write an evaluator/parser rather than just use one. Instead it goes through the coding of an evaluator to implement this simple expression grammar:
expression : term | term + term | term − term
term : factor | factor * factor | factor / factor
factor : number | ( expression ) | + factor | − factor
(the vertical bar | denotes ‘or’)
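A direct transcription of that grammar into a recursive descent evaluator might look like this (again Python rather than Delphi, and only a sketch; it assumes a flat list of token strings and does no error checking):

# Each grammar rule becomes one function.
def evaluate(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def eat():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]
    def expression():           # term | term + term | term - term
        value = term()
        while peek() in ("+", "-"):
            if eat() == "+":
                value += term()
            else:
                value -= term()
        return value
    def term():                 # factor | factor * factor | factor / factor
        value = factor()
        while peek() in ("*", "/"):
            if eat() == "*":
                value *= factor()
            else:
                value /= factor()
        return value
    def factor():               # number | ( expression ) | + factor | - factor
        tok = eat()
        if tok == "(":
            value = expression()
            eat()               # consume the closing ")"
            return value
        if tok == "+":
            return factor()
        if tok == "-":
            return -factor()
        return float(tok)
    return expression()

print(evaluate(["1", "+", "3", "*", "(", "2", "-", "4", ")"]))   # -5.0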
The article has a link to a second part which shows how to add exponentiation to the evaluator. This is trickier than it might sound and involves issues of ambiguity: e.g. how to evaluate, and what does it mean to write, an expression like
x^y^z
? This relates to the issue of "associativity": most operators are "left associative", which means that they bind more tightly to what's on their left than to what's on their right. The exponentiation operator is an example of the reverse, where the operator binds more tightly to what's on its right.
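In the evaluator sketch above, that right associativity falls out naturally if the new rule recurses on itself for its right-hand side; e.g. a hypothetical power level slotted between term and factor:

# Sketch: term() would call power() instead of factor(). Because power()
# recurses for its right operand, x^y^z evaluates as x^(y^z).
def power():
    value = factor()
    if peek() == "^":
        eat()
        value = value ** power()    # right recursion = right associativity
    return value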
Have fun!
By the way, you used to see suggestions to implement an evaluator using the "shunting yard algorithm"
http://en.wikipedia.org/wiki/Shunting-yard_algorithm
to convert an "infix" expression, where the operators are between the operands as in 1 + 3 * 4, to RPN (reverse Polish notation), as used on older HP calculators. The reason to do that was that RPN makes for much more efficient evaluation of an expression than the infix equivalent. YMMV, but personally I found that implementing the SY algorithm properly was actually trickier than learning how to write an evaluator in the expression/term/factor style.
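For comparison, a compact sketch of the shunting-yard conversion itself (Python; handles only the left-associative + - * / and parentheses, no error checking):

# Shunting-yard: infix token list -> RPN token list.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_rpn(tokens):
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # pop operators of higher-or-equal precedence (left-associative)
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()                  # discard the "("
        else:
            output.append(tok)           # operand goes straight to output
    while stack:
        output.append(stack.pop())
    return output

print(to_rpn(["1", "+", "3", "*", "4"]))  # ['1', '3', '4', '*', '+']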
FWIW, RPN is the basis of the Forth programming language, http://en.wikipedia.org/wiki/Forth_%28programming_language%29, so you could write a Forth implementation in Delphi if you wanted!
This is a question I've been mildly irritated about for some time and just never got around to searching for the answer to.
However I thought I might at least ask the question and perhaps someone can explain.
Basically many languages I've worked in utilize syntactic sugar to write (using syntax from C++):
int main() {
    int a = 2;
    a += 3; // a = a + 3
}
while in Lua += is not defined, so I would have to write a = a + 3, which again is all about syntactic sugar. When using a more "meaningful" variable name such as bleed_damage_over_time or something, it starts getting tedious to write:
bleed_damage_over_time = bleed_damage_over_time + added_bleed_damage_over_time
instead of:
bleed_damage_over_time += added_bleed_damage_over_time
So I would like to know, not how to solve this (though if you have a nice solution I would of course be interested in hearing it), but rather why Lua doesn't implement this syntactic sugar.
This is just guesswork on my part, but:
1. It's hard to implement this in a single-pass compiler
Lua's bytecode compiler is implemented as a single-pass recursive descent parser that immediately generates code. It does not parse to a separate AST structure and then in a second pass convert that to bytecode.
This forces some limitations on the grammar and semantics. In particular, anything that requires arbitrary lookahead or forward references is really hard to support in this model. This means assignments are already hard to parse. Given something like:
foo.bar.baz = "value"
When you're parsing foo.bar.baz, you don't realize you're actually parsing an assignment until you hit the = after you've already parsed and generated code for that. Lua's compiler has a good bit of complexity just for handling assignments because of this.
Supporting self-assignment would make that even harder. Something like:
foo.bar.baz += "value"
Needs to get translated to:
foo.bar.baz = foo.bar.baz + "value"
But at the point that the compiler hits the =, it's already forgotten about foo.bar.baz. It's possible, but not easy.
2. It may not play nice with the grammar
Lua doesn't actually have any statement or line separators in the grammar. Whitespace is ignored and there are no mandatory semicolons. You can do:
io.write("one")
io.write("two")
Or:
io.write("one") io.write("two")
And Lua is equally happy with both. Keeping a grammar like that unambiguous is tricky. I'm not sure, but self-assignment operators may make that harder.
3. It doesn't play nice with multiple assignment
Lua supports multiple assignment, like:
a, b, c = someFnThatReturnsThreeValues()
It's not even clear to me what it would mean if you tried to do:
a, b, c += someFnThatReturnsThreeValues()
You could limit self-assignment operators to single assignment, but then you've just added a weird corner case people have to know about.
With all of this, it's not at all clear that self-assignment operators are useful enough to be worth dealing with the above issues.
I think you could just rewrite this question as
Why doesn't <languageX> have <featureY> from <languageZ>?
Typically it's a trade-off that the language designers make based on their vision of what the language is intended for, and their goals.
In Lua's case, the language is intended to be an embedded scripting language, so any changes that make the language more complex or potentially make the compiler/runtime even slightly larger or slower may go against this objective.
If you implement each and every tiny feature, you can end up with a 'kitchen sink' language: Ada, anyone?
And as you say, it's just syntactic sugar.
Another reason why Lua doesn't have self-assignment operators is that table access can be overloaded with metatables to have arbitrary side effects. For self-assignment you would need to choose to desugar
foo.bar.baz += 2
into
foo.bar.baz = foo.bar.baz + 2
or into
local tmp = foo.bar
tmp.baz = tmp.baz + 2
The first version runs the __index metamethod for foo twice, while the second one does so only once. Not including self-assignment in the language and forcing you to be explicit helps avoid this ambiguity.
I'm making my own JavaScript-based programming language (yeah, it is crazy, but it's for learning only... maybe?). Well, I'm reading about parsers and the first pass is to convert the source code to tokens, like:
if(x > 5)
    return true;
The tokenizer turns this into:
T_IF "if"
T_LPAREN "("
T_IDENTIFIER "x"
T_GT ">"
T_NUMBER "5"
T_RPAREN ")"
T_IDENTIFIER "return"
T_TRUE "true"
T_TERMINATOR ";"
I don't know if my logic is correct so far. My parser then goes even further (or does it?) and translates that into this (yeah, a multidimensional array):
T_IF "if"
T_EXPRESSION ...
  T_IDENTIFIER "x"
  T_GT ">"
  T_NUMBER "5"
T_CLOSURE ...
  T_IDENTIFIER "return"
  T_TRUE "true"
I have some doubts:
Is my way better or worse than the original way? Note that my code will be read and compiled (translated to another language, like PHP), instead of interpreted all the time.
After I tokenize, what do I need to do exactly? I'm really lost at this step!
Are there some good tutorials to learn how I can do it?
Well, that's it. Bye!
Generally, you want to separate the functions of the tokeniser (also called a lexer) from other stages of your compiler or interpreter. The reason for this is basic modularity: each pass consumes one kind of thing (e.g., characters) and produces another one (e.g., tokens).
So you’ve converted your characters to tokens. Now you want to convert your flat list of tokens to meaningful nested expressions, and this is what is conventionally called parsing. For a JavaScript-like language, you should look into recursive descent parsing. For parsing expressions with infix operators of different precedence levels, Pratt parsing is very useful, and you can fall back on ordinary recursive descent parsing for special cases.
Just to give you a more concrete example based on your case, I’ll assume you can write two functions: accept(token) and expect(token), which test the next token in the stream you’ve created. You’ll make a function for each type of statement or expression in the grammar of your language. Here’s Pythonish pseudocode for a statement() function, for instance:
def statement():
    if accept("if"):
        x = expression()
        y = statement()
        return IfStatement(x, y)
    elif accept("return"):
        x = expression()
        return ReturnStatement(x)
    elif accept("{"):
        xs = []
        while True:
            xs.append(statement())
            if not accept(";"):
                break
        expect("}")
        return Block(xs)
    else:
        error("Invalid statement!")
This gives you what’s called an abstract syntax tree (AST) of your program, which you can then manipulate (optimisation and analysis), output (compilation), or run (interpretation).
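The statement() sketch above leaves expression() undefined. A minimal Pratt-style version (a sketch of mine, with made-up binding powers, operating on a plain token list rather than the accept/expect interface) could look like:

# Binding powers decide how tightly each infix operator grabs its operands;
# same-precedence operators group to the left.
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}

def parse_expression(tokens, min_bp=0):
    # tokens is a list consumed from the front; operands are numbers/names
    left = tokens.pop(0)
    while tokens and tokens[0] in BINDING_POWER:
        op = tokens[0]
        bp = BINDING_POWER[op]
        if bp <= min_bp:
            break
        tokens.pop(0)
        # parse the right side with this operator's binding power
        right = parse_expression(tokens, bp)
        left = (op, left, right)
    return left

print(parse_expression(["x", "+", "5", "*", "y", "-", "2"]))
# ('-', ('+', 'x', ('*', '5', 'y')), '2')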
Most toolkits split the complete process into two separate parts:
lexer (aka. tokenizer)
parser (aka. grammar)
The tokenizer will split the input data into tokens. The parser will only operate on the token "stream" and build the structure.
Your question seems to be focused on the tokenizer. But your second solution mixes the grammar parser and the tokenizer into one step. Theoretically this is also possible, but for a beginner it is much easier to do it the same way as most other tools/frameworks: keep the steps separate.
To your first solution: I would tokenize your example like this:
T_KEYWORD_IF "if"
T_LPAREN "("
T_IDENTIFIER "x"
T_GT ">"
T_LITERAL "5"
T_RPAREN ")"
T_KEYWORD_RET "return"
T_KEYWORD_TRUE "true"
T_TERMINATOR ";"
In most languages keywords cannot be used as method names, variable names and so on. This is reflected already at the tokenizer level (T_KEYWORD_IF, T_KEYWORD_RET, T_KEYWORD_TRUE).
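The usual way to get that effect is to scan a whole identifier first and then consult a keyword table; a small illustrative sketch (token names taken from the list above):

# Scan an identifier, then check a keyword table, so "if" becomes
# T_KEYWORD_IF while "ifoo" stays a plain T_IDENTIFIER.
KEYWORDS = {"if": "T_KEYWORD_IF", "return": "T_KEYWORD_RET", "true": "T_KEYWORD_TRUE"}

def classify(word):
    return (KEYWORDS.get(word, "T_IDENTIFIER"), word)

print(classify("return"))  # ('T_KEYWORD_RET', 'return')
print(classify("x"))       # ('T_IDENTIFIER', 'x')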
The next level would take this token stream and - by applying a formal grammar - build some data structure (often called an AST - Abstract Syntax Tree) which might look like this:
IfStatement:
  Expression:
    BinaryOperator:
      Operator: T_GT
      LeftOperand:
        IdentifierExpression:
          "x"
      RightOperand:
        LiteralExpression
          5
  IfBlock
    ReturnStatement
      ReturnExpression
        LiteralExpression
          "true"
  ElseBlock (empty)
Implementing the parser is usually done with some framework. Implementing something like that by hand, and efficiently, is usually done at a university in the better part of a semester. So you really should use some kind of framework.
The input for a grammar parser framework is usually a formal grammar in some kind of BNF. Your "if" part might look like this:
IfStatement: T_KEYWORD_IF T_LPAREN Expression T_RPAREN Statement ;
Expression: LiteralExpression | BinaryExpression | IdentifierExpression | ... ;
BinaryExpression: LeftOperand BinaryOperator RightOperand;
....
That's only to get the idea. Parsing a real-world language like JavaScript correctly is not an easy task. But fun.
Is my way better or worse than the original way? Note that my code will be read and compiled (translated to another language, like PHP), instead of interpreted all the time.
What's the original way? There are many different ways to implement languages. I think yours is fine, actually. I once tried to build a language myself that translated to C#, the hack programming language. Many language compilers translate to an intermediate language; it's quite common.
After I tokenize, what do I need to do exactly? I'm really lost at this step!
After tokenizing, you need to parse it. Use some good lexer/parser framework, such as Boost.Spirit, or Coco, or whatever. There are hundreds of them. Or you can implement your own lexer, but that takes time and resources. There are many ways to parse code; I generally rely on recursive descent parsing.
Next you need to do code generation. That's the most difficult part in my opinion. There are tools for that too, but you can do it manually if you want to. I tried to do it in my project, but it was pretty basic and buggy; there's some helpful code here and here.
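To make the code-generation step a bit more concrete: once you have an AST, emitting the target language can be as simple as a tree walk that returns strings. A sketch with invented tuple-shaped nodes and PHP-flavoured output (purely illustrative, not from any framework):

# Code generation as a tree walk; the node shapes here are made up.
def emit(node):
    kind = node[0]
    if kind == "num":
        return str(node[1])
    if kind == "var":
        return "$" + node[1]              # PHP variables start with $
    if kind == "binop":
        _, op, lhs, rhs = node
        return "(%s %s %s)" % (emit(lhs), op, emit(rhs))
    if kind == "assign":
        return emit(node[1]) + " = " + emit(node[2]) + ";"
    raise ValueError("unknown node: %r" % (kind,))

ast = ("assign", ("var", "y"), ("binop", "+", ("var", "x"), ("num", 5)))
print(emit(ast))  # $y = ($x + 5);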
Are there some good tutorials to learn how I can do it?
As I suggested earlier, use tools to do it. There are a lot of pretty good, well-documented parser frameworks. For further information, you can try asking some people who know about this stuff. #DeadMG, over at the Lounge C++, is building a programming language called "Wide". You may try consulting him.
Let's say I have this statement in a programming language:
if (0 < 1) then
    print("Hello")
The lexer will translate it into:
keyword: if
num: 0
op: <
num: 1
keyword: then
keyword: print
string: "Hello"
The parser will then take the information (aka "Token Stream") and make this:
if:
  expression:
    <:
      0, 1
  then:
    print:
      "Hello"
I don't know if this will help or not, but I hope it does.
I need to convert a math formula written in LaTeX style into a C/C++ function.
For example:
y = sin(x)^2 would become something like
double y = sin(x) * sin(x);
or
double y = pow(sin(x), 2);
where x is a variable defined somewhere before.
I mean that it should convert the LaTeX formula to C/C++ syntax, so that if there is a function y = G(x, y)^F(x), it doesn't matter what G(x, y) and F(x) are; it is the programmer's problem to define them. It will just generate
double y = pow(G(x, y), F(x));
When the formula is too complicated, translating it into the C/C++ formula by hand will take some time. Is there any way to do this conversion automatically?
Emacs' built-in calculator calc-mode can do this (and much more). Your examples can be converted like this:
Put the formula in some emacs buffer
$ y = sin(x)^2 $
With the cursor in the formula, activate calc-embedded mode
M-x calc-embedded
Switch the display language to C:
M-x calc-c-language
There you are:
$ y == pow(sin(x), 2) $
Note that it interprets the '=' sign in LaTeX as an equality, which results in '==' for C. The LaTeX equivalent to C's assignment operator '=' would be '\gets'.
More on this topic on Turong's blog
I know the question is too old, but I'll just add a reply anyway, as I think it might help someone else later. The question popped up a lot for me in my searches.
I'm working on a tool that does something similar, in a public git repo
You'll have to put some artificial limitations on your LaTeX input, that's beyond question.
Currently the tool I wrote only supports mul, div, add, sub, sqrt, pow, frac and sum, as those are the only operations I need to handle, and the imposed limitations can be loosened a bit by providing a preprocessor (see preproc.l for a [maybe not-so-good] example) that would clean up the raw LaTeX input.
A mathematical equation, such as the ones in LaTeX, and a C expression are not interchangeable. The former states a relation between two terms, the latter defines an entity that can be evaluated, unambiguously yielding one value. a = b in C means 'take the value in variable b and store it in variable a', whereas in math it means 'in the current context, a and b are equal'. The first describes a computation process, the second describes a static fact. Consequently, the math equation can be reversed: a = b is equivalent to b = a, but doing the same to the C assignment yields something quite different.
To make matters worse, LaTeX formulae only contain the information needed to render the equations; often, this is not enough to capture their meaning.
Of course some LaTeX formulae, like your example, can be converted into C computations, but many others cannot, so any automated way of doing so would only make limited sense.
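For the subset that does convert cleanly, one concrete option (my suggestion, not something this answer relies on) is SymPy, whose LaTeX parser is an optional feature that requires the antlr4 Python runtime to be installed:

# Sketch: parse a LaTeX formula and emit C code with SymPy.
# parse_latex needs the optional antlr4-python3-runtime package, so treat
# this as a starting point to verify, not a guaranteed recipe.
from sympy.parsing.latex import parse_latex
from sympy import ccode

expr = parse_latex(r"\sin(x)^2")
print(ccode(expr))    # pow(sin(x), 2)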
I'm not sure there is a simple answer, because mathematical formulae (in LaTeX documents) are actually ambiguous, so automating their translation to some code requires automating their understanding.
And the MathML standard has, IIRC, two forms of representing formulae (one for displaying, another for computing), and there is some reason for that.
I'm trying to parse a syntax using the Shunting Yard (SY) algorithm. The syntax includes the following commands (there are many, many others though!)
a + b // a and b are numbers
setxy c d //c,d can be numbers
setxy c+d b+a //all numbers
Essentially, setxy is a function but it doesn't expect any function argument separators. This makes it very difficult (impossible?) to do via SY due to the lack of parens and function argument separators.
Any idea if SY can be used to parse a parentheses-less/function argument separator-less function or should I move on to a different parsing algorithm? If so, which one would you recommend?
Thanks!
djs22
Having defined a correct grammar, you can have http://www.antlr.org/ generate a parser for you. Whether it is an appropriate solution depends on your homework "requirements".
At least you can generate it and look inside for some hints.
I don't fully understand what you are trying to do, but perhaps you could use some regex? What are you trying to do? Write a simple command line program?
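If shunting yard keeps fighting you, the fixed arity of setxy makes recursive descent straightforward instead; a sketch (Python, with the grammar guessed from the examples in the question):

# Because setxy always takes exactly two expressions, a recursive descent
# parser needs no parentheses or argument separators: the first expression
# simply ends where the second one begins.
def parse_command(tokens):
    if tokens and tokens[0] == "setxy":
        tokens.pop(0)
        x = parse_sum(tokens)        # first argument
        y = parse_sum(tokens)        # second argument starts right after
        return ("setxy", x, y)
    return parse_sum(tokens)

def parse_sum(tokens):               # operand ('+' operand)*
    node = tokens.pop(0)
    while tokens and tokens[0] == "+":
        tokens.pop(0)
        node = ("+", node, tokens.pop(0))
    return node

print(parse_command(["setxy", "c", "+", "d", "b", "+", "a"]))
# ('setxy', ('+', 'c', 'd'), ('+', 'b', 'a'))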