Interpreter with a one-register VM - possible to evaluate all math. expressions? - parsing

I'm writing an interpreter. I've done that before but never tried one which can work with expressions like 3 + 4 * 2 / ( 1 − 5 ) ^ 2 ^ 3.
I'm not having a problem with the parsing process, actually it is about my VM which then executes the code.
My goal was a fast interpreter and so I decided not to use a stack-based VM where you would need more than one instruction for a multiplication, for example (push, push, mul)
The "assembly" code for the VM generated by the parser looks as following:
3 + 4 * 2 / ( 1 − 5 ) ^ 2 ^ 3
becomes
sub 1 5
pow result 2
pow result 3
div 2 result
mul 4 result
add 3 result
(The result is correct)
As you can see: Every instruction takes no, one or two arguments. There is the result register which holds the result of the last instruction. And that's it.
Can a VM with a language of this structure and only one register calculate every mathematical expression for example Python or PHP can?
If it is not possible without a stack I'll start over right now!

What do you do about (1 + 2) * (3 + 4), or any other that would require you to calculate more than one intermediate result?

Related

Prime factorization of integers with Maxima

I want to use Maxima to get the prime factorization of a random positive integer, e.g. 12=2^2*3^1.
What I have tried so far:
a:random(20);
aa:abs(a);
fa:ifactors(aa);
ka:length(fa);
ta:1;
pfza: for i:1 while i<=ka do ta:ta*(fa[i][1])^(fa[i][2]);
ta;
This will be implemented in STACK for Moodle as part of a online exercise for students, so the exact implementation will be a little bit different from this, but I broke it down to these 7 lines.
I generate a random number a, make sure that it is a positive integer by using aa=|a|+1 and want to use the ifactors command to get the prime factors of aa. ka tells me the number of pairwise distinct prime factors which I then use for the while loop in pfza. If I let this piece of code run, it returns everything fine, execpt for simplifying ta, that is I don't get ta as a product of primes with some exponents but rather just ta=aa.
I then tried to turn off the simplifier, manually simplifying everything else that I need:
simp:false$
a:random(20);
aa:ev(abs(a),simp);
fa:ifactors(aa);
ka:ev(length(fa),simp);
ta:1;
pfza: for i:1 while i<=ka do ta:ta*(fa[i][1])^(fa[i][2]);
ta;
This however does not compile; I assume the problem is somewhere in the line for pfza, but I don't know why.
Any input on how to fix this? Or another method of getting the factorizing in a non-simplified form?
(1) The for-loop fails because adding 1 to i requires 1 + 1 to be simplified to 2, but simplification is disabled. Here's a way to make the loop work without requiring arithmetic.
(%i10) for f in fa do ta:ta*(f[1]^f[2]);
(%o10) done
(%i11) ta;
2 2 1
(%o11) ((1 2 ) 2 ) 3
Hmm, that's strange, again because of the lack of simplification. How about this:
(%i12) apply ("*", map (lambda ([f], f[1]^f[2]), fa));
2 1
(%o12) 2 3
In general I think it's better to avoid explicit indexing anyway.
(2) But maybe you don't need that at all. factor returns an unsimplified expression of the kind you are trying to construct.
(%i13) simp:true;
(%o13) true
(%i14) factor(12);
2
(%o14) 2 3
I think it's conceptually inconsistent for factor to return an unsimplified, but anyway it seems to work here.

How to implement a simple stack machine?

I've read a few articles about this, and have understood roughly what is most commonly used. The most simple and common method seems to be reverse Polish notation, which is as follows. To evaluate an expression, the first two operands are pushed onto the stack, an instruction is read, like an operation such as addition, that operator operates on the two topmost elements of the stack, the two topmost elements are popped off and the result is pushed onto the stack. However this only seems to work with the examples I've seen, but not in other cases. Here are some examples:
The expression:
(11 * 22 + 33 * 44) / 121
is noted as the following instructions:
push 11
push 22
mult
push 33
push 44
mult
add
push 121
div
I understand this. Likewise another example in another article is given:
The expression:
(12 + 45) * 98
has instructions:
push 98
push 12
push 45
+
*
I understand this too. But suppose in the first example we wanted to reverse the division like the following:
Instead of:
(11 * 22 + 33 * 44) / 121
have:
121 / (11 * 22 + 33 * 44)
Now instead of pushing 121 right at the end, it seems that it needs to be pushed right at the start and look something like this:
push 121
push 11
push 22
mult
push 33
push 44
mult
add
div
As you can see 121 is the first thing pushed rather than the second last in the original example. However because the other parts is in brackets it should be evaluated first, how do you get around this?
I'm also having the same problem with the other examples I've found. How do you order the values and operations correctly? Do you need to look ahead? Or is there a simple way to just walk across the expression and create the instructions? From what I've read this seems to be most simple way of evaluating expressions, and they seem to imply that writing the instructions is really easy as if you can just run across the expression and create the proper reverse polish notation for it. I'm having trouble understanding how to order the values and instructions in the right way.
As we discussed in the comments, order of evaluation has nothing to do with operator precedence. For example, in Java subexpressions are evaluated left-to right. In C, order of evaluation is unspecified. Rather, it has a concept of sequence points, which basically says "Whatever you need to get done, just make sure it happens before this semicolon."
So your reverse-polish notation can just be constructed left to right without needing to reorder for parentheses. If you're not sure how to construct the RPN at all, pick up a book on parsing or compilers. The Dragon book opens with a detailed example of translating an expression to RPN.
Now, there is one instance where evaluation order does matter. In most languages, && and || short-circuit, so we need to avoid evaluating the RHS in some cases. This can be accomplished with a jump instruction.
For example if we have 1 > 2 && 2 < 3, we might emit the following instructions:
push 1
push 2
>
jmp_f lbl0 ; jump if false to label lbl0
push 2
push 3
<
&&
label lbl0
(Of course, your VM may not have all these instructions, but this is just an example of how to conditionally evaluate expressions)

Does GNU FORTH have an editor?

Chapter 3 of Starting FORTH says,
Now that you've made a block "current", you can list it by simply typing the word L. Unlike LIST, L does not want to be proceeded by a block number; instead it lists the current block.
When I run 180 LIST, I get
Screen 180 not modified
0
...
15
ok
But when I run L, I get an error
:30: Undefined word
>>>L<<<
Backtrace:
$7F0876E99A68 throw
$7F0876EAFDE0 no.extensions
$7F0876E99D28 interpreter-notfound1
What am I doing wrong?
Yes, gForth supports an internal (BLOCK) editor. Start gforth
type: use blocked.fb (a demo page)
type: 1 load
type editor
words will show the editor words,
s b n bx nx qx dl il f y r d i t 'par 'line 'rest c a m ok
type 0 l to list screen 0 which describes the editor,
Screen 0 not modified
0 \\ some comments on this simple editor 29aug95py
1 m marks current position a goes to marked position
2 c moves cursor by n chars t goes to line n and inserts
3 i inserts d deletes marked area
4 r replaces marked area f search and mark
5 il insert a line dl delete a line
6 qx gives a quick index nx gives next index
7 bx gives previous index
8 n goes to next screen b goes to previous screen
9 l goes to screen n v goes to current screen
10 s searches until screen n y yank deleted string
11
12 Syntax and implementation style a la PolyFORTH
13 If you don't like it, write a block editor mode for Emacs!
14
15
ok
Creating your own block file
To create your own new block file myblocks.fb
type: use blocked.fb
type: 1 load
type editor
Then
type use myblocks.fb
1 load will show BLOCK #1 (lines 0 till 15. 16 Lines of 64 characters each)
1 t will highlight line 1
Type i this is text to [i]nsert into line 1
After the current BLOCK is edited type flush in order to write BLOCK #1 to the file myblocks.fb
For more information see, gForth Blocks
It turns out these are "Editor Commands" the book says,
For Those Whose EDITOR Doesn't Follow These Rules
The FORTH-79 Standard does not specify editor commands. Your system may use a different editor; if so, check your systems documentation
I don't believe gforth supports an internal editor at all. So L, T, I, P, F, E, D, R are all presumably unsupported.
gforth is well integrated with emacs. In my xemacs here, by default any file called *.fs is considered FORTH source. "C-h m", as usual, gives the available commands.
No, GNU Forth doesn't have an internal editor; I use Vim :)

Functional impact of declaring local variables via function parameters

In writing some one-off Lua code for an answer, I found myself code golfing to fit a function on a single line. While this code did not fit on one line...
foo=function(a,b) local c=bob; some_code_using_c; return c; end
...I realized that I could just make it fit by converting it to:
foo=function(a,b,c) c=bob; some_code_using_c; return c; end
Are there any performance or functional implications of using a function parameter to declare a function-local variable (assuming I know that a third argument will never be passed to the function) instead of using local? Do the two techniques ever behave differently?
Note: I included semicolons in the above for clarity of concept and to aid those who do not know Lua's handling of whitespace. I am aware that they are not necessary; if you follow the link above you will see that the actual code does not use them.
Edit Based on #Oka's answer, I compared the bytecode generated by these two functions, in separate files:
function foo(a,b)
local c
return function() c=a+b+c end
end
function foo(a,b,c)
-- this line intentionally blank
return function() c=a+b+c end
end
Ignoring addresses, the byte code report is identical (except for the number of parameters listed for the function).
You can go ahead and look at the Lua bytecode generated by using luac -l -l -p my_file.lua, comparing instruction sets and register layouts.
On my machine:
function foo (a, b)
local c = a * b
return c + 2
end
function bar (a, b, c)
c = a * b
return c + 2
end
Produces:
function <f.lua:1,4> (4 instructions at 0x80048fe0)
2 params, 4 slots, 0 upvalues, 3 locals, 1 constant, 0 functions
1 [2] MUL 2 0 1
2 [3] ADD 3 2 -1 ; - 2
3 [3] RETURN 3 2
4 [4] RETURN 0 1
constants (1) for 0x80048fe0:
1 2
locals (3) for 0x80048fe0:
0 a 1 5
1 b 1 5
2 c 2 5
upvalues (0) for 0x80048fe0:
function <f.lua:6,9> (4 instructions at 0x800492b8)
3 params, 4 slots, 0 upvalues, 3 locals, 1 constant, 0 functions
1 [7] MUL 2 0 1
2 [8] ADD 3 2 -1 ; - 2
3 [8] RETURN 3 2
4 [9] RETURN 0 1
constants (1) for 0x800492b8:
1 2
locals (3) for 0x800492b8:
0 a 1 5
1 b 1 5
2 c 1 5
upvalues (0) for 0x800492b8:
Not very much difference, is there? If I'm not mistaken, there's just a slightly different declaration location specified for each c, and the difference in the params size, as one might expect.

Multiset Partition Using Linear Arithmetic and Z3

I have to partition a multiset into two sets who sums are equal. For example, given the multiset:
1 3 5 1 3 -1 2 0
I would output the two sets:
1) 1 3 3
2) 5 -1 2 1 0
both of which sum to 7.
I need to do this using Z3 (smt2 input format) and "Linear Arithmetic Logic", which is defined as:
formula : formula /\ formula | (formula) | atom
atom : sum op sum
op : = | <= | <
sum : term | sum + term
term : identifier | constant | constant identifier
I honestly don't know where to begin with this and any advice at all would be appreciated.
Regards.
Here is an idea:
1- Create a 0-1 integer variable c_i for each element. The idea is c_i is zero if element is in the first set, and 1 if it is in the second set. You can accomplish that by saying that 0 <= c_i and c_i <= 1.
2- The sum of the elements in the first set can be written as 1*(1 - c_1) + 3*(1 - c_2) + ... +
3- The sum of the elements in the second set can be written as 1*c1 + 3*c2 + ...
While SMT-Lib2 is quite expressive, it's not the easiest language to program in. Unless you have a hard requirement that you have to code directly in SMTLib2, I'd recommend looking into other languages that have higher-level bindings to SMT solvers. For instance, both Haskell and Scala have libraries that allow you to script SMT solvers at a much higher level. Here's how to solve your problem using the Haskell, for instance: https://gist.github.com/1701881.
The idea is that these libraries allow you to code at a much higher level, and then perform the necessary translation and querying of the SMT solver for you behind the scenes. (If you really need to get your hands onto the SMTLib encoding of your problem, you can use these libraries as well, as they typically come with the necessary API to dump the SMTLib they generate before querying the solver.)
While these libraries may not offer everything that Z3 gives you access to via SMTLib, they are much easier to use for most practical problems of interest.

Resources