I often hear that using a symbol table optimizes lookups of symbols in a programming language. Currently, my language is implemented only as an interpreter, not as a compiler. I do not yet want to allocate the time to build a compiler, so I'm attempting to optimize the interpreter. The language is based on Scheme semantics and syntax for the most part, and is statically scoped. I use the AST for executing code at run-time (in my interpreter, it is implemented as discriminated unions, just like the AST in Write Yourself a Scheme in 48 Hours).
Unfortunately, symbol lookup in my interpreter is slow due to the use of an F# Map to contain and look up symbols by name. (Well, in truth, it uses a Trie, but the performance is similarly problematic.) I would like to use a symbol table instead to achieve faster symbol lookup. However, I don't know if or how one can implement symbol tables in an interpreter. I hear about them only in the context of a compiler.
Is this possible? If the implementation strategy or performance differs from a symbol table in a compiler, could you describe the differences? Finally, is there an existing reference implementation of a symbol table in an interpreter I might look at?
Thank you!
A symbol table associates some information with every symbol. In an interpreter, you would perhaps associate values with symbols. Map is one implementation particularly suitable for functional interpreters.
If you want to optimize your interpreter, get rid of the need for a symbol table at runtime. One way to go is De Bruijn indexing.
There is also nice literature on mechanically deriving optimized interpreters, VMs and compilers from a functional interpreter, for example:
http://www.brics.dk/RS/03/14/BRICS-RS-03-14.pdf
For a simple example, consider lambda calculus with constants encoded with De Bruijn indices. Notice that the evaluator gets by without a symbol table, because it can use integers for lookup.
type exp =
    | App of exp * exp
    | Const of int
    | Fn of exp
    | Var of int

type value =
    | Closure of exp * env
    | Number of int

and env = value []

let lookup env i = Array.get env i
let extend value env = Array.append [| value |] env
let empty () : env = Array.empty

let eval exp =
    let rec eval env exp =
        match exp with
        | App (f, x) ->
            match eval env f with
            | Closure (bodyF, envF) ->
                let vx = eval env x
                eval (extend vx envF) bodyF
            | _ -> failwith "?"
        | Const x -> Number x
        | Fn e -> Closure (e, env)
        | Var x -> lookup env x
    eval (empty ()) exp
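The evaluator above assumes that names have already been replaced by indices. A minimal sketch of that resolution step follows, written in Haskell (which this page also uses further down) rather than F#. The Named type, the resolve function and the error message are assumptions made for illustration; they are not part of the answer's code. Index 0 refers to the innermost binder, which matches an environment that is extended at the front.

```haskell
import Data.List (elemIndex)

-- Surface syntax with named variables (hypothetical; the answer above
-- starts from terms that already use indices).
data Named
  = NVar String
  | NFn String Named
  | NApp Named Named
  | NConst Int

-- Core syntax, mirroring the F# exp type above.
data Exp
  = Var Int
  | Fn Exp
  | App Exp Exp
  | Const Int
  deriving (Eq, Show)

-- Resolve every name to the position of its binder, once, before evaluation.
-- The scope list holds the binders currently in force, innermost first, so
-- elemIndex finds the nearest enclosing binder and shadowing works.
resolve :: [String] -> Named -> Either String Exp
resolve scope term = case term of
  NVar x ->
    maybe (Left ("unbound variable: " ++ x)) (Right . Var) (elemIndex x scope)
  NFn x body -> Fn <$> resolve (x : scope) body
  NApp f a -> App <$> resolve scope f <*> resolve scope a
  NConst n -> Right (Const n)
```

For example, resolve [] (NFn "x" (NFn "y" (NApp (NVar "x") (NVar "y")))) gives Right (Fn (Fn (App (Var 1) (Var 0)))). All name lookups happen in this pass, so the evaluator only ever indexes into its environment.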
I looked at how to make a tree from a given data with F# and https://citizen428.net/blog/learning-fsharp-binary-search-tree/
Basically, what I am attempting to do is implement a function for building an extremely simple AST, using discriminated unions (DU) to represent the tree.
I want to use tokens/symbols to build the tree. I think these could also be represented by DU. I am struggling to implement the insert function.
Let's just say we use the following to represent the tree. The basic idea is that for addition and subtraction of integers I'll only need a binary tree. The Expression could be either an operator or a constant. This might be the wrong way of implementing the tree, but I'm not sure.
type Tree =
| Node of Tree * Expression * Tree
| Empty
and Expression =
| Operator //could be a token or another type
| Constant of int
And let's use the following for representing tokens. There's probably a smarter way of doing this. This is just an example.
type Token =
| Integer
| Add
| Subtract
How should I implement the insert function? I've written the function below and tried different ways of inserting elements.
let rec insert tree element =
    match element, tree with
    //use Empty to initialize
    | x, Empty -> Node(Empty, x, Empty)
    | x, Node(Empty,y,Empty) when (*x is something here*) -> Node((*something*))
    | _, _ -> failwith "Missing case"
If you have any advice, or maybe a link, I would appreciate it.
I think that thinking about the problem in terms of tree insertion is not very helpful, because what you really want to do is to parse a sequence of tokens. So, a plain tree insertion is not very useful. You instead need to construct the tree (expression) in a more specific way.
For example, say I have:
let input = [Integer 1; Add; Integer 2; Subtract; Integer 1;]
Say I want to parse this sequence of tokens to get a representation of 1 + (2 - 1). This groups the operators to the right, which is not how arithmetic is normally read, but it makes the idea easier to explain.
My approach would be to define a recursive Expression type rather than using a general tree:
type Token =
| Integer of int
| Add
| Subtract
type Operator =
| AddOp | SubtractOp
type Expression =
| Binary of Operator * Expression * Expression
| Constant of int
To parse a sequence of tokens, you can write something like:
let rec parse input =
    match input with
    | Integer i::Add::rest ->
        Binary(AddOp, Constant i, parse rest)
    | Integer i::Subtract::rest ->
        Binary(SubtractOp, Constant i, parse rest)
    | Integer i::[] ->
        Constant i
    | _ -> failwith "Unexpected token"
This looks for lists starting with Integer i; Add; ... or similar with subtract and constructs a tree recursively. Using the above input, you get:
> parse input;;
val it : Expression =
Binary (AddOp, Constant 1,
Binary (SubtractOp, Constant 2, Constant 1))
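The answer notes that its result nests to the right. As a hedged sketch of one way to get the conventional left-nested tree, the version below carries the tree built so far and folds each operator into it. It is written in Haskell rather than F#, the helper name go is made up for illustration, and it returns Either instead of raising an exception; none of this comes from the answer itself.

```haskell
data Token = Integer Int | Add | Subtract
  deriving (Eq, Show)

data Operator = AddOp | SubtractOp
  deriving (Eq, Show)

data Expression
  = Binary Operator Expression Expression
  | Constant Int
  deriving (Eq, Show)

-- Parse "n (op n)*". Each operator wraps the tree built so far, so the
-- result nests to the left: 1 + 2 - 1 becomes (1 + 2) - 1.
parse :: [Token] -> Either String Expression
parse (Integer n : rest) = go (Constant n) rest
  where
    go acc [] = Right acc
    go acc (Add : Integer m : ts) = go (Binary AddOp acc (Constant m)) ts
    go acc (Subtract : Integer m : ts) = go (Binary SubtractOp acc (Constant m)) ts
    go _ _ = Left "Unexpected token"
parse _ = Left "Unexpected token"
```

With the same token sequence as above, parse [Integer 1, Add, Integer 2, Subtract, Integer 1] gives Right (Binary SubtractOp (Binary AddOp (Constant 1) (Constant 2)) (Constant 1)).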
I'm taking a Haskell course at school, and I have to define a logical proposition datatype in Haskell. Everything so far works fine (definition and functions), and I've declared it as an instance of Ord, Eq and Show. The problem comes when I'm required to define a program which interacts with the user: I have to parse the input from the user into my datatype:
type Var = String
data FProp = V Var
           | No FProp
           | Y FProp FProp
           | O FProp FProp
           | Si FProp FProp
           | Sii FProp FProp
where the formula: ¬q ^ p would be: (Y (No (V "q")) (V "p"))
I've been researching, and found that I can declare my datatype as an instance of Read.
Is this advisable? If it is, can I get some help in order to define the parsing method?
Not a complete answer, since this is a homework problem, but here are some hints.
The other answer suggested getLine followed by splitting at words. It sounds like you instead want something more like a conventional tokenizer, which would let you write things like:
(Y
(No (V q))
(V p))
Here’s one implementation that turns a string into tokens that are either a string of alphanumeric characters or a single, non-alphanumeric printable character. You would need to extend it to support quoted strings:
import Data.Char

type Token = String

tokenize :: String -> [Token]
{- Here, a token is either a string of alphanumeric characters, or else one
 - non-spacing printable character, such as "(" or ")".
 -}
tokenize [] = []
tokenize (x:xs) | isSpace x = tokenize xs
                | not (isPrint x) = error $
                    "Invalid character " ++ show x ++ " in input."
                | not (isAlphaNum x) = [x]:(tokenize xs)
                | otherwise = let (token, rest) = span isAlphaNum (x:xs)
                              in token:(tokenize rest)
It turns the example into ["(","Y","(","No","(","V","q",")",")","(","V","p",")",")"]. Note that you have access to the entire repertoire of Unicode.
The main function that evaluates this interactively might look like:
main = interact ( unlines . map show . map evaluate . parse . tokenize )
Where parse turns a list of tokens into a list of ASTs and evaluate turns an AST into a printable expression.
As for implementing the parser, your language appears to have similar syntax to LISP, which is one of the simplest languages to parse; you don’t even need precedence rules. A recursive-descent parser could do it, and is probably the easiest to implement by hand. You can pattern-match on parse ("(":xs) =, but pattern-matching syntax can also implement lookahead very easily, for example parse ("(":x1:xs) = to look ahead one token.
If you’re calling the parser recursively, you would define a helper function that consumes only a single expression, and that has a type signature like :: [Token] -> (AST, [Token]). This lets you parse the inner expression, check that the next token is ")", and proceed with the parse. However, externally, you’ll want to consume all the tokens and return an AST or a list of them.
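A rough sketch of that helper-plus-wrapper shape follows, assuming the fully parenthesized prefix syntax from the example above. The names parseOne, parseAll, closeWith and binary are invented for illustration, and the result is wrapped in Either so that errors can be reported, which is a small variation on the (AST, [Token]) signature mentioned above. Treat it as one possible starting point, not as the intended solution to the assignment.

```haskell
type Token = String
type Var = String

data FProp
  = V Var
  | No FProp
  | Y FProp FProp
  | O FProp FProp
  | Si FProp FProp
  | Sii FProp FProp
  deriving (Eq, Show)

-- Consume exactly one parenthesized formula and return the unused tokens.
parseOne :: [Token] -> Either String (FProp, [Token])
parseOne ("(" : "V" : name : ")" : rest) = Right (V name, rest)
parseOne ("(" : "No" : ts) = do
  (p, rest) <- parseOne ts
  closeWith (No p) rest
parseOne ("(" : op : ts)
  -- Two tokens of lookahead: the operator decides which constructor to build.
  | Just con <- lookup op binary = do
      (p, rest1) <- parseOne ts
      (q, rest2) <- parseOne rest1
      closeWith (con p q) rest2
parseOne ts = Left ("unexpected input at: " ++ unwords (take 3 ts))

-- Binary connectives and the constructors they map to.
binary :: [(String, FProp -> FProp -> FProp)]
binary = [("Y", Y), ("O", O), ("Si", Si), ("Sii", Sii)]

-- Check that the next token closes the current formula.
closeWith :: FProp -> [Token] -> Either String (FProp, [Token])
closeWith p (")" : rest) = Right (p, rest)
closeWith _ _ = Left "expected \")\""

-- External entry point: the whole token list must form one formula.
parseAll :: [Token] -> Either String FProp
parseAll ts = do
  (p, rest) <- parseOne ts
  if null rest then Right p else Left "trailing tokens after formula"
```

Applied to the token list shown earlier, parseAll returns Right (Y (No (V "q")) (V "p")).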
The stylish way to write a parser is with monadic parser combinators. (And maybe someone will post an example of one.) The industrial-strength solution would be a library like Parsec, but that’s probably overkill here. Still, parsing is (mostly!) a solved problem, and if you just want to get the assignment done on time, using a library off the shelf is a good idea.
The read part of a REPL interpreter typically looks like this:
repl :: ForthState -> IO () -- parser definition
repl state
  = do putStr "> " -- puts a > character to indicate it's waiting for input
       input <- getLine -- this is what you're looking for, to read a line.
       if input == "quit" -- allows user to quit the interpreter
         then do putStrLn "Bye!"
                 return ()
         else let (is, cs, d, output) = eval (words input) state -- your grammar definition is somewhere down the chain when eval is called on input
              in do mapM_ putStrLn output
                    repl (is, cs, d, [])

main = do putStrLn "Welcome to your very own interpreter!"
          repl initialForthState -- runs the parser, starting with read
Your eval method will have various loops, stack manipulations, conditionals, etc. to actually figure out what the user entered. I hope this helps you with at least the reading-input part.
I'm at the moment doing some very basic pattern matching with quotations.
My code:
let rec test e =
    match e with
    | Patterns.Lambda(v,e) -> test e
    | Patterns.Call(_, mi, [P.Value(value, _); P.Value(value2, _)]) ->
        printfn "Value1: %A | Value2 : %A" value value2
    | Patterns.Call(_, mi, [P.Value(value, _); P.PropertyGet(_, pi, exprs)]) ->
        printfn "Value1: %A | Value2 : %A" value (pi.GetValue(pi, null))
    | _ -> failwith "Expression not supported"

let quot1 = <@ "Name" = "MyName" @>
(* Call (None, Boolean op_Equality[String](System.String, System.String),
[Value ("Name"), Value ("MyName")]) *)
let quot2 = <@ "Name" = getNameById 5 @>
(* Call (None, Boolean op_Equality[String](System.String, System.String),
[Value ("Name"),
Call (None, System.String getNameById[Int32](Int32), [Value (5)])]) *)
test quot1 // Works!
test quot2 // Fails: doesn't match any of the patterns.
Is it possible to somehow evaluate the result of the getNameById function first, so that it will match one of the patterns, or am I doomed to assign a let binding with the result of the function outside the quotation?
I've tried playing with the ExprShape patterns, but without luck..
You can use PowerPack's Eval to evaluate only the arguments to the Call expression:
match e with
| Call(_,mi,[arg1;arg2]) ->
    let arg1Value, arg2Value = arg1.Eval(), arg2.Eval()
    ...
And similarly for Lambda expressions, etc. Notice that this frees you from enumerating permutations of Value, Property, and other argument expressions.
Update
Since you want to avoid using Eval (for good reason, if you are implementing a performance-conscious application), you'll need to implement your own eval function using reflection (which is still not lightning fast, but should be faster than PowerPack's Eval, which involves an intermediate translation of F# Quotations to Linq Expressions). You can get started by supporting a basic set of expressions and expand from there as needed. Recursion is the key; the following can help you get started:
open Microsoft.FSharp.Quotations
open System.Reflection

let rec eval expr =
    match expr with
    | Patterns.Value(value,_) -> value //value
    | Patterns.PropertyGet(Some(instance), pi, args) -> //instance property get
        pi.GetValue(eval instance, evalAll args) //notice recursive eval of instance expression and arg expressions
    | Patterns.PropertyGet(None, pi, args) -> //static property get
        pi.GetValue(null, evalAll args)
    | Patterns.Call(Some(instance), mi, args) -> //instance call
        mi.Invoke(eval instance, evalAll args)
    | Patterns.Call(None, mi, args) -> //static call
        mi.Invoke(null, evalAll args)
    | _ -> failwith "invalid expression"
and evalAll exprs =
    exprs |> Seq.map eval |> Seq.toArray
And then wrapping this in an Active Pattern will improve syntax:
let (|Eval|) expr =
    eval expr

match e with
| Patterns.Call(_, mi, [Eval(arg1Value); Eval(arg2Value)]) -> ...
Update 2
OK, this thread got me motivated to try and implement a robust reflection based solution, and I've done so with good results which are now part of Unquote as of version 2.0.0.
It turned out not to be as difficult as I thought it would be. Currently I support all quotation expressions except AddressGet, AddressSet, and NewDelegate. This is already better than PowerPack's eval, which doesn't support PropertySet, VarSet, FieldSet, WhileLoop, ForIntegerRangeLoop, and Quote, for example.
Some noteworthy implementation details are with VarSet and VarGet, where I need to pass around an environment name / variable lookup list to each recursive call. It is really an excellent example of the beauty of functional programming with immutable data-structures.
Also noteworthy is the special care taken with exceptions: stripping the TargetInvocationExceptions thrown by reflection when it catches exceptions coming from methods it is invoking (this is very important for handling TryWith evaluation properly, and also makes for better user handling of exceptions that fly out of the quotation evaluation).
Perhaps the most "difficult" implementation detail, or really the most grueling, was the need to implement all of the core operators (well, as many as I could discover: the numeric and conversion operators, and their checked versions as well), since most of them are not given dynamic implementations in the F# library (they are implemented using static type tests with no fallback dynamic implementations). This also means a serious performance increase when using these functions.
In some informal benchmarking, I observe performance increases of up to 50 times over PowerPack's (not pre-compiled) eval.
I am also confident that my reflection-based solution will be less bug-prone than PowerPack's, simply because it is less complicated than PowerPack's approach (not to mention that I've backed it up with about 150 unit tests, duly fortified by Unquote's additional 200+ unit tests, which are now driven by this eval implementation).
If you want to peek at the source code, the main modules are Evaluation.fs and DynamicOperators.fs (I've locked the links into revision 257). Feel free to grab and use the source code for your own purposes; it is licensed under the Apache License 2.0! Or you could wait a week or so, when I release Unquote 2.0.0, which will include evaluation operators and extensions publicly.
You can write an interpreter that will evaluate the quotation and call the getNameById function using Reflection. However, that would be quite a lot of work. The ExprShape isn't going to help you much - it is useful for simple traversing of quotations, but to write an interpreter, you'll need to cover all patterns.
I think the easiest option is to evaluate quotations using the PowerPack support:
#r "FSharp.PowerPack.Linq.dll"
open Microsoft.FSharp.Linq.QuotationEvaluation
let getNameById n =
    if n = 5 then "Name" else "Foo"
let quot1 = <@ "Name" = "MyName" @>
let quot2 = <@ "Name" = getNameById 5 @>
quot1.Eval()
quot2.Eval()
This has some limitations, but it is really the easiest option. However, I'm not really sure what you are trying to achieve. If you could clarify that, then you may get a better answer.
1- I'm really confused about applying F# quotations and pattern matching to metaprogramming. Please suggest some way to approach this concept in F#.
2- Can you show me some real applications of F# quotations and pattern matching in metaprogramming?
3- Some people say you can even implement another language, like IronScheme, in F#. Is that right?
Thanks.
1- I'm really confused about applying F# quotations and pattern matching to metaprogramming. Please suggest some way to approach this concept in F#.
A quotation mechanism lets you embed code in your code and have the compiler transform that code from the source you provide into a data structure that represents it. For example, the following gives you a data structure representing the F# expression 1+2:
> <@ 1+2 @>;;
val it : Quotations.Expr<int> =
Call (None, Int32 op_Addition[Int32,Int32,Int32](Int32, Int32),
[Value (1), Value (2)])
{CustomAttributes = [NewTuple (Value ("DebugRange"),
NewTuple (Value ("stdin"), Value (3), Value (3), Value (3), Value (6)))];
Raw = ...;
Type = System.Int32;}
You can then hack on this data structure in order to apply transformations to your code, such as translating it from F# to JavaScript in order to run it client side on almost any browser.
2- Can you show me some real applications of F# quotations and pattern matching in metaprogramming?
The F# quotation mechanism is extremely limited in functionality compared to the quotation mechanisms of languages like OCaml and Lisp, to the point where I wonder why it was ever added. Moreover, although the .NET Framework and F# compiler provide everything required to compile and execute quoted code at full speed, the evaluation mechanism for quoted code is orders of magnitude slower than real F# code which, again, renders it virtually useless. Consequently, I am not familiar with any real applications of it beyond Websharper.
For example, you can only quote certain kinds of expressions in F# and not other code such as type definitions:
> <@ type t = Int of int @>;;
  <@ type t = Int of int @>;;
  ---^^^^
C:\Users\Jon\AppData\Local\Temp\stdin(4,4): error FS0010: Unexpected keyword 'type' in quotation literal
Most quotation mechanisms let you quote any valid code at all. For example, OCaml's quotation mechanism can quote the type definition that F# just barfed on:
$ ledit ocaml dynlink.cma camlp4oof.cma
Objective Caml version 3.12.0
Camlp4 Parsing version 3.12.0
# open Camlp4.PreCast;;
# let _loc = Loc.ghost;;
val _loc : Camlp4.PreCast.Loc.t = <abstr>
# <:expr< 1+2 >>;;
- : Camlp4.PreCast.Ast.expr =
Camlp4.PreCast.Ast.ExApp (<abstr>,
Camlp4.PreCast.Ast.ExApp (<abstr>,
Camlp4.PreCast.Ast.ExId (<abstr>, Camlp4.PreCast.Ast.IdLid (<abstr>, "+")),
Camlp4.PreCast.Ast.ExInt (<abstr>, "1")),
Camlp4.PreCast.Ast.ExInt (<abstr>, "2"))
# <:str_item< type t = Int of int >>;;
- : Camlp4.PreCast.Ast.str_item =
Camlp4.PreCast.Ast.StSem (<abstr>,
Camlp4.PreCast.Ast.StTyp (<abstr>,
Camlp4.PreCast.Ast.TyDcl (<abstr>, "t", [],
Camlp4.PreCast.Ast.TySum (<abstr>,
Camlp4.PreCast.Ast.TyOf (<abstr>,
Camlp4.PreCast.Ast.TyId (<abstr>,
Camlp4.PreCast.Ast.IdUid (<abstr>, "Int")),
Camlp4.PreCast.Ast.TyId (<abstr>,
Camlp4.PreCast.Ast.IdLid (<abstr>, "int")))),
[])),
Camlp4.PreCast.Ast.StNil <abstr>)
FWIW, here is an example in Common Lisp:
$ sbcl
This is SBCL 1.0.29.11.debian, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.
SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
* '(+ 1 2)
(+ 1 2)
Metaprogramming is one application where pattern matching can be extremely useful but pattern matching is a general-purpose language feature. You may appreciate my article from the Benefits of OCaml about a minimal interpreter. In particular, note how easy pattern matching makes it to act upon each of the different kinds of expression:
# let rec eval vars = function
    | EApply(func, arg) ->
        (match eval vars func, eval vars arg with
         | VClosure(var, vars, body), arg -> eval ((var, arg) :: vars) body
         | _ -> invalid_arg "Attempt to apply a non-function value")
    | EAdd(e1, e2) -> VInt (int(eval vars e1) + int(eval vars e2))
    | EMul(e1, e2) -> VInt (int(eval vars e1) * int(eval vars e2))
    | EEqual(e1, e2) -> VBool (eval vars e1 = eval vars e2)
    | EIf(p, t, f) -> eval vars (if bool (eval vars p) then t else f)
    | EInt i -> VInt i
    | ELetRec(var, arg, body, rest) ->
        let rec vars = (var, VClosure(arg, vars, body)) :: vars in
        eval vars rest
    | EVar s -> List.assoc s vars;;
val eval : (string * value) list -> expr -> value = <fun>
That OCaml article was used as the basis of the F#.NET Journal article "Language-oriented programming: The Term-level Interpreter" (31st December 2007).
3- Some people say you can even implement another language, like IronScheme, in F#. Is that right?
Yes, you can write compilers in F#. In fact, F# is derived from a family of languages that were specifically designed for metaprogramming, the so-called MetaLanguages (ML) family.
The article "Run-time code generation using System.Reflection.Emit" (31st August 2008) from the F#.NET Journal described the design and implementation of a simple compiler for a minimal language called Brainf*ck. You can extend this to implement more sophisticated languages like Scheme. Indeed, the F# compiler is mostly written in F# itself.
On a related note, I just completed a project writing high-performance serialization code that used reflection to consume F# types in a project and then spit out F# code to serialize and deserialize values of those types.
F# quotations allow you to mark some piece of F# code and get the representation of the source code. This is used in WebSharper (see for example this tutorial) to translate F# code to JavaScript. Another example is F# support for LINQ, where code marked as <@ ... @> is translated to SQL:
let res = <@ for p in db.Products
               if p.IsVisible then yield p.Name @> |> query
Pattern matching is simply a very powerful language construct, but it is nothing more mysterious than, for example, if. The idea is that you can match a value against patterns and the program will choose the first matching branch. This is powerful because patterns can be nested, so you can use it to process various complex data structures or implement symbolic processing:
match expr with
| Multiply(Constant 0, _) | Multiply(_, Constant 0) -> 0
| Multiply(expr1, expr2) -> (eval expr1) * (eval expr2)
// (other patterns)
For example, here we're using pattern matching to evaluate some representation of numerical expression. The first pattern is an optimization that deals with cases where one argument of multiplication is 0.
Writing languages: You can use F# (just like any other general-purpose language) to write compilers and tools for other languages. In F#, this is easy because it comes with tools for generating lexers and parsers. See for example this introduction.
I'm still working on a tiny parser for a tiny language defined in a task at school. The parser that generates an AST (abstract syntax tree) is working. What I want is to check the variables that are used: they must be bound by a let expression. First, the method that is defined in the task (a suggestion, not required):
checkVars :: Expr -> Char
data Expr = Var Char | Tall Int | Sum Expr Expr | Mult Expr Expr | Neg Expr | Let Expr Expr Expr
    deriving(Eq, Show)
A valid sentence would be "let X be 5 in *(2,X)". X would normally be a Var and 5 would normally be an Int. The last part can be any value of the Expr data type. Main point: X is used somewhere in the last expression. The datatype for let is:
Let Expr Expr Expr
As you can see, the input type of checkVars is Expr, so here is an example of what I would feed to that function:
parseProg "let X be 4 in let Y be *(2 , X) in let Z be +(Y , X) in
+(+(X , Y) , Z)"
Let (Var 'X') (Tall 4) (Let (Var 'Y') (Mult (Tall 2) (Var 'X')) (Let
(Var 'Z') (Sum (Var 'Y') (Var 'X')) (Sum (Sum (Var 'X') (Var 'Y')) (Var
'Z'))))
Just 24
This is an all-inclusive example: the top part is the string/program being parsed. The second part, starting at line 3 (Let), is the AST, which is the input for the checkVars function. The bottom part, "Just 24", is the result of the evaluation, which I will be back here to ask for more help with.
Note: The point is to spit out the first unbound variable found as an error, and ' ' if everything is fine. Obviously if you want to do this another way you can.
Here's something to think about:
The first field of your Let constructor is an Expr. But can it actually hold anything other than Vars? If not, you should reflect this by making that field's type, say, String and adapting the parser correspondingly. This will make your task a lot easier.
The standard trick for evaluating an expression with let-bindings (which is what you are doing) is to write a function with this type:
type Env = [(String, Int)]
eval :: Expr -> Env -> Int
Note the extra argument for the environment. The environment keeps track of what variables are bound at any given moment to what values. Its position in the type means that you get to decide its value every time you call eval on child expressions. This is crucial! It also means you can have locally declared variables: binding a variable has no effect on its context, only on subexpressions.
Here are the special cases:
In a Var, you want to look up the variable name in the environment and return the value that is bound to it. (Use the standard Prelude function lookup.)
In a Let, you want to add an extra (varname, value) to the front of the environment list before passing it on to the child expression.
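A minimal sketch of such an evaluator follows, under two assumptions that are not in the question's code: the first field of Let has been changed to a plain name as suggested above, and names are Char (because the question's Var holds a Char) rather than String. It uses error for unbound variables and does not return Maybe.

```haskell
-- Variant of the question's Expr in which Let binds a plain name
-- (an assumption, following the first hint above).
data Expr
  = Var Char
  | Tall Int
  | Sum Expr Expr
  | Mult Expr Expr
  | Neg Expr
  | Let Char Expr Expr
  deriving (Eq, Show)

-- The environment maps each bound name to its value, newest binding first.
type Env = [(Char, Int)]

eval :: Expr -> Env -> Int
eval (Tall n) _ = n
eval (Var x) env = case lookup x env of
  Just v -> v
  Nothing -> error ("unbound variable: " ++ [x])
eval (Sum a b) env = eval a env + eval b env
eval (Mult a b) env = eval a env * eval b env
eval (Neg a) env = negate (eval a env)
-- The bound expression sees the outer environment; only the body sees x.
eval (Let x rhs body) env = eval body ((x, eval rhs env) : env)
```

Because a new binding is consed onto the front of the list and lookup returns the first match, an inner let shadows an outer one with the same name, and the binding disappears again once the body has been evaluated.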
I've left out some details, but this should be enough to get you going a long way. If you get stuck, ask another question. :-)
Oh, and I see you want to return a Maybe value to indicate failure. I suggest you first try without and use error to indicate unbound variables. When you have that version of eval working, adapt it to return Maybe values. The reason for this is that working with Maybe values makes the evaluation quite a bit more complicated.
I would actually try to evaluate the AST. Start by processing (and thus removing) all the Lets. Now, try to evaluate the resulting AST. If you run across a Var then there is an unbound variable.
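A related way to find the first unbound variable, without evaluating anything, is to walk the tree while tracking which names are in scope. The sketch below does that with the question's original Expr type. It is a variation on the suggestions above and not the same procedure; the helper name firstUnbound and the treatment of a Let whose first field is not a Var are assumptions.

```haskell
import Control.Applicative ((<|>))
import Data.Maybe (fromMaybe)

data Expr = Var Char | Tall Int | Sum Expr Expr | Mult Expr Expr | Neg Expr
          | Let Expr Expr Expr
  deriving (Eq, Show)

-- Return the first variable that is used outside the scope of a Let that
-- binds it, or Nothing when every variable is bound.
firstUnbound :: [Char] -> Expr -> Maybe Char
firstUnbound bound expr = case expr of
  Var x
    | x `elem` bound -> Nothing
    | otherwise -> Just x
  Tall _ -> Nothing
  Sum a b -> firstUnbound bound a <|> firstUnbound bound b
  Mult a b -> firstUnbound bound a <|> firstUnbound bound b
  Neg a -> firstUnbound bound a
  -- The bound expression is checked in the outer scope, the body in the
  -- scope extended with x.
  Let (Var x) rhs body -> firstUnbound bound rhs <|> firstUnbound (x : bound) body
  -- Assumption: a Let whose first field is not a Var binds nothing.
  Let _ rhs body -> firstUnbound bound rhs <|> firstUnbound bound body

-- The signature from the task: ' ' means no unbound variable was found.
checkVars :: Expr -> Char
checkVars = fromMaybe ' ' . firstUnbound []
```

For the AST shown in the question, checkVars returns ' '. For Let (Var 'X') (Var 'Y') (Var 'X') it returns 'Y', because Y is used before anything binds it.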