Similar to C++ access by reference? - F#

I'm building a clustering algorithm in C++, but I don't deal well with OOP and the state of variables (member data) that change. For an algorithm of some complexity, I find this an obstacle to my development.
So I was thinking of changing the programming language to one of the functional languages: OCaml or F#. Apart from having to change my mindset on how to approach programming, there's something I need clarified. In C++, I use a double-ended queue to slide a window of time through the data. After some period of time, the oldest data is removed and newer data is appended. Data that is not yet too old remains in the double-ended queue.
Another, more demanding, task is to compare properties of pairs of objects. Each object is the data from a certain period of time. If I have a thousand data objects in a certain time window, I need to compare each one against anywhere from none to twenty or thirty others, depending. And some properties of the object being compared may change as a result of the comparison. In C++, I do all of this using references, which means I access the objects in memory and they are never copied, so the algorithm runs at full speed (to the best of my knowledge of C++).
I've been reading about functional programming, and the idea I get is that each function performs some operation and that the original data (the input) is not changed. This suggests that the language copies the data structure and performs the required transformation on the copy. If so, using functional programming would slow the execution of the algorithm down a great deal. Is this correct? If not, i.e., if there is a fast way to perform transformations on data, could you show me how? A very small example would be great.
I'm hoping there is some facility for this. I've read that both OCaml and F# are used in research and scientific projects.

At a high level your question is whether using immutable data is slower than using mutable data. The answer to this is yes, it is slower in some cases. What's surprising (to me) is how small the penalty is. In most cases (in my experience) the extra time, which is often a log factor, is worth the extra modularity and clarity of using immutable data. And in numerous other cases there is no penalty at all.
The main reason that it's not as much slower as you would expect is that you can freely reuse any parts of the old data. There's no need to worry that some other part of the computation will change the data later: it's immutable!
For a similar reason, all accesses to immutable data are like references in C++. There's no need to make copies of data, as other parts of the computation can't change it.
If you want to work this way, you need to structure your data to get some re-use. If you can't easily do this, you may want to use some (controlled) mutation.
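To make that structuring concrete, here is a minimal F# sketch (my own illustration, not code from the question) of a persistent sliding window built as a two-list queue in the style of Okasaki's batched queue; the names Window, push, pop and slide are made up for the example:

/// Persistent sliding window: newest items go onto `back`, oldest come off `front`.
type Window<'a> = { front : 'a list; back : 'a list }

let emptyWindow () = { front = []; back = [] }

/// Newest data is consed onto the back list (amortized O(1)).
let push x w = { w with back = x :: w.back }

/// Oldest data comes off the front list; when it is empty, reverse the back once.
let rec pop w =
    match w.front, w.back with
    | x :: rest, _ -> Some (x, { w with front = rest })
    | [], []       -> None
    | [], back     -> pop { front = List.rev back; back = [] }

/// Slide the window: drop everything that is now too old, then append the new item.
let slide isTooOld x w =
    let rec dropOld w =
        match pop w with
        | Some (oldest, rest) when isTooOld oldest -> dropOld rest
        | _ -> w
    push x (dropOld w)

Because nothing is mutated, the tails of the two lists are shared between successive versions of the window, so sliding it does not copy the data that stays inside it.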
Both OCaml and F# are mixed-paradigm languages. They allow you to use mutable data if you want to.
The most enlightening account of operations on immutable data (IMHO) is Chris Okasaki's book Purely Functional Data Structures. (This Amazon link is for info only, not necessarily a suggestion to buy the book :-) You can also find much of this information in Okasaki's PhD thesis.

You can definitely implement a pointer machine in OCaml and F#, so that you can store direct references and update them. E.g.,
type 'a cell = {
  data : 'a;
  mutable lhs : 'a cell;
  mutable rhs : 'a cell;
}
In OCaml this will be represented as a pointer to a data structure containing three words: a pointer to the data, and two pointers to the sibling nodes:
+--------+        +-------+      +-------+
|  cell  |------->| data  |----->|  ...  |
+--------+        |-------|      +-------+
              +---|  lhs  |
              |   |-------|
              |   |  rhs  |--+
              |   +-------+  |
              |   +-------+  |   +-------+
              +-->| data  |  +-->| data  |
                  |-------|      |-------|
                  |  lhs  |      |  lhs  |
                  |-------|      |-------|
                  |  rhs  |      |  rhs  |
                  +-------+      +-------+
So, there is nothing special here. It is the same as in C++, where you can also choose between a persistent and an imperative implementation. But in C++ you usually pay a more significant cost for persistence, due to the lack of support in the language itself. OCaml has a generational garbage collector with very cheap allocation, and other optimizations.
So, yes, you can implement your data structure in a regular (imperative) way. But before doing this, you should be pretty sure that you're ready to pay for it. It is much easier to reason about functional code than about imperative code, and that is actually the main reason why people choose and use the FP paradigm.
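If you do go the imperative route in F#, here is a hedged sketch of the idea above (my own variation, using option fields so that nodes can be built first and linked afterwards):

type Node<'a> = {
    data : 'a
    mutable lhs : Node<'a> option
    mutable rhs : Node<'a> option
}

let leaf x = { data = x; lhs = None; rhs = None }

let root  = leaf 1
let left  = leaf 2
let right = leaf 3

// Update the references in place, just as you would through a C++ pointer.
root.lhs  <- Some left
root.rhs  <- Some right
right.lhs <- Some root   // cycles are fine; no copying happens

Only the fields marked mutable are updated in place; everything else behaves like ordinary immutable F# data.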

This means that the language copies the data structure and performs the required transformation
Not necessarily. If the objects are immutable (as F# record types are by default; in C++, if all data members are const with no use of mutable) then taking a reference is fine.
If so, using functional programming will delay the execution of the algorithm a great deal. Is this correct?
Even with the above, functional languages tend to support lazy operations. In F#, with the right data structures/methods, this will be the case. But it can also be eager.
An example (not terribly idiomatic, but trying to be clear):
let Square (is : seq<int>) = is |> Seq.map (fun n -> n * n)
and then
let res = [1; 2; 3; 4] |> Square
will not calculate any of the squares until you read the values from res.
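A quick way to see the deferral (my own illustration, not from the answer above) is to put a side effect inside the mapping and note when it fires:

let squares =
    [1; 2; 3; 4]
    |> Seq.map (fun n ->
        printfn "squaring %d" n   // nothing printed yet: Seq.map is deferred
        n * n)

printfn "no squares computed so far"
let total = Seq.sum squares       // the four "squaring ..." lines appear here
printfn "total = %d" total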

It's important to understand this in terms of two factors: mutation and sharing.
You seem to be concentrating on the mutation aspect while neglecting sharing.
Take the standard list append (++ in Haskell): it copies the left argument and shares the right.
So, yes, it is true that you lose efficiency by copying, but you correspondingly gain by sharing. If you arrange your data structures to maximize sharing, you stand to gain from sharing what you lose to the copying caused by immutability. For the most part this 'just happens'; sometimes, however, you need to tweak it.
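Here is a small F# illustration of that sharing (mine, not part of the original answer): consing onto a list reuses the existing cells instead of copying them.

let shared = [2; 3; 4]
let a = 1 :: shared
let b = 0 :: shared

// The tails of a and b are literally the same cells as `shared`; no copy was made.
printfn "%b" (obj.ReferenceEquals(List.tail a, shared))   // true
printfn "%b" (obj.ReferenceEquals(List.tail b, shared))   // true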
A common example involving laziness in Haskell:
ones = 1 : ones
this denotes an infinite list of 1s [1,1,1,...]
And the implementation can be expected to optimize it to a loop (a circular graph):
  +-----------+
  |           |
  V           |
+---------+   |
|         |   |
|    1    |-->+
|         |
+---------+
However when we generalize it to an infinite list of x-es
repeat x = x : repeat x
the implementation has a harder time detecting the loop, because the variable ones has now become a (recursive) function call repeat x.
Change it to
repeat x = let repeat_x = x : repeat_x in repeat_x
and the loop (ie sharing) is reinstated.

Related

Observer pattern usage in syntax tree parsing

I have detailed the specifications of the problem for reasons that will become clear after I ask my question, at the end. The program I am building is a parser in Java for a language with the following syntax (although this is not very relevant to the question):
<expr> ::= [<op> <expr> <expr>] | <symbol> | <value>
<symbol> ::= [a-zA-Z]+
<value> ::= [0-9]+
<op> ::= '+' | '*' | '==' | '<'
<assignment> ::= [= <symbol> <expr>]
<prog> ::= <assignment> |
[; <prog> <prog>] |
[if <expr> <prog> <prog>] |
[for <assignment> <expr> <assignment> <prog>] |
[assert <expr>] |
[return <expr>]
This is an example of code in said language:
[; [= x 0] [; [if [== x 5] [= x 7] [= x [+ x 1]]] [return x]]]
Which is equivalent to:
x = 0;
if (x == 5)
    x = 7;
else
    x = x + 1;
return x;
The code is guaranteed to be given in correct syntax; incorrectness of the given code is defined only by:
a) Using a variable (symbol) not previously declared (where declared means something was assigned to it), even if the variable is used in a branch of an if or some other place that is never reached when the program executes;
b) Not having a "return" instruction on every path the program could take, meaning the program could end without returning on some execution path.
The requirements are that the program should parse the code.
My parser must:
a) Check for said correctness;
b) Parse the code and compute what is the returned value.
My take on this is:
1) Parse the code given into a tree of instructions and expressions;
2) Check for correctness by traversing the tree and seeing if a variable was declared in an upper scope before it was used;
3) Check for correctness by traversing the tree and seeing if any execution branch ends in a "return" instruction;
4) If all previous conditions hold, evaluate the returned value of the code by traversing the tree and remembering the value of all the variables in a HashMap or some other storage.
Now, my problem is that I must implement the parser using the Visitor and Observer design patterns. This is a key requirement for the project. I am fairly new to design patterns and only have a basic grasp about these two.
My question is: where should/can I use the Observer design pattern in my parser?
It makes sense to use a Visitor to visit the nodes of the tree for steps 2, 3 and 4. I cannot figure out where I must use the Observer pattern, though.
Is there any place I can use it in my implementation? From my understanding, the Observer pattern takes care of data that can be read and modified by many "observers", the central idea being that an object modifying the data will notify the other objects that may be affected by the modification.
The main data being modified in my program is the tree and the HashMap in which I store the values of the variables. Both of these are accessed in a linear fashion, by only one thing. The tree is built one node at a time, and no other node, or object for that matter, cares that a node is added or modified. In the evaluation phase, each node is visited and variables are added or modified in the hash table, but no object other than the current visitor at the current node cares about this. I suppose I could make each node an observer which, upon observing a change, does nothing, or something like that, forcing an Observer pattern, but that isn't really useful.
So, is there an obvious answer which I am completely missing? Is there a not-so-obvious one that still gives a useful implementation of Observer? Can I use a half-useful, slightly forced Observer pattern somewhere in my algorithms, or is a fully forced, completely useless one the only way to implement it? Is there a completely different way of approaching the problem which would allow me to use the Visitor and, more importantly, the Observer pattern?
Notes:
I am yet to implement the evaluation of the tree (steps 2, 3 and 4) with Visitor; I have only thought about how I should do it. I will implement it tomorrow and see if there is a way to use Observer somewhere, but having thought about how I could use it for a few hours, I still have no idea. I am hoping, though, that there is a way, which I haven't been able to discover but which will become clear after writing that part.
I apologize for writing so much; I couldn't summarize it any better while still giving the necessary details about the situation.
Also, I apologize if I am not clear in my explanations. It is quite late, I have thought about this for some hours and got tired, and I can't say I have a perfect grasp of the concepts. If anything is unclear or you want further details on some matter, don't hesitate to ask. Also, don't hesitate to highlight any mistakes or wrong paths in my judgement about the problem.
Here are some ideas how you could use well-known patterns and concepts to build an interpreter for your language:
Start processing an input stream by splitting it up into tokens ([;, =, x, 0, ], etc.). This first component (a.k.a. lexer, scanner, tokenizer) strips out irrelevant detail such as whitespace and produces tokens.
You could implement this as a simple state machine that is fed one character at a time. It can therefore be implemented as an observer of the input character sequence.
In the next stage (a.k.a. parsing) you build an abstract syntax tree (AST) from the generated tokens.
The parser is an observer of the tokenizer, i.e. it gets notified of any new tokens the tokenizer produces.(*) It is fed one token at a time. In the case of a fairly simple grammar, the parser itself can be a stack-based state machine. (For example, if it needs to match opening and closing brackets, it needs to be able to remember what context / state it was in outside the brackets, so it will probably use some kind of stack for context management.)
Once the parser has built an AST, perhaps almost all subsequent steps can be implemented using the visitor pattern. Every algorithm that traverses the AST, locates specific nodes or subtrees, or transforms (parts of) the AST can be modelled as a visitor. Some visitors will only model actions that do not return a value, others return a new AST node for a visited node such that a new transformed AST can be put together. For example:
Check whether the AST or any subtree of it describes a valid program fragment (visit a (sub-) tree, reduce it to a boolean or a list of errors).
Simplify / optimize the program (visit a (sub-) tree, generate a smaller or otherwise more optimal subtree, finally reassemble a new tree from the transformed subtrees).
Execute the AST (visit and interpret each node, execute some code according to the node's meaning).
(*) Calling the parser an observer of the scanner is perhaps somewhat inaccurate. This Software Engineering SE post has a good summary of closely related design patterns. Could be that the scanner implements a strategy for dealing with the tokens.
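Since the rest of this thread is F#-flavoured, here is a compact sketch of the same idea in F# rather than Java (my own illustration, not the poster's code): the AST is a discriminated union, and each "visitor" is just a function that pattern-matches on the node type.

type Expr =
    | Value of int
    | Symbol of string
    | Op of string * Expr * Expr

// "Visitor" 1: check that every symbol was declared before use.
let rec checkDeclared (declared : Set<string>) expr =
    match expr with
    | Value _ -> true
    | Symbol s -> Set.contains s declared
    | Op (_, l, r) -> checkDeclared declared l && checkDeclared declared r

// "Visitor" 2: evaluate the expression against an environment.
let rec eval (env : Map<string, int>) expr =
    match expr with
    | Value v -> v
    | Symbol s -> Map.find s env
    | Op ("+", l, r) -> eval env l + eval env r
    | Op ("*", l, r) -> eval env l * eval env r
    | Op (op, _, _) -> failwithf "unknown operator %s" op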

Does the order of mutually exclusive clauses matter in function or match expressions

When clauses in function and match statements aren't mutually exclusive, the order clearly matters. However when clauses are mutually exclusive they can be written in any order. E.g., to find the minimum element in a list, the following are functionally equivalent:
let rec minElt =
    function
    | [] -> failwith "empty list"
    | [x0] -> x0
    | x0::xtl -> min x0 (minElt xtl)
let rec minElt =
    function
    | [x0] -> x0
    | x0::xtl -> min x0 (minElt xtl)
    | [] -> failwith "empty list"
I prefer the first one stylistically, because the patterns are listed in increasing order of size / the base cases come first. However, is there any advantage to the second one? In particular, is the second one more efficient because the exceptional case will never be checked in the course of normal evaluation?
I don't think there is any established idiomatic style. I would focus on making the code readable and understandable first - I think this depends on personal preferences, but I guess you could write:
Special cases first (anything that needs special handling, or handles special but valid values)
Most common cases next (the typical path, such as x::xs for lists)
Exceptional cases (anything that means invalid input)
I guess this is how I generally tend to write pattern matching (because this is the order in which I think about the possible cases).
I would not worry too much about the performance. I tested your function just out of curiosity. I called it 1000 times on lists from length 1 to 100 (so that's 100000 iterations) and the first one was about 895ms while the second one 878ms so the difference is 2%. Does not sound like something that would matter over readability (this was in F# Interactive, so the difference might be even smaller).
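To make the suggested ordering (special, common, exceptional) concrete, here is a tiny sketch with a made-up function of my own, not code from the question:

let rec sumPositive xs =
    match xs with
    | [] -> 0                                            // special but valid: empty list
    | x :: rest when x >= 0 -> x + sumPositive rest      // the common path
    | _ -> failwith "negative numbers are not allowed"   // invalid input last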
I typically put the finishing case of a recursive function first. This is usually either the 0 case or the 1 case. Next I put the case which recurses (assuming there is only one), followed by any exceptional cases.
The reasoning behind this is that it's how I understand inductive reasoning: i.e. I understand the base case (how the recursion ends), followed by how that base case is used to prove the correctness of the algorithm (how the recursion reaches the end). I put exceptional cases last because they don't add to my understanding of the logic behind the recursion. Recursion is more difficult to understand if you start in the middle of the argument than if you start at the end (and since recursion doesn't have a well-defined beginning point, you can't very well start there).
This reasoning leans heavily on the mathematical foundations of functional programming, so if that isn't how you approach functional programming, then maybe some other ordering would make more sense to you.

Writing a Z80 assembler - lexing ASM and building a parse tree using composition?

I'm very new to the concept of writing an assembler and even after reading a great deal of material, I'm still having difficulties wrapping my head around a couple of concepts.
What is the process for actually breaking up a source file into tokens? I believe this process is called lexing, and I've searched high and low for real code examples that make sense, but I can't find a thing, so simple code examples are very welcome ;)
When parsing, does information ever need to be passed up or down the tree? The reason I ask is as follows, take:
LD BC, nn
It needs to be turned into the following parse tree once tokenized (???):
 ___ LD ___
|          |
BC         nn
Now, when this tree is traversed it needs to produce the following machine code:
01 n n
If the instruction had been:
LD DE,nn
Then the output would need to be:
11 n n
Which raises the question: does the LD node return something different based on the operand, or is it the operand that returns something? And how is this achieved? More simple code examples would be excellent, if time permits.
I'm most interested in learning some of the raw processes here rather than looking at advanced existing tools so please bear that in mind before sending me to Yacc or Flex.
Well, the structure of the tree you really want, for an instruction that operates on a register and a memory addressing mode involving an offset displacement and an index register, would look like this:
  INSTRUCTION------+
   |      |        |
 OPCODE  REG    OPERAND
                 |     |
             OFFSET  INDEXREG
And yes, you do want to pass values up and down the tree.
A method for formally specifying such value passing is called "attribute grammars": you decorate the grammar for your language (in your case, your assembler syntax) with the value-passing rules and the computations over those values. For more background, see the Wikipedia article on attribute grammars.
In a related question you asked, I discussed a tool, DMS, which handles expression grammars and building trees. As a language manipulation tool, DMS faces exactly these same up-and-down-the-tree information flow issues. It shouldn't surprise you that, as a high-end language manipulation tool, it can handle attribute grammar computations directly.
It is not necessary to build a parse tree. Z80 instructions are very simple: they consist of the op code and 0, 1 or 2 operands, separated by commas. You just need to split each instruction up into its (at most 3) components with a very simple parser - no tree is needed.
Actually, the opcodes are structured not around bytes but around octal. The best description I know of is DECODING Z80 OPCODES.
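As a hedged illustration of the "no tree needed" approach (my own sketch, not from the answers above; the table contents and the assumption that the operand is written in hex are mine), splitting the line and looking the opcode up from the mnemonic plus first operand is enough for the LD rr,nn group from the question:

// Opcodes for the "LD register-pair, nn" group only.
let ldPairOpcode = dict [ "BC", 0x01uy; "DE", 0x11uy; "HL", 0x21uy; "SP", 0x31uy ]

let assembleLine (line : string) =
    // Split on spaces, tabs and commas: mnemonic plus up to two operands.
    let parts = line.Split([| ' '; '\t'; ',' |], System.StringSplitOptions.RemoveEmptyEntries)
    match parts with
    | [| "LD"; reg; value |] when ldPairOpcode.ContainsKey reg ->
        let nn = System.Convert.ToUInt16(value, 16)                      // operand assumed hex
        [ ldPairOpcode.[reg]; byte (nn &&& 0xFFus); byte (nn >>> 8) ]    // opcode, low byte, high byte
    | _ -> failwithf "can't assemble: %s" line

// assembleLine "LD BC, 1234"  -> [0x01uy; 0x34uy; 0x12uy]
// assembleLine "LD DE, 1234"  -> [0x11uy; 0x34uy; 0x12uy]

This reproduces the 01 n n / 11 n n distinction asked about; the Z80 stores 16-bit operands low byte first.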

F#: is it OK for developing theorem provers?

Please advise. I am a lawyer; I work in the field of Law Informatics. I have been a programmer for a long time (Basic, RPG, Fortran, Pascal, Cobol, VB.NET, C#). I am currently interested in F#, but I'd like some advice. My concern is that F# seems to be suited to math applications, and what I want to do would require a lot of Boolean math operations and natural language processing of text and, if successful, speech. I am worried about the text processing.
I received a revolutionary PROLOG source code (revolutionary in the field of Law and in particular Dispute Resolution). The program solves disputes by evaluating Yes-No (true-false) arguments advanced by two debating parties. Now, I am learning PROLOG so I can take the program to another level: evaluating the strength of arguments when they are neither a Yes nor a No, but a persuasive element in the argumentation process.
So, the program handles the dialectics aspect of argumentation, I want it to begin processing the rhetoric aspect of argumentation, or at least some aspects.
Currently the program can manage formal logic. What I want is to begin managing some aspects of informal logic and for that I would need to do parsing of strings (long strings, maybe ms word documents) for the detection of text markers, words like "but" "therefore" "however" "since" etc, etc, just a long list of words I have to look up in any speech (verbal or written) and mark, and then evaluate left side and right side of the mark. Depending on the mark the sides are deemed strong or weak.
Initially, I thought of porting the Prolog program to C# and using a Prolog library. Then it occurred to me that maybe it could be better in pure F#.
First, the project you describe sounds (and I believe this is the correct legal term) totally freaking awesome.
Second, while F# is a good choice for math applications, it's also extremely well-suited for any application which performs a lot of symbolic processing. It's worth noting that F# is part of the ML family of languages, which were originally designed for the specific purpose of developing theorem provers. It sounds like you're writing an application which appeals directly to the niche ML languages are geared for.
I would personally recommend writing any theorem-proving applications you have in F# rather than C# -- only because the resulting F# code will be about 1/10th the size of the C# equivalent. I posted this sample demonstrating how to evaluate propositional logic in C# and F#; you can see the difference for yourself.
F# has many features that make this type of logic processing natural. To get a feel for what the language looks like, here is one possible way to decide which side of an argument has won, and by how much. It uses a random result for each argument, since the interesting (read "very hard to impossible") part will be parsing out the argument text and deciding how persuasive it would be to an actual human.
/// Declare a 'weight' unit-of-measure, so the compiler can do static typechecking
[<Measure>] type weight

/// Type of tokenized argument
type Argument = string

/// Type of argument reduced to side & weight
type ArgumentResult =
    | Pro of float<weight>
    | Con of float<weight>
    | Draw

/// Convert a tokenized argument into a side & weight
/// Presently returns a random side and weight
let ParseArgument =
    let rnd = System.Random()
    let nextArg() = rnd.NextDouble() * 1.0<weight>
    fun (line:string) ->
        // The REALLY interesting code goes here!
        match rnd.Next(0,3) with
        | 1 -> Pro(nextArg())
        | 2 -> Con(nextArg())
        | _ -> Draw

/// Tally the arguments scored
let Score args =
    // Sum up all pro & con scores, and keep track of count for avg calculation
    let totalPro, totalCon, count =
        args
        |> Seq.map ParseArgument
        |> Seq.fold
            (fun (pros, cons, count) arg ->
                match arg with
                | Pro(w) -> (pros+w, cons, count+1)
                | Con(w) -> (pros, cons+w, count+1)
                | Draw -> (pros, cons, count+1))
            (0.0<weight>, 0.0<weight>, 0)
    let fcount = float(count)
    let avgPro, avgCon = totalPro/fcount, totalCon/fcount
    let diff = avgPro - avgCon
    match diff with
    // consider < 1% a draw
    | d when abs d < 0.01<weight> -> Draw
    | d when d > 0.0<weight> -> Pro(d)
    | d -> Con(-d)

let testScore =
    ["yes"; "no"; "yes"; "no"; "no"; "YES!"; "YES!"]
    |> Score

printfn "Test score = %A" testScore
Porting from Prolog to F# won't be that straightforward. While they are both non-imperative languages, Prolog is a declarative (logic) language and F# is functional. I have never used the C# Prolog libraries, but I think using one will be easier than converting the whole thing to F#.
It sounds like the functional aspects of F# are appealing to you, but you wonder if it can handle the non-functional aspects. You should know that F# has the entire .NET Framework at its disposal. It also is not a purely functional language; you can write imperative code in it if you want to.
Finally, if there are still things you want to do from C#, it is possible to call F# functions from C#, and vice versa.
While F# is certainly more suitable than C# for this kind of application, since there are going to be several algorithms which F# allows you to express in a very concise and elegant way, you should consider the difference between functional, OO, and logic programming. In fact, porting to F# will most likely require you to use a solver (or implement your own), and that might take you some time to get used to. Otherwise you should consider making a library with your Prolog code and accessing it from .NET (see more about interop at this page, and remember that everything you can access from C# you can also access from F#).
F# does not support logic programming as Prolog does. You might want to check out the P# compiler.

Explaining pattern matching vs switch

I have been trying to explain the difference between switch statements and pattern matching (F#) to a couple of people, but I haven't really been able to explain it well... most of the time they just look at me and say "so why don't you just use if..then..else".
How would you explain it to them?
EDIT! Thanks everyone for the great answers, I really wish I could mark multiple right answers.
Having formerly been one of "those people", I don't know that there's a succinct way to sum up why pattern-matching is such tasty goodness. It's experiential.
Back when I had just glanced at pattern-matching and thought it was a glorified switch statement, I think that I didn't have experience programming with algebraic data types (tuples and discriminated unions) and didn't quite see that pattern matching was both a control construct and a binding construct. Now that I've been programming with F#, I finally "get it". Pattern-matching's coolness is due to a confluence of features found in functional programming languages, and so it's non-trivial for the outsider-looking-in to appreciate.
I tried to sum up one aspect of why pattern-matching is useful in the second of a short two-part blog series on language and API design; check out part one and part two.
Patterns give you a small language to describe the structure of the values you want to match. The structure can be arbitrarily deep and you can bind variables to parts of the structured value.
This allows you to write things extremely succinctly. You can illustrate this with a small example, such as a derivative function for a simple type of mathematical expressions:
type expr =
    | Int of int
    | Var of string
    | Add of expr * expr
    | Mul of expr * expr;;

let rec d(f, x) =
    match f with
    | Var y when x=y -> Int 1
    | Int _ | Var _ -> Int 0
    | Add(f, g) -> Add(d(f, x), d(g, x))
    | Mul(f, g) -> Add(Mul(f, d(g, x)), Mul(g, d(f, x)));;
Additionally, because pattern matching is a static construct for static types, the compiler can (i) verify that you covered all cases (ii) detect redundant branches that can never match any value (iii) provide a very efficient implementation (with jumps etc.).
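For instance (a minimal example of my own), leaving a case out is caught at compile time:

type Shape =
    | Circle of float
    | Square of float

// Deliberately missing the Square case: the compiler flags this match as
// incomplete at compile time (and would likewise flag an extra, unreachable
// case as a redundant rule).
let area shape =
    match shape with
    | Circle r -> System.Math.PI * r * r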
Excerpt from this blog article:
Pattern matching has several advantages over switch statements and method dispatch:
Pattern matches can act upon ints, floats, strings and other types as well as objects.
Pattern matches can act upon several different values simultaneously: parallel pattern matching. Method dispatch and switch are limited to a single value, e.g. "this".
Patterns can be nested, allowing dispatch over trees of arbitrary depth. Method dispatch and switch are limited to the non-nested case.
Or-patterns allow subpatterns to be shared. Method dispatch only allows sharing when methods are from classes that happen to share a base class. Otherwise you must manually factor out the commonality into a separate function (giving it a name) and then manually insert calls from all appropriate places to your unnecessary function.
Pattern matching provides redundancy checking which catches errors.
Nested and/or parallel pattern matches are optimized for you by the F# compiler. The OO equivalent must be written by hand and constantly reoptimized by hand during development, which is prohibitively tedious and error prone, so production-quality OO code tends to be extremely slow in comparison.
Active patterns allow you to inject custom dispatch semantics.
Off the top of my head:
The compiler can tell if you haven't covered all possibilities in your matches
You can use a match as an assignment
If you have a discriminated union, each match can have a different 'type'
Tuples have "," and Variants have Ctor args .. these are constructors, they create things.
Patterns are destructors, they rip them apart.
They're dual concepts.
To put this more forcefully: the notion of a tuple or variant cannot be described merely by its constructor: the destructor is required or the value you made is useless. It is these dual descriptions which define a value.
Generally we think of constructors as data, and destructors as control flow. Variant destructors are alternate branches (one of many), tuple destructors are parallel threads (all of many).
The parallelism is evident in operations like
(f * g) . (h * k) = (f . h) * (g . k)
If you think of control flowing through a function, tuples provide a way to split up a calculation into parallel threads of control.
Looked at this way, expressions are ways to compose tuples and variants to make complicated data structures (think of an AST).
And pattern matches are ways to compose the destructors (again, think of an AST).
Switch is the two front wheels.
Pattern-matching is the entire car.
Pattern matches in OCaml, in addition to being more expressive in the several ways described above, also give some very important static guarantees. The compiler will prove for you that the case analysis embodied by your pattern-match statement is:
exhaustive (no cases are missed)
non-redundant (no cases that can never be hit because they are pre-empted by a previous case)
sound (no patterns that are impossible given the datatype in question)
This is a really big deal. It's helpful when you're writing the program for the first time, and enormously useful when your program is evolving. Used properly, match-statements make it easier to change the types in your code reliably, because the type system points you at the broken match statements, which are a decent indicator of where you have code that needs to be fixed.
If-Else (or switch) statements are about choosing different ways to process a value (input) depending on properties of the value at hand.
Pattern matching is about defining how to process a value given its structure (also note that single-case pattern matches make sense).
Thus pattern matching is more about deconstructing values than making choices, this makes them a very convenient mechanism for defining (recursive) functions on inductive structures (recursive union types), which explains why they are so abundantly used in languages like Ocaml etc.
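A small illustration of that "deconstructing" view (my own example): the pattern mirrors the shape of the value and binds its parts, even when there is only one case.

type Customer = { name : string; age : int }

// A single-case pattern in a let binding pulls the record apart.
let describe person =
    let { name = n; age = a } = person
    sprintf "%s is %d years old" n a

// Tuples work the same way: the pattern mirrors the constructor.
let (x, y) = (3, 4)
printfn "%s; x + y = %d" (describe { name = "Ada"; age = 36 }) (x + y)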
PS: You might know the pattern-match and If-Else "patterns" from their ad-hoc use in math;
"if x has property A then y else z" (If-Else)
"some term in p1..pn where .... is the prime decomposition of x.." ((single case) pattern match)
Perhaps you could draw an analogy with strings and regular expressions? You describe what you are looking for, and let the compiler figure out how to do it for itself. It makes your code much simpler and clearer.
As an aside: I find that the most useful thing about pattern matching is that it encourages good habits. I deal with the corner cases first, and it's easy to check that I've covered every case.
