converting string list to ternary tree in sml - xml-parsing

I would like to convert a string list into a ternary tree. The tree should have the following datatype:
datatype ttree = node of string*ttree list*string|leaf of string*string*string
For example,
["<root>","<a3>","<b2>","Father","</b2>","<b3>","Mother","</b3>","</a3>","<a1>","<ffx>","AAA",...] : string list
should be converted to
val test =
node
("<root>",
[node
("<a3>",
[leaf ("<b2>","Father","<b2>"),leaf ("<b3>","Mother","<b3>")],
"<a3>"),
node
("<a1>",[leaf ("<ffx>","AAA","<ffx>"),leaf ("<ff>","BBB","<ff>")],
"<a1>")],"<root>") : ttree

Not the most efficient solution, but this should solve the problem:
datatype ttree = node of string*ttree list*string | leaf of string*string*string
val l = ["<root>",
"<a3>",
"<b2>","Father","</b2>",
"<b3>","Mother","</b3>",
"</a3>",
"<a1>",
"<ffx>","AAA","</ffx>",
"</a1>"]
(*listToTree(l) = string list to ternary tree*)
fun listToTree(l : string list) : ttree =
let
(*isLeaf(s1, s3) = true if s1=<a> and s3=</a>,
a can be replaced with anything
false otherwise*)
fun isLeaf(s1 : string, s3 : string) : bool =
let
val (e1, e3) = (explode s1, explode s3)
val (t1c1::t1) = e1
val t1e = List.last t1
val (t3c1::t3c2::t3) = e3
val t3e = List.last t3
in
t1c1 = #"<" andalso
t1c1 = t3c1 andalso
t1e = #">" andalso
t1e = t3e andalso
t3c2 = #"/" andalso
t1 = t3
end
(*parseUntil(l, until, acc) = (p, r),
where p = acc + a part of l until an element x
that satisfies isLeaf(until,x)
and r = elements of list l after x *)
fun parseUntil(l : string list, until : string, acc : string list)
: string list * string list =
case l of
[] => (rev acc, [])
| x::xs => if isLeaf(until, x)
then (rev acc, xs)
else parseUntil(xs, until, x::acc)
(*parseList(l) = the ttree list of the given string list*)
fun parseList(l : string list) : ttree list =
case l of
[] => []
| x::xs => let
val (parsed, rest) = parseUntil(xs, x, [])
in
if length parsed = 1
then leaf(x, hd xs, x)::parseList(rest)
else node(x, parseList(parsed), x)::parseList(rest)
end
in
hd (parseList l)
end
val result = listToTree l;
It is basically doing this:
1. Once you get a tag, collect the list elements up to its closing tag.
2. Call the same parser function on the elements before the closing tag and put the result in the node's list.
3. Call the same parser function on the elements after the closing tag and append the result to the upper-level list.
I hope I could help.
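The steps above can be sketched as follows. This is a minimal F# port rather than the answer's SML, and the names TTree, closes, splitAtClose and parseList are made up for illustration. Like the original, it stores the opening tag in both tag positions, and it assumes well-formed input without nested tags of the same name.

```fsharp
// Illustrative F# sketch; type and function names are assumptions, not from the answer.
type TTree =
    | Node of string * TTree list * string
    | Leaf of string * string * string

// "<a>" is closed by "</a>".
let closes (openTag: string) (candidate: string) =
    openTag.StartsWith("<") && candidate = "</" + openTag.Substring(1)

// Split the list at the first element that closes openTag:
// returns (elements before the closing tag, elements after it).
let rec splitAtClose openTag acc items =
    match items with
    | [] -> List.rev acc, []
    | x :: rest when closes openTag x -> List.rev acc, rest
    | x :: rest -> splitAtClose openTag (x :: acc) rest

// Parse a flat token list into a list of sibling trees.
let rec parseList items =
    match items with
    | [] -> []
    | tag :: rest ->
        let inner, after = splitAtClose tag [] rest
        let tree =
            match inner with
            | [ text ] -> Leaf(tag, text, tag)       // exactly one element between the tags
            | _ -> Node(tag, parseList inner, tag)   // otherwise recurse into the children
        tree :: parseList after

let tokens =
    [ "<root>"; "<a3>"; "<b2>"; "Father"; "</b2>"
      "<b3>"; "Mother"; "</b3>"; "</a3>"; "</root>" ]

let tree = parseList tokens |> List.head
```

The design mirrors the SML version: one helper finds the matching closing tag, and the parser calls itself once for the children and once for the remaining siblings.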

Related

F# creating a new list in function

let rec findMatches str list =
let hd :: tl = list
match list with
| [] -> []
| (s, _) as hd :: tl when s = str -> hd :: findMatches str tl
| _::tl -> findMatches str tl
This is my current function, and I am stuck on how to build and return a new list. I want to test my function with this call:
findMatches "A" [("A",5); ("BB",6); ("AA",9); ("A",0)];;
and I want it to return
val it : int list = [0; 5]
so I know that I need an int list returned.
It is easy to achieve your goal using a recursive inner function with an accumulator argument to collect the results one by one:
let findMatches str list =
let rec inner acc = function
| [] -> acc
| (s, n) :: tl ->
inner (if s = str then n :: acc else acc) tl
inner [] list
This is a perfect candidate for List.fold from the F# library:
let toMatch = "A"
let test =
[ ("A", 5)
("BB", 6)
("AA", 9)
("A", 0) ]
let findMatches toMatch items =
List.fold
(fun output item ->
if toMatch = (fst item) then
(snd item) :: output //Prepend the value if we find a match, otherwise just return the same list
else
output)
[] //Set initial output to the empty list
items
findMatches toMatch test

Slice/Group a sequence of equal chars in F#

I need to extract the sequence of equal chars in a text.
For example:
The string "aaaBbbcccccccDaBBBzcc11211" should be converted to a list of strings like
["aaa";"B";"bb";"ccccccc";"D";"a";"BBB";"z";"cc";"11";"2";"11"].
This is my solution so far:
let groupSequences (text:string) =
let toString chars =
System.String(chars |> Array.ofList)
let rec groupSequencesRecursive acc chars = seq {
match (acc, chars) with
| [], c :: rest ->
yield! groupSequencesRecursive [c] rest
| _, c :: rest when acc.[0] <> c ->
yield (toString acc)
yield! groupSequencesRecursive [c] rest
| _, c :: rest when acc.[0] = c ->
yield! groupSequencesRecursive (c :: acc) rest
| _, [] ->
yield (toString acc)
| _ ->
yield ""
}
text
|> List.ofSeq
|> groupSequencesRecursive []
groupSequences "aaaBbbcccccccDaBBBzcc11211"
|> Seq.iter (fun x -> printfn "%s" x)
|> ignore
I'm an F# newbie.
Can this solution be improved?
Here is a completely generic implementation:
let group xs =
let folder x = function
| [] -> [[x]]
| (h::t)::ta when h = x -> (x::h::t)::ta
| acc -> [x]::acc
Seq.foldBack folder xs []
This function has the type seq<'a> -> 'a list list when 'a : equality, so works not only on strings, but on any (finite) sequence of elements, as long as the element type supports equality comparison.
Used with the input string in the OP, the return value isn't quite in the expected shape:
> group "aaaBbbcccccccDaBBBzcc11211";;
val it : char list list =
[['a'; 'a'; 'a']; ['B']; ['b'; 'b']; ['c'; 'c'; 'c'; 'c'; 'c'; 'c'; 'c'];
['D']; ['a']; ['B'; 'B'; 'B']; ['z']; ['c'; 'c']; ['1'; '1']; ['2'];
['1'; '1']]
Instead of a string list, the return value is a char list list. You can easily convert it to a list of strings using a map:
> group "aaaBbbcccccccDaBBBzcc11211" |> List.map (List.toArray >> System.String);;
val it : System.String list =
["aaa"; "B"; "bb"; "ccccccc"; "D"; "a"; "BBB"; "z"; "cc"; "11"; "2"; "11"]
This takes advantage of the String constructor overload that takes a char[] as input.
As initially stated, this implementation is generic, so can also be used with other types of lists; e.g. integers:
> group [1;1;2;2;2;3;4;4;3;3;3;0];;
val it : int list list = [[1; 1]; [2; 2; 2]; [3]; [4; 4]; [3; 3; 3]; [0]]
How about using Seq.groupBy? Note that it groups all occurrences of a character together, not just consecutive ones:
"aaaBbbcccccccD"
|> Seq.groupBy id
|> Seq.map (snd >> Seq.toArray)
|> Seq.map (fun t -> new string (t))
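If your input order matters, here is a method that works for non-empty input. The fold is seeded with the last character so that the final run is not cut short:
let text = "aaaBbbcccccccDaBBBzcc11211"
let folder (acc: string list) (ca: char, cb: char) =
    match acc with
    | run :: tail when ca = cb -> (run + string ca) :: tail // same run: extend it
    | _ -> string ca :: acc // ca starts a new run
text
|> Seq.pairwise
|> Seq.toArray
|> Array.rev
|> Array.fold folder [ string text.[text.Length - 1] ]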
This one is also based on recursion, though the matching gets away with a smaller number of checks.
let chop (txt:string) =
let rec chopInner txtArr (word: char[]) (res: List<string>) =
match txtArr with
| h::t when word.[0] = h -> chopInner t (Array.append word [|h|]) res
| h::t when word.[0] <> h ->
let newWord = word |> (fun s -> System.String s)
chopInner t [|h|] (List.append res [newWord])
| [] ->
let newWord = word |> (fun s -> System.String s)
(List.append res [newWord])
let lst = txt.ToCharArray() |> Array.toList
chopInner lst.Tail [|lst.Head|] []
And the result is as expected:
val text : string = "aaaBbbcccccccDaBBBzcc11211"
> chop text;;
val it : string list =
["aaa"; "B"; "bb"; "ccccccc"; "D"; "a"; "BBB"; "z"; "cc"; "11"; "2"; "11"]
When you're folding, you'll need to carry along both the previous value and the accumulator holding the temporary results. The previous value is wrapped as option to account for the first iteration. Afterwards, the final result is extracted and reversed.
"aaaBbbcccccccDaBBBzcc11211"
|> Seq.map string
|> Seq.fold (fun state ca ->
Some ca,
match state with
| Some cb, x::xs when ca = cb -> x + ca::xs
| _, xss -> ca::xss )
(None, [])
|> snd
|> List.rev
// val it : string list =
// ["aaa"; "B"; "bb"; "ccccccc"; "D"; "a"; "BBB"; "z"; "cc"; "11"; "2"; "11"]
I'm just curious why everyone is publishing solutions based on match-with. Why not use plain recursion?
let rec groups i (s:string) =
let rec next j = if j = s.Length || s.[i] <> s.[j] then j else next(j+1)
if i = s.Length then []
else let j = next i in s.Substring(i, j - i) :: (groups j s)
"aaaBbbcccccccDaBBBzcc11211" |> groups 0
val it : string list = ["aaa"; "B"; "bb"; "ccccccc"; "D"; "a"; "BBB"; "z"; "cc"; "11"; "2"; "11"]
As someone else said here:
Know thy fold ;-)
let someString = "aaaBbbcccccccDaBBBzcc11211"
let addLists state elem =
let (p, ls) = state
elem,
match p = elem, ls with
| _, [] -> [ elem.ToString() ]
| true, h :: t -> (elem.ToString() + h) :: t
| false, h :: t -> elem.ToString() :: ls
someString
|> Seq.fold addLists ((char)0, [])
|> snd
|> List.rev

F# records and mapping

I am new to F# and have been experimenting with records and changing them. I am trying to apply my own function to my list without using map. This is what I have so far. I am just wondering whether my approach to writing a mapping without the map function is the right way to think about it.
module RecordTypes =
// creation of simple record
// immutable by default - key word mutable allows that to change
type Student =
{
Name : string
mutable age : int
mutable major : string
}
// setting up a few records with student information
// studentFive.age <- studentFive.age + 2 ; example of how to change mutable variable
let studentOne = { Name = "bob" ; age = 20 ; major = "spanish" }
let studentTwo= { Name = "sally" ; age = 18 ; major = "english" }
let studentThree = { Name = "frank" ; age = 22 ; major = "history" }
let studentFour = { Name = "lisa" ; age = 19 ; major = "math" }
let studentFive = { Name = "john" ; age = 17 ; major = "philosophy" }
// placing the records into a list
let studentList = [studentOne; studentTwo; studentThree ;studentFour; studentFive]
// iterate through the list and print each record
printf "the unsorted list of students: \n"
studentList |> List.iter (fun s-> printf "Name: %s, Age: %d, Major: %s\n" s.Name s.age s.major)
// sort the records by age; they can be sorted by other fields as well
let sortStudents alist =
alist
|> List.sortBy (function student -> student.age)
let rec selectionSort = function
| [] -> [] //if the list is empty it will return an empty list
| l -> let min = List.min l in // otherwise set a min variable and use the min function to find the smallest item in a list
let rest = List.filter (fun i -> i <> min) l in // set a variable to hold the rest of the list using filter
// Returns a new collection containing only the elements of the collection for which the given predicate returns true
// fun sets up a lambda expression that if ( i -> i <> (not equal boolean) min) if i(the record is not the min put it into a list)
let sortedList = selectionSort rest in // sort the rest of the list that isnt the min
min :: sortedList // :: is the cons operator: it prepends the left element to the list on the right
let unsortedList = studentList
let sortedList = selectionSort unsortedList
printfn "sorted list based on first name:\n"
sortedList |> List.iter(fun s -> printf "Name: %s, Age: %d, Major: %s\n" s.Name s.age s.major)
Here is where I tried to create my own map with the function foo:
let foo x = x + 1
let applyOnEachElement (list : Student list) (someFunction) =
list |> List.iter(fun s -> someFunction s.age)
//let agedStudents = applyOnEachElement studentList foo
printf " the students before function is applied to each: \n"
sortedList |> List.iter(fun s -> printf "Name: %s, Age: %d, Major: %s\n" s.Name s.age s.major)
printf " the student after function is applied to each: \n"
agedStudents |> List.iter(fun s -> printf "Name: %s, Age: %d, Major: %s\n" s.Name s.age s.major)
In the last comment, the OP mentions his almost complete solution. With a bit of added formatting and a forgotten match construct, it looks as follows:
let rec applyOnEachElement2 (list: Student list) (f) =
match list with
| [] -> []
| hd :: tl -> hd::applyOnEachElement2 f tl
This is quite close to a correct implementation of the map function! There are only two issues:
when calling applyOnEachElement2 recursively, you switched the parameters
the f parameter is passed recursively but never actually used for anything
To fix this, all you need is to switch the order of parameters (I'll do this on the function arguments to get the parameters in the same order as standard map) and call the f function on hd on the last line (so that the function returns a list of transformed elements):
let rec applyOnEachElement2 f (list: Student list) =
match list with
| [] -> []
| hd :: tl -> (f hd)::applyOnEachElement2 f tl
You can also make it generic by dropping the type annotation, which gives you a function with the same type signature as the built in List.map:
let rec applyOnEachElement2 f list =
match list with
| [] -> []
| hd :: tl -> (f hd)::applyOnEachElement2 f tl
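As a usage sketch, the generic version can produce the aged student list the question was after. The record below is a trimmed, immutable stand-in for the question's Student type, and the sample data and the copy-and-update lambda are assumptions for illustration.

```fsharp
// Simplified, immutable variant of the question's record (an assumption for this sketch).
type Student = { Name: string; age: int; major: string }

// The hand-written map from the answer, repeated so the block is self-contained.
let rec applyOnEachElement2 f list =
    match list with
    | [] -> []
    | hd :: tl -> f hd :: applyOnEachElement2 f tl

let students =
    [ { Name = "bob"; age = 20; major = "spanish" }
      { Name = "sally"; age = 18; major = "english" } ]

// Copy-and-update builds new records and leaves the originals untouched.
let agedStudents =
    applyOnEachElement2 (fun (s: Student) -> { s with age = s.age + 1 }) students

agedStudents
|> List.iter (fun s -> printfn "Name: %s, Age: %d, Major: %s" s.Name s.age s.major)
```

Returning new records instead of mutating fields keeps the function a true map: the input list is unchanged, so the before and after lists can both be printed.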

F# stream of armstrong numbers

I am seeking help, mainly because I am very new to the F# environment. I need to use an F# stream to generate an infinite stream of Armstrong numbers. Can anyone help with this? I have written some mumbo jumbo, but I have no clue where I'm going.
type 'a stream = | Cons of 'a * (unit -> 'a stream)
let rec take n (Cons(x, xsf)) =
if n = 0 then []
else x :: take (n-1) (xsf());;
//to test if two integers are equal
let test x y =
match (x,y) with
| (x,y) when x < y -> false
| (x,y) when x > y -> false
| _ -> true
//to check for armstrong number
let check n =
let mutable m = n
let mutable r = 0
let mutable s = 0
while m <> 0 do
r <- m%10
s <- s+r*r*r
m <- m/10
if (test n s) then true else false
let rec armstrong n =
Cons (n, fun () -> if check (n+1) then armstrong (n+1) else armstrong (n+2))
let pos = armstrong 0
take 5 pos
To be honest your code seems a bit like a mess.
The most basic version I could think of is this:
let isArmstrong (a,b,c) =
a*a*a + b*b*b + c*c*c = (a*100+b*10+c)
let armstrongs =
seq {
for a in [0..9] do
for b in [0..9] do
for c in [0..9] do
if isArmstrong (a,b,c) then yield (a*100+b*10+c)
}
Of course, this assumes an Armstrong number is a 3-digit number where the sum of the cubes of the digits is the number itself.
This will yield:
> Seq.toList armstrongs;;
val it : int list = [0; 1; 153; 370; 371; 407]
but it should be easy to add a wider range or remove the one-digit numbers (think about it).
general case
The problem seemed so interesting that I chose to implement the general case too:
let numbers =
let rec create n =
if n = 0 then [(0,[])] else
[
for x in [0..9] do
for (_,xs) in create (n-1) do
yield (n, x::xs)
]
Seq.initInfinite create |> Seq.concat
let toNumber (ds : int list) =
ds |> List.fold (fun s d -> s*10I + bigint d) 0I
let armstrong (m : int, ds : int list) =
ds |> List.map (fun d -> bigint d ** m) |> List.sum
let leadingZero =
function
| 0::_ -> true
| _ -> false
let isArmstrong (m : int, ds : int list) =
if leadingZero ds then false else
let left = armstrong (m, ds)
let right = toNumber ds
left = right
let armstrongs =
numbers
|> Seq.filter isArmstrong
|> Seq.map (snd >> toNumber)
However, the numbers get sparse quickly, and this approach will soon run out of memory. The first 20 are:
> Seq.take 20 armstrongs |> Seq.map string |> Seq.toList;;
val it : string list =
["0"; "1"; "2"; "3"; "4"; "5"; "6"; "7"; "8"; "9"; "153"; "370"; "371";
"407"; "1634"; "8208"; "9474"; "54748"; "92727"; "93084"]
remark/disclaimer
This is the most basic version. You can gain a lot of performance if you just enumerate all numbers and use basic math to extract and exponentiate the digits ;) ... I'm sure you can figure it out.
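One possible reading of that remark, combined with the stream type from the question, is sketched below. The helpers digits, isArmstrong and armstrongsFrom are illustrative names, not part of the answer, and take is adjusted slightly so it does not force one element more than requested.

```fsharp
// Stream type from the question.
type 'a stream = Cons of 'a * (unit -> 'a stream)

// Take the first n elements without forcing element n + 1.
let rec take n (Cons(x, rest)) =
    if n <= 0 then []
    elif n = 1 then [ x ]
    else x :: take (n - 1) (rest ())

// Decimal digits of a non-negative number, least significant first.
let digits (n: bigint) =
    if n.IsZero then [ 0I ]
    else List.unfold (fun m -> if m = 0I then None else Some(m % 10I, m / 10I)) n

// n is an Armstrong number if the sum of its digits, each raised to the
// number of digits, equals n.
let isArmstrong (n: bigint) =
    let ds = digits n
    let k = List.length ds
    let total = ds |> List.sumBy (fun d -> System.Numerics.BigInteger.Pow(d, k))
    total = n

// Stream of Armstrong numbers >= n: skip ahead to the next hit, defer the rest.
let rec armstrongsFrom (n: bigint) =
    if isArmstrong n then Cons(n, fun () -> armstrongsFrom (n + 1I))
    else armstrongsFrom (n + 1I)

// 0 to 9, then 153, 370, 371, 407
let firstFew = take 14 (armstrongsFrom 0I)
```

Because each tail is a deferred function, nothing beyond the requested elements is computed. The gaps between hits grow quickly, though, so asking for many more elements takes correspondingly longer.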

function 'startsWithVowel' in F#

Given a list of vowels, I have written the function startsWithVowel to check whether a word starts with a vowel. As you can see, I use an exception for control flow, and that's not ideal. How can I implement this better?
let vowel = ['a'; 'e'; 'i'; 'o'; 'u']
let startsWithVowel(str :string) =
try
List.findIndex (fun x -> x = str.[0]) vowel
true
with
| :? System.Collections.Generic.KeyNotFoundException -> false
UPDATE: thanks to all. Once again I learn: never hesitate to ask a newbie question. I see a lot of very useful remarks, keep them coming :-)
Try using the exists method instead:
let vowel = ['a'; 'e'; 'i'; 'o'; 'u']
let startsWithVowel(str :string) = List.exists (fun x -> x = str.[0]) vowel
exists returns true if any element in the list returns true for the predicate and false otherwise.
Use sets for efficient lookup
let vowels = Set.ofList ['a'; 'e'; 'i'; 'o'; 'u']
let startsWithVowel(str : string) = vowels |> Set.contains str.[0]
Yet another alternative, tryFindIndex returns Some or None rather than throwing an exception:
> let vowel = ['A'; 'E'; 'I'; 'O'; 'U'; 'a'; 'e'; 'i'; 'o'; 'u']
let startsWithVowel(str :string) =
match List.tryFindIndex (fun x -> x = str.[0]) vowel with
| Some(_) -> true
| None -> false;;
val vowel : char list = ['A'; 'E'; 'I'; 'O'; 'U'; 'a'; 'e'; 'i'; 'o'; 'u']
val startsWithVowel : string -> bool
> startsWithVowel "Juliet";;
val it : bool = false
> startsWithVowel "Omaha";;
val it : bool = true
I benchmarked a few approaches mentioned in this thread (Edit: added nr. 6).
1. The List.exists approach (~0.75 seconds)
2. The Set.contains approach (~0.51 seconds)
3. String.IndexOf (~0.25 seconds)
4. A non-compiled regex (~5 - 6 seconds)
5. A compiled regex (~1.0 seconds)
6. Pattern matching (why did I forget this the first time?) (~0.17 seconds)
I filled a list with 500000 random words and filtered it through various startsWithVowel functions, repeated 10 times.
Test code:
open System.Text.RegularExpressions
let startsWithVowel1 =
let vowels = ['a';'e';'i';'o';'u']
fun (s:string) -> vowels |> List.exists (fun v -> s.[0] = v)
let startsWithVowel2 =
let vowels = ['a';'e';'i';'o';'u'] |> Set.ofList
fun (s:string) -> Set.contains s.[0] vowels
let startsWithVowel3 (s:string) = "aeiou".IndexOf(s.[0]) >= 0
let startsWithVowel4 str = Regex.IsMatch(str, "^[aeiou]")
let startsWithVowel5 =
let rex = new Regex("^[aeiou]",RegexOptions.Compiled)
fun (s:string) -> rex.IsMatch(s)
let startsWithVowel6 (s:string) =
match s.[0] with
| 'a' | 'e' | 'i' | 'o' | 'u' -> true
| _ -> false
//5x10^5 random words
let gibberish =
let R = new System.Random()
let (word:byte[]) = Array.zeroCreate 5
[for _ in 1..500000 ->
new string ([|for _ in 3..R.Next(4)+3 -> char (R.Next(26)+97)|])
]
//f = startsWithVowelX, use #time in F# interactive for the timing
let test f =
for _ in 1..10 do
gibberish |> List.filter f |> ignore
My humble conclusion:
EDIT: F# pattern matching wins the speed contest (before nr. 6 was added, the imperative IndexOf did).
The Set.contains approach wins the beauty contest.
Note also that a number of exception-throwing functions have non-exception equivalents that return option rather than throwing - these typically have a 'try' prefix in the function name.
List.tryFindIndex:
http://msdn.microsoft.com/en-us/library/ee340224(VS.100).aspx
See also
http://lorgonblog.spaces.live.com/blog/cns!701679AD17B6D310!181.entry
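The same option-returning style also covers a case none of the versions in this thread handle: an empty string, where str.[0] throws. The sketch below is one way to do it with Seq.tryHead. It assumes a non-null input, and the case-insensitive comparison is an added design choice, not something the question asked for.

```fsharp
let vowels = set [ 'a'; 'e'; 'i'; 'o'; 'u' ]

// Seq.tryHead returns None for an empty string instead of throwing.
let startsWithVowel (str: string) =
    match Seq.tryHead str with
    | Some c -> vowels.Contains(System.Char.ToLowerInvariant c)
    | None -> false
```

With this version, startsWithVowel "" evaluates to false rather than raising an exception.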
Using regular expressions:
open System.Text.RegularExpressions
let startsWithVowel str = Regex.IsMatch(str, "^[AEIOU]", RegexOptions.IgnoreCase)
let startsWithVowel (word:string) =
let vowels = ['a';'e';'i';'o';'u']
List.exists (fun v -> v = word.[0]) vowels
