function 'startsWithVowel' in F# - f#

Given a list of vowels, I have written the function startsWithVowel to investigate if a word starts with a vowel. As you can see I use exception as controlflow, and that's not ideal. How to implement this better?
let vowel = ['a'; 'e'; 'i'; 'o'; 'u']
let startsWithVowel(str :string) =
try
List.findIndex (fun x -> x = str.[0]) vowel
true
with
| :? System.Collections.Generic.KeyNotFoundException -> false
UPDATE : tx to all : once again I experience : never hesitate to ask a newbee question. I see a lot of very useful remarks, keep them coming :-)

try using the exists method instead
let vowel = ['a'; 'e'; 'i'; 'o'; 'u']
let startsWithVowel(str :string) = List.exists (fun x -> x = str.[0]) vowel
exists returns true if any element in the list returns true for the predicate and false otherwise.

Use sets for efficient lookup
let vowels = Set.ofList ['a'; 'e'; 'i'; 'o'; 'u']
let startsWithVowel(str : string) = vowels |> Set.mem (str.[0])

Yet another alternative, tryFindIndex returns Some or None rather than throwing an exception:
> let vowel = ['A'; 'E'; 'I'; 'O'; 'U'; 'a'; 'e'; 'i'; 'o'; 'u']
let startsWithVowel(str :string) =
match List.tryFindIndex (fun x -> x = str.[0]) vowel with
| Some(_) -> true
| None -> false;;
val vowel : char list = ['A'; 'E'; 'I'; 'O'; 'U'; 'a'; 'e'; 'i'; 'o'; 'u']
val startsWithVowel : string -> bool
> startsWithVowel "Juliet";;
val it : bool = false
> startsWithVowel "Omaha";;
val it : bool = true

I benchmarked a few approaches mentioned in this thread (Edit: added nr. 6).
The List.exists approach (~0.75 seconds)
The Set.contains approach (~0.51 seconds)
String.IndexOf (~0.25 seconds)
A non-compiled regex (~5 - 6 seconds)
A compiled regex (~1.0 seconds)
Pattern matching (why did I forget this the first time?) (~0.17 seconds)
I filled a list with 500000 random words and filtered it through various startsWithVowel functions, repeated 10 times.
Test code:
open System.Text.RegularExpressions
let startsWithVowel1 =
let vowels = ['a';'e';'i';'o';'u']
fun (s:string) -> vowels |> List.exists (fun v -> s.[0] = v)
let startsWithVowel2 =
let vowels = ['a';'e';'i';'o';'u'] |> Set.ofList
fun (s:string) -> Set.contains s.[0] vowels
let startsWithVowel3 (s:string) = "aeiou".IndexOf(s.[0]) >= 0
let startsWithVowel4 str = Regex.IsMatch(str, "^[aeiou]")
let startsWithVowel5 =
let rex = new Regex("^[aeiou]",RegexOptions.Compiled)
fun (s:string) -> rex.IsMatch(s)
let startsWithVowel6 (s:string) =
match s.[0] with
| 'a' | 'e' | 'i' | 'o' | 'u' -> true
| _ -> false
//5x10^5 random words
let gibberish =
let R = new System.Random()
let (word:byte[]) = Array.zeroCreate 5
[for _ in 1..500000 ->
new string ([|for _ in 3..R.Next(4)+3 -> char (R.Next(26)+97)|])
]
//f = startsWithVowelX, use #time in F# interactive for the timing
let test f =
for _ in 1..10 do
gibberish |> List.filter f |> ignore
My humble conclusion:
EDIT:
The imperative IndexOf F# pattern match wins the speed contest.
The Set.contains approach wins the beauty contest.

Note also that a number of exception-throwing functions have non-exception equivalents that return option rather than throwing - these typically have a 'try' prefix in the function name.
List.tryFindIndex:
http://msdn.microsoft.com/en-us/library/ee340224(VS.100).aspx
See also
http://lorgonblog.spaces.live.com/blog/cns!701679AD17B6D310!181.entry

Using regular expressions:
open System.Text.RegularExpressions
let startsWithVowel str = Regex.IsMatch(str, "^[AEIOU]", RegexOptions.IgnoreCase)

let startsWithVowel (word:string) =
let vowels = ['a';'e';'i';'o';'u']
List.exists (fun v -> v = word.[0]) vowels

Related

The mutable variable 'index' is used in an invalid way in seq {}?

In the following code, the compiler gets error on index <- index + 1 with error
Error 3 The mutable variable 'index' is used in an invalid way. Mutable variables cannot be captured by closures. Consider eliminating this use of mutation or using a heap-allocated mutable reference cell via 'ref' and '!'. d:\Users....\Program.fs 11 22 ConsoleApplication2
However, it has been defined as mutable?
let rec iterateTupleMemberTypes (tupleArgTypes: System.Type[]) (columnNames: string[]) startingIndex =
seq {
let mutable index = startingIndex
for t in tupleArgTypes do
match t.IsGenericType with
| true -> iterateTupleMemberTypes (t.GetGenericArguments()) columnNames index |> ignore
| false ->
printfn "Name: %s Type: %A" (columnNames.[index]) t
index <- index + 1
yield (columnNames.[index]), t
} |> Map.ofSeq
let myFile = CsvProvider<"""d:\temp\sample.txt""">.GetSample()
let firstRow = myFile.Rows |> Seq.head
let tupleType = firstRow.GetType()
let tupleArgTypes = tupleType.GetGenericArguments()
let m = iterateTupleMemberTypes tupleArgTypes myFile.Headers.Value 0
An idiomatic version of this might look like the following:
#r #"..\packages\FSharp.Data.2.2.2\lib\net40\FSharp.Data.dll"
open FSharp.Data
open System
type SampleCsv = CsvProvider<"Sample.csv">
let sample = SampleCsv.GetSample()
let rec collectLeaves (typeTree : Type) =
seq {
match typeTree.IsGenericType with
| false -> yield typeTree.Name
| true -> yield! typeTree.GetGenericArguments() |> Seq.collect collectLeaves
}
let columnTypes = (sample.Rows |> Seq.head).GetType() |> collectLeaves
let columnDefinitions = columnTypes |> Seq.zip sample.Headers.Value |> Map.ofSeq
let getDefinitions (sample : SampleCsv) = (sample.Rows |> Seq.head).GetType() |> collectLeaves |> Seq.zip sample.Headers.Value |> Map.ofSeq
Personally, I wouldn't be concerned too much about the performance of Map vs Dictionary (and rather have the immutable Map) unless there are hundreds of columns.
The statement after it, let index = 0, shadows your definition of mutable variable index. Also, to make mutables work in sequences, you need refs. https://msdn.microsoft.com/en-us/library/dd233186.aspx
Suggested by #Ming-Tang, I changed the mutable variable to ref and it works now. However, is it a way not to use mutable/ref variable at all?
let rec iterateTupleMemberTypes (tupleArgTypes: System.Type[]) (columnNames: string[]) startingIndex =
seq {
let index = ref startingIndex
for t in tupleArgTypes do
match t.IsGenericType with
| true ->
yield! iterateTupleMemberTypes (t.GetGenericArguments()) columnNames !index
| false ->
printfn "Name: %s Type: %A" (columnNames.[!index]) t
yield (columnNames.[!index]), t
index := !index + 1
} |> dict
let myFile = CsvProvider<"""d:\temp\sample.txt""">.GetSample()
let firstRow = myFile.Rows |> Seq.head
let tupleType = firstRow.GetType()
let tupleArgTypes = tupleType.GetGenericArguments()
let m = iterateTupleMemberTypes tupleArgTypes myFile.Headers.Value 0

Parsing file into Hashtable in OCaml

I'm trying to learn functional programming and having difficulty expressing a file parsing task functionally.
Let's say I have a text file with the following format:
val_0: <--- "header"
key_0_0 <--- these keys should be set to the "header" or val0
key_0_1
key_0_2
...
...
val_n:
...
key_n_m
How can I end up with a hash table with all keys set to their associated value?
EDIT: My solution. Can anyone improve it?
open Core.Std
let contains s1 s2 =
let re = Str.regexp_string s2 in
try ignore (Str.search_forward re s1 0); true
with Not_found -> false
let read_db f =
let tbl = Caml.Hashtbl.create 123456 in
let lines = In_channel.read_lines f in
let src = ref "" in
List.iter ~f:(fun g -> if contains g ":" then src := else Caml.Hashtbl.add tbl g !src) lines;
tbl
Here's my solution, just for comparison.
let line_opt ic =
try Some (input_line ic) with End_of_file -> None
let fold_lines_in f init fn =
let ic = open_in fn in
let rec go accum =
match line_opt ic with
| None -> accum
| Some line -> go (f accum line)
in
let res = go init in
close_in ic;
res
let hashtable_of_file fn =
let ht = Hashtbl.create 16 in
let itab label line =
let len = String.length line in
if line.[len - 1] = ':' then
String.sub line 0 (len - 1)
else
let () = Hashtbl.add ht line label in
label
in
let _ = fold_lines_in itab "" fn in
ht
Update
(Fixed non-tail-recursive fold implementation, sorry.)
Waving from the future as this requires OCaml library features that were not available at the time the question was asked.
String.ends_with was added in OCaml 4.13.0. And sequences in OCaml 4.07.0.
Setting aside all of the file reading, we can do this without the imperative style Hashtbl using just Map and list folds.
We'll build up a list using fold_left by keeping the current key and a reversed association list of values, then later reversing both the association list and the values in each sublist, and then converting to a Map.
module SM = Map.Make (String)
let data = "val_0:\nhello\nworld\nval_1:\nfoo\nval_2:\nbar"
let data' = String.split_on_char '\n'
data'
|> List.fold_left
(fun (cur_key, lst as acc) line ->
let label = String.ends_with ~suffix:":" line in
match cur_key with
| None when label -> (Some line, (line, [])::lst)
| Some _ when label -> (Some line, (line, [])::lst)
| None -> acc
| Some cur_key' -> (cur_key, let ((key, lst')::tl) = lst in (key, line::lst')::tl)
)
(None, [])
|> snd
|> List.map (fun (k, v) -> (k, List.rev v))
|> List.rev
|> List.to_seq
|> SM.of_seq
If we get run SM.bindings on this value we can see that it has worked:
[("val_0:", ["hello"; "world"]); ("val_1:", ["foo"]); ("val_2:", ["bar"])]

F# stream of armstrong numbers

I am seeking help, mainly because I am very new to F# environment. I need to use F# stream to generate an infinite stream of Armstrong Numbers. Can any one help with this one. I have done some mambo jumbo but I have no clue where I'm going.
type 'a stream = | Cons of 'a * (unit -> 'a stream)
let rec take n (Cons(x, xsf)) =
if n = 0 then []
else x :: take (n-1) (xsf());;
//to test if two integers are equal
let test x y =
match (x,y) with
| (x,y) when x < y -> false
| (x,y) when x > y -> false
| _ -> true
//to check for armstrong number
let check n =
let mutable m = n
let mutable r = 0
let mutable s = 0
while m <> 0 do
r <- m%10
s <- s+r*r*r
m <- m/10
if (test n s) then true else false
let rec armstrong n =
Cons (n, fun () -> if check (n+1) then armstrong (n+1) else armstrong (n+2))
let pos = armstrong 0
take 5 pos
To be honest your code seems a bit like a mess.
The most basic version I could think of is this:
let isArmstrong (a,b,c) =
a*a*a + b*b*b + c*c*c = (a*100+b*10+c)
let armstrongs =
seq {
for a in [0..9] do
for b in [0..9] do
for c in [0..9] do
if isArmstrong (a,b,c) then yield (a*100+b*10+c)
}
of course assuming a armstrong number is a 3-digit number where the sum of the cubes of the digits is the number itself
this will yield you:
> Seq.toList armstrongs;;
val it : int list = [0; 1; 153; 370; 371; 407]
but it should be easy to add a wider range or remove the one-digit numbers (think about it).
general case
the problem seems so interesting that I choose to implement the general case (see here) too:
let numbers =
let rec create n =
if n = 0 then [(0,[])] else
[
for x in [0..9] do
for (_,xs) in create (n-1) do
yield (n, x::xs)
]
Seq.initInfinite create |> Seq.concat
let toNumber (ds : int list) =
ds |> List.fold (fun s d -> s*10I + bigint d) 0I
let armstrong (m : int, ds : int list) =
ds |> List.map (fun d -> bigint d ** m) |> List.sum
let leadingZero =
function
| 0::_ -> true
| _ -> false
let isArmstrong (m : int, ds : int list) =
if leadingZero ds then false else
let left = armstrong (m, ds)
let right = toNumber ds
left = right
let armstrongs =
numbers
|> Seq.filter isArmstrong
|> Seq.map (snd >> toNumber)
but the numbers get really sparse quickly and using this will soon get you out-of-memory but the
first 20 are:
> Seq.take 20 armstrongs |> Seq.map string |> Seq.toList;;
val it : string list =
["0"; "1"; "2"; "3"; "4"; "5"; "6"; "7"; "8"; "9"; "153"; "370"; "371";
"407"; "1634"; "8208"; "9474"; "54748"; "92727"; "93084"]
remark/disclaimer
this is the most basic version - you can get big speed/performance if you just enumerate all numbers and use basic math to get and exponentiate the digits ;) ... sure you can figure it out

How to get culture-aware output with printf-like functions?

Is there a way to use F#'s sprintf float formating with a decimal comma? It would be nice if this worked:
sprintf "%,1f" 23.456
// expected: "23,456"
Or can I only use String.Format Method (IFormatProvider, String, Object()) ?
EDIT: I would like to have a comma not a point as a decimal separator. Like most non-English speaking countries use it.
It's quite a pain, but you can write your own version of sprintf that does exactly what you want:
open System
open System.Text.RegularExpressions
open System.Linq.Expressions
let printfRegex = Regex(#"^(?<text>[^%]*)((?<placeholder>%(%|((0|-|\+| )?([0-9]+)?(\.[0-9]+)?b|c|s|d|i|u|x|X|o|e|E|f|F|g|G|M|O|A|\+A|a|t)))(?<text>[^%]*))*$", RegexOptions.ExplicitCapture ||| RegexOptions.Compiled)
type PrintfExpr =
| K of Expression
| F of ParameterExpression * Expression
let sprintf' (c:System.Globalization.CultureInfo) (f:Printf.StringFormat<'a>) : 'a =
//'a has form 't1 -> 't2 -> ... -> string
let cultureExpr = Expression.Constant(c) :> Expression
let m = printfRegex.Match(f.Value)
let prefix = m.Groups.["text"].Captures.[0].Value
let inputTypes =
let rec loop t =
if Reflection.FSharpType.IsFunction t then
let dom, rng = Reflection.FSharpType.GetFunctionElements t
dom :: loop rng
else
if t <> typeof<string> then
failwithf "Unexpected return type: %A" t
[]
ref(loop typeof<'a>)
let pop() =
let (t::ts) = !inputTypes
inputTypes := ts
t
let exprs =
K(Expression.Constant(prefix)) ::
[for i in 0 .. m.Groups.["placeholder"].Captures.Count - 1 do
let ph = m.Groups.["placeholder"].Captures.[i].Value
let text = m.Groups.["text"].Captures.[i+1].Value
// TODO: handle flags, width, precision, other placeholder types, etc.
if ph = "%%" then yield K(Expression.Constant("%" + text))
else
match ph with
| "%f" ->
let t = pop()
if t <> typeof<float> && t <> typeof<float32> then
failwithf "Unexpected type for %%f placeholder: %A" t
let e = Expression.Variable t
yield F(e, Expression.Call(e, t.GetMethod("ToString", [| typeof<System.Globalization.CultureInfo> |]), [cultureExpr]))
| "%s" ->
let t = pop()
if t <> typeof<string> then
failwithf "Unexpected type for %%s placeholder: %A" t
let e = Expression.Variable t
yield F(e, e)
| _ ->
failwithf "unhandled placeholder: %s" ph
yield K (Expression.Constant text)]
let innerExpr =
Expression.Call(typeof<string>.GetMethod("Concat", [|typeof<string[]>|]), Expression.NewArrayInit(typeof<string>, exprs |> Seq.map (fun (K e | F(_,e)) -> e)))
:> Expression
let funcConvert =
typeof<FuncConvert>.GetMethods()
|> Seq.find (fun mi -> mi.Name = "ToFSharpFunc" && mi.GetParameters().[0].ParameterType.GetGenericTypeDefinition() = typedefof<Converter<_,_>>)
let body =
List.foldBack (fun pe (e:Expression) ->
match pe with
| K _ -> e
| F(p,_) ->
let m = funcConvert.MakeGenericMethod(p.Type, e.Type)
Expression.Call(m, Expression.Lambda(m.GetParameters().[0].ParameterType, e, p))
:> Expression) exprs innerExpr
Expression.Lambda(body, [||]).Compile().DynamicInvoke() :?> 'a
sprintf' (Globalization.CultureInfo.GetCultureInfo "fr-FR") "%s %f > %f" "It worked!" 1.5f -12.3
Taking a look at source code of Printf module, it uses invariantCulture. I don't think printf-like functions are culture aware.
If you always need a comma, you could use sprintf and string.Replace function. If your code is culture-dependent, using ToString or String.Format is your best bet.

F#: How do i split up a sequence into a sequence of sequences

Background:
I have a sequence of contiguous, time-stamped data. The data-sequence has gaps in it where the data is not contiguous. I want create a method to split the sequence up into a sequence of sequences so that each subsequence contains contiguous data (split the input-sequence at the gaps).
Constraints:
The return value must be a sequence of sequences to ensure that elements are only produced as needed (cannot use list/array/cacheing)
The solution must NOT be O(n^2), probably ruling out a Seq.take - Seq.skip pattern (cf. Brian's post)
Bonus points for a functionally idiomatic approach (since I want to become more proficient at functional programming), but it's not a requirement.
Method signature
let groupContiguousDataPoints (timeBetweenContiguousDataPoints : TimeSpan) (dataPointsWithHoles : seq<DateTime * float>) : (seq<seq< DateTime * float >>)= ...
On the face of it the problem looked trivial to me, but even employing Seq.pairwise, IEnumerator<_>, sequence comprehensions and yield statements, the solution eludes me. I am sure that this is because I still lack experience with combining F#-idioms, or possibly because there are some language-constructs that I have not yet been exposed to.
// Test data
let numbers = {1.0..1000.0}
let baseTime = DateTime.Now
let contiguousTimeStamps = seq { for n in numbers ->baseTime.AddMinutes(n)}
let dataWithOccationalHoles = Seq.zip contiguousTimeStamps numbers |> Seq.filter (fun (dateTime, num) -> num % 77.0 <> 0.0) // Has a gap in the data every 77 items
let timeBetweenContiguousValues = (new TimeSpan(0,1,0))
dataWithOccationalHoles |> groupContiguousDataPoints timeBetweenContiguousValues |> Seq.iteri (fun i sequence -> printfn "Group %d has %d data-points: Head: %f" i (Seq.length sequence) (snd(Seq.hd sequence)))
I think this does what you want
dataWithOccationalHoles
|> Seq.pairwise
|> Seq.map(fun ((time1,elem1),(time2,elem2)) -> if time2-time1 = timeBetweenContiguousValues then 0, ((time1,elem1),(time2,elem2)) else 1, ((time1,elem1),(time2,elem2)) )
|> Seq.scan(fun (indexres,(t1,e1),(t2,e2)) (index,((time1,elem1),(time2,elem2))) -> (index+indexres,(time1,elem1),(time2,elem2)) ) (0,(baseTime,-1.0),(baseTime,-1.0))
|> Seq.map( fun (index,(time1,elem1),(time2,elem2)) -> index,(time2,elem2) )
|> Seq.filter( fun (_,(_,elem)) -> elem <> -1.0)
|> PSeq.groupBy(fst)
|> Seq.map(snd>>Seq.map(snd))
Thanks for asking this cool question
I translated Alexey's Haskell to F#, but it's not pretty in F#, and still one element too eager.
I expect there is a better way, but I'll have to try again later.
let N = 20
let data = // produce some arbitrary data with holes
seq {
for x in 1..N do
if x % 4 <> 0 && x % 7 <> 0 then
printfn "producing %d" x
yield x
}
let rec GroupBy comp (input:LazyList<'a>) : LazyList<LazyList<'a>> =
LazyList.delayed (fun () ->
match input with
| LazyList.Nil -> LazyList.cons (LazyList.empty()) (LazyList.empty())
| LazyList.Cons(x,LazyList.Nil) ->
LazyList.cons (LazyList.cons x (LazyList.empty())) (LazyList.empty())
| LazyList.Cons(x,(LazyList.Cons(y,_) as xs)) ->
let groups = GroupBy comp xs
if comp x y then
LazyList.consf
(LazyList.consf x (fun () ->
let (LazyList.Cons(firstGroup,_)) = groups
firstGroup))
(fun () ->
let (LazyList.Cons(_,otherGroups)) = groups
otherGroups)
else
LazyList.cons (LazyList.cons x (LazyList.empty())) groups)
let result = data |> LazyList.of_seq |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
printfn "about to do a group"
for x in group do
printfn " %d" x
You seem to want a function that has signature
(`a -> bool) -> seq<'a> -> seq<seq<'a>>
I.e. a function and a sequence, then break up the input sequence into a sequence of sequences based on the result of the function.
Caching the values into a collection that implements IEnumerable would likely be simplest (albeit not exactly purist, but avoiding iterating the input multiple times. It will lose much of the laziness of the input):
let groupBy (fun: 'a -> bool) (input: seq) =
seq {
let cache = ref (new System.Collections.Generic.List())
for e in input do
(!cache).Add(e)
if not (fun e) then
yield !cache
cache := new System.Collections.Generic.List()
if cache.Length > 0 then
yield !cache
}
An alternative implementation could pass cache collection (as seq<'a>) to the function so it can see multiple elements to chose the break points.
A Haskell solution, because I don't know F# syntax well, but it should be easy enough to translate:
type TimeStamp = Integer -- ticks
type TimeSpan = Integer -- difference between TimeStamps
groupContiguousDataPoints :: TimeSpan -> [(TimeStamp, a)] -> [[(TimeStamp, a)]]
There is a function groupBy :: (a -> a -> Bool) -> [a] -> [[a]] in the Prelude:
The group function takes a list and returns a list of lists such that the concatenation of the result is equal to the argument. Moreover, each sublist in the result contains only equal elements. For example,
group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"]
It is a special case of groupBy, which allows the programmer to supply their own equality test.
It isn't quite what we want, because it compares each element in the list with the first element of the current group, and we need to compare consecutive elements. If we had such a function groupBy1, we could write groupContiguousDataPoints easily:
groupContiguousDataPoints maxTimeDiff list = groupBy1 (\(t1, _) (t2, _) -> t2 - t1 <= maxTimeDiff) list
So let's write it!
groupBy1 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy1 _ [] = [[]]
groupBy1 _ [x] = [[x]]
groupBy1 comp (x : xs#(y : _))
| comp x y = (x : firstGroup) : otherGroups
| otherwise = [x] : groups
where groups#(firstGroup : otherGroups) = groupBy1 comp xs
UPDATE: it looks like F# doesn't let you pattern match on seq, so it isn't too easy to translate after all. However, this thread on HubFS shows a way to pattern match sequences by converting them to LazyList when needed.
UPDATE2: Haskell lists are lazy and generated as needed, so they correspond to F#'s LazyList (not to seq, because the generated data is cached (and garbage collected, of course, if you no longer hold a reference to it)).
(EDIT: This suffers from a similar problem to Brian's solution, in that iterating the outer sequence without iterating over each inner sequence will mess things up badly!)
Here's a solution that nests sequence expressions. The imperitave nature of .NET's IEnumerable<T> is pretty apparent here, which makes it a bit harder to write idiomatic F# code for this problem, but hopefully it's still clear what's going on.
let groupBy cmp (sq:seq<_>) =
let en = sq.GetEnumerator()
let rec partitions (first:option<_>) =
seq {
match first with
| Some first' -> //'
(* The following value is always overwritten;
it represents the first element of the next subsequence to output, if any *)
let next = ref None
(* This function generates a subsequence to output,
setting next appropriately as it goes *)
let rec iter item =
seq {
yield item
if (en.MoveNext()) then
let curr = en.Current
if (cmp item curr) then
yield! iter curr
else // consumed one too many - pass it on as the start of the next sequence
next := Some curr
else
next := None
}
yield iter first' (* ' generate the first sequence *)
yield! partitions !next (* recursively generate all remaining sequences *)
| None -> () // return an empty sequence if there are no more values
}
let first = if en.MoveNext() then Some en.Current else None
partitions first
let groupContiguousDataPoints (time:TimeSpan) : (seq<DateTime*_> -> _) =
groupBy (fun (t,_) (t',_) -> t' - t <= time)
Okay, trying again. Achieving the optimal amount of laziness turns out to be a bit difficult in F#... On the bright side, this is somewhat more functional than my last attempt, in that it doesn't use any ref cells.
let groupBy cmp (sq:seq<_>) =
let en = sq.GetEnumerator()
let next() = if en.MoveNext() then Some en.Current else None
(* this function returns a pair containing the first sequence and a lazy option indicating the first element in the next sequence (if any) *)
let rec seqStartingWith start =
match next() with
| Some y when cmp start y ->
let rest_next = lazy seqStartingWith y // delay evaluation until forced - stores the rest of this sequence and the start of the next one as a pair
seq { yield start; yield! fst (Lazy.force rest_next) },
lazy Lazy.force (snd (Lazy.force rest_next))
| next -> seq { yield start }, lazy next
let rec iter start =
seq {
match (Lazy.force start) with
| None -> ()
| Some start ->
let (first,next) = seqStartingWith start
yield first
yield! iter next
}
Seq.cache (iter (lazy next()))
Below is some code that does what I think you want. It is not idiomatic F#.
(It may be similar to Brian's answer, though I can't tell because I'm not familiar with the LazyList semantics.)
But it doesn't exactly match your test specification: Seq.length enumerates its entire input. Your "test code" calls Seq.length and then calls Seq.hd. That will generate an enumerator twice, and since there is no caching, things get messed up. I'm not sure if there is any clean way to allow multiple enumerators without caching. Frankly, seq<seq<'a>> may not be the best data structure for this problem.
Anyway, here's the code:
type State<'a> = Unstarted | InnerOkay of 'a | NeedNewInner of 'a | Finished
// f() = true means the neighbors should be kept together
// f() = false means they should be split
let split_up (f : 'a -> 'a -> bool) (input : seq<'a>) =
// simple unfold that assumes f captured a mutable variable
let iter f = Seq.unfold (fun _ ->
match f() with
| Some(x) -> Some(x,())
| None -> None) ()
seq {
let state = ref (Unstarted)
use ie = input.GetEnumerator()
let innerMoveNext() =
match !state with
| Unstarted ->
if ie.MoveNext()
then let cur = ie.Current
state := InnerOkay(cur); Some(cur)
else state := Finished; None
| InnerOkay(last) ->
if ie.MoveNext()
then let cur = ie.Current
if f last cur
then state := InnerOkay(cur); Some(cur)
else state := NeedNewInner(cur); None
else state := Finished; None
| NeedNewInner(last) -> state := InnerOkay(last); Some(last)
| Finished -> None
let outerMoveNext() =
match !state with
| Unstarted | NeedNewInner(_) -> Some(iter innerMoveNext)
| InnerOkay(_) -> failwith "Move to next inner seq when current is active: undefined behavior."
| Finished -> None
yield! iter outerMoveNext }
open System
let groupContigs (contigTime : TimeSpan) (holey : seq<DateTime * int>) =
split_up (fun (t1,_) (t2,_) -> (t2 - t1) <= contigTime) holey
// Test data
let numbers = {1 .. 15}
let contiguousTimeStamps =
let baseTime = DateTime.Now
seq { for n in numbers -> baseTime.AddMinutes(float n)}
let holeyData =
Seq.zip contiguousTimeStamps numbers
|> Seq.filter (fun (dateTime, num) -> num % 7 <> 0)
let grouped_data = groupContigs (new TimeSpan(0,1,0)) holeyData
printfn "Consuming..."
for group in grouped_data do
printfn "about to do a group"
for x in group do
printfn " %A" x
Ok, here's an answer I'm not unhappy with.
(EDIT: I am unhappy - it's wrong! No time to try to fix right now though.)
It uses a bit of imperative state, but it is not too difficult to follow (provided you recall that '!' is the F# dereference operator, and not 'not'). It is as lazy as possible, and takes a seq as input and returns a seq of seqs as output.
let N = 20
let data = // produce some arbitrary data with holes
seq {
for x in 1..N do
if x % 4 <> 0 && x % 7 <> 0 then
printfn "producing %d" x
yield x
}
let rec GroupBy comp (input:seq<_>) = seq {
let doneWithThisGroup = ref false
let areMore = ref true
use e = input.GetEnumerator()
let Next() = areMore := e.MoveNext(); !areMore
// deal with length 0 or 1, seed 'prev'
if not(e.MoveNext()) then () else
let prev = ref e.Current
while !areMore do
yield seq {
while not(!doneWithThisGroup) do
if Next() then
let next = e.Current
doneWithThisGroup := not(comp !prev next)
yield !prev
prev := next
else
// end of list, yield final value
yield !prev
doneWithThisGroup := true }
doneWithThisGroup := false }
let result = data |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
printfn "about to do a group"
for x in group do
printfn " %d" x

Resources