F# map a seq to another seq of shorter length

I have a sequence of strings like this (lines in a file)
[20150101] error a
details 1
details 2
[20150101] error b
details
[20150101] error c
I am trying to map this to a sequence of strings like this (log entries)
[20150101] error a details 1 details 2
[20150101] error b details
[20150101] error c
I can do this in an imperative way (by translating the code I would write in C#) - this works but it reads like pseudo-code because I have omitted the referenced functions:
let getLogEntries logFilePath =
    seq {
        let logEntryLines = new ResizeArray<string>()
        for lineOfText in getLinesOfText logFilePath do
            if isStartOfNewLogEntry lineOfText && logEntryLines.Any() then
                yield joinLines logEntryLines
                logEntryLines.Clear()
            logEntryLines.Add(lineOfText)
        if logEntryLines.Any() then
            yield joinLines logEntryLines
    }
Is there a more functional way of doing this?
I can't use Seq.map since it's not a one-to-one mapping, and Seq.fold doesn't seem right because I suspect it will process the entire input sequence before returning the results (not great if I have very large log files). I assume my code above isn't the ideal way to do this in F# because it's using ResizeArray<string>.

In general, when there is no built-in function that you can use, the functional way to solve things is to use recursion. Here, you can recursively walk over the input, remember the items of the last chunk (since the last [xyz] Info line) and produce new results when you reach a new starting block. In F#, you can write this nicely with sequence expressions:
let rec joinDetails (lines:string list) lastChunk = seq {
    match lines with
    | [] ->
        // We are at the end - if there are any records left, produce a new item!
        if lastChunk <> [] then yield String.concat " " (List.rev lastChunk)
    | line::lines when line.StartsWith("[") ->
        // New block starting. Produce a new item and then start a new chunk
        if lastChunk <> [] then yield String.concat " " (List.rev lastChunk)
        yield! joinDetails lines [line]
    | line::lines ->
        // Ordinary line - just add it to the last chunk that we're collecting
        yield! joinDetails lines (line::lastChunk) }
Here is an example showing the code in action:
let lines =
    [ "[20150101] error a"
      "details 1"
      "details 2"
      "[20150101] error b"
      "details"
      "[20150101] error c" ]
joinDetails lines []
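Evaluating joinDetails lines [] in F# Interactive should then produce the three log entries the question asks for:
[20150101] error a details 1 details 2
[20150101] error b details
[20150101] error c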

There is not much built into Seq that is going to help you, so you have to roll your own solution. Ultimately, parsing a file like this involves iterating and maintaining state, but what F# does is encapsulate that iteration and state by means of computation expressions (hence your use of the seq computation expression).
What you've done isn't bad but you could extract your code into a generic function that computes the chunks (i.e. sequences of strings) in an input sequence without knowledge of the format. The rest, i.e. parsing an actual log file, can be made purely functional.
I have written this function in the past to help with this.
let chunkBy chunkIdentifier source =
    seq {
        let chunk = ref []
        for sourceItem in source do
            let isNewChunk = chunkIdentifier sourceItem
            if isNewChunk && !chunk <> [] then
                yield !chunk
                chunk := [ sourceItem ]
            else chunk := !chunk @ [ sourceItem ]
        yield !chunk
    }
It takes a chunkIdentifier function which returns true if the input is the start of a new chunk.
Parsing a log file is simply a case of extracting the lines, computing the chunks and joining each chunk:
logEntryLines |> chunkBy (fun line -> line.[0] = '[')
|> Seq.map (fun s -> String.Join (" ", s))
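Wired together, a minimal sketch of the full pipeline might look like this (assuming the question's getLinesOfText helper, and that every line is non-empty so line.[0] is safe):
let getLogEntries logFilePath =
    getLinesOfText logFilePath
    |> chunkBy (fun (line: string) -> line.[0] = '[')
    |> Seq.map (fun chunk -> System.String.Join(" ", chunk))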
By encapsulating the iteration and mutation as much as possible, while creating a reusable function, it's more in the spirit of functional programming.

Alternatively, here are two more variants:
let lst = ["[20150101] error a";
           "details 1";
           "details 2";
           "[20150101] error b";
           "details";
           "[20150101] error c";]

let fun1 (xs:string list) =
    let sb = new System.Text.StringBuilder(xs.Head)
    xs.Tail
    |> Seq.iter (fun x ->
        (match x.[0] with
         | '[' -> sb.Append("\n" + x)
         | _ -> sb.Append(" " + x))
        |> ignore)
    sb.ToString()

lst |> fun1 |> printfn "%s"
printfn ""

let fun2 (xs:string list) =
    List.fold (fun acc (x:string) ->
        acc + (match x.[0] with | '[' -> "\n" | _ -> " ") + x) xs.Head xs.Tail

lst |> fun2 |> printfn "%s"
Output:
[20150101] error a details 1 details 2
[20150101] error b details
[20150101] error c
[20150101] error a details 1 details 2
[20150101] error b details
[20150101] error c
Link:
https://dotnetfiddle.net/3KcIwv

Related

Splitting a Seq of Strings Of Variable Length in F#

I am using a .fasta file in F#. When I read it from disk, it is a sequence of strings. Each observation is usually 4-5 strings in length: 1st string is the title, then 2-4 strings of amino acids, and then 1 string of space. For example:
let filePath = #"/Users/XXX/sample_database.fasta"
let fileContents = File.ReadLines(filePath)
fileContents |> Seq.iter(fun x -> printfn "%s" x)
yields:
I am looking for a way to split each observation into its own collection using the out-of-the-box higher-order functions in F#. I do not want to use any mutable variables or for..each syntax. I thought Seq.chunkBySize would work, but the size varies. Is there a Seq.chunkByCharacter?
Mutable variables are totally fine for this, provided their mutability doesn't leak into a wider context. Why exactly do you not want to use them?
But if you really want to go hardcore "functional", then the usual functional way of doing something like that is via fold.
Your folding state would be a pair of "blocks accumulated so far" and "current block".
At each step, if you get a non-empty string, you attach it to the "current block".
And if you get an empty string, that means the current block is over, so you attach the current block to the list of "blocks so far" and make the current block empty.
This way, at the end of folding you'll end up with a pair of "all blocks accumulated except the last one" and "last block", which you can glue together.
Plus, an optimization detail: since I'm going to do a lot of "attach a thing to a list", I'd like to use a linked list for that, because it has constant-time attaching. But then the problem is that it's only constant time for prepending, not appending, which means I'll end up with all the lists reversed. But no matter: I'll just reverse them again at the very end. List reversal is a linear operation, which means my whole thing would still be linear.
let splitEm lines =
    let step (blocks, currentBlock) s =
        match s with
        | "" -> (List.rev currentBlock :: blocks), []
        | _ -> blocks, s :: currentBlock
    let (blocks, lastBlock) = Array.fold step ([], []) lines
    List.rev (List.rev lastBlock :: blocks)
Usage:
> splitEm [| "foo"; "bar"; "baz"; ""; "1"; "2"; ""; "4"; "5"; "6"; "7"; ""; "8" |]
[["foo"; "bar"; "baz"]; ["1"; "2"]; ["4"; "5"; "6"; "7"]; ["8"]]
Note 1: You may have to address some edge cases depending on your data and what you want the behavior to be. For example, if there is an empty line at the very end, you'll end up with an empty block at the end.
Note 2: You may notice that this is very similar to imperative algorithm with mutating variables: I'm even talking about things like "attach to list of blocks" and "make current block empty". This is not a coincidence. In this purely functional version the "mutating" is accomplished by calling the same function again with different parameters, while in an equivalent imperative version you would just have those parameters turned into mutable memory cells. Same thing, different view. In general, any imperative iteration can be turned into a fold this way.
For comparison, here's a mechanical translation of the above to imperative mutation-based style:
let splitEm lines =
    let mutable blocks = []
    let mutable currentBlock = []
    for s in lines do
        match s with
        | "" -> blocks <- List.rev currentBlock :: blocks; currentBlock <- []
        | _ -> currentBlock <- s :: currentBlock
    List.rev (List.rev currentBlock :: blocks)
To illustrate Fyodor's point about contained mutability, here's an example that is mutable as can be while still somewhat reasonable. The outer functional layer is a sequence expression, a common pattern demonstrated by Seq.scan in the F# source.
let chooseFoldSplit folding (state : 'State) (source : seq<'T>) : seq<'U[]> =
    seq {
        let sref, zs = ref state, ResizeArray()
        use ie = source.GetEnumerator()
        while ie.MoveNext() do
            let newState, uopt = folding !sref ie.Current
            if newState <> !sref then
                yield zs.ToArray()
                zs.Clear()
                sref := newState
            match uopt with
            | None -> ()
            | Some u -> zs.Add u
        if zs.Count > 0 then
            yield zs.ToArray() }
// val chooseFoldSplit :
// folding:('State -> 'T -> 'State * 'U option) ->
// state:'State -> source:seq<'T> -> seq<'U []> when 'State : equality
There is mutability of a ref cell (equivalent to a mutable variable), and there is a mutable data structure: ResizeArray, an alias for System.Collections.Generic.List<'T>, which allows appending at O(1) cost.
The folding function's signature 'State -> 'T -> 'State * 'U option is reminiscent of the folder of fold, except that it causes the result sequence to be split whenever its state changes. It also produces an option that denotes the next member of the current group (or None if there is none).
It would work without the conversion to a persistent array only if you iterate the resulting sequence lazily and exactly once; therefore we need to isolate the contents of the ResizeArray from the outside world.
The simplest folding for your use case is negation of a boolean, but you could leverage it for more complex tasks like numbering your records:
[| "foo"; "1"; "2"; ""; "bar"; "4"; "5"; "6"; "7"; ""; "baz"; "8"; "" |]
|> chooseFoldSplit (fun b t ->
    if t = "" then not b, None else b, Some t) false
|> Seq.map (fun a ->
    if a.Length > 1 then
        { Description = a.[0]; Sequence = String.concat "" a.[1..] }
    else failwith "Format error")
// val it : seq<FastaEntry> =
// seq [{Description = "foo";
// Sequence = "12";}; {Description = "bar";
// Sequence = "4567";}; {Description = "baz";
// Sequence = "8";}]
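Note that this example assumes a record type along the lines of the one defined in the other answers here, e.g.:
type FastaEntry = { Description: string; Sequence: string }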
I went with recursion:
type FastaEntry = {Description:String; Sequence:String}

let generateFastaEntry (chunk:String seq) =
    match chunk |> Seq.length with
    | 0 -> None
    | _ ->
        let description = chunk |> Seq.head
        let sequence = chunk |> Seq.tail |> Seq.reduce (fun acc x -> acc + x)
        Some {Description=description; Sequence=sequence}

let rec chunk acc contents =
    let index = contents |> Seq.tryFindIndex(fun x -> String.IsNullOrEmpty(x))
    match index with
    | None ->
        let fastaEntry = generateFastaEntry contents
        match fastaEntry with
        | Some x -> Seq.append acc [x]
        | None -> acc
    | Some x ->
        let currentChunk = contents |> Seq.take x
        let fastaEntry = generateFastaEntry currentChunk
        match fastaEntry with
        | None -> acc
        | Some y ->
            let updatedAcc =
                match Seq.isEmpty acc with
                | true -> seq {y}
                | false -> Seq.append acc (seq {y})
            let remaining = contents |> Seq.skip (x+1)
            chunk updatedAcc remaining
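For reference, a usage sketch, assuming fileContents is the seq<string> obtained from File.ReadLines earlier in the question:
fileContents |> chunk Seq.empty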
You can also use regular expressions for this kind of thing. Here is a solution that uses a regular expression to extract a whole Fasta block at once.
open System.Text.RegularExpressions

type FastaEntry = {
    Description: string
    Sequence: string
}

let fastaRegexStr =
    @"
    ^>           # Line Starting with >
    (.*)         # Capture into $1
    \r?\n        # End-of-Line
    (            # Capturing in $2
      (?:
        ^          # A Line ...
        [A-Z]+     # .. containing A-Z
        \*? \r?\n  # Optional(*) followed by End-of-Line
      )+           # ^ Multiple of those lines
    )
    (?:
      (?: ^ [ \t\v\f]* \r?\n )  # Match an empty (whitespace) line ..
      |                         # or
      \z                        # End-of-String
    )
    "

(* Regex for matching one Fasta Block *)
let fasta = Regex(fastaRegexStr, RegexOptions.IgnorePatternWhitespace ||| RegexOptions.Multiline)

(* Whole file as a string *)
let content = System.IO.File.ReadAllText "fasta.fasta"

let entries = [
    for m in fasta.Matches(content) do
        let desc = m.Groups.[1].Value
        (* Remove *, \r and \n from string *)
        let sequ = Regex.Replace(m.Groups.[2].Value, @"\*|\r|\n", "")
        {Description=desc; Sequence=sequ}
]
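For example, to print what was parsed from the entries list built above:
entries |> List.iter (fun e -> printfn "%s: %s" e.Description e.Sequence)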

Remove All but First Occurrence of a Character in a List of Strings

I have a list of names, and I need to output a single string that shows the letters from the names in the order they appear without the duplicates (e.g. If the list is ["John"; "James"; "Jack"], the output string should be Johnamesck). I've got a solution (folding all the names into a string then parse), but I feel like I'm cheesing it a bit by making my string mutable.
I also want to state this is not a school assignment, just an exercise from a work colleague as I'm coming into F# from only ever knowing Java Web stuff.
Here is my working solution (for insight purposes):
let lower = ['a' .. 'z']
let upper = ['A' .. 'Z']
let mutable concatedNames = ["John"; "James"; "Jack"] |> List.fold (+) ""

let greaterThanOne (length : int) = length > 1

let stripChars (str : string) letter =
    let parts = str.Split([| letter |])
    match greaterThanOne (Array.length parts) with
    | true ->
        seq {
            yield Array.head parts
            yield string letter
            yield! Array.tail parts
        }
        |> String.concat ""
    | _ -> str

let killAllButFirstLower =
    lower |> List.iter (fun letter -> concatedNames <- (stripChars concatedNames letter))
let killAllButFirstUpper =
    upper |> List.iter (fun letter -> concatedNames <- (stripChars concatedNames letter))

printfn "All names with duplicate letters removed: %s" concatedNames
I originally wanted to do this explicitly with functions alone, and had a solution prior to the one above:
let lower = ['a' .. 'z']
let upper = ['A' .. 'Z']
:
:
:
let lowerStripped = [""]

let killLowerDuplicates =
    lower |> List.iter (fun letter ->
        match lowerStripped.Length with
        | 1 -> (stripChars concatedNames letter)::lowerStripped |> ignore
        | _ -> (stripChars (List.head lowerStripped) letter)::lowerStripped |> ignore
    )

let upperStripped = [List.head lowerStripped]
let killUpperDuplicates =
    lower |> List.iter (fun letter -> (stripChars (List.head upperStripped) letter)::upperStripped |> ignore)

let strippedAll = List.head upperStripped
printfn "%s" strippedAll
But I couldn't get this working because I realized the consed lists weren't going anywhere (not to mention this is probably inefficient). The idea was that by doing it this way, once I parsed everything, the first element of the list would be the desired string.
I understand it may be strange asking a question I already have a solution to, but I feel like using mutable is just me not letting go of my Imperative habits (as I've read it should be rare to need to use it) and I want to more reinforce pure functional. So is there a better way to do this? Is the second solution a feasible route if I can somehow pipe the result somewhere?
You can use Seq.distinct to remove duplicates and retain ordering, so you just need to convert the list of strings to a single string, which can be done with String.concat "":
let distinctChars s =
    s
    |> String.concat ""
    |> Seq.distinct
    |> Array.ofSeq
    |> System.String
If you run distinctChars ["John"; "James"; "Jack"], you will get back:
"Johnamesck"
This should do the trick:
let removeDuplicateCharacters strings =
    // Treat each string as a seq<char>, flattening them into one big seq<char>.
    // The id function (f(x) = x) is built in to F#; we use it here because we
    // want to collect the characters themselves.
    let chars = strings |> Seq.collect id
    chars
    |> Seq.mapi (fun i c -> i, c)   // Get the index of each character in the overall sequence
    |> Seq.choose (fun (i, c) ->
        if i = (chars |> Seq.findIndex ((=) c))  // Is this character's index the same as the first occurrence's index?
        then Some c    // If so, return (Some c) so that `choose` will include it,
        else None)     // Otherwise, return None so that `choose` will ignore it
    |> Seq.toArray     // Convert the seq<char> into a char []
    |> System.String   // Call the new String(char []) constructor with the chosen characters
Basically, we just treat the list of strings as one big sequence of characters, and choose the ones where the index in the overall sequence is the same as the index of the first occurrence of that character.
Running removeDuplicateCharacters ["John"; "James"; "Jack";] gives the expected output: "Johnamesck".

F# Writing to file changes behavior on return type

I have the following function that convert csv files to a specific txt schema (expected by CNTKTextFormat Reader):
open System.IO
open FSharp.Data;
open Deedle;
let convert (inFileName : string) =
    let data = Frame.ReadCsv(inFileName)
    let outFileName = inFileName.Substring(0, (inFileName.Length - 4)) + ".txt"
    use outFile = new StreamWriter(outFileName, false)
    data.Rows.Observations
    |> Seq.map (fun kvp ->
        let row = kvp.Value |> Series.observations |> Seq.map (fun (k,v) -> v) |> Seq.toList
        match row with
        | label::data ->
            let body = data |> List.map string |> String.concat " "
            outFile.WriteLine(sprintf "|labels %A |features %s" label body)
            printf "%A" label
        | _ ->
            failwith "Bad data.")
    |> ignore
Strangely, the output file is empty after running in the F# interactive panel and that printf yields no printing at all.
If I remove the ignore to make sure that there are actual rows being processed (evidenced by returning a seq of nulls), instead of an empty file I get:
val it : seq<unit> = Error: Cannot write to a closed TextWriter.
Before, I was declaring the StreamWriter using let and disposing it manually, but I also generated empty files or just a few lines (say 5 out of thousands).
What is happening here? Also, how to fix the file writing?
Seq.map returns a lazy sequence which is not evaluated until it is iterated over. You are not currently iterating over it within convert so no rows are processed. If you return a Seq<unit> and iterate over it outside convert, outFile will already be closed which is why you see the exception.
You should use Seq.iter instead:
data.Rows.Observations
|> Seq.iter (fun kvp -> ...)
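For example, a minimal sketch of the fixed function (the question's own body with Seq.iter in place of Seq.map and the trailing ignore removed):
let convert (inFileName : string) =
    let data = Frame.ReadCsv(inFileName)
    let outFileName = inFileName.Substring(0, (inFileName.Length - 4)) + ".txt"
    use outFile = new StreamWriter(outFileName, false)
    data.Rows.Observations
    |> Seq.iter (fun kvp ->
        let row = kvp.Value |> Series.observations |> Seq.map snd |> Seq.toList
        match row with
        | label::data ->
            let body = data |> List.map string |> String.concat " "
            outFile.WriteLine(sprintf "|labels %A |features %s" label body)
        | _ -> failwith "Bad data.")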
Apart from the solutions already mentioned, you could also avoid the StreamWriter altogether, and use one of the standard .Net functions, File.WriteAllLines. You would prepare a sequence of converted lines, and then write that to the file:
let convert (inFileName : string) =
    let lines =
        Frame.ReadCsv(inFileName).Rows.Observations
        |> Seq.map (fun kvp ->
            let row = kvp.Value |> Series.observations |> Seq.map snd |> Seq.toList
            match row with
            | label::data ->
                let body = data |> List.map string |> String.concat " "
                printf "%A" label
                sprintf "|labels %A |features %s" label body
            | _ ->
                failwith "Bad data.")
    let outFileName = inFileName.Substring(0, (inFileName.Length - 4)) + ".txt"
    File.WriteAllLines(outFileName, lines)
Update based on the discussion in the comments: Here's a solution that avoids Deedle altogether. I'm making some assumptions about your input file format here, based on another question you posted today: Label is in column 1, features follow.
let lines =
    File.ReadLines inFileName
    |> Seq.map (fun line ->
        match Seq.toList (line.Split ',') with
        | label::data ->
            let body = data |> List.map string |> String.concat " "
            printf "%A" label
            sprintf "|labels %A |features %s" label body
        | _ ->
            failwith "Bad data.")
As Lee already mentioned, Seq.map is lazy. And that's also why you were getting "Cannot write to a closed TextWriter": the use keyword disposes of its IDisposable when it goes out of scope. In this case, that's at the end of your function. Since Seq.map is lazy, your function was returning an unevaluated sequence object, which had closed over the StreamWriter in your use statement -- but by the time you evaluated that sequence (in whatever part of your code checked for the Seq of nulls, or in the F# Interactive window), the StreamWriter had already been disposed by going out of scope.
Change Seq.map to Seq.iter and both of your problems will be solved.
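To make the disposal timing concrete, here is a stripped-down illustration (the writeLazily/writeEagerly helpers are hypothetical, not from the question):
open System.IO

// Lazy: the Seq.map is not evaluated inside the function, so by the time a
// caller iterates the result, the 'use' scope has ended and the writer is disposed.
let writeLazily (path: string) (items: string seq) =
    use writer = new StreamWriter(path)
    items |> Seq.map (fun line -> writer.WriteLine line)

// Eager: Seq.iter runs while the writer is still open, so every line is written.
let writeEagerly (path: string) (items: string seq) =
    use writer = new StreamWriter(path)
    items |> Seq.iter (fun line -> writer.WriteLine line)

// writeLazily "out.txt" ["a"; "b"] |> Seq.toList   // throws: Cannot write to a closed TextWriter
// writeEagerly "out.txt" ["a"; "b"]                // writes both lines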

F# idiomatic way of transforming text

Myello! So I am looking for a concise, efficient and idiomatic way in F# to parse a file or a string. I have a strong preference to treat the input as a sequence of chars (char seq). The idea is that every function is responsible for parsing a piece of the input, returning the converted text tupled with the unused input, and being called by a higher-level function that chains the unused input to the following functions and uses the results to build a compound type. Every parsing function should therefore have a signature similar to this one: char seq -> char seq * 'a. If, for example, the function's responsibility is simply to extract the first word, then one approach would be the following:
let parseFirstWord (text: char seq) =
    let rec forTailRecursion t acc =
        let c = Seq.head t
        if c = '\n' then
            (t, acc)
        else
            forTailRecursion (Seq.skip 1 t) (c::acc)
    let rest, reversedWord = forTailRecursion text []
    (rest, List.rev reversedWord)
Now, of course the main problem with this approach is that it extracts the word in reverse order, so you have to reverse it. Its main advantages, however, are that it uses strictly functional features and proper tail recursion. One could avoid reversing the extracted value at the cost of losing tail recursion:
let rec parseFirstWord (text: char seq) =
    let c = Seq.head text
    if c = '\n' then
        (text, [])
    else
        let rest, tail = parseFirstWord (Seq.skip 1 text)
        (rest, (c::tail))
Or use a fast mutable data structure underneath instead of using purely functional features, such as:
open System.Collections.Generic

let parseFirstWord (text: char seq) =
    let rec forTailRecursion t (queue: Queue<char>) =
        let c = Seq.head t
        if c = '\n' then
            (t, queue)
        else
            queue.Enqueue(c)
            forTailRecursion (Seq.skip 1 t) queue
    forTailRecursion text (new Queue<char>())
I have no idea how to use OO concepts in F# mind you so corrections to the above code are welcome.
Being new to this language, I would like to be guided in terms of the usual compromises that an F# developer makes. Among the suggested approaches and your own, which should I consider more idiomatic and why? Also, in that particular case, how would you encapsulate the return value: char seq * char seq, char seq * char list or even char seq * Queue<char>? Or would you even consider char seq * String following a proper conversion?
I would definitely have a look at FsLex, FsYacc and FParsec. However, if you just want to tokenize a seq<char>, you can use a sequence expression to generate tokens in the right order. Reusing your idea of a recursive inner function, and combining it with a sequence expression, we can stay tail recursive as shown below, and avoid non-idiomatic tools like mutable data structures.
I changed the separator char for easy debugging and the signature of the function. This version produces a seq<string> (your tokens) as result, which is probably easier to consume than a tuple with the current token and the rest of the text. If you just want the first token, you can just take the head. Note that the sequence is generated 'on demand', i.e. the input is parsed only as tokens are consumed through the sequence. Should you need the remainder of the input text next to each token, you can yield a pair in loop instead, but I'm guessing the downstream consumer most likely wouldn't (furthermore, if the input text is itself a lazy sequence, possibly linked to a stream, we don't want to expose it as it should be iterated through only in one place).
let parse (text : char seq) =
    let rec loop t acc =
        seq {
            if Seq.isEmpty t then yield acc
            else
                let c, rest = Seq.head t, Seq.skip 1 t
                if c = ' ' then
                    yield acc
                    yield! loop rest ""
                else yield! loop rest (acc + string c)
        }
    loop text ""
parse "The FOX is mine"
val it : seq<string> = seq ["The"; "FOX"; "is"; "mine"]
This is not the only 'idiomatic' way of doing this in F#. Every time we need to process a sequence, we can look at the functions made available in the Seq module. The most general of these is fold which iterates through a sequence once, accumulating a state at each element by running a given function. In the example below accumulate is such a function, that progressively builds the resulting sequence of tokens. Since Seq.fold doesn't run the accumulator function on an empty sequence, we need the last two lines to extract the last token from the function's internal accumulator.
This second implementation keeps the nice characteristics of the first, i.e. tail recursion (inside the fold implementation, if I'm not mistaken) and processing of the input sequence on demand. It also happens to be shorter, albeit probably a bit less readable.
let parse2 (text : char seq) =
    let accumulate (res, acc) c =
        if c = ' ' then (Seq.append res (Seq.singleton acc), "")
        else (res, acc + string c)
    let (acc, last) = text |> Seq.fold accumulate (Seq.empty, "")
    Seq.append acc (Seq.singleton last)
parse2 "The FOX is mine"
val it : seq<string> = seq ["The"; "FOX"; "is"; "mine"]
One way of lexing/parsing in a way truly unique to F# is by using active patterns. The following simplified example shows the general idea. It can process a calculation string of arbitrary length without producing a stack overflow.
let rec (|CharOf|_|) set = function
    | c :: rest when Set.contains c set -> Some(c, rest)
    | ' ' :: CharOf set (c, rest) -> Some(c, rest)
    | _ -> None

let rec (|CharsOf|) set = function
    | CharOf set (c, CharsOf set (cs, rest)) -> c::cs, rest
    | rest -> [], rest

let (|StringOf|_|) set = function
    | CharsOf set (_::_ as cs, rest) -> Some(System.String(Array.ofList cs), rest)
    | _ -> None

type Token =
    | Int of int
    | Add | Sub | Mul | Div | Mod
    | Unknown

let lex: string -> _ =
    let digits = set ['0'..'9']
    let ops = Set.ofSeq "+-*/%"
    let rec lex chars =
        seq { match chars with
              | StringOf digits (s, rest) -> yield Int(int s); yield! lex rest
              | CharOf ops (c, rest) ->
                  let op =
                      match c with
                      | '+' -> Add | '-' -> Sub | '*' -> Mul | '/' -> Div | '%' -> Mod
                      | _ -> failwith "invalid operator char"
                  yield op; yield! lex rest
              | [] -> ()
              | _ -> yield Unknown }
    List.ofSeq >> lex
lex "1234 + 514 / 500"
// seq [Int 1234; Add; Int 514; Div; Int 500]

Tail recursive copy of a seq to a list in F#

I am trying to build a list from a sequence by recursively appending the first element of the sequence to the list:
open System
let s = seq [for i in 2..4350 -> i,2*i]

let rec copy s res =
    if (s |> Seq.isEmpty) then
        res
    else
        let (a,b) = s |> Seq.head
        Console.WriteLine(string a)
        let newS = s |> Seq.skip(1) |> Seq.cache
        let newRes = List.append res ([(a,b)])
        copy newS newRes

copy s ([])
Two problems:
1. I'm getting a stack overflow, which means my tail recursive ploy sucks, and
2. why is the code 100x faster when I put |> Seq.cache here: let newS = s |> Seq.skip(1) |> Seq.cache?
(Note this is just a little exercise, I understand you can do Seq.toList etc.)
Thanks a lot
One way that works is ( the two points still remain a bit weird to me ):
let toList (s:seq<_>) =
    let rec copyRev res (enum:Collections.Generic.IEnumerator<_*_>) =
        let somethingLeft = enum.MoveNext()
        if not(somethingLeft) then
            res
        else
            let curr = enum.Current
            Console.WriteLine(string curr)
            let newRes = curr::res
            copyRev newRes enum
    let enumerator = s.GetEnumerator()
    (copyRev ([]) (enumerator)) |> List.rev
You say it's just an exercise, but it's useful to point to my answer to
While or Tail Recursion in F#, what to use when?
and reiterate that you should favor more applicative/declarative constructs when possible. E.g.
let rec copy2 s = [
    for tuple in s do
        System.Console.WriteLine(string(fst tuple))
        yield tuple
]
is a nice and performant way to express your particular function.
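For instance, with the s defined in the question, copy2 s |> List.length evaluates to 4349 while printing each first element along the way.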
That said, I'd feel remiss if I didn't also say "never create a list that big". For huge data, you want either array or seq.
In my short experience with F#, it is not a good idea to use Seq.skip 1 the way you would use tail with lists. Seq.skip creates a new IEnumerable/sequence rather than just skipping n elements, so your function will be A LOT slower than Seq.toList. You should probably do it imperatively with
s.GetEnumerator()
and iterate through the sequence, holding a list onto which you cons every single element.
In this question
Take N elements from sequence with N different indexes in F#
I started to do something similar to what you do but found out it is very slow. See my method for inspiration for how to do it.
Addition: I have written this:
let seqToList (xs : seq<'a>) =
    let e = xs.GetEnumerator()
    let mutable res = []
    while e.MoveNext() do
        res <- e.Current :: res
    List.rev res
And found out that the built-in method actually does something very similar (including the reverse part). It does, however, check whether the sequence you have supplied is in fact a list or an array.
You will be able to make the code entirely functional (which I also did now - couldn't resist ;-)):
let seqToList (xs : seq<'a>) =
    Seq.fold (fun state t -> t :: state) [] xs |> List.rev
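A quick sanity check in F# Interactive, for example:
seqToList (seq { 1 .. 5 })  // [1; 2; 3; 4; 5]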
Your function is properly tail recursive, so the recursive calls themselves are not what is overflowing the stack. Instead, the problem is that Seq.skip is poorly behaved when used recursively, as others have pointed out. For instance, this code overflows the stack on my machine:
let mutable s = seq { 1 .. 20001 }
for i in 1 .. 20000 do
    s <- Seq.skip 1 s
let v = Seq.head s
Perhaps you can see the vague connection to your own code, which also eventually takes the head of a sequence which results from repeatedly applying Seq.skip 1 to your initial sequence.
Try the following code.
Warning: Before running this code you will need to enable tail call generation in Visual Studio. This can be done through the Build tab on the project properties page. If this is not enabled the code will StackOverflow processing the continuation.
open System
open System.Collections.Generic

let s = seq [for i in 2..1000000 -> i,2*i]

let rec copy (s : (int * int) seq) =
    use e = s.GetEnumerator()
    let rec inner cont =
        if e.MoveNext() then
            let (a,b) = e.Current
            printfn "%d" b
            inner (fun l -> cont (b :: l))
        else cont []
    inner (fun x -> x)

let res = copy s
printfn "Done"
