Why does iterating previously read-in sequence trigger a new read?

Why does iterating previously read-in sequence trigger a new read? - f#

In this SO post, adding
inSeq
|> Seq.length
|> printfn "%d lines read"
caused the lazy sequence in inSeq to be read in.
OK, I've expanded on that code and want to first print out that sequence (see new program below).
When the Visual Studio (2012) debugger gets to
inSeq |> Seq.iter (fun x -> printfn "%A" x)
the read process starts over again. When I examine inSeq using the debugger, inSeq appears to have no elements in it.
If I have first read elements into inSeq, how can I see (examine) those elements and why won't they print out with the call to Seq.iter?
open System
open System.Collections.Generic
open System.Text
open System.IO
#nowarn "40"
let rec readlines () =
seq {
let line = Console.ReadLine()
if not (line.Equals("")) then
yield line
yield! readlines ()
}
[<EntryPoint>]
let main argv =
let inSeq = readlines ()
inSeq
|> Seq.length
|> printfn "%d lines read"
inSeq |> Seq.iter (fun x -> printfn "%A" x)
// This will keep it alive enough to read your output
Console.ReadKey() |> ignore
0
I've read somewhere that results of lazy evaluation are not cached. Is that what is going on here? How can I cache the results?

Sequence is not a "container" of items, rather it's a "promise" to deliver items sometime in the future. You can think of it as a function that you call, except it returns its result in chunks, not all at once. If you call that function once, it returns you the result once. If you call it second time, it will return the result second time.
Because your particular sequence is not pure, you can compare it to a non-pure function: you call it once, it returns a result; you call it second time, it may return something different.
Sequences do not automatically "remember" their items after the first read - exactly same way as functions do not automatically "remember" their result after the first call. If you want that from a function, you can wrap it in a special "caching" wrapper. And so you can do for a sequence as well.
The general technique of "caching return value" is usually called "memoization". For F# sequences in particular, it is implemented in the Seq.cache function.

Related

f# observable fork and side effect

I have an observable chain, the initial observable is from network, and will be fired every time message is ready to be read. The next handler then read the message and deserialize it. Now I have a fork of the observable, one is message handler and the other is logging the message.
The problem is that because I'm using observable I will actually try to read the message twice.
I understand that using Event instead of Observable will solve the issue, however I will then have a garbage collection issue that might cause sockets to not being collected.
One solution I thought of is to insert some kind of separator which end one chain of observables and creates a new one, does such a function already exist as part of fsharp or other library library.
Are there other solutions to the problem?
Edit:
Code example that doesn't work correctly
let messagesStream =
socket.observable |>
Observable.map (fun () -> socket.read ()) |>
Observable.map (fun m -> deserialize m)
messagesStream |> Observable.add (fun m -> printf m)
messagesStream |> Observable.add (fun m -> handle m)

The easiest way to add some logging is to use Observable.iter as follows:
let messagesStream =
socket.observable
|> Observable.map (fun () -> socket.read ())
|> Observable.map (fun m -> deserialize m)
|> Observable.iter (printfn "%A")
messagesStream |> Observable.add (fun m -> handle m)

It sounds like you could create one observable that handles the reading of the message from the network and deserializes it. Assuming this is done with standard Rx operators, that should return an observable that pushes new, deserialized network messages.
You can have 2 subscribers to that observable, one that reacts to new messages with whatever business logic you would like and the second subscriber logging the messages.
That should eliminate the side effect of reading multiple times from the network. Pushing 2 copies of the deserialized message should not incur a side effect.

Generating an infinite list of items does not "pause" when getting external input

I have some code which I'm expecting to pause when it asks for user input. It only does this however, if the last expression is Seq.initInfinite.
let consoleaction (i : int) =
Console.WriteLine ("Enter Input: ")
(Console.ReadLine().Trim(), i)
Seq.initInfinite (fun i -> consoleaction i) |> Seq.map (fun f -> printfn "%A" f)
printfn "foo" // program will not pause unless this line is commented out.
Very new to F# and I've spent way too much time on this already. Would like to know what is going on :)

If you try that piece of code in F# interactive you will see different effects depending on how you execute it.
For instance if you execute it in one shot it will create values but nothing will be executed since the Seq.initInfinite instruction is 'lost' I mean, not let-bound to anything and at the same time is a lazy expression so its side effects will not be executed. If you remove the last instruction it will start prompting, that's because fsi bounds to it the last expression so in order to show you the value of it it starts evaluating the seq expression.
Things are different if you put this in a function, for example:
open System
let myProgram() =
let consoleaction ...
Now you will get a warning on the Seq.initInfinite:
warning FS0020: This expression should have type 'unit', but has type
'seq<unit>'. Use 'ignore' to discard the result of the expression, or
'let' to bind the result to a name.
Which is very clear. Additionally to ignore as the warning suggest you can change the Seq.map to Seq.iter since you are not interested in the result of the map which will be a seq of units.
But now again your program will not execute (try myProgram())unless you remove the last line, the printfn and it's clear why, this is because it returns the last expression which is not the Seq.initInfinite which is lost since it's lazy and ignored.
If you remove the printfn it will become the 'return value' of your function so it will be evaluated when calling the function.

Avoid mutation in this example in F#

Coming from an OO background, I am having trouble wrapping my head around how to solve simple issues with FP when trying to avoid mutation.
let mutable run = true
let player1List = ["he"; "ho"; "ha"]
let addValue lst value =
value :: lst
while run do
let input = Console.ReadLine()
addValue player1List input |> printfn "%A"
if player1List.Length > 5 then
run <- false
printfn "all done" // daz never gunna happen
I know it is ok to use mutation in certain cases, but I am trying to train myself to avoid mutation as the default. With that said, can someone please show me an example of the above w/o using mutation in F#?
The final result should be that player1List continues to grow until the length of items are 6, then exit and print 'all done'

The easiest way is to use recursion
open System
let rec makelist l =
match l |> List.length with
|6 -> printfn "all done"; l
| _ -> makelist ((Console.ReadLine())::l)
makelist []
I also removed some the addValue function as it is far more idiomatic to just use :: in typical F# code.
Your original code also has a common problem for new F# coders that you use run = false when you wanted run <- false. In F#, = is always for comparison. The compiler does actually warn about this.

As others already explained, you can rewrite imperative loops using recursion. This is useful because it is an approach that always works and is quite fundamental to functional programming.
Alternatively, F# provides a rich set of library functions for working with collections, which can actually nicely express the logic that you need. So, you could write something like:
let player1List = ["he"; "ho"; "ha"]
let player2List = Seq.initInfinite (fun _ -> Console.ReadLine())
let listOf6 = Seq.append player1List list2 |> Seq.take 6 |> List.ofSeq
The idea here is that you create an infinite lazy sequence that reads inputs from the console, append it at the end of your initial player1List and then take first 6 elements.
Depending on what your actual logic is, you might do this a bit differently, but the nice thing is that this is probably closer to the logic that you want to implement...

In F#, we use recursion to do loop. However, if you know how many times you need to iterate, you could use F# List.fold like this to hide the recursion implementation.
[1..6] |> List.fold (fun acc _ -> Console.ReadLine()::acc) []

I would remove the pipe from match for readability but use it in the last expression to avoid extra brackets:
open System
let rec makelist l =
match List.length l with
| 6 -> printfn "all done"; l
| _ -> Console.ReadLine()::l |> makelist
makelist []

Working with large text files?

I need to import a large text file (55MB) (525000 * 25) and manipulate the data and produce some output. As usual I started exploring with f# interactive, and I get some really strange behaviours.
Is this file too large or my code wrong?
First test was to import and simply comute the sum over one column (not the end goal but first test):
let calctest =
let reader = new StreamReader(path)
let csv = reader.ReadToEnd()
csv.Split([|'\n'|])
|> Seq.skip 1
|> Seq.map (fun line -> line.Split([|','|]))
|> Seq.filter (fun a -> a.[11] = "M")
|> Seq.map (fun values -> float(values.[14]))
As expected this produces a seq of float both in typecheck and in interactive. If I know add:
|> Seq.sum
Type check works and says this function should return a float but if I run it in interactive I get this error:
System.IndexOutOfRangeException: Index was outside the bounds of the array
Then I removed the last line again and thought I look at the seq of float in a text file:
let writetest =
let str = calctest |> Seq.map (fun i -> i.ToString())
System.IO.File.WriteAllLines("test.txt", str )
Again, this passes the type check but throws errors in interactive.
Can the standard StreamReader not handle that amount of data? or am I going wrong somewhere? Should I use a different function then Streamreader?
Thanks.

Seq is lazy, which means that only when you add the Seq.sum is all the mapping and filtering actually being done, that's why you don't see the error before adding that line. Are you sure you have 15 columns on all rows? That's probably the problem
I would advise you to use the CSV Type Provider instead of just doing a string.Split, that way you'll be sure to not have an accidental IndexOutOfRangeException, and you'll handle , escaping correctly.
Additionaly, you're reading the whole csv file into memory by calling reader.ReadToEnd(), the CsvProvider supports streaming if you set the Cache parameter to false. It's not a problem with a 55MB file, but if you have something much larger it might be

How to evaluate only a part of a lazy sequence?

Lazy evaluation is a great boon for stuff like processing huge files that will not fit in main memory at one go. However, suppose there are some elements in the sequence that I want evaluated immediately, while the rest can be lazily computed - is there any way to specify that?
Specific problem: (in case that helps to answer the question)
Specifically, I am using a series of IEnumerables as iterators for multiple sequences - these sequences are data read from files opened using BinaryReader streams (each sequence is responsible for the reading in of data from one of the files). The MoveNext() on these is to be called in a specific order. Eg. iter0 then iter1 then iter5 then iter3 .... and so on. This order is specified in another sequence index = {0,1,5,3,....}. However sequences being lazy, the evaluation is naturally done only when required. Hence, the file reads (for the sequences right at the beginning that read from files on disk) happens as the IEnumerables for a sequence are moving. This is causing an illegal file access - a file that is being read by one process is accessed again (as per the error msg).
True, the illegal file access could be for other reasons, and after having tried my best to debug other causes a partially lazy evaluation might be worth trying out.

While I agree with Tomas' comment: you shouldn't need this if file sharing is handled properly, here's one way to eagerly evaluate the first N elements:
let cacheFirst n (items: seq<_>) =
seq {
use e = items.GetEnumerator()
let i = ref 0
yield!
[
while !i < n && e.MoveNext() do
yield e.Current
incr i
]
while e.MoveNext() do
yield e.Current
}
Example
let items = Seq.initInfinite (fun i -> printfn "%d" i; i)
items
|> Seq.take 10
|> cacheFirst 5
|> Seq.take 3
|> Seq.toList
Output
0
1
2
3
4
val it : int list = [0; 1; 2]

Daniel's solution is sound, but I don't think we need another operator, just Seq.cache for most cases.
First cache your sequence:
let items = Seq.initInfinite (fun i -> printfn "%d" i; i) |> Seq.cache
Eager evaluation followed by lazy access from the beginning:
let eager = items |> Seq.take 5 |> Seq.toList
let cached = items |> Seq.take 3 |> Seq.toList
This will evaluate the first 5 elements once (during eager) but make them cached for secondary access.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Why does iterating previously read-in sequence trigger a new read? - f#

Related

f# observable fork and side effect

Generating an infinite list of items does not "pause" when getting external input

Avoid mutation in this example in F#

Working with large text files?

How to evaluate only a part of a lazy sequence?

Categories

Resources