IEnumerator in F# continued - f#

This is a follow-up to my earlier question about the Persistence class that I was trying to expose as an enumerator. I realized that I really need to pass by reference in order to change the value of the object that I am trying to populate. I guess I am going about this in a C++ way (as most may have guessed, I am an F# beginner). However, I want to be as efficient in terms of memory footprint as I can. Ideally I would like to reuse the same object over and over again when I read from a file.
I am having a problem with this code where it does not allow me to pass by reference in the call to the function serialize. I am again reproducing the code here. I thank you in advance for your help.
The error I get:
error FS0001: This expression was expected to have type byref<'T> but here has type 'T
If I change the call to serialize(& current_, reader_) I get the following error:
persistence.fs(71,6): error FS0437: A type would store a byref typed value. This is not permitted by Common IL.
persistence.fs(100,29): error FS0412: A type instantiation involves a byref type. This is not permitted by the rules of Common IL.
persistence.fs(100,30): error FS0423: The address of the field current_ cannot be used at this point
The CODE:
type BinaryPersistenceIn<'T when 'T: (new : unit -> 'T)>(fn: string, serializer: ('T byref * BinaryReader) -> unit) =
    let stream_ = File.Open(fn, FileMode.Open, FileAccess.Read)
    let reader_ = new BinaryReader(stream_)
    let mutable current_ = new 'T()
    let eof() =
        stream_.Position = stream_.Length
    interface IEnumerator<'T> with
        member this.Current
            with get() = current_
        member this.Dispose() =
            stream_.Close()
            reader_.Close()
    interface System.Collections.IEnumerator with
        member this.Current
            with get() = current_ :> obj
        member this.Reset() =
            stream_.Seek((int64) 0., SeekOrigin.Begin) |> ignore
        member this.MoveNext() =
            let mutable ret = eof()
            if stream_.CanRead && ret then
                serializer( current_, reader_)
            ret

You can circumvent this by introducing a mutable local, passing it to serialize, and then assigning back to current_:
member this.MoveNext() =
    let mutable ret = eof()
    if stream_.CanRead && ret then
        let mutable deserialized = Unchecked.defaultof<_>
        serializer( &deserialized, reader_)
        current_ <- deserialized
    ret
But now this is becoming really, really unsettling. Notice the use of Unchecked.defaultof<_>? There is no other way to initialize a value of unknown type, and it's called "unchecked" for a reason: the compiler can't guarantee safety of this code.
I strongly advise that you explore other ways of achieving your initial goal, such as using a seq computation expression instead, as I have suggested in your other question.
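To give an idea of what the seq-based alternative might look like, here is a minimal sketch. The name readAll and the deserialize parameter are hypothetical, and the sketch assumes the deserializer returns a fresh value on each call rather than filling a byref.

```fsharp
open System.IO

// Hypothetical helper: lazily yields one value per deserialize call
// until the end of the file is reached.
let readAll (fn: string) (deserialize: BinaryReader -> 'T) : seq<'T> =
    seq {
        // Both resources are disposed when enumeration finishes or is abandoned.
        use stream = File.Open(fn, FileMode.Open, FileAccess.Read)
        use reader = new BinaryReader(stream)
        while stream.Position < stream.Length do
            yield deserialize reader
    }

// Usage: write two little-endian int32 values to a temp file and read them back.
let roundTrip () =
    let path = Path.GetTempFileName()
    File.WriteAllBytes(path, [| 1uy; 0uy; 0uy; 0uy; 2uy; 0uy; 0uy; 0uy |])
    let values = readAll path (fun (r: BinaryReader) -> r.ReadInt32()) |> List.ofSeq
    File.Delete path
    values
```

No byref, mutable field, or Unchecked.defaultof is needed here, and the file is reopened each time the sequence is enumerated again.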

With respect to memory footprint, let's analyze the sequence option:
You have an instance of the seq. That's going to be some class implementing IEnumerable<'T>. This one will be held until you no longer need the seq, i.e. not reallocated each time.
You hold a Stream as part of the seq, with the same lifetime.
You hold a BinaryReader as part of the seq, with the same lifetime.
eof : unit -> bool is a compiler-generated function class as part of the seq, with the same lifetime.
The loop will use a bool for the while loop and the if condition. Both of which are stack-allocated structs and needed for the branching logic.
And finally, you yield an instance that you already got from the serializer.
Conceptually, that's as little memory consumption as you can have for a lazily evaluated seq. Once an element is consumed, it can be garbage collected. Multiple evaluations will do the same thing again.
The only thing you can actually play with is what the serializer returns.
If you have your serializer return a struct, it is copied and stack-allocated. It should not be mutable, because mutable structs are discouraged (see the question "Why are mutable structs 'evil'?").
Structs are good with respect to the garbage collector as they avoid garbage collection. But they are typically to be used with very small objects, in the order of say 16-24 bytes max.
Classes are heap-allocated and are passed by reference always. So if your serializer returns a class, say a string, then you just pass that around by reference and overhead of copying will be very small as you only ever copy the reference, not the content.
If you want your serializer side-effecting, i.e. overwriting the same object (class, i.e. reference type is to be used for this), then the whole approach of IEnumerable<'T> and consequently seq is wrong. IEnumerables always give you new objects as result and should never modify any pre-existing object. The only state with them should be the information, at what place in the enumeration they are.
So if you need a side-effecting version, you could do something like (pseudo-code).
let readAndOverwrite stream =
    let position = // do something here to know the state
    fun target ->
        target.MyProp1 <- stream.ReadInt()
        target.MyProp2 <- stream.ReadFloat()
Passing as byref does not seem very reasonable to me, as you then allocate and garbage collect the object anyway. So you can just as well do that in an immutable way. What you can do instead is just modify properties on your object.
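A runnable variant of the side-effecting idea could look like the sketch below. The Target class and its two property types are assumptions made for illustration, and BinaryReader.ReadInt32 and ReadDouble stand in for the ReadInt and ReadFloat placeholders of the pseudo-code.

```fsharp
open System.IO

// Hypothetical reference type whose single instance is overwritten on every read.
type Target() =
    member val MyProp1 = 0 with get, set
    member val MyProp2 = 0.0 with get, set

// Overwrites the properties of an existing object instead of allocating a new one.
let readAndOverwrite (reader: BinaryReader) (target: Target) =
    target.MyProp1 <- reader.ReadInt32()
    target.MyProp2 <- reader.ReadDouble()

// Usage: two records in a memory stream, read into the same Target instance.
let demo () =
    use ms = new MemoryStream()
    use writer = new BinaryWriter(ms)
    writer.Write(7)
    writer.Write(1.5)
    writer.Write(8)
    writer.Write(2.5)
    writer.Flush()
    ms.Position <- 0L
    use reader = new BinaryReader(ms)
    let target = Target()
    readAndOverwrite reader target
    let first = target.MyProp1, target.MyProp2
    readAndOverwrite reader target
    first, (target.MyProp1, target.MyProp2)
```

Only one Target is ever allocated, at the price that any consumer holding on to it sees its contents change with each read.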

Related

Clarifying field types in F#

After some practice with F#, I have still some points where I need to clear confusion:
The question is specifically about fields in a type.
This is what I understand, and some of it must be wrong, because the naming wouldn't make sense if I were right:
let x -> private read-only field, evaluated once
let mutable x -> private mutable field
val x -> public read-only field.. difference with let?
val mutable x -> public mutable field
member this.x -> private read-only field, evaluated every time
member val -> public mutable field.. difference with val? why no mutable keyword?
Can someone tell me what is right / wrong, or some concepts I may have gotten wrong.
First of all, you can pretty much ignore val and val mutable. Those two are used with an older syntax for defining classes that is not exactly formally deprecated, but I would almost never use it when writing new normal F# code (there are some rare use cases, but I don't think it's worth worrying about those).
This leaves let and let mutable vs. member and member val.
let defines a private field that can only be accessed within the class. The value you assign to it is evaluated once. You can also define functions like let foo x = x + 1 or let bar () = printfn "hi", which have a body that's evaluated when the function is called.
let mutable defines a private mutable field. This is initialized by evaluating the right-hand side, but you can later mutate it using fld <- <new value>.
member this.Foo = (...) defines a get-only property. The expression (...) is evaluated repeatedly whenever the property is accessed. This is a side-effect of how .NET properties work - they have a hidden get() method that's called whenever they are accessed, so the body is the body of this method.
member val Foo = (...) is a way of writing a property that is evaluated only once. In earlier versions of F#, this was not available, so you had to implement this functionality quite tediously yourself by defining a local field (to run the code once) and then returning that from a regular property:
let foo = (...)
member x.Foo = foo
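The difference between the two kinds of property can be observed with a small counter. This is only an illustrative sketch; the Demo type and its member names are made up for the example.

```fsharp
type Demo() =
    // Private mutable field counting how often next () has run.
    let mutable evaluations = 0
    let next () =
        evaluations <- evaluations + 1
        evaluations
    // Regular property: the body runs again on every access.
    member this.Recomputed = next ()
    // Auto-property: the initializer runs once, when the object is constructed.
    member val Cached = next ()

let observe () =
    let d = Demo()
    // Cached was initialized during construction, so it holds 1 and keeps it.
    // Each access to Recomputed increments the counter again.
    [ d.Cached; d.Recomputed; d.Recomputed; d.Cached ]
```

Here observe () yields [1; 2; 3; 1]: the member val value stays fixed while the plain member is re-evaluated each time.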

In F#, is it possible to pass a reference to a mutable, defaulted value as a parameter?

For the Froto project (Google Protobuf in F#), I am trying to update the deserialization code from using 'a ref objects to passing values byref<'a>, for performance.
However, the code below fails on the hydrator &element field line:
type Field = TypeA | TypeB | Etc
let hydrateRepeated
        (hydrator:byref<'a> -> Field -> unit)
        (result:byref<'a list>)
        (field:Field) =
    let mutable element = Unchecked.defaultof<'a>
    hydrator &element field
    result <- element :: result
error FS0421: The address of the variable 'element' cannot be used at this point
Is there anything I can do to get this code to work without changing the signature of the hydrator parameter?
I'm very aware that I could use hydrator:'a ref -> Field -> unit and get things to work. However, the goal is to support deserializing into record types without needing to create a bunch of ref objects on the heap every time a record is deserialized.
Note that the following code is perfectly legal and has the same signature as the hydrator function declaration, above, so I'm unclear on what the problem is.
let assign (result:byref<'a>) (x:'a) =
    result <- x
let thisWorks() =
    let mutable v = Unchecked.defaultof<int>
    assign &v 5
    printfn "%A" v
I'll try to clarify what I was saying in my comments. You're right that your definition of assign is perfectly fine, and it appears to have the signature byref<'a> -> 'a -> unit. However, if you look at the resulting assembly, you'll find that the way it's compiled at the .NET representation level is:
Void assign[a](a ByRef, a)
(that is, it's a method that takes two arguments and doesn't return anything, not a function value that takes one argument and returns a function that takes the next argument and returns a value of type unit - the compiler uses some additional metadata to determine how the method was actually declared).
The same is true of function definitions that don't involve byref. For instance, assume you've got the following definition:
let someFunc (x:int) (y:string) = ()
Then the compiler actually creates a method with the signature
Void someFunc(Int32, System.String)
The compiler is smart enough to do the right thing when you try to use a function like someFunc as a first class value - if you use it in a context where it isn't applied to any arguments, the compiler will generate a subtype of int -> string -> unit (which is FSharpFunc<int, FSharpFunc<string, unit>> at the .NET representation level), and everything works seamlessly.
However, if you try to do the same thing with assign, it won't work (or shouldn't work, but there are several compiler bugs that may make it seem like certain variations work when really they don't - you might not get a compiler error but you may get an output assembly that is malformed instead) - it's not legal for .NET type instantiations to use byref types as generic type arguments, so FSharpFunc<int byref, FSharpFunc<int, unit>> is not a valid .NET type. The fundamental way that F# represents function values just doesn't work when there are byref arguments.
So the workaround is to create your own type with a method taking a byref argument and then create subtypes/instances that have the behavior you want, sort of like doing manually what the compiler does automatically in the non-byref case. You could do this with a named type
type MyByrefFunc2<'a,'b> =
    abstract Invoke : 'a byref * 'b -> unit
let assign = {
    new MyByrefFunc2<_,_> with
        member this.Invoke(result, x) =
            result <- x }
or with a delegate type
type MyByrefDelegate2<'a,'b> = delegate of 'a byref * 'b -> unit
let assign = MyByrefDelegate2(fun result x -> result <- x)
Note that when calling methods like Invoke on the delegate or nominal type, no actual tuple is created, so you shouldn't be concerned about any extra overhead there (it's a .NET method that takes two arguments and is treated as such by the compiler). There is the cost of a virtual method call or delegate call, but in most cases similar costs exist when using function values in a first class way too. And in general, if you're worried about performance then you should set a target and measure against it rather than trying to optimize prematurely.
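Applied to the question's hydrateRepeated, the named-type workaround might look roughly like this. It is a sketch only: IHydrator, intHydrator and demo are hypothetical names, and it does change the type of the hydrator parameter, which the question hoped to avoid.

```fsharp
type Field = TypeA | TypeB

// Hypothetical nominal type replacing the function type byref<'a> -> Field -> unit.
type IHydrator<'a> =
    abstract Hydrate : element: byref<'a> * field: Field -> unit

let hydrateRepeated (hydrator: IHydrator<'a>) (result: byref<'a list>) (field: Field) =
    let mutable element = Unchecked.defaultof<'a>
    // A method call may take the address of a local; no FSharpFunc over byref is involved.
    hydrator.Hydrate(&element, field)
    result <- element :: result

// Example hydrator that writes a number derived from the field into the element.
let intHydrator =
    { new IHydrator<int> with
        member this.Hydrate(element, field) =
            element <- (match field with
                        | TypeA -> 1
                        | TypeB -> 2) }

let demo () =
    let mutable acc : int list = []
    hydrateRepeated intHydrator &acc TypeA
    hydrateRepeated intHydrator &acc TypeB
    acc
```

Each hydrated element is prepended, so demo () returns the elements in reverse order of hydration.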

Downcast using type object in F#

let o1 = box SomeType()
let t = typeof<SomeType>
Is it possible to downcast (to SomeType) a boxed object (o1) using the Type information stored in another object (t)?
The ultimate objective is to have a sort of dynamic invocation of functions.
I'm storing functions with signature FSharpFunc<'Pre,'Post> in a Map:
// Lack of Covariance/Contravariance force me to define it as obj:
let functions = Map<string,obj>
let invoke f (pre : 'Pre when 'Pre : comparison) (post : 'Post when 'Post : comparison) =
    (unbox<FSharpFunc<'Pre,'Post>> f).Invoke(pre)
This dynamic invocation works whenever I pass the proper types objects in pre and post.
And now comes the issue. I also have the arguments of the invocation in a map of the form:
let data = Map<string,obj>
let conf = Map<string, Type>
where conf stores the type of each possible string key in data.
So given a function key and a proper configuration, I can retrieve the arguments from data in order to feed the function. But for this to work I need to be able to downcast data values using the Types in conf.
I suspect that it is not possible, and I'm aware that I am bypassing static type safety (I'm OK with that). In that case, is there any workaround or alternative approach?
I'm not sure I understand what you are after here, so this is not a specific answer to your question, but rather a couple of suggestions that might help you.
Generally speaking it sounds like you want some sort of existential types. It sounds like
you have data of various types and
you have operations on that data and
you want to dynamically invoke those operations on the data.
To do such things safely, you should encapsulate the data (or ideally the type of the data) and the operations together rather than separately. At the point when you know the type of the data and the possible operations on the data, wrap them together so that other parts of your program cannot just take the data and try to unsafely perform arbitrary operations on the data. (To make such encapsulation general and safe, allowing type-safe manipulation of data whose type is not known statically, you need something like first-class modules.)
As another suggestion, rather than boxing whole functions, you might rather want to box and unbox the domains and ranges of functions. Consider the following wrap and unwrap functions:
let wrap (a2b: 'a -> 'b) : obj -> obj =
    unbox<'a> >> a2b >> box<'b>
let unwrap (o2o: obj -> obj) : 'a -> 'b =
    box<'a> >> o2o >> unbox<'b>
The function map would have the signature
val functions: Map<string, obj -> obj>
and would store wrapped functions. To invoke a function from the map, you would unwrap the previously wrapped o2o function with the desired type:
(unwrap o2o : 'a when 'a: comparison -> 'b when 'b: comparison)
This is not type safe as such, but allows for flexible invocations.
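As a rough end-to-end illustration of that suggestion, the sketch below stores two wrapped functions in a map and unwraps them at the call site. The keys "length" and "double" and the stored functions are invented for the example.

```fsharp
let wrap (a2b: 'a -> 'b) : obj -> obj =
    unbox<'a> >> a2b >> box<'b>

let unwrap (o2o: obj -> obj) : 'a -> 'b =
    box<'a> >> o2o >> unbox<'b>

// Functions of different types stored under one value type, obj -> obj.
let functions : Map<string, obj -> obj> =
    Map.ofList [
        "length", wrap (fun (s: string) -> s.Length)
        "double", wrap (fun (n: int) -> n * 2)
    ]

// The caller chooses the static type when unwrapping.
let length : string -> int = unwrap (Map.find "length" functions)
let double : int -> int = unwrap (Map.find "double" functions)

let results () = length "hello", double 21
```

Unwrapping at a type that does not match the stored function still compiles; the mismatch only surfaces as a failed cast when the function is actually invoked.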

Warning produced by f#: value has been copied to ensure the original is not mutated

The first definition below produces the warning in the title when compiled with f# 3.0 and the warning level set to 5. The second definition compiles cleanly. I wondered if someone could please explain just what the compiler worries I might accidentally mutate, or how would splitting the expression with a let clause help avoid that. Many thanks.
let ticks_with_warning () : int64 =
    System.DateTime.Now.Ticks
let ticks_clean () : int64 =
    let t = System.DateTime.Now
    t.Ticks
I cannot really explain why the compiler emits this warning in your particular case - I agree with @ildjarn that you can safely ignore it, because the compiler is probably just being overly cautious.
However, I can give you an example where the warning might actually give you a useful hint that something might not go as you would expect. If we had a mutable struct like this:
[<Struct>]
type Test =
    val mutable ticks : int64
    member x.Inc() = x.ticks <- x.ticks + 1L
    new (init) = { ticks = init }
Now, the Inc method mutates the struct (and you can also access the mutable field ticks). We can try writing a function that creates a Test value and mutates it:
let foo () =
    let t = Test(1L)
    t.Inc() // Warning: The value has been copied to ensure the original is not mutated
    t
We did not mark the local value t as mutable, so the compiler tries to make sure the value is not mutated when we call Inc. It does not know whether Inc mutates the value or not, so the only safe thing is to create a copy - and thus foo returns the value Test(1L).
If we mark t as mutable, then the compiler does not have to worry about mutating it as a result of a call and so it does not give the warning (and the function returns Test(2L)):
let foo () =
    let mutable t = Test(1L)
    t.Inc()
    t
I'm not really sure what is causing the warning in your example though. Perhaps the compiler thinks (as a result of some intermediate representation) that Ticks operation could mutate the left-hand-side value (System.DateTime.Now and t respectively) and it wants to prevent that.
The odd thing is that if you write your own DateTime struct in F#, you get a warning in both cases unless you mark the variable t as mutable (which is what I'd expect), but the behaviour with standard DateTime is different. So perhaps the compiler knows something about the standard type that I'm missing...

Declaring a variable without assigning

Any way to declare a new variable in F# without assigning a value to it?
See Aidan's comment.
If you insist, you can do this:
let mutable x = Unchecked.defaultof<int>
This will assign the absolute zero value (0 for numeric types, null for reference types, struct-zero for value types).
It would be interesting to know why the author needs this in F# (simple example of intended use would suffice).
But I guess one of the common cases when you may use uninitialised variable in C# is when you call a function with out parameter:
TResult Foo<TKey, TResult>(IDictionary<TKey, TResult> dictionary, TKey key)
{
    TResult value;
    if (dictionary.TryGetValue(key, out value))
    {
        return value;
    }
    else
    {
        throw new ApplicationException("Not found");
    }
}
Luckily in F# you can handle this situation using much nicer syntax:
let foo (dict : IDictionary<_,_>) key =
    match dict.TryGetValue(key) with
    | (true, value) -> value
    | (false, _) -> raise <| ApplicationException("Not Found")
You can also use explicit field syntax:
type T =
    val mutable x : int
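For a class that has a primary constructor, an explicit field declared this way needs the DefaultValue attribute so that it starts out zero-initialized. A small sketch, with demo being a name made up for the example:

```fsharp
type T() =
    // The field is never assigned in the constructor; DefaultValue makes it start as 0.
    [<DefaultValue>] val mutable x : int

let demo () =
    let t = T()
    let before = t.x   // zero-initialized
    t.x <- 5           // assigned later
    before, t.x
```

In this sketch demo () returns (0, 5).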
I agree with everyone who has said "don't do it". However, if you are convinced that you are in a case where it really is necessary, you can do this:
let mutable naughty : int option = None
...then later to assign a value.
naughty <- Some(1)
But bear in mind that everyone who has said 'change your approach instead' is probably right. I code in F# full time and I've never had to declare an unassigned 'variable'.
Another point: although you say it wasn't your choice to use F#, I predict you'll soon consider yourself lucky to be using it!
F# variables are by default immutable, so you can't assign a value later. Therefore declaring them without an initial value makes them quite useless, and as such there is no mechanism to do so.
Arguably, a mutable variable declaration could be declared without an initial value and still be useful (it could acquire an initial default like C# variables do), but F#'s syntax does not support this. I would guess this is for consistency and because mutable variable slots are not idiomatic F# so there's little incentive to make special cases to support them.
