I am experimenting with live data from the coronavirus pandemic (unfortunately, and good luck to all of us).
I developed a small script that uses the CSV type provider, and I am now turning it into a console application.
I have the following issue. Suppose we want to filter the Italian data by province; in an .fsx file we can use this code:
open FSharp.Data
let provinceData =
    CsvProvider<"https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-province/dpc-covid19-ita-province.csv", IgnoreErrors = true>.GetSample()
let filterDataByProvince province =
    provinceData.Rows
    |> Seq.filter (fun x -> x.Sigla_provincia = province)
Since sequences are lazy, to force the data for the province of Rome to be loaded into memory I can add:
let romeProvince = filterDataByProvince "RM" |> Seq.toArray
This works fine when run locally in FSI.
Now I move this code into a console application (a .fs file). I declare exactly the same functions and use exactly the same type provider loader, but instead of gathering the data with a top-level line, I put that line into a main function:
[<EntryPoint>]
let main _ =
    let romeProvince = filterDataByProvince "RM" |> Seq.toArray
    System.Console.Read() |> ignore
    0
This results in the following runtime exception:
System.Exception
HResult=0x80131500
Message=totale_casi is missing
Source=FSharp.Data
StackTrace:
at <StartupCode$FSharp-Data>.$TextRuntime.GetNonOptionalValue#139-4.Invoke(String message)
at CoronaSchiatta.Evoluzione.provinceData#10.Invoke(Object parent, String[] row) in C:\Users\glddm\source\repos\CoronaSchiatta\CoronaSchiatta\CoronaEvolution.fs:line 10
at FSharp.Data.Runtime.CsvHelpers.parseIntoTypedRows#174.GenerateNext(IEnumerable`1& next)
Can you explain that?
Possibly some rows have an odd format, but the FSI session copes with them, whilst the console version fails. Why? How can I fix that?
I am using VS2019 Community Edition, targeting .NET Framework 4.7.2, with F# runtime 4.7.0.0.
For FSI, I am using Microsoft (R) F# Interactive version 10.7.0.0 for F# 4.7.
PS: Note that if I use CsvFile instead of the type provider, as in:
let test =
    "https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-province/dpc-covid19-ita-province.csv"
    |> CsvFile.Load |> (fun x -> x.Rows)
    |> Seq.filter (fun x -> x.[6] = "RM")
    |> Seq.iter (fun x -> x.[9] |> System.Console.WriteLine)
Then it works like a charm in the console application too. Of course, I would like to use type providers; otherwise I have to add a type definition mapping the schema to the columns (which would be more fragile). The last line was just a quick test.
Fragility
CSV Type Providers can be fragile if you don't have a good schema or sample.
Getting a runtime error here almost certainly means your data doesn't match the schema the provider inferred.
How do you figure it out? One way is to run through your data first:
provinceData.Rows |> Seq.iteri (fun i x -> printfn "Row %d: %A" (i + 1) x)
This gets as far as Row 2150, and sure enough, the next row is:
2020-03-11 17:00:00,ITA,19,Sicilia,994,In fase di definizione/aggiornamento,,0,0,
You can see the last value (totale_casi) is missing.
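Iterating the typed rows stops at the first row that fails to convert. To list every offending row at once, a stdlib-only sketch can scan the raw text instead. It assumes a naive CSV layout with no quoted commas and that totale_casi is the last column, as in this file; rowsWithEmptyLastField and the tiny sample are hypothetical.

```fsharp
open System

// Hypothetical helper: returns the 1-based data-row numbers whose last field is empty.
// Assumes no quoted fields containing commas (a naive split, not a full CSV parser).
let rowsWithEmptyLastField (csvText: string) =
    csvText.Split([| '\n' |], StringSplitOptions.RemoveEmptyEntries)
    |> Array.skip 1                                    // drop the header row
    |> Array.mapi (fun i line -> i + 1, line.TrimEnd('\r'))
    |> Array.filter (fun (_, line) -> line.Split(',') |> Array.last |> String.IsNullOrWhiteSpace)
    |> Array.map fst

// Hypothetical three-row sample; the second data row has an empty last field.
let sample =
    "data,stato,codice_regione,totale_casi\n"
    + "2020-03-11 17:00:00,ITA,19,994\n"
    + "2020-03-11 17:00:00,ITA,19,\n"
    + "2020-03-11 17:00:00,ITA,20,7\n"

printfn "%A" (rowsWithEmptyLastField sample) // [|2|]
```

On the real file, the same function can be applied to the downloaded text to see whether row 2151 is the only problem.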
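One of CsvProvider's options is InferRows: the number of rows the provider scans to build up a schema. Its default value happens to be 1000, so the missing totale_casi in row 2151 is never seen during inference.
Setting it to 0 makes the provider scan every row:
[<Literal>]
let uri = "https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-province/dpc-covid19-ita-province.csv"
type COVID = CsvProvider<uri, InferRows = 0>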
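Why the default bites here can be illustrated with a deliberately simplified toy model of sample-based inference. It is not FSharp.Data's real algorithm; inferColumnType and the type names it returns are hypothetical. A column whose only gap lies beyond the inspected rows is classified as a plain int, which is the situation that later fails at runtime:

```fsharp
open System

// Toy model of sample-based inference (NOT FSharp.Data's real algorithm):
// look at the first `inferRows` values of a column (0 = all of them, mirroring
// InferRows = 0) and pick a type name for the whole column.
let inferColumnType (inferRows: int) (values: string list) =
    let sample = if inferRows <= 0 then values else List.truncate inferRows values
    let hasMissing = sample |> List.exists String.IsNullOrEmpty
    let allInts = sample |> List.forall (fun v -> v = "" || fst (Int32.TryParse v))
    match allInts, hasMissing with
    | true, false -> "int"
    | true, true -> "Nullable<int>"
    | false, _ -> "string"

// A column whose only gap appears after the rows used for inference.
let totaleCasi = [ "0"; "3"; "12"; "" ]

printfn "%s" (inferColumnType 3 totaleCasi) // int: the gap is never seen
printfn "%s" (inferColumnType 0 totaleCasi) // Nullable<int>: every row is scanned
```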
A better way to prevent this from happening in the future is to define a sample manually from a subset of the data, including a row with a missing value:
type COVID = CsvProvider<"sample-dpc-covid19-ita-province.csv">
and sample-dpc-covid19-ita-province.csv is:
data,stato,codice_regione,denominazione_regione,codice_provincia,denominazione_provincia,sigla_provincia,lat,long,totale_casi
2020-02-24 18:00:00,ITA,13,Abruzzo,069,Chieti,CH,42.35103167,14.16754574,0
2020-02-24 18:00:00,ITA,13,Abruzzo,066,L'Aquila,AQ,42.35122196,13.39843823,
2020-02-24 18:00:00,ITA,13,Abruzzo,068,Pescara,PE,42.46458398,14.21364822,0
2020-02-24 18:00:00,ITA,13,Abruzzo,067,Teramo,TE,42.6589177,13.70439971,0
With this sample, totale_casi is inferred as Nullable<int>.
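Code that reads such a column then has to handle the missing cells. A minimal sketch, using a hard-coded list as a hypothetical stand-in for the values read from the provider's totale_casi column, shows two common options: default missing counts to 0, or convert to an F# option:

```fsharp
open System

// Hypothetical stand-in for values read from a column typed as Nullable<int>;
// the middle element plays the role of an empty cell.
let totals : Nullable<int> list = [ Nullable<int>(5); Nullable<int>(); Nullable<int>(12) ]

// Treat a missing count as 0 when summing.
let sumKnown (xs: seq<Nullable<int>>) =
    xs |> Seq.sumBy (fun v -> if v.HasValue then v.Value else 0)

// Or convert to an F# option so that "missing" stays explicit.
let asOptions = totals |> List.map Option.ofNullable

printfn "%d" (sumKnown totals) // 17
printfn "%A" asOptions         // [Some 5; None; Some 12]
```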
If you don't mind NaN values, you can also use:
CsvProvider<..., AssumeMissingValues = true>
Why does FSI seem more robust?
FSI isn't more robust. This is my best guess:
Your schema source is being regularly updated.
Type providers cache the schema so that they don't regenerate it every time you compile your code, which could be impractical. When you restart an FSI session, the type provider is regenerated, but not so with the console application. So FSI may sometimes appear less error-prone simply because it worked with a newer version of the source.
I am learning F# by automating a few of my tasks with F# scripts. I run these scripts with "fsi/fsharpi --exec" from the command line, on .NET Core. One thing I was looking for is how to profile my F# scripts. I am primarily looking to:
See the overall time consumed by my entire script. I tried doing this with stopwatch-style functionality and it works well. Is there anything that can show the time for my various top-level function calls, or timings/counts for function calls?
See the overall memory consumption of my script.
Find hot spots in my scripts.
Overall I am trying to understand the performance bottlenecks of my scripts.
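For the timing and call-count points, the stopwatch approach can be extended into a small wrapper that records how often each labelled call runs and its cumulative time. This is a minimal sketch using only System.Diagnostics.Stopwatch and GC.GetTotalMemory; timed, report and stats are hypothetical names, and the numbers include JIT and GC noise that a statistical benchmarking tool accounts for.

```fsharp
open System
open System.Collections.Generic
open System.Diagnostics

// Hypothetical helper state: label -> (number of calls, total elapsed time).
let stats = Dictionary<string, int * TimeSpan>()

// Runs f, adds its wall-clock time to the running total for `label`, returns its result.
let timed (label: string) (f: unit -> 'T) : 'T =
    let sw = Stopwatch.StartNew()
    try
        f ()
    finally
        sw.Stop()
        let count, total =
            match stats.TryGetValue label with
            | true, entry -> entry
            | false, _ -> 0, TimeSpan.Zero
        stats.[label] <- (count + 1, total + sw.Elapsed)

// Prints one line per label; GC.GetTotalMemory gives only a rough heap snapshot.
let report () =
    for KeyValue(label, (count, total)) in stats do
        printfn "%-20s calls=%d total=%.1f ms" label count total.TotalMilliseconds
    printfn "managed heap: %d bytes" (GC.GetTotalMemory(false))

// Usage: wrap the top-level calls you care about, then print the summary.
let squares = timed "squares" (fun () -> List.init 1000 (fun i -> i * i))
let total = timed "sum" (fun () -> List.sum squares)
report ()
```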
On a side note, is there a way to compile F# scripts to exe?
I recommend using BenchmarkDotNet for any benchmarking tasks (well, micro-benchmarks). Since it's a statistical tool, it accounts for many things that hand-rolled benchmarking will not. And just by applying a few attributes you can get a nifty report.
Create a .NET Core console app, add the BenchmarkDotNet package, create a benchmark, and run it to see the results. Here's an example that tests two trivial parsing functions, with one as the baseline for comparison, and tells BenchmarkDotNet to capture memory usage stats when running the benchmark:
open System
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running
module Parsing =
    /// "123,456" --> (123, 456)
    let getNums (str: string) (delim: char) =
        let idx = str.IndexOf(delim)
        let first = Int32.Parse(str.Substring(0, idx))
        let second = Int32.Parse(str.Substring(idx + 1))
        first, second
    /// "123,456" --> (123, 456), using spans to avoid substring allocations
    let getNumsFaster (str: string) (delim: char) =
        let sp = str.AsSpan()
        let idx = sp.IndexOf(delim)
        let first = Int32.Parse(sp.Slice(0, idx))
        let second = Int32.Parse(sp.Slice(idx + 1))
        struct(first, second)
[<MemoryDiagnoser>]
type ParsingBench() =
    let str = "123,456"
    let delim = ','
    [<Benchmark(Baseline=true)>]
    member __.GetNums() =
        Parsing.getNums str delim |> ignore
    [<Benchmark>]
    member __.GetNumsFaster() =
        Parsing.getNumsFaster str delim |> ignore
[<EntryPoint>]
let main _ =
    let summary = BenchmarkRunner.Run<ParsingBench>()
    printfn "%A" summary
    0 // return an integer exit code
In this case, the results will show that the getNumsFaster function allocates 0 bytes and runs about 33% faster.
Once you've found something that consistently performs better and allocates less, you can transfer that over to a script or some other environment where the code will actually execute.
As for hotspots, your best tool is to run the script under a profiler like PerfView and look at the CPU time and allocations caused by the script while it's executing. There's no simple answer here: interpreting profiling results correctly is challenging and time-consuming work.
There's no way to compile an F# script to an executable for .NET Core. It's possible only on Windows/.NET Framework, but this is legacy behavior that is considered deprecated. It's recommended that you convert code in your script to an application if you'd like it to run as an executable.
UPDATE:
I now realize that the question was stupid; I should have just filed the issue. In hindsight, I don't see why I even asked this question.
The issue is here: https://github.com/fsharp/FSharp.Compiler.Service/issues/544
Original question:
I'm using FSharp Compiler Services for parsing some F# code.
The particular piece of code that I'm facing right now is this:
let f x y = x+y
let g = f 1
let h = (g 2) + 3
This program yields a TAST without the (+) call on the last line. That is, the compiler service returns TAST as if the last line was just let h = g 2.
The question is: is this a legitimate bug that I ought to report, or am I missing something?
Some notes
Here is a repo containing a minimal repro (I didn't include it in this question because Compiler Services require quite a bit of dancing around).
Adding more statements after the let h line does not change the outcome.
When compiled to IL (as opposed to parsed with Compiler Services), it seems to work as expected (e.g. see fiddle)
If I make g a value, the program parses correctly.
If I make g a normal function (rather than partially applied one), the program parses correctly.
I have no prior experience with FSharp.Compiler.Services, but I nevertheless did a small investigation using Visual Studio's debugger. I analyzed the abstract syntax tree of the following string:
"""
module X
let f x y = x+y
let g = f 1
let h = (g 2) + 3
"""
I found the following object inside it:
App (Val (op_Addition,NormalValUse,D:\file.fs (6,32--6,33) IsSynthetic=false),TType_forall ([T1; T2; T3],TType_fun (TType_var T1,TType_fun (...,...))),...,...,...)
As you can see, there's an addition on the 6th line, between characters 32 and 33.
The most likely explanation for why F# Interactive doesn't display it properly is a bug in the library (maybe the AST is in an inconsistent state, or pretty-printing is broken). I think you should file a bug in the project's issue tracker.
UPDATE:
The aforementioned object can be reached in the debugger as follows:
error.[0]
(option of Microsoft.FSharp.Compiler.SourceCodeServices.FSharpImplementationFileDeclaration.Entity)
.Item2
.[2]
(option of Microsoft.FSharp.Compiler.SourceCodeServices.FSharpImplementationFileDeclaration.MemberOrFunctionOrValue)
.Item3
.f (private member)
.Value
(option of Microsoft.FSharp.Compiler.SourceCodeServices.FSharpExprConvert.ConvExprOnDemand#903)
.expr
Is there a reason why the exception is displayed before the function has even started in the following?
let listCharacters (text:string) =
    let stripv3 = text.Split([|' '|], System.StringSplitOptions.RemoveEmptyEntries) |> System.String.Concat
    for i in 0..2..stripv3.Length do
        let char = stripv3.Chars(i)
        if char <> ' ' then
            printfn "%c" char
listCharacters "honey badger is a badass"
This produces the following output:
System.IndexOutOfRangeException: Index was outside the bounds of the array.
h
n
y
a
g
r
s
b
d
s
Interestingly, if I add a try...with, any operations within the with block occur in order. Any ideas why this is?
To summarise the comments above: the problem is specific to the IDE, not to the language.
In this instance the issue was only seen in the Visual Studio F# Interactive view.
When executed fully or run via Visual Studio FSI.exe the exception was the last item to be output.
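Separately from the ordering question, the exception itself comes from the loop range: 0..2..stripv3.Length includes stripv3.Length whenever the length is even (it is 20 for this input), and Length is never a valid index. A minimal corrected sketch (everySecondChar is a hypothetical name) stops at Length - 1 and returns the characters instead of printing them inside the loop:

```fsharp
// Returns every second character of the text once spaces are removed.
// text.Replace(" ", "") is used as a simpler equivalent of the original
// Split/RemoveEmptyEntries/Concat pipeline. The range stops at Length - 1,
// so the indexer is never called with an index equal to Length.
let everySecondChar (text: string) =
    let stripped = text.Replace(" ", "")
    [ for i in 0 .. 2 .. stripped.Length - 1 -> stripped.[i] ]

everySecondChar "honey badger is a badass" |> List.iter (printfn "%c")
```

This prints the same ten characters as the original output, without the exception.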
I have an F# console application that calls functions in other modules to perform its work from the main function entry point. I have a series of printfn calls in these other functions to provide me with information on the running of the program.
When compiled in DEBUG mode, all the statements print to the console. However, when compiled in RELEASE mode the only statements that print to the console are those that are directly inside the main entrypoint function.
What can I do to print statements for info in these other modules?
A code example is provided below:
Program.fs
[<EntryPoint>]
let main argv =
    printfn "%s" "start" // prints in RELEASE and DEBUG mode
    File1.Run
    printfn "%s" "end" // prints in RELEASE and DEBUG mode
    System.Console.ReadLine() |> ignore
    0 // return an integer exit code
File1.fs
module File1
let Run =
    let x = 1
    printfn "%d" x // this won't print in RELEASE mode
Yep, you are (kind of) right: it will not print, because Run is an expression here, not a function, and it seems the compiler is optimizing it away in release mode.
And why shouldn't it? In a perfect (pure/referentially transparent) world you have an expression of type unit that can only have a single value, (), and you don't even use or remember the value!
To be honest, I don't know if this is a bug or a feature ;)
Anyway, this simple trick will help you, and indeed you should not use an expression with side effects the way you did:
let Run () =
    let x = 1
    printfn "%d" x
...
File1.Run ()
See: now it's a function, it gets called at the right time, and your output is back ;)
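The difference between the two definitions can be seen in a small script. In this minimal sketch (run as an .fsx file; the names are hypothetical), the side effect in a module-level value runs once, when the value is initialised, and referring to the value afterwards re-executes nothing, whereas a function taking () runs its body on every call. It does not reproduce the release-mode optimization itself.

```fsharp
// Counts how many times the side effects actually run.
let mutable effects = 0

// A value: the right-hand side is evaluated once, at initialisation.
let runValue =
    effects <- effects + 1

// A function: the body is evaluated on every call.
let runFunction () =
    effects <- effects + 1

runValue          // just reads the unit value; nothing is re-executed
runValue
runFunction ()
runFunction ()

printfn "%d" effects // 3
```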
BTW: if you are interested in this kind of stuff, you can either use tools like Reflector (which I don't have at hand at the moment) or just use ILDASM (a tool VS should install anyway). If you look at the compiled debug/release assemblies, you will notice that a line like this:
IL_001f: call class [FSharp.Core]Microsoft.FSharp.Core.Unit File1::get_Run()
cannot be found anywhere in the release version if you use the expression.
I played with it a bit, and you have to get creative to make the compiler do this:
For example:
let reallyNeed v =
    if v = () then ()
[<EntryPoint>]
let main argv =
    printfn "%s" "start" // prints in RELEASE and DEBUG mode
    File1.Run |> reallyNeed
    printfn "%s" "end" // prints in RELEASE and DEBUG mode
    System.Console.ReadLine () |> ignore
    0 // return an integer exit code
works (it prints your 1), while
ignore File1.Run
or
let reallyNeed v = ignore v
don't ;) It seems you have to actually use the value somewhere :D