I have a sqlite table with a mix of integer and float columns. I'm trying to get the max and min values of each column. For integer columns the following code works but I get a cast error when using the same code on float columns:
let numCats = query{for row in db do minBy row.NumCats}
For float columns I'm using the following code but it's slow.
let CatHight = query{for row in db do select row.CatHeight} |> Seq.toArray |> Array.max
I have 8 integer columns and 9 float columns and the behavior has been consistent across all columns so that's why I think it's an issue with the column type. But I'm new to F# and don't know anything so I'm hoping you can help me.
Thank you for taking the time to help, it's much appreciated.
SQLProvider version: 1.0.41
System.Data.SQLite.Core version: 1.0.104
The error is: System.InvalidCastException occurred in FSharp.Core.dll
Added Information
I created a new table with one column of type float and inserted the values 2.2 and 4.2. Using SQLProvider and System.Data.SQLite.Core, I connected to the database and queried it with minBy or maxBy, and I get the cast exception. If the column type is integer it works correctly.
More Added Information
Exception detail:
System.Exception was unhandled
Message: An unhandled exception of type 'System.Exception' occurred in FSharp.Core.dll
Additional information: Unsupported execution expression value(FSharp.Data.Sql.Runtime.QueryImplementation+SqlQueryable`1[FSharp.Data.Sql.Common.SqlEntity]).Min(row => Convert(Convert(row.GetColumn("X"))))
Code that fails:
open FSharp.Data.Sql
[<Literal>]
let ConnectionString =
    "Data Source=c:\MyDB.db;" +
    "Version=3;foreign keys=true"
type Sql = SqlDataProvider<Common.DatabaseProviderTypes.SQLITE,
                           ConnectionString,
                           //ResolutionPath = resolutionPath,
                           CaseSensitivityChange = Common.CaseSensitivityChange.ORIGINAL>
let ctx = Sql.GetDataContext()
let Db = ctx.Main.Test
let x = query { for row in Db do minBy row.X }
printfn "x: %A" x
Update 2/1/17
Another user was able to reproduce the issue so I filed an Issue with SQLProvider. I'm now looking at workarounds and while the following code works and is fast, I know there's a better way to do it - I just can't find the correct way. If somebody answers with better code I'll accept that answer. Thanks again for all the help.
let x = query { for row in db do
                sortBy row.Column
                take 1
                select row.Column } |> Seq.toArray |> Array.min
This is the workaround that @s952163 and the good people in the SO F# chat room helped me with. Thanks again to everyone who helped.
let x = query { for row in db do
                sortBy row.Column
                take 1
                select row.Column } |> Seq.head
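To see the shape of that workaround without a database, here is a minimal sketch that runs the same sortBy / take 1 pattern against an in-memory IQueryable. The heights values are a hypothetical stand-in for the float column, and sortByDescending is the assumed counterpart for getting the maximum; whether SQLProvider translates both to SQL in the same way is not verified here.

```fsharp
open System.Linq

// Hypothetical in-memory stand-in for a float column (not the real table)
let heights = [ 4.2; 2.2; 3.1 ].AsQueryable()

// Minimum: sort ascending and keep only the first row
let minHeight =
    query {
        for h in heights do
        sortBy h
        take 1
        select h
    }
    |> Seq.head

// Maximum: same pattern, sorted descending
let maxHeight =
    query {
        for h in heights do
        sortByDescending h
        take 1
        select h
    }
    |> Seq.head

printfn "min: %f, max: %f" minHeight maxHeight
```

Note that Seq.head throws on an empty result, so an empty table would need Seq.tryHead instead.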
You need to coerce the output column to int or float (whichever you need, or whichever is causing the trouble). You also need to take care in case any of your columns are nullable. The example below coerces the column to float first (to handle it being nullable), then converts it to int, and finally gets the minimum:
let x = query { for row in MYTABLE do
                minBy (int (float row.MYCOLUMN)) }
You might want to change the order of the conversions, of course, or just use float row.MYCOLUMN.
Update:
With Sqlite it indeed causes an error. You might want to do query { ... } |> Seq.minBy to extract the smallest number.
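A minimal sketch of that client-side approach, assuming the query has already returned the column values as a sequence. The columnValues list is a hypothetical stand-in for the query result. Seq.min is enough when the sequence holds the numbers themselves; Seq.minBy is for picking the row with the smallest key.

```fsharp
// Hypothetical stand-in for: query { for row in db do select row.CatHeight }
let columnValues = seq [ 4.2; 2.2; 3.1 ]

// Smallest and largest value, computed in memory after the query has run
let smallest = columnValues |> Seq.min
let largest = columnValues |> Seq.max

// Seq.minBy returns the element whose key is smallest, here a (name, height) pair
let cats = [ ("Tom", 4.2); ("Felix", 2.2) ]
let shortestCat = cats |> Seq.minBy snd

printfn "%f %f %A" smallest largest shortestCat
```

This pulls every value across the connection, which is why the question found the equivalent Seq.toArray version slow on large tables.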
Related
Let's say I have a value defined as a sort of commission formula
let address_commission = 1.0 // minimal simplified example
and I want to apply that commission to an amount I'm reading from the DB (the code is from a Windows WCF service I have in production)
let address_commission = 1.0 // minimal simplified example
new Model.ClaimModel(
    //RequestRow = i, recounting
    Code = (row.["claim_code"] :?> string),
    EvtDate = (row.["event_date"] :?> DateTime),
    // skipping lines...
    Amount = (row.["amount"] :?> double) * address_commission,
now I see that the amount line compiles fine, but I also need to include the same commission in the following
PrevAmount = (if row.IsNull("prev_amount") then Nullable() else (row.["prev_amount"] :?> Nullable<double>)),
which is wrong since The type 'float' does not match the type 'obj'
Therefore I've tried also
PrevAmount = (if row.IsNull("prev_amount") then Nullable() else (((row.["prev_amount"] :?> double) * address_commission) :?> Nullable<double>)),
but it also fails with The type 'double' does not have any proper subtypes and cannot be used as the source of a type test or runtime coercion.
What is the correct way to handle this?
:?> is a dynamic cast that is only checked at run time, so it is better to avoid it where you can. If you are accessing databases, it helps to open the FSharp.Linq.NullableOperators module. (The link is gone for me but it's somewhere on docs or msdn). Then you can use ?*? and similar operators. For example:
open FSharp.Linq.NullableOperators
let x = System.Nullable<float> 4.
let y = x ?* 3.0
//val y : System.Nullable<float> = 12.0
You can have ? on either or both sides.
You will get back a Nullable float, which you can convert to an option with Option.ofNullable(y) or to a plain double with float y.
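As a rough sketch of how this maps onto the commission case, the snippet below multiplies a nullable amount by a plain float with ?* and shows that a missing value stays missing. The names commission, prevAmount and missingAmount are hypothetical and only stand in for the values read from the data row.

```fsharp
open System
open FSharp.Linq.NullableOperators

// Hypothetical values standing in for the commission and the DB column
let commission = 1.5
let prevAmount = Nullable 4.0
let missingAmount = Nullable<float>()

// ?* takes a Nullable on the left and a plain value on the right
let scaled = prevAmount ?* commission
let stillMissing = missingAmount ?* commission

// Convert to an option when you want to pattern match on the result
let scaledOption = Option.ofNullable scaled

printfn "%A %b %A" scaled stillMissing.HasValue scaledOption
```

The null check disappears from the calling code because the operator propagates the missing value itself.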
I'm going to use only one type coercion and wrap it within a Nullable(...)
PrevAmount = (if row.IsNull("prev_amount") then Nullable() else Nullable((row.["prev_amount"] :?> double) * address_commission)),
It compiles and looks ok to me, but I'm still open to different answers if they are more correct than mine
FAOCropsLivestock.csv contains more than 14 million rows. In my .fs file I have declared
type FAO = CsvProvider<"c:\FAOCropsLivestock.csv">
and tried to work with the following code
FAO.GetSample().Rows.Where(fun x -> x.Country = country) |> ....
FAO.GetSample().Filter(fun x -> x.Country = country) |> ....
In both cases, an exception was thrown.
I have also tried the following code after loading the CSV file into MSSQL Server
type Schema = SqlDataConnection<conStr>
let db = Schema.GetDataContext()
db.FAOCropsLivestock.Where(fun x-> x.Country = country) |> ....
This works. It also works if I issue the query using an OleDb connection, but that is slow.
How can I get a sequence out of it using CsvProvider?
If you refer to the bottom of the CSV Type Provider documentation, you will see a section on handling large datasets. As explained there, you can set CacheRows = false which will aid you when it comes to handling large datasets.
type FAO = CsvProvider<"c:\FAOCropsLivestock.csv", CacheRows = false>
You can then use standard sequence operations over the rows of the CSV as a sequence without loading the entire file into memory. e.g.
FAO.GetSample().Rows |> Seq.filter (fun x -> x.Country = country) |> ....
You should, however, take care to only enumerate the contents once.
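To illustrate the enumerate-once advice without the type provider, here is a small sketch in plain F#. The Row record and the rows sequence are hypothetical stand-ins for the provider-generated rows, and the counter exists only to show how many passes were made over the source.

```fsharp
// Hypothetical row type standing in for the provider-generated row
type Row = { Country: string; Value: float }

// Counts how many times the lazy source is enumerated
let passes = ref 0

// Lazy source: the body runs again on every enumeration, like an uncached CSV
let rows =
    seq {
        passes.Value <- passes.Value + 1
        yield { Country = "Kenya"; Value = 1.0 }
        yield { Country = "Peru"; Value = 2.0 }
        yield { Country = "Kenya"; Value = 3.0 }
    }

// Filter lazily, then materialize the (much smaller) result exactly once
let kenya =
    rows
    |> Seq.filter (fun r -> r.Country = "Kenya")
    |> Seq.toList

// Further work uses the in-memory list, not the source
let total = kenya |> List.sumBy (fun r -> r.Value)
let count = List.length kenya

printfn "rows kept: %d, total: %f, passes over source: %d" count total passes.Value
```

Materializing only the filtered rows keeps memory use proportional to the result rather than to the whole file.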
I am just starting to learn F#, and impressed by the type inference I thought I would try a function that gets the first record from a table (using query expressions, Linq style):
let getfirst data =
    let result = query { for n in data do take 1 }
    result |> Seq.head
This works, the type is IQueryable<'a> -> 'a.
But why doesn't this version work?
let getfirst2 data =
    query { for n in data do head }
Shouldn't for n in data do head give a scalar 'a just like last time? Can someone explain why the second version doesn't work, and how to make it work without using Seq.head?
The reason is that the query builder has a somewhat hacky overloaded Run method for running queries, with the following overloads:
QueryBuilder.Run : Quotations.Expr<'t> -> 't
QueryBuilder.Run : Quotations.Expr<Linq.QuerySource<'t, IEnumerable>> -> seq<'t>
QueryBuilder.Run : Quotations.Expr<Linq.QuerySource<'t, IQueryable>> -> IQueryable<'t>
In your case, any of the overloads could apply, given a suitable type for data (though QuerySource<_,_> is a type which isn't ever meant to be used by user code, so two of the overloads are quite unlikely). Unfortunately, due to the strange way these overloads are defined (the first and second are actually extension methods defined in separate modules), the third one wins the overload resolution battle.
I don't know why, but when you hover over the data argument in getfirst2 you see it's of type System.Linq.IQueryable<Linq.QuerySource<'a, System.Linq.IQueryable>> when it really should be System.Linq.IQueryable<'a>.
You can "fix" it by adding type annotations:
open System.Linq
let getfirst2 (data : IQueryable<'a>) : 'a = query {
    for item in data do
    head
}
Then it works like you have expected:
[1 .. 10]
|> System.Linq.Queryable.AsQueryable
|> getfirst2
|> printfn "%d" // Prints 1.
Maybe someone else can shed some light on why the compiler infers the types it does.
I wanted to filter a Deedle dataframe based on a list of values how would I go about doing this?
I had an idea to use the following code below:
let d= df1|>filterRowValues(fun row -> row.GetAs<float>("ts") = timex)
However the issue with this is that it is only based on one variable, I then thought of combining this with a for loop and an append function:
for i in 0.. recd.length -1 do
    df2.Append(df1|>filterRowValues(fun row -> row.GetAs<float>("ts") = recd.[i]))
This does not work either, and there must be a better way of doing this without a for loop. In R, for instance, I could use %in%.
You can use the F# set type to create a set of the values that you are interested in. In the filtering, you can then check whether the set contains the actual value for the row.
For example, say that you have recd of type seq<float>. Then you should be able to write:
let recdSet = set recd
let d = df1 |> Frame.filterRowValues (fun row ->
    recdSet.Contains(row.GetAs<float>("ts")))
Some other things that might be useful:
You can replace row.GetAs<float>("ts") with just row?ts (which always returns float and works only when you have a fixed name, like "ts", but it makes the code nicer)
Comparing float values might not be the best thing to do (because of floating point imprecisions, this might not always work as expected).
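The set-membership idea can be tried out without Deedle. The sketch below uses plain F# tuples as hypothetical rows (a name paired with a "ts" value) and keeps only the rows whose value is in the set, which is roughly what %in% does in R.

```fsharp
// Values to keep, as in the question's recd
let recd = [ 1.0; 2.5; 4.0 ]
let recdSet = set recd

// Hypothetical rows: (name, ts) pairs standing in for data frame rows
let rows = [ ("a", 1.0); ("b", 3.0); ("c", 4.0) ]

// Keep a row only when its ts value is a member of the set
let kept = rows |> List.filter (fun (_, ts) -> recdSet.Contains ts)

printfn "%A" kept
```

Set lookups are logarithmic in the size of the set, so this scales better than scanning a list for every row. The membership test is still exact float equality, so the caveat about floating point imprecision applies here too.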
I was trying to implement a Deedle solution for the little challenge from @migueldeicaza to achieve in F# what was done in http://t.co/4YFXk8PQaU with Python and R. The CSV source data is available from the link.
The start is simple but now, while trying to order based upon a column series of float values I'm struggling to understand the syntax for the IndexRows type annotation.
#I "../packages/FSharp.Charting.0.90.5"
#I "../packages/Deedle.0.9.12"
#load "FSharp.Charting.fsx"
#load "Deedle.fsx"
open System
open Deedle
open FSharp.Charting
let bodyCountData = Frame.ReadCsv(__SOURCE_DIRECTORY__ + "/film_death_counts.csv")
bodyCountData?DeathsPerMinute <- bodyCountData?Body_Count / bodyCountData?Length_Minutes
// select the first 4 rows (keys 0 through 3) using the default ordinal index
bodyCountData.Rows.[0..3]
// create a new frame indexed and ordered by descending number of screen deaths per minute
let bodyCountDataOrdered =
    bodyCountData
    |> Frame.indexRows <float>"DeathsPerMinute" // uh oh error here - I'm confused
And because I can't figure that syntax out... various messages like:
Error 1 The type '('a -> Frame<'c,Frame<int,string>>)' does not support the 'comparison' constraint. For example, it does not support the 'System.IComparable' interface. See also c:\wd\RPythonFSharpDFChallenge\RPythonFSharpDFChallenge\EvilMovieQuery.fsx(18,4)-(19,22). c:\wd\RPythonFSharpDFChallenge\RPythonFSharpDFChallenge\EvilMovieQuery.fsx 19 8 RPythonFSharpDFChallenge
Error 2 Type mismatch. Expecting a
'a -> Frame<'c,Frame<int,string>>
but given a
'a -> float
The type 'Frame<'a,Frame<int,string>>' does not match the type 'float' c:\wd\RPythonFSharpDFChallenge\RPythonFSharpDFChallenge\EvilMovieQuery.fsx 19 25 RPythonFSharpDFChallenge
Error 3 This expression was expected to have type
bool
but here has type
string c:\wd\RPythonFSharpDFChallenge\RPythonFSharpDFChallenge\EvilMovieQuery.fsx 19 31 RPythonFSharpDFChallenge
Edit: Just thinking about this... indexing on a measured float is a silly thing to do anyway - duplicates and missing values in real world data. So, I wonder what a more sensible approach to this would be. I still need to find the 25 max values... Maybe I can work this out for myself...
With Deedle 1.0, you can sort on an arbitrary column.
See: http://bluemountaincapital.github.io/Deedle/reference/deedle-framemodule.html#section7
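Independently of the Deedle API, the underlying "top N by a computed column" step can be sketched in plain F#. The Film record, the sample values and the topN helper below are all hypothetical and are not part of Deedle; they only show the sort-descending-then-truncate logic the question is after.

```fsharp
// Hypothetical record standing in for one row of the film data
type Film = { Title: string; BodyCount: float; LengthMinutes: float }

// Made-up sample rows, not taken from the real CSV
let films =
    [ { Title = "A"; BodyCount = 100.0; LengthMinutes = 100.0 }
      { Title = "B"; BodyCount = 300.0; LengthMinutes = 100.0 }
      { Title = "C"; BodyCount = 50.0; LengthMinutes = 100.0 } ]

// Sort by deaths per minute, highest first, and keep at most n rows
let topN n (rows: Film list) =
    rows
    |> List.sortByDescending (fun f -> f.BodyCount / f.LengthMinutes)
    |> List.truncate n

let top2Titles = topN 2 films |> List.map (fun f -> f.Title)

printfn "%A" top2Titles
```

Sorting on the computed value avoids using a float as the row index, so duplicate values cause no key clashes.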