lines=True parameter for the Json Type Provider and Json.Net library? - f#

I am working on this Kaggle competition. The Jupyter notebooks on Kaggle only support R and Python and I wanted to use F# locally. The problem is that the datasets are .json files and both the F# Json Type Provider and Newtonsoft libraries fail when trying to parse the files.
Here are examples of the code failing in F#:
open FSharp.Data
type Context = JsonProvider<"train.json">
let context = Context.
and
open System
open System.IO
open Newtonsoft.Json
open Newtonsoft.Json.Linq
let object = JObject.Parse(File.ReadAllText("train.json"));
object
This Python example uses these lines of code to parse them correctly:
train = pd.read_json('../input/stanford-covid-vaccine/train.json', lines=True)
test = pd.read_json('../input/stanford-covid-vaccine/test.json', lines=True)
In the notebook, the author says that without the "lines=True" parameter, the read_json method fails with this trailing error.
My question: assuming this is the same error, is there a way to apply that same kind of "lines=True" option to the .NET libraries to parse the JSON?

I've seen a few datasets where the format was one valid JSON record per line:
{"event":"nothing 1"}
{"event":"nothing 2"}
{"event":"nothing 3"}
This is not valid JSON overall. I think you can either parse it line-by-line or you can turn it into valid JSON. For line-by-line parsing (which may be more efficient as you can do this in a streaming fashion), I would use:
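open System.IO
open FSharp.Data
type Log = JsonProvider<"""{"event":"nothing 1"}""">
// File.ReadLines yields lines lazily, so the file is streamed rather than loaded whole
for line in File.ReadLines("some.json") do
    let l = Log.Parse(line)
    printfn "%s" l.Event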
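The other option mentioned above, turning the file into valid JSON, could look roughly like the sketch below. It joins the records with commas and wraps them in square brackets so that an ordinary JSON parser accepts the result as one array. The function name jsonLinesToArray and the inline sample string are my own for illustration, and I am assuming System.Text.Json is available (it ships with .NET Core 3.0 and later); the same wrapped string could equally be handed to Json.NET or to a type provider.

```fsharp
open System
open System.Text.Json

// Wrap newline-delimited JSON records in [ ... ] so that a standard
// JSON parser accepts the whole text as a single array.
// Assumes one complete record per line (no record spans several lines).
let jsonLinesToArray (text: string) : string =
    text.Split([| '\n' |], StringSplitOptions.RemoveEmptyEntries)
    |> Array.map (fun line -> line.Trim())      // also drops a trailing '\r'
    |> Array.filter (fun line -> line <> "")
    |> String.concat ","
    |> sprintf "[%s]"

// Hypothetical in-memory input mirroring the three records shown above.
let sample =
    "{\"event\":\"nothing 1\"}\n{\"event\":\"nothing 2\"}\n{\"event\":\"nothing 3\"}\n"

// Parse the wrapped text and pull out the "event" field of every record.
let events =
    use doc = JsonDocument.Parse(jsonLinesToArray sample)
    [ for item in doc.RootElement.EnumerateArray() ->
        item.GetProperty("event").GetString() ]
```

This reads the whole text into memory, so for very large files the line-by-line version is probably the better fit.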

Related

which library is GetSamples in

It is referenced here:
Add calculated key to collection
but I cannot find it in FSharp.Data 2.4.6, and the answer given by the great TP does not reference a namespace.
GetSample is present, though.
GetSamples can be used on your JSON data represented by a JsonProvider.
Install the FSharp.Data Nuget package and add the import declaration by using open FSharp.Data. Once you have defined a JSON file or some JSON data you should be able to process it further.
open FSharp.Data
//type Values = JsonProvider<"yourData.json">
type Values = JsonProvider<""" [{"Name":"Hello"}, {"Name":"World"}] """>
printfn "%A" (Values.GetSamples())

Can I create a conditional literal?

In order to create a Json provider I need to pass a literal with the path. There are several people working on the project from different locations, and the paths are different in each case. (Actually only the beginning of each path). I tried to create a literal with pattern matching but the compiler does not accept it. Is there another way to do this?
My failed attempt is below:
open FSharp.Data
[<Literal>]
let bitbucketRoot = // Error message: This is not a valid constant expression
    let computerName = Environment.MachineName
    match computerName with
    | "DESKTOP-G3OF32U" -> "C:\\Users\\Fernando"
    | "HPW8" -> @"H:\Dropbox\"
    | _ -> failwith "Unknown computer"
[<Literal>] // Error message: This is not a valid constant expression
let projDataPath = bitbucketRoot + @"Bitbucket\VSProjects\Fractal10\Fractal10\data\"
[<Literal>] // Error message: This is not a valid constant expression
let jsonPath = projDataPath + "fractal.json"
type PathInfo = JsonProvider<Sample=jsonPath>
I would advise that you store it in source control and make it a path relative to your project root, assuming you are working out of a common source control repository.
Either that, or host the sample on a public URL. (I wouldn't actually recommend this because including it in your source repository allows versioning and doesn't publicly expose your data)
You cannot create a conditional literal, as the other comments point out. However, this is a fairly frequent use case, and the way to deal with it is as follows:
#r @"..\packages\FSharp.Data\lib\net40\FSharp.Data.dll"
open FSharp.Data
open System
open System.IO
[<Literal>]
let JsonSource = __SOURCE_DIRECTORY__ + @"\test.json"
type JSonType = JsonProvider<JsonSource>
let json1 = JSonType.GetSamples()
let anotherPath = @"C:\tmp"
let anotherJson = anotherPath + @"\test.json"
let json2 = JSonType.Load(anotherJson)
The __SOURCE_DIRECTORY__ identifier resolves to the directory that contains the current source file (just display it in the REPL to check), so you can append the filename to it and make the result a literal. If you check the sample file into a git repo next to the source, everyone who checks it out has it at the same relative path, and you can refer to it when generating the type. When actually using the type or reading the full file, you can just use the .Load() method to load any file, and that path doesn't have to be a literal.
There is actually a second way, which could work for you depending on the circumstances: compile a sample and distribute it as a .dll. You can refer to this and use it directly without having access to the actual file. Please see the Using the JSON Provider in a Library section at the end of the documentation.
I have not tried referring to the JSON in a config file, but it might also be possible.
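To make the split between the compile-time sample and the runtime path more concrete, here is a minimal sketch of the runtime half, which is where the per-machine choice can live since it needs no [<Literal>]. The machine names and roots are taken from the question; the helper name rootFor and the fallback to __SOURCE_DIRECTORY__ are my own assumptions, and the intermediate Bitbucket folders from the question are left out for brevity.

```fsharp
open System
open System.IO

// Pure function, so the mapping can be checked without depending on
// the machine that happens to run it. Names and roots come from the question.
let rootFor (machineName: string) : string option =
    match machineName with
    | "DESKTOP-G3OF32U" -> Some @"C:\Users\Fernando"
    | "HPW8" -> Some @"H:\Dropbox"
    | _ -> None

// Assumption: fall back to the directory of this source file
// when the machine is not listed, instead of failing.
let dataRoot =
    rootFor Environment.MachineName
    |> Option.defaultValue __SOURCE_DIRECTORY__

// An ordinary runtime string: fine for .Load(), not usable as a type provider argument.
let jsonPath = Path.Combine(dataRoot, "fractal.json")
```

The resulting jsonPath would be passed to the provided type's Load method, while the type itself is still generated from a sample at a fixed relative path.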

How to implement a read table function in F#? (like Vlookup)

I need a function.
When I enter 0~15 and Protection, it'll return 45%.
Just like Vlookup function in Excel.
Is there a function like this one in F#?
(At website try F#, Learn -> Financial Modeling -> Using the Yahoo Finance Type Provider
It recommended us to use Samples.Csv.dll. However, I failed to install it and don't want to install that package just for a function :(.. )
I followed the tutorial (http://fsharp.github.io/FSharp.Data/library/CsvProvider.html)
and tried to run the program on my computer. But I am in trouble now
It couldn't identify the type CsvProvider (So I can't use the function Stocks.Load.)
What's the problem..?
This is how the code looks when using the CSV type provider in F# Data. To get this to work, you'll need to add a reference to FSharp.Data.dll. The best way to do this is to install the package from NuGet. In Visual Studio, it will add the reference for you, and on the command line you can run:
nuget install FSharp.Data
Alternatively, if you are in an F# script file, then you need to install the NuGet package and then add #r @"C:\path\to\FSharp.Data.dll". Then you can write the following:
open FSharp.Data
// Generate type based on a local copy with sample data
type Data = CsvProvider<"sample.csv">
// Load actual data from a file (this can be a different file with the same structure)
let loaded = Data.Load("runtime/file/name.csv")
// Find row for a specified age range & look at the properties
// (FSharp.Data 2.x and later expose the rows as .Rows; 1.x releases called it .Data)
let row = loaded.Rows |> Seq.find (fun r -> r.Age = "0~15")
row.Protection
row.Saving
row.Specified
A very simple way to do this is with a DataTable:
open System.Data
open System.IO
open LumenWorks.Framework.IO.Csv
let vlookup =
    let table = new DataTable()
    do
        use streamReader = new StreamReader(@"C:\data.csv")
        use csvReader = new CsvReader(streamReader, hasHeaders=true)
        table.Load(csvReader)
    // The primary key makes Rows.Find look up by the Age column
    table.PrimaryKey <- [|table.Columns.["Age"]|]
    fun (age: string) (column: string) -> table.Rows.Find([| box age |]).[column]
//Usage
vlookup "0~15" "Protection" |> printfn "%A"
There's no lack of CSV readers out there. I used this especially fast one (also available on NuGet).
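If you would rather not install any package just for one lookup, a dependency-free version is also possible. The following is only a rough sketch: it assumes a simple CSV with no quoted or escaped commas and the same number of cells in every row, and the names buildLookup and csvText are mine. The column names and the 0~15 / Protection / 45% cell come from the question and the answer above; every other cell is a made-up placeholder.

```fsharp
open System

// Placeholder table. Only the "0~15" / Protection / "45%" cell is from the question;
// the remaining values (p1, p2, ...) are dummies for illustration.
let csvText =
    "Age,Protection,Saving,Specified\n0~15,45%,p1,p2\n16~30,p3,p4,p5"

// Build a two-level map: first-column key -> (column name -> cell).
// Assumes no quoted fields and that every row has as many cells as the header.
let buildLookup (csv: string) =
    let rows =
        csv.Split([| '\n' |], StringSplitOptions.RemoveEmptyEntries)
        |> Array.map (fun line -> line.Trim().Split(','))
    let header = rows.[0]
    let table =
        rows.[1..]
        |> Array.map (fun cells -> cells.[0], Map.ofArray (Array.zip header cells))
        |> Map.ofArray
    // Returns None when either the key or the column is missing.
    fun (key: string) (column: string) ->
        table |> Map.tryFind key |> Option.bind (Map.tryFind column)

let vlookup = buildLookup csvText

// Usage
vlookup "0~15" "Protection" |> printfn "%A"
```

Returning an option instead of throwing is a design choice here; Excel's VLOOKUP would show #N/A in the same situation.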

The type 'XmlProvider' is not defined

I'm trying to use the FSharp.Data third party library but am getting an error The type 'XmlProvider' is not defined on the XmlProvider class.
namespace KMyMoney
open FSharp.Data
module Read =
    let xml = File.ReadAllText("KMyMoneySampleFile.xml")
    type KMyMoneySource = XmlProvider<xml>
I'm using NuGet to get the library. Library is 'FSharp.Data 1.1.8'
When I type "FSharp.Data.", only four options are offered: Csv, FreebaseOperators, Json, and RuntimeImplementation.
Am I missing something? I'm relatively new to F#. So, sorry for the simple question. I've looked on GitHub but haven't seen any mention of this problem. I am creating a library in F#.
The parameter between <> is the Sample parameter of the type provider, which has to be a compile time constant. That sample is used to infer the structure of the xml.
Try this instead:
namespace KMyMoney
open FSharp.Data
module Read =
    type KMyMoneySource = XmlProvider<"KMyMoneySampleFile.xml">
and then do
let xml = KMyMoneySource.Load("KMyMoneySampleFile.xml")
or if you're reading the same file you used as the XmlProvider sample parameter, just do this:
let xml = KMyMoneySource.GetSample()
Note that Type Providers are a feature of F# 3.0, so this only works in VS2012 or later. If you're using VS2010, you'll just get a bunch of syntax errors.
The data has to be available at compile-time which is achieved by putting a file reference in the angle brackets like this (notice that it is a string literal containing a file path, not a string binding containing the data). You can also achieve this by putting a string literal containing the format in the brackets:
type Stocks = CsvProvider<"../docs/MSFT.csv">
let csv = new CsvProvider<"1,2,3", HasHeaders = false, Schema = "Duration (float<second>),foo,float option">()
See here for more information.
Check out this link. Basically, you also need to add System.Xml.Linq.dll as a reference to your project.

How to read .docx file using F#

How can I read a .docx file using F#? If I use
System.IO.File.ReadAllText("D:/test.docx")
it returns garbage output with beep sounds.
Here is an F# snippet that may give you a jump-start. It successfully extracts all text contents of a Word2010-created .docx file as a string of concatenated lines:
open System
open System.IO
open System.IO.Packaging
open System.Xml
let getDocxContent (path: string) =
    use package = Package.Open(path, FileMode.Open)
    let stream = package.GetPart(new Uri("/word/document.xml", UriKind.Relative)).GetStream()
    stream.Seek(0L, SeekOrigin.Begin) |> ignore
    let xmlDoc = new XmlDocument()
    xmlDoc.Load(stream)
    xmlDoc.DocumentElement.InnerText
printfn "%s" (getDocxContent @"..\..\test.docx")
To make it work, do not forget to reference WindowsBase.dll in your VS project.
.docx files follow Open Packaging Convention specifications. At the lowest level, they are .ZIP files. To read it programmatically, see example here:
A New Standard For Packaging Your Data
Packages and Parts
Using F#, it's the same story, you'll have to use classes in the System.IO.Packaging Namespace.
System.IO.File.ReadAllText has type of string -> string.
Because a .docx file is a binary file, some of its bytes probably decode to the bell character when read as text, which would explain the beeps. Rather than ReadAllText, look into Word automation, the Packaging API, or the OpenXML APIs.
Try using the OpenXML SDK from Microsoft.
Also on the linked page is the Microsoft tool that you can use to decompile the office 2007 files. The decompiled code can be quite lengthy even for simple documents though so be warned. There is a big learning curve associated with OpenXML SDK. I'm finding it quite difficult to use.
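Since a .docx file is a ZIP archive at the lowest level, as noted above, another route is to open it with the general-purpose ZIP classes instead of System.IO.Packaging. This is just a sketch under a few assumptions: that System.IO.Compression (ZipFile, .NET 4.5 or later) is available, that the main document part sits at word/document.xml as in the snippet above, and that plain concatenated text with no paragraph breaks is good enough. The function name getDocxText is mine.

```fsharp
open System.IO
open System.IO.Compression
open System.Xml.Linq

// Read the main document part straight out of the ZIP container and
// return the concatenated text of all its XML text nodes.
let getDocxText (path: string) : string =
    use archive = ZipFile.OpenRead(path)
    let entry = archive.GetEntry("word/document.xml")
    if isNull entry then
        failwithf "%s does not contain word/document.xml" path
    use stream = entry.Open()
    // Root.Value concatenates every text node below the root element.
    XDocument.Load(stream).Root.Value
```

On the full .NET Framework this needs references to System.IO.Compression.dll, System.IO.Compression.FileSystem.dll and System.Xml.Linq.dll; on .NET Core and later they should already be part of the shared framework.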

Resources