I'm trying to learn F#, and I tried to reimplement the following C# function in F#:
private string[] GetSynonyms(string synonyms)
{
var items = Enumerable.Repeat(synonyms, 1)
.Where(s => s != null)
.Select(XDocument.Parse)
.Select(doc => doc.Root)
.Where(root => root != null)
.SelectMany(e => e.Elements(SynonymsNamespace + "synonym"))
.Select(e => e.Value)
.ToArray();
return items;
}
I got this far by myself
let xname = XNamespace.Get "http://localuri"
let syn = "<synonyms xmlns=\"http://localuri\"><synonym>a word</synonym><synonym>another word</synonym></synonyms>"
let synonyms str =
let items = [str]
items
|> List.map System.Xml.Linq.XDocument.Parse
|> List.map (fun x -> x.Root)
|> List.map (fun x -> x.Elements(xname + "synonym") |> Seq.cast<System.Xml.Linq.XElement>)
|> Seq.collect (fun x -> x)
|> Seq.map (fun x -> x.Value)
let a = synonyms syn
Dump a
Now I'm wondering if there is a more-functional way to write the same code.
By extracting the access to the properties to standalone functions I got this version
let xname = XNamespace.Get "http://localuri"
let syn = "<synonyms xmlns=\"http://localuri\"><synonym>a word</synonym><synonym>another word</synonym></synonyms>"
let getRoot (doc:System.Xml.Linq.XDocument) = doc.Root
let getValue (element:System.Xml.Linq.XElement) = element.Value
let getElements (element:System.Xml.Linq.XElement) =
element.Elements(xname + "synonym")
|> Seq.cast<System.Xml.Linq.XElement>
let synonyms str =
let items = [str]
items
|> List.map System.Xml.Linq.XDocument.Parse
|> List.map getRoot
|> List.map getElements
|> Seq.collect (fun x -> x)
|> Seq.map getValue
let a = synonyms syn
Dump a
But I still have a few concerns
Can I rewrite that Seq.collect (fun x -> x) in another way? It sounds redundant
Can I remove all those (fun x -> x.Property) without creating new functions?
How do I actually return an array and not a Seq<'a> (which I understand is an IEnumerable<'a>)?
Thanks
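Seq.collect (fun x -> x) can be rewritten with the predefined id function: Seq.collect id.
Not for property access. In F# 4.0 the lambda can be dropped for constructors only, because constructors can be passed as first-class functions.
Use Seq.toArray (or Seq.toList) at the end of the pipeline.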
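Putting those three points together, the function from the question might end up looking like the sketch below. It reuses the question's namespace URI and sample XML. The null checks of the C# original are left out for brevity, and it assumes System.Xml.Linq is already referenced (on .NET Framework F# Interactive that would mean a #r "System.Xml.Linq" line first).

```fsharp
open System.Xml.Linq

let xname = XNamespace.Get "http://localuri"
let syn = "<synonyms xmlns=\"http://localuri\"><synonym>a word</synonym><synonym>another word</synonym></synonyms>"

let synonyms (str: string) =
    [ str ]
    |> List.map XDocument.Parse
    // Property access still needs a lambda (or a named helper function)
    |> List.map (fun doc -> doc.Root.Elements(xname + "synonym"))
    // id replaces (fun x -> x) when flattening
    |> Seq.collect id
    |> Seq.map (fun e -> e.Value)
    // Materialise the lazy sequence as a string[]
    |> Seq.toArray

let a = synonyms syn
```

Here a is a string[] holding the two synonym values in document order.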
Would it be very wrong to drop the C# code and go all in with the XML provider in F#? In my world it's always wrong to parse XML by hand when other solutions exist (unless I'm trying to create octagonal wheels or moist gunpowder that others have made better before me).
In this regard I would even have used some transformation (XSLT) or selection (XPath/XQuery) unless I could use the XML provider or some XSD (C#) for generating code.
If for some reason the XML is so unstructured that you actually need parsing, then the XML is arguably wrong...
If using the XmlProvider you get namespacing, types etc for free...
#r @"..\correct\this\path\to\packages\FSharp.Data.2.2.5\lib\net40\FSharp.Data.dll"
#r "System.Xml.Linq"
open FSharp.Data
[<Literal>]
let syn = "<synonyms xmlns=\"http://localuri\"><synonym>a word</synonym><synonym>another word</synonym></synonyms>"
type Synonyms = XmlProvider<syn>
let a = Synonyms.GetSample()
a.Synonyms |> Seq.iter (printfn "%A")
Note that the XmlProvider can also take a file or a URL as the sample for inferring types, and that you can keep the literal above as the sample and then load the real data with
let a = Synonyms.Load(stuff)
where stuff is a stream, a TextReader or a URI, and is parsed according to the types inferred from your sample. The sample and stuff might even point to the same file or URI if the data lives in some standard place.
See also: http://fsharp.github.io/FSharp.Data/library/XmlProvider.html
My data is a SEQUENCE of:
[(40,"TX");(48,"MO");(15,"TX");(78,"TN");(41,"VT")]
My code is as follows:
type Csvfile = CsvProvider<somefile>
let data = Csvfile.GetSample().Rows
let nullid row =
row.Id = 15
let otherid row =
row.Id= 40
let iddata =
data
|> Seq.filter (not nullid)
|> Seq.filter (not otherid)
I create the functions.
Then I want to call the "not" of those functions to filter them out of a sequence.
But the issue is that I am getting errors for "row.Id" in the first two functions, because you can only access a property like that when the type is known.
How do I solve this problem so I can accomplish this successfully?
My result should be a SEQUENCE of:
[(48,"MO");(78,"TN");(41,"VT")]
You can use >> operator to compose the two functions:
let iddata =
data
|> Seq.filter (nullid >> not)
|> Seq.filter (otherid >> not)
See Function Composition and Pipelining.
Or you can make it more explicit:
let iddata =
data
|> Seq.filter (fun x -> not (nullid x))
|> Seq.filter (fun x -> not (otherid x))
You can see that in action:
let input = [|1;2;3;4;5;6;7;8;9;10|];;
let is3 value =
value = 3;;
input |> Seq.filter (fun x -> not (is3 x));;
input |> Seq.filter (is3 >> not);;
They both print val it : seq<int> = seq [1; 2; 4; 5; ...]
Please see below what an MCVE might look like in your case. In an .fsx file you can reference the FSharp.Data DLL with #r; in a compiled project, just reference the DLL and open it.
#if INTERACTIVE
#r @"..\..\SO2018\packages\FSharp.Data\lib\net45\FSharp.Data.dll"
#endif
open FSharp.Data
[<Literal>]
let datafile = @"C:\tmp\data.csv"
type CsvFile = CsvProvider<datafile>
let data = CsvFile.GetSample().Rows
In the end this is what you want to achieve:
data
|> Seq.filter (fun x -> x.Id <> 15)
|> Seq.filter (fun x -> x.Id <> 40)
//val it : seq<CsvProvider<...>.Row> = seq [(48, "MO"); (78, "TN"); (41, "VT")]
One way to do this is with statically resolved type parameters (SRTP), which allow a form of structural typing, where a type is accepted based on its shape, in this case having an Id property. If you want, you can define helper functions for the two numbers 15 and 40 and use them in your filter, just like in the second example. However, SRTP syntax is a bit strange, and it is designed for cases where you need to apply a function to different types that have some similarity (basically like interfaces).
let inline getId row =
(^T : (member Id : int) row)
data
|> Seq.filter (fun x -> (getId x <> 15 ))
|> Seq.filter (fun x -> (getId x <> 40))
//val it : seq<CsvProvider<...>.Row> = seq [(48, "MO"); (78, "TN"); (41, "VT")]
Now back to your original post, as you correctly point out your function will show an error, as you define it to be generic, but it needs to operate on a specific Csv row type (that has the Id property). This is very easy to fix, just add a type annotation to the row parameter. In this case your type is CsvFile.Row, and since CsvFile.Row has the Id property we can access that in the function. Now this function returns a Boolean. You could make it return the actual row as well.
let nullid (row: CsvFile.Row) =
row.Id = 15
let otherid (row: CsvFile.Row) =
row.Id = 40
Then what is left is applying this inside a Seq.filter and negating it:
let iddata =
data
|> Seq.filter (not << nullid)
|> Seq.filter (not << otherid)
|> Seq.toList
//val iddata : CsvProvider<...>.Row list = [(48, "MO"); (78, "TN"); (41, "VT")]
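To try the same idea without a CSV file on disk, here is a minimal sketch in which a record type stands in for CsvFile.Row. The Row record and the sample data are assumptions made up for illustration, based on the values in the question; the two predicates are combined into a single filter.

```fsharp
// Hypothetical stand-in for CsvFile.Row, so the example runs without a CSV file
type Row = { Id: int; State: string }

let data =
    seq [
        { Id = 40; State = "TX" }
        { Id = 48; State = "MO" }
        { Id = 15; State = "TX" }
        { Id = 78; State = "TN" }
        { Id = 41; State = "VT" }
    ]

// The annotation tells the compiler which type's Id property is meant
let nullid (row: Row) = row.Id = 15
let otherid (row: Row) = row.Id = 40

// One filter that rejects a row if either predicate matches
let iddata =
    data
    |> Seq.filter (fun row -> not (nullid row || otherid row))
    |> Seq.map (fun row -> row.Id, row.State)
    |> Seq.toList
```

Combining the predicates in one lambda walks the sequence once and keeps the negation in a single place.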
I know there's the back-pipe (<|) operator, referenced in several other SO answers. But that doesn't work well when combined with forward pipes (|>), which is common in chaining. However I'm looking for related options. Basically is there any built-in version of the below function definition? Or is this a bad/dangerous practice?
let inline (^%) f = f
let stuff =
[1;2;3]
|> Seq.filter ^% (>) 2
|> Seq.map ^% fun x -> x.ToString()
// compare to this, which doesn't compile (and would be hard to follow even if it did)
let stuff =
[1;2;3]
|> Seq.filter <| (>) 2
|> Seq.map <| fun x -> x.ToString()
There are some Haskell features, like optional infixing using backticks, and sections, which aren't available in F#. That makes certain constructs a bit more verbose.
Usually, I'd simply write a pipe of functions as the above like this:
let stuff =
[1;2;3]
|> Seq.filter (fun x -> x < 2)
|> Seq.map string
This is, in my opinion, much more readable. For example, using Seq.filter ^% (>) 2, I'd intuitively read that as meaning 'all values greater than 2', but that's not what it does:
> let inline (^%) f = f;;
val inline ( ^% ) : f:'a -> 'a
> let stuff =
[1;2;3]
|> Seq.filter ^% (>) 2
|> Seq.map ^% fun x -> x.ToString()
|> Seq.toList;;
val stuff : string list = ["1"]
If you leave the reader of the code in doubt of what the code does, you've just made everyone less productive. Using Seq.filter (fun x -> x < 2) may look more verbose, but is unambiguous to the reader.
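The source of the confusion is the argument order of a partially applied operator. A small sketch (the value names are made up for illustration) makes it visible:

```fsharp
let numbers = [ 1; 2; 3 ]

// (>) 2 is the partial application fun x -> 2 > x, i.e. "x is less than 2"
let viaSection = numbers |> List.filter ((>) 2)

// The lambda states the comparison in the order it is read
let viaLambda = numbers |> List.filter (fun x -> x < 2)

// What a reader might have expected from "(>) 2"
let greaterThanTwo = numbers |> List.filter (fun x -> x > 2)
```

viaSection and viaLambda both keep only 1, while greaterThanTwo keeps only 3.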
I am currently working on a beginner's project to implement my own duplicate file finder. This is my first time working with a .NET language, so I am still extremely unfamiliar with .NET APIs.
Here is the code that I have written so far:
open System
open System.IO
open System.Collections // Hashtable lives here, not in System.Collections.Generic
open System.Collections.Generic
let directory = #somePath
let getAllFiles (directory : string) =
Directory.GetFiles(directory)
let getFileInfo (directory : string) =
directory
|> getAllFiles
|> Seq.map (fun eachFile -> (eachFile, new FileInfo(eachFile)))
let getFileLengths (directory: string) =
directory
|> getFileInfo
|> Seq.map (fun (eachFile, eachFileInfo : FileInfo) -> (eachFile, eachFileInfo.Length))
// If two files have the same lengths, they might be duplicates of each other.
let groupByFileLengths (directory: string) =
directory
|> getFileLengths
|> Seq.groupBy snd
|> Seq.map (fun (fileLength, files) -> fileLength, files |> Seq.map fst |> List.ofSeq)
let findGroupsOfTwoOrMore (directory: string) =
directory
|> groupByFileLengths
|> Seq.filter (snd >> List.length >> (<>) 1)
let constructHashtable (someTuple) =
let hashtable = new Hashtable()
someTuple
|> Seq.iter hashtable.Add
hashtable
let readAllBytes (tupleOfFileLengthsAndFiles) =
tupleOfFileLengthsAndFiles
|> snd
|> Seq.map (fun eachFile -> (File.ReadAllBytes eachFile, eachFile))
|> constructHashtable
What I want to do is to construct a hash table with the byte array of each file as the key, and the file name itself as the value. If multiple files with different file names share the same byte array, then they are duplicates, and my goal is to remove the duplicate files.
I have looked through the Hashtable documentation on MSDN, but there is no method for identifying hashtable keys that contain multiple values.
Edit: Here is my attempt at implementing MD5:
let readAllBytesMD5 (tupleOfFileLengthsAndFiles) =
let md5 = MD5.Create()
tupleOfFileLengthsAndFiles
|> snd
|> Seq.map (fun eachFile -> (File.ReadAllBytes eachFile, eachFile))
|> Seq.map (fun (byteArray, eachFile) -> (md5.ComputeHash(byteArray), eachFile))
|> Seq.map (fun (hashCode, eachFile) -> (hashCode.ToString, eachFile))
Please advise on how I may improve and continue, because I am stuck here due to not having a firm grasp of how MD5 works. Thank you.
Hashtable doesn't support multiple values for the same key: you'll get an exception when you try to add a second entry with the same key. It is also untyped; you should almost always prefer a typed mutable System.Collections.Generic.Dictionary or an immutable F# Map.
What you're looking for is a Map<byte array, Set<string>>. Here's my take on it:
let buildMap (paths: string array) =
paths
|> Seq.map (fun eachFile -> (File.ReadAllBytes eachFile, eachFile))
|> Seq.groupBy fst
|> Seq.map (fun (key, items) ->
key, items |> Seq.map snd |> Set.ofSeq)
|> Map.ofSeq
As an aside, unless you're comparing very, very small files, using the entire file contents as a key won't get you very far. You will probably want to look into generating checksums for those files and using them instead.
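A minimal sketch of the checksum idea, assuming MD5 is good enough for spotting duplicates (it is not suitable for security purposes). The names checksum and findDuplicates are made up for illustration. The file is hashed from a stream, so its whole contents never have to sit in memory as a map key.

```fsharp
open System
open System.IO
open System.Security.Cryptography

// Hex-encoded MD5 digest of a file's contents (hypothetical helper)
let checksum (path: string) =
    use md5 = MD5.Create()
    use stream = File.OpenRead path
    BitConverter.ToString(md5.ComputeHash(stream))

// Map from checksum to the set of paths sharing it, keeping only groups of two or more
let findDuplicates (paths: string seq) =
    paths
    |> Seq.groupBy checksum
    |> Seq.map (fun (hash, files) -> hash, Set.ofSeq files)
    |> Seq.filter (fun (_, files) -> Set.count files > 1)
    |> Map.ofSeq
```

Files with equal checksums are very likely identical; a byte-by-byte comparison of each group could confirm it before anything is deleted.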
I am new to F# and trying to solve this Kata
The file football.dat contains the results from the English Premier League for 2001/2. The columns labeled ‘F’ and ‘A’ contain the total number of goals scored for and against each team in that season (so Arsenal scored 79 goals against opponents, and had 36 goals scored against them). Write a program to print the name of the team with the smallest difference in ‘for’ and ‘against’ goals.
When I save the file and then read it using File.ReadAllLines, my solution works:
open System.IO
open System
let split (s:string) =
let cells = Array.ofSeq(s.Split([|' '|],StringSplitOptions.RemoveEmptyEntries))
(cells.[1], int cells.[6], int cells.[8])
let balance t =
let (_,f,a) = t
-(f-a)
let lines = List.ofSeq(File.ReadAllLines(@"F:\Users\Igor\Downloads\football.dat"));;
lines
|> Seq.skip 5
|> Seq.filter (fun (s:string) -> s.Split([|' '|],StringSplitOptions.RemoveEmptyEntries).Length = 10)
|> Seq.map split
|> Seq.sortBy balance
|> Seq.take 1
|> Seq.map (fun (n,_,_) -> printfn "%s" n)
but when instead of reading the file I download it using WebClient and split lines the rest of the code does not work. The sequence is the same length but F# Interactive does not show the elements and prints no output. The code is
open System.Net
open System
let split (s:string) =
let cells = Array.ofSeq(s.Split([|' '|],StringSplitOptions.RemoveEmptyEntries))
(cells.[1], int cells.[6], int cells.[8])
let balance t =
let (_,f,a) = t
-(f-a)
let splitLines (s:string) =
List.ofSeq(s.Split([|'\n'|]))
let wc = new WebClient()
let lines = wc.DownloadString("http://pragdave.pragprog.com/data/football.dat")
lines
|> splitLines
|> Seq.skip 5
|> Seq.filter (fun (s:string) -> s.Split([|' '|],StringSplitOptions.RemoveEmptyEntries).Length = 10)
|> Seq.map split
|> Seq.sortBy balance
|> Seq.take 1
|> Seq.map (fun (n,_,_) -> printfn "%s" n)
What is the difference? List.ofSeq(File.ReadAllLines..) returns a sequence, and downloading the file from the internet and splitting it by \n returns the same sequence.
The last line should use Seq.iter instead of Seq.map and there are too many spaces in the expression that splits each line.
With these corrections it works ok:
open System.Net
open System
open System.IO
let split (s:string) =
let cells = Array.ofSeq(s.Split([|' '|],StringSplitOptions.RemoveEmptyEntries))
(cells.[1], int cells.[6], int cells.[8])
let balance t =
let (_,f,a) = t
-(f-a)
let splitLines (s:string) =
List.ofSeq(s.Split([|'\n'|]))
let wc = new WebClient()
let lines = wc.DownloadString("http://pragdave.pragprog.com/data/football.dat") |> splitLines
let output =
lines
|> Seq.skip 5
|> Seq.filter (fun (s:string) -> s.Split([|' '|],StringSplitOptions.RemoveEmptyEntries).Length = 10)
|> Seq.map split
|> Seq.sortBy balance
|> Seq.take 1
|> Seq.iter (fun (n,_,_) -> Console.Write(n))
let stop = Console.ReadKey()
That URL is returning an HTML page, not a raw data file, could that be what is causing your problems?
Also it is usually good to verify what delimiter the page is using for newlines. That one is using 0x0A which is \n, but sometimes you will find \r\n or rarely \r.
EDIT:
Also, you appear to be using Seq.map to handle the printing, which isn't a good way to do it: Seq.map is lazy, so the printfn calls only run if something enumerates the result. I had difficulty getting your sample to show any output when executing it all at once.
I would recommend mapping to n and printing some other way, for example by taking Seq.head.
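The difference between Seq.map and Seq.iter for side effects can be seen in a small sketch. The names and the log list are made up for illustration, with log.Add standing in for printfn so that the effect can be counted.

```fsharp
let names = [ "Arsenal"; "Liverpool" ]

let demo () =
    let log = ResizeArray<string>()

    // Seq.map only describes the work; nothing enumerates it, so nothing is logged
    let pending = names |> Seq.map (fun n -> log.Add n)
    let afterMap = log.Count

    // Seq.iter enumerates immediately and runs the side effect for each element
    names |> Seq.iter (fun n -> log.Add n)
    let afterIter = log.Count

    afterMap, afterIter
```

Calling demo () gives (0, 2): the mapped sequence was never enumerated, whereas Seq.iter ran once per element.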
In the following code Seq.generateUnique is constrained to be of type ((Assembly -> seq<Assembly>) -> seq<Assembly> -> seq<Assembly>).
open System
open System.Collections.Generic
open System.Reflection
module Seq =
let generateUnique =
let known = HashSet()
fun f initial ->
let rec loop items =
seq {
let cachedSeq = items |> Seq.filter known.Add |> Seq.cache
if not (cachedSeq |> Seq.isEmpty) then
yield! cachedSeq
yield! loop (cachedSeq |> Seq.collect f)
}
loop initial
let discoverAssemblies() =
AppDomain.CurrentDomain.GetAssemblies() :> seq<_>
|> Seq.generateUnique (fun asm -> asm.GetReferencedAssemblies() |> Seq.map Assembly.Load)
let test() = printfn "%A" (discoverAssemblies() |> Seq.truncate 2 |> Seq.map (fun asm -> asm.GetName().Name) |> Seq.toList)
for _ in 1 .. 5 do test()
System.Console.Read() |> ignore
I'd like it to be generic, but putting it into a file apart from its usage yields a value restriction error:
Value restriction. The value
'generateUnique' has been inferred to
have generic type val
generateUnique : (('_a -> '_b) -> '_c
-> seq<'_a>) when '_b :> seq<'_a> and '_c :> seq<'_a> Either make the
arguments to 'generateUnique' explicit
or, if you do not intend for it to be
generic, add a type annotation.
Adding an explicit type parameter (let generateUnique<'T> = ...) eliminates the error, but now it returns different results.
Output without type parameter (desired/correct behavior):
["mscorlib"; "TEST"]
["FSharp.Core"; "System"]
["System.Core"; "System.Security"]
[]
[]
And with:
["mscorlib"; "TEST"]
["mscorlib"; "TEST"]
["mscorlib"; "TEST"]
["mscorlib"; "TEST"]
["mscorlib"; "TEST"]
Why does the behavior change? How could I make the function generic and achieve the desired behavior?
generateUnique is a lot like the standard memoize pattern: it should be used to calculate memoized functions from normal functions, not do the actual caching itself.
@kvb was right about the change in the definition required for this shift, but then you need to change the definition of discoverAssemblies as follows:
let discoverAssemblies =
//"memoize"
let generator = Seq.generateUnique (fun (asm:Assembly) -> asm.GetReferencedAssemblies() |> Seq.map Assembly.Load)
fun () ->
AppDomain.CurrentDomain.GetAssemblies() :> seq<_>
|> generator
I don't think that your definition is quite correct: it seems to me that f needs to be a syntactic argument to generateUnique (that is, I don't believe that it makes sense to use the same HashSet for different fs). Therefore, a simple fix is:
let generateUnique f =
let known = HashSet()
fun initial ->
let rec loop items =
seq {
let cachedSeq = items |> Seq.filter known.Add |> Seq.cache
if not (cachedSeq |> Seq.isEmpty) then
yield! cachedSeq
yield! loop (cachedSeq |> Seq.collect f)
}
loop initial
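To see how this version behaves without loading assemblies, here is a small sketch over integers. The neighbours function and the demo wrapper are made up for illustration; generateUnique is the definition above, repeated so the block runs on its own.

```fsharp
open System.Collections.Generic

let generateUnique f =
    let known = HashSet()
    fun initial ->
        let rec loop items =
            seq {
                let cachedSeq = items |> Seq.filter known.Add |> Seq.cache
                if not (cachedSeq |> Seq.isEmpty) then
                    yield! cachedSeq
                    yield! loop (cachedSeq |> Seq.collect f)
            }
        loop initial

// Hypothetical "references" relation: n points at n+1 and n+2 while n < 4
let neighbours n =
    if n < 4 then seq [ n + 1; n + 2 ] else Seq.empty

let demo () =
    // Partial application creates one HashSet, shared by every later call to gen
    let gen = generateUnique neighbours
    let first = gen (seq [ 1 ]) |> Seq.toList
    let second = gen (seq [ 1 ]) |> Seq.toList
    first, second
```

demo () returns ([1; 2; 3; 4; 5], []): the second call yields nothing because every item is already in the shared set, which mirrors the empty lists at the end of the desired output in the question.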