Converting a HaXml Document to a Content - xml-parsing

I am looking for a HaXml library function to convert a Text.XML.HaXml.Types.Document to a Text.XML.HaXml.Types.Content.
In the book Real World Haskell, the following function is provided:
getContent :: Document -> Content
getContent (Document _ _ e _) = CElem e
I believe that, with current HaXml, this should be changed to
getContent :: Document i -> Content i
getContent (Document _ _ e _) = CElem e undefined
I am surprised that I cannot find anything similar in the HaXml packages.

I think the function you want is docContent:
docContent :: i -> Document i -> Content i
The Haddock says:
Get the main element of the document so that you can apply CFilters directly. i is typically (posInNewCxt filename Nothing)
Its implementation is more or less what you would expect:
docContent i (Document _ _ e _) = CElem e i
The code in RWH no longer matches because, when RWH was written, HaXml was at version 1.13.*, in which the types Document and Content were not yet parametrized.

Related

final output of Result, in F#

This seems like a question that has an ultra simple answer, but I can't think of it:
Is there a built in method, within Result, for:
let (a: Result<'a, 'a>) = ...
match a with
| Ok x -> x
| Error e -> e
No, because this function requires the Ok type and the Error type to be the same, which makes Result less general.
No, there isn't any function which will allow you to do so. But you can easily define it:
[<RequireQualifiedAccess>]
module Result =
    let join (value: Result<'a, 'a>) =
        match value with
        | Ok v -> v
        | Error e -> e
let getResult s =
    if System.String.IsNullOrEmpty s then
        Error s
    else
        Ok s
let a =
    getResult "asd"
    |> Result.join
    |> printfn "%s"
It doesn't make Result less general (as @brianberns said), because it's not an instance member. The existence of Unwrap doesn't make Task less general.
Update
After a more thorough search of FSharpPlus and FSharpx.Extras, I found the function that is needed. Its signature is ('a -> 'c) -> ('b -> 'c) -> Result<'a,'b> -> 'c rather than Result<'a, 'a> -> 'a, and it's called Result.either in both libraries (source 1 and source 2). So to get the value, we can pass id for both parameters:
#r "nuget:FSharpPlus"
open FSharpPlus
// OR
#r "nuget:FSharpx.Extras"
open FSharpx
getResult "asd"
|> Result.either id id
|> printfn "%s"
It may also be useful to define a shortcut for this and call it Result.join, or Result.fromEither, as the equivalent function is called in Haskell.
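For readers who would rather not add a package reference, here is a minimal sketch of what such a function does, written against the built-in Result type. The names either and join below are hypothetical local definitions that mirror the signature quoted above; they are not the library implementations.

```fsharp
// Hypothetical stand-ins for the library functions, written against the
// built-in Result type so that no package reference is needed.
let either (onOk: 'a -> 'c) (onError: 'b -> 'c) (result: Result<'a, 'b>) : 'c =
    match result with
    | Ok value -> onOk value
    | Error error -> onError error

// When both cases carry the same type, passing id twice just unwraps.
let join (result: Result<'a, 'a>) : 'a = either id id result

let getResult s =
    if System.String.IsNullOrEmpty s then Error s else Ok s

let fromOk = getResult "asd" |> join
let fromError = getResult "" |> join
// The two cases may also have different types, as long as both
// handlers return the same type.
let length = Ok "asd" |> either String.length (fun (code: int) -> code)
```

Keeping either as the primitive and defining join on top of it means the general function stays available when the Ok and Error types differ.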

Prevent shadowing of Ok and Error with FParsec?

Suppose I have a test like this:
module MyTests
open Xunit
open FParsec
open FsUnit.Xunit
open MyParsers
[<Fact>]
let ``pfoo works as expected`` () =
    let text = "blahblahblah"
    let actual =
        match run pfoo text with
        | Success (x, _, _) -> Result.Ok x
        | Failure (s, _, _) -> Result.Error s
    let expected : Result<Foo, string> =
        Result.Ok
            {
                Foo = "blahblahblah"
            }
    expected
    |> should equal actual
open FParsec will shadow Ok so that I need to fully qualify it like Result.Ok.
This is pretty annoying. Is there a good way to "open" Result again so that I can write Ok unqualified?
It's not Result that you need to "open", but Microsoft.FSharp.Core, the namespace in which Result and both its constructors are defined. This namespace is open by default, but you can open it again so that its definitions take precedence over FParsec's:
open Xunit
open FParsec
open FsUnit.Xunit
open MyParsers
open Microsoft.FSharp.Core
Alternatively, you can alias just the Ok identifier:
let Ok = Result.Ok
let x = Ok "foo" // x : Result<string, _>
I prefer the latter method, because it minimizes the impact surface and thus reduces the chance of surprises.
The downside is that the aliased Ok won't work for pattern matching:
match x with
| Ok y -> ... // This is Ok from FParsec
If you need pattern matching as well, you'll have to alias the matcher too:
let (|Ok|Error|) x = match x with | Result.Ok o -> Ok o | Result.Error e -> Error e
At which point I would probably fall back to reopening the module.

Writing unit tests against PrintfFormat

I have a type that I'm trying to understand by writing unit tests against it, but I can't work out what to do with PrintfFormat:
type ValueFormat<'p,'st,'rd,'rl,'t,'a> = {
    format: PrintfFormat<'p,'st,'rd,'rl,'t>
    paramNames: (string list) option
    handler: 't -> 'a
  }
  with
    static member inline construct (this: ValueFormat<_,_,_,_,_,_>) =
      let parser s =
        s |> tryKsscanf this.format this.handler
          |> function Ok x -> Some x | _ -> None
      let defaultNames =
        this.format.GetFormatterNames()
          |> List.map (String.replace ' ' '_' >> String.toUpperInvariant)
          |> List.map (sprintf "%s_VALUE")
      let names = (this.paramNames ?| defaultNames) |> List.map (sprintf "<%s>")
      let formatTokens = this.format.PrettyTokenize names
      (parser, formatTokens)
I feel confident that I can figure everything out but PrintfFormat is throwing me with all those generics.
The file I'm looking at for the code I want to unit test is here for the FSharp.Commandline framework.
My question is, what is PrintfFormat and how should it be used?
A link to the printf.fs file is here. It contains the definition of PrintfFormat
The PrintfFormat<'Printer,'State,'Residue,'Result,'Tuple> type, as defined in the F# source code, has five type parameters:
'Result is the type that your formatting/parsing function produces. This is string for sprintf
'Printer is a type of a function generated based on the format string, e.g. "%d and %s" will give you a function type int -> string -> 'Result
'Tuple is a tuple type generated based on the format string, e.g. "%d and %s" will give you a tuple type int * string.
'State and 'Residue are type parameters that are used when you have a custom formatter using %a, but I'll ignore that for now for simplicity (it's never needed unless you have %a format string)
There are two ways of using the type. The first is formatting, in which case you'll want to write a function that returns 'Printer as the result. The hard part is that you need to construct the returned function using reflection. Here is an example that works only with a format string that takes a single argument:
open Microsoft.FSharp.Reflection
let myformat (fmt:PrintfFormat<'Printer,obj,obj,string,'Tuple>) : 'Printer =
    unbox <| FSharpValue.MakeFunction(typeof<'Printer>, fun o ->
        box (o.ToString()) )
myformat "%d" 1
myformat "%s" "Yo"
This simply returns the parameter passed as a value for %d or %s. To make this work for multiple arguments, you'd need to construct the function recursively (so that it's not just e.g. int -> string but also int -> (int -> string))
In the other use, you define a function that returns 'Tuple and it needs to create a tuple containing values according to the specified formatting string. Here is a small sample that only handles %s and %d format strings:
open FSharp.Reflection
let myscan (fmt:PrintfFormat<'Printer,obj,obj,string,'Tuple>) : 'Tuple =
    let args =
        fmt.Value
        |> Seq.pairwise
        |> Seq.choose (function
            | '%', 'd' -> Some(box 123)
            | '%', 's' -> Some(box "yo")
            | _ -> None)
    unbox <| FSharpValue.MakeTuple(Seq.toArray args, typeof<'Tuple>)
myscan "%d %s %d"

Can I use StringFormat as TextWriterFormat? kfprintf / kprintf usage

I've got a function that logs to the console:
Printf.kprintf
    (printfn
        "[%s][%A] %s"
        <| level.ToString()
        <| DateTime.Now)
    format // fprint to System.Console.Out maybe
It uses a Printf.StringFormat as the format, and now I want to follow the same logic and print to a file.
So I tried
Printf.kfprintf
    (fun f ->
        fprintfn file "[%s][%A] "
        <| level.ToString()
        <| DateTime.Now
    ) file (format)
There are two things I can't understand. Why is the continuation unit -> 'A instead of string -> 'A, and how should I use it? And can I use my StringFormat here as a TextWriterFormat?
Another problem: in the first snippet the formatted message is passed to the string -> 'Result continuation, but with kfprintf I can't do that, because the continuation is unit -> 'Result and the formatted message appears before the [x][x] prefix. I guess I could somehow pass the format to f, but I can't find a good example; the only one I found is part of the F# compiler:
[<CompiledName("PrintFormatToTextWriter")>]
let fprintf (os: TextWriter) fmt = kfprintf (fun _ -> ()) os fmt
[<CompiledName("PrintFormatLineToTextWriter")>]
let fprintfn (os: TextWriter) fmt = kfprintf (fun _ -> os.WriteLine()) os fmt
But how can I use this unit? How can I print something after my message?
I don't think you need to use Printf.kfprintf; you can carry on using Printf.kprintf, because the inner fprintfn already writes to the TextWriter.
let logToWriter writer level format =
    Printf.kprintf (fprintfn writer "[%s][%A] %s"
                    <| level.ToString()
                    <| System.DateTime.Now) format
Also see this for an example of using Printf.kfprintf.
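To make the difference between the two continuations concrete, here is a minimal sketch under a few assumptions: the level is a plain string, the timestamp is left out so that the output is deterministic, and a StringWriter stands in for the log file. The names logWithKprintf and logWithKfprintf are hypothetical. With kprintf the continuation receives the formatted message as a string. With kfprintf the message has already been written to the writer by the time the continuation runs, which is why it receives unit; one way to get the prefix in front is to write it before calling kfprintf and use the continuation only to end the line.

```fsharp
open System.IO

// kprintf: the message is formatted to a string first, and the continuation
// decides where it goes (here: after a prefix, on the given writer).
let logWithKprintf (writer: TextWriter) (level: string) format =
    Printf.kprintf (fun message -> fprintfn writer "[%s] %s" level message) format

// kfprintf: the message is written straight to the writer, so the
// continuation gets unit. Write the prefix before, end the line after.
let logWithKfprintf (writer: TextWriter) (level: string) format =
    fprintf writer "[%s] " level
    Printf.kfprintf (fun () -> writer.WriteLine()) writer format

let first = new StringWriter()
logWithKprintf first "INFO" "%d files, %s" 3 "done"

let second = new StringWriter()
logWithKfprintf second "WARN" "%s is missing" "a.txt"

// Trim the trailing newline so the text is easy to compare.
let firstLine = first.ToString().TrimEnd()
let secondLine = second.ToString().TrimEnd()
```

The kprintf version builds an intermediate string for every message, while the kfprintf version streams directly to the writer; for a logger either is likely to be fine.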

How to write a functional file "scanner"

First let me apologize for the scale of this problem but I'm really trying to think functionally and this is one of the more challenging problems I have had to work with.
I wanted to get some suggestions on how I might handle a problem in a functional manner, particularly in F#. I am writing a program that goes through a list of directories, uses a list of regex patterns to filter the files retrieved from those directories, and uses a second list of regex patterns to find matches in the text of the retrieved files. It should return the filename, line index, column index, pattern and matched value for each piece of text that matches a given regex pattern.
Exceptions also need to be recorded, and there are three possible exception scenarios: the directory can't be opened, the file can't be opened, or reading content from the file fails. The final requirement is that the volume of files "scanned" for matches could be very large, so the whole thing needs to be lazy.
I'm not so much worried about a "pure" functional solution as interested in a "good" solution that reads well and performs well. One final challenge is to make it interop with C#, because I would like to use the WinForms tools to attach this algorithm to a UI. Here is my first attempt, which will hopefully clarify the problem:
open System.Text.RegularExpressions
open System.IO
type Reader<'t, 'a> = 't -> 'a //=M['a], result varies
let returnM x _ = x
let map f m = fun t -> t |> m |> f
let apply f m = fun t -> t |> m |> (t |> f)
let bind f m = fun t -> t |> (t |> m |> f)
let Scanner dirs =
    returnM dirs
    |> apply (fun dirExHandler ->
        Seq.collect (fun directory ->
            try
                Directory.GetFiles(directory, "*", SearchOption.AllDirectories)
            with | e ->
                dirExHandler e directory
                Array.empty))
    |> map (fun filenames ->
        returnM filenames
        |> apply (fun (filenamepatterns, lineExHandler, fileExHandler) ->
            Seq.filter (fun filename ->
                filenamepatterns |> Seq.exists (fun pattern ->
                    let regex = new Regex(pattern)
                    regex.IsMatch(filename)))
            >> Seq.map (fun filename ->
                let fileinfo = new FileInfo(filename)
                try
                    use reader = fileinfo.OpenText()
                    Seq.unfold (fun ((reader : StreamReader), index) ->
                        if not reader.EndOfStream then
                            try
                                let line = reader.ReadLine()
                                Some((line, index), (reader, index + 1))
                            with | e ->
                                lineExHandler e filename index
                                None
                        else
                            None) (reader, 0)
                    |> (fun lines -> (filename, lines))
                with | e ->
                    fileExHandler e filename
                    (filename, Seq.empty))
            >> (fun files ->
                returnM files
                |> apply (fun contentpatterns ->
                    Seq.collect (fun file ->
                        let filename, lines = file
                        lines |>
                        Seq.collect (fun line ->
                            let content, index = line
                            contentpatterns
                            |> Seq.collect (fun pattern ->
                                let regex = new Regex(pattern)
                                regex.Matches(content)
                                |> (Seq.cast<Match>
                                    >> Seq.map (fun contentmatch ->
                                        (filename,
                                         index,
                                         contentmatch.Index,
                                         pattern,
                                         contentmatch.Value))))))))))
Thanks for any input.
Updated: here is a revised solution based on the feedback I received:
open System.Text.RegularExpressions
open System.IO
type ScannerConfiguration = {
    FileNamePatterns : seq<string>
    ContentPatterns : seq<string>
    FileExceptionHandler : exn -> string -> unit
    LineExceptionHandler : exn -> string -> int -> unit
    DirectoryExceptionHandler : exn -> string -> unit }
let scanner specifiedDirectories (configuration : ScannerConfiguration) = seq {
    let ToCachedRegexList = Seq.map (fun pattern -> new Regex(pattern)) >> Seq.cache
    let contentRegexes = configuration.ContentPatterns |> ToCachedRegexList
    let filenameRegexes = configuration.FileNamePatterns |> ToCachedRegexList
    let getLines exHandler reader =
        Seq.unfold (fun ((reader : StreamReader), index) ->
            if not reader.EndOfStream then
                try
                    let line = reader.ReadLine()
                    Some((line, index), (reader, index + 1))
                with | e -> exHandler e index; None
            else
                None) (reader, 0)
    for specifiedDirectory in specifiedDirectories do
        let files =
            try Directory.GetFiles(specifiedDirectory, "*", SearchOption.AllDirectories)
            with e -> configuration.DirectoryExceptionHandler e specifiedDirectory; [||]
        for file in files do
            if filenameRegexes |> Seq.exists (fun (regex : Regex) -> regex.IsMatch(file)) then
                let lines =
                    let fileinfo = new FileInfo(file)
                    try
                        use reader = fileinfo.OpenText()
                        reader |> getLines (fun e index -> configuration.LineExceptionHandler e file index)
                    with | e -> configuration.FileExceptionHandler e file; Seq.empty
                for line in lines do
                    let content, index = line
                    for contentregex in contentRegexes do
                        for mmatch in content |> contentregex.Matches do
                            yield (file, index, mmatch.Index, contentregex.ToString(), mmatch.Value) }
Again, any input is welcome.
I think that the best approach is to start with the simplest solution and then extend it. Your current approach seems to be quite hard to read to me for two reasons:
The code uses a lot of combinators and function compositions in patterns that are not too common in F#. Some of the processing can be more easily written using sequence expressions.
The code is all written as a single function, but it is fairly complex and would be more readable if it was separated into multiple functions.
I would probably start by splitting the code in a function that tests a single file (say fileMatches) and a function that walks over the files and calls fileMatches. The main iteration can be quite nicely written using F# sequence expressions:
// Checks whether a file name matches a filename pattern
// and a content matches a content pattern.
let fileMatches fileNamePatterns contentPatterns
                (fileExHandler, lineExHandler) file =
    // TODO: This can be implemented using
    // File.ReadLines which returns a sequence.
    failwith "not implemented" // placeholder so that the stub compiles
// Iterates over all the files and calls 'fileMatches'.
let scanner specifiedDirectories fileNamePatterns contentPatterns
            (dirExHandler, fileExHandler, lineExHandler) = seq {
    // Iterate over all the specified directories.
    for specifiedDir in specifiedDirectories do
        // Find all files in the directories (and handle exceptions).
        let files =
            try Directory.GetFiles(specifiedDir, "*", SearchOption.AllDirectories)
            with e -> dirExHandler e specifiedDir; [||]
        // Iterate over all files and report those that match.
        for file in files do
            if fileMatches fileNamePatterns contentPatterns
                           (fileExHandler, lineExHandler) file then
                // Matches! Return this file as part of the result.
                yield file }
The function is still quite complicated, because you need to pass a lot of parameters around. Wrapping the parameters in a simple type or a record could be a good idea:
type ScannerArguments =
    { FileNamePatterns:string
      ContentPatterns:string
      FileExceptionHandler:exn -> string -> unit
      LineExceptionHandler:exn -> string -> unit
      DirectoryExceptionHandler:exn -> string -> unit }
Then you can define both fileMatches and scanner as functions that take just two parameters, which will make your code a lot more readable. Something like:
// Iterates over all the files and calls 'fileMatches'.
let scanner specifiedDirectories (args:ScannerArguments) = seq {
    for specifiedDir in specifiedDirectories do
        let files =
            try Directory.GetFiles(specifiedDir, "*", SearchOption.AllDirectories)
            with e -> args.DirectoryExceptionHandler e specifiedDir; [||]
        for file in files do
            // No need to propagate all arguments explicitly to other functions.
            if fileMatches args file then yield file }
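The fileMatches function is left as a TODO above. One possible shape for it, based on File.ReadLines, is sketched below. The SketchArguments record is a hypothetical, simplified variant of the argument record (lists of patterns and a single exception handler), and the temporary file exists only to exercise the function; none of this is taken from the original answer.

```fsharp
open System.IO
open System.Text.RegularExpressions

// Hypothetical, simplified argument record: pattern lists plus one handler.
type SketchArguments =
    { FileNamePatterns : string list
      ContentPatterns : string list
      FileExceptionHandler : exn -> string -> unit }

// True when the file name matches some name pattern and at least one
// line matches some content pattern. Lines are read lazily.
let fileMatches (args: SketchArguments) (file: string) =
    let matchesAny patterns (text: string) =
        patterns |> List.exists (fun (p: string) -> Regex.IsMatch(text, p))
    if not (matchesAny args.FileNamePatterns file) then
        false
    else
        try
            // Seq.exists enumerates inside the try block, so read errors
            // are caught here rather than escaping later.
            File.ReadLines file
            |> Seq.exists (matchesAny args.ContentPatterns)
        with e ->
            args.FileExceptionHandler e file
            false

// Usage: record failing files in a list instead of throwing.
let failures = ResizeArray<string>()
let sketchArgs =
    { FileNamePatterns = [ @"\.txt$" ]
      ContentPatterns = [ @"\d+" ]
      FileExceptionHandler = fun _ file -> failures.Add file }

let path = Path.Combine(Path.GetTempPath(), "scanner-sketch.txt")
File.WriteAllLines(path, [| "alpha"; "beta 42" |])
let found = fileMatches sketchArgs path
let missing = fileMatches sketchArgs (path + ".missing.txt")
```

Because Seq.exists stops at the first matching line, a large file is read only as far as needed. Returning the individual matches instead of a Boolean would need a different design, since a lazily returned sequence would be enumerated outside the try block.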

Resources