How to write a functional file "scanner" - f#

First let me apologize for the scale of this problem but I'm really trying to think functionally and this is one of the more challenging problems I have had to work with.
I wanted to get some suggestions on how I might handle a problem in a functional manner, particularly in F#. I am writing a program that goes through a list of directories, uses a list of regex patterns to filter the files retrieved from those directories, and uses a second list of regex patterns to find matches in the text of the retrieved files. I want it to return the filename, line index, column index, pattern and matched value for each piece of text that matches a given regex pattern. Also, exceptions need to be recorded, and there are three possible exception scenarios: can't open the directory, can't open the file, and reading content from the file failed. The final requirement is that the volume of files "scanned" for matches could be very large, so the whole thing needs to be lazy. I'm not too worried about a "pure" functional solution as much as I'm interested in a "good" solution that reads well and performs well. One final challenge is to make it interop with C#, because I would like to use the WinForms tools to attach this algorithm to a UI. Here is my first attempt, and hopefully this will clarify the problem:
open System.Text.RegularExpressions
open System.IO

type Reader<'t, 'a> = 't -> 'a //=M['a], result varies

let returnM x _ = x
let map f m = fun t -> t |> m |> f
let apply f m = fun t -> t |> m |> (t |> f)
let bind f m = fun t -> t |> (t |> m |> f)

let Scanner dirs =
    returnM dirs
    |> apply (fun dirExHandler ->
        Seq.collect (fun directory ->
            try
                Directory.GetFiles(directory, "*", SearchOption.AllDirectories)
            with | e ->
                dirExHandler e directory
                Array.empty))
    |> map (fun filenames ->
        returnM filenames
        |> apply (fun (filenamepatterns, lineExHandler, fileExHandler) ->
            Seq.filter (fun filename ->
                filenamepatterns |> Seq.exists (fun pattern ->
                    let regex = new Regex(pattern)
                    regex.IsMatch(filename)))
            >> Seq.map (fun filename ->
                let fileinfo = new FileInfo(filename)
                try
                    use reader = fileinfo.OpenText()
                    Seq.unfold (fun ((reader : StreamReader), index) ->
                        if not reader.EndOfStream then
                            try
                                let line = reader.ReadLine()
                                Some((line, index), (reader, index + 1))
                            with | e ->
                                lineExHandler e filename index
                                None
                        else
                            None) (reader, 0)
                    |> (fun lines -> (filename, lines))
                with | e ->
                    fileExHandler e filename
                    (filename, Seq.empty))
            >> (fun files ->
                returnM files
                |> apply (fun contentpatterns ->
                    Seq.collect (fun file ->
                        let filename, lines = file
                        lines |>
                            Seq.collect (fun line ->
                                let content, index = line
                                contentpatterns
                                |> Seq.collect (fun pattern ->
                                    let regex = new Regex(pattern)
                                    regex.Matches(content)
                                    |> (Seq.cast<Match>
                                        >> Seq.map (fun contentmatch ->
                                            (filename,
                                             index,
                                             contentmatch.Index,
                                             pattern,
                                             contentmatch.Value))))))))))
Thanks for any input.
Updated: here is an updated solution based on feedback I received:
open System.Text.RegularExpressions
open System.IO

type ScannerConfiguration = {
    FileNamePatterns : seq<string>
    ContentPatterns : seq<string>
    FileExceptionHandler : exn -> string -> unit
    LineExceptionHandler : exn -> string -> int -> unit
    DirectoryExceptionHandler : exn -> string -> unit }

let scanner specifiedDirectories (configuration : ScannerConfiguration) = seq {
    let ToCachedRegexList = Seq.map (fun pattern -> new Regex(pattern)) >> Seq.cache
    let contentRegexes = configuration.ContentPatterns |> ToCachedRegexList
    let filenameRegexes = configuration.FileNamePatterns |> ToCachedRegexList
    let getLines exHandler reader =
        Seq.unfold (fun ((reader : StreamReader), index) ->
            if not reader.EndOfStream then
                try
                    let line = reader.ReadLine()
                    Some((line, index), (reader, index + 1))
                with | e -> exHandler e index; None
            else
                None) (reader, 0)
    for specifiedDirectory in specifiedDirectories do
        let files =
            try Directory.GetFiles(specifiedDirectory, "*", SearchOption.AllDirectories)
            with e -> configuration.DirectoryExceptionHandler e specifiedDirectory; [||]
        for file in files do
            if filenameRegexes |> Seq.exists (fun (regex : Regex) -> regex.IsMatch(file)) then
                let lines =
                    let fileinfo = new FileInfo(file)
                    try
                        use reader = fileinfo.OpenText()
                        reader |> getLines (fun e index -> configuration.LineExceptionHandler e file index)
                    with | e -> configuration.FileExceptionHandler e file; Seq.empty
                for line in lines do
                    let content, index = line
                    for contentregex in contentRegexes do
                        for mmatch in content |> contentregex.Matches do
                            yield (file, index, mmatch.Index, contentregex.ToString(), mmatch.Value) }
Again, any input is welcome.

I think that the best approach is to start with the simplest solution and then extend it. Your current approach seems to be quite hard to read to me for two reasons:
The code uses a lot of combinators and function compositions in patterns that are not too common in F#. Some of the processing can be more easily written using sequence expressions.
The code is all written as a single function, but it is fairly complex and would be more readable if it was separated into multiple functions.
I would probably start by splitting the code in a function that tests a single file (say fileMatches) and a function that walks over the files and calls fileMatches. The main iteration can be quite nicely written using F# sequence expressions:
// Checks whether a file name matches a filename pattern
// and a content matches a content pattern.
let fileMatches fileNamePatterns contentPatterns
                (fileExHandler, lineExHandler) file =
    // TODO: This can be implemented using
    // File.ReadLines which returns a sequence.
    // (Placeholder body so that the sketch compiles.)
    failwith "fileMatches is not implemented yet"

// Iterates over all the files and calls 'fileMatches'.
let scanner specifiedDirectories fileNamePatterns contentPatterns
            (dirExHandler, fileExHandler, lineExHandler) = seq {
    // Iterate over all the specified directories.
    for specifiedDir in specifiedDirectories do
        // Find all files in the directories (and handle exceptions).
        let files =
            try Directory.GetFiles(specifiedDir, "*", SearchOption.AllDirectories)
            with e -> dirExHandler e specifiedDir; [||]
        // Iterate over all files and report those that match.
        for file in files do
            if fileMatches fileNamePatterns contentPatterns
                           (fileExHandler, lineExHandler) file then
                // Matches! Return this file as part of the result.
                yield file }
The function is still quite complicated, because you need to pass a lot of parameters around. Wrapping the parameters in a simple type or a record could be a good idea:
type ScannerArguments =
    { FileNamePatterns:string
      ContentPatterns:string
      FileExceptionHandler:exn -> string -> unit
      LineExceptionHandler:exn -> string -> unit
      DirectoryExceptionHandler:exn -> string -> unit }
Then you can define both fileMatches and scanner as functions that take just two parameters, which will make your code a lot more readable. Something like:
// Iterates over all the files and calls 'fileMatches'.
let scanner specifiedDirectories (args:ScannerArguments) = seq {
    for specifiedDir in specifiedDirectories do
        let files =
            try Directory.GetFiles(specifiedDir, "*", SearchOption.AllDirectories)
            with e -> args.DirectoryExceptionHandler e specifiedDir; [||]
        for file in files do
            // No need to propagate all arguments explicitly to other functions.
            if fileMatches args file then yield file }
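One possible way to fill in fileMatches is sketched below. It is only an illustration under some assumptions: ScannerArgs is a hypothetical variant of the record above whose pattern fields are sequences of regex strings, and the line-level and directory-level handlers are left out for brevity. The file is streamed with File.ReadLines, so reading stops at the first matching line:

```fsharp
open System.IO
open System.Text.RegularExpressions

// Hypothetical record for this sketch (not the record from the answer):
// both pattern fields hold sequences of regex strings.
type ScannerArgs =
    { FileNamePatterns : seq<string>
      ContentPatterns : seq<string>
      FileExceptionHandler : exn -> string -> unit }

// True when the file name matches any file name pattern
// and at least one line matches any content pattern.
let fileMatches (args : ScannerArgs) (file : string) =
    let matchesAny (patterns : seq<string>) (text : string) =
        patterns |> Seq.exists (fun pattern -> Regex.IsMatch(text, pattern))
    if not (matchesAny args.FileNamePatterns file) then
        false
    else
        try
            // File.ReadLines is lazy; Seq.exists stops at the first matching line.
            File.ReadLines file |> Seq.exists (matchesAny args.ContentPatterns)
        with e ->
            // Opening or reading the file failed: report it and treat as no match.
            args.FileExceptionHandler e file
            false
```

Because the enumeration happens inside the try block, failures while opening the file and failures while reading it both end up in the same handler. Keeping them apart, as the question asks, would need a slightly more elaborate loop.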

Related

How to efficiently create a list in reversed order in F#

Is there any way to construct a list in reverse order without having to reverse it?
Here is an example where I read all lines from stdin:
#!/usr/bin/env dotnet fsi
open System

let rec readLines1 () =
    let rec helper acc =
        match Console.ReadLine() with
        | null -> acc
        | line ->
            helper (line :: acc)
    helper [] |> List.rev

readLines1 () |> List.iter (printfn "%s")
Before returning from readLines1 I have to List.rev the result so that it is in the right order. Since the result is a singly linked list, it has to read all the way through it and create the reversed version. Is there any way of creating the list in the right order?
You can use a sequence instead of accumulating the lines in a list:
open System

let readLines1 () =
    let rec helper () =
        seq {
            match Console.ReadLine() with
            | null -> ()
            | line ->
                yield line
                yield! helper ()
        }
    helper () |> Seq.toList

readLines1 () |> List.iter (printfn "%s")
You cannot create a list in reverse order, because that would require mutation. If you read inputs one by one and want to turn them into a list immediately, the only thing you can do is create a new list that links to the previous one.
In practice, reversing the list is perfectly fine and that's probably the best way of solving this.
Out of curiosity, you could try defining a mutable list that has the same structure as the immutable F# list:
open System

type MutableList<'T> =
    { mutable List : MutableListBody<'T> }
and MutableListBody<'T> =
    | Empty
    | Cons of 'T * MutableList<'T>
Now you can implement your function by mutating the list:
let rec readLines () =
    let res = { List = Empty }
    let rec helper acc =
        match Console.ReadLine() with
        | null -> res
        | line ->
            let next = { List = Empty }
            acc.List <- Cons(line, next)
            helper next
    helper res
This may be educational, but it's not very useful and, if you really wanted mutation in F#, you should probably use ResizeArray.
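As a rough sketch of the ResizeArray route: the helper below appends each line to a mutable buffer, so the lines end up in input order without a reversal step. The name readLinesInto and the TextReader parameter are assumptions made for this example (passing Console.In would read from stdin):

```fsharp
open System
open System.IO

// Reads every line from the reader into a ResizeArray, preserving input order.
let readLinesInto (reader : TextReader) =
    let lines = ResizeArray<string>()
    let mutable line = reader.ReadLine()
    // ReadLine returns null once the input is exhausted.
    while not (isNull line) do
        lines.Add line
        line <- reader.ReadLine()
    lines

// Example: read stdin, then convert to an immutable list if one is needed.
// let allLines = readLinesInto Console.In |> List.ofSeq
```

Converting to an F# list at the end still copies the data once, so this mainly pays off when the ResizeArray itself is an acceptable result.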
Yet another trick is to work with functions that take the tail of the list:
let rec readLines () =
    let rec helper acc =
        match Console.ReadLine() with
        | null -> acc []
        | line -> helper (fun tail -> acc (line :: tail))
    helper id
In the line case, this returns a function that takes a tail, adds line before the tail, and then calls whatever function was constructed before to add more things to the front.
This actually creates the list in the right order, but it's probably less efficient than creating a list and reversing it. It may look nice, but you are allocating a new function for each iteration, which is not better than allocating an extra copy of the list. (But it is a nice trick, nevertheless!)
Alternative solution without implementing recursive functions
let lines =
    Seq.initInfinite (fun _ -> Console.ReadLine())
    |> Seq.takeWhile (not << isNull)
    |> Seq.toList

Reading text file, iterating over lines to find a match, and return the value with FSharp

I have a text file that contains the following and I need to retrieve the value assigned to taskId, which in this case is AWc34YBAp0N7ZCmVka2u.
projectKey=ProjectName
serverUrl=http://localhost:9090
serverVersion=10.5.32.3
interfaceUrl=http://localhost:9090/interface?id=ProjectName
taskId=AWc34YBAp0N7ZCmVka2u
taskUrl=http://localhost:9090/api/ce/task?id=AWc34YBAp0N7ZCmVka2u
I have written two different ways of reading the file.
let readLines (filePath:string) = seq {
    use sr = new StreamReader (filePath)
    while not sr.EndOfStream do
        yield sr.ReadLine ()
}

readLines (FindFile currentDirectory "../**/sample.txt")
|> Seq.iter (fun line ->
    printfn "%s" line
)
and
let readLines (filePath:string) =
    (File.ReadAllLines filePath)

readLines (FindFile currentDirectory "../**/sample.txt")
|> Seq.iter (fun line ->
    printfn "%s" line
)
At this point, I don't know how to approach getting the value I need. Options that, I think, are on the table are:
use Contains()
Regex
Record type
Active Pattern
How can I get this value returned and fail if it doesn't exist?
I think all the options would be reasonable - it depends on how complex the file will actually be. If there is no escaping then you can probably just look for = in the line and use that to split the line into a key value pair. If the syntax is more complex, this might not always work though.
My preferred method would be to use Split on string - you can then filter to find values with your required key, map to get the value and use Seq.head to get the value:
["foo=bar"]
|> Seq.map (fun line -> line.Split('='))
|> Seq.filter (fun kvp -> kvp.[0] = "foo")
|> Seq.map (fun kvp -> kvp.[1])
|> Seq.head
Using active patterns, you could define a pattern that takes a string and splits it using = into a list:
let (|Split|) (s:string) = s.Split('=') |> List.ofSeq
This then lets you get the value using Seq.pick with a pattern matching that looks for strings where the substring before = is e.g. foo:
["foo=bar"] |> Seq.pick (function
    | Split ["foo"; value] -> Some value
    | _ -> None)
The active pattern trick is quite neat, but it might be unnecessarily complicating the code if you only need this in one place.
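To cover the "fail if it doesn't exist" part of the question, here is a minimal sketch that returns an option first and only then decides how to fail. The names tryGetValue and getValue are made up for this example. It splits on the first = only, because values such as taskUrl in the sample file contain = themselves and a plain Split('=') would cut them short:

```fsharp
// Returns Some value for the first "key=value" line with the given key, else None.
let tryGetValue (key : string) (lines : seq<string>) =
    lines
    |> Seq.tryPick (fun line ->
        // Split on the first '=' only, so the value may contain '=' itself.
        match line.IndexOf('=') with
        | -1 -> None
        | i when line.Substring(0, i) = key -> Some (line.Substring(i + 1))
        | _ -> None)

// Fails with a descriptive error when the key is missing.
let getValue (key : string) (lines : seq<string>) =
    match tryGetValue key lines with
    | Some value -> value
    | None -> failwithf "Key '%s' not found" key
```

With either readLines version from the question, this could be called as getValue "taskId" (readLines path), where path stands for whatever FindFile returns.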

Can I call a function by name in f#?

Is there any way to call a function by name in F#? Given a string, I want to pluck a function value from the global namespace (or, in general, a given module), and call it. I know the type of the function already.
Why would I want to do this? I'm trying to work around fsi not having an --eval option. I have a script file that defines many int->() functions, and I want to execute one of them. Like so:
fsianycpu --use:script_with_many_funcs.fsx --eval "analyzeDataSet 1"
My thought was to write a trampoline script, like:
fsianycpu --use:script_with_many_funcs.fsx trampoline.fsx analyzeDataSet 1
In order to write "trampoline.fsx", I'd need to look up the function by name.
There is no built-in function for this, but you can implement it using .NET reflection. The idea is to search through all types available in the current assembly (this is where the current code is compiled) and dynamically invoke the method with the matching name. If you had this in a module, you'd have to check the type name too.
// Some sample functions that we might want to call
let hello() =
    printfn "Hello world"
let bye() =
    printfn "Bye"

// Loader script that calls function by name
open System
open System.Reflection

let callFunction name =
    let asm = Assembly.GetExecutingAssembly()
    for t in asm.GetTypes() do
        for m in t.GetMethods() do
            if m.IsStatic && m.Name = name then
                m.Invoke(null, [||]) |> ignore

// Use the first command line argument (after -- in the fsi call below)
callFunction fsi.CommandLineArgs.[1]
This runs hello world when called by:
fsi --use:C:\temp\test.fsx --exec -- "hello"
You can use reflection to get the functions as MethodInfo values, keyed by their F# function name:
open System
open System.Reflection

let rec fsharpName (mi:MemberInfo) =
    if mi.DeclaringType.IsNestedPublic then
        sprintf "%s.%s" (fsharpName mi.DeclaringType) mi.Name
    else
        mi.Name

let functionsByName =
    Assembly.GetExecutingAssembly().GetTypes()
    |> Seq.filter (fun t -> t.IsPublic || t.IsNestedPublic)
    |> Seq.collect (fun t -> t.GetMethods(BindingFlags.Static ||| BindingFlags.Public))
    |> Seq.filter (fun m -> not m.IsSpecialName)
    |> Seq.groupBy (fun m -> fsharpName m)
    |> Map.ofSeq
    |> Map.map (fun k v -> Seq.exactlyOne v)
You can then invoke the MethodInfo
functionsByName.[fsharpFunctionNameString].Invoke(null, objectArrayOfArguments)
But you probably need to do more work to parse your string arguments using the MethodInfo.GetParameters() types as a hint.
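A minimal sketch of that argument-parsing step, assuming every parameter has a simple type that Convert.ChangeType can produce from a string (int, float, bool, string and similar). The helper name convertArgs is made up for this example:

```fsharp
open System
open System.Globalization
open System.Reflection

// Converts raw string arguments to the parameter types declared by the method.
let convertArgs (mi : MethodInfo) (rawArgs : string[]) : obj[] =
    let parameters = mi.GetParameters()
    if parameters.Length <> rawArgs.Length then
        invalidArg "rawArgs"
            (sprintf "Expected %d argument(s), got %d" parameters.Length rawArgs.Length)
    // Pair each parameter with its raw string and convert using the invariant culture.
    Array.map2
        (fun (p : ParameterInfo) (raw : string) ->
            Convert.ChangeType(raw, p.ParameterType, CultureInfo.InvariantCulture))
        parameters
        rawArgs
```

The result can be passed straight to Invoke, for example mi.Invoke(null, convertArgs mi args). Parameters with F#-specific types such as options, tuples or records would need their own parsing.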
You could also use FSharp.Compiler.Service to make your own fsi.exe with an eval flag
open System
open Microsoft.FSharp.Compiler.Interactive.Shell
open System.Text.RegularExpressions

[<EntryPoint>]
let main(argv) =
    let argAll = Array.append [| "C:\\fsi.exe" |] argv
    let argFix = argAll |> Array.map (fun a -> if a.StartsWith("--eval:") then "--noninteractive" else a)
    let optFind = argv |> Seq.tryFind (fun a -> a.StartsWith "--eval:")
    let evalData = if optFind.IsSome then
                       optFind.Value.Replace("--eval:",String.Empty)
                   else
                       String.Empty
    let fsiConfig = FsiEvaluationSession.GetDefaultConfiguration()
    let fsiSession = FsiEvaluationSession(fsiConfig, argFix, Console.In, Console.Out, Console.Error)
    if String.IsNullOrWhiteSpace(evalData) then
        fsiSession.Run()
    else
        fsiSession.EvalInteraction(evalData)
    0
If the above was compiled into fsieval.exe it could be used as so
fsieval.exe --load:script_with_many_funcs.fsx --eval:analyzeDataSet` 1

How to Get the F# Name of a Module, Function, etc. From Quoted Expression Match

I continue to work on a printer for F# quoted expressions, it doesn't have to be perfect, but I'd like to see what is possible. The active patterns in Microsoft.FSharp.Quotations.Patterns and Microsoft.FSharp.Quotations.DerivedPatterns used for decomposing quoted expressions will typically provide MemberInfo instances when appropriate, these can be used to obtain the name of a property, function, etc. and their "declaring" type, such as a module or static class. The problem is, I only know how to obtain the CompiledName from these instances but I'd like the F# name. For example,
> <@ List.mapi (fun i j -> i+j) [1;2;3] @> |> (function Call(_,mi,_) -> mi.DeclaringType.Name, mi.Name);;
val it : string * string = ("ListModule", "MapIndexed")
How can this match be rewritten to return ("List", "mapi")? Is it possible?
FYI, here is my final polished solution from Stringer Bell and pblasucci's help:
let moduleSourceName (declaringType:Type) =
    FSharpEntity.FromType(declaringType).DisplayName

let methodSourceName (mi:MemberInfo) =
    mi.GetCustomAttributes(true)
    |> Array.tryPick
        (function
            | :? CompilationSourceNameAttribute as csna -> Some(csna)
            | _ -> None)
    |> (function | Some(csna) -> csna.SourceName | None -> mi.Name)

//usage:
let sourceNames =
    <@ List.mapi (fun i j -> i+j) [1;2;3] @>
    |> (function Call(_,mi,_) -> mi.DeclaringType |> moduleSourceName, mi |> methodSourceName);
You can use F# powerpack for that purpose:
open Microsoft.FSharp.Metadata
...
| Call(_, mi, _) ->
    let ty = Microsoft.FSharp.Metadata.FSharpEntity.FromType(mi.DeclaringType)
    let name = ty.DisplayName // name is List
However, I don't think it's possible to retrieve the function name with the PowerPack.
Edit:
As hinted by pblasucci, you can use CompilationSourceName attribute for retrieving source name:
let infos = mi.DeclaringType.GetMember(mi.Name)
let att = infos.[0].GetCustomAttributes(true)
let fName =
    (att.[1] :?> CompilationSourceNameAttribute).SourceName // fName is mapi

Help with F#: "Collection was modified"

I am very new to F# here, I encounter the "Collection was modified" problem in F#. I know this problem is common when we are iterating through a Collection while modifying (adding/removing) it at the same time. And previous threads in stackoverflow also point to this.
But in my case, I am working on 2 different sets:
I have 2 collections:
originalCollection the original collection from which I want to remove stuff
colToRemove a collection containing the objects that I want to remove
Below is the code:
Seq.iter ( fun input -> ignore <| originalCollection.Remove(input)) colToRemove
And I got the following runtime error:
+ $exception {System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
   at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
   at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
   at System.Collections.Generic.List`1.Enumerator.MoveNext()
   at Microsoft.FSharp.Collections.IEnumerator.next@174[T](FSharpFunc`2 f, IEnumerator`1 e, FSharpRef`1 started, Unit unitVar0)
   at Microsoft.FSharp.Collections.IEnumerator.filter@169.System-Collections-IEnumerator-MoveNext()
   at Microsoft.FSharp.Collections.SeqModule.Iterate[T](FSharpFunc`2 action, IEnumerable`1 source)
here is the chunk of code:
match newCollection with
| Some(newCollection) ->
    // compare newCollection to originalCollection.
    // If there are things that exist in the originalCollection that are not in the newCollection, we want to remove them
    let colToRemove = Seq.filter (fun input -> Seq.exists (fun i -> i.id = input.id) newCollection) originalCollection
    Seq.iter ( fun input -> ignore <| originalCollection.Remove(input)) colToRemove
| None -> ()
Thanks!
Note: Working on a single-threaded environment here, so there are no multi-threading issues that might result in this exception.
The problem here is that colToRemove is not an independent collection but is a projection of the collection originalCollection. So changing originalCollection changes the projection which is not allowed during the iteration. The C# equivalent of the above code is the following
var colToRemove = originalCollection
    .Where(input => newCollection.Any(i => i.id == input.id));
foreach (var input in colToRemove) {
    originalCollection.Remove(input);
}
You can fix this by making colToRemove an independent collection via the List.ofSeq method.
let colToRemove =
    originalCollection
    |> Seq.filter (fun input -> Seq.exists (fun i -> i.id = input.id) newCollection)
    |> List.ofSeq
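If originalCollection is a ResizeArray (System.Collections.Generic.List, as the stack trace suggests), another option is to let the list do the removal itself with RemoveAll, which avoids enumerating and mutating at the same time. The sketch below is only an illustration: the Item type and the removeMissing name are hypothetical, and it follows the intent stated in the question's comment (remove what is no longer in the new collection):

```fsharp
open System.Collections.Generic

// Hypothetical item type for this sketch; only the id field matters here.
type Item = { id : int; name : string }

// Removes from 'original' every item whose id does not occur in 'updated'.
// Returns the number of removed items.
let removeMissing (original : ResizeArray<Item>) (updated : seq<Item>) =
    // Collect the ids to keep up front, so 'original' is never enumerated lazily.
    let keepIds = HashSet<int>(updated |> Seq.map (fun item -> item.id))
    original.RemoveAll(fun item -> not (keepIds.Contains(item.id)))
```

Building the HashSet first also makes each membership check constant time instead of a nested Seq.exists scan.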
I would not try to do a remove, since you are modifying a collection, but instead try to create another collection like so:
let foo () =
    let orig = [1;2;3;4]
    let torem = [1;2]
    let find e =
        List.tryFind (fun i-> i = e) torem
        |> function
            | Some _-> true
            | None -> false
    List.partition (fun e -> find e) orig
    //or
    List.filter (fun e-> find e) orig
hth
