Parse log files with F#

I'm trying to parse data from IIS log files.
Each row has a date that I need like this:
u_ex15090503.log:3040:2015-09-05 03:57:45
And a name and email address I need in here:
&actor=%7B%22name%22%3A%5B%22James%2C%20Smith%22%5D%2C%22mbox%22%3A%5B%22mailto%3AJames.Smith%40student.colled.edu%22%5D%7D&
I start off by getting the correct column like this. This part works fine.
//get the correct column
let getCol =
    let line = fileReader inputFile
    line
    |> Seq.filter (fun line -> not (line.StartsWith("#")))
    |> Seq.map (fun line -> line.Split())
    |> Seq.map (fun line -> line.[7],1)
    |> Seq.toArray
getCol
Now I need to parse the above and get the date, name, and email, but I'm having a hard time figuring out how to do that.
So far I have this, which gives me two errors (below):
//split the above column at every "&"
let getDataInCol =
    let line = getCol
    line
    |> Seq.map (fun line -> line.Split('&'))
    |> Seq.map (fun line -> line.[5], 1)
    |> Seq.toArray
getDataInCol
The errors:
Seq.map (fun line -> line.Split('&'))
the field constructor 'Split' is not defined
Seq.map (fun line -> line.[5], 1)
the operator 'expr.[idx]' has been used on an object of indeterminate type based on information prior to this program point.
Maybe I'm going about this all wrong. I'm very new to F#, so I apologize for the sloppy code.

Something like this would get the name and email. You'll still need to parse the date.
#r "Newtonsoft.Json.dll"
open System
open System.Text.RegularExpressions
open Newtonsoft.Json.Linq
// Partial active pattern: on a successful match, yields the captured groups
// (the whole-match group at index 0 is dropped by List.tail).
let (|Regex|_|) pattern input =
    let m = Regex.Match(input, pattern)
    if m.Success then Some(List.tail [ for g in m.Groups -> g.Value ])
    else None
type ActorDetails =
    {
        Date: DateTime
        Name: string
        Email: string
    }
let parseActorDetails queryString =
    match queryString with
    | Regex @"[\?|&]actor=([^&]+)" [json] ->
        // The actor parameter is URL-encoded JSON, so decode it before parsing.
        let jsonValue = JValue.Parse(Uri.UnescapeDataString(json))
        {
            Date = DateTime.UtcNow (* replace with parsed date *)
            Name = jsonValue.Value<JArray>("name").[0].Value<string>()
            // [7..] drops the leading "mailto:" prefix.
            Email = jsonValue.Value<JArray>("mbox").[0].Value<string>().[7..]
        }
    | _ -> invalidArg "queryString" "Invalid format"
parseActorDetails "&actor=%7B%22name%22%3A%5B%22James%2C%20Smith%22%5D%2C%22mbox%22%3A%5B%22mailto%3AJames.Smith%40student.colled.edu%22%5D%7D&"
val it : ActorDetails = {Date = 11/10/2015 9:14:25 PM;
                         Name = "James, Smith";
                         Email = "James.Smith@student.colled.edu";}
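For the date that still needs parsing, here is a minimal sketch. It assumes every row contains a timestamp in the `yyyy-MM-dd HH:mm:ss` shape shown in the question. `tryParseLogDate` is a hypothetical helper name, not something from the original code.

```fsharp
open System
open System.Globalization
open System.Text.RegularExpressions

// Hypothetical helper: finds the first "yyyy-MM-dd HH:mm:ss" timestamp in a row
// and parses it with the invariant culture. Returns None when no timestamp is found.
let tryParseLogDate (row: string) =
    let m = Regex.Match(row, @"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
    if m.Success then
        match DateTime.TryParseExact(m.Value, "yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture, DateTimeStyles.None) with
        | true, date -> Some date
        | false, _ -> None
    else
        None

// Sample row taken from the question.
let parsed = tryParseLogDate "u_ex15090503.log:3040:2015-09-05 03:57:45"
```

The resulting option could replace the `DateTime.UtcNow` placeholder once you decide what to do with rows that carry no timestamp.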

Related

Reading text file, iterating over lines to find a match, and return the value with FSharp

I have a text file that contains the following and I need to retrieve the value assigned to taskId, which in this case is AWc34YBAp0N7ZCmVka2u.
projectKey=ProjectName
serverUrl=http://localhost:9090
serverVersion=10.5.32.3
interfaceUrl=http://localhost:9090/interface?id=ProjectName
taskId=AWc34YBAp0N7ZCmVka2u
taskUrl=http://localhost:9090/api/ce/task?id=AWc34YBAp0N7ZCmVka2u
I have written two different ways of reading the file.
let readLines (filePath:string) = seq {
    use sr = new StreamReader (filePath)
    while not sr.EndOfStream do
        yield sr.ReadLine ()
}
readLines (FindFile currentDirectory "../**/sample.txt")
|> Seq.iter (fun line ->
    printfn "%s" line
)
and
let readLines (filePath:string) =
    (File.ReadAllLines filePath)
readLines (FindFile currentDirectory "../**/sample.txt")
|> Seq.iter (fun line ->
    printfn "%s" line
)
At this point, I don't know how to approach getting the value I need. Options that, I think, are on the table are:
use Contains()
Regex
Record type
Active Pattern
How can I get this value returned and fail if it doesn't exist?
I think all the options would be reasonable - it depends on how complex the file will actually be. If there is no escaping then you can probably just look for = in the line and use that to split the line into a key value pair. If the syntax is more complex, this might not always work though.
My preferred method would be to use Split on string - you can then filter to find values with your required key, map to get the value and use Seq.head to get the value:
["foo=bar"]
|> Seq.map (fun line -> line.Split('='))
|> Seq.filter (fun kvp -> kvp.[0] = "foo")
|> Seq.map (fun kvp -> kvp.[1])
|> Seq.head
Using active patterns, you could define a pattern that takes a string and splits it using = into a list:
let (|Split|) (s:string) = s.Split('=') |> List.ofSeq
This then lets you get the value using Seq.pick with a pattern matching that looks for strings where the substring before = is e.g. foo:
["foo=bar"] |> Seq.pick (function
    | Split ["foo"; value] -> Some value
    | _ -> None)
The active pattern trick is quite neat, but it might be unnecessarily complicating the code if you only need this in one place.
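To cover the "fail if it doesn't exist" part of the question, here is a minimal sketch. `tryFindValue` and `findValue` are hypothetical names. It splits at the first `=` only, because a line such as the `taskUrl` one contains a second `=` and a plain `Split('=')` would break it into three parts.

```fsharp
// Splits each line at the first '=' only, so values that themselves contain '='
// (such as the taskUrl line) stay intact.
let tryFindValue (key: string) (lines: seq<string>) =
    lines
    |> Seq.tryPick (fun line ->
        match line.Split([| '=' |], 2) with
        | [| k; v |] when k = key -> Some v
        | _ -> None)

// Fails with a descriptive message when the key is missing.
let findValue key lines =
    match tryFindValue key lines with
    | Some value -> value
    | None -> failwithf "Key '%s' not found" key

// A few lines from the sample file in the question.
let sample =
    [ "projectKey=ProjectName"
      "taskId=AWc34YBAp0N7ZCmVka2u"
      "taskUrl=http://localhost:9090/api/ce/task?id=AWc34YBAp0N7ZCmVka2u" ]

let taskId = findValue "taskId" sample
```

Either version of `readLines` from the question could supply the `lines` argument. `Seq.tryPick` stops reading at the first matching line.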

F# return empty string in case of null

I'm trying to get a feel for F# by developing a small "web crawler". I have a function declared like this:
let results = HtmlDocument.Load("http://joemonster.org//")
let images =
    results.Descendants ["img"]
    |> Seq.map (fun x ->
        x.TryGetAttribute("src").Value.Value(),
        x.TryGetAttribute("alt").Value.Value()
    )
which of course should return a sequence of "src" and "alt" attribute pairs for the "img" tags. But when one of those attributes is missing from a tag, I get an exception saying that TryGetAttribute returned null. I want to change the function to return the attribute value, or an empty string in case of null.
I've tried out answers from this ticket but with no success.
TryGetAttribute returns an option type, and when it is None you can't get its value—you get an exception instead. You can pattern match against the returned option value and return an empty string for the None case:
let getAttrOrEmptyStr (elem: HtmlNode) attr =
    match elem.TryGetAttribute(attr) with
    | Some v -> v.Value()
    | None -> ""
let images =
    results.Descendants ["img"]
    |> Seq.map (fun x -> getAttrOrEmptyStr x "src", getAttrOrEmptyStr x "alt")
Or a version using defaultArg and Option.map:
let getAttrOrEmptyStr (elem: HtmlNode) attr =
    defaultArg (elem.TryGetAttribute(attr) |> Option.map (fun a -> a.Value())) ""
Or another option, now that Option.defaultValue exists, using the HtmlAttribute.value function for a terser Option.map call:
let getAttrOrEmptyStr (elem: HtmlNode) attr =
    elem.TryGetAttribute(attr)
    |> Option.map HtmlAttribute.value
    |> Option.defaultValue ""

Convert String to Key Value Pair in F#

Given a string such as
one:1.0|two:2.0|three:3.0
how do we create a dictionary of the form string: float?
open System
open System.Collections.Generic
let ofSeq (src:seq<'a * 'b>) =
    // from fssnip
    let d = new Dictionary<'a, 'b>()
    for (k,v) in src do
        d.Add(k,v)
    d
let msg = "one:1.0|two:2.0|three:3.0"
let msgseq = msg.Split[|'|'|] |> Array.toSeq |> Seq.map (fun i -> i.Split(':'))
let d = ofSeq msgseq // The type ''a * 'b' does not match the type 'string []'
This operation would be inside a tight loop so efficiency would be a plus. Although I'd like to see a simple solution as well just to get my F# bearings.
Thanks.
How about something like this:
let msg = "one:1.0|two:2.0|three:3.0"
let splitKeyVal (str : string) =
    match str.Split(':') with
    |[|key; value|] -> (key, System.Double.Parse(value))
    |_ -> invalidArg "str" "str must have the format key:value"
let createDictionary (str : string) =
    str.Split('|')
    |> Array.map (splitKeyVal)
    |> dict
    |> System.Collections.Generic.Dictionary
You could drop the System.Collections.Generic.Dictionary if you don't mind an IDictionary return type.
If you expect the splitKeyVal function to fail then you'd be better off expressing it as a function that returns option, e.g.:
let splitKeyVal (str : string) =
    match str.Split(':') with
    |[|key; valueStr|] ->
        match System.Double.TryParse(valueStr) with
        |true, value -> Some (key, value)
        |false, _ -> None
    |_ -> None
But then you'd also have to decide how you wanted to handle failure in the createDictionary function.
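Two ways that decision could go, as a rough sketch: skip bad entries, or reject the whole input. The function names here are hypothetical, and parsing with the invariant culture is an assumption the question does not state.

```fsharp
open System
open System.Collections.Generic
open System.Globalization

// Option-returning parser for a single "key:value" entry.
let trySplitKeyVal (str: string) =
    match str.Split(':') with
    | [| key; valueStr |] ->
        match Double.TryParse(valueStr, NumberStyles.Float, CultureInfo.InvariantCulture) with
        | true, value -> Some (key, value)
        | false, _ -> None
    | _ -> None

// Choice 1: silently skip entries that do not parse.
let createDictionarySkippingInvalid (str: string) =
    let pairs = str.Split('|') |> Array.choose trySplitKeyVal
    Dictionary<string, float>(dict pairs)

// Choice 2: return None for the whole input if any entry is invalid.
let tryCreateDictionary (str: string) =
    let parsed = str.Split('|') |> Array.map trySplitKeyVal
    if parsed |> Array.forall Option.isSome then
        Some (Dictionary<string, float>(dict (parsed |> Array.choose id)))
    else
        None
```

Which one fits depends on whether a malformed entry is expected noise or a sign that the whole message is corrupt.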
Not sure about the perf side, but if you're sure of your input and can "afford" a warning, you can go with:
let d =
    msg.Split '|'
    |> Array.map (fun s -> let [|key; value|] (*warning here*) = s.Split ':' in key, value)
    |> dict
    |> System.Collections.Generic.Dictionary // optional if an IDictionary<string, string> suffices

Loop through a string array to match a pattern

I have a log file that I'm trying to parse with Regex.
I create an array of rows from the log file like this:
let loadLog =
    File.ReadAllLines "c:/access.log"
    |> Seq.filter (fun l -> not (l.StartsWith("#")))
    |> Seq.map (fun s -> s.Split())
    |> Seq.map (fun l -> l.[7],1)
    |> Seq.toArray
I then need to loop through this array. But I don't think this will work because line needs to be a string.
Is there a special way to handle something like this in F#?
type ActorDetails =
    {
        Date: DateTime
        Name: string
        Email: string
    }
for line in loadLog do
    let line queryString =
        match queryString with
        | Regex @"[\?|&]system=([^&]+)" [json] ->
            let jsonValue = JValue.Parse(Uri.UnescapeDataString(json))
            {
                Date = DateTime.UtcNow (* replace with parsed date *)
                Name = jsonValue.Value<JArray>("name").[0].Value<string>()
                Email = jsonValue.Value<JArray>("mbox").[0].Value<string>().[7..]
            }
Use a partial active pattern (|Regex|_|) to do that:
open System.Text.RegularExpressions
// Yields the full matched text when the pattern matches, otherwise None.
let (|Regex|_|) regexPattern input =
    let regex = new Regex(regexPattern)
    let regexMatch = regex.Match(input)
    if regexMatch.Success
    then Some regexMatch.Value
    else None
// 'function' already introduces the argument, so no extra parameter is needed.
let queryString = function
    | Regex @"[\?|&]system=([^&]+)" s -> s
    | input -> sprintf "other: %s" input
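To connect this to the loop in the question: each element of `loadLog` is a `(string * int)` tuple, so it has to be destructured before matching. Below is a minimal sketch that returns the captured group rather than the whole match. `trySystemValue` and the sample rows are hypothetical stand-ins. The character class is written `[?&]`, since a `|` inside brackets is a literal character and is not needed here.

```fsharp
open System
open System.Text.RegularExpressions

// Partial active pattern that yields the captured groups (not the whole match).
let (|Regex|_|) (pattern: string) (input: string) =
    let m = Regex.Match(input, pattern)
    if m.Success then Some [ for i in 1 .. m.Groups.Count - 1 -> m.Groups.[i].Value ]
    else None

// Returns the URL-decoded value of the "system" parameter, if present.
let trySystemValue (queryString: string) =
    match queryString with
    | Regex @"[?&]system=([^&]+)" [ value ] -> Some (Uri.UnescapeDataString value)
    | _ -> None

// Stand-in for loadLog from the question: an array of (column, 1) tuples.
let loadLog = [| "a=1&system=%7B%22name%22%3A%22x%22%7D&b=2", 1; "a=1&b=2", 1 |]

// Destructure each tuple and keep only the rows that carry a system parameter.
let systemValues =
    loadLog |> Array.choose (fun (query, _) -> trySystemValue query)
```

Using `Array.choose` instead of a `for` loop keeps the result as a value and drops non-matching rows without an explicit branch.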

How to write a functional file "scanner"

First let me apologize for the scale of this problem but I'm really trying to think functionally and this is one of the more challenging problems I have had to work with.
I wanted to get some suggestions on how I might handle a problem in a functional manner, particularly in F#. I am writing a program that goes through a list of directories, uses a list of regex patterns to filter the files retrieved from those directories, and uses a second list of regex patterns to find matches in the text of the retrieved files. I want it to return the filename, line index, column index, pattern and matched value for each piece of text that matches a given regex pattern.
Exceptions also need to be recorded, and there are three possible exception scenarios: can't open the directory, can't open the file, and reading content from the file failed. The final requirement is that the volume of files "scanned" for matches could be very large, so the whole thing needs to be lazy.
I'm not too worried about a "pure" functional solution; I'm more interested in a "good" solution that reads well and performs well. One final challenge is to make it interop with C#, because I would like to use the WinForms tools to attach this algorithm to a UI. Here is my first attempt, which will hopefully clarify the problem:
open System.Text.RegularExpressions
open System.IO
type Reader<'t, 'a> = 't -> 'a //=M['a], result varies
let returnM x _ = x
let map f m = fun t -> t |> m |> f
let apply f m = fun t -> t |> m |> (t |> f)
let bind f m = fun t -> t |> (t |> m |> f)
let Scanner dirs =
    returnM dirs
    |> apply (fun dirExHandler ->
        Seq.collect (fun directory ->
            try
                Directory.GetFiles(directory, "*", SearchOption.AllDirectories)
            with | e ->
                dirExHandler e directory
                Array.empty))
    |> map (fun filenames ->
        returnM filenames
        |> apply (fun (filenamepatterns, lineExHandler, fileExHandler) ->
            Seq.filter (fun filename ->
                filenamepatterns |> Seq.exists (fun pattern ->
                    let regex = new Regex(pattern)
                    regex.IsMatch(filename)))
            >> Seq.map (fun filename ->
                let fileinfo = new FileInfo(filename)
                try
                    use reader = fileinfo.OpenText()
                    Seq.unfold (fun ((reader : StreamReader), index) ->
                        if not reader.EndOfStream then
                            try
                                let line = reader.ReadLine()
                                Some((line, index), (reader, index + 1))
                            with | e ->
                                lineExHandler e filename index
                                None
                        else
                            None) (reader, 0)
                    |> (fun lines -> (filename, lines))
                with | e ->
                    fileExHandler e filename
                    (filename, Seq.empty))
            >> (fun files ->
                returnM files
                |> apply (fun contentpatterns ->
                    Seq.collect (fun file ->
                        let filename, lines = file
                        lines |>
                        Seq.collect (fun line ->
                            let content, index = line
                            contentpatterns
                            |> Seq.collect (fun pattern ->
                                let regex = new Regex(pattern)
                                regex.Matches(content)
                                |> (Seq.cast<Match>
                                    >> Seq.map (fun contentmatch ->
                                        (filename,
                                         index,
                                         contentmatch.Index,
                                         pattern,
                                         contentmatch.Value))))))))))
Thanks for any input.
Updated -- here is an updated solution based on feedback I received:
open System.Text.RegularExpressions
open System.IO
type ScannerConfiguration = {
    FileNamePatterns : seq<string>
    ContentPatterns : seq<string>
    FileExceptionHandler : exn -> string -> unit
    LineExceptionHandler : exn -> string -> int -> unit
    DirectoryExceptionHandler : exn -> string -> unit }
let scanner specifiedDirectories (configuration : ScannerConfiguration) = seq {
    let ToCachedRegexList = Seq.map (fun pattern -> new Regex(pattern)) >> Seq.cache
    let contentRegexes = configuration.ContentPatterns |> ToCachedRegexList
    let filenameRegexes = configuration.FileNamePatterns |> ToCachedRegexList
    let getLines exHandler reader =
        Seq.unfold (fun ((reader : StreamReader), index) ->
            if not reader.EndOfStream then
                try
                    let line = reader.ReadLine()
                    Some((line, index), (reader, index + 1))
                with | e -> exHandler e index; None
            else
                None) (reader, 0)
    for specifiedDirectory in specifiedDirectories do
        let files =
            try Directory.GetFiles(specifiedDirectory, "*", SearchOption.AllDirectories)
            with e -> configuration.DirectoryExceptionHandler e specifiedDirectory; [||]
        for file in files do
            if filenameRegexes |> Seq.exists (fun (regex : Regex) -> regex.IsMatch(file)) then
                let lines =
                    let fileinfo = new FileInfo(file)
                    try
                        use reader = fileinfo.OpenText()
                        reader |> getLines (fun e index -> configuration.LineExceptionHandler e file index)
                    with | e -> configuration.FileExceptionHandler e file; Seq.empty
                for line in lines do
                    let content, index = line
                    for contentregex in contentRegexes do
                        for mmatch in content |> contentregex.Matches do
                            yield (file, index, mmatch.Index, contentregex.ToString(), mmatch.Value) }
Again, any input is welcome.
I think that the best approach is to start with the simplest solution and then extend it. Your current approach seems quite hard to read to me, for two reasons:
The code uses a lot of combinators and function compositions in patterns that are not too common in F#. Some of the processing can be more easily written using sequence expressions.
The code is all written as a single function, but it is fairly complex and would be more readable if it was separated into multiple functions.
I would probably start by splitting the code into a function that tests a single file (say fileMatches) and a function that walks over the files and calls fileMatches. The main iteration can be written quite nicely using F# sequence expressions:
// Checks whether a file name matches a filename pattern
// and its content matches a content pattern.
let fileMatches fileNamePatterns contentPatterns
                (fileExHandler, lineExHandler) file =
    // TODO: This can be implemented using
    // File.ReadLines which returns a sequence.
    failwith "TODO: not implemented" // placeholder so the outline compiles
// Iterates over all the files and calls 'fileMatches'.
let scanner specifiedDirectories fileNamePatterns contentPatterns
            (dirExHandler, fileExHandler, lineExHandler) = seq {
    // Iterate over all the specified directories.
    for specifiedDir in specifiedDirectories do
        // Find all files in the directories (and handle exceptions).
        let files =
            try Directory.GetFiles(specifiedDir, "*", SearchOption.AllDirectories)
            with e -> dirExHandler e specifiedDir; [||]
        // Iterate over all files and report those that match.
        for file in files do
            if fileMatches fileNamePatterns contentPatterns
                           (fileExHandler, lineExHandler) file then
                // Matches! Return this file as part of the result.
                yield file }
The function is still quite complicated, because you need to pass a lot of parameters around. Wrapping the parameters in a simple type or a record could be a good idea:
type ScannerArguments =
    { FileNamePatterns:string
      ContentPatterns:string
      FileExceptionHandler:exn -> string -> unit
      LineExceptionHandler:exn -> string -> unit
      DirectoryExceptionHandler:exn -> string -> unit }
Then you can define both fileMatches and scanner as functions that take just two parameters, which will make your code a lot more readable. Something like:
// Iterates over all the files and calls 'fileMatches'.
let scanner specifiedDirectories (args:ScannerArguments) = seq {
    for specifiedDir in specifiedDirectories do
        let files =
            try Directory.GetFiles(specifiedDir, "*", SearchOption.AllDirectories)
            with e -> args.DirectoryExceptionHandler e specifiedDir; [||]
        for file in files do
            // No need to propagate all arguments explicitly to other functions.
            if fileMatches args file then yield file }
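For completeness, here is one way the two-parameter fileMatches could look when built on File.ReadLines. This is only a sketch under assumptions: each pattern field holds a single regex (as in the record above), the file-name pattern is tested against the full path, and every read failure is reported through FileExceptionHandler, leaving LineExceptionHandler unused.

```fsharp
open System
open System.IO
open System.Text.RegularExpressions

// Same shape as the record above, repeated so the sketch is self-contained.
type ScannerArguments =
    { FileNamePatterns: string
      ContentPatterns: string
      FileExceptionHandler: exn -> string -> unit
      LineExceptionHandler: exn -> string -> unit
      DirectoryExceptionHandler: exn -> string -> unit }

// True when the path matches FileNamePatterns and at least one line matches
// ContentPatterns. File.ReadLines streams the file lazily, and Seq.exists
// stops reading at the first matching line.
let fileMatches (args: ScannerArguments) (file: string) =
    if not (Regex.IsMatch(file, args.FileNamePatterns)) then
        false
    else
        try
            File.ReadLines file
            |> Seq.exists (fun line -> Regex.IsMatch(line, args.ContentPatterns))
        with e ->
            // Assumption: any failure while opening or reading counts as a file error.
            args.FileExceptionHandler e file
            false
```

Because the sequence is consumed inside the `try`, exceptions raised while reading later lines are also caught here rather than escaping to the caller.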
