Proper use of fold in F# - f#

I am new to F#. I am trying to use List.fold to help me generate a list of categories and sub-categories based on their Id and ParentId fields. It seems I probably made this code more complex than need be, as I'm getting the stackoverflow error. What am I doing wrong or missing? All related feedback is appreciated.
// types
type CategoryStructure = {
Id: ValidString;
ParentId: ValidString;
Name: ValidString;
Abbreviation: ValidString;
Description: ValidString;
SapId: ValidString;
Section: ValidString;
SectionPosition: ValidString
}
type DynamicCategories = {
Category: CategoryStructure;
SubCategories: seq<DynamicCategories>
}
// this is the function that produces the stack overflow error
let rec private structureCategories (fullList: CategoryStructure list)
(list: CategoryStructure list) =
List.fold (fun acc elem ->
// get all categories and details
let categories = fullList
let mainAcc =
[
for row in categories do
if row = elem
then
let subs =
List.fold (fun acc' elem' ->
if row.Id = elem'.ParentStructureId
then
let foundSubCategory =
{
Category = elem';
SubCategories = structureCategories fullList list |> Seq.ofList
}
foundSubCategory :: acc'
else acc'
) List.empty<DynamicCategories> categories
|> Seq.ofList
yield{
Category = elem;
SubCategories = subs
}
]
mainAcc # acc
) List.empty<DynamicCategories> list
// this function gets the initial parent categories and calls the above function
let getStructuredCategories () =
let categories = allCategoriesAndDetails () |> List.ofSeq
[
for row in categories do
if row.ParentStructureId = NotValid
then yield row
] |> structureCategories categories |> Seq.ofList

You keep calling structureCategories with the same arguments - fullList and list. Since arguments are same, it proceeds to do exactly the same thing as on the previous pass, and ends up calling itself again, with the same arguments. And so on.
This is unbounded recursion ("unbounded" here means "doesn't know when to stop recurring"), and it is also not "tail recursion", so quite naturally, it causes stack overflow.
If you want to turn the flat list into a tree-like structure, you could do a bit simpler than this:
let getChildren fullList parentId = fullList |> List.filter (fun c -> c.ParentId = parentId)
let rec toTree fullList root =
{ Category = root;
SubCategories =
getChildren fullList root.Id
|> List.map (toTree fullList) }
With this, you'll be left with two problems, which I don't know how to solve without knowing more about your requirements:
This will still cause stack overflow if the original list happens to have cycles.
You need to decide who the root(s) of the tree is (or are). Intuitively, this would be indicated via "empty" ParentId, but it is unclear from your data structure what "empty" means.
And finally, this naive solution, while better than your original one, is still a bit slower than it needs to be. It iterates over the whole list once, and for every node does another pass to determine its children, resulting in overall complexity of O(N^2). This may be fine if you expect relatively small list, but not so fine for larger lists. In that case, I would first turn the list into a hashtable (keyed by ParentId) and then use that to find children instead of List.filter.

Thanks to Fyodor, I saw my mistake. He was dead on about calling the same arguments. I added this bit of code right before the foundSubCategory value:
let modifiedList = elem' :: List.empty<CategoryStructure>
and then called that value in the subsequent code:
let foundSubCategory =
{
Category = elem';
SubCategories = structureCategories fullList modifiedList |> Seq.ofList
}
This solved my issue, but now as Fyodor alluded to, I now have to refactor this into something more efficient.
UPDATE
With the insight that Fyodor pointed out this is the current state of my code, which replaces the original code:
let getStructuredCategories () =
let fullList = allCategoriesAndDetails ()
let parentList () =
allCategoriesAndDetails ()
|> Seq.filter (fun p -> p.ParentStructureId = NotValid)
let rec toTree (fullList': seq<CategoryStructure>) (parent: CategoryStructure) =
fullList'
|> Seq.filter (fun x -> x.ParentStructureId = parent.Id)
|> Seq.map (fun x ->
{
Category = x;
SubCategories =
toTree fullList' x
})
seq {
for row in parentList () do
yield {
Category = row;
SubCategories = toTree fullList row
}
}

Related

How do access a value that belongs to a record from that record itself?

I am REALLY new to F#, so I might have used the wrong terminology here. Please feel free to correct me if I am wrong, I would really appreciate it! Anyways, on to the question
I have a record that I have defined as so:
type EventSource = {
SourceName: string
Address: string
ParseDocument: HtmlDocument -> Event seq }
And I have created an instance of that record like so:
let lofSource = {
SourceName = "LOF"
Address = "https://lof.dk/syd/kurser"
ParseDocument = fun document ->
document.Descendants ["div"]
|> Seq.filter (fun d -> d.HasClass("item"))
|> Seq.map (
fun e ->
let linkElement
= e.Descendants (fun j -> j.HasClass "item-title")
|> Seq.head
|> (fun y -> y.Descendants ["a"])
|> Seq.map (fun fa -> fa.Attribute "href")
|> Seq.head
{
Title = e.AttributeValue "data-holdnavn"
Link = linkElement.Value()
Status = e.AttributeValue "data-status"
Image = Address //Here!
City = e.AttributeValue "data-bynavn"
Date = System.DateTime.ParseExact(e.AttributeValue("data-datosort"), "yyyyMMdd", null);
PostalCode = e.AttributeValue("data-postnr")})
}
On the line where I am trying to assign a value the Image member, It tells me that the value or constructor 'Address' is not defined.
I have tried using a self-identifier on the instantiation of the record and then trying to access Address like
this.Address
but it tells me that 'this' is not defined. I am guessing I am missing something quite fundamental here, can anyone help me? Is what I am trying to do nonsensical?
You can't do this with records. See: Reference a record from within itself during construction
You can do it with another binding (I couldn't get your code to compile and have simplified it):
type EventSource = {
SourceName: string
Address: string
ParseDocument: string -> string}
let lofSource =
let helloThere = "General Kenobi"
{
SourceName = "LOF"
Address = foo
ParseDocument = fun document ->
foo
}

iterating JArray without for loop in F#

I don't want to use this for loop for iterating the JArray. Is there any other method which can replace this for loop?
let tablesInJson = jsonModel.["tables"] :?> JArray //Converting JOject into JArray
for table in tablesInJson do
let TableName = table.["name"] :?> JValue
let columns = table.["columns"] :?> JArray
for col in columns do
let name = col.["name"] :?> JValue
let types = col.["type"] :?> JValue
let length = col.["length"] :?> JValue
let Result_ = sqlTableInfos
|> List.tryFind (fun s -> s.TableName = TableName.ToString() && s.ColumnName = name.ToString())
if Result_ = Unchecked.defaultof<_> then
printfn "is null"
else
printfn "not null"
If you want to iterate over a collection and perform an imperative operation than using for loop is the idiomatic way of doing this in F# and you should just use that. After all, for is an F# language construct! There is a reason why it exists and the reason is that it lets you easily write code that iterates over a collection and does something for each element!
There are cases where for loop is not a good fit. For example, if you wanted to turn a collection of columns into a new collection with information about the tables. Then you could use Seq.map:
let tableInfos = columns |> Seq.map (fun col ->
let name = col.["name"] :?> JValue
let types = col.["type"] :?> JValue
let length = col.["length"] :?> JValue
let result = sqlTableInfos |> List.tryFind (fun s ->
s.TableName = TableName.ToString() && s.ColumnName = name.ToString())
if result = Unchecked.defaultof<_> then None
else Some result)
This looks like something you might be trying to do - but it is difficult to say. Your question does not say what is the problem that you are actually trying to solve.
Your example with printfn is probably misleading, because if you actually just want to print, then for loop is the best way of doing that.
You can use the Seq module to perform sequence-processing operations over the JArray. In your case, I think I would probably do this for the second for loop (over the columns), but not for the outer loop. The reason being, if you factor the code in the inner-loop out to a function, then you can use pipelining and partial application to clean up the code a bit:
open Newtonsoft.Json
open Newtonsoft.Json.Linq
type SqlTableInfo = {TableName: string; ColumnName: string}
let tablesInJson = JArray()
let sqlTableInfo = []
let tryFindColumn (tableName: JValue) (column: JToken) =
let columnName = column.["name"] |> unbox<JValue>
if sqlTableInfo |> List.exists (fun s -> s.TableName = tableName.ToString() && s.ColumnName = columnName.ToString())
then printfn "Table %A, Column %A Found" tableName columnName
else printfn "Table %A, Column %A Found" tableName columnName
for table in tablesInJson do
let tableName = table.["name"] |> unbox<JValue>
table.["columns"]
|> unbox<JArray>
|> Seq.iter (tryFindColumn tableName)

indentation with composed and piped functions

I'm trying to get this block of code right, getting it readable by splitting into different lines, can use some help here.
list |> Array.iter ( fun data -> data.Href |> ( regex.Match >> ( fun m ->
let result = {ArticleModule.defaultArticle with publicationId = m.Groups.[1].Value; entityType = m.Groups.[3].Value; entityName = m.Groups.[4].Value; version = m.Groups.[5].Value}
) ) )
Edit
A shortest form perhaps to help further on how to break this in separate lines.
list |> Array.iter ( fun data -> data.Href |> ( regex.Match >> ( fun m -> Console.WriteLine m ) ) )
Attempt 1
list |> Array.iter (fun data -> (data.Href)
|> regex.Match // squiggly lines here
|> (fun m -> Console.WriteLine m))
First, since you ask how to indent this code, I'll point you to https://github.com/dungpa/fantomas/blob/master/docs/FormattingConventions.md, which is an excellent reference that you should probably bookmark. 3/4 of the way down that page you will find https://github.com/dungpa/fantomas/blob/master/docs/FormattingConventions.md#pipeline-operators which suggests that sequences of pipeline operators should be indented with the pipeline directly under the data that is flowing through the pipeline:
let result =
data
|> step1
|> Array.filter (fun x -> x == "something")
|> step3
And so on.
Now, to apply this advice to your situation.
First, the squiggly lines in your attempt 1 are because on the regex.Match line, you're still inside the fun data -> ... lambda. All lines in F# should line up vertically with the thing they belong to (vague language because this is a general rule that applies to many situations). Here, that would look like:
list |> Array.iter (fun data -> data.Href
|> regex.Match
|> (fun m -> Console.WriteLine m))
Now, that looks kind of ugly to me. So I would split out the fun data -> ... lambda into its own function:
let handleOneItem data =
data.Href
|> regex.Match
|> (fun m -> Console.WriteLine m)
list |> Array.iter handleOneItem
Much nicer.
Now, let's look at your original code, where the final lambda in the pipeline was not calling Console.WriteLine, but was creating a record. There's one error in that code, which is that a let statement does nothing in and of itself. What you probably wanted to write was:
fun m ->
let result = {ArticleModule.defaultArticle with publicationId = m.Groups.[1].Value; entityType = m.Groups.[3].Value; entityName = m.Groups.[4].Value; version = m.Groups.[5].Value}
result
Which in turn can simply be turned into:
fun m ->
{ArticleModule.defaultArticle with publicationId = m.Groups.[1].Value; entityType = m.Groups.[3].Value; entityName = m.Groups.[4].Value; version = m.Groups.[5].Value}
And now, I would recommend taking that long record and splitting it across multiple lines (see https://github.com/dungpa/fantomas/blob/master/docs/FormattingConventions.md#records for details), like so:
fun m ->
{ ArticleModule.defaultArticle with
publicationId = m.Groups.[1].Value
entityType = m.Groups.[3].Value
entityName = m.Groups.[4].Value
version = m.Groups.[5].Value }
I'd probably make this its own named function:
let mkRecord m =
{ ArticleModule.defaultArticle with
publicationId = m.Groups.[1].Value
entityType = m.Groups.[3].Value
entityName = m.Groups.[4].Value
version = m.Groups.[5].Value }
Now let's look again at the full code:
let mkRecord m =
{ ArticleModule.defaultArticle with
publicationId = m.Groups.[1].Value
entityType = m.Groups.[3].Value
entityName = m.Groups.[4].Value
version = m.Groups.[5].Value }
let handleOneItem data =
data.Href
|> regex.Match
|> mkRecord
list |> Array.iter handleOneItem
There's just one more mistake here, which is that Array.iter is the wrong type. Array.iter wants you to hand it a function that returns unit (i.e., returns nothing meaningful). Any function that returns nothing meaningful is clearly being called for its side effects, not its return value. Since mkRecord returns a value and has no side effects, you want Array.map instead. So the final version of your code would be:
let mkRecord m =
{ ArticleModule.defaultArticle with
publicationId = m.Groups.[1].Value
entityType = m.Groups.[3].Value
entityName = m.Groups.[4].Value
version = m.Groups.[5].Value }
let handleOneItem data =
data.Href
|> regex.Match
|> mkRecord
list |> Array.map handleOneItem
There is nothing wrong with using a for loop. If what you do in the end is an imperative operation like printing to the console, then using for loop clearly indicates that this is what you do:
for data in list do
let m = regex.Match data.Href
Console.WriteLine m
From your other example, it looks like you are trying to use Array.map to create a new array. The answer from #rmunn covers this nicely, but again, note that you do not need to do everything using |>. It is often easier to use let binding:
list |> Array.map (fun data ->
let m = regex.Match data.Href
{ ArticleModule.defaultArticle with
publicationId = m.Groups.[1].Value
entityType = m.Groups.[3].Value
entityName = m.Groups.[4].Value
version = m.Groups.[5].Value })

Sorting indexes in list of lists - F#

Currently I have a function to return the first elements of each list (floats), within a list to a separate list.
let firstElements list =
match list with
| head::_ -> head
| [] -> 0.00
My question is, how do I expand this to return elements at the same index into different lists while I don't know how long this list is? For example
let biglist = [[1;2;3];[4;5;6];[7;8;9]]
If I did not know the length of this list, what is the most efficient and safest way to get
[[1;4;7];[2;5;8];[3;6;9]]
List.transpose has been added recently to FSharp.Core
let biglist = [[1;2;3];[4;5;6];[7;8;9]]
let res = biglist |> List.transpose
//val res : int list list = [[1; 4; 7]; [2; 5; 8]; [3; 6; 9]]
You can use the recent added List.transpose function. But it is always good to be good enough to create such functions yourself. If you want to solve the problem yourself, think of a general algorithm to solve your problem. One would be.
From the first element of each list you create a new list
You drop the first element of each list
If you end with empty lists you end, otherwise repeat at step 1)
This could be the first attempt to solve the Problem. Function names are made up, at this point.
let transpose lst =
if allEmpty lst
then // Some Default value, we don't know yet
else ...
The else branch looks like following. First we want to pick the first element of every element. We imagine a function pickFirsts that do this task. So we could write pickFirsts lst. The result is a list that itself is the first element of a new list.
The new list is the result of the remaining list. First we imagine again a function that drops the first element of every sub-list dropFirsts lst. On that list we need to repeat step 1). We do that by a recursive call to transpose.
Overall we get:
let rec transpose lst =
if allEmpty lst
then // Some Default value, we don't know yet
else (pickFirsts lst) :: (transpose (dropFirsts lst))
At this point we can think of the default value. transpose needs to return a value in the case it ends up with an empty list of empty lists. As we use the result of transpose to add an element to it. The results of it must be a list. And the best default value is an empty list. So we end up with.
let rec transpose lst =
if allEmpty lst
then []
else (pickFirsts lst) :: (transpose (dropFirsts lst))
Next we need to implement the remaining functions allEmpty, pickFirsts and dropFirsts.
pickFirst is easy. We need to iterate over each element, and must return the first value. We get the first value of a list by List.head, and iterating over it and turning every element into a new list is what List.map does.
let pickFirsts lst = List.map List.head lst
dropFirsts need to iterate ver each element, and just remove the first element, or in other words keeps the remaining/tail of a list.
let dropFirsts lst = List.map List.tail lst
The remaining allEmpty is a predicate that either return true/false if we have an empty list of lists or not. With a return value of bool, we need another function that allows to return another type is a list. This is usually the reason to use List.fold. An implementation could look like this:
let allEmpty lst =
let folder acc x =
match x with
| [] -> acc
| _ -> false
List.fold folder true lst
It starts with true as the default value. As long it finds empty lists it returns the default value unchanged. As soon there is one element found, in any list, it will return false (Not Empty) as the new default value.
The whole code:
let allEmpty lst =
let folder acc x =
match x with
| [] -> acc
| _ -> false
List.fold folder true lst
let pickFirsts lst = List.map List.head lst
let dropFirsts lst = List.map List.tail lst
let rec transpose lst =
if allEmpty lst
then []
else (pickFirsts lst) :: (transpose (dropFirsts lst))
transpose [[1;2;3];[4;5;6];[7;8;9]]
Another approach would be to turn it into a 2 dimensional mutable array. Also do length checkings. Do the transformation and return the mutable array again as an immutable list.

Functional way to add to Lists that are Class-Members

I want to sort items of a class and collect them in Collection-Classes that beside a List-Member also contain further information that are necessary for the sorting process.
The following example is a a very simplified example for my problem. Although it doesn't make sense, I hope it still can help to understand my Question.
type ItemType = Odd|Even //realworld: more than two types possible
type Item(number) =
member this.number = number
member this.Type = if (this.number % 2) = 0 then Even else Odd
type NumberTypeCollection(numberType:ItemType , ?items:List<Item>) =
member this.ItemType = numberType
member val items:List<Item> = defaultArg items List.empty<Item> with get,set
member this.append(item:Item) = this.items <- item::this.items
let addToCollection (collections:List<NumberTypeCollection>) (item:Item) =
let possibleItem =
collections
|> Seq.where (fun c -> c.ItemType = item.Type) //in my realworld code, several groups may be returned
|> Seq.tryFind(fun _ -> true)
match possibleItem with
|Some(f) -> f.append item
collections
|None -> NumberTypeCollection(item.Type, [item]) :: collections
let rec findTypes (collections:List<NumberTypeCollection>) (items:List<Item>) =
match items with
| [] -> collections
| h::t -> let newCollections = ( h|> addToCollection collections)
findTypes newCollections t
let items = [Item(1);Item(2);Item(3);Item(4)]
let finalCollections = findTypes List.empty<NumberTypeCollection> items
I'm unsatisfied with the addToCollection method, since it requires the items in NumberTypeCollection to be mutual. Maybe there are further issues.
What can be a proper functional solution to solve this issue?
Edit: I'm sorry. May code was too simplified. Here is a little more complex example that should hopefully illustrate why I chose the mutual class-member (although this could still be the wrong decision):
open System
type Origin = Afrika|Asia|Australia|Europa|NorthAmerika|SouthAmerica
type Person(income, taxrate, origin:Origin) =
member this.income = income
member this.taxrate = taxrate
member this.origin = origin
type PersonGroup(origin:Origin , ?persons:List<Person>) =
member this.origin = origin
member val persons:List<Person> = defaultArg persons List.empty<Person> with get,set
member this.append(person:Person) = this.persons <- person::this.persons
//just some calculations to group people into some subgroups
let isInGroup (person:Person) (personGroup:PersonGroup) =
let avgIncome =
personGroup.persons
|> Seq.map (fun p -> float(p.income * p.taxrate) / 100.0)
|> Seq.average
Math.Abs ( (avgIncome / float person.income) - 1.0 ) < 0.5
let addToGroup (personGroups:List<PersonGroup>) (person:Person) =
let possibleItem =
personGroups
|> Seq.where (fun p -> p.origin = person.origin)
|> Seq.where (isInGroup person)
|> Seq.tryFind(fun _ -> true)
match possibleItem with
|Some(f) -> f.append person
personGroups
|None -> PersonGroup(person.origin, [person]) :: personGroups
let rec findPersonGroups (persons:List<Person>) (personGroups:List<PersonGroup>) =
match persons with
| [] -> personGroups
| h::t -> let newGroup = ( h|> addToGroup personGroups)
findPersonGroups t newGroup
let persons = [Person(1000,20, Afrika);Person(1300,22,Afrika);Person(500,21,Afrika);Person(400,20,Afrika)]
let c = findPersonGroups persons List.empty<PersonGroup>
What I may need to emphasize: There can be several different groups with the same origin.
Tomas' solution using groupby is the optimal approach if you want to generate your collections only once, it's a simple and concise.
If you want to be able to add/remove items in a functional, referentially transparent style for this type of problem, I suggest you move away from seq and start using Map.
You have a setup which is fundamentally dictionary-like. You have a unique key and a value. The functional F# equivalent to a dictionary is a Map, it is an immutable data structure based on an AVL tree. You can insert, remove and search in O(log n) time. When you append/remove from the Map, the old Map is maintained and you receive a new Map.
Here is your code expressed in this style
type ItemType =
|Odd
|Even
type Item (number) =
member this.Number = number
member this.Type = if (this.Number % 2) = 0 then Even else Odd
type NumTypeCollection = {Items : Map<ItemType, Item list>}
/// Functions on NumTypeCollection
module NumberTypeCollection =
/// Create empty collection
let empty = {Items = Map.empty}
/// Append one item to the collection
let append (item : Item) numTypeCollection =
let key = item.Type
match Map.containsKey key numTypeCollection.Items with
|true ->
let value = numTypeCollection.Items |> Map.find key
let newItems =
numTypeCollection.Items
|> Map.remove key
|> Map.add key (item :: value) // append item
{Items = newItems }
|false -> {Items = numTypeCollection.Items |> Map.add key [item]}
/// Append a list of items to the collections
let appendList (item : Item list) numTypeCollection =
item |> List.fold (fun acc it -> append it acc) numTypeCollection
Then call it using:
let items = [Item(1);Item(2);Item(3);Item(4)]
let finalCollections = NumberTypeCollection.appendList items (NumberTypeCollection.empty)
If I understand your problem correctly, you're trying to group the items by their type. The easiest way to do that is to use the standard library function Seq.groupBy. The following should implement the same logic as your code:
items
|> Seq.groupBy (fun item -> item.Type)
|> Seq.map (fun (key, values) ->
NumberTypeCollection(key, List.ofSeq values))
Maybe there are further issues.
Probably. It's difficult to tell, since it's hard to detect the purpose of the OP code... still:
Why do you even need an Item class? Instead, you could simply have a itemType function:
let itemType i = if i % 2 = 0 then Even else Odd
This function is referentially transparent, which means that you can replace it with its value if you wish. That makes it as good as a property getter method, but now you've already saved yourself from introducing a new type.
Why define a NumberTypeCollection class? Why not a simple record?
type NumberTypeList = { ItemType : ItemType; Numbers : int list }
You can implement addToCollection like something like this:
let addToCollection collections i =
let candidate =
collections
|> Seq.filter (fun c -> c.ItemType = (itemType i))
|> Seq.tryHead
match candidate with
| Some x ->
let x' = { x with Numbers = i :: x.Numbers }
collections |> Seq.filter ((<>) x) |> Seq.append [x']
| None ->
collections |> Seq.append [{ ItemType = (itemType i); Numbers = [i] }]
Being immutable, it doesn't mutate the input collections, but instead returns a new sequence of NumberTypeList.
Also notice the use of Seq.tryHead instead of Seq.tryFind(fun _ -> true).
Still, if you're attempting to group items, then Tomas' suggestion of using Seq.groupBy is more appropriate.

Resources