F# distinctBy behavior - f#

I don't understand behavior of distinctBy in this snippet:
let s = [123; 231; 321]
let s1 = s |> Seq.map (string >> Seq.sort)
let s2 = s |> Seq.distinctBy (string >> Seq.sort)
which produces:
s1 = seq [seq ['1'; '2'; '3']; seq ['1'; '2'; '3']; seq ['1'; '2'; '3']]
as expected, but:
s2 = seq [123; 231; 321]
where I expected only one element, because the 3 keys are identical. Which part did I get wrong?

F# does not compare sequences structurally for equality; see this example:
(123 |> string |> Seq.sort) = (123 |> string |> Seq.sort)
val it : bool = false
I imagine this is to allow support for infinite sequences.
You can fix this by mapping the keys to lists, which do compare structurally:
let s = [123; 231; 321] |> Seq.distinctBy (string >> Seq.sort >> Seq.toList);;
val s : seq<int>
> s;;
val it : seq<int> = seq [123]
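To make the difference concrete, here is a minimal sketch that puts the two comparisons side by side. The names a, b, seqEqual, listEqual and distinct are only illustrative:

```fsharp
// Two separately built sequences with the same elements.
let a = Seq.sort "123"
let b = Seq.sort "123"

// seq<_> values fall back to reference equality, so this is false.
let seqEqual = (a = b)

// Lists compare element by element, so this is true.
let listEqual = (Seq.toList a = Seq.toList b)

// With a list as the key, distinctBy sees the three keys as equal.
let distinct =
    [123; 231; 321]
    |> Seq.distinctBy (string >> Seq.sort >> Seq.toList)
    |> Seq.toList

printfn "%b %b %A" seqEqual listEqual distinct
```

Converting the key to an array with Seq.toArray should work just as well, since arrays are also compared structurally by the = operator.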

Seq.sort probably doesn't implement comparison logic. So the underlying implementation sees three distinct objects.
Similar if you did the following:
object.ReferenceEquals("1", 1.ToString());

Related

What is the difference between "Seq" and "seq"?

I'm confused about when to use "Seq" and when to use "seq". Can you tell me what the differences are?
This is my code. Why can't I use "seq" here?
let s = ResizeArray<float>()
s.Add(1.1)
s.Add(2.2)
s.Add(3.3)
s.Add(4.4)
s |> Seq.iter (fun x -> printfn("%f") x )
Seq is a module that contains functions that work with seq values:
Seq.map string [ 1; 2 ]
Seq.sum [ 1; 2 ]
seq is a type name:
let f1 (xs : seq<int>) = ()
let f2 (xs : int seq) = ()
seq is also a function that converts something like a list into the type seq:
seq [ 1; 2 ]
seq { ... } is a computation expression:
seq { yield 1; yield 2 }
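Putting the four uses next to each other may help. This is only a sketch; printAll, values, squares, asSeq and total are made-up names:

```fsharp
// `seq<float>` is the type; ResizeArray<float> is compatible with it.
let printAll (xs: seq<float>) =
    // `Seq.iter` is a function from the Seq module.
    xs |> Seq.iter (printfn "%f")

let values = ResizeArray<float>()
values.Add 1.1
values.Add 2.2

printAll values

// `seq { ... }` is a sequence expression that builds a seq lazily.
let squares = seq { for x in values -> x * x }

// `seq` used as a function turns a list into a seq<int>.
let asSeq = seq [ 1; 2; 3 ]

// Functions in the Seq module accept any seq value.
let total = Seq.sum asSeq
```

So in the question's code, Seq.iter is the module function, and seq would only appear if you annotated a type or built a new sequence.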
You use the uppercase Seq in all cases except in type annotation.
For example:
let (x:seq<int>) =
    [1..10]
    |> Seq.map (fun t -> t + 1)
Edit: Please refer to recommended answer, as my answer is incomplete.

F#: Testing whether two strings are anagrams

I am new to programming and F# is my first language.
Here is my code:
let areAnagrams (firstString: string) (secondString: string) =
    let countCharacters (someString: string) =
        someString.ToLower().ToCharArray() |> Array.toSeq
        |> Seq.countBy (fun eachChar -> eachChar)
        |> Seq.sortBy (snd >> (~-))
    countCharacters firstString = countCharacters secondString
let testString1 = "Laity"
let testString2 = "Italy"
printfn "It is %b that %s and %s are anagrams." (areAnagrams testString1 testString2) (testString1) (testString2)
This is the output:
It is false that Laity and Italy are anagrams.
What went wrong? What changes should I make?
Your implementation of countCharacters sorts the tuples just using the second element (the number of occurrences for each character), but if there are multiple characters that appear the same number of times, then the order is not defined.
If you run the countCharacters function on your two samples, you can see the problem:
> countCharacters "Laity";;
val it : seq<char * int> = seq [('l', 1); ('a', 1); ('i', 1); ('t', 1); ...]
> countCharacters "Italy";;
val it : seq<char * int> = seq [('i', 1); ('t', 1); ('a', 1); ('l', 1); ...]
One solution is to just use Seq.sort and sort the tuples using both the letter code and the number of occurrences.
The other problem is that you are comparing two seq<_> values and this does not use structural comparison, so you'll need to turn the result into a list or an array (something that is fully evaluated):
let countCharacters (someString: string) =
    someString.ToLower().ToCharArray()
    |> Seq.countBy (fun eachChar -> eachChar)
    |> Seq.sort
    |> List.ofSeq
Note that you do not actually need Seq.countBy - because if you just sort all the characters, it will work equally well (the repeated characters will just be one after another). So you could use just:
let countCharacters (someString: string) =
    someString.ToLower() |> Seq.sort |> List.ofSeq
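For completeness, a small end-to-end sketch of that last suggestion. The helper name normalize and the use of ToLowerInvariant instead of ToLower are my own choices, not part of the question:

```fsharp
// Lower-case the string, sort its characters, and materialize the
// result as a list so that = compares the contents.
let normalize (s: string) =
    s.ToLowerInvariant() |> Seq.sort |> List.ofSeq

let areAnagrams (a: string) (b: string) =
    normalize a = normalize b

printfn "%b" (areAnagrams "Laity" "Italy")
printfn "%b" (areAnagrams "Laity" "Spain")
```

Both "Laity" and "Italy" normalize to the same list of characters, so the first call reports true, while the second reports false.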
Sorting the characters of the two strings gives you an easy solution but this could be a good example of recursion.
You can immediately exclude strings of different length.
You can also filter out all the occurrences of a char per iteration, by replacing them with an empty string.
let rec areAnagram (x:string) (y:string) =
    if x.Length <> y.Length
    then false else
    if x.Length = 0
    then true else
    // remove every occurrence of the first char from both strings
    let reply = x.[0].ToString ()
    areAnagram
        (x.Replace (reply,""))
        (y.Replace (reply,""))
The above should be faster than sorting for many use cases.
Anyway we can go further and transform it into a fast Integer Sorting without recursion and string replacements
let inline charToInt c =
    int c - int '0'
// Only valid for chars from '0' up to 99 code points above it (digits and ASCII letters).
let singlePassAnagram (x:string) =
    let hash : int array = Array.zeroCreate 100
    x |> Seq.iter (fun c->
        hash.[charToInt c] <- (hash.[charToInt c]+1))
    hash // return the counts so the two arrays can be compared
let areAnagramsFast
    (x:string) (y:string) =
    if x.Length <> y.Length
    then false else
    (singlePassAnagram x) =
        (singlePassAnagram y)
Here is a fiddle
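If the fixed-size array feels too restrictive (it only covers a narrow range of characters), a counting variant based on Seq.countBy and Map is one possible alternative. This is a sketch of a different technique, not the code from the answer above, and the names letterCounts and areAnagramsByCount are hypothetical:

```fsharp
// Count how often each character occurs and store the counts in a Map.
// Map values compare structurally, so = works on them directly.
let letterCounts (s: string) =
    s.ToLowerInvariant() |> Seq.countBy id |> Map.ofSeq

let areAnagramsByCount (a: string) (b: string) =
    a.Length = b.Length && letterCounts a = letterCounts b

printfn "%b" (areAnagramsByCount "Laity" "Italy")
```

It works for any character, at the cost of building a Map for each string.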

How do I write a ZipN-like function in F#?

I want to create a function with the signature seq<#seq<'a>> ->seq<seq<'a>> that acts like a Zip method taking a sequence of an arbitrary number of input sequences (instead of 2 or 3 as in Zip2 and Zip3) and returning a sequence of sequences instead of tuples as a result.
That is, given the following input:
[[1;2;3];
[4;5;6];
[7;8;9]]
it will return the result:
[[1;4;7];
[2;5;8];
[3;6;9]]
except with sequences instead of lists.
I am very new to F#, but I have created a function that does what I want, but I know it can be improved. It's not tail recursive and it seems like it could be simpler, but I don't know how yet. I also haven't found a good way to get the signature the way I want (accepting, e.g., an int list list as input) without a second function.
I know this could be implemented using enumerators directly, but I'm interested in doing it in a functional manner.
Here's my code:
let private Tail seq = Seq.skip 1 seq
let private HasLengthNoMoreThan n = Seq.skip n >> Seq.isEmpty
let rec ZipN_core = function
    | seqs when seqs |> Seq.isEmpty -> Seq.empty
    | seqs when seqs |> Seq.exists Seq.isEmpty -> Seq.empty
    | seqs ->
        let head = seqs |> Seq.map Seq.head
        let tail = seqs |> Seq.map Tail |> ZipN_core
        Seq.append (Seq.singleton head) tail
// Required to change the signature of the parameter from seq<seq<'a>> to seq<#seq<'a>>
let ZipN seqs = seqs |> Seq.map (fun x -> x |> Seq.map (fun y -> y)) |> ZipN_core
let zipn items = items |> Matrix.Generic.ofSeq |> Matrix.Generic.transpose
Or, if you really want to write it yourself:
let zipn items =
    let rec loop items =
        seq {
            match items with
            | [] -> ()
            | _ ->
                match zipOne ([], []) items with
                | Some(xs, rest) ->
                    yield xs
                    yield! loop rest
                | None -> ()
        }
    and zipOne (acc, rest) = function
        | [] -> Some(List.rev acc, List.rev rest)
        | []::_ -> None
        | (x::xs)::ys -> zipOne (x::acc, xs::rest) ys
    loop items
Since this seems to be the canonical answer for writing a zipn in f#, I wanted to add a "pure" seq solution that preserves laziness and doesn't force us to load our full source sequences in memory at once like the Matrix.transpose function. There are scenarios where this is very important because it's a) faster and b) works with sequences that contain 100s of MBs of data!
This is probably the most un-idiomatic f# code I've written in a while but it gets the job done (and hey, why would there be sequence expressions in f# if you couldn't use them for writing procedural code in a functional language).
let seqdata = seq {
    yield Seq.ofList [ 1; 2; 3 ]
    yield Seq.ofList [ 4; 5; 6 ]
    yield Seq.ofList [ 7; 8; 9 ]
}
let zipnSeq (src:seq<seq<'a>>) = seq {
    let enumerators = src |> Seq.map (fun x -> x.GetEnumerator()) |> Seq.toArray
    if (enumerators.Length > 0) then
        try
            while(enumerators |> Array.forall(fun x -> x.MoveNext())) do
                yield enumerators |> Array.map( fun x -> x.Current)
        finally
            enumerators |> Array.iter (fun x -> x.Dispose())
}
zipnSeq seqdata |> Seq.toArray
val it : int [] [] = [|[|1; 4; 7|]; [|2; 5; 8|]; [|3; 6; 9|]|]
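To illustrate the laziness claim, here is a sketch that feeds two infinite sequences through the same function and only takes the first three rows. The function is repeated so the snippet stands on its own; naturals, evens and firstThree are names I made up:

```fsharp
let zipnSeq (src: seq<seq<'a>>) = seq {
    let enumerators = src |> Seq.map (fun x -> x.GetEnumerator()) |> Seq.toArray
    if enumerators.Length > 0 then
        try
            // Stop as soon as any input runs out of elements.
            while enumerators |> Array.forall (fun x -> x.MoveNext()) do
                yield enumerators |> Array.map (fun x -> x.Current)
        finally
            enumerators |> Array.iter (fun x -> x.Dispose())
}

// Two infinite inputs: 0, 1, 2, ... and 0, 2, 4, ...
let naturals = Seq.initInfinite id
let evens = Seq.initInfinite (fun i -> 2 * i)

// Only the rows that are actually requested get computed.
let firstThree =
    zipnSeq (seq { yield naturals; yield evens })
    |> Seq.truncate 3
    |> Seq.toList

printfn "%A" firstThree
```

A list-based transpose could not handle these inputs at all, since it would have to materialize them first.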
By the way, the traditional matrix transpose is much more terse than @Daniel's answer. It does, however, require a list or LazyList, both of which will eventually hold the full sequence in memory.
let rec transpose =
    function
    | (_ :: _) :: _ as M -> List.map List.head M :: transpose (List.map List.tail M)
    | _ -> []
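A quick usage sketch of the same idea, written with an explicit match and a lower-case binding. The name transposeLists is mine, to avoid clashing with the definition above:

```fsharp
// Same technique as above: take the heads of all rows as one output row,
// then recurse on the tails until the first row is exhausted.
let rec transposeLists (rows: 'a list list) =
    match rows with
    | (_ :: _) :: _ ->
        List.map List.head rows :: transposeLists (List.map List.tail rows)
    | _ -> []

let zipped = transposeLists [ [1; 2; 3]; [4; 5; 6]; [7; 8; 9] ]
printfn "%A" zipped
```

Note that only the first row is checked for emptiness, so if a later row is shorter than the first one, List.head raises an exception. This version is best kept for rectangular input.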
To handle having sub-lists of different lengths, I've used option types to spot if we've run out of elements.
let split = function
    | [] -> None, []
    | h::t -> Some(h), t
let rec zipN listOfLists =
    seq { let splitted = listOfLists |> List.map split
          let anyMore = splitted |> Seq.exists (fun (f, _) -> f.IsSome)
          if anyMore then
              yield splitted |> List.map fst
              let rest = splitted |> List.map snd
              yield! rest |> zipN }
This would map
let ll = [ [ 1; 2; 3 ];
[ 4; 5; 6 ];
[ 7; 8; 9 ] ]
to
seq
[seq [Some 1; Some 4; Some 7]; seq [Some 2; Some 5; Some 8];
seq [Some 3; Some 6; Some 9]]
and
let ll = [ [ 1; 2; 3 ];
[ 4; 5; 6 ];
[ 7; 8 ] ]
to
seq
[seq [Some 1; Some 4; Some 7]; seq [Some 2; Some 5; Some 8];
seq [Some 3; Some 6; null]]
This takes a different approach to yours, but avoids using some of the operations that you had before (e.g. Seq.skip, Seq.append), which you should be careful with.
I realize that this answer is not very efficient, but I do like its succinctness:
[[1;2;3]; [4;5;6]; [7;8;9]]
|> Seq.collect Seq.indexed
|> Seq.groupBy fst
|> Seq.map (snd >> Seq.map snd);;
Another option:
let zipN ls =
    let rec loop (a,b) =
        match b with
        |l when List.head l = [] -> a
        |l ->
            let x1,x2 =
                (([],[]),l)
                ||> List.fold (fun acc elem ->
                    match acc,elem with
                    |(ah,at),eh::et -> ah@[eh],at@[et]
                    |_ -> acc)
            loop (a@[x1],x2)
    loop ([],ls)

Split seq in F#

I need to split a seq<a> into a seq<seq<a>> by an attribute of the elements. Whenever this attribute equals a given value, the sequence must be split at that point. How can I do that in F#?
It would be nice to pass it a function that returns a bool saying whether to split at that item or not.
Sample:
Input sequence: seq: {1,2,3,4,1,5,6,7,1,9}
It should be split at every item that equals 1, so the result should be:
seq
{
seq{1,2,3,4}
seq{1,5,6,7}
seq{1,9}
}
All you're really doing is grouping--creating a new group each time a value is encountered.
let splitBy f input =
    let i = ref 0
    input
    |> Seq.map (fun x ->
        if f x then incr i
        !i, x)
    |> Seq.groupBy fst
    |> Seq.map (fun (_, b) -> Seq.map snd b)
Example
let items = seq [1;2;3;4;1;5;6;7;1;9]
items |> splitBy ((=) 1)
Again, shorter, with Stephen's nice improvements:
let splitBy f input =
    let i = ref 0
    input
    |> Seq.groupBy (fun x ->
        if f x then incr i
        !i)
    |> Seq.map snd
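If you would rather avoid the mutable counter, the same grouping can be sketched with a fold. This is an eager alternative that returns lists, not a drop-in replacement for the lazy versions above; splitByPure and its helper step are hypothetical names:

```fsharp
// Walk the input from the right, collecting items into the current group
// and closing the group whenever an item satisfies the predicate.
let splitByPure (isStart: 'a -> bool) (input: seq<'a>) : 'a list list =
    let step item (current, groups) =
        if isStart item then [], (item :: current) :: groups
        else item :: current, groups
    let leading, groups = Seq.foldBack step input ([], [])
    // Items before the first split point form a group of their own.
    if List.isEmpty leading then groups else leading :: groups

let parts = splitByPure ((=) 1) [1; 2; 3; 4; 1; 5; 6; 7; 1; 9]
printfn "%A" parts
```

For the sample input this gives the three groups from the question. Because it folds over the whole input, it cannot be used on infinite sequences.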
Unfortunately, writing functions that work with sequences (the seq<'T> type) is a bit difficult. They do not nicely work with functional concepts like pattern matching on lists. Instead, you have to use the GetEnumerator method and the resulting IEnumerator<'T> type. This often makes the code quite imperative. In this case, I'd write the following:
let splitUsing special (input:seq<_>) = seq {
    use en = input.GetEnumerator()
    let finished = ref false
    let start = ref true
    let rec taking () = seq {
        if not (en.MoveNext()) then finished := true
        elif en.Current = special then start := true
        else
            yield en.Current
            yield! taking() }
    yield taking()
    while not (!finished) do
        yield Seq.concat [ Seq.singleton special; taking()] }
I wouldn't recommend using the functional style (e.g. using Seq.skip and Seq.head), because this is quite inefficient - it creates a chain of sequences that take value from other sequence and just return it (so there is usually O(N^2) complexity).
Alternatively, you could write this using a computation builder for working with IEnumerator<'T>, but that's not standard. You can find it here, if you want to play with it.
The following is an impure implementation but yields immutable sequences lazily:
let unflatten f s = seq {
    let buffer = ResizeArray()
    let flush() = seq {
        if buffer.Count > 0 then
            yield Seq.readonly (buffer.ToArray())
            buffer.Clear() }
    for item in s do
        if f item then yield! flush()
        buffer.Add(item)
    yield! flush() }
f is the function used to test whether an element should be a split point:
[1;2;3;4;1;5;6;7;1;9] |> unflatten (fun item -> item = 1)
Probably not the most efficient solution, but this works:
let takeAndSkipWhile f s = Seq.takeWhile f s, Seq.skipWhile f s
let takeAndSkipUntil f = takeAndSkipWhile (f >> not)
let rec splitOn f s =
    if Seq.isEmpty s then
        Seq.empty
    else
        let pre, post =
            if f (Seq.head s) then
                takeAndSkipUntil f (Seq.skip 1 s)
                |> fun (a, b) ->
                    Seq.append [Seq.head s] a, b
            else
                takeAndSkipUntil f s
        if Seq.isEmpty pre then
            Seq.singleton post
        else
            Seq.append [pre] (splitOn f post)
splitOn ((=) 1) [1;2;3;4;1;5;6;7;1;9] // int list is compatible with seq<int>
The type of splitOn is ('a -> bool) -> seq<'a> -> seq<seq<'a>>. I haven't tested it on many inputs, but it seems to work.
In case you are looking for something that works like a string split (i.e. the item for which the predicate returns true is not included in the output), the code below is what I came up with. I tried to be as functional as possible :)
open System.Collections.Generic
let fromEnum (input : 'a IEnumerator) =
    seq {
        while input.MoveNext() do
            yield input.Current
    }
let getMore (input : 'a IEnumerator) =
    if input.MoveNext() = false then None
    else Some ((input |> fromEnum) |> Seq.append [input.Current])
let splitBy (f : 'a -> bool) (input : 'a seq) =
    use s = input.GetEnumerator()
    let rec loop (acc : 'a seq seq) =
        match s |> getMore with
        | None -> acc
        | Some x ->
            [x |> Seq.takeWhile (f >> not) |> Seq.toList |> List.toSeq]
            |> Seq.append acc
            |> loop
    loop Seq.empty |> Seq.filter (Seq.isEmpty >> not)
seq [1;2;3;4;1;5;6;7;1;9;5;5;1]
|> splitBy ( (=) 1) |> printfn "%A"

F# Basics: Folding 2 lists together into a string

I'm a little rusty from my Scheme days. I'd like to take 2 lists, one of numbers and one of strings, and fold them together into a single string where each pair is written like "{(ushort)5, "bla bla bla"},\n". I have most of it; I'm just not sure how to write the fold properly:
let splitter = [|","|]
let indexes =
indexStr.Split(splitter, System.StringSplitOptions.None) |> Seq.toList
let values =
valueStr.Split(splitter, System.StringSplitOptions.None) |> Seq.toList
let pairs = List.zip indexes values
printfn "%A" pairs
let result = pairs |> Seq.fold
(fun acc a -> String.Format("{0}, \{(ushort){1}, \"{2}\"\}\n",
acc, (List.nth a 0), (List.nth a 1)))
You're missing two things: the initial state of the fold, which is an empty string, and the fact that you can't use list functions such as List.nth on tuples in F#. Pattern match on the tuple instead.
let splitter = [|","|]
let indexes =
    indexStr.Split(splitter, System.StringSplitOptions.None) |> Seq.toList
let values =
    valueStr.Split(splitter, System.StringSplitOptions.None) |> Seq.toList
let pairs = List.zip indexes values
printfn "%A" pairs
let result =
    pairs
    |> Seq.fold (fun acc (index, value) ->
        String.Format("{0}{{(ushort){1}, \"{2}\"}},\n", acc, index, value)) ""
fold2 version
let result =
    List.fold2
        (fun acc index value ->
            String.Format("{0}{{(ushort){1}, \"{2}\"}},\n", acc, index, value))
        ""
        indexes
        values
If you are concerned with speed you may want to use string builder since it doesn't create a new string every time you append.
let result =
    List.fold2
        (fun (sb:StringBuilder) index value ->
            sb.AppendFormat("{{(ushort){0}, \"{1}\"}},\n", index, value))
        (StringBuilder())
        indexes
        values
    |> string
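Here is a self-contained sketch of the StringBuilder variant that can be pasted into F# Interactive. The sample values for indexStr and valueStr are made up, since the question does not show them, and I split on a single character for brevity:

```fsharp
open System.Text

// Hypothetical sample inputs standing in for the question's strings.
let indexStr = "5,7"
let valueStr = "bla bla bla,foo"

let indexes = indexStr.Split(',') |> List.ofArray
let values = valueStr.Split(',') |> List.ofArray

// Thread one StringBuilder through both lists and append one
// "{(ushort)index, "value"}," line per pair.
let result =
    (StringBuilder(), indexes, values)
    |||> List.fold2 (fun (sb: StringBuilder) index value ->
        sb.AppendFormat("{{(ushort){0}, \"{1}\"}},\n", index, value))
    |> string

printf "%s" result
```

Keep in mind that List.fold2 raises an exception when the two lists have different lengths, so malformed input shows up immediately instead of being silently truncated.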
Fold probably isn't the best method for this task. It's a lot easier to map and concat like this:
let l1 = "a,b,c,d,e".Split([|','|])
let l2 = "1,2,3,4,5".Split([|','|])
let pairs =
    Seq.zip l1 l2
    |> Seq.map (fun (x, y) -> sprintf "(ushort)%s, \"%s\"" x y)
    |> String.concat "\n"
I think you want List.fold2. For some reason the List module has a fold2 member but Seq doesn't. Then you can dispense with the zip entirely.
The types of your named variables and the type of the result you hope for are all implicit, so it's difficult to help, but if you are trying to accumulate a list of strings you might consider something along the lines of
let result =
    pairs |> Seq.fold
        (fun prev (l, r) ->
            String.Format("{0}, {{(ushort){1}, \"{2}\"}}\n", prev, l, r))
        ""
My F#/Caml is very rusty so I may have the order of arguments wrong. Also note your string formation is quadratic; in my own code I would go with something more along these lines:
let strings =
    List.fold2 (fun ss l r ->
        String.Format ("{{(ushort){0}, \"{1}\"}}\n", l, r) :: ss)
        [] indexes values
// the fold prepends, so reverse to restore the original order
let result = String.concat ", " (List.rev strings)
This won't cost you quadratic time and it's a little easier to follow. I've checked MSDN and believe I have the correct order of arguments on fold2.
Keep in mind I know Caml not F# and so I may have details or order of arguments wrong.
Perhaps this:
let strBuilder = new StringBuilder()
for (i,v) in Seq.zip indexes values do
    strBuilder.Append(String.Format("{{(ushort){0}, \"{1}\"}},\n", i,v))
    |> ignore
With F#, sometimes it is better to go imperative...
map2 or fold2 is the right way to go. Here's my take, using the (||>) operator:
let l1 = [| "a"; "b"; "c"; "d"; "e" |]
let l2 = [| "1"; "2"; "3"; "4"; "5" |]
let pairs =
    (l1, l2) ||> Seq.map2 (sprintf ("(ushort)%s, \"%s\""))
    |> String.concat "\n"
