How To Apply a Function to an Array of float Arrays? - f#

Let's suppose I have n arrays, where n is a variable (some number greater than 2, usually less than 10).
Each array has k elements.
I also have an array of length n that contains a set of weights that dictate how I would like to linearly combine all the arrays.
I am trying to create a high performance higher order function to combine these arrays in F#.
How can I do this, so that I get a function that takes an array of arrays (arrs is a sample), a weights array (weights), and then computed a weighted sum based on the weights?
let weights = [|.6;;.3;.1|]
let arrs = [| [|.0453;.065345;.07566;1.562;356.6|] ;
[|.0873;.075565;.07666;1.562222;3.66|] ;
[|.06753;.075675;.04566;1.452;3.4556|] |]
thanks for any ideas.

Here's one solution:
let combine weights arrs =
Array.map2 (fun w -> Array.map ((*) w)) weights arrs
|> Array.reduce (Array.map2 (+))
EDIT
Here's some (much needed) explanation of how this works. Logically, we want to do the following:
Apply each weight to its corresponding row.
Add together the weight-adjusted rows.
The two lines above do just that.
We use the Array.map2 function to combine corresponding weights and rows; the way that we combine them is to multiply each element in the row by the weight, which is accomplished via the inner Array.map.
Now we have an array of weighted rows and need to add them together. We can do this one step at a time by keeping a running sum, adding each array in turn. The way we sum two arrays pointwise is to use Array.map2 again, using (+) as the function for combining the elements from each. We wrap this in an Array.reduce to apply this addition function to each row in turn, starting with the first row.
Hopefully this is a reasonably elegant approach to the problem, though the point-free style admittedly makes it a bit tricky to follow. However, note that it's not especially performant; doing in-place updates rather than creating new arrays with each application of map, map2, and reduce would be more efficient. Unfortunately, the standard library doesn't contain nice analogues of these operations which work in-place. It would be relatively easy to create such analogues, though, and they could be used in almost exactly the same way as I've done here.

Something like this did it for me:
let weights = [|0.6;0.3;0.1|]
let arrs = [| [|0.0453;0.065345;0.07566;1.562;356.6|] ;
[|0.0873;0.075565;0.07666;1.562222;3.66|] ;
[|0.06753;0.075675;0.04566;1.452;3.4556|] |]
let applyWeight x y = x * y
let rotate (arr:'a[][]) =
Array.map (fun y -> (Array.map (fun x -> arr.[x].[y])) [|0..arr.Length - 1|]) [|0..arr.[0].Length - 1|]
let weightedarray = Array.map (fun x -> Array.map(applyWeight (fst x)) (snd x)) (Array.zip weights arrs)
let newarrs = Array.map Array.sum (rotate weightedarray)
printfn "%A" newarrs
By the way.. the 0 preceding a float value is necessary.

Related

How to combine two differently typed sequences into tuples in f#?

I'm a bit new to F#, I have mostly c# background.
I'm working with two lists/sequences that represent the same thing, but from different datasources (one is a local file, the other is a group of items in an online system.
I need to report the mismatches between the two datasets.
So far I have filtered down the two lists to contain only items that aren't have mismatches in the other dataset.
Now I want to pair them up into tuples (or anything else really) based on one of the properties so I can log the differences.
So far what I've tried is this:
let printDiff (offlinePackages: oP seq) (onLinePackages: onP seq) =
Seq.map2(fun x y -> (x, y)) offlinePackages onLinePackages
This does pair up my data, however I'd still need some logic to do the pairup based on their matching property value.
I'm looking for something like:
if offLinePackage.Label = OnlinePackage.Label then /do the matchup/
else /don't do anything/
I know I'm still stuck in my object oriented thinking, that's also why I'm asking.
Thanks in advance!
Matching sequence elements based on some equivalency function - that's called "join".
The most "straightforward" way to do a join is to get a Cartesian product and filter it down, like this:
let matchup seq1 seq2 =
Seq.allPairs seq1 seq2
|> Seq.filter (fun (x, y) -> x.someProp = y.someProp)
Or in a computation expression form:
let matchup seq1 seq2
seq {
for x in seq1 do
for y in seq2 do
if x.someProp = y.someProp then yield (x,y)
}
But this is a bit inefficient. The complexity would be O(n*m), because it iterates over all possible pairs. Fine if the lists are short, but will bite you the more you scale.
To do this more efficiently, you can use the query computation builder and its operation join, which does a hash-join (i.e. it builds a hashtable first and then matches the elements based on that):
let matchup seq1 seq2 =
query {
for x in seq1 do
join y in seq2 on (x.someProp = y.someProp)
select (x,y)
}

Apply several aggregate functions with one enumeration

Let's assume I have a series of functions that work on a sequence, and I want to use them together in the following fashion:
let meanAndStandardDeviation data =
let m = mean data
let sd = standardDeviation data
(m, sd)
The code above is going to enumerate the sequence twice. I am interested in a function that will give the same result but enumerate the sequence only once. This function will be something like this:
magicFunction (mean, standardDeviation) data
where the input is a tuple of functions and a sequence and the ouput is the same with the function above.
Is this possible if the functions mean and stadardDeviation are black boxes and I cannot change their implementation?
If I wrote mean and standardDeviation myself, is there a way to make them work together? Maybe somehow making them keep yielding the input to the next function and hand over the result when they are done?
The only way to do this using just a single iteration when the functions are black boxes is to use the Seq.cache function (which evaluates the sequence once and stores the results in memory) or to convert the sequence to other in-memory representation.
When a function takes seq<T> as an argument, you don't even have a guarantee that it will evaluate it just once - and usual implementations of standard deviation would first calculate the average and then iterate over the sequence again to calculate the squares of errors.
I'm not sure if you can calculate standard deviation with just a single pass. However, it is possible to do that if the functions are expressed using fold. For example, calculating maximum and average using two passes looks like this:
let maxv = Seq.fold max Int32.MinValue input
let minv = Seq.fold min Int32.MaxValue input
You can do that using a single pass like this:
Seq.fold (fun (s1, s2) v ->
(max s1 v, min s2 v)) (Int32.MinValue, Int32.MaxValue) input
The lambda function is a bit ugly, but you can define a combinator to compose two functions:
let par f g (i, j) v = (f i v, g j v)
Seq.fold (par max min) (Int32.MinValue, Int32.MaxValue) input
This approach works for functions that can be defined using fold, which means that they consist of some initial value (Int32.MinValue in the first example) and then some function that is used to update the initial (previous) state when it gets the next value (and then possibly some post-processing of the result). In general, it should be possible to rewrite single-pass functions in this style, but I'm not sure if this can be done for standard deviation. It can be definitely done for mean:
let (count, sum) = Seq.fold (fun (count, sum) v ->
(count + 1.0, sum + v)) (0.0, 0.0) input
let mean = sum / count
What we're talking about here is a function with the following signature:
(seq<'a> -> 'b) * (seq<'a> -> 'c) -> seq<'a> -> ('b * 'c)
There is no straightforward way that I can think of that will achieve the above using a single iteration of the sequence if that is the signature of the functions. Well, no way that is more efficient than:
let magicFunc (f1:seq<'a>->'b, f2:seq<'a>->'c) (s:seq<'a>) =
let cached = s |> Seq.cache
(f1 cached, f2 cached)
That ensures a single iteration of the sequence itself (perhaps there are side effects, or it's slow), but does so by essentially caching the results. The cache is still iterated another time. Is there anything wrong with that? What are you trying to achieve?

f# iterating over two arrays, using function from a c# library

I have a list of words and a list of associated part of speech tags. I want to iterate over both, simultaneously (matched index) using each indexed tuple as input to a .NET function. Is this the best way (it works, but doesn't feel natural to me):
let taggingModel = SeqLabeler.loadModel(lthPath +
"models\penn_00_18_split_dict.model");
let lemmatizer = new Lemmatizer(lthPath + "v_n_a.txt")
let input = "the rain in spain falls on the plain"
let words = Preprocessor.tokenizeSentence( input )
let tags = SeqLabeler.tagSentence( taggingModel, words )
let lemmas = Array.map2 (fun x y -> lemmatizer.lookup(x,y)) words tags
Your code looks quite good to me - most of it deals with some loading and initialization, so there isn't much you could do to simplify that part. Alternatively to Array.map2, you could use Seq.zip combined with Seq.map - the zip function combines two sequences into a single one that contains pairs of elements with matching indices:
let lemmas = Seq.zip words tags
|> Seq.map (fun (x, y) -> lemmatizer.lookup (x, y))
Since lookup function takes a tuple that you got as an argument, you could write:
// standard syntax using the pipelining operator
let lemmas = Seq.zip words tags |> Seq.map lemmatizer.lookup
// .. an alternative syntax doing exactly the same thing
let lemmas = (words, tags) ||> Seq.zip |> Seq.map lemmatizer.lookup
The ||> operator used in the second version takes a tuple containing two values and passes them to the function on the right side as two arguments, meaning that (a, b) ||> f means f a b. The |> operator takes only a single value on the left, so (a, b) |> f would mean f (a, b) (which would work if the function f expected tuple instead of two, space separated, parameters).
If you need lemmas to be an array at the end, you'll need to add Array.ofSeq to the end of the processing pipeline (all Seq functions work with sequences, which correspond to IEnumerable<T>)
One more alternative is to use sequence expressions (you can use [| .. |] to construct an array directly if that's what you need):
let lemmas = [| for wt in Seq.zip words tags do // wt is tuple (string * string)
yield lemmatizer.lookup wt |]
Whether to use sequence expressions or not - that's just a personal preference. The first option seems to be more succinct in this case, but sequence expressions may be more readable for people less familiar with things like partial function application (in the shorter version using Seq.map)

N-gram split function for string similarity comparison

As part of excersise to better understand F# which I am currently learning , I wrote function to
split given string into n-grams.
1) I would like to receive feedback about my function : can this be written simpler or in more efficient way?
2) My overall goal is to write function that returns string similarity (on 0.0 .. 1.0 scale) based on n-gram similarity; Does this approach works well for short strings comparisons , or can this method reliably be used to compare large strings (like articles for example).
3) I am aware of the fact that n-gram comparisons ignore context of two strings. What method would you suggest to accomplish my goal?
//s:string - target string to split into n-grams
//n:int - n-gram size to split string into
let ngram_split (s:string, n:int) =
let ngram_count = s.Length - (s.Length % n)
let ngram_list = List.init ngram_count (fun i ->
if( i + n >= s.Length ) then
s.Substring(i,s.Length - i) + String.init ((i + n) - s.Length)
(fun i -> "#")
else
s.Substring(i,n)
)
let ngram_array_unique = ngram_list
|> Seq.ofList
|> Seq.distinct
|> Array.ofSeq
//produce tuples of ngrams (ngram string,how much occurrences in original string)
Seq.init ngram_array_unique.Length (fun i -> (ngram_array_unique.[i],
ngram_list |> List.filter(fun item -> item = ngram_array_unique.[i])
|> List.length)
)
I don't know much about evaluating similarity of strings, so I can't give you much feedback regarding points 2 and 3. However, here are a few suggestions that may help to make your implementation simpler.
Many of the operations that you need to do are already available in some F# library function for working with sequences (lists, arrays, etc.). Strings are also sequences (of characters), so you can write the following:
open System
let ngramSplit n (s:string) =
let ngrams = Seq.windowed n s
let grouped = Seq.groupBy id ngrams
Seq.map (fun (ngram, occurrences) ->
String(ngram), Seq.length occurrences) grouped
The Seq.windowed function implements a sliding window, which is exactly what you need to extract the n-grams of your string. The Seq.groupBy function collects the elements of a sequence (n-grams) into a sequence of groups that contain values with the same key. We use id to calculate the key, which means that the n-gram is itself the key (and so we get groups, where each group contains the same n-grams). Then we just convert n-gram to string and count the number of elements in the group.
Alternatively, you can write the entire function as a single processing pipeline like this:
let ngramSplit n (s:string) =
s |> Seq.windowed n
|> Seq.groupBy id
|> Seq.map (fun (ngram, occurrences) ->
String(ngram), Seq.length occurrences)
Your code looks OK to me. Since ngram extraction and similarity comparison are used very often. You should consider some efficiency issues here.
The MapReduce pattern is very suitable for your frequency counting problem:
get a string, emit (word, 1) pairs out
do a grouping of the words and adds all the counting together.
let wordCntReducer (wseq: seq<int*int>) =
wseq
|> Seq.groupBy (fun (id,cnt) -> id)
|> Seq.map (fun (id, idseq) ->
(id, idseq |> Seq.sumBy(fun (id,cnt) -> cnt)))
(* test: wordCntReducer [1,1; 2,1; 1,1; 2,1; 2,2;] *)
You also need to maintain a <word,int> map during your ngram building for a set of strings. As it is much more efficient to handle integers rather than strings during later processing.
(2) to compare the distance between two short strings. A common practice is to use Edit Distance using a simple dynamic programming. To compute the similarity between articles, a state-of-the-art method is to use TFIDF feature representation. Actuallym the code above is for term frequency counting, extracted from my data mining library.
(3) There are complex NLP methods, e.g. tree kernels based on the parse tree, to in-cooperate the context information in.
I think you have some good answers for question (1).
Question (2):
You probably want cosine similarity to compare two arbitrary collections of n-grams (the larger better). This gives you a range of 0.0 - 1.0 without any scaling needed. The Wikipedia page gives an equation, and the F# translation is pretty straightforward:
let cos a b =
let dot = Seq.sum (Seq.map2 ( * ) a b)
let magnitude v = Math.Sqrt (Seq.sum (Seq.map2 ( * ) v v))
dot / (magnitude a * magnitude b)
For input, you need to run something like Tomas' answer to get two maps, then remove keys that only exist in one:
let values map = map |> Map.toSeq |> Seq.map snd
let desparse map1 map2 = Map.filter (fun k _ -> Map.containsKey k map2) map1
let distance textA textB =
let a = ngramSplit 3 textA |> Map.ofSeq
let b = ngramSplit 3 textB |> Map.ofSeq
let aValues = desparse a b |> values
let bValues = desparse b a |> values
cos aValues bValues
With character-based n-grams, I don't know how good your results will be. It depends on what kind of features of the text you are interested in. I do natural language processing, so usually my first step is part-of-speech tagging. Then I compare over n-grams of the parts of speech. I use T'n'T for this, but it has bizarro licencing issues. Some of my colleagues use ACOPOST instead, a Free alternative (as in beer AND freedom). I don't know how good the accuracy is, but POS tagging is a well-understood problem these days, at least for English and related languages.
Question (3):
The best way to compare two strings that are nearly identical is Levenshtein distance. I don't know if that is your case here, although you can relax the assumptions in a number of ways, eg for comparing DNA strings.
The standard book on this subject is Sankoff and Kruskal's "Time Warps, String Edits, and Maromolecules". It's pretty old (1983), but gives good examples of how to adapt the basic algorithm to a number of applications.
Question 3:
My reference book is Computing Patterns in Strings by Bill Smyth

How do I do convolution in F#?

I would like convolve a discrete signal with a discrete filter. The signal and filter is sequences of float in F#.
The only way I can figure out how to do it is with two nested for loops and a mutable array to store the result, but it does not feel very functional.
Here is how I would do it non-functional:
conv = double[len(signal) + len(filter) - 1]
for i = 1 to len(signal)
for j = 1 to len(filter)
conv[i + j] = conv[i + j] + signal(i) * filter(len(filter) - j)
I don't know F#, but I'll post some Haskell and hopefully it will be close enough to use. (I only have VS 2005 and an ancient version of F#, so I think it would be more confusing to post something that works on my machine)
Let me start by posting a Python implementation of your pseudocode to make sure I'm getting the right answer:
def convolve(signal, filter):
conv = [0 for _ in range(len(signal) + len(filter) - 1)]
for i in range(len(signal)):
for j in range(len(filter)):
conv[i + j] += signal[i] * filter[-j-1]
return conv
Now convolve([1,1,1], [1,2,3]) gives [3, 5, 6, 3, 1]. If this is wrong, please tell me.
The first thing we can do is turn the inner loop into a zipWith; we're essentially adding a series of rows in a special way, in the example above: [[3,2,1], [3,2,1], [3,2,1]]. To generate each row, we'll zip each i in the signal with the reversed filter:
makeRow filter i = zipWith (*) (repeat i) (reverse filter)
(Note: according to a quick google, zipWith is map2 in F#. You might have to use a list comprehension instead of repeat)
Now:
makeRow [1,2,3] 1
=> [3,2,1]
makeRow [1,2,3] 2
=> [6,4,2]
To get this for all i, we need to map over signal:
map (makeRow filter) signal
=> [[3,2,1], [3,2,1], [3,2,1]]
Good. Now we just need a way to combine the rows properly. We can do this by noticing that combining is adding the new row to the existing array, except for the first element, which is stuck on front. For example:
[[3,2,1], [6,4,2]] = 3 : [2 + 6, 1 + 4] ++ [2]
// or in F#
[[3; 2; 1]; [6; 4; 2]] = 3 :: [2 + 6; 1 + 4] # [2]
So we just need to write some code that does this in the general case:
combine (front:combinable) rest =
let (combinable',previous) = splitAt (length combinable) rest in
front : zipWith (+) combinable combinable' ++ previous
Now that we have a way to generate all the rows and a way to combine a new row with an existing array, all we have to do is stick the two together with a fold:
convolve signal filter = foldr1 combine (map (makeRow filter) signal)
convolve [1,1,1] [1,2,3]
=> [3,5,6,3,1]
So that's a functional version. I think it's reasonably clear, as long as you understand foldr and zipWith. But it's at least as long as the imperative version and like other commenters said, probably less efficient in F#. Here's the whole thing in one place.
makeRow filter i = zipWith (*) (repeat i) (reverse filter)
combine (front:combinable) rest =
front : zipWith (+) combinable combinable' ++ previous
where (combinable',previous) = splitAt (length combinable) rest
convolve signal filter = foldr1 combine (map (makeRow filter) signal)
Edit:
As promised, here is an F# version. This was written using a seriously ancient version (1.9.2.9) on VS2005, so be careful. Also I couldn't find splitAt in the standard library, but then I don't know F# that well.
open List
let gen value = map (fun _ -> value)
let splitAt n l =
let rec splitter n l acc =
match n,l with
| 0,_ -> rev acc,l
| _,[] -> rev acc,[]
| n,x::xs -> splitter (n - 1) xs (x :: acc)
splitter n l []
let makeRow filter i = map2 ( * ) (gen i filter) (rev filter)
let combine (front::combinable) rest =
let combinable',previous = splitAt (length combinable) rest
front :: map2 (+) combinable combinable' # previous
let convolve signal filter =
fold1_right combine (map (makeRow filter) signal)
Try this function:
let convolute signal filter =
[|0 .. Array.length signal + Array.length filter - 1|] |> Array.map (fun i ->
[|0 .. i|] |> Array.sum_by (fun j -> signal.[i] * filter.[Array.length filter - (i - j) - 1]))
It's probably not the nicest function solution, but it should do the job. I doubt there exists a purely functional solution that will match the imperative one for speed however.
Hope that helps.
Note: The function is currently untested (though I've confirmed it compiles). Let me know if it doesn't quite do what it should. Also, observe that the i and j variables do not refer to the same things as is your original post.
Indeed, you generally want to avoid loops (plain, nested, whatever) and anything mutable in functional programming.
There happens to be a very simple solution in F# (and probably almost every other functional language):
let convolution = Seq.zip seq1 seq2
The zip function simply combines the two sequences into one of pairs containing the element from seq1 and the element from seq2. As a note, there also exist similar zip functions for the List and Array modules, as well as variants for combining three lists into triples (zip3). If you want tom ore generally zip (or "convolute") n lists into a list of n-tuples, then you'll need to write your own function, but it's pretty straightforward.
(I've been going by this description of convolution by the way - tell me if you mean something else.)
In principle, it should be possible to use the (Fast) Fourier Transform, or the related (Discrete) Cosine Transform, to calculate the convolution of two functions reasonably efficiently. You calculate the FFT for both functions, multiply them, and apply the inverse FFT on the result.
mathematical background
That's the theory. In practice you'd probably best find a math library that implements it for you.

Resources