I run some organic chemistry models. A model is described by a generated ModelData.fs file, e.g.: https://github.com/kkkmail/ClmFSharp/blob/master/Clm/Model/ModelData.fs . The file has a very simple structure and using a generated model file is the only way that it can possibly work.
The referenced file is just for tests, but the real models are huge and may approach 60 - 70 MB / 1.5M LOC. When I try to compile such files, the F# compiler, fsc.exe, just hangs and never comes back. It "eats" about 1.5 GB of memory and then does something forever at near 100% CPU. It can clearly handle smaller models: a file of about 10 MB compiles in under a minute. So somewhere between 10 MB and 70 MB something breaks down badly in fsc.
I wonder if there are some parameter tweaks that I could make to the way fsc compiles the project to make it capable of handling such huge models.
The huge models that I am referring to have one parameter set as follows: let numberOfSubstances = 65643. This results in various generated arrays of that size. I wonder if this could be the source of the problem.
Thanks a lot!
I don't think you need to autogenerate all of that.
From your comments, I understand that the functions d0, d1, ... are generated from a big sparse matrix in a way that sums up all of the input array x (with coefficients), but crucially skips summing up zero coefficients, which gives you a great performance gain, because the matrix is huge. Would that be a correct assessment?
If so, I still don't think you need to generate code to do that.
Let's take a look. I will assume that your giant sparse matrix has an interface for obtaining cell values, and it looks something like this:
let getMatrixCell (i: int) (j: int) : double
let maxI: int
let maxJ: int
Then your autogeneration code might look something like this:
let generateDFunction (i: int) =
    printfn "let d%d (x: double[]) =" i
    printfn "    [|"
    for j in 0..maxJ do
        let cell = getMatrixCell i j
        if cell <> 0.0 then
            printfn "        %f * x.[%d]" cell j
    printfn "    |]"
    printfn "    |> Array.sum"
Which would result in something like this:
let d25 (x : array<double>) =
    [|
        -1.0 * x.[25]
        1.0 * x.[3]
    |]
    |> Array.sum
Note that I am simplifying here: in your example file, it looks like the functions also multiply negative coefficients by x.[i]. But maybe I'm also overcomplicating, because it looks like all the coefficients are always either 1 or -1. But that is all nonessential to my point.
Now, in the comments, it has been proposed that you don't generate functions d0, d1, ... but instead work directly with the matrix. For example, this would be a naive implementation of that suggestion:
let calculateDFunction (i: int) (x: double[]) =
    [| for j in 0..maxJ -> (getMatrixCell i j) * x.[j] |] |> Array.sum
You then argued that this solution would be prohibitively slow, because it always iterates over the whole array x, which is huge, but most of the coefficients are zero, so it doesn't have to.
And then your way of solving this issue was to use an intermediate step of generated code: you generate the functions that only touch non-zero indices, and then you compile and use those functions.
But here's the point: yes, you do need that intermediate step to get rid of the zero coefficients, but it doesn't have to be generated-and-compiled code!
Instead, you can prepare lists/arrays of non-zero indices ahead of time:
let indicies =
    [| for i in 0..maxI ->
        [ for j in 0..maxJ do
            let cell = getMatrixCell i j
            if cell <> 0.0 then yield (j, cell)
        ]
    |]
This will yield an array indicies : (int * double) list [], where each index k corresponds to your autogenerated function dk, and each element is a list of the non-zero column indices in row k together with their values in the matrix. For example, the function d25 I gave above would be represented by element 25 of indicies:
indicies.[25] = [ (25, -1.0); (3, 1.0) ]
Based on this intermediate structure, you can then calculate any function dk:
let calculateDFunction (k: int) (x: double[]) =
    [| for (j, coeff) in indicies.[k] -> coeff * x.[j] |] |> Array.sum
In fact, if performance is crucial to you (as it seems to be from the comments), you should probably do away with all those intermediate arrays: hundreds or thousands of heap allocations on each iteration are definitely not helping. You can sum with a mutable variable instead:
let calculateDFunction (k: int) (x: double[]) =
    let mutable sum = 0.0
    for (j, coeff) in indicies.[k] do
        sum <- sum + coeff * x.[j]
    sum
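To make the idea concrete, here is a minimal end-to-end sketch on a tiny matrix. The 3x4 `dense` array is a made-up stand-in for the real sparse matrix, and `sparseRows` and `calculateD` are illustrative names; arrays of pairs are used instead of lists, which is an assumption about what suits a hot loop rather than something taken from the original model.

```fsharp
// Hypothetical 3x4 coefficient matrix, standing in for the real sparse matrix.
let dense =
    [| [| 0.0; 2.0; 0.0; 0.0 |]
       [| 1.0; 0.0; 0.0; -1.0 |]
       [| 0.0; 0.0; 0.0; 0.0 |] |]

let getMatrixCell (i: int) (j: int) : double = dense.[i].[j]
let maxI = dense.Length - 1
let maxJ = dense.[0].Length - 1

// Built once: for each row, the (column, coefficient) pairs of the non-zero cells.
let sparseRows : (int * double)[][] =
    [| for i in 0..maxI ->
        [| for j in 0..maxJ do
            let cell = getMatrixCell i j
            if cell <> 0.0 then yield (j, cell) |] |]

// Evaluates row k against x, touching only the non-zero cells.
let calculateD (k: int) (x: double[]) =
    let mutable sum = 0.0
    for (j, coeff) in sparseRows.[k] do
        sum <- sum + coeff * x.[j]
    sum

let x = [| 10.0; 20.0; 30.0; 40.0 |]
printfn "%A" [ for k in 0..maxI -> calculateD k x ]   // [40.0; -30.0; 0.0]
```

The table is built once up front, so the per-call cost depends only on the number of non-zero cells in the row, not on the width of the matrix.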
I am teaching myself a bit of F# by doing a bit of simple matrix mathematics. I decided to write a set of simple functions for combining two matrices as I thought that this would be a good way of learning list comprehensions. However when I compile it my unit tests produce a type mismatch exception.
//return a column from the matrix as a list
let getColumn(matrix: list<list<double>>, column:int) =
    [for row in matrix do yield row.Item(column)]
//return a row from the matrix as a list
let getRow(matrix: list<list<double>>, column:int) =
    matrix.Item(column)
//find the minimum width of the matrices in order to avoid index out of range exceptions
let minWidth(matrix1: list<list<double>>,matrix2: list<list<double>>) =
    let width1 = [for row in matrix1 do yield row.Length] |> List.min
    let width2 = [for row in matrix2 do yield row.Length] |> List.min
    if width1 > width2 then width2 else width1
//find the minimum height of the matrices in order to avoid index out of range exceptions
let minHeight(matrix1: list<list<double>>,matrix2: list<list<double>>) =
    let height1 = matrix1.Length
    let height2 = matrix2.Length
    if height1 > height2 then height2 else height1
//combine the two matrices
let concat(matrix1: list<list<double>>,matrix2: list<list<double>>) =
    let width = minWidth(matrix1, matrix2)
    let height = minHeight(matrix1, matrix2)
    [for y in 0 .. height do yield [for x in 0 .. width do yield (List.fold2 (fun acc a b -> acc + (a*b)), getRow(matrix1, y), getColumn(matrix2, x))]]
I was expecting the function to return a list of lists of type
double list list
However what it actually returns looks more like some kind of lambda expression
((int -> int list -> int list -> int) * double list * double list) list list
Can somebody tell me what is being returned, and how to force it to be evaluated into the list of lists that I originally expected?
There's a short answer and a long answer to your question.
The short answer
The short version is that F# functions (like List.fold2) do not take multiple parameters separated by commas the way you think they do, but separated by spaces. I.e., you should NOT call List.fold2 like this:
List.fold2 (function, state, list1, list2)
but rather like this:
List.fold2 function state list1 list2
Now, if you just remove the commas in your List.fold2 call, you'll see that the compiler complains about your getRow(matrix1, y) call and tells you to put parentheses around it. (And the outer pair of parentheses around List.fold2 isn't actually needed.) Note also that List.fold2 expects an initial accumulator value between the folding function and the two lists, which your call is missing; 0.0 is the natural choice for a sum. So this:
(List.fold2 (fun acc a b -> acc + (a*b)), getRow(matrix1, y), getColumn(matrix2, x))
Needs to turn into this:
List.fold2 (fun acc a b -> acc + (a*b)) 0.0 (getRow(matrix1, y)) (getColumn(matrix2, x))
The long answer
The way F# functions take multiple parameters is actually very different from most other languages such as C#. In fact, all F# functions take exactly one parameter! "But wait," you're probably thinking right now, "you just now showed me the syntax for F# functions taking multiple parameters!" Yes, I did. What's going on under the hood is a combination of currying and partial application. I'd write a long explanation, but Scott Wlaschin has already written one, that's much better than I could have written, so I'll just point you to the https://fsharpforfunandprofit.com/series/thinking-functionally.html series to help you understand what's going on here. (The sections on currying and partial application are the ones you want, but I'd recommend reading the series in order because the later parts build on concepts introduced in earlier parts).
And yes, this "long" answer appears shorter than the "short" answer, but if you go read that series (and then the rest of Scott Wlaschin's excellent site), you'll find that it's much longer than the short answer. :-)
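As a small taste of what that series covers, here is a minimal sketch of currying and partial application. The names `add`, `addFive`, `addTupled` and `dot` are made up for illustration; only `List.fold2` comes from the standard library.

```fsharp
// A "two-parameter" function is really a function that returns a function.
let add (a: int) (b: int) = a + b
let addFive = add 5            // partial application: addFive has type int -> int

// The tupled form takes exactly one argument, which happens to be a pair.
let addTupled (a: int, b: int) = a + b

// List.fold2 is curried: folder, then initial state, then the two lists.
let dot (xs: float list) (ys: float list) =
    List.fold2 (fun acc a b -> acc + a * b) 0.0 xs ys

printfn "%d %d %g" (addFive 10) (addTupled (2, 3)) (dot [1.0; 2.0; 3.0] [4.0; 5.0; 6.0])
```

Writing `List.fold2 (f, xs, ys)` therefore passes a single tuple as the first argument, which is why the compiler infers a tuple type instead of evaluating the fold.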
If you have more questions, I'll be happy to try to explain.
Is it possible to generate data, specifically a list, with fscheck for use outside of fscheck? I'm unable to debug a situation in fscheck testing where it looks like the comparison results are equal, but fscheck says they are not.
I have this generator for a list of objects. How do I generate a list I can use from this generator?
let genListObj min max = Gen.listOf Arb.generate<obj> |> Gen.suchThat (fun l -> (l.Length >= min) && (l.Length <= max))
Edit: this function is now part of the FsCheck API (Gen.sample) so you don't need the below anymore...
Here is a sample function to generate n samples from a given generator:
let sample n gn =
    let rec sample i seed samples =
        if i = 0 then samples
        else sample (i-1) (Random.stdSplit seed |> snd) (Gen.eval 1000 seed gn :: samples)
    sample n (Random.newSeed()) []
Edit: the 1000 magic number in there represents the size of the generated values. 1000 is pretty big - e.g. sequences will be between 0 and 1000 elements long, and so will strings, for example. If generation takes a long time, you may want to tweak that value (or take it in as a parameter of the function).
Let's assume I have a series of functions that work on a sequence, and I want to use them together in the following fashion:
let meanAndStandardDeviation data =
    let m = mean data
    let sd = standardDeviation data
    (m, sd)
The code above is going to enumerate the sequence twice. I am interested in a function that will give the same result but enumerate the sequence only once. This function will be something like this:
magicFunction (mean, standardDeviation) data
where the input is a tuple of functions and a sequence and the output is the same as that of the function above.
Is this possible if the functions mean and standardDeviation are black boxes and I cannot change their implementation?
If I wrote mean and standardDeviation myself, is there a way to make them work together? Maybe somehow making them keep yielding the input to the next function and hand over the result when they are done?
The only way to do this using just a single iteration when the functions are black boxes is to use the Seq.cache function (which evaluates the sequence once and stores the results in memory) or to convert the sequence to another in-memory representation.
When a function takes seq<T> as an argument, you don't even have a guarantee that it will evaluate it just once - and usual implementations of standard deviation would first calculate the average and then iterate over the sequence again to calculate the squares of errors.
I'm not sure if you can calculate standard deviation with just a single pass. However, it is possible to do that if the functions are expressed using fold. For example, calculating maximum and minimum using two passes looks like this:
let maxv = Seq.fold max Int32.MinValue input
let minv = Seq.fold min Int32.MaxValue input
You can do that using a single pass like this:
Seq.fold (fun (s1, s2) v ->
    (max s1 v, min s2 v)) (Int32.MinValue, Int32.MaxValue) input
The lambda function is a bit ugly, but you can define a combinator to compose two functions:
let par f g (i, j) v = (f i v, g j v)
Seq.fold (par max min) (Int32.MinValue, Int32.MaxValue) input
This approach works for functions that can be defined using fold, which means that they consist of some initial value (Int32.MinValue in the first example) and then some function that is used to update the initial (previous) state when it gets the next value (and then possibly some post-processing of the result). In general, it should be possible to rewrite single-pass functions in this style, but I'm not sure if this can be done for standard deviation. It can definitely be done for mean:
let (count, sum) = Seq.fold (fun (count, sum) v ->
                       (count + 1.0, sum + v)) (0.0, 0.0) input
let mean = sum / count
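For what it's worth, standard deviation can be folded in the same way if the state also carries the sum of squares. The sketch below assumes the population (divide by n) definition and a non-empty input; the sum-of-squares formula is known to lose precision on data with a large mean and small spread, so treat it as an illustration of the fold style rather than a numerically robust implementation.

```fsharp
// Accumulates count, sum and sum of squares in one pass over the data.
let meanAndStandardDeviation (data: seq<float>) =
    let n, s, ss =
        data |> Seq.fold (fun (n, s, ss) v -> (n + 1.0, s + v, ss + v * v)) (0.0, 0.0, 0.0)
    let mean = s / n
    // Population variance as E[x^2] - (E[x])^2, clamped at 0 against rounding error.
    let variance = max 0.0 (ss / n - mean * mean)
    (mean, sqrt variance)

let m, sd = meanAndStandardDeviation [ 2.0; 4.0; 4.0; 4.0; 5.0; 5.0; 7.0; 9.0 ]
printfn "%g %g" m sd   // 5 2
```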
What we're talking about here is a function with the following signature:
(seq<'a> -> 'b) * (seq<'a> -> 'c) -> seq<'a> -> ('b * 'c)
There is no straightforward way that I can think of that will achieve the above using a single iteration of the sequence if that is the signature of the functions. Well, no way that is more efficient than:
let magicFunc (f1:seq<'a>->'b, f2:seq<'a>->'c) (s:seq<'a>) =
    let cached = s |> Seq.cache
    (f1 cached, f2 cached)
That ensures a single iteration of the sequence itself (perhaps there are side effects, or it's slow), but does so by essentially caching the results. The cache is still iterated another time. Is there anything wrong with that? What are you trying to achieve?
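One way to convince yourself that the underlying sequence really is enumerated only once is to count the elements pulled from it. In this sketch the `pulls` counter, the `source` sequence and the two helper functions are made up for the demonstration; `magicFunc` is repeated so the snippet stands alone.

```fsharp
let magicFunc (f1: seq<'a> -> 'b, f2: seq<'a> -> 'c) (s: seq<'a>) =
    let cached = Seq.cache s
    (f1 cached, f2 cached)

// Counts how many elements are pulled from the underlying source.
let pulls = ref 0
let source =
    seq {
        for i in 1 .. 5 do
            pulls.Value <- pulls.Value + 1
            yield float i
    }

// Two "black box" consumers of a sequence.
let average (xs: seq<float>) = Seq.average xs
let largest (xs: seq<float>) = Seq.max xs

let avg, top = magicFunc (average, largest) source
printfn "avg=%g max=%g pulls=%d" avg top pulls.Value
```

Both consumers walk all five elements, yet the counter ends at 5 rather than 10, because the second walk is served from the cache.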
Let's suppose I have n arrays, where n is a variable (some number greater than 2, usually less than 10).
Each array has k elements.
I also have an array of length n that contains a set of weights that dictate how I would like to linearly combine all the arrays.
I am trying to create a high performance higher order function to combine these arrays in F#.
How can I do this, so that I get a function that takes an array of arrays (arrs is a sample) and a weights array (weights), and then computes a weighted sum based on the weights?
let weights = [|.6;.3;.1|]
let arrs = [| [|.0453;.065345;.07566;1.562;356.6|] ;
              [|.0873;.075565;.07666;1.562222;3.66|] ;
              [|.06753;.075675;.04566;1.452;3.4556|] |]
Thanks for any ideas.
Here's one solution:
let combine weights arrs =
    Array.map2 (fun w -> Array.map ((*) w)) weights arrs
    |> Array.reduce (Array.map2 (+))
EDIT
Here's some (much needed) explanation of how this works. Logically, we want to do the following:
Apply each weight to its corresponding row.
Add together the weight-adjusted rows.
The two lines above do just that.
We use the Array.map2 function to combine corresponding weights and rows; the way that we combine them is to multiply each element in the row by the weight, which is accomplished via the inner Array.map.
Now we have an array of weighted rows and need to add them together. We can do this one step at a time by keeping a running sum, adding each array in turn. The way we sum two arrays pointwise is to use Array.map2 again, using (+) as the function for combining the elements from each. We wrap this in an Array.reduce to apply this addition function to each row in turn, starting with the first row.
Hopefully this is a reasonably elegant approach to the problem, though the point-free style admittedly makes it a bit tricky to follow. However, note that it's not especially performant; doing in-place updates rather than creating new arrays with each application of map, map2, and reduce would be more efficient. Unfortunately, the standard library doesn't contain nice analogues of these operations which work in-place. It would be relatively easy to create such analogues, though, and they could be used in almost exactly the same way as I've done here.
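As a rough illustration of the in-place alternative, the sketch below allocates the result once and accumulates every weighted row into it with plain loops. `combineInPlace` is a hypothetical name, and the code assumes at least one row, rows of equal length, and one weight per row.

```fsharp
// Accumulates the weighted rows into a single result array, allocated once.
let combineInPlace (weights: float[]) (arrs: float[][]) =
    let result = Array.zeroCreate<float> arrs.[0].Length
    for r in 0 .. arrs.Length - 1 do
        let w = weights.[r]
        let row = arrs.[r]
        for c in 0 .. result.Length - 1 do
            result.[c] <- result.[c] + w * row.[c]
    result

let sampleWeights = [| 0.5; 0.25; 0.25 |]
let sampleRows = [| [| 2.0; 4.0 |]; [| 4.0; 8.0 |]; [| 8.0; 16.0 |] |]
printfn "%A" (combineInPlace sampleWeights sampleRows)   // [|4.0; 8.0|]
```

The mutation stays local to the function, so callers still see a pure function from inputs to a fresh array.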
Something like this did it for me:
let weights = [|0.6;0.3;0.1|]
let arrs = [| [|0.0453;0.065345;0.07566;1.562;356.6|] ;
              [|0.0873;0.075565;0.07666;1.562222;3.66|] ;
              [|0.06753;0.075675;0.04566;1.452;3.4556|] |]
let applyWeight x y = x * y
let rotate (arr:'a[][]) =
    Array.map (fun y -> (Array.map (fun x -> arr.[x].[y])) [|0..arr.Length - 1|]) [|0..arr.[0].Length - 1|]
let weightedarray = Array.map (fun x -> Array.map(applyWeight (fst x)) (snd x)) (Array.zip weights arrs)
let newarrs = Array.map Array.sum (rotate weightedarray)
printfn "%A" newarrs
By the way, the 0 preceding the decimal point in a float literal is necessary.
I would like to convolve a discrete signal with a discrete filter. The signal and filter are sequences of floats in F#.
The only way I can figure out how to do it is with two nested for loops and a mutable array to store the result, but it does not feel very functional.
Here is how I would do it non-functionally:
conv = double[len(signal) + len(filter) - 1]
for i = 1 to len(signal)
    for j = 1 to len(filter)
        conv[i + j] = conv[i + j] + signal(i) * filter(len(filter) - j)
I don't know F#, but I'll post some Haskell and hopefully it will be close enough to use. (I only have VS 2005 and an ancient version of F#, so I think it would be more confusing to post something that works on my machine)
Let me start by posting a Python implementation of your pseudocode to make sure I'm getting the right answer:
def convolve(signal, filter):
    conv = [0 for _ in range(len(signal) + len(filter) - 1)]
    for i in range(len(signal)):
        for j in range(len(filter)):
            conv[i + j] += signal[i] * filter[-j-1]
    return conv
Now convolve([1,1,1], [1,2,3]) gives [3, 5, 6, 3, 1]. If this is wrong, please tell me.
The first thing we can do is turn the inner loop into a zipWith; we're essentially adding a series of rows in a special way, in the example above: [[3,2,1], [3,2,1], [3,2,1]]. To generate each row, we'll zip each i in the signal with the reversed filter:
makeRow filter i = zipWith (*) (repeat i) (reverse filter)
(Note: according to a quick google, zipWith is map2 in F#. You might have to use a list comprehension instead of repeat)
Now:
makeRow [1,2,3] 1
=> [3,2,1]
makeRow [1,2,3] 2
=> [6,4,2]
To get this for all i, we need to map over signal:
map (makeRow filter) signal
=> [[3,2,1], [3,2,1], [3,2,1]]
Good. Now we just need a way to combine the rows properly. We can do this by noticing that combining is adding the new row to the existing array, except for the first element, which is stuck on front. For example:
[[3,2,1], [6,4,2]] = 3 : [2 + 6, 1 + 4] ++ [2]
// or in F#
[[3; 2; 1]; [6; 4; 2]] = 3 :: [2 + 6; 1 + 4] @ [2]
So we just need to write some code that does this in the general case:
combine (front:combinable) rest =
    let (combinable',previous) = splitAt (length combinable) rest in
    front : zipWith (+) combinable combinable' ++ previous
Now that we have a way to generate all the rows and a way to combine a new row with an existing array, all we have to do is stick the two together with a fold:
convolve signal filter = foldr1 combine (map (makeRow filter) signal)
convolve [1,1,1] [1,2,3]
=> [3,5,6,3,1]
So that's a functional version. I think it's reasonably clear, as long as you understand foldr and zipWith. But it's at least as long as the imperative version and like other commenters said, probably less efficient in F#. Here's the whole thing in one place.
makeRow filter i = zipWith (*) (repeat i) (reverse filter)
combine (front:combinable) rest =
    front : zipWith (+) combinable combinable' ++ previous
  where (combinable',previous) = splitAt (length combinable) rest
convolve signal filter = foldr1 combine (map (makeRow filter) signal)
Edit:
As promised, here is an F# version. This was written using a seriously ancient version (1.9.2.9) on VS2005, so be careful. Also I couldn't find splitAt in the standard library, but then I don't know F# that well.
open List
let gen value = map (fun _ -> value)
let splitAt n l =
    let rec splitter n l acc =
        match n,l with
        | 0,_ -> rev acc,l
        | _,[] -> rev acc,[]
        | n,x::xs -> splitter (n - 1) xs (x :: acc)
    splitter n l []
let makeRow filter i = map2 ( * ) (gen i filter) (rev filter)
let combine (front::combinable) rest =
    let combinable',previous = splitAt (length combinable) rest
    front :: map2 (+) combinable combinable' @ previous
// fold1_right is the old name of what is now List.reduceBack
let convolve signal filter =
    fold1_right combine (map (makeRow filter) signal)
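The listing above targets a very old F# release. Under a current compiler the same idea might be written as in the sketch below, which assumes `List.splitAt` and `List.reduceBack` from FSharp.Core 4.0 or later, a non-empty signal, and float lists. It is a port of the approach, not the original author's code.

```fsharp
// Scale the reversed filter by one signal sample.
let makeRow (filter: float list) (x: float) =
    List.rev filter |> List.map (fun f -> x * f)

// Overlay a new row on the accumulated result, shifted by one position.
let combine (row: float list) (rest: float list) =
    match row with
    | [] -> rest
    | front :: combinable ->
        let overlap, previous = List.splitAt (List.length combinable) rest
        front :: (List.map2 (+) combinable overlap @ previous)

// Fold the rows together from the right, as foldr1 does in the Haskell version.
let convolve (signal: float list) (filter: float list) =
    List.reduceBack combine (List.map (makeRow filter) signal)

printfn "%A" (convolve [1.0; 1.0; 1.0] [1.0; 2.0; 3.0])   // [3.0; 5.0; 6.0; 3.0; 1.0]
```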
Try this function:
let convolute (signal: float[]) (filter: float[]) =
    let n, m = Array.length signal, Array.length filter
    Array.init (n + m - 1) (fun i ->
        // j indexes the signal; i - j indexes the reversed filter
        seq { max 0 (i - m + 1) .. min i (n - 1) }
        |> Seq.sumBy (fun j -> signal.[j] * filter.[m - (i - j) - 1]))
It's probably not the nicest functional solution, but it should do the job. I doubt there exists a purely functional solution that will match the imperative one for speed, however.
Hope that helps.
Note: The function is currently untested (though I've confirmed it compiles). Let me know if it doesn't quite do what it should. Also, observe that the i and j variables do not refer to the same things as in your original post.
Indeed, you generally want to avoid loops (plain, nested, whatever) and anything mutable in functional programming.
There happens to be a very simple solution in F# (and probably almost every other functional language):
let convolution = Seq.zip seq1 seq2
The zip function simply combines the two sequences into one of pairs containing the element from seq1 and the element from seq2. As a note, there also exist similar zip functions for the List and Array modules, as well as variants for combining three lists into triples (zip3). If you want to more generally zip (or "convolute") n lists into a list of n-tuples, then you'll need to write your own function, but it's pretty straightforward.
(I've been going by this description of convolution by the way - tell me if you mean something else.)
In principle, it should be possible to use the (Fast) Fourier Transform, or the related (Discrete) Cosine Transform, to calculate the convolution of two functions reasonably efficiently. You calculate the FFT for both functions, multiply them, and apply the inverse FFT on the result.
The mathematical background is the convolution theorem.
That's the theory. In practice you'd probably best find a math library that implements it for you.