I am quite new to F# and still trying to decide on the best structure for my financial (backtesting) program.
As data are immutable, I suspect that "heavy", all-in-one structures might not be ideal.
Here is what I try to achieve:
A backtesting engine which creates a strategy every business day.
The strategy consists of a few instruments, each with a seq/list of trades (trading date / quantity / price).
I then run calculations (value, risks, etc.) daily on all those positions for each portfolio. I also add trades to each instrument each day as I adjust the position.
How I first constructed it:
List or seq of dates: [D1 .. Dn]
List or seq of portfolios Pi: [P1 .. Pn] (each with several instruments). Each portfolio starts its life at a different Di.
For each portfolio, I have some daily trades on the instruments, which I use when I compute value, profit and losses, risks, etc.
What I have used for now (very simplified):
open System
type Instrument1 = {
    // some specifications
}
type Instrument2 = {
    // some specifications
}
type Instrument =
    | Inst1 of Instrument1
    | Inst2 of Instrument2
type Trade = {
    Dt : DateTime
    Qty : float
    Price : float }
type Portfolio = {
    InitDate : DateTime // one of the Di above
    Inst : Instrument
    Trades : Trade seq }
type BackTesting = {
    Dates : DateTime seq
    Port : Portfolio seq }
And then I create a seq (Dates) of seq (Portfolio) of seq (Instrument) showing let's say P&L.
However, for each portfolio Pi I iterate over all dates to check whether I need to adjust the portfolio and add a trade to the trade list. That means that every day, for every portfolio, for every instrument, I create a new (immutable) BackTesting. I believe this way of reasoning is much more OOP than FP, but I am a bit lost on the proper patterns to use (the F# books I have used are not very clear on which data structures work best for FP, or I did not really understand them).
I might not be very clear, but if anyone can point me in a direction to look into (or to any useful documentation on the issue), please do not hesitate. Thanks a lot for your help.
Since you are starting with F#, my suggestion is not to worry too much about programming in a purely functional way. If you come from an imperative style of programming, it may be too much of a change and you may get discouraged. The shift from imperative to functional style takes time, and it's gradual.
The good thing is F# lets you be imperative too!
So program like you would in other languages:
Use global mutable variables when they suit you best.
Use for and while loops.
Did you know that array elements are mutable?
As you progress you will learn the functional way. Some things are really easy to use right away:
Definitely use option, never null
Try using map, filter, choose over list, array or seq.
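For comparison, here is a minimal sketch of the daily loop from the question written in a functional shape. It is Python rather than F#: `None` plays the role of F#'s option, `functools.reduce` plays the role of `List.fold`, and frozen dataclasses stand in for immutable records. The adjustment rule (buy one unit at a fixed price of 100 on every date from the portfolio's start date) is made up purely for illustration. Each day produces new portfolio values instead of mutating old ones, and the state threaded through the loop is just the collection of portfolios, not one big BackTesting value rebuilt per instrument.

```python
from dataclasses import dataclass, replace
from datetime import date
from functools import reduce
from typing import Optional, Tuple


@dataclass(frozen=True)
class Trade:
    dt: date
    qty: float
    price: float


@dataclass(frozen=True)
class Portfolio:
    init_date: date
    trades: Tuple[Trade, ...] = ()  # immutable tuple of trades


def adjustment(p: Portfolio, d: date) -> Optional[Trade]:
    """Hypothetical rule: buy 1 unit at 100 on every date from the start date.
    Returning None means 'no trade today' (the option-type idea)."""
    if d < p.init_date:
        return None
    return Trade(d, 1.0, 100.0)


def step(portfolios: Tuple[Portfolio, ...], d: date) -> Tuple[Portfolio, ...]:
    """Advance one day: each portfolio is returned unchanged or replaced by a
    copy with one extra trade. Existing values are never mutated."""
    def update(p: Portfolio) -> Portfolio:
        t = adjustment(p, d)
        return p if t is None else replace(p, trades=p.trades + (t,))
    return tuple(update(p) for p in portfolios)


def run(dates, portfolios: Tuple[Portfolio, ...]) -> Tuple[Portfolio, ...]:
    # reduce threads the portfolio state through the dates, like a fold
    return reduce(step, dates, portfolios)


def position(p: Portfolio) -> float:
    # net quantity held, derived from the trade list
    return sum(t.qty for t in p.trades)
```

Calling `run` on three consecutive dates with one portfolio starting on the first date and another on the second gives positions of 3.0 and 2.0, while the starting portfolios still have no trades.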
In time you will naturally gravitate more towards the functional style, but you don't have to jump all at once. One of the best resources to get started is https://fsharpforfunandprofit.com/; it's full of very good articles, slides and videos, all explained clearly.
Good luck!
There does not seem to be an "easy" way (such as in R or Python) to create interaction terms between dummy variables in gretl.
Do we really need to code them manually? That will be difficult for factors with many levels. Here is a minimal example of manual coding:
open credscore.gdt
SelfemplOwnRent=OwnRent*Selfempl
# model 1
ols Acc 0 OwnRent Selfempl SelfemplOwnRent
Now my manual interaction term will not work for factors with many levels and in fact does not even do the job for binary variables.
Thanks,
ML
One way of doing this is to use lists. Use the dummify-command for generating dummies for each level and the ^-operator for creating the interactions. Example:
open griliches.gdt
discrete med
list X = dummify(med)
list D = dummify(mrt)
list INT = X^D
ols lw 0 X D INT
The command discrete turns your variable into a discrete variable, which allows you to use dummify (this step is not necessary if your variable is already discrete). All interaction terms are then stored in the list INT, and you can easily access them in the following ols command.
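To see what the list INT contains, here is a rough Python sketch of the two steps. The helper names mirror gretl's but they are hypothetical Python functions, not gretl's implementation. `dummify` builds one 0/1 column per distinct level and drops the lowest level; that drop is an assumption made here to avoid collinearity with the constant, not a claim about gretl's default. `interact` forms every pairwise product of a column in X with a column in D, which is how the answer describes X^D, so INT has (number of X dummies) × (number of D dummies) columns.

```python
from itertools import product
from typing import Dict, List, Tuple


def dummify(values: List[int], drop_first: bool = True) -> Dict[int, List[int]]:
    """One 0/1 column per distinct level. Dropping the lowest level is an
    assumption of this sketch, not necessarily gretl's default."""
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}


def interact(X: Dict[int, List[int]],
             D: Dict[int, List[int]]) -> Dict[Tuple[int, int], List[int]]:
    """Every pairwise product of a column in X with a column in D."""
    return {
        (a, b): [x * d for x, d in zip(X[a], D[b])]
        for a, b in product(X, D)  # iterating a dict yields its keys
    }
```

For example, with `med = [1, 2, 2, 3]` and `mrt = [0, 1, 0, 1]`, `interact(dummify(med), dummify(mrt))` has the two columns `(2, 1)` and `(3, 1)`.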
@Markus Loecher, on your second question:
You can always use the rename command to rename a series. So you would have to loop over all elements in list INT to do so. However, I would rather suggest to rename both input series, in the above example mrt and med respectively, before computing the interaction terms if you want shorter series names.
The .weekday component starts at 1 (Sunday = 1, Monday = 2, etc.), and I'm curious whether anyone knows why. In programming, numbering usually seems to start at 0.
The reason for zero-based indexing in programming dates back to the time when programs were written in machine language or assembly code. It is a reflection of the base+displacement capability of memory access from CPU registers. It was kept in low-level programming languages (such as C) that were essentially a bridge to assembly code. Zero-based indexing also makes index manipulation much simpler when a one-dimensional array (or memory block) is processed as a multidimensional matrix. That being said, it is still just a convention. Some languages (such as Pascal) use one-based indexing, and normal human beings don't start numbering things at zero.
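A small Python sketch of the two points above: with zero-based indices an element's address is simply base + index × size, and flattening a (row, col) position into a 1-D offset, or recovering it, needs no ±1 corrections, whereas the one-based versions do. The function names are illustrative only.

```python
def element_address(base: int, index: int, elem_size: int) -> int:
    # base + displacement: index 0 lands exactly on the base address
    return base + index * elem_size


def flat_index_zero_based(row: int, col: int, ncols: int) -> int:
    # 0-based (row, col) -> 0-based offset, no correction terms
    return row * ncols + col


def unflatten_zero_based(i: int, ncols: int):
    # the inverse is a plain divmod
    return divmod(i, ncols)


def flat_index_one_based(row: int, col: int, ncols: int) -> int:
    # 1-based (row, col) -> 1-based position: needs a -1 correction
    return (row - 1) * ncols + col


def unflatten_one_based(i: int, ncols: int):
    # the inverse needs corrections on both sides
    return ((i - 1) // ncols + 1, (i - 1) % ncols + 1)
```

In a 4-column matrix, the element at zero-based (1, 2) has offset 6; the same element is one-based (2, 3) at position 7.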
I don't know the fundamental reason for weekday numbering starting at 1, but I strongly suspect that it is more consistent (and practical) to use with calendars, where day numbers within a month and months within a year are also 1-based. It would be very confusing to manipulate days and months as zero-based indexes. Given this, weekdays should follow the same convention.
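Conventions really do differ between APIs. As an illustration, Python's `date.weekday()` is zero-based (Monday = 0) and `date.isoweekday()` is one-based (Monday = 1 ... Sunday = 7). The sketch below converts to the Sunday = 1 convention from the question and then subtracts 1 where a zero-based list index is needed. The helper names are hypothetical.

```python
from datetime import date

DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]


def sunday_first_weekday(d: date) -> int:
    """Sunday = 1 ... Saturday = 7.
    isoweekday() gives Monday = 1 ... Sunday = 7, so shift by one, mod 7."""
    return d.isoweekday() % 7 + 1


def day_name(d: date) -> str:
    # subtract 1 to turn the 1-based weekday into a 0-based list index
    return DAY_NAMES[sunday_first_weekday(d) - 1]
```

For example, 7 January 2024 was a Sunday, so `sunday_first_weekday(date(2024, 1, 7))` is 1 while `date(2024, 1, 7).weekday()` is 6.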
function profit(){
    int totalSales=0;
    for (int i=0; i<12;i++) // compute yearly sales
        totalSales+=montlysales[i];
    return get_profit_from_sales(totalSales);
}
So I've already determined that the 12 in the for loop should be a constant instead of a literal integer, and that montlysales should be passed into the function as a parameter, so that a check can be run to see whether the length of the sales array equals the number of months, which is also twelve.
I'm not sure whether those are all the violations of the principles, because I feel the last line
return get_profit_from_sales(totalSales)
is wrong. It's really bothering me because I can't figure out why, and I think I might have missed something else.
Can anyone help me verify?
Summary: refactor the call to the other function out of this function, so that this function is pure and does only one thing. That reduces complexity and improves your ability to reason abstractly about your program and its correctness.
Your spidey sense is tingling and you should trust it - you are correct, but what is wrong is subtle.
Routines are best when they do one thing, and one thing only. That purity of purpose serves the prime imperative, management of complexity: simpler routines let our brains juggle more things at once. You can just look at the function and know what it does, without having to say "it totals the sales, but it also calls another function at the end", which clouds its "mission".
This is also part of functional programming, and it is where I feel languages have had to adapt to implement the prime imperative described in Code Complete. One of the tenets of functional programming is "no side effects", which is similar to "one mission" or "one purpose". What I've done to your code can also be seen as making it more functional: just inputs and outputs, with nothing going in or out via any other path.
Note also that function get_profit() reads like pseudocode, making it somewhat self-documenting, and you don't have to delve into any of the functions to understand what the function does or how it does it (ideally).
So, here is my idea of what I explained above (loosely coded, and not checked!).
function get_total_sales(int montlysales[]){
    int totalSales=0;
    for (int i=0; i<12;i++) // compute yearly sales
        totalSales+=montlysales[i];
    return totalSales;
}
function calc_profit(int all_sales, int all_commissions, int all_losses){
    // your calculations here
    int profit = all_sales - all_commissions - all_losses; // ... etc.
    return profit;
}
function get_profit(){
    int montlysales[] = read_from_disk();
    int total_sales = get_total_sales(montlysales);
    int tot_commissions = get_tot_commissions();
    int tot_losses = get_tot_losses();
    int profit_made = calc_profit(total_sales, tot_commissions, tot_losses);
    return profit_made;
}
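A runnable Python sketch of the same refactoring, assuming the three data sources are passed in as functions. The `read_sales`, `read_commissions` and `read_losses` parameters are hypothetical stand-ins for `read_from_disk()`, `get_tot_commissions()` and `get_tot_losses()` above. It also adds the length check mentioned in the question, using a named constant instead of a bare 12.

```python
from typing import Callable, Sequence

MONTHS_PER_YEAR = 12


def get_total_sales(monthly_sales: Sequence[int]) -> int:
    """Pure: validates and sums one year of monthly sales, nothing else."""
    if len(monthly_sales) != MONTHS_PER_YEAR:
        raise ValueError(
            f"expected {MONTHS_PER_YEAR} months, got {len(monthly_sales)}")
    return sum(monthly_sales)


def calc_profit(all_sales: int, all_commissions: int, all_losses: int) -> int:
    """Pure: just the profit formula."""
    return all_sales - all_commissions - all_losses


def get_profit(read_sales: Callable[[], Sequence[int]],
               read_commissions: Callable[[], int],
               read_losses: Callable[[], int]) -> int:
    """Orchestrator: the only function that touches external data, and only
    through the injected (hypothetical) reader functions."""
    total_sales = get_total_sales(read_sales())
    return calc_profit(total_sales, read_commissions(), read_losses())
```

Because `get_total_sales` and `calc_profit` only map inputs to outputs, they can be tested without any disk access, for example `get_profit(lambda: [10] * 12, lambda: 20, lambda: 5)` evaluates to 95.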
I read Code Complete about once a year, as coding is truly subtle at times, because it is so multidimensional. This has been very helpful to me. Regards - Stephen
It is nice to have a wrapper for every primitive value, so that there is no way to misuse it. I suspect this convenience comes at a price. Is there a performance drop? Should I rather use bare primitive values instead if the performance is a concern?
Yes, there's going to be a performance drop when using single-case union types to wrap primitive values. Union cases are compiled into classes, so you'll pay the price of allocating (and later, collecting) the class and you'll also have an additional indirection each time you fetch the value held inside the union case.
Depending on the specifics of your application, and how often you'll incur these additional overheads, it may still be worth doing if it makes your code safer and more modular.
I've written a lot of performance-sensitive code in F#, and my personal preference is to use F# unit-of-measure types whenever possible to "tag" primitive types (e.g., ints). This keeps them from being misused (thanks to the F# compiler's type checker) but also avoids any additional run-time overhead, since the measure types are erased when the code is compiled. If you want some examples of this, I've used this design pattern extensively in my fsharp-tools projects.
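Python has nothing like F# units of measure, but `typing.NewType` gives a loose analogue of an erased tag: a static type checker treats `UserId` as distinct from `int`, while at runtime `UserId(42)` just returns the integer 42. A wrapper class, by contrast, creates an extra object per value, which mirrors the allocation cost described above for single-case unions. This only illustrates erasure versus wrapping; the names are hypothetical, and it says nothing about F#'s actual performance.

```python
from dataclasses import dataclass
from typing import NewType

# Erased tag: distinct for the type checker, a plain int at runtime.
UserId = NewType("UserId", int)


# Real wrapper: every value is a separate object holding an int.
@dataclass(frozen=True)
class WrappedId:
    value: int


def lookup(uid: UserId) -> str:
    # a checker would flag lookup(42) with a bare int, but it runs either way
    return f"user-{uid}"
```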
Jack has much more experience with writing high-performance F# code than I do, so I think his answer is absolutely right (I also think the idea to use units of measure is pretty interesting).
Just to put things in context, I wrote a really basic test (using just F# Interactive, so things may differ in Release mode) to compare the performance. It allocates an array of wrapped (vs. non-wrapped) int values. This is probably the scenario where non-wrapped types are really a good choice, because the array will be just a contiguous block of memory.
#time
// Using a simple wrapped `int` type
type Foo = Foo of int
let foos = Array.init 1000000 Foo
// Add all 'foos' 1000 times and ignore the results
for i in 1 .. 1000 do
    let mutable total = 0
    for Foo f in foos do total <- total + f
On my machine, the for loop takes around 1050 ms on average. Now, the unwrapped version:
let bars = Array.init 1000000 id
for i in 1 .. 1000 do
    let mutable total = 0
    for b in bars do total <- total + b
On my machine, this takes about 700 ms.
So there is certainly some performance penalty, but perhaps smaller than one would expect: the wrapped loop takes roughly 1.5 times as long, i.e. unwrapping saves about a third of its time. And this is a test that does virtually nothing except unwrap values in a loop. In code that does something useful, the relative overhead would be a lot smaller.
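The two percentages that can be read off the measured timings (1050 ms wrapped, 700 ms unwrapped) differ only in which run is taken as the baseline:

$$\frac{1050 - 700}{700} = 0.50 \qquad\text{vs.}\qquad \frac{1050 - 700}{1050} \approx 0.33$$

The first is how much longer the wrapped loop takes than the unwrapped one; the second is the share of the wrapped run's time that unwrapping saves, which is where a figure of about 33% comes from.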
This may be an issue if you're writing high-performance code: something that processes lots of data, or something that takes some time and that users run frequently (like compilers and tools). On the other hand, if your application is not performance-critical, this is not likely to be a problem.
From F# 4.1 onwards, adding the [<Struct>] attribute to suitable single-case discriminated unions can improve performance and reduce the number of memory allocations, because the union is then compiled to a value type instead of a heap-allocated class.
For instance, given the following class:
class Foo{
String id;
Foo(this.id);
}
I want to have some sort of collection of Foos, and then be able to find any Foo by its id. I want to compare these two ways of accomplishing this:
With a Map:
var foosMap = <String, Foo>{"foo1": new Foo("foo1"), "foo2": new Foo("foo2")};
var foo2 = foosMap["foo2"];
With a List:
var foosList = <Foo>[new Foo("foo1"), new Foo("foo2")];
var foo2 = foosList.singleWhere((i) => i.id == "foo2");
Is the first way (with a Map) better in terms of performance? Are there any other considerations to take into account?
It really depends on the number of items you're searching through. In big-O notation, retrieving a value from a hash map is O(1) (constant time) on average, while searching linearly through a list is O(n) (linear time). That means the lookup time for a map stays the same no matter how many elements are in it, but the lookup time for a list grows with the number of elements.
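A rough Python sketch of the two lookups, with a dict standing in for the Map and a list for the List. The `Foo` dataclass mirrors the question's class; this is an illustration, not Dart code. The list version also reports how many ids it compared, which makes the linear growth visible.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Foo:
    id: str


def find_in_list(foos: List[Foo], target: str) -> Tuple[Foo, int]:
    """Linear search: returns the match and how many ids were compared."""
    for comparisons, foo in enumerate(foos, start=1):
        if foo.id == target:
            return foo, comparisons
    raise KeyError(target)


def find_in_map(foos_by_id: Dict[str, Foo], target: str) -> Foo:
    """Hash lookup: average constant time regardless of the number of entries."""
    return foos_by_id[target]
```

With 1000 items, finding the last one in the list takes 1000 comparisons, while the dict lookup hashes the key once.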
Because of this, a lot of programmers use hash maps for everything, even though for very small sets lists are often faster. If you look at performance-critical code that does lookups, you'll sometimes see special cases that switch to lists instead of maps for small sets. The only way to know whether this is a good strategy is to do performance testing.
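Since measuring is the only way to be sure, a minimal Python timing harness might look like the following. It uses plain strings and `in` membership tests as stand-ins for the Foo lookups, and its results depend on the machine, the collection size and the interpreter, so no particular winner is assumed.

```python
import timeit


def time_lookups(n: int, repeat: int = 10_000) -> dict:
    """Seconds spent looking up the last of n ids, `repeat` times,
    in a list (linear scan) versus a dict (hash lookup)."""
    ids = [f"foo{i}" for i in range(n)]
    as_list = list(ids)
    as_dict = {k: k for k in ids}
    target = ids[-1]  # worst case for the linear scan
    return {
        "list": timeit.timeit(lambda: target in as_list, number=repeat),
        "dict": timeit.timeit(lambda: target in as_dict, number=repeat),
    }
```

Running it for a few sizes (say 5, 50 and 5000) shows where, on a given machine, the crossover between the two approaches lies.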
Speed isn't everything though, and I prefer the clarity of the map syntax in many situations, assuming you have a map already. If you would have to build a map just to perform a lookup, then singleWhere() or firstWhere() are great.