Trying to understand this profiling result from F# code

A quick screenshot with the point of interest:
There are 2 questions here.
This happens in a tight loop. The 12.8% code is this:
{ this with
    Side = side
    PositionPrice = position'
    StopLossPrice = sl'
    TakeProfitPrice = tp'
    Volume = this.Volume + this.Quantity * position' }
This object is passed around a lot and has 23 fields so it's not tiny. It looks like immutability is great for stable code, but it's horrible for performance.
Since this recursive loop is run in parallel, I need to store its context variables in an object.
I am looking for a general idea of what makes sense, not something specific to that code because I have a few tight loops with a bunch of math which I need to profile as well. I am sure I'll find the same pattern in several places.
The flaw here is that I store both the context for the calculations and its variables in a single type that gets passed through the loop. As the variable fields get updated, the whole object has to be recreated.
What would make sense here (in general, for this type of situation)?
1. Make the fields that can change mutable. In this case, that means keeping the type as is (23 fields) and making some fields mutable (only 5 fields get regularly changed).
2. Move the mutable fields to their own type, so there is a general context object and one holding all the variables. In this case, that means a context with the remaining 18 fields and a separate 5-field type.
3. Make the mutable fields plain variables and move them out of the type. In this case, these 5 fields would be passed as arguments through the recursive loop?
and for the second question:
I have no idea what the 10.0% line with get_Tag is. I have nothing called 'Tag' in the code, so I assume that's a .NET internal thing.
I have a type called Side and there is a field with the same name used in the loop, but what is the 'Tag' part?

What I would suggest is not to modify your existing immutable type at all. Instead, create a new type with mutable fields that is only used within your tight loop. If the type leaves that loop, convert it back to your immutable type (assuming you don't need a copy to go through the rest of your program with every iteration).
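A minimal sketch of that idea, assuming your real context type has the 23 fields you describe (the type and field names below are invented for illustration, with only the five frequently changing fields shown):

type Side = Buy | Sell

type Context =                      // stands in for the real 23-field immutable type
    { Side : Side
      PositionPrice : decimal
      StopLossPrice : decimal
      TakeProfitPrice : decimal
      Volume : decimal
      Quantity : decimal }

type LoopState =                    // mutable state used only inside the hot loop
    { mutable Side : Side
      mutable PositionPrice : decimal
      mutable StopLossPrice : decimal
      mutable TakeProfitPrice : decimal
      mutable Volume : decimal }

let toLoopState (ctx: Context) : LoopState =
    { Side = ctx.Side
      PositionPrice = ctx.PositionPrice
      StopLossPrice = ctx.StopLossPrice
      TakeProfitPrice = ctx.TakeProfitPrice
      Volume = ctx.Volume }

let toContext (ctx: Context) (s: LoopState) : Context =
    { ctx with
        Side = s.Side
        PositionPrice = s.PositionPrice
        StopLossPrice = s.StopLossPrice
        TakeProfitPrice = s.TakeProfitPrice
        Volume = s.Volume }

// Inside the loop each iteration mutates fields in place instead of copying
// all 23 fields, e.g.  state.Volume <- state.Volume + ctx.Quantity * position'

Convert once on the way in, mutate freely inside the loop, and convert back once on the way out; the copy-and-update cost is then paid twice per loop rather than once per iteration.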

get_Tag in this case is likely the auto-generated get-only property on a discriminated union; it's just how the F# compiler represents this sort of type in the CLR. The property is most easily seen when looking at F# code from C#; here's a great page on F# decompiled:
https://fsharpforfunandprofit.com/posts/fsharp-decompiled/#unions
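For instance, a union like the following (a sketch, since the actual cases of your Side type aren't shown) compiles to a class exposing an int-valued, get-only Tag property identifying the case:

type Side =
    | Buy
    | Sell

// Seen from C#, this is roughly a class with a Tag property:
//   int tag = side.Tag;   // 0 for Buy, 1 for Sell
// Pattern matches compile down to reads of that property, which is what the
// profiler reports as time spent in get_Tag.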
For the performance issues I can only offer some suggestions:
If you can constrain the context object to your code only, then try making a mutable version and see what effect it has.
You mention that the context object is quite large; is it possible to split it up?

Related

In javac's source code, what is the purpose of traversing from a type to a symbol to its type?

In the source code for javac, I have seen idioms of this form:
Type t = ...;
// Now let's say I want to, say, get its supertype. The idiom seems to be:
Object o = t.tsym.type.supertype_field;
// ...rather than:
Object o = t.supertype_field;
I understand that certain Types and Symbols are circular dependencies. That is, every non-error ClassType, for example, has a ClassSymbol, and every non-error ClassSymbol has a ClassType, even though logically types precede symbols (during construction everything is twiddled to point to each other).
I also understand hazily that there may be various errors during compilation such that a given Type may actually be an ErrorType and I suppose may have different information associated with it.
What kind of canonicalizing is being accomplished by saying t.tsym.type.something rather than just t.something?
If (to reduce mental load) I pretend that errors never exist (so all, say, ClassTypes have ClassSymbols and vice versa), when is there a substantive difference between accessing t.something and t.tsym.type.something?
There is a concrete example in the supertype(Type) method.

String lifetime management, in records

I am working on getting rid of shortstring.
One of the many places shortstring is currently used within our programs is in records.
A lot of these records are kept in AVL trees.
The AVL tree used is a generic one, holding a pointer to a number of bytes (ElemSize), which has worked well so far.
The memory for each record in the AVL tree is allocated with GetMem, and copied with Move.
However, with string being a pointer to a reference-counted structure, copying the memory back to a record no longer works, as the string referenced is often freed (automatically, by reference count).
With only a pointer and a size of the "data block", I assume it is not possible to have the reference count of the strings increased.
I'm looking for a way to get the reference count of the strings to be taken into account when storing the record in an AVL tree.
Can I pass the record type to the tree constructor, then cast the pointer to this type and thus get the references increased? Or a similar fix, where I can isolate the changes to primarily be in the AVL unit and calls to its constructor.
Current code for allocation of space to store the record in AVL; XData is a pointer to the record to be stored:
New(RootPtr); { create new memory space }
GetMem(RootPtr^.TreeData, ElemSize);
WITH RootPtr^ DO BEGIN
{ copy data }
Move(XData^, RootPtr^.TreeData^, ElemSize);
In essence the question you are asking is:
How can I allocate, copy and deallocate a record when all I know about its type is its size?
The simple answer is that you can use GetMem, Move and FreeMem provided that the record does not contain managed types. You wish to work with records that contain Delphi strings, which are managed. And so your current approach using GetMem and Move does not suffice.
There are plenty of ways to solve this. You could write your own code to do reference counting, so long as you knew where in the record the managed types were. I don't recommend this. You could make your user data be a class and use polymorphism to help.
The option I'd like to discuss continues to support records and indeed allows the user to choose whatever type they like. The reasoning is as follows:
If the type contains managed types, then operating on it requires knowledge of the type. If the tree is to be generic, then it cannot have that knowledge. Ergo, the knowledge must be supplied by the user of the tree.
This leads you to events. Let the tree offer events that the user can supply handlers for. The types would look like this:
type
PTreeNodeUserData = type Pointer;
TTreeNodeCreateUserDataEvent = function: PTreeNodeUserData of object;
TTreeNodeDestroyUserDataEvent = procedure(Data: PTreeNodeUserData) of object;
TTreeNodeCopyUserDataEvent = procedure(Source, Dest: PTreeNodeUserData) of object;
Then you can arrange for your tree to publish events with these types that the user can subscribe to.
The point being that this allows the user of the tree to supply the missing knowledge about the user data type.
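As a rough sketch of how the tree's insert routine might use such events (the FOnCreateUserData / FOnCopyUserData field names are illustrative, assumed to be fields of the tree class with the event types declared above, not part of your existing unit):

{ Inside the tree's insert routine, instead of unconditional GetMem/Move: }
if Assigned(FOnCreateUserData) then
  RootPtr^.TreeData := FOnCreateUserData()
else
  GetMem(RootPtr^.TreeData, ElemSize);

if Assigned(FOnCopyUserData) then
  FOnCopyUserData(XData, RootPtr^.TreeData)
else
  Move(XData^, RootPtr^.TreeData^, ElemSize);

The user's copy handler would then perform a typed assignment, e.g. PMyRec(Dest)^ := PMyRec(Source)^ (PMyRec being the user's own record pointer type), which lets the compiler maintain the string reference counts.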
One of the main benefits of using records is the simplicity with which they can be copied (without using Move). So your best solution is to simply replace Move with a normal assignment operator :=. This will correctly consider the reference counts for all managed types involved.
Is there a particular reason you're not using the normal assignment operator?
PS: You need to ensure that the memory for all managed types (including long strings) is correctly initialised and finalised. I suggest you do some additional reading on the Initialize and Finalize routines.
The tree is general; it can hold a given lump of data. I hoped I could extend the functionality without making a new tree class per record type.
In that case you need your "copy behaviour" to be variable depending on what it's working with. A couple of options:
If your tree is wrapped in a class you can easily modify it to use a callback event to perform the copy operation. (This option might be easiest even if you first have to work on encapsulating the tree in a class.)
Modify your nodes and/or data to be objects with polymorphic copy functionality. Then each subtype will know how to copy itself correctly, and you can write something along the lines of Root.TreeData := XData.CreateCopy;
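A minimal sketch of that second option (TNodeData, TMyData and the Name field are made-up names):

type
  TNodeData = class
  public
    function CreateCopy: TNodeData; virtual; abstract;
  end;

  TMyData = class(TNodeData)
  public
    Name: string;                       { managed field }
    function CreateCopy: TNodeData; override;
  end;

function TMyData.CreateCopy: TNodeData;
var
  Copy: TMyData;
begin
  Copy := TMyData.Create;
  Copy.Name := Name;                    { normal assignment keeps the refcount right }
  Result := Copy;
end;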
If you are working at such a low level, and don't want the compiler to help you, then you need to use PChar strings instead of regular strings.

Why should we use classes rather than records, or vice versa?

I've been using Delphi for quite some time now, but rather than coming from a CS background I have learnt "on the job" - mostly from my Boss, and augmented by bits and pieces picked up from the web, users guides, examples, etc.
Now my boss is old school, started programming using Pascal, and hasn't necessarily kept up-to-date with the latest changes to Delphi.
Just recently I've been wondering whether one of our core techniques is "wrong".
Most of our applications interface with MySQL. In general we will create a record with a structure to store data read from the DB, and these records will be stored in a TList. Generally we will have a unit that defines the various records that we have in an application, and the functions and procedures that seed and read the records. We don't use record procedures such as those outlined here.
After reviewing some examples I've started wondering whether we'd be better off using classes rather than records, but I'm having difficulty finding strong guidance either way.
The sort of thing that we are dealing with would be User information: Names, DOB, Events, Event Types. Or Timesheet information: Hours, Jobs, etc...
The big difference is that records are value types and classes are reference types. In a nutshell what this means is that:
For a value type, when you use assignment, a := b, a copy is made. There are two distinct instances, a and b.
For a reference type, when you use assignment, a := b, both variables refer to the same instance. There is only one instance.
The main consequence of this is what happens when you write a.Field := 42. For a record, the value type, the assignment to a.Field changes the value of the member in a, but not in b. That's because a and b are different instances. But for a class, since a and b both refer to the same instance, after executing a.Field := 42 you are safe to assert that b.Field = 42.
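A minimal Delphi sketch of that difference (TUserRec and TUserObj are made-up names):

program ValueVsReference;

type
  TUserRec = record
    Age: Integer;
  end;

  TUserObj = class
    Age: Integer;
  end;

var
  a, b: TUserRec;
  c, d: TUserObj;
begin
  b.Age := 30;
  a := b;              { value type: a copy is made, two independent instances }
  a.Age := 42;         { b.Age is still 30 }

  d := TUserObj.Create;
  d.Age := 30;
  c := d;              { reference type: both variables refer to one instance }
  c.Age := 42;         { d.Age is now 42 as well }
  d.Free;
end.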
There's no hard and fast rule that says that you should always use value types, or always use reference types. Both have their place. In some situations, it will be preferable to use one, and in other situations it will be preferable to use the other. Essentially the decision always comes down to a decision on what you want the assignment operator to mean.
You have an existing code base, and presumably programmers familiar with it, that has made particular choices. Unless you have a compelling reason to switch to using reference types, making the change will almost certainly lead to defects. And defects both in the existing code (switch to reference type changes meaning of assignment operator), and in code you write in the future (you and your colleagues have developed intuition as to meaning of assignment operator in specific contexts, and that intuition will break if you switch).
What's more, you state that your types do not use methods. A type that consists only of data, and has no methods associated with it is very likely best represented by a value type. I cannot say that for sure, but my instincts tell me that the original developers made the right choice.

When do FORTRAN subprograms save data, and when not?

Have a pretty simple function for taking the name of a month, "Jan", "Feb", etc. and converting to the number of the month:
function month_num(month_str)
character*(*) :: month_str
character*3 :: month_names(12)
integer :: ipos(1),location(12)
data month_names/'Jan','Feb','Mar','Apr','May','Jun','Jul','Aug', &
'Sep','Oct','Nov','Dec'/
where (month_names==month_str) location=1
ipos = maxloc(location)
month_num = ipos(1)
end function
And OK, yes, I know it's dangerous to not define "location" before using it.
Problem: During execution of the function, if input is OK, some value of "location" will be set to 1. And, to my surprise, when the function is called again, that value will still be equal to 1. And this, of course, really messes things up. So I figured I would fix it with a new line
data location/12*0/
And I got the same problem.
Finally, I put in
location = 0
just before the "where" statement, and that fixed everything.
So, I thought FORTRAN subprograms would not save data unless the variables were declared with the "SAVE" attribute. Also, with many compilers, you can invoke some sort of "static" option that will keep everything saved. I did neither of these here, but the "location" array was saved just the same. Can someone enlighten me on the rules of when FORTRAN saves data and when not? Thanks.
The value of a variable local to a procedure is preserved across invocations (i.e. it is SAVEd) in one of two ways:
The programmer specifies the SAVE attribute when declaring the variable, for example:
REAL, SAVE :: var1
The programmer initialises the variable upon declaration, for example
REAL :: var1 = 3.1415
This second, implicit, behaviour is one of the features of Fortran which seems designed to catch out the programmer, and not just beginners. Note that the value the variable has upon re-invocation is not, in the second example, 3.1415, but whatever value it had when the last invocation exited.
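A small sketch of the pitfall (hypothetical subroutine):

! counter is initialised in its declaration, so it implicitly gets the SAVE
! attribute; the initialisation happens only once, not on every call.
subroutine count_calls()
  implicit none
  integer :: counter = 0
  counter = counter + 1
  print *, 'call number ', counter
end subroutine count_calls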
It is common for compilers to behave as if a variable were SAVEd even when the programmer has not exercised either of these options, perhaps because the memory locations used by one invocation of a procedure happen not to be overwritten before the next invocation. But this behaviour is not to be relied upon.
The situation is slightly different for variables declared in modules. Again any variable with the SAVE attribute is saved but any other variable only retains its value while the module is use-associated with a program unit which has started executing but not finished. Again some compilers, and some executions of some programs, may behave as if the value of a module variable is preserved despite the module going out of scope but this is non-standard behaviour and not to be relied on.
This behaviour is scheduled to change in Fortran 2008 when variables defined in modules will acquire the SAVE attribute implicitly.
Personally I like to explicitly SAVE variables even when I am sure that they would get the attribute implicitly, it makes the code just a bit easier to understand next time round.

What is the summary of the differences in binding behaviour between Rebol 2 and 3?

The current in-depth documentation on variable binding targets Rebol 2. Could someone provide a summary of differences between Rebol 2 and 3?
There isn't really a summary anywhere, so let's go over the basics, perhaps a little more informally than Bindology. Let Ladislav write a new version of his treatise for R3 and Red; we'll just go over the basic differences, in order of importance.
Object and Function Contexts
Here's the big difference.
In R2, there were basically two kinds of contexts: Regular object contexts and system/words. Both had static bindings, meaning that once the bind function was run, the word binding pointed to a particular object with a real pointer.
The system/words context was able to be expanded at runtime to include new words, but all other objects weren't. Functions used regular object contexts, with some hackery to switch out the value blocks when you call the function recursively.
The self word was just a regular word that happened to be the first one in object contexts, with a display hack to not show the first word in the context; function contexts didn't have that word, so they didn't display the first regular word properly.
In R3, almost all of that is different.
In R3 there are also two kinds of contexts: Regular and stack-local. Regular contexts are used by objects, modules, binding loops, use, basically everything but functions, and they are expandable like system/words was (yes, "was", we'll get to that). The old fixed-length objects are gone. Functions use stack-local contexts, which (barring any bugs we haven't seen yet) aren't supposed to be expandable, because that would mess up stack frames. As with the old system/words you can't shrink contexts, because removing words from a context would break any bindings of those words out there.
If you want to add words to a regular context, you can use bind/new, bind/set, resolve/extend or append, or the other functions that call those, depending on what behavior you need. That is new behavior for the bind and append functions in R3.
Bindings of words to regular and stack-local contexts are static, as before. Value look-up is another matter. For regular contexts value look-up is pretty direct, done by a simple pointer indirection to a static block of value slots. For stack-local contexts the value block is linked to by the stack frame and referenced from there, so to find the right frame you have to do a stack walk that is O(stack-depth). See bug #1946 for details - we'll get into why later.
Oh, and self isn't a regular word anymore, it's a binding trick, a keyword. When you bind blocks of words to object or module contexts, it binds the keyword self which evaluates to be a reference to the context. However, there is an internal flag that can be set which says that a context is "selfless", which turns that self keyword off. When that keyword is turned off, you can actually use the word self as a field in your context. Binding loops, use and function contexts set the selfless flag for their contexts, and the selfless? function checks for that.
This model was refined and documented in a fairly involved CureCode flame war, much like R2's model was documented by a REBOL mailing list flame war back in 1999-2000. :-)
Functions vs. Closures
When I was talking about stack-local function contexts above, I meant the contexts used by function! type functions. R3 has a lot of function types, but most of them are native functions in one way or another, and native functions don't use these stack-local contexts (though they do get stack frames). The only function types that are for Rebol code are function! and a new closure! type. Closures are very different from regular functions.
When you create a function!, you're creating a function. It constructs a stack-local context, binds the code body to it, and bundles together the code body and the spec. When you call the function it makes a stack frame with a reference to the function's context and runs the code block. If it has to access words in the function context it does the stack walk to find the right frame and then gets the values from there. Fairly straightforward.
When you create a closure!, on the other hand, you create a function builder. It sets up the spec and function body pretty much the same as function!, but when you call a closure it makes a new regular selfless context, then does a bind/copy of the body, changing all references to the function context to be references to the new regular context in the copy. Then, when it does the copied body, all closure word references are as static as those of object contexts.
Another difference between the two is in how they behave before the function is running, while the function is running, and after the function is done running.
In R2, function! contexts still exist when the function isn't running, but the value block of the top-level call of the function still persists too. Only recursive calls get new value blocks; the top-level call keeps a persistent value block. Like I said, hackery. Worse, the top-level value block isn't cleared when the function returns, so you had better make sure that you aren't referencing anything sensitive, or anything that you want recycled, when the function returns (use the also function to clean up, that's what I made it for).
In R3, the function! contexts still exist when the function isn't running, but the value block doesn't exist at all. All function calls act like recursive calls did in R2, except better because it's designed that way all the way down, referencing the stack frame instead. The scope of that stack frame is dynamic (stalk a Lisp fan if you need the history of that), so as long as the function is running on the current stack (yes, "current", we'll get to that), you can use one of its words to get at the values of the most recent call of that function. Once all of the nested calls of the function return, there won't be any values in scope, and you'll just trigger an error (the wrong error, but we'll fix that).
There's also a useless restriction on binding to an out-of-scope function word that is on my todo list to fix soon. See bug #1893 for details.
For closure! functions, before the closure runs the context doesn't exist at all. Once the closure starts running, the context is created and exists persistently. If you call the closure again, or recursively, another persistent context is created. Any word that leaks from a closure only refers to the context created during that particular run of the closure.
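A tiny sketch of that consequence (hypothetical R3 example):

c: closure [x] [[x]]    ; the body returns a block whose x is bound per call
b1: c 10
b2: c 20
reduce b1               ; == [10] -- that call's context still exists
reduce b2               ; == [20] -- each call created its own persistent context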
You can't get a word bound to a function or closure context in R3 when the function or closure isn't running. For functions, this is a security issue. For closures, this is a definitional issue.
Closures were considered so useful that Ladislav and I both ported them to R2, independently at different times resulting in similar code, weirdly enough. I think Ladislav's version predated R3, and served as the inspiration for R3's closure! type; my version was based on testing the external behavior of that type and trying to replicate it in R2 for R2/Forward, so it's amusing that the solution for closure ended up being so similar to Ladislav's original, which I didn't see until much later. My version got included in R2 itself starting with 2.7.7, as the closure, to-closure and closure? functions, and the closure! word is assigned the same type value as function! in R2.
Global vs. Local Contexts
Here's where things get really interesting.
In Bindology, a fairly large part of the article was devoted to the distinction between the "global" context (which turned out to be system/words) and "local" contexts, a fairly important distinction for R2. In R3, that distinction is irrelevant.
In R3, system/words is gone. There is no one "global" context. All regular contexts are "local" in the sense that was meant in R2, which makes that meaning of "local" useless. For R3, we need a new set of terms.
For R3, the only difference that matters is whether contexts are task-relative, so the only useful meaning for "global" contexts are the ones that are not directly task-relative, and "local" contexts are the ones that are task-relative. A "task" in this case would be the task! type, which is basically an OS thread in the current model.
In R3, at the moment, the only things so far that are task-relative (barely) are the stack variables, which means that stack-relative function contexts are supposed to be task-relative too. This is why the stack walk is necessary, because otherwise we'd need to keep and maintain TLS pointers in every single function context. All regular contexts are global.
Another thing to consider is that according to the plan (which is mostly unimplemented so far), the user context system/contexts/user and system itself are also intended to be task-relative, so even by R3 standards they would be considered "local". And since system/contexts/user is basically the closest thing that R3 has to R2's system/words, that means that what scripts think of as being their "global" context is actually supposed to be task-local in R3.
R3 does have a couple of system global contexts called sys and lib, though they are used quite differently from R2's global context. Also, all module contexts are global.
It is possible (and common) for there to be globally defined contexts that are only referenced from task-local root references, so that would make those contexts in effect indirectly task-local. This is what usually happens when binding loops, use, closures or private modules are called from "user code", which basically means non-module scripts, which get bound to system/contexts/user. Technically, this is also the case for functions called from modules as well (since functions are stack-local) but those references often eventually get assigned to module words, which are global.
No, we don't yet have synchronization either. Still, that's the model that R3's design is supposed to eventually have, and partly does already. See the module binding article for more details.
As a bonus, R3 has a real symbol table now instead of using system/words as an ad-hoc symbol table. This means that the word limit R2 used to hit pretty quickly is effectively gone in R3. I'm not aware of any app that has reached the new limit or even determined how high the limit is, though it is apparently well over many million distinct symbols. We should check the source to figure that out, now that we have access to it.
LOAD and USE
Minor details. The use function initializes its words with none instead of leaving them unset. And because there is no "global" context the way there is in R2, load doesn't necessarily bind words at all. Which context load binds to depends on circumstances mentioned in the module binding article, though unless you specify otherwise it explicitly binds the words to system/contexts/user. And both are now mezzanine functions.
Spelling and Aliases
R3 is basically the same as R2 in this, word bindings are by default case-insensitive. The words themselves are case-preserving, and if you compare them using case-sensitive methods you will see differences between words that differ only in case.
In object or function contexts though, when a word is mapped to a value slot, then another word is bound to that context or looked up at runtime, words that differ only by case are considered to be effectively the same word and map to the same value slot.
However, it was found that explicitly created aliases made with the alias function where the spelling of aliased words differed in other ways than just by case broke object and function contexts really drastically. In R2 they resolved these problems in system/words, which made alias merely too awkward to use in anything other than a demo, rather than actively dangerous.
Because of this, we removed the externally visible alias function altogether. The internal aliasing facility still works because it only aliases words that would normally be considered equivalent for context lookup, which means that contexts don't break. But we now recommend that localization and the other tricks that alias was used for in demos, if never in practice, be done using the old-fashioned method of assigning values to another word with the new spelling.
Word Types
The issue! type is now a word type. You can bind it. So far, no-one has taken advantage of being able to bind issues, instead just using the increased speed of operations.
That's pretty much it, for the intentional changes. Most of the rest of the differences might be side effects of the above, or maybe even bugs or not-yet-implemented features. There might even be some R2-like behavior in R3 that is also the result of bugs or not-yet-implemented features. When in doubt, ask first.
