Linked-list representation of disjoint sets - omission in Intro to Algorithms text? - linked-list

Having had success with my last CLRS question, here's another:
In Introduction to Algorithms, Second Edition, p. 501-502, a linked-list representation of disjoint sets is described, wherein the following three fields are maintained for each list member:
set member
pointer to next object
pointer back to first object (the set representative).
Although linked lists could be implemented by using only a single "Link" object type, the textbook shows an auxiliary "Linked List" object that contains a pointer to the "head" link and the "tail" link. Having a pointer to the "tail" facilitates the Union(x, y) operation, so that one need not traverse all of the links in a larger set x in order to start appending the links of the smaller set y to it.
However, to obtain a reference to the tail link, it would seem that each link object needs to maintain a fourth field: a reference to the Linked List auxiliary object itself. In that case, why not drop the Linked List object entirely and use that fourth field to point directly to the tail?
Would you consider this an omission in the text?

I just opened the text and the textbook description seems fine to me.
From what I understand the data-structure is something like:
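struct LinkedListObject;  // forward declaration, since Set refers to it

struct Set {
    LinkedListObject *head;
    LinkedListObject *tail;
};

struct LinkedListObject {
    Value set_member;        // Value stands for whatever element type is stored
    Set *representative;     // back pointer to the owning Set
    LinkedListObject *next;
};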
The book I have (second edition) does not mention any "auxiliary" linked list structure. Can you post the relevant paragraph?
Doing a Union would be something like:
// No error checks.
Set * Union(Set *x, Set *y) {
    x->tail->next = y->head;
    x->tail = y->tail;
    LinkedListObject *tmp = y->head;
    while (tmp) {
        tmp->representative = x;
        tmp = tmp->next;
    }
    return x;
}
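To make the head/tail idea concrete, here is a minimal Python sketch of the same layout. The names Node, SetList, make_set, find_set and union are my own, not the textbook's, and the size field (used to append the smaller list to the larger one) is an assumption added for illustration.

```python
class Node:
    """One list element: the member value, a back pointer to its set, and the next node."""

    def __init__(self, value):
        self.value = value
        self.owner = None  # the SetList this node currently belongs to
        self.next = None


class SetList:
    """Per-set object holding head and tail, so appending never walks the list."""

    def __init__(self, node):
        self.head = node
        self.tail = node
        self.size = 1
        node.owner = self


def make_set(value):
    node = Node(value)
    SetList(node)
    return node


def find_set(node):
    # The representative is the head of the list the node belongs to.
    return node.owner.head


def union(a, b):
    big, small = a.owner, b.owner
    if big is small:
        return big
    if big.size < small.size:
        big, small = small, big
    # Splice the smaller list onto the tail of the larger one in O(1).
    big.tail.next = small.head
    big.tail = small.tail
    big.size += small.size
    # Only the nodes of the smaller list need their back pointer updated.
    node = small.head
    while node is not None:
        node.owner = big
        node = node.next
    return big
```

Each node keeps a single back pointer to the set object, and the set object is the only place that stores the tail, so a union updates the tail once instead of once per node.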

why not drop the Linked List object entirely and use that fourth field to point directly to the tail?
An insight can be taken from path compression. There, all the elements are supposed to point to the head of the list. If one does not, the find-set operation fixes that (by updating p[x] and returning it). You describe something similar for the tail. So your scheme can be used only if such a function is implemented for the tail as well.


Dart - Pass by value for int but reference for list?

In Dart, looking at the code below, does it 'pass by reference' for list and 'pass by value' for integers? If that's the case, what type of data will be passed by reference/value? If that isn't the case, what's the issue that causes such output?
void main() {
var foo = ['a','b'];
var bar = foo;
bar.add('c');
print(foo); // [a, b, c]
print(bar); // [a, b, c]
var a = 3;
int b = a;
b += 2;
print(a); // 3
print(b); // 5
}
The question you're asking can be answered by looking at the difference between a value type and a reference type.
Dart, like almost every other programming language, makes a distinction between the two. The reason for this is that memory is divided into the so-called stack and the heap. The stack is fast but very limited, so it cannot hold that much data. (By the way, if you have too much data stored in the stack you will get a Stack Overflow exception, which is where the name of this site comes from ;) ). The heap, on the other hand, is slower but can hold nearly infinite data.
This is why you have value and reference types. The value types are all your primitive data types (in Dart, all the data types written in lowercase, like int, bool, double and so on). Their values are small enough to be stored directly in the stack. On the other hand, you have all the other data types, which may potentially be much bigger, so they cannot be stored in the stack. This is why all the other so-called reference types are basically stored in the heap, and only an address or a reference is stored in the stack.
So when you set the reference type bar to foo, you're essentially just copying the storage address from foo to bar. Therefore, if you change the data stored under that reference, it seems like you're changing both values, because both hold the same reference. In contrast, when you say b = a you're not transferring the reference but the actual value instead, so a is not affected by any changes you make to b.
I really hope I could help answering your question :)
In Dart, all types are reference types. All parameters are passed by value. The "value" of a reference type is its reference. (That's why it's possible to have two variables containing the "same object": there is only one object, but both variables contain references to that object.) You never ever make a copy of an object just by passing the reference around.
Dart does not have "pass by reference" where you pass a variable as an argument (so the called function can change the value bound to the variable, like C#'s ref parameters).
Dart does not have primitive types, at all. However (big caveat), numbers are always (pretending to be) canonicalized, so there is only ever one 1 object in the program. You can't create a different 1 object. In a way it acts similarly to other languages' primitive types, but it isn't one. You can use int as a type argument to List<int>, unlike in Java where you need to do List<Integer>, you can ask about the identity of an int like identical(1, 2), and you can call methods on integers like 1.hashCode.
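The same model can be tried out in Python, which (like Dart as described above) passes object references by value. This is only an analogy in a different language, and the helper names append_item and rebind are made up for the illustration.

```python
def append_item(items):
    # Mutates the one list object that the caller and the parameter both refer to.
    items.append("c")


def rebind(items):
    # Rebinds only the local name; the caller's variable is untouched.
    items = ["x"]
    return items


foo = ["a", "b"]
bar = foo          # copies the reference, not the list
append_item(bar)   # visible through foo as well
rebind(foo)        # has no effect on foo

a = 3
b = a
b += 2             # ints are immutable: b is bound to a new object, a keeps 3
```

Mutating through the parameter is visible to the caller, while assigning to the parameter is not, which is exactly the difference between "passing a reference by value" and true pass by reference.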
If you want to clone or copy a list
var foo = ['a', 'b'];
var bar = [...foo];
bar.add('c');
print(bar); // [a, b, c]
print(foo); // [a, b]
var bar_two = []; //or init an empty list
bar_two.addAll([...bar]);
print(bar_two); // [a, b, c]
Reference link
Clone a List, Map or Set in Dart

Memory usage of pass by value vs. pass by reference

For the past few days I have been trying to learn whether pass by value and pass by reference impact memory differently. When I googled this, people kept repeating that a copy is created with pass by value and that the original value is affected with pass by reference. But I was wondering if someone could zero in on the memory part.
This question actually depends heavily on the particular language as some allow you to be explicit and define when you want to pass a variable by value and when by reference and some do it always the same way for different types of variables.
A quite popular behavior is to pass simple types by value (by default): int, string, long, float, double, bool etc.
Let us show the memory impact on a theoretical language:
int $myVariable = 5;
at this moment you have created a one variable in memory which takes the size required to store an integer (let us say 32 bits).
Now you want to pass it to a function:
function someFunction(int $parameter)
{
printOnScreen($parameter);
}
so your code would look like:
function someFunction(int $parameter)
{
printOnScreen($parameter);
}
int $myVariable = 5; //Position A
someFunction($myVariable); //Position B
...rest of the code //Position C
Since simple types are passed by value the value is copied in memory to another storage place - therefore:
during Position A you have memory occupied by ONE int (with value 5);
during Position B you have memory occupied by TWO ints (with values of 5) as your $myVariable was copied in memory
during Position C you have again memory occupied by ONE int (with value of 5) as the second one was already destroyed as it was needed only for the time of execution of the function
This has some other implications: modifications on a variable passed by value DO NOT affect the original variable - for example:
function someFunction(int $parameter)
{
$parameter = $parameter + 1;
printOnScreen($parameter);
}
int $myVariable = 5; //Position A
someFunction($myVariable); //Position B
printOnScreen($myVariable); //Position C
During position A you set value of 5 under variable $myVariable.
During position B you pass it BY VALUE to a function which adds 1 to your passed value. YET since it was a simple type, passed by value, it actually operates on a LOCAL variable, a COPY of your variable. Therefore position C will again write just 5 (your original variable as it was not modified).
Some languages allow you to be explicit and state that you want to pass a reference and not the value itself, using a special operator, for example &. So let us follow the same example again, but with explicit info that we want a reference (note the & in the function's arguments):
function someFunction(int &$parameter)
{
$parameter = $parameter + 1;
printOnScreen($parameter);
}
int $myVariable = 5; //Position A
someFunction($myVariable); //Position B
printOnScreen($myVariable); //Position C
This time operation and memory implications will be different.
During Position A an int is created (every variable always consists of two elements: a place in memory and a pointer, an identifier that says which place it is. To keep things simple, let us say that a pointer is always one byte). So whenever you create a variable you actually create two things:
reserved place in memory for the VALUE (in this case 32 bits as it was an int)
pointer (8 bits [1 byte])
Now during position B, the function expects A POINTER to a memory place. This means that it will locally create only a copy of the pointer (1 byte) and not copy the actual reserved place, as the new pointer WILL POINT to the same place as the original one. This means that while the function runs you have:
TWO POINTERS to an int in memory
ONE place reserved for VALUE of the int
Both of those pointers POINT to the same VALUE
Which means that any modification of the value will affect both.
So looking at the same example, position C will now also print 6, as inside the function we have modified the value under the SAME POINTER as $myVariable.
For COMPLEX TYPES (objects) the default action in most programming environments is to pass the reference (pointer).
So for example - if you have a class:
class Person {
public string $name;
}
and create an instance of it and set a value:
$john = new Person();
$john->name = "John Malkovic";
and later pass it to a function:
function printName(Person $instanceOfPerson)
{
printOnScreen($instanceOfPerson->name);
}
in terms of memory it will again create only a new POINTER in memory (1 byte) which points to the same value. So having a code like this:
function printName(Person $instanceOfPerson)
{
printOnScreen($instanceOfPerson->name);
}
$john = new Person(); // position A
printName($john); // position B
...rest of the code // position C
during position A you have: 1 Person (which means 1 pointer [1 byte] to a place in memory which has size to store an object of class person)
during position B you have: 2 pointers [2 bytes] but STILL one place in memory to store an object of class person's value [instance]
during position C you have again situation from position A
I hope that this clarifies the topic for you - generally there is more to cover and what I have mentioned above is just a general explanation.
Pass-by-value and pass-by-reference are language semantics concepts; they don't imply anything about the implementation. Usually, languages that have pass-by-reference implement it by passing a pointer by value, and then when you read or write to the variable inside the function, the compiler translates it into reading or writing from a dereference of the pointer. So you can imagine, for example, if you have a function that takes a parameter by reference in C++:
struct Foo { int x; };
void bar(Foo &f) {
f.x = 42;
}
Foo a;
bar(a);
it is really syntactic sugar for something like:
struct Foo { int x; };
void bar(Foo *f_ptr) {
(*f_ptr).x = 42;
}
Foo a;
bar(&a);
And so passing by reference has the same cost as passing a pointer by value, which does involve a "copy", but it's the copy of a pointer, which is a few bytes, regardless of the size of the thing pointed to.
When you talk about pass-by-value doing a "copy", that doesn't really tell you much unless you know what exactly the variable or value passed represents in the language. For example, Java only has pass-by-value. But every type in Java is either a primitive type or a reference type, and the values of reference types are "references", i.e. pointers to objects. So you can never have a value in Java (what a variable holds or what an expression evaluates to) which "is" an "object"; objects in Java can only be manipulated through these "references" (pointers to objects). So asking about the cost of passing an object in Java is actually the wrong question, because you cannot "pass" an object in Java; you can only pass references (pointers to objects), and the copy that happens for pass-by-value is the copy of the pointer, which is a few bytes.
So the only case where you would actually copy a big structure when passing is if you have a language where objects or structs are values directly (not behind a reference), and you pass that object/struct type by value. So for example, in C++, you can have objects which are values directly, or you can have pointers to them, and you can pass them by value or by reference:
struct Foo { int x; };
void bar1(Foo f1) { } // pass Foo by value; this copies the entire size of Foo
void bar2(Foo *f2) { } // pass pointer by value; this copies the size of a pointer
void bar3(Foo &f3) { } // pass Foo by reference; this copies the size of a pointer
void bar4(Foo *&f4) { } // pass pointer by reference; this copies the size of a pointer
(Of course, each of those has a different semantic meaning; for example, the last one allows the code inside the function to modify the pointer variable passed so that it points somewhere else. But if you are concerned about the amount copied, only the first one is different. In Java, effectively only the second one is possible.)
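The size difference can be checked with Python's ctypes module, which exposes C-style structs and pointers. This is a rough sketch: the Big struct and the set_first helper are hypothetical, and the exact byte counts depend on the platform.

```python
import ctypes


class Big(ctypes.Structure):
    # Hypothetical struct holding 1000 C ints inline.
    _fields_ = [("data", ctypes.c_int * 1000)]


# Size of the struct itself versus the size of a pointer to it.
struct_size = ctypes.sizeof(Big)
pointer_size = ctypes.sizeof(ctypes.POINTER(Big))


def set_first(big_ptr):
    # Writes through the pointer, so the caller's struct is modified.
    big_ptr.contents.data[0] = 42


big = Big()
set_first(ctypes.pointer(big))
print(struct_size, pointer_size, big.data[0])
```

Passing the pointer hands over only pointer_size bytes no matter how large Big grows, whereas a by-value copy of the struct would have to move all struct_size bytes.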

How to iterate/while a mapping variables from environment to message assembly in IBM Integration Bus (toolkit)?

I have a SOAP node that retrieves information from a URL in a tree structure.
Then I have a compute node that maps each namespaced field of the SOAP response to an environment variable.
And finally, I have a mapping node to move the content to my message assembly structure in XML.
The error it's giving me is this (IN THE COMPUTE NODE):
I have a structure like this:
ListDocs
    Description
    DocType
    ListTypes
        Attribute
        Lenght
        Description
        Nature
        Required
ListDocs
    Description
    DocType
    ListTypes
        Attribute
        Lenght
        Description
        Nature
        Required
ListDocs
    Description
    DocType
    ListTypes
        Attribute
        Lenght
        Description
        Nature
        Required
The problem is that, when i do the definition of the variables, I do it like the code below, in the COMPUTE NODE:
WHILE I < InputRoot.SOAP.Body.ns:obterTiposDocProcessosResponse.ns:return.ns75:processo.ns75:listaTiposDocumentos
DO
SET Environment.Variables.XMLMessage.return.process.listDocs.description = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs.ns75:description;
SET Environment.Variables.XMLMessage.return.process.listDocs.tipoDocumento = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs.ns75:DocType;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.attribute = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs.ns75:listTypes.ns75:atribbute;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.lenght = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs.ns75:listTypes.ns75:lenght;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.description = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs.ns75:listTypes.ns75:description;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.nature = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs.ns75:listTypes.ns75:nature;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.required = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs.ns75:listTypes.ns75:required;
SET I = I+1;
END WHILE;
BUT, in my XML final structure, it only prints the values of my first listDocs, and i want to print all of my listDocs structures.
NOTE: WITH THE WHILE LIKE THIS, IT DOESN'T EVEN WORK. I HAVE TO REMOVE THE WHILE TO PRINT THE FIRST listDocs like i said Above.
Any help?
I NEED HELP TO LOOP THE STRUCTURES, WITH A WHILE OR SOMETHING.
You should try to use the following syntax:
DECLARE I INTEGER 1;
DECLARE J INTEGER;
SET J = CARDINALITY(InputRoot.SOAP.Body.ns:obterTiposDocProcessosResponse.ns:return.ns75:processo.ns75:listaTiposDocumentos[]);
WHILE I <= J DO
SET Environment.Variables.XMLMessage.return.process.listDocs.description = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[I].ns75:description;
....
END WHILE;
You were only missing the CARDINALITY function to get the number of elements, the [] to denote the repeating element, and the [I] subscript when accessing the elements.
Note: in my sample above, the environment is overwritten at each iteration of the loop, so only the last record will be printed. You can use the [I] subscript on the output side as well if you want to build a list in the output, or you can use the following statement to push each message to the output terminal (this means you have one message as input, and 3 messages coming out of the output terminal):
PROPAGATE TO TERMINAL 'Out';
So for example, based on your code, if you want to generate 3 messages from an input containing multiple elements:
DECLARE I INTEGER 1;
DECLARE J INTEGER;
SET J = CARDINALITY(InputRoot.SOAP.Body.ns:obterTiposDocProcessosResponse.ns:return.ns75:processo.ns75:listaTiposDocumentos[]);
WHILE I <= J DO
SET Environment.Variables.XMLMessage.return.process.listDocs.description = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[I].ns75:description;
SET Environment.Variables.XMLMessage.return.process.listDocs.tipoDocumento = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[I].ns75:DocType;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.attribute = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[I].ns75:listTypes.ns75:atribbute;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.lenght = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[I].ns75:listTypes.ns75:lenght;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.description = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[I].ns75:listTypes.ns75:description;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.nature = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[I].ns75:listTypes.ns75:nature;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.required = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[I].ns75:listTypes.ns75:required;
PROPAGATE TO TERMINAL 'Out';
SET I = I + 1; -- advance the counter, otherwise the loop never ends
END WHILE;
RETURN FALSE;
For your information, RETURN TRUE is the instruction that "pushes" the message built in the ESQL code to the output terminal. If you use the PROPAGATE instruction (same effect), you should RETURN FALSE to avoid sending an empty message after looping over your records. Another way to do it is to propagate to another terminal (i.e. 'out1') and keep the RETURN TRUE. In this case, all your records come out of the out1 terminal, and one more message goes out of the output terminal (due to the RETURN TRUE) once all the messages have been propagated (this can be useful in many situations).
So the key to understanding IIB and ESQL is that you are looking at in memory Trees built from nodes.
Each Node has pointers/REFERENCEs to PARENT, NEXTSIBLING, PREVSIBLING, FIRSTCHILD and LASTCHILD Nodes.
Nodes also have FIELDNAME, FIELDNAMESPACE, FIELDTYPE and FIELDVALUE attributes.
And last but not least that you are building Output Trees by navigating Input Trees. The Environment Tree, which you are using, is a special long lasting Tree that you can both read from and write to.
So in your code InputRoot.SOAP.Body.ns75:processo.ns75:listDocs can be thought of as shorthand for instructions to navigate to the ns75:listDocs Node. The dots '.' tell ESQL interpreter the name of the child Node of the current Node. If you were telling someone how to navigate the Nodes it would go something like this.
Start at InputRoot. InputRoot is a special Node that is automatically available to you in your ESQL Modules code.
Navigate to the first child Node of InputRoot that has the name SOAP
Navigate to the first child Node of SOAP that has the name Body
Navigate to the first child Node of Body that has the name listDocs and is in the ns75 namespace.
In the absence of a subscript, ESQL assumes you want the first Node that matches the specified name, so ns75:listDocs and ns75:listDocs[1] both refer to the same Node.
This explains what was happening in your code. You were always navigating to the same listDocs[1] node in the InputRoot and Environment Trees.
@Jerem's code improves on what you were doing by at least navigating across the listDocs nodes in the Input tree.
For each iteration of the loop the subscript [I] gets incremented and thus it chooses a different listDocs Node. The listDocs Nodes are siblings and thus the code will access the first, second and third instance of the listDocs Nodes.
InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[1] <-- Iteration I=1
InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[2] <-- Iteration I=2
InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[3] <-- Iteration I=3
To correct @Jerem's answer you'd need to use subscripts on the left-hand side of the statement as well. Picking the description field as an example, you'd need to change your code as follows.
SET Environment.Variables.XMLMessage.return.process.listDocs[I].listTypes.description = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[I].ns75:listTypes.ns75:description;
Using subscripts is regarded as a performance no-no. Imagine you had 10,000 listDocs. This would result in each and every iteration of the loop walking down the tree over the InputRoot, SOAP, Body and ns75:processo Nodes and then across the listDocs sibling nodes until it found the ns75:listDocs[I] Node.
This means that by the time we get round to processing ns75:listDocs[10000], it will have repeatedly walked over all the other listDocs Nodes time and time again. In fact, we can calculate that it would have walked over (4 x 10,000) + ((10,000 x (10,000 + 1)) / 2) = 50,045,000 Nodes.
So it's REFERENCE's to the rescue and also the answer to your question. Try a loop like this.
DECLARE ns75 NAMESPACE 'http://something.or.other.from.your.wsdl';
DECLARE InListDocsRef REFERENCE TO
InputRoot.SOAP.Body.ns75:processo.ns75:listDocs;
WHILE LASTMOVE(InListDocsRef) DO
DECLARE EnvListDocsRef REFERENCE TO Environment;
CREATE LASTCHILD OF Environment.Variables.XMLMessage.return.process AS EnvListDocsRef NAME 'listDocs';
SET EnvListDocsRef.description = InListDocsRef.ns75:description;
SET EnvListDocsRef.tipoDocumento = InListDocsRef.ns75:DocType;
SET EnvListDocsRef.listTypes.attribute = InListDocsRef.ns75:listTypes.ns75:atribbute;
SET EnvListDocsRef.listTypes.lenght = InListDocsRef.ns75:listTypes.ns75:lenght;
SET EnvListDocsRef.listTypes.description = InListDocsRef.ns75:listTypes.ns75:description;
SET EnvListDocsRef.listTypes.nature = InListDocsRef.ns75:listTypes.ns75:nature;
SET EnvListDocsRef.listTypes.required = InListDocsRef.ns75:listTypes.ns75:required;
MOVE InListDocsRef NEXTSIBLING REPEAT NAME;
END WHILE;
The code above only walks over 4 + 10,000 Nodes i.e. 10 thousand Nodes vs 50 million Nodes.
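The two node counts quoted above can be reproduced with a few lines of Python. This only mirrors the arithmetic in this answer; the function names and the path_depth parameter (the 4 Nodes InputRoot, SOAP, Body and ns75:processo) are chosen here for illustration.

```python
def nodes_walked_with_subscripts(n, path_depth=4):
    # Every iteration re-walks the fixed path from the root...
    path_steps = path_depth * n
    # ...and iteration i then steps across i sibling Nodes: 1 + 2 + ... + n.
    sibling_steps = n * (n + 1) // 2
    return path_steps + sibling_steps


def nodes_walked_with_reference(n, path_depth=4):
    # The path is walked once, then the reference moves one sibling at a time.
    return path_depth + n


print(nodes_walked_with_subscripts(10_000))  # 50045000
print(nodes_walked_with_reference(10_000))   # 10004
```

The subscript version grows quadratically with the number of listDocs elements, while the reference version grows linearly.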
A couple of other useful things to know about setting references are:
To point to the last element you can use a subscript of [<]. So to point to the last ListItem in the aggregate MyList you would code Environment.MyList.ListItem[<]
You can use an asterisk * to set a reference to an element in the tree that you don't know the name of, e.g. Environment.MyAggregate.* points to the first child of MyAggregate regardless of its name.
You can also use asterisks * to choose an element regardless of its namespace: InListDocsRef.*:listTypes.*:description
For anonymous namespaced elements use *:* but be very careful: * and *:* are not the same thing. The first means any element in no namespace and the second means any element in any namespace.
To process lists in reverse combine the [<] subscript with the PREVIOUSSIBLING option of MOVE.
So a chunk of code for reversing a list might go something like:
DECLARE MyReverseListItemWalkingRef REFERENCE TO Environment.MyList.ListItem[<];
WHILE LASTMOVE(MyReverseListItemWalkingRef) DO
CREATE LASTCHILD OF OutputRoot.ReversedList.Item NAME 'Description' VALUE MyReverseListItemWalkingRef.Desc;
MOVE MyReverseListItemWalkingRef PREVIOUSSIBLING REPEAT NAME;
END WHILE;
Learn how to use REFERENCES they are extremely powerful and one of your simplest options when it comes to performance.

Double Linked List in Julia

I'm new to Julia language and I wanted to improve my understanding by implementing a double linked list.
Unfortunately it seems that there is no good existing library for this purpose.
The only good one is the single linked list (here).
There is one implementation of a double linked list (here). But this is 2 years old and I'm not sure if it is outdated or not. And it does not allow a real empty list. It is just a single element with a default value.
At the moment I would be able to implement the common stuff like push!, pop!, that's not the problem.
But I'm struggling with implementing a double linked list that could be empty.
My current approach uses Nullable for an optional value of the reference and value.
type ListNode{T}
prev::Nullable{ListNode{T}}
next::Nullable{ListNode{T}}
value::Nullable{T}
ListNode(v) = (x=new(); x.prev=Nullable{x}; x.next=Nullable{x}; x.value=Nullable(v); x)
ListNode(p, n, v) = new(p, n, v)
end
type List{T}
node::Nullable(ListNode{T})
List() = (start=new(Nullable(ListNode{T}())); node=start; start)
List(v) = (start=new(Nullable(ListNode{T}(v))); node=start; start)
end
But it seems like this is pretty ugly and inconvenient to work with.
My second approach would be to introduce a boolean variable (inside List{T}) which stores whether a list is empty or not. Checking this boolean would allow me to handle push! and pop! on empty lists simply.
I tried to google a good solution but I didn't find one.
Can anyone give me a "julia style" solution for the double linked list?
Thanks,
felix
There is now a library containing various data structures, DataStructures.jl. Some initial notes regarding the question: as of this writing, type is deprecated. Instead, mutable struct should be used, for Julia 1.0 and beyond. Nullable is also deprecated, and a Union of Nothing and the type in question can be used instead.
There exists a package called DataStructures.jl that provides what you need.
You can find a DoubleLinked list containing the functionality you need here:
mutable_list
Code snippets from the link above, defining a DoubleLinked list in Julia >= v 1.1:
mutable struct ListNode{T}
data::T
prev::ListNode{T}
next::ListNode{T}
function ListNode{T}() where T
node = new{T}()
node.next = node
node.prev = node
return node
end
function ListNode{T}(data) where T
node = new{T}(data)
return node
end
end
mutable struct MutableLinkedList{T}
len::Int
node::ListNode{T}
function MutableLinkedList{T}() where T
l = new{T}()
l.len = 0
l.node = ListNode{T}()
l.node.next = l.node
l.node.prev = l.node
return l
end
end
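The reason this layout handles the empty list is the sentinel node that points to itself. Below is a minimal Python sketch of the same idea, not a port of the DataStructures.jl code; the class and method names are my own.

```python
class _Node:
    __slots__ = ("data", "prev", "next")

    def __init__(self, data=None):
        self.data = data
        # A fresh node is linked to itself, like ListNode{T}() above.
        self.prev = self
        self.next = self


class LinkedList:
    """Circular doubly linked list; an empty list is just the sentinel."""

    def __init__(self):
        self._sentinel = _Node()
        self._len = 0

    def __len__(self):
        return self._len

    def push(self, data):
        # Insert between the current last node and the sentinel.
        node = _Node(data)
        last = self._sentinel.prev
        node.prev, node.next = last, self._sentinel
        last.next = node
        self._sentinel.prev = node
        self._len += 1

    def pop(self):
        if self._len == 0:
            raise IndexError("pop from empty list")
        node = self._sentinel.prev
        node.prev.next = self._sentinel
        self._sentinel.prev = node.prev
        self._len -= 1
        return node.data

    def __iter__(self):
        node = self._sentinel.next
        while node is not self._sentinel:
            yield node.data
            node = node.next
```

Because the sentinel always exists, push and pop never need a special case for "first element" or "last element", and no nullable links or emptiness flag are required.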
In addition to the DataStructures package, Chris Rackauckas' LinkedLists.jl is a good resource.
The source code is quite readable and you can always ask questions.

Immutable members on objects

I have an object that can be neatly described by a discriminated union. The tree that it represents has some properties that can be easily updated when the tree is modified (but remaining immutable) but that are relatively expensive to recalculate.
I would like to store those properties along with the object as cached values but I don't want to put them into each of the discriminated union cases so I figured a member variable would fit here.
The question is then, how do I change the member value (when I modify the tree) without mutating the actual object? I know I could modify the tree and then mutate that copy without ruining purity but that seems like a wrong way to go about it to me. It would make sense to me if there was some predefined way to change a property but so that the result of the operation is a new object with that property changed.
To clarify, when I say modify I mean doing it in a functional way. Like (::) "appends" to the beginning of a list. I'm not sure what the correct terminology is here.
F# actually has syntax for copy and update records.
The syntax looks like this:
let myRecord3 = { myRecord2 with Y = 100; Z = 2 }
(example from the MSDN records page - http://msdn.microsoft.com/en-us/library/dd233184.aspx).
This allows the record type to be immutable, and for large parts of it to be preserved, whilst only a small part is updated.
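For readers who do not use F#, roughly the same copy-and-update pattern can be written in Python with a frozen dataclass and dataclasses.replace. This is an analogy rather than F# semantics, and the Point3 type with fields x, y and z is invented for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Point3:
    x: int
    y: int
    z: int


my_record2 = Point3(x=1, y=2, z=3)
# Builds a new object; my_record2 itself is never mutated.
my_record3 = replace(my_record2, y=100, z=2)
```

The original value stays usable after the update, which is what makes it safe to carry a cached property along in the record and set it on the new copy.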
The cleanest way to go about it would really be to carry the 'cached' value attached to the DU (as part of the case) in one way or another. I could think of several ways to implement this, I'll just give you one, where there are separate cases for the cached and non-cached modes:
type Fraction =
    | Frac of int * int
    | CachedFrac of (int * int) * decimal
    member this.AsFrac =
        match this with
        | Frac _ -> this
        | CachedFrac (tup, _) -> Frac tup
An entirely different option would be to keep the cached values in a separate dictionary, this is something that makes sense if all you want to do is save some time recalculating them.
module FracCache =
    let cache = System.Collections.Generic.Dictionary<Fraction, decimal>()
    let modify (oldFrac: Fraction) (newFrac: Fraction) =
        // Need to check if oldFrac has a cached value as well.
        cache.[newFrac] <- cache.[oldFrac] + 1m // 1m is a decimal literal; a plain 1 would not type-check
Basically what memoize would give you plus you have more control over it.
