Why does `previous`, not `next`, have to be weak in a DoublyLinkedList? - linked-list

I spent some time figuring out why my doubly linked list's append() function doesn't work. The reason was that I made next weak instead of previous, but I cannot figure out why that matters. I thought an ownership cycle is just a cycle of "equal" instances.
I've read about reference counting, but cannot find an answer to my question.

Although previous and next have completely symmetric roles, there is one important difference if your list has a head reference (to the first node in the list) but no tail reference (to the last node in the list). In that scenario the symmetry is broken, and it becomes important to make the right choice about which reference is weak.
Nodes in the list are then reachable through the head reference followed by a chain of zero or more next references. If those next references are weak, the nodes risk being deallocated. It doesn't help that they are referenced by strong previous references, since the "root" of that chain is the tail node of the list, which has nothing holding a strong reference to it.
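A minimal Swift sketch of the conventional layout (Swift/ARC is assumed here, as the question's weak-reference semantics suggest; the Node type and append below are illustrative, not the asker's code):
final class Node<T> {
    var value: T
    var next: Node<T>?           // strong: the head keeps the whole chain alive
    weak var previous: Node<T>?  // weak: breaks the retain cycle between neighbours
    init(_ value: T) { self.value = value }
}
final class DoublyLinkedList<T> {
    private var head: Node<T>?
    func append(_ value: T) {
        let node = Node(value)
        guard var last = head else { head = node; return }
        while let next = last.next { last = next }  // O(n) without a tail pointer
        node.previous = last
        last.next = node
    }
}
With the roles flipped (weak next, strong previous), nothing would hold a strong reference to the second node at all, since the strong previous links point backwards toward the head, so ARC would deallocate it as soon as append returns.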

Related

Why is splitting a Rust's std::collections::LinkedList O(n)?

The .split_off method on std::collections::LinkedList is described as having O(n) time complexity. From the docs:
pub fn split_off(&mut self, at: usize) -> LinkedList<T>
Splits the list into two at the given index. Returns everything after the given index, including the index.
This operation should compute in O(n) time.
Why not O(1)?
I know that linked lists are not trivial in Rust. There are several resources going into the how's and why's like this book and this article among several others, but I haven't got the chance to dive into those or the standard library's source code yet.
Is there a concise explanation about the extra work needed when splitting a linked list in (safe) Rust?
Is this the only way? And if not why was this implementation chosen?
The method LinkedList::split_off(&mut self, at: usize) first has to traverse the list from the start (or the end) to position at, which takes O(min(at, n - at)) time. The actual split-off is a constant-time operation (as you said). And since the min() expression is bounded above by n, it is legal, if imprecise, to replace it with n. Thus: O(n).
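To make the traversal-then-splice shape concrete, here is a rough sketch over a hand-rolled singly linked list (an illustration only, not the actual std implementation, which is doubly linked and walks in from whichever end is closer to at):
struct Node<T> {
    elem: T,
    next: Option<Box<Node<T>>>,
}
pub struct List<T> {
    head: Option<Box<Node<T>>>,
}
impl<T> List<T> {
    pub fn split_off(&mut self, at: usize) -> List<T> {
        // O(at): walk to the link that points at the `at`-th node.
        let mut cursor = &mut self.head;
        for _ in 0..at {
            cursor = &mut cursor.as_mut().expect("index out of bounds").next;
        }
        // O(1): detach everything from index `at` onward in one pointer move.
        List { head: cursor.take() }
    }
}
Everything up to the take() is pure navigation; the split itself moves a single pointer.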
Why was the method designed like that? The problem goes deeper than this particular method: most of the LinkedList API in the standard library is not really useful.
Due to its cache unfriendliness, a linked list is often a bad choice to store sequential data. But linked lists have a few nice properties which make them the best data structure for a few, rare situations. These nice properties include:
Inserting an element in the middle in O(1), if you already have a pointer to that position
Removing an element from the middle in O(1), if you already have a pointer to that position
Splitting the list into two lists at an arbitrary position in O(1), if you already have a pointer to that position
Notice anything? The linked list is designed for situations where you already have a pointer to the position that you want to do stuff at.
Rust's LinkedList, like many others, just stores a pointer to the start and the end. To get a pointer to an element inside the linked list, you need something like an iterator; in our case, that's IterMut. An iterator over a collection can function like a pointer to a specific element and can be advanced carefully (i.e. not with a for loop). And in fact, there is IterMut::insert_next, which allows you to insert an element in the middle of the list in O(1). Hurray!
But this method is unstable. And methods to remove the current element or to split the list off at that position are missing. Why? Because of the vicious circle that is:
1. LinkedList lacks almost all features that make linked lists useful at all
2. Thus (nearly) everyone recommends not to use it
3. Thus (nearly) no one uses LinkedList
4. Thus (nearly) no one cares about improving it
5. Goto 1
Please note that there are a few brave souls occasionally trying to improve the situation. There is the tracking issue about insert_next, where people argue that Iterator might be the wrong concept to perform these O(1) operations and that we want something like a "cursor" instead. And here someone suggested a bunch of methods to be added to IterMut (including cut!).
Now someone just has to write a nice RFC and someone needs to implement it. Maybe then LinkedList won't be nearly useless anymore.
Edit 2018-10-25: someone did write an RFC. Let's hope for the best!
Edit 2019-02-21: the RFC was accepted! Tracking issue.
Maybe I'm misunderstanding your question, but in a linked list, the links of each node have to be followed to proceed to the next node. If you want to get to the third node, you start at the first, follow its link to the second, then finally arrive at the third.
This traversal's complexity is proportional to the target node index n, because n nodes are processed/traversed, so it's a linear O(n) operation, not a constant-time O(1) operation. The part where the list is "split off" is of course constant time, but the overall split operation is dominated by the O(n) cost of getting to the split-off node before the split can even be made.
One way in which it could be O(1) would be if a pointer existed to the node after which the list is split off, but that is different from specifying a target node index. Alternatively, an index structure could be kept, mapping node indices to the corresponding node pointers, but that would cost extra space plus the processing overhead of keeping it in sync with list operations.
pub fn split_off(&mut self, at: usize) -> LinkedList<T>
Splits the list into two at the given index. Returns everything after the given index, including the index.
This operation should compute in O(n) time.
The documentation is either:
unclear, if n is supposed to be the index,
pessimistic, if n is supposed to be the length of the list (the usual meaning).
The proper complexity, as can be seen in the implementation, is O(min(at, n - at)) (whichever is smaller). Since at must be smaller than n, the documentation is correct that O(n) is a bound on the complexity (reached for at = n / 2), however such a large bound is unhelpful.
That is, the fact that list.split_off(5) takes the same time whether list.len() is 10 or 1,000,000 is quite important!
As to why this complexity: it is an inherent consequence of the structure of a doubly linked list. There is no O(1) indexing operation in a linked list, after all. The operation implemented in C, C++, C#, D, F#, ... would have the exact same complexity.
Note: I encourage you to write a pseudo-code implementation of a linked-list with the split_off operation; you'll realize this is the best you can get without altering the data-structure to be something else.

Is a dynamic array automatically deallocated when length is decreased?

I already know that a dynamic array is automatically deallocated/freed after use.
Does the same apply to resizing, especially decreasing? The manual and most help sites only cover increasing the array size.
test: array of TLabel;
SetLength(test, 10);
// fill array here
SetLength(test, 2); // <=== are entries 3-10 automatically destroyed?
are entries 3-10 automatically destroyed?
No, they are not automatically destroyed because those entries are dynamically allocated (and are not managed types). Only the pointers that refer to those items are released. It is your responsibility to destroy the items if necessary, because the compiler has no way to guarantee you wouldn't still use them from another reference (or have already destroyed them).
I must also point out that technically "entries 3-10" is wrong. Dynamic arrays are zero-based, so the references for entries 2 to 9 are the ones released.
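A minimal sketch of that manual cleanup (assuming nothing else references the labels and nothing else, such as an Owner, will free them):
// Assuming test: array of TLabel, filled as above.
var
  i: Integer;
begin
  // Free the objects about to lose their array references (indices 2..9)
  // before shrinking, otherwise they would leak.
  for i := 2 to High(test) do
    test[i].Free;
  SetLength(test, 2);
end;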
I already know that a dynamic array is automatically deallocated/freed after use
In addition, your question indicates you don't properly understand this. It seems you believed that when your array goes out of scope, the labels it references would be automatically destroyed. This is incorrect!
No matter where, how, or why some or all dynamic array entries are released, Delphi won't automatically destroy object instances or any dynamically allocated pointer memory. Delphi only automatically releases memory for primitives (Integer, TDateTime, Double, short strings), records, and managed types1 (interfaces, long strings, other dynamic arrays).
1 Of course this is via reference counting. I.e. the reference count is reduced by 1, and the underlying object/string/array is released if and only if the refCount is reduced to zero.
As whosrdaddy pointed out, if you want automatic destruction of contained objects, then you need to use a container that implements an ownership concept. TObjectList is an example. Although it doesn't work exactly like a dynamic array, its behaviour is similar enough that it can usually be used as a replacement very easily.
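A sketch of the TObjectList route (assuming the generic TObjectList<T> from System.Generics.Collections; with OwnsObjects enabled, removed entries are freed for you):
uses
  System.Generics.Collections, Vcl.StdCtrls;
var
  test: TObjectList<TLabel>;
begin
  test := TObjectList<TLabel>.Create(True); // True = OwnsObjects
  try
    // ... add labels here ...
    // Deleting entries frees the removed labels automatically.
    test.DeleteRange(2, test.Count - 2);
  finally
    test.Free; // frees any remaining labels too
  end;
end;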

When to unref a GVariant that has a floating reference?

https://developer.gnome.org/glib/unstable/glib-GVariant.html#g-variant-ref-sink
I have read the above glib manual which says: "GVariant uses a floating reference count system. All functions with names starting with g_variant_new_ return floating references." But where is the actual description of what a floating reference count is? I couldn't find a comprehensive description of it.
In particular I want to understand when there is a need to unreference a variant and when not to. For example:
GVariant *a_v = g_variant_new_boolean(TRUE);
GVariant *another_v = g_variant_new("v", a_v);
I think I don't need to unreference a_v because it is consumed by the second g_variant_new. Is that correct?
Do I need to unreference another_v (assuming another_v is not passed to anything else from that point on)?
Where is this documented? (I think I have the right understanding by inferring from different examples found during search but can't seem to find the official glib documentation that explains this clearly).
There is a section on floating references in the GObject reference manual which goes into a bit more detail. Floating references may seem a bit obscure, but they are really very useful for C so taking a few minutes to really understand them is a good idea.
I'm going to assume you understand how reference counting works; if not, there is a lot of documentation out there, so take a few minutes and read up on that first.
First, let's look at what would happen with your example if g_variant_new_boolean returned a regular reference. When you first get the value, the reference count would be 1. When you pass it to g_variant_new, g_variant_new will increase the reference count to 2. At some point I assume you'll dispose of another_v, at which point the reference count for a_v will drop to 1… but remember, the memory isn't released until the reference count reaches 0.
In order to get around this you have two options. The first is to make g_variant_new steal the caller's reference, which basically sucks as a solution. You give away your reference when you call g_variant_new (or any similar function), so in the future you need to manually ref a_v every time you want to pass it to something else.
The other option is to just unref it manually when you're done. It's not the end of the world, but it's easy to forget to do or get wrong (like by forgetting to unref it in an error path).
What GVariant does instead is return a "floating" ref. The easiest way to think of it (IMHO) is that the first time g_variant_ref gets called it doesn't really do anything—it just "sinks" the floating ref. The reference count goes from 1 to 1. Subsequent calls to g_variant_ref, however, will increase the reference count.
Now let's look at what actually happens with your example. g_variant_new_boolean returns a floating reference. You then pass it to g_variant_new, which calls g_variant_ref, which sinks the floating reference. The reference count is now 1, and when another_v's refcount reaches 0, a_v's refcount will be decremented, in this case reaching 0, and everything will be freed. No need for you to call g_variant_unref.
The cool part about floating references, though, is what happens with something like this:
GVariant *a_v = g_variant_new_boolean(TRUE);
GVariant *another_v = g_variant_new("v", a_v);
GVariant *yet_another_v = g_variant_new("v", a_v);
When g_variant_new is called the second time, a_v's refcount will increment again (to 2). No need to call g_variant_ref before passing a_v to g_variant_new a second time: the second call looks just like the first, and consistency is a very nice feature in an API.
At this point it's probably obvious, but yes, you do need to call g_variant_unref on another_v (and, in that last example, yet_another_v).
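A compilable distillation of that walkthrough (assuming GLib is available, e.g. via pkg-config glib-2.0; the variable names mirror the question):
#include <glib.h>

int main(void) {
    /* Floating reference: will be consumed ("sunk") by g_variant_new. */
    GVariant *a_v = g_variant_new_boolean(TRUE);

    /* Sinks a_v's floating ref; another_v is itself floating, and we
     * are its end consumer, so we must unref it when done. */
    GVariant *another_v = g_variant_new("v", a_v);

    /* ... use another_v ... */

    /* Drops another_v, which in turn drops its reference on a_v;
     * no separate g_variant_unref(a_v) is needed. */
    g_variant_unref(another_v);
    return 0;
}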
The reference counting system is explained in the manual of GObject, in particular, in the section Object Memory Management.
When to use it might depend on your application (how the ownership of the variables will work).
The idea is similar to the way inodes work in Unix/Linux when handling files. A file is an object, located in specific blocks of storage. Whenever you create a hard link to that file, the file gains one extra owner (the link count, a reference count, increases). Whenever you remove a hard link, the count decreases. When nothing owns the object any more, it can be destroyed (and the space given back to the system).
Once an object is destroyed and nothing links to it, it cannot be used any more. So if your object might have multiple owners, you might want to use reference counting: when one of those owners drops its reference, the object does not get destroyed... not until the last of the owners lets go.

Time Complexity of Doubly Linked List Element Removal?

A lot of what I'm reading says that removing an internal element in a doubly linked list (DLL) is O(1); but why is this the case?
I understand why it's O(n) for SLLs: traverse the list in O(n), then remove in O(1). But don't you still need to traverse the list in a DLL to find the element?
For a doubly linked list, it's constant time to remove an element once you know where it is.
For a singly linked list, it's constant time to remove an element once you know where it and its predecessor are.
Since the link you point to shows singly linked list removal as O(n) and doubly linked removal as O(1), it's certain those costs assume you already know where the element you want to remove is, and nothing more.
In that case, for a doubly linked list, you can just use the prev and next pointers to remove it, giving you O(1). Ignoring the edge cases where you're at the head or tail, that means something like:
corpse->prev->next = corpse->next;
corpse->next->prev = corpse->prev;
free(corpse);
However, in a singly linked list where you only know the node you want deleted, you can't use corpse->prev to get the one preceding it because there is no prev link.
You have to instead find the previous item by traversing the list from the head, looking for one which has a next of the element you want to remove. That will take O(n), after which it's once again O(1) for the actual removal, such as (again, ignoring the edge cases for simplicity):
lefty = head
while lefty->next != corpse:
    lefty = lefty->next
lefty->next = corpse->next
free(corpse)
That's why the two complexities are different in that article.
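For completeness, here is a small C sketch that also handles the head/tail edge cases both fragments above ignore (the Node and List types are illustrative):
#include <stdlib.h>

typedef struct Node {
    int value;
    struct Node *prev, *next;
} Node;

typedef struct {
    Node *head, *tail;
} List;

/* O(1) removal, given a pointer to the doomed node itself. */
void remove_node(List *l, Node *corpse) {
    if (corpse->prev)
        corpse->prev->next = corpse->next;
    else
        l->head = corpse->next;   /* corpse was the head */
    if (corpse->next)
        corpse->next->prev = corpse->prev;
    else
        l->tail = corpse->prev;   /* corpse was the tail */
    free(corpse);
}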
As an aside, there is a common pattern for singly linked lists that keeps key-based deletion at O(n) overall: track the previous node while searching, so that the unlink itself is effectively O(1) once you've found the item you want to delete along with its predecessor. In code terms, that goes something like:
# Delete a node, returns true if found, otherwise false.
def deleteItem(key):
    # Special cases (empty list and deleting head).
    if head == null: return false
    if head.data == key:
        curr = head
        head = head.next
        free curr
        return true

    # Search non-head part of list (so prev always exists).
    prev = head
    curr = head.next
    while curr != null:
        if curr.data == key:
            # Found it so delete (using prev).
            prev.next = curr.next
            free curr
            return true
        # Advance to next item.
        prev = curr
        curr = curr.next

    # Not found, so fail.
    return false
As it's stated where your link points to:
The cost for changing an internal element is based on already having a pointer to it, if you need to find the element first, the cost for retrieving the element is also taken.
So, for both DLL and SLL linear search is O(n), and removal via pointer is O(1).
The complexity of removal in a DLL is O(1).
It can also be O(1) in an SLL if you are given a pointer to the preceding element rather than to the element itself.
This complexity is assuming you know where the element is.
I.e. the operation signature is akin to remove_element(list* l, link* e)
Searching for the element is O(n) in both cases.
@Matuku: You are correct.
I humbly disagree with most answers here trying to justify how the delete operation for a DLL is O(1). It's not.
Let me explain.
Why are we considering the scenario where we 'would' have a pointer to the node being deleted? Linked lists (singly/doubly) are traversed linearly; that's their definition. They have pointers only to the head/tail. How can we suddenly have a pointer to some node in between? That defeats the purpose of this data structure. And going by that assumption, if I have a DLL of, say, 1 million nodes, do I also have to maintain 1 million pointers (let's call them access pointers) pointing to each of those nodes so that I can delete them in O(1)? How would I store those 1 million access pointers? And how do I know which access pointer points to the correct data/node that I want to delete?
Can we have a real-world example where we 'have' the pointer to the data that has to be deleted 100% of the time?
And if you know the exact location/pointer/reference of/to the node to be deleted, why even use a linked list? Just use an array! That's what arrays are for: direct access to what you want.
Assuming that you have direct access to any node you want in a DLL goes against the whole idea of the linked list as a conceptual data structure. So I agree with the OP: he's correct. I will stick with this: doubly linked lists cannot have O(1) for deleting any node. You still need to start from either the head or the tail, which brings it down to O(n).
"If" we have the pointer to the node to be deleted, say X, then of course it's O(1), because we have pointers to the next and prev nodes and can delete X. But that big if is imaginary, not real.
We cannot play with the definition of the sacred data structure called the linked list because of some weird assumptions we may have from time to time.

VB6 Collections/Object References

I was wondering if someone could tell what happens with regards to memory when the following happens:
Set Dict = New Dictionary
Set Col = New Collection
Dict.Add Key, CustomClassOne
Dict.Add Key2, CustomClassTwo
Dict.Add Key3, CustomClassThree
Dict.Remove Key3
At this point, is Key3's entry removed from memory, or would I have to Set Dict.Item(Key3) = Nothing to remove it from memory?
Set Dict = Nothing '// will this remove All the above added custom class objects?
Set Col = Nothing '// Same question as above
Ugh VB memory management.... TY for your time,
- Austin
VB is reference counted.
The rule for when an object is released from memory is simple: it happens when there are no more references to that object. Each time a reference goes out of scope (such as at the end of a function), the object's reference count is decreased; that may in turn cause any objects referenced by this object to have their reference counts decreased too, and if their counts reach 0, they too are released from memory.
This is why there is usually no need to set an object reference to Nothing: that decreases the reference count, but so does the reference going out of scope.
So to answer your question:
Dict.Remove Key3 is all that is required to remove CustomClassThree and Key3 from memory (as long as you don't have other references pointing to this object).
Set Dict = Nothing will remove everything from memory, but this would happen anyway when it goes out of scope (again assuming there are no other references pointing to the objects it contains).
Col doesn't seem to have much to do with the other statements and would be removed from memory when it goes out of scope, without needing to Set Col = Nothing.
Note:
Setting a reference to Nothing is only really useful if you have objects which hold references to each other. Look up circular references for the details.
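A short VB6 sketch of the first point above (this assumes a project reference to Microsoft Scripting Runtime for Dictionary; CustomClassThree is a hypothetical class):
Dim Dict As Dictionary
Set Dict = New Dictionary

' The dictionary now holds the only reference to this instance.
Dict.Add "Key3", New CustomClassThree

' Releases the dictionary's reference; with no other references left,
' the instance's Class_Terminate runs and its memory is reclaimed.
Dict.Remove "Key3"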
With both Scripting.Dictionary and Collection instances, when the last reference to the container is gone, the object references it holds are released. Whether or not the objects themselves are deallocated depends on whether another variable holds a reference to the same object.
Think of each reference as a rope holding a rock above an abyss. Until the last rope is cut the rock doesn't drop out of existence.
Removing an item from a Dictionary or Collection cuts that one rope.
