Making a linked list with mutable elements in a contiguous block of memory

Making a linked list with mutable elements in a contiguous block of memory - linked-list

I was reading about the use of memory pools in real-time contexts, and I wanted to implement the code in the article Making a Pool Allocator. I'm already stuck at square one: making a circular singly-linked list of fixed-size chunks that all reside in a contiguous block. Further down the line, the linked list is used to track the next free memory chunk; allocated chunks are popped off the list; deallocated chunks are prepended back onto the list.
For a contiguous block of fixed-size elements, I obviously considered a Vector of immutable elements. Rather than making a circular linked list with mutable nodes, I could use the array index as an address:
struct Chunk{T}
next::Int
data::T
end
function Pool(T, size)
pool = Chunk.(2:size+1, zero(T))
pool[size] = Chunk(1, zero(T))
return pool
end
intpool = Pool(Int, 5)
The line pool[size] = Chunk(1, zero(T)) gave me pause: a downside of immutable Chunks is that I need to access and edit the pool every time I did something with the data, and I didn't want to be restricted to immutable data anyway. Unfortunately, elements generally exist independently and can outlive containers e.g. return pool[1], so mutable elements are allocated on the heap separately from a containing struct or array. However, in this use case, the elements only exist in the array.
This may be a long shot, but is there a way to make a linked list of mutable elements that are allocated in a contiguous block of memory?

After some time, I figured that I was asking for something weird: a so-called "mutable" object that isn't allocated independently on the heap and thus doesn't have an address in the usual sense. What I actually needed is the syntax of mutability; mutating pool is just fine, and the indices can serve as addresses. I could do that for assignment-mutation by overloading getproperty and setproperty! for a type that represents pool[index]. This is hardly a final draft but it gets around the problem posed in my question:
# include the original post's code here
struct PoolRef{T}
pool::Vector{Chunk{T}}
index::Int64
end
# points to heap-allocated Vector, but not heap allocated in Julia >=1.5
function Base.getproperty(p::PoolRef{T}, name::Symbol) where T
if name == :data
getfield(p.pool[p.index], name)
else # :pool, :index
getfield(p, name)
end
end
function Base.setproperty!(p::PoolRef{T}, name::Symbol, newdata::T) where T
if name == :data
oldvalue = p.pool[p.index]
newvalue = Chunk(oldvalue.next, newdata)
p.pool[p.index] = newvalue
end
end
# intended usage
x = PoolRef(3, intpool)
x.data += 33
x.data
For a Chunk with multiple data fields, creating a new instance to only edit a select field is complicated and wasteful, so in that case I could keep separate pools for each field. I'm also not sure of the exact "size limit" for allocating immutable structs on the stack or inline in arrays instead of the heap, as mentioned in the Julia 1.5 Highlights, but splitting the fields into pools would help with that too.
Of course, the alternative to such workarounds is to manage memory in C and wrapping it in Julia, but I'll learn that another day.

Related

Copy-on-write for stacks

How to implement copy-on-write technique for stack management in postfix calculations of big numbers?
I want to optimize my system regarding operations like:
duplicate top of stack
swap the two top elements
copy the second element on stack to the top
rotate the stack making the third element on top
apply n-ary operations on the stack
A typical operation take a number of parameters from top of stack and leave a number of results instead.
My stack is implemented as an array where data grows from low-mem to hi-mem and pointers from hi-mem to low-mem.

As I can see, the problem is not in the copy-on-write technique per se, but in memory management and garbage collecting.
In Forth we cannot have operator overloading and should have an explicitly separate set of operations for each class of numbers. Then we have two options:
A set of operations over bigint objects (their handlers), with manual memory management, similar to operations over files, dynamic memory buffers, or other objects.
A full set of operations including the separate stack, like operations over floating-point numbers, with automatic memory management.
Note that option (2) contains (1) under the hood.
Reference counting
One approach to implement automatic memory management in option (2) is to use a dedicated stack (or maybe two stacks) and reference counting. The stack contains references to objects (buffers). The operations that alter data (like b::1+) should make copy of the buffer and replace the reference by a new one if the counter is greater than 1 (otherwise data can be altered in the place).
Placing an item (a reference) to the stack should increase its counter, removing from the stack should decrease its counter. When the counter become 0, the buffer should be freed.
Such operations like b::dup, b::over increase the counter. b::swap doesn't change any counter. b::drop decreases the counter (and frees the buffer if the counter is 0).
Moving an item from the stack into a bigint variable should not change the counter in the result. But the counter for the previous value of the variable (if any) should be decreased.
If named variables and constants are not enough (e.g., to have user-defined arrays), you may need to introduce into the API the pointers (a kind of anonymous variables) or handlers to bigint objects.
Immutable objects
Another approach to implement option (2) is to have immutable objects and garbage collection loop. In this approach we don't need to count references, but need to maintain the list of external pointers (including variables).
A simple example of such garbage collection can be found in the implementation of s-expressions by Peter Sovietov: Функциональное программирование на языке Форт ("Functional programming in Forth language" in Russian, but it can be easy translated using Google Translate).

Delphi dynamic array efficiency

I am not a Delphi expert and I was reading online about dynamic arrays and static arrays. In this article I have found a chapter called "Dynamic v. Static Arrays" with a code snippet and below the author says:
[...] access to a dynamic array can be faster than a static array!
I have understood that dynamic arrays are located on the heap (they are implemented with references/pointers).
So far I know that the access time is better on dynamic arrays. But is that the same thing with the allocation? Like if I called SetLength(MyDynArray, 5) is that slower than creating a MyArray = array[0..4] of XXX?

So far I know that the access time is better on dynamic arrays.
That is not correct. The statement in that article is simply false.
But is that the same thing with the allocation? Like if I called SetLength(MyDynArray, 5) is that slower than creating a MyArray = array[0..4] of XXX?
A common fallacy is that static arrays are allocated on the heap. They could be global variables, and so allocated automatically when the module is loaded. They could be local variables and allocated on the stack. They could be dynamically allocated with calls to New or GetMem. Or they could be contained in a compound type (e.g. a record or a class) and so allocated in whatever way the owning object is allocated.
Having got that clear, let's consider a couple of common cases.
Local variable, static array type
As mentioned, static arrays declared as local variables are allocated on the stack. Allocation is automatic and essentially free. Think of the allocation as being performed by the compiler (when it generates code to reserve a stack frame). As such there is no runtime cost to the allocation. There may be a runtime cost to access because this might generate a page fault. That's all perfectly normal though, and if you want to use a small fixed size array as a local variable then there is no faster way to do it.
Member variable of a class, static array type
Again, as described above, the allocation is performed by the containing object. The static array is part of the space reserved for the object and when the object is instantiated sufficient memory is allocated on the heap. The cost for heap allocation does not typically depend significantly on the size of the block to be allocated. An exception to that statement might be really huge blocks but I'm assuming your array is relatively small in size, tens or hundreds of bytes. Armed with that knowledge we can see again that the cost for allocation is essentially zero, given that we are already allocating the memory for the containing object.
Local variable, dynamic array type
A dynamic array is represented by a pointer. So your local variable is a pointer allocated on the stack. The same argument applies as for any other local variable, for instance the local variable of static array type discussed above. The allocation is essentially free. Before you can do anything with this variable though, you need to allocate it with a call to SetLength. That incurs a heap allocation which is expensive. Likewise when you are done you have to deallocate.
Member variable of a class, dynamic array type
Again, allocation of the dynamic array pointer is free, but you must call SetLength to allocate. That's a heap allocation. There needs to be a deallocation too when the object is destroyed.
Conclusion
For small arrays, whose lengths are known at compile time, use of static arrays results in more efficient allocation and deallocation.
Note that I am only considering allocation here. If allocation is a relatively insignificant portion of the time spent working with the object then this performance characteristic may not matter. For instance, suppose the array is allocated at program startup, and then used repeatedly for the duration of the program. In such a scenario the access times dominate the allocation times and the difference between allocation times becomes insignificant.
On the flip side, imagine a short function called repeatedly during the programs lifetime, let's suppose this function is the performance bottleneck. If it operates on a small array, then it is possible that the allocation cost of using a dynamic array could be significant.
Very seldom can you draw hard and fast rules with performance. You need to understand how the tools work, and understand how your program uses these tools. You can then form opinions on which coding strategies might perform best, opinions that you should then test by profiling. You will be surprised more often than you might expect that your intuition is not a good predictor of performance.

Can .each have different GC impacts compared to .map?

Does one of these two garbage collect differently, or at they the same?
array_of_strings.each do |str|
MailerClass.some_template(str).deliver_later
end
vs.
array_of_strings.map do |str|
MailerClass.some_template(str).deliver_later
end
I've formatted the question for mailer objects and task-queue stuff because it's what I have been working on recently, but if this will unduly complicate answers then I can reword that to be more generic.
context, as relevant:
I considered that .map would return an object with a pointed to each
Object that was created, and that .each would not. [this seems to
not be the case]
I have a worker machine that leaks memory until crashing — all
happening in one event in which this pseudo-code example is the
central action. When I changed it from .each to .map then the
memory leak stopped, but I don't know how that GC is happening in
regards to these methods.

Since map returns an Array of the results of evaluating the block with each member of array_of_strings, there is going to be at least one extra object to be collected.

Depending on which exact Ruby implementation you use, the implementation of Enumerable#map may be written in Ruby, Smalltalk, RPython, ECMAScript, C#, Java, Go, Objective-C, C++, C, or any number of other languages, but its workings are always going to look like this:
module Enumerable
def map
return enum_for(__method__) unless block_given?
[].tap {|result| each {|el| result << yield el }}
end
end
map must call each, there's simply no other way for it to iterate. So, it will allocate the exact same objects that each would allocate plus the additional result Array, i.e. exactly one more object than each would.
Actually, many Ruby implementations override Enumerable#map with Array#map which is more efficient than the generic Enumerable version. They usually iterate over the Array elements directly, without calling each, kind of like this:
class Array
def map
return enum_for(__method__) unless block_given?
result = []
i = -1
while i += 1 <= size
result << yield #__entries__[i]
end
end
end
This is more efficient than the generic Enumerable#map, since it uses special knowledge about how to iterate the Array in order to avoid the method call overhead to each. Note, however, that each would be implemented in the exact same way, unless the implementors are incredibly stupid, so the end result is the same: map will still allocate exactly one object more than each, it's just saving one method call.
Note that, since you ignore the return value of map, the extra object becomes eligible for garbage collection immediately. However, until then, it holds references to all the objects returned by the deliver_later method, whatever they are.
So, the differences between the two pieces of code are:
with each, the return value of deliver_later is ignored and becomes eligible for garbage collection immediately
with map, there is an additional Array allocated which keeps references to the return values of all the deliver_later calls inside the block, and thus keeps those objects alive until after the call to map returns, then, the Array and thus all the objects it references will become eligible for garbage collection at the same time

Does erlang implement record copy-and-modify in any clever way?

given:
-record(foo, {a, b, c}).
I do something like this:
Thing = #foo{a={1,2}, b={3,4}, c={5,6}},
Thing1 = Thing#foo{a={7,8}}.
From a semantic view, Thing and Thing1 are unique entities. However, from a language implementation standpoint, making a full copy of Thing to generate Thing1 would be intensely wasteful. For example, if the record were a megabyte in size and I made a thousand "copies," each modifying a couple of bytes, I've just burned a gigabyte. If the internal structure kept track of a representation of the parent structure and each derivative marked up that parent in a way that indicated its own change but preserved everyone elses' versions, the derivates could be created with a minimum of memory overhead.
My question is this: is erlang doing anything clever - internally - to keep the overhead of the usual erlang scribble;
Thing = #ridiculously_large_record,
Thing1 = make_modified_copy(Thing),
Thing2 = make_modified_copy(Thing1),
Thing3 = make_modified_copy(Thing2),
Thing4 = make_modified_copy(Thing3),
Thing5 = make_modified_copy(Thing4)
...to a minimum?
I ask because there would be a number of changes to the way that I did cross-process communications if this were the case.

The exact workings of the garbage collection and memory allocation is only known to a few. Thankfully, they are very happy to share their knowledge and the following is based on what I have learnt from the erlang-questions mailing list and by discussing with OTP developers.
When messaging between processes, the content is always copied as there is no shared heap between processes. The only exception is binaries bigger than 64 bytes, where only a reference is copied.
When executing code in one process, only parts are updated. Let's analyze tuples, as that is the example you provided.
A tuple is actually a structure that keeps references to the actual data somewhere on the heap (except for small integers and maybe one more data type which I can't remember). When you update a tuple, using for example setelement/3, a new tuple is created with the given element replaced, however for all other elements only the reference is copied. There is one exception which I have never been able to take advantage of.
The garbage collector keeps track of each tuple and understands when it is safe to reclaim any tuple that is no longer used. It might be that the data referenced by the tuple is still in use, in which case the data itself is not collected.
As always, Erlang gives you some tools to understand exactly what is going on. The efficiency guide details how to use erts_debug:size/1 and erts_debug:flat_size/1 to understand the size of the data structure when used internally in a process and when copied. The trace tools also allows you to understand when, what and how much was garbage collected.

The record foo is of arity four (holding four words), but the whole structure is 14 words in size. Any immediate (pids, ports, small integers, atoms, catch and nil) can be stored directly in the tuples array. Any other term which can't fit into a word, such as other tuples, are not stored directly but referenced by boxed pointers (a boxed pointer is an erlang term with a forwarding address to the real eterm ... just internals).
In your case a new tuple of same arity is created and the atom foo and all the pointers are copied from the previous tuple except for index two, a, which points to the new tuple {7,8} which constitutes 3 words. In all 5 + 3 new words are created on the heap and only 3 words are copied from the old tuple the other 9 words are not touched.
Excessively large tuples are not recommended. When updating a tuple, the whole tuple, i.e the array and not the deep content, needs to copied and then updated in other to preserve a persistent data structure. This will also generate increased garbage, forcing the garbage collector to heat up which also hurts performance. The dict and array modules avoids using large tuples for this reason and have a shallow tuple tree instead.

I can definitely verify what people have already pointed out:
a record is just a tuple with the record name as the first element and all the fields just the following tuple element
when an element of a tuple is changed, updating a field in a record in your case, only the top level tuple is new, all the elements are just reused
This works just because we have immutable data. So in your example each time you update a value in a #foo record none of the data in the elements are copied and only a new 4-element tuple (5 words) is created. Erlang will never does a deep copy in this type of operation or when passing arguments in function calls.

In conclusion:
Thing = #foo{a={1,2}, b={3,4}, c={5,6}},
Thing1 = Thing#foo{a={7,8}}.
Here, if Thing is not used again, it will probably be updated in place and copying of the tuple will be avoided, as the Efficiency Guide says. (tuple and record syntax is complied into something like setelement, I think)
Thing = #ridiculously_large_record,
Thing1 = make_modified_copy(Thing),
Thing2 = make_modified_copy(Thing1),
...
Here the tuples are actually copied every time.
I guess that it would be theoretically possible make an interesting optimization to this. If the compiler could perform escape analysis on the return value of make_modified_copy and detect that the only reference to it is the one returned, in could save this information about the function. When it encounter a call the that function it would know that it is safe to modify the return value in place.
This would only be possible to do on inter module calls, because of the code replace feature.
Maybe one day we will have it.

Records in Delphi

some questions about records in Delphi:
As records are almost like classes, why not use only classes instead of records?
In theory, memory is allocated for a record when it is declared by a variable; but, and how is memory released after?
I can understand the utility of pointers to records into a list object, but with Generics Containers (TList<T>), are there need to use pointer yet? if not, how to delete/release each record into a Generic Container? If I wanna delete a specific record into a Generic Container, how to do it?

There are lots of differences between records and classes; and no "Pointer to record" <> "Class". Each has its own pros and cons; one of the important things about software development is to understand these so you can more easily choose the most appropriate for a given situation.
This question is based on a false premise. Records are not almost like classes, in the same way that Integers are not almost like Doubles.
Classes must always be dynamically instantiated, whereas this is a possibility, but not a requirement for records.
Instances of classes (which we call objects) are always passed around by reference, meaning that multiple sections of code will share and act on the same instance. This is something important to remember, because you may unintentionally modify an object as a side-effect; although when done intentionally it's a powerful feature. Records on the other hand are passed by value; you need to explicitly indicate if you're passing them by reference.
Classes do not 'copy as easily as records'. When I say copy, I mean a separate instance duplicating a source. (This should be obvious in light of the value/reference comment above).
Records tend to work very nicely with typed files (because they're so easy to copy).
Records can overlay fields with other fields (case x of/unions)
These were comments on certain situational benefits of records; conversely, there are also situational benefits for classes that I'll not elaborate on.
Perhaps the easiest way to understand this is to be a little pedantic about it. Let's clarify; memory is not really allocated 'when its declared', it's allocated when the variable is in scope, and deallocated when it goes out of scope. So for a local variable, it's allocated just before the start of the routine, and deallocated just after the end. For a class field, it's allocated when the object is created, and deallocated when it's destroyed.
Again, there are pros and cons...
It can be slower and require more memory to copy entire records (as with generics) than to just copy the references.
Passing records around by reference (using pointers) is a powerful technique whereby you can easily have something else modify your copy of the record. Without this, you'd have to pass your record by value (i.e. copy it) receive the changed record as a result, copy it again to your own structures.
Are pointers to records like classes? No, not at all. Just two of the differences:
Classes support polymorphic inheritance.
Classes can implement interfaces.

For 1 and 2: records are value types, while classes are reference types. They're allocated on the stack, or directly in the memory space of any larger variable that contains them, instead of through a pointer, and automatically cleaned up by the compiler when they go out of scope.
As for your third question, a TList<TMyRecord> internally declares an array of TMyRecord for storage space. All the records in it will be cleaned up when the list is destroyed. If you want to delete a specific one, use the Delete method to delete by index, or the Remove method to find and delete. But be aware that since it's a value type, everything you do will be making copies of the record, not copying references to it.

One of the main benefits of records is, when you have a large "array of record". This is created in memory by allocating space for all records in one contiguous RAM space, which is extremely fast. If you had used "array of TClass" instead, each object in the array would have to be allocated by itself, which is slow.
There has been a lot of work to improve the speed of allocating memory, in order to improve the speed of strings and objects, but it will never be as fast as replacing 100,000 memory allocations with 1 memory allocation.
However, if you use array of record, don't copy the record around in local variables. That may easily kill the speed benefit.

1) To allow for inheritance and polymorphism, classes have some overhead. Records do not allow them, and in some situations may be somewhat faster and simpler to use. Unlike classes, that are always allocated in the heap and managed through references, records can be allocated on the stack also, accessed directly, and assigned each other without requiring to call an "Assign" method.
Also records are useful to access memory blocks with a given structure, because their memory layout is exactly how you define it. A class instance memory layout is controlled by the compiler and has additional data to make objects work (i.e. the pointer to the Virtual Method Table).
2) Unless you allocate records dynamically, using New() or GetMem(), record's memory is managed by the compiler as ordinals, floats or static arrays: global variables memory is allocated at startup and released when the program terminates, and local variables are allocated on the stack entering a function/procedure/method and released exiting. Allocating/releasing memory in the stack is faster because it doesn't require calls to the memory manager, it's just very few assembler instructions to change the stack registers. But be aware that allocating large structure on the stack may cause a stack overflow, because the maximum stack size is fixed and not very large (see linker options).
If records are fields of a class, they are allocated when the class is created and released when the class is freed.
3) One of the advantages of generics is to eliminate the need of low-level pointer management - but be aware of the inner workings.

There are a few other differences between a class and a record. Classes can use polymorphism, and expose interfaces. Records can not implement destructors (although since Delphi 2006 they can now implement constructors and methods).
Records are very useful in segmenting memory into a more logical structure since the first data item in the record is at the same address point of the pointer to the record itself. This is not the case for classes.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart