When this structure is allocated, how much memory is wasted?

I'm building an index tree based on a bplus tree in Rust and have so far used a definition for parent nodes as:
struct Parent<T: std::fmt::Debug> {
    subtree_count: [usize; Arity],
    children: [*mut ParentOrLeaf<T>; Arity],
    used: usize,
}
On my 64-bit computer with Arity = 8 that works out to a total memory requirement of 136 bytes. I'm using std::alloc::Layout::new and std::alloc::alloc to allocate this structure. But I'm worried that, since 136 is just slightly larger than a power of two (136 > 128), the malloc library will end up allocating 256 bytes for this data structure instead of just 136. Since this is a container type, wasting half the memory allocated is unacceptable.
std::alloc::Layout::new::<Parent<T>>().align() reports an alignment of 8, as expected.
How much memory will this structure actually take up when it is allocated?
If that much memory is wasted, I could change subtree_count: [usize; Arity] to subtree_count: [usize; Arity-1], which would bring the total size down to 128 bytes, and then redo all of the optimized logic of my library to handle the change. But before I do, I want to make sure that is actually necessary.
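A quick sanity check of that smaller layout (a sketch with illustrative names and a placeholder leaf type, assuming Arity = 8 on a 64-bit target) does come out to 128 bytes:

const ARITY: usize = 8;

// Placeholder leaf type, only here so the pointer type is concrete.
struct Leaf;

// Same shape as Parent, but with one fewer subtree_count entry.
struct SlimParent {
    subtree_count: [usize; ARITY - 1], // 7 * 8 = 56 bytes
    children: [*mut Leaf; ARITY],      // 8 * 8 = 64 bytes
    used: usize,                       //          8 bytes
}

fn main() {
    // 56 + 64 + 8 = 128 and the alignment stays 8, so no padding is added.
    println!("size {}", std::mem::size_of::<SlimParent>());   // 128
    println!("align {}", std::mem::align_of::<SlimParent>()); // 8
}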

If the size is 136, this means that a contiguous allocation of many structs in an array or a vector will use exactly 136 bytes per struct.
When it comes to individually allocating structs, the amount of wasted space depends only on the underlying malloc() strategy; it is not a property of the allocated type.
For example, a quick and dirty adaptation of your example on my stable-x86_64-unknown-linux-gnu platform gives this:
size 136
align 8
arr delta1 136
arr delta2 136
box delta1 144
box delta2 144
Of course there is no guarantee that the three allocated structs end up near each other, but in this specific case they do, and the wasted space (not really wasted, but used by the allocator itself) is 8 bytes per allocation.
struct ParentOrLeaf<T: std::fmt::Debug> {
    value: Option<T>,
}

const Arity: usize = 8;

struct Parent<T: std::fmt::Debug> {
    subtree_count: [usize; Arity],
    children: [*mut ParentOrLeaf<T>; Arity],
    used: usize,
}

fn main() {
    type P = Parent<i32>;
    let l = std::alloc::Layout::new::<P>();
    println!("size {}", l.size());
    println!("align {}", l.align());
    let ptr: *mut ParentOrLeaf<i32> = std::ptr::null_mut();
    let arr = [
        P {
            subtree_count: [0; Arity],
            children: [ptr; Arity],
            used: 0,
        },
        P {
            subtree_count: [0; Arity],
            children: [ptr; Arity],
            used: 0,
        },
        P {
            subtree_count: [0; Arity],
            children: [ptr; Arity],
            used: 0,
        },
    ];
    let a0 = &arr[0] as *const P as usize;
    let a1 = &arr[1] as *const P as usize;
    let a2 = &arr[2] as *const P as usize;
    println!("arr delta1 {}", a1 - a0);
    println!("arr delta2 {}", a2 - a1);
    let p0 = Box::new(P {
        subtree_count: [0; Arity],
        children: [ptr; Arity],
        used: 0,
    });
    let p1 = Box::new(P {
        subtree_count: [0; Arity],
        children: [ptr; Arity],
        used: 0,
    });
    let p2 = Box::new(P {
        subtree_count: [0; Arity],
        children: [ptr; Arity],
        used: 0,
    });
    let a0 = p0.as_ref() as *const P as usize;
    let a1 = p1.as_ref() as *const P as usize;
    let a2 = p2.as_ref() as *const P as usize;
    println!("box delta1 {}", a1 - a0);
    println!("box delta2 {}", a2 - a1);
}
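If you want to see what the allocator actually reserves, rather than inferring it from pointer deltas, glibc can be asked directly. A minimal sketch, assuming Linux/glibc, the default system allocator (which forwards a request like this to malloc), and the libc crate; malloc_usable_size is glibc-specific, so this is only a diagnostic, not portable code:

use std::alloc::{alloc, dealloc, Layout};

fn main() {
    // Same size/alignment as Parent<T> with Arity = 8 on a 64-bit target.
    let layout = Layout::from_size_align(136, 8).unwrap();
    unsafe {
        let p = alloc(layout);
        assert!(!p.is_null());
        // glibc reports how many bytes actually back this block; on typical
        // glibc versions this is close to 136, nowhere near the feared 256.
        let usable = libc::malloc_usable_size(p as *mut libc::c_void);
        println!("requested 136, usable {}", usable);
        dealloc(p, layout);
    }
}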

Related

Is a Fortran subroutine with a dummy-argument-sized array thread safe?

The following code compiles in gfortran, with a warning about large_array being larger than the limit for a stack variable, stating that the array will be moved to static memory and is therefore not threadsafe:
subroutine stack_size_warning
  implicit none
  real :: large_array(65536)
  print *, large_array
end subroutine stack_size_warning
This subroutine, however, compiles with no errors or warnings, and I can call it with n values larger than 65536 without issue, at least in simple cases.
subroutine no_warning(n)
  implicit none
  integer :: n
  real :: automatic_array(n)
  print *, automatic_array
end subroutine no_warning
Is this second array thread safe? Where is the memory for automatic_array allocated in this second subroutine? Is the memory allocated and deallocated on every call, making it slower than if it were on the stack or if a preallocated array were passed in as a dummy argument?
I wrote the following program to test three scenarios: a subroutine with a small array on the stack, another with a large array over the stack limit and thus stored in static memory, and a third where a dummy argument specifies the size of an array defined inside the routine.
Here is that program:
program main
  implicit none
  call small
  call large
  call automatic(65536)
end program main

subroutine small
  implicit none
  real :: small_array(10)
  small_array = 1.
  print *, small_array
end subroutine small

subroutine large
  implicit none
  real :: large_array(65536)
  large_array = 1.
  print *, large_array
end subroutine large

subroutine automatic(n)
  implicit none
  integer :: n
  real :: automatic_array(n)
  automatic_array = 1.
  print *, automatic_array
end subroutine automatic
Using Steve's recommendation, I compiled with a tree dump as follows:
gfortran array_dim_test.f90 -o array_dim_test -fdump-tree-original
The full dump is at the end, but to summarize what I see: the automatic subroutine has a try/finally block. In the try block, a call to malloc allocates the memory, and in the finally block the memory is freed. So this memory is allocated and deallocated on the heap with every call to the subroutine. That makes intuitive sense, since the array lives only in the subroutine and its size is only known at the call, but it is interesting to see the explicit calls in the tree dump. This would appear to be thread safe, then, but perhaps also not the most efficient approach if the routine is called many times with the same array size, since memory is allocated and deallocated on every call.
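If that repeated malloc/free ever becomes a problem, the alternative raised in the question, passing a preallocated array in as a dummy argument, keeps the allocation out of the call entirely. A minimal sketch (my addition, not part of the original test program):

subroutine preallocated(n, work)
  implicit none
  integer, intent(in) :: n
  real, intent(inout) :: work(n)
  work = 1.
  print *, work
end subroutine preallocated

! The caller allocates the buffer once and reuses it across calls:
!   real, allocatable :: buffer(:)
!   allocate(buffer(65536))
!   call preallocated(65536, buffer)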
Here is the tree dump:
__attribute__((fn spec (". w ")))
void automatic (integer(kind=4) & restrict n)
{
void * restrict D.3964;
integer(kind=8) ubound.0;
integer(kind=8) size.1;
real(kind=4)[0:D.3961] * restrict automatic_array;
integer(kind=8) D.3961;
bitsizetype D.3962;
sizetype D.3963;
try
{
ubound.0 = (integer(kind=8)) *n;
size.1 = NON_LVALUE_EXPR <ubound.0>;
size.1 = MAX_EXPR <size.1, 0>;
D.3961 = size.1 + -1;
D.3962 = (bitsizetype) (sizetype) NON_LVALUE_EXPR <size.1> * 32;
D.3963 = (sizetype) NON_LVALUE_EXPR <size.1> * 4;
D.3964 = (void * restrict) __builtin_malloc (MAX_EXPR <(unsigned long) (size.1 * 4), 1>);
automatic_array = (real(kind=4)[0:D.3961] * restrict) D.3964;
{
integer(kind=8) D.3940;
D.3940 = ubound.0;
{
integer(kind=8) S.2;
S.2 = 1;
while (1)
{
if (S.2 > D.3940) goto L.1;
(*automatic_array)[S.2 + -1] = 1.0e+0;
S.2 = S.2 + 1;
}
L.1:;
}
}
{
struct __st_parameter_dt dt_parm.3;
dt_parm.3.common.filename = &"array_dim_test.f90"[1]{lb: 1 sz: 1};
dt_parm.3.common.line = 27;
dt_parm.3.common.flags = 128;
dt_parm.3.common.unit = 6;
_gfortran_st_write (&dt_parm.3);
{
integer(kind=8) D.3944;
struct array01_real(kind=4) parm.4;
D.3944 = ubound.0;
parm.4.span = 4;
parm.4.dtype = {.elem_len=4, .rank=1, .type=3};
parm.4.dim[0].lbound = 1;
parm.4.dim[0].ubound = D.3944;
parm.4.dim[0].stride = 1;
parm.4.data = (void *) &(*automatic_array)[0];
parm.4.offset = -1;
_gfortran_transfer_array_write (&dt_parm.3, &parm.4, 4, 0);
}
_gfortran_st_write_done (&dt_parm.3);
}
}
finally
{
__builtin_free ((void *) automatic_array);
}
}
__attribute__((fn spec (". ")))
void large ()
{
static real(kind=4) large_array[65536];
{
integer(kind=8) S.5;
S.5 = 1;
while (1)
{
if (S.5 > 65536) goto L.2;
large_array[S.5 + -1] = 1.0e+0;
S.5 = S.5 + 1;
}
L.2:;
}
{
struct __st_parameter_dt dt_parm.6;
dt_parm.6.common.filename = &"array_dim_test.f90"[1]{lb: 1 sz: 1};
dt_parm.6.common.line = 19;
dt_parm.6.common.flags = 128;
dt_parm.6.common.unit = 6;
_gfortran_st_write (&dt_parm.6);
{
struct array01_real(kind=4) parm.7;
parm.7.span = 4;
parm.7.dtype = {.elem_len=4, .rank=1, .type=3};
parm.7.dim[0].lbound = 1;
parm.7.dim[0].ubound = 65536;
parm.7.dim[0].stride = 1;
parm.7.data = (void *) &large_array[0];
parm.7.offset = -1;
_gfortran_transfer_array_write (&dt_parm.6, &parm.7, 4, 0);
}
_gfortran_st_write_done (&dt_parm.6);
}
}
__attribute__((fn spec (". ")))
void small ()
{
real(kind=4) small_array[10];
{
integer(kind=8) S.8;
S.8 = 1;
while (1)
{
if (S.8 > 10) goto L.3;
small_array[S.8 + -1] = 1.0e+0;
S.8 = S.8 + 1;
}
L.3:;
}
{
struct __st_parameter_dt dt_parm.9;
dt_parm.9.common.filename = &"array_dim_test.f90"[1]{lb: 1 sz: 1};
dt_parm.9.common.line = 12;
dt_parm.9.common.flags = 128;
dt_parm.9.common.unit = 6;
_gfortran_st_write (&dt_parm.9);
{
struct array01_real(kind=4) parm.10;
parm.10.span = 4;
parm.10.dtype = {.elem_len=4, .rank=1, .type=3};
parm.10.dim[0].lbound = 1;
parm.10.dim[0].ubound = 10;
parm.10.dim[0].stride = 1;
parm.10.data = (void *) &small_array[0];
parm.10.offset = -1;
_gfortran_transfer_array_write (&dt_parm.9, &parm.10, 4, 0);
}
_gfortran_st_write_done (&dt_parm.9);
}
}
__attribute__((fn spec (". ")))
void MAIN__ ()
{
small ();
large ();
{
static integer(kind=4) C.3993 = 65536;
automatic (&C.3993);
}
}
__attribute__((externally_visible))
integer(kind=4) main (integer(kind=4) argc, character(kind=1) * * argv)
{
static integer(kind=4) options.11[7] = {2116, 4095, 0, 1, 1, 0, 31};
_gfortran_set_args (argc, argv);
_gfortran_set_options (7, &options.11[0]);
MAIN__ ();
return 0;
}

How would I generate register-based virtual machine code from a binary tree for math interpretation?

My code is written in Dart, but the question is more generally about the binary tree data structure and a register-based VM implementation. I have commented the code so you can follow it even if you do not know Dart.
So, here are my nodes:
enum NodeType {
  numberNode,
  addNode,
  subtractNode,
  multiplyNode,
  divideNode,
  plusNode,
  minusNode,
}
NumberNode holds a numeric value.
AddNode, SubtractNode, MultiplyNode, and DivideNode are really just binary operator nodes.
PlusNode and MinusNode are unary operator nodes.
The tree is generated based on order of operations: unary operators first, then multiplication and division, and then addition and subtraction. E.g. "1 + 2 * -3" becomes "(1 + (2 * (-3)))".
Here is my code that tries to walk the AST:
/// Converts tree to Register-based VM code
List<Opcode> convertNodeToCode(Node node) {
List<Opcode> result = [const Opcode(OpcodeKind.loadn, 2, -1)];
bool counterHasBeenZero = false;
bool binOpDebounce = false;
int counter = 0;
List<Opcode> convert(Node node) {
switch (node.nodeType) {
case NodeType.numberNode:
counter = counter == 0 ? 1 : 0;
if (counter == 0 && !counterHasBeenZero) {
counterHasBeenZero = true;
} else {
counter = 1;
}
return [Opcode(OpcodeKind.loadn, counter, (node as NumberNode).value)];
case NodeType.addNode:
var aNode = node as AddNode;
return convert(aNode.nodeA) +
convert(aNode.nodeB) +
[
const Opcode(
OpcodeKind.addn,
0,
1,
)
];
case NodeType.subtractNode:
var sNode = node as SubtractNode;
var result = convert(sNode.nodeA) +
convert(sNode.nodeB) +
(binOpDebounce
? [
const Opcode(
OpcodeKind.subn,
0,
0,
1,
)
]
: [
const Opcode(
OpcodeKind.subn,
0,
1,
)
]);
if (!binOpDebounce) binOpDebounce = true;
return result;
case NodeType.multiplyNode:
var mNode = node as MultiplyNode;
var result = convert(mNode.nodeA) +
convert(mNode.nodeB) +
(binOpDebounce
? [
const Opcode(
OpcodeKind.muln,
0,
0,
1,
)
]
: [
const Opcode(
OpcodeKind.muln,
0,
1,
)
]);
if (!binOpDebounce) binOpDebounce = true;
return result;
case NodeType.divideNode:
var dNode = node as DivideNode;
var result = convert(dNode.nodeA) +
convert(dNode.nodeB) +
(binOpDebounce
? [
const Opcode(
OpcodeKind.divn,
0,
0,
1,
)
]
: [
const Opcode(
OpcodeKind.divn,
0,
1,
)
]);
if (!binOpDebounce) binOpDebounce = true;
return result;
case NodeType.plusNode:
return convert((node as PlusNode).node);
case NodeType.minusNode:
return convert((node as MinusNode).node) +
[Opcode(OpcodeKind.muln, 1, 2)];
default:
throw Exception('Non-existent node type');
}
}
return result + convert(node) + [const Opcode(OpcodeKind.exit)];
}
I tried a method that just uses 2-3 registers and a counter to track which register I loaded the number into, but the code gets ugly real quick, and once order of operations is involved it gets really hard to track where the numbers are with the counter. Basically, the way I tried to make this work is to store each number in register 1 or 0, loading it when needed, and add the registers together into register 0. For example, 1 + 2 + 3 + 4 becomes [r2 = -1.0, r1 = 1.0, r0 = 2.0, r0 = r1 + r0, r1 = 3.0, r0 = r1 + r0, r1 = 4.0, r0 = r1 + r0, exit]. When I tried this with multiplication, though, it became very hard: the wrong numbers get multiplied, possibly because of the order of operations.
I tried to see if this way could be done as well:
// (1 + (2 * ((-2) + 3) * 5))
const code = [
  // (-2)
  Opcode(OpcodeKind.loadn, 1, -2), // r1 = -2;
  // (2 + 3)
  Opcode(OpcodeKind.loadn, 1, 2), // r1 = 2;
  Opcode(OpcodeKind.loadn, 2, 3), // r2 = 3;
  Opcode(OpcodeKind.addn, 2, 1, 2), // r2 = r1 + r2;
  // (2 * (result) * 5)
  Opcode(OpcodeKind.loadn, 1, 2), // r1 = 2;
  Opcode(OpcodeKind.loadn, 3, 5), // r3 = 5;
  Opcode(OpcodeKind.muln, 2, 1, 2), // r2 = r1 * r2;
  Opcode(OpcodeKind.muln, 2, 2, 3), // r2 = r2 * r3;
  // (1 + (result))
  Opcode(OpcodeKind.loadn, 1, 1), // r1 = 1;
  Opcode(OpcodeKind.addn, 1, 1, 2), // r1 = r1 + r2;
  Opcode(OpcodeKind.exit), // exit halt
];
I knew this method would not work, because if I'm going to iterate through the nodes I need to know the positions of the numbers and registers beforehand, so I'd have to find another way to locate the number/register.
You don't need to read all of the above; those were just my attempts to produce register-based virtual machine code.
I want to see how you would approach it or how you would build it.
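For what it's worth, the usual way around the counter juggling is to treat the register index itself as an expression stack: every subexpression writes its result into the next free register, and a binary operator combines two adjacent registers and frees the upper one. A rough sketch of that idea with minimal stand-in types (the real Opcode/NodeType classes from the question differ, so treat this purely as an illustration of the technique):

// Stand-in instruction and AST types for the sketch.
enum Op { loadn, addn, subn, muln, divn, exit }

class Instr {
  final Op op;
  final List<num> args;
  const Instr(this.op, [this.args = const []]);
  @override
  String toString() => '$op ${args.join(' ')}';
}

abstract class Node {}

class NumberNode extends Node {
  final double value;
  NumberNode(this.value);
}

class BinOpNode extends Node {
  final Op op;
  final Node a, b;
  BinOpNode(this.op, this.a, this.b);
}

class MinusNode extends Node {
  final Node inner;
  MinusNode(this.inner);
}

List<Instr> compile(Node root) {
  final code = <Instr>[];
  var next = 0; // index of the next free register

  // Emits code for n and returns the register holding its result.
  // Invariant: when emit(n) returns r, every register above r is free again.
  int emit(Node n) {
    if (n is NumberNode) {
      final r = next++;
      code.add(Instr(Op.loadn, [r, n.value]));
      return r;
    }
    if (n is MinusNode) {
      final r = emit(n.inner);
      final scratch = next++;
      code.add(Instr(Op.loadn, [scratch, -1]));
      code.add(Instr(Op.muln, [r, r, scratch]));
      next = r + 1; // scratch is free again
      return r;
    }
    if (n is BinOpNode) {
      final ra = emit(n.a);
      final rb = emit(n.b); // rb == ra + 1 thanks to the invariant
      code.add(Instr(n.op, [ra, ra, rb]));
      next = ra + 1; // rb and everything above it is free again
      return ra;
    }
    throw StateError('unknown node type');
  }

  emit(root);
  code.add(const Instr(Op.exit));
  return code;
}

void main() {
  // 1 + 2 * -3  ==>  (1 + (2 * (-3)))
  final ast = BinOpNode(Op.addn, NumberNode(1),
      BinOpNode(Op.muln, NumberNode(2), MinusNode(NumberNode(3))));
  compile(ast).forEach(print);
}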

How to convert Swift [Data] to char **

I'm currently trying to port my Java Android library to Swift. In my Android library I'm using a JNI wrapper for Jerasure to call the following C method:
int jerasure_matrix_decode(int k, int m, int w, int *matrix, int row_k_ones, int *erasures, char **data_ptrs, char **coding_ptrs, int size)
I have to admit that I'm relatively new to Swift, so some of my assumptions might be wrong. In my Java code, char **data_ptrs and char **coding_ptrs are actually two-dimensional arrays (e.g. byte[][] dataShard = new byte[3][1400]). These two-dimensional arrays contain the actual video stream data. In my Swift library I store my video stream data in a [Data] array, so the question is: what is the correct way to convert the [Data] array to the C char ** type?
I already tried some things, but none of them worked. Currently I have the following conversion logic, which gives me an UnsafeMutablePointer<UnsafeMutablePointer<Int8>?>? pointer (data = [Data]):
let ptr1 = ptrFromAddress(p: &data)
ptr1.withMemoryRebound(to: UnsafeMutablePointer<Int8>?.self, capacity: data.count) { pp in
    // here pp is UnsafeMutablePointer<UnsafeMutablePointer<Int8>?>?
}

func ptrFromAddress<T>(p: UnsafeMutablePointer<T>) -> UnsafeMutablePointer<T> {
    return p
}
The expected result would be that jerasure restores the missing data shards of my [Data] array when jerasure_matrix_decode is called, but instead it completely messes up my [Data] array, and accessing it results in EXC_BAD_ACCESS. So I expect this is completely the wrong way.
The documentation in the jerasure.h header file says the following about data_ptrs:
data_ptrs = An array of k pointers to data which is size bytes
Edit:
The jerasure library defines data_ptrs like this:
#define talloc(type, num) (type *) malloc(sizeof(type)*(num))

char **data;
data = talloc(char *, k);
for (i = 0; i < k; i++) {
    data[i] = talloc(char, sizeof(long)*w);
}
So what is the best option for calling the jerasure_matrix_decode method from Swift? Should I use something different from [Data]?
Possible similar question:
How to create a UnsafeMutablePointer<UnsafeMutablePointer<UnsafeMutablePointer<Int8>>>
A possible solution could be to allocate appropriate memory and fill it with the data.
Alignment
The equivalent of the C code's char ** would be UnsafeMutablePointer<UnsafeMutablePointer<CChar>?> on the Swift side.
In the definition of data_ptrs that you show in your question, we see that each data block is to be allocated with malloc.
A property of C malloc is that it does not know which pointer type it will eventually be cast into. Therefore, it guarantees strictest memory alignment:
The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated).
see https://port70.net/~nsz/c/c11/n1570.html#7.22.3
Particularly performance-critical C routines often do not operate byte by byte, but cast to larger numeric types or use SIMD.
So, depending on your internal C library implementation, allocating with UnsafeMutablePointer<CChar>.allocate(capacity: columns) could be problematic, because
UnsafeMutablePointer provides no automated memory management or alignment guarantees.
see https://developer.apple.com/documentation/swift/unsafemutablepointer
The alternative could be to use UnsafeMutableRawPointer with an alignment parameter. You can use MemoryLayout<max_align_t>.alignment to find out the maximum alignment constraint.
Populating Data
An UnsafeMutablePointer<CChar> would have the advantage that we could use pointer arithmetic. This can be achieved by converting the UnsafeMutableRawPointer to an OpaquePointer and then to an UnsafeMutablePointer. In the code it would then look like this:
let colDataRaw = UnsafeMutableRawPointer.allocate(byteCount: cols, alignment: MemoryLayout<max_align_t>.alignment)
let colData = UnsafeMutablePointer<CChar>(OpaquePointer(colDataRaw))
for x in 0..<cols {
    colData[x] = CChar(bitPattern: dataArray[y][x])
}
Complete Self-contained Test Program
Your library will probably have certain requirements for the data (e.g. supported matrix dimensions), which I don't know. These must be taken into account, of course. But for a basic technical test we can create an independent test program.
#include <stdio.h>
#include "matrix.h"

void some_matrix_operation(int rows, int cols, char **data_ptrs) {
    printf("C side:\n");
    for (int y = 0; y < rows; y++) {
        for (int x = 0; x < cols; x++) {
            printf("%02d ", (unsigned char)data_ptrs[y][x]);
            data_ptrs[y][x] += 100;
        }
        printf("\n");
    }
    printf("\n");
}
It simply prints the bytes and adds 100 to each byte to be able to verify that the changes arrive on the Swift side.
The corresponding header must be included in the bridging header and looks like this:
#ifndef matrix_h
#define matrix_h
void some_matrix_operation(int rows, int cols, char **data_ptrs);
#endif /* matrix_h */
On the Swift side, we can put everything in a class called Matrix:
import Foundation

class Matrix: CustomStringConvertible {
    let rows: Int
    let cols: Int
    let dataPtr: UnsafeMutablePointer<UnsafeMutablePointer<CChar>?>

    init(dataArray: [Data]) {
        guard !dataArray.isEmpty && !dataArray[0].isEmpty else { fatalError("empty data not supported") }
        self.rows = dataArray.count
        self.cols = dataArray[0].count
        self.dataPtr = Self.copyToCMatrix(rows: rows, cols: cols, dataArray: dataArray)
    }

    deinit {
        for y in 0..<rows {
            dataPtr[y]?.deallocate()
        }
        dataPtr.deallocate()
    }

    var description: String {
        var desc = ""
        for data in dataArray {
            for byte in data {
                desc += "\(byte) "
            }
            desc += "\n"
        }
        return desc
    }

    var dataArray: [Data] {
        var array = [Data]()
        for y in 0..<rows {
            if let ptr = dataPtr[y] {
                array.append(Data(bytes: ptr, count: cols))
            }
        }
        return array
    }

    private static func copyToCMatrix(rows: Int, cols: Int, dataArray: [Data]) -> UnsafeMutablePointer<UnsafeMutablePointer<CChar>?> {
        let dataPtr = UnsafeMutablePointer<UnsafeMutablePointer<CChar>?>.allocate(capacity: rows)
        for y in 0..<rows {
            let colDataRaw = UnsafeMutableRawPointer.allocate(byteCount: cols, alignment: MemoryLayout<max_align_t>.alignment)
            let colData = UnsafeMutablePointer<CChar>(OpaquePointer(colDataRaw))
            dataPtr[y] = colData
            for x in 0..<cols {
                colData[x] = CChar(bitPattern: dataArray[y][x])
            }
        }
        return dataPtr
    }
}
You can call it as shown here:
let example: [[UInt8]] = [
    [126, 127, 128, 129],
    [130, 131, 132, 133],
    [134, 135, 136, 137]
]
let dataArray = example.map { Data($0) }
let matrix = Matrix(dataArray: dataArray)

print("before on Swift side:")
print(matrix)

some_matrix_operation(Int32(matrix.rows), Int32(matrix.cols), matrix.dataPtr)

print("afterwards on Swift side:")
print(matrix)
Test Result
The test output is as follows and shows the expected behavior.
before on Swift side:
126 127 128 129
130 131 132 133
134 135 136 137
C side:
126 127 128 129
130 131 132 133
134 135 136 137
afterwards on Swift side:
226 227 228 229
230 231 232 233
234 235 236 237

Understanding F# memory consumption

I've been toying around with F# lately and wrote the little snippet below; it just creates a number of randomized 3D vectors, puts them into a list, maps each vector to its length, and sums up all those values.
Running the program (as a release build .exe, not interactive), the binary consumes in this particular case (10 million vectors) roughly 550 MB of RAM. One Vec3 object should account for 12 bytes (or 16, assuming some alignment takes place). Even if you do the rough math with 32 bytes to account for some book-keeping overhead (bytes per object * 10 million / 1024 / 1024), you're still 200 MB off the actual consumption. Naively I'd assume to end up with 10 million * 4 bytes per single, since the Vec3 objects are 'mapped away'.
My guess so far: either I keep one or several copies of my list somewhere without being aware of it, or some intermediate results never get garbage collected. I can't imagine that inheriting from System.Object brings in so much overhead.
Could someone point me in the right direction with this?
Thanks in advance.
type Vec3(x: single, y: single, z: single) =
    let mag = sqrt(x*x + y*y + z*z)
    member self.Magnitude = mag
    override self.ToString() = sprintf "[%f %f %f]" x y z

let how_much = 10000000
let mutable rng = System.Random()
let sw = new System.Diagnostics.Stopwatch()
sw.Start()

let random_vec_iter len =
    let mutable result = []
    for x = 1 to len do
        let mutable accum = []
        for i = 1 to 3 do
            accum <- single(rng.NextDouble())::accum
        result <- Vec3(accum.[0], accum.[1], accum.[2])::result
    result

let sum_len_func = List.reduce (fun (x: single) y -> x + y)
let map_to_mag_func = List.map (fun (x: Vec3) -> x.Magnitude)

[<EntryPoint>]
let main argv =
    printfn "Hello, World"
    let res = sum_len_func (map_to_mag_func (random_vec_iter(how_much)))
    printfn "doing stuff with %i items took %i, result is %f" how_much (sw.ElapsedMilliseconds) res
    System.Console.ReadKey() |> ignore
    0 // return an integer exit code
First, your Vec3 is a reference type, not a value type (not a struct), so you hold a pointer on top of your 12 bytes of payload, plus the object overhead (12 + 16). Then the list is a singly linked list, so that is another 16 bytes of .NET references per element. Finally, your List.map will create an intermediate list.
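For comparison, a minimal sketch (illustrative, not taken from the question) of a struct-based Vec3 combined with an array pipeline, which avoids the object header, the extra pointer, and the list nodes:

[<Struct>]
type Vec3Struct =
    val X: single
    val Y: single
    val Z: single
    new (x, y, z) = { X = x; Y = y; Z = z }
    member v.Magnitude = sqrt (v.X * v.X + v.Y * v.Y + v.Z * v.Z)

// 10 million of these in an array cost roughly 10,000,000 * 12 bytes of payload
// (plus one array object), instead of a heap object and a list node per vector.
let sumOfMagnitudes (rng: System.Random) count =
    Array.init count (fun _ ->
        Vec3Struct(single (rng.NextDouble()),
                   single (rng.NextDouble()),
                   single (rng.NextDouble())))
    |> Array.sumBy (fun v -> v.Magnitude)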

How to get the size of a struct and its contents in bytes in golang?

I have a struct, say:
type ASDF struct {
    A uint64
    B uint64
    C uint64
    D uint64
    E uint64
    F string
}
I create a slice of that struct: a := []ASDF{}
I do operations on that slice (adding/removing/updating structs that vary in contents). How can I get the total size in bytes (in memory) of the slice and its contents? Is there a built-in way to do this, or do I need to calculate it manually using unsafe.Sizeof plus len on each string?
Sum the size of all memory, excluding garbage collector and other overhead. For example,
package main

import (
    "fmt"
    "unsafe"
)

type ASDF struct {
    A uint64
    B uint64
    C uint64
    D uint64
    E uint64
    F string
}

func (s *ASDF) size() int {
    size := int(unsafe.Sizeof(*s))
    size += len(s.F)
    return size
}

func sizeASDF(s []ASDF) int {
    size := 0
    s = s[:cap(s)]
    size += cap(s) * int(unsafe.Sizeof(s))
    for i := range s {
        size += (&s[i]).size()
    }
    return size
}

func main() {
    a := []ASDF{}
    b := ASDF{}
    b.A = 1
    b.B = 2
    b.C = 3
    b.D = 4
    b.E = 5
    b.F = "ASrtertetetetetetetDF"
    fmt.Println((&b).size())
    a = append(a, b)
    c := ASDF{}
    c.A = 10
    c.B = 20
    c.C = 30
    c.D = 40
    c.E = 50
    c.F = "ASetDF"
    fmt.Println((&c).size())
    a = append(a, c)
    fmt.Println(len(a))
    fmt.Println(cap(a))
    fmt.Println(sizeASDF(a))
}
Output:
69
54
2
2
147
http://play.golang.org/p/5z30vkyuNM
I'm afraid unsafe.Sizeof is the way to go here if you want to get any result at all. The in-memory size of a structure is not something you should rely on. Note that even the result of unsafe.Sizeof is inaccurate: the runtime may add headers to the data, which you cannot observe, to aid with garbage collection.
For your particular example (finding a cache size) I suggest you go with a static size that is sensible for many processors. In almost all cases such micro-optimizations are not going to pay off.
