How to find the biggest variant in an enum in Rust? - memory

I'm trying to improve the performance of a Rust program, which requires me to reduce the size of some large enums. For example:
enum EE {
    A,         // 0
    B(i32),    // 4
    C(i64),    // 8
    D(String), // 24
    E {        // 16
        x: i64,
        y: i32,
    },
}
fn main() {
    println!("{}", std::mem::size_of::<EE>()); // 32
}
This prints 32. But if I try to get the size of EE::A, I get a compile error:
error[E0573]: expected type, found variant `EE::A`
--> src/main.rs:14:40
|
14 | println!("{}", std::mem::size_of::<EE::A>());
| ^^^^^
| |
| not a type
| help: try using the variant's enum: `crate::EE`
error: aborting due to previous error
error: could not compile `play_rust`.
Is there a way to find out which variant takes the most space?

No, there is no way to get the size of just one variant of an enum. The best you can do is get the size of what the variant contains, as if it were a standalone struct:
println!("sizeof EE::A: {}", std::mem::size_of::<()>()); // 0
println!("sizeof EE::B: {}", std::mem::size_of::<i32>()); // 4
println!("sizeof EE::C: {}", std::mem::size_of::<i64>()); // 8
println!("sizeof EE::D: {}", std::mem::size_of::<String>()); // 24
println!("sizeof EE::E: {}", std::mem::size_of::<(i64, i32)>()); // 16
Even this isn't especially useful because it includes padding bytes that may be used to store the tag; as you point out, the size of the enum can be reduced to 16 if D is shrunk to a single pointer, but you can't know that from looking at just the sizes. If y were instead defined as i64, the size of each variant would be the same, but the size of the enum would need to be 24. Alignment is another confounding factor that makes the size of an enum more complex than just "the size of the largest variant plus the tag".
Of course, this is all highly platform-dependent, and your code should not rely on any enum having a particular layout (unless you can guarantee it with a #[repr] annotation).
If you have a particular enum you're worried about, it's not difficult to get the size of each contained type. Clippy also has a lint for enums with extreme size differences between variants. However, I don't recommend using size alone to make manual optimizations to enum layouts, or boxing things that are only a few pointers in size -- indirection suppresses other kinds of optimizations the compiler may be able to do. If you prioritize minimal space usage you may accidentally make your code much slower in the process.
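As a minimal sketch of the approach described above, the payload of each variant can be measured as a standalone type and compared against the whole enum. The helper names `payload_sizes` and `largest_payload` are made up for this illustration, and the tuple `(i64, i32)` is only a stand-in for the fields of `EE::E`; the exact numbers depend on the target.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum EE {
    A,
    B(i32),
    C(i64),
    D(String),
    E { x: i64, y: i32 },
}

/// Size of each variant's payload, measured as if it were a standalone type.
/// These are hand-written stand-ins; the compiler does not expose them.
fn payload_sizes() -> [(&'static str, usize); 5] {
    [
        ("A", size_of::<()>()),
        ("B", size_of::<i32>()),
        ("C", size_of::<i64>()),
        ("D", size_of::<String>()),
        ("E", size_of::<(i64, i32)>()),
    ]
}

/// Name and size of the largest payload.
fn largest_payload() -> (&'static str, usize) {
    payload_sizes()
        .iter()
        .copied()
        .max_by_key(|&(_, size)| size)
        .unwrap()
}

fn main() {
    for (name, size) in payload_sizes().iter() {
        println!("payload of EE::{}: {} bytes", name, size);
    }
    let (name, size) = largest_payload();
    println!("largest payload: EE::{} ({} bytes)", name, size);
    println!("whole enum: {} bytes", size_of::<EE>());
}
```

The enum is never smaller than its largest payload, but how much larger it is depends on where the compiler can place the tag, which is why the payload sizes alone are only a rough guide.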

Related

Would the size of any type always be a multiple of its alignment?

I haven't found a clear statement in the documentation, but in my experiments it has always been true that
bits_of(A) % alignment(A) == 0
In fact, if it were not true, some padding would be required between the elements of an array of that type, so I believe it must be true, but I want to make sure.
This leads me to another question: what is the size of a type?
LLVM IR doesn't provide a standard instruction to get the size of a type; it can only be obtained through a trick like (int)(((T*) NULL) + 1), as described here and there.
However, that is only the difference between adjacent aligned pointers, which is always a multiple of the alignment.
It may not be the size that is actually occupied.
For example, the structure {i8, i32, i8} spans 12 bytes between adjacent aligned pointers, but occupies only 9 bytes once field alignment is taken into account (9 bytes is enough for memory allocation):
i8 | 3bytes padding | i32 | i8
Then which size is the size? Does the concept of size differ between situations and languages?
LLVM permits you to configure alignment in a Module using a data layout. Most data layouts will be as you've seen, but that's not required by LLVM. You can make a module where an int type has 256-bit alignment and 32-bit size, or 32-bit alignment and 256-bit size, and both of those make sense in some situations (consider a 32-bit addressable system with 256-bit L1 cache lines).
I don't want to go into your size question; size is such a pain. IMO the answer to "what's the size of …" varies with the reason for the question, but that's very much IMO.
The distinction is between StoreSize and AllocSize.
Here are examples from the LLVM source:
/// Size examples:
///
/// Type        SizeInBits  StoreSizeInBits  AllocSizeInBits[*]
/// ----        ----------  ---------------  ---------------
///  i1            1           8                8
///  i8            8           8                8
///  i19          19          24               32
///  i32          32          32               32
///  i100        100         104              128
///  i128        128         128              128
///  Float        32          32               32
///  Double       64          64               64
///  X86_FP80     80          80               96
///
/// [*] The alloc size depends on the alignment, and thus on the target.
///     These values are for x86-32 linux.
The AllocSize, defined as the offset in bytes between successive objects, is of course always a multiple of the alignment, but the StoreSize, defined as the maximum number of bytes that may be overwritten by a store, may not be.
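The arithmetic behind the table can be sketched as follows. This is not LLVM code: the function names are invented for the illustration, and the ABI alignments passed in are assumptions chosen to reproduce the x86-32 Linux rows above (in LLVM they come from the module's data layout).

```rust
/// Store size in bits: the bit width rounded up to a whole number of bytes.
fn store_size_in_bits(size_in_bits: u64) -> u64 {
    (size_in_bits + 7) / 8 * 8
}

/// Alloc size in bits: the store size rounded up to a multiple of the
/// ABI alignment. `abi_align_bytes` is an assumed input, not queried
/// from any real data layout.
fn alloc_size_in_bits(size_in_bits: u64, abi_align_bytes: u64) -> u64 {
    let store_bytes = store_size_in_bits(size_in_bits) / 8;
    let alloc_bytes = (store_bytes + abi_align_bytes - 1) / abi_align_bytes * abi_align_bytes;
    alloc_bytes * 8
}

fn main() {
    // (type name, bit width, assumed ABI alignment in bytes)
    let rows = [("i1", 1, 1), ("i19", 19, 4), ("i100", 100, 4), ("X86_FP80", 80, 4)];
    for &(name, bits, align) in rows.iter() {
        println!(
            "{:<9} size={:<4} store={:<4} alloc={}",
            name,
            bits,
            store_size_in_bits(bits),
            alloc_size_in_bits(bits, align)
        );
    }
}
```

With these assumed alignments the alloc size is always a multiple of the alignment, while the store size of i19 (3 bytes) or X86_FP80 (10 bytes) is not.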

Weird case with MemoryLayout using a struct protocol, different size reported

I'm working on a drawing engine using Metal. I am reworking it from a previous version, so I'm starting from scratch.
I was getting the error "Execution of the command buffer was aborted due to an error during execution. Caused GPU Hang Error (IOAF code 3)".
After some debugging I traced the problem to my drawPrimitives routine, and I found the case quite interesting.
I will have a variety of brushes, and all of them will work with specific vertex info.
So I thought, why not have all the brushes respond to a protocol?
The protocol for the vertices will be this:
protocol MetalVertice {}
And the Vertex info used by this specific brush will be:
struct PointVertex: MetalVertice {
    var pointId: UInt32
    let relativePosition: UInt32
}
The brush can be called either with previously created vertices or by calling a function that creates them. Either way, the real drawing happens in the vertices function:
    var vertices: [PointVertex] = [PointVertex].init(
        repeating: PointVertex(pointId: 0, relativePosition: 0),
        count: totalVertices)
    for (verticeIdx, pointIndex) in pointsIndices.enumerated() {
        vertices[verticeIdx].pointId = UInt32(pointIndex)
    }
    for vertice in vertices {
        print("size: \(MemoryLayout.size(ofValue: vertice))")
    }
    self.renderVertices(vertices: vertices,
                        forStroke: stroke,
                        inDrawing: drawing,
                        commandEncoder: commandEncoder)
    return vertices
}
func renderVertices(vertices: [MetalVertice], forStroke stroke: LFStroke, inDrawing drawing:LFDrawing, commandEncoder: MTLRenderCommandEncoder) {
    if vertices.count > 1 {
        print("vertices a escribir: \(vertices.count)")
        print("stride: \(MemoryLayout<PointVertex>.stride)")
        print("size of array \(MemoryLayout.size(ofValue: vertices))")
        for vertice in vertices {
            print("ispointvertex: \(vertice is PointVertex)")
            print("size: \(MemoryLayout.size(ofValue: vertice))")
        }
    }
    let vertexBuffer = LFDrawing.device.makeBuffer(bytes: vertices,
                                                   length: MemoryLayout<PointVertex>.stride * vertices.count,
                                                   options: [])
This was the issue, calling this specific code produces these results in the console:
size: 8
size: 8
vertices a escribir: 2
stride: 8
size of array 8
ispointvertex: true
size: 40
ispointvertex: true
size: 40
In the first function, the size of each vertex is 8 bytes, but for some reason, once they enter the next function they become 40 bytes, so the buffer is constructed incorrectly.
If I change the function signature to:
func renderVertices(vertices: [PointVertex], forStroke stroke: LFStroke, inDrawing drawing:LFDrawing, commandEncoder: MTLRenderCommandEncoder) {
the vertices are correctly reported as 8 bytes long and the draw routine works as intended.
Am I missing anything? Is the MetalVertice protocol introducing some noise?
In order to fulfill the requirement that value types conforming to protocols be able to perform dynamic dispatch (and also in part to ensure that containers of protocol types are able to assume that all of their elements are of uniform size), Swift uses what are called existential containers to hold the data of protocol-conforming value types alongside metadata that points to the concrete implementations of each protocol. If you've heard the term protocol witness table, that's what's getting in your way here.
The particulars of this are beyond the scope of this answer, but you can check out this video and this post for more info.
The moral of the story is: don't assume that Swift will lay out your structs as written. Swift can reorder struct members and add padding or arbitrary metadata, and it gives you practically no control over this. Instead, declare the structs you need to use in your Metal code in a C or Objective-C file and import them via a bridging header. If you want to use protocols to make it easier to address your structs polymorphically, you need to be prepared to copy them member-wise into your regular old C structs and to pay the memory cost that that convenience entails.
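The effect is not specific to Swift. As a rough analogue only (Rust trait objects are not Swift existential containers, and the names below are illustrative), the same 8-byte vertex looks different once it is viewed through a protocol-like type. On 64-bit platforms a Swift existential for a single protocol is commonly described as a three-word inline buffer plus a type-metadata pointer and a witness-table pointer, which would be consistent with the 40 bytes reported in the question.

```rust
use std::mem::size_of;

// Stand-in for the Swift protocol; it has no requirements, like MetalVertice.
trait MetalVertex {}

#[allow(dead_code)]
#[repr(C)] // fixed, C-compatible layout, as one would want for GPU data
struct PointVertex {
    point_id: u32,
    relative_position: u32,
}

impl MetalVertex for PointVertex {}

fn main() {
    // The concrete type is exactly its two fields.
    println!("PointVertex:           {}", size_of::<PointVertex>());
    // A reference to the concrete type is one pointer.
    println!("&PointVertex:          {}", size_of::<&PointVertex>());
    // A trait object reference carries a data pointer and a vtable pointer.
    println!("&dyn MetalVertex:      {}", size_of::<&dyn MetalVertex>());
    println!("Box<dyn MetalVertex>:  {}", size_of::<Box<dyn MetalVertex>>());
}
```

In both languages, a buffer handed to the GPU should be built from a collection of the concrete type, never from a collection of the protocol or trait type.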

Why does Rust reuse memory with same value

Example code:
fn main() {
    let mut y = &5; // 1
    println!("{:p}", y);
    {
        let x = &2; // 2
        println!("{:p}", x);
        y = x;
    }
    y = &3; // 3
    println!("{:p}", y);
}
If the third assignment is &3, the output is:
0x558e7da926a0
0x558e7da926a4
0x558e7da926a8
If the third assignment is &2 (the same value as the second assignment), the output is:
0x558e7da926a0
0x558e7da926a4
0x558e7da926a4
If the third assignment is &5 (the same value as the first assignment), the output is:
0x558e7da926a0
0x558e7da926a4
0x558e7da926a0
Why does Rust reuse the memory when the assigned value is the same, instead of freeing it, and allocate a new block of memory otherwise?
Two occurrences of the same literal number are indistinguishable. You cannot expect the address of two literals to be identical, and neither can you expect them to be different.
This allows the compiler (though it is free to do otherwise) to emit a single 5 in the executable's data and have every &5 refer to it. Constants may (see comment) also have a static lifetime, in which case they are not allocated or deallocated during program execution; they exist for the whole run.
There are lots of tricks an optimizing compiler can use to determine whether a variable can be assigned a constant value. Your findings are consistent with this: there is no need to emit duplicate data or code if it is not needed.
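A small sketch of the static-lifetime point: a reference to an integer literal can be returned from a function or bound to a `'static` reference, which only compiles because the literal is placed in static memory rather than on the stack. The function name `five` is made up for the illustration.

```rust
/// Returns a reference to a literal. This type-checks because the literal
/// is promoted to a `'static` constant instead of being a stack temporary.
fn five() -> &'static i32 {
    &5
}

fn main() {
    let a = five();
    let b: &'static i32 = &5;
    println!("{:p} {:p}", a, b);
    // The two addresses may or may not be equal; neither outcome is
    // guaranteed, so code should not depend on it.
    println!("same address: {}", std::ptr::eq(a, b));
}
```

Nothing is freed or reused at run time here; the addresses printed are simply locations in the program's read-only data.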

What is the most memory-efficient array of nullable vectors when most of the second dimension will be empty?

I have a large fixed-size array of variable-sized arrays of u32. Most of the second dimension arrays will be empty (i.e. the first array will be sparsely populated). I think Vec is the most suitable type for both dimensions (Vec<Vec<u32>>). Because my first array might be quite large, I want to find the most space-efficient way to represent this.
I see two options:
I could use a Vec<Option<Vec<u32>>>. I'm guessing that, as Option is a tagged union, this would result in each cell being sizeof(Vec<u32>) rounded up to the next word boundary for the tag.
I could directly use Vec::with_capacity(0) for all cells. Does an empty Vec allocate zero heap until it's used?
Which is the most space-efficient method?
Actually, both Vec<Vec<T>> and Vec<Option<Vec<T>>> have the same space efficiency.
A Vec contains a pointer that will never be null, so the compiler is smart enough to recognize that in the case of Option<Vec<T>>, it can represent None by putting 0 in the pointer field. What is the overhead of Rust's Option type? contains more information.
What about the backing storage the pointer points to? A Vec doesn't allocate (same link as the first) when you create it with Vec::new or Vec::with_capacity(0); in that case, it uses a special, non-null "empty pointer". Vec only allocates space on the heap when you push something or otherwise force it to allocate. Therefore, the space used both for the Vec itself and for its backing storage are the same.
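Both claims can be checked with a short program. This is a sketch: the helper `empty_table` is made up for the illustration, and the equal-size result reflects how current compilers lay out `Option<Vec<T>>` rather than a documented guarantee.

```rust
use std::mem::size_of;

/// Builds `len` empty inner vectors. `Vec::new` does not allocate, so only
/// the outer vector owns heap memory at this point.
fn empty_table(len: usize) -> Vec<Vec<u32>> {
    (0..len).map(|_| Vec::new()).collect()
}

fn main() {
    println!("Vec<u32>:         {} bytes", size_of::<Vec<u32>>());
    println!("Option<Vec<u32>>: {} bytes", size_of::<Option<Vec<u32>>>());

    let mut table = empty_table(4);
    println!("capacity before push: {}", table[2].capacity());
    table[2].push(7); // the first push forces the inner Vec to allocate
    println!("capacity after push:  {}", table[2].capacity());
}
```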
Vec<Vec<T>> is a decent starting point. Each entry costs 3 pointers, even if it is empty, and for filled entries there can be additional per-allocation overhead. But depending on which trade-offs you're willing to make, there might be a better solution.
Vec<Box<[T]>>: This reduces the size of an entry from 3 pointers to 2 pointers. The downside is that changing the number of elements in a box is both inconvenient (convert to and from Vec<T>) and more expensive (reallocation).
HashMap<usize, Vec<T>>: This saves a lot of memory if the outer collection is sufficiently sparse. The downsides are higher access cost (hashing, scanning) and a higher per-element memory overhead.
If the collection is only filled once and you never resize the inner collections, you could use a split data structure.
This not only reduces the per-entry size to 1 pointer, it also eliminates the per-allocation overhead:
struct Nested<T> {
    data: Vec<T>,
    indices: Vec<usize>, // points after the last element of the i-th slice
}
impl<T> Nested<T> {
    fn get_range(&self, i: usize) -> std::ops::Range<usize> {
        assert!(i < self.indices.len());
        if i > 0 {
            self.indices[i - 1]..self.indices[i]
        } else {
            0..self.indices[i]
        }
    }
    pub fn get(&self, i: usize) -> &[T] {
        let range = self.get_range(i);
        &self.data[range]
    }
    pub fn get_mut(&mut self, i: usize) -> &mut [T] {
        let range = self.get_range(i);
        &mut self.data[range]
    }
}
For additional memory savings you can reduce the indices to u32 limiting you to 4 billion elements per collection.
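To make the trade-offs above concrete, here is a sketch that prints the per-entry sizes of `Vec<T>` and `Box<[T]>` and fills a `Nested` once. The constructor `from_slices` and the helper `sample` are hypothetical additions for this example and are not part of the original answer; the struct is repeated so the block is self-contained.

```rust
use std::mem::size_of;

struct Nested<T> {
    data: Vec<T>,
    indices: Vec<usize>, // indices[i] is one past the last element of slice i
}

impl<T: Clone> Nested<T> {
    /// Hypothetical constructor: concatenates all slices into one buffer
    /// and records where each one ends.
    fn from_slices(slices: &[&[T]]) -> Self {
        let mut data = Vec::new();
        let mut indices = Vec::with_capacity(slices.len());
        for s in slices {
            data.extend_from_slice(s);
            indices.push(data.len());
        }
        Nested { data, indices }
    }

    fn get(&self, i: usize) -> &[T] {
        let start = if i > 0 { self.indices[i - 1] } else { 0 };
        &self.data[start..self.indices[i]]
    }
}

/// Hypothetical sample data: four rows, two of them empty.
fn sample() -> Nested<u32> {
    let rows: [&[u32]; 4] = [&[], &[1, 2, 3], &[], &[9]];
    Nested::from_slices(&rows)
}

fn main() {
    println!("Vec<u32>:   {} bytes per entry", size_of::<Vec<u32>>());
    println!("Box<[u32]>: {} bytes per entry", size_of::<Box<[u32]>>());
    println!("Nested:     {} bytes per entry", size_of::<usize>());

    let nested = sample();
    for i in 0..nested.indices.len() {
        println!("row {}: {:?}", i, nested.get(i));
    }
}
```

Empty rows cost only their index entry, and all elements live in a single allocation, at the price of not being able to grow one row without shifting the others.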

How do I calculate the size and layout of this particular struct?

The structure is,
struct {
    char a;
    short b;
    short c;
    short d;
    char e;
} s1;
size of short is given as 2 bytes
size of char is given as 1 bytes
It is a 32-bit LITTLE ENDIAN processor
According to me, the answer should be:
1000 a[0]
1001 offset
1002 b[0]
1003 b[1]
1004 c[0]
1005 c[1]
1006 d[0]
1007 d[1]
1008 e[0]
size of s1 = 9 bytes
But according to the solution, the size of s1 is supposed to be 10 bytes.
The answer here is that the layout of the structure is entirely up to the compiler.
10 is likely to be the most common size of this structure.
The reason for the padding is that, if there is an array, it will keep all the members properly aligned. If the size were 9, every other array element would have misaligned structure members.
Unaligned accesses are not permitted on some systems. On most systems, they cause the processor to use extra cycles to access the data.
A compiler could allocate 4 bytes for each element in such a structure.
In practice (the C standard itself leaves the details to the implementation; sorry, not at my computer, so no quote), a struct is aligned to the alignment of its most strictly aligned (base type) member. Your largest member field is a short, 2 bytes, so the first element 'a' is aligned at an even address. 'a' takes up 1 byte. 'b' has to be aligned again at an even address, so one byte gets wasted. The last element of your struct, 'e', is also one byte, and the byte following it is likely to be wasted as well; that trailing padding is counted in sizeof, which is where the 10 comes from. If you put 'a' at the end, i.e. rearrange the members, you are likely to find the size of your struct to be 8 bytes, which is as good as it gets.
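The layout can be reproduced in Rust with `#[repr(C)]`, which asks for C-compatible field order and padding. This is a sketch under the question's assumptions: `u8` and `u16` stand in for a 1-byte char and a 2-byte short, and the struct names are made up.

```rust
use std::mem::{align_of, size_of};

// Same field order as the struct in the question.
#[allow(dead_code)]
#[repr(C)]
struct S1 {
    a: u8,  // offset 0, followed by 1 byte of padding
    b: u16, // offset 2
    c: u16, // offset 4
    d: u16, // offset 6
    e: u8,  // offset 8, followed by 1 byte of trailing padding
}

// Members rearranged so the two single-byte fields sit together at the end.
#[allow(dead_code)]
#[repr(C)]
struct S1Reordered {
    b: u16,
    c: u16,
    d: u16,
    a: u8,
    e: u8,
}

fn main() {
    println!("S1:          size {} align {}", size_of::<S1>(), align_of::<S1>());
    println!(
        "S1Reordered: size {} align {}",
        size_of::<S1Reordered>(),
        align_of::<S1Reordered>()
    );
}
```

The trailing padding byte in `S1` is what keeps every element of an array of `S1` aligned to 2 bytes, and it is counted in the reported size.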
