How does Rust store types at runtime? - memory

A u32 takes 4 bytes of memory, a String takes 3 pointer-sized integers (for location, size, and reserved space) on the stack, plus some amount on the heap.
This to me implies that Rust doesn't know, when the code is executed, what type is stored at a particular location, because that knowledge would require more memory.
But at the same time, does it not need to know what type is stored at 0xfa3d2f10, in order to be able to interpret the bytes at that location? For example, to know that the next bytes form the spec of a String on the heap?

How does Rust store types at runtime?
It doesn't, generally.
Rust doesn't know, when the code is executed, what type is stored at a particular location
Correct.
does it not need to know what type is stored
No, the bytes in memory should be correct, and the rest of the code assumes as much. The offsets of fields in a struct are baked-in to the generated machine code.
When does Rust store something like type information?
When performing dynamic dispatch, a fat pointer is used. This is composed of a pointer to the data and a pointer to a vtable, a collection of functions that make up the interface in question. The vtable could be considered a representation of the type, but it doesn't have a lot of the information that you might think goes into "a type" (unless the trait requires it). Dynamic dispatch isn't super common in Rust as most people prefer static dispatch when it's possible, but both techniques have their benefits.
There's also concepts like TypeId, which can represent one specific type, but only of a subset of types. It also doesn't provide much capability besides "are these the same type or not".
Isn't this all terribly brittle?
Yes, it can be, which is one of the things that makes Rust so interesting.
In a language like C or C++, there's not much that safeguards the programmer from making dumb mistakes that go out and mess up those bytes floating around in memory. Making those mistakes is what leads to bugs due to memory safety. Instead of interpreting your password as a password, it's interpreted as your username and printed out to an attacker (oops!)
Rust provides safeguards against that in the form of a strong type system and tools like the borrow checker, but still all done at compile time. Unsafe Rust enables these dangerous tools with the tradeoff that the programmer is now expected to uphold all the guarantees themselves, much like if they were writing C or C++ again.
See also:
When does type binding happen in Rust?
How does Rust implement reflection?
How do I print the type of a variable in Rust?
How to introspect all available methods and members of a Rust type?

Related

Do Pascal compilers need SecureZeroMemory function?

Consider the code:
procedure DoSmthSecret;
var
Seed: array[0..31] of Byte;
begin
// get random seed
..
// use the seed to do something secret
..
// erase the seed
FillChar(Seed, SizeOf(Seed), 0);
end;
The problem with the code is: FillChar is a compiler intrinsic, and potentially a compiler can "optimize it out". The problem is known for C/C++ compilers, see SecureZeroMemory. Can modern Pascal compiler (Delphi, FPC) do such optimization, and if they can, do they provide SecureZeroMemory equivalent?
FPC can't do such optimizations at the moment, and afaik even with C++ they belong into the "uncertain" class. (since the state of the program due to this optimization ignores what the programmer tells it to be)
Solving such problem is a matter of defining which constructs can be optimized out and which not. It doesn't need API/OS assistance per se, any externally linked object file with such function would do (since then global optimization wouldn't touch it)
Note that the article doesn't name the C++ compiler specifically, so I expect it is more a general utility function for when an user of a compiler gets into problems, without hitting the docs too hard, or when it must easily work on multiple (windows-only!) compilers without overly complicating the buildsystem.
Choosing a non inlinable API function might be non optimal in other cases, specially with small, constant sizes to zero, since it won't be inlined, so I would be careful with this function, and make sure there is a hard need
It might be important mainly when an external entity can change memory (DMA, memory mapping etc) of a program, or to erase passwords and other sensitive info from the memory image, even if the program according to the compiler will never read it
Even if FreePascal would optimize out writing to memory that is never read again (which I doubt it does atm, regardless of how long you guys discuss it), it does support the absolute type modifier which it guarantees (documented) to never optimize (somewhat similar to volatile in C/C++).

unsafe casting in F# with zero copy semantics

I'm trying to achieve a static cast like coercion that doesn't result in copying of any data.
A naive static cast does not work
let pkt = byte_buffer :> PktHeader
FS0193: Type constraint mismatch. The type byte[] is not compatible with type PktHeader The type 'byte[]' is not compatible with the type 'PktHeader' (FS0193) (program)
where the packet is initially held in a byte array because of the way System.Net.Sockets.Socket.Receive() is defined.
The low level packet struct is defined something like this
[<Struct; StructLayout(LayoutKind.Explicit)>]
type PktHeader =
[<FieldOffset(0)>] val mutable field1: uint16
[<FieldOffset(2)>] val mutable field2: uint16
[<FieldOffset(4)>] val mutable field3: uint32
.... many more fields follow ....
Efficiency is important in this real world scenario because wasteful copying of data could rule out F# as an implementation language.
How do you achieve zero copy efficiencies in this scenario?
EDIT on Nov 29
my question was predicated on the implicit belief that a C/C++/C# style unsafe static cast is a useful construct, as if this is self evident. However, on 2nd thought this kind of cast is not idiomatic in F# since it is inherently an imperative language technique fraught with peril. For this reason I've accepted the answer by V.B. where SBE/FlatBuffers data access is promulgated as best practice.
A pure F# approach for conversion
let convertByteArrayToStruct<'a when 'a : struct> (byteArr : byte[]) =
let handle = GCHandle.Alloc(byteArr, GCHandleType.Pinned)
let structure = Marshal.PtrToStructure (handle.AddrOfPinnedObject(), typeof<'a>)
handle.Free()
structure :?> 'a
This is a minimum example but I'd recommend introducing some checks on the length of the byte array because, as it's written there, it will produce undefined results if you give it a byte array which is too short. You could check against Marshall.SizeOf(typeof<'a>).
There is no pure F# solution to do a less safe conversion than this (and this is already an approach prone to runtime failure). Alternative options could include interop with C# to use unsafe and fixed to do the conversion.
Ultimately though, you are asking for a way to subvert the F# type system which is not really what the language is designed for. One of the principle advantages of F# is the power of the type system and it's ability to help you produce statically verifiable code.
F# and very low-level performance optimizations are not best friends, but then... some smart people do magic even with Java, which doesn't have value types and real generic collections for them.
1) I am a big fan of a flyweight pattern lately. If you architecture allows for it, you could wrap a byte array and access struct members via offsets. A C# example here. SBE/FlatBuffers even have tools to generate a wrapper automatically from a definition.
2) If you could stay within unsafe context in C# to do the work, pointer casting is very easy and efficient. However, that requires pinning the byte array and keeping its handle for later release, or staying within fixed keyword. If you have many small ones without a pool, you could have problems with GC.
3) The third option is abusing .NET type system and cast a byte array with IL like this (this could be coded in F#, if you insist :) ):
static T UnsafeCast(object value) {
ldarg.1 //load type object
ret //return type T
}
I tried this option and even have a snippet somewhere if you need, but this approach makes me uncomfortable because I do not understand its consequences to GC. We have two objects backed by the same memory, what would happen when one of them is GCed? I was going to ask a new question on SO about this detail, will post it soon.
The last approach could be good for arrays of structs, but for a single struct it will box it or copy it anyway. Since structs are on the stack and passed by value, you will probably get better results just by casting a pointer to byte[] in unsafe C# or using Marshal.PtrToStructure as in another answer here, and then copy by value. Copying is not the worst thing, especially on the stack, but allocation of new objects and GC is the enemy, so you need byte arrays pooled and this will add much more to the overall performance than you struct casting issue.
But if your struct is very big, option 1 could still be better.

need of pointer objects in objective c

A very basic question .. but really very important to understand the concepts..
in c++ or c languages, we usually don't use pointer variables to store values.. i.e. values are stored simply as is in:
int a=10;
but here in ios sdk, in objective c, most of the objects which we use are initialized by denoting a pointer with them as in:
NSArray *myArray=[NSArray array];
So,the question arises in my mind ,that, what are the benefit and need of using pointer-objects (thats what we call them here, if it is not correct, please, do tell)..
Also I just get confused sometimes with memory allocation fundamentals when using a pointer objects for allocation. Can I look for good explanations anywhere?
in c++ or c languages, we usually don't use pointer variables to store values
I would take that "or C" part out. C++ programmers do frown upon the use of raw pointers, but C programmers don't. C programmers love pointers and regard them as an inevitable silver bullet solution to all problems. (No, not really, but pointers are still very frequently used in C.)
but here in ios sdk, in objective c, most of the objects which we use are initialized by denoting a pointer with them
Oh, look closer:
most of the objects
Even closer:
objects
So you are talking about Objective-C objects, amirite? (Disregard the subtlety that the C standard essentially describes all values and variables as an "object".)
It's really just Objective-C objects that are always pointers in Objective-C. Since Objective-C is a strict superset of C, all of the C idioms and programming techniques still apply when writing iOS apps (or OS X apps, or any other Objective-C based program for that matter). It's pointless, superfluous, wasteful, and as such, it is even considered an error to write something like
int *i = malloc(sizeof(int));
for (*i = 0; *i < 10; ++*i)
just because we are in Objective-C land. Primitives (or more correctly "plain old datatypes" with C++ terminology) still follow the "don't use a pointer if not needed" rule.
what are the benefit and need of using pointer-objects
So, why they are necessary:
Objective-C is an object-oriented and dynamic language. These two, strongly related properties of the language make it possible for programmers to take advantage of technologies such as polymorphism, duck-typing and dynamic binding (yes, these are hyperlinks, click them).
The way these features are implemented make it necessary that all objects be represented by a pointer to them. Let's see an example.
A common task when writing a mobile application is retrieving some data from a server. Modern web-based APIs use the JSON data exchange format for serializing data. This is a simple textual format which can be parsed (for example, using the NSJSONSerialization class) into various types of data structures and their corresponding collection classes, such as an NSArray or an NSDictionary. This means that the JSON parser class/method/function has to return something generic, something that can represent both an array and a dictionary.
So now what? We can't return a non-pointer NSArray or NSDictionary struct (Objective-C objects are really just plain old C structs under the hoods on all platforms I know Objective-C works on), because they are of different size, they have different memory layouts, etc. The compiler couldn't make sense of the code. That's why we return a pointer to a generic Objective-C object, of type id.
The C standard mandates that pointers to structs (and as such, to objects) have the same representation and alignment requirements (C99 6.2.5.27), i. e. that a pointer to any struct can be cast to a pointer to any other struct safely. Thus, this approach is correct, and we can now return any object. Using runtime introspection, it is also possible to determine the exact type (class) of the object dynamically and then use it accordingly.
And why they are convenient or better (in some aspects) than non-pointers:
Using pointers, there is no need to pass around multiple copies of the same object. Creating a lot of copies (for example, each time an object is assigned to or passed to a function) can be slow and lead to performance problems - a moderately complex object, for example, a view or a view controller, can have dozens of instance variables, thus a single instance may measure literally hundreds of bytes. If a function call that takes an object type is called thousands or millions of times in a tight loop, then re-assigning and copying it is quite painful (for the CPU anyway), and it's much easier and more straightforward to just pass in a pointer to the object (which is smaller in size and hence faster to copy over). Furthermore, Objective-C, being a reference counted language, even kind of "discourages" excessive copying anyway. Retaining and releasing is preferred over explicit copying and deallocation.
Also I just get confused sometimes with memory allocation fundamentals when using a pointer objects for allocation
Then you are most probably confused enough even without pointers. Don't blame it on the pointers, it's rather a programmer error ;-)
So here's...
...the official documentation and memory management guide by Apple;
...the earliest related Stack Overflow question I could find;
...something you should read before trying to continue Objective-C programming #1; (i. e. learn C first)
...something you should read before trying to continue Objective-C programming #2;
...something you should read before trying to continue Objective-C programming #3;
...and an old Stack Overflow question regarding C memory management rules, techniques and idioms;
Have fun! :-)
Anything more complex than an int or a char or similar is usually passed as
pointers even in C. In C you could of course pass around a struct of data
from function to function but this is rarely seen.
Consider the following code:
struct some_struct {
int an_int;
char a_char[1234];
};
void function1(void)
{
struct some_struct s;
function2(s);
}
void function2(struct some_struct s)
{
//do something with some_struct s
}
The some_struct data s will be put on the stack for function1. When function2
is called the data will be copied and put on the stack for use in function2.
It requires the data to be on the stack twice as well as the data to be
copied. This is not very efficient. Also, note that changing the values
of the struct in function2 will not affect the struct in function1, they
are different data in memory.
Instead consider the following code:
struct some_struct {
int an_int;
char a_char[1234];
};
void function1(void)
{
struct some_struct *s = malloc(sizeof(struct some_struct));
function2(s);
free(s);
}
void function2(struct some_struct *s)
{
//do something with some_struct s
}
The some_struct data will be put on the heap instead of the stack. Only
a pointer to this data will be put on the stack for function1, copied in the
call to function2 another pointer put on the stack for function2. This is a
lot more efficient than the previous example. Also, note that any changes of
the data in the struct made by function2 will now affect the struct in
function1, they are the same data in memory.
This is basically the fundamentals on which higher level programming languages
such as Objective-C is built and the benefits from building these languages
like this.
The benefit and need of pointer is that it behaves like a mirror. It reflects what it points to. One main place where points could be very useful is to share data between functions or methods. The local variables are not guaranteed to keep their value each time a function returns, and that they’re visible only inside their own function. But you still may want to share data between functions or methods. You can use return, but that works only for a single value. You can also use global variables, but not to store your complete data, you soon have a mess. So we need some variable that can share data between functions or methods. There comes pointers to our remedy. You can just create the data and just pass around the memory address (the unique ID) pointing to that data. Using this pointer the data could be accessed and altered in any function or method. In terms of writing modular code, that’s the most important purpose of a pointer— to share data in many different places in a program.
The main difference between C and Objective-C in this regard is that arrays in Objective-C are commonly implemented as objects. (Arrays are always implemented as objects in Java, BTW, and C++ has several common classes resembling NSArray.)
Anyone who has considered the issue carefully understands that "bare" C-like arrays are problematic -- awkward to deal with and very frequently the source of errors and confusion. ("An array 'decays' to a pointer" -- what is that supposed to mean, anyway, other than to admit in a backhanded way "Yes, it's confusing"??)
Allocation in Objective-C is a bit confusing in large part because it's in transition. The old manual reference count scheme could be easily understood (if not so easily dealt with in implementations), but ARC, while simpler to deal with, is far harder to truly understand, and understanding both simultaneously is even harder. But both are easier to deal with than C, where "zombie pointers" are almost a given, due to the lack of reference counting. (C may seem simpler, but only because you don't do things as complex as those you'd do with Objective-C, due to the difficulty controlling it all.)
You use a pointer always when referring to something on the heap and sometimes, but usually not when referring to something on the stack.
Since Objective-C objects are always allocated on the heap (with the exception of Blocks, but that is orthogonal to this discussion), you always use pointers to Objective-C objects. Both the id and Class types are really pointers.
Where you don't use pointers are for certain primitive types and simple structures. NSPoint, NSRange, int, NSUInteger, etc... are all typically accessed via the stack and typically you do not use pointers.

id values of different variables in python 3

I am able to understand immutability with python (surprisingly simple too). Let's say I assign a number to
x = 42
print(id(x))
print(id(42))
On both counts, the value I get is
505494448
My question is, does python interpreter allot ids to all the numbers, alphabets, True/False in the memory before the environment loads? If it doesn't, how are the ids kept track of? Or am I looking at this in the wrong way? Can someone explain it please?
What you're seeing is an implementation detail (an internal optimization) calling interning. This is a technique (used by implementations of a number of languages including Java and Lua) which aliases names or variables to be references to single object instances where that's possible or feasible.
You should not depend on this behavior. It's not part of the language's formal specification and there are no guarantees that separate literal references to a string or integer will be interned nor that a given set of operations (string or numeric) yielding a given object will be interned against otherwise identical objects.
I've heard that the C Python implementation does include a set of the first hundred or so integers as statically instantiated immutable objects. I suspect that other very high level language run-time libraries are likely to include similar optimizations: the first hundred integers are used very frequently by most non-trivial fragments of code.
In terms of how such things are implemented ... for strings and larger integers it would make sense for Python to maintain these as dictionaries. Thus any expression yielding an integer (and perhaps even floats) and strings (at least sufficiently short strings) would be hashed, looked up in the appropriate (internal) object dictionary, added if necessary and then returned as references to the resulting object.
You can do your own similar interning of any sorts of custom object you like by wrapping the instantiation in your own calls to your own class static dictionary.

How to also prepare for 64-bits when migrating to Delphi 2010 and Unicode

As 64 bits support is not expected in the next version it is no longer an option to wait for the possibility to migrate our existing code base to unicode and 64-bit in one go.
However it would be nice if we could already prepare our code for 64-bit when doing our unicode translation. This will minimize impact in the event it will finally appear in version 2020.
Any suggestions how to approach this without introducing to much clutter if it doesn't arrive until 2020?
There's another similar question, but I'll repeat my reply here too, to make sure as many people see this info:
First up, a disclaimer: although I work for Embarcadero. I can't speak for my employer. What I'm about to write is based on my own opinion of how a hypothetical 64-bit Delphi should work, but there may or may not be competing opinions and other foreseen or unforeseen incompatibilities and events that cause alternative design decisions to be made.
That said:
There are two integer types, NativeInt and NativeUInt, whose size will
float between 32-bit and 64-bit depending on platform. They've been
around for quite a few releases. No other integer types will change size
depending on bitness of the target.
Make sure that any place that relies on casting a pointer value to an
integer or vice versa is using NativeInt or NativeUInt for the integer
type. TComponent.Tag should be NativeInt in later versions of Delphi.
I'd suggest don't use NativeInt or NativeUInt for non-pointer-based values. Try to keep your code semantically the same between 32-bit and 64-bit. If you need 32 bits of range, use Integer; if you need 64 bits, use Int64. That way your code should run the same on both bitnesses. Only if you're casting to and from a Pointer value of some kind, such as a reference or a THandle, should you use NativeInt.
Pointer-like things should follow similar rules to pointers: object
references (obviously), but also things like HWND, THandle, etc.
Don't rely on internal details of strings and dynamic arrays, like
their header data.
Our general policy on API changes for 64-bit should be to keep the
same API between 32-bit and 64-bit where possible, even if it means that
the 64-bit API does not necessarily take advantage of the machine. For
example, TList will probably only handle MaxInt div SizeOf(Pointer)
elements, in order to keep Count, indexes etc. as Integer. Because the
Integer type won't float (i.e. change size depending on bitness), we
don't want to have ripple effects on customer code: any indexes that
round-tripped through an Integer-typed variable, or for-loop index,
would be truncated and potentially cause subtle bugs.
Where APIs are extended for 64-bit, they will most likely be done with
an extra function / method / property to access the extra data, and this
API will also be supported in 32-bit. For example, the Length() standard
routine will probably return values of type Integer for arguments of
type string or dynamic array; if one wants to deal with very large
dynamic arrays, there may be a LongLength() routine as well, whose
implementation in 32-bit is the same as Length(). Length() would throw
an exception in 64-bit if applied to a dynamic array with more than 232
elements.
Related to this, there will probably be improved error checking for
narrowing operations in the language, especially narrowing 64-bit values
to 32-bit locations. This would hit the usability of assigning the
return value of Length to locations of type Integer if Length(),
returned Int64. On the other hand, specifically for compiler-magic
functions like Length(), there may be some advantage of the magic taken,
to e.g. switch the return type based on context. But advantage can't be
similarly taken in non-magic APIs.
Dynamic arrays will probably support 64-bit indexing. Note that Java
arrays are limited to 32-bit indexing, even on 64-bit platforms.
Strings probably will be limited to 32-bit indexing. We have a hard
time coming up with realistic reasons for people wanting 4GB+ strings
that really are strings, and not just managed blobs of data, for which
dynamic arrays may serve just as well.
Perhaps a built-in assembler, but with restrictions, like not being able to freely mix with Delphi code; there are also rules around exceptions and stack frame layout that need to be followed on x64.
First, look at the places where you interact with non-delphi libraries and api-calls,
they might differ. On Win32, libraries with the stdcall calling convenstion are named like _SomeFunction#4 (#4 indicating the size of the parameters, etc). On Win64, there is only one calling convention, and the functions in a dll are no more decorated. If you import functions from dll files, you might need to adjust them.
Keep in mind, in a 64 bit exe you cannot load a 32-bit dll, so, if you depend on 3rd party dll files, you should check for a 64-bit version of those files as well.
Also, look at Integers, if you depend on their max value, for example when you let them overflow and wait for the moment that happens, it will cause trouble if the size of an integer is changed.
Also, when working with streams, and you want to serialize different data, with includes an integer, it will cause trouble, since the size of the integer changed, and your stream will be out of sync.
So, on places where you depend on the size of an integer or pointer, you will need to make adjustments. When serializing sush data, you need to keep in mind this size issue as well, as it might cause data incompatibilities between 32 and 64 bit versions.
Also, the FreePascal compiler with the Lazarus IDE already supports 64-bit. This alternative Object Pascal compiler is not 100% compatible with the Borland/Codegear/Embarcadero dialect of Pascal, so just recompiling with it for 64-bit might not be that simple, but it might help point out problems with 64-bit.
The conversion to 64bit should not be very painful. Start with being intentional about the size of an integer where it matters. Don't use "integer" instead use Int32 for integers sized at 32bits, and Int64 for integers sized at 64bits. In the last bit conversion the definition of Integer went from Int16 to Int32, so your playing it safe by specifying the exact bit depth.
If you have any inline assembly, create a pascal equivalent and create some unit tests to insure that they operate the same way. Perform some timing tests of both and see if the assembly still runs faster enough to keep. If it does, then you will want to make changes to both as they are needed.
Use NativeInt for integers that can contain casted pointers.

Resources