unsafe casting in F# with zero copy semantics - f#

I'm trying to achieve a static cast like coercion that doesn't result in copying of any data.
A naive static cast does not work
let pkt = byte_buffer :> PktHeader
FS0193: Type constraint mismatch. The type byte[] is not compatible with type PktHeader The type 'byte[]' is not compatible with the type 'PktHeader' (FS0193) (program)
where the packet is initially held in a byte array because of the way System.Net.Sockets.Socket.Receive() is defined.
The low level packet struct is defined something like this
[<Struct; StructLayout(LayoutKind.Explicit)>]
type PktHeader =
[<FieldOffset(0)>] val mutable field1: uint16
[<FieldOffset(2)>] val mutable field2: uint16
[<FieldOffset(4)>] val mutable field3: uint32
.... many more fields follow ....
Efficiency is important in this real world scenario because wasteful copying of data could rule out F# as an implementation language.
How do you achieve zero copy efficiencies in this scenario?
EDIT on Nov 29
my question was predicated on the implicit belief that a C/C++/C# style unsafe static cast is a useful construct, as if this is self evident. However, on 2nd thought this kind of cast is not idiomatic in F# since it is inherently an imperative language technique fraught with peril. For this reason I've accepted the answer by V.B. where SBE/FlatBuffers data access is promulgated as best practice.

A pure F# approach for conversion
let convertByteArrayToStruct<'a when 'a : struct> (byteArr : byte[]) =
let handle = GCHandle.Alloc(byteArr, GCHandleType.Pinned)
let structure = Marshal.PtrToStructure (handle.AddrOfPinnedObject(), typeof<'a>)
handle.Free()
structure :?> 'a
This is a minimum example but I'd recommend introducing some checks on the length of the byte array because, as it's written there, it will produce undefined results if you give it a byte array which is too short. You could check against Marshall.SizeOf(typeof<'a>).
There is no pure F# solution to do a less safe conversion than this (and this is already an approach prone to runtime failure). Alternative options could include interop with C# to use unsafe and fixed to do the conversion.
Ultimately though, you are asking for a way to subvert the F# type system which is not really what the language is designed for. One of the principle advantages of F# is the power of the type system and it's ability to help you produce statically verifiable code.

F# and very low-level performance optimizations are not best friends, but then... some smart people do magic even with Java, which doesn't have value types and real generic collections for them.
1) I am a big fan of a flyweight pattern lately. If you architecture allows for it, you could wrap a byte array and access struct members via offsets. A C# example here. SBE/FlatBuffers even have tools to generate a wrapper automatically from a definition.
2) If you could stay within unsafe context in C# to do the work, pointer casting is very easy and efficient. However, that requires pinning the byte array and keeping its handle for later release, or staying within fixed keyword. If you have many small ones without a pool, you could have problems with GC.
3) The third option is abusing .NET type system and cast a byte array with IL like this (this could be coded in F#, if you insist :) ):
static T UnsafeCast(object value) {
ldarg.1 //load type object
ret //return type T
}
I tried this option and even have a snippet somewhere if you need, but this approach makes me uncomfortable because I do not understand its consequences to GC. We have two objects backed by the same memory, what would happen when one of them is GCed? I was going to ask a new question on SO about this detail, will post it soon.
The last approach could be good for arrays of structs, but for a single struct it will box it or copy it anyway. Since structs are on the stack and passed by value, you will probably get better results just by casting a pointer to byte[] in unsafe C# or using Marshal.PtrToStructure as in another answer here, and then copy by value. Copying is not the worst thing, especially on the stack, but allocation of new objects and GC is the enemy, so you need byte arrays pooled and this will add much more to the overall performance than you struct casting issue.
But if your struct is very big, option 1 could still be better.

Related

How does Rust store types at runtime?

A u32 takes 4 bytes of memory, a String takes 3 pointer-sized integers (for location, size, and reserved space) on the stack, plus some amount on the heap.
This to me implies that Rust doesn't know, when the code is executed, what type is stored at a particular location, because that knowledge would require more memory.
But at the same time, does it not need to know what type is stored at 0xfa3d2f10, in order to be able to interpret the bytes at that location? For example, to know that the next bytes form the spec of a String on the heap?
How does Rust store types at runtime?
It doesn't, generally.
Rust doesn't know, when the code is executed, what type is stored at a particular location
Correct.
does it not need to know what type is stored
No, the bytes in memory should be correct, and the rest of the code assumes as much. The offsets of fields in a struct are baked-in to the generated machine code.
When does Rust store something like type information?
When performing dynamic dispatch, a fat pointer is used. This is composed of a pointer to the data and a pointer to a vtable, a collection of functions that make up the interface in question. The vtable could be considered a representation of the type, but it doesn't have a lot of the information that you might think goes into "a type" (unless the trait requires it). Dynamic dispatch isn't super common in Rust as most people prefer static dispatch when it's possible, but both techniques have their benefits.
There's also concepts like TypeId, which can represent one specific type, but only of a subset of types. It also doesn't provide much capability besides "are these the same type or not".
Isn't this all terribly brittle?
Yes, it can be, which is one of the things that makes Rust so interesting.
In a language like C or C++, there's not much that safeguards the programmer from making dumb mistakes that go out and mess up those bytes floating around in memory. Making those mistakes is what leads to bugs due to memory safety. Instead of interpreting your password as a password, it's interpreted as your username and printed out to an attacker (oops!)
Rust provides safeguards against that in the form of a strong type system and tools like the borrow checker, but still all done at compile time. Unsafe Rust enables these dangerous tools with the tradeoff that the programmer is now expected to uphold all the guarantees themselves, much like if they were writing C or C++ again.
See also:
When does type binding happen in Rust?
How does Rust implement reflection?
How do I print the type of a variable in Rust?
How to introspect all available methods and members of a Rust type?

No F# generics with constant "template arguments"?

It just occurred to me, that F# generics do not seem to accept constant values as "template parameters".
Suppose one wanted to create a type RangedInt such, that it behaves like an int but is guaranteed to only contain a sub-range of integer values.
A possible approach could be a discriminated union, similar to:
type RangedInt = | Valid of int | Invalid
But this is not working either, as there is no "type specific storage of the range information". And 2 RangedInt instances should be of different type, if the range differs, too.
Being still a bit C++ infested it would look similar to:
template<int low,int high>
class RangedInteger { ... };
Now the question, arising is two fold:
Did I miss something and constant values for F# generics exist?
If I did not miss that, what would be the idiomatic way to accomplish such a RangedInt<int,int> in F#?
Having found Tomas Petricek's blog about custom numeric types, the equivalent to my question for that blog article would be: What if he did not an IntegerZ5 but an IntegerZn<int> custom type family?
The language feature you're requesting is called Dependent Types, and F# doesn't have that feature.
It's not a particularly common language feature, and even Haskell (which most other Functional programming languages 'look up to') doesn't really have it.
There are languages with Dependent Types out there, but none of them I would consider mainstream. Probably the one I hear about the most is Idris.
Did I miss something and constant values for F# generics exist?
While F# has much strong type inference than other .NET languages, at its heart it is built on .NET.
And .NET generics only support a small subset of what is possible with C++ templates. All type arguments to generic types must be types, and there is no defaulting of type arguments either.
If I did not miss that, what would be the idiomatic way to accomplish such a RangedInt in F#?
It would depend on the details. Setting the limits at runtime is one possibility – this would be the usual approach in .NET. Another would be units of measure (this seems less likely to be a fit).
What if he did not an IntegerZ5 but an IntegerZn<int> custom type family?
I see two reasons:
It is an example, and avoiding generics keeps things simpler allowing focus on the point of the example.
What other underlying type would one use anyway? On contemporary systems smaller types (byte, Int16 etc.) are less efficient (unless space at runtime is the overwhelming concern); long would add size without benefit (it is only going to hold 5 possible values).

What is the idiomatic way to write a linked list with a tail pointer?

As a learning project for Rust, I have a very simple (working, if incomplete) implementation of a singly linked list. The declaration of the structs looks like this:
type NodePtr<T> = Option<Box<Node<T>>>;
struct Node<T> {
data: T,
next: NodePtr<T>,
}
pub struct LinkedList<T> {
head: NodePtr<T>,
}
Implementing size and push_front were both reasonably straight-forward, although doing size iteratively did involve some "fighting with the borrow checker."
The next thing I wanted to try was adding a tail pointer to the LinkedList structure. to enable an efficient push_back operation. Here I've run into a bit of a wall. At first I attempted to use Option<&Box<Node<T>>> and then Option<&Node<T>>. Both of those led to sprinkling 'as everywhere, but still eventually being unable to promise the lifetime checker that tail would be valid.
I have since come to the tentative conclusion that it is impossible with these definitions: that there is no way to guarantee to the compiler that tail would be valid in the places that I think it is valid. The only way I can possibly accomplish this is to have all my pointers be Rc<_> or Rc<RefCell<_>>, because those are the only safe ways to have two things pointing at the same object (the final node).
My question: is this the correct conclusion? More generally: what is the idiomatic Rust solution for unowned pointers inside data structures? To my mind, reference counting seems awfully heavy-weight for something so simple, so I think I must be missing something. (Or perhaps I just haven't gotten my mind into the right mindset for memory safety yet.)
Yes, if you want to write a singly-linked-list with a tail-pointer you have three choices:
Safe and Mutable: Use NodePtr = Option<Rc<RefCell<Node<T>>>>
Safe and Immutable: Use NodePtr = Option<Rc<Node<T>>>
Unsafe and Mutable: Use tail: *mut Node<T>
The *mut is going to be more efficient, and it's not like the Rc is actually going to prevent you from producing completely nonsense states (as you correctly deduced). It's just going to guarantee that they don't cause segfaults (and with RefCell it may still cause runtime crashes though...).
Ultimately, any linked list more complex than vanilla singly-linked has an ownership story that's too complex to encode in Rust's ownership system safely and efficiently (it's not a tree). I personally favour just embracing the unsafety at that point and leaning on unit tests to get to the finish-line in one piece (why write a suboptimal data structure...?).

need of pointer objects in objective c

A very basic question .. but really very important to understand the concepts..
in c++ or c languages, we usually don't use pointer variables to store values.. i.e. values are stored simply as is in:
int a=10;
but here in ios sdk, in objective c, most of the objects which we use are initialized by denoting a pointer with them as in:
NSArray *myArray=[NSArray array];
So,the question arises in my mind ,that, what are the benefit and need of using pointer-objects (thats what we call them here, if it is not correct, please, do tell)..
Also I just get confused sometimes with memory allocation fundamentals when using a pointer objects for allocation. Can I look for good explanations anywhere?
in c++ or c languages, we usually don't use pointer variables to store values
I would take that "or C" part out. C++ programmers do frown upon the use of raw pointers, but C programmers don't. C programmers love pointers and regard them as an inevitable silver bullet solution to all problems. (No, not really, but pointers are still very frequently used in C.)
but here in ios sdk, in objective c, most of the objects which we use are initialized by denoting a pointer with them
Oh, look closer:
most of the objects
Even closer:
objects
So you are talking about Objective-C objects, amirite? (Disregard the subtlety that the C standard essentially describes all values and variables as an "object".)
It's really just Objective-C objects that are always pointers in Objective-C. Since Objective-C is a strict superset of C, all of the C idioms and programming techniques still apply when writing iOS apps (or OS X apps, or any other Objective-C based program for that matter). It's pointless, superfluous, wasteful, and as such, it is even considered an error to write something like
int *i = malloc(sizeof(int));
for (*i = 0; *i < 10; ++*i)
just because we are in Objective-C land. Primitives (or more correctly "plain old datatypes" with C++ terminology) still follow the "don't use a pointer if not needed" rule.
what are the benefit and need of using pointer-objects
So, why they are necessary:
Objective-C is an object-oriented and dynamic language. These two, strongly related properties of the language make it possible for programmers to take advantage of technologies such as polymorphism, duck-typing and dynamic binding (yes, these are hyperlinks, click them).
The way these features are implemented make it necessary that all objects be represented by a pointer to them. Let's see an example.
A common task when writing a mobile application is retrieving some data from a server. Modern web-based APIs use the JSON data exchange format for serializing data. This is a simple textual format which can be parsed (for example, using the NSJSONSerialization class) into various types of data structures and their corresponding collection classes, such as an NSArray or an NSDictionary. This means that the JSON parser class/method/function has to return something generic, something that can represent both an array and a dictionary.
So now what? We can't return a non-pointer NSArray or NSDictionary struct (Objective-C objects are really just plain old C structs under the hoods on all platforms I know Objective-C works on), because they are of different size, they have different memory layouts, etc. The compiler couldn't make sense of the code. That's why we return a pointer to a generic Objective-C object, of type id.
The C standard mandates that pointers to structs (and as such, to objects) have the same representation and alignment requirements (C99 6.2.5.27), i. e. that a pointer to any struct can be cast to a pointer to any other struct safely. Thus, this approach is correct, and we can now return any object. Using runtime introspection, it is also possible to determine the exact type (class) of the object dynamically and then use it accordingly.
And why they are convenient or better (in some aspects) than non-pointers:
Using pointers, there is no need to pass around multiple copies of the same object. Creating a lot of copies (for example, each time an object is assigned to or passed to a function) can be slow and lead to performance problems - a moderately complex object, for example, a view or a view controller, can have dozens of instance variables, thus a single instance may measure literally hundreds of bytes. If a function call that takes an object type is called thousands or millions of times in a tight loop, then re-assigning and copying it is quite painful (for the CPU anyway), and it's much easier and more straightforward to just pass in a pointer to the object (which is smaller in size and hence faster to copy over). Furthermore, Objective-C, being a reference counted language, even kind of "discourages" excessive copying anyway. Retaining and releasing is preferred over explicit copying and deallocation.
Also I just get confused sometimes with memory allocation fundamentals when using a pointer objects for allocation
Then you are most probably confused enough even without pointers. Don't blame it on the pointers, it's rather a programmer error ;-)
So here's...
...the official documentation and memory management guide by Apple;
...the earliest related Stack Overflow question I could find;
...something you should read before trying to continue Objective-C programming #1; (i. e. learn C first)
...something you should read before trying to continue Objective-C programming #2;
...something you should read before trying to continue Objective-C programming #3;
...and an old Stack Overflow question regarding C memory management rules, techniques and idioms;
Have fun! :-)
Anything more complex than an int or a char or similar is usually passed as
pointers even in C. In C you could of course pass around a struct of data
from function to function but this is rarely seen.
Consider the following code:
struct some_struct {
int an_int;
char a_char[1234];
};
void function1(void)
{
struct some_struct s;
function2(s);
}
void function2(struct some_struct s)
{
//do something with some_struct s
}
The some_struct data s will be put on the stack for function1. When function2
is called the data will be copied and put on the stack for use in function2.
It requires the data to be on the stack twice as well as the data to be
copied. This is not very efficient. Also, note that changing the values
of the struct in function2 will not affect the struct in function1, they
are different data in memory.
Instead consider the following code:
struct some_struct {
int an_int;
char a_char[1234];
};
void function1(void)
{
struct some_struct *s = malloc(sizeof(struct some_struct));
function2(s);
free(s);
}
void function2(struct some_struct *s)
{
//do something with some_struct s
}
The some_struct data will be put on the heap instead of the stack. Only
a pointer to this data will be put on the stack for function1, copied in the
call to function2 another pointer put on the stack for function2. This is a
lot more efficient than the previous example. Also, note that any changes of
the data in the struct made by function2 will now affect the struct in
function1, they are the same data in memory.
This is basically the fundamentals on which higher level programming languages
such as Objective-C is built and the benefits from building these languages
like this.
The benefit and need of pointer is that it behaves like a mirror. It reflects what it points to. One main place where points could be very useful is to share data between functions or methods. The local variables are not guaranteed to keep their value each time a function returns, and that they’re visible only inside their own function. But you still may want to share data between functions or methods. You can use return, but that works only for a single value. You can also use global variables, but not to store your complete data, you soon have a mess. So we need some variable that can share data between functions or methods. There comes pointers to our remedy. You can just create the data and just pass around the memory address (the unique ID) pointing to that data. Using this pointer the data could be accessed and altered in any function or method. In terms of writing modular code, that’s the most important purpose of a pointer— to share data in many different places in a program.
The main difference between C and Objective-C in this regard is that arrays in Objective-C are commonly implemented as objects. (Arrays are always implemented as objects in Java, BTW, and C++ has several common classes resembling NSArray.)
Anyone who has considered the issue carefully understands that "bare" C-like arrays are problematic -- awkward to deal with and very frequently the source of errors and confusion. ("An array 'decays' to a pointer" -- what is that supposed to mean, anyway, other than to admit in a backhanded way "Yes, it's confusing"??)
Allocation in Objective-C is a bit confusing in large part because it's in transition. The old manual reference count scheme could be easily understood (if not so easily dealt with in implementations), but ARC, while simpler to deal with, is far harder to truly understand, and understanding both simultaneously is even harder. But both are easier to deal with than C, where "zombie pointers" are almost a given, due to the lack of reference counting. (C may seem simpler, but only because you don't do things as complex as those you'd do with Objective-C, due to the difficulty controlling it all.)
You use a pointer always when referring to something on the heap and sometimes, but usually not when referring to something on the stack.
Since Objective-C objects are always allocated on the heap (with the exception of Blocks, but that is orthogonal to this discussion), you always use pointers to Objective-C objects. Both the id and Class types are really pointers.
Where you don't use pointers are for certain primitive types and simple structures. NSPoint, NSRange, int, NSUInteger, etc... are all typically accessed via the stack and typically you do not use pointers.

Pointer to generic type

In the process of transforming a given efficient pointer-based hash map implementation into a generic hash map implementation, I stumbled across the following problem:
I have a class representing a hash node (the hash map implementation uses a binary tree)
THashNode <KEY_TYPE, VALUE_TYPE> = class
public
Key : KEY_TYPE;
Value : VALUE_TYPE;
Left : THashNode <KEY_TYPE, VALUE_TYPE>;
Right : THashNode <KEY_TYPE, VALUE_TYPE>;
end;
In addition to that there is a function that should return a pointer to a hash node. I wanted to write
PHashNode = ^THashNode <KEY_TYPE, VALUE_TYPE>
but that doesn't compile (';' expected but '<' found).
How can I have a pointer to a generic type?
And adressed to Barry Kelly: if you read this: yes, this is based on your hash map implementation. You haven't written such a generic version of your implementation yourself, have you? That would save me some time :)
Sorry, Smasher. Pointers to open generic types are not supported because generic pointer types are not supported, although it is possible (compiler bug) to create them in certain circumstances (particularly pointers to nested types inside a generic type); this "feature" can't be removed in an update in case we break someone's code. The limitation on generic pointer types ought to be removed in the future, but I can't make promises when.
If the type in question is the one in JclStrHashMap I wrote (or the ancient HashList unit), well, the easiest way to reproduce it would be to change the node type to be a class and pass around any double-pointers as Pointer with appropriate casting. However, if I were writing that unit again today, I would not implement buckets as binary trees. I got the opportunity to write the dictionary in the Generics.Collections unit, though with all the other Delphi compiler work time was too tight before shipping for solid QA, and generic feature support itself was in flux until fairly late.
I would prefer to implement the hash map buckets as one of double-hashing, per-bucket dynamic arrays or linked lists of cells from a contiguous array, whichever came out best from tests using representative data. The logic is that cache miss cost of following links in tree/list ought to dominate any difference in bucket search between tree and list with a good hash function. The current dictionary is implemented as straight linear probing primarily because it was relatively easy to implement and worked with the available set of primitive generic operations.
That said, the binary tree buckets should have been an effective hedge against poor hash functions; if they were balanced binary trees (=> even more modification cost), they would be O(1) on average and O(log n) worst case performance.
To actually answer your question, you can't make a pointer to a generic type, because "generic types" don't exist. You have to make a pointer to a specific type, with the type parameters filled in.
Unfortunately, the compiler doesn't like finding angle brackets after a ^. But it will accept the following:
TGeneric<T> = record
value: T;
end;
TSpecific = TGeneric<string>;
PGeneric = ^TSpecific;
But "PGeneric = ^TGeneric<string>;" gives a compiler error. Sounds like a glitch to me. I'd report that over at QC if I was you.
Why are you trying to make a pointer to an object, anyway? Delphi objects are a reference type, so they're pointers already. You can just cast your object reference to Pointer and you're good.
If Delphi supported generic pointer types at all, it would have to look like this:
type
PHashNode<K, V> = ^THashNode<K, V>;
That is, mention the generic parameters on the left side where you declare the name of the type, and then use those parameters in constructing the type on the right.
However, Delphi does not support that. See QC 66584.
On the other hand, I'd also question the necessity of having a pointer to a class type at all. Generic or not. they are needed only very rarely.
There's a generic hash map called TDictionary in the Generics.Collections unit. Unfortunately, it's badly broken at the moment, but it's apparently going to be fixed in update #3, which is due out within a matter of days, according to Nick Hodges.

Resources