How to avoid autoboxing of primitives in arrays in Javonet

According to the example in https://www.javonet.com/java-devs/guides/working-with-net-arrays-and-collections-from-java-with-javonet/, if the dll that Java is calling returns an array of ints, Javonet will only display an array of Integer classes (not primitives). Since the arrays are huge in my case (~2GB worth of arrays), is there any way for Javonet to NOT autobox, but instead return an array of primitives?

We have implemented a mechanism that lets you choose whether Javonet returns boxed or unboxed arrays. It can be enabled for the entire scope of your application or set temporarily for particular operations. Please keep in mind, however, that this is a beta build and that the option affects all threads, so if you toggle it selectively, do so with caution.
Please use this build:
http://download.javonet.com/1.5/javonet-1.5hf15-primitivearrays-opti-jtdn.jar
To activate primitive arrays mode call at any time:
Javonet.setUsePrimitiveArrays(true);
This mode affects all primitive types: int, long, short, byte, float, double, boolean, char... To cancel this mode just set "false".
Once you confirm that it improves your performance, we will include it in the final build and update this answer accordingly.

Related

How does Rust store types at runtime?

A u32 takes 4 bytes of memory, a String takes 3 pointer-sized integers (for location, size, and reserved space) on the stack, plus some amount on the heap.
This to me implies that Rust doesn't know, when the code is executed, what type is stored at a particular location, because that knowledge would require more memory.
But at the same time, does it not need to know what type is stored at 0xfa3d2f10, in order to be able to interpret the bytes at that location? For example, to know that the next bytes form the spec of a String on the heap?
How does Rust store types at runtime?
It doesn't, generally.
Rust doesn't know, when the code is executed, what type is stored at a particular location
Correct.
does it not need to know what type is stored
No, the bytes in memory should be correct, and the rest of the code assumes as much. The offsets of fields in a struct are baked into the generated machine code.
When does Rust store something like type information?
When performing dynamic dispatch, a fat pointer is used. This is composed of a pointer to the data and a pointer to a vtable, a collection of functions that make up the interface in question. The vtable could be considered a representation of the type, but it doesn't have a lot of the information that you might think goes into "a type" (unless the trait requires it). Dynamic dispatch isn't super common in Rust as most people prefer static dispatch when it's possible, but both techniques have their benefits.
There are also concepts like TypeId, which can represent one specific type, but only for a subset of types. It also doesn't provide much capability besides "are these the same type or not".
Isn't this all terribly brittle?
Yes, it can be, which is one of the things that makes Rust so interesting.
In a language like C or C++, there's not much that safeguards the programmer from making dumb mistakes that go out and mess up those bytes floating around in memory. Making those mistakes is what leads to bugs due to memory safety. Instead of interpreting your password as a password, it's interpreted as your username and printed out to an attacker (oops!)
Rust provides safeguards against that in the form of a strong type system and tools like the borrow checker, but still all done at compile time. Unsafe Rust enables these dangerous tools with the tradeoff that the programmer is now expected to uphold all the guarantees themselves, much like if they were writing C or C++ again.
See also:
When does type binding happen in Rust?
How does Rust implement reflection?
How do I print the type of a variable in Rust?
How to introspect all available methods and members of a Rust type?

unsafe casting in F# with zero copy semantics

I'm trying to achieve a static cast like coercion that doesn't result in copying of any data.
A naive static cast does not work
let pkt = byte_buffer :> PktHeader
FS0193: Type constraint mismatch. The type byte[] is not compatible with type PktHeader The type 'byte[]' is not compatible with the type 'PktHeader' (FS0193) (program)
where the packet is initially held in a byte array because of the way System.Net.Sockets.Socket.Receive() is defined.
The low level packet struct is defined something like this
[<Struct; StructLayout(LayoutKind.Explicit)>]
type PktHeader =
    [<FieldOffset(0)>] val mutable field1: uint16
    [<FieldOffset(2)>] val mutable field2: uint16
    [<FieldOffset(4)>] val mutable field3: uint32
    .... many more fields follow ....
Efficiency is important in this real world scenario because wasteful copying of data could rule out F# as an implementation language.
How do you achieve zero copy efficiencies in this scenario?
EDIT on Nov 29
My question was predicated on the implicit belief that a C/C++/C# style unsafe static cast is a useful construct, as if this were self-evident. However, on second thought this kind of cast is not idiomatic in F#, since it is inherently an imperative language technique fraught with peril. For this reason I've accepted the answer by V.B., where SBE/FlatBuffers data access is promulgated as best practice.
A pure F# approach for conversion
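open System.Runtime.InteropServices
let convertByteArrayToStruct<'a when 'a : struct> (byteArr : byte[]) =
    // Pin the array so the GC cannot move it while it is being read
    let handle = GCHandle.Alloc(byteArr, GCHandleType.Pinned)
    try
        // PtrToStructure copies the bytes into a boxed struct, which is then unboxed
        Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof<'a>) :?> 'a
    finally
        // Always release the pin, even if the conversion throws
        handle.Free()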
This is a minimal example, but I'd recommend introducing some checks on the length of the byte array because, as written, it will produce undefined results if you give it a byte array that is too short. You could check against Marshal.SizeOf(typeof<'a>).
There is no pure F# solution to do a less safe conversion than this (and this is already an approach prone to runtime failure). Alternative options could include interop with C# to use unsafe and fixed to do the conversion.
Ultimately though, you are asking for a way to subvert the F# type system, which is not really what the language is designed for. One of the principal advantages of F# is the power of the type system and its ability to help you produce statically verifiable code.
F# and very low-level performance optimizations are not best friends, but then... some smart people do magic even with Java, which doesn't have value types and real generic collections for them.
1) I am a big fan of the flyweight pattern lately. If your architecture allows for it, you could wrap a byte array and access struct members via offsets. A C# example here. SBE/FlatBuffers even have tools to generate a wrapper automatically from a definition.
2) If you could stay within an unsafe context in C# to do the work, pointer casting is very easy and efficient. However, that requires pinning the byte array and keeping its handle for later release, or staying within a fixed block. If you have many small arrays without a pool, you could have problems with GC.
3) The third option is abusing .NET type system and cast a byte array with IL like this (this could be coded in F#, if you insist :) ):
static T UnsafeCast(object value) {
ldarg.1 //load type object
ret //return type T
}
I tried this option and even have a snippet somewhere if you need, but this approach makes me uncomfortable because I do not understand its consequences to GC. We have two objects backed by the same memory, what would happen when one of them is GCed? I was going to ask a new question on SO about this detail, will post it soon.
The last approach could be good for arrays of structs, but for a single struct it will box it or copy it anyway. Since structs are on the stack and passed by value, you will probably get better results just by casting a pointer to byte[] in unsafe C# or using Marshal.PtrToStructure as in another answer here, and then copying by value. Copying is not the worst thing, especially on the stack, but allocation of new objects and GC is the enemy, so you need pooled byte arrays, and this will add much more to the overall performance than your struct casting issue.
But if your struct is very big, option 1 could still be better.
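To make option 1 concrete, here is a minimal flyweight sketch. It is written in Python rather than F# or C# purely as a language-neutral illustration. The class name PktHeaderView is hypothetical, the offsets mirror the three fields shown in the question, and little-endian byte order is an assumption. The wrapper keeps a memoryview of the receive buffer and decodes each field only when it is read, so the packet bytes themselves are never copied.

```python
import struct


class PktHeaderView:
    """Flyweight over a packet buffer: fields are decoded on access."""

    SIZE = 8  # bytes covered by the three fields shown in the question

    def __init__(self, buf, offset=0):
        view = memoryview(buf)
        # Reject buffers that are too short instead of reading garbage
        if len(view) - offset < self.SIZE:
            raise ValueError("buffer too short for PktHeader")
        self._view = view  # shares memory with buf; nothing is copied
        self._offset = offset

    @property
    def field1(self):
        # uint16 at offset 0 (little-endian is an assumption)
        return struct.unpack_from("<H", self._view, self._offset + 0)[0]

    @property
    def field2(self):
        # uint16 at offset 2
        return struct.unpack_from("<H", self._view, self._offset + 2)[0]

    @property
    def field3(self):
        # uint32 at offset 4
        return struct.unpack_from("<I", self._view, self._offset + 4)[0]


packet = bytearray(b"\x01\x00\x02\x00\x03\x00\x00\x00")
header = PktHeaderView(packet)
packet[0] = 0xFF  # a later write to the buffer is visible through the view
```

Because the view shares memory with the buffer, the buffer must stay alive and unmodified for as long as the header is in use, which is the same lifetime concern that pinning raises in .NET.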

In Dart can hashCode() method calls return different values on equal (==) Objects?

My immediate project is to develop a system of checksums for proving that two somewhat complex objects are (functionally) EQUAL, in the sense that they have the same values for the critical properties. (I have discovered that dates/times cannot be included, so for my purposes I can't use JSON on the bigger object.)
To do this calling the hashCode() method on selected strings seemed to be the way to go.
Upon implementing this, I note that in practice I am getting very different values on multiple runs of highest level objects that are functionally 'identical'.
There are a number of "nums" that I have not rounded, there are integers, bools, Strings and not much more.
I have 'always' thought that a hashCode on the same set of values would return the same number, am I missing something?
BTW the only context that I have found material on hashCode() has been with WebSockets.
Of course I can write my own String to a unique value but I want to understand if this is a problem with Dart or something else.
I can attempt to answer the question posed in the title: "Can hashCode() method calls return different values on equal (==) Objects?"
Short answer: hash codes for two objects must be the same if those two objects are equal (==).
If you override hashCode you must also override the == operator. Two objects that are equal, as defined by ==, must also have the same hash code.
However, hash codes do not have to be unique. That is, a perfectly valid hash code is the value 1. A good hash code, however, should be uniformly distributed.
From the docs from Object:
Hash codes are guaranteed to be the same for objects that are equal
when compared using the equality operator ==. Other than that there
are no guarantees about the hash codes. They will not be consistent
between runs and there are no distribution guarantees.
If a subclass overrides hashCode it should override the equality
operator as well to maintain consistency.
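The contract quoted above can be sketched in a few lines. The example is in Python, whose `__eq__` and `__hash__` follow the same rule as Dart's `==` and `hashCode`; the Point class is hypothetical and only serves as an illustration. Both methods are derived from the same fields, so equal objects always hash alike.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Equality is defined by the field values, not by identity
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash the same fields that __eq__ compares
        return hash((self.x, self.y))


a = Point(1, 2)
b = Point(1, 2)
same_hash = hash(a) == hash(b)  # must hold because a == b
found = b in {a}                # hashed containers rely on that guarantee
```

The reverse does not hold: two unequal objects may share a hash code, which is why a constant hash is valid but slow.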
I found the immediate problem. The object stringify() method, at one level, was not getting called, but rather some stringify property that must exist in all objects (?).
With this fixed, everything is working exactly as I would expect, and multiple runs of our Statistical Studies are returning exactly the same CheckSum at the highest levels (based on some 5 levels of hierarchy).
Meanwhile, JSON.stringify has continued to fail, even on the most basic object. I have not been able to determine what is causing it to fail. Of course, the question is not how "stringify" is accomplished.
So, empirically at least, I believe it is true that "objects with equal properties" will return equal checkSums in Dart. It was decided to round nums; I don't know if this was causing a problem, but it is perhaps good to be aware of. And, of course, remember to beware of things like dates, times, or anything that could legitimately vary.
_swarmii
The doc linked by Seth Ladd now includes this note:
They need not be consistent between executions of the same program and there are no distribution guarantees.
So, technically, the hashCode value for the same object can change between executions, which answers your question:
I have 'always' thought that a hashCode on the same set of values would return the same number, am I missing something?
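If the checksum has to match across runs, one common approach is to hash a canonical serialization of the critical properties with a fixed digest algorithm instead of relying on the runtime's hash codes. The sketch below is in Python and is only an illustration; the function name `checksum` and the property names are hypothetical. Keys are sorted so that property order does not matter, and the floating-point value is rounded before hashing, as the asker ended up doing.

```python
import hashlib
import json


def checksum(props):
    # Canonical form: sorted keys and fixed separators, so equal
    # property sets always serialize to the same text
    canonical = json.dumps(props, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = checksum({"name": "study-1", "samples": 40, "mean": round(3.14159, 4)})
second = checksum({"mean": round(3.14159, 4), "samples": 40, "name": "study-1"})
```

Values that legitimately vary, such as timestamps, should be left out of the dictionary before it is hashed.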

id values of different variables in python 3

I am able to understand immutability in Python (surprisingly simple too). Let's say I assign a number to x:
x = 42
print(id(x))
print(id(42))
On both counts, the value I get is
505494448
My question is, does python interpreter allot ids to all the numbers, alphabets, True/False in the memory before the environment loads? If it doesn't, how are the ids kept track of? Or am I looking at this in the wrong way? Can someone explain it please?
What you're seeing is an implementation detail (an internal optimization) called interning. This is a technique (used by implementations of a number of languages including Java and Lua) which aliases names or variables to be references to single object instances where that's possible or feasible.
You should not depend on this behavior. It's not part of the language's formal specification and there are no guarantees that separate literal references to a string or integer will be interned nor that a given set of operations (string or numeric) yielding a given object will be interned against otherwise identical objects.
The CPython implementation does keep a small range of integers (currently -5 through 256) as preallocated immutable objects. I suspect that other very high-level language runtimes include similar optimizations: small integers are used very frequently by most non-trivial fragments of code.
In terms of how such things are implemented ... for strings and larger integers it would make sense for Python to maintain these as dictionaries. Thus any expression yielding an integer (and perhaps even floats) and strings (at least sufficiently short strings) would be hashed, looked up in the appropriate (internal) object dictionary, added if necessary and then returned as references to the resulting object.
You can do your own similar interning of any sorts of custom object you like by wrapping the instantiation in your own calls to your own class static dictionary.
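A minimal sketch of that idea, assuming a hypothetical Color class whose instances are identified by name: the class keeps a dictionary of the objects it has already created and hands back the existing one when the same name is requested again. The standard library's sys.intern does the equivalent job for strings.

```python
import sys


class Color:
    _instances = {}  # class-level cache: one object per distinct name

    def __new__(cls, name):
        try:
            # Reuse the object created earlier for this name
            return cls._instances[name]
        except KeyError:
            obj = super().__new__(cls)
            obj.name = name
            cls._instances[name] = obj
            return obj


red1 = Color("red")
red2 = Color("red")
blue = Color("blue")

# sys.intern returns one canonical object per distinct string value
s1 = sys.intern("".join(["hello", " ", "world"]))
s2 = sys.intern("hello world")
```

Here red1 and red2 are the same object, so id(red1) == id(red2), while blue is a different one. Interned objects should be treated as immutable, since every holder of the reference sees any change.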

How to also prepare for 64-bits when migrating to Delphi 2010 and Unicode

As 64-bit support is not expected in the next version, it is no longer an option to wait for the possibility of migrating our existing code base to Unicode and 64-bit in one go.
However, it would be nice if we could already prepare our code for 64-bit while doing our Unicode translation. This will minimize the impact in the event that it finally appears in version 2020.
Any suggestions on how to approach this without introducing too much clutter if it doesn't arrive until 2020?
There's another similar question, but I'll repeat my reply here too, to make sure as many people see this info:
First up, a disclaimer: although I work for Embarcadero, I can't speak for my employer. What I'm about to write is based on my own opinion of how a hypothetical 64-bit Delphi should work, but there may or may not be competing opinions and other foreseen or unforeseen incompatibilities and events that cause alternative design decisions to be made.
That said:
There are two integer types, NativeInt and NativeUInt, whose size will
float between 32-bit and 64-bit depending on platform. They've been
around for quite a few releases. No other integer types will change size
depending on bitness of the target.
Make sure that any place that relies on casting a pointer value to an
integer or vice versa is using NativeInt or NativeUInt for the integer
type. TComponent.Tag should be NativeInt in later versions of Delphi.
I'd suggest don't use NativeInt or NativeUInt for non-pointer-based values. Try to keep your code semantically the same between 32-bit and 64-bit. If you need 32 bits of range, use Integer; if you need 64 bits, use Int64. That way your code should run the same on both bitnesses. Only if you're casting to and from a Pointer value of some kind, such as a reference or a THandle, should you use NativeInt.
Pointer-like things should follow similar rules to pointers: object
references (obviously), but also things like HWND, THandle, etc.
Don't rely on internal details of strings and dynamic arrays, like
their header data.
Our general policy on API changes for 64-bit should be to keep the
same API between 32-bit and 64-bit where possible, even if it means that
the 64-bit API does not necessarily take advantage of the machine. For
example, TList will probably only handle MaxInt div SizeOf(Pointer)
elements, in order to keep Count, indexes etc. as Integer. Because the
Integer type won't float (i.e. change size depending on bitness), we
don't want to have ripple effects on customer code: any indexes that
round-tripped through an Integer-typed variable, or for-loop index,
would be truncated and potentially cause subtle bugs.
Where APIs are extended for 64-bit, they will most likely be done with
an extra function / method / property to access the extra data, and this
API will also be supported in 32-bit. For example, the Length() standard
routine will probably return values of type Integer for arguments of
type string or dynamic array; if one wants to deal with very large
dynamic arrays, there may be a LongLength() routine as well, whose
implementation in 32-bit is the same as Length(). Length() would throw
an exception in 64-bit if applied to a dynamic array with more than 2^32
elements.
Related to this, there will probably be improved error checking for
narrowing operations in the language, especially narrowing 64-bit values
to 32-bit locations. This would hit the usability of assigning the
return value of Length() to locations of type Integer if Length()
returned Int64. On the other hand, specifically for compiler-magic
functions like Length(), there may be some advantage of the magic taken,
to e.g. switch the return type based on context. But advantage can't be
similarly taken in non-magic APIs.
Dynamic arrays will probably support 64-bit indexing. Note that Java
arrays are limited to 32-bit indexing, even on 64-bit platforms.
Strings probably will be limited to 32-bit indexing. We have a hard
time coming up with realistic reasons for people wanting 4GB+ strings
that really are strings, and not just managed blobs of data, for which
dynamic arrays may serve just as well.
Perhaps a built-in assembler, but with restrictions, like not being able to freely mix with Delphi code; there are also rules around exceptions and stack frame layout that need to be followed on x64.
First, look at the places where you interact with non-Delphi libraries and API calls; they might differ. On Win32, libraries with the stdcall calling convention export names like _SomeFunction@4 (@4 indicating the size of the parameters in bytes). On Win64, there is only one calling convention, and the functions in a DLL are no longer decorated. If you import functions from DLL files, you might need to adjust them.
Keep in mind, in a 64 bit exe you cannot load a 32-bit dll, so, if you depend on 3rd party dll files, you should check for a 64-bit version of those files as well.
Also, look at Integers: if you depend on their maximum value, for example when you let them overflow and wait for the moment that happens, it will cause trouble if the size of an integer changes.
Also, when you work with streams and serialize data that includes an integer, a change in the size of that integer will put your stream out of sync.
So, in places where you depend on the size of an integer or pointer, you will need to make adjustments. When serializing such data, keep this size issue in mind as well, as it might cause data incompatibilities between the 32-bit and 64-bit versions.
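The serialization point applies in any language, so here is a small illustration in Python rather than Delphi. The record layout and the function names are hypothetical. Every field is written with an explicitly sized, explicitly ordered format, so the stream layout does not depend on the bitness of the program that wrote it.

```python
import struct

# "<" selects little-endian with no padding; "i" is always 4 bytes and
# "q" is always 8 bytes, regardless of the platform's native sizes
RECORD = struct.Struct("<iq")


def write_record(count, handle_value):
    # count fits in 32 bits; handle_value is pointer-like, so reserve 64 bits
    return RECORD.pack(count, handle_value)


def read_record(data):
    return RECORD.unpack(data)


blob = write_record(7, 2**40)
```

The Delphi equivalent of this habit is to declare stream fields as Int32 or Int64 rather than as a type whose size follows the platform.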
Also, the FreePascal compiler with the Lazarus IDE already supports 64-bit. This alternative Object Pascal compiler is not 100% compatible with the Borland/Codegear/Embarcadero dialect of Pascal, so just recompiling with it for 64-bit might not be that simple, but it might help point out problems with 64-bit.
The conversion to 64-bit should not be very painful. Start by being intentional about the size of an integer where it matters. Don't use "Integer"; instead use Int32 for integers sized at 32 bits and Int64 for integers sized at 64 bits. In the last bitness transition (16-bit to 32-bit), the definition of Integer went from Int16 to Int32, so you're playing it safe by specifying the exact bit depth.
If you have any inline assembly, create a Pascal equivalent and create some unit tests to ensure that they operate the same way. Perform some timing tests of both and see if the assembly still runs enough faster to keep. If it does, then you will want to make changes to both as they are needed.
Use NativeInt for integers that can contain casted pointers.

Resources