EDIT: TL;DR: In C-family languages you can represent arbitrary data (ints, floats, doubles, structs) as byte streams via casting, and pack them into streams or buffers. You can do the reverse to get the data back out, and of course you can byte-swap for endianness correctness.
Is this possible in idiomatic Swift?
Now the original question:
If I were writing in C/C++/ObjC I might cast a struct to unsigned char * and write its bytes to a FILE*, or memcpy them to a buffer. Same for ints, doubles, etc. I know there are endianness concerns to deal with, but this is for an iOS app, and I don't expect endianness to change any time soon for the platform. Swift's type system doesn't seem like it would allow this behavior (casting arbitrary data to unsigned 8-bit ints and passing the address), but I don't know.
I'm learning Swift, and would like an idiomatic way to write my data. Note that my data is highly numeric and ultimately going to be sent over the wire, so it needs to be compact; textual formats like JSON are out.
I could use NSKeyedArchiver, but I want to learn here. Also, I don't want to write off an Android client at some point in the future, so a simple binary encoding seems like the way to go.
Any suggestions?
As noted in Using Swift with Cocoa and Objective-C, you can pass/assign an array of a Swift type to a parameter/variable of pointer type, and vice versa, to get a binary representation. This even works if you define your own struct types, much like in C.
Here's an example -- I use code like this for packaging up 3D vertex data for the GPU (with SceneKit, OpenGL, etc.):
struct Float3 {
var x, y, z: GLfloat
}
struct Vertex {
var position, normal: Float3
}
var vertices: [Vertex] // initialization omitted for brevity
let data = NSData(bytes: vertices, length: vertices.count * sizeof(Vertex))
Inspect this data and you'll see a repeating pattern of 32 × 3 × 2 bits per vertex, that is, six 32-bit IEEE 754 floats (just like you'd get from serializing a C struct through pointer access).
For going the other direction, you might sometimes need unsafeBitCast.
If you're using this for data persistence, be sure to check or enforce endianness.
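For instance, here's a minimal sketch of the reverse direction (Swift 1.x-era APIs to match the code above; the zero-filled vertex is only there to size the array):
// Copy the raw bytes of `data` back into a [Vertex] array.
// Assumes `data` was produced exactly as above, on the same platform.
let count = data.length / sizeof(Vertex)
let zero = Float3(x: 0, y: 0, z: 0)
var restored = [Vertex](count: count,
                        repeatedValue: Vertex(position: zero, normal: zero))
data.getBytes(&restored, length: data.length)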
The kind of format you're discussing is well explored with MessagePack. There are a few early attempts at doing this in Swift:
https://github.com/briandw/SwiftPack
https://github.com/yageek/SwiftMsgPack
I'd probably start with the yageek version. In particular, look at how packing is done into [Byte] data structures. I'd say this is pretty idiomatic Swift, without losing endian management (which you shouldn't ignore; chips do change, and masking and shifting the value, as below, produces bytes most-significant-first on any host):
extension Int32: MsgPackMarshable {
    public func msgpack_marshal() -> Array<Byte> {
        // 0xce is the MessagePack marker for a 32-bit value; the four payload
        // bytes must follow in big-endian order. Shifting the value extracts
        // its bytes most-significant-first on any host, and bitPattern avoids
        // a trap for negative inputs.
        let bits: UInt32 = UInt32(bitPattern: self)
        return [0xce, Byte((bits & 0xFF000000) >> 24), Byte((bits & 0x00FF0000) >> 16),
                Byte((bits & 0x0000FF00) >> 8), Byte(bits & 0x000000FF)]
    }
}
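For example, marshaling a known value (the byte values shown assume the extraction above):
let bytes = Int32(0x11223344).msgpack_marshal()
// [0xce, 0x11, 0x22, 0x33, 0x44]: marker byte, then the big-endian payload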
This is also fairly similar to how you'd write it in C or C++ if you were managing byte order (which C and C++ should always do, so the fact that they could splat their bytes into memory doesn't make correct implementations trivial). I'd probably drop Byte (which comes from Foundation) and use UInt8 (which is defined in core Swift). But either is fine. And of course it's more idiomatic to say [UInt8] rather than Array<UInt8>.
That said, as Zaph notes, NSKeyedArchiver is idiomatic for Swift. But that doesn't mean MessagePack isn't a good format for this kind of problem, and it's very portable.
I'm trying to use the CAMPARY library (CudA Multiple Precision ARithmetic librarY). I've downloaded the code and included it in my project. Since it supports both CPU and GPU, I'm starting with the CPU side to understand how it works and make sure it does what I need. But the intent is to use this with CUDA.
I'm able to instantiate an instance and assign a value, but I can't figure out how to get things back out. Consider:
#include <time.h>
#include <stdio.h>  // printf
#include <stdlib.h> // free
#include "c:\\vss\\CAMPARY\\Doubles\\src_cpu\\multi_prec.h"

int main()
{
    const char *value = "123456789012345678901234567";
    multi_prec<2> a(value);

    a.prettyPrint();
    a.prettyPrintBin();
    a.prettyPrintBin_UnevalSum();

    char *cc = a.prettyPrintBF();
    printf("\n%s\n", cc);
    free(cc);
}
Compiles, links, runs (VS 2017). But the output is pretty unhelpful:
Prec = 2
Data[0] = 1.234568e+26
Data[1] = 7.486371e+08
Prec = 2
Data[0] = 0x1.987bf7c563caap+86;
Data[1] = 0x1.64fa5c3800000p+29;
0x1.987bf7c563caap+86 + 0x1.64fa5c3800000p+29;
1.234568e+26 7.486371e+08
Printing each of the doubles like this might be easy to do, but it doesn't tell you much about the value of the 128-bit number being stored. Performing highly accurate computations is of limited value if there's no way to output the results.
In addition to just printing out the value, eventually I also need to convert these numbers to ints (I'm willing to try it all in floats if there's a way to print, but I fear that both accuracy and speed will suffer). Unlike MPIR (which doesn't support CUDA), CAMPARY doesn't have any associated multi-precision int type, just floats. I can probably cobble together what I need (mostly just add/subtract/compare), but only if I can get the integer portion of CAMPARY's values back out, which I don't see a way to do.
CAMPARY doesn't seem to have any docs, so it's conceivable these capabilities are there, and I've simply overlooked them. And I'd rather ask on the CAMPARY discussion forum/mail list, but there doesn't seem to be one. That's why I'm asking here.
To sum up:
Is there any way to output the 128-bit (multi_prec<2>) values from CAMPARY?
Is there any way to extract the integer portion from a CAMPARY multi_prec? Perhaps one of the (many) math functions in the library that I don't understand computes this?
There are really only 2 possible answers to this question:
There's another (better) multi-precision library that works on CUDA that does what you need.
Here's how to modify this library to do what you need.
The only people who could give the first answer are CUDA programmers. Unfortunately, if there were such a library, I feel confident talonmies would have known about it and mentioned it.
As for #2, why would anyone update this library if they weren't a CUDA programmer? There are other, much better multi-precision libraries out there. The ONLY benefit CAMPARY offers is that it supports CUDA. Which means the only people with any real motivation to work with or modify the library are CUDA programmers.
And, as the CUDA programmer with the most vested interest in solving this, I did figure out a solution (albeit an ugly one). I'm posting it here in the hopes that the information will be of value to future CAMPARY programmers. There's not much information out there for this library, so this is a start.
The first thing you need to understand is how CAMPARY stores its data. And, while not complex, it isn't what I expected. Coming from MPIR, I assumed that CAMPARY stored its data pretty much the same way: a fixed size exponent followed by an arbitrary number of bits for the mantissa.
But nope, CAMPARY went a different way. Looking at the code, we see:
private:
double data[prec];
Now, I assumed that this was just an arbitrary way of reserving the number of bits they needed. But no, they really do use prec doubles. Like so:
multi_prec<8> a("2633716138033644471646729489243748530829179225072491799768019505671233074369063908765111461703117249");
// Looking at a in the VS debugger:
[0] 2.6337161380336443e+99 const double
[1] 1.8496577979210756e+83 const double
[2] 1.2618399223120249e+67 const double
[3] -3.5978270144026257e+48 const double
[4] -1.1764513205926450e+32 const double
[5] -2479038053160511.0 const double
[6] 0.00000000000000000 const double
[7] 0.00000000000000000 const double
So, what they are doing is storing the max amount of precision possible in the first double, then the remainder is used to compute the next double and so on until they encompass the entire value, or run out of precision (dropping the least significant bits). Note that some of these are negative, which means the sum of the preceding values is a bit bigger than the actual value and they are correcting it downward.
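A quick way to see why ordinary doubles can't print this (using the getData()/getPrec() accessors that the code below also relies on): summing the components in plain double arithmetic collapses to roughly Data[0], because each later component lies below one ulp of the running sum.
// Fragment, using the multi_prec<8> a declared above.
double approx = 0.0;
const double *limbs = a.getData();
for (int i = 0; i < a.getPrec(); i++)
    approx += limbs[i];
printf("%e\n", approx); // ~2.633716e+99: the tail components are lost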
With this in mind, we return to the question of how to print it.
In theory, you could just add all these together to get the right answer. But kinda by definition, we already know that C doesn't have a datatype to hold a value this size. But other libraries do (say MPIR). Now, MPIR doesn't work on CUDA, but it doesn't need to. You don't want to have your CUDA code printing out data. That's something you should be doing from the host anyway. So do the computations with the full power of CUDA, cudaMemcpy the results back, then use MPIR to print them out:
#include <stdio.h>
#include <mpir.h> // from mpir at mpir.org

#define MPREC 8

void ShowP(const multi_prec<MPREC> &value)
{
    mpf_t mp, mp2;
    mpf_init2(mp, value.getPrec() * 64); // make sure we reserve enough room
    mpf_init(mp2);                       // only ever needs to hold one double

    const double *ptr = value.getData();
    mpf_set_d(mp, ptr[0]);
    for (int x = 1; x < value.getPrec(); x++)
    {
        // MPIR doesn't have an mpf_add_d, so we load each double
        // into an mpf_t before adding it.
        mpf_set_d(mp2, ptr[x]);
        mpf_add(mp, mp, mp2);
    }

    // Using base 10, write the full precision (0) of mp to stdout.
    mpf_out_str(stdout, 10, 0, mp);
    mpf_clears(mp, mp2, NULL);
}
Used with the number stored in the multi_prec above, this outputs the exact same value. Yay.
It's not a particularly elegant solution. Having to add a second library just to print a value from the first is clearly sub-optimal. And this conversion can't be all that speedy either. But printing is typically done (much) less frequently than computing. If you do an hour's worth of computing and a handful of prints, the performance doesn't much matter. And it beats the heck out of not being able to print at all.
CAMPARY has a lot of shortcomings (undocumented, unsupported, unmaintained). But for people who need multi-precision numbers on CUDA (especially if you need sqrt), it's the best option I've found.
Which type of float am I supposed to use for vertex data? There are a bunch to choose from: GLfloat, Float, Float32, etc.
iOS GPU hardware works best with single-precision floating point data. Moreover, many of the higher level libraries (e.g. GLKit, SceneKit) use data structures containing 32-bit floats for vectors, matrices, and the like.
As for which type name to use, remember that Swift doesn't do automatic type conversion. So where in C you could get away with using float values in your code and passing them to OpenGL APIs, in Swift you have to explicitly convert them to the type(s) used by the API (with GLfloat(value)). It's probably best to just use GLfloat for your own data so you don't have to insert that conversion.
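For instance (glUniform1f is a real GL call; the location variable is just illustrative):
import OpenGLES // provides the GLfloat typealias on iOS

let angle = 45.0                          // inferred as Double
// glUniform1f(location, angle)           // won't compile: Double isn't GLfloat
// glUniform1f(location, GLfloat(angle))  // explicit conversion required

// Declaring your own data as GLfloat up front avoids the casts:
let triangle: [GLfloat] = [0.0, 0.5, -0.5, -0.5, 0.5, -0.5]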
Even if you're developing only for Apple platforms (since you're using Swift), using GLfloat also ensures that you're using whatever data type is required by the OpenGL specification — if you use another type name that happens to specify the same data size/format today, you're relying on that name happening to mean the same thing in the future. (Granted, these particular Swift data types don't seem all that likely to change, but in general it's best to rely on an explicit API contract rather than an implicit match.)
For OpenGL, you should use the OpenGL data types (GLfloat in this case), as they are guaranteed to be the correct types used by the GL implementation on every platform. In practice, a GLfloat will be a 32-bit single-precision floating point format in virtually all cases.
Newbie in iOS programming here.
I was looking at the Foundation Data Types Reference and have started to use the NSInteger typedef on the assumption that it will make my app more portable. However, I often have a use for 16-bit and 8-bit integers, and I don't see an NSShort or NSByte.
It seems wasteful to allocate a 32/64 bit variable for something that has a small range, say 0 to 12.
Are there any symbols that are defined for that?
Use uint8_t and uint16_t if you want types that are a specific size. There are also similar types for 32- and 64-bit values.
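For example, with the fixed-width types from <stdint.h>:
#include <stdint.h>

uint8_t  month = 12;         // 0..255, exactly one byte
uint16_t year  = 2014;       // 0..65535, exactly two bytes
int32_t  delta = -42;        // always 32 bits
uint64_t nanos = 1ULL << 40; // always 64 bits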
I was reading this example on how to use NSData for network messages.
When creating the NSData, the example uses:
unsigned int state = htonl(_state);
[data appendBytes:&state length:sizeof(state)];
When converting the NSData back, the example uses:
[data getBytes:buffer range:NSMakeRange(offset, sizeof(unsigned int))];
_state = ntohl(*(unsigned int *)buffer);
Isn't it unnecessary to use htonl and ntohl in this example?
- since the data is being packed / unpacked on iOS devices, won't the byte ordering be the same, making it unnecessary to use htonl and ntohl.
- Isn't the manner in which it is used incorrect? The example uses htonl for packing, and ntohl for unpacking. But in reality, shouldn't one only do this if one knows that the sender or receiver is using a particular format?
The example uses htonl for packing, and ntohl for unpacking.
This is correct.
When a sender transfers data (integers, floats) over the network, it should reorder them to "network byte order". The receiver performs the decoding by reordering from network byte order to "host byte order".
But in reality, shouldn't one only do this if one knows that the sender or receiver is using a particular format?
Usually, a sender doesn't know the byte order of the receiver, and vice versa. Thus, in order to avoid ambiguity, one needs to define the byte order of the "network". This works well, provided sender and receiver actually do correctly encode/decode for the network.
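A minimal sketch of that round trip in plain C (the buffer and variable names are illustrative):
#include <arpa/inet.h> // htonl, ntohl
#include <string.h>    // memcpy

void roundTrip(void)
{
    unsigned int state = 0x12345678;
    unsigned char packet[sizeof state];

    // Sender: reorder to network byte order, then append the raw bytes.
    unsigned int wire = htonl(state);
    memcpy(packet, &wire, sizeof wire);

    // Receiver: copy the raw bytes out, then reorder to host byte order.
    unsigned int received;
    memcpy(&received, packet, sizeof received);
    unsigned int decoded = ntohl(received); // 0x12345678 again on any host
    (void)decoded;
}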
Edit:
If you are concerned about performance of the encoding:
On modern CPUs the required machine code for byte swapping is quite fast.
On the language level, functions to encode and decode a range of bytes can be made quite fast as well. The Objective-C example in your post isn't one of those "fast" routines, though.
For example, since the host byte order is known at compile time, ntohl becomes an "empty" function (aka "NoOp") if the host byte order equals the network byte order.
Other byte-swap utility functions, which extend the ntoh family of macros to 64-bit, double and float values, may utilize C++ template tricks and may also become "NoOp" functions.
These "empty" functions can then be further optimized away completely, which effectively results in machine code which just performs a move from the source buffer to the destination buffer.
However, since the additional overhead for byte swapping is pretty small, these optimizations for the case where swapping is not needed are only perceptible in high-performance code. But your first statement:
[data getBytes:buffer range:NSMakeRange(offset, sizeof(unsigned int))];
is MUCH more expensive than the following statement
_state = ntohl(*(unsigned int *)buffer);
even when byte swapping is required.
I'm reading and writing lots of FITS and DNG images which may contain data of an endianness different from my platform and/or opencl device.
Currently I swap the byte order in the host's memory if necessary which is very slow and requires an extra step.
Is there a fast way to pass a buffer of int/float/short having the wrong endianness to an OpenCL kernel?
Using an extra kernel run just for fixing the endianness would be OK; some overhead-free auto-fixing read/write operation would be perfect.
I know about the variable attribute __attribute__((endian(host))) / __attribute__((endian(device))), but this doesn't help with a big-endian FITS file on a little-endian platform using a little-endian device.
I thought about a solution like this one (neither implemented nor tested, yet):
uchar4 mask = (uchar4)(3, 2, 1, 0); // shuffle's mask must match the source's element count and size
uchar4 swappedEndianness = shuffle(originalEndianness, mask);
// to be applied on a float/int-buffer somehow
Hoping there's a better solution out there.
Thanks in advance,
runtimeterror
Sure. Since you have a uchar4, you can simply swizzle the components and write them back.
output[tid] = input[tid].wzyx;
Swizzling is also very performant on SIMD architectures, with very little cost, so you should be able to combine it with other operations in your kernel.
Hope this helps!
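Put together, a dedicated fix-up pass might look like this (the kernel and buffer names are made up):
// Reverse the bytes of each 32-bit word in a buffer,
// reinterpreted as a buffer of uchar4.
__kernel void swapEndianness(__global const uchar4 *input,
                             __global uchar4 *output)
{
    size_t tid = get_global_id(0);
    output[tid] = input[tid].wzyx; // reverse the four bytes of one word
}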
Most processor architectures perform best when an operation fits the register width, for example 32/64 bits. When a CPU/GPU performs byte-wise operations such as the .wzyx swizzle on a uchar4, it needs a mask to retrieve each byte from the integer, a shift, and then an integer add or or to combine the results. For the endianness swap, the processor performs that and/shift/add-or-or sequence four times, because there are four bytes.
The most efficient way is as follows:
#define EndianSwap(n) (rotate((n) & 0x00FF00FF, 24U) | (rotate((n), 8U) & 0x00FF00FF))
n can be any 32-bit unsigned gentype, for example a uint4 variable. Because OpenCL C does not allow C++-style function overloading, a macro is the best choice.
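For example, applied to a buffer of 32-bit words (the kernel name is illustrative; per the note above, n could also be a vector type such as uint4):
__kernel void swapWords(__global uint *buf)
{
    size_t tid = get_global_id(0);
    buf[tid] = EndianSwap(buf[tid]); // one rotate-based swap per word
}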